1. Enable prompt caching — 90% off repeated input tokens
Prompt caching is the single most effective way to reduce GPT-4 API costs for any workflow with a stable system prompt, repeated retrieved context, or reused few-shot examples. OpenAI's automatic prompt caching prices cached input tokens at 10% of the standard rate. For GPT-4o at $3.00/1M input tokens, cached tokens cost $0.30/1M — a 90% reduction. For GPT-5-mini (priced at approximately $0.40/1M input tokens), cached tokens run $0.04/1M. The savings scale linearly with how often you re-send the same prefix. Full pricing is on openai.com/pricing.
Caching is automatic on OpenAI as of 2025 — no API parameter needed. The model server caches prefixes after 1,024 tokens and keeps the cache active for a sliding window of 5-10 minutes. To maximize cache hits: (a) put the stable content at the top of your prompt, before any dynamic variables; (b) keep the system prompt identical across calls in a session; (c) if you use retrieved documents, prepend them in a fixed order rather than sorting dynamically each call. For agent loops that call the model 10-50 times per session, this alone typically cuts the input-token portion of the bill by 80-90%.
Line-item example: a customer support bot with a 6,000-token system prompt (instructions + knowledge base excerpt) responding to 50,000 tickets per month. Without caching: 50,000 × 6,000 tokens × $3.00/1M = $900/month in system-prompt input tokens alone. With caching (assuming 85% cache hit rate): 50,000 × 6,000 × (0.15 × $3.00 + 0.85 × $0.30) / 1M = $50.25/month. **$849.75 saved per month.** The implementation change is restructuring the prompt so the system message is first and static — roughly 1 hour of work.
The only trap: OpenAI's cache window is session-scoped and time-limited. If calls are spread across hours with no requests in between, you will not get cache hits. For workloads that batch-process overnight with gaps, combine prompt caching with the Batch API (see section 2) and set cache-warming calls at the start of each batch. See the platform.openai.com prompt caching docs for the technical details.