Where tokens actually go (audit before you optimize)
Before trimming anything, find out where your tokens are going. In most production prompts the cost is not the user's question but the **standing context**: a giant system prompt, few-shot examples you no longer need, retrieved documents that are only partly relevant, and conversation history that grows every turn. That context is re-sent on every single call, so a 2,000-token system prompt used a thousand times a day costs two million input tokens a day before anyone types a word.
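The arithmetic is worth writing down once so you can plug in your own numbers. This is a minimal sketch; the function name `standing_tokens_per_day` is made up for illustration, and the figures are the ones from the example above.

```python
def standing_tokens_per_day(context_tokens: int, calls_per_day: int) -> int:
    """Input tokens spent per day on context that is re-sent with every call."""
    return context_tokens * calls_per_day


# The example from the text: a 2,000-token system prompt, 1,000 calls a day.
daily = standing_tokens_per_day(context_tokens=2_000, calls_per_day=1_000)
print(f"{daily:,} input tokens per day")  # 2,000,000 input tokens per day
```

Multiply the result by your provider's input price per token to turn it into a daily cost.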
Count tokens, don't guess. Roughly four characters of English equal one token, but use a real tokenizer for anything you're billing against. The shape of the bill tells you where to aim: if input tokens dwarf output, optimize context and caching first; if output dominates, cap length and tighten format.
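A rough audit can be sketched in a few lines. Everything here is an assumption made for illustration: the helper names (`estimate_tokens`, `audit`, `where_to_aim`), the placeholder prompt parts, and the use of the four-characters-per-token heuristic in place of a real tokenizer. For billing-grade numbers, swap `estimate_tokens` for your provider's tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def audit(components: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (name, estimated tokens, share of total), largest first."""
    counts = {name: estimate_tokens(text) for name, text in components.items()}
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    rows = [(name, n, n / total) for name, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def where_to_aim(input_tokens: int, output_tokens: int) -> str:
    """Simplified rule: a plain comparison stands in for 'dwarfs'."""
    if input_tokens > output_tokens:
        return "context and caching"
    return "output length and format"


# Hypothetical prompt parts; the strings are placeholders of a given length.
prompt_parts = {
    "system_prompt": "x" * 8000,
    "few_shot_examples": "x" * 4000,
    "retrieved_docs": "x" * 6000,
    "history": "x" * 2000,
    "user_question": "x" * 200,
}

for name, tokens, share in audit(prompt_parts):
    print(f"{name:20s} {tokens:6d} {share:6.1%}")
```

Sorting by size puts the largest components at the top of the report, which is where trimming pays off first.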
For the economics behind this, see the related guides on what a context window is and on cost per token across all major models. The goal of the audit is to spend your effort on the 80% of tokens that come from 20% of the prompt.