Lever 1 — Model right-sizing (40-70% potential savings)
**The mechanic:** Frontier models (GPT-4o, Claude Opus, Gemini Pro) cost 10-50× more per token than capable smaller models (GPT-4o-mini, Claude Haiku, Gemini Flash). Per OpenAI pricing at openai.com and Anthropic pricing at anthropic.com, most production workloads use frontier models for tasks that smaller models handle equivalently.
**The audit:** Sample 100 production LLM calls. For each, ask: does this genuinely need the frontier model? Per Helicone at helicone.ai and Langfuse at langfuse.com, most teams find 50-80% of calls could use the cheaper tier with equal quality.
**The math:** A workload at $20K/month on frontier where 60% moves to small-model tier = $20K × 60% × (1 - 1/15) = ~$11K/month savings. Single largest cost lever for most LLM products.
**The implementation:** Per Vercel AI Gateway at vercel.com and OpenRouter at openrouter.ai, provider abstraction layers let you swap models per-request based on task complexity. Eval-set verification on the cheaper tier before promoting.