The cost formula (identical to every other provider)
Every DeepSeek API call follows the same math as OpenAI, Anthropic, or any other token-billed provider. There is no platform fee, no per-call fee, no minimum spend. You pay for what you send and what you get back, at the model's per-1M-token rate:
``` cost = (input_tokens / 1,000,000) × input_price_per_M + (output_tokens / 1,000,000) × output_price_per_M ```
The DeepSeek-specific adjustment that matters: cache-hit input. Portions of your prompt prefix that DeepSeek has seen in a recent prior call within the cache window bill at the cache-hit rate. On V3 and R1 that is exactly 10% of standard input (90% off). On V4-Flash and V4-Pro it drops to 2% and 0.83% respectively — close to free. Long stable system prompts, fixed tool schemas, and reused few-shot blocks are the typical winners. Cache activation is automatic — you do not pass a flag; DeepSeek's server-side cache matches your prompt prefix and applies the discount in billing.
Reasoning tokens on DeepSeek-R1 and DeepSeek-V4-Pro bill at the output rate even though they are not returned to the caller — the same shape as OpenAI's o-series. A model that thinks for 6,000 tokens before producing a 400-token answer bills 6,400 output tokens. Plan a 5-15x output budget on reasoning-heavy tasks vs straight chat tasks. R1 in particular has been measured generating 3,000-10,000 reasoning tokens on complex problems — model that into your per-call estimates or you will be surprised by the invoice.