The cost formula (same shape, three rate cards)
Every Llama 4 inference call follows the same per-token math regardless of host. Neither Groq nor Together charges a platform fee or a per-call fee. You pay for the tokens you send and the tokens you get back, at the host's per-1M-token rate:
```
cost = (input_tokens / 1,000,000) × input_price_per_M
     + (output_tokens / 1,000,000) × output_price_per_M
```
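As a minimal sketch, the formula maps directly onto a small Python function. The name `request_cost` and its parameter names are illustrative. The rates in the usage line ($0.11 input and $0.34 output per 1M tokens) are assumed values, chosen only because they reproduce the $0.00028 per-call figure this section quotes for Groq Scout; they are not taken from a published rate card, so check the host's current pricing before relying on them.

```python
TOKENS_PER_MILLION = 1_000_000


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one call under per-token pricing."""
    input_cost = (input_tokens / TOKENS_PER_MILLION) * input_price_per_m
    output_cost = (output_tokens / TOKENS_PER_MILLION) * output_price_per_m
    return input_cost + output_cost


# Assumed example rates (USD per 1M tokens), not read from a rate card.
cost = request_cost(1_000, 500, input_price_per_m=0.11, output_price_per_m=0.34)
print(f"${cost:.5f}")  # prints $0.00028
```

Keeping the two prices as separate arguments matters because output tokens are typically priced higher than input tokens, so a single blended rate would misprice calls with an unusual input-to-output ratio.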
What changes between hosts is the rate card. The same 1,000-in / 500-out call lands at $0.00028 on Groq Scout, $0.000475 on Together Scout, $0.00089 on Groq Maverick, and $0.00069 on Together Maverick. Groq Scout is the cheapest model-and-host combination on the table for any workload where Scout's quality holds; Together AI charges roughly 1.7x as much for the same model. The host ordering flips on Maverick, where Together is the cheaper of the two.
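To see how these per-call differences compound, the sketch below ranks the four per-call figures quoted above and scales them to a monthly volume. The volume of 1,000,000 calls per month and names such as `PER_CALL_USD` and `monthly_cost` are assumptions made for illustration.

```python
# Per-call costs (USD) for a 1,000-in / 500-out request, as quoted in the text.
PER_CALL_USD = {
    ("Groq", "Scout"): 0.00028,
    ("Together", "Scout"): 0.000475,
    ("Groq", "Maverick"): 0.00089,
    ("Together", "Maverick"): 0.00069,
}

CALLS_PER_MONTH = 1_000_000  # hypothetical volume, not from the text


def monthly_cost(per_call_usd: float, calls_per_month: int) -> float:
    """Scale a per-call cost to a monthly bill, rounded to cents."""
    return round(per_call_usd * calls_per_month, 2)


# Sort the (host, model) pairs from cheapest to most expensive per call.
ranked = sorted(PER_CALL_USD.items(), key=lambda item: item[1])
for (host, model), per_call in ranked:
    print(f"{host} {model}: ${monthly_cost(per_call, CALLS_PER_MONTH):,.2f}/month")

# Ratio of Together's Scout price to Groq's Scout price for the same call.
scout_ratio = PER_CALL_USD[("Together", "Scout")] / PER_CALL_USD[("Groq", "Scout")]
print(f"Together vs Groq on Scout: {scout_ratio:.1f}x")
```

At the assumed volume, the Scout gap works out to roughly $280 versus $475 per month, which is the 1.7x ratio applied to a real bill.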
Replicate prices Llama 4 differently. Most community model pages on Replicate charge per second of compute time (typically $0.000725 to $0.00115 per second on H100-class hardware) rather than per token, so the cost of a call depends on how long it runs, not on how many tokens it produces: a fast 500-token completion might cost $0.002, a slow one $0.008. Use Replicate when you need a one-off run or a model variant that Groq and Together don't carry, not for predictable high-volume production.
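Per-second billing can be sketched the same way, assuming the cost is simply runtime multiplied by the per-second rate. The runtimes (2 s and 8 s) and the $0.001 per second rate, which sits inside the range quoted above, are hypothetical values picked to show how the spread arises; they are not measurements, and the function names are illustrative.

```python
def per_second_cost(runtime_seconds: float, price_per_second: float) -> float:
    """Return the USD cost of a run billed by compute time."""
    return runtime_seconds * price_per_second


def cost_range(min_runtime_s: float, max_runtime_s: float,
               low_rate: float, high_rate: float) -> tuple[float, float]:
    """Best- and worst-case cost when both runtime and rate can vary."""
    best = per_second_cost(min_runtime_s, low_rate)
    worst = per_second_cost(max_runtime_s, high_rate)
    return best, worst


ASSUMED_RATE = 0.001  # USD per second; hypothetical, within the quoted range

fast = per_second_cost(2.0, ASSUMED_RATE)  # hypothetical 2-second run
slow = per_second_cost(8.0, ASSUMED_RATE)  # hypothetical 8-second run
print(f"fast: ${fast:.3f}, slow: ${slow:.3f}")

# Bounds using the per-second range quoted in the text.
low, high = cost_range(2.0, 8.0, 0.000725, 0.00115)
print(f"range: ${low:.5f} to ${high:.5f}")
```

Under these assumptions the same 500-token output can differ in cost by several times, which is why a per-token budget cannot be carried over directly to per-second pricing.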