Why the cost range is 30x
The 30x span between the cheapest fine-tune (Llama 4 8B LoRA on Together at $9 for a typical job) and the most expensive (GPT-5 full SFT at $2,812 for the same job) reflects three different sources of cost difference.
**Model size**: bigger models are more expensive to train because forward/backward passes touch more parameters and weights consume more GPU time. GPT-5 is several hundred billion parameters; Llama 4 8B is 8 billion. The 30-100x size ratio drives 5-15x of the cost difference.
**Method efficiency**: LoRA touches ~0.1-1% of the parameters during training while full fine-tuning touches all of them. The 20-30x compute efficiency gain of LoRA over full fine-tuning is the largest single cost lever in fine-tuning. See our LoRA vs QLoRA vs full fine-tuning cost deep-dive.
**Platform margin and infrastructure**: hosted platforms charge a margin on top of raw GPU cost. Together AI runs at the lowest margin (closest-to-cost per-token rates), Fireworks at moderate margin (premium for the serving stack), OpenAI and Anthropic at the highest margin (frontier-model premium, brand value, and tight integration). Self-hosting on raw GPU hours (Lambda, CoreWeave, RunPod) is the cheapest of all but adds engineering cost.