What fine-tuning actually does (and what it doesn't)
Fine-tuning adjusts the weights of a pre-trained model on a curated dataset of (prompt, completion) pairs. The result is a model that behaves differently from the base model. It follows a specific output format more reliably, mirrors a brand voice without needing style instructions in every prompt, or performs a narrow task (classification, extraction, code generation in a specific framework) better than few-shot prompting does.
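A fine-tuning dataset is usually just a file of such pairs, commonly stored as JSON Lines (one JSON object per line). The sketch below builds and checks a tiny dataset for a narrow extraction task. The example records, the `prompt`/`completion` field names and the helper functions are all hypothetical. Hosted APIs and open-source training scripts each expect their own schema (some use chat-style `messages` instead), so check the one you target.

```python
import json
from pathlib import Path

# Hypothetical training pairs for a narrow extraction task. The field names
# "prompt" and "completion" are an assumption: hosted APIs and open-source
# training scripts each expect their own schema (some use chat "messages").
EXAMPLES = [
    {"prompt": "Extract the order ID: 'Order #A-1042 shipped today.'",
     "completion": '{"order_id": "A-1042"}'},
    {"prompt": "Extract the order ID: 'Refund issued for order #B-77.'",
     "completion": '{"order_id": "B-77"}'},
]


def is_valid_pair(record: dict) -> bool:
    """True if both fields are present and are non-blank strings."""
    return all(
        isinstance(record.get(key), str) and record[key].strip()
        for key in ("prompt", "completion")
    )


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, rejecting any invalid pair."""
    bad = [r for r in records if not is_valid_pair(r)]
    if bad:
        raise ValueError(f"{len(bad)} invalid record(s), first: {bad[0]!r}")
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def from_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text back into dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    path = Path("train.jsonl")
    path.write_text(to_jsonl(EXAMPLES), encoding="utf-8")
    print(f"wrote {len(EXAMPLES)} pairs to {path}")
```

Note that every completion has the exact output format you want the model to learn; that consistency, repeated across many examples, is what fine-tuning picks up.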
What fine-tuning does NOT do: it does not reliably inject factual knowledge that wasn't already in pre-training. If you fine-tune GPT-5 on your internal product documentation, the model won't reliably retrieve specific facts from that documentation at query time — it will absorb statistical patterns from the text and may confidently hallucinate product details that weren't actually in the fine-tuning set. This is the single most common misconception that leads teams to spend weeks on fine-tuning only to end up with a confidently wrong chatbot.
The academic literature is clear on this distinction. A widely cited 2023 study from Stanford and MIT showed that fine-tuning on factual QA datasets improved format compliance and style but did not reduce factual hallucination rates — in some cases it increased them by teaching the model to generate fluent, confident answers regardless of factual grounding. For factual recall from a specific corpus, retrieval is the right mechanism. For behavior and style, fine-tuning is the right mechanism.
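To make the contrast concrete, here is a minimal sketch of the retrieval side: find the documentation chunk most relevant to a question and place it in the prompt, so the model answers from text it can see rather than from whatever it absorbed during training. The document chunks are hypothetical, and the word-count cosine similarity is a deliberately crude stand-in for the embedding model and vector index a real system would use.

```python
import math
import re
from collections import Counter

# Hypothetical internal documentation chunks.
DOCS = [
    "The Pro plan includes 50 GB of storage and priority support.",
    "Refunds are available within 30 days of purchase.",
    "The Basic plan includes 5 GB of storage and email support.",
]


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most lexically similar to the query."""
    q = tokenize(query)
    return sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 1) -> str:
    """Ground the answer in retrieved text instead of the model's weights."""
    context = "\n".join(f"- {c}" for c in retrieve(query, docs, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("How much storage does the Pro plan include?", DOCS))
```

If a product detail changes, you update a document chunk rather than retraining, which is another practical reason retrieval suits factual recall.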
In 2026, fine-tuning is most cost-effective at scale on open models. Llama 3.3 70B and Llama 4 Scout (a mixture-of-experts model with 17B active and 109B total parameters) can be fine-tuned with QLoRA on a single A100 node and then served for a fraction of what API-based fine-tuning and hosted inference cost at high volume. The Llama fine-tuning documentation and Meta's published recipes give you a reproducible path. For teams spending more than $10k/month on hosted inference whose use case is style and format consistency rather than factual retrieval, a self-hosted fine-tuned Llama 4 Scout is the current cost-optimal choice. See fine-tuning ROI by model for the math.
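Whether a model fits on given hardware starts with weight memory, which is where QLoRA's 4-bit quantization pays off. The back-of-the-envelope sketch below assumes 0.5 bytes per parameter for 4-bit weights versus 2 bytes for bf16, and 80 GB per A100. It counts the frozen base weights only; quantization constants, LoRA adapters, optimizer state and activations all add to the real footprint, so treat the GPU counts as lower bounds. For a mixture-of-experts model such as Llama 4 Scout, all experts must be resident in memory, so the estimate uses the 109B total parameters, not the 17B active per token.

```python
import math

# Bytes per parameter for the frozen base weights (assumed round figures).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "nf4": 0.5}

# Parameter counts in billions. For an MoE model every expert must sit in
# memory, so the TOTAL count drives memory, not the active count per token.
MODELS = {
    "Llama 3.3 70B": 70,
    "Llama 4 Scout (17B active, 109B total)": 109,
}

A100_GB = 80  # the larger A100 variant; a 40 GB version also exists


def weight_gb(params_billion: float, dtype: str) -> float:
    """GB (1e9 bytes) for the base weights alone: billions x bytes/param."""
    return params_billion * BYTES_PER_PARAM[dtype]


def gpus_for_weights(params_billion: float, dtype: str, gpu_gb: float = A100_GB) -> int:
    """Minimum GPU count whose combined memory holds the weights (no headroom)."""
    return math.ceil(weight_gb(params_billion, dtype) / gpu_gb)


if __name__ == "__main__":
    for name, params in MODELS.items():
        for dtype in ("bf16", "nf4"):
            print(
                f"{name:40s} {dtype:>5s}: {weight_gb(params, dtype):6.1f} GB weights, "
                f">= {gpus_for_weights(params, dtype)} x {A100_GB} GB A100"
            )
```

In bf16, Scout's weights alone (218 GB) would need at least three 80 GB A100s; in 4-bit they drop to about 54.5 GB, which is why QLoRA on a single node leaves room for adapters, activations and longer sequences.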