By The DDH Team · Digital Dashboard Hub

OpenAI vs Anthropic vs Google Fine-Tuning (2026): The Honest API Comparison

Three frontier vendors offer hosted fine-tuning in 2026, and they could not have picked more different bets. OpenAI offers the most mature API surface — supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement fine-tuning on GPT-5 family models, with per-token training pricing visible on the pricing page. Anthropic offers Claude fine-tuning exclusively on AWS Bedrock and Google Vertex (no direct API), at a higher per-token rate but with the option to fine-tune Claude Haiku and Sonnet families. Google offers Gemini supervised fine-tuning through Vertex AI with the most generous free quotas of any frontier vendor — but locked to a single deployment surface. Sourced from openai.com/docs/fine-tuning, docs.anthropic.com/fine-tune, and cloud.google.com/vertex-ai/generative-ai/docs/models/tune-gemini-overview as of June 2026.

By DDH Research Team at Digital Dashboard Hub · Updated June 21, 2026

Fine-tuning a frontier model in 2026 means picking not just an algorithm (supervised fine-tuning, DPO, or reinforcement fine-tuning) but also which vendor's API and which deployment surface you can live with for the next 12-24 months. The three frontier vendors that ship hosted fine-tuning today are OpenAI, Anthropic, and Google, and their offerings differ enough that the right pick can swing 3-5x on total cost and 6-12 months on iteration speed. If you are deciding between fine-tuning, RAG, and prompt engineering before picking a vendor, see our companion analysis when to fine-tune vs RAG vs prompt engineer.

OpenAI (https://platform.openai.com/docs/guides/fine-tuning) ships the deepest fine-tuning API in the market — supervised fine-tuning on the GPT-5 family including the gpt-5-mini and gpt-5-nano tiers, direct preference optimization (DPO) for offline preference learning, and reinforcement fine-tuning (RFT) for tasks where you can score model outputs programmatically. Anthropic (https://docs.anthropic.com/en/docs/build-with-claude/fine-tuning) takes the opposite philosophy: Claude fine-tuning is only available through cloud partners (Amazon Bedrock and Google Vertex), uses a managed jsonl format, and intentionally restricts which Claude models can be tuned. Google (https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-gemini-overview) bets on the most generous free tier — Gemini 2.5 Flash supervised fine-tuning includes a substantial free quota each month — paired with the deepest Vertex AI tooling for evaluation, deployment, and monitoring.

Below: full API surface comparison, supported base models, training cost per 1M tokens, quota ceilings, data format requirements, and a decision matrix by use case. If you are running the math on total fine-tune cost, estimate with our fine-tuning cost calculator by model and the LoRA H100 training cost calculator.

OpenAI vs Anthropic vs Google fine-tuning — API surface, pricing, and quota overview, June 2026

Feature	OpenAI	Anthropic	Google
Direct fine-tune API?	Yes — platform.openai.com REST API + dashboard	No — only via Amazon Bedrock or Google Vertex	Yes — Vertex AI REST API + Vertex Studio UI
Supported base models	GPT-5, GPT-5 mini, GPT-5 nano, GPT-4o, GPT-4o-mini, o4-mini (RFT only)	Claude Haiku 4.5, Claude Sonnet 4.6 (Bedrock); Haiku 4.5 (Vertex)	Gemini 2.5 Flash, Gemini 2.5 Pro (limited preview), Gemini 1.5 family (legacy)
Methods supported	SFT, DPO, RFT (reinforcement fine-tuning)	SFT only	SFT only (DPO in preview for Gemini 2.5 Flash)
Training cost (per 1M training tokens; GPT-5 mini / Haiku 4.5 / Gemini 2.5 Flash)	~$25/1M training tokens (GPT-5 mini SFT, June 2026 docs)	~$45/1M training tokens (Haiku 4.5 SFT on Bedrock; Sonnet 4.6 roughly 2x)	~$8/1M training tokens (Gemini 2.5 Flash SFT), paid tier
Free tier on training	No free training tokens — usage-based from token #1	None — Bedrock/Vertex billing only	Yes — substantial monthly free training token allowance on Gemini 2.5 Flash via Vertex
Inference cost on fine-tuned model	~2x base model input price; 1.5x output (GPT-5 family)	~1.5x base model price for Claude Sonnet fine-tunes	~1.0-1.2x base Gemini 2.5 Flash price
Data format	jsonl with `messages` array (chat format) or `prompt`/`completion`	jsonl with `messages` array (Anthropic chat schema)	jsonl with `contents` array (Gemini parts schema)
Min / max training examples	10 min, no published max (50k+ practical for GPT-5 mini)	50 min recommended; 10k example soft cap	16 min, 10k example recommended max (Gemini 2.5 Flash)
Max context per example	Up to 32K tokens (GPT-5 mini training)	Up to 200K input tokens (Sonnet 4.6 training)	Up to 32K tokens (Gemini 2.5 Flash training)
Validation set support	Yes — `validation_file` parameter	Yes — Bedrock job parameter; Vertex parameter	Yes — automatic validation split or explicit file
Eval / metric reporting	Loss + token accuracy; checkpoint inspection in dashboard	Loss + held-out metrics via CloudWatch (Bedrock)	Loss, BLEU, ROUGE; Vertex evaluation pipelines integrated
Deployment / serving	Auto-deployed — call via fine-tuned model ID	Bedrock provisioned throughput required (cost layer)	Vertex endpoint deployment required (per-hour serving cost)
RLHF / RFT support	Yes — RFT with custom graders (o4-mini and select models)	No public offering	DPO in preview on Gemini 2.5 Flash
Data privacy	Training data not used to train base models; ZDR available on Enterprise	Training data isolated per Bedrock/Vertex tenancy; not used by Anthropic	Vertex tenancy isolation; not used to train Gemini base models

Sources as of June 2026: OpenAI fine-tuning docs (https://platform.openai.com/docs/guides/fine-tuning), OpenAI pricing page (https://openai.com/api/pricing), Anthropic fine-tuning announcement (https://docs.anthropic.com/en/docs/build-with-claude/fine-tuning), AWS Bedrock Claude fine-tuning pricing (https://aws.amazon.com/bedrock/pricing/), Google Vertex AI supervised tuning docs (https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-models), Vertex AI pricing (https://cloud.google.com/vertex-ai/pricing). Per-1M-token pricing varies by model size — figures cited are for the cheapest production-ready fine-tunable model in each ecosystem (GPT-5 mini, Claude Haiku 4.5, Gemini 2.5 Flash). Verify before procurement — cloud pricing changes.

What each vendor's fine-tuning offering actually is

These three offerings start from very different problem definitions, and the API surface flows from those starting points. Understanding the underlying philosophy is the fastest way to know which vendor fits your situation before you spend a dollar.

**OpenAI fine-tuning** (https://platform.openai.com/docs/guides/fine-tuning) is the most mature hosted fine-tuning API in the market. It supports three different training algorithms: supervised fine-tuning (SFT) for the standard "here are input-output pairs, learn the pattern" workflow; direct preference optimization (DPO) for offline preference learning where you provide chosen-vs-rejected response pairs; and reinforcement fine-tuning (RFT) where you provide a programmatic grader function and the model is trained against that reward signal. RFT is the differentiator no one else matches in 2026 — for tasks like code completion, math problem solving, or any case where you can score outputs cheaply, RFT can extract quality gains that SFT cannot. The API surface is the standard OpenAI REST pattern: upload a jsonl file, create a fine-tune job, monitor status, then call the resulting fine-tuned model by its ID. The result is auto-deployed — no separate provisioning step.
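The upload, create, monitor, call sequence described above can be sketched roughly as follows. This is a minimal sketch, assuming the official `openai` Python SDK and its `files.create`, `fine_tuning.jobs.create`, and `fine_tuning.jobs.retrieve` calls. The function name `run_sft_job`, the file path, and the model ID in the usage comment are illustrative placeholders and should be checked against the current docs.

```python
import time

# Terminal job states used by the fine-tuning jobs API.
TERMINAL_STATES = ("succeeded", "failed", "cancelled")


def run_sft_job(client, train_path, model, poll_seconds=30):
    """Upload a JSONL file, start a supervised fine-tuning job, and wait for it.

    `client` is expected to behave like `openai.OpenAI()`.
    Returns the fine-tuned model ID, or None if the job did not succeed.
    """
    # Step 1: upload the training file.
    with open(train_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="fine-tune")

    # Step 2: create the fine-tuning job against the chosen base model.
    job = client.fine_tuning.jobs.create(training_file=uploaded.id, model=model)

    # Step 3: poll until the job reaches a terminal state.
    while job.status not in TERMINAL_STATES:
        time.sleep(poll_seconds)
        job = client.fine_tuning.jobs.retrieve(job.id)

    # Step 4: the resulting model ID can be used directly in inference calls.
    return job.fine_tuned_model


# Usage (requires the `openai` package and an API key; model ID is a placeholder):
#   from openai import OpenAI
#   model_id = run_sft_job(OpenAI(), "train.jsonl", "gpt-5-mini")
```

Passing the client in as an argument keeps the sketch testable with a stub and avoids hard-coding credentials.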

**Anthropic fine-tuning** (https://docs.anthropic.com/en/docs/build-with-claude/fine-tuning) takes the opposite philosophy: there is no direct Anthropic API for fine-tuning. Claude fine-tuning is exclusively available through Amazon Bedrock and Google Vertex AI partner channels. The reasoning Anthropic has given publicly is that fine-tuning Claude requires custom infrastructure, data isolation, and serving guarantees that Anthropic prefers to delegate to cloud partners with mature enterprise security postures. Practically, this means your fine-tuning workflow looks like a Bedrock or Vertex job: you authenticate against the cloud provider, upload jsonl in Anthropic's chat schema, and the fine-tuned model lives on provisioned Bedrock throughput or a Vertex endpoint. That provisioned throughput is a significant recurring cost and can exceed the cost of the training run itself.

**Google Vertex AI fine-tuning** (https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-gemini-overview) is the most generous offering on the free-tier dimension, and the deepest in terms of eval/deployment tooling. Gemini 2.5 Flash supervised fine-tuning includes a substantial monthly free training token allowance — enough that small experiments can run at zero training cost — and the integration with Vertex AI's evaluation pipelines (BLEU, ROUGE, custom metrics) means you can run automated quality checks against held-out sets without writing infrastructure. The catch is that Vertex AI is locked to Google Cloud — you cannot serve a Gemini fine-tune outside Vertex, and the per-hour endpoint deployment cost is a recurring spend even when traffic is low.

**The positioning in brief:** OpenAI is for teams who want the deepest method surface (SFT + DPO + RFT) and the lowest serving friction. Anthropic is for teams already on Bedrock/Vertex who specifically need Claude's reasoning quality on their domain data. Google is for teams who want to experiment at zero or near-zero cost on Vertex and use Gemini for production.

Training cost per 1M tokens — the honest math, June 2026

Per-token training pricing is the single most-cited number in fine-tuning decisions, and the three vendors are not directly comparable without normalizing for what "training tokens" means in each ecosystem.

**OpenAI charges by total training tokens processed**, which is (tokens per example) × (number of examples) × (number of epochs). For GPT-5 mini supervised fine-tuning, the published rate as of June 2026 docs is approximately $25 per 1M training tokens. A typical fine-tune run with 5,000 examples averaging 1,500 tokens each, run for 3 epochs, processes 5,000 × 1,500 × 3 = 22.5M training tokens; at $25 per 1M, that is 22.5 × $25 ≈ $562. GPT-5 (full) training costs roughly 4-5x that rate, while GPT-5 nano costs about half as much. RFT is priced at a higher per-token rate plus per-grading-call overhead; budget 2-3x SFT cost for equivalent compute.

**Anthropic (via Bedrock) charges per training token at higher rates** — approximately $45 per 1M tokens for Claude Haiku 4.5 SFT and roughly 2x that for Sonnet 4.6 SFT. The same 22.5M-token job that costs $562 on GPT-5 mini would cost approximately $1,012 on Claude Haiku 4.5 via Bedrock. Additionally, Anthropic fine-tunes incur **provisioned throughput cost** on Bedrock — you pay for model serving capacity per hour whether or not you use it, with a typical Sonnet 4.6 provisioned throughput unit running ~$15-25 per hour depending on region and commitment. For low-volume production, this serving floor often exceeds the training cost.

**Google (via Vertex AI) charges per training token on the paid tier** — approximately $8 per 1M training tokens for Gemini 2.5 Flash SFT, the cheapest of the three by a wide margin. Crucially, Vertex includes a substantial free quota each month, so small-to-medium fine-tune jobs may train at zero cost. The same 22.5M-token job that costs $562 on OpenAI and $1,012 on Anthropic would cost approximately $180 on Gemini 2.5 Flash on the paid tier — and could be free entirely if it fits within the monthly free allowance. The deployment cost is comparable to Bedrock provisioned throughput: a Gemini 2.5 Flash endpoint runs per-hour at the published Vertex AI prediction rate, so always-on serving adds material cost.
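The arithmetic in the three paragraphs above can be reproduced with a short script. This is a sketch only: the rates are the approximate June 2026 figures quoted in this article, and the dictionary keys and function names are illustrative, so verify current pricing before relying on the output.

```python
# Approximate SFT rates (USD per 1M training tokens) as quoted in this article.
RATES_USD_PER_1M = {
    "openai_gpt5_mini": 25.0,
    "anthropic_haiku_4_5_bedrock": 45.0,
    "google_gemini_2_5_flash": 8.0,
}


def training_tokens(examples, avg_tokens_per_example, epochs):
    """Total tokens processed = examples x tokens per example x epochs."""
    return examples * avg_tokens_per_example * epochs


def training_cost_usd(tokens, rate_per_1m):
    """Training-only cost; excludes serving and inference."""
    return tokens / 1_000_000 * rate_per_1m


# The example job from the text: 5,000 examples, 1,500 tokens each, 3 epochs.
tokens = training_tokens(5_000, 1_500, 3)
costs = {name: training_cost_usd(tokens, rate) for name, rate in RATES_USD_PER_1M.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
# openai_gpt5_mini: $562.50
# anthropic_haiku_4_5_bedrock: $1,012.50
# google_gemini_2_5_flash: $180.00
```

These figures cover training only. Free-tier allowances and per-hour serving costs are not modeled here.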

**The honest summary on training-only cost:** Google is cheapest, OpenAI is in the middle and has the deepest method surface, Anthropic is the most expensive but the only path to a fine-tuned Claude. For total cost of ownership including serving, run the math at projected QPS — OpenAI's auto-deployed fine-tunes have no per-hour serving floor, which can dominate the calculation for low-traffic production deployments.

Supported base models and method matrix

What you can fine-tune is at least as important as what it costs, and the three vendors restrict different things.

**OpenAI** supports SFT on the full GPT-5 family (GPT-5, GPT-5 mini, GPT-5 nano), the GPT-4o family (GPT-4o, GPT-4o-mini), and select older models like babbage-002 and davinci-002 for legacy workloads. DPO is supported on the same GPT-5 family. RFT is restricted to a smaller subset — o4-mini and select models — with explicit allowlist requirements (you typically need to apply for RFT access through the OpenAI dashboard). The pattern is that newer, more capable base models tend to get all three methods, while older models support only SFT.

**Anthropic** supports SFT on Claude Haiku 4.5 (via both Bedrock and Vertex) and Claude Sonnet 4.6 (via Bedrock only as of June 2026). Claude Opus 4.7 is not currently available for fine-tuning anywhere. There is no public DPO or RLHF offering from Anthropic. The supported set is the smallest of the three vendors and the most likely to change as Anthropic expands its cloud-partner program.

**Google** supports SFT on Gemini 2.5 Flash with the cleanest path and the most generous free quota; SFT on Gemini 2.5 Pro is in limited preview as of June 2026 and requires Vertex AI allowlisting. Legacy Gemini 1.5 family (Flash and Pro) is still tunable for backward compatibility. DPO on Gemini 2.5 Flash is in preview — you can apply via the Vertex AI console — but is not yet GA. The method matrix is narrower than OpenAI's but the free tier on Gemini 2.5 Flash SFT compensates for many experimentation use cases.

Data format — jsonl is universal but the schemas differ

All three vendors accept jsonl files, but the schema inside differs and you cannot directly port a training file between vendors without a translation step.

**OpenAI's chat format** uses the standard `messages` array with `role` and `content`, identical to the Chat Completions API: `{"messages":[{"role":"system","content":"..."},{"role":"user","content":"..."},{"role":"assistant","content":"..."}]}`. For DPO, each record instead carries an `input` object (the prompt messages) plus `preferred_output` and `non_preferred_output` fields holding the chosen and rejected responses. For RFT, you provide prompts and a grader specification; the grader is either a JSON schema-based comparison or a Python grader function for arbitrary scoring logic.

**Anthropic's format** uses the Anthropic chat schema with explicit `system` field and `messages` array: `{"system":"...","messages":[{"role":"user","content":"..."},{"role":"assistant","content":"..."}]}`. Tool-use examples include `tool_use` blocks if you are training for tool-calling behavior. Bedrock and Vertex both accept the same schema with slight wrapper differences for the job submission API itself.

**Google's format** uses the Gemini `contents` parts schema with `role` (`user` or `model`) and `parts` array: `{"contents":[{"role":"user","parts":[{"text":"..."}]},{"role":"model","parts":[{"text":"..."}]}]}`. System instructions are passed as a separate field (`systemInstruction`) at the top level. Vision examples can include image parts directly in the parts array, and Gemini 2.5 Flash fine-tuning supports multimodal training out of the box.

A common pattern teams use: maintain training data in a neutral internal format (typed Python dataclasses or Pydantic models), then run a per-vendor translation step at fine-tune job submission. This lets you re-train on a different vendor without a one-off cleanup — important because vendor choices shift over a 12-24 month horizon.
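The neutral-format-plus-translation pattern might look roughly like the sketch below. It assumes single-turn text examples and uses the three schemas described above. The `Example` dataclass, the function names, and the inner shape of `systemInstruction` (a `parts` list) are illustrative assumptions, so check each vendor's current dataset docs before submitting a job.

```python
import json
from dataclasses import dataclass


@dataclass
class Example:
    """Vendor-neutral training example (single turn, text only)."""
    system: str
    user: str
    assistant: str


def to_openai(ex):
    # Chat Completions-style `messages` array; system prompt is a message.
    return {"messages": [
        {"role": "system", "content": ex.system},
        {"role": "user", "content": ex.user},
        {"role": "assistant", "content": ex.assistant},
    ]}


def to_anthropic(ex):
    # Separate top-level `system` field plus user/assistant messages.
    return {"system": ex.system, "messages": [
        {"role": "user", "content": ex.user},
        {"role": "assistant", "content": ex.assistant},
    ]}


def to_gemini(ex):
    # `contents` with `parts`; the assistant role is called `model`.
    # The nested shape of `systemInstruction` here is an assumption.
    return {
        "systemInstruction": {"role": "system", "parts": [{"text": ex.system}]},
        "contents": [
            {"role": "user", "parts": [{"text": ex.user}]},
            {"role": "model", "parts": [{"text": ex.assistant}]},
        ],
    }


TRANSLATORS = {"openai": to_openai, "anthropic": to_anthropic, "google": to_gemini}


def to_jsonl(examples, vendor):
    """Serialize examples as one JSON object per line for the given vendor."""
    translate = TRANSLATORS[vendor]
    return "\n".join(json.dumps(translate(ex)) for ex in examples)


# Usage: the same source data rendered for two vendors.
data = [Example(system="Be brief.", user="Hi", assistant="Hello")]
print(to_jsonl(data, "openai"))
print(to_jsonl(data, "google"))
```

Keeping the translators as plain functions in a lookup table means adding a fourth vendor is one new function and one new dictionary entry.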

Quotas — examples, context, jobs, deployments

Quota differences can disqualify a vendor before pricing even enters the conversation. The three vendors have published different ceiling structures.

**OpenAI** does not publish a hard maximum on training examples — practical maxima for the GPT-5 mini and GPT-5 family are in the tens of thousands of examples, with reasonable batch processing. Per-example context is capped at 32K tokens for GPT-5 mini training (matching the base model's context). Concurrent fine-tune jobs are limited per organization with a default of 3 active jobs simultaneously; this can be raised on request. Deployed fine-tuned models are not separately quota'd — calls to them count against your normal token-per-minute and request-per-minute rate limits for the base model tier.

**Anthropic (via Bedrock)** recommends a minimum of 50 examples and applies a soft cap of roughly 10,000 examples for typical jobs. Claude Sonnet 4.6 training supports up to 200,000 input tokens per example, the largest per-example context window of any frontier vendor's fine-tuning offering. Concurrent fine-tune jobs are limited by Bedrock service quotas (configurable per AWS account, default ~2-3 concurrent jobs). The provisioned throughput required to serve a fine-tuned Claude is the binding deployment quota: each provisioned throughput unit is a separate AWS service quota item.

**Google Vertex** caps Gemini 2.5 Flash training at approximately 10,000 examples and 32K tokens per example, with concurrent job limits of approximately 5 per project (raisable). The free-tier monthly token allowance is the cap most teams hit first — it is published in the Vertex AI Gemini pricing page and is renewed monthly. Endpoint quota is separate: each Vertex endpoint deployment counts against the project's compute quota.
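A pre-flight check against these ceilings can catch a disqualifying dataset before any job is submitted. The sketch below hard-codes the approximate limits quoted in this article, for the specific models named in the text, and expects per-example token counts from whatever tokenizer you already use. The `LIMITS` table and the `preflight` function are illustrative names, not part of any vendor SDK.

```python
# Approximate limits as quoted in this article (June 2026); None = no published max.
LIMITS = {
    "openai": {"min_examples": 10, "max_examples": None, "max_tokens": 32_000},
    "anthropic": {"min_examples": 50, "max_examples": 10_000, "max_tokens": 200_000},
    "google": {"min_examples": 16, "max_examples": 10_000, "max_tokens": 32_000},
}


def preflight(vendor, token_counts):
    """Return a list of problems; an empty list means the dataset fits the quoted limits.

    `token_counts` holds one token count per training example.
    """
    limits = LIMITS[vendor]
    n = len(token_counts)
    problems = []

    if n < limits["min_examples"]:
        problems.append(f"{n} examples is below the minimum of {limits['min_examples']}")

    if limits["max_examples"] is not None and n > limits["max_examples"]:
        problems.append(f"{n} examples exceeds the cap of {limits['max_examples']}")

    oversize = sum(1 for t in token_counts if t > limits["max_tokens"])
    if oversize:
        problems.append(f"{oversize} example(s) exceed {limits['max_tokens']} tokens")

    return problems


# Usage: 17 examples, one of which is too long for the quoted 32K cap.
print(preflight("google", [100] * 16 + [40_000]))
```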

See OpenAI fine-tuning rate limits, Anthropic fine-tune quotas, and Replicate training quotas for granular per-vendor quota deep-dives including how to request increases.

Eval, monitoring, and continuous improvement

Training a model is one workflow; knowing whether it actually improved on your task is another. The three vendors invest in eval/monitoring tooling at very different levels of depth.

**OpenAI** reports training and validation loss plus token-level accuracy through the fine-tune job dashboard and API. Checkpoints are saved at configurable intervals (default: epoch end) and can be inspected and deployed individually. For evaluation against custom metrics post-training, you write your own eval harness against the OpenAI Evals framework (https://github.com/openai/evals) or a third-party tool. The OpenAI ecosystem itself does not bundle eval pipelines into the fine-tuning workflow — you connect them manually.

**Anthropic via Bedrock** reports training loss and held-out validation metrics through CloudWatch. For evaluation against custom metrics, you typically use Amazon Bedrock Evaluations or build a custom pipeline in SageMaker. Bedrock's eval offering covers standard LLM benchmarks plus custom rubric-based judging via an LLM-as-judge approach. Anthropic via Vertex inherits the Vertex evaluation pipeline tooling.

**Google Vertex AI** has the deepest bundled eval tooling of the three: BLEU, ROUGE, METEOR, and custom evaluation metrics are first-class in the Vertex evaluation pipeline, and you can run an evaluation job against a held-out set immediately after a fine-tune job completes with a single API call. For continuous monitoring of deployed Vertex endpoints, the integrated Vertex AI Model Monitoring service provides drift detection, latency tracking, and quality regression alerts — none of which require custom infrastructure.

If evaluation depth matters more than model selection, Vertex AI's tooling has a clear edge. If you can write your own eval harness and want the deepest method surface, OpenAI is the better fit. If you must use Claude, Bedrock evaluations are sufficient for most use cases but require more glue code.
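For teams writing their own harness, a held-out evaluation loop can be quite small. The sketch below scores outputs with a hand-rolled unigram-overlap F1 (in the spirit of ROUGE-1, but not a replacement for a maintained ROUGE implementation). The `generate` callable stands in for whatever client call returns your fine-tuned model's text, and all names here are illustrative.

```python
from collections import Counter


def unigram_f1(prediction, reference):
    """F1 over lowercased whitespace tokens, with clipped overlap counts."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(generate, held_out):
    """Mean unigram F1 over (prompt, reference) pairs.

    `generate` is any callable mapping a prompt string to model output text.
    """
    if not held_out:
        raise ValueError("held_out must contain at least one example")
    scores = [unigram_f1(generate(prompt), ref) for prompt, ref in held_out]
    return sum(scores) / len(scores)


# Usage with a stand-in "model" that simply echoes the prompt.
held_out = [("the cat sat", "the cat sat"), ("the cat", "the cat sat")]
print(round(evaluate(lambda prompt: prompt, held_out), 3))
```

Running the same harness against the base model and the fine-tune gives a like-for-like comparison on the held-out set.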

Decision matrix — which vendor for which job

Trade-offs are easier to navigate against a decision matrix. The following maps common production scenarios to the right vendor pick.

**Task is code completion, math, or anything programmatically gradable** → OpenAI with RFT. The reinforcement fine-tuning offering is the only path to extract quality gains from a reward signal in a hosted API, and OpenAI's o4-mini and select GPT-5 variants are the only models that support it. Budget 2-3x SFT cost but expect 10-30% quality lift over SFT on these tasks.

**You need a fine-tuned Claude for legal, regulatory, or alignment reasons** → Anthropic via Bedrock (or Vertex). There is no other option. Plan for the higher per-token training cost and the Bedrock provisioned throughput floor. Run a small SFT job first to confirm the workflow before committing budget.

**You are experimenting and budget is a hard ceiling** → Google Vertex with Gemini 2.5 Flash SFT. The free monthly training quota means you can run iteration cycles at zero training cost; you only start paying once you go to production and deploy an endpoint.

**You want the lowest serving cost for a production fine-tune** → OpenAI. Auto-deployed fine-tunes have no per-hour serving floor; you pay only per inference token. For low- and medium-traffic production deployments, this is the lowest TCO option.

**Your team is already deep in AWS or GCP** → Match the cloud. Bedrock-based Claude fine-tuning has IAM, KMS, CloudWatch, and SageMaker integrations available out of the box. Vertex AI fine-tuning has the deepest GCP IAM, audit logging, and Cloud Build integration. Cross-cloud workflows add real glue cost.

**You need preference learning (DPO) on a hosted API** → OpenAI for general availability; Google for Gemini 2.5 Flash if you are willing to use a preview API. Anthropic does not offer DPO.

Common pitfalls — what teams get wrong

Three failure modes show up repeatedly across teams running their first hosted fine-tune in 2026.

**Pitfall 1: Ignoring the deployment cost floor on Bedrock or Vertex.** Provisioned throughput on Bedrock and Vertex endpoint hours are charged whether or not you are sending traffic. At the ~$15-25 per hour cited above for a Sonnet 4.6 provisioned throughput unit, a team that runs a $500 Claude fine-tune and then leaves one unit running idle for a roughly 730-hour month spends about $11,000-18,000 on serving, far more than the training itself. Either commit to high traffic, use the on-demand inference path where supported (limited for fine-tunes), or batch your deployment windows.

**Pitfall 2: Treating fine-tuning as a substitute for prompt engineering or RAG.** Fine-tuning shifts a model's behavior distribution; it does not give it new facts (RAG does that better), and it cannot fully replace a well-structured prompt (you still need the system prompt to set scope, voice, and constraints). The typical sequence: ship the best prompt you can, ship RAG for any factual or up-to-date knowledge needs, and only fine-tune when prompt engineering plateaus on a specific behavior pattern that examples can teach better than instructions.

**Pitfall 3: Overfitting on too few examples or too many epochs.** All three vendors expose epoch counts and learning rate multipliers as tunable hyperparameters. The default settings are reasonable, but teams who increase epochs from 3 to 10 in pursuit of a quality lift often see worse generalization on held-out data. Start with the vendor defaults, watch the validation loss curve, and only tune hyperparameters once you can see overfitting in the data.
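One simple way to act on "watch the validation loss curve" is to find the epoch with the lowest validation loss and check whether later epochs failed to improve on it. The sketch below is a minimal heuristic, not any vendor's built-in early stopping. The function name, the `patience` parameter, and the sample loss values are illustrative.

```python
def best_epoch(val_losses, patience=1):
    """Return (best_index, no_improvement_after).

    best_index is the 0-based epoch with the lowest validation loss (first one on ties).
    no_improvement_after is True when at least `patience` later epochs
    failed to beat that loss, a common sign of overfitting.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    epochs_after_best = len(val_losses) - 1 - best
    return best, epochs_after_best >= patience


# Usage with made-up per-epoch validation losses from a 5-epoch run:
# loss bottoms out at epoch index 2, then climbs.
print(best_epoch([1.2, 0.9, 0.8, 0.85, 0.95]))  # (2, True)
```

When the flag is true, deploying the checkpoint from the best epoch (where the vendor exposes checkpoints) is usually a better choice than the final one.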

The 2026 fine-tuning landscape outside the frontier three

OpenAI, Anthropic, and Google are the frontier-model fine-tune offerings in 2026, but they are not the only options. For open-weight models, hosted fine-tuning is also available from Together AI (https://together.ai/), Fireworks AI (https://fireworks.ai/), Replicate (https://replicate.com/), and Modal (https://modal.com/). Open-weight fine-tuning brings different trade-offs — full control over weights, lower per-token cost, but you own the serving infrastructure or use the platform's serving endpoints.

See our companion comparison Together fine-tuning vs Fireworks vs Replicate for the open-weight side of this decision, and the LoRA vs QLoRA vs full fine-tuning cost breakdown for choosing the right training method on open weights.

The honest summary: if you want a frontier model fine-tuned on your data, your vendor choice is one of OpenAI, Anthropic, or Google. If you want maximum control and lowest per-token cost, fine-tune an open-weight model (Llama 4, Mistral, Qwen) on a platform like Together, Fireworks, or Replicate. Most production deployments end up using both — frontier fine-tune for the highest-quality user-facing path, open-weight fine-tune for cost-sensitive bulk operations.

Choosing between OpenAI, Anthropic, and Google for fine-tuning

1. Identify your training method need (SFT, DPO, RFT)
Start with the algorithm. If your task is gradable programmatically (code, math, reasoning where you can score outputs cheaply), reinforcement fine-tuning is the highest-quality path and OpenAI is the only frontier vendor offering it as a hosted API in June 2026. If you have chosen-vs-rejected preference data and want offline preference learning, DPO is available on OpenAI (GA) and Gemini 2.5 Flash (preview). If your data is standard input-output examples, all three vendors support SFT and the decision moves to cost, model selection, and deployment surface.

2. Match your model preference to vendor availability
If you specifically need Claude (legal, alignment, reasoning depth), Anthropic via Bedrock or Vertex is the only path; accept the higher training cost and the provisioned throughput serving floor. If you need a frontier OpenAI model (GPT-5, GPT-5 mini), OpenAI's direct API is the most mature option. If you want maximum free experimentation budget, Gemini 2.5 Flash on Vertex AI has the most generous free quota of any frontier vendor.

3. Estimate total cost including serving
Calculate (training tokens × per-token rate) + (deployment hours × per-hour serving cost) + (production inference tokens × per-token inference rate). For low-traffic production workloads, deployment serving floors on Bedrock and Vertex can dominate training cost; OpenAI's auto-deployed fine-tunes have no per-hour floor and may win on TCO. Use our fine-tuning cost by model calculator to model the full stack.

4. Format your data correctly per vendor
Each vendor uses a different jsonl schema. OpenAI uses the Chat Completions `messages` array, Anthropic uses the Anthropic chat schema with separate `system` field, Google uses Gemini `contents` parts. Maintain training data in a neutral internal format and run a per-vendor translation step at job submission. This lets you re-train on a different vendor without a cleanup pass.

5. Plan eval and monitoring before training, not after
All three vendors report training loss; only Vertex AI bundles deeper eval pipelines (BLEU, ROUGE, custom metrics) out of the box. If evaluation rigor matters and you do not have an internal eval harness, Vertex AI is the lowest-friction starting point. If you can write your own evals, OpenAI's deeper method surface (RFT especially) and lower serving cost typically win on overall ROI. See our DPO vs RLHF vs ORPO 2026 comparison for the eval considerations specific to preference learning.
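The total-cost formula from step 3 can be sketched as a small function. The training rates and the $15 hourly figure below are the approximate numbers quoted earlier in this article. The inference volume and the $1.00 per 1M inference rate are purely hypothetical placeholders, and `total_cost_usd` is an illustrative name.

```python
def total_cost_usd(training_tokens, train_rate_per_1m,
                   deploy_hours, hourly_serving_rate,
                   inference_tokens, inference_rate_per_1m):
    """Break total cost into training, per-hour serving floor, and inference."""
    training = training_tokens / 1_000_000 * train_rate_per_1m
    serving_floor = deploy_hours * hourly_serving_rate
    inference = inference_tokens / 1_000_000 * inference_rate_per_1m
    return {
        "training": training,
        "serving_floor": serving_floor,
        "inference": inference,
        "total": training + serving_floor + inference,
    }


# Same 22.5M-token training job, one month (730 h) of low-traffic serving.
# Inference volume and rate are hypothetical and identical in both cases.
no_floor = total_cost_usd(22_500_000, 25.0, 0, 0.0, 10_000_000, 1.0)
with_floor = total_cost_usd(22_500_000, 45.0, 730, 15.0, 10_000_000, 1.0)

print(no_floor["total"])    # 572.5
print(with_floor["total"])  # 11972.5
```

With these inputs the per-hour floor, not the training run, accounts for most of the difference, which is why the calculation should be run at your projected traffic.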

Use the data programmatically

Every page on this site is also exposed as a free, CORS-open JSON endpoint. No auth, no rate limit (fair-use, please cache). License is CC-BY-4.0 — link back to attribution.canonicalUrl in the response.

Endpoint: https://aipromptshub.co/api/vs/openai-fine-tuning-vs-anthropic-vs-google

curl

curl -s 'https://aipromptshub.co/api/vs/openai-fine-tuning-vs-anthropic-vs-google' | jq .

Python

import requests

r = requests.get("https://aipromptshub.co/api/vs/openai-fine-tuning-vs-anthropic-vs-google", timeout=10)
r.raise_for_status()
data = r.json()
print(data["title"])
for source in data.get("sources", []):
    print("source:", source)

JavaScript / Node

// Node 20+ (as an ES module, for top-level await) or a modern browser
const res = await fetch("https://aipromptshub.co/api/vs/openai-fine-tuning-vs-anthropic-vs-google");
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const data = await res.json();
console.log(data.title);
for (const source of data.sources ?? []) {
  console.log("source:", source);
}

Spec: /api/openapi.yaml · Docs: /api/docs

Frequently Asked Questions

What is the cheapest way to fine-tune a frontier model in 2026?

Google Vertex AI Gemini 2.5 Flash supervised fine-tuning is the cheapest entry point: it includes a substantial monthly free training token allowance and a published paid-tier rate of approximately $8 per 1M training tokens, the lowest of the frontier three. For OpenAI, GPT-5 nano SFT is the cheapest at approximately $12-15 per 1M training tokens. For Anthropic, Claude Haiku 4.5 via Bedrock is the cheapest at approximately $45 per 1M training tokens, which is still roughly 5-6x the paid-tier Gemini 2.5 Flash rate.

Can I fine-tune Claude Opus 4.7?

No — as of June 2026, Anthropic does not offer fine-tuning on Claude Opus 4.7. Available Anthropic fine-tune targets are Claude Haiku 4.5 (via both Bedrock and Vertex) and Claude Sonnet 4.6 (via Bedrock only). This is a deliberate restriction documented at https://docs.anthropic.com/en/docs/build-with-claude/fine-tuning and may change in future releases.

Does OpenAI use my fine-tuning data to train base models?

No. OpenAI's data privacy policy for fine-tuning explicitly states that data uploaded for fine-tuning is not used to train base models. Enterprise customers can additionally enable Zero Data Retention (ZDR) for full control over data persistence. See https://platform.openai.com/docs/guides/your-data for the full data handling documentation.

How long does a typical fine-tuning job take?

Wall-clock time depends on dataset size, model, and method. A typical 5,000-example SFT job on GPT-5 mini completes in 30-90 minutes; on Claude Haiku 4.5 via Bedrock, 1-3 hours; on Gemini 2.5 Flash via Vertex, 1-2 hours. Reinforcement fine-tuning jobs run longer (4-12 hours typical) because they include grading cycles. Anthropic provisioned throughput deployment adds another 5-15 minutes after training completes.

Can I export a fine-tuned model's weights from these vendors?

No — all three vendors keep fine-tuned model weights proprietary and serve them via their hosted APIs only. If you need exportable weights for on-premise serving or vendor independence, you must fine-tune an open-weight model (Llama 4, Mistral, Qwen) on a platform like Together AI, Fireworks AI, or Replicate, where you can download the trained adapter (LoRA) or full weights.

What is the minimum dataset size for a useful fine-tune?

OpenAI's published minimum is 10 examples; in practice 200-500 examples are needed to see meaningful behavior shift. Anthropic recommends 50 minimum but typically 500-2,000 for production. Google recommends 16 minimum on Gemini 2.5 Flash with 200-1,000 examples as a practical floor. Below these thresholds, the marginal cost of running the job is wasted because the model does not have enough signal to learn from.

Is fine-tuning a better path than RAG or prompt engineering?

Almost never as the first step. The honest hierarchy: (1) try prompt engineering with examples in-context; (2) add RAG if you need factual grounding; (3) fine-tune only when prompt engineering plateaus on a specific behavior pattern that examples teach better than instructions. Fine-tuning shifts behavior distribution but does not give the model new facts (RAG does that), and it is the most expensive and slowest to iterate. See our when to fine-tune vs RAG vs prompt engineer deep-dive for the full decision tree.

How does fine-tuned model inference pricing compare to base model pricing?

OpenAI charges approximately 2x base model input price and 1.5x output price for GPT-5 family fine-tunes. Anthropic Claude fine-tunes on Bedrock charge approximately 1.5x base price for inference, plus the provisioned throughput hourly cost. Google Vertex AI Gemini 2.5 Flash fine-tunes charge approximately 1.0-1.2x base price for inference plus the endpoint hourly cost. Always model TCO including both training cost and projected inference volume — for low-traffic production deployments, inference cost on the fine-tune can exceed the training cost within a few weeks.
