Claude is available through three main surfaces in 2026: Anthropic's direct API at claude.com, AWS Bedrock, and Google Cloud Vertex AI. The list rates per million tokens are essentially identical across all three: Sonnet 4.6 is $3 input / $15 output on each platform, Opus 4.8 is $5 / $25, Haiku 4.5 is $1 / $5, and Fable 5 is $10 / $50. Where they diverge is everything around the meter: which credits you can spend, how fast new models arrive, which regions serve traffic, how authentication works, and which discount levers actually function.
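To make the rate card concrete, here is a minimal Python sketch that turns token volumes into a monthly list-price bill. The rates are the ones quoted above; the dictionary keys and the `monthly_list_cost` helper are illustrative shorthand, not real API model identifiers.

```python
# List rates in USD per million tokens (input, output), as quoted in the text.
# Keys are illustrative shorthand, NOT real API model IDs.
RATES_PER_MTOK = {
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
    "haiku-4.5": (1.00, 5.00),
    "fable-5": (10.00, 50.00),
}


def monthly_list_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Return the list-price cost in USD for token volumes given in millions."""
    input_rate, output_rate = RATES_PER_MTOK[model]
    return input_mtok * input_rate + output_mtok * output_rate


# 100M input tokens and 20M output tokens on Sonnet 4.6.
print(monthly_list_cost("sonnet-4.6", 100, 20))  # 600.0
```

Because the list rates match across platforms, the same function applies to all three; only the invoice it lands on changes.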
Billing is the most consequential difference for most finance teams. Bedrock usage flows through your AWS invoice, which makes it eligible for AWS Activate startup credits (up to $100k), Enterprise Discount Program (EDP) commitments, and AWS Marketplace private offers. Vertex AI usage flows through your GCP invoice, which makes it eligible for the Google for Startups Cloud Program ($200k-$350k tiers), Committed Use Discounts (CUDs), and BigQuery-adjacent credits. The direct API is billed by Anthropic itself; it is eligible for the Anthropic Startup Program (up to $100k in Claude credits via Y Combinator, Techstars, and similar partner programs), but those credits are not portable to AWS or GCP invoices. A startup sitting on $80k of unused AWS credits that expire in 6 months has a clear answer: route Claude through Bedrock and burn the credits before they vaporize.
Worked example. Take a Series A startup spending $25,000/month on Claude Sonnet 4.6 for a production agent workload: about 2.65B input tokens and 1.14B output tokens monthly at standard rates (2,650M × $3 + 1,140M × $15 ≈ $25,000). On the direct API, that is $25,000 of cash out the door every month. On Bedrock with $80,000 of AWS Activate credits, the same $25,000 invoice draws down credits at 100% face value, so net cash cost is $0 until the credits run out 3.2 months in ($80,000 / $25,000). If the remaining 8.8 months bill at list, that is an effective saving of about 27% over a 12-month horizon ($80,000 off a $300,000 list bill). On Vertex with a similar GCP credit balance, the math is identical. The lesson: route Claude to wherever your dormant cloud credits live. Check the Credits page in the AWS Billing console or the GCP billing console to see what is actually expiring; `aws ce get-cost-and-usage` reports how quickly you are spending.
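The credit arithmetic in the worked example can be checked with a short Python sketch. The `credit_runway` helper is hypothetical, and it assumes a flat monthly bill and credits that apply at 100% face value and do not expire inside the horizon. With $80,000 of credits against a $25,000 monthly bill, the saving over 12 months works out to roughly 27%.

```python
def credit_runway(monthly_bill: float, credits: float, horizon_months: int = 12):
    """Return (months covered by credits, cash paid, fractional saving).

    Assumes a constant monthly bill and credits applied at face value.
    """
    total_list = monthly_bill * horizon_months
    credits_used = min(credits, total_list)  # cannot use more than the bill
    months_covered = credits_used / monthly_bill
    cash_out = total_list - credits_used
    saving = credits_used / total_list
    return months_covered, cash_out, saving


months, cash, saving = credit_runway(monthly_bill=25_000, credits=80_000)
print(f"{months:.1f} months covered, ${cash:,.0f} cash, {saving:.1%} saved")
# 3.2 months covered, $220,000 cash, 26.7% saved
```

Shortening the horizon raises the percentage saving, so the headline number depends heavily on how long you expect the workload to run at this volume.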
Release lag varies by platform. New Claude models almost always land on the direct API first. Bedrock typically follows 2-6 weeks later, sometimes longer for the largest tiers: Opus 4.8 hit the direct API in February 2026 and only landed in Bedrock us-east-1 in late March. Vertex AI tracks Bedrock's cadence, landing within a week or two on either side. If your product roadmap depends on day-zero access to a new Claude release, the direct API is the only safe bet; Bedrock and Vertex suit production workloads that can absorb a delay of a month or more on the latest model. Regional availability also differs. Bedrock now serves Claude from us-east-1, us-west-2, eu-central-1, eu-west-3, ap-northeast-1, and ap-southeast-2; Vertex covers us-central1, us-east5, europe-west4, and asia-northeast1; the direct API serves globally from Anthropic's own edge with no region selection.
Prompt caching and Batch API support are not at parity. The direct Anthropic API has the most mature caching implementation: both 5-minute and 1-hour TTLs, full support across all four tiers, and the cleanest pricing semantics (cache writes at 1.25x the base input rate, cache reads at 0.1x). Bedrock supports prompt caching as of Q1 2026 but with restrictions: 5-minute TTL only in most regions, no 1-hour TTL on Haiku 4.5 until Q3 2026, and a minimum cacheable prefix size of 1,024 tokens versus 512 on the direct API. Vertex AI supports caching with similar caveats. The Batch API exists on all three, but only the direct API offers the full 50% discount on every tier. Bedrock applies the discount through its own Bedrock Batch Inference jobs (similar mechanics, occasionally a smaller discount on Fable 5), and Vertex uses its Batch Prediction surface. If your workload depends heavily on caching a 600-token system prompt, which falls below Bedrock's 1,024-token minimum, or on stacking caching and batch for compounded discounts, the direct API still wins on raw economics by 8-15%.
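A rough Python sketch of how those multipliers compound on a shared prompt prefix follows. It uses the 1.25x write, 0.1x read, and 50% batch figures quoted above; the `prefix_cost` helper is hypothetical, and it assumes that every call after the first is a cache hit inside the TTL and that the batch discount multiplies on top of cache pricing. Both assumptions should be checked against the billing rules of the platform you actually use.

```python
CACHE_WRITE_MULT = 1.25  # cache write multiplier quoted in the text
CACHE_READ_MULT = 0.10   # cache read multiplier quoted in the text
BATCH_DISCOUNT = 0.50    # full batch discount quoted in the text


def prefix_cost(prefix_tokens: int, calls: int, rate_per_mtok: float,
                cached: bool = False, batch: bool = False) -> float:
    """Input-side cost in USD of sending the same prefix on `calls` requests.

    Assumption: one cache write, then every later call is a cache read.
    Assumption: the batch discount stacks multiplicatively with caching.
    """
    base = prefix_tokens / 1_000_000 * rate_per_mtok  # one uncached send
    if cached:
        total = base * CACHE_WRITE_MULT + base * CACHE_READ_MULT * (calls - 1)
    else:
        total = base * calls
    if batch:
        total *= 1 - BATCH_DISCOUNT
    return total


# A 2,000-token prefix sent 1,000 times at the $3/MTok Sonnet 4.6 input rate.
for cached, batch in [(False, False), (True, False), (True, True)]:
    cost = prefix_cost(2_000, 1_000, 3.0, cached=cached, batch=batch)
    print(f"cached={cached} batch={batch}: ${cost:.4f}")
```

Under these assumptions caching cuts the prefix cost by roughly 90% and batch halves what remains, which is why losing either lever on a given platform shows up quickly in the bill.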
Access control is the last axis. Bedrock plugs into AWS IAM: you can scope an IAM role or user to a specific model ARN, attach SCPs at the AWS Organization level, and audit every invocation through CloudTrail. Vertex plugs into GCP IAM in the same way, with auditing through Cloud Audit Logs. The direct Anthropic API uses workspace-scoped API keys with per-key spend limits and usage dashboards, but lacks the policy-engine depth that enterprise security teams expect: no SCP equivalent, no ABAC, and no native SSO-bound key rotation on the standard tier. For regulated workloads (HIPAA on AWS, FedRAMP-adjacent on GCP, SOC 2 audit trails) the cloud-provider surfaces typically win on compliance posture even when they lose on raw price. The pragmatic pattern that has emerged at most scaled teams: production traffic runs through Bedrock or Vertex for billing and compliance reasons, while development, evaluation, and prompt iteration run through the direct API for speed and feature freshness.
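As an illustration of scoping Bedrock access to a single model ARN, the Python sketch below builds an IAM policy document that allows only model invocation on one foundation model. The `invoke_only_policy` helper and the model ID `anthropic.example-model-v1:0` are placeholders invented for this example; substitute the real model ID from the Bedrock console, and note that workloads using cross-region inference profiles may need additional resource ARNs.

```python
import json


def invoke_only_policy(region: str, model_id: str) -> dict:
    """Build an IAM policy allowing invocation of one Bedrock foundation model."""
    # Foundation-model ARNs carry no account ID, hence the empty field.
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeSingleClaudeModel",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": model_arn,
            }
        ],
    }


# Placeholder model ID for illustration only; not a real Bedrock model.
policy = invoke_only_policy("us-east-1", "anthropic.example-model-v1:0")
print(json.dumps(policy, indent=2))
```

Attaching a policy like this to the production role, with no broader `bedrock:*` grant, means any attempt to call a different model fails at the IAM layer and is recorded in CloudTrail.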