Together's tier ladder: Build → Scale → Enterprise, plus Dedicated Endpoints
Together AI does not promote accounts between tiers based on points or cumulative spend the way OpenAI does. There is no '$1,000 paid + 30 days' threshold to remember. Tiers are operational categories: Build is the default state for any organization with billing enabled, Scale is granted on request by support after you demonstrate a need, and Enterprise is a signed commercial contract. The promotion path is: ship on Build → start seeing 429s on a specific model → email support with traffic patterns and a target ceiling → get bumped on that model, typically within 1-3 business days.
**Build** handles prototypes, eval runs, internal tools, and modest production traffic. Most B2B SaaS features with <100 concurrent users sit comfortably on Build for the lifetime of the product. The rate-limit response headers are the source of truth: if your `x-ratelimit-remaining` is consistently above 30% of your `x-ratelimit-limit` during peak hours, you do not need to upgrade.
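The 30% headroom rule above can be checked mechanically. The following is a minimal sketch, assuming you have already collected the response headers from a sample of peak-hour requests as plain dictionaries. The helper names `headroom_ratio` and `has_comfortable_headroom` are hypothetical, and the 0.30 threshold simply mirrors the rule of thumb in the text.

```python
def headroom_ratio(headers):
    """Return remaining/limit from one response's headers, or None if unusable."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        limit = float(lowered["x-ratelimit-limit"])
        remaining = float(lowered["x-ratelimit-remaining"])
    except (KeyError, ValueError):
        return None
    if limit <= 0:
        return None
    return remaining / limit


def has_comfortable_headroom(peak_samples, threshold=0.30):
    """True if every usable peak-hour sample is above the threshold.

    Returns None when no sample carries usable rate-limit headers,
    so "no data" is not mistaken for "plenty of headroom".
    """
    ratios = [r for r in map(headroom_ratio, peak_samples) if r is not None]
    if not ratios:
        return None
    return min(ratios) > threshold


# Example with made-up header values from three peak-hour responses.
samples = [
    {"x-ratelimit-limit": "100", "x-ratelimit-remaining": "45"},
    {"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "62"},
    {"x-ratelimit-limit": "100", "x-ratelimit-remaining": "38"},
]
print(has_comfortable_headroom(samples))
```

Using the minimum ratio rather than the average is a deliberate choice here: "consistently above 30%" is read as "never dips to 30% or below in the sample", which is the stricter interpretation.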
**Scale** handles production traffic with sustained throughput needs. The typical trigger is hitting 429s on the same model multiple times per day for a week, or running an eval batch that exceeds Build TPM on a large reasoning model (DeepSeek R1 in particular has the tightest Build ceiling of the popular models). Scale is granted per-model — getting bumped on Llama 3.3 70B does not automatically bump you on DeepSeek R1.
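Because Scale is granted per model, it helps to track 429s per model before emailing support. The sketch below assumes you log each 429 as a `(model, day)` pair. The function name `models_needing_scale`, the model labels, and the default of two 429s per day are illustrative assumptions that turn "multiple times per day for a week" into concrete numbers, not thresholds published by Together.

```python
from collections import Counter
from datetime import date, timedelta


def models_needing_scale(events, today, min_per_day=2, window_days=7):
    """Return models that hit 429s at least min_per_day times on every
    one of the last window_days days (ending on and including today).

    events: iterable of (model_label, date) pairs, one per 429 response.
    """
    counts = Counter(events)  # (model, day) -> number of 429s
    window = [today - timedelta(days=i) for i in range(window_days)]
    models = {model for model, _ in counts}
    return sorted(
        m for m in models
        if all(counts[(m, d)] >= min_per_day for d in window)
    )


# Example with made-up data: one model throttled all week, one only briefly.
today = date(2025, 1, 14)
events = []
for i in range(7):
    day = today - timedelta(days=i)
    events += [("deepseek-r1", day)] * 3      # three 429s every day
for i in range(3):
    day = today - timedelta(days=i)
    events += [("llama-3.3-70b", day)] * 5    # only the last three days

print(models_needing_scale(events, today))
```

The output of a check like this, together with the raw counts, is the kind of traffic pattern worth attaching to a support request for a specific model.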
**Enterprise** handles regulated workloads, contractual SLAs, and very large committed spend (typically $10k+/month). It includes negotiated per-model ceilings, a named account team, custom data-handling terms (no training on your data is the default across all tiers, but Enterprise gets it in writing), and the option to pre-commit to dedicated capacity reservations.
**Dedicated Endpoints** sit outside the tier ladder. They are not a tier promotion — they are a separate product where you reserve specific GPU hardware (H100, H200, or B200) and run a specific model on it for a flat per-hour fee. Rate limits on dedicated endpoints are 'whatever the hardware can do' — no shared-fleet contention, no `x-ratelimit-remaining`, no tier upgrade gate.