What is the Responsible Scaling Policy (RSP)?
Anthropic's RSP is a public, versioned commitment that the company will not train or deploy a model whose evaluated capabilities cross a defined capability threshold without first implementing the corresponding mitigations. The full text lives at https://www.anthropic.com/rsp; the revision history (v1 September 2023, v1.1 and v1.2 through 2024, v2.0 October 2024, plus 2025 updates) is summarized in the document itself.
The unit of risk is the **AI Safety Level** (ASL). The framework is explicitly modeled on the biosafety level (BSL) tiers used in biological research, a familiar mental model in which each higher level requires materially stronger safeguards. ASL-1 covers systems with no meaningful risk (smaller-than-frontier models, narrow systems, classifiers). ASL-2 covers systems that show 'early signs of dangerous capabilities' (current frontier chat models, including most Claude releases through Sonnet 4.6). ASL-3 covers systems whose capabilities meaningfully increase the risk of catastrophic misuse, or that show low-level autonomous capabilities; Claude Opus 4 and Claude Opus 4.7 are at ASL-3 on specific axes per Anthropic's Capability Reports. ASL-4 and ASL-5 are reserved for substantially more capable systems and require mitigations Anthropic states it has not yet developed.
Each ASL has two distinct commitment sets: **deployment standards** (how the model is rolled out: internal-only, limited release, public release with safeguards, and so on) and **security standards** (how the model weights are protected, that is, which insider threats and external attackers the security posture is designed to defeat). In the RSP's wording, defending against opportunistic attackers is the ASL-2 baseline; crossing into ASL-3 requires security hardened against most non-state attackers, and ASL-4 is expected to require defending against state-level adversaries.
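The pairing of deployment and security standards per level can be pictured as a small data model. The following is only an illustrative sketch, not Anthropic's tooling: the names `ASL`, `SafeguardStatus`, and `highest_supported_asl` are hypothetical, and the rule that a level counts as supported only when both of its standards (and those of every lower level) are met is a simplifying assumption made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Mapping


class ASL(IntEnum):
    """AI Safety Levels. IntEnum is used so levels can be compared and sorted."""
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3
    ASL_4 = 4
    ASL_5 = 5


@dataclass(frozen=True)
class SafeguardStatus:
    """Hypothetical record of the two commitment sets for one level."""
    deployment_standard_met: bool  # rollout safeguards for this level are in place
    security_standard_met: bool    # weight-protection measures for this level are in place

    def satisfied(self) -> bool:
        # Both commitment sets must hold for the level to count as covered.
        return self.deployment_standard_met and self.security_standard_met


def highest_supported_asl(status: Mapping[ASL, SafeguardStatus]) -> ASL:
    """Return the highest level whose standards, and all lower levels' standards, are met.

    Assumption for this sketch: ASL-1 is the baseline and is returned when
    nothing else is satisfied.
    """
    supported = ASL.ASL_1
    for level in sorted(ASL):
        entry = status.get(level)
        if entry is None or not entry.satisfied():
            break  # a gap at this level means no higher level can be supported
        supported = level
    return supported


if __name__ == "__main__":
    example = {
        ASL.ASL_1: SafeguardStatus(True, True),
        ASL.ASL_2: SafeguardStatus(True, True),
        ASL.ASL_3: SafeguardStatus(True, False),  # security standard still missing
    }
    print(highest_supported_asl(example).name)
```

In this example the ASL-3 deployment standard is met but the security standard is not, so the function reports ASL-2 as the highest supported level; the point is simply that the two commitment sets are tracked separately and both must hold.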
The RSP gates training as well as deployment. If evaluations run during pre-training forecast that a model will cross into a higher ASL, Anthropic commits to pausing further training until the deployment and security commitments for that higher ASL are in place. The 2024 update made this explicit; the 2025 update added the 'Capability Report' and 'Safeguards Report' artifacts that document, per model, what evals were run and what mitigations are in place.
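The training gate described above reduces to a simple comparison, sketched here under stated assumptions. The function name `training_gate`, the `Decision` enum, and the use of plain integers for ASL levels are all hypothetical; the real process involves evaluations and human judgment that a single comparison does not capture.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    PAUSE = "pause"


def training_gate(forecast_asl: int, safeguards_in_place_asl: int) -> Decision:
    """Decide whether training may continue.

    forecast_asl: the level the model is forecast to reach (hypothetical input).
    safeguards_in_place_asl: the highest level whose deployment and security
        commitments are both already in place (hypothetical input).
    """
    if forecast_asl < 1 or safeguards_in_place_asl < 1:
        raise ValueError("ASL levels start at 1")
    # Pause when the forecast outruns the safeguards that are already in place.
    if forecast_asl > safeguards_in_place_asl:
        return Decision.PAUSE
    return Decision.CONTINUE


if __name__ == "__main__":
    print(training_gate(forecast_asl=3, safeguards_in_place_asl=2).value)
    print(training_gate(forecast_asl=2, safeguards_in_place_asl=2).value)
```

The first call models a forecast crossing into ASL-3 while only ASL-2 safeguards exist, which yields a pause; the second models a forecast that stays within the level already covered, so training continues.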
Governance: a designated **Responsible Scaling Officer** owns RSP implementation. The **CEO** signs off on deployment decisions involving newly crossed ASL thresholds. The **board** and the **Long-Term Benefit Trust** (Anthropic's unusual governance structure with safety-prioritizing trustees) have oversight roles. The RSP commits Anthropic to publishing material updates and to publishing Capability/Safeguards Reports for new ASL-3+ models; both kinds of publication have shipped for Claude Opus 4 and 4.7.