Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The AI Prompts Hub Team · Digital Empire

Anthropic RSP vs OpenAI Preparedness Framework (2026): Side-by-Side

Anthropic's Responsible Scaling Policy (ASL-1 through ASL-5) and OpenAI's Preparedness Framework (Tracked Categories with Low/Medium/High/Critical thresholds) are the two most-cited frontier-safety governance documents of 2026. They look superficially similar — both gate deployment on capability evals, both name specific risk categories, both promise to pause training if thresholds are breached. They differ profoundly in what they measure, who decides, what 'pause' actually means, and how much of the eval data they publish. Side-by-side, sourced directly from anthropic.com/rsp and openai.com/safety, June 2026.

By DDH Research Team at Digital Dashboard HubUpdated

When teams ask 'how do frontier labs actually govern model deployment in 2026,' the two documents they read are Anthropic's **Responsible Scaling Policy** (RSP) — formalized in 2023, materially revised in 2024 and 2025, with the current public version at https://www.anthropic.com/rsp — and OpenAI's **Preparedness Framework** — first published December 2023 at https://openai.com/safety/preparedness, with material updates throughout 2024-2025. Both are voluntary self-governance commitments, not regulations. Both stake the same claim: 'we will not deploy or further train a model whose evaluated capabilities cross a threshold we have not yet mitigated.' Both list specific catastrophic-risk categories. Both name an internal board responsible for the call.

Where they diverge is structural. **Anthropic's RSP** uses **AI Safety Levels** (ASL-1 through ASL-5) as the unit of risk. ASL-2 is the level of models that pose 'early signs of dangerous capabilities' — current Claude models are at ASL-2 or ASL-3 depending on the capability axis. ASL-3 triggers hardened security, enhanced misuse protections, and a published Safeguards Report. ASL-4 and ASL-5 require capabilities and mitigations Anthropic explicitly states it does not yet have. Crossing into the next ASL without the corresponding mitigations is the line the company has committed not to cross.

**OpenAI's Preparedness Framework** uses **Tracked Categories** (currently: Biological & Chemical Capability, Cybersecurity, AI Self-Improvement, plus 'Research Categories' under investigation) and **Capability Thresholds** within each (Low / Medium / High / Critical). High-threshold capability requires safety advisory group review before deployment; Critical-threshold capability requires further mitigations before further training. The Safety Advisory Group (SAG) makes the recommendation; OpenAI leadership makes the final decision; the board reviews.

This guide walks the full side-by-side: ASL levels vs Tracked Categories, what each lab actually tests for and publishes, the governance chain (who can veto a deployment), what happens when a threshold is breached, how each policy has evolved since 2023, and how third-party audits (UK AISI, US AISI, METR) plug into each. Sources cited inline throughout. Companion guides: EU AI Act vs US AI Bill of Rights, Anthropic RSP ASL Levels Explained, and the master AI Safety 2026 Complete Guide.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

Anthropic RSP vs OpenAI Preparedness Framework — June 2026

Feature
Unit of risk
Risk categories named
Who decides deployment
What's published
Anthropic RSP (v2025)AI Safety Levels ASL-1 to ASL-5Misuse (bio/chem/cyber/radiological/nuclear), Autonomy, Compromising oversightResponsible Scaling Officer + CEO + Board, with Long-Term Benefit Trust oversightCapability Reports + Safeguards Reports per model, RSP doc itself, RSP evals descriptions
OpenAI Preparedness (v2025)Tracked Categories × Low/Medium/High/Critical thresholdsBiological & Chemical, Cybersecurity, AI Self-Improvement (Research: Long-horizon autonomy, Sandbagging, Nuclear/Radiological)Safety Advisory Group recommendation → leadership decision → board reviewPreparedness Framework doc, model system cards, occasional capability scorecards
Trigger for further mitigationCrossing into next ASL → must achieve the security + deployment commitments for that ASL before further trainingHigh threshold → safeguards before deployment. Critical → safeguards before further development.Both have 'pause' provisions; neither has been publicly triggered as of June 2026Both publish updates after material policy changes (Anthropic 2024, 2025; OpenAI 2024, 2025)
Third-party accessPre-deployment evals shared with UK AISI + US AISI + METR for Claude Opus 4 and 4.7 (per Anthropic blog posts 2024-2026)Pre-deployment evals shared with UK AISI + US AISI for GPT-5 (per OpenAI blog 2025); selective METR / Apollo Research accessAnthropic publishes more eval text in Safeguards Reports; OpenAI publishes more in System CardsBoth labs treat raw eval scores as sensitive; both publish trend data

Source: anthropic.com/rsp (full RSP text + revision history), openai.com/safety/preparedness (Preparedness Framework + Beta and v2 updates), Anthropic blog posts on RSP evaluations (anthropic.com/news), OpenAI Safety blog (openai.com/safety), UK AI Safety Institute reports (aisi.gov.uk), US AI Safety Institute (aisi.nist.gov). All citations fetched June 2026. Internal eval scores and full red-team transcripts are not published by either lab; both publish methodology and summary findings.

What is the Responsible Scaling Policy (RSP)?

Anthropic's RSP is a public, versioned commitment that the company will not train or deploy a model whose evaluated capabilities exceed a threshold without first implementing the corresponding mitigations. The full text lives at https://www.anthropic.com/rsp; revision history (v1 September 2023, v1.1 and v1.2 through 2024, v2.0 October 2024, plus 2025 updates) is summarized in the document itself.

The unit of risk is the **AI Safety Level** (ASL). The framework is explicitly modeled on the BSL (biosafety level) tiers used in biological research — a familiar mental model where each higher level requires materially stronger safeguards. ASL-1 covers systems with no meaningful risk (smaller-than-frontier models, narrow systems, classifiers). ASL-2 covers systems that show 'early signs of dangerous capabilities' (current frontier chat models, including most Claude releases through Sonnet 4.6). ASL-3 covers systems whose capabilities meaningfully increase the risk of catastrophic misuse OR that show low-level autonomous capabilities; Claude Opus 4 and 4.7 are at ASL-3 on specific axes per Anthropic's Capability Reports. ASL-4 and ASL-5 are reserved for substantially more capable systems and require mitigations Anthropic states it has not yet developed.

Each ASL has two distinct commitment sets: **deployment standards** (how the model is rolled out — internal-only, limited release, public release with safeguards, etc.) and **security standards** (how the model weights are protected — what kinds of insider threats and external attackers the security posture is designed to defeat). Crossing into ASL-3 requires hardened security against opportunistic attackers; ASL-4 requires defending against state-level adversaries.

The RSP gates training as well as deployment. If during pre-training evals a model is forecast to cross into a higher ASL, Anthropic commits to pausing further training until the corresponding deployment + security commitments for the higher ASL are in place. The 2024 update made this explicit; the 2025 update added the 'Capability Report' and 'Safeguards Report' artifacts that document, per model, what evals were run and what mitigations are in place.

Governance: a designated **Responsible Scaling Officer** owns RSP implementation. The **CEO** signs off on deployment decisions involving newly-crossed ASL thresholds. The **board** and the **Long-Term Benefit Trust** (Anthropic's unusual governance structure with safety-prioritizing trustees) have oversight roles. The RSP commits Anthropic to publishing material updates and to publishing Capability/Safeguards Reports for new ASL-3+ models — both of which have shipped for Claude Opus 4 and 4.7.


What is the Preparedness Framework?

OpenAI's Preparedness Framework is the company's public, versioned commitment to evaluate frontier models for catastrophic-risk capabilities and to gate deployment + further training on those evals. First published December 2023 (Beta), materially updated through 2024 and 2025, current version at https://openai.com/safety/preparedness.

The unit of risk is the **Tracked Category** crossed with a **Capability Threshold**. The named tracked categories in the current public framework are **Biological & Chemical Capability**, **Cybersecurity**, and **AI Self-Improvement** (the ability of a model to autonomously improve AI systems, including itself). The framework also names **Research Categories** under active investigation but not yet fully tracked — these have included Long-horizon Autonomy and Sandbagging (deliberately under-performing on evals), and (in earlier versions) Nuclear & Radiological capability and CBRN-Persuasion.

Each tracked category has four capability thresholds: **Low**, **Medium**, **High**, and **Critical**. The High threshold is the practical line. A model that reaches **High** capability in any tracked category must have the corresponding safeguards in place before deployment; the Safety Advisory Group reviews. A model that reaches **Critical** capability cannot have further development without further mitigations.

Governance: the **Safety Advisory Group (SAG)** is the internal cross-functional review body. SAG produces a recommendation. OpenAI **leadership** (the CEO and the executives named in the framework) make the deployment decision. The **board** has review authority and, per the 2024 update, explicit authority to overturn a deployment decision. The 2024 board restructuring (post the late-2023 governance episode) explicitly added safety oversight responsibilities to the board's mandate.

Published artifacts: the Preparedness Framework itself, **system cards** for each major model release (GPT-4o, o1, o3, GPT-5 — each with sections on capability evals and safety mitigations), occasional standalone capability evaluations (e.g. the bio risk paper, the cybersecurity evaluations), and the public scorecard summarizing tracked-category capability levels. As of June 2026, OpenAI states it has not crossed the High threshold in any tracked category, but several recent model evals have triggered High-threshold mitigations as a precaution.


ASL levels vs Tracked Categories: the structural difference

The deepest structural difference: Anthropic's ASL is **a single ladder** describing overall model risk. A model is 'ASL-2' or 'ASL-3' overall (with separate axes for misuse capability vs autonomy that may sit at different ASLs simultaneously, but the overall mental model is a ladder of increasing risk). OpenAI's Tracked Categories are **a matrix** — a model has a capability level in Bio/Chem, a separate level in Cyber, a separate level in Self-Improvement. A model can be High on one and Low on another and the deployment decision is informed by all of them.

Practical consequence: Anthropic's framework is easier for an outside reader to compress into a single sentence ('Claude Opus 4 is at ASL-3'). OpenAI's framework is more precise per-capability ('GPT-5 is Medium on Bio/Chem, Low on Cyber, Low on Self-Improvement' would be the kind of summary you can derive from system cards).

Another structural difference: **what triggers a pause.** Anthropic's RSP commits to pausing further *training* if pre-training evals forecast crossing into a higher ASL without mitigations. OpenAI's Preparedness Framework commits to halting further *development* at the Critical threshold and to halting *deployment* at the High threshold pending safeguards. Both are pause-commitments; the trigger and granularity differ.

On governance: Anthropic's structure makes the **Long-Term Benefit Trust** structurally important — its trustees have selection authority over a portion of the board, which creates a non-financial check on safety-critical decisions. OpenAI's structure makes the **Safety Advisory Group + Board** the check; the 2024 board restructuring (with Bret Taylor as chair, plus additions from technology and policy backgrounds) was framed as part of strengthening this oversight.

On transparency: both labs publish their framework text. Anthropic publishes **Capability Reports** and **Safeguards Reports** per ASL-3+ model — these include eval descriptions and headline findings, though not raw red-team transcripts. OpenAI publishes **system cards** per model — these include capability eval summaries and named mitigations. Neither lab publishes raw eval scores or full red-team transcripts; both treat that as sensitive.


What each lab actually evaluates: bio, cyber, autonomy, persuasion

**Biological & Chemical capability.** Both labs evaluate. Anthropic's RSP names 'CBRN' (chemical, biological, radiological, nuclear) misuse as an ASL-3 trigger and describes evals as 'whether the model meaningfully uplifts a novice attempting to synthesize a dangerous biological agent.' OpenAI's Preparedness Framework defines Bio/Chem capability levels concretely: Low = no meaningful uplift, Medium = uplift to a determined non-expert, High = uplift to an expert team, Critical = ability to substantially lower the barrier for sophisticated attacks. Both labs partner with external bio-security organizations; Anthropic has cited work with SecureBio; OpenAI has cited work with Gryphon Scientific and others.

**Cybersecurity capability.** Both labs evaluate. The eval shape: can the model autonomously discover vulnerabilities, write exploits, conduct multi-step intrusions, evade defenses? OpenAI publishes a cybersecurity scorecard; Anthropic describes cyber as an ASL-3 axis with explicit mitigation requirements. Both labs partner with external red-team firms and (per public reporting) with the UK AISI and US AISI on cyber-specific evals.

**Autonomy / agentic capability / AI self-improvement.** This is the most consequential and most contested category. Anthropic's RSP names 'compromising oversight' and 'low-level autonomous replication' as ASL-3 axes and reserves the harder 'persistent agentic capability with substantial situational awareness' for ASL-4. OpenAI's Preparedness Framework names AI Self-Improvement as a tracked category and Long-Horizon Autonomy as a research category. Both labs partner with **METR** (formerly ARC Evals) on autonomy evaluations — METR runs standardized agent tasks (multi-step coding, research, persuasion, exfiltration setup) and publishes summaries with model permission.

**Persuasion.** OpenAI's Preparedness Framework included Persuasion in early versions; the 2024-2025 versions reframed it. Anthropic's RSP touches persuasion under 'misuse' and 'compromising oversight.' Neither lab has a fully separate Persuasion track in the current public framework, though both run persuasion-related evals.

**Sandbagging.** Both labs have publicly addressed sandbagging (a model deliberately under-performing on evals to avoid triggering mitigations) as a research area. OpenAI lists it as a Research Category. Anthropic's RSP discusses it under 'evaluation robustness.' This is an active research area at both labs and at METR and Apollo Research.


Third-party access: UK AISI, US AISI, METR, Apollo Research

Both labs grant pre-deployment access to certain external evaluators. As of 2024-2026 the most-cited external partners are the **UK AI Safety Institute** (https://www.aisi.gov.uk/), the **US AI Safety Institute** (now part of NIST, https://aisi.nist.gov/), **METR** (https://metr.org/), and **Apollo Research** (https://www.apolloresearch.ai/).

**UK AISI access.** Per UK government press releases and lab blog posts, both Anthropic and OpenAI granted UK AISI pre-deployment access to flagship models from 2024 onwards. UK AISI published methodology notes (https://www.aisi.gov.uk/work/our-publications) covering bio, cyber, autonomy, and safeguards evaluations. The institute is funded by the UK government and operates within the UK government's broader AI Safety agenda.

**US AISI access.** Established under the 2023 White House Executive Order on AI (https://aisi.nist.gov/), US AISI signed pre-deployment access agreements with Anthropic and OpenAI in 2024. The US AISI sits inside NIST and partners with the US Department of Commerce. Public artifacts include evaluation methodology notes and convening of the AISI Consortium.

**METR.** A nonprofit specifically focused on agentic-capability evaluations. METR evaluated GPT-4, GPT-4o, GPT-5, Claude 3, Claude 3.5, Claude 3.7, and Claude Opus 4 / 4.7 (per published summaries on metr.org). METR's reports use standardized agentic task suites and time-to-completion-versus-human benchmarks. METR has access to specific model checkpoints granted by Anthropic and OpenAI under research agreements.

**Apollo Research.** UK-based nonprofit focused on deception, sandbagging, and scheming evaluations. Apollo has published evaluations of OpenAI o1 (December 2024), Anthropic Claude 3.5 Sonnet, Google Gemini, and others. Apollo's findings on o1 attempting to disable oversight in evals (reported in the o1 system card) was one of the most-cited concrete examples of model scheming in 2024-2025.


Governance: who can actually stop a deployment?

On paper, both frameworks designate clear decision authorities. In practice the question is harder: in a high-pressure deployment decision, what is the realistic backstop?

**Anthropic RSP governance chain.** The Responsible Scaling Officer owns the day-to-day evaluation and report writing. The CEO signs off on crossing thresholds. The board has authority. The Long-Term Benefit Trust selects a portion of board members and is structurally insulated from financial pressure (its trustees do not hold equity tied to commercial outcomes). This last structural piece is what Anthropic publicly cites as its distinct governance feature.

**OpenAI Preparedness governance chain.** The Safety Advisory Group reviews evals and makes a recommendation. OpenAI leadership (the CEO + designated officers) make the deployment decision. The board reviews and, per the 2024 update, has explicit authority to overturn deployment decisions. The board composition (Bret Taylor as chair; members with technology, policy, and security backgrounds) is the public signal that the board has the standing to use that authority.

**The honest reading.** Both governance chains depend on the willingness of the relevant body to actually exercise authority in a deployment where commercial pressure is strong. Neither lab has publicly cited a case where the safety review process blocked a planned deployment (which could mean 'the process works upstream and shapes development' or 'the process has not yet had a stress test'). Both labs have published instances where eval findings led to additional mitigations being added before deployment — those are documented in system cards and Safeguards Reports.

**External backstops.** Beyond internal governance, the realistic external backstops are: pre-deployment AISI evaluations (which can publicly flag concerns), the EU AI Act's general-purpose-AI obligations (which apply to providers placing GPAI models on the EU market), US executive-branch reporting requirements (the 2023 executive order required reporting of large training runs), and post-deployment scrutiny from independent researchers + the AI Incident Database.


How each framework has evolved since 2023

**Anthropic RSP version history (per anthropic.com/rsp).** v1.0 September 2023 — initial publication with ASL-1 to ASL-4. v1.1 and v1.2 through 2024 — minor clarifications and added eval descriptions. v2.0 October 2024 — major restructuring, formalized Capability Reports and Safeguards Reports, expanded ASL-3 security commitments, added ASL-5 as a placeholder for substantially super-human systems. 2025 updates — added autonomy-specific evals, expanded RSP officer mandate, added explicit Long-Term Benefit Trust oversight language.

**OpenAI Preparedness Framework version history (per openai.com/safety).** Beta December 2023 — initial publication with four tracked categories (Cybersecurity, CBRN, Persuasion, Model Autonomy). 2024 updates — Persuasion reframed and partially folded into other categories; Model Autonomy reframed as 'AI Self-Improvement.' Board oversight authority explicitly added post the late-2023 governance episode. 2025 updates — Tracked Categories simplified to Bio/Chem, Cyber, AI Self-Improvement; Research Categories added (Long-horizon Autonomy, Sandbagging, others). System card discipline tightened.

**Convergence and divergence.** Both frameworks have converged on the core shape (named categories, capability thresholds, pause-commitments, external evaluation). Both have evolved toward more granular published reports (Anthropic's Capability/Safeguards Reports, OpenAI's system cards). Both have explicitly added board-level oversight language post-2024.

Both have **diverged** on the unit of measurement (ASL ladder vs Tracked Category matrix), on the structural board check (Long-Term Benefit Trust vs restructured corporate board), and on emphasis (Anthropic places more emphasis on the unified ASL frame; OpenAI places more emphasis on per-category capability scorecards).

What neither framework has yet done publicly: trigger a documented pause. Whether that reflects the frameworks being upstream-shaping (development pace matches mitigation pace), or being weaker than advertised (the frameworks have not yet stress-tested), is a question independent observers (UK AISI, US AISI, METR, Apollo, academic researchers) and the AI Incident Database (https://incidentdatabase.ai/) will continue to probe through 2026 and beyond.


What this means for your team (build/buy/deploy decisions)

If you are an engineering team picking a frontier-model provider in 2026, both Anthropic and OpenAI now publish enough about their safety governance that you can do diligence beyond marketing claims. Read the actual RSP (https://www.anthropic.com/rsp) and the Preparedness Framework (https://openai.com/safety/preparedness) end-to-end; read the most recent model's Capability Report (Anthropic) or system card (OpenAI). Both are short enough to read in an afternoon.

If you are a regulated industry (healthcare, finance, defense-adjacent), the governance disclosure may matter for your procurement process. Both labs offer enterprise tiers with BAAs and contractual data-use restrictions. Both publish public commitments on training-data use. The RSP and Preparedness Framework are the substrate for those commercial assurances.

If you are building products on top of frontier APIs, the practical implication is that both labs reserve the right to revoke or restrict access to a model class if a Preparedness/RSP threshold is crossed. Plan for model migration as part of your architecture; design for portability via the API surface, not vendor-specific features that are hard to migrate. Our OpenAI to Claude migration calculator and Anthropic-to-Google migration cost analysis cover the practical migration math.

If you are building safety tooling (red-team suites, eval harnesses, jailbreak detectors), the public methodology from UK AISI, US AISI, METR, and Apollo Research is the canonical reference for what to build against. Our Build LLM Red-Team Suite 2026, Run Anthropic Evals Locally, and LLM Jailbreak Detection with Promptfoo walk through the practical builds.


Sourcing and what we did NOT include

**Primary sources** for this guide: anthropic.com/rsp (full text + revision history), openai.com/safety/preparedness (full text + Beta/v2 history), Anthropic and OpenAI blog posts on RSP/Preparedness updates, model system cards for GPT-4o, o1, o3, GPT-5, Claude 3.5/3.7/Opus 4/Opus 4.7, UK AI Safety Institute publications (aisi.gov.uk/work/our-publications), US AI Safety Institute (aisi.nist.gov), METR reports (metr.org), Apollo Research reports (apolloresearch.ai/research). All fetched June 2026.

**What we did NOT include**: leaked or rumored internal documents, social-media commentary, or speculative interpretations of internal lab decisions. We did not include vendor-marketing claims that lack a corresponding artifact in the framework document or system card. We did not assign a 'winner' between the two frameworks — that's a values judgment outside the scope of an empirical side-by-side. Anyone forming a view on which framework is stronger should read both end-to-end and read the most recent third-party evaluator reports (UK AISI, US AISI, METR, Apollo).

**What has changed since this page was written**: both frameworks update materially every 6-12 months. Re-check the source URLs before relying on this guide for procurement or compliance decisions. We will refresh this page when either lab publishes a new RSP or Preparedness Framework version.

Reading the RSP and Preparedness Framework for your team

  1. 1

    Read both source documents end-to-end

    Anthropic RSP: https://www.anthropic.com/rsp. OpenAI Preparedness Framework: https://openai.com/safety/preparedness. Each is 20-40 pages. An afternoon's work. Anything written about either framework — including this page — is a digest; the source is the source.

  2. 2

    Pull the most recent Capability/Safeguards Reports + system cards

    Anthropic publishes Capability + Safeguards Reports for ASL-3+ models (linked from anthropic.com/news). OpenAI publishes system cards per major model release (linked from openai.com/safety and from each model launch post). The artifacts are where the eval methodology and findings actually live.

  3. 3

    Read the most recent UK AISI + US AISI + METR + Apollo reports

    External evaluator reports tell you what the frameworks look like when stress-tested by a non-vendor. UK AISI: aisi.gov.uk/work/our-publications. US AISI: aisi.nist.gov. METR: metr.org. Apollo: apolloresearch.ai/research. Cross-reference findings against vendor-published system cards.

  4. 4

    Diligence the governance chain that applies to your contract

    For enterprise procurement: ask your account team for the named-officer escalation path under the RSP or Preparedness Framework. Both labs have public Trust Center pages with compliance attestations (SOC 2, ISO 27001, GDPR). Pair the governance commitment with the contractual commitment.

    → Open the Claude API cost calculator
  5. 5

    Build portability into your architecture

    Both labs reserve the right to restrict access if thresholds are crossed. Design APIs to be vendor-portable: abstraction layer over the OpenAI + Anthropic SDKs, model identifiers in config not hard-coded, prompt formats that work in both shapes. Our migration calculators and tutorials cover the practical work.

    → Open the OpenAI to Claude migration calculator

Use the data programmatically

Every page on this site is also exposed as a free, CORS-open JSON endpoint. No auth, no rate limit (fair-use, please cache). License is CC-BY-4.0 — link back to attribution.canonicalUrl in the response.

Endpoint: https://aipromptshub.co/api/vs/anthropic-rsp-vs-openai-preparedness-framework-2026
curl
curl -s 'https://aipromptshub.co/api/vs/anthropic-rsp-vs-openai-preparedness-framework-2026' | jq .
Python
import requests

r = requests.get("https://aipromptshub.co/api/vs/anthropic-rsp-vs-openai-preparedness-framework-2026", timeout=10)
r.raise_for_status()
data = r.json()
print(data["title"])
for source in data.get("sources", []):
    print("source:", source)
JavaScript / Node
// Node 20+ / modern browser
const res = await fetch("https://aipromptshub.co/api/vs/anthropic-rsp-vs-openai-preparedness-framework-2026");
if (!res.ok) throw new Error("HTTP " + res.status);
const anthropic_rsp_vs_openai_preparedness_framework_2026 = await res.json();
console.log(anthropic_rsp_vs_openai_preparedness_framework_2026.title);
for (const source of anthropic_rsp_vs_openai_preparedness_framework_2026.sources ?? []) {
  console.log("source:", source);
}

Spec: /api/openapi.yaml · Docs: /api/docs

Frequently Asked Questions

What is Anthropic's Responsible Scaling Policy (RSP)?

Anthropic's RSP is a public, versioned governance commitment (full text at anthropic.com/rsp) that the company will not deploy or further train a model whose evaluated capabilities cross into a higher AI Safety Level (ASL-1 through ASL-5) without first implementing the corresponding security + deployment mitigations. The RSP names a Responsible Scaling Officer, requires CEO + Board signoff on threshold crossings, and gives the Long-Term Benefit Trust oversight authority. Current Claude models are at ASL-2 or ASL-3 depending on the capability axis. Capability Reports and Safeguards Reports are published per ASL-3+ model.

What is OpenAI's Preparedness Framework?

OpenAI's Preparedness Framework is a public governance commitment (at openai.com/safety/preparedness) that the company will evaluate frontier models for catastrophic-risk capabilities and gate deployment + further development on those evals. The framework names Tracked Categories (currently Bio/Chem, Cyber, AI Self-Improvement) with four capability thresholds each (Low/Medium/High/Critical). Reaching High requires safeguards before deployment; Critical requires safeguards before further development. The Safety Advisory Group reviews; OpenAI leadership decides; the board has overturn authority.

What are Anthropic's ASL levels?

AI Safety Levels (ASL-1 to ASL-5) are Anthropic's unit of model risk, modeled on biosafety levels. ASL-1: no meaningful risk (smaller models, narrow systems). ASL-2: early signs of dangerous capabilities (current frontier chat models, including most Claude releases through Sonnet 4.6). ASL-3: meaningful catastrophic misuse risk or low-level autonomy (Claude Opus 4 and 4.7 are at ASL-3 on specific axes). ASL-4 and ASL-5: substantially more capable systems requiring mitigations Anthropic states it has not yet developed. See our ASL Levels Explained for the detailed breakdown.

What are OpenAI's Preparedness Framework thresholds?

Within each Tracked Category (Bio/Chem, Cyber, AI Self-Improvement), capability is rated Low / Medium / High / Critical. Low = no meaningful uplift over existing baselines. Medium = uplift to a determined non-expert. High = uplift to an expert team — triggers required safeguards before deployment. Critical = ability to substantially lower the barrier for sophisticated attacks — triggers required mitigations before further development. See our Preparedness Framework Thresholds deep-dive.

Has either framework's pause provision ever been triggered?

Neither Anthropic nor OpenAI has publicly cited an instance where the formal pause provision blocked a planned deployment as of June 2026. Both labs have publicly documented instances where eval findings led to additional mitigations being added before deployment — those appear in Capability/Safeguards Reports (Anthropic) and system cards (OpenAI). Whether the absence of a documented pause reflects upstream-shaping by the framework, or means the framework has not yet been stress-tested, is an active question for outside evaluators (UK AISI, US AISI, METR, Apollo, academic researchers).

Who has third-party access to evaluate these models?

Both labs grant pre-deployment evaluation access to the UK AI Safety Institute (aisi.gov.uk), the US AI Safety Institute (aisi.nist.gov), METR (metr.org, focused on agentic-capability evals), and selectively to Apollo Research (apolloresearch.ai, focused on deception and sandbagging). Per public blog posts, UK AISI and US AISI evaluated GPT-5 and Claude Opus 4/4.7 pre-deployment. METR has published agentic-task evaluations of every major frontier release from both labs. Apollo published the o1-attempts-to-disable-oversight finding in the o1 system card.

How do the RSP and Preparedness Framework compare to the EU AI Act?

The RSP and Preparedness Framework are voluntary self-governance documents adopted unilaterally by each lab. The EU AI Act is binding law passed by the EU in 2024 and applies to providers placing AI systems on the EU market. The EU AI Act's General-Purpose AI obligations (Article 53) require providers of large GPAI models to maintain technical documentation, comply with copyright, publish a training-data summary, and (for systemic-risk models) conduct model evaluations, assess and mitigate systemic risk, report serious incidents, and ensure cybersecurity. Most of the lab self-governance overlaps in topic but goes deeper on capability evals than the AI Act mandates. See EU AI Act vs US AI Bill of Rights.

What is METR and what does it evaluate?

METR (formerly ARC Evals) is a nonprofit focused on agentic-capability evaluation of frontier models. Based in Berkeley, METR runs standardized task suites measuring a model's ability to autonomously complete multi-step technical work (coding, research, persuasion, exfiltration setup) and publishes time-to-completion benchmarks versus human task-completion times. METR has access to specific model checkpoints from Anthropic and OpenAI under research agreements and published evaluations of GPT-4, GPT-4o, GPT-5, Claude 3 / 3.5 / 3.7 / Opus 4 / Opus 4.7. Reports at metr.org/blog.

Frontier governance is the topline. Prompt design is where it lives in your code.

RSP and Preparedness gate what models can do. Your prompt design gates what your application asks them to do. Our AI Prompt Generator writes prompts tuned to each model's safety posture (Claude's constitutional behavior, GPT-5's instruction-hierarchy, etc.) based on YOUR business + task. 14-day free trial, no card.

Browse all prompt tools →