By The AI Prompts Hub Team · Digital Empire

OpenAI Superalignment vs Anthropic RSP vs Google DeepMind Frontier Safety Framework (2026)

OpenAI's Superalignment program (launched July 2023, team disbanded May 2024, safety work continued via Preparedness + Model Spec), Anthropic's Responsible Scaling Policy (the ASL ladder), and Google DeepMind's Frontier Safety Framework (v1 2024, v2 2025) are the three labs' public-facing frontier-safety frameworks. They are NOT equivalent objects: Superalignment was a research bet, the RSP is a deployment-governance commitment, and the Frontier Safety Framework is a risk-management protocol. This side-by-side is sourced from openai.com, anthropic.com, and deepmind.com as of June 2026.

By DDH Research Team at Digital Dashboard Hub·Updated June 21, 2026

When you ask 'what's each major lab's safety story in 2026,' the three documents people cite are: **OpenAI's Superalignment program** (announced July 2023 at https://openai.com/index/introducing-superalignment/), **Anthropic's Responsible Scaling Policy** (https://www.anthropic.com/rsp), and **Google DeepMind's Frontier Safety Framework** (https://deepmind.com/safety, v1 May 2024, v2 February 2025).

These are not directly comparable objects. **Superalignment** was OpenAI's research bet on solving the alignment problem for superhuman AI: a co-founder-led team, a 4-year horizon, and a public commitment of 20% of the compute OpenAI had secured at the time. The Superalignment team was effectively disbanded in May 2024 when co-leads Ilya Sutskever and Jan Leike departed; OpenAI's safety work continues through the Preparedness Framework, the Model Spec, the Safety Advisory Group, and integrated safety teams. **Anthropic's RSP** is a deployment-governance commitment: an ASL ladder that gates training and deployment on capability evaluations. **DeepMind's Frontier Safety Framework** is a risk-management protocol: Critical Capability Levels (CCLs) per misuse domain, with mitigations triggered at each level.

**What they share.** All three name capability domains that warrant pre-emptive evaluation (bio, cyber, autonomy, etc.). All three commit to internal evaluation processes. All three engage with external evaluators (UK AISI, US AISI, METR). All three publish per-model artifacts (system cards, capability/safeguards reports, evaluation summaries).

**Where they diverge structurally.** OpenAI's frame is now Preparedness Framework (the operational governance) + Model Spec (the behavioral specification) + Safety Advisory Group (the decision body) — Superalignment as a standalone research initiative is no longer the headline. Anthropic's RSP is a single integrated commitment with the ASL ladder as the unit of risk. DeepMind's Frontier Safety Framework uses domain-specific Critical Capability Levels with mitigations triggered at each.

This guide walks the full side-by-side and addresses the natural question: 'are the three frameworks meaningfully different in what they would do in the same risk scenario?' Sources cited throughout. Companion guides: Anthropic RSP vs OpenAI Preparedness, UK AISI vs US AISI vs EU AI Office, Red-Teaming Tools.

Superalignment vs RSP vs Frontier Safety Framework — June 2026

| Framework | What it is | Status | Unit of risk | Most-cited artifact |
|---|---|---|---|---|
| OpenAI Superalignment | Research program targeting alignment of superhuman AI (July 2023 launch) | Standalone team disbanded May 2024; safety research continues via integrated teams | Originally framed around 'superintelligence' alignment; operational risk frame now via Preparedness | openai.com/index/introducing-superalignment/ launch post; subsequent work in OpenAI safety + alignment papers |
| OpenAI Preparedness Framework + Model Spec | Operational governance (Preparedness) + behavioral specification (Model Spec) since 2024 | Active, materially updated through 2025-2026; SAG reviews; board oversight | Tracked Categories × Low/Medium/High/Critical thresholds | openai.com/safety/preparedness, openai.com/model-spec, per-model system cards |
| Anthropic Responsible Scaling Policy | Public deployment-governance commitment since Sep 2023, v2 Oct 2024, updated 2025 | Active; Responsible Scaling Officer + CEO + Board + Long-Term Benefit Trust oversight | AI Safety Levels (ASL-1 to ASL-5), modeled on biosafety levels | anthropic.com/rsp; per-model Capability Reports + Safeguards Reports |
| Google DeepMind Frontier Safety Framework | Risk-management protocol; v1 May 2024, v2 Feb 2025 | Active; DeepMind safety team + Responsibility & Safety Council oversight | Critical Capability Levels (CCLs) per misuse domain (bio, cyber, autonomy, etc.) | deepmind.com/safety FSF doc; Gemini system cards; FSF-related research papers |

Sources, fetched June 2026: https://openai.com/index/introducing-superalignment/ (launch), https://openai.com/safety/preparedness, https://openai.com/model-spec, https://www.anthropic.com/rsp, https://deepmind.com/safety. The Superalignment team's disbandment in May 2024 was widely reported and confirmed by departing leadership statements. Subsequent OpenAI safety work is documented in the Preparedness Framework updates, Model Spec releases, and per-model system cards.

OpenAI Superalignment: what was promised, what happened

OpenAI's Superalignment program was announced 5 July 2023 (https://openai.com/index/introducing-superalignment/). The headline commitments: dedicate 20% of secured compute over 4 years to solving the alignment problem for superhuman AI, co-led by Ilya Sutskever (OpenAI's co-founder and Chief Scientist) and Jan Leike (then Head of Alignment). Stated goal: produce 'scientific and technical breakthroughs to steer and control AI systems much smarter than us.'

**What happened.** Through late 2023 and early 2024, the Superalignment team published research on weak-to-strong generalization, scalable oversight, and related topics. Sutskever, who had effectively been on leave since November 2023, announced his departure in mid-May 2024; Leike resigned within days, and the Superalignment team was reorganized and effectively dissolved. Public reporting and Leike's own statement on departure cited disagreements over how OpenAI prioritized safety.

**Where the work continued.** Substantial safety work continued at OpenAI through three primary surfaces: (1) the **Preparedness Framework** as operational governance, (2) the **Model Spec** as a public behavioral specification, (3) integrated safety teams embedded across model development. Researchers from the former Superalignment team have published from a mix of OpenAI and other institutions through 2025-2026. Sutskever founded Safe Superintelligence (SSI) in mid-2024.

**Why it matters in 2026.** Superalignment as a standalone research initiative is no longer OpenAI's headline safety story. The framing has shifted from 'we will solve alignment for superintelligence in 4 years' to 'we will operate Preparedness Framework governance + Model Spec behavioral commitments + per-model evaluations.' Reading OpenAI's safety posture in 2026 requires reading the Preparedness Framework and the Model Spec, not the original Superalignment announcement.

**What survives from Superalignment thinking.** The technical research on scalable oversight and weak-to-strong generalization has informed evaluation methodology and is cited in subsequent papers. The framing of 'superalignment' as a problem class — alignment that scales to more capable systems than current evaluators can reliably evaluate — remains an active research question across labs, including in Anthropic's RSP discussions of ASL-4 and ASL-5 and DeepMind's discussion of CCLs that exceed current evaluation methodology.

OpenAI Preparedness Framework + Model Spec: what replaced the headline

The **Preparedness Framework** (https://openai.com/safety/preparedness) is OpenAI's current operational frontier-safety governance — Tracked Categories with Low/Medium/High/Critical thresholds, Safety Advisory Group review, leadership decision, board oversight. Covered in depth in our Anthropic RSP vs OpenAI Preparedness Framework and OpenAI Preparedness Framework Thresholds guides.
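
A minimal Python sketch of how a Tracked Categories × thresholds matrix could be represented, using the gating rule this guide describes in its scenario walkthrough ('High' needs safeguards before deployment, 'Critical' needs mitigations before further development). The names `Tier`, `gates_for`, and `review` are hypothetical, and the scores are illustrative, not real evaluation results or OpenAI's actual tooling.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Ordered thresholds; IntEnum lets us compare tiers with >=."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def gates_for(tier: Tier) -> list[str]:
    """Return the gates a single category's tier trips (assumed rule set)."""
    gates = []
    if tier >= Tier.HIGH:
        gates.append("safeguards before deployment")
    if tier >= Tier.CRITICAL:
        gates.append("mitigations before further development")
    return gates


def review(assessments: dict[str, Tier]) -> dict[str, list[str]]:
    """Keep only the categories that trip at least one gate."""
    tripped = {category: gates_for(tier) for category, tier in assessments.items()}
    return {category: gates for category, gates in tripped.items() if gates}


# Illustrative scores only.
flagged = review({"bio": Tier.HIGH, "cyber": Tier.MEDIUM})
print(flagged)
```

Each category is assessed on its own row of the matrix, so one category reaching 'High' does not change what is required for the others.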

The **Model Spec** (https://openai.com/model-spec, first published May 2024, updated through 2025-2026) is OpenAI's public specification of how its models should behave. It distinguishes between **Objectives** (broad goals like 'be helpful,' 'be safe'), **Rules** (hard constraints models must not violate), and **Defaults** (preferred behaviors that user or developer instructions can override, subject to the instruction hierarchy). The Model Spec is the public face of OpenAI's behavioral commitments: the document that says 'this is what a well-behaved OpenAI model does, and here's how exceptions are handled.'

**Instruction hierarchy.** A core technical commitment in the Model Spec: model behavior is governed by an ordering of instructions — platform-level (OpenAI policy) > developer (system prompt) > user (chat message). Higher-priority instructions take precedence; lower-priority instructions can refine but not override. This is the structural mechanism for resisting prompt injection and jailbreaks.
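
The ordering above can be illustrated with a toy resolver. This is a sketch under strong assumptions: `Level`, `Instruction`, and `resolve` are hypothetical names, not part of any OpenAI API, and it pretends that conflicts can be detected by comparing setting keys, which is far simpler than how a model actually weighs instructions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Higher value means higher priority (hypothetical encoding)."""
    USER = 1
    DEVELOPER = 2
    PLATFORM = 3


@dataclass(frozen=True)
class Instruction:
    level: Level
    key: str    # the behavior being configured, e.g. "tone"
    value: str


def resolve(instructions: list[Instruction]) -> dict[str, Instruction]:
    """For each key, keep the instruction from the highest-priority level.

    A lower level can add a key nobody above it set (refine), but cannot
    replace a key a higher level already set (override). Within the same
    level, the first instruction seen wins.
    """
    effective: dict[str, Instruction] = {}
    for ins in instructions:
        current = effective.get(ins.key)
        if current is None or ins.level > current.level:
            effective[ins.key] = ins
    return effective


stack = [
    Instruction(Level.PLATFORM, "disallowed_content", "refuse"),
    Instruction(Level.DEVELOPER, "tone", "formal"),
    Instruction(Level.USER, "tone", "casual"),               # loses to developer
    Instruction(Level.USER, "disallowed_content", "allow"),  # injection attempt, loses
    Instruction(Level.USER, "language", "French"),           # refinement, kept
]
effective = resolve(stack)
for key, ins in effective.items():
    print(key, "=", ins.value, f"(from {ins.level.name})")
```

The user-level attempt to flip `disallowed_content` is discarded because a platform-level instruction already owns that key, which is the prompt-injection resistance the paragraph describes.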

**Safety Advisory Group + leadership + board.** The Preparedness Framework names the decision chain. SAG produces a recommendation. OpenAI leadership decides. The board has overturn authority per the 2024 update. Public board membership (Bret Taylor as chair, members with tech/policy/security backgrounds) signals the board's standing to use that authority.

**Per-model artifacts.** System cards for each major release (GPT-4o, o1, o3, GPT-5) include capability evaluations, safety mitigations applied, third-party evaluator findings (UK AISI, US AISI, METR, Apollo), and known limitations. The system card is OpenAI's primary per-model safety artifact and is the document procurement and compliance teams should pull when diligencing a specific model.

Anthropic Responsible Scaling Policy: the ASL ladder

Anthropic's RSP (https://www.anthropic.com/rsp) is a single integrated public commitment with the AI Safety Level (ASL) as the unit of risk. The ladder runs from ASL-1 (no meaningful risk) through ASL-5 (substantially super-human, requiring mitigations Anthropic has not yet developed). The RSP is covered in depth in our Anthropic RSP ASL Levels Explained and RSP vs Preparedness guides.

**Two commitment axes per ASL.** Each ASL has deployment standards (how the model is rolled out) and security standards (how the weights are protected). ASL-3 requires hardened security against opportunistic attackers + enhanced misuse protections in deployment. ASL-4 requires security against state-level adversaries + substantially stronger deployment-misuse mitigations. ASL-5 commitments are deliberately specified in advance even though Anthropic states it does not yet have the corresponding mitigations.
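
A rough Python sketch of the two-axis idea, assuming each axis can be tracked as "the highest ASL whose standard is currently in place". The `Standards` class, `REQUIRED` table, and `may_proceed` function are hypothetical; the standard descriptions are paraphrased from the paragraph above, not quoted from the RSP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Standards:
    deployment: str
    security: str


# Only ASL-3 and ASL-4 are filled in, because the text above describes
# only those two levels in any detail.
REQUIRED = {
    3: Standards(
        deployment="enhanced misuse protections",
        security="hardened against opportunistic attackers",
    ),
    4: Standards(
        deployment="substantially stronger misuse mitigations",
        security="hardened against state-level adversaries",
    ),
}


def requirements(asl: int) -> Optional[Standards]:
    """Look up the paraphrased standards for a level, if this sketch has them."""
    return REQUIRED.get(asl)


def may_proceed(assessed_asl: int, deployment_met: int, security_met: int) -> bool:
    """Both axes must be in place at or above the assessed ASL.

    deployment_met / security_met are the highest ASL whose standard is
    currently implemented on that axis (hypothetical bookkeeping).
    """
    return deployment_met >= assessed_asl and security_met >= assessed_asl


# Deployment safeguards are ready for ASL-3, but weight security is still at ASL-2.
print(may_proceed(3, deployment_met=3, security_met=2))
print(requirements(3))
```

The point of the sketch is that the gate is a conjunction: meeting the deployment standard alone does not clear a model whose security standard lags behind.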

**Capability + Safeguards Reports.** For ASL-3+ models, Anthropic commits to publishing a Capability Report (what the evaluations found about the model's capabilities) and a Safeguards Report (what mitigations are in place and why they are considered adequate). Both shipped for Claude Opus 4 and Opus 4.7.

**Governance structure.** Responsible Scaling Officer owns day-to-day implementation. CEO signs off on threshold crossings. Board has authority. Long-Term Benefit Trust selects a portion of board members and is structurally insulated from financial pressure (trustees do not hold equity tied to commercial outcomes). The LTBT is Anthropic's distinct governance feature.

**Constitutional AI.** Separate from but related to the RSP, Constitutional AI is Anthropic's technical approach to model behavioral training — a constitution of principles that guides RLAIF training and the model's refusal/instruction handling. Public version of the constitution at anthropic.com/research/constitutional-ai. Our Implement Constitutional AI Guardrails tutorial walks through how to apply the methodology to your own evaluations.

Google DeepMind Frontier Safety Framework: Critical Capability Levels

Google DeepMind's Frontier Safety Framework (https://deepmind.com/safety) was first published May 2024 and materially updated to v2 in February 2025. The framework is structured around **Critical Capability Levels (CCLs)** — for each misuse domain (autonomous capability, biosecurity, cybersecurity, machine learning R&D, etc.), the framework defines a CCL: the capability level at which a model would meaningfully increase risk and would require specific mitigations.

**Evaluation triggers.** When DeepMind's pre-release evals indicate a model is approaching a CCL, the framework triggers an internal Frontier Safety Council review. Mitigations are designed to keep the model below the CCL or to add safeguards sufficient to operate at the CCL safely. Mitigations include training-time interventions, deployment safeguards, weight security, monitoring, and incident response.

**v2 updates (Feb 2025).** Expanded the CCL catalog to include AI R&D capability (the ability of a model to meaningfully accelerate AI research), added more detail on deployment mitigations, and tightened internal review processes. v2 is the current public version.

**External engagement.** DeepMind engages with UK AISI, US AISI, and the EU AI Office. Gemini system cards (published per major release) include sections on FSF-relevant capability evaluations. DeepMind co-authored some of the cited safety research with Anthropic and OpenAI researchers.

**Distinct structural choice.** Where Anthropic uses a single overall ASL ladder and OpenAI uses Tracked Categories × thresholds, DeepMind uses per-domain Critical Capability Levels. The mental model: each domain has its own bar; crossing any one triggers domain-specific mitigations. Practical effect: similar to OpenAI's matrix approach but framed around discrete CCL crossings rather than continuous threshold tiers.
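
One way to picture "each domain has its own bar" is the sketch below. It makes a large simplifying assumption: real CCLs are qualitative capability descriptions, whereas this code pretends each domain has a numeric eval score and a numeric bar. `DomainResult`, `status`, `domains_needing_review`, and the 0.1 alert margin are all hypothetical, and the numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainResult:
    domain: str
    eval_score: float           # hypothetical normalized score in [0, 1]
    ccl: float                  # hypothetical score at which the CCL counts as reached
    alert_margin: float = 0.1   # "approaching" means within this margin below the CCL


def status(result: DomainResult) -> str:
    """Classify one domain independently of all the others."""
    if result.eval_score >= result.ccl:
        return "reached"
    if result.eval_score >= result.ccl - result.alert_margin:
        return "approaching"
    return "below"


def domains_needing_review(results: list[DomainResult]) -> list[str]:
    """Any single domain at or near its own bar triggers review for that domain."""
    return [r.domain for r in results if status(r) != "below"]


results = [
    DomainResult("bio", eval_score=0.65, ccl=0.70),
    DomainResult("cyber", eval_score=0.30, ccl=0.80),
    DomainResult("ml_rd", eval_score=0.90, ccl=0.85),
]
print(domains_needing_review(results))
```

Unlike a single ladder, there is no overall level here: the output is a list of domains, each carrying its own mitigation work.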

Direct side-by-side: same scenario, three responses

**Scenario: A new flagship model's pre-deployment evals indicate meaningfully elevated bio-capability — uplift to a determined non-expert is now plausible.**

**Anthropic RSP response.** Evaluate which ASL axis the bio-uplift triggers. If it crosses into ASL-3 on the misuse axis, the model cannot be deployed without ASL-3 deployment commitments (enhanced misuse protections, sandboxed deployment, additional refusal training, third-party Capability Report attestation). Training of further capabilities pauses until ASL-3 mitigations are in place. RSO + CEO + Board sign off on the threshold crossing.

**OpenAI Preparedness response.** Bio capability assessed at Low / Medium / High / Critical. If 'High' is the assessment, the model requires safeguards before deployment — refusal training, output filters, downstream-integrator constraints. SAG reviews the safeguards package; leadership decides on deployment; the board reviews. If 'Critical,' further development must include mitigations before continuing. The system card documents the eval and mitigations.

**DeepMind FSF response.** Bio CCL is one of the named domains. If the eval indicates the model is approaching the bio CCL, Frontier Safety Council reviews. Mitigations are designed to keep the model below the CCL or to make deployment safe at the CCL. Capability-specific evaluators (in this case bio-security partners) are consulted. The Gemini system card for the model documents the eval and mitigations.

**Practical similarity.** All three labs would (a) document the elevated capability, (b) add mitigations before public deployment, (c) consult external evaluators, (d) publish a per-model artifact. The structural differences are in the unit of risk (ASL ladder vs Tracked Category vs CCL) and the governance chain (RSO+CEO+Board+LTBT vs SAG+leadership+board vs Council+leadership), but the practical decision tree converges.

**Where they would diverge.** On a borderline case — eval results uncertain, time-to-market pressure — the structural commitments matter. Anthropic's RSP is the most operationally specific about pausing training, with the LTBT as a structural backstop. OpenAI's Preparedness leans more on the SAG/leadership/board chain. DeepMind's FSF leans more on the Council + external partners. Anyone forming a view on which framework is structurally strongest should read each end-to-end and read the most recent external evaluator reports.

What's published vs what isn't

**Anthropic.** Published: RSP full text + revision history, Capability + Safeguards Reports for ASL-3+ models, Constitutional AI research, model behavior documents, safety research papers, blog posts on RSP updates. Not published: raw eval scores, full red-team transcripts, internal threshold-crossing deliberations, full prompt sets used in evaluations.

**OpenAI.** Published: Preparedness Framework text + revision history, Model Spec, system cards per major model, safety research papers, Voluntary AI Commitments updates, AISI Consortium contributions. Not published: raw eval scores, full red-team transcripts, internal SAG deliberations, full prompt sets used in evaluations.

**DeepMind.** Published: FSF v1 + v2 text, Gemini system cards, safety research papers, AISI engagement summaries. Not published: raw eval scores, full red-team transcripts, internal Council deliberations.

**Common pattern.** All three publish methodology, headline findings, and named mitigations. None publish raw scores or full evaluation prompt sets — both because some details would aid misuse and because commercial confidentiality applies. External evaluators (UK AISI, US AISI, METR, Apollo) operate under agreements that allow publishing methodology + redacted findings but not vendor-internal materials.

**The transparency baseline.** Per public commentary from researchers and policy analysts, the current lab transparency baseline is meaningfully higher than 2022-2023 levels but lower than what some external evaluators have publicly requested. UK AISI, US AISI, and academic AI-safety research groups continue to publish what additional transparency would look like — these are the canonical references for 'is the lab publishing enough?' debates.

How third-party evaluators see each

**UK AISI.** Has evaluated GPT-5, Claude Opus 4 / 4.7, Gemini 2.5 Pro pre-deployment per public communiqués. Publishes methodology notes at aisi.gov.uk/work/our-publications. Joint evaluations with US AISI for major releases.

**US AISI.** AISI Consortium working groups include all three labs as members. Joint evaluations with UK AISI. NIST AI RMF generative-AI profile incorporates eval methodology informed by AISI Consortium output.

**METR.** Has evaluated GPT-4, GPT-4o, GPT-5, o1, o3 from OpenAI; Claude 3, 3.5, 3.7, Opus 4, Opus 4.7 from Anthropic; Gemini releases from DeepMind. Publishes time-to-completion benchmarks vs humans for agentic tasks. Reports at metr.org/blog.

**Apollo Research.** Published the o1 attempts-to-disable-oversight finding (in the o1 system card). Published deception and sandbagging evaluations of OpenAI, Anthropic, and Google models. Reports at apolloresearch.ai/research.

**Convergent view.** Third-party evaluators broadly characterize all three labs as engaging substantively with the frontier-safety-evaluation surface in 2026, with bilateral access agreements and published methodology. Divergent views exist on which framework is structurally strongest — these are values judgments and the public record supports reasonable people disagreeing. Anyone doing diligence should read each lab's framework end-to-end plus the most recent third-party evaluator reports.

What this means for your team (model selection, governance)

**If you select frontier models for a regulated industry.** All three labs (Anthropic, OpenAI, Google) now publish enough about their safety governance that you can do substantive diligence. Read the framework. Read the most recent model's per-model artifact (Capability/Safeguards Report, system card). Cross-reference with UK AISI / US AISI / METR / Apollo findings. Bring the artifacts into your procurement diligence file.

**If you need vendor portability.** All three frameworks reserve the right to restrict or revoke access if thresholds are crossed. Design for portability: an abstraction layer over the OpenAI, Anthropic, and Google SDKs, model identifiers in config, and prompt formats portable across the three API shapes. Our OpenAI-to-Claude migration, OpenAI-to-Claude migration cost delta, and Anthropic-to-Google migration cost guides cover the practical work.

**If you build safety tooling.** All three labs' published methodology + UK AISI / US AISI / METR / Apollo published methodology is the canonical reference set. Build your eval suites and red-team tools to interoperate with these methodologies — your output will then map cleanly to vendor system-card sections.

**Practical artifact stack we recommend** for procurement diligence: (1) Lab framework text (RSP, Preparedness, FSF). (2) Most recent per-model artifact (Capability/Safeguards Report or system card). (3) Most recent third-party evaluator report (UK AISI, US AISI, METR, Apollo). (4) Lab Trust Center compliance attestations (SOC 2, ISO 27001, ISO/IEC 42001). (5) Contract terms on training-data use, data residency, and BAA availability. Maintain in your AI provider diligence folder.
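
The five-item stack above lends itself to a simple tracker. The sketch below is one possible shape, not a prescribed tool: `DiligenceFile`, the artifact keys, the 180-day staleness window, and the vendor and model names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

# Keys mirror items (1) to (5) in the recommended stack.
REQUIRED_ARTIFACTS = (
    "framework_text",
    "per_model_artifact",
    "third_party_report",
    "compliance_attestations",
    "contract_terms",
)


@dataclass
class DiligenceFile:
    vendor: str
    model: str
    # Maps artifact key to the date it was last retrieved.
    collected: dict[str, date] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Artifacts from the required stack that have not been collected."""
        return [a for a in REQUIRED_ARTIFACTS if a not in self.collected]

    def stale(self, today: date, max_age_days: int = 180) -> list[str]:
        """Collected artifacts older than the (assumed) refresh window."""
        return [
            artifact
            for artifact, retrieved in self.collected.items()
            if (today - retrieved).days > max_age_days
        ]


folder = DiligenceFile(
    vendor="ExampleLab",       # placeholder vendor
    model="example-model",     # placeholder model identifier
    collected={
        "framework_text": date(2026, 1, 5),
        "per_model_artifact": date(2026, 6, 1),
    },
)
print("missing:", folder.missing())
print("stale:", folder.stale(today=date(2026, 7, 10)))
```

Tracking retrieval dates matters because the frameworks and per-model artifacts are revised over time, so a diligence file can go out of date without anything in it changing.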

Comparing lab frameworks for procurement

1. **Read each lab's framework end-to-end.** Anthropic RSP: anthropic.com/rsp. OpenAI Preparedness Framework: openai.com/safety/preparedness. OpenAI Model Spec: openai.com/model-spec. DeepMind FSF: deepmind.com/safety. Each 20-40 pages. An afternoon's work and the source for any later claim about the lab's safety posture.
2. **Pull the most recent per-model artifact for your candidate model.** Anthropic: Capability + Safeguards Reports for ASL-3+ models (linked from anthropic.com/news). OpenAI: system cards per major model. DeepMind: Gemini system cards per major release. These document what evaluations were run and what mitigations are in place.
3. **Pull the most recent UK AISI / US AISI / METR / Apollo report.** External evaluator reports tell you what the framework looks like under non-vendor scrutiny. Cross-reference against the vendor system card; the convergent and divergent findings are diagnostic.
4. **Map your use case against the framework's named risk categories.** If your use case is in a tracked category (bio, cyber, autonomy, persuasion), expect more documentation and more frequent updates. If your use case is not in a tracked category, you're operating in the framework's 'minimal-additional-mitigation' tier, but post-deployment monitoring still matters.
5. **Build vendor portability into your architecture.** All three labs reserve the right to restrict access if thresholds are crossed. Design for migration: abstraction layer, config-driven model selection, prompt formats portable across shapes. Test failover during normal operations, not when you're under pressure to swap.
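
A minimal sketch of the abstraction-layer and failover idea from the last step. It deliberately avoids calling any real vendor SDK: `ChatBackend`, `EchoBackend`, `complete_with_failover`, and the config shape are hypothetical, and a real adapter would wrap the vendor client inside `complete`.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """The only interface application code is allowed to depend on."""
    def complete(self, system: str, user: str) -> str: ...


class EchoBackend:
    """Stand-in for a real SDK adapter; it echoes instead of calling a vendor."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def complete(self, system: str, user: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} unavailable")
        return f"[{self.name}] {user}"


def complete_with_failover(backends: list[ChatBackend], system: str, user: str) -> str:
    """Try each backend in configured order and return the first success."""
    errors: list[Exception] = []
    for backend in backends:
        try:
            return backend.complete(system, user)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(exc)
    raise RuntimeError(f"all backends failed: {errors}")


# Provider order lives in config rather than in application code.
CONFIG = {"order": ["primary", "secondary"]}
REGISTRY = {
    "primary": EchoBackend("primary", healthy=False),  # simulates restricted access
    "secondary": EchoBackend("secondary"),
}
reply = complete_with_failover(
    [REGISTRY[name] for name in CONFIG["order"]], "Be concise.", "ping"
)
print(reply)
```

Marking the primary backend unhealthy in a test environment is a cheap way to exercise the failover path during normal operations, as the step recommends.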

Use the data programmatically

Every page on this site is also exposed as a free, CORS-open JSON endpoint. No auth, no rate limit (fair-use, please cache). License is CC-BY-4.0 — link back to attribution.canonicalUrl in the response.

Endpoint: https://aipromptshub.co/api/vs/openai-superalignment-vs-anthropic-rsp-vs-google-deepmind-frontier-safety

curl

curl -s 'https://aipromptshub.co/api/vs/openai-superalignment-vs-anthropic-rsp-vs-google-deepmind-frontier-safety' | jq .

Python

import requests

r = requests.get("https://aipromptshub.co/api/vs/openai-superalignment-vs-anthropic-rsp-vs-google-deepmind-frontier-safety", timeout=10)
r.raise_for_status()
data = r.json()
print(data["title"])
for source in data.get("sources", []):
    print("source:", source)

JavaScript / Node

// Node 20+ / modern browser (top-level await requires an ES module)
const res = await fetch("https://aipromptshub.co/api/vs/openai-superalignment-vs-anthropic-rsp-vs-google-deepmind-frontier-safety");
if (!res.ok) throw new Error("HTTP " + res.status);
const data = await res.json();
console.log(data.title);
for (const source of data.sources ?? []) {
  console.log("source:", source);
}

Spec: /api/openapi.yaml · Docs: /api/docs
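
Since the endpoint asks callers to cache, here is one possible file-based cache in Python using only the standard library. It is a sketch, not part of the site's API: `fetch_json`, `cached_get`, the cache path, and the 24-hour TTL are assumptions, and the response is assumed to be a JSON object as in the examples above.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Callable

URL = "https://aipromptshub.co/api/vs/openai-superalignment-vs-anthropic-rsp-vs-google-deepmind-frontier-safety"


def fetch_json(url: str) -> dict:
    """Fetch and decode JSON; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def cached_get(
    url: str,
    cache_path: Path,
    ttl_seconds: float = 86_400,          # assumed 24-hour freshness window
    fetch: Callable[[str], dict] = fetch_json,
) -> dict:
    """Return the cached copy if it is fresh, otherwise fetch and rewrite it."""
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < ttl_seconds:
            return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch(url)
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Calling `cached_get(URL, Path("vs_cache.json"))` twice within the TTL should hit the network only once. The `fetch` parameter exists so the caching logic can be tested with a stub instead of a live request.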

Frequently Asked Questions

What happened to OpenAI's Superalignment program?

OpenAI's Superalignment program was announced July 2023 (openai.com/index/introducing-superalignment) with a commitment of 20% of secured compute over 4 years to alignment research, co-led by Ilya Sutskever and Jan Leike. The team was reorganized and effectively dissolved in May 2024 after Sutskever and Leike both left the company that month. Subsequent safety work at OpenAI continues through the Preparedness Framework, the Model Spec, the Safety Advisory Group, and integrated safety teams. Sutskever founded Safe Superintelligence (SSI) in mid-2024.

What is OpenAI's Model Spec?

OpenAI's Model Spec (openai.com/model-spec, first published May 2024) is OpenAI's public specification of how its models should behave. It distinguishes between Objectives (broad goals like 'be helpful'), Rules (hard constraints), and Defaults (preferred behaviors that can be overridden by higher-priority instructions). It also defines the instruction hierarchy (platform > developer > user), which determines whose instructions take precedence and is the structural mechanism for resisting prompt injection.

What is Anthropic's Responsible Scaling Policy?

Anthropic's RSP (anthropic.com/rsp) is a public deployment-governance commitment. It uses the AI Safety Level (ASL-1 to ASL-5) as the unit of risk, modeled on biosafety levels. Each ASL has deployment + security standards, and crossing into a higher ASL requires the corresponding mitigations before training/deployment proceeds. Governance runs through the Responsible Scaling Officer + CEO + Board + Long-Term Benefit Trust. Per-model Capability + Safeguards Reports are published for ASL-3+ models. See ASL Levels Explained for the deep dive.

What is DeepMind's Frontier Safety Framework?

Google DeepMind's Frontier Safety Framework (deepmind.com/safety; v1 May 2024, v2 February 2025) is a risk-management protocol structured around Critical Capability Levels (CCLs) per misuse domain (autonomous capability, biosecurity, cybersecurity, ML R&D, etc.). When pre-release evals indicate a model approaches a CCL, the Frontier Safety Council reviews; mitigations are designed to keep the model below the CCL or make deployment safe at the CCL. v2 added AI R&D capability and tightened review processes.

Are the three frameworks meaningfully different?

Structurally yes: Anthropic uses a single ASL ladder; OpenAI uses Tracked Categories with Low/Medium/High/Critical thresholds; DeepMind uses Critical Capability Levels per domain. Practically, all three converge on a similar decision tree for the same risk scenario: document the capability, add mitigations before public deployment, consult external evaluators, publish a per-model artifact. The structural differences matter most on borderline cases and on the strength of the governance backstop (Anthropic's Long-Term Benefit Trust is the most distinct structural feature).

What is Constitutional AI?

Constitutional AI is Anthropic's technical approach to model behavioral training, distinct from but related to the RSP. A constitution of principles guides RLAIF (reinforcement learning from AI feedback) training. Public version of the constitution at anthropic.com/research/constitutional-ai. Used to train Claude's refusal handling, helpfulness, and instruction-following behavior. Our Implement Constitutional AI Guardrails tutorial walks through applying the methodology to your own evaluations.

What is the Safe Superintelligence company?

Safe Superintelligence (SSI) is the AI lab founded in mid-2024 by Ilya Sutskever (former OpenAI Chief Scientist and Superalignment co-lead) and others. SSI's stated mission is to build safe superintelligence as a single product. SSI is not a signatory to the public Voluntary AI Commitments or a member of the AISI Consortium as of June 2026; the company has not publicly released models or detailed safety frameworks. Sutskever's prior work on Superalignment at OpenAI informed the founding thesis.

Should I pick a vendor based on which safety framework is strongest?

Safety framework should be one factor among several in procurement diligence — alongside model quality, price, latency, data-residency, compliance posture (SOC 2, ISO 27001, ISO/IEC 42001, BAA availability), and your use-case fit. All three major frontier labs (Anthropic, OpenAI, Google) now publish substantial safety documentation. Read each framework end-to-end and read the most recent per-model artifact for your candidate model. Cross-reference against UK AISI / US AISI / METR / Apollo reports. The answer that emerges is rarely 'one is clearly best for all use cases' — more often it's 'these three are competitive on framework substance, decide on other factors.'
