OpenAI Superalignment: what was promised, what happened
OpenAI announced its Superalignment program on 5 July 2023 (https://openai.com/index/introducing-superalignment/). The headline commitment was to dedicate 20% of the compute OpenAI had secured to date, over the following four years, to solving the alignment problem for superhuman AI. The team was co-led by Ilya Sutskever (OpenAI's co-founder and Chief Scientist) and Jan Leike (then Head of Alignment). The announcement framed the need as 'scientific and technical breakthroughs to steer and control AI systems much smarter than us.'
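**What happened.** Through late 2023 and early 2024, the Superalignment team published research on weak-to-strong generalization, scalable oversight, and related topics. In May 2024, Sutskever and Leike announced their departures from OpenAI within days of each other; Sutskever had effectively been absent since the board crisis of November 2023. The Superalignment team was then dissolved as a standalone unit and its remaining work was reorganized into other groups. Leike's public statement on leaving said that 'safety culture and processes have taken a backseat to shiny products,' and contemporaneous reporting described disagreements with leadership over how much priority safety work received.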
**Where the work continued.** Substantial safety work continued at OpenAI through three main channels: (1) the **Preparedness Framework** as operational governance, (2) the **Model Spec** as a public behavioral specification, and (3) safety teams embedded across model development. Researchers from the former Superalignment team have continued to publish in 2025 and 2026, some from OpenAI and some from other institutions. Sutskever co-founded Safe Superintelligence Inc. (SSI) in June 2024.
**Why it matters in 2026.** Superalignment as a standalone research initiative is no longer OpenAI's headline safety story. The framing has shifted from, in paraphrase, 'we will solve alignment for superintelligence in four years' to 'we will operate Preparedness Framework governance, Model Spec behavioral commitments, and per-model evaluations.' Understanding OpenAI's safety posture in 2026 therefore means reading the Preparedness Framework and the Model Spec, not the original Superalignment announcement.
**What survives from Superalignment thinking.** The technical research on scalable oversight and weak-to-strong generalization has informed evaluation methodology and is cited in subsequent papers. The framing of 'superalignment' as a problem class (alignment methods that must keep working on systems more capable than their evaluators can reliably assess) remains an active research question across labs. It appears in the discussion of AI Safety Levels ASL-4 and ASL-5 in Anthropic's Responsible Scaling Policy (RSP) and in DeepMind's discussion of Critical Capability Levels (CCLs) that exceed current evaluation methodology.