What changed between 2025 and 2026 that matters for code?
Three developments shifted the head-to-head in 2026. Anthropic shipped **Claude Opus 4.8 and Sonnet 4.6** with explicit coding and agentic improvements, pushing the top tier past 80% on SWE-bench Verified. OpenAI's **GPT-5.1 and GPT-5.1 Codex** folded the Codex line into the main GPT line, with substantial gains on competitive-programming benchmarks. And the **agentic harness ecosystem matured**: Claude Code shipped as a first-party CLI, while GPT models gained better tool use inside Cursor and Copilot Workspace.
HumanEval is no longer a useful tiebreaker: both vendors' top models score above 95% on it, which leaves too little headroom to separate them. The more informative 2026 signals come from **SWE-bench Verified** (real GitHub issues), **Aider Polyglot** (multi-language edits scored by whether the tests pass), and **LiveCodeBench** (contamination-resistant competitive programming). Sources: SWE-bench Verified, Aider leaderboards, LiveCodeBench, HumanEval (Chen et al. 2021).
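To see why a saturated benchmark stops ranking models, here is a minimal sketch of the arithmetic. It assumes pass@1 scoring on HumanEval's 164 problems, treats each problem as an independent pass/fail draw, and uses a normal-approximation (Wald) ~95% interval. The scores (156 vs 159 problems solved) are hypothetical, and the helper names `wald_half_width` and `gap_within_noise` are illustrative, not part of any benchmark's tooling.

```python
import math

HUMANEVAL_PROBLEMS = 164  # problem count in the original HumanEval release (Chen et al. 2021)


def wald_half_width(pass_rate: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) ~95% interval for a pass rate over n problems."""
    return z * math.sqrt(pass_rate * (1.0 - pass_rate) / n)


def gap_within_noise(passed_a: int, passed_b: int, n: int) -> bool:
    """Rough heuristic (hypothetical helper): is the score gap smaller than the
    wider of the two interval half-widths?

    Treats problems as independent draws and ignores that both models ran on the
    same problems; a paired test such as McNemar's would be more rigorous.
    """
    gap = abs(passed_a - passed_b) / n
    widest = max(wald_half_width(passed_a / n, n), wald_half_width(passed_b / n, n))
    return gap < widest


if __name__ == "__main__":
    # Hypothetical scores: 156/164 (~95.1%) vs 159/164 (~97.0%), both "above 95%".
    a, b = 156, 159
    print(f"gap: {abs(a - b) / HUMANEVAL_PROBLEMS:.1%}")
    print(f"half-width at {a}/{HUMANEVAL_PROBLEMS}: "
          f"{wald_half_width(a / HUMANEVAL_PROBLEMS, HUMANEVAL_PROBLEMS):.1%}")
    print("within noise:", gap_within_noise(a, b, HUMANEVAL_PROBLEMS))
```

With these made-up numbers, the three-problem gap (about 1.8 percentage points) is smaller than the roughly 3.3-point interval half-width, so the benchmark cannot reliably order the two models. Larger, unsaturated benchmarks leave more room between scores, which is part of why SWE-bench Verified, Aider Polyglot, and LiveCodeBench carry more signal.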