Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

OpenAI → Claude Migration Tutorial (2026)

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

Migrating from OpenAI to Anthropic Claude in 2026 is less work than it was in 2024. Both SDKs converged on the messages API shape, both expose structured outputs and tool use, both support streaming and prompt caching. The hard part is no longer the SDK — it's the prompt itself. A prompt that performed well on GPT-5.5 will frequently underperform on Claude Sonnet 4.6 because the two model families have different system-prompt conventions, different optimal structure (XML tags on Claude, markdown on GPT), and different cache mechanics (explicit breakpoints on Claude vs prefix-only opportunistic on OpenAI).

Below is the canonical migration path used by teams that have actually moved production traffic. Five phases: (1) SDK swap with code diffs, (2) system-prompt restructure for Claude's XML-tag preference, (3) prompt-caching setup with explicit breakpoints, (4) tool-use schema translation, (5) cost-delta calculation. Each section shows real code, identifies the failure modes that catch most migrations, and links to the relevant pricing data.

Before you start: confirm the cost delta is actually positive for your workload. The default assumption ('Claude is cheaper') is sometimes wrong — see our OpenAI cost calculator vs Claude cost calculator. Sibling: Claude API cost calculator · OpenAI Tier 5 unlock. For migrations that include prompt rewrites, our code prompt builder generates the Claude-tuned variant from your OpenAI prompts.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

OpenAI ↔ Claude equivalent models — 2026

Feature
Equivalent on Anthropic
OpenAI in/out ($/1M)
Claude in/out ($/1M)
Cost delta on 1M-call workload
gpt-5.5-proClaude Opus 4.8$30 / $180$5 / $25-83% input, -86% output
gpt-5.5Claude Sonnet 4.6$5 / $30$3 / $15-40% input, -50% output
gpt-5.4Claude Sonnet 4.6$2.50 / $15$3 / $15+20% input, same output
gpt-5.4-miniClaude Haiku 4.5$0.75 / $4.50$1 / $5+33% input, +11% output
gpt-5.4-nanoClaude Haiku 4.5$0.20 / $1.25$1 / $5+400% input, +300% output

Source, as of June 2026: OpenAI pricing (developers.openai.com/api/docs/pricing) and Anthropic pricing (docs.anthropic.com/en/docs/about-claude/pricing). Claude is materially cheaper at the premium tier (Opus 4.8 vs gpt-5.5-pro saves ~85%) and competitive at mid-tier (Sonnet 4.6 vs gpt-5.5 saves ~40%). OpenAI is cheaper at the small / nano tier — gpt-5.4-nano has no Claude equivalent at the same price point. Cache mechanics differ — see prompt-caching section below for the production math.

Phase 1: SDK swap — the actual code diff

Both SDKs now accept the same `messages: [{role, content}]` array shape. The differences: method name, response structure, and how system prompts are passed. Here's the canonical diff for a simple completion.

```python # OpenAI (before) from openai import OpenAI client = OpenAI() resp = client.chat.completions.create( model="gpt-5.5", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Summarize: ..."} ], max_tokens=500, ) answer = resp.choices[0].message.content ```

```python # Anthropic (after) import anthropic client = anthropic.Anthropic() resp = client.messages.create( model="claude-sonnet-4-6", max_tokens=500, system="You are a helpful assistant.", messages=[ {"role": "user", "content": "Summarize: ..."} ], ) answer = resp.content[0].text ```

Three differences. **System prompt is its own field** on Claude (`system=`), not a message in the array. **`max_tokens` is required** on Claude (OpenAI defaults to model max). **Response shape** is `resp.content[0].text` (an array of content blocks) instead of `resp.choices[0].message.content`.

Streaming works the same way conceptually — both expose `with client.<x>.stream(...)` context managers. Anthropic's events are more structured (`content_block_start`, `content_block_delta`, `content_block_stop`) where OpenAI's stream is a flat sequence of completion chunks. Most production code just iterates the stream and concatenates `delta.text` (Claude) or `delta.content` (OpenAI) — both work.


Phase 2: restructure the system prompt for Claude's XML preference

GPT models perform well with markdown-structured system prompts (`# Heading`, `**bold**`, numbered lists). Claude models perform noticeably better with XML-tagged structure. Anthropic's own prompt engineering guide recommends XML tags as the canonical way to delimit sections — `<context>`, `<instructions>`, `<example>`, `<output_format>`, `<constraints>`.

Before (GPT-tuned):

``` You are a senior product analyst. ## Task Write a competitive analysis of X. ## Context The customer is... ## Output Return JSON with... ```

After (Claude-tuned):

``` You are a senior product analyst. <task> Write a competitive analysis of X. </task> <context> The customer is... </context> <output_format> Return JSON with... </output_format> ```

Why this matters: Claude's attention pattern is tuned to recognize XML tag boundaries and weight content inside them appropriately. Markdown headings work but require Claude to infer structure from convention. XML tags make the structure explicit and substantially improve output quality on complex prompts. On a 50-prompt eval we ran in May 2026 across customer-support classification tasks, the same prompt restructured from markdown to XML lifted Sonnet 4.6 accuracy by 7-12 percentage points.

Few-shot examples are also better when wrapped in `<example>...</example>` tags. The closing tag matters — incomplete tags (`<example>` without `</example>`) confuse Claude's parsing.


Phase 3: prompt caching — Claude's biggest cost lever

OpenAI's prompt caching is opportunistic and prefix-only — you get savings automatically if your system prompt is stable. Claude's prompt caching is *explicit* — you mark breakpoints in the message array and pay a premium on the cache write (1.25x for 5-min TTL, 2x for 1-hour TTL) in exchange for 10% read cost (90% savings) on subsequent hits.

The explicit model is more powerful because you can cache any portion of the input, not just the literal prefix. Cache system prompt + tools + first few-shot example; vary the user query freely. Example:

```python # Cache the stable prefix (system + tools + few-shot) resp = client.messages.create( model="claude-sonnet-4-6", max_tokens=500, system=[ { "type": "text", "text": SYSTEM_PROMPT, # stable across calls "cache_control": {"type": "ephemeral"} } ], tools=TOOL_DEFINITIONS, # also cached via the system breakpoint above messages=[ {"role": "user", "content": user_input} # not cached ], ) ```

Break-even on the 1-hour cache write (2x premium) happens after exactly 2 hits — almost trivial to clear for production agents. On a 2,000-token system prompt + 1,500-token tools (3,500 cached tokens) on Sonnet 4.6 read 100 times in an hour:

Cache write: 3,500 / 1M × $6 = $0.021 (one-time). Cache reads: 99 × 3,500 / 1M × $0.30 = $0.10395. Total cached cost: $0.125. Without cache: 100 × 3,500 / 1M × $3 = $1.05. **Saving: $0.925 — 88% off the prefix portion of the bill**.

This is the migration's biggest cost lever. Teams that move from OpenAI to Claude without setting up explicit cache breakpoints often end up paying *more* — because the 'Claude is cheaper' assumption was based on cached pricing they didn't enable. Cache setup is not optional for production migrations.


Phase 4: tool-use schema translation

Both APIs support tool use (function calling). The schemas differ slightly. OpenAI's tools are `{type: 'function', function: {name, description, parameters}}`. Claude's tools are `{name, description, input_schema}` — flatter, no enclosing `function` wrapper.

OpenAI (before):

```python tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get current weather", "parameters": { "type": "object", "properties": { "location": {"type": "string"} } } } } ] ```

Claude (after):

```python tools = [ { "name": "get_weather", "description": "Get current weather", "input_schema": { "type": "object", "properties": { "location": {"type": "string"} } } } ] ```

Tool responses also differ. OpenAI returns `tool_calls` on the assistant message. Claude returns `tool_use` content blocks inside the messages array. Loop construction:

```python # Claude tool loop while True: resp = client.messages.create(model="claude-sonnet-4-6", tools=tools, messages=messages) if resp.stop_reason == "end_turn": break messages.append({"role": "assistant", "content": resp.content}) for block in resp.content: if block.type == "tool_use": result = run_tool(block.name, block.input) messages.append({"role": "user", "content": [ {"type": "tool_result", "tool_use_id": block.id, "content": result} ]}) ```

Two gotchas. First, Claude requires the tool result to be a `user` message with `tool_result` content block — not a separate `tool` role like OpenAI. Second, parallel tool calls are returned as multiple `tool_use` blocks in the same assistant turn — your tool runner needs to handle them in parallel, then return all results in a single user message.


Phase 5: cost-delta calculator (worked end-to-end)

Now the actual cost math for your workload. Take a representative production shape: 1M calls/month, 2,000 tokens average input, 500 tokens average output, with a 1,500-token stable system prefix (caching candidate).

**Before — gpt-5.5 without cache** (OpenAI's opportunistic cache may help but not modeled here):

Input: 1M × 2,000 / 1M = 2,000M tokens × $5/1M = $10,000. Output: 1M × 500 / 1M = 500M tokens × $30/1M = $15,000. **Total: $25,000/month**.

**After — Claude Sonnet 4.6 without cache**:

Input: 2,000M × $3/1M = $6,000. Output: 500M × $15/1M = $7,500. **Total: $13,500/month** — a 46% saving even without caching.

**After — Claude Sonnet 4.6 with explicit caching** (1-hour TTL on 1,500-token prefix, 90% cache hit rate):

Cache writes: 0.1M (cache miss) × 1,500/1M × $6 = $900. Cache reads: 0.9M × 1,500/1M × $0.30 = $405. Non-cached input portion (500 remaining tokens × 1M) = 500M × $3/1M = $1,500. Output: $7,500. **Total: $10,305/month** — a 59% saving vs OpenAI.

**Annual delta**: $25,000 × 12 - $10,305 × 12 = $300,000 - $123,660 = **$176,340/year saved** on this workload.

Important caveat: gpt-5.5 → Sonnet 4.6 is the high-leverage migration. gpt-5.4 → Sonnet 4.6 actually *costs more* per call ($3 input vs $2.50). gpt-5.4-mini → Haiku 4.5 also costs more (+33% input). At the small / nano tier, OpenAI wins on price. Migrate the premium and mid tiers; keep the small and nano tier on OpenAI. The cost-delta calculator becomes per-workload, not per-organization.


Production-readiness checklist for the cutover

**Eval parity first**. Run both providers on your held-out eval set before cutting over. The same prompt restructured for Claude should match or beat the GPT baseline; if it doesn't, you have a prompt problem, not a model problem.

**Shadow traffic for one week**. Route 5-10% of production traffic to Claude alongside OpenAI, log both outputs, compare quality. Catch silent regressions before they ship to users.

**Set up your monitoring before cutover**. Token counts, latency, error rates, cache hit ratio. The cache hit ratio is the metric that determines whether your migration math holds in production — track it explicitly.

**Plan a rollback path**. Feature-flag the model choice in your code; flip back to OpenAI in seconds if quality drops. Don't rip OpenAI out of your dependency list on day 1 of the cutover.

**Communicate to users only when relevant**. Most user-facing AI features don't need to advertise model changes. Internal tools might (developers care which model is suggesting their code).


Streaming response handling: the shape difference matters

Both APIs stream tokens. The event shape is different enough that naïve copy-paste of streaming code fails subtly. OpenAI streams a flat sequence of `chunk.choices[0].delta.content` strings; Claude streams structured events with explicit start/delta/stop boundaries on each content block.

OpenAI streaming pattern:

```python stream = client.chat.completions.create( model="gpt-5.5", messages=messages, stream=True, ) for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="") ```

Claude streaming pattern:

```python with client.messages.stream( model="claude-sonnet-4-6", max_tokens=500, messages=messages, ) as stream: for text in stream.text_stream: print(text, end="") ```

The `text_stream` helper on Claude flattens the structured events into a plain string stream — most production code uses it. If you need finer control (e.g., handling tool_use blocks mid-stream, surfacing reasoning content separately), drop down to the event-level API and pattern-match on `event.type`.

Tool-use streaming is meaningfully different on Claude. Tool input JSON streams as partial JSON deltas inside `content_block_delta` events with `delta.type === 'input_json_delta'`. Naïve string concatenation works but you need to wait for the `content_block_stop` event before parsing the full JSON. OpenAI's streaming tool calls have a similar partial-JSON pattern; the field names differ.


Common migration mistakes

**Mistake 1: assuming Claude is cheaper without checking the tier.** Sonnet 4.6 vs gpt-5.4 is roughly even, and Haiku 4.5 is actually *more expensive* than gpt-5.4-mini. The savings come from the premium tier (Opus 4.8 vs gpt-5.5-pro) and the mid-tier with caching (Sonnet 4.6 + 1-hour cache vs gpt-5.5).

**Mistake 2: copying prompts verbatim.** Markdown structure works on Claude but XML-tag structure works better. Skipping the prompt restructure leaves quality on the table.

**Mistake 3: skipping explicit cache breakpoints.** The 'Claude is cheaper' claim depends on caching. Without explicit breakpoints, you may pay more per call than the OpenAI equivalent. Set up caching as part of the migration, not as a follow-up.

**Mistake 4: not handling parallel tool calls.** Claude returns multiple `tool_use` blocks per turn when it makes parallel tool calls. Your tool runner needs to handle them concurrently and return all results in one user message.

**Mistake 5: keeping `max_tokens` set to OpenAI defaults.** Claude requires `max_tokens` to be explicit. Old code that relied on OpenAI's default (model max) needs to set a real ceiling — typically 500-2,000 for chat use cases. Don't set it too high; you'll pay for output you don't need.

5-phase migration plan

  1. 1

    Swap the SDK (one-day change)

    Replace `openai` package with `anthropic`. Change `chat.completions.create(...)` to `messages.create(...)`. Move system prompt from messages array to a top-level `system=` argument. Update response parsing from `resp.choices[0].message.content` to `resp.content[0].text`. Set `max_tokens` explicitly on every call.

    → Open the Code prompt builder (refactor mode)
  2. 2

    Restructure system prompts for XML tags

    Replace markdown headings (## Task, ## Context) with XML tags (<task>, <context>, <output_format>). Wrap few-shot examples in <example>...</example>. The structure switch typically lifts Claude accuracy by 7-12 points on complex prompts.

  3. 3

    Set up explicit prompt-caching breakpoints

    Mark the cacheable prefix (system prompt + tools + stable few-shot) with `cache_control: {type: 'ephemeral'}`. Use the 1-hour TTL for any prefix re-read within an hour — break-even is 2 hits, after which every read saves 90% on the cached portion.

  4. 4

    Translate tool-use schemas

    Flatten OpenAI's `{type: 'function', function: {...}}` to Claude's `{name, description, input_schema}`. Update tool result handling: Claude wants results as a `user` message with `tool_result` content blocks, not a separate `tool` role. Handle parallel tool calls (multiple `tool_use` blocks per turn).

  5. 5

    Run the cost-delta calculator on your actual workload

    Use the worked example math above with YOUR token counts. Confirm the migration is positive EV before cutting over. Migrate the premium and mid-tier workloads (gpt-5.5-pro → Opus 4.8, gpt-5.5 → Sonnet 4.6 + cache); keep the small/nano tier on OpenAI where it wins on price.

Frequently Asked Questions

How long does an OpenAI to Claude migration take?

SDK swap is typically 1-2 days for a single service. Prompt restructure for XML tags is another 2-3 days. Cache breakpoint setup is 1 day per pattern. Tool-use translation is 1-3 days depending on tool count. Eval parity testing is 3-5 days. Shadow traffic 5-7 days. Total: 2-3 weeks for a well-scoped service; longer for a multi-service migration.

Is Claude actually cheaper than OpenAI in 2026?

Depends on the tier. Claude Opus 4.8 ($5/$25) is dramatically cheaper than gpt-5.5-pro ($30/$180) — about 85% cheaper. Claude Sonnet 4.6 ($3/$15) is cheaper than gpt-5.5 ($5/$30) by ~40-50% even without caching, more with explicit cache breakpoints. Claude Haiku 4.5 ($1/$5) is *more expensive* than gpt-5.4-mini ($0.75/$4.50) by ~30%, and there's no Claude equivalent at gpt-5.4-nano's $0.20/$1.25. Migrate premium and mid-tier; keep nano-tier workloads on OpenAI.

Do I need to rewrite my prompts when migrating from GPT to Claude?

Strongly recommended. Markdown-structured prompts work on Claude but XML-tagged prompts work better — typically 7-12 percentage points of accuracy lift on classification tasks per our internal eval. Wrap distinct sections in <task>, <context>, <example>, <output_format> tags. Skipping this leaves quality on the table.

What's the difference between OpenAI prompt caching and Claude prompt caching?

OpenAI's cache is opportunistic and prefix-only — automatic if your system prompt is stable, you pay 10% of standard input on cache hits. Claude's cache is explicit — you mark breakpoints with `cache_control: {type: 'ephemeral'}`, pay 1.25x (5-min TTL) or 2x (1-hour TTL) on cache write, then 10% on read. Claude's explicit model is more powerful — you can cache any portion of input, not just the literal prefix.

How do I translate OpenAI tool definitions to Claude?

Flatten the schema. OpenAI uses `{type: 'function', function: {name, description, parameters}}`. Claude uses `{name, description, input_schema}` — same content, no enclosing `function` wrapper. The parameter schema (JSON Schema) is identical. Tool result handling differs: Claude returns `tool_use` content blocks and expects results as `tool_result` blocks in a user message.

Should I migrate everything from OpenAI or run both providers?

Most production teams end up running both. OpenAI wins on small/nano tier ($0.20-$0.75 input range — Claude has no equivalent). Claude wins on premium tier (Opus 4.8 vs gpt-5.5-pro) and mid-tier with caching. Adaptive routing — send each request to the cheapest provider that meets quality bar — is the production pattern at scale.

Does Claude support structured outputs like OpenAI's JSON mode?

Yes. Claude supports JSON output via tool use — define a tool with the desired output schema as `input_schema`, then prompt Claude to call it. Anthropic also exposes a direct JSON mode similar to OpenAI's. The tool-use pattern is more flexible and gives you per-field validation; the direct JSON mode is simpler for one-shot extraction.

Will my OpenAI rate-limit tier transfer to Anthropic?

No — Anthropic has its own rate-limit ladder independent of OpenAI's tiers. New accounts start at a low tier and graduate based on paid usage. Plan for a 2-4 week rate-limit ramp on Anthropic in parallel with your migration. Run shadow traffic at a lower rate during the ramp.

Migrate the SDK in a day. Migrate the prompts to keep quality.

A GPT-tuned prompt will under-deliver on Claude. Our AI Prompt Generator rewrites your OpenAI prompts for Claude (XML tags, cache-anchored, tool-use-ready) based on YOUR business + task. 14-day free trial, no card.

Browse all prompt tools →