By The DDH Team · Digital Dashboard Hub

OpenAI Structured Outputs Tutorial: JSON Mode, json_schema, and strict:true

Structured Outputs is not just json_object with a fancier name. This tutorial covers every mode, every edge case, real Pydantic and Zod code, the token-cost implications, and the gotchas that will bite you in production — so you can ship reliable JSON extraction today.

By DDH Research Team at Digital Dashboard Hub · Updated June 27, 2026

Getting consistent JSON out of a language model used to require a prayer and a retry loop. You'd set response_format to {"type": "json_object"}, write a system prompt that said 'respond only in JSON', and then handle the cases where the model added markdown fences, omitted required fields, invented extra keys, or just returned plain text when it felt like it.

OpenAI's Structured Outputs feature, released in August 2024 and expanded with GPT-5 and the updated gpt-4o family in 2025-2026, changes the contract fundamentally. When you supply a JSON Schema and set strict:true, the model's sampler is constrained at inference time so that a completed response is valid against your schema. Not 'usually valid'. Valid by construction. The model cannot emit a token sequence that violates the schema, because constrained decoding masks out invalid continuations before they are sampled. The two exceptions, refusals and truncation at max_tokens, are covered in the refusal section later in this tutorial.

That guarantee has real implications for how you design schemas, handle edge cases, think about token costs, and structure your application code. This tutorial walks through the full picture: the three response_format modes, real code with Pydantic and Zod, schema constraints that trip people up, token and pricing impact, refusal handling, and the operational checklist before you put this in production. For context on how Structured Outputs fits into a cost-efficient AI stack, see our AI Cost Optimization Checklist.

OpenAI response_format modes compared

| Mode | Guarantee | Schema required | Model support |
| --- | --- | --- | --- |
| text (default) | None (free-form text) | No | All models |
| json_object (legacy) | Valid JSON only, no schema enforcement | No (describe in prompt) | gpt-4o, gpt-4o-mini, gpt-4-turbo |
| json_schema with strict:false | Best-effort schema adherence, not guaranteed | Yes | gpt-4o (2024-08-06+), gpt-4o-mini, GPT-5 family |
| json_schema with strict:true | Valid against schema via constrained decoding | Yes (additionalProperties:false required) | gpt-4o (2024-08-06+), gpt-4o-mini, GPT-5 family |

Model support sourced from platform.openai.com/docs/guides/structured-outputs. GPT-5, gpt-4o, and gpt-4o-mini all support strict:true as of June 2026. Older gpt-4-turbo snapshots support json_object only.

The problem with legacy json_object mode

Before Structured Outputs, the standard approach was `response_format: {"type": "json_object"}`. This tells the model to emit valid JSON — but it says nothing about what that JSON should look like. The model is free to invent keys, omit required fields, nest objects differently than you expected, or emit an empty object `{}` when it has nothing to say.

In practice, json_object mode cuts parse errors to near zero — you rarely get a SyntaxError anymore. What you get instead are semantic errors: missing required fields, unexpected null values, arrays where you expected strings, extra keys your code doesn't handle. These require a full retry at full token cost, and they happen at rates of 2-8% per call in production depending on how clearly you describe the schema in your system prompt.

The retry cost is the hidden tax of json_object mode. Each failed parse means re-sending the full input context and paying for a second completion. At gpt-4o prices ($2.50/1M input, $10.00/1M output as of June 2026), a 5% retry rate on a 2k-token prompt adds about 5% to your effective per-call cost (1 / 0.95 ≈ 1.05), before counting the added latency. For high-volume extraction pipelines, that's real money. See the numbers in our AI Cost Optimization Checklist for the before/after math on switching to strict mode.
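The retry arithmetic is easy to check. The sketch below assumes something the article does not state explicitly: each attempt fails independently with the same probability and a failed attempt is repeated in full, so the expected number of attempts per successful call is 1 / (1 - p). The function name is illustrative.

```python
def expected_cost_multiplier(retry_rate: float) -> float:
    """Expected attempts per successful call, assuming each attempt fails
    independently with probability `retry_rate` and is retried in full."""
    if not 0 <= retry_rate < 1:
        raise ValueError("retry_rate must be in [0, 1)")
    return 1.0 / (1.0 - retry_rate)


# Overhead relative to a pipeline that never retries.
overhead = expected_cost_multiplier(0.05) - 1.0
print(f"{overhead:.1%}")  # 5.3%
```

At the 2-8% retry rates quoted above, the same formula gives an overhead of roughly 2% to 9%.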

The other problem with json_object is testing: because the schema lives in your system prompt as natural language, there's no machine-checkable contract between your prompt and your application code. When a field name changes, or you add a new required field, you're relying on the model to read your prose description correctly. That's a fragile guarantee — and it's exactly what Structured Outputs replaces.

How Structured Outputs actually works: constrained decoding

When you pass `response_format: {type: "json_schema", json_schema: {name: "...", strict: true, schema: {...}}}`, OpenAI converts your JSON Schema into a context-free grammar (CFG) that constrains which tokens the model can emit at each step. The sampler only considers tokens that could legally appear at the current position in the output given what's been emitted so far.

This is not post-hoc validation. The model cannot even start typing an invalid key name — if your schema says the object has exactly three keys (name, age, email), and the model wants to emit a fourth key, the constrained sampler simply cannot produce that token sequence. The guarantee is structural, not statistical.
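To make the masking idea concrete, here is a toy sketch. It is not how OpenAI implements constrained decoding: a real system compiles the schema into a grammar and uses context-dependent model scores, whereas this example enumerates the valid outputs explicitly and uses fixed, made-up token scores. All names and values are hypothetical.

```python
def legal_next_tokens(prefix: str, vocab: list[str], valid_outputs: list[str]) -> list[str]:
    """Tokens that keep `prefix + token` a prefix of at least one valid output."""
    return [
        tok for tok in vocab
        if any(out.startswith(prefix + tok) for out in valid_outputs)
    ]


def constrained_greedy_decode(scores: dict[str, float], valid_outputs: list[str]) -> str:
    """Greedy decoding where illegal tokens are masked out before picking."""
    vocab = list(scores)
    emitted = ""
    while emitted not in valid_outputs:
        legal = legal_next_tokens(emitted, vocab, valid_outputs)
        if not legal:
            raise RuntimeError("no legal continuation")
        # Pick the best-scoring token among the legal ones only.
        emitted += max(legal, key=lambda tok: scores[tok])
    return emitted


# The toy "model" would love to chat or add a confidence key,
# but neither continuation is legal under the toy "schema".
scores = {
    "Sure! ": 0.9,
    '{"confidence":': 0.8,
    '{"label":"': 0.5,
    "ham": 0.4,
    "spam": 0.3,
    '"}': 0.2,
}
valid_outputs = ['{"label":"spam"}', '{"label":"ham"}']
print(constrained_greedy_decode(scores, valid_outputs))  # {"label":"ham"}
```

The highest-scoring tokens never appear in the output because they are removed from the candidate set before the choice is made, which is the structural (not statistical) guarantee described above.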

The first call with a given schema incurs extra latency while OpenAI compiles the schema into a grammar on their side. OpenAI's guide describes this as typically under ten seconds for ordinary schemas and up to a minute for complex ones, so do not assume a cold start is negligible. Subsequent calls with the same schema hit a cache and have no additional overhead. This is why schema design patterns matter: stable, reusable schemas get cached; dynamically generated schemas re-compile every time.

One important implication: constrained decoding changes the model's probability distribution. It doesn't prevent the model from hallucinating content — a field that should contain a city name can still contain 'Atlantis'. What it prevents is structural hallucination: the model cannot add `{"confidence": 0.95}` if your schema doesn't include a confidence field. This distinction matters when you're designing schemas. Constrained decoding is a structural guarantee, not a factual one.

Making your first Structured Outputs call with the Python SDK

The OpenAI Python SDK (v1.40.0+) ships a `beta.chat.completions.parse()` method that handles schema serialization and response deserialization automatically. Here's a minimal working example using Pydantic:

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()


class ExtractedPerson(BaseModel):
    name: str
    age: int
    email: str | None


completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the person's details from the text."},
        {"role": "user", "content": "Hi, I'm Sarah Chen, 34 years old. Reach me at sarah@example.com."},
    ],
    response_format=ExtractedPerson,
)

person = completion.choices[0].message.parsed
print(person.name)   # "Sarah Chen"
print(person.age)    # 34
print(person.email)  # "sarah@example.com"
```

The SDK converts your Pydantic model to a JSON Schema with `additionalProperties: false` and `strict: true` automatically. The `.parsed` attribute returns a fully typed `ExtractedPerson` instance: no `json.loads()`, no dict access, no KeyError surprises.

If you want to call the raw completions API directly (not the `parse()` wrapper), you supply the schema manually:

```python
import json

# Reuses `client` from the previous example.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[...],  # elided in the original; pass your own message list
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "extracted_person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "email": {"type": ["string", "null"]},
                },
                "required": ["name", "age", "email"],
                "additionalProperties": False,  # Python's False, serialized as JSON false
            },
        },
    },
)

data = json.loads(response.choices[0].message.content)
```

Note `additionalProperties` set to false (`False` in the Python dict): this is *required* for strict mode. Without it, the API will return a 400 error. Also note that nullable fields must use `{"type": ["string", "null"]}` (a JSON Schema union), not `{"type": "string", "nullable": true}` (which is OpenAPI syntax, not JSON Schema).

For production use, the `parse()` method is strongly preferred. It handles the schema generation, the `additionalProperties: false` requirement, and the `refusal` field (more on that in the edge cases section). Raw API calls require you to handle all of this manually, and the failure modes are subtle.

Structured Outputs with TypeScript and Zod

The OpenAI Node.js SDK (v4.55.0+) supports a `zodResponseFormat()` helper that converts a Zod schema to the `response_format` object OpenAI expects. Here's the TypeScript equivalent of the Python example above:

```typescript
import OpenAI from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';

const client = new OpenAI();

const ExtractedPerson = z.object({
  name: z.string(),
  age: z.number().int(),
  email: z.string().email().nullable(),
});

const completion = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    { role: 'system', content: 'Extract the person details from the text.' },
    { role: 'user', content: 'Hi, I am Sarah Chen, 34 years old. Reach me at sarah@example.com.' },
  ],
  response_format: zodResponseFormat(ExtractedPerson, 'extracted_person'),
});

const person = completion.choices[0].message.parsed;
console.log(person?.name); // 'Sarah Chen'
console.log(person?.age);  // 34
```

The `zodResponseFormat()` helper sets `strict: true` and `additionalProperties: false` automatically. The `.parsed` attribute is typed as `ExtractedPerson | null`; it is null when the model returns a refusal (see the refusal section below).

One Zod-specific gotcha: Zod validators like `.email()`, `.url()`, `.min(3)`, or `.regex(/^[A-Z]/)` do NOT translate to JSON Schema constraints that the constrained sampler enforces. The sampler works from the schema, and the OpenAI SDK converts Zod to JSON Schema without these format constraints. If you need the model to emit a valid email format, you must enforce it in your application code after parsing rather than rely on the Zod validator to catch invalid model output. The same applies to Pydantic validators: `@field_validator` functions (`@validator` in Pydantic v1) run on your side, not the model's side.
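A post-parse check along these lines covers the constraints the sampler does not enforce. This is a minimal sketch using plain dicts and the standard library; the rules (a 3-character minimum name, an age range, a loose email pattern) are hypothetical business rules chosen for illustration, and the regex is deliberately simple rather than a full RFC 5322 validator.

```python
import re

# Loose pattern for illustration only: something@something.something
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_person(person: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if len(person["name"].strip()) < 3:
        problems.append("name shorter than 3 characters")
    if not 0 <= person["age"] <= 130:
        problems.append("age out of range")
    email = person["email"]  # nullable in the schema, so None is acceptable
    if email is not None and not EMAIL_RE.fullmatch(email):
        problems.append("email is not well-formed")
    return problems


print(validate_person({"name": "Sarah Chen", "age": 34, "email": "sarah@example.com"}))  # []
```

Because strict mode guarantees the keys and types, the function can index the dict directly and spend its effort on values only.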

For complex TypeScript types — nested objects, discriminated unions, recursive types — see function calling vs structured output for a detailed comparison of when each approach is more ergonomic. For pure data extraction, Structured Outputs with Zod is usually simpler; for multi-step agentic tool use, function calling is still the right model.

Nested objects, arrays, and discriminated unions

Structured Outputs supports nested schemas naturally. Here's a more complex example, a document extraction schema with nested arrays of line items:

```python
from typing import Literal

from pydantic import BaseModel


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price_cents: int
    category: Literal["hardware", "software", "services", "other"]


class Invoice(BaseModel):
    invoice_number: str
    vendor_name: str
    total_cents: int
    currency: Literal["USD", "EUR", "GBP", "CAD"]
    line_items: list[LineItem]
    notes: str | None
```

The `Literal` type maps to a JSON Schema `enum`, which the constrained sampler enforces perfectly: the model can only emit one of the specified string values for that field. This is extremely useful for classification tasks. Instead of prompting 'classify as hardware, software, services, or other and return the result in a JSON object', you just define the field type and the constraint is built into the sampler.

Discriminated unions require more care. If you have a schema where different object shapes are valid depending on a discriminator field, you need to use `anyOf` or `oneOf` in your JSON Schema. Pydantic handles this with `Union` types, but the schema it generates can be verbose. OpenAI's constrained sampler supports `anyOf`: it computes the union of legal tokens across all valid branches at each position. The root of the schema must still be a plain object, so wrap a top-level union in a single-property object. Performance is slightly worse than a single schema because the grammar is more complex, so prefer flat schemas over deeply nested discriminated unions when throughput matters.

Recursive types (e.g., a tree node that contains a list of tree nodes) are partially supported. OpenAI's documentation notes that recursive schemas require defining the recursion via a `$defs` reference — inline recursion is not supported. For most real-world use cases (expression trees, nested comment threads, org chart structures), `$defs`-based recursion works correctly. For deeply recursive structures (depth > 8-10), consider flattening the structure and reconstructing the tree on your side, since each nesting level increases schema compilation time and output token count. See structured output schema design patterns for flattening strategies.
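One way to flatten a recursive structure is to have the model emit a flat list of rows that point at their parent, then rebuild the tree locally. The sketch below assumes hypothetical field names (`id`, `parent_id`, `text`) and assumes the rows contain no cycles; neither comes from OpenAI's documentation.

```python
from collections import defaultdict


def build_tree(flat_nodes: list[dict]) -> list[dict]:
    """Rebuild nested nodes from flat rows with `id`, `parent_id`, and `text`.

    Rows whose `parent_id` is None become roots. Assumes no cycles.
    """
    children = defaultdict(list)
    for row in flat_nodes:
        children[row["parent_id"]].append(row)

    def attach(row: dict) -> dict:
        return {
            "id": row["id"],
            "text": row["text"],
            "children": [attach(child) for child in children[row["id"]]],
        }

    return [attach(root) for root in children[None]]


rows = [
    {"id": 1, "parent_id": None, "text": "Top-level comment"},
    {"id": 2, "parent_id": 1, "text": "First reply"},
    {"id": 3, "parent_id": 1, "text": "Second reply"},
    {"id": 4, "parent_id": 2, "text": "Reply to the first reply"},
]
tree = build_tree(rows)
print(tree[0]["children"][0]["children"][0]["text"])  # Reply to the first reply
```

The flat schema is a single array of identical objects with a nullable `parent_id`, so its nesting depth stays constant no matter how deep the reconstructed tree is.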

JSON Schema keywords that strict mode does NOT support

Not all JSON Schema keywords work with `strict: true`. OpenAI's constrained sampler supports a specific subset of JSON Schema Draft 2020-12. The following keywords are NOT supported in strict mode as of June 2026 (sourced from platform.openai.com/docs/guides/structured-outputs):

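**Unsupported keywords:** `minLength`, `maxLength`, `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`, `multipleOf`, `pattern`, `format`, `minItems`, `maxItems`, `uniqueItems`, `contains`, `not`, `if`/`then`/`else`, `$ref` to external schemas. Do not rely on any of these being enforced by the sampler. Depending on the keyword and the API version, an unsupported constraint is either rejected with a 400 error when the schema is submitted or accepted without being enforced. OpenAI has also widened the supported subset since launch, so confirm each keyword against the current guide before treating this list as final.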

**Supported keywords:** `type`, `properties`, `required`, `additionalProperties` (must be false), `enum`, `anyOf`, `oneOf`, `$defs` (for internal references), `items` (for arrays), `description`. The `description` field is especially important — it's not a constraint, but it's visible to the model during generation and influences what the model puts in the field. A field described as `"The ISO 4217 currency code, e.g. USD"` will produce more reliable output than one described as `"currency"`.

The practical implication: if you need length constraints on strings or count constraints on arrays, enforce them in your application code after parsing, not in the schema. For classification tasks, use `enum` instead of pattern constraints. For numeric ranges, validate post-parse. This is a real ergonomic gap compared to full JSON Schema validation, but it's intentional: supporting the full keyword set would make the constrained grammar extremely complex and slow to compile.
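A small linter can catch these schema problems before any request is sent. The sketch below is an assumption-laden illustration, not an official validator: the keyword set is copied from this article's unsupported list and should be adjusted to match the current OpenAI guide, and the function only walks `properties`, `items`, `anyOf`, and `$defs`.

```python
# Copied from the article's list; an assumption to verify against the current guide.
UNSUPPORTED_KEYWORDS = frozenset({
    "minLength", "maxLength", "minimum", "maximum", "exclusiveMinimum",
    "exclusiveMaximum", "multipleOf", "pattern", "format", "minItems",
    "maxItems", "uniqueItems", "contains", "not", "if", "then", "else",
})


def lint_strict_schema(schema: dict, path: str = "$") -> list[str]:
    """Collect strict-mode problems in `schema`, walking nested subschemas."""
    problems = [
        f"{path}: unsupported keyword {key!r}"
        for key in schema
        if key in UNSUPPORTED_KEYWORDS
    ]
    types = schema.get("type")
    if types == "object" or (isinstance(types, list) and "object" in types):
        properties = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        for name, subschema in properties.items():
            problems.extend(lint_strict_schema(subschema, f"{path}.{name}"))
    if isinstance(schema.get("items"), dict):
        problems.extend(lint_strict_schema(schema["items"], f"{path}[]"))
    for index, branch in enumerate(schema.get("anyOf", [])):
        problems.extend(lint_strict_schema(branch, f"{path}.anyOf[{index}]"))
    for name, subschema in schema.get("$defs", {}).items():
        problems.extend(lint_strict_schema(subschema, f"{path}.$defs.{name}"))
    return problems
```

Running this once at application startup and failing fast on a non-empty result turns a production 400 into a deploy-time error.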

Handling refusals in strict mode

The most surprising edge case in Structured Outputs: the model can refuse to populate your schema. When a user request violates OpenAI's content policy (or the model's safety training flags the request as harmful), the model cannot emit a schema-valid response — it needs to explain the refusal in natural language. OpenAI handles this by adding a `refusal` field to the message object alongside `content`.

When you use the `parse()` SDK method, a refusal shows up as `completion.choices[0].message.refusal` (a string) while `completion.choices[0].message.parsed` is `None` (`null` in TypeScript). You must check for refusals before accessing `.parsed`, otherwise you'll hit an `AttributeError` on `None` (or a null dereference) in production:

```python
message = completion.choices[0].message
if message.refusal:
    # The model refused; handle gracefully
    print(f"Model refused: {message.refusal}")
else:
    person = message.parsed
    print(person.name)
```

In TypeScript with Zod, the same pattern applies:

```typescript
const message = completion.choices[0].message;
if (message.refusal) {
  console.error('Model refused:', message.refusal);
} else if (message.parsed) {
  console.log(message.parsed.name);
}
```

Refusals are rare in non-adversarial use cases — they typically appear when users try to extract PII in prohibited contexts, generate harmful content structured as an object, or when the model encounters an edge case its safety training flags incorrectly. For internal tooling and data extraction pipelines with controlled inputs, you'll rarely see them. For user-facing applications where users can submit arbitrary text for extraction, build the refusal handler before launch.

A related edge case: `finish_reason: "length"`. If the model hits `max_tokens` before completing the JSON output, you get a truncated, invalid JSON string — even in strict mode. Strict mode guarantees validity given a complete response; it doesn't prevent truncation. Always set `max_tokens` generously enough for your schema's worst-case output, and handle `finish_reason: "length"` as a distinct error path in your retry logic.
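For raw API responses, the refusal and truncation paths can be separated before any JSON parsing happens. The sketch below works on a plain dict shaped like one entry of `choices` in a Chat Completions response; the status labels and the function name are hypothetical, and other finish reasons (such as content filtering) are not handled.

```python
import json


def classify_choice(choice: dict) -> tuple[str, object]:
    """Return (status, payload) for one `choices` entry given as a dict."""
    message = choice["message"]
    if message.get("refusal"):
        # Natural-language explanation, not schema-valid JSON.
        return "refusal", message["refusal"]
    if choice.get("finish_reason") == "length":
        # Output was cut off at max_tokens; the JSON is likely incomplete.
        return "truncated", message.get("content")
    return "ok", json.loads(message["content"])


status, payload = classify_choice({
    "finish_reason": "stop",
    "message": {"content": '{"name": "Sarah Chen", "age": 34, "email": null}', "refusal": None},
})
print(status, payload["name"])  # ok Sarah Chen
```

Keeping the three outcomes distinct makes it straightforward to retry truncations with a larger `max_tokens` while surfacing refusals to the user instead of retrying them.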

GPT-5, gpt-4o, and gpt-4o-mini: which model to use for extraction

All three current production models support Structured Outputs with `strict: true`. The right choice depends on schema complexity and accuracy requirements. As of June 2026 pricing from platform.openai.com/pricing:

**gpt-4o-mini** ($0.15/1M input, $0.60/1M output): Best for high-volume, simple schemas. Customer support ticket classification, sentiment tagging, entity extraction from short texts, form field population. Accuracy on simple 3-5 field schemas is excellent. Struggles with deeply nested schemas (5+ levels of nesting) and ambiguous field semantics — if your schema requires judgment calls about which bucket a piece of data falls into, mini often makes the wrong call.

**gpt-4o** ($2.50/1M input, $10.00/1M output): The workhorse for production extraction. Handles complex schemas reliably, understands field descriptions and uses them correctly, makes better semantic judgments on ambiguous input. For most teams running extraction pipelines, gpt-4o hits the right accuracy/cost tradeoff. If you're choosing between models for a new extraction task, start here and down-tier to mini only after running an accuracy eval on 100+ examples.

**GPT-5** (pricing varies by tier — $5-15/1M input, $15-30/1M output at standard tier): Necessary for extraction tasks that require deep reasoning — legal document analysis, scientific paper extraction, complex financial data normalization, multi-document entity resolution. The accuracy gap between GPT-5 and gpt-4o is significant on complex schemas; the cost gap is 5-10x. Use GPT-5 for high-value, low-volume extraction; use gpt-4o for production volume. For more on model selection across tasks, see choosing an LLM for production.

Model tiering for structured extraction is one of the highest-leverage cost optimizations available. A common pattern: run GPT-5 on your eval set to establish a gold-standard baseline, then test gpt-4o and gpt-4o-mini against that baseline. If mini passes >95% of your quality bar, ship mini. If not, ship gpt-4o. Re-evaluate quarterly as model quality improves. For the full tiering methodology, see AI Cost Optimization Checklist.
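The tiering rule above can be written down directly. In this sketch, "quality" is simplified to exact agreement with the gold-standard output, which is an assumption; real evals usually score field by field. The function names and the example pass rates are made up for illustration.

```python
def agreement_rate(candidate: list[dict], gold: list[dict]) -> float:
    """Fraction of examples where the candidate output equals the gold output."""
    if not gold or len(candidate) != len(gold):
        raise ValueError("candidate and gold must be non-empty and the same length")
    return sum(c == g for c, g in zip(candidate, gold)) / len(gold)


def pick_cheapest_passing(pass_rates: dict[str, float],
                          cheapest_first: list[str],
                          threshold: float = 0.95) -> str:
    """Return the cheapest model above the threshold, else the last (priciest) one."""
    for model in cheapest_first:
        if pass_rates[model] > threshold:
            return model
    return cheapest_first[-1]


rates = {"gpt-4o-mini": 0.93, "gpt-4o": 0.98}  # made-up eval results
print(pick_cheapest_passing(rates, ["gpt-4o-mini", "gpt-4o"]))  # gpt-4o
```

Re-running the same two functions each quarter against a refreshed eval set is enough to tell whether a cheaper tier has caught up.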

Token cost implications of Structured Outputs vs json_object

Switching from `json_object` to `json_schema` with `strict: true` affects your token bill in several ways — some positive, some negative, and the net effect depends on your use case.

**Input token cost: slightly higher.** The schema you pass in `response_format` counts as input tokens. A typical 5-field schema adds 100-300 input tokens per call. At gpt-4o pricing ($2.50/1M), that's $0.00025-$0.00075 per call, which is negligible unless you're running millions of calls per day. If schema input token cost matters at your volume, keep the schema and system prompt identical across calls so the shared prefix can benefit from prompt caching. Cached input tokens are billed at a discounted rate that varies by model, so check the pricing page for the model you use.

**Output token cost: lower in almost every case.** With `json_object`, models tend to emit explanatory preamble before the JSON ("Here is the extracted information:"), markdown fences (```json), trailing commentary, and verbose field names. With strict mode, the model is constrained to emit pure JSON from token 1. Across real production workloads, strict mode typically cuts output tokens 15-40% compared to json_object mode on the same tasks — because the model can't emit anything that isn't part of the JSON structure.

**Retry cost: eliminated.** As described above, json_object mode has a 2-8% retry rate due to schema mismatches. Strict mode eliminates structural retries entirely (refusals are a separate failure mode, not a schema mismatch). On high-volume pipelines, eliminating retries saves 2-8% of your total cost.

**Net effect:** For most workloads, switching from json_object to strict Structured Outputs reduces total token cost by 10-25% after accounting for schema overhead, output reduction, and retry elimination. This is a free optimization — same model, same quality, lower cost. To measure the impact on your specific workload, use our AI Prompt Cost Calculator to model the before/after token counts.
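A back-of-the-envelope model shows how the three effects combine. The workload numbers below are hypothetical (2,000 input tokens, 400 output tokens, a 5% retry rate before the switch, then 200 extra schema tokens and 25% fewer output tokens after it), and retries are modeled as independent full re-sends; only the gpt-4o prices come from the article.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    input_tokens: int
    output_tokens: int
    retry_rate: float  # probability that an attempt must be repeated in full


def cost_per_call(w: Workload, input_price_per_m: float, output_price_per_m: float) -> float:
    """Expected dollars per successful call, including retries."""
    one_attempt = (w.input_tokens * input_price_per_m
                   + w.output_tokens * output_price_per_m) / 1_000_000
    return one_attempt / (1.0 - w.retry_rate)


json_object = Workload(input_tokens=2000, output_tokens=400, retry_rate=0.05)
strict = Workload(input_tokens=2200, output_tokens=300, retry_rate=0.0)

before = cost_per_call(json_object, 2.50, 10.00)
after = cost_per_call(strict, 2.50, 10.00)
print(f"{1 - after / before:.1%}")  # 10.3%
```

With these inputs the saving lands at the low end of the 10-25% range; workloads with a higher output-to-input ratio or a higher retry rate move toward the top of it.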

Structured Outputs in function calling and tool use

Structured Outputs is not just for `response_format`. It also applies to the arguments of function/tool calls. When you define a tool with `strict: true` in the function definition, the model's function arguments are guaranteed to match the function's parameter schema:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "create_calendar_event",
            "description": "Create a new calendar event",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "start_time": {"type": "string", "description": "ISO 8601 datetime"},
                    "duration_minutes": {"type": "integer"},
                    "attendees": {
                        "type": "array",
                        "items": {"type": "string"},
                    },
                },
                "required": ["title", "start_time", "duration_minutes", "attendees"],
                "additionalProperties": False,  # Python's False, not JSON's false
            },
        },
    }
]
```

With `strict: true` on the function, you no longer need to check the structure of the arguments JSON before calling your function, because the schema constraint guarantees they match. Values still need checking: nothing forces `start_time` to be a real ISO 8601 datetime, since `description` is guidance rather than a constraint. For agentic systems where the model calls tools in a loop, this eliminates a major class of runtime errors. See function calling vs structured output for a full comparison of when to use each approach.
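On the receiving side, a strict tool call can be dispatched by unpacking the arguments straight into a handler. This is a sketch over a plain dict shaped like one `tool_calls` entry; the handler body, the `HANDLERS` registry, and the returned dict are hypothetical stand-ins for a real calendar integration.

```python
import json
from datetime import datetime


def create_calendar_event(title: str, start_time: str,
                          duration_minutes: int, attendees: list[str]) -> dict:
    """Hypothetical handler; a real one would call a calendar service."""
    # Strict mode guarantees a string here, not a valid datetime, so parse it.
    start = datetime.fromisoformat(start_time)
    return {"title": title, "start": start,
            "minutes": duration_minutes, "attendees": attendees}


HANDLERS = {"create_calendar_event": create_calendar_event}


def dispatch_tool_call(tool_call: dict) -> dict:
    """Run the handler named in one `tool_calls` entry (given as a plain dict)."""
    function = tool_call["function"]
    handler = HANDLERS[function["name"]]           # KeyError means an unknown tool
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    return handler(**arguments)


event = dispatch_tool_call({
    "id": "call_1",
    "type": "function",
    "function": {
        "name": "create_calendar_event",
        "arguments": json.dumps({
            "title": "Design review",
            "start_time": "2026-07-01T09:00:00",
            "duration_minutes": 45,
            "attendees": ["sarah@example.com"],
        }),
    },
})
print(event["start"].hour)  # 9
```

Keyword unpacking is safe here only because the strict schema fixes the set of argument names; the `fromisoformat` call is the value-level check that the schema cannot perform.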

One architectural consideration: if you're using `response_format` with `json_schema` to extract data AND the same model call needs to optionally call a tool, you can't combine both in a single call cleanly. The `response_format` setting applies to the final text response, while tool calls are a separate mechanism. For agentic pipelines that need both structured data extraction AND tool calls, the standard pattern is: first call with tools to gather information, then a second call with `response_format` to structure the gathered data. This adds latency but keeps each call's responsibility clear.

Production checklist before you ship

Before putting Structured Outputs in production, work through this checklist. These are the issues that trip up teams going from a working notebook to a reliable production system.

**Schema validation at startup.** Validate your schema against the OpenAI Structured Outputs schema requirements at application startup — before any user traffic hits. Common mistakes caught here: `additionalProperties` not set to false, use of unsupported keywords like `minLength`, missing `required` array for all properties. A 400 from the API on your first call in production is preventable.

**Refusal handling.** Every call that uses `parse()` must check `message.refusal` before accessing `message.parsed`. Missing this check is a null pointer exception waiting to happen. Add it now, before you need it.

**Finish reason handling.** Check `finish_reason === 'length'` on every response. If you hit it, either increase `max_tokens` or log it as a schema-sizing bug. A `max_tokens` that's just barely enough will fail on the 5% of inputs that produce longer-than-average output.

**Schema caching.** There are two caches to keep warm, and neither needs a special header. First, OpenAI's prompt caching applies automatically to repeated prompt prefixes (for prompts of 1,024 tokens or more), so keep the schema and system prompt identical across calls. Second, the schema compiles once per schema string on OpenAI's side, and identical schema strings hit that cache. If your code dynamically generates the schema (e.g., inserting field names from a database), it'll re-compile every time. Freeze the schema structure and pass field semantics through `description` fields instead.

**Accuracy eval before production.** Run 100+ representative inputs through your schema and manually verify the output on at least 50 of them. Common failure modes: model puts data in the wrong field when two fields have similar semantics, model emits 'N/A' or 'unknown' strings in nullable fields instead of null, model interprets an integer-typed field as a string count rather than a number. These are model-level failures, not schema failures — they require prompt changes or model upgrades, not schema changes.

**Monitor output token counts.** After shipping, watch the output token distribution for your extraction calls. A sudden increase in output tokens often signals that the model is being verbose inside string fields — padding field values with explanations. Tighten field descriptions to specify the expected format precisely.
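For the schema caching item in the checklist, one cheap guard is to fingerprint the schema you are about to send and count distinct fingerprints over time. This is a sketch, not an OpenAI feature: the helper names are hypothetical, and the idea that identical serialized schemas share a compilation cache is the article's claim rather than something this code verifies.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """SHA-256 of the schema's JSON serialization, preserving key order."""
    serialized = json.dumps(schema, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def count_distinct_schemas(schemas: list[dict]) -> int:
    """Number of different serialized schemas in a batch of requests."""
    return len({schema_fingerprint(schema) for schema in schemas})


def make_schema(field_name: str) -> dict:
    """Example of dynamic schema generation that can defeat caching."""
    return {
        "type": "object",
        "properties": {field_name: {"type": "string"}},
        "required": [field_name],
        "additionalProperties": False,
    }


batch = [make_schema("name"), make_schema("name"), make_schema("full_name")]
print(count_distinct_schemas(batch))  # 2
```

A count that grows with traffic instead of staying at a small constant is a sign that schemas are being generated per request.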

Structured Outputs vs alternatives: when to choose what

Structured Outputs with `json_schema` and `strict: true` is the right choice for the majority of production data extraction use cases — but it's not always the right tool. Here's when to use each approach:

**Use strict Structured Outputs when:** you need a guaranteed schema for downstream processing, your schema is stable across calls, you want to eliminate retry logic, you're using gpt-4o, gpt-4o-mini, or GPT-5. This covers ~80% of extraction use cases.

**Use json_object (legacy) when:** you need to support older model snapshots (gpt-4-turbo, gpt-3.5-turbo), your schema varies significantly call-to-call (e.g., the field list is determined at runtime from user input), or your schema uses JSON Schema keywords that aren't supported in strict mode (minLength, pattern, etc.) and you'd rather take the best-effort behavior than work around the limitation.

**Use function calling (tool use) with strict:true when:** you want the model to conditionally extract data (only call the function if there's relevant data to extract), you need the model to potentially call multiple different schemas in a single turn, or you're building an agent that both reasons and extracts data. Function calling lets the model choose not to extract if the input doesn't warrant it — `response_format` always forces an extraction attempt.

**Use post-processing validation (without strict mode) when:** you need the full JSON Schema keyword set for validation, your application already has strong validation infrastructure (e.g., a Pydantic model with custom validators), and a 2-5% retry rate is acceptable. Some teams find this simpler to reason about than the constrained sampler's limitations. For more on how these choices affect overall system quality, see how to get JSON output from LLMs for the full comparison.

The right architecture usually combines approaches: strict Structured Outputs for the extraction layer, Pydantic/Zod validators for business logic constraints (field length, regex format, value ranges), and monitoring for model-level accuracy. Strict mode handles structure; your application code handles semantics.

Frequently Asked Questions

What's the difference between json_object mode and json_schema strict mode?

json_object guarantees the response is valid JSON, but says nothing about its structure — the model can emit any keys, omit required fields, or invent extra fields. json_schema with strict:true uses constrained decoding to guarantee the output matches your exact schema: no extra keys, no missing required fields, no wrong types. The guarantee is structural (not factual — the model can still hallucinate content, just not structure).

Does strict:true cost more than json_object?

Slightly more on input tokens (the schema adds 100-300 tokens), but less overall. Output tokens drop 15-40% because the model emits pure JSON without preamble or markdown fences. Retry costs are eliminated. Net effect for most workloads: 10-25% lower total token cost compared to json_object with a retry loop.

Which models support Structured Outputs with strict:true?

As of June 2026: gpt-4o (2024-08-06 and later snapshots), gpt-4o-mini, and the full GPT-5 family. Older models like gpt-4-turbo and gpt-3.5-turbo support json_object mode only. Always check platform.openai.com/docs/guides/structured-outputs for the current model support list, as it expands with new model releases.

What happens if the model can't fill a required field?

With strict:true, the model must emit every required field and cannot omit any of them. If the input doesn't contain information for a required field, the model will typically emit a null value (for nullable fields), an empty string, or 'N/A' as a string. To handle this cleanly, make fields that may be absent nullable (`str | None` in Pydantic, `z.string().nullable()` in Zod), and add a description like 'null if not present in the input'. Then validate for 'N/A' strings in your application code.
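The 'N/A' cleanup mentioned in this answer can be a small normalization pass after parsing. The placeholder list below is an assumption about what models tend to emit, not an exhaustive or documented set, and the function name is illustrative.

```python
# Hypothetical placeholder strings to treat as "missing"; extend for your data.
PLACEHOLDERS = frozenset({"", "n/a", "na", "none", "null", "unknown"})


def normalize_placeholders(record: dict, nullable_fields: set[str]) -> dict:
    """Return a copy of `record` with placeholder strings replaced by None."""
    cleaned = dict(record)
    for field in nullable_fields:
        value = cleaned.get(field)
        if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
            cleaned[field] = None
    return cleaned


print(normalize_placeholders({"name": "Sarah Chen", "email": "N/A"}, {"email"}))
# {'name': 'Sarah Chen', 'email': None}
```

Only the fields you list as nullable are touched, so a legitimate value such as a company literally named "Unknown" in a non-nullable field is left alone.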

Can I use $ref to reference an external schema?

No — external $ref is not supported in strict mode. You can use $defs for internal schema references (self-contained within the schema object), which is sufficient for most use cases including recursive types. If your schema currently relies on external $ref to a shared schema registry, you'll need to inline the referenced definitions.

How do I handle a model refusal in strict mode?

When the model refuses (due to content policy or safety training), the response has message.refusal set to a string explanation and message.parsed (or message.content) set to null. With the Python or Node SDK parse() method, check message.refusal before accessing message.parsed. If you're using the raw API, check message.refusal first, then confirm that finish_reason is 'stop' and message.content is not null before parsing.

Will Structured Outputs break if I add a new field to my schema?

In strict mode every property must appear in the required array, so there is no truly optional field; a field that may be absent is modeled as nullable instead. Adding a new nullable field is backward compatible for consumers that ignore unknown keys, because the field always appears in the output, possibly as null. It is a breaking change for consumers that reject unknown keys or deserialize into exhaustive typed models. The API itself won't break either way: the schema will compile and run. The schema is compiled fresh (or from cache) on each unique schema string, so you don't need to notify OpenAI of schema changes.

Is there a schema size limit?

Yes. OpenAI caps the total number of object properties across all nested objects and the depth of nesting, and schemas exceeding the caps are rejected with a 400 error. The caps at launch were 100 properties and 5 levels of nesting; they have been raised since, so check the current guide for the numbers that apply to your model. For complex extraction tasks that would exceed these limits, break the extraction into multiple calls with simpler schemas and combine the results in your application code.
