
Structured Output Schema Design 2026: Production Patterns + The 5 Schemas That Break LLMs

Getting an LLM to reliably emit valid JSON takes more than 'respond in JSON'. This guide covers the 2026 patterns: response_format, JSON Schema, Zod/Pydantic validation, and schema design that minimizes hallucination, plus the 5 anti-patterns.

By Andy Gaber, Founder, Digital Dashboard Hub

The major LLM providers now support schema-constrained JSON output, as documented in OpenAI's structured outputs guide at platform.openai.com, Anthropic's tool use docs at docs.anthropic.com, and Google's Gemini structured output docs at ai.google.dev. Where constrained decoding is enabled, the response is constrained to match the supplied schema, and Pydantic (docs.pydantic.dev), Zod (zod.dev), and the JSON Schema specification (json-schema.org) cover validation on the application side. The infrastructure is largely solved.

What isn't solved: schema design. The same LLM with the same provider can reach 99% accuracy on a well-designed schema and only 60% on a poorly designed one. Per Instructor at jxnl.github.io/instructor, schema design is a bigger production lever than provider choice.

Below: the 4 schema design principles, the 5 anti-patterns that break production, validation patterns, and provider comparison. Sources include OpenAI structured outputs at platform.openai.com, Anthropic at docs.anthropic.com, Google Gemini at ai.google.dev, Pydantic at docs.pydantic.dev, Zod at zod.dev, Instructor at jxnl.github.io/instructor, JSON Schema at json-schema.org, and arxiv research on constrained decoding at arxiv.org.

Structured output: provider comparison + validation library matrix

| Provider / library | Mechanism | Validation library | Notes |
|---|---|---|---|
| OpenAI | `response_format` with `json_schema` mode (strict) | Zod / Pydantic compatible | Strict mode requires every field to be listed in `required`, plus `additionalProperties: false` |
| Anthropic | Tool use with `input_schema` (JSON Schema) | Zod / Pydantic compatible | Tool use IS the structured output mechanism |
| Google Gemini | `response_mime_type` + `response_schema` | Pydantic compatible | Schema constraint enforced server-side |
| Instructor (library, multi-provider) | Pydantic-first abstraction | Pydantic native | Adds retry + validation on top of provider APIs |
| Pydantic AI (framework) | Pydantic-native agent + structured output | Pydantic native | Type-safe agent framework |

Provider references: [OpenAI structured outputs at platform.openai.com](https://platform.openai.com/docs/guides/structured-outputs), [Anthropic tool use at docs.anthropic.com](https://docs.anthropic.com/en/docs/build-with-claude/tool-use), [Google Gemini at ai.google.dev](https://ai.google.dev/). Validation: [Pydantic at docs.pydantic.dev](https://docs.pydantic.dev/), [Zod at zod.dev](https://zod.dev/), [JSON Schema at json-schema.org](https://json-schema.org/). Library: [Instructor at jxnl.github.io/instructor](https://jxnl.github.io/instructor/).
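
To make the mechanisms in the table concrete, here is a minimal sketch of how one JSON Schema could be wrapped for each provider's request. The parameter names follow the provider docs linked above; the `ticket_triage` name, the schema fields, and the three helper functions are hypothetical, and no network call is made. Dropping `additionalProperties` for Gemini is an assumption about its schema subset, so check the current docs before relying on it.

```python
# Hypothetical schema used only for illustration.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "ticket_summary": {"type": "string"},
        "customer_priority": {"type": "string", "enum": ["high", "medium", "low"]},
    },
    "required": ["ticket_summary", "customer_priority"],
    "additionalProperties": False,
}


def openai_response_format(name, schema):
    """Shape of the `response_format` argument for OpenAI's strict json_schema mode."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def anthropic_tool_args(name, description, schema):
    """Shape of Anthropic's `tools` / `tool_choice` arguments; the schema goes in `input_schema`."""
    return {
        "tools": [{"name": name, "description": description, "input_schema": schema}],
        # Forcing this tool makes the model answer through the schema.
        "tool_choice": {"type": "tool", "name": name},
    }


def gemini_generation_config(schema):
    """Shape of Gemini's generation config for JSON output.

    Assumption: `response_schema` accepts only a subset of JSON Schema,
    so this sketch drops the top-level `additionalProperties` key.
    """
    trimmed = {k: v for k, v in schema.items() if k != "additionalProperties"}
    return {"response_mime_type": "application/json", "response_schema": trimmed}


openai_args = openai_response_format("ticket_triage", TICKET_SCHEMA)
anthropic_args = anthropic_tool_args("ticket_triage", "Triage one support ticket.", TICKET_SCHEMA)
gemini_args = gemini_generation_config(TICKET_SCHEMA)
```

The point of the sketch is that the schema itself is written once; only the envelope around it differs per provider.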

Principle 1 — Flat schemas beat nested schemas

**The pattern:** A flat schema with 5 top-level fields outperforms a nested schema with 5 fields organized 3 levels deep, even when the information content is identical.

**Why:** Per Instructor at jxnl.github.io/instructor and Pydantic's documentation at docs.pydantic.dev, nested schemas force the LLM to track multiple levels of context at once. Flat schemas reduce that load and improve completion accuracy. Field accuracy on flat schemas is typically 10-25% higher than on equivalent nested ones.

**The fix:** Flatten where possible. `{user_name, user_email, user_signup_date}` beats `{user: {name, email, signup_date}}`. The minor verbosity loss is worth the accuracy gain.

**The exception:** When nesting genuinely models real structure (e.g., line items in an invoice — each line item has product + quantity + price as a unit), keep the nesting. Don't flatten into `line_item_1_product`, `line_item_1_quantity`, `line_item_1_price` — that's worse.
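
A minimal sketch of the idea, assuming the application still wants a nested `user` object internally: ask the model for the flat shape, then regroup the prefixed keys in code. The `regroup` helper and the sample values are hypothetical.

```python
# Flat schema requested from the model: three top-level fields, no nesting.
FLAT_USER_SCHEMA = {
    "type": "object",
    "properties": {
        "user_name": {"type": "string"},
        "user_email": {"type": "string"},
        "user_signup_date": {"type": "string"},
    },
    "required": ["user_name", "user_email", "user_signup_date"],
    "additionalProperties": False,
}


def regroup(flat, prefix):
    """Rebuild a nested object from flat keys that share `prefix_`."""
    marker = prefix + "_"
    return {
        key[len(marker):]: value
        for key, value in flat.items()
        if key.startswith(marker)
    }


# Pretend this dict is the parsed model output.
flat_output = {
    "user_name": "Ada",
    "user_email": "ada@example.com",
    "user_signup_date": "2024-03-15",
}
user = regroup(flat_output, "user")
# user == {"name": "Ada", "email": "ada@example.com", "signup_date": "2024-03-15"}
```

Regrouping after the fact keeps the nesting where it is cheap (application code) and removes it where it costs accuracy (generation). For genuinely repeating units such as invoice line items, an array of small objects remains the better fit.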


Principle 2 — Required fields, not optional

**The pattern:** Schemas with all-required fields outperform schemas with optional fields. The LLM works harder to fill required fields; it skips optional fields more easily.

**Per OpenAI structured outputs at platform.openai.com:** OpenAI's strict mode literally requires all fields in the schema to be marked required. The pattern is to make every field required and use explicit null / 'unknown' sentinel values instead of optional fields.

**Implementation:** Instead of optional `customer_phone`, use required `customer_phone` with type `string | null` and explicit instruction: 'Use null if phone not in source material.' Forces the LLM to consider every field; reduces silent omissions.

**Per Pydantic at docs.pydantic.dev and Zod at zod.dev:** both libraries support this pattern via explicit nullable types. The validation layer accepts null but tracks it explicitly rather than letting absence pass silently.
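
As a rough illustration, the conversion from optional fields to required-but-nullable fields can be done mechanically on a JSON Schema dict. The `make_all_required` helper is hypothetical and assumes every property declares a `type`; it only handles the top level of the schema.

```python
import copy

NULL_HINT = "Use null if not in the source material."


def make_all_required(schema):
    """Return a copy where every top-level property is required.

    Previously optional properties become nullable instead, so absence
    has to be stated explicitly as null.
    """
    strict = copy.deepcopy(schema)
    properties = strict.get("properties", {})
    already_required = set(strict.get("required", []))
    for name, spec in properties.items():
        if name in already_required:
            continue
        declared = spec["type"]  # assumption: each property declares a type
        types = list(declared) if isinstance(declared, list) else [declared]
        if "null" not in types:
            spec["type"] = types + ["null"]
        if "enum" in spec and None not in spec["enum"]:
            spec["enum"] = spec["enum"] + [None]
        spec["description"] = (spec.get("description", "") + " " + NULL_HINT).strip()
    strict["required"] = list(properties)
    strict["additionalProperties"] = False
    return strict


loose = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "customer_phone": {"type": "string", "description": "Phone number as written."},
    },
    "required": ["customer_name"],
}
strict = make_all_required(loose)
# strict["properties"]["customer_phone"]["type"] == ["string", "null"]
```

The original schema is left untouched, which makes it easy to keep the looser version for internal use and send only the strict one to the provider.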


Principle 3 — Enums beat free-text for categorical fields

**The pattern:** Fields that should be one of a small set of values should be typed as enums, not free-text strings. Per the JSON Schema specification at json-schema.org, `"enum": ["high", "medium", "low"]` declares those exact values as the only valid ones; the provider's constrained decoder or your validation layer then enforces that on LLM output.

**Why it matters:** Without enum constraint, LLMs generate variants — 'high', 'High', 'HIGH', 'High priority', 'h', 'top priority'. Each variant breaks downstream code that expects exact-match. Per Instructor at jxnl.github.io/instructor, enum-typed fields have near-100% downstream-code compatibility; free-text categorical fields have 60-85%.

**The provider-side enforcement:** Per OpenAI structured outputs at platform.openai.com and Anthropic's tool use at docs.anthropic.com, constrained-decoding implementations enforce enum membership at the token level, so invalid values are impossible by construction. This holds only when the provider's strict or constrained mode is actually enabled; otherwise keep an application-side enum check.

**The exception:** When the set of possible values is genuinely unbounded (e.g., user-supplied tag names), use string. When it's bounded but rapidly evolving (e.g., product categories), consider enum vs. validated-string trade-off.
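
A small sketch of the application-side half of this, using Python's standard `Enum`. The `Priority` class and the helper names are hypothetical; the variant list is the one quoted above.

```python
from enum import Enum


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def priority_field_schema():
    """JSON Schema fragment generated from the enum, so the two cannot drift apart."""
    return {"type": "string", "enum": [member.value for member in Priority]}


def parse_priority(raw):
    """Exact-match parse; raises ValueError for anything outside the enum."""
    return Priority(raw)


variants = ["high", "High", "HIGH", "High priority", "h", "top priority"]
accepted = []
for variant in variants:
    try:
        accepted.append(parse_priority(variant).value)
    except ValueError:
        pass  # every near-miss is rejected instead of leaking downstream
# accepted == ["high"]
```

Generating the schema fragment from the enum keeps a single source of truth: adding a member updates both the constraint sent to the provider and the parser used downstream.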


Principle 4 — Examples in the schema description, not as separate few-shot

**The pattern:** Each schema field's description should include 1-2 examples of valid + invalid values. Embedded examples outperform separate few-shot examples for structured output tasks.

**Why:** Per Pydantic at docs.pydantic.dev and Instructor at jxnl.github.io/instructor, the LLM sees the field description in the schema context during generation. Examples embedded there are referenced at the moment of decision. Separate few-shot examples are at higher distance + less referenced.

**Implementation:** Field `customer_priority` description: 'Priority level. Valid: high, medium, low. Examples: VIP customer = high. Standard subscriber = medium. Free trial = low.' The embedded examples disambiguate edge cases the field name + enum alone don't capture.

**Per Anthropic at docs.anthropic.com:** the JSON Schema `description` field is the primary input channel for guiding LLM behavior on each field. Treat it as a mini-prompt for that specific field.
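
One way to keep such descriptions consistent is to build them from data, so the examples can be checked against the enum before the schema ships. This is a sketch only; `describe_field` is a hypothetical helper, not part of any library.

```python
def describe_field(summary, valid, examples):
    """Build a field description with embedded examples.

    `examples` is a list of (situation, value) pairs; each value must be
    one of `valid`, otherwise the description would teach a wrong answer.
    """
    for situation, value in examples:
        if value not in valid:
            raise ValueError(f"example {situation!r} uses unknown value {value!r}")
    example_text = " ".join(f"{situation} = {value}." for situation, value in examples)
    return f"{summary.rstrip('.')}. Valid: {', '.join(valid)}. Examples: {example_text}"


PRIORITIES = ["high", "medium", "low"]

customer_priority = {
    "type": "string",
    "enum": PRIORITIES,
    "description": describe_field(
        "Priority level",
        PRIORITIES,
        [
            ("VIP customer", "high"),
            ("Standard subscriber", "medium"),
            ("Free trial", "low"),
        ],
    ),
}
```

The resulting description is the same text as the hand-written one above, but a typo in an example value now fails at build time rather than silently steering the model.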


The 5 anti-patterns that break structured output

**Anti-pattern 1 — Deeply nested optional fields.** Multi-level nesting with optional fields at each level. The LLM frequently omits entire branches of the tree. Per Instructor at jxnl.github.io/instructor, this is the most-common production failure.

**Anti-pattern 2 — Free-text where enum belongs.** When the field is typed as a plain string, 'Severity: low' and 'Severity: a bit low maybe' both pass schema validation, but downstream code breaks on the second. Per JSON Schema at json-schema.org, use enum constraints for categorical data.

**Anti-pattern 3 — Ambiguous field names.** `value`, `data`, `info`, `result` — the LLM has to guess what should go there. Per Pydantic at docs.pydantic.dev, descriptive names (`customer_satisfaction_score`, `support_ticket_summary`) reduce ambiguity-driven errors.

**Anti-pattern 4 — Date/time as free-text.** 'When was this signed?' → LLM produces 'last Tuesday' or 'mid-2024' or '2024-03-15'. Per OpenAI structured outputs at platform.openai.com, use ISO 8601 string with explicit format constraint or numeric epoch timestamp. Validate via Zod at zod.dev or Pydantic at docs.pydantic.dev datetime validators.

**Anti-pattern 5 — Numbers as strings without type validation.** Quantities, prices, percentages — if not typed as number, LLM may return '$5.99' or '5.99 dollars' or 'about 5'. Per Google's Gemini structured output at ai.google.dev and Anthropic at docs.anthropic.com, type as number; let the constrained decoder enforce.
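
Several of these anti-patterns can be caught before a schema ever reaches a model. The linter below is a rough sketch: the ambiguous-name set comes from the list above, while the name-suffix heuristics and the depth limit are assumptions chosen for illustration. Anti-pattern 2 is not covered, since deciding whether a string should have been an enum needs human judgment.

```python
AMBIGUOUS_NAMES = {"value", "data", "info", "result"}
DATE_SUFFIXES = ("_date", "_at", "_time")                           # heuristic, assumed
NUMERIC_SUFFIXES = ("_price", "_quantity", "_count", "_percent")    # heuristic, assumed


def lint_schema(schema, path="", depth=1, max_depth=2):
    """Return warnings for one object schema, recursing into nested objects."""
    warnings = []
    required = set(schema.get("required", []))
    for name, spec in schema.get("properties", {}).items():
        where = path + name
        declared = spec.get("type")
        types = declared if isinstance(declared, list) else [declared]
        if name in AMBIGUOUS_NAMES:
            warnings.append(f"{where}: ambiguous field name")
        if name not in required:
            warnings.append(f"{where}: optional; prefer required with explicit null")
        if "string" in types and "enum" not in spec:
            if name.endswith(DATE_SUFFIXES) and "format" not in spec:
                warnings.append(f"{where}: date/time as free text; add a format")
            if name.endswith(NUMERIC_SUFFIXES):
                warnings.append(f"{where}: quantity typed as string; use number")
        # For arrays, inspect the item schema; otherwise the field itself.
        child = spec.get("items", {}) if "array" in types else spec
        if child.get("type") == "object":
            if depth >= max_depth:
                warnings.append(f"{where}: nested more than {max_depth} levels deep")
            warnings.extend(lint_schema(child, where + ".", depth + 1, max_depth))
    return warnings


risky_schema = {
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {"signed_date": {"type": "string"}},
        },
        "total_price": {"type": "string"},
    },
    "required": ["total_price"],
}
findings = lint_schema(risky_schema)
```

Running a check like this in CI is cheap, and it turns the anti-pattern list into something enforced rather than remembered.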

Naive 'respond in JSON' prompts: Field omissions. Type drift. Enum variants ('high' vs. 'High' vs. 'top priority'). Date format chaos. Nested-field accuracy drops. Downstream code breaks on 15-40% of LLM responses. Retry logic everywhere.
Schema design principles + constrained decoding: Flat where possible. All-required fields with explicit null. Enums for categorical fields. Embedded examples per field. Numbers and datetimes typed properly. 95-99% downstream-compatible LLM responses. Retry logic mostly unused.

Design structured output schemas that don't break (4 steps)

  1. Use constrained decoding (response_format / tool_use, not prompt-only)

    Per OpenAI structured outputs at platform.openai.com, Anthropic tool use at docs.anthropic.com, and Google Gemini at ai.google.dev, modern providers enforce JSON Schema at token level. 'Respond in JSON' in the prompt without constrained decoding is a fallback pattern that fails 15-40% of the time.

  2. Flatten the schema; mark all fields required (use null sentinels)

    Per Instructor at jxnl.github.io/instructor and Pydantic at docs.pydantic.dev, flat schemas + all-required-with-null-allowed outperform nested + optional. Forces the LLM to consider every field; reduces silent omissions.

  3. Enum-type categorical fields; number/datetime-type quantitative fields

    Per JSON Schema at json-schema.org and Zod at zod.dev, categorical → enum, quantitative → number, temporal → ISO 8601 string or epoch number. Free-text-where-typed-belongs is the most-common downstream-code breaker.

  4. Embed examples in field descriptions (mini-prompts per field)

    Per Anthropic at docs.anthropic.com and Pydantic at docs.pydantic.dev, each field's description should include 1-2 examples. Embedded examples outperform separate few-shot for structured tasks. The schema description IS the prompt for each field.
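
Even with the four steps in place, a thin validation layer with a bounded retry is a sensible safety net. The sketch below uses only the standard library; in practice Pydantic or Zod would replace the hand-written checks. The field names, `validate_ticket`, `extract_with_retry`, and the `call_model` callable are all hypothetical stand-ins for your own schema and provider client.

```python
import json
from datetime import date

EXPECTED_FIELDS = {"customer_priority", "signed_date", "total_price", "customer_phone"}
PRIORITIES = ("high", "medium", "low")


def validate_ticket(raw):
    """Parse one model response and check types; raise ValueError with a reason."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != EXPECTED_FIELDS:
        raise ValueError(f"fields differ from schema: {sorted(set(data) ^ EXPECTED_FIELDS)}")
    if data["customer_priority"] not in PRIORITIES:
        raise ValueError("customer_priority must be high, medium, or low")
    if not isinstance(data["signed_date"], str):
        raise ValueError("signed_date must be an ISO 8601 date string")
    date.fromisoformat(data["signed_date"])  # ValueError on e.g. 'last Tuesday'
    price = data["total_price"]
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        raise ValueError("total_price must be a number")
    phone = data["customer_phone"]
    if phone is not None and not isinstance(phone, str):
        raise ValueError("customer_phone must be a string or null")
    return data


def extract_with_retry(call_model, prompt, max_attempts=3):
    """Call the model, validate, and retry with the error fed back into the prompt."""
    last_error = None
    for _ in range(max_attempts):
        full_prompt = prompt
        if last_error is not None:
            full_prompt = f"{prompt}\n\nPrevious output was rejected: {last_error}"
        try:
            return validate_ticket(call_model(full_prompt))
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

With constrained decoding enabled the retry branch should rarely run; it exists for the cases the decoder cannot see, such as a syntactically valid but nonsensical date.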

Where to start the structured output work

If you're using 'respond in JSON' prompts without constrained decoding: Switch to provider's constrained decoding. Per OpenAI at platform.openai.com, Anthropic at docs.anthropic.com, and Google at ai.google.dev, this is the largest single quality jump available. 'Prompt only' fails 15-40% in production.

If you have nested schemas with optional fields: Flatten where possible + mark all fields required with explicit null sentinels. Per Instructor at jxnl.github.io/instructor, 10-25% accuracy lift typical.

If you have free-text categorical fields: Convert to enums. Per JSON Schema at json-schema.org and Pydantic at docs.pydantic.dev, enum constraints eliminate nearly all downstream-code-breaking variants.

If you're building a multi-provider stack: Instructor at jxnl.github.io/instructor or Pydantic AI abstract the structured-output mechanism across providers. The Code Prompt Builder helps design schema descriptions + per-field examples that travel cleanly across providers.

Frequently Asked Questions

Is 'respond in JSON' enough to get reliable structured output?

No. Per OpenAI structured outputs at platform.openai.com, Anthropic at docs.anthropic.com, and Google at ai.google.dev, prompt-only JSON requests fail 15-40% in production due to type drift, omissions, and format variants. Modern providers offer constrained decoding (response_format / tool_use / response_schema) that enforces JSON Schema at the token level — this is the production move.

Why are flat schemas better than nested?

Per Instructor at jxnl.github.io/instructor and Pydantic at docs.pydantic.dev, nested schemas force the LLM to track multiple levels of context simultaneously. Flat schemas reduce cognitive load + improve completion accuracy. Field accuracy on flat schemas is typically 10-25% higher than equivalent nested ones. Exception: when nesting models real structure (line items in invoice), keep it.

Should categorical fields be enum or string?

Enum. Per JSON Schema at json-schema.org, enum constraint prevents the LLM from generating variants ('high' vs. 'High' vs. 'top priority'). Per Instructor at jxnl.github.io/instructor, enum-typed fields have near-100% downstream-code compatibility; free-text categorical fields have 60-85%. The provider-side constrained decoder enforces enum membership at token level.

How do I handle optional fields?

Per OpenAI structured outputs at platform.openai.com (strict mode literally requires all fields required), Pydantic at docs.pydantic.dev, and Zod at zod.dev, make every field required + use explicit null / 'unknown' sentinel values instead of optional fields. Forces the LLM to consider every field; reduces silent omissions.

Where do field examples go — in the prompt or in the schema?

In the schema descriptions. Per Pydantic at docs.pydantic.dev, Anthropic at docs.anthropic.com, and Instructor at jxnl.github.io/instructor, embedded examples in each field's description outperform separate few-shot examples for structured output. The LLM sees the field description in the schema context during generation; examples there are referenced at the moment of decision.

What's the most common structured output anti-pattern?

Per Instructor at jxnl.github.io/instructor, deeply nested optional fields. The LLM frequently omits entire branches of the tree when nesting is deep + fields are optional. Combined with free-text fields where enum belongs, these two patterns cause 60%+ of production structured-output failures. Fix: flatten + all-required-with-null + enum constraints.

Design schemas that produce reliable structured output — not retry-prone JSON.

The Code Prompt Builder helps design schema descriptions + per-field examples that travel cleanly across OpenAI, Anthropic, and Google providers. Free, no signup. Part of 40+ free prompt tools.
