Tasks where multi-shot reliably helps (there are many)
**Classification with non-obvious categories.** Use examples when the model must pick from a label set whose boundaries aren't self-evident from the label names. Example: classifying customer support tickets into 'urgent / standard / informational', where 'urgent' has a specific definition you want enforced. 2–5 examples per label dramatically improve consistency. Lift: 15–35% accuracy improvement on production classification tasks compared to zero-shot.
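A minimal sketch of how such a prompt might be assembled, assuming a plain-text "Ticket / Label" layout. The tickets, the working definition of 'urgent', and the names `Example`, `EXAMPLES`, and `build_classification_prompt` are hypothetical, chosen only to illustrate two examples per label:

```python
from typing import NamedTuple


class Example(NamedTuple):
    ticket: str
    label: str


LABELS = ("urgent", "standard", "informational")

# Hypothetical labeled tickets. Here "urgent" is assumed to mean
# "blocks revenue or access right now"; substitute your own definition.
EXAMPLES = [
    Example("Checkout returns a 500 error for every customer.", "urgent"),
    Example("Our admin account is locked and payroll runs today.", "urgent"),
    Example("How do I change the email address on my invoice?", "standard"),
    Example("The export button is slow on large reports.", "standard"),
    Example("FYI, we have moved offices; please update our address.", "informational"),
    Example("Thanks, the last update fixed our issue.", "informational"),
]


def build_classification_prompt(ticket: str, examples=EXAMPLES) -> str:
    """Return a few-shot prompt that ends where the model should write the label."""
    lines = ["Classify each support ticket as one of: " + ", ".join(LABELS) + ".", ""]
    for ex in examples:
        lines += [f"Ticket: {ex.ticket}", f"Label: {ex.label}", ""]
    # The unlabeled ticket goes last, in the same layout as the examples.
    lines += [f"Ticket: {ticket}", "Label:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_classification_prompt("Password reset emails are not arriving."))
```

The examples carry the label boundaries here: the instruction line only names the labels, and the two 'urgent' tickets show what counts as urgent.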
**Structured output matching a specific schema.** Use examples when the output must match a precise format (a specific JSON shape, markdown structure, or table layout). Examples teach the model the exact format better than text descriptions do. 2–3 examples are typically sufficient; 5 or more show minimal additional benefit. Lift: 25–60% structural compliance improvement vs. zero-shot.
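As a rough sketch, the examples can be serialized with the same code that later checks the model's output, so the demonstrated format and the enforced format cannot drift apart. The schema (`product`, `sentiment`, `issues`), the reviews, and the function names below are assumptions made for illustration:

```python
import json

# Hypothetical target schema: exactly these keys, with "issues" as a list.
REQUIRED_KEYS = {"product", "sentiment", "issues"}

# Two examples, in line with the 2-3 suggested above. The reviews are invented.
FEW_SHOT = [
    ("The X200 blender is loud but works great.",
     {"product": "X200 blender", "sentiment": "positive", "issues": ["noise"]}),
    ("My Aero kettle leaked and then stopped heating.",
     {"product": "Aero kettle", "sentiment": "negative", "issues": ["leak", "no heat"]}),
]


def build_extraction_prompt(review: str) -> str:
    """Show the exact JSON shape through examples, then leave the last slot open."""
    parts = ["Extract a JSON object from each review.", ""]
    for text, obj in FEW_SHOT:
        parts += [f"Review: {text}", f"JSON: {json.dumps(obj)}", ""]
    parts += [f"Review: {review}", "JSON:"]
    return "\n".join(parts)


def matches_schema(raw: str) -> bool:
    """Structural compliance check: parseable JSON object with the expected keys."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(obj, dict)
        and set(obj) == REQUIRED_KEYS
        and isinstance(obj["issues"], list)
    )
```

A check like `matches_schema` is also one way to measure structural compliance on your own data, by comparing the pass rate of zero-shot and few-shot prompts over the same inputs.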
**Brand voice / writing style replication.** Use examples when the output should sound like specific reference content (your existing copy, a particular author's voice, a domain-specific register). The model can describe a voice in the abstract but rarely matches it from a description alone; examples ground the abstraction. 3–5 examples are needed: fewer don't capture enough range, and more produce minimal incremental benefit. Lift: 20–40% style match improvement.
**Domain-specific reasoning with non-obvious patterns.** Use examples when the task requires a specific reasoning chain the model wouldn't produce by default. Examples that demonstrate the reasoning step by step (chain-of-thought few-shot) lift performance on complex reasoning tasks. Lift: 20–35% on math, multi-step logic, and structured analysis tasks.
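A small sketch of chain-of-thought few-shot, assuming a "Q / Reasoning / Answer" layout. The two worked problems and the helper names are hypothetical; the point is that each example shows the intermediate steps, not just the final answer:

```python
import re

# Hypothetical worked examples. Each one spells out the reasoning chain
# the model is expected to imitate before it commits to an answer.
COT_EXAMPLES = [
    {
        "question": "A plan costs $12 per month with a 25% annual discount. "
                    "What is the yearly price in dollars?",
        "reasoning": "12 months at $12 is $144. 25% of $144 is $36. "
                     "$144 minus $36 is $108.",
        "answer": "108",
    },
    {
        "question": "A queue holds 40 jobs. Workers finish 6 jobs per minute "
                    "while 2 new jobs arrive per minute. How many minutes "
                    "until the queue is empty?",
        "reasoning": "The queue shrinks by 6 minus 2, so 4 jobs per minute. "
                     "40 divided by 4 is 10.",
        "answer": "10",
    },
]


def build_cot_prompt(question: str) -> str:
    """End the prompt at 'Reasoning:' so the model writes its steps first."""
    blocks = []
    for ex in COT_EXAMPLES:
        blocks.append(
            f"Q: {ex['question']}\nReasoning: {ex['reasoning']}\nAnswer: {ex['answer']}"
        )
    blocks.append(f"Q: {question}\nReasoning:")
    return "\n\n".join(blocks)


def extract_answer(completion: str):
    """Return the text after the last 'Answer:' line, or None if there is none."""
    matches = re.findall(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None
```

Keeping the final answer on its own `Answer:` line makes it straightforward to parse while leaving the reasoning free-form.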
**Translation or transformation between unusual formats.** Use examples when converting between formats the model hasn't seen in volume during training (e.g., specific configuration file conversions, legacy data format parsing). Examples teach the format better than format descriptions do.
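For format conversion, the examples are simply input/output pairs. The sketch below assumes an invented pipe-delimited legacy record and an invented key-value target format; neither comes from a real system, and `PAIRS` and `build_transform_prompt` are hypothetical names:

```python
# Hypothetical legacy records (pipe-delimited KEY:VALUE) paired with the
# desired output. Both formats are made up for illustration.
PAIRS = [
    ("HOST:db01|PORT:5432|TLS:Y", 'host = "db01"\nport = 5432\ntls = true'),
    ("HOST:cache02|PORT:6379|TLS:N", 'host = "cache02"\nport = 6379\ntls = false'),
]


def build_transform_prompt(record: str, pairs=PAIRS) -> str:
    """Demonstrate the conversion with pairs, then present the new record."""
    blocks = ["Convert each legacy record to the new config format."]
    for src, dst in pairs:
        blocks.append(f"Input:\n{src}\nOutput:\n{dst}")
    # The final block stops at 'Output:' so the model completes the conversion.
    blocks.append(f"Input:\n{record}\nOutput:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_transform_prompt("HOST:web03|PORT:443|TLS:Y"))
```

Note that the pairs cover both `TLS:Y` and `TLS:N`, so the mapping to `true` and `false` is shown rather than described.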