What GPT-5 actually is (and what changed from GPT-4o)
GPT-5 is OpenAI's first model to ship reasoning, multimodal input, and tool use as a unified, single-model surface rather than three separate endpoints. Where GPT-4o, o1, and o3-mini were three distinct API surfaces in 2024-2025, GPT-5 collapses them: a single `gpt-5` model ID with a `reasoning_effort` parameter (`minimal`, `low`, `medium`, `high`) that scales how many internal reasoning tokens the model burns before answering.
Practically, this means you no longer pick a 'chat model' vs a 'reasoning model'. You pick GPT-5 and dial reasoning effort to match the task. A classification call uses `reasoning_effort: minimal` and bills like GPT-4o. A code-synthesis or math-proof call uses `reasoning_effort: high` and burns several thousand reasoning tokens — billed at the output rate even though they're not returned in the response.
Vision is built in: pass an image URL or base64-encoded image in any user message and GPT-5 will analyze it. Function calling, parallel tool calls, structured outputs (force the model to return JSON conforming to a JSON Schema), and prompt caching are all on by default. The Responses API (`/v1/responses`) is OpenAI's recommended endpoint for new code; chat completions still works for backward compatibility.