What makes a model good at code review?
Code review is a reasoning task, not just pattern matching. The traits that matter most are: a **deep reasoning / thinking mode** to trace logic and catch subtle bugs; a **large context window** so the model can see the whole diff plus related files; reliable **tool use / function calling** to plug into your IDE, CI, or a PR bot; and **structured output** so findings come back as a consistent, parseable list. For the reasoning side, see our chain-of-thought prompting guide.
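To make "a consistent, parseable list" concrete, here is a minimal sketch of validating a model's review output. It assumes the model was asked to reply with a JSON array of findings. The field names (`file`, `line`, `severity`, `message`) and the severity levels are illustrative assumptions, not a standard schema.

```python
import json
from dataclasses import dataclass

# Hypothetical severity levels; adjust to whatever your PR bot expects.
ALLOWED_SEVERITIES = {"info", "warning", "error"}
REQUIRED_KEYS = {"file", "line", "severity", "message"}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str
    message: str


def parse_findings(raw: str) -> list[Finding]:
    """Parse a model reply that should be a JSON array of finding objects.

    Raises ValueError if the reply does not match the assumed shape, so the
    caller can retry or fall back instead of posting a malformed review.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of findings")

    findings = []
    for item in items:
        if not isinstance(item, dict) or not REQUIRED_KEYS <= item.keys():
            raise ValueError(f"finding is missing required keys: {item!r}")
        if item["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError(f"unknown severity: {item['severity']!r}")
        findings.append(
            Finding(
                file=str(item["file"]),
                line=int(item["line"]),
                severity=item["severity"],
                message=str(item["message"]),
            )
        )
    return findings
```

Rejecting anything that does not fit the schema keeps downstream tooling simple: a CI step or PR bot only ever sees well-formed `Finding` objects.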
Cost and latency matter too, because review runs often: on every commit or PR. A model that's brilliant but slow and expensive is the wrong choice for inline checks and the right one for a deep pre-merge pass. The practical answer is usually a **two-tier setup**: a cheap, fast model for routine checks and a flagship model for the hard review. To structure findings consistently, see structured output schema design patterns.
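The two-tier idea can be sketched as a small routing function. This is an illustration under stated assumptions: the model names are placeholders, and the escalation rules (large diffs or sensitive paths go to the flagship) and the 400-line threshold are hypothetical choices, not recommendations from a benchmark.

```python
from dataclasses import dataclass

# Placeholder identifiers; substitute the models you actually use.
FAST_MODEL = "fast-model"
FLAGSHIP_MODEL = "flagship-model"


@dataclass
class ReviewRequest:
    trigger: str  # assumed values: "inline", "commit", or "pre_merge"
    changed_lines: int
    touches_sensitive_paths: bool = False  # e.g. auth or billing code


def pick_model(req: ReviewRequest, max_fast_lines: int = 400) -> str:
    """Route routine checks to the fast tier and hard reviews to the flagship."""
    # The deep pre-merge pass always gets the flagship model.
    if req.trigger == "pre_merge":
        return FLAGSHIP_MODEL
    # Assumed escalation rule: big or risky changes are worth the extra cost.
    if req.touches_sensitive_paths or req.changed_lines > max_fast_lines:
        return FLAGSHIP_MODEL
    # Everything else is a routine check where cost and latency dominate.
    return FAST_MODEL
```

For example, `pick_model(ReviewRequest("commit", 20))` returns the fast tier, while the same request with `trigger="pre_merge"` returns the flagship.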