What is self-consistency prompting?
Self-consistency is a decoding strategy layered on top of chain-of-thought. Instead of taking one reasoning chain produced with greedy decoding, you sample several independent chains for the exact same question (at a nonzero temperature, so the chains actually differ), then marginalize over the reasoning: discard the chains themselves, keep only their final answers, and choose whichever answer appears most often. The reasoning is a means to an end; the vote is on the conclusion.
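The sample-then-vote loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `sample_chain` and `extract_answer` are hypothetical callables standing in for a sampled model call and an answer parser, and the canned chains below are invented so the example runs without a model.

```python
from collections import Counter
from typing import Callable, Optional


def self_consistent_answer(
    question: str,
    sample_chain: Callable[[str], str],
    extract_answer: Callable[[str], Optional[str]],
    num_samples: int = 10,
) -> Optional[str]:
    """Sample several reasoning chains and return the most common final answer."""
    answers = []
    for _ in range(num_samples):
        chain = sample_chain(question)   # hypothetical: one sampled (non-greedy) generation
        answer = extract_answer(chain)   # hypothetical: parse the final answer from the text
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    # The reasoning text is discarded here; only the final answers are counted.
    return Counter(answers).most_common(1)[0][0]


# Stand-in for a model: three canned chains, one of which makes an arithmetic slip.
canned = iter([
    "3 + 4 = 7, then 7 * 2 = 14. Answer: 14",
    "Distribute first: 6 + 8 = 14. Answer: 14",
    "3 + 4 = 8, then 8 * 2 = 16. Answer: 16",
])


def fake_sample(question: str) -> str:
    return next(canned)


def last_answer(chain: str) -> Optional[str]:
    # Assumes each chain ends with "Answer: <value>"; returns None otherwise.
    _, sep, tail = chain.rpartition("Answer:")
    return tail.strip() if sep else None


result = self_consistent_answer("What is (3 + 4) * 2?", fake_sample, last_answer, num_samples=3)
print(result)
```

Two chains agree on 14 and one lands on 16, so the vote returns 14. Ties are broken here by whichever answer was seen first, which is an arbitrary choice of this sketch.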
The intuition from Wang et al. (2022) is that complex problems admit multiple valid reasoning paths that all land on the same correct answer, whereas flawed paths tend to scatter across a spread of different wrong answers. Aggregating across samples therefore amplifies the signal (the convergent correct answer) and dilutes the noise (the scattered wrong ones).
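A toy simulation can make this intuition concrete. Everything in it is an assumption chosen for illustration and not a result from the paper: each chain is treated as independent, it is correct with probability 0.5, and otherwise it picks uniformly among four distinct wrong answers.

```python
import random
from collections import Counter


def simulate(p_correct=0.5, num_wrong=4, num_samples=15, trials=2000, seed=0):
    """Compare single-chain accuracy with majority-vote accuracy in a toy model."""
    rng = random.Random(seed)
    wrong_answers = [f"wrong_{i}" for i in range(num_wrong)]

    def one_chain() -> str:
        # Assumed model: correct with probability p_correct, else a random wrong answer.
        if rng.random() < p_correct:
            return "correct"
        return rng.choice(wrong_answers)

    single_hits = 0
    vote_hits = 0
    for _ in range(trials):
        answers = [one_chain() for _ in range(num_samples)]
        single_hits += answers[0] == "correct"  # what a single sample would give
        vote_hits += Counter(answers).most_common(1)[0][0] == "correct"
    return single_hits / trials, vote_hits / trials


single_acc, vote_acc = simulate()
print(f"single chain: {single_acc:.2f}, majority vote: {vote_acc:.2f}")
```

Under these assumptions the correct answer collects about half of the votes while each wrong answer collects about an eighth, so the vote is right far more often than any single chain. The effect weakens if the wrong chains tend to agree with each other, which this toy model rules out by construction.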
It is sometimes called 'sample-and-vote' or 'sample-and-marginalize.' Crucially, it only works when the final answer is something you can compare and count — a number, a label, a choice — so that 'most common' is well defined.
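In practice, making answers comparable usually means normalizing them before counting, since the same value can be written in several ways. The sketch below assumes numeric answers; `normalize_numeric` is a hypothetical helper and the raw answers are invented for the example.

```python
import re
from collections import Counter
from typing import Optional


def normalize_numeric(raw: str) -> Optional[str]:
    """Map strings like '$1,200.00' and '1200' to one canonical form; None if no number."""
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    value = float(match.group().replace(",", ""))
    # Render whole numbers without a trailing '.0' so '1200' and '1200.00' compare equal.
    return str(int(value)) if value.is_integer() else str(value)


raw_answers = ["$1,200.00", "1200", "1200 dollars", "1,250", "not sure"]

# Without normalization every string is distinct, so no answer has a majority.
print(Counter(raw_answers).most_common(1)[0][1])

canonical = [a for a in map(normalize_numeric, raw_answers) if a is not None]
winner, votes = Counter(canonical).most_common(1)[0]
print(winner, votes)
```

Counted as raw strings, no answer gets more than one vote. After normalization three of the five samples agree on 1200, and the unparseable one is dropped. Free-form answers such as essays or explanations have no obvious canonical form, which is why plain majority voting does not transfer to them directly.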