Why do models stop short?
Two reasons. First, every model has a maximum output length per response, a token cap on how much it writes in one turn, and that cap is far smaller than how much it can read. A model might accept a 1M-token context but still emit only a few thousand tokens per reply. Ask for more than fits and the response is cut off at the cap, often mid-sentence.
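One way to tell a capped reply from a finished one is to check the stop reason that comes back with the response. The sketch below is a minimal illustration, not a full client: was_truncated is a hypothetical helper, and it assumes you have already pulled the stop-reason string out of the response (the stop_reason field in Anthropic's Messages API, or finish_reason in OpenAI's Chat Completions API).

```python
from typing import Optional

# Values that signal the output cap was hit.
# "max_tokens" is the Anthropic Messages API stop_reason;
# "length" is the OpenAI Chat Completions finish_reason.
TRUNCATION_REASONS = {"max_tokens", "length"}


def was_truncated(stop_reason: Optional[str]) -> bool:
    """Return True if the reply ended because it ran into the output cap.

    Hypothetical helper: the caller extracts the stop-reason string
    from whatever response object their client library returns.
    """
    return stop_reason in TRUNCATION_REASONS
```

If this returns True, the text ended because of the cap rather than because the model was done, so the usual options are to raise the cap if the model allows it or to ask for the remainder in a follow-up turn.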
Second, models interpret soft length words conservatively. "Detailed," "thorough," and "comprehensive" don't pin down a number, so the model picks a safe, shorter length. The reliable fix is a concrete target: a word count, an item count, or a section count. As a rough conversion, per OpenAI and Anthropic docs, 1 token is about 4 characters or 0.75 words in English, so 1,500 words is roughly 1,500 / 0.75 = 2,000 tokens of output, which is well within a single reply on current models.
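The conversion above is easy to script when planning a length target. This is a rough sketch built on the 0.75 words-per-token rule of thumb from the text. The function names are hypothetical, and the 4,096-token cap in the usage note is an assumed value for illustration, not any particular model's limit.

```python
import math

# Rule of thumb from the text: 1 token is about 0.75 English words.
WORDS_PER_TOKEN = 0.75


def words_to_tokens(words: int) -> int:
    """Estimate the output tokens needed for a target word count (rounded up)."""
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_reply(words: int, max_output_tokens: int) -> bool:
    """Check whether a word-count target fits under a given output cap."""
    return words_to_tokens(words) <= max_output_tokens
```

For example, words_to_tokens(1500) gives 2000, so a 1,500-word target fits under an assumed 4,096-token cap, while a 6,000-word target (about 8,000 tokens) does not and would need to be split across turns. Treat the result as an estimate: real token counts vary with the tokenizer and the text.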