The 7 dimensions and what each scores
**Dimension 1 — Specificity (1–5).** Does the output address THIS specific case, or could it have been written for any similar input? Score 5 if the output uses concrete details from the input (audience, context, examples). Score 1 if the output is generic and would fit any input in the category. Most weak LLM outputs score 2–3 here, so raising this dimension alone produces a large lift in overall quality.
**Dimension 2 — Audience-appropriateness (1–5).** Do the language, examples, and level of complexity match the defined audience? Score 5 if a member of the target audience would feel addressed precisely. Score 1 if the output assumes the wrong expertise level or uses inappropriate references. Common failure: writing for 'beginners' but using jargon a beginner wouldn't recognize.
**Dimension 3 — Format adherence (1–5).** Does the output match the requested structure (length, sections, bullet vs. paragraph, headings, tone)? Score 5 if the format is exactly what was specified. Score 1 if the output ignores format requirements entirely. This is the cheapest dimension to fix: LLMs respond well to format requirements that are stated clearly and specifically.
**Dimension 4 — Constraint compliance (1–5).** Does the output respect explicit constraints (forbidden words, required inclusions, word count, etc.)? Score 5 if every constraint is honored. Score 1 if multiple constraints are ignored. Specific constraints (avoid words X, Y, Z) work better than vague constraints (avoid clichés); your prompt's constraint quality affects this score.
**Dimension 5 — Coherence (1–5).** Does the output hang together — paragraphs flow, claims connect, sections support each other? Score 5 for tight logical flow. Score 1 for disconnected paragraphs that could be reordered without losing meaning. Coherence drops in long outputs; weak prompts produce weak coherence at length.
**Dimension 6 — Insight / non-obviousness (1–5).** Does the output say something beyond the first response a competent human would write? Score 5 if the output surfaces non-obvious connections or analysis. Score 1 if it's the most common surface-level response. This is the hardest dimension to score consistently because it's somewhat subjective; a useful anchor is to ask, 'Would a competent expert in this domain learn anything from this output?'
**Dimension 7 — Actionability (1–5).** If the output is meant to drive action (recommendations, advice, decisions), is the action clear and executable? Score 5 if the reader can act immediately on the output. Score 1 if the output describes a situation without pointing to specific next moves. Many LLM outputs score high on description and low on actionability; the gap is fixable in the prompt.
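To make the rubric concrete, here is a minimal sketch of how one scored output could be recorded and checked. The `RubricScore` class, its field names, and the `weakest` helper are hypothetical names chosen for illustration, and the example scores are made up. The only rule taken from the text is that each of the seven dimensions gets an integer from 1 to 5.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RubricScore:
    """One output scored on the seven dimensions, each an integer from 1 to 5."""

    specificity: int
    audience_appropriateness: int
    format_adherence: int
    constraint_compliance: int
    coherence: int
    insight: int
    actionability: int

    def __post_init__(self):
        # Reject anything outside the 1-5 scale (bools are excluded explicitly,
        # since Python treats True/False as integers).
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be an integer from 1 to 5, got {value!r}")

    def as_dict(self):
        """Return the scores as a dict, in rubric order."""
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def weakest(self):
        """Return the lowest-scoring dimension names (ties included), in rubric order."""
        scores = self.as_dict()
        low = min(scores.values())
        return [name for name, value in scores.items() if value == low]


# Hypothetical scores for a single output.
score = RubricScore(
    specificity=2,
    audience_appropriateness=4,
    format_adherence=5,
    constraint_compliance=4,
    coherence=4,
    insight=2,
    actionability=3,
)
print(score.weakest())  # ['specificity', 'insight']
```

Keeping the dimensions as separate fields rather than collapsing them into one total is a deliberate choice in this sketch: the lowest-scoring dimensions show which part of the prompt to revise first.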