The workflow-vs-agent decision
**Workflow:** Code picks the next step. The LLM is called only at predefined points, and each call's output feeds a specific next step encoded in code. The control flow is fixed, so failures are reproducible + debuggable even though individual LLM outputs vary. Per Anthropic's "Building effective agents" guide (anthropic.com), most production LLM systems should be workflows.
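A minimal Python sketch of the workflow shape. It assumes a hypothetical `llm` callable (prompt string in, completion string out) standing in for whatever model SDK you use. The function body fixes the order of the two calls, and a plain `if` in code decides whether step 2 runs. `summarize_then_translate` and `make_fake_llm` are illustrative names, not part of any library.

```python
from typing import Callable

# Hypothetical stand-in for a model call: prompt in, completion out.
LLM = Callable[[str], str]


def summarize_then_translate(document: str, llm: LLM) -> dict[str, str]:
    """Two-step workflow: the code, not the model, fixes the order of steps."""
    # Step 1: first predefined LLM call point.
    summary = llm(f"Summarize:\n{document}")
    # Deterministic gate written in code: stop early on an empty result.
    if not summary.strip():
        raise ValueError("empty summary; not running step 2")
    # Step 2: step 1's output feeds a specific, hard-coded next call.
    translation = llm(f"Translate to French:\n{summary}")
    return {"summary": summary, "translation": translation}


def make_fake_llm(log: list[str]) -> LLM:
    """Deterministic stub that records which step ran, for testing the flow."""
    def fake(prompt: str) -> str:
        step = prompt.split(":", 1)[0]  # e.g. "Summarize"
        log.append(step)
        return f"output of {step}"
    return fake


calls: list[str] = []
result = summarize_then_translate("Some long document.", make_fake_llm(calls))
print(calls)  # ['Summarize', 'Translate to French']
```

Because the stub is deterministic, the call order can be asserted in an ordinary unit test. With a real model the outputs vary, but the sequence of steps does not.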
**Agent:** The LLM picks the next step. It has tool access + a goal + the autonomy to decide what to do next. Because the model chooses the path at runtime, failure modes are non-deterministic. Per the Anthropic guide, agents trade reliability + cost for flexibility, so use one only when that flexibility is necessary.
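For contrast, a minimal agent loop, again in Python and again with hypothetical names (`run_agent`, `Action`, `scripted_policy`). The `policy` argument stands in for an LLM call that reads the goal plus the transcript so far and returns the next action; here it is scripted so the example runs without a model. Note the `max_steps` budget: once the code no longer fixes the path, it has to cap it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    tool: str  # tool name, or "finish" to stop
    arg: str   # tool input, or the final answer when tool == "finish"


# Hypothetical: in a real agent this is an LLM call that sees the goal and
# the transcript so far and returns its chosen next action.
Policy = Callable[[str, list[str]], Action]
Tool = Callable[[str], str]


def run_agent(goal: str, policy: Policy, tools: dict[str, Tool],
              max_steps: int = 10) -> tuple[str, list[str]]:
    history: list[str] = []
    for _ in range(max_steps):
        action = policy(goal, history)  # the model, not the code, picks the step
        if action.tool == "finish":
            return action.arg, history
        if action.tool not in tools:
            # Feed the mistake back so the model can recover on the next turn.
            history.append(f"error: unknown tool {action.tool!r}")
            continue
        result = tools[action.tool](action.arg)
        history.append(f"{action.tool}({action.arg}) -> {result}")
    # The path is unbounded in principle, so the code enforces a step budget.
    raise RuntimeError(f"no answer after {max_steps} steps")


def scripted_policy(goal: str, history: list[str]) -> Action:
    """Stand-in for the LLM: look something up once, then answer."""
    if not history:
        return Action("lookup", "capital of France")
    return Action("finish", history[-1].split(" -> ")[-1])


tools = {"lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown")}
answer, trace = run_agent("Find the capital of France", scripted_policy, tools)
print(answer)  # Paris
```

Every iteration passes the whole `history` back to the policy, which is where the "larger prompts" cost of agents comes from.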
**The decision rule:** Per Anthropic's guidance, and consistent with the production experience reflected in AutoGen's documentation (microsoft.github.io/autogen) and LangGraph's docs (langchain-ai.github.io/langgraph): choose a workflow when the task structure is known in advance; choose an agent when the path through the task can't be predicted.
**The trap:** Frameworks make agents easy to spin up (`Agent(tools=[...]).run(task)`), and that ease hides the cost. Agents are roughly 3–10× more expensive (more LLM calls + larger prompts), 2–5× slower (more sequential reasoning steps), and harder to debug (non-deterministic execution paths). Per arXiv research on multi-agent systems, 60–80% of "agent" use cases would have been better served by a workflow.
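The "more LLM calls + larger prompts" point can be made concrete with a back-of-envelope input-token model. This is a sketch under simplifying assumptions that are not from the source: each workflow call sends a fixed-size prompt, while each agent turn resends the base prompt plus the full transcript, which grows by a fixed number of tokens per turn. For K turns with base prompt P and T tokens added per turn, the agent's total is K·P + T·K(K−1)/2. All numbers below are invented for illustration.

```python
def workflow_input_tokens(calls: int, prompt_tokens: int) -> int:
    """Fixed pipeline: each predefined call sends roughly the same prompt size."""
    return calls * prompt_tokens


def agent_input_tokens(turns: int, base_prompt: int, tokens_per_turn: int) -> int:
    """Agent loop: turn i resends the base prompt plus i turns of transcript."""
    return sum(base_prompt + i * tokens_per_turn for i in range(turns))


# Hypothetical numbers, chosen only to illustrate the shape of the cost curve.
wf = workflow_input_tokens(calls=3, prompt_tokens=1_000)
ag = agent_input_tokens(turns=8, base_prompt=1_000, tokens_per_turn=500)
print(wf, ag, round(ag / wf, 1))  # 3000 22000 7.3
```

With these made-up inputs the agent consumes about 7.3× the input tokens of the workflow. The quadratic transcript term is what makes long agent runs disproportionately expensive; output tokens and latency are ignored here, and real ratios depend on your prompts, tool outputs, and turn counts.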