The pragmatic guide to LLM agents in production
After shipping three agentic products to real users, here is the loop architecture that actually works — and the four things every demo gets wrong.
After eighteen months of shipping LLM-driven products into production — three live, two killed in beta — we have one strong opinion: the agent loop is not the interesting problem. The boring infrastructure around it is.
We have audited roughly 40 agent demos in the last year. They share a remarkably consistent set of architectural smells:
Most agent failures we shipped were not reasoning failures. They were the kind of plumbing bug any senior engineer would have caught in code review — if the code had been visible.
The pattern we have converged on is unglamorous. A planner produces a directed acyclic graph of typed subtasks. Each subtask is a pure function over the tool surface, with explicit pre- and postconditions. The executor walks the graph in topological order, persisting the result of every node to durable storage before the next node runs.
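To make the shape of that executor concrete, here is a minimal sketch rather than our production code. Every name in it (`Subtask`, `Results`, `topoOrder`, `execute`) is hypothetical, the subtasks are synchronous for brevity, and an in-memory `Map` stands in for durable storage. The ordering uses Kahn's algorithm, which is one common way to get a topological order.

```typescript
// Hypothetical types for illustration; none of these names come from a real library.
type Results = Map<string, unknown>;

interface Subtask {
  id: string;
  deps: string[];                      // ids that must complete first
  pre: (results: Results) => boolean;  // precondition over upstream results
  run: (results: Results) => unknown;  // the subtask body
  post: (output: unknown) => boolean;  // postcondition on the output
}

// Kahn's algorithm: every dependency appears before the subtasks that need it.
function topoOrder(tasks: Subtask[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    indegree.set(t.id, t.deps.length);
    dependents.set(t.id, []);
  }
  for (const t of tasks) {
    for (const d of t.deps) {
      const list = dependents.get(d);
      if (list === undefined) throw new Error(`unknown dependency: ${d}`);
      list.push(t.id);
    }
  }
  const ready = tasks.filter((t) => t.deps.length === 0).map((t) => t.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== tasks.length) throw new Error("plan contains a cycle");
  return order;
}

// Walks the graph, checking conditions and persisting each result before moving on.
function execute(tasks: Subtask[], store: Results): void {
  const byId = new Map(tasks.map((t) => [t.id, t] as const));
  for (const id of topoOrder(tasks)) {
    const task = byId.get(id) as Subtask;
    if (!task.pre(store)) throw new Error(`precondition failed: ${id}`);
    const output = task.run(store);
    if (!task.post(output)) throw new Error(`postcondition failed: ${id}`);
    store.set(id, output); // stand-in for a durable write
  }
}

// A two-node plan, deliberately listed out of order.
const plan: Subtask[] = [
  {
    id: "summarize",
    deps: ["fetch"],
    pre: (r) => r.has("fetch"),
    run: (r) => String(r.get("fetch")).toUpperCase(),
    post: (o) => typeof o === "string",
  },
  {
    id: "fetch",
    deps: [],
    pre: () => true,
    run: () => "raw text",
    post: (o) => typeof o === "string" && o.length > 0,
  },
];

const store: Results = new Map();
execute(plan, store);
```

The cycle check matters because a planner can emit a graph with no valid order. Failing loudly before any tool runs is cheaper than discovering the problem halfway through.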
Three properties fall out of this design for free. Idempotency: re-running the plan after a crash skips subtasks whose results are already persisted. Observability: each subtask is a row in a table, so auditing what the agent did is a query. Better LLM planning: the LLM gets better at planning when its outputs have schemas.
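The first two properties can be illustrated with a small sketch. The row shape, the `runPlan` function, and the simulated crash are all assumptions made for this example, and a plain array stands in for the database table.

```typescript
// Hypothetical row shape for a subtask-results table; an array stands in for the database.
interface SubtaskRow {
  planId: string;
  subtaskId: string;
  status: "done" | "failed";
  output: string;
}

type Step = { id: string; run: () => string };

// Runs steps in order, skipping any step that already has a "done" row for this plan.
function runPlan(planId: string, steps: Step[], table: SubtaskRow[]): string[] {
  const executed: string[] = [];
  for (const step of steps) {
    const alreadyDone = table.some(
      (row) => row.planId === planId && row.subtaskId === step.id && row.status === "done"
    );
    if (alreadyDone) continue; // completed in an earlier run
    try {
      const output = step.run();
      table.push({ planId, subtaskId: step.id, status: "done", output });
      executed.push(step.id);
    } catch (err) {
      // Record the failure so it shows up in the audit trail, then stop the run.
      table.push({ planId, subtaskId: step.id, status: "failed", output: String(err) });
      throw err;
    }
  }
  return executed;
}

// Simulate a crash in the second step on the first attempt only.
let crashOnce = true;
const steps: Step[] = [
  { id: "fetch", run: () => "raw" },
  {
    id: "parse",
    run: () => {
      if (crashOnce) {
        crashOnce = false;
        throw new Error("simulated crash");
      }
      return "parsed";
    },
  },
  { id: "report", run: () => "ok" },
];

const table: SubtaskRow[] = [];
try {
  runPlan("plan-1", steps, table);
} catch (_err) {
  // The first run died mid-plan; "fetch" is already persisted.
}
const secondRun = runPlan("plan-1", steps, table);

// Auditing is a filter over rows (a query, in a real database).
const doneIds = table.filter((r) => r.status === "done").map((r) => r.subtaskId);
const failedIds = table.filter((r) => r.status === "failed").map((r) => r.subtaskId);
```

On the second run only `parse` and `report` execute, because `fetch` already has a completed row. This assumes subtasks are safe to skip once their result is stored, which is what the pure-function constraint is meant to ensure.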
The most counterintuitive result from this year: dropping the natural-language tool descriptions in favour of strict TypeScript-derived JSON schemas raised our task success rate by 11.4 percentage points. Models prefer to be told the shape of what they are filling out.
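As an illustration of what such a schema might look like, here is a hedged sketch. The tool (`SearchOrdersArgs`), its fields, and the `validate` helper are hypothetical. The schema is written by hand here, whereas in practice it would be generated from the type, and the validator covers only the handful of JSON Schema keywords this example uses. A real system would use a full JSON Schema validator instead.

```typescript
// Hypothetical tool argument type; the names are illustrative only.
interface SearchOrdersArgs {
  customerId: string;
  status: "open" | "shipped" | "cancelled";
  limit: number;
}

// The subset of JSON Schema this sketch understands.
type PropSchema = { type: "string" | "number"; enum?: readonly string[] };
interface ObjectSchema {
  type: "object";
  properties: Record<string, PropSchema>;
  required: readonly string[];
  additionalProperties: false;
}

// JSON Schema matching SearchOrdersArgs, written by hand for this sketch.
const searchOrdersSchema: ObjectSchema = {
  type: "object",
  properties: {
    customerId: { type: "string" },
    status: { type: "string", enum: ["open", "shipped", "cancelled"] },
    limit: { type: "number" },
  },
  required: ["customerId", "status", "limit"],
  additionalProperties: false,
};

// Returns a list of problems; an empty list means the value fits the schema.
function validate(schema: ObjectSchema, value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected an object"];
  }
  const obj = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required) {
    if (!(key in obj)) errors.push(`missing property: ${key}`);
  }
  for (const key of Object.keys(obj)) {
    const prop = schema.properties[key];
    if (prop === undefined) {
      errors.push(`unexpected property: ${key}`);
      continue;
    }
    const v = obj[key];
    if (typeof v !== prop.type) {
      errors.push(`${key}: expected ${prop.type}`);
      continue;
    }
    if (prop.enum !== undefined && prop.enum.indexOf(v as string) === -1) {
      errors.push(`${key}: not an allowed value`);
    }
  }
  return errors;
}

// A well-formed call, and a model output that drifts from the schema.
const goodArgs: SearchOrdersArgs = { customerId: "c_42", status: "open", limit: 5 };
const goodErrors = validate(searchOrdersSchema, goodArgs);
const badErrors = validate(
  searchOrdersSchema,
  JSON.parse('{"customerId":"c_42","status":"pending","limit":"5"}')
);
```

With a schema in place, a malformed tool call becomes a specific, machine-checkable error that can be fed back to the model instead of a vague failure downstream.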