The pragmatic guide to LLM agents in production

After eighteen months of shipping LLM-driven products into production — three live, two killed in beta — we have one strong opinion: the agent loop is not the interesting problem. The boring infrastructure around it is.

The four mistakes every demo makes

We have audited roughly 40 agent demos in the last year. They share a remarkably consistent set of architectural smells:

Tools whose only schema is a free-text docstring
A single top-level loop with no notion of subtasks
No idempotency on side-effecting tools
State stored only in the model context window

Most of the agent failures we shipped were not reasoning failures. They were plumbing bugs of the kind any senior engineer would have caught in code review, if the code had been visible.
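To make the third item on the list concrete, here is a minimal sketch of an idempotent wrapper around a side-effecting tool. Everything in it is an assumption for illustration: the refund tool, the `callOnce` helper, and the key format are hypothetical, and the `Map` stands in for whatever durable store a real system would use.

```typescript
// Hypothetical side-effecting tool; names and shapes are illustrative only.
type RefundArgs = { orderId: string; amountCents: number };
type RefundResult = { refundId: string };

// Stand-in for a durable store keyed by idempotency key (assumption: a database table in practice).
const completedCalls = new Map<string, RefundResult>();
let sideEffectCount = 0;

function issueRefund(args: RefundArgs): RefundResult {
  sideEffectCount += 1; // the real side effect would happen here
  return { refundId: `refund-${args.orderId}-${sideEffectCount}` };
}

// Derive the key from the identity of the call, so a retry of the same step maps to the same key.
function idempotencyKey(runId: string, stepId: string): string {
  return `${runId}:${stepId}`;
}

function callOnce(key: string, args: RefundArgs): RefundResult {
  const previous = completedCalls.get(key);
  if (previous !== undefined) {
    return previous; // already executed: return the recorded result, do not repeat the side effect
  }
  const result = issueRefund(args);
  completedCalls.set(key, result);
  return result;
}

const refundKey = idempotencyKey("run-42", "step-3");
const firstCall = callOnce(refundKey, { orderId: "A1", amountCents: 500 });
const retriedCall = callOnce(refundKey, { orderId: "A1", amountCents: 500 });
```

In this sketch the retry returns the recorded result and the side effect runs once. A crash between the side effect and the write to the store would still duplicate the call, so in practice the key would usually also be passed to the downstream service.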

A loop that actually works in production

The pattern we have converged on is unglamorous. A planner produces a directed acyclic graph of typed subtasks. Each subtask is a pure function over the tool surface, with explicit pre/post conditions. The executor walks the graph in topological order, persisting the result of every node to durable storage before the next node runs.
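A rough sketch of that executor, under stated assumptions: the `Subtask` type, `runPlan`, and the example node ids are hypothetical, pre/post conditions are omitted, and a `Map` stands in for durable storage. The ordering uses Kahn's algorithm, which is one common way to get a topological order and not necessarily the one we run in production.

```typescript
// All names here (Subtask, runPlan, the example node ids) are hypothetical.
type Subtask = {
  id: string;
  dependsOn: string[];
  // Receives the persisted results of its dependencies and returns its own result.
  run: (inputs: Record<string, unknown>) => unknown;
};

// Kahn's algorithm: repeatedly emit nodes whose dependencies have all been emitted.
function topologicalOrder(tasks: Subtask[]): Subtask[] {
  const byId = new Map<string, Subtask>();
  for (const t of tasks) byId.set(t.id, t);
  const remaining = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    remaining.set(t.id, t.dependsOn.length);
    for (const dep of t.dependsOn) {
      if (!byId.has(dep)) throw new Error(`unknown dependency ${dep}`);
      dependents.set(dep, [...(dependents.get(dep) ?? []), t.id]);
    }
  }
  const ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const ordered: Subtask[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    ordered.push(byId.get(id) as Subtask);
    for (const next of dependents.get(id) ?? []) {
      const left = (remaining.get(next) ?? 0) - 1;
      remaining.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  if (ordered.length !== tasks.length) throw new Error("plan contains a cycle");
  return ordered;
}

// Stand-in for durable storage (assumption); a Map only survives as long as the process.
const resultStore = new Map<string, unknown>();

function runPlan(tasks: Subtask[]): string[] {
  const executed: string[] = [];
  for (const task of topologicalOrder(tasks)) {
    const inputs: Record<string, unknown> = {};
    for (const dep of task.dependsOn) inputs[dep] = resultStore.get(dep);
    const result = task.run(inputs);
    resultStore.set(task.id, result); // persist before the next node starts
    executed.push(task.id);
  }
  return executed;
}

// The plan is listed out of order on purpose; the executor sorts it.
const examplePlan: Subtask[] = [
  { id: "summarise", dependsOn: ["fetch"], run: (i) => `summary of ${String(i["fetch"])}` },
  { id: "fetch", dependsOn: [], run: () => "raw document" },
];
const executionOrder = runPlan(examplePlan);
```

Here `fetch` runs before `summarise` even though the plan lists them the other way round, and a plan with a cycle is rejected before any subtask executes.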

Three properties fall out of this design for free: idempotency, observability, and better LLM planning. Idempotency: re-running the plan after a crash skips completed subtasks. Observability: each subtask is a row in a table, so auditing what the agent did is a query. Better planning: the LLM plans better when its outputs have schemas.
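The first two properties can be illustrated with a small sketch. It assumes a hypothetical `SubtaskRow` shape and an in-memory array in place of a real table; the crash is simulated by a subtask that throws on its first call.

```typescript
// Hypothetical row shape; in practice this would be a table in durable storage.
type SubtaskRow = {
  runId: string;
  subtaskId: string;
  state: "done" | "failed";
  output: string;
};

const rows: SubtaskRow[] = []; // in-memory stand-in for that table (assumption)

function findDone(runId: string, subtaskId: string): SubtaskRow | undefined {
  return rows.find((r) => r.runId === runId && r.subtaskId === subtaskId && r.state === "done");
}

// Runs subtasks in the given order, skipping any that already have a "done" row.
function execute(runId: string, subtasks: Array<{ id: string; run: () => string }>): string[] {
  const ranNow: string[] = [];
  for (const subtask of subtasks) {
    if (findDone(runId, subtask.id) !== undefined) continue;
    try {
      const output = subtask.run();
      rows.push({ runId, subtaskId: subtask.id, state: "done", output });
      ranNow.push(subtask.id);
    } catch (err) {
      rows.push({ runId, subtaskId: subtask.id, state: "failed", output: String(err) });
      throw err;
    }
  }
  return ranNow;
}

let lookupCalls = 0;
let draftCalls = 0;
const steps = [
  { id: "lookup", run: () => { lookupCalls += 1; return "customer 17"; } },
  {
    id: "draft",
    run: () => {
      draftCalls += 1;
      if (draftCalls === 1) throw new Error("simulated crash");
      return "draft email";
    },
  },
];

try {
  execute("run-1", steps);
} catch {
  // the first attempt dies on "draft"
}
const resumed = execute("run-1", steps); // re-run the same plan
// Auditing is a filter over rows, the in-memory analogue of a query.
const auditTrail = rows
  .filter((r) => r.runId === "run-1")
  .map((r) => `${r.subtaskId}:${r.state}`);
```

On the second attempt only `draft` runs, because `lookup` already has a `done` row. The audit trail also keeps the failed attempt, which is the part that disappears when state lives only in the context window.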

Why schemas beat prose

The most counterintuitive result from this year: dropping the natural-language tool descriptions in favour of strict TypeScript-derived JSON schemas raised our task success rate by 11.4 percentage points. Models prefer to be told the shape of what they are filling out.
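As an illustration of what "strict" can mean here, the sketch below pairs a TypeScript interface with the JSON Schema one might derive from it, and rejects tool calls that do not match. The tool, its fields, and its limits are hypothetical, the schema is hand-written where a generator would normally produce it, and the validator is a minimal hand-rolled check for this one schema, not a general JSON Schema implementation.

```typescript
// Hypothetical tool arguments; names, fields, and limits are illustrative only.
interface SearchOrdersArgs {
  customerId: string;
  limit: number;
}

// The JSON Schema a generator might derive from the interface above (hand-written here).
const searchOrdersSchema = {
  type: "object",
  properties: {
    customerId: { type: "string" },
    limit: { type: "integer", minimum: 1, maximum: 50 },
  },
  required: ["customerId", "limit"],
  additionalProperties: false,
} as const;

// Minimal check for this one schema; a real system would use a JSON Schema validator library.
function validateSearchOrdersArgs(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["arguments must be an object"];
  }
  const errors: string[] = [];
  const obj = raw as Record<string, unknown>;
  for (const field of searchOrdersSchema.required) {
    if (!(field in obj)) errors.push(`missing required field: ${field}`);
  }
  for (const field of Object.keys(obj)) {
    if (!(field in searchOrdersSchema.properties)) errors.push(`unexpected field: ${field}`);
  }
  if ("customerId" in obj && typeof obj["customerId"] !== "string") {
    errors.push("customerId must be a string");
  }
  if ("limit" in obj) {
    const limit = obj["limit"];
    const { minimum, maximum } = searchOrdersSchema.properties.limit;
    if (typeof limit !== "number" || !Number.isInteger(limit)) {
      errors.push("limit must be an integer");
    } else if (limit < minimum || limit > maximum) {
      errors.push(`limit must be between ${minimum} and ${maximum}`);
    }
  }
  return errors;
}

// Narrow to the typed shape only after validation has passed.
function parseSearchOrdersArgs(raw: unknown): SearchOrdersArgs {
  const errors = validateSearchOrdersArgs(raw);
  if (errors.length > 0) throw new Error(errors.join("; "));
  return raw as SearchOrdersArgs;
}

const goodCall = validateSearchOrdersArgs({ customerId: "c-17", limit: 10 });
const badCall = validateSearchOrdersArgs({ customer: "c-17", limit: "ten" });
```

The malformed call fails with specific, machine-readable errors (a missing field, an unexpected field, a wrong type) that can be fed back to the model, which a free-text docstring cannot provide.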

Published 20 May 2026
