Agent Loops: Is This AI's New Brute Force Algorithm?

TL;DR:
- An agent loop is, technically, a `while(!done)` that calls an LLM over and over until the model decides to stop: no magic, just retries
- The pattern checks all three boxes of classic brute force: no real heuristic, cost that scales with attempts, and a stopping criterion nobody guarantees
- A simple agent burns ~4x the tokens of a direct query; a multi-agent system hits ~15x — and somebody's paying that invoice
- Deciding when to stop the loop is, literally, an instance of Turing's 1936 halting problem: in general, it's not decidable
- There are cases where the loop earns its cost (ReAct, JPMorgan, Siemens), but most production deployments ship with neither a token budget nor a real stopping criterion
"Brute force" isn't an insult. It's a technical term with almost 80 years of history: try every possible combination until one works, without necessarily understanding why it worked. Pick a lock by trying every key on the ring. Crack a hash by trying every string in the dictionary. Solve a chess position by evaluating millions of variations per second, the way Deep Blue did against Kasparov in 1997. It's the substitution of intelligence with compute cycles.
So when people ask me if the agent loop — the architecture behind GitHub Copilot, the Claude Agent SDK, AutoGPT, and half the Awesome-AI-Agents repo — is the new brute force, my answer isn't a metaphor. It's a reasonably precise technical description. Let's break down why, what it actually costs you in tokens, and why almost nobody bothered to give it a decent stopping criterion.
What an Agent Loop Actually Is
An agent loop is an iterative cycle where an LLM with tool access runs repeatedly until a task is done: call the model, if it asks for a tool call execute it, feed the result back, repeat until the model itself decides it's finished. Steve Kinney boiled it down to a few lines of pseudocode that should be tattooed on the forehead of anyone who says "agentic AI" in a product meeting:
    while (!done) {
      const response = await callLLM(messages);
      if (response.toolCalls.length > 0) {
        const results = await executeTools(response.toolCalls);
        messages.push(...results);
      } else {
        done = true;
        return response;
      }
    }
Each turn of the loop is a single LLM call. If the model returns a tool call, the system executes it and feeds the result back in for the next iteration. If it returns plain text, the cycle ends. LangChain describes it in the same terms: "a typical agent loop consists of two steps — model call and tool execution — and continues until the LLM decides to stop."
The conceptual difference from a traditional workflow matters, and Anthropic draws it well: workflows are predetermined sequences defined by the developer; agents are open loops where the model decides the flow in real time. Architecturally, the loop is commonly described in five stages (perception, reasoning, planning, action, observation) repeated until the task ends. The LLM takes the wheel; the loop just executes whatever it decides on each pass.
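To make the workflow/agent distinction concrete, here is a minimal sketch. Everything in it is hypothetical: `Step`, `runWorkflow`, `runAgent`, and the `chooseNext` callback are illustrative names, and `chooseNext` is a plain function standing in for the LLM. The only point is who picks the next step: the developer ahead of time, or a chooser at runtime.

```typescript
// A step transforms a running log of what has happened so far.
type Step = (state: string[]) => string[];

// Workflow: the developer fixes the order of the steps in advance.
function runWorkflow(steps: Step[], initial: string[]): string[] {
  let state = initial;
  for (const step of steps) {
    state = step(state);
  }
  return state;
}

// Agent: a chooser (the LLM, in a real system) names the next step on each
// pass, or returns null to stop. maxPasses is a safety cap added for this sketch.
function runAgent(
  available: Record<string, Step>,
  chooseNext: (state: string[]) => string | null,
  initial: string[],
  maxPasses: number,
): string[] {
  let state = initial;
  for (let pass = 0; pass < maxPasses; pass++) {
    const choice = chooseNext(state);
    if (choice === null) break;
    const step = Object.prototype.hasOwnProperty.call(available, choice)
      ? available[choice]
      : undefined;
    if (step === undefined) break; // unknown step name: stop rather than guess
    state = step(state);
  }
  return state;
}

const fetchStep: Step = (s) => s.concat("fetched");
const parseStep: Step = (s) => s.concat("parsed");

const workflowResult = runWorkflow([fetchStep, parseStep], []);
const agentResult = runAgent(
  { fetch: fetchStep, parse: parseStep },
  (s) => (s.length === 0 ? "fetch" : s.length === 1 ? "parse" : null),
  [],
  10,
);
console.log(workflowResult.join(","), "|", agentResult.join(","));
```

Both calls end in the same state here because the toy chooser happens to pick the same order. Swap the chooser for a model and that guarantee disappears.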
So far, nothing new under the sun. It's the minimal viable architecture for an agent: LLM + memory + planning + tools, as Lilian Weng summarized in her now-classic post on LLM-powered autonomous agents. The problem starts when you look closely at how the model decides when to stop — and how much it charges you for every turn it doesn't.
From Chain-of-Thought to ReAct: the Loop's Prehistory
Chain-of-Thought (CoT), published by Wei et al. in 2022, asks the model to "think step by step" inside a single prompt. Nothing external gets executed: it's contained internal reasoning that ends when the text ends. Useful for logic, useless if you actually need to touch the real world.
Yao et al. took the next step that same year with ReAct (Reason + Act): interleave "Thought: ... / Action: ..." style reasoning inside a single prompt, simulating tool calls within the generated text itself. The result was significant: the paper reports an absolute 34% gain in success rate over imitation- and reinforcement-learning baselines on the ALFWorld benchmark. But it was still one call to the model: the agent simulated acting, it didn't actually act.
The agent loop is what happens when someone asks "what if, instead of simulating the action, we actually executed it and fed the result back to the model?" That jump — from simulation inside the prompt to real execution between iterations — is the entire novelty. It's not a new prompting technique. It's the decision to put the LLM's reasoning inside a real control loop, with real side effects on real systems.
| Feature | Chain-of-Thought | ReAct | Agent Loop | Multi-Agent |
|---|---|---|---|---|
| Iterations | Single | Single | Multiple, until goal met | Many, in parallel |
| Tool use | No | Simulated in prompt | Real: calls APIs/functions | Real, multiple agents |
| Resource cost | Low (1 call) | Low (1 call) | High (~4x a direct query) | Very high (~15x) |
| Who decides "done" | Prompt ends | Prompt ends | The model, every turn | Orchestrator + each agent |
The Brute Force Thesis: Why a while(!done) Isn't Magic
Brute force, formally, has three defining traits: no real heuristic that prunes bad branches in advance, cost that scales linearly (or worse) with the number of attempts, and total dependence on raw compute volume rather than decision quality. Let's check the agent loop against that definition, point by point.
Is there real heuristic pruning? Partially. The LLM "reasons" at each step, but that reasoning is local: it evaluates what to do on this turn, with no guarantee of a globally optimal plan. In practice, it's an expensive version of the "generate-and-test" pattern that symbolic AI has run since the late 1950s: generate a candidate action, test it against the world, observe the result, generate the next one. Newell and Simon were already playing this game with the General Problem Solver in 1957. The difference is that today's "generator" costs money per token, and the "tester" is a production API with real side effects.
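For reference, the generate-and-test pattern fits in a dozen lines. This is a generic sketch, not code from any of the systems mentioned: `generateAndTest`, `SearchResult`, and the `maxAttempts` cap are names invented for illustration. Candidates are tried in the order given, with no pruning at all.

```typescript
interface SearchResult<T> {
  solution: T | null; // null when no candidate passed within the budget
  attempts: number;   // how many candidates were actually tested
}

// Try candidates in order; stop at the first one that passes the test,
// or when the attempt budget runs out.
function generateAndTest<T>(
  candidates: readonly T[],
  passes: (candidate: T) => boolean,
  maxAttempts: number,
): SearchResult<T> {
  let attempts = 0;
  for (const candidate of candidates) {
    if (attempts >= maxAttempts) break;
    attempts += 1;
    if (passes(candidate)) {
      return { solution: candidate, attempts };
    }
  }
  return { solution: null, attempts };
}

// Toy usage: pick the lock by trying every key on the ring.
const keyRing = ["brass", "steel", "copper", "iron"];
const lockResult = generateAndTest(keyRing, (key) => key === "copper", 10);
console.log(lockResult.solution, lockResult.attempts); // copper 3
```

In an agent loop, the candidate list is replaced by whatever the model proposes next and the test is the real world, but the control flow has the same shape.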
Does cost scale with attempts? Without question — we'll get to the numbers below. Does it depend on raw compute volume more than decision quality? Also yes: when the model gets something wrong, the system's answer isn't "think harder," it's "try again with the error stuffed back into context." That's retry-until-it-works dressed up in natural language.
And then there's the question of when to stop, which is where the metaphor stops being a metaphor and becomes pure computability theory. The loop ends when the LLM decides it's done. But determining, in general, whether a program containing a loop will terminate is formally undecidable: that's the halting problem, which Alan Turing proved undecidable in 1936. I'm not stretching the analogy here: Machiraju and others have documented real cases of AutoGPT "looping forever," chasing "constant improvement" with no stopping criterion that actually stopped the process. We've handed a probabilistic model a problem that we've known for 90 years has no general solution. That's not agent engineering. That's optimism on a credit card.
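Since no general termination check exists, practical systems fall back on cheap heuristics. One such heuristic, sketched below under my own assumptions, flags a loop as stuck when the same tool call repeats too often inside a sliding window. `ToolCall`, `createRepeatDetector`, and both thresholds are hypothetical; this catches one obvious failure mode, not non-termination in general.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Canonical string for a call. JSON.stringify is sensitive to key order,
// so two calls with the same args in a different order count as different.
function callSignature(call: ToolCall): string {
  return call.name + ":" + JSON.stringify(call.args);
}

// Returns a recorder. Each invocation logs one call and reports true when
// that call has appeared at least maxRepeats times in the last windowSize calls.
function createRepeatDetector(
  windowSize: number,
  maxRepeats: number,
): (call: ToolCall) => boolean {
  const recent: string[] = [];
  return (call: ToolCall): boolean => {
    const sig = callSignature(call);
    recent.push(sig);
    if (recent.length > windowSize) {
      recent.shift(); // forget the oldest call
    }
    const repeats = recent.filter((s) => s === sig).length;
    return repeats >= maxRepeats;
  };
}

// Toy usage: the third identical call in a row trips the detector.
const looksStuck = createRepeatDetector(5, 3);
const sameSearch: ToolCall = { name: "search", args: { q: "status" } };
const detectorTrace = [
  looksStuck(sameSearch),
  looksStuck(sameSearch),
  looksStuck(sameSearch),
];
console.log(detectorTrace.join(",")); // false,false,true
```

A detector like this only bounds one kind of runaway. A model that varies its arguments slightly on every turn sails straight past it, which is why it complements a hard iteration cap rather than replacing one.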
The Real Cost: Tokens, Latency, and the Bill Nobody Reads
Every loop iteration means an additional model call plus the corresponding tool execution. This isn't free, and the gap isn't marginal: estimates for production agentic architectures show that a simple agent burns roughly 4x the tokens of a direct chat query, and that a multi-agent solution can hit roughly 15x. Put it in relative terms: if a direct query costs you one unit, a single-loop agent costs you four, and a coordinated multi-agent system costs you fifteen, and that's before counting retries from failures, which in a system with an unsolved halting problem are the rule, not the exception.
Let's run the math with hypothetical but realistic numbers to illustrate scale: if your direct chat costs $0.01 per interaction and you run 10,000 sessions a month, that's $100. The same volume through a single-agent loop is $400. Through a multi-agent architecture, $1,500. Multiply that by however many products your company rebranded as "agentic" last year, and you'll understand why finance started asking uncomfortable questions in the cloud budget meeting.
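The same arithmetic as a few lines of code, in case you want to plug in your own numbers. The multipliers are the rough estimates quoted above, not measurements, and the sketch assumes cost scales linearly with token usage. `TOKEN_MULTIPLIER` and `monthlyCostDollars` are illustrative names; prices are passed in cents to keep the math in integers.

```typescript
// Rough token multipliers relative to a direct chat query (estimates, not measurements).
const TOKEN_MULTIPLIER = {
  direct: 1,
  singleAgent: 4,
  multiAgent: 15,
} as const;

type Architecture = keyof typeof TOKEN_MULTIPLIER;

// Monthly cost in dollars, assuming cost is proportional to tokens consumed.
function monthlyCostDollars(
  architecture: Architecture,
  centsPerDirectQuery: number,
  sessionsPerMonth: number,
): number {
  const totalCents =
    centsPerDirectQuery * TOKEN_MULTIPLIER[architecture] * sessionsPerMonth;
  return totalCents / 100;
}

// The hypothetical scenario from the text: $0.01 per direct query, 10,000 sessions a month.
console.log(monthlyCostDollars("direct", 1, 10000));      // 100
console.log(monthlyCostDollars("singleAgent", 1, 10000)); // 400
console.log(monthlyCostDollars("multiAgent", 1, 10000));  // 1500
```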
Latency scales the same way: every loop turn adds the model's inference time plus the tool's execution time. An eight-iteration loop isn't eight times slower just because — it's eight times slower because you literally made eight sequential calls. And unlike classic brute force, where each attempt is cheap (trying one more key costs you fractions of a second), here every attempt is real tokens billed by a real provider. You combined innovation's cost profile with exhaustive search's intelligence profile. That combination is exactly what makes this worse than traditional brute force on unit economics, not better.
Steve Kinney puts it bluntly: the interesting part of building an agent isn't the loop itself — that's eight lines of code — it's the engineering around the loop: context management, token budgets, failure containment, iteration limits, infinite-loop detection. Without those layers, a trivial agent loop in a demo repo is exactly that: a demo. Shipping it to production without those guardrails isn't innovation, it's engineering negligence.
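Here is a minimal sketch of two of those layers, an iteration limit and a hard token budget, wrapped around the basic loop. It is not Kinney's code or any SDK's API: `runGuardedLoop`, `ModelTurn`, `LoopLimits`, and the injected `callModel` and `runTool` functions are hypothetical stand-ins for a real LLM client and tool runner. The fake model at the bottom never stops asking for tools, which is exactly the case the guards exist for.

```typescript
type StopReason = "completed" | "max_iterations" | "token_budget";

interface ModelTurn {
  text: string;
  toolCalls: string[]; // tool names requested this turn (simplified)
  tokensUsed: number;  // tokens billed for this model call
}

interface LoopLimits {
  maxIterations: number;
  maxTokens: number;
}

interface LoopOutcome {
  reason: StopReason;
  iterations: number;
  tokensSpent: number;
  finalText: string | null;
}

async function runGuardedLoop(
  callModel: (history: string[]) => Promise<ModelTurn>,
  runTool: (toolName: string) => Promise<string>,
  limits: LoopLimits,
): Promise<LoopOutcome> {
  const history: string[] = [];
  let tokensSpent = 0;

  for (let iteration = 1; iteration <= limits.maxIterations; iteration++) {
    const turn = await callModel(history);
    tokensSpent += turn.tokensUsed;

    // The model returned plain text: the normal exit.
    if (turn.toolCalls.length === 0) {
      return { reason: "completed", iterations: iteration, tokensSpent, finalText: turn.text };
    }
    // Hard budget, checked after every model call, so overshoot is at most one call.
    if (tokensSpent >= limits.maxTokens) {
      return { reason: "token_budget", iterations: iteration, tokensSpent, finalText: null };
    }
    for (const toolName of turn.toolCalls) {
      history.push(await runTool(toolName));
    }
  }
  // The model never stopped on its own: the cap stops it instead.
  return { reason: "max_iterations", iterations: limits.maxIterations, tokensSpent, finalText: null };
}

// Fake model that always asks for another tool call, and a fake tool.
const neverDone = async (): Promise<ModelTurn> => ({
  text: "",
  toolCalls: ["search"],
  tokensUsed: 100,
});
const echoTool = async (name: string): Promise<string> => "result of " + name;

runGuardedLoop(neverDone, echoTool, { maxIterations: 5, maxTokens: 10000 })
  .then((outcome) => console.log(outcome.reason)); // max_iterations
```

Returning a `reason` alongside the result is a deliberate choice: the caller can tell "the model finished" apart from "we pulled the plug," which is the distinction a bare `while(!done)` erases.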
When the Loop Goes Off the Rails: Halting Problems, Confused Deputies, and Other Old Friends
The security risks of an agent loop aren't conceptually new — they're 1980s and 90s problems with an attack surface the size of an LLM holding real tool access.
The most elegant one, in the sense Edsger Dijkstra meant when he talked about elegance, is the confused deputy problem, described by Norm Hardy in 1988: a program with more privilege than its user gets tricked into using that privilege on an attacker's behalf. An agent loop with access to a delete API, manipulated via prompt injection hidden in a document it was "just" asked to summarize, is the same old confused deputy with a natural-language wrapper. The agent wasn't hacked in the classic sense — it did exactly what the context told it to do, and the context was poisoned.
We already covered this when we analyzed the OpenClaw/Moltbook security collapse: an agent with the "lethal trifecta" — access to private data, exposure to untrusted content, the ability to communicate externally — is a confused deputy waiting for someone to send it the right prompt. Seven hundred seventy thousand active agents inherited malicious commands without questioning them, because no production agent loop shipped with a serious validator for the tool calls it was executing.
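What a "serious validator" might look like at its very simplest is sketched below. This is an illustration under my own assumptions, not a description of any shipped product: `validateToolCall`, `ToolPolicy`, and the `contextTainted` flag are hypothetical, and how a real system would track whether untrusted content entered the context is left out entirely. The rule is blunt: unknown tools are denied, and destructive tools are denied whenever the context is tainted.

```typescript
interface ToolPolicy {
  destructive: boolean; // true for irreversible side effects (delete, send, pay)
}

interface ToolRequest {
  tool: string;
  contextTainted: boolean; // true if untrusted content (documents, web pages) entered the context
}

type Verdict =
  | { allowed: true }
  | { allowed: false; reason: string };

function validateToolCall(
  request: ToolRequest,
  policies: Record<string, ToolPolicy>,
): Verdict {
  // Own-property check so names like "constructor" never match by accident.
  const policy = Object.prototype.hasOwnProperty.call(policies, request.tool)
    ? policies[request.tool]
    : undefined;

  if (policy === undefined) {
    return { allowed: false, reason: "tool is not on the allowlist: " + request.tool };
  }
  if (policy.destructive && request.contextTainted) {
    return {
      allowed: false,
      reason: "destructive tool blocked while context holds untrusted content: " + request.tool,
    };
  }
  return { allowed: true };
}

// Toy policy table: summarizing is harmless, deleting is not.
const toolPolicies: Record<string, ToolPolicy> = {
  summarize_document: { destructive: false },
  delete_records: { destructive: true },
};

// The confused-deputy scenario: a delete requested right after reading an untrusted document.
const injectedDelete = validateToolCall(
  { tool: "delete_records", contextTainted: true },
  toolPolicies,
);
console.log(injectedDelete.allowed); // false
```

The check runs outside the model, in ordinary code, so a poisoned prompt can ask for the delete as persuasively as it likes without changing the verdict.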
The same underlying pattern showed up in the Vercel breach of April 2026: it wasn't a code vulnerability, it was an OAuth token with "Allow All" permissions that an attacker inherited and used to read plaintext environment variables over the API. Swap "OAuth token" for "agent with over-permissioned tools" and you get the same threat model: digital identities with more privilege than necessary, with no audit trail of intent behind each call.
And then there's the supply chain: tool definitions — schemas, descriptions, documentation — are part of the context the LLM processes. If a tool server like the ones we described in our post on MCP (Model Context Protocol) gets compromised, the attacker doesn't need to break anything in the model: they just need the agent to trust a tool that's no longer what it claims to be. Add hallucinations — the model "reporting" it validated something it never ran — and you've got a system that fails silently while billing tokens for every turn of the failure.
When Does Brute Force Actually Pay Off?
This isn't a blanket condemnation. Old-school brute force also has legitimate use cases when the search space is manageable and the alternative (designing a perfect heuristic) costs more than just trying things. Agent loops get the same treatment.
ReAct's own benchmark (an absolute +34% success rate over imitation- and reinforcement-learning baselines on ALFWorld) shows that real feedback between iterations improves success rates on tasks where the right path isn't known in advance. In production, JPMorgan's LOXM trading agent adjusts strategy in milliseconds based on the outcome of each trade, which is exactly the kind of environment where a fixed plan doesn't work because the market moves faster than anyone could hand-code. Siemens reports a 30% drop in equipment failures with agents that replan maintenance based on real-time telemetry. Salesforce Einstein shaves roughly 20% off sales cycles by letting the agent decide the next follow-up step based on actual lead behavior.
The common pattern in the cases that work: the problem is genuinely open-ended, feedback from each action is informative, and somebody set hard limits — token budgets, max iterations, tool-call validation — before it ever touched production. The common pattern in the cases that fail: someone used an agent loop for a pipeline that was already deterministic, where a fixed sequence of steps would've been faster, cheaper, and easier to audit. If you already know the steps, code them. Don't pay an LLM by the token to rediscover an if/else you already had memorized.
Verdict: Brute Force with Better PR
Yes. An agent loop is brute force. Not in the dismissive sense, but in the exact technical sense: generate, test, observe, repeat, with no guarantee of global heuristics or termination, and cost scaling with every turn. The difference from 1980s brute force is that back then each attempt cost almost nothing (a handful of CPU cycles), and now every attempt bills real tokens through a real cloud provider's invoice.
That doesn't make it useless. Exhaustive search solves real problems when the space is manageable and no better shortcut exists. But treat it for what it is: an algorithm with no termination guarantees, running with privileges over production systems, billed by usage. Put a circuit breaker on it. Set a hard token budget, not an aspirational one. Audit every tool call the way you'd audit any call carrying elevated privilege — because that's exactly what it is. And the next time someone in a meeting says "agentic AI" with stars in their eyes, ask them what the stopping criterion is. If their best answer is "the model decides," you already know what you're buying: brute force with a pitch deck.
Trust the loop, verify everything else.
Sources
- ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al., 2022 (arXiv)
- Building Effective Agents — Anthropic Research
- LLM Powered Autonomous Agents — Lilian Weng
- Confused Deputy Problem — Norm Hardy, 1988 (reference)
- LangGraph — LangChain AI
- OWASP Top 10 for LLM Applications
- Steve Kinney, "The Anatomy of an Agent Loop" (technical blog)
- Token consumption estimates for agentic architectures, Microsoft Azure AI Foundry
Nick Holmes, Senior Engineer. Trust the code, verify everything else. 🔐