
35,000 API Calls to Say 'Nothing to Learn Here'


An innocent hook in my multi-agent system fired 35,000 LLM calls in three days. Cost: $165. Useful output: zero.


The week I discovered the hole in my bill, I wasn’t doing anything unusual. I had a multi-agent system coordinated via MCP running on Claude Code in autonomous mode — overnight sessions on a remote VPS, resolving development issues while I slept. The system worked. It solved tasks, committed code, closed issues. Everything looked fine.

Until I checked the usage.

75% of my monthly Max plan quota — roughly $165 — had evaporated in three days. Not three weeks. Three days. And not from the productive work, which accounted for only about 20% of that consumption and had successfully resolved around 50 issues.

The culprit: a reflection hook.

The hook that seemed harmless

In multi-agent architectures, hooks are events that fire automatically at key moments: when an agent finishes a task, when a session closes, when a cycle completes. They’re useful for logging, resource cleanup, or — in my case — “reflection”: asking the model to evaluate whether it had learned anything new from the work performed.

The idea made sense on paper. At the end of each task, a hook fired a call to Haiku (Anthropic’s cheapest model) asking: Is there any new learning from this session worth recording?

The problem is that “each task” included subtasks, sub-sessions, and cascading close events. What I thought would be a handful of reflection calls per day turned into continuous bombardment. The hook fired on every stop event, every session close, every task completion — including internal tasks from the orchestration system itself.

35,000 calls in three days.

And Haiku’s response, all thirty-five thousand times, was essentially the same:

“No significant learning in this session.”

Followed by three paragraphs explaining in detail why there was nothing to learn. Paying output tokens for an LLM to enthusiastically explain that it has nothing to say.

Why this isn’t an isolated bug

My case is anecdotal, but the pattern is structural. Multi-agent systems have an inherent tendency to multiply calls in ways nobody anticipates during initial design.

Chamath Palihapitiya, founder of Social Capital, described this recently when explaining why his team abandoned Cursor for Claude Code: agentic workflows generate what he called “Ralph Wiggum loops” — low-value cycles where the agent keeps executing actions without producing useful results, racking up bills in the background.

The problem has well-documented technical roots. In a multi-turn conversation, token cost grows quadratically: each new turn includes the entire previous history as input context. A 10-cycle reflection loop can consume 50x the tokens of a linear pass. According to a Stevens Institute analysis, an unrestricted agent can cost between $5 and $8 per individual software engineering task.
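The quadratic growth is easy to see with a back-of-the-envelope sketch. The per-turn token count below is an assumption for illustration, not a measured figure; under it, a 10-cycle loop costs about 55x a single linear pass, the same order of magnitude as the ~50x cited above.

```python
# Back-of-the-envelope sketch of multi-turn token growth.
# Assumption (illustrative): each turn adds ~500 new tokens, and every
# turn resends the full previous history as input context.
TOKENS_PER_TURN = 500

def total_input_tokens(turns: int) -> int:
    # Turn k sends all k-1 previous turns plus the new one as input, so
    # the total is 500 * (1 + 2 + ... + turns) = 500 * turns*(turns+1)/2:
    # quadratic in the number of turns.
    return TOKENS_PER_TURN * turns * (turns + 1) // 2

print(total_input_tokens(1))   # 500   -> a single linear pass
print(total_input_tokens(10))  # 27500 -> a 10-cycle loop: 55x the single pass
```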

But you don’t need to reach those extremes. The real danger isn’t the obvious infinite loops — you catch those fast because the system hangs. The danger is functional loops: the system keeps operating correctly, tasks complete, everything looks normal. But underneath, thousands of parasitic calls that nobody sees because they don’t break anything. They just cost money. It’s exactly the kind of failure I documented in why AI agents fail in production — silent failures that don’t surface until you check the bill.

The invisible bill of agentic AI

A review of 127 enterprise agentic AI implementations found that 73% exceeded their budget, some by more than double. Initial development represents only 25-35% of the real three-year cost. The rest goes to tokens, infrastructure, monitoring, and exactly the kind of surprises I ran into. I’ve written about this problem at macro level in FinOps for AI — inference costs are crushing margins and nobody budgets for them properly.

This isn’t exclusive to personal projects. Any system that puts an LLM in an automated loop — whether it’s a support agent, an analysis pipeline, or a development task orchestrator — has the same fundamental risk: cost doesn’t scale linearly with value produced.

My system resolved 50 issues in those nights. That represented 20% of the consumption. The other 80% was an LLM explaining to itself that it had nothing to say. The cost/value ratio was inverted and I didn’t know it because the system wasn’t failing — it was just silently burning budget.

What would have prevented the disaster

After disabling the hook and analyzing what happened, the list of countermeasures is almost insultingly obvious:

A call counter. If the hook had a limit of, say, 10 calls per hour, the damage would have stayed at a couple of dollars. A simple rate limiter. Nothing sophisticated.
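The limiter really is that simple. Here is a minimal sliding-window sketch — class name, limits, and the call-site comment are illustrative, not taken from my actual system:

```python
import time
from collections import deque

class HookRateLimiter:
    """Sliding-window limiter: allow at most max_calls per window_seconds.
    A minimal sketch; names and defaults are illustrative."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 3600.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent allowed calls

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False  # over budget: skip the reflection call entirely
        self.calls.append(now)
        return True

limiter = HookRateLimiter(max_calls=10, window_seconds=3600)
# In the hook handler (hypothetical call site):
# if limiter.allow():
#     run_reflection_call()
```

With `max_calls=10` per hour, a hook that wants to fire 486 times an hour — my actual average — silently becomes ten calls, and the damage stays at pocket change.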

A budget cap per session. Claude Code doesn’t offer a real-time consumption dashboard while operating autonomously. There’s no easy way to see that thousands of sessions are firing unless you actively check. A spending limit per session or per day would have cut the bleeding before it became a hemorrhage.
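In the absence of a built-in dashboard, you can enforce the cap yourself at the call site. A sketch of a hard per-session cap follows — the per-1k-token prices are illustrative placeholders, not Anthropic's actual rates:

```python
class SessionBudget:
    """Hard spending cap per session. Raises as soon as the cap is crossed.
    Prices below are illustrative placeholders, not real rates."""

    def __init__(self, max_usd: float = 2.0):
        self.max_usd = max_usd
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_1k_in: float = 0.001, usd_per_1k_out: float = 0.005):
        self.spent += (input_tokens / 1000) * usd_per_1k_in \
                    + (output_tokens / 1000) * usd_per_1k_out
        if self.spent > self.max_usd:
            raise RuntimeError(
                f"Session budget exceeded: ${self.spent:.4f} > ${self.max_usd:.2f}")

budget = SessionBudget(max_usd=2.0)
budget.charge(input_tokens=1200, output_tokens=300)  # well under the cap
# Any later charge that pushes past $2.00 raises and halts the loop,
# turning a silent hemorrhage into a loud, immediate failure.
```

The key design choice is that the cap fails loudly: an exception that kills the session is infinitely better than a bill you discover three days later.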

Evaluate whether the hook needed an LLM. The question “is there anything to learn?” almost never had an affirmative answer. If the answer is “no” 99% of the time, that’s not an LLM use case — it’s a deterministic rule use case. Check with simple logic first; only then, if there’s material, invoke the model.
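The deterministic gate can be as dumb as a keyword scan. The markers below are illustrative heuristics — tune them for your own logs — and the gated model call is shown only as a hypothetical call site:

```python
def worth_reflecting(session_log: str) -> bool:
    """Cheap deterministic gate before any LLM call.
    Markers are illustrative heuristics -- tune for your own logs."""
    markers = ("error", "workaround", "unexpected", "retry", "fixme")
    # Only consider invoking the model if the session contains actual
    # signal: something failed, was retried, or needed a workaround.
    text = session_log.lower()
    return any(m in text for m in markers)

# In the hook (hypothetical client call):
# if worth_reflecting(log):
#     reflection = call_haiku(log)
```

The 99% case now costs zero tokens; the model only runs when there is plausibly something to say.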

Real observability. Not logs in a file nobody checks. Alerts. If the number of Haiku calls exceeds X per hour, something should fire. A Telegram webhook, an email, whatever. In production, an autonomous agent without observability is a credit card with no limit in the hands of an algorithm.
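The alerting side doesn't need infrastructure either — a counter with a pluggable notifier is enough. The threshold and window below are illustrative, and `notify` can be any callable: a Telegram webhook client, an email sender, even `print` during testing:

```python
import time

class CallAlert:
    """Fire a notification when call volume crosses a threshold.
    Threshold and window are illustrative; notify is any callable."""

    def __init__(self, notify, threshold: int = 50, window_seconds: float = 3600.0):
        self.notify = notify
        self.threshold = threshold
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()
        self.fired = False

    def record_call(self):
        now = time.monotonic()
        if now - self.window_start > self.window:
            # New window: reset the counter and re-arm the alert.
            self.count, self.window_start, self.fired = 0, now, False
        self.count += 1
        if self.count > self.threshold and not self.fired:
            self.fired = True  # alert once per window; don't spam yourself
            self.notify(f"{self.count} LLM calls in the last hour -- check your hooks")

alert = CallAlert(notify=print, threshold=50)
# Call alert.record_call() everywhere the system invokes a model.
```

At 486 calls an hour — my actual rate — this would have paged me within the first seven minutes instead of the third day.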

The lesson for anyone building agentic systems

2026 is the year everyone wants to build AI agent systems. Frameworks multiply — LangChain, AutoGen, CrewAI, and a growing ecosystem of MCP tools that let you orchestrate models like microservices with their own personality. The temptation is understandable: the idea of your code working while you sleep is addictive.

But there’s a fundamental difference between traditional software and agentic software: in traditional software, an unnecessary loop consumes CPU. In agentic software, an unnecessary loop consumes money. Every LLM call has a real cost, and that cost multiplies silently when the system works exactly as designed.

Before adding a hook, a reflection agent, or any component that invokes an LLM automatically, ask yourself three questions:

  1. How often will this actually fire? Not in your test with three tasks. In production, at 3 AM, with 50 parallel tasks.

  2. Does it need an LLM or will deterministic logic do? If the expected answer is “no” 99% of the time, don’t ask a model. Check with an if.

  3. What happens if this fires 10,000 times? If the answer is “nothing bad,” go ahead. If the answer is “a three-figure bill,” put a rate limiter on it before deploying.

The $165 I burned was the price of a lesson I’m now sharing for free. The multi-agent system you’re building today probably works fine. The question isn’t whether it works — it’s how much it costs when nobody’s watching.

