Claude Code's Silent Downgrade
TL;DR
There’s real data — not just Reddit complaints — showing Claude’s quality has dropped. AMD analyzed 6,852 sessions and found 73% less deep reasoning. Claude Code’s leaked source code confirms an automatic Opus-to-Sonnet fallback when servers are overloaded — and until recently, it didn’t even tell you. Meanwhile, your bill keeps climbing.
The feeling we all share
If you use Claude daily for work, you’ve probably noticed something over the past few weeks: shallower responses, dumb mistakes, loops that never happened before. And one uncomfortable question: am I paying for Opus tokens and getting a Haiku in disguise?
You’re not alone. A post on the Anthropic subreddit hit 1,060 upvotes with exactly this complaint. Another titled “Claude Is Dead” reached 841. AMD’s head of AI stopped using Claude Code after months of documented frustration.
But the interesting part isn’t the complaints — it’s that this time there’s data.
The numbers Anthropic doesn’t want you to see
Stella Laurenzo, AMD’s head of AI, didn’t just vent on Twitter. She analyzed 6,852 Claude Code sessions, 235,000 tool calls, and 18,000 reasoning blocks. Her findings:
| Metric | Before | After | Change |
|---|---|---|---|
| Reads per edit | 6.6 | 2.0 | -70% |
| Deep reasoning loops | Baseline | -73% | Collapse |
| Stop-hook violations | ~0/day | ~10/day | Zero to ten |
The model shifted from “research first, edit later” to “edit first, ask never.” On complex engineering tasks, deep reasoning dropped 73%.
Marginlab, an independent benchmark tracker, confirmed the trend: Opus’s pass rate on SWE-Bench-Pro fell 6 percentage points (from 56% to 50%).
AMD switched providers. That’s not a decision you make lightly.
The silent fallback: from rumor to source code
The most widespread suspicion is straightforward: when servers are overloaded, Anthropic serves you Sonnet instead of Opus. You pay for a Ferrari, you get a Corolla.
Anthropic has explicitly denied this. In a September 2025 postmortem, they stated: “We never reduce model quality due to demand, time of day, or server load.”
But Claude Code’s source code tells a different story.
What the leaked code reveals
A few days ago, Claude Code’s source code was leaked. Anthropic publishes a compiled package on npm and a repo on GitHub, but not the actual source code. The leak has been analyzed by developers in the community, and what they found in src/services/api/withRetry.ts (lines 327-350) is a fallback mechanism that works like this:
- Your Opus request gets consecutive 529 errors (server overloaded)
- When `MAX_529_RETRIES` is reached, Claude Code checks if a `fallbackModel` is configured
- If so, it throws a `FallbackTriggeredError` and retries the entire request with Sonnet
```typescript
if (options.fallbackModel) {
  logEvent('tengu_api_opus_fallback_triggered', {
    original_model: options.model,
    fallback_model: options.fallbackModel,
  })
  throw new FallbackTriggeredError(
    options.model,
    options.fallbackModel,
  )
}
```
In query.ts (line 894), the error is caught, the full conversation history is injected into the fallback model, and it continues as if nothing happened. The fallback model sees all the context Opus generated — it responds without knowing it replaced another model.
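The throw-and-catch flow described above can be sketched end to end. `FallbackTriggeredError` mirrors the class name in the leaked code; everything else here (`sendRequest`, the options shape, the history format) is a simplified stand-in for illustration, not the actual Claude Code source:

```typescript
// Hedged sketch of the catch-and-retry flow: the 529-exhausted Opus request
// throws, and the SAME conversation is silently re-run on the fallback model.

class FallbackTriggeredError extends Error {
  constructor(
    public originalModel: string,
    public fallbackModel: string,
  ) {
    super(`Falling back from ${originalModel} to ${fallbackModel}`);
  }
}

interface QueryOptions {
  model: string;
  fallbackModel?: string;
}

// Stand-in for the real API call: pretend Opus is overloaded (retries exhausted).
function sendRequest(options: QueryOptions, history: string[]): string {
  if (options.model.includes("opus")) {
    if (options.fallbackModel) {
      throw new FallbackTriggeredError(options.model, options.fallbackModel);
    }
    throw new Error("529: overloaded");
  }
  // The fallback model receives the full history and answers normally.
  return `${options.model} answered with ${history.length} prior turns in context`;
}

function query(options: QueryOptions, history: string[]): string {
  try {
    return sendRequest(options, history);
  } catch (err) {
    if (err instanceof FallbackTriggeredError) {
      // Inject the full conversation into the fallback model and continue.
      return sendRequest({ model: err.fallbackModel }, history);
    }
    throw err;
  }
}
```

The key property is in that `catch` block: the fallback model inherits the entire context and the caller gets a normal-looking response, which is why the swap is invisible unless the client explicitly surfaces a notification.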
There’s an important nuance in the trigger condition (withRetry.ts:329):
```typescript
(process.env.FALLBACK_FOR_ALL_PRIMARY_MODELS ||
  (!isClaudeAISubscriber() && isNonCustomOpusModel(options.model)))
```
Who’s affected? According to the community’s analysis of auth.ts:
- Subscribers = users logged into claude.ai with Pro, Max, Team, or Enterprise plans (OAuth + inference scope). They don’t get automatic fallback. If Opus is overloaded, they get an error — not a silent downgrade.
- Non-subscribers = users with direct API keys, or third-party tokens (Vertex, Bedrock, Foundry). These users get the Opus → Sonnet fallback.
In other words: if you pay $200/month for Max, you’re protected. If you pay via API key — which is how most developers and third-party tools like Cursor, Cline, or Windsurf work — you get the fallback.
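The gate quoted above reduces to a small truth table. `isClaudeAISubscriber` and `isNonCustomOpusModel` are real names from the leaked code, but here they are collapsed into boolean parameters, so this is a model of the condition, not the actual implementation:

```typescript
// Minimal model of the fallback gate from withRetry.ts:
// env override, OR (not a claude.ai subscriber AND a standard Opus model).
function fallbackEnabled(
  envOverride: boolean,    // FALLBACK_FOR_ALL_PRIMARY_MODELS is set
  isSubscriber: boolean,   // logged into claude.ai (Pro/Max/Team/Enterprise)
  isNonCustomOpus: boolean // a standard Opus model, not a custom one
): boolean {
  return envOverride || (!isSubscriber && isNonCustomOpus);
}
```

A Max subscriber on Opus gets `false` (an error instead of a downgrade), an API-key user on Opus gets `true`, and the undocumented env var flips it to `true` for everyone.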
There’s an undocumented env var (FALLBACK_FOR_ALL_PRIMARY_MODELS) that enables fallback for all models and all users, including subscribers. It doesn’t appear in any public documentation. And next to the condition, there’s a TODO in the code that literally says:
“Revisit if the isNonCustomOpusModel check should still exist, or if isNonCustomOpusModel is a stale artifact of when Claude Code was hardcoded on Opus.”
Even Anthropic’s own developers aren’t sure if that check should still be there.
Regarding billing: tokens from the failed Opus attempt are discarded. You’re charged for the model that actually responds (Sonnet). That’s fair. What’s not fair is that you don’t know it happened.
What the changelog confirms
Claude Code’s changelog is even more revealing:
- v1.0.59: “Fixed issue where some Max users that specified Opus would still see fallback to Sonnet” — premium paying users getting Sonnet without asking
- v2.1.73: “Fixed subagents with model: opus being silently downgraded to older model versions on Bedrock, Vertex, and Microsoft Foundry” — silent downgrade on cloud platforms
- v2.1.76: “Improved model fallback notifications — now always visible instead of hidden behind verbose mode” — fallback notifications were hidden by default
Read that last one again. Until version 2.1.76, if Claude Code swapped Opus for Sonnet, you’d only see it if you had verbose mode enabled. Most users don’t even know that mode exists.
Technically, Claude Code does emit a message: “Switched to Sonnet due to high demand for Opus”. But if it was hidden behind verbose mode… does it really count as a notification?
No conspiracy theory needed — the code is public now. Anthropic simply preferred you didn’t see what it does.
The double hit: fallback + configured degradation
But model fallback isn’t the only problem. Anthropic has also changed how Opus works even when you are getting the real thing:
- Adaptive thinking (February 9, 2026): Opus started dynamically deciding how much to “think” per request. Less thinking = faster and cheaper for Anthropic, shallower for you.
- Medium effort level by default (March 3, 2026): `effort=85` was set as the default, reducing reasoning depth by 67% according to internal telemetry.
- Thinking block redaction (February 12, 2026): A UI change that hides the model’s reasoning. Anthropic says it doesn’t affect actual reasoning — but the timing correlation with quality drops is hard to ignore.
So you have two possible scenarios when you open Claude Code:
- Scenario A: Opus is available, but working at 33% capacity due to default configuration
- Scenario B: Opus is overloaded and you’re served Sonnet without knowing
In both cases, you’re trusting responses that don’t have the quality you expect.
The precedent: OpenAI already got caught
This isn’t conspiracy thinking. OpenAI did exactly this with GPT-5.
In August 2025, it was discovered that GPT-5 wasn’t a single model but a routing system that automatically selected between cheaper and more expensive variants based on perceived request complexity. Users thought they were using the premium model. They were using a multi-agent system with hidden costs disguised as a single model.
A study tracking 2,250 responses from GPT-4 and Claude over 6 months found 23% variance in response length for GPT-4. A PLOS One paper (February 2026) confirmed “meaningful behavioral drift across deployed transformer services.”
The model you use today isn’t necessarily the model you used yesterday, even if it has the same name.
The bill: the elephant in the room
This is where the story gets truly uncomfortable.
One developer documented their API bill going from $345 in February to $42,121 in March. Not because they used more — because the model needed more attempts to do the same work.
An analysis of 42 agent runs revealed where your tokens actually go:
| Category | % of tokens |
|---|---|
| File reading and search | 35-45% |
| Tool output | 15-25% |
| Context re-sending | 15-20% |
| Actual reasoning | 10-15% |
| Generated code | 5-15% |
Only 5-15% of what you pay for produces actual code. The rest is overhead. And if the model reasons worse, it needs more iterations, which generates more overhead, which costs you more. A vicious cycle where more tokens doesn’t mean better results.
The irony: Opus dropped 67% in price (from $15/$75 to $5/$25 per million tokens). But if the model needs 3x more tokens to do the same work, the entire discount is eaten — three times the tokens at a third of the price puts you exactly back where you started, and every bit of inefficiency beyond that is a net increase on your bill.
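Taking the article’s input-token prices, the break-even point is a one-liner of arithmetic:

```typescript
// Back-of-envelope using the input-token prices quoted above ($/1M tokens).
// Output pricing ($75 -> $25) has the same 1/3 ratio, so the conclusion holds.
const oldPrice = 15; // $/1M input tokens, before the cut
const newPrice = 5;  // $/1M input tokens, after the 67% cut

// Cost of a task that used to take 1M tokens, if the model now needs k× tokens:
const cost = (tokensMillions: number, pricePerMillion: number) =>
  tokensMillions * pricePerMillion;

const before = cost(1, oldPrice);                // $15 for the task
const atThreeX = cost(3, newPrice);              // $15 again: the cut is fully eaten
const breakEvenMultiplier = oldPrice / newPrice; // 3x tokens = same bill
```

Anything above that 3x multiplier and the “cheaper” model is costing you more per task than the old one did.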
What Anthropic says vs. what it does
Anthropic has been consistent with one message: they never intentionally degrade quality. Problems are infrastructure bugs, already fixed.
Technically, they’re not lying. The Opus-to-Sonnet fallback is in the client (Claude Code), not the API. When they say “we never reduce model quality due to demand,” they mean the server. That their own client does exactly that when the server returns 529… well, that’s a convenient technicality.
And we already have precedent that this argument doesn’t hold up. In September 2025, they acknowledged three bugs that had degraded responses for weeks:
- A context window routing error that affected 16% of requests
- Output corruption from misconfigured TPU servers
- A compilation bug causing precision errors in token probabilities
Three bugs. For weeks. Affecting up to 16% of requests. And they only acknowledged it after the community had been complaining for weeks.
What haven’t they done this time? Respond to AMD’s data. Or the Marginlab drop. Or the mass cancellations. The Register reported that Anthropic did not respond to requests for comment on the latest complaints.
The definition of “not intentionally degrading” has a lot of wiggle room when you publish a client with silent fallback and hide it behind verbose mode.
What to do as a user
This isn’t about quitting Claude — it’s still a powerful tool when it works well. It’s about not being naive:
1. Don’t blindly trust responses. If you notice shallow answers or mistakes it never used to make, don’t be a model fanboy. Verify. Especially code heading to production.
2. Monitor your bill. If you use the API, track tokens consumed per task. If the tokens-to-useful-output ratio spikes, that’s a degradation signal — regardless of the cause.
3. Force maximum effort when it matters. If your tool allows it, set effort to maximum on critical tasks. The default configuration is optimized for Anthropic’s margins, not your productivity.
4. Have a plan B. Check other providers’ benchmarks, use updated comparisons, and don’t tie yourself to any single vendor. Today Claude might be the best for your use case. Tomorrow it might not be.
5. Demand transparency. As an industry, we should demand that AI providers be transparent about configuration changes that affect quality. If you change the default effort from 100 to 85, that’s not a “product adjustment” — it’s a degradation.
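Point 2 — tracking tokens per task — can be as simple as one ratio. Everything below is illustrative (the record shape is invented; adapt it to whatever usage metadata your provider returns):

```typescript
// Minimal degradation signal: tokens billed per line of generated code you
// actually kept. Field names are illustrative, not from any real API response.
interface TaskRecord {
  totalTokens: number;  // input + output tokens billed for the task
  usefulLines: number;  // lines of generated code that survived review
}

function tokensPerUsefulLine(task: TaskRecord): number {
  return task.usefulLines === 0 ? Infinity : task.totalTokens / task.usefulLines;
}

// Compare a recent window against your baseline; a sustained jump is a red flag
// regardless of whether the cause is fallback, lower effort, or something else.
function degradationRatio(baseline: TaskRecord[], recent: TaskRecord[]): number {
  const avg = (xs: TaskRecord[]) =>
    xs.reduce((sum, t) => sum + tokensPerUsefulLine(t), 0) / xs.length;
  return avg(recent) / avg(baseline);
}
```

A ratio hovering around 1 is normal noise; a sustained 2-3x means the model is burning more iterations for the same output, which is exactly the overhead spiral described in the token-breakdown table above.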
The real problem
What’s happening with Claude is a symptom of a bigger issue: AI-as-a-service is a black box where the provider has every incentive to degrade quality and no obligation to tell you.
When you rent a server, you can measure CPU, RAM, latency. When you buy AI tokens, you can’t measure “depth of reasoning.” There’s no SLA for cognitive quality. There’s no way to prove that the model you’re served today is worse than yesterday’s.
And that’s exactly why AI cost transparency matters so much. Not just for your bill — to know what you’re actually buying.
Anthropic has an excellent product. But trust is built with transparency, not delayed postmortems and uncomfortable silences.
Have you noticed changes in Claude’s quality? Has your bill spiked without explanation? I’m interested in your experience — especially if you have data, not just feelings.
Keep exploring
- Don’t be a model fanboy - Why tying yourself to one AI provider is a bad idea
- More tokens isn’t better - The myth that more context = better responses
- FinOps for AI: inference costs - How to control what you spend on tokens
- ChatGPT vs Gemini vs Claude - Updated comparison of the big three
- Hidden costs of multi-agent systems - What they don’t tell you about model routing