Claude Sonnet 4.6: Flagship Performance at Mid-Tier Pricing
TL;DR
- Sonnet 4.6 matches Opus 4.6 on computer use (72.5% vs 72.7%) and beats it on office tasks, scoring a higher GDPval-AA Elo (1633)
- Price: $3/$15 per million tokens, vs $5/$25 for Opus 4.6. Three months ago, that performance cost $15/$75 with Opus 4.5
- Computer use went from 14.9% to 72.5% in 16 months — nearly a 5x improvement
- 1 million token context window in beta. Now the default model in claude.ai and Claude Code
- If you’re paying for the most expensive model “just in case,” it’s time to audit your stack
Twelve days ago, Anthropic launched Opus 4.6 and software stocks shed $285 billion in market cap. Yesterday, without anyone flinching, they launched something arguably more disruptive: a model that does nearly the same thing at a fraction of the cost.
Claude Sonnet 4.6 isn’t an incremental update. It’s proof that frontier AI no longer needs to be expensive.
The numbers that matter
Let’s cut to the comparison:
| Benchmark | Sonnet 4.6 | Opus 4.6 | GPT-5.2 |
|---|---|---|---|
| SWE-bench Verified (code) | 79.6% | 80.8% | ~75% |
| OSWorld (computer use) | 72.5% | 72.7% | — |
| GDPval-AA Elo (office tasks) | 1633 | <1633 | — |
| Input price (per 1M tokens) | $3 | $5 | $1.75 |
| Output price (per 1M tokens) | $15 | $25 | $14 |
Read that table again. On real-world office tasks — drafting reports, analyzing documents, organizing information — Sonnet 4.6 outperforms the model that costs 67% more. On code and computer use, the gap with Opus is less than two percentage points.
The price of intelligence is collapsing
To put this in perspective:
- Opus 4.5 (late 2025): $15 input / $75 output — the frontier price benchmark
- Opus 4.6 (February 2026): $5 input / $25 output — a 67% price cut
- Sonnet 4.6 (February 2026): $3 input / $15 output — equivalent performance, 40% cheaper than Opus 4.6
In less than three months, the cost of frontier intelligence dropped 80%. We went from $75 per million output tokens to $15 for the same level of performance. This isn’t a trend. It’s a collapse.
And the AI cost curve keeps accelerating. What costs $15 today will probably cost $5 in six months.
Computer Use: from promise to reality
The stat that generates the fewest headlines but matters the most: the ability to use a computer like a human.
In October 2024, Claude scored 14.9% on OSWorld. Yesterday, Sonnet 4.6 scored 72.5%. That's nearly a 5x improvement in 16 months.
What does that mean in practice? A $3/$15 model can:
- Navigate web applications
- Fill out forms
- Click buttons, type in fields
- Execute multi-step workflows on your screen
The AI agent that operates your computer is no longer science fiction. And you don’t need the most expensive model to make it work.
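Here's what that looks like in practice with the Anthropic Python SDK. Treat this as a minimal sketch: the model ID, tool version, and beta flag below are my assumptions, patterned on earlier computer-use betas, so check the current docs before relying on them.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Model ID, tool version, and beta flag are assumptions based on earlier
# computer-use releases (e.g. computer_20250124); verify against current docs.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the invoice form and fill in the totals."}],
    betas=["computer-use-2025-01-24"],
)

# The model replies with tool_use blocks (screenshot, click, type, ...);
# your harness executes each action and feeds the result back in a loop.
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # e.g. {"action": "screenshot"} or {"action": "left_click", ...}
```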
One million tokens of context
Sonnet 4.6 ships with a 1 million token context window in beta. To put that in perspective: roughly 750,000 words, or about 3,000 pages of text.
There's fine print: past 200,000 input tokens, pricing rises to $6 input / $22.50 output. Even with that surcharge, output still undercuts Opus's $25, and you only pay it on requests that actually exceed 200K.
For RAG applications, large document analysis, or code review of big repositories, this matters. You no longer need to reach for Opus or Gemini just for context length.
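Opting in is roughly a one-line change, assuming the 1M-context beta keeps the header-flag pattern of earlier rollouts. The flag and model ID below are my guesses, not confirmed for 4.6:

```python
import anthropic

client = anthropic.Anthropic()

with open("big_repo_dump.txt") as f:   # hypothetical ~800K-token file
    haystack = f.read()

# Beta flag borrowed from the earlier 1M-context rollout; confirm for 4.6.
# Pricing note: past 200K input tokens you pay $6/$22.50 instead of $3/$15.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",         # assumed model ID
    max_tokens=2048,
    betas=["context-1m-2025-08-07"],
    messages=[{
        "role": "user",
        "content": f"{haystack}\n\nList the breaking changes across this codebase.",
    }],
)
print(response.content[0].text)
```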
When to use Sonnet 4.6 vs Opus vs Haiku
Here’s the practical guide no press release will give you:
| Use case | Recommended model | Why |
|---|---|---|
| Chatbot, FAQ, classification | Haiku 4.5 ($0.50/$3) | Don’t bring a bazooka to a knife fight |
| Code, analysis, documents, agents | Sonnet 4.6 ($3/$15) | 95%+ of Opus performance for much less |
| Extreme scientific reasoning, research | Opus 4.6 ($5/$25) | That 1-2% matters when precision is critical |
| High-volume low-value tasks | Haiku 4.5 + batch ($0.25/$1.50) | 50% discount on batch API |
The rule is simple: start with Sonnet 4.6. Only scale up to Opus if you can prove you need that extra 1-2% of performance. And step down to Haiku for anything that doesn’t require complex reasoning.
If you’re serious about managing AI costs, this table should be your starting point.
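In code, that rule can be a dumb lookup table. A hypothetical router, with model IDs assumed rather than confirmed:

```python
# Hypothetical task-tier router implementing the table above.
# Model IDs are assumptions; map them to whatever your provider actually exposes.
MODEL_BY_TIER = {
    "simple":   "claude-haiku-4-5",    # chatbot, FAQ, classification
    "standard": "claude-sonnet-4-6",   # code, analysis, documents, agents
    "critical": "claude-opus-4-6",     # long reasoning chains where 1-2% matters
}

def pick_model(tier: str = "standard") -> str:
    """Start at Sonnet; escalate only when you've measured the need."""
    return MODEL_BY_TIER.get(tier, MODEL_BY_TIER["standard"])

# High-volume, low-value work: route to "simple" and submit via the Batch API
# for the additional 50% discount mentioned in the table.
```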
The pattern you should recognize
This has happened before:
- GPT-4 was the premium model. GPT-4o matched it for less. GPT-4.1 mini did it for pennies.
- Opus 4.5 was the flagship. Opus 4.6 beat it at a third of the price. Now Sonnet 4.6 matches it for even less.
Today’s flagship is tomorrow’s mid-tier. Always.
The implication for businesses: don’t architect your system around a specific model. Design for model swappability. What costs you $15/$75 today will cost $3/$15 tomorrow, and in a year there’ll be something better for $1/$5.
If your AI vendor locks you into a specific model with long contracts, you’re overpaying. Guaranteed.
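Swappability doesn't require a framework. At a minimum, never hard-code the model ID: read it from configuration so the next price drop is a one-line change. A minimal sketch, where the env var name and fallback ID are my choices:

```python
import os
import anthropic

# Swapping models should be a config change, not a code change.
MODEL_ID = os.environ.get("LLM_MODEL_ID", "claude-sonnet-4-6")  # assumed default ID

client = anthropic.Anthropic()

def complete(prompt: str) -> str:
    """All call sites go through here, so nothing hard-codes a model name."""
    response = client.messages.create(
        model=MODEL_ID,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```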
Sonnet 4.6 as the Claude Code default
For those who code: Sonnet 4.6 is now the default model in Claude Code. Claude Code already generates over $1 billion in annual revenue and has become the most widely adopted development tool in the ecosystem.
That Anthropic chose Sonnet as the default over Opus says a lot: they trust the performance is sufficient for 95% of use cases. And they’d rather developers use more (at a low price) than less (at a high price).
What this actually means for your budget
Do the math. If your company runs an internal chatbot with 10,000 daily interactions:
| Model | Estimated monthly cost |
|---|---|
| Opus 4.5 (previous) | ~€8,000-12,000 |
| Opus 4.6 | ~€3,000-5,000 |
| Sonnet 4.6 | ~€2,000-3,500 |
| Haiku 4.5 | ~€400-800 |
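Those ranges depend heavily on traffic shape. Here's the back-of-the-envelope math that lands in the same ballpark, with per-interaction token counts that are my assumptions, not measurements:

```python
# Rough monthly chatbot cost, reproducing the ballpark of the table above.
# Assumptions (mine): 10,000 interactions/day, ~1,200 input and ~300 output
# tokens per interaction, 30 days, 1 USD ~= 0.92 EUR. Prices from this article.
PRICES_PER_M = {
    "Opus 4.5":   (15.00, 75.00),
    "Opus 4.6":   (5.00, 25.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Haiku 4.5":  (0.50, 3.00),
}

INTERACTIONS = 10_000 * 30            # per month
IN_TOK, OUT_TOK = 1_200, 300          # per interaction (assumed)
USD_TO_EUR = 0.92

for model, (p_in, p_out) in PRICES_PER_M.items():
    usd = INTERACTIONS * (IN_TOK * p_in + OUT_TOK * p_out) / 1_000_000
    print(f"{model:>10}: ~€{usd * USD_TO_EUR:,.0f}/month")
```

With those assumptions, every model lands inside the ranges above; plug in your own token counts before deciding.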
For most enterprise use cases, Sonnet 4.6 is the obvious choice. Performance is practically identical to the flagship, and the savings show up on your bottom line.
If your company hasn’t started with AI FinOps, now is the time. Not because prices are going up, but because they’re dropping so fast you might be overpaying without realizing it.
My take
I’ve been using Claude as my primary tool since 2024. I’ve gone through Opus 4.5, Opus 4.6, and now Sonnet 4.6. My conclusion: for 90% of my work — code, analysis, technical writing — Sonnet 4.6 is indistinguishable from Opus.
Where do I notice the difference? On very long chains of reasoning, where Opus maintains better coherence across many steps. But those cases account for the other 10% of my real-world usage.
The reality is that the "premium tax" on always buying the most expensive model is evaporating. And that's good for everyone except those whose business model was built on selling the expensive version.
If you’re still unsure which major LLM is right for you, the answer in 2026 is more nuanced than ever. But one thing is clear: price is no longer the barrier.
Keep exploring
- Claude Opus 4.6: The Model That Crashed the Stock Market - The big sibling that triggered a market earthquake two weeks ago
- FinOps for AI: How to Stop Bleeding Money on Inference Costs - Practical guide to controlling what you spend on LLMs
- ChatGPT vs Gemini vs Claude: 2026 Comparison - The big three, head to head