News & analysis · July 2026

Claude Sonnet 5 launches: Mythos-class architecture at a mid-tier price

Anthropic's second Mythos-class model isn't a new flagship. It pairs $4/$20 pricing and a 200K max output with a SWE-bench Verified score that edges out last month's Opus 4.8.

By the benchr team · Published July 1, 2026 · Figures verified against Anthropic's announcement and official docs, July 1, 2026

Input / 1M tokens: $4 (output: $20), between Sonnet 4.6 and Opus 4.8

Context window: 1M, same as Sonnet 4.6 and Opus 4.8

Max output: 200K (Sonnet 4.6: 64K; Opus 4.8: 128K)

SWE-bench Verified: 89.4%, edging out Opus 4.8's 88.6%

Three weeks ago, Anthropic's Mythos-class architecture lived in exactly one place: Claude Fable 5, priced at $10 per million input tokens and $50 per million output for the hardest agentic work. The open question was whether that architecture would ever come down in price, or stay a flagship-only luxury. On July 1, Anthropic answered it. Claude Sonnet 5 runs on the same Mythos-class architecture, and it slots into the middle of the Claude lineup at $4 per million input tokens and $20 per million output — not a new "Sonnet 4.7," but a second model built on Fable 5's foundation.

One architecture, two tiers

Sonnet 5's $4/$20 rate sits between Sonnet 4.6's $3/$15 and Opus 4.8's $5/$25. Cached input runs $0.40 per million tokens, and the Batch API takes 50% off both directions, to $2/$10, the same discount convention as the rest of the Claude line. Context stays at 1M tokens, matching Sonnet 4.6 and Opus 4.8, but max output jumps to 200,000 tokens, more than triple Sonnet 4.6's 64,000 and well past Opus 4.8's 128,000. The API id is claude-sonnet-5, and Anthropic's tentative retirement date is no sooner than July 1, 2027, following the same one-year convention it applies to its other active models.
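
To see how the cached-input and batch rates change a bill, here is a minimal Python sketch built only on the rates quoted above. The function names and the `cached_share` parameter are illustrative assumptions. The sketch ignores any cache-write surcharge and does not assume that caching and batch discounts stack, since the figures above do not say either way.

```python
# Per-million-token rates for Claude Sonnet 5, as quoted in the article.
INPUT_RATE = 4.00         # USD per 1M uncached input tokens
OUTPUT_RATE = 20.00       # USD per 1M output tokens
CACHED_INPUT_RATE = 0.40  # USD per 1M cached input tokens
BATCH_DISCOUNT = 0.50     # Batch API takes 50% off both directions


def standard_cost(input_mtok: float, output_mtok: float, cached_share: float = 0.0) -> float:
    """Cost in USD for a standard (non-batch) request mix.

    Token counts are in millions. cached_share is a hypothetical fraction
    of input tokens served from cache; cache-write costs are not modeled.
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    cached = input_mtok * cached_share
    fresh = input_mtok - cached
    return fresh * INPUT_RATE + cached * CACHED_INPUT_RATE + output_mtok * OUTPUT_RATE


def batch_cost(input_mtok: float, output_mtok: float) -> float:
    """Cost in USD when all traffic goes through the Batch API, with no caching."""
    return (input_mtok * INPUT_RATE + output_mtok * OUTPUT_RATE) * (1 - BATCH_DISCOUNT)
```

For example, `standard_cost(10, 1, cached_share=0.8)` prices 10M input tokens (80% of them cached) plus 1M output tokens at about $31.20, against $60 with no caching.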

The architecture brings two more inherited traits. Adaptive thinking is always on, with no extended-thinking toggle to flip — the same behavior Fable 5 introduced. And the same safety classifiers apply: requests touching offensive cybersecurity, most biology and chemistry, or attempts to distill the model's capabilities fall back to Claude Opus 4.8, identical to how Fable 5 handles those categories.

What the benchmarks say

The headline number is SWE-bench Verified: Sonnet 5 scores 89.4%, ahead of Claude Opus 4.8's 88.6% — a mid-tier model beating last month's flagship on a closely watched metric. SWE-bench Pro, the harder agentic-coding test, comes in at 71.8%. Terminal-Bench 2.1 lands at 85.6%. Reasoning tells a different story: GPQA Diamond is 92.0%, behind Opus 4.8's 93.6%, so Opus 4.8 keeps the edge on graduate-level science reasoning. On ARC-AGI-2, Sonnet 5 scores 20.0, ahead of Sonnet 4.6's 15.0. Humanity's Last Exam without tools comes in at 42.5%. The rest of the sheet: LMSYS Arena 1435, MMLU 93.8%, HumanEval 96.0%, MATH 93.5%.

Read the pattern plainly: Sonnet 5 edges past Opus 4.8 by 0.8 points on SWE-bench Verified while trailing it by 1.6 points on GPQA Diamond, the hardest reasoning benchmark on the sheet. That's a coherent trade for a model priced at 80% of Opus 4.8's input and output rates.

Sonnet 5 vs Sonnet 4.6 vs Opus 4.8

The Claude line after July 1, 2026, from the official docs
Spec	Claude Sonnet 5	Claude Sonnet 4.6	Claude Opus 4.8
Price (in/out per 1M)	$4 / $20	$3 / $15	$5 / $25
Context window	1M tokens	1M tokens	1M tokens
Max output	200K	64K	128K
SWE-bench Verified	89.4%	79.6%	88.6%
GPQA Diamond	92.0%	89.9%	93.6%
Thinking mode	Adaptive, always on	Standard	Standard
Restrictions	Cyber / bio / distillation fall back to Opus 4.8	Standard	Standard
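
The gaps in the table can be read off in percentage points with a small Python sketch. The dictionary simply re-encodes the table rows; the key names and the `gap` helper are illustrative, not part of any published tooling.

```python
# Figures copied from the comparison table; nothing here is measured independently.
SPECS = {
    "Claude Sonnet 5":   {"swe_bench_verified": 89.4, "gpqa_diamond": 92.0},
    "Claude Sonnet 4.6": {"swe_bench_verified": 79.6, "gpqa_diamond": 89.9},
    "Claude Opus 4.8":   {"swe_bench_verified": 88.6, "gpqa_diamond": 93.6},
}


def gap(metric: str, model: str, baseline: str) -> float:
    """Return model minus baseline for one metric, in points, rounded to one decimal."""
    return round(SPECS[model][metric] - SPECS[baseline][metric], 1)


if __name__ == "__main__":
    for baseline in ("Claude Sonnet 4.6", "Claude Opus 4.8"):
        for metric in ("swe_bench_verified", "gpqa_diamond"):
            delta = gap(metric, "Claude Sonnet 5", baseline)
            print(f"Sonnet 5 vs {baseline}, {metric}: {delta:+.1f} points")
```

A positive number means Sonnet 5 leads. The only negative gap is GPQA Diamond against Opus 4.8.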

Is the mid-tier upgrade worth it?

Run the math on a real workload before switching. A coding agent burning 2M input tokens and 400K output tokens a day costs about $16 on Sonnet 5 (2 × $4 + 0.4 × $20), against $12 on Sonnet 4.6 (2 × $3 + 0.4 × $15) and $20 on Opus 4.8 (2 × $5 + 0.4 × $25). That puts Sonnet 5 roughly a third above Sonnet 4.6's bill and 20% under Opus 4.8's, for a model that beats Opus 4.8 on SWE-bench Verified. The cost calculator will run this against your own volumes, and the Claude Sonnet 5 pricing breakdown covers the caching and batch math in full.
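
The same arithmetic as a short Python sketch, for plugging in your own volumes. The rates are the list prices quoted in this article; the `daily_cost` helper is illustrative and ignores caching and batch discounts.

```python
# (input, output) list prices in USD per 1M tokens, as quoted in the article.
RATES = {
    "Claude Sonnet 5": (4, 20),
    "Claude Sonnet 4.6": (3, 15),
    "Claude Opus 4.8": (5, 25),
}


def daily_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD per day at list price, ignoring caching and batch discounts."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # The example workload from the text: 2M input and 400K output tokens a day.
    costs = {model: daily_cost(model, 2_000_000, 400_000) for model in RATES}
    base = costs["Claude Sonnet 5"]
    for model, cost in costs.items():
        print(f"{model}: ${cost:.2f}/day ({cost / base:.0%} of the Sonnet 5 bill)")
```

Because output tokens cost five times as much as input tokens on all three models, the ratios between the three bills stay the same whatever the input/output mix.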

Where the upgrade clearly pays off: coding agents and long-running tool loops that were hitting Sonnet 4.6's 64K output ceiling, since 200K max output means far fewer truncated responses mid-task. Where it's a harder sell: workloads that are already comfortable on Sonnet 4.6 and don't need the extra output headroom or the coding bump — the 33% price increase isn't free. And if your work depends on Opus 4.8's GPQA-level reasoning ceiling, Sonnet 5 doesn't close that gap; Opus 4.8 stays the pick.
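
One rough way to gauge the output-ceiling difference is to count the minimum number of responses a long generation would need under each limit. This Python sketch is a best-case estimate under a stated assumption: every response uses its full ceiling, which real tasks rarely do. The `responses_needed` helper is hypothetical.

```python
import math

# Max output per response in tokens, as quoted in the article.
MAX_OUTPUT = {
    "Claude Sonnet 5": 200_000,
    "Claude Sonnet 4.6": 64_000,
    "Claude Opus 4.8": 128_000,
}


def responses_needed(model: str, total_output_tokens: int) -> int:
    """Minimum number of responses needed to emit total_output_tokens.

    Best case only: assumes each response fills the model's full output ceiling.
    """
    if total_output_tokens <= 0:
        return 0
    return math.ceil(total_output_tokens / MAX_OUTPUT[model])
```

Under that assumption, a 180,000-token generation fits in one Sonnet 5 response but needs at least three on Sonnet 4.6 and two on Opus 4.8.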

A crowded launch week

Sonnet 5 didn't ship in isolation. The same day, Anthropic's export-control review closed out and Claude Fable 5 was restored to all customers, with AWS reinstating Bedrock access in step — Anthropic frames both moves as the same review concluding cleanly. It's also the week OpenAI's GPT-5.6 left its partner-gated preview for general availability, and Google shipped Gemini 3.5 Pro with a 2-million-token context window. None of that changes the Sonnet 5 math directly, but it's the backdrop: three labs moved their pricing and capability lines in the same week, and benchr's model comparison tool is the fastest way to see how the current lineup shakes out.

Frequently asked

What is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's second Mythos-class-architecture model, launched July 1, 2026, and priced for the mid-tier rather than the flagship spot Fable 5 occupies. It costs $4/$20 per million tokens and offers a 1M context window and a 200,000-token max output. Adaptive thinking is always on, and the same classifiers as Fable 5 route offensive-cyber, bio/chem, and distillation requests to Opus 4.8.

How much does Claude Sonnet 5 cost?

$4 per million input tokens and $20 per million output, between Sonnet 4.6's $3/$15 and Opus 4.8's $5/$25. Cached input is $0.40 per million, and the Batch API cuts both rates in half to $2/$10.

How does Sonnet 5 compare to Sonnet 4.6 and Opus 4.8?

SWE-bench Verified: 89.4% for Sonnet 5, ahead of Sonnet 4.6's 79.6% and narrowly ahead of Opus 4.8's 88.6%. Opus 4.8 still leads GPQA Diamond, 93.6% to 92.0%. Sonnet 5's 200K max output beats both Sonnet 4.6's 64K and Opus 4.8's 128K.

Is Sonnet 5 worth upgrading to from Sonnet 4.6?

For most workloads, yes: it costs about a third more per token but edges past Opus 4.8 on SWE-bench Verified. Skip it if you need Opus 4.8's reasoning ceiling, or if your work involves offensive-security or bio requests, which get routed to Opus 4.8 anyway.

Changelog

July 1, 2026 — Published. Pricing, context window, max output, benchmark figures, classifier behavior, and the retirement floor verified against Anthropic's launch announcement and the official model docs.

References

Anthropic, "Claude Sonnet 5," anthropic.com/news/claude-sonnet-5, July 1, 2026. Source for the release, pricing, classifier behavior, and benchmark figures.
Anthropic, "Models overview," platform.claude.com/docs, accessed July 1, 2026. Source for the API id, context window, max output, and the retirement-floor policy.
Anthropic, "Claude Opus 4.8 system card," anthropic.com. Source for the Opus 4.8 comparison figures.