Three weeks ago, Anthropic's Mythos-class architecture lived in exactly one place: Claude Fable 5, priced at $10 per million input tokens and $50 per million output for the hardest agentic work. The open question was whether that architecture would ever come down in price, or stay a flagship-only luxury. On July 1, Anthropic answered it. Claude Sonnet 5 runs on the same Mythos-class architecture, and it slots into the middle of the Claude lineup at $4 per million input tokens and $20 per million output — not a new "Sonnet 4.7," but a second model built on Fable 5's foundation.
One architecture, two tiers
Sonnet 5 sits between Sonnet 4.6's $3/$15 and Opus 4.8's $5/$25, at $4 per million input tokens and $20 per million output. Cached input runs $0.40 per million, and the Batch API takes 50% off both directions, to $2/$10, the same discount convention as the rest of the Claude line. Context stays at 1M tokens, matching Sonnet 4.6 and Opus 4.8, but max output jumps to 200,000 tokens, more than triple Sonnet 4.6's 64,000 and well past Opus 4.8's 128,000. The API id is claude-sonnet-5, and Anthropic's tentative retirement date is no sooner than July 1, 2027, the same one-year convention it applies to its other active models.
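The rate card above reduces to a few multipliers. Here is a minimal Python sketch of that arithmetic, assuming the cached rate applies to input tokens read from cache and the batch discount is a flat 50% on the base input and output rates. The names `Rates`, `SONNET_5`, and `request_cost` are illustrative and not part of any Anthropic SDK. The text does not say how batch and cache discounts combine, so the sketch refuses that case rather than guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens, taken from the figures quoted above."""
    input: float
    output: float
    cached_input: float


SONNET_5 = Rates(input=4.00, output=20.00, cached_input=0.40)
BATCH_DISCOUNT = 0.5  # Batch API: 50% off input and output


def request_cost(rates, input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the part of input_tokens served from cache
    and billed at the cached rate instead of the base input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if batch and cached_tokens:
        # The article does not describe how the two discounts stack.
        raise ValueError("batch + cache stacking is not modeled in this sketch")
    scale = BATCH_DISCOUNT if batch else 1.0
    fresh_input = input_tokens - cached_tokens
    total = (
        fresh_input * rates.input * scale
        + cached_tokens * rates.cached_input
        + output_tokens * rates.output * scale
    )
    return total / 1_000_000


standard = request_cost(SONNET_5, 1_000_000, 1_000_000)
batched = request_cost(SONNET_5, 1_000_000, 1_000_000, batch=True)
print(f"standard: ${standard:.2f}, batch: ${batched:.2f}")
```

With one million tokens in each direction, this comes to $24 at standard rates and $12 through the Batch API, matching the $4/$20 and $2/$10 figures.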
The architecture brings two more inherited traits. Adaptive thinking is always on, with no extended-thinking toggle to flip — the same behavior Fable 5 introduced. And the same safety classifiers apply: requests touching offensive cybersecurity, most biology and chemistry, or attempts to distill the model's capabilities fall back to Claude Opus 4.8, identical to how Fable 5 handles those categories.
What the benchmarks say
The headline number is SWE-bench Verified: Sonnet 5 scores 89.4%, ahead of Claude Opus 4.8's 88.6% — a mid-tier model beating last month's flagship on a closely watched metric. SWE-bench Pro, the harder agentic-coding test, comes in at 71.8%. Terminal-Bench 2.1 lands at 85.6%. Reasoning tells a different story: GPQA Diamond is 92.0%, behind Opus 4.8's 93.6%, so Opus 4.8 keeps the edge on graduate-level science reasoning. On ARC-AGI-2, Sonnet 5 scores 20.0, ahead of Sonnet 4.6's 15.0. Humanity's Last Exam without tools comes in at 42.5%. The rest of the sheet: LMSYS Arena 1435, MMLU 93.8%, HumanEval 96.0%, MATH 93.5%.
Read the pattern honestly: Sonnet 5 closes almost all of the coding gap to Opus 4.8, and actually passes it on SWE-bench Verified, while giving up ground on the hardest reasoning benchmark. That's a coherent trade for a model priced at 80% of Opus 4.8's input rate.
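To make that trade concrete, the short Python sketch below takes the two head-to-head scores and the input prices quoted above and works out the gaps. The dictionary layout and the `gap` helper are illustrative only, and the numbers are simply the ones reported in this article.

```python
# Scores in percent and input prices in USD per 1M tokens, as quoted above.
sonnet_5 = {"swe_bench_verified": 89.4, "gpqa_diamond": 92.0, "input_price": 4.0}
opus_4_8 = {"swe_bench_verified": 88.6, "gpqa_diamond": 93.6, "input_price": 5.0}


def gap(metric):
    """Sonnet 5 minus Opus 4.8, in percentage points, rounded to one decimal."""
    return round(sonnet_5[metric] - opus_4_8[metric], 1)


coding_gap = gap("swe_bench_verified")
reasoning_gap = gap("gpqa_diamond")
price_ratio = sonnet_5["input_price"] / opus_4_8["input_price"]

print(f"SWE-bench Verified: {coding_gap:+.1f} points")
print(f"GPQA Diamond:       {reasoning_gap:+.1f} points")
print(f"Input price ratio:  {price_ratio:.0%}")
```

On these figures Sonnet 5 is 0.8 points ahead on SWE-bench Verified and 1.6 points behind on GPQA Diamond, at 80% of the input price.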
Sonnet 5 vs Sonnet 4.6 vs Opus 4.8
| Spec | Claude Sonnet 5 | Claude Sonnet 4.6 | Claude Opus 4.8 |
|---|---|---|---|
| Price (input / output, per 1M tokens) | $4 / $20 | $3 / $15 | $5 / $25 |
| Context window | 1M tokens | 1M tokens | 1M tokens |
| Max output | 200K | 64K | 128K |
| SWE-bench Verified | 89.4% | 79.6% | 88.6% |
| GPQA Diamond | 92.0% | 89.9% | 93.6% |
| Thinking mode | Adaptive, always on | Standard | Standard |
| Restrictions | Cyber / bio / distillation fall back to Opus 4.8 | Standard | Standard |
Is the mid-tier upgrade worth it?
Run the math on a real workload before switching. A coding agent burning 2M input tokens and 400K output tokens a day costs about $16 on Sonnet 5 (2 × $4 + 0.4 × $20), against $12 on Sonnet 4.6 (2 × $3 + 0.4 × $15) and $20 on Opus 4.8 (2 × $5 + 0.4 × $25). That puts Sonnet 5 roughly a third above Sonnet 4.6's bill and 20% under Opus 4.8's, for a model that beats Opus 4.8 on SWE-bench Verified. The cost calculator will run this against your own volumes, and the Claude Sonnet 5 pricing breakdown covers the caching and batch math in full.
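The same arithmetic is easy to rerun against your own volumes. The Python sketch below reproduces the 2M-input, 400K-output example using the list prices quoted in this article. `PRICES` and `daily_cost` are hypothetical names, and the sketch ignores caching and batch discounts.

```python
# USD per million tokens (input, output), as quoted in the article.
PRICES = {
    "Claude Sonnet 5": (4.0, 20.0),
    "Claude Sonnet 4.6": (3.0, 15.0),
    "Claude Opus 4.8": (5.0, 25.0),
}


def daily_cost(model, input_tokens, output_tokens):
    """List-price cost in USD for one day's token volume (no cache or batch)."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# The example workload: 2M input tokens and 400K output tokens per day.
costs = {model: daily_cost(model, 2_000_000, 400_000) for model in PRICES}
for model, cost in costs.items():
    print(f"{model}: ${cost:.2f}/day")

s5 = costs["Claude Sonnet 5"]
print(f"vs Sonnet 4.6: {s5 / costs['Claude Sonnet 4.6'] - 1:+.0%}")  # +33%
print(f"vs Opus 4.8:   {s5 / costs['Claude Opus 4.8'] - 1:+.0%}")    # -20%
```

Swap in your own daily token counts to see where the break-even sits for a given input-to-output ratio.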
Where the upgrade clearly pays off: coding agents and long-running tool loops that were hitting Sonnet 4.6's 64K output ceiling, since 200K max output means far fewer truncated responses mid-task. Where it's a harder sell: workloads that are already comfortable on Sonnet 4.6 and don't need the extra output headroom or the coding bump — the 33% price increase isn't free. And if your work depends on Opus 4.8's GPQA-level reasoning ceiling, Sonnet 5 doesn't close that gap; Opus 4.8 stays the pick.
A crowded launch week
Sonnet 5 didn't ship in isolation. The same day, Anthropic's export-control review closed out and Claude Fable 5 was restored to all customers, with AWS reinstating Bedrock access in step — Anthropic frames both moves as the same review concluding cleanly. It's also the week OpenAI's GPT-5.6 left its partner-gated preview for general availability, and Google shipped Gemini 3.5 Pro with a 2-million-token context window. None of that changes the Sonnet 5 math directly, but it's the backdrop: three labs moved their pricing and capability lines in the same week, and benchr's model comparison tool is the fastest way to see how the current lineup shakes out.