Review·Covers April 2026·Published May 30, 2026

Claude Opus 4.7, reviewed

Name: Claude Opus 4.7, reviewed
Item: Claude Opus 4.7
Rating: 4.6
Author: benchr

Coding, long-document analysis, and multilingual capability. What Opus 4.7's pricing and documented capability profile imply for the workloads you should put on it.

By the benchr team · Updated May 30, 2026 · View changelog · Figures verified against official sources, 30 May 2026

benchr rating: 4.6 / 5

Input cost / 1M $5.00 Per Anthropic's pricing page

SWE-Bench Verified 87.6% Anthropic-reported, on SWE-bench Verified

Context window 1M Advertised by Anthropic

Output cost / 1M $25 vs $15 for Sonnet 4.6

Anthropic shipped Claude Opus 4.7 on April 16, 2026, per Anthropic's launch announcement. The headline chart did not move the needle. Every frontier model now lives in the upper 90s on the benchmarks that used to discriminate between them, and Opus 4.7 is no exception. The pricing table and the model card tell you more about who should pay for it than the leaderboards do.

This piece reads the public evidence: the launch announcement, the published Claude API documentation, the SWE-bench Verified score Anthropic reports, the pricing page, and the comparison patterns already in the open between Opus, GPT-5, and the Gemini 3 family. The goal is to tell you which workloads you should put on Opus 4.7 and which ones you should not.

If your job involves production code or dense documents, or reasoning where a wrong answer carries a concrete cost, Opus is worth paying for. Everything else stays on Sonnet 4.6. The sections below work through why.

What the documented capability profile says

Anthropic positions Opus 4.7 on three documented strengths. Coding comes first: the model is the company's lead candidate on SWE-bench Verified, the public benchmark closest to measuring whether a model can complete a real GitHub issue against a production repository. Then there's reasoning under uncertainty, which the release notes describe in terms of multi-step problem-solving and "hedging in the right places." Long-document analysis rounds it out, with the 1M-token window positioned as the differentiator.

These are positioning claims, so weigh them as you would any other vendor capability claim. They tell you what the model was tuned for and where the lab expects it to hold up, but they aren't independent verification of how it behaves on your workload. Put Opus on a task well outside those three strengths, say fast multimodal vision work, and the published case for the premium gets thin.

Coding: where the price difference earns its keep

SWE-bench Verified is the most useful public signal for coding. Anthropic reports Opus 4.7 at 87.6%, against Sonnet 4.6 in the low-80s and GPT-5 within a few points either way depending on configuration. The rank order between Opus and GPT-5 has been close enough on the public leaderboards through 2026 that you should not pick a model based on the small delta. Pick on what the model is built for, then verify on your repo.

What Opus is built for, on the coding side, is the long-horizon architectural call. Reading a thousand-line file and proposing a sane split between UI state, IO, and command dispatch. Spotting where a method should be synchronous instead of async because the call path is hot. The lab keeps surfacing examples of that kind of work in its release material, and the public agent-style evaluations (SWE-bench Verified, Aider's leaderboard) point in the same direction. If your work is closer to "ship this feature in this 50-file codebase," that is the case for paying for Opus.

When the work is closer to "rename this variable across 20 files," Sonnet 4.6 does it for 60% of the price, and the architectural taste you'd be paying extra for goes unused.

The three Claude tiers sorted by where Anthropic positions each. Pick the cheapest one that still does your job. Source: Anthropic product pages, May 2026.

Long documents: query the window, don't dump into it

Opus 4.7 advertises a 1M-token context window. Read that number as a ceiling for retrieval rather than a promise about summarization quality. Anthropic's own retrieval research, and the parallel work coming out of the academic community on long-context attention, has documented that attention inside very long windows is uneven. Every token is in scope, but the model weights them unevenly once the window gets that long.

So when you put a 300-page document in front of Opus, do not ask for a one-shot summary on the first pass. Treat the window as a database: drop the document in, then ask specific factual questions against it. Retrieval inside that long context holds up where one-shot compression tends to slip. Once you have the specific facts pulled, ask for the summary on a second pass, with those facts in hand. The marketing around million-token windows consistently oversells the one-shot summary and says nothing about the query-first approach that works.

A million-token window is for pulling facts back out on demand, not for crushing a whole document into one summary on the first pass.

For workloads dominated by long-document analysis, the kind legal review and government policy work throw off, that pattern is where Opus 4.7 earns its premium over a smaller model that cannot hold the document at all. If you just need a fast summary of a thirty-page doc, Sonnet 4.6 is more than enough.

Multilingual: a known weak spot at the edges

In English or one of the heavily-resourced European languages, Opus 4.7 produces text that needs only light editing. Push further from the training mass, to Arabic where a dialect register matters, or romanized Hindi, or Bahasa aimed at an Indonesian rather than a Malaysian audience, and the grammar still holds up while the tone slips.

The assessment here is qualitative; there are no numbers to lean on. No public leaderboard scores "natural Khaleeji Arabic" or "register-appropriate Bahasa for a Jakarta startup," and the published multilingual benchmarks score machine-translation quality on news text rather than dialect fit. Take the model's strong overall multilingual reputation as a starting point and plan for an editing pass wherever tone matters. For more on the Arabic case specifically, benchr's Arabic-content piece goes into where each frontier model lands.

The failure modes worth knowing about

A handful of failure patterns get flagged often enough in the public community discussion (Anthropic's own discussion forums, the Claude subreddit, Aider's issue tracker) to call out before you commit a workflow to Opus 4.7.

First: over-explanation. Ask Opus a yes-or-no factual question and you'll often get the right answer followed by paragraphs of caveats. The fix is a system-prompt instruction: "one-line answer, then stop." That works on most calls. It costs you on usability, not on capability.

Second: confident API hallucination in less-popular libraries. The model is reliable on the standard libraries of major languages. Move into a niche library and it can produce confident-looking signatures that don't exist. Never trust an API signature you can't verify against the library's documentation, which is good practice anyway. The defect is that the model doesn't flag its own uncertainty.

Third: helpful drift on refactors. Ask Opus to refactor one method and you may get a quietly refactored neighbor you did not ask for. Spell out the scope of the change at the start of the request. The default behavior over-reaches.

What it costs

Opus 4.7 lists at $5 per million input tokens and $25 per million output tokens, per Anthropic's published pricing. Cached input is cheaper. Your monthly bill is a linear function of your token volume, not a fixed subscription, so the only number that matters is your typical session size.

List pricing reference, May 2026 — Anthropic Claude family
Tier	Input / 1M	Output / 1M	Workload fit
Opus 4.7	$5.00	$25	Production code, long-document work, hard reasoning
Sonnet 4.6	$3.00	$15	Daily driver: drafts, chat, routine code
Haiku 4.5	$1.00	$5	High-volume, classification, extraction

The real pricing decision is which tier you default to and when you escalate off it. Default to the cheapest tier your workload tolerates, and move up to Opus only when the answer matters enough that a wrong one costs more than the model fee. Most teams get this backwards, defaulting to Opus and barely touching Sonnet. If you find your monthly bill climbing past where the value justifies it, the fix is almost always to push more traffic down the stack to Sonnet.

1.67× Opus 4.7's input price vs Sonnet 4.6. Output gap is wider.

The comparison Opus 4.7 has to win is the one against GPT-5; Sonnet is a different question. For that, the head-to-head piece walks through where each model leads, with Opus holding the documented edge on code and long-document analysis and GPT-5 ahead on structured output and vision. For a workload that mixes both, the cheaper play is to keep both in your stack and route each task to whichever model suits it, rather than committing to one and living with its weaker corners. The price-per-use-case piece shows what that routing looks like in practice.

The verdict

Claude Opus 4.7 is the right default for serious technical work where a wrong answer has a concrete cost: coding, long-document analysis, reasoning under uncertainty. That is where it earns its premium, and nowhere much else. For routine work, the chat and summaries and formatting and simple code that fill most days, drop to Sonnet 4.6 and keep the savings. The 1.67× input gap and the wider output gap are only worth paying when the work demands it.

If you're building a production stack that needs an Anthropic model in 2026, you'll likely end up running two of them: Opus for the calls that matter, Sonnet for everything else. That split is what the pricing page seems to assume, and setting your stack up to match it will save you more than almost anything else you do this quarter.

Frequently asked

Is Claude Opus 4.7 worth $5 per million input tokens?

For technical work like coding, document analysis, and complex reasoning, yes. The published SWE-bench Verified score and the headroom the model has on architectural reasoning justify the premium over Sonnet 4.6 ($3 per million input). For simple chat or content generation, Sonnet is the better deal at roughly 60% of the price.

How does Claude Opus 4.7 compare to GPT-5?

On the public coding leaderboards, the two trade places by a small margin. Anthropic positions Opus on reasoning under uncertainty and long-document work. OpenAI positions GPT-5 on structured output, vision, and breadth. For mixed workloads, picking the cheaper model first and reserving the other for hard calls beats committing to one.

What does Claude Opus 4.7 cost?

Per Anthropic's pricing page: $5 per million input tokens and $25 per million output tokens, with cached input rates lower. Spend scales linearly with your token volume; calculate from your typical session size rather than from a published anecdote.

When should you use Sonnet 4.6 instead of Opus 4.7?

Routine work: chat, summarization, draft generation, simple coding tasks. Sonnet is roughly 60% of the price and handles most of these well enough that Opus's edge is not worth the spend.

What's the effective context window for Claude Opus 4.7?

Advertised at 1M tokens. Treat long context as a queryable surface for retrieval rather than a one-shot summarization buffer. Attention quality is uneven across very long windows, a pattern Anthropic and independent researchers have flagged.

Changelog

May 25, 2026 — Rewrote sections that previously narrated original lab tests; the article now grounds its verdict in published benchmarks, pricing, and Anthropic's own positioning. Pricing verified against current provider documentation.
May 4, 2026 — Added Sonnet 4.6 comparison section after readers asked for the cross-tier math.
April 22, 2026 — Originally published.

References

Anthropic, "Claude API Documentation," docs.claude.com, accessed May 2026.
Anthropic, "Claude Pricing," anthropic.com/pricing, accessed May 2026.
LMSYS, "Chatbot Arena leaderboard," lmarena.ai, May 2026 snapshot.
"SWE-bench Verified leaderboard," swebench.com, May 2026.
Anthropic, "Introducing Claude Opus 4.7," anthropic.com/news/claude-opus-4-7, April 16, 2026.