Opus 4.8 vs GPT-5.5: the coder's flagship vs the daily driver

Both charge $5 per million input tokens. After that they pull apart fast. Here's where each one wins and where it loses.

Figures verified against official sources, 30 May 2026

Output price: $25 (Opus 4.8) vs $30 (GPT-5.5) per 1M tokens
Opus 4.8 on SWE-bench Pro: 69.2%, Anthropic-reported, vs 58.6% for GPT-5.5
GPT-5.5 on Terminal-Bench 2.0: 82.7%, OpenAI-reported, state of the art
Opus 4.8 fast mode: $10 / $50 per 1M input / output tokens, up to 2.5× output speed

Start with the number that surprises people: these two flagships charge the exact same $5 per million input tokens. That's the price of feeding them your code, your docs, your context. The split shows up on output: Opus 4.8 bills $25 per million, GPT-5.5 bills $30. A five-dollar-per-million edge sounds tiny until you're generating long agent transcripts all day, where output dominates the bill.

This is the rematch of an older fight. Last time it was GPT-5 against Opus 4.7, and both labs have shipped a point release since. The shape of the contest has changed, so the verdict deserves a fresh look rather than a swapped-in version number.

Standard API pricing, May 2026, per OpenAI and Anthropic developer docs

| Model | Input ($/M) | Output ($/M) | Notes |
|---|---|---|---|
| Claude Opus 4.8 | $5 | $25 | 1M context, 128K max output |
| Opus 4.8 fast mode | $10 | $50 | Up to 2.5× output speed, research preview |
| GPT-5.5 | $5 | $30 | 1.05M context, $0.50 cached input |
| GPT-5.5 Pro | $30 | $180 | Higher-accuracy tier, no cached discount |
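To see how the output price moves a real bill, here is a minimal sketch that turns the table above into a cost function. The prices come from the table; the model labels are just dictionary keys for this example, not API identifiers, and the monthly token volumes are hypothetical numbers chosen to represent an output-heavy agent workload. Cached-input discounts are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Standard API prices from the table above.
# Keys are illustrative labels, not real API model IDs.
PRICES = {
    "opus-4.8": Price(5.0, 25.0),
    "opus-4.8-fast": Price(10.0, 50.0),
    "gpt-5.5": Price(5.0, 30.0),
    "gpt-5.5-pro": Price(30.0, 180.0),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Bill for one workload at list price, ignoring cached-input discounts."""
    p = PRICES[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


# Hypothetical monthly workload: 20M tokens in, 100M tokens out.
INPUT_TOKENS = 20_000_000
OUTPUT_TOKENS = 100_000_000

for model in PRICES:
    print(f"{model:14s} ${cost_usd(model, INPUT_TOKENS, OUTPUT_TOKENS):,.2f}")
```

Under those assumed volumes the input line is identical for Opus 4.8 and GPT-5.5, and the whole difference between the two bills comes from the output column. Swap in your own token counts before drawing conclusions.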

Coding, where Opus 4.8 pulls ahead

If your decision rests on writing and reviewing software, Opus 4.8 is the pick. Anthropic reports it at 69.2% on SWE-bench Pro, the tougher real-world variant that scores end-to-end GitHub issue resolution, against the 58.6% OpenAI reports for GPT-5.5. That's a 10.6-point gap on the benchmark closest to actual production work. Anthropic also calls Opus 4.8 the strongest computer-use and browser-agent model it has tested, at 84% on Online-Mind2Web.

The quieter win is about trust. Anthropic says Opus 4.8 is around four times less likely than Opus 4.7 to let a flaw in code it wrote pass without comment, and more willing to flag when it isn't sure. For code review, that instinct to raise a hand is worth more than a benchmark point, because the bug that ships quietly is the one that costs you. The full picture is in the Opus 4.8 review.

Winner on production coding: Opus 4.8, clearly.

The benchmark Opus loses

Now the honest part, because a comparison that only lists one model's wins isn't worth reading. GPT-5.5 beats Opus 4.8 on Terminal-Bench, the benchmark for long, multi-step command-line agent work. On Anthropic's own run with the common Terminus-2 public harness, Opus 4.8 lands around 74.6% to GPT-5.5's 78.2%. OpenAI reports GPT-5.5 even higher, a state-of-the-art 82.7% on Terminal-Bench 2.0, and Anthropic's footnote notes GPT-5.5 reaches 83.4% under OpenAI's own Codex CLI harness.

The harness matters, and the scores move with it, but the direction doesn't change: GPT-5.5 is the stronger terminal agent. If your workload is an autonomous agent grinding through shell commands, builds, and tool calls for hours, that's GPT-5.5's home turf, and it's exactly the kind of work OpenAI built this release around.

Winner on terminal-agent work: GPT-5.5.

GPT-5.5 as the daily driver

Outside of raw coding, GPT-5.5 is the broader generalist. It carries a slightly larger 1.05-million-token context window, OpenAI tuned it for concise answers, and it's pitched squarely at professional, document-heavy knowledge work: research, synthesis, analysis. It's also ChatGPT's default model for all users, so it's the one most of your non-developer colleagues are already using. The GPT-5 review traces how that generalist lineage holds up.

Where the comparison gets hardest to call is writing. Neither lab markets a "best writer," and for long-form drafting the two trade blows in ways benchmarks won't capture. We pull that specific contest apart in Claude vs ChatGPT for long-form writing, including the output-length ceilings that decide how much either one can produce in a single pass.

Fast mode and the cost math

Opus 4.8's fast mode is the new lever. For $10 input and $50 output per million, double the standard rate, you get up to 2.5 times the output tokens per second, with identical model behavior. The notable change is that this is a third of the price of fast mode on Opus 4.7, which ran $30 and $150. If latency is your bottleneck on an interactive coding agent, that's a real option now rather than a luxury. It's still gated behind a waitlist and API-only, so treat it as a tool you grow into.
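The trade in fast mode is dollars for seconds, and a rough sketch makes it concrete. The prices and the 2.5× ceiling come from the figures above; the baseline throughput of 50 tokens per second is a made-up placeholder (no throughput number is published here), and because the speedup is quoted as "up to", the time saved is a best case.

```python
STANDARD_OUT = 25.0   # USD per 1M output tokens, standard Opus 4.8
FAST_OUT = 50.0       # USD per 1M output tokens, fast mode
MAX_SPEEDUP = 2.5     # quoted as "up to", so treat as an upper bound


def fast_mode_tradeoff(output_tokens: int, baseline_tps: float,
                       speedup: float = MAX_SPEEDUP) -> tuple[float, float]:
    """Return (extra_cost_usd, seconds_saved) for the output side of one response.

    baseline_tps is an assumed standard-mode throughput in tokens per second.
    Input-token surcharge is ignored to keep the sketch short.
    """
    extra_cost = output_tokens * (FAST_OUT - STANDARD_OUT) / 1_000_000
    standard_seconds = output_tokens / baseline_tps
    fast_seconds = output_tokens / (baseline_tps * speedup)
    return extra_cost, standard_seconds - fast_seconds


# Hypothetical case: a 4,000-token patch at an assumed 50 tokens/second.
extra, saved = fast_mode_tradeoff(output_tokens=4_000, baseline_tps=50.0)
print(f"extra cost: ${extra:.2f}, best-case time saved: {saved:.0f}s")

# Price ratio against Opus 4.7 fast mode ($30 / $150), input and output.
print(30 / 10, 150 / 50)
```

If the measured speedup on your traffic is lower than 2.5×, pass it as `speedup` and the savings shrink accordingly while the extra cost stays the same.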

For everything cost-sensitive, the matched $5 input price is the number to plan around. Feeding context costs the same at list price on both; only your output mix moves the bill. For a fuller cost breakdown across workloads, price per use case does the math.

Verdict

Make Opus 4.8 your default for writing and reviewing code, where it wins on SWE-bench Pro, on computer use, and on willingness to flag its own mistakes. Switch to GPT-5.5 for terminal-agent runs, the broadest knowledge work, and as the all-purpose model your whole team can share. Their input prices match, so go with both if you ship software, and skip the GPT-5.5 Pro tier unless you've measured that you need it.
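If you do run both, the verdict reduces to a small routing table. The sketch below is one way to write it down, not a recommended architecture: the task categories are a hypothetical taxonomy invented for this example, and the model strings are plain labels rather than real API identifiers.

```python
from enum import Enum


class Task(Enum):
    # Hypothetical task taxonomy for illustration only.
    CODE_WRITING = "code_writing"
    CODE_REVIEW = "code_review"
    COMPUTER_USE = "computer_use"
    TERMINAL_AGENT = "terminal_agent"
    KNOWLEDGE_WORK = "knowledge_work"
    GENERAL = "general"


OPUS = "opus-4.8"   # label, not an API model ID
GPT = "gpt-5.5"     # label, not an API model ID

# Routing that mirrors the verdict: Opus for code and computer use,
# GPT-5.5 for terminal agents, knowledge work, and everything else.
ROUTES = {
    Task.CODE_WRITING: OPUS,
    Task.CODE_REVIEW: OPUS,
    Task.COMPUTER_USE: OPUS,
    Task.TERMINAL_AGENT: GPT,
    Task.KNOWLEDGE_WORK: GPT,
}


def pick_model(task: Task) -> str:
    """Return the model label for a task, defaulting to the generalist."""
    return ROUTES.get(task, GPT)


for task in Task:
    print(f"{task.value:15s} -> {pick_model(task)}")
```

Defaulting unlisted tasks to GPT-5.5 follows the "daily driver" framing; flip the default if most of your unclassified traffic is code.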

Frequently asked

Is Opus 4.8 or GPT-5.5 better at coding?

On most coding and agentic benchmarks, Opus 4.8 leads. Anthropic reports it at 69.2% on SWE-bench Pro against GPT-5.5's 58.6%, and calls it the strongest computer-use model it has tested at 84% on Online-Mind2Web. The clear exception is Terminal-Bench, where GPT-5.5 wins. So Opus 4.8 is the better default for building and reviewing code, with one real caveat.

Which benchmark does Opus 4.8 lose to GPT-5.5?

Terminal-Bench. On Anthropic's own run with the Terminus-2 public harness, Opus 4.8 scores about 74.6% to GPT-5.5's 78.2%. OpenAI separately reports GPT-5.5 at a state-of-the-art 82.7% on Terminal-Bench 2.0, and Anthropic's footnote notes GPT-5.5 reaches 83.4% under OpenAI's Codex CLI harness. Either way, GPT-5.5 wins the terminal-agent benchmark.

How much do Opus 4.8 and GPT-5.5 cost?

Both charge $5 per million input tokens. Output is where they split: Opus 4.8 is $25 per million, GPT-5.5 is $30 per million. Opus also offers a fast mode at $10 input and $50 output for up to 2.5 times the output speed, and GPT-5.5 has a separate Pro model at $30 input and $180 output.

What is Opus 4.8 fast mode?

Fast mode is a research-preview option that runs Opus 4.8 at up to 2.5 times the output tokens per second for $10 input and $50 output per million tokens. The model weights and behavior are identical to standard Opus 4.8; you're paying for throughput. It's gated behind a waitlist and is API-only.

Should I run both models?

If you ship software, yes. Make Opus 4.8 the default for writing and reviewing code, and keep GPT-5.5 for terminal-heavy agent work, long-context knowledge tasks, and as your general daily driver. Their input prices match, so the cost of keeping both keys is small next to picking the right model per job.

Changelog

  • May 30, 2026 — Originally published. Pricing and benchmark figures verified against OpenAI and Anthropic developer docs and launch materials; Terminal-Bench framing checked against Anthropic's own footnote.

References

  1. Anthropic, "Introducing Claude Opus 4.8," anthropic.com/news, accessed May 2026.
  2. Anthropic, "Pricing," platform.claude.com, accessed May 2026.
  3. Anthropic, "Fast mode," platform.claude.com/docs, accessed May 2026.
  4. OpenAI, "Introducing GPT-5.5," openai.com, accessed May 2026.
  5. OpenAI, "GPT-5.5 API model card," developers.openai.com, accessed May 2026.