Claude Fable 5 vs GPT-5.5 vs Gemini 3.1 Pro

The three newest frontier models, head to head on price, context, the benchmarks each lab actually published, and which one fits which job.

By the benchr team

Cheapest input: $2 per 1M tokens (Gemini 3.1 Pro)
Priciest output: $50 per 1M tokens (Claude Fable 5)
Gemini SWE-bench Verified: 80.6% (Google-reported)
GPT-5.5 coding benchmarks: 0 (none published by OpenAI)

These are the three frontier models you're most likely deciding between in the middle of 2026: Claude Fable 5 from Anthropic, GPT-5.5 from OpenAI, and Gemini 3.1 Pro from Google. They landed close together, they all carry roughly million-token context, and they sit at three very different price points. The hard part isn't finding the most capable model in the abstract. It's that the three labs published wildly different amounts of evidence, so a straight benchmark shootout isn't even possible. This piece lays out what each lab actually disclosed, flags the one launch table you shouldn't read at face value, and gives you a per-use-case verdict.

Start with the shape of the field. Gemini 3.1 Pro is the cheapest by a lot and the only one of the three with a full public benchmark sheet. Fable 5 is the most expensive and the most clearly aimed at long-horizon agentic work, but its launch numbers come with an asterisk we'll get into. GPT-5.5 is the awkward middle: mid-priced, broadly capable by reputation, and almost undocumented on the benchmarks that matter for ranking it against the other two.

Price: the gap is enormous

The spread here is wider than the usual frontier-tier shuffle. Gemini 3.1 Pro lists at $2 per million input tokens and $12 output for prompts up to 200K tokens, then steps up to $4 and $18 above 200K, with thinking tokens billed inside that output price. GPT-5.5 sits at $5 input and $30 output, with a cache-hit input price of $0.50. Claude Fable 5 is the priciest at $10 input and $50 output. Fable 5's prompt caching is aggressive, though: a 90% input discount drops cache-hit input to $1, which matters a lot for agentic loops that re-read the same context turn after turn.
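To see why caching matters so much for agentic loops, here is a rough Python sketch built on the list prices above. It assumes a hypothetical loop that re-sends the same shared context every turn, pays the full input rate for that context on turn one, and gets a cache hit on every later turn. Cache-write fees and long-context tiers are ignored. The workload (20 turns, 100K shared context, 2K new input and 1K output per turn) is illustrative, not measured, and it stays under GPT-5.5's 272K surcharge threshold. Gemini is left out because this article quotes no cache-hit rate for it.

```python
from dataclasses import dataclass
from typing import Optional

PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Price:
    """List prices in USD per 1M tokens, as quoted in this article."""
    input: float
    output: float
    cache_hit_input: Optional[float] = None  # None: no cache-hit rate quoted


FABLE_5 = Price(input=10.0, output=50.0, cache_hit_input=1.0)
GPT_5_5 = Price(input=5.0, output=30.0, cache_hit_input=0.50)


def agent_loop_cost(price: Price, turns: int, shared_ctx: int,
                    new_in: int, out: int, use_cache: bool = True) -> float:
    """Rough USD cost of a loop that re-sends the same shared context each turn.

    Assumptions: turn 1 pays the full input rate for the shared context; later
    turns pay the cache-hit rate for it (when use_cache is set and a rate exists).
    New input always pays the full rate. Cache-write fees and tiers are ignored.
    """
    can_cache = use_cache and price.cache_hit_input is not None
    total = 0.0
    for turn in range(turns):
        ctx_rate = price.cache_hit_input if (can_cache and turn > 0) else price.input
        total += shared_ctx * ctx_rate / PER_MILLION
        total += new_in * price.input / PER_MILLION
        total += out * price.output / PER_MILLION
    return total


if __name__ == "__main__":
    # Hypothetical workload: 20 turns, 100K shared context, 2K new input, 1K output.
    workload = dict(turns=20, shared_ctx=100_000, new_in=2_000, out=1_000)
    print(f"Fable 5, cached:   ${agent_loop_cost(FABLE_5, **workload):.2f}")
    print(f"Fable 5, uncached: ${agent_loop_cost(FABLE_5, use_cache=False, **workload):.2f}")
    print(f"GPT-5.5, cached:   ${agent_loop_cost(GPT_5_5, **workload):.2f}")
```

Under those assumptions the loop costs about $4.30 on Fable 5 with caching versus $21.40 without it, and about $2.25 on GPT-5.5 with caching. The cached context line is where nearly all of the difference comes from.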

Two pricing wrinkles are easy to miss. GPT-5.5's context isn't a round million: it's 1,050,000 tokens, and any session over 272K input tokens carries a surcharge of 2× on input and 1.5× on output. Gemini's tiered pricing flips at 200K, so a long-context Gemini job is priced very differently from a short one. Read the tier you'll actually land in, not the headline rate.
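Tiered and surcharged pricing is easy to get wrong by hand, so here is a minimal Python sketch of a per-request estimate at list price with no cache hits. The function names are hypothetical. The article doesn't say whether the higher rate applies to the whole request or only to the tokens past the threshold; this sketch assumes the whole request, so treat its numbers as an upper-end estimate.

```python
PER_MILLION = 1_000_000


def gpt55_request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD for one GPT-5.5 request at list price, ignoring cache hits.

    Assumption: past 272K input tokens, the 2x input / 1.5x output surcharge
    applies to the whole request, not only to the tokens above the threshold.
    """
    in_rate, out_rate = 5.0, 30.0
    if input_tokens > 272_000:
        in_rate, out_rate = in_rate * 2.0, out_rate * 1.5
    return (input_tokens * in_rate + output_tokens * out_rate) / PER_MILLION


def gemini31_request_cost(prompt_tokens: int, output_tokens: int) -> float:
    """USD for one Gemini 3.1 Pro request at list price.

    output_tokens should include thinking tokens, which bill at the output rate.
    Assumption: a prompt over 200K moves the whole request to the higher tier.
    """
    if prompt_tokens <= 200_000:
        in_rate, out_rate = 2.0, 12.0
    else:
        in_rate, out_rate = 4.0, 18.0
    return (prompt_tokens * in_rate + output_tokens * out_rate) / PER_MILLION


if __name__ == "__main__":
    for tokens in (272_000, 300_000):
        print(f"GPT-5.5, {tokens:,} in / 10,000 out: ${gpt55_request_cost(tokens, 10_000):.2f}")
    for tokens in (200_000, 250_000):
        print(f"Gemini 3.1 Pro, {tokens:,} in / 10,000 out: ${gemini31_request_cost(tokens, 10_000):.2f}")
```

Under that assumption, moving a GPT-5.5 request from 272K to 300K input tokens (with 10K output) takes it from $1.66 to $3.45, and moving a Gemini prompt from 200K to 250K takes it from $0.52 to $1.18. A modest bump in input can roughly double the bill once you cross a threshold.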

Claude Fable 5 vs GPT-5.5 vs Gemini 3.1 Pro: prices, context, the headline published benchmark, and availability, per each provider's official documentation as of June 13, 2026.
| Spec | Claude Fable 5 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|
| Input price (per 1M tokens) | $10 ($1 cache-hit) | $5 ($0.50 cache-hit) | $2 up to 200K, $4 above |
| Output price (per 1M tokens) | $50 | $30 | $12 up to 200K, $18 above |
| Context window | 1M tokens | 1,050,000 tokens | 1M tokens |
| Max output | 128K tokens | 128K tokens | 64K tokens |
| Headline published benchmark | SWE-bench Pro 80.3 (see caveat) | HealthBench 56.5 | SWE-bench Verified 80.6 |
| Availability | Free on paid Claude plans Jun 9–22, then credits | Paid API | Preview, no free API tier |

Benchmarks: only one lab gave you a full sheet

This is where the comparison breaks from the usual pattern, because the three labs published radically different amounts. Gemini 3.1 Pro is the documented one. Google reported ARC-AGI-2 at 77.1, GPQA Diamond at 94.3, Humanity's Last Exam with tools at 51.4, MMMU-Pro at 80.5, SWE-bench Verified at 80.6, and MMMLU at 92.6. That's a broad sheet covering reasoning, science, multimodal, and coding, and it's why Gemini is the one model here you can actually slot into a leaderboard.

Claude Fable 5 led with SWE-bench Pro at 80.3, and on its face that's a strong agentic-coding number. But there's a catch you need to understand before you quote it. Anthropic's launch table reports the higher of Mythos 5 or Fable 5 for each row, and the starred rows, including cyber, biology, and some reasoning scores, reflect Mythos 5, the restricted sibling, not Fable 5. On those starred tasks, everyday Fable 5 performs closer to Opus 4.8, because safety classifiers gate the request and hand it to Opus 4.8 instead. Anthropic also did not publish SWE-bench Verified or GPQA for Fable 5 at launch, so there's no clean apples-to-apples coding or science number to set against Gemini's.

GPT-5.5 is the one you simply can't rank. OpenAI's only official benchmarks for the flagship are HealthBench 56.5 (length-adjusted) and HealthBench Professional 51.8. It did not publish SWE-bench, Terminal-Bench, or OSWorld. There's no shared coding or reasoning benchmark between GPT-5.5 and the other two, so any on-paper ranking against Fable 5 or Gemini 3.1 Pro would be invented. We're not going to invent one. If a coding or reasoning score for GPT-5.5 matters to your decision, the honest answer is that OpenAI hasn't published it.

Published benchmark coverage, by lab

How much each lab disclosed at launch. This measures disclosure, not capability.

Gemini 3.1 Pro: full sheet
Claude Fable 5: partial
GPT-5.5: health only

Context and limits: close, but not identical

All three are roughly million-token models, which a year ago would've been the headline and today is table stakes. Fable 5 and Gemini 3.1 Pro both list 1M tokens of context. GPT-5.5 actually edges them at 1,050,000, though as noted you pay a surcharge past 272K input. The bigger practical split is on output: Fable 5 and GPT-5.5 both cap at 128K output tokens, while Gemini 3.1 Pro caps at 64K. If your job is generating very long single responses, Gemini's ceiling is half that of the other two, and that can force you to chunk work that the others would do in one pass.
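The chunking cost of that 64K ceiling is just ceiling division, sketched below in Python with the max-output caps from the table. The helper name is hypothetical, and the result is a lower bound: real chunked generation usually adds overhead (overlap between chunks, context re-sent on each call) that this ignores.

```python
MAX_OUTPUT_TOKENS = {
    "Claude Fable 5": 128_000,
    "GPT-5.5": 128_000,
    "Gemini 3.1 Pro": 64_000,
}


def passes_needed(target_output_tokens: int, model: str) -> int:
    """Lower bound on calls needed to emit target_output_tokens under a model's cap.

    Uses integer ceiling division; ignores chunk overlap and re-sent context.
    """
    if target_output_tokens <= 0:
        raise ValueError("target_output_tokens must be positive")
    cap = MAX_OUTPUT_TOKENS[model]
    return -(-target_output_tokens // cap)


if __name__ == "__main__":
    for name in MAX_OUTPUT_TOKENS:
        print(f"{name}: {passes_needed(100_000, name)} pass(es) for 100K output tokens")
```

For a 100K-token deliverable, that's one pass on Fable 5 or GPT-5.5 and at least two on Gemini 3.1 Pro.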

One more limit worth internalizing: Gemini 3.1 Pro folds thinking tokens into its output price. A reasoning-heavy Gemini call bills the thinking against that output rate, so the "cheap" output number is less cheap on hard reasoning than it first looks. It's still the cheapest of the three, just not as cheap as the sticker implies once the model thinks hard.
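One way to see how thinking tokens erode the sticker price is to compute the cost per million tokens of visible answer. The Python sketch below defaults to Gemini 3.1 Pro's $12 low-tier output rate from this article; the 3:1 thinking-to-answer ratio in the example is hypothetical, and real ratios vary by prompt.

```python
def effective_output_rate(visible_tokens: int, thinking_tokens: int,
                          output_rate: float = 12.0) -> float:
    """USD per 1M *visible* output tokens when thinking tokens bill at the output rate.

    The default is Gemini 3.1 Pro's $12 low-tier output price from this article.
    """
    if visible_tokens <= 0:
        raise ValueError("visible_tokens must be positive")
    billed_tokens = visible_tokens + thinking_tokens
    return output_rate * billed_tokens / visible_tokens


if __name__ == "__main__":
    # Hypothetical 3:1 thinking-to-answer ratio on a hard reasoning prompt.
    print(effective_output_rate(visible_tokens=2_000, thinking_tokens=6_000))
```

At that hypothetical ratio, the $12 sticker works out to $48 per million tokens of visible answer, four times the headline rate.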

Availability: one's free right now, one has no free API at all

The access stories are as different as the prices. Claude Fable 5 is free to use inside Claude's Pro, Max, Team, and Enterprise plans from June 9 through June 22, 2026, with usage credits kicking in from June 23. That promo is about the Claude product, not the raw API. There's also a hard restriction to plan around: cyber, bio, and distillation requests fall back to Opus 4.8 rather than running on Fable 5, which is the same safety-gating that makes those starred launch numbers misleading.
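The in-product access rules are easy to misremember, so here is a small Python sketch that encodes them as this article states them. The helper names and category labels are hypothetical; this is not an Anthropic API, and it covers only the promo inside the Claude apps, not API billing.

```python
from datetime import date

PROMO_FIRST_DAY = date(2026, 6, 9)
PROMO_LAST_DAY = date(2026, 6, 22)  # inclusive; usage credits start June 23
PROMO_PLANS = {"Pro", "Max", "Team", "Enterprise"}
FALLBACK_CATEGORIES = {"cyber", "bio", "distillation"}  # served by Opus 4.8


def serving_model(category: str) -> str:
    """Which model actually answers, per the fallback rule described above."""
    return "Opus 4.8" if category in FALLBACK_CATEGORIES else "Claude Fable 5"


def in_product_billing(day: date, plan: str) -> str:
    """Billing for Fable 5 inside the Claude apps (not the API) on a given day."""
    if plan not in PROMO_PLANS:
        return "not covered by the promo"
    if PROMO_FIRST_DAY <= day <= PROMO_LAST_DAY:
        return "free"
    if day > PROMO_LAST_DAY:
        return "usage credits"
    return "before the promo window"


if __name__ == "__main__":
    print(in_product_billing(date(2026, 6, 22), "Max"), serving_model("coding"))
    print(in_product_billing(date(2026, 6, 23), "Max"), serving_model("bio"))
```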

GPT-5.5 is straightforward paid API access. Note the naming, though: the flagship GPT-5.5 was announced April 23, 2026, and it's distinct from "GPT-5.5 Instant," which has been the ChatGPT default since May 5, 2026. They're not the same model, and if you're benchmarking, make sure you're hitting the flagship. Gemini 3.1 Pro has no free API tier at all, just an AI Studio UI trial, and it's still officially a preview, released February 19, 2026, so its prices and limits may move.

Fable 5's input list price is five times Gemini 3.1 Pro's low-tier rate: $10 versus $2 per 1M tokens.

Which one for which work

For everyday work where cost and broad capability both matter, default to Gemini 3.1 Pro. It's the cheapest by far, it has the only complete benchmark sheet, and its SWE-bench Verified 80.6 means you're not trading much measured capability for the savings. The two things to watch are the 64K output ceiling and the thinking-tokens-in-output billing on hard reasoning. For most teams this is the sensible default, and the full Gemini 3.1 Pro review goes deeper on where it slips.

For hard, long-horizon agentic engineering where you want the strongest tool and budget is secondary, reach for Claude Fable 5, especially during the June 9–22 free window on paid Claude plans. Its prompt caching makes the high sticker price more bearable on agentic loops that re-read context. Just go in clear-eyed about the launch table: the headline-grabbing cyber and bio rows are Mythos 5, and on safety-gated tasks you're getting something closer to Opus 4.8. The Fable 5 launch breakdown walks through that sibling-table issue in detail.

GPT-5.5 is the hardest to make a clean case for on paper, purely because OpenAI gave you so little to go on. If you already build on the OpenAI stack, GPT-5.5 is a reasonable mid-priced upgrade and its HealthBench numbers are genuinely strong for health-adjacent work. But if you're choosing fresh and you want to rank on evidence, it's tough to recommend over a Gemini that published everything or a Fable 5 that at least published its agentic coding score. The GPT-5.5 review covers what the upgrade buys you over GPT-5.

The one-line version: Gemini 3.1 Pro for value and breadth, Claude Fable 5 for premium agentic work with the caveat read, and GPT-5.5 only if the OpenAI ecosystem or its HealthBench strength is already your reason. None of the three is the obvious winner on every axis, and that's the honest state of the mid-2026 frontier.

Frequently asked

Which of the three is cheapest?

Gemini 3.1 Pro, by a wide margin. It lists at $2 per million input tokens and $12 output for prompts up to 200K tokens, rising to $4 and $18 above that. GPT-5.5 sits in the middle at $5 input and $30 output. Claude Fable 5 is the priciest at $10 input and $50 output, though prompt caching cuts its cache-hit input to $1.

Which model has the most published benchmarks?

Gemini 3.1 Pro, easily. Google published ARC-AGI-2 at 77.1, GPQA Diamond at 94.3, Humanity's Last Exam with tools at 51.4, MMMU-Pro at 80.5, SWE-bench Verified at 80.6, and MMMLU at 92.6. GPT-5.5 shipped only HealthBench numbers. Fable 5 led with SWE-bench Pro at 80.3 but did not publish SWE-bench Verified or GPQA at launch.

Why is Fable 5's launch table misleading?

Anthropic's launch table reports the higher of Mythos 5 or Fable 5 for each row. The starred rows, including cyber, biology, and some reasoning scores, reflect Mythos 5, the restricted sibling. On those tasks real-world Fable 5 lands closer to Opus 4.8 because safety classifiers gate the request. So you can't read every number in that table as Fable 5's everyday performance.

Can you rank GPT-5.5 against the other two on benchmarks?

Not really. OpenAI published only HealthBench 56.5 (length-adjusted) and HealthBench Professional 51.8 for the flagship GPT-5.5. It did not publish SWE-bench, Terminal-Bench, or OSWorld. With no shared coding or reasoning benchmark, any head-to-head ranking against Fable 5 or Gemini 3.1 Pro on paper would be guesswork.

Which one should I pick for agentic coding?

If budget is no object and the work is long-horizon agentic engineering, Claude Fable 5 is the premium pick, and it's free on Claude's Pro, Max, Team, and Enterprise plans from June 9 to 22, 2026. If you want strong published coding numbers at a fraction of the cost, Gemini 3.1 Pro posts SWE-bench Verified 80.6 at $2 input. GPT-5.5 is harder to recommend for coding specifically because OpenAI published no coding benchmark for it.

Do any of these have a free API tier?

Not in the usual sense. Gemini 3.1 Pro has no free API tier, only an AI Studio UI trial. Fable 5 is free to use inside Claude's Pro, Max, Team, and Enterprise plans from June 9 to 22, 2026, with usage credits starting June 23, but that's the product, not the API. GPT-5.5 is paid API access. Gemini 3.1 Pro is still a preview, so its prices and limits may change.

Changelog

  • June 13, 2026 — Originally published. Prices, context windows, max output, published benchmarks, availability windows, and the Mythos 5 launch-table caveat verified against Anthropic, OpenAI, and Google official documentation.

References

  1. Anthropic, "Claude Pricing," anthropic.com/pricing, accessed June 2026.
  2. Anthropic, "Claude API Documentation," docs.claude.com, accessed June 2026.
  3. OpenAI, "API Pricing," openai.com/api/pricing, accessed June 2026.
  4. OpenAI, "API Documentation," platform.openai.com/docs, accessed June 2026.
  5. Google, "Gemini API pricing," ai.google.dev/pricing, accessed June 2026.
  6. Google, "Gemini models," ai.google.dev/gemini-api/docs, accessed June 2026.