Comparison · Covers June 2026 · Published June 13, 2026

Claude Fable 5 vs GPT-5.5 vs Gemini 3.1 Pro

The three newest frontier models, head to head on price, context, the benchmarks each lab published, and which one fits which job.

By the benchr team · Updated July 23, 2026


Cheapest input: $2 per 1M tokens (Gemini 3.1 Pro)

Priciest output: $50 per 1M tokens (Claude Fable 5)

Gemini SWE-bench Verified: 80.6% (Google-reported)

GPT-5.5 coding benchmarks: 0 (none published by OpenAI)

These are the three frontier models you're most likely deciding between in the middle of 2026: Claude Fable 5 from Anthropic, GPT-5.5 from OpenAI, and Gemini 3.1 Pro from Google. They landed close together, they all carry roughly million-token context, and they sit at three very different price points. The hard part isn't finding the most capable model in the abstract. It's that the three labs published wildly different amounts of evidence, so a straight benchmark shootout isn't even possible. This piece lays out what each lab actually disclosed, flags the one launch table you shouldn't read at face value, and gives you a per-use-case verdict.

Start with the shape of the field. Gemini 3.1 Pro is the cheapest by a lot and the only one of the three with a full public benchmark sheet. Fable 5 is the most expensive and the most clearly aimed at long-horizon agentic work, but its launch numbers come with an asterisk we'll get into. GPT-5.5 is the awkward middle: mid-priced, broadly capable by reputation, and almost undocumented on the benchmarks that matter for ranking it against the other two.

Price: the gap is enormous

The spread here is wider than the usual frontier-tier shuffle. Gemini 3.1 Pro lists at $2 per million input tokens and $12 output for prompts up to 200K tokens, then steps up to $4 and $18 above 200K, with thinking tokens billed inside that output price. GPT-5.5 sits at $5 input and $30 output, with a cache-hit input price of $0.50. Claude Fable 5 is the priciest at $10 input and $50 output. Fable 5's prompt caching is aggressive, though: a 90% input discount drops cache-hit input to $1, which matters a lot for agentic loops that re-read the same context turn after turn.
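To make the caching point concrete, here is a minimal sketch of input spend for an agentic loop that re-sends the same context every turn. It uses only the Fable 5 list prices quoted above ($10 standard input, $1 cache-hit input). It assumes, purely for illustration, that the first turn pays full price, every later turn is a full cache hit, and there is no separate cache-write charge; real billing may differ. The function name `loop_input_cost` is hypothetical.

```python
# Sketch: input-token spend for an agentic loop that re-reads the same context.
# Illustrative assumptions, not provider billing rules: turn 1 pays the full
# input price, every later turn is a 100% cache hit, no cache-write surcharge.

FABLE5_INPUT_PER_M = 10.00      # $ per 1M input tokens (list price quoted above)
FABLE5_CACHE_HIT_PER_M = 1.00   # $ per 1M cache-hit input tokens (90% discount)


def loop_input_cost(context_tokens: int, turns: int, cached: bool) -> float:
    """Return dollars spent on input across `turns` re-reads of one context."""
    if turns <= 0:
        return 0.0
    per_turn_full = context_tokens / 1_000_000 * FABLE5_INPUT_PER_M
    if not cached:
        return per_turn_full * turns
    per_turn_hit = context_tokens / 1_000_000 * FABLE5_CACHE_HIT_PER_M
    # The first turn populates the cache at full price; later turns read from it.
    return per_turn_full + per_turn_hit * (turns - 1)


if __name__ == "__main__":
    ctx, turns = 200_000, 50
    print(f"uncached: ${loop_input_cost(ctx, turns, cached=False):.2f}")
    print(f"cached:   ${loop_input_cost(ctx, turns, cached=True):.2f}")
```

Under these assumptions, a 200K-token context re-read over 50 turns costs $100.00 in input uncached versus $11.80 cached.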

Two pricing wrinkles are easy to miss. GPT-5.5's context isn't a round million; it's 1,050,000 tokens, and any session over 272K input tokens carries a surcharge of 2× on input and 1.5× on output. Gemini's tiered pricing flips at 200K, so a long-context Gemini job is priced very differently from a short one. Read the tier you'll actually land in, not the headline rate.
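A rough way to see which tier you land in is to price a request both below and above the thresholds. The sketch below encodes the list prices from this section under two stated assumptions: that Gemini's higher rate applies to the whole request once the prompt exceeds 200K tokens, and that GPT-5.5's surcharge applies to the whole request once input exceeds 272K tokens. Caching discounts are ignored, and the model keys are hypothetical labels, not real API model IDs.

```python
# Sketch: list-price cost of one uncached request, per the tiers described above.
# Assumptions (verify against live provider pricing):
#  - Gemini 3.1 Pro: once the prompt exceeds 200K tokens, the $4/$18 rates
#    apply to the whole request.
#  - GPT-5.5: once input exceeds 272K tokens, the whole request's input rate is
#    doubled and its output rate is multiplied by 1.5.
# Model keys are illustrative labels, not API model IDs.


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in dollars for a single uncached request."""
    m_in = input_tokens / 1_000_000
    m_out = output_tokens / 1_000_000
    if model == "gemini-3.1-pro":
        if input_tokens <= 200_000:
            return m_in * 2.0 + m_out * 12.0
        return m_in * 4.0 + m_out * 18.0
    if model == "gpt-5.5":
        in_rate, out_rate = 5.0, 30.0
        if input_tokens > 272_000:
            in_rate, out_rate = in_rate * 2, out_rate * 1.5
        return m_in * in_rate + m_out * out_rate
    if model == "claude-fable-5":
        return m_in * 10.0 + m_out * 50.0
    raise ValueError(f"unknown model key: {model}")


if __name__ == "__main__":
    for model in ("gemini-3.1-pro", "gpt-5.5", "claude-fable-5"):
        for prompt in (150_000, 300_000):
            cost = request_cost(model, prompt, 10_000)
            print(f"{model:15} {prompt:>8,} in  ${cost:.2f}")
```

Under these assumptions, moving a 10K-token answer from a 150K-token prompt to a 300K-token prompt takes Gemini 3.1 Pro from $0.42 to $1.38 and GPT-5.5 from $1.05 to $3.45, while Fable 5 goes from $2.00 to $3.50 because it has no tier to cross.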

Claude Fable 5 vs GPT-5.5 vs Gemini 3.1 Pro: prices, context, provider-published headline results, and availability; checked July 23, 2026.
Spec	Claude Fable 5	GPT-5.5	Gemini 3.1 Pro
Input price (per 1M)	$10 ($1 cache-hit)	$5 ($0.50 cache-hit)	$2 up to 200K, $4 above
Output price (per 1M)	$50	$30	$12 up to 200K, $18 above
Context window	1M tokens	1,050,000 tokens	1M tokens
Max output	128K tokens	128K tokens	64K tokens
Headline published benchmark	SWE-bench Pro 80.3 (see caveat)	HealthBench 56.5	SWE-bench Verified 80.6
Availability	Generally available; plan credits apply	Paid API	Preview; provider terms apply

Benchmarks: only one lab gave you a full sheet

This is where the comparison breaks from the usual pattern, because the three labs published radically different amounts. Gemini 3.1 Pro is the documented one. Google reported ARC-AGI-2 at 77.1, GPQA Diamond at 94.3, Humanity's Last Exam with tools at 51.4, MMMU-Pro at 80.5, SWE-bench Verified at 80.6, and MMMLU at 92.6. That's a broad sheet covering reasoning, science, multimodal, and coding, and it's why Gemini is the one model here you can actually slot into a leaderboard.

Anthropic reported SWE-bench Pro 80.3 for Fable 5 at launch. Its table also reports the higher of Mythos 5 or Fable 5 for some rows; the starred cyber, biology, and reasoning rows reflect the restricted Mythos 5 sibling rather than ordinary Fable 5 behavior. Fable 5 requests that a safety classifier flags now return an explicit refusal. A retry on another model happens only if the developer or product has configured one; a refused request is not automatically answered by another Claude model such as Opus. Anthropic also did not publish SWE-bench Verified or GPQA for Fable 5 at launch, so there is no clean shared coding or science result to set against Gemini's.

GPT-5.5 is the one you simply can't rank. OpenAI's only official benchmarks for the flagship are HealthBench 56.5 (length-adjusted) and HealthBench Professional 51.8. It did not publish SWE-bench, Terminal-Bench, or OSWorld. There's no shared coding or reasoning benchmark between GPT-5.5 and the other two, so any on-paper ranking against Fable 5 or Gemini 3.1 Pro would be invented. We're not going to invent one. If a coding or reasoning score for GPT-5.5 matters to your decision, the honest answer is that OpenAI hasn't published it.

Published benchmark coverage, by lab (what each lab disclosed at launch; this measures disclosure, not capability)

Gemini 3.1 Pro: full sheet
Claude Fable 5: partial
GPT-5.5: health only

Context and limits: close, but not identical

All three are roughly million-token models, which a year ago would've been the headline and today is table stakes. Fable 5 and Gemini 3.1 Pro both list 1M tokens of context. GPT-5.5 actually edges them at 1,050,000, though as noted you pay a surcharge past 272K input. The bigger practical split is on output: Fable 5 and GPT-5.5 both cap at 128K output tokens, while Gemini 3.1 Pro caps at 64K. If your job is generating very long single responses, Gemini's ceiling is half that of the other two, and that can force you to chunk work that the others would do in one pass.
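The chunking cost of a lower output cap is simple ceiling division. This sketch uses the max-output figures from the comparison table and assumes the full cap is available for visible output on every call; if reasoning tokens or per-chunk context overhead eat into the cap, more passes may be needed. The `passes_needed` helper and model keys are hypothetical.

```python
import math

# Max output tokens per response, from the comparison table above.
# Model keys are illustrative labels, not API model IDs.
MAX_OUTPUT = {
    "claude-fable-5": 128_000,
    "gpt-5.5": 128_000,
    "gemini-3.1-pro": 64_000,
}


def passes_needed(target_output_tokens: int, model: str) -> int:
    """Minimum calls needed to emit `target_output_tokens`, assuming each call
    can use its full output cap (no overhead for stitching chunks together)."""
    return math.ceil(target_output_tokens / MAX_OUTPUT[model])


if __name__ == "__main__":
    for model in MAX_OUTPUT:
        print(model, passes_needed(100_000, model))
```

For a 100K-token deliverable, the 128K-cap models need one pass, while Gemini 3.1 Pro needs two.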

One more limit worth internalizing: Gemini 3.1 Pro folds thinking tokens into its output price. A reasoning-heavy Gemini call bills the thinking against that output rate, so the "cheap" output number is less cheap on hard reasoning than it first looks. It's still the cheapest of the three, just not as cheap as the sticker implies once the model thinks hard.
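A small worked example shows how much thinking tokens can move the Gemini bill. The sketch assumes, as described above, that thinking tokens are billed at the same output rate as visible tokens, using $12 per 1M for prompts up to 200K and $18 above. The token counts in the example are illustrative, and `gemini_output_cost` is a hypothetical helper name.

```python
# Sketch: Gemini 3.1 Pro output-side cost when thinking tokens are billed at the
# output rate. Rates are the list prices quoted in this article; the tier is
# assumed to follow the prompt length.


def gemini_output_cost(visible_tokens: int, thinking_tokens: int, prompt_tokens: int) -> float:
    """Dollar cost of one response's output, counting thinking as output."""
    rate_per_m = 12.0 if prompt_tokens <= 200_000 else 18.0
    return (visible_tokens + thinking_tokens) / 1_000_000 * rate_per_m


if __name__ == "__main__":
    # Illustrative numbers: a 5K-token answer with and without 20K thinking tokens.
    print(f"no thinking:   ${gemini_output_cost(5_000, 0, 50_000):.2f}")
    print(f"with thinking: ${gemini_output_cost(5_000, 20_000, 50_000):.2f}")
```

Here the same 5K-token answer costs $0.06 without thinking and $0.30 with 20K thinking tokens, a fivefold jump on the output side.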

Availability: all three are paid, under current provider terms

Claude Fable 5 is generally available worldwide again. A temporary promotion included it in paid Claude-plan limits through July 7, 2026; that window has ended, and current use consumes plan credits. In the API, a safety-classified request returns a refusal response. If your application should retry another model, you must configure that behavior and account for the retry model's own price.
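Because a refusal is not retried unless you set that up, the fallback logic lives in your application. Below is a minimal, SDK-agnostic sketch of that pattern. `Reply`, `ask_with_fallback`, and the `refused` flag are hypothetical stand-ins, not any provider's real response objects; map them onto whatever your client library actually returns. `cost_usd` is whatever your own accounting records for each call, so the retry model's price shows up in the total.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    """Hypothetical, SDK-agnostic view of a model response."""
    text: str
    refused: bool    # True when the provider returned an explicit refusal
    cost_usd: float  # cost your own accounting attributes to this call


def ask_with_fallback(
    prompt: str,
    primary: Callable[[str], Reply],
    fallback: Optional[Callable[[str], Reply]] = None,
) -> tuple[Reply, float]:
    """Call `primary`; retry on `fallback` only if one is configured and the
    primary refused. Returns the reply used and total spend across calls."""
    first = primary(prompt)
    if not first.refused or fallback is None:
        return first, first.cost_usd
    second = fallback(prompt)
    return second, first.cost_usd + second.cost_usd


if __name__ == "__main__":
    # Stub clients standing in for real API calls.
    def refusing_model(prompt: str) -> Reply:
        return Reply(text="", refused=True, cost_usd=0.02)

    def other_model(prompt: str) -> Reply:
        return Reply(text="answer", refused=False, cost_usd=0.01)

    reply, spend = ask_with_fallback("example prompt", refusing_model, other_model)
    print(reply.text, f"${spend:.2f}")
```

Leaving `fallback` as `None` reproduces the default behavior described above: the refusal is returned to the caller unchanged.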

GPT-5.5 is straightforward paid API access. Note the naming, though: the flagship GPT-5.5 was announced April 23, 2026, and it's distinct from "GPT-5.5 Instant," which has been the ChatGPT default since May 5, 2026. They're not the same model, so if you're benchmarking, make sure you're hitting the flagship. Gemini 3.1 Pro has no free API tier at all, just a UI trial in AI Studio. It's also still officially a preview (released February 19, 2026), so its prices and limits may move.

5×: Fable 5's input list price ($10 per 1M tokens) is five times Gemini 3.1 Pro's low-tier rate ($2)

Which one for which work

For everyday work where cost and broad capability both matter, default to Gemini 3.1 Pro. It's the cheapest by far, it has the only complete benchmark sheet, and its Google-reported SWE-bench Verified 80.6 suggests you're not giving up much coding capability for the savings, though neither rival published that benchmark for a direct comparison. The two things to watch are the 64K output ceiling and the thinking-tokens-in-output billing on hard reasoning. For most teams this is the sensible default, and the full Gemini 3.1 Pro review goes deeper on where it slips.

For hard, long-horizon agentic engineering where budget is secondary, test Claude Fable 5 on your own acceptance set. Prompt caching can reduce repeated-input cost, but the promotion has ended and ordinary credits apply. Read the launch table carefully: some starred cyber and biology rows are Mythos 5 results, and safety-gated Fable requests return refusals unless your application explicitly retries elsewhere. The Fable 5 launch breakdown explains the distinction.
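When you run your own acceptance set, one hedged way to weigh a pricier model against a cheaper one is cost per accepted task: total spend divided by tasks that passed. The sketch below assumes you have already recorded a pass/fail result and a dollar cost per task; the helper name and the sample data are hypothetical.

```python
# Sketch: cost per accepted task from your own acceptance-set results.
# Each entry is (passed, cost_usd) for one task; the sample data is made up.


def cost_per_accepted_task(results: list[tuple[bool, float]]) -> float:
    """Total spend divided by tasks passed; infinity if nothing passed."""
    passed = sum(1 for ok, _ in results if ok)
    spend = sum(cost for _, cost in results)
    return spend / passed if passed else float("inf")


if __name__ == "__main__":
    sample = [(True, 3.0), (False, 2.0), (True, 1.0)]
    print(f"${cost_per_accepted_task(sample):.2f} per accepted task")
```

A model with a higher list price can still win on this metric if it passes enough more tasks, which is the trade-off worth measuring before committing to Fable 5.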

GPT-5.5 is the hardest to make a clean case for on paper, purely because OpenAI gave you so little to go on. If you already build on the OpenAI stack, GPT-5.5 is a reasonable mid-priced upgrade and its HealthBench numbers are strong for health-adjacent work. But if you're choosing fresh and you want to rank on evidence, it's tough to recommend over a Gemini with a full public benchmark sheet or a Fable 5 that at least published its agentic coding score. The GPT-5.5 review covers what the upgrade buys you over GPT-5.

The one-line version: Gemini 3.1 Pro for value and breadth, Claude Fable 5 for premium agentic work with the caveat read, and GPT-5.5 only if the OpenAI ecosystem or its HealthBench strength is already your reason. None of the three is the obvious winner on every axis, and that's the honest state of the mid-2026 frontier.

Frequently asked

Which of the three is cheapest?

Gemini 3.1 Pro, by a wide margin. It lists at $2 per million input tokens and $12 output for prompts up to 200K tokens, rising to $4 and $18 above that. GPT-5.5 sits in the middle at $5 input and $30 output. Claude Fable 5 is the priciest at $10 input and $50 output, though prompt caching cuts its cache-hit input to $1.

Which model has the most published benchmarks?

Gemini 3.1 Pro, easily. Google published ARC-AGI-2 at 77.1, GPQA Diamond at 94.3, Humanity's Last Exam with tools at 51.4, MMMU-Pro at 80.5, SWE-bench Verified at 80.6, and MMMLU at 92.6. GPT-5.5 shipped only HealthBench numbers. Fable 5 led with SWE-bench Pro at 80.3 but did not publish SWE-bench Verified or GPQA at launch.

Why is Fable 5's launch table misleading?

Anthropic's launch table reports the higher of Mythos 5 or Fable 5 for each row. Starred cyber, biology, and some reasoning rows reflect restricted Mythos 5 results. A safety-classified Fable request returns an explicit refusal; another model is used only when the developer or product has configured a retry. So you cannot read every number in that table as ordinary Fable 5 performance.

Can you rank GPT-5.5 against the other two on benchmarks?

No. OpenAI published only HealthBench 56.5 (length-adjusted) and HealthBench Professional 51.8 for the flagship GPT-5.5. It did not publish SWE-bench, Terminal-Bench, or OSWorld. With no shared coding or reasoning benchmark, any head-to-head ranking against Fable 5 or Gemini 3.1 Pro on paper would be guesswork.

Which one should I pick for agentic coding?

If budget is secondary and the work is long-horizon agentic engineering, test Claude Fable 5 against your workload; its temporary paid-plan inclusion ended July 7, 2026 and current use consumes plan credits. Gemini 3.1 Pro has a lower listed API price and a provider-published coding score. GPT-5.5 is harder to compare for coding because OpenAI published no coding benchmark for it.

Do any of these have a free API tier?

Not in the usual sense. The three APIs are paid under their providers' current terms. Fable 5's temporary inclusion in paid Claude-plan limits ended July 7, 2026; that was a product-plan promotion, not a free API tier. Check each provider's live pricing before purchase because terms can change.

Changelog

July 23, 2026 — Updated Fable 5 restoration and expired promotion status; corrected refusal and developer-configured fallback behavior.
June 13, 2026 — Originally published. Prices, context windows, max output, published benchmarks, availability windows, and the Mythos 5 launch-table caveat verified against Anthropic, OpenAI, and Google official documentation.

References

Anthropic, "Claude Pricing," anthropic.com/pricing, accessed June 2026.
Anthropic, "Claude API Documentation," docs.claude.com, accessed June 2026.
OpenAI, "API Pricing," openai.com/api/pricing, accessed June 2026.
OpenAI, "API Documentation," platform.openai.com/docs, accessed June 2026.
Google, "Gemini API pricing," ai.google.dev/pricing, accessed June 2026.
Google, "Gemini models," ai.google.dev/gemini-api/docs, accessed June 2026.