These are the three frontier models you're most likely deciding between in the middle of 2026: Claude Fable 5 from Anthropic, GPT-5.5 from OpenAI, and Gemini 3.1 Pro from Google. They landed close together, they all carry roughly million-token context, and they sit at three very different price points. The hard part isn't finding the most capable model in the abstract. It's that the three labs published wildly different amounts of evidence, so a straight benchmark shootout isn't even possible. This piece lays out what each lab actually disclosed, flags the one launch table you shouldn't read at face value, and gives you a per-use-case verdict.
Start with the shape of the field. Gemini 3.1 Pro is the cheapest by a lot and the only one of the three with a full public benchmark sheet. Fable 5 is the most expensive and the most clearly aimed at long-horizon agentic work, but its launch numbers come with an asterisk we'll get into. GPT-5.5 is the awkward middle: mid-priced, broadly capable by reputation, and almost undocumented on the benchmarks that matter for ranking it against the other two.
Price: the gap is enormous
The spread here is wider than the usual frontier-tier shuffle. All prices below are per million tokens. Gemini 3.1 Pro lists at $2 input and $12 output for prompts up to 200K tokens, then steps up to $4 and $18 above 200K, with thinking tokens billed inside that output price. GPT-5.5 sits at $5 input and $30 output, with a cache-hit input price of $0.50. Claude Fable 5 is the priciest at $10 input and $50 output. Fable 5's prompt caching is aggressive, though: a 90% input discount drops cache-hit input to $1, which matters a lot for agentic loops that re-read the same context turn after turn.
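To see how much the cache-hit rates move the bill, here's a minimal sketch of an agent loop that re-sends the same context every turn. The function name, the 100K-token context, and the 20-turn loop are all made up for illustration. It also assumes the first turn pays the full input rate and every later turn is a complete cache hit, which ignores any cache-write fee or cache expiry the providers may apply.

```python
def loop_input_cost(context_tokens, turns, base_per_m, cached_per_m):
    """Input cost in USD for a loop that re-reads the same context each turn.

    Assumption: turn 1 is billed at the full input rate and every later
    turn is a full cache hit. Cache-write fees and expiry are ignored.
    """
    first_turn = context_tokens / 1_000_000 * base_per_m
    later_turns = (turns - 1) * context_tokens / 1_000_000 * cached_per_m
    return first_turn + later_turns


# Hypothetical workload: 100K-token context re-read over 20 turns.
scenarios = {
    "Fable 5, cached": (10.00, 1.00),
    "Fable 5, no cache": (10.00, 10.00),
    "GPT-5.5, cached": (5.00, 0.50),
}
for name, (base, cached) in scenarios.items():
    cost = loop_input_cost(100_000, 20, base, cached)
    print(f"{name}: ${cost:.2f}")
```

Under those assumptions, caching takes Fable 5's input bill for the loop from about $20 to about $2.90. That's why the sticker price overstates what a context-heavy agent actually pays.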
Two pricing wrinkles are easy to miss. GPT-5.5's context isn't a round million; it's 1,050,000 tokens, and any session over 272K input tokens is billed at 2× the input rate and 1.5× the output rate. Gemini's tiered pricing flips at 200K, so a long-context Gemini job is priced very differently from a short one. Read the tier you'll actually land in, not the headline rate.
| Spec | Claude Fable 5 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|
| Input price (per 1M) | $10 ($1 cache-hit) | $5 ($0.50 cache-hit) | $2 up to 200K, $4 above |
| Output price (per 1M) | $50 | $30 | $12 up to 200K, $18 above |
| Context window | 1M tokens | 1,050,000 tokens | 1M tokens |
| Max output | 128K tokens | 128K tokens | 64K tokens |
| Headline published benchmark | SWE-bench Pro 80.3 (see caveat) | HealthBench 56.5 | SWE-bench Verified 80.6 |
| Availability | Free on paid Claude plans Jun 9–22, then credits | Paid API | Preview, no free API tier |
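If you want to check which tier a job lands in, the list prices and thresholds above fit in a few lines. This is a rough sketch, not a billing calculator. The function names are hypothetical, caching is ignored, and it assumes that crossing a threshold reprices the whole request rather than only the tokens past the line. It also assumes Fable 5 has no long-context tier, since none is listed here.

```python
PER_M = 1_000_000


def gemini_31_pro_cost(input_tokens, output_tokens):
    """USD for one request; output_tokens should include thinking tokens."""
    # Assumption: a prompt over 200K moves the whole request to the higher tier.
    if input_tokens <= 200_000:
        in_rate, out_rate = 2.0, 12.0
    else:
        in_rate, out_rate = 4.0, 18.0
    return (input_tokens * in_rate + output_tokens * out_rate) / PER_M


def gpt_55_cost(input_tokens, output_tokens):
    """USD for one request at list price, without cache hits."""
    in_rate, out_rate = 5.0, 30.0
    # Assumption: over 272K input, the 2x / 1.5x multipliers cover all tokens.
    if input_tokens > 272_000:
        in_rate *= 2.0
        out_rate *= 1.5
    return (input_tokens * in_rate + output_tokens * out_rate) / PER_M


def fable_5_cost(input_tokens, output_tokens):
    """USD for one request at list price, without cache hits."""
    return (input_tokens * 10.0 + output_tokens * 50.0) / PER_M


# Two hypothetical jobs: a short prompt and a long-context prompt.
for in_tok, out_tok in [(100_000, 10_000), (300_000, 10_000)]:
    print(
        f"{in_tok:>7} in / {out_tok} out: "
        f"Gemini ${gemini_31_pro_cost(in_tok, out_tok):.2f}, "
        f"GPT-5.5 ${gpt_55_cost(in_tok, out_tok):.2f}, "
        f"Fable 5 ${fable_5_cost(in_tok, out_tok):.2f}"
    )
```

With these assumptions, the 100K-token job comes out near $0.32, $0.80, and $1.50. At 300K input, GPT-5.5's surcharge pushes it to about $3.45, nearly level with Fable 5's uncached $3.50, while Gemini's higher tier lands around $1.38.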
Benchmarks: only one lab gave you a full sheet
This is where the comparison breaks from the usual pattern, because the three labs published radically different amounts of evidence. Gemini 3.1 Pro is the documented one. Google reported ARC-AGI-2 at 77.1, GPQA Diamond at 94.3, Humanity's Last Exam with tools at 51.4, MMMU-Pro at 80.5, SWE-bench Verified at 80.6, and MMMLU at 92.6. That's a broad sheet covering reasoning, science, multimodal, and coding, and it's why Gemini is the one model here you can actually slot into a leaderboard.
Claude Fable 5 led with SWE-bench Pro at 80.3, and on its face that's a strong agentic-coding number. But there's a catch you need to understand before you quote it. Anthropic's launch table reports the higher of Mythos 5 or Fable 5 for each row, and the starred rows, including cyber, biology, and some reasoning scores, reflect Mythos 5, the restricted sibling, not Fable 5. On those starred tasks, everyday Fable 5 lands closer to Opus 4.8 because safety classifiers gate the request and fall back to the smaller model. Anthropic also did not publish SWE-bench Verified or GPQA for Fable 5 at launch, so there's no clean apples-to-apples coding or science number to set against Gemini's.
GPT-5.5 is the one you simply can't rank. OpenAI's only official benchmarks for the flagship are HealthBench 56.5 (length-adjusted) and HealthBench Professional 51.8. It did not publish SWE-bench, Terminal-Bench, or OSWorld. There's no shared coding or reasoning benchmark between GPT-5.5 and the other two, so any on-paper ranking against Fable 5 or Gemini 3.1 Pro would be invented. We're not going to invent one. If a coding or reasoning score for GPT-5.5 matters to your decision, the honest answer is that OpenAI hasn't published it.
Context and limits: close, but not identical
All three are roughly million-token models, which a year ago would've been the headline and today is table stakes. Fable 5 and Gemini 3.1 Pro both list 1M tokens of context. GPT-5.5 actually edges them at 1,050,000, though as noted you pay a surcharge past 272K input. The bigger practical split is on output: Fable 5 and GPT-5.5 both cap at 128K output tokens, while Gemini 3.1 Pro caps at 64K. If your job is generating very long single responses, Gemini's ceiling is half the other two, and that can force you to chunk work that the others would do in one pass.
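The chunking point comes down to a ceiling division. Here's a small sketch, with a hypothetical helper name and a made-up 100K-token target. It treats "64K" and "128K" as 64,000 and 128,000 tokens, which is an assumption about how the caps are counted.

```python
def passes_needed(target_output_tokens, max_output_tokens):
    """Minimum number of responses needed to emit the target output."""
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    # Ceiling division without floating point.
    return -(-target_output_tokens // max_output_tokens)


# Hypothetical job: one 100K-token generated document.
caps = {"Claude Fable 5": 128_000, "GPT-5.5": 128_000, "Gemini 3.1 Pro": 64_000}
for model, cap in caps.items():
    print(model, passes_needed(100_000, cap))
```

For that 100K-token target, the two 128K models finish in a single pass and Gemini needs two. Any stitching logic or context carried between passes is extra work the sketch doesn't count.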
One more limit worth internalizing: Gemini 3.1 Pro folds thinking tokens into its output price. A reasoning-heavy Gemini call bills the thinking against that output rate, so the "cheap" output number is less cheap on hard reasoning than it first looks. It's still the cheapest of the three, just not as cheap as the sticker implies once the model thinks hard.
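A quick way to see the effect is to work out what you pay per million tokens of visible answer once thinking is billed at the same rate. The sketch below uses Gemini's $12 short-prompt output rate. The function name and the token counts are invented for illustration, and it makes no claim about how the other two models bill their reasoning.

```python
def effective_output_rate(visible_tokens, thinking_tokens, out_rate_per_m=12.0):
    """USD per 1M visible output tokens when thinking bills at the output rate."""
    if visible_tokens <= 0:
        raise ValueError("visible_tokens must be positive")
    billed_tokens = visible_tokens + thinking_tokens
    return out_rate_per_m * billed_tokens / visible_tokens


# Hypothetical call: 4,000 tokens of answer after 4,000 tokens of thinking.
print(effective_output_rate(4_000, 4_000))
# Same answer length with no thinking at all.
print(effective_output_rate(4_000, 0))
```

In that made-up case the answer effectively costs $24 per million visible tokens instead of $12. The multiplier grows with the ratio of thinking to answer, so it's worth measuring on your own prompts before budgeting off the sticker rate.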
Availability: one's free right now, one has no free API at all
The access stories are as different as the prices. Claude Fable 5 is free to use inside Claude's Pro, Max, Team, and Enterprise plans from June 9 through June 22, 2026, with usage credits kicking in from June 23. That promo is about the Claude product, not the raw API. There's also a hard restriction to plan around: cyber, bio, and distillation requests fall back to Opus 4.8 rather than running on Fable 5, which is the same safety-gating that makes those starred launch numbers misleading.
GPT-5.5 is straightforward paid API access. Note the naming, though: the flagship GPT-5.5 was announced April 23, 2026, and it's distinct from "GPT-5.5 Instant," which has been the ChatGPT default since May 5, 2026. They're not the same model, and if you're benchmarking, make sure you're hitting the flagship. Gemini 3.1 Pro has no free API tier at all, just an AI Studio UI trial. It was released February 19, 2026 and is still officially a preview, so its prices and limits may move.
Which one for which work
For everyday work where cost and broad capability both matter, default to Gemini 3.1 Pro. It's the cheapest by far, it has the only complete benchmark sheet, and its SWE-bench Verified 80.6 means you're not trading much measured capability for the savings. The two things to watch are the 64K output ceiling and the thinking-tokens-in-output billing on hard reasoning. For most teams this is the sensible default, and the full Gemini 3.1 Pro review goes deeper on where it slips.
For hard, long-horizon agentic engineering where you want the strongest tool and budget is secondary, reach for Claude Fable 5, especially during the June 9–22 free window on paid Claude plans. Its prompt caching makes the high sticker price more bearable on agentic loops that re-read context. Just go in clear-eyed about the launch table: the headline-grabbing cyber and bio rows are Mythos 5, and on safety-gated tasks you're getting something closer to Opus 4.8. The Fable 5 launch breakdown walks through that sibling-table issue in detail.
GPT-5.5 is the hardest to make a clean case for on paper, purely because OpenAI gave you so little to go on. If you already build on the OpenAI stack, GPT-5.5 is a reasonable mid-priced upgrade and its HealthBench numbers are genuinely strong for health-adjacent work. But if you're choosing fresh and you want to rank on evidence, it's tough to recommend over a Gemini that published everything or a Fable 5 that at least published its agentic coding score. The GPT-5.5 review covers what the upgrade buys you over GPT-5.
The one-line version: Gemini 3.1 Pro for value and breadth, Claude Fable 5 for premium agentic work with the caveat read, and GPT-5.5 only if the OpenAI ecosystem or its HealthBench strength is already your reason. None of the three is the obvious winner on every axis, and that's the honest state of the mid-2026 frontier.