GPT-5.4, reviewed: the value pick OpenAI doesn't advertise

It held the flagship crown for seven weeks. Reviewing it after the crown moved is when the price-performance story gets honest.

By the benchr team · Figures verified against official sources, June 10, 2026

Input $2.50 / 1M · Output $15 · OpenAI
OSWorld-Verified: 75% · human baseline: 72.4%
Finance benchmark: 87.3% (GPT-5.4 Thinking) · OpenAI internal · GPT-5 scored 43.7%
Context: 1M · surcharge above 272K

Most model reviews run in launch week, when the only available story is the vendor's. This one is deliberately three months late. GPT-5.4 shipped March 5, 2026 as "our most capable and efficient frontier model for professional work," held that position for seven weeks, and handed the crown to GPT-5.5 on April 23. Now the marketing dust has settled, the prices are stable, and the question worth answering isn't "is it impressive" but "who should still buy it." More models deserve this treatment.

What it brought that GPT-5 didn't

Three things, all still true. First, context: up to 1M tokens against GPT-5's 400K, with the caveat that the standard-rate window is 272K and longer inputs carry a surcharge per OpenAI's pricing page. Second, computer use, built in rather than bolted on: 75% on OSWorld-Verified per OpenAI's launch material, against 47.3% for GPT-5.2 and a 72.4% human baseline. A model that crosses the human line on desktop tasks changed what teams could automate, and that capability didn't expire when GPT-5.5 arrived. Third, accuracy: OpenAI reported responses 18% less likely to contain errors than GPT-5.2, and individual claims 33% less likely to be wrong.
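The context caveat reduces to a three-way check on input length. A minimal sketch in Python, assuming "272K" and "1M" mean 272,000 and 1,000,000 tokens (the figures quoted here do not say whether the thousands are decimal or binary) and that "above 272K" is a strict comparison. `context_tier` and its return labels are hypothetical names for illustration, not part of any OpenAI API.

```python
# Assumption: "272K" and "1M" are read as decimal token counts.
STANDARD_WINDOW = 272_000
MAX_CONTEXT = 1_000_000


def context_tier(input_tokens: int) -> str:
    """Classify an input length against the two thresholds in this review."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens <= STANDARD_WINDOW:
        return "standard"      # billed at the standard rate
    if input_tokens <= MAX_CONTEXT:
        return "surcharged"    # fits, but above the standard-rate window
    return "over_limit"        # longer than the advertised maximum


if __name__ == "__main__":
    for n in (50_000, 272_000, 272_001, 1_000_001):
        print(n, context_tier(n))
```

A check like this is most useful in front of a batch job, where a handful of oversized documents can quietly move a run into the surcharged tier.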

The finance tuning is the identity

OpenAI built GPT-5.4 with finance practitioners and reported its internal investment-banking benchmark jumping from 43.7% with GPT-5 to 87.3% with GPT-5.4 Thinking. The model launched alongside ChatGPT for Excel on the same day, with FactSet, S&P Global, and Moody's data integrations following. Those are vendor numbers on a vendor benchmark, so hold them loosely. But the product strategy they describe is real and visible: this was the model OpenAI aimed at people whose job is a workbook. The spreadsheets roundup covers how that bet landed in practice.

Where the record is thin

No official SWE-bench Verified score was published for GPT-5.4, which is unusual for a 2026 frontier release and means its coding position rests on estimates. benchr's index carries an editorial estimate of 80%, flagged as such and sitting between GPT-5's official 74.9% and GPT-5.5's official 84.0%; treat it as a placeholder, not a measurement. If your buying decision hangs on verified coding numbers, GPT-5.5 and Claude Opus 4.8 publish theirs; GPT-5.4 makes you test for yourself.

Against GPT-5.5: the halving question

GPT-5.5 costs exactly double ($5/$30 per 1M input/output tokens against $2.50/$15), and its published gains are real: 84.0% SWE-bench Verified, stronger agentic coding, and the tuning attention a current flagship gets. The honest split: if your work is frontier coding agents or you need the verified benchmark ceiling, pay for 5.5. If your work is documents, spreadsheets, computer use, and long-context analysis, GPT-5.4 does the job at half the rate. The surcharge structure is identical on both models, so neither escapes the 272K cliff. The full rate math lives in the GPT-5.4 pricing breakdown.
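The base-rate arithmetic behind that split, as a rough sketch. The rates are the per-1M-token figures quoted in this review. The model keys are plain labels rather than confirmed API identifiers, and `base_cost` is a hypothetical helper. The long-context surcharge is deliberately left out because this review does not state its size.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), taken from the figures in this review.
# The dictionary keys are illustrative labels, not confirmed API model names.
RATES = {
    "gpt-5.4": (Decimal("2.50"), Decimal("15")),
    "gpt-5.5": (Decimal("5"), Decimal("30")),
}
TOKENS_PER_UNIT = Decimal(1_000_000)


def base_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Base-rate cost in USD, ignoring any surcharge above the 272K window."""
    input_rate, output_rate = RATES[model]
    total = input_rate * input_tokens + output_rate * output_tokens
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    for model in RATES:
        print(model, base_cost(model, 200_000, 4_000))
```

For a job of 200,000 input tokens and 4,000 output tokens, this works out to $0.56 on GPT-5.4 and $1.12 on GPT-5.5 before any surcharge. Because both rates double together, the 2x ratio holds for any mix of input and output.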

The family footnote

GPT-5.4 mini and nano followed on March 17. Mini became the free-tier ChatGPT model, which tells you its quality floor; nano is API-only for volume work. benchr's verified record does not yet hold official per-token rates for either, so this review scores only the main model. One more footnote for the timeline: GPT-5.3 Instant, a separate fast ChatGPT default from March 3, is unrelated to this API family despite the neighboring number. OpenAI's naming did nobody favors that month.

Frequently asked

Is GPT-5.4 still worth using after GPT-5.5?

Yes, for a specific shape of work. It costs exactly half of GPT-5.5 and keeps the features that separated it from GPT-5: context up to 1M tokens and built-in computer use at 75% OSWorld-Verified. If you need those but not the flagship's benchmark ceiling, it's the better buy.

What is GPT-5.4 best at?

Professional document work. OpenAI tuned it on real finance workflows (internal investment-banking benchmark: 43.7% with GPT-5, 87.3% with GPT-5.4 Thinking) and launched it alongside ChatGPT for Excel. Computer use is built in, at 75% on OSWorld-Verified against a 72.4% human baseline.

What are GPT-5.4's weaknesses?

No official SWE-bench Verified score, so its coding position rests on estimates. The advertised 1M context carries a surcharge above the 272K standard window. And OpenAI's attention moved to GPT-5.5 seven weeks after launch, so expect fewer updates than the flagship gets.

Changelog

  • June 10, 2026 — Published as a deliberate retrospective, three months after the March 5 launch. Pricing and context verified on OpenAI's pricing page; OSWorld, accuracy, and finance figures attributed to OpenAI's launch material; the SWE-bench gap flagged as an honest hole in the record.

References

  1. OpenAI, "Introducing GPT-5.4 mini and nano," openai.com, March 17, 2026.
  2. OpenAI API pricing, openai.com/api/pricing, verified June 10, 2026.
  3. OpenAI, "Introducing ChatGPT for Excel and new financial data integrations," openai.com, March 5, 2026.
  4. OpenAI model release notes, help.openai.com, accessed June 10, 2026.