Most model reviews run in launch week, when the only available story is the vendor's. This one is deliberately three months late. GPT-5.4 shipped March 5, 2026 as "our most capable and efficient frontier model for professional work," held that position for seven weeks, and handed the crown to GPT-5.5 on April 23. Now the marketing dust has settled, the prices are stable, and the question worth answering isn't "is it impressive" but "who should still buy it." More models deserve this treatment.
What it brought that GPT-5 didn't
Three things, all still true. First, context: up to 1M tokens against GPT-5's 400K, with the caveat that the standard-rate window is 272K and longer inputs carry a surcharge per OpenAI's pricing page. Second, computer use, built in rather than bolted on: 75% on OSWorld-Verified per OpenAI's launch material, against 47.3% for GPT-5.2 and a 72.4% human baseline. Crossing the human line on desktop tasks changed what teams could automate, and that capability didn't expire when GPT-5.5 arrived. Third, accuracy: OpenAI reported that responses were 18% less likely to contain errors than GPT-5.2's, and individual claims 33% less likely to be wrong.
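To make the context caveat concrete, here is a minimal Python sketch that sorts a prompt into the tiers described above. The 272K and 1M figures come from this review; treating "1M" as exactly 1,000,000 tokens, treating both boundaries as inclusive, and the function name context_tier are assumptions for illustration, not anything OpenAI documents in this form.

```python
# Window sizes quoted in this review. Treating "1M" as exactly 1,000,000
# tokens and both boundaries as inclusive is an assumption.
STANDARD_WINDOW_TOKENS = 272_000
MAX_CONTEXT_TOKENS = 1_000_000


def context_tier(input_tokens: int) -> str:
    """Classify a prompt by which pricing tier its input length falls into.

    Hypothetical helper: returns "standard", "surcharged", or "over_limit".
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens <= STANDARD_WINDOW_TOKENS:
        return "standard"      # billed at the standard rate
    if input_tokens <= MAX_CONTEXT_TOKENS:
        return "surcharged"    # fits the window, but past the 272K line
    return "over_limit"        # does not fit the context window at all


if __name__ == "__main__":
    for tokens in (50_000, 300_000, 1_200_000):
        print(tokens, context_tier(tokens))
```

Run over a sample of real prompts, a check like this shows how much of a workload would ever touch the surcharge tier, which matters more than the headline 1M figure.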
The finance tuning is the identity
OpenAI built GPT-5.4 with finance practitioners and reported its internal investment-banking benchmark jumping from 43.7% with GPT-5 to 87.3% with GPT-5.4 Thinking. The model launched alongside ChatGPT for Excel on the same day, with FactSet, S&P Global, and Moody's data integrations following. Those are vendor numbers on a vendor benchmark, so hold them loosely. But the product strategy they describe is real and visible: this was the model OpenAI aimed at people whose job is a workbook. The spreadsheets roundup covers how that bet landed in practice.
Where the record is thin
No official SWE-bench Verified score was published for GPT-5.4, which is unusual for a 2026 frontier release and means its coding position rests on estimates. benchr's index carries an editorial estimate of 80% (flagged as an estimate, sitting between GPT-5's official 74.9% and GPT-5.5's official 84.0%), and you should treat it exactly that way. If your buying decision hangs on verified coding numbers, GPT-5.5 and Claude Opus 4.8 publish theirs; GPT-5.4 makes you test for yourself.
Against GPT-5.5: the halving question
GPT-5.5 costs exactly double ($5/$30 against $2.50/$15), and its published gains are real: 84.0% on SWE-bench Verified, stronger agentic coding, and the tuning attention that goes to a new flagship. The honest split: if your work is frontier coding agents or you need the verified benchmark ceiling, pay for 5.5. If your work is documents, spreadsheets, computer use, and long-context analysis, GPT-5.4 does the job at half the rate, and the surcharge structure is identical, so neither model escapes the 272K cliff. The full rate math lives in the GPT-5.4 pricing breakdown.
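The halving can be checked with a few lines of Python. This is a sketch under stated assumptions: it reads the quoted pairs as USD per million input and output tokens (the usual convention, though the figures above do not state the unit), and it models the long-context surcharge as a single multiplier applied to the whole request once input passes 272K. That multiplier and its value are placeholders, since the actual surcharge is not given here, and the names Rates and request_cost are hypothetical.

```python
from dataclasses import dataclass

STANDARD_WINDOW_TOKENS = 272_000  # standard-rate window cited in this review


@dataclass(frozen=True)
class Rates:
    """Per-model prices. Unit (USD per 1M tokens) is an assumption."""
    input_per_million: float
    output_per_million: float


GPT_5_4 = Rates(input_per_million=2.50, output_per_million=15.00)
GPT_5_5 = Rates(input_per_million=5.00, output_per_million=30.00)


def request_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 long_context_multiplier: float = 1.0) -> float:
    """Estimate the USD cost of one request.

    long_context_multiplier is a placeholder for the surcharge past 272K;
    applying it to the whole request is an assumption, not OpenAI's
    documented billing rule.
    """
    base = (input_tokens * rates.input_per_million
            + output_tokens * rates.output_per_million) / 1_000_000
    if input_tokens > STANDARD_WINDOW_TOKENS:
        return base * long_context_multiplier
    return base


if __name__ == "__main__":
    for name, rates in (("GPT-5.4", GPT_5_4), ("GPT-5.5", GPT_5_5)):
        cost = request_cost(rates, input_tokens=100_000, output_tokens=10_000)
        print(f"{name}: ${cost:.2f}")
```

Under those assumptions, a request with 100,000 input tokens and 10,000 output tokens comes to $0.40 on GPT-5.4 and $0.80 on GPT-5.5. Because the same multiplier would apply to both models, the two-to-one ratio holds on either side of the 272K line.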
The family footnote
GPT-5.4 mini and nano followed on March 17. Mini became the free-tier ChatGPT model, which tells you its quality floor; nano is API-only for volume work. Neither yet has official per-token rates in benchr's verified record, so this review scores only the main model. One more footnote for the timeline: GPT-5.3 Instant, a separate fast ChatGPT default from March 3, is unrelated to this API family despite the neighboring number. OpenAI's naming did nobody favors that month.