Google's biggest cleanup so far
The 2.5 generation carried Google's API story through most of 2025; 2.5 Pro was the first Gemini many teams took seriously for reasoning, and 2.5 Flash became the default cheap multimodal pick. Both got their end date when Google's deprecation table put October 16, 2026 next to each ID. The 2.0 Flash family already went dark June 1, and Gemini 3 Pro Preview lasted just 16 weeks before its March 9 shutdown. By November, nothing before the Gemini 3 line will answer an API call.
One Google-specific wrinkle: the company publishes these as earliest-possible dates and promises advance confirmation of the real one. Don't read that as a reprieve. Plan for October 16; treat anything later as a bonus.
What migration does to the bill
| Path | Input, today | Input, after | Output, today | Output, after |
|---|---|---|---|---|
| 2.5 Pro → 3.1 Pro | $1.25 | $2.00 | $10.00 | $12.00 |
| 2.5 Flash → 3.5 Flash | $0.30 | $1.50 | $2.50 | $9.00 |
Spelled out: the Pro path costs 60% more on input and 20% more on output, and the long-context surcharge above 200K tokens rises in step ($2.50/$15 becomes $4/$18). The Flash path is the painful one. A pipeline pushing 500M input and 50M output tokens a month pays $275 today on 2.5 Flash — and $1,200 on 3.5 Flash. Same traffic, 4.4× the invoice.
To be fair to the replacements: Gemini 3.5 Flash is a far stronger model. It's the coding-agent pick of the current Gemini line, and our review rates it the family's best value at frontier quality. The point isn't that the new models are bad. It's that "recommended replacement" and "equivalent cost" stopped being the same thing, and Google's docs won't do that math for you.
Migration notes that save a weekend
Model IDs: swap to gemini-3.1-pro-preview or gemini-3.5-flash. Note the 3.1 Pro replacement still carries the Preview label — Google retired a preview into a preview once already this year, so pin your tests to current behavior rather than assuming long stability.
Thinking budgets: the Gemini 3 line reasons by default and bills those tokens as output. Output-heavy traffic that was priced on 2.5-era token counts will drift. Watch the first invoices, not just the rate card.
Vertex AI users: Google Cloud's deprecation calendar is managed separately from the Gemini API's. The dates usually rhyme; they aren't guaranteed identical. Check the Vertex model table for your project's region.
Re-quote before you re-platform. Ten minutes in the cost calculator with your real token volumes settles whether the official path, a cheaper provider, or a split (3.5 Flash for hard calls, a budget model for bulk) wins. The rankings show where every current model sits on price and capability.