Gemini 3.5 Flash, reviewed

Cheap frontier for agent loops, fast on output, and priced to undercut Pro. Just don't confuse it with the old budget Flash.

· View changelog · Figures verified against official sources, 30 May 2026

Agent loops are where token bills go to die. A coding agent that plans, calls tools, reads results, and retries can burn millions of tokens chewing through one ticket, and the model you pick gets multiplied by every step. That economics problem is exactly what Google aimed Gemini 3.5 Flash at: a model fast and cheap enough to sit inside a loop you run thousands of times a day, but strong enough that it doesn't fall apart on the hard steps. Be precise about the "cheap" part, though. Gemini 3.5 Flash is cheap compared to Pro and frontier reasoning models. It is not cheap compared to the Flash tier it replaces, which cost roughly a third as much. "Cheaper than Pro" and "cheap" are two different claims, and only the first one holds.

Google made that GA official at I/O on May 19, 2026, confirmed in the Gemini API changelog, and the pitch is unusually specific: this Flash beats the previous premium tier on the benchmarks Google cares about, at Flash speed and Flash cost. The numbers below are Google's own, so read them with that grain of salt. But the shape of the offer is clear, and it changes the math on what you can afford to automate.

Standard input / 1M $1.50 Output is $9 / 1M; cached input reads $0.15 / 1M
Output speed vs frontier Google's claim; no absolute tokens-per-second figure confirmed
vs prior Flash input Gemini 3 Flash was ~$0.50 / $3; this tier costs more
Input context window 1M 1,000,000 tokens in, 65,536 tokens out

The pricing, read honestly

Start with the official numbers, because the marketing word "cheap" does a lot of quiet work. On the standard paid tier, per Google's Gemini API pricing page, Gemini 3.5 Flash is $1.50 per million input tokens and $9 per million output. Cached input reads drop to $0.15 per million, and cache storage runs $1 per million tokens per hour. Run the same work through the batch or flex tier and the rate halves, to $0.75 input and $4.50 output. There's also a free tier: input, output, and context caching are free of charge within rate limits, which makes prototyping zero-cost before you commit.

Now the honesty guardrail. Set those rates next to the Flash models that came before. Gemini 3 Flash ran about $0.50 input and $3 output. Gemini 2.5 Flash-Lite was around $0.10 input and $0.40 output. So this generation of Flash costs roughly three times the input price of the one before it, and more than ten times the old Flash-Lite. Several commentators said it plainly: Flash is no longer the budget tier. What you're buying for that higher price is a model Google claims is in the same conversation as last generation's Pro, not the throwaway-cheap workhorse the name used to imply.

For agent loops, the per-token rate is only half the story. Google says tasks often complete at less than half the cost of the prior Pro tier, partly because the model is faster and partly because the default thinking level dropped from "high" to "medium," so it spends fewer tokens deliberating on routine steps. Fewer output tokens per task, at a lower output rate, compounds. Working out whether that math beats a flat-priced alternative for your specific job is the kind of thing benchr's price-per-use-case breakdown exists to settle, and if your loops are bloating their own context, the tactics in cutting token usage matter more here than on a model you call once.

What Google says it can do

The case for paying more than old-Flash money rests on the benchmarks, and these are first-party results: Google ran them and Google reports them. Independent leaderboards may land somewhere else, so treat the table as vendor-reported until third parties weigh in. With that flag raised, the numbers are striking for a Flash model.

Gemini 3.5 Flash headline benchmarks, all Google-reported (first-party)
BenchmarkScoreWhat it measures
Terminal-Bench 2.176.2%Coding and agentic terminal tasks
GDPval-AA1656 EloReal-world agentic tasks
MCP Atlas83.6%Tool-use reliability
CharXiv Reasoning84.2%Multimodal chart understanding

The through-line is agents. Terminal-Bench and MCP Atlas both probe whether a model can drive tools and recover from its own mistakes over many steps, which is the failure mode that sinks cheap models in production. A 76.2% on Terminal-Bench 2.1 and 83.6% on tool-use reliability are the kind of scores that, if they hold up outside Google's harness, mean you can trust the loop to keep going rather than babysitting it. The CharXiv number says it reads charts and figures competently too, so multimodal agentic work is on the table, not just text.

Two limits to keep in frame. Google states the "4x faster" claim against other frontier models but didn't publish an absolute tokens-per-second figure, so the speed is real per Google but not pinned to a hard number you can spec against. And the knowledge cutoff is January 2025 in the official docs. One third-party summary said January 2026; the official figure is the one to trust, which means anything more recent than early 2025 needs grounding through Search or your own retrieval.

What it connects to, and where it runs

For agentic work, the tool list matters as much as the benchmarks. Gemini 3.5 Flash supports Google Search grounding, Grounding with Google Maps, File Search, Code Execution, URL Context, and Function Calling. The one notable gap is Computer Use, which is not supported, so if your agent needs to click around a screen rather than call APIs, this isn't the model for that step.

Reach is wide from day one. Developers get it through the Gemini API in Google AI Studio, Android Studio, and Google Antigravity, Google's agent-first dev platform, plus Vertex AI for production. Enterprises get it via Gemini Enterprise and the Gemini Enterprise Agent Platform. Consumers meet it inside the Gemini app and AI Mode in Google Search. And the free API tier means you can wire up and test a full loop without a bill, then graduate to paid rates once volume justifies it.

The pitch isn't a smarter chatbot. It's a model cheap and fast enough to leave running.

How it stacks up against the alternatives

If you're choosing inside Google's own lineup, the comparison is with Gemini 3.1 Pro, and the calculus is straightforward: Google says Flash beats it on the agentic and coding benchmarks while costing less and running faster, so for high-volume agent work the Flash is the default and Pro becomes the model you escalate to for the hardest reasoning. Outside Google, the natural rival for the cheap-but-capable agentic slot is Anthropic's small tier. Anyone weighing this against Claude Haiku 4.5 should compare on the exact loop they run, because tool-use reliability and per-task token spend, not the sticker rate, decide which one is cheaper in production.

The broader point is that "Flash" as a category has moved upmarket. It used to mean the cheapest thing Google sold; now it means the cheap end of frontier-class. That repositioning is why the price went up, and it's why the right comparison set is Pro-tier and capable-mid models, not budget models. If your real need is the absolute floor on cost, this generation of Flash isn't it.

The verdict

Gemini 3.5 Flash earns its score on fit. As a model to drop inside coding and long-horizon agent loops, the combination of claimed Pro-beating agentic benchmarks, roughly 4x output speed, a lower-than-Pro rate, and a free tier for prototyping is a strong package, and the 1M context window plus a full set of grounding and tool integrations back it up. The benchmarks being first-party is the main asterisk; wait for independent leaderboards before betting a mission-critical pipeline on the exact figures.

Go with Gemini 3.5 Flash if you're running high-volume agentic or coding work and were reaching for a Pro-tier model you don't strictly need, because here you get most of the capability at lower cost and higher speed. Skip it if your workload lived happily on the old ~$0.50/$3 Flash and doesn't need the new agentic strength, since this is a real price increase for that use. And stick with a frontier reasoning model when the task is hard single-shot reasoning rather than a loop of moderate steps, where raw capability beats throughput economics. For the job it was built for, this is one of the better-value seats on the board right now.

Frequently asked

How much does Gemini 3.5 Flash cost?

The standard paid tier is $1.50 per million input tokens and $9.00 per million output, with cached input reads at $0.15 per million, per Google's official pricing page. Batch and flex tiers cut that roughly in half, to $0.75 input and $4.50 output. There is also a free API tier where input, output, and context caching are free of charge within rate limits.

Is Gemini 3.5 Flash cheap?

It depends what you compare it to. Against Pro and frontier reasoning models it is cheap. Against earlier Flash tiers it is not: Gemini 3 Flash was about $0.50 input and $3 output, and Gemini 2.5 Flash-Lite was around $0.10 input and $0.40 output, so this generation is roughly three times the price of the Flash before it. Several commentators flagged that Flash is no longer the budget option it used to be. Treat it as a cheap frontier-class model, not a cheap small model.

Does Gemini 3.5 Flash beat Gemini 3.1 Pro on coding?

That is Google's claim, and the headline numbers are Google's own first-party results. Google reports Terminal-Bench 2.1 at 76.2%, GDPval-AA at 1656 Elo, MCP Atlas at 83.6%, and CharXiv Reasoning at 84.2%, and positions 3.5 Flash as beating the previous 3.1 Pro tier on coding and agentic work at Flash speed. Independent leaderboards may land differently, so read these as vendor-reported until third parties confirm them.

What is the context window and knowledge cutoff?

Gemini 3.5 Flash has a 1,000,000-token input context window and a maximum output of 65,536 tokens, both confirmed on Google's official docs. The knowledge cutoff is stated as January 2025 in the official documentation. One third-party summary listed January 2026, but the official figure is January 2025.

What can Gemini 3.5 Flash connect to?

It supports Google Search grounding, Grounding with Google Maps, File Search, Code Execution, URL Context, and Function Calling. Computer Use is not supported. It is reachable through the Gemini API in Google AI Studio, Android Studio, and Google Antigravity, plus Vertex AI, Gemini Enterprise, the consumer Gemini app, and AI Mode in Google Search.

Changelog

  • May 30, 2026 — Originally published. The $1.50/$9 standard pricing, batch and flex tiers, free tier, 1M context window, 65,536-token output cap, supported tools, and the May 19, 2026 GA date verified against Google's Gemini API pricing page, the "What's new in Gemini 3.5" docs, and the Gemini API changelog. The four headline benchmark figures are Google first-party results, labeled vendor-reported. Prior-Flash pricing comparison noted from contemporaneous reporting; knowledge cutoff stated as January 2025 per official docs.

References

  1. Google, "Gemini 3.5," blog.google, May 19, 2026.
  2. Google, "Gemini API pricing," ai.google.dev/gemini-api/docs/pricing, accessed May 2026.
  3. Google, "What's new in Gemini 3.5," ai.google.dev, accessed May 2026.
  4. Google, "Gemini API changelog," ai.google.dev/gemini-api/docs/changelog, accessed May 2026.
  5. MarkTechPost, "Google introduces Gemini 3.5 Flash at I/O 2026," marktechpost.com, May 20, 2026.