Pricing Guide · July 2026

Gemini 3.5 Pro pricing: cost per 1M tokens and cost scenarios

Q: How much does Gemini 3.5 Pro cost per 1M tokens?

For calls using up to 200,000 tokens of context, Gemini 3.5 Pro costs $2.50/1M input and $15/1M output. Cross 200,000 tokens in a single call and the entire call re-prices to $5/1M input and $22/1M output. Cached input is $0.25/1M, plus $1 per 1M-token-hour of cache storage. Batch processing is a flat 50% off in both tiers: $1.25/$7.50 at or below 200K tokens, $2.50/$11 above it. There is no free API tier.

Q: How large is the 2 million token context window, and what does it cost to use?

Gemini 3.5 Pro's context window is 2,000,000 tokens, an industry-first size at the frontier tier when it shipped: nearly double GPT-5.6 Sol's 1,100,000 tokens and exactly twice Claude Sonnet 5's 1,000,000 tokens. The catch is pricing: any call that uses more than 200,000 tokens of that window bills its entire input at $5/1M instead of $2.50/1M, and its output at $22/1M instead of $15/1M. That is a 100% jump on input and a 46.7% jump on output. Max output per call is capped at 100,000 tokens, half of Claude Sonnet 5's 200,000-token ceiling.

Q: How does Gemini 3.5 Pro's GPQA Diamond score compare to Claude Opus 4.8 and GPT-5.6 Sol?

Gemini 3.5 Pro scores 95.5% on GPQA Diamond — the highest score benchr tracks across this update, ahead of Gemini 3.1 Pro (94.3%), Claude Opus 4.8 (93.6%), Claude Sonnet 5 (92.0%), and GPT-5.6 Sol (91.2%). On SWE-bench Verified, though, Gemini 3.5 Pro's 85.5% trails Claude Sonnet 5 (89.4%) and GPT-5.6 Sol (89.8%) — it leads on PhD-level science reasoning, not coding benchmarks.

Gemini 3.5 Pro is Google reclaiming the reasoning crown: a 2,000,000-token context window — an industry first at the frontier tier — and a 95.5% GPQA Diamond score, the highest benchr tracks. Priced at $2.50/1M input and $15/1M output below 200,000 tokens, with a tiered jump to $5/$22 above it, this is the model to reach for when context depth or PhD-level reasoning is the constraint, not raw coding throughput.

By the benchr team · Published July 1, 2026 · Figures verified against official sources, July 1, 2026

Input / 1M (≤200K): $2.50 (Google · June 2026)
Output / 1M (≤200K): $15 (Google)
GPQA Diamond: 95.5% (highest tracked)
Context: 2,000,000 tokens (max window)

Pricing breakdown

gemini-3.5-pro — official Google AI pricing
Tier	Rate / 1M tokens
Standard input (≤200K)	$2.50
Standard input (>200K)	$5.00
Standard output (≤200K)	$15.00
Standard output (>200K)	$22.00
Cached input	$0.25
Cache storage	$1.00 per 1M-token-hour
Batch input (≤200K)	$1.25
Batch output (≤200K)	$7.50
Batch input (>200K)	$2.50
Batch output (>200K)	$11.00
Free tier	Not offered
Context window	2,000,000 tokens
Max output	100,000 tokens

Tiered pricing: what changes above 200,000 tokens

Gemini 3.5 Pro uses the same tiered pricing shape Google shipped with Gemini 3.1 Pro, at a higher base rate. Stay at or below 200,000 tokens of context in a call and you pay $2.50/1M input, $15/1M output. Cross that line — even by one token — and the entire call re-prices: $5/1M input (a 100% increase) and $22/1M output (a 46.7% increase, since $22 is $7 more than $15, and $7 ÷ $15 = 0.467). This is a per-call threshold, not a monthly cap, so a workload that mixes short and long calls pays the lower rate on the short ones and the higher rate only on the long ones.
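
The per-call re-pricing can be sketched in a few lines of Python. This is a minimal illustration, not Google's billing logic: the function name `call_cost` is hypothetical, and it assumes the 200,000-token threshold is measured against the call's input tokens, which the pricing table does not spell out.

```python
# Rates in USD per 1M tokens, copied from the pricing table above.
TIER_THRESHOLD = 200_000
LOW_TIER = (2.50, 15.00)   # (input, output) at or below the threshold
HIGH_TIER = (5.00, 22.00)  # (input, output) above the threshold


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call in USD.

    Assumption: the threshold is checked against input tokens only.
    """
    in_rate, out_rate = HIGH_TIER if input_tokens > TIER_THRESHOLD else LOW_TIER
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # One token over the line re-prices the whole call.
    print(round(call_cost(200_000, 10_000), 6))
    print(round(call_cost(200_001, 10_000), 6))
```

With 10,000 output tokens, a 200,000-token prompt comes to about $0.65, while a 200,001-token prompt comes to about $1.22, because both the input and the output move to the higher rates.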

The 2,000,000-token context window

At 2,000,000 tokens, Gemini 3.5 Pro's context window is an industry-first size at the frontier tier when it shipped — nearly double GPT-5.6 Sol's 1,100,000 tokens and exactly twice Claude Sonnet 5's 1,000,000 tokens. That headroom is the point of the model: workloads that need more context than any other frontier model tracked in this update can offer now have somewhere to go. The trade-off is the tiered pricing above and the output ceiling: max output per call is 100,000 tokens, half of Claude Sonnet 5's 200,000-token cap, so extremely long single-response generation still favors Anthropic's new mid-tier model even though Gemini 3.5 Pro can ingest far more.

GPQA Diamond 95.5: the highest score benchr tracks

GPQA Diamond tests PhD-level questions in biology, chemistry, and physics. Gemini 3.5 Pro's 95.5% is the highest score benchr tracks across this update, ahead of Gemini 3.1 Pro (94.3%), Claude Opus 4.8 (93.6%), Claude Sonnet 5 (92.0%), and GPT-5.6 Sol (91.2%). The model also sets a new high for the Gemini family on ARC-AGI-2, at 80.0 versus Gemini 3.1 Pro's 77.1. The lead doesn't carry over to coding: on SWE-bench Verified, Gemini 3.5 Pro's 85.5% trails GPT-5.6 Sol (89.8%) and Claude Sonnet 5 (89.4%). Route reasoning-heavy science and research work here; route coding-heavy agent work elsewhere.

Where Gemini 3.5 Pro fits in the Gemini family

Gemini 3.5 Pro sits above both Gemini 3.1 Pro and Gemini 3.5 Flash — it's the deepest-reasoning, longest-context model Google offers, launched after Flash had already beaten 3.1 Pro on coding benchmarks. If your workload is latency-sensitive or coding-agent-shaped, Flash remains the faster, cheaper pick. If you need more context or reasoning depth than 3.1 Pro provides, 3.5 Pro is the upgrade path. Use benchr's comparison tools or the model rankings to weigh this against non-Gemini options for your specific workload.

Cost scenarios

A single long-context call. One call using 1,000,000 input tokens exceeds the 200K threshold, so the whole call bills at the higher tier: 1,000,000 × $5/1M = $5.00 for input. Add 20,000 output tokens at $22/1M = $0.44. Total: $5.44 for that single call.

A typical month, calls under 200K. At 10M input + 2M output tokens per month, all within the 200K-per-call tier: 10 × $2.50 = $25 input, 2 × $15 = $30 output, total $55/month. The same volume on Claude Opus 4.8 ($5/$25): 10 × $5 = $50 input, 2 × $25 = $50 output, total $100/month — Gemini 3.5 Pro costs 55% of that, a 45% saving. Against GPT-5.6 Sol ($5/$30) at the same volume: $50 + $60 = $110/month — Gemini 3.5 Pro is exactly half.

The same month, calls over 200K. If every call in that 10M/2M month crosses the 200K threshold: 10 × $5 = $50 input, 2 × $22 = $44 output, total $94/month — 70.9% more than the under-200K scenario ($39 more on a $55 base), but still 6% cheaper than Claude Opus 4.8's $100/month and 14.5% cheaper than GPT-5.6 Sol's $110/month at the same volume.
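
The two monthly scenarios above, and anything in between, reduce to one formula. The sketch below is illustrative only: `monthly_cost` and `blended_gemini_cost` are hypothetical helpers, and the blended figure assumes that a given share of both input and output volume falls in calls that cross the 200K threshold.

```python
def monthly_cost(input_m: float, output_m: float,
                 in_rate: float, out_rate: float) -> float:
    """Monthly USD cost for volumes given in millions of tokens."""
    return input_m * in_rate + output_m * out_rate


def blended_gemini_cost(input_m: float, output_m: float,
                        share_over_200k: float) -> float:
    """Mix of both tiers.

    share_over_200k is the fraction of token volume (assumed equal for
    input and output) that is billed at the higher tier.
    """
    low = monthly_cost(input_m, output_m, 2.50, 15.00)
    high = monthly_cost(input_m, output_m, 5.00, 22.00)
    return (1 - share_over_200k) * low + share_over_200k * high


if __name__ == "__main__":
    for share in (0.0, 0.25, 1.0):
        print(share, round(blended_gemini_cost(10, 2, share), 2))
```

At the 10M input / 2M output volume, a month in which a quarter of the token volume sits in long calls comes to $64.75, between the $55 and $94 endpoints.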

Cached input. At a 90% cache hit rate within the 200K tier: 0.9 × $0.25 + 0.1 × $2.50 = $0.225 + $0.25 = $0.475 effective per million — an 81% reduction from the $2.50 uncached rate. Note the separate $1-per-1M-token-hour cache storage charge applies on top of that discounted read rate, so cache economics depend on how long you hold context in cache, not just the hit rate.
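
The cache arithmetic above, plus the storage charge, can be put into a small sketch. The helper names are hypothetical, the defaults are the ≤200K rates from the table, and the break-even figure assumes storage is the only extra cost of caching, which may not match how a real bill is itemized.

```python
def effective_input_rate(hit_rate: float,
                         cached_rate: float = 0.25,
                         uncached_rate: float = 2.50) -> float:
    """Blended input rate in USD per 1M tokens for a given cache hit rate."""
    return hit_rate * cached_rate + (1 - hit_rate) * uncached_rate


def storage_cost(cached_tokens: int, hours: float,
                 rate_per_m_token_hour: float = 1.00) -> float:
    """USD cost of holding cached_tokens in cache for the given hours."""
    return cached_tokens / 1_000_000 * hours * rate_per_m_token_hour


def breakeven_reads_per_hour(cached_rate: float = 0.25,
                             uncached_rate: float = 2.50,
                             rate_per_m_token_hour: float = 1.00) -> float:
    """Cached reads per hour needed for read savings to cover storage."""
    return rate_per_m_token_hour / (uncached_rate - cached_rate)


if __name__ == "__main__":
    print(round(effective_input_rate(0.9), 3))
    print(round(storage_cost(500_000, 8), 2))
    print(round(breakeven_reads_per_hour(), 3))
```

Under these assumptions, holding 500,000 tokens in cache for eight hours costs $4.00 in storage, and a cached prefix covers its own storage once it is reused roughly once every 2.25 hours.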

Batch processing. Batch is a flat 50% discount in both tiers: $1.25/$7.50 at or below 200K ($1.25 ÷ $2.50 = 50%, $7.50 ÷ $15 = 50%), and $2.50/$11 above it ($2.50 ÷ $5 = 50%, $11 ÷ $22 = 50%). For non-interactive, high-volume jobs where turnaround time isn't the constraint, batch halves the bill regardless of which context tier you're in.
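
As a quick check, the batch rates can be derived from the standard rates rather than looked up. A minimal sketch, with a hypothetical `batch_rates` helper and the standard rates copied from the pricing table:

```python
BATCH_DISCOUNT = 0.50

# Standard (input, output) rates in USD per 1M tokens, from the table above.
STANDARD = {"<=200K": (2.50, 15.00), ">200K": (5.00, 22.00)}


def batch_rates(tier: str) -> tuple[float, float]:
    """Apply the flat batch discount to a tier's standard rates."""
    in_rate, out_rate = STANDARD[tier]
    return in_rate * (1 - BATCH_DISCOUNT), out_rate * (1 - BATCH_DISCOUNT)


if __name__ == "__main__":
    for tier in STANDARD:
        print(tier, batch_rates(tier))
```

Applied to the 10M input / 2M output month from the earlier scenario, batch brings the under-200K bill from $55 to $27.50.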

API ID

The model ID is gemini-3.5-pro via the Google AI Gemini API and Vertex AI. There is no free API tier for this model — unlike Gemini 3.5 Flash, every call to Gemini 3.5 Pro is billed from the first token.

Use-case fit

Best for: Single calls that genuinely need more than 1,000,000 tokens of context; PhD-level science and research reasoning where GPQA Diamond depth matters; teams already on Gemini 3.1 Pro who are hitting its context or reasoning ceiling; workloads where the absolute reasoning score matters more than coding throughput.

Skip if: Your calls are mostly under 200,000 tokens and don't need frontier-level GPQA depth — Gemini 3.5 Flash is faster and cheaper for coding-agent work. Skip it too if SWE-bench Verified is your primary metric — Claude Sonnet 5 (89.4%) and GPT-5.6 Sol (89.8%) both outscore Gemini 3.5 Pro's 85.5%. And skip it if you need more than 100,000 tokens of output per call — Claude Sonnet 5's 200,000-token ceiling is double.

Decision checklist

Measure your typical context length before committing: if your p90 call size regularly crosses 200,000 tokens, budget for the $5/$22 tier, not the $2.50/$15 headline rate — the difference compounds quickly at volume, as the cost scenarios above show.

Confirm whether GPQA-style reasoning depth is actually your bottleneck, or whether you're really optimizing for coding throughput. If it's the latter, Gemini 3.5 Flash and Claude Sonnet 5 are both cheaper, and both score higher on SWE-bench Verified than Gemini 3.5 Pro.
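
For the context-length check in the first item, a rough sketch is below. It assumes you can export per-call input token counts from your own logs; the sample numbers and helper names are made up for illustration, and the p90 uses the simple nearest-rank method.

```python
THRESHOLD = 200_000


def p90(values: list[int]) -> int:
    """Nearest-rank 90th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer math
    return ordered[rank - 1]


def share_over_threshold(values: list[int]) -> float:
    """Fraction of calls whose input size exceeds the tier threshold."""
    return sum(v > THRESHOLD for v in values) / len(values)


if __name__ == "__main__":
    # Hypothetical per-call input token counts, not real measurements.
    call_sizes = [12_000, 45_000, 80_000, 150_000, 190_000,
                  210_000, 260_000, 400_000, 900_000, 1_500_000]
    print(p90(call_sizes), share_over_threshold(call_sizes))
```

If the p90 lands above 200,000 tokens, or a large share of calls crosses the line, the $5/$22 tier is the more realistic planning rate.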

Frequently asked

How much does Gemini 3.5 Pro cost per 1M tokens?

Up to 200,000 tokens of context: $2.50/1M input, $15/1M output. Above 200,000 tokens, the whole call re-prices to $5/1M input, $22/1M output. Cached input is $0.25/1M plus $1 per 1M-token-hour of cache storage. Batch is a flat 50% off either tier: $1.25/$7.50 or $2.50/$11. No free API tier.

How large is the 2 million token context window, and what does it cost to use?

2,000,000 tokens — an industry-first size at the frontier tier when it shipped, nearly double GPT-5.6 Sol's 1,100,000 and exactly twice Claude Sonnet 5's 1,000,000. Using more than 200,000 tokens in one call re-prices the entire call to the higher tier: a 100% jump on input, a 46.7% jump on output. Max output per call is 100,000 tokens.

How does Gemini 3.5 Pro's GPQA Diamond score compare to Claude Opus 4.8 and GPT-5.6 Sol?

95.5% — the highest benchr tracks, ahead of Gemini 3.1 Pro (94.3%), Claude Opus 4.8 (93.6%), Claude Sonnet 5 (92.0%), and GPT-5.6 Sol (91.2%). On SWE-bench Verified, though, its 85.5% trails Claude Sonnet 5 (89.4%) and GPT-5.6 Sol (89.8%) — it's the reasoning leader, not the coding leader.

Changelog

July 1, 2026 — Published. Gemini 3.5 Pro released June 30, 2026; pricing and benchmarks verified against Google's official pricing page and benchr's models.json, July 1, 2026.

Sources

Google AI Gemini pricing — ai.google.dev/pricing (verified July 1, 2026)
GPQA Diamond leaderboard — huggingface.co/spaces/opencompass (verified July 1, 2026)
SWE-bench Verified leaderboard — swebench.com (verified July 1, 2026)
benchr models.json — verified July 1, 2026