Comparison·June 2026

AI Model Pricing Comparison 2026: Cost per Million Tokens

Complete comparison of API token pricing (input, output, caching) across OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and open-weights. Sourced directly from official docs.

By the benchr team · Reviewed June 6, 2026 · Figures verified against official sources, June 6, 2026 · View changelog

API pricing for large language models has commoditized rapidly, but the pricing structures have become more complex. Between input, output, prompt caching, batch execution, and self-hosted instances, developers must calculate pricing carefully to estimate their monthly workloads. Below is our dynamic pricing comparison table, kept in exact sync with the main model index.

Model	Provider	Input / 1M	Output / 1M	Cached Input / 1M
Qwen3.6-27B	Alibaba	Self-hosted	Self-hosted	—
Llama 4 Maverick	Meta	Self-hosted	Self-hosted	—
Llama 4 Scout	Meta	Self-hosted	Self-hosted	—
Phi-4	Microsoft	Self-hosted	Self-hosted	—
DeepSeek V4-Flash	DeepSeek	$0.140	$0.280	$0.0028
GPT-5 Mini	OpenAI	$0.250	$2.00	$0.025
DeepSeek V4-Pro	DeepSeek	$0.435	$0.870	$0.0036
Mistral Large 3	Mistral	$0.500	$1.50	—
Kimi K2.6	Moonshot AI	$0.950	$4.00	$0.160
Claude Haiku 4.5	Anthropic	$1.00	$5.00	$0.100
GPT-5	OpenAI	$1.25	$10.00	—
Grok 4.3	xAI	$1.25	$2.50	$0.200
Gemini 3.5 Flash	Google	$1.50	$9.00	$0.150
Mistral Medium 3.5	Mistral	$1.50	$7.50	—
Gemini 3.1 Pro	Google	$2.00	$12.00	$0.200
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	$0.300
Claude Opus 4.8	Anthropic	$5.00	$25.00	$0.500
Claude Opus 4.7	Anthropic	$5.00	$25.00	$0.500
GPT-5.5	OpenAI	$5.00	$30.00	$0.500

Pricing Tiers: Frontier vs. Mid-Tier vs. Small

When analyzing costs, models generally fall into three tiers:

Frontier Tier ($5.00+ Input / $15.00+ Output): Reserved for absolute top-tier intelligence like Claude Opus 4.8 and GPT-5.5. These models are ideal for complex architectural decisions and high-stakes agent loops, but are too expensive for daily high-volume tasks.
Mid-Tier ($1.25 - $3.00 Input / $2.50 - $15.00 Output): Models like Claude Sonnet 4.6, GPT-5, and Gemini 3.5 Flash represent the "sweet spot" for production applications, blending high capability with moderate pricing.
Small & Open Coder Tier (Sub-$1.00 Input): Models like DeepSeek V4-Flash, GPT-5 Mini, and self-hosted Llama 4 Scout. These provide fast responses and negligible costs, making them perfect for routing, classification, or high-volume summarization.

For more details on overall performance rankings, see our AI Model Rankings or compute your exact monthly cost with the Cost Calculator.

How to avoid false precision

Per-million-token pricing looks exact, but real invoices depend on workload shape. A support bot with a long policy prompt and short answers is mostly input cost; a writing assistant with short prompts and long drafts is mostly output cost. The same model can be cheap in one workflow and expensive in another.

Use the table as a normalized baseline, then calculate your own mix: average input tokens, average output tokens, cache hit rate, batch percentage, retry rate, and accepted-output rate. Those six numbers matter more than a one-line provider price when you are deciding what to ship.

Re-check the provider page before committing annual spend. Pricing pages change quietly, and some enterprise contracts include discounts, minimums, or platform fees that do not appear in public documentation.

Frequently asked

How does Prompt Caching save money?

Prompt caching allows API providers (such as Anthropic, OpenAI, and DeepSeek) to reuse parts of the context window that remain static (like large system prompts or system contexts). This results in discounts of up to 90% off standard input token rates, which is crucial for agentic architectures.

Are self-hosted open-weight models completely free?

While models like Llama 4 Scout, Maverick, or Phi-4 cost $0 to license under open licenses, you must pay for the cloud GPU infrastructure (such as AWS, GCP, RunPod, or Lambda Labs) to host them. If your throughput is low, managed APIs are often cheaper than keeping a GPU active 24/7.

Changelog

June 6, 2026 — Published. All prices sourced from official provider documentation and cross-checked against benchr model-figures.json.

Sources

Official provider API docs: Anthropic, OpenAI, Google, DeepSeek, xAI, Mistral, Meta, Moonshot AI, Microsoft, Alibaba (all verified June 3, 2026)
benchr model-figures.json — single source of truth, verified June 3, 2026