Pricing Index · Updated July 2026

AI model API pricing

Input, output, and caching rates for every major AI API — sourced from official provider docs and updated whenever prices change. 26 ranked chat/coding models from OpenAI, Anthropic, Google, xAI, Z.AI, MiniMax, DeepSeek, Mistral, Meta, and more, plus media-model records in the open data.

Data from models.json · verified July 13, 2026 Neutral reference — never sponsored

All models by price

Model	Provider	Input / 1M	Output / 1M	Cached input / 1M	Details
Qwen3.6-27B	Alibaba	Self-hosted	Self-hosted	—	Full guide →
Llama 4 Maverick	Meta	Self-hosted	Self-hosted	—	Full guide →
Llama 4 Scout	Meta	Self-hosted	Self-hosted	—	Full guide →
Phi-4	Microsoft	Self-hosted	Self-hosted	—	Full guide →
DeepSeek V4-Flash	DeepSeek	$0.140	$0.280	$0.0028	Full guide →
GPT-5 Mini	OpenAI	$0.250	$2.00	$0.025	Full guide →
MiniMax M3	MiniMax	$0.300	$1.20	$0.060	Full guide →
DeepSeek V4-Pro	DeepSeek	$0.435	$0.870	$0.0036	Full guide →
Mistral Large 3	Mistral	$0.500	$1.50	—	Full guide →
Kimi K2.6	Moonshot AI	$0.950	$4.00	$0.160	Full guide →
Claude Haiku 4.5	Anthropic	$1.00	$5.00	$0.100	Full guide →
GPT-5	OpenAI	$1.25	$10.00	—	Full guide →
Grok 4.3	xAI	$1.25	$2.50	$0.200	Full guide →
GLM-5.2	Z.AI	$1.40	$4.40	$0.260	Full guide →
Gemini 3.5 Flash	Google	$1.50	$9.00	$0.150	Full guide →
Mistral Medium 3.5	Mistral	$1.50	$7.50	—	Full guide →
Gemini 3.1 Pro	Google	$2.00	$12.00	$0.200	Full guide →
Grok 4.5	xAI	$2.00	$6.00	$0.500	Full guide →
GPT-5.4	OpenAI	$2.50	$15.00	$0.250	Full guide →
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	$0.300	Full guide →
Claude Opus 4.8	Anthropic	$5.00	$25.00	$0.500	Full guide →
Claude Opus 4.7	Anthropic	$5.00	$25.00	$0.500	Full guide →
GPT-5.5	OpenAI	$5.00	$30.00	$0.500	Full guide →
Claude Fable 5	Anthropic	$10.00	$50.00	$1.00	Full guide →

Individual model pricing guides

Detailed breakdown per model: token costs, context limits, caching structures, use-case recommendations, and cost scenarios.

Pricing articles and analysis

Deeper reading on cost strategy, provider comparisons, and how to reduce your API bill.

Cheapest LLM API 2026 → — DeepSeek, Llama, and where the quality/cost line breaks down.
Price per use case → — The cheapest model for chat, coding, RAG, agents, and classification.
How to reduce token usage → — Caching, batching, and routing that cut bills without switching models.
DeepSeek vs OpenAI pricing → — What the cost gap actually means on real workloads.
Claude API pricing guide → — Opus 4.8, Sonnet 4.6, and Haiku 4.5 with the 90% caching discount explained.
OpenAI API pricing guide → — GPT-5.5, GPT-5, and GPT-5 Mini with batch and caching discounts.
AI model pricing comparison → — All ranked models side by side with notes on what each price tier actually buys.

How to read these pricing tables

API providers bill in tokens — roughly 750 words per 1,000 tokens for English text. Prices are listed per million tokens because per-token rates are too small to read ($0.000001250 is clearer as $1.25/1M). Most providers charge separately for input (text you send) and output (text the model generates).

Output tokens typically cost 4–12× more than input tokens because they require a separate forward pass per token. This means the input-to-output ratio of your workload matters enormously. A summarization task (high input, low output) has a very different cost profile than a story-generation task (low input, high output).

Three cost strategies for 2026

Route by task complexity. Use cheap small models (DeepSeek V4-Flash at $0.14/1M, GPT-5 Mini at $0.25/1M) for classification, routing, and simple extraction. Reserve expensive flagships for tasks that actually require their capability. Most pipelines can route 70–80% of calls to cheaper models.

Maximize prompt caching. If your system prompt or document context repeats across calls, caching cuts that portion of your input cost by 90% (Anthropic, Google) or ~99% (DeepSeek). This single optimization often reduces monthly bills more than switching models entirely.

Use Batch API for offline workloads. OpenAI, Anthropic, and others offer 50% discounts for asynchronous batch jobs — dataset evaluation, classification runs, bulk translation. If your workload can tolerate a 24-hour return window, batch pricing cuts your bill in half.

Use the Cost Calculator to model your specific token mix, or the Cheapest API leaderboard to find the lowest-cost model that meets your quality bar.

Frequently asked questions

What is the cheapest AI API in 2026?

DeepSeek V4-Flash at $0.14/1M input tokens and $0.28/1M output tokens is the cheapest hosted API. Open-weight models like Llama 4 Scout and Phi-4 have no per-token licensing fee but require self-hosting infrastructure.

What does input vs output pricing mean?

API providers charge separately for input tokens (text you send to the model) and output tokens (text the model generates). Output tokens typically cost 4–10× more per token than input. Most real applications have 3–8× more input tokens than output.

What is prompt caching and how much does it save?

Prompt caching stores your static system prompt or document context so that repeated calls reuse it at a steep discount. Anthropic offers 90% off, Google 90% off, DeepSeek ~99% off. OpenAI doesn't publish a caching discount for GPT-5. For agents with large repeated system prompts, caching can reduce your actual bill by 60–80%.

Are the model pricing pages on benchr accurate?

Prices are sourced directly from official provider documentation and verified against model-figures.json, benchr's single source of truth. The verified date is shown on each page. Prices change without notice — re-verify before making budget decisions.