Comparison·June 2026

Cheapest LLM API 2026: Ultra-Low Cost AI Models Ranked

Looking for cheap token pricing? Compare DeepSeek, GPT-5 Mini, Claude Haiku, and self-hosted open-weights under $1 per million.

By the benchr team · Reviewed June 6, 2026 · Figures verified against official sources, June 6, 2026 · View changelog

For high-volume tasks such as RAG document ingestion, user chat routing, and structured data extraction, choosing a budget-friendly model is essential. Below is the ranked list of all models on benchr that cost **$1.00 or less per million input tokens**, or are open-weights intended for free self-hosting.

Model	Provider	Input / 1M	Output / 1M	License
Qwen3.6-27B	Alibaba	Self-hosted	Self-hosted	Open-weight
Llama 4 Maverick	Meta	Self-hosted	Self-hosted	Open-weight
Llama 4 Scout	Meta	Self-hosted	Self-hosted	Open-weight
Phi-4	Microsoft	Self-hosted	Self-hosted	Open-weight
DeepSeek V4-Flash	DeepSeek	$0.140	$0.280	Open-weight
GPT-5 Mini	OpenAI	$0.250	$2.00	Proprietary
DeepSeek V4-Pro	DeepSeek	$0.435	$0.870	Open-weight
Mistral Large 3	Mistral	$0.500	$1.50	Open-weight
Kimi K2.6	Moonshot AI	$0.950	$4.00	Open-weight
Claude Haiku 4.5	Anthropic	$1.00	$5.00	Proprietary

Key Highlights in the Low-Cost Space

DeepSeek V4-Flash ($0.14/$0.28): Disrupted the market by offering a million-token context window at a fraction of standard API prices. It holds strong coding and reasoning scores despite its speed and low cost.
GPT-5 Mini ($0.25/$2.00): OpenAI's primary low-cost entry, offering high speed (160 tok/s) and full API feature integration (such as Structured Outputs and Batch Jobs).
Claude Haiku 4.5 ($1.00/$5.00): While pricier than DeepSeek and GPT-5 Mini, it features high prompt caching capabilities and Anthropic's signature safety and formatting alignment.

Want to compare these with the flagship models? Check out the full AI Model Rankings or compute exact volume pricing on our Cost Calculator.

When cheap stops being cheap

The cheapest API is not always the lowest-cost system. A model with a lower token price can still lose if it requires more retries, produces longer-than-needed answers, misses formatting constraints, or needs more human review. For production, the useful metric is cost per accepted answer.

For high-volume routing, extraction, and classification, budget models can be excellent because the task is narrow and failure is easy to detect. For legal review, complex code repair, research synthesis, or agent loops with side effects, pay attention to failure cost before optimizing the token bill.

A practical shortlist strategy

For most teams, the cheapest-model search should produce a shortlist, not a single winner. Pick one model below $0.50/1M input, one between $0.50 and $1.50, and one stronger fallback above that. Route easy work to the cheapest tier and escalate only when confidence, formatting, or safety checks fail.

This keeps the system cheap without betting the whole product on the lowest line in the table. It also gives you a migration path: if the budget model improves, increase its traffic; if it fails on an edge case, the fallback path is already defined.

Keep provider concentration in mind too. Saving a few dollars per million tokens is not worth losing redundancy if the workload is business-critical. For important systems, pair the cheapest default with a second provider that can take over when latency, rate limits, or availability degrade.

Frequently asked

Is DeepSeek V4-Flash the cheapest API available?

Yes, DeepSeek V4-Flash leads in cheap pricing at $0.14 per million input tokens and $0.28 per million output tokens, which makes it about 40% cheaper than GPT-5 Mini.

Can I host open-weight models for free?

Open-weight models like Llama 4 Scout, Maverick, or Phi-4 are free to download and use. However, you must cover the hardware infrastructure costs to run them (such as running instances on AWS or rental GPU sites).

Are open-weights really cheaper than DeepSeek V4-Flash?

It depends on your volume. Hosting a model like Qwen3.6-27B or Llama 4 Scout requires renting a GPU (such as an A10G or A100), which bills hourly. If you make millions of requests daily, self-hosting is often cheaper. For lower volume or bursty workloads, using DeepSeek's API is significantly cheaper and zero-maintenance.

Changelog

June 6, 2026 — Published. All prices verified against official provider API docs.

Sources

DeepSeek API docs — api-docs.deepseek.com (verified June 3, 2026)
OpenAI Platform pricing — platform.openai.com/docs/pricing (verified June 3, 2026)
Anthropic API pricing — docs.anthropic.com (verified June 3, 2026)
benchr model-figures.json — single source of truth, verified June 3, 2026