Cheapest LLM API 2026: Ultra-Low Cost AI Models Ranked

Looking for cheap token pricing? Compare DeepSeek, GPT-5 Mini, Claude Haiku, and self-hosted open-weights models at $1 or less per million input tokens.

Figures verified against official sources, June 6, 2026

For high-volume tasks such as RAG document ingestion, user chat routing, and structured data extraction, choosing a budget-friendly model is essential. Below is the ranked list of all models on benchr that cost **$1.00 or less per million input tokens**, or are open-weights models that can be self-hosted with no per-token fee.

Model Provider Input / 1M Output / 1M License

Key Highlights in the Low-Cost Space

  • DeepSeek V4-Flash ($0.14 input / $0.28 output per 1M tokens): Disrupted the market by offering a million-token context window at a fraction of standard API prices. It posts strong coding and reasoning scores despite its speed and low cost.
  • GPT-5 Mini ($0.25/$2.00): OpenAI's primary low-cost entry, offering high throughput (160 tok/s) and support for the full set of API features, such as Structured Outputs and the Batch API.
  • Claude Haiku 4.5 ($1.00/$5.00): Pricier than DeepSeek and GPT-5 Mini, but it supports prompt caching and carries Anthropic's signature safety and formatting alignment.
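
To see how these per-million prices translate into a bill, here is a minimal sketch that applies the input/output prices listed above to a workload. The workload size (500M input and 50M output tokens) and the function name `workload_cost` are illustrative assumptions, not benchr figures or a provider API.

```python
# Prices in USD per 1M tokens as (input, output), copied from the highlights above.
PRICES = {
    "DeepSeek V4-Flash": (0.14, 0.28),
    "GPT-5 Mini": (0.25, 2.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload: tokens / 1M times the per-1M price."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    input_tokens, output_tokens = 500_000_000, 50_000_000
    for model in PRICES:
        cost = workload_cost(model, input_tokens, output_tokens)
        print(f"{model}: ${cost:,.2f}")
```

For this assumed workload the totals come to roughly $84 for DeepSeek V4-Flash, $225 for GPT-5 Mini, and $750 for Claude Haiku 4.5. Because output tokens cost more than input tokens on all three, the ranking can shift for output-heavy tasks, so it is worth plugging in your own input/output split.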

Want to compare these with the flagship models? Check out the full AI Model Rankings or compute exact volume pricing on our Cost Calculator.

Frequently asked

Are open-weights really cheaper than DeepSeek V4-Flash?

It depends on your volume. Self-hosting a model like Qwen3.6-27B or Llama 4 Scout means renting a GPU (such as an A10G or A100), which is billed hourly whether or not it is serving requests. If you make millions of requests daily, that fixed cost is spread over enough tokens that self-hosting is often cheaper. For lower-volume or bursty workloads, DeepSeek's API is usually much cheaper because you pay only for the tokens you use, and it requires no maintenance.
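
The trade-off can be sketched as a break-even calculation: a flat monthly GPU bill versus a per-token API bill. The GPU hourly rate ($1.50) and the output-to-input ratio (0.1) below are placeholder assumptions, not quoted prices; only the DeepSeek V4-Flash token prices come from this page. The sketch also ignores GPU throughput limits, engineering time, and idle scaling, so treat the result as a rough lower bound on the volume needed.

```python
HOURS_PER_MONTH = 24 * 30  # simplifying assumption: a 30-day month, GPU always on


def gpu_monthly_cost(hourly_rate_usd: float, gpus: int = 1) -> float:
    """Fixed monthly rental cost for GPUs billed by the hour."""
    return hourly_rate_usd * gpus * HOURS_PER_MONTH


def break_even_input_tokens(
    hourly_rate_usd: float,
    in_price: float,       # API price, USD per 1M input tokens
    out_price: float,      # API price, USD per 1M output tokens
    output_ratio: float,   # output tokens generated per input token
) -> float:
    """Monthly input-token volume at which API spend equals the GPU rental."""
    # Effective API cost per 1M input tokens, including the output they trigger.
    blended_price = in_price + output_ratio * out_price
    return gpu_monthly_cost(hourly_rate_usd) / blended_price * 1_000_000


if __name__ == "__main__":
    # Hypothetical GPU rate and traffic mix; DeepSeek V4-Flash prices from this page.
    tokens = break_even_input_tokens(1.50, in_price=0.14, out_price=0.28, output_ratio=0.1)
    print(f"GPU rental: ${gpu_monthly_cost(1.50):,.2f} per month")
    print(f"Break-even: {tokens / 1e9:.2f}B input tokens per month")
```

Under these assumptions the GPU costs $1,080 per month, and the API only becomes more expensive above roughly 6.4 billion input tokens per month. Below that volume, the per-token API is the cheaper option.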