Rankings · Updated June 2026

AI model rankings

19 models across the frontier, mid, and open-weight tiers, ranked by benchr Rating, an editorial synthesis of capability and price — not an independent lab score. Sort any column, filter by type or license, and click any model name for the full review.

Data from models.json
Rankings computed from data, never paid placements
Table columns: #, Model, benchr Rating, SWE-bench %, Input $/1M, Output $/1M, Context, Tok/s, Released

How the benchr Rating works

The benchr Rating is a single number that tells you how a model balances what it can do against what it costs. It's not a poll, not an average of vague reviews, and no provider has paid for their position. The capability half is benchr's own editorial read, so treat the Rating as an opinion with its math shown, not a lab measurement.

Capability score (65% weight)

Derived from editorial capability estimates on a 0–10 scale, which you can inspect in models.json. These are built from benchmark results and the public record, not from marketing materials, and they're benchr's judgment rather than a lab measurement.

capability = (coding × 0.40) + (reasoning × 0.40) + (writing × 0.20)
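
As a rough sketch of the weighting above, the function below combines three 0–10 estimates into one capability value. The function name and the input shape (an object with numeric `coding`, `reasoning`, and `writing` fields) are assumptions for illustration; the actual layout of models.json and the code in assets/js/models.js may differ.

```javascript
// Hypothetical helper: the name and input shape are illustrative,
// not taken from assets/js/models.js.
function capabilityScore({ coding, reasoning, writing }) {
  // Weights from the formula above: 40% coding, 40% reasoning, 20% writing.
  return coding * 0.40 + reasoning * 0.40 + writing * 0.20;
}

// Made-up inputs on the 0–10 scale (not real model data).
const example = capabilityScore({ coding: 9, reasoning: 8, writing: 7 });
console.log(example.toFixed(1)); // prints "8.2"
```

Because coding and reasoning carry equal weight, a one-point change in either moves the result by 0.4, while a one-point change in writing moves it by 0.2.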

Price score (35% weight)

Based on the blended API price — the average of input and output cost per million tokens. Free and self-hosted models score 100. The scale runs from $0.50 (full score) to $30.00 (zero score) per million tokens blended.

blended = (input_per_million + output_per_million) / 2
price_score = max(0, min(100, 100 × (1 − max(0, blended − 0.50) / 29.50)))
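
The price formula can be sketched in a few lines to show how the clamping behaves at both ends of the scale. The function and constant names here are hypothetical, and the example prices are made up; only the arithmetic comes from the formula above.

```javascript
// Hypothetical helper names; only the arithmetic follows the stated formula.
function blendedPrice(inputPerMillion, outputPerMillion) {
  return (inputPerMillion + outputPerMillion) / 2;
}

function priceScore(blended) {
  const FULL_SCORE_AT = 0.50; // blended $/1M at or below which the score is 100
  const RANGE = 29.50;        // distance from $0.50 to the $30.00 zero point
  const overage = Math.max(0, blended - FULL_SCORE_AT);
  return Math.max(0, Math.min(100, 100 * (1 - overage / RANGE)));
}

// Made-up prices for illustration: $3 input, $15 output per million tokens.
const blended = blendedPrice(3, 15);         // 9
console.log(priceScore(blended).toFixed(1)); // prints "71.2"
console.log(priceScore(0));                  // free model: prints 100
console.log(priceScore(45));                 // past $30 blended: prints 0
```

The inner `max(0, ...)` is what lets anything priced at or below $0.50 blended, including free models, land on exactly 100, and the outer clamp keeps very expensive models at 0 rather than going negative.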

Final score

benchr_score = round(capability × 0.65 + price_score × 0.35)

The formula runs in assets/js/models.js, so you can read and verify it yourself. It produces a 0–100 value shown on a 0–10 scale. Capability inputs are benchr's editorial estimates. For verified official pricing and benchmark figures, see model-figures.json.
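
A minimal sketch of the final step follows. It rests on two assumptions the text implies but does not spell out: that the 0–10 capability value is multiplied by 10 so both halves sit on a 0–100 scale before weighting, and that the displayed rating is the 0–100 result divided by 10. The function names are hypothetical and the inputs are made up; check assets/js/models.js for the actual implementation.

```javascript
// Hypothetical helper; not copied from assets/js/models.js.
// ASSUMPTION: capability arrives on the 0–10 scale and is multiplied by 10
// so that both halves are on 0–100 before the 65/35 weighting is applied.
function benchrScore(capability0to10, priceScore0to100) {
  const capability0to100 = capability0to10 * 10;
  return Math.round(capability0to100 * 0.65 + priceScore0to100 * 0.35);
}

// ASSUMPTION: the displayed 0–10 value is the 0–100 score divided by 10.
function displayRating(score0to100) {
  return (score0to100 / 10).toFixed(1);
}

// Made-up inputs: capability 8.2 (0–10), price score 71.2 (0–100).
const score = benchrScore(8.2, 71.2);
console.log(score);                // prints 78
console.log(displayRating(score)); // prints "7.8"
```

Under these assumptions, a 10-point gain in price score moves the final value by 3.5 points, while a 1-point gain in capability on the 0–10 scale moves it by 6.5.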

Methodology as of June 1, 2026. Formula may be revised as the model landscape changes — check the changelog for updates.

Frequently asked questions

What is the benchr Rating?

A transparent composite of capability (65%) and price efficiency (35%). Higher means more capable per dollar. The formula runs in open JavaScript — you can read it in assets/js/models.js.

Which AI model is best in 2026?

Depends on your budget and task. For no-budget-limit capability: Claude Opus 4.8. For the best capability-per-dollar: DeepSeek V4-Pro or Gemini 3.5 Flash. For free self-hosted: Llama 4 Maverick. Use the model recommender to get a personalized pick.

Are rankings ever paid or sponsored?

No. Rankings are computed purely from data in models.json. No provider has paid for placement. See editorial standards for the full policy.

How often is the data updated?

The goal is same-day updates when major models ship or providers change prices. The updated field in models.json shows the last data refresh. Spot an error? File a correction.

Why are benchmark scores labeled as "editorial estimates"?

Many benchmark figures aren't comparable across providers — test sets differ, conditions differ, and some numbers are self-reported. benchr's capability scores are built from available benchmark data but treated as estimates rather than certified figures. The methodology page explains the process. For verified official figures, see model-figures.json.

Other tools

Charts → Intelligence-vs-price quadrant and weighted benchmark explorer
Cost calculator → Enter your token usage, get monthly cost ranked cheapest-first
Model recommender → Answer three questions, get your best-fit pick with a reason
Side-by-side compare → Pick up to five models and compare every dimension