A reference for AI model selection

Pricing, benchmarks, and use-case fit for Claude, GPT, Gemini, Llama, Mistral, and the open-weight tier — sourced from official provider documentation.

Latest

  1. 01

    Claude Opus 4.8

    Same price as 4.7, a small leaderboard bump, and an honesty gain that catches its own bugs.

  2. 02

    GPT-5.5

    OpenAI's gains land in agentic coding and computer use, at roughly double GPT-5's API price.

  3. 03

    Gemini 3.1 Pro

    A real reasoning leap. Watch the long-context bill and how fast a 3.5 Pro could supersede it.

  4. 04

    Opus 4.8 vs GPT-5.5

    Both charge $5 per million input. Where each one wins after that, and which belongs in your stack.

  5. 05

    Claude Mythos

    Anthropic's restricted frontier model, gated under Project Glasswing. What it is and why it's locked away.

  6. 06

    ChatGPT vs Claude vs Gemini

    Three $20 subscriptions, three default models, scored on four everyday tasks. A clear pick for each user.

Reviews

  1. 07

    Claude Opus 4.7

    Strong on architecture decisions and long-document analysis. Vision is its weaker side.

  2. 08

    GPT-5

    The visual-design winner. Still strong on math and structured output.

  3. 09

    Gemini 3 Pro

    Excels at vision + reasoning. Average elsewhere. Deprecated by Google in March 2026 in favor of 3.1 Pro.

Comparisons

  1. 10

    Coding assistants

    Cursor, GitHub Copilot, Windsurf, and Cody on the same Markdown-exporter task.

  2. 11

    Voice models

    ElevenLabs, OpenAI Whisper, and Cartesia Sonic on latency, accuracy, and naturalness.

  3. 12

    Context windows

    Advertised window versus the retrieval zone where the model finds information.

Analysis

  1. 13

    Price per use case

    The cheapest model for chat, coding, RAG, agents, classification, and summarization.

  2. 14

    The open-weight tier

    Where Llama 4, Mistral Large 2, DeepSeek-V3.1, and Qwen 3 stand against the closed labs.

  3. 15

    Why benchmarks stopped telling you

    MMLU is saturated above 90%. The benchmarks worth tracking now.

Guides

  1. 16

    Frontier models

    Claude Opus 4.7, GPT-5, and Gemini 3.1 Pro Preview: the reviews, the head-to-head, and a single buying decision.

  2. 17

    Open-weight models

    Llama 4, Mistral, DeepSeek, Qwen, the small-model tier, and what it takes to self-host.

  3. 18

    AI costs

    What AI costs by model and workload, and where teams overspend.

Compare models directly

An interactive reference covering pricing, benchmarks, context windows, and capability ratings for all 19 models in the index, across the frontier, mid, and open-weight tiers.

Open the comparison tool

All 19 models ranked by benchr Rating →

All 62 articles in the archive →

Recent model releases →

benchr dispatch

New-model coverage the day it ships — pricing, benchmarks, and what changed.