How this site sources information

Where the data on benchr comes from and how it is kept current.

Pricing data

Per-token pricing for closed-source models comes directly from the official pricing pages of the providers covered (Anthropic, OpenAI, Google, Mistral, DeepSeek). Where an article discusses open-weight models hosted by third-party inference providers, pricing references that inference provider's own published rates. When pricing changes, the article is updated and a changelog entry is added.

Benchmark scores

Benchmark numbers are sourced from the benchmark maintainers' published leaderboards. SWE-bench Verified scores come from swebench.com. LMArena (formerly LMSYS Chatbot Arena) scores come from lmarena.ai. ARC-AGI scores come from arcprize.org. When a provider publishes a model's score on a benchmark before it appears on the official leaderboard, the provider's published figure is used with attribution.
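
The precedence rule above can be sketched in a few lines of Python. This is an illustration only, not the site's actual tooling: the names ScoreCitation and pick_score are hypothetical, and the sketch assumes the leaderboard figure takes priority whenever one exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoreCitation:
    """Hypothetical record: a benchmark number plus who published it."""
    value: float
    source: str              # name of the publisher of the figure
    provider_reported: bool  # True when the figure is not yet on the official leaderboard


def pick_score(
    leaderboard_score: Optional[float],
    provider_score: Optional[float],
    leaderboard_name: str,
    provider_name: str,
) -> Optional[ScoreCitation]:
    """Prefer the maintainers' leaderboard; otherwise use the provider's figure, attributed."""
    if leaderboard_score is not None:
        return ScoreCitation(leaderboard_score, leaderboard_name, provider_reported=False)
    if provider_score is not None:
        # Assumption: a provider-published figure is flagged so it can be attributed in the article.
        return ScoreCitation(provider_score, provider_name, provider_reported=True)
    return None  # no citable number available
```

With no leaderboard entry, pick_score(None, 70.0, "swebench.com", "ExampleProvider") returns a citation attributed to ExampleProvider and flagged as provider-reported; once a leaderboard value is supplied, that value is returned instead.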

Capability ratings

Where this site assigns capability ratings (coding, reasoning, writing, vision, long context, multilingual) on a 0–100 scale, the ratings are synthesized from the model's documented benchmark performance on relevant evaluations, capability claims in the model's release notes, and observed behavior in published third-party comparisons. Each rating is a synthesized reference figure, not a score from an original lab evaluation.

What this site is and is not

This is an editorial publication that synthesizes public information. It is not a benchmarking lab. Articles do not narrate original lab tests, private API-cost totals, or first-person time-on-tool reports. Where an article takes a position on which model fits a workload, that verdict is grounded in published benchmarks, official pricing, official spec sheets, and the well-known public behavior of the models being compared.

What that means in practice: you will see qualitative judgments (“stronger on long-document analysis,” “weaker on dialectal Arabic”) more often than fresh numbers. When a number appears, the source is cited. When a comparison cannot be backed by a citable source, it is stated qualitatively rather than with invented precision.

Update cadence

Pricing tables are checked against provider documentation when articles are revised. Model release and deprecation events are added to articles within several days of the announcement. All model data is systematically re-verified before major article revisions; there is no fixed weekly or monthly cycle.

Corrections and disputes

If you find a number, date, or attribution that does not match the primary source, send a note to corrections@benchr.org. Material corrections are noted on the corrections page and in the article changelog.