benchr compiles reference information on AI language models — pricing, context windows, benchmark scores, deprecation status — sourced from official provider documentation. Coverage includes Claude, GPT, Gemini, Llama, Mistral, DeepSeek, Qwen, Phi, and other frontier and open-weight models.
The publication is structured as a working reference, not as a magazine of original research. Articles describe each model's documented capabilities, published pricing, and known limits, with analysis on which model fits which workload based on the public record. Where original evaluation is presented, the test setup is described in the article itself.
What you'll find
Pricing tables by model and provider, sourced from Anthropic's pricing page, OpenAI's API pricing, and Google's Gemini API models page among others. Benchmark scores referenced to the original benchmark maintainers — SWE-bench Verified, LMSYS Arena, ARC-AGI. Context-window comparisons. Notes on deprecations, version histories, and which model has replaced which. Recommendations on which model fits which use case, sourced from the published capabilities of each.
What you won't find
Synthetic precision claims. Invented test costs. Sponsored content. Affiliate links to providers. Rankings designed for search traffic rather than reader utility.
Use of AI
AI tools are used in drafting and editing some of this content. Pricing, release dates, deprecation status, and benchmark numbers are verified against primary sources before publication. Where original analysis is presented, the reasoning and sources are shown in the article.
Corrections
Errors are corrected on the original article with a note in the article's changelog. Material corrections also appear on the corrections page.
Contact
For corrections or source disputes: corrections@benchr.org.