benchr

https://benchr.org/
Pricing, benchmarks, and use-case fit for AI model selection.
en-us
Fri, 12 Jun 2026 11:18:15 +0000
corrections@benchr.org (benchr)

OpenAI's October 2026 retirements: the GPT-4 era ends in one day
https://benchr.org/deprecations/openai-october-2026-retirements
Fri, 12 Jun 2026 00:00:00 +0000
On October 23, 2026, OpenAI retires eleven model IDs at once: GPT-4o, GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, GPT-4.1 nano, GPT Image 1, o1, o1-pro, o3-mini, and o4-mini. The Assistants API goes August 26. Every date, replacement, and audit step.

GPT-4o shuts down October 23, 2026: where to go next
https://benchr.org/deprecations/gpt-4o
Fri, 12 Jun 2026 00:00:00 +0000
OpenAI retires the gpt-4o-2024-05-13 snapshot on October 23, 2026. The official replacement is GPT-5.5 at $5/$30, but GPT-5 at $1.25/$10 is the better cost match for most GPT-4o workloads. The full decision.

Gemini 2.5 Pro and Flash shut down October 16, 2026, and the upgrade costs more
https://benchr.org/deprecations/gemini-2-5-pro
Fri, 12 Jun 2026 00:00:00 +0000
Google retires gemini-2.5-pro and gemini-2.5-flash on October 16, 2026. The catch: Gemini 3.5 Flash bills $1.50/1M input vs 2.5 Flash's $0.30, a 5× jump. Here's the migration math and the cheaper detours.

Contact benchr
https://benchr.org/contact
Fri, 12 Jun 2026 00:00:00 +0000
How to reach benchr for general questions, corrections, and security reports.

Claude Sonnet 4 retires June 15, 2026: what breaks and how to migrate
https://benchr.org/deprecations/claude-sonnet-4
Fri, 12 Jun 2026 00:00:00 +0000
Anthropic retires claude-sonnet-4-20250514 on June 15, 2026. The replacement, Claude Sonnet 4.6, costs the same $3/$15, scores 79.6% on SWE-bench vs 72.7%, and carries a 1M context window.
Here's the swap, step by step.

Claude Opus 4 and 4.1 retirement: the deprecation that pays you
https://benchr.org/deprecations/claude-opus-4
Fri, 12 Jun 2026 00:00:00 +0000
Claude Opus 4 retires June 15, 2026 and Opus 4.1 follows August 5. Migrating to Opus 4.8 cuts the bill from $15/$75 to $5/$25 per million tokens and lifts SWE-bench from 72.5% to 88.6%. One API parameter will bite you.

AI Model Deprecations 2026: Every Shutdown Date That Matters
https://benchr.org/deprecations
Fri, 12 Jun 2026 00:00:00 +0000
Announced AI model retirements across Anthropic, OpenAI, and Google, with shutdown dates, official replacements, and migration cost analysis. Verified against official provider deprecation documentation.

AI API Error Database
https://benchr.org/errors
Fri, 12 Jun 2026 00:00:00 +0000
Common AI API errors across OpenAI, Anthropic, and Google Gemini, each verified against official provider documentation, with causes, fixes, and migration alternatives.

GPT-5.4, reviewed: the value pick OpenAI doesn't advertise
https://benchr.org/articles/gpt-5-4-review
Wed, 10 Jun 2026 00:00:00 +0000
GPT-5.4 review, three months after launch: $2.50/$15 pricing, 1M context, 75% OSWorld computer use, and why losing the flagship crown to GPT-5.5 made it the value pick.

GPT-5.4 API Pricing: the tier between GPT-5 and 5.5
https://benchr.org/pricing/gpt-5-4
Wed, 10 Jun 2026 00:00:00 +0000
GPT-5.4 costs $2.50/1M input and $15/1M output: double GPT-5's input price and half of GPT-5.5's rates. GPT-5.4's $0.25 caching, the 272K long-context surcharge, and when the middle tier is the value pick.
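
The tier positioning in the GPT-5.4 pricing entry can be checked against the list prices quoted elsewhere in this feed (GPT-5 at $1.25/$10, GPT-5.5 at $5/$30). The following is a minimal sketch, not benchr's own tooling; the `PRICES` table and the `price_ratio` helper are hypothetical names introduced only for illustration.

```python
# List prices in USD per 1M tokens (input, output), copied from this feed.
# The table layout and helper name are illustrative assumptions.
PRICES = {
    "GPT-5": (1.25, 10.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.5": (5.00, 30.00),
}


def price_ratio(model, baseline):
    """Return (input_ratio, output_ratio) of model's price over baseline's."""
    m_in, m_out = PRICES[model]
    b_in, b_out = PRICES[baseline]
    return m_in / b_in, m_out / b_out


for baseline in ("GPT-5", "GPT-5.5"):
    in_ratio, out_ratio = price_ratio("GPT-5.4", baseline)
    print(f"GPT-5.4 vs {baseline}: input x{in_ratio:.2f}, output x{out_ratio:.2f}")
```

On these numbers GPT-5.4 is twice GPT-5 on input but 1.5 times on output, and half of GPT-5.5 on both, so how close the "middle tier" sits to either neighbor depends on how output-heavy the workload is.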

Claude Fable 5 is the Mythos-class model you can finally use (Arabic edition)
https://benchr.org/ar/articles/claude-fable-5-launch
Wed, 10 Jun 2026 00:00:00 +0000
Claude Fable 5 was released on June 9, 2026 at $10/$50 per million tokens. It is Mythos with safety classifiers: what the leash blocks, what you gain over Opus 4.8, and the two-week free window.

Claude Fable 5 is the Mythos-class model you can finally use
https://benchr.org/articles/claude-fable-5-launch
Wed, 10 Jun 2026 00:00:00 +0000
Claude Fable 5 shipped June 9, 2026 at $10/$50 per million tokens. It's Mythos with safety classifiers: what the leash blocks, what you get over Opus 4.8, and the two-week free window.

Claude Fable 5 API Pricing: the first Mythos-class rate card
https://benchr.org/pricing/claude-fable-5
Wed, 10 Jun 2026 00:00:00 +0000
Claude Fable 5 costs $10/1M input and $50/1M output, double Opus 4.8. Caching math, the tokenizer surcharge nobody mentions, fallback economics, and the free window through June 22, 2026.

Anthropic expands Project Glasswing and brings Claude Security to real codebases
https://benchr.org/articles/anthropic-glasswing-claude-security
Tue, 09 Jun 2026 00:00:00 +0000
Anthropic extended Project Glasswing to about 150 more organizations and launched Claude Security, a product that uses Claude Opus 4.8 to scan codebases and suggest patches. What it means for developers.

The UK just forced Google to give publishers more control over AI Search
https://benchr.org/articles/uk-google-ai-search-publishers
Tue, 09 Jun 2026 00:00:00 +0000
The UK's CMA now requires Google to let publishers opt out of having their content power AI features in Search without losing standard rankings. A world first, and a template other regulators may copy. What it means.

Grok 4.3 is now the default, and old API slugs bill at its prices
https://benchr.org/articles/grok-4-3-default-and-pricing
Tue, 09 Jun 2026 00:00:00 +0000
xAI made Grok 4.3 the default model and redirected deprecated text slugs to it, billed at Grok 4.3 prices. If your app still calls Grok 4.1 Fast or Grok 3, your costs may have jumped. Here's how to check.

GPT-5.5 pricing explained: the 272K cliff and the $30/$180 Pro tier
https://benchr.org/articles/gpt-5-5-pricing-explained
Tue, 09 Jun 2026 00:00:00 +0000
GPT-5.5 costs $5/$30 per million tokens until a prompt crosses 272K input tokens, when the whole session reprices higher. And GPT-5.5 Pro is a different model at $30/$180. What both mean for your bill.

Google's WebMCP idea is a warning: websites need to become agent-readable tools
https://benchr.org/articles/webmcp-agent-readable-websites
Tue, 09 Jun 2026 00:00:00 +0000
WebMCP is a proposed web standard, backed by Google and Microsoft, that lets sites expose structured tools to AI agents instead of making them guess from the DOM. What it is, where it stands, and why site owners should care.

Google's AI Search is turning into an agent layer, not just a summary box
https://benchr.org/articles/google-ai-search-agent-layer
Tue, 09 Jun 2026 00:00:00 +0000
Google's AI Mode is shifting from summarizing results to acting on them with background agents. Meanwhile AI Overviews keep cutting clicks. The real question for site owners: which pages survive?

Google I/O 2026 made one thing clear: developer AI is moving from autocomplete to agents
https://benchr.org/articles/google-ai-agents-io-2026
Tue, 09 Jun 2026 00:00:00 +0000
At I/O 2026 Google shipped Gemini 3.5 Flash as an agent-first model, started moving Gemini CLI users to Antigravity by June 18, and reframed developer AI around autonomous agents. What changes for builders.

Claude Opus 4.8 is live. The real story is coding, agents, and Mythos pressure
https://benchr.org/articles/claude-opus-4-8-launch
Tue, 09 Jun 2026 00:00:00 +0000
Claude Opus 4.8 shipped May 28, 2026 at the same $5/$25 pricing as 4.7. The real story is agentic coding, dynamic workflows, and a Mythos-class model heading to all customers. What's official and what to make of it.

Qwen 3.6-27B Pricing: Self-Hosted Multilingual with Strong Coding
https://benchr.org/pricing/qwen-3-6-27b
Sat, 06 Jun 2026 00:00:00 +0000
Qwen 3.6-27B is a free open-weight model from Alibaba. Via API ~$0.20/1M input. Strong multilingual (Arabic, Chinese, European), 27B MoE, self-hostable on consumer GPUs.

Phi-4 Pricing: Self-Hosted Edge Model, Strong Reasoning for 14B Parameters
https://benchr.org/pricing/phi-4
Sat, 06 Jun 2026 00:00:00 +0000
Phi-4 is free to self-host from Microsoft. Via API ~$0.07/1M input. 14B parameters, MIT license, runs on a laptop GPU. Strong reasoning benchmarks for its size class.

OpenAI API Pricing Guide: GPT-5.5, GPT-5, and GPT-5 Mini Costs
https://benchr.org/articles/openai-api-pricing-guide
Sat, 06 Jun 2026 00:00:00 +0000
Complete guide to OpenAI API pricing in 2026. Review input, output, prompt caching, and batch execution pricing for all GPT models. Sourced from official docs.

Mistral Medium 3.5 API Pricing: $1.50/1M, Multimodal at Mid-Range Cost
https://benchr.org/pricing/mistral-medium-3-5
Sat, 06 Jun 2026 00:00:00 +0000
Mistral Medium 3.5 costs $1.50/1M input and $7.50/1M output. Vision and PDF support, 128K context, EU-resident infrastructure. Mistral's multimodal production model.

Mistral Large 3 API Pricing: $0.50/1M, Apache License, EU Data Residency
https://benchr.org/pricing/mistral-large-3
Sat, 06 Jun 2026 00:00:00 +0000
Mistral Large 3 costs $0.50/1M input and $1.50/1M output. Apache 2.0 license for self-hosting, EU data residency available. Strong multilingual and code performance at a low price.

Llama 4 Scout Pricing: 10M Context, the Largest Available Window
https://benchr.org/pricing/llama-4-scout
Sat, 06 Jun 2026 00:00:00 +0000
Llama 4 Scout is free to self-host from Meta. Via API ~$0.11/1M input. 10 million token context, the largest available window. Lightweight MoE for extreme long-context tasks.

Llama 4 Maverick Pricing: Self-Hosted Multimodal, 1M Context
https://benchr.org/pricing/llama-4-maverick
Sat, 06 Jun 2026 00:00:00 +0000
Llama 4 Maverick is free to self-host from Meta. Via API ~$0.20/1M input. Multimodal (vision), 1M context, MoE architecture. The open-source choice for multimodal pipelines.

Kimi K2.6 API Pricing: $0.95/1M, 80.2% SWE-bench, 90.5% GPQA
https://benchr.org/pricing/kimi-k2-6
Sat, 06 Jun 2026 00:00:00 +0000
Kimi K2.6 costs $0.95/1M input and $4/1M output. 80.2% SWE-bench, 90.5% GPQA Diamond, 200K context. Moonshot AI's model combining coding and science strengths under $1.

Grok 4.3 API Pricing: $1.25/1M, Cheap Output, Real-Time Web Search
https://benchr.org/pricing/grok-4-3
Sat, 06 Jun 2026 00:00:00 +0000
Grok 4.3 costs $1.25/1M input and $2.50/1M output. Native real-time web search, 256K context, xAI's production model. Unusually low output price relative to input.

GPT-5 vs Gemini 3.5 Flash: same budget, different machines
https://benchr.org/compare/gpt-5-vs-gemini-3-5-flash
Sat, 06 Jun 2026 00:00:00 +0000
GPT-5 ($1.25/$10) vs Gemini 3.5 Flash ($1.50/$9): nearly the same price, completely different machines. Speed, context, free tier, and output ceilings compared.

GPT-5 Mini API Pricing: $0.25/1M, Volume and Routing at 48% SWE-bench
https://benchr.org/pricing/gpt-5-mini
Sat, 06 Jun 2026 00:00:00 +0000
GPT-5 Mini costs $0.25/1M input and $2/1M output. 160 tok/s, 48% SWE-bench, 128K context. OpenAI's cheapest API model for classification, routing, and volume pipelines.

Gemini 3.1 Pro API Pricing: $2/1M, 1M Context and 94.3% GPQA
https://benchr.org/pricing/gemini-3-1-pro
Sat, 06 Jun 2026 00:00:00 +0000
Gemini 3.1 Pro costs $2/1M input and $12/1M output. 1M context window, 94.3% GPQA Diamond, native multimodal. Google's frontier model for long-document and research workloads.

DeepSeek V4-Pro vs GPT-5: 11.5× cheaper output, higher coding score
https://benchr.org/compare/deepseek-v4-pro-vs-gpt-5
Sat, 06 Jun 2026 00:00:00 +0000
DeepSeek V4-Pro vs GPT-5: 11.5× cheaper output ($0.87 vs $10) and a higher SWE-bench score (80.6% vs 74.9%). Where the catch really is, with monthly cost math.
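
The "monthly cost math" behind the DeepSeek V4-Pro vs GPT-5 comparison can be sketched roughly as follows. The per-million prices are the ones quoted in this feed ($0.435/$0.87 and $1.25/$10); the workload of 500M input and 100M output tokens per month is a made-up assumption, and `monthly_cost` is a hypothetical helper, not a function from the linked article.

```python
def monthly_cost(input_tokens, output_tokens, price_in, price_out):
    """Monthly bill in USD, given token counts and prices per 1M tokens."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out


# Hypothetical workload (an assumption for illustration only).
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

gpt5 = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 1.25, 10.00)
deepseek = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.435, 0.87)

# Output-price ratio quoted in the feed entry: $10 vs $0.87.
output_ratio = 10.00 / 0.87

print(f"GPT-5:           ${gpt5:,.2f}/month")
print(f"DeepSeek V4-Pro: ${deepseek:,.2f}/month")
print(f"Output price ratio: {output_ratio:.1f}x")
```

Because the input-price gap is smaller than the output-price gap, the overall saving on a mixed workload is lower than the headline output ratio; the exact figure depends entirely on the assumed input/output split.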

DeepSeek V4-Pro vs Claude Sonnet 4.6: one point apart, 10× apart
https://benchr.org/compare/deepseek-v4-pro-vs-claude-sonnet-4-6
Sat, 06 Jun 2026 00:00:00 +0000
DeepSeek V4-Pro ($0.435/$0.87) vs Claude Sonnet 4.6 ($3/$15): one SWE-bench point apart, a 10× price gap. What the cheaper model gives up, with real pipeline math.

DeepSeek V4-Flash API Pricing: $0.14/1M, 79% SWE-bench on a Budget
https://benchr.org/pricing/deepseek-v4-flash
Sat, 06 Jun 2026 00:00:00 +0000
DeepSeek V4-Flash costs $0.14/1M input and $0.28/1M output. 79% SWE-bench, 1M context, MIT license. The cheapest model with near-Sonnet coding performance.

Claude Sonnet 4.6 vs GPT-5: the daily-driver decision
https://benchr.org/compare/claude-sonnet-4-6-vs-gpt-5
Sat, 06 Jun 2026 00:00:00 +0000
Claude Sonnet 4.6 ($3/$15) vs GPT-5 ($1.25/$10): SWE-bench scores, context windows, caching math, and a concrete monthly cost example for the daily-driver decision.

Claude Sonnet 4.6 API Pricing: $3/1M, Anthropic Production Default
https://benchr.org/pricing/claude-sonnet-4-6
Sat, 06 Jun 2026 00:00:00 +0000
Claude Sonnet 4.6 costs $3/1M input and $15/1M output. 79.6% SWE-bench, 200K context, 64K max output. The balanced choice between Haiku's speed and Opus's depth.

Claude Opus 4.7 API Pricing: 87.6% SWE-bench, $5/1M
https://benchr.org/pricing/claude-opus-4-7
Sat, 06 Jun 2026 00:00:00 +0000
Claude Opus 4.7 costs $5/1M input and $25/1M output. Same price as Opus 4.8, one point lower on SWE-bench. When to use 4.7 vs 4.8 vs Sonnet 4.6.

Claude Haiku 4.5 API Pricing: $1/1M, Fastest Anthropic Model
https://benchr.org/pricing/claude-haiku-4-5
Sat, 06 Jun 2026 00:00:00 +0000
Claude Haiku 4.5 costs $1/1M input and $5/1M output.
145 tok/s, 73.3% SWE-bench, 200K context. Anthropic's speed and volume tier for high-throughput pipelines.