Quota and rate limits: the two 429s that need opposite fixes
OpenAI sends both rate limits and exhausted billing as HTTP 429, and confusing them wastes hours: backoff cures the first and does nothing for the second. Anthropic adds a wrinkle worth knowing — acceleration limits that fire when usage ramps too sharply, even below your ceiling. If limits keep binding, the practical escape is routing bulk traffic to cheaper, higher-throughput tiers: compare what that costs in the calculator against the current rankings.
Model not found: 2026's fastest-growing error
Three providers are retiring model lines this year — Claude Sonnet 4 and Opus 4 went dark June 15, Gemini's 2.5 line ends October 16, and OpenAI retires nine IDs on October 23. Every 404 in this category (OpenAI, Anthropic, Gemini) links into the deprecations record, where each retirement carries its replacement and the before-and-after price math. The live tracker shows every model's current status.
Context too large
Token overflow (context_length_exceeded) and byte overflow (request_too_large) fail differently and need different fixes — counting tokens versus measuring payloads. When trimming isn't an option, the fix is a bigger window: the context-window comparison shows what each model's advertised window is really worth in practice.
Authentication and billing walls
Auth failures (401s) are the cheapest errors to prevent and the most embarrassing to debug at midnight. Gemini's FAILED_PRECONDITION is its own animal — a region and billing wall that hits the moment you deploy to a server outside free-tier coverage.
Server errors and overload
Anthropic's 529 and Gemini's 504 are the platform telling you about itself, not about your code. Retry politely, and if a provider's busy hours keep hurting you, the price-history record and rankings make the case for keeping a second provider wired as a fallback.
Changelog
— Section launched with 15 errors across OpenAI, Anthropic, and Gemini, each verified against official provider documentation.
Sources
OpenAI error codes guide — developers.openai.com/api/docs/guides/error-codes (verified June 12, 2026)
Anthropic API errors — platform.claude.com/docs/en/api/errors (verified June 12, 2026)
Gemini API troubleshooting — ai.google.dev/gemini-api/docs/troubleshooting (verified June 12, 2026)
Model retirements. OpenAI shuts down nine model IDs on October 23, 2026, Anthropic retired Claude Sonnet 4 and Opus 4 on June 15, and Google's Gemini 2.5 line ends October 16. Code pinned to old IDs starts returning 404s — which is why every model-availability error here links straight into benchr's deprecations record.
Are these error explanations official?
Every page is verified against the provider's own error documentation — OpenAI's error-codes guide, Anthropic's API errors page, and Google's Gemini troubleshooting docs — with the verification date printed on the page. Where benchr adds editorial advice (like cheaper fallback models), it's labeled as benchr's pick.
What's the difference between a 429 rate limit and insufficient_quota?
Both arrive as HTTP 429 from OpenAI, but they need opposite responses. Rate limits are temporary — back off and retry, limits reset every minute. insufficient_quota means billing is exhausted — no retry strategy fixes it; only adding credits or raising your cap does.