AI API errors: find it, fix it, pick a better model

Common errors across OpenAI, Anthropic, and Gemini — verified against official docs, with the fix and the model data to route around the problem.

By the benchr team · Every error verified against official provider docs, June 12, 2026

OpenAI · 429 · quota · insufficient_quota

"You exceeded your current quota" — billing is empty; retrying can't fix it.

OpenAI · 429 · rate limit · rate_limit_exceeded

RPM or TPM ceiling hit. Temporary — limits reset every minute.

OpenAI · 400 · context · context_length_exceeded

Prompt plus max_tokens overflowed the model's window.

OpenAI · 404 · availability · model_not_found

Wrong ID, wrong endpoint, no access — or a model OpenAI retired.

OpenAI · 401 · auth · invalid_api_key

"Incorrect API key provided" — key wrong, revoked, or malformed.

Anthropic · 529 · overload · overloaded_error

Platform-wide pressure, not your account. Back off and ride it out.

Anthropic · 429 · rate limit · rate_limit_error

Tier ceiling — or acceleration limits if your usage ramped too sharply.

Anthropic · 400 · format · invalid_request_error

In 2026: sampling params on Opus 4.7+, prefill, or modified thinking blocks.

Anthropic · 413 · size · request_too_large

Body over the 32 MB Messages cap — rejected before Anthropic even sees it.

Anthropic · 404 · availability · not_found_error

Since June 15, 2026, the top cause is a retired Claude model ID.

Gemini · 429 · rate limit · RESOURCE_EXHAUSTED

The free tier's signature error: more requests per minute than your tier allows.

Gemini · 400 · format · INVALID_ARGUMENT

Malformed body — or a feature that doesn't exist on your API version.

Gemini · 404 · availability · NOT_FOUND

Expired file references — or a model from a line Google already shut down.

Gemini · 504 · timeout · DEADLINE_EXCEEDED

Big prompts outran the clock. Stream, trim, or raise the timeout.

Gemini · 400 · billing · FAILED_PRECONDITION

Free tier unavailable in your region without billing enabled.

Quota and rate limits: the two 429s that need opposite fixes

OpenAI sends both rate limits and exhausted billing as HTTP 429, and confusing them wastes hours: backoff cures the first and does nothing for the second. Anthropic adds a wrinkle worth knowing — acceleration limits that fire when usage ramps too sharply, even below your ceiling. If limits keep binding, the practical escape is routing bulk traffic to cheaper, higher-throughput tiers: compare what that costs in the calculator against the current rankings.
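
One way to keep the two 429s apart in client code is to branch on the error code before deciding whether to retry. The sketch below is illustrative only: it assumes the response body is JSON with an "error" object carrying a "code" (or "type") field, and the function names classify_429 and backoff_delay are hypothetical, not part of any provider SDK.

```python
import random

# Codes named in this section. The JSON body shape is an assumption.
RETRYABLE = {"rate_limit_exceeded"}
FATAL = {"insufficient_quota"}


def classify_429(body: dict) -> str:
    """Return 'retry', 'fix_billing', or 'unknown' for an HTTP 429 body."""
    error = body.get("error") or {}
    code = error.get("code") or error.get("type") or ""
    if code in FATAL:
        return "fix_billing"  # backoff cannot help; stop and alert a human
    if code in RETRYABLE:
        return "retry"
    return "unknown"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

The 60-second cap reflects the note above that rate limits reset every minute; waiting longer than that per attempt is unlikely to buy anything.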

Model not found: 2026's fastest-growing error

Three providers are retiring model lines this year — Claude Sonnet 4 and Opus 4 went dark June 15, Gemini's 2.5 line ends October 16, and OpenAI retires nine IDs on October 23. Every 404 in this category (OpenAI, Anthropic, Gemini) links into the deprecations record, where each retirement carries its replacement and the before-and-after price math. The live tracker shows every model's current status.
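
Code pinned to a retired ID can be made to degrade gracefully with a small replacement table. This is a minimal sketch, not a recommended mapping: the model IDs are placeholders, ModelNotFound stands in for whatever exception your client raises on a 404, and send is a hypothetical callable that performs the request.

```python
from typing import Callable

# Placeholder IDs for illustration only. Populate from a real deprecations record.
REPLACEMENTS = {
    "example-model-v1": "example-model-v2",
}


class ModelNotFound(Exception):
    """Stand-in for whatever your client raises on a 404 for a model ID."""


def call_with_replacement(model: str, send: Callable[[str], str]) -> tuple[str, str]:
    """Call send(model); on ModelNotFound, retry once with the mapped replacement.

    Returns (model_actually_used, response).
    """
    try:
        return model, send(model)
    except ModelNotFound:
        replacement = REPLACEMENTS.get(model)
        if replacement is None:
            raise  # wrong ID or no access: a swap cannot fix that
        return replacement, send(replacement)
```

Returning the model actually used makes the swap visible to logging and cost tracking, since a replacement may be priced differently.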

Context too large

Token overflow (context_length_exceeded) and byte overflow (request_too_large) fail differently and need different fixes — counting tokens versus measuring payloads. When trimming isn't an option, the fix is a bigger window: the context-window comparison shows what each model's advertised window is really worth in practice.
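
Both overflows can be caught before the request leaves your machine. The following is a rough sketch under stated assumptions: 32 MB is taken as 32 × 1024 × 1024 bytes, tokens are estimated at four characters each (a crude heuristic, not a real tokenizer), and the payload is assumed to hold a "messages" list with string "content" fields. The preflight name is hypothetical.

```python
import json

MAX_BODY_BYTES = 32 * 1024 * 1024  # the 32 MB Messages cap cited above (exact byte count assumed)
CHARS_PER_TOKEN = 4                # crude assumption, not a real tokenizer


def preflight(payload: dict, context_window: int, max_tokens: int) -> list[str]:
    """Return the problems found before sending; an empty list means OK."""
    problems = []

    # Byte overflow: measure the serialized body, not the text inside it.
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        problems.append("request_too_large")

    # Token overflow: estimated prompt tokens plus the reserved output budget.
    text = "".join(
        m["content"]
        for m in payload.get("messages", [])
        if isinstance(m.get("content"), str)
    )
    estimated_prompt_tokens = len(text) // CHARS_PER_TOKEN + 1
    if estimated_prompt_tokens + max_tokens > context_window:
        problems.append("context_length_exceeded")

    return problems
```

For anything close to the limit, swap the character heuristic for the provider's own token counter; the estimate here is only good enough to catch gross overflows early.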

Authentication and billing walls

Auth failures (401s) are the cheapest errors to prevent and the most embarrassing to debug at midnight. Gemini's FAILED_PRECONDITION is its own animal — a region and billing wall that hits the moment you deploy to a server outside free-tier coverage.
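
Many malformed-key failures can be caught at startup rather than on the first request. A minimal sketch, assuming the key lives in an environment variable; load_api_key is a hypothetical helper, and the checks only catch obvious damage such as a missing value or stray whitespace, not a revoked key.

```python
import os
from typing import Mapping


def load_api_key(env_var: str, environ: Mapping[str, str] = os.environ) -> str:
    """Read an API key from the environment and fail fast if it looks unusable."""
    raw = environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set")
    key = raw.strip()  # trailing newlines from copy-paste or secret files are common
    if not key:
        raise RuntimeError(f"{env_var} is empty")
    if any(ch.isspace() for ch in key):
        raise RuntimeError(f"{env_var} contains whitespace inside the key")
    return key
```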

Server errors and overload

Anthropic's 529 and Gemini's 504 are the platform telling you about itself, not about your code. Retry politely, and if a provider's busy hours keep hurting you, the price-history record and rankings make the case for keeping a second provider wired as a fallback.
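
Polite retries with a second provider wired in might look like the sketch below. It is illustrative only: ProviderError is a hypothetical wrapper exposing the HTTP status, each provider is a zero-argument callable you supply, and the attempt count and delays are arbitrary defaults rather than provider guidance.

```python
import time
from typing import Callable, Sequence

RETRYABLE_STATUS = {529, 504}  # the overload and timeout codes discussed above


class ProviderError(Exception):
    """Hypothetical wrapper carrying the HTTP status of a failed call."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def call_with_fallback(
    providers: Sequence[Callable[[], str]],
    attempts_per_provider: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, backing off on retryable statuses."""
    last_error = None
    for call in providers:
        for attempt in range(attempts_per_provider):
            try:
                return call()
            except ProviderError as err:
                last_error = err
                if err.status not in RETRYABLE_STATUS:
                    raise  # not a platform-side blip; retrying will not help
                if attempt < attempts_per_provider - 1:
                    sleep(base_delay * 2 ** attempt)
    if last_error is None:
        raise ValueError("no provider call was attempted")
    raise last_error
```

Non-retryable errors are raised immediately instead of falling through to the next provider, on the reasoning that a bad request or bad key should surface rather than be masked by a silent switch.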

Changelog

  • Section launched with 15 errors across OpenAI, Anthropic, and Gemini, each verified against official provider documentation.

Sources

  • OpenAI error codes guide — developers.openai.com/api/docs/guides/error-codes (verified June 12, 2026)
  • Anthropic API errors — platform.claude.com/docs/en/api/errors (verified June 12, 2026)
  • Gemini API troubleshooting — ai.google.dev/gemini-api/docs/troubleshooting (verified June 12, 2026)
  • benchr api-errors.json — the structured dataset behind this section

Frequently asked questions

Why do AI API errors spike in 2026?

Model retirements. OpenAI shuts down nine model IDs on October 23, 2026, Anthropic retired Claude Sonnet 4 and Opus 4 on June 15, and Google's Gemini 2.5 line ends October 16. Code pinned to old IDs starts returning 404s — which is why every model-availability error here links straight into benchr's deprecations record.

Are these error explanations official?

Every page is verified against the provider's own error documentation — OpenAI's error-codes guide, Anthropic's API errors page, and Google's Gemini troubleshooting docs — with the verification date printed on the page. Where benchr adds editorial advice (like cheaper fallback models), it's labeled as benchr's pick.

What's the difference between a 429 rate limit and insufficient_quota?

Both arrive as HTTP 429 from OpenAI, but they need opposite responses. Rate limits are temporary: back off and retry, because limits reset every minute. insufficient_quota means billing is exhausted, so no retry strategy fixes it; only adding credits or raising your cap does.