Reference · June 2026

Anthropic rate_limit_error: meaning, cause, and fix

Anthropic can limit both sustained volume and a sudden increase in traffic. The response headers and your recent request pattern help distinguish the two.

By the benchr team · Published June 12, 2026 · Verified against Anthropic's API error documentation, June 12, 2026

Anthropic · HTTP 429 · severity: medium · rate limit

Two ways to hit it

The first road is the ceiling. Every tier carries request and token budgets, and traffic past them draws 429s until the window clears. Note which budget went first, because the cures differ: blowing the request budget calls for fewer, better-bundled calls, while blowing the token budget calls for shorter prompts and a tighter max_tokens, even at modest call volume.

The second road is acceleration. Anthropic documents acceleration limits: a sharp increase in usage can produce 429s even below the published tier limit. A launch or a highly parallel backfill can create that pattern, so ramp traffic gradually and avoid step changes in volume. If errors arrive while your own traffic is flat, check the status code: a 529 is an overload error, not a rate limit.
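To tell the two roads apart, it can help to look at the remaining-budget headers on the 429 itself. The sketch below assumes the header names published in Anthropic's rate-limit documentation (anthropic-ratelimit-requests-remaining and its input and output token counterparts); confirm them against the current docs before relying on them. The function name exhausted_budgets is hypothetical.

```python
# Assumption: header names as listed in Anthropic's rate-limit docs.
BUDGET_HEADERS = {
    "requests": "anthropic-ratelimit-requests-remaining",
    "input_tokens": "anthropic-ratelimit-input-tokens-remaining",
    "output_tokens": "anthropic-ratelimit-output-tokens-remaining",
}


def exhausted_budgets(headers):
    """Return the names of budgets whose remaining count is zero or less."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    spent = []
    for name, header in BUDGET_HEADERS.items():
        raw = lowered.get(header)
        if raw is None:
            continue  # header absent: no evidence either way
        try:
            remaining = int(raw)
        except ValueError:
            continue  # unparseable value: skip rather than guess
        if remaining <= 0:
            spent.append(name)
    return spent
```

With the Python SDK, the headers should be reachable as err.response.headers on a caught anthropic.RateLimitError. An empty list on a 429 means none of these budgets reads as exhausted, which is consistent with an acceleration limit but does not prove one.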

What comes back

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Your account has hit a rate limit."
  },
  "request_id": "req_011CSHoEeqs5C35K2UUqR7Fy"
}

Branch on error.type, not on the message. The Python SDK surfaces this as anthropic.RateLimitError, so catch the exception class; message strings are wording, not contract. Every response also carries a request-id header whose value starts with req_, and the SDKs expose it. When a limits question turns into a support ticket, include that ID: it is the difference between a fast answer and a slow one.
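If you handle raw HTTP responses rather than SDK exceptions, branching on the nested type can look like the sketch below. The function name classify_error and the returned labels are hypothetical. The overloaded_error type is the one Anthropic's error docs pair with HTTP 529.

```python
import json


def classify_error(body: str) -> str:
    """Map an error body to a coarse action, keyed on error.type only."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(payload, dict) or payload.get("type") != "error":
        return "unknown"
    error = payload.get("error")
    error_type = error.get("type") if isinstance(error, dict) else None
    if error_type == "rate_limit_error":
        return "slow_down"      # HTTP 429: your account hit a limit
    if error_type == "overloaded_error":
        return "retry_later"    # HTTP 529: overload, not your budget
    return "unknown"
```

The message string is never inspected, so a wording change on Anthropic's side does not alter the result.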

Ramp traffic gradually

Backoff reacts to a 429 after it lands. Concurrency shaping prevents most of them from existing, and it doubles as ramp control, since a fixed pool can't sprint no matter how much work piles up behind it. One semaphore does the job:

# Python: a worker pool that turns bursts into a steady stream
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()
gate = asyncio.Semaphore(8)        # in-flight cap; tune to your tier

async def ask(prompt: str):
    async with gate:
        return await client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )

async def run(prompts):
    return await asyncio.gather(*(ask(p) for p in prompts))

Eight workers grinding through a thousand prompts produce a bounded, even stream that reads as a steady customer. Raise the pool size in small steps over days, and the same gate that stops bursts becomes your gradual ramp.

Reduce request and token volume

Persistent 429s usually require lower live throughput or a higher limit. Prompt caching discounts eligible repeated input by 90%, and the Batch API lists a 50% discount for work that can wait. You can also test whether simple, high-volume calls meet their quality target on Claude Haiku 4.5 at $1/$5 per million tokens. Price the split with the calculator and include retries.
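As a rough illustration of the arithmetic, the sketch below prices one workload three ways using only the figures quoted above. It is a simplification: it treats caching and batching as separate options rather than stacking them, and it ignores any cache-write premium and retries. The function name and the workload numbers are hypothetical.

```python
# Figures quoted in the text above.
HAIKU_INPUT_PER_MTOK = 1.00    # USD per million input tokens
HAIKU_OUTPUT_PER_MTOK = 5.00   # USD per million output tokens
CACHE_READ_DISCOUNT = 0.90     # discount on eligible repeated input
BATCH_DISCOUNT = 0.50          # discount for work that can wait


def workload_cost(calls, shared_in, unique_in, out, mode="plain"):
    """Estimate USD cost for `calls` requests on Claude Haiku 4.5.

    shared_in: input tokens repeated on every call (cache-eligible)
    unique_in: input tokens that differ per call
    out:       output tokens per call
    mode:      "plain", "cached", or "batch"
    """
    if mode not in ("plain", "cached", "batch"):
        raise ValueError(f"unknown mode: {mode}")
    shared_rate = HAIKU_INPUT_PER_MTOK
    if mode == "cached":
        shared_rate *= 1 - CACHE_READ_DISCOUNT
    per_call = (
        shared_in * shared_rate
        + unique_in * HAIKU_INPUT_PER_MTOK
        + out * HAIKU_OUTPUT_PER_MTOK
    ) / 1_000_000
    total = calls * per_call
    if mode == "batch":
        total *= 1 - BATCH_DISCOUNT
    return total
```

For a hypothetical 100,000 calls with 2,000 shared input tokens, 200 unique input tokens, and 300 output tokens each, this gives roughly $370 plain, $190 with caching, and $185 through the Batch API.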

Frequently asked

Why am I getting 429s below my published limits?

Acceleration limits. Anthropic's docs warn that sharp usage increases can draw 429s even under your tier's numbers. Ramp gradually and keep traffic patterns consistent; vertical growth is the trigger.

Does exponential backoff fix this?

For spikes, yes. Jittered backoff absorbs bursts and the occasional ramp penalty cleanly. For 429s arriving every hour, no: that's chronic undersizing, and the fix is shaping traffic, caching repeated context, and moving bulk work to the Batch API.
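A minimal sketch of jittered backoff, assuming the "full jitter" variant (a uniform random delay up to an exponentially growing ceiling). The function name next_delay and the default values are hypothetical. If the 429 carries a retry-after header, the sketch treats it as a floor.

```python
import random


def next_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles each attempt, starting at `base` and capped at `cap`.
    The delay is drawn uniformly from [0, ceiling] so that many clients
    retrying at once do not wake in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    jittered = rng.uniform(0.0, ceiling)
    if retry_after is not None:
        # Never retry sooner than the server asked.
        return max(float(retry_after), jittered)
    return jittered
```

The SDKs have their own retry setting (max_retries in the Python client), so check what is already happening before layering a second retry loop on top.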

What's the cheapest way to cut token throughput?

Prompt caching plus routing. Caching discounts repeated input context by 90%, and sending simple calls to Claude Haiku 4.5 at $1/$5 per million tokens clears the heaviest traffic off your priciest model.

Changelog

June 12, 2026 — Published. Rate-limit behavior, the acceleration-limit caveat, and the response shape verified against Anthropic's API error docs.

Sources

Anthropic API errors — platform.claude.com/docs/en/api/errors (verified June 12, 2026)
benchr api-errors.json, the structured entry for this error