Bytes, not tokens
Token limits and byte limits fail at different doors. A context-window overflow happens inside the model's accounting, after your request has been accepted and read. A 413 happens at the curb: the raw size of the request body gets checked against a per-endpoint cap, and on the direct API that check belongs to Cloudflare, which rejects the request before it ever reaches Anthropic's servers. On the Messages API, the cap is 32 MB.
That split is why this error confuses careful people. A prompt sitting comfortably inside the context window still bounces, because tokens and megabytes measure different things. Plain text almost never gets you to 32 MB — attachments do. Binary content rides inside the JSON body as base64, which adds roughly a third to its size, so a stack of images plus a long history reaches the wall far sooner than the token count suggests.
The wall by endpoint
| Endpoint | Max request size |
|---|---|
| Messages API | 32 MB |
| Token Counting API | 32 MB |
| Batch API | 256 MB |
| Files API | 500 MB |
Note the first two rows match: the Token Counting API shares the Messages cap, so an oversized payload can't even be size-checked by sending it there. It bounces at the same wall, which means the counting has to happen on your side of the wire.
What you'll get back
{
"type": "error",
"error": {
"type": "request_too_large",
"message": "Request exceeds the maximum allowed number of bytes."
},
"request_id": "req_011CSHoEeqs5C35K2UUqR7Fy"
}
Same envelope as every Anthropic error: branch on the type field, and in SDK code catch the typed exception class for the status rather than string-matching the message. Responses carry a req_-prefixed request-id header that the SDKs expose; quote it if the failure turns into a support thread.
Shrink or relocate
The fix starts with a measurement the SDK won't do for you, since the client libraries send whatever you hand them. One function in your wrapper settles it:
# Python: measure the body before the edge does
import json
CAP_MB = 32 # Messages API ceiling, in bytes rather than tokens
def body_size_mb(payload: dict) -> float:
return len(json.dumps(payload).encode("utf-8")) / 1_048_576
size = body_size_mb(payload)
if size >= CAP_MB:
# usual culprit: base64 images inline in content blocks
reroute(payload) # assets to the Files API, bulk to Batch
When the number comes back big, the culprit is nearly always embedded media, and the relocation map follows the table above. Big assets belong in the Files API at its 500 MB cap, uploaded once and referenced from the message instead of pasted into it. Bulk jobs belong in the Batch API, which takes 256 MB per request and runs at a 50% discount on standard Claude pricing. Anthropic's docs also steer long-running work, especially anything past 10 minutes, toward streaming or the Batch API rather than one enormous synchronous call.
If your failure is token-shaped instead of byte-shaped, that's a different page: the request fit down the wire but overflowed the model's window. The context_length_exceeded breakdown covers the token-side playbook, and the context-window comparison shows which models give you room to stop trimming.