Reference · June 2026

Gemini DEADLINE_EXCEEDED: meaning, cause, and fix

The service did not finish the request before its deadline. Request size, client timeout, and response mode are the first things to check.

By the benchr team · Published June 12, 2026 · Verified against Google's Gemini API troubleshooting docs, June 12, 2026

Google Gemini · HTTP 504 · severity: low · timeout

Why the request exceeds its deadline

Start with input size and requested output length. A large prompt may take longer than the client timeout, while a long non-streamed generation can leave the connection waiting without intermediate output. A deadline error does not by itself establish a service-wide outage; compare the workload with the configured timeout and provider status.

What comes back

A representative 504 body, in Google's standard shape:

{
  "error": {
    "code": 504,
    "message": "The service is unable to finish processing within the deadline.",
    "status": "DEADLINE_EXCEEDED"
  }
}

Key on status being DEADLINE_EXCEEDED rather than on the prose, and don't lump it in with 429s: this is a per-request time failure, not a traffic ceiling.
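As a minimal sketch of keying on the status field, the function below parses an error body in the shape shown above and maps it to a coarse category. The function name and the category labels ("timeout", "rate_limit", "other", "unknown") are illustrative assumptions, not part of any Google SDK.

```python
import json


def classify_gemini_error(body: str) -> str:
    """Map an error body to a coarse category using error.status only.

    The category labels are hypothetical; pick names that suit your pipeline.
    """
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object at the top level.
        return "unknown"
    if not isinstance(error, dict):
        return "unknown"

    status = error.get("status")
    if status == "DEADLINE_EXCEEDED":
        return "timeout"      # per-request time failure
    if status == "RESOURCE_EXHAUSTED":
        return "rate_limit"   # rate or quota ceiling, handled separately
    return "other"
```

Branching on the status string keeps the handler stable if the human-readable message changes.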

Three fixes to test

Start with streaming, because it removes the wait instead of extending it. Move to a bigger timeout when you've decided a slow call is acceptable. Trim context when the input was bloated to begin with.

# Python (google-genai)
from google import genai

# Placeholder input; replace with your real prompt.
long_prompt = "..."

# Fix 2: a timeout you chose deliberately (milliseconds)
client = genai.Client(http_options={"timeout": 120000})

# Fix 1: stream, so nothing waits on the full answer
for chunk in client.models.generate_content_stream(
    model="gemini-3.5-flash",
    contents=long_prompt,
):
    # chunk.text may be None for a chunk that carries no text
    print(chunk.text or "", end="")

The third fix happens before the request leaves your machine: summarize stale history, send the relevant slice of a document instead of the whole thing, and keep an eye on how much context each request carries.
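One way to keep stale history out of a request is to retain only the most recent messages that fit a budget. The sketch below is an assumption-laden illustration: it measures size in characters as a rough stand-in for tokens, and the helper name trim_history and its parameters are hypothetical.

```python
def trim_history(messages: list[str], max_chars: int) -> list[str]:
    """Keep the newest messages whose combined length fits within max_chars.

    Character count is only a rough proxy for token count (an assumption);
    a real token count would be more accurate.
    """
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that overflows.
    for message in reversed(messages):
        if used + len(message) > max_chars:
            break
        kept.append(message)
        used += len(message)
    kept.reverse()  # restore chronological order
    return kept
```

Dropped messages could be replaced with a short summary instead of being discarded outright, depending on how much the task relies on earlier turns.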

When input size is the underlying cause

Oversized input does not always surface as a 504. Google's documentation notes that context too large for processing can return a 500 INTERNAL response, with the same initial checks: reduce the context, switch models, or retry a transient failure. If both codes appear in one pipeline, record input size and give each request a context budget. Chunk large documents where the task allows it, and compare published limits in the context-window comparison.

And if giant inputs are your daily reality rather than an edge case, price the workload against Gemini 3.5 Flash and its 1M-token window before you re-architect around the timeout.
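The chunking suggested above can be sketched as a fixed-size split, so that each request carries a bounded slice of the document. This is a simplified illustration under stated assumptions: chunk_text is a hypothetical helper, sizes are in characters, not tokens, and a real pipeline would likely split on paragraph or section boundaries.

```python
def chunk_text(text: str, chunk_chars: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_chars characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


# Record the size of each piece so oversized requests show up in your logs.
document = "abcdefghij"  # stand-in for a large document
for index, piece in enumerate(chunk_text(document, 4)):
    print(f"chunk {index}: {len(piece)} chars")
```

Logging the per-request size alongside any 504 or 500 response makes it easier to tell whether input size is the common factor.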

Frequently asked

Is a 504 a rate limit?

No. Rate and quota problems answer as 429 RESOURCE_EXHAUSTED. A 504 is about time, not volume: the service couldn't finish this one request before the deadline, no matter how little traffic you're sending.

Will retrying help?

Only after you've changed something. Trim the context or raise the client timeout first; resending the identical oversized request tends to time out the identical way.
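A retry that changes the request before resending might look like the sketch below. Everything here is illustrative: DeadlineExceeded is a hypothetical stand-in for whatever exception your client raises on a 504, send is any callable you supply, and keeping only the tail of the prompt is a crude placeholder for real context trimming.

```python
from typing import Callable


class DeadlineExceeded(Exception):
    """Hypothetical stand-in for the error your client raises on a 504."""


def call_with_shrinking_context(
    send: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
    shrink_factor: float = 0.5,
) -> str:
    """Call send(prompt); on a deadline error, shrink the prompt and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send(prompt)
        except DeadlineExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Change the request before resending: keep only the tail.
            keep = max(1, int(len(prompt) * shrink_factor))
            prompt = prompt[-keep:]
    raise AssertionError("unreachable: the loop returns or raises")
```

The point of the design is that no two attempts send the same payload, so a retry never repeats a request that has already timed out.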

Should every call stream?

Long outputs, yes: streaming is the standard way to avoid sitting on one long response until the clock kills it. Short calls finish well inside any sane timeout and don't need the extra plumbing.

Changelog

June 12, 2026 — Published. Status shape, the large-input cause, and the timeout guidance verified against Google's Gemini API troubleshooting page.

Sources

Gemini API troubleshooting — ai.google.dev/gemini-api/docs/troubleshooting (verified June 12, 2026)
benchr api-errors.json — structured entry for this error