Reference · June 2026

Gemini RESOURCE_EXHAUSTED: meaning, cause, and fix

This 429 means the project exhausted a request, token, or quota allowance. Read the quota details before deciding whether to wait, throttle, or change the tier.

By the benchr team · Published June 12, 2026 · Verified against Google's Gemini API troubleshooting docs, June 12, 2026

Google Gemini · HTTP 429 · severity: medium · rate limit

Identify the quota that was exceeded

Free-tier ceilings sit low on purpose. Google built that tier for kicking the tires, and a real interface in front of it will spend a minute's request budget in seconds. One user click that fans out into four model calls, a retry loop with no delay, a cron job that wakes up hungry: each looks innocent in code review and burns quota at runtime.
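To make the fan-out arithmetic concrete, here is a minimal sketch of how few user actions a per-minute request budget can absorb. The function name and the limit of 15 requests per minute are assumptions for illustration only; substitute the number published for your own tier and model.

```javascript
// Hypothetical helper: how many user actions fit in one minute's request budget.
// Both arguments are assumptions you supply; neither comes from the Gemini API.
function userActionsPerMinute(requestsPerMinuteLimit, callsPerAction) {
  if (callsPerAction <= 0) {
    throw new RangeError("callsPerAction must be positive");
  }
  // Each user action spends callsPerAction requests from the shared budget.
  return Math.floor(requestsPerMinuteLimit / callsPerAction);
}

// Assumed limit of 15 requests per minute, four model calls per click:
console.log(userActionsPerMinute(15, 4)); // 3
```

Under those assumed numbers, a fourth click inside the same minute is enough to trip the limit.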

Shared projects are the other classic cause. Quota is counted per project, not per app, so the demo a coworker left running eats from the same plate as production. And ceilings don't vanish when you pay; paid tiers publish higher numbers, and heavy parallel workloads can still find them.

The response

Google wraps every failure in the same envelope: a numeric code that matches the HTTP status, a human-readable message, and a gRPC status string. A representative 429 body looks like this:

{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

The message text shifts depending on which quota tripped, so branch on the status field rather than the prose.
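A minimal sketch of that check, assuming you have the raw response body as a string. The helper names parseGeminiError and isResourceExhausted are hypothetical and not part of any Google SDK; only the envelope fields come from the sample body above.

```javascript
// Hypothetical helper: extract the error envelope from a raw response body.
// Returns null when the body is not JSON or carries no "error" object.
function parseGeminiError(rawBody) {
  try {
    const parsed = JSON.parse(rawBody);
    return parsed && parsed.error ? parsed.error : null;
  } catch {
    return null; // body was not valid JSON
  }
}

// Branch on the status string, never on the human-readable message.
function isResourceExhausted(rawBody) {
  const error = parseGeminiError(rawBody);
  return error !== null && error.status === "RESOURCE_EXHAUSTED";
}
```

Because the check ignores the message field, it keeps working when the wording changes from one quota to another.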

Throttle at the source

Retrying is triage. Shaping traffic before it leaves your process is the cure, and you don't need a library for it:

// JavaScript: cap concurrency and space out request starts
function geminiLimiter(maxInFlight = 2, startGapMs = 4000) {
  const queue = [];
  let inFlight = 0;
  let lastStart = 0;

  function pump() {
    if (inFlight >= maxInFlight) return;
    if (queue.length === 0) return;
    const wait = lastStart + startGapMs - Date.now();
    if (wait > 0) { setTimeout(pump, wait); return; }
    lastStart = Date.now();
    inFlight += 1;
    const job = queue.shift();
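    // Promise.resolve().then(...) turns a synchronous throw from the thunk
    // into a rejection, so the slot is always released in finally()
    Promise.resolve().then(() => job.thunk()).then(job.resolve, job.reject)
      .finally(() => { inFlight -= 1; pump(); });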
    pump(); // another slot may be open
  }

  return (thunk) => new Promise((resolve, reject) => {
    queue.push({ thunk, resolve, reject });
    pump();
  });
}

// 4000ms between starts caps you near 15 calls a minute;
// tune both knobs to the published limits for your tier
const limited = geminiLimiter(2, 4000);
const reply = await limited(() => model.generateContent(prompt));

Every call goes in as a thunk; the gate decides when it runs. Two knobs, both tuned to your tier's published numbers: how many requests fly at once, and how far apart they launch.
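A limiter lowers the odds of a 429 but cannot rule one out, since other workloads in the same project draw on the same quota. One common complement, sketched here as an assumption rather than anything Google prescribes, is a retry with exponential backoff and jitter. The name retryWithBackoff and all of its options are hypothetical, and the caller supplies the isRetryable predicate, so nothing in the sketch assumes how a particular SDK shapes its errors.

```javascript
// Hypothetical helper: retry an async call with exponential backoff and full jitter.
// isRetryable(err) is supplied by the caller and decides which failures to retry.
async function retryWithBackoff(thunk, {
  isRetryable,
  maxAttempts = 4,
  baseDelayMs = 1000,
  maxDelayMs = 30000,
  // sleep is injectable so tests can run without real delays
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await thunk();
    } catch (err) {
      const retryable = typeof isRetryable === "function" && isRetryable(err);
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!retryable || attempt >= maxAttempts) throw err;
      // The delay ceiling doubles each attempt, capped at maxDelayMs.
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      // Full jitter: wait a random time between 0 and the ceiling.
      await sleep(Math.random() * ceiling);
    }
  }
}
```

Wrapping the limited call, as in retryWithBackoff(() => limited(() => model.generateContent(prompt)), { isRetryable }), sends every retry back through the gate instead of around it. What isRetryable should inspect depends on how your client surfaces the 429, so check that against the errors you actually receive.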

Free-tier and paid-tier limits

If the limiter is doing its job and your app is still starving, stop tuning and choose a path.

Request more quota when your usage pattern is sound and the ceiling is the only problem. Move to a paid tier when this is production traffic. Or reroute the bulk work when it belongs on a different model entirely.

Paid Gemini 3.5 Flash runs $1.50 per million input tokens and $9.00 per million output tokens, with a 1M-token context window, which prices most chat workloads in single-digit dollars a day. The calculator turns your traffic into an exact monthly figure, and the rankings help test whether another model should take the volume.
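As a rough sketch of that arithmetic, the function below multiplies daily token volume by the two prices quoted above. The function name and the example workload are hypothetical; measure your own request counts and token sizes before relying on the result.

```javascript
// Prices quoted in this article for paid Gemini 3.5 Flash, in USD per million tokens.
const INPUT_USD_PER_MILLION = 1.5;
const OUTPUT_USD_PER_MILLION = 9.0;

// Hypothetical helper: estimated daily spend for a uniform workload.
function dailyCostUsd(requestsPerDay, inputTokensPerRequest, outputTokensPerRequest) {
  const inputTokens = requestsPerDay * inputTokensPerRequest;
  const outputTokens = requestsPerDay * outputTokensPerRequest;
  return (inputTokens / 1e6) * INPUT_USD_PER_MILLION
    + (outputTokens / 1e6) * OUTPUT_USD_PER_MILLION;
}

// Assumed workload: 2,000 requests a day, 800 tokens in and 300 tokens out each.
console.log(dailyCostUsd(2000, 800, 300).toFixed(2)); // "7.80"
```

In this assumed workload the output tokens account for $5.40 of the $7.80, so trimming response length moves the bill more than trimming prompts.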

Frequently asked

Why am I rate-limited at what feels like tiny volume?

Free-tier request budgets are small by design. An interface that fires several model calls per user action can cross the per-minute line with a handful of users, which is the tier working as intended, not a bug.

Does paying make the limits go away?

No. Billing raises the ceilings; it doesn't remove them. Paid tiers publish their own per-minute numbers, and capacity planning against those numbers is still your job.

Should each app get its own Google project?

Yes. Quota is enforced per project, so separate projects keep one app's spike from starving the others, and they make it obvious which workload spends what.

Changelog

June 12, 2026 — Published. Error envelope, free-tier cause, and quota-increase path verified against Google's Gemini API troubleshooting guide.

Sources

Gemini API troubleshooting: ai.google.dev/gemini-api/docs/troubleshooting (verified June 12, 2026)
benchr api-errors.json (structured entry for this error)