Gemini RESOURCE_EXHAUSTED: meaning, cause, and fix

The free tier's signature error — and the surest sign you've outgrown it. Test-drive quotas run out the moment real users show up.

By the benchr team · Verified against Google's Gemini API troubleshooting docs, June 12, 2026

Google Gemini · HTTP 429 · severity: medium · rate limit

What tripped

Free-tier ceilings sit low on purpose. Google built that tier for kicking the tires, and a real interface in front of it will spend a minute's request budget in seconds. One user click that fans out into four model calls, a retry loop with no delay, a cron job that wakes up hungry: each looks innocent in code review and burns quota at runtime.

Shared projects are the other classic cause. Quota is counted per project, not per app, so the demo a coworker left running eats from the same plate as production. And ceilings don't vanish when you pay; paid tiers publish higher numbers, and heavy parallel workloads can still hit them.
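
Of these, the retry loop with no delay is the easiest to defuse. A minimal sketch, assuming a hypothetical withBackoff helper and an isRetryable predicate that you supply (neither comes from Google's SDK): wait longer after each failure, add jitter so parallel callers don't retry in lockstep, and give up after a few attempts.

```javascript
// Hypothetical helper, not part of any Google SDK.
// Retries fn() with exponential backoff plus jitter, then gives up.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withBackoff(fn, { retries = 4, baseMs = 1000, isRetryable = () => true } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      // Out of retries, or an error that retrying won't fix: surface it
      if (attempt >= retries || !isRetryable(err)) throw err;
      // 1x, 2x, 4x, ... the base delay, plus up to one extra base of random jitter
      const delayMs = baseMs * 2 ** attempt + Math.random() * baseMs;
      await sleep(delayMs);
    }
  }
}
```

Wrap the call you would otherwise retry in a tight loop, for example withBackoff(() => model.generateContent(prompt)). The defaults here are illustrative, not recommended values.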

The response

Google wraps every failure in the same envelope: a numeric code that matches the HTTP status, a human-readable message, and a gRPC status string. A representative 429 body looks like this:

{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

The message text shifts depending on which quota tripped, so branch on the status field rather than the prose.
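
A minimal sketch of that branching, assuming you hold the parsed JSON body in the shape shown above. The isQuotaError name is ours, not an SDK function, and an SDK may wrap this body in its own error type.

```javascript
// Hypothetical helper: true when a parsed error body reports quota exhaustion.
// Assumes the envelope shape shown above; SDK error objects may differ.
function isQuotaError(body) {
  // Match on the status string, never on the message text
  return body?.error?.status === "RESOURCE_EXHAUSTED";
}

const sample = {
  error: {
    code: 429,
    message: "Resource has been exhausted (e.g. check quota).",
    status: "RESOURCE_EXHAUSTED",
  },
};
console.log(isQuotaError(sample)); // true
```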

Throttle at the source

Retrying is triage. Shaping traffic before it leaves your process is the cure, and you don't need a library for it:

// JavaScript: cap concurrency and space out request starts
function geminiLimiter(maxInFlight = 2, startGapMs = 4000) {
  const queue = [];
  let inFlight = 0;
  let lastStart = 0;

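  let pendingTimer = null; // at most one scheduled wake-up at a time

  function pump() {
    if (inFlight >= maxInFlight) return;
    if (queue.length === 0) return;
    const wait = lastStart + startGapMs - Date.now();
    if (wait > 0) {
      // too soon since the last start: schedule one wake-up, not one per caller
      if (pendingTimer === null) {
        pendingTimer = setTimeout(() => { pendingTimer = null; pump(); }, wait);
      }
      return;
    }
    lastStart = Date.now();
    inFlight += 1;
    const job = queue.shift();
    // Promise.resolve().then() turns a synchronous throw in the thunk
    // into a rejection, so the slot is always released
    Promise.resolve().then(job.thunk).then(job.resolve, job.reject)
      .finally(() => { inFlight -= 1; pump(); });
    pump(); // another slot may be open
  }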

  return (thunk) => new Promise((resolve, reject) => {
    queue.push({ thunk, resolve, reject });
    pump();
  });
}

// 4000ms between starts caps you near 15 calls a minute;
// tune both knobs to the published limits for your tier
const limited = geminiLimiter(2, 4000);
const reply = await limited(() => model.generateContent(prompt));

Every call goes in as a thunk; the gate decides when it runs. Two knobs, both tuned to your tier's published numbers: how many requests fly at once, and how far apart they launch.
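
The spacing knob falls out of the per-minute limit by division. A small sketch, with startGapForRpm as a hypothetical helper and 15 requests per minute as an illustrative figure only; use the number published for your tier.

```javascript
// Hypothetical helper: gap between request starts for a given per-minute limit.
function startGapForRpm(requestsPerMinute) {
  if (!(requestsPerMinute > 0)) {
    throw new RangeError("requestsPerMinute must be positive");
  }
  // Round up so the spacing errs on the slow side
  return Math.ceil(60000 / requestsPerMinute);
}

console.log(startGapForRpm(15)); // 4000
```

Pass the result as startGapMs. The concurrency knob is a separate choice, since spacing alone does not bound how many slow responses are open at once.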

Free tier or real tier

If the limiter is doing its job and your app is still starving, stop tuning and decide. Three doors. A quota increase request, when your usage pattern is sound and the ceiling is the only problem. Billing, when this is production; free tiers exist for testing, and traffic that matters deserves a tier with a contract behind it. Or rerouting, when the bulk work belongs on a different model entirely. Paid Gemini 3.5 Flash runs $1.50 in and $9.00 out per million tokens with a 1M-token context, which prices most chat workloads in single-digit dollars a day. The calculator turns your traffic into an exact monthly figure, and the rankings settle whether another model earns the volume.
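
To see where a single-digit daily figure comes from, here is the arithmetic at the rates quoted above. The traffic numbers (1,000 chats a day, 2,000 input and 500 output tokens each) are assumptions for illustration, and dailyCostUsd is a hypothetical helper.

```javascript
// Rates quoted above for paid Gemini 3.5 Flash, in USD per million tokens
const INPUT_USD_PER_M = 1.5;
const OUTPUT_USD_PER_M = 9.0;

// Hypothetical helper: daily spend for a given request volume and token sizes
function dailyCostUsd(requestsPerDay, inputTokens, outputTokens) {
  const inputCost = (requestsPerDay * inputTokens / 1e6) * INPUT_USD_PER_M;
  const outputCost = (requestsPerDay * outputTokens / 1e6) * OUTPUT_USD_PER_M;
  return inputCost + outputCost;
}

// Assumed traffic: 1,000 chats/day, 2,000 tokens in and 500 out per chat
console.log(dailyCostUsd(1000, 2000, 500).toFixed(2)); // 7.50
```

Output tokens dominate at these rates: the 500 output tokens per chat cost more than the 2,000 input tokens, so response length is the first thing to measure.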

Frequently asked

Why am I rate-limited at what feels like tiny volume?

Free-tier request budgets are small by design. An interface that fires several model calls per user action can cross the per-minute line with a handful of users, which is the tier working as intended, not a bug.

Does paying make the limits go away?

No. Billing raises the ceilings; it doesn't remove them. Paid tiers publish their own per-minute numbers, and capacity planning against those numbers is still your job.

Should each app get its own Google project?

Yes. Quota is enforced per project, so separate projects keep one app's spike from starving the others, and they make it obvious which workload spends what.

Changelog

  • Published. Error envelope, free-tier cause, and quota-increase path verified against Google's Gemini API troubleshooting guide.

Sources

  • Gemini API troubleshooting: ai.google.dev/gemini-api/docs/troubleshooting (verified June 12, 2026)
  • benchr api-errors.json (structured entry for this error)