Why it happens
OpenAI bills by prepaid credit and optional budget caps. The moment usable balance hits zero (or your cap does), the API stops serving you. Four situations produce almost every case: the account simply ran out of credits mid-month; a budget cap you set months ago finally bound; free-trial credits expired (they have a shelf life); or your key belongs to a project whose quota is exhausted while a sibling project still has money. That last one bites teams that organize work into multiple projects and assume billing is shared. It isn't — quota follows the project, and so does the failure.
The error you'll see
{
  "error": {
    "message": "You exceeded your current quota, please check your plan and billing details.",
    "type": "insufficient_quota",
    "code": "insufficient_quota"
  }
}
HTTP status: 429. That status is the trap — your retry middleware sees 429 and politely backs off, then fails again, forever. The body, not the status, tells you which 429 you have.
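Reading the body rather than the status can be sketched in a few lines of Python. This is a minimal illustration, not an official client: `classify_429` and its return labels are hypothetical names, and the only value taken from the response above is `insufficient_quota`. Anything else on a 429 is assumed to be an ordinary rate limit.

```python
import json


def classify_429(status: int, body_text: str) -> str:
    """Return 'quota', 'rate_limit', or 'other' for a raw API response.

    Hypothetical helper: the labels are illustrative, not part of any SDK.
    """
    if status != 429:
        return "other"
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # Unparseable body: assume a transient rate limit.
        return "rate_limit"
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        return "rate_limit"
    # The quota failure shown above carries the marker in both fields.
    if "insufficient_quota" in (err.get("code"), err.get("type")):
        return "quota"
    return "rate_limit"
```

A handler would alert on `"quota"` and back off only on `"rate_limit"`.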
The code guard that saves the night
Split the two 429s at the handler level. Quota failures go to an alert; rate limits go to backoff:
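# Python: route the two 429s differently.
# alert_oncall() and sleep_with_backoff() are your own helpers; msgs is your message list.
from openai import OpenAI, RateLimitError

client = OpenAI()

try:
    r = client.chat.completions.create(model="gpt-5", messages=msgs)
except RateLimitError as e:
    # Prefer the structured error code; fall back to the message text.
    if getattr(e, "code", None) == "insufficient_quota" or "insufficient_quota" in str(e):
        alert_oncall("OpenAI billing exhausted: requests halted")
        raise  # retrying is pointless
    sleep_with_backoff()  # a real rate limit: this one heals itself, so retry after the wait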
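For a fuller picture, the same split can be wrapped in a retry loop. The sketch below is library-agnostic and makes several assumptions: `call_with_guard`, `is_quota_error`, and the `on_quota` callback are hypothetical names, and the exception is assumed to expose the API error code as a `.code` attribute or at least mention it in its message. In real code you would likely catch `RateLimitError` rather than bare `Exception`.

```python
import random
import time


def is_quota_error(exc: Exception) -> bool:
    # Assumption: the error code is available as `.code` or appears in the message.
    return (
        getattr(exc, "code", None) == "insufficient_quota"
        or "insufficient_quota" in str(exc)
    )


def call_with_guard(call, on_quota, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); back off on failures, but alert and re-raise on quota exhaustion."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # narrow this to your client's rate-limit error
            if is_quota_error(exc):
                on_quota(exc)  # page someone; waiting will not refill the balance
                raise
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff with a little jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Passing `sleep` as a parameter keeps the loop easy to test, and the quota branch never reaches the sleep at all, which is the point: only the failure that heals itself gets retried.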
Prevention
Set a usage alert below your cap, not at it. You want the email before production feels anything. Give each project its own budget so one runaway batch job can't starve the rest. And if quota keeps evaporating faster than planned, the bill itself is the bug: route bulk traffic to a cheaper tier instead of feeding everything to your most expensive model.
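Both prevention ideas reduce to small decisions you can encode. The sketch below is illustrative only: the 80% default, the function names, and the `premium-model` / `cheap-model` placeholders are assumptions, not values from any billing dashboard or model catalog.

```python
def alert_threshold(cap_usd: float, fraction: float = 0.8) -> float:
    """Spend level at which to alert, set below the cap (fraction is an assumed default)."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    return cap_usd * fraction


def should_alert(spend_usd: float, cap_usd: float, fraction: float = 0.8) -> bool:
    """True once spend reaches the alert threshold, before the cap itself binds."""
    return spend_usd >= alert_threshold(cap_usd, fraction)


def pick_model(is_bulk: bool, premium: str = "premium-model", cheap: str = "cheap-model") -> str:
    """Send bulk traffic to the cheaper tier; model names here are placeholders."""
    return cheap if is_bulk else premium
```

With a 100-dollar cap and the default fraction, the alert fires at 85 dollars of spend but not at 50, leaving headroom to react before requests start failing.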