OpenAI API 429 error: rate limit vs. quota (and how to fix each)
A 429 from the OpenAI API is one of the most commonly misdiagnosed errors on the platform, because the same status code covers two completely different problems with opposite fixes. Before you add a retry loop, figure out which one you have: retrying the wrong kind wastes time and compute.
The 30-second answer
- If the error code is rate_limit_exceeded: you sent requests or tokens faster than your tier allows. Fix = exponential backoff + slow down + (if chronic) raise your usage tier.
- If the error code is insufficient_quota: you're out of credits or hit a billing cap. Fix = add a payment method / credits / raise your usage limit. Backoff will NOT help; it will fail forever.
- How to tell them apart: read error.type / error.code in the JSON body. Don't guess from the 429 alone.
Step 1 — read the error body, not just the status code
Every OpenAI error returns JSON with an error object. The type/code inside it tells you which 429 you've got:
```
{
  "error": {
    "message": "Rate limit reached for ...",
    "type": "requests",            // or "tokens"
    "code": "rate_limit_exceeded"  // vs "insufficient_quota"
  }
}
```
That one field decides everything below.
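As a minimal sketch, assuming you already have the parsed JSON body as a Python dict, the check takes only a few lines. The function name classify_429 is hypothetical. For the quota case it checks both code and type, which is a defensive assumption rather than a documented guarantee.

```python
from typing import Any, Mapping


def classify_429(payload: Mapping[str, Any]) -> str:
    """Return "rate_limit", "quota", or "unknown" for a 429 response body."""
    # Accept either the full body ({"error": {...}}) or just the inner error object
    error = payload.get("error", payload)
    code = error.get("code")
    err_type = error.get("type")
    if code == "insufficient_quota" or err_type == "insufficient_quota":
        return "quota"  # billing problem: retrying will not help
    if code == "rate_limit_exceeded":
        return "rate_limit"  # too fast: back off and retry
    return "unknown"
```

Anything that comes back "unknown" deserves a human look before you write retry logic for it.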
Case A: rate_limit_exceeded (you're going too fast)
Your account has per-minute limits on requests (RPM) and tokens (TPM), set by your usage tier. You tripped one of them. Note that the token limit counts the max_tokens you request, not just what you use — so an oversized max_tokens can trigger a TPM 429 even on a short reply.
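Some rough arithmetic shows why max_tokens matters. This sketch makes a simplifying assumption: each request draws its prompt tokens plus the full max_tokens against your TPM budget. OpenAI's exact accounting may differ. The 30,000 TPM limit and the 500-token prompt are hypothetical numbers, and both function names are illustrative.

```python
def estimated_token_draw(prompt_tokens: int, max_tokens: int) -> int:
    # Approximation: the limiter budgets for the prompt plus the longest
    # reply you asked for, not the reply you actually get
    return prompt_tokens + max_tokens


def requests_per_minute_under_tpm(tpm_limit: int, prompt_tokens: int, max_tokens: int) -> int:
    # How many requests of this shape fit into one minute's token budget
    return tpm_limit // estimated_token_draw(prompt_tokens, max_tokens)


# Hypothetical: 30,000 TPM limit, 500-token prompts
print(requests_per_minute_under_tpm(30_000, 500, 4096))  # 6
print(requests_per_minute_under_tpm(30_000, 500, 300))   # 37
```

The prompts and the limit stay the same in both calls. Shrinking max_tokens from 4096 to 300 lets roughly six times as many requests through per minute under this approximation.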
Fix it:
- Exponential backoff + jitter. OpenAI explicitly recommends this: on a 429, sleep briefly, retry, and grow the delay each time (1s, 2s, 4s…) with a little randomness so concurrent clients don't retry in lockstep.
- Respect the rate-limit headers. Responses include headers such as x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens, and x-ratelimit-reset-requests / x-ratelimit-reset-tokens. Use the reset values to wait exactly long enough instead of guessing.
- Lower max_tokens to what you actually need, to stop overstating your token draw.
- Spread the load: queue and pace requests rather than bursting.
- Raise your usage tier. Tiers unlock higher RPM/TPM as your account's cumulative usage grows; you can see your current limits and tier in the Limits section of account settings.
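```python
import random
import time

from openai import OpenAI, RateLimitError

# max_retries=0 disables the SDK's built-in retries (which also fire on 429s),
# so this loop alone decides how many attempts are made
client = OpenAI(max_retries=0)


def create_with_retry(max_attempts=5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as e:
            # Quota errors are 429s too, but backoff can't fix them: fail fast
            if getattr(e, "code", None) == "insufficient_quota":
                raise
            if attempt < max_attempts - 1:
                # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter
                time.sleep((2 ** attempt) + random.uniform(0, 1))
                continue
            raise
```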
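The reset headers from the "Respect the rate-limit headers" tip are easier to act on once they are converted to seconds. This sketch assumes the values are duration strings such as 1s, 6m0s, or 20ms, which is the format OpenAI's rate-limit docs showed at the time of writing. The names reset_to_seconds and wait_seconds are hypothetical helpers.

```python
import re
from typing import Mapping

# Seconds per unit in duration strings such as "1s", "6m0s", or "20ms"
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed before "m" so "20ms" is not read as 20 minutes
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def reset_to_seconds(value: str) -> float:
    """Convert an x-ratelimit-reset-* header value to seconds."""
    value = value.strip()
    parts = _PART.findall(value)
    # Reject strings the pattern did not fully account for
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unrecognized reset value: {value!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in parts)


def wait_seconds(headers: Mapping[str, str], kind: str = "tokens", default: float = 1.0) -> float:
    """Seconds until the "requests" or "tokens" bucket resets, or a fallback."""
    raw = headers.get(f"x-ratelimit-reset-{kind}")
    return reset_to_seconds(raw) if raw else default
```

In the Python SDK, raw headers are available through client.chat.completions.with_raw_response.create(...). The result has a .headers mapping and a .parse() method that returns the usual completion object. Check that this matches your SDK version before relying on it.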
Case B: insufficient_quota (you're out of credits / hit a cap)
This 429 has nothing to do with speed. It means the account has no available credit balance, no valid payment method, or has hit a usage limit you (or your org admin) set. Retrying does nothing — it will return 429 on every attempt until billing is sorted.
Fix it: add or update a payment method, top up your credit balance, and check the usage-limit (budget) settings in your account — a soft/hard monthly cap that's been reached produces exactly this error. If you manage a team, confirm the org-level limit hasn't been exhausted by another project.
How to stop hitting 429s in the first place
- Keep a client-side limiter under your RPM/TPM ceilings rather than discovering them via errors.
- Trim max_tokens to realistic values so you don't reserve token budget you won't use.
- Move large, non-urgent jobs to the Batch API so they don't compete with live traffic.
- Set a billing alert so insufficient_quota never surprises you mid-run. (See our guide on setting an OpenAI API spend limit.)
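A client-side limiter can be as simple as a sliding 60-second window over recent requests. This sketch is not OpenAI's own algorithm. MinuteLimiter is a hypothetical class, and the token cost of each request is whatever estimate you pass in. OpenAI may also enforce per-minute limits over shorter intervals, so it is safer to configure the limiter a little below your real ceilings.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteLimiter:
    """Pace requests under RPM and TPM ceilings using a sliding 60-second window."""

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rpm < 1 or tpm < 1:
            raise ValueError("rpm and tpm must be positive")
        self.rpm, self.tpm = rpm, tpm
        self.clock, self.sleep = clock, sleep
        # (timestamp, tokens) for each request sent in the current window
        self.sent: Deque[Tuple[float, int]] = deque()

    def _prune(self, now: float) -> None:
        # Forget requests that have aged out of the 60-second window
        while self.sent and now - self.sent[0][0] >= 60:
            self.sent.popleft()

    def acquire(self, tokens: int) -> None:
        """Block until a request costing `tokens` fits under both budgets."""
        if tokens > self.tpm:
            raise ValueError("a single request exceeds the TPM limit")
        while True:
            now = self.clock()
            self._prune(now)
            used = sum(t for _, t in self.sent)
            if len(self.sent) < self.rpm and used + tokens <= self.tpm:
                self.sent.append((now, tokens))
                return
            # Sleep until the oldest request leaves the window, then re-check
            self.sleep(60 - (now - self.sent[0][0]))
```

Call limiter.acquire(estimated_tokens) before each API request. It sleeps until the request fits under both budgets. The clock and sleep parameters are there so the limiter can be tested with a fake clock. The class is not thread-safe, so guard it with a lock if several threads share one instance.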
FAQ
Is a 429 the same as being "overloaded"? No — that's a server-side condition on some APIs (e.g. Anthropic's 529 overloaded_error). A 429 is about your account's limit or quota, not the provider being busy.
How many retries? A handful (4–5) with exponential backoff is typical for rate_limit_exceeded. For insufficient_quota, zero — fix billing instead.
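To put "a handful" in wall-clock terms, here is the baseline delay schedule, before jitter, for the doubling backoff used in the retry helper earlier. backoff_delays is a hypothetical helper.

```python
from typing import List


def backoff_delays(max_attempts: int, base: float = 1.0) -> List[float]:
    # One sleep between consecutive attempts: base, 2*base, 4*base, ...
    return [base * 2 ** i for i in range(max_attempts - 1)]


delays = backoff_delays(5)
print(delays)       # [1.0, 2.0, 4.0, 8.0]
print(sum(delays))  # 15.0
```

Five attempts means four sleeps. That is about 15 seconds of base delay, plus up to 4 more seconds from the random.uniform(0, 1) jitter term in that helper.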
Related
- Claude API 529 overloaded_error — causes & fix
- How to set a spend limit on the OpenAI API
- API vs subscription — when the API is actually cheaper
Last updated May 27, 2026. Behavior verified against OpenAI's rate-limit and error-code documentation. Providers change limits and headers over time — confirm in the current docs before relying on specifics in production.