Why does the OpenAI API return a 429 error?

A 429 has two distinct causes. rate_limit_exceeded means you sent requests or tokens faster than your tier allows — retry with exponential backoff. insufficient_quota means you're out of credits or hit a billing cap — backoff won't help; you need to add credits or raise your budget.

How do I fix a 429 rate_limit_exceeded error?

Retry with exponential backoff and jitter, respect the x-ratelimit-reset headers, lower your max_tokens (token-per-minute limits count requested max_tokens), spread load over time, and raise your usage tier if you consistently need more throughput.

Does exponential backoff fix an insufficient_quota 429?

No. insufficient_quota is a billing condition, not a speed condition. Retrying will keep failing until you add a payment method, add credits, or increase your usage limit in account settings.

OpenAI API `429` error: rate limit vs. quota (and how to fix each)

A 429 from the OpenAI API is easy to misdiagnose, because the same status code covers two different problems with opposite fixes. Before you add a retry loop, figure out which one you have: retrying the wrong kind wastes time and money.

The 30-second answer

If error.code is rate_limit_exceeded: you sent requests or tokens faster than your tier allows. Fix = exponential backoff + slow down + (if chronic) raise your usage tier.
If error.code is insufficient_quota: you're out of credits or hit a billing cap. Fix = add a payment method / credits / raise your usage limit. Backoff will NOT help; every retry fails until billing is fixed.
How to tell them apart: read error.code (and error.type) in the JSON body. Don't guess from the 429 alone.

Step 1 — read the error body, not just the status code

Every OpenAI error response carries a JSON body with an error object. Its code field tells you which 429 you've got, and its type field adds detail:

{
  "error": {
    "message": "Rate limit reached for ...",
    "type": "requests",            // or "tokens"
    "code": "rate_limit_exceeded"  // vs "insufficient_quota"
  }
}

The code field decides everything below.
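A minimal sketch of that check, assuming the response body has already been parsed into a dict shaped like the example above. The function name `classify_429` and its return labels are illustrative, not part of any SDK.

```python
def classify_429(body: dict) -> str:
    """Return 'retry', 'fix_billing', or 'unknown' for a parsed 429 error body."""
    error = body.get("error")
    if not isinstance(error, dict):
        return "unknown"      # body is not shaped like the example above
    code = error.get("code")
    if code == "rate_limit_exceeded":
        return "retry"        # speed problem: back off and try again
    if code == "insufficient_quota":
        return "fix_billing"  # billing problem: retrying will not help
    return "unknown"          # unrecognized code: log it and read the message


sample = {"error": {"message": "Rate limit reached for ...",
                    "type": "requests",
                    "code": "rate_limit_exceeded"}}
print(classify_429(sample))  # retry
```

Returning a label rather than raising keeps the decision in one place, so retry logic and alerting can both branch on the same result.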

Case A: `rate_limit_exceeded` (you're going too fast)

Your account has per-minute limits on requests (RPM) and tokens (TPM), set by your usage tier. You tripped one of them. Note that the token limit counts the max_tokens you request, not just what you use — so an oversized max_tokens can trigger a TPM 429 even on a short reply.
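To see why an oversized max_tokens hurts, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each request reserves its prompt tokens plus max_tokens against the TPM limit; the provider's exact accounting may differ, and the 30,000 TPM limit and 500-token prompt are hypothetical numbers.

```python
def requests_per_minute(tpm_limit: int, prompt_tokens: int, max_tokens: int) -> int:
    """Requests that fit in one minute if each reserves prompt + max_tokens (assumed)."""
    reserved = prompt_tokens + max_tokens  # assumption: reservation, not actual usage
    if reserved <= 0:
        raise ValueError("prompt_tokens + max_tokens must be positive")
    return tpm_limit // reserved


# Hypothetical numbers: a 30,000 TPM limit and 500-token prompts.
print(requests_per_minute(30_000, 500, 4_000))  # 30000 // 4500 = 6
print(requests_per_minute(30_000, 500, 500))    # 30000 // 1000 = 30
```

Under that assumption, trimming max_tokens from 4,000 to 500 raises the ceiling from 6 to 30 requests per minute without touching the tier.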

Fix it:

Exponential backoff + jitter. OpenAI explicitly recommends this: on a 429, sleep briefly, retry, and grow the delay each time (1s, 2s, 4s…) with a little randomness so concurrent clients don't retry in lockstep.
Respect the rate-limit headers. Responses include headers such as x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens, and x-ratelimit-reset-requests / -reset-tokens. Use the reset values to wait exactly long enough instead of guessing.
Lower max_tokens to what you actually need, to stop overstating your token draw.
Spread the load — queue and pace requests rather than bursting.
Raise your usage tier. Tiers unlock higher RPM/TPM as your account's cumulative usage grows; you can see your current limits and tier in the Limits section of account settings.

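import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def create_with_retry(max_attempts=5, **kwargs):
    """Call chat.completions.create, retrying rate-limit 429s with backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as e:
            # Quota errors are a billing problem; backoff cannot fix them.
            if getattr(e, "code", None) == "insufficient_quota":
                raise
            # Out of attempts: surface the last rate-limit error.
            if attempt == max_attempts - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus up to 1s of jitter, then retry.
            time.sleep((2 ** attempt) + random.uniform(0, 1))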
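
The retry loop above sleeps on a fixed schedule. To honor the reset headers instead, the header value first has to be turned into seconds. The sketch below assumes the reset values are duration strings such as `1s`, `6m0s`, or `20ms`; that format and the helper name `parse_reset` are assumptions to confirm against the responses you actually receive.

```python
import re

_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_PART = r"(\d+(?:\.\d+)?)(ms|h|m|s)"  # "ms" is listed first so it wins over "m"


def parse_reset(value: str):
    """Convert a duration string like '6m0s' to seconds, or None if unparseable."""
    value = value.strip()
    if not re.fullmatch(f"(?:{_PART})+", value):
        return None  # unknown format: fall back to plain exponential backoff
    return sum(float(num) * _UNITS[unit] for num, unit in re.findall(_PART, value))


print(parse_reset("6m0s"))  # 360.0
print(parse_reset("soon"))  # None
```

In recent versions of the openai Python SDK the exception appears to expose the HTTP response, so the header may be readable as `e.response.headers.get("x-ratelimit-reset-requests")`; treat that access path as an assumption and check it against your installed version.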

Case B: `insufficient_quota` (you're out of credits / hit a cap)

This 429 has nothing to do with speed. It means the account has no available credit balance, no valid payment method, or has hit a usage limit you (or your org admin) set. Retrying does nothing — it will return 429 on every attempt until billing is sorted.

Fix it: add or update a payment method, top up your credit balance, and check the usage-limit (budget) settings in your account — a soft/hard monthly cap that's been reached produces exactly this error. If you manage a team, confirm the org-level limit hasn't been exhausted by another project.

How to stop hitting 429s in the first place

Keep a client-side limiter under your RPM/TPM ceilings rather than discovering them via errors.
Trim max_tokens to realistic values so you don't reserve token budget you won't use.
Move large, non-urgent jobs to the Batch API so they don't compete with live traffic.
Set a billing alert so insufficient_quota never surprises you mid-run. (See our guide on setting an OpenAI API spend limit.)
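
The first item in the list above, a client-side limiter, can be sketched as a sliding 60-second window over both request count and reserved tokens. The class name `MinuteLimiter`, the ceilings in the usage lines, and the choice of a sliding window are all illustrative assumptions; this is not how the provider enforces its limits.

```python
import time
from collections import deque


class MinuteLimiter:
    """Sliding 60-second window over request count and reserved tokens."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm, self.tpm, self.clock = rpm, tpm, clock
        self.events = deque()  # (timestamp, tokens) for each admitted request

    def _expire(self, now: float) -> None:
        # Drop entries that have aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Admit the request if it fits under both ceilings, else return False."""
        now = self.clock()
        self._expire(now)
        used = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


limiter = MinuteLimiter(rpm=3, tpm=1_000)  # hypothetical ceilings
print(limiter.try_acquire(tokens=400))     # True: first request fits
print(limiter.try_acquire(tokens=700))     # False: 400 + 700 exceeds 1,000 TPM
```

Setting the ceilings somewhat below your real RPM/TPM leaves headroom for other clients sharing the same account. The injectable `clock` is there so the limiter can be tested without sleeping.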

FAQ

Is a 429 the same as being "overloaded"? No — that's a server-side condition on some APIs (e.g. Anthropic's 529 overloaded_error). A 429 is about your account's limit or quota, not the provider being busy.

How many retries? A handful (4–5) with exponential backoff is typical for rate_limit_exceeded. For insufficient_quota, zero — fix billing instead.
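
To judge whether 4–5 attempts fits your latency budget, it helps to total the worst-case sleep time. The sketch below assumes the same schedule as the retry helper earlier in this guide (2**attempt seconds plus up to one second of jitter, with no sleep after the final attempt); `worst_case_wait` is an illustrative name.

```python
def worst_case_wait(max_attempts: int, max_jitter: float = 1.0) -> float:
    """Upper bound on total sleep for delays of 2**attempt + jitter between attempts."""
    sleeps = max(max_attempts - 1, 0)  # no sleep after the final attempt
    return float(sum(2 ** attempt + max_jitter for attempt in range(sleeps)))


# 5 attempts: 1 + 2 + 4 + 8 = 15 s of base delay, plus up to 4 s of jitter.
print(worst_case_wait(5))  # 19.0
```

The bound covers sleeping only; the time each request itself takes comes on top of it.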

Last updated May 27, 2026. Behavior verified against OpenAI's rate-limit and error-code documentation. Providers change limits and headers over time — confirm in the current docs before relying on specifics in production.
