OpenAI API 500 error: what it means, how to retry, and when to escalate
HTTP 500 from the OpenAI API means something failed on OpenAI's side processing your request. It's not an authentication failure, it's not a rate limit, and in the vast majority of cases it's not caused by anything in your request. This page explains exactly what a 500 is, how it differs from the other 5xx codes you might see, the correct retry pattern, and the one narrow case where your request can actually trigger a 500.
The 30-second answer
- What it means: server-side internal error on OpenAI's infrastructure. Almost never your request's fault.
- What to do: retry with exponential backoff. The OpenAI SDKs already do this automatically by default (2 retries).
- When to stop retrying: if you hit 4–5 consecutive 500s, check status.openai.com before debugging your code.
- Logging: always capture the x-request-id response header; it's required for any support ticket.
What the error looks like
The raw response body for a 500:
    HTTP/1.1 500 Internal Server Error

    {
      "error": {
        "message": "The server had an error while processing your request. Sorry about that!",
        "type": "server_error",
        "param": null,
        "code": null
      }
    }
The message text varies, but the HTTP status is always 500 and the type is always server_error. In the Python SDK this raises an openai.InternalServerError. In the Node SDK it's an InternalServerError from openai/error.
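For callers that read the raw HTTP response rather than going through an SDK, pulling the type and message out of the body might look like the sketch below. The helper name parse_error_body is hypothetical, and the sketch assumes the body has the shape shown in the example above; anything else is treated as unparseable.

```python
import json

def parse_error_body(body_text):
    """Return (type, message) from an error body, or (None, None) if it cannot be parsed.

    Hypothetical helper: assumes the {"error": {"type": ..., "message": ...}} shape.
    """
    try:
        error = json.loads(body_text)["error"]
        return error.get("type"), error.get("message")
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # Gateways sometimes return HTML or an empty body; do not crash on those.
        return None, None

sample = '{"error": {"message": "The server had an error", "type": "server_error", "param": null, "code": null}}'
error_type, message = parse_error_body(sample)
print(error_type, "|", message)
```

Returning a pair of None values instead of raising keeps the retry path simple: the HTTP status alone is enough to decide whether to retry, and the parsed fields are only used for logging.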
500 vs. 502 vs. 503 vs. 504: what's the difference?
All four are server-side conditions, but they mean different things and the right debug path differs:
| Status | Name | Meaning for OpenAI API | Action |
|---|---|---|---|
| 500 | Internal Server Error | Unhandled exception in OpenAI's backend; your request reached the server | Retry with backoff |
| 502 | Bad Gateway | OpenAI's gateway couldn't reach an upstream service; often very short-lived | Retry with backoff |
| 503 | Service Unavailable | Server is temporarily unavailable — maintenance or overload | Retry with backoff; check status page |
| 504 | Gateway Timeout | Request took too long; cut off by a gateway before a response | Check max_tokens / prompt length; retry with backoff |
For 502 and 503, retry behavior is identical to 500. For 504, the retry is still correct, but you should also ask whether the request is unusually large — very long prompts or extremely high max_tokens values can cause consistent 504s because the generation takes longer than the gateway allows. Trim the request or split it if you see a pattern of 504s on the same query shape.
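The table and the 504 note can be condensed into a small lookup. This is only a sketch of the guidance in this section: the function name advice_for_status and the returned dictionary shape are made up for illustration, and codes outside the table are simply reported as not covered.

```python
RETRYABLE_5XX = {500, 502, 503, 504}

def advice_for_status(status_code):
    """Summarize the table above: whether to retry, and what else to look at.

    Hypothetical helper; codes outside the table get retry=False because
    this section does not cover them.
    """
    advice = {"retry": status_code in RETRYABLE_5XX, "also_check": []}
    if status_code == 503:
        advice["also_check"].append("status page")
    elif status_code == 504:
        advice["also_check"].append("max_tokens and prompt length")
    return advice

for code in (500, 502, 503, 504):
    print(code, advice_for_status(code))
```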
The correct retry pattern
Exponential backoff with jitter — same pattern as any other transient API failure:
    import random
    import time

    from openai import APIStatusError, OpenAI

    # Disable the SDK's built-in retries so they don't multiply with this loop.
    client = OpenAI(max_retries=0)

    RETRYABLE_STATUS = {500, 502, 503, 504}

    def chat_with_retry(max_attempts=5, **kwargs):
        for attempt in range(max_attempts):
            try:
                return client.chat.completions.create(**kwargs)
            except APIStatusError as e:
                # InternalServerError is a subclass of APIStatusError,
                # so catching the parent covers every 5xx response.
                last_attempt = attempt == max_attempts - 1
                if e.status_code not in RETRYABLE_STATUS or last_attempt:
                    raise
                # Backoff: 1s, 2s, 4s, 8s, plus up to 1s of jitter.
                time.sleep((2 ** attempt) + random.uniform(0, 1))
        # Only reachable when max_attempts is 0 or negative.
        raise RuntimeError("max_attempts must be at least 1")
Key properties of this pattern:
- Exponential growth: delays of roughly 1s, 2s, 4s, and 8s (plus jitter) between the five attempts, which gives the server time to recover without slamming it with retries.
- Jitter: the random.uniform(0, 1) term prevents a thundering herd when many clients hit the same 500 simultaneously.
- Hard cap: max_attempts ensures a sustained outage fails loudly rather than hanging indefinitely.
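To see the schedule these properties produce, the delay calculation can be pulled out into a standalone function. This is a sketch: the name backoff_delay and the cap parameter are additions for illustration (the loop above has no upper bound on a single delay), and the 30-second cap is an arbitrary assumption.

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Seconds to wait after failed attempt number `attempt` (0-based).

    base * 2**attempt plus up to 1s of jitter, never more than `cap`.
    `cap` is an assumption added here, not part of the loop above.
    """
    return min(cap, base * (2 ** attempt) + rng.uniform(0, 1))

# Four sleeps separate five attempts.
delays = [backoff_delay(a) for a in range(4)]
print([round(d, 2) for d in delays])
```

Capping the delay matters mostly when max_attempts is raised: without it, the tenth retry would wait over 17 minutes.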
The easier path: let the SDK retry
The OpenAI Python and Node SDKs both have built-in retry logic. By default they retry 2 times. You can configure this:
    # Python: configure at client level
    client = OpenAI(max_retries=4)

    # Or disable retries entirely if you're handling them yourself
    client = OpenAI(max_retries=0)

    // Node
    const client = new OpenAI({ maxRetries: 4 });
For most applications, bumping max_retries to 3 or 4 is the entire fix for occasional 500s. Write a custom loop only if you need behavior the SDK doesn't support (like different delays per error type).
When to check status.openai.com
A single 500 is noise: retry and continue. A sustained wave of 500s (more than 4–5 consecutive failures across multiple requests) is a signal to stop retrying and check whether there's a broader incident:
- status.openai.com — shows current incidents and historical uptime. Subscribe to updates here for production systems.
- If there's no active incident and you're still seeing consistent 500s, the cause may be request-specific (see the edge case below). Log the x-request-id header and open a support ticket.
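One way to turn the "4–5 consecutive failures" rule of thumb into code is a small counter shared across requests. This is a minimal sketch: the class name and the default threshold of 5 are assumptions, and a real service would also need to think about thread safety and process restarts.

```python
class ConsecutiveFailureTracker:
    """Counts back-to-back 500s across requests (hypothetical helper)."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.consecutive = 0

    def record(self, status_code):
        """Record one final response status; return True when it is time
        to stop retrying and check the status page."""
        if status_code == 500:
            self.consecutive += 1
        else:
            # Any non-500 response breaks the streak.
            self.consecutive = 0
        return self.consecutive >= self.threshold

tracker = ConsecutiveFailureTracker()
for status in (500, 500, 200, 500):
    if tracker.record(status):
        print("Sustained 500s: check status.openai.com")
```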
How to log 500s properly
The most important thing to capture from a 500 response is the x-request-id header. Every OpenAI API response includes it, and it's the only way their support team can look up the specific request in their systems.
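    from openai import InternalServerError

    def create_logged(client, attempt, **kwargs):
        # Illustrative wrapper: `attempt` is the caller's retry counter.
        try:
            return client.chat.completions.create(**kwargs)
        except InternalServerError as e:
            request_id = e.response.headers.get("x-request-id")
            print(f"500 error | request_id={request_id} | model={kwargs.get('model')} | attempt={attempt}")
            raise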
In the Python SDK, the exception object exposes the raw response via e.response, including its headers. Log at minimum: the timestamp, the request_id, which model you were calling, and your attempt number.
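The four fields listed above fit naturally into one structured log record. The sketch below builds that record from a headers mapping; the function name build_error_log and the field names are illustrative choices, and the request ID in the usage line is a made-up placeholder.

```python
import json
from datetime import datetime, timezone

def build_error_log(headers, model, attempt, status_code=500):
    """Build a JSON-serializable record for one failed request.

    `headers` is any mapping with a .get method. Note that a plain dict is
    case-sensitive, unlike the header objects HTTP libraries usually return.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": status_code,
        "request_id": headers.get("x-request-id"),
        "model": model,
        "attempt": attempt,
    }

# Placeholder values for illustration only.
record = build_error_log({"x-request-id": "req_example123"}, "gpt-4o", 2)
print(json.dumps(record))
```

Emitting one JSON object per failure makes it straightforward to search logs by request_id when filling in a support ticket.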
The edge case: when your request can trigger a 500
In most cases a 500 is pure server-side noise. But there is a documented edge case: certain streaming configurations — specifically, malformed function call / tool call definitions combined with streaming mode — can trigger a 500 that wouldn't occur without streaming. This has been a recurring issue in the gpt-4 and gpt-4o families when tools definitions contain schema validation issues that are caught lazily during generation rather than eagerly at request intake.
Signs this might be your case:
- The 500 only occurs when stream=True is set
- The same prompt without streaming returns a valid 400 instead
- Your tools definitions have complex nested schemas or edge-case JSON Schema constructs
If you suspect this: disable streaming temporarily, send the same request, and see if you get a 400 with a useful error message instead. Fix the tool definition based on that 400, then re-enable streaming.
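Before re-sending, a rough local sanity check of the tools list can catch some obvious mistakes. The sketch below is not OpenAI's validation logic: the function name find_tool_problems and the specific checks are assumptions about common slip-ups in Chat Completions style definitions, and passing it does not guarantee the API will accept the request.

```python
def find_tool_problems(tools):
    """Return human-readable problems found in a list of tool definitions.

    Hypothetical pre-flight check; it only looks at a few structural basics.
    """
    problems = []
    for i, tool in enumerate(tools):
        fn = tool.get("function") if isinstance(tool, dict) else None
        if not isinstance(fn, dict) or tool.get("type") != "function":
            problems.append(f"tools[{i}]: expected type 'function' with a 'function' object")
            continue
        if not fn.get("name"):
            problems.append(f"tools[{i}]: missing function name")
        # Treat omitted parameters as an empty object schema.
        params = fn.get("parameters", {"type": "object", "properties": {}})
        if not isinstance(params, dict) or params.get("type") != "object":
            problems.append(f"tools[{i}]: parameters should be a JSON Schema of type 'object'")
            continue
        props = params.get("properties", {})
        if not isinstance(props, dict):
            problems.append(f"tools[{i}]: 'properties' should be an object")
            continue
        missing = [name for name in params.get("required", []) if name not in props]
        if missing:
            problems.append(f"tools[{i}]: required names not in properties: {missing}")
    return problems

example_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city", "units"],  # "units" is deliberately undefined
        },
    },
}]
print(find_tool_problems(example_tools))
```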
OpenAI's own retry guidance
OpenAI's official API documentation explicitly recommends retrying 500 errors and provides guidance consistent with the pattern above. Their key points: use backoff to avoid overwhelming a potentially recovering server, don't retry more than a few times for sustained outages, and always capture the request ID for support escalation. The SDK's automatic retry behavior is specifically designed to handle transient 500s without requiring any user code changes.
FAQ
Am I billed for a request that returns a 500? No — a 500 means the server failed to produce a response. You are not charged for output tokens on a failed request.
Can I prevent 500s by changing my request? Generally no. The one exception is the streaming + tool definition edge case above. Otherwise 500s are infrastructure-level and outside your control.
Should I alert on every 500? For production systems, alert on a sustained rate (e.g. more than 3 consecutive 500s, or 500 rate above 5% over 5 minutes), not on individual occurrences. Single 500s are expected background noise at any meaningful traffic volume.
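The alerting rule in the last answer can be sketched as a sliding-window monitor. The class name, the min_requests guard, and its default of 20 are assumptions added so that one 500 among a handful of requests does not count as a 5% error rate; the other defaults mirror the example thresholds above.

```python
from collections import deque

class ErrorRateMonitor:
    """Alert on sustained 500s, not individual ones (hypothetical helper)."""

    def __init__(self, window_seconds=300, rate_threshold=0.05,
                 consecutive_threshold=3, min_requests=20):
        self.window_seconds = window_seconds
        self.rate_threshold = rate_threshold
        self.consecutive_threshold = consecutive_threshold
        self.min_requests = min_requests  # assumption: ignore rate on tiny samples
        self.events = deque()  # (timestamp, is_500) pairs inside the window
        self.consecutive = 0

    def record(self, timestamp, status_code):
        """Record one response; return True if an alert should fire."""
        is_500 = status_code == 500
        self.consecutive = self.consecutive + 1 if is_500 else 0
        self.events.append((timestamp, is_500))
        # Drop events that have aged out of the window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            self.events.popleft()
        total = len(self.events)
        failures = sum(1 for _, bad in self.events if bad)
        rate_alert = total >= self.min_requests and failures / total > self.rate_threshold
        return self.consecutive > self.consecutive_threshold or rate_alert

monitor = ErrorRateMonitor()
print(monitor.record(0.0, 500))  # a single 500 does not alert
```

Timestamps are passed in rather than read from the clock so the monitor is easy to test; in production they would typically come from time.monotonic().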
Last updated June 2, 2026. Error codes and SDK behavior verified against OpenAI's official API reference and SDK documentation. OpenAI may change behavior over time — confirm specifics in the current docs before relying on them in production.