OpenAI API 500 error: what it means, how to retry, and when to escalate

HTTP 500 from the OpenAI API means something failed on OpenAI's side processing your request. It's not an authentication failure, it's not a rate limit, and in the vast majority of cases it's not caused by anything in your request. This page explains exactly what a 500 is, how it differs from the other 5xx codes you might see, the correct retry pattern, and the one narrow case where your request can actually trigger a 500.

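The 30-second answer

A 500 is a server-side failure at OpenAI and is almost never caused by your request. Retry with exponential backoff and jitter (or raise the SDK's max_retries), check status.openai.com if failures persist across several requests, and log the x-request-id so support can trace the request.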

What the error looks like

The raw response body for a 500:

HTTP/1.1 500 Internal Server Error
{
  "error": {
    "message": "The server had an error while processing your request. Sorry about that!",
    "type": "server_error",
    "param": null,
    "code": null
  }
}

The message text varies, but the HTTP status is always 500 and the type is always server_error. In the Python SDK this raises an openai.InternalServerError. In the Node SDK it's an InternalServerError from openai/error.
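If you call the API over raw HTTP rather than through an SDK, the same check can be made on the status code and body directly. The following is a minimal sketch that assumes the body has the shape shown above; error_type and is_server_error are hypothetical helper names, not part of any SDK.

```python
import json


def error_type(body: str):
    """Return the "type" field of an OpenAI-style error body, or None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # The body was not JSON at all, so there is no type to report.
        return None
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    return error.get("type") if isinstance(error, dict) else None


def is_server_error(status: int, body: str) -> bool:
    """True when the response matches the 500 described above."""
    return status == 500 and error_type(body) == "server_error"
```

Checking both the status and the type keeps the helper from misclassifying other errors that happen to share one of the two.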

500 vs. 502 vs. 503 vs. 504: what's the difference?

All four are server-side conditions, but they mean different things and the right debug path differs:

Status | Name | Meaning for OpenAI API | Action
500 | Internal Server Error | Unhandled exception in OpenAI's backend; your request reached the server | Retry with backoff
502 | Bad Gateway | OpenAI's gateway couldn't reach an upstream service; often very short-lived | Retry with backoff
503 | Service Unavailable | Server is temporarily unavailable (maintenance or overload) | Retry with backoff; check status page
504 | Gateway Timeout | Request took too long; cut off by a gateway before a response | Check max_tokens / prompt length; retry with backoff

For 502 and 503, retry behavior is identical to 500. For 504, the retry is still correct, but you should also ask whether the request is unusually large — very long prompts or extremely high max_tokens values can cause consistent 504s because the generation takes longer than the gateway allows. Trim the request or split it if you see a pattern of 504s on the same query shape.
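The status-to-action mapping described here can be encoded as a small lookup so that calling code does not hard-code status numbers in several places. This is only a sketch; ACTIONS, is_retryable_5xx, and action_for are hypothetical names chosen for illustration.

```python
# Recommended action per 5xx status, following the descriptions above.
ACTIONS = {
    500: "retry with backoff",
    502: "retry with backoff",
    503: "retry with backoff; check status page",
    504: "retry with backoff; check max_tokens / prompt length",
}


def is_retryable_5xx(status: int) -> bool:
    """True for the four server-side statuses discussed on this page."""
    return status in ACTIONS


def action_for(status: int) -> str:
    """Return the suggested action, or a fallback for other statuses."""
    return ACTIONS.get(status, "not a retryable 5xx; handle separately")
```

For example, action_for(504) returns the entry that points at max_tokens and prompt length, which is the extra step that distinguishes a 504 from the other three.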

The correct retry pattern

Exponential backoff with jitter — same pattern as any other transient API failure:

import random
import time

from openai import OpenAI, APIStatusError

# Disable the SDK's built-in retries so they don't stack with this loop.
client = OpenAI(max_retries=0)

RETRYABLE_STATUSES = {500, 502, 503, 504}

def chat_with_retry(max_attempts=5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except APIStatusError as e:
            # InternalServerError subclasses APIStatusError, so this one
            # clause covers every 5xx; other statuses are re-raised at once.
            last_attempt = attempt == max_attempts - 1
            if e.status_code not in RETRYABLE_STATUSES or last_attempt:
                raise
            # Waits about 1s, 2s, 4s, 8s, plus up to 1s of jitter.
            time.sleep((2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError("max_attempts must be at least 1")

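Key properties of this pattern: the delay doubles on each attempt (about 1, 2, 4, then 8 seconds) with up to one second of random jitter so that many clients don't retry in lockstep; only 500, 502, 503, and 504 are retried; and the original exception is re-raised once the attempts run out.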
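The delay calculation in the loop above can be isolated into a small function, which makes the schedule easy to inspect and test. This is a minimal sketch: backoff_delay is a hypothetical helper, and the cap parameter is an assumption added here (the loop above has no upper bound on the delay).

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before the retry that follows a 0-based `attempt`.

    The exponential part is base * 2**attempt, limited to `cap` (an assumed
    safety bound). `rng` returns a float in [0, 1) that is added as jitter.
    """
    return min(cap, base * (2 ** attempt)) + rng()


# Inspect the schedule with jitter switched off.
schedule = [backoff_delay(a, rng=lambda: 0.0) for a in range(5)]
print(schedule)
```

Passing the jitter source as a parameter keeps production behavior random while letting a test supply a fixed value.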

The easier path: let the SDK retry

The OpenAI Python and Node SDKs both have built-in retry logic. By default they retry 2 times. You can configure this:

# Python — configure at client level
client = OpenAI(max_retries=4)

# Or disable retries entirely if you're handling them yourself
client = OpenAI(max_retries=0)

// Node
const client = new OpenAI({ maxRetries: 4 });

For most applications, bumping max_retries to 3 or 4 is the entire fix for occasional 500s. Write a custom loop only if you need behavior the SDK doesn't support (like different delays per error type).
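As an illustration of the "different delays per error type" case, the sketch below retries a callable with a base delay chosen by status code. Everything in it is hypothetical: FakeStatusError stands in for an SDK exception that carries a status_code attribute, and the values in BASE_DELAY are illustrative rather than recommended. If you wrap a real client this way, it would make sense to create it with max_retries=0 so the two retry layers don't multiply.

```python
import time


class FakeStatusError(Exception):
    """Stand-in for an SDK exception that exposes an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


# Illustrative base delays in seconds, keyed by status code.
BASE_DELAY = {500: 1.0, 502: 0.5, 503: 5.0, 504: 2.0}


def call_with_status_delays(fn, max_attempts=4, sleep=time.sleep):
    """Call fn(), retrying listed statuses with a per-status backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            # Re-raise anything not listed, and the final failed attempt.
            if status not in BASE_DELAY or attempt == max_attempts - 1:
                raise
            sleep(BASE_DELAY[status] * (2 ** attempt))
```

The sleep function is injected so the wrapper can be exercised without actually waiting.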

When to check status.openai.com

A single 500 is noise: retry and continue. A sustained wave of 500s (more than 4–5 consecutive failures across multiple requests) is a signal to stop retrying and check status.openai.com for a broader incident.
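One way to tell a single 500 from a sustained wave is to count consecutive server errors across requests and pause once a threshold is crossed. The class below is a rough sketch; ConsecutiveFailureGate is a hypothetical name, the threshold of 5 is taken from the range mentioned above, and counting every 5xx status (not only 500) is an assumption.

```python
class ConsecutiveFailureGate:
    """Tracks consecutive 5xx results across requests."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.consecutive = 0

    def record(self, status):
        """Record one final request outcome by HTTP status code."""
        if 500 <= status < 600:
            self.consecutive += 1
        else:
            # Any non-5xx outcome breaks the streak.
            self.consecutive = 0

    @property
    def should_pause(self):
        """True once the streak reaches the threshold."""
        return self.consecutive >= self.threshold
```

When should_pause becomes true, the caller would stop issuing retries and check the status page rather than continue to loop.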

How to log 500s properly

The most important thing to capture from a 500 response is the x-request-id header. Every OpenAI API response includes it, and it's what their support team needs to look up the specific request in their systems.

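from openai import OpenAI, InternalServerError

client = OpenAI()

def create_and_log(attempt=1, **kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except InternalServerError as e:
        # The request ID comes from the failed response's headers.
        request_id = e.response.headers.get("x-request-id")
        print(f"500 error | request_id={request_id} | model={kwargs.get('model')} | attempt={attempt}")
        raise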

In the Python SDK, the exception object exposes the raw response via e.response, including its headers. Log at minimum: the timestamp, the request_id, which model you were calling, and your attempt number.
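The minimum fields listed above fit naturally into one structured record per failure. The sketch below builds such a record with the standard library only; build_error_record and its field names are hypothetical, and the request ID value in the usage line is a made-up placeholder.

```python
import json
from datetime import datetime, timezone


def build_error_record(request_id, model, attempt, status=500):
    """Return a JSON-serializable record of one failed API call."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "request_id": request_id,
        "model": model,
        "attempt": attempt,
    }


# Example usage with placeholder values.
print(json.dumps(build_error_record("req_example", "gpt-4o", 2)))
```

Emitting one JSON object per line keeps the request ID easy to search for when a support ticket asks for it.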

The edge case: when your request can trigger a 500

In most cases a 500 is pure server-side noise. But there is a documented edge case: certain streaming configurations — specifically, malformed function call / tool call definitions combined with streaming mode — can trigger a 500 that wouldn't occur without streaming. This has been a recurring issue in the gpt-4 and gpt-4o families when tools definitions contain schema validation issues that are caught lazily during generation rather than eagerly at request intake.

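Signs this might be your case: the request combines streaming with tool definitions, and the same request fails with a 500 every time instead of succeeding on retry.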

If you suspect this: disable streaming temporarily, send the same request, and see if you get a 400 with a useful error message instead. Fix the tool definition based on that 400, then re-enable streaming.
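To run that diagnostic without editing the original call site, you can derive a non-streaming copy of the request arguments. This is a minimal sketch that assumes the request is held as a dict of keyword arguments; without_streaming is a hypothetical helper name.

```python
def without_streaming(request_kwargs):
    """Return a copy of the request arguments with streaming turned off.

    The original dict is left untouched. stream_options is dropped as well,
    since it only applies to streaming requests.
    """
    diagnostic = dict(request_kwargs)
    diagnostic.pop("stream", None)
    diagnostic.pop("stream_options", None)
    return diagnostic
```

The tools list is passed through unchanged, so any validation error that comes back refers to the same definitions the streaming request used.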

OpenAI's own retry guidance

OpenAI's official API documentation explicitly recommends retrying 500 errors and provides guidance consistent with the pattern above. Their key points: use backoff to avoid overwhelming a potentially recovering server, don't retry more than a few times for sustained outages, and always capture the request ID for support escalation. The SDK's automatic retry behavior is specifically designed to handle transient 500s without requiring any user code changes.

FAQ

Am I billed for a request that returns a 500? No — a 500 means the server failed to produce a response. You are not charged for output tokens on a failed request.

Can I prevent 500s by changing my request? Generally no. The one exception is the streaming + tool definition edge case above. Otherwise 500s are infrastructure-level and outside your control.

Should I alert on every 500? For production systems, alert on a sustained rate (e.g. more than 3 consecutive 500s, or 500 rate above 5% over 5 minutes), not on individual occurrences. Single 500s are expected background noise at any meaningful traffic volume.
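The rate-based rule can be sketched as a sliding window over recent request outcomes. The defaults below mirror the example thresholds in the answer above (5% over 5 minutes); ErrorRateWindow is a hypothetical name, and min_requests is an added assumption so that one failure out of a handful of requests does not trigger an alert.

```python
from collections import deque


class ErrorRateWindow:
    """Sliding-window 500 rate over the most recent window_seconds."""

    def __init__(self, window_seconds=300, rate_threshold=0.05, min_requests=20):
        self.window_seconds = window_seconds
        self.rate_threshold = rate_threshold
        self.min_requests = min_requests
        self.events = deque()  # (timestamp, was_500) pairs, oldest first

    def record(self, timestamp, status):
        """Add one outcome and drop events older than the window."""
        self.events.append((timestamp, status == 500))
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def should_alert(self):
        """True when the 500 rate in the window exceeds the threshold."""
        total = len(self.events)
        if total < self.min_requests:
            return False
        errors = sum(1 for _, was_500 in self.events if was_500)
        return errors / total > self.rate_threshold
```

Timestamps are passed in explicitly (for example from time.time()), which keeps the class deterministic under test.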


Last updated June 2, 2026. Error codes and SDK behavior verified against OpenAI's official API reference and SDK documentation. OpenAI may change behavior over time — confirm specifics in the current docs before relying on them in production.