OpenAI API timeout errors: connection timeout vs. read timeout (and how to fix each)
An OpenAI API timeout isn't a single condition: there are two distinct kinds, with different causes and different fixes. Confusing them leads to either over-aggressive retries (and potentially duplicate charges) or under-configured clients that time out on legitimate long-running jobs. This page covers both, including how to configure timeouts in the Python SDK and when it's safe to retry.
The 30-second answer
- Connection timeout: your client couldn't reach OpenAI's servers at all. Usually a network issue. Safe to retry immediately.
- Read timeout: connected fine, but the response took longer than your timeout allows. The server may still be generating — retrying mid-stream risks a duplicate completion.
- The OpenAI SDK's default read timeout is 600 seconds (10 minutes) — generous for most tasks, but reasoning models (o1/o3) with high effort can exceed it.
- Check status.openai.com first — if OpenAI is having a degraded service event, no amount of timeout tuning will help.
The two types of timeout — and why the distinction matters
The OpenAI Python SDK uses httpx as its HTTP transport layer. httpx distinguishes four timeout phases; in practice two of them matter most for API usage:
| Timeout type | What it means | Python exception | Safe to retry? |
|---|---|---|---|
| Connect timeout | TCP connection to OpenAI's servers not established within the limit | httpx.ConnectTimeout → openai.APITimeoutError | Yes: the request never reached OpenAI |
| Read timeout | Connected, but no data received from the server within the limit | httpx.ReadTimeout → openai.APITimeoutError | Depends (see below) |
| Write timeout | Sending the request body took too long | httpx.WriteTimeout → openai.APITimeoutError | Yes: the request was not accepted |
| Pool timeout | Waiting for an available connection from the pool | httpx.PoolTimeout → openai.APITimeoutError | Yes: the request was not sent |
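In SDK v1.x, all four httpx timeouts surface as openai.APITimeoutError, which is itself a subclass of openai.APIConnectionError. Connection failures that are not timeouts (DNS errors, refused connections) raise the plain openai.APIConnectionError. The original httpx exception stays attached as the exception's __cause__, and that is where you tell a connect timeout from a read timeout. In practice, catch APITimeoutError before APIConnectionError, inspect the cause, and apply different retry logic to each case.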
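To make the distinction concrete, here is a minimal sketch of classifying a caught exception by the httpx exception it was chained from. timeout_phase is a hypothetical helper, not part of the SDK. It assumes the SDK chains the original httpx exception as __cause__, and it matches on the class name only so that the sketch runs without httpx installed; in real code an isinstance check against httpx.ConnectTimeout and its siblings would be more robust.

```python
# Hypothetical helper: map a caught SDK exception to the httpx timeout phase
# that caused it. Assumes the httpx exception is chained as __cause__.
_PHASES = {
    "ConnectTimeout": "connect",
    "ReadTimeout": "read",
    "WriteTimeout": "write",
    "PoolTimeout": "pool",
}


def timeout_phase(exc: BaseException) -> str:
    """Return 'connect', 'read', 'write', 'pool', or 'unknown'."""
    cause = exc.__cause__
    if cause is None:
        return "unknown"
    # Name-based match keeps this sketch free of third-party imports.
    return _PHASES.get(type(cause).__name__, "unknown")
```

Inside an except openai.APITimeoutError as exc: block, a result of "connect" would indicate that the request never left your side of the connection, while "read" means the server had the request and went quiet.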
Default timeout values in the OpenAI Python SDK
As of the v1.x SDK, the default timeout configuration is:
# OpenAI SDK v1.x default — from the source:
# httpx.Timeout(timeout=600.0, connect=5.0)
# That means:
# connect: 5 seconds
# read: 600 seconds (10 minutes)
# write: 600 seconds
# pool: 600 seconds
The 10-minute read timeout is intentionally generous to accommodate long completions. You'll hit it most often with reasoning models on hard tasks, or when OpenAI is under load and generation slows down. For interactive applications where you'd rather fail fast and show an error, you'll want to shorten it.
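One consequence is worth working through. The SDK also retries failed requests automatically (two retries by default, per its documentation), and requests that time out count as retryable. A rough upper bound on how long a single non-streaming call can block is sketched below. worst_case_wait is a hypothetical helper, and the 8-second backoff cap is an assumption about the SDK's retry delay, so treat the result as an estimate rather than a guarantee.

```python
def worst_case_wait(timeout_s: float, max_retries: int = 2,
                    max_backoff_s: float = 8.0) -> float:
    """Rough upper bound, in seconds, on how long one call can block."""
    attempts = max_retries + 1  # the first try plus each retry
    # Every attempt may run to the timeout; each retry may wait the full backoff.
    return attempts * timeout_s + max_retries * max_backoff_s


print(worst_case_wait(600.0))                # 1816.0
print(worst_case_wait(30.0, max_retries=0))  # 30.0
```

With the defaults, a call that keeps timing out can block for roughly half an hour. Passing max_retries=0 to the client constructor turns the automatic retries off if you prefer to manage retries yourself.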
How to configure custom timeouts
Pass an httpx.Timeout object to the OpenAI client constructor. This applies to all requests made with that client instance.
from openai import OpenAI
import httpx
# Tight timeouts for an interactive app: fail fast if slow
client_fast = OpenAI(
    timeout=httpx.Timeout(30.0, connect=5.0)
    # connect=5s, read/write/pool=30s
)
# Relaxed timeouts for a batch job with o1 or long completions
client_batch = OpenAI(
    timeout=httpx.Timeout(600.0, connect=10.0)
    # connect=10s (raised from the 5s default), read/write/pool=600s (same as the SDK default)
)
# Full four-value control
client_custom = OpenAI(
    timeout=httpx.Timeout(
        connect=5.0,
        read=120.0,
        write=10.0,
        pool=5.0,
    )
)
You can also override the timeout on a per-request basis by passing it to the method call directly:
response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    timeout=httpx.Timeout(300.0, connect=5.0),  # per-request override
)
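If one process serves both interactive and batch traffic, the timeout can be chosen per call. The following is a minimal sketch, assuming a hypothetical timeout_for helper and a prefix check on the model name (both are illustrative and not part of the SDK). It returns the four phase values as keyword arguments for httpx.Timeout.

```python
# Hypothetical helper: pick timeout phases (in seconds) for one request.
REASONING_PREFIXES = ("o1", "o3")  # assumption: reasoning models share these prefixes


def timeout_for(model: str, interactive: bool) -> dict[str, float]:
    """Return connect/read/write/pool values usable as httpx.Timeout(**values)."""
    if model.startswith(REASONING_PREFIXES):
        read = 600.0  # reasoning passes can run for minutes
    elif interactive:
        read = 30.0   # fail fast and show an error instead
    else:
        read = 120.0  # illustrative middle ground for background work
    return {"connect": 5.0, "read": read, "write": 10.0, "pool": 5.0}
```

A call site would then pass timeout=httpx.Timeout(**timeout_for("o1", interactive=False)) to the request method.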
Common causes of slow or timing-out responses
If you're hitting timeouts even with generous settings, the issue is usually one of these:
- Large max_tokens on a slow model. Every additional output token takes time. Setting max_tokens=4096 when you typically get 200-token responses doesn't cause slowness, but actually generating 4000+ tokens does take time. Set max_tokens to a realistic ceiling.
- Reasoning models with high effort. o1 and o3-mini (and o3 in preview) run an internal "thinking" pass before producing output. With "reasoning_effort": "high", a complex task can take two to five minutes. This is expected behavior, so don't shorten your timeout for these models. Use streaming so you get first-token acknowledgment early.
- Network path instability. OpenAI uses Cloudflare and regional infrastructure. If your server's route to OpenAI's endpoints is flapping, you'll see intermittent read stalls that look like timeouts. Try from a different network or region to diagnose.
- OpenAI service degradation. OpenAI publishes real-time status at status.openai.com. Degraded API response time is the most common incident type and it produces exactly the symptoms of a timeout — slow or no response. Check this before changing any code.
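To tell these causes apart, it helps to measure where the time goes. The sketch below works on any iterable of streamed chunks and records the time to the first chunk and the longest gap between chunks. A long first wait points at reasoning or queueing, while a long mid-stream gap points at a stall on the network path or the server. stream_latency_profile and its clock parameter are hypothetical names introduced for illustration, and the function consumes the stream it is given.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def stream_latency_profile(
    chunks: Iterable[object],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], float]:
    """Consume chunks; return (seconds to first chunk, longest gap between chunks)."""
    start = last = clock()
    first_chunk_s: Optional[float] = None
    longest_gap_s = 0.0
    for _ in chunks:
        now = clock()
        if first_chunk_s is None:
            first_chunk_s = now - start  # time spent waiting for any output
        else:
            longest_gap_s = max(longest_gap_s, now - last)
        last = now
    return first_chunk_s, longest_gap_s
```

The clock argument exists only so the function can be exercised with a fake clock; in normal use you would pass just the stream returned by a streaming request.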
Timeouts with streaming responses
Streaming changes the timeout calculus. In streaming mode, the read timeout applies to the interval between received chunks, not the total response time. This means:
- You usually get the first token quickly (within a few seconds), which resets the read-timeout clock.
- If the stream stalls mid-generation (no new chunks for longer than your read timeout), you'll get a ReadTimeout even if the total elapsed time is well under your limit.
- A stream stall typically indicates server-side slowdown under load, not a bug in your code.
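The read timeout is what actually interrupts a stalled stream. A timer checked inside the loop body only runs after a chunk arrives, so it can report a long gap after the fact but cannot break out of a stall. For robust streaming, set the read timeout on the streaming call to the longest silence you are willing to tolerate between chunks:
import httpx
import openai
from openai import OpenAI
client = OpenAI()
CHUNK_STALL_TIMEOUT = 30.0  # max seconds of silence between chunks
try:
    # stream=True yields ChatCompletionChunk objects as they arrive
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a long essay..."}],
        stream=True,
        # The read timeout applies to each wait for data, so it bounds the gap between chunks
        timeout=httpx.Timeout(600.0, connect=5.0, read=CHUNK_STALL_TIMEOUT),
    )
    for chunk in stream:
        # Some chunks carry no choices, so guard before indexing
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
except (openai.APITimeoutError, httpx.TimeoutException) as exc:
    # A stall during iteration may surface as the raw httpx exception
    raise TimeoutError("OpenAI request timed out or the stream stalled") from exc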
When is it safe to retry?
This is the most important practical question, because retrying at the wrong time can result in two completed requests being billed.
- Connection timeout (httpx.ConnectTimeout, which the SDK surfaces as APITimeoutError, a subclass of APIConnectionError): always safe to retry. The request never reached OpenAI's servers, so there's nothing to duplicate.
- Read timeout before the first token: usually acceptable to retry, using exponential backoff. Be aware that the server may already have accepted the request and started generating, so a retry can still produce a second completion.
- Read timeout mid-stream: not safe to retry blindly. The server received your request and was generating output. A retry will create a new completion. If idempotency matters (e.g., you're writing to a database), check whether you received any partial output before retrying, and decide whether to retry the full request or continue from where you left off.
- Note: OpenAI does not use HTTP 529. That status code is specific to Anthropic's API. OpenAI uses 503 for service unavailability (with a Retry-After header) and 429 for rate/quota limits. A true timeout is a client-side condition with no HTTP status code involved, which is why distinguishing it from server-error codes matters.
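The rules above can be condensed into a small decision function plus a backoff delay. This is a sketch under stated assumptions: should_retry and backoff_delay are hypothetical helpers, the phase strings are labels you would derive yourself from the caught exception, and the 0.5-second base and 8-second cap are illustrative values rather than anything the SDK prescribes.

```python
import random


def should_retry(phase: str, chunks_received: int, attempt: int,
                 max_attempts: int = 3) -> bool:
    """Decide whether a timed-out request may be retried automatically.

    attempt is the zero-based index of the attempt that just failed.
    """
    if attempt + 1 >= max_attempts:
        return False  # retry budget exhausted
    if phase in ("connect", "write", "pool"):
        return True   # the request was never accepted by the server
    if phase == "read":
        # Only retry if no output arrived; partial output needs a human decision.
        return chunks_received == 0
    return False      # unknown failure: do not retry blindly


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter, in seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries from many clients over time, which matters most during a service degradation when everyone is timing out at once.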
FAQ
Is an OpenAI API timeout the same as a 503 Service Unavailable? No. A timeout is a client-side condition — your HTTP client gave up waiting. A 503 is a server response meaning OpenAI accepted the connection but is currently unable to serve the request. Both can have similar causes (server overload), but they're handled differently. A 503 gives you an HTTP response to parse; a timeout gives you a Python exception with no HTTP status.
Does OpenAI support a request-level timeout separate from the SDK-level one? Yes — you can pass timeout= directly to any method call (e.g., client.chat.completions.create(..., timeout=60)), which overrides the client-level default for that single request. The value can be a float (seconds) or a full httpx.Timeout object for granular control.
Why am I getting timeouts only on o1/o3 but not on GPT-4o? Reasoning models run an internal chain-of-thought pass before producing output — this is where almost all of the latency lives. With high reasoning effort on a complex problem, o1 and o3 can take 2–5 minutes or more. This is expected. Set your timeout to at least 300–600 seconds for these models, and use streaming so your application stays responsive while waiting for output.
Last updated May 28, 2026. Timeout defaults and exception class names sourced from the OpenAI Python SDK v1.x source and httpx documentation. SDK internals can change across versions — verify against the installed SDK version in your environment.