What is the default timeout for the OpenAI Python SDK?

The OpenAI Python SDK (v1.x) uses httpx under the hood with a default read timeout of 600 seconds (10 minutes). Despite being generous, this can still be exceeded by very long completions, complex o1/o3 reasoning tasks with high effort settings, or network instability. You can override it by passing a custom httpx.Timeout to the OpenAI client constructor.

Is it safe to retry an OpenAI API request that timed out?

It depends on where in the request the timeout occurred. A connection timeout (before any data was sent to OpenAI) is always safe to retry. A read timeout mid-stream is risky — the server may have received the request and already started generating, meaning a retry could result in a duplicate completion and double billing. If you timed out before receiving the first token, check OpenAI's status page and retry with backoff.

What causes slow or timing-out responses from the OpenAI API?

The most common causes are: (1) a very large max_tokens value on a model with slow token generation, (2) reasoning models like o1 and o3-mini with high reasoning effort settings, which can take minutes for complex tasks, (3) network instability between your server and OpenAI's endpoints, and (4) OpenAI service degradation — check status.openai.com to rule this out before debugging your code.

OpenAI API timeout errors: connection timeout vs. read timeout (and how to fix each)

An OpenAI API timeout isn't a single condition: there are two distinct kinds with different causes and different fixes. Getting them confused leads to either over-aggressive retries (and potential duplicate charges) or under-configured clients that time out on legitimate long-running jobs. This page covers both, including how to configure timeouts in the Python SDK and when it's safe to retry.

The 30-second answer

Connection timeout: your client couldn't reach OpenAI's servers at all. Usually a network issue. Safe to retry immediately.
Read timeout: connected fine, but the response took longer than your timeout allows. The server may still be generating — retrying mid-stream risks a duplicate completion.
The OpenAI SDK's default read timeout is 600 seconds (10 minutes) — generous for most tasks, but reasoning models (o1/o3) with high effort can exceed it.
Check status.openai.com first — if OpenAI is having a degraded service event, no amount of timeout tuning will help.

The two types of timeout — and why the distinction matters

The OpenAI Python SDK uses httpx as its HTTP transport layer. httpx distinguishes four timeout phases; in practice two of them matter most for API usage:

Timeout type	What it means	Python exception	Safe to retry?
Connect timeout	TCP connection to OpenAI's servers not established within the limit	`httpx.ConnectTimeout` → `openai.APITimeoutError`	Yes (request never reached OpenAI)
Read timeout	Connected, but no data received from the server within the limit	`httpx.ReadTimeout` → `openai.APITimeoutError`	Depends (see below)
Write timeout	Sending the request body took too long	`httpx.WriteTimeout` → `openai.APITimeoutError`	Yes (request not accepted)
Pool timeout	Waiting for an available connection from the pool	`httpx.PoolTimeout` → `openai.APITimeoutError`	Yes (request not sent)

The OpenAI SDK catches every httpx timeout and re-raises it as openai.APITimeoutError, which is itself a subclass of openai.APIConnectionError. Other connection failures, such as DNS errors or refused connections, surface as a plain openai.APIConnectionError. Two practical consequences follow. First, catch APITimeoutError before APIConnectionError, or the broader handler will swallow it. Second, to tell a connect timeout from a read timeout, inspect the original httpx exception on the error's __cause__. Handling the cases separately and applying different retry logic to each gives you the safest behavior.
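As a rough sketch of how the timeout phase can be recovered from a caught exception, the helper below walks the exception chain and matches the httpx class names from the table. The function name `timeout_phase` is hypothetical, and matching on class names is a simplification chosen so the sketch runs with the standard library alone; with httpx imported, an isinstance check against httpx.ConnectTimeout and its siblings would be more robust.

```python
from typing import Optional

# Class names of the httpx timeout exceptions listed in the table.
_PHASE_BY_NAME = {
    "ConnectTimeout": "connect",
    "ReadTimeout": "read",
    "WriteTimeout": "write",
    "PoolTimeout": "pool",
}


def timeout_phase(exc: BaseException) -> Optional[str]:
    """Return "connect", "read", "write" or "pool", or None if no timeout is found.

    Hypothetical helper: follows __cause__ (then __context__) from the caught
    exception down to the underlying transport error and matches by class name.
    """
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cycles in the chain
        phase = _PHASE_BY_NAME.get(type(current).__name__)
        if phase is not None:
            return phase
        current = current.__cause__ or current.__context__
    return None
```

Inside a handler such as `except openai.APIConnectionError as exc:`, calling `timeout_phase(exc)` would then let the code branch on "connect" versus "read" before deciding whether to retry.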

Default timeout values in the OpenAI Python SDK

As of the v1.x SDK, the default timeout configuration is:

# OpenAI SDK v1.x default — from the source:
# httpx.Timeout(timeout=600.0, connect=5.0)
# That means:
#   connect: 5 seconds
#   read:    600 seconds (10 minutes)
#   write:   600 seconds
#   pool:    600 seconds

The 10-minute read timeout is intentionally generous to accommodate long completions. You'll hit it most often with reasoning models on hard tasks, or when OpenAI is under load and generation slows down. For interactive applications where you'd rather fail fast and show an error, you'll want to shorten it.
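The timeout bounds a single attempt, not the whole call. The v1.x SDK also retries timed-out requests automatically (its `max_retries` option defaults to 2, as far as the SDK README documents; verify this for your version), so the time a caller can spend blocked is a multiple of the timeout. The arithmetic, as a small sketch with a hypothetical helper name and with backoff sleeps ignored:

```python
def worst_case_wait_seconds(per_attempt_timeout: float, max_retries: int) -> float:
    """Rough upper bound on blocked time if every attempt runs into the timeout.

    Ignores the short backoff sleeps between attempts.
    """
    if per_attempt_timeout < 0 or max_retries < 0:
        raise ValueError("timeout and max_retries must be non-negative")
    attempts = 1 + max_retries  # the first try plus each retry
    return attempts * per_attempt_timeout


# Assumed defaults: 600 s timeout, max_retries=2.
print(worst_case_wait_seconds(600.0, 2))  # 1800.0, i.e. 30 minutes
# A fail-fast interactive client: 30 s timeout, no retries.
print(worst_case_wait_seconds(30.0, 0))   # 30.0
```

For an interactive application this is an argument for lowering both values together rather than the timeout alone.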

How to configure custom timeouts

Pass an httpx.Timeout object to the OpenAI client constructor. This applies to all requests made with that client instance.

from openai import OpenAI
import httpx

# Tight timeouts for an interactive app: fail fast if slow
client_fast = OpenAI(
    timeout=httpx.Timeout(30.0, connect=5.0)
    # connect=5s, read/write/pool=30s
)

# Relaxed timeouts for a batch job with o1 or long completions
client_batch = OpenAI(
    timeout=httpx.Timeout(600.0, connect=10.0)
    # connect=10s, read/write/pool=600s (the SDK's default 600s, with a longer connect timeout)
)

# Full four-value control
client_custom = OpenAI(
    timeout=httpx.Timeout(
        connect=5.0,
        read=120.0,
        write=10.0,
        pool=5.0
    )
)

You can also override the timeout on a per-request basis by passing it to the method call directly:

import httpx
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    timeout=httpx.Timeout(300.0, connect=5.0),  # per-request override
)

Common causes of slow or timing-out responses

If you're hitting timeouts even with generous settings, the issue is usually one of these:

Large max_tokens on a slow model. A high max_tokens limit does not slow a request by itself; what costs time is the number of tokens actually generated. A 200-token answer takes about as long with max_tokens=4096 as with a lower limit, while a response that really runs to 4000+ tokens is slow regardless. Set max_tokens to a realistic ceiling so runaway generations are cut off.
Reasoning models with high effort. o1 and o3-mini (and o3 in preview) run an internal "thinking" pass before producing output. With reasoning_effort set to "high", a complex task can take two to five minutes. This is expected behavior, so don't shorten your timeout for these models. Streaming lets you show output as soon as it starts, but the wait before the first token may still be long, so keep the read timeout large enough to cover it.
Network path instability. OpenAI uses Cloudflare and regional infrastructure — if your server's route to OpenAI's endpoints is flapping, you'll see intermittent read stalls that look like timeouts. Try from a different network or region to diagnose.
OpenAI service degradation. OpenAI publishes real-time status at status.openai.com. Degraded API response time is the most common incident type and it produces exactly the symptoms of a timeout — slow or no response. Check this before changing any code.
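To tell these causes apart, it can help to measure where the time goes in a streamed response. The sketch below uses hypothetical names (`StreamTiming`, `time_stream`) and works on any iterable of chunks, so it can wrap the object returned by a streaming call. It records the time to the first chunk and the longest gap between chunks.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamTiming:
    first_chunk_s: Optional[float]  # None if the stream produced nothing
    longest_gap_s: float            # longest wait between consecutive chunks
    total_s: float                  # start of iteration to end of stream
    chunk_count: int


def time_stream(
    chunks: Iterable[object],
    clock: Callable[[], float] = time.monotonic,
) -> StreamTiming:
    """Consume `chunks` and report latency figures (the chunks are discarded)."""
    start = clock()
    last = start
    first: Optional[float] = None
    longest = 0.0
    count = 0
    for _ in chunks:
        now = clock()
        if first is None:
            first = now - start          # time to first chunk
        else:
            longest = max(longest, now - last)
        last = now
        count += 1
    return StreamTiming(first, longest, clock() - start, count)
```

As a loose reading of the numbers: a long `first_chunk_s` with small gaps afterwards points to a slow start (a reasoning phase or server-side queueing), while a large `longest_gap_s` points to mid-stream stalls from network trouble or load.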

Timeouts with streaming responses

Streaming changes the timeout calculus. In streaming mode, the read timeout applies to the interval between received chunks, not the total response time. This means:

You usually get the first token quickly (within a few seconds), which resets the read-timeout clock.
If the stream stalls mid-generation — no new chunks for longer than your read timeout — you'll get a ReadTimeout even if the total elapsed time is well under your limit.
A stream stall typically indicates server-side slowdown under load, not a bug in your code.

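For robust streaming, set the read timeout to the longest silence between chunks you are willing to tolerate, and handle the resulting exception. A time check inside the loop body is not enough on its own: it only runs when a chunk arrives, so it cannot interrupt a stream that has gone silent. A stall before the response starts surfaces as openai.APITimeoutError; a stall while iterating may surface as the underlying httpx.ReadTimeout, so this example catches both:

import httpx
import openai
from openai import OpenAI

client = OpenAI()
CHUNK_STALL_TIMEOUT = 30.0  # seconds of silence between chunks before we give up

try:
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a long essay..."}],
        stream=True,
        # read= bounds each wait for data, so it doubles as the stall limit
        timeout=httpx.Timeout(600.0, connect=5.0, read=CHUNK_STALL_TIMEOUT),
    )
    for chunk in stream:
        # Some chunks carry no choices (for example a trailing usage chunk)
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
except (openai.APITimeoutError, httpx.ReadTimeout) as exc:
    # Raised while waiting for the first bytes or for the next chunk
    raise TimeoutError(f"Stream stalled: no data received in {CHUNK_STALL_TIMEOUT:.0f}s") from exc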
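If you need a stall deadline that fires while the consumer is still waiting, independent of the HTTP client's settings, one option is to move iteration into a worker thread and read from a queue with a timeout. This is a minimal standard-library sketch under that assumption; `StreamStalled` and `iter_with_stall_timeout` are hypothetical names, and it wraps any iterable rather than anything SDK-specific.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class StreamStalled(TimeoutError):
    """Hypothetical error: no chunk arrived within the stall limit."""


def iter_with_stall_timeout(source: Iterable[T], stall_seconds: float) -> Iterator[T]:
    """Yield items from `source`; raise StreamStalled if the wait for the
    next item exceeds `stall_seconds`."""
    box: queue.Queue = queue.Queue()

    def pump() -> None:
        # Runs in the worker thread: forward items, then a completion marker,
        # or the exception that ended the stream.
        try:
            for item in source:
                box.put(("item", item))
            box.put(("done", None))
        except Exception as exc:
            box.put(("error", exc))

    threading.Thread(target=pump, daemon=True).start()

    while True:
        try:
            kind, payload = box.get(timeout=stall_seconds)
        except queue.Empty:
            raise StreamStalled(f"no chunk received in {stall_seconds}s") from None
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload
```

The wrapper only stops waiting; it does not cancel the underlying request. After a StreamStalled error the caller would still need to close the original stream so the connection is released, and the worker thread keeps running until the source yields or fails.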

When is it safe to retry?

This is the most important practical question, because retrying at the wrong time can result in two completed requests being billed.

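Connection timeout or connection failure (an APITimeoutError caused by httpx.ConnectTimeout, or a plain APIConnectionError): always safe to retry. The request never reached OpenAI's servers, so there's nothing to duplicate.
Read timeout before the first token: usually reasonable to retry, with exponential backoff. The request was sent, so the server may have started processing it even though you saw no output. A duplicate completion is possible, but you have no partial result to reconcile.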
Read timeout mid-stream: not safe to retry blindly. The server received your request and was generating output. A retry will create a new completion. If idempotency matters (e.g., you're writing to a database), check whether you received any partial output before retrying, and decide whether to retry the full request or continue from where you left off.
Note: OpenAI does not use HTTP 529. That status code is specific to Anthropic's API. OpenAI uses 503 for service unavailability (sometimes with a Retry-After header) and 429 for rate/quota limits. A true timeout is a client-side condition with no HTTP status code involved, which is why distinguishing it from server-error codes matters.
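The rules above can be written down as a small decision function plus a backoff schedule. This is a sketch with hypothetical names (`Failure`, `should_auto_retry`, `backoff_delays`), not the SDK's own retry logic. The SDK already retries some failures itself through its `max_retries` option, so a policy like this is probably best paired with `max_retries=0` on the client to avoid stacking two retry layers.

```python
import random
from enum import Enum
from typing import Iterator, Optional


class Failure(Enum):
    CONNECT_TIMEOUT = "connect_timeout"
    READ_TIMEOUT = "read_timeout"


def should_auto_retry(failure: Failure, chunks_received: int, duplicate_ok: bool) -> bool:
    """Return True if an automatic retry fits the rules in this section."""
    if failure is Failure.CONNECT_TIMEOUT:
        return True          # the request never reached the server
    if chunks_received == 0:
        return True          # read timeout before the first token
    return duplicate_ok      # mid-stream: only if a second completion is harmless


def backoff_delays(
    retries: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield one sleep duration per retry: exponential growth with full jitter."""
    rng = rng or random.Random()
    for attempt in range(retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)
```

A caller would check `should_auto_retry(...)` after each failure and, if it returns True, sleep for the next value from `backoff_delays(...)` before trying again. The jitter spreads retries from many clients so they do not all hit the API at the same moment.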

FAQ

Is an OpenAI API timeout the same as a 503 Service Unavailable? No. A timeout is a client-side condition: your HTTP client gave up waiting. A 503 is a server response meaning OpenAI accepted the connection but is currently unable to serve the request. Both can have similar causes (server overload), but they're handled differently. In the Python SDK both arrive as exceptions: a 503 raises a status error that carries the HTTP status code and response, while a timeout raises openai.APITimeoutError with no HTTP status at all.

Does OpenAI support a request-level timeout separate from the SDK-level one? Yes — you can pass timeout= directly to any method call (e.g., client.chat.completions.create(..., timeout=60)), which overrides the client-level default for that single request. The value can be a float (seconds) or a full httpx.Timeout object for granular control.

Why am I getting timeouts only on o1/o3 but not on GPT-4o? Reasoning models run an internal chain-of-thought pass before producing output — this is where almost all of the latency lives. With high reasoning effort on a complex problem, o1 and o3 can take 2–5 minutes or more. This is expected. Set your timeout to at least 300–600 seconds for these models, and use streaming so your application stays responsive while waiting for output.

Last updated May 28, 2026. Timeout defaults and exception class names sourced from the OpenAI Python SDK v1.x source and httpx documentation. SDK internals can change across versions — verify against the installed SDK version in your environment.