OpenAI API timeout errors: connection timeout vs. read timeout (and how to fix each)

An OpenAI API timeout isn't a single condition: there are two distinct kinds, with different causes and different fixes. Confusing them leads either to over-aggressive retries (and potential duplicate charges) or to under-configured clients that time out on legitimate long-running jobs. This page covers both, including how to configure timeouts in the Python SDK and when it's safe to retry.

The 30-second answer

The two types of timeout — and why the distinction matters

The OpenAI Python SDK uses httpx as its HTTP transport layer. httpx distinguishes four timeout phases; in practice two of them matter most for API usage:

| Timeout type | What it means | httpx exception | Safe to retry? |
|---|---|---|---|
| Connect timeout | TCP connection to OpenAI's servers not established within the limit | httpx.ConnectTimeout | Yes: request never reached OpenAI |
| Read timeout | Connected, but no data received from the server within the limit | httpx.ReadTimeout | Depends (see below) |
| Write timeout | Sending the request body took too long | httpx.WriteTimeout | Yes: request not accepted |
| Pool timeout | Waiting for an available connection from the pool | httpx.PoolTimeout | Yes: request not sent |

As of the v1.x SDK, all four httpx timeouts are re-raised as openai.APITimeoutError, with the original httpx exception attached as its __cause__. Other connection failures (DNS, TLS, connection refused) surface as openai.APIConnectionError. Because APITimeoutError is a subclass of APIConnectionError, an except APIConnectionError clause also catches timeouts, so list APITimeoutError first. In practice, catching the two separately, checking which timeout phase fired, and applying different retry logic to each gives you the safest behavior.
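To apply different retry logic per phase, the code first has to know which phase fired. The sketch below walks an exception's __cause__ / __context__ chain and maps the first httpx timeout class it finds to a phase label. It assumes the SDK chains the underlying httpx exception onto the error it raises (worth verifying for your installed version). The function name timeout_phase and the label strings are hypothetical, and matching on class name rather than isinstance is a shortcut that keeps the example free of third-party imports.

```python
from typing import Optional

# httpx timeout class names mapped to this sketch's own phase labels.
_PHASE_BY_CLASS = {
    "ConnectTimeout": "connect",
    "ReadTimeout": "read",
    "WriteTimeout": "write",
    "PoolTimeout": "pool",
}


def timeout_phase(exc: BaseException) -> str:
    """Return the timeout phase found in exc's chain, or "unknown"."""
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cyclic exception chains
        phase = _PHASE_BY_CLASS.get(type(current).__name__)
        if phase is not None:
            return phase
        # Prefer the explicit cause ("raise ... from err"), then implicit context.
        current = current.__cause__ or current.__context__
    return "unknown"
```

A caller would typically use it inside an except openai.APITimeoutError as err: block and branch on timeout_phase(err).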

Default timeout values in the OpenAI Python SDK

As of the v1.x SDK, the default timeout configuration is:

# OpenAI SDK v1.x default — from the source:
# httpx.Timeout(timeout=600.0, connect=5.0)
# That means:
#   connect: 5 seconds
#   read:    600 seconds (10 minutes)
#   write:   600 seconds
#   pool:    600 seconds

The 10-minute read timeout is intentionally generous to accommodate long completions. You'll hit it most often with reasoning models on hard tasks, or when OpenAI is under load and generation slows down. For interactive applications where you'd rather fail fast and show an error, you'll want to shorten it.

How to configure custom timeouts

Pass an httpx.Timeout object to the OpenAI client constructor. This applies to all requests made with that client instance.

from openai import OpenAI
import httpx

# Tight timeouts for an interactive app: fail fast if slow
client_fast = OpenAI(
    timeout=httpx.Timeout(30.0, connect=5.0)
    # connect=5s, read/write/pool=30s
)

# Relaxed timeouts for a batch job with o1 or long completions
client_batch = OpenAI(
    timeout=httpx.Timeout(600.0, connect=10.0)
    # connect=10s, read/write/pool=600s (the SDK's default 600s, with a longer connect limit)
)

# Full four-value control
client_custom = OpenAI(
    timeout=httpx.Timeout(
        connect=5.0,
        read=120.0,
        write=10.0,
        pool=5.0
    )
)

You can also override the timeout on a per-request basis by passing it to the method call directly:

response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    timeout=httpx.Timeout(300.0, connect=5.0)  # per-request override
)
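If several call sites need different limits, the per-request values can come from one small helper instead of being repeated inline. This is a minimal sketch, not an SDK feature: timeout_for, REASONING_PREFIXES, and the prefix list are hypothetical, and the numbers simply reuse figures from this page (30 s for interactive use, 120 s as a general read limit, 600 s for reasoning models).

```python
from typing import Dict

# Hypothetical list of model-name prefixes treated as slow reasoning models.
REASONING_PREFIXES = ("o1", "o3")


def timeout_for(model: str, interactive: bool = False) -> Dict[str, float]:
    """Return connect/read/write/pool limits (seconds) for one request."""
    if model.startswith(REASONING_PREFIXES):
        read = 600.0  # reasoning models can think for minutes before answering
    elif interactive:
        read = 30.0   # fail fast so the UI can show an error
    else:
        read = 120.0
    return {"connect": 5.0, "read": read, "write": 10.0, "pool": 5.0}
```

The returned dict has exactly the four keyword arguments httpx.Timeout accepts, so it can be unpacked as timeout=httpx.Timeout(**timeout_for("o1")) in the method call.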

Common causes of slow or timing-out responses

If you're hitting timeouts even with generous settings, the issue is usually one of these:

Timeouts with streaming responses

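Streaming changes the timeout calculus. In streaming mode, the read timeout applies to the interval between received chunks, not the total response time. This means a response that keeps streaming steadily can run far longer than the read timeout without raising an error, while a stalled stream is only cut off once the gap between chunks exceeds that limit.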

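For robust streaming, set the read timeout to the longest gap between chunks you are willing to tolerate, and keep a client-side watchdog that tracks time since the last received chunk:

import time

import httpx
from openai import OpenAI

client = OpenAI()
CHUNK_STALL_TIMEOUT = 30.0  # seconds between chunks before we give up

# The in-loop check below only runs when a chunk arrives, so on its own it
# cannot interrupt a read that never returns. Using the same limit as the
# read timeout lets httpx abort a blocked read; that stall then surfaces as
# a timeout exception raised from the loop.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a long essay..."}],
    stream=True,
    timeout=httpx.Timeout(CHUNK_STALL_TIMEOUT, connect=5.0),
)

last_chunk_time = time.monotonic()
for chunk in stream:
    now = time.monotonic()
    if now - last_chunk_time > CHUNK_STALL_TIMEOUT:
        raise TimeoutError(f"Stream stalled: no chunk received in {CHUNK_STALL_TIMEOUT:.0f}s")
    last_chunk_time = now
    # Some chunks carry no choices, so guard before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)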
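A check placed inside the loop body only runs after a chunk has arrived, so it cannot fire while the iterator itself is blocked. One way to enforce the stall limit independently of the HTTP client is to read the stream on a background thread and wait on a queue with a timeout. The sketch below does this for any iterable. The name iter_with_stall_timeout is hypothetical, and after a timeout the worker thread may stay blocked until the underlying stream is closed by the caller.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking normal end of the source


def iter_with_stall_timeout(source: Iterable[T], stall_timeout: float) -> Iterator[T]:
    """Yield items from source; raise TimeoutError if none arrives in time."""
    items: queue.Queue = queue.Queue()

    def pump() -> None:
        # Runs in a worker thread: forward items, errors, and the end marker.
        try:
            for item in source:
                items.put((None, item))
        except Exception as exc:
            items.put((exc, None))
        else:
            items.put((None, _DONE))

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            exc, item = items.get(timeout=stall_timeout)
        except queue.Empty:
            raise TimeoutError(f"no chunk received in {stall_timeout}s") from None
        if exc is not None:
            raise exc  # re-raise errors from the source in the caller's thread
        if item is _DONE:
            return
        yield item
```

Wrapping the stream as for chunk in iter_with_stall_timeout(stream, 30.0): keeps the rest of the loop unchanged.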

When is it safe to retry?

This is the most important practical question, because retrying at the wrong time can result in two completed requests being billed.
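The rule of thumb from the table at the top of this page can be written down as a small policy function. This is a sketch under stated assumptions, not SDK behavior: the phase labels ("connect", "read", "write", "pool"), the duplicate_ok flag, and both function names are hypothetical. Connect, write, and pool timeouts are treated as safe to retry; a read timeout is retried only when the caller states that a duplicate completion is acceptable. Note that the SDK also retries on its own (the max_retries client option, 2 by default in v1.x), so code that wants to own this decision would usually construct the client with max_retries=0.

```python
import random

# Phases in which the request is assumed not to have been processed.
SAFE_PHASES = frozenset({"connect", "write", "pool"})


def should_retry(phase: str, attempt: int, max_attempts: int = 3,
                 duplicate_ok: bool = False) -> bool:
    """Decide whether to retry after a timeout in the given phase.

    attempt counts retries already made, starting at 0.
    """
    if attempt >= max_attempts:
        return False
    if phase in SAFE_PHASES:
        return True
    if phase == "read":
        # The server may still be generating (and billing) the first request.
        return duplicate_ok
    return False  # unknown phase: be conservative


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, in seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Sleeping for backoff_delay(attempt) between attempts spreads retries out, so many clients that timed out together do not all retry at the same instant.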

FAQ

Is an OpenAI API timeout the same as a 503 Service Unavailable? No. A timeout is a client-side condition — your HTTP client gave up waiting. A 503 is a server response meaning OpenAI accepted the connection but is currently unable to serve the request. Both can have similar causes (server overload), but they're handled differently. A 503 gives you an HTTP response to parse; a timeout gives you a Python exception with no HTTP status.

Does OpenAI support a request-level timeout separate from the SDK-level one? Yes — you can pass timeout= directly to any method call (e.g., client.chat.completions.create(..., timeout=60)), which overrides the client-level default for that single request. The value can be a float (seconds) or a full httpx.Timeout object for granular control.

Why am I getting timeouts only on o1/o3 but not on GPT-4o? Reasoning models run an internal chain-of-thought pass before producing output — this is where almost all of the latency lives. With high reasoning effort on a complex problem, o1 and o3 can take 2–5 minutes or more. This is expected. Set your timeout to at least 300–600 seconds for these models, and use streaming so your application stays responsive while waiting for output.

Last updated May 28, 2026. Timeout defaults and exception class names sourced from the OpenAI Python SDK v1.x source and httpx documentation. SDK internals can change across versions — verify against the installed SDK version in your environment.