Anthropic Message Batches API: Python guide (50% cheaper)

The Anthropic Message Batches API processes large sets of Messages requests asynchronously at 50% of the normal per-token price. A single batch can hold up to 100,000 requests or 256 MB, whichever limit is reached first. The trade-off is latency: results become available once every request in the batch has finished, which can take up to 24 hours, rather than in real time. This makes it the right choice for any workflow that doesn't need a live response: dataset labelling, bulk document classification, evaluation runs, nightly enrichment pipelines.

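The 30-second answer

Submit your requests with batches.create(), poll batches.retrieve() until processing_status is "ended", then iterate batches.results() and match each result to its input by custom_id. You pay half the synchronous per-token price in exchange for waiting up to 24 hours.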

Creating a batch

Build a list of Request objects — each wraps a custom_id (your identifier) and a params dict identical to what you'd pass to messages.create():

import anthropic
from anthropic.types.beta.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.beta.messages.batch_create_params import Request

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Build the request list
requests = []
texts_to_classify = [
    ("item_001", "I love this product, works perfectly."),
    ("item_002", "Terrible experience, broke after a week."),
    ("item_003", "Decent quality for the price."),
]

for item_id, text in texts_to_classify:
    requests.append(
        Request(
            custom_id=item_id,  # your own identifier, used later to match results
            # params takes the same fields you would pass to messages.create()
            params=MessageCreateParamsNonStreaming(
                model="claude-haiku-4-5",
                max_tokens=100,
                messages=[
                    {
                        "role": "user",
                        "content": f'Classify as POSITIVE, NEGATIVE, or NEUTRAL. Reply with one word only.\n\nText: "{text}"'
                    }
                ]
            )
        )
    )

# Submit the batch
batch = client.beta.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")  # 'in_progress'

Save the batch.id — you'll need it to poll and retrieve results. The batch starts processing immediately; you don't need to keep the connection open.
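Because polling may happen hours later and in a different process, it can help to write the ID to disk right after submission. The following is a minimal sketch, assuming a local JSON file is an acceptable store; the file name pending_batches.json and both helper names are hypothetical and not part of the SDK.

```python
import json
from pathlib import Path

# Hypothetical bookkeeping file; the name is an assumption, not an SDK convention.
STATE_FILE = Path("pending_batches.json")


def load_batch_ids(path: Path = STATE_FILE) -> list[str]:
    """Return the saved batch IDs, or an empty list if nothing was saved yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text())


def save_batch_id(batch_id: str, path: Path = STATE_FILE) -> None:
    """Append a batch ID to the JSON list on disk, skipping duplicates."""
    ids = load_batch_ids(path)
    if batch_id not in ids:
        ids.append(batch_id)
    path.write_text(json.dumps(ids, indent=2))
```

Calling save_batch_id(batch.id) right after create() would let a later job recover every outstanding batch with load_batch_ids().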

Polling for completion

Poll the batch status until processing_status is "ended". Don't poll too aggressively — once a minute is sufficient for most batch sizes:

import time

batch_id = batch.id

while True:
    batch = client.beta.messages.batches.retrieve(batch_id)
    status = batch.processing_status

    print(f"Status: {status} | Counts: {batch.request_counts}")

    if status == "ended":
        break

    time.sleep(60)  # Poll every 60 seconds

print(f"Done. Succeeded: {batch.request_counts.succeeded}, "
      f"Errored: {batch.request_counts.errored}, "
      f"Expired: {batch.request_counts.expired}")

The request_counts object tracks processing, succeeded, errored, expired, and canceled counts. A batch moves to "ended" when all requests are in a terminal state, even if some failed.
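A bare while True loop never gives up, which can be awkward in a scheduled job. One possible variant is sketched below: it wraps the same polling logic in a function with a deadline. The helper name wait_for_batch, the 25-hour default (a margin over the 24-hour window), and the injectable sleep parameter are all assumptions made for illustration; only processing_status and the "ended" value come from the API as described above.

```python
import time
from typing import Any, Callable


def wait_for_batch(
    retrieve: Callable[[str], Any],
    batch_id: str,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 25 * 60 * 60,  # assumed margin over the 24h window
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll retrieve(batch_id) until processing_status is "ended".

    Raises TimeoutError if the batch has not ended before the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        batch = retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Batch {batch_id} is still {batch.processing_status!r}"
            )
        sleep(poll_seconds)
```

With the client from the earlier example, usage might look like wait_for_batch(client.beta.messages.batches.retrieve, batch.id). Passing the retrieve function in, rather than the client, keeps the helper easy to exercise with a fake in tests.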

Retrieving results

Once the batch ends, iterate results using the results() method. Each result has a custom_id, a result.type, and (for succeeded results) a result.message:

results_map = {}

for result in client.beta.messages.batches.results(batch_id):
    if result.result.type == "succeeded":
        # result.result.message is a full Message object
        text = result.result.message.content[0].text
        results_map[result.custom_id] = text
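    elif result.result.type == "errored":
        # error wraps the API error payload; print it whole rather than
        # assuming a particular attribute layout
        print(f"{result.custom_id} failed: {result.result.error}")
    elif result.result.type == "expired":
        print(f"{result.custom_id} expired (24h limit exceeded)")
    elif result.result.type == "canceled":
        print(f"{result.custom_id} was canceled before it was processed")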

# Match back to original items
for item_id, original_text in texts_to_classify:
    classification = results_map.get(item_id, "MISSING")
    print(f"{item_id}: {original_text[:40]}... → {classification}")

Results are not guaranteed to come back in the same order as requests, so always use custom_id to match responses to inputs. The results() method streams the batch's results file and yields one entry at a time, so you can iterate over a large batch without loading every result into memory.
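When some requests do not succeed, it is often useful to know which custom_id values are candidates for a follow-up batch. The sketch below separates the two groups. The helper name split_results is hypothetical, and it relies only on the attributes used in the example above (custom_id, result.type, result.message.content). Whether a given failure is worth resubmitting unchanged is a judgment the caller still has to make.

```python
from typing import Any, Iterable


def split_results(results: Iterable[Any]) -> tuple[dict[str, str], list[str]]:
    """Return (texts keyed by custom_id, custom_ids that did not succeed)."""
    succeeded: dict[str, str] = {}
    unfinished_ids: list[str] = []
    for entry in results:
        if entry.result.type == "succeeded":
            # Join text blocks only; other content block types carry no .text
            text = "".join(
                block.text
                for block in entry.result.message.content
                if getattr(block, "type", None) == "text"
            )
            succeeded[entry.custom_id] = text.strip()
        else:
            # errored, expired, or canceled
            unfinished_ids.append(entry.custom_id)
    return succeeded, unfinished_ids
```

For example, succeeded, unfinished = split_results(client.beta.messages.batches.results(batch_id)) would give a dict ready to join back onto the inputs plus a list of IDs to inspect.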

Cost comparison

Model               Synchronous (input / output)   Message Batches (input / output)
Claude Opus 4.6     $5 / $25 per MTok              $2.50 / $12.50 per MTok
Claude Sonnet 4.6   $3 / $15 per MTok              $1.50 / $7.50 per MTok
Claude Haiku 4.5    $1 / $5 per MTok               $0.50 / $2.50 per MTok

For a 10,000-item classification run with short prompts (~500 input tokens, ~10 output tokens each): total ~5M input + 100K output tokens. At Haiku synchronous rates that's $5.50 ($5.00 input + $0.50 output); at Batches rates it's $2.75. The discount is a flat 50%, so absolute savings grow linearly with token volume: larger batches and longer prompts save more dollars, not a larger percentage.
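The same arithmetic can be wrapped in a small function for other models and volumes. This is a rough sketch: the function name and parameters are illustrative, prices are passed in rather than hard-coded because they change over time, and the token counts are estimates rather than measured usage.

```python
def estimate_cost(
    n_requests: int,
    input_tokens_each: int,
    output_tokens_each: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,  # the 50% Batches discount described above
) -> tuple[float, float]:
    """Return (synchronous_cost, batch_cost) in dollars."""
    input_mtok = n_requests * input_tokens_each / 1_000_000
    output_mtok = n_requests * output_tokens_each / 1_000_000
    sync_cost = (
        input_mtok * input_price_per_mtok + output_mtok * output_price_per_mtok
    )
    return sync_cost, sync_cost * (1 - batch_discount)


# Same 10,000-item workload at the $3 / $15 rates listed for Claude Sonnet 4.6
sync_cost, batch_cost = estimate_cost(10_000, 500, 10, 3.00, 15.00)
print(f"sync ${sync_cost:.2f} vs batch ${batch_cost:.2f}")
# prints: sync $16.50 vs batch $8.25
```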

When to use batches vs synchronous calls

Use batches for                      Use synchronous for
Dataset labelling / annotation       Real-time chat or user-facing responses
Bulk document classification         Streaming responses
Evaluation / test set scoring        Interactive agentic loops
Nightly enrichment pipelines         Anything needing <1s latency
Large embedding / extraction jobs    Tool use with live results

FAQ

How much cheaper is the Batches API? 50% on both input and output tokens across all models. Results return within 24 hours.

Max requests per batch? 100,000 requests or 256 MB of total payload, whichever is reached first. Failed requests don't affect other requests in the batch.
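Workloads larger than the per-batch limit have to be split across several batches. A minimal sketch of that split follows; chunk_requests is a hypothetical helper, the limit is passed in as a parameter rather than hard-coded, and it only counts requests, so any limit on total payload size would need a separate check.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunk_requests(
    requests: Sequence[T], max_per_batch: int
) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most max_per_batch requests."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    for start in range(0, len(requests), max_per_batch):
        yield requests[start:start + max_per_batch]
```

Each yielded chunk could then be passed to batches.create() as its own batch, with every returned batch ID saved for polling. Keeping custom_id values unique across all chunks makes it possible to merge the results afterwards.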

How do I retrieve results? Poll retrieve(batch_id) until processing_status == "ended", then iterate results(batch_id). Match responses to inputs using custom_id.

Last updated May 28, 2026. Code examples verified against the Anthropic Python SDK and Message Batches API documentation. API behaviour may change — confirm against the official docs before deploying to production.