How much cheaper is the Anthropic Message Batches API?

The Message Batches API is 50% cheaper than synchronous API calls for both input and output tokens. For example, Claude Sonnet 4.5 is $3/$15 per MTok synchronously; via Message Batches it is $1.50/$7.50 per MTok. The trade-off is latency — results are returned asynchronously within 24 hours, not in real time.

How many requests can I include in a Message Batch?

Each batch can contain up to 10,000 requests. Each request in the batch is a separate message creation, with its own model, messages, max_tokens, and other parameters. Requests within a batch are processed independently — a failure on one does not affect others.

How do I retrieve results from an Anthropic Message Batch?

Poll client.beta.messages.batches.retrieve(batch_id) until processing_status is 'ended'. Then call client.beta.messages.batches.results(batch_id), which returns an iterable of per-request result entries (MessageBatchIndividualResponse objects in the Python SDK). Each entry has a custom_id matching the request, a result.type ('succeeded', 'errored', 'canceled', or 'expired'), and, for succeeded results, a result.message containing the full message response.

Anthropic Message Batches API: Python guide (50% cheaper)

The Anthropic Message Batches API processes up to 10,000 API calls asynchronously at 50% of the normal per-token price. The trade-off is latency — results come back within 24 hours, not in real time. This makes it the right choice for any workflow that doesn't need a live response: dataset labelling, bulk document classification, evaluation runs, nightly enrichment pipelines.

The 30-second answer

50% cheaper than synchronous API calls. Max 10,000 requests per batch. Results within 24 hours.
Create: client.beta.messages.batches.create(requests=[...]) — each item has a custom_id and a params dict (same as messages.create() params).
Poll: check batch.processing_status == "ended".
Retrieve: client.beta.messages.batches.results(batch_id) → iterate the per-request result entries, match by custom_id.

Creating a batch

Build a list of requests. Each one pairs a custom_id (your identifier) with a params dict identical to what you'd pass to messages.create():

import anthropic

client = anthropic.Anthropic()

# Build the request list
requests = []
texts_to_classify = [
    ("item_001", "I love this product, works perfectly."),
    ("item_002", "Terrible experience, broke after a week."),
    ("item_003", "Decent quality for the price."),
]

for item_id, text in texts_to_classify:
    # Plain dicts are accepted here; the SDK's request types are TypedDicts
    requests.append(
        {
            "custom_id": item_id,
            # Same fields you would pass to messages.create()
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 100,
                "messages": [
                    {
                        "role": "user",
                        "content": f'Classify as POSITIVE, NEGATIVE, or NEUTRAL. Reply with one word only.\n\nText: "{text}"',
                    }
                ],
            },
        }
    )

# Submit the batch
batch = client.beta.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")  # 'in_progress'

Save the batch.id — you'll need it to poll and retrieve results. The batch starts processing immediately; you don't need to keep the connection open.
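If you have more items than fit in one batch, you can split the request list before submitting. The sketch below is one way to do that, assuming each request is a plain dict with a custom_id key. The helper names chunk_requests and find_duplicate_ids are hypothetical, not part of the SDK, and the limit constant is the figure quoted in this guide, so confirm it against the current docs.

```python
from typing import Iterator, Sequence

# Per-batch limit as quoted in this guide (an assumption; check current docs)
MAX_REQUESTS_PER_BATCH = 10_000


def chunk_requests(
    requests: Sequence[dict], size: int = MAX_REQUESTS_PER_BATCH
) -> Iterator[list]:
    """Yield consecutive slices of `requests`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(requests), size):
        yield list(requests[start:start + size])


def find_duplicate_ids(requests: Sequence[dict]) -> set:
    """Return every custom_id that appears more than once."""
    seen, dupes = set(), set()
    for req in requests:
        cid = req["custom_id"]
        if cid in seen:
            dupes.add(cid)
        else:
            seen.add(cid)
    return dupes


# Hypothetical usage: one batches.create() call per chunk.
# batch_ids = [
#     client.beta.messages.batches.create(requests=chunk).id
#     for chunk in chunk_requests(all_requests)
# ]
```

Checking for duplicate custom_id values before submitting is cheap, and it matters because custom_id is the only key you have for matching results back to inputs.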

Polling for completion

Poll the batch status until processing_status is "ended". Don't poll too aggressively — once a minute is sufficient for most batch sizes:

import time

batch_id = batch.id

while True:
    batch = client.beta.messages.batches.retrieve(batch_id)
    status = batch.processing_status

    print(f"Status: {status} | Counts: {batch.request_counts}")

    if status == "ended":
        break

    time.sleep(60)  # Poll every 60 seconds

print(f"Done. Succeeded: {batch.request_counts.succeeded}, "
      f"Errored: {batch.request_counts.errored}, "
      f"Expired: {batch.request_counts.expired}")

The request_counts object tracks processing, succeeded, errored, expired, and canceled counts. A batch moves to "ended" when all requests are in a terminal state, even if some failed.
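A bare while True loop never gives up if something goes wrong upstream. As a hedged variation, the polling step can be wrapped in a helper with a timeout. In this sketch wait_for_batch is a hypothetical name, and the retrieve argument is assumed to be any callable that returns an object with a processing_status attribute, such as client.beta.messages.batches.retrieve. The sleep and clock parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_batch(
    retrieve: Callable[[str], object],
    batch_id: str,
    interval: float = 60.0,
    timeout: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
):
    """Poll retrieve(batch_id) until the batch ends or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        batch = retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        if clock() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still {batch.processing_status!r} "
                f"after {timeout} seconds"
            )
        sleep(interval)


# Hypothetical usage with the client from earlier in this guide:
# batch = wait_for_batch(client.beta.messages.batches.retrieve, batch.id)
```

Passing the retrieve callable in, rather than the client, keeps the helper independent of which SDK namespace you use.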

Retrieving results

Once the batch ends, iterate results using the results() method. Each result has a custom_id, a result.type ("succeeded", "errored", "canceled", or "expired"), and, for succeeded results, a result.message:

results_map = {}

for result in client.beta.messages.batches.results(batch_id):
    if result.result.type == "succeeded":
        # result.result.message is a full Message object
        text = result.result.message.content[0].text
        results_map[result.custom_id] = text
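    elif result.result.type == "errored":
        # Print the whole error payload rather than assuming how it is nested
        print(f"{result.custom_id} failed: {result.result.error}")
    elif result.result.type == "canceled":
        print(f"{result.custom_id} canceled before it was processed")
    elif result.result.type == "expired":
        print(f"{result.custom_id} expired (24h limit exceeded)")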

# Match back to original items
for item_id, original_text in texts_to_classify:
    classification = results_map.get(item_id, "MISSING")
    print(f"{item_id}: {original_text[:40]}... → {classification}")

Results are not guaranteed to come back in the same order as requests, so always use custom_id to match responses to inputs. The results() method streams the batch's results file and yields one entry at a time, so you can iterate over a large batch without loading every response into memory.
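For larger runs it can help to separate the successes from everything else in one pass, so you know which custom_id values to inspect or resubmit. The following is a minimal sketch, assuming each entry exposes custom_id, result.type, and (on success) result.message.content as described above. split_results is a hypothetical helper name, not an SDK function.

```python
from typing import Iterable


def split_results(results: Iterable) -> tuple:
    """Return ({custom_id: text}, {custom_id: result_type}) for ok / not-ok entries."""
    texts = {}
    unfinished = {}
    for entry in results:
        if entry.result.type == "succeeded":
            # Join text blocks; blocks without a .text attribute contribute nothing
            blocks = entry.result.message.content
            texts[entry.custom_id] = "".join(getattr(b, "text", "") for b in blocks)
        else:
            # Anything that did not succeed is recorded with its result type
            unfinished[entry.custom_id] = entry.result.type
    return texts, unfinished


# Hypothetical usage:
# texts, unfinished = split_results(client.beta.messages.batches.results(batch_id))
# retry_ids = [cid for cid, kind in unfinished.items() if kind == "expired"]
```

Whether an errored request is worth resubmitting unchanged depends on the error, so the sketch only records the result type and leaves that decision to the caller.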

Cost comparison

Model	Synchronous (input / output)	Message Batches (input / output)
Claude Opus 4.6	$5 / $25 per MTok	$2.50 / $12.50 per MTok
Claude Sonnet 4.6	$3 / $15 per MTok	$1.50 / $7.50 per MTok
Claude Haiku 4.5	$1 / $5 per MTok	$0.50 / $2.50 per MTok

For a 10,000-item classification run with short prompts (~500 input tokens, ~10 output tokens each), the totals are ~5M input and ~100K output tokens. At Haiku 4.5 synchronous rates that's $5.00 + $0.50 = $5.50; at Batches rates it's $2.75. The discount stays fixed at 50%, so absolute savings grow linearly with token volume: more requests and longer prompts save more dollars, not a larger percentage.
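The same arithmetic can be sketched as a small function so you can plug in your own volumes and prices. estimate_cost is a hypothetical helper, and the prices are passed in as arguments rather than hard-coded because rates change. The example uses the $3 / $15 per MTok Sonnet rates quoted in this guide.

```python
BATCH_DISCOUNT = 0.5  # the 50% Batches discount described in this guide


def estimate_cost(
    n_requests: int,
    input_tokens_each: int,
    output_tokens_each: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch: bool = False,
) -> float:
    """Estimate the total cost in dollars for a run of similar requests."""
    input_cost = n_requests * input_tokens_each / 1_000_000 * input_price_per_mtok
    output_cost = n_requests * output_tokens_each / 1_000_000 * output_price_per_mtok
    total = input_cost + output_cost
    return total * BATCH_DISCOUNT if batch else total


# 10,000 requests, ~500 input and ~10 output tokens each, at $3 / $15 per MTok
sync_cost = estimate_cost(10_000, 500, 10, 3.0, 15.0)
batch_cost = estimate_cost(10_000, 500, 10, 3.0, 15.0, batch=True)
print(f"sync ${sync_cost:.2f} vs batch ${batch_cost:.2f}")
```

With these inputs the input side dominates (5M input tokens against 100K output tokens), which is typical of classification workloads with one-word answers.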

When to use batches vs synchronous calls

Use batches for	Use synchronous for
Dataset labelling / annotation	Real-time chat or user-facing responses
Bulk document classification	Streaming responses
Evaluation / test set scoring	Interactive agentic loops
Nightly enrichment pipelines	Anything needing <1s latency
Large extraction jobs	Tool use with live results

FAQ

How much cheaper is the Batches API? 50% on both input and output tokens across all models. Results return within 24 hours.

Max requests per batch? 10,000. Failed requests don't affect other requests in the batch.

How do I retrieve results? Poll retrieve(batch_id) until processing_status == "ended", then iterate results(batch_id). Match responses to inputs using custom_id.

Last updated May 28, 2026. Code examples verified against the Anthropic Python SDK and Message Batches API documentation. API behaviour may change — confirm against the official docs before deploying to production.