Anthropic Message Batches API: Python guide (50% cheaper)
The Anthropic Message Batches API processes large sets of Messages API requests asynchronously, up to 100,000 requests (or 256 MB) per batch, at 50% of the normal per-token price. The trade-off is latency: results come back within 24 hours, not in real time. This makes it the right choice for any workflow that doesn't need a live response, such as dataset labelling, bulk document classification, evaluation runs, and nightly enrichment pipelines.
The 30-second answer
- 50% cheaper than synchronous API calls. Max 100,000 requests (or 256 MB) per batch. Results within 24 hours.
- Create: client.beta.messages.batches.create(requests=[...]). Each item has a custom_id and a params dict (same as messages.create() params).
- Poll: check batch.processing_status == "ended".
- Retrieve: client.beta.messages.batches.results(batch_id), then iterate the result objects and match by custom_id.
Creating a batch
Build a list of request dicts. Each one carries a custom_id (your identifier, unique within the batch) and a params dict identical to what you'd pass to messages.create():
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Items to classify, as (custom_id, text) pairs
texts_to_classify = [
    ("item_001", "I love this product, works perfectly."),
    ("item_002", "Terrible experience, broke after a week."),
    ("item_003", "Decent quality for the price."),
]

# Build the request list. Plain dicts are used here: the SDK's request
# parameter types are TypedDicts, so dicts of the same shape are accepted.
requests = []
for item_id, text in texts_to_classify:
    requests.append(
        {
            "custom_id": item_id,
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 100,
                "messages": [
                    {
                        "role": "user",
                        "content": f'Classify as POSITIVE, NEGATIVE, or NEUTRAL. Reply with one word only.\n\nText: "{text}"',
                    }
                ],
            },
        }
    )

# Submit the batch
batch = client.beta.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")  # 'in_progress'
Save the batch.id — you'll need it to poll and retrieve results. The batch starts processing immediately; you don't need to keep the connection open.
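Because custom_id is the only link between a result and its input, it helps to check for duplicate IDs before submitting, and to split very large jobs into several batches. The following is a minimal offline sketch, not part of the SDK: build_requests and chunk are hypothetical helper names, and max_per_batch is a value you would set from the per-batch limit in the current documentation.

```python
def build_requests(items, model, max_tokens=100):
    """Turn (custom_id, text) pairs into batch request dicts, rejecting duplicate IDs."""
    seen = set()
    requests = []
    for custom_id, text in items:
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id}")
        seen.add(custom_id)
        requests.append(
            {
                "custom_id": custom_id,
                "params": {
                    "model": model,
                    "max_tokens": max_tokens,
                    "messages": [{"role": "user", "content": text}],
                },
            }
        )
    return requests


def chunk(requests, max_per_batch):
    """Yield consecutive slices of at most max_per_batch requests."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    for start in range(0, len(requests), max_per_batch):
        yield requests[start:start + max_per_batch]


# Offline demonstration with a deliberately tiny limit
items = [("item_001", "first text"), ("item_002", "second text"), ("item_003", "third text")]
reqs = build_requests(items, model="claude-haiku-4-5")
batches = list(chunk(reqs, max_per_batch=2))
print([len(b) for b in batches])  # [2, 1]
```

Each slice would then be passed to batches.create(requests=...) separately, and each returned batch.id stored alongside the IDs it covers.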
Polling for completion
Poll the batch status until processing_status is "ended". Don't poll too aggressively — once a minute is sufficient for most batch sizes:
import time

batch_id = batch.id

while True:
    batch = client.beta.messages.batches.retrieve(batch_id)
    status = batch.processing_status
    print(f"Status: {status} | Counts: {batch.request_counts}")
    if status == "ended":
        break
    time.sleep(60)  # Poll every 60 seconds

print(f"Done. Succeeded: {batch.request_counts.succeeded}, "
      f"Errored: {batch.request_counts.errored}, "
      f"Expired: {batch.request_counts.expired}")
The request_counts object tracks processing, succeeded, errored, expired, and canceled counts. A batch moves to "ended" when all requests are in a terminal state, even if some failed.
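A bare while True loop never gives up, which can be awkward in a scheduled job. The sketch below wraps the same polling logic with a timeout. It is an illustration under stated assumptions: wait_for_batch is a hypothetical helper, and the retrieve function, sleep function, and clock are injected so the loop can be exercised offline with a fake.

```python
import time
from types import SimpleNamespace


def wait_for_batch(retrieve, batch_id, interval=60.0, timeout=24 * 60 * 60,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll retrieve(batch_id) until processing_status is "ended" or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        batch = retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        if clock() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still {batch.processing_status!r} after {timeout}s"
            )
        sleep(interval)


# Offline demonstration: a fake retrieve() that ends on the third call
_statuses = iter(["in_progress", "in_progress", "ended"])


def fake_retrieve(batch_id):
    return SimpleNamespace(id=batch_id, processing_status=next(_statuses))


naps = []  # records each requested sleep instead of actually sleeping
final = wait_for_batch(fake_retrieve, "batch_demo", interval=60, sleep=naps.append)
print(final.processing_status, naps)  # ended [60, 60]
```

With a real client, the call would be wait_for_batch(client.beta.messages.batches.retrieve, batch.id).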
Retrieving results
Once the batch ends, iterate results using the results() method. Each result has a custom_id, a result.type, and (for succeeded results) a result.message:
results_map = {}
for result in client.beta.messages.batches.results(batch_id):
    if result.result.type == "succeeded":
        # result.result.message is a full Message object
        text = result.result.message.content[0].text
        results_map[result.custom_id] = text
    elif result.result.type == "errored":
        # Print the whole error payload rather than assuming its inner fields
        print(f"{result.custom_id} failed: {result.result.error}")
    elif result.result.type == "canceled":
        print(f"{result.custom_id} was canceled before it was processed")
    elif result.result.type == "expired":
        print(f"{result.custom_id} expired (24h limit exceeded)")

# Match back to original items
for item_id, original_text in texts_to_classify:
    classification = results_map.get(item_id, "MISSING")
    print(f"{item_id}: {original_text[:40]}... → {classification}")
Results are not guaranteed to come back in the same order as requests, so always use custom_id to match responses to inputs. The results() method returns an iterable that yields one result at a time, so you can process a large batch without loading every result into memory first.
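For a classification job it is usually worth normalising the model's reply and collecting the IDs that need a second look. The sketch below shows one way to do that. It is an assumption-laden illustration: summarize and VALID_LABELS are hypothetical names, and the fake entries only mimic the custom_id, result.type, and result.message fields described above so the example runs offline.

```python
from types import SimpleNamespace

VALID_LABELS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


def summarize(results):
    """Return (labels, follow_up_ids) from an iterable of batch result entries."""
    labels = {}
    follow_up_ids = []
    for entry in results:
        if entry.result.type == "succeeded":
            raw = entry.result.message.content[0].text
            label = raw.strip().rstrip(".").upper()
            if label in VALID_LABELS:
                labels[entry.custom_id] = label
            else:
                follow_up_ids.append(entry.custom_id)  # reply was not a clean label
        else:
            follow_up_ids.append(entry.custom_id)  # errored, canceled, or expired
    return labels, follow_up_ids


def fake_entry(custom_id, kind, text=None):
    """Build a stand-in result entry with only the fields summarize() reads."""
    message = SimpleNamespace(content=[SimpleNamespace(text=text)]) if text is not None else None
    return SimpleNamespace(custom_id=custom_id, result=SimpleNamespace(type=kind, message=message))


entries = [
    fake_entry("item_002", "succeeded", "negative.\n"),
    fake_entry("item_001", "succeeded", "POSITIVE"),
    fake_entry("item_003", "errored"),
    fake_entry("item_004", "succeeded", "It seems fine overall"),
]
labels, follow_up_ids = summarize(entries)
print(labels)         # {'item_002': 'NEGATIVE', 'item_001': 'POSITIVE'}
print(follow_up_ids)  # ['item_003', 'item_004']
```

With a real batch, summarize(client.beta.messages.batches.results(batch_id)) would take the place of the fake entries. Whether a follow-up ID should be resubmitted depends on why it failed, so inspect errored entries before retrying them.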
Cost comparison
| Model | Synchronous (input / output) | Message Batches (input / output) |
|---|---|---|
| Claude Opus 4.6 | $5 / $25 per MTok | $2.50 / $12.50 per MTok |
| Claude Sonnet 4.6 | $3 / $15 per MTok | $1.50 / $7.50 per MTok |
| Claude Haiku 4.5 | $1 / $5 per MTok | $0.50 / $2.50 per MTok |
For a 10,000-item classification run with short prompts (~500 input tokens, ~10 output tokens each), the totals are ~5M input and ~100K output tokens. At Haiku 4.5 synchronous rates that's $5.00 + $0.50 = $5.50; at Batches rates it's $2.75. The discount is a flat 50%, so absolute savings grow linearly with token volume: larger runs and longer prompts save more dollars, not a larger percentage.
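The arithmetic above can be reproduced for any workload with a few lines of code. This is a rough estimator only: run_cost is a hypothetical helper, the per-MTok rates in the example are placeholder values rather than quoted prices, and the 50% figure is taken from the text above.

```python
def run_cost(n_requests, input_tokens_each, output_tokens_each,
             input_price_per_mtok, output_price_per_mtok, batch_discount=0.5):
    """Return (synchronous_cost, batch_cost) in dollars for a uniform workload."""
    input_mtok = n_requests * input_tokens_each / 1_000_000
    output_mtok = n_requests * output_tokens_each / 1_000_000
    sync = input_mtok * input_price_per_mtok + output_mtok * output_price_per_mtok
    return sync, sync * (1 - batch_discount)


# 10,000 requests, ~500 input and ~10 output tokens each, placeholder rates
sync, batched = run_cost(10_000, 500, 10,
                         input_price_per_mtok=1.00, output_price_per_mtok=5.00)
print(f"sync ${sync:.2f}, batch ${batched:.2f}")  # sync $5.50, batch $2.75
```

Substitute the current rates for your model before relying on the output.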
When to use batches vs synchronous calls
| Use batches for | Use synchronous for |
|---|---|
| Dataset labelling / annotation | Real-time chat or user-facing responses |
| Bulk document classification | Streaming responses |
| Evaluation / test set scoring | Interactive agentic loops |
| Nightly enrichment pipelines | Anything needing <1s latency |
| Large extraction jobs | Tool use with live results |
FAQ
How much cheaper is the Batches API? 50% on both input and output tokens across all models. Results return within 24 hours.
Max requests per batch? 100,000, or 256 MB of request data, whichever is reached first. Each request is processed independently, so a failed request doesn't affect the others in the batch.
How do I retrieve results? Poll retrieve(batch_id) until processing_status == "ended", then iterate results(batch_id). Match responses to inputs using custom_id.
Last updated May 28, 2026. Code examples verified against the Anthropic Python SDK and Message Batches API documentation. API behaviour may change — confirm against the official docs before deploying to production.