How to send images to the Claude API (vision): Python guide
Claude's vision capability lets you include images alongside text in API requests using multimodal content blocks. This guide covers both source types (base64 and URL), the correct message structure, multiple images in one request, and practical limits you need to know before building image-processing pipelines.
The 30-second answer
- Message structure: set content to a list of blocks. Each block is either {"type": "image", "source": {...}} or {"type": "text", "text": "..."}.
- Base64 source: {"type": "base64", "media_type": "image/jpeg", "data": "<b64string>"}
- URL source: {"type": "url", "url": "https://..."}. The URL must be publicly accessible.
- Limits: up to 100 images per API request, 5 MB per image, JPEG/PNG/GIF/WebP only. Images are billed as input tokens.
Base64 image input
Read the image file, base64-encode it, and pass it in the source object. This is the most reliable approach for images you control:
import anthropic
import base64

client = anthropic.Anthropic()

# Load and encode the image
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe what you see in this screenshot."
                }
            ]
        }
    ]
)

# The reply is a list of content blocks; the first one holds the text
print(message.content[0].text)
The media_type must match the actual image format — use "image/jpeg", "image/png", "image/gif", or "image/webp". Claude does not auto-detect format from the binary data; providing the wrong media_type will cause an error.
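Because the declared media_type has to agree with the file's actual bytes, one option is to derive it from the file signature instead of trusting the extension. The sketch below is only an illustration: sniff_media_type and image_block_from_bytes are hypothetical helper names, not part of the Anthropic SDK, and the signature checks cover just the four formats listed above.

```python
import base64


def sniff_media_type(data: bytes) -> str:
    """Guess the media type from the leading bytes (file signature)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container whose form type at bytes 8..11 is "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported or unrecognised image format")


def image_block_from_bytes(data: bytes) -> dict:
    """Hypothetical helper: wrap raw image bytes as a base64 image block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": sniff_media_type(data),
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }
```

The returned dict can be placed directly in the content list in place of a hand-written image block, so a file saved with the wrong extension is caught locally rather than by the API.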
URL image input
For publicly accessible images, pass the URL directly. Anthropic fetches the image at request time:
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/240px-PNG_transparency_demonstration_1.png"
                    }
                },
                {
                    "type": "text",
                    "text": "What is in this image?"
                }
            ]
        }
    ]
)
URL sources must be publicly reachable without authentication — signed URLs with short expiry times may fail if Anthropic's servers fetch them with latency. For anything behind auth, use base64.
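For an image behind authentication, one way to follow that advice is to download it yourself and send the bytes as base64. This is a minimal sketch using only the standard library. The name fetch_image_block, the bearer-token header scheme, and the injectable opener parameter are assumptions made for illustration; adapt them to however your storage actually authenticates.

```python
import base64
import urllib.request


def fetch_image_block(url: str, token: str, media_type: str,
                      opener=urllib.request.urlopen) -> dict:
    """Hypothetical helper: download a private image and wrap it as base64."""
    # Assumption: the image host accepts a bearer token in the Authorization header
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with opener(request, timeout=30) as response:
        data = response.read()
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }
```

The opener argument exists only so the download step can be swapped out in tests; in normal use the default urllib.request.urlopen is used.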
Multiple images in one request
Include multiple image blocks in the same content list. The API accepts up to 100 images per request. You can interleave text and image blocks freely:
messages=[
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Compare these two product screenshots:"
            },
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_1_b64
                }
            },
            {
                "type": "text",
                "text": "vs."
            },
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_2_b64
                }
            },
            {
                "type": "text",
                "text": "Which UI is cleaner? List the differences."
            }
        ]
    }
]
Claude maintains positional context — it can distinguish "the first image" from "the second image" in its response. For document workflows, you can send multiple page screenshots in a single request.
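When the number of images varies, the content list can be built in a loop. The sketch below also puts a short text label before each image, which may make references such as "Image 2" less ambiguous. The helper name build_labeled_content and the label wording are assumptions for illustration, not something the API requires.

```python
def build_labeled_content(images: list[tuple[str, str]], question: str) -> list[dict]:
    """Hypothetical helper: interleave 'Image N:' labels with image blocks.

    images is a list of (media_type, base64_data) pairs, in display order.
    """
    content = []
    for index, (media_type, data) in enumerate(images, start=1):
        # A text label before each image gives the model a name to refer to
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        })
    # The actual question goes last, after all the images
    content.append({"type": "text", "text": question})
    return content
```

The result is passed as the content value of a single user message, for example {"role": "user", "content": build_labeled_content(pages, "Summarise these pages.")}, where pages is your own list of encoded images.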
Practical limits and cost
| Limit | Value |
|---|---|
| Max images per request | 100 (API) |
| Max image file size | 5 MB (API) |
| Supported formats | JPEG, PNG, GIF, WebP |
| Max useful resolution | Long edge up to 1568 px and about 1.15 megapixels (larger images are downscaled) |
| Approximate token cost (small image, ~512 × 512 px) | ~350 input tokens |
| Approximate token cost (large image, ~1092 × 1092 px) | ~1,600 input tokens |
Images are billed as input tokens at the same rate as text, and the token count is roughly width × height / 750, measured in pixels. At claude-sonnet-4-5 pricing ($3/MTok input), a large image costs roughly $0.005. For high-volume image processing, resize images before sending: anything above the threshold is downscaled anyway, so the extra pixels add upload time and latency without improving results, and below the threshold smaller images cost fewer tokens.
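The resize advice and the cost arithmetic can be sketched in a few lines. This is an illustration only: fit_within and input_cost_usd are hypothetical helper names, the 1568 px cap and the $3/MTok default come from the figures quoted in this guide, and the token count is something you supply (for example from the usage field of a real response).

```python
def fit_within(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Return dimensions scaled so the long edge is at most max_edge.

    Aspect ratio is preserved; images already small enough are unchanged.
    """
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    scale = max_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))


def input_cost_usd(tokens: int, usd_per_mtok: float = 3.0) -> float:
    """Convert an input token count to dollars at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_mtok


# Target size for a 4000 x 3000 photo before upload
print(fit_within(4000, 3000))
```

The target dimensions can be handed to whatever image library you already use for the actual resize; this sketch deliberately stops at the arithmetic.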
Working with PDFs (base64)
For PDF analysis, use the document block type rather than the image block type:
with open("report.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Summarise the key findings."
                }
            ]
        }
    ]
)
PDF support uses the document block type, not the image block type. The PDF is converted to images internally. PDFs have their own limits: 32 MB for the whole request and 100 pages per document. Multi-page PDFs work, and Claude reads the pages in sequence.
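If several parts of a pipeline build PDF requests, the document block can be produced by a small helper that also rejects files that are obviously not PDFs. This is a sketch under stated assumptions: pdf_document_block is a hypothetical name, and the %PDF- signature check is a cheap sanity test, not a full validation of the file.

```python
import base64


def pdf_document_block(data: bytes) -> dict:
    """Hypothetical helper: wrap raw PDF bytes as a base64 document block."""
    # PDF files begin with the ASCII signature "%PDF-"
    if not data.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }
```

The returned block takes the place of the hand-written document block in the content list shown above.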
FAQ
What image formats does Claude support? JPEG, PNG, GIF, WebP. Maximum 5 MB per image and up to 100 images per request through the API.
Base64 vs URL? Use base64 for images you control and for production. Use URL for publicly accessible images where fetching latency is acceptable. Signed/auth-gated URLs are unreliable.
How much does image input cost? Images are billed as input tokens, roughly width × height / 750: about 350 tokens for a 512 × 512 px image and about 1,600 for an image at the downscaling threshold. Resize so the long edge is at most ~1568 px before sending; larger images are downscaled and gain no quality.
Last updated May 28, 2026. Code examples verified against the Anthropic Python SDK and Claude API documentation. API behaviour may change — confirm against the official docs before deploying to production.