API how-to guides: cost & configuration
Step-by-step guides for the configuration and cost-control tasks that the official docs bury — focused on the Claude and OpenAI APIs, verified against current documentation, with the trade-offs and gotchas spelled out.
Cost & billing
- Anthropic prompt caching — cut Claude API input costs ~90%: setup, 5-minute vs 1-hour TTL, the exact pricing math, and why caches miss.
- Set a hard spend limit on the OpenAI API — the prepaid-credits setting that creates a true ceiling, plus alerts and the bill-spike traps.
Streaming responses
- Claude API streaming (Python & TypeScript) — context manager pattern, raw SSE events, async streaming with
AsyncAnthropic, tool use mid-stream, and error handling inside the stream loop. - OpenAI API streaming (Python & TypeScript) —
stream=Truedelta pattern, context manager, async streaming, tool call streaming, andstream_optionsfor usage data.
Tool use & function calling
- Claude tool use (function calling) — complete 4-step tool loop,
input_schemadefinition,tool_choiceoptions, parallel calls, and theis_errorpattern. - OpenAI function calling — tools API (vs deprecated
functions), full loop with message threading,tool_choice, parallel calls, strict mode withadditionalProperties: false. - OpenAI structured outputs —
json_schemaresponse format, strict mode constraints, Pydantic integration viaparse(), refusal handling, and comparison with the oldjson_mode.
Multimodal & vision
- Claude image input (vision) — base64 and URL sources, multimodal content blocks, multiple images per request, token cost by image size, and PDF document input.
Audio & speech
- OpenAI Whisper API — transcribe audio in Python, word-level timestamps, response formats (SRT, VTT, verbose JSON), translation endpoint, handling files over 25 MB.
Batch processing & async
- Anthropic Message Batches API — 50% cheaper for async workloads: create a batch, poll for completion, retrieve results by custom_id, and cost comparison table across models.
Token counting
- OpenAI token counting (tiktoken) — count tokens before sending, message overhead formula, encoding selection by model, truncation to a token budget, cost estimation.
- Anthropic token counting —
count_tokens()live API call, system prompt and tool counting, context limit check, cost estimation, comparison with tiktoken.
Hitting errors instead? See our API error fixes.