API how-to guides: cost & configuration

Step-by-step guides for the configuration and cost-control tasks that the official docs bury — focused on the Claude and OpenAI APIs, verified against current documentation, with the trade-offs and gotchas spelled out.

Cost & billing

Anthropic prompt caching — cut Claude API input costs ~90%: setup, 5-minute vs 1-hour TTL, the exact pricing math, and why caches miss.
Set a hard spend limit on the OpenAI API — the prepaid-credits setting that creates a true ceiling, plus alerts and the bill-spike traps.

Streaming responses

Claude API streaming (Python & TypeScript) — context manager pattern, raw SSE events, async streaming with AsyncAnthropic, tool use mid-stream, and error handling inside the stream loop.
OpenAI API streaming (Python & TypeScript) — stream=True delta pattern, context manager, async streaming, tool call streaming, and stream_options for usage data.

Tool use & function calling

Claude tool use (function calling) — complete 4-step tool loop, input_schema definition, tool_choice options, parallel calls, and the is_error pattern.
OpenAI function calling — tools API (vs deprecated functions), full loop with message threading, tool_choice, parallel calls, strict mode with additionalProperties: false.
OpenAI structured outputs — json_schema response format, strict mode constraints, Pydantic integration via parse(), refusal handling, and comparison with the old json_mode.

Multimodal & vision

Claude image input (vision) — base64 and URL sources, multimodal content blocks, multiple images per request, token cost by image size, and PDF document input.

Audio & speech

OpenAI Whisper API — transcribe audio in Python, word-level timestamps, response formats (SRT, VTT, verbose JSON), translation endpoint, handling files over 25 MB.

Batch processing & async

Anthropic Message Batches API — 50% cheaper for async workloads: create a batch, poll for completion, retrieve results by custom_id, and cost comparison table across models.

Token counting

OpenAI token counting (tiktoken) — count tokens before sending, message overhead formula, encoding selection by model, truncation to a token budget, cost estimation.
Anthropic token counting — count_tokens() live API call, system prompt and tool counting, context limit check, cost estimation, comparison with tiktoken.

Hitting errors instead? See our API error fixes.