Google Gemini API pricing (2026): the per-token cost table
Short answer: Per million tokens, the Gemini API costs $0.10 in / $0.40 out for Gemini 2.5 Flash-Lite (the cheapest model from any major provider), $0.30 / $2.50 for Gemini 2.5 Flash, and $1.25 / $10 for Gemini 2.5 Pro (up to 200K tokens; $2.50 / $15 above that). Newer options: Gemini 3.5 Flash $1.50 / $9 and Gemini 3.1 Pro Preview $2 / $12. The Batch API takes 50% off and context caching cuts cached input ~90%. Verified against Google's official pricing page, June 2026.
Gemini API model pricing (per 1M tokens)
| Model | Input | Output | Best for |
|---|---|---|---|
| Gemini 2.5 Flash-Lite (cheapest) | $0.10 | $0.40 | High-volume classification, routing, extraction |
| Gemini 2.5 Flash | $0.30 | $2.50 | Best value workhorse for most tasks |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | Newer lite tier, stronger reasoning |
| Gemini 3.5 Flash | $1.50 | $9.00 | Latest Flash; higher quality, native thinking |
| Gemini 2.5 Pro (≤200K) | $1.25 | $10.00 | Flagship reasoning at low cost |
| Gemini 2.5 Pro (>200K) | $2.50 | $15.00 | Long-context flagship |
| Gemini 3.1 Pro Preview (≤200K) | $2.00 | $12.00 | Newest Pro preview; hardest reasoning |
Prices are USD per million tokens (MTok), standard paid tier. Flash and Flash-Lite use a single flat rate; Gemini 2.5 Pro and 3.1 Pro Preview charge more above 200K tokens. Note: Gemini 2.0 Flash and 2.0 Flash-Lite were shut down on June 1, 2026 — migrate to the 2.5 line.
What is the cheapest Gemini model?
Gemini 2.5 Flash-Lite, at $0.10 / $0.40 per million tokens, is the cheapest Gemini model — and the cheapest production model from any major LLM provider. Three stacking levers cut it further:
- Batch API: a flat 50% discount — $0.05 in / $0.20 out per million.
- Context caching: cached input is about $0.01 per million (roughly 90% off), plus a small per-hour storage charge.
- Right-sizing: reserve 2.5 Pro for hard reasoning and route everything else to Flash-Lite or Flash.
For how Gemini stacks up against the other providers, see cheapest LLM API: OpenAI vs Anthropic vs Gemini.
Batch API and context caching discounts
| Feature | Discount | Applies to |
|---|---|---|
| Batch API | 50% off in & out | Async jobs (24-hour SLA), all models |
| Context caching | ~90% off cached input | Repeated context, plus per-hour storage fee |
| Free tier | No charge (rate-limited) | Testing and low-volume use |
Worked example: cost of a typical Gemini 2.5 Flash request
A request that sends 20,000 input tokens and generates 2,000 output tokens on Gemini 2.5 Flash:
- Input: 20,000 × $0.30 / 1,000,000 = $0.0060
- Output: 2,000 × $2.50 / 1,000,000 = $0.0050
- Total: $0.011 per request
The same request on Gemini 2.5 Flash-Lite costs (20,000 × $0.10 + 2,000 × $0.40) / 1,000,000 = $0.0028 — about 4x cheaper, which is why Flash-Lite is the right pick for high-volume, low-complexity work.
How Gemini API pricing compares
Gemini is the price leader across the budget and mid tiers in 2026. Gemini 2.5 Flash-Lite ($0.10/$0.40) undercuts OpenAI's GPT-5.4-nano ($0.20/$1.25) and Anthropic's Claude Haiku 4.5 ($1/$5); Gemini 2.5 Flash ($0.30/$2.50) beats GPT-5.4-mini ($0.75/$4.50). Even the flagship Gemini 2.5 Pro ($1.25 input) is cheaper than GPT-5.5 and Claude Opus 4.8 (both $5 input). Full breakdown: Cheapest LLM API, OpenAI API pricing, and Anthropic API pricing.