Google’s Gemini API just got harder to use for free, and cheaper to use at scale.
Updated May 15, 2026 — Gemini 3.1 Flash-Lite moved from preview to general availability on May 7, 2026, with confirmed pricing of $0.25 / $1.50 per million input/output tokens. We’ve also added a new section on context caching (the easy 90% discount most devs miss), refreshed free-tier rate limits to current values, and pinned what to watch for at Google I/O on May 19-20. Original guide published April 3, 2026.
On April 1, 2026, Google enforced mandatory spending caps across all billing tiers, restricted Pro models behind a paywall for free users, and introduced prepaid billing for new accounts. If you’ve been running on the free tier, your rate limits probably got cut. And if you haven’t set up billing yet, you might be locked out of models you were using last month.
But here’s the thing most coverage misses: even with these changes, Gemini is still the cheapest major AI API for most use cases. Flash-Lite costs $0.10 per million input tokens (2.5 generation) or $0.25 (3.1 generation, now GA). On input price, that’s roughly 12-30x cheaper than Claude Sonnet 4.6 ($3.00) and 10-25x cheaper than OpenAI’s GPT-5.4 ($2.50).
The pricing just got more complicated. Let’s sort it out.
What Is the Gemini API?
If you’re not a developer, here’s the short version: an API (Application Programming Interface) is a way for software to talk to AI. When you use ChatGPT through its website, that’s the consumer product. When a company builds AI into their own app — an email assistant, a document analyzer, a chatbot on their website — they use the API.
The Gemini API lets developers build with Google’s AI models. It’s how apps like custom chatbots, content tools, and data analysis pipelines access Gemini’s brains.
For developers reading this: you already know what an API is. Skip to the pricing table.
What Changed on April 1, 2026
Three big shifts:
1. Mandatory spending caps by tier. Google now enforces maximum monthly spend at the billing account level. You can’t exceed your tier’s cap even if you want to.
| Tier | Monthly Cap | How to Reach |
|---|---|---|
| Tier 1 | $250/mo | Enable billing (default) |
| Tier 2 | $2,000/mo | $100 spend + 3 days (was $250 + 30 days) |
| Tier 3 | $20,000-$100,000+/mo | Contact Google |
Hit your cap? Your API pauses until next month — or until you upgrade tiers. No surprise bills. That’s actually a good change for developers who’ve been burned by runaway API costs.
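Because the cap pauses the API instead of billing past it, it helps to know roughly when in the month the pause would hit. Here is a minimal sketch of that arithmetic. The function name and the flat daily burn rate are assumptions for illustration; the cap values come from the table above.

```python
import math
from typing import Optional


def days_until_pause(cap_usd: float, spent_usd: float, daily_spend_usd: float) -> Optional[int]:
    """Estimate how many days of spend remain before the monthly cap is reached.

    Assumes a constant daily burn rate (a simplification; real traffic is spiky).
    Returns 0 if the cap is already reached and None if nothing is being spent.
    """
    remaining = cap_usd - spent_usd
    if remaining <= 0:
        return 0
    if daily_spend_usd <= 0:
        return None
    # Round up: the cap is hit partway through the final day.
    return math.ceil(remaining / daily_spend_usd)


# Tier 1 cap of $250 with $180 already spent and about $20/day of traffic.
print(days_until_pause(250.0, 180.0, 20.0))
```

If the result is smaller than the number of days left in the billing month, that is the signal to cut traffic or start the tier upgrade early.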
2. Free tier restricted to Flash models only. Before April, free-tier users could access Gemini Pro. Now, Pro requires either a paid API key or a Google AI Pro/Ultra subscription. The free tier still works — but only with Flash and Flash-Lite models.
3. Prepaid billing for new users. This one landed slightly ahead of the April 1 changes: since March 23, 2026, new AI Studio accounts may be required to use prepaid billing, meaning you buy credits first and draw them down as you go. Existing accounts aren’t affected (yet).
The Free Tier: What You Still Get
The free tier isn’t dead. But it’s smaller than it used to be. Google cut free quotas by 50-80% back in December 2025, and the April changes removed Pro model access entirely.
Here’s what free-tier developers get right now (rate limits per Google’s official rate-limits page, current as of May 2026):
| Model | Requests/Min | Requests/Day | Tokens/Min |
|---|---|---|---|
| Gemini 2.5 Flash-Lite / 3.1 Flash-Lite | 30 | 1,500 | 1,000,000 |
| Gemini 2.5 Flash / 3 Flash (preview) | 15 | 1,500 | 1,000,000 |
| Gemini 2.5 Pro | 5 | 50 | 250,000 |
All models get full access to the 1-million-token context window — that hasn’t changed.
Heads up on the May 2026 numbers: if you’ve seen older guides quoting 10 RPM / 250 RPD for Flash or 100 RPD for Pro, those were the April reductions. Google quietly raised RPM and TPM ceilings for Flash models as part of the 3.1 Flash-Lite GA rollout on May 7, 2026 — Pro free-tier requests per day got cut further (100 → 50) to push heavy reasoning use to paid tiers. The Flash bump is good news; the Pro cut is a deliberate paywall nudge.
Is the free tier enough? For prototyping, personal projects, and low-traffic apps, yes. Flash-Lite at 1,500 requests/day and 1M TPM handles a surprisingly useful workload — well past the threshold where most “weekend project” apps live. But for anything production-grade or user-facing with more than a handful of daily users, you’ll need to enable billing.
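To make "is the free tier enough?" concrete, a rough check like the one below compares a workload against the Flash limits from the table. This is a sketch under assumptions: the short model keys, the function name, and the idea of estimating tokens per minute as peak requests times average tokens are all illustrative, not part of any Google tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeTierLimit:
    requests_per_minute: int
    requests_per_day: int
    tokens_per_minute: int


# Values copied from the free-tier table above (May 2026).
# The dictionary keys are hypothetical short names, not real model IDs.
FREE_TIER = {
    "flash-lite": FreeTierLimit(30, 1_500, 1_000_000),
    "flash": FreeTierLimit(15, 1_500, 1_000_000),
}


def fits_free_tier(model: str, peak_rpm: int, daily_requests: int,
                   avg_tokens_per_request: int) -> bool:
    """Return True if the workload stays under all three free-tier limits."""
    limit = FREE_TIER[model]
    # Crude estimate: worst-case minute = peak request rate * average request size.
    peak_tpm = peak_rpm * avg_tokens_per_request
    return (
        peak_rpm <= limit.requests_per_minute
        and daily_requests <= limit.requests_per_day
        and peak_tpm <= limit.tokens_per_minute
    )


# A small internal tool: 10 requests/min at peak, 1,200 per day, ~4K tokens each.
print(fits_free_tier("flash-lite", 10, 1_200, 4_000))
```

Any single limit failing is enough to need billing, which is why a user-facing app with bursty traffic usually outgrows the free tier on requests per minute long before it hits the daily quota.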
Paid Pricing: Every Model, Every Cost
Here’s the full pricing table for the Gemini API as of May 2026, per million tokens (USD):
| Model | Status | Input | Output | Input (>200K) | Output (>200K) |
|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | GA | $0.10 | $0.40 | — | — |
| Gemini 3.1 Flash-Lite | GA May 7, 2026 | $0.25 | $1.50 | — | — |
| Gemini 2.5 Flash | GA | $0.30 | $2.50 | — | — |
| Gemini 3 Flash | Preview | $0.50 | $3.00 | — | — |
| Gemini 2.5 Pro | GA | $1.25 | $10.00 | $2.50 | $15.00 |
| Gemini 3.1 Pro | GA | $2.00 | $12.00 | $4.00 | $18.00 |
Cost-saving features:
- Batch API: 50% off all prices (for non-real-time jobs)
- Context caching: Up to ~90% savings on cached inputs (see the dedicated section below; this is where most devs leave money on the table)
- Flex (batch) tier: Roughly half the standard price for jobs that can wait, e.g. $0.125 input / $0.75 output on 3.1 Flash-Lite
One cost that runs higher than the table:
- Audio input: Priced higher than text, at $0.50/MTok on Flash-Lite and $1.00/MTok on Flash
Grounding with Google Search/Maps: Each paid project gets a free monthly grounding allocation; beyond that, expect ~$14 per 1,000 grounded queries. This is a line item most pricing guides miss — if your app calls tools=["google_search_retrieval"] at scale, budget for it.
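The pricing table and the batch discount can be folded into a small estimator. This is a sketch, not an official calculator: the model keys are labels chosen for this example, and it assumes the >200K rates apply to the whole request whenever the prompt exceeds 200,000 tokens. Audio, caching, and grounding charges are left out.

```python
# USD per million tokens: (input, output, input >200K, output >200K).
# Copied from the pricing table above; None means no long-context rate is listed.
PRICES = {
    "gemini-2.5-flash-lite": (0.10, 0.40, None, None),
    "gemini-3.1-flash-lite": (0.25, 1.50, None, None),
    "gemini-2.5-flash": (0.30, 2.50, None, None),
    "gemini-3-flash": (0.50, 3.00, None, None),
    "gemini-2.5-pro": (1.25, 10.00, 2.50, 15.00),
    "gemini-3.1-pro": (2.00, 12.00, 4.00, 18.00),
}

LONG_CONTEXT_THRESHOLD = 200_000  # prompt tokens (assumed trigger for the >200K rates)
BATCH_DISCOUNT = 0.5              # Batch API: 50% off


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 batch: bool = False) -> float:
    """Estimate the USD cost of one text request at list prices."""
    inp, out, long_inp, long_out = PRICES[model]
    if long_inp is not None and input_tokens > LONG_CONTEXT_THRESHOLD:
        inp, out = long_inp, long_out
    cost = (input_tokens * inp + output_tokens * out) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost


# 10,000 summarization calls at ~3K tokens in and ~500 tokens out each.
per_call = request_cost("gemini-2.5-flash", 3_000, 500)
print(f"per call: ${per_call:.5f}, per 10K calls: ${per_call * 10_000:.2f}")
```

Running the same workload through each model key is a quick way to see how much of the bill comes from output tokens, which are priced 4-8x higher than input across the table.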
Context Caching: The Hidden ~90% Discount
If you’re sending the same long system prompt, document, or codebase with every request, you are leaving money on the table. As of May 2026, Gemini has two caching modes and both can cut your input costs by roughly 90%.
Implicit caching (on by default, all paid projects). Google quietly checks every request against a hash of recent inputs. If you re-send the same context within the cache window, the matched tokens get billed at the cached rate automatically — no code changes needed. You’ll see the savings on the bill, but you don’t get a knob to control it.
Explicit caching (opt-in, deterministic). You create a cache entry with a TTL, then reference its ID on subsequent requests. Predictable savings, but you pay a small write fee and an hourly storage fee.
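For readers who want to see the explicit flow in code, here is a hedged sketch using the google-genai Python SDK (`pip install google-genai`). The function names, the system instruction, and the one-hour default TTL are assumptions for illustration; the model ID is passed in by the caller because the exact ID string is not given in this guide. Caches may also have a minimum token size, so very short documents might be rejected.

```python
from typing import List


def format_ttl(seconds: int) -> str:
    """Format a TTL as the duration string the API expects, e.g. "3600s"."""
    return f"{seconds}s"


def ask_with_explicit_cache(document_text: str, questions: List[str],
                            model: str, ttl_seconds: int = 3600) -> List[str]:
    """Cache one large document, then ask several questions against it."""
    # Imported lazily so format_ttl() works even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment

    # One-time write: this is where the cache write fee is charged.
    cache = client.caches.create(
        model=model,
        config=types.CreateCachedContentConfig(
            system_instruction="Answer using only the provided document.",
            contents=[document_text],
            ttl=format_ttl(ttl_seconds),
        ),
    )

    answers = []
    try:
        for question in questions:
            # Each call references the cache by name instead of re-sending the document.
            response = client.models.generate_content(
                model=model,
                contents=question,
                config=types.GenerateContentConfig(cached_content=cache.name),
            )
            answers.append(response.text)
    finally:
        # Delete the entry early so storage stops accruing before the TTL expires.
        client.caches.delete(name=cache.name)
    return answers
```

Deleting the cache in the `finally` block is a design choice for short batch jobs; a long-running service would more likely keep the entry alive and let the TTL handle expiry.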
Here’s the explicit-cache pricing for Gemini 3.1 Pro (per million tokens):
| Operation | Price |
|---|---|
| Cache write (one-time per entry) | $0.50 |
| Cache read (≤200K context) | $0.20 |
| Cache read (>200K context) | $0.40 |
| Storage | $4.50 per 1M tokens per hour |
The math, in plain English. Gemini 3.1 Pro normally charges $2.00 per 1M input tokens. With explicit caching, a re-used segment drops to $0.20 per 1M read — a 90% discount on that segment. If your app re-uses a 100K-token system prompt across 1,000 requests, naive cost is ~$200 of input. With caching: $0.05 (write) + $20 (reads) + ~$0.45 (storage for one hour) ≈ $20.50, or ~90% saved.
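The same arithmetic as a short script, so you can plug in your own prompt size and request volume. The default prices are the Gemini 3.1 Pro figures quoted above; the function names are made up for this example, and the break-even formula ignores the one-time write fee.

```python
def uncached_cost(tokens: int, requests: int, input_price: float = 2.00) -> float:
    """USD cost of re-sending the same tokens with every request."""
    return tokens / 1_000_000 * requests * input_price


def explicit_cache_cost(tokens: int, requests: int, hours: float,
                        write_price: float = 0.50, read_price: float = 0.20,
                        storage_price_per_hour: float = 4.50) -> float:
    """USD cost of writing the tokens once, reading them per request, and storing them."""
    millions = tokens / 1_000_000
    write = millions * write_price
    reads = millions * requests * read_price
    storage = millions * storage_price_per_hour * hours
    return write + reads + storage


def break_even_requests_per_hour(input_price: float = 2.00, read_price: float = 0.20,
                                 storage_price_per_hour: float = 4.50) -> float:
    """Requests per hour at which read savings equal the hourly storage fee."""
    return storage_price_per_hour / (input_price - read_price)


naive = uncached_cost(100_000, 1_000)
cached = explicit_cache_cost(100_000, 1_000, hours=1)
print(f"naive ${naive:.2f}, cached ${cached:.2f}, saved {1 - cached / naive:.1%}")
print(f"break-even: {break_even_requests_per_hour():.1f} requests/hour")
```

With the article’s numbers this reproduces the $200 versus $20.50 comparison. The break-even works out to about 2.5 requests per hour regardless of prompt size, because both the storage fee and the read savings scale with the number of cached tokens.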
When explicit caching is worth the work: repeated retrieval-augmented generation (RAG) with the same corpus, multi-turn agents with a fat system prompt, code-review bots replaying the same project tree. Anything where the cache hit rate will be high.
When implicit caching is enough: one-off prototypes, chatbots with fresh contexts per session, or anything where re-use is incidental rather than intentional. You’ll still get the discount automatically when it kicks in.
For Gemini 3 Flash and 3.1 Flash-Lite, cache storage drops to ~$1.00 per 1M tokens per hour, with write/read line items broadly proportional to their base prices. The 90% rule of thumb holds across the family.
How It Compares to OpenAI and Claude
This is the table most developers actually want. How does Gemini stack up per million tokens?
| Model | Input | Output | Best For |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Cheapest option anywhere |
| Gemini 2.5 Flash | $0.30 | $2.50 | Speed + low cost |
| Gemini 3.1 Pro | $2.00 | $12.00 | Google’s flagship |
| OpenAI GPT-5.4 | $2.50 | $15.00 | OpenAI’s current API default |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Strong reasoning |
| Claude Opus 4.7 | $5.00 | $25.00 | Best coding & agentic work (launched Apr 16) |
On paper, Gemini wins on price across every tier. Flash-Lite at $0.10 per million input tokens is the cheapest production-ready API from any major provider.
But there’s a catch. Recent research found that listed prices can be misleading. When you account for how many tokens each model actually uses to complete a task, the real cost sometimes flips. In 21.8% of model comparisons, the “cheaper” model actually costs more in practice. And the calculus got messier on April 16 when Claude Opus 4.7 shipped with a new tokenizer that bills up to ~35% more tokens for the same text — so Claude’s effective cost is now higher than the $5/$25 sticker suggests, while Gemini’s nominal pricing is unchanged.
The lesson: don’t just compare per-token prices. Test your specific workload with each model and measure total cost per task — especially for anything that uses Opus 4.7.
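A quick way to run that comparison is to price a whole task rather than a token. The sketch below does that with the list prices from the table; the token counts are hypothetical numbers picked to show the effect, and `tokenizer_inflation` is an illustrative knob for cases like the ~35% tokenizer change mentioned above.

```python
def cost_per_task(input_price: float, output_price: float,
                  input_tokens: int, output_tokens: int,
                  tokenizer_inflation: float = 1.0) -> float:
    """USD cost of one completed task.

    Prices are per million tokens. tokenizer_inflation scales the billed token
    count for models whose tokenizer emits more tokens for the same text.
    """
    billed_in = input_tokens * tokenizer_inflation
    billed_out = output_tokens * tokenizer_inflation
    return (billed_in * input_price + billed_out * output_price) / 1_000_000


# Hypothetical measurements for the same task on two models:
# the cheaper model needs far more output tokens to get there.
flash = cost_per_task(0.30, 2.50, input_tokens=2_000, output_tokens=12_000)
pro = cost_per_task(2.00, 12.00, input_tokens=2_000, output_tokens=1_500)
print(f"2.5 Flash: ${flash:.4f} per task, 3.1 Pro: ${pro:.4f} per task")
```

In this made-up case the model with the lower sticker price costs about $0.031 per task against $0.022 for the pricier one. Real token counts have to come from your own logs, which is the whole point of measuring per task.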
Which Model Should You Use?
Here’s a quick decision framework:
Gemini 2.5 Flash-Lite: For high-volume tasks where speed and low cost matter most. Think classification, simple extraction, boilerplate generation. At $0.10/MTok input, you can run thousands of requests for pennies.
Gemini 2.5 Flash: The workhorse. Good quality, fast, and cheap enough for most production apps. If you’re building a chatbot, content tool, or summarizer, start here.
Gemini 3.1 Pro: For complex reasoning, long-context analysis, and tasks where quality matters more than cost. Use it when Flash isn’t accurate enough.
When to use OpenAI or Claude instead: pick them if your task is primarily code generation (Claude excels here) or complex multi-step reasoning (Claude Opus), or if you need specific tool integrations (OpenAI’s ecosystem). Gemini wins on price and context window size. The others win on specific quality dimensions.
How to Set Up Billing
If you’re still on the free tier and want to unlock higher rate limits and Pro models:
1. Go to Google AI Studio
2. Click your profile → Billing
3. Choose Prepaid or Pay-as-you-go (new accounts may only see Prepaid)
4. Add a payment method
You’ll start in Tier 1 ($250/mo cap). After $100 spend and 3 days, you’ll auto-upgrade to Tier 2 ($2,000/mo cap).
Check your current tier anytime at: aistudio.google.com/app/apikeys
What’s Coming Next
May 19-20, 2026: Google I/O keynote. As of mid-May, the developer keynote has not happened yet, but it is the most likely vehicle for the next round of pricing news. Watch for: a possible Gemini 3 Flash GA upgrade (it’s been in preview since December 2025), any Gemini 3.2 announcement, and clarifications on the long-rumored “Deep Think” mode rollout outside Pro. We’ll update this guide within a day of the keynote.
June 1, 2026: Gemini 2.0 Flash and 2.0 Flash-Lite get deprecated. If you’re still using either, migrate to 2.5 Flash, 3 Flash (preview), or the now-GA 3.1 Flash-Lite before then. The deprecation is on track per Google’s release notes as of May 2026 — no signs of a delay.
Also worth noting from earlier in 2026: Gemma 4 (Google’s open-weights model family) launched in April 2026 and is positioned as “byte for byte the most capable open model.” It’s not part of the Gemini API and runs on your own infrastructure, but it’s a real Gemini-adjacent option if your workload tolerates self-hosting and you want to escape the tier caps entirely.
Google is also expected to keep tightening the free tier while expanding paid features. The trend is clear: the free lunch is shrinking, but the paid menu keeps getting cheaper — especially if you actually use caching.
A Quick Word on the Spending Caps (And Why They Help)
The April 1 monthly spend caps got mixed reviews on Reddit and the Google AI Discuss forum — some developers see $250 (Tier 1) or $2,000 (Tier 2) as a ceiling that limits scaling. That’s a real frustration if you’re trying to ramp a product fast.
But there’s a flip side worth knowing about, especially if you skim AI billing horror stories. In 2025, Google’s API had multiple thinking-token and image-handling billing bugs that produced unexpected bills ranging from $1,000 to over $70,000 for individual developers — many shared on the Google AI Discuss forum. Several of those got refunded; not all of them. The hard monthly cap means a bug like that can’t drain your card anymore — the API simply pauses at the tier ceiling until next cycle or until you upgrade.
If you’re prototyping and the $250 Tier 1 cap feels constraining, the upgrade path is faster than it used to be: $100 of paid spend plus 3 days unlocks Tier 2 ($2,000/mo). Tier 3 ($1,000+ spend, 30+ days from first payment) goes from $20K up to $100K+ depending on configuration.
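The upgrade thresholds in the paragraph above reduce to a simple rule. This sketch encodes them as stated here; treat the result as eligibility only, since the tier table lists Tier 3 as something you arrange with Google. The function and tier names are made up for the example.

```python
from datetime import date


def eligible_tier(total_paid_spend: float, first_payment: date, today: date) -> str:
    """Return the highest tier the account qualifies for under the thresholds above."""
    days_since_payment = (today - first_payment).days
    # Tier 3: $1,000+ of spend and 30+ days since the first payment.
    if total_paid_spend >= 1_000 and days_since_payment >= 30:
        return "tier3"
    # Tier 2: $100 of spend and 3 days.
    if total_paid_spend >= 100 and days_since_payment >= 3:
        return "tier2"
    return "tier1"


# $140 spent, first payment a week ago.
print(eligible_tier(140.0, date(2026, 5, 1), date(2026, 5, 8)))
```

Both conditions have to hold at once, so a burst of spend in the first two days does not shorten the wait.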
The Bottom Line
The Gemini API is still the most affordable major AI API in 2026, especially at the Flash-Lite and Flash tiers. But the April 1 changes mean you can’t coast on free forever. If you’re building anything beyond a prototype, enable billing. The spend caps mean you won’t get surprised by a runaway bill, and the rate limits on paid tiers are dramatically higher than free.
And before you lock into any single provider based on listed token prices, test your actual workload. The cheapest API on paper isn’t always the cheapest in practice.
Sources:
- Gemini Developer API Pricing — Google AI for Developers
- Gemini API Billing — Google AI for Developers
- Rate Limits — Google AI for Developers
- Gemini API Release Notes / Changelog — Google AI for Developers
- Gemini 3.1 Flash-Lite is now generally available — Google Cloud Blog
- Gemini 3.1 Flash Lite: Our most cost-effective AI model yet — Google Blog
- Gemini 3 Flash — Vertex AI Generative AI Docs
- Context Caching Overview — Gemini API
- Gemini API Billing Tier Changes 2026 — LaoZhang AI Blog
- Gemini API Free Tier Limits 2026 — LaoZhang AI Blog
- AI API Pricing Comparison 2026 — IntuitionLabs
- Gemini API Pricing 2026 — MetaCTO
- Gemini API vs OpenAI vs Claude Cost Guide — AIFREEAPI
- Gemini API Free Tier 2026 — TokenMix
- Gemini 3 Flash API Pricing Calculator (May 2026) — TokenCost