Claude Opus 4.7's New Tokenizer Made Your Bill 12–27% Bigger

Anthropic kept Opus 4.7 pricing identical to 4.6. The new tokenizer uses 32–45% more tokens on the same text. Here’s the real cost math and how to absorb it.

Claude Opus 4.7 launched in April with the same per-token price as Opus 4.6: $5 per million input tokens, $25 per million output tokens. Every benchmark headline said pricing was unchanged. Eight Opus 4.7 review posts later, none of them lead with the tokenizer change, and that’s where most of the actual cost increase lives.

Anthropic said so itself in the model’s migration notes, buried inside the docs: “Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens,” which the notes put at “roughly 1.0–1.35× depending on the content type.” OpenRouter ran the numbers on more than one million real requests and found 32–45% more native tokens depending on prompt size. Simon Willison’s measurements on the system prompt itself came out to 1.46x. Mario Zechner ran his own counts and got 33% more: “That’s one chonky API revenue increase, while keeping token prices ‘the same’.”

The per-token price didn’t change. The number of tokens did. The bill grew anyway. Here’s the math, what’s driving it, and the four workarounds that actually reduce your cost without giving up the 4.7 capability gains.

Image: Anthropic’s official “What’s New in Claude Opus 4.7” documentation page describing the tokenizer change. Source: What’s new in Claude Opus 4.7, Anthropic Claude API Docs. The migration section explicitly states the tokenizer can map “the same input” to “more tokens,” at “roughly 1.0–1.35× depending on the content type.”

The Per-Token Price Is the Wrong Metric

Anthropic kept the listed price stable. The launch post says it directly: “Opus 4.7 is available today across all Claude products and our API… Pricing remains the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens.”

But “per token” is a price per unit, not a price per task. The unit got smaller. Specifically, the same English text now produces meaningfully more tokens when sent to Opus 4.7 than to Opus 4.6, because the new tokenizer’s vocabulary is structured differently. OpenRouter’s analysis described this cleanly: their independent QuadChars tokenizer (4 ASCII chars = 1 token, constant across models) stayed fixed, while Anthropic’s reported native count rose. That gap is the tokenizer change isolated from prompt content.

The result: same prompt, same workflow, same model conversation — measurably more billable tokens on 4.7 than on 4.6.
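
To make the per-unit versus per-task distinction concrete, here is a minimal sketch of the arithmetic. The two prices come from the launch post quoted above. The task size (10,000 input tokens, 1,000 output tokens) and the 1.35x multiplier are illustrative assumptions, not measurements, and the sketch assumes the same text is produced on both sides.

```python
# Prices from the launch post quoted above (USD per million tokens).
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def task_cost(input_tokens: float, output_tokens: float) -> float:
    """Dollar cost of one task at the listed per-token prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Illustrative assumptions: a task that needed 10,000 input and 1,000 output
# tokens under the old tokenizer, and a 1.35x multiplier for the same text.
MULTIPLIER = 1.35
old_cost = task_cost(10_000, 1_000)
new_cost = task_cost(10_000 * MULTIPLIER, 1_000 * MULTIPLIER)

print(f"old tokenizer: ${old_cost:.4f} per task")
print(f"new tokenizer: ${new_cost:.4f} per task")
print(f"increase: {new_cost / old_cost - 1:.0%}")
```

Neither price constant changes between the two calls. The per-task cost still rises in step with the token count, which is why the per-token price alone does not describe the bill.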

The Numbers (From People Who Actually Measured)

Four independent measurements landed in the same range, and each one comes with concrete numbers.

OpenRouter’s 1M+ request analysis. A “switcher cohort” of users whose primary model went from Opus 4.6 to Opus 4.7 across the launch window. The findings:

Prompt size | Native token inflation | Effective cost change after cache
Under 2K tokens | ~45% | −1.6% (shorter completions absorbed it)
2K–10K | ~42% | +27.2%
10K–25K | ~34% | +25.2%
25K–50K | ~32% | +21.3%
50K–128K | ~32% | +11.9%
128K+ | ~33% | +15.3%

OpenRouter’s summary: “We found that costs increased 12–27%, with the exception of short prompts, which actually got more cost efficient.”

Claude Code Camp’s seven-sample measurement. Used Anthropic’s own POST /v1/messages/count_tokens endpoint to count identical content against claude-opus-4-6 and claude-opus-4-7. Results by content type:

  • Technical English docs: 1.47x
  • Shell scripts: 1.39x
  • TypeScript code: 1.36x
  • A real CLAUDE.md file: 1.445x
  • Spanish prose: 1.35x
  • Markdown with embedded code: 1.34x
  • Python code: 1.29x
  • Plain English prose: 1.20x
  • Dense JSON: 1.13x
  • CJK content (Chinese/Japanese/Korean): 1.01x (effectively unchanged)

Real-world Claude Code session cost (80 turns): $6.65 on 4.6 → $7.86–$8.76 on 4.7. That is $1.21–$2.11 more per session, an increase of roughly 20–30%.
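
Because the multiplier depends on content type, a workload's own overhead is a weighted average over its mix. The sketch below uses the per-type multipliers listed above. The dictionary keys, the helper name `blended_multiplier`, and the example mix are hypothetical, and the shares are assumed to be fractions of the workload's Opus 4.6 token count.

```python
# Per-content-type multipliers as reported in the measurement above.
MULTIPLIERS = {
    "technical_docs": 1.47,
    "shell": 1.39,
    "typescript": 1.36,
    "claude_md": 1.445,
    "spanish_prose": 1.35,
    "markdown_with_code": 1.34,
    "python": 1.29,
    "english_prose": 1.20,
    "dense_json": 1.13,
    "cjk": 1.01,
}


def blended_multiplier(mix: dict) -> float:
    """Weighted average multiplier for a workload.

    `mix` maps a content type to its share of the workload's 4.6 tokens.
    Shares are normalized, so they do not need to sum to exactly 1.
    """
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("mix must contain positive shares")
    return sum(MULTIPLIERS[kind] * share for kind, share in mix.items()) / total


# Hypothetical mix: half TypeScript, 30% technical docs, 20% JSON payloads.
example_mix = {"typescript": 0.5, "technical_docs": 0.3, "dense_json": 0.2}
print(f"blended multiplier: {blended_multiplier(example_mix):.3f}")
```

A mix dominated by CJK text or dense JSON lands near the bottom of the range, while one dominated by technical English lands near the top.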

Simon Willison’s independent count. “I upgraded my Claude token counter tool to compare different models and Opus 4.7 does appear to use 1.46× the tokens for text and up to 3× the tokens for images — it’s priced the same as Opus 4.6 on a per-token basis so this is actually a pretty big price bump.” That post got 1,500+ engagements on its first day; the multi-model token-comparison tool now reflects the 1.46x text and ~3x image multipliers.

Mario Zechner ran his own counts and posted: “So basically 33% more tokens with the new Opus 4.7 tokenizer. That’s one chonky API revenue increase, while keeping token prices ‘the same’.”

Multiple developers reported even worse cases in agent settings. One developer running the same Claude Code task on both models reported 4.7 “exhausted its context window twice (6.7 MB transcript) [while] 4.6 completed it with zero compactions (1.4 MB) — same task, ~5x more tokens for a worse result.” That’s the worst-case combination of tokenizer inflation plus the second cost amplifier Anthropic disclosed in the docs: “Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings… it does mean it produces more output tokens.”

The tokenizer is half the story. Increased output-token verbosity on long agentic sessions is the other half. Combined, the same agentic workflow can run 1.3x to 5x more total tokens on 4.7 than on 4.6.

Why Anthropic Changed Tokenizers At All

The official explanation is that the new tokenizer “improves how the model processes text” — the migration docs frame it as a tradeoff, not a regression. Anthropic doesn’t elaborate publicly, but two reasonable interpretations are consistent with what they’ve shipped:

  1. Multilingual parity. The CJK content barely changed (1.01x in measurements). The tokenizer rebalance disproportionately affects English and Latin-script languages. Anthropic has been investing in non-English performance — the new tokenizer likely brings token-per-character ratios closer across languages, even if that means English carries the cost.

  2. Instruction-following granularity. Anthropic states Opus 4.7 has “a small but directionally consistent improvement on strict instruction following.” Finer-grained tokenization can help models attend to specific phrases more precisely — a benefit that’s invisible in benchmarks but real in agentic reliability.

Neither of these is a bad modeling decision. But the cost shift is real, and the unchanged per-token price does not reveal it.

The Four Workarounds That Actually Reduce Cost

Concrete strategies developers are using, with the tradeoffs each carries.

1. Use prompt caching aggressively

This is the single biggest lever, and it’s the one Anthropic effectively designed for. OpenRouter’s data is the clearest: at 128K+ prompts, “the majority of extra tokens from the new tokenizer are captured by the cache.” 93% of the tokenizer overhead gets absorbed when your prompts are large enough and structured for cache reuse.

The implementation: set cache_control: { type: "ephemeral" } on the parts of your prompt that don’t change between requests: system prompts, long context documents, tool definitions, repeated few-shot examples. Cache reads are billed at $0.50 per million tokens (10% of the normal input rate) and cache writes at ~$6.25 per million (a 25% premium over the normal $5). The math: a write costs $1.25 per million more than an uncached send, and every later read saves $4.50 per million, so a block pays for its cache write the first time it is re-read, provided the re-read arrives before the cache entry expires.
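
The break-even point can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: it uses the three prices quoted above, it assumes the first call writes the block to the cache, and it assumes every later call hits the cache before the entry expires. The function names are illustrative.

```python
from typing import Optional

BASE_INPUT = 5.00   # USD per million uncached input tokens
CACHE_WRITE = 6.25  # USD per million tokens written to the cache
CACHE_READ = 0.50   # USD per million tokens read from the cache


def cost_uncached(block_tokens: int, calls: int) -> float:
    """Cost of sending the same block uncached on every call."""
    return calls * block_tokens * BASE_INPUT / 1_000_000


def cost_cached(block_tokens: int, calls: int) -> float:
    """Cost when the first call writes the cache and the rest read it."""
    if calls <= 0:
        return 0.0
    reads = calls - 1
    return block_tokens * (CACHE_WRITE + reads * CACHE_READ) / 1_000_000


def break_even_calls(block_tokens: int, max_calls: int = 100) -> Optional[int]:
    """Smallest number of calls at which caching is cheaper, if any."""
    for calls in range(1, max_calls + 1):
        if cost_cached(block_tokens, calls) < cost_uncached(block_tokens, calls):
            return calls
    return None


# Hypothetical 50,000-token block reused across 10 calls.
print(break_even_calls(50_000))
print(f"uncached: ${cost_uncached(50_000, 10):.4f}")
print(f"cached:   ${cost_cached(50_000, 10):.4f}")
```

A block that is sent only once costs more with caching than without, so one-off context is not worth marking. A cache miss on a later call would be billed as a fresh write, which this sketch does not model.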

Specifically optimize: the system prompt (always cache it), tool definitions (cache them), large constant context documents (cache them), and stable few-shot examples (cache them). What you can’t cache: the rolling conversation history, the user’s new query, and dynamic context. Aim for 70%+ of total input to come from cached blocks on long sessions.
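
One way to lay out a request along these lines is sketched below as a plain Python dictionary, ready to serialize as the JSON body of a Messages API call. The helper name `build_request`, the placeholder strings, and the `max_tokens` value are assumptions for illustration. The sketch marks only the last stable system block, on the understanding that a cache breakpoint covers the prompt prefix up to and including the marked block.

```python
import json


def build_request(system_prompt: str, reference_doc: str, user_query: str,
                  model: str = "claude-opus-4-7") -> dict:
    """Build a request body with stable content first and cached."""
    return {
        "model": model,
        "max_tokens": 1024,  # placeholder value
        "system": [
            # Stable instructions come first.
            {"type": "text", "text": system_prompt},
            # Last stable block carries the cache marker.
            {
                "type": "text",
                "text": reference_doc,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # The new user query changes every call, so it is left unmarked.
        "messages": [{"role": "user", "content": user_query}],
    }


payload = build_request(
    system_prompt="You are a code reviewer.",
    reference_doc="<long, unchanging style guide>",
    user_query="Review this diff: ...",
)
print(json.dumps(payload, indent=2))
```

Keeping the changing content at the end matters: anything that varies between calls and sits before the marker changes the prefix and defeats the cache.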

2. Route by prompt size and task complexity

The cost curve isn’t uniform. Short prompts on 4.7 are actually cheaper than on 4.6 (−1.6% in OpenRouter’s data) because 4.7 generates more concise responses for simple queries. The expensive band is 2K–25K tokens, where the effective increase is still +25–27% after caching.

Pattern: use 4.7 for hard reasoning tasks (where the capability gains justify the cost) and route high-volume, smaller-context work to a cheaper model (Sonnet 4.6, or in some cases GPT-5.5 / Codex for code). The developer @mixtureofmodels posted on May 17: “I’m starting to treat claude code / opus 4.7 as only the architect though and moving implementation automatically to codex / gpt 5.5 — so far this is seeming a lot more token efficient.” That’s the right shape — Opus 4.7 designs, a cheaper model implements.

Concrete routing rules to start with: tasks under 2K tokens → 4.7 is fine (cost-neutral). Tasks 2K–25K of plain English explanation → consider Sonnet. Tasks 50K+ with heavy cached context → 4.7 with caching. Final code review or hard reasoning → 4.7 with --effort xhigh.
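
The starting rules above can be written down as a small routing function. This is a sketch, not a recommended policy: the returned strings are route labels rather than API model IDs, the function and parameter names are hypothetical, and the `needs-review` fallback is an assumption for cases the rules above leave open (25K–50K prompts, and 50K+ prompts without cached context).

```python
def route(prompt_tokens: int, hard_reasoning: bool = False,
          cached_context: bool = False) -> str:
    """Pick a route label for a task using the starting rules above."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if hard_reasoning:
        # Final code review or hard reasoning: 4.7 at the highest effort.
        return "opus-4.7+xhigh"
    if prompt_tokens < 2_000:
        # Short prompts are roughly cost-neutral on 4.7.
        return "opus-4.7"
    if prompt_tokens <= 25_000:
        # The expensive band: consider the cheaper model.
        return "sonnet-4.6"
    if prompt_tokens >= 50_000 and cached_context:
        # Large prompts with heavy cached context: 4.7 with caching.
        return "opus-4.7+cache"
    # Not covered by the rules above; flagged rather than guessed.
    return "needs-review"


for tokens, hard, cached in [(800, False, False), (12_000, False, False),
                             (90_000, False, True), (12_000, True, False),
                             (30_000, False, False)]:
    print(tokens, hard, cached, "->", route(tokens, hard, cached))
```

Checking `hard_reasoning` first means a hard task is never downgraded just because its prompt happens to fall in the expensive band.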

3. Section-by-section prompting on long agentic workflows

The 6.7 MB context-window-exhaustion report from the developer who saw 5x cost on the same task is the canonical example of how tokenizer inflation compounds in long sessions. Each turn adds tokens; the next turn includes those tokens; auto-compaction kicks in and burns more tokens summarizing. By turn 50, a session that fit comfortably on 4.6 has bloated past the context limit on 4.7.
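
A rough model shows why the same session runs out of room sooner. The sketch below assumes each turn adds a fixed number of tokens to the history and that the whole history is resent on every turn, with no caching and no compaction. The 4,000 tokens per turn, the 200,000-token window, and the 1.35x multiplier are illustrative assumptions.

```python
import math


def turns_that_fit(tokens_per_turn: float, context_limit: int,
                   multiplier: float = 1.0) -> int:
    """Number of whole turns whose accumulated history fits in the window."""
    per_turn = tokens_per_turn * multiplier
    if per_turn <= 0:
        raise ValueError("tokens per turn must be positive")
    return math.floor(context_limit / per_turn)


def cumulative_input_tokens(turns: int, tokens_per_turn: float,
                            multiplier: float = 1.0) -> float:
    """Total input tokens billed when turn k resends k turns of history."""
    per_turn = tokens_per_turn * multiplier
    return per_turn * turns * (turns + 1) / 2


LIMIT = 200_000   # assumed context window, in tokens
PER_TURN = 4_000  # assumed tokens added per turn under the old tokenizer

old_turns = turns_that_fit(PER_TURN, LIMIT)
new_turns = turns_that_fit(PER_TURN, LIMIT, multiplier=1.35)
print(f"turns before the window fills: {old_turns} vs {new_turns}")
print(f"input tokens billed over {new_turns} turns: "
      f"{cumulative_input_tokens(new_turns, PER_TURN):,.0f} vs "
      f"{cumulative_input_tokens(new_turns, PER_TURN, 1.35):,.0f}")
```

Under these assumptions a session that fits 50 turns with the old token counts fills the window after 37. Real sessions are less regular, and compaction adds its own token cost on top.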

Defensive pattern: stop asking Opus 4.7 to do everything in one long conversation. Break large agentic workflows into discrete scoped sessions — “design the data model,” “write the migrations,” “write the tests,” “review the diff” — and either start fresh sessions or use claude --bg (Agent View) to dispatch them as separate background sessions with separate context windows. As @jason_coder0 put it: “Do work section-by-section. Generate individual components first.”

4. Audit your actual token spend against the count_tokens endpoint

If you’re running any production workflow on Opus 4.7, run a one-time audit before assuming your budget projection is still right. Anthropic exposes the POST /v1/messages/count_tokens API for free — feed it your representative prompts and compare counts against your 4.6 baseline (you can request a 4.6 count by passing the model name explicitly).

A sample audit script:

# Count tokens for the same sample prompt under Opus 4.6 and Opus 4.7.
# Each response is a JSON object with an "input_tokens" field.
curl -s https://api.anthropic.com/v1/messages/count_tokens \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-6","messages":[{"role":"user","content":"<your prompt>"}]}'

curl -s https://api.anthropic.com/v1/messages/count_tokens \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-7","messages":[{"role":"user","content":"<your prompt>"}]}'

The ratio between the two is your specific tokenizer overhead for your specific workload. Take that number to your budget owner before you ship 4.7 across the team.
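
Turning the two responses into a budget number takes only a division. The sketch below assumes each response body is a JSON object with an `input_tokens` field. The sample bodies and the $1,000 monthly figure are made up for illustration, and the helper names are hypothetical.

```python
import json


def input_tokens(response_body: str) -> int:
    """Extract the token count from a count_tokens response body."""
    return int(json.loads(response_body)["input_tokens"])


def tokenizer_overhead(count_46: int, count_47: int) -> float:
    """Ratio of 4.7 tokens to 4.6 tokens for the same prompt."""
    if count_46 <= 0:
        raise ValueError("the 4.6 count must be positive")
    return count_47 / count_46


def projected_input_spend(current_monthly_spend: float, overhead: float) -> float:
    """Scale current input spend by the measured overhead."""
    return current_monthly_spend * overhead


# Made-up sample responses for one representative prompt.
body_46 = '{"input_tokens": 2000}'
body_47 = '{"input_tokens": 2700}'

overhead = tokenizer_overhead(input_tokens(body_46), input_tokens(body_47))
print(f"overhead: {overhead:.2f}x")
print(f"projected input spend: ${projected_input_spend(1000.0, overhead):,.2f}")
```

This projects input spend only. Averaging the ratio over a batch of representative prompts, rather than one, gives a steadier number to plan with.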

What This Means For You

If you’re a solo developer using Claude Code casually. Don’t overthink it. The 20–30% per-session bump is real but the absolute dollar impact on a single $20/month Pro subscription is bounded by the rate limit. Cache your most-used skill files and your CLAUDE.md and you’ll absorb most of the overhead.

If you’re running an agent at scale (Cowork, an internal automation, a SaaS workflow). Audit your prompt sizes today. If they’re in the 2K–25K range, you’re squarely in the +25% cost-impact band. Either restructure prompts to push into the cache-effective 50K+ range, or route shorter work to Sonnet 4.6 and reserve 4.7 for the hard tasks. The cost difference compounds monthly.

If you’re on Max plan / Cowork plan with a 5-hour quota. The quota is denominated in usage, not in dollars, and quota math still moves against you. A workflow that fit in a 5-hour Max session on 4.6 will hit the quota wall faster on 4.7. Plan for ~20% less wall-clock productive time per cycle, or move to API billing where you have direct visibility into the cost.

If you’re a CTO/engineering lead deciding whether to migrate the team to 4.7. The capability gain is real — SWE-bench Verified jumped from ~80% (4.6) to 87.6% (4.7), agentic instruction following improved measurably, and the high-res vision (2576px) and task budgets are genuinely new capability. The cost increase is also real. Calculate the per-task cost change for your actual workloads using the count_tokens endpoint, then make the migration decision with both numbers in front of you. The right answer for many teams is “migrate to 4.7 for reasoning-heavy tasks, keep 4.6 for high-volume tasks” — not “all 4.7” or “all 4.6.”

What the Tokenizer Change Can’t Hide

A few honest limits on the cost-impact framing:

  • The capability gains are real and quantifiable. 4.7 isn’t 4.6 with a worse tokenizer. It’s a meaningfully better model — better at coding, better at instruction following, better at long-horizon agentic tasks. The cost question is whether the gains justify the increase for your specific workload, not whether to abandon 4.7.
  • CJK content is unaffected. Chinese, Japanese, Korean tokenization barely changed (1.01x). If your workload is primarily non-Latin-script, your cost stays flat.
  • Caching is real protection, not theoretical. OpenRouter’s data shows 93% absorption at 128K+ prompts when caching is in place. Teams already using prompt caching aggressively are paying a much smaller markup than the headline number.
  • The output verbosity is opt-in (sort of). The “thinks more at higher effort levels” cost amplifier is tied to the --effort setting. Running 4.7 at medium effort vs xhigh is a real cost difference. Tune effort levels per task type — don’t run everything at xhigh.

The Bottom Line

The per-token price stayed the same. The number of tokens went up 32–45% on the same English text. The effective cost increase on real workloads is 12–27% in OpenRouter’s data (already net of whatever caching those users had in place), 5–12% with disciplined caching, and near-flat or even cheaper on small prompts. The capability gains are worth it for hard reasoning. They’re probably not worth it for routine bulk processing where 4.6 (or Sonnet) does the job at lower cost.

Three habits will keep you ahead of the change: cache aggressively, route by task complexity, and audit your actual workloads with the count_tokens endpoint before assuming your budget projection is still accurate. The full migration to 4.7-only is a more expensive choice than the launch announcement suggests; the per-workload routing decision is where the savings live.

