Claude Opus 4.7 Release Tracker: Shipped April 16, 2026 — First-Week Verdict

Claude Opus 4.7 shipped April 16, 2026. Model ID: claude-opus-4-7. Available across Claude Pro/Max/Team/Enterprise, the API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, Claude Code, Cursor, and GitHub Copilot from day one.

👉 Read our full first-look review with benchmarks and new features →

This page tracks what actually shipped, how it’s being received, and the “tokenizer trap” that’s making Pro users hit limits faster than on 4.6. Updated April 21 with first-week reception data.

May 15 update (one month in): Code with Claude SF (May 6-7) shipped three major agent features that lean on Opus 4.7 — Dreaming (between-session memory consolidation, Harvey reported 6× task completion), Outcomes (rubric-based success criteria, agents loop until met), and multi-agent orchestration (lead agent spawns specialized sub-agents on shared filesystem). The tokenizer’s ~35% inflation has NOT been patched — community is treating it as baked-in; mitigation = prompt caching (90% discount on reads) + tighter max_tokens. Sonnet 4.7 remains unannounced. Cloud rollout was clean: Bedrock, Vertex AI, Foundry, and Snowflake all launched April 15-16 with no slippage. See new “One Month In” section below.

Quick Answer

Question	Answer
Launch date	April 16, 2026
Model ID	`claude-opus-4-7`
API pricing	$5/M input, $25/M output (same as 4.6)
Context window	1M tokens at base price (no long-context premium)
Available on	Claude Pro/Max/Team/Enterprise, Bedrock, Vertex AI, Foundry, Claude Code, Cursor, Copilot
Headline benchmark	SWE-bench Verified 87.6% (up from 80.8% on 4.6)
The catch	New tokenizer uses up to ~35% more tokens for the same text
Companion launch	Claude Design — shipped April 17, research preview

Benchmarks at Launch

Anthropic’s model card and launch-week coverage agreed on the numbers across Vellum, ExplainX, Datalearner, and Cursor’s own before/after report.

Benchmark	Opus 4.7	Opus 4.6	Δ
SWE-bench Verified	87.6%	80.8%	+6.8 pts
SWE-bench Pro	64.3%	~53–60%	+4 to +11 pts
SWE-bench Multilingual	80.5%	77.8%	+2.7 pts
Terminal-Bench 2.0	69.4%	—	new
CursorBench (real Cursor IDE)	70%	58%	+12 pts
GPQA Diamond (grad-level science)	94.2%	~91.3%	+2.9 pts
BigLaw Bench	90.9%	—	new
OSWorld-Verified (desktop agent)	78.0%	—	new
MCP-Atlas (tool use)	77.3%	75.8%	+1.5 pts

Coding and agentic work is where 4.7 actually earns its keep. The +12-point CursorBench jump and the SWE-bench gains drove most of the positive launch-day reception — Cursor announced 50% off Opus 4.7 inference the same day to encourage adoption.

What Actually Shipped

Five features worth knowing about:

1. xhigh effort level. A new tier between high and max on the effort parameter. Claude Code defaults to xhigh for many coding workflows after 4.7. More compute per turn, higher quality, higher cost.

2. Task budgets. Runtime token budgets you can set on long-running agents so they can’t silently burn through your quota. Widely flagged by launch-week analysts as the most underrated feature — especially if you’re wiring up multi-step autonomous work.

3. /ultrareview in Claude Code. A slash command for deep multi-agent code review. Paid tiers get a small number of free runs; beyond that it’s a premium mode.

4. Vision up to 2,576px / ~3.75MP — up from 1,568px / ~1.15MP on prior Claude models. Aimed at screenshots, diagrams, and “computer use” workloads where detail matters.

5. 1M-token context at base price. No long-context premium. Anthropic called this out directly as a shot at competitors who charge extra past 128K or 200K.

Other under-the-hood wins: ~81 tokens/second throughput (vs ~72 on 4.6), automatic cybersecurity guardrails, and better tool use (MCP-Atlas 77.3% vs 75.8%).

The Tokenizer Trap (Read This Before You Switch)

This is the most important thing on the page.

Anthropic’s own model-update docs confirm it: Opus 4.7 uses a new tokenizer that can use roughly 1.0× to 1.35× as many tokens for the same text. The /v1/messages/count_tokens endpoint already returns different counts for 4.7 vs 4.6, so this isn’t a rumor — it’s documented.

List price is unchanged. Effective cost often isn’t.

What this looks like in practice:

Pro users are reportedly hitting weekly usage limits after just 3-4 complex queries — not because Anthropic cut the caps, but because each query now bills ~35% more tokens.
A Hacker News thread during launch week aggregated before/after token counts on identical prompts and showed the 35% figure tracking real workloads.
Some dev blogs called it “a stealth cost increase dressed as a version bump.”

If you’re swapping 4.6 → 4.7 in production, re-tune your max_tokens, task budgets, and effort levels before you look at your monthly bill. Several launch-week Substacks also flagged breaking changes around thinking.budget_tokens, temperature, and top_p — reasoning traces now default to hidden, which may force client code updates.

First-Week Reception (April 16–21)

Sentiment split cleanly along use-case lines.

The enthusiasts (devs + agent builders):

Cursor’s 58% → 70% CursorBench result was the single most-shared number of the week. Real coding gains, confirmed in a neutral benchmark.
Vellum described it as “our strongest generally available coding/agentic model.”
Long-horizon work (multi-step agents, complex engineering tasks, vision-heavy docs) saw the biggest gains, and power users noticed immediately.

The backlash (general Claude.ai users + cost-sensitive devs):

A Reddit thread titled “Opus 4.7 is not an upgrade but a serious regression” hit roughly 2,300 upvotes inside 48 hours.
An X post arguing “no improvement over 4.6 for general chat” cleared ~14,000 likes in the same window.
Common complaint: 4.7 feels “more confidently wrong,” more literal, and more fragile to prompt wording than 4.6 in default consumer chat.
GitHub Copilot briefly priced Opus 4.7 at roughly 7.5× premium versus other models, sparking its own mini-outcry.

The cleanest meta-analysis of the week, from a launch-week community digest: “higher ceiling, higher migration cost.” If you’re running agents or coding, 4.7 is a step up. If you’re dropping it into a consumer chat workflow with unchanged prompts and default settings, many users prefer 4.6.

Claude Design (Shipped April 17)

The rumored AI design tool shipped 24 hours after Opus 4.7 — as Claude Design, a research-preview visual creation tool powered by 4.7, available at claude.ai/design to Pro, Max, Team, and Enterprise users.

What it does: you describe what you want — a landing page, presentation, wireframe, marketing asset, product UI — and Claude returns a first visual you can refine with natural language and direct edits.

Who it’s for: Anthropic is targeting founders, PMs, marketers, and designers who want faster ideation without Figma/Adobe overhead. Research preview is available to Pro, Max, Team, and Enterprise.

First-week take from TechCrunch, Adweek, and others: promising but not a Figma replacement. The better characterization from a business-press roundup: “collapses the distance between an idea and a polished visual by hours or days.” Criticism is about scope (still limited, still experimental), not quality.

We did a hands-on with Claude Design → — 8 things it does well, 1 thing it can’t.

Cloud + IDE Availability

Surface	Status on April 16
Claude.ai (Pro / Max / Team / Enterprise)	Live day one
Claude API	Live day one
Amazon Bedrock	Live April 16
Google Cloud Vertex AI	Live April 15 (day before launch)
Microsoft Foundry	Live launch week
Claude Code	Live day one
Cursor	Live day one (with 50% off promo)
GitHub Copilot	Live launch week (initial 7.5× premium, since adjusted)

This is one of the widest day-one rollouts Anthropic has ever done. If you couldn’t access Opus 4.7 on your cloud by April 17, it wasn’t because of Anthropic.

Who Should Use Opus 4.7

If you run multi-step agents or write code daily: Switch. The SWE-bench and CursorBench gains are real, and xhigh + Task budgets are genuinely useful.

If you use Claude for general chat, writing, or one-off questions: Consider staying on 4.6 for now. Many users feel 4.7’s default behavior is worse for this use case, and the tokenizer change will burn through Pro limits faster.

If you’re on API and sensitive to cost: Run your actual workloads through /v1/messages/count_tokens on both 4.6 and 4.7 before switching defaults. The 35% ceiling is a real budget line item.

If you’re building with Claude Code: Upgrade, set effort=xhigh for hard problems, use Task budgets for autonomous runs, and try /ultrareview on your next PR.

One Month In: The May 6-7 Announcements and Current Status

The four weeks since launch turned Opus 4.7 from “new flagship” into the foundation of Anthropic’s agent platform. The most material developments:

Code with Claude SF (May 6-7): Three Features Built on 4.7

Dreaming (research preview). Between agent sessions, Claude reviews stored memory, finds patterns across past runs (recurring mistakes, convergent workflows, team-wide preferences), and either writes consolidated learnings back to memory or queues them for human review. Marketed as “REM sleep for agents.” Concrete impact data from launch: Harvey reported ~6× task completion improvement on repeat workflows after dreaming enabled. One independent user reported 5.4× on repeat tasks. The mechanism solves the persistent agent memory-bloat problem.

Outcomes. You define success rubrics (“all tests pass + PR prepared”), and the agent loops until it crosses the bar. Functions like an embedded grader for long-running workflows — particularly useful for code review, incident response, and any multi-step task where “done” is more than just “didn’t crash.”

Multi-agent orchestration. A lead agent spins up specialized sub-agents in parallel, each with its own tools, prompts, and models, all working on a shared filesystem with preserved event history. Anthropic’s docs use an incident-response example: lead agent delegates to sub-agents pulling deploy history, logs, metrics, and tickets, then synthesizes findings.

All three are exposed via Claude Managed Agents on the Claude Platform. Detailed write-up: Anthropic Launches Dreaming for Claude Agents — Let’s Data Science and Simon Willison’s live blog from the event.

The Tokenizer ~35% Inflation: No Patch, No Pricing Relief

A month later, the tokenizer issue we flagged on day one is now “baked-in” rather than a fixable bug. Anthropic kept Opus 4.7’s nominal pricing at the Opus 4.6 levels ($5/$25 per million tokens), but the new tokenizer’s 0-35% extra token emission means real per-request cost rose materially even though the list rate didn’t. Reddit and X are still calling it a “stealth hike.” Mitigation paths that actually work:

Aggressive prompt caching. Cached input reads at 90% discount makes the tokenizer overhead nearly disappear on workloads with repeated context (RAG, multi-turn agents, code-review bots).
Shrink static prompts. Anything above ~200 input tokens of boilerplate that ships every request is wasted spend at 4.7’s tokenizer ratio.
Lower max_tokens ceilings. Verbosity is part of the regression — capping output length recovers some of the loss.
Stick with Opus 4.6 for cost-sensitive workloads. Still available via the API. The “downgrade for cost reasons” calculus is now real.

One-Month Reception (Mid-May 2026)

Net sentiment has split sharply along use case. From Reddit, X, and the developer forums:

Power users (coding/agent builders, enterprises) mostly positive. Praise focuses on the long-horizon agentic gains, the new xhigh effort level, /ultrareview, and how well Opus 4.7 plays with the May 6-7 agent platform features. Cursor, Harvey, and other agent shops are publicly enthusiastic.
General Claude.ai consumer users more negative. Common complaints: perceived regressions on instruction-following, “combative” or “lazy” pushback on requests, verbosity, occasional over-cautious refusals on benign code, and a noticeable MRCR (long-context retrieval) drop. A vocal subgroup calls it “legendarily bad” for everyday chat use.
Cost-sensitive users uniformly negative on the tokenizer. The 12-35% effective price increase soured the launch arc for anyone hitting Pro/Max quota limits.

Sonnet 4.7 and 4.7.x Checkpoints — Still Pending

As of May 14, 2026:

No Sonnet 4.7 announcement from Anthropic.
No dated Opus 4.7 sub-checkpoints (no claude-opus-4-7-20260507-style IDs in any cloud or platform doc). The single claude-opus-4-7 GA model is the only Opus 4.7 surface.
Claude Code 2.1.110 added “xhigh” effort default + mobile push notifications, and 2.1.133 (May 8) continued the Code-side iteration, but no model-side checkpoint.

So the next major Anthropic ship date to watch isn’t a model-version release — it’s the next set of agent platform features building on Opus 4.7, Dreaming, and Outcomes. Watch the Claude release notes for cadence.

The Bottom Line

Opus 4.7 is a better coding model and a better agentic model. That part is not subjective — the benchmarks line up across multiple neutral evaluators and Cursor’s own before/after.

The story that gets lost in the launch-day excitement is the tokenizer. Same $5/$25 sticker, ~35% more billable tokens for the same text. If you manage usage-capped workflows on Pro — or line items on an API bill — that’s the number to optimize around, not SWE-bench.

Treat 4.7 as a power-user model. Set effort levels deliberately. Keep 4.6 in your back pocket for general chat until the community consensus settles.

Sources (last verified April 21, 2026):