Claude Opus 4.7 is live. Not “coming soon,” not “leaked,” not “this week.” Live — as of a few hours ago.
After three days of Polymarket odds, The Information exclusives, and stock market jitters, Anthropic shipped its next flagship model on April 16, 2026. Model ID: claude-opus-4-7. Available right now on the Claude platform, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
The AI Design Tool that everyone expected to ship alongside it? Didn’t happen. Today is purely about the model upgrade. And the numbers are worth paying attention to.
What Is Claude Opus 4.7?
If you’re new to Claude, here’s the quick version: Claude is Anthropic’s AI assistant — similar to ChatGPT or Google Gemini. Opus is their most powerful model, the one you use for hard problems. Version 4.7 replaces version 4.6, which launched in February 2026.
For developers and power users: Opus 4.7 is an incremental but meaningful upgrade over 4.6. Better at coding, better at vision, new effort controls, and a multi-agent code review command. Not a ground-up rewrite — more like a car that got a tuned engine, a sharper camera, and a new driving mode.
The Benchmarks: What Actually Improved
Numbers first. Here’s how Opus 4.7 compares to its predecessor and the competition:
Opus 4.7 vs Opus 4.6
| Benchmark | Opus 4.7 | Opus 4.6 | Change |
|---|---|---|---|
| SWE-bench Verified (real-world coding) | 87.6% | 80.8% | +6.8 |
| SWE-bench Pro (harder coding) | 64.3% | ~60% | ~+4 |
| CursorBench (IDE coding) | 70% | 58% | +12 |
| GPQA Diamond (graduate reasoning) | 94.2% | ~91% | +3 |
| Terminal-Bench 2.0 (CLI tasks) | 69.4% | — | new |
| OSWorld-Verified (desktop automation) | 78.0% | — | new |
| MCP-Atlas (tool use) | 77.3% | 75.8% | +1.5 |
| MMMLU (multilingual Q&A) | 91.5% | 91.1% | +0.4 |
The headline number: SWE-bench Verified jumped 6.8 points — from 80.8% to 87.6%. That’s the benchmark that measures whether an AI can actually fix real bugs in real open-source repositories. A 6.8-point jump in one release is significant. For context, the jump from Opus 4.5 to 4.6 was about 5 points.
CursorBench — which tests AI coding inside the Cursor IDE — jumped 12 points. That’s the biggest single-benchmark improvement in the release.
Opus 4.7 vs the Competition
| Benchmark | Opus 4.7 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| SWE-bench Verified | 87.6% | ~83% | ~81% |
| GPQA Diamond | 94.2% | 94.4% | 94.3% |
| MMMLU | 91.5% | 90.8% | 91.2% |
On coding, Opus 4.7 leads. On graduate-level reasoning (GPQA), GPT-5.4 has a hair-thin 0.2-point edge. Multilingual is essentially a three-way tie. The takeaway: Opus 4.7 is the best coding model available right now, and competitive on everything else.
One efficiency note that surprised people: low-effort Opus 4.7 performs roughly like medium-effort Opus 4.6. If you’re on Opus 4.6 today and switch to 4.7 at the same effort level, you’re effectively getting a free upgrade. If you drop to a lower effort level, you save tokens while maintaining roughly the same quality.
The developer community noticed fast. Cursor announced Opus 4.7 support within minutes of launch — with 50% off to drive adoption. Replit reported achieving the same quality at lower cost with the new model. And Poe’s platform team confirmed that “coding performance has improved meaningfully in Opus 4.7 compared to Opus 4.6.” The early consensus: this is a real upgrade, not a marketing bump.
New Feature: xhigh Effort Level
Claude has had effort levels for a while — low, medium, high, and max. They control how much “thinking” the model does before responding. More thinking = better reasoning = more tokens consumed.
Opus 4.7 adds xhigh — a new level between high and max. Think of it as “try harder than usual, but don’t go nuclear.”
Here’s the practical decision framework:
| Effort | Best For | Token Cost |
|---|---|---|
| Low | Simple, well-defined tasks. Quick answers. | Lowest |
| Medium | Everyday work — bugs, features, refactoring | Standard |
| High | Complex debugging, multi-file refactors, architecture | Higher |
| xhigh | Hard problems where high wasn’t quite enough | ~2x high |
| Max | Genuinely hard: algorithmic complexity, mysterious bugs, critical design | Highest |
When is xhigh worth the tokens? When you find yourself re-prompting on high because Claude’s first answer wasn’t quite right. If a task takes three tries on high, one try on xhigh might be cheaper overall — fewer retries means fewer total tokens.
For the API: use thinking: {type: "adaptive"} with the effort parameter. Note that manual extended thinking is no longer supported in Opus 4.7 — it’s adaptive-only now. Thinking tokens are billed at output rates ($25/M).
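To make the shape concrete, here is a minimal sketch of a request body using adaptive thinking with the new xhigh effort level. The `thinking` and `effort` field shapes are taken from this article, not verified against the live schema; check the API docs for the exact field names before relying on them.

```python
import json

def build_request(prompt: str, effort: str = "high") -> dict:
    """Construct a Messages API request body (sketch; field shapes assumed)."""
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        # Manual extended thinking is gone in 4.7 -- adaptive is the only mode.
        "thinking": {"type": "adaptive"},
        "effort": effort,  # low | medium | high | xhigh | max
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Find the race condition in this queue implementation.",
                     effort="xhigh")
print(json.dumps(body, indent=2))
```

The point of building the body as a plain dict first: you can log or diff it when comparing effort levels across runs, before any tokens are spent.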
New Feature: /ultrareview — Multi-Agent Code Review
This is the feature that developers will talk about most. /ultrareview is a new Claude Code slash command that runs a multi-agent code review on your codebase.
Instead of a single Claude instance scanning your code, /ultrareview spawns multiple specialized agents — one for security, one for logic, one for performance, one for style — and synthesizes their findings into a single report. It’s like having four senior engineers review your PR at once.
Early reactions from the developer community describe it as catching issues that single-pass review consistently misses — particularly subtle logic errors and security patterns that require cross-file reasoning.
We haven’t seen detailed benchmarks on /ultrareview specifically, so consider this a “promising but verify” feature for now.
New Feature: Task Budgets (Public Beta)
Task budgets let you set a spending cap on agentic workloads. If you’ve ever kicked off a Claude Code task and watched it run for 45 minutes consuming tokens with no stop in sight, this is the fix.
Set a budget in dollars or tokens for a task, and Claude will work within that constraint — prioritizing the highest-impact actions and stopping when the budget is consumed. Available in public beta through the API.
This matters for teams. Engineering leads who’ve been nervous about giving developers open-ended Claude Code access now have a cost control mechanism. Set a $5 budget per task, and nobody accidentally burns $50 on a runaway refactoring loop.
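The budget mechanics are easy to reason about even before the beta stabilizes. Here is a client-side sketch of the idea — stop dispatching agent steps once a dollar cap would be exceeded. The real feature is enforced server-side and its API surface may differ; this only illustrates the accounting, using the article’s rates ($5/M input, $25/M output).

```python
INPUT_RATE = 5 / 1_000_000    # $ per input token
OUTPUT_RATE = 25 / 1_000_000  # $ per output (and thinking) token

def run_with_budget(actions, budget_usd: float):
    """actions: iterable of (input_tokens, output_tokens) per agent step.
    Returns (steps_completed, dollars_spent), stopping before overshoot."""
    spent = 0.0
    completed = 0
    for in_tok, out_tok in actions:
        cost = in_tok * INPUT_RATE + out_tok * OUTPUT_RATE
        if spent + cost > budget_usd:
            break  # budget consumed: stop rather than overshoot
        spent += cost
        completed += 1
    return completed, round(spent, 4)

# Five identical steps at $1.50 each against a $5 cap: only three fit.
steps = [(100_000, 40_000)] * 5
print(run_with_budget(steps, budget_usd=5.0))  # → (3, 4.5)
```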
New Feature: 2,576px Native Vision
Opus 4.7 processes images at up to 2,576 pixels on the long edge — more than three times the capacity of previous Claude models (which topped out around 800px effective resolution).
What this means in practice: you can feed Claude a full-resolution screenshot of a web page, a dense architectural diagram, or a photo of a whiteboard, and it can read details that were previously blurred or lost. For UI review work, this is a meaningful upgrade — Claude can now spot pixel-level issues that required downscaling before.
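If you send images through the API, nothing changes in how you attach them — a higher-resolution file simply survives intact. A sketch, using the standard base64 image content block from the Messages API (the model ID comes from this article):

```python
import base64

def image_message(png_bytes: bytes, question: str) -> dict:
    """Build a user message pairing a screenshot with a question about it."""
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": question},
        ],
    }

# A 2,560px-wide screenshot now fits under the 2,576px long-edge limit
# without client-side downscaling. (Placeholder bytes stand in for a real PNG.)
msg = image_message(b"\x89PNG...", "Which buttons are misaligned on this page?")
print(msg["content"][0]["source"]["media_type"])  # → image/png
```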
What Didn’t Ship
The AI Design Tool / Builder is not in this release. The Information reported on April 14 that Opus 4.7 would ship alongside a prompt-based design tool for websites, presentations, and product mockups. Anthropic’s actual announcement is Opus-only. The Builder product remains unshipped.
If you’re a designer who was watching for this — keep watching. The tool is still expected, just not today.
Pricing: Unchanged
| Model | Input | Output |
|---|---|---|
| Opus 4.7 | $5/M tokens | $25/M tokens |
| Opus 4.6 | $5/M tokens | $25/M tokens |
Same price, better model. The one caveat: if you use xhigh effort, you’ll consume more thinking tokens (billed at the $25/M output rate). A task that cost $0.10 on high might cost $0.18-0.20 on xhigh. Whether that’s worth it depends on how many retries you save.
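A back-of-envelope check of that caveat, using the article’s rates (thinking bills at the $25/M output rate; the specific token counts below are illustrative, not measured):

```python
def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a task at Opus 4.7 rates: $5/M input, $25/M output."""
    return input_tokens * 5 / 1e6 + output_tokens * 25 / 1e6

# 2,000 input tokens and 3,600 output/thinking tokens on high:
print(round(task_cost(2_000, 3_600), 2))  # → 0.1
# Roughly doubling the thinking tokens on xhigh:
print(round(task_cost(2_000, 7_200), 2))  # → 0.19
```

So one clean xhigh run (~$0.19) beats two failed high runs plus a retry (~$0.30), which is the break-even intuition from the effort section.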
Where to use it: Available now on the Claude platform (claude.ai), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. If you’re on Claude Pro or Max, you already have it — Opus 4.7 replaces 4.6 automatically.
What It Can’t Do (Honest Limitations)
Let’s keep this grounded:
- It’s not Mythos. Claude Mythos Preview (the cybersecurity model) remains separate and unavailable to the public. Opus 4.7 is an incremental improvement, not a leap to the next generation.
- The reasoning gains are incremental, not transformative. GPQA went up 3 points. MMMLU went up 0.4 points. This is polishing, not breakthrough.
- xhigh costs more and isn’t always better. On simple tasks, it just burns tokens without improving the answer. Use it for genuinely hard problems.
- Context rot still applies. The 1M token window hasn’t changed, and session management still matters. More capable model, same context dynamics. (See our session management guide for the techniques that keep sessions sharp.)
- No design tool today. If you were waiting for the Builder product, keep waiting.
What This Means for You
If you already use Claude (Pro or Max): You have Opus 4.7 right now — it replaced 4.6 automatically. You don’t need to change anything. But try xhigh on your next hard coding problem and see if it saves you retry cycles. And try /ultrareview on a codebase you know well — it’s the fastest way to evaluate whether multi-agent review catches things you’d miss.
If you’re a developer using Claude Code daily: The 12-point CursorBench jump and 6.8-point SWE-bench jump are real. Long-running agentic tasks should feel noticeably more reliable. Set up Task budgets if you haven’t already — it’s the responsible way to run extended auto mode without surprise bills.
If you’re choosing between Claude, ChatGPT, and Gemini: Opus 4.7 is now the best coding model on the market by measurable benchmarks. GPT-5.4 still edges it on pure reasoning (by 0.2 points on GPQA). Gemini 3.1 Pro is competitive everywhere but doesn’t lead anywhere. For coding work, the data says Claude. For everything else, it’s a close three-way race.
If you build products on the Claude API: Same pricing, better model — no migration needed. The xhigh effort level is worth benchmarking on your specific workload. And Task budgets in public beta should be on your evaluation list for production agentic systems.
If you’re new to AI tools: This is a good day to try Claude. The free tier gives you access to Sonnet (not Opus), but Pro ($20/month) gives you full Opus 4.7 access. If you’ve been on the fence, the gap between Claude and the alternatives just got wider on coding tasks.
The Bottom Line
Opus 4.7 is a solid upgrade, not a revolution. The coding improvements are the story — SWE-bench up 6.8 points, CursorBench up 12 points, and low-effort 4.7 matching medium-effort 4.6 for free. The new features (xhigh, /ultrareview, Task budgets, 2,576px vision) are practical additions that solve real problems.
The AI Design Tool didn’t ship. That’s the thing people will remember about today — not because Opus 4.7 is disappointing, but because the expectation was set for something bigger.
But on its own merits? Opus 4.7 is the best model Anthropic has released to the public, and as of today, the strongest coding AI you can actually use.
Sources:
- Anthropic: Introducing Claude Opus 4.7
- Anthropic: Claude Opus Product Page
- AWS Blog: Introducing Claude Opus 4.7 in Amazon Bedrock
- Claude API Docs: What’s New in Opus 4.7
- Claude API Docs: Effort Levels
- OfficeChai: Opus 4.7 Beats GPT-5.4 on Most Benchmarks
- BenchLM: Claude API Pricing April 2026
- Investing.com: Anthropic Launches Opus 4.7
- Yahoo Finance: Anthropic Launches Opus 4.7