Within 24 hours of Opus 4.7 shipping, the complaints started. “Significantly worse than 4.6.” “Doesn’t think at all for complex tasks.” “Serious regression.” One developer switched back to 4.6 within minutes. A Japanese user posted a single devastating line: “評判悪すぎて速攻4.6にした” — the reviews are so bad I switched back to 4.6 immediately.
And then Boris Cherny — the lead engineer on Claude Code at Anthropic — posted something that got 936 likes and zero argument: “Opus 4.7 is a significant step up. To get the most out of it, take the time to adjust your workflow.”
Both sides are right. And the gap between them is the most important prompting lesson of 2026.
What Actually Changed Under the Hood
Before we get to the drama, here’s what Anthropic shipped in Opus 4.7 that directly affects how your prompts work:
1. It Takes You Literally Now
This is the big one. One developer nailed it in a single post: “No more silent generalisation. If you didn’t ask for it, it won’t do it. Great for pipelines. Annoying if your prompts relied on Claude filling in the gaps.”
Opus 4.6 was forgiving. If you wrote a vague prompt, it would guess what you meant and fill in reasonable defaults. Opus 4.7 does exactly what you say — no more, no less. For production pipelines, this is a huge upgrade. For people used to lazy prompting, it feels like a downgrade.
2. New Tokenizer (Up to 35% More Tokens)
Opus 4.7 uses a new tokenizer. The same input can map to roughly 1.0-1.35x as many tokens as under Opus 4.6 — up to 35% more, depending on content type. Some developers have measured real-world ratios of 1.16-1.51x, exceeding the official range at the top end.
Anthropic’s response? They increased rate limits for all subscribers to compensate. Boris Cherny announced this in a post that got 15,000+ likes — the most-engaged Anthropic engineering post we’ve seen.
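The arithmetic is worth making concrete. Here's a minimal sketch of how the tokenizer ratio compounds into monthly input-token spend — the daily volume and per-million-token price are placeholders for illustration, not real Anthropic pricing:

```python
# Illustrative sketch: how a 1.0-1.35x tokenizer ratio compounds into monthly
# API input cost. Volume and price are placeholder numbers, not real pricing.

def monthly_input_cost(tokens_per_day: int, price_per_mtok: float,
                       tokenizer_ratio: float = 1.0) -> float:
    """Estimated 30-day input-token cost after applying a tokenizer ratio."""
    return tokens_per_day * tokenizer_ratio * 30 / 1_000_000 * price_per_mtok

baseline = monthly_input_cost(5_000_000, 15.0)          # 4.6-era tokenizer
worst_case = monthly_input_cost(5_000_000, 15.0, 1.35)  # +35% tokens on 4.7
print(f"baseline: ${baseline:.2f}, worst case: ${worst_case:.2f}")
```

At 5M input tokens a day, a 1.35x ratio turns a $2,250 month into a $3,037.50 month — which is why the rate-limit increase matters more to subscription users than to per-token API customers.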
3. Adaptive Thinking Only
In Opus 4.6, you could set a manual thinking budget: “use exactly 32,000 tokens for reasoning.” In 4.7, that’s gone. The model now decides how much to think based on the complexity of your request. This is called adaptive thinking.
For simple questions, it thinks less (faster, cheaper). For complex problems, it thinks more (slower, better). You can’t force a specific budget anymore — only steer with effort levels (low, medium, high, xhigh, max).
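A request under adaptive thinking might look like the sketch below. The field names (`thinking: {type: "adaptive"}`, `effort`) and the model id are taken from this article's description and are assumptions — verify the exact shapes against the migration guide before shipping:

```python
# Sketch of a 4.7-style request body under adaptive thinking. Field names
# and the model id follow this article's description and are ASSUMPTIONS —
# check the official migration guide for the real parameter shapes.

def build_request(prompt: str, effort: str = "medium") -> dict:
    allowed = {"low", "medium", "high", "xhigh", "max"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "claude-opus-4-7",         # placeholder model id
        "max_tokens": 4096,
        "thinking": {"type": "adaptive"},   # no manual budget_tokens anymore
        "effort": effort,                   # steers depth instead of a budget
        "messages": [{"role": "user", "content": prompt}],
    }
```

Note what's absent: there is no `budget_tokens` field to set. Effort is the only dial left.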
4. Temperature Controls Removed
If your API calls used custom temperature, top_p, or top_k settings, they’ll now return a 400 error. Opus 4.7 doesn’t accept these parameters. You guide output style through prompting, not sampling parameters.
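If you have many call sites, a defensive shim can keep old code from throwing 400s while you migrate. This is a generic sketch over plain kwargs dicts, not a wrapper for any specific SDK:

```python
# Defensive migration sketch: drop the sampling parameters this article says
# Opus 4.7 now rejects, so legacy call sites keep working while you migrate.
# Generic dict filtering — not tied to any particular SDK's client object.

REMOVED_PARAMS = ("temperature", "top_p", "top_k")

def migrate_kwargs(kwargs: dict) -> dict:
    """Return a copy of request kwargs with unsupported sampling params dropped."""
    cleaned = {k: v for k, v in kwargs.items() if k not in REMOVED_PARAMS}
    dropped = sorted(set(kwargs) - set(cleaned))
    if dropped:
        print(f"dropped unsupported params: {dropped}")  # surface for review
    return cleaned
```

Logging the dropped keys matters: each one is a place where you were steering style through sampling, and each now needs a prompting or effort-level equivalent instead.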
The “Regression” Is Actually a Precision Upgrade
Here’s the pattern we’re seeing across hundreds of community posts:
| What Users Say | What’s Actually Happening |
|---|---|
| “It ignores parts of my prompt” | It follows instructions literally — if you didn’t say it, it won’t do it |
| “The output is shorter” | Length now matches complexity — it won’t pad short answers anymore |
| “It doesn’t think on simple tasks” | Adaptive thinking correctly allocates less reasoning to easy tasks |
| “My pipeline broke” | Custom temperature/thinking parameters are no longer accepted |
| “It costs more tokens” | New tokenizer maps the same input to 1.0-1.35x as many tokens |
| “The warmth is gone” | Less validation and encouragement — more direct, professional tone |
Each of these is a real change. None of them is a regression in capability. The model didn’t get dumber — it got more precise. And precision punishes imprecision.
A senior engineer at AMD filed a detailed bug report analyzing 6,852 Claude Code sessions and concluded the model “could not be trusted for complex engineering work.” Anthropic’s Boris Cherny responded that the specific technical change cited (a “redact-thinking” header) was UI-only and “does not impact thinking itself.” The disagreement isn’t about data — it’s about expectations meeting a different model personality.
5 Prompt Fixes That Solve the “Regression”
If Opus 4.7 feels worse on your workload, try these specific changes. Each one addresses a real behavioral shift in 4.7.
Fix 1: Be Explicit About Scope
Before (worked on 4.6, breaks on 4.7):
“Review this function”
After (works on 4.7):
“Review this function for: (1) correctness — does it handle edge cases for null inputs and empty arrays? (2) performance — are there any O(n²) patterns that could be O(n)? (3) readability — suggest clearer variable names if any are ambiguous”
Why: 4.6 would guess you wanted all three. 4.7 takes “review” at face value and might only check one dimension. Spelling out what you want costs about 30 extra words and turns a one-dimensional review into guaranteed coverage of all three.
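One way to make Fix 1 a habit rather than a resolution is to generate scoped prompts programmatically. The helper below is a hypothetical sketch — it simply refuses to emit a bare "review this" prompt:

```python
# Hypothetical helper for Fix 1: refuse to build a bare "review this" prompt
# and force the caller to enumerate the review dimensions explicitly.

def review_prompt(code: str, dimensions: list[str]) -> str:
    if not dimensions:
        raise ValueError("list at least one review dimension — 4.7 won't guess")
    numbered = " ".join(f"({i}) {d}" for i, d in enumerate(dimensions, 1))
    return f"Review this function for: {numbered}\n\n{code}"

prompt = review_prompt(
    "def f(xs): return max(xs)",
    ["correctness — handle empty input", "readability — clearer names"],
)
```

The same pattern applies to Fixes 2, 3, and 5: whatever you used to leave implicit, make it a required argument.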
Fix 2: Request Reasoning When You Need It
Before:
“Is this approach correct?”
After:
“Analyze this approach step by step. Show your reasoning. Then give a verdict: correct, partially correct, or incorrect — with specific evidence.”
Why: Adaptive thinking might decide a yes/no question needs minimal reasoning. If you want deep analysis, ask for it explicitly. The magic phrase is “show your reasoning” or “think through this step by step.”
Fix 3: Specify Output Format
Before:
“Explain how authentication works in this codebase”
After:
“Explain how authentication works in this codebase. Structure your answer as: (1) The auth flow from login to token refresh, (2) Key files involved with their responsibilities, (3) Potential security concerns, (4) How a new developer should test auth changes”
Why: 4.6 would generate a long, exploratory explanation. 4.7 matches output length to perceived complexity. If the question looks simple, you get a simple answer. Adding structure signals “I want depth.”
Fix 4: Use Effort Levels Instead of Temperature
Before (4.6 API call):
temperature: 0.3, top_p: 0.9
After (4.7 API call):
thinking: {type: "adaptive"}, effort: "xhigh"
Why: Custom sampling parameters now return errors. Use effort levels instead — they control thinking depth, which is what most temperature tweaks were actually trying to achieve. Low for fast answers, xhigh for hard problems.
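If you're migrating many configs, a crude mapping from old temperature values to effort levels can serve as a starting point. This mapping is our own rule of thumb following the article's logic (low temperature usually meant "be careful", which thinking depth now addresses) — it is not an official equivalence:

```python
# Heuristic sketch for Fix 4: map legacy temperature settings onto effort
# levels. This is a rule of thumb, NOT an official equivalence — low
# temperatures usually meant "be precise", which effort depth now serves.

def effort_for_temperature(temperature: float) -> str:
    if temperature <= 0.2:
        return "xhigh"   # old "be precise/deterministic" settings
    if temperature <= 0.5:
        return "high"
    if temperature <= 0.8:
        return "medium"
    return "low"         # old "be loose/fast" settings

```

Audit the results by hand: a high temperature that was steering toward *creative* output needs a prompting fix, not an effort level.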
Fix 5: Tell It When to Be Thorough
Before:
“Fix this bug”
After:
“Fix this bug. Before writing the fix: (1) Read the relevant test file to understand expected behavior, (2) Check if similar bugs exist in adjacent functions, (3) Write the fix, (4) Verify the fix handles the edge case described in the error message”
Why: 4.6 would sometimes do these steps on its own. 4.7 does what you ask — and only what you ask. If you want a thorough investigation before a fix, say so.
✅ Quick Check: Notice the pattern across all five fixes? They all add specificity. That’s the entire lesson: Opus 4.7 rewards specificity and punishes vagueness.
The Community Is Split (And That’s Informative)
The reactions to Opus 4.7 fall into two clear camps:
Camp 1: “It’s worse”
- Developers whose prompts relied on Claude filling in gaps
- People running non-coding tasks (creative writing, analysis) where the “warmth” mattered
- Users with custom temperature/thinking parameters in their API calls
- Anyone who expected a drop-in replacement without prompt changes
Camp 2: “It’s the best Opus yet”
- Developers who write specific, structured prompts
- People running production pipelines where literal instruction-following is critical
- Anthropic’s own team (Boris Cherny: “I’ve been feeling incredibly productive”)
- Users who adjusted their workflow within the first day or two
The split isn’t random. It tracks almost perfectly with prompting specificity. The more specific your prompts were before the upgrade, the better your experience with 4.7. The more you relied on Claude’s ability to “read your mind,” the worse it feels.
Boris Cherny’s own admission is the most telling data point: “It took a few days for me to learn how to work with it effectively.” Even the lead engineer on Claude Code needed adjustment time.
The Honest Counterarguments
We’re arguing that the “regression” is mostly a prompting issue. But some complaints are legitimate:
The token cost increase is real. If the same prompt costs 35% more tokens and you’re on a budget, that’s not a prompting problem — that’s a pricing change disguised as a model update. Anthropic addressed this by raising rate limits, but for API-heavy users paying per token, the bill goes up.
Non-coding tasks did take a hit. Multiple reports describe Opus 4.7 as worse at open-ended creative writing and exploratory analysis. The “length matches complexity” behavior means shorter, more direct answers — which is great for code review and terrible for brainstorming. If your primary use case is creative, this is a genuine tradeoff, not a prompting mistake.
The “warmth” matters. Opus 4.6’s tendency to validate and encourage wasn’t just filler — for some users, especially non-technical ones using Claude for daily work, that warmth was part of the product. Losing it without warning feels like a downgrade even if the technical capabilities improved.
What This Means for You
If you’re a developer and your Claude Code sessions feel worse: Start with Fix #1 (explicit scope) and Fix #5 (tell it to be thorough). These two changes alone fix most “regression” complaints. Also try /effort xhigh on tasks where you used to rely on extended thinking — it’s the closest equivalent to the old manual thinking budget.
If you’re an API user and your costs went up: Audit your tokenizer impact — compare input token counts between 4.6 and 4.7 on the same prompts. Then evaluate whether Task budgets (now in public beta) can cap your agentic workloads. The increased rate limits help, but they don’t change per-token pricing.
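The audit step above reduces to simple arithmetic once you have paired measurements. Given input-token counts for the same prompts under each model (e.g. from the API's token-counting endpoint), the sketch below computes per-prompt ratios and the overall blowup — the sample numbers are illustrative:

```python
# Sketch for the tokenizer audit: given paired input-token counts for the
# same prompts under 4.6 and 4.7, compute per-prompt ratios and the overall
# blowup. Sample counts are illustrative placeholders.

def tokenizer_ratios(counts_46: list[int], counts_47: list[int]) -> dict:
    assert len(counts_46) == len(counts_47), "need paired measurements"
    per_prompt = [new / old for old, new in zip(counts_46, counts_47)]
    overall = sum(counts_47) / sum(counts_46)
    return {"per_prompt": per_prompt, "overall": overall}

report = tokenizer_ratios([1000, 2000], [1200, 2500])
# overall = 3700 / 3000 ≈ 1.23
```

Weight the overall ratio by your real traffic mix — code-heavy and prose-heavy prompts can sit at opposite ends of the 1.0-1.35x range.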
If you use Claude for writing and creative work: You might genuinely prefer staying on Sonnet 4.6 for creative tasks while using Opus 4.7 for coding and analysis. There’s no rule that says you have to use one model for everything. Match the model to the task.
If you’re new to Claude and don’t have old prompts to break: You’re actually in the best position. Start with Opus 4.7 and build good habits from day one — specific prompts, structured output requests, explicit reasoning instructions. You’ll never develop the “vague prompting” habits that 4.6 tolerated and 4.7 doesn’t.
The Bottom Line
Opus 4.7 isn’t a regression. It’s a model that stopped guessing what you mean and started doing exactly what you say.
For most people, adding 20-30 words of specificity to their prompts completely fixes the “regression.” For some use cases — creative writing, open-ended exploration, workflows that relied on custom temperature settings — the tradeoffs are real and worth acknowledging.
But the direction is clear: AI models are getting more precise, not less. The prompting habits that worked in 2025 and early 2026 — vague instructions, expecting the model to fill in gaps, relying on sampling parameters for style — are becoming liabilities. Opus 4.7 is the first model to make that shift obvious.
Your prompts need to grow up. And that’s actually good news — because specific prompting gives you better results on every model, not just Opus 4.7.
Sources:
- Anthropic: Introducing Claude Opus 4.7
- Claude API Docs: What’s New in Opus 4.7
- Claude API Docs: Migration Guide
- Claude API Docs: Adaptive Thinking
- VentureBeat: Is Anthropic Nerfing Claude?
- Axios: Anthropic’s AI Downgrade Stings Power Users
- GitHub Issue #47483: Opus 4.6 Quality Regression Analysis
- The Register: Claude Is Getting Worse, According to Claude
- VentureBeat: Anthropic Releases Opus 4.7