Claude Opus 4.7 shipped April 16, and the reception split into two camps almost immediately. The benchmarks crowd loved it (SWE-bench up 6.8 points, vision tripled, xhigh mode is real). The prompt-library crowd hated it (one viral Reddit thread titled “not an upgrade but a serious regression” hit 2,300 upvotes, citing token bloat and over-formatted output).
Both camps are right. The benchmarks improved. And a lot of carefully tuned 4.6 prompts now produce worse output on 4.7.
The reason is almost embarrassingly simple: 4.7 is more literal. It does what you actually wrote, not what you implied. Most prompts that worked on 4.6 worked because Claude was filling in gaps. On 4.7, the gaps stay gaps.
Here’s the migration map — seven specific patterns that break, with the rewrite formula that fixes each one.
What Actually Changed (the Short Version)
Three shifts you need to know about:
- Literal instruction following. 4.7 follows what you wrote. If your prompt said “for the first item, do X” but you wanted it for all items, 4.6 would generalize. 4.7 won’t.
- More aggressive length calibration. Simple prompts get short answers. Complex prompts get long ones. The medium-essay default is gone.
- New tokenizer. Text uses ~1.0–1.35× the tokens of 4.6 (the worst case is code-heavy prompts). High-res images can use up to 3× more, because max resolution went up from 1,568px to 2,576px.
That’s the foundation. Now the seven patterns.
Pattern 1: “Do It Once, Then Generalize to Everything”
What worked on 4.6:
Here’s how I want you to respond — Q1: definition + example + code. Now answer these 12 questions.
4.6 would silently generalize that pattern to all 12 questions, even though you only stated the rule once.
Why it breaks on 4.7: 4.7 won’t silently generalize an instruction from one item to another. It also won’t infer requests you didn’t make. By question 4, the model may have drifted to a different format. By question 12, you’re getting one-sentence answers.
The rewrite:
For every item in the list below, follow this response contract:
- Restate the item name.
- Give a 2-sentence explanation.
- Add one example in code.
Apply this contract independently to each item. Q1: …
Make the generalization explicit. 4.7 will follow it reliably — it just won’t infer it.
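If you assemble these prompts programmatically, bake the contract in once per request; a minimal sketch in Python (the questions are placeholders):

```python
CONTRACT = """For every item in the list below, follow this response contract:
- Restate the item name.
- Give a 2-sentence explanation.
- Add one example in code.
Apply this contract independently to each item."""

questions = ["What is idempotency?", "What is backpressure?"]  # placeholders
prompt = CONTRACT + "\n\n" + "\n".join(
    f"Q{i}: {q}" for i, q in enumerate(questions, start=1)
)
```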
Pattern 2: “Be Smart About Length For Me”
What worked on 4.6:
Explain this API design decision.
You’d get 3-5 paragraphs. Comfortable, mid-length, readable.
Why it breaks on 4.7: 4.7 calibrates length to its judgment of task complexity. The same prompt now gets one paragraph (if the model thinks it’s simple) or an essay (if it thinks it’s hard). Either way, the consistency in your prompt-library output is gone.
The rewrite:
Explain this API design decision in 3-5 short paragraphs, focusing on trade-offs and alternatives. Keep the total under 300 words.
Specify the format and length directly. 4.7 actually respects word count constraints better than 4.6 did — but only if you give them.
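If the same constraint recurs across a prompt library, template it once rather than hand-writing it each time; a small sketch (the names are illustrative):

```python
LENGTH_TEMPLATE = (
    "{question}\n\n"
    "Answer in {n_paragraphs} short paragraphs, focusing on trade-offs "
    "and alternatives. Keep the total under {max_words} words."
)

prompt = LENGTH_TEMPLATE.format(
    question="Explain this API design decision.",
    n_paragraphs="3-5",
    max_words=300,
)
```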
Pattern 3: Vague “Debug This / Research That”
What worked on 4.6:
Debug this function.
4.6 would do the implicit work: read, hypothesize, propose a fix, re-check.
Why it breaks on 4.7: At default or low effort, 4.7 scopes its work to literally what was asked. “Debug this function” gets a shallow one-shot suggestion instead of a methodical debug pass. The model didn’t get worse at debugging — it stopped volunteering to do more than you asked.
The rewrite (prompt + config):
In your API call:

```python
thinking={"type": "adaptive"},
output_config={"effort": "high"}
```
In your prompt:
Debug this function by following these steps:
1. Explain what the function is supposed to do.
2. Identify likely failure points.
3. Propose a fix.
4. Re-check the updated code against the requirements.
Return only the final, fixed code and a brief explanation.
The combination of explicit steps + raised effort gets you back to (or better than) 4.6 behavior. Without both, you’ll wonder why the model “got dumber.”
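Put together, a minimal end-to-end sketch, assuming the `anthropic` Python SDK and the 4.7 parameter names quoted above (the model ID and the buggy function are illustrative):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

DEBUG_PROMPT = """Debug this function by following these steps:
1. Explain what the function is supposed to do.
2. Identify likely failure points.
3. Propose a fix.
4. Re-check the updated code against the requirements.
Return only the final, fixed code and a brief explanation.

def median(xs):
    xs.sort()                # mutates the caller's list (one candidate bug)
    return xs[len(xs) // 2]  # wrong for even-length lists (another)
"""

response = client.messages.create(
    model="claude-opus-4-7",           # illustrative model ID
    max_tokens=4096,
    thinking={"type": "adaptive"},     # adaptive thinking, per the migration guide
    output_config={"effort": "high"},  # raised effort restores the methodical pass
    messages=[{"role": "user", "content": DEBUG_PROMPT}],
)
print(response.content[0].text)
```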
Pattern 4: Relying on the Warm Default Tone
What worked on 4.6:
You are a helpful assistant for customer support.
4.6’s default tone was warm. Lots of “I understand how frustrating that must be” softening. If your product depended on that warmth, you didn’t have to ask for it.
Why it breaks on 4.7: 4.7’s default tone is “more direct and opinionated, fewer emoji.” If you depended on warmth, your customer-facing UX just became noticeably blunter overnight. Multiple support teams reported customer complaints in week 2 of the migration.
The rewrite:
You are a helpful, empathetic customer support assistant. Use a warm, reassuring tone. Acknowledge feelings explicitly. Avoid curt or abrupt phrasing. Use plain language and short sentences.
Spell out the tone. 4.7 hits it cleanly once instructed — it just won’t default to it.
Pattern 5: Prefill Tricks for Format Control
What worked on 4.6:
```python
response = client.messages.create(
    model="claude-opus-4-6",  # the 4.6-era call; system prompt goes in the top-level param
    max_tokens=1024,
    system="...",
    messages=[
        {"role": "user", "content": "Classify this..."},
        {"role": "assistant", "content": "{"},  # prefill the opening brace
    ],
)
```
This trick stripped “Here is the JSON output:” preambles and forced raw JSON output. It was one of the most common patterns in production code.
Why it breaks on 4.7: Prefilling assistant messages returns a 400 error on Claude Opus 4.7. It doesn’t degrade — it hard-fails. Any prompt library using this trick stops working entirely.
The rewrite:
```python
response = client.messages.create(
    model="claude-opus-4-7",  # illustrative model ID
    max_tokens=1024,
    output_config={"format": {"type": "json_object"}},
    system="Return a single JSON object matching this schema: ...",
    messages=[
        {"role": "user", "content": "Classify this..."},
    ],
)
```
Use structured output via output_config.format. It’s actually cleaner than the prefill trick was.
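Since the output is now guaranteed raw JSON (no "Here is the JSON output:" preamble to strip), downstream parsing is a one-liner; a sketch assuming the Python SDK's response shape and a hypothetical `label` field:

```python
import json

data = json.loads(response.content[0].text)  # raw JSON, no preamble to strip
print(data["label"])                         # hypothetical schema field
```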
Pattern 6: Manual Progress and Safety Rails
What worked on 4.6:
After every 3 tool calls, summarize progress to the user. Before returning a slide deck, double-check the layout for obvious mistakes.
4.6 needed these rails. Without them, you’d get sloppy slide layouts and silent multi-tool spans where the user had no idea what was happening.
Why it breaks on 4.7: 4.7 self-checks layouts and provides progress updates natively. Your old scaffolding now causes redundant status spam and over-checking. Worse, the over-checking sometimes triggers Claude to second-guess itself and revise a correct answer into a wrong one.
The rewrite:
Provide brief progress updates only when you finish a major subtask or when the user would otherwise be waiting more than ~20 seconds. Avoid repeating information already given.
Remove the explicit double-checks. Trust 4.7 to do them itself. Re-add only the rails that prove necessary after testing.
Pattern 7: “Use Tools When Helpful”
What worked on 4.6:
Use the tools available when helpful.
4.6 was tool-happy. It would web-search aggressively, run code execution liberally, fact-check by default.
Why it breaks on 4.7: 4.7 uses tools less often than 4.6 and reasons more, especially at lower effort. Agent harnesses that assumed frequent tool usage now under-call tools and drift off stale context. Your “research this topic” agent might never search. Your “calculate this” agent might guess instead of running code.
The rewrite:
When answering, prefer tools in these situations:
- Use web_search when the answer depends on current data or external services.
- Use code_execution for non-trivial calculations or to run code.
- If uncertain about a fact, call a tool before guessing.
Err on the side of using tools for ambiguous or high-impact tasks.
Make tool policy explicit. For agentic search and coding, also raise effort to high or xhigh — 4.7 ties tool usage to effort level more tightly than 4.6 did.
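In code, the policy lives in the system prompt and the tools are declared as usual; a sketch where the tool schema and model ID are illustrative and `output_config` follows the 4.7 parameter naming used above:

```python
TOOL_POLICY = """When answering, prefer tools in these situations:
- Use web_search when the answer depends on current data or external services.
- Use code_execution for non-trivial calculations or to run code.
- If uncertain about a fact, call a tool before guessing.
Err on the side of using tools for ambiguous or high-impact tasks."""

tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    # code_execution would be declared the same way
]

response = client.messages.create(
    model="claude-opus-4-7",           # illustrative model ID
    max_tokens=4096,
    system=TOOL_POLICY,
    tools=tools,
    output_config={"effort": "high"},  # 4.7 ties tool usage to effort level
    messages=[{"role": "user", "content": "Research current rate limits for this API."}],
)
```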
The Token Cost Math (When Does 4.7 Cost 1.0x vs 3.0x?)
People are panicking about “3x more expensive.” That number is real for one specific case. For most workloads, the math is much less scary.
| Workload | Expected token multiplier | Why |
|---|---|---|
| Short chat, plain language | 1.0–1.15x | Tokenizer is nearly identical for short English text |
| Compact JSON schemas | 1.05–1.20x | Long identifiers add some bloat |
| Code-heavy contexts (repos, stack traces) | 1.25–1.35x | Code tokenization is where 4.7 is worst |
| Long agent traces with interleaved tools | 1.25–1.35x | Each tool call/result is its own token block |
| Full-res images (2,576px) | Up to 3.0x | Resolution went from 1,568px to 2,576px |
| Vision-heavy agents at full res | Up to 3.0x | Same image-token reason |
Practical defaults to avoid budget shock:
- For text-only workloads: budget +25% tokens. Plan for it. Move on.
- For images: downsample to 1,568px (the old max) unless you genuinely need higher resolution. You'll save roughly 60% on image tokens with minimal accuracy loss; a downsampling sketch follows this list.
- For agentic code work: this is where the multiplier is biggest and most easily justified by quality gains. Don't downsize, but do enable prompt caching for shared system prompts and use the batch endpoint for non-urgent jobs.
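The downsampling itself is a few lines with Pillow; a minimal sketch, where 1,568px is the old maximum from the table above:

```python
from PIL import Image

def downsample_for_claude(path: str, max_side: int = 1568) -> Image.Image:
    """Resize so the longest side is at most max_side, preserving aspect ratio."""
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # resizes in place and never upscales
    return img

downsample_for_claude("screenshot.png").save("screenshot_small.png")
```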
Claude Code Workflow Changes
If you live in Claude Code or a similar IDE integration, three additional shifts hit you:
- `xhigh` is the new coding default. Most IDE integrations set this automatically. Expect deeper reasoning, more multi-file refactors per call, and higher token use. You can dial down, but you'll feel the difference.
- Extended thinking budgets are gone. `budget_tokens` doesn't exist anymore. Use `thinking={"type": "adaptive"}` plus the `effort` setting; a before/after sketch follows this list. Thinking is also off by default — set `display="summarized"` if your workflow needs visible reasoning in the stream.
- Fewer but fatter tool calls. 4.7 makes fewer total tool calls but does more per call. For Claude Code users, this feels like fewer micro-edits and more multi-file refactors per command.
There’s also a bundled migration helper: run `/claude-api migrate` inside Claude Code, and it’ll rewrite your code for 4.7 — swapping model IDs, removing deprecated sampling params, and converting thinking-mode config. Use this before hand-tuning anything. It does the boring 80%.
What This Means for You
If you maintain a production prompt library: Treat the 4.7 migration like a schema migration. Run the seven patterns above against your library, find what breaks, rewrite systematically. Budget a real chunk of engineering time — most teams report 2-5 days of cleanup for a mature library. Don’t try to migrate prompts one at a time as users complain; you’ll spend a month firefighting.
If you have a hobby setup or personal prompts: Most of yours probably work as-is. The patterns that break hardest are the “clever, underspecified” ones — agent harnesses, multi-step automated flows, anything that depends on Claude reading between the lines. Single-shot chat prompts are mostly fine.
If you use Claude Code: Run /claude-api migrate first. It catches the parameter changes. Then re-test your most-used agent flows; you’ll find the prompt-side issues quickly because the model’s behavior is consistently more literal.
If you’ve been waiting to switch from GPT-5.5 to Claude: Don’t let this stop you. The literal-instruction issue is solvable once you know about it, and 4.7’s benchmark gains are real. Treat the prompts you write now as 4.7-native and skip the migration cycle entirely.
What NOT to Change (Still Works in 4.7)
Just as important as what breaks: a lot of 4.6 best practices got better on 4.7.
- Strong system prompts and explicit role definitions. 4.7’s instruction hierarchy is tighter — system prompts hold more weight. Doubling down here is more valuable, not less.
- Structured outputs (JSON/XML). 4.7 respects XML tags more literally than 4.6 did. Moving to `<rules>`, `<examples>`, `<output>` blocks works better than free-form Markdown; see the sketch after this list.
- Direct style/verbosity instructions. “Be concise,” “use bullet points,” “target ~150 words” — all of these work more reliably than they did on 4.6 because 4.7 adheres to format constraints more strictly.
- Long-context workflows. Still a 1M-token window at the same list price, with improved long-horizon memory. Any 4.6 design built around “dump a ton of context, let the model track it” still works.
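A sketch of the XML-tagged shape (the tag names come from the structured-outputs bullet; the content is illustrative):

```python
XML_PROMPT = """<rules>
Answer in exactly three bullet points, under 100 words total.
</rules>
<examples>
Q: Why use a queue here?
A: - It decouples producers from consumers. (plus two more bullets)
</examples>
<output>
Return only the three bullets, no preamble.
</output>

Q: Why put a cache in front of this service?"""
```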
Most of your prompt library is fine. The seven patterns above account for ~80% of the visible regressions. Fix those and move on.
The Bottom Line
The “Opus 4.7 is a regression” narrative isn’t wrong — it’s just incomplete. 4.7 is better at most things and worse at exactly one thing: rewarding lazy prompting. The fix isn’t to roll back; it’s to write prompts that say what you actually want.
If you maintain a prompt library, the seven-pattern migration above is the practical path. If you’re starting fresh, write to 4.7’s literal interpretation from the start and you’ll never feel the regression at all.
If you want to actually get good at writing prompts that work on every model — including the next one Anthropic ships — our Prompt Engineering course covers the patterns that survive model transitions, not the tricks that break the next time the tokenizer changes.
Sources:
- Claude Opus 4.7 announcement — Anthropic
- Opus 4.7 migration guide — Anthropic docs
- “Ultimate Guide to Claude Opus 4.7” — Pawel Huryn
- Opus 4.7 reception roundup — Botmonster
- Claude 4.7 prompt migration patterns — Promptolis
- Deploying Opus 4.7 in production — Caylent
- Claude Opus 4.7 hands-on review — All Things
- Opus 4.7 token bloat thread — Reddit r/ClaudeAI
- Claude Opus 4.7 for Claude Code users — Rabinarayan Patra
- Opus 4.7 discussion — Hacker News
- 4.7 tone shift observations — LinkedIn