If your Claude Sonnet 4.5 pipeline started throwing 200k token errors this morning, this is why: yesterday, April 30, 2026, Anthropic retired the context-1m-2025-08-07 beta header for Sonnet 4.5 and Sonnet 4. Any request that goes over 200k tokens now returns an error instead of silently working.
Migration target: Sonnet 4.6 or Opus 4.6, where 1M context is the default at standard pricing — no header needed. The fix is a two-line change. Here’s the walkthrough, plus the pricing shift that might mean you actually pay less after this migration than you did before.
What broke and why
The 1M context beta launched in August 2025 as a way to ship the longer context window before it was a fully supported product feature. Beta-flagged behavior at Anthropic always carries a retirement clause — and the retirement landed yesterday for Sonnet 4 and 4.5.
What it looked like in practice: any code calling `claude-sonnet-4-5-20250929` (or older Sonnet 4 variants) with `extra_headers={"anthropic-beta": "context-1m-2025-08-07"}` and a prompt over 200k tokens worked fine on April 29. The same code on May 1 returns an error like `400 invalid_request_error: prompt is too long: 487291 tokens > 200000 maximum`.
This is not a quota change, not a billing change, not a planned outage. It’s the deprecation timer hitting zero. Anthropic’s own documentation flagged April 30 in advance, but most teams running long-context pipelines didn’t see it until their first overflow request hit production today.
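If you want your monitoring to distinguish this failure from other 400s, the error string is parseable. A minimal sketch — the message format is taken from the example above and is an assumption, not a documented API contract:

```python
import re

# Illustrative helper (an assumption, not an SDK feature): recognize the
# overflow error string and pull out the token counts, so alerting can
# separate this deprecation from unrelated invalid_request errors.
_TOO_LONG = re.compile(r"prompt is too long: (\d+) tokens > (\d+) maximum")

def parse_too_long_error(message: str):
    """Return (prompt_tokens, limit) if this is the overflow error, else None."""
    m = _TOO_LONG.search(message)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

Wire this into whatever catches your API exceptions and you can count exactly how many production requests are hitting the retired limit.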
The 5-minute migration
Two lines of code if you’re using the Anthropic Python SDK directly. Slightly different if you’re going through Cursor, Aider, or another wrapper — covered below.
Direct SDK migration
```python
# Before
client.messages.create(
    model="claude-sonnet-4-5-20250929",
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},
    messages=[...],
    max_tokens=4096,
)

# After
client.messages.create(
    model="claude-sonnet-4-6",
    messages=[...],
    max_tokens=4096,
)
```
Two changes:
- Update the model ID from `claude-sonnet-4-5-20250929` to `claude-sonnet-4-6`
- Drop the beta header entirely — `context-1m-2025-08-07` is no longer used; 1M context is the default on 4.6
Same applies if you’re upgrading to Opus:
```python
client.messages.create(
    model="claude-opus-4-6",
    messages=[...],
    max_tokens=4096,
)
```
That’s it. Test with one of your largest existing prompts (something close to 500k tokens is a good sanity check), and you’re migrated.
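If you have many call sites, the two changes can also be applied mechanically. A sketch, assuming your calls pass kwargs in the shape shown above — the model-ID set here is an assumption you should extend to whatever IDs appear in your codebase:

```python
# Mechanical rewrite of request kwargs: swap the retired model ID and drop
# the retired beta header. Pure data transformation; nothing calls the API.
OLD_MODELS = {"claude-sonnet-4-5-20250929", "claude-sonnet-4-5"}
NEW_MODEL = "claude-sonnet-4-6"
RETIRED_BETA = "context-1m-2025-08-07"

def migrate_request(kwargs: dict) -> dict:
    migrated = dict(kwargs)  # leave the caller's dict untouched
    if migrated.get("model") in OLD_MODELS:
        migrated["model"] = NEW_MODEL
    headers = dict(migrated.get("extra_headers") or {})
    if headers.get("anthropic-beta") == RETIRED_BETA:
        headers.pop("anthropic-beta")
    if headers:
        migrated["extra_headers"] = headers
    else:
        migrated.pop("extra_headers", None)
    return migrated
```

Any other `anthropic-beta` values or extra headers pass through unchanged, so this is safe to run over requests that combine several beta features.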
TypeScript / JavaScript SDK
```typescript
// Before — note the beta header goes in the request options (second
// argument), not the message body
const response = await client.messages.create(
  {
    model: "claude-sonnet-4-5-20250929",
    messages: [...],
    max_tokens: 4096,
  },
  { headers: { "anthropic-beta": "context-1m-2025-08-07" } },
);

// After
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  messages: [...],
  max_tokens: 4096,
});
```
If you go through Cursor / Aider / other wrappers
Cursor: open Settings → Models → Anthropic → switch active model to claude-sonnet-4-6 (or claude-opus-4-6). Cursor handles the header drop server-side; no manual config edit needed as of the latest version.
Aider: in your .aider.conf.yml or via CLI flag, switch --model from claude-sonnet-4-5-20250929 to claude-sonnet-4-6. Aider drops the beta header automatically once the model name updates.
Continue.dev, OpenCode, and similar tools: same pattern — update the model name in your config; the beta header isn’t something these tools expose anyway.
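Whichever tool you use, it's worth a quick sweep for stragglers. A sketch of a repo audit — the file extensions and search strings are assumptions; adjust to your layout:

```python
# Scan a checkout for leftover references to the retired model ID or beta
# header, in case a wrapper config or forgotten script still carries them.
from pathlib import Path

STALE = ("context-1m-2025-08-07", "claude-sonnet-4-5-20250929")
EXTS = {".py", ".ts", ".js", ".yml", ".yaml", ".json", ".toml"}

def find_stale_refs(root: str):
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in EXTS:
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for needle in STALE:
                if needle in text:
                    hits.append((str(path), needle))
    return hits
```

Run it once before and once after the migration; the second run should come back empty.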
The pricing change you might have missed
This is the part that got buried under the deprecation news. Before yesterday, running prompts over 200k tokens on Sonnet 4.5 cost you the long-context premium: $6 per million input tokens and $22.50 per million output, instead of the standard $3 / $15.
After migration to Sonnet 4.6, that premium tier is gone. The full 1M context now bills at the standard $3 / $15 across the entire window. For workloads that frequently overshoot 200k tokens — think RAG pipelines querying large knowledge bases, or agentic coding sessions on big repos — you’re looking at a real price drop, not just an API change.
Worked example: an 800k-token RAG query that returns a 2k-token answer.
| Model | Input cost | Output cost | Total |
|---|---|---|---|
| Sonnet 4.5 + 1M beta (April 29 prices) | 800k × $6/M = $4.80 | 2k × $22.50/M = $0.045 | $4.85 |
| Sonnet 4.6 (May 1 prices) | 800k × $3/M = $2.40 | 2k × $15/M = $0.030 | $2.43 |
About 50% cheaper for the same call. If your pipeline runs that call 1,000 times a day, the migration just saved you roughly $2,400 a day — about $72,000 a month — and you only had to do it because your old code stopped working.
Sonnet 4.6 vs Opus 4.6: which to pick
The migration is a chance to rethink the model choice. Both now have 1M context at standard pricing; the question is whether the extra reasoning depth is worth the price gap.
| Dimension | Sonnet 4.6 | Opus 4.6 |
|---|---|---|
| Pricing (input/output per M) | $3 / $15 | $5 / $25 |
| 1M context (standard pricing) | Yes | Yes |
| SWE-bench Verified | ~76-77% | ~80% |
| Best for | RAG, document Q&A, fast agentic loops, ~70% of developer use cases | Complex multi-step reasoning, deep research, hard agentic coding |
| Speed | Faster (latency-optimized) | Slower (compute-heavy reasoning) |
The simple decision rule:
- If you were on Sonnet 4.5 and it worked: go to Sonnet 4.6. Same intelligence tier, slightly better output quality, same price.
- If you were on Sonnet 4.5 and you wanted more reasoning: Opus 4.6 is the upgrade. The price gap is real (~67% more expensive) but on hard agentic tasks you’ll see fewer iterations, which often nets out cheaper.
- If you’re building a new system today: start with Sonnet 4.6, escalate to Opus 4.6 only on the prompts where you can prove Sonnet falls short. Don’t pay Opus prices for tasks Sonnet handles cleanly.
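The escalation rule can live in code rather than tribal knowledge. A hypothetical routing sketch — the task-type labels are placeholders for whatever categories your own evals measure, not an Anthropic API concept:

```python
# Default to Sonnet; escalate to Opus only for task types you've measured
# Sonnet failing on. Keep the escalation set driven by eval results.
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-6"
ESCALATE = {"multi_step_reasoning", "hard_agentic_coding"}

def pick_model(task_type: str) -> str:
    return OPUS if task_type in ESCALATE else SONNET
```

Starting with an empty `ESCALATE` set and adding categories only when Sonnet demonstrably falls short keeps you from paying Opus prices by default.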
What to test after migrating
A short list — don’t skip these.
- Send your largest production prompt. Something close to 500k or 800k tokens, ideally one you saved from last week. Confirm it returns a real response, not an error.
- Check your cost monitoring. If you have a daily spend dashboard, confirm the per-request cost dropped after migration. If it didn’t, you might still have a wrapper somewhere passing the old beta header — track it down.
- Run your golden-path eval. If you have a test suite of 50-100 representative prompts, run them once on Sonnet 4.6 and compare quality to your last Sonnet 4.5 run. The output format is the same; quality should match or improve. If it doesn’t, file a regression.
- Audit fallback / retry logic. If your code retries on errors, make sure it doesn’t keep slamming a deprecated model in a loop. Update any hardcoded model IDs in retry blocks.
- Update your team’s docs. Anyone with a model ID in a runbook, monitoring config, or onboarding doc — flag it. The model name is the most-leaked-into-everywhere config in any LLM-using codebase.
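The retry-logic bullet above can be sketched as follows — `send` is a hypothetical stand-in for your real API call, returning a status code and body:

```python
# Retry loop that fails fast on client errors, so a request carrying a
# retired model ID raises immediately instead of hammering the API.
def call_with_retry(send, request, max_attempts=3):
    last = None
    for _ in range(max_attempts):
        status, body = send(request)
        if status < 400:
            return body
        if status != 429 and status < 500:
            # 4xx other than rate limits: the request itself is invalid
            # (deprecated model, prompt too long). Retrying won't help.
            raise ValueError(f"non-retryable error {status}: {body}")
        last = (status, body)  # 429 / 5xx: worth another attempt
    raise RuntimeError(f"retries exhausted, last error: {last}")
```

The key property: a deprecation error surfaces on the first attempt with the full message, instead of burning retries and rate-limit budget on a request that can never succeed.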
What this can’t fix
A short list of honest limits:
- Sonnet 4.6 is not a free quality upgrade for every task. On highly stylized writing or specific creative tasks where Sonnet 4.5 had been hand-tuned in your prompts, expect a small adjustment period — output style is close but not identical.
- The 1M context window doesn’t mean infinite memory. Models still struggle with deeply layered cross-references in very long contexts. Use retrieval to keep the most relevant 200k visible to the model, even if the window allows more.
- Older Sonnet 4 (not 4.5) is also affected. If anyone on your team is still on a Sonnet 4 model ID, they’re hitting the same deprecation. Audit all API keys and model IDs across your org, not just the ones you remember updating.
- Anthropic Sonnet 4 and Opus 4 (the older base versions, not the .5 / .6) reach full retirement June 15, 2026. If you’re still on either of those, you have six weeks to migrate before all calls fail, not just the long-context ones.
What this means for you
If you’re a solo developer or small team: make the change in your codebase today. It’s a two-line edit and a test run. The pricing drop alone justifies dropping everything else for an hour.
If you’re at a 5-30 engineer team: the lead engineer should ship the change today, then send a Slack message tagging anyone with model IDs in their workflows. Update the team’s coding-assistant config (Cursor / Continue / Aider) at the same time.
If you manage a production AI pipeline: the migration itself is fast, but the eval+monitoring loop deserves a half-day of focus. Don’t ship to prod blind. Test against your golden-path eval suite, watch the first 24 hours of cost dashboards closely, and have a rollback plan to Opus 4.6 if quality dips.
If you’re a non-developer using Claude through the consumer apps: none of this applies to you. The web app, desktop app, and mobile apps all auto-route to the latest model. Carry on.
The bottom line
Two-line code change. Test against your largest prompt. Watch the cost dashboard. The migration is small and the price drop is real for long-context workloads. If your pipeline started erroring this morning, you’re not alone — and you can be back to green by lunch.
If you want to go deeper on Claude API design patterns, our Claude Code Mastery course covers the longer arc of building reliable agentic systems with the latest Claude models.