Codex on Bedrock vs Claude Direct: Q3 Routing After May's 3 Compute Shocks

Three May 2026 routing shocks just changed your model spend math. The 4-question gate + 30-min setup for Codex-Bedrock, GPT-5.5-Bedrock, and Claude direct.

If your engineering team has been kicking the “do we route through Bedrock this quarter or stay on direct APIs” question down the road since April, this week pulled the rug out from under the comfortable answer.

In nine days, three things happened that change the math:

  1. May 3 — OpenAI’s Codex CLI shipped first-class AWS Bedrock support in v0.124.0. SigV4 signing. AWS credential auth. Usage billed against your existing AWS commits. (Codex changelog)
  2. May 5 — OpenAI brought GPT-5.5 Instant to Bedrock as the chat-latest route, the same week the model became the new ChatGPT default. (OpenAI on AWS)
  3. May 6 — Anthropic announced the SpaceX Colossus 1 compute deal and doubled Claude Code’s 5-hour rate limits for Pro, Max, Team, and seat-based Enterprise. Peak-hour reduction removed for Pro/Max. API Tier 1 Opus rate limits raised by roughly 1,500% input tokens per minute and 900% output tokens per minute, per coverage of Anthropic’s announcement. (Anthropic announcement) (TNW)

For the past 12 months the routing decision had a stable answer: route AI dev work through whichever vendor your CFO had already signed paper with. Bedrock if AWS-anchored. OpenAI direct if you’d already standardized on Azure OpenAI or OpenAI’s own API. Anthropic direct if you needed Opus and could swallow the rate-limit pain.

The May rate-limit doubling on Anthropic-direct — and Codex/GPT-5.5 landing on Bedrock the same week — reopens that question for Q3. Below is a 30-minute decision tutorial: the 4-question gate, the per-route setup walkthrough, and the side-by-side at typical eng-team workload shapes.

If you’re under time pressure for a Q3 budget submission this month, scroll to “The 4-question routing gate” and come back for the setup details after.

What actually changed in the May routing window

Three separate vendor moves, all in nine days, all touching the same Q3 routing decision:

Route 1 — Codex on Bedrock (May 3)

Codex CLI v0.124.0 added Bedrock as a first-class auth target. The service-name string for SigV4 signing is bedrock-mantle (different from the usual bedrock-runtime), which trips up the first config attempt for most teams. The apply_patch tool got Bedrock-side fixes in the same release. (Codex CLI v0.124.0 — DevelopersIO)

Auth is your AWS credential chain. IAM role for ECS/EKS workloads. Inference processes through Bedrock. Usage applies to your AWS cloud commits. No second vendor relationship to set up. No second invoice. The procurement story collapses to “another model on the same Bedrock contract.”

The catch: OpenAI on Bedrock is in Limited Preview, region availability is uneven, and not every Codex feature lands on Bedrock at parity with the direct API. Treat your first 30-day window as a capability-fit test, not a full migration.

Route 2 — GPT-5.5 Instant on Bedrock (May 5)

Two days later, OpenAI dropped GPT-5.5 Instant as the new ChatGPT default and routed it through Bedrock as the chat-latest model. Same SigV4 setup, same bedrock-mantle service name, but it’s the latency-optimized model rather than the reasoning-heavy one — different routing characteristics. (Introducing GPT-5.5 Instant — OpenAI)

For an eng team running an internal copilot, a customer support agent, or a high-throughput classification pipeline, this is the route worth pricing out separately from the Codex one. Different model. Different math.

Route 3 — Claude on Anthropic-direct with doubled rate limits (May 6)

Anthropic’s SpaceX Colossus 1 deal lands them more than 300 megawatts of new capacity and over 220,000 NVIDIA GPUs within the month. (Anthropic announcement) The same announcement doubled the Claude Code 5-hour rolling rate limits for paid plans and substantially raised Opus API Tier 1 throughput.

For a team that hit “we’re rate-limited on Anthropic direct” as the reason to look at Bedrock six weeks ago, the binding constraint just changed. The cap doubling lands today (May 7). Free plan unchanged. Weekly caps unchanged.

If you’ve been waiting for Anthropic to fix the rate-limit story before re-evaluating, the version of Anthropic you’re evaluating in Q3 is materially different from the one you were evaluating in March.

The 4-question routing gate

Before you compare per-token cost or run a benchmark, answer these four questions. They eliminate two of the three routes for most teams.

Question 1 — Is Bedrock budget consolidation a hard requirement this fiscal year?

If your CFO has put “consolidate AI vendor invoices to AWS Bedrock” on the FY26 procurement plan, you don’t have a routing decision — you have a setup task. Route 1 (Codex on Bedrock) and Route 2 (GPT-5.5 Instant on Bedrock) are your only options for OpenAI workloads, and you’ll either route Anthropic through Bedrock too or hold Anthropic-direct as a single exception.

If consolidation is “preferred but not required” or it’s not on the plan at all, all three routes stay on the table.

Question 2 — Are you Opus-anchored or GPT-5.5-Instant-anchored on the hot path?

Opus-anchored means your latency-critical workload genuinely needs Opus’s reasoning depth and you’ve benchmarked the gap to be material. The 1,500%/900% Tier 1 input/output token-per-minute boost on Anthropic-direct is the unlock for these workloads, not Bedrock. Route 3.

GPT-5.5-Instant-anchored means your hot-path workload is throughput-bound and you’d happily trade Opus’s reasoning depth for GPT-5.5 Instant’s 50-200ms p50 latency on a typical query. Route 2 if you’re AWS-anchored. Direct OpenAI otherwise.

If you can’t honestly say which model your hot path is anchored on, run a small benchmark before committing — this is the question that usually decides the route.
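One note on the Tier 1 numbers before you plug them into budget math: a limit “raised by roughly 1,500%” is 16× the old ceiling, not 15×. A quick sketch of the arithmetic, using hypothetical pre-boost Tier 1 Opus limits for illustration only (the real figures live in your Anthropic console):

```python
def boosted_limit(old_tpm: int, pct_increase: float) -> float:
    # An increase *by* 1,500% means the new limit is 16x the old one
    # (1 + 1500/100), not 15x -- a common misread in budget math.
    return old_tpm * (1 + pct_increase / 100)

# Hypothetical pre-boost Tier 1 Opus numbers, for illustration only;
# substitute the limits shown in your own console.
old_input_tpm, old_output_tpm = 20_000, 8_000

new_input_tpm = boosted_limit(old_input_tpm, 1_500)    # 320,000 TPM (16x)
new_output_tpm = boosted_limit(old_output_tpm, 900)    # 80,000 TPM (10x)
```

Run the same function against your actual console numbers before you re-price the hot path.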

Question 3 — Can you swallow the regional availability gap?

OpenAI on Bedrock is in Limited Preview. As of early May 2026, GPT-5.5 is available in a small set of regions and GPT-5.4 is broader but not universal. (OpenAI Models on AWS Bedrock — MindStudio)

If your latency-critical workload runs in eu-central-1, ap-northeast-1, or another region that doesn’t yet have GPT-5.5 in preview, Route 2 isn’t available to you in Q3. Anthropic-direct (Route 3) gives you broader region coverage today, and the Anthropic-on-Bedrock route gets you Bedrock economics with broader regional availability than the OpenAI-on-Bedrock route.

Question 4 — Are you in the middle of an Anthropic plan negotiation?

If you’re inside an Anthropic Pro→Max upgrade decision or a Team→Enterprise renewal, the May 6 cap doubling materially changes what your plan ceiling actually delivers. Don’t sign the upgrade paperwork on March’s ceiling math. Re-run the per-seat throughput math against the post-May-6 ceiling, then negotiate.

The doubled cap doesn’t change the price. It changes what you get for the price. Most upgrade conversations in early May 2026 should pause for a re-pricing pass before anyone signs.
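The four questions above can be sketched as a small decision helper. The field names and route strings here are illustrative, not an official taxonomy, and Question 4 is carried but never branched on because it pauses a negotiation rather than picking a route:

```python
from dataclasses import dataclass

@dataclass
class GateAnswers:
    bedrock_consolidation_required: bool  # Q1: hard FY26 requirement?
    hot_path_anchor: str                  # Q2: "opus", "gpt55_instant", or "unknown"
    aws_anchored: bool                    # context for Q2/Q3
    region_has_openai_preview: bool       # Q3: GPT-5.5 live in your region?
    mid_anthropic_negotiation: bool       # Q4: if True, pause for a re-pricing pass

# Q4 never changes the route; it only gates *when* you sign.

def route(a: GateAnswers) -> str:
    # Q1: a hard consolidation mandate removes the direct routes outright.
    if a.bedrock_consolidation_required:
        return "Route 1/2 (Bedrock only)"
    # Q2: the hot-path anchor model is the question that usually decides it.
    if a.hot_path_anchor == "opus":
        return "Route 3 (Anthropic-direct)"
    if a.hot_path_anchor == "gpt55_instant":
        # Q3: the Limited Preview region gap can rule out Route 2 for Q3.
        if a.aws_anchored and a.region_has_openai_preview:
            return "Route 2 (GPT-5.5 Instant on Bedrock)"
        return "OpenAI direct"
    return "Benchmark before committing"
```

For example, `route(GateAnswers(False, "opus", True, True, False))` lands on Route 3, and flipping the consolidation flag to `True` collapses everything to the Bedrock-only pair.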

The decision matrix at typical eng-team shapes

Five common eng-team workload shapes, with the route that wins each:

| Team shape | Hot-path model | Best route | Why |
| --- | --- | --- | --- |
| 4-person AWS-anchored startup, Codex daily | GPT-5.5 (codegen) | Route 1 — Codex on Bedrock | Procurement collapse; existing AWS contract handles billing; Limited Preview is acceptable for a small team |
| 12-engineer scale-up, Cursor + Claude Code mixed | Sonnet 4.6 + Opus | Route 3 — Anthropic-direct | Cap doubling unlocks Cursor headroom; Anthropic is the model layer; routing through Bedrock adds latency for marginal procurement benefit |
| 30-person services shop, internal customer support agent | GPT-5.5 Instant | Route 2 — GPT-5.5 on Bedrock | Throughput-bound, latency-critical, AWS-anchored; the new chat-latest route is the right Bedrock primitive |
| Enterprise 200+ devs, Claude Code Team plan | Opus + Sonnet 4.6 | Route 3 + Route 1 mix | Anthropic-direct for Claude Code (cap doubling reduces split-account hack pressure); Codex on Bedrock as the OpenAI route only if individual pilots prove out |
| Regulated-industry team, latency-tolerant batch workload | Either | Bedrock for whichever vendor the batch is on | Audit trail, single VPC egress posture, Bedrock Provisioned Throughput pricing math |

The pattern: throughput-bound workloads win on Route 2 if you’re AWS-anchored. Reasoning-anchored workloads win on Route 3 (the cap doubling is the unlock). The Codex-Bedrock route (Route 1) wins for OpenAI-shop teams that need Codex specifically and want to avoid the second vendor relationship.

Per-route setup, abbreviated

Each of the three setups has a public-docs walkthrough that’s better than what you’d get from a blog. The links below are the canonical sources. The notes are the gotchas you’ll hit on the first attempt.

Setting up Codex on Bedrock

  1. Update Codex CLI to v0.124.0+. (Codex releases) Earlier versions don’t have the SigV4 path. Confirm with codex --version.
  2. Configure your AWS credential chain. Codex picks up credentials from the standard AWS chain (env vars, ~/.aws/credentials, IAM role on ECS/EKS). Test with aws sts get-caller-identity before launching Codex.
  3. Set the Bedrock provider in ~/.codex/config.toml. The model entry takes a Bedrock provider key and a model id. The service-name string for SigV4 is bedrock-mantle — not bedrock-runtime — and the SDK throws a confusing error if you’ve hand-rolled the wrong service name in a custom client.
  4. Confirm region availability. OpenAI models on Bedrock are in Limited Preview as of early May 2026, with per-account opt-in enrollment. AWS has not yet published a canonical static region matrix listing exactly where each OpenAI model id is available; coverage typically starts with us-east-1 and us-west-2 and expands from there as enrollment opens. Check the Bedrock console or run aws bedrock list-foundation-models --region <your-region> before pointing Codex at a specific model id. (OpenAI Models on Bedrock — AWS announcement) (OpenAI Models on Bedrock — DEV Community)
  5. First Codex run. codex from the repo root. The first request will exercise the SigV4 path; if you see a 403, your IAM role probably needs bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream for the model id you’re calling.

The whole setup takes about 15 minutes if your AWS credential chain is already healthy.

Setting up GPT-5.5 Instant on Bedrock

Same auth path as Route 1. The model id is the GPT-5.5 Instant Bedrock id (check the AWS Bedrock console for the exact string in your region). For an internal-copilot or support-agent integration, you’ll typically wire it into the Bedrock Runtime client in your app code, not Codex CLI. (OpenAI on AWS — OpenAI)

The per-call latency on chat-latest is what makes this route worth it for throughput-bound workloads. If you’re hitting Codex with it instead, you’ve picked the wrong model — Codex on GPT-5.5 (the reasoning variant) is the right pairing.

Setting up Claude on Anthropic-direct with doubled limits

If you’re already on Pro, Max, or Team, the cap doubling lands today (May 7) automatically. Nothing to configure. Your 5-hour rolling cap is now 2× what it was Monday, and the peak-hour reduction is gone for Pro and Max.

For the Tier 1 Opus API throughput boost, the new tokens-per-minute rates apply automatically to existing API keys on Tier 1 — no migration needed. (Anthropic — higher limits) Verify with the Rate Limits API or by running a representative load test against your hot-path workload.

The check that matters: log a week of actual ceiling-touches before your Q3 plan-spend re-decision. If you used to hit the cap by 2pm and now you hit it at 5pm (or never), the upgrade decision math changes.

What this means for you

A few honest cuts at common team profiles. None of these are universal — your CFO, your existing AWS commitment, and your latency requirements shape the answer more than the model choice does.

If you’re a 4-person AWS-anchored startup: Route 1 (Codex on Bedrock) is the cleanest move. The procurement story is “another model on the AWS contract we already signed.” Limited Preview region availability is acceptable at your scale. Pilot one repo for 30 days before standardizing.

If you’re a 12-engineer scale-up using Cursor + Claude Code: Route 3 (Anthropic-direct, post-cap-doubling) is your Q3 default. The cap doubling materially reduces the “split work across accounts” hack pressure your team has been quietly running. Don’t migrate Claude to Bedrock unless your CFO is asking for vendor consolidation.

If you’re a 30-person services shop running an internal customer support agent: Route 2 (GPT-5.5 Instant on Bedrock) is the right primitive. chat-latest is the model, Bedrock is the runtime, and your AWS contract is the billing path. Don’t use this route for code generation — that’s Route 1 (Codex on Bedrock with GPT-5.5 reasoning) or Route 3.

If you’re an enterprise with 200+ devs on Claude Code Team: Hold Route 3 as your primary. Pilot Route 1 (Codex on Bedrock) for OpenAI workloads only if a specific eng pod proves the value. Don’t try to migrate Claude to Bedrock just because Codex is now there.

If you’re in a regulated industry with audit-trail requirements: Bedrock is your runtime regardless of which model wins per workload. Route 1 for OpenAI workloads, Anthropic-on-Bedrock for Claude workloads. Anthropic-direct (Route 3) is off the table here even with the cap doubling.

What this can’t fix

Five things this routing decision will not solve. Be honest about them before you spend a week on the re-architecture.

  1. Vendor lock-in is still real on Bedrock. Routing through Bedrock collapses procurement, but it doesn’t fix the “we’re hard-coded to one model id” problem. If GPT-5.5 gets deprecated in a year, you still rewrite.
  2. Codex on Bedrock is Limited Preview. Some Codex features may not work at parity. If you’re standardizing your team on Codex, run a 30-day pilot on Route 1 against the existing direct setup before fully cutting over.
  3. The Anthropic cap doubling is on the 5-hour rolling cap and Tier 1 Opus throughput. Weekly caps are unchanged. If your real bottleneck was the weekly cap, today’s announcement doesn’t change your math.
  4. The SpaceX/Colossus 1 capacity lands “within the month.” Anthropic’s compute-supply story for May-July is healthier than April; the second half of the year depends on how the Colossus 1 ramp actually plays out. Don’t bake “Anthropic always has capacity” into a Q4 plan.
  5. Region availability gaps for OpenAI on Bedrock are real. If your latency-critical workload is in a non-US region, Route 1 and Route 2 may not be available to you in Q3. Re-check the Bedrock model page in your region before locking the decision.

The bottom line

The May 2026 routing decision boils down to four honest answers: is Bedrock consolidation a hard requirement, are you Opus-anchored or GPT-5.5-Instant-anchored on the hot path, can you swallow the OpenAI-on-Bedrock region gap, and are you mid-Anthropic-plan-negotiation. Those four answers narrow three routes to one for most teams.

If you’re picking one route to standardize on this quarter, the math for AWS-anchored shops with mixed workloads usually lands on Route 3 (Anthropic-direct, post-cap-doubling) for reasoning-heavy work and Route 2 (GPT-5.5 Instant on Bedrock) for throughput-bound work — with Route 1 (Codex on Bedrock) as the OpenAI-shop alternative.

If you want a deeper walkthrough on running Claude Code as a daily driver — including the cap-management patterns the May 6 doubling now changes — our Claude Code Mastery course walks through the full setup, the rate-limit math, and the patterns that actually save tokens on real projects.
