For two years, using an AI coding assistant meant one model, one conversation, one task at a time. Claude Opus 4.8 broke that on May 28 with a feature called dynamic workflows, and it’s the biggest shift in how Claude Code works since it shipped. Agents went from assistants to something closer to architects.
Here’s the proof that made people sit up. Jarred Sumner, the creator of Bun, used dynamic workflows to port the Bun runtime from Zig to Rust: roughly 750,000 lines of Rust, eleven days from first commit to merge, with 99.8% of the existing test suite still passing. Hundreds of agents working in parallel, two reviewers on each file. That’s not autocomplete. That’s a feature shipping a project you’d normally staff with a team.
This guide walks through the parts the launch coverage skips. What dynamic workflows actually are, the exact commands that turn them on, which plan you need, what’s happening under the hood when Claude fans out “hundreds of subagents,” and the very real catch early testers are already shouting about.
What dynamic workflows actually are
Normally, Claude Code does your task itself, step by step, in one conversation. A dynamic workflow flips that. Claude becomes an orchestrator.
But here’s the part most write-ups miss. A dynamic workflow isn’t Claude “deciding to run more agents.” It’s Claude writing an actual JavaScript program, an orchestration script, that a separate runtime then executes in the background while your chat stays free. Anthropic’s docs put it plainly:
“A dynamic workflow is a JavaScript script that orchestrates subagents at scale. Claude writes the script for the task you describe, and a runtime executes it in the background while your session stays responsive.”
Why does that matter? Because of where the work lives. With a normal task, every intermediate result (every file Claude reads, every test it runs) lands back in the conversation. The context window fills up fast, and big jobs choke. A workflow lifts the loop, the branching, and all the messy in-between state out of the chat and into the script’s own variables. Claude’s conversation only ever sees the final answer.
That one design choice is the whole unlock. Here’s the difference at a glance:
| | Subagents | Skills | Workflows |
|---|---|---|---|
| What it is | A worker Claude spawns | Instructions Claude follows | A script the runtime runs |
| Who decides what’s next | Claude, turn by turn | Claude, following the prompt | The script |
| Where mid-work results live | Claude’s context | Claude’s context | Script variables |
| Scale | A few per turn | A few per turn | Dozens to hundreds per run |
So when a developer on X posts that Claude spun up 85 agents in parallel for nearly 16 minutes from a single prompt, wrote the whole orchestration script itself, and needed zero manual handoff between agents, that’s not hype. That’s the runtime doing exactly what it says on the tin. The model decided the job needed 85 agents, gave each one a scoped goal and success criteria, and coordinated them with a scheduler it wrote on the fly.
Two things to underline. First, it’s a research preview: new, evolving, not bulletproof. Second, the orchestration runs in code, which is why it stays on track. A 500-file migration doesn’t drift halfway through, because the plan isn’t being held in a conversation that’s running low on memory. It’s held in a script.
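To make the "state lives in the script" idea concrete, here is a minimal sketch. Real workflow scripts are JavaScript written by Claude and run by Anthropic's runtime; this is plain Python that only imitates the shape. `inspect_file` and `run_workflow` are hypothetical names, and the "agent" is a stub.

```python
def inspect_file(path: str) -> dict:
    """Hypothetical stand-in for a subagent: a long transcript plus a verdict."""
    transcript = f"read {path}; " * 200  # pretend this is a bulky intermediate log
    return {
        "path": path,
        "transcript": transcript,
        "needs_migration": path.endswith(".zig"),
    }


def run_workflow(paths: list) -> str:
    """Keep every intermediate result in local variables; return only a summary."""
    results = [inspect_file(p) for p in paths]  # bulky state stays in the script
    flagged = [r["path"] for r in results if r["needs_migration"]]
    return f"{len(flagged)} of {len(paths)} files need migration: {', '.join(flagged)}"


summary = run_workflow(["src/a.zig", "src/b.rs", "src/c.zig"])
print(summary)
```

The transcripts never leave `run_workflow`. Only the one-line summary would reach the conversation, which is the property the table above attributes to workflows.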
Ultracode vs effort levels: get the words right
People are using “ultracode,” “ultrathink,” “xhigh,” and “max” interchangeably. They’re not the same, and mixing them up will cost you tokens. Let’s untangle it.
Effort levels control how hard Opus 4.8 thinks on each step. There are five, set with /effort:
| Level | When to use it | Cost |
|---|---|---|
| low | Short, latency-sensitive stuff | Cheapest |
| medium | Cost-sensitive work, some intelligence traded away | Low |
| high | The balanced default on Opus 4.8 | Standard |
| xhigh | Deeper reasoning on hard problems | Higher |
| max | Demanding tasks; session-only; can overthink | Highest |
Note that high is the default on Opus 4.8, not xhigh (that was the 4.7 default). And max has no token ceiling at all, so it’s easy to burn through a budget without meaning to.
Now, ultracode is not on that list. It’s a different kind of thing entirely. Straight from the docs:
“Ultracode is a Claude Code setting rather than a model effort level: it sends xhigh to the model and additionally has Claude orchestrate dynamic workflows for substantive tasks. It applies to the current session only.”
So ultracode equals xhigh reasoning plus automatic workflow orchestration. Turn it on and Claude stops waiting for you to ask. It plans a workflow for every substantive task in the session. One request can become three workflows back to back: one to understand the code, one to change it, one to verify the change. It resets when you start a new session.
And ultrathink? That’s a one-turn thing. Drop the word ultrathink anywhere in a single prompt and Claude reasons harder for that one reply, with no session change and no workflows. (Fun detail: “think,” “think hard,” and “think more” do nothing. Only ultrathink is a real keyword. The rest is just text.)
Quick mental model: /effort high is your daily driver. ultracode is the full-throttle, money-talks mode you flip on for the big jobs. One Japanese developer framed it as the setting for when you want to move the entire codebase, not for small fixes. That’s exactly right.
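If it helps to see the distinction as data, here is a small sketch that encodes only what this section says. `describe` is a hypothetical helper, not anything Claude Code exposes. The effort value for ultrathink is left as `None` because the docs quoted here don't say what it maps to.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def describe(setting: str) -> dict:
    """Summarise what a setting does, per the definitions quoted in this article."""
    if setting in EFFORT_LEVELS:
        return {"kind": "effort level", "effort": setting, "auto_workflows": False}
    if setting == "ultracode":
        # Sends xhigh to the model AND auto-orchestrates workflows (session only).
        return {"kind": "Claude Code setting", "effort": "xhigh", "auto_workflows": True}
    if setting == "ultrathink":
        # One-turn keyword; the mapped effort is not stated, so leave it unknown.
        return {"kind": "prompt keyword", "effort": None, "auto_workflows": False}
    raise ValueError(f"unknown setting: {setting!r}")


for name in ("high", "max", "ultracode", "ultrathink"):
    print(name, describe(name))
```

Note that `max` and `ultracode` differ in the one column that matters for your bill: only ultracode flips `auto_workflows` on.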
How to actually turn it on
You don’t write any orchestration code yourself. Claude writes the script. But you do need to know which lever to pull. There are four ways in.
1. Just say “workflow.” Put the word workflow anywhere in your prompt and Claude routes the task through workflow creation instead of answering inline. Claude Code highlights the word so you know it caught it.
Run a workflow to audit every API endpoint under src/routes/ for missing auth checks
Triggered it by accident? Press alt+w to ignore it for that prompt.
2. Ask directly. “Create a workflow to…” or “Write a dynamic workflow that…” works too. Same result.
3. Flip on ultracode. When you want Claude deciding for itself across a whole session:
/effort ultracode
Then go back to normal work with /effort high when you’re done. Don’t leave ultracode on for routine editing. Every task becomes a workflow, and your usage will show it.
4. Run the bundled one: /deep-research. This is the only ready-made workflow that ships with Claude Code. Point it at a question and it fans out web searches across angles, cross-checks the sources against each other, votes on each claim, and hands back a cited report with the stuff that didn’t hold up already stripped out.
/deep-research What changed in the Node.js permission model between v20 and v22?
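The "votes on each claim" step is easier to picture with a toy version. The sketch below assumes a simple majority rule over independent checks. That rule is an assumption made for illustration: the docs quoted here don't describe how /deep-research actually tallies votes. The claims are placeholders.

```python
def surviving_claims(verdicts: dict, threshold: float = 0.5) -> list:
    """Keep a claim only if more than `threshold` of its checks supported it."""
    kept = []
    for claim, votes in verdicts.items():
        if votes and sum(votes) / len(votes) > threshold:
            kept.append(claim)
    return kept


# Placeholder claims, each checked by three hypothetical verifier agents.
verdicts = {
    "claim A": [True, True, True],
    "claim B": [True, False, False],
    "claim C": [True, True, False],
}
print(surviving_claims(verdicts))  # ['claim A', 'claim C']
```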
What happens when you hit enter
Before the run starts, Claude Code shows you the plan and asks. You’ll see Yes, run it / Yes, and don’t ask again for this workflow / View raw script / No. Hit Ctrl+G to open the generated script in your editor, or Tab to tweak the prompt first. For anything touching production code, read the script. It’s the equivalent of reviewing a deploy plan before you run it.
One setting worth knowing here: auto mode. It clears that launch confirmation so an unattended workflow starts without you clicking Yes (the confirmation is also skipped entirely once ultracode is on). What it does not do is stop a run from pausing mid-flight. A workflow still halts if an agent reaches for a shell command, web fetch, or MCP tool that isn’t on your allowlist, no matter which permission mode you’re in. The fix for that is the allowlist, not auto mode (more on that below). Auto mode also has fine print: it’s Anthropic-API only (not Bedrock, Vertex, or Foundry), it needs a recent Claude Code version and a supported model (Opus 4.6 or later, or Sonnet 4.6), and on Team or Enterprise an admin has to switch it on first. Turn it on by cycling with Shift+Tab (it shows a one-time opt-in), or start with claude --permission-mode auto.
Saving one you like
Got a workflow that did exactly what you wanted? Keep it. Run /workflows, select the run, press s. It saves to .claude/workflows/ in your project (shared with your team) or ~/.claude/workflows/ (just you), and from then on it runs as /<name> like any other command. A code review you run on every branch becomes a one-liner.
Which plan do you actually need?
This is where a lot of the launch-day chatter got it wrong, so let’s be exact. Dynamic workflows require Claude Code v2.1.154 or later (run claude update), and they’re a research preview. Here’s who gets them:
| Plan | Access | Default state |
|---|---|---|
| Max (5× and 20×) | Yes | On by default |
| Team (Standard & Premium) | Yes | On by default |
| Enterprise | Yes | Off — an admin enables it |
| Pro | Yes | Off — turn it on in /config |
| API / Bedrock / Vertex / Foundry | Yes | Available* |
Yes, Pro users do get dynamic workflows. You just have to switch them on. On Pro it’s two quick steps:
- Run /model opus to get onto Opus 4.8 (Pro and Team Standard default to Sonnet 4.6, which can’t run ultracode).
- Turn workflows on in the “Dynamic workflows” row of /config.
A nuance worth getting right: Sonnet 4.6 can’t do xhigh, so it can’t run ultracode. You can still fire a one-off workflow on it with the workflow keyword or /deep-research. But ultracode’s auto-orchestration needs Opus 4.8.
*The cloud asterisk: on Bedrock, Vertex, and Foundry, the default opus and sonnet aliases point to older models (Opus 4.6, Sonnet 4.5) that don’t support xhigh or ultracode. Pick a model that does (for example set ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-8, or choose it in /model). On Claude Platform for AWS, opus is Opus 4.7, which does support xhigh.
If it’s not showing up
The feature is new, so half the “it doesn’t work” reports are setup, not bugs. Run through this:
- Run claude update. Workflows need v2.1.154+.
- No ultracode in the /effort menu? Your active model doesn’t support xhigh. Switch to Opus 4.8 with /model opus.
- The workflow keyword does nothing? Workflows are toggled off in /config, or an admin disabled them org-wide (disableWorkflows).
- On Pro? You need both the /config toggle on and Opus 4.8 selected.
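The checklist above is really a small decision procedure, so here it is as one. This is a hypothetical helper for reasoning about your own setup, not a Claude Code API. You supply the facts yourself (your version string, whether your model supports xhigh, and the state of the toggles).

```python
MIN_VERSION = (2, 1, 154)  # minimum Claude Code version quoted in this article


def parse_version(text: str) -> tuple:
    """Turn 'v2.1.154' or '2.1.154' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def diagnose(version, model_supports_xhigh, workflows_enabled, org_disabled=False):
    """Return a list of setup fixes; an empty list means nothing obvious is wrong."""
    fixes = []
    if parse_version(version) < MIN_VERSION:
        fixes.append("run `claude update` (workflows need v2.1.154+)")
    if org_disabled:
        fixes.append("ask an admin: workflows are disabled org-wide (disableWorkflows)")
    elif not workflows_enabled:
        fixes.append("turn on Dynamic workflows in /config")
    if not model_supports_xhigh:
        fixes.append("switch models with /model opus (ultracode needs xhigh)")
    return fixes


print(diagnose("2.1.150", model_supports_xhigh=False, workflows_enabled=False))
print(diagnose("v2.1.154", model_supports_xhigh=True, workflows_enabled=True))
```

Comparing versions as integer tuples matters here: a plain string comparison would rank "2.1.99" above "2.1.154".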
What happens under the hood
When a workflow fires, the runtime runs a loop that looks like this:
- Plan. Claude writes the orchestration script (the dependency graph, the loops, the fan-out) as code it then follows. Because the plan lives in a script and not a conversation, it stays stable even across hundreds of agents or hours of runtime.
- Fan out. The runtime runs up to 16 agents at once (fewer on a weaker machine), with a hard ceiling of 1,000 agents total per run to stop runaway loops. Each agent gets its own context and its own scoped job: inspect, rewrite, test, review.
- Verify and refute. This is the part that ties to Opus 4.8’s honesty push. In Anthropic’s words: “Agents address the problem from independent angles, other agents try to refute what they found, and the run keeps iterating until the answers converge.” The system tries to break its own results before you ever see them. It’s why people keep saying 4.8 “stops saying ‘done’ when it’s half done.”
- Merge and report. Only the verified, reconciled answer comes back.
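The four steps above can be sketched in a few lines. This is Python with stub agents, not the JavaScript the real runtime executes, and every function name is hypothetical. It shows the two limits from the fan-out step (16 at once, 1,000 per run) and a single verify pass, where the real loop is described as iterating until answers converge.

```python
import asyncio

MAX_CONCURRENT = 16   # concurrency figure quoted in this article
MAX_AGENTS = 1_000    # per-run ceiling quoted in this article


async def run_agent(task: str) -> dict:
    """Hypothetical worker agent with its own scoped job."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound agent would
    return {"task": task, "finding": f"checked {task}"}


async def refute(result: dict) -> bool:
    """Hypothetical reviewer: True means the finding survived scrutiny."""
    await asyncio.sleep(0)
    return "bad" not in result["task"]


async def run_workflow(tasks: list) -> list:
    # One worker plus one reviewer per task must fit under the run ceiling.
    if 2 * len(tasks) > MAX_AGENTS:
        raise ValueError("plan exceeds the per-run agent ceiling")
    gate = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(agent, arg):
        async with gate:  # at most MAX_CONCURRENT agents in flight
            return await agent(arg)

    results = await asyncio.gather(*(guarded(run_agent, t) for t in tasks))      # fan out
    verdicts = await asyncio.gather(*(guarded(refute, r) for r in results))      # verify
    return [r for r, ok in zip(results, verdicts) if ok]                         # merge


verified = asyncio.run(run_workflow(["a.py", "bad.py", "c.py"]))
print([r["task"] for r in verified])  # ['a.py', 'c.py']
```

The semaphore is the whole trick for the concurrency cap: all tasks are scheduled up front, but only 16 can hold the gate at a time.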
You can watch the whole thing live. Run /workflows to see every run with its phases, agent count, token total, and elapsed time, and drill into any single agent to read its prompt and what it found. From that view you can pause a run with p (press p again to resume, and finished agents return their cached results), stop a run or a single agent with x, or restart one with r. If a run goes sideways, you stop it without losing the agents that already finished.
One limit worth knowing up front: resume is session-scoped. Pause a run and come back to it within the same session, fine. But quit Claude Code entirely while a workflow is going, and the next session starts it from scratch. There’s no “close the laptop Friday, reopen Monday” across a full restart yet.
How you know it actually worked
A swarm of agents is only useful if you can trust the output, so build in a check. For /deep-research, the deliverable is a cited report with the claims that failed cross-checking already stripped out. For code, the agents have already applied their edits (they run in acceptEdits mode), so you are the final gate: run your test suite, read the git diff, and confirm the reported findings point to real file paths. This is exactly why the Bun port used the existing test suite as its bar. “Looks done” isn’t a result. A green test run is.
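Checking that findings point to real file paths is easy to automate. The sketch below assumes each finding is a dict with a `path` key, which is a made-up format for illustration. Adapt it to whatever your workflow actually reports.

```python
import tempfile
from pathlib import Path


def missing_paths(findings: list, repo_root: str) -> list:
    """Return the reported paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [f["path"] for f in findings if not (root / f["path"]).is_file()]


# Demo against a throwaway directory standing in for your repo.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "src").mkdir()
    (Path(repo) / "src" / "auth.py").write_text("# placeholder\n")
    findings = [
        {"path": "src/auth.py", "issue": "missing auth check"},
        {"path": "src/ghost.py", "issue": "refers to a file that is not there"},
    ]
    missing = missing_paths(findings, repo)

print(missing)  # ['src/ghost.py']
```

An empty list doesn't prove the findings are right. It only rules out the cheapest failure, a report that cites files that don't exist.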
(Full disclosure: this article was researched and fact-checked inside exactly this kind of orchestrated run. Agents fanned out across Anthropic’s docs and community reports, then a separate verification pass tried to refute every claim before anything got written. The pattern is real. It works. With the caveats below.)
The /loop confusion
A lot of people assume there’s a /loop command for workflows. There isn’t. /loop is a separate Claude Code feature. It runs a prompt or a slash command on a repeating interval (or lets the model pace itself), handy for “check the deploy every few minutes” kinds of tasks. Useful, but it has nothing to do with dynamic workflows. Don’t go looking for it in the workflow docs.
Where it shines, and where it burns money
Dynamic workflows are a sledgehammer. Glorious on a wall, absurd on a thumbtack.
The sweet spot is anything too big for a single pass: language or framework migrations, repo-wide security and dead-code audits (Claude searches in parallel, then independently verifies every finding before it reports), and high-stakes work where you want adversarial agents trying to break the result first. Anthropic’s own team calls out discovery and review across large codebases as where it’s been most valuable.
Where it wastes money is just as clear. The docs are blunt about one-file edits: the orchestration overhead isn’t buying much, so use normal chat or a single subagent. Same for vague prompts (“run a workflow to improve the app” sends agents wandering and tokens vanishing) and routine daily coding. The rule of thumb: if you’d hand it to one engineer for an afternoon, don’t summon a swarm.
The token bill is real, and it bites fast
This is the loudest complaint from week one, and the numbers are eye-watering. One developer on the $200/month Max plan reported burning 20% of their entire weekly limit on day one, just from turning on dynamic workflows. Another: “never using Opus 4.8 + ultracode xhigh effort again. One small prompt is enough to drain all the session tokens.” One Pro user on the $20 plan reported hitting their cap after about ten minutes.
Anthropic doesn’t hide this. The launch note says dynamic workflows “can consume substantially more tokens than a typical Claude Code session” and tells you to start on a scoped task. Worth knowing why it spikes: the cost front-loads in planning, and then every one of those parallel agents is reading, writing, and double-checking. Hundreds of agents, each spending tokens. It adds up the way you’d expect hundreds of anything to add up.
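A back-of-envelope version of that multiplication, with invented numbers. The 85-agent count echoes the run mentioned earlier. The per-agent and planning figures are placeholders, not measurements, so plug in your own.

```python
def estimate_tokens(planning: int, agents: int, per_agent: int) -> int:
    """Rough total: one planning cost plus the same spend for every agent."""
    return planning + agents * per_agent


# Illustrative figures only: 40k tokens per agent, 20k to plan the run.
single = estimate_tokens(planning=0, agents=1, per_agent=40_000)
swarm = estimate_tokens(planning=20_000, agents=85, per_agent=40_000)

print(f"single pass: {single:,} tokens")
print(f"85-agent run: {swarm:,} tokens ({swarm / single:.1f}x)")
```

Under these assumptions the run costs about 85 times a single pass. The planning overhead barely registers next to the per-agent multiplier, which is why scoping the agent count matters more than anything else.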
How to keep it sane:
- Scope tight. Give it a path (under src/auth/ only), an output contract (“return confirmed issues with file paths”), and an edit policy. A bounded task can’t run up an unbounded bill.
- Route cheap stages to a cheaper model. Tell Claude to “use a smaller model for the stages that don’t need the strongest one.” Not every agent needs Opus.
- Set a hard ceiling. There’s no per-run token cap (a workflow counts against your normal plan limits like any session), but Pro and Max users can set a monthly spend limit with /usage-credits, and on the API or Team an admin can set a workspace spend limit in the Console.
- Watch it. /usage shows your burn in real time. /workflows lets you kill a run mid-flight without losing finished work.
- Keep ultracode off unless most of your session genuinely warrants heavy orchestration. For everything else, /effort high.
The catch nobody mentions: parallel agents and merge conflicts
Here’s the problem the hype posts skip. If two agents edit the same file at the same time, you get merge conflicts, conflicts an AI created that you now have to untangle. One developer put the deeper version of this well: “the next coding-agent fight is merge confidence. Parallel agents create leverage only when the synthesis is disciplined. Otherwise you get more confident paragraphs, more token burn, and the same merge anxiety.”
The community is still working this out. The documented answer already exists, though: git worktrees. Give each parallel session its own isolated git checkout and the agents can’t step on each other; they work in separate trees and merge cleanly at the end. For a big fan-out, Anthropic’s own best-practices guide points the same direction: generate the file list, then loop claude -p over it with a scoped --allowedTools, so each task runs isolated.
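Here is roughly what that worktree-per-task pattern looks like as a dry run. `git worktree add -b` and `claude -p ... --allowedTools` are real commands, but the branch names, directory layout, and prompt wording below are invented for illustration. The sketch only prints the commands so you can read them before anything runs.

```python
import shlex


def plan_runs(files, allowed_tools="Edit", worktree_root="../worktrees"):
    """Return (cwd, argv) steps: one worktree plus one headless run per file."""
    steps = []
    for index, path in enumerate(files):
        worktree = f"{worktree_root}/task-{index}"  # hypothetical layout
        branch = f"workflow/task-{index}"           # hypothetical branch name
        # Create an isolated checkout on its own branch.
        steps.append((".", ["git", "worktree", "add", "-b", branch, worktree]))
        # Run one scoped, non-interactive task inside that checkout.
        prompt = f"Migrate {path} and report OK or FAIL."
        steps.append((worktree, ["claude", "-p", prompt, "--allowedTools", allowed_tools]))
    return steps


# Dry run: print the commands instead of executing them.
for cwd, argv in plan_runs(["src/a.ts", "src/b.ts"]):
    print(f"(cd {cwd} && {shlex.join(argv)})")
```

To execute for real, you could pass each step to `subprocess.run(argv, cwd=cwd, check=True)`. Keeping planning and execution separate means you review the full command list once, the same way you'd review the generated workflow script.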
There’s a sneakier failure too. An agent that can’t finish a step (say an API call that won’t authenticate) will sometimes quietly insert mock data behind a try/catch and move on, so the job “succeeds” with fake results buried inside. Parallel agents multiply that risk, because silent fakery can spread across a whole run before you look. The fix is a CLAUDE.md rule the agents will actually follow:
Error handling: fail loud, never fake.
- Never suppress an error to look like it worked.
- Expose the failure; do not substitute placeholder data.
- A fallback is okay only if it's transparent (log a warning, annotate the output).
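In code, the difference between the anti-pattern and the rule looks like this. Everything here is a hypothetical example (`UpstreamError`, `fetch_users`, the cached data), meant as something you could point an agent at, not a prescribed implementation.

```python
import logging

logger = logging.getLogger("agent")


class UpstreamError(RuntimeError):
    """Raised when a dependency (for example an API call) fails."""


# Anti-pattern, shown only as a comment so nobody copies it:
#     try:
#         return call_api()
#     except Exception:
#         return [{"name": "Test User"}]   # silent fake data


def fetch_users(call_api):
    """Fail loud: no try/except, so the error reaches whoever runs the job."""
    return call_api()


def fetch_users_with_fallback(call_api, cached):
    """Transparent fallback: log a warning and annotate where the data came from."""
    try:
        return {"data": call_api(), "source": "live"}
    except UpstreamError as exc:
        logger.warning("API call failed (%s); returning cached data", exc)
        return {"data": cached, "source": "cache", "warning": str(exc)}


def broken_api():
    raise UpstreamError("401 Unauthorized")


print(fetch_users_with_fallback(broken_api, cached=["alice"]))
```

The `source` and `warning` fields are what make the fallback acceptable under the third rule: anything downstream can tell it isn't looking at live data.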
Permission design is the real day-one move
The most-shared line of launch day wasn’t about speed. It was this: “New model day is not prompt day. It is permission-design day.” Before you let a swarm loose on a real repo, decide what it’s allowed to touch.
A sane setup, from least to most locked-down:
- Pre-load your allowlist with the exact commands the agents will need (/permissions), so the run doesn’t stall on a tool that isn’t approved. This, not auto mode, is what keeps a long run from pausing on you.
- Work in a worktree or a throwaway branch. Never point a first run at main.
- Go read-only first. Run one workflow to map and confirm the issues, review the output, then run a second workflow to apply the changes. Two stages, not one.
- Scope the prompt with a path, an output contract, a rule that an independent agent must confirm each finding, and an edit policy.
- Lean on auto mode’s classifier, but don’t trust it as a wall. It blocks the scary stuff by default (force-pushing, pushing to main, production deploys, curl | bash), which is genuinely useful. It’s a backstop, not a boundary. Your allowlist and a throwaway branch are the boundary.
- Switch it off when you don’t want it. For yourself, toggle Dynamic workflows off in /config or set CLAUDE_CODE_DISABLE_WORKFLOWS=1. Org-wide, "disableWorkflows": true in managed settings turns it off for everyone.
Treat it like hiring a temp engineering team for a week. You’d give them a scoped task, repo access with limits, and a review at the end. Same here.
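If you find yourself retyping the same scoping clauses, a tiny template keeps them from getting dropped. This is a hypothetical helper that assembles the four pieces from the list above (path, output contract, independent confirmation, edit policy) into one prompt. The wording is an example, not an official format.

```python
def scoped_prompt(task: str, path: str, output_contract: str, edit_policy: str) -> str:
    """Assemble a workflow prompt that always carries its scope and rules."""
    return "\n".join([
        f"Run a workflow to {task}.",
        f"Scope: only files under {path}.",
        f"Output: {output_contract}.",
        "Verification: an independent agent must confirm each finding before it is reported.",
        f"Edits: {edit_policy}.",
    ])


prompt = scoped_prompt(
    task="audit API endpoints for missing auth checks",
    path="src/routes/",
    output_contract="return confirmed issues with file paths",
    edit_policy="read-only, do not modify any file",
)
print(prompt)
```

The read-only edit policy matches the two-stage advice above: map first, then write a second prompt with a different policy to apply the changes.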
How it stacks up against Cursor, Devin, and Codex
If you already pay for another agent, the fair question is what’s actually different here. The honest answer: the orchestration model, not the agent count. Anthropic-specific claims below come from the docs; the competitor lines are my read of how those tools work as of mid-2026, and they’re all moving fast.
| | Claude dynamic workflows | Cursor (parallel agents) | Devin | OpenAI Codex |
|---|---|---|---|---|
| Orchestration | Claude writes a JS script a runtime runs | IDE-native parallel edits | Autonomous cloud VM | Cloud sandbox task runner |
| Built-in verify/refute | Yes | No native pass | No native pass | No native pass |
| Read & rerun the plan | Yes (save as a command) | No | No | No |
| Runs for hours | Yes | Limited | Yes (cloud) | Yes (cloud) |
| MCP support | Yes | Yes | Yes | No |
Three things are genuinely Claude’s edge: the plan is a script you can read, save, and rerun instead of an opaque agent loop; the verify-and-refute pass has fresh agents trying to break the result before you see it; and because the plan lives in script variables rather than the chat, it doesn’t drift over hundreds of agents. Where the others win: Cursor keeps you in a visual diff-review loop, Devin runs fully autonomously in the cloud, and several testers still rate Codex higher for in-browser agent tasks. Different tools, different jobs.
What this means for you
If you maintain a large codebase: This is the feature to test first, on a contained job (a framework bump, a lint-rule migration, a dependency swap) where the test suite is a hard, objective bar. Let it plan and fan out, then review the merge like any large PR. Don’t point it at your whole monorepo on day one.
If you’re a solo dev on Pro: You can run this now. Flip it on in /config and switch to Opus 4.8 with /model opus. Just go in eyes-open on the limits: a $20 plan can hit its cap in minutes once workflows are spinning. Start with /deep-research to get a feel for it before you let ultracode loose on your repo.
If you’re a team lead evaluating it: Budget the token cost explicitly, and start with permission scoping, not prompting. Define what agents can edit, what needs human sign-off, which repos are off-limits. The teams that get burned will be the ones who treated it like faster autocomplete.
If you’re a non-developer keeping up: The one sentence worth carrying into a meeting is this. “Claude can now plan a big job, write its own coordination script, split the work across hundreds of mini-agents that check each other, and report back.” That shift from assistant to orchestrator is coming to far more than code.
What dynamic workflows can’t do
- It’s not autonomous magic. It still makes mistakes over long runs. The refute loop catches a lot. It doesn’t remove your review at the end.
- It won’t fix a vague goal. A swarm pointed at an unclear objective gives you a thorough, expensive wrong answer. Scope before you fan out.
- It can’t pause for your judgment mid-run. You can pause or stop a run by hand from /workflows, but the only thing that makes a run stop and wait on its own is a tool-permission prompt. If a job needs your call between stages, split it into separate workflows.
- It doesn’t survive a full restart. Resume works within a session. Quit Claude Code and the workflow starts over.
- The controls are still settling. It’s a research preview. Triggers, caps, and toggles will shift release to release. Today’s behavior is a starting point, not a contract.
So is it hype?
Both things are true. The skeptics have a point. Plenty of developers say 4.8 feels incremental over 4.7 in daily use, that “100s of sub-agents” loses its edge fast, and that Cursor and Codex will copy the trick. One blunt take: “Good update. Overhyped af.” Fair enough.
But the genuine unlock isn’t the agent count. It’s the verify-and-refute loop. Early reports suggest Opus 4.8 is meaningfully less likely than 4.7 to let a bug slip through its own code unreported. For anyone running this on real work, that honesty matters more than any benchmark. The headline isn’t “Claude got faster.” It’s “Claude got better at admitting when it’s wrong, then catching it before you do.” On a 750,000-line migration, that’s the difference between a tool and a teammate.
FAQ
Do I need to write code to use a workflow?
No. Claude writes the orchestration script. You describe the task, then review and approve.
Which plans have dynamic workflows?
All paid plans, plus the API, Bedrock, Vertex, and Foundry. Max and Team have it on by default. Enterprise admins enable it. Pro users turn it on in /config and need to switch to Opus 4.8 first. Requires Claude Code v2.1.154+.
What’s the difference between ultracode, ultrathink, and max?
ultracode is a session setting: xhigh reasoning plus automatic workflows. ultrathink is a one-turn keyword for deeper reasoning on a single prompt. max is the top effort level (deepest thinking, no token cap), but it doesn’t auto-orchestrate workflows the way ultracode does.
How many agents can run at once?
Up to 16 concurrently, fewer on a low-core machine, with a hard cap of 1,000 total per run.
Will agents change my files without asking?
Inside a workflow, yes. Subagents run in acceptEdits mode, so file edits are auto-approved. They still inherit your tool allowlist, and shell or MCP commands outside it can pause the run for your OK.
How do I keep it from eating my whole token budget?
Scope the task to specific paths, route cheap stages to a smaller model, and keep ultracode off for routine work. Watch /usage, set a monthly ceiling with /usage-credits on Pro/Max, and stop any run from /workflows.
How do I know the workflow actually worked?
For /deep-research you get a cited report with refuted claims removed. For code, run your test suite and read the git diff. The edits are already applied, so the test run is your proof.
Can I reuse a workflow?
Yes. Run /workflows, select the run, press s to save it as a /command.
Is there a /loop for workflows?
No. /loop is a separate feature for recurring tasks. Workflows are triggered by the workflow keyword, /effort ultracode, or /deep-research.
The bottom line
Dynamic workflows are the most genuinely new thing in Opus 4.8. Claude Code goes from a coder to a coordinator that writes its own plan as code, parallelizes it, and self-checks before reporting back. For codebase-scale migrations, big refactors, and deep research, that’s a real leap. For anything small, it’s an expensive way to do a five-minute job. The winning move is to treat it like a temporary engineering team: a well-scoped task, clear permissions, and a review at the end.
Sources
- Introducing dynamic workflows in Claude Code — Anthropic
- Orchestrate subagents at scale with dynamic workflows — Claude Code docs
- Model configuration (effort levels & ultracode) — Claude Code docs
- Choose a permission mode (auto mode) — Claude Code docs
- Manage costs — Claude Code docs
- Introducing Claude Opus 4.8 — Anthropic