Claude Code /goal: Set a Finish Line, Walk Away

Claude Code's new /goal command lets you set a finish line and walk away — Haiku checks every turn. Real wins, real failures, and the $200/14h warning.

Anthropic shipped a small command this week and the official @ClaudeDevs post about it pulled 9.5K likes, 685 retweets, and 606K views in under a day. The thing is one slash command. Four letters and a slash: /goal.

If you’ve used Claude Code for anything longer than a quick fix, you already know the pain. You ask it to migrate a module. It does half the work, hits a snag, and stops talking. You re-prompt: “keep going.” It does another chunk. Stops again. Re-prompt. Chunk. Stop. Three hours later you’ve typed “keep going” 40 times and you start to wonder why you’re paying for the assistant when you’re doing the conducting.

/goal is the fix. You write one completion condition. Claude works across turns until a separate model — Haiku, by default — checks the transcript and confirms the condition is met. Then the goal clears itself and you get your terminal back.

It actually landed in Claude Code v2.1.139 on May 11, two days before the official social push. Same release added the new agent view dashboard. Anthropic isn’t shy about why it exists: this is the feature that catches Claude Code up to OpenAI’s Codex CLI, which shipped a /goal of its own in version 0.128.0. GitHub issue #56085 was filed May 4 asking for parity, labeled “high priority,” and closed seven days later when the command shipped. One Taiwanese developer summed it up bluntly on X: “Claude copied Codex /goal in less than two weeks.” Fair.

If you only learn one new Claude Code thing this month, learn this one.

What /goal actually does

Here’s the short version, in plain language:

You type /goal followed by a condition. Claude starts working immediately — no second prompt needed. After every turn it takes, a smaller, faster model reads the conversation and answers one question: is the condition met yet? If no, Claude takes another turn. If yes, the goal closes itself and Claude hands control back to you.

That separation matters. The model doing the work isn’t the model judging whether it’s done. The judge is fresh every turn, doesn’t have the ego of having written the code, and gets one job: read the transcript, return yes or no plus a one-line reason.

In Anthropic’s own words from the docs:

A “no” tells Claude to keep working and includes the reason as guidance for the next turn. A “yes” clears the goal and records an achieved entry in the transcript.

So the loop is self-correcting. If Claude thinks it’s done but actually missed a test, the evaluator says “no — three tests in test/auth are still red” and that sentence becomes the next instruction. You don’t have to write it.

For the pros: under the hood, /goal is a wrapper around a session-scoped prompt-based Stop hook. The evaluator runs on whichever model is configured as your “small fast model” — Haiku by default — and bills tokens against that, not against the main turn budget. Anthropic says those evaluation tokens are “typically negligible compared to main-turn spend.” In practice, every turn pays for one extra short Haiku call.
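If you care which model does the judging, the small-fast-model override is the lever to look at. A minimal sketch, assuming the /goal evaluator honors the same ANTHROPIC_SMALL_FAST_MODEL environment variable Claude Code uses for its other background work (that mapping is our assumption, and the model id below is only an example):

# Assumption: the /goal evaluator follows the same small-fast-model override
# as Claude Code's other background tasks. Swap in a model id your account can use.
export ANTHROPIC_SMALL_FAST_MODEL="claude-haiku-4-5"
claude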

Setting a goal

The syntax is exactly what you’d guess:

/goal all tests in test/auth pass and the lint step is clean

That’s it. Setting the goal counts as the first turn, with the condition itself as the directive. You don’t follow up with “now do it” — Claude reads the condition as the instruction and starts.

While it’s running, you’ll see a little ◎ /goal active indicator with elapsed time. Run /goal with no arguments at any point to see:

  • The condition you set
  • How long it’s been running
  • How many turns the evaluator has scored
  • Current token spend
  • The evaluator’s most recent reason (a one-liner like “two test files still failing in auth/refresh_token_test.py”)

To stop it early, type /goal clear. The command also accepts stop, off, reset, none, and cancel as aliases, which is a small kindness — nobody remembers the official keyword under pressure.

How to write a condition that actually works

This is where most people fail their first three attempts. The most-shared meme on the topic in the first 48 hours captured it well: “Most people are using /goal wrong. They write ‘make no mistakes’. And pray.”

The evaluator reads the transcript only. It doesn’t run commands. It doesn’t read files. It can only judge what Claude has surfaced in the conversation. So bad conditions are vague conditions, or conditions that can’t be proven from the conversation alone.

Real examples from the field in the past week, both wins and losses:

  • “/goal turn this screenshot into a working app. Goal is reached once you tested every feature end to end in the browser” → ✅ Worked. Reported done in 20 minutes.
  • “/goal run full QA suite, fix failing tests, report results” (set before bed) → ✅ Woke up to a codebase that had improved.
  • “/goal Complete all backlog tasks + 90% test coverage + clean code” → ✅ Worked, but “push Claude until morning” was the time cost.
  • “/goal a world where 'substrate, not model' is the obvious right axis for AI-assisted programming” → ❌ Stuck in an “acknowledgement loop” twice. Burned session limits.
  • A goal asking for gap-analysis across a 20-minute swarmed effort → ❌ “Task adherence was weak. Many gaps remained. Cannot trust.”

The pattern’s not subtle. Tight, verifiable, output-anchored conditions ship. Vague philosophical missions burn tokens.

A condition that holds across many turns usually has three pieces, per the official docs (a worked example follows the list):

  1. One measurable end state — a test result, a build exit code, a file count, an empty queue.
  2. A stated check — exactly how Claude should prove it (“npm test exits 0” or “git status is clean”).
  3. Constraints that matter — what mustn’t change on the way there (“no other test file is modified”).
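Put the three pieces together and a condition for the auth example above might read like this (an illustration, not a sample from the docs):

/goal all tests in test/auth pass and the lint step is clean. Prove it by running npm test and the lint command and pasting their summary lines. Do not modify any test file outside test/auth. Or stop after 30 turns.

Each clause maps to one piece: the end state, the proof the evaluator can see in the transcript, and the constraint that keeps the work from sprawling.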

A field-tested template that went around X in the first 48 hours expands that into nine sections: GOAL, CONTEXT, CONSTRAINTS, PRIORITY, PLAN, DONE WHEN, VERIFY, OUTPUT, STOP RULES. For long missions it’s worth the structure. For “fix this one test file,” three sentences is plenty.
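If you do want the long form, a skeleton of that nine-section template looks something like this (the section names are from the template that circulated; the filler text is ours and purely hypothetical):

GOAL: migrate the payments module to the new HTTP client
CONTEXT: Node monorepo, tests live under test/payments
CONSTRAINTS: no changes outside src/payments and test/payments
PRIORITY: correctness over speed; small, reviewable commits
PLAN: write a step list first, then execute it one step at a time
DONE WHEN: npm test exits 0 and git status is clean
VERIFY: paste the final test summary and git status output into the conversation
OUTPUT: a short written summary of what changed and why
STOP RULES: stop after 40 turns, or if the same error appears three turns in a row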

One more trick from the docs: if you’re worried about runaway work, add a hard limit to the condition itself. Something like “or stop after 20 turns” works — Claude reports progress against that clause each turn, and the evaluator counts it as a valid stop. Add this on every goal you’re not sitting next to. We’ll come back to why in the cost section.

“Set the goal before bed”

The shorthand the community settled on in week one was “set the goal before bed, wake up to either receipts or a clean error log.” Someone built a Codex /goal overnight workflow that ran 10+ hours and posted the video. Someone else summed it up: “free overnight intern.” The aspirational use case is real — point Claude at the backlog at 11pm, wake up to a finished PR.

The reality is more mixed. One developer tried Claude Code’s /goal for the first time and reported it stopped to ask for review and approval within the hour: “In my personal experience, /loop worked longer for similar tasks. Can’t leave this overnight for sure.” The cause is usually that trust mode and clarifying-question behavior haven’t been fully tuned, and any mid-turn confirmation prompt halts the loop.

If you want overnight runs to actually work (a combined example follows the list):

  • Turn on auto mode. This is non-negotiable for unsupervised work. Without it, the first time Claude wants to confirm a tool call, your “overnight intern” goes back to staring at the wall.
  • Write the condition to ban clarifying questions. Something like “do not ask for confirmation; if uncertain, surface the uncertainty in a written note and continue with your highest-confidence assumption.”
  • Add a hard turn limit. “or stop after 40 turns” keeps a stuck loop from running until 6am.
  • Sit with the first hour. Watch the status panel and the evaluator reasons. If the reason hasn’t changed in three turns, you’re in a loop and need to step in.
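Put all four together and a bedtime goal might look like this (illustrative; adjust the paths, clauses, and cap to your repo):

/goal every test under test/ passes and the lint step is clean. Prove it by pasting the final test and lint summaries into the conversation. Do not modify CI config. Do not ask for confirmation; if uncertain, write the assumption into a NOTES.md and continue. Or stop after 40 turns.

Auto mode still has to be switched on separately; the condition text can’t grant tool permissions on its own.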

The cost warning nobody mentioned in the launch

This part isn’t in the docs. One developer ran /goal for 14 hours overnight in a terminal and posted what happened in the morning: “I burned through all my weekly tokens on a $200 account.” Another summarized it more pithily — “Vague goals burn tokens forever.”

The math is straightforward once you see it. Each turn costs main-turn tokens for the work plus a small Haiku call for the evaluator. The evaluator is cheap. The main-turn tokens are not. A loose condition that the evaluator keeps scoring “not yet” can drive 50, 100, 500 turns before you notice. There’s no built-in spend cap — you can watch token spend in the status panel, but nothing slams the brakes automatically.

Three habits worth forming before your first long-running goal:

  • Always include a turn cap or wall-clock cap in the condition itself: “or stop after 40 turns”, or “or stop if 90 minutes elapse”. This is the closest thing to a spending limit you have.
  • Check the status panel before you go to bed. If the evaluator’s most recent reason is identical to the one from three turns ago, the loop is stuck and you’re about to pay for hours of nothing.
  • Don’t run open-ended creative goals. “Make my code better” or “explore approaches to X” have no terminating condition the evaluator can score. They’ll run until you stop them — or until your token allowance dies.

How /goal differs from /loop and Stop hooks

Anthropic now ships three different ways to keep a session alive between prompts. The differences matter — picking the wrong one wastes tokens at best and burns through your wallet at worst.

  • /goal: the next turn starts when the previous turn finishes; it stops when a model confirms the condition is met.
  • /loop: the next turn starts when a time interval elapses; it stops when you stop it, or when Claude decides the work is done.
  • Stop hook: the next turn starts when the previous turn finishes; it stops when your own script or prompt decides.

/goal and a Stop hook both fire after every turn. The difference is that /goal is session-scoped — you type a condition and it’s active just for this terminal. A Stop hook lives in your settings file, applies to every session in its scope, and can run a custom script for deterministic checks (run the tests yourself, parse the output) or a prompt for model-evaluated ones.
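For the deterministic flavor, here is a minimal sketch of a command-type Stop hook. It assumes the documented hooks schema in .claude/settings.json and the documented convention that a Stop hook can print {"decision": "block", "reason": "..."} on stdout to keep Claude working; the script name and paths are ours, and the prompt-based variant that /goal wraps has its own syntax not shown here.

# Sketch: a Stop hook that only lets Claude stop once the tests pass.
# Caution: this overwrites .claude/settings.json if one already exists.
mkdir -p .claude scripts

cat > scripts/check-done.sh <<'EOF'
#!/usr/bin/env bash
# Green tests: exit quietly and allow the stop.
if npm test --silent > /tmp/claude-stop-hook.log 2>&1; then
  exit 0
fi
# Red tests: block the stop; the reason becomes guidance for the next turn.
echo '{"decision": "block", "reason": "npm test is still failing; fix the remaining failures"}'
exit 0
EOF
chmod +x scripts/check-done.sh

cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "Stop": [
      { "hooks": [ { "type": "command", "command": "./scripts/check-done.sh" } ] }
    ]
  }
}
EOF

Unlike /goal, that hook fires in every session for this project until you remove it, which is exactly the trade-off described above.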

/loop is different in kind. It re-runs a prompt on a time schedule — every 5 minutes, every hour, whatever. Use it for “check the deploy every 5 minutes” or “summarize new GitHub comments every hour.” It’s not the right tool for “finish this refactor.”

Auto mode is the fourth piece. It removes the per-tool confirmation prompts within a turn. It doesn’t start new turns on its own. The honest pairing is: auto mode + /goal = each turn runs unattended, and the loop runs until done. That’s the combo that turns Claude Code from “interactive assistant” into “send it to do the work while you make coffee.”

What you can resume, what you can’t

If you exit a session with a goal still active, the goal is restored when you come back with --resume or --continue. The condition itself carries over. But the turn count, timer, and token-spend baseline all reset — so the stats panel starts at zero again.
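In practice that looks like this (flags as described above; the second line is typed inside the reopened session):

claude --continue
/goal

The first command reopens your most recent session with the unmet goal still attached; the bare /goal then shows the restored condition, with the turn count and timer starting over from zero.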

If the goal was already met or you cleared it before quitting, it’s gone. No resurrection.

Headless mode works too. Stick a goal inside a -p invocation and it runs to completion as one call:

claude -p "/goal CHANGELOG.md has an entry for every PR merged this week"

Hit Ctrl+C to cut it off early. This is the pattern for CI scripts, cron jobs, anything where you want autonomous work without keeping a terminal open.
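For the cron case, the same headless call just gets a schedule around it. A sketch, with placeholder paths and a placeholder time:

# Nightly at 02:00, run from the repo root, logging output for the morning review.
0 2 * * * cd /home/you/repo && claude -p "/goal CHANGELOG.md has an entry for every PR merged this week" >> /home/you/goal-runs.log 2>&1

The same one-liner drops into a CI job step; the only difference is who owns the schedule.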

The persistence problem

There’s been one take from the week that’s worth sitting with, because it’s the actual interesting thing about /goal. Paraphrasing a developer who put it well:

Claude is no longer just autocomplete. It’s becoming an execution loop with persistence. The dangerous part isn’t that AI can code. It’s that it doesn’t get tired chasing the goal.

That’s the right framing. The risk of /goal isn’t that Claude will write bad code (it does that sometimes anyway). The risk is that with /goal plus auto mode, the bad-code-writing has no natural stopping point. A tired human writes one bad function and goes to bed. A tireless evaluator-judged loop writes the bad function, scores “not yet,” writes another version, scores “not yet,” writes another version — and four hours later there are 47 versions of the same broken function and no human in the loop.

The mitigations are the same as the cost ones: tight conditions, hard turn caps, auto-mode discipline. Plus one more, mentioned in the docs but easy to miss: /goal only runs in workspaces where you’ve accepted the trust dialog. If you opened Claude Code in someone else’s repo and didn’t trust it yet, /goal is blocked. That’s the floor. Above the floor, you’re responsible for what you let it touch.

One developer tested a deliberately edgy goal — “earn me $1 online without spamming” — and reported it stopped at the boundary on its own: “no posts, payment setup, or account actions without me.” That’s the optimistic read. The pessimistic read is that the developer was testing with intent. Set a sloppy goal in a sloppy workspace and you’ll find out where the edges are the hard way.

What it can’t do

A few things to set expectations.

The evaluator can’t run commands. It only reads what Claude has surfaced in the conversation. If Claude runs npm test and the output lands in the transcript, the evaluator can score the result. If Claude says “I think the tests pass” without running them, the evaluator has nothing to verify against — and may say yes when it shouldn’t. Write conditions that force Claude to produce evidence in the transcript.
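A condition written to force that evidence might look like this (illustrative):

/goal all tests in test/auth pass. Run npm test yourself and paste the summary line showing zero failures; do not declare the goal met without that output in the conversation.

The second sentence is the part doing the work: it gives the evaluator something concrete to check for in the transcript.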

Acknowledgement loops are real. The most common failure mode in week one was conditions vague enough that Claude could plausibly claim progress every turn without actually moving. The evaluator scores “not quite” each time, Claude responds with another round of acknowledgements, and the cycle repeats until the tokens run out. The fix is the same fix as for cost: tight verifiable end states, hard turn caps.

There’s no hard token budget. Already covered above. Worth repeating.

Only one goal per session. Setting a new goal replaces the old one. There’s no nested goal-within-a-goal, no goal queue. If you need parallel objectives, run multiple Claude Code sessions — the new agent view dashboard in the same v2.1.139 release is built exactly for that.

It needs trust mode on. /goal won’t run in workspaces where you haven’t accepted the trust dialog, because the evaluator is part of the hooks system. It’s also disabled if disableAllHooks is set in your settings, or if allowManagedHooksOnly is set in managed policy. The error tells you which, so you’re not debugging silence.

What this means for you

If you write code with Claude Code daily: Stop typing “keep going” three hundred times a week. Start every long-running task with /goal and a specific finish line. Migrations, refactors, test-coverage drives, doc generation — anything with a clear “done” state. Just add a turn cap. Always add a turn cap.

If you’ve been hesitant to leave Claude Code running unsupervised: This is the moment — but ease in. Try a 40-turn cap before a 200-turn cap. Try a 30-minute run before an overnight one. The people who reported the best overnight results all had a written guardrails template they’d tested in the daytime first.

If you don’t write code but you’ve been curious about Claude Code: This isn’t your entry point. Stay on Claude.ai or the desktop app. /goal is built for terminal sessions, multi-file work, the kind of stuff developers do. If you want autonomous AI for non-code work — research, content drafts, scheduling — the scheduled tasks system Anthropic shipped earlier this year is closer to what you want.

If you use Codex CLI: You already know this pattern. The Claude version is functionally close, with the same headline trade-off (no hard spending cap, evaluator can only see the transcript). The thing that’s different is the underlying model. Anthropic’s Haiku is fast and cheap as evaluators go, and the main turns can run on Opus 4.7. Whether that combination is worth switching for is a question only your codebase can answer.

The bottom line

/goal is the feature that turns Claude Code from a chat partner into an actual worker. Not in a hype-cycle way — in the specific, narrow sense that you can now hand off a clearly-scoped task with a verifiable end state and let it run.

Write tight conditions. Always include a turn cap. Glance at the status panel for token spend. Don’t run open-ended creative goals overnight. And stop typing “keep going.”

If you want to go deeper on the workflow patterns around this — sessions, routines, when to use /goal versus /loop versus a custom Stop hook — those are exactly what our Claude Code Routines Mastery and Claude Code Session Mastery courses cover.
