Devin vs Claude Code: Which Wins After Cognition's $1B Raise (2026)

On May 27, 2026, Cognition AI closed a Series D of more than $1 billion at a $26 billion valuation, more than doubling the $10.2 billion mark it hit nine months earlier. Lux Capital and General Catalyst led; 8VC co-led; Founders Fund, Ribbit, and Atreides participated. Cognition’s own claim — “89% of the PRs written at Cognition are now opened by Devin” — landed about ten hours later in the company’s announcement post. Run-rate revenue: $492M. Customers named in the press release: Citi, Goldman Sachs, Mercedes-Benz, Dell, Santander, the US Army, and the US Navy.

If you run a ten-engineer team and your weekly AI bill is starting to look like a junior engineer’s salary, the question is no longer “should we use AI.” The question is “on which tickets does Devin earn its keep — and on which tickets does Claude Code with the team’s existing Max subscription do the same job for one-tenth the cost?”

This post answers that.

What just changed

Cognition’s Series D announcement post. Source: cognition.ai/blog/series-d (captured May 28, 2026).

Three things landed in the last six weeks that change the comparison:

Devin 2.0 (April 2026) — Cognition dropped the entry tier from $500/month to $20/month + $2.25 per Agent Compute Unit (ACU), where one ACU equals roughly 15 minutes of active Devin work. A typical bug fix runs 2-3 ACU. A multi-file migration runs 30+. WeavAI’s benchmark review put the practical math at $4.50-6.75 for a routine fix, $67.50+ for a serious refactor.
Claude Code Remote Tasks — Anthropic’s most recent Claude Code update brought the SWE-Bench Verified score to 87.6%, the highest published number on the standard benchmark as of May 2026. Pricing didn’t move: Pro at $20/mo, Max 5x at $100/mo, Max 20x at $200/mo. Heavy real-world users land at $100-200/mo.
The OpenHands counter — @iam_elias1’s May 23 X post (84 likes, 29 reposts, 33 replies) framed the budget alternative: OpenHands (formerly OpenDevin) combined with Claude Sonnet 4.5’s extended-thinking mode achieves 72% on SWE-Bench Verified at roughly $6/day in API costs on heavy use. Open-source. Docker-sandboxed. No vendor lock-in. The whole frame of the post: “Six dollars per day on heavy use. Versus $500 per month for Devin.”
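The ACU arithmetic quoted above is easy to check. Here is a minimal sketch in Python, assuming only the figures cited in this post ($2.25 per ACU, roughly 15 minutes of active work per ACU). The function names are illustrative and not part of any Cognition API.

```python
ACU_PRICE_USD = 2.25   # per-ACU price quoted above for the Devin 2.0 entry tier
MINUTES_PER_ACU = 15   # "roughly 15 minutes of active Devin work" (approximate)


def acu_cost(acus: float) -> float:
    """Dollar cost of a task that burns `acus` Agent Compute Units."""
    if acus < 0:
        raise ValueError("acus must be non-negative")
    return acus * ACU_PRICE_USD


def acu_minutes(acus: float) -> float:
    """Approximate active working time, in minutes, for `acus` ACUs."""
    return acus * MINUTES_PER_ACU


if __name__ == "__main__":
    # ACU counts taken from the examples in the text above.
    for label, acus in [("bug fix, low", 2), ("bug fix, high", 3), ("migration", 30)]:
        print(f"{label}: {acus} ACU = ${acu_cost(acus):.2f}, ~{acu_minutes(acus):.0f} min")
```

For 2, 3, and 30 ACU this gives $4.50, $6.75, and $67.50, which matches the range WeavAI reported. The $20 monthly base fee is not included.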

The valuation news is the headline. The pricing news is the actual story.

The 4 ticket types — and which agent wins each

The way to think about this is by ticket shape, not by tool brand. Devin’s pitch is autonomy: you hand it a GitHub issue, it disappears for hours, it opens a PR. Claude Code’s pitch is in-loop pair programming: you sit at the terminal, it does the work, you sign off line by line. The right answer per ticket depends on whether you’d rather wait or sit.

Claude Code documentation. Source: code.claude.com/docs/en/overview (captured May 28, 2026).

Ticket 1 — Flaky test fix

The classic “this test failed three times this week, we don’t know why” ticket.

Claude Code (in-loop): Open the failing file. Tell Claude what’s flaky. It traces the race condition or timing dependency in front of you. Cost: maybe $0.50 of your Max 5x subscription. Wall-clock: 15-30 minutes.
Devin (autonomous): Assign the GitHub issue to a Devin session. Walk away. It runs the test suite, instruments the test, finds the flake source, opens a PR. ACU burn: 2-3. Cost: $4.50-6.75. Wall-clock: ~45 minutes from your perspective, but you weren’t in the room.

Winner: Claude Code, by cost and by judgment. You learn what was flaky. Next time the suite flakes you spot it faster.

Ticket 2 — Dependency upgrade across 30 files

The “we need to bump React/Next/Express to vN+1” ticket. Mostly mechanical. Lots of files. Tedious.

Claude Code: Run the upgrade command. Hand-fix the breakages. Claude Code can batch-edit but you’re still babysitting. Cost: a few dollars on subscription. Wall-clock: 2-3 hours.
Devin: Hand off the issue with the upgrade target. It runs the upgrade, fixes the breakages, runs the full test suite, opens a PR. ACU burn: 15-25. Cost: $34-56 at $2.25 per ACU, so call it ~$50 at the upper end. Wall-clock: 2-4 hours of Devin time. You did something else.

Winner: Devin, clearly. This is the canonical “you’d rather wait than sit” job. $50 to get three hours of your week back is a good trade.

Ticket 3 — Build a new endpoint with 4 callers

The “we need a new POST /api/v2/foo and four services need to call it” ticket.

Claude Code: You probably want to sit with this one. Designing the endpoint shape, the auth surface, the error handling, the four caller integrations — that’s architecture-shaped work, not mechanical. Claude Code is great here because it co-designs with you. Cost: a few dollars. Wall-clock: 2-3 hours.
Devin: Devin will design it. The output will probably work. It might not match your service conventions. @morganlinton on X: “They both tend to think they are done building a plan, but there are still either unbuilt portions, or real bugs that need to be fixed.” ACU burn: 20-30. Cost: $45-68.

Winner: Claude Code, with a caveat. Devin wins if your codebase is uniform enough that an outside agent’s conventions land within tolerance. For most ten-engineer teams that’s not yet true.

Ticket 4 — Multi-file refactor (rename + signature change across 80 call-sites)

The “we’re changing the function signature and there are 80 call-sites” ticket.

Claude Code: Painful. You’ll either babysit the rename or use a deterministic refactoring tool first and have Claude clean up the edge cases. Cost: a few dollars subscription. Wall-clock: half a day.
Devin: Hand off. ACU burn: 30-50. Cost: $67.50-112.50. Wall-clock: half a day of Devin time. You did something else, again.

Winner: Devin, if your test coverage is honest. The risk is silent regressions on call-sites whose tests don’t cover the changed behavior. That risk exists with any tool; with an autonomous agent you just aren’t there to watch it miss them. Pair Devin with a strict CI gate.
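The four verdicts above reduce to a small routing rule. The sketch below is one hypothetical way to encode it. The `Ticket` fields, the `route` function, and the 10-file threshold are all invented for illustration. The rule is this post’s heuristic, not something either vendor ships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Hypothetical description of a ticket's shape (fields are illustrative)."""
    title: str
    mechanical: bool      # mostly repetitive edits, little design judgment
    files_touched: int    # rough estimate of how many files change
    tests_trusted: bool   # does CI actually cover the changed behavior?


def route(ticket: Ticket, multi_file_threshold: int = 10) -> str:
    """Return "devin" for wait-shaped tickets, "claude-code" for sit-shaped ones."""
    wide = ticket.files_touched >= multi_file_threshold  # threshold is an assumption
    if ticket.mechanical and wide and ticket.tests_trusted:
        return "devin"       # mechanical, multi-file, guarded by CI: hand it off
    return "claude-code"     # judgment-heavy, small, or weakly tested: stay in the loop


if __name__ == "__main__":
    tickets = [
        Ticket("Flaky test fix", mechanical=False, files_touched=1, tests_trusted=True),
        Ticket("Dependency upgrade", mechanical=True, files_touched=30, tests_trusted=True),
        Ticket("New endpoint, 4 callers", mechanical=False, files_touched=5, tests_trusted=True),
        Ticket("Signature change, 80 call-sites", mechanical=True, files_touched=80, tests_trusted=True),
    ]
    for t in tickets:
        print(f"{t.title}: {route(t)}")
```

With `tests_trusted=False`, the 80 call-site refactor routes back to the in-loop tool. That is the “if your test coverage is honest” caveat expressed as a condition.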

The honest cost table for a ten-engineer team

Take a typical week — 4 flaky tests, 2 dependency upgrades, 3 new endpoints, 1 large refactor:

Approach	Devin spend (monthly)	Claude spend (monthly)	Combined monthly (team)	Note
All Devin	$200/wk on tickets × 4 wk + $20 base = $820 per dev × 10 = $8,200	n/a	~$8,200	The official-press-release option
All Claude Code	n/a	$200 × 10 = $2,000	$2,000	Subscription, no ticket overage
Hybrid (Devin for #2, #4; Claude Code for #1, #3)	~$300 per dev × 10 = $3,000	$200 × 10 = $2,000	$5,000	The @ryancarson actual stack: Devin for backend/cloud agents, Claude Code or Codex for frontend/local
OpenHands + Claude Sonnet 4.5 API	n/a	~$6/day × 22 work days × 10 devs = $1,320 (API usage, not a subscription)	$1,320	The budget-floor option per @iam_elias1

The numbers are illustrative. The math is real. Hybrid lands at $5,000/mo for the team, vs $8,200 if you go all-Devin, $2,000 if you go all-Claude Code, and $1,320 if you go all-open-source. The post-raise question every engineering manager should ask is: “on which tickets does the $3,000/mo of Devin in the hybrid stack actually buy me three hours of senior-engineer time?” The answer is mostly #2 and #4. Mostly.
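To rerun the table with your own inputs, the arithmetic fits in a few functions. This is a sketch under the table’s own assumptions (4 weeks per month, 22 work days, $200/mo Max 20x seats, the illustrative $200/wk and $300/mo Devin figures). The function and parameter names are made up for this example.

```python
DEVS = 10  # team size used in the table


def all_devin(devs: int, weekly_tickets_usd: float = 200.0,
              base_usd: float = 20.0, weeks_per_month: int = 4) -> float:
    """Per-dev ticket spend plus the base fee, scaled to the team."""
    return devs * (weekly_tickets_usd * weeks_per_month + base_usd)


def all_claude_code(devs: int, seat_usd: float = 200.0) -> float:
    """Flat subscription seats, no per-ticket overage."""
    return devs * seat_usd


def hybrid(devs: int, devin_per_dev_usd: float = 300.0, seat_usd: float = 200.0) -> float:
    """A Claude Code seat for everyone plus a smaller Devin budget per dev."""
    return devs * (devin_per_dev_usd + seat_usd)


def openhands_api(devs: int, daily_usd: float = 6.0, work_days: int = 22) -> float:
    """Pay-as-you-go API spend on heavy-use days."""
    return devs * daily_usd * work_days


if __name__ == "__main__":
    rows = {
        "All Devin": all_devin(DEVS),
        "All Claude Code": all_claude_code(DEVS),
        "Hybrid": hybrid(DEVS),
        "OpenHands + API": openhands_api(DEVS),
    }
    for name, monthly in rows.items():
        print(f"{name}: ${monthly:,.0f}/mo")
    print(f"Hybrid over all-Claude Code: ${hybrid(DEVS) - all_claude_code(DEVS):,.0f}/mo")
    print(f"All-Devin over hybrid: ${all_devin(DEVS) - hybrid(DEVS):,.0f}/mo")
```

With the defaults this reproduces the table: $8,200, $2,000, $5,000, and $1,320 per month. The two gaps come out to $3,000 (hybrid over all-Claude Code) and $3,200 (all-Devin over hybrid).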

What this means for you

If you run a 5-15 engineer team: Hybrid. Use Devin for #2 and #4-shaped tickets (mechanical, multi-file, “you’d rather wait than sit”). Use Claude Code on Max 5x or Max 20x for #1 and #3 (judgment-heavy, architecture-shaped, “you should sit”). Budget about $4-5K/mo for the team.
If you run a 50+ engineer team: Devin’s per-seat math gets better at scale because the autonomous tickets dominate. Talk to Cognition about enterprise pricing — the $20 floor was the consumer move; serious customers negotiate.
If you’re a solo dev or 2-3 person team: Claude Code Pro or Max 5x covers 80% of what you need. Devin’s ACU pricing is structured for tickets that take hours; solo devs rarely have those.
If you’re at a company that requires open-source / no-vendor-lock-in: OpenHands + Sonnet 4.5 API. The 72% SWE-Bench number is real, the Docker sandbox is real, and the $6/day on heavy use is genuinely cheaper than any subscription path. The cost is your weekend, because nothing is set up for you.
If you’re in a regulated industry (finance, healthcare, defense): Devin won the Citi / Goldman Sachs / US Army / US Navy deals for a reason: its compliance posture is built for that buyer. Claude Code on enterprise-tier with Anthropic’s enterprise contract gets you to a similar place. OpenHands does not.

What this can’t fix

The honest limits, before you swap your stack:

No agent removes the need for review. Devin opening 89% of PRs at Cognition does not mean 89% of PRs went unreviewed. It means a human still sat with each one. You will too.
The “tried Devin once a year ago and it sucked” thing is real. @nityasnotes (Exa): “If you tried Devin when it first came out, know it is a completely different product today.” The 2024 reputation does not match the 2026 product. Retest before deciding.
ACU burn is unpredictable until you’ve watched it for two weeks. A “simple bug fix” can turn into 12 ACU because the test suite is slow. Track per-ticket cost for the first 10 tickets before you trust the budget projection.
OpenHands is not zero-effort. The “$6/day” number assumes Docker is up, the runtime is configured, the Claude Sonnet 4.5 API key is set, and the prompts are tuned. You will spend a weekend on setup.
Devin is not yet a frontend agent. The high-engagement practitioner stacks (@ryancarson shipped Devin for backend, Codex for frontend, Claude Design for design surfaces) treat it as a backend / cloud-task agent. Don’t ask it to ship pixel-perfect UI.
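The “track per-ticket cost for the first 10 tickets” advice above can be as simple as a list and a summary function. A minimal sketch follows, assuming the $2.25 per-ACU price quoted earlier. The ACU values in the example log are hypothetical placeholders, not measurements.

```python
from statistics import mean
from typing import Dict, Sequence

ACU_PRICE_USD = 2.25  # per-ACU price quoted earlier in this post


def summarize_acu_log(acus_per_ticket: Sequence[float], budgeted_acus: float) -> Dict[str, float]:
    """Summarize observed ACU burn against the per-ticket budget you assumed."""
    if not acus_per_ticket:
        raise ValueError("log at least one ticket before summarizing")
    costs = [acus * ACU_PRICE_USD for acus in acus_per_ticket]
    over_budget = [acus for acus in acus_per_ticket if acus > budgeted_acus]
    return {
        "tickets": len(acus_per_ticket),
        "mean_cost_usd": round(mean(costs), 2),
        "max_cost_usd": max(costs),
        "over_budget": len(over_budget),
    }


if __name__ == "__main__":
    # Hypothetical log: ten "simple bug fix" tickets, one of which hit a slow test suite.
    log = [2, 3, 2.5, 12, 3, 2, 4, 3, 2, 2.5]
    print(summarize_acu_log(log, budgeted_acus=3))
```

In this made-up log, the single 12-ACU outlier pulls the mean cost to $8.10, above the $6.75 a 3-ACU budget implies. That kind of drift is what two weeks of tracking should surface.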

The bottom line

The valuation is the news. The actual story is the four tickets above: what’s mechanical, what’s judgment, what’s “I’d rather wait” and what’s “I should sit.” For a typical 10-engineer team, hybrid wins on cost and on outcome. Pure Devin is the press-release move. Pure Claude Code leaves the dependency-upgrade and multi-file-refactor pain on the senior engineers’ plate. OpenHands is a real option only if you can absorb the setup cost.

If you want a guided walkthrough of how to wire Claude Code into a daily team workflow — Sessions, hooks, agents, the patterns that survive a real codebase — our Claude Code Mastery course is the structured version of the in-loop side of this stack.

Which of the 4 ticket types ate the most senior-engineer hours on your team last week — and what is the cheapest stack that actually solves that one ticket?
