5 Things Your AI Agent Should Do Tonight (While You Sleep)

Claude's new Dreaming feature lets AI agents work overnight on real jobs — support triage, lead scoring, document cleanup. 5 night-shift workflows that actually pay off.

Most people use AI between meetings. The smartest businesses now use AI between midnights.

On May 6, Anthropic shipped a feature called Dreaming, and it changes the math on what an AI agent is supposed to do. Instead of waking up confused every morning like Drew Barrymore in 50 First Dates, your agent now spends the night rereading yesterday’s work, noticing patterns, and getting better at its job. The legal tech company Harvey reports 6x more completed tasks after turning it on. Not 6% — six times.

If you’ve been wondering “what would I actually have an AI agent do overnight?” — here are five real jobs people are already running. None of them require you to be technical. All of them save someone real money.

First, What Is “Dreaming” — In Plain English

Think of your AI agent like a customer service rep on day three of the job. They handle tickets. They sort of know your products. They sometimes ask dumb questions because they haven’t built up enough context yet.

Now imagine the same rep, but every night they review yesterday’s tickets, take notes on what worked, file a cheat sheet for tomorrow, and quietly retire the bad shortcuts. After two weeks of that, they’re the best rep on the floor. That’s Dreaming.

Technically: Dreaming is a scheduled background process that re-reads an agent’s prior sessions, extracts patterns (“this customer always wants tab-separated output”), merges duplicate notes, throws out stale ones, and writes a playbook for tomorrow. It’s not retraining the AI model — it’s rewriting the agent’s notebook. The plain-text format matters: you can read, edit, or delete what it learned. No black box.
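
To make that mechanical description concrete, here is a minimal sketch of what one "dream" pass could look like. This is illustrative logic only, not Anthropic's implementation: the note format, the duplicate-merge rule, and the 30-day staleness window are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)  # assumed retention window, not a real setting

def consolidate(session_notes, today):
    """One hypothetical 'dream' pass over an agent's notes: merge duplicate
    observations, retire stale ones, and write a plain-text playbook.

    session_notes: list of {"text": str, "seen": "YYYY-MM-DD"}
    """
    counts, latest = Counter(), {}
    for note in session_notes:
        text = note["text"].strip()
        counts[text] += 1                                  # merge duplicate notes
        seen = datetime.fromisoformat(note["seen"])
        latest[text] = max(latest.get(text, seen), seen)

    lines = []
    for text, n in counts.most_common():
        if today - latest[text] > STALE_AFTER:
            continue                                       # throw out stale notes
        if n >= 2:                                         # promote repeated patterns
            lines.append(f"- (seen {n}x) {text}")
    return "\n".join(lines)
```

The output is just text, which is the point of the plain-text design: you can open tomorrow's playbook, edit a line, or delete one the agent got wrong.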

What’s important for non-developers: Dreaming doesn’t run in ChatGPT. It runs inside Claude Managed Agents, which is Anthropic’s product for businesses that want to deploy AI as automated workers, not as a chat tool. You set up the agent, give it tasks, and Dreaming runs while everyone’s asleep.

Three pieces shipped together at Anthropic’s Code with Claude event:

Feature | What it does | Status
--- | --- | ---
Dreaming | Overnight memory consolidation | Research preview
Outcomes | A “grader” agent that checks work against a rubric you write | Public beta
Multi-agent orchestration | A lead agent that delegates to specialist sub-agents | Public beta
Use them together and you get something genuinely new: an agent team that reviews itself, grades itself, and improves itself overnight. Harvey isn’t the only customer running this in production — Anthropic also names Notion, Rakuten, Asana, Sentry, and Atlassian as Managed Agents deployments.

Now, the five overnight jobs that actually pay off.

1. Clean Up Your Customer Support Backlog

Best for: support teams, customer success, any team drowning in tickets.

The night-shift job: Read every support ticket from yesterday. Classify each one by issue type (billing, onboarding, bug, feature request), severity (P0–P4), and product area. Note which ones were solved on first contact vs. needed escalation. Then write the morning report — top 10 recurring issues, suggested macro responses, and which tickets the human team should look at first.

Why Dreaming makes this better over time: After a week, your agent notices things like “EU customers always confuse VAT on annual invoices” or “every iOS-on-iPad bug gets escalated to engineering — let’s tag those.” It writes those patterns into its playbook. Two weeks in, your morning report includes context that took a human rep two months to build.

What the rubric (Outcomes) checks: “Is every ticket tagged? Is the severity defensible? Does the routing rule have a precedent?” If the agent’s classification fails the rubric, it re-classifies before going to bed.
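
A rubric like that can be as plain as a checklist. Here is a hypothetical sketch of the check step — the field names (issue_type, routing_precedent) and the P0/P1 justification rule are invented for illustration, not part of the Outcomes API:

```python
VALID_SEVERITIES = {"P0", "P1", "P2", "P3", "P4"}

def check_ticket(ticket):
    """Return the list of rubric failures for one classified ticket.
    An empty list means the classification passes and the agent can move on."""
    failures = []
    if not ticket.get("issue_type"):
        failures.append("missing issue_type tag")
    if ticket.get("severity") not in VALID_SEVERITIES:
        failures.append("severity is not one of P0-P4")
    if ticket.get("severity") in {"P0", "P1"} and not ticket.get("justification"):
        failures.append("high severity needs a written justification")
    if not ticket.get("routing_precedent"):
        failures.append("routing rule has no precedent ticket")
    return failures
```

Any ticket that comes back with failures goes around the loop again: the agent re-classifies, re-checks, and only then writes it into the morning report.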

Real-world signal: This is the most common Dreaming-enabled use case in production. Support is repetitive, pattern-heavy, and there’s a clear quality bar — exactly the workload Dreaming was designed for.

2. Re-Score Yesterday’s Sales Leads (and Tell Sales Who to Call First)

Best for: B2B sales teams, founders running outbound, RevOps.

The night-shift job: Pull every lead and conversation from yesterday. Cross-reference with product usage data, CRM notes, and any new signals (LinkedIn job changes, funding announcements, website visits). Re-score every open opportunity. Output: a ranked list of who sales should call first thing in the morning, with a one-line “here’s why.”

What the rubric (Outcomes) checks: Industry fit. Company size. Intent signal (did they visit pricing in the last 48 hours?). Recent engagement (replied to email, attended demo). Past customer behavior pattern match. If a lead doesn’t pass the rubric, the agent demotes it.
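
In code form, a rubric like this is just a weighted checklist plus a demotion threshold. The sketch below is hypothetical: the industries, the point weights, and the 48-hour pricing-visit window are placeholder values you would tune to your own funnel, not anything Anthropic ships.

```python
def score_lead(lead):
    """Toy rubric: each criterion that passes adds points."""
    score = 0
    if lead.get("industry") in {"saas", "fintech", "manufacturing"}:
        score += 2                                  # industry fit (example list)
    if 50 <= lead.get("employees", 0) <= 5000:
        score += 1                                  # company-size band
    if lead.get("hours_since_pricing_visit", 999) <= 48:
        score += 3                                  # strong intent signal
    if lead.get("replied_to_email") or lead.get("attended_demo"):
        score += 2                                  # recent engagement
    return score

def rank(leads, demote_below=4):
    """Leads under the threshold fall to the bottom of the call queue."""
    kept = sorted((l for l in leads if score_lead(l) >= demote_below),
                  key=score_lead, reverse=True)
    demoted = [l for l in leads if score_lead(l) < demote_below]
    return kept + demoted
```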

Why Dreaming improves this: After a month, the agent notices things like “leads that mention SOC 2 in the second email close 40% more often” or “Tuesday outreach to manufacturing companies gets twice the reply rate of Friday outreach.” Those patterns go into the morning brief. Sales reps get a list ranked by what actually works, not what should theoretically work.

Skeptical note: The agent doesn’t make sales calls for you. It re-orders the queue and explains its reasoning. The hard part — actual conversations with humans — is still your job. But you start the day looking at the right names.

3. Run a Codebase Health and Review Pass

Best for: engineering teams, technical founders, anyone shipping code.

The night-shift job: Walk through a chunk of your codebase. Open draft pull requests for small fixes — outdated documentation, deprecated API calls, inconsistent logging, missing input validation, stale TODOs that have a fix. Don’t merge anything; just draft the PRs so an engineer can review them with coffee.

What the rubric (Outcomes) checks: No unhandled errors. No hardcoded secrets. Tests still pass for changed behavior. Lint passes. The agent revises until those criteria are met or it gives up after N attempts and flags the file as “needs human.”
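
The "revise until pass, then give up after N attempts" loop is simple enough to sketch. Everything below is illustrative: the regex is a toy stand-in for a real secret scanner, and the tests_pass/lint_pass flags would come from your actual CI, not booleans.

```python
import re

MAX_ATTEMPTS = 3  # assumed retry budget, not a real product setting

SECRET_PATTERN = re.compile(r"(api[_-]?key|secret|password)\s*=\s*['\"]\w+", re.I)

def rubric(draft):
    """Minimal stand-in for an Outcomes-style rubric on a draft PR."""
    problems = []
    if SECRET_PATTERN.search(draft["diff"]):
        problems.append("hardcoded secret")
    if not draft["tests_pass"]:
        problems.append("tests failing")
    if not draft["lint_pass"]:
        problems.append("lint failing")
    return problems

def review_loop(draft, revise):
    """Run the agent's revise() step until the rubric passes, or flag
    the file for a human once the attempt budget is spent."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        problems = rubric(draft)
        if not problems:
            return {"status": "ready_for_review", "attempts": attempt}
        draft = revise(draft, problems)   # agent fixes the flagged issues
    return {"status": "needs_human", "problems": problems}
```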

Why this is the use case Anthropic actually demos: Code review and refactor PRs are the cleanest fit for Managed Agents — well-defined inputs, clear quality criteria, low blast radius if the PR is wrong (the human just doesn’t merge it). Dreaming compounds: after dreaming on a month of reviews, the agent learns “this team forgets to update the logger when they add a new endpoint” or “Python services here always lag behind the standard config.” It catches those without being told.

What it won’t do: Architectural redesigns. Big feature work. Anything where the right answer requires product judgment. Use it for the boring 80%, free up engineers for the interesting 20%.

4. Classify Documents and Stop Your Knowledge Base from Rotting

Best for: operations, IT, compliance, anyone with a Notion or Confluence or Drive that’s become a graveyard.

The night-shift job: Pull every document created or edited yesterday — meeting notes, specs, contracts, customer call transcripts. Classify each by topic, owning team, sensitivity level, and lifecycle stage (draft / current / superseded / archive-me). Update the wiki’s index. Add FAQ candidates to the help-center backlog.

Why Dreaming matters here: Internal knowledge bases have a specific failure mode — the same process gets documented five times by five teams in five formats. Dreaming detects that (“we have five onboarding checklists, here’s the consolidated version”) and promotes the merged version to the live page. Stale docs get archived. Conflicting versions get flagged for a human to resolve.

What the rubric (Outcomes) checks: Is every doc tagged? Is the lifecycle status defensible? Are duplicates flagged with a confidence score? If a doc gets a “current” tag, does the agent’s reasoning hold up?
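
The duplicate-with-confidence part is easy to picture with standard string similarity. This sketch uses Python's stdlib difflib as a crude stand-in for whatever matching an agent would actually use; the 0.8 threshold and the doc fields are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def flag_duplicates(docs, threshold=0.8):
    """Hypothetical duplicate pass over a knowledge base: compare every
    pair of doc summaries and report a similarity confidence. Nothing is
    deleted -- pairs above the threshold are only flagged for human review.

    docs: list of {"id": str, "summary": str}
    """
    flagged = []
    for a, b in combinations(docs, 2):
        ratio = SequenceMatcher(None, a["summary"], b["summary"]).ratio()
        if ratio >= threshold:
            flagged.append({"pair": (a["id"], b["id"]),
                            "confidence": round(ratio, 2)})
    return flagged
```

Note the design choice baked in: the function returns flags, it never merges or deletes. That matches the "flag for review, archive after approval" rule below.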

The trap to avoid: Don’t let the agent delete anything autonomously. The rule should be “flag for review, archive after approval.” Anthropic’s own framing is that Dreaming writes notes — your team approves what becomes canonical. Keep it that way for documents too.

5. Build a Morning Brief for Every Function

Best for: founders, COOs, leadership teams, anyone who wants the org’s pulse in 60 seconds.

The night-shift job: Orchestrate four sub-agents (this is where multi-agent orchestration comes in). One for Sales, one for Support, one for Product, one for Operations. Each pulls its own data from the relevant tools. Each summarizes “what happened yesterday” — metrics, notable events, what changed.

Then the lead agent stitches the four briefs into one morning email. By 7 AM, every function head has a one-page summary of their domain plus the cross-functional view.

What the rubric (Outcomes) checks: Each brief must answer the same core questions — top risks, biggest wins, blockers, suggested actions for today. If any brief skips a section, the agent re-runs that sub-agent before sending.
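
The orchestration pattern — run each sub-agent, re-run any that skip a section, then stitch — fits in a few lines. This is a hypothetical sketch: in reality each callable would be a Managed Agent pulling from its own tools, and the section check would be a grader agent rather than substring matching.

```python
REQUIRED_SECTIONS = ("top risks", "biggest wins", "blockers", "suggested actions")

def missing_sections(brief_text):
    """Which rubric sections did a sub-agent's brief skip?"""
    lower = brief_text.lower()
    return [s for s in REQUIRED_SECTIONS if s not in lower]

def assemble_morning_brief(sub_agents, max_reruns=2):
    """Run each sub-agent, re-run any whose brief skips a required
    section, then stitch everything into one email body.

    sub_agents: dict of function name -> zero-arg callable returning brief text.
    """
    briefs = {}
    for name, run in sub_agents.items():
        brief = run()
        for _ in range(max_reruns):
            if not missing_sections(brief):
                break
            brief = run()                 # re-run the incomplete sub-agent
        briefs[name] = brief
    return "\n\n".join(f"{name.upper()}\n{text}" for name, text in briefs.items())
```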

Why Dreaming makes this brief readable instead of robotic: Over weeks, the agent learns that the VP of Sales reads pipeline changes first and skims revenue. Product wants customer quotes and feature-flag rollouts at the top. Operations cares about incident counts above all else. Dreaming writes those preferences into memory, and the brief auto-formats accordingly.

A real signal of how far this has come: Anthropic explicitly cites multi-agent orchestration for incident response — a coordinator agent that delegates “find the cause,” “draft the communications,” “summarize for leadership” to specialist sub-agents. The morning-brief job is the same pattern in a less stressful context.

What This Means for You

If you run a small business and don’t have engineers: Most of this still requires a Managed Agent setup, which means a developer or a no-code platform that wraps Claude. The good news: agencies and consultancies are already packaging “support triage agent” and “lead scoring agent” as turnkey services. Ask three vendors what they charge. The honest answer for a 10-person team is probably $300–$800/month all in.

If you’re a manager at a bigger company: The fastest path is to pick one of the five jobs above where the ROI is clearest (support triage usually wins), find an engineer who’s already played with Claude Code or Managed Agents, and run a 30-day pilot. Don’t try all five at once. Pick the workflow where humans spend the most time on the lowest-leverage work, and start there.

If you’re a developer: This is the most consequential change to Claude in 2026. Read the Anthropic platform docs on Dreaming before your next agent project. The pattern of “agent + rubric + nightly dream” is a real architecture, and it’s going to be how most production agents work by year-end.

If you’ve never used an AI agent: Don’t start with Dreaming. Start with a plain conversation in Claude or ChatGPT. Get used to the basic interaction. Once you’ve used it for a few weeks and you’ve felt the difference between “AI as chat” and “AI as a worker,” then come back to this list. The hierarchy matters.

What Dreaming Can’t Do (Yet)

Honest list of limits, because every blog about new AI tech needs one:

  • It can’t fix bad workflows. If your support team is firefighting because the product is broken, an agent that learns to firefight more efficiently doesn’t solve the real problem.
  • It can over-generalize. The agent might decide a one-off pattern is a rule. You need a human to review the dream output periodically — Anthropic calls this an autonomous-intern model for a reason.
  • It’s a research preview. Dreaming itself is labeled research preview, not GA. Reliability, pricing, and even the API surface may change. Don’t build your entire business on it yet.
  • Contaminated memory bites back. If noisy or biased data lands in the agent’s memory, it can develop wrong “beliefs” you have to manually clean. Treat memory hygiene as an ongoing chore, not a setup step.
  • Rapid-change workflows benefit less. Dreaming pays off when the work is repetitive and the norms are stable. If your team reorganizes every quarter, the patterns it learns expire fast.

How Dreaming Compares to ChatGPT Memory and Gemini Long-Context

People will ask, so here’s the cheat sheet:

Capability | Claude Dreaming | ChatGPT Memory | Gemini Long-Context
--- | --- | --- | ---
Primary purpose | Self-improving agent workflows | Personalized chat memory | Massive context window in a single session
When it works | Overnight, in batch | On demand during chat | During a session
What changes | Plain-text playbooks the agent reads next time | Small memory store for user facts | 1M-token context + saved facts
Who’s the audience | Engineering / ops teams running agents | End users wanting personalized chats | Developers analyzing huge documents/code
Best fit | Support, sales, code review, KB hygiene, ops briefs | “Remember my project, my dog’s name, my preferences” | Whole-repo refactors, mass document analysis
The three approaches aren’t really competing — they solve different problems. Dreaming is for businesses with repetitive workflows. ChatGPT Memory is for individuals who want a smarter chat. Gemini’s huge context is for one-shot heavy lifts. Use whichever fits the job.

The Bottom Line

The shift from “AI as chat tool” to “AI as overnight worker” is the real story of 2026, and Dreaming is the most concrete sign of it. The five jobs above aren’t hypothetical — Anthropic is naming customers who run them in production. Harvey’s 6x improvement number is the proof of concept; everyone else is now figuring out which of their workflows fit.

For most teams, the right move is to pick one — probably support triage or lead scoring — and run a real pilot. Not a slide deck, not a Slack-channel discussion, an actual 30-day test. The companies that started running overnight agents in mid-2026 will know things about their own operations by Q4 that their competitors won’t figure out until 2027.

If you want to actually deploy agents (not just read about them), our AI Agents Deep Dive course walks through the build, the rubric, and the deployment — including the parts that always go wrong on the first try.
