If you’re building an AI agent that has to remember things across sessions, the easy mistake is picking the wrong memory system because the marketing makes them sound interchangeable. They aren’t. Claude Dreaming, OpenAI Memory, and Mem0 solve three different problems, with three different architectures, and three different failure modes when used wrong.
Anthropic shipped Claude Dreaming on May 6 — a scheduled offline process where Claude agents review past sessions, extract patterns, and restructure their memory while idle. OpenAI’s Memory feature has been generally available in ChatGPT since 2024, with the underlying architecture more public this year. Mem0 is the open-source memory layer that’s become the default choice for builders who want a model-agnostic system.
This post is for the builder who has to actually choose. What each one is, how each one works under the hood, when each one wins, and when to skip the whole category and write a 50-line JSON file instead.
What each system actually is
The three products solve overlapping but distinct problems.
Claude Dreaming (Anthropic, May 6 2026) is a memory consolidation feature for Claude Managed Agents. Each agent has a persistent memory directory of plain-text notes, task records, tool histories, and preference files. Dreaming is a scheduled offline process that runs while the agent is idle: it scans the existing memory plus recent session transcripts, removes outdated information, resolves contradictions, merges duplicates, normalizes temporal references, and extracts patterns into cleaner topic files. It does not modify model weights. Everything it learns is a readable file you can audit.
OpenAI Memory is a product-level personalization feature in ChatGPT and a growing set of patterns in the OpenAI API. The user-facing version stores facts the user wants ChatGPT to remember (“I’m vegetarian,” “I live in HCMC,” “I’m working on a Hugo site”) and surfaces them across chats. Under the hood, the publicly documented reference architecture (the “OpenMemory” pattern written up this year) uses LangGraph, Redis Streams, and a Qdrant vector DB, with an async worker that handles embedding, deduplication, and trust scoring. The focus is user personalization, not workflow optimization.
Mem0 is a standalone, model-agnostic memory layer you plug into any agent or app. It treats memory as a discrete layer between your application logic and storage. It is backed by a vector DB (typically Chroma or Qdrant), with an optional graph DB for entity relationships. LLM-driven extraction pulls “memory units” from conversations, and an LLM-driven CRUD loop decides whether to add, update, delete, or no-op each candidate against existing memories. It supports hierarchical memory (user / session / agent).
Three different abstractions:
| System | Primary unit of memory | Where memory lives | Who manages it |
|---|---|---|---|
| Claude Dreaming | Curated topic file (plain text) | Anthropic-managed agent directory | Dreaming process (offline) |
| OpenAI Memory | Embedded fact (key-value + vector) | OpenAI infrastructure | OpenAI worker (async stream) |
| Mem0 | Memory unit (embedding + graph node) | Your infrastructure | Your CRUD loop + scheduler |
If your immediate reaction was “I just want to remember the user’s name across chats” — that’s the OpenAI Memory shape. If it was “I want my agent to get better at its own job over weeks of use” — that’s the Dreaming shape. If it was “I’m running 50 agents across 12 customers and need a real memory subsystem” — that’s the Mem0 shape.
How they work under the hood
The architecture is what determines where each wins and loses.
Claude Dreaming
Online path (during the agent’s session):
- User message + tool results enter the agent.
- Agent writes task-time notes into a memory directory (think of it as a curated set of .md files).
- Notes accumulate as the agent works on multi-day projects.
Offline path (the dreaming):
- A scheduled background job triggers — typically when the agent has been idle for several hours.
- The job scans the memory directory and the last N session transcripts.
- An LLM reads the content with a specific consolidation prompt: identify contradictions, find recurrent patterns, normalize ambiguous time references, deduplicate, merge.
- The LLM rewrites the affected memory files. The originals are versioned so you can roll back.
- The job exits. Next time the agent wakes, it reads the cleaner memory.
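The offline pass above can be sketched in a few lines. This is a toy under stated assumptions, not Anthropic's implementation: `consolidate` is a hypothetical stand-in for the LLM call and only removes duplicate lines, and the `.versions` folder used for rollback copies is an invented layout.

```python
import shutil
import tempfile
import time
from pathlib import Path


def consolidate(text: str) -> str:
    """Hypothetical stand-in for the LLM consolidation call.

    It only drops blank lines and case-insensitive duplicate lines,
    keeping the first occurrence of each line in order.
    """
    seen = set()
    kept = []
    for line in text.splitlines():
        key = line.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(line.strip())
    return "\n".join(kept) + "\n" if kept else ""


def dream_pass(memory_dir: Path) -> list:
    """Rewrite each .md note in place, copying the original aside first.

    Returns the names of the files that changed.
    """
    versions = memory_dir / ".versions"  # assumed layout for rollback copies
    versions.mkdir(exist_ok=True)
    changed = []
    for note in sorted(memory_dir.glob("*.md")):
        original = note.read_text(encoding="utf-8")
        rewritten = consolidate(original)
        if rewritten != original:
            stamp = time.strftime("%Y%m%dT%H%M%S")
            shutil.copy2(note, versions / f"{note.stem}.{stamp}.md")
            note.write_text(rewritten, encoding="utf-8")
            changed.append(note.name)
    return changed


# Usage: one pass over a throwaway directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "style.md").write_text(
    "Use tabs\nuse tabs\n\nPrefer short PRs\n", encoding="utf-8"
)
changed_files = dream_pass(demo_dir)
```

A second call to `dream_pass` on the same directory changes nothing, which is the property you want from a consolidation job that runs on a schedule.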
The architecture is intentionally not a vector store. Anthropic’s bet is that for managed agents, curated plain-text notes scale better with audit and oversight requirements than embeddings. Every memory edit is human-readable. Every consolidation pass produces a diff you can review.
The sandbox is strict: the dreaming process can only modify the memory directory. It cannot touch source code, project files, or the agent’s tool configuration.
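How Anthropic enforces that boundary is not described here. If you wanted the same rule in your own consolidation job, one way to express it is a path check before every write. The function names below are hypothetical.

```python
import tempfile
from pathlib import Path


def is_inside(memory_dir: Path, target: Path) -> bool:
    """True if target, taken relative to memory_dir, resolves to a path inside it."""
    root = memory_dir.resolve()
    resolved = (root / target).resolve()
    return root in resolved.parents


def safe_write(memory_dir: Path, relative_path: str, text: str) -> None:
    """Write a memory file, refusing any path that escapes the directory."""
    if not is_inside(memory_dir, Path(relative_path)):
        raise PermissionError(f"refusing to write outside memory dir: {relative_path}")
    destination = memory_dir.resolve() / relative_path
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_text(text, encoding="utf-8")


# Usage: a write inside the directory succeeds.
demo_root = Path(tempfile.mkdtemp())
safe_write(demo_root, "topics/style.md", "Prefer short PRs\n")
```

Resolving the path before comparing is what catches `../` segments and absolute paths. A plain string prefix check would miss both.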
OpenAI Memory
Product behavior:
When a user tells ChatGPT to remember something, or when ChatGPT chooses to remember (the model has discretion), the fact gets stored. The user can review, edit, or delete memories anytime via Settings. Memories are used across chats to personalize responses.
Reference architecture (the “OpenMemory” pattern):
- User message enters the assistant.
- The LLM, while processing, proposes memory candidates (“Should I remember X about this user?”).
- The proposal hits a Redis Stream — fire-and-forget, so user-facing latency stays low.
- An async worker pulls from the stream, computes an embedding for the candidate, queries the vector DB for semantically similar existing memories.
- Deduplication logic decides whether this is a new memory, an update to an existing one, or a no-op.
- The new/updated memory is written to Qdrant with metadata (timestamp, source chat, trust score).
- On the next user message, a separate retrieval step queries the vector DB for relevant memories and injects them into the prompt as context.
The architecture optimizes for two things: keeping user-facing latency low (which is why the stream-and-worker pattern exists) and avoiding noisy memories (which is why the trust-scoring and dedup logic exists). The cost is operational complexity — you need Redis, you need the worker, you need the vector DB, and you need monitoring on all three.
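To make the stream-and-worker shape concrete, here is a single-process sketch. Everything in it is a stand-in: `queue.Queue` plays the Redis Stream, a Python list plays Qdrant, and `embed` is a bag-of-words toy rather than a real embedding model. Trust scoring is left out, and the 0.8 threshold is an arbitrary assumption.

```python
import math
import queue
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def propose(stream: queue.Queue, fact: str) -> None:
    """Request path: enqueue and return immediately (fire-and-forget)."""
    stream.put_nowait(fact)


def drain(stream: queue.Queue, store: list, threshold: float = 0.8) -> None:
    """Worker path: embed each candidate, then insert, update, or skip it."""
    while True:
        try:
            fact = stream.get_nowait()
        except queue.Empty:
            return
        vector = embed(fact)
        best = max(store, key=lambda m: cosine(vector, m["vector"]), default=None)
        if best is not None and cosine(vector, best["vector"]) >= threshold:
            if best["text"] != fact:
                # Near-duplicate with new wording: update the existing memory.
                best["text"], best["vector"] = fact, vector
            # Identical text: no-op.
        else:
            store.append({"text": fact, "vector": vector})


# Usage: four proposals collapse into two stored memories.
stream = queue.Queue()
store = []
for fact in [
    "user is vegetarian",
    "user is vegetarian",
    "user lives in HCMC",
    "the user is vegetarian",
]:
    propose(stream, fact)
drain(stream, store)
```

The point of the split is that `propose` never waits on embedding or similarity search. All of that cost sits in `drain`, which in a real deployment would run in a separate worker process.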
Mem0
The pipeline:
- New user message arrives at your agent.
- Mem0 fetches: a running conversation summary (kept fresh by a separate async process) + the last M raw messages (sliding window).
- An extraction step: LLM reads current message + summary + recent messages, produces salient memory units. Output looks like JSON: {"type": "preference", "subject": "user", "content": "prefers TypeScript over JavaScript"}.
- For each candidate memory, Mem0 queries top-K semantically similar existing memories from the vector DB.
- An LLM “tool call” sees the candidate + neighbors and chooses one of four actions: add (new memory), update (modify existing), delete (remove outdated/conflicting), no-op (noise, ignore).
- The chosen action is committed to storage: vector DB for semantic search + optional graph DB for entity relationships.
- On the next turn, retrieval pulls a summary + relevant memories via semantic search and (optionally) graph traversal. Memories are ranked by recency + relevance and injected into the prompt.
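The add / update / delete / no-op step is the heart of the pipeline, so it is worth seeing in miniature. In Mem0 an LLM makes this call. The sketch below swaps in crude hand-written rules purely so it can run offline, and none of the names are Mem0's API. The "no longer ..." convention for retractions is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Action(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    NOOP = "noop"


@dataclass
class MemoryUnit:
    type: str
    subject: str
    content: str


def decide(
    candidate: MemoryUnit, neighbors: List[MemoryUnit]
) -> Tuple[Action, Optional[MemoryUnit]]:
    """Rule-based stand-in for the LLM decision. Returns (action, target)."""
    text = candidate.content.strip()
    if not text:
        return Action.NOOP, None  # noise
    related = [
        n for n in neighbors
        if (n.type, n.subject) == (candidate.type, candidate.subject)
    ]
    for n in related:
        if n.content.lower() == text.lower():
            return Action.NOOP, n  # already known
    if text.lower().startswith("no longer "):
        retracted = text[len("no longer "):].lower()
        for n in related:
            if n.content.lower() == retracted:
                return Action.DELETE, n
        return Action.NOOP, None  # nothing to retract
    if related:
        return Action.UPDATE, related[0]  # crude: newest statement wins
    return Action.ADD, None


def commit(store: List[MemoryUnit], candidate: MemoryUnit,
           action: Action, target: Optional[MemoryUnit]) -> None:
    """Apply the chosen action to the store."""
    if action is Action.ADD:
        store.append(candidate)
    elif action is Action.UPDATE:
        store[store.index(target)] = candidate
    elif action is Action.DELETE:
        store.remove(target)
```

Treating every memory with the same type and subject as a conflict is far too blunt for real use. It is exactly the judgment the LLM is there to make.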
The architecture is more sophisticated than OpenAI’s because it’s general-purpose. It has to work for any model, any storage backend, any agent shape. The trade-off is more setup work — you choose the vector DB, you choose whether to add the graph DB, you tune the extraction prompt for your domain.
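The pipeline says memories are ranked by recency + relevance without giving a formula. One plausible reading, offered only as an assumption, is a weighted sum of the similarity score and an exponentially decaying recency term. The weight, the half-life, and the dict fields below are all illustrative.

```python
from datetime import datetime, timedelta, timezone


def recency(created_at: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """Decay from 1.0 toward 0.0, halving every half_life_days."""
    age_days = max((now - created_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def rank(memories: list, now: datetime, weight: float = 0.7, top_k: int = 3) -> list:
    """Sort by weight * relevance + (1 - weight) * recency and keep the top_k."""
    def score(memory: dict) -> float:
        return weight * memory["relevance"] + (1 - weight) * recency(
            memory["created_at"], now
        )
    return sorted(memories, key=score, reverse=True)[:top_k]
```

With these defaults, a slightly less relevant memory from today outranks a highly relevant one from ten months ago. Whether that is what you want depends on the domain, which is why the weight is worth tuning.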
Side-by-side comparison
| Dimension | Claude Dreaming | OpenAI Memory | Mem0 |
|---|---|---|---|
| Primary use case | Self-improving agents over weeks | Personalized chat assistants | Multi-agent memory subsystem |
| Storage style | Curated topic files (text) | Vector DB + structured facts | Vector DB + optional graph DB |
| Memory consolidation | Offline dreaming (scheduled batch) | Async dedup worker | Continuous LLM CRUD loop |
| Retrieval | Whole topic files loaded per task | Semantic search + rules | Semantic + graph traversal, hierarchical |
| Model coupling | Claude only (Anthropic Managed Agents) | OpenAI ecosystem | Model-agnostic |
| Latency cost | Zero at runtime (consolidation is offline) | Low (async stream pattern) | Higher (CRUD loop on every turn) |
| Audit trail | Every memory file is human-readable, diffable | Stored memories visible in UI | Configurable; depends on your storage |
| Best for | Long-running workflow agents | User personalization across chats | Platforms hosting many agents |
| Vendor lock-in | High (Anthropic) | High (OpenAI) | None (open source) |
| Operational overhead | Lowest (Anthropic runs the dreaming) | Low for the hosted product (OpenAI runs it); medium if you build the reference stack yourself | Highest (you run the whole pipeline) |
Three honest read-outs:
Dreaming wins on auditability. Memory files are plain text. You can read them. Your security team can read them. Compliance can review them. The other two rely on embeddings for retrieval, and an embedding is meaningful only to a machine.
OpenAI Memory wins on simplicity for personalization. If your use case is “remember the user’s preferences across chats,” OpenAI Memory is the path of least resistance. You don’t run anything; the platform handles it.
Mem0 wins on flexibility and ownership. If you’re running production agents across customers, the model-agnostic plus self-hosted nature of Mem0 is what makes it scale. You own the data, you choose the storage, you tune the retrieval.
When to pick which (the decision matrix)
The right tool depends on three questions: what are you building, who owns the data, and how much operational complexity can your team handle?
Pick Claude Dreaming if:
- You’re running Claude Managed Agents (it’s tied to that infrastructure)
- Your agents do long-running, multi-day workflows where the agent’s own competence matters more than user-specific personalization
- You need audit trails for memory edits (regulated industries, internal compliance)
- You’re okay with vendor lock-in to Anthropic
- Concrete example: a research-summarization agent that gets better at your team’s specific writing style over months. Or a code-review agent that learns your team’s conventions through repeated exposure.
Pick OpenAI Memory if:
- You’re building an OpenAI-native application and personalization-across-sessions is your main goal
- The “memory” is mostly user facts/preferences, not workflow state
- You don’t want to run infrastructure
- You’re okay with vendor lock-in to OpenAI
- Concrete example: a writing assistant that remembers the user’s voice preferences and recurring projects. Or a tutoring app that remembers the student’s progress and weak areas.
Pick Mem0 if:
- You’re running multiple agents and/or working with multiple model providers
- You need to own the memory data (compliance, data residency, vendor independence)
- You have the engineering capacity to run a memory service (vector DB, scheduler, monitoring)
- You expect agents to share memory or have hierarchical memory scopes
- Concrete example: a SaaS platform where each customer has their own agent and you need memory isolation. Or an internal AI tool where data residency matters.
Skip all three and use plain JSON / SQL if:
- Your “memory” is mostly stable attributes (user role, preferences, settings) you’d manage with a normal database anyway
- Your agent’s session length is short enough that the context window holds everything
- The set of things to remember is small and structured
- You don’t need fuzzy semantic retrieval
- Concrete example: a customer service bot that needs to know the user’s plan tier and recent ticket history. That’s a database query, not a memory system.
The most common mistake builders make is reaching for a memory framework when a 50-line JSON file would do. Memory systems are great when you have unstructured, growing, fuzzy-recall needs. They’re overkill for “remember these 4 user preferences.”
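For scale, here is roughly what that JSON file looks like. It is a minimal sketch using only the standard library. The class and method names are made up for illustration, and it assumes a single process writing the file.

```python
import json
import os
import tempfile
from pathlib import Path


class JsonMemory:
    """Per-user key/value memory persisted to one JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    def remember(self, user_id: str, key: str, value) -> None:
        self.data.setdefault(user_id, {})[key] = value
        self._save()

    def forget(self, user_id: str, key: str) -> None:
        self.data.get(user_id, {}).pop(key, None)
        self._save()

    def recall(self, user_id: str) -> dict:
        return dict(self.data.get(user_id, {}))

    def as_prompt(self, user_id: str) -> str:
        """Render the user's facts as bullet lines to prepend to a prompt."""
        facts = self.recall(user_id)
        return "\n".join(f"- {key}: {value}" for key, value in sorted(facts.items()))

    def _save(self) -> None:
        # Write to a temp file, then swap it in, so a crash mid-write
        # cannot leave a half-written memory file behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.data, indent=2, sort_keys=True), encoding="utf-8")
        os.replace(tmp, self.path)


# Usage: facts survive a restart because they are reloaded from disk.
memory_path = Path(tempfile.mkdtemp()) / "memory.json"
memory = JsonMemory(memory_path)
memory.remember("u1", "diet", "vegetarian")
memory.remember("u1", "city", "HCMC")
reloaded = JsonMemory(memory_path)
```

There is no embedding, no extraction step, and no background job. If `as_prompt` gives the model everything it needs, you are done.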
Failure modes to plan for
Every memory system fails. Plan for the failure mode that matches the system you pick.
Claude Dreaming failures:
- Memory consolidation can be wrong. The LLM doing the dreaming makes judgment calls. Sometimes it merges two distinct things, or drops something important because it appeared “outdated.” Mitigation: review the diffs from each dreaming pass, at least weekly while you’re tuning.
- Time normalization can mangle dates. “Last Tuesday” being normalized to a specific date sounds good until the agent picks up an old note where “last Tuesday” was actually two months ago and now treats it as recent. Mitigation: prefer absolute dates in your task-time notes.
- The agent behaves differently after a dream. Some patterns the agent had learned implicitly get reconsolidated into different shapes, so behavior can drift subtly. Mitigation: A/B test against pre-dream snapshots for high-stakes workflows.
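The "prefer absolute dates" mitigation is easy to apply at write time, when you still know what day the note was written. A small sketch, assuming "last Tuesday" means the most recent Tuesday strictly before the note's date (the function names are hypothetical):

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
LAST_WEEKDAY = re.compile(r"\blast (" + "|".join(WEEKDAYS) + r")\b", re.IGNORECASE)


def last_weekday(name: str, written_on: date) -> date:
    """Most recent given weekday strictly before written_on."""
    target = WEEKDAYS.index(name.lower())
    delta = (written_on.weekday() - target) % 7 or 7
    return written_on - timedelta(days=delta)


def absolutize(note: str, written_on: date) -> str:
    """Replace 'last <weekday>' with an ISO date anchored to when the note was written."""
    return LAST_WEEKDAY.sub(
        lambda match: last_weekday(match.group(1), written_on).isoformat(), note
    )
```

Because the date is resolved against the note's own timestamp, a later consolidation pass has nothing left to guess.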
OpenAI Memory failures:
- The model decides what to remember, and sometimes it remembers wrong things. A passing remark gets stored as a preference. A throwaway example becomes “the user’s project.” Mitigation: review and prune memories regularly in Settings; provide explicit “remember this” / “don’t remember that” prompts in critical workflows.
- Memories get stale and contradict newer ones. OpenAI’s dedup helps but isn’t perfect. Mitigation: periodically tell the assistant about contradictions and let it update.
- Hallucinated memories. Worst case — the model “remembers” things that were never said. Rare but happens. Mitigation: keep the memory list reviewable and short.
Mem0 failures:
- The CRUD loop’s LLM call adds latency on every turn. This is by design (extraction + dedup happen synchronously). Mitigation: tune the model used for the CRUD step (usually a smaller, faster model is fine), batch where possible.
- Vector retrieval brings in irrelevant memories. Especially with growing memory stores, top-K can return things that match the embedding but not the actual intent. Mitigation: add filters on metadata (recency, scope, type) before passing to the prompt.
- The graph DB adds operational complexity without proportional value. Many builders enable the graph DB because it sounds smart, then never query it. Mitigation: start with just the vector DB; add the graph layer only when you have a clear use case.
- Memory growth without pruning becomes a performance problem. Mem0 doesn’t aggressively prune by default. Mitigation: schedule your own pruning passes — delete memories older than N days for ephemeral data, archive at year boundaries for long-lived data.
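Two of these mitigations, metadata filtering and scheduled pruning, are plain list operations once your memories carry the right metadata. The sketch below assumes each memory is a dict with `scope`, `type`, `created_at`, and `ephemeral` fields. Those field names are hypothetical and are not Mem0's schema. Archiving long-lived data is not covered.

```python
from datetime import datetime, timedelta, timezone


def filter_hits(hits: list, *, scope: str, allowed_types: set,
                max_age_days: int, now: datetime) -> list:
    """Drop retrieved memories with the wrong scope, wrong type, or too much age."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        hit for hit in hits
        if hit["scope"] == scope
        and hit["type"] in allowed_types
        and hit["created_at"] >= cutoff
    ]


def prune(store: list, *, ephemeral_ttl_days: int, now: datetime) -> int:
    """Delete ephemeral memories older than the TTL, in place. Returns the count removed."""
    cutoff = now - timedelta(days=ephemeral_ttl_days)
    kept = [m for m in store if not (m["ephemeral"] and m["created_at"] < cutoff)]
    removed = len(store) - len(kept)
    store[:] = kept
    return removed
```

Run `filter_hits` on the top-K results before they reach the prompt, and run `prune` from whatever scheduler you already have.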
What this means for you
Four quick decision paths for builders:
Path A: You’re building on Anthropic and your agent needs to get smarter over time. Use Claude Managed Agents and turn on Dreaming. The audit trail and the offline consolidation are exactly what production-grade agents need. Plan for weekly review of dream diffs while you’re learning the patterns.
Path B: You’re building an OpenAI consumer app and need cross-session memory. Use OpenAI Memory. Don’t overthink it. If you later outgrow it, the migration to Mem0 or a custom solution is straightforward.
Path C: You’re building a platform or running multiple agents. Use Mem0. Self-host the vector DB. Start without the graph layer. Tune the extraction prompt to your domain. Plan for the operational overhead — you’ll want monitoring on the CRUD loop and pruning schedules from day one.
Path D (honest): If you’re prototyping, use the simplest thing that works. For most prototype agents, that’s the context window plus a small JSON file. Memory systems become valuable when you have real users, real session counts, and real ambiguity about what to remember. Don’t add infrastructure you don’t yet need.
What none of these can solve
A few honest limits across the whole agent-memory category:
- None of them give the agent true understanding of relationships between memories. Vector similarity and graph traversal are useful proxies; they’re not the same as semantic understanding. The agent can still confabulate connections that aren’t there.
- None of them are tamper-proof against the agent itself. All three let the agent (or a consolidation process on its behalf) edit memories. That’s powerful and also the attack surface. Adversarial input can poison the memory store.
- None of them solve the “what’s truly important to remember” problem. That’s a judgment call, and the LLM’s judgment is what’s making it. Sometimes it’s wrong.
- None of them eliminate the need for explicit, structured state for critical workflows. Anything that touches money, identity, or compliance should be in a real database, not a memory system.
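The last point is easiest to see with money. In a real database the rule "a balance can never go negative" is enforced by the schema, not by an LLM's judgment. A minimal sketch with the standard library's `sqlite3` module, where the table and function names are illustrative:

```python
import sqlite3


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS balances ("
        " user_id TEXT PRIMARY KEY,"
        " cents INTEGER NOT NULL CHECK (cents >= 0))"
    )
    conn.commit()
    return conn


def debit(conn: sqlite3.Connection, user_id: str, cents: int) -> None:
    """Subtract cents. The CHECK constraint rejects overdrafts."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE balances SET cents = cents - ? WHERE user_id = ?",
            (cents, user_id),
        )
        if cursor.rowcount != 1:
            raise KeyError(user_id)


def balance(conn: sqlite3.Connection, user_id: str) -> int:
    row = conn.execute(
        "SELECT cents FROM balances WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(user_id)
    return row[0]


# Usage: seed one account and make a valid debit.
ledger = open_ledger()
with ledger:
    ledger.execute("INSERT INTO balances VALUES (?, ?)", ("u1", 500))
debit(ledger, "u1", 200)
```

An overdraft attempt raises `sqlite3.IntegrityError` and leaves the balance untouched. A memory store cannot give you that guarantee.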
The bottom line
Claude Dreaming, OpenAI Memory, and Mem0 are three good solutions to three different versions of “how does my agent remember things.” The right one depends on what you’re building, what you can host, and how much you care about auditability vs convenience vs control.
The wrong question is “which is the smartest memory system.” The right question is “which one matches the shape of the agent I’m building and the team running it.”