Memory and State: Agents That Remember
Implement agent memory patterns: buffer memory, summary memory, vector store retrieval, and entity extraction. Build agents that maintain context across sessions.
An agent that forgets everything after each conversation is like a colleague with amnesia. Memory transforms agents from one-shot tools into persistent assistants that build knowledge over time.
🔄 Quick Recall: In the previous lesson, you learned about multi-agent orchestration patterns. One key challenge in multi-agent systems is context loss at handoffs. Memory management solves this — ensuring agents retain the right information at the right time.
The Memory Challenge
LLMs have a fundamental constraint: finite context windows. Even with 128K or 200K tokens, an agent processing documents, tool results, and conversation history can fill that window quickly. Once full, older information gets truncated — and with it, potentially critical context.
Memory management is the set of strategies for deciding what to keep, what to summarize, and what to store externally.
Memory Pattern 1: Buffer Memory
The simplest pattern — keep the entire conversation history.
System: You are a research assistant.
User: Find me papers about climate change impacts on agriculture.
Agent: [searches, returns 5 papers]
User: Focus on the third one — the one about wheat yields.
Agent: [still has the full context, knows which paper is "third"]
Pros: Complete context, no information loss.
Cons: Context window fills quickly; expensive (more tokens = more cost).
When to use: Short conversations, simple agents, tasks under 20 turns.
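Buffer memory is simple enough to sketch directly: a list of role/content messages that is replayed in full on every turn. The class and method names here are illustrative, not from any specific framework.

```python
class BufferMemory:
    """Keep the entire conversation history verbatim."""

    def __init__(self):
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def get_context(self):
        # The full history is sent to the model on every turn.
        return list(self.messages)


memory = BufferMemory()
memory.add("system", "You are a research assistant.")
memory.add("user", "Find me papers about climate change impacts on agriculture.")
memory.add("assistant", "Found 5 papers: ...")
memory.add("user", "Focus on the third one, the one about wheat yields.")
# Every prior message is still in context, so "third" is resolvable.
```

Because nothing is ever dropped, the token cost of `get_context()` grows linearly with conversation length, which is exactly the trade-off noted above.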
Sliding Window Variant
Keep only the last N messages:
Messages 1-10: [discarded]
Messages 11-20: [in context]
User: "What was the first paper you found?"
Agent: "I don't have that information in my current context."
Trade-off: Saves tokens but loses early context. Best for tasks where recent information matters most.
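The sliding window falls out naturally from a bounded deque, which evicts the oldest message automatically once the limit is reached. This sketch assumes the same simple role/content message format used throughout the lesson.

```python
from collections import deque


class SlidingWindowMemory:
    """Keep only the last N messages; older ones are silently dropped."""

    def __init__(self, max_messages=10):
        self.messages = deque(maxlen=max_messages)

    def add(self, role, content):
        # When the deque is full, appending evicts the oldest message.
        self.messages.append({"role": role, "content": content})

    def get_context(self):
        return list(self.messages)


memory = SlidingWindowMemory(max_messages=10)
for i in range(1, 21):
    memory.add("user", f"message {i}")
# Messages 1-10 were evicted; only messages 11-20 remain in context.
```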
Memory Pattern 2: Summary Memory
Periodically compress the conversation into a summary:
Full conversation (2000 tokens):
User asked about climate papers → found 5 → focused on
wheat yield paper by Chen et al. → discussed methodology
→ user wants to replicate the study
Summary (200 tokens):
"User is researching climate change impacts on wheat
yields. Key paper: Chen et al. (2024) on wheat yield
decline. User wants to replicate the methodology.
Key constraint: limited to US Midwest data."
The agent carries the summary forward, freeing space for new content.
Pros: Dramatic token savings, retains key information.
Cons: Summarization loses details — the agent knows “Chen et al.” is important but might forget the exact p-value discussed.
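A summary memory keeps a handful of recent messages verbatim and folds older ones into a running summary. In a real agent, the `summarize` function would be an LLM call; here it is a placeholder that simply truncates, so the sketch stays self-contained.

```python
class SummaryMemory:
    """Compress older messages into a running summary once a threshold is hit."""

    def __init__(self, max_recent=4, summarize=None):
        self.summary = ""
        self.recent = []
        self.max_recent = max_recent
        # Placeholder summarizer; a real agent would call an LLM here.
        self.summarize = summarize or (lambda text: text[:200])

    def add(self, role, content):
        self.recent.append({"role": role, "content": content})
        if len(self.recent) > self.max_recent:
            # Fold the overflow into the summary, keep only recent turns.
            overflow = self.recent[:-self.max_recent]
            text = self.summary + " " + " ".join(m["content"] for m in overflow)
            self.summary = self.summarize(text.strip())
            self.recent = self.recent[-self.max_recent:]

    def get_context(self):
        context = []
        if self.summary:
            context.append({"role": "system",
                            "content": f"Conversation so far: {self.summary}"})
        return context + self.recent


memory = SummaryMemory(max_recent=4)
for i in range(10):
    memory.add("user", f"note {i}")
# Context is now one summary message plus the four most recent turns.
```

The agent always sees a bounded context: one summary message plus the last few turns, regardless of how long the conversation runs.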
✅ Quick Check: An agent summarizes 50 messages into a 3-sentence summary. The user then asks “What was the specific number I mentioned in message 12?” The agent can’t answer because the summary didn’t include that detail. How do you fix this? (Answer: Hybrid approach — keep the summary for general context AND store the full conversation in external storage (database or vector store). When the user asks about specific details, the agent can search the full history. The summary provides broad context; the external store provides precision. Never rely on summaries alone for tasks requiring specific recall.)
Memory Pattern 3: Vector Store Memory
Store memories as vector embeddings for semantic retrieval.
How It Works
- Store: Convert each memory to a vector embedding and save it
Memory: "User prefers bullet-point format for reports"
→ Embedding: [0.23, -0.45, 0.67, ...] (768 dimensions)
→ Stored in vector database
- Retrieve: When a new message arrives, find semantically similar memories
New message: "Generate a summary of Q3 results"
→ Query embedding: [0.21, -0.43, 0.65, ...]
→ Top match: "User prefers bullet-point format for reports"
→ Agent uses bullet-point format automatically
Why Vector Search Works for Memory
Vector embeddings capture meaning, not just words. This means:
- “marketing budget” finds “allocate $50K to campaigns”
- “meeting notes from last week” finds “Tuesday standup summary”
- “the thing we discussed about pricing” finds “proposed 15% price increase for Q2”
Architecture
User Message → Embedding Model → Query Vector
↓
Vector Database Search
↓
Top-K Similar Memories
↓
Inject into Agent Context
↓
Agent Processes with Memory
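The store/retrieve pipeline above can be sketched end to end. Note one big simplification: the `embed` function here is a toy bag-of-words vector, which only captures word overlap. A real system would call an embedding model and a vector database, which is what gives you retrieval by meaning rather than keywords.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words frequency vector.

    A real system would call an embedding model here, producing a dense
    vector (e.g. 768 dimensions) that captures semantics, not just words.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStoreMemory:
    def __init__(self):
        self.memories = []  # list of (embedding, text) pairs

    def store(self, text):
        self.memories.append((embed(text), text))

    def retrieve(self, query, k=1):
        # Rank stored memories by similarity to the query embedding.
        query_vec = embed(query)
        ranked = sorted(self.memories,
                        key=lambda m: cosine(query_vec, m[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]


memory = VectorStoreMemory()
memory.store("User prefers bullet-point format for reports")
memory.store("User's team works in the US Midwest")

top = memory.retrieve("What format does the user prefer for reports")
print(top[0])  # User prefers bullet-point format for reports
```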
Memory Pattern 4: Entity Memory
Extract and maintain structured information about entities:
Conversation Turn 1:
"Sarah mentioned the budget is $50K"
→ Entity: Sarah | budget: $50K
Conversation Turn 5:
"Sarah's team grew to 12 people"
→ Entity: Sarah | budget: $50K, team_size: 12
Conversation Turn 12:
"The budget was increased to $75K"
→ Entity: Sarah | budget: $75K, team_size: 12
Entity memory maintains a structured “profile” for each entity, updating it as new information arrives.
When to use: CRM agents, project management, any domain with entities that accumulate attributes over time.
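The update logic for entity memory is a small dictionary-merge. In practice, an LLM extracts the entity and attributes from each conversation turn; this sketch assumes that extraction has already happened and shows only how the profile accumulates.

```python
class EntityMemory:
    """Maintain a structured attribute profile per entity."""

    def __init__(self):
        self.entities = {}

    def update(self, entity, **attributes):
        # New attributes are added; existing ones are overwritten.
        self.entities.setdefault(entity, {}).update(attributes)

    def get(self, entity):
        return self.entities.get(entity, {})


memory = EntityMemory()
memory.update("Sarah", budget="$50K")   # turn 1
memory.update("Sarah", team_size=12)    # turn 5
memory.update("Sarah", budget="$75K")   # turn 12, overwrites the old budget
print(memory.get("Sarah"))  # {'budget': '$75K', 'team_size': 12}
```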
✅ Quick Check: An entity memory system stores “Customer: Acme Corp, contract: $100K/year, renewal: March 2026.” It’s now February 2026. Should the agent proactively mention the upcoming renewal? (Answer: Yes — this is where memory transforms agents from reactive to proactive. The agent should check entity memory at the start of each interaction for time-sensitive information. “I notice Acme Corp’s contract renewal is coming up in March. Would you like me to prepare renewal materials?” This proactive behavior based on memory is what makes agents feel genuinely helpful.)
State Management
Beyond memory of past conversations, agents need to track the current state of a task.
Task State
{
"task": "Generate quarterly report",
"status": "in_progress",
"steps_completed": ["data_collection", "analysis"],
"steps_remaining": ["visualization", "executive_summary"],
"current_step": "visualization",
"data": {"revenue": 12300000, "growth": 0.15}
}
If the agent is interrupted (timeout, error, user leaves), it can resume from the saved state rather than starting over.
Checkpointing
For long-running tasks, save progress at key milestones:
Step 1: Collect data ✓ [checkpoint saved]
Step 2: Analyze data ✓ [checkpoint saved]
Step 3: Create visualizations → [error: chart library timeout]
→ Resume from Step 3 checkpoint instead of restarting
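The checkpoint loop above can be sketched as follows. The checkpoint file name is hypothetical, and a production system would likely persist state to a database instead of local JSON, but the resume logic is the same: skip steps already recorded as complete, and save after each milestone.

```python
import json
import os

CHECKPOINT_FILE = "task_checkpoint.json"  # hypothetical path


def save_checkpoint(state, path=CHECKPOINT_FILE):
    with open(path, "w") as f:
        json.dump(state, f)


def load_checkpoint(path=CHECKPOINT_FILE):
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return None


def run_task(steps, state=None):
    """Run (name, callable) steps, checkpointing after each one.

    On restart, previously completed steps are skipped, so the task
    resumes from where it failed instead of starting over.
    """
    state = state or load_checkpoint() or {
        "status": "in_progress", "steps_completed": []}
    for name, step in steps:
        if name in state["steps_completed"]:
            continue  # already done in a previous run; skip on resume
        step()
        state["steps_completed"].append(name)
        save_checkpoint(state)  # milestone reached, persist progress
    state["status"] = "done"
    save_checkpoint(state)
    return state
```

If `step()` raises midway, the checkpoint file still records every completed step, so the next `run_task` call picks up at the failed step.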
Choosing the Right Memory Strategy
| Scenario | Recommended Pattern |
|---|---|
| Short tasks (< 20 turns) | Buffer memory |
| Long conversations | Summary + vector store |
| User profiles / preferences | Entity memory (persistent) |
| Document analysis | Vector store with source tracking |
| Multi-session tasks | State checkpointing + long-term memory |
| Customer support | Entity memory + summary memory |
Practice Exercise
- Design a memory strategy for a personal assistant agent that helps with daily tasks
- What goes in short-term memory? Long-term? Entity memory?
- How would you handle a user who says “Remember that I never want to be scheduled before 9 AM”?
Key Takeaways
- Buffer memory (full history) is simplest but fills the context window fastest
- Summary memory compresses history, saving tokens but losing specific details
- Vector store memory enables semantic retrieval — finding relevant past information by meaning, not just keywords
- Entity memory maintains structured profiles that update as new information arrives
- State management with checkpointing lets agents resume interrupted tasks
- The best systems combine multiple patterns: summaries for broad context, vectors for specific recall, entities for structured data
Up Next
In the next lesson, you’ll learn how to make agents production-ready — guardrails for safety, evaluation for reliability, and observability for debugging.