Memory and State: Agents That Remember
Implement agent memory patterns: buffer memory, summary memory, vector store retrieval, and entity extraction. Build agents that maintain context across sessions.
An agent that forgets everything after each conversation is like a colleague with amnesia. Memory transforms agents from one-shot tools into persistent assistants that build knowledge over time.
🔄 Quick Recall: In the previous lesson, you learned about multi-agent orchestration patterns. One key challenge in multi-agent systems is context loss at handoffs. Memory management solves this — ensuring agents retain the right information at the right time.
The Memory Challenge
LLMs have a fundamental constraint: finite context windows. Even with 128K or 200K tokens, an agent processing documents, tool results, and conversation history can fill that window quickly. Once full, older information gets truncated — and with it, potentially critical context.
Memory management is the set of strategies for deciding what to keep, what to summarize, and what to store externally.
Memory Pattern 1: Buffer Memory
The simplest pattern — keep the entire conversation history.
System: You are a research assistant.
User: Find me papers about climate change impacts on agriculture.
Agent: [searches, returns 5 papers]
User: Focus on the third one — the one about wheat yields.
Agent: [still has the full context, knows which paper is "third"]
Pros: Complete context, no information loss.
Cons: Context window fills quickly; expensive (more tokens = more cost).
When to use: Short conversations, simple agents, tasks under 20 turns.
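Buffer memory is simple enough to sketch directly: a list of role/content messages that is replayed in full on every turn. The class and method names here are illustrative, not from any specific framework.

```python
class BufferMemory:
    """Keep the entire conversation history verbatim."""

    def __init__(self):
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def get_context(self):
        # The full history is sent to the model on every turn.
        return list(self.messages)


memory = BufferMemory()
memory.add("system", "You are a research assistant.")
memory.add("user", "Find me papers about climate change impacts on agriculture.")
memory.add("assistant", "Found 5 papers: ...")
memory.add("user", "Focus on the third one, the one about wheat yields.")
# Every prior message is still in context, so "third" is resolvable.
```

Because nothing is ever dropped, the token cost of `get_context()` grows linearly with conversation length, which is exactly the trade-off noted above.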
Sliding Window Variant
Keep only the last N messages:
Messages 1-10: [discarded]
Messages 11-20: [in context]
User: "What was the first paper you found?"
Agent: "I don't have that information in my current context."
Trade-off: Saves tokens but loses early context. Best for tasks where recent information matters most.
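The sliding window falls out naturally from a bounded deque, which evicts the oldest message automatically once the limit is reached. This sketch assumes the same simple role/content message format used throughout the lesson.

```python
from collections import deque


class SlidingWindowMemory:
    """Keep only the last N messages; older ones are silently dropped."""

    def __init__(self, max_messages=10):
        self.messages = deque(maxlen=max_messages)

    def add(self, role, content):
        # When the deque is full, appending evicts the oldest message.
        self.messages.append({"role": role, "content": content})

    def get_context(self):
        return list(self.messages)


memory = SlidingWindowMemory(max_messages=10)
for i in range(1, 21):
    memory.add("user", f"message {i}")
# Messages 1-10 were evicted; only messages 11-20 remain in context.
```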
Memory Pattern 2: Summary Memory
Periodically compress the conversation into a summary:
Full conversation (2000 tokens):
User asked about climate papers → found 5 → focused on
wheat yield paper by Chen et al. → discussed methodology
→ user wants to replicate the study
Summary (200 tokens):
"User is researching climate change impacts on wheat
yields. Key paper: Chen et al. (2024) on wheat yield
decline. User wants to replicate the methodology.
Key constraint: limited to US Midwest data."
The agent carries the summary forward, freeing space for new content.
Pros: Dramatic token savings, retains key information.
Cons: Summarization loses details — the agent knows “Chen et al.” is important but might forget the exact p-value discussed.
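A summary memory keeps a handful of recent messages verbatim and folds older ones into a running summary. In a real agent, the `summarize` function would be an LLM call; here it is a placeholder that simply truncates, so the sketch stays self-contained.

```python
class SummaryMemory:
    """Compress older messages into a running summary once a threshold is hit."""

    def __init__(self, max_recent=4, summarize=None):
        self.summary = ""
        self.recent = []
        self.max_recent = max_recent
        # Placeholder summarizer; a real agent would call an LLM here.
        self.summarize = summarize or (lambda text: text[:200])

    def add(self, role, content):
        self.recent.append({"role": role, "content": content})
        if len(self.recent) > self.max_recent:
            # Fold the overflow into the summary, keep only recent turns.
            overflow = self.recent[:-self.max_recent]
            text = self.summary + " " + " ".join(m["content"] for m in overflow)
            self.summary = self.summarize(text.strip())
            self.recent = self.recent[-self.max_recent:]

    def get_context(self):
        context = []
        if self.summary:
            context.append({"role": "system",
                            "content": f"Conversation so far: {self.summary}"})
        return context + self.recent


memory = SummaryMemory(max_recent=4)
for i in range(10):
    memory.add("user", f"note {i}")
# Context is now one summary message plus the four most recent turns.
```

The agent always sees a bounded context: one summary message plus the last few turns, regardless of how long the conversation runs.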
✅ Quick Check: An agent summarizes 50 messages into a 3-sentence summary. The user then asks “What was the specific number I mentioned in message 12?” The agent can’t answer because the summary didn’t include that detail. How do you fix this? (Answer: Hybrid approach — keep the summary for general context AND store the full conversation in external storage (database or vector store). When the user asks about specific details, the agent can search the full history. The summary provides broad context; the external store provides precision. Never rely on summaries alone for tasks requiring specific recall.)
Memory Pattern 3: Vector Store Memory
Store memories as vector embeddings for semantic retrieval.
How It Works
- Store: Convert each memory to a vector embedding and save it
Memory: "User prefers bullet-point format for reports"
→ Embedding: [0.23, -0.45, 0.67, ...] (768 dimensions)
→ Stored in vector database
- Retrieve: When a new message arrives, find semantically similar memories
New message: "Generate a summary of Q3 results"
→ Query embedding: [0.21, -0.43, 0.65, ...]
→ Top match: "User prefers bullet-point format for reports"
→ Agent uses bullet-point format automatically
Why Vector Search Works for Memory
Vector embeddings capture meaning, not just words. This means:
- “marketing budget” finds “allocate $50K to campaigns”
- “meeting notes from last week” finds “Tuesday standup summary”
- “the thing we discussed about pricing” finds “proposed 15% price increase for Q2”
Architecture
User Message → Embedding Model → Query Vector
↓
Vector Database Search
↓
Top-K Similar Memories
↓
Inject into Agent Context
↓
Agent Processes with Memory
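The store/retrieve pipeline above can be sketched end to end. Note one big simplification: the `embed` function here is a toy bag-of-words vector, which only captures word overlap. A real system would call an embedding model and a vector database, which is what gives you retrieval by meaning rather than keywords.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words frequency vector.

    A real system would call an embedding model here, producing a dense
    vector (e.g. 768 dimensions) that captures semantics, not just words.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStoreMemory:
    def __init__(self):
        self.memories = []  # list of (embedding, text) pairs

    def store(self, text):
        self.memories.append((embed(text), text))

    def retrieve(self, query, k=1):
        # Rank stored memories by similarity to the query embedding.
        query_vec = embed(query)
        ranked = sorted(self.memories,
                        key=lambda m: cosine(query_vec, m[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]


memory = VectorStoreMemory()
memory.store("User prefers bullet-point format for reports")
memory.store("User's team works in the US Midwest")

top = memory.retrieve("What format does the user prefer for reports")
print(top[0])  # User prefers bullet-point format for reports
```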
Memory Pattern 4: Entity Memory
Extract and maintain structured information about entities:
Conversation Turn 1:
"Sarah mentioned the budget is $50K"
→ Entity: Sarah | budget: $50K
Conversation Turn 5:
"Sarah's team grew to 12 people"
→ Entity: Sarah | budget: $50K, team_size: 12
Conversation Turn 12:
"The budget was increased to $75K"
→ Entity: Sarah | budget: $75K, team_size: 12
Entity memory maintains a structured “profile” for each entity, updating it as new information arrives.
When to use: CRM agents, project management, any domain with entities that accumulate attributes over time.
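The update logic for entity memory is a small dictionary-merge. In practice, an LLM extracts the entity and attributes from each conversation turn; this sketch assumes that extraction has already happened and shows only how the profile accumulates.

```python
class EntityMemory:
    """Maintain a structured attribute profile per entity."""

    def __init__(self):
        self.entities = {}

    def update(self, entity, **attributes):
        # New attributes are added; existing ones are overwritten.
        self.entities.setdefault(entity, {}).update(attributes)

    def get(self, entity):
        return self.entities.get(entity, {})


memory = EntityMemory()
memory.update("Sarah", budget="$50K")   # turn 1
memory.update("Sarah", team_size=12)    # turn 5
memory.update("Sarah", budget="$75K")   # turn 12, overwrites the old budget
print(memory.get("Sarah"))  # {'budget': '$75K', 'team_size': 12}
```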
✅ Quick Check: An entity memory system stores “Customer: Acme Corp, contract: $100K/year, renewal: March 2026.” It’s now February 2026. Should the agent proactively mention the upcoming renewal? (Answer: Yes — this is where memory transforms agents from reactive to proactive. The agent should check entity memory at the start of each interaction for time-sensitive information. “I notice Acme Corp’s contract renewal is coming up in March. Would you like me to prepare renewal materials?” This proactive behavior based on memory is what makes agents feel genuinely helpful.)
State Management
Beyond memory of past conversations, agents need to track the current state of a task.
Task State
{
"task": "Generate quarterly report",
"status": "in_progress",
"steps_completed": ["data_collection", "analysis"],
"steps_remaining": ["visualization", "executive_summary"],
"current_step": "visualization",
"data": {"revenue": 12300000, "growth": 0.15}
}
If the agent is interrupted (timeout, error, user leaves), it can resume from the saved state rather than starting over.
Checkpointing
For long-running tasks, save progress at key milestones:
Step 1: Collect data ✓ [checkpoint saved]
Step 2: Analyze data ✓ [checkpoint saved]
Step 3: Create visualizations → [error: chart library timeout]
→ Resume from Step 3 checkpoint instead of restarting
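The checkpoint loop above can be sketched as follows. The checkpoint file name is hypothetical, and a production system would likely persist state to a database instead of local JSON, but the resume logic is the same: skip steps already recorded as complete, and save after each milestone.

```python
import json
import os

CHECKPOINT_FILE = "task_checkpoint.json"  # hypothetical path


def save_checkpoint(state, path=CHECKPOINT_FILE):
    with open(path, "w") as f:
        json.dump(state, f)


def load_checkpoint(path=CHECKPOINT_FILE):
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return None


def run_task(steps, state=None):
    """Run (name, callable) steps, checkpointing after each one.

    On restart, previously completed steps are skipped, so the task
    resumes from where it failed instead of starting over.
    """
    state = state or load_checkpoint() or {
        "status": "in_progress", "steps_completed": []}
    for name, step in steps:
        if name in state["steps_completed"]:
            continue  # already done in a previous run; skip on resume
        step()
        state["steps_completed"].append(name)
        save_checkpoint(state)  # milestone reached, persist progress
    state["status"] = "done"
    save_checkpoint(state)
    return state
```

If `step()` raises midway, the checkpoint file still records every completed step, so the next `run_task` call picks up at the failed step.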
Choosing the Right Memory Strategy
| Scenario | Recommended Pattern |
|---|---|
| Short tasks (< 20 turns) | Buffer memory |
| Long conversations | Summary + vector store |
| User profiles / preferences | Entity memory (persistent) |
| Document analysis | Vector store with source tracking |
| Multi-session tasks | State checkpointing + long-term memory |
| Customer support | Entity memory + summary memory |
Practice Exercise
- Design a memory strategy for a personal assistant agent that helps with daily tasks
- What goes in short-term memory? Long-term? Entity memory?
- How would you handle a user who says “Remember that I never want to be scheduled before 9 AM”?
Key Takeaways
- Buffer memory (full history) is simplest but fills the context window fastest
- Summary memory compresses history, saving tokens but losing specific details
- Vector store memory enables semantic retrieval — finding relevant past information by meaning, not just keywords
- Entity memory maintains structured profiles that update as new information arrives
- State management with checkpointing lets agents resume interrupted tasks
- The best systems combine multiple patterns: summaries for broad context, vectors for specific recall, entities for structured data
Up Next
In the next lesson, you’ll learn how to make agents production-ready — guardrails for safety, evaluation for reliability, and observability for debugging.