Capstone: Design Your Agent System
Apply everything from the course: design a complete agent system with architecture, tools, memory, guardrails, and evaluation for a real-world use case.
You’ve learned the components, patterns, tools, and practices of AI agents. Now design a complete agent system from scratch — applying everything from the course.
🔄 Quick Recall: Across this course you’ve covered: why agents matter (Lesson 1), the four components (Lesson 2), design patterns (Lesson 3), tool use (Lesson 4), multi-agent systems (Lesson 5), memory (Lesson 6), and production practices (Lesson 7). This capstone integrates all of them.
Capstone Exercise: The Research Assistant Agent
Design an agent system that helps knowledge workers research topics, synthesize findings, and produce reports. Walk through each design decision using the frameworks from this course.
Step 1: Define the Agent’s Purpose
Task: Research a topic, gather information from multiple sources, synthesize findings, and produce a structured report.
Users: Analysts, consultants, product managers — people who research topics and write reports as part of their jobs.
Success criteria: The report is accurate, well-sourced, covers the key aspects of the topic, and follows the user’s preferred format.
Step 2: Choose the Architecture
Decision: Single agent or multi-agent?
Evaluate using the framework from Lesson 5:
| Factor | Assessment |
|---|---|
| Task scope | Multiple skills: search, read, analyze, write |
| Tool count | 5-8 tools — manageable for one agent |
| Context needs | Fits in one context window for most topics |
| Parallelism | Research steps could run in parallel, but sequential is simpler |
Decision: Start with a single agent. The task is complex but fits within one agent’s capabilities. If research topics regularly exceed the context window, split into a Research Agent + Writing Agent later.
Step 3: Select Design Patterns
Primary: Planning + ReAct
The agent first plans the research (which subtopics to investigate, what sources to check), then executes each step using ReAct (Thought → Action → Observation).
Secondary: Reflection
After drafting the report, the agent reflects: Are all claims sourced? Does the structure match the user’s format? Are there gaps in coverage?
```
[Plan]    Break topic into 4-5 subtopics
[ReAct]   Research subtopic 1: search → read → synthesize
[ReAct]   Research subtopic 2: search → read → synthesize
...
[Draft]   Write report from synthesized findings
[Reflect] Check accuracy, completeness, format
[Revise]  Fix issues identified in reflection
[Deliver] Return final report
```
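The loop above can be sketched as an orchestration function. This is a minimal sketch, not a real framework API: the `plan`, `research`, `write`, `reflect`, and `revise` callables are hypothetical stand-ins for LLM-backed steps.

```python
def run_research_agent(topic, plan, research, write, reflect, revise,
                       max_revisions=2):
    """Orchestrate Plan -> ReAct -> Reflect for one report.

    Each callable is a stand-in for an LLM-backed step; the
    signatures here are illustrative, not a specific library's API.
    """
    subtopics = plan(topic)                          # [Plan]
    findings = {s: research(s) for s in subtopics}   # [ReAct] per subtopic
    draft = write(topic, findings)                   # [Draft]
    for _ in range(max_revisions):                   # [Reflect] / [Revise]
        issues = reflect(draft)
        if not issues:
            break
        draft = revise(draft, issues)
    return draft                                     # [Deliver]
```

Bounding the reflect/revise loop (`max_revisions`) matters: reflection without a budget can cycle indefinitely on subjective issues.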
✅ Quick Check: The agent’s plan includes “Research subtopic: quantum computing applications in healthcare.” After searching, it finds very little information — only 2 sources, both speculative blog posts. What should the adaptive planning agent do? (Answer: Replan. The agent should note that this subtopic has insufficient authoritative sources, inform the user that coverage will be limited in this area, and potentially redirect research effort to better-documented subtopics. It should NOT pad the section with speculation from weak sources. The plan adapts to what the research actually finds.)
Step 4: Define Tools
| Tool | Purpose | When Used |
|---|---|---|
| web_search | Find current information | Research phase |
| read_url | Extract content from web pages | After finding relevant URLs |
| file_read | Read user-provided documents | When user uploads reference material |
| file_write | Save the final report | Delivery phase |
| calculate | Verify numbers and statistics | Fact-checking during reflection |
Each tool gets a clear description explaining when to use it and when not to.
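As one illustration, here is what the web_search description might look like in the common JSON Schema function-calling convention. The exact schema keys vary by provider; this sketch follows the widely used shape (name, description, parameters).

```python
# Hypothetical tool definition for web_search. Note the description
# says both when to use it and when NOT to (deferring to other tools).
web_search_tool = {
    "name": "web_search",
    "description": (
        "Search the web for current information on a topic. "
        "Use during the research phase to find sources. "
        "Do NOT use for arithmetic (use calculate), for reading a "
        "known URL (use read_url), or for documents the user already "
        "uploaded (use file_read)."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "max_results": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}
```

The when/when-not guidance lives in the description itself, because that text is all the model sees when deciding which tool to call.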
Step 5: Design Memory
| Memory Type | What It Stores | Pattern |
|---|---|---|
| Short-term | Current research context, sources found | Buffer memory |
| Working state | Research plan, progress per subtopic | Task state with checkpoints |
| Long-term | User’s format preferences, past topics | Entity memory (user profile) |
The agent checkpoints after each subtopic is researched. If interrupted, it resumes from the last checkpoint rather than restarting.
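The working-state checkpoint can be as simple as a serializable dataclass. This is a sketch with illustrative field names, not a prescribed schema; a production system would also version the checkpoint format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ResearchState:
    """Working state, checkpointed after each subtopic is researched."""
    topic: str
    plan: list                                      # ordered subtopics
    completed: dict = field(default_factory=dict)   # subtopic -> findings

    def checkpoint(self, path):
        # Persist after each subtopic so an interrupted run can resume.
        with open(path, "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def resume(cls, path):
        with open(path) as f:
            return cls(**json.load(f))
```

On restart, the agent loads the state and researches only the subtopics in `plan` that are missing from `completed`, rather than starting over.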
Step 6: Add Guardrails
```
Input Guardrails:
├── Topic scope check: Is this a research topic we can help with?
├── Harmful content filter: Block requests for harmful information
└── Length estimate: Warn if the topic scope is too broad

Output Guardrails:
├── Source verification: Every claim must cite a source
├── Plagiarism check: No large verbatim passages without quotes
├── Format compliance: Report structure matches requested format
└── Confidence flagging: Mark sections with limited sourcing

Tool Guardrails:
├── URL filtering: Don't access blocked domains
└── Rate limiting: Max 20 web searches per report
```
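The two tool guardrails are mechanical enough to show concretely. A minimal sketch, assuming a simple blocklist and a per-report counter (the domain list and class name are placeholders):

```python
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"blocked.example"}   # placeholder blocklist
MAX_SEARCHES = 20                       # per-report budget

class ToolGuard:
    """Tool-level guardrails: URL filtering plus search rate limiting."""

    def __init__(self):
        self.searches_used = 0

    def check_url(self, url):
        # Reject before the tool call, not after the fetch.
        host = urlparse(url).hostname or ""
        if host in BLOCKED_DOMAINS:
            raise PermissionError(f"Blocked domain: {host}")

    def check_search_budget(self):
        if self.searches_used >= MAX_SEARCHES:
            raise RuntimeError("Search budget exhausted (20 per report)")
        self.searches_used += 1
```

The guard sits between the agent's tool call and the tool itself, so a misbehaving plan cannot bypass it through clever prompting.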
Step 7: Plan Evaluation
Test suite (30 cases):
- 15 normal topics (varied domains: technology, business, science)
- 8 edge cases (very niche topics, very broad topics, recent events)
- 4 adversarial (injection attempts, out-of-scope requests)
- 3 regression (previously failed cases)
Metrics:
- Task completion rate target: > 90%
- Source accuracy (verified by human): > 95%
- Format compliance: > 98%
- Average latency: < 5 minutes per report
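The automatable metrics can be scored with a small harness. This is a sketch: human-verified metrics like source accuracy are omitted, and each test case is assumed to carry its own format check.

```python
def evaluate(test_cases, run_agent):
    """Run the suite and compute completion and format-compliance rates.

    Each case is a dict with a "topic" and a "format_check" callable;
    run_agent returns a report string, or None on failure.
    """
    results = {"completed": 0, "format_ok": 0, "total": len(test_cases)}
    for case in test_cases:
        report = run_agent(case["topic"])
        if report is not None:
            results["completed"] += 1
            if case["format_check"](report):
                results["format_ok"] += 1
    results["completion_rate"] = results["completed"] / results["total"]
    results["format_compliance"] = results["format_ok"] / results["total"]
    return results
```

Run the same harness on every change; the 3 regression cases in the suite exist precisely so this script catches reintroduced failures.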
Course Recap
| Lesson | Core Concept | Key Takeaway |
|---|---|---|
| 1. Welcome | Agents vs chatbots | Agents perceive, plan, act, and adapt in a loop |
| 2. Anatomy | Four components | LLM brain + tools + memory + planning |
| 3. Patterns | ReAct, Reflection, Planning | Choose by task; combine for complex work |
| 4. Tool use | Function calling, MCP, structured outputs | Clear descriptions = correct tool selection |
| 5. Multi-agent | Supervisor, pipeline, peer-to-peer | Start simple; split only when needed |
| 6. Memory | Buffer, summary, vector, entity | Different information needs different storage |
| 7. Production | Guardrails, evaluation, observability | Measure everything; guard every boundary |
| 8. Capstone | Complete system design | Fundamentals before complexity |
Design Checklist
Use this when designing any agent system:
Architecture:
□ Single agent or multi-agent? (Justified by actual need)
□ Design pattern selected (ReAct/Reflection/Planning)
□ Pattern combination defined for complex tasks
Tools:
□ Each tool has a clear description with when/when-not
□ Structured outputs for tool inputs and outputs
□ Fallback tools for critical capabilities
Memory:
□ Short-term strategy (buffer/sliding window)
□ Long-term strategy (vector store/entity memory)
□ State management with checkpointing
Production:
□ Input, output, and tool guardrails defined
□ Test suite covering all four categories
□ Observability with distributed tracing
□ Failure recovery with retries and circuit breakers
Key Takeaways
- Agent system design follows a clear sequence: purpose → architecture → patterns → tools → memory → guardrails → evaluation
- Start with the simplest architecture that could work — a single well-designed agent with good tools beats a complex multi-agent system without fundamentals
- Every design decision should be justified by actual need, not theoretical elegance
- Adaptive planning, clear tool descriptions, and multi-layer memory are the highest-impact design choices
- Production readiness requires four pillars: safety (guardrails), reliability (evaluation), visibility (observability), and resilience (failure recovery)
- The meta-principle: fundamentals before complexity — get the basics right, then add sophistication