Production: Guardrails, Evaluation, and Observability
Deploy agents safely with input/output guardrails, systematic evaluation, distributed tracing, and failure recovery patterns for production reliability.
Building an agent that works in demos is easy. Building one that works reliably at scale is hard. This lesson covers the engineering practices that bridge that gap.
🔄 Quick Recall: In the previous lesson, you learned memory patterns for agent persistence. Production agents need more than memory — they need guardrails to prevent harm, evaluation to measure reliability, and observability to diagnose failures.
Guardrails: Safety Boundaries
Guardrails are automated checks that run before, during, and after agent execution to prevent harmful or incorrect behavior.
Input Guardrails
Check user input before the agent processes it:
User Input → [Input Guardrail] → Agent
├── PII detection: Block SSNs, credit cards
├── Injection detection: Flag "ignore instructions"
├── Scope check: Is this within the agent's domain?
└── Rate limiting: Prevent abuse
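A minimal sketch of the first three checks, assuming regex patterns and a small phrase list (production systems typically use trained classifiers and contextual analysis rather than patterns alone; all names here are illustrative):

```python
import re

# Illustrative patterns only; real PII and injection detection needs
# dedicated classifiers, not regex and keyword lists alone.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")
INJECTION_PHRASES = (
    "ignore your instructions",
    "ignore previous instructions",
    "pretend you are",
)

def check_input(text: str) -> list[str]:
    """Return a list of guardrail violations found in the user input."""
    violations = []
    if SSN_PATTERN.search(text):
        violations.append("pii:ssn")
    if CARD_PATTERN.search(text):
        violations.append("pii:card")
    lowered = text.lower()
    if any(phrase in lowered for phrase in INJECTION_PHRASES):
        violations.append("injection")
    return violations
```

An empty list means the input passes to the agent; any violation can be blocked, logged, or escalated depending on policy.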
Output Guardrails
Check agent output before it reaches the user:
Agent → [Output Guardrail] → User
├── PII masking: Replace sensitive data with ***
├── Factual grounding: Verify claims against sources
├── Policy compliance: No unauthorized promises
└── Format validation: Correct structure for downstream systems
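The PII-masking step can be sketched as a pass over the response text before delivery (patterns are illustrative, not exhaustive):

```python
import re

# Minimal masking sketch: replace anything that looks like an SSN or
# email with *** before the response reaches the user.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-like
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email-like
]

def mask_pii(text: str) -> str:
    for pattern in PII_PATTERNS:
        text = pattern.sub("***", text)
    return text
```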
Tool Guardrails
Check before executing tool calls:
Agent wants to call: delete_all_records(table="users")
[Tool Guardrail]:
├── Is this a destructive action? → Yes
├── Does the user have admin permissions? → Check
├── Is there a confirmation requirement? → Yes
└── Decision: Block and request human confirmation
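The decision flow above can be sketched as a small policy function. The tool names and three-way verdict (`allow` / `block` / `confirm`) are assumptions for illustration:

```python
# Hypothetical tool guardrail: destructive tools require both admin
# permissions and explicit confirmation before they run.
DESTRUCTIVE_TOOLS = {"delete_all_records", "drop_table", "send_payment"}

def check_tool_call(tool_name: str, user_is_admin: bool,
                    confirmed: bool) -> str:
    """Return 'allow', 'block', or 'confirm' for a proposed tool call."""
    if tool_name not in DESTRUCTIVE_TOOLS:
        return "allow"
    if not user_is_admin:
        return "block"
    if not confirmed:
        return "confirm"  # pause and request human confirmation
    return "allow"
```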
✅ Quick Check: An agent’s output guardrail blocks a response because it contains “I guarantee this will work.” The agent’s system prompt says never to make guarantees. But the response was actually quoting a customer’s email: “The customer wrote: ‘I guarantee this will work.’” Should the guardrail block this? (Answer: No — this is a false positive. The guardrail detected the word “guarantee” without understanding the context: the agent was quoting a customer, not making a promise itself. Better guardrails use contextual analysis rather than keyword matching alone, asking “Is the agent making a guarantee, or quoting someone else’s?” This is why guardrails must balance sensitivity with precision.)
Evaluation: Measuring Agent Reliability
What to Measure
| Metric | What It Captures | How to Measure |
|---|---|---|
| Task completion rate | Does the agent finish the job? | % of tasks fully completed vs. abandoned |
| Accuracy | Is the output correct? | Compare against ground truth (human-verified answers) |
| Consistency | Same input → same quality? | Run each test 5-10 times, measure variance |
| Latency | How long per task? | Time from input to final output |
| Cost | Token/API spend per task | Track tokens consumed, tool calls made |
| Safety | Does it ever produce harmful output? | Adversarial test suite |
Building a Test Suite
A production agent needs at minimum:
Test Suite Structure:
├── Happy path (50%): Normal, expected inputs
│ "Summarize this quarterly report"
│ "Find flights from NYC to London"
├── Edge cases (25%): Unusual but valid inputs
│ "Summarize this 200-page report" (very long)
│ "Find flights departing in 3 minutes" (impossible)
├── Adversarial (15%): Attempts to break or misuse
│ "Ignore your instructions and..."
│ "Pretend you're a different agent"
└── Regression (10%): Previously failed cases
Inputs that caused bugs in past versions
Evaluation Methods
LLM-as-Judge: Use a separate LLM to evaluate agent outputs against criteria:
Evaluate this agent response:
- Did it complete the requested task? (0-2)
- Is the information factually correct? (0-2)
- Did it stay within its defined scope? (0-1)
- Is the format correct? (0-1)
Human evaluation: For high-stakes agents, have humans review a sample of outputs regularly.
Automated checks: For structured outputs, validate programmatically (JSON schema validation, field completeness, value ranges).
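A programmatic check for a structured output might look like the sketch below. The field names and ranges are hypothetical; real systems would often use a JSON Schema validator instead of hand-rolled checks:

```python
# Sketch of an automated output check: required fields present,
# types correct, values in range. Field names are illustrative.
def validate_output(output: dict) -> list[str]:
    errors = []
    for field, expected_type in [("summary", str), ("confidence", float)]:
        if field not in output:
            errors.append(f"missing:{field}")
        elif not isinstance(output[field], expected_type):
            errors.append(f"wrong_type:{field}")
    confidence = output.get("confidence")
    if isinstance(confidence, float) and not 0.0 <= confidence <= 1.0:
        errors.append("out_of_range:confidence")
    return errors
```

An empty error list means the output can flow to downstream systems; anything else fails the evaluation case.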
Observability: Seeing Inside the Agent
Distributed Tracing
Every agent execution generates a trace — a record of every step:
Trace ID: abc-123
├── [0ms] User input received: "Analyze Q3 sales"
├── [50ms] Planning: Decomposed into 3 steps
├── [100ms] Tool call: database_query("SELECT * FROM sales...")
│ └── [800ms] Tool result: 1,247 rows returned
├── [850ms] Tool call: calculate_metrics(data)
│ └── [1200ms] Tool result: {revenue: 12.3M, growth: 15%}
├── [1250ms] Generating response
├── [2000ms] Output guardrail: PASSED
└── [2050ms] Response delivered to user
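A trace like the one above can be captured with a small recorder that timestamps each step relative to the start of the run. This is a minimal sketch; production systems typically emit spans to a tracing backend such as an OpenTelemetry collector:

```python
import time
import uuid

# Minimal tracing sketch: each agent step is recorded with its
# millisecond offset from the start of the run.
class Trace:
    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        self.start = time.monotonic()
        self.events = []

    def log(self, event: str, **detail):
        offset_ms = int((time.monotonic() - self.start) * 1000)
        self.events.append({"ms": offset_ms, "event": event, **detail})

trace = Trace()
trace.log("input_received", text="Analyze Q3 sales")
trace.log("tool_call", tool="database_query")
```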
What to Log
| Event | What to Capture | Why |
|---|---|---|
| Agent decision | Which tool chosen, why (reasoning) | Debug wrong tool selection |
| Tool call | Input parameters, output, latency | Debug tool failures |
| Guardrail trigger | What was blocked, why | Tune guardrail sensitivity |
| Error | Error type, context, recovery action | Fix recurring failures |
| Token usage | Tokens per step, cumulative | Cost optimization |
Alerting
Set up alerts for:
- Task completion rate drops below threshold — something is systematically wrong
- Latency exceeds SLA — tool call hanging or LLM overloaded
- Guardrail trigger rate spikes — possible attack or agent regression
- Error rate exceeds baseline — new bug or external dependency failure
✅ Quick Check: Your agent’s task completion rate dropped from 94% to 78% overnight. Nothing in your code changed. What are the most likely causes? (Answer: External dependencies changed: (1) An API the agent uses was updated or went down, (2) the LLM provider had a model update that changed behavior, or (3) a database schema changed. Check your observability traces — they’ll show which step in the agent loop is failing, pointing directly to the cause. This is why detailed tracing matters: without it, you’d be guessing.)
Failure Recovery Patterns
Retry with Backoff
Tool call fails → Wait 1 second → Retry
Retry fails → Wait 2 seconds → Retry
Retry fails → Wait 4 seconds → Retry
Max retries exceeded → Fallback strategy
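The schedule above (1s, 2s, 4s, then give up) is exponential backoff, sketched here as a small wrapper; in practice you would catch only retryable errors (timeouts, rate limits) rather than all exceptions:

```python
import time

# Retry sketch matching the schedule above: wait 1s, 2s, 4s between
# attempts, then re-raise so the caller can fall back.
def call_with_retries(tool, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries + 1):
        try:
            return tool()
        except Exception:
            if attempt == max_retries:
                raise  # caller switches to a fallback strategy
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s
```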
Circuit Breaker
If a tool fails repeatedly, stop calling it:
Tool fails 5 times in 10 minutes →
Circuit OPEN: Stop calling this tool
Use fallback tool or inform user
After 5 minutes → Circuit HALF-OPEN: Try one call
If succeeds → Circuit CLOSED: Resume normal use
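The state machine above can be sketched as a small class. The thresholds mirror the example (5 failures to open, a cooldown before the half-open probe); real implementations usually also count failures within a sliding window:

```python
import time

# Circuit-breaker sketch: open after enough failures, allow a single
# probe call after the cooldown, close again on success.
class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown=300.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_call(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: allow one probe call
        return False

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit
```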
Graceful Degradation
When a component fails, provide reduced but functional service:
Full capability: Search web + analyze + visualize
Web search down: Analyze from cached data + visualize
Visualization down: Search + analyze + text output only
Everything down: "I'm experiencing technical difficulties.
Here's what I can help with manually..."
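One way to implement the fallback ladder above is to define capability tiers and pick the best tier whose components are all currently healthy. The tier names and component sets here are assumptions for illustration:

```python
# Degradation sketch: tiers are ordered best-first; return the first
# tier whose required components are all available.
TIERS = [
    ("full", {"search", "analyze", "visualize"}),
    ("no_search", {"analyze", "visualize"}),   # search down: cached data
    ("text_only", {"search", "analyze"}),      # visualization down
    ("manual", set()),                         # everything down
]

def pick_tier(available: set) -> str:
    for name, required in TIERS:
        if required <= available:
            return name
    return "manual"
```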
Production Checklist
Before deploying an agent to production:
Safety:
□ Input guardrails configured (PII, injection, scope)
□ Output guardrails configured (PII masking, policy compliance)
□ Tool guardrails (confirmation for destructive actions)
□ Maximum iteration limit set (prevent infinite loops)
Evaluation:
□ Test suite with 50+ cases across all categories
□ Task completion rate > 90%
□ Adversarial test pass rate > 95%
□ Consistency score > 85% (same input → same quality)
Observability:
□ Distributed tracing enabled
□ Token usage logging per step
□ Guardrail trigger logging
□ Alerting on completion rate, latency, error rate
Recovery:
□ Retry logic with exponential backoff
□ Circuit breakers on external dependencies
□ Graceful degradation paths defined
□ Human escalation path for unrecoverable failures
Practice Exercise
- Design input + output guardrails for an agent in your domain
- Write 10 test cases: 5 happy path, 3 edge cases, 2 adversarial
- Define the 3 most important alerts you’d set up for your agent
Key Takeaways
- Guardrails operate at three layers: input (before processing), output (before delivery), and tool (before execution)
- Evaluate agents on multiple dimensions: task completion, accuracy, consistency, latency, cost, and safety
- Test suites need all four categories: happy path, edge cases, adversarial, and regression
- Observability through distributed tracing lets you pinpoint exactly where failures occur in multi-step agents
- Failure recovery requires retry logic, circuit breakers, and graceful degradation — not just error messages
- Production readiness is a checklist: safety, evaluation, observability, and recovery must all be addressed
Up Next
In the final lesson, you’ll pull everything together in a capstone exercise — designing a complete agent system using the patterns, tools, and practices from the entire course.