Threat Modeling for AI Agents
Build a threat model for any AI agent using OWASP's Top 10 for Agentic Applications, the AWS Scoping Matrix, and the Rule of Two framework. Systematic thinking about agent risks.
From Chaos to Framework
🔄 Quick Recall: In the last lesson, you saw the damage: 1-click RCE, 341 malicious skills, credential leaks, and 135,000 exposed instances. That’s a lot of threats. Without a framework, it’s overwhelming. With one, it becomes a checklist.
Threat modeling is how security professionals turn “everything is scary” into “here’s what matters, here’s what to do.” For AI agents, three frameworks have emerged in 2025-2026 that finally give us structured ways to think about these risks.
By the end of this lesson, you’ll be able to:
- Use the OWASP Top 10 for Agentic Applications to classify agent threats
- Apply the AWS Scoping Matrix to determine what security controls your agent needs
- Assess any agent's risk level by counting trifecta properties with the Rule of Two
Framework 1: OWASP Top 10 for Agentic Applications
In 2026, the OWASP GenAI Security Project published the first framework designed specifically for autonomous AI agents. Peer-reviewed by 100+ security experts, it identifies the ten most critical risks for agents that plan, persist, and delegate.
Here’s the list, mapped to what you already know from Lesson 1:
| # | Risk | What It Means | Real Example |
|---|---|---|---|
| A1 | Tool Misuse | Agent uses tools in unintended ways | Malicious skill tells agent to run shell commands |
| A2 | Excessive Agency | Agent has more permissions than needed | OpenClaw accessing all files when it only needs one folder |
| A3 | Insecure Output Handling | Agent output triggers unintended actions | Agent generates SQL that gets executed without sanitization |
| A4 | Data Leakage | Sensitive data escapes through agent actions | 283 skills leaking API keys through context window |
| A5 | Memory Poisoning | Agent’s persistent memory gets corrupted | Zenity’s SOUL.md backdoor persisting after skill removal |
| A6 | Multi-Agent Delegation | Sub-agents inherit excessive permissions | Sub-agent spawned with full system access instead of read-only |
| A7 | Insecure Plugins | Third-party integrations introduce vulnerabilities | ClawHavoc’s 341 malicious ClawHub skills |
| A8 | Cascading Hallucinations | Hallucinated output feeds into real actions | Agent invents a CLI command and executes it |
| A9 | Identity Spoofing | Agents impersonate users or other agents | Agent sends emails “from” the user without explicit approval |
| A10 | Supply Chain Risks | Dependencies and skills from untrusted sources | Compromised GitHub accounts publishing backdoored skills |
Notice how Lesson 1’s categories map directly: RCE falls under A1/A2, malicious skills under A7/A10, credential exposure under A4, and exposed infrastructure under A2.
✅ Quick Check: An agent reads a Google Doc, finds hidden instructions, and sends your SSH key to an external server. Which OWASP risks are involved? (Answer: A1 (tool misuse — agent used network tool to exfiltrate), A4 (data leakage — SSH key escaped), and potentially A5 (if the instructions persist in memory). The trigger is indirect prompt injection, which cuts across multiple categories.)
Framework 2: AWS Agentic AI Security Scoping Matrix
AWS Security published a framework that answers a different question: not “what can go wrong?” but “how much security does my agent need?”
The matrix has four scope levels:
| Level | Description | Example | Security Needs |
|---|---|---|---|
| No Agency | Static responses, no tool use | Basic chatbot | Standard LLM security |
| Prescribed | Fixed workflows, pre-defined tools | “Summarize this file” with specific file path | Input validation, output checking |
| Supervised | Dynamic tool selection, human approval | Coding assistant that proposes changes, you approve | Permission boundaries, audit logs |
| Full Agency | Autonomous planning and execution | OpenClaw running overnight workflows | Complete security stack |
For each scope level, AWS defines six security dimensions:
- Identity & Access — How does the agent authenticate? What can it access?
- Data Protection — How are credentials and sensitive data handled?
- Network Security — What can the agent reach over the network?
- Monitoring & Logging — What actions are recorded and alertable?
- Incident Response — How do you detect and stop a compromised agent?
- Compliance — What regulatory requirements apply?
Personal AI agents like OpenClaw operate at Full Agency — they plan multi-step tasks, select tools dynamically, and execute with minimal human oversight. That means they need controls across all six dimensions.
The practical insight: If you’re using an AI coding assistant with approval prompts (Supervised), you need less security infrastructure than if you’re running an autonomous agent overnight (Full Agency). Match your security investment to the scope.
Framework 3: The Rule of Two
Meta AI’s research, popularized by Simon Willison, identified three properties that make an agent dangerous when combined:
- Access to private data — files, credentials, messages, databases
- Exposure to untrusted content — emails, web pages, shared docs, user input
- Ability to communicate externally — send messages, make API calls, write files
The Rule of Two: An agent with any two of these three is manageable. An agent with all three is a security risk by design — because untrusted content can manipulate the agent into exfiltrating private data through external communication.
Let’s apply it:
| Agent Configuration | Properties | Risk Level |
|---|---|---|
| Chatbot (no tools) | None of the three (0/3) | Low |
| File organizer (local only) | Private data only (1/3) | Low |
| Email assistant (read-only) | Private data + untrusted content (2/3) | Medium |
| OpenClaw (default) | All three (3/3) | High |
| OpenClaw (hardened, no outbound) | Private data + untrusted content (2/3) | Medium |
The Rule of Two gives you a quick risk assessment for any agent: count the properties. Three means you need every security control in this course.
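The count is simple enough to sketch in a few lines. The three fields mirror the trifecta properties, and the thresholds match the risk levels in the table above; the class and function names are hypothetical, chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    # The three trifecta properties from the Rule of Two
    private_data: bool       # files, credentials, messages, databases
    untrusted_content: bool  # emails, web pages, shared docs, user input
    external_comms: bool     # send messages, make API calls, write files

def risk_level(agent: AgentProfile) -> str:
    """Rule of Two: any two properties are manageable; all three
    combined are high-risk by design."""
    count = sum([agent.private_data, agent.untrusted_content,
                 agent.external_comms])
    return {3: "High", 2: "Medium"}.get(count, "Low")

# A default OpenClaw install has all three properties
assert risk_level(AgentProfile(True, True, True)) == "High"
# Hardened with outbound network blocked, one leg of the trifecta is gone
assert risk_level(AgentProfile(True, True, False)) == "Medium"
```

Disabling any one property drops the agent out of the High bracket, which is exactly why the hardened OpenClaw row in the table lands at Medium.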
✅ Quick Check: You configure an AI agent to read your local files and generate reports but disable all network access. According to the Rule of Two, what’s the risk level? (Answer: Medium at most. The agent has access to private data but cannot communicate externally. Even if it processes untrusted content from a file, it can’t exfiltrate anything because outbound communication is blocked. You’ve eliminated one leg of the trifecta.)
Building Your Threat Model
Here’s a practical 4-step process combining all three frameworks:
Step 1: Classify scope (AWS Matrix). What level of agency does your agent have: No Agency, Prescribed, Supervised, or Full Agency?
Step 2: Count trifecta properties (Rule of Two). Does your agent have access to private data? Process untrusted content? Communicate externally? Count them.
Step 3: Identify relevant OWASP risks. Based on your agent's capabilities, which of the Top 10 apply? An agent with no tools doesn't face A1. An agent with no memory doesn't face A5.
Step 4: Prioritize by likelihood and impact. Not all risks are equal. A7 (insecure plugins) matters most if you install third-party skills. A4 (data leakage) matters most if you work with sensitive data.
Example: Personal OpenClaw instance
- Scope: Full Agency (autonomous, multi-step, tool selection)
- Trifecta: 3/3 (files + email/web + API calls) — high risk
- OWASP: A1, A2, A4, A5, A7, A10 are primary concerns
- Priority: A7 (skills) and A4 (credentials) based on Lesson 1 data
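The four steps applied to this example can also be written down as data. The capability-to-risk mapping below is an assumption distilled from the OWASP table earlier in this lesson, and the function and field names are illustrative, not a standard tool.

```python
# Step 3 helper: which OWASP agentic risks a capability brings into scope.
# This mapping is an illustrative assumption based on the Top 10 table above.
RISKS_BY_CAPABILITY = {
    "tools": ["A1", "A2"],
    "persistent_memory": ["A5"],
    "third_party_skills": ["A7", "A10"],
    "sensitive_data": ["A4"],
}

def build_threat_model(scope, trifecta, capabilities, priorities):
    """Combine the three frameworks into one threat-model record."""
    risks = sorted({r for cap in capabilities
                      for r in RISKS_BY_CAPABILITY.get(cap, [])},
                   key=lambda r: int(r[1:]))  # numeric sort: A2 before A10
    return {
        "scope": scope,            # Step 1: AWS Scoping Matrix level
        "trifecta": trifecta,      # Step 2: Rule of Two property count
        "owasp_risks": risks,      # Step 3: applicable Top 10 risks
        "priorities": priorities,  # Step 4: ranked by likelihood and impact
    }

model = build_threat_model(
    scope="Full Agency",
    trifecta=3,
    capabilities=["tools", "persistent_memory",
                  "third_party_skills", "sensitive_data"],
    priorities=["A7", "A4"],  # skills and credentials, per Lesson 1 data
)
# model["owasp_risks"] -> ["A1", "A2", "A4", "A5", "A7", "A10"]
```

Writing the model down this way makes the mapping to later lessons explicit: each entry in `owasp_risks` should have at least one control assigned to it.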
This gives you a clear map for the rest of the course: Docker isolation (Lesson 3) addresses A2; permission boundaries (Lesson 4) address A1/A2; skill vetting (Lesson 5) addresses A7/A10; monitoring (Lesson 6) addresses A4; prompt injection defense (Lesson 7) cuts across all of them.
Key Takeaways
- OWASP Top 10 for Agentic Applications is the first peer-reviewed framework specifically for autonomous AI agents — covering tool misuse, memory poisoning, and cascading failures
- AWS Scoping Matrix classifies agents into 4 levels (No Agency → Full Agency) with 6 security dimensions per level
- The Rule of Two says agents with all 3 properties (private data + untrusted content + external communication) are high-risk by design
- Your threat model combines all three: classify scope, count trifecta properties, identify OWASP risks, then prioritize
- Not all agents need the same security — match controls to your agent’s actual scope and capabilities
Up Next
Your threat model tells you what to protect against. The next lesson covers the first and most impactful defense: Docker isolation — the specific container hardening flags that block the most common agent exploits.