5. Threat Modeling for LangGraph Agents: Why STRIDE Needs an Upgrade
Part 5 of the LangGraph Agent Security series
I’ve been doing threat modeling for a while — mapping systems, drawing data flow diagrams, working through STRIDE categories, writing up risk registers. It’s a practice I find genuinely valuable, and I came into this LangGraph security project assuming I’d apply it in more or less the standard way.
Then I tried to run STRIDE on a LangGraph agent and found that the standard frameworks were leaving meaningful gaps. Not because they’re bad frameworks — STRIDE is excellent for what it was designed for — but because they make assumptions about how software systems work that LLM-based agents violate.
This post is about understanding those gaps, adapting existing frameworks to fill them, and then actually building a threat model for a real agent. The goal isn’t to produce a beautiful artifact to file away. It’s to produce something actionable — a prioritized backlog of things to fix and a clear map of which controls address which threats.
Why Standard Threat Models Fall Short
Before I get into how to adapt them, I want to be precise about where they break down. There are three specific assumptions that conventional threat modeling makes which don’t hold for LLM agents.
Assumption 1: The System is Deterministic
STRIDE, PASTA, attack trees — all of these were designed for systems where a given input produces a predictable output. A SQL injection payload either succeeds or fails. A forged authentication token either passes validation or it doesn’t. You can enumerate the attack paths, test them, and reason definitively about whether a control works.
LLM-based agents are probabilistic. The same injection payload might succeed 30% of the time, fail 60% of the time, and produce partial compliance 10% of the time. The success rate varies with model version, context window state, system prompt phrasing, and factors nobody fully understands. This has two important implications that standard threat modeling doesn’t capture:
First, absence of exploitation in testing is not proof of absence of vulnerability. A prompt injection attempt that fails consistently in your test environment might succeed in production where the model is a different version, the context window is longer, or the system prompt is slightly different. The threat is still there — you just didn’t happen to trigger it.
Second, statistical persistence is a real attack strategy. A defense that blocks 95% of injection attempts still fails 1 in 20 times. An attacker with automated tooling submitting variations across many sessions will eventually find one that works. Threat models that only classify threats as present or absent miss this entirely.
Assumption 2: Trust Boundaries are Static
In conventional systems, a trust boundary is a fixed architectural feature — the line between the internet and the internal network, between the application tier and the database tier. You draw it on your DFD once, and it doesn’t move.
In a LangGraph agent, trust boundaries are dynamic. The agent’s effective trust level changes based on what instructions it has received, what content it has retrieved, and what state it is currently in. An agent that’s been successfully injected may be operating under instructions that have effectively elevated an untrusted user to operator level — but from the infrastructure’s perspective, nothing has changed. Same credentials, same permissions, same network access. The trust escalation happened inside the context window.
Assumption 3: Data and Instructions are Distinct
The entire discipline of injection prevention in conventional security — SQL parameterization, HTML escaping, command-line argument quoting — is built on maintaining a hard distinction between data and code. Data goes in one place. Code (instructions) go in another. Injection attacks work by smuggling code into the data channel.
In LLM systems, this distinction doesn’t exist. Everything the model reads is structurally identical — token sequences — and any of it can function as instructions. A retrieved document, an API response, a database record, an image caption — all of it is simultaneously data to be analyzed and potential instructions to be followed. There is no separator. There is no quote character. There is no parameterization mechanism.
This collapses the classical distinction between data integrity threats and code execution threats. A threat model that tracks “untrusted data inputs” separately from “instruction injection” is missing the fundamental equivalence between them.
STRIDE Adapted for LangGraph Agents
With those limitations in mind, here’s how I’ve been working through STRIDE in a way that actually captures the relevant threats. Each category gets LLM-specific extensions.
S — Spoofing
Classical spoofing: impersonating a legitimate identity. In LangGraph, spoofing manifests at additional levels that the classical definition doesn’t capture:
Instruction spoofing is the most consequential LLM-specific variant. An attacker crafts content that the model interprets as authoritative instructions from a trusted source — the system, the operator, a higher-privilege component. The attack doesn’t forge a credential. It crafts text that the model treats as if it came from someone with authority.
Classical: Forge an auth token to impersonate an admin user
LLM variant: Craft a prompt that causes the LLM to believe it's
receiving operator-level instructions, overriding
user-level restrictions
Agent identity spoofing in multi-agent systems: a compromised sub-agent, or a message injected into inter-agent channels, impersonates a legitimate agent to inherit its trust level in the supervisor graph.
Tool response spoofing: a man-in-the-middle attack on an unencrypted tool API connection returns fabricated data the agent treats as legitimate.
| Spoofing Variant | Location | Severity |
|---|---|---|
| Instruction spoofing via injection | User input, retrieved content | Critical |
| Agent identity spoofing | Inter-agent messages | High |
| Tool response spoofing | API responses | High |
| System prompt impersonation | Context window injection | Critical |
T — Tampering
Classical tampering: modifying data without authorization. In LangGraph, every component of the execution pipeline is a potential tampering target — and because state flows through the entire graph, tampering at any point propagates forward:
State tampering: modifying the shared state object between nodes — through a compromised node, a deserialization vulnerability in checkpointing, or direct access to the checkpoint store. Because all subsequent nodes read from state, a tampered state corrupts everything downstream.
Checkpoint tampering: modifying persisted state snapshots. A tampered checkpoint that’s later resumed causes the agent to execute from a corrupted starting point, potentially with different permissions or injected instructions embedded in its history.
Tool output tampering: modifying data returned by a tool before it reaches the LLM. An integrity-unchecked response can be fabricated to redirect the agent’s subsequent reasoning.
Memory store tampering: directly modifying long-term memory — inserting false beliefs, removing safety-relevant memories, modifying stored policies.
R — Repudiation
Repudiation: performing an action and then denying it. This is structurally complex for agents because the agent — not a human — takes the actions, and tracing consequences back to specific inputs requires detailed execution logs that many deployments don’t maintain.
Agent action repudiation: the agent sends an unauthorized email or modifies a database record. If execution logs are incomplete, neither operator nor user can definitively establish what caused the action or whether it was authorized.
Injection repudiation: indirect injection leaves no trace in the user’s input log — the malicious instruction was in retrieved content. Without logging the full content of every retrieved document, the injection vector is invisible after the fact.
Human approval repudiation: if an interrupt approval isn’t logged with sufficient fidelity — who approved, when, what state they reviewed — the approval can be spoofed or disputed.
LangGraph’s checkpointing provides a partial mitigation: there’s a complete execution trace in principle. The challenge is making that trace tamper-evident, sufficiently detailed, and retained long enough to be useful in post-incident analysis.
I — Information Disclosure
Information disclosure in LangGraph agents is particularly acute because agents routinely hold sensitive data — credentials, personal information, proprietary documents — in their context windows, alongside tools capable of transmitting that data externally.
Context window disclosure: the agent’s context at any moment may contain system prompt contents, API keys, and user data. A successful injection causing the agent to output its context can disclose all of it in a single interaction.
State store disclosure: the checkpoint store contains a complete history of every input, output, tool call, and intermediate state. Unauthorized access is equivalent to reading every conversation the agent has ever had.
Cross-session disclosure: in multi-tenant deployments, insufficient thread ID isolation can cause one user’s context to appear in another’s session — a serious privacy violation.
D — Denial of Service
Agent DoS has unique economic characteristics absent from conventional DoS:
Economic DoS: an agent DoS can work by causing the agent to consume expensive LLM API calls — not overwhelming compute capacity. A runaway loop can generate hundreds of dollars in costs in minutes. The attacker is conducting an economic attack, not just a service disruption.
Semantic DoS: the agent appears to run normally — processing, calling tools, generating outputs — but makes no useful progress. Conventional uptime monitoring won’t catch it. The agent is “healthy” by every technical metric while being completely useless.
Quality degradation DoS: the attacker doesn’t stop the agent, just degrades its outputs through context flooding or irrelevant content injection. No error logs. No technical failure. Just subtly wrong responses.
E — Elevation of Privilege
Tool privilege escalation: manipulating the agent into using tools with higher privilege than the user’s actual authorization level through a carefully crafted request.
Cross-agent privilege escalation: exploiting a low-privilege sub-agent to influence a high-privilege supervisor. The sub-agent’s output inherits the supervisor’s trust level when it enters supervisor state.
Instruction authority escalation: crafting user-level input the model interprets as operator-level instructions, effectively elevating the user’s authority within the context window.
Checkpoint privilege escalation: replaying a checkpoint from before a permission-restricting policy update to operate under the old, broader permissions.
Three Additional Threat Dimensions STRIDE Can’t Capture
Even a fully adapted STRIDE doesn’t complete the picture. Three dimensions are unique to LLM-based agents and need explicit addition to the framework.
Dimension 1: Probabilistic Failure
STRIDE thinks in binary terms: a threat either succeeds or it doesn’t. For LLM-based threats, this framing is wrong. Threats succeed probabilistically.
Threat models for LangGraph agents need to assign probability distributions, not binary classifications. A threat rated “low likelihood” in STRIDE terms might still succeed 5% of the time — which, under automated attack, means one successful exploitation per 20 attempts. That’s not “low likelihood” from an operational standpoint.
This also changes how you think about testing. Failing to reproduce an attack in 10 test runs doesn’t mean the attack doesn’t work — especially for probabilistic attacks where success requires the right combination of context window state and model behavior. You need statistical testing across many trials to actually characterize a threat’s success rate.
Dimension 2: Emergent Attack Paths
In conventional systems, you can enumerate attack paths: map every code path, identify the exploitable ones, test them. In LLM-based agents, the attack path includes the model’s reasoning process — which is not enumerable. The model may take actions that no engineer anticipated, because no engineer fully specified what the model should do in every possible context. That specification is implicit in training weights, not in code.
The practical implication: treat your threat model as permanently incomplete. It’s not a one-time artifact — it’s a living document that needs to be supplemented with ongoing red-teaming and adversarial testing in production. New attack paths will be discovered in production that no pre-deployment threat model would have anticipated.
Dimension 3: Instruction-Data Equivalence
In LLM systems, data and instructions are structurally identical. Every data source the agent reads is simultaneously a potential instruction source. This collapses the boundary between data integrity threats and code injection threats.
A threat model that treats “untrusted data inputs” as a separate category from “instruction injection” will miss the fundamental equivalence between them. When building your threat model, every untrusted data source needs to be analyzed as a potential instruction injection vector, not just as a data quality concern.
Building an Actual Threat Model: Step by Step
Here’s the process I’ve developed for producing threat models that are actionable rather than just comprehensive. I’ll walk through it with a concrete example agent.
Step 1: Define the Agent and Its Trust Boundaries Precisely
Start by documenting the agent with specificity. Vague descriptions produce vague threat models:
Agent Name: Customer Research Assistant
Purpose: Research customer accounts, generate
briefing reports for sales team
Model: Claude Sonnet (via Anthropic API)
Tools: - CRM read access (Salesforce)
- Web search (Tavily API)
- Web browser (Playwright)
- Email send (SendGrid, company domain)
- Document storage read (S3 bucket)
Memory: Per-session only (no long-term memory)
Checkpointing: PostgreSQL, thread_id = user_id + session_id
Deployment: Internal tool, authenticated employees only
Human-in-loop: None currently configured
Then draw the trust boundary diagram — a data flow diagram showing every component, every data flow, and every trust boundary crossing:
[Employee Browser] ──(HTTPS)──► [Agent API Gateway]
│
[Auth: SSO validation]
│
▼
[LangGraph Runtime]
│ │ │
┌───────┘ │ └────────┐
▼ ▼ ▼
[Salesforce [Tavily [SendGrid
CRM API] Search API] Email API]
│ │
│ ┌───────┘
│ ▼
│ [Playwright Browser]
│ │
│ ▼
│ [Open Internet] ◄── UNTRUSTED
│
▼
[PostgreSQL
Checkpoint Store]
The trust boundaries I’d explicitly document here:
- Between employee and agent (authenticated, but potentially adversarial)
- Between agent and open internet (entirely untrusted)
- Between agent and internal APIs (trusted, but can be abused)
- Between runtime and checkpoint store (privileged, must be protected)
Step 2: Enumerate Assets
List what the agent has access to and what would be damaging to lose, expose, or corrupt:
| Asset | Sensitivity | Impact if Compromised |
|---|---|---|
| Salesforce CRM data | High | Customer PII exposure, competitive intelligence |
| Employee SSO sessions | Critical | Account takeover |
| SendGrid email capability | High | Phishing, spam, reputation damage |
| System prompt contents | Medium | Business logic exposure |
| Checkpoint store history | High | Complete interaction history of all users |
| Anthropic API key | High | Cost abuse, agent impersonation |
| Web browsing capability | Medium | SSRF, internal network access |
Step 3: Apply Adapted STRIDE to Each Component
For each component and each data flow, work through the STRIDE categories and the three LLM-specific dimensions. Here’s what this looks like for the web browser tool specifically:
Component: Web Browser Tool
Data Flow: Agent → Playwright → Open Internet → Agent State
S (Spoofing): Retrieved webpage impersonates operator
instructions via indirect injection.
Severity: CRITICAL
T (Tampering): Retrieved content modifies agent state
with adversarial instructions.
Severity: CRITICAL
R (Repudiation): Injection source (webpage) not logged;
cannot reconstruct attack path post-incident.
Severity: HIGH
I (Disclosure): Agent instructed to exfiltrate CRM data
via encoded URL parameters in search queries.
Severity: CRITICAL
D (DoS): Injection causes recursive page fetching,
exhausting token budget.
Severity: HIGH
E (Privilege): Low-trust web content gains ability to
invoke high-privilege email tool via injection.
Severity: CRITICAL
LLM Dimensions:
───────────────────────────────────────────────────────
Probabilistic: Injection success varies by payload
sophistication and model version.
Estimated 15-40% for naive payloads.
Emergent paths: Model may follow multi-step injection
chains not anticipated in testing.
Instruction-data: ALL web content is an instruction source.
Nothing from the open internet should be
treated as data-only.
Step 4: Score and Prioritize
Use a modified DREAD scoring approach, adjusted for the probabilistic nature of LLM threats:
| Factor | Scoring Guidance |
|---|---|
| Damage | Worst-case real-world consequence (1-10) |
| Reproducibility | For LLM threats: expected success rate across 100 attempts, not binary |
| Exploitability | Skill and access required |
| Affected users | Scope of impact |
| Discoverability | How easily an attacker finds this |
The output is a prioritized backlog:
CRITICAL (fix immediately):
├── Indirect injection via browser → CRM exfiltration
├── Indirect injection via browser → email tool misuse
└── Checkpoint store unauthorized access
HIGH (fix before production):
├── Direct injection → tool privilege escalation
├── Token budget exhaustion via loop induction
└── Cross-session state leakage via predictable thread IDs
MEDIUM (next development cycle):
├── System prompt extraction via inference
├── Tool output poisoning via compromised API
└── Agent action repudiation due to incomplete logging
LOW (monitor):
├── Sycophantic drift over extended sessions
└── Quality degradation via context flooding
Step 5: Map Threats to Controls
For each threat, identify the specific controls that mitigate it. This is where the threat model connects to your engineering backlog:
Threat: Indirect injection via browser → email tool misuse
Mitigating controls:
1. Content isolation: sanitize retrieved content before
adding to agent state
2. Tool invocation policy: require explicit user confirmation
before any email send
3. Output filtering: scan LLM outputs for email addresses
not present in the original user request
4. Monitoring: alert on email sends with external URLs not
from approved domains
5. Least privilege: restrict email tool to user's own
address book, no arbitrary recipients
Residual risk after controls: LOW
Step 6: Treat the Threat Model as a Living Document
The biggest mistake I see with threat models is producing them once and filing them away. They become confidence-inspiring artifacts rather than useful ones.
The threat model needs to be updated when:
- New tools are added to the agent
- Data sources change (new RAG sources, new APIs)
- The underlying model is updated or replaced
- New injection techniques are publicly disclosed
- A security incident or near-miss occurs
- The deployment context changes
I’ve started version-controlling the threat model alongside the agent code and requiring a threat model review for any PR that adds a tool, changes a data source, or modifies the agent’s permissions. It adds friction. It’s worth it.
The Minimal Viable Threat Model
Not every team has resources for a full STRIDE analysis. When I’m advising teams who need to start somewhere, I give them five questions. The answers provide enough signal to prioritize the most critical work:
Question 1: What can this agent do? List every tool and its worst-case action. “This agent can send emails, query the database, and browse the web.”
Question 2: What data does this agent read? List every untrusted data source. “User messages, web pages, CRM records, uploaded documents.”
Question 3: What’s the worst thing that could happen? For each tool, identify the catastrophic failure mode. “Email tool: sends phishing email to all customers. Database tool: deletes production records.”
Question 4: What cannot be undone? List all irreversible actions. “Sent emails, deleted records, external API calls that trigger downstream processes.”
Question 5: How would we know if something went wrong? Identify the detection gaps. “We have no monitoring on tool call arguments. We don’t log retrieved web content. We have no alert for anomalous email recipients.”
The answers to these five questions tell you where to invest first. If you can’t answer Question 5 for any of your critical threats, your first priority isn’t prevention — it’s detection. You can’t fix what you can’t see.
What the Threat Model Actually Produces
A completed threat model produces four artifacts that serve different purposes:
The data flow diagram with trust boundaries tells engineers where to put controls. Every trust boundary crossing should correspond to a validation, authentication, or monitoring control in code.
The prioritized threat register tells security teams and leadership what to fix and in what order. It enables resource allocation decisions based on actual risk, not intuition.
The control mapping becomes acceptance criteria for security testing. Each control should have a corresponding test that verifies it works. No test, no confidence.
The residual risk summary is the honest accounting of what the agent still can’t do safely after all controls are in place. If residual risk on any threat is unacceptably high, the threat model is where that gets documented — and where the architectural change needed to address it gets specified.
Together, these artifacts transform the cataloguing work from the previous two posts into an engineering roadmap. That’s the point.
Where This Leaves Me
I want to be honest about the limits of this process. Even a thorough threat model for a LangGraph agent will be incomplete in ways that a threat model for a conventional web application wouldn’t be. The probabilistic behavior of LLMs means some attack paths only manifest under specific conditions that testing may not reproduce. The emergent reasoning of agents means some attack paths won’t be anticipated at all until they’re discovered in production.
The best threat modeling I’ve found for agents is iterative and ongoing — starting with a structured analysis like the one above, supplementing it with adversarial testing and red-teaming, and treating every security incident or near-miss as new information that updates the model.
It’s more work than threat modeling for conventional software. The alternative is deploying agents with capabilities you haven’t thought carefully about against threats you haven’t mapped. Given what I’ve catalogued in the previous posts, that seems like the worse option.
In the next post, I’ll move from analysis to defense — specifically, input validation and how to think about it in the context of natural language systems where you can’t just schema-validate your way to safety.
This is Part 5 of an ongoing series on LangGraph agent security. Previous posts: Part 1: Introduction · Part 2: Architecture Primer · Part 3: Attack Surface Analysis · Part 4: Core Threat Categories. Next: Part 6: Input Validation.