5. Threat Modeling for LangGraph Agents: Why STRIDE Needs an Upgrade

Part 5 of the LangGraph Agent Security series

I’ve been doing threat modeling for a while — mapping systems, drawing data flow diagrams, working through STRIDE categories, writing up risk registers. It’s a practice I find genuinely valuable, and I came into this LangGraph security project assuming I’d apply it in more or less the standard way.

Then I tried to run STRIDE on a LangGraph agent and found that the standard frameworks were leaving meaningful gaps. Not because they’re bad frameworks — STRIDE is excellent for what it was designed for — but because they make assumptions about how software systems work that LLM-based agents violate.

This post is about understanding those gaps, adapting existing frameworks to fill them, and then actually building a threat model for a real agent. The goal isn’t to produce a beautiful artifact to file away. It’s to produce something actionable — a prioritized backlog of things to fix and a clear map of which controls address which threats.

Why Standard Threat Models Fall Short

Before I get into how to adapt them, I want to be precise about where they break down. There are three specific assumptions that conventional threat modeling makes which don’t hold for LLM agents.

Assumption 1: The System is Deterministic

STRIDE, PASTA, attack trees — all of these were designed for systems where a given input produces a predictable output. A SQL injection payload either succeeds or fails. A forged authentication token either passes validation or it doesn’t. You can enumerate the attack paths, test them, and reason definitively about whether a control works.

LLM-based agents are probabilistic. The same injection payload might succeed 30% of the time, fail 60% of the time, and produce partial compliance 10% of the time. The success rate varies with model version, context window state, system prompt phrasing, and factors nobody fully understands. This has two important implications that standard threat modeling doesn’t capture:

First, absence of exploitation in testing is not proof of absence of vulnerability. A prompt injection attempt that fails consistently in your test environment might succeed in production where the model is a different version, the context window is longer, or the system prompt is slightly different. The threat is still there — you just didn’t happen to trigger it.

Second, statistical persistence is a real attack strategy. A defense that blocks 95% of injection attempts still fails 1 in 20 times. An attacker with automated tooling submitting variations across many sessions will eventually find one that works. Threat models that only classify threats as present or absent miss this entirely.

Assumption 2: Trust Boundaries are Static

In conventional systems, a trust boundary is a fixed architectural feature — the line between the internet and the internal network, between the application tier and the database tier. You draw it on your DFD once, and it doesn’t move.

In a LangGraph agent, trust boundaries are dynamic. The agent’s effective trust level changes based on what instructions it has received, what content it has retrieved, and what state it is currently in. An agent that’s been successfully injected may be operating under instructions that have effectively elevated an untrusted user to operator level — but from the infrastructure’s perspective, nothing has changed. Same credentials, same permissions, same network access. The trust escalation happened inside the context window.

Assumption 3: Data and Instructions are Distinct

The entire discipline of injection prevention in conventional security — SQL parameterization, HTML escaping, command-line argument quoting — is built on maintaining a hard distinction between data and code. Data goes in one place. Code (instructions) go in another. Injection attacks work by smuggling code into the data channel.

In LLM systems, this distinction doesn’t exist. Everything the model reads is structurally identical — token sequences — and any of it can function as instructions. A retrieved document, an API response, a database record, an image caption — all of it is simultaneously data to be analyzed and potential instructions to be followed. There is no separator. There is no quote character. There is no parameterization mechanism.

This collapses the classical distinction between data integrity threats and code execution threats. A threat model that tracks “untrusted data inputs” separately from “instruction injection” is missing the fundamental equivalence between them.

STRIDE Adapted for LangGraph Agents

With those limitations in mind, here’s how I’ve been working through STRIDE in a way that actually captures the relevant threats. Each category gets LLM-specific extensions.

S — Spoofing

Classical spoofing: impersonating a legitimate identity. In LangGraph, spoofing manifests at additional levels that the classical definition doesn’t capture:

Instruction spoofing is the most consequential LLM-specific variant. An attacker crafts content that the model interprets as authoritative instructions from a trusted source — the system, the operator, a higher-privilege component. The attack doesn’t forge a credential. It crafts text that the model treats as if it came from someone with authority.

Classical: Forge an auth token to impersonate an admin user
LLM variant: Craft a prompt that causes the LLM to believe it's
             receiving operator-level instructions, overriding
             user-level restrictions

Agent identity spoofing in multi-agent systems: a compromised sub-agent, or a message injected into inter-agent channels, impersonates a legitimate agent to inherit its trust level in the supervisor graph.

Tool response spoofing: a man-in-the-middle attack on an unencrypted tool API connection returns fabricated data the agent treats as legitimate.

Spoofing Variant	Location	Severity
Instruction spoofing via injection	User input, retrieved content	Critical
Agent identity spoofing	Inter-agent messages	High
Tool response spoofing	API responses	High
System prompt impersonation	Context window injection	Critical

T — Tampering

Classical tampering: modifying data without authorization. In LangGraph, every component of the execution pipeline is a potential tampering target — and because state flows through the entire graph, tampering at any point propagates forward:

State tampering: modifying the shared state object between nodes — through a compromised node, a deserialization vulnerability in checkpointing, or direct access to the checkpoint store. Because all subsequent nodes read from state, a tampered state corrupts everything downstream.

Checkpoint tampering: modifying persisted state snapshots. A tampered checkpoint that’s later resumed causes the agent to execute from a corrupted starting point, potentially with different permissions or injected instructions embedded in its history.

Tool output tampering: modifying data returned by a tool before it reaches the LLM. An integrity-unchecked response can be fabricated to redirect the agent’s subsequent reasoning.

Memory store tampering: directly modifying long-term memory — inserting false beliefs, removing safety-relevant memories, modifying stored policies.

R — Repudiation

Repudiation: performing an action and then denying it. This is structurally complex for agents because the agent — not a human — takes the actions, and tracing consequences back to specific inputs requires detailed execution logs that many deployments don’t maintain.

Agent action repudiation: the agent sends an unauthorized email or modifies a database record. If execution logs are incomplete, neither operator nor user can definitively establish what caused the action or whether it was authorized.

Injection repudiation: indirect injection leaves no trace in the user’s input log — the malicious instruction was in retrieved content. Without logging the full content of every retrieved document, the injection vector is invisible after the fact.

Human approval repudiation: if an interrupt approval isn’t logged with sufficient fidelity — who approved, when, what state they reviewed — the approval can be spoofed or disputed.

LangGraph’s checkpointing provides a partial mitigation: there’s a complete execution trace in principle. The challenge is making that trace tamper-evident, sufficiently detailed, and retained long enough to be useful in post-incident analysis.

I — Information Disclosure

Information disclosure in LangGraph agents is particularly acute because agents routinely hold sensitive data — credentials, personal information, proprietary documents — in their context windows, alongside tools capable of transmitting that data externally.

Context window disclosure: the agent’s context at any moment may contain system prompt contents, API keys, and user data. A successful injection causing the agent to output its context can disclose all of it in a single interaction.

State store disclosure: the checkpoint store contains a complete history of every input, output, tool call, and intermediate state. Unauthorized access is equivalent to reading every conversation the agent has ever had.

Cross-session disclosure: in multi-tenant deployments, insufficient thread ID isolation can cause one user’s context to appear in another’s session — a serious privacy violation.

D — Denial of Service

Agent DoS has unique economic characteristics absent from conventional DoS:

Economic DoS: an agent DoS can work by causing the agent to consume expensive LLM API calls — not overwhelming compute capacity. A runaway loop can generate hundreds of dollars in costs in minutes. The attacker is conducting an economic attack, not just a service disruption.

Semantic DoS: the agent appears to run normally — processing, calling tools, generating outputs — but makes no useful progress. Conventional uptime monitoring won’t catch it. The agent is “healthy” by every technical metric while being completely useless.

Quality degradation DoS: the attacker doesn’t stop the agent, just degrades its outputs through context flooding or irrelevant content injection. No error logs. No technical failure. Just subtly wrong responses.

E — Elevation of Privilege

Tool privilege escalation: manipulating the agent into using tools with higher privilege than the user’s actual authorization level through a carefully crafted request.

Cross-agent privilege escalation: exploiting a low-privilege sub-agent to influence a high-privilege supervisor. The sub-agent’s output inherits the supervisor’s trust level when it enters supervisor state.

Instruction authority escalation: crafting user-level input the model interprets as operator-level instructions, effectively elevating the user’s authority within the context window.

Checkpoint privilege escalation: replaying a checkpoint from before a permission-restricting policy update to operate under the old, broader permissions.

Three Additional Threat Dimensions STRIDE Can’t Capture

Even a fully adapted STRIDE doesn’t complete the picture. Three dimensions are unique to LLM-based agents and need explicit addition to the framework.

Dimension 1: Probabilistic Failure

STRIDE thinks in binary terms: a threat either succeeds or it doesn’t. For LLM-based threats, this framing is wrong. Threats succeed probabilistically.

Threat models for LangGraph agents need to assign probability distributions, not binary classifications. A threat rated “low likelihood” in STRIDE terms might still succeed 5% of the time — which, under automated attack, means one successful exploitation per 20 attempts. That’s not “low likelihood” from an operational standpoint.

This also changes how you think about testing. Failing to reproduce an attack in 10 test runs doesn’t mean the attack doesn’t work — especially for probabilistic attacks where success requires the right combination of context window state and model behavior. You need statistical testing across many trials to actually characterize a threat’s success rate.

Dimension 2: Emergent Attack Paths

In conventional systems, you can enumerate attack paths: map every code path, identify the exploitable ones, test them. In LLM-based agents, the attack path includes the model’s reasoning process — which is not enumerable. The model may take actions that no engineer anticipated, because no engineer fully specified what the model should do in every possible context. That specification is implicit in training weights, not in code.

The practical implication: treat your threat model as permanently incomplete. It’s not a one-time artifact — it’s a living document that needs to be supplemented with ongoing red-teaming and adversarial testing in production. New attack paths will be discovered in production that no pre-deployment threat model would have anticipated.

Dimension 3: Instruction-Data Equivalence

In LLM systems, data and instructions are structurally identical. Every data source the agent reads is simultaneously a potential instruction source. This collapses the boundary between data integrity threats and code injection threats.

A threat model that treats “untrusted data inputs” as a separate category from “instruction injection” will miss the fundamental equivalence between them. When building your threat model, every untrusted data source needs to be analyzed as a potential instruction injection vector, not just as a data quality concern.

Building an Actual Threat Model: Step by Step

Here’s the process I’ve developed for producing threat models that are actionable rather than just comprehensive. I’ll walk through it with a concrete example agent.

Step 1: Define the Agent and Its Trust Boundaries Precisely

Start by documenting the agent with specificity. Vague descriptions produce vague threat models:

Agent Name:      Customer Research Assistant
Purpose:         Research customer accounts, generate
                 briefing reports for sales team
Model:           Claude Sonnet (via Anthropic API)
Tools:           - CRM read access (Salesforce)
                 - Web search (Tavily API)
                 - Web browser (Playwright)
                 - Email send (SendGrid, company domain)
                 - Document storage read (S3 bucket)
Memory:          Per-session only (no long-term memory)
Checkpointing:   PostgreSQL, thread_id = user_id + session_id
Deployment:      Internal tool, authenticated employees only
Human-in-loop:   None currently configured

Then draw the trust boundary diagram — a data flow diagram showing every component, every data flow, and every trust boundary crossing:

[Employee Browser] ──(HTTPS)──► [Agent API Gateway]
                                        │
                               [Auth: SSO validation]
                                        │
                                        ▼
                               [LangGraph Runtime]
                                   │    │    │
                           ┌───────┘    │    └────────┐
                           ▼            ▼             ▼
                     [Salesforce    [Tavily       [SendGrid
                      CRM API]      Search API]   Email API]
                           │            │
                           │    ┌───────┘
                           │    ▼
                           │  [Playwright Browser]
                           │    │
                           │    ▼
                           │  [Open Internet]  ◄── UNTRUSTED
                           │
                           ▼
                     [PostgreSQL
                      Checkpoint Store]

The trust boundaries I’d explicitly document here:

Between employee and agent (authenticated, but potentially adversarial)
Between agent and open internet (entirely untrusted)
Between agent and internal APIs (trusted, but can be abused)
Between runtime and checkpoint store (privileged, must be protected)

Step 2: Enumerate Assets

List what the agent has access to and what would be damaging to lose, expose, or corrupt:

Asset	Sensitivity	Impact if Compromised
Salesforce CRM data	High	Customer PII exposure, competitive intelligence
Employee SSO sessions	Critical	Account takeover
SendGrid email capability	High	Phishing, spam, reputation damage
System prompt contents	Medium	Business logic exposure
Checkpoint store history	High	Complete interaction history of all users
Anthropic API key	High	Cost abuse, agent impersonation
Web browsing capability	Medium	SSRF, internal network access

Step 3: Apply Adapted STRIDE to Each Component

For each component and each data flow, work through the STRIDE categories and the three LLM-specific dimensions. Here’s what this looks like for the web browser tool specifically:

Component: Web Browser Tool
Data Flow: Agent → Playwright → Open Internet → Agent State

S (Spoofing):        Retrieved webpage impersonates operator
                     instructions via indirect injection.
                     Severity: CRITICAL

T (Tampering):       Retrieved content modifies agent state
                     with adversarial instructions.
                     Severity: CRITICAL

R (Repudiation):     Injection source (webpage) not logged;
                     cannot reconstruct attack path post-incident.
                     Severity: HIGH

I (Disclosure):      Agent instructed to exfiltrate CRM data
                     via encoded URL parameters in search queries.
                     Severity: CRITICAL

D (DoS):             Injection causes recursive page fetching,
                     exhausting token budget.
                     Severity: HIGH

E (Privilege):       Low-trust web content gains ability to
                     invoke high-privilege email tool via injection.
                     Severity: CRITICAL

LLM Dimensions:
───────────────────────────────────────────────────────
Probabilistic:       Injection success varies by payload
                     sophistication and model version.
                     Estimated 15-40% for naive payloads.

Emergent paths:      Model may follow multi-step injection
                     chains not anticipated in testing.

Instruction-data:    ALL web content is an instruction source.
                     Nothing from the open internet should be
                     treated as data-only.

Step 4: Score and Prioritize

Use a modified DREAD scoring approach, adjusted for the probabilistic nature of LLM threats:

Factor	Scoring Guidance
Damage	Worst-case real-world consequence (1-10)
Reproducibility	For LLM threats: expected success rate across 100 attempts, not binary
Exploitability	Skill and access required
Affected users	Scope of impact
Discoverability	How easily an attacker finds this

The output is a prioritized backlog:

CRITICAL (fix immediately):
  ├── Indirect injection via browser → CRM exfiltration
  ├── Indirect injection via browser → email tool misuse
  └── Checkpoint store unauthorized access

HIGH (fix before production):
  ├── Direct injection → tool privilege escalation
  ├── Token budget exhaustion via loop induction
  └── Cross-session state leakage via predictable thread IDs

MEDIUM (next development cycle):
  ├── System prompt extraction via inference
  ├── Tool output poisoning via compromised API
  └── Agent action repudiation due to incomplete logging

LOW (monitor):
  ├── Sycophantic drift over extended sessions
  └── Quality degradation via context flooding

Step 5: Map Threats to Controls

For each threat, identify the specific controls that mitigate it. This is where the threat model connects to your engineering backlog:

Threat: Indirect injection via browser → email tool misuse

Mitigating controls:
1. Content isolation: sanitize retrieved content before
   adding to agent state
2. Tool invocation policy: require explicit user confirmation
   before any email send
3. Output filtering: scan LLM outputs for email addresses
   not present in the original user request
4. Monitoring: alert on email sends with external URLs not
   from approved domains
5. Least privilege: restrict email tool to user's own
   address book, no arbitrary recipients

Residual risk after controls: LOW

Step 6: Treat the Threat Model as a Living Document

The biggest mistake I see with threat models is producing them once and filing them away. They become confidence-inspiring artifacts rather than useful ones.

The threat model needs to be updated when:

New tools are added to the agent
Data sources change (new RAG sources, new APIs)
The underlying model is updated or replaced
New injection techniques are publicly disclosed
A security incident or near-miss occurs
The deployment context changes

I’ve started version-controlling the threat model alongside the agent code and requiring a threat model review for any PR that adds a tool, changes a data source, or modifies the agent’s permissions. It adds friction. It’s worth it.

The Minimal Viable Threat Model

Not every team has resources for a full STRIDE analysis. When I’m advising teams who need to start somewhere, I give them five questions. The answers provide enough signal to prioritize the most critical work:

Question 1: What can this agent do? List every tool and its worst-case action. “This agent can send emails, query the database, and browse the web.”

Question 2: What data does this agent read? List every untrusted data source. “User messages, web pages, CRM records, uploaded documents.”

Question 3: What’s the worst thing that could happen? For each tool, identify the catastrophic failure mode. “Email tool: sends phishing email to all customers. Database tool: deletes production records.”

Question 4: What cannot be undone? List all irreversible actions. “Sent emails, deleted records, external API calls that trigger downstream processes.”

Question 5: How would we know if something went wrong? Identify the detection gaps. “We have no monitoring on tool call arguments. We don’t log retrieved web content. We have no alert for anomalous email recipients.”

The answers to these five questions tell you where to invest first. If you can’t answer Question 5 for any of your critical threats, your first priority isn’t prevention — it’s detection. You can’t fix what you can’t see.

What the Threat Model Actually Produces

A completed threat model produces four artifacts that serve different purposes:

The data flow diagram with trust boundaries tells engineers where to put controls. Every trust boundary crossing should correspond to a validation, authentication, or monitoring control in code.

The prioritized threat register tells security teams and leadership what to fix and in what order. It enables resource allocation decisions based on actual risk, not intuition.

The control mapping becomes acceptance criteria for security testing. Each control should have a corresponding test that verifies it works. No test, no confidence.

The residual risk summary is the honest accounting of what the agent still can’t do safely after all controls are in place. If residual risk on any threat is unacceptably high, the threat model is where that gets documented — and where the architectural change needed to address it gets specified.

Together, these artifacts transform the cataloguing work from the previous two posts into an engineering roadmap. That’s the point.

Where This Leaves Me

I want to be honest about the limits of this process. Even a thorough threat model for a LangGraph agent will be incomplete in ways that a threat model for a conventional web application wouldn’t be. The probabilistic behavior of LLMs means some attack paths only manifest under specific conditions that testing may not reproduce. The emergent reasoning of agents means some attack paths won’t be anticipated at all until they’re discovered in production.

The best threat modeling I’ve found for agents is iterative and ongoing — starting with a structured analysis like the one above, supplementing it with adversarial testing and red-teaming, and treating every security incident or near-miss as new information that updates the model.

It’s more work than threat modeling for conventional software. The alternative is deploying agents with capabilities you haven’t thought carefully about against threats you haven’t mapped. Given what I’ve catalogued in the previous posts, that seems like the worse option.

In the next post, I’ll move from analysis to defense — specifically, input validation and how to think about it in the context of natural language systems where you can’t just schema-validate your way to safety.

This is Part 5 of an ongoing series on LangGraph agent security. Previous posts: Part 1: Introduction · Part 2: Architecture Primer · Part 3: Attack Surface Analysis · Part 4: Core Threat Categories. Next: Part 6: Input Validation.