LangGraph Agent Security: What I Wish Someone Had Told Me Before I Started
By someone who should have known better
I’ve spent years studying AI systems — their capabilities and their failures. I’ve read the papers on adversarial robustness, thought carefully about AI safety, and written more than a few words about the gap between what AI models say they’ll do and what they actually do.
And yet when I started seriously building agents with LangGraph a few months ago, I found myself staring at my own code thinking: I have created something I don’t fully know how to secure.
That’s not a comfortable feeling for someone who’s supposed to know things about AI systems.
This post is my attempt to organize what I’ve learned since. It’s written for people like me — researchers, ML practitioners, people who understand models but are newer to the engineering realities of deploying agents in production. I’ll assume you know what an LLM is and have some sense of why they fail. What I want to add is the security practitioner’s lens, which I had to develop somewhat painfully from scratch.
Fair warning: I’m going to be opinionated. Some of what I say will be “here’s the obvious thing that took me too long to realize.” Security people will recognize most of this immediately. If that’s you, hi, I respect your work more than I did six months ago.
First, a Clarification I Wish Had Been Made to Me Earlier
Before diving in, I want to draw a distinction that the field often blurs: AI safety versus AI security. As researchers, we tend to conflate these, but they’re pointing at genuinely different problems.
AI safety is about the system behaving as intended. It’s about alignment, reliability, preventing the model from causing unintentional harm. The adversary, if there is one, is entropy — randomness, distribution shift, hallucination, poor training data. A medical AI giving a wrong dosage because of a calculation error is a safety problem. A self-driving car misreading a stop sign in the rain is a safety problem. The threat is internal and accidental.
AI security is about protecting the system from intentional external threats. It’s about adversaries who are actively trying to break, trick, manipulate, or steal from the system. Prompt injection to bypass filters is a security problem. Data poisoning to create backdoors is a security problem. Model inversion to extract training data is a security problem. The threat is external and deliberate.
This guide is about security. Not because safety doesn’t matter — it absolutely does — but because the security questions are under-discussed relative to the safety questions in the research community, and they become urgently practical the moment you deploy an agent in production.
Most of what I write here applies to AI agents broadly, not just LangGraph specifically. But LangGraph is the framework I’ve been working in, and it’s a good lens because it makes the architectural decisions explicit in ways that help security reasoning.
What LangGraph Actually Is (And Why It’s Different)
If you’ve been in NLP or ML for a while, you probably have a mental model of LLM applications as essentially fancy input-output functions. You give the model a context, it generates a completion. Even the more sophisticated setups — RAG pipelines, chain-of-thought, few-shot prompting — are variations on this theme. Data in, tokens out.
Agents break this mental model in a way that has real consequences.
The core idea is that you model an agent’s behavior as a directed graph — a network of nodes and edges — rather than a linear sequence of calls. Each node is a discrete unit of work: an LLM inference call, a tool invocation, a validation step, a routing decision. Each edge defines how execution flows between nodes, and critically, those edges can be conditional — they evaluate the current state and decide where to go next.
What this buys you is the ability to express behavior that linear chains simply cannot:
- Cycles and loops — the agent can reason, act, observe the result, and reason again, iterating until some goal is satisfied
- Branching logic — different paths through the graph based on intermediate results
- Persistent state — a shared data structure flows through the graph, accumulating context across every step
- Human-in-the-loop — execution can pause at defined breakpoints, waiting for human input before continuing
- Multi-agent coordination — separate graphs can delegate to each other, with supervisors orchestrating specialized sub-agents
Here’s a minimal example that shows the basic structure:
```python
from langgraph.graph import StateGraph, END
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    tool_results: list
    next_action: str
    completed: bool

def call_llm(state: AgentState) -> AgentState:
    # LLM reads the current state and decides what to do next
    # (`llm` is assumed to be a chat model already bound to the available tools)
    response = llm.invoke(state["messages"])
    return {
        "messages": state["messages"] + [response],
        "next_action": response.tool_calls[0].name if response.tool_calls else "done",
    }

def execute_tool(state: AgentState) -> AgentState:
    # Execute whatever tool the LLM selected
    # (`tool_registry` is assumed to map tool names to invocable tools)
    result = tool_registry[state["next_action"]].invoke(state)
    return {"tool_results": state["tool_results"] + [result]}

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tool", execute_tool)
graph.set_entry_point("llm")
graph.add_conditional_edges(
    "llm",
    lambda state: "tool" if state["next_action"] != "done" else END,
)
graph.add_edge("tool", "llm")  # Loop back after tool use

app = graph.compile()
```
What this encodes is the classic ReAct loop — Reason, Act, Observe, repeat. The LLM decides it needs a tool, the tool runs, the result feeds back into the LLM’s context, and it decides again. This continues until the LLM determines it has enough to answer.
Notice something immediately: nothing in that code bounds the loop. There's no explicit maximum-iteration guard; the only backstop is LangGraph's runtime recursion limit (25 steps by default), which halts execution with an error rather than degrading gracefully. This is a small but telling example of the security-relevant decisions lurking throughout the framework: things that seem like innocent implementation details until you think about what an adversary could do with them.
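Since a conditional edge is just a Python function, one way to bound the loop is a step counter checked deterministically in code. A minimal sketch, where `MAX_STEPS`, the state shape, and the `"END"` / `"tool"` return values are illustrative stand-ins for whatever your graph actually uses:

```python
# A belt-and-braces iteration guard, enforced in code rather than by the model.
# MAX_STEPS and the state-dict shape are illustrative assumptions.
MAX_STEPS = 15

def route_after_llm(state: dict) -> str:
    """Conditional-edge function that refuses to loop forever."""
    if state.get("step_count", 0) >= MAX_STEPS:
        return "END"  # hard stop, no matter what the model asked for
    if state.get("next_action") == "done":
        return "END"
    return "tool"
```

LangGraph also lets you cap execution per invocation via configuration, e.g. `app.invoke(inputs, config={"recursion_limit": 10})`; an explicit guard inside the graph additionally lets you stop gracefully instead of with a raised error.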
A Brief History, Because Context Matters
LangGraph didn’t appear from nowhere. Its origin is bound up with the evolution of LangChain, which Harrison Chase launched in October 2022 as an open-source project while working at Robust Intelligence. LangChain’s initial approach — organizing LLM calls into linear “chains” — worked well for straightforward applications but proved too rigid as developers pushed toward more autonomous behavior. The framework had a single AgentExecutor class for agent-like behavior, but as Chase later reflected, developers wanted far more flexibility and control than that one abstraction could provide.
The team started developing LangGraph in summer 2023 and launched it in early 2024, with two explicit design pillars: full controllability (no hidden prompts or context engineering — you own the entire execution graph) and a production-grade runtime with first-class support for streaming, statefulness, human-in-the-loop, and durable execution.
LangGraph is built by LangChain Inc. but is explicitly designed to be usable without LangChain. The graph model draws inspiration from Google’s Pregel and Apache Beam, with a public interface influenced by NetworkX — an interesting design genealogy that reflects the team’s orientation toward production reliability over research novelty. The broader ecosystem it sits in pairs LangGraph (agent orchestration) with LangChain (integrations and components) and LangSmith (observability), a three-layer architecture that companies including LinkedIn, Uber, J.P. Morgan, and BlackRock have validated in production.
I mention this history because it’s relevant to security thinking. LangGraph was explicitly designed for production control flow, not just research experimentation. The design choices — explicit state, explicit edges, explicit interrupt points — reflect a team that had thought about what could go wrong. That doesn’t mean it’s secure out of the box, but it does mean the framework provides the primitives you need to build securely, which is more than can be said for some alternatives.
The Leap From Model to Agent: Why This Changes Everything
As researchers, we tend to think about LLMs in terms of their capabilities and limitations as statistical models. But deploying an agent forces a shift in how you think about failure consequences, and I found this shift required some recalibration on my part.
Consider the difference plainly:
| Property | Chatbot | LangGraph Agent |
|---|---|---|
| Output type | Text | Actions in the real world |
| External access | None | APIs, databases, email, code execution, web |
| State | Stateless per turn | Persistent, mutable, accumulating |
| Autonomy | None — responds to each prompt | Makes sequential decisions with minimal oversight |
| Duration | Single inference | Potentially hours or days |
| Worst-case failure | A wrong or misleading sentence | Deleted records, sent communications, financial transactions |
That last row is where my recalibration happened. As a researcher, I’m accustomed to thinking about model failures in terms of their outputs: hallucinations, biased text, incorrect reasoning. These matter, but their harm is bounded by the output modality. Bad text is bad. But it’s text.
An agent with tool access doesn’t produce bad text. It takes bad actions. The gap between “what the model decided” and “what happened in the world” collapses completely. The model doesn’t say it could book a flight. It books the flight. It doesn’t describe how data could be exfiltrated. It exfiltrates the data.
Let me make this concrete with a scenario I found clarifying.
The Same Task, Two Systems
A user sends: “Book me a flight to Chicago next Tuesday.”
A chatbot responds: “Sure! Here are some options for flights to Chicago on Tuesday. Would you prefer morning or evening departure? You can book directly at united.com.”
Words about a booking. No booking.
A LangGraph agent executes:
```
[START]
   │
   ▼
[parse_request] → extracts: Chicago, next Tuesday, user prefs
   │
   ▼
[search_flights] → calls flight API → 12 results
   │
   ▼
[filter_preferences] → narrows to 3 based on stored user preferences
   │
   ▼
[INTERRUPT] ◄── execution PAUSES, sends options to user
   │             waits for confirmation
   ▼ (user selects)
[book_flight] → calls booking API with payment credentials
   │
   ▼
[send_confirmation] → emails itinerary
   │
   ▼
[update_calendar] → modifies Google Calendar
   │
   ▼
[END]
```
The result is an actual booking, an actual email, an actual calendar entry — all from one instruction. The agent isn’t describing what could happen. It’s making things happen.
This is genuinely impressive. It’s also a fundamentally different security surface than anything we typically analyze as researchers.
Not Just Better Automation: The Crucial Difference From Workflow Tools
One frame I initially reached for — and I think many researchers do — was to think of LangGraph agents as sophisticated workflow automation. More intelligent Zapier. Smarter Airflow. That frame is wrong in a way that matters for security.
Traditional workflow automation is deterministic and static. A human engineer pre-defines every step, every branch condition, every possible outcome at design time. The graph is fixed. If a flight API returns an unexpected response format, the workflow breaks — it has no capacity to adapt. When it fails, it fails loudly: you get an error log, a failed run, an alert.
A LangGraph agent is dynamic and reasoning-driven. The LLM at its core reads intermediate results and decides what to do next — selecting tools, forming sub-goals, adjusting strategy based on what it observes. The graph defines the boundaries of what the agent can do, but the LLM determines the path through those boundaries at runtime.
This distinction has a security implication that I keep returning to: traditional automation fails loudly and predictably. Agents can fail silently and plausibly.
A broken Zapier workflow produces an error log. An agent that has been manipulated into pursuing an attacker’s goal continues executing confidently, producing outputs that look reasonable at every step while doing something its designers never intended. There’s no exception thrown. No error in the logs. Just a sequence of individually-normal-looking API calls that together constitute something harmful.
The power of reasoning-driven execution and the risk of reasoning-driven execution are literally the same thing. You cannot have the adaptability without the unpredictability.
The Landscape Beyond LangGraph
Before going further, it’s worth briefly noting that LangGraph is not the only framework in this space, and the security concerns I’m going to describe are not LangGraph-specific. They’re properties of the entire class of agentic systems.
On the developer framework side, the most prominent alternatives include:
- Microsoft’s AutoGen, which models agent behavior as structured conversations between multiple agents that debate and critique each other. More conversational, less graph-structured.
- CrewAI, which organizes agents into role-based teams. Popular for task decomposition where you want clear specialization.
- OpenAI Agents SDK — opinionated, managed, tightly integrated with OpenAI’s tooling.
- Google’s Agent Development Kit (ADK) — similar first-party integration for Gemini-powered systems.
At the higher-autonomy end sits Manus, an autonomous agent that launched in March 2025 and was later acquired by Meta, designed to independently carry out complex real-world tasks including writing and deploying code — with minimal human guidance. And OpenClaw has emerged as an open-source framework emphasizing general-purpose autonomy across web browsing, code execution, file systems, and messaging platforms. (OpenClaw is relatively new — it was originally named ClawdBot, briefly renamed MoltBot, and is still evolving, so treat it as an emerging tool rather than a mature enterprise option.)
The security principle that cuts across all of these: the more autonomy a framework grants, the larger the attack surface becomes. A highly autonomous system like Manus is powerful precisely because it removes friction — and removing friction from legitimate workflows also removes friction from adversarial ones.
LangGraph sits in an interesting middle position here. It gives you more explicit control over graph structure than highly autonomous systems, which means more places to insert security controls — but it also gives the LLM significant decision-making authority, which means those controls are genuinely needed.
Six Security Implications I Had to Learn
Okay, this is the section I wish I’d read before building anything. Let me walk through the security implications that I either missed initially or understood intellectually but didn’t really feel until I saw them in practice.
1. Agents Take Real-World Actions
I keep saying this because I kept having to remind myself of it. LangGraph agents are typically connected to tools: web search, code execution, databases, email clients, payment APIs, internal services. A compromised or manipulated agent doesn’t produce a harmful text string — it executes harmful operations. The gap between “what the model decided” and “what actually happened” is zero.
2. The Attack Surface Extends in Every Direction
In a conventional application, the attack surface is relatively bounded: HTTP endpoints, database queries, file uploads, user inputs. You can enumerate them.
In a LangGraph agent, the attack surface includes every source of data the agent reads. And agents are designed to read broadly:
- User messages (obvious)
- Retrieved documents from vector stores (less obvious)
- Webpages the agent browses (genuinely non-obvious until you think about it)
- API responses from tool calls
- Outputs from other agents in a multi-agent setup
- Database records it queries
Every one of these is a potential channel for an adversary to deliver instructions to the LLM. This matters because the attacker doesn’t have to interact with your agent directly. They can publish a webpage with a malicious instruction embedded in it, wait for an agent to retrieve it during a legitimate research task, and let the injection happen through the retrieval pipeline. This is called indirect prompt injection, and it’s one of the more unsettling attack patterns I’ve encountered.
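One cheap (and admittedly weak) mitigation is to screen retrieved content for instruction-like text before it ever reaches the model's context. A sketch, with a pattern list that is purely illustrative; regexes will not stop a determined attacker, so treat this as one layer of defense in depth rather than a fix:

```python
import re

# Naive screen for instruction-like phrases in retrieved content.
# The pattern list is an illustrative assumption, far from exhaustive.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"do not tell the user",
]

def flag_injection(document: str) -> bool:
    """Return True if retrieved text contains instruction-like phrases."""
    text = document.lower()
    return any(re.search(p, text) for p in SUSPICIOUS_PATTERNS)
```

A flagged document might be dropped, quarantined for review, or passed through with a warning, depending on how much you trust the source channel.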
3. LLMs Are Not Deterministic Security Controls
This is the one that I, as an AI researcher, should have understood immediately but somehow needed to learn again in the context of deployment.
The mistake is treating the model’s refusal behavior as a security boundary. Assuming that because the LLM “knows” it shouldn’t do X, X can’t happen. LLMs are probabilistic, instruction-following systems. They can be manipulated through carefully crafted prompts. They have known and evolving jailbreak techniques. A model that refuses a direct request might comply with the same request wrapped in a fictional framing, approached gradually across multiple turns, or after a certain amount of carefully crafted context has accumulated.
Using model judgment as your primary security guarantee is architecturally equivalent to relying on client-side JavaScript validation as your only defense against SQL injection. It'll stop naive attacks. It is not a guarantee.
I find it helpful to think of it this way: the model’s alignment training is a property of its statistical behavior, not a cryptographic commitment. There’s no hash check. No signature verification. No formal proof. Just probabilities that can shift under adversarial pressure.
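What this implies in practice is that authorization has to live in ordinary code, outside the model. A sketch of a deterministic gate, where the tool names and the two-tier policy are hypothetical examples:

```python
# Deterministic tool authorization, enforced outside the model.
# Tool names and the policy split are hypothetical.
ALLOWED_TOOLS = {"search_flights", "filter_preferences"}  # read-only, auto-approved
REQUIRES_APPROVAL = {"book_flight", "send_confirmation"}  # need explicit sign-off

def authorize_tool_call(tool_name: str, approved: bool = False) -> bool:
    """The model's tool choice is a request, not a decision."""
    if tool_name in ALLOWED_TOOLS:
        return True
    if tool_name in REQUIRES_APPROVAL and approved:
        return True
    return False  # unknown or unapproved tools are denied by default
```

The point is that this check cannot be talked out of its policy: no amount of adversarial context accumulating in the conversation changes what the function returns.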
4. Failures Compound Across Steps
A single LLM inference call has one failure point. A LangGraph agent running a twenty-step workflow has twenty. And because state accumulates across the graph, an early failure doesn’t just affect that step — it propagates forward, potentially influencing every subsequent decision.
I ran into a version of this during testing. An early retrieval step in my agent returned content that had been (accidentally, in my test) contaminated with instruction-like text. The LLM at that step incorporated it into its reasoning. Five steps later, the agent was doing something subtly wrong that I couldn’t immediately trace back to its source. The state had been poisoned early and the corruption compounded with each step.
In production, with an actual adversary constructing that contamination deliberately, this is significantly more serious. By the time you notice something is wrong, the agent may have taken several irreversible actions.
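One habit that would have saved me debugging time: tag everything that enters the state with its provenance, so contamination can be traced back to a source. A sketch, with the field names and trust labels as illustrative choices rather than any framework convention:

```python
from dataclasses import dataclass

@dataclass
class StateEntry:
    content: str
    source: str    # e.g. "user", "web_retrieval", "tool:search_flights"
    trusted: bool  # only direct user input / internal systems marked trusted

def untrusted_sources(history: list[StateEntry]) -> list[str]:
    """List the sources that contributed untrusted content to the state."""
    return sorted({e.source for e in history if not e.trusted})
```

When a late step misbehaves, a provenance trail turns "something five steps back poisoned the state" from a reconstruction exercise into a query.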
5. Silent Failures Are the Norm
Traditional software fails loudly. Exceptions get thrown. Error logs get written. Alerts fire. There’s usually a clear signal that something went wrong.
Agents can fail silently, in ways that look superficially correct at every observable step. An agent that has been manipulated into exfiltrating data via carefully chosen API call parameters doesn’t throw an exception while it’s exfiltrating. The logs show normal tool calls with normal-looking arguments. The outputs look reasonable. The only way to detect it is with purpose-built monitoring that looks at the semantic content of what the agent is doing, not just whether the execution succeeded.
This is genuinely new territory for observability engineering. Most of the monitoring tooling we have is oriented toward detecting technical failures. Detecting intentional misbehavior that looks like correct behavior requires different instrumentation entirely.
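As one concrete example of semantic monitoring, you can check whether outbound URLs in tool-call arguments point at hosts you actually expect. A sketch, with a hypothetical domain allowlist:

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts this agent should ever contact.
EXPECTED_DOMAINS = {"api.airline.example", "calendar.google.com"}

def audit_tool_args(args: dict) -> list[str]:
    """Return any URLs in tool arguments that target unexpected hosts."""
    alerts = []
    for value in args.values():
        for url in re.findall(r"https?://\S+", str(value)):
            if urlparse(url).netloc not in EXPECTED_DOMAINS:
                alerts.append(url)
    return alerts
```

A tool call that "succeeds" while shipping data to an unrecognized host is exactly the kind of event that execution-status monitoring never surfaces.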
6. The Stakes Are Scaling Faster Than the Defenses
LangGraph agents are no longer confined to research contexts or internal tools. They’re being deployed in customer support, legal document processing, financial analysis, software engineering workflows, and healthcare triage. The cost of a security failure scales with the sensitivity of the domain. An agent with read/write access to a production database, an enterprise email system, or a cloud infrastructure account is a high-value target — and the security practices around these deployments are, in many cases, not keeping pace with the deployment velocity.
The Mindset Shift That I’m Still Completing
Securing a LangGraph agent requires a different frame than anything I was trained to think about as a researcher. It’s not about evaluating model capabilities. It’s not about measuring robustness to distribution shift. It’s not even really about prompt engineering.
The questions that actually matter are:
- What can this agent do? Not in principle, but concretely: for each tool it has access to, what is the worst possible action it could take if manipulated?
- What can the agent read? Every data source is a potential injection point. Have I treated all of them as untrusted input?
- What cannot be undone? There’s a categorical difference between read-only operations and write operations, between internal queries and external communications. The irreversible actions need the most protection.
- How would I know if something was going wrong right now? If my agent were being actively exploited at this moment — slow data exfiltration, subtle goal manipulation, induced loops — what would the signal look like? How quickly would I see it?
That last question is the one I find most uncomfortable to sit with. The honest answer, for most agent deployments I’ve seen (including my own early ones), is: you wouldn’t know for a while. Possibly a long while.
Security for agentic systems is not a feature you add at the end. It’s a design constraint that needs to shape your graph topology, your tool selection, your state management, and your operational monitoring from the beginning. The rest of this guide is my attempt to work out what that actually looks like in practice.
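To make the "what cannot be undone" question operational, I've found it useful to maintain an explicit risk inventory for every tool, with unclassified tools treated as worst-case. A sketch, using the flight-booking tools from earlier as a hypothetical inventory:

```python
from enum import Enum

class Risk(Enum):
    READ_ONLY = "read_only"          # safe to auto-approve
    REVERSIBLE_WRITE = "reversible"  # can be rolled back if wrong
    IRREVERSIBLE = "irreversible"    # must pause for a human

# Hypothetical inventory for the flight-booking agent sketched earlier.
TOOL_RISK = {
    "search_flights": Risk.READ_ONLY,
    "update_calendar": Risk.REVERSIBLE_WRITE,
    "book_flight": Risk.IRREVERSIBLE,
    "send_confirmation": Risk.IRREVERSIBLE,
}

def needs_human_approval(tool_name: str) -> bool:
    """Fail closed: a tool we haven't classified is treated as irreversible."""
    return TOOL_RISK.get(tool_name, Risk.IRREVERSIBLE) is Risk.IRREVERSIBLE
```

This maps naturally onto LangGraph's interrupt mechanism: route irreversible calls through a breakpoint that waits for human confirmation, and let read-only calls flow through unattended.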
What’s Coming
In subsequent posts, I’ll work through:
- The architecture in depth — how LangGraph’s execution lifecycle actually works, and where each security-relevant event occurs
- Threat categories — prompt injection (direct and indirect), goal hijacking, data exfiltration, denial of service, memory poisoning, supply chain attacks
- Defensive controls — input validation, tool security, state protection, output guardrails, and how they fit together
- Authentication, monitoring, and human-in-the-loop — the operational layer that makes security sustainable
- Testing — adversarial test suites, fuzzing, red-teaming methodology
- Compliance — what GDPR, HIPAA, SOC 2, and the EU AI Act actually mean for agent deployments
I’m writing this as I learn, which means some of it will probably be revised as I develop a better understanding. If you have expertise in any of these areas and think I’ve got something wrong, I genuinely want to know.