13. Human-in-the-Loop: The Defense That Judgment Can’t Replace

Part 13 of the LangGraph Agent Security series

Every defensive control we’ve built so far operates autonomously. Input validation, output guardrails, tool security, anomaly detection — fast, consistent, scalable. They’re also, by design, unable to exercise judgment. A rule can detect that an output matches a known exfiltration pattern. It cannot assess whether a novel, unprecedented action — something no rule anticipated — is wise or catastrophic.

Human-in-the-loop (HITL) controls exist to fill that gap. They’re the mechanism by which an agent’s autonomous execution is interrupted at defined points and a human being gets to review, approve, modify, or reject what the agent is about to do.

I want to make the security case for this explicitly, because there’s a persistent tension between autonomous efficiency and human oversight that teams often resolve in the wrong direction.

Why HITL Is a Security Control, Not Just a Safety Net

Four properties that no automated control possesses:

Novel situation handling. Automated controls defend against known attack patterns. An attacker who finds a novel injection technique, an unprecedented combination of tool calls, or a subtle goal hijacking that doesn’t match any detection pattern will evade automated controls. A human reviewer with full context can recognize “this doesn’t look right” even when no rule fires.

Irreversibility gates. Some actions cannot be undone: an email sent to ten thousand customers, a production database record deleted, a financial transaction submitted, a cloud resource destroyed. For these actions, the cost of a single false negative may vastly exceed the cost of requiring human approval for every instance. No statistical confidence level justifies skipping a human gate on truly irreversible actions.

Accountability anchoring. In regulated industries and high-liability contexts, having a named human being explicitly approve an action creates a documented accountability record that pure automation cannot provide. HITL transforms “the agent did this” into “a named human approved the agent doing this.”

Adversarial robustness. Human reviewers fail differently from automated controls: they are not susceptible to prompt injection the way a model is, encoding variations do not slip past them the way they slip past a pattern matcher, and inputs crafted to fool a classifier do not automatically fool a person. This holds only if the review interface presents information safely (more on that later).

The cost is latency and operational overhead. These are real and must be managed. But for high-stakes, irreversible, or novel actions, the cost is almost always justified.

Where to Place Interrupt Points

The most consequential HITL design decision is placement. Too few interrupts and dangerous actions proceed unreviewed. Too many and reviewers develop approval fatigue: they start clicking approve without reading, which is worse than no review at all because the audit trail then records oversight that never happened.

The three-factor framework I use:

Irreversibility. Can this action be undone? A draft document can be revised; a sent email cannot. Irreversible actions always get an interrupt.

Blast radius. What’s the worst-case consequence if this proceeds maliciously? An action touching one record warrants less scrutiny than one touching ten thousand. Externally transmitted data warrants more scrutiny than internally read data.

Novelty. Has this specific combination of action, target, and context been seen before? First occurrences of new action types warrant elevated scrutiny.

INTERRUPT_CATALOG = [
    InterruptDefinition(
        trigger=InterruptTrigger.EXTERNAL_COMMUNICATION,
        action_patterns=["send_email", "send_sms", "post_to_api"],
        condition=None,  # Always interrupt
        reviewer_role_required="operator",
        requires_justification=True,
        timeout_minutes=30,
        risk_label="External Communication",
        risk_description=(
            "The agent is about to send a message to an external recipient. "
            "This action cannot be recalled after execution."
        ),
        suggested_review_questions=[
            "Is the recipient correct and expected?",
            "Does the message content match the intended purpose?",
            "Is there any unexpected data included in the message?",
        ],
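        escalate_to_role="admin",
        # Must be shorter than timeout_minutes; otherwise the request is
        # auto-rejected before escalation can ever fire.
        escalate_after_minutes=15,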
    ),
    InterruptDefinition(
        trigger=InterruptTrigger.DATA_DELETION,
        action_patterns=["delete_record", "drop_table", "purge_records"],
        reviewer_role_required="admin",
        requires_justification=True,
        timeout_minutes=60,
        risk_label="Data Deletion",
        risk_description="The agent is about to permanently delete data. This operation is irreversible.",
        suggested_review_questions=[
            "Have the records to be deleted been verified?",
            "Has a backup been taken if required?",
            "Is this deletion within the scope of the original request?",
        ],
    ),
    InterruptDefinition(
        trigger=InterruptTrigger.INJECTION_FLAGGED,
        action_patterns=["*"],  # Any action when injection is flagged
        condition=lambda ctx: ctx.get("injection_flags", []) != [],
        reviewer_role_required="operator",
        requires_justification=True,
        timeout_minutes=15,
        risk_label="⚠️ Security: Injection Suspected",
        risk_description=(
            "The security system has flagged potential prompt injection activity. "
            "The agent's proposed action should be reviewed with elevated scrutiny."
        ),
        suggested_review_questions=[
            "Does this action match the original user intent?",
            "Is there any unexpected data in the action parameters?",
            "Could this action have been induced by external content?",
        ],
    ),
]
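The catalog above relies on types and a matching step that are not shown. A minimal sketch of what they might look like follows. The field defaults, the use of `fnmatch` for the `"*"` pattern, and the `first_matching_interrupt` helper are all my assumptions for illustration, not the series' actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from fnmatch import fnmatch
from typing import Callable, Optional


class InterruptTrigger(Enum):
    EXTERNAL_COMMUNICATION = auto()
    DATA_DELETION = auto()
    INJECTION_FLAGGED = auto()


@dataclass(frozen=True)
class InterruptDefinition:
    trigger: InterruptTrigger
    action_patterns: list[str]
    reviewer_role_required: str
    risk_label: str
    risk_description: str
    condition: Optional[Callable[[dict], bool]] = None  # None means "always"
    requires_justification: bool = True
    timeout_minutes: int = 30
    suggested_review_questions: list[str] = field(default_factory=list)
    escalate_to_role: Optional[str] = None
    escalate_after_minutes: Optional[int] = None


def first_matching_interrupt(catalog, proposed_action, context):
    """Return the first definition whose pattern and condition both match, else None."""
    for definition in catalog:
        pattern_hit = any(fnmatch(proposed_action, p) for p in definition.action_patterns)
        condition_hit = definition.condition is None or definition.condition(context)
        if pattern_hit and condition_hit:
            return definition
    return None


# Hypothetical two-entry catalog; the injection rule is listed first on purpose.
demo_catalog = [
    InterruptDefinition(
        trigger=InterruptTrigger.INJECTION_FLAGGED,
        action_patterns=["*"],
        condition=lambda ctx: bool(ctx.get("injection_flags")),
        reviewer_role_required="operator",
        risk_label="Security: Injection Suspected",
        risk_description="Potential prompt injection was flagged in this session.",
    ),
    InterruptDefinition(
        trigger=InterruptTrigger.EXTERNAL_COMMUNICATION,
        action_patterns=["send_email", "send_sms", "post_to_api"],
        reviewer_role_required="operator",
        risk_label="External Communication",
        risk_description="The agent is about to message an external recipient.",
    ),
]
```

With first-match semantics, ordering matters: if the wildcard injection rule sits last, a flagged `send_email` would be presented as a routine external communication. Listing the most severe rule first, or returning every match, are both reasonable ways around that.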

Implementing Interrupts in LangGraph

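LangGraph’s interrupt system uses interrupt() to pause execution at a specific point, persist state, and wait for a resume signal. Persisting state requires the graph to be compiled with a checkpointer, and each run must carry a thread_id so the paused run can be found again. The implementation needs careful coordination between the graph definition and an approval management system.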

The cleanest pattern: a dedicated interrupt node that evaluates whether the current context requires human review, prepares the review package, and either interrupts or passes through:

from typing import Callable

from langgraph.types import interrupt


def create_interrupt_node(controller: InterruptController) -> Callable:
    async def interrupt_node(state: dict) -> dict:
        proposed_action = state.get("proposed_action")
        action_args = state.get("proposed_action_args", {})
        session_context = state.get("trusted", {})

        if not proposed_action:
            return state  # Nothing to interrupt on

        interrupt_def = controller.evaluate_interrupts(
            proposed_action, action_args, session_context
        )

        if not interrupt_def:
            return {**state, "interrupt_result": "approved_automatically"}

        # Build review package
        review_package = controller.build_review_package(
            proposed_action=proposed_action,
            action_args=action_args,
            session_context=session_context,
            interrupt_definition=interrupt_def,
            agent_state=state,
        )

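        # Log the interrupt. On resume LangGraph re-runs this node from the top,
        # so anything above interrupt() executes twice and should be idempotent.
        controller.logger.log_security_event(...)

        # ── EXECUTION PAUSES HERE ──
        # State is persisted. LangGraph waits for Command(resume=...) to be provided.
        approval_response = interrupt(review_package)
        # ── ON RESUME, interrupt() RETURNS THE VALUE PASSED TO Command(resume=...) ──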

        validated = validate_approval_response(
            approval_response, interrupt_def, state
        )

        if not validated["approved"]:
            return {
                **state,
                "interrupt_result": "rejected",
                "task_status": "terminated_by_reviewer",
                # Use the validated result so a validation failure reports its own reason
                "rejection_reason": validated.get("justification"),
            }

        # Apply only what passed validation, not the raw response
        updated_state = apply_reviewer_modifications(state, validated)
        return {
            **updated_state,
            "interrupt_result": "approved",
            "approved_by": validated["reviewer_id"],
        }

    return interrupt_node
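One detail worth planning for: LangGraph resumes an interrupted node by re-running it from the top, with interrupt() returning the resume value on the second pass. Anything before the interrupt() call, such as logging or creating an approval record, therefore runs twice. Below is a sketch of one way to make that harmless, by deriving a deterministic key from the action under review so side effects can be deduplicated. `approval_key` and `OnceLog` are hypothetical helpers of mine, and the in-memory dictionary stands in for whatever store enforces uniqueness in practice.

```python
import hashlib
import json


def approval_key(thread_id: str, proposed_action: str, action_args: dict) -> str:
    """Derive a stable key from the thread and the exact action under review."""
    canonical = json.dumps(
        {"thread": thread_id, "action": proposed_action, "args": action_args},
        sort_keys=True,
        default=str,  # fall back to str() for values JSON cannot encode
    )
    return "apr-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:24]


class OnceLog:
    """In-memory stand-in for an audit log with a uniqueness constraint on the key."""

    def __init__(self):
        self._seen = {}

    def log_once(self, key: str, event: dict) -> bool:
        """Record the event and return True, or return False if the key was seen."""
        if key in self._seen:
            return False
        self._seen[key] = event
        return True
```

Because the key depends only on the thread and the proposed action, the second pass through the node computes the same key and the duplicate write is skipped.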

Building the Review Package

The review package is what the human reviewer sees. It must be comprehensive enough for a genuinely informed decision while being sanitized enough to not expose sensitive data — and critically, it must not be injectable. If adversarial content from the agent’s context can reach the review package and be rendered in the reviewer’s interface, the attacker can manipulate the reviewer directly.

def build_review_package(self, proposed_action, action_args,
                          session_context, interrupt_definition, agent_state):
    return {
        "risk_label": interrupt_definition.risk_label,
        "risk_description": interrupt_definition.risk_description,
        "proposed_action": proposed_action,

        # Human-readable summary — NEVER full action_args (may contain sensitive data)
        "action_summary": self._summarize_action(proposed_action, action_args),

        "session_id": session_context.get("session_id"),
        "task_summary": self._summarize_task(agent_state),

        # Security signals prominently displayed
        "security_flags": agent_state.get("metadata", {}).get("injection_flags", []),

        "suggested_questions": interrupt_definition.suggested_review_questions,
        "reviewer_role_required": interrupt_definition.reviewer_role_required,
        "requires_justification": interrupt_definition.requires_justification,
        "timeout_minutes": interrupt_definition.timeout_minutes,
    }

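The action summary is a human-readable description, something like “Send email to alice@company.com | Subject: Q3 Report | Body length: 847 chars”, rather than the raw email body, which may contain injected content. Short fields that are shown, such as the subject line, can still carry attacker-chosen text, so they should be length-limited and escaped like everything else. The reviewer gets enough to make an informed decision without seeing content that could manipulate them.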

The Approval Workflow

When an interrupt fires, the agent pauses and the approval request needs to reach a reviewer who can act on it in time. This requires a notification system, a review interface, and a response path back to the agent.

import asyncio
import secrets
from datetime import datetime, timedelta, timezone


async def create_and_deliver(self, review_package, interrupt_definition,
                               session_id, thread_id) -> str:
    approval_id = f"apr-{secrets.token_hex(12)}"

    # Store pending approval in durable store
    await self.store.create_approval_request(
        approval_id=approval_id,
        session_id=session_id,
        thread_id=thread_id,
        review_package=review_package,
        required_role=interrupt_definition.reviewer_role_required,
        timeout_at=datetime.now(timezone.utc) + timedelta(
            minutes=interrupt_definition.timeout_minutes
        ),
        escalate_to=interrupt_definition.escalate_to_role,
    )

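    # Notify via Slack, email, PagerDuty as appropriate.
    # The notification carries identifiers only; reviewers open the review
    # interface to see the package. (Field names here are illustrative.)
    notification = {
        "approval_id": approval_id,
        "risk_label": review_package["risk_label"],
        "timeout_minutes": interrupt_definition.timeout_minutes,
    }
    await self._notify_reviewers(notification, interrupt_definition.reviewer_role_required)

    # Schedule timeout handling. Keep a reference to the task: the event loop
    # holds tasks only weakly, so an unreferenced task can be garbage-collected.
    # (Assumes self._timeout_tasks = set() in __init__.)
    timeout_task = asyncio.create_task(
        self._handle_timeout(approval_id, interrupt_definition.timeout_minutes, interrupt_definition)
    )
    self._timeout_tasks.add(timeout_task)
    timeout_task.add_done_callback(self._timeout_tasks.discard)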

    return approval_id

A critical timeout policy: when a timeout occurs, auto-reject the action. Not auto-approve. Auto-reject. “Fail safe” means the default action when something goes wrong is the safe one. If reviewers are unavailable and the agent times out waiting for approval, the irreversible action doesn’t proceed.
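The auto-reject rule can be sketched in a few lines. `InMemoryApprovalStore`, the `rejected_timeout` status, and the seconds-based timer are stand-ins I chose for illustration. The property that matters is that a timeout can only move a request out of the pending state, so it can never override a decision a reviewer already made.

```python
import asyncio


class InMemoryApprovalStore:
    """Stand-in for the durable store; tracks only a status per approval."""

    def __init__(self):
        self.status = {}

    def create(self, approval_id):
        self.status[approval_id] = "pending"

    def resolve(self, approval_id, new_status):
        """Transition only from pending, so a late timeout cannot undo a decision."""
        if self.status.get(approval_id) != "pending":
            return False
        self.status[approval_id] = new_status
        return True


async def handle_timeout(store, approval_id, timeout_seconds):
    """Wait out the review window, then reject if nobody has decided."""
    await asyncio.sleep(timeout_seconds)
    return store.resolve(approval_id, "rejected_timeout")


async def demo():
    store = InMemoryApprovalStore()
    store.create("apr-1")
    store.create("apr-2")
    t1 = asyncio.create_task(handle_timeout(store, "apr-1", 0.01))
    t2 = asyncio.create_task(handle_timeout(store, "apr-2", 0.01))
    store.resolve("apr-2", "approved")  # reviewer decides before the deadline
    await asyncio.gather(t1, t2)
    return store.status
```

An in-process timer like this does not survive a restart. A production system would more likely have a scheduled job scan the stored `timeout_at` values, with the same pending-only transition rule.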

Validating Approval Responses

When a reviewer submits a decision, that response must be validated before it’s used to resume execution. Validation prevents approval spoofing — an attacker submitting a fabricated approval to bypass the review.

def validate_approval_response(raw_response, interrupt_definition, current_state):
    # 0. Response must be a mapping before any field access
    if not isinstance(raw_response, dict):
        return {"approved": False, "justification": "Malformed response"}

    # 1. Required fields present
    required = {"approved", "reviewer_id", "reviewer_role"}
    if interrupt_definition.requires_justification:
        required.add("justification")
    if required - set(raw_response.keys()):
        return {"approved": False, "justification": "Incomplete response"}

    # 2. Reviewer role is sufficient
    role_hierarchy = {"viewer": 0, "standard": 1, "operator": 2, "admin": 3}
    reviewer_level = role_hierarchy.get(raw_response.get("reviewer_role", ""), 0)
    required_level = role_hierarchy.get(interrupt_definition.reviewer_role_required, 99)
    if reviewer_level < required_level:
        return {"approved": False, "justification": "Reviewer role insufficient"}

    # 3. Justification is substantive, not rubber-stamp
    if interrupt_definition.requires_justification:
        # `or ""` guards against an explicit null justification
        if len(str(raw_response.get("justification") or "").strip()) < 20:
            return {"approved": False, "justification": "Justification too brief"}

    # 4. Verify reviewer identity against identity provider
    if not _verify_reviewer_identity(raw_response.get("reviewer_id"),
                                       raw_response.get("reviewer_role")):
        return {"approved": False, "justification": "Reviewer identity unverified"}

    # 5. Reviewer modifications don't expand action scope
    if raw_response.get("modifications"):
        scope_check = _validate_modification_scope(
            raw_response["modifications"], current_state
        )
        if not scope_check["safe"]:
            return {"approved": False, "justification": f"Modifications unsafe: {scope_check['reason']}"}

    return {
        # Strict boolean check: a truthy string such as "false" must not count as approval
        "approved": raw_response["approved"] is True,
        "reviewer_id": raw_response["reviewer_id"],
        "justification": raw_response.get("justification", ""),
        "modifications": raw_response.get("modifications"),
    }
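Step 5 calls a scope check that is not shown. Here is one way it might work, assuming the rule is that a reviewer may narrow an action but not widen it. The `recipients` argument and the specific rules are hypothetical; only the `proposed_action_args` state key comes from the interrupt node earlier in this post.

```python
def validate_modification_scope(modifications: dict, current_state: dict) -> dict:
    """Allow edits that narrow the proposed action; refuse anything that widens it."""
    original = current_state.get("proposed_action_args", {})

    # Reviewers may not introduce arguments the agent never proposed.
    new_keys = set(modifications) - set(original)
    if new_keys:
        return {"safe": False, "reason": f"adds arguments: {sorted(new_keys)}"}

    # Recipient lists may shrink but not grow.
    if "recipients" in modifications:
        added = set(modifications["recipients"]) - set(original.get("recipients", []))
        if added:
            return {"safe": False, "reason": f"adds recipients: {sorted(added)}"}

    return {"safe": True, "reason": ""}
```

The point of the check is that a compromised or careless reviewer session should not be able to turn an approval into a broader action than the one the agent proposed.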

Securing the Review Interface Itself

The review interface is a security surface. If adversarial content from the agent’s context can be rendered in the UI — through an XSS vulnerability, through markdown that renders unexpected links, through content that manipulates the reviewer’s judgment — the attacker bypasses the review entirely.

The requirements I hold to:

class ReviewInterfaceServer:
    def render_review_page(self, approval_request, reviewer):
        # All content is HTML-escaped before rendering
        sanitizer = ReviewPackageSanitizer()
        safe_package = sanitizer.sanitize_for_display(
            approval_request["review_package"], output_format="html"
        )

        csrf_token = secrets.token_hex(32)

        # Security flags displayed prominently when active
        security_warning = ""
        if safe_package.get("security_flags"):
            flags = safe_package["security_flags"]
            security_warning = f"""
<div class="security-warning">
  <h3>⚠️ Security Notice</h3>
  <p>This session has active security flags: {', '.join(flags)}</p>
  <p>Review the proposed action with elevated scrutiny. Consider whether
     this action could have been induced by malicious content the agent processed.</p>
</div>"""

        # Content Security Policy prevents injected scripts
        # Review form uses CSRF protection
        # Requires justification before approve button is enabled
        # POST (not GET) for submission

The security warning for sessions with active injection flags is important and easy to miss in implementation. If the agent was flagged for potential injection activity, the reviewer needs to know that before approving anything — their judgment should be elevated, not routine.
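For the escaping and CSRF requirements, a standard-library sketch is below. `sanitize_for_display` here is my own stand-in for the `ReviewPackageSanitizer` method used above, and `CsrfTokens` is a hypothetical in-memory token registry. A real deployment would tie tokens to the reviewer's authenticated session and would usually lean on the web framework's built-in CSRF support.

```python
import hmac
import html
import secrets


def sanitize_for_display(value):
    """Recursively HTML-escape every string value in a review package."""
    if isinstance(value, str):
        return html.escape(value, quote=True)
    if isinstance(value, dict):
        return {k: sanitize_for_display(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_for_display(v) for v in value]
    return value  # numbers, booleans, None pass through unchanged


class CsrfTokens:
    """Per-approval CSRF tokens held server-side (in memory for this sketch)."""

    def __init__(self):
        self._tokens = {}

    def issue(self, approval_id: str) -> str:
        token = secrets.token_hex(32)
        self._tokens[approval_id] = token
        return token

    def verify(self, approval_id: str, submitted: str) -> bool:
        expected = self._tokens.pop(approval_id, None)  # single use
        if expected is None:
            return False
        # Constant-time comparison on bytes avoids timing leaks and str-type errors.
        return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```

Escaping security flags matters as much as escaping the action summary: flag text is often derived from the same untrusted content that triggered the flag.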

The Before-Not-After Principle

One pattern I see get implemented incorrectly often enough to call out explicitly: interrupt nodes must appear before the sensitive action node in the graph, never after.

from langgraph.graph import END  # `graph` is a StateGraph built elsewhere

# CORRECT: Interrupt BEFORE the action
graph.add_node("prepare_email", prepare_email_node)
graph.add_node("interrupt_check", create_interrupt_node(controller))
graph.add_node("send_email", send_email_node)

graph.add_edge("prepare_email", "interrupt_check")
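graph.add_conditional_edges(
    "interrupt_check",
    # "approved_automatically" is what the interrupt node returns when no
    # review is required; without it, unreviewed actions would never run.
    lambda s: "send_email"
    if s.get("interrupt_result") in ("approved", "approved_automatically")
    else END,
)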

# WRONG: Interrupt AFTER the action — too late, email already sent
# graph.add_edge("send_email", "interrupt_check")

An interrupt placed after the email send provides no protection. The damage is already done.

State Immutability During Pending Review

While an approval request is pending, agent state must be treated as immutable. New messages, timeouts, or external events must not modify the state the reviewer is examining. If state can change between when review begins and when approval is submitted, the reviewer is not actually approving what they think they’re approving:

class PendingApprovalStateGuard:
    async def check_and_block(self, session_id, attempted_operation):
        # Call before every state write; raises if a review is in flight
        pending = await self.store.get_pending_for_session(session_id)
        if pending:
            raise StateLockedError(
                f"Session has a pending approval ({pending['approval_id']}). "
                f"Blocked operation: {attempted_operation}. "
                f"State cannot be modified until approval is resolved."
            )
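A guard like this blocks writes, but it can also help to detect a change that slipped through anyway. One complementary sketch: record a digest of the reviewed action when the review package is built, then compare it when the approval comes back. The helper names are mine, and the choice of which state keys count as "what the reviewer approved" is an assumption.

```python
import hashlib
import json


def action_digest(state: dict) -> str:
    """Hash only the fields the reviewer is actually approving."""
    reviewed = {
        "proposed_action": state.get("proposed_action"),
        "proposed_action_args": state.get("proposed_action_args", {}),
    }
    canonical = json.dumps(reviewed, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_matches_state(digest_at_review: str, state_at_resume: dict) -> bool:
    """True only if the action is byte-for-byte what the reviewer saw."""
    return digest_at_review == action_digest(state_at_resume)
```

If the digests differ, the safe response is the same as for any other HITL failure: reject and ask for a fresh review.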

Graceful Degradation When HITL Fails

HITL systems fail: reviewers unavailable, approval system downtime, notification channels disrupted. The fallback policies must default to the safe side:

FALLBACK_POLICIES = {
    InterruptTrigger.IRREVERSIBLE_ACTION:   FallbackAction.REJECT,
    InterruptTrigger.DATA_DELETION:         FallbackAction.REJECT,
    InterruptTrigger.FINANCIAL_TRANSACTION: FallbackAction.REJECT,
    InterruptTrigger.EXTERNAL_COMMUNICATION: FallbackAction.QUEUE,
    InterruptTrigger.INJECTION_FLAGGED:     FallbackAction.REJECT,
    InterruptTrigger.BULK_OPERATION:        FallbackAction.QUEUE,
    InterruptTrigger.ANOMALY_DETECTED:      FallbackAction.ESCALATE,
}

Irreversible actions reject on HITL failure. Communications and bulk operations queue: the action is held, not executed, until the system recovers and a reviewer can decide. Anomaly-detected actions try to escalate to backup reviewers before falling through to reject.

The default fallback, for anything not explicitly mapped, is reject. When in doubt, don’t proceed.
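The lookup with its default is small enough to sketch in full. The enums below are trimmed versions for illustration, `NEW_UNMAPPED_TRIGGER` is a made-up member standing in for any trigger nobody remembered to map, and `resolve_fallback` is a hypothetical helper name.

```python
from enum import Enum, auto


class InterruptTrigger(Enum):
    DATA_DELETION = auto()
    EXTERNAL_COMMUNICATION = auto()
    ANOMALY_DETECTED = auto()
    NEW_UNMAPPED_TRIGGER = auto()  # hypothetical: added later, never mapped


class FallbackAction(Enum):
    REJECT = auto()
    QUEUE = auto()
    ESCALATE = auto()


POLICIES = {
    InterruptTrigger.DATA_DELETION: FallbackAction.REJECT,
    InterruptTrigger.EXTERNAL_COMMUNICATION: FallbackAction.QUEUE,
    InterruptTrigger.ANOMALY_DETECTED: FallbackAction.ESCALATE,
}


def resolve_fallback(trigger, backup_reviewer_available: bool) -> FallbackAction:
    """Pick the fallback for a trigger; anything unmapped is rejected."""
    action = POLICIES.get(trigger, FallbackAction.REJECT)
    # Escalation with nobody to escalate to falls through to reject.
    if action is FallbackAction.ESCALATE and not backup_reviewer_available:
        return FallbackAction.REJECT
    return action
```

Putting the default in the lookup itself, rather than relying on every trigger being listed, means a newly added trigger fails closed until someone makes an explicit decision about it.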

What I’ve Learned From Building This

The insight that changed how I think about HITL most: it’s not a fallback for when other controls fail. It’s a first-class security control for a specific category of risk that automated controls structurally cannot address.

The risk category is: actions that are novel, irreversible, or have large blast radius, where the cost of a single false negative justifies requiring human review every time. For these actions, the question isn’t “does our automated defense handle this?” It’s “do we want a human being to be accountable for every instance of this?”

The engineering work — placing interrupt nodes correctly, building approval workflows, securing the review interface, handling timeouts safely — is the infrastructure that makes human judgment scalable. Without it, you either have no oversight or you have token oversight where reviewers rubber-stamp approvals too fast to actually review. Neither is what you want.

HITL Controls Checklist

Interrupt placement:

All irreversible actions have interrupt nodes before execution
All external communications require approval
Bulk operations above defined thresholds require approval
Sessions with active injection flags require approval for any high-tier tool
Financial transactions always require explicit approval

Approval workflow:

Approval requests delivered through redundant channels
Reviewer role verified independently before approval accepted
Justification required for all Tier 3+ actions
Approval timeouts auto-reject (fail safe), not auto-approve
Escalation paths defined and tested for timeout scenarios

Review interface security:

All content HTML-escaped before rendering
Review form uses CSRF protection
Reviewer identity re-verified on submission
Active security flags prominently displayed
Reviewer modifications validated for scope safety

State integrity:

Agent state immutable while approval is pending
Interrupt nodes appear before (not after) sensitive action nodes
HITL failures default to rejection

Operations:

Fallback policies defined for all HITL failure modes
All approval decisions audit-logged
Approval latency monitored for rubber-stamping
Approval rates per reviewer tracked for fatigue detection
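The last two operational items can be approximated with very little code. This is a rough sketch only: the thresholds (20 decisions, a 10-second median, a 98% approval rate) are placeholder values I picked for illustration and would need tuning against real review data.

```python
from statistics import median


def flag_rubber_stamping(decisions, min_decisions=20, min_median_seconds=10.0,
                         max_approval_rate=0.98):
    """Flag reviewers whose decisions look too fast or too uniformly positive.

    decisions: iterable of (reviewer_id, approved, seconds_to_decide) tuples.
    """
    by_reviewer = {}
    for reviewer_id, approved, seconds in decisions:
        by_reviewer.setdefault(reviewer_id, []).append((approved, seconds))

    flagged = {}
    for reviewer_id, rows in by_reviewer.items():
        if len(rows) < min_decisions:
            continue  # too little data to judge
        approval_rate = sum(1 for approved, _ in rows if approved) / len(rows)
        median_seconds = median(seconds for _, seconds in rows)
        reasons = []
        if median_seconds < min_median_seconds:
            reasons.append("fast_decisions")
        if approval_rate > max_approval_rate:
            reasons.append("near_total_approval")
        if reasons:
            flagged[reviewer_id] = reasons
    return flagged
```

A flag here is a prompt to look at interrupt placement as much as at the reviewer. Consistently fast approvals often mean the interrupt is firing on actions that do not need review.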

This is Part 13 of an ongoing series on LangGraph agent security. Previous posts: Part 1: Introduction · Part 2: Architecture Primer · Part 3: Attack Surface Analysis · Part 4: Core Threat Categories · Part 5: Threat Modeling · Part 6: Input Validation · Part 7: Tool Security · Part 8: State and Memory Security · Part 9: Multi-Agent Trust Boundaries · Part 10: Output Guardrails · Part 11: Authentication and Authorization · Part 12: Observability and Monitoring.