Glossary
Short definitions for emerging terms in AI security, agentic systems, and AI-assisted learning, each linked to deeper explainers and primary sources.
Cross-Agent Infection
Cross-agent infection is a multi-agent prompt injection pattern in which an already-compromised agent propagates the adversary's instructions laterally to peer or delegated agents through inter-agent messages, summaries, shared memory, or consensus dynamics. It names the lateral-spread phase that follows a single-hop injection: recipients never see the original untrusted source, yet inherit the payload by treating the compromised agent's output as trusted task input.
Memory Poisoning
Memory poisoning is a prompt-injection variant in which an attacker causes malicious instructions or false facts to be written into an agent's persistent memory or shared retrieval store, so the payload is replayed as trusted context on a later turn. The defining property is persistence and delayed reactivation: the compromise fires after the original untrusted source has left the conversation, often in a different session, for a different user, or against a different agent in the same system.
Indirect Prompt Injection
Indirect prompt injection is an attack where malicious instructions reach an LLM agent through content it ingests as data — a webpage, retrieved document, email, tool response, or memory record — rather than through the user's prompt. The model interprets that attacker-controlled content as operational instruction and acts on it under the victim user's credentials. Also known as XPIA or second-order prompt injection, it is a confused-deputy vulnerability arising from the missing isolation boundary between instructions and data inside a single LLM context.
Tool Precommitment
Tool precommitment is a prompt-injection defense pattern in which a trusted planner decides which tools, parameter scopes, and destinations an LLM agent may use before any untrusted content enters its context. A deterministic policy engine then enforces that fixed capability manifest for the rest of the session, so instructions recovered from documents, web pages, or other agents cannot expand the agent's tool surface at runtime.
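A minimal enforcement sketch, assuming a hypothetical grant-manifest shape and tool-call signature: the manifest is sealed before any untrusted content enters the context, and every later call is checked against it deterministically.

```python
# Hypothetical sketch: a capability manifest fixed before untrusted input,
# enforced deterministically on every subsequent tool call.
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolGrant:
    tool: str
    allowed_params: frozenset        # parameter names the trusted planner approved
    allowed_destinations: frozenset  # e.g. addresses or domains the call may target

class PrecommittedPolicy:
    def __init__(self, grants):
        # The manifest is sealed here, before any untrusted content is ingested.
        self._grants = {g.tool: g for g in grants}

    def check(self, tool: str, params: dict, destination: str | None = None) -> bool:
        grant = self._grants.get(tool)
        if grant is None:
            return False                                        # tool never granted
        if not set(params) <= grant.allowed_params:
            return False                                        # unexpected parameter
        if destination is not None and destination not in grant.allowed_destinations:
            return False                                        # out-of-scope destination
        return True

policy = PrecommittedPolicy([
    ToolGrant("search_docs", frozenset({"query"}), frozenset()),
    ToolGrant("send_email", frozenset({"to", "body"}), frozenset({"alice@example.com"})),
])

# A call injected by untrusted content is rejected, whatever the model "decided":
assert policy.check("send_email", {"to": "alice@example.com", "body": "hi"}, "alice@example.com")
assert not policy.check("send_email", {"to": "x", "body": "leak"}, "attacker@evil.example")
```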
Ambient Authority
Ambient authority is the security property of a process or agent acting under whatever permissions its surrounding environment happens to grant — inherited tokens, cookies, environment variables, mounted filesystems, registered tools — rather than under explicit, narrow grants for each operation. In an LLM agent runtime, ambient authority means any text that reaches the planner, including attacker-controlled content, can steer the full set of inherited credentials. It is the structural precondition that turns prompt injection into authorized action.
Information-Flow Control (IFC)
Information-flow control (IFC) is a security model that labels data with confidentiality and integrity metadata at its source and enforces deterministic policies on how labeled data may flow to sinks. Where access control asks whether a principal may touch a resource, IFC asks whether data originating in one place may reach another. In agent security, IFC constrains how untrusted text in an LLM's context can influence tool calls or move sensitive data to public outputs.
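A toy sketch of label propagation, with illustrative label names and sink policy rather than any standard IFC calculus:

```python
# Illustrative sketch: data carries labels from its source; a sink check
# decides whether the labeled flow is allowed, independent of what the text says.
from dataclasses import dataclass

@dataclass(frozen=True)
class Labeled:
    value: str
    integrity: str        # "trusted" or "untrusted"
    confidentiality: str  # "public" or "secret"

def combine(*parts: Labeled) -> Labeled:
    # Any untrusted input taints the result; any secret input makes it secret.
    integrity = "untrusted" if any(p.integrity == "untrusted" for p in parts) else "trusted"
    confidentiality = "secret" if any(p.confidentiality == "secret" for p in parts) else "public"
    return Labeled(" ".join(p.value for p in parts), integrity, confidentiality)

def allow_sink(data: Labeled, sink: str) -> bool:
    if sink == "external_http":
        return data.confidentiality == "public"   # no secret data to public sinks
    if sink == "tool_call":
        return data.integrity == "trusted"        # no untrusted data steering tools
    return False

webpage = Labeled("ignore previous instructions ...", "untrusted", "public")
api_key = Labeled("sk-...", "trusted", "secret")

prompt = combine(webpage, api_key)
assert not allow_sink(prompt, "external_http")  # the secret would leak
assert not allow_sink(prompt, "tool_call")      # untrusted text would steer a tool
```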
Lethal Trifecta
The lethal trifecta is the threat-model shorthand for the three ingredients that make a tool-using AI agent dangerous when combined: an untrusted instruction source, a sensitive data source, and an exfiltration or side-effect channel — all reachable inside one planning loop. Any two are manageable; all three together let attacker text in an ingested document steer the agent into reading private data and writing it to an attacker-reachable sink without a user click.
Compound AI System
A compound AI system is an automation system in which a foundation model is one component among many — retrievers, tools, memory, planners, verifiers, sandboxes, and approval gates — collaborating to complete a multi-step objective. The term, introduced by Berkeley AI Research in 2024, names the shift from single model calls to systems whose behavior emerges at the system boundary. It is broader than 'agent' and more specific than 'LLM application.'
Excessive Agency
Excessive agency is an OWASP-named risk class in which an LLM-backed system holds more permission, autonomy, or tool breadth than its task requires. It is a property of the system's authority configuration, not any single prompt or output. The surplus authority becomes harmful only when prompt injection, a hallucinated plan, or poisoned retrieval converts it into a real action.
Retrieval Poisoning
Retrieval poisoning is a corpus-side attack on retrieval-augmented generation (RAG) and agent memory in which an attacker plants malicious documents in a knowledge source so that those documents are retrieved for chosen queries and steer the model's answer or action. It is the supply-chain analogue of prompt injection: instead of attacking the prompt at request time, the attacker attacks the knowledge the prompt is built from.
Constrained Decoding
Constrained decoding is a sampling-time technique that restricts a language model's token-by-token output to tokens that keep the running prefix consistent with a target grammar — typically a JSON Schema, regex, or context-free grammar. At each step the runtime masks tokens that would violate the grammar, so the final output is structurally guaranteed to conform. It enforces shape, not meaning, and is distinct from prompting the model to follow a format.
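A toy sketch of the masking step, using a two-string "grammar" as a stand-in for a real JSON Schema or context-free grammar:

```python
# Toy sketch of constrained decoding: at each step only tokens that keep the
# prefix extendable to a valid complete output survive, so the final string is
# structurally guaranteed to conform.
VOCAB = ['{"ok": ', "true", "false", "}", "maybe", "<eos>"]
VALID_OUTPUTS = {'{"ok": true}', '{"ok": false}'}   # stand-in for a real grammar

def allowed(prefix: str, token: str) -> bool:
    if token == "<eos>":
        return prefix in VALID_OUTPUTS              # may stop only on a complete parse
    candidate = prefix + token
    return any(out.startswith(candidate) for out in VALID_OUTPUTS)

def decode(pick):
    """pick(options) stands in for sampling from the model's masked distribution."""
    prefix = ""
    while True:
        options = [t for t in VOCAB if allowed(prefix, t)]   # mask invalid tokens
        token = pick(options)
        if token == "<eos>":
            return prefix
        prefix += token

# Even a pick() that always takes the first option yields a conforming output:
print(decode(lambda opts: opts[0]))   # -> {"ok": true}
```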
Function Calling
Function calling is the serialization protocol by which a large language model emits a structured tool invocation — a tool name and a JSON argument object drawn from a developer-declared schema — instead of free-form text. It guarantees that the model's output is parseable as a tool call against a known vocabulary. It does not guarantee that the chosen tool, the arguments, or the resulting action match user intent or are safe to execute.
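A sketch of the contract in miniature, with an illustrative wire format (real provider APIs differ in detail):

```python
# Sketch: a developer-declared schema and a model emission that parses against
# it. Parseability is the whole guarantee; nothing here checks intent or safety.
import json

TOOLS = {
    "delete_row": {
        "type": "object",
        "properties": {"table": {"type": "string"}, "row_id": {"type": "integer"}},
        "required": ["table", "row_id"],
    }
}

# What the model emits (illustrative wire format):
emission = '{"name": "delete_row", "arguments": {"table": "users", "row_id": 42}}'

call = json.loads(emission)
schema = TOOLS[call["name"]]                        # known tool vocabulary
args = call["arguments"]
assert all(k in args for k in schema["required"])   # schema-valid arguments
assert isinstance(args["row_id"], int)

# Still unanswered: should row 42 of "users" be deleted at all? That question
# lives above function calling, at the semantics and authority layers.
```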
Tool-Use Reliability
Tool-use reliability is the end-to-end property of an LLM agent that every tool call it emits is syntactically well-formed, schema-valid, semantically correct, state-consistent, and authorized. It spans five layers — syntax, schema, semantics, state, and authority — of which function calling and structured outputs cover only the lowest two or three. Production incidents typically occur at the upper layers, where the model is implicitly trusted to self-restrict.
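A hypothetical sketch of the five layers as an ordered gate a tool call must pass before execution; every check shown is an illustrative placeholder:

```python
# Hypothetical layered validation: syntax, schema, semantics, state, authority.
import json

def validate_call(raw: str, tools: dict, world: set, grants: set) -> dict:
    call = json.loads(raw)                                    # 1 syntax: parses at all
    spec = tools[call["name"]]                                # 2 schema: known tool, required args
    assert set(spec["required"]) <= set(call["args"])
    assert spec["plausible"](call["args"])                    # 3 semantics: arguments make sense
    assert spec["consistent"](call["args"], world)            # 4 state: consistent with the world
    assert call["name"] in grants                             # 5 authority: permitted for this principal
    return call

tools = {"refund": {
    "required": {"order_id", "amount"},
    "plausible": lambda a: 0 < a["amount"] <= 10_000,
    "consistent": lambda a, world: a["order_id"] in world,
}}

call = validate_call(
    '{"name": "refund", "args": {"order_id": "A7", "amount": 25}}',
    tools, world={"A7"}, grants={"refund"},
)
```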
Cognitive Offloading
Cognitive offloading is the use of an external tool or action — writing, diagrams, calculators, search, AI assistants — to reduce the internal cognitive demand of a task. The construct was formalized by Risko and Gilbert (2016) and is neutral by default; its educational consequence depends on which cognitive operation got moved outside the head. In AI-mediated learning, the failure mode is offloading the target skill itself rather than peripheral load.
Performance-Learning Gap
The performance-learning gap is the dissociation between a learner's assisted task performance and their unassisted independent capacity on the same class of task. AI assistance can raise visible output quality while underlying skill stalls or regresses, because the assistant performs the cognitive operations the task was designed to exercise. The gap becomes visible only when the scaffold is removed and the learner is reassessed alone.
Productive Struggle
Productive struggle is the band of cognitive effort where a learner is challenged enough to build durable schema but not so overwhelmed that progress collapses. In cognitive-load terms, it is the germane-load region a tutor must preserve while keeping intrinsic load tractable and extraneous load low. In AI-tutoring design, it names the effort a generative tutor must protect rather than dissolve by giving away answers.
Intelligent Tutoring System (ITS)
An intelligent tutoring system (ITS) is a computer-based instructional system that adapts to an individual learner by maintaining an explicit domain model, a learner model of current mastery, and a pedagogical policy that selects the next hint, problem, or feedback. Classic ITSs operate at the step level rather than only grading final answers, and named exemplars include Cognitive Tutor, ASSISTments, AutoTutor, Andes, and ALEKS.
Solver–Tutor Gap
The solver–tutor gap is the empirically observed divergence between a language model's ability to solve a domain problem and its ability to teach a learner to solve it. Formalized by Macina et al. in MathTutorBench (EMNLP 2025), the term names the structural fact that subject competence — final-answer correctness on benchmarks like MATH or GSM8K — does not entail pedagogical competence such as diagnosing mistakes, withholding answers, or scaffolding the next move.
Automated Vulnerability Repair (AVR)
Automated vulnerability repair (AVR) is the subclass of automated program repair where the input is a vulnerability signal — a CVE, sanitizer report, crash trace, or proof-of-concept exploit — and the output is a patch that must eliminate exploitability while preserving intended behavior. Modern AVR systems are increasingly agentic: they navigate the repository, build, run PoCs, fuzz, and iterate. The defining distinction from generic APR is that success is a security property, not a passing test.
Check Circumvention
Check circumvention is a patch-failure mode in automated vulnerability repair where the generated patch removes, weakens, or routes around the invariant that surfaced the bug — an assertion, bounds check, sanitizer path, or error return — instead of repairing the root cause. The proof-of-concept stops triggering and existing tests still pass, so the patch looks valid, but the unsafe behavior typically remains exploitable through nearby inputs the PoC never covered.
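A language-agnostic toy in Python (not drawn from any real CVE) showing the difference between suppressing the crash and fixing the root cause:

```python
# Illustrative toy: the PoC triggers an IndexError; the circumventing "patch"
# suppresses the symptom, while the root-cause fix validates before use.

def lookup_vulnerable(table, key):
    return table[int(key)]                      # crashes or reads the wrong row on bad keys

def lookup_circumvented(table, key):
    try:
        return table[int(key)]                  # the PoC no longer crashes...
    except (IndexError, ValueError):
        return None                             # ...but bad keys still reach the lookup

def lookup_fixed(table, key):
    idx = int(key)
    if not 0 <= idx < len(table):               # root cause: reject out-of-range keys
        raise KeyError(key)
    return table[idx]

table = ["a", "b", "c"]
# PoC input was index 9. The circumvented version "passes" that PoC, yet a
# nearby input the PoC never covered still returns data the caller should not see:
print(lookup_circumvented(table, "-1"))         # prints 'c'
```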
PoC+ Test
A PoC+ test is a patch-validation artifact that extends a vulnerability proof-of-concept from a crash witness into a behavior witness: it asserts not only that the original PoC input no longer triggers the bug, but also what the patched program should output, return, or raise on that input. Introduced by PVBench, PoC+ tests are used to distinguish real fixes from crash-suppression or specification-violating patches in automated vulnerability repair.
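A toy sketch in pytest form, with an illustrative function and values (PVBench's actual test format may differ):

```python
# Toy PoC+ sketch: the test pins down what the patched code should do on the
# PoC input, not merely that the crash is gone.
import pytest

def parse_length(header: bytes) -> int:
    """Stand-in for the patched function; assume the unpatched version crashed
    with IndexError on short input."""
    if len(header) < 4:
        raise ValueError("truncated header")
    return int.from_bytes(header[:4], "big")

POC_INPUT = b"\x00"                               # the original crash witness

def test_poc_regression():
    # Plain PoC check: the original crash signature no longer reproduces.
    # A patch that silently swallowed the error and returned 0 would also pass.
    try:
        parse_length(POC_INPUT)
    except IndexError:
        pytest.fail("original crash still reproduces")
    except Exception:
        pass                                       # any non-crash outcome is accepted

def test_poc_plus():
    # PoC+: asserts the intended behavior on the PoC input and that well-formed
    # input still parses correctly.
    with pytest.raises(ValueError, match="truncated header"):
        parse_length(POC_INPUT)
    assert parse_length(b"\x00\x00\x00\x2a") == 42
```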
Agentic Binary Reverse Engineering
Agentic binary reverse engineering is the practice of using an LLM-driven system that plans, invokes reverse-engineering tools (Ghidra, IDA, radare2, angr, GDB, sandboxes), observes their output, preserves evidence, and revises hypotheses across many turns to analyze a compiled program without human step-by-step direction. It is distinguished from one-shot LLM-assisted RE, where a human pastes decompiler output into a chat for naming or summarization.
Chain of Evidence
In agentic binary reverse engineering, a chain of evidence is a structured, machine-readable record that binds every claim an RE agent makes to the specific tool output, function, control-flow path, or dynamic observation that supports it, along with a confidence score and validation status. Unlike a flat chat transcript, claims become nodes with provenance so a validator can replay each link before the agent's final verdict is accepted.
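An illustrative data shape only, not a schema defined by any particular tool:

```python
# Sketch: claims become nodes with provenance, a confidence score, and a
# validation status that only a replaying validator may set.
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str          # e.g. "ghidra:decompile", "gdb:backtrace" (illustrative)
    artifact: str        # function, address range, or captured output the claim rests on

@dataclass
class Claim:
    statement: str
    evidence: list[Evidence]
    confidence: float                  # agent-assigned, 0..1
    validated: bool = False            # set by the validator, not by the agent
    depends_on: list["Claim"] = field(default_factory=list)

key_sched = Claim(
    "FUN_00401a2c implements RC4 key scheduling",
    [Evidence("ghidra:decompile", "FUN_00401a2c"),
     Evidence("gdb:watch", "256-byte state swap loop")],
    confidence=0.8,
)
verdict = Claim(
    "binary exfiltrates data over RC4-encrypted HTTP",
    [Evidence("strace", "connect() to port 80")],
    confidence=0.6,
    depends_on=[key_sched],            # the validator replays this link first
)
```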
Feedback-Driven Execution
Feedback-driven execution is an agent-architecture pattern in which an LLM iteratively reasons about partial evidence, calls a tool, observes the result, and revises its hypothesis until a verifier accepts an answer or a budget is exhausted. It replaces the one-pass paradigm — where a static toolchain produces a fixed snapshot and the model reasons over it once — with a closed reasoning–action–observation loop, and is the dominant control pattern in modern agentic binary reverse-engineering systems.
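A minimal sketch of the loop, where plan(), the tool registry, and accepts() are placeholders for a real model client, tools, and verifier:

```python
# Skeleton of the reasoning-action-observation loop with a verifier gate and a
# hard budget; the interfaces are illustrative placeholders.
def run_agent(plan, tools, accepts, task, max_steps=20):
    history = [("task", task)]
    for _ in range(max_steps):                                  # hard budget
        action = plan(history)                                  # reason over evidence so far
        if action["kind"] == "answer":
            if accepts(action["answer"]):                       # verifier gates termination
                return action["answer"]
            history.append(("rejected", action["answer"]))      # revise and keep going
        else:
            observation = tools[action["tool"]](**action["args"])   # act
            history.append((action["tool"], observation))            # observe
    return None                                                 # budget exhausted
```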
Answer Over-Disclosure
Answer over-disclosure is a generative AI tutor failure mode in which the tutor reveals the final answer or a complete worked solution before the learner has made a meaningful attempt, replacing the retrieval and reasoning the task was designed to practice. Named in the SafeTutors benchmark, it is a pedagogical failure rather than a factual one: the answer is correct, but its timing destroys the learning.
Faded Scaffolding
Faded scaffolding is an instructional pattern in which a tutor gradually withdraws support across successive attempts so the learner takes on more of the cognitive work, advancing only when the learner succeeds at the current level. The fade is what distinguishes real scaffolding from permanent assistance: without removal, support becomes a dependency that produces high assisted performance and low unassisted competence.
Pedagogical Safety
Pedagogical safety is the property of a tutoring system that protects learners from educational harms — answer over-disclosure, misconception reinforcement, cognitive offloading, false mastery, and multi-turn drift — rather than from offensive or unsafe content. A tutor can be content-safe and pedagogically unsafe at the same time: a correct, well-worded answer delivered before the learner has attempted the work is a pedagogical safety failure.
Multi-Agent Prompt Injection
Multi-agent prompt injection is a class of indirect prompt injection that arises when LLM-driven agents collaborate. An attacker plants instructions in content an upstream agent will summarize or relay; downstream planner and tool-using agents then read those instructions as ordinary peer-agent text and act on them under the operator's credentials. Unlike single-agent injection, the attacker's provenance is laundered through an intermediate agent before reaching the agent with tool access.
Glitch Token
A glitch token is a vocabulary entry whose presence in a prompt disproportionately triggers anomalous model output — incoherence, unexplained refusals, truncation, loops, or silent corruption. The usual root cause is under-training: tokens that exist in the tokenizer's vocabulary but appear rarely or never in the pretraining corpus receive few gradient updates, leaving their embeddings in a poorly-conditioned region of representation space. Also known as under-trained tokens or magikarp tokens.
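One coarse screening heuristic, sketched with Hugging Face transformers and PyTorch under the assumption that under-trained tokens often have atypically low embedding norms; published detectors use more careful analysis than this:

```python
# Heuristic screen only: rank vocabulary entries by input-embedding norm and
# inspect the lowest-norm rows as glitch-token candidates.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                                   # small example checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

emb = model.get_input_embeddings().weight.detach()   # [vocab_size, hidden_dim]
norms = emb.norm(dim=1)
suspects = torch.argsort(norms)[:20]                  # lowest-norm rows first
for token_id in suspects.tolist():
    print(token_id, repr(tok.decode([token_id])), float(norms[token_id]))
```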
Tool Hijacking
Tool hijacking is the class of attack where an adversary causes a tool-using LLM agent to exercise its delegated authority on the adversary's behalf — invoking tools that should not have been called, supplying attacker-controlled arguments to legitimate tools, or treating attacker-controlled tool return values as new instructions. It is what prompt injection becomes once the model has hands: a reasoning loop with credentials, re-pointed by hostile text anywhere in its context.