Agentic AI Security: Prompt Injection, Tool Hijacking, and Voice Agents
Quick Answer
This topic gathers richards.ai work on what changes when LLMs gain tools, memory, and peer agents. It collects the threat models, defenses, glossary terms, checklists, executive briefs, and source papers behind multi-agent prompt injection, tool hijacking, memory poisoning, cross-agent infection, and voice-agent jailbreaking. The page offers reading paths for practitioners, engineering leads, researchers, and newcomers, anchored to the current canonical defense reference.
This topic gathers the richards.ai work on what changes once an LLM gains tools, memory, and peer agents. The unifying observation across the cluster is that prompt injection in agentic systems is a confused-deputy and authority-propagation problem, not a string-filtering problem. Hardening Multi-Agent Systems Against Prompt Injection is now the canonical defense reference for the cluster and supersedes the earlier exploitation paper, which is retained here for historical context on how the threat model emerged. Some reproduction detail is withheld in the linked artifacts pending vendor coordination.
What this topic covers
The cluster spans threat models, defenses, glossary terms, an operational checklist, an executive brief, source papers, an interactive jailbreak demo, and a runnable capability-control reference implementation. In scope: multi-agent prompt injection, cross-agent infection, memory poisoning, tool hijacking, retrieval-channel exfiltration, and the voice-modality variant of the same authority-confusion failure. Out of scope: single-agent jailbreaking that does not involve tools, memory, or inter-agent channels, and vendor-specific orchestration guidance.
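The capability-control reference implementation mentioned above is linked from the cluster. As an illustration of the general idea only (not that implementation — every name below is hypothetical), an orchestrator can enforce per-agent tool allowlists so that injected text can change what an agent *asks* for, but not what the dispatcher will *execute*:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of capability-controlled tool dispatch.
# Capability, Agent, and dispatch are illustrative names, not the
# cluster's reference implementation.

@dataclass(frozen=True)
class Capability:
    tool: str        # tool the agent may invoke
    max_scope: str   # coarse argument constraint, e.g. a path prefix

@dataclass
class Agent:
    name: str
    capabilities: frozenset[Capability] = field(default_factory=frozenset)

def dispatch(agent: Agent, tool: str, scope: str) -> str:
    """Run a tool call only if the agent holds a covering capability."""
    for cap in agent.capabilities:
        if cap.tool == tool and scope.startswith(cap.max_scope):
            return f"ran {tool} on {scope}"
    # A prompt-injected agent can request anything; the dispatcher
    # still refuses calls outside the agent's delegated authority.
    raise PermissionError(f"{agent.name} lacks capability for {tool}:{scope}")
```

A reader agent granted only `read_file` under `/docs/` can be tricked into asking for `send_email`, but the call never executes — which is the confused-deputy framing above: constrain authority, don't filter strings.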
How to read this page
Newcomers should start with What Is Multi-Agent Prompt Injection? and the glossary entries before opening the papers. Practitioners reviewing or building a system should jump to the multi-agent prompt injection defense checklist, which is re-anchored to the hardening paper. Engineering leads and stakeholders who will not work through the full paper should read the executive briefing. Researchers should go to the papers group; the hardening paper carries the current architecture, and the earlier exploitation paper documents the attack surface it answers.
Where this topic sits
This cluster sits inside the security pillar alongside other agentic and LLM-application topics; the topics index lists adjacent shelves, and the papers index holds the broader source-paper set this cluster draws from.
Papers (2 members)
Hardening Multi-Agent Systems Against Prompt Injection
Note: Canonical defense reference for the cluster; layered architecture and adaptive evaluation. Read this first if you want the controls.
Exploiting Multi Agent Systems: How Prompt Injection Turns Collaboration into Compromise
Note: Earlier attack-side companion, now superseded by the hardening paper; kept for historical context on how the threat model emerged.
Learn (4 members)
What Is Multi-Agent Prompt Injection? Attack Paths and Defenses
Multi-agent prompt injection is indirect prompt injection whose carrier is another agent. An attacker plants instructions in content that…
Note: Plain-language entry point to the threat class; start here before the paper if multi-agent prompt injection is new to you.
What Is Tool Hijacking? When AI Agents Run the Wrong Function
Tool hijacking is the class of attack where an adversary steers an AI agent's function calls: which tool it invokes, with what arguments,…
Note: Explainer on tool-selection and tool-manifest attacks; complements the cross-agent material with the tool-side surface.
What Is RAG Data Exfiltration? Retrieval-Channel Attacks on LLM Apps
RAG data exfiltration is a class of attack where a retrieval-augmented generation system leaks sensitive data because of what it…
Note: Retrieval-channel exfiltration explainer; the memory and retrieval surface that memory-poisoning attacks target.
What Is Voice-Agent Jailbreaking? Why Spoken Prompt Injection Is Different
Voice-agent jailbreaking is prompt injection delivered through a real-time spoken conversation, where an attacker convinces an agent to…
Note: Voice-modality variant of the same authority-confusion problem; included to show the pattern beyond text-only stacks.
Glossary (4 members)
Multi-Agent Prompt Injection
Multi-agent prompt injection is a class of indirect prompt injection that arises when LLM-driven agents collaborate. An attacker plants…
Note: One-screen definition of the umbrella term used across this cluster.
Cross-Agent Infection
Cross-agent infection is a multi-agent prompt injection pattern in which an already-compromised agent propagates the adversary's…
Note: Defines the lateral-spread attack class — one compromised agent embedding instructions for peers.
Memory Poisoning
Memory poisoning is a prompt-injection variant in which an attacker causes malicious instructions or false facts to be written into an…
Note: Defines the temporal attack class — injected content stored in shared memory and reactivated later.
Tool Hijacking
Tool hijacking is the class of attack where an adversary causes a tool-using LLM agent to exercise its delegated authority on the…
Note: Definition for the tool-manifest and tool-selection attack surface referenced throughout the cluster.
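The memory-poisoning entry above describes injected content being written into shared memory and reactivated later. One common mitigation shape is a provenance gate on writes. The sketch below is purely illustrative — the function, the source labels, and the quarantine policy are assumptions, not an API from the linked papers:

```python
# Hypothetical provenance gate on shared-memory writes: entries derived
# from untrusted channels (web pages, inter-agent messages) are
# quarantined for audit rather than stored where they could later be
# replayed as trusted instructions.

TRUSTED_SOURCES = {"operator", "system_config"}

def write_memory(store: dict, key: str, value: str, source: str) -> bool:
    """Store operator/system-originated entries; quarantine the rest."""
    if source in TRUSTED_SOURCES:
        store[key] = value
        return True
    # Untrusted provenance: keep a record, but never in the trusted store.
    store.setdefault("_quarantine", []).append((key, value, source))
    return False
```

The point mirrors the cluster's framing: the defense keys on where content came from (authority), not on what the string says.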