Indirect Prompt Injection

Indirect prompt injection is an attack class against LLM agents in which the malicious instructions are not typed by the user but are embedded in third-party content the agent later ingests as data — a webpage, retrieved document, email, tool response, calendar invite, or memory record. The model fails to maintain an isolation boundary between instructions and data, treats the attacker-controlled content as operational guidance, and takes actions under the victim user's credentials. The term is also written as XPIA, indirect injection, or second-order prompt injection. It differs from direct prompt injection: the user is the victim, not the attacker, and the payload author is a third party whose content the agent happens to read.

Tool hijacking, memory poisoning, and multi-agent prompt injection are downstream specializations of the same mechanism, distinguished by the channel — tool metadata, persistent memory, or peer-agent message — that carries the payload.

Indirect Prompt Injection

Indirect Prompt Injection

See also

Derived From

Related Work

External References