Tool Hijacking
Quick Answer
Tool hijacking is the class of attack where an adversary causes a tool-using LLM agent to exercise its delegated authority on the adversary's behalf — invoking tools that should not have been called, supplying attacker-controlled arguments to legitimate tools, or treating attacker-controlled tool return values as new instructions. It is what prompt injection becomes once the model has hands: a reasoning loop with credentials, re-pointed by hostile text anywhere in its context.
Tool Hijacking
Tool hijacking is the class of attack against tool-using LLM agents in which an adversary causes the agent to exercise its delegated authority on the adversary's behalf. A tool-using agent emits structured function calls into a runtime that dispatches them against real APIs, with each tool carrying its own credentials and reach; an attacker who can place text anywhere in the model's context window can redirect that authority — invoking a tool that should not have been called, supplying hostile arguments to a legitimate tool, or treating an attacker-controlled tool return value as new instructions. It is what prompt injection becomes once the model has hands. Stakes scale with the toolset: data exfiltration, unauthorized actions, lateral access into connected systems.
See also
- Tool hijacking explained — full taxonomy of the attack patterns and how they compose
- Prompt injection, tool hijacking, and data exfiltration defenses — source review paper this entry derives from