Tool Hijacking

Tool hijacking is the class of attack against tool-using LLM agents in which an adversary causes the agent to exercise its delegated authority on the adversary's behalf. A tool-using agent emits structured function calls into a runtime that dispatches them against real APIs, with each tool carrying its own credentials and reach; an attacker who can place text anywhere in the model's context window can redirect that authority — invoking a tool that should not have been called, supplying hostile arguments to a legitimate tool, or treating an attacker-controlled tool return value as new instructions. It is what prompt injection becomes once the model has hands. Stakes scale with the toolset: data exfiltration, unauthorized actions, lateral access into connected systems.

Tool Hijacking

Tool Hijacking

See also

Derived From

Related Work