Superseded
This paper has been superseded by Hardening Multi-Agent Systems Against Prompt Injection. The current treatment lives there.
Exploiting Multi Agent Systems: How Prompt Injection Turns Collaboration into Compromise
Exploiting Multi Agent Systems: How Prompt Injection Turns Collaboration into Compromise
Placeholder intro | Conference briefing
Presenter: Jeremy Richards
Team: AI Red Team, ServiceNow
Intro
Large language model agents no longer operate in isolation. They collaborate, delegate, and execute tool calls across orchestration layers, which creates a larger security surface than single-agent prompt injection alone.
This briefing introduces a red-team attack path for multi-agent systems: harvesting hidden system prompts, escalating through mirrored pattern injections, subverting planner behavior, and influencing downstream tool use. The session focuses on both direct prompt injection and second-hand (indirect) injections that move across agent boundaries and compound into mission-level compromise.
Session details
- Date and time: Thursday, October 2, 3:15 PM to 4:00 PM
- Room: 801B
- Format: 45-minute briefing
- Tracks: AI, ML, and Data Science; Application Security
Scope and deliverables
- Attack chain walkthroughs for direct and indirect multi-agent injection
- Practical defender controls with trade-offs in engineering effort and efficacy
- Orchestrator policy patterns for reducing blast radius
- Observability patterns that support security telemetry while respecting privacy
Status
This page is a placeholder intro for the published session. A longer technical write-up and supporting materials will be added in a future revision.