Constrained Decoding
Quick Answer
Constrained decoding is a sampling-time technique that restricts a language model's token-by-token output to tokens that keep the running prefix consistent with a target grammar, typically a JSON Schema, regex, or context-free grammar. At each step the runtime masks tokens that would leave the prefix with no valid continuation under the grammar, so the final output is structurally guaranteed to conform. It enforces shape, not meaning, and is distinct from prompting the model to follow a format.
Constrained Decoding
Constrained decoding is a sampling-time technique that restricts a language model's token-by-token output to tokens that keep the running prefix consistent with a target grammar, most often a JSON Schema, regex, or context-free grammar. At each decoding step, the runtime masks tokens that would make the prefix unextendable under the grammar and samples only from the remainder, so the final output is structurally guaranteed to conform, provided generation ends in a state the grammar accepts rather than being cut off by a token limit. It is qualitatively different from prompting: prompting asks the model to behave; constrained decoding changes the set of outputs the model can emit at all. It guarantees shape, not meaning: a tool call can be schema-valid and still target the wrong tool with the wrong arguments. It sits at the grammar layer of the tool-use reliability stack.
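The masking step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the seven-token vocabulary, the "grammar" (just a finite set of two valid strings standing in for a real JSON Schema or CFG engine), and `fake_logits`, a hypothetical stand-in for a model forward pass that deliberately prefers off-grammar tokens. It is not how any particular runtime implements the technique.

```python
import math
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{"tool":', '"search"', '"delete"', '"lookup"', '}', 'Sure!', '<eos>']

# Toy "grammar": the finite set of complete strings that count as valid.
VALID_OUTPUTS = {'{"tool":"search"}', '{"tool":"lookup"}'}


def is_viable_prefix(text):
    """True if text is, or can still be extended to, a valid output."""
    return any(valid.startswith(text) for valid in VALID_OUTPUTS)


def allowed_token_ids(prefix):
    """Indices of tokens that keep the prefix extendable under the grammar."""
    allowed = []
    for i, tok in enumerate(VOCAB):
        if tok == '<eos>':
            # Stopping is only allowed once the output is complete.
            if prefix in VALID_OUTPUTS:
                allowed.append(i)
        elif is_viable_prefix(prefix + tok):
            allowed.append(i)
    return allowed


def fake_logits(prefix):
    """Stand-in for a model; it scores 'Sure!' and '"delete"' highest."""
    return [1.0, 0.5, 3.0, 0.2, 1.0, 4.0, 0.1]


def constrained_sample(rng, max_steps=10):
    """Sample token by token, drawing only from grammar-allowed tokens."""
    prefix = ""
    for _ in range(max_steps):
        logits = fake_logits(prefix)
        allowed = allowed_token_ids(prefix)
        # Masking: disallowed tokens get zero probability mass because
        # they are simply left out of the sampling distribution.
        weights = [math.exp(logits[i]) for i in allowed]
        choice = rng.choices(allowed, weights=weights, k=1)[0]
        if VOCAB[choice] == '<eos>':
            return prefix
        prefix += VOCAB[choice]
    # Truncation is the one way a constrained run can end off-grammar.
    raise RuntimeError("hit max_steps before the grammar accepted")


if __name__ == "__main__":
    print(constrained_sample(random.Random(0)))
```

Even though the stand-in model scores `Sure!` and `"delete"` highest, neither can be emitted, because neither keeps the prefix extendable. Either of the two valid strings can come out, though, and the mask has no way of knowing which tool was the intended one: that is the sense in which the guarantee covers shape and not meaning.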
See also
- Tool hijacking — valid shape does not prevent the wrong tool being selected.
- Excessive agency — valid JSON does not imply authorized action.