Answer Over-Disclosure
Quick Answer
Answer over-disclosure is a generative AI tutor failure mode in which the tutor reveals the final answer or a complete worked solution before the learner has made a meaningful attempt, replacing the retrieval and reasoning the task was designed to practice. Named in the SafeTutors benchmark, it is a pedagogical failure rather than a factual one: the answer is correct, but its timing destroys the learning the task was meant to produce.
Answer Over-Disclosure
Answer over-disclosure is a generative AI tutor failure mode in which the tutor reveals the final answer or a complete worked solution before the learner has made a meaningful attempt, replacing the retrieval and reasoning the task was designed to practice. Named as the prototypical pedagogical-safety harm in the SafeTutors benchmark, it is a pedagogical failure rather than a factual one: the answer is correct, but delivering it at that moment substitutes for the learner's cognitive work. An instruction-tuned LLM's default behavior is to fulfill the user's request, so it produces this failure unless the surrounding tutoring system gates disclosure, and the failure intensifies across multi-turn dialogue and under motivated answer-seeking.
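To make the gating idea concrete, here is a minimal sketch of a disclosure gate. The names `LearnerState`, `is_meaningful_attempt`, and `disclosure_gate` are illustrative assumptions, not part of SafeTutors or any real tutoring system; the point is the policy shape, in which only evidence of a meaningful attempt, never answer-seeking pressure, unlocks the full solution.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerState:
    """What the learner has done on the current problem (illustrative)."""
    attempts: list = field(default_factory=list)  # learner's attempt texts
    asked_for_answer: int = 0  # count of direct "just tell me" turns


def is_meaningful_attempt(text: str) -> bool:
    """Placeholder heuristic: a real gate would score the attempt with a
    rubric or a classifier; word count is only a stand-in for this sketch."""
    return len(text.split()) >= 10


def disclosure_gate(state: LearnerState, min_attempts: int = 1) -> str:
    """Return the highest disclosure tier allowed this turn.

    Tiers: 'hint' < 'worked_step' < 'full_solution'. The anti-over-
    disclosure rule: a full solution requires at least `min_attempts`
    meaningful attempts. `asked_for_answer` never raises the tier,
    so pressure alone cannot unlock the answer.
    """
    meaningful = sum(1 for a in state.attempts if is_meaningful_attempt(a))
    if meaningful >= min_attempts:
        return "full_solution"
    if state.attempts:  # tried, but not substantively yet
        return "worked_step"
    return "hint"  # no attempt at all: prompt retrieval first


state = LearnerState()
print(disclosure_gate(state))  # hint

state.attempts.append("is it 42?")
state.asked_for_answer += 1
print(disclosure_gate(state))  # worked_step, despite the direct ask

state.attempts.append(
    "I set up 3x + 6 = 18, subtracted 6 from both sides to get 3x = 12, "
    "so x should be 4."
)
print(disclosure_gate(state))  # full_solution
```

The design choice worth noticing is that the gate is monotone in attempt quality and flat in answer-seeking: repeated "just tell me" turns are recorded but never change the tier, which is exactly the behavior the default instruction-tuned LLM lacks.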
The measurable downstream consequence is false mastery: high assisted performance paired with low unassisted performance, often discovered only at a high-stakes evaluation. Answer over-disclosure is the opposite of unhelpfulness: it is over-helpfulness. It is a tutor-design failure, not student misconduct.
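As a sketch of how that signature could be quantified, the function below computes the gap between assisted and unassisted accuracy on matched items. The name `performance_learning_gap` and the pairing scheme are assumptions for illustration, not a metric defined by SafeTutors.

```python
def performance_learning_gap(assisted, unassisted):
    """Assisted accuracy minus unassisted accuracy on matched items.

    Inputs are per-item correctness flags (True/False). A large positive
    gap is the false-mastery signature: success with the tutor present,
    failure alone.
    """
    def accuracy(flags):
        return sum(flags) / len(flags) if flags else 0.0
    return accuracy(assisted) - accuracy(unassisted)


# Example: 90% with the tutor but 40% alone gives a gap of 0.50.
assisted_runs = [True] * 9 + [False]
unassisted_runs = [True] * 4 + [False] * 6
print(f"gap = {performance_learning_gap(assisted_runs, unassisted_runs):.2f}")
```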
See also
- Generative AI tutor — parent explainer covering tutor failure modes and remediation patterns.
- Cognitive offloading — adjacent harm enabled when answer disclosure removes the need to reason.
- Performance–learning gap — the measurable consequence of repeated over-disclosure.
- Productive struggle — the learning behavior over-disclosure prevents.
- Solver–tutor gap — why correct-answer models still fail as tutors.