AI Deception
AI deception is when an AI system misleads people or other systems about what it knows, intends, or can do. This is different from ordinary mistakes or hallucinations: deception involves behavior that shapes others' beliefs in misleading ways. Evidence of such behavior has already appeared in widely used AI systems, and the risk is expected to grow as AI becomes more capable, more autonomous, and more embedded in everyday decision-making. The Scientific Advisory Board warns that current tools for detecting and controlling AI deception are not yet keeping pace with this growing risk.
AI deception can result in the loss of control of AI systems and in large-scale social and political disruption, and it could pose significant global risks.
Additional Resources
- Park, Peter S., et al. "AI deception: A survey of examples, risks, and potential solutions." Patterns 5.5 (2024).
- Chen, Boyuan, et al. "AI Deception: Risks, Dynamics, and Controls." arXiv preprint arXiv:2511.22619 (2025).
- Stix, Charlotte, et al. "AI Behind Closed Doors: a Primer on The Governance of Internal Deployment." arXiv preprint arXiv:2504.12170 (2025).
- Bengio, Yoshua, et al. "International AI Safety Report." DSIT 2025/001 (2025).
- Bengio, Yoshua, et al. "Superintelligent agents pose catastrophic risks: Can Scientist AI offer a safer path?" arXiv preprint arXiv:2502.15657 (2025).
- Duan et al. "AI Alignment and Deception: A Primer." September 2025.
- Balesni, Mikita, et al. "Towards evaluations-based safety cases for AI scheming." arXiv preprint arXiv:2411.03336 (2024).
