From Black Boxes to Glass Boxes: Explainable AI for Trustworthy Deepfake Forensics
Hanwei Qian, Lingling Xia, +3 authors, Zhengjun Jing
Abstract
As deepfake technology matures, its risks in spreading false information and threatening personal and societal security are escalating. Despite significant accuracy improvements in existing detection models, their inherent opacity limits their practical application in high-stakes areas such as forensic investigation and news verification. To address this trust gap, explainability has become a key research focus. This paper provides a systematic review of explainable deepfake detection methods, categorizing them into three main approaches: forensic analysis, which identifies physical or algorithmic manipulation traces; model-centric methods, which enhance transparency through post hoc explanations or interpretable-by-design architectures; and multimodal and natural language explanations, which translate detection results into human-understandable reports. The paper also examines evaluation frameworks, datasets, and open challenges, underscoring the need for trustworthy, reliable, and interpretable detection technologies in combating digital misinformation.
