Machine Learning Strategies for Audio Deepfake Detection

Chappidi Aishwarya

2025 · DOI: 10.22214/ijraset.2025.74070
International Journal for Research in Applied Science and Engineering Technology

Abstract

The proliferation of synthetic audio generated by advanced generative models poses a significant threat to the integrity of digital communication systems. This study proposes a novel hybrid framework combining Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM) networks, and eXtreme Gradient Boosting (XGBoost) to detect audio DeepFakes effectively. CNNs extract spatial features from Mel-frequency cepstral coefficients (MFCCs), Bi-LSTMs capture temporal dependencies, and XGBoost serves as a final decision-level classifier. Experiments conducted on benchmark datasets demonstrate that the proposed system achieves an accuracy of 98%, along with high precision, recall, and robustness against unseen attacks. These results highlight that combining deep spatial–temporal feature learning with ensemble classification offers a strong and reliable solution for securing voice-based systems against DeepFake threats.
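
The three-stage design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no layer sizes, training procedure, or hyperparameters, so everything below is assumed for illustration, including the names CnnBiLstmEncoder and fit_xgboost_head, the choice of 40 MFCCs, the two convolution layers, mean pooling over time, and the XGBoost settings. The sketch uses PyTorch for the CNN and Bi-LSTM stages and the xgboost package for the final classifier; the random tensor stands in for real MFCC features.

```python
import torch
import torch.nn as nn


class CnnBiLstmEncoder(nn.Module):
    """Hypothetical encoder: a CNN over MFCC maps followed by a Bi-LSTM over time."""

    def __init__(self, n_mfcc: int = 40, hidden: int = 64):
        super().__init__()
        # 2-D convolutions treat the MFCC matrix as a one-channel image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),  # pool coefficients, keep every frame
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
        )
        features = 32 * (n_mfcc // 4)  # channels x pooled coefficient axis
        self.lstm = nn.LSTM(features, hidden, batch_first=True, bidirectional=True)

    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:
        # mfcc has shape (batch, n_mfcc, frames).
        x = self.cnn(mfcc.unsqueeze(1))  # (batch, 32, n_mfcc // 4, frames)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, frames, features)
        out, _ = self.lstm(x)  # (batch, frames, 2 * hidden)
        return out.mean(dim=1)  # average over time: (batch, 2 * hidden)


def fit_xgboost_head(encoder: nn.Module, mfcc: torch.Tensor, labels):
    """Fit an XGBoost classifier on frozen encoder embeddings (assumed setup)."""
    from xgboost import XGBClassifier  # lazy import: only this stage needs xgboost

    encoder.eval()
    with torch.no_grad():
        embeddings = encoder(mfcc).numpy()
    # Hyperparameters are placeholders, not values reported in the paper.
    clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    clf.fit(embeddings, labels)
    return clf


# Shape check with random stand-in data: 4 clips, 40 MFCCs, 100 frames each.
encoder = CnnBiLstmEncoder()
batch = torch.randn(4, 40, 100)
embeddings = encoder(batch)
```

In this sketch the encoder would first need to be trained (for example with a temporary linear classification layer) before its embeddings are passed to fit_xgboost_head with 0/1 labels for genuine and synthetic clips; whether the paper trains the stages jointly or separately is not stated in the abstract.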