AI-Based Voice Spoofing Detection Using ML and DL
DOI: https://doi.org/10.70849/IJSCI
Keywords: Early detection of voice spoofing; speaker verification system; CQT-based cepstral coefficients; convolutional neural networks; BiLSTM; score fusion; ASVspoof; deepfake audio fraud protection.
Abstract
Voice-based biometric systems, such as speaker verification and voice authentication, are now common in financial institutions, virtual assistants, customer service, and security applications. However, attackers can still deceive these systems through several methods, such as replaying recorded voices, generating synthetic speech with text-to-speech (TTS), and applying voice conversion (VC); these attacks can go undetected and lead to security breaches. This article introduces an AI-based anti-spoofing system that combines machine learning (ML) and deep learning (DL) to distinguish genuine speech from spoofed audio. Handcrafted features, namely Mel-Frequency Cepstral Coefficients (MFCC), Constant-Q Cepstral Coefficients (CQCC), and spectral features, are combined with deep representations learned by Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)/Long Short-Term Memory (LSTM) models. Evaluation of the proposed hybrid methodology on benchmark datasets (for instance, ASVspoof) shows higher detection accuracy and a lower Equal Error Rate (EER) than traditional classifiers. These results suggest that combining ML-based feature engineering with DL-based end-to-end learning offers a powerful and flexible approach suited to practical, real-time voice spoofing detection.
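The hybrid design described above merges the output of an ML classifier on handcrafted features with that of a DL model, and results are reported as EER. The following is a minimal sketch, not the authors' implementation, of one common way to do both steps: weighted-sum score-level fusion, then an EER estimate obtained by sweeping thresholds. The helper names `fuse_scores` and `compute_eer`, the fusion weight, and the convention that a higher score means "more likely genuine" are assumptions for illustration; the two score lists are also assumed to be normalized to a comparable range beforehand.

```python
from typing import List, Sequence, Tuple


def fuse_scores(ml_scores: Sequence[float], dl_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted-sum score-level fusion (hypothetical helper, not from the paper).

    Assumes both score lists are already normalized to a comparable range and
    that higher scores mean "more likely genuine (bona fide)".
    """
    if len(ml_scores) != len(dl_scores):
        raise ValueError("ml_scores and dl_scores must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * m + (1.0 - weight) * d for m, d in zip(ml_scores, dl_scores)]


def compute_eer(bonafide: Sequence[float], spoof: Sequence[float]) -> Tuple[float, float]:
    """Estimate the Equal Error Rate by trying every observed score as a threshold.

    A trial is accepted as genuine when its score >= threshold. Returns
    (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    if not bonafide or not spoof:
        raise ValueError("need at least one genuine and one spoof score")
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(bonafide) | set(spoof)):
        frr = sum(s < thr for s in bonafide) / len(bonafide)  # genuine trials wrongly rejected
        far = sum(s >= thr for s in spoof) / len(spoof)       # spoof trials wrongly accepted
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2.0, thr
    return best_eer, best_thr


if __name__ == "__main__":
    # Toy scores for illustration only; these are NOT results from the paper.
    ml = [0.9, 0.7, 0.4, 0.2]   # e.g. a classifier on MFCC/CQCC features
    dl = [0.8, 0.9, 0.3, 0.5]   # e.g. a CNN-LSTM model
    labels = [1, 1, 0, 0]       # 1 = genuine, 0 = spoof
    fused = fuse_scores(ml, dl, weight=0.4)
    genuine = [s for s, y in zip(fused, labels) if y == 1]
    spoofed = [s for s, y in zip(fused, labels) if y == 0]
    eer, thr = compute_eer(genuine, spoofed)
    print(f"EER = {eer:.2%} at threshold {thr:.3f}")
```

The threshold sweep gives a discrete approximation of the EER: with few trials, FAR and FRR rarely cross exactly, so the midpoint at the closest threshold is reported. In practice the fusion weight would be tuned on a development set rather than fixed by hand.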
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.