Deep Learning Approach for Detecting Audio Deepfakes in Urdu

Marium Mateen


The application of Deep Learning algorithms to speech synthesis has led to the widespread generation of Audio Deepfakes, which are becoming a real threat to voice interfaces. Audio Deepfakes are fake audio recordings that are difficult to distinguish from real recordings because they use AI techniques to clone human voices. When prominent speakers, celebrities, and politicians are targeted, this technology can undermine public confidence and trust. It is therefore essential to create efficient methods and technologies to identify Audio Deepfakes and stop their creation and spread. Several Machine Learning and Deep Learning techniques have recently been developed to detect Audio Deepfakes and curb the circulation of fake audio. However, most of these solutions have been trained on English-language datasets, raising concerns about their accuracy and trustworthiness for other languages. The primary objective of this research is to develop a Deep Learning model for detecting Audio Deepfakes in Urdu. For this purpose, the model is trained on an Urdu-language audio dataset containing both real and fake audio. Real Urdu audio clips were collected first, and Deepfakes were then generated from them with the Real-Time Voice Cloning tool. Our Deep Learning-based model is built to detect Audio Deepfakes produced using imitation and synthesis techniques. When tested and evaluated, our model obtained an accuracy of 91 percent.


Article Details

Volume 2 (2023)