
The main aim of this project is to build an intelligent system that can accurately distinguish AI-generated voice recordings from authentic human speech. As voice synthesis technologies such as voice cloning (audio deepfakes) and text-to-speech engines become more advanced, detecting artificially generated audio has become a critical cybersecurity and forensic challenge. Such a system could be used in call centers, emergency response verification, fraud detection, and investigations involving voice evidence. By training models such as Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs) on labeled datasets of real and synthetic voice samples, the project aims to produce a tool that flags potentially fake recordings. The outcome is a prototype that can assist in automated voice verification tasks, with a focus on reliability and ethical AI use.
The project will be carried out over twelve weeks with structured milestones. Initially, students will gain an understanding of CNNs and SVMs and of how they apply to audio signal classification. They will then collect or generate datasets of real and AI-generated voices and preprocess the audio clips to extract features such as Mel-frequency cepstral coefficients (MFCCs) and spectrograms.
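As an illustration of the feature extraction step, the following is a minimal sketch of a magnitude spectrogram computed with the Python standard library only. It is not the project's prescribed pipeline: in practice a signal processing library would normally be used, and the function names, the frame length of 64 samples, the hop of 32 samples, and the 1 kHz test tone are all assumptions chosen for the example.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(samples, frame_len=64, hop=32):
    """Split the signal into overlapping frames and take a naive DFT of each.

    Returns a list of frames; each frame is a list of frame_len // 2 + 1
    magnitudes covering 0 Hz up to half the sampling rate.
    """
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical input: a 1 kHz sine sampled at 8 kHz (stands in for a voice clip)
sample_rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(samples)

# Locate the strongest frequency bin in the first frame and convert it to Hz
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), len(spec[0]), peak_hz)
```

Each row of the result is one time frame and each column one frequency bin, which is the two-dimensional layout a CNN would take as input. MFCCs build on the same framing step by adding a Mel filter bank, a logarithm, and a discrete cosine transform.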
Development will include writing and training models using TensorFlow or PyTorch, integrating signal processing libraries, and refining the models on diverse datasets. In the testing phase, the model will be evaluated on previously unseen voice samples for precision (the share of flagged recordings that are actually synthetic) and recall (the share of synthetic recordings that are flagged). A microphone may be used to test the model with real-time input. The project will also require proper documentation, usability testing, and a final presentation. Ethical standards, including privacy, data security, and anti-bias measures, must be followed strictly throughout the project lifecycle.
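The precision and recall evaluation mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the function name, the label encoding (1 for AI-generated, 0 for human), and the sample labels are hypothetical and do not come from any real test set.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when nothing is flagged or nothing is positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical held-out labels: 1 = AI-generated, 0 = human
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.75 0.75
```

Reporting both numbers matters here because the two kinds of error have different costs: a false positive wrongly flags a genuine speaker, while a false negative lets a synthetic recording pass unnoticed.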