Design and Development of Audio Processing and Speech Classification Algorithm
2025-06-28 16:31:20 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary
Unmanned Aerial Vehicles (UAVs) have attracted considerable attention in recent years for numerous applications, including military and civilian surveillance operations as well as search and rescue missions. Most UAVs are not controlled by professional pilots; their users typically have little aviation experience and face difficulties while flying, especially when performing other tasks simultaneously. It is therefore worthwhile to simplify UAV control by letting users maneuver the vehicle with voice commands. This project aims to control a quadcopter entirely by human voice input for effective flight and to make human-machine interaction easier and more effective. An intelligent speech processing algorithm will be proposed and implemented on a quadcopter to maneuver it accordingly.
The availability of large datasets has enabled deep learning to achieve state-of-the-art results across a variety of computer vision and speech recognition tasks. Speech, being the main method of communication among human beings, has received much interest since the early days of artificial intelligence, particularly over the past decade. Automatic speech recognition is the capability of a machine or computer to recognize the words and phrases of a spoken language and transform them into a machine-readable format. Speech recognition can be used in many applications, for example dictating to computers instead of typing, operating spacecraft when the hands and feet are occupied, assisting people with disabilities, smart homes, and many others.
Under this project, a machine learning/deep learning-based audio processing and speech recognition algorithm will be developed and implemented on a Raspberry Pi. The Raspberry Pi will pass maneuvering instructions to the quadcopter so that it flies according to the voice commands. The TensorFlow Speech Recognition Challenge dataset will be used to train the network. The dataset includes 65,000 one-second-long utterances of 30 short words, spoken by thousands of different people. The short words include right, left, up, down, go, etc.
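The step from a recognized word to a maneuvering instruction can be sketched as follows. This is only an illustrative sketch, not the project's implementation: the `Maneuver` type, the velocity values, the 0.8 confidence threshold, and the hover fallback are all assumptions introduced for the example, and the classifier scores are assumed to arrive as a word-to-probability dictionary.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Maneuver:
    """Velocity setpoint in m/s (hypothetical convention for this sketch)."""
    forward: float = 0.0
    right: float = 0.0
    up: float = 0.0


# Hypothetical mapping from the dataset words named in the text to maneuvers.
COMMAND_TABLE: Dict[str, Maneuver] = {
    "go": Maneuver(forward=0.5),
    "left": Maneuver(right=-0.5),
    "right": Maneuver(right=0.5),
    "up": Maneuver(up=0.5),
    "down": Maneuver(up=-0.5),
}

# Assumed safe fallback: hold position when the command is unclear.
HOVER = Maneuver()


def select_maneuver(scores: Dict[str, float], threshold: float = 0.8) -> Maneuver:
    """Pick the maneuver for the highest-scoring word.

    Falls back to HOVER when there are no scores, when the best score is
    below the threshold, or when the word is not a flight command.
    """
    if not scores:
        return HOVER
    word = max(scores, key=scores.get)
    if scores[word] < threshold:
        return HOVER
    return COMMAND_TABLE.get(word, HOVER)


if __name__ == "__main__":
    print(select_maneuver({"left": 0.93, "right": 0.04, "go": 0.03}))
```

Falling back to a hover on low confidence is one possible design choice; it keeps a misheard or non-command word from turning into an unintended movement.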
Project Objectives
- The core intention of this project is to control the quadcopter entirely by human voice input for effective flight.
- A machine learning/deep learning-based audio processing and speech recognition algorithm will be designed and developed.
- Voice commands will be used to control the quadcopter and make human-machine interaction easier and more effective.
- A comprehensive literature review will be carried out.
- The selected method/technique will be implemented using PyTorch and the TensorFlow Speech Recognition Challenge dataset.
- A Genetic Algorithm will be employed to optimize the selected machine learning/deep learning algorithm and reduce its overall computational cost.
- The proposed technique will then be implemented on a Raspberry Pi.
- Real-time voice-controlled quadcopter flights will be carried out.
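The Genetic Algorithm objective above could take roughly the following shape. This is a minimal sketch under stated assumptions: the genome (filters per layer, number of layers), the search ranges, and especially `proxy_fitness` are hypothetical stand-ins. In practice the fitness would come from training and evaluating the chosen network; here a toy formula trades a made-up accuracy term against a made-up cost term so that the example runs on its own.

```python
import random
from typing import Callable, List, Tuple

Genome = Tuple[int, int]  # (filters per layer, number of layers), hypothetical

FILTER_CHOICES = [8, 16, 32, 64, 128]
LAYER_CHOICES = [1, 2, 3, 4, 5, 6]


def proxy_fitness(genome: Genome) -> float:
    """Toy stand-in for 'validation accuracy minus compute penalty'."""
    filters, layers = genome
    capacity = filters * layers
    accuracy = capacity / (capacity + 64.0)  # diminishing returns
    cost = (filters * filters * layers) / (128 * 128 * 6)  # normalized to (0, 1]
    return accuracy - 0.5 * cost


def random_genome(rng: random.Random) -> Genome:
    return (rng.choice(FILTER_CHOICES), rng.choice(LAYER_CHOICES))


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Take one gene from each parent."""
    return (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])


def mutate(genome: Genome, rng: random.Random, rate: float = 0.2) -> Genome:
    """Resample each gene independently with probability `rate`."""
    filters, layers = genome
    if rng.random() < rate:
        filters = rng.choice(FILTER_CHOICES)
    if rng.random() < rate:
        layers = rng.choice(LAYER_CHOICES)
    return (filters, layers)


def tournament(pop: List[Genome], fitness: Callable[[Genome], float],
               rng: random.Random, k: int = 3) -> Genome:
    """Return the fittest of k randomly sampled individuals."""
    return max(rng.sample(pop, k), key=fitness)


def run_ga(fitness: Callable[[Genome], float], pop_size: int = 12,
           generations: int = 15, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best forward unchanged.
        children = [max(pop, key=fitness)]
        while len(children) < pop_size:
            parent_a = tournament(pop, fitness, rng)
            parent_b = tournament(pop, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = run_ga(proxy_fitness)
    print("best genome:", best, "fitness:", round(proxy_fitness(best), 3))
```

Because the best individual is copied into every new generation, the best fitness never decreases between generations, which makes the search easy to monitor when each real fitness evaluation is expensive.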
By offering increased work efficiency and productivity, drones have become an important focus in various applications, including agricultural monitoring, disaster management, surveillance, remote sensing, and videography. This project aims to improve human-machine interaction through speech commands. Considering the ongoing COVID-19 pandemic, the approach can also be used to control machines without physical contact. Speech recognition also finds applications in many other areas, for example:
- Automated, contact-free data entry in ATMs and vending machines (COVID-19)
- Home/Office automation
- Wheelchair control for people with disabilities
- Voice-controlled vehicle navigation systems
- A machine learning/deep learning-based speech recognition algorithm implemented on a Raspberry Pi to maneuver a quadcopter in real time using voice commands.
- A functional communication protocol to send flight commands from the ground station to the flight controller.
- A fully functional Quadcopter.
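As an illustration of what a ground-station-to-flight-controller command message might look like, the sketch below packs a velocity command into a small binary frame with a checksum. The frame layout (start byte `0xA5`, sequence number, three signed 16-bit setpoints in cm/s, XOR checksum) is entirely hypothetical and is not taken from the project or from any existing flight-controller protocol.

```python
import struct
from functools import reduce
from typing import Tuple

START_BYTE = 0xA5  # hypothetical frame marker

# Little-endian: start (uint8), seq (uint8), forward/right/up (int16, cm/s).
_BODY = struct.Struct("<BBhhh")


def _checksum(data: bytes) -> int:
    """XOR of all bytes; a simple integrity check for this sketch."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)


def encode_command(seq: int, forward: int, right: int, up: int) -> bytes:
    """Build a 9-byte frame: 8-byte body followed by a 1-byte checksum."""
    body = _BODY.pack(START_BYTE, seq & 0xFF, forward, right, up)
    return body + bytes([_checksum(body)])


def decode_command(frame: bytes) -> Tuple[int, int, int, int]:
    """Validate a frame and return (seq, forward, right, up)."""
    if len(frame) != _BODY.size + 1:
        raise ValueError("unexpected frame length")
    body, received = frame[:-1], frame[-1]
    if _checksum(body) != received:
        raise ValueError("checksum mismatch")
    start, seq, forward, right, up = _BODY.unpack(body)
    if start != START_BYTE:
        raise ValueError("bad start byte")
    return seq, forward, right, up


if __name__ == "__main__":
    frame = encode_command(seq=7, forward=50, right=-25, up=0)
    print(frame.hex(), decode_command(frame))
```

A fixed-length frame with a sequence number and checksum lets the receiver reject corrupted or truncated messages and detect dropped ones; a real link would likely need a stronger check such as a CRC.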
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Microphone | Equipment | 1 | 1000 | 1000 |
| NVIDIA Jetson Nano Developer Kit | Equipment | 1 | 30000 | 30000 |
| Power Board | Equipment | 1 | 2000 | 2000 |
| 8MP Raspberry Pi Camera Module V2 | Equipment | 1 | 5000 | 5000 |
| Total (in Rs) | | | | 38000 |