Design and Development of Audio Processing and Speech Classification Algorithm

2025-06-28 16:31:20 - Adil Khan

Project Title

Project Area of Specialization: Artificial Intelligence

Project Summary

Unmanned Aerial Vehicles (UAVs) have gained considerable attention in recent years for numerous applications, including military and civilian surveillance operations and search and rescue missions. Many UAVs are not flown by professional pilots, and most users have little aviation experience, so they face difficulties during flight, especially when performing other tasks simultaneously. It is therefore worthwhile to simplify UAV control by enabling users to maneuver the vehicle with voice commands. This project aims to control a quadcopter entirely by human voice input for effective flight and to make human-machine interaction easier and more effective. An intelligent speech processing algorithm will be proposed and implemented on a quadcopter to maneuver it accordingly.

The availability of large datasets has enabled deep learning to make cutting-edge advances in a variety of computer vision and speech recognition domains. Speech, being the main method of communication among human beings, has received sustained interest since the early days of artificial intelligence, and especially in the past decade. Automatic speech recognition is the capability of a machine or computer to recognize the words and phrases in spoken language and transform them into a machine-readable format. Speech recognition can be used in many other applications, for example dictating to computers instead of typing, operating spacecraft systems when the operator's hands are busy, assisting people with disabilities, smart homes, and many others.

Under this project, a machine learning/deep learning-based audio processing and speech recognition algorithm will be developed and implemented on a Raspberry Pi. The Raspberry Pi will pass maneuvering instructions to the quadcopter so that it flies according to the voice commands. The TensorFlow Speech Recognition Challenge dataset will be used to train the network. The dataset includes 65,000 one-second-long utterances of 30 short words, spoken by thousands of different people. The short words include right, left, up, down, go, etc.
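As a rough illustration of the audio processing front end, the sketch below splits a one-second clip into short overlapping frames and computes one log-energy value per frame. The 16 kHz sample rate, the 25 ms window, the 10 ms hop, and all function names are assumptions made for illustration and are not fixed by this proposal. A synthetic sine tone stands in for a recorded utterance, and a real front end would more likely compute MFCC or log-mel features.

```python
import math

SAMPLE_RATE = 16000  # assumed: samples per second
FRAME_LEN = 400      # assumed: 25 ms analysis window
HOP_LEN = 160        # assumed: 10 ms hop between frames


def frame_signal(samples, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Split a list of samples into overlapping frames (no padding)."""
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // hop_len
    return [samples[i * hop_len:i * hop_len + frame_len] for i in range(n_frames)]


def log_energy(frame, eps=1e-10):
    """Natural log of the mean squared amplitude of one frame."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return math.log(mean_sq + eps)


def make_tone(freq_hz=440.0, seconds=1.0, amplitude=0.5):
    """Synthetic sine tone standing in for a recorded utterance."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n)]


clip = make_tone()
frames = frame_signal(clip)
features = [log_energy(f) for f in frames]
print(len(frames), len(frames[0]))
```

Under these assumed settings a one-second clip yields 98 frames of 400 samples each, so every utterance becomes a fixed-length feature sequence that a classifier can consume.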

Project Objectives

The core aim of this project is to control a quadcopter entirely by human voice input for effective flight.
A machine learning/deep learning-based audio processing and speech recognition algorithm will be designed and developed.
Voice commands will be used to control the quadcopter and make human-machine interaction easier and more effective.

Project Implementation Method

A comprehensive literature review will be carried out.
The selected method/technique will be implemented using PyTorch and trained on the TensorFlow Speech Recognition Challenge dataset.
A Genetic Algorithm will be employed to optimize the selected machine learning/deep learning algorithm and reduce its overall computational cost.
The proposed technique will then be implemented on a Raspberry Pi.
Real-time voice-controlled quadcopter flights will be carried out.
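The optimization step listed above can be sketched as a small genetic algorithm. Everything in this example is an assumption: the two hyperparameters (`filters` and `hidden`), their ranges, the population settings, and especially `fitness`, which is a toy stand-in that rewards a made-up accuracy proxy and penalizes a rough cost estimate. In the actual project the fitness value would come from training and evaluating the network.

```python
import random

# Hypothetical search space: (min, max) for each hyperparameter.
BOUNDS = {"filters": (8, 128), "hidden": (16, 512)}


def random_individual(rng):
    """Draw one candidate configuration uniformly within the bounds."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def fitness(ind):
    """Toy stand-in: accuracy proxy saturates with size, cost grows with it."""
    accuracy_proxy = 1.0 - 1.0 / (1.0 + 0.05 * ind["filters"] + 0.01 * ind["hidden"])
    cost = (ind["filters"] * ind["hidden"]) / (128 * 512)
    return accuracy_proxy - 0.5 * cost


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.2):
    """With probability `rate`, resample a gene within its bounds."""
    out = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=fitness)


def evolve(generations=20, pop_size=12, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(best, round(history[-1], 3))
```

Because the best individual is carried over unchanged each generation, the best fitness recorded in `history` never decreases. Replacing the cost term with a measured inference time on the target board would tie the search more directly to the stated goal of lowering computational cost.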

Benefits of the Project

Offering increased work efficiency and productivity, drones have become an important focus in various applications, including agricultural monitoring, disaster management, surveillance, remote sensing, and videography. This project aims to improve human-machine interaction through speech commands. Given the ongoing COVID-19 pandemic, the project can also be used to control machines without physical touch. Speech recognition also has applications in many other areas, for example:

Touch-free data entry at ATMs and vending machines (relevant during COVID-19)
Home/Office automation
Wheelchair control for people with disabilities
Vehicle navigation by voice commands

Technical Details of Final Deliverable

A machine learning/deep learning-based speech recognition algorithm implemented on a Raspberry Pi to maneuver a quadcopter in real time using voice commands.
A functional communication protocol to send flight commands from the ground station to the flight controller.
A fully functional quadcopter.
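The link from the ground station to the flight controller could, for instance, carry small fixed-size frames. The sketch below is purely illustrative: the start byte, the command codes, the four-byte layout, and the XOR checksum are all hypothetical and do not describe any existing flight-controller protocol. It maps a recognized word to a command code, packs it with a sequence number, and verifies the checksum when decoding.

```python
import struct

START_BYTE = 0xA5  # hypothetical frame marker

# Hypothetical command codes for a handful of command words.
COMMAND_CODES = {"up": 1, "down": 2, "left": 3, "right": 4, "go": 5, "stop": 6}
CODE_TO_WORD = {code: word for word, code in COMMAND_CODES.items()}


def xor_checksum(data):
    """XOR of all bytes; a simple (assumed) integrity check."""
    checksum = 0
    for byte in data:
        checksum ^= byte
    return checksum


def encode_command(word, seq):
    """Build a 4-byte frame: start byte, sequence number, code, checksum."""
    if word not in COMMAND_CODES:
        raise ValueError(f"unsupported command word: {word!r}")
    body = struct.pack("BBB", START_BYTE, seq & 0xFF, COMMAND_CODES[word])
    return body + bytes([xor_checksum(body)])


def decode_command(frame):
    """Return (seq, word); raise ValueError if the frame is malformed."""
    if len(frame) != 4:
        raise ValueError("frame must be exactly 4 bytes")
    start, seq, code, checksum = struct.unpack("BBBB", frame)
    if start != START_BYTE:
        raise ValueError("bad start byte")
    if checksum != xor_checksum(frame[:3]):
        raise ValueError("checksum mismatch")
    if code not in CODE_TO_WORD:
        raise ValueError("unknown command code")
    return seq, CODE_TO_WORD[code]


frame = encode_command("left", seq=7)
print(frame.hex(), decode_command(frame))
```

The sequence number lets the receiver notice dropped or repeated frames, and rejecting frames with a bad checksum avoids acting on a corrupted command. A production link would likely need a stronger check such as a CRC, plus acknowledgements.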

Final Deliverable of the Project: HW/SW integrated system
Core Industry: IT
Other Industries:
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Industry, Innovation and Infrastructure

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
Microphone	Equipment	1	1000	1000
NVIDIA Jetson Nano Developer Kit	Equipment	1	30000	30000
Power Board	Equipment	1	2000	2000
8MP Raspberry Pi Camera Module V2	Equipment	1	5000	5000
			Total (in Rs)	38000
