Adil Khan 1 year ago

AdiKhanOfficial #FYP Ideas

Real-time Sound Event Detection System using Deep learning with Edge Computing

A baby crying or someone calling for help are just some audio events that require action. Manually monitoring for these sounds, either in proximity or remotely through a monitoring device, not only demands attention but also requires the person to be within hearing distance. This is not always possi

Project Title

Project Area of Specialization

Electrical/Electronic Engineering

Project Summary

Machine learning has experienced strong growth in recent years, due to increased dataset sizes and computational power, and to advances in deep learning methods that can learn to make predictions in extremely nonlinear problem settings. However, a large amount of data is needed to train a neural network that achieves good performance. As more audio datasets have become publicly available, more tagging labels have become available for them as well. We refer to these tagging labels, which only indicate whether a type of event is present in a recording and carry no temporal information about it, as weak labels. In recent decades, however, there has also been growing demand for transcription predictions for a variety of audio recordings instead of just the tags of a recording. Transcription of audio recordings refers to audio event detection, which provides a list of the audio events active in a recording along with temporal information about each of them, i.e., the starting time and duration of each event.

Some potential applications where audio event transcription is necessary are context awareness for cars, smartphones, etc., surveillance for dangerous events and crimes, analysis and monitoring of biodiversity, recognition of noise sources and machine faults, and many more. Depending on the audio event to be detected and classified in each task, it may be difficult to collect enough samples. Furthermore, different tasks use task-specific datasets, so the number of recordings available may be limited. Annotating data with strong labels, i.e., labels that contain temporal information about the events, to train transcription predictors is a time-consuming process involving a lot of manual labor. Collecting weakly labeled data, on the other hand, takes much less time, since the annotator only has to mark the active sound event classes and not their exact boundaries. We refer to datasets that only have these weak labels, may contain rare events, and have limited amounts of training data as low-resource datasets. Compared with supervised techniques trained on strong labels, there has been relatively little work on learning to perform audio event transcription from weakly labeled data.
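To make the difference between strong and weak labels concrete, here is a minimal sketch. The event names and times are made up for illustration, and `to_weak_labels` is a hypothetical helper, not part of any dataset toolkit.

```python
# Strong labels: each entry carries the event class plus its timing.
# The events and times below are hypothetical, for illustration only.
strong_labels = [
    ("baby_cry", 2.5, 4.0),   # (event class, onset in s, duration in s)
    ("door_slam", 7.1, 0.3),
    ("baby_cry", 12.0, 3.2),
]


def to_weak_labels(strong):
    """Drop all temporal information, keeping only which classes occur."""
    return sorted({event for event, _onset, _duration in strong})


weak_labels = to_weak_labels(strong_labels)
print(weak_labels)  # ['baby_cry', 'door_slam']
```

Going from strong to weak labels is trivial, as shown. The project addresses the hard direction, predicting onsets and durations when only the weak labels are available for training.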

Project Objectives

Main tasks identified in the Gantt Chart?

The main steps to proceed:

Initiation
Planning
Execution
Simulations
Testing
Hardware

Main Milestones identified in Gantt Chart?

Data Collection
Algorithm Selection
Product Design

Project Implementation Method

We will use deep neural networks to classify sound events for detection, and data segmentation to deal with weakly labeled data.

Process:

First, we will collect data and divide it into small, equal-length files of approximately 20 to 30 seconds each. After loading the data into the program, a Fourier transform is applied to obtain the frequency content, i.e., the spectrum. We then analyze how this spectrum changes over time using the Short-Time Fourier Transform (STFT), with color depth representing magnitude. The labeled results are fed into the model, i.e., a multilayer neural network. Along the way, we segment the audio files to increase the amount of data, and divide the full dataset into training, validation, and test sets. We will evaluate the model's predictions first on the test set and then on real-world data. To reduce overfitting, we will use one or two of the following techniques: early stopping, switching, and regularization. Training will use the mini-batch gradient descent algorithm. All of this will be implemented in the PyCharm IDE, using libraries that include TensorFlow and Keras.
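The segmentation and dataset split can be sketched as follows. This is only an illustration under assumptions: the 20-second segment length is taken from the low end of the range above, while the 70/15/15 split, the toy sample rate, and the function names are placeholders that the proposal does not specify.

```python
import random


def segment(samples, sample_rate, segment_seconds):
    """Cut a recording into consecutive, non-overlapping, equal-length segments.

    A trailing remainder shorter than one full segment is dropped.
    """
    seg_len = sample_rate * segment_seconds
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]


def split_dataset(items, train_pct=70, val_pct=15, seed=0):
    """Shuffle and split into training, validation, and test sets.

    The 70/15/15 proportions are an assumption, not taken from the proposal.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


# Toy stand-in for a recording: 95 s of silence at a deliberately low 100 Hz.
sample_rate = 100
recording = [0.0] * (95 * sample_rate)
segments = segment(recording, sample_rate, segment_seconds=20)
print(len(segments))  # 4 full segments; the last 15 s are dropped

train, val, test = split_dataset(range(20))
print(len(train), len(val), len(test))  # 14 3 3
```

In a real pipeline, segments cut from the same recording would usually be kept in the same split, so that the test set does not contain audio the model has effectively already heard.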

Fourier Transform:

The Fourier Transform is a mathematical technique that transforms a function of time, x(t), into a function of frequency, X(ω). It is closely related to the Fourier Series.
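For sampled audio, the discrete counterpart of this transform is used. The sketch below is a naive discrete Fourier transform written with the standard library only, applied to a made-up 5 Hz test tone. In practice an FFT routine from a numerical library would replace the `dft` function, which is here purely for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum over n of x[n] * exp(-2*pi*j*k*n/N)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


# Hypothetical test signal: a 5 Hz sine sampled at 64 Hz for exactly 1 second.
sample_rate = 64
signal = [math.sin(2 * math.pi * 5 * n / sample_rate) for n in range(sample_rate)]

spectrum = dft(signal)
# Keep only the non-negative frequencies below half the sample rate.
magnitudes = [abs(c) for c in spectrum[: sample_rate // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * sample_rate / len(signal)
print(peak_hz)  # 5.0
```

The time-domain samples say nothing directly about pitch, while the spectrum has a single clear peak at the tone's frequency. That is why the spectrum is a convenient input for a classifier.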

Short-Time Fourier Transform:

Short-time Fourier transform (STFT) is a sequence of Fourier transforms of a windowed signal. STFT provides the time-localized frequency information for situations in which frequency components of a signal vary over time, whereas the standard Fourier transform provides the frequency information averaged over the entire signal time interval.
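A minimal sketch of this idea, again with the standard library only: the made-up signal switches from 4 Hz to 12 Hz halfway through, and the per-frame spectra show the change. The frame length, hop size, and Hann window are illustrative choices, not values from the proposal.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def stft(x, frame_len, hop):
    """Return one magnitude spectrum (non-negative frequencies) per frame."""
    # Hann window, tapering each frame towards zero at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(c) for c in dft(frame)[: frame_len // 2 + 1]])
    return frames


# Hypothetical signal: 1 s of a 4 Hz tone followed by 1 s of a 12 Hz tone.
sample_rate = 64
low = [math.sin(2 * math.pi * 4 * n / sample_rate) for n in range(sample_rate)]
high = [math.sin(2 * math.pi * 12 * n / sample_rate) for n in range(sample_rate)]
signal = low + high

frame_len = 32
spectrogram = stft(signal, frame_len=frame_len, hop=frame_len)
dominant_hz = [
    max(range(len(f)), key=f.__getitem__) * sample_rate / frame_len
    for f in spectrogram
]
print(dominant_hz)  # [4.0, 4.0, 12.0, 12.0]
```

A single Fourier transform over the whole signal would show both tones but not when each one occurs. Here the hop equals the frame length for simplicity; overlapping frames are the more common choice in practice.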

Multilayer Neural Network:

Multilayer networks solve the classification problem for nonlinear sets by employing hidden layers, whose neurons are not directly connected to the output. The additional hidden layers can be interpreted geometrically as additional hyper-planes, which enhance the separation capacity of the network.
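The classic small example of a nonlinear set is XOR, which no single neuron can separate. In the sketch below, two hidden neurons each draw one line (hyperplane) through the input space and the output neuron combines them. The weights are hand-picked for illustration rather than learned, and a step activation is assumed for simplicity.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, then the step."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    # Hidden layer: each neuron defines one separating line in the input plane.
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)    # fires if x1 + x2 > 0.5
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)   # fires if x1 + x2 > 1.5
    # Output layer: "OR but not AND", i.e. the region between the two lines.
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)


predictions = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
print(predictions)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

In the actual project the weights would be learned by gradient descent with a differentiable activation, but the geometric picture is the same.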

Benefits of the Project

Project deliverables and specifications are as follows:

The model will be able to classify sound events from extracted features with a reduced error (cost function).

The device can turn microphones – like those in smart speakers and cameras – into a sound recognition and alerting device.

It listens for audio events and automatically alerts an application if a specific sound is detected, so that a human may take the appropriate action.

The application will be used to generate an alarm.

History and previous data will be saved in the app.

The device will be portable.

It would also increase the security and protection of the home.

Technical Details of Final Deliverable

Project deliverables and specifications are as follows:

The device can turn microphones – like those in smart speakers and cameras – into a sound recognition and alerting device.
It listens for audio events and automatically alerts an application if a specific sound is detected, so that a human may take the appropriate action.
The app will be used to generate an alarm.
History and previous data will be saved in the app.
The device will be portable.
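One possible way to turn model outputs into alerts is sketched below. It assumes the model emits one probability per audio frame for the target sound. The threshold, the number of consecutive frames required, and the function name are all hypothetical choices, not part of the proposal.

```python
def detect_alerts(frame_probs, threshold=0.8, min_consecutive=3):
    """Return the frame indices at which an alert would fire.

    An alert fires once when the probability has stayed at or above
    `threshold` for `min_consecutive` frames in a row, and can fire again
    only after the probability has dropped below the threshold.
    """
    alerts = []
    run = 0
    for i, p in enumerate(frame_probs):
        run = run + 1 if p >= threshold else 0
        if run == min_consecutive:
            alerts.append(i)
    return alerts


# Hypothetical per-frame probabilities for the "baby crying" class.
probs = [0.1, 0.9, 0.2, 0.85, 0.9, 0.95, 0.9, 0.3, 0.9, 0.9, 0.9]
alerts = detect_alerts(probs)
print(alerts)  # [5, 10]
```

Requiring several consecutive frames keeps a single spurious high score (frame 1 in the example) from triggering the alarm in the app.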

Final Deliverable of the Project

Hardware System

Core Industry

Security

Other Industries

Core Technology

Artificial Intelligence(AI)

Other Technologies

Sustainable Development Goals

Sustainable Cities and Communities

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
Controller (Module)	Equipment	1	10000	10000
Mic(Nodes)	Equipment	4	1500	6000
Battery	Equipment	1	5000	5000
Charger	Equipment	2	1500	3000
RGB Lights	Miscellaneous	4	500	2000
Device Box	Miscellaneous	1	1500	1500
Components(others)	Miscellaneous	1	2000	2000
			Total (in Rs)	29500

If you need this project, please contact me on contact@adikhanofficial.com
