Adil Khan 9 months ago

Real-time Sound Event Detection System using Deep learning with Edge Computing


Project Title

Real-time Sound Event Detection System using Deep learning with Edge Computing

Project Area of Specialization

Electrical/Electronic Engineering

Project Summary

A baby crying and someone calling for help are just two examples of audio events that require action. Manually monitoring for these sounds, either in person or remotely through a monitoring device, not only demands constant attention but also requires the person to be within hearing distance. This is not always possible, especially for deaf and hard-of-hearing people.

Machine learning has grown strongly in recent years, driven by larger datasets, greater computational power, and advances in deep learning methods that can learn to make predictions in highly nonlinear problem settings. However, training a neural network that performs well requires a large amount of data. As more audio datasets have become publicly available, the number of tagging labels for them has also grown. We refer to these tagging labels, which only indicate whether a type of event is present in a recording and carry no temporal information about it, as weak labels.

In recent decades, there has also been growing demand for transcription predictions for a variety of audio recordings, rather than just recording-level tags. Transcription of audio recordings refers to audio event detection: producing a list of the audio events active in a recording together with temporal information about each of them, i.e., the start time and duration of each event. Potential applications that require audio event transcription include context awareness for cars, smartphones, etc.; surveillance for dangerous events and crimes; analysis and monitoring of biodiversity; and recognition of noise sources and machine faults, among many others. Depending on the audio events to be detected and classified in a given task, it may be difficult to collect enough samples of them. Furthermore, different tasks use task-specific datasets, so the number of available recordings may be limited. Annotating data with strong labels (labels that contain temporal information about the events) to train transcription predictors is a time-consuming process involving a lot of manual labor. Collecting weakly labeled data, on the other hand, takes much less time, since the annotator only has to mark which sound event classes are active, not their exact boundaries. We refer to datasets that have only weak labels, may contain rare events, and have limited amounts of training data as low-resource datasets. Compared with supervised techniques trained on strong labels, relatively little work has been done on learning audio event transcription from weakly labeled data.
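To make the difference between weak and strong labels concrete, here is a minimal plain-Python sketch. The class names, the 0.5 s frame hop, the 0.5 threshold, and the helpers `frames_to_events` and `weak_labels` are hypothetical choices for illustration, not part of the project or any library. The sketch takes frame-wise class probabilities (the kind of output a detector might produce), merges consecutive frames above the threshold into events with an onset and duration (a strong-label transcription), and then collapses those events into the recording-level weak label.

```python
from dataclasses import dataclass

@dataclass
class Event:
    label: str
    onset: float     # start time in seconds
    duration: float  # length in seconds

def frames_to_events(probs, hop_s=0.5, threshold=0.5):
    """Turn per-frame class probabilities into strong labels.

    `probs` maps a class name to its per-frame probabilities; runs of
    consecutive frames at or above `threshold` become one event each.
    """
    events = []
    for label, frame_probs in probs.items():
        start = None
        # A trailing 0.0 sentinel closes an event that lasts until the last frame.
        for i, p in enumerate(list(frame_probs) + [0.0]):
            if p >= threshold and start is None:
                start = i
            elif p < threshold and start is not None:
                events.append(Event(label, start * hop_s, (i - start) * hop_s))
                start = None
    return sorted(events, key=lambda e: e.onset)

def weak_labels(events):
    """A weak label keeps only which classes occur, not when they occur."""
    return sorted({e.label for e in events})

# Hypothetical detector output for a 4-second clip (8 frames of 0.5 s each).
probs = {
    "baby_cry": [0.1, 0.7, 0.9, 0.8, 0.2, 0.1, 0.0, 0.0],
    "scream":   [0.0, 0.0, 0.0, 0.1, 0.2, 0.6, 0.7, 0.3],
}
strong = frames_to_events(probs)
for event in strong:
    print(event)
print(weak_labels(strong))  # ['baby_cry', 'scream']
```

Going from strong to weak labels is trivial, as `weak_labels` shows. The hard direction, which the weakly supervised setting described above must solve, is recovering the onsets and durations when only the weak label was annotated.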

Project Objectives

Main tasks identified in the Gantt Chart?

The main steps to proceed:

  • Initiate
  • Planning
  • Execution
  • Simulations
  • Testing
  • Hardware

Main Milestones identified in Gantt Chart?

  • Data Collection
  • Algorithm Selection
  • Product Design

Project Implementation Method

We will use deep neural networks to classify sound events for detection, and data segmentation to deal with weakly labeled data.
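The segmentation idea can be sketched in plain Python as follows. This assumes that a recording is a list of samples with a set of weak labels, and that every fixed-length clip inherits the labels of the recording it was cut from. That inheritance is a simplifying assumption: a short clip may not actually contain every labeled event. The function `segment_recording`, the toy 10 Hz sample rate, and the 25-second default (inside the 20 to 30 second range given in the process description) are hypothetical choices.

```python
def segment_recording(samples, labels, sample_rate, clip_s=25.0, hop_s=None):
    """Split one weakly labeled recording into equal-length clips.

    Each clip inherits the recording-level (weak) labels. A hop shorter
    than the clip length gives overlapping clips, which increases the
    number of training examples. Trailing samples that do not fill a
    whole clip are dropped so that every clip has the same length.
    """
    clip_len = int(clip_s * sample_rate)
    hop = clip_len if hop_s is None else int(hop_s * sample_rate)
    if clip_len <= 0 or hop <= 0:
        raise ValueError("clip and hop lengths must be positive")
    clips = []
    # If the recording is shorter than one clip, the range is empty.
    for start in range(0, len(samples) - clip_len + 1, hop):
        clips.append((samples[start:start + clip_len], set(labels)))
    return clips

recording = [0.0] * 700  # 70 s of (silent) audio at a toy 10 Hz sample rate
clips = segment_recording(recording, {"baby_cry"}, sample_rate=10)
print(len(clips))  # 2 non-overlapping 25 s clips; the last 20 s are dropped
overlapping = segment_recording(recording, {"baby_cry"}, sample_rate=10, hop_s=10)
print(len(overlapping))  # 5 overlapping clips with a 10 s hop
```

Overlapping hops are one common way to "increase data size" from a fixed set of recordings. The trade-off is that neighbouring clips are highly correlated, so they should be kept in the same training, validation, or test partition.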

Process:

First, we will collect data and divide it into small, equal-length files of approximately 20 to 30 seconds each. After the data is loaded into the program, a Fourier transform is applied to obtain a frequency analysis, i.e., the spectrum. We then analyze how this spectrum changes over time using the Short-Time Fourier Transform (STFT), whose output (a spectrogram) represents magnitude as color depth. The labeled results will be fed into the model, a multilayer neural network. Along the way, we will segment the audio files/input to increase the size of the dataset, and split the total data into training, validation, and test sets. We will evaluate the predictions first on the test data and then on real data. To reduce overfitting, we will use one or two of the following techniques: early stopping, switching, and regularization. The network will be trained with the mini-batch gradient descent algorithm. All of this will be implemented in PyCharm using libraries including TensorFlow and Keras.
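The data split, mini-batch gradient descent, and early stopping steps can be sketched without any deep learning library. To keep the example short and dependency-free, a one-feature logistic regression stands in for the multilayer neural network, and the synthetic data, learning rate, batch size, and patience values are all hypothetical. In Keras, the same roles would be played by the `batch_size` and `validation_data` arguments of `Model.fit` and the `tf.keras.callbacks.EarlyStopping` callback.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def split_data(data, train=0.7, val=0.15, seed=0):
    """Shuffle, then split into training, validation, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

def log_loss(w, b, batch):
    """Mean binary cross-entropy of the model p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in batch:
        p = min(max(sigmoid(w * x + b), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(batch)

def train_minibatch(train, val, lr=0.1, batch_size=16, max_epochs=200, patience=5, seed=0):
    """Mini-batch gradient descent with early stopping on validation loss.

    Returns (best validation loss, w, b, epoch) for the best epoch seen.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    best = (log_loss(w, b, val), w, b, 0)  # epoch 0 = untrained model
    stale = 0  # epochs since the validation loss last improved
    for epoch in range(1, max_epochs + 1):
        data = train[:]
        rng.shuffle(data)  # new mini-batch order every epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Gradient of the mean cross-entropy over this mini-batch only.
            gw = sum((sigmoid(w * x + b) - y) * x for x, y in batch) / len(batch)
            gb = sum(sigmoid(w * x + b) - y for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
        val_loss = log_loss(w, b, val)
        if val_loss < best[0]:
            best, stale = (val_loss, w, b, epoch), 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping: no improvement for `patience` epochs
    return best

# Synthetic, hypothetical data: label 1 when a noisy version of x is positive.
rng = random.Random(42)
data = []
for _ in range(400):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 1 if x + rng.gauss(0.0, 0.5) > 0 else 0))
train, val, test = split_data(data)
val_loss, w, b, epoch = train_minibatch(train, val)
print(f"best epoch {epoch}: val loss {val_loss:.3f}, test loss {log_loss(w, b, test):.3f}")
```

The test set is touched only once, after training, so it remains an unbiased estimate. The validation set is the one early stopping watches, which is why it must be kept separate from both.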

 

Fourier Transform:

The Fourier Transform is a mathematical technique that transforms a function of time, x(t), into a function of frequency, X(ω). It is closely related to the Fourier Series: where the Fourier Series represents a periodic signal as a sum of sinusoids at discrete frequencies, the Fourier Transform extends this idea to non-periodic signals, whose frequency content is described by a continuous function.
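For reference, one common convention (angular frequency ω, with the 1/2π factor on the inverse) defines the transform pair below; other texts use ordinary frequency f or split the constant differently. Because recorded audio is sampled, software actually computes the discrete Fourier transform (DFT) of N samples x[n], shown on the last line, usually with a fast Fourier transform (FFT) algorithm.

```latex
X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt,
\qquad
x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega

X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1
```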

Short-Time Fourier Transform:

Short-time Fourier transform (STFT) is a sequence of Fourier transforms of a windowed signal. STFT provides the time-localized frequency information for situations in which frequency components of a signal vary over time, whereas the standard Fourier transform provides the frequency information averaged over the entire signal time interval.
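A minimal plain-Python sketch of the STFT follows. It slides a Hann window along the signal, takes a DFT of each windowed frame, and keeps the magnitudes. Each row of the result is one time step of a spectrogram; drawing the magnitudes as colors gives the "color depth" picture used in the process description. The frame length of 64, hop of 32, and test tone are hypothetical, and the naive O(N²) DFT is only for clarity. A real pipeline would use an FFT-based routine such as TensorFlow's `tf.signal.stft`.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(frame):
    """Naive O(N^2) discrete Fourier transform (use an FFT in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def stft(signal, frame_len=64, hop=32):
    """Magnitude STFT: one row of |X[k]|, k = 0..frame_len/2, per windowed frame."""
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spectrum = dft(segment)
        rows.append([abs(c) for c in spectrum[:frame_len // 2 + 1]])
    return rows

def peak_bin(row):
    """Index of the strongest frequency bin in one STFT row."""
    return max(range(len(row)), key=lambda k: row[k])

# A tone whose frequency changes halfway: bin 4 for 256 samples, then bin 12.
signal = [math.sin(2 * math.pi * (4 if t < 256 else 12) * t / 64) for t in range(512)]
spec = stft(signal)
print(len(spec))           # 15 frames
print(peak_bin(spec[0]))   # 4
print(peak_bin(spec[-1]))  # 12
```

A single Fourier transform of the whole signal would show both frequencies with no indication of when each occurs. The per-frame peaks show the change over time, which is exactly the time-localized information the STFT adds.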

 

Multilayer Neural Network:

Multilayer networks solve the classification problem for nonlinear sets by employing hidden layers, whose neurons are not directly connected to the output. The additional hidden layers can be interpreted geometrically as additional hyper-planes, which enhance the separation capacity of the network.
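The geometric interpretation can be seen on the classic XOR problem, which no single line (one neuron) can separate. In the sketch below, each of two hidden neurons draws one line, and the output neuron fires only in the band between them. The weights are hand-picked for illustration and the step activation is a simplification. The project's network would instead learn its weights with mini-batch gradient descent and typically use smooth activations.

```python
def step(z):
    """Threshold activation: 1 if the input lies on the positive side of the hyperplane."""
    return 1 if z > 0 else 0

def neuron(weights, bias, inputs):
    """Weighted sum plus bias, passed through the step activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(x1, x2):
    # Hidden layer: two hyperplanes (lines in 2-D).
    h_or = neuron([1, 1], -0.5, [x1, x2])     # fires above the line x1 + x2 = 0.5
    h_nand = neuron([-1, -1], 1.5, [x1, x2])  # fires below the line x1 + x2 = 1.5
    # Output layer: fires only when both hidden neurons fire (between the lines).
    return neuron([1, 1], -1.5, [h_or, h_nand])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```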

Benefits of the Project

Project deliverables and specifications are as follows:

  • The model will be able to classify sound events from extracted features while minimizing the error (cost function).
  • The device can turn microphones – like those in smart speakers and cameras – into a sound recognition and alerting device.
  • It listens for audio events and automatically alerts an application if a specific sound is detected, so that a human may take the appropriate action.
  • The application will be used to generate an alarm.
  • History and previous data will be saved in the app.
  • The device will be portable.
  • It would also increase the security and protection of the home.
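The listen-detect-alert behaviour described in these bullets can be sketched as a small state machine. The version below assumes the detector emits one set of class probabilities per frame. It raises one alert when the target class stays above a threshold for several consecutive frames (to suppress one-off false positives) and appends each alert to a history list that the app could display. The class name, threshold, frame count, and the `AlertMonitor` class itself are hypothetical; the source does not specify how alerts are delivered to the app.

```python
from dataclasses import dataclass, field

@dataclass
class AlertMonitor:
    """Raise one alert per continuous detection of a target sound."""
    target: str
    threshold: float = 0.5
    min_frames: int = 3  # consecutive frames required before alerting
    history: list = field(default_factory=list)  # (time in s, class) per alert
    _run: int = field(default=0, init=False)
    _active: bool = field(default=False, init=False)

    def update(self, time_s, probs):
        """Feed one frame of class probabilities; return True when a new alert fires."""
        if probs.get(self.target, 0.0) >= self.threshold:
            self._run += 1
        else:
            self._run = 0
            self._active = False  # the sound stopped; allow a future alert
        if self._run >= self.min_frames and not self._active:
            self._active = True
            self.history.append((time_s, self.target))
            return True
        return False

monitor = AlertMonitor("baby_cry")
stream = [0.2, 0.6, 0.7, 0.8, 0.9, 0.1, 0.6, 0.7, 0.9]  # hypothetical per-frame probabilities
fired = [monitor.update(i * 0.5, {"baby_cry": p}) for i, p in enumerate(stream)]
print(monitor.history)  # [(1.5, 'baby_cry'), (4.0, 'baby_cry')]
```

Requiring a run of frames trades a short delay (here 1.5 s at a 0.5 s hop) for fewer false alarms. The right balance depends on the event: a call for help may justify a shorter run than a routine household sound.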

Technical Details of Final Deliverable

Project deliverables and specifications are as follows:

  • The device can turn microphones – like those in smart speakers and cameras – into a sound recognition and alerting device.
  • It listens for audio events and automatically alerts an application if a specific sound is detected, so that a human may take the appropriate action.
  • The app will be used to generate an alarm.
  • History and previous data will be saved in the app.
  • The device will be portable.

Final Deliverable of the Project

Hardware System

Core Industry

Security

Other Industries

Core Technology

Artificial Intelligence (AI)

Other Technologies

Sustainable Development Goals

Sustainable Cities and Communities

Required Resources

Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
Controller (Module) | Equipment | 1 | 10000 | 10000
Mic (Nodes) | Equipment | 4 | 1500 | 6000
Battery | Equipment | 1 | 5000 | 5000
Charger | Equipment | 2 | 1500 | 3000
RGB Lights | Miscellaneous | 4 | 500 | 2000
Device Box | Miscellaneous | 1 | 1500 | 1500
Components (others) | Miscellaneous | 1 | 2000 | 2000
Total (in Rs) | | | | 29500