Real-time Sound Event Detection System using Deep Learning with Edge Computing
A baby crying or someone calling for help are just two examples of audio events that require action. Manually monitoring for these sounds, whether nearby or remotely through a monitoring device, not only demands attention but also requires the person to be within hearing distance. This is not always possible, especially for deaf and hard-of-hearing people.
Machine learning has experienced strong growth in recent years, due to increased dataset sizes and computational power, and to advances in deep learning methods that can learn to make predictions in extremely nonlinear problem settings. However, a large amount of data is needed to train a neural network that achieves good performance. As more audio datasets become publicly available, more tagging labels become available with them. We refer to these tagging labels, which indicate only whether a type of event is present in a recording and carry no temporal information about it, as weak labels. In recent decades, however, there has also been growing demand for transcription predictions for a variety of audio recordings, rather than just the tags of a recording. Transcription of audio recordings refers to audio event detection, which provides a list of the audio events active in a recording along with temporal information about each of them, i.e., the starting time and duration of each event. Potential applications where audio event transcription is necessary include context awareness for cars and smartphones, surveillance for dangerous events and crimes, analysis and monitoring of biodiversity, recognition of noise sources and machine faults, and many more.
Depending on the audio event to be detected and classified in each task, it may be difficult to collect enough samples. Furthermore, different tasks use task-specific datasets, so the number of recordings available may be limited. Annotating data with strong labels, that is, labels that contain temporal information about the events, in order to train transcription predictors is a time-consuming process involving a lot of manual labor. Collecting weakly labeled data, on the other hand, takes much less time, since the annotator only has to mark the active sound event classes and not their exact boundaries. We refer to datasets that have only weak labels, may contain rare events, and have limited amounts of training data as low-resource datasets. Compared with supervised techniques trained on strong labels, there has been relatively little work on learning to perform audio event transcription from weakly labeled data.
Main tasks identified in the Gantt Chart?
The main steps to proceed:
Main Milestones identified in Gantt Chart?
We will use deep neural networks to classify sound events for detection, and data segmentation to deal with weakly labeled data.
Process:
First, we will collect data and divide it into small, equal parts (files) of approximately 20 to 30 seconds each. After loading our data into the program, the Fourier transform is applied to obtain a frequency analysis, i.e., the spectrum. We will then analyze how this spectrum changes over time using the short-time Fourier transform, which yields a spectrogram in which color depth represents magnitude. The labeled results will be fed into the model, i.e., a multilayer neural network. Along the way, we will segment the audio files to increase the size of the dataset, and divide the total data into training, validation, and test sets. We will evaluate the model's predictions first on the test data and then on real data. To reduce overfitting, we will use one or two of the following techniques: early stopping, switching, and regularization. For training, the mini-batch gradient descent algorithm will be used. All of this will be implemented in PyCharm using libraries that include TensorFlow and Keras.
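The segmentation and data-splitting steps can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's implementation: the function names `segment_clip` and `split_dataset`, the 70/15/15 split, the toy 100 Hz sample rate, and the label `"baby_cry"` are all assumptions made for the example, and letting every segment inherit the recording's weak label is one simple way to handle weak labels rather than a method the proposal specifies.

```python
import random


def segment_clip(samples, sample_rate, segment_seconds):
    """Split one recording into equal-length segments, dropping any short tail."""
    seg_len = int(sample_rate * segment_seconds)
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and split them into training, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Toy data (assumption): a silent 5-minute "recording" at 100 Hz with one weak label.
sample_rate = 100
recording = [0.0] * (5 * 60 * sample_rate)
weak_label = "baby_cry"

segments = segment_clip(recording, sample_rate, segment_seconds=20)
# Each segment inherits the recording-level (weak) label.
labeled = [(segment, weak_label) for segment in segments]
train, val, test = split_dataset(labeled)
print(len(segments), len(train), len(val), len(test))
```

With real recordings it is usually safer to split by recording before segmenting, so that segments from the same file do not end up in both the training and test sets.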
Fourier Transform:
The Fourier transform is a mathematical technique that transforms a function of time, x(t), into a function of frequency, X(ω). It is closely related to the Fourier series.
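For sampled audio, the transform is computed in its discrete form. The sketch below is a direct, unoptimized discrete Fourier transform written with the standard library only, to show how a time signal maps to a spectrum; the test tone, the 64 Hz sample rate, and the function name `dft` are assumptions for illustration. A real pipeline would more likely use a fast Fourier transform routine such as `numpy.fft.rfft`.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform: X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N)."""
    n_samples = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n, x in enumerate(signal))
        for k in range(n_samples)
    ]


# Toy signal (assumption): one second of an 8 Hz sine sampled at 64 Hz.
sample_rate = 64
tone_hz = 8
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate)
          for n in range(sample_rate)]

spectrum = dft(signal)
# Keep only the non-negative frequencies below the Nyquist limit.
magnitudes = [abs(c) for c in spectrum[:sample_rate // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * sample_rate / len(signal)
print(peak_hz)  # frequency of the strongest component
```

The strongest bin corresponds to the frequency of the input tone, which is the "frequency analysis" the process description refers to.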


Short-Time Fourier Transform:
Short-time Fourier transform (STFT) is a sequence of Fourier transforms of a windowed signal. STFT provides the time-localized frequency information for situations in which frequency components of a signal vary over time, whereas the standard Fourier transform provides the frequency information averaged over the entire signal time interval.
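To make the difference concrete, the following sketch computes an STFT by hand on a signal whose frequency changes halfway through. It uses only the standard library; the Hann window, the frame length of 32 samples, the hop of 16 samples, and the function name `stft_magnitudes` are assumptions chosen for the example rather than settings from the project.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len, hop):
    """Return one magnitude spectrum per Hann-windowed frame of the signal."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame, non-negative frequencies only.
        spectrum = [
            abs(sum(x * cmath.exp(-2j * cmath.pi * k * n / frame_len)
                    for n, x in enumerate(frame)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# Toy signal (assumption): one second of 4 Hz followed by one second of 16 Hz.
sample_rate = 64
signal = (
    [math.sin(2 * math.pi * 4 * n / sample_rate) for n in range(sample_rate)]
    + [math.sin(2 * math.pi * 16 * n / sample_rate) for n in range(sample_rate)]
)

frame_len = 32
spectrogram = stft_magnitudes(signal, frame_len=frame_len, hop=16)
# Dominant frequency (in Hz) of each frame.
peaks_hz = [
    max(range(len(frame)), key=frame.__getitem__) * sample_rate / frame_len
    for frame in spectrogram
]
print(peaks_hz)
```

The early frames peak at the first tone and the late frames at the second, whereas a single Fourier transform of the whole signal would show both frequencies with no indication of when each occurred.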


Multilayer Neural Network:
Multilayer networks solve the classification problem for nonlinear sets by employing hidden layers, whose neurons are not directly connected to the output. The additional hidden layers can be interpreted geometrically as additional hyper-planes, which enhance the separation capacity of the network.
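A small example of this geometric view is the XOR problem, which no single hyperplane can separate. In the sketch below the weights are set by hand, not learned, purely to show how two hidden neurons (two hyperplanes) make the classes separable; the function names and weight values are assumptions for illustration. The project itself would learn weights with mini-batch gradient descent and differentiable activations.

```python
def step(value):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if value > 0 else 0


def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of the inputs plus a bias, then threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    """Two hidden neurons (two hyperplanes) followed by one output neuron."""
    h_or = neuron([x1, x2], [1.0, 1.0], -0.5)   # fires if at least one input is 1
    h_and = neuron([x1, x2], [1.0, 1.0], -1.5)  # fires only if both inputs are 1
    # Output fires when h_or is on and h_and is off, i.e. exactly one input is 1.
    return neuron([h_or, h_and], [1.0, -1.0], -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Each hidden neuron draws one line through the input plane, and the output neuron combines the two half-planes into a region that a single-layer network cannot represent.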

Project Deliveries and specifications are as follows:
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Controller (Module) | Equipment | 1 | 10000 | 10000 |
| Mic(Nodes) | Equipment | 4 | 1500 | 6000 |
| Battery | Equipment | 1 | 5000 | 5000 |
| Charger | Equipment | 2 | 1500 | 3000 |
| RGB Lights | Miscellaneous | 4 | 500 | 2000 |
| Device Box | Miscellaneous | 1 | 1500 | 1500 |
| Components(others) | Miscellaneous | 1 | 2000 | 2000 |
| Total (in Rs) | | | | 29500 |