HearSmart
HearSmart is Pakistan's first smart hearing solution for people, with or without hearing impairment, who have trouble focusing on specific sounds or voices. It would let them hear what they want to hear by cleaning the noisy input and isolating speakers from a mixture of noise and other sounds.
2025-06-28 16:32:52 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Objectives
- Algorithm for real-time denoising of the speech signal
- Speaker separation and audio enhancement
- Standalone system for deploying the algorithm for real-time processing
HearSmart would use deep learning to isolate voices from a mixed audio source containing various noises, including multiple speakers. To accomplish this, it would only need a recording of the target person speaking. The recorded audio would then be quickly processed in the cloud, where a neural network learns to extract the target's voice. Once training is complete, the trained model would be sent back to the system, enabling instantaneous on-device voice isolation.
Benefits of the Project
This project will benefit people who have difficulty hearing in noisy environments, or who cannot understand speech when too many people talk at once. It could also help people working in environments such as factories and airports by allowing them to filter out unwanted noise.
Technical Details of Final Deliverable
The system first takes noisy input from the environment, extracts spectral and spatial features from it, and passes the resulting feature vector to a deep neural network (DNN) as input.
We train this deep neural network to learn the spectral mapping from reverberant, or reverberant and noisy, signals to the complex ideal ratio mask (cIRM). The DNN is given the complementary set of features. The input is normalized to have zero mean and unit variance. After normalization, auto-regressive moving average (ARMA) filtering is performed on the input features. The output layer of the DNN is divided into two sublayers, one for the real and one for the imaginary components of the cIRM. Linear activation functions are used in the output layer, whereas rectified linear units are used in the hidden layers. Backpropagation based on the mean-square error is used to train the DNN.
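The network structure described above can be sketched in a few lines of numpy. This is a minimal, single-hidden-layer illustration under assumed layer sizes (the actual network may be deeper and wider): a ReLU hidden layer feeds two linear output sublayers, one each for the real and imaginary parts of the compressed cIRM, and one backpropagation step on the mean-square error is shown on synthetic stand-in data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: feature dimension, hidden units, frequency bins.
n_feat, n_hidden, n_freq = 64, 128, 32

# Hidden (ReLU) layer plus two linear output sublayers (real/imag cIRM).
W1 = rng.normal(0, 0.1, (n_feat, n_hidden)); b1 = np.zeros(n_hidden)
Wr = rng.normal(0, 0.1, (n_hidden, n_freq)); br = np.zeros(n_freq)
Wi = rng.normal(0, 0.1, (n_hidden, n_freq)); bi = np.zeros(n_freq)

def forward(X):
    h = np.maximum(0, X @ W1 + b1)          # rectified linear hidden layer
    return h, h @ Wr + br, h @ Wi + bi      # linear real/imag sublayers

def mse(Pr, Pi, Tr, Ti):
    # Mean-square error summed over both output sublayers.
    return 0.5 * (np.mean((Pr - Tr) ** 2) + np.mean((Pi - Ti) ** 2))

# Synthetic normalized features and compressed-cIRM targets (stand-ins).
X  = rng.normal(size=(16, n_feat))
Tr = rng.normal(size=(16, n_freq))          # target real components
Ti = rng.normal(size=(16, n_freq))          # target imaginary components

lr = 0.05
h, Pr, Pi = forward(X)
loss0 = mse(Pr, Pi, Tr, Ti)

# Backpropagation: gradients through each sublayer, then through the
# shared ReLU hidden layer (sum of both sublayers' contributions).
dPr = (Pr - Tr) / Pr.size
dPi = (Pi - Ti) / Pi.size
dh  = (dPr @ Wr.T + dPi @ Wi.T) * (h > 0)

Wr -= lr * h.T @ dPr; br -= lr * dPr.sum(0)
Wi -= lr * h.T @ dPi; bi -= lr * dPi.sum(0)
W1 -= lr * X.T @ dh;  b1 -= lr * dh.sum(0)

_, Pr2, Pi2 = forward(X)
loss1 = mse(Pr2, Pi2, Tr, Ti)               # error after one training step
```

After the update, `loss1` is smaller than `loss0`: a single gradient step on the mean-square error already moves both sublayers toward the target mask values.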
The output of the DNN is an estimate of the compressed mask values of the cIRM.
At each noisy time-frequency entry, the DNN predicts a mask value to remove the noise. This mask, when multiplied with the noisy input, returns a denoised speech signal.
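The masking step can be made concrete: by definition the (uncompressed) cIRM is the element-wise complex ratio S/Y between the clean and noisy spectrograms, so multiplying it with the noisy input recovers the clean spectrum exactly. The sketch below uses random toy spectrograms in place of real STFTs; in practice the DNN's estimate of the mask is only approximate, and the network's compressed output is uncompressed before use.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy STFT-domain signals: clean speech S, additive noise N, mixture Y.
shape = (100, 257)                 # (time frames, frequency bins)
S = rng.normal(size=shape) + 1j * rng.normal(size=shape)
N = rng.normal(size=shape) + 1j * rng.normal(size=shape)
Y = S + N

# Ideal complex ratio mask: element-wise complex division M = S / Y.
M = S / Y

# Denoising: element-wise complex multiplication of mask and noisy input.
S_hat = M * Y

# With the ideal mask the clean spectrum is recovered exactly; a
# DNN-estimated mask yields an approximation.
assert np.allclose(S_hat, S)
```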
This denoised speech is then passed through a speaker separation algorithm, which returns the separated speech sources. The user can pick the speaker of their choice and suppress the others for clear hearing.
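The separation-and-selection idea can be illustrated with a classic time-frequency masking scheme: assign each time-frequency cell to whichever speaker dominates it, then reconstruct each source with its own binary mask. This is only a conceptual numpy sketch with an ideal binary mask and the simplifying assumption that magnitudes add; the document does not specify the actual separation algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy magnitude spectrograms for two simultaneous speakers.
shape = (50, 129)
A = np.abs(rng.normal(size=shape))      # speaker A
B = np.abs(rng.normal(size=shape))      # speaker B
mix = A + B                             # simplification: magnitudes add

# Ideal binary mask: 1 where speaker A dominates the cell, else 0.
mask_A = (A > B).astype(float)
mask_B = 1.0 - mask_A

# "Picking" a speaker means applying that speaker's mask to the mixture
# and suppressing everything else.
sep_A = mask_A * mix                    # estimate of speaker A
sep_B = mask_B * mix                    # estimate of speaker B

def corr(x, y):
    # Pearson correlation between two flattened spectrograms.
    return np.corrcoef(x.ravel(), y.ravel())[0, 1]

# Each estimate tracks its own speaker more closely than the other one.
assert corr(sep_A, A) > corr(sep_A, B)
```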



| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Intel NCSM2450.DK1 Movidius Neural Compute Stick | Equipment | 2 | 10615 | 21230 |
| Raspberry Pi 3 Model B+ | Equipment | 1 | 18200 | 18200 |
| LED HP EliteDisplay E231 | Equipment | 1 | 8000 | 8000 |
| Others (electronic components etc.) | Miscellaneous | 1 | 10000 | 10000 |
| Amazon Web Services (EC2) | Equipment | 1 | 19665 | 19665 |
| Total (in Rs) | | | | 77095 |