Deep Read aims to help people with hearing loss by providing them with a deep learning based state of the art lips reading system that will take a video with noisy environment or without sound as an input and help them understand what the person in video is
Deep Read
Deep Read aims to help people with hearing loss by providing them with a deep learning based state of the art lips reading system that will take a video with noisy environment or without sound as an input and help them understand what the person in video is saying by recognizing their lips movement and mapping it on to the possible words then finally predicting and constructing sentences from these words.
There are many instances where our traditional approaches for speech recognition fail for example at an airport, a security camera records a most wanted criminal talking to another criminal and the camera has recorded the video of criminals but the officials are not able to know what they were talking to each other. There is another example where a person with hearing disability have lost their hearing aid and they are not able to listen or understand what other people are trying to say to them.
Our main objective is to build an accurate deep learning based neural network that will work as a visual speech recognition system and take a voiceless video as input and predict its transcript by recognizing the movement of lips, teeth, cheeks and throat of persons in the video.
Deep read will be using Spatiotemporal Convolutional Neural Network based model with stacked convolutional layers alongwith Gated Recurrent Units for feature extraction and softmax activation fuction will be used for prediction of sentences. Connectionist temporal loss function will be used for calculating the actual loss in the prediction and weights will be optimized on the base of CTC loss.
After training of the model, extensive testing of the model will be performed. Furthermore, we'll analyze the previously used techniques for lips reading and then compare those with our model and improve Deep Read model on the basis of the analysis report.
To train, test and evaluate the model, a GPU will be required because the neural net needs to do the heavy processing for feature extraction and sentences prediction. This can't be done in our normal CPUs as they don't have sufficient processing capabilities while in deep read huge amount of matrix multiplications will need to be done and other massive calculation operations need to be performed which should be done in parallel to speed up the training process.
The trained and optimized neural net will then be deployed on the cloud where the end users will be able to upload the muted videos with the help of our web app or mobile app and get the transcript of the videos.
Deep read will help
There are 466 million people who have to live with a life of difficulties due to deafness and hearing loss. There is a negligible amount of effort being put to make the world a better place for the deaf people. Deep Read will greatly make the lives of people with hearing loss easier by providing them the realtime transcript of voiceless videos and they will be able to understand/know what other people are trying to say to them.
Lawenforcement agencies face problems where the video is available as evidence but the voice of video is not recorded due to technical errors, the sound of the video is corrupted or there is noise in background due to which the officials can't interpret what is being spoken in the video and hence there is no progress in that case/issue. Deep read will provide take the corrupted, muted or noisy video as input and provide the predicted transcript of the video alongwith its confidence score.
Deep read model will be mainly coded in python with tensorflow as backend and our end products will be
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| GPU (GEFORCE RTX 2080) | Equipment | 1 | 70000 | 70000 |
| Data gathering, processing and miscellaneous | Miscellaneous | 1 | 10000 | 10000 |
| Total in (Rs) | 80000 |
Hybrid cars are nowadays on roads, saving environment from harmful gases in the populated...
This proposal is about designing & developing a prototype smart robotic system wi...
Among other applications, face recognition is one of the primary biometric tasks, becoming...
Lane detection in driving scenes is an important module for saving human lives and prevent...
Machine learning algorithms employ a variety of statistical, probabilistic and optimizatio...