Deep Read


Project Title

Deep Read

Project Area of Specialization

Artificial Intelligence

Project Summary

Deep Read aims to help people with hearing loss through a state-of-the-art, deep-learning-based lip-reading system. The system takes a video that is silent or recorded in a noisy environment and helps the viewer understand what the person in the video is saying: it recognizes their lip movements, maps them onto possible words, and finally predicts and constructs sentences from those words.

Project Objectives

Traditional, audio-based approaches to speech recognition fail in many situations. For example, a security camera at an airport may record a wanted criminal talking to an accomplice, but without usable audio the officials cannot tell what the two said to each other. In another example, a person with a hearing disability who has lost their hearing aid cannot hear or understand what other people are trying to say to them.

Our main objective is to build an accurate deep neural network that works as a visual speech recognition system: it takes a silent video as input and predicts its transcript by recognizing the movements of the lips, teeth, cheeks and throat of the people in the video.

Project Implementation Method

Deep Read will use a spatiotemporal convolutional neural network model: stacked convolutional layers together with Gated Recurrent Units (GRUs) for feature extraction, followed by a softmax activation function whose outputs are used to predict sentences. The Connectionist Temporal Classification (CTC) loss function will measure the error in the prediction, and the weights will be optimized on the basis of this CTC loss.
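To make the softmax and CTC steps concrete, here is a minimal pure-Python sketch, not the project's actual TensorFlow code. It assumes the network emits one score vector per video frame and that index 0 is the CTC blank symbol; the names `BLANK`, `softmax`, `greedy_decode` and `ctc_loss` are illustrative only, and greedy decoding is just one simple way to turn per-frame probabilities into text.

```python
import math

BLANK = 0  # assumed convention: index 0 is the CTC blank symbol


def softmax(logits):
    """Turn one frame's raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(frame_probs, alphabet):
    """Pick the best symbol per frame, merge repeats, then drop blanks."""
    chars = []
    previous = None
    for probs in frame_probs:
        index = max(range(len(probs)), key=probs.__getitem__)
        if index != previous and index != BLANK:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


def ctc_loss(frame_probs, target):
    """Negative log-likelihood of `target` (label indices) via the CTC forward pass."""
    # Interleave blanks around the labels: [a, b] -> [blank, a, blank, b, blank]
    extended = [BLANK]
    for label in target:
        extended += [label, BLANK]
    size = len(extended)

    # alpha[s] = total probability of all partial alignments ending at extended[s]
    alpha = [0.0] * size
    alpha[0] = frame_probs[0][extended[0]]
    if size > 1:
        alpha[1] = frame_probs[0][extended[1]]

    for probs in frame_probs[1:]:
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and extended[s] != BLANK and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[extended[s]]
        alpha = new

    # A valid alignment ends on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return math.inf if likelihood == 0.0 else -math.log(likelihood)


# Toy example: three symbols (blank, "a", "b") over six frames.
alphabet = "_ab"
frames = [
    [0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8],
]
decoded = greedy_decode(frames, alphabet)  # "aab"
loss = ctc_loss(frames, [1, 1, 2])         # loss for the target "aab"
```

The blank symbol is what lets the decoder keep the double "a": repeated frames of the same symbol collapse into one character unless a blank separates them. In a TensorFlow implementation the loss would normally come from the library rather than from hand-written code like this.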

After the model is trained, it will be tested extensively. We will also analyze previously used lip-reading techniques, compare them with our model, and improve the Deep Read model on the basis of that analysis.
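Comparing models requires a common score. The proposal does not name one, so the sketch below assumes word error rate (WER), a metric commonly used for speech and lip-reading transcripts; the function names and the sample sentences are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# One substituted word out of six reference words.
wer = word_error_rate("place blue at f two now", "place blue at f nine now")
```

A lower WER means a better transcript, and 0.0 means the prediction matches the reference word for word. Scoring every candidate model on the same held-out videos would keep the comparison fair.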

Training, testing and evaluating the model will require a GPU, because the neural network must do heavy processing for feature extraction and sentence prediction. Ordinary CPUs do not have sufficient processing capability for this: Deep Read performs a huge number of matrix multiplications and other large computations, which should run in parallel to speed up the training process.
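A rough count shows the scale of that workload. The sketch below counts the multiply-accumulate operations (MACs) in a single spatiotemporal (3D) convolution layer; the clip size, filter count and kernel shape are hypothetical values chosen for illustration, not figures from the Deep Read design.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_macs(input_shape, in_channels, out_channels, kernel,
                stride=(1, 1, 1), padding=(0, 0, 0)):
    """Multiply-accumulates for one 3D convolution over (frames, height, width)."""
    outputs = out_channels
    for size, k, s, p in zip(input_shape, kernel, stride, padding):
        outputs *= conv_output_size(size, k, s, p)
    # Each output value sums over every input channel and every kernel position.
    macs_per_output = in_channels * kernel[0] * kernel[1] * kernel[2]
    return outputs * macs_per_output


# Hypothetical layer: 75 RGB frames of 50x100 pixels, 32 filters of size 3x5x5.
macs = conv3d_macs(
    input_shape=(75, 50, 100), in_channels=3, out_channels=32,
    kernel=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2),
)  # 675,000,000 MACs for one clip through one layer
```

Under these assumptions a single layer already needs hundreds of millions of operations per clip, and training repeats that for every layer, every clip and every epoch. The operations are independent of one another, which is why parallel hardware helps.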

The trained and optimized neural network will then be deployed on the cloud, where end users can upload muted videos through our web app or mobile app and receive the transcripts of those videos.

Benefits of the Project

Deep Read will help:

People with hearing loss, by providing them with a real-time transcript of a silent video.
Law-enforcement agencies, by predicting the transcript of a suspicious conversation in a muted video.

There are 466 million people who live with the difficulties caused by deafness and hearing loss, yet very little effort is being put into making the world a better place for deaf people. Deep Read will make the lives of people with hearing loss much easier by giving them a real-time transcript of silent videos, so that they can understand what other people are trying to say to them.

Law-enforcement agencies face cases where a video is available as evidence, but its audio was not recorded because of a technical error, is corrupted, or contains so much background noise that officials cannot make out what is being said, and so the case makes no progress. Deep Read will take the corrupted, muted or noisy video as input and provide the predicted transcript of the video along with its confidence score.
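The proposal does not say how the confidence score is computed. As one possible sketch, the code below assumes the model exposes per-frame softmax probabilities and takes the geometric mean of the highest probability in each frame; the function names and the JSON field names are hypothetical.

```python
import json
import math


def transcript_confidence(frame_probs):
    """Geometric mean of each frame's top probability (an assumed heuristic)."""
    if not frame_probs:
        raise ValueError("at least one frame is required")
    log_sum = sum(math.log(max(probs)) for probs in frame_probs)
    return math.exp(log_sum / len(frame_probs))


def build_response(transcript, frame_probs):
    """Package the transcript and its score as a JSON string (hypothetical schema)."""
    return json.dumps({
        "transcript": transcript,
        "confidence": round(transcript_confidence(frame_probs), 4),
    })


# Two frames whose top probabilities are 0.9 and 0.6.
score = transcript_confidence([[0.1, 0.9], [0.6, 0.4]])
payload = build_response("hello", [[0.1, 0.9], [0.6, 0.4]])
```

The geometric mean penalizes a single uncertain frame more than an arithmetic mean would, which suits evidence review where one doubtful word matters. Any such score would still need checking against real transcription accuracy before officials rely on it.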

Technical Details of Final Deliverable

The Deep Read model will be coded mainly in Python with TensorFlow as the backend. Our end products will be:

A fully trained model with optimized weights that organizations can deploy in their own systems and use for conversation prediction.
A web app that takes a video as input and returns its transcript as output.
A mobile app that takes a video as input and returns its transcript as output.

Final Deliverable of the Project

Software System

Core Industry

Other Industries

Core Technology

Artificial Intelligence (AI)

Other Technologies

Sustainable Development Goals

Good Health and Well-Being for People

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
GPU (GEFORCE RTX 2080)	Equipment	1	70000	70000
Data gathering, processing and miscellaneous	Miscellaneous	1	10000	10000
			Total (in Rs)	80000

If you need this project, please contact me on contact@adikhanofficial.com
