Adil Khan 9 months ago
AdiKhanOfficial #FYP Ideas

Deep Read


Project Title

Deep Read

Project Area of Specialization

Artificial Intelligence

Project Summary

Deep Read aims to help people with hearing loss by providing a state-of-the-art, deep-learning-based lip-reading system. It takes a video that is noisy or has no sound as input and helps users understand what the person in the video is saying: it recognizes lip movements, maps them onto candidate words, and then predicts and constructs sentences from those words.

Project Objectives

There are many situations where traditional speech recognition fails. For example, a security camera at an airport records a most-wanted criminal talking to another criminal, but officials cannot tell what the two said to each other. In another example, a person with a hearing disability has lost their hearing aid and cannot hear or understand what other people are trying to say to them.
 
Our main objective is to build an accurate deep neural network that works as a visual speech recognition system: it takes a voiceless video as input and predicts its transcript by recognizing the movement of the lips, teeth, cheeks and throat of the people in the video.
 

Project Implementation Method

Deep Read will use a spatiotemporal convolutional neural network (CNN) with stacked convolutional layers, along with gated recurrent units (GRUs), for feature extraction, and a softmax activation function will be used to predict sentences. The connectionist temporal classification (CTC) loss function will be used to calculate the prediction loss, and the weights will be optimized on the basis of the CTC loss.
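To illustrate how per-frame softmax outputs become text under CTC, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. The blank index, the character set and the function name are assumptions made for this example. The proposal does not specify them, and a real system might use beam search instead.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol
ALPHABET = " abcdefghijklmnopqrstuvwxyz"  # hypothetical character set; class k maps to ALPHABET[k - 1]


def greedy_ctc_decode(frame_probs):
    """Turn a list of per-frame probability vectors into a string."""
    # Step 1: pick the most likely class for every video frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Step 2: merge consecutive repeats. Step 3: drop blank symbols.
    collapsed = [k for k, _ in groupby(best_path) if k != BLANK]
    return "".join(ALPHABET[k - 1] for k in collapsed)
```

Because repeats are merged before blanks are dropped, a doubled letter such as "ll" can only be produced when a blank frame separates the two occurrences.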

After training, the model will be tested extensively. Furthermore, we will analyze previously used lip-reading techniques, compare them with our model, and improve the Deep Read model on the basis of that analysis.
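One common way to compare predicted transcripts against reference transcripts is the word error rate (WER). The proposal does not name a metric, so the sketch below is only an assumption about how such a comparison could be scored. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate("the flight leaves at nine", "the flight leaves at five")
```

A lower WER means the predicted transcript is closer to the reference, which makes it possible to rank Deep Read against earlier lip-reading approaches on the same test videos.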

A GPU will be required to train, test and evaluate the model, because the neural network does heavy processing for feature extraction and sentence prediction. Ordinary CPUs are not well suited to this: Deep Read involves a huge number of matrix multiplications and other large computations that should run in parallel to speed up training.
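A rough count of multiply-accumulate operations (MACs) shows the scale of the work. The sketch below counts the MACs of a single 3D convolution layer. The clip size, kernel size and filter count are hypothetical values chosen for illustration, not figures from the proposal.

```python
def conv3d_macs(out_frames, out_h, out_w, in_ch, out_ch, kernel=(3, 5, 5)):
    """Multiply-accumulates for one 3D convolution layer (bias ignored)."""
    kt, kh, kw = kernel
    macs_per_output_value = kt * kh * kw * in_ch
    return out_frames * out_h * out_w * out_ch * macs_per_output_value


# Hypothetical input: 75 frames of a 50x100 RGB mouth crop, 32 filters,
# output assumed to keep the same frame count and spatial size.
macs = conv3d_macs(75, 50, 100, in_ch=3, out_ch=32)
print(macs)  # 2700000000
```

Under these assumptions one layer needs about 2.7 billion MACs for a single forward pass over one clip. Training repeats this for every layer, every clip and every epoch, plus the backward pass, which is why parallel hardware matters.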

The trained and optimized neural network will then be deployed on the cloud, where end users can upload muted videos through our web app or mobile app and get the transcripts of those videos.

Benefits of the Project

Deep Read will help

  • People with hearing loss, by providing them with a real-time transcript of a voiceless video.
  • Law-enforcement agencies, by predicting the transcript of a suspicious conversation in a muted video.

There are 466 million people who live with difficulties due to deafness and hearing loss, yet very little effort is being put into making the world a better place for deaf people. Deep Read will make the lives of people with hearing loss easier by providing real-time transcripts of voiceless videos, so that they can understand what other people are trying to say to them.

Law-enforcement agencies face cases where a video is available as evidence but its audio was not recorded due to technical errors, is corrupted, or contains background noise. Officials then cannot interpret what is being said in the video, and the case makes no progress. Deep Read will take the corrupted, muted or noisy video as input and provide the predicted transcript of the video along with its confidence score.
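The proposal does not say how the confidence score would be computed. One simple option, shown here purely as an assumption, is the geometric mean of the highest per-frame softmax probability. The function name is hypothetical.

```python
import math


def transcript_confidence(frame_probs):
    """Geometric mean of the top probability in each frame, in (0, 1]."""
    if not frame_probs:
        raise ValueError("need at least one frame")
    # Sum logs instead of multiplying probabilities to avoid underflow
    # on long videos with many frames.
    log_sum = sum(math.log(max(p)) for p in frame_probs)
    return math.exp(log_sum / len(frame_probs))
```

A score near 1 means the model was decisive on every frame, while a lower score flags transcripts that officials should treat with more caution.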

Technical Details of Final Deliverable

The Deep Read model will be coded mainly in Python with TensorFlow as the backend, and our end products will be

  • A fully trained model with optimized weights that organizations can deploy in their own systems and use for conversation prediction.
  • A web app that takes a video as input and returns its transcript as output.
  • A mobile app that takes a video as input and returns its transcript as output.

Final Deliverable of the Project

Software System

Core Industry

IT

Other Industries

Core Technology

Artificial Intelligence(AI)

Other Technologies

Sustainable Development Goals

Good Health and Well-Being for People

Required Resources

Item Name                                    | Type          | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
GPU (GeForce RTX 2080)                       | Equipment     | 1            | 70000                 | 70000
Data gathering, processing and miscellaneous | Miscellaneous | 1            | 10000                 | 10000
Total (in Rs)                                |               |              |                       | 80000