EYES N EARS (A Smart Support for Physically Challenged People)


2025-06-28 16:27:10 - Adil Khan

Project Title

EYES N EARS (A Smart Support for Physically Challenged People)

Project Area of Specialization
Artificial Intelligence

Project Summary

Artificial Intelligence (AI) has become increasingly prevalent in contemporary times. It has a wide variety of application areas and can replicate many tasks that humans would normally perform, and many companies are gaining efficiencies by replacing humans with AI agents. However, researchers are still looking for ways to make artificial intelligence more 'human-like'. We plan to use AI agents to help people who are physically challenged and who depend on the help of other humans to go about their daily lives. In turn, this will raise their standard of living and make them more independent.

In today's advanced hi-tech world, the need for independent living is especially acute for visually impaired people, whose main problem is social restriction: they struggle in unfamiliar surroundings without manual aid. Visual information is the basis for most everyday tasks, so visually impaired people are at a disadvantage because the necessary information about the surrounding environment is not available to them. With recent advances in inclusive technology, it is possible to extend the support given to people with visual impairment. This project proposes to help people who are blind or visually impaired using Artificial Intelligence, Machine Learning, and image and text recognition. The idea is implemented as an Android mobile app built around a voice assistant, image recognition, text recognition, and gesture recognition. Through voice commands, the app can recognize objects in the surroundings and analyze hard-copy documents to read out their text. This gives blind people an efficient way to interact with the environment and make use of what technology offers. Similarly, speech is one of the essential communication methods for human beings.
The solutions currently available for people with hearing disabilities are limited in accessibility and expensive due to the high cost of hardware components. Sound classification methods are used primarily in smart assistants and smart home products; this technology has a lot of potential and can be incorporated into an application-based solution for deaf people. Apart from this, existing Sign Language Recognition solutions are limited in usability and features, as most of these products recognize only the alphabet, which is inadequate for real-world usage. With advances in pose estimation algorithms, a solution can be developed that recognizes words and sentences, improving the efficiency of daily communication. We plan to develop an application that caters to both visually impaired and hearing-impaired people.

Project Objectives

Gesture Recognition:

• To investigate the ways in which sign language can be interpreted by an artificial agent.

• To develop and optimize an artificial agent that can translate sign language into text and vice versa.

• To develop an easy-to-use front-end application that runs the artificial agent, and to deploy this application on a suitable platform.

Text Recognition:

• To investigate the ways in which printed text can be interpreted by an artificial agent.

• To develop and optimize an artificial agent that can accurately read text from images and convey the output to the user via a speaker.

• To develop an easy-to-use front-end application that runs the artificial agent, and to deploy this application on a suitable platform.

• Finally, to integrate both modules into an easy-to-use application that is readily available on popular platforms.

Project Implementation Method

MODULE 1:

• Gesture Recognition:

The methodology of this research includes data collection, data pre-processing, and model building and training. Data collection involves 10 participants, and multiple pre-processing techniques are applied to the resulting image data.

• Data Collection: This research aims to test the accuracy of a CNN model in recognizing ASL (American Sign Language) specifically. To assess accuracy in a controlled manner, static ASL gestures, namely the numbers 0–9 and the alphabet, are used as the dataset for this research.

• Data Pre-Processing: In this stage, OpenCV is used to read and filter the images. A common 70:30 split for training neural networks was implemented: images from 7 of the data subjects form the training set and images from the remaining 3 form the test set.

• Model Building and Training: TensorFlow and Keras were used to build and train the model. The model consists of four convolutional blocks, each made up of a Conv2D, a MaxPooling2D, and a Dropout layer.
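The subject-wise 70:30 split described above can be sketched as follows (the participant IDs and the seed are illustrative assumptions). Splitting by participant rather than by image keeps any one subject's hands out of both sets at once:

```python
import random

def split_by_subject(subject_ids, train_ratio=0.7, seed=42):
    """Split participants (not individual images) into train/test groups,
    so no subject's images leak across the two sets."""
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]

train_ids, test_ids = split_by_subject(range(1, 11))  # 10 participants
print(len(train_ids), len(test_ids))  # 7 3
```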
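The Conv2D and MaxPooling2D operations each block performs can be sketched in plain NumPy (the 3x3 kernel and 2x2 pool are assumed for illustration; the actual Keras layers additionally handle channels, padding, and learned weights):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in Keras)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling that halves each spatial dimension."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36.0).reshape(6, 6)
edge = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge kernel
fmap = conv2d(img, edge)                   # 4x4 feature map
print(max_pool(fmap).shape)                # (2, 2)
```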

MODULE 2:

• Text Recognition:

1. Pre-Processing

At the pre-processing stage, a text region detector is designed to detect text regions in each layer of the image. The original color image is first converted into a gray-level image. The text region detector is built on a widely used feature descriptor, the histogram of oriented gradients (HOG). Its aim is not to find accurate text positions but to estimate the probability of text at each position, along with scale information.
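The core of a HOG descriptor, a magnitude-weighted histogram of gradient orientations over one cell, can be sketched in NumPy (the cell size and bin count below are illustrative; a production detector such as OpenCV's HOGDescriptor adds block normalization and a sliding window):

```python
import numpy as np

def cell_hog(cell, n_bins=9):
    """Histogram of gradient orientations for one cell, weighted by
    gradient magnitude (unsigned orientations, 0-180 degrees)."""
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist = np.zeros(n_bins)
    bin_idx = (ang / (180.0 / n_bins)).astype(int) % n_bins
    np.add.at(hist, bin_idx.ravel(), mag.ravel())
    return hist

# a vertical edge: intensity changes left-to-right, so the gradient
# is horizontal and the histogram mass lands in the 0-degree bin
cell = np.tile(np.linspace(0, 255, 8), (8, 1))
h = cell_hog(cell)
print(h.argmax())  # 0
```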

2. Normalized Width and Height

To filter out non-text components whose sizes are too large or too small, we normalize each component's width and height by a scale value (sc) computed by averaging the corresponding pixel values on the text scale map.
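This size filter can be sketched as a simple predicate (the acceptance band below is an illustrative assumption, not a value from the project):

```python
def is_text_sized(width, height, sc, lo=0.5, hi=2.0):
    """Keep a component only if its width and height, normalized by the
    local text scale sc from the scale map, fall in a plausible band."""
    nw, nh = width / sc, height / sc
    return lo <= nw <= hi and lo <= nh <= hi

print(is_text_sized(30, 40, sc=32))    # True: roughly text-scale
print(is_text_sized(300, 12, sc=32))   # False: far too wide and too flat
```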

3. Compactness

To prune non-text components with overly complex contour shapes, compactness is defined as the ratio between the bounding box area (ba) and the square of the component's perimeter, where the perimeter is taken as the number of contour pixels (cn).
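Under that definition the compactness test can be sketched as follows (the cutoff value is an illustrative assumption):

```python
def is_compact(bounding_box_area, contour_pixels, min_compactness=0.04):
    """compactness = ba / cn^2; components with very long, convoluted
    contours relative to their bounding box score low and are pruned."""
    compactness = bounding_box_area / (contour_pixels ** 2)
    return compactness >= min_compactness

# a 20x20 box traced by a tidy ~80-pixel contour is kept...
print(is_compact(400, 80))    # True: 400 / 6400 = 0.0625
# ...while the same box traced by a 200-pixel squiggle is pruned
print(is_compact(400, 200))   # False: 400 / 40000 = 0.01
```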

Benefits of the Project

Convolutional Neural Networks (CNNs) are a type of deep learning model that has seen a lot of success in speech data processing. A CNN can be regarded as a variant of the standard neural network: instead of relying only on fully connected hidden layers, it introduces a special network structure consisting of alternating so-called convolution and pooling layers. After building several candidate models, we identified our best CNN model for the emotion classification task. With the current model, we obtained a training accuracy of 97 percent; with additional data to work with, the model would perform better. The graphs above also show how the model's predictions compare with the actual values. We propose a simple, compact CNN architecture with multiple layers, using modified kernels and a pooling strategy, to detect sensitive cues from deep frequency characteristics extracted from voice spectrograms, which are more discriminative and robust for speech emotion recognition.
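The spectrogram features such a model consumes can be sketched with a short-time Fourier transform in plain NumPy (the frame length, hop size, and sample rate below are illustrative assumptions, not the project's actual settings):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform:
    window overlapping frames, then take the FFT of each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the non-negative frequency bins: frame_len // 2 + 1 of them
    return np.abs(np.fft.rfft(frames, axis=1))

# 1 second of a 440 Hz tone sampled at 8 kHz
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
peak_hz = spec.mean(axis=0).argmax() * sr / 256
print(peak_hz)     # strongest bin, close to 440 Hz
```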

Technical Details of Final Deliverable


Final Deliverable of the Project: Software System
Core Industry: Health
Other Industries:
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Good Health and Well-Being for People, Reduced Inequality
Required Resources:

No. | Elapsed time from start of the project | Milestone | Deliverable
1. | Month 1 | Research and Plan for Gesture Recognition | Project plan for the Gesture Recognition module
2. | Month 2 | Data Cleaning for Gesture Recognition | A clean, ready-to-use dataset of signs for communication
3. | Month 3 | Implementation of the backend Python script for gesture recognition | A rough but efficient application that recognizes and translates gestures into text, and vice versa, in real time
4. | Month 4 | Integration of the Python script with the front end | Final Gesture Recognition module fully deployed on a suitable platform
5. | Month 5 | Research for Text Recognition | Project plan for the Text Recognition module
6. | Month 6 | Data Cleaning for Text Recognition | A clean, ready-to-use dataset of alphabets and numbers
7. | Month 7 | Implementation of the backend Python script for text recognition | A rough but efficient application that recognizes text and reads it to the user via the speakers in real time
8. | Month 8 | Integration of the Python script with the front end | Final Text Recognition module integrated with the main application
9. | Month 9 | Documentation and Testing | Testing document and efficiency analysis, along with the final project report
