Conversational Smart Guide for Visually Impaired People


2025-06-28 16:30:56 - Adil Khan

Project Title

Conversational Smart Guide for Visually Impaired People

Project Area of Specialization

Artificial Intelligence

Project Summary

The ultimate goal of this research project is to provide artificial vision to the blind and visually impaired people of the community using Machine Learning. This can be achieved by making sure that the user can “see” their surroundings just like people with clear sight. The proposed idea is a mobile application and a Cloud API. The application will capture the surroundings of the user with the help of a camera and will pass the information to the cloud API, which will process it to produce the desired results. The application will then deliver the information to the user via voice.
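The capture-and-upload step described above can be sketched as follows. The endpoint, payload field names, and module identifier are illustrative assumptions, since the proposal does not specify the API contract:

```python
import base64
import json

# Hypothetical payload builder: the mobile app encodes a captured frame
# and names the module (e.g. "object_detection") it wants the cloud API
# to run. All field names here are illustrative assumptions.
def build_frame_payload(image_bytes: bytes, module: str) -> str:
    payload = {
        "module": module,
        "frame": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)

# The app would then POST this JSON to the cloud API, e.g.:
#   requests.post("https://<api-host>/analyze", data=payload_json)
```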

Project Objectives

Automation is becoming more common in every field, and introducing it has become crucial across industries. Moreover, it has become extremely important to thrive in the ever-changing era of technology.

Project Implementation Method

[Figure: Conversational Smart Guide for Visually Impaired People (_1639950506.jpeg)]

The proposed system consists of two components, i.e. a mobile application and a Cloud API. The system has multiple modules: Object Detection, Face Detection and Recognition, Facial Expression Detection, Scene Recognition, Activity Recognition, and Natural Language Generation.

In Natural Language Generation, the user will ask for directions in the form of a voice query and the application will answer that query. In object detection and face detection, the camera in the mobile application will first capture the surroundings of the user as an image; it will then detect faces or objects/obstacles from the captured images, depending on the respective module. The proposed system will also be able to detect and recognize the facial expressions of the people in front of the user and give a summary to the user in the form of voice. The scene recognition module will identify and explain the scene in which the user is present. Single- and multi-person activity will be detected with the help of the activity recognition module. The system will explain information about a detected face, and will also announce the distance and height of a detected object through voice.
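The distance estimate mentioned above can be derived from the standard pinhole-camera model: for an object of known real-world height, distance ≈ real height × focal length (in pixels) / object height in pixels. The proposal does not specify the method, so this sketch, with all numeric values assumed for illustration, is one plausible approach:

```python
# Pinhole-camera (similar-triangles) distance estimate. The proposal does
# not name a method, so this formula is an assumed illustration.
def estimate_distance_m(real_height_m: float,
                        focal_length_px: float,
                        pixel_height_px: float) -> float:
    """Approximate distance to an object of known real-world height."""
    return real_height_m * focal_length_px / pixel_height_px

# Example: a 1.7 m tall person spanning 340 px in a camera with an
# assumed 800 px focal length is roughly 4 m away.
```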

  1. Face Detection: The captured frame from the mobile application will be passed to the web API. The API will call this module to detect faces in the frame and will return the information to the mobile application.
  2. Face Recognition: The annotated frame/image with detected faces will be passed from the face detection module to the face recognition module, which will perform face recognition and label the recognized faces with their respective names.
  3. Object Detection: The frame captured from the live camera will be passed to the object detection module, which will detect the objects and return the annotated frame from the API.
  4. Facial Expression: The frame captured from the live camera will be passed to the face detection module, which will detect the faces of people in front of the user and then pass the result to the facial expression module to detect the expressions using a CNN. It then returns the annotated frame.
  5. Scene Recognition: The frame captured from the live camera will be passed to the scene recognition module, which will detect and recognize the scene.
  6. Activity Recognition: Multiple frames will be sent to the activity recognition module, which will detect the activity taking place and send the activity name to the application.
  7. Speech to Text: The UI captures the voice and passes it to this module, which will generate text as input for other modules.
  8. Text to Speech: The UI will pass the output text from different modules to this module, which will generate voice as output.
  9. Natural Language Generation: The UI will capture the input voice, generate text via speech to text, and pass it to the NLG module, which will identify the intent of the user.
  10. Intent Identification: Input text will be captured from the UI and passed to the intent identification module, which generates an intent class using an algorithm (selected on the basis of different metrics).
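The intent-identification step can be sketched with a simple keyword-based classifier. The proposal leaves the actual algorithm open ("selected on the basis of different metrics"), so the intent labels and keywords below are illustrative assumptions, not the project's trained model:

```python
# Illustrative keyword-based intent classifier. The real module would use
# a learned model; intent names and keyword sets here are assumptions.
INTENT_KEYWORDS = {
    "ask_directions": {"where", "direction", "directions", "way"},
    "describe_scene": {"scene", "around", "surroundings"},
    "identify_person": {"who", "person", "face"},
}

def identify_intent(text: str) -> str:
    """Return the first intent whose keyword set overlaps the query."""
    words = set(text.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "unknown"
```

In the full pipeline, the text produced by the Speech to Text module would be fed to this function, and the resulting intent class would decide which vision module the API invokes.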
Benefits of the Project

Technical Details of Final Deliverable

The project utilizes computer vision, natural language generation and understanding, as well as speech recognition. As described in the proposed methodology, the project will consist of the modules listed above.

The final deliverable of the project is an Android application integrated with a REST API. The REST API will be served via an Azure Web Application and API service. The major languages used will be Python and Java. The web API will be built with Python's Flask framework.
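The web API's job of routing an incoming request to the right module can be sketched, framework-free, as a dispatch table; in the real Flask deliverable each handler would wrap a trained model, so the handlers below are stubs and their return fields are assumptions:

```python
# Minimal dispatch sketch of the web API's module routing. In the real
# Flask service each handler would invoke the corresponding ML model;
# here the handlers are stubs so only the routing logic is shown.
from typing import Callable, Dict

HANDLERS: Dict[str, Callable[[bytes], dict]] = {}

def register(name: str):
    """Decorator that maps a module name to its handler function."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap

@register("object_detection")
def detect_objects(frame: bytes) -> dict:
    return {"module": "object_detection", "objects": []}  # stub result

@register("scene_recognition")
def recognize_scene(frame: bytes) -> dict:
    return {"module": "scene_recognition", "scene": "unknown"}  # stub result

def handle_request(module: str, frame: bytes) -> dict:
    """Route a frame to the named module, or report an unknown module."""
    handler = HANDLERS.get(module)
    if handler is None:
        return {"error": f"unknown module: {module}"}
    return handler(frame)
```

In the Flask version, `handle_request` would sit behind a route such as a POST endpoint that reads the module name and frame from the request body.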

The user will query the application through voice and will receive the output in the form of voice as well.

Final Deliverable of the Project
Software System

Core Industry
Medical

Other Industries
IT

Core Technology
Artificial Intelligence (AI)

Other Technologies
Cloud Infrastructure, Others

Sustainable Development Goals
Good Health and Well-Being for People, Decent Work and Economic Growth

Required Resources
Item Name                                     Type           No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
Android Mobile Phone                          Equipment      1              35000                   35000
Cloud Credits                                 Equipment      2              13052                   26104
Printing and Binding cost of research paper   Miscellaneous  3              3000                    9000
Total (in Rs)                                                                                       70104
