Conversational Smart Guide for Visually Impaired People


2025-06-28 16:30:56 - Adil Khan

Project Title

Conversational Smart Guide for Visually Impaired People

Project Area of Specialization

Artificial Intelligence

Project Summary

The ultimate goal of this research project is to provide artificial vision to the blind and visually impaired people of the community using Machine Learning. This can be achieved by making sure that the user can “see” their surroundings just like people with clear sight. The proposed idea is a mobile application and a Cloud API. The application will capture the surroundings of the user with the help of a camera and will pass the information to the cloud API, which will process it to produce the desired results. The application will then deliver the information to the user via voice.
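The capture-and-upload step described above can be sketched as follows. The endpoint, payload field names, and module identifier are illustrative assumptions, since the proposal does not specify the API contract:

```python
import base64
import json

# Hypothetical payload builder: the mobile app encodes a captured frame
# and names the module (e.g. "object_detection") it wants the cloud API
# to run. All field names here are illustrative assumptions.
def build_frame_payload(image_bytes: bytes, module: str) -> str:
    payload = {
        "module": module,
        "frame": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)

# The app would then POST this JSON to the cloud API, e.g.:
#   requests.post("https://<api-host>/analyze", data=payload_json)
```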

Project Objectives

Automation is becoming more common in every field, and introducing it has become crucial across industries. Moreover, it has become extremely important to thrive in the ever-changing era of technology.

Project Implementation Method

[Figure: Conversational Smart Guide for Visually Impaired People (_1639950506.jpeg)]

The proposed system consists of two components, i.e. a mobile application and a Cloud API. The system has multiple modules: Object Detection, Face Detection and Recognition, Facial Expression Detection, Scene Recognition, Activity Recognition, and Natural Language Generation.

In Natural Language Generation, the user will ask for directions in the form of a voice query and the application will answer that query. In object detection and face detection, the camera in the mobile application will first capture the surroundings of the user as an image; it will then detect faces or objects/obstacles from the captured images, depending on the respective module. The proposed system will also be able to detect and recognize the facial expressions of the people in front of the user and give a summary to the user in the form of voice. The scene recognition module will identify and explain the scene in which the user is present. Single- and multi-person activity will be detected with the help of the activity recognition module. The system will explain information about a detected face, and will also announce the distance and height of a detected object through voice.
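The distance estimate mentioned above can be derived from the standard pinhole-camera model: for an object of known real-world height, distance ≈ real height × focal length (in pixels) / object height in pixels. The proposal does not specify the method, so this sketch, with all numeric values assumed for illustration, is one plausible approach:

```python
# Pinhole-camera (similar-triangles) distance estimate. The proposal does
# not name a method, so this formula is an assumed illustration.
def estimate_distance_m(real_height_m: float,
                        focal_length_px: float,
                        pixel_height_px: float) -> float:
    """Approximate distance to an object of known real-world height."""
    return real_height_m * focal_length_px / pixel_height_px

# Example: a 1.7 m tall person spanning 340 px in a camera with an
# assumed 800 px focal length is roughly 4 m away.
```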

  1. Face Detection: The captured frame from the mobile application will be passed to the web API. The API will call this module to detect faces in the frame and will return the information to the mobile application.
  2. Face Recognition: The annotated frame/image with detected faces will be passed from the face detection module to the face recognition module, which will perform face recognition and label the recognized faces with their respective names.
  3. Object Detection: The frame captured from the live camera will be passed to the object detection module, which will detect the objects and return the annotated frame from the API.
  4. Facial Expression: The frame captured from the live camera will be passed to the face detection module, which will detect the faces of people in front of the user and then pass the result to the facial expression module to detect the expressions using a CNN. It then returns the annotated frame.
  5. Scene Recognition: The frame captured from the live camera will be passed to the scene recognition module, which will detect and recognize the scene.
  6. Activity Recognition: Multiple frames will be sent to the activity recognition module, which will detect the activity taking place and send the activity name to the application.
  7. Speech to Text: The UI captures the voice and passes it to this module, which will generate text as input for other modules.
  8. Text to Speech: The UI will pass the output text from different modules to this module, which will generate voice as output.
  9. Natural Language Generation: The UI will capture the input voice, generate text via speech to text, and pass it to the NLG module, which will identify the intent of the user.
  10. Intent Identification: Input text will be captured from the UI and passed to the intent identification module, which generates an intent class using an algorithm (selected on the basis of different metrics).
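The intent-identification step can be sketched with a simple keyword-based classifier. The proposal leaves the actual algorithm open ("selected on the basis of different metrics"), so the intent labels and keywords below are illustrative assumptions, not the project's trained model:

```python
# Illustrative keyword-based intent classifier. The real module would use
# a learned model; intent names and keyword sets here are assumptions.
INTENT_KEYWORDS = {
    "ask_directions": {"where", "direction", "directions", "way"},
    "describe_scene": {"scene", "around", "surroundings"},
    "identify_person": {"who", "person", "face"},
}

def identify_intent(text: str) -> str:
    """Return the first intent whose keyword set overlaps the query."""
    words = set(text.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "unknown"
```

In the full pipeline, the text produced by the Speech to Text module would be fed to this function, and the resulting intent class would decide which vision module the API invokes.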
Benefits of the Project

Technical Details of Final Deliverable

The project utilizes computer vision, natural language generation and understanding, as well as speech recognition. As described in the proposed methodology, the project will consist of the modules listed above.

The final deliverable of the project is an Android application integrated with a REST API. The REST API will be served via an Azure Web Application and API service. The major languages used will be Python and Java. The web API will be built with Python's Flask framework.
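The web API's job of routing an incoming request to the right module can be sketched, framework-free, as a dispatch table; in the real Flask deliverable each handler would wrap a trained model, so the handlers below are stubs and their return fields are assumptions:

```python
# Minimal dispatch sketch of the web API's module routing. In the real
# Flask service each handler would invoke the corresponding ML model;
# here the handlers are stubs so only the routing logic is shown.
from typing import Callable, Dict

HANDLERS: Dict[str, Callable[[bytes], dict]] = {}

def register(name: str):
    """Decorator that maps a module name to its handler function."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap

@register("object_detection")
def detect_objects(frame: bytes) -> dict:
    return {"module": "object_detection", "objects": []}  # stub result

@register("scene_recognition")
def recognize_scene(frame: bytes) -> dict:
    return {"module": "scene_recognition", "scene": "unknown"}  # stub result

def handle_request(module: str, frame: bytes) -> dict:
    """Route a frame to the named module, or report an unknown module."""
    handler = HANDLERS.get(module)
    if handler is None:
        return {"error": f"unknown module: {module}"}
    return handler(frame)
```

In the Flask version, `handle_request` would sit behind a route such as a POST endpoint that reads the module name and frame from the request body.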

The user will query the application through voice and will receive the output in the form of voice as well.

Final Deliverable of the Project
Software System

Core Industry
Medical

Other Industries
IT

Core Technology
Artificial Intelligence (AI)

Other Technologies
Cloud Infrastructure, Others

Sustainable Development Goals
Good Health and Well-Being for People, Decent Work and Economic Growth

Required Resources
Item Name                                     Type           No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
Android Mobile Phone                          Equipment      1              35000                   35000
Cloud Credits                                 Equipment      2              13052                   26104
Printing and Binding cost of research paper   Miscellaneous  3              3000                    9000
Total (in Rs)                                                                                       70104
