Cognitive Lens
In recent artificial intelligence literature, Visual Question Answering (VQA) has emerged as a promising application for real-time visual reasoning.
Using this approach, our project simulates the process of looking at a scene, perceiving it, and extracting its details. Because the project imitates this human thought process, we have named it the “Cognitive Lens”.
This project is designed for visually impaired users. It provides an easy-to-use interface with a small number of buttons and multi-tap gestures for calling the application's functions. The aim is to describe the user's surroundings to them and help them handle everyday obstacles more easily.
VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language, and common-sense knowledge to answer.
This project is based on a mobile application built on React Native. It will take an input image and a natural-language question about the image and produce a natural-language answer as the output.
The proposed project aims to familiarize a machine with its surroundings through the Visual Question Answering dataset. To achieve this, the objects in the image are first identified and classified into the classes supported by the model, which contains 50 different classes.
The user is then prompted to ask the app about the identified objects, for example their quantity or appearance, and the app responds with the generated answer.
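The quantity questions described above can be illustrated with a minimal sketch. It assumes the detector returns a list of labelled objects with confidence scores; the `Detection` type, the `answer_count_question` function, and the 0.5 confidence threshold are hypothetical names and values chosen for illustration, not part of the project's actual model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical detector output format)."""
    label: str
    confidence: float


def answer_count_question(question: str,
                          detections: Sequence[Detection],
                          min_confidence: float = 0.5) -> str:
    """Answer a 'how many <object>' question from detector output."""
    # Count only detections the model is reasonably sure about.
    counts = Counter(d.label for d in detections
                     if d.confidence >= min_confidence)
    words = question.lower().rstrip("?").split()
    for label, n in counts.items():
        # Match the singular label or a naive plural ("cup" / "cups").
        if label in words or label + "s" in words:
            noun = label if n == 1 else label + "s"
            verb = "is" if n == 1 else "are"
            return f"There {verb} {n} {noun}."
    return "I could not find that object in the picture."


detections = [
    Detection("cup", 0.9),
    Detection("cup", 0.8),
    Detection("cup", 0.3),    # below the threshold, ignored
    Detection("chair", 0.7),
]
print(answer_count_question("How many cups are there?", detections))
```

A real VQA model would produce the answer from joint image and question features rather than from string matching; the sketch only shows how detected classes can ground a counting answer.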
The smart glasses paired with the app will contain a camera to capture a picture of the user's view, and the user's follow-up question will be sent to the app through a microphone embedded in the glasses. The user will hear the output through earphones that are also attached to the glasses. The smart glasses will be paired with the user's phone via Bluetooth.
We will design a cross-platform mobile application that takes speech and an image as input and generates a spoken answer using a model trained on the VQA dataset.
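The end-to-end flow (speech and image in, spoken answer out) might be organized as in the following sketch. The three stages are passed in as plain callables so that each can be tested separately; `run_vqa_pipeline` and its parameter names are assumptions made for illustration and do not correspond to any specific library.

```python
from typing import Callable


def run_vqa_pipeline(image: bytes,
                     audio: bytes,
                     speech_to_text: Callable[[bytes], str],
                     vqa_model: Callable[[bytes, str], str],
                     text_to_speech: Callable[[str], bytes]) -> bytes:
    """Turn a captured image and a spoken question into a spoken answer."""
    question = speech_to_text(audio)
    if not question.strip():
        # Nothing was recognized, so ask the user to repeat.
        return text_to_speech("Sorry, I did not hear a question.")
    answer = vqa_model(image, question)
    return text_to_speech(answer)


# Usage with stub stages standing in for the real components.
spoken = run_vqa_pipeline(
    image=b"<jpeg bytes>",
    audio=b"<recorded question>",
    speech_to_text=lambda audio: "what colour is the car",
    vqa_model=lambda image, question: "red",
    text_to_speech=lambda text: text.encode("utf-8"),
)
print(spoken)
```

Keeping the stages as injected functions means the speech, vision, and playback components can be swapped without changing the control flow.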
The main objectives of our project are:
Integrate the model into the cross-platform React Native application, testing after each step.
Our final deliverable is a cross-platform mobile application that runs on both Android and iOS.
It runs on all devices that support Android 6.0 and above.
It runs on all devices that support iOS 9.0 and above.
The models that run in the backend are built with the Keras API in Python.
The smart glasses can be connected to the mobile application wirelessly via Bluetooth.
Input data will be sent via a microphone attached to the glasses, and output data will be received and played through the earphones. The smart glasses must contain an embedded button that the user presses to capture a picture.
Questions asked by the user will be processed using natural language processing so that the system can better interpret the query.
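As a rough illustration of this preprocessing step, the sketch below lowercases a question, strips punctuation, and drops a few common filler words before the question is passed on. The `normalize_question` function and the small `STOP_WORDS` set are hypothetical; the project's actual NLP steps are not specified here.

```python
import re
from typing import List

# Deliberately small, hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "there", "in", "of", "this", "that"}


def normalize_question(question: str) -> List[str]:
    """Lowercase the question, strip punctuation, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(normalize_question("What is the colour of the Car?"))
```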
Text-to-speech: the expo-speech API will be used to read out the function of each button the user taps.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Movidius Neural Compute Stick 2 with Myriad X Vision Processing Unit | Equipment | 1 | 23741 | 23741 |
| Portable Anti-Polarization Call Semi-Open Type Smart Bluetooth Glasses | Equipment | 1 | 16502 | 16502 |
| SSD | Equipment | 1 | 5799 | 5799 |
| Total (in Rs) | | | | 46042 |