Adil Khan 1 year ago
AdiKhanOfficial #FYP Ideas

Cognitive Lens


Project Title

Cognitive Lens

Project Area of Specialization

Artificial Intelligence

Project Summary

In the recent literature of artificial intelligence, Visual Question Answering (VQA) has emerged as a promising application for real-time visual reasoning.

Using this approach, our project can simulate the process of looking at a scene, perceiving it, and extracting its details. Because the project imitates this human thought process, we have named it the “Cognitive Lens”.

This project is designed for the visually impaired. It provides an easy-to-use interface with a small number of buttons and multi-tap gestures that call the application's functions. The aim is to describe the user's surroundings on request and so help them tackle everyday obstacles more easily.

VQA is a new dataset containing open-ended questions about images. Answering these questions requires an understanding of vision, language, and common-sense knowledge.

The project is a mobile application built with React Native. It takes an image and a natural-language question about that image as input and produces a natural-language answer as output.


The proposed project aims to familiarize a machine with its surroundings through the Visual Question Answering dataset. To achieve this, the objects in the image are first identified and classified into the classes known to the model, which covers 50 different classes.

The user is then prompted to ask the app a question about the identified objects, for example about their quantity or appearance, and the app responds with a generated answer.
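
As a rough illustration of the two stages described above, the sketch below assumes the object detector has already returned a list of class labels for the image, and it answers only simple "how many" questions by counting those labels. The function name `answer_count_question`, the label strings, and the naive plural handling are all assumptions made for this example; the proposed system would rely on a trained VQA model rather than string matching.

```python
from collections import Counter


def answer_count_question(detected_labels, question):
    """Answer a 'how many <object>' question from a list of detected labels.

    Hypothetical helper: the labels stand in for the output of the object
    detector, and the matching is plain string comparison, not a VQA model.
    """
    counts = Counter(label.lower() for label in detected_labels)
    # Strip punctuation from each word so "cups?" matches "cups".
    words = [w.strip("?.,!").lower() for w in question.split()]
    for word in words:
        # Try the word as written, then a naive singular form ("cups" -> "cup").
        for candidate in (word, word[:-1] if word.endswith("s") else word):
            if candidate in counts:
                return str(counts[candidate])
    return "0"


labels = ["cup", "cup", "chair", "person"]
print(answer_count_question(labels, "How many cups are on the table?"))
```

Questions about appearance would need features from the image itself, which is where the learned model described in the implementation steps comes in.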

The smart glasses paired to the app will contain a camera to snap a picture of the user's view and the follow-up question from the user will be sent to the app through a microphone embedded in the glasses. The received output will be clearly heard by the user through earphones also attached to the glasses. Smart glasses will be paired to the user's phone via Bluetooth.

We will design a cross-platform mobile application that takes speech and an image as input and generates its output with a model trained on the VQA dataset.

Project Objectives

The main objectives of our project are:

  1. To design and develop a cognitive camera app that takes real-time visual content and a spoken question as input and produces a logical answer using VQA.
  2. To simulate the human cognitive ability.
  3. To make human-computer interaction convenient for visually impaired individuals.
  4. To answer open-ended questions asked by the user about the image and real-time visual content.
  5. To solve a range of everyday problems for many kinds of users.

Project Implementation Method

  • Step 1:
    Review the literature on VQA and understand the dimensions of the dataset.
     
  • Step 2:
    As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle Question-Answering (text-based).
     
  • Step 3:
    We then modify the previous model to accept an image as an input in addition to the question.
     
  • Step 4:
    For this purpose, we explore the MobileNet, VGG-16, and K-CNN convolutional neural networks to extract visual features from the image.
     
  • Step 5:
    These are merged with the word embedding or with a sentence embedding of the question to predict the answer.
     
  • Step 6:
    Then, through speech-to-text and text-to-speech APIs, the spoken question is made available to the system as text and the answer is made available to the user as speech.
     
  • Step 7:
    Design the proposed VQA solution as a cross-platform mobile app, then test and tune it.

After each step, integrate the model into the cross-platform React Native application for testing.
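
Step 5 leaves the merge operation open. One common choice in VQA baselines is to multiply the image feature vector and the question embedding element by element, then score each candidate answer with a linear layer followed by a softmax. The sketch below shows that arithmetic in plain Python with made-up numbers. The vector size, the weights, and the answer list are placeholders rather than values from the project, and a real model would learn the weights in Keras.

```python
import math


def fuse(image_features, question_embedding):
    """Merge the two vectors by element-wise multiplication (an assumed choice)."""
    if len(image_features) != len(question_embedding):
        raise ValueError("both vectors must have the same length")
    return [i * q for i, q in zip(image_features, question_embedding)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_features, question_embedding, weights, answers):
    """Score every candidate answer and return the most probable one."""
    fused = fuse(image_features, question_embedding)
    scores = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(answers)), key=lambda k: probs[k])
    return answers[best], probs


# Placeholder inputs: a 3-dimensional image vector and question vector.
image_vec = [0.5, 1.0, 2.0]
question_vec = [2.0, 0.0, 1.0]
# One weight row per candidate answer (hypothetical values).
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
answers = ["yes", "no"]

answer, probs = predict_answer(image_vec, question_vec, weights, answers)
print(answer, [round(p, 3) for p in probs])
```

Concatenating the two vectors instead of multiplying them is an equally plausible merge; multiplication is used here only because it keeps the example short.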

Benefits of the Project

  1. It can be handy for visually impaired individuals who require constant feedback about their environment. It could aid them by answering questions that come up in their day-to-day lives.
     
  2. Children can use this virtual buddy to educate themselves by asking it questions.
     
  3. This need not be limited to children: non-native speakers may also benefit by using it to familiarize themselves with the language.
     
  4. It could help in creating intelligent robots and machinery.
     
  5. Another obvious application is to integrate VQA into image retrieval systems. This could have a huge impact on social media or e-commerce.

Technical Details of Final Deliverable

Our final deliverable is a cross-platform mobile application that runs on both Android and iOS. 

  • It runs on all the devices that support Android 6.0 and above

  • It runs on all the devices that support iOS 9.0 and above

The models that run in the backend are built with Keras, a Python deep-learning API.

  • Keras is compatible with Python 3.6+ and is built on top of TensorFlow 2.0.
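  • To accelerate the models, we require an Intel Movidius Neural Compute Stick 2. Note that this stick is an inference accelerator, so training itself has to run on a separate machine.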

The smart glasses connect to the mobile application wirelessly over Bluetooth.

The user's question is captured by a microphone attached to the glasses, and the answer is played back through the earphones. The smart glasses must also have an embedded button that the user presses to capture a picture.

Questions asked by the user are processed with natural language processing so that the system can better understand the query.

Text-to-speech: expo-speech API will be used to read out each button function the user taps on.
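
The data flow described in this section can be sketched as a single function that chains three stages. Each stage is passed in as a callable so that the real speech-to-text service, VQA model, and text-to-speech engine can be swapped in later. The stub functions below are placeholders invented for this example and do not call any real API.

```python
def run_cognitive_lens(image, audio, speech_to_text, vqa_model, text_to_speech):
    """Chain the three stages: transcribe, answer, speak.

    All three callables are assumptions for this sketch; none of them is
    a real library function.
    """
    question = speech_to_text(audio)
    answer = vqa_model(image, question)
    return text_to_speech(answer)


# Stub stages standing in for the real components.
def fake_speech_to_text(audio):
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript


def fake_vqa_model(image, question):
    return "two" if "how many" in question.lower() else "unknown"


def fake_text_to_speech(text):
    return f"<spoken>{text}</spoken>"  # a real engine would play audio instead


result = run_cognitive_lens(
    image=b"raw-image-bytes",
    audio=b"How many cups are there?",
    speech_to_text=fake_speech_to_text,
    vqa_model=fake_vqa_model,
    text_to_speech=fake_text_to_speech,
)
print(result)
```

Keeping the stages separate like this would also make it possible to test the VQA model with typed questions before the glasses and microphone are available.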

Final Deliverable of the Project

HW/SW integrated system

Core Industry

IT

Other Industries

Core Technology

Artificial Intelligence (AI)

Other Technologies

Sustainable Development Goals

Good Health and Well-Being for People, Decent Work and Economic Growth, Industry, Innovation and Infrastructure

Required Resources

Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
Movidius Neural Compute Stick 2 with Myriad X Vision Processing Unit | Equipment | 1 | 23741 | 23741
Portable Anti-Polarization Call Semi-Open Type Smart Bluetooth Glasses | Equipment | 1 | 16502 | 16502
SSD | Equipment | 1 | 5799 | 5799
Total (in Rs) | | | | 46042