Integrated Audio-Visual Perceptual System of Socially Interactive Humanoid Robot


2025-06-28 16:33:08 - Adil Khan

Project Title

Integrated Audio-Visual Perceptual System of Socially Interactive Humanoid Robot

Project Area of Specialization

Robotics

Project Summary

Inspired by the human audio-visual system for object recognition and sound source localization, this project aims to develop a biologically plausible attention mechanism for gaze shifting and sound source localization in a humanoid robot. A sophisticated acoustic and visual system can strengthen a robot's ability to interact with humans in real time. One of the main issues in social robotics is endowing robots with the ability to direct attention to the people with whom they are interacting in a real-world environment. A humanoid robot's visual system is limited when the target is outside the visual field or the lighting is poor, and vision alone cannot detect a non-visual event that is accompanied only by a sound emission. Auditory processing is therefore required to understand the surrounding environment and to compensate for the narrow visual field.

The benefits of audio-visual attention mechanisms for the social interaction of a humanoid robotic head, compared with audio-only or visual-only approaches, have not yet been evaluated in real scenarios. There is therefore a need to develop an auditory and visual perception system that can improve the social aspects of human-robot interaction. To perceive the environment, this final year project (FYP) will develop a neural network model that detects the origin of a sound through microphones. The aim of the project is to develop a biologically inspired, integrated audio-visual system that can augment human-robot interaction. This localization-based system will integrate auditory and visual information to improve social interaction by detecting cues through active audition and vision. The humanoid robot will be able to detect objects, localize sound sources, and control its motors effectively.

Project Objectives
  1. Develop a neural network model for audio-visual sensory information integration.
  2. Develop a humanoid robot head capable of reflexive gaze shifting and accurate sound source localization.
  3. Develop an acoustic system for the humanoid robotic head.

Project Implementation Method

The first task is to localize the sound source. To achieve this, an active audio perceptual system will be developed using microphones. Next, an active visual system will be built to extract information from the sound-originating cues in a real environment. The auditory and visual information from these two active systems will then be integrated in a neural network that moves the head to the target position, making the humanoid socially interactive. The neural network model will initially be trained on various sound sources at different positions. After training, the humanoid robot will have an audio-visual integration system similar to that of humans. The whole process is shown in the figure below:

[Figure: Integrated Audio-Visual Perceptual System of Socially Interactive Humanoid Robot]
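The sound localization step can be illustrated with a classical baseline. The sketch below is not the neural network model proposed here; it is a minimal cross-correlation estimate of the time difference of arrival between two microphones, converted to an azimuth angle. The function name, the 48 kHz sample rate, the 0.2 m microphone spacing, and the far-field assumption are all illustrative assumptions, not values from this proposal.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def estimate_azimuth(left, right, sample_rate, mic_distance):
    """Estimate source azimuth in degrees from two microphone signals.

    Hypothetical helper for illustration. Positive angles mean the sound
    reached the right microphone first (source on the right side).
    Assumes a far-field source and equal-length, time-aligned recordings.
    """
    # Full cross-correlation; index i corresponds to lag i - (len(right) - 1).
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)

    # A positive lag means the left signal is delayed relative to the right.
    delay = lag / sample_rate

    # Far-field geometry: path difference = mic_distance * sin(azimuth).
    sin_theta = np.clip(SPEED_OF_SOUND * delay / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))


if __name__ == "__main__":
    # Synthetic test signal: white noise that reaches the left microphone
    # 10 samples later than the right one.
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(2048)
    delayed = np.concatenate([np.zeros(10), noise[:-10]])
    print(estimate_azimuth(delayed, noise, sample_rate=48000, mic_distance=0.2))
```

An angle obtained this way could serve as a training target or a sanity check for the neural network, and as a pan command for the head motors once the visual system confirms the target.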

Benefits of the Project

A socially interactive humanoid robot can serve a wide variety of useful purposes around the globe. It can work above ground and underground more efficiently and faster than humans, including in places where human lives are at risk. In the current pandemic, many lives have been lost. Instead of risking valuable human lives, robots can serve as companions and assistants in hospitals and other workplaces. An integrated active audio-visual perceptual system gives a humanoid the ability to understand and solve problems in its environment as humans do. Such a robust humanoid robot could benefit every sector contributing to Pakistan's GDP, extending its benefits as demand rises.

Technical Details of Final Deliverable

The deliverable is a socially interactive humanoid robot that replicates human audio-visual abilities. The integrated audio-visual perceptual system will be fully functional and applicable across many fields.

Final Deliverable of the Project

Hardware System

Core Industry

Security

Other Industries

Medical, Agriculture, Transportation, Health

Core Technology

Robotics

Other Technologies

Artificial Intelligence (AI)

Sustainable Development Goals

Required Resources

Item Name                  Type            No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
DC Geared encoder motor    Equipment       3              500                     1500
Arduino                    Equipment       1              700                     700
L298D IC                   Equipment       3              220                     660
Microphone                 Equipment       2              15000                   30000
Camera                     Equipment       2              4500                    9000
Raspberry Pi 4             Equipment       1              16500                   16500
Jumper wires               Equipment       1              200                     200
Mechanical body            Equipment       1              10500                   10500
Battery                    Equipment       1              500                     500
Transportation charges     Miscellaneous   1              3000                    3000
Printing                   Miscellaneous   1              4000                    4000
Breadboard                 Equipment       2              150                     300
Total (in Rs)                                                                     76860
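
The stated total can be cross-checked from the unit counts and per-unit costs listed under Required Resources. The short sketch below simply recomputes it; the variable names are illustrative, and the item labels are lightly normalized from the listing.

```python
# (item, number of units, per-unit cost in Rs), copied from Required Resources
items = [
    ("DC geared encoder motor", 3, 500),
    ("Arduino", 1, 700),
    ("L298D IC", 3, 220),
    ("Microphone", 2, 15000),
    ("Camera", 2, 4500),
    ("Raspberry Pi 4", 1, 16500),
    ("Jumper wires", 1, 200),
    ("Mechanical body", 1, 10500),
    ("Battery", 1, 500),
    ("Transportation charges", 1, 3000),
    ("Printing", 1, 4000),
    ("Breadboard", 2, 150),
]

# Sum of units * unit cost over all rows.
total = sum(units * unit_cost for _, units, unit_cost in items)
print(total)  # 76860, matching the stated total
```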
