Movement Replication using Human Pose Estimation


2025-06-28 16:34:11 - Adil Khan

Project Title

Project Area of Specialization: Electrical/Electronic Engineering

Project Summary

Human Pose Estimation

Gesture and pose recognition has become a popular research topic in recent years. This project focuses specifically on real-time replication of human movement by robots.

Image processing
Pose estimation is usually done in two steps: human detection, followed by human pose recognition, which identifies the body joints and parts. The captured pictures may be degraded by noise and moisture, so the first step is to pre-process the images and videos, applying filtering to remove noise.
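The noise-filtering step can be illustrated with a median filter, which is a common choice for removing isolated noisy pixels. The sketch below is a minimal pure-Python version written for illustration only; the function name `median_filter` and the tiny test image are assumptions, not part of the project. In practice an OpenCV routine such as `cv2.medianBlur` or `cv2.GaussianBlur` would likely be used instead.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Return a filtered copy of a 2-D grayscale image (list of rows).

    Each pixel is replaced by the median of its (2*radius+1)^2 neighbourhood.
    Border pixels use only the neighbours that actually exist.
    """
    height, width = len(image), len(image[0])
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median_low(window))
        filtered.append(row)
    return filtered


# Hypothetical example: a flat grey patch with one bright noise pixel.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
clean = median_filter(noisy)
print(clean)
```

The median is preferred over a plain average here because a single extreme pixel does not shift the result.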

Different approaches for image processing
Different approaches have been introduced for image processing and recognition. Neural networks are well suited to approximating the non-linear mapping from arbitrary person images to joint locations, even in the presence of ambiguous body appearance, varying viewing conditions and background noise.

Image Segmentation
There are two types of image segmentation: semantic segmentation and instance segmentation. Semantic segmentation labels all the people in an image as a single class without separating individuals. Instance segmentation, in contrast, recognizes different objects of the same class as different instances.
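The difference between the two outputs can be shown on a toy mask. In the sketch below, the semantic mask only says "person" or "background", while the instance labels give each person a separate id. The separation is done here by a simple 4-connected flood fill, which is an assumption made for illustration: it only works when people do not touch in the image and is not how Mask R-CNN separates instances. The function name `label_instances` is hypothetical.

```python
from collections import deque


def label_instances(semantic_mask):
    """Split a binary semantic mask (1 = person, 0 = background) into instances.

    Returns (labels, count), where labels has the same shape as the mask,
    0 marks background and 1, 2, ... are instance ids.
    """
    height, width = len(semantic_mask), len(semantic_mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if semantic_mask[y][x] == 0 or labels[y][x] != 0:
                continue
            # Found an unlabelled person pixel: start a new instance.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and semantic_mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical semantic mask containing two separate people.
semantic = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
instances, n_people = label_instances(semantic)
print(n_people, instances)
```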

Mask R-CNN Approach
Faster R-CNN is an object detection architecture based on convolutional neural networks. Mask R-CNN is a modified form of Faster R-CNN: it not only finds the class and bounding box of each object but also returns the object mask. The mask branch adds only a small computational overhead to Faster R-CNN while providing pixel-level localization of each object.

Movement Recognition
Human body movement will be recognized using gradient boosting, which will detect movement in multiple dimensions. The main focus will be on detecting upper-limb movement.

Dataset
All the data to be processed will come from cameras, as images and videos. A Raspberry Pi (a single-board computer) will be used for this project. The project has both hardware and software components.

Conclusion
This project will recognize movement from both directions, that is, the robot will replicate the movement of a human seen from the front as well as from the back. The robot will be able to detect multiple persons in real time. It will be able to mimic a demonstrated task both in real time and from stored data, using a human pose estimation algorithm. The expected advantages include higher accuracy, one-to-one correspondence, fast replication, less human intervention and action detection. The system will have a user-friendly interface.

Project Objectives

Movement Replication using Human Pose Estimation

The goal of this project is to build a robot that follows the trajectories of the pose skeleton of a human performing an action, instead of being manually programmed to follow trajectories. A human instructor can effectively teach the robot an action simply by demonstrating it. The robot can then calculate how to move its articulators to perform the same action.
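One way the robot could work out how to move a joint is to compute the angle at that joint from three neighbouring keypoints of the pose skeleton. The sketch below is a minimal illustration under stated assumptions: keypoints are 2-D image coordinates `(x, y)`, and the function name `joint_angle` and the sample coordinates are hypothetical, not taken from the project.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees (0 to 180) at point b between segments b->a and b->c.

    Each point is an (x, y) pair, for example shoulder, elbow and wrist.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot is numerically safer than acos of a ratio.
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical keypoints: an arm bent at a right angle, then a straight arm.
bent = joint_angle((0, 0), (1, 0), (1, 1))
straight = joint_angle((0, 0), (1, 0), (2, 0))
print(bent, straight)
```

The resulting elbow angle could then be sent to the corresponding motor as its target position. A 2-D angle ignores depth, so a real system would need either several axes per joint or a 3-D pose estimate.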

Multi-person Pose Estimation

We will use a multi-person pose estimation algorithm, which estimates the poses of several people in an image or in real-time video.

Image Segmentation

By dividing the image into segments, we can restrict processing to the important segments. This technique gives us a far more granular understanding of the object(s) in the image.

Object detection

It deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos.

Instance Segmentation

It detects the object and, at the same time, generates a segmentation mask, which you can think of as classifying each pixel as belonging to an object or not.

Mask R-CNN Approach

Mask R-CNN is a popular architecture for instance segmentation that can also be extended to human pose estimation by predicting keypoints. The model predicts in parallel both the bounding box locations of the various objects in the image and a mask that segments each object.

Movement Replication

Robot-assisted training enables stroke patients with moderate or severe upper limb impairment to perform repetitive tasks in a highly consistent manner, tailored to their motor abilities. Upper limb problems commonly occur after a stroke, comprising loss of movement, coordination, sensation, and dexterity, which lead to difficulties with activities of daily living (ADL) such as washing and dressing.

Project Implementation Method

The implementation has two parts:
• Hardware implementation
• Software implementation

Hardware Implementation
In the hardware implementation, a robotic body for upper-limb movement replication, mounted on a locomotive base, will be assembled.
•   The robot body is divided into a humanoid upper body and a holonomic platform for locomotion.
•   The power rating of the motors should be selected according to the mass and inertia of the body.
•   The motors should provide the following movable ranges (in degrees):
o   Wrist (-30 to 30 / -60 to 60)
o   Elbow (-90 to 90 / -10 to 150)
o   Shoulders (-180 to 180 / -10 to 180 / -45 to 180)
o   Neck (-180 to 180 / -45 to 45 / -60 to 60)
o   Torso (-180 to 180 / -10 to 60 / 20 to 60)
•   The lower, locomotive part of the robot moves on tires.
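The movable ranges listed above can be enforced in software before a command reaches a motor. The sketch below is one possible way to do this. It assumes that each slash-separated range belongs to one rotation axis of the joint, in the order listed; the source does not name the axes, so the axis indices, the table name `JOINT_LIMITS_DEG` and the function `clamp_joint` are all hypothetical.

```python
# Movable ranges in degrees, copied from the hardware list.
# Assumption: one (low, high) pair per rotation axis, in the listed order.
JOINT_LIMITS_DEG = {
    "wrist": [(-30, 30), (-60, 60)],
    "elbow": [(-90, 90), (-10, 150)],
    "shoulder": [(-180, 180), (-10, 180), (-45, 180)],
    "neck": [(-180, 180), (-45, 45), (-60, 60)],
    "torso": [(-180, 180), (-10, 60), (20, 60)],
}


def clamp_joint(joint, axis, angle_deg):
    """Limit a requested angle to the movable range of the given joint axis."""
    low, high = JOINT_LIMITS_DEG[joint][axis]
    return max(low, min(high, angle_deg))


# A pose estimate asks for 170 degrees on the second elbow axis.
print(clamp_joint("elbow", 1, 170))
```

Clamping keeps a noisy pose estimate from driving a motor past its mechanical stop.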

Software Implementation
In the software implementation, the main concern is the choice of libraries. The main library for reading video is OpenCV. The Mask R-CNN algorithm will be used to detect humans in a video and then to segment their body parts, and it will also serve as the basis for human pose estimation.

This convolutional neural network predicts in parallel both the bounding box locations of the various objects in the image and a mask that segments each object. A backbone CNN first extracts feature maps from the image. These feature maps are used by a Region Proposal Network (RPN) to obtain bounding box candidates for the presence of objects. Each bounding box candidate selects an area (region) of the feature map extracted by the CNN. Because the bounding box candidates can be of various sizes, a layer called RoIAlign resamples the extracted features so that they all have a uniform size. These extracted features are passed to parallel CNN branches for the final prediction of the bounding boxes and the segmentation masks.
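The role of RoIAlign, turning regions of different sizes into features of one fixed size, can be sketched with bilinear interpolation. The code below is a simplified illustration, not the layer used inside Mask R-CNN: it takes a single sample at the centre of each output bin, whereas the real layer typically samples several points per bin and pools them. It also assumes that feature values sit at integer coordinates and that the box is given in feature-map coordinates. The names `bilinear` and `roi_align` are hypothetical.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at real (y, x)."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
    bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def roi_align(feature, box, out_size=2):
    """Resample box = (y_min, x_min, y_max, x_max) to an out_size x out_size grid.

    Simplification: one bilinear sample at the centre of each output bin.
    """
    y_min, x_min, y_max, x_max = box
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    return [
        [
            bilinear(feature, y_min + (i + 0.5) * bin_h, x_min + (j + 0.5) * bin_w)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


# Hypothetical 4x4 feature map whose value is x + 10*y.
feature = [[x + 10 * y for x in range(4)] for y in range(4)]
pooled = roi_align(feature, (0, 0, 3, 3))
small = roi_align(feature, (1, 1, 2, 3))  # a different box size, same output size
print(pooled, small)
```

Both boxes produce a 2x2 output, which is what lets the later prediction branches accept candidates of any size.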

• The major Python libraries used for this purpose are:

cv2: using the cv2 library (OpenCV), we first record video to the chosen storage location with the VideoWriter class. After detecting motions and gestures with human pose estimation, we can replicate those motions.
NumPy: another Python library, used to store the image frames as matrices.

• Multi-dimensional gradient boosting is also used to detect pose and movement in multiple dimensions.
• Gradient tree boosting is a supervised learning method. Given a training set of inputs and outputs, the goal of regression is to find a function that maps the inputs to the outputs.
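The idea of learning a function from inputs to outputs by gradient boosting can be sketched in a few lines. The example below is a minimal, single-output illustration with squared-error loss and one-split trees (stumps) on a one-dimensional input. It is not the multi-dimensional variant the project proposes, and the function names and the sample joint-angle data are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Find the threshold on a 1-D input that best splits the residuals.

    Returns (threshold, left_value, right_value) minimising squared error.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: no split possible
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model of stumps to (xs, ys) with squared-error loss."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Evaluate the boosted model at a single input x."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


# Hypothetical data: an elbow angle (degrees) sampled at eight time steps.
times = [0, 1, 2, 3, 4, 5, 6, 7]
angles = [0, 5, 20, 45, 80, 85, 88, 90]
model = fit_gradient_boosting(times, angles)
print(predict(model, 0), predict(model, 7))
```

Each round fits a small tree to what the current model still gets wrong, so the sum of many weak trees gradually approximates the mapping.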

Benefits of the Project

Crime detection:

By storing the data recovered from CCTV cameras in the city, various crimes can be detected using human pose estimation. These crimes can include theft, burglaries, shootings etc.

Once a crime is detected in any area, the proper law enforcement agencies can be notified and they would be able to take the necessary action required for the situation. Such crime scenes can then also be reproduced easily.

At present, the amount of CCTV video that can be stored is limited by the large size of video files and the limited space available on the servers. Using human pose estimation, we can detect when a crime is occurring and store only the footage from that time, which also saves storage space.

Robotics:

In the field of robotics, human pose estimation can play a vital role. Instead of programming a robot to do specific tasks, we can use the data recovered by the algorithms to train the robot to do various tasks by following the pattern of our own movements.

Another use of human pose estimation is controlling the movement of a robot in real time. This is particularly useful in places that are inaccessible to humans, such as caves with low oxygen levels or sites contaminated by nuclear substances.

It can also be used by bomb-disposal robots and in expeditions to other planets and moons. A professional can perform the required actions on Earth in front of a camera, and the robot on the other planet can replicate the movements.

Medical:

By capturing the movements and actions of doctors while they perform surgery, we can recreate those actions from the data stored by human pose estimation algorithms. We would thus be able to create machines that mimic those actions and perform surgeries without the involvement of surgeons.

We can also replicate the movements of such surgeries in real time. For example, if there is an urgent need to perform surgery but the doctor is not available on the spot, the doctor can sit in front of a camera anywhere to perform the surgery. With a live feed of the patient and machines that mimic the doctor's actions, surgery can be performed without the surgeon being physically present.

Military:

By replicating the actions of soldiers, machines can be deployed that follow instructions in real time. Using this technique would drastically decrease the number of casualties on the battlefield.

Technical Details of Final Deliverable

Human Pose Estimation:
It is a method by which the pose and movements of a person can be detected using the data obtained from a camera. A picture or video frame is a matrix of values on which we can perform different operations. By detecting patterns and using machine/deep learning, we can differentiate between images and detect the different objects present in a particular image.
Using this idea, we detect the body parts of a person that are visible in the image and mark them. Then, by joining those body parts, we can create a structure for each person. By noting the change in the structure over time, we can estimate the overall body movements using different algorithms.
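The structure described here can be represented as a set of named keypoints joined by edges, and movement can be estimated by comparing the structure between two frames. The sketch below is a minimal illustration for one arm only; the joint names, the edge list `SKELETON_EDGES`, the pixel coordinates and the 5-pixel threshold are all assumptions chosen for the example.

```python
import math

# Hypothetical skeleton for one arm: pairs of keypoints joined by a limb.
SKELETON_EDGES = [("shoulder", "elbow"), ("elbow", "wrist")]


def limb_segments(pose):
    """Return the line segments joining the body parts present in a pose."""
    return [
        (pose[a], pose[b])
        for a, b in SKELETON_EDGES
        if a in pose and b in pose
    ]


def joint_displacements(prev_pose, curr_pose):
    """Distance in pixels each joint moved between two frames."""
    return {
        name: math.hypot(
            curr_pose[name][0] - prev_pose[name][0],
            curr_pose[name][1] - prev_pose[name][1],
        )
        for name in prev_pose
        if name in curr_pose
    }


def moving_joints(prev_pose, curr_pose, threshold=5.0):
    """Names of joints that moved more than threshold pixels."""
    moved = joint_displacements(prev_pose, curr_pose)
    return sorted(name for name, dist in moved.items() if dist > threshold)


# Two hypothetical frames: the forearm swings up while the shoulder stays put.
prev_pose = {"shoulder": (100, 100), "elbow": (140, 100), "wrist": (180, 100)}
curr_pose = {"shoulder": (100, 100), "elbow": (140, 102), "wrist": (170, 70)}
print(limb_segments(curr_pose))
print(moving_joints(prev_pose, curr_pose))
```

A threshold is used so that small jitter in the detected keypoints is not reported as movement.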

Raspberry Pi:
This single-board computer will be used in this project. The board requires an operating system to be installed on it to work properly; Debian will be used in this project. The board combines the capabilities of a computer and a simple micro-controller (i.e. it has all the interfaces of a computer as well as input/output pins).

Camera:
The camera will capture the images or video that are fed to the Raspberry Pi.

Programming:
The programming will be done in Python, and the program will run on the Raspberry Pi.

Algorithm:
The algorithm we will use is Mask R-CNN. With this algorithm, the people in the image and their body parts are detected simultaneously.

Servo-motors:
Servo motors run at low speed and deliver high torque. They will be used to control the movement of the robot that replicates the movement of the person detected by the camera. These motors can be thought of as the joints of a person.
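Since each servo acts as a joint, a target joint angle has to be converted into the control signal the servo expects. The sketch below shows one common convention as an assumption: a hobby servo driven at 50 Hz whose pulse width varies linearly from 1000 to 2000 microseconds across its range. The actual values depend on the servo chosen and should be taken from its datasheet. The function names are hypothetical.

```python
def angle_to_pulse_us(angle_deg, min_angle=-90.0, max_angle=90.0,
                      min_pulse_us=1000.0, max_pulse_us=2000.0):
    """Map a joint angle to a servo pulse width in microseconds.

    The angle is clamped to [min_angle, max_angle] before the linear mapping.
    The default pulse range is an assumed, typical hobby-servo convention.
    """
    angle = max(min_angle, min(max_angle, angle_deg))
    fraction = (angle - min_angle) / (max_angle - min_angle)
    return min_pulse_us + fraction * (max_pulse_us - min_pulse_us)


def pulse_to_duty_cycle(pulse_us, frequency_hz=50.0):
    """Duty cycle in percent for a pulse width at the given PWM frequency."""
    period_us = 1_000_000.0 / frequency_hz
    return 100.0 * pulse_us / period_us


# Centre position of a joint with a -90 to 90 degree range.
pulse = angle_to_pulse_us(0.0)
duty = pulse_to_duty_cycle(pulse)
print(pulse, duty)
```

The duty cycle is the value a PWM output on the controller board would be set to for that joint.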

DC-motors:
These motors have a high starting torque and a controllable speed and work completely on DC (direct current). These motors would be used to drive and change the direction of the structure supporting the arms that would be replicating the movement of the person being detected.

Speed Controllers:
These devices are connected between the DC motors and the Raspberry Pi through an opto-coupler isolation circuit. By sending the appropriate PWM signal to the speed controllers, the speed of the motors can be controlled.

Opto-Couplers:
These devices consist of an LED paired with a photodiode. When a signal is applied on the LED side, it appears as a voltage on the photodiode side, so the motors and the controller board are electrically isolated from each other. This protects the controller board from any transients that might come from the motors.

Final Deliverable of the Project: HW/SW integrated system
Core Industry: Education
Other Industries: IT, Others
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Good Health and Well-Being for People; Industry, Innovation and Infrastructure

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
HDMI display	Equipment	1	15000	15000
Jetson Nano and accessories	Equipment	1	35000	35000
Camera and sensors	Equipment	1	20000	20000
Printing and stationery	Miscellaneous	1	10000	10000
			Total (in Rs)	80000
