Adil Khan 10 months ago


Deep Reinforcement Learning Based Robot Navigation in a Map-Less Environment

Project Title

Project Area of Specialization

Robotics

Project Summary

Robots have long been used to assist humans, and they are especially needed for tasks that are impossible or difficult for humans to achieve. One such problem is navigation, for example in Urban Search and Rescue (USAR) operations, deep-sea and space exploration, and covert operations. Navigation is a fundamental problem for mobile robots and is a prerequisite for such tasks. Conventional methods are computationally expensive and consume considerable time and memory, which makes navigation more difficult in large environments. In this work, an autonomous navigation system for a mobile robot is proposed that uses a Deep Reinforcement Learning (DRL) based approach in a map-less environment. In a DRL-based system, the agent (robot) learns by trial and error through interaction with the environment, without needing a map. The agent improves its policy according to the reward it receives for each action.

Project Objectives

Use Deep Reinforcement Learning (DRL) to design the navigation system of a mobile robot in a map-less environment.
Implement the learned policies on a hardware-based robot and evaluate their sim-to-real performance.

Project Implementation Method

Formulation of the Problem

The navigation problem is formulated as a Markov Decision Process (MDP). At each time step t, the agent observes the environment to read its current state, takes an action, receives a reward, and transitions to the next state. For the considered task, the states are laser range readings and the agent's pose relative to the goal position. The task of the agent is to reach the goal position.
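The observe, act, reward, next-state cycle described above can be sketched as a plain rollout loop. This is a minimal illustration only: `ToyCorridorEnv`, its reward values, and the `run_episode` helper are hypothetical stand-ins and do not come from the project, which uses CoppeliaSim as its environment.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    next_state: tuple
    reward: float
    done: bool


class ToyCorridorEnv:
    """Hypothetical 1-D stand-in for the simulator: the goal is a fixed cell."""

    def __init__(self, goal=5, max_steps=50):
        self.goal = goal
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        # Place the robot back at the start, as the simulator does per episode.
        self.position = 0
        self.steps = 0
        return (self.goal - self.position,)  # state: distance to the goal

    def step(self, action):
        # action: 0 = stay, 1 = move one cell forward (illustrative only)
        self.position += action
        self.steps += 1
        reached = self.position >= self.goal
        reward = 1.0 if reached else -0.01  # assumed reward values
        done = reached or self.steps >= self.max_steps
        return StepResult((self.goal - self.position,), reward, done)


def run_episode(env, policy):
    """One MDP rollout: observe state, act, receive reward, move to next state."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        result = env.step(action)
        total_reward += result.reward
        state, done = result.next_state, result.done
    return total_reward, env.steps
```

Calling `run_episode(ToyCorridorEnv(), lambda state: 1)` runs a policy that always moves forward until the goal cell is reached.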

Components Required

The agent is trained in simulation, using the CoppeliaSim robot simulator, to avoid hardware damage and to save time, since the robot is automatically returned to the starting position after each episode.

Once the robot has been trained, the learned policies are tested on a real mobile robot. The learning algorithm is deployed in the environment to analyze its performance. In a DRL-based problem, the components required to achieve a task are the agent, the environment, the states, the actions, and the reward function.

The Agent

The BubbleRob robot acts as the agent in the simulated environment, as shown in the figure. It uses a differential drive mechanism. For the navigation problem, the available actions are to move left, right, and forward, and the sensors used are a distance sensor and a LIDAR.
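On a differential drive robot, each of the three discrete actions corresponds to a pair of wheel speeds. The sketch below shows one possible mapping. The action indices, the speed constants, and the choice to turn in place are assumptions made for illustration, not values taken from the project.

```python
# Hypothetical action indices; the project does not state its encoding.
FORWARD, LEFT, RIGHT = 0, 1, 2

BASE_SPEED = 1.0  # assumed forward wheel speed (arbitrary units)
TURN_SPEED = 0.5  # assumed wheel speed while rotating in place


def wheel_speeds(action):
    """Return (left_wheel, right_wheel) speeds for a discrete action."""
    if action == FORWARD:
        return (BASE_SPEED, BASE_SPEED)    # equal speeds: straight line
    if action == LEFT:
        return (-TURN_SPEED, TURN_SPEED)   # right wheel leads: rotate left
    if action == RIGHT:
        return (TURN_SPEED, -TURN_SPEED)   # left wheel leads: rotate right
    raise ValueError(f"unknown action: {action}")
```

Keeping the mapping in one function means the same discrete action set can drive both the simulated robot and the hardware, with only the speed constants retuned.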

The Environment

An environment with obstacles was created in the CoppeliaSim robot simulator for the navigation problem.

The State Space

The state space consists of the relative angle between the goal and the heading of the agent. It takes discrete values from -90 degrees to +90 degrees in 1-degree steps. The smaller the angle between the goal and the agent's heading, the more precise the navigation.
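A 1-degree grid from -90 to +90 degrees gives 181 discrete states. The following sketch shows one way to compute the relative angle and map it to a state index. The function names, the coordinate convention (heading measured counter-clockwise from the x-axis), and the clipping of angles outside the range are assumptions, since the source does not say how out-of-range angles are handled.

```python
import math


def relative_angle_deg(robot_x, robot_y, robot_heading_deg, goal_x, goal_y):
    """Angle from the robot's heading to the goal, wrapped to [-180, 180)."""
    bearing = math.degrees(math.atan2(goal_y - robot_y, goal_x - robot_x))
    diff = bearing - robot_heading_deg
    return (diff + 180.0) % 360.0 - 180.0


def angle_to_state(angle_deg):
    """Map an angle to a state index 0..180 on the 1-degree grid.

    Angles outside [-90, +90] are clipped to the nearest bound (assumption).
    """
    clipped = max(-90.0, min(90.0, angle_deg))
    return int(round(clipped)) + 90
```

With this indexing, state 90 means the robot is heading straight at the goal, and states 0 and 180 mean the goal lies 90 degrees or more to one side.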

The Reward Function

The reward function is based on the states, collisions, and the relative angle. The reward function is formulated as

Benefits of the Project

The proposed method learns a navigation policy for a system with a large number of parameters. Our autonomous system is designed to generalize to changing environments and to navigate in dynamic environments.
The proposed method can be deployed in real-world scenarios after a policy has been learned in simulation.
The robot can be deployed to navigate intelligently in disaster environments, e.g. earthquake-affected areas.

Technical Details of Final Deliverable

The Double Deep Q-Network (DDQN) algorithm is used to achieve the autonomous navigation task. DDQN uses two neural networks: the policy network and the target network. The agent observes a state from the environment, takes an action using the policy network, transitions to the next state, and receives a reward from the environment. Whether the episode has ended is stored in a done flag. The set {state, action, next state, reward, done} is called a transition. All transitions are logged in the Experience Replay Memory (ERM). To update the action policy (the weights of the policy network), a random batch of transitions is sampled from the ERM. To reduce the overestimation of Q-values, the policy network selects the next action while the target network evaluates it, and the weights of the target network are replaced with the policy network's weights after some episodes. Training continues until the agent is able to reach the goal; the resulting policy is termed the optimal policy. The optimal policy is the one by which the agent reaches the goal in the minimum number of steps and time while maximizing the reward.
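The replay memory and the learning target can be sketched in a few lines. This is a simplified illustration, not the project's implementation: the networks are replaced by plain callables that return a list of Q-values, and the class name, function name, and discount factor `gamma` are assumptions. The target follows the standard Double DQN rule, in which the policy network picks the next action and the target network scores it.

```python
import random
from collections import deque, namedtuple

# Field order mirrors the transition set {state, action, next state, reward, done}.
Transition = namedtuple(
    "Transition", ["state", "action", "next_state", "reward", "done"]
)


class ReplayMemory:
    """Fixed-size experience replay memory; the oldest transitions drop out."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform random batch, which breaks correlation between steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def ddqn_target(transition, policy_q, target_q, gamma=0.99):
    """Double DQN target for one transition.

    policy_q and target_q are stand-in callables: state -> list of Q-values.
    """
    if transition.done:
        return transition.reward  # no bootstrapping past the end of an episode
    next_q = policy_q(transition.next_state)
    best_action = max(range(len(next_q)), key=next_q.__getitem__)  # select
    evaluated = target_q(transition.next_state)[best_action]       # evaluate
    return transition.reward + gamma * evaluated
```

In a full training loop, the policy network would be regressed toward these targets for each sampled batch, and the target network would periodically receive a copy of the policy network's weights.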

Observations are taken directly from the environment using the LIDAR and are preprocessed before being fed to the learning algorithm. The learning algorithm is trained over many episodes. The simulated and real robot specifications are kept similar to achieve high sim-to-real performance. The trained neural network is uploaded to the robot's memory. In any given state, the network outputs the best action to take according to the learned policy. The corresponding actuation signals are sent to the wheels of the robot: left rotation, right rotation, or forward. Using this sim-to-real approach, the mobile robot is able to explore and reach the goal location in a real-world environment.
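The source does not describe the preprocessing step, so the following is only one plausible sketch: invalid readings are replaced, ranges are clipped, the scan is reduced to a few sectors by taking the nearest obstacle in each, and the result is normalized. The sector count, the maximum range, and both function names are assumptions. The greedy action choice at deployment time is then an argmax over the network's Q-values.

```python
import math


def preprocess_scan(ranges, n_sectors=8, max_range=6.0):
    """Reduce a raw scan to n_sectors normalized distances in [0, 1].

    n_sectors and max_range (metres) are illustrative assumptions.
    """
    if len(ranges) < n_sectors:
        raise ValueError("scan has fewer readings than sectors")
    # Treat non-finite or non-positive readings as "nothing detected".
    cleaned = [
        max_range if (not math.isfinite(r) or r <= 0.0) else min(r, max_range)
        for r in ranges
    ]
    size = len(cleaned) // n_sectors  # leftover readings at the end are dropped
    sectors = [min(cleaned[i * size:(i + 1) * size]) for i in range(n_sectors)]
    return [d / max_range for d in sectors]


def greedy_action(q_values):
    """Index of the highest Q-value, i.e. the action the trained policy takes."""
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Taking the minimum per sector keeps the closest obstacle in each direction, which is the reading that matters most for collision avoidance, while shrinking the network input from hundreds of values to a handful.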

Final Deliverable of the Project

HW/SW integrated system

Core Industry

Other Industries

Core Technology

Robotics

Other Technologies

Artificial Intelligence (AI)

Sustainable Development Goals

Industry, Innovation and Infrastructure

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
DC gear motor with encoder and wheel	Equipment	4	880	3520
Raspberry Pi 4 (4 GB RAM)	Equipment	1	23050	23050
VL53L0X laser distance sensor	Equipment	1	690	690
H-bridge motor driver	Equipment	1	1320	1320
LiPo battery, 11.1 V, 2200 mAh	Equipment	1	3000	3000
LiPo battery charger	Equipment	1	5000	5000
Aluminium robot chassis	Equipment	1	1600	1600
RPLIDAR A1 360	Equipment	1	24108	24108
Thesis printing and binding	Miscellaneous	1	4000	4000
3D printing	Miscellaneous	1	6000	6000
			Total (in Rs)	72288

