Deep Reinforcement Learning Based Robot Navigation in a Map Less Environment
Robots have long been used to assist humans, and they are especially needed for tasks that are impossible or difficult for humans to achieve. One such problem is navigation, as in Urban Search and Rescue (USAR) operations, deep-sea and space exploration, covert operations, and many more. Navigation is a fundamental problem for mobile robots and is required to achieve such tasks. Conventional methods are computationally expensive and consume considerable time and memory, which makes navigation more difficult and challenging in large environments. In this work, an autonomous navigation system for a mobile robot is proposed using a Deep Reinforcement Learning (DRL) based approach for a map-less environment. In a DRL-based system, the agent (robot) learns by interacting with the environment through trial and error, without the need for a map. The agent improves its policy according to the reward received for each action.
Formulation of the Problem
The navigation problem is formulated as a Markov Decision Process (MDP). At each time step t, the agent observes the environment to read its current state, takes an action, receives a reward, and transitions to the next state. For the considered task, the state consists of laser range readings and the agent's pose relative to the goal position. The task of the agent is to reach the goal position.
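The observe, act, reward, transition cycle described above can be sketched as a short loop. This is only an illustration: `ToyCorridor`, its placeholder reward values, and `run_episode` are hypothetical stand-ins and are not the simulator, state space, or reward function used in this work.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    next_state: int
    reward: float
    done: bool


class ToyCorridor:
    """Hypothetical 1-D stand-in for the simulator: start at cell 0, goal at cell 4."""

    def __init__(self, goal=4):
        self.goal = goal
        self.position = 0

    def reset(self):
        # The agent is placed back at the start at the beginning of each episode.
        self.position = 0
        return self.position

    def step(self, action):
        # action: -1 moves back, +1 moves forward; position is clipped to the corridor.
        self.position = max(0, min(self.goal, self.position + action))
        done = self.position == self.goal
        reward = 1.0 if done else -0.1  # placeholder values, assumed for illustration
        return StepResult(self.position, reward, done)


def run_episode(env, policy, max_steps=50):
    """Run one episode and return (total_reward, number_of_steps)."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(state)        # choose an action from the current state
        result = env.step(action)     # environment returns reward and next state
        total_reward += result.reward
        steps += 1
        state = result.next_state     # transition to the next state
        if result.done:
            break
    return total_reward, steps


if __name__ == "__main__":
    print(run_episode(ToyCorridor(), policy=lambda state: 1))
```

In the actual system the state would be the sensor-derived observation and the policy would be the trained network; the loop structure stays the same.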
Components Required
The agent is trained in a simulation environment to save time, as the robot is automatically placed in the starting position after each episode, and to avoid hardware damage. This is done using the CoppeliaSim Robot Simulator.
Once the robot has been trained, the learned policies are tested on a real mobile robot. The learning algorithm is deployed in the environment to analyze its performance. In a DRL-based problem, the components required to achieve a task are the agent, environment, states, actions, and reward function.
The BubbleRob robot acts as the agent in a simulated environment, as shown in the figure. It is based on a differential drive mechanism. For the navigation problem, the available actions are to move left, right, and forward, and the sensors used are a distance sensor and a LIDAR.
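As a rough sketch of how the three discrete actions could map onto a differential drive, the snippet below converts each action into left and right wheel speeds and then into body velocities using the standard differential-drive kinematics. All numeric values (`BASE_SPEED`, `TURN_SPEED`, `WHEEL_RADIUS`, `TRACK_WIDTH`) and the choice to turn in place are assumptions for illustration, not the parameters of BubbleRob or of the real robot.

```python
from enum import Enum


class Action(Enum):
    LEFT = 0
    RIGHT = 1
    FORWARD = 2


# Assumed values for illustration only.
BASE_SPEED = 2.0     # wheel speed when driving forward, rad/s
TURN_SPEED = 1.0     # wheel speed magnitude when turning, rad/s
WHEEL_RADIUS = 0.04  # metres
TRACK_WIDTH = 0.2    # distance between the two wheels, metres


def wheel_speeds(action):
    """Return (left_wheel, right_wheel) angular speeds for a discrete action."""
    if action is Action.FORWARD:
        return (BASE_SPEED, BASE_SPEED)
    if action is Action.LEFT:
        # Left wheel backward, right wheel forward: the robot rotates to the left.
        return (-TURN_SPEED, TURN_SPEED)
    if action is Action.RIGHT:
        return (TURN_SPEED, -TURN_SPEED)
    raise ValueError(f"unknown action: {action!r}")


def body_velocity(left, right):
    """Differential-drive kinematics: (linear m/s, angular rad/s, positive = left turn)."""
    linear = WHEEL_RADIUS * (right + left) / 2.0
    angular = WHEEL_RADIUS * (right - left) / TRACK_WIDTH
    return linear, angular


if __name__ == "__main__":
    for action in Action:
        print(action.name, body_velocity(*wheel_speeds(action)))
```

Equal wheel speeds give pure forward motion, and opposite wheel speeds give rotation in place, which is why three discrete actions are enough to steer the robot.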

An environment with obstacles was created in the CoppeliaSim robot simulator for the navigation problem.
The state space consists of the relative angle between the goal and the heading of the agent. It takes discrete values from -90 degrees to +90 degrees in 1-degree steps. The smaller the angle between the goal and the agent's heading, the more precise the navigation.
The reward function is based on the states, collisions, and the relative angle. The reward function is formulated as
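The discretized angle state could be computed roughly as follows. The function names are hypothetical, and clipping angles outside the -90 to +90 degree range to the nearest bound is an assumption, since the text does not say how such angles are handled.

```python
import math


def relative_goal_angle(robot_x, robot_y, robot_heading_deg, goal_x, goal_y):
    """Signed angle in degrees from the robot's heading to the goal direction."""
    bearing = math.degrees(math.atan2(goal_y - robot_y, goal_x - robot_x))
    # Wrap the difference into [-180, 180) so small left and right errors stay small.
    return (bearing - robot_heading_deg + 180.0) % 360.0 - 180.0


def angle_to_state(angle_deg):
    """Map an angle to one of 181 discrete states (index 0 is -90, index 180 is +90)."""
    clipped = max(-90.0, min(90.0, angle_deg))  # assumption: clip out-of-range angles
    return int(round(clipped)) + 90


if __name__ == "__main__":
    angle = relative_goal_angle(0.0, 0.0, 0.0, 1.0, 1.0)
    print(angle, angle_to_state(angle))
```

With 1-degree steps over the stated range, this gives 181 possible state values.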

The Double Deep Q-Network (DDQN) algorithm is used to achieve the autonomous navigation task. DDQN uses two neural networks: a policy network and a target network. The agent observes a state from the environment, takes an action using the policy network, transitions to the next state, and receives a reward from the environment. Whether the episode has ended is stored in a done flag. The set {state, action, next state, reward, done} is called a transition. All transitions are logged in the Experience Replay Memory (ERM). To update the action policy (the weights of the policy network), a random batch of transitions is sampled from the ERM. To reduce the overestimation problem, the weights of the target network are replaced with the policy network's weights after some episodes. Training continues until the agent is able to achieve the goal; the resulting policy is termed the optimal policy. The optimal policy is the one by which the agent reaches the goal in the minimum number of steps and time while maximizing the reward.
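A minimal sketch of the replay memory and of the standard Double DQN learning target is given below, using plain Python callables in place of neural networks. The names `ReplayMemory` and `ddqn_target` and the discount factor `gamma` are assumptions for illustration; the network architecture, batch size, and hyperparameters of this work are not specified here.

```python
import random
from collections import deque, namedtuple

# One logged step, matching the {state, action, next state, reward, done} set.
Transition = namedtuple("Transition", ["state", "action", "next_state", "reward", "done"])


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped when it is full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward, done):
        self.buffer.append(Transition(state, action, next_state, reward, done))

    def sample(self, batch_size):
        # Uniform random batch, which breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def ddqn_target(transition, policy_q, target_q, gamma=0.99):
    """Double DQN target for one transition.

    policy_q and target_q map a state to a list of Q-values, one per action.
    gamma is an assumed discount factor.
    """
    if transition.done:
        return transition.reward
    # The policy network selects the best next action ...
    next_values = policy_q(transition.next_state)
    best_action = max(range(len(next_values)), key=next_values.__getitem__)
    # ... and the target network evaluates that action.
    return transition.reward + gamma * target_q(transition.next_state)[best_action]


if __name__ == "__main__":
    memory = ReplayMemory(capacity=100)
    memory.push(0, 2, 1, 1.0, False)
    batch = memory.sample(1)
    print(ddqn_target(batch[0], lambda s: [1.0, 3.0, 2.0], lambda s: [10.0, 5.0, 20.0], gamma=0.5))
```

Selecting the action with one network and evaluating it with the other is what distinguishes the Double DQN target from taking the maximum of the target network's own Q-values.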

Observations are taken directly from the environment using the LIDAR and are preprocessed before being fed to the learning algorithm. The learning algorithm is trained over many episodes. The simulated and real robot specifications are kept similar to achieve high sim-to-real performance. The trained neural network is uploaded to the robot's memory. In any given state, the algorithm generates the best possible action to take. The corresponding actuation signals (left rotation, right rotation, or forward motion) are sent to the wheels of the robot. Using this sim-to-real approach, the mobile robot is able to explore and reach the goal location intelligently in a real-world environment.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| DC gear motor with encoder and wheel | Equipment | 4 | 880 | 3520 |
| Raspberry Pi 4 (4 GB RAM) | Equipment | 1 | 23050 | 23050 |
| VL53L0X laser distance sensor | Equipment | 1 | 690 | 690 |
| H-bridge motor driver | Equipment | 1 | 1320 | 1320 |
| LiPo battery, 11.1 V, 2200 mAh | Equipment | 1 | 3000 | 3000 |
| LiPo battery charger | Equipment | 1 | 5000 | 5000 |
| Aluminium robot chassis | Equipment | 1 | 1600 | 1600 |
| RPLIDAR A1 360 | Equipment | 1 | 24108 | 24108 |
| Thesis printing and binding | Miscellaneous | 1 | 4000 | 4000 |
| 3D printing | Miscellaneous | 1 | 6000 | 6000 |
| Total (in Rs) | | | | 72288 |