Exploration of an Unknown Environment Using Deep Reinforcement Learning
We need robots in places that humans cannot physically reach, such as planetary exploration, search-and-rescue operations that involve traversing tight passageways, and perilous activities like covert operations. Achieving these tasks requires exploration. Conventional methods are computationally expensive and demand substantial time and memory, which makes exploration in large environments difficult. We propose a novel autonomous exploration system for an unknown environment, in which a mobile robot uses a Deep Reinforcement Learning (DRL) based approach. In a DRL-based system, the agent (robot) learns by interacting with the environment through trial and error, without the need for a map. The agent improves its policy according to the reward received for each action.
Problem Formulation
We formulate the problem of autonomous exploration as a Markov Decision Process (MDP), where at each time step t, the agent receives an observation of its current state, takes an action, receives a reward, and transitions to the next state. For the mapless exploration task that we consider, the states consist of laser range readings and the agent's pose relative to the goal position. The task of the agent is to reach the goal position.
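The MDP interaction above can be summarized in a transition tuple. The following is an illustrative sketch only; the field names and example values are assumptions, not taken from the project code.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical container for one MDP transition (names are illustrative).
@dataclass
class Transition:
    state: Tuple[float, ...]       # laser readings + pose relative to goal
    action: int                    # discrete action index
    reward: float
    next_state: Tuple[float, ...]
    done: bool                     # True when the goal is reached or the episode ends

t = Transition(state=(0.5, 1.2), action=0, reward=-0.1,
               next_state=(0.4, 1.1), done=False)
print(t.done)  # False
```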
Project Components
We use the CoppeliaSim robot simulator to train the agent. Training is conducted in simulation to avoid hardware damage and to reduce training time, since the robot must be returned to the start location after each episode. After training, we test the learned policies on the real mobile robot and deploy the learning algorithm in the environment to analyze its performance. A DRL problem consists of the following components, each of which must be properly formulated to achieve a specific task:
Agent
We use a differential-drive robot (BubbleRob) as our agent in a simulated environment, as shown in the figure. For exploration without obstacles, the main characteristics of the mobile robot are given in the table.
| Characteristic | Description |
| --- | --- |
| Actions | left/right rotation, forward motion |
| Sensors | proximity sensor, LIDAR sensor |

Environment
An obstacle-free rectangular environment is created in the CoppeliaSim simulator for exploration without obstacles.

State Space
We formulated the state space for exploration without obstacles as the relative angle between the goal and the heading of the BubbleRob. To reduce the convergence complexity of the algorithm, we discretized the state space from -90 degrees to +90 degrees in 1-degree steps.
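The discretization described above can be sketched as a simple binning function. The function name and index convention are assumptions for illustration; clipping to [-90°, +90°] and 1-degree steps follow the text, giving 181 discrete states.

```python
# Sketch of the state discretization: the relative angle between the goal
# direction and the robot heading is clipped to [-90, +90] degrees and
# binned in 1-degree steps, yielding state indices 0..180.
def discretize_angle(angle_deg: float) -> int:
    clipped = max(-90.0, min(90.0, angle_deg))
    return int(round(clipped)) + 90  # shift so indices start at 0

print(discretize_angle(-90.0))  # 0
print(discretize_angle(0.0))    # 90
print(discretize_angle(90.0))   # 180
```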

Reward Formulation
The reward function formulation is based on the states, collisions, and the relative angle. The reward function is formulated as
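Since the exact formula is not reproduced here, the following is a purely illustrative shaping scheme consistent with the ingredients named above (collision, goal, relative angle); every constant and the linear form are assumptions.

```python
# Illustrative reward shaping only: the exact constants and functional form
# used in the project are not given, so these values are assumptions.
def reward(relative_angle_deg: float, collided: bool, reached_goal: bool) -> float:
    if collided:
        return -10.0   # heavy collision penalty (assumed value)
    if reached_goal:
        return 10.0    # large positive reward at the goal (assumed value)
    # Shaping term: better alignment of the heading with the goal direction
    # yields a higher reward (assumed linear form over [-90, 90] degrees).
    return 1.0 - abs(relative_angle_deg) / 90.0
```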

Framework
We use the Double Deep Q-Network (DDQN) algorithm to achieve the autonomous exploration task. In DDQN, two neural networks are used: the policy network and the target network. The agent observes a state from the environment, takes an action using the policy network, transitions to the next state, and receives a reward from the environment. Episode-completion information is stored in a done flag. The set {state, action, next state, reward, done} is called a transition. All transitions are logged into Experience Replay Memory (ERM). To update the action policy (the weights of the policy network), we use a random batch of transitions from the ERM. To reduce the overestimation problem, the weights of the target network are replaced with the policy network's weights after a fixed number of episodes. The training process continues until the agent is trained well enough to achieve the goal; the resulting policy is termed the optimal policy. The optimal policy is the one by which the agent reaches the goal in a minimum number of steps and minimum time while maximizing the reward.
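The DDQN pieces described above can be sketched as follows. To keep the example self-contained, small random Q-tables stand in for the real policy and target networks (an assumption); the replay memory, the double-Q target (select the action with the policy network, evaluate it with the target network), and the periodic target sync mirror the description in the text.

```python
import random
from collections import deque
import numpy as np

GAMMA = 0.99
N_STATES, N_ACTIONS = 181, 3   # e.g. discretized angle, {left, right, forward}

rng = np.random.default_rng(0)
policy_net = rng.normal(size=(N_STATES, N_ACTIONS))  # stand-in for the policy network
target_net = policy_net.copy()                       # stand-in for the target network

erm = deque(maxlen=10_000)  # Experience Replay Memory

def ddqn_targets(batch):
    """Double-DQN target: action chosen by the policy net, value by the target net."""
    ys = []
    for s, a, r, s2, done in batch:
        if done:
            ys.append(r)
        else:
            best_a = int(np.argmax(policy_net[s2]))        # select with policy net
            ys.append(r + GAMMA * target_net[s2, best_a])  # evaluate with target net
    return np.array(ys)

# Store a transition and compute the target for a sampled batch.
erm.append((0, 2, 1.0, 1, False))
batch = random.sample(list(erm), k=1)
print(ddqn_targets(batch).shape)  # (1,)

# Periodically sync the target network with the policy network.
target_net = policy_net.copy()
```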

Observations are taken directly from the environment using Lidar and are preprocessed before being fed to the learning algorithm. The learning algorithm is trained over many episodes. We kept the simulation and real-robot specifications similar to achieve high sim-to-real performance. The trained neural network is uploaded to the robot's memory. In any given state, the algorithm generates the best possible action. The actuation signals, either left/right rotation or forward motion, are sent to the wheels of the robot. Using this sim-to-real approach, the mobile robot is able to explore and reach the goal location in a real-world environment intelligently.
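Deployment-time action selection can be sketched as a greedy argmax over the trained network's outputs. Here `q_values_for` is a hypothetical placeholder for a forward pass of the trained network on the preprocessed Lidar observation; the action names are assumptions.

```python
import numpy as np

ACTIONS = ["rotate_left", "rotate_right", "forward"]  # assumed action names

def q_values_for(observation: np.ndarray) -> np.ndarray:
    # Placeholder: a real deployment would run the trained network here.
    return observation @ np.ones((observation.size, len(ACTIONS)))

def select_action(observation: np.ndarray) -> str:
    """Greedy policy used after training: pick the highest-valued action."""
    return ACTIONS[int(np.argmax(q_values_for(observation)))]
```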