UWB based Indoor Path Planning and Navigation using Reinforcement Learning.
Reinforcement learning is a technique that lets an agent learn from its environment by interacting with it. Learning from interaction comes naturally to humans, and we will implement the same idea on a robot. The robot (the agent) influences the environment through its actions, and in return the environment provides data that we use to train our neural network model. The process can be described as a loop. The agent (the moving vehicle) receives an environment state S0, where the subscript 0 denotes time t=0. Based on this observation, the agent chooses an action A0. After this action, the environment returns a new state S1 and a reward R1 to the agent. The agent then takes an action A1 at the next time step, and the process continues: the environment passes a reward and a state, the agent responds with an action, and so on.

Reinforcement learning Model Block Diagram[1]
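The state, action, reward loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CorridorEnvironment` class, its reward values (a small step penalty and a bonus at the goal) and the `run_episode` helper are hypothetical names chosen for this example, not part of the project's actual software.

```python
import random


class CorridorEnvironment:
    """Hypothetical 1-D corridor: the agent starts at cell 0, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial state S0

    def step(self, action):
        # action is -1 (move back) or +1 (move forward); stay inside the corridor.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # assumed reward scheme, for illustration only
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Roll out one episode and return a list of (S_t, A_t, R_{t+1}) tuples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                       # agent picks A_t from S_t
        next_state, reward, done = env.step(action)  # environment returns S_{t+1}, R_{t+1}
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def random_policy(state):
    """Placeholder policy: ignores the state and moves at random."""
    return random.choice([-1, 1])
```

Calling `run_episode(CorridorEnvironment(), random_policy)` returns the sequence of transitions for one episode; a learning algorithm would use these transitions to improve the policy.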
Reinforcement learning is based on the reward hypothesis: the goal of the agent is to maximize the expected cumulative reward. We will deploy reinforcement learning on a vehicle moving in the synchronized environment of our indoor positioning system, which is built with ultra-wideband (UWB) transceivers.
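A common way to make "cumulative reward" concrete is the discounted return, G = R1 + γ·R2 + γ²·R3 + ..., where the discount factor γ weights near-term rewards more heavily. The discount factor and the function name `discounted_return` below are assumptions introduced for illustration; the proposal itself does not specify how rewards are accumulated.

```python
def discounted_return(rewards, gamma=0.9):
    """Return R1 + gamma*R2 + gamma^2*R3 + ... for a list of rewards.

    gamma is an assumed discount factor in [0, 1]; gamma=1.0 gives the plain sum.
    """
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma=1.0` the function reduces to an undiscounted sum, which is adequate for short episodes that always terminate.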
Three stationary ultra-wideband (UWB) transmitters are mounted on the walls of the room and act as anchors. The anchors emit UWB pulses, and because the receiver can read only a single pulse at a time, each pulse is allotted a fixed time interval using time-division multiple access (TDMA). The autonomous vehicle roams the indoor environment and carries a UWB receiver. The receiver measures the time delay of each of the three pulses and, knowing the positions of the anchors, estimates its own position using the trilateration principle. From trilateration we obtain the xy coordinates and direction, and we send this data to the PC to train the reinforcement learning model.
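The trilateration step can be sketched as follows. The sketch assumes that each measured time delay has already been converted into a distance (time of flight multiplied by the speed of light, which in turn assumes synchronized clocks) and that the measurements are noise-free. Subtracting the first circle equation from the other two leaves a 2×2 linear system in x and y. The function names and anchor coordinates are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def delay_to_distance(time_of_flight_s):
    """Convert a one-way time of flight in seconds to a distance in metres."""
    return SPEED_OF_LIGHT * time_of_flight_s


def trilaterate(anchors, distances):
    """Estimate (x, y) from three anchor positions and three distances.

    anchors:   [(x1, y1), (x2, y2), (x3, y3)] in metres (assumed known)
    distances: [d1, d2, d3] in metres from the receiver to each anchor
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances

    # Linear system obtained by subtracting circle 1 from circles 2 and 3.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")

    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With real, noisy measurements the three circles rarely meet at a single point, so a least-squares fit or a filter would normally replace this exact solve.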
[1] Google DeepMind research paper.
The objectives of our project include:
The aim is to apply reinforcement learning to a vehicle moving in a synchronized environment. The synchronized environment will be created with three stationary ultra-wideband (UWB) transmitters mounted on the walls of the room, which act as anchors and emit UWB pulses. The autonomous vehicle will roam the indoor environment and carry a UWB receiver. The receiver will measure the time delay of each of the three pulses and, knowing the positions of the anchors, estimate its own position using the trilateration principle. From trilateration we will obtain the xy coordinates and direction, which are then sent to the PC. The reinforcement learning model will be trained on the data provided by the vehicle, and the result will be the shortest possible path to the destination.

Block Diagram of Project
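To illustrate how a learned policy can yield a shortest path, the sketch below uses tabular Q-learning on a small grid. This is a deliberately simpler stand-in for the neural network model mentioned in this proposal, not the project's method. The grid size, the reward of -1 per step, the hyperparameters and all function names are assumptions made for the example; in the real system the grid cell would come from the UWB position estimate.

```python
import random

# Moves as (d_row, d_col): right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def move(state, action, size, blocked):
    """Apply a move; the vehicle stays put if it would leave the grid or hit an obstacle."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < size and 0 <= col < size) or (row, col) in blocked:
        return state
    return (row, col)


def train_q_table(size, start, goal, blocked=frozenset(), episodes=2000,
                  alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q(state, action) with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    n = len(ACTIONS)
    q = {((r, c), a): 0.0 for r in range(size) for c in range(size) for a in range(n)}
    for _ in range(episodes):
        state = start
        for _ in range(4 * size * size):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(n)  # explore
            else:
                a = max(range(n), key=lambda i: q[(state, i)])  # exploit
            nxt = move(state, ACTIONS[a], size, blocked)
            reward = -1.0  # every step costs 1, so shorter paths score higher
            best_next = 0.0 if nxt == goal else max(q[(nxt, i)] for i in range(n))
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if state == goal:
                break
    return q


def greedy_path(q, size, start, goal, blocked=frozenset()):
    """Follow the highest-valued action from start until the goal or a step limit."""
    path, state = [start], start
    for _ in range(size * size):
        if state == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state = move(state, ACTIONS[a], size, blocked)
        path.append(state)
    return path
```

A typical call would be `q = train_q_table(4, (0, 0), (3, 3), blocked=frozenset({(1, 1)}))` followed by `greedy_path(q, 4, (0, 0), (3, 3), blocked=frozenset({(1, 1)}))`, which returns the list of grid cells from start to goal.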
Accurate self-localization and path planning through interaction with the environment will benefit self-driving vehicles in indoor environments. With some modifications, the project can also benefit society in several aspects of daily life, including customer service in malls, autonomous wheelchairs, virtual reality games, asset tracking and rescue operations.
The final deliverable of the project will be an integrated software and hardware system.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Arduino Mega | Equipment | 1 | 1000 | 1000 |
| Arduino pro mini | Equipment | 4 | 500 | 2000 |
| vehicle case | Equipment | 1 | 5000 | 5000 |
| Magnetometer | Equipment | 2 | 500 | 1000 |
| UWB1000 | Equipment | 4 | 8000 | 32000 |
| Battery | Equipment | 1 | 5000 | 5000 |
| Battery charger | Equipment | 1 | 3000 | 3000 |
| fdi chip for programming | Equipment | 4 | 600 | 2400 |
| Miscellaneous | Miscellaneous | 1 | 10000 | 10000 |
| Total (in Rs) | | | | 61400 |