Post on 11-Jan-2022
Autonomous Aerial Search and Mapping of Unknown Built
Environments
by
C. P. Charland
B.S., University of Colorado Boulder, 2020
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Masters of Science
Department of Aerospace Engineering Sciences
2020
Committee Members:
Prof. Nisar Ahmed
Prof. Morteza Lahijanian
Prof. Zachary Sunberg
ii
Charland, C. P. (M.S., Aerospace Engineering Sciences)
Autonomous Aerial Search and Mapping of Unknown Built Environments
Thesis directed by Prof. Nisar Ahmed
In this thesis, an overview of real time decision making processes and observation
frameworks is given. These processes are then used to investigate the simultaneous search
and mapping problem. This problem describes an agent in an unknown indoor environment,
attempting to determine both a map of the environment as well as the pose of targets in this
environment. This work formulates the problem as a partially observable Markov decision
process (POMDP), using a Monte-Carlo tree search based method to solve the POMDP.
To improve the efficiency of the planner, wall projection, A* augmentation, and progressive
widening are employed. Simulation results in both a simple two dimensional world and a
three dimensional realistic environment show this framework is successful at planning in
the problem space. This augmented planner captures the target 9.2% faster than the naıve
planner when operating in a representative environment.
iii
Acknowledgements
I would like to thank Prof. Nisar Ahmed for his understanding, compassion, and
guidance throughout the creation of this document. As well as, the other members of the
committee for their guidance on this project.
iv
Contents
Chapter
1 Introduction 1
1.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Thesis Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Background 4
2.1 Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Map Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Problem Formulation and Solution Approach for 2D Environments 7
3.1 Problem Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1.1 Quadrotor Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Formal Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Approximations to the POMDP . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.1 Monte Carlo Tree Search . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.5 Naıve Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.6 Double Progressive Widening . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6.1 Determining Tuning Parameters . . . . . . . . . . . . . . . . . . . . . 26
3.6.2 POMCP-DPW Results . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.7 Wall Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
v
3.7.1 The Hough Transform . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.8 A* Rollout Biasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.8.1 Blob Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.9 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4 Extending to Three Dimensions 38
4.1 Mapping in 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5 Conclusions and Future Work 44
5.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Bibliography 46
vi
Tables
Table
2.1 Realsense D435 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.1 Baseline Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Tested Parameter Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.6 Results of Fully augmented planner vs Naıve Planner . . . . . . . . . . . . . 37
4.1 Results of Fully augmented planner vs Naıve Planner in 3D Case . . . . . . . 43
vii
Figures
Figure
2.1 Occupancy Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1 Testbed Map approximately 50 x 50 meters . . . . . . . . . . . . . . . . . . 8
3.2 Map with Target Shown in Red . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Nonlinear quadrotor dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Resulting positions of a ‘forward’ command . . . . . . . . . . . . . . . . . . 13
3.5 Translate Forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.6 Rotate Right . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.7 Observation Reward vs. Number of Observations for a Single Cell . . . . . . 16
3.8 Simulated Observation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.9 POMCP Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.10 Malformed POMCP Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.11 Typical run of Naive Planner . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.12 Parameter Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.13 Hough Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.14 Result of Hough Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.15 Wall Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.16 Wall Biasing for 2D Indoor Environment. . . . . . . . . . . . . . . . . . . . . 32
3.17 Run of the planner without A* augmentation . . . . . . . . . . . . . . . . . 35
viii
3.18 With A* augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.19 Total map entropy over 30 runs with finite time . . . . . . . . . . . . . . . . 36
3.20 Mean Reward Gain Over Time . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1 Octomap Volume Breakdown . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Hector Robot in the Simulated Space . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Top Down View of Willow Garage . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Target Sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 Observed occupied Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.6 3D Planner Reward Accumulation . . . . . . . . . . . . . . . . . . . . . . . . 42
Chapter 1
Introduction
Search and rescue operations often involve entering structures that are not safe, and
in these situations a robotic agent capable of entering the structure, searching for survivors.
A highly maneuverable robot, such as a quadrotor could traverse spaces that a ground rover
could not, while also having a higher vantage point to better view the scene. In the battlefield,
such an agent could enter a space and determine the location of enemy combatants and relay
this information to the war-fighter on the ground prior to entering a dangerous building. In
all these scenarios, having an accurate map of the space prior to entry allows responders to
more safely navigate the structure, potentially keeping them out of harms way.
Commonly robotics problems focus on either the search problem or the mapping
problem. The typical mapping approaches focus on building the best map of the environ-
ment possible, whereas typical search problems work with a known map.. However, planning
frameworks often aren’t built to both search and map at the same time. This thesis inves-
tigates searching and mapping through an unknown environment, as well as methods that
allow a planner in this space to use domain knowledge to plan more efficiently in an online
manner with real sensor data.
1.1 Previous Work
The previous work in this domain can be broken into three main groups: works that
focus on building a map through an unknown environment, work that focuses on search-
2
ing through a known map for a target, and general planning algorithms. Beginning with
mapping, one of the earliest methods to map an unknown environment was investigated by
Yamauchi [19], here the author describes a method of traversing a map towards the bound-
aries between unexplored and explored space. Yamauchi was one of the first papers to discuss
methods of map exploration that is not based in wall following with a simple topological
planner. More modern approaches for map building and navigation through an unknown
space like the methods proposed by Richter and Roy [13] attempt to traverse an unknown
map quickly by predicting collisions based on training data. However, this method does not
allow the agent to take actions to reduce the uncertainty of the states.
In the mapping domain, extensive work has been done, namely in the area of repre-
senting the physical worlds as a discretized set of objects. The major introduction into the
mapping world occurred in the mid 1980s, with Elfes’ sonar based mapping in unknown
environments [5]. This work describes a method of representing the world as a set of “pixel
like” volumes, from which the map is then incrementally built up as the robot receives more
measurements about the environment. The robotics community then shifted towards simul-
taneous localization and mapping problems as a heavy focus, with numerous works published
on the topic, since the problem was first coined by Leonard and Durrant-White in 1991 [8].
From there many works explored mapping as it pertained to the SLAM problem. Moving
into mapping in three dimensions is important as the aerial robots such as quadcopters and
multirotors interact with the environment in three dimensions. Extensions of the ideas of
two dimensional modeling is shown in Hornung’s Octomap 3D mapping framework [7]. This
work describes a method of occupied cubic volumes, whose three dimensional components
of this can be scaled to nearly any resolution dynamically as needed during run-time. The
Octomap framework is used extensively in this thesis for the 3D based simulations of the
quadcopter environment. Leveraging domain knowledge for map building is important in
this work as well. Luperto [9] discusses a method of extracting building layouts from grid
based maps. This method takes advantage of the regularity of the building geometry to build
3
a polygon based map of the environment. This thesis looks at a different novel strategy for
building ”map priors” online for dynamic mapping and motion planning in unknown indoor
environments.
Decision making frameworks are pervasive throughout the literature, and there are
many different methods for solving decision making problems. In this work I will focus
on methods to solve partially observable Markov decision making processes (POMDPs).
Silver et al [15] discuss a Monte Carlo based tree search method online known as partially
observable Monte-Carlo planning (POMCP). Monte Carlo tree search methods are surveyed
at length by Browne [3] mainly focusing on methods of increasing computational efficiency of
tree generation. POMCP is modified in Sunberg [17] to account for the numerical difficulties
in generating plans in large state in observation space problems such as the one investigated
in this work. This thesis will examine techniques for translating techniques like POMCP
into practical online planners for autonomous robots working with realistic sensor data in
unknown environments.
1.2 Thesis Contributions
In this thesis, a simultaneous search and mapping problem is formalized as a POMDP.
This framing of the simultaneous search and mapping problem is the major contributions
of this thesis. As a second thesis contribution, a planning framework will be built to solve
this POMDP online, with real robotic sensing and perception data to support mapping and
target search. Finally, as a third thesis contribution, the performance of this online POMDP
planning framework will be characterized using realistic 2D and 3D environment single target
search and mapping simulations for an autonomous multirotor platform. Several key aug-
mentations to this planner will be discussed as they are needed to improve performance of
the planner online and incorporate domain knowledge. A simulation environment in Gazebo
representing a realistic scenario with realistic sensors was developed the proposed planning
and sensing in this framework.
Chapter 2
Background
2.1 Sensing
The way in which the robot receives information about the environment is a critical
component to the system design. For this problem the main sensing modality is a structured
light depth camera along with a standard RGB camera based on the Intel Realsense D435.
This camera system uses a structured light projector to generate a depth image in infrared.
This image is then registered to the pose of the robot, and the map is updated with the
new depth data. To find the target, an object classifier was run on the RGB image output
of the camera. This provides a detection or non detection of the object in the frame. This
observation gives no information about the location of the object in the frame. Thus the
pose of the object is not directly observable by the robot.
In order to simulate observations in the future given a sampled map and target position,
an observation model of the sensor p(zt|xt) where zt, xt are the observation, and the total
state respectively. To simulate this measurement, the methods described by Mallick [10] are
used here. The major components of this sensor are listed below in Table 2.1.
Parameter ValueField of View 87o (Horizontal)x 58o (Vertical) x 95o (Diag.)
Resolution 848 x 480Frame Rate 90 fps
Table 2.1: Realsense D435 Parameters
5
The variance in a depth measurement can be fit by the following equation described
by Sung Ahn [1],
σz(z, θy) = 0.001063 + 0.0007278z + 0.003949z2 + 0.022z3/2θy
(π/2− θy)2(2.1)
Where θy and z are respectively the distance to the obstacle, and the angle of the ray from the
camera to the sensed object. The mean of the error is 0, and can be modeled as a Gaussian
random variable N (0, σz). This error is then fed into the occupancy grid implementation
discussed in the next section.
Object (i.e. target of interest) detection is not the main focus of this work. However, the
object detection in this work was completed using the OpenCV object detection framework
You Only Look Once (YOLO) [12] trained on the COCO dataset. The detection rate of
this object detector for the ball was set to 65.5% as described in the 2010 paper. the object
detector used in this work (where the object of interest was taken to be a static red ball
of diameter 0.5 meters) The actual rate of detection on image frames fully containing the
object was observed to be empirically higher than this threshold.
2.2 Map Representations
Occupancy grids are probabilistic discrete approximations of a continuous partially
occluded space, where the map is broken into grid cells of finite size.
Figure 2.1: Occupancy Grid
6
Above Fig. 2.1 shows a typical occupancy grid, where the grey cells have not yet been
observed, the white cells are probably unoccupied, and the black cells are probably occupied.
The vision cone shown in the picture (triangle) depicts the robot sensor’s limited field of
view. The belief of the occupancy of any given cell is represented by a binomial distribution.
One of the major assumptions of the occupancy grid representation is independence of the
cells, that is the probability of occupancy for each cell is independent of the occupancy state
for surrounding cells. This assumption and its limitations will be further examined in this
thesis.
The log odds representation of occupancy over the map can be described as shown in
Eq. 2.2, where mi is the grid cell at index i for cell location mi ∈ [0, 1]. Using the log odd
representation of occupancy prevents numerical instability near zero and one.
lt,i = logp(mi|z1:t, x1:t)
1− p(mi|z1:t, x1:t)(2.2)
Algorithm 1 occupancy grid mapping(lt−1,i, xt, zt)
lt−1,i = occupancy probability of cell ixt = robot posezt = measurementfor all cells mi do
if mi is in field of zt thenlt,i = lt−1,i + inverse model(mi, xt, zt)-l0
elselt,i = lt−1,i
end ifend for
As described in Thrun [18] Algorithm 1 updates the occupancy probability of the
grid cell i based on the observed measurement zt written in log odds form. l0 is the prior
probability of occupancy represented in the log odds form. The inverse model mentioned
in Alg. 1 is derived from Eq. 2.2. Note that in the simultaneous search and mapping
problem, it is assumed that the lower/upper bounds for all dimensions of the indoor space
to be traversed are known in advance.
Chapter 3
Problem Formulation and Solution Approach for 2D Environments
3.1 Problem Introduction
Real time planning for the simultaneous mapping and search problem was first in-
vestigated in a two dimensional grid world. The algorithms described in this chapter were
implemented in MATLAB to prototype before being ported to a C++ implementation. This
prototype was used to quickly iterate over the algorithm design and allowed rapid modifi-
cation of the project during development. In chapter 4, extensions of these techniques to a
realistic simulated 3D search space are presented.
The problem was first simplified to a two dimensional grid world example. The problem
domain was reduced to a discrete domain with a point robot. The robot is able to rotate
in 90o increments, and move forward in increments of one or two cell widths. The robot
probabilistically observes the occupancy of the cells in a cone in front of it, as discussed
in Section 2. The area to be explored by the robot is defined by a binary ”ground truth”
occupancy grid, shown in Fig. 3.1 referred to as the ‘map’.
8
Figure 3.1: Testbed Map approximately 50 x 50 meters
The ground truth map contained multiple walls shown in the black cells with the
unoccupied regions shown as the white cells. The map measured 100x100 cells, with each
cell measuring 0.5 meters by 0.5 meters. But, the resolution of the cells in the grid world is
arbitrary and can be scaled to match the scale of the quadrotor. This map was used due to
its long continuous walls, and the multiple rooms that are representative of an office space.
Inside of this space, a single target is introduced. The location of this target is shown
in red on Fig. 3.2 but is unknown to the robot at the beginning of the search mission (the
robot has some prior distribution over the target’s location, given that it knows the extent
of the indoor map boundaries a priori). The target can be anything that the detector is able
to detect reliably; in this case, it is assumed that the target is a red ball.
9
Figure 3.2: Map with Target Shown in Red
3.1.1 Quadrotor Dynamics
In order to capture the configuration of the robot properly, a state vector x was defined
to represent the current position and orientation of the robot inside of the map.
The total quadrotor state vector is described in Eq. 3.1, these terms encapsulate the
current state of the quadrotor. In square brackets ‘[]’ the frame of reference that the term is
defined in is described. This frame of reference is either ‘Earth’ frame meaning fixed to the
world that the robot is operating in described in a North-East-Down reference system, or
in ‘Body’ frame which is described in reference to the quadrotor frame (this reference frame
moves with the quadrotor through the space).
10
x =
φ- roll angle [Earth]
θ - pitch angle [Earth]
ψ - yaw angle [Earth]
p - roll rate [Body]
q - pitch rate [Body]
r - yaw rate [Body]
u - linear velocity [Body]
v - linear velocity [Body]
w - linear velocity [Body]
x - Position [Earth]
y - Position [Earth]
z - Position [Earth]
(3.1)
u = [ft τx τy τz] (3.2)
The non-linear dynamics of the quadrotor are described in [14]. This set of equations
is shown in Fig. 3.3. These show the relationship between the dynamics of the vehicle, and
the physical characteristics, inertia Ix,y,z, and vehicle mass m. These equations allow the
progression of the states through time, and can be approximated to generate trajectories.
For small perturbations we can set [φ θ ψ]T = [p q r]T , which holds for only small
angles of displacement [4].
The quadrotor dynamics were thus simplified to linear dynamics, such that Eq. 3.2 is
linearized about a stable hover to produce the set of dynamics shown in Eq. 3.3 (where u
is the set of nominal control inputs needed to produce a stable hover, with input elements
corresponding to the torques in x, y, z as well as the total force. The set of hover states is
11
Figure 3.3: Nonlinear quadrotor dynamics
12
described in Eq. 3.4
x =
φ = p
θ = q
ψ = r
p = 0
q = 0
r = 0
u = −gθ
v = gφ
w = 0
x = u
y = v
z = w
(3.3)
x = [0 0 0 0 0 0 0 0 0 x y z] (3.4)
u = [mg 0 0 0] (3.5)
The control vector described in Eq. 3.5 shows the force of gravity acting on the robot
must be counteracted in the Z direction. In a stable hover this force, mass times gravity is the
only constant force acting on the robot. In lateral directions, the robot requires no constant
input to maintain a simple hover. hence zeros for these torque terms respectively. As the
robot drifts from this constant hover feedback will be employed to return the quadrotor to
its initial state of a steady hover.
The linearized dynamics were used to compute the transition probabilities of the quad
from one occupancy map grid cell to another. This is then used to form a probabilistic state
transition matrix during the tree generation phase of the online planner to be described
13
later. This approximation to the stochastic dynamics is required due to the computational
complexity of simulating true control inputs on the fly. Rather, the results of many simu-
lations performed offline is sufficient to allow the planner to much more quickly sample the
output of the desired control action (which again in this case are limited to 90 degree turns
and forward translation maneuvers for the point robot).
Additive white Gaussian noise is introduced into the system at the level of accelerations
u, v. This model of process noise is used as an analogue to system disturbances that is not
able to be properly modeled by the first principles equations of the robot’s dynamics alone,
i.e. things such as wind, (e.g. from open doors/windows, hallway drafts, ventilation, or
ground/side wash effects, etc.) control and estimation errors all are captured by this noise
term. The simulated process noise covariance was modeled in discrete time with intensity
matrix Q defined in Eq. 3.6.
Q = diag
∣∣∣∣0 0 0 0 0 0 0.18 0.18 0 0 0 0
∣∣∣∣ [m/sec2] (3.6)
Figure 3.4: Resulting positions of a ‘forward’ command
Fig. 3.4 shows the controller for a ‘forward’ motion shown using the full quadrotor
dynamics. This is the model that is used in the forward planning simulation portion of
this work. There is a lot of uncertainty involved with this motion in this case, there is a
14
96% chance of ending up in the desired grid cell. The figure shows each run of the low
level controller as a green plus, where the low-level controller in this case represents an
implementation of the controller discussed in [2], this controller is the same one that is
implemented in the PX4 open source hardware stack. All runs of this particular controller
started at the black dot. In the true planner this is not the case, but is shown here for
illustrative purposes.
After a ‘forward’ command, the robot returns to a stable hover in the desired cell. Then
the whole process is repeated in the next cycle of the planner. Rotations are completed in
nearly the same process as the transnational motion. The same low level controller is used,
in the rotational case, the injected noise does not directly cause a rotation, therefore the
resulting rotational command ends within 0.5o of the commanded angle in every case.
3.2 Formal Problem Definition
Before describing the planning methods, the problem needs to be formally defined.
First, we recap the objective of the simultaneous mapping and search problem. We have
a robot moving in unknown indoor environment that is looking for a single static object.
The robot’s objectives are to detect and locate the the target as quickly as possible (due to
limited flight time) while also mapping out as much of the environment as possible, taking
care to avoid collisions with obstacles. The robot is assumed to know its pose at all times
(i.e. perfect onboard navigation and localization capability), but can have its movements
disrupted by stochastic disturbances in the environment. The ground truth map is assumed
to stay constant, but is unknown a priori at start of the search problem. The static target’s
location is unknown but can be described by a prior probability distribution (e.g. uniform
over all grid cell locations) which can be updated as the robot gathers new information from
its sensors.
Defining this problem as a POMDP begins by defining the tuple (S,A, T ,R,O,Z, γ),
beginning with the state space S = (m1:n, x, τ). Where m1:n are the binary occupancy states
15
of the map cells, x is the pose of the robot defined as the position in x and y as well as the
yaw angle θ, and τ is the unknown position of the target in x and y (so for the ground truth
map defined in Figure 3.1, there are a total of over 10000 states). A is the set of actions that
the robot can take, where in this problem the set of actions is move forward a meter, move
forward a half meter, turn left 90 degrees, and turn right 90 degrees. While a minimal action
set, if the problem needs to be redefined to have more actions, augmenting the action set
is simple and straightforward. This minimal action set simplifies the computation allowing
the Monte Carlo tree (to be presented later for online planning) to be built faster than for
larger action sets. Figs. 3.5 & 3.6 depict two of the actions available to the agent.
Figure 3.5: Translate Forward Figure 3.6: Rotate Right
T is the probabilistic state transition model p(s’|s,a), taking a state s and action a
to probabilistically select a new state s’. R is the reward function, which this problem is
defined in Eq. 3.7 where robs is the reward gained by observing map grid cells, and rtarget
is the reward for observing the target in the workspace. robs is calculated by determining
the number of times that a particular map cell has been observed (nobs,i) and the reward
is proportional to the inverse of that number plus one. This was chosen to incentivize the
planner to observe a cell multiple times, but as the cell is observed more than a few times
16
the reward quickly goes to near zero as shown in Fig. 3.7.
r(s, zt) = robs + rtarget (3.7)
robs =n∑i=0
1
1 + nobs,i(3.8)
Figure 3.7: Observation Reward vs. Number of Observations for a Single Cell
The observation space O of this problem is the major sticking point as the observation
consists of two components, a detection or no detection of the target in the image frame, and
a set of observed cells from the processed range data ∈ [0, 1]n where n is dependent on the
number of cells in view from the sensor, 0 is observed unoccupied, 1 is observed occupied.
Only cells that are in the view of the sensor are updated by the mapping program. The
sensor observes depth data within 0.3 and 5 meters. A representative observation is shown
in Fig. 3.8 where the teal represents observed cells that are empty, yellow shows occupied
17
cells and dark blue shows unobserved cells. Note that the occupied cells occlude areas behind
them relative to the robot shown in green. With each cell being a state in the 3D case the
robot has an extremely large observation space. Even in the small 2D case shown below
the robot can observe hundreds of cells, each one being a state means the total number of
possible observations is 2n+1 where n is the number of cells observed, and the additional one
is a detect no detect of the target. Observing 100 cells gives a total of 2.5x1030 possible
unique observations.
Figure 3.8: Simulated Observation
p(τocc|o) =p(o|τocc)p(τocc)∑
i p(o|τi)p(τi)(3.9)
Vπ = E[∞∑t=0
γtrt] (3.10)
Z for the POMDP defines the observation model p(o|s, a, s′) based on the equation
defined Eq. 2.1 and the detection rate of the YOLO object detector. The γ term of the
POMDP is the future reward discount factor, in the range [0,1] that reduces the value
of rewards in the future. This tuning knob determines how important future reward is
compared to near reward. By maximising the discounted future reward function a policy π
18
can be generated that maps a belief state to an action. Where the discounted future reward
Eq. 3.10 is the total expected reward from the current time to infinity, where future rewards
values are reduced by the discount factor γ, increasing the importance of rewards gained
earlier in time.
The target belief is updated through a Bayesian belief update for τ given a detection
or non detection of the map grid cells in view of the detector, Eq. 3.9, describes the update
over the visible grid cells i for the pose of the target.
For the simultaneous search and mapping problem, the observation space is so large
that computing the analytical solution to the problem is computationally intractable. Sim-
ulating a new observation is almost always unique, i.e. the probability of sampling a large
space and returning the same value twice is astronomically small for large discrete sets, and
zero in the case of continuous distributions.
3.3 Approximations to the POMDP
Computing a solution to the POMDP on a problem with a large state and large ob-
servation space increases the computation required to find a solution, beyond anything that
is possible in real time. In order to find a solution to the problem in reasonable time, this
thesis investigates a combination of tree-based methods with augmentation that allows the
search space to be significantly reduced. The widely used tree-based method of solving a
POMDP, known as POMCP as described in [15]. This is a Monte Carlo based tree search
method using an unweighted particle filter to represent the belief states in the tree of future
histories a series of possible state and action pairs up to a finite horizon. This tree generates
an explicit approximation of the value function using a particle set at the starting state,
rather than solving for the entire policy over the full belief space, as this is computationally
intractable even offline. POMCP has been used widely in the community for online solvers
to POMDPs, but is challenged by continuous or very large observation spaces.
Monte Carlo based planning uses a simulator as a generative model G of the POMDP.
19
This simulator generates a successor state, observation, and reward for a given state and
action pair. This simulator is used to generate sequences of state, observation, and rewards.
These sequences known as future histories are used to update the value function without
requiring the true dynamics of the underlying system.
The challenge posed by the continuous observation space is that the probability of
sampling the same observation twice from a continuous random variable is 0. Therefore
after the first step the Monte Carlo search tree explodes, and no meaningful conclusion can
be drawn over any planning horizon greater than a single observation step in the future.
This phenomenon can be seen in Fig. 3.10: as the planner generates new future histories,
none of the observations are the same; once the planner takes an action the observation will
not match any of the generated future histories. A properly formed tree of future histories is
shown in Fig. 3.9. Starting from a prior belief b the future histories are generated through
simulated action observation pairs to new beliefs bi
Figure 3.9: POMCP Tree
20
Figure 3.10: Malformed POMCP Tree
3.3.1 Monte Carlo Tree Search
Monte Carlo Tree Search uses a Monte Carlo simulation to evaluate the nodes of the
search tree. In this tree there is a single node for a given history h, containing the value
Q(h), as well as a visitation count of that node N(s). The value of a history V(h) is the
estimated mean return of all simulations that begin at h. Each simulation begins at an
initial state sampled from the belief B(ht). On initialization, the values of Q and N are set
to zero. There are two stages of simulation that are preformed. The first is when there are
children nodes for all children, in this configuration, actions are selected by UCB described
in Eq. 3.11. Actions are selected to maximize this Upper Confidence Bound
UCB(s, a) = Q(s, a) + c
√logN(s)
N(s, a)(3.11)
Where N(s) =∑
aN(s, a) and c is a parameter specific to the problem. The second
stage of simulation actions are selected by a rollout policy πrollout(h, a) such as random
uniform action selection. After a simulation, one node is added to the tree corresponding to
the first new history encountered by the simulation. When the simulation is complete, the
agent selects the action at that has the greatest value. Then receives a real observation ot.
At this point the node corresponding to the history h,at,ot becomes the new root of the tree
and the remainder is pruned.
21
3.4 Algorithms
This section will describe the anatomy of the planner, and the associated algorithms.
The following is the complete algorithms, which will be incrementally discussed through the
following sections piece by piece.
Algorithm 2 Plan(b)
L ← HoughTransform(b)for lines ∈ L do
bias b over Lend forwhile planning time do
s ← sample from bSIMULATE(s, b, dmax)if out of planning time then
return arg max Q(ba)end if
end while
Defining terms used throughout the algorithms above: the starting point is the ‘Plan’
subroutine. First given the current map, representative lines are generated through the
Hough transform and those lines are used to project into the map. A history h is generated
consisting of an observation o, and an action a repeating through (b, a1, o1, ... ak, ok). The
general flow is then this: a tree of future histories is generated until the process is out of
compute time; when new branches of the tree are generated, the Rollout policy periodically
selects the centroid of unexplored space as the target of an A* search. When reward is far
away, this allows the robot to get unstuck. This procedure allows the discovering new areas
when the robot has fully explored a sub area, but not the entire map. The Rollout strategy
described here is a significant factor which contributes to the novelty and success of this
work.
For completeness, h, ha, and hao are different stages of the tree generation, h represents
a history (b,a1, o1,...), where b is the initial belief, ha is the tree with an action selected, hao
is the tree with an action and an observation selected. d is the planning depth, i.e. how
22Algorithm 3 SIMULATE(s,h,d)
if d = 0 thenreturn 0
end ifa ← ActionProgWiden(h)if |C(ha)| ≤ k0N(ha)α0 then
s’,o,r ← G(s,a)C(ha) ← C(ha) ∪ oM(hao) ← M(hao) + 1append s’ to B(hao)if M(hao) = 1 then
total ← r + γRollout(s’,hao,d-1)else
total ← r + γSimulate(s’,hao,d-1)end if
elseo ← select o ∈ C(ha) w.p. M(hao)∑
oM(hao)
s’ ← select s’ ∈ B(hao) w.p. 1|B(hao)|
r ← R(s,a,s’)total ← r + γSimulate(s’,hao,d-1)
end ifN(h) ← N(h) + 1N(ha) ← N(ha) + 1
Q(ha) ← Q(ha) + total−Q(ha)N(ha)
return total
many levels down the tree should be expanded. C is the list of children of a node. N counts
the number of times that particular node has been visited. M counts the number of times a
history has been generated. The list of states associated with a node is denoted B. Finally
Q is the estimated value function at that node. All of these are initialized to 0 or ∅. s
is the state of the system, r is the reward function evaluated at a particular state. The
simulate dynamics function G is a black box simulator, that takes the state and the action
and forward simulates the system in time. This uses the linearized dynamics with controller
described earlier in Chapter 2.
In next sections, various implementations of the POMCP online planner are described,
building up to the most sophisticated implementation that constitutes the novel contribution
23Algorithm 4 ROLLOUT(s,h,d)
n ← random ∈ [0,1]if n ≤ bias rate then
goal ← centroid of unexplored regions’, h, a ← A*(s, goal,h)
elsea ← select from action sets’, o ← G(a,s)
end ifr ← R(s,a,s’)total ← r + γ Rollout(s’,hao,d-1)return total
of this thesis. This implementation uses double progressive widening as well as a novel
technique for generating and incorporating wall priors to account for the issue of continuous
measurements described earlier.
3.5 Naıve Planning
In the fully naıve case, the planner is run without double progressive widening, wall
continuity, and without A* augmentations. This case represents the bare bones planning
scenario. This simple case uses the POMCP framework to generate approximations to the
policy π. In the fully naive case the planner is prone to becoming stuck in small rooms and
other confined spaces. This can be seen in Fig. 3.11, here the robot trajectory shown in red
displays that the robot is unable to find a path out of the small room that it began in, or is
otherwise trapped in tight spaces.
24
Figure 3.11: Typical run of Naive Planner
Median Time to Capture (TTC) 451 secondsCapture Rate 13.3%
Mean Reward per Second 3.4 reward/secMean Total Reward at 500 seconds 1727
Total Reward Variance 63.2
Table 3.1: Baseline Results
The simulation with the naive planner was run 30 times for 500 seconds each run.
The exploration parameter c was set to 80, the tree exploration depth was set to 10, and
the particle set was 300 particles. The 500 second time limit is enforced by the battery
constraints of a typical quadrotor platform. As described in Table 3.6 the fully naive planner
does not capture the target in more than 13% of the run cases. For a target search planner
this rate of capture is not acceptable. The median time to capture for cases where the
25
target was successfully captured also supports this fact. In the cases where the planner did
successfully capture the target, it did so at nearly the end of the full trial run. This is a
major contributing factor to the terrible capture rate. With a mean reward per second of 3.4
the agent is still uncovering new cells and exploring the space. However as the observation
space is so large, the planner is essentially operating as a greedy planner, with a horizon of
one. This prevents the planner from taking information further out from the current pose
of the robot into account. The remedies for this are discussed in the following section.
3.6 Double Progressive Widening
For online path generation, a tree-based approach of approximating the value function
of the POMDP was used. This method incrementally builds a tree of future history based
on the current belief. It only requires a black box simulation of the problem domain, making
it perfect for this problem, as forward simulation of this problem is simple. During the
planning phase of the problem, future histories of the tree are built, then the robot takes the
action that generates the maximal reward. Due to the extreme size of the observation space,
Double progressive widening as described by Sunberg [17] was implemented. This reduces
the branching of the tree significantly allowing the POMDP to be approximated by removing
unlikely observations, and sending more particles down the tree on more likely observations
paths. Fig. 3.9 describes a tree of future histories that is properly formed; this tree has
particles that are evenly distributed by the planner. The branches of the tree that are more
common have more particles than those that are unlikely. Without progressive widening, the
planner would never sample the same observation twice even from the same state. This leads
to a tree that looks like the tree described in fig 3.10, where there are a large number of child
nodes from a single node. Progressive widening artificially limits the number of children
nodes that can be generated to kNα, where k, α are tuning parameters determined by the
problem itself. When the limit of node children is reached, the planner selects a child node
that has already been simulated with probability M(hao)∑oM(hao)
, where M(hao) is the number of
26
times a history has been generated by the model. This selects for observations that are more
common, and increases the number of particles sent through nodes, thereby increasing the
particles on more plausible tracks.
The major procedures for DPW as described in [17] allow the tree expansion to be
reduced, allowing the tree to be formed more properly. Without DPW, the large observation
space will prevent planning beyond a small time horizon, as the probability of receiving a
observation that was simulated is essentially zero. This means the generate tree of future
state histories and observations degenerates after receiving the first actual set of observations
from the robot’s sensors.
3.6.1 Determining Tuning Parameters
As the planner becomes more and more complex, the number of tuning parameters
also increases. For simple POMCP, the major parameter c determines the exploration of the
tree: a larger c increases the likelihood that the planner takes paths that it has not before.
To determine the values of the tuning parameters, the goal is to maximize some statistic over
the planner. In the case of this problem, the total reward was maximized with respect to the
tuning parameters. The planner was run 30 times starting in a location selected at random
over the map, this location was selected randomly to prevent the planner from fitting the
parameters to the specific starting location.
Parameter Values
α 4 5 6 7
K 1/8 1/5 1/2 1
Table 3.2: Tested Parameter Values
27
Figure 3.12: Parameter Tuning
Table 3.2 shows the parameter values used to tune the planner. These parameters are
not the only ones that were tested, but were the ones that were most productive. Higher
and lower values of α and K did not produce better results for this planner. Fig. 3.12
shows the results of the parameter tuning. There is not a large difference between all of
the parameters, but the combination of α0 = 1/5 and K = 5 produced the largest total
reward. This result is an improvement over the fully naive case. This planner still relies on
the agent generating observations that are coherent, and also assumes that all adjacent cells
are completely independent of one another. The next section details how domain knowledge
can be leveraged to improve the observations that are sampled by POMCP.
3.6.2 POMCP-DPW Results
Statistic Baseline Planner POMCP-DPWMedian Time to Capture (TTC) 451 seconds 450 seconds
Capture Rate 13.3% 17.3%Mean Reward per Second 3.4 reward/sec 3.49 reward/sec
Mean Total Reward at 500 seconds 1727 1896.5Total Reward Variance 63.2 62.4
Table 3.3: Results
28
Using Double progressive widening allows the planner to preform better than the base-
line planner. For statistical testing, the null hypothesis is stated to be the planner with
progressive widening is not better than the POMCP planner with respect to total reward
gained over the trial run. With the given n of 30 and the p for the null hypothesis of 0.133,
a Z-test is not appropriate as np ≤ 10, so a standard binomial test is used for the capture
probability. With a significance level of 10% the null hypothesis is rejected. Therefore adding
the double progressive widening does improve the planning capability of the agent.
However, the planner still assumes that there is no relationship between grid cells, this
assumption will be explored and evaluated in the next section.
3.7 Wall Continuity
In order to further reduce the observation space, assumptions about the built environ-
ment are made. In a human made environment, walls have a tendency to continue on. This
prior knowledge of the built environment allows the planner to increase confidence about the
environment through the generated occupancy grid map.
3.7.1 The Hough Transform
Traditionally used for feature extraction during image recognition algorithms, the
Hough Transform is a voting algorithm that takes in an image, and determines the ‘strongest’
line candidates based on the image. In the simplest of cases, the Hough Transform is used to
determine line segments in an image. It has been extensively modified to extract other fea-
tures such as polygons. However, for this work, the Hough Transform will be used to extract
lines that represent walls of the building. As humans have a tendency to build environments
with straight walls, and ninety degree corners, the Hough Transform is an ideal candidate
to determine where walls are in a map. The Hough Transform uses an array known as the
‘accumulator’ to characterize lines as described in Eq. 3.12, where r is the minimum distance
from the origin to the line, and θ is the angle between the x-axis and the line running from
29
the origin to the closest point as shown in Fig. 3.13. The algorithm calculates parameters
for (r, θ) in the neighborhood of a pixel. If there is evidence that a straight line exists at
that pixel, the bin for the set of parameters is incremented. As the end of a run the highest
values of the parameter space are used to determine the lines.
r = x cos(θ) + y sin(θ) (3.12)
Figure 3.13: Hough Values
Using the Hough transform, the representative walls can be extracted from the 2D
map. These segments can then be used to project the walls out into the unexplored regions
of space. These projections can be used to bias the prior in these unexplored regions. Since
the walls tend to continue into this unexplored region, the bias on the prior allows the
simulated observations to more closely match the unexplored part of the map. The result of
a Hough transform on a completed map can be seen in Fig. 3.14, where the results of the
Hough transform are shown as green line segments with red x’s on either end. These line
segments closely follow the underlying walls of the map that the transform was run on.
30
Figure 3.14: Result of Hough Transform
While the Hough transform gives the representative line segments of walls on the map,
it would be incorrect to assume that the walls continue on into the unknown regions of space
indefinitely. So, as the wall segment is extrapolated into the unknown areas of the map, the
strength of the biasing is reduced the further from the endpoints the cell is. The biasing can
be seen in as the dull yellow in Fig 3.16. The Hough transform gives the representative lines
along with their endpoints.
31
Figure 3.15: Wall Projection
Wall biasing for a simple hallway can be seen in Fig. 3.15, which depicts a robot as a
circle facing North. This robot can sense the walls shown in black. From this information, it
projects the walls into the neighboring cells. Shown as the lightening gray tones. The lighter
the tone, the less impact the observed wall has on the map. The blue shows the unobserved
space, that the robot has no information about, and the white space is observed unoccupied.
Eq. 3.13 describes the amount that the map is biased away from the terminating points of
the representative line segments. This equation implicitly assumes that biased cells should
only exist along the same axis as the representative line segment. These cells are the ones
that would remain occupied by the wall assuming that it continues along that same path.
βmax is the bias factor that describes how much the projection biases the cells that are along
the same axis as the wall segment. The cell bias at cell mi with prior l0 is biased based
on the inverse of the distance from the endpoint of the wall to the centroid of the cell in
32
question.
bias(mi) = l0 + βmax1√
(xm,i − xend)2 + (ym,i − yend)2(3.13)
This method has the benefit of resolving some of the dependency between cells that is
not accounted for in the occupancy grid representation of the map.
Figure 3.16: Wall Biasing for 2D Indoor Environment.
The biasing of the map can easily be seen in Fig. 3.16, where the robot shown in red,
the observations in dark blue and the unknown space in teal. The robot sees a small section
of wall in the map, and the planner extends that wall segment into the unknown regions
of the map where it thinks the wall is likely to be. Here multiple line segments are shown
overlapping spaces that there are likely to be walls, even though the robot has not yet seen
what is in those spaces.
33Statistic Baseline Planner POMCP-DPW Wall Projection
Median Time to Capture (TTC) 451 seconds 450 seconds 438 secondsCapture Rate 13.3% 17.3% 19.6 %
Mean Reward per Second 3.4 reward/sec 3.49 reward/sec 3.92 reward/secMean Total Reward at 500 sec 1727 1896.5 1960.4
Total Reward Variance 63.2 62.4 ,60.2
Table 3.4: Results
The wall projection continues to improve upon the performance of the planner, the
major component of this performance increase is due to more of the Monte Carlo planning
tree being maintained between action selections, as well as fewer possibilities in the observa-
tion space being considered. Again binomial hypothesis testing was used to determine the
significance of the improvement in the capture rate, with a significance level of 10% again.
Therefore wall projection is an effective method of improving the planner beyond the limits
of progressive widening.
This leaves one issue left to address, where the robot still occasionally runs into, the
robot becomes stuck in confined spaces. The next section discusses a remedy this problem.
3.8 A* Rollout Biasing
Since the planner only works on a finite horizon, as the map becomes more and more
explored, there is a larger possibility of the planner becoming trapped in a region of explored
space. In this explored space the reward is far enough away from the current position of the
robot that it does not effectively ‘see’ the reward far away. To fix this problem, the rollout
step of POMCP is biased with cycles of A* planning towards centroids of unexplored space.
A* based search is accomplished through the methods discussed in [6], where the planner
maintains each grid cell as a node in the A* search graph, and the transitions between cells
are defined by the most probable transition for a given control input. However the A* planner
requires both a goal, and a target destination, with the map defining the transitions between
cells, so the planner must determine what the location of the target is. In order to find
34
this destination in the map space the planner employs another image processing technique,
namely blob detection.
3.8.1 Blob Detection
The blob detection algorithm takes in the physical map and returns the centroids of
‘blobs’ or continuous regions of unoccupied cells. It does this by thresholding the entropy
of the cells of the map then determining the Laplacian of the Gaussian of the map in (x,y)
frame. The result of this operation is segments of the map that contain continuous regions of
unexplored space. The centroid of the largest of these regions is then easily computed. The
A* planner is run from the current location of the robot towards this centroid and included
in the POMCP tree as a branch. When the robot reaches the point of needing this biasing,
the cells between the robot and these frontiers of unexplored regions of the map have been
observed enough that the robot is certain of the occupancy of the cells between it and the
frontiers.
Statistic Baseline POMCP-DPW Wall Proj A* biasingTime to Capture (sec) 451 450 438 405
Capture Rate 13.3% 17.3% 19.6 % 20.1%Mean Reward per Second 3.4 3.49 3.92 4.07
Mean Total Reward at 500 sec 1727 1896.5 1960.4 2037.8Total Reward Variance 63.2 62.4 60.2 61.5
Table 3.5: Results
The statistics for these augmented planners are collated in Table 3.5. Again the hy-
pothesis is tested through a standard binomial test with a 10% significance level. The null
hypothesis that the augmented planner is not able to gather more total reward is rejected.
Therefore augmentation of the planner does increase the performance of the planner at
gathering more total reward over these trials than the non-augmented planner.
35
3.9 Results
The simulation was run for 500 time steps, to simulate the battery based time con-
straint. The action set is to be expanded in future work on this problem to allow the robot
to cover much more ground than the MATLAB implementation would show.
Figure 3.17: Run of the planner without A* augmentation
Fig. 3.17 shows again a typical run of the planner without A* augmentation, DPW
or wall projection, it is very easy for the planner to become stuck in an area where it can
no longer ‘see’ the reward due to computation constraints. This is shown by the robot, the
red trace, staying in the top left corner of space. Stated another way, the tree the planner
generates is no longer deep enough for it to gather reward from the far away regions. Instead
it becomes trapped in the small room.
By including the A* augmentation, the planner is able to see the reward outside of the
room, and leaves the room for the unexplored regions of space. This is seen in a typical run
of the planner with A* shown above in Fig. 3.18 where the robot leaves the room for the
unexplored regions of space.
36
Figure 3.18: With A* augmentation
Figure 3.19: Total map entropy over 30 runs with finite time
The differences between the unaugmented planner and the augmented planner can
further be seen by examining the total map entropy over time, as shown in Fig. 3.19. The
algorithm without A* often becomes trapped, and the total map entropy begins to stall, as
the robot is trapped in a room and unable to leave. On the other hand, the A* augmented
robot leaves the room allowing to reduce to total map entropy much more before hitting the
time limit. Total map entropy can be thought of as the certainty of the robot in the map
that is has generated. The lower the entropy, the better the robot believes it knows the map.
37
Figure 3.20: Mean Reward Gain Over Time
The mean reward gained by fully augmented planner is compared to the naıve case in
Fig. 3.20, as the robot traverses the space, where the reward gain per time step is shown
in blue for the totally unmodified planner and red for the fully augmented case. The fully
augmented case maintains a higher reward per second than the non augmented planner. This
is expected as the modifications to the planner have been shown to increase the exploration
of the robot as it traverses the space. Table 3.6 shows a numerical breakdown of the results
for these runs. Note that the numbers here are from the same runs that are described by
Fig. 3.20.
Statistic Baseline Planner Augmented PlannerMedian Time to Capture (TTC) 451 seconds 405 seconds
Capture Rate 13.3 % 20.1 %Mean Reward per Second 3.4 reward/sec 4.07 reward/sec
Mean Total Reward at 500 sec 1727.0 2037.8Total Reward Variance 63.2 61.5
Table 3.6: Results of Fully augmented planner vs Naıve Planner
Chapter 4
Extending to Three Dimensions
After using the two dimensional case to test the planner in a lower dimensional space,
the code was transferred to a three dimensional simulator. For the move to three dimensions
the Robot Operating System (ROS) was used along with the Gazebo simulation environment.
Gazebo is a general physics simulator that is natively shipped with ROS. This allows the
simulation of the sensors for the system as well as the response to control inputs in the system.
The mapping solution must also be migrated from two dimensions to three dimensions with
Octomap. These changes also necessitated a migration from the Matlab based simulator to
a C++ based computation for the speed. These changes were also needed because ROS uses
C++ bindings and does not support Matlab.
4.1 Mapping in 3D
The two dimensional grid is augmented to include a third dimension. For this problem
the third dimension is segmented into 3 discrete levels. Each separated from another by 1
meter. From above the map is segmented the same way as the two dimensional case, broken
into a grid of squares.
The Octomap representation [7] for a 3D occupancy grid segments the space into equal
volumes, which as in the two dimensional case, maintain a Binomial distribution about
whether the volume is occupied or free. As with the two dimensional case, the observations
update the occupancy of the volume that is visible from the perspective of the simulated
39
Figure 4.1: Octomap Volume Breakdown
robot sensor. Fig. 4.1 shows the way that the Octomap representation of the space can
operate at differing resolutions depending on what is requested by the user. In this way the
Octomap can be resolved to differing dimension to increase computational speed or fidelity.
In future iterations this project could leverage this to quickly build paths then refine them
to be more accurate with finer resolution that requires heavy computation.
Again similarly to the two dimensional case the occupancy probability can be described
by Eq. 2.2, where the log odds of occupancy lt,i is defined. The log odds is used here as well,
due to not having convergence issues when approaching zero and one.
Once the Octomap is built, the different vertical levels of the map are used to generate
a projected 2D grid map that the robot planner can use for online planning at a given
altitude. The Octomap implementation for ROS easily allows for this altitude based slicing,
making it possible to use the 2D grid world POMCP planner described in Chapter 3 along
with the various augmentation elements.
4.2 System Architecture
The system architecture is based off the Hector Gazebo ROS package [11], which con-
tains an open source quadrotor simulator. The Hector quadrotor is seen in Fig. 4.2. This
package handles the generation of the quadrotor dynamics in the Gazebo simulationenvi-
ronment, which for this thesis was selected to be the built-in ”Willow Garage” world. This
40
Figure 4.2: Hector Robot in the Simulated Space
world simulates an indoor space, with straight planar walls, and ninety degree corners.
Figure 4.3: Top Down View of Willow Garage
Fig 4.3 shows the layout of the Willow Garage world. This environment contains both
large open spaces, as well as long corridors. This world is representative of a more or less
standard built environment.
In this environment the target is represented by a sphere with a 0.5 meter radius as
seen in Fig. 4.4. The robot is assumed to be able to discern the target from the environment
with 90% accuracy as long as it is in the frame of the camera.
41
Figure 4.4: Target Sphere
Figure 4.5: Observed occupied Space
As the robot traverses the space the map that is built is seen in Fig. 4.5. This figure
also shows the two dimensional projection that occurs for the planner to operate in a lower
dimensional space. By operating in a squashed representation of the space, the computational
requirement for the planner is lowered. While this allows faster computation of plans, it also
precludes the robot from using map features such as open windows to traverse the space.
42
4.3 Results
As in the two dimensional case the robot was run in this environment for a span of time
equivalent to the typical battery life expected, which is around 500 seconds of flight for this
configuration. The preliminary results are shown below in Table 4.1. These results reflect
a run of 500 seconds and 25 separate trails, that all start at the same location on the map.
The median time to capture the target, the capture rate, mean reward per second, mean
total reward, total reward variance were recorded during the 25 trials each of the augmented
and unaugmented planners. These results can also be seen in Fig. 4.6 as the mean reward
gained per second is described there.
Figure 4.6: 3D Planner Reward Accumulation
The time to capture the target is highly dependent on the location of the target relative
to the starting pose of the robot: if the target is placed close to the starting location of the
robot it is much more likely to find it faster, whereas if the target is well hidden in the map,
the ability of the robot to find it is greatly diminished. As shown in the two dimensional
43
case, the fully augmented planner preforms slightly better than the naıve planner. The
augmentation allows the robot to find the target around three times as often as the naıve
planner for this configuration of the space. However, these results may not generalize to any
space or configuration. More work is required to determine the limits of where this planner
outperforms the baseline, and where tweaking may be required to make the augmentation
preform better.
Statistic Baseline Planner Augmented Planner
Median Time to Capture (TTC) 306 seconds 280 seconds
Capture Rate 8 % 24 %
Mean Reward per Second 2.3 reward/sec 4.5 reward/sec
Mean Total Reward at 500 sec 1164.7 2254.7
Total Reward Variance 170.5 130.5
Table 4.1: Results of Fully augmented planner vs Naıve Planner in 3D Case
Concluding the three dimensional work would require more tuning of the parameters,
as well as trialing differing environments in the Gazebo simulator. However, the preliminary
results show here are promising that this method of planner augmentation could be used
to plan for actual live vehicles in POMDP style problems. Augmentation of the planner
shows promise as a method to incorporate domain information into the planning process,
and leverage this information to make a better planner than a simple naıve planner in the
same space.
Chapter 5
Conclusions and Future Work
5.1 Conclusions
This thesis presented an approach to facilitate the simultaneous search and map of an
unknown structured environment. This environment was constrained to be a built indoor
setting, allowing the robot to use prior information about the way that humans build the
world around them to generate a better model of future observations.The problem was first
formally cast into the POMDP planning framework, allowing the simultaneous mapping and
search problem to be tackled in a unified way using computational methods that account for
planning and sensing under uncertainty at the same time. A reward function for the problem
was developed that enabled the robot to both explore/map the unknown space and search for
an unknown static target as efficiently as possible to minimize search time. Approximations
to the analytically intractable problem of finding POMDP policies were then formulated
using the POMCP Monte Carlo Tree Search algorithm for online planning. This formulation
is particularly relevant for realistic robotic platforms like aerial vehicles, which have limited
flight time and fields of view, as well as limited computational abilities.
By leveraging prior environment information, the simulated robot was able to more
accurately predict whatactual sensor observations may be obtained for online planning. This
biasing allows the extremely high dimensionality of the observation space to be sampled in
a way that is more representative of the ground truth. The Hough transform allows this
information to be extracted from the map in the form of straight lines. These lines are
45
then expanded in both directions biasing the prior of the map in unexplored regions. This
modification increases the computational efficiency of the tree generation by around 10%.
Double progressive widening of the tree of future history for the POMDP prevents the
tree from becoming sparse, where each branch of the tree is generated by only one particle.
This bush structure contains no statistically significant information about the distribution of
the map. Double progressive widening allows particles to follow branches that have already
been generated. With the inclusion of double progressive widening and the biasing, the
algorithm is able to select observations that are more likely to occur.
Transferring this information to the three dimensional space allows the algorithm to
be tested in an environment that is more representative of the operating space, namely the
Willow Garage environment which is available through the Gazebo simulator. In this three
dimensional simulator, the planner was implemented to work with Octomap and a modified
implementation of the planner for operatingn in three dimensions. Here the augmented
POMCP planner still out-preformed the naıve planner for the specific configuration that was
presented, showing promising proof of concept results for implementation on a realistic robot
platform operating in an indoor environment with actual sensing and perception hardware.
5.2 Future Work
In future versions of this work, implementing this algorithm into a real hardware envi-
ronment should be investigated. By implementing the algorithm on a real robot, questions
about the suitability of this process to be used in real life situations could be answered along
with how well the system noise was modeled during the algorithm development. Further
augmentation of the planner could be preformed to adapt to other types of semi-structured
environments, where underlying patterns to the way the area is constructed can be further
exploited to enhance online planning under uncertainty.
Bibliography
[1] Min Sung Ahn, Hosik Chae, Donghun Noh, Hyunwoo Nam, and Dennis Hong. Analysisand noise modeling of the intel realsense d435 for mobile robots. 2019 16th InternationalConference on Ubiquitous Robots (UR), pages 707–711, 2019.
[2] Dario Brescianini, Markus Hehn, and Raffaello D’Andrea. Nonlinear quadrocopter at-titude control: Technical report. Technical report, ETH Zurich, 2013.
[3] Cameron B Browne, Edward Powley, Daniel Whitehouse, Simon M Lucas, Peter I Cowl-ing, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, andSimon Colton. A survey of monte carlo tree search methods. IEEE Transactions onComputational Intelligence and AI in games, 4(1):1–43, 2012.
[4] A. Das, F. Lewis, and K. Subbarao. Dynamic inversion with zero-dynamics stabilisationfor quadrotor control. IET Control Theory Applications, 3(3):303–314, Jan 2009.
[5] A. Elfes. Sonar-based real-world mapping and navigation. IEEE Journal on Roboticsand Automation, 3(3):249–265, 1987.
[6] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determina-tion of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics,4(2):100–107, 1968.
[7] Armin Hornung, Kai M Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Bur-gard. Octomap: An efficient probabilistic 3d mapping framework based on octrees.Autonomous robots, 34(3):189–206, 2013.
[8] J. J. Leonard and H. F. Durrant-Whyte. Mobile robot localization by tracking geometricbeacons. IEEE Transactions on Robotics and Automation, 7(3):376–382, 1991.
[9] Matteo Luperto and Francesco Amigoni. Extracting structure of buildings using layoutreconstruction. In Marcus Strand, Rudiger Dillmann, Emanuele Menegatti, and Ste-fano Ghidoni, editors, Intelligent Autonomous Systems 15, pages 652–667, Cham, 2019.Springer International Publishing.
[10] Tanwi Mallick, Partha Pratim Das, and Arun Kumar Majumdar. Characterizations ofnoise in kinect depth images: A review. IEEE Sensors journal, 14(6):1731–1740, 2014.
47
[11] Johannes Meyer, Alexander Sendobry, Stefan Kohlbrecher, Uwe Klingauf, and OskarVon Stryk. Comprehensive simulation of quadrotor uavs using ros and gazebo. volume7628, pages 400–411, 11 2012.
[12] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once:Unified, real-time object detection. In Proceedings of the IEEE conference on computervision and pattern recognition, pages 779–788, 2016.
[13] Charles Richter, William Vega-Brown, and Nicholas Roy. Bayesian Learning for SafeHigh-Speed Navigation in Unknown Environments, pages 325–341. Springer Interna-tional Publishing, Cham, 2018.
[14] Francesco Sabatino. Quadrotor control: modeling, nonlinear control design, andsimulation. PhD thesis, 2015, KTH School of Electrical Engineering (EES), StockholmSweden.
[15] David Silver and Joel Veness. Monte-carlo planning in large pomdps. In J. D. Lafferty,C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances inNeural Information Processing Systems 23, pages 2164–2172. Curran Associates, Inc.,2010.
[16] Adhiraj Somani, Nan Ye, David Hsu, and Wee Sun Lee. Despot: Online pomdp planningwith regularization. In Advances in neural information processing systems, pages 1772–1780, 2013.
[17] Zachary Sunberg and Mykel J. Kochenderfer. POMCPOW: an online algorithm forpomdps with continuous state, action, and observation spaces. CoRR, abs/1709.06196,2017.
[18] Sebastian Thrun. Probabilistic robotics. Communications of the ACM, 45(3):52–57,2002.
[19] Brian Yamauchi. Frontier-based exploration using multiple robots. In Proceedings ofthe second international conference on Autonomous agents, pages 47–53, 1998.