Voice Control of Fetch Robot Using Amazon Alexa
Purong Liu
Thesis submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Master of Science
in
Mechanical Engineering
Alexander Leonessa, Chair
Alan Asbeck
Kaveh Akbari Hamed
Feb 21, 2020
Blacksburg, Virginia
Keywords: Robotics, Voice Control, Alexa, Internet of Things
Copyright 2020, Purong Liu
ABSTRACT
With the rapid development of computers and technology, virtual assistants (VA) are becoming
more and more common and intelligent. However, virtual assistants, such as Apple’s
Siri, Amazon’s Alexa, and Google Assistant, do not currently have any physical functions.
As an important part of the internet of things (IoT), the field of robotics has become a
new trend in the usage of VA. In this project, a mobile robot, Fetch, is connected with
the Amazon Echo Dot through Amazon Web Services (AWS) and a local Robot Operating
System (ROS) bridge server. We demonstrated that the robot could be controlled by voice
commands through an Amazon Alexa. Given certain commands, Fetch was able to move in
a desired direction as well as track and follow a target object. The follow model was also
learned by Neural Network training, which allows for the target position to be predicted in
future maps.
GENERAL AUDIENCE ABSTRACT
Nowadays, virtual personalized assistants (VPAs) exist everywhere around us. For example,
Siri or Android VPAs exist on every smartphone. More and more people are getting household
virtual assistants, such as Amazon Alexa, Google Assistant, and Microsoft’s Cortana. If
the virtual assistants can connect with objects which have physical functions like an actual
robot, they will be able to provide better services and more functions for humans. In this
project, a mobile robot, Fetch, is connected with the Echo dot from Amazon. This connection
allows us to control the robot by voice command. You can ask the robot to move in a given
direction or track and follow a certain object. In order to let the robot learn how to predict
the position of the target when the target is lost, a map is built as an influencing factor. Since
a designed algorithm of target position prediction is difficult to implement, we opted to use
a machine learning method instead. Therefore, a machine learning algorithm was tested on
the following model.
Acknowledgments
First of all, I would like to express my sincere gratitude to my advisor, Prof. Alexander
Leonessa, for continuously supporting my research project and Master’s study with his immense
knowledge and patience. His guidance helped me throughout the research and writing of
this thesis. In addition to my advisor, my sincere thanks also go to my thesis committee members,
Prof. Alan Asbeck and Prof. Kaveh Akbari Hamed, for their insightful comments,
encouragement, and support. I would also like to thank my lab mate Alex Fuge for
helping with the new camera case design, and Dr. Garret Burks for his help with editing
this thesis.
Contents
List of Figures
List of Tables
1 Introduction
1.1 Background and Motivation
1.2 Objective
1.3 Outline
2 Review of Literature
2.1 Internet of Things
2.2 Virtual Assistant
2.3 IoT Robotics
2.4 Mapping and Neural Network Training
2.5 Proposed Work
3 Materials and Methods
3.1 Fetch Robot
3.1.1 Camera
3.1.2 Software
3.1.3 Track and follow
3.1.4 Improve Follow Task Performance
3.1.5 Neural Network Training
3.1.6 Mapping
3.2 Amazon Echo Dot
3.2.1 Skill
3.2.2 Cloud-based Service
3.3 Marvin
3.4 Communication
3.4.1 Machines
3.4.2 Devices, ROS and Alexa
3.5 Summary
4 Results
4.1 Camera
4.2 Simulation
4.3 Robot Test
4.4 Train Model Evaluation And Mapping
4.5 Summary
5 Discussion and Future Work
5.1 Limits
5.2 Future Work
5.2.1 Software Upgrade
5.2.2 Continued Navigation Development
5.2.3 Other Hardware
6 Conclusions
Bibliography
Appendices
Appendix A Alexa
A.1 Turtle Simulation
Appendix B Neural Network
List of Figures
1.1 AR tags provided in ROS ar_track_alvar package
1.2 Pokemon Go with AR technique [26]
3.1 Fetch Robot [4]
3.2 PrimeSense Carmine 1.09 and Intel SR 300
3.3 TF tree for camera and AR marker
3.4 Kinetic Model of Fetch’s Base and Reference Frame
3.5 ar_track node connection
3.6 ar_track node to ar_follower node
3.7 Velocity Computation Algorithm Flowchart
3.8 Neural Network
3.9 Interaction Model Process
3.10 Cloud watch summary from AWS Lambda
3.11 Device Communication for Alexa Controlled Robot
3.12 Image message conservation
3.13 JSON message conversion
4.1 AR Tag Detection
4.2 Compared camera performance
4.3 Turtle trajectory for AR tag following
4.4 Output comparison
4.5 Example JSON input and output
4.6 Training and validation loss for neural network training process
4.7 Training model evaluation
4.8 Map from two SLAM method
4.9 Map built by Cartographer with trajectory
4.10 Trajectory for one loop and two loops
A.1 Turtle simulation for Alexa via AWS Lambda
A.2 Turtle simulation for Alexa control
A.3 Complete JSON input
A.4 Complete JSON output
A.5 Turtle simulation for AR tag following
A.6 Detail device log
B.1 Training Algorithm [47]
List of Tables
3.1 Camera comparison
3.2 Camera Parameters Comparison
3.3 Neural Network Parameters
3.4 Cartographer 2D SLAM configuration
3.5 Interaction Model Configuration
4.1 Data loss ratio under different situation
4.2 Turtle Simulation Response with AWS Lambda
4.3 Turtle Simulation Response with BST proxy
4.4 Process time for Alexa intent requests
List of Abbreviations
ADA Americans with Disabilities Act
AMR Autonomous Mobile Robot
AR Augmented Reality
AWS Amazon Web Services
FoV Field of View
IDL Interface Definition Language
IMU Inertial Measurement Unit
IoT Internet of Things
NN Neural Network
RFID Radio-Frequency Identification
ROS Robot Operating System
SLAM Simultaneous Localization And Mapping
SSH Secure Shell
TF Transfer Frame
URDF Unified Robot Description Format
VA Virtual Assistant
WSN Wireless Sensor Network
Chapter 1
Introduction
In this chapter, the background and the motivation of this project are introduced in section
1.1, while the objective of the project is shown in section 1.2.
1.1 Background and Motivation
As one of the most important inventions today, the internet is all over our lives. It is used in
our daily life for work, school, as well as in our social lives. Along with the widespread use of
the internet, a new trend of internet applications, the Internet of Things, is also developing.
The concept of the IoT was first brought up in 1991 by Mark Weiser [64]. The term “Internet
of Things” was used for the first time in 1999 by Kevin Ashton [40]. Essentially, the IoT
can be considered as a network consisting of devices, machines and objects that are all able
to transfer data to each other without requiring human intervention [50].
With the development of smartphones and smart speakers, these applications are becoming
more and more common. For example, the camera in front of your house can be monitored
by your smartphone. The bedroom light can be remotely turned on and off using a virtual
household assistant. As one of the IoT applications, the field of IoT robotics attracts our
attention.
According to research from the University of Michigan School of Information, one of the
three main uses of current household VA involves the IoT voice control commands [64].
However, current IoT applications have numerous limitations related to their functionality,
specifically related to power and mobility. Current IoT controls through a VA
are limited to stationary appliances such as lights, air conditioners, and switches. While
these functions are convenient and ease our lives, the current IoT applications have minimal
physical functions and no physical interactions with humans themselves. It will be much
more beneficial for elderly individuals as well as people with disabilities if they can control a
mobile robot with only their voice, and if they can enable a robot to assist when necessary.
For example, an elderly person with a back injury may ask a mobile robot to follow and pick
up a dropped item using only voice commands.
Considering the increasing number of older adults, the geriatric and general medical service
needs are also expected to significantly increase in the coming years. As a result, healthcare
robots are beginning to draw more and more scientists’ attention [49]. In [49], the authors
summarized some of the current research on healthcare robots with applications for older
adults and the problems that can arise for elderly individuals who desire to live independently.
There are three types of healthcare robots used in the home: assistance
robots, companion robots, and monitoring robots. The first type, assistance robots, can
help elderly individuals deal with problems such as housework as well as difficulties
with mobility and bathing. In addition, companion robots focus on the psychosocial needs of
older adults. In [11], 41 publications involving four different robotic systems were reviewed:
(1) NeCoRo, a cat-like robot; (2) Bandit, a socially interactive robot with a humanoid torso
mounted on a mobile platform; (3) AIBO, a dog-like robot; (4) Paro, a seal-like robot.
NeCoRo, AIBO and Paro are used to imitate the interaction between humans and animals.
The authors of [11] concluded that the use of robots in elderly care seems to have potential to
improve quality of life. Lastly, monitoring robots are used to monitor the health conditions
of users.
With the same concerns regarding the aging population’s ability to remain independent, the
authors in [66] developed a daily life support robot, ApriAttenda. This robot can follow a
specified person and avoid obstacles. As a result, it is designed not only for elderly care,
but also to be used in baby-sitting settings and as a shopping assistant in a shopping center.
The control algorithm used to follow people is based on a proportional controller. When the
person is too far away, the robot moves forward, and when the person is too close, the robot
moves backward.
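The proportional following rule described above can be sketched as follows. This is a minimal illustration and not the ApriAttenda implementation from [66]; the function name, target distance, gain, and speed limit are all assumed values chosen for the example.

```python
def follow_velocity(distance_m, target_m=1.0, gain=0.8, max_speed=0.5):
    """Return a forward velocity command (m/s) from the measured distance.

    Positive output drives forward (person too far away); negative output
    drives backward (person too close). All defaults are assumed values.
    """
    error = distance_m - target_m      # positive when the person is too far
    command = gain * error             # proportional term only
    # Clamp so the command never exceeds the assumed speed limit.
    return max(-max_speed, min(max_speed, command))


if __name__ == "__main__":
    for d in (2.0, 1.0, 0.5):
        print(d, follow_velocity(d))
```

With these assumed parameters, the command is zero at the target distance, saturates at the speed limit when the person is far away, and becomes negative when the person is closer than the target.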
1.2 Objective
Similar to the purpose of ApriAttenda, this project focuses on robot assistants for at-home
use. Our goals are to enable the robot to follow a person, to simplify robot control
for users through voice control, and to build an intelligent navigation system for better
people-following performance. As a first step toward people following, we enabled the robot
to follow an AR tag as shown in Figure 1.1. AR techniques have been used in phone games
like Pokemon Go to simulate the appearance of virtual objects in the real world. Figure 1.2
displays the game interface of Pokemon Go.
Figure 1.1: AR tags provided in ROS ar_track_alvar package[42]
Figure 1.2: Pokemon Go with AR technique [26]
In addition to implementing the following algorithm, the IoT robotic control is enabled with
an artificial intelligence VA. IoT control spares users from learning complicated robot controls.
With a VA, users can control the robot via voice commands. To achieve the goal of this project,
the Amazon Echo Dot and the Fetch robot were selected.
We chose the Amazon Echo Dot because Alexa is one of the most popular standalone virtual
assistants. According to a March 2019 report, Amazon held a 61.1% share of the smart speaker
market [33], roughly two and a half times that of Google, the holder of the second-largest share.
Additionally,
Amazon provides a lot of convenient functions on Alexa, such as shopping, searching, and
controlling smart electronics. In addition to the provided functions, Amazon also allows
developers to customize their desired functions through the Amazon Developer Console. On
the console platform, users are able to design Alexa skills.
In addition, the Fetch robot was selected because it is designed to traverse
ADA-compliant buildings [65]. Fetch is an AMR from Fetch Robotics Inc. and is equipped with a
head-mounted RGB-D camera, a 7-degree-of-freedom manipulator, a 2D LiDAR laser scanner,
and an adjustable torso. It also contains an internal computer in the base. The
computer runs a Linux system, Ubuntu 14.04 LTS, and ROS Indigo. The robot can be remotely
controlled as long as it is connected to a wireless network. Remote control requires a
base computer connected to the same network.
ROS is a package-based middleware for controlling robots. On March 2, 2010, Willow Garage
released the first distribution of ROS 1.0 [3]. Indigo is the eighth distribution of ROS 1.0.
It provides plenty of useful packages and convenient tools for users. ROS supports four
programming languages: C++, Python, Octave, and LISP [45]. To support
cross-language development, ROS describes messages using a language-independent IDL. These
strictly typed messages are passed through ROS topics. By subscribing to a ROS topic,
a ROS node, a computation process in ROS, can retrieve messages from that topic. A node
sends messages by publishing them to a given topic.
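The publish/subscribe pattern described above can be illustrated with a toy, pure-Python stand-in. This is not the ROS API: the `Topic` class, the `Velocity` message type, and the topic name are hypothetical. The sketch only mimics the idea that a topic carries one strictly typed kind of message and delivers it to every subscriber.

```python
from dataclasses import dataclass
from typing import Callable, List, Type


@dataclass
class Velocity:
    """Hypothetical message type with fixed, typed fields."""
    linear: float
    angular: float


class Topic:
    """Toy named channel that accepts exactly one message type."""

    def __init__(self, name: str, msg_type: Type):
        self.name = name
        self.msg_type = msg_type
        self._callbacks: List[Callable] = []

    def subscribe(self, callback: Callable) -> None:
        # A subscribing node registers a callback for incoming messages.
        self._callbacks.append(callback)

    def publish(self, msg) -> None:
        # Reject messages of the wrong type, mimicking strict typing.
        if not isinstance(msg, self.msg_type):
            raise TypeError(f"{self.name} expects {self.msg_type.__name__}")
        for callback in self._callbacks:
            callback(msg)


if __name__ == "__main__":
    cmd_vel = Topic("cmd_vel", Velocity)
    cmd_vel.subscribe(lambda m: print("received", m))
    cmd_vel.publish(Velocity(linear=0.2, angular=0.0))
```

Publishing anything other than a `Velocity` on this toy topic raises a `TypeError`, which is the property that makes a conversion step necessary when commands arrive in a different format.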
Because of the strictly typed data structure of ROS messages, we needed to convert the
output messages from Alexa into ROS messages. One more challenge for this project was to
enable the robot to detect and follow an AR-tag automatically. To provide a better following
performance, we let the robot learn the hard-coded follow algorithm, which can be further
improved to imitate human following behavior, using neural network learning. Furthermore,
we would like to integrate an intelligent navigation system with the robot so that the robot
assistant can provide more services. Therefore, a reliable map for the precise localization will
be needed. The map information can also help the robot with learning human navigation
behavior. In summary, the objectives of the project were:
1. Enable the robot to detect and follow an AR-tag.
2. Ensure that Alexa correctly interprets voice commands and sends the appropriate
requests to the robot.
3. Confirm that the robot can understand the request sent by Alexa and perform the
tasks.
4. Train the robot to learn the current target-following navigation behavior using a neural
network.
5. Build a reliable map.
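As a rough illustration of objectives 2 and 3, the sketch below converts an Alexa-style intent request into a velocity command that a robot-side node could turn into a ROS message. The `request`, `intent`, and `slots` nesting follows the general shape of Alexa custom-skill requests; the intent name `MoveIntent`, the slot name `direction`, and the speed values are hypothetical and are not taken from the skill built in this project.

```python
import json

# Assumed mapping from a spoken direction to (linear m/s, angular rad/s).
DIRECTION_TO_VELOCITY = {
    "forward": (0.2, 0.0),
    "backward": (-0.2, 0.0),
    "left": (0.0, 0.5),
    "right": (0.0, -0.5),
}


def request_to_command(raw_json):
    """Translate an intent request string into a velocity dictionary.

    Returns None when the request is not a recognised move command.
    """
    request = json.loads(raw_json).get("request", {})
    if request.get("type") != "IntentRequest":
        return None
    intent = request.get("intent", {})
    if intent.get("name") != "MoveIntent":               # hypothetical intent name
        return None
    slot = intent.get("slots", {}).get("direction", {})  # hypothetical slot name
    direction = str(slot.get("value", "")).lower()
    if direction not in DIRECTION_TO_VELOCITY:
        return None
    linear, angular = DIRECTION_TO_VELOCITY[direction]
    return {"linear": linear, "angular": angular}


if __name__ == "__main__":
    sample = json.dumps({
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": "MoveIntent",
                "slots": {"direction": {"name": "direction", "value": "Forward"}},
            },
        }
    })
    print(request_to_command(sample))
```

Returning `None` for anything unrecognised keeps the robot from acting on requests it cannot interpret, which is one simple way to approach the third objective.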
1.3 Outline
To achieve the objectives mentioned above, we first designed and validated the experiment.
In Chapter 2, the previously reported related literature is reviewed. Next, the materials
and methods are discussed in Chapter 3. Chapter 4 presents the
results of the project, and Chapter 5 discusses the limitations
and future work. The final chapter, Chapter 6, summarizes this thesis and gives a conclusion.
Chapter 2
Review of Literature
In this chapter, the history and development of the IoT will be discussed. In particular,
we will emphasize applications of the IoT with virtual assistants. Further, we discuss the
possible future applications and challenges of the IoT with VA. Additionally, we focus on
robotic applications of the IoT while also presenting previous works in the field of IoT
robotics with VA.
2.1 Internet of Things
The history of “Internet of Things” began in the early 1990s, when the concept of IoT was
first brought up by Mark Weiser [64]. However, according to [40], the term “IoT” was likely
first used by Kevin Ashton in 1999. Gubbi et al. [25]
listed three elements of the IoT. The first one is hardware, which usually refers
to actuators, sensors, and controllers. The second element is middleware, which consists of
storage and data analytics tools for computing. The last element is presentation, which
should be a lucid visualization tool that can be designed for various applications. The
implementation of the IoT middleware relies on five key technologies: (1) RFID, (2) WSN,
(3) addressing, (4) data storage, and (5) analytics [25, 58].
(1) RFID systems consist of two parts, readers and tags. Basically, RFID allows for the
automatic identification of objects, each of which is assigned a unique tag. One of the common
applications of RFID technology can be found in bar-code readers in grocery stores. (2)
A WSN is composed of a large number of sensing nodes. These nodes can be used to
monitor environmental conditions, such as temperature, or implantable medical devices,
such as implantable cardiac defibrillators (ICDs). (3) Addressing schemes are essential for
the IoT since they allow devices to be identified by their unique addresses. Based on wireless
technologies, such as Wi-Fi and RFID, the ability to identify things was developed. IPv4 can
provide identification at a geographic level. However, the identification of individual devices
is expected to depend on the development of IPv6. (4,5) Data storage issues are a result of
the unprecedented amount of data created by novel emerging fields using the internet. To
solve these storage issues, cloud-based storage has arisen. Along with the development of cloud
data storage, the field of cloud-based data analysis is also growing. Based on the progress and
development of computing services, the cloud-based data analysis and storage are foreseen
to be a new trend [25, 58].
With the three elements (hardware, middleware, and presentation), the IoT applications
can be classified into four groups based on different occasions: (1) Personal and home, (2)
Enterprise, (3) Utilities, and (4) Mobile [25]. (1) The personal and home application focuses
on household and personal electronics. For example, the monitor at a front door can be
connected to the VA with a screen. (2) Enterprise usage of the IoT concentrates on the work
environment electronics and utility management. Many of these applications overlap with the
usage in personal and home applications and utilities. For example, a sample usage would
be to use a smartphone to control the coffeemaker in a conference room rather
than in the home. (3,4) Utilities and mobile IoT applications can contribute to
smart cities. The IoT in these two groups can be used to manage energy,
water, transportation, and logistics. For example, a smartphone can be used to track the location of
buses in real time.
In a more recent article, Stankovic [58] highlighted eight current research topics
in IoT, which include “massive scaling, architecture and dependencies, creating knowledge
and big data, robustness, openness, security, privacy, and human-in-the-loop” [58]. Massive
scaling includes the storage of the massive amount of data created and the shortage of IP addresses. Architecture
and dependencies relate to how “things” in the IoT are connected and controlled. Creating
knowledge and big data focuses on the use of generated data from the IoT. For example,
can we utilize the raw data in a way that provides some useful knowledge? The topic of
robustness considers how devices deal with deteriorating conditions. A common example
of the deterioration problem, which is highlighted in more detail in [58], can be found in the
clock synchronization problem. The topic of openness focuses on the accessibility of devices
and their control systems. It is important to note that security and privacy problems arise
with openness. If a device is easily accessed, it is also possible that others can control the
device and steal information from it. Lastly, the human-in-the-loop topic concentrates on
problems when humans are involved in the control loop. For example, in an automobile
system a human can dictate the vehicle speed by pushing more or less on a pedal. In
addition to discussing the highlighted topics described above, Stankovic also detailed
the potential challenges related to the human-in-the-loop topic. Four subcategories of the
human-in-the-loop applications are classified in [58] and relate to: systems directly controlled
by a human; systems which monitor humans and take proper actions; systems that model
human’s physiological parameters; and the combination of the previous three applications.
Three main concerns are mentioned in [58]: (1) the necessity to understand all types
of human-in-the-loop control, (2) the need to extend system identification or other techniques
so that models of human behavior can be learned, and (3) the need to determine
a method that brings human behavior models into the feedback control.
The focus of this thesis is on the voice control function of the AMR, which can be categorized
under the last research topic mentioned above, human-in-the-loop control. This research
focuses on one of the cases of the supervisory control applications where the system receives
commands and takes action autonomously, and also sends feedback and waits for the next
command [58].
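The supervisory control pattern described above (receive a command, act autonomously, report back, wait for the next command) can be sketched as a simple loop. The command names, the `execute` helper, and the feedback strings are hypothetical placeholders and do not describe the behavior of the system built in this thesis.

```python
from collections import deque


def execute(command):
    """Placeholder for an autonomous action; returns a feedback string."""
    known = {"follow", "stop", "forward"}   # assumed command vocabulary
    if command in known:
        return f"done: {command}"
    return f"rejected: {command}"


def supervisory_loop(commands):
    """Process queued commands one at a time and collect the feedback."""
    queue = deque(commands)
    feedback = []
    while queue:                      # wait for (here: dequeue) the next command
        command = queue.popleft()     # 1. receive a command from the human
        result = execute(command)     # 2. act without further human input
        feedback.append(result)       # 3. report back to the human
    return feedback


if __name__ == "__main__":
    print(supervisory_loop(["forward", "dance", "stop"]))
```

In a real system the queue would be fed by incoming voice requests rather than a fixed list; the list is used here only to keep the sketch self-contained.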
2.2 Virtual Assistant
According to [25], published in 2013, the estimated number of interconnected devices in 2012 was
9 billion, and that number was expected to increase to 24 billion by 2020. According
to [12], the latest research from Strategy Analytics shows that the number reached 22
billion in 2018 and is expected to increase exponentially to 40 billion by 2025. Among the
interconnected devices, smart home devices are expected to be one of the fastest growing
[7]. The Smart Speaker Consumer Adoption report in March 2019 stated that by the end
of 2018, there were 66.4 million U.S. adults that had a smart speaker [33]. In addition, 85%
of smart speaker users chose to use either an Amazon Echo or Google device. In the same
report, the market share for these two device makers was also presented. By January 2019,
Amazon had a 61.1% market share while Google had 23.9% of the smart speaker market.
As the market of smart speakers continues to expand, researchers have started to wonder
how people use VA. In [52] from June 2018, researchers gathered and analyzed 278,654 voice
commands from Alexa users, and concluded that 14.7% of voice commands were used for
smart homes, which means that 14.7% of commands were used to control appliances such
as lights, TV, or air-conditioning units. Similar research was performed in [10], which was
published in April 2019. By analyzing 193,665 voice commands from Alexa and Google
Home users, researchers revealed that the IoT control commands occupy 16.7% of the total
commands. Both articles show that the leading use of voice commands is music, at
25% in [52] and 28.5% in [10]. As expected, progress in IoT technology propels IoT
development.
As the technology of the IoT evolves, more applications can be incorporated into smart
speakers. Kepuska and Bohouta [32] discuss some possible applications of
VPA in the areas of education assistance, medical assistance, robotics and vehicles, and
disabilities systems. For example, robots could be used to deliver medicine to patients in
a hospital setting, or individuals with a visual impairment could use smart speakers to do
online shopping. To improve the services provided by VPA, Këpuska and Bohouta propose adding
more elements to the current dialogue-based systems to generate multi-modal dialogue
systems, such as the Gesture Model, Graph Model, and User Model [32]. The Gesture Model analyzes
a user’s motion and facial expression and responds based on the analysis. Graph models
analyze image and video data and return appropriate results. User models will collect all
the user information in advance, such as the user’s preferences. Then when users need a
response from a VPA, the previously gathered information will help guide the final response
[32].
As smart speakers get more intelligent and popular, the potential issues are gradually drawing
more attention from users. In the article, “Integration of Cloud computing and Internet of
Things: A survey”, the authors mention seven concerns related to Cloud IoT applications
[12]: (1) Privacy, (2) Security, (3) Large scale, (4) Legal and social aspects, (5) Reliability, (6)
Performance and (7) Heterogeneity. The concerns about privacy and security arise because
of the possibilities of attacks on cloud environments, which can lead to the leakage of user
information. The large-scale challenges occur when there are numerous devices involved in
one scenario, in which case insufficient data storage and device-monitoring security issues are hard
to overcome. Additionally, when cloud services are based on data provided by users from all
over the world, various international laws need to be complied with. Because cloud IoT is
mission-oriented, when it receives a request, it will respond without thinking. For example,
if you send a move forward command to a vehicle even when there is a river directly in front,
the vehicle will complete the task without doubt. Therefore, the reliability of the IoT device
is of critical importance.
The performance challenge primarily affects the real-time applications, such as environmental
monitoring. The last challenge, heterogeneity, is a threat to all kinds of applications. It
happens when multiple devices with various systems are involved in one task. Finally, the
integration of all of these subsystems is also a significant challenge.
The concern about user privacy and data is also mentioned in [44]. Their research indicated
that Amazon collected and stored some of the users’ data [44]. The authors also highlighted
a criminal investigation from 2015, where the police department obtained recordings from
an Amazon Echo as relevant evidence. In addition, the reliability issues were investigated in
the article, “Emerging Threats in Internet of Things Voice Services” [36]. As stated in [36],
68.9% of the 572,319 audio samples investigated were accurately interpreted. By analyzing
the misunderstood samples, the authors found that 41.7% were due to phonetic confusion,
33.3% were homophones, 8.3% were compound words, and the remaining 16.7% were due to
other factors.
2.3 IoT Robotics
As a prospective application of the IoT, indoor robotic control has the potential to be used to
create a smart home for individuals with multiple disabilities or elderly individuals. In [9],
a smart home control framework was built based on ZigBee, which can establish a personal
network. The structure successfully integrated and controlled devices such as a doorbell, fire
alarm, light and refrigerator door. In addition to providing disability assistance, the IoT
can also be used in industrial environments. For example, the voice control of an industrial
robot was designed and tested in [43]. In this paper, the author used the Microsoft Speech
Engine to process speech recognition. The human commands were then converted to text.
The converted text was then used to control two industrial robots that performed two simple
tasks. One robot completed a pick and place while the other performed simple linear welding.
In addition to the Microsoft Speech Engine mentioned above, Amazon Alexa can also
be used as a speech recognition engine for research. In [28], the Alexa voice assistant is used
for Natural Language Processing and integrated as a voice interface
for robotics platforms. The authors demonstrate how Amazon Alexa can be connected
with other devices so that VA users can control devices by voice. One example of these
features is demonstrated in Alexa’s ability to connect to a Raspberry Pi with ROS via the
MQTT network communication protocol. The remote server is composed of Mosquitto, a
message broker, and RedBot, a chatbot platform based on Node-RED.
Another research use of Amazon Alexa is the speech-to-text conversion highlighted in
[19]. To explore collaboration between robots and humans, Craig
Douglas and Robert Lodder integrated Amazon Alexa with a ROS-based double telepresence
robot for human identification and localization. Gazebo was used for simulation and
TensorFlow was used for artificial intelligence, which allows the robot to recognize humans in
a crowded environment.
In [57], an Amazon Tap speaker was used to activate a lawnmower. Along with the Alexa voice
services, the researchers applied a free web-based service, If This Then That (IFTTT), to
build simple conditional statements for triggering the lawnmower and controlling its speed.
In this case, IFTTT plays the role of the cloud service in the Alexa control scheme. In addition
to using Alexa, the researchers also built a web-based GUI to control the speed of
the lawnmower.
Another approach to controlling devices with Alexa is to adopt two AWS
services: AWS Lambda and AWS IoT. These methods are demonstrated in [30]. AWS
Lambda is a cloud computing service and AWS IoT allows for bi-directional communication
between IoT devices and an AWS cloud, which is Lambda in this case. Moreover, a Raspberry
Pi, a microcomputer, can be used to transmit messages from the cloud service to the local
network. The authors’ intelligent robotic assistance system, equipped with several devices,
can then be controlled over the local network. Additionally, the devices’ state can be updated
to the cloud service.
2.4 Mapping and Neural Network Training
In addition to being used for multiple device communication, the IoT system can also be used
in the areas of mapping and localization. In the review paper [55], the authors mentioned
the concept of the location of things (LoT) and the importance of the LoT for the IoT
infrastructure. The LoT acts like a search engine for data and device management and the
integration of LoT and IoT is proposed in [41]. In Nath et al., researchers built a voice-
based location detection system using an Alexa Echo and ultrasonic sensor and integrated
the system with a smart home. Another voice interface, Google Assistant, was used for
indoor navigation in [54]. Instead of having mobile devices for navigation, stationary devices
were used for localization.
The work presented in [55] also summarized the methodology and applications of SLAM,
which is often used for path planning and obstacle avoidance in robotics. SLAM has been
an active research area and has gained even more attention since an autonomous car won
the DARPA Grand Challenge in 2005 [60]. The 2D SLAM technique [38] is extended to
3D SLAM using a 3D laser range finder in [48] and to Visual SLAM using fused
3D camera data in [31]. Currently, multiple SLAM algorithms and ROS packages are
available. In the article [23], the authors compared three 2D laser-based SLAM methods,
GMapping, Hector SLAM, and Cartographer. Their results indicated that both Hector
SLAM and Cartographer had small RMSE and absolute trajectory errors. Further, they
indicated that Cartographer was more robust to environment change than the other options.
Cartographer was released in 2016 and is supported in ROS [27]. It can provide real-time
mapping as well as offline mapping from a recorded rosbag. To minimize the drift
from odometry, Cartographer generates several submaps with
constraints and landmarks as the local SLAM. Based on the collected submaps, the backend
subsystem performs scan matching for loop closure and optimizes the final map
and trajectory. It allows developers to use both 2D and 3D SLAM.
In addition to using SLAM, machine learning is another popular application in the field of
robotics. There are different types of machine learning methods, including: reinforcement
learning, supervised learning and unsupervised learning [35]. Machine learning is widely
applied in many research areas, such as trajectory prediction [20], natural language processing
[16], and closed-loop control [24]. In [46], a gradient-based online learning algorithm is used
to improve the controller performance. The research group in [47] used a neural network
algorithm to train a controller model from a unicycle kinematic model before applying the
trained model to a quadcopter. The results in [47] indicated that a model trained on a
simple system can be applied to a more complicated system.
In addition to the previous applications, machine learning can also be used in SLAM to
improve mapping and localization performance. For example, a deep recurrent convolutional
neural network is used to improve robot localization in [63]. The authors in [53] proposed
a self-localization method supported by a support vector machine algorithm.
2.5 Proposed Work
As summarized in Section 2.1, the development of technologies such as RFID, WSN, and
addressing promotes the expansion of the IoT. IoT applications are increasingly entering our
daily life. Given the speed at which they are evolving, IoT technologies can be applied in
many fields in the future, including smart homes, smart cities, and smart offices. Section 2.2
indicates that smart speakers with an embedded virtual assistant, one of the critical devices
in IoT smart home applications, are becoming more prevalent. As smart speakers become
more popular, their range of uses is expanding rapidly. The VA embedded in smart speakers
can also be used as a speech recognition engine.
Current IoT applications are limited by the devices’ mobility and power. Therefore,
researchers are interested in exploring new applications of the IoT through VA, particularly
applications related to IoT robotics. In Section 2.3, several methods of connecting
a VA and robots were discussed. To control the robot successfully, tools other than a speech
recognition engine are needed. Since message transfer from device to device affects
the response time, we chose the simplest of the methods proposed in the
literature. This project connects the VA, Amazon Alexa, with an AMR, Fetch, through AWS
Lambda and the BST proxy tool, so that the robot can be driven in a given direction and
can follow a desired target.
Being able to successfully carry out target following tasks is fundamental to a final intelligent
navigation system, and it is also an important task for a robot companion. To assist the user
in time, the robot needs to follow a target consistently. However, the hand-coded algorithm
is sensitive to noise such as incorrect target detection. To improve the performance of
the robot companion task, we proposed a data-driven neural network algorithm for the robot
to learn target following control. The model computed by the neural network is expected to be more
robust to changes in the environment. In addition to learning simple navigation behavior,
robot localization is also important for further development of navigation. Therefore, we
needed a reliable map for precise localization. The map information can then be used when
the robot loses the target and needs to navigate itself back to a safe place.
Chapter 3
Materials and Methods
As mentioned in chapter 2, there are three main elements of the IoT: hardware, middleware
and presentation. In our project, the hardware used is the mobile robot, Fetch, which is
made up of actuators, sensors and embedded communication hardware. Our remote server,
Marvin, is the middleware. Marvin serves as a computing tool for voice control and
visualization. Finally, an Echo Dot embedded with Alexa is used as the
presentation, which is an interpretation tool for various applications.
In this chapter, the materials and methods for the voice control of the robot will be discussed.
Section 3.1 introduces the configuration and task implementation method of Fetch. Section
3.2 presents how Alexa was used for voice control. Section 3.3 shows the setting of the remote
server, Marvin. Section 3.4 focuses on communication between the devices.
3.1 Fetch Robot
Fetch is a commercial robot available from Fetch Robotics Inc. and is equipped with a mobile
base, a manipulator, an adjustable torso, and a head camera [65]. Additionally, a laser
scanner is included in the base. The base design limits its mobility and, as
a result, Fetch can only be used in an indoor environment. The robot is configured with
an embedded computer, which runs Ubuntu 14.04 and ROS Indigo. ROS is complex robot
control middleware that organizes the software it contains into packages. It is not an actual
operating system, even though ‘operating system’ is in its name. To control the motion of
Fetch, we need to send ROS-compatible
messages to the corresponding control node. Figure 3.1 displays a picture of a Fetch robot
similar to the one used in the experimental work presented in this thesis.
Figure 3.1: Fetch Robot [4]
3.1.1 Camera
Fetch originally had a built-in camera (model PrimeSense Carmine 1.09) on the head. The
camera is a 3D RGB depth sensor with a 0.35 − 3 m operation range and is best
calibrated in the range of 0.35 − 1.4 m [65]. The PrimeSense camera also generates PointCloud
image data, which show the spatial position of objects relative to the camera [51]. Unfortunately,
the original camera was not working and produced no images. The issue was suspected to be
a software conflict caused by previous research efforts. Despite attempts to fix the issue
by reinstalling the camera drivers and most of the related packages, no images
could be obtained from the camera. We removed the camera from the robot in order to test
for hardware issues; however, the problem still could not be diagnosed. This
camera model has been off the market since 2013, because Apple Inc. bought PrimeSense
and stopped making the Carmine 1.09. Therefore, a new camera model was needed to replace
the Carmine 1.09.
(a) PrimeSense Carmine 1.09 [5]
(b) Intel SR 300 [6]
Figure 3.2: PrimeSense Carmine 1.09 and Intel SR 300
The replacement camera was chosen based on the following metrics: operation range, accuracy,
provided data, and size. We required a similar operation range because the
robot needs to see a target within a short range and with high accuracy. Further, the size
of the camera determined whether we were able to place it at the original position on the head
of the robot. Carfagni et al. [13] compared the performance of
three cameras: the Kinect v2, the Carmine 1.09, and the SR300. Their results indicated that the
Carmine 1.09 and the SR300 had similar probing errors PF and flatness errors F. Further,
the SR300 had a smaller sphere-spacing error SS than the Carmine 1.09. Table 3.1 summarizes
their results.
Device     Carmine 1.09   SR 300   Kinect v2
PF [mm]    9.32           8.3      20.13
F [mm]     6.71           6.88     12.58
SS [mm]    26.08          6.05     19.7
Table 3.1: Camera comparison [13]
The operation range of the SR300 is 0.2 − 1.5 m [17], and that of the Kinect v2
is 0.5 − 4.5 m. The SR300 measures 110 × 12.6 × 4.4 mm, while the Carmine
1.09 measures 180 × 25 × 35 mm [1]. The Kinect v2 measures 360.68 × 139.7 × 165.1 mm. The comparison
is displayed in Table 3.2.
Model                  Carmine 1.09     SR300              Kinect v2
Operation Range (m)    0.35 − 1.4       0.2 − 1.5          0.5 − 4.5
Size (mm)              180 × 25 × 35    110 × 12.6 × 4.4   360.68 × 139.7 × 165.1
Image Data
  Color                ✓                ✓                  ✓
  Depth                ✓                ✓                  ✓
  PointCloud           ✓                ✓                  ✓
Table 3.2: Camera Parameters Comparison
After evaluating the options, it was determined that the Intel RealSense SR300 depth camera
satisfied all of the required conditions. As a result, the Intel RealSense SR300, mounted in a new
3D-printed camera case, replaced the Carmine 1.09. Images of the two cameras can be
seen in Figure 3.2.
3.1.2 Software
ROS provides numerous convenient tools and packages for work related to visualization,
simulation and control. In particular, Rviz is one of the most common tools for visualization.
Rviz allows users to view multiple messages in the same window, including camera images,
laser scan data, as well as the robot model. The robot model can be built based on two
packages, TF and URDF. URDF describes the fixed parameters of the robot, such as
the radius of the base and where the camera is located relative to the head tilt pan link,
while TF keeps track of the relative position of the robot from one frame to another.
Along with the packages mentioned above, ar_track_alvar is an important package in this
project. With this package, the robot can detect the designed AR tag and receive the tag’s
position and orientation relative to a chosen frame. In addition, the detection of the AR-tag
has a small error in pose estimation of about 4 cm [29]. By changing the subscription frame
to the head camera RGB frame, ‘head_camera_link’, the corresponding relative position
messages will be published to the ROS topic /ar_pose_marker. Figure 3.3 highlights the
TF tree for a detected AR tag.
Since Fetch is a mobile robot and is able to interact with the surrounding environment, it
also has the potential to hurt people around it or harm itself. Therefore, every step of the
experiment was simulated in advance using the turtlesim package. Turtlesim is usually used
in ROS tutorials. We chose to use the package to test the performance of a given task due
to its movement control topic, /turtle1/cmd_vel, which uses the same type of message as
the robot’s, /cmd_vel. They are both geometry_msgs/Twist messages, which contain three
linear and three angular components. The topic name /cmd_vel stands for command velocity. The
message represents the commanded linear and angular velocity.
Figure 3.3: TF tree for camera and AR marker
The information in /cmd_vel can then be written as a matrix, T, in Equation 3.1,
\[
T = \begin{bmatrix} v \\ \omega \end{bmatrix}
  = \begin{bmatrix} v_x & v_y & v_z \\ \omega_x & \omega_y & \omega_z \end{bmatrix}
\tag{3.1}
\]
where the first row contains the linear components, v_x, v_y, and v_z, and the second row contains
the angular components, ω_x, ω_y, and ω_z.
In order to interact with Alexa, all of the ROS messages need to be converted to JSON mes-
sages for Alexa. Accordingly, two more packages were needed: roslibjs and rosbridge_server.
More detail will be provided in the later section 3.4 about the use of these packages.
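As a rough illustration of the ROS-to-JSON conversion described above, the sketch below packs the matrix T of Equation 3.1 into a publish operation in the JSON form used by the rosbridge protocol. The function name twist_to_rosbridge and the example velocities are assumptions made for illustration; they are not taken from the project code.

```python
import json


def twist_to_rosbridge(T, topic="/cmd_vel"):
    """Pack T = [[vx, vy, vz], [wx, wy, wz]] into a rosbridge 'publish'
    operation whose payload has the fields of geometry_msgs/Twist."""
    (vx, vy, vz), (wx, wy, wz) = T
    message = {
        "op": "publish",
        "topic": topic,
        "msg": {
            "linear": {"x": vx, "y": vy, "z": vz},
            "angular": {"x": wx, "y": wy, "z": wz},
        },
    }
    return json.dumps(message)


# Hypothetical command: 0.2 m/s forward and 0.5 rad/s about the z axis
payload = twist_to_rosbridge([[0.2, 0.0, 0.0], [0.0, 0.0, 0.5]])
```

The resulting string is what a rosbridge client would send over its WebSocket connection; passing topic="/turtle1/cmd_vel" would target turtlesim instead.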
3.1.3 Track and follow
To implement the following function, a follower node /ar_follower (/ar_follower_turtle for
turtlesim) was created to perform the computations. The marker position information was
published to the ROS topic /ar_pose_marker as an ar_track_alvar/AlvarMarkers message,
which is a custom message type defined in the package. The information needed, such
as the position data, could then be retrieved from AlvarMarkers.markers.pose.pose.position.
The position information comprised three components, x, y and z, where x represents
the distance from the tag to the camera, y is the horizontal displacement from the center of
the FoV, and z is the vertical displacement from the center.
The position of the AR tag is expressed in the camera frame, which can be considered
the robot’s frame in this project, because the robot’s head is stationary relative to the robot’s
base. When the robot moves, the robot’s frame moves accordingly.
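Figure 3.4: Kinematic Model of Fetch’s Base and Reference Frame
The kinematic model of the robot is a unicycle model, which can be expressed using the
differential equation in Equation 3.2 [59].
\[
\begin{bmatrix} \dot{x} \\ \dot{y} \\ \omega \end{bmatrix}
= \begin{bmatrix} \cos(\Phi) & 0 \\ \sin(\Phi) & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} v \\ \omega \end{bmatrix}
\tag{3.2}
\]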
As shown in Figure 3.4 and Equation 3.2, the robot cannot move linearly in the y or z direction
or rotate about the x or y axis, so the /cmd_vel message matrix T can be simplified to Equation
3.3.
\[
T = \begin{bmatrix} v_x & 0 & 0 \\ 0 & 0 & \omega_z \end{bmatrix}
\tag{3.3}
\]
Since the z position of the tag does not affect the movement of Fetch, we only need pose.x
and pose.y from the position of the AR tag. Figure 3.4 shows the positional relationship between the
robot and the AR tag.
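To make the unicycle model concrete, the following minimal sketch advances the state (x, y, Φ) by forward-Euler steps of Equation 3.2, taking ω as the rate of change of the heading Φ. The step size, the velocities, and the function name unicycle_step are assumptions chosen for illustration only.

```python
import math


def unicycle_step(x, y, phi, v, omega, dt):
    """Advance the unicycle state by one forward-Euler step of Equation 3.2."""
    x_next = x + v * math.cos(phi) * dt
    y_next = y + v * math.sin(phi) * dt
    phi_next = phi + omega * dt
    return x_next, y_next, phi_next


# Drive straight along the x axis for 1 s at 0.5 m/s (10 steps of 0.1 s)
state = (0.0, 0.0, 0.0)
for _ in range(10):
    state = unicycle_step(*state, v=0.5, omega=0.0, dt=0.1)
```

Only v and ω appear as inputs, which mirrors the simplified command matrix of Equation 3.3: the base accepts one linear and one angular velocity.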
The mechanism of tracking and following the AR tag in ROS is illustrated in Figure 3.5.
Additionally, a combined and simplified version of two nodes’ relationship is displayed in
Figure 3.6.
As the figure shows, the image received from the camera was processed in the ROS node
/ar_track_alvar, where an ar_track_alvar/AlvarMarkers message was generated. The
message was then published to the ROS topic /ar_pose_marker. The node /ar_follower
(/ar_follower_turtle for turtlesim) subscribes to /ar_pose_marker and then computes the
geometry_msgs/Twist message for the base movement. In the same way, that message is
published to the base control topic /cmd_vel (/turtle1/cmd_vel for turtlesim).
Figure 3.5: ar_track node connection (nodes, drawn as ellipses: /ar_track_alvar, /ar_follower; topics, drawn as squares: /head_camera/rgb/image_raw, /tf, /ar_pose_marker, /cmd_vel; dark blue: publish, green: subscribe, light blue: publish and subscribe)
Figure 3.6: ar_track node to ar_follower node
When the robot receives a marker position, the computation begins. Although there are
three position components, x, y and z, for this computation we only required the x and y
positions. We can define a matrix, P, as a simplified input matrix based on the position
components, and a matrix O as the expected output matrix, which is the message
sent to /cmd_vel. Therefore, O uses the same format as T.
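\[
P = \begin{bmatrix} \text{pose.x} & 0 \\ 0 & \text{pose.y} \end{bmatrix}; \quad
O = \begin{bmatrix} v \\ \omega \end{bmatrix}
  = \begin{bmatrix} o_{11} & 0 & 0 \\ 0 & 0 & o_{23} \end{bmatrix}
\tag{3.4}
\]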
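Due to safety concerns, the output velocity was limited. The minimum speed was set so that
the robot was able to move smoothly.
\[
\mathit{Max} = \begin{bmatrix} \max\{v_x\} \\ \max\{\omega_z\} \end{bmatrix}
             = \begin{bmatrix} max_1 \\ max_2 \end{bmatrix}; \quad
\mathit{Min} = \begin{bmatrix} \min\{v_x\} \\ \min\{\omega_z\} \end{bmatrix}
             = \begin{bmatrix} min_1 \\ min_2 \end{bmatrix}
\tag{3.5}
\]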
A weighting matrix W is defined in Equation 3.6, where scale.x and scale.y are the weighting
values for pose.x and pose.y. The goal position of the AR tag, goal.x and goal.y, forms
the goal position matrix G.
\[
W = \begin{bmatrix} \text{scale.x} & 0 & 0 \\ 0 & 0 & \text{scale.y} \end{bmatrix}; \quad
G = \begin{bmatrix} \text{goal.x} & 0 \\ 0 & \text{goal.y} \end{bmatrix}
\tag{3.6}
\]
To let the robot follow the AR tag, the first step was to check whether the target was within the
threshold using Equation 3.7, where H is the offset from the goal position. If the target fell
outside of the threshold, we continued to calculate the desired velocity D with Equation 3.8,
with goal.y = 0.
\[
|H| = |P - G| = \left| \begin{bmatrix} h_1 & 0 \\ 0 & h_2 \end{bmatrix} \right|
\tag{3.7}
\]
\[
D = H * W = (P - G) * W
  = \begin{bmatrix} (x - \text{goal.x}) * \text{scale.x} & 0 & 0 \\ 0 & 0 & y * \text{scale.y} \end{bmatrix}
  = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & 0 & d_2 \end{bmatrix}
\tag{3.8}
\]
After computing the desired velocity, we compared it with the limits in Equation 3.5. A
flowchart of the complete algorithm is shown in Figure 3.7,
where i ∈ {1, 2}.
Figure 3.7: Velocity Computation Algorithm Flowchart
After the computation, the final geometry_msgs/Twist message is output as the matrix
O, as shown in Equation 3.9.
\[
O = \begin{bmatrix} o_1 & 0 & 0 \\ 0 & 0 & o_2 \end{bmatrix}
  = \begin{bmatrix} \text{linear.x} & 0 & 0 \\ 0 & 0 & \text{angular.z} \end{bmatrix}
\tag{3.9}
\]
The final output will then be published to the topic /cmd_vel as a geometry_msgs/Twist
message.
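The computation in Equations 3.4 to 3.9 can be sketched in a few lines. Because the flowchart in Figure 3.7 is not reproduced here, the sketch makes two assumptions: a component inside its threshold yields a zero command, and any other component is scaled and then has its magnitude forced between the minimum and maximum of Equation 3.5 while keeping its sign. The function names and all numeric parameters are hypothetical, not the values used on Fetch.

```python
import math


def clamp_magnitude(d, min_speed, max_speed):
    """Keep the sign of d but force its magnitude into [min_speed, max_speed]."""
    magnitude = min(max(abs(d), min_speed), max_speed)
    return math.copysign(magnitude, d)


def follow_command(pose_x, pose_y, goal_x, threshold, scale, min_speed, max_speed):
    """Return (linear.x, angular.z) computed from the tag position.

    threshold, scale, min_speed and max_speed are pairs: index 0 drives
    linear.x and index 1 drives angular.z. goal.y is taken as 0.
    """
    offsets = (pose_x - goal_x, pose_y)        # H = P - G
    outputs = []
    for i in range(2):
        if abs(offsets[i]) <= threshold[i]:    # assumed: no motion inside threshold
            outputs.append(0.0)
        else:
            desired = offsets[i] * scale[i]    # D = H * W
            outputs.append(clamp_magnitude(desired, min_speed[i], max_speed[i]))
    return tuple(outputs)


# Hypothetical parameters for illustration only
cmd = follow_command(pose_x=1.5, pose_y=-0.02, goal_x=0.6,
                     threshold=(0.05, 0.05), scale=(0.5, 2.0),
                     min_speed=(0.05, 0.1), max_speed=(0.3, 1.0))
```

In this example the tag is 0.9 m beyond the goal distance, so the scaled linear command (0.45) is capped at the assumed maximum, while the small lateral offset falls inside the threshold and produces no rotation.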
3.1.4 Improve Follow Task Performance
The target follow algorithm in the above section operates as a proportional controller.
However, this following model is very sensitive to noise, and it is hard to filter all the noise with
a hard-coded algorithm. Besides proportional control, other control
methods can be used for object following and navigation. The authors in [37]
proposed a saturation feedback controller to simultaneously solve the trajectory tracking and
regulation problems for path planning. The controller enables a unicycle-modeled mobile
robot to follow a designed line or circle. Another control strategy for mobile robot object
following is based on two discrete PID controllers, one at each wheel [34]. The
controllers in [34] consider the situation in which the robot needs to follow a target in a dynamic
environment.
In addition to using controllers to smooth the trajectory of the robot and improve the
robustness of target following, researchers also aim at intelligent robot navigation. The authors of
[61] introduce a cognitive map based on the spatial cognition of objects. The proposed
framework decomposes the cognitive map into two parts. One is feature extraction of objects and
the environment. The other is understanding and reasoning about the environment. This
cognitive map allows the robot to approach human spatial cognition. In [21], researchers
demonstrate another intelligent navigation system, human-aware navigation based on
the Social Force Model. This navigation system takes into consideration how the robot
interacts with humans or obstacles. The goal of [21] is to use this navigation algorithm for a
robot companion task. In addition, a socially aware navigation method is proposed in [14].
Unlike the hand-coded heuristic navigation algorithm presented in [21], the authors of [14]
applied deep reinforcement learning to let the robot learn how to walk in a pedestrian-rich
environment while avoiding collisions.
Machine learning methods are used not only in navigation but also in robotic control.
The author of [22] presented a neural network computed torque controller for a nonholonomic
mobile robot. This controller can be applied for trajectory tracking, path following, and
posture stabilization. With the neural network controller, a priori dynamic parameters of the
robot are no longer needed. Furthermore, the control model improves the performance of the
robot drastically. Similarly, in [56], the authors adopt reinforcement learning for a mobile
robot to perform corridor following and obstacle avoidance. The robot learns the model
from example answers to the task, which are given by computation or by a
human controlling the robot directly. As a result, their robot manages to learn good control
laws faster than the hand-engineered programming process.
Based on the previous work that has been conducted for robot companion tasks, we believe
that a neural network computed algorithm has the potential to improve the current following
task performance and can be evolved to imitate a human following behavior for navigation.
Therefore, we proposed a data-driven neural network algorithm and used the experimental
data from the hand-coded follow task to train the following algorithm.
3.1.5 Neural Network Training
According to Section 3.1.3, the base movement control depends only on the position of the marker.
If we want to add dependencies such as the location or trajectory to control the movement
of the robot, numerous code modifications will be needed. Most importantly, the debugging
process will be difficult and time-consuming. Therefore, we would like to let the robot learn
a control model itself using neural network training.
The AR tag follow model is used to test the neural network algorithm. We adapted the
algorithm in [47] and simplified it further to Algorithm 1. The original algorithm is shown in
B.1.
Algorithm 1 Neural Network Training Process
  Randomly generate validation set X_valid
  for each epoch i do
    for each batch j do
      Generate training set X_batch
      Train neural network using X_batch
    end for
    Compute loss (RMSE)
  end for
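The loop structure of Algorithm 1 can be sketched in plain Python. To keep the example dependency-free, a two-output linear model stands in for the neural network, and a proportional law with assumed gains stands in for the hand-coded follower that labels the data; neither is the thesis implementation, and every name and number below is hypothetical.

```python
import math
import random


def teacher(x, y):
    """Stand-in for the hand-coded follower: a proportional law (assumed gains)."""
    return 0.5 * (x - 0.6), 2.0 * y


def make_set(n, rng):
    """Sample n tag positions and label them with the teacher's commands."""
    inputs = [(rng.uniform(0.3, 1.5), rng.uniform(-0.5, 0.5)) for _ in range(n)]
    return [(p, teacher(*p)) for p in inputs]


def predict(params, p):
    """Tiny linear model used here in place of the neural network."""
    (w1, b1), (w2, b2) = params
    return w1 * p[0] + b1, w2 * p[1] + b2


def rmse(params, data):
    """Root mean square error over both outputs of every sample."""
    total = sum((t - o) ** 2
                for p, target in data
                for t, o in zip(target, predict(params, p)))
    return math.sqrt(total / (2 * len(data)))


def train(epochs=200, batches=20, batch_size=16, lr=0.1, seed=0):
    rng = random.Random(seed)
    valid = make_set(100, rng)                 # randomly generate validation set
    params = [[0.0, 0.0], [0.0, 0.0]]
    history = []
    for _ in range(epochs):                    # for each epoch i
        for _ in range(batches):               # for each batch j
            batch = make_set(batch_size, rng)  # generate training set X_batch
            for k in range(2):                 # one gradient step per output
                grad_w = grad_b = 0.0
                for p, target in batch:
                    err = predict(params, p)[k] - target[k]
                    grad_w += err * p[k]
                    grad_b += err
                params[k][0] -= lr * grad_w / batch_size
                params[k][1] -= lr * grad_b / batch_size
        history.append(rmse(params, valid))    # compute loss (RMSE)
    return params, history


params, history = train()
```

Each entry of history is the validation RMSE after one epoch, which is the quantity one would plot to check that training has converged.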
To train the best-fit model, we applied a neural network using Keras [15] and TensorFlow [8].
The inputs were the relative position of the marker, x and y, while the outputs were the
command velocity, linear.x and angular.z. The neural network was built with three hidden
layers, with each layer containing 256 hidden units. We chose Leaky ReLU [39] as
the activation function with α = 0.1. The training and test loss were computed using a Root
Mean Square Error (RMSE) function, defined as
\[
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}},
\tag{3.10}
\]
where n is the total number of samples, y_i is the actual value, and ŷ_i is the predicted
value. Figure 3.8 shows the neural network architecture for the follow task. In addition, the
parameters are summarized in Table 3.3.
Parameter   Description                Value
n_l         Number of hidden layers    3
n_n         Number of hidden units     256
f_a         Activation function        ‘LeakyReLU’
α           Parameter for activation   0.1
Table 3.3: Neural Network Parameters
Figure 3.8: Neural Network (input layer: Pose.X, Pose.Y; hidden layers: 3 × 256 units; output layer: cmd_vel.Linear.X, cmd_vel.Angular.Z)
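The architecture in Table 3.3 can be illustrated with a dependency-free forward pass. This is a sketch rather than the Keras model used in the project: the weights are random and untrained, the biases start at zero, and the output layer is assumed to be linear. The names leaky_relu, dense, init_network, and forward are hypothetical.

```python
import random


def leaky_relu(z, alpha=0.1):
    """Leaky ReLU: pass positive inputs through, scale negative ones by alpha."""
    return z if z > 0 else alpha * z


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer; weights[j] holds the input weights of unit j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def init_network(sizes, rng):
    """Random (untrained) weights and zero biases for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, pose_xy):
    """Map (pose.x, pose.y) to (linear.x, angular.z); the last layer is linear."""
    values = list(pose_xy)
    for index, (weights, biases) in enumerate(layers):
        is_last = index == len(layers) - 1
        values = dense(values, weights, biases, None if is_last else leaky_relu)
    return values


# Two inputs, three hidden layers of 256 units, two outputs
layers = init_network([2, 256, 256, 256, 2], random.Random(0))
command = forward(layers, (1.0, 0.1))
```

With untrained weights the two numbers in command are meaningless; the point is only the shape of the mapping from the two pose inputs to the two velocity outputs.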
3.1.6 Mapping
Since we would like to apply the map information to predict a target position, a map that can
be trusted by the robot is necessary. Fetch Robotics provides a navigation package based on
Karto SLAM for building a map. Karto SLAM takes information from LiDAR and uses that
information to build maps in real time using particle filter localization, which is also known as
Monte Carlo Localization (MCL). According to [62], Karto SLAM 1.1 has an RMSE of 0.3207
m and a maximum error of 1.21 m. Although the RMSE is small, the maximum
error is almost four times larger, which indicates that Karto SLAM might be unreliable.
Therefore, another SLAM package, Cartographer, was tested and compared with Karto
SLAM. According to [23], Cartographer had the best performance among the three tested
SLAM methods: GMapping, Hector SLAM, and Cartographer. Cartographer can
build maps with only laser scan information. To improve the quality of the final map, sensor
fusion with IMU data, camera point cloud data, and GPS information can be added at the
discretion of the user. To visualize the map generation process in Rviz, we chose to replay
the rosbag data for SLAM. The configuration of Cartographer in this project is listed in
Table 3.4. Although point cloud data is available from Fetch, it was not used for
map building, because point clouds contain a very large amount of data and recording them
causes a buffer-exceeded error. Moreover, Karto SLAM cannot perform 3D mapping, so only
2D maps are compared. Therefore, the number of point clouds is set to 0.
Option                  Value   Related Topic
Provide Odom            True    /odom
Use IMU                 True    /imu1/imu
Use GPS                 False   -
Number of LiDAR         1       /base_scan
Number of Point Cloud   0       -
Table 3.4: Cartographer 2D SLAM configuration
3.2 Amazon Echo Dot
The Amazon Echo Dot embedded with Alexa is one of the most popular smart speakers
today. It is small (32 mm × 84 mm × 84 mm) and light (163 g). These characteristics
allow us to place the device on the robot or to carry it easily. In addition to regular
speaker usage, the Echo Dot can communicate with users and understand their
voice commands. The Echo Dot has a 7-microphone array tucked underneath the light ring,
which enables it to pick up voices from every direction in the room. Typically, the hands-free
control is used for searching or for controlling Alexa-compatible smart appliances. Using these
features, voice commands can be sent to the Alexa Voice Service and interpreted.
3.2.1 Skill
Amazon provides the Alexa Voice Service for third-party developers so that they can make
their devices Alexa-enabled. Additionally, the Alexa Skills Kit is available for developers to
customize Alexa skills, so that Alexa can complete more complicated tasks.
A custom skill requires the following components: (1) an invocation name for Alexa to iden-
tify the skill; (2) intents that represent operations; (3) sample utterances which are sample
commands users might use; (4) cloud service to handle intents as structured requests and
send back the appropriate response. Besides the mandatory components, a useful optional
component, slot, was also applied in this project. A slot is always assigned with a slot type,
which clarifies a type of word.
All of the components above can be configured in the Alexa Developer Console. Since our
project includes several extra modules, the cloud service was configured as an external service.
The Alexa Developer Console was able to link to the cloud-based service by configuring the
endpoint. The Alexa custom skill can then be defined as two parts. Part one is the interaction
model which includes the invocation name, intents, utterances, and slots. The second part is
the cloud service that handles the request. Figure 3.9 further illustrates how the interaction
model works.
Figure 3.9: Interaction Model Process
Table 3.5 summarizes the interaction model used in this project. We
chose the invocation name “turtle one” for the simulation skill in this project. The
skill contains two intents, MoveBase and FollowIntent. Further, the MoveBase intent has a
slot dir (short for direction), which is defined as a slot type ListOfDirection. This slot type
defines four possible direction words: forward, backward, left, and right.
As a result, to use this skill users can say: “Alexa, tell turtle one to go forward.”
Skill: TurtleSim        Invocation: Turtle One

Intent   Sample Utterance    Slot   Slot Value
Move     Move {dir}          dir    forward
         Turn {dir}                 backward
         Go {dir}                   left
         Move {dir}                 right
Follow   Follow {tag}        tag    marker
         Find {tag}                 tag
         {quit} following    quit   quit
         {quit} find                stop
Table 3.5: Interaction Model Configuration
The main purpose of this interaction model is to help Alexa map the voice commands from
the user to the correct intent so that the cloud computing process can handle the requests
sent from Alexa correctly. When the above example command is received by Alexa, two
keywords are processed: “turtle one” and “forward”. When Alexa recognizes “turtle
one”, it sends a skill launch request. The word “forward” maps to the slot type
ListOfDirection, and then to the MoveBase intent. A MoveBase intent request along with
the slot’s value can then be sent to the cloud service. All of the requests are sent as JSON
messages. A sample JSON input is shown in Figure 4.5.
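The mapping from utterance to intent can be illustrated with a small sketch. The dictionary below follows the general layout of an Alexa interaction model (invocation name, intents with slots and sample utterances, and slot types), filled in with the MoveBase intent and the ListOfDirection slot type described above; FollowIntent is left out because its slot types are not named in the text. The resolve function is a toy pattern matcher written for this illustration only and does not reflect how Alexa performs recognition.

```python
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "turtle one",
            "intents": [
                {
                    "name": "MoveBase",
                    "slots": [{"name": "dir", "type": "ListOfDirection"}],
                    "samples": ["move {dir}", "turn {dir}", "go {dir}"],
                },
            ],
            "types": [
                {
                    "name": "ListOfDirection",
                    "values": [{"name": {"value": v}}
                               for v in ("forward", "backward", "left", "right")],
                },
            ],
        }
    }
}


def resolve(utterance, model):
    """Toy matcher: return (intent name, {slot: value}) or None.

    Assumes each intent has exactly one slot, as MoveBase does here.
    """
    language = model["interactionModel"]["languageModel"]
    slot_values = {t["name"]: {v["name"]["value"] for v in t["values"]}
                   for t in language["types"]}
    words = utterance.lower().split()
    for intent in language["intents"]:
        slot = intent["slots"][0]
        for sample in intent["samples"]:
            pattern = sample.split()
            if len(pattern) != len(words):
                continue
            found = {}
            for expected, word in zip(pattern, words):
                if expected == "{" + slot["name"] + "}":
                    if word not in slot_values[slot["type"]]:
                        break
                    found[slot["name"]] = word
                elif expected != word:
                    break
            else:
                return intent["name"], found
    return None


# The part of the command that follows the invocation name
result = resolve("go forward", interaction_model)
```

Here result pairs the MoveBase intent with the slot value "forward", which is the information the intent request carries to the cloud service.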
In order to map the keywords to the expected intent, a slot value needs to be unique across
intents. If a slot value appeared in the sample utterances of two intents, Alexa would not be
able to distinguish which intent request it should
send. In addition to the Alexa skill configuration, the Alexa Developer Console provides a
simulation page where users can test the skill without an actual Alexa-embedded device or
the Alexa app. The simulation page accepts both text and voice input. To build a connection
between the Alexa skill and the computing service, the endpoint needed to be configured to
point to an internet-accessible service.
3.2.2 Cloud-based Service
For the cloud-based service, we used another one of Amazon’s free cloud computing services,
the AWS Lambda function. AWS Lambda allows users to upload and run their codes without
provisioning or managing servers. Additionally, it supports multiple programming languages,
including Node.js, Python and Java. Thus, developers can edit their codes using the code
editor window in AWS Lambda.
The connection between AWS Lambda and the Alexa Skill is bidirectional. The Lambda function
connects to the Alexa Skill by adding an Alexa Skills Kit trigger. The trigger is linked to
one and only one Alexa Skill ID. The trigger also protects the Lambda function so that it
cannot be used by others: the function can only be invoked by the specific Alexa Skill. To
connect the Skill to the AWS Lambda function, we need to configure the skill's endpoint to
the Lambda function ARN (Amazon Resource Name).
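As an illustration of how a handler can dispatch on the request type and reject requests from other skills, consider the minimal Python sketch below. It is not the JavaScript service used in this project. The skill ID, the "Goodbye" reply, and the generic intent reply are placeholders chosen for the example.

```python
# Hypothetical skill ID; the real one is listed in the Alexa Developer Console.
EXPECTED_SKILL_ID = "amzn1.ask.skill.example"


def build_response(text, end_session=False):
    """Wrap plain text in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Reject requests from other skills, then dispatch on the request type."""
    skill_id = (
        event.get("session", {}).get("application", {}).get("applicationId")
    )
    if skill_id != EXPECTED_SKILL_ID:
        raise ValueError("Request came from an unexpected skill ID")

    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        return build_response("Where do you want to go")
    if request.get("type") == "IntentRequest":
        name = request.get("intent", {}).get("name")
        if name == "AMAZON.StopIntent":
            # Placeholder reply; the actual wording is a design choice.
            return build_response("Goodbye", end_session=True)
        return build_response("Received intent " + str(name))
    # Any other request type (for example SessionEndedRequest) ends the session.
    return build_response("", end_session=True)
```

The explicit ID check duplicates what the trigger already enforces, but it keeps the function safe when it is run outside of AWS, for example behind a proxy.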
AWS Lambda can also be used to test and debug the script using various test events. To test
the launch request handler, a launch test event first needs to be configured. Similarly,
test events are needed for the intent request handler and the stop request handler. Each
test event is a manually written JSON input. For instance, an intent request test is shown
below in Figure 3.10. After executing the test event, the Lambda function reports whether
the test passed or failed, together with the JSON output and the execution duration. It
also generates a summary of the error count and duration, as shown in Figure 3.10.
However, to test all of the possible requests, users need to write out every possible JSON
input by hand. In our case, at least eight test events are needed, which is time-consuming
and inconvenient. Therefore, another server, a BST proxy Lambda server, was also used.
Bespoken Proxy (bst proxy) is a tool from Bespoken, LLC. This tool allows users to
communicate with a local service running on their machine via an Alexa device, the Alexa
simulator, or the Alexa app. The proxy lambda command can run a Lambda function as a local
service.
(a) Error summary from Lambda (b) Duration summary from Lambda
Figure 3.10: CloudWatch summary from AWS Lambda
To use the BST proxy server, the Skill endpoint needs to be configured to the generated
public URL for accessing the local service. The URL remains the same for one IP address.
Testing and debugging of code can be easily done via the Amazon Developer Console’s
simulator. If an error occurs, the simulation page will not display the designed response,
and the error message will be shown in the terminal. Instead of configuring all possible
JSON inputs, we can let Alexa generate the JSON input from our text or voice command
input.
3.3 Marvin
Marvin is a remote server used for controlling the Fetch. Although Fetch has a built-in com-
puter to send control commands, it is dangerous to have wires like Ethernet cable connected
to the robot while moving the base or arm. In order to accomplish the detect and follow
AR-tag task, Fetch needed to be able to move without any restrictions. Therefore, Marvin
was used for sending commands and receiving messages to and from Fetch.
For the best compatibility, Marvin runs the same version of Ubuntu and ROS as the robot,
which in this case was Ubuntu 14.04 and ROS Indigo. This can reduce the risk of having
conflicts between the robot and the remote server.
The connection between Marvin and Fetch can be built via the ROS_MASTER_URI. We
set the ROS_MASTER_URI of Marvin to be Fetch’s IP address, so that all the ROS related
data from Fetch will be sent to Marvin automatically. One of the benefits of this connection
is that there is no noticeable latency when using ROS tools such as RVIZ or rqt_graph.
However, ROS_MASTER_URI only allows us to receive data; we cannot send control commands
or messages to Fetch this way. To solve this problem, SSH is used. SSH is a network
protocol that enables secure machine-to-machine communication by connecting to a remote
host for one terminal session. Running the SSH command temporarily turns the current
terminal into a terminal on the destination machine, so that we can run a program or edit
code on Fetch. One reason we need SSH is to build a bridge between the robot and Alexa:
if the bridge were built on Marvin, Alexa could only communicate with Marvin instead of
Fetch. The use of SSH is explained in more detail in the following section.
Besides acting as the remote server, Marvin also runs the local computing service for
Alexa’s requests. As mentioned in the Alexa section, the BST proxy tool allows us to run a
local service as a cloud service. Instead of using Amazon Lambda as a cloud service, we
chose to run Lambda as a local service on Marvin via the BST proxy, because Lambda could
not provide a stable connection between the ROS WebSocket server and the Alexa Skill. Along
with the BST proxy server, the Alexa simulator can be used for testing and debugging. The
error messages, as well as the JSON messages, are printed directly in the terminal.
3.4 Communication
Since multiple devices and programming languages are used, communication among them
is one of the most essential parts of this project. Figure 3.11 shows a schematic
detailing how the messages are communicated between devices. The communication between
machines is built via SSH, as illustrated in the figure. The communication between the
devices and the virtual assistant is built via ROS packages and the cloud server.
Figure 3.11: Device Communication for Alexa Controlled Robot
3.4.1 Machines
In this project, the robot could be controlled and monitored by Marvin through SSH.
However, if Marvin receives a large number of messages from the robot over SSH, for example
while using RVIZ to view image data from the camera or laser scan data from the LiDAR,
RVIZ shows a latency of approximately 3 seconds. Consequently, the visualization of ROS
messages is done through ROS_MASTER_URI. ROS tools like RVIZ can then be opened directly
without SSH.
SSH is mainly used when we need to run a new program or start a new ROS node. There
are two new nodes that we needed to start on Fetch. One is the node that executes the
detection and following of the AR tag, /ar_follower. The other node is used to build a
bridge for the transfer of messages between Alexa and Fetch, /rosbridge_websocket. In
addition, when a test was finished, a control node, /teleop_twist_keyboard, was started so
that the robot could be driven back to the starting position with a keyboard. Another use
of SSH was to modify the code remotely. If an error occurred when a script was run
remotely, the code could be edited immediately.
It is important to note that this communication is not bidirectional. Marvin can send
commands to the robot as well as retrieve messages from the robot. However, the robot
cannot do the same thing because Marvin is not configured to be SSH-enabled.
3.4.2 Devices, ROS and Alexa
For smooth communication among devices, several message conversions and transfers need to
be done, because both ROS and Alexa only accept certain types of messages. One of the
message conversions can be done in the ROS package ar_track_alvar. In ROS, images from the
sensor are sent to a ROS hardware driver and passed around in the ROS message format
sensor_msgs/Image. Unfortunately, the ROS image message is not convenient for image
processing. Thus, the ROS package ar_track_alvar is used to convert the ROS message,
sensor_msgs/Image, to an OpenCV image, cv::Mat, using the package cv_bridge.
The sensor_msgs/Image will be published to a CvBridge node and converted to an OpenCV
image [2]. The new image message will then be republished over ROS. This conversion is
bidirectional. Figure 3.12 shows the message conversion flow.
Figure 3.12: Image message conversion
Other than image message conversion, the project also considers the JSON message
conversion for interacting with Alexa. The request sent to the service and the response
sent back from the service are in JSON format, which is not readable for ROS without
conversion. Fortunately, ROS provides a library, rosbridge_library, to do the transfer
between JSON strings and ROS messages [18]. Although rosbridge_library handles the
conversion between JSON messages and ROS messages, it leaves the transport layer to another
package, rosbridge_server. This package provides a WebSocket as a transport layer, which
has low latency and allows bidirectional communication. Although Amazon Lambda supports
several programming languages, BST proxy only supports JavaScript. Consequently, the
cloud service for the Alexa Skill is written in JavaScript, and another ROS package,
roslibjs, is needed for the cloud service to communicate with ROS. This relationship is
shown in Figure 3.13.
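To make the JSON side of this bridge concrete, the sketch below builds the kind of message that a rosbridge client sends over the WebSocket to publish a velocity command. It follows the rosbridge protocol's "publish" operation as we understand it; the helper name and the example values are assumptions, and the project itself used roslibjs in JavaScript rather than Python.

```python
import json


def twist_publish_message(topic, linear_x, angular_z):
    """Build a rosbridge 'publish' operation carrying a geometry_msgs/Twist."""
    return json.dumps({
        "op": "publish",
        "topic": topic,
        "msg": {
            "linear": {"x": linear_x, "y": 0.0, "z": 0.0},
            "angular": {"x": 0.0, "y": 0.0, "z": angular_z},
        },
    })


if __name__ == "__main__":
    # Example: drive forward on the base velocity topic.
    print(twist_publish_message("/cmd_vel", 1.0, 0.0))
```

On the ROS side, rosbridge_library turns the "msg" field into a ROS message and publishes it on the named topic, so the cloud service never needs to handle ROS types directly.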
Figure 3.13: JSON message conversion
3.5 Summary
In this chapter, the materials involved in this project were introduced. The main devices
that were discussed include the robot, Fetch; the speech recognition engine, Alexa; and the
remote server computer, Marvin. In addition, how these materials are connected and
communicate with each other was explained. The messages were converted using ROS packages,
and the devices were connected via the SSH protocol and a WebSocket transport layer so that
they can communicate with each other. The communication among the devices was tested, and
the results are presented in the next chapter.
Chapter 4
Results
This chapter focuses on the results from both simulations and experimental testing with the
Fetch robot. First, the performance of the SR300 was evaluated. We then simulated the
follow and speech recognition tasks on the remote server. The last results presented
demonstrate the voice control tests on the real robot.
4.1 Camera
Since the camera plays a key role in completing the detect and follow functions, the first
step was to test the performance of the new camera, the SR300. Using the ROS package
ar_track_alvar, multiple AR tags could be detected. Their relative positions can then
be visualized in RVIZ. The camera image and the visualized transform frames are presented
in Figure 4.1.
(a) Multiple tag detection with RGB image (b) Visualized TF for multiple tag detection
Figure 4.1: AR Tag Detection
To test the performance of the SR300, we used a moving AR tag held by a person while we
recorded the marker messages. The tag was moved randomly within the FoV, mainly changing
its position in the x and y directions within the approximate sensor range of the two
camera models. Since the position of the tag was controlled by a human, we were not able to
hold the speed at a fixed value. We tried to move the tag steadily, which made it easier to
eliminate the outliers from the raw data. When the data had a sudden jump in position in
any direction, it was considered a wrong detection. If there were empty marker messages,
the camera did not detect any AR tag. The following table reports the efficiency of the
SR300. Since the Intel camera package allows developers to choose the launch mode of the
camera, both RGB-D mode and Kinect type mode were tested. The efficiency of the original
camera model was tested with a new camera at the end of the project. Since the PrimeSense
Carmine 1.09 originally launched with PointCloud2, we only evaluated the Kinect type mode.
Table 4.1: Data loss ratio under different situations
Method  Background  Camera Launch Type  Total Data  Empty Data  Ratio
Track   Noisy       Kinect Type         26914       22428       83.33%
Track   Noisy       RGBD                26823       9515        35.47%
Track   White       Kinect Type         26989       26827       99.40%
Track   White       RGBD                26827       542         2.02%
Follow  White       Kinect Type         27003       25565       94.67%
Follow  White       RGBD                26999       178         0.66%
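The ratio column is the share of empty marker messages among all recorded messages. The short Python sketch below recomputes it from the counts in Table 4.1; the function and variable names are ours and are only meant as an illustration.

```python
# (method, background, launch type, total messages, empty messages) from Table 4.1.
RUNS = [
    ("Track", "Noisy", "Kinect Type", 26914, 22428),
    ("Track", "Noisy", "RGBD", 26823, 9515),
    ("Track", "White", "Kinect Type", 26989, 26827),
    ("Track", "White", "RGBD", 26827, 542),
    ("Follow", "White", "Kinect Type", 27003, 25565),
    ("Follow", "White", "RGBD", 26999, 178),
]


def empty_ratio(total, empty):
    """Percentage of marker messages that contained no detected tag."""
    return 100.0 * empty / total


if __name__ == "__main__":
    for method, background, mode, total, empty in RUNS:
        ratio = empty_ratio(total, empty)
        print(f"{method:6s} {background:5s} {mode:11s} {ratio:6.2f}%")
```

The detection rate quoted in the text is the complement of this value, that is, 100% minus the empty data ratio.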
Table 4.1 shows that the SR300 in RGBD mode provided a reliable tracking function. When
there was a clear white background, the AR tag was visible for 97.98% of the time (an empty
data ratio of 2.02%). When the AR tag was placed against a noisy background, which is the
norm in a lab environment, the detection efficiency was 33.45 percentage points lower. Both
experiments show acceptable results. However, with the Kinect type mode, the camera data
loss reached 99.40% and 83.33% for the white and noisy backgrounds, respectively. Meanwhile,
the original camera model had an 8.33% data loss over 15 minutes of tracking. In addition
to detection alone, the detection performance while following was also verified. The data
loss reached its minimum during the following task.
The Kinect type mode had a large offset between the RGB image and the depth image, which
is used to create the distance data in the PointCloud Library, and therefore we were unable
to trust the recorded data. To improve detection, we attached the AR tag to a slightly
larger sheet of white paper and found that the detection efficiency improved: the empty
data ratio dropped from 35.47% to 2.02% (Table 4.1). The white paper forms a small area of
white background around the target tag, which creates a clear contrast with the edge of the
AR tag. Accordingly, it is easier for the camera to detect the corners and blocks of the
tag. However, since the tag is held by a person and moves in arbitrary directions, it
sometimes moves out of the FoV of the camera, which causes most of the data loss. During
the following test, the robot moves with the target, so the tag is less likely to leave the
view because the robot adjusts its position to keep a certain distance from the tag. As a
result, the detection efficiency increases for the following task. Figures 4.2a and 4.2b
show the detected marker positions, where x is the distance from the tag to the camera,
and y and z are the horizontal and vertical distances from the center of the FoV,
respectively. Based on the position change of the marker, the moving speed could be
calculated. The peaks in Figures 4.2c and 4.2d indicate the empty marker messages.
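One way to obtain the moving speed from the recorded positions is to divide the distance between consecutive detections by the elapsed time. The sketch below illustrates this under stated assumptions: samples are (time, position) pairs, an empty marker message is represented by None, and empty messages are skipped. How empty messages were treated in the recorded plots is not reproduced here.

```python
import math


def marker_speeds(samples):
    """Return (time, speed) pairs from consecutive valid detections.

    samples: list of (t, position), where position is an (x, y, z) tuple in
    meters or None when the marker message was empty (an assumed encoding).
    """
    speeds = []
    previous = None
    for t, position in samples:
        if position is None:
            continue  # empty marker message: no tag detected at this time
        if previous is not None:
            dt = t - previous[0]
            if dt > 0:
                distance = math.sqrt(
                    sum((a - b) ** 2 for a, b in zip(position, previous[1]))
                )
                speeds.append((t, distance / dt))
        previous = (t, position)
    return speeds


if __name__ == "__main__":
    demo = [(0.0, (1.0, 0.0, 0.0)), (1.0, (1.0, 3.0, 4.0)), (2.0, None)]
    print(marker_speeds(demo))
```

A speed far above what a hand-held tag can reach is a simple indicator of the sudden position jumps that were treated as wrong detections.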
According to the result above, we can conclude that the Intel SR300 can be used as an
appropriate substitute for the PrimeSense Carmine 1.09.
(a) Marker position from SR300 (b) Marker position from Carmine 1.09
(c) Marker moving speed from SR300 (d) Marker moving speed from Carmine 1.09
Figure 4.2: Compared camera performance
4.2 Simulation
As we mentioned above, Fetch is a mobile robot that can only be used on a flat surface. In
addition, the weight of Fetch is 250 lbs (113.3 kg) [65]. It would be very dangerous if
Fetch hit someone in the room or drove down a staircase by accident. Due to these safety
concerns, it was first necessary to simulate results prior to conducting our experimental
tests.
The first simulation was to track and follow an AR tag using turtlesim. The simulated turtle
moved based on the position of the detected marker. Figure A.5 displays the simulation
for track and follow.
Figure 4.3: Turtle trajectory for AR tag following
We recorded the actual command velocity published to /turtle1/cmd_vel, which controls the
movement of turtlesim, as well as the AR tag position published to /ar_pose_marker. The
expected trajectory was computed based on 3.7. The actual and expected trajectories are
compared in Figure 4.3. The two trajectories largely overlap, with a slight misalignment.
The misalignment was caused by the mismatched update frequencies of the camera and the
control topic. The camera frequency fluctuated between 25 Hz and 30 Hz, while the update
rate of /turtle1/cmd_vel was set to a constant 30 Hz. Overall, the turtle was able to
follow the AR tag as expected based on the detected relative position.
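For readers who want to reproduce a trajectory from recorded velocity commands, a common approach is to integrate the commands with a simple planar motion model. The sketch below uses the standard unicycle model with forward Euler steps; it is an assumed stand-in for illustration and is not necessarily the formulation of Chapter 3.

```python
import math


def integrate_cmd_vel(commands, dt, x=0.0, y=0.0, theta=0.0):
    """Integrate (linear_x, angular_z) commands into a list of (x, y, theta).

    dt is the assumed constant period between commands, for example 1/30 s
    for a 30 Hz topic.
    """
    path = [(x, y, theta)]
    for linear_x, angular_z in commands:
        x += linear_x * math.cos(theta) * dt
        y += linear_x * math.sin(theta) * dt
        theta += angular_z * dt
        path.append((x, y, theta))
    return path


if __name__ == "__main__":
    # One second of driving straight at 1 unit/s, sampled at 30 Hz.
    final_pose = integrate_cmd_vel([(1.0, 0.0)] * 30, dt=1.0 / 30.0)[-1]
    print(final_pose)
```

Because the integration assumes a constant period, any fluctuation in the real message rate shows up as a small offset between the integrated path and the recorded one.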
In addition to the following simulation, turtlesim was also used to conduct simulations
that evaluated the ROS and Alexa connection as well as the speech recognition tasks.
Similar to the methods used earlier, we used the turtle to imitate the movement of a real
robot. The Alexa developer console test page was used for sending commands. If ROS and
Alexa were not connected, the Alexa simulation page still sent back the designed message,
yet the turtle did not perform the task. Only when the ROS and Alexa connection was
established did the turtle perform the requested mission, such as moving in a direction or
following the AR tag.
When using the AWS Lambda function as the cloud server, the connection between ROS and
Alexa did not work well. During the event test, the console reported that the rosbridge was
not connected even though the rosbridge_server had been launched.
The turtle simulation result using AWS Lambda is displayed in Table 4.2. The actual
simulation window can be found in Appendix A (A.1). Alexa received the correct message but
the turtle did not move as expected.
Table 4.2: Turtle Simulation Response with AWS Lambda
User Command                      Alexa Response                               Rostopic linear  Rostopic angular
Alexa start turtle one            Where do you want to go                      NaN              NaN
Alexa tell turtle one go forward  Moving Forward … Please say another command  NaN              NaN
Alexa tell turtle one turn left   Turning left … Please say another command    NaN              NaN
As a result, we decided to use the bst proxy server as the “cloud” server. The simulation
results with the responses from the Alexa simulator and the ROS topic /turtle1/cmd_vel
are listed in Table 4.3. The actual simulation window using bst proxy is shown in Figure
A.2. As shown in Figure A.2, the trajectory of the turtle is consistent with the text input
commands. To distinguish the forward and backward commands, the forward command is set to
move one step while the backward command is set to move two steps.
Table 4.3: Turtle Simulation Response with BST proxy
User Command                       Alexa Response           ROS Topic linear  ROS Topic angular
Alexa start turtle one             Where do you want to go  NaN               NaN
Alexa tell turtle one go backward  Going backward           {-2, 0, 0}        {0, 0, 0}
Alexa tell turtle one go forward   Going forward            {1, 0, 0}         {0, 0, 0}
Alexa turn left                    Turning left             {0, 0, 0}         {0, 0, 1.6}
Alexa go forward                   Going forward            {1, 0, 0}         {0, 0, 0}
Alexa turn right                   Turning right            {0, 0, 0}         {0, 0, -1.5}
The simulation of speech recognition indicates that the bst proxy server provides a stable
connection between ROS and Alexa. Alexa receives and interprets the command correctly
when a proper command is sent. After completing the simulation, the algorithm and the
Alexa voice control were ready to be tested on Fetch.
4.3 Robot Test
After validating our process through simulations, we were able to conduct experiments
on the robot. Before we tested the voice control with Alexa, we first made sure that
the track and follow task could be performed as expected. After applying the designed
algorithm, the expected geometry_msgs/Twist messages were computed from the marker position
data. The result was then compared with the actual messages that were published to
/cmd_vel, as shown in Figure 4.4.
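When applying the RMSE equation 3.10, the computational result is shown in Equation 4.1.
\[
RMSE_{cmd} =
\begin{bmatrix} RMSE_{linear.x} \\ RMSE_{angular.z} \end{bmatrix}
=
\begin{bmatrix} 1.5402 \times 10^{-13} \\ 4.202 \times 10^{-13} \end{bmatrix}
\tag{4.1}
\]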
Figure 4.4: Output comparison
The RMSE values in Equation 4.1 are close to zero. The error might come from the mismatched
time steps. The /cmd_vel topic update frequency is set to 60 Hz, but the marker position
topic’s rate ranges from 25 Hz to 30 Hz. To make the expected and the actual messages
comparable, the actual movement data is downsampled to half of the original sample size.
Since the error was very small, we can conclude that the following task was performed as
expected.
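The comparison can be sketched in a few lines of Python. We assume here the standard root-mean-square error definition and a downsampling that keeps every second sample; both are illustrative choices, and the function names are ours.

```python
import math


def downsample(values, factor=2):
    """Keep every `factor`-th sample, halving the series when factor is 2."""
    return values[::factor]


def rmse(expected, actual):
    """Root-mean-square error between two equally long sequences."""
    if len(expected) != len(actual) or not expected:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((e - a) ** 2 for e, a in zip(expected, actual))
    return math.sqrt(squared / len(expected))


if __name__ == "__main__":
    # Hypothetical data: commands logged at twice the rate of the expected series.
    actual_60hz = [0.10, 0.10, 0.20, 0.20, 0.30, 0.30]
    expected_30hz = [0.10, 0.20, 0.30]
    print(rmse(expected_30hz, downsample(actual_60hz)))
```

The same function can be applied separately to the linear and angular components to obtain the two entries of the error vector.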
After completing the track and follow test, we moved on to the voice control test. The skill
was activated correctly with the invocation name. The four direction commands, as well as
the follow and stop following commands, worked well via text input in the Alexa simulation
page. Figure 4.5 shows simplified JSON input and output. The complete JSON input and
output can be found in Figures A.3 and A.4 in Appendix A.
Input:
"request": {
  "type": "IntentRequest",
  "intent": {
    "name": "MessageIntent",
    "slots": {
      "item": {
        "name": "Item",
        "value": "forward"
      }
    }
  }
}
Output:
"response": {
  "outputSpeech": {
    "type": "SSML",
    "ssml": "<speak> Going forward</speak>"
  },
  "shouldEndSession": false
}
Figure 4.5: Example JSON input and output
In addition to testing the designed commands, the default commands to exit the Alexa skill
were also tested. Since the messages need to be converted multiple times, the response
times could be affected. According to the device log in the Amazon Developer Console test
page, we estimated the response time for the four request types, which are presented in
Table 4.4.
Table 4.4: Process time for Alexa intent requests
Intent    Sample command                   Process Time (ms)
Launch    Alexa start Fetch                934.69
MoveBase  Alexa ask Fetch move forward     867.77
Follow    Alexa tell Fetch follow the tag  920.53
Stop      Stop                             930.29
The sample device log is shown in Appendix A, Figure A.6. Because the speech time varies
with the designed response output, we only consider the response time from
TextMessage to RequestProcessingCompleted. The process time was then determined as the
average over 15 requests. All four types of intent requests took less than 1 second to
process, which means the robot would be able to react within 1 second after receiving the
command.
The final purpose of the project was to send commands via voice instead of text. Therefore,
the voice command using the Amazon Echo Dot was verified. Unfortunately, since we used
the BST proxy as the local server, we were unable to receive the device log from AWS.
Instead of the response time, the correctness of speech recognition was tested. Of the 40
examined voice commands, 5 were launch requests, 3 were stop requests, 16 were movement
requests, and the other 16 were tag-following requests. After completing the study, we
found that all of the launch and stop requests were interpreted correctly. The movement
commands had only one failure case, where Alexa did not recognize the direction in the
command and exited the skill. Half of the follow requests failed. Based on Alexa’s
response, the word “follow” was recognized, but Alexa interpreted the “tag” or “AR tag” as
a social media hashtag or account and intended to follow that account, instead of letting
the robot complete the desired track and follow task. This situation can be improved if the
command is said together with the invocation name of the skill, for example, “Alexa tell
fetch to follow tag”. Similarly, when we want to stop the following, the command should be
“Alexa tell fetch to stop following tag”.
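The counts above translate into per-category and overall recognition rates. The short sketch below computes them; the failure count of 8 for follow requests is our reading of "half" of the 16 follow requests, and the names are chosen for illustration.

```python
# (number of commands, number of failures) per request type, from the text above.
TRIALS = {
    "launch": (5, 0),
    "stop": (3, 0),
    "movement": (16, 1),
    "follow": (16, 8),  # assumes "half" means 8 of 16
}


def success_rate(total, failed):
    """Percentage of commands that were interpreted correctly."""
    return 100.0 * (total - failed) / total


if __name__ == "__main__":
    for name, (total, failed) in TRIALS.items():
        print(f"{name:8s} {success_rate(total, failed):6.2f}%")
    all_total = sum(total for total, _ in TRIALS.values())
    all_failed = sum(failed for _, failed in TRIALS.values())
    print(f"overall  {success_rate(all_total, all_failed):6.2f}%")
```

Under these assumptions, 31 of the 40 voice commands were interpreted correctly, with nearly all failures coming from the follow requests.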
4.4 Train Model Evaluation And Mapping
Even though the current following algorithm works fine, we would like to further improve
the model using a NN. After applying Algorithm 1 and running for 100 epochs, we
obtained our best trained model and a loss log for all epochs. After 100 epochs, the
training loss was reduced to 0.29958 and the validation loss was reduced to 0.30184.
Figure 4.6 shows the decrease of the training loss and the validation loss as the number
of epochs increases.
Figure 4.6: Training and validation loss for neural network training process
At the first epoch, both the training and validation losses were high, which means that the
trained model at epoch 0 underfit the problem. Underfit models usually generalize poorly
and give unreliable predictions. After epoch 50, the validation loss seemed to converge
and stayed at a certain level for the rest of the training, while the training loss
continued to decrease. This indicates that the model had a high risk of overfitting after
epoch 50. Overfitting can also lead to a loss of generalization of the model. Therefore,
our best model was found to be around epoch 50.
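Selecting the best model from a loss log can be automated by keeping the epoch with the lowest validation loss and stopping once it has not improved for a number of epochs. The sketch below shows this early-stopping rule; it is a common heuristic that we present as an assumption, not the selection procedure used in this work, and the loss values are made up.

```python
def best_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (epoch, loss) of the lowest validation loss seen before stopping.

    Scanning stops once `patience` epochs pass without an improvement larger
    than `min_delta`.
    """
    best_index, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_index, best_loss = epoch, loss
        elif epoch - best_index >= patience:
            break  # validation loss has plateaued
    return best_index, best_loss


if __name__ == "__main__":
    # Hypothetical validation losses that flatten out after a few epochs.
    log = [1.00, 0.60, 0.40, 0.31, 0.30, 0.30, 0.31, 0.30]
    print(best_epoch(log, patience=2))
```

Saving the weights whenever the best loss improves makes the selected epoch directly available at the end of training.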
After applying the best trained model to the test data, we were able to evaluate the
performance of the model. Figure 4.7 compares the actual output from the experiment with
the output from the model. A positive linear velocity indicates that the robot moved
forward, and a negative velocity means that the robot moved backward. The fit was poor in
the region where the robot had a positive linear velocity. This region is displayed in
Figure 4.7b. The actual output in this region showed poor continuity, which was most likely
noise from unexpected input. When the robot was moving forward, the AR tag was smaller in
the view. Consequently, it was easier for the robot to pick up environmental noise as
input. It was difficult to filter out this kind of noise in the designed algorithm because
the input was within the expected input range, which was the range of the camera. However,
the model computed by the neural network was able to ignore the noise and create a smoother
trajectory.
Applying Equation 3.10, the error is computed and displayed in Equation 4.2.
\[
RMSE_{cmd} =
\begin{bmatrix} RMSE_{linear.x} \\ RMSE_{angular.z} \end{bmatrix}
=
\begin{bmatrix} 0.0368 \\ 0.0383 \end{bmatrix}
\tag{4.2}
\]
From the evaluation results, we can see that the proposed NN can perform as well as the
original algorithm. In addition, the NN behaved in a more robust way with regard to dealing
with the noise and generating smoother trajectories.
(a) Training model evaluation with test data (b) Detail for bad fitting region
Figure 4.7: Training model evaluation
To achieve the final goal of the navigation system, we needed an accurate map for robot
localization in addition to a good following model. Through the application of the Karto
SLAM method described previously, the map in Figure 4.8a is determined. As can be seen
in the figure, the produced map has a noticeable drift.
(a) Map built by Karto SLAM (b) Map built by Cartographer
Figure 4.8: Maps from two SLAM methods
The map built by Cartographer is shown in Figure 4.8b. When compared with the map
from Karto SLAM, it is immediately clear that the Cartographer map has significantly less
drift. In addition to generating maps, Cartographer can generate the trajectory of the robot
while building a map. The trajectory corresponding to Figure 4.8b is shown in Figure 4.9,
where the green dot is the starting position and the red dot is the stopping position.
At the beginning of the map generating process, Cartographer also has some drift, as shown
in Figure 4.10. As shown in the figure, both graphs start at the same position and follow
the same trajectory at the beginning, but the trajectory starts to drift after a certain
time. Instead of generating a map with two drifted loops as displayed in Figure 4.8a,
Cartographer closes the loop based on laser scan matching and constraints.
Based on the output maps from the two SLAM tools, we chose Cartographer to generate the
map for robot localization.
Figure 4.9: Map built by Cartographer with trajectory
Figure 4.10: Trajectory for one loop and two loops
4.5 Summary
In this chapter, the performance of the camera was validated. The Intel SR300 was able to
track the AR tag continuously using the RGB images. Therefore, the SR300 was used for all
experimental work presented in this thesis. Due to the complicated and crowded lab
environment and the potential danger of hitting people or objects in the lab, the turtlesim
tool in ROS was used for simulation. The simulation results validated that the follow
algorithm was successful. The turtle moved or rested according to the marker’s position.
The turtle simulation also confirmed that the bst proxy server offered a more stable
connection between the Alexa custom skill and ROS. The turtle's movement could be
controlled by Alexa via bst proxy. On the basis of the previous simulation, we changed the
controlled topic from /turtle1/cmd_vel to /cmd_vel, and mounted the SR300 in place of the
Carmine 1.09. By saying the appropriate commands, Fetch can be controlled to move or to
follow an AR tag. Also, a neural network algorithm was used to train the target following
model, which was validated with a small error. The model computed by the neural network
showed good robustness to the environment and generated more continuous motion commands.
After comparing different SLAM tools for map generation, we found that Cartographer had the
most stable performance. Hence, we chose Cartographer to build a map for precise robot
localization.
Chapter 5
Discussion and Future Work
There were several limitations that we encountered while completing this project which
will be discussed in more detail in this chapter. Additionally, the robot and the methods
presented in this thesis could have other possible applications. In the second section of this
chapter, the possible future applications will be explored.
5.1 Limits
One main limitation of the robot is that Fetch is designed to be an indoor robot that
uses wheels to move. Therefore, the movement of the robot on an uneven surface is
restricted. If there are stairs or holes in Fetch’s path, the robot will stop in front of
the stairs or get stuck in the holes. Also, while completing the detect and follow AR tag
test, the camera sometimes lost the target because the camera frequency is restricted to
30 Hz, which results in some data loss.
In addition to the hardware restrictions, the robot also has multiple software limitations.
The Ubuntu and ROS versions that were used are old and no longer supported; updates and
support have not been available since April 2019. In addition, many newer features are not
provided for ROS Indigo. For example, eval for roslaunch, which evaluates Python
expressions, can only be used in ROS Kinetic or later. Besides, ROS 1 has dependency
conflicts with Python 3, which greatly limits the range of applications. Although Python
2.7 has worked well so far, its support ends in January 2020. There are multiple ways to
use Python 3 with ROS 1. One of the most recommended ways is to use it in a virtual
environment, since installing Python 3 system-wide can remove all current ROS packages and
undo the work that has been done so far. File backups are essential before any update takes
place.
5.2 Future Work
5.2.1 Software Upgrade
As mentioned above, Fetch currently has numerous limitations, most of which come from
outdated software. Therefore, one of the biggest jobs is to upgrade the software for future
use. All important files need to be backed up, since the update from Ubuntu 14.04 + ROS
Indigo to Ubuntu 18.04 + ROS Melodic is not supported by Fetch Robotics, Inc. There is no
direct way to upgrade from Ubuntu 14.04 to 18.04: we first need to upgrade to Ubuntu 16.04
and then to 18.04. This process is time-consuming and its result cannot be guaranteed. In
addition, this upgrade will not fix the compatibility issue with Python 3. Currently,
ROS 2 supports Python 3, but ROS 2 is not completely ready for use.
5.2.2 Continued Navigation Development
After upgrading the operating system, we will be able to continue developing an intelligent
navigation system. As a first approach to the final goal, we let the robot learn human
following behaviors. Instead of an answer computed by a given program, the example solution
was given by human control commands: the robot was controlled by a human with a joystick to
follow the target. Similarly, the experimental data was used for NN training. In this case,
we expected the robot to imitate human navigation behavior without a given algorithm. It is
also important to note that obstacle avoidance during following is not currently
considered. Implementing obstacle avoidance will be necessary in future work to enable
robot-human coexistence in various environments. It will help the robot assistant behave
more like a human assistant.
Furthermore, the map information can be integrated with the navigation system and Alexa
control. If the robot knows where it is on the map, it can predict the possible location of
the target it should follow. In addition, the robot can be used for object delivery with the
map. By enabling navigation with Alexa, we can send the desired destination for the robot
using voice commands.
5.2.3 Other Hardware
In addition to navigation, applications of other hardware can be developed in the future.
First of all, the manipulator was not used in this project at all. The manipulator is one of
the most useful features of Fetch. The arm can reach higher than 180 cm and ± 90◦ from
the center, and it is designed to support up to 6 kg. It can be used to pick up objects
or simply to hold certain objects. The robot can be expected to help elderly individuals
with problems in their daily lives. For example, if the user needs assistance with picking
up items around the home, the manipulator will be needed.
Chapter 6
Conclusions
The IoT offers an enormous number of possibilities as the internet continues to develop. As
one of the most important components of the IoT, smart speakers are becoming more and more
popular. In addition to playing music, smart speakers also act as virtual assistants, which
understand certain voice commands and take actions based on those commands, such as checking
the weather, setting timers or alarms, and turning lights on and off. However, the IoT today
is limited in mobility: the devices that a VA can control are usually stationary. We believe
that the emerging applications of the IoT in the field of robotics will be able to break
down those limitations. This belief is rooted in the idea that robots can be designed to
have more diverse and numerous functions than typical household devices. For example, if
a robot is equipped with wheels or legs, it can move and complete tasks around
a home or workplace. If a robot is equipped with arms and grippers, it can pick up
or hold items. If a robot is equipped with a laser, it can tell us how far away an obstacle is.
In this project, we successfully connected a mobile robot, Fetch, to an IVA, Alexa.
The IVA was able to understand the given commands and control the robot accordingly.
The robot could be directed to move in a given direction. It could start detecting
an AR tag without following it, and it could also follow the AR tag while maintaining a
certain distance. Based on the hand-engineered algorithm, we applied a data-driven neural
network algorithm to develop a similar following model. The neural network architecture can
be further improved for human navigation behavior imitation. The model trained with the
neural network performed as well as the original algorithm. Moreover, it is less sensitive
to noise and manages to compute more continuous control commands. In order to generate a map
for robot localization, two SLAM tools were tested and compared. Cartographer has barely any
drift and provides 3D SLAM if needed, while Karto SLAM has larger drift, and that drift is
not eliminated by loop closure.
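The behavior of following an AR tag while maintaining a certain distance can be illustrated
with a proportional controller. This is a minimal sketch under stated assumptions, not the
hand-engineered algorithm used in this project: the gains, the 1.0 m standoff distance, the
velocity limits, and the function name `follow_command` are hypothetical, and the tag
position is assumed to be given in the robot frame with x pointing forward and y to the left.

```python
import math

def clamp(value, limit):
    """Limit value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))

def follow_command(tag_x, tag_y, standoff=1.0, k_lin=0.5, k_ang=1.0,
                   max_lin=0.5, max_ang=1.0):
    """Return (linear, angular) velocity that keeps `standoff` meters from the tag.

    tag_x, tag_y: tag position in the robot frame (x forward, y left), in meters.
    All default gains and limits are illustrative assumptions.
    """
    distance = math.hypot(tag_x, tag_y)
    bearing = math.atan2(tag_y, tag_x)   # positive when the tag is to the left
    linear = clamp(k_lin * (distance - standoff), max_lin)
    angular = clamp(k_ang * bearing, max_ang)
    return linear, angular
```

When the tag is exactly at the standoff distance and straight ahead, both commands are zero.
A tag that is farther away produces forward motion up to the speed limit, and a tag that is
too close produces a negative linear command so that the robot backs away.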
Bibliography
[1] Primesense 3d sensors.
[2] Wiki: cvbridge. URL http://wiki.ros.org/cv_bridge.
[3] Wiki:distributions. URL http://wiki.ros.org/Distributions.
[4] Fetch robotics. URL
https://fetchrobotics.com/robotics-platforms/fetch-mobile-manipulator/.
[5] primesense carmine 1.09. URL http://xtionprolive.com/primesense-carmine-1.09.
[6] Intel sr 300. URL
https://click.intel.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/p/s/ps-blasterx_senz3d_front_1.png.
[7] Number of connected devices reached 22 billion, where is the revenue?, May 2019. URL
https://www.helpnetsecurity.com/2019/05/23/connected-devices-growth/.
[8] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig
Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat,
Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal
Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat
Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens,
Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay
Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin
Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on
heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available
from tensorflow.org.
[9] Raafat Aburukba, AR Al-Ali, Nourhan Kandil, and Diala AbuDamis. Configurable
zigbee-based control system for people with multiple disabilities in smart homes. In 2016
International Conference on Industrial Informatics and Computer Systems (CIICS),
pages 1–5. IEEE, 2016.
[10] Tawfiq Ammari, Jofish Kaye, Janice Y Tsai, and Frank Bentley. Music, search, and
iot: How people (really) use voice assistants. ACM Transactions on Computer-Human
Interaction (TOCHI), 26(3):17, 2019.
[11] Roger Bemelmans, Gert Jan Gelderblom, Pieter Jonker, and Luc De Witte. Socially as-
sistive robots in elderly care: A systematic review into effects and effectiveness. Journal
of the American Medical Directors Association, 13(2):114–120, 2012.
[12] Alessio Botta, Walter De Donato, Valerio Persico, and Antonio Pescapé. Integration of
cloud computing and internet of things: a survey. Future generation computer systems,
56:684–700, 2016.
[13] Monica Carfagni, Rocco Furferi, Lapo Governi, Michaela Servi, Francesca Uccheddu,
and Yary Volpe. On the performance of the intel sr300 depth camera: metrological and
critical characterization. IEEE Sensors Journal, 17(14):4508–4519, 2017.
[14] Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P How. Socially aware motion
planning with deep reinforcement learning. In 2017 IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS), pages 1343–1350. IEEE, 2017.
[15] François Chollet et al. Keras. https://keras.io, 2015.
[16] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and
Pavel Kuksa. Natural language processing (almost) from scratch. Journal of machine
learning research, 12(Aug):2493–2537, 2011.
[17] Intel Corporation. Intel realsense camera sr300 data sheet revision 1.0, 2016.
[18] Christopher Crick, Graylin Jay, Sarah Osentoski, Benjamin Pitzer, and Odest Chad-
wicke Jenkins. Rosbridge: Ros for non-ros users. In Robotics Research, pages 493–504.
Springer, 2017.
[19] Craig C Douglas and Robert A Lodder. Human identification and localization by robots
in collaborative environments. Procedia Computer Science, 108:1602–1611, 2017.
[20] Esther Calvo Fernández, José Manuel Cordero, George Vouros, Nikos Pelekis,
Theocharis Kravaris, Harris Georgiou, Georg Fuchs, Natalya Andrienko, Gennady An-
drienko, Enrique Casado, et al. Dart: a machine-learning approach to trajectory predic-
tion and demand-capacity balancing. SESAR Innovation Days, Belgrade, pages 28–30,
2017.
[21] Gonzalo Ferrer, Anais Garrell, and Alberto Sanfeliu. Robot companion: A social-force
based approach with human awareness-navigation in crowded environments. In 2013
IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1688–
1694. IEEE, 2013.
[22] Rafael Fierro and Frank L Lewis. Control of a nonholonomic mobile robot using neural
networks. IEEE transactions on neural networks, 9(4):589–600, 1998.
[23] Maksim Filipenko and Ilya Afanasyev. Comparison of various slam systems for mobile
robot in an indoor environment. In 2018 International Conference on Intelligent Systems
(IS), pages 400–407. IEEE, 2018.
[24] Nicolas Gautier, J-L Aider, THOMAS Duriez, BR Noack, Marc Segond, and Markus
Abel. Closed-loop separation control using machine learning. Journal of Fluid Mechan-
ics, 770:442–457, 2015.
[25] Jayavardhana Gubbi, Rajkumar Buyya, Slaven Marusic, and Marimuthu Palaniswami.
Internet of things (iot): A vision, architectural elements, and future directions. Future
generation computer systems, 29(7):1645–1660, 2013.
[26] Xavier Harding. The ’pokémon go’ improved ar mode is now on iphone and android
- here’s how to use it, Oct 2018. URL
https://www.mic.com/articles/191915/pokemon-go-improved-ar-mode-iphone-android.
[27] Wolfgang Hess, Damon Kohler, Holger Rapp, and Daniel Andor. Real-time loop closure
in 2d lidar slam. In 2016 IEEE International Conference on Robotics and Automation
(ICRA), pages 1271–1278. IEEE, 2016.
[28] Alejandro Hidalgo-Paniagua, Andrés Millan-Alcaide, Juan P Bandera, and Antonio
Bandera. Integration of the alexa assistant as a voice interface for robotics platforms.
In Iberian Robotics conference, pages 575–586. Springer, 2019.
[29] Pengju Jin, Pyry Matikainen, and Siddhartha S Srinivasa. Sensor fusion for fiducial tags:
Highly robust pose estimation from single frame rgbd. In 2017 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), pages 5770–5776. IEEE, 2017.
[30] Jan Jungbluth, Rolf Krieger, Wolfgang Gerke, and Peter Plapper. Combining virtual
and robot assistants-a case study about integrating amazon’s alexa as a voice interface
in robotics. In Robotix-Academy Conference for Industrial Robotics (RACIR) 2018,
page 5. Shaker, 2018.
[31] Niklas Karlsson, Enrico Di Bernardo, Jim Ostrowski, Luis Goncalves, Paolo Pirjanian,
and Mario E Munich. The vslam algorithm for robust localization and mapping. In
Proceedings of the 2005 IEEE international conference on robotics and automation,
pages 24–29. IEEE, 2005.
[32] Veton Kepuska and Gamal Bohouta. Next-generation of virtual personal assistants
(microsoft cortana, apple siri, amazon alexa and google home). In 2018 IEEE 8th
Annual Computing and Communication Workshop and Conference (CCWC), pages 99–
103. IEEE, 2018.
[33] Bret Kinsella and Ava Mutchler. smart speaker consumer adoption report 2019.
Technical report, voicebot.ai, 2019. URL
https://voicebot.ai/wp-content/uploads/2019/03/smart_speaker_consumer_adoption_report_2019.pdf.
[34] Adrian Korodi, Alexandru Codrean, Liviu Banita, Vlad Ceregan, Anamaria Butaru,
and Radu Carnaru. Object following control for wheeled mobile robots. In Proceedings
of the 9th WSEAS International Conference on Automation
and Information, pages 338–343. World Scientific and Engineering Academy and Society
(WSEAS), 2008.
[35] Sotiris B Kotsiantis, I Zaharakis, and P Pintelas. Supervised machine learning: A review
of classification techniques. Emerging artificial intelligence applications in computer
engineering, 160:3–24, 2007.
[36] Deepak Kumar, Riccardo Paccagnella, Paul Murley, Eric Hennenfent, Joshua Mason,
Adam Bates, and Michael Bailey. Emerging threats in internet of things voice services.
IEEE Security & Privacy, 2019.
[37] Ti-Chung Lee, Kai-Tai Song, Ching-Hung Lee, and Ching-Cheng Teng. Tracking con-
trol of unicycle-modeled mobile robots using a saturation feedback controller. IEEE
transactions on control systems technology, 9(2):305–318, 2001.
[38] John J Leonard and Hugh F Durrant-Whyte. Simultaneous map building and localiza-
tion for an autonomous mobile robot. In IROS, volume 3, pages 1442–1447, 1991.
[39] Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectifier nonlinearities improve
neural network acoustic models. In Proc. icml, volume 30, page 3, 2013.
[40] Friedemann Mattern and Christian Floerkemeier. From the internet of computers to
the internet of things. In From active data management to event-based systems and
more, pages 242–259. Springer, 2010.
[41] Rajdeep Kumar Nath, Rajnish Bajpai, and Himanshu Thapliyal. Iot based indoor
location detection system for smart home environment. In 2018 IEEE International
Conference on Consumer Electronics (ICCE), pages 1–3. IEEE, 2018.
[42] Scott Niekum. Ros wiki, Dec 2013. URL http://wiki.ros.org/ar_track_alvar.
[43] J Norberto Pires. Robot-by-voice: Experiments on commanding an industrial robot
using the human voice. Industrial Robot: An International Journal, 32(6):505–511,
2005.
[44] Douglas A Orr and Laura Sanchez. Alexa, did you get that? determining the evidentiary
value of data stored by the amazon® echo. Digital Investigation, 24:72–78, 2018.
[45] Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob
Wheeler, and Andrew Y Ng. Ros: an open-source robot operating system. In ICRA
workshop on open source software, volume 3, page 5. Kobe, Japan, 2009.
[46] Nathan Ratliff, Franziska Meier, Daniel Kappler, and Stefan Schaal. Doomed: Direct
online optimization of modeling errors in dynamics. Big data, 4(4):253–268, 2016.
[47] Hailin Ren, Jingyuan Qi, and Pinhas Ben-Tzvi. Learning flatness-based controller using
neural network. In ASME 2019 Dynamic Systems and Control Conference. American
Society of Mechanical Engineers Digital Collection, 2019.
[48] Luis Riazuelo, Moritz Tenorth, Daniel Di Marco, Marta Salas, Dorian Gálvez-López,
Lorenz Mösenlechner, Lars Kunze, Michael Beetz, Juan D Tardós, Luis Montano, et al.
Roboearth semantic mapping: A cloud enabled knowledge-based approach. IEEE
Transactions on Automation Science and Engineering, 12(2):432–443, 2015.
[49] Hayley Robinson, Bruce MacDonald, and Elizabeth Broadbent. The role of healthcare
robots for older people at home: A review. International Journal of Social Robotics, 6
(4):575–591, 2014.
[50] Margaret Rouse. What is internet of things (iot)? - definition from whatis.com,
Jul 2019. URL
https://internetofthingsagenda.techtarget.com/definition/Internet-of-Things-IoT.
[51] Radu Bogdan Rusu and Steve Cousins. 3d is here: Point cloud library (pcl). In 2011
IEEE international conference on robotics and automation, pages 1–4. IEEE, 2011.
[52] Alex Sciuto, Arnita Saini, Jodi Forlizzi, and Jason I Hong. Hey alexa, what’s up?: A
mixed-methods studies of in-home conversational agent usage. In Proceedings of the
2018 Designing Interactive Systems Conference, pages 857–868. ACM, 2018.
[53] Yosuke Senta, Yoshihiko Kimuro, Syuhei Takarabe, and Tsutomu Hasegawa. Ma-
chine learning approach to self-localization of mobile robots using rfid tag. In 2007
IEEE/ASME international conference on advanced intelligent mechatronics, pages 1–6.
IEEE, 2007.
[54] David Sheppard, Nick Felker, and John Schmalzel. Development of voice commands in
digital signage for improved indoor navigation using google assistant sdk. In 2019 IEEE
Sensors Applications Symposium (SAS), pages 1–5. IEEE, 2019.
[55] Rathin Chandra Shit, Suraj Sharma, Deepak Puthal, and Albert Y Zomaya. Location
of things (lot): A review and taxonomy of sensors localization in iot infrastructure.
IEEE Communications Surveys & Tutorials, 20(3):2028–2061, 2018.
[56] William D Smart and L Pack Kaelbling. Effective reinforcement learning for mobile
robots. In Proceedings 2002 IEEE International Conference on Robotics and Automation
(Cat. No. 02CH37292), volume 4, pages 3404–3410. IEEE, 2002.
[57] José A Solorio, José M Garcia-Bravo, and Brittany A Newell. Voice activated semi-
autonomous vehicle using off the shelf home automation hardware. IEEE Internet of
Things Journal, 5(6):5046–5054, 2018.
[58] John A Stankovic. Research directions for the internet of things. IEEE Internet of
Things Journal, 1(1):3–9, 2014.
[59] Chin Pei Tang. Differential flatness-based kinematic and dynamic control of a differen-
tially driven wheeled mobile robot. In 2009 IEEE International Conference on Robotics
and Biomimetics (ROBIO), pages 2267–2272. IEEE, 2009.
[60] Sebastian Thrun, Mike Montemerlo, Hendrik Dahlkamp, David Stavens, Andrei Aron,
James Diebel, Philip Fong, John Gale, Morgan Halpenny, Gabriel Hoffmann, et al.
Stanley: The robot that won the darpa grand challenge. Journal of field Robotics, 23
(9):661–692, 2006.
[61] Shrihari Vasudevan, Stefan Gächter, Viet Nguyen, and Roland Siegwart. Cognitive
maps for mobile robots—an object based approach. Robotics and Autonomous Systems,
55(5):359–371, 2007.
[62] Regis Vincent, Benson Limketkai, and Michael Eriksen. Comparison of indoor robot
localization techniques in the absence of gps. In Detection and Sensing of Mines,
Explosive Objects, and Obscured Targets XV, volume 7664, page 76641Z. International
Society for Optics and Photonics, 2010.
[63] Sen Wang, Ronald Clark, Hongkai Wen, and Niki Trigoni. Deepvo: Towards end-to-
end visual odometry with deep recurrent convolutional neural networks. In 2017 IEEE
International Conference on Robotics and Automation (ICRA), pages 2043–2050. IEEE,
2017.
[64] Mark Weiser. The computer for the 21st century. IEEE pervasive computing, 1(1):
19–25, 2002.
[65] Melonee Wise, Michael Ferguson, Derek King, Eric Diehr, and David Dymesich. Fetch
and freight: Standard platforms for service robot applications. In Workshop on au-
tonomous mobile service robots, 2016.
[66] Takashi Yoshimi, Manabu Nishiyama, Takafumi Sonoura, Hideichi Nakamoto, Seiji
Tokura, Hirokazu Sato, Fumio Ozaki, Nobuto Matsuhira, and Hiroshi Mizoguchi. De-
velopment of a person following robot with vision based target detection. In 2006
IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5286–
5291. IEEE, 2006.
Appendices
Appendix A
Alexa
A.1 Turtle Simulation
Figure A.1: Turtle simulation for Alexa via AWS Lambda
Figure A.2: Turtle simulation for Alexa control
Figure A.3: Complete JSON input
Figure A.4: Complete JSON output
Figure A.5: Turtle simulation for AR tag following
Figure A.6: Detailed device log
Appendix B
Neural Network
Figure B.1: Training Algorithm [47]