HARDWARE AND SOFTWARE SYSTEMS
FOR PERSONAL ROBOTS
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Morgan L. Quigley
December 2012
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/gq378mt7634
© 2012 by Morgan Lewis Quigley. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Andrew Ng, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
J. Kenneth Salisbury, Jr.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Pieter Abbeel
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
Robots play a major role in precision manufacturing, continually performing economi-
cally justifiable tasks with superhuman speed and reliability. In contrast, deployments
of advanced “personal” robots in home or office environments have been stymied by
difficult hardware and software challenges. Among many others, these challenges have
included cost, reliability, perceptual capability, and software interoperability. This
thesis will describe a series of hardware and software systems designed in response to
these challenges and towards the long-range goal of creating general-purpose robots
that will be useful and practical in everyday environments.
First, several low-cost robot subsystems will be described, including systems for
indoor localization, short-range object recognition, and inertial joint encoding, as
demonstrated on prototype low-cost manipulators. Next, the design of a low-cost,
highly capable robotic hand will be described in detail, which incorporates all of the
aforementioned hardware and software subsystems. Finally, the thesis will describe
a robot software system developed for the STanford AI Robot (STAIR) project, and
its evolution into the Robot Operating System (ROS), a widely used robot software
framework designed to ease collaboration between disparate research communities to
create integrative, embodied AI systems.
Acknowledgments
Countless people contributed to the work described in this thesis. In alphabetical
order, the co-authors of the publications which led to this thesis were Alan Asbeck,
Siddharth Batra, Eric Berger, Reuben Brewer, Adam Coates, Ken Conley, Josh Faust,
Tully Foote, Brian P. Gerkey, Stephen Gould, Quoc Le, Jeremy Leibs, Ellen Klingbeil,
Vijay Pradeep, Curt Salisbury, Sai P. Soundaraj, David Stavens, Sebastian Thrun,
Andrew Y. Ng, Ashley Wellman, and Rob Wheeler.
The work described in this thesis benefited enormously from the collaboration of
researchers at Willow Garage, Inc., and Sandia National Laboratories. At Stanford,
my officemates Zico Kolter, Honglak Lee, Adam Coates, Alan Asbeck, and Anya
Petrovskaya patiently answered my never-ending barrage of questions. Faculty mem-
bers including Ken Salisbury, Pieter Abbeel, Oussama Khatib, Fei-Fei Li, and Mark
Cutkosky provided critical assistance and support through many phases of this work.
And of course, none of this would have been possible without the many years of
patient guidance provided by my advisor, Andrew Y. Ng.
Contents

Abstract
Acknowledgments

1 Introduction
  1.1 Personal Robots
  1.2 Contributions
  1.3 Outline
  1.4 First Published Appearances

2 Low-cost Indoor Localization
  2.1 Introduction
  2.2 Related Work
  2.3 Approach
    2.3.1 Robotic SLAM
    2.3.2 Obtaining Training Data
    2.3.3 Camera Sensor Model
    2.3.4 WiFi Sensor Model
    2.3.5 Localization
  2.4 Results
  2.5 Summary

3 High-resolution Depth Sensing
  3.1 Introduction
  3.2 Related Work
  3.3 Laser Line Scanning for Robotics
    3.3.1 Fundamentals
    3.3.2 Hardware Considerations
    3.3.3 Calibration
  3.4 Object Detection
    3.4.1 Sliding Windows
    3.4.2 Learning the Classifiers
  3.5 Door Opening
  3.6 Inventory-Control Experiment
  3.7 Summary

4 Inertial Joint Encoding
  4.1 Introduction
  4.2 Related Work
  4.3 State Estimation
    4.3.1 Estimation via EKF
    4.3.2 Point estimates
  4.4 Calibration
  4.5 Controlling a low-cost manipulator
  4.6 Experiments
    4.6.1 PR2 Alpha State Estimation
    4.6.2 Low-cost Manipulator Torque Control
    4.6.3 PR2 Alpha Position Control
  4.7 Summary

5 A Compliant Low-cost Robotic Manipulator
  5.1 Introduction
  5.2 Related Work
  5.3 Design
    5.3.1 Actuation overview
    5.3.2 Tradeoffs of using stepper motors
    5.3.3 Distal actuation
    5.3.4 Inertia and stiffness
    5.3.5 Low-cost manufacturing
  5.4 Series Compliance
  5.5 Sensing
  5.6 Performance
  5.7 Control and Software
  5.8 Demonstration Application
  5.9 Summary

6 A Low-cost Robotic Hand
  6.1 Introduction
  6.2 Related Work
  6.3 High-level Design
    6.3.1 Robustness
    6.3.2 Actuation
    6.3.3 Hand Frame
  6.4 Sensor Suite
    6.4.1 Joint Encoding
    6.4.2 Contact Geometry
    6.4.3 Tactile Sensing
    6.4.4 Strain
    6.4.5 Visual Sensing
  6.5 Wiring Elimination
    6.5.1 Motor Wiring
    6.5.2 Phalange Wiring
    6.5.3 Interconnect Wiring
  6.6 Computational Systems
  6.7 Teleoperation Interfaces
  6.8 Summary

7 STAIR and Switchyard
  7.1 Introduction
  7.2 STAIR: Hardware Systems
    7.2.1 STAIR 1
    7.2.2 STAIR 2
  7.3 Switchyard
  7.4 Approach
    7.4.1 Message-Passing Topology
    7.4.2 Operation
  7.5 Fetch a Stapler
  7.6 Summary

8 ROS: A Robot Operating System
  8.1 Overview
  8.2 Design Goals
    8.2.1 Peer-to-Peer
    8.2.2 Multi-lingual
    8.2.3 Tools-based
    8.2.4 Thin
    8.2.5 Free and Open-Source
  8.3 Nomenclature
  8.4 Use Cases
    8.4.1 Debugging a single node
    8.4.2 Logging and playback
    8.4.3 Packaged subsystems
    8.4.4 Collaborative Development
    8.4.5 Visualization and Monitoring
    8.4.6 Composition of functionality
    8.4.7 Transformations
  8.5 Summary

9 Conclusions

Bibliography
List of Tables

2.1 Quantitative results for the tracking task
5.1 Measured properties of the manipulator
5.2 Part cost breakdown of the arm
List of Figures

2.1 Left: the STAIR robot, used to acquire maps. Right: a laser backpack, used to provide ground-truth localization of a pedestrian carrying the low-cost sensor package.
2.2 2D map of the environment used in these experiments, as produced by GMapping and the robot shown in Figure 2.1.
2.3 A typical rendering of the particle filter used to localize the “ground truth” pedestrian using rearward LIDAR. The green LIDAR scan is rendered from the most likely particle in the filter.
2.4 Example image regions corresponding to the “visual words” used during image matching.
2.5 Empirical justification of the Gaussian + uniform model of the WiFi power measurements. The plot shows the frequency of power measurement deviations from their respective means. This dataset was gathered while sitting stationary for 60 seconds, and includes 34 transmitters, most of which were observed 25 times.
2.6 Pre-computed nearest-neighbor prediction of the WiFi signal strength of a particular MAC address at any point in the environment. The walls of the environment are overlaid for clarity.
2.7 Visualization of the unified vision + WiFi localization system. Upper-left shows the particle cloud, which is overshadowed by the centroid of the particle distribution (yellow) and the ground-truth position (cyan crosshairs). Right shows the current camera image, with SURF keypoints circled. Lower-left shows the joint likelihood of the WiFi observations. Extreme lower-left visualizes the histogram of the bag-of-words representation of the image.
2.8 Pedestrian motion model, shown after a one-second integration. Without odometry, the particle filter must generate sufficient diversity in its hypotheses to handle corners.
2.9 Images from the training set (top) differed from images in the test set (bottom) due to illumination changes and typical furniture re-arranging.
2.10 The ground-truth LIDAR track of the 1-kilometer test set used for quantitative evaluation. The test set contained 62 corner turns, and a mixture of navigating tight corridors and open meeting spaces. Distances are in meters.
2.11 Histograms of localization errors during the tracking benchmark on a continuous 1-kilometer test set. Errors are measured with respect to LIDAR ground-truth. The SURF and HoG performance is for the global (1-level) spatial pyramid. Adding WiFi to SURF slightly decreases its long-term average tracking accuracy. The color histogram performs poorly.
2.12 Global localization performance. The localization systems were started with a uniform prior at 200 different starting points in the test set. Errors against ground-truth were averaged at each timestep to show the expected convergence properties of each system. All methods show improvement as more observations are incorporated. The combination of WiFi and the best visual algorithm (3-level spatial pyramid of SURF descriptors) produces the best performance.
3.1 Several off-axis views of a raw scan of a coffee mug obtained by our scanner from 1.2 meters away. The 5mm-thick handle is prominently visible. Approximately 1000 points of the scan are on the surface of the coffee mug, despite the fact that it comprises only 5% of the horizontal field-of-view of the scan.
3.2 Clutter makes scene understanding from only 2D visual images difficult, even in a relatively simple office environment, as many of the strong edges are not those which suggest the 3D structure of the scene.
3.3 A vertical (green) laser line projected by the robot at left is deformed as it strikes objects in the scene.
3.4 A prototype laser line scanner on the STAIR 1 robot. The laser and its rotary stage are mounted in the upper-right. Images are captured by the camera in the lower-left.
3.5 Image channels considered by the patch-selection algorithm, along with the typical appearance of a coffee mug. Top: intensity image. Middle: gradient image. Bottom: depth image.
3.6 Examples of localized patches from the coffee-mug dictionary. Left: intensity patches. Middle: gradient patches. Right: depthmap patches.
3.7 Precision-recall curves for mugs (left), disposable cups (middle), and staplers (right). The blue solid curve is for our method; the red dashed curve is for vision-only detectors. Scores are computed at each threshold by first removing overlapping detections. A true-positive is counted if any detection overlaps with our hand-labeled ground truth by more than 50%. Any detection that does not overlap with a ground-truth object of the correct class is considered a false-positive. Average Precision measures the 11-point interpolated area under the recall vs. precision curve. Greater area under the curve is better.
3.8 After localizing the door handle in the 3D point cloud, the robot can plan a path to the handle and open the door.
3.9 Detecting coffee mugs in cluttered environments. The detector correctly ignored the paper cup to the right of the coffee mug.
3.10 The inventory-gathering experiment required autonomous navigation (green track), autonomous door opening, and 20 laser scans of desks in the four offices shown above. The robot position at each scan is shown by the red circles, and the field-of-view of each scan is indicated by the yellow triangles. The locations of the detected coffee mugs are indicated by the orange circles. This figure was entirely automatically generated, using the SLAM output for the map and the localization log for the robot track and sensing positions, which allow the coffee-mug detections to be transformed into the global map frame.
4.1 Two manipulators used to demonstrate the utility of accelerometer-based sensing. Left: Willow Garage PR2 Alpha. Right: a prototype low-cost manipulator.
4.2 Accelerometers are present in the F0, F2, and F3 links of the robotic finger, creating a 2-DOF estimation problem between F0 and F2, and a 1-DOF estimation problem between F2 and F3.
4.3 Left: an unpowered arm used to evaluate the calibration potential of the accelerometer-based sensing approach. Right: touching the end effector to points on the calibration board.
4.4 Hold-out test set error during the optimization convergence on the prototype manipulator. The horizontal axis shows the iteration number, and the vertical axis shows the mean of the miscalibrations. Numerical optimization drives the average error from 11mm to 2mm.
4.5 Hold-out test set error during the optimization convergence on the Willow Garage PR2 Alpha. The horizontal axis shows the iteration number, and the vertical axis shows the mean error in joint angle estimates of the shoulder lift and the upper arm roll. The optimization drives the average error from 0.1 deg to 0.02 deg.
4.6 Shoulder and wrist of the demonstration manipulator.
4.7 Elbow and gripper of the demonstration manipulator.
4.8 Accelerometers were attached to the upper arm, forearm, and gripper of a PR2 Alpha robot.
4.9 Tracking the forearm roll of the robot shown in Figure 4.8, showing the encoder ground-truth (red) against the joint angle estimate from the accelerometers (blue).
4.10 Closed-loop control of a low-cost manipulator using only accelerometers. Two joints are shown. Desired state is plotted in red. Output of the accelerometer-based state estimation algorithm is plotted in blue. The vertical axis denotes joint angles in radians; the horizontal axis denotes time.
4.11 Differences between each stopping position of the arm and their respective cluster centroids, in the XY plane (left) and the XZ plane (right), as measured by an optical tracker. 14 trials were run, all of which appear on this plot.
4.12 In this experiment, accelerometer-based state estimation was used to generate relative joint position commands, allowing a position-controlled robot to repeatedly grasp a doorknob.
4.13 Time series of one PR2 joint as the manipulator undergoes relative joint angle commands from the accelerometer-based sensing scheme and simple setpoint-interpolation to derive small step commands.
5.1 The low-cost compliant manipulator described in this chapter. A spatula was used as the end effector in the demonstration application. For ease of prototyping, lasercut plywood was used as the primary structural material.
5.2 Actuation scheme for each of the proximal four DOF.
5.3 Cable routes (solid) and belt routes (dashed) for the shoulder lift, shoulder roll, and elbow joints. All belt routes rotate about the shoulder lift joint. The elbow cables twist about the shoulder roll axis inside a hollow shaft. Best viewed in color.
5.4 Compact servos are used to actuate the distal three joints.
5.5 Diagram of the series compliance. Left, compliant coupling with no external force. Right, an applied force causes rotation against the locked driven wheel.
5.6 Stiffness of the elbow. Hysteresis is exhibited due to the polyurethane in the series compliance. The joint was quasi-statically moved through 70% of its normal operating range.
5.7 Repeatability test results. Measurement accuracy is ±0.1 mm.
5.8 Step responses for each of the major types of actuators of the robot. Top, the shoulder-lift joint, a NEMA-34 stepper motor. Middle, the elbow joint, a NEMA-23 stepper motor. Bottom, the wrist yaw joint, a rigidly coupled Robotis RX-64 servo. Note that timescales change on each plot.
5.9 Low-cost MEMS inertial sensors affixed to the teleoperator’s torso, upper arm, lower arm, and hand to estimate desired end-effector positions.
5.10 Playing chess via teleoperation.
5.11 Demonstration task: making pancakes.
6.1 Each finger module has three motors at the proximal end of the module, shown at left in the figure.
6.2 The hand frame and its set of identical finger modules, which dock magnetically or with retaining bolts.
6.3 The motor module (aluminum, at right) separates from the rest of the finger module (plastic, at center) by simply removing a few bolts. Cable tension is not affected.
6.4 Hand frame variations. Finger modules are unchanged.
6.5 The locations of the accelerometers are illustrated by the red circles.
6.6 Soft tactile pads allow conformal grasping of small objects.
6.7 To achieve mechanical robustness while still exhibiting conforming properties, the skin consists of a tougher thin outer layer above a very soft and thick inner layer.
6.8 Cross-section rendering showing transflective sensors embedded in the finger pads.
6.9 Tactile array implemented as a rigid-flex PCB.
6.10 Flat test fixture for the tactile array.
6.11 Raw sensor response of repeatedly loading and unloading a 2-gram US penny onto the skin assembly shown in Figure 6.10.
6.12 Trinocular camera boards holding direct-solder lens modules (left) and fully-assembled camera flex circuit boards (right).
6.13 Left: beam-steered pico projector affixed to the side of a robotic hand. Right: depth image constructed using this apparatus.
6.14 Difference images of polarity-inversion bar codes produced by the pico projector, amplified 5x.
6.15 Left: laser line generator mounted on a robotic finger. Middle: laser line sweeping across the scene. Right: typical frame and image-difference processing.
6.16 Two scenes showing typical scans of the fingertip-mounted laser scanner.
6.17 Demonstration of unaided stereo (left), texture-assisted stereo (center), and laser line scanning (right) on an artificial scene with very little texture.
6.18 Left: initial prototype hand. Right: final prototype, after extensive design work to eliminate loose wires.
6.19 Stackup of outrunner brushless motors, controller board, and heatsink.
6.20 Finger Motor Controller Board (FMCB).
6.21 Two pairs of steel cables actuate the distal phalanges, shown in red and yellow. Electrical implementation is shown at bottom.
6.22 Simplified schematic of the multiplexing of power and half-duplex data over the pair of conductors running the length of the finger. RS-485 transceivers are connected to the D+/D- nodes; F2 and F3 power supplies are connected to the V+/V- nodes. Bus power is supplied from the FMCB (left).
6.23 Left: illustration of the connectors in the palm and one finger module base. Right: the resulting hand, which features no loose wires.
6.24 Data bus topology of the robotic hand.
6.25 An exoskeletal glove designed to precisely measure the movements of the teleoperator, with a kinematic structure similar to the robotic hand.
6.26 Various grasps and manipulation postures achieved during tele-operation.
7.1 Left: the STAIR 1 robot. Right: the STAIR 2 robot.
7.2 Pan-tilt-zoom (PTZ) camera control graph.
7.3 Graph of the original STAIR “fetch a stapler” demonstration. The large red text indicates the tasks performed by various regions; those annotations were not a functional part of the graph.
7.4 The STAIR 1 robot picking up a stapler.
8.1 A typical ROS network configuration.
8.2 An automatically-generated rendering of a running ROS system.
Chapter 1
Introduction
This thesis addresses personal robots—that is, robots designed to be owned by in-
dividuals and operated for their benefit, as opposed to robots that are operated as
capital equipment for industrial applications. This nomenclature is modeled on the
paradigm shift seen in computing in the late 1970s, where organization-owned main-
frames and minicomputers gave way to the era of the personal computer and its
corresponding massive shifts in application domains, user base, and societal impor-
tance. The legendary success of personal computing has led virtually every field of
endeavor to hope to recreate its trajectory, and the field of advanced robotics in
particular has been described as primed for a similar paradigm shift. Realizing the
exciting possibilities this shift offers, from increased economic productivity to numerous
opportunities for societal improvement, will require advances in a variety of fields of
robotics research and development. Among many others, these challenges include cost,
reliability, perceptual capability, and software interoperability. This thesis presents a
series of experiments and prototypes designed to explore these issues.
1.1 Personal Robots
Personal robots are a long-standing dream of artificial intelligence. As such, the
concept has attracted massive amounts of research and development over several
decades. Although the term is often used in a variety of contexts, broadly speaking, a
personal robot is intended to be owned by, operated by, or of assistance to an individual.
Although many subsystems and areas of expertise can be shared, the domain of
personal robotics is distinct from the established field of industrial robotics.
Industrial robotics, often described as automation, values precision, repeatability,
and reliability. Systems designed for production-line operation can have decades-long
deployments and thus must be developed with long-term system stability as a primary
design consideration [31]. In contrast, envisioned applications of personal robots often
involve difficult perceptual challenges, reasoning under considerable uncertainty, and
close interactions with humans. Robots in both domains are, at their core, sensors and
actuators connected by algorithms, but the differences between the domains are significant.
In addition to these environmental and task-space distinctions, many envisioned
applications of personal robotics are far more cost-sensitive than typical industrial-
robot applications. In contrast to the superhuman performance achieved by many
deployed industrial-automation systems, many envisioned tasks for personal robotics
are currently performed manually, and thus can be readily evaluated in terms of labor
cost. For mass-market acceptance of personal robots beyond entertainment value, this
labor cost presents an upper bound on the acceptable cost of the system.
Although the mass manufacture of any product, including personal robots, can
lead to enormous cost savings, additional cost-reduction can be achieved by specif-
ically designing robotic systems for low-cost operation. This entails co-design of
software and hardware, with the interplay generally seeking to increase the complex-
ity of software to compensate for a reduction in the complexity and/or precision of
hardware, a trade-off often justified by noting that extraordinarily complex software
can be perfectly copied at zero cost.
The work presented in this thesis was intended to address several challenges cur-
rently limiting deployments of personal robots. The hardware systems were developed
specifically to drive down the system cost while still attaining a level of performance
deemed sufficient to accomplish a variety of tasks envisioned of future personal robots.
The subsystem costs of several state-of-the-art service robots were analyzed to select
areas for cost reduction, resulting in the identification of several research directions: a
low-cost localization system that does not require a scanning laser rangefinder, and the
development of sensing techniques, calibration methods, and fabrication techniques
to create low-cost manipulator arms and hands. The results from these low-cost
explorations are presented in this thesis.
Even if cost is no object, real-world usage of advanced personal robotics must
face the perceptual challenges associated with the unstructured and highly variable
environments of typical homes and offices. As a result, this thesis includes a series
of experiments designed to demonstrate that high-resolution depth sensing can dra-
matically improve the reliability of object recognition in the environments envisioned
for future deployments of personal robots, and can be done at reasonable cost. These
results led to the integration of high-resolution depth sensing systems in the robotic
hand which will be presented in Chapter 6.
Finally, numerous challenges are created by the sheer volume of software required
to create general-purpose robotic systems of the complexity necessary to handle un-
structured environments. Software systems of this scale require the contributions
of massive numbers of researchers and engineers, typically working in parallel and
often in disparate groups. Such challenges are common to any large-scale engineer-
ing endeavor. However, the desire to improve the productivity of robotics software
developers, in addition to the stability of the resulting composite systems and the
re-usability of individual robotic software subsystems, resulted in the creation of a
series of robotics software frameworks as part of this thesis. These frameworks were
used to implement all of the hardware systems described in the thesis, and for all of
the integrative demonstrations described in this work. The Robot Operating System
(ROS) has since become widely-used in the personal-robot research community as
well as in other robotics domains. The design goals and development of ROS and its
predecessors will be described in detail in the final two chapters of this thesis.
1.2 Contributions
The contributions of this thesis fall into two areas. First, a series of hardware design
techniques for low-cost robotics was developed, culminating in the design of a low-cost,
high-performance robotic hand. Second, a series of robotics software frameworks were
designed, implemented, refined, and freely released to facilitate collaboration and
promote interoperability in the robotics research and development community.
1.3 Outline
The thesis will proceed in the following manner:
Chapter 2: Low-cost Indoor Localization will describe a low-cost system for
indoor localization. Localization is a critical component of virtually all mobile robotic
systems, and the system described in this chapter was purposely designed using only
low-cost sensors suitable for localizing and tracking personal electronics and robotics
in typical indoor environments.
Chapter 3: High-resolution Depth Sensing presents a high-resolution short-
range depth-sensing system mounted on a mobile manipulator. Experimental results
are then provided when using this system to perform two tasks that are expected to
be necessary for personal robots: opening interior doors, and object recognition in
cluttered environments.
Chapter 4: Inertial Joint Encoding describes a method for estimating the
kinematic state of manipulator arms using low-cost 3D MEMS accelerometers of the
type commonly found in mobile phones. Implementations are shown on both low-cost
and high-precision manipulators.
Chapter 5: A Compliant Low-cost Robotic Manipulator presents the design and
implementation of a low-cost robotic arm with compliant joints. The manipulator is
then demonstrated performing a cooking task.
Chapter 6: A Low-cost Robotic Hand presents the design and implemen-
tation of a fully-actuated robotic hand, which integrates all of the low-cost design
concepts detailed in the previous chapters and provides an integrated suite of visual,
tactile, and inertial sensors.
Chapter 7: STAIR and Switchyard presents an overview of the STanford AI
Robot (STAIR) project to create a home and office assistant robot, and Switchyard,
the initial software integration framework which emerged from that effort.
Chapter 8: ROS: A Robot Operating System describes the development,
design, and features of the Robot Operating System (ROS), a much larger software
integration framework intended to support personal robotics.
1.4 First Published Appearances
Much of the following material has been previously published. Chapter 2 is derived
from [77]. Chapter 3 is derived from [73]. Chapter 4 is an extension of [75]. Chapter
5 is derived from [72]. The work described in Chapter 6 has not previously been
published. Chapters 7 and 8 provide a more detailed discussion of work originally
published in [74] and [76].
Chapter 2
Low-cost Indoor Localization
2.1 Introduction
Of all the subsystems of a personal robot, localization is perhaps the one most fre-
quently glossed over as a solved problem. Indeed, there is a vast literature on the
subject, dating back to the earliest mobile robot experiments. Robust navigation
systems are now widely available and are often based around Bayes filter variants
which observe the world through laser range-finders and robot odometry [96].
However, most fielded localization systems involve high-precision, expensive sen-
sors such as laser range sensors, high-quality inertial units, or extensive infrastruc-
ture. Although these systems have been usefully employed in a variety of settings, the
domain of personal robotics entails cost sensitivity that is difficult to achieve using
time-of-flight laser range sensors. In contrast, this chapter describes a localization sys-
tem which employs sensing technologies found in commodity cellphones: integrated
CMOS imagers, accelerometers, and WiFi radios. This sensor suite is quite different
from the canonical laser range finder and odometer found in most robot localization
tasks, and presents a different (though related) set of challenges.
The system consists of two components: a mapping platform and a mobile lo-
calization platform. The mapping platform is a typical modern robot, as shown in
Figure 2.1. Like many robots of this size class, it can autonomously acquire maps of
indoor environments and align them using off-the-shelf SLAM algorithms. The focus
of this chapter is the mobile localization system, which can autonomously localize it-
self to pre-made maps using only low-cost sensors. The method requires no additional
environment instrumentation or modification beyond standard, widely-deployed WiFi
infrastructure. In practice, the mapping platform is used once for each environment.
The resulting map may then be used by many roaming robots, bringing high-quality
localization to these low-cost sensor platforms.
The implementation was tested by evaluating its accuracy against ground truth
results acquired using the backpack-mounted sensing system shown in Figure 2.1.
Using this data, the method was shown to provide sub-meter precision with low-cost,
consumer-grade sensors and without environment modification or instrumentation.
WiFi is shown to be excellent for quick global convergence, but camera data performs
better for precise position tracking, and sensor fusion combines the best aspects of
both systems. The localization system was tested in a typical office environment,
where the map and localization data were collected at different times of day and on
different days, with the environment allowed to undergo typical daily changes. The
system offers numerous potential applications, including the localization of low-cost
personal robot platforms in typical home and office environments.
2.2 Related Work
The literature on localizing a robot (or other rigid sensor platform) against a map
is vast. [96] provides a comprehensive literature review, which is summarized and
extended in this section. The idea goes back at least as far as the robot Odysseus [90],
which compared sensor measurements in a local grid to a global map and competed
at the National Conference on Artificial Intelligence (AAAI) in 1992. A continuum of
algorithms exist across a variety of sensor and map configurations. [52] used sonar to
detect coarse landmarks in maps and localize with an extended Kalman filter (EKF).
Later, grid-based methods were developed. In contrast to EKFs, these methods
represented the posterior as a histogram and were not constrained to Gaussian noise
assumptions. Grid-based methods usually relied on landmarks, however. Grid-based
localization was used successfully in sewer pipes [37], in a museum [11], and in an office
environment [89]. [95] used learning to determine the best landmarks for reliable local-
ization. Most recently, Monte Carlo Localization (MCL) [18] was developed, replacing
landmarks with raw measurements and the histogram posterior with particles. In a
hybrid of ideas between MCL and grid-based methods, [43] introduces MCL with fea-
tures. Several papers have utilized MCL with cameras including [51, 83, 106, 46, 105].
Others have localized by direct image matching, without using a probabilistic filter or
motion model [85, 108]. Localization with signal-strength mechanisms such as WiFi
have been studied in the literature as well [6, 25, 21, 53, 36], including systems that
bootstrap automatically without an explicit map-making step [57, 7].
There are several key differences between the system described in this chapter
and the previous literature. First, the sensor suite is intentionally limited to com-
modity parts whose economy of scale allows price points in the tens-of-dollars range:
consumer-grade MEMS sensors, an integrated CMOS camera, and a WiFi radio.
Second, while previous work exists on low-cost sensors such as WiFi or cameras,
these sensors were usually studied individually, whereas the system described in this
chapter employs probabilistic sensor fusion. As long as the sources of measurement
uncertainty, such as noise and bias, are conditionally independent, combining mul-
tiple sensors will have a positive impact on performance. This is true even if the
inexpensive sensors are highly noisy. The data shows that while WiFi offers fast
global convergence, cameras provide more precise tracking. Sensor fusion allows the
best of both.
2.3 Approach
The system is based around three levels of sensing and inference. The first two are
used for offline map-building, and the third is used for online localization. These
stages are described in detail in the following sections.
Figure 2.1: Left: the STAIR robot, used to acquire maps. Right: a laser backpack, used to provide ground-truth localization of a pedestrian carrying the low-cost sensor package.
2.3.1 Robotic SLAM
The first tier of the system captures the 3D structure of the environment. This is per-
formed by a robotic platform equipped with three LIDAR scanners and a panoramic
camera, as shown in Figure 2.1. To build up a 2-D map of the environment and
correct the odometry of the robot, a horizontal LIDAR is used with the GMapping
SLAM system, an efficient open-source implementation of grid-based FastSLAM [34].
The GMapping system was used out-of-the-box to produce the 2D map shown
in Figure 2.2. The robot path corresponding to this map was then used to project
the vertical and diagonal LIDAR clouds into 3D by backprojecting rays through the
rectified images into the LIDAR cloud. The robotic mapping phase of the system is
thus able to map the 3D structure and visual texture of the environment. However,
this alone is not enough to permit localization via a low-cost sensor suite; what is
needed next is a precise sensor model of how the low-cost sensors behave in the
environment of interest. This is handled by the next phase of the system.

Figure 2.2: 2D map of the environment used in these experiments, as produced by GMapping and the robot shown in Figure 2.1.
2.3.2 Obtaining Training Data
Non-parametric methods are a simple way to capture the complex phenomena ob-
served by the low-cost sensors. For example, it would be difficult to parametrically
model the various radio-frequency (RF) propagation effects that can occur with WiFi
signal power in typical indoor environments. Issues such as occlusion/shadowing
from building structural elements, interference between multiple access points, di-
rectionality of the transmit and receive antennas, etc., result in a complex power
distribution pattern. Similarly, the camera of a smartphone captures an enormously
complex stream of data. A simple (indeed, perhaps the simplest) way to predict these
complex observations is to simply acquire many observations from a large number of
known positions in the environment.
Obtaining training data for these non-parametric techniques is non-trivial: the
location of the observation (WiFi signal power or camera image) must be known for
it to be useful to subsequent localization algorithms. Low-cost localization systems
can be used to localize any number of items, ranging from low-cost robots to wheeled
equipment to pedestrians. In this experiment, pedestrian localization was examined
in detail, although the resulting system could be used equally well (indeed, perhaps
with even better results) on wheeled vehicles.
Because the pedestrian’s body can have an effect on the received signal strength
(e.g., when the person’s body is directly between the receiving and transmitting
antennas), a system was created for accurately localizing pedestrians, and was used
to obtain training data for non-parametric modeling of the spatial RF signal power.

Figure 2.3: A typical rendering of the particle filter used to localize the “ground truth” pedestrian using rearward LIDAR. The green LIDAR scan is rendered from the most likely particle in the filter.
To localize the pedestrian, a rearward-facing laser range finder was affixed to a
backpack, as shown in Figure 2.1. A particle filter was then employed to fuse the
laser observations with a crude motion model of a walking pedestrian. This is more
challenging than the canonical robot localization task, since mobile robots typically
have odometry which is locally stable. In contrast, the pedestrian-localization system
used in this experiment only knows whether or not a person is walking; at the time of
writing, low-cost MEMS accelerometers are far too noisy to simply double-integrate to produce
position estimates. Instead, a simple sliding-window classifier was used on the spectra
of the acceleration vectors to detect when to apply a “walking” motion model. Low-
cost magnetometers were found to be challenging in the test environment of a steel-
framed building with many computers, power cables, and other electronic equipment
capable of inducing local magnetic disturbances. While this testing environment may
have been particularly unfriendly, similar local magnetic perturbations would likely
also pose challenges to efforts relying heavily on magnetometer data in similar indoor
environments.
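
As an illustration of what such a walking detector might look like, the following is a minimal sketch in Python with NumPy. The thesis does not specify its classifier features or thresholds, so the sampling rate, frequency band, and threshold below are purely illustrative assumptions:

```python
import numpy as np

def is_walking(accel_window, fs=100.0, band=(1.0, 3.0), threshold=0.5):
    # accel_window: (n, 3) array of recent accelerometer samples at fs Hz.
    # Hypothetical stand-in for the sliding-window spectral classifier:
    # walking produces a strong periodic component (roughly 1-3 Hz here)
    # in the acceleration magnitude.
    mag = np.linalg.norm(accel_window, axis=1)
    mag -= mag.mean()                      # remove gravity / DC component
    spectrum = np.abs(np.fft.rfft(mag))
    freqs = np.fft.rfftfreq(len(mag), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Classify as walking when the band carries most of the energy.
    return spectrum[in_band].sum() / max(spectrum.sum(), 1e-9) > threshold
```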
As the accelerometer and magnetometer can only give a coarse measurement of
the path of a pedestrian, the LIDAR-based pedestrian particle filter relies heavily
on frequent “gentle” resamplings of the particle cloud. More specifically, the
measurement model has a far higher uniform component than is typical for mobile robot
filters, and incorporates measurements from every laser scan, in order to correctly
track the pedestrian through turns. A typical rendering of the particles is shown in
Figure 2.3.

Figure 2.4: Example image regions corresponding to the “visual words” used during image matching.
2.3.3 Camera Sensor Model
The literature on place recognition using visual images contains many proposed meth-
ods. For these experiments, three different approaches were selected from the recent
computer vision literature: a “bag of words” method using SURF descriptors of inter-
est points [16] [5], a “bag of words” method using HoG descriptors of a dense uniform
grid [17], and a color-histogram method [111]. The first two methods were further
augmented by adding a spatial pyramid [50]. These methods will be briefly described
in the following paragraphs.
In the bag of words model, first a dictionary of “visual words” is constructed. This
is done by extracting SURF [5] descriptors from a large set of images captured of the
target environment and quantizing the descriptors using K-means clustering. The
resulting 128-dimensional cluster centroids are stored with indices 1 to k. Figure 2.4
shows image patches whose descriptors are at the center of clusters computed by K-
means. Then, given a image, the “bag of words” representation can be calculated in
the following way:
• Extract SURF descriptors from the image
• Map each descriptor to the index of the nearest centroid in the dictionary
• Construct a histogram with the frequency counts for each index, i.e., the number
of descriptors that were mapped to each index
Although the histogram discards all of the geometric information about the loca-
tions of the descriptors in the image, the histograms have nevertheless been shown to
function effectively as compact descriptions of the image content.
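
As a concrete illustration of the procedure above, here is a minimal sketch in Python with NumPy. It assumes the SURF descriptors have already been extracted (e.g., with OpenCV) and the k-means dictionary already built; the function name and array shapes are illustrative rather than taken from the thesis code:

```python
import numpy as np

def bag_of_words(descriptors, centroids):
    # descriptors: (n, 128) SURF descriptors extracted from one image
    # centroids:   (k, 128) k-means cluster centers (the "visual words")
    # Distance from every descriptor to every centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :],
                           axis=2)
    # Map each descriptor to the index of its nearest visual word.
    words = dists.argmin(axis=1)
    # Frequency count of each word index.
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    # Normalize, as done for illumination invariance later in this section.
    return hist / max(hist.sum(), 1.0)
```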
The HoG-based method used a similar approach. However, instead of using de-
scriptors of interest points, the image was sampled on a dense grid. As a result, the
number of HoG descriptors extracted from each image was always the same. To produce
a degree of data compression similar to that of the SURF-based method, we chose to extract HoG
descriptors from 32x32 blocks arranged on a 15x20 grid across the image. This re-
sulted in 300 HoG descriptors per image, which was similar to the average number of
SURF keypoints found in the same images using the OpenCV SURF implementation.
As before, k-means was used to quantize the HoG descriptors, and then histograms
were built of the quantized descriptors for each image.
As previously mentioned, the “vanilla” bag of words algorithm discards the spatial
configuration of the descriptors in the image plane. The “spatial pyramid” approach is
one proposed method to incorporate coarse spatial information, and is fully developed
in [50]. This method repeatedly subdivides the image into quadrants, and constructs
histograms for each quadrant on each level. For example, a two-level spatial pyramid
would have one global histogram for the whole image, and one histogram for each
quadrant, for a total of five histograms. Similarly, the three-level pyramid has
1 + 4 + 16 = 21 histograms. This approach has been shown to offer improved performance
over the single-histogram technique, at the cost of correspondingly increased memory
and computation requirements.
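
The quadrant bookkeeping is easy to get wrong, so the following sketch shows one way to build the concatenated pyramid histograms, under the same illustrative naming assumptions as the earlier bag-of-words sketch:

```python
import numpy as np

def spatial_pyramid(points_xy, words, k, width, height, levels=3):
    # points_xy: (n, 2) pixel coordinates of the descriptors
    # words:     (n,) visual-word index of each descriptor
    # Returns the concatenation of 1 + 4 + 16 = 21 histograms for levels=3.
    hists = []
    for level in range(levels):
        cells = 2 ** level  # 1, 2, 4 cells per axis at levels 0, 1, 2
        col = np.minimum((points_xy[:, 0] * cells / width).astype(int),
                         cells - 1)
        row = np.minimum((points_xy[:, 1] * cells / height).astype(int),
                         cells - 1)
        for r in range(cells):
            for c in range(cells):
                # Histogram of the words falling in this quadrant cell.
                in_cell = words[(row == r) & (col == c)]
                hists.append(np.bincount(in_cell, minlength=k))
    return np.concatenate(hists)
```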
For a radically different approach, and to serve as a baseline, a color-histogram
technique was also implemented. This technique is conceptually much simpler: the
image is first converted to hue-saturation-value (HSV) space, after which a histogram
is constructed of the hue values of all pixels in the image. HSV space is used to provide
some invariance to illumination changes. The resulting representation is essentially
a polar histogram of the color wheel, with the intent of capturing the colors of the
walls, ceiling, and floor coverings, which often change depending on the region of the
indoor environment viewed by the camera.
To use these image representations in a localization filter, it is necessary to produce
an estimate of the probability that an image representation z was produced from pose
x. To compute this probability using an approach analogous to that of laser range-
finders, a textured 3D model of the world would need to be projected into the camera
frame of each particle in the particle filter, followed by the computation of some sort
of distance function. This would be computationally difficult, even on high-power
GPU hardware. Instead, a coarse, yet experimentally justified, approximation was
used: p(z|x) was estimated through a nearest-neighbor lookup on the training-set
images y_i^img and poses y_i^pose in histogram space. A histogram distance metric was
augmented with a penalty for using images that were far from the candidate pose x.
Intuitively, if the pose x is at exactly the same position as a pose in the training set,
and the corresponding image histograms are identical, p(z|x) should be very high.
Furthermore, p(z|x) should fall off smoothly as the image and pose start to differ
from the training image histogram y_i^hist and training image pose y_i^pose, so that query
images taken near (but not exactly on) the poses of the training images will still receive
a significant probability. Conversely, if the query image z is significantly different from
the training image y_i^hist, or the candidate pose x is significantly different from the
map image pose y_i^pose, the probability should be very small.
Various probability distributions were tested, and it was found experimentally
that the heavy tails of a Laplacian distribution were better suited for this sensor than
a Gaussian distribution. Two parameters, λ1 and λ2, allow for independent scaling
between the histogram distance and the pose distance.

Figure 2.5: Empirical justification of the Gaussian + uniform model of the WiFi power measurements. The plot shows the frequency of power measurement deviations from their respective means. This dataset was gathered while sitting stationary for 60 seconds, and includes 34 transmitters, most of which were observed 25 times.

Penalization was added for
yaw deviation, as the query image and the training image should be pointed in nearly
the same direction for comparison to be meaningful. The combined model first finds
the nearest neighbor, using the aforementioned weighted distance metric, and then
models that distance as a zero-mean Laplacian distribution:
$$p(z \mid x) \;\propto\; \exp\!\left( -\,\frac{\min_i \left( \lambda_1 \left\lVert z - y_i^{\mathrm{img}} \right\rVert_1 + \lambda_2 \left\lVert x - y_i^{\mathrm{pose}} \right\rVert_2 \right)}{\sigma} \right) \tag{2.1}$$
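
In code, Equation (2.1) reduces to a single nearest-neighbor query over the training set. The sketch below omits the yaw-deviation penalty mentioned above for brevity, and its names and shapes are again illustrative assumptions rather than the thesis implementation:

```python
import numpy as np

def camera_likelihood(z_hist, x_pose, train_hists, train_poses,
                      lam1, lam2, sigma):
    # z_hist:      normalized bag-of-words histogram of the query image z
    # x_pose:      (x, y) position of the candidate particle pose
    # train_hists: (m, k) histograms of the training images
    # train_poses: (m, 2) poses at which the training images were taken
    d = (lam1 * np.abs(z_hist - train_hists).sum(axis=1)         # L1 in histogram space
         + lam2 * np.linalg.norm(x_pose - train_poses, axis=1))  # L2 in pose space
    # Zero-mean Laplacian on the nearest-neighbor distance, Eq. (2.1).
    return np.exp(-d.min() / sigma)
```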
Large changes in ambient illumination will cause low-cost cameras to have numer-
ous artifacts, such as higher noise when the camera gain must be raised in dim lighting
to maintain a short exposure time. This, in turn, will cause a different number of in-
terest points to be found in the image, resulting in a vertical shift of the histogram.
To provide some measure of invariance to global illumination for the SURF-based
method, the image histograms were normalized before computing their distance.
2.3.4 WiFi Sensor Model
WiFi signal power measurements do not suffer from the correspondence-matching
problem often associated with robotic sensors. Signal power measurements from
scanning WiFi radios are returned with the transmitter’s MAC address, a 48-bit
number unique to the hardware device (barring pathological spoofing cases). Thus,
even though the power measurement is noisy, WiFi observations can provide excellent
context for global localization.
To simplify the probabilistic treatment, conditional independence of the WiFi sig-
nals was assumed. This assumption is impossible to justify without access to the con-
figuration and firmware of the WiFi radio, and we suspect that the assumption does
not hold up. For example, if two WiFi radios are broadcasting on the same channel,
a nearby radio may mask the presence of a more distant radio, or in some infras-
tructure deployments, the same radio may broadcast more than one MAC address.
However, assuming conditional independence was found experimentally to provide
a useful likelihood function, and has the significant added benefit of computational
simplicity.
The WiFi noise was modeled as a Gaussian distribution summed with a uniform
distribution. This was empirically justified by the stationary observations
shown in Figure 2.5, which were gathered from 34 transmitters over 60 seconds.
There is a Gaussian-like bump around the expected mean, and a small number of
large deviations on both sides. More formally, for a set of signal power measurements
z_i and a robot pose x,

$$p(z \mid x) \;\propto\; \prod_i \exp\!\left( -\,\frac{\lVert z_i - h_i(x) \rVert_2^2}{\sigma^2} \right) \tag{2.2}$$
where h_i(x) is the predicted power measurement for transmitter i at pose x. To
make this prediction, we simply employ nearest-neighbor over the training set: since
each observation in the training set occurred at a known location (thanks to the laser
scanner employed at training time), we build up a pre-computed map of the nearest-
neighbor prediction of the WiFi signal power levels. A sample nearest-neighbor map
is shown in Figure 2.6. A nearest-neighbor map was computed for each MAC address
2.3. APPROACH 17
Figure 2.6: Pre-computed nearest-neighbor prediction of the WiFi signal strength of a particular MAC address at any point in the environment. The walls of the environment are overlaid for clarity.
seen in the training set. With these maps, the computation of p(z|x) is linear in the
number of MAC addresses in z.
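A minimal numpy/scipy sketch of this precomputation and of the likelihood of Equation 2.2 follows; the training arrays, grid, and the small uniform term eps are hypothetical stand-ins, with the uniform component anticipating the mixture model discussed above.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_nn_maps(train_xy, train_rssi, grid_xy):
    """For each MAC address, precompute the nearest-neighbor-predicted
    power at every grid cell (the kind of map shown in Figure 2.6)."""
    maps = {}
    for mac, rssi in train_rssi.items():
        seen = ~np.isnan(rssi)             # samples where this MAC was heard
        tree = cKDTree(train_xy[seen])
        _, idx = tree.query(grid_xy)       # nearest training sample per cell
        maps[mac] = rssi[seen][idx]
    return maps

def wifi_likelihood(scan, cell, maps, sigma=4.0, eps=1e-3):
    """Eq. 2.2, with a small uniform term (eps) for robustness; `scan`
    maps MAC -> measured power, `cell` indexes the precomputed grid."""
    p = 1.0
    for mac, z in scan.items():
        if mac in maps:
            h = maps[mac][cell]            # predicted power h_i(x)
            p *= np.exp(-(z - h) ** 2 / sigma ** 2) + eps
    return p
```

As the text notes, evaluating this product is linear in the number of MAC addresses in the scan.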
2.3.5 Localization
Once the sensor models are acquired, they can be incorporated into a particle filter,
to introduce temporal constraints on the belief state and to fuse the models in a
systematic fashion. The particle filter used was Monte Carlo Localization (MCL)
as described in [18]. The update step of the particle filter requires a motion model.
As previously described, magnetometers were found to be unreliable in the steel-
framed building used in these experiments, preventing reliable direct observation of
the heading changes of the low-cost sensor package. Instead, a motion model was
used to continually hypothesize motions of the pedestrian.
The motion model was empirically developed to match the trajectories observed by
the laser-equipped ground-truth pedestrian. The motion model assumes that pedes-
trians usually travel in the direction they are facing, and this direction usually does
not change. This behavior was modeled by sampling the future heading from a Gaus-
sian distribution N1 centered on the previous heading. The velocity of the pedestrian
was sampled from a Gaussian distribution N2 with a mean of 1.2 meters/second,
which was empirically found using the LIDAR-based pedestrian localizer. These dis-
tributions are summed with a 2-D zero-mean Gaussian N3 to encourage diversity in
Figure 2.7: Visualization of the unified vision + WiFi localization system. Upper-left shows the particle cloud, which is overshadowed by the centroid of the particle distribution (yellow) and the ground-truth position (cyan crosshairs). Right shows the current camera image, with SURF keypoints circled. Lower-left shows the joint likelihood of the WiFi observations. Extreme lower-left visualizes the histogram of the bag-of-words image representation.
the particle filter. More formally, to sample from the motion model,
$$ v' = R_\theta \begin{bmatrix} \mathcal{N}_2(\mu_{\text{vel}}, \sigma_1) \\ 0 \end{bmatrix} \tag{2.3} $$

$$ \begin{bmatrix} x' \\ y' \\ \theta' \end{bmatrix} = \begin{bmatrix} x \\ y \\ \theta \end{bmatrix} + \begin{bmatrix} \mathcal{N}_3(0, \sigma_2) + v' \\ \mathcal{N}_1(0, \sigma_3) \end{bmatrix} \tag{2.4} $$
The parameters to this model were tuned in the LIDAR-based localization sce-
nario, where the time between each laser scan was 27 milliseconds. To scale up to
the larger intervals seen in the WiFi- and camera-based filters, particles were simply
propagated through the previous equations the appropriate number of times. Run-
ning the model for one second produces the particle distribution shown in Figure 2.8 (dimensions in meters).
[Figure: particle scatter plot — x-axis: longitudinal axis (meters); y-axis: lateral axis (meters); title: Motion Model]
Figure 2.8: Pedestrian motion model, shown after a one-second integration. Without odometry, the particle filter must generate sufficient diversity in its hypotheses to handle corners.
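A minimal numpy sketch of sampling this motion model (Equations 2.3 and 2.4) follows; the noise scales are illustrative values, while the 1.2 m/s mean velocity and 27 ms timestep come from the text.

```python
import numpy as np

def propagate(particles, dt=0.027, mu_vel=1.2, s1=0.1, s2=0.02, s3=0.05,
              rng=np.random.default_rng()):
    """particles: (N, 3) array of [x, y, theta]; one motion-model step."""
    n = len(particles)
    x, y, th = particles.T
    # Eq. 2.3: velocity sampled from N2 and rotated into the heading direction.
    v = rng.normal(mu_vel, s1, n) * dt
    vx, vy = v * np.cos(th), v * np.sin(th)
    # Eq. 2.4: isotropic position noise (N3) and heading noise (N1).
    return np.column_stack([x + vx + rng.normal(0.0, s2, n),
                            y + vy + rng.normal(0.0, s2, n),
                            th + rng.normal(0.0, s3, n)])

# Longer sensor intervals are handled by iterating this step, e.g., roughly
# 37 iterations to cover the one-second camera interval.
```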
The motion model also encodes the fact that the target cannot go through walls.
As a result, when the target platform passes an intersection of corridors, particles are
rapidly generated to implicitly cover the possibility that the pedestrian has turned.
As is common practice in particle filters, to prevent premature convergence of the
particles during global localization, and to handle unmodeled effects when tracking
(e.g., lens flare when facing a sunbeam from a window, passers-by occluding the
camera, RF anomalies, etc.), a uniform distribution was added to the measurement
models described in the previous section.
2.4 Results
To quantify the performance of the system, two data sets were collected from the
second floor of the Stanford University Computer Science Department. The data sets
were approximately 13 minutes long and contain paths approximately one kilometer
Figure 2.9: Images from the training set (top) differed from images in the test set (bottom) due to illumination changes and typical furniture re-arranging.
long. The first data set was recorded in the daytime, and the second data set was
recorded in the nighttime, several days later, as shown in Figure 2.9. In the interim,
many chairs were moved around in meeting spaces, different office doors were open (or
shut), and some clutter was moved. No effort was made to normalize the environment
between testing and training, other than to ensure that interior lights were turned
on. However, no major renovations, redecorations, or organized clean-up occurred in
the intervening days.
The first data set was used solely for training the sensor models. The second
data set was used to generate localization estimates using the models learned from
the first data set. The backpack shown in Figure 2.1 was worn while collecting both
datasets, to permit quantitative analysis of localization errors of the low-cost sensors
with respect to the LIDAR localization scheme, which is the best available estimate
of ground truth.
Figure 2.10: The ground-truth LIDAR track of the 1-kilometer test set used for quantitative evaluation. The test set contained 62 corner turns, and a mixture of navigating tight corridors and open meeting spaces. Distances are in meters.
Data was collected on a laptop carried by a pedestrian. A handheld Logitech
Webcam Pro 9000 was run at 640x480, 30 frames per second, and the raw YUV
frames were recorded to disk. The internal WiFi card (Intel 4965) of a Dell m1330
laptop provided the WiFi data. Code was adapted from the Linux “iwlist” command
to scan the RF environment every two seconds. Accelerometer data was provided by
a handheld MicroStrain 3DM-GX2 at 100 Hz. All of these sensors are comparable in
performance to those in high-end smartphones; a laptop was used simply for ease of
data-collection and storage.
Empirical evaluation was performed on several vision methods, on WiFi by itself, and
finally on a combination of WiFi and the empirically best vision method.
The first evaluation benchmark is a histogram of the localization errors observed
on a 1-kilometer path through the test environment. This test dataset included 62
corners, and is shown in Figure 2.10.
Table 2.1: Quantitative results for the tracking task

                          WiFi    SURF    HoG     Color    SURF+WiFi
mean error (m)            1.81    0.73    4.31    10.90    0.78
std. dev. of error (m)    0.99    0.57    9.15    10.65    0.64
The results of evaluating single-level SURF and dense HoG, color-histogram, WiFi,
and SURF+WiFi on the “tracking” benchmark are shown in Figure 2.11. The SURF
Figure 2.11: Histograms of localization errors during the tracking benchmark on a continuous 1-kilometer test set. Errors are measured with respect to LIDAR ground-truth. The SURF and HoG performance is for the global (1-level) spatial pyramid. Adding WiFi to SURF slightly decreases its long-term average tracking accuracy. The color histogram performs poorly.
method outperforms the WiFi method, and the combined SURF+WiFi system per-
forms no better than the SURF-only system. The dense HoG method does signif-
icantly worse, and the color histogram method performs poorly. The quantitative
results are shown in Table 2.1.
The second benchmark measures the speed of global localization by averaging the
localization error as a function of the time since the localizer was started. These
results were computed by starting the localization systems on the test data at 200
regularly-spaced starting points. A graphical plot is shown in Figure 2.12.
This benchmark reveals an interesting duality of the sensor suite: the WiFi sys-
tem, thanks to having an intrinsic solution to the correspondence problem, can quickly
achieve a mean error of 3-4 meters. However, due to the many sources of noise in the
WiFi signal power measurement, the WiFi-only system cannot obtain meter-level per-
formance. In contrast, the best visual methods (2- and 3-level SURF spatial pyramids)
are able to obtain excellent tracking results, but take much longer to converge due to
[Figure: line plot — x-axis: Time (seconds), 1 picture per second, 1 WiFi scan per 1.8 seconds; y-axis: Mean localization error (meters); title: Global localization performance (200 different starting positions); legend: 1-level SURF, 2-level SURF, 3-level SURF, 1-level dense HoG, 2-level dense HoG, 3-level dense HoG, Color Histogram, WiFi, WiFi + 3-level SURF]
Figure 2.12: Global localization performance. The localization systems were started with a uniform prior at 200 different starting points in the test set. Errors against ground-truth were averaged at each timestep to show the expected convergence properties of each system. All methods show improvement as more observations are incorporated. The combination of WiFi and the best visual algorithm (3-level spatial pyramid of SURF descriptors) produces the best performance.
the repetitive nature of some regions of the test environment (e.g., long corridors), or
inherent ambiguity of some starting positions (e.g., facing the end of a corridor).
Probabilistic fusion of the best visual method (3-level SURF spatial pyramid)
and the WiFi measurements produces a system that combines the strengths of both
modalities: quick global convergence to an approximate position fix, followed by
precise tracking. The particle filter performs the sensor fusion automatically, using
the sensor models and the motion model.
2.5 Summary
This chapter presented a precision indoor localization system which uses only low-
cost sensors that are several orders of magnitude less expensive and less accurate than what is
typically found on a research robot. The method requires no environment instrumen-
tation or modification. The chapter also described the implementation and testing of
the system, demonstrating its effectiveness at sub-meter localization in a test environ-
ment. The results indicate sensor fusion is essential, as WiFi is effective for fast global
convergence, whereas computer vision is preferred for high-precision tracking. Such
a system could be readily ported to a low-cost personal robot navigating in typical
home and office environments, enabling “reasonable” localization performance suffi-
cient to permit autonomous navigation, at a small fraction of the cost of canonical
solutions employing scanning laser rangefinders.
Chapter 3
High-resolution Depth Sensing
3.1 Introduction
Personal robots are expected to operate in the unstructured environments of typical
homes and workplaces. Although humans effortlessly use complex objects in such
environments, this domain is enormously more complex than the highly engineered
interiors of the typical closed workcells of successful industrial
robots. The previous chapter presented a system intended to allow low-cost robots to
localize themselves in such environments, but localization is only part of the problem:
viable personal robots must also accomplish useful work. For many applications, this
will require robust perception capabilities to recognize objects in cluttered everyday
scenes.
This chapter will describe a series of experiments showing the utility of high-
resolution 3D sensing on mobile manipulators. Just as the change from sonar-based
sensing to laser-based sensing enabled drastic improvement of navigation and mapping
systems in mobile robotics, the results shown in this chapter suggest that dramatically
improving the quality of depth estimation on mobile manipulators could facilitate new
classes of algorithms and higher levels of performance, as shown in Figure 3.1.
It must be noted that these experiments were performed when the state of the art
of 3D data on mobile manipulators was either tilting time-of-flight laser scanners, or
relatively noisy time-of-flight 3D cameras. As such, the usage of depth information on
large mobile manipulators was not yet standard practice. Since the original publica-
tion of these experiments, the field has been upended by the advent of low-cost depth
cameras typified by the Microsoft Kinect. However, the experiments of this chapter
remain relevant and the overall argument is still valid: high-resolution, high-fidelity
depth data can dramatically increase the utility of mobile manipulators in everyday
environments. In support of this idea, this chapter presents two scenarios where high-
accuracy 3D data is useful to large mobile manipulators operating in the cluttered
environments that one would expect to find in deployments of personal robots.
The first scenario involves object detection. In many tasks, a mobile manipulator
needs to search for an object class in a cluttered environment. This problem is chal-
lenging when only visual information is given to the system: variations in background,
lighting, scene structure, and object orientation exacerbate an already-difficult prob-
lem. This chapter demonstrates that augmenting state-of-the-art computer vision
techniques with high-resolution 3D information results in higher precision and recall
than is achievable by either modality alone.
The second scenario involves manipulator trajectory planning. The following sec-
tions demonstrate closed-loop perception and manipulation of door handles using
information from both visual images and the 3D scanner. The high-resolution 3D
information helps ensure that the trajectory planner keeps the manipulator clear of
the door while still contacting the door handle.
An application experiment is then presented which combines these capabilities
to perform a simple inventory-control task. The mobile manipulator enters several
offices, searches for an object class, and records the detected locations.
3.2 Related Work
Augmenting computer vision algorithms with 3D sensing has the potential to reduce
some of the difficulties inherent in image-only object recognition. Prior work has
shown that low-resolution depth information can improve object detection by remov-
ing object classifications which are inconsistent with training data. For example,
objects are usually not floating in the air, some object classes are unlikely to be on
Figure 3.1: Several off-axis views of a raw scan of a coffee mug obtained by our scanner from 1.2 meters away. The 5mm-thick handle is prominently visible. Approximately 1000 points of the scan are on the surface of the coffee mug, despite the fact that it comprises only 5% of the horizontal field-of-view of the scan.
the floor, and many object classes have upper and lower bounds on their absolute
size [33].
However, if a depth sensor’s noise is comparable to the size of the target object
classes, it will be hard-pressed to provide more than contextual cues. The difference
between a stapler and a coffee mug, for example, is only several centimeters in each
dimension. Indeed, many objects designed for manipulation by human hands tend
to be similarly sized and placed; thus, using depth information to distinguish among
them requires sub-centimeter accuracy.
Unfortunately, many current sensing technologies have noise figures on the cen-
timeter level when measuring from 1-2 meters distant. Ranging devices based on
time-of-flight, for example, tend to have centimeter-level noise due to the extremely
short timescales involved [61]. Additionally, time-of-flight ranging systems can intro-
duce depth artifacts correlated with the reflectance or surface normal of the target
object [24].
In contrast, the accuracy of passive stereo cameras is limited by the ability to find
precise feature matches. Stereo vision can be significantly improved using global-
optimization techniques [93], but the fundamental problem remains: many surfaces,
particularly in artificial environments, do not possess sufficient texture to permit
robust feature matching (e.g., a blank piece of paper). Efforts have recently been
Figure 3.2: Clutter makes scene understanding from only 2D visual images difficult, even in a relatively simple office environment, as many of the strong edges are not those which suggest the 3D structure of the scene.
made to combine passive stereo with time-of-flight cameras [112], but the resulting
noise figures still tend to be larger than what is achievable using a laser line scanner.
Active vision techniques use yet another approach: they project patterns onto
the scene using a video projector, and observe deformations of the patterns in a
camera to infer depth [109]. Besides the difficulties inherent in overcoming ambient
light simultaneously over a large area, the projected image must be at least roughly
focused, and thus depth of field is limited by the optical geometry. However, this
is a field of active research and great strides have been made in recent years. The
PrimeSense sensors, typified in the Microsoft Kinect depth camera, represent the state
of the art at time of writing. A pattern is projected onto the scene in near-infrared,
and its deformation by the scene is decoded using a custom ASIC implementation
to deliver 30-fps performance in a low-power, compact design.
This brief summary of the limitations of alternative 3D sensing modalities is bound
to change with the continual progress being made in each of the respective areas of
inquiry. The work in this chapter seeks to explore the potential benefits of highly
accurate 3D data for mobile manipulators. As the various 3D modalities continue to
improve, their data could be used by the algorithms described in this chapter. However,
none of the extant 3D modalities are currently able to match the resolution and sharp
depth discontinuities that emerge from a triangulation-based laser line scanner. For
the purposes of this study, several laser line triangulation systems were constructed
to explore how high-quality 3D data can improve the performance of mobile manip-
ulation.
Laser line triangulation was selected because millimeter-level accuracy is readily
achievable. This is on the order of accuracy we have been able to achieve in sensor-
to-manipulator calibration; further increases in sensing accuracy would thus continue
to improve the perceptual capabilities of a robot, but not necessarily improve the
performance of its hand-eye calibration.
Laser line scanners have proven useful in manufacturing, as is well documented
both in the research literature [55] and by the numerous products available in the
marketplace. They have been often used in fixed settings, where objects are placed
on a rotary table in front of the scanner [44] or flow by on conveyor belts. Low-
cost implementations have been designed which rely on a known background pattern
instead of precision hardware. Triangulation-based laser scanners have also been
used on planetary rovers to model rough terrain [60], to find curbs for autonomous
vehicles [62] and to model archaeological sites and works of art [54].
Numerous out-of-the-box triangulation systems are commercially available for
imaging small objects. However, many of these systems emphasize high accuracy
(< 0.1mm), often sacrificing depth of field. To be of most use to a mobile manipula-
tor, the sensor needs to cover the entire workspace of the manipulator, and “extra”
sensing range is helpful in determining how to move the platform so that a nearby
object will enter the workspace of the manipulator.
3.3 Laser Line Scanning for Robotics
3.3.1 Fundamentals
The geometry of the laser-line triangulation scheme is well studied and is only repeated
here for completeness. Many variants of the underlying concepts are possible. In the
scanners described in this chapter, a rotating vertical laser line is directed into the
scene. An image formed on a rigid, horizontally-offset camera shows a line which is
deformed by the depth variations of the scene (Figures 3.3 and 3.4). On each scanline
of the image, the centroid of the laser slice is detected and used to define a ray from
the camera origin through the image plane and into the scene. This ray is intersected
with the plane of laser light defined by the angle of the laser stage, its axis of rotation,
and the 3D translation from the laser stage to the camera origin. The intersection
of the plane and pixel ray produces a single 3D point directly in the image frame,
thus avoiding the depthmap-to-image registration problem, since the 3D point cloud
is defined directly in the camera image plane.
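A minimal numpy sketch of this ray-plane intersection follows; K, p0, and n are hypothetical names for the calibrated camera intrinsic matrix, a point on the laser plane (e.g., the laser origin in camera coordinates), and the unit plane normal implied by the current stage angle.

```python
import numpy as np

def triangulate_pixel(u, v, K, p0, n):
    """Intersect the camera ray through pixel (u, v) with the laser plane;
    returns a 3D point in the camera frame, or None."""
    # Back-project the pixel to a unit ray from the camera origin.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray /= np.linalg.norm(ray)
    # Intersect t*ray (t > 0) with the plane n . (p - p0) = 0.
    denom = n @ ray
    if abs(denom) < 1e-9:
        return None                       # ray parallel to the laser plane
    t = (n @ p0) / denom
    return t * ray if t > 0 else None
```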
The vertical angular resolution of the point cloud is limited by the vertical res-
olution of the camera, whereas the horizontal resolution is determined by the
rotational speed of the laser, the frame rate of the camera, and the horizontal field
of view. Depth resolution is likewise determined by a variety of factors, including
the ratio between the horizontal resolution of the camera and the field of view, the
precision of the shaft encoder on the laser stage, the ability to achieve horizontal
sub-pixel interpolation, the horizontal offset between the camera and the laser, and
the distance of the object from the camera.
3.3.2 Hardware Considerations
The prototype apparatus acquired roughly 600 images during each scan. The hori-
zontal field of view was approximately 70 degrees, and was overscanned by 10 degrees
to accommodate the depth variations of the scene. As a result, the laser line
moved approximately 0.15 degrees per frame.
The prototype scanner onboard the robot required six seconds to gather its 600
images, which were buffered in RAM on a computer onboard the robot. Subsequent
image processing and triangulation steps required an additional 4 seconds. Such a
slow rate of acquisition means that the scanner cannot be used in fast-moving scenes.
This is a fundamental limitation of this type of laser line scanning. However, addi-
tional implementation effort could result in dramatic speedups, e.g., moving to (very)
high-speed cameras and/or performing the image processing on a graphics processor
Figure 3.3: A vertical (green) laser line projected by the robot at left is deformed as it strikes objects in the scene.
(GPU).
3.3.3 Calibration
The automatic checkerboard-finding algorithm and nonlinear solver implemented in
OpenCV [8] was used to estimate the camera intrinsics. To estimate the extrinsic
calibration between the camera and the laser, first the translation and rotation were
roughly measured by hand. Then, a flat board was scanned that was marked with
several points whose relative planar distances had been carefully measured. The lo-
cations of these points in the camera image were found and recorded. Then, the
calibration error could be quantified: an error function can be created as the sum
of planarity measures as well as deviations from the measured distances on the cal-
ibration board. The calibration board was imaged from several angles to cover the
workspace of the scanner. A numerical optimization routine was then used to min-
imize the sum of the errors while perturbing the parameters, randomly restarting
many times to explore many different local minima.
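A minimal scipy sketch of this randomly-restarted optimization follows; residual() is a hypothetical function that evaluates the summed planarity and point-distance error terms described above for a candidate parameter vector, and the perturbation scale and restart count are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def calibrate(residual, x0, n_restarts=50, scale=0.05,
              rng=np.random.default_rng(0)):
    """x0: hand-measured extrinsic guess (translation + rotation parameters).
    Perturb, locally optimize, and keep the best of many local minima."""
    best = None
    for _ in range(n_restarts):
        start = x0 + rng.normal(0.0, scale, size=len(x0))
        res = minimize(residual, start, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best.x
```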
The resulting calibration holds true except at the extreme edges of the camera
Figure 3.4: A prototype laser line scanner on the STAIR 1 robot. The laser and its rotary stage are mounted in the upper-right. Images are captured by the camera in the lower-left.
view, which was hypothesized to be due to lens effects not captured in the standard
radial and tangential distortion models. Away from the edges of the image, the
scanner shows errors in the range of 1mm when imaging flat surfaces such as doors,
desks, and walls.
To calibrate the manipulator to the scanner, it was necessary to estimate the
6D transform between the manipulator base and the camera frame. To accomplish
this, the end effector of the manipulator was touched to several points on a test
board which were easily identifiable in the camera frame. Each time the end effector
touched a target point, the forward-kinematics estimate of the end effector position
was recorded. Finally, a numerical optimization routine was employed to improve the
hand-measured estimate of the 6D transform. The resulting calibration accuracy was
approximately 5mm throughout the workspace of the manipulator.
Figure 3.5: Image channels considered by the patch-selection algorithm, along with the typical appearance of a coffee mug. Top: intensity image. Middle: gradient image. Bottom: depth image.
3.4 Object Detection
Once the scanner was calibrated, it was ready to be employed to improve the perfor-
mance of object detection. For many robotics applications, this is a critical subgoal
of a larger task: for example, in order to grasp an object, it is first necessary to detect
its presence (or absence) and localize it in the workspace.
Although the laser-line scanner geometry results in the production of depth es-
timates in the image plane, these estimates do not lie on a regular grid due to the
sub-pixel horizontal interpolation being used to estimate the center of the laser stripe.
Furthermore, some regions of the depth image will be more dense than others, de-
pending on the relative direction of the surface normal and the distance to the surface.
Thus, the depth maps were resampled so that exactly one depth estimate was pro-
duced for each RGB pixel. In the experiments described in this chapter, this was
achieved through bilinear interpolation. The resulting depthmap can then be consid-
ered as another channel in the image.
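A minimal scipy sketch of this resampling step follows; it uses Delaunay-based linear interpolation as a stand-in for the bilinear interpolation described above, and us, vs, ds are hypothetical arrays of sub-pixel coordinates and depths from the triangulation step.

```python
import numpy as np
from scipy.interpolate import griddata

def depth_channel(us, vs, ds, width, height):
    """Resample scattered (u, v, depth) triangulation output onto the RGB
    pixel grid so the depthmap can be treated as another image channel."""
    gu, gv = np.meshgrid(np.arange(width), np.arange(height))
    # Linear interpolation over the scattered samples; pixels outside the
    # scanned region remain NaN.
    return griddata((us, vs), ds, (gu, gv), method="linear")
```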
3.4.1 Sliding Windows
Sliding-window methods attempt to probabilistically match a rectangular window of
the image with a collection of features local to the window. These features are very
small “patches” of the window. The classifier can be viewed as a “black box” which
returns a high probability if the window tightly bounds an instance of the target
object class, and a low probability otherwise. To perform object detection across
an entire image, the window is shifted through all possible locations in the image at
several spatial scales.
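A minimal sketch of scanning one spatial scale follows; clf is any callable scoring a fixed-size window (such as the classifier trained in Section 3.4.2), and the window size, stride, and threshold are illustrative.

```python
def scan_scale(channels, clf, win=(64, 64), stride=8, thresh=0.5):
    """Score every window position at one spatial scale; `channels` is a
    list of equally-sized 2D arrays (intensity, gradient, depth)."""
    h, w = channels[0].shape
    hits = []
    for r in range(0, h - win[0] + 1, stride):
        for c in range(0, w - win[1] + 1, stride):
            window = [ch[r:r + win[0], c:c + win[1]] for ch in channels]
            p = clf(window)     # probability the window bounds the object
            if p > thresh:
                hits.append((p, r, c))
    return hits

# Scanning across "several spatial scales" repeats this after resampling
# the channels.
```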
An extension of the sliding-window approach was used to combine information
from the visual and depth channels. Similar to the state-of-the-art approach of Tor-
ralba et al. [98], the features used by the probabilistic classifier were derived from
a learned “patch dictionary.” Each patch was a very small rectangular subregion
randomly selected from a set of hand-labeled training examples. The channels con-
sidered were the original (intensity) image, the gradient image (a transformation of
the original image: edges become bright, flat regions become dark), and the depth
map discussed in the previous section. The patches were drawn separately from these
three channels, and probabilistically represented the visual appearance (intensity or
edge pattern) and shape (depth profile) of a small region of the object class, as shown
in Figure 3.5.
Combined, the patches provided a generalized representation of the entire object
class that is robust to occlusion and appearance or shape variation. Each dictionary
entry contained the patch g, its location within the window containing the positive
example w, and the channel from which it was drawn c (intensity, gradient, or depth).
Figure 3.6: Examples of localized patches from the coffee-mug dictionary. Left: Intensity patches. Middle: Gradient patches. Right: Depthmap patches.
A patch response for a particular window was computed by measuring the similarity
of the corresponding region within the window to the stored patch.
More formally, let the image window be represented by three channels $\{I^i, I^g, I^d\}$ corresponding to intensity, gradient, and depth, respectively. Then the patch response for patch $p = \langle g, w, c \rangle$ is
$$ \max_{w'} d_c(I^c, g) $$
where dc() was a similarity metric defined for each channel. To improve robustness to
minor spatial variations, w′ was a 7× 7 pixel grid centered around the original patch
location in the training set. This allowed the patches to “slide” slightly within the
window being tested.
The similarity between patches was computed using normalized cross-correlation.
The intensity and gradient channels were normalized by subtracting the average
(mean) from the window. The depth channel was normalized by subtracting the
median depth.
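A minimal numpy sketch of this patch-response computation follows; the window, patch, and channel arguments are illustrative names, and the 7 × 7 search grid corresponds to slide=3.

```python
import numpy as np

def patch_response(window, g, r, c, channel, slide=3):
    """Best normalized cross-correlation of patch g near its nominal
    location (r, c) within one channel of a candidate window."""
    ph, pw = g.shape
    g0 = g - np.mean(g)
    best = -np.inf
    for dr in range(-slide, slide + 1):
        for dc in range(-slide, slide + 1):
            if r + dr < 0 or c + dc < 0:
                continue               # sliding position fell off the window
            sub = window[r + dr:r + dr + ph, c + dc:c + dc + pw]
            if sub.shape != g.shape:
                continue
            # Depth is normalized by its median; intensity and gradient
            # by their mean, as described in the text.
            ref = np.median(sub) if channel == "depth" else np.mean(sub)
            s0 = sub - ref
            denom = np.linalg.norm(s0) * np.linalg.norm(g0)
            if denom > 0:
                best = max(best, float((s0 * g0).sum() / denom))
    return best
```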
3.4.2 Learning the Classifiers
The preceding discussion assumed that the classifiers were already known. This
section will describe how the classifiers were built from training data.
For each object class, a binary gentle-boost classifier [28] was learned over two-split
decision stumps in these steps:
Figure 3.7: Precision-recall curves for mugs (left), disposable cups (middle), and staplers (right). Blue solid curve is for our method; red dashed curve is for vision-only detectors. Scores are computed at each threshold by first removing overlapping detections. A true-positive is counted if any detection overlaps with our hand-labeled ground truth by more than 50%. Any detection that does not overlap with a ground-truth object of the correct class is considered a false-positive. Average Precision measures 11-point interpolated area under the recall vs. precision curve. Greater area under the curve is better.
• Construct a training set by cropping positive examples and random negative
windows from the training images.
• Build an initial patch dictionary by randomly sampling regions from the positive
training images, and compute patch responses over our training set.
• Learn a gentle-boost classifier given these responses.
• Trim the dictionary to remove all patches that were not selected by boosting.
• Run the classifier over the training images and augment the set of negative
examples with any false-positives found.
• Repeat the training process with these new negative examples to obtain the
final classifier.
Since we are learning two-split decision stumps, our classifiers are able to learn
correlations between visual features (intensity patterns and edges) and object shape
(depth). Example patches from a coffee-mug classifier for the three image channels are
shown in Figure 3.6. This figure is a typical representation of 12 of the approximately
50 patches selected by the algorithm.
Five-fold cross-validation was performed to evaluate the performance of the de-
tectors and compare them against state-of-the-art detectors that did not use depth
information. The dataset consisted of 150 cluttered office scenes, with several objects
in each scene. The training procedure outlined above was used for each detector and
the average performance was calculated over the hold-out sets. Results for coffee
mugs, disposable cups, and staplers are shown in Figure 3.7 and the following table:
                         Mug             Cup             Stapler
                         3D     2D       3D     2D       3D     2D
Max. F-1 Score           0.932  0.798    0.922  0.919    0.662  0.371
Average Precision        0.885  0.801    0.879  0.855    0.689  0.299
The results suggest that 3D information helps eliminate false positives.
The 2D detectors seldom miss instances of their trained object class. Instead, the
typical problem is that they can collect a variety of disparate cues from shadows
or unrelated objects that together match enough of the localized patches that the
sliding-window detector considers it a high-probability detection.
The 3D information can help in this regard: the training process often automati-
cally selects relatively large, uniform depth patches. Effectively, this associates higher
probabilities to windows which tightly bound a single object rather than a collection
of several disparate objects. Because the approach described in this chapter did
not normalize for depth variation inside a patch, but only for its median, the depth
patches thus also encoded a measure of the absolute size of an object. These depth
cues were not explicitly expressed in the visual-light image, and as is common in
machine learning systems, presenting a richer set of features to the classifier helped
to boost performance.
3.5 Door Opening
Automation of a large variety of home and office tasks will require that robots rou-
tinely open and pass through doors. For example, at the end of a workday a typical
Figure 3.8: After localizing the door handle in the 3D point cloud, the robot can plan a path to the handle and open the door.
office building will have tens or hundreds of closed doors that must be opened if
the robot is to clean the building or search for an item. The ability to open a door
thus needs to be another primitive in the robot’s navigation toolbox, alongside path
planning and localization. In this section, a door-opening system is summarized, to
emphasize the utility of high-resolution 3D sensing for mobile manipulation.
Door opening requires manipulating the door handle without colliding with the
door. The operation of a typical door latch does not allow more than a centimeter
or two of positioning error, as the end effector is continually in close proximity to
the (rigid) door surface. Thus the door-opening task, like any grasping task where
target objects are identified in a camera, tests not only sensing accuracy but also the
calibration between the sensing system and the manipulator.
To test the utility of high-accuracy 3D sensing in this task, a system was con-
structed that used a hand-annotated map marking the locations of doors on a 2D
building floor plan. If the robot needed to pass through one of the marked doorways,
it used the triangulation-based laser scanner described in this chapter to scan the
door. From this scan, the robot used a classifier trained on hundreds of door handles
Figure 3.9: Detecting coffee mugs in cluttered environments. The detector correctly ignored the paper cup to the right of the coffee mug.
to localize the handle and classify the door as right-handed or left-handed. The robot
then drove to a location where its manipulator could reach the door handle, planned
a manipulator path to the edge of the handle, and pressed on the handle to unlatch
the door, as shown in Figure 3.8. Once the door was unlatched and partially opened,
the robot was able to drive through the door by pushing it fully open as its chassis
(slowly) came into contact with the now-unlatched door.
High-resolution point clouds assisted in planning collision-free manipulator paths
to the door handle. Unlike some 3D sensing modalities which effectively “low-pass”
the depth map as part of the sensing process, the laser-line triangulation process did
not smooth out depth discontinuities, such as those between the door handle and the
door immediately behind it. As a result, the door handle stood out sharply in the
3D data, making path planning and recognition considerably easier.
Figure 3.10: The inventory-gathering experiment required autonomous navigation (green track), autonomous door opening, and 20 laser scans of desks in the four offices shown above. The robot position at each scan is shown by the red circles, and the field-of-view of each scan is indicated by the yellow triangles. The locations of the detected coffee mugs are indicated by the orange circles. This figure was entirely automatically generated, using the SLAM output for the map and the localization log for the robot track and sensing positions, which allow the coffee-mug detections to be transformed into the global map frame.
3.6 Inventory-Control Experiment
To demonstrate the utility of these two uses of the laser-line triangulation scanner
on our mobile manipulator, the object-detection and door-opening algorithms were
combined to form an inventory-taking system. Such a system could be envisioned in
a future home, perhaps cataloging the locations of every object in the house at night
so that the robot could instantly respond to human queries about the location of
commonly-misplaced objects. Workplace applications could include inventory-taking
in retail stores, safety inspections in industry, or location verification of movable
equipment in, e.g., hospitals.
In this system, a high-level planner sequenced a standard 2D navigation stack, the
door-opening system, and the object-detection system, which together allow the robot
to take an inventory of an object class in a cluttered office building with closed (but
unlocked) doors. The system was implemented using the ROS software framework,
which will be described in Chapter 8. A building map was created offline using the
GMapping SLAM toolkit [34], using LIDAR and odometry data. The resulting map
was hand-annotated to mark the locations of doors and desks. The runtime navigation
stack is derived from the Player localization and planning modules, which perform
particle-filter localization and online path planning that unifies obstacle-avoidance
and goal-seeking behaviors.
As necessary during the inventory-taking sequence, control switches to the door-
opening system discussed in the previous section, after which control is returned to
the 2D navigation stack.
A representative run of the inventory-gathering system is shown in Figure 3.10.
During this run, 25 coffee mugs were spread throughout the search area. The 3D-
enhanced object detector found 24 of them, without any false positives. In contrast,
the image-only detector was only able to find 15 of the mugs, while also reporting 19
false positives. The robot was also seeking to identify disposable paper cups among
the clutter of the environment. The mug-inventory and cup-inventory results are
compared against ground truth in the following tables for both the integrated 3D
detectors and the 2D-only detectors.
3D-Enhanced Detectors
OBJECT    COUNT    HIT    ERROR    RECALL    PREC.
Mug       25       24     0        0.96      1.00
Cup       10       8      2        0.80      0.80

2D-Only Detectors
OBJECT    COUNT    HIT    ERROR    RECALL    PREC.
Mug       25       15     19       0.60      0.441
Cup       10       8      4        0.80      0.67
3.7 Summary
As shown by the PR curves obtained when using the 3D information versus 2D
alone, incorporating high-quality 3D information into the sensing scheme of a mobile
manipulator can increase its robustness when operating in a cluttered environment.
The door-opening task shows that high-quality 3D data can help accomplish motion
planning by accurately sensing the immediate vicinity of the robot.
These experiments were conducted using a simple laser-line triangulation device
which was constructed to provide high-quality depth data which preserved sharp
depth discontinuities and avoided depth-to-image registration difficulties. Many other
depth-measurement modalities exist, such as popular depth cameras using systems
from PrimeSense and other manufacturers, which are far easier to procure and op-
erate, and offer much faster speeds. These experiments were conducted before the
general availability of such off-the-shelf depth cameras. However, the data quality of
such cameras, as quantified by point density and noise, has yet to match that provided
by simple laser-line triangulation scanners, albeit ones operating at a much slower
rate. That said, the quality of depth measurements produced by low-cost depth
cameras is rapidly improving, and thus the algorithms presented in this chapter may
become more applicable to commodity depth cameras in future personal robots.
High-resolution depth data can also be obtained by simply reducing the range
between the scanner and the targets of interest, as many depth-sensing modalities,
including stereo vision, have error modes which are a function of range. The results
of the work presented in this chapter inspired the integration of depth sensing into a
robotic hand which will be presented in Chapter 6, where extremely high resolution
3D point clouds can be acquired by “flying” the robotic hand near the objects of
interest.
Regardless of how such depth data is acquired, the results presented in this chapter
suggest that high-resolution depth processing is a powerful technique to address the
variability and clutter found in the everyday environments likely to be encountered
by personal robots.
Chapter 4
Inertial Joint Encoding
4.1 Introduction
The previous chapter advocated the use of high-precision 3D data to improve object
detection and manipulation of large robots equipped with manipulator arms. This
chapter moves further down the system, addressing the sensing needs of the manipu-
lator arms themselves while controlling system cost, as future deployments of personal
robots are envisioned to be extremely cost sensitive.
This chapter presents a sensing strategy which uses a series of 3D MEMS ac-
celerometers to provide a low-cost method for estimating the kinematic configuration
of robot manipulators. The approach mounts at least one 3D accelerometer for each
pair of joints. Then, the joint angles are inferred using either point estimates or
through a tracking method using an Extended Kalman Filter (EKF). Because of its
low cost, low power, and negligible volumetric requirements, this system of joint po-
sition estimation can be used to augment an existing robotic sensor suite, or it can
function as the primary sensor for a low-cost robotic manipulator.
Figure 4.1: Two manipulators used to demonstrate the utility of accelerometer-based sensing. Left: Willow Garage PR2 Alpha. Right: a prototype low-cost manipulator.
4.2 Related Work
Inertial sensing has been used extensively in recent years for human motion capture.
Several companies provide small, lightweight, and networkable inertial units which in-
tegrate accelerometers, gyroscopes, and magnetometers, and the required supporting
electronics. These systems are easily attached to human limbs and torsos and are used
extensively in film or video-game character animation [91], virtual reality [27], [26],
or human activity recognition [3], among many existing and proposed applications.
Accelerometers are used in the attitude determination systems of virtually every
aerial vehicle, as well as for a variety of unconstrained navigation applications includ-
ing underwater vehicle guidance. Mass-market accelerometer applications began with
collision-sensing devices for automotive air-bag systems, but inertial sensors are now
found in an ever-increasing array of products, including mobile phones, tablet com-
puters, digital cameras, video game controllers, laptop hard drives, and so on. The
devices are now so small and power-efficient that new applications are continually
emerging.
In the robotics literature, accelerometers have also found other uses in robot navi-
gation beyond attitude determination. For example, when strapped to legged robots,
spectral analysis of accelerometer readings can be used to classify the walking sur-
face to aid in gait selection and tuning [103]. For robotic manipulation, prior work
includes simulation results of configuration estimation [67], employing strapdown in-
ertial units on heavy equipment [30], kinematic calibration [12], and the creation
of a fault-detection system [1]. In [58], accelerometers were doubly-integrated using
an Extended Kalman Filter (EKF) to estimate the configuration of a SCARA-type
manipulator undergoing high-dynamic motions. The use of accelerometers in flexible-
link robots was proposed in [56]. A human-robot system was proposed in [65] which
incorporated the attachment of accelerometers and other sensors to a human tele-
operator.
4.3 State Estimation
The vast majority of robotic manipulators use shaft encoders of some variety (optical,
magnetic, resistive, capacitive, etc.) to determine the kinematic configuration of the
manipulator. Shaft encoders may be placed on the joints themselves, or they may
be used on the motor shafts prior to the speed-reducing elements, to gain increased
resolution at the cost of not observing any unmodeled behavior in any downstream
transmissions or linkages. In contrast, this chapter discusses a sensing scheme based
solely on 3D MEMS accelerometers, and does not require any electromechanical shaft
encoders to produce estimates of the kinematic configuration of the manipulator.
As will be discussed in Chapter 6, such volumetric considerations become critical in
heavily-constrained design environments such as robotic hands.
In static conditions, a 3-axis accelerometer essentially returns a 3D vector pointed
away from the center of the earth. Since the length of this vector is fixed at 1g in
the static case, a static 3-axis accelerometer only exhibits two degrees of freedom.
Thus, at least one accelerometer is required for every two rotary joints in a robotic
manipulator. However, incorporating one accelerometer per rotary joint increases
robustness, improves the accuracy of the joint-angle estimates, and eliminates the potential
for multiple point-estimate solutions. These effects will be discussed in the following
sections.
It is important to note that given a vector of 3D accelerometer readings and
knowledge of the kinematics of the system, it is possible to estimate joint angles in
all cases except measurement singularities where an axis of rotation is vertical. As an
axis of rotation approaches the vertical, inference accuracy is reduced by the loss of
effective resolution, as the noise floor of the measurement engulfs the projection of the
gravity vector onto the joint axis. In an all-accelerometer sensing scheme, this set of
configurations must be avoided, and the severity of this limitation is dependent on the
kinematics of the manipulator and the required task. For example, accelerometer-
only sensing will completely fail on SCARA-type manipulators which always have
vertical joint axes. However, for the anthropomorphic manipulators considered in
this chapter, vertical joint configurations are readily avoidable for most degrees of
freedom. Alternatively, excursions through vertical or near-vertical joint orientations
could be gracefully handled by augmenting the accelerometer measurements with
magnetometers, back-EMF sensing, angular rate sensors, or shaft encoders of various
types.
4.3.1 Estimation via EKF
This section describes a method of producing a coherent estimate of the manipulator
state using accelerometers attached to each link of the kinematic chain. This is a
sensor-fusion problem: each accelerometer, by itself, can only determine the direction
of a gravity-directed vector. However, by using a priori knowledge of the kinematic
constraints of the manipulator, it is possible to produce a unified state estimate. In
this section, this is achieved by an Extended Kalman Filter (EKF). An augmented
forward kinematics function is used to predict the accelerometer readings given the
current belief state. The EKF algorithm then updates the belief state after observing
the true measurement. The following paragraphs will discuss this process in more
detail.
Numerous strategies can be employed to infer the manipulator configuration from
accelerometer readings. An EKF is used in this chapter due to its relatively simple
implementation and fast on-line performance. A detailed discussion of the EKF
algorithm is beyond the scope of this chapter and is presented in many excellent
texts [96]. This section only discusses the aspects of the EKF implementation unique
to this work.
To encourage smooth estimates of the joint position, the state space is defined to
include the joint angles, velocities, and accelerations:
$$ x = \begin{bmatrix} \theta \\ \dot{\theta} \\ \ddot{\theta} \end{bmatrix} \tag{4.1} $$
where θ represents the joint angles of the manipulator. The state (plant) update
function implements numerical integration and assumes constant acceleration:
$$ f(x) = \begin{bmatrix} \theta + \Delta t\,\dot{\theta} \\ \dot{\theta} + \Delta t\,\ddot{\theta} \\ \ddot{\theta} \end{bmatrix} \tag{4.2} $$
The measurement function needs to predict sensor measurements z given a state
x. As is often the case in EKF implementations, the measurement function could
be made arbitrarily complex to capture more and more properties of the system.
However, experiments showed that a measurement function which only predicts the
acceleration due to gravity was sufficient to handle the low-frequency regime of the
manipulators considered in this work. Adding additional terms to capture centripetal
accelerations and linear accelerations induced by joint-angle accelerations did not
change the performance, as will be discussed in Section 4.6.
To predict the measurement $z_i$ of a particular 3D accelerometer $\alpha_i$ given the state
$x$,
$$ z_i = R^{\alpha_i}_i R^i_{i-1}(x) \cdots R^1_0(x) R^0_w \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \tag{4.3} $$
where $R^{\alpha_i}_i$ represents the rotation from the accelerometer frame to the frame attached
to link $i$, the rotations $R^i_{i-1}(x)$ are the rotations between the link frames, and $R^0_w$ is
the rotation from the base of the manipulator to the world. The following paragraphs
describe these rotations in more detail.
$R^{\alpha_i}_i$ is determined by how the accelerometer is physically mounted to link $i$; it is
static and can be estimated during calibration.
$R^i_{i-1}(x)$ is determined by the axial orientation of link $i$ and link $i-1$, as well as
by the joint angle $\theta_i$ present in the state $x$. The link twist can be statically estimated
during calibration, but the joint angle must be recursively estimated by the EKF.
$R^0_w$ is the rotation between the base of the manipulator and the gravitational
frame. If the manipulator is stationary, this rotation is constant and can be estimated
by static calibration. There are numerous situations where the $R^0_w$ rotation is not
constant and needs to be estimated, but they lie in extreme domains of robotics:
manipulation on vehicles traveling rapidly across rough terrain, on spacecraft, or
aboard ships in rough seas, for example. These situations are far beyond the scope of
this chapter, but could be readily addressed by estimating $R^0_w$ through various means.
Using the state-update and measurement functions, the EKF algorithm produces
a recursive estimate of the mean and covariance of the state as more timesteps and
observations are experienced. The computational requirements of an EKF of this
size are not a concern: even when computing numerical derivatives of the update and
measurement functions, updating the 18-state EKF described in this section at 100 Hz
only required 3% of a single core of a desktop CPU at time of writing. This efficiency
was due in part to the usage of the Eigen C++ library for efficient numerical code
generation, but significant speedups could also be found through analytical derivatives
of the update equation, if such proved necessary for a given target application and
computational resources.
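As a concrete illustration, the following is a minimal numpy sketch of such an EKF, simplified (purely for brevity) to a planar chain whose joints all rotate about horizontal axes, so that each link's accelerometer observes gravity rotated by the cumulative joint angle. The state follows Equation 4.1, the plant follows Equation 4.2, the measurement predicts only the gravity term, and the Jacobians are computed numerically as in the text; N, DT, and the noise matrices are illustrative parameters.

```python
import numpy as np

N, DT = 3, 0.01          # number of joints and timestep: illustrative values

def f(x):                # plant update, Eq. 4.2 (constant acceleration)
    th, w, a = x[:N], x[N:2 * N], x[2 * N:]
    return np.r_[th + DT * w, w + DT * a, a]

def h(x):                # measurement, Eq. 4.3 specialized to a planar chain
    cum = np.cumsum(x[:N])   # cumulative angle of each link
    return np.array([(np.sin(c), np.cos(c)) for c in cum]).ravel()

def jacobian(fn, x, eps=1e-6):   # numerical derivatives
    y0 = fn(x)
    return np.column_stack([(fn(x + eps * e) - y0) / eps
                            for e in np.eye(len(x))])

def ekf_step(x, P, z, Q, R):
    F = jacobian(f, x)
    x, P = f(x), F @ P @ F.T + Q                     # predict
    H = jacobian(h, x)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (z - h(x))                           # update with readings z
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```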
4.3.2 Point estimates
The previous section described an EKF-based method to estimate a joint vector of
a manipulator given a vector of accelerometer readings distributed throughout the
manipulator. This section will describe an alternative method, which can produce
Figure 4.2: Accelerometers are present in the F0, F2, and F3 links of the robotic finger, creating a 2-DOF estimation problem between F0 and F2, and a 1-DOF estimation problem between F2 and F3.
estimates of the joint vector given only a single set of accelerometer readings. The
tradeoff, however, is that point estimates do not provide filtering of the measure-
ment noise, and depending on the accelerometer configuration, the point estimates
may have multiple equally-likely estimates of the joint vector. These issues will be
discussed in detail in the following sections.
Both the 1-DOF and the 2-DOF estimation scenarios arise in the joint-state esti-
mation of the robotic fingers described in Chapter 6. For clarity of discussion, these
cases are presented here, although the inter-related electrical and mechanical issues
will be described in Chapter 6. As shown in Figure 4.2, each finger presents a 1-DOF
and a 2-DOF estimation problem.
The simplest closed-form solution arises when two accelerometers are separated by
a single joint. In the robotic finger shown in Figure 4.2, this occurs when estimating
the F2-F3 joint angle. In this case, the joint angle can be estimated by taking the
difference between the arctangents of the projection of the gravity vectors of each
accelerometer into the plane orthogonal to the joint axis. The accuracy of this esti-
mate varies as a function of the angle between the joint axis and the gravity vector:
the accuracy is maximal when the joint axis is orthogonal to gravity (as depicted in
Figure 4.2), and it degenerates to contain no information when the joint axis is paral-
lel to gravity. As a result, the configuration of the system must be considered when
evaluating the information content of the estimate, and joint axes near the vertical
will likely be dominated by sensor noise.
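A minimal numpy sketch of this 1-DOF point estimate follows; it assumes (as a calibration convention for the sketch) that both accelerometer readings have already been rotated into frames whose z axis is aligned with the joint axis, and the min_proj threshold is an illustrative guard against the near-vertical case just described.

```python
import numpy as np

def joint_angle_1dof(a0, a1, min_proj=0.2):
    """Difference of arctangents of the two gravity vectors projected into
    the plane orthogonal to the joint axis (here, the local x-y plane)."""
    # Near-vertical axes leave little gravity in the projection plane, so
    # the estimate would be noise-dominated; flag it instead of trusting it.
    if np.hypot(a0[0], a0[1]) < min_proj or np.hypot(a1[0], a1[1]) < min_proj:
        return None
    return np.arctan2(a1[1], a1[0]) - np.arctan2(a0[1], a0[0])
```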
Although the geometry is somewhat more tedious, a pair of accelerometers sep-
arated by two orthogonal rotary joints also can produce a closed-form solution, al-
though depending on the joint configurations with respect to gravity and the joint
limits, up to four possible solutions may exist. In the robotic finger shown in Fig-
ure 4.2, this estimation scenario occurs when estimating the pair of joint angles con-
necting links F0 and F2, as accelerometers were not able to be fitted within the highly
constrained volume of the connecting link.
The closed-form solution requires a lemma that is presented in [42] through an
exposition of inverse kinematics, in which a similar trigonometric problem often recurs
and is reproduced here for completeness. Specifically, an equation of the form:
$$ a \cos\psi + b \sin\psi = c \tag{4.4} $$
has a closed-form solution using the following change of variables:
$$ a = r \sin\gamma \tag{4.5} $$
$$ b = r \cos\gamma \tag{4.6} $$
thus,
$$ r = \sqrt{a^2 + b^2} \tag{4.7} $$
$$ \gamma = \operatorname{atan2}(a, b) \tag{4.8} $$
giving zero, one, or two solutions:
$$ \psi = \operatorname{atan2}\left(c, \pm\sqrt{r^2 - c^2}\right) - \operatorname{atan2}(a, b) \tag{4.9} $$
Equation 4.9 will prove useful in the derivation of a closed-form point solution
to the 2-DOF separation between the accelerometers on F0 and F2 in the robotic
finger shown in Figure 4.2. Let the joint vector of the finger be defined as (0, 0, 0)
when the finger is in the fully outstretched posture shown in Figure 4.2. Furthermore,
let θ represent the proximal joint angle about the z axis, and φ represent the next
rotation about the y axis, as shown in Figure 4.2. Finally, let α0 and α2 define the
accelerometer vectors in links F0 and F2, respectively. Assuming the system is at
rest, and only the gravity vector is observed on the sensors, accelerometer α2 can
then be written as a function of accelerometer α0:
$$ \vec{\alpha}_2 = R_y(\phi)\, R_z(\theta)\, \vec{\alpha}_0 \tag{4.10} $$
$$ = \begin{bmatrix} c\phi & 0 & -s\phi \\ 0 & 1 & 0 \\ s\phi & 0 & c\phi \end{bmatrix} \begin{bmatrix} c\theta & s\theta & 0 \\ -s\theta & c\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \vec{\alpha}_0 \tag{4.11} $$
$$ \vec{\alpha}_2 = \begin{bmatrix} c\phi\,(\alpha_{0x} c\theta + \alpha_{0y} s\theta) - \alpha_{0z} s\phi \\ -\alpha_{0x} s\theta + \alpha_{0y} c\theta \\ s\phi\,(\alpha_{0x} c\theta + \alpha_{0y} s\theta) + \alpha_{0z} c\phi \end{bmatrix} \tag{4.12} $$
The middle row of Equation 4.12 is of the form of Equation 4.4 and thus has
solutions of the form of Equation 4.9, by substituting $a = \alpha_{0y}$, $b = -\alpha_{0x}$, and
$c = \alpha_{2y}$. This substitution results in up to two possible joint estimates for the
proximal joint angle $\theta$:
$$ \theta = \operatorname{atan2}\left(\alpha_{2y}, \pm\sqrt{\alpha_{0x}^2 + \alpha_{0y}^2 - \alpha_{2y}^2}\right) - \operatorname{atan2}\left(\alpha_{0y}, -\alpha_{0x}\right) \tag{4.13} $$
Depending on the joint limits of the mechanism, it is often possible to eliminate
one of the estimates of $\theta$. For example, the proximal joint of the robotic finger shown in
Figure 4.2 has a range of motion of $\pm 90$ degrees. Assuming that the estimation of $\theta$
can be reduced to a single solution due to kinematic constraints, the estimate of $\phi$,
the distal joint of the pair, can then be obtained from either the top or bottom
row of Equation 4.12, using another substitution into Equation 4.9. Using the top row
of Equation 4.12, the substitutions are $a = (\alpha_{0x} c\theta + \alpha_{0y} s\theta)$, $b = -\alpha_{0z}$, and $c = \alpha_{2x}$,
resulting in the following expression for $\phi$:
$$ \phi = \operatorname{atan2}\left(\alpha_{2x}, \pm\sqrt{(\alpha_{0x} c\theta + \alpha_{0y} s\theta)^2 + \alpha_{0z}^2 - \alpha_{2x}^2}\right) - \operatorname{atan2}\left(\alpha_{0x} c\theta + \alpha_{0y} s\theta, -\alpha_{0z}\right) \tag{4.14} $$
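The following numpy sketch assembles Equations 4.4 through 4.14 into a point estimator; solve_acb and joint_angles_2dof are hypothetical helper names, and the joint-limit pruning follows the ±90 degree range of motion discussed above.

```python
import numpy as np

def solve_acb(a, b, c):
    """All psi satisfying a*cos(psi) + b*sin(psi) = c (Eqs. 4.4-4.9)."""
    r2 = a * a + b * b
    if r2 < c * c:
        return []                                # no real solution
    s = np.sqrt(r2 - c * c)
    return [np.arctan2(c, s) - np.arctan2(a, b),
            np.arctan2(c, -s) - np.arctan2(a, b)]

def joint_angles_2dof(a0, a2, lim=np.pi / 2):
    """Candidate (theta, phi) pairs from accelerometers a0 (link F0) and
    a2 (link F2), pruned by the +/- 90 degree joint limits."""
    sols = []
    for th in solve_acb(a0[1], -a0[0], a2[1]):          # Eq. 4.13
        if abs(th) > lim:
            continue
        A = a0[0] * np.cos(th) + a0[1] * np.sin(th)
        for ph in solve_acb(A, -a0[2], a2[0]):          # Eq. 4.14
            if abs(ph) <= lim:
                sols.append((th, ph))
    return sols
```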
4.4 Calibration
The accelerometer based sensing framework described in this chapter can be used
for controlling a low-cost manipulator lacking other sensors, as well as to provide
an additional absolute configuration sensor on a manipulator equipped with another
primary sensing system. The second application can be useful when the primary
sensing modality is incremental, and the manipulator goes through a homing sequence
after initial power-up. An initial estimate of the manipulator state, even if far less
accurate than the primary sensing modality, could allow for a more efficient homing
sequence. The calibration accuracy achievable using accelerometer-based sensing in
both scenarios is evaluated in the following sections.
Depending on the fabrication methods employed to construct the manipulator
in question, some of the static link parameters may not be known a priori to high
precision. In addition, the accelerometer axes may have internal misalignments, and
additional misalignments are incurred as the chip is soldered to its circuit board
and that circuit board is subsequently mounted to the robotic manipulator. These
misalignments are cumulative, and can result in rotations of several degrees between
the measurement axes and the axes of the kinematic frames. Fortunately, these
mechanical imperfections are static, and thus can be modeled and calibrated away.
To demonstrate a calibration technique for low-cost manipulators using accelerometers as a primary sensing modality, an unactuated arm was constructed, shown in Figure 4.3. This 4-dof unpowered arm has a roughly spherical shoulder and a single elbow joint, with link lengths similar to those of the manipulator shown in Figure 4.1. The same accelerometers were used as on the powered manipulator of Figure 4.1.
Figure 4.3: Left: an unpowered arm used to evaluate the calibration potential of the accelerometer-based sensing approach. Right: touching the end effector to points on the calibration board.
The calibration scheme for the low-cost prototype was loosely derived from the
checkerboard-based calibration method widely used in computer vision [110]. A pla-
nar calibration board was placed in the workspace of the manipulator. The end
effector of the manipulator was touched to its corners, each time recording the corre-
sponding accelerometer readings. Then, the board was translated and rotated out of
plane, and the process was repeated to collect a dataset of 20 different measurements
covering the manipulator workspace. Because the sizes of the checkers are known
to high precision, these measurements served to provide scale and skew constraints
to the calibration. This data was augmented by collecting a very large number of
accelerometer readings of manipulator configurations where the end effector was in
contact with a large planar surface such as a tabletop. This large dataset served to
provide planarity constraints.
Formally, the optimization problem can be written as:
arg min_{L,R}  g1(α, L, R) + λ2 g2(α, L, R) + λ3 g3(R)    (4.15)

where α are the accelerometer readings in the training set, L is the set of estimated link parameters, and R is the set of rotation matrices modeling the misalignment of each accelerometer frame with respect to its link frame.
The first term enforces the known scale of the calibration board:
g1(α, L, R) = Σ_{i,j} ||d̂ij − dij||    (4.16)

where d̂ij is the estimated distance between pairs of end-effector positions:

d̂ij = ||FK(αi, L, R) − FK(αj, L, R)||    (4.17)

and FK denotes the forward kinematics applied to the joint angles estimated by the algorithm described in the previous section, producing end-effector position estimates from the accelerometer readings and the link parameters L. The subscripts i and j identify samples from the training set which were gathered from recorded positions on the calibration checkerboard pattern, and which therefore correspond to end-effector positions whose ground-truth distance dij is known.
The second term of the calibration optimization function, g2, corresponds to the planarity constraint imposed on the large number of manipulator configurations in which the end effector was touching a tabletop, which is assumed to be planar. Let P be the plane spanned by the singular vectors corresponding to the top two singular values of YᵀY, where Y is the n × 3 matrix whose rows yi consist of the end-effector positions calculated using the estimated calibration. The sum of the distances between the estimated end-effector positions and their projections onto the fitted plane measures the severity of the miscalibration: in an ideal calibration, this sum would be zero. Thus:
Figure 4.4: Hold-out test set error during the optimization convergence on the prototype manipulator. The horizontal axis shows the iteration number, and the vertical axis shows the mean of the miscalibrations. Numerical optimization drives the average error from 11mm to 2mm.
g2(α, L, R) = Σ_{yi} ||yi − proj_P(yi)||    (4.18)
The final term of the optimization function g3 encourages the misalignment rota-
tion matrices Ri to remain orthonormal during the optimization:
g3(R) = Σ_i ||Riᵀ Ri − I3||    (4.19)
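A minimal sketch of the three-term objective of Equation 4.15 is shown below, written in Python with numpy; the forward-kinematics routine fk, the parameter-unpacking helper unpack, and the data layout are assumed for illustration and are not the original MATLAB code. The plane fit centers the tabletop points before the singular value decomposition, a standard implementation detail:

```python
import numpy as np

def objective(params, alpha_pairs, known_dists, alpha_plane, fk, unpack,
              lam2=1.0, lam3=1.0):
    """Equation 4.15: scale (g1), planarity (g2), and orthonormality (g3)."""
    L, R = unpack(params)  # flat parameter vector -> link params, rotations
    # g1 (Eqs. 4.16-4.17): pairwise distances between checkerboard touch points.
    g1 = sum(abs(np.linalg.norm(fk(ai, L, R) - fk(aj, L, R)) - d)
             for (ai, aj), d in zip(alpha_pairs, known_dists))
    # g2 (Eq. 4.18): distance of tabletop contact points to their best-fit plane.
    Y = np.array([fk(a, L, R) for a in alpha_plane])
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    g2 = np.abs(Yc @ Vt[-1]).sum()   # Vt[-1] is the fitted plane's normal
    # g3 (Eq. 4.19): keep each misalignment rotation orthonormal.
    g3 = sum(np.linalg.norm(Ri.T @ Ri - np.eye(3)) for Ri in R)
    return g1 + lam2 * g2 + lam3 * g3
```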
To evaluate the utility of an accelerometer-based sensing scheme for a robot which
already has high-precision configuration sensing, data was collected using a Willow
Garage PR2 Alpha robot. This robot has two 7-DOF arms equipped with optical
shaft encoders on the motors driving each joint. The manipulator was already well
calibrated and the link parameters are known a priori to high precision. In this case,
the calibration task is simplified, needing to estimate only the rotations Ri of the
mounting of each accelerometer on its respective link. The joint angles from the shaft
encoders were treated as ground truth, and compared with estimates produced from
Figure 4.5: Hold-out test set error during the optimization convergence on the Willow Garage PR2 Alpha. The horizontal axis shows the iteration number, and the vertical axis shows the mean error in joint angle estimates of the shoulder lift and the upper arm roll. The optimization drives the average error from 0.1 deg to 0.02 deg.
the accelerometer readings to calibrate the misalignment rotations. The resulting optimization problem, shown below, is similar to Equation 4.15:

arg min_R  Σ_i ||θi − θ̂i(α, Ri)|| + λ Σ_i ||Riᵀ Ri − I3||    (4.20)

where θi is the joint angle of the manipulator as given by the shaft encoders and θ̂i is the estimate based on the accelerometer readings, computed by solving for the joint angles via inverse kinematics on pairs of links.
At time of writing, the implementation required approximately one hour of CPU time to reach convergence using the simplex-based optimization technique implemented by the MATLAB fminsearch function. As is typical of nonconvex optimization, a good starting point was necessary to achieve a reasonable solution. Initializing with the ideal parameters of the manipulator CAD models and assuming perfect sensor alignment yielded reasonable solutions on the test data.
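The thesis implementation used MATLAB's fminsearch; the same simplex (Nelder-Mead) search is available in open-source form, for example in SciPy. A hedged sketch of an equivalent call, reusing the objective function and helper names from the sketch above:

```python
from scipy.optimize import minimize

# x0: ideal link parameters from the CAD model plus identity rotations,
# flattened into one vector (layout is illustrative, not the original code).
result = minimize(
    lambda p: objective(p, alpha_pairs, known_dists, alpha_plane, fk, unpack),
    x0, method="Nelder-Mead",
    options={"maxiter": 100000, "fatol": 1e-9})
calibrated_L, calibrated_R = unpack(result.x)
```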
To evaluate the performance of the calibration methods quantitatively, we al-
lowed the optimization algorithm to use training data containing several manipulator
positions and checkerboard orientations, and maintained a hold-out test set of sev-
eral other orientations for evaluation purposes. The results demonstrate a significant
calibration accuracy improvement over the initial starting point of the CAD models.
Figure 4.4 shows the convergence of the algorithm from an initial mean error of 11mm down to 2mm on the prototype unpowered manipulator. Figure 4.5 shows the convergence from a mean error of 0.1 degrees in the shoulder lift joint to 0.02 degrees on the Willow Garage PR2 Alpha.
On the low-cost unpowered manipulator, the resulting 2mm average end-effector
localization error is an order of magnitude worse than what is reported by manufac-
turers of high-quality robotic manipulators sensed by shaft encoders. However, it is
more accurate than the best camera-manipulator calibration we have been able to
achieve in several years of constructing and calibrating vision-guided systems with
high-performance manipulators. We anticipate that this level of calibration error will not be the limiting factor in the use of low-cost localization approaches in a complete robotic system.
4.5 Controlling a low-cost manipulator
To explore the feasibility of low-cost manipulator control using a purely accelerometer-
based control scheme, a 6-dof manipulator was constructed under a strict budgetary
constraint of $1000 USD (Figures 4.1, 4.6, 4.7). As this manipulator incorporates
several unconventional design features, its design is summarized in this section for
completeness. Mass-produced parts were employed wherever possible, such as au-
tomotive windshield-wiper DC gearmotors and skateboard bearings, and fabrication
cost was reduced by the use of laser-cut materials.
The shoulder operated in a spherical RRR configuration with a remote center of
motion to allow the second and third motors to operate in a direct-drive configuration
to achieve a minimal part count. Although the shoulder motors were powerful and
low-cost, they exhibited some cogging due to their ferrous cores, which had detrimental effects on the low-speed control of these joints. Unfortunately, powerful low-cost
Figure 4.6: Shoulder and wrist of the demonstration manipulator.
Figure 4.7: Elbow and gripper of the demonstration manipulator.
DC brushed motors suffer almost universally from cogging, and addressing this be-
havior is a challenge in low-cost arm design.
A friction differential drive (Figure 4.6) provided the wrist with pitch and roll
degrees of freedom. The differential was created by belt-driven rubber wheels pressed
firmly against a thin aluminum veneer. Such assemblies are low-cost, durable, and
effective at transmitting high torques. The friction drive provided zero backlash and an inherent safety limit: torque overloads resulted in slippage rather than damage to the drivetrain.
The elbow is driven via belt from a large motor in the shoulder. The gripper was
fabricated from lasercut polypropylene to avoid the mechanical complexity of discrete
parts, using flexures to create a durable, zero-backlash 4-bar mechanism. The thin
belts visible in Figure 4.7 were used only to turn potentiometers for position feedback.
Linux-based software was written using the open-source Robot Operating System
Figure 4.8: Accelerometers were attached to the upper arm, forearm, and gripper of a PR2 Alpha robot.
(ROS) platform, which will be presented in Chapter 8. ROS modules were writ-
ten for firmware communication (via the Linux usb-serial driver), state estimation,
joint-space control, teleoperation via joysticks, trajectory recording, and trajectory
playback/looping.
4.6 Experiments
In this section, a series of experiments are presented that quantify the performance
of accelerometer-based state estimation and closed-loop control on two robots: the
low-cost manipulator discussed in the previous section, and a Willow Garage PR2
Alpha, a high-precision 7-dof manipulator. Accelerometers were designed into the
motor control boards distributed throughout four links of the low-cost manipulator.
The PR2 Alpha was outfitted with strapdown accelerometers, as shown in Figure 4.8.
4.6.1 PR2 Alpha State Estimation
To quantify the performance of the calibration and joint-tracking systems, accelerom-
eters were affixed to a Willow Garage PR2 Alpha robot (Figure 4.8). This 7-dof
Figure 4.9: Tracking the forearm roll of the robot shown in Figure 4.8, showing the encoder ground truth (red) against the joint angle estimate from the accelerometers (blue).
manipulator is equipped with high-quality shaft encoders, which served as ground
truth for this experiment. Its kinematic configuration includes a vertical first joint
(the “shoulder yaw”), which was not estimated in these experiments. This joint axis
is always parallel to gravity, and thus lies in a measurement singularity.
Figure 4.9 demonstrates the tracking performance of one joint (the forearm roll)
as the manipulator was smoothly moved through its workspace. The following table shows the mean error (in degrees) throughout the trajectory, measured as the difference between the shaft encoder readings and the joint state estimates from the accelerometers.

              Shoulder Lift   Upper Arm Roll   Elbow Flex
Error (deg)       0.965            0.926          1.590
This experiment was done under near-ideal conditions: the PR2 Alpha arm uses
spring balances for gravity compensation and small coreless motors [107]. The mech-
anism is thus extremely smooth and well-behaved, avoiding any transients or other
anomalies as it travels the workspace.
Figure 4.10: Closed-loop control of a low-cost manipulator using only accelerometers. Two joints are shown. Desired state is plotted in red. Output of the accelerometer-based state estimation algorithm is plotted in blue. Vertical axis denotes joint angles in radians; horizontal axis denotes time.
4.6.2 Low-cost Manipulator Torque Control
Manipulators equipped with shaft encoders can often ignore the state-estimation
problem when designing a control scheme, as the quality of the state estimate is
often independent of the state of the robot. In contrast, the quality of the state es-
timates inferred from the accelerometers can vary wildly with the configuration and
velocity of the manipulator. This section discusses the ramifications of this property
on the behavior of a low-cost manipulator equipped only with accelerometers and DC
gearmotors.
For this experiment, a proportional-integral (PI) controller was wrapped around
the state estimates produced by the EKF described in Section 4.3.1. To reduce the
efforts required of the PI controller, active gravity compensation was implemented,
using the Jacobian to compute the feed-forward torques necessary to float the manip-
ulator links, as is common practice [15]. Representative trajectories for the first and second joints of the manipulator (taken simultaneously) are shown in Figure 4.10. The responses of the other joints were similar.
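One control cycle can be summarized by the following Python sketch; the EKF wrapper, the gravity-torque computation, and the robot interface are hypothetical stand-ins for the ROS modules described earlier, not the original code:

```python
import numpy as np

def control_step(q_des, ekf, robot, integ, kp, ki, dt):
    """One cycle of PI control around accelerometer-based state estimates,
    with gravity-compensation feedforward torques."""
    q_est = ekf.update(robot.read_accelerometers())  # joint-angle estimate
    err = q_des - q_est
    integ = integ + err * dt                         # integral of the error
    # Feed-forward torques that "float" the links against gravity, computed
    # from the manipulator Jacobian and the link masses, as in [15].
    tau = kp * err + ki * integ + robot.gravity_torques(q_est)
    robot.command_torques(tau)
    return integ
```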
The accelerometer-only sensing scheme was found to break down under high dy-
namic conditions. More specifically, linear accelerations induced by angular joint
accelerations are not modeled in the measurement prediction of Equation 4.3, and
Figure 4.11: Differences between each stopping position of the arm and their respective cluster centroids, in the XY plane (left) and the XZ plane (right), as measured by an optical tracker. 14 trials were run, all of which appear on this plot.
neither are the centripetal accelerations induced by the angular joint velocities. In-
terestingly, adding those terms did not improve the high-dynamic performance of the
manipulator, nor did adding a derivative term to the PI controller. Although it is difficult to speculate without supporting experiments, it is possible that to remain stable under high dynamics, such a system requires higher control bandwidth, finer calibration, or greater measurement SNR.
As a result of the previous observation, the low-cost manipulator could become
unstable in high-dynamic situations. Furthermore, the measurement model of Equa-
tion 4.3 does not model contact forces; as a result, large contact forces may also
induce instability. In either case, stability can be regained by quickly ramping down
motor torques, which effectively slows the manipulator down until it re-enters a stable
region of the coupled sensing and control systems.
To quantify the repeatability of the low-cost manipulator, an active optical track-
ing device (Atracsys EasyTrack500) was used to obtain ground-truth position data of
the end effector. The tracking system has an accuracy of 0.2mm. The manipulator
was placed in front of the optical tracker, with the optical beacon attached to the
gripper tip.
The manipulator was commanded to cycle between two joint configurations which
required motion of at least 30 degrees on all six joints of the manipulator (excluding
Figure 4.12: In this experiment, accelerometer-based state estimation was used to generate relative joint position commands, allowing a position-controlled robot to repeatedly grasp a doorknob.
the gripper fingers), resulting in end-effector travel of 34cm. The optical-tracking data
was analyzed to extract the points where the manipulator had stopped, resulting
in one cluster for each of the target positions. Figure 4.11 shows an estimate of
the repeatability of the manipulator, as measured by the deviation of each of these
stopping positions from the mean of its cluster. The mean deviation was 18.6mm,
and the maximum deviation was 33.9mm.
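The reported statistics can be computed from the tracker data with a few lines of array code; a sketch, assuming stops holds the extracted stopping positions and labels assigns each to one of the two target clusters (names hypothetical):

```python
import numpy as np

def repeatability(stops, labels):
    """Deviation of each stopping position from its cluster centroid."""
    stops, labels = np.asarray(stops), np.asarray(labels)
    devs = []
    for k in np.unique(labels):
        cluster = stops[labels == k]
        centroid = cluster.mean(axis=0)
        devs.extend(np.linalg.norm(cluster - centroid, axis=1))
    devs = np.array(devs)
    return devs.mean(), devs.max()   # e.g. 18.6 mm mean, 33.9 mm max
```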
4.6.3 PR2 Alpha Position Control
A final experiment used the PR2 Alpha arm in a doorknob-grasping task (Fig-
ure 4.12). To avoid the instabilities witnessed in the low-cost manipulator experiment,
the accelerometer-based state estimator was used to control only the low-frequency
trajectory of the manipulator. The shaft encoders and internal high-frequency control
loops of the PR2 electrical system were used to stabilize the high-frequency behav-
ior. The accelerometer-based controller sent relative joint angle commands to the
PR2. As such, this controller could be used without shaft encoders on any manipula-
tor with stable position-based actuators, such as stepper motor-based manipulators.
Figure 4.13: Time series of one PR2 joint as the manipulator undergoes relative joint angle commands from the accelerometer-based sensing scheme and simple setpoint interpolation to derive small step commands.
Importantly, because the accelerometers fly on the actual links of the robot, their
measurements are not corrupted by link droop or cable stretch.
The accelerometer-based controller, stabilized by incremental joint encoders, was
used to control an arm of a PR2 Alpha robot. The arm was driven through a sequence
of control points to repeatedly grasp a door handle in front of the robot. Trajectory
tracking in this position-controlled scenario is shown in Figure 4.13.
4.7 Summary
The work described in this chapter was motivated by the ever-increasing precision
of consumer-grade MEMS accelerometers, and the observation that some anticipated
future domains of robotics, such as home service robots, possess many sources of error
beyond manipulator repeatability. In such scenarios, a reduction in the repeatability
of the manipulator may not drastically increase the overall system error figure.
In general, an accelerometer-only sensing strategy removes complexity from the
electromechanical mechanisms and increases the complexity of the calibration and
control software. This strategy is motivated by the observation that complex software,
unlike complex hardware, can be replicated at no cost.
However, because adding accelerometers to an existing manipulator design is me-
chanically simple and incurs very little cost, particularly for robots already equipped
with circuit boards distributed throughout the kinematic structure, this sensing ap-
proach is also suitable as a backup, or auxiliary, sensing strategy for manipulators
equipped with shaft encoders. The accelerometers could then be used to bootstrap the
power-up sequence of manipulators equipped with relative shaft encoders: regardless
of the configuration of the manipulator at power-up, an accelerometer-driven EKF
will quickly converge to a reasonable estimate of the manipulator configuration. After
an accelerometer-based joint configuration is estimated, the manipulator could use its
incremental encoders to safely and quickly reach the homing flags for each joint. Fur-
thermore, accelerometers can provide information about impacts, drivetrain health
(through spectral analysis), and a continual “sanity check” for the incremental en-
coders.
This chapter presented a low-cost sensing scheme based on 3D MEMS accelerome-
ters. The system produces coherent, absolute estimates of the kinematic configuration
of a manipulator. Experiments were performed to quantify the performance of this
scheme using both high-precision and low-cost manipulators. The accelerometer-
based sensing algorithm can be readily applied to any manipulator to augment its
state estimation with very little hardware cost and trivial mechanical complexity.
Chapter 5
A Compliant Low-cost Robotic
Manipulator
5.1 Introduction
Many extant robotic manipulators are very expensive, due to high-precision actuators
and custom machining of components. Indeed, at time of writing, the cost of the
manipulator arms and hands together makes up the bulk of the cost of many state-of-
the-art research robots. This observation led to the work described in this chapter,
wherein several fabrication and actuation methods are described which endeavor to
offer significant reduction in total system cost of robotic manipulators.
The underlying motivation is that robotic manipulation research, and eventually real-world deployments of personal robots, could advance more rapidly if robotic arms of “sufficient” performance were available at a greatly reduced cost. Increased affordability can lead to wider adoption, which in turn can lead to faster progress, a trend seen in numerous other fields [14]. However, drastic cost reduction requires numerous design tradeoffs and compromises, many of which are only justifiable in a far different set of application domains than the closed-workcell factory domains in which previous generations of robotic manipulators have seen economic success.
Figure 5.1: A low-cost compliant manipulator described in this chapter. A spatula was used as the end effector in the demonstration application. For ease of prototyping, lasercut plywood was used as the primary structural material.
Closed-workcell production line applications demand manipulators with high de-
grees of repeatability, precision, and reliability. Mass-production welding and paint-
ing robots are stereotypical examples of this application domain, and provide enormous commercial value. However, although high-precision manipulators excel at
tasks for which they can be pre-programmed, robotic manipulators have not become
commonplace in the unstructured domains of typical homes and workplaces. In such
domains, the requirements of absolute precision and speed become overshadowed by
the cost and safety concerns of the consumer market. A major effort of this thesis is
to develop techniques which can drive down the cost of personal robots. As robots
become less expensive, an increasingly varied array of applications becomes feasible,
which could lead to home and service robots capable of performing enough tasks to
provide economic justification for their large-scale deployment.
There are numerous dimensions over which robotic arms can be evaluated, includ-
ing backlash, payload, speed, bandwidth, repeatability, compliance, human safety,
and cost, to name a few. In robotics research, some of these dimensions are more
important than others: for grasping and object manipulation, high repeatability and
low backlash are important. Payload must be sufficient to lift the objects under study.
Human safety is critical if the manipulator is to be used in close proximity to people
or in classroom settings.
Some areas of robotics research require high-bandwidth, high-speed manipulators.
However, in many research settings, speed and bandwidth may be less important.
For example, in object manipulation, service robotics, or other tasks making use of
complex vision processing and motion planning, large amounts of time are typically
required for computation. This results in the actual robot motion requiring a small
percentage of the total task time. Additionally, in many laboratory settings, manip-
ulator speed is often deliberately limited to give the experimenters time to respond
to accidental collisions or unintended motions.
A shipped product must include overhead, additional design expenditures, testing costs, packaging, and possibly technical support, making direct comparisons between commercial product prices and the parts cost of research prototypes difficult. However, this chapter includes the parts cost of the manipulator in order to give a rough idea of the possible cost reduction as compared to commercially available manipulators at time of writing. Experiments are then presented which demonstrate that repeatability on the order of millimeters can be achieved with low-cost fabrication technologies.
A set of design goals were selected to guide development, intending to produce a
manipulator whose performance is comparable to commercially-available manipula-
tors currently used for personal-robot research:
• Human-scale workspace
• 7 Degrees of freedom (DOF)
• Payload of at least 2 kg (4.4 lb.)
• Human-safety considerations:
– Compliant or easily backdrivable
– Flying mass under 4 kg
• Repeatability under 3 mm
• Maximum speed of at least 1.0 m/s
• Zero backlash
To meet these goals while remaining sensitive to system cost, a manipulator was
designed and prototyped which employs low-cost stepper motors in conjunction with
timing belt and cable drives to achieve backlash-free performance. This design trades
off the cost of expensive, compact gearheads for increased system volume, mass, and
power consumption. To improve human safety, a series-elastic design was used, in
combination with minimizing the flying mass of the arm by keeping the motors at or
close to ground. The resulting prototype is shown in Figure 5.1.
A brief outline of this chapter is as follows. Section 5.2 gives an overview of some
other robotic arms used in robotics research. Section 5.3 presents an overview of the
design of the arm, and discusses tradeoffs in its actuation scheme. Section 5.4 dis-
cusses the series compliance scheme, and sections 5.5, 5.6, and 5.7 discuss its sensing
scheme, performance, and control, respectively. Section 5.8 discusses application of
the robotic arm to a pancake-making task, followed by a conclusion.
5.2 Related Work
The field of robotic manipulation has produced a vast number of manipulator designs
over the past several decades. A full survey of the field would be immense and beyond
the scope of this work. The following discussion covers some of the widely-used and/or
influential robotic arms used in personal robotics research at time of writing, many
with unique features and design criteria intended to function in the personal robotics
domain.
The Barrett WAM [78, 94] is a popular cable-driven robot well-known for its back-
drivability and smooth, fast operation. It is capable of high speed (3 m/s) operation,
advertises 2 mm repeatability, and achieves zero-backlash performance through the
use of cable reductions and cable differentials. Very high mechanical bandwidth is
achieved through these cable reductions and the relatively low flying mass of the arm,
as the large shoulder and elbow motors are grounded.
The Meka A2 arm is series-elastic and intended for human interaction. Other
custom-made robots with series-elastic arms include Cog, Domo, Obrero, Twendy-
One, and the Agile Arm [9, 22, 99, 39, 71]. The Meka and Twendy-One arms achieve
zero-backlash performance by using harmonic drive gearheads. The Cog arms employ
planetary gearboxes, whereas Domo, Obrero, and the Agile Arm use ballscrews. These
robots use various mechanisms to provide generous series elasticity, and thus tend to
have a relatively low mechanical bandwidth of less than 5 Hz due to series compliance.
However, this bandwidth limitation appears not to have restricted their use in research
in various manipulation domains, and appears to be offset by the increased margin
of safety.
A different approach is taken by several arms developed at the Stanford AI lab, which combine large, low-bandwidth series-elastic actuators with small, high-bandwidth electric motors in a “macro-mini” configuration [114, 88].
The Willow Garage PR2 robot takes yet another approach to safety: a unique
3-DOF gravity-compensation mechanism allows the arms to passively float in any
configuration. Because the large masses of the arms are always supported, only
relatively small motors with small gear reductions are needed to move the arms and
support payloads. These small actuators can be easily backdriven and thus help
improve human safety, especially when combined with soft coverings on the arms.
The prior set of designs can be contrasted with the DLR-LWR III arm [38], Schunk
Lightweight Arm [84], and Robonaut 1 [2], which all use motors directly mounted to
each joint and harmonic-drive gearheads to provide high control bandwidth with zero
backlash. These arms have higher payloads than the other arms discussed in this
section, ranging from 3-14 kg. However, human safety is only possible with active
control systems, as these arms have relatively large flying masses (close to 14 kg for
the DLR-LWR). These systems have fewer “inherent” safety features than the arms
mentioned previously, but their high bandwidths allow for tight coupling with distal
force/torque sensors to stop the arms extremely quickly after collisions are detected.
Of the robotic arms discussed previously, those that are commercially available are
relatively expensive, with end-user purchase prices well above $100,000 USD at time
of writing. However, there are a few examples of low-cost robotic manipulators used
in research. The arms on the Dynamaid robot [92] are constructed from ROBOTIS
Dynamixels, which are lightweight and compact self-contained actuator modules. The
Dynamaid robot has a human-scale workspace, but a lower payload (1 kg) than the
class of arms discussed previously. The 5-DOF KUKA YouBot arm is targeted at robotics research [48]. It has a comparatively small work envelope of just over 0.5 m³, repeatability of 0.1 mm, and payload of 0.5 kg, and employs custom, compact motors and gearheads.
More relevant to the manipulator described in this chapter, countless robot arms
have been constructed using stepper motors. Pierrot and Dombre [69, 20] discuss
how stepper motors contribute to the human-safety of the Hippocrate and Dermarob
medical robots, because the steppers will remain stationary in the event of electronics
failure, as compared to traditional DC motors, which may continue rotating, depend-
ing on the type of subsystem failure. More importantly, stepper motors are often
operated relatively close to their maximum torque, as compared to DC or BLDC
motors which typically have a much higher stall torque than the torque used for con-
tinuous operation. In the marketplace at time of writing, ST Robotics offers a number
of stepper-driven robotic arms which advertise sub-mm repeatability. However, these
manipulators are intended for industrial applications, and thus do not have integrated
human-safety features. Various other small, non-compliant manipulators made in the 1980s and 1990s for the educational market were driven by stepper motors. For
example, the 5-DOF Armdroid robots had 0.6m reach and used steppers with timing
belts for gear reduction, followed by cables to connect to the rest of the arm.
5.3 Design
The manipulator presented in this chapter has an approximately spherical shoulder
and an approximately spherical wrist, connected by a single-DOF elbow. The joint
limits and topology were designed to enable the robot to perform manipulation tasks
when mounted near table-height, as opposed to anthropomorphic arms, which must
hang down from the shoulder and require the base of the arm to be mounted some
distance above the workspace, with a correspondingly higher center of mass (and fall
hazard) of the resulting structure. The shoulder-lift joint has nearly 180 degrees of
motion, allowing the arm to reach objects on the floor and also work comfortably on
tabletops. A summary of the measured properties and performance of the manipulator
is shown in Table 5.1.
Table 5.1: Measured properties of the manipulator

Length                        1.0 m to wrist
Total mass                    11.4 kg
Flying mass                   2.0 kg
Maximum payload               2.0 kg
Maximum end-effector speed    1.5 m/s
Repeatability                 3 mm
5.3.1 Actuation overview
Figure 5.2 shows the actuation scheme for the proximal four DOF. These joints are
driven by stepper motors. Speed reduction is realized through timing belts and cable
circuits, and is followed by a series-elastic coupling to each joint. Creating speed
reduction through timing belts and cable circuits results in low friction, low stiction,
and zero backlash, enabling the arm to make small incremental motions of less than
0.5mm in all configurations. Additionally, there is no gearing to damage under ap-
plied external impulse forces. This leads to a low-cost but relatively high performance
actuation scheme. A downside to this scheme, however, is that the reduction mech-
anisms occupy a relatively large volume, making the proximal portion of the arm
somewhat large.
The two-stage reduction of a timing belt followed by a cable circuit not only accomplishes a larger gear reduction than a single stage, but also enables the motors to be
located closer to ground. The motors for the two most proximal DOF are grounded,
and the motors for the elbow and upperarm roll joints are located one DOF away
from ground. By placing the relatively heavy stepper motors close to ground, the
Figure 5.2: Actuation scheme for each of the proximal four DOF.
flying mass of the arm is greatly reduced: below the second (shoulder pitch) joint,
the flying mass of the arm is 2.0 kg. For comparison, a typical adult male human
arm has a flying mass of 3.4 kg [13].
The two-stage reduction scheme leads to coupled motions of the first four joints.
However, this coupling is purely linear and can easily be compensated using a software
feedforward term. The routes of the timing belts and cables can be seen in Figure 5.3.
Following the timing belts and cable circuits, the proximal four DOF have series elastic
couplings between the cable capstan and the output link, as discussed in section 5.4.
These couplings provide intrinsic compliance to the arm, as well as provide torque
sensing (section 5.5).
The three distal joints are driven by Dynamixel RX-64 actuator modules. These
joints do not have compliant features aside from software torque limits. However,
the compliance of the proximal four joints allows the end effector to be displaced in Cartesian space in three dimensions, except at kinematic singularities, where only two dimensions will be compliant.
5.3.2 Tradeoffs of using stepper motors
Using stepper motors as actuators offers several advantages. Because they are similar
to brushless motors, but have many more poles than a typical brushless motor, stepper
motors excel at providing large torques at low speeds, which is the target regime of
the arm. They require a relatively low gear reduction, which can be accomplished
with timing belts and cable drives, as in the design presented in this chapter. In
this design, the effective reductions were 6, 10, 13, and 13, respectively, for the first
four joints. DC motors, for comparison, generally require a significantly larger gear
Figure 5.3: Cable routes (solid) and belt routes (dashed) for the shoulder lift, shoulderroll, and elbow joints. All belt routes rotate about the shoulder lift joint. The elbowcables twist about the shoulder roll axis inside a hollow shaft. Best viewed in color.
reduction that would be either susceptible to backlash or incur considerable cost.
Stepper motors can also act as electromagnetic clutches, improving safety if large
forces are accidentally applied at the output. If a force is applied that causes a stepper
to exceed its holding torque, the stepper motor will slip and the arm will move some
distance until the stepper can re-engage. In the controllers developed for this arm,
the holding torque is approximately 60% more than the maximum moving torque
(and hence the maximum payload of the arm): large enough to avoid unintentional
slipping, but small enough to provide a form of force-limiting.
However, there are several downsides to stepper motors acting as electromagnetic clutches. First, if a stepper motor slips, the arm may need to be re-homed.
The manipulator uses joint-angle encoders for state estimation, so closed-loop position
Figure 5.4: Compact servos are used to actuate the distal three joints.
control can still occur after a slip. However, force sensing will be miscalibrated (see
section 5.5). Second, if a stepper motor slips, the arm may move suddenly away from the impact and collide with other objects or people. The arm only slips when relatively large forces are applied, but after a slip the steppers initially provide little resistance until the rotor slows down sufficiently to be re-locked by the stator fields. In the proposed design, this risk was addressed by reducing
the flying mass of the arm as much as possible. Adding backshaft encoders to the
stepper motors would enable tracking of the rotor position during rotor slippage, and
thus enable faster stoppage of a slipping motor. Whether or not the additional cost is
justified depends on the task and the anticipated frequency of unintended high-speed
collisions. As envisioned in the design, stepper slips occur only as a final layer of
safety, and thus are not anticipated to be a frequent operational mode.
5.3.3 Distal actuation
The actuation scheme of the proposed manipulator uses series-elastic actuators (SEA)
in the proximal 4 joints, but directly-coupled actuators for the distal 3 joints. The
bandwidth of the distal 3 joints is somewhat higher than the bandwidth of the prox-
imal 4 joints, permitting a restricted set of higher-frequency motions. This is similar
to that described in [113], which employs a macro-mini actuation scheme for the most
proximal DOF and conventional actuators for the more distal DOF.
The decision to use non-SEA distal actuators was made primarily to reduce the flying
mass and volume of the forearm. However, one risk is that the gears of these distal
joints must absorb shock loads delivered or received by the end effectors.
5.3.4 Inertia and stiffness
One important tradeoff in a series-elastic manipulator is between the arm inertia and the series-elastic stiffness. Consider a single-joint arm with moment of inertia I [kg m²] driven by a rotary joint with torsional stiffness kθ [N m/radian]. The arm will oscillate at its natural frequency, f0 = (1/2π)√(kθ/I). If the arm has a low inertia or the
series elastic coupling is stiff, the motor driving the arm may not have enough torque
or bandwidth to compensate for this oscillation. Pratt and Williamson [70] suggest
increasing the arm’s inertia to eliminate this effect; other options are to reduce the
spring constant; include damping in the series-elastic coupling; or increase bandwidth
by decreasing the motor gear reduction, at the cost of a lower payload. For human-safe
robotic arms with low inertia, this issue can be significant.
In the arm described in this chapter, considering the elbow joint, the natural
frequency is around f0 = 5.1 Hz, with kθ = 86 N m/radian and I = 0.083 kg m².
This is close to the bandwidth of the motors with the selected gear reduction.
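As a check, substituting the measured elbow values into the expression above recovers the quoted figure:

```latex
f_0 = \frac{1}{2\pi}\sqrt{\frac{k_\theta}{I}}
    = \frac{1}{2\pi}\sqrt{\frac{86~\mathrm{N\,m/rad}}{0.083~\mathrm{kg\,m^2}}}
    \approx \frac{32.2~\mathrm{rad/s}}{2\pi}
    \approx 5.1~\mathrm{Hz}
```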
5.3.5 Low-cost manufacturing
Several methods were used to reduce the cost of the manipulator. With similar
speed/torque delivery, planetary-geared motors of similar cost to the stepper motors
used in this design typically exhibit at least one degree of backlash. At time of writing,
actuators using harmonic drives that are capable of these speeds and torques often
cost ten times as much as a stepper motor. Owing to these cost concerns, stepper motors were
used for the proximal four joints.
To control the low-volume manufacturing cost of the design, it was realized pri-
marily by lasercutting 5-ply plywood. Although high-volume manufacturing meth-
ods such as injection molding would enable drastically lower incremental costs, these
methods typically incur significant tooling expenses which require thousands of units
to justify. Laser cutting, in contrast, is “toolless” and thus allows low-volume runs
Table 5.2: Part cost breakdown of the arm

Motors (steppers)    $700
Motors (servos)      $1335
Electronics          $750
Hardware             $960
Encoders             $390
Total                $4135
to be economically realized. The lasercutter used for these experiments (a 500-watt
Beam Dynamics OmniBeam 500) can produce tolerances of 0.025mm, and excellent
results were also achieved with an Epilog Legend Helix 24 (45 Watt) laser cutter.
Dovetailing of the wood pieces was done to enable them to press-fit together, and
flanged bearings and shafts were also press-fit into holes. It is unknown how the
wooden structure would respond to large temperature and humidity variations, but
in a typical lab environment these are held relatively constant. Wood is an excellent
material for rapid prototyping, and is rigid enough to meet the repeatability design
requirements. Further experiments and iterations of the proximal joints were done us-
ing laser-cut sheet aluminum, which verified that the results presented in this chapter
are replicable on more durable materials.
The lower arm of the robot was made of folded lasercut aluminum. Although
folded metal structures typically cannot achieve the extreme precision of custom-
machined 3D parts, calibration techniques can be readily used to compensate for
manufacturing variances.
All parts other than the lasercut structures were off-the-shelf components. A
breakdown of the parts cost for the robot is shown in Table 5.2. Not included in
this list are the costs of laser cutter time and assembly time; laser cutting would
require approximately 2.5 hours and assembly would take approximately 15 hours
for additional copies of the manipulator. The costs shown in Table 5.2 were those
incurred for the creation of the prototype described in this chapter, from a variety
of domestic suppliers. Efforts at locating low-cost suppliers would likely result in a
dramatic reduction of the part cost.
Figure 5.5: Diagram of the series compliance. Left, compliant coupling with no external force. Right, an applied force causes rotation against the locked driven wheel.
5.4 Series Compliance
As mentioned previously, the manipulator employs compliant couplings in the prox-
imal four joints. This provides an increased measure of safety, allows the arm to be
compliant even though the stepper motors are not backdrivable, and is used for force
sensing by measuring the deflection across the compliant members.
A diagram of the compliant coupling is shown in Figure 5.5. Its operation is
similar to the elastic couplings described in [49, 4, 100]. At the joint, a capstan used
in the cable circuit (labeled 1 in Figure 5.5) floats on bearings on the same shaft as the
output link (2). The capstan is connected to the output link only via the compliant
elements. Two plates connected to the output link extend through the middle of the
capstan, which has two interior cutouts. Each cutout contains a polyurethane tube
(3), which is compressed between the plate from the output link and the side of the
cutout in the capstan. In Figure 5.5(right), the capstan (4) is held stationary while
an external force (F) is applied. This causes one polyurethane tube (5) to compress
while the other (6) expands. The polyurethane tubes are initially pre-compressed to
slightly more than half of their maximum possible compression, so they will always
remain in compression as the output link moves with respect to the capstan.
Polyurethane was used to provide some mechanical damping of the joint, which
gives the arm some hysteresis but helps dampen oscillations. However, springs could
Figure 5.6: Stiffness of the elbow. Hysteresis is exhibited due to the polyurethane in the series compliance. The joint was quasi-statically moved through 70% of its normal operating range.
readily be used in their place. Tubes were used instead of rods or balls to give the
output links around 4 degrees of compliance in each direction, which requires several
millimeters of travel. Figure 5.6 shows the stiffness and hysteresis of the compliant
coupling in the elbow joint.
5.5 Sensing
As discussed in previous sections, the first four joints of the manipulator are actuated
by relatively large stepper motors embedded in the base and shoulder. The intrinsic
stability of stepper motors forms a key aspect of the sensing strategy: assuming
the stepper motors do not slip, the series of step motions the motors undergo can be
precisely integrated to give the input displacement to the series-elastic coupling. Joint
angles are measured directly at the links using optical encoders. The deflection of
the compliant element can thus be measured as the difference of the (post-reduction)
motor position and the joint angle, thus permitting force sensing via the inferred
Figure 5.7: Repeatability test results. Measurement accuracy is ±0.1 mm.
deformation of the compliant elements in the couplings.
Integration of motor step counts occurs on embedded microcontrollers located
in the first two links of the manipulator. This integration commences at power-up,
and thus the motor step integration is best seen as a relative position estimate. To
estimate the position offsets, enabling comparison with the (indexed) absolute joint-
angle encoders, the manipulator is driven to the index pulses and held stationary.
The stepper count when the manipulator is stationary at all encoder index pulses can
be taken as a static offset to permit force-sensing calibration, barring hysteresis or
plastic deformation of the compliant elements.
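The deflection-based torque inference reduces to a few lines; a sketch under the assumptions above, with hypothetical names and a per-joint stiffness k_theta taken from a calibration such as Figure 5.6:

```python
import math

def joint_torque(step_count, offset, steps_per_rev, reduction,
                 joint_angle, k_theta):
    """Torque inferred from deflection of the series-elastic coupling.

    step_count:    motor steps integrated since power-up
    offset:        static step offset found at the encoder index pulses
    steps_per_rev: full steps per motor revolution (e.g. 200)
    reduction:     post-motor speed reduction of the belt/cable stages
    joint_angle:   absolute joint angle from the link-mounted encoder [rad]
    k_theta:       coupling stiffness [N m / rad]
    """
    # Input displacement to the series-elastic coupling, post-reduction.
    motor_angle = (step_count - offset) * 2.0 * math.pi / (steps_per_rev * reduction)
    deflection = motor_angle - joint_angle
    return k_theta * deflection  # linear model, ignoring hysteresis
```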
The distal three joints are actuated by Robotis Dynamixel RX-64 servos, which
feature internal potentiometers with a usable range of 300 degrees. The potentiometer
voltage is internally sampled by the servo.
To simplify the manipulator wiring, the stepper-motor drivers and servos share
a common RS-485 bus and data protocol. Sensors are sampled and actuators are
commanded at 100 Hertz.
[Three step-response plots, each showing joint angle (degrees) versus time (seconds), with target, open-loop, and closed-loop traces.]

Figure 5.8: Step responses for each of the major types of actuators of the robot. Top, the shoulder-lift joint, a NEMA-34 stepper motor. Middle, the elbow joint, a NEMA-23 stepper motor. Bottom, the wrist yaw joint, a rigidly coupled Robotis RX-64 servo. Note that timescales change on each plot.
5.6 Performance
The performance of the manipulator on several metrics was measured. Closed-loop
repeatability was tested by repeatedly moving the arm between a home position and eight
locations distributed far apart in the workspace. The repeatability at the home
position is shown in Figure 5.7, where the position of the arm is plotted each time
after it returned from a distant location, as measured by an external high-precision
optical tracking system with 0.1mm accuracy.
The encoders can register changes of 0.036 degrees, which corresponds to 0.64mm
Figure 5.9: Low-cost MEMS inertial sensors affixed to the teleoperator’s torso, upper arm, lower arm, and hand to estimate desired end-effector positions.
at the base joint with the arm fully extended. The stepper motor at the base joint
can command changes of 0.52mm at the end effector. Moving down the arm, each
subsequent motor can command sequentially finer motions due to increased effective
gear ratios and shorter distances to the end effector.
Payload was measured by adding mass to the end effector until the shoulder step-
per motors slipped when slowly moving through the worst-case fully outstretched
configuration. Maximum velocity was measured by commanding the fully-extended
arm to move upwards at the maximum rate of the stepper motor controllers, while ob-
serving the end-effector velocity with an optical tracking system. These experiments
demonstrated a maximum payload of 2.0 kg and a maximum velocity of 1.5 m/s.
Due to the ability of the encoders to register very small motions and the soft com-
pliance of the arm, force sensing can be accomplished by measuring the displacement
of the arm. With the arm fully outstretched, masses of 15 grams reliably induce an
angular deflection large enough to be observed by the shoulder-lift joint encoder.
Figure 5.10: Playing chess via teleoperation.
5.7 Control and Software
The manipulator is controlled using standard techniques: closed-loop PID control in
joint space is achieved using joint encoders. Inverse kinematics using the OROCOS-
KDL library [10] allows Cartesian control while respecting joint limits. Nullspace
control is numerically computed on-the-fly using the Eigen C++ library [35] to con-
tinually push the joints away from singular configurations. As previously discussed,
the proximal joints are coupled due to their belt and cable routes. Linear feedforward
terms were added to the joint-space controller to decouple the kinematics.
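Because the coupling is purely linear, the feedforward decoupling amounts to a constant matrix mapping desired joint angles to motor commands; a sketch with a hypothetical coupling matrix C (the diagonal holds the stated effective reductions of 6, 10, 13, and 13; the off-diagonal values are placeholders standing in for the measured belt/cable coupling terms):

```python
import numpy as np

# Hypothetical coupling matrix: motor_angles = C @ joint_angles. Diagonal
# entries are the effective reductions; off-diagonal entries (placeholders
# here) capture how proximal belt/cable stages drag the more distal joints.
C = np.array([[ 6.0,  0.0,  0.0,  0.0],
              [ 0.0, 10.0,  0.0,  0.0],
              [ 0.0,  1.0, 13.0,  0.0],
              [ 0.0,  1.0,  0.0, 13.0]])

def motor_targets(q_des):
    """Feedforward decoupling: motor angles that land the joints on q_des."""
    return C @ np.asarray(q_des)
```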
System integration and visualization is performed by the Robot Operating System
(ROS) [76] under Linux to ease debugging. ROS will be fully described in Chapter 8, but in this
context, it was employed due to its support of hot-swapping software modules, and
management of setup and teardown of an ensemble of peer-to-peer data connections.
It was thus possible to easily swap the underlying controllers to support additional
features, such as improved force sensing or to simulate Cartesian compliance.
To demonstrate the ability of the manipulator to perform various tasks, a low-
cost teleoperation system was created similar to that described in [64]. Compact,
inexpensive USB devices containing MEMS inertial sensors and magnetometers were
Figure 5.11: Demonstration task: making pancakes.
affixed to a shirt, allowing easy estimation of the posture of the teleoperator (Fig-
ure 5.9), which in turn was used to generate inverse-kinematics joint-angle targets for
the manipulator. This control stack was used, among other things, to play a game of
chess (Figure 5.10) to demonstrate teleoperation involving fine motions.
5.8 Demonstration Application
To explore the feasibility of real-world use of the proposed manipulator, a demon-
stration “chefbot” application was created. For this application, the manipulator was
equipped with an end effector consisting of a distal roll axis connected to a spatula
and spoon. The unpowered manipulator was moved by hand through a trajectory
that scooped pancake batter out of a basin, poured two pancakes on a griddle, flipped
them at the appropriate time, and finally deposited the pancakes onto a serving plate
(Figure 5.11).
Joint-space waypoints were recorded via keypress while the manipulator was being
manually moved. The intrinsic compliance of the manipulator simplified the software:
only simple moving-setpoint control with linear joint-space interpolation was neces-
sary in order to obtain reliable autonomous task completion. During the scraping
operations, firm contact between the spatula and the griddle surface was maintained
by virtue of the series-elastic shoulder and elbow, in addition to the compliance of
the spatula. As a result, neither high-bandwidth control nor accurate force/torque
sensors were required at the end effector to manage the contact force.
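The playback scheme amounts to moving a setpoint along linear joint-space interpolations between recorded waypoints; a sketch with a hypothetical robot interface:

```python
import numpy as np

def play_waypoints(robot, waypoints, durations, dt=0.01):
    """Moving-setpoint control with linear joint-space interpolation."""
    for q_from, q_to, T in zip(waypoints, waypoints[1:], durations):
        q_from, q_to = np.asarray(q_from), np.asarray(q_to)
        steps = max(1, int(T / dt))
        for i in range(1, steps + 1):
            s = i / steps                        # interpolation fraction
            robot.set_joint_setpoint(q_from + s * (q_to - q_from))
            robot.sleep(dt)
```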
5.9 Summary
This chapter presented the design of a low-cost 7-DOF arm intended for personal-
robotics research tasks and explorations. Low cost was the result of a series of trade-
offs. For speed reduction, the cost of an expensive gearhead was traded for the volume
and complexity of a timing belt and zero-backlash cable drive circuit. For the prox-
imal actuators, the cost and potential backlash of a highly-reduced gearmotor were
traded for the relatively large static power requirements of stepper motors. These
design tradeoffs were chosen for the envisioned application of research robots inter-
acting with unstructured environments such as a typical home or workplace, where
the safety of intrinsic mechanical compliance is an important design consideration.
The cost-controlling tradeoffs described in this chapter were made as an exploration
of designing affordable compliant manipulators, an area of research which, to date,
has received little attention, but which could have a large impact on the speed of
adoption of robots into high-volume consumer markets.
Chapter 6
A Low-cost Robotic Hand
6.1 Introduction
This chapter describes the design and development of a low-cost robotic hand de-
veloped for the DARPA Advanced Robotic Manipulation (ARM) program. Some of
the highest-level design decisions were driven by program requirements: most im-
portantly, the hand was required to be self-contained. Virtually all animals realize
phalange motion through a network of tendons connected to lower-limb muscles. In
contrast, this research program sought to create robotic hand designs which can be
easily swapped among a variety of extant robotic manipulators. This required a fully
self-contained hand design, and made for numerous volumetric challenges, which will
be described in this chapter. This requirement of self-contained actuation was the
primary driver of the mechatronic topology. Other influential program goals included
low cost and a reasonable level of robustness, which will be extensively discussed in
this chapter.
The hand was designed in close collaboration with Dr. Curt Salisbury of Sandia
National Laboratories, who realized the mechanical design and assembly. As such,
the mechanical work is not claimed as original work by the author for this thesis, and
is described herein only for completeness and to explain the co-designed electronics.
The electrical and software portions of the hand were designed and implemented by
the author, and are presented in full detail.
Figure 6.1: Each finger module has three motors at the proximal end of the module, shown at left in the figure.
6.2 Related Work
Robotic hands have long been a field of intense interest. Some of the seminal previ-
ous designs include the Salisbury Hand [79], the Utah/MIT Hand [41], the Shadow
Hand [87], and the Barrett Hand [94]. Among more recent designs, a high-level clas-
sification can be made between fully-actuated and under-actuated kinematics. Many
arguments have been made for and against each approach. Although implementation
decisions can greatly affect performance and capabilities, in general, under-actuated
hands can be simpler, less voluminous, and less massive, while still capable of grasp-
ing a large variety of objects and performing relatively simple in-hand manipulations
such as pushing a button on a handheld electronic device. In contrast, fully-actuated
designs must encompass relatively more complexity, typically requiring more volume
and mass, but offer the maximum possible kinematic capabilities and permit in-hand
manipulations such as complex finger gaiting. The hand described in this chapter
falls into the latter category: it is a fully-actuated, 12 DOF design.
6.3 High-level Design
The hand was designed with low cost and robustness in mind. Towards these goals,
the hand was designed as a collection of identical finger modules that attach to a
hand frame. Each finger module is a self-contained three-axis micro-manipulator, as
shown in Figure 6.1. The results of the design effort towards low-cost and robustness
will be described in detail in the following sections.
Figure 6.2: The hand frame and its set of identical finger modules, which dock magnetically or with retaining bolts.
6.3.1 Robustness
Robustness is a relative term that is used in many contexts and can be difficult to
quantify without extensive destructive testing. However, as a general concept, robust-
ness implies the ability of a system to gracefully absorb and recover from situations
that exceed the normal operating conditions of the system.
Several key design decisions were made to increase the robustness of the hand. As
shown in Figure 6.2, the finger modules connect to the hand frame via a mechanical
fuse, exhibiting binary compliance. When large forces or torques are experienced,
for example if a robotic manipulator crashes the hand into a rigid object, the finger
modules separate cleanly from the hand frame, rather than attempting to absorb
the overload force or torque and convey it back to the manipulator arm. Binary
compliance can be achieved in a variety of ways. During the design and prototyping
process, both magnetic decoupling and easily-replaced mechanical breakaway features
were employed.
A complementary strategy to improve robustness is to support field repairs by un-
trained personnel. Towards this goal, the finger modules are designed to be easily re-
placed in the event of mechanical or electrical failure. Electrical connectivity between
the finger modules and the hand frame is achieved with spring contacts, avoiding the
need for seating a delicate connector. Mechanical installation consists of tightening a
pair of easily-accessible screws into their respective mechanical breakaway features.
6.3.2 Actuation
Extremely compact fingers and knuckles are desired in a robotic hand, to permit
natural manipulation of objects and tools commonly found in human-designed envi-
ronments. Towards the goal of slender fingers and knuckles, the motors were placed
as proximally as possible while still meeting the design requirement of a self-contained
hand. Furthermore, as discussed in the previous section, each finger module needed
to be self-contained to support the goal of separable fingers for robustness.
These constraints resulted in a design where each finger module contained a col-
umn of three motors at its base, as shown in Figure 6.1. To maximize torque density
of the motors, brushless “outrunner” motors were employed. The “outrunner” motor
configuration is essentially an “inside-out” brushless motor: the stator is at the center
of the motor, and is enveloped by a ring of permanent magnets spinning around it.
The magnetic moment arm, and thus the resultant torque, is as large as possible for
a given motor diameter, a critical factor in such a space-constrained design scenario.
Unfortunately, the geometry of “outrunner” motors requires significant thought
to be given to the thermal path of the motor. Because the permanent-magnet rotor
envelops the stator coils, significant heat is generated inside the motor. To sink this
heat away from the motors, the stator of each motor is affixed to a heatsink inside
the finger module base. As will be described in detail in Section 6.5, the stators
Figure 6.3: The motor module (aluminum at right) separates from the rest of the finger module (plastic at center) by simply removing a few bolts. Cable tension is not affected.
and shaft supports of the “outrunner” motors pass through interior cutouts in the
motor control boards to conserve volume and eliminate wiring. Similarly, the rotor
shafts pass through the shaft supports and heatsink, terminating with small pinion
gears located outside of the heatsink. For thermal conductivity, the heatsink and a
protective cap over the rotors are constructed out of machined aluminum. As shown
in Figure 6.3, this “motor module” is fully self-contained and cleanly separates from
the finger module using six bolts. As such, it can be swapped without requiring any
adjustments to the cable drive mechanisms, as the cable tension is borne by the gear
reduction and its supporting input and output bearings.
After the motor, typical cable-driven robotic assemblies employ a gear reduction
followed by a capstan upon which mechanical drive cables wrap and terminate. How-
ever, the volume and mass of robotic fingers are severely constrained, making this
traditional approach undesirable. To permit the gear reduction to also serve as the
cable capstan, the output shaft of the gear reduction is rigidly coupled or “grounded”
to the finger module chassis, and the input side of the gear reduction housing is
supported by a thin-section bearing.
To maximize torque density, a low-cost planetary gear reduction was selected.
Like many planetary gear reductions, the selected unit was rotationally symmetric
about the co-axial input and output shafts. Additionally, the exterior of the ring
gear of the planetary gear reduction was slightly larger than a common metric thin-
section bearing size. Disassembling the gear reduction and turning the ring gear on a lathe allowed the gear reduction to be precisely mated to the thin-section bearing, which in turn was press-fit into the finger module assembly. As a result, the exteriors of the gear reductions function as capstans for the mechanical drive cables actuating the distal two degrees of freedom of each finger module. The first degree of freedom, being co-axial with the first motor, does not require a cable drive and is rigidly attached to its respective gear reduction.
6.3.3 Hand Frame
Because the mechanical complexity of the hand is contained inside the finger modules,
a variety of hand geometries can be realized by printing different hand frames and
simply bolting the fingers into them, as shown in Figure 6.4.
6.4 Sensor Suite
The hand is equipped with a variety of proprioceptive and exteroceptive sensors. The
proprioception capabilities include joint-angle, tactile, strain, and thermal sensing
schemes, whereas the exteroceptive sensing capability is provided by a small camera
array. These sensing modalities will be described in detail in the following sections.
Figure 6.4: Hand frame variations. Finger modules are unchanged.
6.4.1 Joint Encoding
The volumetric constraints of robotic fingers increase the difficulty of sensing their
joint angles. In Chapter 4, an argument was made in favor of utilizing MEMS ac-
celerometers to reduce the overall system cost versus the traditional solutions of
optical or magnetic encoder discs. However, in the case of a robotic finger, an additional argument against traditional disc-based solutions can be made: they simply cannot fit within the structures while satisfying mechanical constraints. The finger knuckles are critical stress concentrations that must be mitigated by relatively massive mechanical structures, due to the somewhat poor material properties of rapid-prototyping plastic at the time of writing.
As a result, a combination of MEMS inertial sensors and forward-propagation of motor encoder readings was employed to create a compact, low-cost encoding scheme.
Three-dimensional accelerometers were placed on the circuit boards in the base of
the finger module and the F2 and F3 phalanges, as shown in Figure 6.5. In static
conditions, these accelerometers measure the direction of the gravity vector in each
respective inertial frame.
By projecting the difference between the direction of the gravity vectors onto the
constraints of the kinematic chain, the joint angles can be inferred in many cases.
The F2-F3 joint angle is relatively easy to infer, as accelerometers are positioned
Figure 6.5: The locations of the accelerometers are illustrated by the red circles.
on either side of this 1-D joint. As such, the joint angle is only difficult to measure when the joint axis approaches vertical.
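To make the projection concrete, the following minimal sketch (illustrative, not the hand's firmware) computes the angle of a 1-D hinge joint from static accelerometer readings on the two adjacent links; the joint axis is assumed to be known in both sensor frames.

    // Illustrative sketch: inferring a 1-D hinge angle from gravity vectors
    // measured by accelerometers on the two links adjoining the joint.
    #include <cmath>

    struct Vec3 { double x, y, z; };

    static double dot(const Vec3 &a, const Vec3 &b) {
      return a.x * b.x + a.y * b.y + a.z * b.z;
    }
    static Vec3 cross(const Vec3 &a, const Vec3 &b) {
      return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    }
    static Vec3 reject(const Vec3 &v, const Vec3 &axis) {
      double c = dot(v, axis);  // remove the component along the joint axis
      return {v.x - c * axis.x, v.y - c * axis.y, v.z - c * axis.z};
    }

    // g_prox, g_dist: static gravity readings in the two link frames;
    // axis: unit hinge axis, assumed identical in both frames at zero angle.
    double hinge_angle(const Vec3 &g_prox, const Vec3 &g_dist, const Vec3 &axis) {
      Vec3 p = reject(g_prox, axis);
      Vec3 d = reject(g_dist, axis);
      // Signed angle between the projections; degenerate when gravity is
      // nearly parallel to the joint axis (the "vertical joint" case above).
      return std::atan2(dot(axis, cross(p, d)), dot(p, d));
    }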
In contrast, the proximal two joints (F0-F1 and F1-F2) are significantly more
difficult to estimate. Because of the kinematic advantages of reducing the offset
between the two proximal orthogonal joints, F1 was made as compact as possible.
Its interior volume is fully occupied by four internal cable pulleys and the shaft and
bearings for the F1-F2 joint. As a result, it was prohibitively difficult to place an
accelerometer in this link, and thus the joint angles for the two proximal joints must be inferred by observing only the gravity vectors in F0 and F2, as shown in Figure 6.5.
The closed-form solution for the proximal two joint angles as a function of the
accelerometer readings in the F0 and F2 links is somewhat complex and was provided
in Chapter 4. This closed-form solution can produce up to four estimates. However,
applying range-of-motion constraints typically results in only one or two potential
solutions, and a variety of tracking and filtering schemes can be employed to choose
between them.
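A sketch of one such disambiguation scheme follows; the joint limits here are hypothetical placeholders, and simple nearest-neighbor tracking against the previous estimate stands in for the filtering schemes just mentioned.

    // Illustrative sketch: selecting among closed-form candidate solutions
    // using range-of-motion limits and temporal continuity.
    #include <cmath>
    #include <utility>
    #include <vector>

    using JointPair = std::pair<double, double>;  // (F0-F1, F1-F2) angles, radians

    JointPair select_candidate(const std::vector<JointPair> &candidates,
                               const JointPair &previous) {
      const double kMin = -0.3, kMax = 1.8;  // hypothetical joint limits
      JointPair best = previous;             // fall back to the last estimate
      double best_dist = 1e9;
      for (const JointPair &c : candidates) {
        if (c.first < kMin || c.first > kMax ||
            c.second < kMin || c.second > kMax)
          continue;  // reject candidates outside the range of motion
        double d = std::hypot(c.first - previous.first,
                              c.second - previous.second);
        if (d < best_dist) { best_dist = d; best = c; }  // track the nearest
      }
      return best;
    }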
Figure 6.6: Soft tactile pads allow conformal grasping of small objects.
Figure 6.7: To achieve mechanical robustness while still exhibiting conforming properties, the skin consists of a tougher thin outer layer above a very soft and thick inner layer.
6.4.2 Contact Geometry
Soft coverings of manipulator surfaces can greatly simplify grasping and manipula-
tion tasks. As demonstrated throughout the animal kingdom, biological manipulator
surfaces such as hands, feet, paws, tentacles, etc., conform to the object or envi-
ronment. Flexible surfaces create far larger contact patches than those created by
rigid manipulator surfaces, allowing for partial or full envelopment of small objects,
as shown in Figure 6.6. This effect is particularly pronounced when small objects
must be manipulated in fingertip grasps, such as writing implements or keys. Even
when handling larger objects, however, grasp stability is greatly improved when the
contact patch is a large surface rather than the line contacts or even point contacts
produced when rigid manipulator surfaces attempt to grasp hard objects.
From an engineering perspective, however, it is often difficult to fabricate a surface
which exhibits conformal properties while being mechanically robust. To address this
challenge, a multi-layer silicone finger skin was developed. Conformal properties are
provided by a relatively thick layer of very soft Shore OO 10 silicone gel. This soft
gel is coated by a thin and somewhat stiffer layer of Shore A40 silicone. Because both layers are silicone, very high bond strength can be achieved at their junction using silicone adhesives. The result is somewhat analogous
to animal skin, where relatively stiff layers of dermis and epidermis protect the soft
layers of flesh underneath. In both cases, the skin is highly compliant in the direction
of its surface normal, but significantly stiffer under shear loading conditions. This allows shear forces to be exerted on objects and the environment, even while the manipulator surface is conforming to the object or environment. A cutaway of
this construction is shown in Figure 6.7.
6.4.3 Tactile Sensing
Robotic manipulation, whether for simple grasping or complex in-hand manipulation,
involves managing fingertip forces while maintaining contact with objects. As such,
high-resolution tactile data can be extremely useful. Towards this objective, and with
the overall design goals of low cost and robustness in mind, a novel tactile scheme
was developed and implemented.
As described in the previous section, the finger pads use a multi-layer construction to exhibit mechanical toughness against shear loads and mechanical compliance under normal loads. By measuring the deflection of the soft
inner layer, the contact forces can be estimated.
To observe this deflection, a layer of clear silicone was added below the soft inner
silicone layer shown in Figure 6.7, and an array of transflective photosensors was embedded below this clear layer, as shown in Figure 6.8. A transflective photosensor consists of an LED and phototransistor pair inside a single package. The LED and phototransistor are arranged with a vergence angle such that the photocurrent varies
significantly depending on the reflectivity of objects located within a few millimeters
Figure 6.8: Cross-section rendering showing transflective sensors embedded in the finger pads.
Figure 6.9: Tactile array implemented as a rigid-flex PCB.
of the device.
To improve the optical properties as much as possible, the outer (tough) layer was cast in black silicone, and the middle (soft) layer was cast in white silicone. As a result, when observed from below through the clear silicone layer, the distance to the nearest white surface varies from approximately 1.5mm down to 0.25mm, depending on the external forces applied to the silicone assembly. This varying proximity of the white layer produces a varying photocurrent, which is directed through a
transimpedance amplifier and low-pass filter before being digitized by a 16-bit analog-
to-digital converter.
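The mapping from photocurrent to skin deflection is nonlinear and sensor-specific, so in practice it must be calibrated; the sketch below illustrates one plausible post-processing step, interpolating an offline per-taxel calibration table. All numbers are illustrative placeholders, not measured values.

    // Illustrative sketch: converting a raw 16-bit photosensor reading into an
    // estimated skin deflection via a monotonic calibration table.
    #include <cstddef>
    #include <cstdint>

    static const uint16_t kAdcCounts[]    = {4200, 9800, 21000, 38000, 52000};
    static const double   kDeflectionMm[] = {0.0, 0.3, 0.6, 0.95, 1.25};

    double deflection_mm(uint16_t adc) {
      const size_t n = sizeof(kAdcCounts) / sizeof(kAdcCounts[0]);
      if (adc <= kAdcCounts[0]) return kDeflectionMm[0];
      for (size_t i = 1; i < n; ++i) {
        if (adc <= kAdcCounts[i]) {
          // Linear interpolation between adjacent calibration points.
          double t = double(adc - kAdcCounts[i - 1]) /
                     double(kAdcCounts[i] - kAdcCounts[i - 1]);
          return kDeflectionMm[i - 1] +
                 t * (kDeflectionMm[i] - kDeflectionMm[i - 1]);
        }
      }
      return kDeflectionMm[n - 1];  // reading beyond the calibrated range
    }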
Rigid-flex circuit boards were created to fit this circuitry into the space constraints
of the robotic fingers. Rigid-flex constructions allow for high-density, multi-layer cir-
cuitry on a portion of the assembly, with a subset of the copper layers then continuing
Figure 6.10: Flat test fixture for the tactile array.
outside the rigid fiberglass section, protruding while covered only by flexible polyimide film. As shown in Figure 6.9, the rigid portion includes the vast majority
of the components and is routed on six layers, whereas the flexible portion includes
only the photosensor array. When installed into the robotic finger, the rigid portion
resides inside the finger volume, and the flex portion wraps around the slightly conical
outside of the finger core, covered with a protective plastic “window frame” to secure
the photosensors against shear loads. This assembly is then covered with the layers
of clear, white, and black silicone. A flat test fixture of this construction is shown in
Figure 6.10.
By varying the durometers and thicknesses of each respective silicone layer, a va-
riety of sensor characteristics can be tuned, such as sensitivity, range, and mechanical
toughness. For the robotic hand described in this work, the layer thicknesses and
durometers were chosen experimentally to seek a balance between these properties to
allow sensing of handheld tool manipulation. A representative plot of the raw sensor
response to repeated cycles of loading and unloading a 2-gram US penny is shown in
Figure 6.11, demonstrating that these 2-gram loads are far above the noise floor of
the sensor.
6.4.4 Strain
To provide a direct measurement of grasp closure force, a strain gage was affixed to the center of the F2 link. This gage measures the bending of the F2 link as it is loaded under the combined torques of J2 and J3. Because of space
constraints in the F2 link, the strain is measured using a single gage in a quarter-bridge
Figure 6.11: Raw sensor response of repeatedly loading and unloading a 2-gram US penny onto the skin assembly shown in Figure 6.10.
configuration. The differential bridge voltage is measured by a 24-bit analog-to-digital
converter after amplification.
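For reference, the standard quarter-bridge relation gives Vout/Vexc ≈ GF·ε/4 for small strains, where GF is the gage factor. The sketch below inverts this relation; the excitation, gain, and reference values are assumed for illustration and are not the hand's actual component values.

    // Illustrative sketch: recovering strain from the amplified quarter-bridge
    // voltage, using the small-strain relation Vout/Vexc ~= GF * eps / 4.
    #include <cstdint>

    double strain_from_adc(int32_t adc_counts) {
      const double kVref = 2.5;    // ADC reference voltage (assumed)
      const double kGain = 128.0;  // amplifier gain (assumed)
      const double kVexc = 5.0;    // bridge excitation voltage (assumed)
      const double kGf   = 2.0;    // typical foil gage factor
      // Bipolar 24-bit converter: counts -> volts at the amplifier output.
      double v_out = (double(adc_counts) / double(1 << 23)) * kVref;
      double v_bridge = v_out / kGain;  // refer back to the bridge terminals
      return 4.0 * v_bridge / (kGf * kVexc);
    }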
6.4.5 Visual Sensing
Although tactile and strain sensing can be of great utility in managing and maintain-
ing contact forces, a key challenge in robotic manipulation is knowing where contact
forces should be applied. A wide variety of approaches have been applied to this
problem over decades of research, with many approaches utilizing vision-based data
as their sensory input. However, key challenges to all vision-based methods include
calibration and occlusion.
The calibration problem is simple to describe: namely, if visual or depth data
is obtained from head-mounted sensors on a humanoid robot, this information must
be propagated through typically nine joints (a pan/tilt head followed by a 7-DOF
arm) to arrive in the frame of the end effector. Angular errors and coordinate-frame
misalignments are cumulative, resulting in errors on the order of one centimeter even on precisely machined, carefully calibrated humanoid robots; for example, an angular error of just 0.1 degrees at a one-meter lever arm displaces the end effector by nearly 2mm, and such errors compound across every joint in the chain. Although far superior
results can be obtained by the extremely rigid structures found in industrial robots,
and particularly by closed kinematic chains, the fundamental problem remains: prop-
agating sensory data through long kinematic chains is inherently difficult.
Occlusion is even simpler to describe: visual data is obtained by a 2-dimensional
projection of the scene onto an imaging device. By definition, it is impossible to see
through opaque objects. As a result, form-closure grasps of objects typically involve
placing at least one contact point on a region of the object that cannot be directly
perceived from a single viewpoint. A variety of strategies have emerged to predict where these occluded grasp points should be placed. However, the ability to perceive in the hand frame allows all sides of an object to be imaged by “flying” the hand around the workspace. Such “eye-in-hand” systems have been proposed by a number of researchers over the years. Recent developments in depth-sensing algorithms, combined with ever-increasing electronics density, have demonstrated the technical possibility
and utility of creating an in-hand stereo vision system.
A key design challenge of the in-hand stereo vision system was to create a system
which did not add to the volume of the hand, while simultaneously not compro-
mising the structural integrity of the mechatronic assembly. A first prototype was
constructed using mobile-phone camera modules. Due to global demand for “camera
phones,” hundreds of millions of camera modules are manufactured annually, with
massive market pressures pushing towards continual improvements in cost, volume,
and power.
Two approaches toward using cell-phone camera modules are shown in Figure 6.12,
using circuit boards which fit behind the palm surface of the hand and carry cam-
era modules on the tips of long protrusions reaching between the finger modules to
avoid introducing structural issues. The first effort attempted to maximize density by soldering camera modules directly to this large circuit board, but the (relatively)
large metal lens-holding structures were found to cause manufacturability issues with
standard surface-mount electronics production tools. As a result, the second effort
shown in Figure 6.12 employed the standard mobile-phone fabrication technique of
pre-assembled camera modules and high-density connectors.
Figure 6.12: Trinocular camera boards holding direct-solder lens modules (left) and fully-assembled camera flex circuit boards (right).
As demonstrated by the recent explosion of perception research using low-cost 3D
sensors, depth data can significantly improve the capabilities of perception systems.
A variety of modalities have been explored over the past several decades. Passive
stereo systems are notorious for failing in sparse artificial scenes with low texture.
Unfortunately, this is a common situation in the envisioned use cases for robotic
hands. As a result, various active-sensing modalities were prototyped for the in-hand
vision system.
Pico Projector
The first prototype was based on a beam-steered pico projector. These devices use
oscillating MEMS mirrors to sweep laser beams across their images, and are compact
enough to mount on the side of a robot hand, as shown in Figure 6.13.
As described in several publications employing full-size projectors, the projector can be used to create difference images of bars at various horizontal scales,
Figure 6.13: Left: beam-steered pico projector affixed to the side of a robotic hand. Right: depth image constructed using this apparatus.
as shown in Figure 6.14. Pixel-wise depth estimates are then obtained by classify-
ing each pixel of each frame as {0, 1, indeterminate}, and converting the resulting
binary string to its unique plane emerging from the projector. Intrinsic calibration
of the camera produces a ray for each pixel, which intersects this projector plane to produce a 3D estimate. Figure 6.13 shows the composite depth image produced by
the images of Figure 6.14.
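A minimal sketch of this per-pixel pipeline is shown below; the classification threshold and the lookup from decoded bit string to projector plane (obtained from projector calibration) are assumed.

    // Illustrative sketch: structured-light depth from binary bar codes.
    #include <cmath>
    #include <cstdint>

    struct Vec3  { double x, y, z; };
    struct Plane { Vec3 n; double d; };  // points p satisfy dot(n, p) + d = 0

    enum Bit { ZERO, ONE, INDETERMINATE };

    // Classify one pixel of one difference image (pattern minus its inverse).
    Bit classify(int16_t diff, int16_t threshold) {
      if (diff > threshold)  return ONE;
      if (diff < -threshold) return ZERO;
      return INDETERMINATE;  // too little signal: no depth from this pixel
    }

    // Intersect the pixel's camera ray (unit direction dir, origin at the
    // camera center) with the projector plane decoded from the bit string.
    bool ray_plane(const Vec3 &dir, const Plane &plane, Vec3 *point) {
      double denom = plane.n.x * dir.x + plane.n.y * dir.y + plane.n.z * dir.z;
      if (std::fabs(denom) < 1e-9) return false;  // ray parallel to plane
      double t = -plane.d / denom;
      if (t <= 0.0) return false;                 // plane behind the camera
      *point = {dir.x * t, dir.y * t, dir.z * t};
      return true;
    }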
Although this method has some benefits, particularly that its independent pixel-
wise estimation preserves sharp depth discontinuities, it also has several drawbacks
that became apparent during the prototyping phase. First and foremost, it relies on
using the pico projector to overcome ambient light. Safety regulations dictate that handheld laser projectors must be eye-safe; as a result, they are limited to approximately 20
lumens, spread evenly across the entire scene. Although effective in somewhat dimly
lit rooms, this technique would not scale well to the variable lighting conditions of
many envisioned applications of the robot hand.
Figure 6.14: Difference images of polarity-inversion bar codes produced by the pico projector, amplified 5x.
Fingertip Laser Line Scanner
As a result of the limited ambient-light robustness of the pico projector, a simpler
scheme was next prototyped using a laser line generator on the back of a finger.
This finger was then slowly moved in such a way as to sweep the laser line across the
scene, following the simple and well-known geometry and image-processing techniques
of laser line scanners. As shown in Figure 6.15, this technique is relatively simple to
implement. Because the optical energy of the laser is collapsed onto a 1-D line, instead
of trying to cover a 2-D surface as in the pico projector, there is a significant gain in
signal-to-noise ratio simply due to the higher concentration of laser light for a given
optical power limit.
Figure 6.15: Left: laser line generator mounted on robotic finger. Middle: laser line sweeping across the scene. Right: typical frame and image-difference processing.
As expected from this modality, the finger-based prototype of this technique
showed the advantages and drawbacks typical of laser line scanners, as previously
described in Chapter 3. Fine detail emerges readily, as shown by the coffee mug scan
in Figure 6.16. However, scans are time consuming. The scan shown in Figure 6.16
was acquired in approximately one minute and assembled from 300 positions of the
finger-mounted laser line generator.
Speckle Generator
As a final prototype experiment, a laser speckle generator was employed to produce
unstructured light. As noted widely throughout the literature and demonstrated by
contemporary robots such as the Willow Garage PR2, the injection of unstructured
texture into scenes can significantly boost the performance of passive stereo vision
systems. This is particularly true in artificial scenes containing featureless walls and
objects, which can cause block-matching stereo algorithms to fail.
A demonstration of the utility of this method is shown in Figure 6.17 on a scene
of an envisioned future application of the robotic hand: grasping a coffee mug from a
table. As is common in artificial workspaces, the scene offers little texture for passive
stereo block-matching, hence producing few points on the surface of the coffee mug
Figure 6.16: Two scenes showing typical scans of the fingertip-mounted laser scanner.
itself. Laser line scanning (Figure 6.17, right-most column) produces hundreds of
thousands of points, but at the cost of acquiring three hundred images. Injecting
texture into the scene via laser diffraction (Figure 6.17, middle column) offers a useful intermediate level of point-cloud density with the benefit of single-frame acquisition.
6.5 Wiring Elimination
Wiring issues are a common cause of failure in robotic systems. Unfortunately, proper
design for wires that flex during normal operation often involves large structures to
increase the bend radius of moving wires, or capturing wires in hollow chains traveling
through carefully-designed routes to stay clear of moving parts. For the design of
a fully-actuated robotic hand, such volumetric requirements are prohibitive, as the
majority of the interior volume is already occupied by electro-mechanical systems. To
address these issues, significant design effort was expended to eliminate all moving
Figure 6.17: Demonstration of unaided stereo (left), texture-assisted stereo (center), and laser line scanning (right) on an artificial scene with very little texture.
wires from the design.
The result of this design effort can be seen in Figure 6.18, which compares the initial hand prototype, built with pre-packaged motors and hall-effect encoders, against the final design, whose custom motor assemblies eliminated all loose wiring.
The wiring-reduction steps in the hand design can be categorized into three design efforts: first, the wiring involved in sensing and commutating the motors; second, the wiring providing power and communications between the base of each finger and the sensor arrays mounted in the phalanges; and third, the wiring connecting the major subsystems of the hand. The approaches used to meet each of these challenges were
quite different, and will be detailed in the following sections.
Figure 6.18: Left: initial prototype hand. Right: final prototype, after extensive design work to eliminate loose wires.
6.5.1 Motor Wiring
As briefly mentioned in Section 6.3.2, the hand is actuated by a set of “outrunner”
brushless motors. These motors consist of a three-phase stator winding surrounded by
a permanent-magnet rotor ring. The direction of currents through the stator windings
must be commutated through a six-step cycle in synchrony with the position of the
rotor. Such motors are often packaged as an assembly featuring a small circuit board underneath the stator, only slightly larger than the rotor diameter, that contains the necessary hall-effect switches to provide clocking for the commutation sequence. Unfortunately, utilizing such pre-assembled motors can result in a design
such as that shown in the left side of Figure 6.18, where a great deal of wiring must be
run between the motor assemblies and the motor controllers. In a compact high-DOF
design environment such as a 12-motor hand, this can become difficult to manage.
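At its core, six-step commutation is a lookup from the 3-bit hall state to a phase-drive pattern, as sketched below; the particular hall-to-phase ordering is motor-specific, and in the final design this logic resides inside integrated driver ICs rather than in firmware.

    // Illustrative sketch: six-step commutation table indexed by hall state.
    // The row ordering below is one valid pattern; real motors differ in hall
    // placement, so the table must be matched to the specific motor.
    #include <cstdint>

    enum PhaseDrive : int8_t { LO = -1, FLOAT = 0, HI = 1 };
    struct Phases { PhaseDrive a, b, c; };

    Phases commutate(uint8_t hall_state /* 3-bit code; 1..6 are valid */) {
      static const Phases kTable[8] = {
          {FLOAT, FLOAT, FLOAT},  // 0: invalid sensor state
          {HI, LO, FLOAT},        // 1: current from phase A to phase B
          {FLOAT, HI, LO},        // 2: B to C
          {HI, FLOAT, LO},        // 3: A to C
          {LO, FLOAT, HI},        // 4: C to A
          {FLOAT, LO, HI},        // 5: C to B
          {LO, HI, FLOAT},        // 6: B to A
          {FLOAT, FLOAT, FLOAT},  // 7: invalid sensor state
      };
      return kTable[hall_state & 7];  // one phase high, one low, one floating
    }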
To resolve this issue, as well as reduce the volume of the overall design, custom
motor controllers were designed. To control the fabrication cost of the design, com-
mercial off-the-shelf (COTS) low-cost outrunner brushless motors were employed.
Figure 6.19: Stackup of outrunner brushless motors, controller board, and heatsink.
These COTS motors were disassembled, after which their stators and rotors were
re-assembled around the custom motor controllers. As previously discussed in Section 6.3.2, each finger module contains three outrunner brushless motors. Corre-
spondingly, each motor controller board contains the necessary hall effect sensors,
amplifiers, computation, and communications resources to drive three motors.
The motor control boards are mounted in between the rotors and the finger
heatsink, as shown in Figure 6.20. This placement allows the hall-effect sensors
for brushless commutation to be placed on the same circuit board as the amplifiers,
reducing the cost, volume, and reliability issues associated with bringing these sensor
inputs to off-board amplifiers.
The remainder of the motor-controller wiring shown in Figure 6.18 was eliminated
by soldering the stator windings directly to the motor control boards. Traces on the
motor control boards were brought to open pads directly underneath the termination
points of the magnet wire phases on the stators, allowing direct and secure connections
inside the volume of the enveloping rotors and avoiding any chance of wires being
ingested into the assembly.
On the motor control boards themselves, integrated motor drivers were utilized.
Figure 6.20: Finger motor controller board (FMCB).
The ST L6229Q drivers were selected primarily for their compact size and high level
of integration, as they contain commutation logic, current control loops, and the three
half-bridges required for brushless commutation within a 5mm x 5mm package. Even
with the large interior cutouts required in the motor board for the stators to pass
through, the resulting layout easily fits within the volume of the finger module, as
shown in Figure 6.20.
However, one penalty for such a high level of power-stage integration is the rela-
tively high bridge resistance of approximately 2 ohms. To manage the heat dissipation
associated with this resistance, the amplifiers are mounted on the bottom side of the
motor control board, which is subsequently clamped to an aluminum thermal plate,
as shown in Figure 6.19.
To protect the high-speed outrunner rotors from damage, the back of the finger
module is covered by an aluminum shell which bolts to the thermal plate. The
shell also significantly increases the surface area available for heat convection. These
thermal features allow the fingers to operate without the associated reliability issues
of forced-air cooling, despite their relatively high power density and scale of electro-
mechanical integration.
6.5.2 Phalange Wiring
One of the key design challenges of a dense, fully-actuated robotic hand is to create
reliable and compact electrical connections out to the fingertip sensors. The F1 link, which connects the orthogonal joints J1 and J2, is particularly challenging, since its internal volume is fully occupied by structural material, bearings, and shafts for the four cables passing through it. Because F1 is the most proximal
finger link, it sees the highest mechanical stresses. Thus, unfortunate tradeoffs are
created by any effort to remove structural material to create free space for electrical
conductors.
Several prototypes were created using thin 0.5mm-pitch flat-flex cable (FFC),
which has become a standard interconnect method for mass-market products such as
flip phones and hinged laptop screens. However, after the failure of several prototypes,
it was determined that although FFC wiring excels at single-DOF applications, such
as the consumer products listed previously, it is difficult to create satisfactory routes
and service loops for FFC cables in space-constrained multi-DOF settings such as the
F1 links of the fingers.
To bypass the challenges associated with creating routes for copper wiring in
parallel with the steel mechanical tendons, a series of experiments and prototypes
were constructed which use the steel mechanical tendons as electrical conductors.
The resulting system consists of several carefully co-designed mechanical and electrical
components, and is able to multiplex DC power and multi-megabit half-duplex data
onto the pair of steel tendons which pass through the F1 link.
To permit bidirectional and symmetric torque generation at each joint, the F2 and
F3 links each house terminations of a pair of steel tendons. The tendons wrap multiple
times around their respective floating gear reduction in the base of the finger module,
as described in Section 6.3.2. To eliminate the possibility of tendons slipping on their
respective gear reduction, each cable is carefully unwound for a few millimeters at its
Figure 6.21: Two pairs of steel cables actuate the distal phalanges, shown in red and yellow. Electrical implementation is shown at bottom.
midpoint, and a small screw passes through the cable and into a hole tapped into the
side of the gear reduction.
This mechanical cable actuation system was created to produce maximal torque
density with commodity low-cost parts. However, it can also be exploited to pass
power and data through the difficult design environment of the F1 link. As shown
in Figure 6.21, there are two steel cables running through F1: the cable wrapping
around M2 (illustrated in yellow) and the cable wrapping around M3 (illustrated in
red). Through their retaining screws, these cables are electrically coupled to the ring
gear of their respective gear reductions. These planetary gear reductions have metal
ball bearings and a metal output shaft. As a result of the cable tension preload, the cables are electrically connected to the output shaft of their respective gear
reductions. Each output shaft, in turn, is electrically connected to the metal retaining
features and setscrews which lock them relative to the plastic front cover of the finger
module chassis. By connecting wires to these retaining features, it is thus possible to
Figure 6.22: Simplified schematic of the multiplexing of power and half-duplex data over the pair of conductors running the length of the finger. RS-485 transceivers are connected to the D+/D- nodes; F2 and F3 power supplies are connected to the V+/V- nodes. Bus power is supplied from FMCB (left).
obtain electrical connections to the F2 and F3 links, without requiring any flexible
copper conductors.
The steel mechanical tendons and the various mechanical features mentioned in
the previous section incur electrical penalties including both resistance and the po-
tential for intermittent connectivity as the ball bearings in the planetary output stage
are loaded and unloaded. The electrical resistance of the conductor chain totals 1.5 ohms; this would potentially be an issue if this transmission line were intended to drive motors. However, since only sensors and relatively energy-efficient microcontrollers are receiving power, the losses are manageable: for example, a 100 mA load across 1.5 ohms drops only 0.15 V. The potential for intermittent con-
nectivity is addressed by capacitors on the distal sensors, allowing them to maintain
power through the momentary glitches. Data drops are handled by using a protocol
similar to USB, where each packet is protected by checksums and slightly spaced in
time, to allow for fast and unambiguous re-synchronization of all transceivers.
Although this conductor chain is mechanically robust, it only has two conductors.
The phalange sensors require both power and data connections to be useful. Thus,
a multiplexing scheme is employed to transmit both power and half-duplex RS-485
data across the pair of conductors. By reversing the standard connections to one
inductor of a common-mode choke, it becomes a “differential-mode choke,” which
strongly attenuates differential-mode signals. Thus, the pair of conductors can have
DC power flowing through it, but still act as a transmission line for data. The RS-485
transceivers on the motor controller and the distal and proximal phalanges are then
capacitively coupled to this conductor pair. Each transceiver has a transmitter-enable
line that is controlled by its corresponding microcontroller. Similar to USB and other
half-duplex architectures, bus traffic follows a simple master/slave scheduling method
to prevent collisions. A simplified schematic diagram is shown in Figure 6.22.
A key requirement of this multiplexing scheme is a data stream with zero DC off-
set. Otherwise, the data will attempt to “drive” the power line slightly higher or lower,
resulting in either saturation of the inductors or overloading the active transceiver’s
output stage. To avoid this condition, the microcontrollers in each module were specif-
ically selected to include Manchester encoders. The Manchester encoding scheme is
a simple method of removing DC bias in a data stream by replacing each “0” or “1”
bit with a “0-1” or “1-0” pair of chips. Because the chipping rate is now twice the
data rate, effectively half the bandwidth has been lost. This overhead is considerably lower in more efficient line codes such as 8b/10b, among many others.
However, space constraints in the phalanges, the desire to use commodity low-cost
microcontrollers, and the sufficiency of its throughput for tactile data transmission
resulted in the selection of Manchester coding for these experiments.
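Manchester coding is simple enough to sketch in a few lines. The fragment below uses the IEEE 802.3 chip convention (0 becomes 10, 1 becomes 01); the hand's hardware encoders may use the opposite polarity, and this sketch is illustrative rather than the actual firmware.

    // Illustrative sketch: Manchester coding, which guarantees equal numbers
    // of high and low chips and therefore zero DC offset on the line.
    #include <cstdint>

    // Encode one byte into 16 chips, most significant bit first.
    uint16_t manchester_encode(uint8_t byte) {
      uint16_t chips = 0;
      for (int i = 7; i >= 0; --i) {
        chips <<= 2;
        chips |= ((byte >> i) & 1) ? 0x1 /* 01 */ : 0x2 /* 10 */;
      }
      return chips;
    }

    // Decode 16 chips back to a byte; a 00 or 11 chip pair indicates a line
    // error or loss of synchronization.
    bool manchester_decode(uint16_t chips, uint8_t *byte) {
      *byte = 0;
      for (int i = 7; i >= 0; --i) {
        uint8_t pair = (chips >> (2 * i)) & 0x3;
        if (pair != 0x1 && pair != 0x2) return false;
        *byte = uint8_t((*byte << 1) | (pair == 0x1 ? 1 : 0));
      }
      return true;
    }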
6.5.3 Interconnect Wiring
The design challenges and approaches taken to address the motor and phalange wiring
issues were somewhat specific to the design environment of a robotic hand with many
degrees of freedom. However, there are other interconnection challenges that are com-
mon to any high-density electronic system: connecting the various planes of circuitry
in the palm. These circuit boards do not move with respect to each other, allowing
standard high-density connectors to be employed. The cameras, palm board, FPGA
board, and power board all interconnect using 0.5mm-pitch, dual-row connectors.
The internal mechanical structures holding these circuit boards were the result of careful and interactive electrical and mechanical co-design. The end result is that installing the circuit boards into the hand causes their connectors to blind-mate, after
which the boards are locked into place using mechanical fasteners.
In contrast to the static interconnections of the palm circuit boards, the fingers are
designed to separate harmlessly from the palm in the event of mechanical overload.
This requires electrical connections which have sufficient capacity to power the motors
of a finger module and disconnect in “5.5” degrees of freedom: all three axes of torque
overload, and all axes of force overload except along the ray pointing from the finger
module to its mating surface on the back of the palm. The many potential modes
of mechanical finger/palm disconnection ruled out many standard connectors, which
would shear or otherwise be destroyed. As a result, a pogo-pin based approach was
employed. Four spring-loaded pogo pins protruding from the front face of each finger
module compress against suitably-placed gold-plated pads on a circuit board attached
to the back of the palm, carrying DC power and RS-485 data between the palm and
the finger module. As the finger disconnects during a mechanical overload, these
pogo pins simply slide or otherwise unload from their respective contact pads. The
circuitry immediately downstream and upstream of this spring-loaded connection is
designed to handle the transients associated with momentary connections or shorts of
these pins and their neighbors. As a result, mechanical breakaway can occur without
mechanical or electrical damage to the finger/palm connection.
Though it is difficult to render a three-dimensional interconnect topology in two
dimensions, Figure 6.23 illustrates the interconnection of the circuit boards in the
palm, as well as the motor control board of a single finger. The “pogo board” shown
connecting a finger socket to a finger motor board is simply an interconnect board
used to create rigid “wiring” between pogo pins protruding from the front of the
finger module to the finger motor control board, which must reside on a plane near the back of the finger for the reasons described in Section 6.5.1.
6.6 Computational Systems
The hand contains a distributed network of processors: three ARM Cortex-M3 micro-
controllers in each finger module, one ARM Cortex-M4 in the palm, and one Xilinx
Figure 6.23: Left: illustration of the connectors in the palm and one finger module base. Right: the resulting hand, which features no loose wires.
Spartan-6 FPGA in the palm. These computational resources are connected through
a hybrid star-like topology, as illustrated in Figure 6.24. The microcontroller in the
base of each finger module has an RS-485 connection to a dedicated UART in the
palm microcontroller, as well as an RS-485 transceiver connected to its phalange
data/power bus, which has two slave controllers on the respective F2 and F3 links,
as described in Section 6.5.2. The palm microcontroller has a variety of outside-
facing peripherals, such as a CANbus isolator, 10/100 ethernet, and a USB isolator,
as well as an internal SPI link to the FPGA, which in turn has a high-bandwidth
connection to a gigabit ethernet physical-layer transceiver (PHY). This architecture
was designed to allow a variety of connectivity options to upstream hardware in the
often-noisy electrical environment of a robotic arm.
Because of the relatively large number of microcontrollers, attention was given to
designing methods to allow batch programming of the entire hand. It would be un-
fortunate, for example, if firmware updates required an operator to manually connect
Figure 6.24: Data bus topology of the robotic hand.
programming adapters to all twelve ARM processors in the fingers. To prevent this sit-
uation, a custom bootloader was written that is compatible with the packet structures
used during normal operation of the hand. This bootloader waits for commands for
several seconds on power-up before booting the application image. Electrical power
to the nodes on each data bus can be shut down electronically by its upstream pro-
cessor. This allows a reprogramming cycle to be triggered at any point by forcing a
hard reset of the downstream processors, catching their respective bootloaders, and
transmitting a new application flash image. Through this method, a single script on
an attached computer can reprogram all twelve finger microcontrollers.
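The host-side flow is sketched below; every function and command name here is a hypothetical stand-in for the hand's actual packet protocol, with trivial stubs so the sketch is self-contained.

    // Illustrative sketch of the batch-reflash sequence; the transport
    // primitives are hypothetical stand-ins, stubbed out with prints.
    #include <cstddef>
    #include <cstdio>

    static void bus_power(int finger, bool on) {             // stub transport
      std::printf("finger %d: bus power %s\n", finger, on ? "on" : "off");
    }
    static void send_packet(int finger, const char *what) {  // stub transport
      std::printf("finger %d <- %s\n", finger, what);
    }

    static void reflash_finger(int finger, std::size_t image_size) {
      bus_power(finger, false);  // hard-reset the downstream processors
      bus_power(finger, true);   // their bootloaders now wait for commands
      send_packet(finger, "enter-bootloader");
      const std::size_t kChunk = 64;  // flash write granularity (assumed)
      for (std::size_t off = 0; off < image_size; off += kChunk)
        send_packet(finger, "image-chunk");  // stream the application image
      send_packet(finger, "boot-application");
    }

    int main() {
      for (int finger = 0; finger < 4; ++finger)  // all four finger modules
        reflash_finger(finger, 32 * 1024);        // hypothetical 32 KB image
      return 0;
    }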
6.7 Teleoperation Interfaces
A variety of teleoperation interfaces were created to subjectively evaluate the perfor-
mance of the robotic hand with a collection of test objects. First, several off-the-shelf
gloves based on resistive flex-sensors were employed. The gloves used during these
tests, however, were challenging to calibrate for precise movements, and tended to experience “mechanical crosstalk” between the flex sensors in the proximal
Figure 6.25: An exoskeletal glove designed to precisely measure the movements of the teleoperator, with a kinematic structure similar to the robotic hand.
knuckles of the glove as they gradually migrated in between the knuckles of the tele-
operator during extended experimental sessions.
To improve teleoperation accuracy, an exoskeletal glove was designed and interfaced to the robotic hand, as shown in Figure 6.25. Through a series of four-bar
linkages, this glove directly measures the relative orientations of the teleoperator's
fingers, and was designed to have kinematics similar to the robotic hand. Limited,
subjective evaluations supported the hypothesis that direct measurement of the con-
figuration of the exoskeletal structure allowed for higher-precision teleoperation, as
compared to inferring the position of the teleoperator's fingers through resistive flex-
sensor measurements.
The exoskeletal glove allowed high-dimensional movements to be expressed by the teleoperator, such as finger-gaiting an object off a work surface and into a grasp. However, the exoskeletal glove required a significant amount of operator attention to perform basic tasks such as perfectly cylindrical or spherical grasps. This was partially because the range of motion of the robotic hand is significantly super-human, requiring amplification of the motions of the exoskeletal glove. The
Figure 6.26: Various grasps and manipulation postures achieved during teleoperation.
resulting mapping required mental work on the part of the teleoperator and could be surprisingly challenging.
To achieve super-human, reliable execution of “simple” grasps, an eigengrasp-
based interface was developed. Eigengrasps for canonical grasps such as cylindri-
cal, spherical, and prismatic grasps (among others) were experimentally determined.
Commodity gamepads were then utilized to traverse the eigengrasps by holding var-
ious buttons and moving the gamepad axes. Although this interface allowed quick
traversals of canonical grasps, the thumb-joystick interaction of commodity gamepads
caused difficulties in producing small, precise motions. The eigengrasp interface was
then used more successfully when mapped to the sliders of commodity audio-mixing
panels, which are widely available, low cost, and can connect to host computers over
USB and other common protocols. The relatively long travel of the sliders, and the
absence of spring-return features, allowed the mixing-panel eigengrasp interface to
more easily perform tasks in subjective evaluations.
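In essence, such an interface evaluates a linear combination of grasp basis vectors about a rest posture, one slider per eigengrasp; a minimal sketch follows, with the dimensions and limits assumed rather than taken from the actual implementation.

    // Illustrative sketch: mapping mixer sliders onto eigengrasp coordinates.
    #include <array>

    constexpr int kDof = 12;    // fully-actuated hand: four 3-DOF fingers
    constexpr int kGrasps = 3;  // e.g., cylindrical, spherical, prismatic

    using Posture = std::array<double, kDof>;

    Posture map_sliders(const std::array<double, kGrasps> &sliders,  // in [0,1]
                        const Posture &rest,
                        const std::array<Posture, kGrasps> &eigengrasps) {
      Posture q = rest;
      for (int g = 0; g < kGrasps; ++g)
        for (int j = 0; j < kDof; ++j)
          q[j] += sliders[g] * eigengrasps[g][j];  // weighted basis vectors
      // Joint limits would be enforced here before commanding the hand.
      return q;
    }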
Demonstrations of the robotic hand were performed by attaching the hand to the wrist of a Barrett WAM arm, which was teleoperated using a straightforward master-slave configuration with a second Barrett WAM arm. A second teleoperator operated the hand using the exoskeletal glove for high-dimensional tasks such as finger gaiting, and the mixing-panel eigengrasp interface for simpler tasks such as
spherical grasps. A representative sampling of grasps and manipulations achieved
during these demonstrations is shown in Figure 6.26.
6.8 Summary
A low-cost and robust robotic hand was designed, prototyped through numerous
iterations, and demonstrated performing a variety of grasping tasks and simple ma-
nipulations. The design of this robotic hand was described in detail, focusing on its
electrical systems.
Chapter 7
STAIR and Switchyard
The previous chapters presented hardware subsystems intended to support the cre-
ation of low-cost robotic systems capable of performing a variety of tasks envisioned for a future personal robot. However, the previous chapters did not directly address
the challenge of creating the immense volume of software required for a general-
purpose robot to successfully operate autonomously, or even semi-autonomously, in
the unstructured environments of typical homes and offices. To address this chal-
lenge, several robotics software integration frameworks were created as part of the
STanford AI Robot (STAIR) project.
This chapter provides a brief overview of the STAIR project and describes the
design of Switchyard, its original software integration framework. Switchyard was
created to support the rapid integration efforts of dozens of contributors to the STAIR
project, as the project quickly grew to include numerous parallel development efforts
for various robot subsystems.
7.1 Introduction
The STanford Artificial Intelligence Robot (STAIR) project was a long-running effort
by many researchers at Stanford University, led by Professor Andrew Ng. The long-
term goal of the STAIR project was to develop technologies necessary for the future
emergence of viable home and office assistant robots.
As concrete steps towards this goal, several large-scale demonstrations were cre-
ated to encourage the repeated integration of a large number of complex software
and hardware subsystems. These demonstrations included fetching items in response
to verbal commands, navigating multiple floors of office buildings using elevators,
and performing inventory-taking tasks in cluttered environments. Carrying out these
tasks involved the integration of large software subsystems for navigation, spoken
dialog, visual object detection, and robotic grasping, among others.
At the AAAI 2007 Mobile Robot Exhibition, videos were presented of the STAIR
robot performing a “fetch a stapler” demonstration. In the creation of these demon-
strations, a consistent software framework was found to be critical to building a robotic system as complex as STAIR. The following sections describe
these technical challenges in detail.
7.2 STAIR: Hardware Systems
The first robots built for the STAIR project were named simply STAIR 1 and
STAIR 2. Although each robot had a manipulator arm on a mobile base, they differed
in virtually all implementation details.
7.2.1 STAIR 1
The STAIR 1 robot, shown in Figure 7.1, was constructed using largely off-the-shelf
components. The robot was built atop a Segway RMP-100. The robot arm was a Neuronics Katana 6M-180, a position-controlled arm with 5 degrees of freedom equipped with a parallel-plate gripper. Sensors used in the demonstrations included Point Grey Bumblebee stereo and trinocular cameras, a SICK
LMS-291 laser scanner in the horizontal plane, and a Sony EVI-D100 pan-tilt-zoom
video camera. For various experiments, additional laser scanners were mounted in
vertical and diagonal planes, as well as atop panning stages, to obtain full 3D point clouds in both stationary and mobile settings.
The sensors were primarily mounted on an aluminum-extrusion frame bolted to
Figure 7.1: Left: the STAIR 1 robot. Right: the STAIR 2 robot.
the table of the Segway base. The self-balancing capabilities of the Segway were not
used; rather, an additional aluminum frame was constructed which added wheels to
the front and back of the robot to provide static stability. This was done as a practical
measure to avoid damage in the event of an emergency stop, at the cost of increasing
the footprint of the robot by approximately 20cm fore and aft.
STAIR 1 was powered by a deep-cycle 12-volt battery feeding an array of DC-
DC converters, which produced the various DC voltages required for the sensing and
computational subsystems. An off-the-shelf automatic battery charger carried by the
robot allowed the 12-volt power rail to function as a large-capacity uninterruptible power supply (UPS), thereby enabling the computers and sensors to remain running when AC power was removed for mobile experiments. The power system allowed for
approximately two hours of runtime on typical system loads.
Onboard computation was provided by a Pentium-M machine running Linux and
a Pentium-4 machine running Windows. These machines were connected via an on-
board ethernet switch and, via an 802.11g wireless bridge, to workstations distributed
throughout the wired building network.
7.2.2 STAIR 2
The STAIR 2 platform is also shown in Figure 7.1. Its original wheeled base (com-
prising the bottom 10 inches of the robot) was designed and constructed by Reuben
Brewer of the Stanford Biorobotics Laboratory. This base had four steerable turrets,
each of which contained two independently-driven wheels. As a result, the platform
could holonomically translate in any direction, turn in place, or translate and rotate
simultaneously. Desired motions in the robot’s coordinate system were translated into motor commands by a dedicated XScale processor on a Gumstix Verdex board.
This mobile base was eventually replaced by a Segway RMP-400 holonomic base con-
structed from four Mecanum wheels, which provided similar capabilities for holonomic
motion.
The mechanical structure of STAIR 2 was also provided by an aluminum-extrusion
frame, which supported a Barrett WAM arm and a variety of sensors. The WAM
arm is a well-known manipulation system with seven degrees of freedom and a 3-
fingered hand. A dedicated onboard Linux PC was used to control the arm. For
perception, the robot was equipped with a Point Grey Bumblebee2 stereo camera,
with perceptual computation carried out on a second onboard Linux machine. The
power and networking systems were similar to STAIR 1: an ethernet switch linked the onboard computers, and an 802.11g wireless bridge provided connectivity to various
offboard workstations.
7.3 Switchyard
Because of the large number of researchers contributing to the project and the parallel
development of its various subsystems, considerable effort was devoted to improving
the mechanisms used to facilitate collaborative robot software design, debugging, and
distribution.
Many other researchers have worked in this area of robot software systems, produc-
ing notable robotics frameworks such as Player/Stage [29], CARMEN [66], MCA [82],
Tekkotsu [97], Microsoft Robotics Studio, and many others.
After investigating these existing frameworks, we determined that our platform
and goals differed sufficiently from those of the designers of other frameworks that
implementing a purpose-built framework would be worthwhile. Specifically, our hard-
ware systems comprised a heterogeneous collection of operating systems and network
topologies. This implied requirements of both cross-platform support and peer-to-peer messaging, a combination which (at the time) was uncommon. Further, we desired to reduce
boilerplate code and simplify the messaging interface as much as possible, an admit-
tedly vague and subjective goal, yet one which significantly impacts the perceived
difficulty of adopting a software framework. Finally, the project required a collection
of tools for multi-machine process management, hierarchical subsystem composition,
and efficient message recording and playback.
This collection of requirements was subjectively determined to differ sufficiently
from existing projects to justify the creation of a new framework. As such, the Switch-
yard project was started. The following sections describe its design requirements and
goals.
Parallel Processing
The STAIR demonstration applications ran on large, highly capable robots, and
required carrying out a considerable amount of computation. The software had elements with soft-realtime requirements, such as obstacle avoidance, as well as longer-running planning and scene-analysis tasks.
could not support all of the required computations, resulting in the usage of several
offboard machines to support each experiment.
Modularity
Because the STAIR project involved dozens of researchers contributing to a sizable
code base, it was important to enforce modularity between software components so
that components could be debugged and verified in isolation as much as possible.
OS-neutral
Most of the computational resources used by the STAIR project were Linux work-
stations and servers. However, a few onboard sensors were only supplied with propri-
etary Windows drivers. As a result, the software system was required to operate on
both Linux and Windows, with data streams flowing transparently between them.
Robot-Independent
The STAIR project originally included two robots with significantly different hard-
ware. This encouraged the creation of software which was as robot-independent as
possible, to limit the size and complexity of the code base. Although some software
modules functioned as device drivers and thus were tied to hardware, the architec-
tural goal was to create as many software modules as possible which only sent and
received hardware-independent messages.
Clean
As with any large software project, research progress is significantly easier if the
software is clean, streamlined, and as short as possible.
7.4 Approach
The Switchyard framework was designed to meet the aforementioned requirements.
It supported parallel processing through message passing along a user-defined, task-
specific graph of peer-to-peer connections between software modules.
Modularity was enforced through the classical operating system process model:
each software module executes as a process on some CPU on the network. TCP was
chosen for message passing due to its support on all modern operating systems and
networking hardware; its lossless, in-order delivery allowed simple parsers that usually did not need to handle re-synchronization.
From an aesthetic standpoint, the library was in the form of C++ classes which
each module extended to provide the required functionality. Networking, routing,
and scheduling code were abstracted away from the client software modules, allowing
most modules to have very little boilerplate code. The peer-to-peer connections were
encoded in hierarchical XML files.
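To convey the intended flavor, a module under such a design might look like the following; this is an invented illustration, not the actual Switchyard API, and all class and message names are hypothetical.

    // Hypothetical illustration of the intended programming style: a module
    // subclasses a framework base class and receives whole messages through a
    // callback, with networking and message assembly hidden by the framework.
    #include <cstdio>

    struct LaserScan {            // stand-in for a framework message type
      int num_ranges;
      const float *ranges;
    };

    class Module {                // stand-in for the framework base class
     public:
      virtual ~Module() {}
      virtual void on_message(const LaserScan &scan) = 0;
    };

    class ObstacleDetector : public Module {
     public:
      void on_message(const LaserScan &scan) override {
        int close = 0;            // application logic only; no socket code
        for (int i = 0; i < scan.num_ranges; ++i)
          if (scan.ranges[i] < 0.5f) ++close;
        std::printf("%d returns within 0.5m\n", close);
      }
    };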
These design choices were certainly debatable, and indeed, lessons learned from
Switchyard soon led to another software framework which will be described in the
next chapter. However, for completeness, the following sections will provide details
of Switchyard’s design and operation, as used in video demonstrations shown at the
AAAI 2007 Mobile Robot Exhibition.
7.4.1 Message-Passing Topology
Switchyard set up a “virtual cluster” of computers on top of an existing cluster of
networked machines. The loose term “virtual cluster” is meant to indicate that a
subset of machines on a local-area network operated as a cohesive group during a run
of the robot.
Master Server
One computer in the virtual cluster was chosen to be the master server. Importantly,
the master server did not process all traffic flowing through the virtual cluster. Such
routing patterns would result in a star network topology, which is highly inefficient
for networks with heterogeneous connections between machines.
As a concrete example, consider a STAIR robot with an onboard ethernet switch
connecting several machines onboard the robot, as well as a wireless bridge connect-
ing the onboard ethernet segment to the building network, which in turn consists of
many ethernet switches connecting many more machines. Network throughput between machines on the same side of the wireless link is excellent, but throughput across
the wireless link is often slow, latent, and even intermittent, as the robot moves
through the building and encounters regions of poor radio coverage. For simplicity,
the Switchyard framework was designed around the concept of a single master server
that maintains data structures defining the peer-to-peer connections, and informs
new machines joining the virtual cluster which processes to launch, and where to
find their peers. By definition, the master server must reside on only one side of the
wireless link, and if it were to process all data messages, the wireless link would
throttle the entire virtual cluster, even when the data-heavy message flows were fully
contained within the subnets on either side of the wireless bridge.
As a result of this routing challenge, the master server was only used to automate
the startup, shutdown, and interconnection of the virtual cluster. Data payloads
sent between software modules always traveled on peer-to-peer TCP connections.
On startup, the master server loaded an XML description of the desired connection
graph—the topology of the virtual cluster—and automated its creation, as described
in the next paragraph.
Startup
A simple process-launching program ran on every machine that was part of the virtual
cluster. This program was started with the IP address of the master server and
its “virtual machine name,” which was not necessarily its host name. The process
launcher connected to the master server, announced its name, received back a list
of processes that were to be launched, and forked them. This step was only for
convenience and to reduce manual scripting requirements; if a process needed to be
launched manually (for example, inside a debugging tool), it could be excluded from
the automatic-launch list.
The actual software modules themselves were invoked with the IP address of the
master server, a “virtual process name,” which was not necessarily the executable
image filename, and an available TCP port on which to open a server socket. Processes
started a server on their assigned port, connected to the master server, announced
their name, and received back a list of other processes with which to establish peer-
to-peer connections. The processes then automatically connected to their peers, and
started sending and receiving peer-to-peer messages.
Message streams
Message streams in Switchyard were always unidirectional and asynchronous with
respect to all other modules. From a data-flow perspective, a running system was
visualized as a directed graph, with processes corresponding to graph nodes, and
peer-to-peer connections corresponding to directed edges in the graph. The sending
(or “upstream”) node sent messages at any time. Each message stream could have
any number of downstream receiving nodes. Data was always sent in self-contained
messages such as images, laser scans, maps, matrices, or navigation-waypoint lists.
Although each of these messages could have been divided into smaller units, for
example, images and matrices could be divided into rows or blocks, downstream nodes
would likely have had to reconstruct the original logical unit (e.g., a full image or
matrix) before processing could begin. Thus, to reduce code size and complexity in the
receiving node, Switchyard only presented the user code with entire messages; message
assembly and inflation to user-space data structures were hidden from programmers
using the system.
To simplify the code required for typical uses of the system, a C++ class hierarchy
was provided, together with a set of macros that reduced the typical boilerplate to a
single-line macro instantiation. Abstract superclasses contained all networking and
sequencing code required to transmit byte blocks of arbitrary size. These superclasses
were then derived and specialized to create each type of data flow in the system, which
included:
• 2D, 3D, and 6D points
• Visual and depth images
• 2D navigation waypoint paths
• Grid maps for navigation
• 2D particle clouds for localization
• Configuration-space manipulator coordinates and paths
• Text strings
• Audio samples
• Miscellaneous simple tokens for state-machine sequencing
Each subclass contained only the code necessary to serialize and deserialize its
data structure to and from a byte stream. These methods were implemented as C++
virtual functions, which allowed the higher-level scheduling code to invoke the serialize
and deserialize methods without needing to know what was actually being transmitted.
This use of C++ polymorphism significantly reduced the code size and complexity
required for implementing message routing and re-synchronization.
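The pattern can be sketched as follows (illustrative names, not Switchyard’s actual class hierarchy): an abstract message superclass declares the two virtual methods, and each concrete message implements only its own flattening and inflation logic.

#include <cstdint>
#include <cstring>
#include <vector>

// Abstract superclass: the framework invoked these virtuals without
// knowing the concrete type of the message being transmitted.
class Message
{
public:
  virtual ~Message() { }
  virtual void serialize(std::vector<uint8_t> &out) const = 0;
  virtual void deserialize(const std::vector<uint8_t> &in) = 0;
};

// A concrete message type supplied only the code to flatten and
// inflate its own fields.
class Point3D : public Message
{
public:
  double x, y, z;
  Point3D() : x(0), y(0), z(0) { }
  void serialize(std::vector<uint8_t> &out) const
  {
    const double vals[3] = { x, y, z };
    out.resize(sizeof(vals));
    std::memcpy(&out[0], vals, sizeof(vals));
  }
  void deserialize(const std::vector<uint8_t> &in)
  {
    // The framework guaranteed a complete, fully buffered serialization.
    double vals[3];
    std::memcpy(vals, &in[0], sizeof(vals));
    x = vals[0]; y = vals[1]; z = vals[2];
  }
};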
Because the computation graph ran asynchronously, whenever a process was ready
to send data to its downstream peers, it invoked a framework-provided function which
did the following:
1. The serialize method of the message subclass serialized the message data
structure to its linear representation.
2. The length of the serialization was sent to downstream peers via TCP.
3. The serialization itself was sent to downstream peers via TCP.
On the downstream side, the framework continually performed the following steps:
1. The serialized size of a message was received and adequate space allocated, if
necessary, to buffer it.
2. The serialization itself was received and buffered.
3. The deserialize virtual method in the message subclass inflated the data
structure.
4. A virtual function was called to notify the receiving process that a data structure
was ready for processing.
To avoid race conditions, each data flow had a mutex which was automatically
locked during the inflation and processing of each message. Thus, the data-processing
code was not required to be re-entrant, which often simplified its implementation. The
framework silently dropped incoming messages if the process had not finished handling
the previous message, thus implementing an effective message queue of length 1.
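This drop-if-busy policy can be modeled with a try-lock, as in the following simplified sketch (the structure and names are illustrative, not the actual implementation):

#include <cstdint>
#include <mutex>
#include <vector>

// One of these existed per incoming data flow.
struct Flow
{
  std::mutex mutex; // held during inflation and processing of a message
  void (*handler)(const std::vector<uint8_t> &bytes); // user callback
};

// Called for each complete serialization received from an upstream peer.
void dispatch(Flow &flow, const std::vector<uint8_t> &bytes)
{
  // If the previous message is still being handled, silently drop this
  // one: an effective message queue of length 1.
  if (!flow.mutex.try_lock())
    return;
  flow.handler(bytes); // inflate and process while holding the lock
  flow.mutex.unlock();
}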
Data Flow Registration
To initialize a Switchyard process, message streams were required to be instantiated
and registered with the framework. This was typically done in the process constructor,
with a single line of C++ code for each message stream.
If a process was producing data on a message stream, the following actions were
taken by the framework:
• The data flow name, IP address, and port number of the sending process were
sent to the master server, which maintained a map between processes in the
(static) system topology and their current network locations.
• As message-receiving processes connected to message-producing processes, the
client socket connections were spun off and stored in internal data structures.
• Whenever a message-sending process wished to send data down a message
stream, the framework would invoke the corresponding serialize method for
the message type, and send the serialized message to all downstream peers, as
discussed previously.
If a process was receiving data on a message stream, the following actions were
taken by the framework:
• A thread was spun off to handle the network communications of the message
stream asynchronously from the main thread of the process.
• This thread attempted to connect via TCP with the processes that produced
messages on the particular stream in question, using the peer connection mapping
maintained by the master server.
• Once connected to a peer process, the thread announced the data flow it wished
to receive.
• The thread then synchronized to the data stream and invoked user code to
process each incoming message, as discussed in previous sections.
By organizing the behavior in this manner, the entire peer-to-peer messaging
scheme was automatically orchestrated by the framework, saving a great deal of bug-
prone, repetitive networking and sequencing code in each process.
Configuration Ports
Many robotics software modules have startup parameters that need to be configured
for a particular system run. For example, a map server has access to many map files,
a laser scanner can be configured in a variety of resolutions, and so on. Following
the precedent set in the Player/Stage framework, the graph file itself could optionally
contain startup parameters which, at runtime, would override the default values hard-
coded in each process.

Figure 7.2: Pan-tilt-zoom (PTZ) camera control graph
As reduction of C++ boilerplate code was a design goal, the configuration ports
were implemented by a table of pointers, which was built up in the constructor of each
process. A single line of C++ code registered a particular variable (e.g., a string or a
double) with the framework, which then handled requests from the master server to
overwrite the values of variables. Because this mechanism was used only for startup
configuration, not run-time parameter changes, such behavior could not cause race
conditions.
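The pointer-table mechanism might be sketched as follows (illustrative names only):

#include <map>
#include <string>

// Hypothetical configuration-port table, built up in each process
// constructor: one registered pointer per configurable variable.
class ConfigTable
{
public:
  void register_port(const std::string &name, double *var)
  {
    doubles_[name] = var;
  }
  // Invoked when the master server pushes a startup override.
  bool set(const std::string &name, double value)
  {
    std::map<std::string, double*>::iterator it = doubles_.find(name);
    if (it == doubles_.end())
      return false;
    *it->second = value;
    return true;
  }
private:
  std::map<std::string, double*> doubles_;
};

A process constructor would then contain one registration line per parameter, e.g. config.register_port("compression_quality", &quality);, matching the single-line-per-variable convention described above.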
7.4.2 Operation
To run an experiment or demonstration, the following steps occurred:
Graph Design
The machines available to run the experiment were described in the XML graph file,
by hostname or IP address. Then, the software processes needed for the experiment
or demonstration were either used off-the-shelf, or written from scratch. Finally, the
connections between the processes were defined in the XML graph file.
As a concrete example, the following XML graph file routed video from machine
“1” to machine “2” and allowed remote pan-tilt-zoom (PTZ) control.
<graph>
  <comp name="1">
    <proc name="video_display"/>
    <proc name="ptz_control"/>
  </comp>
  <comp name="2">
    <proc name="ptz_camera">
      <port name="compression_quality" value="40"/>
    </proc>
  </comp>
  <conn from="1.ptz_control.ptz" to="2.ptz_camera.ptz"/>
  <conn from="2.ptz_camera.frames" to="1.video_display.frames"/>
</graph>
When this graph was run, it caused two processes to start on computer 1, and
one process to start on computer 2. Once the processes were running, they connected
to each other automatically and messages started flowing. A visualization of this
simple graph is shown in Figure 7.2. It allowed machine “1” (typically offboard the
robot) to view a video stream and command the camera to pan, tilt, and zoom. The
camera device driver was running on machine “2” (typically onboard the robot) and
the video was transmitted across the network as individual JPEG-compressed images
of configurable quality.
The “fetch an item” demonstration involved a much larger graph. As shown in
Figure 7.3, this graph involved 21 processes running on 3 machines: two onboard
the robot and one offboard. The modular software structure allowed parallel devel-
opment of many modules in much smaller, even trivial, graphs, by disparate groups
of contributors, without requiring coordinated synchronous development and testing
efforts. Unit testing in this fashion thus reduced development time and helped to
isolate bugs.
Figure 7.3: Graph of the original STAIR “fetch a stapler” demonstration. The large red text indicates the tasks performed by various regions; those annotations were not a functional part of the graph.
The asynchronous nature of the Switchyard framework is also shown in Figure 7.3.
Cycles in this graph would create potential deadlocks if the graph were to operate
synchronously. To keep things simple, the framework enforced no synchronization:
any and all synchronous behavior was implemented using local state variables in
modules which required it.
7.5 Fetch a Stapler
This section will discuss how the previously-described software and hardware systems
were used to perform a “fetch an item” demonstration. First, a user verbally asked the
robot to fetch an item, in this case, a stapler. In response to this spoken command, the
robot navigated to the area containing the item, found it using visual object detection,
applied a learned grasping strategy to pick up the object, and finally navigated back
to the user to deliver the item.
Figure 7.3 shows the detailed organization of the components used in this demon-
stration. “Computer 1” was an offboard Linux machine, “Computer 2” was an on-
board Windows machine, and “Computer 3” was an onboard Linux machine. The
number in parentheses after each process name indicates which computer ran the
process, and the directed edges in the graph show the message streams.
The processes used could be subdivided into roughly five main functional par-
titions, which were implemented using Switchyard by five largely disjoint teams of
researchers.
Spoken Dialog
In prior work on the STAIR project [47], a Markov Decision Process (MDP) model was
used to provide robust management of human-robot dialog. However, because the dialog
requirements of the “fetch an item” demonstration were so simple, this behavior
was approximated by using the CMU Sphinx speech-recognition system [102] with a
grammar that included numerous variations of “STAIR, please fetch the [object] from
the [location].” When such a phrase was recognized, the robot gave a simple verbal
acknowledgment using the Festival speech-synthesis system.
Unfortunately, fan noise propagating throughout the metal-framed robot resulted
in unusable audio quality from the robot’s onboard omnidirectional microphones. To
produce usable audio data, a lapel-clip microphone was worn by the experimenter.
The audio signal was transmitted via analog FM to a nearby computer, where it
was digitized using the PortAudio library inside the audio_mic process, resulting in
a continual message stream of digital audio samples.
The digital-audio message stream was received by the sphinx_transcribe pro-
cess, which utilized the CMU Sphinx speech-recognition system to continuously pro-
cess the incoming digital-audio stream and attempt to recognize a simple grammar.
When the system recognized a “fetch an item” command, it passed the command to
the fetch_item_director process, which functioned as a central planner. In the “fetch
an item” demonstration, verbal acknowledgment of the command was generated using
the Festival speech-synthesis system, by the tts_festival process running onboard
the robot, which verbalized text strings sent to it along a message stream.
Navigation
A comprehensive navigation architecture which enabled STAIR to navigate in indoor
environments and open doors was described in [68]. However, for the “fetch an item”
video demonstration shown at AAAI 2007, portions of this system were replaced to
increase speed and robustness to changing environments. The navigation system used
a Voronoi-based global planner and an implementation of VFH+ [101] to avoid local
obstacles. The robot used a map of the Gates building on the Stanford campus, which
was built offline using the DP-SLAM algorithm [23].
As shown in Figure 7.3, the navigation and localization module used a collection of
independent processes to stream laser-scanner readings, perform localization against
a map, generate low-level commands to control the robot base, and so on. The
processes with soft-realtime requirements were run onboard the robot (machine 3),
while processes running on longer time-scales, such as the Voronoi-based path planner,
were run on a more powerful offboard machine (machine 1).
Object detection
The visual object detection system used by STAIR combined a wide-angle camera
with a steerable pan-tilt-zoom (PTZ) camera to form a foveal-peripheral imaging
system. This subsystem was described in detail in [32], but is summarized here for
completeness.
Since object detection is significantly easier in high-resolution images than in
low-resolution ones, the high-resolution imagery provided by the steerable pan-
tilt-zoom camera substantially improved the accuracy of the visual object detection
algorithm. (Note that, in contrast, obtaining high-resolution, zoomed-in images this
way would not have been possible if we were performing object detection on images
downloaded off the Internet.) In our demonstration, a fast offboard machine (com-
puter 1) was responsible for steering our robot’s PTZ camera to obtain high resolution
images of selected regions, and for running the object recognition algorithm. An on-
board machine (computer 2) was used to run the low-level device drivers responsible
for steering the camera and for taking/streaming images. Our object recognition
system was built using image features described in [86].
Grasping
To pick up the object, the robot used the grasping algorithm developed by [80, 81].
The robot used a stereo camera to acquire an image of the object to be grasped. Using
the visual appearance of the object, a learned classifier then selected a good “grasp
point”—i.e., a 3D position at which to attempt to pick up the object. The algorithm
for choosing a grasp point was trained on a large set of labeled natural and synthetic
images of a variety of household objects. Although this training set did not include
staplers, the learned feature set was robust enough to generalize to staplers. The low-
level drivers for the camera and the robot arm were run onboard the robot (computer
2); the slower algorithm for finding a grasp point was run offboard (computer 1). An
example of the robot executing a grasp of the stapler is shown in Figure 7.4.
7.6 Summary
This chapter described the hardware and software systems that allowed the STAIR
robot to perform the “fetch a stapler” demonstration, emphasizing its software frame-
work. The Switchyard framework provided a uniform set of conventions for commu-
nications across processes, and allowed different research teams to write software in
parallel for many different modules. Using Switchyard, these modules were then easy
to execute simultaneously and in a distributed fashion across a small set of onboard
Figure 7.4: The STAIR1 robot picking up a stapler.
and offboard computers. After extensive use of Switchyard in the Stanford AI Lab,
many lessons were learned which contributed to the Robot Operating System (ROS),
which will be described in the next chapter.
Chapter 8
ROS: A Robot Operating System
The previous chapter provided an overview of the STAIR project, focusing on Switch-
yard, its software-integration platform, which was created to facilitate collaboration
among the project contributors. Although the Switchyard framework was used in the
Stanford AI Laboratory for a variety of research projects and integration efforts, some
areas of potential improvement were quickly identified. These included the static na-
ture of the message-passing graph, the requirement of hand-coding additional message
types, and the lack of software tools and infrastructure to permit radical scaling of
the number of contributors.
As a result of these shortcomings, which were deemed fundamental in nature, a
new project was started: the Robot Operating System (ROS), a collaboration between
the author and numerous other researchers, including many at Willow Garage, Inc.
This chapter describes the motivations, design, and implementation of ROS, an open-
source software framework intended to facilitate collaborative development of complex
robot software systems. ROS is not an operating system in the traditional sense of
processor scheduling and low-level resource management; instead, ROS provides a
structured communications layer above the host operating systems of heterogeneous
compute clusters of various sizes and capabilities.
8.1 Overview
As discussed in the previous chapter, writing software for robots is difficult, particu-
larly as the scale and scope of robotics continues to grow. Different types of robots can
have wildly varying hardware, making code reuse nontrivial. On top of this, the sheer
size of the required code can be daunting, as it must contain a deep stack starting
from driver-level software and continuing up through perception, abstract reasoning,
and beyond. Since the required breadth of expertise is well beyond the capabilities
of any single researcher, robotics software architectures must also support large-scale
software integration efforts.
To meet these challenges, robotics researchers have created a variety of frameworks
to manage complexity and facilitate rapid prototyping of software for experiments,
resulting in the many robotic software systems currently used in academia and indus-
try [45]. Each of these frameworks was designed for a particular purpose, perhaps in
response to perceived weaknesses of other available frameworks, or to place emphasis
on aspects which were seen as most important in the design process.
ROS, the framework described in this chapter, is also the product of tradeoffs and
prioritizations made during its design cycle. Its emphasis on large-scale integrative
robotics research is intended to be useful in a wide variety of situations as robotic
systems grow ever more complex. This chapter discusses the design goals of ROS, how
its implementation works towards them, and demonstrates how ROS handles several
common use cases of robotics software development.
8.2 Design Goals
ROS is not the best framework for all robotics software. The field of robotics is far
too broad for a single solution. For example, there are applications and domains with
strict power, size, or cost constraints that preclude using POSIX-based computers
capable of running ROS. There are also applications for which mission-critical safety
considerations are paramount and drive the entire system design. In contrast, ROS
was designed to meet a specific set of challenges encountered when developing large-
scale service robots as part of the STAIR project [74] at Stanford University and the
Personal Robots Program [107] at Willow Garage. Although the resulting architecture
has found use in a variety of robotics domains, it is important to note its design
environment from the outset.
The philosophical goals of ROS can be summarized as:
• Peer-to-peer
• Tools-based
• Multi-lingual
• Thin
• Free and Open-Source
The following sections will elaborate on these philosophies, showing how they
influenced the design and implementation of ROS.
8.2.1 Peer-to-Peer
A system built using ROS consists of a number of processes, potentially running on
a number of different computers, connected at runtime in a peer-to-peer topology.
Although frameworks based on a central server (e.g., CARMEN [66]) can also realize
the benefits of the multi-process and multi-host design, a central data server is prob-
lematic if the computers are connected in a heterogeneous network, or if very high
data rates are involved.
For example, on the large service robots for which ROS was designed, there are
typically several onboard computers connected via wired ethernet. This network seg-
ment is bridged via wireless LAN to high-power offboard machines that are running
computation-intensive tasks such as computer vision or speech recognition (Figure
8.1). Running the central server either onboard or offboard would result in unnec-
essary traffic flowing across the (slow) wireless link, because many message routes
are fully contained in the subnets either onboard or offboard the robot. In contrast,
peer-to-peer connectivity, combined with buffering, throttling, and “fanout” software
modules where necessary, allows topologies to be crafted that fully utilize the wired
links on either side of the wireless link, with only low-bandwidth, latency-tolerant
data traveling across the link.

Figure 8.1: A typical ROS network configuration
8.2.2 Multi-lingual
When writing code, many individuals have preferences for some programming lan-
guages above others. These preferences are the result of personal tradeoffs between
programming time, ease of debugging, syntax, runtime efficiency, and a host of other
factors, both technical and cultural. For these reasons, ROS was designed to be
language-neutral. ROS software is often written in C++, Python, or Java, although
language ports in various states of completion exist to a variety of other languages,
including LISP, C#, and Octave/MATLAB.
The ROS specification is at the messaging layer, not any deeper. Peer-to-peer
connection negotiation and configuration occurs in XML-RPC, for which reasonable
implementations exist in most major languages. Rather than provide a C-based
implementation with stub interfaces generated for all major languages, ROS has been
natively implemented in each target language to better follow the conventions of each
language. However, in some cases it is expedient to add support for a new language
by wrapping an existing library. For example, the Octave client is implemented by
wrapping the ROS C++ library.
To support cross-language development, ROS uses a simple, language-neutral in-
terface definition language (IDL) to describe the messages sent between modules.
The IDL uses (very) short text files to describe fields of each message, and allows
composition of messages, as illustrated by the complete IDL file for a joystick state
message:
Header header
float32[] axes
int32[] buttons
Code generators for each supported language then generate native message im-
plementations which “feel” like native objects and are automatically serialized and
deserialized by ROS as messages are sent and received. This saves considerable pro-
grammer time and errors; the previous 3-line IDL file is automatically expanded to
hundreds of lines of tedious code in C++, Python, Java, LISP, and Octave.
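As a rough illustration, the C++ emitted for the joystick message above might expose an interface along the following lines (greatly simplified; the real generated code also carries serialization routines and type metadata so that ROS can marshal the message automatically):

#include <cstdint>
#include <string>
#include <vector>

struct Header // itself generated from a message definition
{
  uint32_t seq;
  double stamp; // simplified here; ROS uses a dedicated time type
  std::string frame_id;
};

struct Joystick
{
  Header header;
  std::vector<float> axes;
  std::vector<int32_t> buttons;
};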
Because the messages are generated automatically from such simple text files, it
is easy to enumerate new types of messages. At time of writing, the known publicly-
accessible ROS-based software ecosystem contains over two thousand types of mes-
sages. The ease of creating new message types facilitates easier refactoring of larger
software modules into a set of smaller ones. In turn, this allows for finer-grained
debugging, load balancing, and unit testing.
The code generators thus help produce a language-neutral message processing
scheme where different languages can be mixed and matched as desired. Each module
of a ROS network can be written in the language which best fits its (often competing)
design objectives of programmer expertise, performance requirements, and ease of
maintenance.
8.2.3 Tools-based
In an effort to manage the complexity of ROS, it follows a microkernel-inspired design,
where a large number of small tools are used to build and run the various ROS
components. This is in contrast to a monolithic type of design, where a single complex
program is used to provide an integrated development environment (IDE).
The suite of tools provided with ROS perform a variety of tasks to speed up devel-
opment and debugging. These include navigating the (very large) source code forest,
getting and setting configuration parameters, visualizing the peer-to-peer connection
topology, measuring bandwidth utilization, graphically plotting message data, auto-
generating documentation, and so on. Although core services such as a global clock
and a global logging mechanism could have been implemented inside the master mod-
ule, for virtually all decisions of this type, such functionality was pushed into separate
modules. The rationale for such actions was that the loss in efficiency was more than
offset by gains in stability and complexity management.
8.2.4 Thin
As was eloquently described in [59], most existing robotics software projects contain
drivers or algorithms which could be reusable outside of the project. Unfortunately,
due to a variety of reasons, much of this code has become so entangled with mid-
dleware that it is difficult to “extract” its functionality and re-use it outside of its
original context.
To combat this tendency, driver and algorithm development in ROS is strongly
encouraged to occur in standalone libraries that have no dependencies on ROS. The
ROS build system performs modular builds inside the source code tree, and its use
of CMake makes it comparatively easy to follow this “thin” ideology. ROS modules
are strongly encouraged, though not required, to place virtually all complexity in
libraries. These libraries can then be wrapped by small executables which expose
library functionality to ROS, to allow for easier code extraction and reuse beyond
the original goals. As an added benefit, unit testing is often far easier when code is
factored into libraries, as standalone test programs can be written to exercise various
features of the library against pre-defined task sequences or known-good datasets.
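For example, a thin wrapper node might look roughly like the following sketch, which assumes the roscpp client library and a hypothetical ROS-free library function compute_answer(); all real complexity would live in the library, not the wrapper:

#include <ros/ros.h>
#include <std_msgs/Int32.h>

// Hypothetical standalone library function with no ROS dependencies.
int compute_answer();

ros::Publisher pub;

// Periodically run the library code and publish its result.
void timer_cb(const ros::TimerEvent &)
{
  std_msgs::Int32 msg;
  msg.data = compute_answer();
  pub.publish(msg);
}

int main(int argc, char **argv)
{
  ros::init(argc, argv, "thin_wrapper");
  ros::NodeHandle nh;
  pub = nh.advertise<std_msgs::Int32>("answer", 1);
  ros::Timer timer = nh.createTimer(ros::Duration(1.0), timer_cb);
  ros::spin();
  return 0;
}

int compute_answer() { return 42; } // stand-in for the real library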
ROS re-uses code from numerous other open-source projects, such as the drivers,
navigation system, and simulators from the Player project [104], vision algorithms
from OpenCV [8], and planning algorithms from OpenRAVE [19], among hundreds of
such examples. In each case, ROS is used only to expose various configuration options
and to route data into and out of the respective software, with as little wrapping or
patching as possible. To benefit from the continual community improvements, the
ROS build system can automatically update source code from external repositories,
apply patches, and perform various other modifications to external source trees.
8.2.5 Free and Open-Source
The full source code of ROS is publicly available. This is critical to facilitate de-
bugging at all levels of the software stack. While proprietary environments such as
Microsoft Robotics Studio [40] and Webots [63] have many commendable attributes,
for some tasks there is simply no substitute for a fully open platform. This is par-
ticularly true when hardware and many levels of software are being designed and
debugged in parallel.
ROS is distributed under the terms of the BSD license, which allows the devel-
opment of both non-commercial and commercial projects. ROS passes data between
modules using inter-process communications, and does not require that modules link
together in the same executable. Systems built around ROS can use fine-grain li-
censing of their various components: individual modules can incorporate software
protected by various licenses ranging from GPL to BSD to proprietary, but license
“contamination” ends at the module boundary.
8.3 Nomenclature
The fundamental concepts of the ROS implementation are nodes, messages, topics,
and services. These terms are used similarly to those in the previous chapter describ-
ing Switchyard. For clarity, they are described in the following paragraphs.
Nodes are processes that perform computation. ROS is designed to be mod-
ular at a fine-grained scale: a system is typically composed of many nodes, with
many systems involving several hundred nodes. In this context, the term “node” is
interchangeable with “software module.” The use of the term “node” arises from vi-
sualizations of ROS-based systems at runtime. As illustrated in the previous chapter,
when many nodes are running, it is convenient to render the peer-to-peer commu-
nications as a graph, with processes as graph nodes and the peer-to-peer links as
arcs.
Nodes communicate with each other by passing messages. A message is a strictly
typed data structure. Standard primitive types (integer, floating point, boolean,
etc.) are supported, as are arrays of primitive types and constants. Messages can be
composed of other messages, and arrays of other messages, nested arbitrarily deep.
Pairs of messages, termed the request and response, form a service.
A node sends a message by publishing it to a given topic, which is simply a string
such as “odometry” or “map.” A node that is interested in a certain kind of data
will subscribe to the appropriate topic. There may be multiple concurrent publishers
and subscribers for a single topic, and a single node may publish and/or subscribe to
multiple topics. In general, publishers and subscribers are not aware of each others’
existence.
The simplest communications are along pipelines:
microphone → speech recognition → dialog manager → speech synthesis → speaker
However, graphs are usually far more complex, and typically contain cycles and nu-
merous one-to-many or many-to-many connections.
Although the topic-based publish-subscribe model is a flexible communications
paradigm, its asynchronous “broadcast” routing scheme can be overly complex for
simple synchronous transactions. In ROS, a simple synchronous transaction is called
a service, and is defined by a name and a pair of strictly typed messages: one for
the request and one for the response. This is analogous to web services, which are
typically defined by URIs and have request and response documents of well-defined
types. Note that, unlike topics, only one node can advertise a service of any particular
name: there can be only one service called “classify_image”, for example, just as there
can only be one web service at any given URI.
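A node advertising a service can be sketched with the roscpp client library and the trivial std_srvs/Empty service type; the node and service names below are hypothetical:

#include <ros/ros.h>
#include <std_srvs/Empty.h>

// Service callback: fill in the response and return true on success.
bool handle(std_srvs::Empty::Request &, std_srvs::Empty::Response &)
{
  ROS_INFO("service called");
  return true;
}

int main(int argc, char **argv)
{
  ros::init(argc, argv, "classifier");
  ros::NodeHandle nh;
  // Only one node may advertise a service of a given name.
  ros::ServiceServer server = nh.advertiseService("classify_image", handle);
  ros::spin();
  return 0;
}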
8.4 Use Cases
This section describes a number of common scenarios encountered when using
robotic software frameworks. The open architecture of ROS allows for the creation
of a wide variety of tools. Describing the ROS approach to these use cases will also
introduce a number of the tools designed to be used with ROS.
8.4.1 Debugging a single node
When performing robotics research, often the scope of work is limited to a well-defined
area of the system, such as a node which performs some type of planning, reasoning,
perception, or control. However, to bring up a robotic system for experiments, a
much larger software ecosystem must exist. For example, to do vision-based grasping
experiments, drivers must be running for the camera(s) and manipulator(s), and any
number of intermediate processing nodes (e.g., object recognition, pose detection,
trajectory generation) also must be up and running. This adds a significant amount
of difficulty and overhead to integrative robotics research.
ROS is designed to minimize the difficulty of debugging in such settings: its strictly
modular structure allows nodes undergoing active development to run alongside pre-
existing, well-debugged nodes. Because nodes connect to each other at runtime, the
graph can be dynamically modified. In the previous example of vision-based grasping,
a graph with perhaps a dozen nodes is required to provide the “infrastructure.” This
infrastructure graph can be started and left running during an entire experimental
session. Only the node(s) undergoing source code modification need to be periodi-
cally restarted after each recompilation or parameter adjustment, at which time ROS
silently handles the graph modifications to disconnect and reconnect the program
after its relaunch. This can result in a massive increase in productivity, particularly
as the robotic system in question becomes ever more complex and interconnected.
To emphasize, altering the computation graph in ROS often amounts to simply
starting or stopping a POSIX process. In debugging settings, this is typically done at
the command line or in a debugger. The ease of inserting and removing nodes from
a running ROS-based system is one of its most powerful and fundamental features.
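For instance, a node under active development can be as small as the following roscpp sketch (topic and node names are hypothetical); it can be killed, recompiled, and relaunched at will while the camera driver and the rest of the graph keep running, with ROS reconnecting it automatically:

#include <ros/ros.h>
#include <sensor_msgs/Image.h>

// Callback invoked for each image received from the running graph.
void image_cb(const sensor_msgs::Image::ConstPtr &msg)
{
  ROS_INFO("received a %ux%u image", msg->width, msg->height);
}

int main(int argc, char **argv)
{
  ros::init(argc, argv, "pose_detector");
  ros::NodeHandle nh;
  ros::Subscriber sub = nh.subscribe("camera/image", 1, image_cb);
  ros::spin();
  return 0;
}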
8.4.2 Logging and playback
Research in robotic perception is often done most conveniently with logged sensor
data, to permit controlled comparisons of various algorithms and to simplify the
experimental procedure. ROS supports this approach by providing generic logging
and playback functionality. Any ROS message stream can be dumped to disk and
later replayed. Importantly, this can all be done at the command line; it requires no
modification of the source code of any pieces of software in the graph.
For example, the following network graph could be quickly set up to collect a
dataset for visual-odometry research:
[Diagram: camera and robot nodes streaming messages to logger and visualizer nodes.]
The resulting message dump can be played back into a different graph, which
contains the node under development:
logger → vision research → visualizer
As before, node instantiation can be performed simply by launching a process; it
can be done at the command line, in a debugger, from a script, etc.
To facilitate logging and monitoring of systems distributed across many hosts, the
rosconsole library builds upon the Apache project’s log4cxx system to provide a
convenient and elegant logging interface, allowing printf-style diagnostic messages
to be routed through the network to a single stream called rosout.
8.4.3 Packaged subsystems
Some areas of robotics research, such as indoor robot navigation, have matured to the
point where “out of the box” algorithms can work reasonably well. ROS leverages
the algorithms implemented in the Player project to provide a navigation system,
producing this graph:
[Diagram: a navigation graph connecting laser, localization, map, and planner nodes to the robot node.]
Although each node can be run from the command line, repeatedly typing the
commands to launch the processes can get tedious, particularly with large subgraphs.
To allow for “packaged” functionality such as a navigation system, ROS provides a
tool called roslaunch, which reads an XML description of a graph and instantiates
the graph on the cluster, optionally on specific hosts. The end-user experience of
launching a navigation system then boils down to
roslaunch navstack.xml
and a single Ctrl-C will gracefully close all processes described in the XML document.
This functionality can also significantly aid sharing and reuse of large demonstrations
of integrative robotics research, as the set-up and tear-down of a large distributed
system can be encoded once by its creator, and subsequently inserted into and removed
from other ROS systems.
8.4.4 Collaborative Development
Due to the vast scope of robotics and artificial intelligence, collaboration between
researchers is necessary in order to build large systems. To support collaborative
development, the ROS software system is organized into packages. The definition
of “package” is deliberately open-ended: a ROS package is simply a directory which
contains an XML file describing the package and stating any dependencies.
A collection of ROS packages is a directory tree with ROS packages at the leaves:
a ROS package repository may thus contain an arbitrarily complex scheme of sub-
directories. For example, one ROS repository has root directories including “nav,”
“vision,” and “motion planning,” each of which contains many packages as subdirec-
tories.
ROS provides a utility called rospack to query and inspect the code tree, search
dependencies, find packages by name, etc. A set of shell expansions called rosbash
is provided for convenience, accelerating command-line navigation of the system.
The rospack utility is designed to support simultaneous development across mul-
tiple ROS package repositories. Environment variables are used to define the roots
of local copies of ROS package repositories, and rospack crawls the package trees as
necessary. Recursive builds, supported by the rosmake utility, allow for cross-package
library dependencies.
The open-ended nature of ROS packages allows for great variation in their struc-
ture and purpose: some ROS packages wrap existing software, such as Player or
OpenCV, automating their builds and exporting their functionality. Some packages
build nodes for use in ROS graphs, other packages provide libraries and standalone
executables, and still others provide scripts to automate demonstrations and tests.
The packaging system is meant to partition the building of ROS-based software into
small, manageable chunks, each of which can be maintained and developed on its own
schedule by its own team of developers.
At time of writing, several thousand ROS packages exist across over one hundred
publicly-viewable repositories, and hundreds more likely exist in private repositories at
various institutions and companies. The ROS distributions, both binary and source-
code, are available via the ROS website:
http://ros.org
Additional packages are found on other sites. Known publicly-viewable code reposi-
tories are regularly queried and indexed by a crawling engine, with a searchable index
available on the ROS website.
8.4.5 Visualization and Monitoring
While designing and debugging robotics software, it often becomes necessary to ob-
serve some state while the system is running. Although printf is a familiar technique
for debugging programs on a single machine, this technique can be difficult to extend
to large-scale distributed systems, and can become unwieldy for general-purpose mon-
itoring.
Instead, ROS can exploit the dynamic nature of the connectivity graph to “tap
into” any message stream on the system. Furthermore, the decoupling between pub-
lishers and subscribers allows for the creation of general-purpose visualizers. Simple
programs can be written which subscribe to a particular topic name and plot a par-
ticular type of data, such as laser scans or images. However, a more powerful concept
is a visualization program which uses a plugin architecture: this is done in the rviz
program, which is distributed with ROS. Visualization panels can be dynamically in-
stantiated to view a large variety of datatypes, such as images, point clouds, geometric
primitives (such as object recognition results), robot poses, and trajectories. Plugins
can be easily written to display more types of data.
A native ROS port is provided for Python, a dynamically-typed language sup-
porting introspection. Using Python, a powerful utility called rostopic was written
to filter messages using expressions supplied on the command line, resulting in an in-
stantly customizable “message tap” which can convert any portion of any data stream
into a text stream. These text streams can be piped to other UNIX command-line
tools such as grep, sed, and awk, to create complex monitoring tools without writing
any code.
Similarly, a tool called rxplot provides the functionality of a virtual oscilloscope,
plotting any variable in real-time as a time series, again through the use of Python
introspection and expression evaluation.
8.4.6 Composition of functionality
In ROS, a “stack” of software is a cluster of nodes that does something useful, as
was illustrated in the navigation example. As previously described, ROS is able to
instantiate a cluster of nodes with a single command, once the cluster is described in
an XML file. However, sometimes multiple instantiations of a cluster are desired. For
example, in multi-robot experiments, a navigation stack will be needed for each robot
in the system, and robots with humanoid torsos will likely need to instantiate two
identical arm controllers. ROS supports this by allowing nodes and entire roslaunch
cluster-description files to be pushed into a child namespace, thus ensuring that
there can be no name collisions. Essentially, this prepends a string (the namespace)
to all node, topic, and service names, without requiring any modification to the code
of the node or cluster. Figure 8.2 shows a hierarchical multi-robot control system
constructed by simply instantiating multiple navigation stacks, each in its own
namespace.
Figure 8.2 was automatically generated by the rxgraph tool, which can inspect
and monitor any ROS graph at runtime. Its output renders nodes as ovals, topics as
squares, and connectivity as arcs.
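The name-resolution rule itself is simple to illustrate (the code below is an explanatory sketch, not the ROS implementation): each node, topic, and service name is prefixed with the enclosing namespace, so multiple instantiations cannot collide.

#include <cstdio>
#include <string>

std::string resolve(const std::string &ns, const std::string &name)
{
  return "/" + ns + "/" + name;
}

int main()
{
  std::printf("%s\n", resolve("robot1", "planner").c_str()); // /robot1/planner
  std::printf("%s\n", resolve("robot2", "planner").c_str()); // /robot2/planner
  return 0;
}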
8.4.7 Transformations
Robotic systems often need to track spatial relationships for a variety of reasons:
between a mobile robot and some fixed frame of reference for localization, between
the various sensor frames and manipulator frames, or to place frames on target objects
for control purposes.
To simplify and unify the treatment of spatial frames, a transformation system has
been written for ROS, called tf. The tf system constructs a dynamic transformation
tree which relates all frames of reference in the system. As information streams in
from the various subsystems of the robot (joint encoders, localization algorithms,
etc.), the tf system can produce streams of transformations between nodes on the
tree by constructing a path between the desired nodes and performing the necessary
calculations.
Figure 8.2: An automatically-generated rendering of a running ROS system
For example, the tf system can be used to easily generate point clouds in a sta-
tionary “map” frame from laser scans received by a tilting laser scanner on a moving
robot. As another example, consider a two-armed robot: the tf system can stream
the transformation from a wrist camera on one robotic arm to the moving tool tip of
the second arm of the robot. These types of computations can be tedious, error-prone,
and difficult to debug when coded by hand, but the tf implementation, combined with
the dynamic messaging infrastructure of ROS, allows for an automated, systematic
approach.
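As a sketch of what a tf client might look like in C++ (the frame names are hypothetical; the listener interface shown is that of the tf package used with roscpp):

#include <ros/ros.h>
#include <tf/transform_listener.h>

int main(int argc, char **argv)
{
  ros::init(argc, argv, "tf_example");
  ros::NodeHandle nh;
  tf::TransformListener listener;
  ros::Rate rate(10.0);
  while (nh.ok())
  {
    tf::StampedTransform transform;
    try
    {
      // Latest available transform from the map frame to a wrist camera,
      // assembled by chaining edges of the transformation tree.
      listener.lookupTransform("map", "wrist_camera",
                               ros::Time(0), transform);
    }
    catch (tf::TransformException &ex)
    {
      ROS_WARN("%s", ex.what());
    }
    rate.sleep();
  }
  return 0;
}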
8.5 Summary
ROS follows a philosophy of modular, tools-based open-source software development.
Its open-ended design was intended to be readily extensible by other researchers, to
build robot software systems which can be useful to a variety of hardware platforms,
research settings, and runtime requirements. At time of writing, it has become a
popular platform for human-scale service robots, as well as for a variety of other
robotics domains.
Chapter 9
Conclusions
This thesis presented a series of hardware and software systems designed to facilitate
personal robotics. In the context of this thesis, “personal robotics” was taken to im-
ply robotic systems that are primarily owned by, or perform work directly benefiting,
the end users. The linguistic borrowing of the term personal from the computing
industry was intended to highlight the nature of this emerging application domain.
Like early personal computers, the application space is still being explored, and the
economic and societal impacts of highly capable personal robots appear to be promis-
ing, but remain unclear. As a result, the work described in this thesis investigated
two key challenges in contemporary personal robotics: hardware cost and software
interoperability.
The over-arching theme of the hardware systems contained in this thesis was to
enable low-cost mechanical subsystems by increasing the complexity of the software,
firmware, or electrical designs. The justification for these efforts was simple: highly
complex software can be mass-produced essentially at zero cost, run on ever-cheaper
computing systems, and more easily incorporate future improvements by collabora-
tors. In contrast, high-precision mechanical assemblies require massive economies of
scale and significant investment in static tooling in order to obtain cost reduction.
The robotic hand described in Chapter 6 incorporated numerous design elements
from the preceding chapters, and served to illustrate the potential of highly dense
and tightly co-designed low-cost mechanical and electrical systems.
The software systems presented in this thesis were explicitly designed to facilitate
the re-use of software among the robotics community, and have resulted in a significant
increase in the quality and quantity of interoperable open-source robotics software.
The Robot Operating System (ROS) framework was designed to have sufficient per-
formance and scalability to manage the high-level, non-realtime intercommunications
of a complex, human-scale robot. In addition, ROS was designed to be integration-
friendly by reducing the effort required to create and package software modules, and
equally important, to subsequently incorporate community-contributed collections of
these software modules to create a customized state-of-the-art robot software system.
The integration-friendly design elements detailed in Chapter 8 have allowed the ROS
software ecosystem to benefit from thousands of contributors.
ROS is not the highest-performing framework for modular robotics software, as
measured by metrics of messaging throughput, latency, or jitter. There are many
technical improvements that can and should be made in future work. In particular,
many opportunities exist for new frameworks and collaboration methodologies in the
low-level, hard-realtime subdomains of robot software systems.
However, the various performance limitations of ROS have not prevented its adop-
tion in the robotics community. Instead, ROS provides messaging performance that
is “good enough” for a variety of medium- to high-level, non-realtime tasks, such as
the perceptive and deliberative layers of embodied AI systems. Rather than allo-
cating the time and resources required to implement the highest possible messaging
performance in ROS, massive effort was expended to reduce barriers to technical col-
laboration among scientists and engineers. This included numerous design cycles and
efforts to simplify the fundamental messaging API, web-based documentation and in-
dexing tools, and extensive efforts by collaborators at Willow Garage, Inc. and other
institutions to provide hands-on training to thousands of individuals, freely provide
fully-supported hardware platforms through the PR2 Beta Program, and organize
numerous technical conferences and workshops. The result of these efforts is a large
and growing community, comprising academic institutions, corporations, government
agencies, and other interested parties, who are now creating software of increasing
interoperability and generality.
The work presented in this thesis, comprising low-cost hardware design method-
ologies and collaborative open software systems, suggests several directions for con-
tinued research towards the long-term goal of widespread, general-purpose personal
robots. In particular, the performance of low-cost, low-power embedded systems con-
tinues to increase rapidly, blurring the traditional performance boundaries between
microcontrollers and microprocessors. Exploiting the capability of these architec-
tures for real-time processing, while still providing an integration-friendly program-
ming and debugging environment, would allow increased community interaction with
lower levels of the system stack. This points towards opportunities for the creation of
new software frameworks intended for memory-constrained, resource-limited environ-
ments, and which are specifically designed to support massive collaboration. Creating
interoperable bridges between open low-level software frameworks and high-level soft-
ware systems such as ROS is an exciting avenue for future research and development.
Bibliography
[1] H. Aldridge and J.-N. Juang. Joint Position Sensor Fault Tolerance in Robot
Systems using Cartesian Accelerometers. AIAA Guidance, Navigation, and
Control Conference, 1996.
[2] R. O. Ambrose, H. Aldridge, R. S. Askew, R. R. Burridge, W. Bluethmann,
M. Diftler, C. Lovchik, D. Magruder, and F. Rehnmark. Robonaut: NASA’s
Space Humanoid. IEEE Intelligent Systems and Their Applications, 15(4):57–
63, 2000.
[3] L. Bao and S. Intille. Activity Recognition from User-Annotated Acceleration
Data. Pervasive Computing, 3001/2004, 2004.
[4] Y. Bar-Cohen and C. L. Breazeal. Biologically-Inspired Intelligent Robots. SPIE
Press, 2003.
[5] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-Up Robust Features
(SURF). Computer Vision and Image Understanding, 2008.
[6] M. Berna, B. Lisien, B. Sellner, G. Gordon, F. Pfenning, and S. Thrun. A
learning algorithm for localizing people based on wireless signal strength that
uses labeled and unlabeled data. In Proc. of the International Joint Conference
on Artificial Intelligence (IJCAI), 2003.
[7] P. Bolliger. Redpin - adaptive, zero-configuration indoor localization through
user collaboration. In Proc. of the First ACM Int. Workshop on Mobile Entity
Localization and Tracking in GPS-less Environments, 2008.
[8] G. Bradski and A. Kaehler. Learning OpenCV. O’Reilly, 2008.
[9] R. Brooks, C. Breazeal, M. Marjanovic, B. Scassellati, and M. Williamson.
The Cog Project: Building a Humanoid Robot. Computation for metaphors,
analogy, and agents, pages 52–87, 1999.
[10] H. Bruyninckx. Open Robot Control Software: the OROCOS Project. In IEEE
International Conference on Robotics and Automation, pages 2523–2528, 2001.
[11] W. Burgard, A. B. Cremers, D. Fox, D. Haehnel, G. Lakemeyer, D. Schulz,
W. Steiner, and S. Thrun. Experiences with an Interactive Museum Tour-Guide
Robot. Artificial Intelligence, 114:3–55, 1999.
[12] G. Canepa, J. Hollerbach, and A. Boelen. Kinematic Calibration by Means
of a Triaxial Accelerometer. IEEE International Conference on Robotics and
Automation, 1994.
[13] R. F. Chandler, C. E. Clauser, J. T. McConville, H. M. Reynolds, and J. W.
Young. Investigation of Inertial Properties of the Human Body. NTIS, National
Technical Information Service, 1975.
[14] C. M. Christensen. The Innovator’s Dilemma: When New Technologies Cause
Great Firms to Fail. Harvard Business Press, 1997.
[15] J. Craig. Introduction to Robotics: Mechanics and Control, 3rd ed. Pearson
Prentice Hall, 2005.
[16] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual Cate-
gorization with Bags of Keypoints. In In Workshop on Statistical Learning in
Computer Vision, ECCV, pages 1–22, 2004.
[17] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2005.
[18] F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo Localization
for Mobile Robots. In Proc. of the International Conference on Robotics and
Automation (ICRA), 1999.
[19] R. Diankov and J. Kuffner. The Robotic Busboy: Steps Towards Developing a
Mobile Robotic Home Assistant. In Intelligent Autonomous Systems, volume 10,
2008.
[20] E. Dombre, G. Duchemin, P. Poignet, and F. Pierrot. Dermarob: a Safe Robot
for Reconstructive Surgery. IEEE Transactions on Robotics and Automation,
19(5):876–884, 2003.
[21] F. Duvallet and A. Tews. WiFi Position Estimation in Industrial Environments
Using Gaussian Processes. In Proc. of the IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS), 2008.
[22] A. Edsinger-Gonzales and J. Weber. Domo: A Force Sensing Humanoid Robot
for Manipulation Research. In 2004 4th IEEE/RAS International Conference
on Humanoid Robots, pages 273–291, 2004.
[23] A. Eliazar and R. Parr. Hierarchical Linear/Constant Time SLAM Using Parti-
cle Filters for Dense Maps. In Neural Information Processing Systems (NIPS),
2005.
[24] D. Falie and V. Buzuloiu. Noise Characteristics of 3D Time-of-Flight Cameras.
In International Symposium on Signals, Circuits, and Systems, 2007.
[25] B. Ferris, D. Haehnel, and D. Fox. Gaussian Processes for Signal Strength-
Based Location Estimation. In Proc. of Robotics: Science and Systems (RSS),
2006.
[26] D. Fontaine, D. David, and Y. Caritu. Sourceless Human Body Motion Capture.
Smart Objects Conference, 2003.
[27] E. Foxlin and L. Naimark. VIS-Tracker: A Wearable Vision-Inertial Self-
Tracker. IEEE Virtual Reality Conference, 2003.
[28] J. Friedman, T. Hastie, and R. Tibshirani. Additive Logistic Regression: a
Statistical View of Boosting. In Technical report, Dept. of Statistics, Stanford
University, 1998.
[29] B. Gerkey, R. Vaughan, and A. Howard. The Player/Stage Project: Tools for
Multi-Robot and Distributed Sensor Systems. In International Conference on
Advanced Robotics (ICAR), 2003.
[30] F. Ghassemi, S. Tafazoli, P.D. Lawrence, and K. Hashtrudi-Zaad. An
Accelerometer-Based Joint Angle Sensor for Heavy-Duty Manipulators. IEEE
International Conference on Robotics and Automation, 2002.
[31] K. Goldberg. What is Automation? IEEE Transactions on Automation Science
and Engineering, 9(1):1–2, 2012.
[32] S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Meissner, G. Bradski,
P. Baumstarck, S. Chung, and A. Y. Ng. Peripheral-Foveal Vision for Real-
time Object Recognition and Tracking in Video. In Twentieth International
Joint Conference on Artificial Intelligence (IJCAI-07), 2007.
[33] S. Gould, P. Baumstarck, M. Quigley, A. Y. Ng, and D. Koller. Integrating Vi-
sual and Range Data for Robotic Object Detection. In European Conference on
Computer Vision (ECCV) workshop on Multi-camera and Multi-modal Sensor
Fusion Algorithms and Applications (M2SFA2), 2008.
[34] G. Grisetti, C. Stachniss, and W. Burgard. Improved Techniques for Grid Map-
ping with Rao-Blackwellized Particle Filters. IEEE Transactions on Robotics,
2006.
[35] G. Guennebaud, B. Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.
[36] A. Haeberlen, E. Flannery, A. Ladd, A. Rudys, D. Wallach, and L. Kavraki. Practical Robust Localization over Large-Scale 802.11 Wireless Networks. In Proc. of the International Conference on Mobile Computing and Networking, 2004.
[37] J. Hertzberg and F. Kirchner. Landmark-based Autonomous Navigation in
Sewerage Pipes. In Proc. of the First Euromicro Workshop on Advanced Mobile
Robotics, 1996.
[38] G. Hirzinger, N. Sporer, A. Albu-Schaffer, M. Hahnle, R. Krenn, A. Pascucci, and M. Schedl. DLR's Torque-Controlled Light Weight Robot III: Are We Reaching the Technological Limits Now? In Proceedings of the IEEE International Conference on Robotics and Automation, volume 2, pages 1710–1716, 2002.
[39] H. Iwata, S. Kobashi, T. Aono, and S. Sugano. Design of Anthropomorphic 4-DOF Tactile Interaction Manipulator with Passive Joints. In Intelligent Robots and Systems (IROS 2005), pages 1785–1790, Aug. 2005.
[40] J. Jackson. Microsoft Robotics Studio: A Technical Introduction. In IEEE
Robotics and Automation Magazine, Dec. 2007. http://msdn.microsoft.com/en-
us/robotics.
[41] S. Jacobsen, E. Iversen, D. Knutti, R. Johnson, and K. Biggers. Design of the Utah/MIT Dextrous Hand. In ICRA, 1986.
[42] R. Jazar. Theory of Applied Robotics, 2nd ed. Springer, 2010. The author of this thesis is indebted to Sonny Chan for pointing to this solution.
[43] P. Jensfelt, D. Austin, O. Wijk, and M. Andersson. Feature-Based Condensation for Mobile Robot Localization. In Proc. of the International Conference on Robotics and Automation (ICRA), pages 2531–2537, 2000.
[44] J. Jezouin, P. Saint-Marc, and G. Medioni. Building an Accurate Range Finder
with Off-the-Shelf Components. In Proceedings of CVPR, 1988.
[45] J. Kramer and M. Scheutz. Development Environments for Autonomous Mobile
Robots: A Survey. Autonomous Robots, 22(2):101–132, 2007.
[46] B. Krose, N. Vlassis, and R. Bunschoten. Omnidirectional Vision for
Appearance-Based Robot Localization. In Revised Papers from the Interna-
tional Workshop on Sensor Based Intelligent Robots, pages 39–50. Springer-
Verlag, 2002.
[47] F. Krsmanovic, C. Spencer, D. Jurafsky, and A. Y. Ng. Have We Met? MDP-Based Speaker ID for Robust Dialog. In Ninth International Conference on Spoken Language Processing (InterSpeech-ICSLP), 2006.
[48] KUKA. youBot Arm, 2010.
[49] A. Kumpf. Explorations in Low-Cost Compliant Robotics. Master’s thesis,
Massachusetts Institute of Technology, 2007.
[50] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyra-
mid Matching for Recognizing Natural Scene Categories. In IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), 2006.
[51] S. Lenser and M. Veloso. Sensor Resetting Localization for Poorly Modelled
Mobile Robots. In Proc. of the International Conference on Robotics and Au-
tomation (ICRA), 2000.
[52] J. J. Leonard and H. F. Durrant-Whyte. Mobile Robot Localization by Tracking
Geometric Beacons. IEEE Transactions on Robotics and Automation, 7:376–
382, 1991.
[53] J. Letchner, D. Fox, and A. LaMarca. Large-Scale Localization from Wireless Signal Strength. In Proc. of the National Conference on Artificial Intelligence (AAAI), 2005.
[54] M. Levoy, K. Pulli, B. Curless, S. Rusinkiewicz, D. Koller, L. Pereira, M. Ginz-
ton, S. Anderson, J. Davis, J. Ginsberg, J. Shade, and D. Fulk. The Digital
Michelangelo Project: 3D Scanning of Large Statues. In SIGGRAPH, 2000.
[55] J. Li, J. Zhu, Y. Guo, X. Lin, K. Duan, Y. Wang, and Q. Tang. Calibration of a Portable Laser 3-D Scanner Used by a Robot and Its Use in Measurement. Optical Engineering, 47(1), 2008.
[56] Y. F. Li and X. B. Chen. End-Point Sensing and State Observation of a Flexible-
Link Robot. IEEE/ASME Transactions on Mechatronics, 6(3), 2001.
[57] H. Lim, L. Kung, J. Hou, and H. Luo. Zero-configuration, Robust Indoor
Localization: Theory and Experimentation. In Proc. of IEEE INFOCOM, 2006.
[58] H. Liu and G. Pang. Accelerometers for Mobile Robot Positioning. IEEE
Transactions on Industry Applications, 37(3), 2001.
[59] A. Makarenko, A. Brooks, and T. Kaupp. On the Benefits of Making Robotic
Software Frameworks Thin. In IROS, November 2007.
[60] L. Matthies, T. Balch, and B. Wilcox. Fast Optical Hazard Detection for Plan-
etary Rovers using Multiple Spot Laser Triangulation. In ICRA, 1997.
[61] S. May, B. Werner, H. Surmann, and K. Pervolz. 3D Time-of-Flight Cameras
for Mobile Robotics. In IROS, 2006.
[62] C. Mertz, J. Kozar, J. R. Miller, and C. Thorpe. Eye-safe Laser Line Striper
for Outside Use. In IEEE Intelligent Vehicle Symposium, 2001.
[63] O. Michel. Webots: a Powerful Realistic Mobile Robots Simulator. In Proc. of
the Second Intl. Workshop on RoboCup. Springer-Verlag, 1998.
[64] N. Miller, O. C. Jenkins, M. Kallmann, and M. J. Mataric. Motion Cap-
ture from Inertial Sensing for Untethered Humanoid Teleoperation. In 2004
4th IEEE/RAS International Conference on Humanoid Robots, pages 547–565,
2004.
[65] N. Miller, O. C. Jenkins, M. Kallmann, and M. J. Mataric. Motion Capture from Inertial Sensing for Untethered Humanoid Teleoperation. International Journal of Humanoid Robotics, 2008.
[66] M. Montemerlo, N. Roy, and S. Thrun. Perspectives on Standardization in
Mobile Robot Programming: The Carnegie Mellon Navigation (CARMEN)
Toolkit. In IEEE/RSJ International Conference on Intelligent Robots and Sys-
tems (IROS), 2003.
[67] K. Parsa, J. Angeles, and A. Misra. Pose-and-Twist Estimation of a Rigid
Body Using Accelerometers. IEEE International Conference on Robotics and
Automation, 2001.
[68] A. Petrovskaya and A. Y. Ng. Probabilistic Mobile Manipulation in Dynamic
Environments, with Application to Opening Doors. In International Joint Con-
ference on Artificial Intelligence (IJCAI), 2007.
[69] F. Pierrot, E. Dombre, E. Degoulange, L. Urbain, P. Caron, S. Boudet,
J. Gariepy, and J. Megnien. Hippocrate: a Safe Robot Arm for Medical Appli-
cations with Force Feedback. Medical Image Analysis, 3(3):285–300, 1999.
[70] G.A. Pratt and M.M. Williamson. Series Elastic Actuators. In Proceedings
of the IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS-95), volume 1, pages 399–406, 1995.
[71] J. Pratt, B. Krupp, and C. Morse. Series Elastic Actuators for High Fidelity
Force Control. Industrial Robot: An International Journal, 29(3):234–241, 2002.
[72] M. Quigley, A. Asbeck, and A. Y. Ng. A Low-cost Compliant 7-DOF Robotic
Manipulator. In IEEE International Conference on Robotics and Automation
(ICRA), 2011.
[73] M. Quigley, S. Batra, S. Gould, E. Klingbeil, Q. Le, A. Wellman, and A. Y.
Ng. High-Accuracy 3D Sensing for Mobile Manipulation: Improving Object
Detection and Door Opening. In International Conference on Robotics and
Automation (ICRA), 2009.
[74] M. Quigley, E. Berger, and A. Y. Ng. STAIR: Hardware and Software Archi-
tecture. In AAAI Robotics Workshop, 2007.
[75] M. Quigley, R. Brewer, S. P. Soundararaj, V. Pradeep, Q. Le, and A. Y. Ng. Low-cost Accelerometers for Robotic Manipulator Perception. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010.
[76] M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger,
R. Wheeler, and A. Y. Ng. ROS: an open-source Robot Operating System.
In Open-Source Software workshop of the International Conference on Robotics
and Automation (ICRA), 2009.
[77] M. Quigley, D. Stavens, and S. Thrun. Sub-meter Indoor Localization in Un-
modified Environments with Inexpensive Sensors. In International Conference
on Intelligent Robots and Systems (IROS), 2010.
[78] B. Rooks. The Harmonious Robot. Industrial Robot: An International Journal,
33(2):125–130, 2006.
[79] J. K. Salisbury and J. J. Craig. Articulated Hands: Force Control and Kine-
matic Issues. The International Journal of Robotics Research, 1(1), 1982.
[80] A. Saxena, J. Driemeyer, J. Kearns, and A. Y. Ng. Robotic Grasping of Novel
Objects. In Neural Information Processing Systems (NIPS), 2006.
[81] A. Saxena, J. Driemeyer, J. Kearns, C. Osondu, and A. Y. Ng. Learning to
Grasp Novel Objects Using Vision. In International Symposium on Experimen-
tal Robotics (ISER), 2006.
[82] K.-U. Scholl, J. Albiez, and B. Gassmann. MCA - An Expandable Modular
Controller Architecture. In 3rd Real-Time Linux Workshop, Milan, Italy, 2001.
[83] D. Schulz and D. Fox. Bayesian Color Estimation for Adaptive Vision-Based
Robot Localization. In Proc. of the IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), 2004.
[84] Schunk. 7-DOF LWA Manipulator, 2010.
[85] S. Se, D. Lowe, and J. Little. Vision-Based Global Localization and Mapping
for Mobile Robots. IEEE Transactions on Robotics, 21(3):364–375, 2005.
[86] T. Serre, L. Wolf, and T. Poggio. Object Recognition with Features Inspired by
Visual Cortex. In IEEE Conference on Computer Vision and Pattern Recogni-
tion, 2005.
[87] Shadow Robot Company, Ltd. Shadow Hand, 2012.
[88] D. Shin, I. Sardellitti, and O. Khatib. A Hybrid Actuation Approach for
Human-Friendly Robot Design. In IEEE Int. Conf. on Robotics and Automation
(ICRA 2008), Pasadena, USA, pages 1741–1746, 2008.
[89] R. G. Simmons, J. Fernandez, R. Goodwin, S. Koenig, and J. O’Sullivan.
Lessons Learned from Xavier. IEEE Robotics and Automation Magazine, 7:33–
39, 2000.
[90] R. G. Simmons, S. Thrun, C. Athanassiou, J. Cheng, L. Chrisman, R. Good-
win, G. T. Hsu, and H. Wan. Odysseus: An Autonomous Mobile Robot. AI
Magazine, 1992.
[91] R. Slyper and J. Hodgins. Action Capture with Accelerometers. Eurograph-
ics/ACM SIGGRAPH Symposium on Computer Animation, 2008.
[92] J. Stuckler, M. Schreiber, and S. Behnke. Dynamaid, an Anthropomorphic
Robot for Research on Domestic Service Applications. In Proc. of the 4th
European Conference on Mobile Robots (ECMR), 2009.
[93] J. Sun, N. Zheng, and H. Shum. Stereo Matching Using Belief Propagation.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(7), 2003.
[94] Barrett Technology. http://www.barrett.com.
[95] S. Thrun. Bayesian Landmark Learning for Mobile Robot Localization. Machine Learning, 33, 1998.
[96] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2005.
[97] E. Tira-Thompson. Tekkotsu: A Rapid Development Framework for Robotics. Master's thesis, Carnegie Mellon University, 2004.
[98] A. Torralba, K. Murphy, and W. Freeman. Sharing Visual Features for Multi-
class and Multiview Object Detection. In Neural Information Processing Sys-
tems (NIPS), 2007.
[99] E. Torres-Jara. Obrero: A Platform for Sensitive Manipulation. In 2005
5th IEEE-RAS International Conference on Humanoid Robots, pages 327–332,
2005.
[100] N. G. Tsagarakis, M. Laffranchi, B. Vanderborght, and D. G. Caldwell. A Compact Soft Actuator Unit for Small Scale Human Friendly Robots. In IEEE International Conference on Robotics and Automation (ICRA), pages 4356–4362, 2009.
[101] I. Ulrich and J. Borenstein. VFH+: Reliable Obstacle Avoidance for Fast
Mobile Robots. In IEEE International Conference on Robotics and Automation
(ICRA), May 1998.
[102] Carnegie Mellon University. CMU Sphinx Open Source Toolkit for Speech
Recognition, 2012.
[103] D. Vail and M. Veloso. Learning from Accelerometer Data on a Legged Robot.
IFAC/EURON Symposium on Intelligent Autonomous Vehicles, 2004.
[104] R. Vaughan and B. Gerkey. Reusable Robot Code and the Player/Stage Project.
In Davide Brugali, editor, Software Engineering for Experimental Robotics,
Springer Tracts on Advanced Robotics, pages 267–289. Springer, 2007.
[105] N. Vlassis, B. Terwijn, and B. Krose. Auxiliary Particle Filter Robot Localization from High-Dimensional Sensor Observations. In Proc. of the International Conference on Robotics and Automation (ICRA), 2002.
[106] J. Wolf, W. Burgard, and H. Burkhardt. Robust Vision-Based Localization by
Combining an Image Retrieval System with Monte Carlo Localization. IEEE
Transactions on Robotics and Automation, 2005.
[107] K. A. Wyrobek, E. H. Berger, H. F. M. Van der Loos, and J. K. Salisbury. Towards a Personal Robotics Development Platform: Rationale and Design of an Intrinsically Safe Personal Robot. In Proc. IEEE Int. Conf. on Robotics and Automation, pages 2165–2170, 2008.
[108] N. Yazawa, H. Uchiyama, H. Saito, M. Servieres, and G. Moreau. Image-Based
View Localization System Retrieving from a Panorama Database by SURF. In
Proc. of the IAPR Conference on Machine Vision Applications, 2009.
[109] S. Zhang and P. Huang. High-Resolution, Real-Time Three-Dimensional Shape
Measurement. Optical Engineering, 45(12), 2006.
[110] Z. Zhang. A Flexible New Technique for Camera Calibration. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, 22, 2000.
[111] C. Zhou, Y. Wei, and T. Tan. Mobile Robot Self-Localization Based on Global
Visual Appearance Features. In IEEE International Conference on Robotics
and Automation, 2003.
[112] J. Zhu, L. Wang, R. Yang, and J. Davis. Fusion of Time-of-Flight Depth and
Stereo for High Accuracy Depth Maps. In Proceedings of CVPR, 2008.
[113] M. Zinn, O. Khatib, B. Roth, and J. K. Salisbury. Playing it safe: A New Actu-
ation Concept for Human-Friendly Robot Design. IEEE Robotics & Automation
Magazine, 11(2):12–21, 2004.
[114] M. Zinn, B. Roth, O. Khatib, and J. K. Salisbury. A New Actuation Approach
for Human Friendly Robot Design. The International Journal of Robotics Re-
search, 23(4-5):379, 2004.