Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining...

25
Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut Geyer and Chris Atkeson, Carnegie Mellon University Overview. A central conundrum in robotics is that manually designed controllers are often much more robust than model-based controllers, but model-based controllers are easier to command and coordinate with perceptual systems such as vision. The overall goal of this proposal is to test the hypothesis that a hybrid controller, combining a manually designed neuromuscular model of human control and an optimization- based controller, can achieve the desirable features of both types of control. Exploring hybrid control could lead to agile and robust robot legged locomotion, give us better understanding of the science of locomotion, and potentially lead to insights into human locomotion. Performance goals include human-like walking, achieving human speeds, step length, and cadence; control of locomotion including starting, stopping, speed, and steering; walking on rough terrain including slopes, slippery surfaces, and granular material such as dirt and soil; and whole body locomotion, including the use of hand rails, grab bars, canes, and walkers. The researchers have several legged robots (Sarcos Primus and Atrias), and access to others (Atlas and Coman) for comparative evaluation of current approaches and proposed hybrid approaches. They will evaluate the ability to match human speeds, step lengths, and cadence, control of speed and direction, rough terrain locomotion, whole body locomotion, and the ability to recover from disturbances and errors such as pushes, slipping, and tripping. They will also compare robot and human kinematics and dynamics, including joint velocies and torques, head velocity and acceleration, and ground reaction forces. The project involves theoretical research on nonlinear control, development of new control algorithms, as well as experimental research verifying control theory on humanoid robots. A successful outcome could trigger a paradigm shift in control approaches to robust behavior. Intellectual Merit. The direct contributions of this research will be: (1) A mathematical framework blending model-based and parametric policy-based control for whole body locomotion in robotics. The framework will be independent of the particular policy, although we focus on a neuromuscular policy as an example in this project. The framework may transfer to other scientific domains where blending optimization with domain knowledge is beneficial. (2) The experimental evaluation in simulation and humanoid robot testbeds of three specific approaches within this framework that focus on the policy as a cost, constraint, or replacement. (3) An understanding of how well these types of blended control generalize across robot platforms of dierent morphology and sensory and actuation modalities. (4) New hypotheses about human sensory-motor control of locomotion behaviors through the advancement of the neuromuscular policy. The direct contributions of this research are a deeper understanding of the benefits of combining model- based and parametric policy-based control for locomotion performance in robotics and a comparison of walking controllers for humanoids that robustly navigate complex and uncertain environments. More broadly, the proposed research will contribute to better understanding of robust nonlinear control, and may lead to better understanding of human sensorimotor control. Broader Impacts. The research will lead to more useful robots. A successful outcome will enable practical controllers for robust locomotion of legged robots in uncertain environments with applications in disaster response and elderly care. The project may generate new insights into human sensorimotor control of behavior, which can lead to better prostheses and exoskeletons and more eective approaches to fall risk management. The proposed project integrates research into education and mentoring by directly training graduate and undergraduate students and by incorporating research outcomes into course work. The project strengthens the infrastructure for research and education by improving the robot testbeds available for research and education. The project supports the broad dissemination of research results by publishing and freely providing simulation and implementation codes. We also expect unplanned broader impacts such as appearing in the news, on TV shows, and in movies. Unplanned broader impacts from previous work include our technologies being demonstrated on entries in the DARPA Robotics Challenge, a graduate student being 1

Transcript of Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining...

Page 1: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior

Hartmut Geyer and Chris Atkeson, Carnegie Mellon University

Overview. A central conundrum in robotics is that manually designed controllers are often much morerobust than model-based controllers, but model-based controllers are easier to command and coordinate withperceptual systems such as vision. The overall goal of this proposal is to test the hypothesis that a hybridcontroller, combining a manually designed neuromuscular model of human control and an optimization-based controller, can achieve the desirable features of both types of control. Exploring hybrid control couldlead to agile and robust robot legged locomotion, give us better understanding of the science of locomotion,and potentially lead to insights into human locomotion. Performance goals include human-like walking,achieving human speeds, step length, and cadence; control of locomotion including starting, stopping, speed,and steering; walking on rough terrain including slopes, slippery surfaces, and granular material such as dirtand soil; and whole body locomotion, including the use of hand rails, grab bars, canes, and walkers. Theresearchers have several legged robots (Sarcos Primus and Atrias), and access to others (Atlas and Coman)for comparative evaluation of current approaches and proposed hybrid approaches. They will evaluate theability to match human speeds, step lengths, and cadence, control of speed and direction, rough terrainlocomotion, whole body locomotion, and the ability to recover from disturbances and errors such as pushes,slipping, and tripping. They will also compare robot and human kinematics and dynamics, including jointvelocies and torques, head velocity and acceleration, and ground reaction forces. The project involvestheoretical research on nonlinear control, development of new control algorithms, as well as experimentalresearch verifying control theory on humanoid robots. A successful outcome could trigger a paradigm shiftin control approaches to robust behavior.

Intellectual Merit. The direct contributions of this research will be: (1) A mathematical frameworkblending model-based and parametric policy-based control for whole body locomotion in robotics. Theframework will be independent of the particular policy, although we focus on a neuromuscular policy as anexample in this project. The framework may transfer to other scientific domains where blending optimizationwith domain knowledge is beneficial. (2) The experimental evaluation in simulation and humanoid robottestbeds of three specific approaches within this framework that focus on the policy as a cost, constraint,or replacement. (3) An understanding of how well these types of blended control generalize across robotplatforms of different morphology and sensory and actuation modalities. (4) New hypotheses about humansensory-motor control of locomotion behaviors through the advancement of the neuromuscular policy.

The direct contributions of this research are a deeper understanding of the benefits of combining model-based and parametric policy-based control for locomotion performance in robotics and a comparison ofwalking controllers for humanoids that robustly navigate complex and uncertain environments. More broadly,the proposed research will contribute to better understanding of robust nonlinear control, and may lead tobetter understanding of human sensorimotor control.

Broader Impacts. The research will lead to more useful robots. A successful outcome will enablepractical controllers for robust locomotion of legged robots in uncertain environments with applicationsin disaster response and elderly care. The project may generate new insights into human sensorimotorcontrol of behavior, which can lead to better prostheses and exoskeletons and more effective approachesto fall risk management. The proposed project integrates research into education and mentoring by directlytraining graduate and undergraduate students and by incorporating research outcomes into course work. Theproject strengthens the infrastructure for research and education by improving the robot testbeds available forresearch and education. The project supports the broad dissemination of research results by publishing andfreely providing simulation and implementation codes. We also expect unplanned broader impacts such asappearing in the news, on TV shows, and in movies. Unplanned broader impacts from previous work includeour technologies being demonstrated on entries in the DARPA Robotics Challenge, a graduate student being

1

Page 2: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

a participant on a Discovery Channel TV series, The Big Brain Theory: Pure Genius, a graduate studenthelping run a group of all female high school students in the robot FIRST competition, and a robot characterin a Disney movie being inspired by our work on soft robots (Baymax in Big Hero 6).

2

Page 3: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

PROJECT DESCRIPTION

1 Objectives

Motivation. A central conundrum in robotics is that manually designed parametric policy-based controllers(for example [75, 38, 128, 74, 29, 110, 66]) are often more robust than model-based optimization controllers,but model-based controllers (for example [96, 60, 80, 67, 63, 25, 24]) generalize well to automated behaviorgeneration, which requires direct control of foot or hand placement for visually guided locomotion or theuse of handholds, canes, walkers, and other walking aids [17]. We believe that finding a way to combinethe advantages of both approaches will lead to more human-like and robust locomotion in humanoid robots,give us a better understanding of the science of locomotion, and potentially lead to new insights into humanmotor control.

Research Question and Proposed Solution. Our motivation translates into a specific research question:Can these two approaches be combined within one formal framework such that the resulting hybrid approachshares the human-likeness and robustness of policy-based control with the versatility of model-based con-trol? We propose as a solution to this question the use of policy-based control as either a cost, a constraint, ora replacement within the model-based optimization control framework. We hypothesize that each of thesevariants offers different advantages and shortcoming with respect to the human-likeness, robustness, andcomputational efficiency of humanoid control. As an example of the model-based controller, we will focuson a hierarchical optimization-based controller developed by Atkeson’s group [96, 25, 24] (Sec. 3). Thiscontroller provides direct control of foot and hand placement and steering. It can be combined with a visionsystem, and can automatically generate a variety of behaviors. For instance, we have demonstrated visuallyguided rough terrain locomotion, and we have added the use of hands for climbing a ladder with the ATLASrobot [25, 24]. As an example of the policy-based control, we will focus on a neuromuscular model of hu-man control created by Geyer’s group [29, 90, 87, 88] (Sec. 4). The model presents an attractive candidatepolicy that is inherently human-like, comparably robust to disturbances, and that is less demanding in termsof state estimation (for most joints only position is required and not velocity, for example).

Specific Objectives. Our specific objectives are to formally develop the three variants of a hybrid ap-proach and to evaluate their performance in simulation and hardware experiments with different humanoidrobots (Fig. 1) (Sec. 5). The metrics according to which we will evaluate the performance include locomo-tion speed, cadence, duty factor, as well as kinematics, kinetics and ground reaction force patterns (human-likeness), external push resistance and survival rate on rocky and slippery terrain (robustness to externalperturbations), model parameter variations and artificial sensory noise (robustness to internal uncertainty),and cycle time of controller execution (computational costs). For the hardware experiments, our groups areequipped with humanoid robots of different complexity (ATRIAS and SARCOS Primus), and we have ac-cess to two more humanoids with different actuation and sensing capabilities (COMAN and ATLAS). Thisaccess to a range of hardware platforms will allow us to evaluate how well the observed outcomes generalizeacross humanoids.

Intellectual Merit. The expected contributions of the proposed research are: (1) A mathematical frame-work blending model-based and parametric policy-based control for whole body locomotion in robotics.The framework will be independent of the particular policy, although we focus on a neuromuscular policyas an example in this project. (2) The experimental evaluation in simulation and humanoid robot testbeds ofthree specific approaches within this framework that focus on the policy as a cost, constraint, or replacement.(3) An understanding of how well these types of blended control generalize across robot platforms of dif-ferent morphology and sensory and actuation modalities. (4) New hypotheses about human sensory-motorcontrol of locomotion behaviors through advancing the neuromuscular policy.

1

Page 4: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

(a) ATRIAS (b) SARCOS (c) ATLAS (d) COMAN

Figure 1. Robot platforms that we have worked with. (a) CMU’s ATRIAS compliant electrical humanoid with boom constraintfor sagittal plane locomotion. Note that ATRIAS is fully 3D capable and can be taken off the boom. (b) CMU’s SARCOSPrimus hydraulic humanoid. (c) ATLAS hydraulic humanoid used by Atkeson’s group crossing rubble field in the DARPARobotics Challenge. (d) COMAN compliant electrical humanoid. To evaluate our control hypotheses of this proposal, we willuse the ATRIAS, SARCOS and COMAN testbeds.

Significance. These contributions can have significant intellectual and societal impact. Although hu-manoid robots are considered key technological platforms with applications in disaster response and mitiga-tion and elderly care, robust humanoid locomotion remains a research topic in a nascent stage. Successfullydeveloping a mathematical framework that blends model-based and policy-based control, and combines theadvantages of both individual control approaches, could trigger a paradigm shift in generating robust loco-motion behaviors in robotics with an impact on future robotics technology including humanoids, prosthetics,and exoskeletons. In addition, the mathematical framework may generalize to other scientific domains andproblems which benefit from incorporating domain knowledge into optimization.

2 Related Work

Model-based Control in Humanoid Robotics. Force-controlled humanoid robots are often controlled atthe behavior level using models of the robot and environment dynamics. These computed torque approachesare generalized to floating base inverse dynamics [96, 60, 80, 67, 63, 25, 24], which enable the robot to trackthe behavior of an underlying, simplified dynamics model. In legged locomotion, common underlyingmodels are, for instance, the linear inverted pendulum model used in humanoid robots like ASIMO, HRP-4and HUBO [46, 37, 47, 103, 13], or the zero dynamics manifold for bipedal robots like RABBIT, MABEL,and AMBER, controlled within the hybrid zero dynamics framework [12, 111, 2, 92]. The model-basedcontrol paradigm is mathematically elegant, and the corresponding locomotion problem has been solved intheory with optimal control techniques that automatically generate behavior [44, 8, 64, 104, 106, 25]. Inpractice, however, the paradigm requires to identify the states and dynamic parameters of a robot with highaccuracy, and humanoids struggle to demonstrate robust dynamic locomotion.

Parametric Policy-Based Controllers. Parametric policy-based controllers for legged locomotion havebeen developed in the robotics, computer graphics, and neuromechanics communities [75, 38, 128, 74, 29,66, 110]. These policy-based controllers capture explicit domain knowledge, acquired either by intuition orby a more formal and time-consuming analysis of individual locomotion behaviors. The Raibert robots inparticular demonstrate robust locomotion while navigating rough terrain [74, 66]. In addition, simulationexperiments demonstrate that policy controlled models of humans and humanoids can withstand comparablylarge disturbances including unexpected ground changes and pushes to the body [110, 91, 88, 27].

One motivation for investigating parametric policies is the observation that, in animals and humans, de-centralized neural control circuits (central pattern generators and muscle reflex modules) at the level ofthe spinal cord seem to contribute substantially to the generation and robustness of legged locomotion

2

Page 5: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

Figure 2. Controller architecture based on hierarchical optimization

[85, 11, 34, 20, 70, 40, 28, 15]. It is believed that imitating these types of parametric policy-based con-trol will help machines to match the agility and robustness of animals [84, 129, 65].

Hybrid Approaches. Very few attempts have been made to combine the benefits of both types of controlapproaches in humanoid robotics. The control overview reported for the Petman humanoid [66] seemsclosest in spirit, although the paper does not reveal implementation details. Specifically, during stance thecontrol regulates global goals like the height and orientation of the body similar to other implementations[46, 73, 25]. Desired behaviors also includes target trajectories for the pelvis, back and arm motions, whichcan be interpreted as costs that improve the human-likeness of behavior. A force distribution algorithm isthen used to best achieve the necessary leg forces that track the different behaviors simultaneously.

We are not aware of a formal framework or analysis of a hybrid approach to humanoid control whichseamlessly blends model-based optimization control and policy based control.

3 Control Strategy using Model-Based Hierarchical Optimization

Figure 2 shows the architecture of our robot control based on hierarchical optimization. It was successfullytested on our SARCOS and ATLAS platforms (Fig. 1b,c). Our ATLAS robot demonstrated rough terrainwalking, ladder climbing, and full body manipulation in the DARPA Robotics Challenge Trials of 2013.This work also received a “Best Oral Paper Award” at Humanoids 2013 [25]. Figure 3 shows a sequence ofsnapshots of ATLAS walking on rough terrain and plots of measured trajectories of the feet and the centerof mass (CoM). Figure 4 shows similar data for ladder climbing.

3.1 Overview

The Behavior Library (Fig. 2) contains learned behaviors and behaviors created from demonstration ormotion capture of humans or other robots, and also acts as a cache for planning. The output of the library isused to prime, speed up, and guide the real time optimization process, and can be motion or force trajectories

(a) (b)

z (m

)

x (m)

0

1

2

0 1 2 3

CoM

right foot

left foot

CoM

right foot

left foot

x (m)0 1 2 3

0

0.5

-0.5

-1

y (m

)

transversal plane sagittal plane

Figure 3. ATLAS robot practicing for the terrain task of the DRC. Photo snapshots taken every 5 seconds. Measuredtrajectories in the transversal (a) and sagittal planes (b) of the left (red) and right foot (green) and the center of mass (CoM,blue) are shown. The robot walks in a straight line along y = 0 in reality. Our state estimator at that time drifted significantly.This has been fixed by adding vision-based SLAM.

3

Page 6: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

(b)

-0.5 0.5

frontal plane

0

z (m

)

1

3

2

y (m)

(a)

CoM

right foot

left foot

left hand

right hand

x (m)0 1 2

0

z (m

)

sagittal plane

1

3

2

Figure 4. ATLAS climbing the top half of the same ladder as used in the DRC. Photo snapshots taken every 13 seconds.The top row shows repositioning of the hook hands, and the bottom row shows stepping up one tread. The two plots showATLAS climbing the first five treads during the actual run at the DRC Trials. Axis definitions and color codes for foot and CoMtrajectories are the same as in Fig. 3. Left and right hand positions are plotted with cyan and magenta dashed lines.

in joint or task coordinates (xd(t)), as well as possibly including value, heuristic cost, or utility functions(Vx(t)) that can be used to guide search and optimization. Libraries can be built in advance, or when taskspecification are known. The rest of the controller can also operate without information from the library ifsuitable information is not available.

The Multistep Low-Order Dynamics Planner uses receding horizon control to continually optimizetrajectories of motion and force of a simplified model of the current task. In walking, this is usually centerof mass (COMd(t)), angular momentum (Ld(t)), contact force (contactsd(t)) trajectories, and value, heuristiccost, or utility functions (Vc(t)) specifying tradeoffs between these quantities. We optimize with a timehorizon of several seconds using Differential Dynamic Programming [43].

For walking, our planner is similar in spirit to Kajita’s Preview Control for a Linear Inverted PendulumModel (LIPM) [45]. We are also using a CoM model, reason about Zero Moment Point (ZMP), and usefuture information to guide the current trajectory. However, our planner can be generalized to nonlinearmodels and adjust foot steps while optimizing the CoM trajectory. We explicitly add the vertical height inour CoM model to handle height variations on rough terrain. Like capture point methods [72], we take thenext few steps into consideration but do not plan to come to rest at the end. Ogura et al. [68] investigatedgenerating human-like walking with heel-strike and toe-off by parameterizing the swing foot trajectory. Incontrast, we use very simple rules to guide the low level controller to achieve the same behaviors.

The Full Body QP Optimization uses quadratic programming (QP) with a very short time horizon (1mson SARCOS and 3ms on ATLAS) to optimize the desired full state of the robot (θθd, θθd), joint torques (ττd),and contact forces (fd). Currently, we pre-specify gain matrices (Kθ, Kθ, Kτ, Kf) for the robot controller,but plan to explore how to optimize gains as well as trajectories. This module enforces constraints suchas robot kinematics and dynamics and joint, velocity, torque, actuator, friction cone, and center of pressurelimits, and resolves redundancies. The module performs inverse kinematics and dynamics to provide us withcompliant motions and dynamic behaviors, while compensating for the effects of modeling errors.

For the QP, we use a formulation developed in our group [94, 114, 112]. Unlike [61, 76] who use or-thogonal decomposition to project the allowable motions into the null space of the constraint Jacobian, andminimize costs in the contact constraints and the commands, we directly optimize a quadratic cost in termsof state accelerations, torques and contact forces on the full robot model. This design choice allows us totrade off physical quantities of interest. We are also able to directly reason about inequality constraints suchas center of pressure within the support polygon, friction, and torque limits. Although it becomes a biggerQP problem, we are still able to solve it in real time. Hutter el al. [41] resolved redundancy in inversedynamics using a kinematic task prioritization approach that ensures lower priority tasks always exist in thenull space of higher priority ones. In contrast to their strictly hierarchical approach, we minimize a sum of

4

Page 7: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

weighted terms. We can directly specify the relative importance of the terms by adjusting the weights.The Low-Level Robot Controller is a blend of our work and controllers provided by the robot man-

ufacturers. For SARCOS and ATLAS, hydraulic valve commands for each joint i are provided by themanufacturers,

vi = kθ,i(θi − θd,i) + kθ,i(θi − θd,i) + kτ,i(τi − τd,i) + v f f ,i , (1)

where vi is the valve command, θi is the joint angle, θi is the joint angular velocity, τi is the joint torque, andv f f ,i is a feedforward valve command. Subscripts d indicate desired values, and quantities denoted with kare scalar gains. This joint level servo runs at 5kHz on SARCOS and at 1kHz on ATLAS.

We augment this independent joint control by coupling joints and providing force control based onforce/torque sensors in the wrists, ankles, and skin of the robot,

v = Kθ(θθ − θθd) + Kθ(θθ − θθd) + Kτ(ττ − ττd) + Kf(f − fd) ,+v f f (2)

where v, θθ, θθ, ττ, and v f f are the corresponding vector quantities, and the K are matrix gains coupling alljoints. f are sensed contact forces. The coupled controller runs at 1kHz on SARCOS and 333Hz on ATLAS.

Figure 2 does not include our state estimator, which provides appropriate state information to all othermodules. The details of this work on state estimation are described in [124].

3.2 Results from Prior NSF Support Related to this Work

(a) NSF award: IIS-0964581 (PI: Hodgins, Atkeson co-PI); amount: $699,879; period: 7/1/10 - 6/30/14.(b) Title: RI: Medium: Collaborative Research: Trajectory Libraries for Locomotion on Rough Terrain.(c) Summary of Results: This grant has supported work on a variety of approaches to controlling humanoidrobots based on trajectory libraries. Section 3.1 focused on the hierarchical optimization approach, whichforms one starting point for the proposed work and received a “Best Oral Paper Award” at Humanoids 2013.

Intellectual Merit. (1) Identification of a hierarchical approach to online optimal control of behavior, asdescribed above. (2) Algorithms for training robust policies/controllers using multiple models, forming thebasis for a new approach to robust learning and for algorithms greatly speeding up reinforcement learning.(3) Simple learning approaches to concisely represent and accurately predict optimal trajectories. (4) Glob-ally optimal control of instantaneously coupled systems (ICS), which is designed by coordinating multiplelower-dimensional optimal controllers. (5) We discovered that many features of optimized walking includ-ing costs can be fit using simple global function approximation, such as quadratic function approximation.(6) A paradigm for designing controllers for complex systems, “informed priority control”, which coordi-nates multiple sub-policies and enables us to prototype complex control system designs faster. (7) Stateestimation for mobile and humanoid robots with “floating body” dynamics.

Broader Impacts. We are working toward more useful robots and knowledge that may help with a sig-nificant social problem, why people fall and injure themselves. We have coordinated our outreach activitieswith the larger outreach efforts of CMU’s NSF Engineering Research Center on Quality of Life Technol-ogy to achieve greater reach and effectiveness. Our technologies are being shared by being published, andpapers are available electronically. The technologies are also being demonstrated on entries in the DARPARobotics Challenge. We have created and teach relevant courses, including an undergraduate course onhumanoid robotics and a graduate course on optimization of behavior. We put the teaching materials on theweb to widely disseminate the results. Our work is being used in Disney Research and through this technol-ogy transfer path will eventually be used in entertainment and education applications, and will be availableto and inspire the public. A graduate student was a participant on a Discovery Channel TV series, ”The BigBrain Theory: Pure Genius”. One purpose of the TV series is getting people excited about engineering. Agraduate student helps run a group of all female high school students in the robot FIRST competition. Arobot character in a Disney movie is inspired by our work on soft robots (Baymax in Big Hero 6).

5

Page 8: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

Figure 5. Neuromuscular control policy. (a) Example walking on ground with large unexpected height changes. Refer to textfor details on model structure (i-iv). (b) Robustness to external disturbances. Survival rate on terrain with unexpected stepsand resistance to unexpected pushes. (c) Robustness to internal disturbances. Black trace: trunk pitch during level groundwalking. Red trace: disturbed pitch signal seen by control policy (15ms time delay, 5deg/s uniform random noise).

Development of Human Resources. The project involved five PhD graduate students (three studentshave graduated) and four postdocs. We had weekly meetings, weekly lab meetings, and we individuallymentored all participants. All participants actively did research, made presentations to our group, gaveconference presentations, gave lectures in courses. The students served as teaching assistants.(d) Publications resulting from this NSF award: [126, 62, 4, 52, 125, 78, 99, 93, 97, 122, 130, 86, 77, 5,131, 7, 50, 116, 123, 39, 54, 3, 113, 95, 115, 122, 117, 48, 26].(e) Other research products: [Chris: needs “a brief description of available data, samples, physical collec-tions and other related research products not described elsewhere;” Can also say “none”](f) Renewed support. This proposal is not for renewed support.

4 Control Strategy using Neuromuscular Policy

Figure 5 shows the architecture of our human locomotion model controlled by a neuromuscular control pol-icy. We have shown in previous work that this model successfully captures human-like kinematics, kinetics,and ground reaction forces [29, 90, 88], is robust to comparably large external and internal disturbances[29, 89, 90, 87, 91] (Fig. 5b,c), and shows a range of human locomotion behaviors [88] (Fig. 6). We alsohave applied the neuromuscular policies of this model to the control of prosthetic limbs [21, 105], leadingto the first powered ankle prosthesis that is commercially available to transtibial amputees [36, 1].

4.1 Overview

The Musculoskeletal Dynamics are represented in the human model with 8 segments connected by revolutejoints (Fig. 5a, i) that are spanned by 18 Hill-type muscles (Fig. 5a, ii). The muscles generate net torquesabout the joints they span. Each muscle’s force is computed based on a Hill-type model [108] with seriesand parallel elasticities connected to a contractile element (Fig. 5a, iii). The contractile element’s forceFce = AFmax fl fv depends on mechanical muscle properties described by the maximum isometric force Fmax

and the nonlinear force-length and force-velocity relationships ( fl and fv), as well as on the muscle activationA. Muscle activations are low-pass filtered stimulations S generated by a model of human neural control.

The Neural Control (Figs. 5, iv) of the model almost exclusively represents spinal reflex circuits thatare physiologically plausible and embed major functions of legged dynamics and control as local feedbackloops [29]. For example, the compliant leg behavior important to legged dynamics in stance [10, 59, 32] isrealized with simple positive force reflexes of the anti-gravity muscles, and their stimulation is modeled as

6

Page 9: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

Figure 6. Examples of steady and transient locomotion behaviors that the neuromuscular policy can generate.

S (t) = S 0 + GFce(t − ∆t), where S 0 is a reference membrane potential at the α-motoneuron, G is a synapticgain, and ∆t describes a neural time delay (up to 20ms) [30]. The neural control combines multiple of thesefunctional reflex circuits into control modules. Together they form the neuromuscular control policy.

Our approach to modeling neural control stands in contrast to more common approaches which emphasizecentral pattern generation [102, 100, 101, 35, 51, 84, 65]. An advantage that our approach provides is thatit links physiology to physics-based function, making the contribution of individual control elements clearand transferrable to rehabilitation and humanoid robotics.

The Neuromuscuclar Control Policy (NMP) represent only one example of several other control poli-cies, which we could consider for a hybrid approach that blends optimization and policy based control(compare Sec. 2). In fact, the main ideas behind the approaches we propose to investigate do not hinge onthe particular policy. However, the NMP provides several advantages that make it an attractive candidatepolicy. It is inherently human-like, does not depend on a high-fidelity model of the full system dynam-ics, and is comparably robust to disturbances (Fig. 5). In addition, it can express a range of steady andtransient locomotion behaviors including slope and stair negotiation, walking and running, accelerating anddecelerating, turning and obstacle avoidance (Fig. 6).

4.2 Results from Prior NSF Support Related to this Work

(a) NSF award number: EEC-0540865; amount: $29,560,917; period of support: 6/1/06 - 5/31/15.(b) Title: NSF Engineering Research Center on Quality of Life Technology (PI: Siewiorek).(c) Summary of Results: This NSF Engineering Research Center supported during the period 6/2010-5/2013a research project (lead H. Geyer) on swing leg control in the NMP. We only report on this part of the award.

Intellectual Merit. (1) Identification of a policy-based swing leg controller that robustly places the footinto targets on the ground [18]. (2) Identification of plausible spinal reflex circuits that embed this swingcontrol in an NMP. The NMP generated resulting muscle activations that match human muscle activationsin walking and running, providing evidence for a similar control in humans [19].

Broader Impacts. The project has supported our work toward control algorithms in rehabilitation robotics,which may help people who depend on legged assistance to achieve greater mobility and quality of life. Thecommercially available, powered ankle prosthesis [21, 1] is a testament to the social impact that this workcan have. Our results are being shared by being published, and papers are available electronically. Theresults have been integrated into the curriculum of a graduate course on biomechanics and motor control.

Development of Human Resources. The project trained two graduate students, who presented their re-sults at conferences (ROBIO 2012, ICRA 2013 and EMBC 2013). The graduate students have completedtheir Master’s degree and won prestigious scholarships (R. Desai: Siebel and Google Anita Borg scholar-

7

Page 10: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

MultistepLow-Order Dynamics

Planner

Approach 1:Neuromuscular Policy

as Cost

θd, Kθ

BehaviorLibrary

Full BodyQP

Optimization

Low-LevelRobot

Controller

θd, Kθ

τd, Kτfd, Kf

COMd(t)Ld(t)

contactsd(t)Vc(t)

xd(t)Vx(t)

Approach 2:Neuromuscular Policy

as Constraint

Approach 3:Neuromuscular Policy

as Replacement

(i) (ii)(iii)

Figure 7. Key concepts for hybrid approaches blending optimal and policy-based control in humanoid locomotion. Theneuromuscular policy control acts as a cost (1), constraint (2), or replacement (3) at different levels of the model-basedoptimization control.

ships) during tenure with the project. They are continuing their training and education within the RoboticsInstitute’s PhD program.(d) Publications resulting from this NSF award: [18, 19, 90, 87, 91].(e) Other research products: None.(f) Renewed support. This proposal is not for renewed support.

5 Research Plan

Both the model-based optimization approach and the policy-based approach have distinct strengths andweaknesses. The policy-based approach generates quite robust locomotion while being insensitive to kine-matic singularities, joint and actuator limits, and model parameters in general, which are all concerns thatmake it difficult to apply controllers from the model-based optimization approach on humanoid robots. Onthe other hand, the policy-based approach does not generalize well to automated behavior generation, whichrequires direct control of foot or hand placement for visually guided locomotion or the use of handholds,canes, walkers, and other walking aids. In this domain, the model-based optimization approach shines. Webelieve that the fields of humanoid robotics and locomotion science will be pushed forward if the two ap-proaches can be combined to get the best of each. We do not yet know the best combination, and propose toinvestigate different hybrid approaches.

5.1 Hybrid approaches that we will pursue

Our key concepts for hybrid approaches are to use the neuromuscular policy as either a cost, a constraint, ora replacement (Fig. 7). Each of these ideas can be applied to different stages of the model-based optimizationwith distinct advantages and shortcomings.

Hybrid Approach 1: Policy as a Cost Term. The concept of the first hybrid approach is to use theneuromuscular control policy (NMP) as a term

J =∑

i

Ji + JNMP (3)

in the cost function at different stages of the hierarchical optimization control. The general advantage ofthis approach is that it is simple to implement in the state-of-art control frameworks of humanoid con-trollers [127]. The NMP simply acts as a bias term in existing optimization stages. The main shortcomings

8

Page 11: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

of this approach are that it maintains a high demand on the quality of sensory signals and of the robot-environment dynamics model, M ¨q + H = S τ + JT λ, and that the computational speed of the optimizationremains slow as no dimensionality reduction occurs. We hypothesize that the NMP influence on the costwill improve a humanoid robot’s performance including, for example, more human-like step lengths andtiming in locomotion. However, we do not anticipate substantial improvements on locomotion robustness.

Hybrid Approach 2: Policy as an Optimization Constraint. Instead of a cost term, the second approachintroduces the NMP as a constraint on the hierarchical model-based optimization,

min J s.t. fi(x) = 0, g j(x) > 0, (4)

where x = [q ˙q]T . Some individual functions of the limbs are common in locomotion. One example iscompliant leg behavior. Yet compliant legs tend to buckle [83]. Part of the reflex control in the neuro-muscular model counteracts this recurring tendency [29]. Including such common limb functions (domainknowledge) as constraints instead of as costs ensures that the optimization respects known danger zones thatthreaten locomotion stability. For instance, the knee is particularly prone to buckling, and a torque constraint

fi(x) = τk − τNMP,k, (5)

would ensure that the knee torque τk follows the counteraction torque τNMP,k produced by the neuromuscularpolicy. Besides this main advantage, the second approach can also lead to partial dimensionality reductionand a related computational speed improvement if the basis for the optimization is chosen such that it auto-matically fulfills NMP constraints. The dependence on a high quality dynamics model of the robot remainsa shortcoming of this approach. In addition, the implementation of the nonlinear NMP as a constraint willbe more challenging. We hypothesize, however, that it improves over the first approach with more stableand robust locomotion behaviors.

Hybrid Approach 3: Policy as Replacement for Optimization. The third approach directly replacesoptimization by the NMP. The motivation behind this concept is that in contrast to a full optimization usinga dynamics model of the entire robot-environment system, the NMP requires much less sensory informationand model precision (Fig. 7). For instance, replacing the QP optimization (ii in Fig. 7), the NMP acts as ablackbox that receives a set of task level inputs xgoal from the dynamics planner (i) and generates desiredjoint torques for the robot,

τcmd = f (q, ˆFcnt,ˆθ f loat, xgoals), (6)

based on the current positions q of the robot’s joints, a gross estimate ˆFcnt of the left and right legs’ contactforces, and an estimate of the trunk orientation ˆθ f loat in the world. The NMP does not require precisemeasurements of environmental interaction (λ and JT ) or estimates of joint velocities ( ˙q). Another advantageof replacement is that it enables very fast computation as optimization is reduced to the low-dimensionalitydynamics planner. On the other hand, this approach can only be applied when the topologies match betweenthe NMP and the target robot. Another major shortcoming is that the robots’ behavioral versatility is limitedto the behavior versatility of the NMP, which furthermore must be accessible with a few high level inputsxgoal. We hypothesize that the third approach will be the least sensitive to unmodeled robot-environmentdynamics for the behaviors that the NMP can address.

5.2 Research activities that we will perform

The starting points for the proposed research are our model-based optimization control framework (detailedin section 3) and neuromuscular control policy (detailed in section 4). The proposed research activities detailthe intellectual steps that will take us from these starting points to the evaluation of our hypotheses abouthybrid approaches to humanoid control.

9

Page 12: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

(a) (b) (c)

TRajectory Plot coming

Figure 8. [hg to update] Preliminary results on neuromuscular policy as replacement. (a) Adaptation of neuromuscular modelto ATRIAS topology. (b) Robot negotiating rough terrain (±4cm) using swing leg targets (xgoal) sent by the optimization to theneuromuscular model. (c) Center of mass trajectory and ground level in sagittal plane.

Activity 1: Adapt Realtime Optimal Control Framework to ATRIAS and COMAN. Our existingmodel-based control framework is implemented in realtime on both SARCOS Primus and ATLAS [113, 23].We will adapt this framework to the ATRIAS and COMAN platforms. The adaptation provides new intel-lectual challenges. ATRIAS and COMAN have physically compliant, series elastic actuators (SEAs) [71],and it is unclear how these additional states should be treated in the realtime optimization framework. Wewill investigate two directions, either keeping the low level control of the SEAs separate from optimizationor integrating their states into the optimization. From a theoretical standpoint, the second version shouldprovide superior control performance. On the other hand, we have cascaded controllers on the ATRIASplatform for which the SEA control part is an inner loop separated from the behavior control [58, 57]. Thisseparation has provided satisfactory control fidelity in our past work.

Activity 2: Develop Hybrid Approaches with NMP as a Cost or Constraint. Using the NMP as a costor constraint raises two research questions: Which cost or constraint terms of the NMP maximize controlperformance? At what stage in the control framework (i-iii, Fig. 7) should the NMP be integrated? We willaddress the first question by investigating the effect of different cost or constraint terms. In particular, weplan to focus on terms using torques, accelerations, and contact forces. For instance, JNMP = τT WττNMP +¨qT W ¨q ¨qNMP would generate a cost biasing torques and accelerations of the optimization toward the NMP.Ideas that we will pursue include the optimization of the cost term weights to find the most robust controllerand the ordering of the constraints in a priority list that gets satisfied until degrees of freedom are exhausted.For the second question, we will investigate how the bias with the NMP at different stages affects the controlperformance. A bias in the planner (i, Fig. 7) may provide better foot placement targets than the linearinverted pendulum currently used in the planner. NMP bias in the full body optimization (ii) can addressindividual limb functions. Finally, a behavior library populated with the NMP (iii) can provide fast, pre-compted (cached) solutions, and may allow the humanoid to learn behavior over the course of execution.

In general, the NMP constraints are nonlinear constraints. To implement them in the optimization frame-work, we will use sequential quadratic programming techniques [33].

Activity 3: Develop Hybrid Approach with NMP as a Replacement. Replacing individual stagespresents a more radical change than influencing an established optimization framework with additional costsor constraints. To address the feasibility of this approach, we have implemented a preliminary version in asimulation of the 2D ATRIAS testbed (see section 5.3 for details on our testbeds). We removed the foot ofthe neuromuscular model and adapted it to ATRIAS (Fig. 8a). We then mapped the torques generated by theNMP into desired joint torques (Eq. 6) for a previously validated simulation of the testbed [58, 57]. Finally,we added optimization to plan desired foot placement targets and leg ground clearances in the presenceof known obstacles. The resulting control combined robust walking of ATRIAS over terrain with unknownground level changes up to XXcm (X% leg length) with intentional stepping over obstacles of varying height(tested up to XXcm) and width (XXcm) (Fig. 8b,c) [hg to provide exact results]. The preliminary resultssuggest the feasibility of the approach. To evaluate it on real hardware, we will embed the NMP replacement

10

Page 13: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

Figure 9. Proposed evaluation of human-likeness and robustness to external disturbances. (a) Human joint kinematics andkinetics as a benchmark for likeness. (b) Standard methods of push recovery and ground changes for assessing robustnessto external perturbation. Refer to section 5.3 for more details and additionally proposed evaluation metrics.

for the individual robot platforms within the existing realtime optimal control framework (activity 1).In addition, we will address the research question of whether the NMP can be formulated in a hierarchy

that controls versatile behavior with a few high-level task goals xgoal. Specifically, we will investigate if ahierarchy of reflex control can be identified for the key inputs of target speed, target gait, and target turningrate. Our original NMP produced only walking [29]. An understanding of how the swing leg control of footplacement can be formulated with a hierarchical structure [18, 19] led us to an NMP which can express arich set of behaviors from walking and running to slope negotiation and turning [88] (compare Fig. 6). Al-though we can formulate offline optimizations that generate transitions between these behaviors by adaptingthe reflex control parameters, we do not yet understand how these adaptations can be reinterpreted with hi-erarchical control structures. A solution to this question not only would enrich the online NMP replacementapproach, but also can provide new insights into human sensorimotor control. A human model that adaptsits locomotion behaviors based on higher level goals would provide a direct hypothesis about how humansadapt locomotion behaviors.

Activity 4: Implement, Evaluate and Refine Hybrid Approaches in Simulation. We will implementthe hybrid approaches in simulation to evaluate their performance and refine them before conducting ac-tual hardware tests. The simulations will model the experimental tests that we plan to perform within therobot testbeds. For this activity, we will build on our previous work, in which we have developed similarsimulations for the ATRIAS, SARCOS and ATLAS platforms [57, 25, 95, 113, 23].

Activity 5: Implement, Evaluate and Refine Hybrid Approaches in Robot Experiments. We willimplement the different hybrid approaches on four different robot testbeds. The details of our experimentalevaluation plan, including performance metrics and benchmarks, robot testbeds, and data collection, areelaborated in the next section.

5.3 How we will evaluate controller performance in experiments

To test our hypotheses about the performance of the three hybrid approaches, we will evaluate the human-likeness, the robustness to external perturbations, the robustness to internal model uncertainty, and the com-putational cost of the individual humanoid controllers in robot testbeds of increasing complexity.

Performance Metrics and Benchmarks. Well-defined standards and benchmarks for assessing locomo-tion in humanoid robots are rare [107]. Benchmarks that have been used often focus on accomplishing aspecific task with discrete pass or fail metrics (climb ladder, open door, negotiate rubble field, [17]). Estab-lished performance metrics for a more continuous comparison of competing locomotion controllers includeenergy-efficiency, the likeness to human locomotion, and the assessment of push recovery [107].

We will quantify human-likeness using established methods and metrics from biomechanical gait analy-sis focusing on speed, cadence, and duty factor [42, 69], as well as on similarity of kinematic, kinetic, and

11

Page 14: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

ground reaction force patterns [69, 118]. We will evaluate locomotion robustness to external perturbationsby quantifying the recovery from pushes in the sagittal and frontal planes, and the survival rate on terrainwith unknown ground disturbances of increasing severity. The pushes will be administered with a force-sensor equipped device to a fixed location on the torso near the COM of the robots to quantify disturbanceimpulse. (We have successfully used this method in previous work assessing standing balance [98].) Theground disturbances will include tracks with pits of rocks (three different granularities ranging from 1cmto 10cm) and of tiny roundball pellets (6mm, emulate behavior of sand), and slippery surfaces of decreas-ing friction coefficient (steel, oil layer). We will evaluate the robustness to internal model uncertainty bysystematically changing the model parameters (mass and inertial properties) and adding artificial sensorynoise in the software of the controller during locomotion on level ground. Finally, we will quantify thecomputational cost by tracking the execution time for each controller.

Benchmarks for some of these metrics have been reported in the literature; however, these benchmarks arehighly dependent on the physical capability of the individual robot hardware platform. To objectively assessthe benefits and shortcomings of the proposed hybrid approaches to humanoid control, we will establishbenchmark values for each of our robot platforms using the state of the art model-based optimization control(detailed in section 3) as a null hypothesis.

Robot Testbeds. The groups of the PIs are equipped with humanoid robots of different morphology(ATRIAS and SARCOS Primus) and have access to another humanoid with different actuation and sensinghardware (COMAN) (Fig. 1). This access to a range of hardware platforms will provide us with the uniqueopportunity to quantify our results across platforms. Such a systematic evaluation of performance, either insimulation or on actual robots, has rarely been done in the field of humanoid robotics, including our previouswork.

We will evaluate the control approaches on the simplest robot testbed first (7 degrees of freedom, DOF)and then systematically advance to the testbeds with increasing complexity (up to 34 DOF). First, we willuse CMU’s ATRIAS biped constrained to a boom (Fig. 1A[check]). With only 3 external and 4 internalDOF, the testbed substantially simplifies the realtime implementation of the different controllers. Second,we will take ATRIAS off the boom (Fig. ??B[check]). This will increase the level of complexity to 6 externaland 6 internal DOF, and will allow us to assess the performance of the controllers including lateral balance.Third, we will apply and test the controllers on CMU’s SARCOS Primus humanoid. With 12 internal DOFused for locomotion, the humanoid matches the full complexity of the neuromuscular model policy control.We will also explore full body control with 34 DOF (includes legs, arms, torso, and neck, but not fingers,eyes, or mouth). Finally, we plan to test the generalizability of our results with the COMAN and ATLASplatforms, which share the complexity of the SARCOS robot.

Data Collection. All of the humanoid robots are equipped with position, force and torque sensing. Inaddition, we will use our motion capture and force plate systems to collect ground truth for joint motionsand ground reaction forces during the experiments. The measurements are sufficient to analyze the human-likeness and the robustness to external and internal disturbances with the proposed metrics.

6 Management Plan

6.1 Roles of the Investigators

This project will bring together researchers with complementary areas of expertise attacking an interdis-ciplinary problem. The key personnel will include the two PIs, Chris Atkeson and Hartmut Geyer (CMURobotics Institute), one postdoctoral researcher, and two graduate students.

Principal Investigators. Chris Atkeson has expertise in model-based optimal control of humanoids [95,113, 23], robot learning, behavior libraries, and humanoid robotics. Hartmut Geyer is an expert in humanneuromechanics and robotics. He has broad experience in dynamics and control of legged systems including

12

Page 15: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

conceptual gait models [81, 31, 32, 22, 18, 120], computational models of human neuromuscular controlpolicies [30, 29, 89, 19, 90, 87, 91, 88], and powered prosthetics and legged robots [21, 79, 58, 57]. He alsohas experience with experimental gait analysis [82, 30, 18, 19].

Neither PI can do the project work alone. A joint effort is necessary to combine our expertise in this inter-disciplinary project involving humanoid robotics, model-based optimization, and human neuromechanics.Our collaboration will provide a thrust towards a new generation of students that can take on the challengingquestions in this interdisciplinary area of research.

Postdoctoral Researcher. We have found that for research involving technologically advanced platforms,the combination of a postdoctoral researcher with several faculty and students provides a very productivemix of expertise. It accelerates the training and learning of the graduate students while maintaining a highlevel of research productivity. We will recruit a postdoctoral researcher with expertise in humanoid roboticsto ensure such a productive environment. She will participate in all activities of the project with a focuson developing a single framework that can seamlessly blend optimization and policy-based control. Theattached mentoring plan details our mentoring of the postdoctoral researcher.

Graduate Students. The project will involve two graduate students who will be jointly supervised bythe PIs and the postdoctoral researcher. For the theoretical parts of the project, one student will focus moreon the hybrid approaches using the NMP as a cost or constraint (activity 2), and the other student will focusmore on the NMP as a replacement (activity 3). For the experimental validation in simulation and on therobot platforms, the students will work together (activities 1, 4 and 5). The students will be trained bythe PIs in legged systems and optimal control theory as well as in model-based optimization and humanneuromechanics. They will also acquire expertise in advanced techniques and algorithms for rigid bodysimulation and realtime hardware control.

Project Coordination. The PIs will jointly coordinate the activities supported by this project and willbe responsible for all aspects of it. For the day-to-day administration and financial management, they willbe supported by the staff of the CMU Robotics Institute. Frequent ad hoc meetings and interactions of allpersonnel involved are guaranteed as as the PIs’ groups share lab space and participate in a weekly, jointCMU seminar on bipedal locomotion. In addition, all personnel in this project will meet bi-weekly in amore formal setting to review research results and discuss future directions.

6.2 Time Line of Activities

Year 1. Theory: Develop hybrid approaches for planar locomotion (Activities 2 & 3). Simulation Experi-ments: Implement, evaluate and refine theory in simulation experiments of 2D ATRIAS platform (A1 & 4).

Year 2. Theory: Develop hybrid approaches for 3D locomotion (A2 & 3). Simulation Experiments: Imple-ment, evaluate and refine theory in simulation experiments of 3D ATRIAS and SARCOS platforms (A1 &4). Robot Experiments: Evaluate hybrid approaches on 2D (early) and 3D ATRIAS testbed (later) (A5).

Year 3. Theory: Refine theory based on year 2 robot experiments (A2 & 3). Simulation Experiments:Generalize in simulation experiments to ATLAS and COMAN platforms (A1 & 4). Robots: Evaluate hybridapproaches on SARCOS (early), ATLAS and COMAN testbeds (later) (A5).

6.3 Risk Assessment and Mitigation

Theory Development Risks (Activities 2 and 3). Our ideas may not work out as expected, and we may en-counter unforeseen barriers to progress. Mitigation: The proposed research activities include the explorationof different cost and constraint modalities as well as alternative stages for the integration of the policy basedcontrol into the optimization framework (Fig. 7). For activity 2, we have working baseline controllers for

13

Page 16: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

both the SARCOS and ATLAS robots. For activity 3, we have preliminary results that support the viabilityof the replacement approach (Sec. 5.2). Additionally, the intellectual independence of the three approachesprovides a layer of risk mitigation should the theoretical development for one of them be delayed or fail.

Implementation Risks (Activities 1, 4 and 5). As for any project including hardware evaluation, we faceimplementation risks. We may not be able to achieve real time performance of the blended control frame-work. The humanoid robot platforms may break or have mechanical limitations that hamper implementation(power output limited on ATRIAS due to amplifiers, torque control limited on ATLAS and SARCOS due tostiction in actuators and play in transmission and bearings, joint velocity estimation required for approaches1 and 2 limited on ATLAS and SARCOS). Mitigation: We have extensive prior experience working withthree of the four hardware platforms [58, 57, 98, 25, 95, 113, 23] including realtime control implementationand demonstration of locomotion controllers. We are currently updating some of the platforms to improvepower output (ATRIAS) and joint velocity estimation (ATLAS and SARCOS). Working with multiple plat-forms also balances the risk of an implementation delay or failure on a single platform.

7 Broader Impact

Unplanned Broader Impact. Often the broader impacts of our work are serendipitous, and not plannedin advance. Examples of such ad hoc broader impacts from our recent work include: 1) Our technologiesbeing demonstrated on entries in the DARPA Robotics Challenge. 2) A graduate student was a participanton a Discovery Channel TV series, ”The Big Brain Theory: Pure Genius”. One purpose of the TV seriesis getting people excited about engineering. 3) A graduate student helped run a group of all female highschool students in the robot FIRST competition. 4) Our work on soft robotics inspired the soft inflatablerobot Baymax in the Disney movie Big Hero 6 [? ]. We have participated in extensive publicity as a result.An explicit goal of this movie was to support STEAM. We expect similar unplanned broader impacts toresult from this work, especially based on dramatic videos of agile robots.

Development of Human Resources. Students working on this project will gain experience and expertisein teaching and mentoring by assisting the course students in class projects. Students working on this projectwill also have the opportunity to train their communication and inter-personal skills, as they will activelyparticipate in the dissemination of the research results at conferences and in related K-12 outreach programsof the Robotics Institute.

Participation of Underrepresented Groups. We will attract both undergraduate and graduate students,especially those from underrepresented groups. We will also make use of existing efforts that are part ofongoing efforts in the Robotics Institute, and CMU-wide efforts. These efforts include supporting minorityvisits to CMU, recruiting at various conferences and educational institutions, and providing minority fellow-ships. CMU is fortunate in being successful in attracting an usually high percentage of female undergrad-uates in Computer Science. Our collaboration with the Rehabilitation Science and Technology Departmentof the University of Pittsburgh in the area of assistive robotics is a magnet for students with disabilities andstudents who are attracted by the possibility of working on technology that directly helps people.

Outreach. One form of outreach we have pursued is an aggressive program of visiting students and post-docs. This has been most successful internationally, with visitors from the Delft University of Technology(4 students) [6, 56, 119], the HUBO lab at KAIST (1 student and 1 postdoc) [9, 14, 49], and the ChineseScholarship Council supported 5 students [53, 122]. We welcome new visitors, who are typically paid bytheir home institutions during the visit. We are currently experimenting with the use of Youtube and labnotebooks on the web to make public preliminary results as well as final papers and videos. We have foundthis is a useful way to support internal communication as well as potentially create outside interest in ourwork. We will continue to give lectures to visiting K-12 classes. The Carnegie Mellon Robotics Institute al-ready has an aggressive outreach program at the K-12 level, and we will participate in that program, as well

14

Page 17: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

as other CMU programs such as Andrew’s Leap (a CMU a summer enrichment program for high school stu-dents), the SAMS program (Summer Academy for Mathematics + Science: a summer program for diversityaimed at high schoolers), Creative Tech Night for Girls, and other minority outreach programs.

Curriculum Development. We will pursue multiple directions for dissemination. First, we will de-velop course material on robot control and biologically inspired approaches to humanoid and rehabilitationrobotics which will directly be influenced by the planned activities of this proposal. The PIs currently teachseveral courses that will benefit from this material. The first course, 16-642: Manipulation, Mobility & Con-trol, is part of the Robotics Institute’s recently established professional Master’s degree program that aimsat training future leaders in the workforce of robotics and intelligent automation enterprises and agencies.Another course, 16-868: Biomechanics and Motor Control, directly addresses the research areas in whichthis proposal is embedded. We teach a course 16-745: Dynamic Optimization, which addresses optimalcontrol. We also teach a course design to attract undergraduates into the field, 16-264: Humanoids. All ofthese courses emphasize learning from interaction with real robots. We will make these course materialsfreely available on the web.

Dissemination Plan. For a more complete description of our plans for dissemination, see our DataManagement Plan. We will maintain a public website to freely share our simulations and control code andto document research progress with video material. Finally, we will present our work at conferences andpublish it in journals, and use these vehicles to advertise our work to potential collaborators in science andindustry.

Benefits to Society. The proposed research has the potential to lead to more useful robots. A successfuloutcome can enable practical controllers for robust locomotion of legged robots in uncertain environmentswith applications in disaster response and elderly care. The outcomes of this project may also providenew insights into human sensorimotor control, in particular, into how humans adapt locomotion behaviors.Understanding how humans actually control their limbs has the potential to trigger neural and robotic pros-theses and exoskeletons which restore legged mobility to people who have damaged or lost sensorimotorfunctions of their legs due to diabetic neuropathy, stroke, or spinal cord injury as well as improve fall riskmanagement in older adults [109, 16, 55, 121].

Enhancement of Infrastructure for Research and Education. The project will upgrade CMU’s ATRIASbiped robot testbed to a 3D locomotion testbed, a powerful tool for research in dynamics and control oflegged locomotion and an equally powerful tool for education. This testbed and our Sarcos robot regularlyattract the interest of students and inspires them to advanced education and training in robotics.

Relationship to DRC funding. We are currently part of one of the funded teams in the DARPA RoboticsChallenge [17]. This support will end in the summer of 2015. The DRC focuses on reliability, implemen-tation issues, and logistics. We are not able to develop the proposed ideas in the time frame of the DRC, solonger term NSF funding complements our soon to end DRC funding.

15

Page 18: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

References

[1] BiOM Clinical Reference Page.

[2] A. D. Ames. First steps toward underactuated human-inspired bipedal robotic walking. In Robot.Autom. (ICRA), 2012 IEEE Int. Conf., pages 1011–1017. IEEE, 2012.

[3] S. Anderson. The Design of Control Architectures for Force-controlled Humanoids Performing Dy-namic Tasks. PhD thesis, Robotics Institute, Carnegie Mellon University, 2013.

[4] S. Anderson and J. Hodgins. Adaptive torque-based control of a humanoid robot on an unstableplatform. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on,pages 511–517, 2010.

[5] S. Anderson and J. Hodgins. Informed priority control for humanoids. In Humanoid Robots (Hu-manoids), 2011 11th IEEE-RAS International Conference on, pages 416–422, 2011.

[6] S. O. Anderson, M. Wisse, C. G. Atkeson, J. K. Hodgins, G. J. Zeglin, and B. Moyer. Powered bipedsbased on passive dynamic principles. In Proceedings of the 5th IEEE-RAS International Conferenceon Humanoid Robots (Humanoids 2005), pages 110–116, 2005.

[7] C. G. Atkeson. Efficient robust policy optimization. In American Control Conference, 2012.

[8] R. Bellman. Dynamic Programming (Dover Books on Computer Science). Dover Publications, 2003.

[9] D. Bentivegna, C. G. Atkeson, and J.-Y. Kim. Compliant control of a hydraulic humanoid joint. InIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2007.

[10] R. Blickhan. The spring-mass model for running and hopping. J. Biomech., 22:1217–1227, 1989.

[11] T. G. Brown. On the nature of the fundamental activity of the nervous centres; together with ananalysis of the conditioning of rhythmic activity in progression, and a theory of the evolution offunction in the nervous system. J Physiol, 48(1):18–46, 1914.

[12] C. Chevallereau, E. R. Westervelt, and J. W. Grizzle. Asymptotically stable running for a five-link,four-actuator, planar bipedal robot. Int. J. Rob. Res., 24(6):431–464, 2005.

[13] B.-K. Cho, I.-W. Park, and J.-H. Oh. Running pattern generation of a humanoid biped with a fixedpoint and its realization. Int. J. Humanoid Robot., 6(4):631–656, 2009.

[14] D. Choi, C. G. Atkeson, S. J. Cho, and J. Y. Kim. Phase plane control of a humanoid. In IEEE-RASInternational Conference on Humanoid Robots (Humanoids), pages 145–150, 2008.

[15] G. Courtine, Y. Gerasimenko, R. van den Brand, A. Yew, P. Musienko, H. Zhong, B. Song, Y. Ao,R. M. Ichiyama, I. Lavrov, R. R. Roy, M. V. Sofroniew, and V. R. Edgerton. Transformationof nonfunctional spinal circuits into functional states after the loss of brain input. Nat Neurosci,12(10):1333–1342, 2009.

[16] C. C. Cowie, K. F. Rust, E. S. Ford, M. S. Eberhardt, D. D. Byrd-Holt, C. Li, D. E. Williams, E. W.Gregg, K. E. Bainbridge, S. H. Saydah, and L. S. Geiss. Full accounting of diabetes and pre-diabetesin the U.S. population in 1988-1994 and 2005-2006. Diabetes Care, 32(2):287–294, 2009.

[17] DARPA. www.theroboticschallenge.org.

1

Page 19: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

[18] R. Desai and H. Geyer. Robust swing leg placement under large disturbances. In Proc. IEEE Int ConfRobot. Biomimetics, pages 265–270, 2012.

[19] R. Desai and H. Geyer. Muscle-Reflex Control of Robust Swing Leg Placement. In Proc IEEE IntConf Robot. Autom., pages 2169 – 2174, 2013.

[20] V. Dietz. Proprioception and locomotor disorders. Nat Rev Neurosci, 3(10):781–790, 2002.

[21] M. Eilenberg, H. Geyer, and H. Herr. Control of a Powered Ankle-Foot Prosthesis Based on a Neu-romuscular Model. IEEE Trans Neural Syst Rehabil Eng, 18(2):164–173, 2010.

[22] M. Ernst, H. Geyer, and R. Blickhan. Extension and customization of self-stability control in com-pliant legged systems. Bioinspir. Biomim., pages doi:10.1088/1748–3182/7/4/046002, 2012.

[23] S. Feng, E. Whitman, Xinjilefu, and C. G. Atkeson. Optimization-based full body control for thedarpa robotics challenge. Journal of Field Robotics, in press.

[24] S. Feng, E. Whitman, X. Xinjilefu, and C. G. Atkeson. Optimization Based Full Body Control forthe Atlas Robot. In Proc IEEE Conf Humanoids, Madrid, Spain, 2014.

[25] S. Feng, X. Xinjilefu, W. Huang, and C. Atkeson. 3D walking based on online optimization. In Proc.IEEE Humanoids, 2013.

[26] S. Feng, X. Xinjilefu, W. Huang, and C. G. Atkeson. 3d walking based on online optimization. InIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2013.

[27] T. Geijtenbeek, M. van de Panne, and a. F. van der Stappen. Flexible muscle-based locomotion forbipedal creatures. ACM Trans. Graph., 32:1–11, 2013.

[28] Y. Gerasimenko, R. R. Roy, and R. V. Edgerton. Epidural stimulation: Comparison of the spinalcircuits that generate and control locomotion in rats, cats and humans. Exp. Neurol., 209:417–425,2008.

[29] H. Geyer and H. Herr. A Muscle-Reflex Model that Encodes Principles of Legged Mechanics Pro-duces Human Walking Dynamics and Muscle Activities. IEEE Trans Neural Syst Rehabil Eng,18(3):263–273, 2010.

[30] H. Geyer, A. Seyfarth, and R. Blickhan. Positive force feedback in bouncing gaits? Proc. R. Soc.Lond. B, 270:2173–2183, 2003.

[31] H. Geyer, A. Seyfarth, and R. Blickhan. Spring-mass running: simple approximate solution andapplication to gait stability. J. Theor. Biol., 232(3):315–328, 2005.

[32] H. Geyer, A. Seyfarth, and R. Blickhan. Compliant leg behaviour explains the basic dynamics ofwalking and running. Proc. R. Soc. Lond. B, 273:2861–2867, 2006.

[33] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrainedoptimization. SIAM J. Optim, 12:979–1006, 2002.

[34] S. Grillner. Neurobiological bases of rhythmic motor acts in vertebrates. Science, 228(4696):143–9,Apr. 1985.

[35] K. Hase and N. Yamazaki. Computer Simulation Study of Human Locomotion with a Three-Dimensional Entire-Body Neuro-Musculo-Skeletal Model. I. Acquisition of Normal Walking. JSMEInt. J. Ser. C, 45:1040–1050, 2002.

2

Page 20: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

[36] H. Herr, H. Geyer, and M. Eilenberg. Model-based neuromechanical controller for a robotic leg. USPat. No. 8864846, 2014.

[37] M. Hirose and K. Ogawa. Honda humanoid robots development. Philos Trans. A Math Phys Eng Sci,365(1850):11–19, Jan. 2007.

[38] J. K. Hodgins. Three-Dimensional Human Running. In Proc. 1996 IEEE Int. Conf. Robot. Autom.,pages 3271–3276, 1996.

[39] W. Huang, J. Kim, and C. G. Atkeson. Energy-based optimal step planning for humanoids. InRobotics and Automation (ICRA), 2013 IEEE International Conference on, pages 3124–3129, 2013.

[40] H. Hultborn and J. B. Nielsen. Spinal control of locomotion - from cat to man. Acta Physiol.,189(2):111–121, 2007.

[41] M. Hutter, M. Hoepflinger, C. Gehring, C. D. R. M. Bloesch, and R. Siegwart. Hybrid operationalspace control for compliant legged systems. In Proc. of the 8th Robotics: Science and SystemsConference (RSS), 2012.

[42] V. T. Inman, H. J. Ralston, F. Todd, and J. C. Lieberman. Human walking. Williams & Wilkins,Baltimore, 1981.

[43] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier, 1970.

[44] J. J. K. Jr., S. Kagami, K. Nishiwaki, M. Inaba, and H. Inoue. Dynamically-Stable Motion Planningfor Humanoid Robots. Auton. Robots, 12(1):105–118, Jan. 2002.

[45] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa. Biped walk-ing pattern generation by using preview control of zero-moment point. In Robotics and Automation,2003. IEEE International Conference on, volume 2, pages 1620–1626 vol.2, 2003.

[46] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa. Resolvedmomentum control: humanoid motion planning based on the linear and angular momentum. In Intell.Robot. Syst., pages 1644–1650, 2003.

[47] S. Kajita, T. Nagasaki, K. Kaneko, and H. Hirukawa. ZMP-Based Biped Running Control. Robot.Autom. Mag. IEEE, 14(2):63–72, 2007.

[48] J. Kim, N. S. Pollard, and C. G. Atkeson. Quadratic encoding of optimized humanoid walking. InIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2013.

[49] J.-Y. Kim, C. G. Atkeson, J. K. Hodgins, D. Bentivegna, and S. J. Cho. Online gain switching algo-rithm for joint position control of a hydraulic humanoid robot. In IEEE-RAS International Conferenceon Humanoid Robots (Humanoids), 2007.

[50] S. Kim, C. G. Atkeson, and S. Park. Perturbation-dependent selection of postural feedback gain andits scaling. Journal of Biomechanics, 45(8):1379–1386, 2012.

[51] Y. Kim, Y. Tagawa, G. Obinata, and K. Hase. Robust control of CPG-based 3D neuromusculoskeletalwalking model, 2011.

[52] T. Kwon and J. K. Hodgins. Control systems for human running using an inverted pendulum modeland a reference motion capture sequence. In Proceedings of the 2010 ACM SIGGRAPH/EurographicsSymposium on Computer Animation, SCA ’10, 2010.

3

Page 21: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

[53] C. Liu and C. G. Atkeson. Standing balance control using a trajectory library. In IEEE/RSJ Interna-tional Conference on Intelligent Robots and Systems (IROS), pages 3031 — 3036, 2009.

[54] C. Liu, C. G. Atkeson, and J. Su. Biped walking control using a trajectory library. ROBOTICA,31:311–322, 2013.

[55] D. Lloyd-Jones, R. J. Adams, T. M. Brown, M. Carnethon, S. Dai, G. De Simone, and Others. Heartdisease and stroke statistics–2010 update: a report from the American Heart Association. Circulation,121(7):e46–e215, 2010.

[56] T. Mandersloot, M. Wisse, and C. Atkeson. Controlling velocity in bipedal walking: A dynamicprogramming approach. In Proceedings of the 6th IEEE-RAS International Conference on HumanoidRobots (Humanoids), 2006.

[57] W. Martin, A. Wu, and H. Geyer. Robust Spring Mass Model Running for a Physical Bipedal Robot.In IEEE ICRA, Submitt.

[58] W. Martin, A. Wu, and H. Geyer. Robust Spring-Mass Model Running on ATRIAS Biped. In Dyn.Walk. 2014, 2014.

[59] T. A. McMahon and G. C. Cheng. The mechanism of running: how does stiffness couple with speed?J. Biomech., 23:65–78, 1990.

[60] M. Mistry, J. Buchli, and S. Schaal. Inverse Dynamics Control of Floating Base Systems UsingOrthogonal Decomposition. Control, pages 3406–3412, 2010.

[61] M. Mistry, J. Buchli, and S. Schaal. Inverse dynamics control of floating base systems using orthog-onal decomposition. In Robotics and Automation (ICRA), 2010 IEEE International Conference on,pages 3406–3412, 2010.

[62] M. Mistry, A. Murai, K. Yamane, and J. Hodgins. Sit-to-stand task on a humanoid robot from humandemonstration. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conferenceon, pages 218–223, 2010.

[63] M. Mistry, A. Murai, K. Yamane, and J. Hodgins. Model-Based Control and Estimation of Hu-manoid Robots via Orthogonal Decomposition. In Exp. Robot., chapter Model-Based, pages 839–854. Springer, 2014.

[64] I. Mordatch, M. de Lasa, and A. Hertzmann. Robust physics-based locomotion using low-dimensionalplanning. ACM Trans. Graph., 29(4):1, July 2010.

[65] J. Morimoto, G. Endo, J. Nakanishi, and G. Cheng. A Biologically Inspired Biped LocomotionStrategy for Humanoid Robots: Modulation of Sinusoidal Patterns by a Coupled Oscillator Model.IEEE Trans. Robot., 24(1):185–191, Feb. 2008.

[66] G. Nelson, A. Saunders, N. Neville, B. Swilling, J. Bondaryk, D. Billings, C. Lee, R. Playter, andM. Raibert. Petman: A humanoid robot for testing chemical protective clothing. J. Robot. Soc. Japan,30(4):372–377, 2012.

[67] S. Noda, M. Murooka, S. Nozawa, Y. Kakiuchi, K. Okada, and M. Inaba. Generating whole-bodymotion keep away from joint torque, contact force, contact moment limitations enabling steep climb-ing with a real humanoid robot. In 2014 IEEE Int. Conf. Robot. Autom., pages 1775–1781. IEEE,May 2014.

4

Page 22: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

[68] Y. Ogura, K. Shimomura, H. Kondo, A. Morishima, T. Okubo, S. Momoki, H. ok Lim, and A. Takan-ishi. Human-like walking with knee stretched, heel-contact and toe-off motion by a humanoid robot.In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 3976–3981,2006.

[69] J. Perry. Gait analysis: normal and pathological function. SLACK Inc., Thorofare, NJ, 1992.

[70] E. Pierrot-Desseilligny and D. C. Burke. The circuitry of the human spinal cord: its role in motorcontrol and movement disorders. Cambridge University Press, Cambridge, 2005.

[71] G. A. Pratt and M. M. Williamson. Series elastic actuators. IEEE/RSJ Int. Conf. Intell. Robot. Syst.,1:399–406, 1995.

[72] J. Pratt, J. Carff, S. Drakunov, and A. Goswami. Capture point: A step toward humanoid pushrecovery. In Humanoid Robots, 2006 6th IEEE-RAS International Conference on, pages 200–207,2006.

[73] J. Pratt, T. Koolen, T. de Boer, J. Rebula, S. Cotton, J. Carff, M. Johnson, and P. Neuhaus.Capturability-based analysis and control of legged locomotion, Part 2: Application to M2V2, a lower-body humanoid. Int. J. Rob. Res., 31:1117–1133, 2012.

[74] M. Raibert, K. Blankenspoor, G. Nelson, and R. Playter. BigDog, the Rough-Terrain QuadrupedRobot. In Proc. 17th World Congr. Int Fed Autom. Control, pages 10823–10825, 2008.

[75] M. H. Raibert. Legged robots that balance. MIT press, Cambridge, 1986.

[76] L. Righetti, J. Buchli, M. Mistry, M. Kalakrishnan, and S. Schaal. Optimal distribution of contactforces with inverse dynamics control. International Journal of Robotics Research, pages 280–298,2013.

[77] S. Sanan, M. H. Ornstein, and C. G. Atkeson. Physical human interaction for an inflatable manipula-tor. In IEEE Engineering in Medicine and Biology Society (EMBC), pages 7401–7404, 2011.

[78] S. Schaal and C. G. Atkeson. Learning control for robotics. IEEE Robotics & Automation Magazine,17(2):20–29, 2010.

[79] A. Schepelmann, M. D. Taylor, and H. Geyer. Development of a Testbed for Robotic NeuromuscularControllers. In Robot. Sci. Syst. VIII - Online Proc., 2012.

[80] L. Sentis, J. Park, and O. Khatib. Compliant control of multicontact and center-of-mass behaviors inhumanoid robots. IEEE Trans. Robot., 26:483–501, 2010.

[81] A. Seyfarth, H. Geyer, M. Gunther, and R. Blickhan. A movement criterion for running. J. Biomech.,35:649–655, 2002.

[82] A. Seyfarth, H. Geyer, and H. M. Herr. Swing-leg retraction: a simple control model for stablerunning. J. Exp. Biol., 206:2547–2555, 2003.

[83] A. Seyfarth, M. Gunther, and R. Blickhan. Stable operation of an elastic three-segmented leg. Biol.Cybern., 84:365–382, 2001.

[84] J. Shan and F. Nagashima. Neural Locomotion Controller Design and Implementation for HumanoidRobot HOAP-1. In Proc. 20th Annu. confrence Robot. Soc. Japan, 2002.

5

Page 23: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

[85] C. S. Sherrington. Flexion-reflex of the limb, crossed extension-reflex, and reflex stepping and stand-ing. J Physiol, 40(1-2):28–121, 1910.

[86] K. W. Sok, K. Yamane, J. Lee, and J. Hodgins. Editing dynamic human motions via momentumand force. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on ComputerAnimation, SCA ’10, pages 11–20, 2010.

[87] S. Song, R. Desai, and H. Geyer. Integration of an adaptive swing control into a neuromuscularhuman walking model. In 2013 35th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., pages 4915–4918.IEEE, July 2013.

[88] S. Song and H. Geyer. A neural circuitry that emphasizes spinal feedbacks generates diverse behaviorsof human locomotion. J. R. Soc. Interface, Submitt.

[89] S. Song and H. Geyer. Regulating speed and generating large speed transitions in a neuromuscularhuman walking model. In Proc IEEE Int. Conf. Robot. Autom., pages 511–516. IEEE, May 2012.

[90] S. Song and H. Geyer. Generalization of a muscle-reflex control model to 3D walking. In 2013 35thAnnu. Int. Conf. IEEE Eng. Med. Biol. Soc., pages 7463–7466. IEEE, July 2013.

[91] S. Song and H. Geyer. Robust 3D locomotion models using primarily reflex control. In Online ProcDyn. Walk., 2013.

[92] K. Sreenath, H.-W. Park, I. Poulakakis, and J. W. Grizzle. Embedding active force control withinthe compliant hybrid zero dynamics to achieve stable, fast running on MABEL. Int. J. Rob. Res.,32(3):324–345, 2013.

[93] B. Stephens. Dynamic balance force control for compliant humanoid robots. In International Con-ference on Intelligent Robots and Systems (IROS), pages 1248–1255, 2010.

[94] B. Stephens. Push Recovery Control for Force-Controlled Humanoid Robots. PhD thesis, TheRobotics Institute, Carnegie Mellon University, Pittsburgh, August 2011.

[95] B. Stephens. Push Recovery Control for Force-Controlled Humanoid Robots. PhD thesis, CarnegieMellon University, 2011.

[96] B. Stephens and C. Atkeson. Dynamic Balance Force Control for compliant humanoid robots. Intell.Robot. Syst. (IROS), 2010 IEEE/RSJ Int. Conf., pages 1248–1255, 2010.

[97] B. Stephens and C. G. Atkeson. Push recovery by stepping for humanoid robots with force controlledjoints. In International Conference on Humanoid Robots, 2010.

[98] B. J. Stephens and C. G. Atkeson. Push Recovery by stepping for humanoid robots with force con-trolled joints. 2010 10th IEEE-RAS Int. Conf. Humanoid Robot., pages 52–59, 2010.

[99] M. Stolle and C. G. Atkeson. Finding and transferring policies using stored behaviors. AutonomousRobots, 29(2):169—200, 2010.

[100] G. Taga. A model of the neuro-musculo-skeletal system for human locomotion: I. Emergence ofbasic gait. Biol. Cybern., 73(2):97–111, 1995.

[101] G. Taga. A model of the neuro-musculo-skeletal system for anticipatory adjustment of human loco-motion during obstacle avoidance. Biol Cybern, 78(1):9–17, 1998.

6

Page 24: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

[102] G. Taga, Y. Yamaguchi, and H. Shimizu. Self-organized control of bipedal locomotion by neuraloscillators in unpredictable environment. Biol Cybern, 65(3):147–159, 1991.

[103] T. Takenaka, T. Matsumoto, T. Yoshiike, and S. Shirokura. Real time motion generation and controlfor biped robot 2nd report: Running gait pattern generation. In Intell. Robot. Syst. 2009. IROS 2009.IEEE/RSJ Int. Conf., pages 1092–1099. IEEE, 2009.

[104] R. Tedrake, I. R. Manchester, M. Tobenkin, and J. W. Roberts. LQR-trees: Feedback Motion Planningvia Sums-of-Squares Verification. Int. J. Rob. Res., 29(8):1038–1052, Apr. 2010.

[105] N. Thatte and H. Geyer. Towards Local Reflexive Control of a Powered Transfemoral Prosthesis forRobust Amputee Push and Trip Recovery. IEEE Conf. Intell. Robot. Syst., 2014.

[106] E. Theodorou, Y. Tassa, and E. Todorov. Stochastic Differential Dynamic Programming. In Am.Control Conf., pages 1125–1132. IEEE, 2010.

[107] D. Torricelli, R. Mizanoor, J. Gonzalez, V. Lippi, G. Hettich, L. Asslaener, M. Weckx, B. Vander-borght, S. Dosen, M. Sartori, J. Zhao, S. Schuetz, Q. Liu, T. Mergner, D. Lefeber, D. Farina, K. Berns,and J. Pons. Benchmarking Human-Like Posture and Locomotion of Humanoid Robots: A Prelim-inary Scheme. In A. Duff, N. Lepora, A. Mura, T. Prescott, and P. Verschure, editors, Lect. NotesComput. Sci. Living Mach., pages 320–331. Springer International Publishing Switzerland, 2014.

[108] J. L. van Leeuwen. Muscle function in locomotion. In R. Alexander, editor, Mech. Anim. Locomot.,volume 11, pages 191–249. Springer-Verlag, New York, 1992.

[109] A. I. Vinik, B. D. Mitchell, S. B. Leichter, A. L. Wagner, J. T. O’Brian, and L. P. Georges. Epidemi-ology of the complications of diabetes, pages 221–287. Cambridge University Press, Cambridge,1995.

[110] J. M. Wang, S. R. Hamner, S. L. Delp, and V. Koltun. Optimizing locomotion controllers usingbiologically-based actuators and objectives, 2012.

[111] E. R. Westervelt and J. W. Grizzle. Feedback Control of Dynamic Bipedal Robot Locomotion. Controland Automation Series. CRC PressINC, 2007.

[112] E. Whitman. Coordination of Multiple Dynamic Programming Policies for Control of Bipedal Walk-ing. PhD thesis, The Robotics Institute, Carnegie Mellon University, Pittsburgh, September 2013.

[113] E. Whitman. Coordination of Multiple Dynamic Programming Policies for Control of Bipedal Walk-ing. PhD thesis, Robotics Institute, Carnegie Mellon University, 2013.

[114] E. Whitman and C. Atkeson. Control of instantaneously coupled systems applied to humanoid walk-ing. In Humanoid Robots (Humanoids), IEEE-RAS International Conference on, pages 210–217,2010.

[115] E. Whitman and C. G. Atkeson. Control of instantaneously coupled systems applied to humanoidwalking. In International Conference on Humanoid Robots, pages 210–217, 2010.

[116] E. C. Whitman and C. G. Atkeson. Multiple model robust dynamic programming. In AmericanControl Conference (ACC), pages 5998–6004, 2012.

[117] E. C. Whitman, B. J. Stephens, and C. G. Atkeson. Torso rotation for push recovery using a simplechange of variables. In International Conference on Humanoid Robots, pages 50–56, 2012.

7

Page 25: Combining Optimal and Neuromuscular Controllers for Agile ...cga/tmp-public/geyer3.pdf · Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior Hartmut

[118] D. A. Winter. Biomechanics and motor control of human movement. Wiley, Hoboken, N.J., 4th editioedition, 2009.

[119] M. Wisse, C. G. Atkeson, and D. K. Kloimwieder. Swing leg retraction helps biped walking stability.In Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots (Humanoids),2005.

[120] A. Wu and H. Geyer. The 3-D Spring-Mass Model Reveals a Time-Based Deadbeat Control forHighly Robust Running and Steering in Uncertain Environments. IEEE Trans. Robot., 29(5):1114–1124, 2013.

[121] M. Wyndaele and J.-J. Wyndaele. Incidence, prevalence and epidemiology of spinal cord injury: whatlearns a worldwide literature survey? Spinal Cord, 44(9):523–529, 2006.

[122] D. Xing, C. G. Atkeson, J. Su, and B. Stephens. Gain scheduled control of perturbed standing balance.In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4063–4068,2010.

[123] Xinjilefu and C. Atkeson. State estimation of a walking humanoid robot. In Intelligent Robots andSystems (IROS), 2012 IEEE/RSJ International Conference on, pages 3693–3699, 2012.

[124] X. Xinjilefu, S. Feng, W. Huang, and C. Atkeson. Decoupled state estimation for humanoid usingfull-body dynamics. In International Conference on Robotics and Automation, 2014.

[125] K. Yamane, S. Anderson, and J. Hodgins. Controlling humanoid robots with human motion data:Experimental validation. In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS InternationalConference on, pages 504–510, 2010.

[126] K. Yamane and J. Hodgins. Control-aware mapping of human motion data with stepping for hu-manoid robots. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conferenceon, pages 726–733, 2010.

[127] K. Yamane, J. Kuffner, and J. Hodgins. Synthesizing animations of human manipulation tasks. ACMTransactions on Graphics, 23(3):532–539, 2004.

[128] K. Yin, K. Loken, and M. van der Penne. SIMBICON: simple biped locomotion control. In ProcACM SIGGRAPH, volume 26, 2007.

[129] R. Zaier and S. Kanda. Adaptive locomotion controller and reflex system for humanoid robots. In2008 IEEE/RSJ Int. Conf. Intell. Robot. Syst. IROS, pages 2492–2497, 2008.

[130] M. Zucker, J. A. Bagnell, C. G. Atkeson, and J. Kuffner. An optimization approach to rough terrainlocomotion. In IEEE International Conference on Robotics and Automation (ICRA), pages 3589–3595, 2010.

[131] M. Zucker, N. D. Ratliff, M. Stolle, J. E. Chestnutt, J. A. Bagnell, C. G. Atkeson, and J. Kuffner.Optimization and learning for rough terrain legged locomotion. International Journal of RoboticResearch, 30(2):175–191, 2011.

8