Learning rhythmic movements by demonstration using ... · Learning Rhythmic Movements by...

Learning Rhythmic Movements by Demonstration

using Nonlinear Oscillators

Auke Jan Ijspeert1, Jun Nakanishi2, and Stefan Schaal1,2

1Dept. of Computer Science, University of Southern California, Los Angeles, USA2ATR Human Information Science Laboratories, Kyoto, JapanEmail: [email protected], [email protected], [email protected]

Abstract

This paper presents a new approach to the generationof rhythmic movement patterns with nonlinear dy-namical systems. Starting from a canonical limit cy-cle oscillator with well-defined stability properties, wemodify the attractor landscape of the canonical sys-tem by means of statistical learning methods to embedarbitrary smooth target patterns, however, withoutlosing the stability properties of the canonical sys-tem. In contrast to non-autonomous movement rep-resentations like splines, the learned pattern gener-ators remain autonomous dynamical systems whichrobustly cope with external perturbations that disruptthe time flow of the original pattern, and which canalso be modified on-line by additional perceptual vari-ables. A simple extension allows to cope with mul-tiple degrees-of-freedom (DOF) patterns, where allDOFs share the same fundamental frequency but,otherwise, can move in arbitrary phase and ampli-tude offsets to each other. We evaluate our meth-ods in learning from demonstration with an actual30 DOF humanoid robot. Figure-8 and drummingmovements are demonstrated by a human, recordedin joint angle space with an exoskeleton, and em-bedded in multi-dimensional rhythmic pattern gener-ators. The learned patterns can be used by the robotin various workspace locations and from arbitraryinitial conditions. Spatial and temporal invarianceof the pattern generators allow easy amplitude andspeed scaling without losing the qualitative signatureof a movement. This novel way of creating rhyth-mic patterns could tremendously facilitate rhythmicmovement generation, in particular in locomotion ofrobots and neural prosthetics in clinical applications.

1 Introduction

There has been a growing interest in nonlinear oscil-lators in both robotics and the modeling of animalmotor control [3, 6, 9, 10, 13, 16], particularly for thecontrol of rhythmic movements such as locomotion,

juggling, or drumming. Nonlinear oscillators haveinteresting properties for rhythmic motor control, in-cluding robust limit cycle behavior, on-line adapta-tion through coupling signals, synchronization withother rhythmic systems, and the possibility to ef-ficiently exploit the natural dynamics of mechanicalsystems through resonance tuning. However, it is dif-ficult to design oscillator controllers for a designatedtask, e.g. when specific frequencies, phases, and sig-nal shapes are needed. Attention has therefore beengiven to derive learning algorithms for adjusting non-linear oscillators automatically (e.g., [4, 5, 12]). Sofar, most work has focused on learning the frequen-cies and phase relations of multiple rhythmic pat-terns, rather than on complete signal shaping.

This paper focuses on how to learn complex rhyth-mic patterns from a desired target signal, e.g., as ob-tained from movement recordings or learning fromdemonstration. We build on previous work, wherewe proposed to learn attractor landscapes of pointattractors to form control policies (CPs) for discretemovements (e.g., a pointing motion, tennis swingstoward a ball) [7]. The essence of this approach wasto use a canonical simple dynamical system with welldefined point attractor properties, to anchor Gaus-sian basis functions in phase space of this system,and to learn a weight vector that multiplies the basisfunctions and creates a nonlinear modulation of thecanonical dynamics to form arbitrary smooth newattractor landscapes. The statistical learning sys-tem employed was based on nonparametric regres-sion techniques [14], a method that allows a theoret-ically sound way of determining the number of basisfunction needed for accurate learning.

In this paper, we extend these ideas to learningrhythmic movements. A simple nonlinear oscillatorwith stable limit cycle dynamics is used as a canon-ical system to provide a phase signal to anchor thenonlinear basis functions, and a learned weight vec-tor, multiplying the basis functions, together withthe amplitude signal of the canonical system are used

to create new, complex rhythmic patterns. Given asample trajectory of a desired rhythmic movement,e.g., from a human demonstration, statistical learn-ing methods [14] can be used to embed this desiredpattern as a limit cycle attractor in our dynamicalsystems.

In the following, Section 2 first introduces our meth-ods to learn rhythmic pattern generators for 1 DOFsystems, discusses the theoretical properties of thesystems, and demonstrates a possible extension tomulti-dimensional pattern generators. Afterwards,in Section 3 , we will illustrate the application ofour methods in experimental evaluations of learn-ing from demonstration with an actual 30 DOF hu-manoid robot.

2 Nonlinear Oscillators as Control Poli-

cies

Nonlinear dynamical systems can be conceived of ascontrol policies (CP), as they prescribe a change-of-state in a particular state in the same way as con-trol policies prescribe a motor command in a par-ticular state. The interesting property of dynamicalsystems as control policies lies in their ability to beon-line modified by external coupling variables, aswell-known from the theory of coupled oscillators.The research question we pursue is whether differen-tial equations can be used as a general tool to createcontrol policies. For the purpose of learning a controlpolicy from sample trajectories, we represent CPs inkinematic coordinates (similar as in [13]), e.g. jointangles of a robot, thus assuming that an appropriatecontroller exists to convert outputs of the kinematicpolicies into motor commands.

To create rhythmic control policies (RCPs), the cen-tral element of our modeling is the canonical nonlin-ear oscillator (see [8, 11] for very similar oscillators):

z = −µ

E0

(E − E0) z − k2u (1)

u = z (2)

which has a stable limit cycle characterized by the

closed trajectory z2

2+ k2u2

2= E0 (const) on the phase

plane (see Figure 1), where µ, E0 and k are positive

parameters and E(u, z) = z2

2+ k2u2

2denotes the en-

ergy of the oscillator. Note that the first term in (1)can be considered as a nonlinear damping term whichregulates the total energy of the system. The param-eter k corresponds to the frequency of the oscillator,and E0 corresponds to the desired total energy anddetermines the amplitude of the oscillation. µ deter-mines the convergence rate to the limit cycle.

Based on this limit cycle oscillator, we introduce acontrol policy of first order dynamics with a nonlin-

−1 −0.5 0 0.5 1 1.5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

z

k*u

φ

Figure 1: Phase plot of the nonlinear oscillator (u, z)

with µ=10, k=2π and E0=0.5. The continuous line cor-

responds to the limit cycle behavior. The dotted lines

show different trajectories starting with different initial

conditions.

ear function f to produce a desired position y from

y = β(ym − y) + f (3)

where

f =

∑N

i=1Ψiw

Ti z

∑N

i=1Ψi

(4)

with Gaussian kernel functions

Ψi = exp(

−0.5hi(φ − ci)2)

. (5)

β is a positive constant, and ym is a parameter whichdetermines the baseline around which y oscillates.The phase variable φ = atan2(z, ku) anchors theGaussian kernel functions in the phase space of thecanonical system (1,2), and z = [z,

√E0]

T is thedriving signal for (3), generated by the limit cycledynamics and including a bias term for better func-tion approximation. The parameters hi and ci de-termine the location and width, respectively, of eachbasis function in phase space. The choice of z is mo-tivated by our desire to have a differential equationwith spatial scale invariance (see Section 2.2) and theneed that y should be zero for E0 = 0. The parame-ters wi are learned to fit a given sample trajectories1,as explained in Section 2.4. Figure 2 shows an exem-plary time evolution of the complete system.

2.1 Stability of the RCP

First, we show that the (u, z) oscillator has a stable

limit cycle characterized by the closed trajectory z2

2+

k2u2

2= E0 on the phase plane. Consider the time

derivative of the energy of the oscillator

V = −µ

E0

(E − E0)z2. (6)

1Note that to ensure smooth transitions from one period

to the other, each Ψi function is in practice the sum of three

gaussian functions centered on ci − 2π, ci, and ci + 2π, rather

than a single gaussian function centered on ci.

0 0.5 1 1.5 2−4

−2

0

2

y

0 0.5 1 1.5 2−50

0

50

dy/d

t

0 0.5 1 1.5 2

−2

0

2

φ

0 0.5 1 1.5 20

0.5

1

Ψi

0 0.5 1 1.5 2−0.5

0

0.5

u

Time [s]0 0.5 1 1.5 2

−2

−1

0

1

2

z

Time [s]

Figure 2: Time evolution over two periods of the dy-

namical systems (limit cycle behavior). N=25, µ=20,

β=3, σi=0.0386 for i=1,...,N , The ci are equally spaced

between c1=−π and cN=π. The same parameters will

be used throughout the paper. In this particular example,

k=2π, E0=0.5, ym=−1.0, and the parameters wi have

been adjusted to fit ydemo(t) = sin(2πt) + cos(6πt).

This implies that the energy of the oscillator E

monotonically converges to E0 from any initial con-dition (except for the origin) and therefore E(u, z) =E0 is a unique stable limit cycle of the system. They dynamics in (3) can be interpreted simply as a firstorder low pass filter of a periodic input f . If the in-put is bounded, the output of this equation will bebounded, too.

2.2 Spatial Scale Invariance

An interesting property of the RCPs is that theyare spatially invariant. Scaling of the energy ofthe oscillator E0 by a factor c does not affect thetopology of the attractor landscape of the policy.It can be shown that the attractor landscape with[E0, z, u, φ, ym, y] is topologically equivalent to thelandscape with [cE0,

√cz,

√cu, φ,

√cym,

√cy]. Sim-

ilarly, the frequency of the signal y is directly de-termined by the parameter k. We will exploit theseproperties to modulate the frequency and amplitudeof learned rhythmic patterns in section 3.3.

2.3 Robustness against Perturbations

When considering applications of our approach tophysical systems, e.g., robots and humanoids, inter-actions with the environment may require an on-linemodification of the policy. As outlined in [7], the dy-namical system formulation allows feeding back anerror term between actual and desired positions intoCPs such that the time evolution of the CP can bepaused during a perturbation. In this section, we ex-tend this idea to rhythmic control policies by mod-ulating the original equations by the error between

0 2 4 6−6

−4

−2

0

2

y

0 2 4 6−50

0

50

dy/d

t

0 2 4 6

−2

0

2

φ

0 2 4 60

0.5

1

Ψi

0 2 4 6−0.5

0

0.5

u

Time [s]0 2 4 6

−2

−1

0

1

2

z

Time [s]

Figure 3: Time evolution of the dynamical systems un-

der a perturbation. The dotted and continuous lines cor-

respond, respectively, to the unperturbed and perturbed

dynamics. The actual position y is frozen between 1.5

and 3.0s (dashed line in top left figure). For this exam-

ple, αu = 200, and αy = 50.

the planned position y and the actual position of thephysical system denoted by y:

u = z(

1 + αu(y − y)2)

−1(7)

y =

∑N

i=1Ψiw

Ti z

∑N

i=1Ψi

+ β(ym − y) + αy(y − y)(8)

Figure 3 illustrates the effect of a perturbation wherethe actual motion of the physical system is blockedduring a short period of time. During the perturba-tion, the time evolution of the states of the policy isgradually halted. The desired position y is modifiedto remain close to the actual position y, and, as soonas the perturbation stops, rapidly resumes perform-ing the (time-delayed) planned trajectory. Note thatother (task-specific) ways to cope with perturbationscan be designed, which will be addressed in our fu-ture work. Such on-line modifications are one of themost interesting properties of using autonomous dif-ferential equations for control policies.

2.4 Learning of Rhythmic Control Policies

Assume we are given a target trajectory ydemo, e.g.,from a human demonstration. For learning the at-tractor landscape of the control policy from the givensampled trajectory, we solve a nonlinear function ap-proximation problem to find the parameters wi in(3).

Given a sampled data point (ftarget, z) at t where

ftarget = ydemo − β(ym − ydemo) (9)

(cf. (3)), the learning problem is formulated tofind the parameters wi using incremental locally

(u, z, φ,Ψ)i

ym

y

ym

y

ym

yw

w

w

1

11

2

2

66

2

6

ym7

ym8

ym12

E kw

w

w

y

y

y

8

77

8

12

12

Figure 4: Schematic view of an organization of a mul-

tiple DOF RCP. A unique (u, z) oscillator is used to gen-

erate the rhythmic drive for 12 DOFs (as will be needed

in our drumming experiment below).

weighted regression technique [14] in which wi is up-dated by

wt+1

i = wti + Pt+1

i zei (10)

where

Pt+1

i =1

λ

(

Pti −

Ptizz

T Pti

λΨi

+ zT Ptiz

)

, ei = ftarget −wTi z

and λ ∈ [0, 1] is the forgetting factor. We chose thislocally weighted regression framework as it can auto-matically find the correct number of necessary basisfunction, can tune the hi parameters of each Gaus-sian basis function (5) to achieve higher function ap-proximation accuracy, and, most importantly, learnsthe parameters wi of every local model i totally inde-pendently of all other local models. The latter prop-erty creates very consistent parameters for similarpatterns that can be use for classification of differentrhythmic movements [7].

2.5 Multi-dimensional Rhythmic Patterns

For multi-dimensional rhythmic patterns, it is im-portant to also provide a mechanism for the stablecoordination between the individual DOFs partici-pating in the pattern. A possible solution is sug-gested in Figure 4. Here, the canonical oscillator (1,2) is used to generate the rhythmic (u, z) dynamicsfor all DOFs—the parameters wi are learned indi-vidually for each DOF, and each DOF has its ownequation (3). By manipulating E0 and k in the oscil-lator, the amplitude and frequency of all DOFs canbe modulated, while keeping the same phase relationbetween the DOFs. This form of modulation mightbe particularly useful, for instance, in a drummingtask in order to replay the same beat pattern at dif-ferent speeds.

3 Experimental Evaluations

We tested the proposed RCPs in a learning bydemonstration task using a humanoid robot, whichis a 1.9-meter tall 30 DOF hydraulic anthropomor-phic robot with legs, arms, a jointed torso, and ahead [2]. We recorded a set of rhythmic movementssuch as tracing a figure-8, or a drumming sequenceon a bongo performed by a human subject using ajoint-angle recording system, the Sarcos Sensuit (seeFigure 5 left). Six degrees of freedom for both armswere recorded (three at the shoulder, one at the el-bow, and two at the wrist). The recorded joint-angletrajectories are fitted by the RCPs, with one set ofweights per DOF using the multi-DOF RCP in Fig-ure 4. The RCPs are then used by the humanoidrobot to imitate the movement. Sections 3.1 and 3.2show the experimental results of learning by imita-tion of figure-8 and drumming tasks. Section 3.3 dis-cusses how the learned movements can be modulatedusing the proposed RCPs.

3.1 Learning of Figure-8 Movements

Figure 5 (left) shows the human demonstration of afigure-8 movement, and Figure 5 (right) shows therobot performance of the learned movement usingthe proposed RCPs. Figure 6 illustrates the recordedtrajectories of the figure-8 performed by the robot.Movies of the human demonstration and the robotperformance can be found at [1]. The movies alsodemonstrate robustness against perturbations im-posed by a human interfering with the robot’s ex-ecution of the pattern: the robot pauses the figure-8motion when its arm is blocked, and resumes themotion where it was stopped when the perturbationvanishes.

3.2 Learning of Drumming Movements

In this experiment, we recorded movements whichlook like a drumming sequence on a bongo (i.e. with-out drumming sticks). Figure 7 shows the joint tra-jectories over one period of an exemplary drummingbeat. Demonstrated and learned trajectories are su-perposed. For learning, the base frequency was ex-tracted manually such as to provide the parameter k

to the RCP. Note that the beat includes higher fre-quency components for the right arm, with the righthand hitting twice the bongo during the baseline pe-riod. Movies of the human demonstration and therobot performance can be found at [1].

3.3 Modulation of Learned Movements

Once a rhythmic movement has been learned by theRCP, it can be modulated in several ways. Figure 8illustrates an example of different modulations of therecorded trajectory (A), which were generated by

Figure 5: Humanoid robot learning a figure-8 move-

ment from a human demonstration. Left: human demon-

stration. Right: robot execution. Recording of the robot

performance is shown in Figure 6.

changing in the time window between 3-7 secondsthe RCP’s amplitude (B), frequency (C) and spa-tial mid point (D), respectively—due to space con-straints, only one degree of freedom is shown. Inall examples, changes are smooth and rapid, as it isdesirable for such behaviors. Movies showing the ef-fect of frequency and amplitude modulations on thefigure-8 and drumming movements can be viewed at[1].

4 Discussion

This paper presented a method for learning rhyth-mic patterns based on nonlinear oscillators. Thecentral idea of our approach is to encode complexoscillatory patterns based on modulating a canoni-cal simple limit cycle system with statistical learningmethods, using nonlinear basis functions anchored inphase space of the canonical system. Despite the re-sulting dynamical system becomes strongly nonlin-ear, stability properties of the canonical system arepreserved. The final dynamical system can be con-ceived of as a kinematic control policy with a globallimit cycle attractor.

Learning control policies based on a dynamicalsystems approach has various desirable properties.Firstly, since movement plans generated by the dy-

Figure 6: Recorded tip trajectory of the hand of robot

performing the learned figure-8 movement shown in Fig-

ure 5.

0 0.5 1 1.5 2−0.1

−0.050

0.05

L_S

FE

0 0.5 1 1.5 2

00.05

0.10.15

R_S

FE

0 0.5 1 1.5 2−0.15

−0.1−0.05

00.05

L_S

AA

0 0.5 1 1.5 2

−0.050

0.050.1

R_S

AA

0 0.5 1 1.5 2−0.4−0.2

00.2

L_H

R

0 0.5 1 1.5 2

00.20.4

R_H

R

0 0.5 1 1.5 2−0.4−0.2

00.20.40.6

L_E

B

0 0.5 1 1.5 2−0.8−0.6−0.4−0.2

0

R_E

B

0 0.5 1 1.5 2

−0.2−0.1

00.1

L_W

FE

0 0.5 1 1.5 20

0.10.20.3

R_W

FE

0 0.5 1 1.5 2

−0.20

0.2

L_W

R

Time [s]0 0.5 1 1.5 2

−0.4−0.2

0

R_W

R

Time [s]

Figure 7: Recorded drumming movement performed

with both arms (6 degrees of freedom per arm). The dot-

ted lines and continuous lines correspond to the demon-

strated and learned trajectories, respectively.

namical systems are not explicitly indexed by time,i.e, develop out of the time evolution of autonomousdifferential equations, flexible on-line modification ofthe control policies can be accomplished by meansof coupling terms to the differential equations. Wedemonstrated one such example by incorporating thetracking error of a robotic systems as an inhibitoryvariable in the control policies, thus ensuring thatthe control policy cannot create movement plansthat are not realizable by the robot. Other per-ceptual variables could be used to create differentforms of on-line modifications, e.g., based on contactforces in locomotion or perceptual variables in jug-gling. Secondly, although the output of the canon-ical limit cycle oscillator has a very simple signalshape, the proposed framework can fit almost ar-bitrarily complex smooth signals using the function

0 1 2 3 4 5 6 7 8 9 10

−1

0

1

A

0 1 2 3 4 5 6 7 8 9 10

−1

0

1

B

0 1 2 3 4 5 6 7 8 9 10

−1

0

1

C

0 1 2 3 4 5 6 7 8 9 10

−1

0

1

D

Time [s]

Figure 8: Modification of the learned rhythmic pat-

tern (flexion/extension of the right elbow, R EB, see Fig-

ure 7). A: trajectory learned by the RCP, B: amplitude

modification with E0 = 4E0, C: frequency modification

with E0 = 4E0 and k = 2k, D: spatial modification with

ym = ym+1 (dotted line), where E, k, and ym correspond

to modified parameters between 3 and 7s.

approximation framework of locally weighted learn-ing [14]. These learning methods also allow us toautomatically determine the open parameters of thelearning system, i.e., the number of kernel functions,their center location and bandwidth (a topic that wedid not expand on due to space limitations). Andlastly, due to the meaningful parameterization of thedynamical systems, the learned trajectories can eas-ily be modified with respect to amplitude, frequency,and the midpoint of the rhythmic pattern. Spatialand temporal invariance of the differential equationsensure that such scaling does not affect the qualita-tive shape of the rhythmic patterns.

Future work will address how to automatically ex-tract the base frequency of a sample trajectory (e.g.,by FFT analyses, learning approaches, or oscillatorsynchronization [15]), how to integrate the rhythmiccontrol policies suggested in this paper with discretecontrol policies from previous work for general imi-tation learning, and how to incorporate the abilityof using reinforcement learning to further modify alearned pattern by trial and error.

Acknowledgments

This work was made possible by support from theUS National Science Foundation (Awards 9710312and 0082995), the ERATO Kawato Dynamic BrainProject funded by the Japan Science and Technol-ogy Corporation, the ATR Human Information Sci-ences Laboratories, and the Communications Re-search Laboratory (CRL).

References

[1] http://rana.usc.edu:8736/˜ijspeert/humanoid.html.

[2] C. G. Atkeson, J. Hale, M. Kawato, S. Kotosaka,F. Pollick, M. Riley, S. Schaal, S. Shibata, G. Tevatia,and A. Ude. Using humanoid robots to study humanbehaviour. IEEE Intelligent Systems, 15:46–56, 2000.

[3] R.D. Beer, H. J. Chiel, R. D. Quinn, and R. E. Ritz-mann. Biorobotic approaches to the study of motorsystems. Curr. Opin. Neurobiol., 8:777–782, 1998.

[4] K. Doya and S. Yoshizawa. Adaptive synchronizationof neural and physical oscillators. In J.E. Moody,S.J. Hanson, and R.P. Lippmann, editors, Advancesin Neural Information Processing Systems. MorganKaufmann, 1992.

[5] B. Ermentrout and N. Kopell. Learning of phase lagsin coupled neural oscillators. Neural Computation,6:225–241, 1994.

[6] A.J. Ijspeert. A connectionist central pattern gener-ator for the aquatic and terrestrial gaits of a simu-lated salamander. Biological Cybernetics, 85(5):331–348, 2001.

[7] A.J. Ijspeert, J. Nakanishi, and S. Schaal. Movementimitation with nonlinear dynamical systems in hu-manoid robots. In IEEE International Conference onRobotics and Automation (ICRA2002), pages 1398–1403. 2002.

[8] E. Klavins and D. Koditschek. Stability of coupledhybrid oscillators. In IEEE International Conferenceon Robotics and Automation, pages 4200–4207. 2001.

[9] S. Kotosaka and S. Schaal. Synchronized robot drum-ming by neural oscillator. In The International Sym-posium on Adaptive Motion of Animals and Ma-chines. Montreal, Canada. 2000.

[10] E. Marder. Motor pattern generation. Curr. Opin.Neurobiol., 10:691–698, 2000.

[11] J. Nakanishi, T. Fukuda, and D.E. Koditschek. Abrachiating robot controller. IEEE Transactions onRobotics and Automation, 16(2):109–123, 2000.

[12] J. Nishii. A learning model for oscillatory networks.Neural Networks, 11:249–257, 1998.

[13] A.A. Rizzi and D.E. Koditschek. Further progress inrobot juggling: Solvable mirror laws. In IEEE In-ternational Conference on Robotics and Automation,pages 2935–2940. 1994.

[14] S. Schaal and C.G. Atkeson. Constructive incremen-tal learning from only local information. Neural Com-putation, 10(8):2047–2084, 1998.

[15] S. Schaal, S. Kotosaka, and D. Sternad. Nonlineardynamical systems as movement primitives. In In-ternational Conference on Humanoid Robotics. Cam-bridge, MA, Sept 6-7, 2001. 2001. in Press.

[16] G. Taga, Y. Yamaguchi, and H. Shimizu. Self-organized control of bipedal locomotion by neural os-cillators in unpredictable environment. Biological Cy-bernetics, 65:147–159, 1991.

Learning rhythmic movements by demonstration using ... · Learning Rhythmic Movements by...

Documents

Transcript of Learning rhythmic movements by demonstration using ... · Learning Rhythmic Movements by...