
Autonomous Robots 12, 39–54, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

Automated Derivation of Primitives for Movement Classification

AJO FOD, MAJA J. MATARIĆ AND ODEST CHADWICKE JENKINS
USC Robotics Research Labs, Computer Science Department, University of Southern California,
941 West 37th Place, SAL 228, Mailcode 0781, Los Angeles CA, USA

[email protected]

[email protected]

[email protected]

http://robotics.usc.edu/~agents/imitation.html

Abstract. We describe a new method for representing human movement compactly, in terms of a linear superimposition of simpler movements termed primitives. This method is a part of a larger research project aimed at modeling motor control and imitation using the notion of perceptuo-motor primitives, a basis set of coupled perceptual and motor routines. In our model, the perceptual system is biased by the set of motor behaviors the agent can execute. Thus, an agent can automatically classify observed movements into its executable repertoire. In this paper, we describe a method for automatically deriving a set of primitives directly from human movement data.

We used movement data gathered from a psychophysical experiment on human imitation to derive the primitives. The data were first filtered, then segmented, and principal component analysis was applied to the segments. The eigenvectors corresponding to a few of the highest eigenvalues provide us with a basis set of primitives. These are used, through superposition and sequencing, to reconstruct the training movements as well as novel ones. The validation of the method was performed on a humanoid simulation with physical dynamics. The effectiveness of the motion reconstruction was measured through an error metric. We also explored and evaluated a technique of clustering in the space of primitives for generating controllers for executing frequently used movements.

Keywords: imitation, learning, robotics, movement control, basis behaviors

1. Introduction

Programming and interacting with robots, especially humanoid ones, is a complex problem. Using learning to address this problem is a popular approach, but the high dimensionality of humanoid control makes the approach prohibitively slow. Imitation, the process of learning new movement patterns and skills by observation, is a promising alternative. The ability to imitate enables a robot to greatly reduce the space of possible trajectories to a subset that approximates that of the observed demonstration. Refinement through trial and error is still likely to be required, but in a greatly reduced learning space.

We have developed a model for learning by imitation (Jenkins et al., 2000; Mataric, 2000, 2001), inspired by two lines of neuroscience evidence: motor primitives (Bizzi et al., 1995) and mirror neurons (Iacoboni et al., 1999). Motor primitives are biological structures that organize the underlying mechanisms of complete movements, including spinal fields (Bizzi et al., 1995) and central pattern generators (Stein, 1997). In a computational sense, primitives can be viewed as a basis set of motor programs that are sufficient, through combination operators, for generating entire movement repertoires. Several properties of human movement patterns are known, including smoothness (Hogan, 1984; Flash and Hogan, 1985; Uno et al., 1989), inter-joint torque constraints (Gottlieb et al., 1996), and the “2/3 power law” relating speed to curvature (Viviani and Terzuolo, 1982). It is not clear how one would go about creating motor primitives with such knowledge alone.

Mirror neurons are structures that link the visual and motor systems in the context of complete, generalized movements, such as reaching and grasping. In our model, the ability to imitate is based on such a mapping mechanism; the system automatically classifies all observed movements onto its set of perceptuo-motor primitives, a biologically inspired basis set of coupled perceptual and motor routines. This mapping process also serves as a substrate for more complex and less direct forms of skill acquisition. In this view, primitives are the fundamental building blocks of motor control, which impose a bias on movement perception and facilitate its execution, i.e., imitation (Mataric, 2001).

Determining an effective basis set of primitives is thus a difficult problem. Our previous work has explored hand-coded primitives (Mataric et al., 1999; Jenkins et al., 2000). Here, we describe a method for automatically generating a set of arm movement primitives from human movement data. We hypothesize that the trajectory executed by the arm is composed of segments in which a set of principal components are active. We first segment movement data and then apply principal component analysis on the resulting segments to obtain primitives. The eigenvectors corresponding to a few of the highest eigenvalues provide us with a basis set of such primitives. The projection of a new segment onto this subspace contains most of the information about that segment, but is a lower-dimensional, more compact representation. To evaluate the method of movement encoding in terms of eigen-movement primitives, and the subsequent reconstruction of the original movements or entirely novel ones, we used a mean squared deviation metric. Finally, we also demonstrated the movements on a dynamic humanoid simulation, reconstructing both the training set and entirely novel test movements.

The rest of the paper is organized as follows. In Section 2, we place our work in the context of relevant research. A brief description of our imitation model is given in Section 3. The psychophysical experiment from which the movement data were drawn is described in Section 4. The methods used to analyze the data are presented in Section 5. Section 6 discusses controller design and movement clustering. Section 7 introduces the evaluation metric and the performance of the movement reconstruction. Section 8 describes the validation of the derived primitives on a humanoid simulation. Section 9 summarizes the paper and our continuing research.

2. Motivation and Related Work

Much of human visual experience is devoted to watching other humans manipulate the shared environment. If humanoid or similarly articulated robots are to work in the same type of environments, they will need to recognize other agents’ actions and to dynamically react to them. To make this possible, motor behaviors need to be classified into higher-level representations. The perceptuo-motor primitives we present are such a representation. Understanding motor behavior, then, becomes a process of classifying the observed movements into the known repertoire, a natural basis for imitation.

In our approach, the motor repertoire consists of a collection of movement primitives, behaviors that accomplish complete goal-directed actions (Mataric and Marjanovic, 1993; Mataric, 1997, 2001). These behaviors are used to structure movement as well as bias and classify movement perception. Thus, the primitives, in our imitation model, are perceptuo-motor, unifying visual perception and motor output, through a direct mapping between the two representational spaces. While this model is inspired by neuroscience evidence, its specific structure and the methods we describe for deriving the primitives are not intended to model any particular biological processes.

To move, the vertebrate central nervous system (CNS) must transform information about a small number of variables to a large number of signals to many muscles. Any such transformation might not be unique. Bizzi et al. (1995) performed experiments to show that inter-neuronal activity imposes a specific balance of muscle activation. These synergies lead to a finite number of force patterns, which they called convergent force fields. The CNS uses such synergies to recruit a group of muscles simultaneously to perform any given task. This evidence has inspired work in motor control of mobile robots using schemas (Arkin, 1987, 1998; Arbib, 1992) and our own work on basis behaviors (Mataric and Marjanovic, 1993; Mataric, 1997). Sanger (2000) demonstrated such motor synergies by applying PCA to analyze trajectories followed by a human arm while the wrist traces a curve on a plane. We use a similar approach here, but on higher-dimensional free movements, for the purposes of movement encoding for subsequent recognition and classification. In mobile robotics, Pierce and Kuipers (1997) proposed a method of abstracting control of a system using PCA to determine motor features that are associated with sensory features.

Other studies and theories about motor primitives (Mussa-Ivaldi, 1997; Mussa-Ivaldi et al., 1993; Iacoboni et al., 1999) suggest that they are viable means for encoding humanoid movement. Thoroughman and Shadmehr (2000) provide neuroscience evidence in support of motor primitives. They argue for a localized sensori-motor map from position, velocity, and acceleration to the torques at the joints. In contrast, our method for generating primitives produces a fixed set of trajectories for which controllers can be generated. Schaal et al. (1998, 1999) discuss a set of primitives for generating control commands for any given motion by modifying trajectories appropriately or generating entirely new trajectories from previously learned knowledge. In certain cases it is possible to apply scaling laws in time and space to achieve the desired trajectory from a set of approximations. The described elementary behaviors consist of two types of movements, discrete and rhythmic. While these are well suited for control, it is not obvious how the robot would generalize from the input space of visual data. In contrast, in our model, primitives are the representation used for generalizing visual inputs, i.e., inspired by the function of mirror neurons, they combine perceptual and motor components.

Segmentation and classification become key interrelated processes of movement interpretation. Segmentation of visual data is a general problem in computer vision. Brand (1996) discusses a method for “gisting” a video by reasoning about changes in motions and contact relations between participants. In that work, usually only the agent (typically a part of a human arm) and any objects under its control were segmented. Our other work (Jenkins et al., 2000) applies a method of segmenting visual data manually into primitives. Most related to the work presented here is the approach to face classification and recognition using so-called “eigenfaces” (Turk and Pentland, 1991). There, principal component analysis was applied to a set of static images of faces; in this work we apply PCA toward classification and encoding of time-extended multi-joint movements.

The current testing environment for our research is a humanoid simulation with dynamics, Adonis. As the development of physical humanoid systems advances (Inoue et al., 2000; Hashimoto et al., 2000; Ambrose et al., 2000), such physical testbeds will become more accessible for further evaluation and application of our research.

Figure 1. Our imitation model.

3. The Imitation Model

Here, we describe in more detail our model for imitation using perceptuo-motor primitives (Mataric, 2001). The model consists of five main subcomponents: Tracking, Attention, Learning, Classification, and Actuation. These components are divided into three layers: Perception, Encoding, and Action. Figure 1 illustrates the structure of the model.

3.1. Perception

The Perception layer consists of two components, Tracking and Attention, that serve to acquire and prepare motion information for processing into primitives at the Encoding layer. The Tracking component is responsible for extracting the motion of features over time from the perceptual inputs.

In general, the choice of a primitive set is constrained by the types of information that can be perceived. Naturally, extending the dimensionality and types of information provided by the Tracking component increases the complexity of other components. For instance, if 3D location information is used, the primitive set generated by the Learning component must provide more descriptions to appropriately represent the possible types of motion.

The Attention component is responsible for selecting relevant information from the perceived motion stream in order to simplify the Classification and Learning components. This selection process is performed by situating the observed input into meaningful structures, such as a set of significantly moving features or salient kinematic substructures. Various attentional models can be applied in addition to our segmentation method (Pashler, 1999).

3.2. Classification and Learning

The Encoding layer encompasses Classification and Learning, which classify observed motions into appropriate primitives. The primitives, in turn, can be used to facilitate segmentation. The Learning component serves to determine and refine the set of primitives. The Classification component then uses the updated set of primitives for movement encoding.

The Encoding layer provides two outputs to the Action layer: 1) the sequence of segments representing a motion (from Classification), and 2) a set of constraints for creating motor controllers for each primitive in the primitive set (from Learning). The list of segments is especially important because it describes intervals in time where a given primitive is active.

A discussion of key issues, including time and space invariance of the primitives and the choice of their representation for facilitating classification and learning, is found in Jenkins et al. (2000).

3.3. Action

The final layer, Action, consists of a single component, Actuation, which performs the imitation by executing the list of segments provided by the Classification component. Ideally, primitive controllers should provide control commands independently, which can then be combined and/or superimposed through motor actuation. The design of such controllers is one topic of research we are pursuing.

As noted above, the motor controller components of the primitives may be manually derived or learned. In both cases, to be general, they must be characterized in parametric form. In this paper we address learning of an initial set of primitives directly from movement data.

In this paper, we address three components of the above-described imitation model: Attention, Learning, and Classification. Attention is implemented in the form of a method for motion segmentation. Learning is addressed only at the level of deriving an initial set of primitives from the segmented data, not subsequent expansion and refinement of that set. The latter is another active area of research we are pursuing. Classification of new motion is performed for encoding the motion into the primitives and for reconstruction of the original observed movement.

4. Human Movement Data

We used human movement data as the basis for deriving the movement primitives. The data were collected in a psychophysical experiment in which 11 college-age subjects watched and imitated 20 short videos (stimuli) of arm movements (Pomplun and Mataric, 2000). Each stimulus was a 3- to 5-second-long sequence of arm movements, presented against a black background, shown on a video monitor. The subjects were asked to imitate a subset of the presented movements with their right arm, in some cases repeatedly, and their imitations were tracked with the Polhemus FastTrak motion tracking system. Position sensors, about 2 × 1 × 1 cm in size, were fastened with elastic bands at four points on each subject’s right arm, as shown in Fig. 2:

1. The center of the upper arm,
2. The center of the lower arm,
3. Above the wrist,
4. The phalanx of the middle finger.

Thus, the tracking data consisted of the joint angles over time of the 4 DOF of a human arm: shoulder flexion/extension (SFE), shoulder adduction/abduction (SAA), humeral rotation (HR), and elbow rotation (ER) (Kreighbaum and Barthels, 1985). SFE is the up-and-down movement of the arm; SAA represents movement of the projection of the arm onto the horizontal plane; HR is rotation about an axis passing through the upper arm; and ER is the flexion/extension of the elbow joint.

Figure 2. Location of Polhemus FastTrak sensors on the subject’s arm.

The positions of the sensors were measured every 34 ms, with an accuracy of about 2 mm. The measured 3D coordinates of the four sensors were recorded for each of the subject’s imitations, for all subjects, along with a time stamp for each measurement. The details of the psychophysical experiment and its results are described in Pomplun and Mataric (2000).

5. Methods

This section describes our approach and the methods involved. First, we applied inverse kinematics to transform extrinsic Cartesian 3D marker coordinates into an intrinsic joint angle representation. We applied filtering to remove the incidental random zero-velocity points caused by noise. Next, we segmented the movement data so that finite-dimensional vectors could be extracted for subsequent principal component analysis (PCA). By applying PCA, we obtained a set of eigenvectors, a small subset of which represents almost all the variance in the data. The details of each of these methods are given in the following sections. Figure 3 summarizes the key steps of the approach, and relates them to our general imitation model.

Figure 3. Components of our system. Thin lines represent the flow of data in the training phase. Thick lines represent the flow of data in the testing phase. All dashed lines represent the use of previously computed data. The three layers of the imitation model are shown vertically, as they relate to the specific system components.

5.1. Inverse Kinematics

The raw movement data we used were in the form of 3D Cartesian marker positions as a function of time. Due to human kinematic constraints imposed on the joints, all the data can be compressed into fewer variables than required by a 3D representation of the wrist and elbow. For example, since the elbow lies on the surface of a sphere centered at the shoulder, we need only two degrees of freedom to completely describe the elbow when the position of the shoulder is known. Similarly, the wrist can only lie on the surface of a sphere centered at the elbow. Thus, the configuration of the arm can be completely represented by four variables, henceforth referred to as angles.

The Euler angle representation we used describes the position of the arm in terms of a sequence of rotations around one of the principal axes. Each rotation transforms the coordinate frame and produces a new coordinate frame in which the next transformation is performed. The first transformation, Rz(γ), rotates the coordinate frame through γ about the z-axis. Subsequently, the rotation Ry(β) about the y-axis and the rotation Rz(α) determine the configuration of the shoulder. If the rotation of the elbow is specified, the positions of the elbow and the wrist relative to the arm are completely determined.
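As a concrete illustration, the z-y-z Euler sequence described above can be sketched as follows. The rest-pose arm direction (hanging along −z) and the upper-arm length are illustrative assumptions, not values from the paper:

```python
import numpy as np

def rot_z(a):
    """Rotation matrix about the z-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rot_y(a):
    """Rotation matrix about the y-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[ c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def elbow_position(gamma, beta, alpha, upper_arm_len=0.3):
    """Elbow position in the shoulder frame under the z-y-z Euler
    sequence Rz(gamma) Ry(beta) Rz(alpha), applied to an assumed
    rest-pose upper-arm vector hanging along -z."""
    R = rot_z(gamma) @ rot_y(beta) @ rot_z(alpha)
    return R @ np.array([0.0, 0.0, -upper_arm_len])
```

Because the composed matrix is a pure rotation, the elbow stays on a sphere of radius `upper_arm_len` around the shoulder, which is exactly the constraint that lets the arm configuration be reduced to four angles.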

5.2. Filtering

The movement data we used contained two distinct types of noise that had to be filtered out: drift and high-frequency components. Drift, the low-frequency component of the noise, is primarily caused by the movement of the subject’s shoulder. We assumed that it affected all marker positions on the arm uniformly, and made the shoulder the origin of our coordinate frame, thereby eliminating this noise component. The high-frequency component of noise can be attributed to the elastic vibrations in the fasteners used to attach the markers to the arm. Passing the signal through a Butterworth filter of order 10 and normalized cutoff frequency of 0.25 reduced this component of the noise considerably (based on an empirically derived level of tolerance). After these transformations, the resulting angles are a smooth function of time.
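A minimal sketch of the high-frequency filtering step, using SciPy's Butterworth design with the order (10) and normalized cutoff (0.25) given above. The second-order-sections form and the zero-phase forward-backward pass are our choices for numerical stability, not details specified by the paper:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Order-10 Butterworth low-pass, normalized cutoff 0.25 (fraction of
# Nyquist); second-order sections keep this high-order filter stable.
SOS = butter(10, 0.25, output='sos')

def filter_angles(angles):
    """Zero-phase low-pass filtering of joint-angle trajectories.

    angles: (T, 4) array of joint angles over time, one column per DOF.
    Returns an array of the same shape with high-frequency noise removed.
    """
    return sosfiltfilt(SOS, angles, axis=0)
```

Applied to a slow angular trajectory corrupted by fastener vibration, the filtered output tracks the underlying smooth curve while the vibration component is suppressed.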

5.3. Segmentation

Signal segmentation is notoriously difficult, and thus segmentation of the arm movement data was one of the most challenging aspects of this work. In our approach, it is necessary to segment the movement data so PCA can be applied to extract common features. In the context of movement representation and reconstruction, the segmentation algorithm needs to have the following properties:

1. Consistency: Segmentation produces identical results in different repetitions of the action. For this, we defined a set of features which could be used to detect segment transitions.

2. Completeness: Ideally, segments do not overlap and a set of segments contains enough information to reconstruct the original movement completely.

A great deal of previous work has dealt with segmentation of time-series data. Brand (1996) and Brand et al. (1997) present an event-based method for continuous video segmentation; segments are introduced when a certain feature or contact configuration changes. Wada et al. (1995) introduce a via-point method of segmenting handwriting, speech, and other temporal functions. Pomplun and Mataric (2000) use a segmentation scheme based on the root mean squared (RMS) value of velocity in different joints. To reduce the distance between two movements that could be similar, they propose a modified scheme that minimizes that distance to produce the best segmentation. We consider a comparatively simpler approach to segmentation here, which proved sufficient.

Our first attempt at segmentation involved the use of the 4D parametric curve plotting the change of the joint angles over time. The radius of curvature is defined as the radius of the largest circle in the plane of the curve that is tangential to the curve. Curvature has the convenient property of being a scalar function of time. Thus, when the radius is low, it could imply a change of desired state and hence can be used as a segmentation cue. Unfortunately, calculation of the radius of curvature of a trajectory becomes intractable as the dimensionality of the data increases. Furthermore, choosing an appropriate threshold for segmenting the curvature is difficult. As a result, we abandoned this segmentation method for two simpler alternatives, presented next.

The segmentation routines we designed are based on the angular velocity of different DOFs. The velocity reverses in response to the subject changing direction of movement. Whenever this happens, the point is labeled as a zero-velocity crossing (ZVC). Each ZVC is associated with one DOF. We assume that all movement in any DOF occurs between ZVCs. If the movement is significant, based on an empirically determined threshold, it is recorded as a segment. This thresholding aids in eliminating small oscillations. ZVCs could flank a segment where almost no movement occurs. Ideally, these segments should be discarded since they contain no movement information other than that the angle needs to be held constant. Spurious ZVCs could also be introduced by noise. Figure 4 shows an example of the result of ZVC segmentation of a subject’s imitation of a stimulus video. Significant movement is assumed to occur when the change in the angle is above a certain threshold. The short curves above the continuous curves indicate the instances when such segments are extracted.
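The ZVC extraction for a single DOF can be sketched as follows. The finite-difference velocity estimate and the significance threshold `min_delta` are illustrative assumptions; the paper derived its threshold empirically:

```python
import numpy as np

def zvc_segments(theta, min_delta=0.1):
    """Segment one DOF at zero-velocity crossings (ZVCs).

    theta: 1-D array of joint angles over time.
    min_delta: hypothetical threshold on net angular change; smaller
               excursions are treated as noise and dropped.
    Returns a list of (start, end) index pairs of significant segments.
    """
    v = np.diff(theta)              # finite-difference angular velocity
    sign = np.sign(v)
    # indices where velocity changes sign -> zero-velocity crossings
    zvcs = np.where(sign[1:] * sign[:-1] < 0)[0] + 1
    bounds = np.concatenate(([0], zvcs, [len(theta) - 1]))
    segments = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        if abs(theta[b] - theta[a]) >= min_delta:  # keep significant movement
            segments.append((int(a), int(b)))
    return segments
```

An angle that ramps up and then back down yields exactly one ZVC at the reversal point and two significant segments, matching the intuition that movement in a DOF occurs between ZVCs.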

Figure 4. Segmenting using zero-velocity crossings. Variables are plotted along the y-axis against time on the x-axis. The four sub-plots are of the four angles that we analyzed. The curved lines above the movement plots represent individual extracted segments.

Toward the goal of deriving primitives, we seek properties common to all DOFs. Fortunately, most of the time, the ZVCs in different DOFs coincide in many of the movement samples. Because continuous smooth movement has this property, we segmented the data stream at the points where more than one DOF has ZVCs separated by less than 300 ms, an empirically derived threshold; we call this the SEG-1 routine. To avoid including random segments, whose properties cannot be easily determined, we kept only those segments whose start and end both had this property. This results in dropping some segments, thereby compromising the completeness property of the segmentation algorithm; we would be unable to reconstruct all movements completely. The movements represented in such dropped segments have certain common properties. Specifically, the end-points are likely to involve little movement, and there is a directed effort to get to a new configuration of joint angles in the interim. It is also very likely that the limb is moving in a nearly constant direction during such segments. The direction reversals are likely to be points where intentions and control strategies change.
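A sketch of the SEG-1 boundary test, assuming per-DOF ZVC times have already been extracted. The pairwise matching below is our reading of the 300 ms coincidence criterion, and the midpoint placement of boundaries is an illustrative choice:

```python
import numpy as np

def seg1_boundaries(zvc_times, window=0.3):
    """SEG-1: mark a segment boundary wherever ZVCs from more than one
    DOF fall within `window` seconds of each other (0.3 s above).

    zvc_times: list of 1-D arrays, one per DOF, of ZVC times in seconds.
    Returns sorted boundary times (midpoints of coinciding ZVC pairs).
    """
    boundaries = []
    for i in range(len(zvc_times)):
        for j in range(i + 1, len(zvc_times)):
            for ti in zvc_times[i]:
                # ZVCs of DOF j that coincide with this ZVC of DOF i
                close = zvc_times[j][np.abs(zvc_times[j] - ti) <= window]
                boundaries.extend((ti + tj) / 2.0 for tj in close)
    return sorted(boundaries)
```

DOFs whose ZVCs never fall within the window of another DOF contribute no boundaries, which is exactly how SEG-1 fails on movements like repeated circle tracing.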

The segmentation routine SEG-1 we described is not complete, i.e., it does not guarantee that any input movement will be completely partitioned. There is considerable overlap when two joint angles cross their zeros simultaneously while the other two cross their zeros at a different point in time. For some actions, the zero-crossings may not coincide for the majority of the movement. This is possible either when some of the zeros are missed by the zero-marking procedure or when the zeros fail to align. The latter problem arises for specific classes of movements in which at least one of the DOFs is active when another is at its ZVC, such as in repetitions of smooth curves. For example, repeatedly tracing out a circle would result in no boundaries detected by SEG-1.

Though SEG-1 explains the formation of primitives in well-defined segments, it cannot reliably reconstruct outside those segments because the segmentation does not result in a complete partition of the movement space. It might be possible to improve the segmentation for reaches. However, that would not solve the problems caused by potential misalignment of the zero-crossings themselves. It might also be possible to arrive at a different segmentation scheme for the parts that are not amenable to this form of segmentation. This would essentially divide the space of movement into reaches and non-reaches, consistent with literature in motor control (Schaal et al., 1998).

In order to produce a more complete segmentation, we developed a second segmentation algorithm, called SEG-2, based on the sum of squares of the angular velocities, z, defined as follows:

z = \dot{\theta}_1^2 + \dot{\theta}_2^2 + \dot{\theta}_3^2 + \dot{\theta}_4^2   (1)

where \dot{\theta}_i is the angular velocity of the i-th joint angle. Figure 5 plots the variation of z over time for a single movement stimulus. The SEG-2 algorithm segments whenever z is lower than a threshold we derived empirically from the movement data. As noted above, the subjects were asked to repeat some of the demonstrations multiple times in a sequence. We adjusted the threshold so as to obtain about as many segments in each movement as the number of repetitions of the action in the sequence.

The resulting segments had the common propertythat z was high within a segment and low at the bound-aries. Thus, low values of z mark segment bound-aries. Because of the way we determined the threshold,this method of segmentation was highly reliable. The

Figure 5. A plot of the variation of the sum of squares of velocities (y-axis) as a function of time (x-axis).


The resulting segments do not encode the entire movement because there are portions where z is low for an extended period of time, indicating a stationary pose.

In Section 7 we compare the performance of the two segmentation algorithms.

5.4. Principal Component Analysis

The segmentation algorithms above converted the input movement data into a list of segments of varying lengths. In order to perform PCA on the segments, we first converted the movement to vector form. For each DOF, the movement within a segment is interpolated to a fixed length, in our implementation to 100 elements. The elements are then represented in vector form and the vectors for all DOFs concatenated to form a single vector s of 400 elements. This is effectively a 400-dimensional representation of the movement within each segment.
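The segment-to-vector conversion can be sketched as follows. This is a pure-Python illustration under our own naming; we use simple linear interpolation for the time normalization:

```python
def resample(signal, length=100):
    """Linearly interpolate a 1-D signal to a fixed number of samples."""
    n = len(signal)
    if n == 1:
        return [signal[0]] * length
    out = []
    for k in range(length):
        x = k * (n - 1) / (length - 1)   # position in original index space
        i = min(int(x), n - 2)
        frac = x - i
        out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
    return out

def segment_to_vector(segment, length=100):
    """Concatenate the time-normalized trajectories of all DOFs.

    segment: a list of per-DOF angle lists (4 DOFs in the paper).
    Returns a (num_dofs * length)-element vector: 400-D for 4 DOFs.
    """
    vec = []
    for dof in segment:
        vec.extend(resample(dof, length))
    return vec
```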

The mean \bar{s} of the N segment vectors s_i in the input is computed as follows:

\bar{s} = \frac{1}{N} \sum_{i=1}^{N} s_i    (2)

Since we have about 250 vectors in our data set, the covariance matrix K can be written as

K = \frac{1}{N} \sum_{i=1}^{N} (s_i - \bar{s})(s_i - \bar{s})^T    (3)

where T denotes the transpose and N is the number of segments.

The principal components of the data are the eigenvectors of K. If s is d-dimensional, then there are d such eigenvectors. Most of the variance in the movements is captured by the few eigenvectors (say, f of them) that have the highest eigenvalues. Figure 6 illustrates this property; it shows the 11 most significant of the 400 principal components of K, sorted by decreasing eigenvalue. Thus, we can describe the vector s in terms of its projection onto these f eigenvectors E_f, in other words, as an f-dimensional vector p:

p = E_f^T s    (4)

This results in an f-dimensional representation of each segment. The first four of these eigenvectors are shown in Fig. 7.
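Eqs. (2)-(4) amount to standard PCA over the segment vectors. A minimal sketch in Python, assuming NumPy is available; the function names are ours:

```python
import numpy as np

def derive_primitives(segments, f):
    """PCA over segment vectors, per Eqs. (2)-(4).

    segments: (N, d) array-like of segment vectors (d = 400 in the
    paper). Returns (mean, E_f, eigenvalues), where E_f is the (d, f)
    matrix of the eigenvectors with the f largest eigenvalues.
    """
    S = np.asarray(segments, dtype=float)
    mean = S.mean(axis=0)                  # Eq. (2)
    X = S - mean
    K = X.T @ X / len(S)                   # Eq. (3): covariance matrix
    lam, E = np.linalg.eigh(K)             # eigh returns ascending order
    order = np.argsort(lam)[::-1]          # re-sort, largest first
    return mean, E[:, order[:f]], lam[order]

def project(s, Ef):
    """Eq. (4): p = E_f^T s, the f-dimensional segment representation."""
    return Ef.T @ np.asarray(s, dtype=float)
```

Following the paper's Eq. (4), `project` multiplies by E_f^T without first subtracting the mean; centering before projection is the more common PCA convention, so treat this as a faithful rendering of the formulation above rather than a recommendation.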

The number of eigenvectors chosen for further processing determines the accuracy and the computational overhead involved in deriving the movement primitives. Retaining more eigenvectors produces higher accuracy of

Figure 6. The magnitude of the 11 most significant eigenvalues for our training set. The x-axis gives the index of the eigenvector, the y-axis the associated eigenvalue.

the resulting primitive representation, but involves increased computational overhead in the derivation.

5.5. Segment Reconstruction

The vector p mentioned above can be used to reconstruct the original movement segment, as follows. p is essentially the projection of the movement segment onto the eigenvector space. To reconstruct, we need to expand the original vector in terms of this set of basis vectors. Formally, projecting back consists of:

s = E_f p    (5)

where the vector s is in the same space as the original set of vectors. The resulting vector can now be split into 4 components for the 4 DOFs needed for reconstruction. We split the vector into 4 parts of length 100 each, thereby decomposing the movement into the 4 joint angles in time. This is the movement on a normalized time scale.

While encoding the movement in this format, we record the times at the beginning and the end of each segment to aid in the reconstruction of the movement. Expanding p for every segment gives us a set of time-normalized segments. These need to be resized temporally to obtain the original movement. This is done by cubic spline interpolation between data points from the Karhunen-Loeve expansion.
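The reconstruction path, Eq. (5) followed by the temporal resizing, can be sketched as follows. For brevity we substitute linear interpolation (numpy.interp) for the cubic splines used in the paper; function names and the toy dynamics of the sketch are ours:

```python
import numpy as np

def reconstruct_segment(p, Ef, num_dofs=4, norm_len=100):
    """Eq. (5): s = E_f p, then split the (num_dofs * norm_len)-vector
    into one time-normalized trajectory per DOF."""
    s = Ef @ np.asarray(p, dtype=float)
    return s.reshape(num_dofs, norm_len)

def restore_time_scale(dof_traj, t_start, t_end, dt):
    """Resample a time-normalized trajectory back onto the segment's
    recorded duration (linear interpolation stands in for the paper's
    cubic splines)."""
    n_out = int(round((t_end - t_start) / dt)) + 1
    x_norm = np.linspace(0.0, 1.0, len(dof_traj))
    x_out = np.linspace(0.0, 1.0, n_out)
    return np.interp(x_out, x_norm, np.asarray(dof_traj, dtype=float))
```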

Because of the segmentation approach we used, the derived primitives, when executed in isolation,


Figure 7. The first four eigenvectors. Each figure shows one eigenvector; each sub-plot shows the variation of one angle (y-axis) over time (x-axis) in that eigenvector.

constitute strokes: movements of the arm from one pose to another (Schaal, 1999). High-frequency components contribute increasingly to lower-eigenvalued primitives. When combined through sequencing, the strokes generate a reconstruction of a given test movement.

6. Movement Controller Design

As described in Section 3.3, our imitation model involves a set of controllers which actuate the derived primitives. We explored the following method for generating such controllers by clustering motion segments in the reduced eigenvector space.

Each segment is projected onto an f-dimensional space by the subset of eigenvectors we chose to retain. We then cluster the points using K-means (Gordon, 1999) into k clusters, where k is a predetermined constant. The algorithm is initialized with a set of k random estimates of the cluster vectors. In every iteration, each point is grouped into the cluster of the vector to which it is closest. Each vector is then recalculated as the centroid of all the points belonging to its cluster.
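A naive pure-Python rendering of this clustering step, for illustration only; in practice a library implementation (e.g., scikit-learn's KMeans) would normally be used:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Naive K-means in the reduced (f-dimensional) primitive space.

    points: a list of equal-length coordinate lists. Returns
    (centroids, assignment), where assignment[i] is the cluster index
    of points[i].
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared distance.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its cluster.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```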

The choice of the number of clusters, k, is based on the error that can be tolerated in approximating a movement by a cluster. If the tolerable error is large, we can do well with few clusters. For more accuracy, we increase the number of clusters; this reduces the error but increases the computation time as well as the memory requirements of the clustering algorithm. Fortunately, clustering is performed in the training phase, while the primitives are being derived, not during their on-line use for real-time movement classification and imitation.


To evaluate the above clustering technique, we replaced each point in eigenvector (primitive) space with its corresponding cluster point. We then used those cluster points to reconstruct the training movements (i.e., the data used to create the clusters). The quantitative analysis of the reconstruction error is presented in Section 7.

We can view clustering as a naive means of controller parameterization. Thus, this analysis represents the worst-case scenario, providing an upper bound on the reconstruction error. Any more sophisticated controller design method will produce improved movement reconstruction. Next we discuss one such, more sophisticated, approach to controller design.

A simple schematic representation of a primitives-based controller is shown in Fig. 8. Desired inputs are fed into the control system and the error is calculated, then projected onto the primitives derived above. This process yields projection coefficients that generate the desired control inputs for the primitive controllers.

As shown in the figure, each of the Primitive Controllers executes an individual primitive. Given the projection coefficients, the controller generates a force signal. The Position Correction modules correct for the difference in the center of the stroke and the consequent change in the dynamics of the humanoid. The force signals are then sent to the humanoid. The resulting angle measurements are compared with the desired angles. The error is transformed into a correction signal and sent back to the Primitive Controllers.

Figure 8. A controller model. The Primitive Controller is an individual controller for each primitive. The Position Correction module corrects for changes in the location of the center of the stroke. The Projection Module projects the errors back onto the eigenmovement space.

This approach is one solution for the Actuation component of the Action layer of our imitation model. Our continuing research addresses the design and learning of such controllers. However, the evaluation of the primitives approach we present here has so far been performed with a simpler Actuation component, using PD servos on the humanoid simulation. The results are presented in Section 8. The next section presents an analytical evaluation of the complete method, including the accuracy of primitives-based movement reconstruction, with and without the use of clustering.

7. Results and Evaluation

We chose mean squared error (MSE), one of the most commonly used error metrics for linear systems, to evaluate the performance of our encoding and reconstruction of movement. MSE is attractive for the following reasons:

1. It penalizes large deviations from the original more severely than smaller ones.

2. It is additive. Error in one part of the reconstruction will not cancel the error in another part.

3. It is mathematically tractable.

Representing segments in terms of eigenvectors allows us to quantify the error in the subsequent reconstruction. The squared error between the reconstructed vector s_r and the original vector s is given by:

\epsilon = (s - s_r)^T (s - s_r)    (6)

If we tolerate a root mean squared error of \theta degrees at each sample point in s, the required bound on the expectation of \epsilon, i.e., the MSE, is given by:

E(\epsilon) \le N \theta^2    (7)

The Karhunen-Loeve (KL) expansion of the reconstructed vector in terms of the eigenvectors is:

s_r = \sum_{i=1}^{f} p_{r,i} e_i    (8)

where r is the cluster that is used for the reconstruction and p_{r,i} are the coordinates of the cluster points.

The difference between the original and the reconstructed vectors is:

s - s_r = \sum_{i=1}^{f} (p_i - p_{r,i}) e_i + \sum_{i=f+1}^{N} p_i e_i    (9)


The first term in the computed reconstruction error is the clustering error, while the second term is the projection error.

Our formulation of the clustering error is based on approximating any segment by its closest cluster point. A torque strategy can be developed for each cluster point. If we use the torque strategy corresponding to the nearest cluster, the error in reconstructing the movement is bounded by the error as formulated above. If, instead, we make use of a local approximation of the movement-torque map around the cluster point, this error can be reduced further.

Since the eigenvectors are a set of orthonormal basis vectors, the MSE is the sum of the MSEs of the projections along the individual eigenvectors. The MSE of the individual projections along each of the dropped eigenvectors is:

E(\epsilon)_p = \sum_{i=f+1}^{N} \lambda_i    (10)

where the eigenvectors f + 1 through N were dropped. The error in projection is a function of the number of eigenvectors kept for reconstruction. Table 1 compares the projection error for different numbers of eigenvectors for the segmentation algorithms SEG-1 and SEG-2.

We used thirty eigenvectors (f = 30) in our projection of the input vectors.
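Eqs. (7) and (10) are simple to apply directly. A small sketch with our own function names, which also inverts Eq. (7) to convert a total MSE back into a per-sample RMS angle:

```python
import math

def projection_mse(eigenvalues, f):
    """Eq. (10): the projection MSE equals the sum of the eigenvalues
    of the dropped eigenvectors (eigenvalues sorted descending)."""
    return sum(eigenvalues[f:])

def rms_angle(total_mse, n=400):
    """Invert Eq. (7), E(eps) <= N * theta^2, for the per-sample RMS
    angular error theta (n = 400 samples per segment vector)."""
    return math.sqrt(total_mse / n)
```

For instance, rms_angle(1.45e6) gives roughly 60.2 degrees, in line with the SEG-1, 10-cluster entry of Table 3.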

The clustering error is orthogonal to the projection error and is given by:

E(\epsilon)_c = E\left( \sum_{i=1}^{f} (a_i - p_{r,i})^2 \right)    (11)

where f is the number of dimensions in which clustering is done, a_i is the projection of the data point along the i-th eigenvector, and p_{r,i} is the projection of the cluster vector along the same. This is also the average squared distance from the representative vector after clustering. Table 2 shows a tabulation of the MSE_c introduced by clustering as a

Table 1. A comparison of the MSE in projection (MSE_p) using the SEG-1 and SEG-2 segmentation routines.

No. of eigenvectors used    MSE_p using SEG-1    MSE_p using SEG-2
10                          1.22 × 10^7          6.47 × 10^7
20                          1.05 × 10^5          1.48 × 10^6
30                          2.82 × 10^3          7.64 × 10^4
40                          1.92 × 10^2          5.39 × 10^3

Table 2. A comparison of the MSE in clustering (MSE_c) using the SEG-1 and SEG-2 segmentation routines.

No. of clusters    MSE_c using SEG-1    MSE_c using SEG-2
10                 1.45 × 10^6          6.12 × 10^5
20                 3.05 × 10^5          3.52 × 10^5
30                 2.34 × 10^5          2.84 × 10^5
100                —                    9.63 × 10^4
170                —                    7.54 × 10^4

function of the number of clusters used, comparing the two segmentation algorithms we used, SEG-1 and SEG-2.

Since the error in clustering is orthogonal to the error introduced by dropping the smaller eigenvalues, the total error introduced by the approximation steps is given by:

E(\epsilon) = E(\epsilon)_p + E(\epsilon)_c    (12)

A tabulation of this total error as a function of the number of clusters used is given in Table 3. The total error is calculated assuming that 30 eigenvectors were used in the KL expansion.

As can be seen, SEG-2 is a more complete segmentation algorithm than SEG-1: it partitions the movement data into complete, non-overlapping segments more suitable for reconstruction. With SEG-2, the magnitude of the eigenvalues does not decline as fast as with SEG-1. As a result, the number of eigenvectors required to reconstruct the original segment is larger, as shown in Tables 1 and 3.

The majority of the training movements segmented into 2, 3, or 4 segments. Figure 10 shows an example of a reconstructed movement, a reproduction of one segment from the input data. Figure 11 shows a reconstruction of an entire movement composed of multiple

Table 3. Total error and the corresponding angular error θ for reconstruction with 30 eigenvectors, using the SEG-1 and SEG-2 segmentation routines.

No. of clusters    Total error (SEG-1)    θ (degrees, SEG-1)    Total error (SEG-2)    θ (degrees, SEG-2)
10                 1.45 × 10^6            60.28                 6.88 × 10^5            41.42
20                 3.05 × 10^5            26.68                 4.28 × 10^5            32.71
30                 2.34 × 10^5            24.43                 3.50 × 10^5            29.50
100                —                      —                     1.633 × 10^5           20.22
170                —                      —                     1.514 × 10^5           19.42


Figure 9. The graphical display of Adonis, the dynamic humanoid simulation we used for validation.

segments. Each sub-plot shows one angle as a function of time. In both figures, solid lines show the original movements, dotted lines the reconstruction.

To compare the performance of the algorithm in reconstructing data from the training set with its performance in reconstructing movements outside the training set, we split our movement data into two sets: 75% of the data were used as a training set, and the remaining 25% as a test set, i.e., as novel movements. The results show that reconstruction is marginally better with the training set (see Fig. 12).

Figure 10. Reconstruction of a segment. Each plot shows the variation of one angle (y-axis) over time (x-axis). The solid line shows the original movement, the dotted line the reconstruction.

Figure 11. Reconstruction of an entire movement composed of multiple segments. Each subplot shows one angle (y-axis) as a function of time (x-axis). The solid line shows the original movement, the dotted line the reconstruction.

The median RMS reconstruction error is 19.4 degrees for the training data, and 22.27 degrees for the test data. Observed by the naked eye, both reconstructions qualitatively appear to have very high fidelity.

The choice of the model parameters directly affects the error in the movement representation and thus the reconstruction. By altering those parameters, we can guarantee a higher accuracy of representation or achieve greater generalization. For example, higher accuracy can be obtained either by increasing the number of eigenvectors in the representation or by increasing the number of cluster points. This choice involves a necessary tradeoff between processing time, memory, and the desired accuracy.

8. Humanoid Simulation and Validation

To demonstrate an application of the derived primitives, we used them to control a humanoid simulation with full dynamics. Here we describe the simulator test-bed, the controller implementation, and finally the results of the validation.

8.1. The Humanoid Simulation Test-Bed

Our validation of the approach was performed on Adonis (Mataric et al., 1998, 1999), a rigid-body simulation of a human torso with static legs (see Fig. 9). Adonis consists of eight rigid links connected in a tree structure by a combination of hinge, gimbal, and ball


Figure 12. A comparison of the histograms of RMS angular errors in the reconstruction of movements in the training (left plot) and test (rightplot) data sets.

joints. The simulator has a total of 20 DOF. Each arm has 7 DOF, but movement of one or both arms generates internal torques in the rest of the DOFs. The static ground alleviates the need for explicit balance control.

Mass and moment-of-inertia information is generated from the geometry of the body parts and human body density estimates. The equations of motion are calculated using a commercial solver, SD/Fast (Hollars et al., 1991). The simulation acts under gravity and accepts other external forces from the environment. Collision detection, with the body itself and with the environment, is implemented on Adonis, but was not used in the experiments presented here, as it was not relevant; the training and test movements involved no collisions or near-collisions.

8.2. Motor Control in the Simulation Test-Bed

We used joint-space PD servos to actuate the humanoid in order to execute the derived primitives, after converting the movement segments into a quaternion representation. Specifically, the segments were presented as a sequence of target joint-angle configurations. If the set of desired points falls on a smooth trajectory, the resulting motion has a natural appearance. Based on the target joint-angle configuration, the torques for all joints on the arm were calculated as a function of the angular position and velocity errors using a PD servo controller:

\tau = k (\theta_{desired} - \theta_{actual}) + k_d (\dot{\theta}_{desired} - \dot{\theta}_{actual})    (13)

where k is the stiffness of the joint, k_d the damping coefficient, \theta_{desired} and \dot{\theta}_{desired} are the desired joint angles and velocities, and \theta_{actual} and \dot{\theta}_{actual} are the actual joint angles and velocities.
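Per joint, Eq. (13) is a standard PD law. A toy sketch, not the Adonis controller: the unit-inertia dynamics, the gain values, and the function names are our own assumptions:

```python
def pd_torque(theta_des, theta_act, vel_des, vel_act, k, kd):
    """Eq. (13): PD servo torque for a single joint."""
    return k * (theta_des - theta_act) + kd * (vel_des - vel_act)

def track(target, steps=2000, dt=0.01, k=50.0, kd=15.0):
    """Drive a toy unit-inertia joint to a fixed target angle with the
    PD servo (semi-implicit Euler integration; illustrative gains)."""
    theta, vel = 0.0, 0.0
    for _ in range(steps):
        tau = pd_torque(target, theta, 0.0, vel, k, kd)
        vel += tau * dt        # unit inertia: acceleration == torque
        theta += vel * dt
    return theta
```

With these (hypothetical) gains the damping is near critical, so the joint settles on the target without sustained oscillation, which is the qualitative behavior the servo relies on when stepping through a sequence of targets.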

In this simple validation, Adonis is programmed to continually reach target points in its joint space. This can be thought of as setting a region about the target point into which the trajectory must fall before the next target is tracked. This approach has the potential of over-constraining the trajectory. However, its purpose here is merely to validate the reconstruction of the primitives and to visualize them. As discussed in Section 6, our model mandates the use of motor controllers associated with each primitive (Mataric, 2001; Jenkins et al., 2000), an area of research we are currently pursuing.

9. Summary and Continuing Research

We have presented a method that can be used to derive a set of perceptuo-motor primitives directly from movement data. The primitives can be used as a lower-dimensional space for representing movement, as proposed in our imitation model. By projecting observed movement onto a lower-dimensional primitives-based space, we can facilitate perception in a top-down manner, as well as classification and imitation. Thus we only need to store a small number of control strategies and use those as a basis for sequencing. In addition to sequencing primitives, our representation also allows them to be concurrently executed,


since the eigenvectors form superimposable basis functions.

Briefly, the derivation process consists of the following steps. Movement data, in the form of 3D locations of the arm joints, are converted into an Euler-angle representation using inverse kinematics, and filtered to facilitate segmentation. Next, one of the two segmentation algorithms we proposed was applied. Principal component analysis was performed on the resulting segments to obtain the "eigenmovements," or primitives. The K-means algorithm was used to cluster the points in the space of the projections of the segment vectors along the eigenvectors corresponding to the highest eigenvalues, in order to obtain commonly used movements. Reconstruction was performed to retrieve the original movements, as an evolution of Euler angles in time, from the representation in terms of primitives. An MSE-based error metric was used to evaluate the resulting reconstruction.

The described research is one of several of our projects aimed at primitives-based motor control and imitation, described in more detail in Mataric (2001) and Jenkins et al. (2000). The areas of direct application of our work are robotics, physics-based animation, and Human-Computer Interaction (HCI). In all of these areas, the need to control and interact with complex humanoid systems, whether physical robots or realistic simulations, involves managing complex motor control, where the number of interdependent variables makes designing control strategies formidable. By using human movement data, we can find a set of relationships between the different degrees of freedom, as our eigenvectors do, to reduce the dimensionality of the problem. This can further be used for controller design, as well as for more general structuring of perception, motor control, and learning.

The presented method for deriving primitives is the basis for our work on primitives-based motor control and imitation (Mataric, 2001; Jenkins et al., 2000). The notion of motor primitives is gaining popularity in the motor-control community. For example, Schaal (1999) suggests a learning mechanism in which a few movement primitives compete for a demonstrated behavior. He uses an array of primitives for robot control, each of which has an associated controller. Learning is performed by adjusting both the controller and the primitives. Our method for deriving primitives could be used either in place of the learning algorithm or to help bootstrap it.

Our continuing research focuses on methods for generating controllers for the derived primitives. We are also working on the policies needed to modify these controllers for changes in duration and for couplings with other controllers that are simultaneously active. This is an important part of our model, which involves simultaneous execution as well as sequencing of the primitives for both movement perception and generation (Mataric, 2001). Finally, we are exploring alternative segmentation methods, in particular a means of using existing primitives as top-down models.

Acknowledgments

The research described here was supported in part by the NSF Career Grant IRI-9624237 to M. Mataric, by the DARPA MARS Program Grant DABT63-99-1-0015, by the National Science Foundation Infrastructure Grant CDA-9512448, and by the USC All-University Predoctoral Fellowship to Chad Jenkins. The data used for this analysis were gathered by M. Mataric and M. Pomplun in a joint interdisciplinary project conducted at the National Institutes of Health Resource for the Study of Neural Models of Behavior, at the University of Rochester. The humanoid simulation was obtained from Jessica Hodgins.

Note

1. The data were gathered by M. Mataric and M. Pomplun in a joint interdisciplinary project conducted at the NIH Resource for the Study of Neural Models of Behavior, at the University of Rochester.

References

Ambrose, R.O., Aldridge, H., Askew, R.S., Burridge, R.R., Bluethmann, W., Diftler, M., Lovchik, C., Magruder, D., and Rehnmark, F. 2000. Impedance control: An approach to manipulation. IEEE Intelligent Systems, 15(4):57–63.

Arbib, M. 1992. Schema theory. In The Encyclopedia of Artificial Intelligence, 2nd ed., S. Shapiro (Ed.), Wiley-Interscience: New York, pp. 1427–1443.

Arkin, R.C. 1987. Motor schema based navigation for a mobile robot: An approach to programming by behavior. In Proceedings of IEEE Intl. Conf. on Robotics and Automation, Raleigh, NC, pp. 264–271.

Arkin, R.C. 1998. Behavior-Based Robotics, MIT Press: Cambridge, MA.


Bizzi, E., Giszter, S.F., Loeb, E., Mussa-Ivaldi, F.A., and Saltiel, P. 1995. Modular organization of motor behavior in the frog's spinal cord. Trends in Neurosciences, 18:442–446.

Brand, M. 1996. Understanding manipulation in video. In 2nd International Conference on Face and Gesture Recognition, Killington, VT, pp. 94–99.

Brand, M., Oliver, N., and Pentland, A. 1997. Coupled hidden Markov models for complex action recognition. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Press: Los Alamitos, CA, pp. 994–999.

Flash, T. and Hogan, N. 1985. The coordination of arm movements: An experimentally confirmed mathematical model. Journal of Neuroscience, 5:1688–1703.

Gordon, A.D. 1999. Classification, Chapman & Hall: London.

Gottlieb, G.L., Song, Q., Hong, D.-A., Almeida, G.L., and Corcos, D. 1996. Coordinating movement at two joints: A principle of linear covariance. Journal of Neurophysiology, 75:1760–1764.

Hashimoto, S., Narita, S., Kasahara, H., Shirai, K., Kobayashi, T., Takanishi, A., Sugano, S., Yamaguchi, J., Sawada, H., Takanobu, H., Shibuya, K., Morita, T., Kurata, T., Onoe, N., Ouchi, K., Noguchi, T., Niwa, Y., Nagayama, S., Tabayashi, H., Matsui, I., Obata, M., Matsuzaki, H., Murasugi, A., Kobayashi, T., Haruyama, S., Okada, T., Hidaki, Y., Taguchi, Y., Hoashi, K., Morikawa, E., Iwano, Y., Araki, D., Suzuki, J., Yokoyama, M., Dawa, I., Nishino, D., Inoue, S., Hirano, T., Soga, E., Gen, S., Yanada, T., Kato, K., Sakamoto, S., Ishii, Y., Matsuo, S., Yamamoto, Y., Sato, K., Hagiwara, T., Ueda, T., Honda, N., Hashimoto, K., Hanamoto, T., Kayaba, S., Kojima, T., Iwata, H., Kubodera, H., Matsuki, R., Nakajima, T., Nitto, K., Yamamoto, D., Kamizaki, Y., Nagaike, S., Kunitake, Y., and Morita, S. 2000. Humanoid robots in Waseda University: Hadaly-2 and WABIAN. In Proc. of First IEEE-RAS International Conference on Humanoid Robots, Cambridge, MA, USA.

Hogan, N. 1984. An organizing principle for a class of voluntary movements. Journal of Neuroscience, 4:2745–2754.

Hollars, M., Rosenthal, D., and Sherman, M. 1991. SD/Fast User's Manual. Technical report, Symbolic Dynamics, Inc.

Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C., and Rizzolatti, G. 1999. Cortical mechanisms of human imitation. Science, 286:2526–2528.

Inoue, H., Tachi, S., Tanie, K., Yokoi, K., Hirai, S., Hirukawa, H., Hirai, K., Nakayama, S., Sawada, K., Nishiyama, T., Mikiand, O., Itoko, T., Inaba, H., and Sudo, M. 2000. HRP: Humanoid robotics project of MITI. In Proc. of First IEEE-RAS International Conference on Humanoid Robots, Cambridge, MA, USA.

Jenkins, O.C., Mataric, M.J., and Weber, S. 2000. Primitive-based movement classification for humanoid imitation. In Proceedings, First IEEE-RAS International Conference on Humanoid Robotics (Humanoids-2000), Cambridge, MA, USA.

Kreighbaum, E. and Barthels, K.M. 1985. Biomechanics: A Qualitative Approach for Studying Human Movement. Burgess Publishing Company: Minneapolis, Minnesota.

Mataric, M.J. 1997. Behavior-based control: Examples from navigation, learning, and group behavior. Journal of Experimental and Theoretical Artificial Intelligence, 9(2–3):323–336.

Mataric, M.J. 2000. Getting humanoids to move and imitate. IEEE Intelligent Systems, July 2000, 18–24.

Mataric, M.J. 2001. Visuo-motor primitives as a basis for learning by imitation: Linking perception to action and biology to robotics. In Imitation in Animals and Artifacts, Kerstin Dautenhahn and Christopher Nehaniv (Eds.), MIT Press: Cambridge, MA.

Mataric, M.J. and Marjanovic, M.J. 1993. Synthesizing complex behaviors by composing simple primitives. In Self Organization and Life: From Simple Rules to Global Complexity, Proceedings, European Conference on Artificial Life (ECAL-93), Brussels, Belgium, pp. 698–707.

Mataric, M.J., Zordan, V.B., and Mason, Z. 1998. Movement control methods for complex, dynamically simulated agents: Adonis dances the Macarena. In Proceedings, Autonomous Agents, ACM Press: Minneapolis/St. Paul, MN, pp. 317–324.

Mataric, M.J., Zordan, V.B., and Williamson, M. 1999. Making complex articulated agents dance: An analysis of control methods drawn from robotics, animation, and biology. Autonomous Agents and Multi-Agent Systems, 2(1):23–43.

Mussa-Ivaldi, F.A. 1997. Nonlinear force fields: A distributed system of control primitives for representing and learning movements. In Proc. of the 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation, IEEE Computer Society Press: Los Alamitos, CA, pp. 84–90.

Mussa-Ivaldi, F.A., Giszter, S.F., and Bizzi, E. 1993. Convergent force fields organized in the frog's spinal cord. Journal of Neuroscience, 12(2):467–491.

Pashler, H.E. 1999. The Psychology of Attention, MIT Press: Cambridge, MA.

Pierce, D. and Kuipers, B. 1997. Map learning with uninterpreted sensors and effectors. Artificial Intelligence Journal, 92:169–229.

Pomplun, M. and Mataric, M.J. 2000. Evaluation metrics and results of human arm movement imitation. In Proceedings, First IEEE-RAS International Conference on Humanoid Robotics (Humanoids-2000), MIT Press: Cambridge, MA. Also IRIS Technical Report IRIS-00-384.

Sanger, T.D. 2000. Human arm movements described by a low-dimensional superposition of principal components. Journal of Neuroscience, 20(3):1066–1072.

Schaal, S. 1999. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242.

Schaal, S., Kotosaka, S., and Sternad, D. 1998. Programmable pattern generators. In Proceedings, 3rd International Conference on Computational Intelligence in Neuroscience, Research Triangle Park, NC, pp. 48–51.

Stein, P.S.G. 1997. Neurons, Networks, and Motor Behavior, MIT Press: Cambridge, MA.

Thoroughman, K.A. and Shadmehr, R. 2000. Learning of action through combination of motor primitives. Nature, 407:742–747.

Turk, M. and Pentland, A. 1991. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86.

Uno, Y., Kawato, M., and Suzuki, R. 1989. Formation and control of optimal trajectory in human multijoint arm movement. Biological Cybernetics, 61(2):89–101.

Viviani, P. and Terzuolo, C. 1982. Trajectory determines movement dynamics. Neuroscience, 7:431–437.

Wada, Y., Koike, Y., Vatikiotis-Bateson, E., and Kawato, M. 1995. A computational theory for movement pattern recognition based on optimal movement pattern generation. Biological Cybernetics, 73:15–25.


Ajo Fod received his B.S. in Mechanical Engineering from the Indian Institute of Technology—Madras (IIT) in 1998. He received two M.S. degrees, in Operations Research and Computer Science, from the University of Southern California, Los Angeles (USC), in 1999 and 2001, respectively. His current research interests include activity recognition using planar lasers and frameworks for representing such activity.

Maja Mataric is an associate professor in the Computer Science Department and the Neuroscience Program at the University of Southern California, the Director of the USC Robotics Research Labs, and an Associate Director of IRIS (Institute for Robotics and Intelligent Systems). She joined USC in September 1997, after two and a half years as an assistant professor in the Computer Science Department and the Volen Center for Complex Systems at Brandeis University. She received her Ph.D. in Computer Science and Artificial Intelligence from MIT in 1994, her M.S. in Computer Science from MIT in 1990, and her B.S. in Computer Science from the University of Kansas in 1987. She is a recipient of the IEEE Robotics and Automation Society Early Career Award, the NSF Career Award, and the MIT TR100 Innovation Award. She has worked at NASA's Jet Propulsion Lab, the Free University of Brussels AI Lab, LEGO Cambridge Research Labs, GTE Research Labs, the Swedish Institute of Computer Science, and ATR Human Information Processing Labs. Her research studies coordination and learning in robot teams and humanoids. More information is found at: http://robotics.usc.edu/∼maja

Chad Jenkins is a Ph.D. student in the Computer Science Department and the Robotics Research Labs at the University of Southern California. He received his B.S. in computer science and mathematics from Alma College and his M.S. in computer science from the Georgia Institute of Technology. He is currently working on his thesis pertaining to the generation of perceptual-motor primitives from motion capture data. His research interests include control for humanoid robots and characters, computer animation, activity modeling and sensing, and interactive environments. His email address is [email protected] and contact information is available at http://robotics.usc.edu/∼cjenkins