IEEE TRANSACTIONS ON ROBOTICS, VOL. 26, NO. 4, AUGUST 2010

An Integrated System for User-Adaptive Robotic Grasping

Maria Ralph and Medhat A. Moussa, Member, IEEE

Abstract—This paper presents an integrated system that combines learning, a natural-language interface, and robotic grasping to enable the transfer of grasping skills from nontechnical users to robots. The system consists of two parts: a natural-language interface for grasping commands and a learning system. This paper focuses on the learning system and on testing the entire system in a small usability study. The learning system consists of two phases. In the first phase, the system learns to predict the next command that the user is planning to issue, based on command sequences recorded during previous grasping sessions. In the second phase, the system predicts the user's current state and moves the robot's gripper to the intended target endpoint to attempt to grasp the object. Using eight nontechnical users and a 5-degree-of-freedom (DOF) robot arm, a usability study was conducted to observe the impact of the learning system on user performance and satisfaction during a grasping operation. Experimental results show that the system was effective in learning users' grasping intentions, which allowed it to reduce the average time to grasp an object. In addition, participants' feedback from the usability study was generally positive toward having an adaptive robotic system that learns from their commands.

Index Terms—Grasping, learning and adaptive systems, personal robots, physical human–robot interaction, user-adaptive robots, user studies.

I. INTRODUCTION

THERE is currently significant interest in developing personal and service robots. Robots are now being used as museum tour guides [16], [17], as office assistants [1], in supermarkets [7], [11], and in homes as personal companions and helpers [2], [12], [18]. However, widespread use of robots in these environments faces several challenges. One key challenge is the role human–robot interaction (HRI) plays in developing complex robot skills and behaviors. In contrast to "robot-friendly" industrial environments, homes and offices are human-friendly environments that are typically uncertain, unstructured, and cluttered.

Manuscript received October 5, 2009; revised April 1, 2010; accepted April 8, 2010. Date of publication May 20, 2010; date of current version August 10, 2010. This paper was recommended for publication by Associate Editor T. Kanda and Editor L. Parker upon evaluation of the reviewers' comments. This work was supported by the Natural Sciences and Engineering Research Council of Canada and by MDA Space Missions.

M. Ralph is with the Robotics Institute at Guelph, School of Engineering, University of Guelph, Guelph, ON N1G 2W1, Canada (e-mail: [email protected]).

M. A. Moussa is with the Intelligent Systems Laboratory, School of Engineering, University of Guelph, Guelph, ON N1G 2W1, Canada (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TRO.2010.2048386

A robot surviving in these environments will need to develop complex behaviors and skills. Users can play a vital role in developing these skills, which can be customized to their own needs and special environmental constraints, leading to the development of more user-adaptive robots. These robots adapt their behavior to an individual user on the basis of nontrivial inference from the user's sensory feedback and interaction. Since most of these users will have no prior experience with robots or robot programming, it will be critical to develop natural communication interfaces for HRI.

While many studies in the literature focus on robots acquiring navigation and language skills, we are particularly focused on grasping. A robot has limited functionality if it cannot grasp and manipulate objects located in its environment. Objects found in a typical home vary in shape, size, texture, and functionality. Grasping everyday objects located in a cluttered environment is very difficult, especially when basic grippers are used to reduce robot cost and power needs. Thus, we are interested in developing robotic systems that can learn how to grasp based on interactions with nontechnical users. This interaction process can mimic how infants learn to grasp from their own experimentation and their parents' guidance, and it allows users to transfer their own grasping skills to robots.

This paper is part of a series of papers [13]–[15] focused on exploring these issues. In [14] and [15], an interactive study was conducted in which a group of 15 participants were asked to instruct a robot arm to grasp five small objects located on a table using a basic two-fingered gripper. The participants were categorized into one of three groups: beginner, intermediate, and advanced users. Participants used speech to issue simple approach and grasp commands to the robot via an operator. All of the commands were drawn from a set of 12 primitive commands; however, participants also used their own natural-language commands that had equivalent meaning for them. Results collected from this study reveal that the average number of commands required to grasp all five objects was approximately 20 commands for beginner users, 18 for intermediate users, and 15 for advanced users. This is a significant number, which could be reduced if the robot were able to learn users' grasping behaviors and preferences during interaction sessions. The objective of this paper is to present a new integrated system that combines learning and natural-language instruction to enable the transfer of grasping skills from users to robots.

A. Paper Contribution and Organization

This paper has several contributions. First, it presents a case study of an integrated system that combines natural-language processing (NLP), learning, and grasping to facilitate skill transfer (i.e., grasping skills) from nonexpert users to robots.


Fig. 1. Objects used in the study: comb, spoon, tweezers, key, and tensor-bandage clip.

This integrated approach provides valuable insights that can be applied to other robotic skills and behaviors. To our knowledge, this is the first time such a system has been proposed for grasping. The paper also reports the results of testing the system in a small usability study and, as such, provides a benchmark for future HRI studies involving nonexpert users and grasping. Second, we examine in detail the issue of learning from previous experience and knowledge transfer across users and objects. We do so by evaluating several scenarios that reflect a typical home environment. Finally, the paper examines how users respond to a more proactive robot that is capable of moving on its own in order to potentially reduce the user's workload. This is an important issue in developing user-adaptive robots.

This paper is divided into the following sections. Section II provides a brief overview of the experimental setup used during our usability studies. Section III discusses Phase I and Phase II of our learning system: Phase I describes an algorithm to predict the user's next grasping command, and Phase II focuses on classifying the robot's current grasping stage while approaching an object. Section IV presents findings from a usability study that incorporates the learning system during grasping experiments to observe whether the proposed two-phase learning system accurately interprets users' intentions. Finally, Section V concludes our discussion and outlines future work.

II. BACKGROUND

This section highlights the experimental setup used for the usability studies conducted. Further details of the experimental setup can be found in our previous work [15]. In this section, we present a brief overview of the workspace, the task, and the types of commands provided to the users. A group of 15 participants (five beginner, five intermediate, and five advanced users) were asked to command a CRS A255 five-degree-of-freedom (DOF) human-scale robotic arm equipped with a small parallel gripper to grasp five objects. The object set consisted of common small household items, which are particularly difficult to grasp. Fig. 1 shows the set, which included a comb, a spoon, tweezers, a key, and a tensor-bandage clip. The first object (the comb) was used as a practice session.

Fig. 2. Laboratory layout for the interactive study.

Users were classified as either beginner, intermediate, or advanced, based primarily on their experience with technology. Users with no experience with technology were considered beginners, whereas users with some experience were considered intermediate. The most experienced group of users was classified as advanced. The user's workspace is depicted in Fig. 2. As shown in the figure, users were positioned close to, but within a safe distance away from, the robotic arm.

During each session, the participant's language, sequence of commands, number of commands, and time to complete the task were recorded. Participants were encouraged to simply "talk" to the robot to describe the desired directional movements; however, they were given an initial set of commands to begin with, as shown in Table I.

A "Wizard of Oz" approach was used to map spoken commands into controlled arm movements through a graphical user interface. The mapping used (1) to accommodate incremental movements issued by users, such as "move down very little" or "move forward a lot." In this case, a factor of 0.1, 0.3, 0.6, or 0.9 was used for movements corresponding to "very small," "default/no descriptor," "more," and "a lot," respectively. The distance calculated in (1) was multiplied by this factor to move either 10%, 30%, 60%, or 90% toward the target object from the gripper's current position. Additional details about moving the end-effector during the experiments can be found in [15]. Although directional movements such as "move down 10 in" were considered, introducing this option to command the robot would have resulted in too much variability in the results collected. In order to maintain some level of consistency, we therefore chose to use (1) with a range of factors to manipulate the robot arm with respect to user intentions:

d = \sqrt{(f(x_1 - x_2))^2 + (f(y_1 - y_2))^2 + (f(z_1 - z_2))^2}.    (1)
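To make the scaling concrete, the following Python sketch (illustrative only; the helper name scaled_step and the modifier-to-factor table are our own, not part of the original system) shows one way the factor f and the distance in (1) could be combined to compute a scaled movement toward the target:

```python
import math

# Factors mapping verbal modifiers to a fraction of the remaining distance,
# as described in the text (0.1, 0.3, 0.6, 0.9).
MODIFIER_FACTORS = {
    "very little": 0.1,
    "default": 0.3,
    "more": 0.6,
    "a lot": 0.9,
}

def scaled_step(gripper, target, modifier="default"):
    """Return the scaled distance d from (1) and the new gripper position.

    `gripper` and `target` are (x, y, z) tuples; `modifier` selects the factor f.
    """
    f = MODIFIER_FACTORS[modifier]
    dx, dy, dz = (f * (g - t) for g, t in zip(gripper, target))
    d = math.sqrt(dx**2 + dy**2 + dz**2)  # equation (1)
    # Move the gripper f (10%-90%) of the way toward the target.
    new_pos = tuple(g + f * (t - g) for g, t in zip(gripper, target))
    return d, new_pos

# Example: move "a lot" toward an object from the current gripper position.
d, pos = scaled_step((0.0, 0.0, 0.30), (0.10, 0.05, 0.02), "a lot")
```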

Table II shows a sample interaction scenario, while Fig. 3 shows the robot going through a sequence of commands until a grasp is completed. Over the 60 interaction sessions, a total of 1052 commands were recorded.

III. LEARNING SYSTEM DESIGN

The integrated system consists of two parts: a natural-language interface for grasping and a learning system. The natural-language interface is discussed in [15] and is, therefore, not presented here.


TABLE I
PRIMITIVE COMMAND SET

TABLE II
SAMPLE INTERACTION SESSION

Only the relevant details that concern the learning system are presented in this paper.

The learning system's goal is to predict the user's intentions based on past experience, enabling the robot to speculatively execute the user's intended motion and reduce the number of commands that the user needs to issue during a grasping task. The learning-system design consists of two phases: Phase I and Phase II. In Phase I, the focus is on predicting the user's most likely next command. In Phase II, the focus is on predicting the robot's current state and speculatively moving the robot to the user's desired target endpoint. In both phases, the robot learns from past experience and continuously updates its knowledge as the user(s) grasp more objects.

A. Phase I: Predicting the User’s Next Command

In this phase, the prediction task can be formulated as the following target function: given a user's sequence of commands c_1, c_2, ..., c_t, predict the user's next command c_{t+1}.

To learn this target function, the learning system requires examples of how the user(s) behave while grasping an object. We are interested in understanding whether past experience with other users can also be used and whether the nature of the object is a factor in learning from past experience.

1) Related Work: Similar work on predicting the user's next move has been presented in [6], where sliding windows are used to predict users' behaviors during a table-assembly task. Here, the user's past interaction history is used to anticipate which action will be performed next. From this, a robot can select and ultimately execute the next appropriate action to aid the user in task completion. Much like our work, the objective of that system is to reduce user effort by improving system performance.

In [5], [8], and [10], first-order and mixed-order Markov chains are used to produce the most accurate predictions of the user's most likely next action for command-line entries. In [8], experiments conducted using UNIX commands show that higher accuracy rates are achieved when mixed-order Markov chains are used during the prediction process. Therefore, we have chosen mixed-order Markov chains for our approach.

2) Prediction Algorithm: The prediction algorithm presented in this paper represents the mixed-order Markov chains as command-sequence patterns. These patterns are used to predict the most likely next grasping command that will be issued by the user. Command sequences recorded in [14] and [15] were used to develop a table of command patterns. A pattern is defined as any sequence of commands ranging in length from 2 to 12 commands that is repeated throughout the recorded command sequences. Although patterns of 12 commands were found, only three patterns of this length were recorded. In total, 616 patterns were extracted. Probability distributions were then created for each of the patterns in the pattern table. The prediction algorithm used this table of patterns to determine the most likely next command the user would issue.
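For illustration, the following sketch (our own; the paper does not give code, and the helper name build_pattern_table is hypothetical) shows one way repeated command-sequence patterns of length 2–12 and their next-command probability distributions could be extracted from recorded sessions:

```python
from collections import Counter, defaultdict

def build_pattern_table(sessions, min_len=2, max_len=12):
    """Map command patterns (length 2-12) to next-command probability distributions.

    `sessions` is a list of recorded command sequences (lists of command symbols).
    Only patterns observed more than once are kept, approximating the paper's
    notion of a repeated pattern.
    """
    next_counts = defaultdict(Counter)
    for seq in sessions:
        for length in range(min_len, max_len + 1):
            for i in range(len(seq) - length):
                pattern = tuple(seq[i:i + length])
                next_counts[pattern][seq[i + length]] += 1
    table = {}
    for pattern, counts in next_counts.items():
        total = sum(counts.values())
        if total < 2:  # keep only patterns that repeat
            continue
        table[pattern] = {cmd: n / total for cmd, n in counts.items()}
    return table

# Toy example with made-up command symbols (F = forward, D = down, G = grasp).
sessions = [["F", "F", "D", "D", "G"], ["F", "F", "D", "D", "G"]]
pattern_table = build_pattern_table(sessions)
```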


Fig. 3. Sequence of commands that leads to a successful grasp. Several intermediate commands are not shown for clarity. The user is also not shown for privacy.

First, a prediction pattern C is created using the first two commands issued by the user. The pattern table is then searched using this prediction pattern C to extract the command that is most likely to occur next. If the probability of the prediction is below a predetermined threshold t, the prediction algorithm waits for the next command from the user and expands the prediction pattern. A set of arbitrarily chosen thresholds ranging from 0.3 to 0.8 was used to observe how the prediction algorithm would perform under variable constraint conditions. If a prediction is above t, the prediction algorithm compares the predicted next command to the command that the user enters. If they match, the net number of correctly predicted commands increases by one (CorrectPredict + 1); if they do not match, the net number of correct predictions decreases by one (CorrectPredict − 1). This approach differs from those discussed earlier in the way the prediction pattern (i.e., the Markov chain) is updated. In this paper, the prediction pattern is expanded regardless of whether or not a correct prediction has taken place; that is, the actual (i.e., correct) command is appended to the current prediction pattern. The objective is to build the longest prediction pattern possible and to use as much relevant previous knowledge as is available in order to improve future predictions. However, if a prediction pattern cannot be found in the pattern table or if it exceeds the maximum length of 12 commands, the prediction pattern is cleared. Once the prediction pattern has been cleared, the next command issued by the user becomes the start of a new prediction pattern used from that point onward.
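A minimal sketch of this prediction loop is given below; it assumes the pattern table built in the previous sketch, and the function names predict_next and run_session are ours, not the authors':

```python
def predict_next(pattern_table, pattern, threshold=0.7):
    """Return (command, probability) for the most likely next command,
    or None if the pattern is unknown or the best probability is below threshold."""
    dist = pattern_table.get(tuple(pattern))
    if not dist:
        return None
    cmd, prob = max(dist.items(), key=lambda kv: kv[1])
    return (cmd, prob) if prob >= threshold else None

def run_session(pattern_table, commands, threshold=0.7, max_len=12):
    """Replay a session and keep the net number of correct predictions."""
    correct_predict = 0
    pattern = list(commands[:2])  # seed with the first two commands
    for actual in commands[2:]:
        prediction = predict_next(pattern_table, pattern, threshold)
        if prediction is not None:
            correct_predict += 1 if prediction[0] == actual else -1
        # The pattern is expanded with the actual command regardless of
        # whether the prediction was correct.
        pattern.append(actual)
        # Reset when the pattern is unknown or exceeds the maximum length.
        if len(pattern) > max_len or tuple(pattern) not in pattern_table:
            pattern = [actual]
    return correct_predict
```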

3) Experiments: By examining the number of correct predictions made by the prediction algorithm, we can also explore several important questions. We will use a real-world scenario of a family that consists of children (i.e., advanced users), parents (i.e., intermediate users), and grandparents (i.e., beginner users) to illustrate these questions as follows.

1) Scenario 1: Is there a transfer of knowledge from grasping one object to grasping another for each individual user, independent of other users? In other words, does each child, parent, or grandparent use similar patterns of command sequences from object to object when training the robot to grasp during their own individual grasping sessions?

2) Scenario 2: Is there a transfer of knowledge from grasping one object to grasping another for users of the same group? In other words, if a group of children train the robot to grasp one object, can this knowledge be used by other children to grasp other objects?

3) Scenario 3: Is there a transfer of knowledge from grasping one object to grasping another for all types of users from different backgrounds? That is, do grandparents (e.g., beginner users), parents (e.g., intermediate users), or children (e.g., advanced users) use similar command sequences (e.g., knowledge) from earlier grasping experiments with one object to grasp subsequent objects?

4) Scenario 4: Is there a transfer of knowledge between users of the same group? We assume here that each user will complete working with all objects before another user starts their session. In this case, for example, do subsequent children training the robot to grasp objects use command sequences similar to those used by previous children in the group?

To examine these questions, two types of pattern tables were used during the experiments. The first type, referred to as the offline pattern table, is maintained across interaction sessions and expands with new data as users train the robot on grasping various objects. The second type, referred to as the online pattern table, is cleared between objects/users during interaction sessions. This second approach ensures that no previous information/knowledge is used during subsequent grasping sessions.

4) Results: The following figures present the results collected for the two types of pattern tables discussed: online and offline. These figures are used to observe the system's overall behavior. In all of these figures, values along the y-axis represent the difference in the net number of correct predictions between the online and offline pattern tables. This net difference is not cumulative. A positive value along the y-axis therefore indicates that a higher number of correct predictions is made when the table is not cleared between grasping sessions (offline) than when the table is cleared (online). In the following, we discuss the results for the various knowledge-transfer scenarios outlined earlier. It should be noted that, in each figure, object 2 is not represented since it has an (x, y) starting location of (0, 0).


Fig. 4. Difference in net number of correct predictions between pattern table not cleared and pattern table cleared for threshold 0.5 for all users for objects 3, 4, and 5, respectively.

Fig. 5. Difference in net number of correct predictions between pattern table not cleared and pattern table cleared for threshold 0.6 for all users for objects 3, 4, and 5, respectively.

Fig. 6. Difference in net number of correct predictions between pattern table not cleared and pattern table cleared for threshold 0.7 for all users for objects 3, 4, and 5, respectively.

Figs. 4–6 examine possible skill transfer from object to object for each user, independent of user group, for an arbitrarily chosen set of prediction thresholds (0.5–0.7). For example, when the pattern table is not cleared, User 1 grasping object 3 will use the pattern table built from their individual grasping session with object 2, and so on.

Fig. 7. Difference in net number of correct predictions between online and offline pattern tables for thresholds 0.3–0.8 for the beginner group.

Fig. 8. Difference in net number of correct predictions between online and offline pattern tables for thresholds 0.3–0.8 for the intermediate group.

When the pattern table is cleared, the previous knowledge from object 2 is not used for object 3. This is done for each user individually. The figures show that most of the results favor not clearing the pattern table between objects. This is an important finding, as it suggests that, overall, individual users build on their own previous knowledge and tend to repeat command sequences for later objects. This is especially evident for object 5, since most of the recorded datapoints are well above the x-axis. In our family scenario, this means that each individual child, parent, and grandparent will take previous knowledge acquired from grasping one object and use it to grasp the next object.

Figs. 7–9 examine possible skill transfer from object to object for different users who belong to the same group. For example, patterns of commands used by beginner users for grasping object 2 are also used by subsequent beginner users to grasp object 3, and so on. Figs. 7–9 show that a higher number of correct predictions is made when the pattern table is not cleared between objects. Paired two-tailed t-tests reveal significance levels of p < 0.00002 for the beginner group and p < 0.002 for the intermediate and advanced groups, which show a preference toward not clearing the pattern table between objects. This means that users of a specific group (i.e., beginner, intermediate, or advanced) repeat command sequences used to grasp previous objects during subsequent grasping sessions.


Fig. 9. Difference in net number of correct predictions between online and offline pattern tables for thresholds 0.3–0.8 for the advanced group.

Fig. 10. Difference in net number of correct predictions between pattern table not cleared and pattern table cleared for thresholds 0.3–0.8 for all users.

However, only the intermediate group, for the most part, appears to consistently build on previous knowledge from object to object, as suggested by an apparent increase in the number of correct predictions made from object 2 to object 5. Overall, reflecting these results in our family scenario means, for instance, that when a group of children train the robot to grasp an object, other children will have an easier time training the robot on subsequent objects.

Fig. 10 explores the role that previous grasping experience plays between objects for all users, regardless of user group. In this scenario, all users grasp object 2 and then proceed to grasp object 3, and so on. Fig. 10 again shows a preference toward not clearing the pattern table for subsequent interaction sessions. Paired two-tailed t-tests reveal a significance level of p < 0.00001 for all users, regardless of group distinction. This again shows an overall preference toward not clearing the pattern table between objects. Fig. 10 also suggests that similar command patterns are repeated mainly for the interaction sessions with objects 3 and 4, respectively. However, there appears to be less application of previous knowledge when it comes to object 5. We note that Figs. 7–9 also show, for the most part, that object 5 appears to exhibit less transfer of knowledge across the various thresholds than objects 3 and 4. This can be attributed to a number of reasons. Object 5 is very small and, therefore, may have required several different grasp attempts before success was achieved. This factor may also have contributed to the changes observed in previous command patterns.

Fig. 11. Difference in net number of correct predictions between pattern table not cleared and pattern table cleared for thresholds 0.3–0.8 for the beginner group for all objects.

Fig. 12. Difference in net number of correct predictions between pattern table not cleared and pattern table cleared for thresholds 0.3–0.8 for the intermediate group for all objects.

Fig. 13. Difference in net number of correct predictions between pattern table not cleared and pattern table cleared for thresholds 0.3–0.8 for the advanced group for all objects.

However, the general finding from Fig. 10, interpreted through our family scenario, indicates that, for the most part, all members of a family benefit from other family members' knowledge when grasping each of the objects in the object set.

While the results so far show that knowledge acquired from experimenting with one object can be used to grasp other objects, Figs. 11–13 suggest that this is not the case when training follows a per-user sequence (the fourth scenario), where each user completes working with all objects before another user can start his/her session.


Figs. 11–13 suggest that, as a new user begins his/her interaction session, command sequences from the grasping of all objects by previous users within that group do not appear to be used during the new user's interaction session. In other words, if three children out of four complete grasping all of the objects in the object set, the fourth child does not appear to use most of the knowledge acquired from his/her siblings. Paired two-tailed t-tests reveal a significance level of p < 0.02 for beginner users, which favors clearing the pattern table between users. However, values for the intermediate and advanced user groups show no significant difference, with p > 0.1 and p > 0.2, respectively. This is interesting, since this scenario, in which interactions follow a strict per-user sequence, is unlikely to occur in a real-world situation. A typical home or workplace has many objects, and interaction sessions will likely mix users and objects. However, a selection of users do appear to show some benefit from previous users' experiences. For example, beginner User 15 does appear to use some command sequences similar to those recorded from Users 2, 10, 12, and 13, respectively. Similarly, Intermediate Users 4 and 5, and Advanced User 7, also appear to benefit from previous users' experiences, as indicated by the higher number of correct predictions made when previous knowledge is present. The order of the users may have played a role in these findings; however, in the real world, the order of the users cannot be controlled.

5) Other Findings: We also examined the learning system with respect to commands that have high frequency values. As shown previously in Table II, several commands, such as the move down command (D) and the move forward command (F), are repeated frequently. This raised the question: are the predicted commands simply the ones with the highest frequencies? To address this question, we compared the predictions produced by the learning system with a straightforward prediction of the next command based on the frequency of each command found in the data. The findings reveal that our learning approach achieves higher prediction accuracy than simply using command frequencies to establish predictive patterns. This further substantiates the effectiveness of the learning algorithm we have presented.
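For comparison, a frequency-only baseline of the kind described here could look like the following sketch (illustrative; the helper name frequency_baseline is an assumption on our part):

```python
from collections import Counter

def frequency_baseline(sessions):
    """Baseline predictor: always predict the globally most frequent command."""
    counts = Counter(cmd for seq in sessions for cmd in seq)
    most_common = counts.most_common(1)[0][0]
    # The returned predictor ignores the history entirely.
    return lambda history: most_common

# Example comparison against the pattern-based predictor sketched earlier:
# baseline = frequency_baseline(sessions)
# baseline(["F", "D"])  # returns the single most frequent command, e.g. "D"
```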

B. Phase II: Predicting Grasping Stages

The second phase of the learning system focuses on predicting the robot's current stage in the grasping process and where it is heading. This is followed by a speculative motion to a target location to reduce the training time or to make the grasping operation more autonomous. The focus here is on the approach component of the grasping process. Since there is a higher degree of repeatability in the commands users typically issue as the robot reaches for an object, predictive actions can be explored. Furthermore, the Phase I algorithm discussed previously will be integrated into Phase II, and its impact will be evaluated.

1) Related Work: Work on robotic grasping has been covered extensively in the robotics literature and focuses primarily on the selection of a final grasp configuration, more so than on the exploration of common behaviors encountered during a grasping task.

However, Campbell et al. [3] present an approach to segment a grasping operation into a series of behaviors. They focus on segmenting teleoperated movements into specific robot behaviors for the approach and grasp phases of the grasping process using Robonaut, the humanoid robot developed by NASA. Five main behaviors are defined for a reaching and grasping operation: reach, grasp, hold, release, and withdraw. Their work also examines clustering sensory-motor information into behaviors in order to provide a more robust approach to accommodating changes in the environment during autonomous reaching and grasping movements. Although behaviors are outlined, that study does not focus on the user's preferences or adapt to the user's expectations when approaching and grasping an object.

Within HCI, similar work on segmenting behaviors into specific stages has also been presented, in this case taking the user's behaviors and preferences into consideration. The MavHome system [4], for example, learns to adapt to users' behaviors within a home environment. The system segments a home into several zones and tracks the movements of the occupants over a period of time. Based on the data recorded, the system learns to adapt to users' expectations by executing tasks such as triggering lights to turn on at specific times and starting kitchen appliances based on the occupant's current location/zone.

2) Segmenting Grasping Stages: Table II shows an example of one user session (referred to as User A from now on) in which the user instructs the robot during a grasping operation. Table II shows that there are three defined stages: an orient/rotate stage (in bold), an approach/translate stage (in italics), and a grasp-attempt stage (in regular font). These command sequences also include a fourth stage (not shown) in which patterns of translation and rotation commands are mixed. This fourth stage is referred to as a mixed stage, where the robot transitions in and out of translation and rotation stages. Segmenting command sequences into stages enables the learning system to identify and track the robot's behavior. Using the data collected in [14] and [15] (60 sessions and 1052 commands), a machine-learning algorithm is presented in this section to predict the grasping stage. It should be noted that although User A in Table II did not issue corrective movements (i.e., move back, tilt up, etc.) as a result of overshooting his/her desired location, other users did during the study.

Prediction of the grasping stages is based on a sequence of commands observed in any given session. The goal is to ascertain the robot's current grasping stage using the fewest commands. However, very short sequences of commands could have a low prediction accuracy rate. Thus, several command-window lengths are investigated. A command window is defined as any subset of commands of a specific length. The command window maintains its fixed length while sliding to the right by one command to extract the next block of commands to be processed. For example, a record containing the sequence 123456 with a sliding command window of length 2 yields the values 12, 23, 34, 45, 56. Similarly, a sliding command window of length 3 yields 123, 234, 345, 456, and so on. The command window sizes under investigation were arbitrarily chosen and ranged from two to six commands for training and two to five commands for testing.
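The sliding-window extraction described above can be sketched as follows (illustrative code; not from the paper):

```python
def sliding_windows(commands, size):
    """Extract fixed-length windows, sliding right by one command at a time."""
    return [commands[i:i + size] for i in range(len(commands) - size + 1)]

# The example from the text: the record 123456 with window lengths 2 and 3.
record = list("123456")
print(sliding_windows(record, 2))  # [['1','2'], ['2','3'], ['3','4'], ['4','5'], ['5','6']]
print(sliding_windows(record, 3))  # [['1','2','3'], ['2','3','4'], ['3','4','5'], ['4','5','6']]
```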



Once extracted, the command segments were passed to a naive Bayes (NB) classifier to classify the robot's current stage during a grasping operation. An NB classifier was chosen since it is one of the most accurate classification algorithms for this type of learning problem. It also eliminates the need, present in other algorithms, to search the entire hypothesis space and is, therefore, more efficient in classifying the observed instances.
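As an illustration of this classification step, the sketch below uses scikit-learn's CategoricalNB as a stand-in NB implementation (the paper does not name a library), with a hypothetical integer encoding of the primitive commands and toy stage labels:

```python
from sklearn.naive_bayes import CategoricalNB

# Hypothetical integer encoding of primitive commands (stand-ins for Table I).
CMD_CODES = {"F": 0, "D": 1, "U": 2, "RL": 3, "RR": 4, "G": 5}

def encode(windows):
    return [[CMD_CODES[c] for c in w] for w in windows]

# Toy training data: fixed-length command windows labeled with a grasping stage.
train_windows = [["RL", "RR", "RL"], ["F", "F", "D"], ["D", "F", "D"], ["D", "D", "G"]]
train_stages = ["rotation", "translation", "translation", "grasp"]

clf = CategoricalNB()
clf.fit(encode(train_windows), train_stages)

# Classify the stage for a new window of user commands.
print(clf.predict(encode([["F", "D", "F"]])))  # ['translation'] for this toy data
```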

The dataset was further segmented into two-thirds training data and one-third testing data. This translated into a segmentation of the 60 records (1052 commands) into 40 records for training (726 commands) and 20 records for testing (326 commands). Ten batches of 40 training and 20 testing records were prepared by randomly selecting different records from the 60-record dataset.

The learning system in this usability study focused only on the translation stage, for two reasons. First, most of the commands recorded were translation commands; thus, predicting these commands could potentially save the user time and reduce the number of commands required to complete a grasping task. Second, it is not possible to examine the impact of several command-window sizes unless a long sequence of commands is used within a single stage, and only the translation stage provided this. However, we should clarify that the focus on the translation stage does not limit the generalization of the results to other tasks. In this respect, there is no inherent difference between issuing a translation command and issuing a rotation command. We simply chose to focus on translation commands because they were a more effective and reliable parameter for evaluating the impact of the learning system.

3) Translation-Stage Target Endpoint: The target endpoint was derived from endpoints collected in our previous usability study [15]. That study showed that an endpoint located directly above the object, at a distance of approximately 2.3 cm from the object's geometric center, offers a conservative estimate of the user's final target position before fine-tuning the gripper's position for a grasp attempt. This location is used by the learning system during the usability study discussed in Section IV.

4) Results I: With NB Only (Unfiltered Data): Figs. 14–17 present results for testing window sizes of two to five commands (the unfiltered points), respectively. For each testing window size, results with training window sizes ranging from two to four commands are presented. These results show that the best accuracy achieved on average is obtained with a training window of three or four commands and a testing window of four or five commands (96.1%–97.5%). These results are encouraging, but ideally we would like to achieve the highest classification rate with the smallest testing window size to reduce the workload on the user. The accuracy when testing with a window of two or three commands ranged from 81.3% to 92.5%, depending on the size of the training window. To investigate improving these accuracy rates, the prediction algorithm discussed in Section III-A was used as a preprocessing step (i.e., filter) for the NB classifier.

Fig. 14. Classification accuracy rates for robot stage with a testing window size of two commands and training window sizes of two to four commands.

Fig. 15. Classification accuracy rates for robot stage with a testing window size of three commands and training window sizes of two to four commands.

Fig. 16. Classification accuracy rates for robot stage with a testing window size of four commands and training window sizes of two to four commands.

5) Results II: With NB and Filtering: The prediction algorithm was used with a threshold value of 0.7 to filter the commands, passing on to the NB classifier only those that could be predicted with greater than 70% certainty. This threshold was chosen based on the Phase I results presented in the previous figures, where a threshold of 0.7 was found to produce the best overall results on average, particularly for the beginner and intermediate groups.
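One plausible reading of this filtering step is sketched below (an assumption on our part, reusing the predict_next helper from the Phase I sketch; the paper does not specify the exact mechanism):

```python
def filter_commands(pattern_table, commands, threshold=0.7, max_len=12):
    """Keep only the commands whose Phase I prediction exceeds the threshold.

    The surviving commands form the stream that is windowed and handed to the
    NB stage classifier.
    """
    kept, pattern = [], list(commands[:2])
    for actual in commands[2:]:
        if predict_next(pattern_table, pattern, threshold) is not None:
            kept.append(actual)
        pattern.append(actual)
        if len(pattern) > max_len or tuple(pattern) not in pattern_table:
            pattern = [actual]
    return kept
```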


Fig. 17. Classification accuracy rates for robot stage with a testing window size of five commands and training window sizes of two to four commands.

TABLE III
SIGNIFICANCE VALUES FOR TWO-TAILED PAIRED t-TESTS COMPARING UNFILTERED AND FILTERED METHODS


Figs. 14–17 show the results of adding the prediction algorithm (the filtered data points). Table III shows the results of applying two-tailed paired t-tests to evaluate the significance of the difference in the means between the two methods (unfiltered and filtered, respectively). Fig. 14 shows that, for a testing window size of 2, the filtered method improved the accuracy rate by 5%–14%, depending on the training window. These results are significant with more than 99% confidence, as shown in Table III.

For a testing window size of 3, the two methods appear statistically equal; the only significant result shows the unfiltered method outperforming the filtered method when a training window size of 2 is used. We note that the filtered method had significant variance in this specific case (SD = 9.3).

For larger testing windows, across all training window sizes, the two methods are either 1) statistically equal, 2) such that the unfiltered method outperforms the filtered method, or 3) such that the filtered method outperforms the unfiltered method (see Figs. 16 and 17). Some of these results are significant, while others are not, as shown in Table III.

Finally, an aggregate comparison of all the data from all the training and testing windows shows that the two methods are statistically equal at 95% confidence, while the filtered method outperforms the unfiltered method at 90% confidence, as shown in Table III. We note that our interest is not in the overall comparison between the two methods but rather in enhancing the accuracy of the learning system with small testing windows.


In summary, the aforementioned results show that the overall best combination of training and testing window sizes is a training window size of 4 coupled with a testing window size of 5, which achieves 100% grasping-stage classification accuracy. However, waiting for five user commands reduces the effectiveness of the system considerably. From this perspective, a testing window size of two or three commands, coupled with a training window of three commands and the filtered method, appears to be the right balance between small testing windows and consistently high classification accuracy (i.e., 97% for two testing commands and 95% for three testing commands). This two-phase learning system was tested in the small usability study presented in Section IV.

IV. USABILITY STUDY

The objective of this usability study was to test the integrated system and to examine whether the addition of a more proactive robot would be welcomed by users. To use the learning system discussed earlier, all of the command sequences recorded in our previous work [15] were used to build a pattern table. The command sequences collected during this study were not used to update this pattern table between experiments; however, this option could be incorporated into future versions of the system.

A. Study Details

The study included eight users who were part of the original 15 users who participated in the earlier usability study [15]. Only eight of the original participants returned for this study; they consisted of four beginner, three intermediate, and one advanced user. As such, the original user categories of beginner, intermediate, and advanced were restructured into beginner (four users) and nonbeginner (four users) groups. The same experimental setup was used as in [15], with the addition of the learning system and the target endpoint discussed earlier. During this usability study, users were asked to conduct the same set of grasping experiments as previously performed in Section II; however, the learning system was initiated once a translation stage was detected from the set of possible grasping stages (i.e., rotation, translation, grasp, mixed). From there, if the robot was in an acceptable position (i.e., no singularity), the learning system triggered the robot to move to the established target endpoint. To ensure user safety, users were informed prior to the robot moving to the target endpoint: the operator issued a verbal statement to the users that the robot would be moving on its own to a new location. Users did not have the option to accept or reject the move to the target location; however, this option could be included in future versions of the system.

Once the target endpoint was reached, users continued to issue commands to reach the object and attempt an initial grasp. Commands and times for each session were recorded. A debriefing session following each interaction session was also conducted in order to gather additional user feedback and to clarify observations that had been made.


TABLE IV
SUMMARY OF RESULTS (AVERAGES)

Fig. 18. Box-whisker plots of the times (minutes) recorded for study 1 and study 2 users. These values are for the eight users returning for study 2 and not the entire 15 users previously involved. The top of the box marks the 75th percentile, the bottom of the box the 25th percentile, the solid line the median (50th percentile), and hollow diamonds represent the mean, with maximum and minimum values represented by the top and bottom whiskers.

B. Results I: Impact of Learning System on User Performance

Table IV provides a summary of the results for both our earlier usability study [15] (study 1) and the usability study discussed in this paper (study 2). Figs. 18 and 19 present box-whisker graphs comparing performance data (recorded times only). We found that although both the beginner and nonbeginner groups reported similar numbers of commands in both studies, the times recorded in the two studies differed. The total time to grasp an object in study 2 was, on average, 38% less than the time recorded in study 1. The speedup in the recorded times can only be attributed to the addition of the learning system, since the number of commands remained relatively unchanged. When the learning system was triggered, the arm moved larger distances toward the target endpoint. This, in turn, eliminated the need for the user to issue intermediate commands to reach the same location, which affected the times recorded for the interaction sessions. The lack of significant improvement in the number of commands issued by users could be attributed to users performing additional corrective steps to reorient the gripper into a more suitable pose during this study. A more suitable gripper pose could be established by examining the gripper poses collected in [15]. An approximate end-effector orientation could be used to represent a close-to-ideal gripper pose. The addition of this gripper pose during the hand-preshaping phase could likely reduce the total number of commands that users need to issue.

Fig. 19. Box-whisker plots of the recorded times (minutes) for each user group. These values are for the eight users returning for study 2 and not the entire 15 users previously involved. The top of the box marks the 75th percentile, the bottom of the box the 25th percentile, the solid line the median (50th percentile), and hollow diamonds represent the mean, with maximum and minimum values represented by the top and bottom whiskers.

TABLE V
SUMMARY OF STATISTICS FOR STUDY 1

TABLE VI
SUMMARY OF STATISTICS FOR STUDY 2


Statistical analysis was performed to evaluate the significance of the difference between the two studies. Tables V and VI show the mean, standard deviation, and standard error for each study, for the recorded times only.

Paired two-tailed t-tests conducted on the data presented in Tables V and VI show that the difference is significant for beginners (p < 0.021) and for nonbeginners (p < 0.00042), respectively. Overall, when all users were analyzed, the significance level was p < 0.00022. It should be emphasized that there was a two-year gap between the two studies, which significantly reduces or eliminates the impact of the users' previous experience on the recorded times. Therefore, these results indicate clearly that the addition of the learning system reduced user workload and enabled users to reach their target objective with less effort than previously required.
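For reference, paired two-tailed t-tests of this kind can be computed as in the following sketch (the time values shown are placeholders, not the study data):

```python
from scipy import stats

# Placeholder per-user completion times (minutes); the actual study data are
# not reproduced here. Each position corresponds to the same user in both studies.
study1_times = [12.5, 9.8, 14.2, 11.0, 10.4, 13.1, 9.2, 12.0]
study2_times = [7.9, 6.1, 8.8, 7.2, 6.5, 8.1, 5.9, 7.4]

t_stat, p_value = stats.ttest_rel(study1_times, study2_times)  # paired, two-tailed
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
```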

Paired two-tailed t-tests conducted on the number-of-commands data presented in Table IV show no significant difference between study 1 and study 2, with significance levels of p = 0.79, p = 0.47, and p = 0.87 for the beginner group, the nonbeginner group, and all users, respectively.



Analyzing the median number of commands required to activate the learner also reveals an important finding. It appears that, by the seventh command, the learning system typically detects a translation stage and moves the gripper to the target endpoint. If an appropriate prediction of the gripper's orientation were also included, possibly with the help of a vision system that provides the object orientation as well, a significant reduction in the number of commands could also be achieved. In that case, we expect that most objects could be grasped within approximately ten commands.

It should also be noted that, during this study, the specific command sequences used to trigger the learning system were not recorded. These sequences were not considered because the focus of this paper is on the aggregate results of using the learning system. Furthermore, different objects and different users may lead to different specific sequences. However, we plan to provide the raw data to other researchers who may be interested in examining the specific command sequences used by the learning system.

C. Results II: Impact of Learning System on User Satisfaction

We also gathered additional feedback from users by conducting debriefing sessions after the interaction sessions to gain further insight into whether users were satisfied with the addition of the learning system to the grasping process. For the most part, users reported that they welcomed a more proactive robot. Of the 32 experiments conducted during this study, users reported being satisfied with how the learning system behaved in 21, which translates into user satisfaction approximately 66% of the time. Even in cases where the gripper orientation was not ideal, users still reported some satisfaction with the approach phase initiated by the learning system. However, in some cases the learning system could not be executed, either because it was activated too late in the grasping process (i.e., the gripper was already closer to the object than it would have been if moved to the endpoint) or because singularity situations occurred. In cases where a singularity occurred, the learning system was not activated during that segment of the grasping operation. Instead, users continued issuing command sequences until the learning system was triggered again at a later stage of the grasping process.

During the debriefing sessions held after each interaction session, users also expressed less frustration during this usability study than previously observed. Users expressed their interest in welcoming robots that can adapt their behaviors and interaction style to the user, rather than relying solely on users learning how to interact with and program these complex machines. This feedback, therefore, provides important motivation for continued efforts in the area of adaptive HRI research.

V. DISCUSSION AND CONCLUSION

The results presented in this paper have shown several important findings. First, the paper presented a learning system that was integrated with an NLP interface to reduce the time that a user takes to transfer a complex skill, such as grasping an object, to a robot.

In addition, the system's effectiveness was validated in a small usability study. While many improvements to the system can be suggested, as will be discussed subsequently, this is clearly an important result. Can we expect the system to perform similarly in other environments with different sets of objects, users, and robots? The study involved small, difficult-to-grasp everyday objects rather than carefully selected "lab-friendly" objects, real nontechnical users who are not graduate students, and a robotic arm equipped with a simple parallel gripper. Thus, it is our conclusion that the system will provide similar performance in other environments, which will most likely involve larger and easier-to-grasp objects, especially if the robot is equipped with a more capable gripper. This is particularly the case if the system is further integrated with other sensory and HRI tools. For example, using an advanced vision system to locate and recognize various objects would allow the system to generalize grasping experiences based not only on user commands but also on the geometric features of the object. Similarly, integrating tactile sensors would allow generalization based on surface-friction characteristics and mass. Incorporating other HRI approaches, such as using laser pointers to point to objects [9], can also help once the robot has already learned grasp approaches for that object. All of these tools would provide additional sensory feedback that can be used alongside the user's feedback to enhance the system.

Second, the usability study showed that the system was well received by the majority of the users. This is important given that the system acted in a proactive mode during the study. In our approach, user acceptance will play a pivotal role in introducing robots into human living and working environments. While many studies have shown robots assisting humans in various tasks, the robot's role in those studies was quite limited and well defined. That was not the case here. The more users interacted with the robot, the more they developed a relationship of trust with it, evidenced by users going so far as to give the robot names of their own choosing. This acceptance of a proactive robot is an important result with widespread implications for other tasks and situations.

In conclusion, we believe that a system that combines user interaction with learning is the best way to transfer complex skills, such as grasping arbitrarily shaped objects in users' homes and offices, to robots operating in these environments. Our future work will explore other types of learning approaches and incorporate additional sensory input, particularly vision. We also plan to experiment with additional grasping stages, such as fine manipulation with the help of a suitable gripper, and with skill transfer based on object characteristics as well as user preferences.

ACKNOWLEDGMENT

The authors would like to thank the participants who volunteered for the study.



REFERENCES

[1] H. Asoh, Y. Motomura, F. Asano, I. Hara, S. Hayamizu, K. Itou, T. Kurite, T. Matsui, N. Vlassis, R. Bunschoten, and B. Krose, “Jijo-2: An office robot that communicates and learns,” IEEE Intell. Syst., vol. 16, no. 5, pp. 46–55, Sep./Oct. 2001.

[2] Z. Bien, H. Lee, J. Do, Y. Kim, K. Park, and S. Yang, “Intelligent interaction for human-friendly service robot in smart house environment,” Int. J. Comput. Intell. Syst., vol. 1, no. 1, pp. 77–93, Jan. 2008.

[3] C. Campbell, R. Peters, R. Bodenheimer, W. Bluethmann, E. Huber, and R. Ambrose, “Superpositioning of behaviors learned through teleoperation,” IEEE Trans. Robot., vol. 22, no. 1, pp. 79–91, Feb. 2006.

[4] S. Das, D. Cook, A. Bhattacharya, E. Heierman, and T. Lin, “The role of prediction algorithms in the MavHome smart home architecture,” IEEE Wireless Commun., vol. 9, no. 6, pp. 77–84, Dec. 2002.

[5] B. Davison and H. Hirsh, “Predicting sequences of user actions,” in Proc. AAAI Workshop Predicting Future: AI Approaches Time-Ser. Anal., 1998, pp. 5–12.

[6] P. F. Dominey, G. Metta, F. Nori, and L. Natale, “Anticipation and initiative in human-humanoid interaction,” in Proc. IEEE/RAS Int. Conf. Humanoid Robots, Daejeon, Korea, Dec. 2008, pp. 693–699.

[7] H. Endres, W. Feiten, and G. Lawitzky, “Field test of a navigation system: Autonomous cleaning in supermarkets,” in Proc. IEEE Int. Conf. Robot. Autom., 1998, vol. 2, pp. 1779–1781.

[8] N. Jacobs and H. Blockeel, “Sequence prediction with mixed order Markov chains,” presented at the Belgium/Dutch Conf. Artif. Intell., Nijmegen, The Netherlands, 2003.

[9] C. C. Kemp, C. D. Anderson, H. Nguyen, A. J. Trevor, and Z. Xu, “A point-and-click interface for the real world: Laser designation of objects for mobile manipulation,” in Proc. 3rd ACM/IEEE Int. Conf. Human-Robot Interact., New York: ACM, 2008, pp. 241–248.

[10] B. Korvemaker and R. Greiner, “Predicting UNIX command lines: Adjusting to user patterns,” in Proc. 17th Nat. Conf. Artif. Intell., Austin, TX, Jul. 2000, pp. 230–235.

[11] V. Kulyukin, C. Gharpure, and J. Nicholson, “RoboCart: Toward robot-assisted navigation of grocery stores by the visually impaired,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Aug. 2–6, 2005, pp. 2845–2850.

[12] N. Otero, A. Alissandrakis, K. Dautenhahn, C. Nehaniv, D. S. Syrdal, and K. L. Koay, “Human to robot demonstrations of routine home tasks: Exploring the role of the robot’s feedback,” in Proc. 3rd ACM/IEEE Int. Conf. Human-Robot Interact., New York: ACM, Mar. 12–15, 2008, pp. 177–184.

[13] M. Ralph and M. Moussa, “Human-robot interaction for robotic grasping: A pilot study,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Edmonton, AB, Canada, Aug. 2005, pp. 2146–2151.

[14] M. Ralph and M. Moussa, “On the effect of the user’s background on communicating grasping commands,” in Proc. ACM Conf. Human-Robot Interact., Salt Lake City, UT, Mar. 2006, pp. 353–354.

[15] M. Ralph and M. Moussa, “Toward a natural language interface for transferring grasping skills to robots,” IEEE Trans. Robot., vol. 24, no. 2, pp. 468–475, Apr. 2008.

[16] M. Shiomi, T. Kanda, H. Ishiguro, and N. Hagita, “Interactive humanoid robots for a science museum,” in Proc. 1st ACM SIGCHI/SIGART Conf. Human-Robot Interact., New York: ACM, 2006, pp. 305–312.

[17] S. Thrun, “MINERVA: A tour-guide robot that learns,” Lect. Notes Comput. Sci., vol. 1701, pp. 14–27, 1999.

[18] K. Wada and T. Shibata, “Living with seal robots: Its sociopsychological and physiological influences on the elderly at a care house,” IEEE Trans. Robot., vol. 23, no. 5, pp. 972–980, Oct. 2007.

Maria Ralph received the B.Sc. degree in computer science from Ryerson Polytechnic University, Toronto, ON, Canada, and the Ph.D. degree in systems and computer engineering from the University of Guelph, Guelph, ON, in 1997 and 2008, respectively.

She was a Postdoctoral Fellow with the Computer Vision and Active Perception Laboratory, School of Computer Science and Communication, Royal Institute of Technology, Stockholm, Sweden. She is currently a Research Associate with the Intelligent Systems Laboratory, School of Engineering, University of Guelph. She was a Software Developer and Researcher for companies such as MDA Space Missions, Brampton, ON, for several years. Her research interests include robotic grasping, user-adaptive robots, human–robot interaction, and machine learning.

Medhat A. Moussa (M’03) received the B.A.Sc. degree from the American University, Cairo, Egypt, and the M.A.Sc. degree from the Université de Moncton, Moncton, NB, Canada, both in mechanical engineering, and the Ph.D. degree in systems design engineering from the University of Waterloo, Waterloo, ON, Canada, in 1987, 1991, and 1996, respectively.

He is currently an Associate Professor with the Intelligent Systems Laboratory, School of Engineering, University of Guelph, Guelph, ON. His research interests include user-adaptive robots, machine vision, machine learning, neural networks, and human–robot interaction.

Dr. Moussa is a member of the Association for Computing Machinery.
