
A Farewell to Arms: Non-verbal Communication for Non-humanoid Robots*

Aaron G. Cass, Union College, Schenectady, NY, USA, [email protected]

Kristina Striegnitz, Union College, Schenectady, NY, USA, [email protected]

Nick Webb, Union College, Schenectady, NY, USA, [email protected]

* Position paper presented at the Workshop on Natural Language Generation for Human-Robot Communication at INLG 2018.

Abstract

Human-robot interactions situated in a dynamic environment create a unique mix of challenges for conversational systems. We argue that, on the one hand, NLG can contribute to addressing these challenges and that, on the other hand, they pose interesting research problems for NLG. To illustrate our position, we describe our research on non-humanoid robots using non-verbal signals to support communication.

1 Introduction

Our research is about interaction strategies for robots that have to approach and communicate with strangers in busy public spaces (Cass et al., 2015, 2018). For example, in one of our target scenarios a delivery robot in a busy academic building on a college campus has to solicit help from passing humans to operate the elevator. In another scenario a robot is recruiting survey participants in a shopping mall. In order to develop solutions that will work in a real-world deployment, we collect data and study human-robot interaction not just in laboratory experiments but also in field studies conducted in the wild.

In these field studies we have encountered challenges that are not traditionally addressed by the natural language generation (NLG) pipeline. However, we argue that an NLG system aware of these issues can contribute to a better solution, and that they also pose interesting research problems for NLG.

In particular, two sources of challenges have stood out to us. First, the robot is situated in a dynamic environment with human interaction partners who can act while the robot is speaking or planning an utterance. As in other situated communication tasks (Koller et al., 2010; Smith et al., 2011), the timing of the robot's utterances is important. For fluent interactions the robot needs to monitor the human's actions and changes in the environment and react to them in a timely manner, potentially by interrupting itself or modifying an utterance mid-stream (Clark and Krych, 2004).

Second, many environmental factors may hinder communication and are not controllable by us or the robot. For example, in a busy public space the background noise level may be high, making it hard for people to hear the robot. People may be passing by, sometimes even walking between the robot and the addressee. The robot will also encounter many different reactions from addressees; some will be surprised, scared, or embarrassed to interact with it.

One approach to these challenges would be to solve these issues first, creating a situation in which a "traditional" NLG pipeline, based on NLG for text generation, can be used optimally. For example, we could try to develop a module that perfectly times utterances, adjust the audio level to always be above the environmental noise level, and only communicate with addressees who are directly in front of the robot. However, these goals may be impossible to achieve. For example, while it makes sense to optimize the timing of utterances, most contributing factors are out of our control, so the robot will always have to be prepared to deal with unexpected actions by the human addressee, changes in the environment, or network delays. Furthermore, this approach may lead to suboptimal results. For instance, if the robot only communicates with people positioned right in front of it, many opportunities for interaction may be lost in a busy space with people passing through.

Therefore, we believe that NLG should be aware of these issues and can contribute to a solution. For example, an incremental NLG module may be able to better time utterances and react to unexpected changes (Allen et al., 2001; Buschmeier et al., 2012). When the environment is noisy or the robot is far away from the addressee, generating shorter utterances with simpler words and complementing natural language with non-verbal signals might be more effective. Previous work has explored the problem of adapting the form and content of generated utterances to situational constraints (e.g., Walker et al., 2007; Rieser et al., 2014), but typically not in the context of human-robot interaction.
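As a concrete, purely illustrative sketch of this kind of adaptation, the code below chooses between a fuller verbal request and a shorter, simpler one reinforced by non-verbal signals, depending on noise level and distance to the addressee. The sensor readings, thresholds, and signal names are hypothetical assumptions, not components of an existing system.

```python
# Sketch: adapt the form of an utterance to the situation.
# Thresholds, sensor readings, and signal names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Situation:
    noise_db: float    # background noise level (assumed sensor reading)
    distance_m: float  # distance to the addressee

@dataclass
class Utterance:
    text: str
    nonverbal: list = field(default_factory=list)  # accompanying signals

def plan_help_request(s: Situation) -> Utterance:
    """Choose a realization of a help request for the current situation."""
    if s.noise_db > 70 or s.distance_m > 3.0:
        # Noisy or far away: short, simple wording, plus redundant
        # non-verbal signals that survive a missed word or two.
        return Utterance("Help, please?",
                         ["wave_on_screen", "move_toward_addressee"])
    # Quiet and close: a fuller verbal request is fine on its own.
    return Utterance("Hello! Could you please press the elevator button for me?")

print(plan_help_request(Situation(noise_db=75.0, distance_m=4.0)))
```

An incremental version would additionally re-plan mid-utterance when the situation changes, in the spirit of Buschmeier et al. (2012).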

In order to illustrate our position, we describe some results and observations from our ongoing research on making human-robot communication more robust using non-verbal signals. A lot of work has been done on generating non-verbal signals, like gestures, facial expressions, and posture, for animated characters (known as embodied conversational agents or virtual humans). Some of this work has been transferred to humanoid robots. However, because of our application scenario, the use of humanoid robots is not practical for us: we need robots that are tall enough to interact with standing humans and inexpensive enough to be deployed in a busy public space. We work with robots that have a wheeled base and a mounted screen (see Figure 1).

The research challenge is, therefore, to find out which non-verbal signals are effective communicative devices for these non-humanoid robots. These signals may mimic human behaviors, or they may be visual metaphors that express the robot's intentions without modeling realistic human behavior, similar to the way comics express a character's movement or emotions.

2 Related Work

People accompany their speech with non-verbal signals, which support and add to the content of the speech and which help manage the dialog. For example, iconic hand gestures may depict features of an object or event being described (McNeill, 1992), eye gaze plays an important role in regulating turn-taking in dialog (Kendon, 1967), and facial displays express a speaker's emotions (Ekman and Friesen, 2003) but also serve pragmatic functions that help organize the dialog (Chovil, 1991).

Embodied conversational agents (ECAs), or virtual humans, are animated characters that engage with humans in dialog using both verbal and non-verbal communication (Cassell et al., 2000; Poggi et al., 2005; Thiebaux et al., 2008). Typical research in this area closely analyzes human non-verbal behavior and aims to model these behaviors in the animated character.

Some of this work on generating non-verbal behaviors for animated conversational characters has been transferred to physical humanoid robots. Salem et al. (2012) and Hasegawa et al. (2010) use gesture generation strategies developed for ECAs on humanoid robots. Breazeal (2000) presents a robot with a simple cartoonish face that can express emotions and interaction cues. Most expressions are modeled on human facial expressions, but the robot can also use its non-human, animal-like ears to indicate arousal and attention.

While Breazeal's (2000) work shows that, even for humanoid robots, going beyond the normal human repertoire of non-verbal signals can be beneficial, non-humanoid robots are often not capable of mimicking human non-verbal behaviors. It is therefore essential to identify which behaviors of non-humanoid robots can easily be interpreted by humans (Cha et al., 2018). Recent work has, for example, explored the interpretability of robot arm movements (Dragan et al., 2013). In the study most similar to our research, Cha and Mataric (2016) have shown that a service robot can use light and sound signals to indicate that it needs help and to communicate levels of urgency.

3 Experiments Exploring Non-verbal Signals for Non-humanoid Robots

We describe three studies that we have carried out or are currently conducting to explore how non-verbal behaviors can contribute to communication between humans and non-humanoid robots. In these studies we explore non-verbal robot behaviors modeled on human behaviors as well as robot behaviors designed to communicate metaphorically through movement.

The two robots we have used for this work, SARAH and VALERIE, both have a mobile base, a screen on which a simple cartoon face can be displayed, and a suite of cameras and depth sensors (VALERIE is shown in Figure 1). Importantly, the robots have a non-humanoid form, lacking the typical mechanisms for human non-verbal expression.

Figure 1: VALERIE

Our experiments are conducted using a Wizard of Oz (WoZ) protocol, in which a human wizard remotely controls the robot, unbeknownst to the participants. The wizard interface provides a set of pre-planned behaviors that the wizard can initiate, as well as lower-level controls for the robot.
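As a rough sketch of such a wizard interface, the code below pairs one-click, pre-planned behaviors with lower-level drive controls. The RobotLink transport and the command vocabulary are hypothetical stand-ins; the actual interface used in these studies is not specified in the paper.

```python
# Sketch of a WoZ wizard interface: pre-planned behaviors plus
# lower-level controls. RobotLink and the commands are hypothetical.

class RobotLink:
    """Stand-in for the network connection to the robot."""
    def send(self, command: str, **params) -> None:
        print(f"-> {command} {params}")

class WizardInterface:
    def __init__(self, link: RobotLink):
        self.link = link
        # Behaviors the wizard can trigger with a single click.
        self.behaviors = {
            "greet": [("say", {"text": "Hello! Can you please help me?"})],
            "thank": [("say", {"text": "Thank you!"}),
                      ("face", {"expression": "smile"})],
        }

    def trigger(self, name: str) -> None:
        for command, params in self.behaviors[name]:
            self.link.send(command, **params)

    # Lower-level control for situations the pre-planned set doesn't cover.
    def drive(self, linear: float, angular: float) -> None:
        self.link.send("drive", linear=linear, angular=angular)

wizard = WizardInterface(RobotLink())
wizard.trigger("greet")
wizard.drive(0.0, 0.3)  # slow rotation to follow a passer-by
```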

3.1 Robot eye gaze to support reference

In this ongoing study we investigate whether humans use our robot's eye gaze to resolve referring expressions. Hanna and Brennan (2007) found that humans use a human instruction giver's eye gaze to distinguish an object being described from similar-looking distractors. We replicated their experiment in the laboratory, with VALERIE taking the instruction giver's place.

Participants stood opposite VALERIE with a table between them. On the table was a sequence of colored shapes, each of which also had a number of black dots. Some layouts contained distractor pairs: shapes of the same color and form but with a different number of dots. VALERIE gave instructions of the form "Press the button corresponding to the blue triangle with three dots", either while only moving her mouth or while additionally accompanying the instruction with a movement of her pupils in the direction of the target shape. A preliminary analysis of the data suggests that VALERIE's eye gaze helps participants pick out the right target more quickly in situations where the layout contains a distractor shape that is sufficiently far from the target shape to be distinguished by eye gaze.

This shows that participants do indeed interpret the robot's eye gaze and use it to guide their own behavior. From an NLG point of view, the generation of eye gaze is interesting because it has to be coordinated with the natural language utterance it accompanies while also producing natural-looking eye movements.
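To make the coordination problem concrete, the sketch below shifts the robot's gaze toward the target slightly before the critical word in the referring expression, mirroring the speaker-gaze-leads-speech pattern in humans. The say() and look_at() primitives and the fixed word duration are hypothetical; a real system would take word timings from the speech synthesizer.

```python
# Sketch: time a gaze shift against the words of a referring expression.
# say()/look_at() and the fixed word duration are hypothetical.

import threading
import time

def say(word: str) -> None:
    print(f"[speech] {word}")

def look_at(target: str) -> None:
    print(f"[gaze]   pupils toward {target}")

def speak_with_gaze(words: list, word_dur: float,
                    gaze_target: str, gaze_word_index: int) -> None:
    """Launch the gaze shift just before the critical word is spoken."""
    lead = 0.2  # seconds by which gaze precedes the word
    delay = max(0.0, gaze_word_index * word_dur - lead)
    threading.Timer(delay, look_at, args=[gaze_target]).start()
    for word in words:
        say(word)
        time.sleep(word_dur)

words = "Press the button corresponding to the blue triangle with three dots".split()
speak_with_gaze(words, word_dur=0.35,
                gaze_target="blue_triangle_with_three_dots",
                gaze_word_index=words.index("blue"))
```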

Limitations and future work: This study was done in a laboratory environment using a repetitive and unrealistic task. We plan to conduct a follow-up study that tests the effectiveness of robot eye gaze as a communicative device in the wild.

3.2 Robot body movement and orientation to attract attention and initiate interactions

In this experiment in the wild, the robot's behavior was designed to (very crudely) mimic the behaviors humans might use to initiate an interaction with a passer-by in a busy public space. SARAH was stationed in a popular hallway. She would greet people ("Hello! Can you please help me?") either while standing still or while making a rotational movement that followed the subject we wished to engage. People who approached SARAH were then asked to press a specific number on a keypad.

We collected video data from 14 one-hour sessions over the course of 5 weeks. In total, 1658 people passed by SARAH. Of those, only 714 engaged with her in any way, including just looking at her. Of the 714, 108 completed our task. We found that movement of the robot produced a statistically significant increase in the proportion of people who looked at SARAH (64% of passers-by noticed the still robot, 88% the moving robot), but not in the number of completed tasks (6.4% in the non-moving condition, 6.7% in the moving condition). Given the 24-percentage-point increase in the number of people who noticed SARAH (from 64% to 88%), we expected a similar increase in the number of people who stopped to interact with her.
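For illustration, the comparison of the two conditions can be checked with a standard two-proportion z-test. The paper does not report per-condition counts, so the sketch below assumes, purely for illustration, an even split of the 1658 passers-by across the two conditions.

```python
# Two-proportion z-test over the reported percentages, assuming a
# hypothetical even split of passers-by across conditions.

from math import erf, sqrt

def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """z statistic and two-sided p-value for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal tails
    return z, p_value

n = 1658 // 2  # assumed passers-by per condition
print(two_proportion_z(round(0.64 * n), n, round(0.88 * n), n))    # noticing: significant
print(two_proportion_z(round(0.064 * n), n, round(0.067 * n), n))  # completion: not significant
```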

A closer analysis of our video data indicates that technical issues with the WoZ interface (which we plan to address in follow-up experiments), as well as issues related to SARAH's communicative behavior, may be the reason why the increased attention did not lead to more successful interactions. First, it seems that SARAH's intentions weren't always clear; second, several people in the study acted surprised by or scared of SARAH, or embarrassed to interact with her. Both issues point toward a need for better communicative non-verbal behaviors to convey the robot's intentions and to lessen people's apprehension.

As with eye gaze, these non-verbal behaviors have to be planned and coordinated with the robot's natural language utterances. An additional challenge is that the signals we are exploring are complex, involving eye gaze, facial expressions, and different kinds of movement. Furthermore, the optimal choice of non-verbal signals and of the form of the natural language utterance may depend on aspects of the environment, such as how busy and noisy it is or how far away the addressee is. The NLG system planning these utterances will have to coordinate diverse types of communicative signals and adapt to the current situation.

Future work: In our current work, we are studying verbal and non-verbal behaviors that allow the robot to better signal its wish to interact (e.g., moving toward the selected addressee, or facial expressions that indicate a need for help and a wish to engage). This exploration is guided by what is known about human behaviors in similar situations (Kendon and Ferber, 1973).

3.3 Robot gestures to express mental states

In the first two studies the robot used non-verbal behaviors that were modeled on human behavior. We now describe a pilot study, conducted in the wild, that moves toward metaphorical gestures. This study focused on gestures to express the following mental states of the robot: agreement, disagreement, uncertainty, and excitement. In humans, facial expressions and head gestures play an important role in expressing these mental states. While SARAH can produce different facial displays, she does not have a movable head. Based on our intuition, we devised the following non-verbal behaviors (a code sketch follows the list):

agreement: Smile and move forward and backward a few inches.

disagreement: Frown and rotate from side to side by 35 degrees.

uncertainty: With a neutral facial expression, turn away from the addressee by 45 degrees, pause briefly, then return.

excitement: With a surprised facial expression, spin around 360 degrees.
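One way to operationalize such a list is as a small behavior lexicon that maps each mental state to a sequence of face and movement commands, which the dialog manager can trigger alongside speech. The sketch below encodes the four behaviors above; the face/move/pause command vocabulary and its parameter names are hypothetical, not SARAH's actual control API.

```python
# Minimal behavior lexicon for the four mental states described above.
# The command vocabulary (face/move/pause) is hypothetical; the
# magnitudes are taken from the list in the text.

GESTURES = {
    "agreement":    [("face", {"expression": "smile"}),
                     ("move", {"forward_inches": 3, "backward_inches": 3})],
    "disagreement": [("face", {"expression": "frown"}),
                     ("move", {"rotate_deg": 35, "oscillate": True})],
    "uncertainty":  [("face", {"expression": "neutral"}),
                     ("move", {"rotate_deg": 45}),    # turn away ...
                     ("pause", {"seconds": 1.0}),     # ... pause briefly ...
                     ("move", {"rotate_deg": -45})],  # ... and return
    "excitement":   [("face", {"expression": "surprised"}),
                     ("move", {"rotate_deg": 360})],
}

def perform(state: str) -> None:
    """Stub executor: replace print with real face and motor commands."""
    for command, params in GESTURES[state]:
        print(command, params)

perform("uncertainty")
```

A lexicon of this shape also leaves room for variants: each state could map to several candidate realizations whose interpretability can then be tested.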

SARAH recruited subjects in a busy hallway on campus. She instructed subjects to retrieve an index card with a set of yes/no questions from a pocket attached to the robot and to ask those questions. SARAH accompanied her spoken answers either with facial expressions only or with facial expressions and gestures. At the end of the scripted interaction, SARAH said "Yay, we completed the task" and expressed excitement.

SARAH then asked the subjects to complete a paper survey rating SARAH's intelligence and naturalness. In this pilot study, SARAH's use of gestures did not have a statistically significant impact on people's perceptions of her. Unfortunately, we also did not collect data that would allow us to draw conclusions about whether humans interpreted the gestures as intended.

Interesting research problems that arise include the design of easily interpretable metaphorical gestures, how to select which signals to use in a given dialog situation, how to coordinate different communicative signals, and how to transition between and blend different non-verbal behaviors.

Future work: We are preparing a follow-up study that will more systematically evaluate the interpretability of variants of the different gestures. Our goal is to create a lexicon of robot behaviors that can perform different discourse and dialog functions. We are currently focusing on robot movements, but we are also interested in other signals, like non-speech sounds and visual cues on the screen that go beyond facial expressions mimicking humans.

4 Conclusion

The interactions between humans and robots in public spaces are situated in an uncontrollable and only partially predictable environment. This creates challenges for communication. We think that NLG can contribute to a solution to these challenges by producing utterances and other communicative behaviors that are adapted to the situation. In addition, we argue that these challenges give rise to research problems that are interesting from an NLG point of view.

In this paper, we have illustrated our position by describing three studies that explore the generation of co-verbal communicative behaviors for non-humanoid robots. This line of research tackles the following issues related to the generation of multimodal utterances. We need to design non-verbal signals that mimic human behavior as well as signals that communicate metaphorically. The robot's behaviors are constrained by its limited motor capabilities, but they can also take advantage of expressive options that are not available to humans. We need techniques for generating multimodal utterances that coordinate the different non-verbal signals with speech. And finally, we need to understand how to choose the most effective set of signals in a given dialog situation.


References

James Allen, George Ferguson, and Amanda Stent. 2001. An architecture for more realistic conversational systems. In Proc. of the 6th International Conference on Intelligent User Interfaces.

Cynthia Lynn Breazeal. 2000. Sociable machines: Expressive social exchange between humans and robots. Ph.D. thesis, MIT.

Hendrik Buschmeier, Timo Baumann, Benjamin Dosch, Stefan Kopp, and David Schlangen. 2012. Combining incremental language generation and incremental speech synthesis for adaptive information presentation. In Proc. of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue.

Aaron G. Cass, Eric Rose, Kristina Striegnitz, and Nick Webb. 2015. Determining appropriate first contact distance: Trade-offs in human-robot interaction experiment design. In Proc. of the Workshop on Designing and Evaluating Social Robots for Public Settings at the International Conference on Intelligent Robots and Systems.

Aaron G. Cass, Kristina Striegnitz, Nick Webb, and Venus Yu. 2018. Exposing real-world challenges using HRI in the wild. In Proc. of the 4th Workshop on Public Space Human-Robot Interaction at the International Conference on Human-Computer Interaction with Mobile Devices and Services.

Justine Cassell, Joseph Sullivan, Elizabeth Churchill, and Scott Prevost. 2000. Embodied Conversational Agents. MIT Press.

Elizabeth Cha, Yunkyung Kim, Terrence Fong, and Maja J. Mataric. 2018. A survey of nonverbal signaling methods for non-humanoid robots. Foundations and Trends in Robotics, 6(4):211–323.

Elizabeth Cha and Maja Mataric. 2016. Using nonverbal signals to request help during human-robot collaboration. In Proc. of the International Conference on Intelligent Robots and Systems.

Nicole Chovil. 1991. Discourse-oriented facial displays in conversation. Research on Language and Social Interaction, 25(1-4):163–194.

Herbert H. Clark and Meredyth A. Krych. 2004. Speaking while monitoring addressees for understanding. Journal of Memory and Language, 50(1):62–81.

Anca D. Dragan, Kenton C. T. Lee, and Siddhartha S. Srinivasa. 2013. Legibility and predictability of robot motion. In Proc. of the 8th International Conference on Human-Robot Interaction.

Paul Ekman and Wallace V. Friesen. 2003. Unmasking the Face: A Guide to Recognizing Emotions From Facial Expressions. Malor Books.

Joy E. Hanna and Susan E. Brennan. 2007. Speakers' eye gaze disambiguates referring expressions early during face-to-face conversation. Journal of Memory and Language, 57(4):596–615.

Dai Hasegawa, Justine Cassell, and Kenji Araki. 2010. The role of embodiment and perspective in direction-giving systems. In Proc. of the AAAI Fall Symposium: Dialog with Robots.

Adam Kendon. 1967. Some functions of gaze-direction in social interaction. Acta Psychologica, 26:22–63.

Adam Kendon and Andrew Ferber. 1973. A description of some human greetings. In Comparative Ecology and Behaviour of Primates: Proc. of a Conference held at the Zoological Society London. Academic Press.

Alexander Koller, Kristina Striegnitz, Donna Byron, Justine Cassell, Robert Dale, Johanna Moore, and Jon Oberlander. 2010. The first challenge on generating instructions in virtual environments. In E. Krahmer and M. Theune, editors, Empirical Methods in Natural Language Generation, volume 5980 of LNCS. Springer.

David McNeill. 1992. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press.

Isabella Poggi, Catherine Pelachaud, Fiorella de Rosis, Valeria Carofiglio, and Berardina De Carolis. 2005. Greta: A believable embodied conversational agent. In O. Stock and M. Zancanaro, editors, Multimodal Intelligent Information Presentation, pages 3–25. Springer.

Verena Rieser, Oliver Lemon, and Simon Keizer. 2014. Natural language generation as incremental planning under uncertainty: Adaptive information presentation for statistical dialogue systems. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(5).

Maha Salem, Stefan Kopp, Ipke Wachsmuth, Katharina Rohlfing, and Frank Joublin. 2012. Generation and evaluation of communicative robot gesture. International Journal of Social Robotics, 4(2):201–217.

Cameron Smith, Nigel Crook, Simon Dobnik, Daniel Charlton, Johan Boye, Stephen Pulman, Raul Santos De La Camara, Markku Turunen, David Benyon, Jay Bradley, et al. 2011. Interaction strategies for an affective conversational agent. Presence: Teleoperators and Virtual Environments, 20(5):395–411.

Marcus Thiebaux, Stacy Marsella, Andrew N. Marshall, and Marcelo Kallmann. 2008. SmartBody: Behavior realization for embodied conversational agents. In Proc. of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 1.

Marilyn A. Walker, Amanda Stent, Francois Mairesse, and Rashmi Prasad. 2007. Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research, 30.