
Int J of Soc Robot manuscript No. (will be inserted by the editor)

Measuring the Quality of Interaction in Mobile Robotic Telepresence: A Pilot's Perspective

This is an author's compilation of DOI: 10.1007/s12369-012-0166-7. The final publication is available at www.springer.com.

Annica Kristoffersson · Kerstin Severinson-Eklundh · Amy Loutfi

Received: date / Accepted: date

Abstract This article presents a method for measuring the quality of interaction in social mobile robotic telepresence. The methodology is in part based on Adam Kendon's theory of F-formations. The theory is based on observations of how bodies naturally orient themselves during interaction between people in real-life settings. In addition, two presence questionnaires (the Temple Presence Inventory and the Networked Minds Social Presence Inventory), designed to measure the users' perceptions of others and the environment when experienced through a communication medium, are used. The perceived presence and ease of use are correlated to the spatial formations between the robot and an actor. The proposed methodology is validated experimentally on a dataset consisting of interactions between an elder (actor) and 21 different users being trained in piloting a mobile robotic telepresence unit. The evaluation has shown that these tools are suitable for evaluating mobile robotic telepresence and also that correlations between the tools used exist. Further, these results give important guidelines on how to improve the interface in order to increase the quality of interaction.

The ExCITE project has been supported by the EU under the Ambient Assisted Living Joint Programme (AAL-2009-2-125).

Annica Kristoffersson, Amy Loutfi
Center of Applied Autonomous Sensor Systems
Örebro University, Sweden
E-mail: [email protected]

Kerstin Severinson-Eklundh
School of Computer Science and Communication
KTH, Sweden
E-mail: [email protected]

Keywords mobile robotic telepresence · F-formations · methodology · presence · telepresence · quality of interaction

1 Introduction

Mobile robotic telepresence (MRP) is a combination of teleoperation and telepresence offering a "walking around" capability. An MRP system is a video conferencing system through which the pilot of the system can move around in a remote environment and interact with its inhabitants (local users). The pilot interacts with the local user while at the same time navigating the robot via a computer. The local user experiences the interaction with the pilot via a video conferencing system mounted on a robotic base. Thus, usage of an MRP system includes many sorts of interaction simultaneously. Local users of the system are interacting with another human (human-human interaction, HHI) but at the same time, they are interacting with a robot (human-robot interaction, HRI). The pilots of the MRP system are interacting with a computer system (human-computer interaction, HCI) but at the same time, they are interacting with another human (HHI) while being embodied in a robot which they cannot see themselves. Commercial examples of MRP systems are Giraff (Giraff Technologies [8]), QB (Anybots [26]), Texai (Willow Garage [28]) and VGo (VGo Communications [30]).

Previous studies in evaluation for MRP systems have focused on navigational and interface aspects (for example [25]) and, more recently, examined acceptance from potential users of the system. Recent evaluations of MRP systems studied from the perspective of coworkers include [21, 29]. Lee & Takayama [21] studied how remote coworkers using an MRP system were treated by the local coworkers. Tsui et al. [29] studied bystanders' impressions of an MRP system placed in an office and discuss guidelines for


increasing social acceptability. Other studies focusing on the use of MRP systems from an elder perspective include Beer et al. [4], who gave voluntary participants a first-hand experience of driving and being visited by an MRP system. This study identified users' perspectives on the added value of the system, such as the ability to see who they are talking to and reduced travel costs. The study also raised concerns about social issues, such as a need to be able to refuse a call and to monitor who has access to the system. Still in the context of eldercare, Kristoffersson et al. [19] reported on the sense of presence among alarm operators and health care professionals during a training session when using an MRP system. The study showed that the overall sense of social richness as perceived by the pilots was high and that the pilots had a realistic feeling regarding their spatial presence.

In this article, we study the interaction between the pilot and the local user and describe the interaction using measures that relate to 1) the spatial formation that occurs between the local user and the pilot and 2) the perceived level of social and spatial presence of the pilot user when embodied in the robot. To characterize the spatial formation, we take inspiration from Kendon's F-formations, which provide a framework to describe the natural positioning/configuration of people when engaged in specific tasks, i.e. how people orient themselves with respect to each other. To characterize the levels of social and spatial presence, a questionnaire based on the Temple Presence Inventory (TPI) [22] and the Networked Minds Social Presence Inventory (Networked Minds) [6] was filled in by the pilot user directly after the interaction took place. The TPI and the Networked Minds are two standard questionnaires for assessing perceived presence. The Networked Minds is designed to measure the users' perceptions of others experienced through a communication medium [5] and the TPI is appropriate for use with most media and media content [22]. Both the TPI and the Networked Minds have been applied in previous studies in HRI, e.g. [2,1,12]. The questionnaire also included questions on the perceived ease of use.

A total of 21 interactions were performed in a smart-home environment where pilot users from a health care organization visited an actor (representing an elder) via a specific MRP system, called Giraff. By analyzing a third-person video feed from a semi-scripted interaction, the spatial formations were determined. Correlations between the perceived level of presence and the various spatial configurations during the interaction were calculated in order to determine if "natural" configurations resulted in a higher level of perceived spatial and social presence and perceived ease of use.

The article is organized as follows. In Section 2, a description of Hall's interpersonal distances, Kendon's F-Formation System and related work is given. Section 3 details the experimental setup and data collection. Section 4 presents our hypotheses. The results are presented and discussed in Section 5. Finally, conclusions are drawn in Section 6.

2 Spatial relationships in interaction

Much of the work presented in this article is based on two theories on spatial handling with origins in HHI. These are Hall's interpersonal distances and Kendon's F-Formation System. In this section, we first describe these theories. We then discuss how spatial and embodiment constraints influence the quality of interaction and spatial formations [13,17,18,23]. Communication over a medium is different from HHI as several factors important for the level of intimacy [3] are distorted [11]. Up until this point, there are only a few HRI studies which make use of F-formations, e.g. [15,20,31].

The term proxemics was introduced by the anthropologist Edward T. Hall in 1966. Proxemics is "the interrelated observations and theories of man's use of space as a specialized elaboration of culture", p. 1 [10]. According to Hall, the effects of proxemics can be summarized as follows: "Like gravity, the influence of two bodies on each other is inversely proportional not only to the square of their distance but possibly even the cube of the distance between them", p. 121 [10]. He describes interpersonal distances and divides them into four voluntarily created classes: (1) intimate, (2) personal, (3) social and (4) public. Although Hall warned against making generalizations across cultures and different sorts of interaction situations, the classes can be described as follows.

The intimate distance ranges from 0 to ≈ 46 cm (0 to 18 inches) and is the distance of love making and comforting of others. The personal distance ranges from ≈ 46 cm to ≈ 122 cm (18 inches to 4 feet) and represents interactions occurring between, for example, good friends or family members. Hall uses the length of an arm as a reference for this distance; while it is still possible to touch by stretching the arm, the field of vision has increased for the interactions. Social distance ranges from ≈ 1.22 m to ≈ 3.66 m (4 feet to 12 feet). This distance represents the one typically held when interacting with an acquaintance and is more formal than the personal distance. Public distance, ranging from ≈ 3.66 m and upwards (more than 12 feet), is used for public speech. According to Hall, subtle nuances in tone of voice and facial expressions can no longer be picked up at the public distance.1
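To make the thresholds concrete, the following minimal Python sketch classifies a measured interpersonal distance into Hall's four classes; the function name and the metric cut-offs (converted from Hall's original distances in feet) are our own illustrative choices, not part of Hall's work.

```python
def hall_zone(distance_m: float) -> str:
    """Classify an interpersonal distance (in meters) into Hall's four classes.

    Thresholds follow Hall's distances in feet (18 in, 4 ft, 12 ft),
    converted to the approximate metric values quoted in the text above.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m < 0.46:      # 0 - 18 inches
        return "intimate"
    if distance_m < 1.22:      # 18 inches - 4 feet
        return "personal"
    if distance_m < 3.66:      # 4 feet - 12 feet
        return "social"
    return "public"            # more than 12 feet


if __name__ == "__main__":
    for d in (0.3, 1.0, 2.5, 5.0):
        print(f"{d:.2f} m -> {hall_zone(d)}")
```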

Adam Kendon studies spatial relations that occur whenever two or more people engage in an interaction, and his claim is that "a behaviour of any sort occurs in a three-dimensional world and any activity whatever requires space

1 Hüttenrauch [14] discusses the conversion of distances into centimeters and stresses the need to refer to the original distances in feet by Hall, p. 41.


of some sort", p. 1 [18]. This space must have physical properties, allow the organism to do what it needs to do and be differentiated from other spaces. According to Kendon, it is common in any setting for several individuals to be co-present, but how they orient and space themselves in relation to one another directly reflects how they may be involved with one another. Kendon's F-formation system is a well-known theory of spatial relationships. Kendon distinguished three F-formations by observation: (1) vis-a-vis, (2) L-shape and (3) side-by-side, depicted in Fig. 1. They arise when "two or more people sustain a spatial and orientational relationship", p. 209 [17].


Fig. 1 The different F-formations distinguished by Kendon: 1. Vis-a-vis, 2. L-shape and 3. Side-by-side.

The vis-a-vis is an arrangement formed by two individuals who are facing each other, the L-shape is generated when two individuals are standing perpendicularly to each other facing an object, and the side-by-side arrangement is when two participants stand close together looking in the same direction.
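Purely as an illustration of how these three arrangements might be detected automatically from annotated body orientations, the sketch below labels a pair of headings with a simple angular heuristic; the tolerance and the function are our own simplification and not part of Kendon's theory or of the coding procedure used in this study.

```python
def classify_formation(heading_a_deg: float, heading_b_deg: float,
                       tol_deg: float = 30.0) -> str:
    """Heuristically label the formation of two co-present bodies from the
    angle between their facing directions (0 deg = facing the same way).

    ~180 deg apart -> vis-a-vis, ~90 deg -> L-shape, ~0 deg -> side-by-side;
    anything else is returned as 'other'.
    """
    # Smallest absolute angle between the two headings, in [0, 180].
    diff = abs((heading_a_deg - heading_b_deg + 180.0) % 360.0 - 180.0)
    if abs(diff - 180.0) <= tol_deg:
        return "vis-a-vis"
    if abs(diff - 90.0) <= tol_deg:
        return "L-shape"
    if diff <= tol_deg:
        return "side-by-side"
    return "other"


if __name__ == "__main__":
    print(classify_formation(0, 175))   # vis-a-vis
    print(classify_formation(0, 85))    # L-shape
    print(classify_formation(10, 20))   # side-by-side
```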

The shared space created by the F-formation is called the O-space and is the overlap of the transactional segments to which two or more people direct their attention and in which they manipulate objects [17,23]. The type of F-formation created can be influenced by environmental features such as walls, but what influence such spatial patterns have on interaction has not been studied in depth by Kendon [17,18]. The F-formation spatial arrangement is reconfigured as participants in a group come and go, and change their positions and orientations [20]. However, the orientation of the lower part of the body has a dominant effect on the reconfiguration of the arrangement [17].

According to Marshall et al. [23], physical environments can limit and constrain opportunities for some shared activities, while encouraging others. Marshall et al. comment on previously performed studies in contexts such as museums to point out that physical aspects of the environment, such as architectural and interior design factors, have a role in an interaction. For example, the study highlighted how rarely groups larger than two discussed options with each other and with counter staff. The reason seems to be largely the constraints of physical space in front of the counter. They concluded that some features of the physical

environment may work to discourage the creation of certain F-formations.

Hornecker [13] has developed the concept of embodied constraints; these restrict what people can do and thereby make some behaviors more probable than others. The embodied constraints refer to the physical system set-up or configuration of space and objects, and can ease some types of activity and limit others. Thus, according to Hornecker, some embodied constraints provide implicit suggestions to act in a certain way.

The level of intimacy in HHI is an equilibrium, a "joint function of eye-contact, physical proximity, intimacy of topic, smiling, etc", p. 293 [3]. When using MRP systems, this would potentially mean that the experienced intimacy for the local user and pilot engaged in an interaction would depend on the above factors. Communication via a medium causes a physical gap in itself; however, the distance between the people interacting can be adjusted by either the pilot or the local user. A factor that has been pointed out as being of importance in an MRP system is the ability to adjust the height of the robot to increase the level of eye-contact [7,16]. This can be compared with real-life situations in which we generally prefer a conversation where all are either standing or sitting.

It should be noted that an interaction enabled by the use of computer-mediated technologies, e.g. a video conference, cannot be equivalent to face-to-face situations according to Heath and Luff [11]. They write that gestures and other forms of body movement, including gaze, which are systematically employed in face-to-face communication, prove largely ineffective in video communication. One of the reasons mentioned is that the recipient's access to a gesture is the appearance of the movement on the screen, which only constitutes a small part of the horizon. Heath and Luff also claim that the current speaker faces related difficulties. In other words, his or her access to the remote environment is severely delimited by the angle of the camera and the orientation of the screen. The rendered gestures are distorted by the camera angle: subtle gestures may pass unnoticed and more dramatic ones may fill the entire screen of the recipient.

Kuzuoka et al. [20] examined how reconfiguration of F-formation arrangements occurred as an effect of a guiding robot, following a predefined trajectory at a museum, rotating either its head, its upper body or its whole body to guide its listeners when talking about exhibits. The results correspond well with Kendon's finding that the orientation of the lower body has a dominant effect on the automatic creation of F-formations. Yamaoka et al. [31] developed a model for how an information-presenting robot should appropriately adjust its position and found that robots that were constrained to an O-space were perceived as more comfortable than robots being close either to the listener or to the object the robot was presenting.


Hüttenrauch et al. [15] studied HRI in real home settings by deploying a service robot and instructing the users to guide the robot around in their own home and to teach the robot rooms, locations and objects. Each trial was videotaped for later analysis, and the study reveals two ways of showing objects. The first one was "park and show", in which the robot was stopped and shown objects that the user fetched; typical for this behavior was that the user would consistently return and show the objects from the same distance and orientation in a vis-a-vis interaction. The other way to show objects was "optimize and show", in which the user carefully maneuvered the robot into a certain location before stepping behind the robot and labeling the object. The study also revealed the users' preferred distances when showing locations (≈ 95 to 110 cm) and objects (≈ 85 to 105 cm) and that numerous narrow passages exist in real homes. Generally, when passing a narrow passage the user would lead at a preferred distance of 154 cm in a vis-a-vis formation. In the study, the measure for a safety breach (the distance at which the robot would stop) was less than 50 cm; this was broken by the users in all homes. None of the users seemed to hesitate to approach the robot that closely.

The instruction-from-behind formation is not found among the F-formations as defined by Kendon. The preferred distances when showing locations and objects are in the far range of what Hall named the personal distance. The distance kept in narrow passages was in the lower range of the social distance. The formation that occurs in the narrow passage shows that one can also distinguish another spatial formation in HRI, which Hüttenrauch calls follow, in which the robot follows the user.

In the outlined material on spatial formations in HRI, we are missing studies on how the quality of interaction with others is perceived by people who are embodied in MRP systems. We are interested in how the perceived quality varies depending on what spatial formations they create during an interaction with a local user of the system. Further, we investigate whether there exist correlations between the perceived social and spatial presence and the choice of spatial formations. To our knowledge, these factors have not been studied before in HRI.

3 Method

The methodology consisted of inviting 21 alarm operators2 to a training session during which they were allowed to use the system to make a remote visit to an elder's home. This visit was presented as an opportunity to train on steering and

2 The alarm operators respond to alarms coming from elderly people who, by pushing a button on a necklace, get in direct contact with the service. The training session was a preparation for participation in a larger study in which the Giraff would be deployed, used and evaluated with two elderly people in a real home, one of whom was using a wheelchair.

using the Giraff. The training session also served the purpose of collecting data via questionnaires and observations.

Eight hypotheses regarding, e.g., expected behavior and existing relations between formations, presence and ease of use were formulated; these are presented in Section 4.

3.1 The Giraff

The MRP system used in this study was the Giraff robot. It consists of a 163 cm robotic base equipped with a wide-angle lens web camera, a microphone, speakers and a screen. The client interface is installed on the pilot's computer and allows the pilot to navigate the Giraff while interacting with the local user. Graphical depictions of the Giraff are given in Fig. 2 (a) and (b). A snapshot of the client interface is shown in Fig. 2 (c). The interface is designed to be used in an intuitive manner and not to require an extensive period of learning. A standard computer running Windows, a pointing device such as a mouse, a headset and a web camera are sufficient. Navigation is done by pointing with the mouse cursor on the real-time video feed received from the Giraff and pushing the left button of the mouse. The speed of the robot depends on how far from the Giraff's current position the pilot is pointing on the video feed and is automatically adjusted when the cursor changes position.
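A minimal sketch of this kind of click-to-drive mapping is given below, assuming a simple proportional relation between click distance and speed; the function, gain and speed cap are illustrative assumptions and may differ from the actual Giraff client implementation.

```python
import math

def drive_command(cursor_x: float, cursor_y: float,
                  max_speed: float = 0.6, gain: float = 0.002):
    """Map a mouse click on the video feed to a (speed, heading) command.

    (cursor_x, cursor_y) is the clicked pixel relative to the robot's own
    position in the image; speed grows with distance from the robot and is
    capped at max_speed (m/s). Heading is the direction of the click in rad.
    """
    distance_px = math.hypot(cursor_x, cursor_y)
    speed = min(gain * distance_px, max_speed)     # farther click -> faster
    heading = math.atan2(cursor_x, cursor_y)       # steer toward the click
    return speed, heading


if __name__ == "__main__":
    # A click far ahead of the robot yields a faster command than a near one.
    print(drive_command(0, 400))   # straight ahead, far
    print(drive_command(0, 100))   # straight ahead, near
```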

3.2 The Experiment

The training session took place in a smart-home environment3. The locality for the training sessions was chosen in order to simulate a close-to-real experience of using the robot in a real home setting. This particular setup simulated an elder residence in which one elder (actor) was sitting in a wheelchair. A graphical overview of the remotely visited home can be found in Fig. 3. There are constraints in the physical space. The kitchen is too small to accommodate both a wheelchair and a Giraff at the same time. The distance between the table and the television set makes it difficult to fit the Giraff beside the wheelchair. These spatial constraints may discourage the creation of certain F-formations.

The alarm operators (pilots), who came individually, were placed in a room where a laptop equipped with a headset and a mouse was installed. Each training session began with informing the participants about the computer and its connected devices and then instructing them to make a visit to a remote home with the Giraff. They were instructed to interact with the elder as if it were a visit to a real home. Further, they were informed that they would be asked to fill in a questionnaire after completing the training session.

3 The environment [27] is located at AASS (Center for Applied Autonomous Sensor Systems), which is a research center at Örebro University, Sweden.


Fig. 2 (a) The Giraff robot from behind (b) A person embodied in the tiltable Giraff screen (c) The Giraff client interface.

Fig. 3 A model of the visited smart-home. The numbers denote the width of door openings and other spaces, for example 1.51 m in the door opening to the bedroom, and the width and depth of the kitchen is 1.8 m. The cylinder represents the Giraff and the office chair represents the wheelchair.

The use of an actor was essential to script the visit and to ensure that the interaction was as similar as possible between the visits from different alarm operators. However, two different actors were used during the training sessions. Snapshots showing different steps in one of the training sessions can be seen in Fig. 4. The procedure used is further detailed in Section 3.3.

3.3 Procedure

The procedure outlined below was used for each visit. Here, we use pilot to denote the alarm operator, elder to denote the actor in the smart home and researcher to denote the researcher who sat beside the pilot and provided technical support in case of difficulties (e.g. if the pilot could not find the docking station or did not know which buttons to press in the Giraff Pilot application).

1. The researcher instructed the pilot to start the Giraff application, log on to the Giraff server and connect to the Giraff, which was facing the wall. See Fig. 4 (a).

2. Once connected, the pilot was instructed to undock the Giraff from the docking station by pushing the buttons Backward and Turn. The pilot was asked to locate the elder. The pilot would find the elder in the bed. No guidance on where to find the elder was given.

3. When the pilot had found the elder, the elder moved over to the wheelchair and asked the pilot to follow to the kitchen (S1). See Fig. 4 (b)-(d).

4. While in the kitchen, the elder started a discussion about a medical issue (S2).

5. The elder then asked for help to find the remote control for the television set. The pilot and the elder moved to the living room (S3) to find it. The pilot would tell the elder that it was lying on the floor between the sofa table and the television set (S4). See Fig. 4 (e).

6. After the pilot had found the remote control, an artificially triggered alarm rang in the bedroom. Depending on the pilot's response, the elder found an appropriate means to conclude the conversation and asked the pilot to return to the docking station. See Fig. 4 (f).

7. The pilot returned to the docking station and disconnected from the Giraff, with help from the researcher if necessary. See Fig. 4 (g).

For most pilots, the script took between 200 s and 300 s, measured from when the pilot had undocked and turned the Giraff until the moment when the elder said bye.

3.4 Data Collection

Throughout the experiment, ten permanently installed web cameras positioned at different locations in the ceiling of the smart-home recorded each of the participants' training sessions. The cameras were configured to capture most parts of the apartment from different angles. A snapshot from one of the videos is found in Fig. 5. The figure shows the Giraff and the elder in the wheelchair from several angles, allowing analysis of different parameters, e.g. F-formations. Video recordings enable repeated and detailed access to the conduct and interaction of participants and, more specifically, the interplay of talk, bodily and material conduct [24]. Recording of video data is an ethical issue. It has been argued that when being filmed, people inevitably react to the camera, rendering the data unreliable [9]. In this experiment, however, the participants were unaware of the fact that the Giraff


Fig. 4 Snapshots from a trial run: (a) undocking, (b) meeting the elder, (c) follow to kitchen, (d) in the kitchen, (e) locate remote control, (f) farewell, (g) docking.

was being recorded on video, so they did not react to being filmed. The video recordings were taken without sound and they do not reveal who is currently embodied in the MRP system. However, the choice of camera configuration came with a sacrifice. The video recordings were not able to capture facial expressions from the pilot or sound from the interaction between the pilot and the elder. This sacrifice limits the possibility to fully understand the interaction and its effects on the chosen spatial formations.

Upon completing the training, the participants were asked to fill in the questionnaire. Two of the sections in the questionnaire assessed the perceived social and spatial presence. Each dimension of perceived presence in the questionnaire consisted of several questions that were to be answered on a Likert scale 1-7, where 1 = not at all and 7 = to a very high degree, except for the questions in the TPI dimension Social richness. Social richness was assessed by asking the pilots to rate their experience in opposite pairs (for example whether they perceived the experience to be Insensitive

or Sensitive) on a scale 1-7. The dimensions Object realism and Person realism originate from the TPI dimension Perceptual realism. The original dimension contained more questions about modalities not available in an MRP system. Thus, a subset of questions from the perceptual realism dimension was used.

4 Expected formations

The participants in this study were faced with a multitude of novelties, such as using a video conferencing system to interact with somebody, steering a robot, meeting a new person and a new environment. These factors, and the fact that video conferencing systems cause a distortion in perception, made us expect that some pilots would not turn the Giraff into a vis-a-vis formation while interacting with the elder.

Two of the situations included in the script enforced a movement of the Giraff together with the elder (S1 and S3), and it was expected that these would lead to situations in


Fig. 5 A snapshot from one of the video recordings showing the different angles of video capture.

which the user of the MRP system would either follow or go ahead of the elder. While Follow and Ahead are not included in Kendon's F-formations, the follow formation has occurred in a previous study in HRI [15]. Thus, it could be imagined that this formation would occur when the elder gave commands of the type "follow me", in which case the pilot would follow the elder. It could also be imagined that an opposite formation would occur when the elder said "show me", in which case the pilot would go ahead of the elder.

As already discussed in Section 3.2, the size of the kitchen would potentially limit the possible spatial formations because it could not accommodate a wheelchair and a Giraff at the same time. In combination with the fact that the elder would discuss a medical issue (S2) with the user of the Giraff, we hypothesized that the user would form a vis-a-vis formation at an appropriate interpersonal distance with the elder. Another space-constraining factor in the apartment was the distance between the table and the television set. We expected that it would be difficult to fit the Giraff and the wheelchair beside each other at this location and hypothesized that an L-shape would occur in S4.

Thus, six patterns of formations were foreseen to occur during the training sessions, as shown in Fig. 6. Three of these are F-formations as defined by Kendon and three are assumed based on the assumptions presented above.

To conclude, the spatial formations were expected to vary throughout the scenario based on the different situations and changes in available space in the scripted scenario. Specifically, we hypothesized that:

Fig. 6 The formations expected in the experiment; 1, 4 and 5 are F-formations: 0. Look-away, 1. Vis-a-vis, 2. Ahead, 3. Follow, 4. L-shape and 5. Side-by-side. The elder is black and the Giraff is grey.

[H1] The user would follow the elder from the bedroom to the kitchen in S1.

[H2] The user would turn the Giraff towards the elder (form a vis-a-vis formation) during the encounter in the kitchen in S2.

[H3] The user would drive ahead of the elder when trying to help the elder find the remote control in S3.

[H4] The user would form an L-shape between the Giraff and the elder when describing the position of the remote control in S4.

The questionnaire used in this study was partly based on two presence questionnaires, the TPI and the Networked Minds, designed for use with different sorts of media. These questionnaires had previously been used in the HRI field. Thus, it was assumed that these questionnaires would be suitable for use in the MRP domain.

[H5] The presence questionnaire used would be suitable for use in the MRP domain.

According to Kendon, as discussed in Section 2, it is common to be co-present when interacting with others. That is, how people orient and space themselves in relation to one another is a reflection of how they may be involved with one another. Thus, it was also hypothesized that:

[H6] Relations between chosen formations and perceived presence would exist.

We further expected that the pilots perceiving the system as being easier to use would orient themselves in suitable formations to a higher degree.

[H7] Relations between chosen formations and experienced ease of use would exist.


When interacting with humans, we feel a different emotional involvement with different people. This could be expected to be the case also when interacting via an MRP system.

[H8] The choice of actor would matter regarding the perceived presence.

5 Results

5.1 Subjects

The users invited to the training session were 21 alarm operators. The alarm operators had all seen an available instruction movie before their arrival4. The average age of the users was µ_age = 42.19, σ_age = 10.34. Only two of the alarm operators were men; therefore no comparison between genders is made in this study. None of the users had previous experience of Skype or similar systems for communicating with or without video feed. On a Likert scale 1-7, where 1 = not at all and 7 = to a very high degree, the experience of using such technologies was µ = 1.90 and σ = 1.67. Thus, there was a dual novelty for the participants in that they lacked experience of both videoconferencing technology and MRP systems.

5.2 Choice of formations based on situations and space

It was hypothesized that the different situations S1-S4, in combination with space constraints, would result in the spatial formations depicted in Fig. 6. To investigate this, the videos were analyzed in several steps. First, each movie was watched and notes were made on when the different steps in the script occurred and how the interaction between the elder and the pilot took place. A number of fields of information had to be filled in for each video. Secondly, all of the 21 video recordings were re-watched and notes were made on how the formations fluctuated during the interactions. These notes were then converted to illustrative graphs showing the fluctuations between different formations.
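As an illustration of how such a per-session formation timeline could be represented and queried, e.g. to check whether the hypothesized formation occurred within a situation's time window, a minimal sketch follows; the data structures, names and example values are our own, and the study's actual coding was done manually from the videos.

```python
from dataclasses import dataclass

# Formation codes as used in Fig. 6.
FORMATIONS = {0: "look-away", 1: "vis-a-vis", 2: "ahead",
              3: "follow", 4: "L-shape", 5: "side-by-side"}

@dataclass
class Interval:
    start_s: float      # seconds since the pilot started driving
    end_s: float
    formation: int      # key into FORMATIONS

def formation_occurred(timeline: list[Interval],
                       situation: tuple[float, float],
                       formation: int) -> bool:
    """True if the given formation overlaps the situation's time window."""
    s_start, s_end = situation
    return any(iv.formation == formation and
               iv.start_s < s_end and iv.end_s > s_start
               for iv in timeline)

if __name__ == "__main__":
    # Invented timeline resembling pilot 1-1: vis-a-vis, then follow during S1.
    timeline = [Interval(55, 78, 1), Interval(78, 100, 3), Interval(100, 183, 1)]
    s1 = (78, 100)   # hypothetical time window for situation S1
    print(formation_occurred(timeline, s1, 3))   # True -> coded as "did as in H1"
```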

To exemplify, two graphs showing the occurrences of formations between two different pilots and the elder are presented in Fig. 7 (Pilot 1-1) and Fig. 8 (Pilot 1-10). The x-axis shows the time elapsed from when the pilot moved the Giraff forward after undocking in Step 2. The y-axis shows the different formations as defined in Fig. 6.

In Fig. 7, Pilot 1-1 found the elder after 55 s. The pilot formed a vis-a-vis formation while interacting with the elder in the bedroom. Thereafter, the pilot followed the elder

4 The original instruction movie for the participants was in Swedish. An English version can be downloaded by clicking on "Giraff User's Guide" at: http://www.oru.se/ExCITE/Part-3/User-manuals/English/.

to the kitchen (S1) after 78 s. Upon arrival at the kitchen after 100 s, the pilot chose a vis-a-vis formation while interacting with the elder in the kitchen (S2). This formation was kept throughout S2, with a short break during which the pilot turned the Giraff around 360 degrees. The pilot followed the elder from the kitchen towards the living room (S3) after 183 s. At 195 s, the pilot formed a quick vis-a-vis formation, after which the pilot continued to lead towards the remote control. The remote control was found (S4) in a side-by-side formation after 207 s. This formation lasted for 13 s, after which a vis-a-vis formation was upheld for 12 s. The elder then went towards the bedroom followed by the pilot, and a new vis-a-vis occurred at 240 s. This formation lasted for 6 s, after which the elder said bye. In this example, the pilot did as hypothesized in H1 and H2.

In Fig. 8, Pilot 1-10 formed a vis-a-vis formation while interacting with the elder in the bedroom. Thereafter, the pilot started following the elder to the kitchen after 43 s (S1) and arrived there after 62 s. The pilot chose a vis-a-vis formation (S2) during the interaction in the kitchen. After 97 s, the pilot chose to go ahead to the living room (S3) in order to find the remote control. The remote control was found after 116 s in an L-shape formation (S4). This formation lasted for 9 s, after which a vis-a-vis was upheld until 172 s, when the elder said bye. In this example, the pilot did as hypothesized in H1-H4.

Fig. 7 An illustration of how the formations fluctuated during pilot 1-1's training session. 0. Look-away, 1. Vis-a-vis, 2. Ahead, 3. Follow, 4. L-shape and 5. Side-by-side. Details on the formations are found in Fig. 6.

To be able to assess whether the pilots acted according to H1-H4 and whether the choice of formation correlated with perceived presence and ease of use, the actually occurring formations for each participant during the situations S1-S4 were coded as a True or False variable. H1-H4 hypothesized about what spatial formations between the pilot and the elder would occur during the different situations S1-S4 in the scripted scenario.


Fig. 8 An illustration of how the formations fluctuated during pilot 1-10's training session. 0. Look-away, 1. Vis-a-vis, 2. Ahead, 3. Follow, 4. L-shape and 5. Side-by-side. Details on the formations are found in Fig. 6.

Table 1 summarizes how many pilots did as hypothesized in H1-H4 and how many did not.

Table 1 Summary of the number of pilots doing as hypothesized or not during the four specified situations. Numbers are given in percentages.

Hypothesis   True (%)   False (%)
H1           86         14
H2           67         33
H3           62         38
H4           62         38

In total, 18 out of 21 pilots chose to follow the elder from the bedroom to the kitchen in S1. The majority of the pilots thus did as hypothesized in H1.

Also, 14 out of 21 pilots chose to form a vis-a-vis formation while communicating in the kitchen in S2. The other seven pilots formed a look-away formation by looking at the fridge or the wall to the right of the fridge. One possible reason is that the Giraff is equipped with a wide-angle lens camera allowing the pilots to see a large field of view; another is the fact that turning the robot around is an effort and requires navigational skills.

It was observed that 13 out of 21 pilots chose to go ahead of the elder on the way to the living room in S3. When leaving the kitchen, the pilot would have to move out of the way of the elder in order to follow. The fact that 8 pilots did not go ahead of the elder implies that the robot may have been more difficult to navigate than what was expressed in the questionnaire. Another reason may be that the pilots were focusing on the interaction with the elder rather than on navigating the robot.

Hypothesis H4 was that an L-shape would occur when finding the remote control for the television set in the living room (S4). 13 out of 21 pilots formed an L-shape, but the other 8 formed a side-by-side shape in the space-restricted area. It

is worth noting that those who formed a side-by-side shape had general difficulties navigating the Giraff, as observed by the researcher. The assumption of an L-shape configuration in this situation seems reasonable and correlates well with the fact that comfortable drivers of the Giraff conform to this configuration.

5.3 The suitability of the presence questionnaires in the MRP domain

While the goal of this study was to investigate whether there are correlations between two measures of quality of interaction, presence and spatial formation, it is still necessary to analyze the applicability of the presence questionnaires in the MRP domain.

The responses on the different TPI dimensions are presented in Table 2. Along with each dimension's mean and standard deviation, the internal consistency Cronbach's α is presented. In general, a Cronbach's α between 0.6 and 0.7 can be considered acceptable, an α between 0.8 and 0.95 as good, and an α higher than 0.95 as high. As can be seen in Table 2, the calculated Cronbach's α value of each dimension is 0.76 or higher, indicating that the internal consistencies of the dimensions are all acceptable or good. This indicates that the TPI questionnaire can be used in the HRI domain with our modifications.
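For reference, Cronbach's α for a dimension can be computed from the item responses as α = k/(k−1) · (1 − Σ item variances / variance of the summed score). Below is a minimal NumPy sketch with invented example responses; the function name and the demo data are our own and do not reproduce the study's data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of Likert scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the dimension
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variance
    total_var = items.sum(axis=1).var(ddof=1)  # variance of each respondent's sum
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


if __name__ == "__main__":
    # Invented 1-7 Likert responses from five respondents on three items.
    demo = np.array([[5, 6, 5], [4, 4, 5], [6, 7, 6], [3, 4, 3], [5, 5, 6]])
    print(round(cronbach_alpha(demo), 2))
```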

Table 2 Summary of the responses on the presence dimensions in the TPI.

Dimension                        µ     σ     α
Spatial Presence                 3.05  1.80  0.76
Social Passive Interpersonal     4.98  1.46  0.84
Engagement (Mental Immersion)    4.99  1.60  0.85
Object Realism                   4.79  1.59  0.78
Person Realism                   5.31  1.37  0.76
Social Richness                  4.64  1.48  0.88

The responses on the different Networked Minds dimensions are presented in Table 3. It should be noted that the choice of scaling on some questions during language translation in the dimensions "Co-presence" and "Attentional engagement", originating from the Networked Minds, unfortunately caused a low reliability when performing the same calculations as for the TPI. Thus, the dimensions as they are used in this study are not the same as the ones in the Networked Minds. See Appendix A for more information.

The Cronbach's α values calculated for the Networked Minds dimensions range from 0.66 to 0.91, which means that the internal consistencies are acceptable or high. The numbers indicate that the Networked Minds questionnaire, with our modifications, can also be used in the MRP domain. It may also be the case that all questions in the dimensions


Table 3 Summary of the responses on the social presence dimensions in the Networked Minds. Psycho-behavioral interaction has four sub-dimensions: Attentional engagement, Emotional Contagion, Comprehension and Behavioral Interdependence.

Dimension                        µ     σ     α
Co-presence                      5.31  1.61  0.87
Psycho-behavioral interaction
  Attentional engagement         5.81  1.08  0.66
  Emotional Contagion            2.96  1.80  0.91
  Comprehension                  4.84  1.58  0.77
  Behavioral Interdependence     4.70  1.81  0.91

"Co-presence" and "Attentional engagement", if prompted correctly, could be used in the MRP domain, but this study cannot confirm that.

Using the Networked Minds scale also brings the opportunity to look for symmetry among the responses, namely whether the pilot perceives that the elder shares the same state of Social Presence. In our experiment, the pilots responded that they perceived the elder to share their own state of Social Presence. This was an unexpected result since the pilots were faced with a multitude of novelties: a video conferencing system, steering a robot, meeting a new person and a new environment.

In this study, the presence dimensions in both the TPI and the Networked Minds show acceptable to high internal consistencies. Thus, the hypothesis H5 - The presence questionnaire used was suitable for use in this HRI setting - is confirmed.

5.4 Relations between formations and perceived presence

F-tests were done to investigate whether there existed relationships between the chosen formations in the situations S1-S4 and all the outlined dimensions of presence. We found differences in the perceived Attentional engagement depending on the spatial formation in S1 and in Comprehension depending on the spatial formation in S4.

All pilots experienced a relatively high Attentional engagement with the elder. However, those who drove ahead of the elder experienced a significantly higher level of Attentional engagement than those who followed the elder in S1; specifically F(1,19) = 6.36, p < 0.05 (µ_ahead = 6.11, σ_ahead = 0.23 and µ_follow = 5.78, σ_follow = 0.21).

The pilots who chose the hypothesized L-shape in S4 experienced a higher Comprehension than the pilots who chose the side-by-side formation; specifically F(1,19) = 4.26, p < 0.05, with µ_L-shape = 5.15, σ_L-shape = 0.96 among the 13 pilots choosing the hypothesized formation and µ_side-by-side = 4.33, σ_side-by-side = 0.75 among the eight pilots who chose the side-by-side formation. See Appendix A for more details on the dimension Comprehension.
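Such two-group F-tests correspond to a one-way ANOVA over the presence scores grouped by chosen formation. A minimal SciPy sketch is shown below; the scores are invented for illustration and are not the study's data.

```python
from scipy import stats

# Invented Attentional engagement scores (1-7 Likert means) grouped by the
# formation chosen in S1; the real per-pilot scores are not published here.
ahead  = [6.3, 6.0, 5.9, 6.2, 6.1, 6.4, 5.8, 6.2, 6.0, 6.3, 6.1, 6.2, 6.0]
follow = [5.6, 5.9, 5.8, 5.5, 6.0, 5.7, 5.9, 5.8]

# One-way ANOVA between the two groups (equivalent to the reported F-tests).
f_stat, p_value = stats.f_oneway(ahead, follow)
print(f"F(1,{len(ahead) + len(follow) - 2}) = {f_stat:.2f}, p = {p_value:.3f}")
```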

Considering that the period of time during which the pilot described the position of the remote control was short (e.g. 13 s for pilot 1-1 in a side-by-side formation and 9 s for pilot 1-10 in an L-shape formation), it is interesting that there exists a significant difference in perceived Comprehension. The numbers indicate that the pilots who formed the hypothesized L-shape were also better able to understand the elder's intentions, thoughts, etc. As pointed out in Section 5.2, the pilots who chose the side-by-side formation in S4 were observed to steer the robot with more difficulty. It may be the case that they were not able to focus as much on the elder and thus not understand the elder's intentions, thoughts, etc. to the same degree.

The majority's choice of formation when describing the position of the remote control, and its relation to Comprehension, is well in line with Kendon. He claims that "In conversations between just two persons, when the topic is disembodied, the arrangement tends to be L-shaped... Typically, when two people greet one another and then continue to talk together on some topic, they can be observed to begin with a face-to-face arrangement and then to shift to an L-arrangement as they move from salutation to talk", pp. 8-9 [18].

To summarize, as hypothesized in H6, there are correlations between what spatial formations the pilots used and how spatially and socially present they felt in the environment and with the elder.

5.5 Relations between formations and perceived ease of use

A summary of the perceived ease of use of the MRP system Giraff is given in Table 4. The pilots were asked to respond to a number of questions such as "How was it to connect to the Giraff?" on a Likert scale 1-7, where 1 = very difficult and 7 = very easy. The users responded on average 5 or higher on all but one of the questions asked regarding the perceived ease of use.

Table 4 A summary of the responses regarding the perceived ease of use of the system. 1 = very difficult, 7 = very easy.

How it was to                                   µ     σ
start the Giraff application                    6.52  0.81
connect to the Giraff                           6.38  0.86
leave the docking station                       6.38  1.07
make a u-turn                                   5.71  1.27
find the person you met                         5.90  0.89
stop                                            6.38  0.86
go backwards                                    5.43  1.36
follow the person you met                       5.62  1.07
help the person in the living room              5.95  0.92
go back and dock the Giraff                     5.86  1.11
finish the call                                 6.29  1.01
hear what the person said                       5.52  1.29
see the person I met                            5.24  1.55
keep an appropriate distance to the wheelchair  4.71  1.49


F-test analysis reveals that the perceived ease of use varies depending on the occurring spatial formations in S1 and S2, but not in S3 and S4.

The pilots who did as hypothesized in S1 answered significantly higher on how easy it was to start the Giraff application, F(1,19) = 4.58, p < 0.05. The mean value for the pilots choosing to follow was µ_follow = 6.67, σ_follow = 0.59, while the value for the pilots choosing to go ahead was µ_ahead = 5.67, σ_ahead = 1.53. It could be the case that pilots who found the interface easier to use were more focused on the task at hand (following the elder to the kitchen).

There is a statistically significant relation between the pilots' choice of positioning in S2 and how easy the pilots thought it was to leave the docking station (F(1,19) = 8.14, p < 0.01) and to make a u-turn (F(1,19) = 9.16, p < 0.01). The pilots who chose the vis-a-vis formation during the interaction with the elder perceived it as easier both to leave the docking station (µ_vis-a-vis = 6.79, σ_vis-a-vis = 0.43 compared to µ_look-away = 5.57, σ_look-away = 1.51) and to make a u-turn (µ_vis-a-vis = 6.21, σ_vis-a-vis = 0.58 compared to µ_look-away = 4.71, σ_look-away = 1.70). As previously discussed in Section 5.2, one of the possible reasons for not positioning the Giraff in a vis-a-vis formation with the elder in S2 was that an extra effort was needed. The pilots who chose the vis-a-vis formation also responded that they perceived it as easier both to leave the docking station and to make a u-turn with the Giraff. For them, steering the Giraff into a vis-a-vis formation may have been less of an effort than for the ones having more trouble navigating the Giraff.

To summarize, as hypothesized in H7, there are correlations between what spatial formations the pilots used and how easy they perceived the Giraff system to be to use.

5.6 Actor effect

Two different actors were used during the training sessions. An analysis of whether or not the choice of actor influenced the pilots' performance and perceived presence was needed. There were in fact several of the assessed variables that differed depending on who the actor was. For example, the average time for the training session was much higher for one of the actors (7 min 22 s compared to 4 min 5 s). Another difference was that the pilots meeting the other actor perceived a higher Emotional Contagion (the degree to which they felt they could affect the other's mental state). This can be compared to how we may influence others' mental states to different degrees in real face-to-face interactions.

6 Conclusions

In this paper, we have demonstrated the use of several tools in evaluating the quality of interaction when using mobile robotic telepresence systems. These tools include, on the one hand, a questionnaire that assesses the perceived presence and ease of use and, on the other hand, theories about spatial formations and how these influence the quality of the interaction. What emerged from this study were correlations between the dimensions measured in the questionnaires and the chosen spatial formations. By scripting certain actions, we designed a scenario where sequences of behaviors were expected, e.g. follow me. As such, it was expected that particular formations should arise during these sequences. The results of this study show that pilot behaviors deviating from the expected formations often correlated with a decreased perception in terms of ease of use. Another reason for deviating behaviors in terms of F-formations may be that communication over a medium is distorted; for example, the camera on the mobile robotic telepresence unit provided the pilot with a large field of view.

The results of this study also show that there exist relations between formations and the perceived presence, which are two measures of quality of interaction. For example, the pilots who formed the expected L-shape when finding the remote control perceived a higher Comprehension during the interaction with the elder. This is an important finding since this formation lasted for a significantly shorter period of time in each training session.

The nature of the interactions via social mobile robotic telepresence is complex. It encompasses human-human interaction, human-robot interaction and human-computer interaction. This study has used tools applicable to human-human interaction and human-computer interaction and evaluated how the correlation of these tools applies to mobile robotic telepresence. Specifically, we have used F-formations, the Temple Presence Inventory and the Networked Minds Social Presence Inventory. The study has shown that these tools are suitable for evaluating mobile robotic telepresence and also that the correlations can give important guidelines on how to better operate the robotic unit in order to increase the quality of interaction. For example, improving the interface in order to allow easier rotation of the robot to change spatial formation could lead to a higher perceived comprehension of the thoughts and intentions of the other and a higher quality of interaction.

In future work, experiments with more pilots could further validate the existence of correlations between spatial formations and perceived presence and their applicability in measuring quality of interaction in human-robot interaction. Herein lies addressing the issue of assessing the perceived co-presence and attentional engagement with more questions. Of interest is also to study how the quality of


interaction as perceived by the elder varies depending on the spatial formations in various situations.

Appendix A Supporting definitions on Presence

The level of co-presence "is influenced by the degree to which the user and the agent appear to share an environment together", p. 5 [5]. The co-presence dimension as used in this study consists of only four questions:

1. I felt that x and I were in the same place.
2. I believe that x felt as if we were in the same place.
3. I was aware that x was there.
4. x was aware that I was there.

The attentional engagement questions "seek to measure the degree to which the users report attention to the other and the degree to which they perceive the other's level of attention towards them", p. 10 [5]. The Attentional engagement dimension as used in this study contains only two questions:

1. I paid attention to x.
2. x paid attention to me.

Comprehension is the degree to which the user and the other understand their respective intentions, thoughts, etc. In total, six questions were asked in this dimension; examples include:

1. I could communicate my intentions to x in a clear way.
2. x could communicate his/her intentions to me in a clear way.
3. My thoughts were clear to x.
4. I could understand what x meant.

Acknowledgements The authors would like to acknowledge Prof. Silvia Coradeschi for her support in the project and Martin Längkvist and Federico Pecora for participating in the experiment. We would also like to thank Rafael Muñoz Salinas, Enrique Yeguas Bolívar and Luis Díaz Más for technical support with the video recording.

References

1. Adalgeirsson, S.O.: Mebot: a robotic platform for socially embodied presence. Master's thesis, Massachusetts Institute of Technology (2009)

2. Adalgeirsson, S.O., Breazeal, C.: Mebot: a robotic platform for socially embodied presence. In: Proceedings of HRI 2010, pp. 15–22 (2010)

3. Argyle, M., Dean, J.: Eye-contact, distance and affiliation. Sociometry 28(3) (1965)

4. Beer, J.M., Takayama, L.: Mobile remote presence systems for older adults: Acceptance, benefits, and concerns. In: Proceedings of HRI 2011, pp. 19–26 (2011)

5. Biocca, F., Harms, C.: Guide to the networked minds social presence inventory. http://cogprints.org/6743/. Accessed 6 November 2011

6. Biocca, F., Harms, C.: Networked minds social presence inventory (scales only version 1.2). Tech. rep., M.I.N.D. Labs, Michigan State University, Michigan, USA (2002). URL http://cogprints.org/6742/. Accessed 6 November 2011

7. Cohen, B., Lanir, J., Stone, R., Gurevich, P.: Requirements and design considerations for a fully immersive robotic telepresence system. In: Proceedings of HRI 2011 Workshop on Social Robotic Telepresence, pp. 16–22 (2011)

8. Giraff: http://www.giraff.org. Accessed 6 November 2011

9. Gottdiener, M.: Field research and video tape. Sociological Inquiry 49(4), 59–66 (1979)

10. Hall, E.T.: The Hidden Dimension: Man's Use of Space in Public and Private. The Bodley Head Ltd, London, UK (1966)

11. Heath, C., Luff, P.: Disembodied conduct: Communication through video in a multi-media environment. In: Proceedings of CHI 1991, pp. 99–103 (1991)

12. Hoffman, L., Krämer, N.C.: How should an artificial entity be embodied? Comparing the effects of a physically present robot and its virtual representation. In: Proceedings of HRI 2011 Workshop on Social Robotic Telepresence (2011)

13. Hornecker, E.: A design theme for tangible interaction: Embodied facilitation. In: Proceedings of ECSCW 2005, pp. 23–43 (2005)

14. Hüttenrauch, H.: From HCI to HRI: Designing interaction for a service robot. Ph.D. thesis, KTH Computer Science and Communication (2006)

15. Hüttenrauch, H., Topp, E., Severinson-Eklundh, K.: The art of gate-crashing: Bringing HRI into users' homes. Interact Stud 10(3), 275–298 (2009)

16. Jouppi, N., Thomas, S.: Telepresence systems with automatic preservation of user head height, local rotation, and remote translation. In: Proceedings of ICRA 2005, pp. 62–68 (2005)

17. Kendon, A.: Conducting Interaction: Patterns of Behavior in Focused Encounters. Cambridge University Press (1990)

18. Kendon, A.: Spacing and orientation in co-present interaction. Lecture Notes in Computer Science 5967, 1–15 (2010)

19. Kristoffersson, A., Coradeschi, S., Severinson-Eklundh, K., Loutfi, A.: Sense of presence in a robotic telepresence domain. In: Proceedings of UAHCI 2011 - Volume Part II, pp. 479–487 (2011)

20. Kuzuoka, H., Suzuki, Y., Yamashita, J., Yamazaki, K.: Reconfiguring spatial formation arrangement by robot body orientation. In: Proceedings of HRI 2010, pp. 285–292 (2010)

21. Lee, M.K., Takayama, L.: "Now, I have a body": Uses and social norms for mobile remote presence in the workplace. In: Proceedings of CHI 2011, pp. 33–42 (2011)

22. Lombard, M., Ditton, T.: A literature-based presence measurement instrument: The Temple Presence Inventory (TPI) (beta). Tech. rep., M.I.N.D. Labs, Temple University, Pennsylvania, USA (2004). URL http://matthewlombard.com/research/P2scales11-04.doc. Accessed 6 November 2011

23. Marshall, P., Rogers, Y., Pantidi, N.: Using F-formations to analyse spatial patterns of interaction in physical environments. In: Proceedings of CSCW 2011, pp. 445–454 (2011)

24. Meisner, R., von Lehn, D., Heath, C., Burch, A., Gammon, B., Reisman, M.: Exhibiting performance: Co-participation in science centres and museums. Int J of Sci Educ 29, 1531–1555 (2007)

25. Michaud, F., Boissy, P., Labonte, D., Corriveau, H., Grant, A., Lauria, M., Cloutier, R., Roux, M.A., Iannuzzi, D., Royer, M.-P.: Telepresence robot for home care assistance. In: Proceedings of AAAI Spring Symposium on Multidisciplinary Collaboration for Socially Assistive Robotics 2007 (2007)

26. QB: http://anybots.com. Accessed 6 November 2011

27. Saffiotti, A., Broxvall, M.: PEIS ecologies: Ambient intelligence meets autonomous robotics. In: Proc of the Int Conf on Smart Objects and Ambient Intelligence (sOc-EUSAI), pp. 275–280. Grenoble, France (2005)

28. Texai: http://www.willowgarage.com/pages/texai/overview

29. Tsui, K.M., Desai, M., Yanco, H.A., Uhlik, C.: Exploring use cases for telepresence robots. In: HRI'11, pp. 11–18 (2011)

30. VGo: http://www.vgocom.com/. Accessed 6 November 2011


31. Yamaoka, F., Kanda, T., Ishiguro, H., Hagita, N.: A model of proximity control for information-presenting robots. IEEE Trans on Robot 26(1), 187–194 (2010)

Annica Kristoffersson is a PhD student at the Center of Applied Autonomous Sensor Systems (AASS) at Örebro University, Sweden. She has an MSc in Media Technology and Engineering from Linköping University, Sweden. Currently, her main activities focus on ExCITE (AAL Call 2), where she focuses on longitudinal evaluation of social robotic telepresence systems. She has been a co-organizer of a workshop on social robotic telepresence.

Prof. Kerstin Severinson-Eklundh is a professor emerita in Human-Computer Interaction at the former IPLab, now the HCI group, at the Royal Institute of Technology (KTH), Sweden. Her previous projects include models for human interaction with mobile service robots, as well as user-oriented development of mobile units for disabled persons in office environments. Recently she has coordinated KTH's effort in the European projects Cogniron (the cognitive robot companion) and CommRob (Advanced behaviour and high-level multimodal communication with and among robots).

Dr. Amy Loutfi is an associate professor at AASS. She has driven large and longitudinal evaluation studies within human-robot interaction, with a particular focus on user perspectives of robots integrated in smart-home environments. Together with ISTC-CNR she has performed a number of cross-culture evaluation studies which compare the views of domestic robotics between elderly people in northern and southern Europe.