
Enabling Human Micro-Presence through Small-Screen Head-Up Display Devices

Scott Greenwald
MIT Media Lab
75 Amherst St.
Cambridge, MA 02139 USA
[email protected]

Mina Khan
MIT Media Lab
75 Amherst St.
Cambridge, MA 02139 USA
[email protected]

Pattie Maes
MIT Media Lab
75 Amherst St.
Cambridge, MA 02139 USA
[email protected]

Abstract
One of the main ways that humans learn is by interacting with peers, in context. When we don't know which bus to take, how to prepare plantains, or how to use a certain app, we ask a nearby peer. If the right peer is not around, we can use a mobile device to connect to a remote peer. To get the desired answer, though, we need to find the right person, and have the right affordances to send and receive the relevant information. In this paper, we define a class of micro-presence systems for just-in-time micro-interactions with remote companions, and explore dimensions of the design space. Informed by theories of contextual memory and situated learning, we present TagAlong, a micro-presence system conceived for learning a second language in context with help from a remote native speaker. TagAlong uses a connected head-mounted camera and head-up display device to share images of the wearer's visual context with remote peers. The remote companion can convey knowledge using these images as a point of reference: images are sampled, annotated, and then sent back to the wearer in real time. We report on a quantitative experiment to determine the effectiveness of using visual cues that involve spatial and photographic elements as a means of "pointing" at things to draw the wearer's attention to them.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s).
CHI'15 Extended Abstracts, April 18–23, 2015, Seoul, Republic of Korea.
ACM 978-1-4503-3146-3/15/04.
http://dx.doi.org/10.1145/2702613.2732846

Author Keywords
Remote Collaboration, Situated Learning, Just-in-Time Information, Micro-Presence

ACM Classification Keywords
H.5.1 [Information Interfaces and Presentation (e.g., HCI)]: Multimedia Information Systems—artificial, augmented, and virtual realities; K.3.1 [Computers and Education]: Computer Uses in Education—collaborative learning, distance learning; H.4.3 [Information Systems Applications]: Communications Applications; H.5.3 [Group and Organization Interfaces]: computer-supported cooperative work, synchronous interaction.


Introduction and related work
In this paper, we conceptually investigate how mobile and wearable devices can facilitate learning through real-time micro-interactions with remote peers, and present the TagAlong system as a concrete instance of a system implementing these ideas. We call systems that aid this type of interaction micro-presence systems. This concept uses wearable computing and augmented reality to reap the benefits of contextual learning and situated learning. In memory theory, it has been shown that information is more easily recalled in context, meaning that we would expect language learners to have an easier time recalling a word or phrase if it was taught to them in context than if they had learned it from a book. One example of learning in context would be learning names for groceries while at the supermarket. In situated learning, learners in communities of practice take on roles and responsibilities that are at first peripheral and simple, and over time grow in importance and complexity. Situated learning was conceived primarily in the context of vocational/practical skills – one seminal work [4] makes reference to midwives, tailors, quartermasters, butchers and more. Bridging the gap between distributed groups and practical knowledge is key: it would allow scarce expert practitioners to disseminate knowledge on a mass scale and raise the quality of practice everywhere. Imagine a world where every second-language learner is constantly accompanied by an expert speaker until maximal proficiency is achieved. A prior example of a near-real-time micro-presence system is VizWiz, which assists blind users in distinguishing objects [2].

Micro-presence: Just-in-time micro-interactions with remote peers
Inspired by the insights above, we conceive of a class of micro-presence systems for just-in-time micro-interactions with remote peers. Before introducing our concrete example, the TagAlong system, we present some background considerations and dimensions of the micro-presence design space.

Information Flow
Schematically, micro-presence systems establish a flow of information back and forth between the wearer and a remote companion. Information sent from wearer to companion allows the companion to establish an awareness of the wearer's context, and thereby (i) infer what information is needed, and (ii) establish a vocabulary of common reference points to which such information can refer. Information sent from the companion to the wearer is just-in-time, contextual information. That is, it tells the wearer something relevant to her current environment and activity.

In general, this information flow can consist of any type of information that the system can capture on the side of the wearer and represent on the side of the companion, and vice versa. Audio, video, sensor and physiological data can all easily be captured by the wearer's device and sent to the companion. Information that passively flows from the wearer's device is referred to as the context stream. The type of feedback signals at the disposal of the companion depends on the capabilities of the wearer's device. In our experimental setup, we look at a wearer-side system that conveys information using a head-up display and visual cues containing photographic and spatial information. Wearing some other kind of output device, such as headphones, a vibration motor, an air-based tactile feedback device, or a muscle stimulation device, would give the companion different channels through which to interact with the wearer.
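To make the two directions of this flow concrete, the sketch below models the context stream and the companion's feedback as two message types. The field names and structures are our own illustrative assumptions, not the actual TagAlong protocol.

```python
# Illustrative sketch of a micro-presence information flow; these message
# types and field names are assumptions for exposition, not the TagAlong
# implementation described later in the paper.
from dataclasses import dataclass, field
from typing import Optional
import time

@dataclass
class ContextFrame:
    """Passively streamed from the wearer's device to the companion (the context stream)."""
    wearer_id: str
    jpeg_bytes: bytes                        # frame from the head-mounted camera
    timestamp: float = field(default_factory=time.time)
    audio_chunk: Optional[bytes] = None      # optional ambient audio
    sensors: Optional[dict] = None           # e.g. location, motion, physiological data

@dataclass
class FeedbackCard:
    """Just-in-time, contextual information sent from the companion to the wearer's display."""
    source_frame_ts: float                   # which context frame the sample was taken from
    crop_box: tuple                          # (x, y, width, height) within that frame
    text_annotation: Optional[str] = None
    sketch_png: Optional[bytes] = None
    audio_note: Optional[bytes] = None
```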


Context Awareness
Context awareness can be a powerful tool, automatically initiating interactions, changing the flow of information, and changing its presentation. For example, a request for presence can be made automatically when a wearer is in motion or is in a specific location. If the companion is also mobile, the wearer can be notified when she is standing still and able to respond to requests. The system may prompt the wearer to review prior information when it detects idleness.
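As one illustration, such triggers can be expressed as simple predicates over the context stream; the thresholds and sensor inputs below are hypothetical and not part of the system described here.

```python
# Hypothetical context-awareness triggers; thresholds and sensor inputs are
# assumptions for illustration, not values from the TagAlong system.
def should_request_presence(speed_m_per_s: float, at_tagged_location: bool) -> bool:
    """Automatically open a request for companion presence when the wearer
    is in motion or at a specific location."""
    WALKING_SPEED = 0.5   # assumed threshold, metres per second
    return speed_m_per_s > WALKING_SPEED or at_tagged_location

def should_prompt_review(seconds_idle: float) -> bool:
    """Prompt the wearer to review prior cards when idleness is detected."""
    IDLE_THRESHOLD = 120.0   # assumed idle time, seconds
    return seconds_idle > IDLE_THRESHOLD
```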

Personal Connection
How personal or impersonal an interaction the system affords determines a great deal about how users will engage, as well as the quality of information that can be obtained. A one-on-one phone call is quite personal, and will be advantageous in cases where, for example, personal or culturally sensitive advice is sought. On the other hand, if the information sought is minuscule and targeted – such as what bus to take – it may not be important or advantageous to make interactions socially intimate. Similarly, if the interaction is many-to-one it immediately becomes less personal, but may benefit from representing the aggregate knowledge of more people – such as a group that votes on your next chess move.

TagAlong
TagAlong is a micro-presence system conceived for aiding users with second-language learning through the help of a remote native speaker. One or many remote companion users provide information for a mobile wearer with Google Glass. This information uses the visual and auditory information captured by the Glass device as its primary point of reference. Specifically, companions can extract rectangular samples from images captured by the Glass, annotate them, and send them back to the wearer.

Annotations can be textual, sketched, or auditory. In this section, we describe the system in detail.

Figure 1: TagAlong wearer (left) and companion (right)

Wearer Affordances
Wait and watch. The wearer can passively receive information the companion decides to send, and can also request information. The information the wearer receives can be graphical and/or auditory. The graphical information may be a sample from a camera image taken from her device, and may also include text or sketch annotations.

Point and pin. To request information about an object or part of a scene, the wearer centers the object in question behind the crosshairs shown on the Glass display, and performs a gesture, which takes a picture, marks a point of interest with a pin, and sends it to the companion. The wearer can also include an audio recording to further specify what information is desired.

Companion Affordances
Once a TagAlong session has begun, a companion can either respond to a pin, or passively monitor images coming from the learner's Glass until he chooses to interject.


Implementation
The system consists of three main components – software for Glass, a Python server, and a companion web client.

Glass software
The Glass software is written in JavaScript to run on the WearScript platform. We used a modified version of WearScript to allow audio sent from the companion to be saved and played back.

Python server
The Python server mediates communication between Glasses and companion web clients. It connects to both using WebSockets. Images sent from the Glasses to the server are saved, and the web clients are given URLs from which they can be requested over HTTP.
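The sketch below illustrates this relay pattern in Python, assuming the third-party `websockets` and `aiohttp` libraries; the ports, endpoints, and message format are our own assumptions and do not reproduce the actual TagAlong server.

```python
# Minimal sketch of a relay server in the spirit of the one described above.
# Ports, paths, and the assumption that each Glass message is a raw JPEG
# are illustrative, not details of the TagAlong implementation.
import asyncio, json, os, uuid
from aiohttp import web
import websockets

IMAGE_DIR = "images"
os.makedirs(IMAGE_DIR, exist_ok=True)
companions = set()   # currently connected companion web clients

async def glass_handler(ws):
    """Receive JPEG frames from a Glass, save them, and notify companions with a URL."""
    async for frame in ws:                       # each message assumed to be raw JPEG bytes
        name = f"{uuid.uuid4().hex}.jpg"
        with open(os.path.join(IMAGE_DIR, name), "wb") as f:
            f.write(frame)
        notice = json.dumps({"image_url": f"http://localhost:8080/images/{name}"})
        await asyncio.gather(*(c.send(notice) for c in companions))

async def companion_handler(ws):
    """Track companion connections; annotated cards would be forwarded to Glasses here."""
    companions.add(ws)
    try:
        async for card in ws:
            pass  # forwarding cards to Glass connections is omitted in this sketch
    finally:
        companions.discard(ws)

async def main():
    # Serve saved images over HTTP so companion clients can fetch them.
    app = web.Application()
    app.router.add_static("/images/", IMAGE_DIR)
    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, "localhost", 8080).start()

    # Separate WebSocket ports for Glasses and companions (an assumption).
    async with websockets.serve(glass_handler, "localhost", 8765), \
               websockets.serve(companion_handler, "localhost", 8766):
        await asyncio.Future()   # run forever

if __name__ == "__main__":
    asyncio.run(main())
```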

Companion web client
The companion web client is written in HTML5 to run both on desktop clients and on mobiles. The client has two main views: the Image Selector, which allows the companion to choose from a selection of images, and the Card Composer, where cards are annotated and sent.

Pointing with small-screen HUDs
In order to enable learning that is situated and in context, companions need to be able to indicate or "point" at objects in the environment of the wearer, in order to give information about them or in reference to them (see also [1, 3]). Here we investigate the effectiveness of one method of doing this, approaching it in practical terms with hardware and networking capabilities that are widely available today.

Summary of setup and variables
In our experiment we use a large 4K LCD touch screen to display the "environment" in front of the user, while simultaneously displaying an image on a Google Glass device. The screen is placed at roughly the same distance as the focal plane of the head-mounted display, to minimize the time necessary for the user to refocus his eyes between the head-mounted display and the screen. We measure the time it takes the user to find and touch the object in question.

We measure the effect of several variables on the time to complete the task. In the environment stimulus, we vary the size and location of objects, as shown in Figure 2. We hypothesize that smaller objects will be more difficult to find, as will peripheral objects compared with central ones. In the feedback stimulus, we vary the resolution of cropped images as well as the presence or absence of a mini-map. Mini-maps represent the relative position of the cropped image in the wearer's environment, as shown in Figure 3. Due to limited camera resolution, small objects must be presented in low resolution. If it is the case that smaller objects take longer to find, we wish to separate the effects of low-resolution presentation from object size in the wearer's optical field of view. Trials with and without a mini-map test the hypothesis that a mini-map will be particularly helpful for finding small, peripheral objects.
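For intuition, placing the mini-map indicator amounts to scaling the crop's bounding box from full-frame coordinates into a small overview rectangle. The sketch below is our own illustration of that geometry; the frame and mini-map dimensions are assumptions, not values from the experiment.

```python
# Illustrative mini-map geometry: scale the crop's bounding box from the
# full camera frame into a small overview rectangle on the HUD. All
# dimensions are assumed for illustration.
def minimap_rect(crop_x, crop_y, crop_w, crop_h,
                 frame_w=1280, frame_h=720,       # assumed camera frame size
                 map_w=128, map_h=72):            # assumed mini-map size in HUD pixels
    """Return the crop's position and size scaled into mini-map coordinates."""
    sx, sy = map_w / frame_w, map_h / frame_h
    return (round(crop_x * sx), round(crop_y * sy),
            round(crop_w * sx), round(crop_h * sy))

# Example: a 200x200 crop from the top-right of a 1280x720 frame maps to a
# small rectangle in the top-right of the 128x72 mini-map.
print(minimap_rect(1000, 50, 200, 200))   # -> (100, 5, 20, 20)
```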

Results and discussion
We had eight participants (N=8) in this experiment, and each performed 90 target acquisition tasks. We used a within-subjects design for size, location, and resolution (30 images each per user), and a between-subjects design for mini-maps (three participants with, five without). Data for two participants was incomplete due to technical difficulties on the system side.


Figure 2: Target sizes and location. (a) Stimulus screen, (b) crop size, (c) crop location.

Figure 3: Screen and HUD stimuli. (a) Stimulus screen, (b) crop, (c) mini-map, (d) HUD screen with mini-map.

The strongest expected results were (i) the inverse linear relationship between image size and time to acquire the target, and (ii) the longer time needed for peripheral vs. central targets. The most interesting unexpected result was that mini-maps sped up acquisition for medium and small images, but slowed participants down on large images. Participants using a mini-map appeared to require a certain fixed time to interpret the map, regardless of the size of the target.

We interpret the large variance in our measurements to be a result of variations in image qualities that were not controlled for (e.g., low/high contrast, presence of repetitive elements). On the whole, these other variations have an influence of a magnitude comparable to that of the variables we did control for.

Figure 4: Results varying (a) image size, (b) region, and (c) resolution.

Conclusion
We presented a framework, based on wearable computing and augmented reality, for designing micro-presence systems, in which the wearer is mobile and the companion is remote. It covers human factors as well as considerations related to system affordances. We introduced TagAlong, an end-to-end micro-presence system using Google Glass, that allows images annotated with text, sketches, and audio to be sent from the companion to the wearer. We showed through our experiment that our simple, robust, low-overhead system provides the core capability of establishing a correspondence between digital and physical information. This system, and the whole class of future micro-presence systems, has potential benefits related to situated learning, contextual memory, and social motivation.

Future Work
Pointing in motion. We have explored the question of how to point at objects in scenes while the device wearer is stationary. Given that the ultimate purpose is to use devices "in the wild," we need to explore methods for pointing while the user is mobile. A technique could either assist the user in returning to a location where the object is visible, or provide enough additional information that the reference can be useful without such physical backtracking.

Mobile learning with cognition awareness. Can we quantify and model the wearer's attention in such a way as to optimize her learning? Is this best accomplished by giving the companion the decision of when to interject, or by using the system to regulate the delivery of information to the wearer?

Social presence and motivation. As micro-presence use cases are explored further, where a remote human is involved in generating or mediating situational, real-time, or just-in-time information, questions related to social presence will be important. When are there privacy concerns and how can they be dealt with? In what cases is remote presence attractive or motivating for the companion or the wearer – e.g. by creating a feeling of social closeness while sharing an experience? Does the addition of other data streams, such as audio, motion, or physiological signals, enhance this effect?

Acknowledgements
We thank Blake Elias, Cory Kinberger, and Christian D. Vazquez for their work developing the TagAlong system. Special thanks to Chris Schmandt for his helpful suggestions and feedback.

References
[1] Baudisch, P., and Rosenholtz, R. Halo: A technique for visualizing off-screen objects. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '03, ACM (New York, NY, USA, 2003), 481–488.
[2] Bigham, J. P., Jayant, C., Ji, H., Little, G., Miller, A., Miller, R. C., Miller, R., Tatarowicz, A., White, B., White, S., and Yeh, T. VizWiz: Nearly real-time answers to visual questions. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, UIST '10, ACM (New York, NY, USA, 2010), 333–342.
[3] Funk, M., Boldt, R., Pfleging, B., Pfeiffer, M., Henze, N., and Schmidt, A. Representing indoor location of objects on wearable computers with head-mounted displays. In Proceedings of the 5th Augmented Human International Conference, AH '14, ACM (New York, NY, USA, 2014), 18:1–18:4.
[4] Lave, J., and Wenger, E. Situated Learning: Legitimate Peripheral Participation. Learning in Doing: Social, Cognitive and Computational Perspectives. Cambridge University Press, 1991.
