Systems for Crowdsourcing and Collaboration in Existing Interfaces

Kyle I. Murray

Department of Computer Science

University of Rochester

May 2012

Abstract

Many existing computer interfaces have been designed for use by a single user.

However, there are many situations in which users of these single-user interfaces can

benefit from additional or complementary input to the interface from more people.

We introduce four systems which allow for collaboration and crowdsourcing across a

wide variety of single-user interfaces in popular use today: (i) Legion,

a system which allows end users to easily capture existing GUIs and outsource them

for real-time control by the crowd; (ii) WeGame, a system that allows small groups to

more flexibly and collaboratively operate existing single-player games; (iii) Automatic

Function Definition, a system for integrating crowdsourced and collaborator-defined

subroutines into existing software development interfaces; and (iv) Multiverse, a system

that allows crowd algorithms to be applied to existing interfaces.

1 Introduction

Many existing computer interfaces have been designed for use by a single user. However, there

are many situations in which users of these single-user interfaces can benefit from additional

or complementary input to the interface from more people. These additional human sources

of input can be split into two categories: collaboration and crowdsourcing. Systems with

interfaces designed for a single user typically require substantial one-off programming effort

to support any kind of collaboration or crowdsourcing because the input space is limited

to that which a single user is typically able to provide, such as a single mouse pointer and

keyboard, or single videogame controller.

We introduce four systems which allow for collaboration and crowdsourcing across a

wide variety of single-user interfaces in popular use today: (i) Legion [12], a

system which allows end users to easily capture existing GUIs and outsource them for real-

time control by the crowd; (ii) WeGame, a system that allows small groups to more flexibly

and collaboratively operate existing single-player games; (iii) Automatic Function Definition

(AFD) [15], a system for integrating crowdsourced and collaborator-defined subroutines into

existing software development interfaces; and (iv) Multiverse, a system that allows crowd

algorithms to be applied to existing interfaces.

In this paper, we will consider collaboration to be actions that involve people with a

professional or familiar relationship with the original user of the computer system who are

working to achieve a common goal state in the computer system being controlled. We will

consider crowdsourcing to be the ephemeral use of a large, readily available group of people

with the potential to be motivated to act collectively on behalf of the original user of the

system. Typically, recruiting for crowdsourcing takes place over the internet.

We focus on supporting existing interfaces rather than creating novel interfaces for three

primary reasons: (i) Familiarity: our systems involve users who are recruited to join the

original user, so using existing interfaces greatly increases the likelihood that these new

users will already be familiar with the interface; (ii) One-off programming effort: our systems

are designed to apply to many common interfaces, games, or programming environments,

so they require little, if any, new programming effort to support a new interface;

(iii) Relevance: users of existing interfaces do not need to be convinced to use a new program

or interface in order to benefit from our system, so they can employ our systems on the tasks

and interfaces they care about.

In summary, our contributions are the following:

• We articulate the idea of enabling crowdsourcing and collaboration on existing inter-

faces.

• We describe techniques for building systems general enough to support swaths of

popularly-used existing interfaces.

• We implement our systems and describe the important, enabling components of each.

2 Background

2.1 Crowdsourcing

Individuals have long been able to control graphical computer interfaces in real-time re-

motely; systems such as the X Window System [18] and Virtual Network Computing (VNC)

[16] let a user control the existing input devices such as mice and keyboards by redirecting

input to the remote machine while showing the remote machine's window(s) in a window on

the local machine. These systems differ from Legion and Multiverse in that they were not

designed to facilitate multiple users controlling a single system collaboratively.

The idea of using a crowd within an existing interface was pioneered by Soylent [1].

Soylent tasked crowd workers to rewrite sentences from a text document so that their length

could be shortened without changing the meaning of the sentence. It introduced the crowd

algorithm find-fix-verify to ensure that the crowd workers knew what type of problems to look

for, how to correct the problems, and also how to ensure that other workers had completed

the task satisfactorily. The system was packaged into the word processor Microsoft Word,

and required a significant programming effort that was specific to implementing Soylent

within Word.

In addition to the redundancy pattern used in Soylent's find-fix-verify, other crowdsourcing

systems use crowd algorithms that involve patterns to try to ensure quality, such as

iteration [13], or laying and task decomposition [10, 11]. Multiverse allows many of these

algorithms to be applied at the higher level of abstraction of the virtual machine, which

allows them to be applied to existing interfaces.

2.2 Collaborative Gaming

Multiplayer games have involved collaboration since before the advent of computer systems

or written history. Videogames on videogame consoles and in arcades have long included

multiple controllers so that players can control independent components of games such as

avatars. However, the collaborative control of a single in-game object has not been explored

in-depth. Prior work by Maynes-Aminzade et al. [14] and Carpenter [4] has studied control

of a single object by an audience of spectators who provided a single, low-bandwidth form

of input such as a colored marker held up to indicate an up or down direction to be read

by a computer vision system and interpreted as input. These systems involved significant

game- and environment-specific programming efforts and did not handle more complex input

schemes used by popular modern games, such as single- or dual-analog joystick input.

Prior work has considered the ways in which game players cooperate in games. In partic-

ular, El-Nasr et al. [5] identified cooperative patterns used by players in games meant to be

played collaboratively. This work provides a useful framework for evaluating the effectiveness

of a system that provides techniques for collaborative input. Its evaluation of cooperative

games provides some evidence for the potential benefits of increasing chances for cooperation

in games.

2.3 Crowdsourced Programming

Crowdsourcing, broadly defined, can be seen as a common source of computer programs.

Open source project contributors can often be construed as a crowd, as such projects often

have no particular rules about who participates, or the quality or size of an individual's

contribution. A representative open source community is GitHub [7], which encourages users

to fork others' code without even having to contact the original author. Kickstarter [9]

provides project-level structure and solicits crowdsourced funding of projects including software

development. Automatic Function Definition differs in that it is exposed in the interface of

the development environment and can be invoked while programming. It crowdsources code

at the level of the function, method, or subroutine rather than at the scope of a project.

Prior work has examined collaboration and code outsourcing with a software development

environment. Goldman et al. developed Collabode [8], a collaborative coding environment.

They included a system for masking errors generated by collaborators' changes to the shared

codebase, and proposed the idea of micro-outsourcing, which uses a new interface specifically

built for communicating with collaborating programmers. Micro-outsourcing involves giving

natural language instructions to a collaborator to do a task whose scope is also defined by

natural language. Automatic Function Definition differs from micro-outsourcing in several

ways: (i) AFD enforces syntactic structure for external contributions, so a contribution

must be a function or other grammatical structure which can be automatically verified be-

fore its inclusion; (ii) Collabode uses collaborators as the only source of code contributions,

while AFD uses collaborators, crowdsourcing, and code search; (iii) AFD uses the existing

autocomplete interface, which is familiar to programmers using modern development

environments.
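
The syntactic check described in point (i) can be illustrated with a minimal sketch. It is written in Python with the standard `ast` module purely for illustration; AFD itself targets JavaScript, and the function name `is_valid_contribution` and its parameters are hypothetical rather than taken from AFD.

```python
import ast


def is_valid_contribution(code, expected_name, expected_arity):
    """Accept a contribution only if it is exactly one function definition
    with the requested name and number of positional parameters.
    Illustrative check only; not AFD's actual verification logic."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Contributions that do not parse are rejected outright.
        return False
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        return False
    function = tree.body[0]
    return (function.name == expected_name
            and len(function.args.args) == expected_arity)
```

A check of this kind lets the environment reject free-form text or partial snippets before they reach the requesting programmer's codebase.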

3 Legion

Legion is a system that allows end users to easily capture existing graphical user interfaces

and outsource them for collaborative, real-time control by the crowd. End users initiate

the system by first picking a rectangular portion of their desktop interface which contains the

window or windows they would like the crowd to control. They then write a natural language

description of the task that they want the crowd to perform; each member of the crowd will

see this description. They then choose a price (in cents) that they are willing to pay the

crowd for having completed the task.

When the user submits the task details, Legion immediately forwards a video feed of the

selected portion of the desktop to the group of workers that Legion has begun recruiting. We

use Amazon's Mechanical Turk service as the source of crowd workers and use a technique

similar to quikTurKit by Bigham et al. [2], which quickly recruits a pool of workers and

attempts to retain a pool of the same size throughout the duration of a task.

Figure 1: Legion is a framework that allows existing interfaces to be outsourced to the crowd. In this example, a user has outsourced control of her Rovio robot. The Legion client allows end users to choose a portion of their screen to send to crowd workers, sends a video stream of the interface to the server, and simulates events (key presses, mouse clicks) when instructed by the server. The Legion server recruits workers, aggregates the input of multiple crowd workers using flexible input mediators, and forwards the streaming video from the client to the crowd workers. The web interface presents the streaming video, collects worker input (key presses and mouse clicks), and gives workers feedback.

The workers each see the same video feed of the end user's desktop interface, and are

given instructions on how to control it. Their keypresses and mouse clicks are captured and

sent to the Legion server, which runs one of several types of input mediators. The purpose

of the input mediators is to decide which of the many near-simultaneous inputs from the

crowd workers to accept. There are numerous trade-offs to the different mediators, which are

described in detail in Lasecki et al. [12]. The mediator named Leader, for example, measures

the level of agreement between the different workers' input over time and periodically elects

the worker with the highest agreement to be the leader, whose input is forwarded directly to

the end user's machine. The live interface video has an overlay which indicates various facts

about the performance of a worker, such as whether or not the worker has direct control at

the moment and what level of crowd agreement bonus the worker will receive for agreeing

with the input of the others.
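
The Leader mediator described above can be sketched as follows. This is a minimal illustration in Python, assuming that agreement is the fraction of time steps in which a worker's input matched the most common input for that step; the agreement measure Legion actually uses is described in Lasecki et al. [12], and the names `agreement_scores`, `elect_leader`, and `mediate` are hypothetical.

```python
from collections import Counter


def agreement_scores(history):
    """history: list of dicts, one per time step, mapping worker id -> input.
    Returns worker id -> fraction of that worker's steps in which the
    worker's input matched the plurality input (an assumed measure)."""
    matches = Counter()
    seen = Counter()
    for step in history:
        if not step:
            continue
        plurality, _ = Counter(step.values()).most_common(1)[0]
        for worker, value in step.items():
            seen[worker] += 1
            if value == plurality:
                matches[worker] += 1
    return {worker: matches[worker] / seen[worker] for worker in seen}


def elect_leader(history):
    """Pick the worker with the highest agreement; ties go to the
    smallest worker id so that the choice is deterministic."""
    scores = agreement_scores(history)
    if not scores:
        return None
    return max(sorted(scores), key=lambda worker: scores[worker])


def mediate(history, current_inputs):
    """Forward only the elected leader's current input, if any."""
    leader = elect_leader(history)
    return current_inputs.get(leader)
```

In this sketch the election would be re-run periodically over a recent window of `history`, so that control can pass to another worker when the current leader stops agreeing with the crowd.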

The end user of Legion can decide when the crowd must stop controlling the user's computer.

At any time during the task, the user is not restricted from providing input as usual,

directly from the physical input devices typically used to control an interface. They can

also decide whether or not the crowd completed the task as described, which determines

payment.

3.1 Implementation

Legion achieves its flexible applicability to existing interfaces in several ways. First, it uses

operating system level routines such as the Application Services framework in Apple's OS X

to simulate keyboard and mouse input events. These events reach all applications unless the

application is using a rare, hardware-specific feature such as a non-standard media key on

the keyboard.

The graphical user interface on the end user's machine is forwarded to the workers in

several steps. The first is to use the CamTwist library [3] to record the appropriate screen

rectangle and make it available to the operating system as a video source. This video source

is forwarded using the Real-Time Media Flow Protocol [17], a protocol for streaming live

video to web clients with low latency. The video is first forwarded to the central Legion

server from the end user, and then to the individual workers as they connect. In contrast to

technologies like VNC [16], RTMFP has a typical latency low enough to support real-time,

closed-loop control of remote interfaces. Because GUIs are graphical by nature, and because

CamTwist simply forwards these graphics, we are able to support essentially any graphical

interface with this combination of streaming and input events.

Figure 2: WeGame allows small groups of players to operate a game that was originally designed as a single-player game. This diagram illustrates the WeGame overlay that shows each player the current state of their controller. The stars to the left of each player name indicate how many of the other players have voted for that player to have additional control influence.

4 WeGame

WeGame is a system that allows small groups to more flexibly and collaboratively operate

existing single-player games. Users of the system (players) start using the system by plugging

in a controller for each player. They then select an existing game on their computer which

would normally be played with a single controller. When the game starts, a new overlay

appears which shows graphical representations of the current state of each of the controllers

that is plugged into the system. Alongside these controllers, a list of names of the players is

shown.

Like Legion [12], WeGame uses input mediators to determine which final input is actually

forwarded to the unmodified game. In order to better support collaborative control as

opposed to crowd control, WeGame has a different set of input mediators. Several of the

mediators use a special form of voting input from the players. Each of the players is associated

with a particular color, and they control a star that can be toggled between each of the players'

names on screen. The effect of moving the star is to vote for that player to temporarily gain

a degree of control over the input. Depending on the mediator, this could be full control or

a percentage weight in an average of the rest of the players.
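
A vote-weighted mediator of the kind described above might look like the following minimal Python sketch. It assumes that each player's analog stick is reported as an (x, y) pair in [-1, 1], that the player holding the most stars receives a fixed share `leader_weight` of the final input, and that the remaining players split the rest equally. The function name and the weighting scheme are illustrative assumptions, not WeGame's actual mediators.

```python
def weighted_stick(sticks, votes, leader_weight=0.5):
    """sticks: player -> (x, y) stick position, each axis in [-1, 1].
    votes: voter -> player the voter's star currently points at.
    Returns the single synthesized (x, y) forwarded to the game."""
    players = sorted(sticks)
    if len(players) == 1:
        return sticks[players[0]]
    # Count stars per player; votes for unknown players are ignored.
    tally = {player: 0 for player in players}
    for choice in votes.values():
        if choice in tally:
            tally[choice] += 1
    # Ties go to the alphabetically first player (max keeps the first maximum).
    leader = max(players, key=lambda player: tally[player])
    others = [player for player in players if player != leader]
    rest_weight = (1.0 - leader_weight) / len(others)
    x = leader_weight * sticks[leader][0] + sum(
        rest_weight * sticks[p][0] for p in others)
    y = leader_weight * sticks[leader][1] + sum(
        rest_weight * sticks[p][1] for p in others)
    return (x, y)
```

Setting `leader_weight=1.0` corresponds to the full-control case mentioned above, while smaller values blend the elected player's input with everyone else's.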

While the input is being mediated, the rest of the game continues as usual, with the

exception that what was once a single-player game is now being controlled collaboratively

by numerous players. Players are free to implement their own social rules on top of the rules

imposed by the mediators. For example, they may agree to vote on a new leader after each

in-game death, or after each in-game level is completed.

4.1 Implementation

WeGame's implementation is designed to support the typical multiplayer gaming scenario

in which a group of two to four players sit together on a couch and face a large television.

Typically, these games use a controller with one or two analog joysticks, plus a number of

face buttons and shoulder triggers. We chose Sony's DualShock 3 controller as it supports

both USB and Bluetooth. We chose Microsoft Windows as the supported operating system

due to its popularity among PC game makers.

A key component of WeGame is its use of VMulti, a human interface device driver for

Windows which simulates DirectInput devices such as joysticks, mice, keyboards, multitouch

input, and digitizer pens [20]. WeGame reads the state of each controller via USB, and then

forwards that input to our input mediation server and to our screen overlay. After the server

performs its mediation, it sends the output of a single, synthesized controller to VMulti.

The entire process happens within tens of milliseconds, so the system effectively maintains

the normal level of interaction and latency between the controller and the game.

5 Automatic Function Definition

Automatic Function Definition (AFD) [15] is a system for integrating crowdsourced and

collaborator-defined subroutines into existing software development interfaces. Specifically,

it is activated by the existing autocomplete interface feature found in many common devel-

opment environments. While existing autocomplete implementations typically only provide

name completion for existing function names, AFD uses the same interface as a way to

specify the name and function signature of a yet-to-be-defined function.

The typical autocomplete list is depleted when the user enters the name of a function

that does not exist. In AFD, however, there is always at least one item on the autocomplete

menu: define. When this item is selected, AFD takes the name of the function, and any

surrounding context such as arguments passed and any nearby natural language comments,

and packages them into a request for a definition.
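
The packaging step can be illustrated with a small sketch. It is written in Python for illustration only (AFD was implemented for JavaScript), it uses a simple regular expression in place of real code analysis, and the function name `build_definition_request` and the field names of the returned request are hypothetical.

```python
import re

# Matches a simple call such as `sortByLength(words, true)`; nested calls
# inside the argument list are deliberately not handled in this sketch.
CALL_RE = re.compile(r"([A-Za-z_$][\w$]*)\s*\(([^()]*)\)")


def build_definition_request(source_lines, line_index):
    """Collect the undefined function's name, the arguments passed at the
    call site, and any `//` comments directly above it."""
    line = source_lines[line_index]
    match = CALL_RE.search(line)
    if match is None:
        return None
    arguments = [a.strip() for a in match.group(2).split(",") if a.strip()]
    comments = []
    i = line_index - 1
    # Walk upward over the contiguous block of comment lines, if any.
    while i >= 0 and source_lines[i].strip().startswith("//"):
        comments.insert(0, source_lines[i].strip()[2:].strip())
        i -= 1
    return {
        "name": match.group(1),
        "arguments": arguments,
        "comments": comments,
        "call_site": line.strip(),
    }
```

A request of this shape gives a crowd worker or collaborator the intended name, an idea of the expected parameters, and the programmer's own description of the desired behavior.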

The definition sources provided by AFD are as follows: (i) crowdsourcing, (ii) collab-

orators, and (iii) automatic code search. Refer to Murray and Bigham [15] for details on

automatic code search in AFD. Workers in the crowdsourcing service have two options: they

can either write the code themselves, or perform manual code search themselves to find a def-

inition on the web. Collaborators have the same two options available, but their motivation

as a collaborator is different than the typical monetary motivation of a crowd worker. Once

the worker or collaborator finishes their definition, they can send it back to the environment

of the original end user and it will be transparently inserted into the codebase.

5.1 Implementation

We implemented AFD in a web-based JavaScript editor. Code analysis necessary for the

autocomplete feature was performed using Vardoulakis and Shivers' CFA2 algorithm [19], a

control flow analysis for JavaScript which performs type inference and supplies an annotated

syntax tree of a given input program. Requests for function definitions are presented in a

Figure 3: End users encapsulate tasks for the crowd in virtual machines that are then replicated on the Multiverse server and controlled by crowd workers via a web-based VNC connection. The server implements crowd algorithms on top of these virtual machines to ensure reliability.

web interface that communicates with the end user's editor.

6 Multiverse

Multiverse is a system that allows crowd algorithms to be applied to existing interfaces. It

reduces the one-off programming efforts previously required to implement crowd algorithms

in existing interfaces. Multiverse implements existing crowd algorithms by encapsulating a

user's application state into a virtual machine. It clones virtual machines to support multiple

workers working on the same part of the task at once. These clones are redundant, and

Multiverse supports a voting mechanism to decide which of the clones completes the task

in the most satisfactory way. In addition to just cloning, Multiverse also supports a robust,

between-VM copy and paste operation. Copying and pasting allows the work of multiple

workers to be merged. In effect, this supports a broad subset of what would be achievable if

it were feasible to merge two whole virtual machines together. This functionality allows for

the merging (reduce) step in task decomposition algorithms like CrowdForge [10].
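
The clone-then-vote pattern described above can be sketched in a few lines of Python. In this illustration a virtual machine is stood in for by an ordinary dictionary, cloning is a deep copy, and workers and voters are plain functions; all names are hypothetical, and no real virtualization API is involved.

```python
import copy
from collections import Counter


def redundant_step(state, workers, voters):
    """Run one redundant step of a task.

    state:   the starting "VM" state (any deep-copyable object).
    workers: functions that take a private clone and return their result.
    voters:  functions that take the list of results and return the index
             of the result they judge best.
    Returns the winning result, which would seed the next step."""
    # Each worker gets an independent clone, so their edits cannot collide.
    clones = [copy.deepcopy(state) for _ in workers]
    results = [work(clone) for work, clone in zip(workers, clones)]
    # Plurality vote over the candidate results.
    ballots = Counter(vote(results) for vote in voters)
    winner, _ = ballots.most_common(1)[0]
    return results[winner]
```

Because every worker operates on a private clone, the original state is left untouched and a poor result can simply be discarded by the vote.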

To use Multiverse, a user initializes a virtual machine into a desired starting state. For

example, a user may open Adobe Photoshop and start an image editing session that she

Figure 4: Results of an example run of Multiverse. The task has been divided into three steps. Three crowd members perform each step separately in a private VM, and then other crowd workers vote on the best, which is carried on to the next stage. Steps shown in the figure: Step 1: Find (Find a close-up image of a flower on flikr.com and load it in GIMP.); Step 2: Vote; Step 3: Crop (Crop the image so that only the petals and top of the stem are visible.); Step 4: Vote; Step 5: Label (Consult Wikipedia to label the petals, pistil, stamen, pedicol, and bract.); Step 6: Vote Final.

plans to have the crowd complete. She then opens the Multiverse interface and writes a

short description of the steps of the task. Multiverse supports steps that require redundancy

to get good crowd output as well as steps that require creative and unique work on the part

of the crowd worker. Furthering our example, our user may want the crowd to find a good

photo of a flower off of the web, remove the background leaving just the flower, and then label

the parts of the flower to make a diagram. This example involves several voting steps where

the performance of the crowd can be evaluated. In another example, our user may want to

have the crowd author a Microsoft PowerPoint presentation. To use Multiverse in this case,

he will follow the same steps except that he will specify a copy-and-paste merge step, where

he asks the crowd to merge the slides that had been created by individual workers into one

presentation. At the end of the process, the end user is left with a single suspended virtual

machine which can be resumed and used just as though he had done the work himself.

6.1 Implementation

Multiverse takes advantage of the ubiquity of virtualization on modern computers to achieve

its broad compatibility with crowd algorithms and existing interfaces. Consumer-level vir-

tualization is becoming prevalent, with programs such as VMware Workstation [21] being

Figure 5: The screenshot on the left shows the end user interface for Multiverse as a task is in progress. Each step shows the latest screenshot from each worker branch underneath the instruction that the worker is given. The screenshot on the right shows the voting interface for the task started in the left screenshot. Workers are shown full screenshots of the interface and asked to vote on which previous worker better completed the instructions.

marketed directly to regular computer users. Our implementation uses VMware Worksta-

tion virtual machines as the basis for initialization. VMware Workstation includes built-in

support for VNC [16] connections, which are used to give the crowd workers full control over

their individual copies of the end user's original. The web interface uses a web-based VNC

client [6].

Multiverse manages the lifecycle of the virtual machines according to the form that the

end user fills out to specify the steps of their task. We use TurKit [13] to post control,

voting, and merging tasks to Amazon's Mechanical Turk service. Voting tasks present workers

with screenshots of the control or merging task given to previous workers. Voters see the

instruction the previous workers were given alongside a screenshot of the virtual machine

right before the previous task was submitted; they are asked to choose the virtual machine

which appears to have the best work.

We implemented the merging steps by first enabling a shared clipboard between concur-

rently running virtual machines. The clipboard uses the serialized form of whatever data is

on the clipboard and is in that sense agnostic to the contents. The web interface for the

workers in the merging step contains as many VNC frames as there are steps to merge. For

example, a worker asked to merge four PowerPoint slides would see four VNC frames. The

first VM is designated as the destination VM, and the rest are all sources for copying. This

relationship is indicated in the instructions as well as implicitly indicated by the presence of

a paste button on the first VM, and copy buttons on all of the other VMs.
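
The merge step can be sketched as follows, as a minimal Python illustration. Each VM's document is modeled as a list of slides, and the content-agnostic clipboard is modeled with `pickle` serialization. The class `SharedClipboard` and the function `merge_into_destination` are hypothetical names, and the real system operates on the guest operating systems' clipboards rather than on Python objects.

```python
import pickle


class SharedClipboard:
    """Holds whatever was copied as opaque serialized bytes, so the
    clipboard never needs to understand the data it carries."""

    def __init__(self):
        self._payload = None

    def copy(self, data):
        self._payload = pickle.dumps(data)

    def paste(self):
        if self._payload is None:
            return None
        return pickle.loads(self._payload)


def merge_into_destination(destination, sources, clipboard):
    """Mimic a merging worker: copy from each source VM in turn and
    paste into the first (destination) VM."""
    for source in sources:
        clipboard.copy(source)               # "copy" button on a source VM
        destination.extend(clipboard.paste())  # "paste" button on the destination
    return destination
```

Keeping a single destination and treating every other VM as copy-only mirrors the button layout described above and avoids ambiguity about where the merged result ends up.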

7 Future Work

Future work may involve evaluations with real end users of the three systems in which we

have thus far only conducted preliminary validations. Lasecki et al. [12] performed in-depth

evaluations of Legion for a robot navigation task and a spreadsheet transcription task, as

well as smaller demonstrations of feasibility for other tasks such as controlling an assistive

software keyboard and controlling a makeshift robot. For WeGame, future research may

include comparisons between the single-player and collaborative adaptations of single-player

games, as well as comparisons between games intended to be multiplayer games and games

that were made to be multiplayer through WeGame. For Multiverse, we plan to examine

the ease with which real end users can package tasks into virtual machines. We would

also like to find whether they find it useful to package tasks and have them crowdsourced

in this way. If they do find it useful, then we will seek to find which tasks they want to

crowdsource. For Automatic Function Definition, we plan to examine the market for small-

scale crowdsourcing of programming tasks, and how it might be feasible to generate or find

a crowd of programmers that is knowledgeable enough.

Future directions of this work may also involve combinations of the different systems

introduced in this paper. Legion and WeGame already share some ideas in terms of input

mediation. However, there are also opportunities to overcome limitations in some systems,

such as the limitation of Multiverse that virtual machines cannot effectively replicate state

external to the virtual machines, such as modifications made to an external website. In

those situations, a new system could choose between using Multiverse to complete a step

and using Legion to complete a step, depending on which of the respective strengths the

particular task might need.

8 Conclusion

We have introduced four systems for crowdsourcing and collaboration in existing interfaces:

Legion, WeGame, Automatic Function Definition, and Multiverse. We have explained how

our implementations of these systems were specifically designed to maximize the compati-

bility of our systems with existing interfaces. Our goals in doing so were to minimize one-off

programming effort for people implementing collaboration- and crowd-enhanced interfaces,

preserve end users' familiarity with existing interfaces, and maintain the relevance of the

contributions of our systems by making them work on top of popularly used applications.

9 Acknowledgements

The author would like to acknowledge the contributions of Jeffrey Bigham, Walter Lasecki,

Anna Loparev, Robert Miller, and Samuel White.

References

[1] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich. Soylent: a word processor with a crowd inside. In Proc. of the ACM Symp. on User interface software and technology, UIST '10. 313-322. 2010.

[2] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, and T. Yeh. VizWiz: nearly real-time answers to visual questions. In Proc. of the annual ACM Symp. on User interface software and technology, UIST '10. 333-342. 2010.

[3] CamTwist. 2012. http://allocinit.com/macosxprojects/camtwist/

[4] L. Carpenter. Method and apparatus for audience participation by electronic imaging. US Patent 5210604. 1993.

[5] M. S. El-Nasr, B. Aghabeigi, D. Milam, M. Erfani, B. Lameman, H. Maygoli, S. Mah. Understanding and evaluating cooperative games. In Proc. of the 28th international conference on Human factors in computing systems, CHI '10. 253-262. 2010.

[6] M. Fucci. Flashlightvnc. 2011. http://flashlight-vnc.sourceforge.net

[7] GitHub, Inc. 2012. https://github.com

[8] M. Goldman, G. Little, and R. C. Miller. Collabode: Collaborative Coding in the Browser. In Proc. of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE '11. 65-68. 2011.

[9] Kickstarter, Inc. 2012. http://www.kickstarter.com/discover/categories/open%20software

[10] A. Kittur, B. Smus, S. Khamkar, and R. E. Kraut. CrowdForge: Crowdsourcing complex work. In Proc. of the 24th annual ACM Symp. on User interface software and technology, UIST '11. 2011.

[11] A. P. Kulkarni, M. Can, and B. Hartmann. Collaboratively crowdsourcing workflows with Turkomatic. In Proc. of the ACM Conf. on Computer supported cooperative work, CSCW 2012.

[12] W. S. Lasecki, K. I. Murray, S. White, R. C. Miller, and J. P. Bigham. Real-time crowd control of existing interfaces. In Proc. of the 24th annual ACM Symp. on User interface software and technology, UIST '11.

[13] G. Little, L. B. Chilton, M. Goldman, and R. C. Miller. TurKit: human computation algorithms on mechanical turk. In Proc. of the 23rd annual ACM Symp. on User interface software and technology, UIST '10. 57-66. New York, NY, USA. 2010.

[14] D. Maynes-Aminzade, R. Pausch, and S. Seitz. Techniques for interactive audience participation. In Proc. of the Intl. Conf. on Multimodal Interfaces, ICMI '02. 15-20. 2002.

[15] K. I. Murray and J. P. Bigham. Beyond Autocomplete: Automatic Function Definition. In Proceedings of the 2011 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC '11. 2011.

[16] T. Richardson, Q. Stafford-Fraser, K. Wood, and A. Hopper. Virtual network computing. Internet Computing, IEEE. 2:33-38. Jan/Feb 1998.

[17] RTMFP FAQ - Flash Media Enterprise Server 4.5. 2012. http://www.adobe.com/products/flash-media-enterprise/rtmfp-faq.html

[18] R. W. Scheifler and J. Gettys. The X Window System. ACM Trans. Graph. 5:79-109. April 1986.

[19] D. Vardoulakis and O. Shivers. CFA2: a Context-Free Approach to Control-Flow Analysis. In Proc. of the 19th European Symposium on Programming, ESOP '10. 570-589. 2010.

[20] vmulti. 2012. http://code.google.com/p/vmulti/

[21] VMware Workstation. 2012. http://www.vmware.com/products/workstation/
