Systems for Crowdsourcing and Collaboration in Existing Interfaces
Kyle I. Murray
Department of Computer Science
University of Rochester
May 2012
Abstract
Many existing computer interfaces have been designed for use by a single user.
However, there are many situations in which users of these single-user interfaces can
benefit from additional or complementary input to the interface from more people.
We introduce four systems which allow for collaboration and crowdsourcing across a
wide variety of single-user interfaces in popular use today: (i) Legion, a system which
allows end users to easily capture existing GUIs and outsource them
for real-time control by the crowd; (ii) WeGame, a system that allows small groups to
more flexibly and collaboratively operate existing single-player games; (iii) Automatic
Function Definition, a system for integrating crowdsourced and collaborator-defined
subroutines into existing software development interfaces; and (iv) Multiverse, a system
that allows crowd algorithms to be applied to existing interfaces.
1 Introduction
Many existing computer interfaces have been designed for use by a single user. However, there
are many situations in which users of these single-user interfaces can benefit from additional
or complementary input to the interface from more people. These additional human sources
of input can be split into two categories: collaboration and crowdsourcing. Systems with
interfaces designed for a single user typically require substantial one-off programming effort
to support any kind of collaboration or crowdsourcing because the input space is limited
to that which a single user is typically able to provide, such as a single mouse pointer and
keyboard, or single videogame controller.
We introduce four systems which allow for collaboration and crowdsourcing across a
wide variety of single-user interfaces in popular use today: (i) Legion [12], a system which
allows end users to easily capture existing GUIs and outsource them for real-time
control by the crowd; (ii) WeGame, a system that allows small groups to more flexibly
and collaboratively operate existing single-player games; (iii) Automatic Function Definition
(AFD) [15], a system for integrating crowdsourced and collaborator-defined subroutines into
existing software development interfaces; and (iv) Multiverse, a system that allows crowd
algorithms to be applied to existing interfaces.
In this paper, we will consider collaboration to be actions that involve people with a
professional or familiar relationship with the original user of the computer system who are
working to achieve a common goal state in the computer system being controlled. We will
consider crowdsourcing to be the ephemeral use of a large, readily available group of people
with the potential to be motivated to act collectively on behalf of the original user of the
system. Typically, recruiting for crowdsourcing takes place over the internet.
We focus on supporting existing interfaces rather than creating novel interfaces for three
primary reasons: (i) Familiarity: our systems involve users who are recruited to join the
original user, so using existing interfaces greatly increases the likelihood that these new
users will already be familiar with the interface; (ii) One-off programming effort: our systems
are designed to apply to many common interfaces, games, or programming environments,
so they do not require substantial, if any, new programming effort to support a new interface;
(iii) Relevance: users of existing interfaces do not need to be convinced to use a new program
or interface in order to benefit from our system, so they can employ our systems on the tasks
and interfaces they care about.
In summary, our contributions are the following:
• We articulate the idea of enabling crowdsourcing and collaboration on existing inter-
faces.
• We describe techniques for building systems general enough to support swaths of
popularly-used existing interfaces.
• We implement our systems and describe the important, enabling components of each.
2 Background
2.1 Crowdsourcing
Individuals have long been able to control graphical computer interfaces in real-time re-
motely; systems such as the X Window System [18] and Virtual Network Computing (VNC)
[16] let a user control the existing input devices such as mice and keyboards by redirecting
input to the remote machine while showing the remote machine's window(s) in a window on
the local machine. These systems differ from Legion and Multiverse in that they were not
designed to facilitate multiple users controlling a single system collaboratively.
The idea of using a crowd within an existing interface was pioneered by Soylent [1].
Soylent tasked crowd workers to rewrite sentences from a text document so that their length
could be shortened without changing the meaning of the sentence. It introduced the crowd
algorithm find-fix-verify to ensure that the crowd workers knew what type of problems to look
for, how to correct the problems, and also how to ensure that other workers had completed
the task satisfactorily. The system was packaged into the word processor Microsoft Word,
and required a significant programming effort that was specific to implementing Soylent
within Word.
In addition to the redundancy pattern used in Soylent's find-fix-verify, other crowdsourcing
systems use crowd algorithms that involve patterns to try to ensure quality such as
iteration [13], or laying and task decomposition [10, 11]. Multiverse allows many of these
algorithms to be applied at the higher level of abstraction of the virtual machine, which
allows them to be applied to existing interfaces.
2.2 Collaborative Gaming
Multiplayer games have involved collaboration since before the advent of computer systems
or written history. Videogames on videogame consoles and in arcades have long included
multiple controllers so that players can control independent components of games such as
avatars. However, the collaborative control of a single in-game object has not been explored
in-depth. Prior work by Maynes-Aminzade et al. [14] and Carpenter [4] has studied control
of a single object by an audience of spectators who provided a single, low-bandwidth form
of input such as a colored marker held up to indicate an up or down direction to be read
by a computer vision system and interpreted as input. These systems involved significant
game- and environment-specific programming efforts and did not handle more complex input
schemes used by popular modern games, such as single- or dual-analog joystick input.
Prior work has considered the ways in which game players cooperate in games. In partic-
ular, El-Nasr et al. [5] identified cooperative patterns used by players in games meant to be
played collaboratively. This work provides a useful framework for evaluating the effectiveness
of a system that provides techniques for collaborative input. Its evaluation of cooperative
games provides some evidence for the potential benefits of increasing chances for cooperation
in games.
2.3 Crowdsourced Programming
Crowdsourcing, broadly defined, can be seen as a common source of computer programs.
Open source projects' contributors can often be construed as a crowd, as such projects often
have no particular rules about who participates, or the quality or size of an individual's
contribution. A representative open source community is GitHub [7], which encourages users
to fork others' code without even having to contact the original author. Kickstarter [9]
provides project-level structure and solicits crowdsourced funding of projects including software
development. Automatic Function Definition differs in that it is exposed in the interface of
the development environment and can be invoked while programming. It crowdsources code
at the level of the function, method, or subroutine rather than at the scope of a project.
Prior work has examined collaboration and code outsourcing with a software development
environment. Goldman et al. developed Collabode [8], a collaborative coding environment.
They included a system for masking errors generated by collaborators' changes to the shared
codebase, and proposed the idea of micro-outsourcing, which uses a new interface specifically
built for communicating with collaborating programmers. Micro-outsourcing involves giving
natural language instructions to a collaborator to do a task whose scope is also defined by
natural language. Automatic Function Definition differs from micro-outsourcing in several
ways: (i) AFD enforces syntactic structure for external contributions, so a contribution
must be a function or other grammatical structure which can be automatically verified before
its inclusion; (ii) Collabode uses collaborators as the only source of code contributions,
while AFD uses collaborators, crowdsourcing, and code search; (iii) AFD uses the existing
autocomplete interface, which is familiar to programmers using modern development
environments.
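The automatic verification mentioned in (i) can be illustrated with a minimal sketch. AFD itself targets JavaScript; the example below uses Python's standard ast module purely as an analogy. The function name is_valid_contribution and its acceptance rule (exactly one top-level function with the requested name and parameter count) are assumptions made for illustration, not the rule used by AFD.

```python
import ast


def is_valid_contribution(source: str, expected_name: str, expected_arity: int) -> bool:
    """Return True if `source` parses and consists of exactly one top-level
    function definition with the requested name and positional parameter count.

    Hypothetical acceptance rule, shown only to illustrate syntactic checking.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Contributions that do not parse are rejected outright.
        return False
    if len(tree.body) != 1:
        return False
    node = tree.body[0]
    if not isinstance(node, ast.FunctionDef):
        return False
    return node.name == expected_name and len(node.args.args) == expected_arity
```

A check of this kind accepts or rejects a contribution without running it, which is what makes verification before inclusion possible for a structured unit such as a function but not for free-form natural language instructions.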
3 Legion
Legion is a system that allows end users to easily capture existing graphical user interfaces
and outsource them for collaborative, real-time control by the crowd. End users initiate
the system by first picking a rectangular portion of their desktop interface which contains the
window or windows they would like the crowd to control. They then write a natural language
description of the task that they want the crowd to perform; each member of the crowd will
see this description. They then choose a price (in cents) that they are willing to pay the
crowd for having completed the task.
When the user submits the task details, Legion immediately forwards a video feed of the
selected portion of the desktop to the group of workers that Legion has begun recruiting. We
use Amazon's Mechanical Turk service as the source of crowd workers and use a technique
similar to quikTurKit by Bigham et al. [2], which quickly recruits a pool of workers and
attempts to retain a pool of the same size throughout the duration of a task.
Figure 1: Legion is a framework that allows existing interfaces to be outsourced to the crowd. In this example, a user has outsourced control of her Rovio robot. The Legion client allows end users to choose a portion of their screen to send to crowd workers, sends a video stream of the interface to the server, and simulates events (key presses, mouse clicks) when instructed by the server. The Legion server recruits workers, aggregates the input of multiple crowd workers using flexible input mediators, and forwards the streaming video from the client to the crowd workers. The web interface presents the streaming video, collects worker input (key presses and mouse clicks), and gives workers feedback.
The workers each see the same video feed of the end user's desktop interface, and are
given instructions on how to control it. Their keypresses and mouse clicks are captured and
sent to the Legion server, which runs one of several types of input mediators. The purpose
of the input mediators is to decide which of the many near-simultaneous inputs from the
crowd workers to accept. There are numerous trade-offs to the different mediators, which are
described in detail in Lasecki et al. [12]. The mediator named Leader, for example, measures
the level of agreement between the different workers' input over time and periodically elects
the worker with the highest agreement to be the leader, whose input is forwarded directly to
the end user's machine. The live interface video has an overlay which indicates various facts
about the performance of a worker, such as whether or not the worker has direct control at
the moment and what level of crowd agreement bonus the worker will receive for agreeing
with the input of the others.
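The Leader mediator described above can be sketched as follows. This is a toy illustration, not Legion's implementation: the class name LeaderMediator, the notion of a fixed "epoch" of input, and the agreement score (the fraction of the crowd that pressed the same key in an epoch) are all assumptions; the metric actually used is described in Lasecki et al. [12].

```python
from collections import Counter, defaultdict


class LeaderMediator:
    """Toy leader-election mediator; the agreement metric is an assumption."""

    def __init__(self):
        self.agreement = defaultdict(float)  # worker id -> accumulated agreement
        self.leader = None

    def record_epoch(self, inputs):
        """`inputs` maps worker id -> the key that worker pressed this epoch."""
        if not inputs:
            return
        counts = Counter(inputs.values())
        total = len(inputs)
        for worker, key in inputs.items():
            # Score a worker by the share of the crowd that pressed the same key.
            self.agreement[worker] += counts[key] / total

    def elect(self):
        """Elect the worker with the highest accumulated agreement.

        Ties are broken by the smallest worker id so the result is deterministic.
        """
        if self.agreement:
            self.leader = max(sorted(self.agreement), key=self.agreement.get)
        return self.leader

    def forward(self, inputs):
        """Return the current leader's input, or None if the leader was idle."""
        return inputs.get(self.leader)
```

In this sketch, record_epoch would be called for every window of worker input, elect would be called periodically, and forward would supply the single input stream that is simulated on the end user's machine.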
The end user of Legion can decide when the crowd must stop controlling the user's computer.
At any time during the task, the user remains free to provide input as usual,
directly from the physical input devices typically used to control an interface. They can
also decide whether or not the crowd completed the task as described, which determines
payment.
3.1 Implementation
Legion achieves its flexible applicability to existing interfaces in several ways. First, it uses
operating system level routines such as the Application Services framework in Apple's OS X
to simulate keyboard and mouse input events. These events reach all applications unless the
application is using a rare, hardware-specific feature such as a non-standard media key on
the keyboard.
The graphical user interface on the end user's machine is forwarded to the workers in
several steps. The first is to use the CamTwist library [3] to record the appropriate screen
rectangle and make it available to the operating system as a video source. This video source
is forwarded using the Real-Time Media Flow Protocol [17], a protocol for streaming live
video to web clients with low latency. The video is first forwarded to the central Legion
server from the end user, and then to the individual workers as they connect. In contrast to
technologies like VNC [16], RTMFP has a typical latency low enough to support real-time,
closed-loop control of remote interfaces. Because GUIs are graphical by nature, and because
CamTwist simply forwards these graphics, we are able to support essentially any graphical
interface with this combination of streaming and input events.
Figure 2: WeGame allows small groups of players to operate a game that was originally designed as a single-player game. This diagram illustrates the WeGame overlay that shows each player the current state of their controller. The stars to the left of each player name indicate how many of the other players have voted for that player to have additional control influence.
4 WeGame
WeGame is a system that allows small groups to more flexibly and collaboratively operate
existing single-player games. Users of the system (players) start using the system by plugging
in a controller for each player. They then select an existing game on their computer which
would normally be played with a single controller. When the game starts, a new overlay
appears which shows graphical representations of the current state of each of the controllers
that is plugged into the system. Alongside these controllers, a list of names of the players is
shown.
Like Legion [12], WeGame uses input mediators to determine which final input is actually
forwarded to the unmodified game. In order to better support collaborative control as
opposed to crowd control, WeGame has a different set of input mediators. Several of the
mediators use a special form of voting input from the players. Each of the players is associated
with a particular color, and they control a star that can be toggled between each of the players'
names on screen. The effect of moving the star is to vote for that player to temporarily gain
a degree of control over the input. Depending on the mediator, this could be full control or
a percentage weight in an average of the rest of the players.
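A vote-weighted mediator of the second kind can be sketched as a weighted average of joystick positions. The function name mediate_axes and the weighting scheme (each player starts with weight 1 and gains 1 for every star received) are assumptions chosen for illustration; WeGame's actual mediators may weight votes differently.

```python
def mediate_axes(axes, votes):
    """Blend joystick positions, weighting each player by the stars they hold.

    axes:  player name -> (x, y) stick position, each component in [-1.0, 1.0]
    votes: voter name -> name of the player that voter's star currently sits on
    """
    if not axes:
        return (0.0, 0.0)
    # Assumed weighting: every player starts at 1 and gains 1 per star received.
    weights = {player: 1.0 for player in axes}
    for target in votes.values():
        if target in weights:
            weights[target] += 1.0
    total = sum(weights.values())
    x = sum(weights[p] * axes[p][0] for p in axes) / total
    y = sum(weights[p] * axes[p][1] for p in axes) / total
    return (x, y)
```

With two players pushing in opposite directions and no votes, the blended stick rests at the center; a single star moved onto one player tilts the result toward that player's input. A full-control mediator would instead return the position of the player holding the most stars.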
While the input is being mediated, the rest of the game continues as usual, with the
exception that what was once a single-player game is now being controlled collaboratively
by numerous players. Players are free to implement their own social rules on top of the rules
imposed by the mediators. For example, they may agree to vote on a new leader after each
in-game death, or after each in-game level is completed.
4.1 Implementation
WeGame's implementation is designed to support the typical multiplayer gaming scenario
in which a group of two to four players sit together on a couch and face a large television.
Typically, these games use a controller with one or two analog joysticks, plus a number of
face buttons and shoulder triggers. We chose Sony's DualShock 3 controller as it supports
both USB and Bluetooth. We chose Microsoft Windows as the supported operating system
due to its popularity among PC game makers.
A key component of WeGame is its use of VMulti, a human interface device driver for
Windows which simulates DirectInput devices such as joysticks, mice, keyboards, multitouch
input, and digitizer pens [20]. WeGame reads the state of each controller via USB, and then
forwards that input to our input mediation server and to our screen overlay. After the server
performs its mediation, it sends the output of a single, synthesized controller to VMulti.
The entire process happens within tens of milliseconds, so the system effectively maintains
the normal level of interaction and latency between the controller and the game.
5 Automatic Function Definition
Automatic Function Definition (AFD) [15] is a system for integrating crowdsourced and
collaborator-defined subroutines into existing software development interfaces. Specifically,
it is activated by the existing autocomplete interface feature found in many common devel-
opment environments. While existing autocomplete implementations typically only provide
name completion for existing function names, AFD uses the same interface as a way to
specify the name and function signature of a yet-to-be-defined function.
The typical autocomplete list is depleted when the user enters the name of a function
that does not exist. In AFD, however, there is always at least one item on the autocomplete
menu: define. When this item is selected, AFD takes the name of the function, and any
surrounding context such as arguments passed and any nearby natural language comments,
and packages them into a request for a definition.
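The behavior of the autocomplete menu and the packaging of a request can be sketched as below. The names DefinitionRequest, completions, and build_request, and the exact fields carried by a request, are hypothetical; the source states only that the function name, passed arguments, and nearby comments are included.

```python
from dataclasses import dataclass, field

DEFINE_ITEM = "define"  # the menu entry that is always present


@dataclass
class DefinitionRequest:
    """Hypothetical container for the context sent to a definition source."""
    name: str
    arguments: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def completions(prefix, known_functions):
    """Return matching function names followed by the ever-present define item."""
    matches = sorted(f for f in known_functions if f.startswith(prefix))
    return matches + [DEFINE_ITEM]


def build_request(name, arguments, nearby_comments):
    """Package the call-site context into a request for a definition."""
    return DefinitionRequest(
        name=name,
        arguments=list(arguments),
        comments=list(nearby_comments),
    )
```

Because the define item is appended unconditionally, the menu is never empty, even when the typed name matches no existing function.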
The definition sources provided by AFD are as follows: (i) crowdsourcing, (ii) collab-
orators, and (iii) automatic code search. Refer to Murray and Bigham [15] for details on
automatic code search in AFD. Workers in the crowdsourcing service have two options: they
can either write the code themselves, or perform manual code search themselves to find a def-
inition on the web. Collaborators have the same two options available, but their motivation
as a collaborator is different than the typical monetary motivation of a crowd worker. Once
the worker or collaborator finishes their definition, they can send it back to the environment
of the original end user and it will be transparently inserted into the codebase.
5.1 Implementation
We implemented AFD in a web-based JavaScript editor. Code analysis necessary for the
autocomplete feature was performed using Vardoulakis and Shivers' CFA2 algorithm [19], a
control flow analysis for JavaScript which performs type inference and supplies an annotated
syntax tree of a given input program. Requests for function definitions are presented in a
Figure 3: End users encapsulate tasks for the crowd in virtual machines that are then replicated on the Multiverse server and controlled by crowd workers via a web-based VNC connection. The server implements crowd algorithms on top of these virtual machines to ensure reliability.
web interface that communicates with the end user's editor.
6 Multiverse
Multiverse is a system that allows crowd algorithms to be applied to existing interfaces. It
reduces the one-off programming efforts previously required to implement crowd algorithms
in existing interfaces. Multiverse implements existing crowd algorithms by encapsulating a
user's application state into a virtual machine. It clones virtual machines to support multiple
workers working on the same part of the task at once. These clones are redundant, and
Multiverse supports a voting mechanism to decide which of the clones completes the task
in the most satisfactory way. In addition to just cloning, Multiverse also supports a robust,
between-VM copy and paste operation. Copying and pasting allows the work of multiple
workers to be merged. In effect, this supports a broad subset of what would be achievable if
it were feasible to merge two whole virtual machines together. This functionality allows for
the merging (reduce) step in task decomposition algorithms like CrowdForge [10].
To use Multiverse, a user initializes a virtual machine into a desired starting state. For
example, a user may open Adobe Photoshop and start an image editing session that she
Figure 4: Results of an example run of Multiverse. The task has been divided into three steps. Three crowd members perform each step separately in a private VM, and then other crowd workers vote on the best, which is carried on to the next stage.
Step 1: Find. Find a close-up image of a flower on flikr.com and load it in GIMP.
Step 2: Vote.
Step 3: Crop. Crop the image so that only the petals and top of the stem are visible.
Step 4: Vote.
Step 5: Label. Consult Wikipedia to label the petals, pistil, stamen, pedicol, and bract.
Step 6: Vote Final.
plans to have the crowd complete. She then opens the Multiverse interface and writes a
short description of the steps of the task. Multiverse supports steps that require redundancy
to get good crowd output as well as steps that require creative and unique work on the part
of the crowd worker. Furthering our example, our user may want the crowd to find a good
photo of a flower off of the web, remove the background leaving just the flower, and then label
the parts of the flower to make a diagram. This example involves several voting steps where
the performance of the crowd can be evaluated. In another example, our user may want to
have the crowd author a Microsoft PowerPoint presentation. To use Multiverse in this case,
he will follow the same steps except that he will specify a copy-and-paste merge step, where
he asks the crowd to merge the slides that had been created by individual workers into one
presentation. At the end of the process, the end user is left with a single suspended virtual
machine which can be resumed and used just as though he had done the work himself.
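The clone, work, and vote cycle described above can be sketched in a few lines. Here a plain Python value stands in for a virtual machine's state, and workers and voters are ordinary callables; run_step, run_task, and the tie-breaking rule are assumptions made for illustration rather than Multiverse's implementation.

```python
import copy
from collections import Counter


def run_step(state, workers, voters):
    """Clone `state` once per worker, let each worker change its own clone,
    then return the clone that receives the most votes.

    workers: callables taking a cloned state and returning the resulting state
    voters:  callables taking the list of branches and returning a branch index
    """
    branches = [worker(copy.deepcopy(state)) for worker in workers]
    ballots = Counter(voter(branches) for voter in voters)
    # Assumed tie-break: prefer the lowest branch index.
    winner = max(range(len(branches)), key=lambda i: (ballots[i], -i))
    return branches[winner]


def run_task(initial_state, steps):
    """Apply each (workers, voters) step in order, carrying the winner forward."""
    state = initial_state
    for workers, voters in steps:
        state = run_step(state, workers, voters)
    return state
```

Each worker operates on a private copy, so a poor result in one branch cannot damage the others, and only the winning branch becomes the starting state of the next step.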
6.1 Implementation
Multiverse takes advantage of the ubiquity of virtualization on modern computers to achieve
its broad compatibility with crowd algorithms and existing interfaces. Consumer-level vir-
tualization is becoming prevalent, with programs such as VMware Workstation [21] being
Figure 5: The screenshot on the left shows the end user interface for Multiverse as a task is in progress. Each step shows the latest screenshot from each worker branch underneath the instruction that the worker is given. The screenshot on the right shows the voting interface for the task started in the left screenshot. Workers are shown full screenshots of the interface and asked to vote on which previous worker better completed the instructions.
marketed directly to regular computer users. Our implementation uses VMware Worksta-
tion virtual machines as the basis for initialization. VMware Workstation includes built-in
support for VNC [16] connections, which are used to give the crowd workers full control over
their individual copies of the end user's original. The web interface uses a web-based VNC
client [6].
Multiverse manages the lifecycle of the virtual machines according to the form that the
end user fills out to specify the steps of their task. We use TurKit [13] to post control,
voting, and merging tasks to Amazon's Mechanical Turk service. Voting tasks present workers
with screenshots of the control or merging task given to previous workers. Voters see the
instruction the previous workers were given alongside a screenshot of the virtual machine
right before the previous task was submitted; they are asked to choose the virtual machine
which appears to have the best work.
We implemented the merging steps by first enabling a shared clipboard between concur-
rently running virtual machines. The clipboard uses the serialized form of whatever data is
on the clipboard and is in that sense agnostic to the contents. The web interface for the
workers in the merging step contains as many VNC frames as there are steps to merge. For
example, a worker asked to merge four PowerPoint slides would see four VNC frames. The
first VM is designated as the destination VM, and the rest are all sources for copying. This
relationship is indicated in the instructions as well as implicitly indicated by the presence of
a paste button on the first VM, and copy buttons on all of the other VMs.
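The roles in the merging step can be sketched as follows. The classes Frame and SharedClipboard and the helper make_frames are hypothetical stand-ins for the VNC frames and the shared clipboard; the sketch only models the rule that sources copy, the destination pastes, and the clipboard treats its contents as opaque serialized bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical stand-in for one VNC frame shown to the merging worker."""
    name: str
    is_destination: bool
    contents: list = field(default_factory=list)


class SharedClipboard:
    """Holds opaque serialized bytes shared between concurrently running VMs."""

    def __init__(self):
        self._data = b""

    def copy_from(self, frame, payload):
        if frame.is_destination:
            raise PermissionError("the destination VM only exposes a paste button")
        self._data = bytes(payload)

    def paste_into(self, frame):
        if not frame.is_destination:
            raise PermissionError("source VMs only expose a copy button")
        frame.contents.append(self._data)


def make_frames(count):
    """The first frame is the paste destination; all others are copy sources."""
    return [Frame(f"vm{i}", is_destination=(i == 0)) for i in range(count)]
```

A worker merging four slides would copy from each of the three source frames in turn and paste into the first frame after each copy.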
7 Future Work
Future work may involve evaluations with real end users of the three systems in which we
have thus far only conducted preliminary validations. Lasecki et al. [12] performed in-depth
evaluations of Legion for a robot navigation task and a spreadsheet transcription task, as
well as smaller demonstrations of feasibility for other tasks such as controlling an assistive
software keyboard and controlling a makeshift robot. For WeGame, future research may
include comparisons between the single-player and collaborative adaptations of single-player
games, as well as comparisons between games intended to be multiplayer games and games
that were made to be multiplayer through WeGame. For Multiverse, we plan to examine
the ease with which real end users can package tasks into virtual machines. We would
also like to find whether they find it useful to package tasks and have them crowdsourced
in this way. If they do find it useful, then we will seek to find which tasks they want to
crowdsource. For Automatic Function Definition, we plan to examine the market for small-
scale crowdsourcing of programming tasks, and how it might be feasible to generate or find
a crowd of programmers that is knowledgeable enough.
Future directions of this work may also involve combinations of the different systems
introduced in this paper. Legion and WeGame already share some ideas in terms of input
mediation. However, there are also opportunities to overcome limitations in some systems,
such as the limitation of Multiverse that virtual machines cannot effectively replicate state
external to the virtual machines, such as modifications made to an external website. In
those situations, a new system could choose between using Multiverse to complete a step
and using Legion to complete a step, depending on which of the respective strengths the
particular task might need.
8 Conclusion
We have introduced four systems for crowdsourcing and collaboration in existing interfaces:
Legion, WeGame, Automatic Function Definition, and Multiverse. We have explained how
our implementations of these systems were specifically designed to maximize the compati-
bility of our systems with existing interfaces. Our goals in doing so were to minimize one-off
programming effort for people implementing collaboration- and crowd-enhanced interfaces,
to preserve end users' familiarity with existing interfaces, and to maintain the relevance of the
contributions of our systems by making them work on top of popularly used applications.
9 Acknowledgements
The author would like to acknowledge the contributions of Jeffrey Bigham, Walter Lasecki,
Anna Loparev, Robert Miller, and Samuel White.
References
[1] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich. Soylent: a word processor with a crowd inside. In Proc. of the ACM Symp. on User interface software and technology, UIST '10. 313-322. 2010.
[2] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, and T. Yeh. Vizwiz: nearly realtime answers to visual questions. In Proc. of the annual ACM Symp. on User interface software and technology, UIST '10. 333-342. 2010.
[3] CamTwist. 2012. http://allocinit.com/macosxprojects/camtwist/
[4] L. Carpenter. Method and apparatus for audience participation by electronic imaging. US Patent 5210604. 1993.
[5] M. S. El-Nasr, B. Aghabeigi, D. Milam, M. Erfani, B. Lameman, H. Maygoli, S. Mah. Understanding and evaluating cooperative games. In Proc. of the 28th international conference on Human factors in computing systems, CHI '10. 253-262. 2010.
[6] M. Fucci. Flashlightvnc. 2011. http://flashlightvnc.sourceforge.net
[7] GitHub, Inc. 2012. https://github.com
[8] M. Goldman, G. Little, and R. C. Miller. Collabode: Collaborative Coding in the Browser. In Proc. of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE '11. 65-68. 2011.
[9] Kickstarter, Inc. 2012. http://www.kickstarter.com/discover/categories/open%20software
[10] A. Kittur, B. Smus, S. Khamkar, and R. E. Kraut. Crowdforge: Crowdsourcing complex work. In Proc. of the 24th annual ACM Symp. on User interface software and technology, UIST '11. 2011.
[11] A. P. Kulkarni, M. Can, and B. Hartmann. Collaboratively crowdsourcing workflows with turkomatic. In Proc. of the ACM Conf. on Computer supported cooperative work, CSCW 2012.
[12] W. S. Lasecki, K. I. Murray, S. White, R. C. Miller, and J. P. Bigham. Realtime crowd control of existing interfaces. In Proc. of the 24th annual ACM Symp. on User interface software and technology, UIST '11.
[13] G. Little, L. B. Chilton, M. Goldman, and R. C. Miller. TurKit: human computation algorithms on mechanical turk. In Proc. of the 23rd annual ACM Symp. on User interface software and technology, UIST '10. 57-66. New York, NY, USA. 2010.
[14] D. Maynes-Aminzade, R. Pausch, and S. Seitz. Techniques for interactive audience participation. In Proc. of the Intl. Conf. on Multimodal Interfaces, ICMI '02. 15-20. 2002.
[15] K. I. Murray and J. P. Bigham. Beyond Autocomplete: Automatic Function Definition. In Proceedings of the 2011 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC '11. 2011.
[16] T. Richardson, Q. Stafford-Fraser, K. Wood, and A. Hopper. Virtual network computing. Internet Computing, IEEE. 2:33-38. Jan/Feb 1998.
[17] RTMFP FAQ - Flash Media Enterprise Server 4.5. 2012. http://www.adobe.com/products/flashmediaenterprise/rtmfpfaq.html
[18] R. W. Scheifler and J. Gettys. The X Window System. ACM Trans. Graph. 5:79-109. April 1986.
[19] D. Vardoulakis and O. Shivers. CFA2: a Context-Free Approach to Control-Flow Analysis. In Proc. of the 19th European Symposium on Programming, ESOP '10. 570-589. 2010.
[20] vmulti. 2012. http://code.google.com/p/vmulti/
[21] VMware Workstation. 2012. http://www.vmware.com/products/workstation/