
J Grid Computing (2009) 7:519–535. DOI 10.1007/s10723-009-9133-4

Improving the Productivity of Volunteer Computing by Using the Most Effective Task Retrieval Policies

David Toth · David Finkel

Received: 15 January 2009 / Accepted: 13 August 2009 / Published online: 28 August 2009. © Springer Science + Business Media B.V. 2009

Abstract Volunteer computing projects have been used to make significant advances in knowledge since the 1990s. These projects use idle CPU cycles donated by people to solve computationally intensive problems in medicine, the sciences and other disciplines. It is important to use the donated cycles as efficiently as possible because participation in volunteer computing is low and the number of volunteer computing projects keeps increasing. Task retrieval policies, policies describing when a volunteered computer requests additional work from a server, can have an effect on the number of wasted CPU cycles and consequently, the number of tasks completed by clients. We present the results of simulating different task retrieval policies for clients under realistic conditions, including clients running on computers with one single-core CPU, clients running on computers with multi-core CPUs, and clients running on computers that are put into a power save mode by environmentally conscious owners.

D. Toth (B)
Department of Computer Science, Merrimack College,
315 Turnpike Street, North Andover, MA 01845, USA
e-mail: [email protected]

D. Finkel
Department of Computer Science, Worcester Polytechnic Institute,
100 Institute Road, Worcester, MA 01609, USA
e-mail: dfi[email protected]

Keywords Volunteer computing · Performance · Task retrieval · Simulation

1 Introduction

Volunteer computing projects produce advances in medical, scientific, and mathematical knowledge by enabling researchers to solve problems that are too computationally demanding to solve without a supercomputer. A volunteer computing project’s sponsor provides a client program that uses a specialized algorithm to analyze the sponsor’s data, and one or more Internet-connected servers to distribute the sponsor’s data to clients running on volunteered computers. The clients request data sets from one of the servers, download data sets, run the algorithm to analyze data, and return the results to one of the servers. The servers then aggregate the results from the clients. With millions of volunteered computers working on a single project, small improvements in the efficiency of the client program can lead to projects being completed significantly faster.

Using volunteer computing projects has become an accepted way to solve computationally intensive problems. The development of the Berkeley Open Infrastructure for Network Computing (BOINC), which allows people to build volunteer computing projects quickly, rather than having to write significant amounts of code from scratch for each project, has helped increase the number of volunteer computing projects running today significantly [1]. The number of volunteer computing projects running today has expanded from the few original projects such as SETI@home, Folding@home, and the Great Internet Mersenne Prime Search (GIMPS) to more than 25 projects running currently [2–4].

While the increase in the use of volunteer computing has attracted more researchers to study and improve it, the large increase in the number of concurrently running projects has also slowed down the progress for all of the projects because of increased competition for available volunteer computers. One report estimated that while there are 300 million Internet-connected computers, fewer than 1% of those computers are participating in volunteer computing [5]. Adding to the challenges facing volunteer computing, the push to become more environmentally friendly and conserve electricity may make some people less likely to volunteer their computer when it is idle. These people may choose to have their computer enter a power saving mode when they are not using it, instead of volunteering it. Due to the low participation rate and the high computational needs of the volunteer computing projects, some of which may run indefinitely because they continue to produce data sets to analyze, attracting more volunteers and using the volunteered computers more effectively has become extremely important if volunteer computing is to remain a viable method for solving computationally intensive problems.

Not every volunteer computing client uses the same policy for retrieving tasks, and work by Toth and Finkel indicated that the task retrieval policy used by a volunteer computing client could impact the number of tasks the client could complete [6]. We study the two task retrieval policies used by clients today, as well as comparing them to other possible policies, in order to determine if using a particular task retrieval policy would increase the number of tasks clients complete. As we study the various policies, we consider the effects the green movement and the trend of building computers with multi-core CPUs have on the policies. The research on how task retrieval policies affect the number of tasks volunteer computing clients complete for computers with single-core and multi-core CPUs was previously published in [7] and [8]. We summarize those results here, present new research on how the environmental movement may influence the productivity of clients using the various task retrieval policies, and indicate directions for future research.

2 Methodology

We chose to use simulations to study the effects of having clients use different task retrieval policies because simulations provided several advantages over analytic modeling and running an actual volunteer computing project. An analytic model was too complex to be very useful, as it needed to incorporate a great deal of information about the computers participating in volunteer computing to be realistic and this information made the model unwieldy. Although running an actual volunteer computing project would incorporate more of the low level details than a simulation would, running a volunteer computing project would not yield reproducible results. Running a project would also require running additional projects in the event we wished to test the effects of new developments on task retrieval policies. Using simulations allowed us to test the various policies under identical conditions and yielded reproducible results, as well as allowing us to study the effects of new developments on the different task retrieval policies with relative ease by simply modifying the simulator. In order to make the simulations as realistic as possible, we chose to run trace-driven simulations in which the traces represented the availability of computers to work on volunteer computing projects at a given time and thus, we needed to have traces of computer availability [9]. In order to construct a simulator that would produce accurate results, we needed to learn:

1. When volunteer computing clients would be able to run on the volunteered computers.

2. How volunteer computing projects implement task retrieval policies.

3. How volunteer computing clients checkpoint.

4. Which parameters of volunteer computing projects are important and typical values of those parameters [7].

2.1 Computer Availability Traces

Because we had chosen to use trace-driven simulations to compare the different task retrieval policies, we needed to have traces of computer availability and the traces needed to meet several criteria.

1. The traces needed to reflect the different types of computers that might be used for volunteer computing. Home computers, business computers, public computers such as those in a public library or a computer lab on a college campus, and undergraduate student computers are different types of computers that would be used for volunteer computing. We determined the usage patterns and thus the availability of these types of computers are different and therefore, we needed to have traces of each of these types of computers to make our simulations as accurate as possible.

2. The traces needed to allow us to determine when a computer was available to work on a volunteer computing project, when a computer was unavailable to work on volunteer computing projects but powered on, and when a computer was off. “Because most volunteer computing programs consider a computer to be available if the computer’s screensaver is running, we also used this criteria to determine if a computer is available” [10]. Therefore, the traces needed to allow us to determine when the screensavers of the computers were running, when the screensavers were not running but the computers were on, and when the computers were powered off, with a fine enough granularity to keep the simulations accurate.

3. The traces needed to be recorded in a consistent manner and for a continuous period of time long enough so we would get an accurate representation of computer usage and minimize anomalous data.

4. The traces needed to be recent enough to accurately represent current patterns of computer usage [10].

Numerous studies of computer availability have been conducted. Wolski et al. collected data from several computer science student labs at UCSB and a Condor pool at the University of Wisconsin. Mutka and Livny collected data from graduate students, faculty, and systems programmers [11]. Acharya et al. collected traces from a cluster of public computers belonging to the Computer Science department at the University of Maryland, a Condor pool of approximately 300 workstations at the University of Wisconsin, and a group of computers at UC Berkeley [12]. A study measuring the availability of more than 200 computers at the San Diego Supercomputer Center was published by Kondo et al. [13]. However, none of these studies collected data from the four types of computers we specified in the first requirement and most did not meet our second or third requirements either. Current volunteer computing projects did not collect the information we needed or did not respond when we contacted them (Woltman, personal communication; Anderson, personal communication). Because we were unable to obtain data meeting our requirements from past studies, we developed a method of collecting the data our simulations required.

We wrote a service to query the operating system every 10 s to ascertain whether the screensaver was running. For 28 days, the service recorded a continuous trace of the state of the screensaver and whether the computer was powered on to a file using a series of timestamps and text, starting a new file every day and transmitting the previous file to a server (except for the home and business computers, where the data files were collected manually at the end of the 28 days). The service was designed to minimize the performance impact on the computers so users would not notice any slowdown of their computers. The service used 0% CPU utilization and less than 10 MB of RAM according to Task Manager on a computer with a Pentium III 450 MHz CPU. When the service needed to transmit a data file to the server, the file was less than 10 KB and thus not noticeable to a normal user.
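As a rough illustration of the collection method described above, the sketch below polls the screensaver state every 10 seconds and appends a timestamped record to a per-day trace file. It is not the authors’ service: the class name, the file naming, and the isScreensaverRunning() stub (which stands in for the OS-specific query) are assumptions made for this example.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.time.Instant;
import java.time.LocalDate;

/** Illustrative availability logger: one trace file per day, one record every 10 s. */
public class AvailabilityLogger {

    // Hypothetical placeholder for the OS-specific screensaver query.
    static boolean isScreensaverRunning() {
        return false; // a real service would call into the operating system here
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        while (true) {
            LocalDate today = LocalDate.now();
            // Start a new file each day, as in the paper's collection method.
            try (FileWriter out = new FileWriter("trace-" + today + ".txt", true)) {
                while (LocalDate.now().equals(today)) {
                    String state = isScreensaverRunning() ? "SCREENSAVER_ON" : "SCREENSAVER_OFF";
                    out.write(Instant.now().getEpochSecond() + " " + state + "\n");
                    out.flush();
                    Thread.sleep(10_000); // poll every 10 s
                }
            }
            // A real service would transmit the finished file to the server here;
            // while the computer is off nothing is logged, so a gap in timestamps
            // is one way an off period could later be identified.
        }
    }
}
```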

Page 4: Improving the Productivity of Volunteer Computing by Using the Most Effective Task Retrieval Policies

522 D. Toth, D. Finkel

We collected traces from computers of each of the specified types. Of our 140 traces, 68 (49%) came from computers in public computer labs at Worcester Polytechnic Institute and 25 (18%) from Worcester Polytechnic Institute’s undergraduate students. 20 traces (14%) came from various home computers and 27 (19%) from a business that requested to remain anonymous. After collecting the traces, we truncated them, removing any time period beginning before the first day of the trace or ending after the last day of the trace [10]. Our trace data is publicly available at http://www.merrimack.edu/∼dtoth/papers_and_slides/computer_availability_trace_data.zip.

A new study collecting the availability of a much larger set of computers has been conducted by Kondo, Andrzejak, and Anderson [14]. Kondo et al.’s study classified computers in three categories: home, work, and school, as compared to our home, business, public, and student computer classification. We suggest the reader refer to the results of Kondo et al.’s study for more information about availability [14].

2.2 Task Retrieval Policies

Current volunteer computing clients use one of two task retrieval policies. We simulated these two policies and several more of our own devising. The policies we simulated are summarized in Table 1. The first task retrieval policy we simulated is the Buffer Multiple Tasks policy, which is used in the clients for projects built using BOINC [15]. Since the majority of current volunteer computing projects were created with the BOINC framework, the effectiveness of this policy is critical. Clients created using the BOINC framework allow users to configure several settings, including how much work the client should download when the client determines it needs to download more work [16]. The client will override the user’s preferences and not download more tasks than it calculates it will be able to complete on time [17]. This mechanism increases the effectiveness of the client by limiting the number of tasks it can download but not complete. This decreases the number of CPU cycles a client would spend on tasks that will not complete, thereby limiting wasted CPU cycles and increasing the number of tasks the client can complete in a given amount of time. “To simulate the Buffer Multiple Tasks policy (BOINC’s policy), when a client requests work, the simulator first determines how much CPU time the client is expected to receive before a new task would need to be completed. The expected amount of CPU time is calculated by multiplying the percentage of time the computer was available during the 28 day trace by the wall clock time before the new task would be aborted. The simulator divides the expected amount of CPU time by the time it should take to complete a task, getting the maximum number of tasks that are expected to be completed before the new task would be aborted. If the client is buffering fewer tasks than the maximum number of tasks that could be completed before the new task would be aborted, then the new tasks are retrieved. Otherwise, the new tasks are not retrieved” [7]¹. We have performed a complete set of simulations assuming the client is configured to download 1 day of tasks, 3.5 days of tasks, 7 days of tasks, and 14 days of tasks.

Table 1 Task retrieval policies

1. Buffer none: “Does not buffer any tasks. Downloads a task after returning the result of the previous task” [7].
2. Download early: Downloads a new task when the client is 95% done with the task it is processing [9].
3. Buffer 1 task: Buffers one task so the client always has a task to process, even while it is downloading a new task [9].
4. Buffer multiple tasks: “Buffers some number of days of tasks. The amount of tasks is limited to a number that can possibly be completed before the tasks’ deadlines” [7].
5. Super-optimal: “Does not buffer any tasks. Downloads a task after returning the result of the previous task, but assumes tasks are downloaded instantaneously. Does not attempt to execute a task that will not be completed on time” [7].

¹ Portions reprinted, with permission, from Toth, D. and Finkel, D. Increasing the Amount of Work Completed by Volunteer Computing Projects with Task Distribution Policies. Proceedings of the 2nd Workshop on Desktop Grids and Volunteer Computing Systems - PCGrid 2008, April 18, 2008, Miami, Florida, USA. ©2008 IEEE.
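The buffering decision quoted above reduces to a small calculation. The sketch below restates it in code; it is an illustration of the described rule, not the authors’ simulator, and the class, method, and parameter names are invented for this example.

```java
/** Illustrative restatement of the Buffer Multiple Tasks fetch decision. */
public class BufferMultipleTasksPolicy {

    /**
     * @param availabilityFraction fraction of wall clock time the computer was
     *                             available during the 28-day trace (0.0 to 1.0)
     * @param delayBoundSeconds    wall clock time before a new task would be aborted
     * @param taskCpuSeconds       CPU time needed to complete one task
     * @param tasksBuffered        number of tasks the client is currently buffering
     */
    static boolean shouldFetchMoreTasks(double availabilityFraction,
                                        double delayBoundSeconds,
                                        double taskCpuSeconds,
                                        int tasksBuffered) {
        // Expected CPU time before the new task would be aborted.
        double expectedCpuSeconds = availabilityFraction * delayBoundSeconds;
        // Maximum number of tasks expected to finish before that deadline.
        int maxCompletable = (int) (expectedCpuSeconds / taskCpuSeconds);
        // Fetch only if the buffer is below what can plausibly be completed.
        return tasksBuffered < maxCompletable;
    }

    public static void main(String[] args) {
        // Example with the paper's base parameters: 7-day delay bound, 24-h tasks,
        // a computer available 53% of the time, and 2 tasks already buffered.
        boolean fetch = shouldFetchMoreTasks(0.53, 7 * 24 * 3600, 24 * 3600, 2);
        System.out.println("Fetch more tasks? " + fetch);
    }
}
```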

The second task retrieval policy we simulated is Buffer None, which is used in the clients for Grid.org’s volunteer computing projects. Buffer None only downloads a task when it finishes the current task and returns the result to the server, or when it aborts the current task. The advantage of this policy is that since there is only one task that can be processed at any one time, a client using this policy has the most possible wall clock time to process any task it downloads [18]. The disadvantage of this policy is that it wastes any available CPU cycles the computer has until the client finishes downloading the next task. Using this policy, clients with slow Internet connections can waste a significant amount of CPU cycles that could have been volunteered while downloading large data files to perform short tasks.

We devised two other policies in an attempt to create a policy combining the best features of the existing task retrieval policies. The Buffer 1 Task policy works like the Buffer None policy, except a client using it always keeps one task in a buffer. When the client completes a task and returns the result, it immediately begins working on the buffered task and starts downloading the next task to refill the buffer. A second policy, the Download Early policy, works like the Buffer 1 Task policy, except it attempts to refill its buffer only when the client estimates it has completed 95% of the task it is processing. Both the Buffer 1 Task and Download Early policies attempt to minimize wasted CPU cycles while downloading the next task and at the same time, maximize the amount of wall clock time that can be spent on every task. In order to implement the Download Early policy, the creator of the workunits for a volunteer computing project could build in code to signal the client when an arbitrary percent of the workunit has been completed. The client could then begin downloading the next workunit. This will not be suitable for some types of workunits where some of the computations of the workunit are skipped because an answer is obtained prematurely, removing the need to complete a portion of the computational process to obtain a result.
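To make the signaling idea concrete, the sketch below shows one way a progress threshold could drive the early download. The interface, class, and method names are assumptions for this illustration; the paper only specifies that the workunit’s creator could build in a signal at a chosen percentage (95% in our simulations).

```java
/** Illustrative Download Early trigger: start the next download at a progress threshold. */
public class DownloadEarlyTrigger {

    interface DownloadStarter {
        void startNextDownload();
    }

    private final double threshold;
    private final DownloadStarter downloader;
    private boolean triggered = false;

    DownloadEarlyTrigger(double threshold, DownloadStarter downloader) {
        this.threshold = threshold;
        this.downloader = downloader;
    }

    /** Called periodically by the science application with progress in [0, 1]. */
    void reportProgress(double progress) {
        if (!triggered && progress >= threshold) {
            triggered = true;               // fire only once per workunit
            downloader.startNextDownload(); // begin fetching the next workunit
        }
    }

    public static void main(String[] args) {
        DownloadEarlyTrigger trigger = new DownloadEarlyTrigger(0.95,
                () -> System.out.println("Starting download of next workunit"));
        for (int i = 0; i <= 20; i++) {
            trigger.reportProgress(i / 20.0); // progress reports in 5% steps
        }
    }
}
```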

We devised the Super-Optimal policy to establish an upper bound on the number of tasks a client could complete in the optimal case. “The Super-Optimal policy is similar to the Buffer None policy in that it doesn’t have the client download a new task until it has returned the task it is working on. However, in the Super-Optimal policy, the act of downloading the next task is assumed to occur instantaneously, while the Buffer None policy is unable to work on the task until sufficient time to download the task has passed. Not downloading a task until it can be started minimizes the chance that the task is not completed on time by giving the computer the most wall clock time to complete the task before it would be aborted. Having the download take no time mimics the idea that downloading the task uses very little CPU time and mostly other resources such as the network card, which might otherwise be unused as the current task is being processed. An aspect of this policy that is even more significant is the look-ahead feature it uses. When the simulated client using the Super-Optimal policy is ready to download and start a task, it scans through the trace and determines if it would be able to complete the task on time. If the client is able to complete the task on time, it downloads the task and begins working on it. Otherwise, the client sleeps for 1 s and then checks to see whether it could complete a task if it downloaded and started it then” [7]. This policy “ensures that the most tasks will be completed by a client using the Super-Optimal policy by ensuring that a task that will not be completed does not use CPU time that could otherwise be used to complete a task. We note that the Super-Optimal policy clearly cannot actually be implemented in a volunteer computing client because it requires knowledge of the computer’s future availability. However, the Super-Optimal policy provides an upper bound on the number of tasks a client can complete in a given amount of time. The upper bound serves as a benchmark, allowing us to determine how effective other policies are and whether there is sufficient motivation to try to develop better policies” [7].
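The look-ahead check described above can be expressed as a simple scan over the availability trace. The sketch below is an illustration of that idea, not the authors’ simulator; the boolean-per-second trace representation and all names are assumptions for this example. It also shows why the policy cannot be deployed: it reads availability that a real client could not know in advance.

```java
/** Illustrative Super-Optimal look-ahead: only start a task that can finish on time. */
public class SuperOptimalLookahead {

    /**
     * @param availableBySecond one entry per second of the trace: true if the
     *                          computer is available for volunteer computing then
     * @param now               current position (seconds) in the trace
     * @param delayBoundSeconds wall clock time before the task would be aborted
     * @param taskCpuSeconds    CPU time required to complete the task
     */
    static boolean canCompleteOnTime(boolean[] availableBySecond, int now,
                                     int delayBoundSeconds, int taskCpuSeconds) {
        int usable = 0;
        int end = Math.min(availableBySecond.length, now + delayBoundSeconds);
        for (int t = now; t < end; t++) {
            if (availableBySecond[t]) {
                usable++;
            }
            if (usable >= taskCpuSeconds) {
                return true; // the task fits before its deadline
            }
        }
        return false; // not enough available CPU time inside the delay bound
    }

    public static void main(String[] args) {
        // Toy trace: available for the first 2 hours, then unavailable for 2 hours.
        boolean[] trace = new boolean[4 * 3600];
        for (int t = 0; t < 2 * 3600; t++) {
            trace[t] = true;
        }
        // A 1-hour task with a 3-hour delay bound fits; a 3-hour task does not.
        System.out.println(canCompleteOnTime(trace, 0, 3 * 3600, 3600));     // true
        System.out.println(canCompleteOnTime(trace, 0, 3 * 3600, 3 * 3600)); // false
    }
}
```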

Our task retrieval policies should not have a noticeable impact on the servers used for volunteer computing projects for the following reasons. Our policies do not cause clients to download more tasks than the Buffer Multiple Tasks policy currently in use by many volunteer computing projects. In fact, our policies may result in fewer tasks being downloaded by clients. Each task requires a small calculation before being downloaded to ensure it could potentially be completed on time, so this small calculation is performed a little less frequently. However, clients using our new task retrieval policies may request tasks more often since they only download one at a time. Thus, there may be some more requests to the servers. These requests also contain very little data and entail very little processing. Thus, we expect the additional costs and savings to roughly offset each other, and even if they do not, they are so small that they should not cause any noticeable impact on the servers.

2.3 Checkpointing

“Volunteer computing clients implement checkpointing at regular intervals to avoid needing to restart any incomplete task from the beginning in” the “event that the client is terminated for any reason, such as the computer being turned off” [7]. “Our simulator incorporates checkpointing to accurately mimic volunteer computing clients. However, the data on how frequently checkpoints are created and how long it takes to create a checkpoint and restore the state of a task from a checkpoint was not available, and we believe that this likely varies from project to project. Therefore, we assume that volunteer computing projects checkpoint at the optimal interval to minimize the time it takes to complete the project. We estimated that it takes 10 s to create a checkpoint or to restore the state of a task from a checkpoint. We calculated the frequency that tasks would be interrupted based on the computer usage traces we collected. Using the estimate of time to create a checkpoint and restore from a checkpoint and the failure rate of tasks, we used Young’s formula to calculate the optimal checkpoint interval for use in our simulations [19]” [7].
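For reference, Young’s first-order approximation [19] gives the optimal checkpoint interval in terms of the checkpoint cost and the mean time between interruptions. The back-calculated mean time between interruptions below is our own arithmetic from the paper’s 10 s and 1574 s figures, not a number reported by the authors:

$$T_{\mathrm{opt}} = \sqrt{2\,\delta\,M}$$

where $\delta$ is the time to write a checkpoint and $M$ is the mean time between interruptions. With $\delta = 10$ s, an interval of $T_{\mathrm{opt}} \approx 1574$ s corresponds to $M = T_{\mathrm{opt}}^2 / (2\delta) \approx 1574^2 / 20 \approx 1.24 \times 10^5$ s, or roughly 34 h between interruptions.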

2.4 Volunteer Computing Project Parameters

“We wanted to ensure that our simulations accurately showed which task retrieval policies enabled clients to complete the most tasks over a period of time under realistic conditions. Ideally, we would observe that a particular policy consistently allowed clients to complete more tasks. If we observed that different conditions produced different results, then volunteer computing project sponsors could select the policy that would work best for their project, based on the expected values of their project’s parameters such as delay bound and file size. The parameters we used in the simulations were:

– File Size. The size of the file required by each task. The file is downloaded before the task can be started.

– Download Speed. The download speed of the client.

– Completion Time. The amount of CPU time required to complete a task if it is not interrupted.

– Buffered Days of Work. The number of days of work a client using the Buffer Multiple Tasks policy tries to buffer. This parameter is not relevant for the other task retrieval policies” [7].

– Retrieval Policy. The policy clients use to retrieve tasks.

– “Delay Bound. The amount of wall clock time between completing the download of a task and when the task will be aborted.

– Checkpoint Time. The time to create or restore a checkpoint.

– Internet Connectivity. We assume that the client is always connected to the Internet because we do not expect mobile devices to be used to run volunteer computing clients.

We gathered as much information as we could find about what parameter values specific volunteer computing projects use (shown in Table 2)” [7]. “Using the information we gathered on existing volunteer computing projects and calculations we performed to estimate the optimal checkpoint interval for volunteer computing projects using the formula from Young [19], we determined the parameters we used for our simulations. We adopted the following base parameter settings based on the data in Table 2:

– File Size: 1 MB
– Download Speed: 300 kbps, 10 mbps
– Delay Bound: 7 days
– Checkpoint Cost and Frequency: 10 s and 1574 s
– Task Duration: 4 h, 24 h” [7].
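For convenience, the same base settings can be collected in one place, for example as they might be encoded for a parameter sweep. This is only an organizational sketch; the class and field names are not taken from the authors’ simulator.

```java
/** The base simulation settings listed above, gathered as constants for a sweep. */
public class BaseParameters {
    static final double FILE_SIZE_MB = 1.0;
    static final double[] DOWNLOAD_SPEEDS_KBPS = {300, 10_000};   // 300 kbps, 10 mbps
    static final int DELAY_BOUND_SECONDS = 7 * 24 * 3600;         // 7 days
    static final int CHECKPOINT_COST_SECONDS = 10;
    static final int CHECKPOINT_INTERVAL_SECONDS = 1574;
    static final int[] TASK_DURATIONS_HOURS = {4, 24};
    static final double[] BUFFERED_DAYS = {1, 3.5, 7, 14};        // Buffer Multiple Tasks only

    public static void main(String[] args) {
        System.out.println("Delay bound (s): " + DELAY_BOUND_SECONDS);
    }
}
```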

2.5 Simulating Task Retrieval Policies for Computers with Multi-Core CPUs

In order to understand whether multi-core CPUs would change which task retrieval policies resulted in the most tasks being completed, we modified our simulator and ran a new set of simulations. “We began by calculating the number of tasks that we would expect a computer with an n-core CPU to complete if each core worked on a separate task, based on the results from [7, 9]. This value gave us a baseline to compare the number of tasks a computer with an n-core CPU would complete if each core could work on the same task at once without incurring additional overhead and assuming that the task could be decomposed into n even sub-tasks that could be executed in parallel without additional cost” [8]².

² Portions reprinted, with permission, from Toth, D. The Impact of Multi-Core Architectures on Task Retrieval Policies for Volunteer Computing. Proceedings of The 20th IASTED International Conference on Parallel and Distributed Computing and Systems - PDCS 2008, November 16–18, 2008, pp. 330–335, Orlando, Florida, USA. ©2008 IASTED.
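In other words, writing $C_1$ for the number of tasks a single-core client completed under a given policy and parameter setting (the notation here is ours, not the paper’s), the baseline for an $n$-core CPU is linear scaling, and the quantity compared against it in Section 3.3 is the task count when all $n$ cores cooperate on one task at a time:

$$\text{baseline}(n) = n \cdot C_1, \qquad \text{relative improvement}(n) = \frac{C_n^{\text{shared}}}{n \cdot C_1}.$$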

Table 2 Parameter values for various volunteer computing projects [7] (© IEEE 2008)

Project | File size | Delay bound | Task duration | Checkpointing frequency
SETI@home | 350 KB [20] | 4 to 60 days [21] | | 10 s [22]
Folding@home | Normal < 5 MB, some new large ones ≈ 5 MB [23, 24] | Max (10 days or 2 + 30 ∗ days to do on dedicated P4–2.8 machine) [25] | 2–4 days on benchmark machine (P4–2.8) [26] | 15 min [25]
Einstein@home | 4.5 MB, 12 MB, 16 MB [27, 28] | 1 week [28] | |
QMC@home | | | 4–48 h [29] |
LHC@home | | 8 days, then 7 days, then 5 days [30, 31] | |
Rosetta@home | 800 k, 1 MB, 1.2 MB, 1.5 MB, 3 MB [32] | (672 h) changed to 168 h [32], 2–4 days [32, 33] | 3 h, up to 24 h, varies based on user settings [33] |
Grid.org | 10 KB–100 KB [32] | 222 h, 336 h [32] | 100 h, 150 h, 200 h, 20 h on PII-400 [32] |
Climateprediction.net | | 347 days 5 h 20 min or 150 days 5 h 20 min [21] | 3 weeks, 8 weeks, 20 weeks [34] | 15–30 min [22]
SIMAP | 1–2 MB [35] | 10 days [21] | 2 h [35] |
The Riesel Sieve project | | 7 days [36] | 30 min [37] | 10 min [38]
World Community Grid | | | 10–20 h [39] |

2.6 Simulating Task Retrieval Policies for Computers in the Context of the Green Movement

In order to understand whether the green movement would affect which task retrieval policies resulted in the most tasks being completed, we modified our simulator again and ran one more set of simulations. In these simulations, we assume computers will run the volunteer computing clients while they are powered on and the screensaver is not running. We assume a computer will enter a power saving mode and the client will not be able to run when a computer’s screensaver turns on.

3 Results & Analysis

This work began by examining the results of using task retrieval policies for computers with one single-core CPU and we present those results in Section 3.1. We discuss how we validated our simulations for single-core CPUs in Section 3.2. We then modified our simulator to analyze how these results would be affected when computers with multi-core CPUs participated in volunteer computing and present those results in Section 3.3. Finally, we modified the simulator once more to understand the effects the environmental movement might have on the results for both single-core and multi-core computers. We present those results in Section 3.4 and Section 3.5.

3.1 Single-Core CPUs

“We ran the simulations for the task retrieval policies and the selected parameters using the traces we had obtained. We compared the total number of tasks each policy completed using all 140 traces” “to the number of tasks the Super-Optimal policy completed. The results of the simulations are shown in Table 3. The percentages shown for the policies indicate how many fewer tasks the policies completed than the number of tasks the Super-Optimal policy completed. Thus, volunteer computing projects will complete more tasks by using a policy with a lower percentage” [7]. The clients completed more tasks when they used the Download Early and Buffer None policies than when they used the policies that buffer tasks. For our base parameter settings, the total number of tasks completed by the clients was greater for the Download Early and Buffer None policies than for the Buffer Multiple Tasks policy. For 4-h tasks, there is not a large difference in the number of tasks completed by the different policies. This is likely because clients had periods where the screensaver ran for long enough to complete entire tasks or significant portions of tasks. This would lead to all of the clients using the different policies not having to redo a lot of portions of a task and result in more tasks being completed on time. However, the difference between buffering tasks as compared to not buffering any tasks is more significant when the tasks require 24 h of CPU time to be completed. Based on these results, the Download Early and Buffer None policies appear to be better policies for retrieving tasks in volunteer computing projects for the given parameter settings than the policies that buffer tasks. Another point worth noting is that there is very little difference between the number of tasks clients completed when they used the Download Early and Buffer None policies and the number of tasks clients completed when using the Super-Optimal policy. Therefore, it does not appear to make sense to try to devise an adaptive policy that dynamically determines how much work to buffer based on recent computer usage patterns, in an attempt to increase the number of tasks volunteer computing clients could complete. Such a policy would not be able to increase productivity significantly [7, 9].

Table 3 Differences between amount of tasks completed by Super-Optimal policy and other policies [7, 9] (© IEEE 2008)

Task completion time | Download speed | Buffer none | Download early | Buffer 1 task | Buffer 1 day of tasks | Buffer 3.5 days of tasks | Buffer 7 days of tasks | Buffer 14 days of tasks
4 h | 300 kbps | 0.22% | 0.03% | 0.19% | 0.25% | 0.46% | 0.94% | 0.94%
4 h | 10 mbps | 0.03% | 0.03% | 0.19% | 0.25% | 0.44% | 0.93% | 0.93%
24 h | 300 kbps | 0.43% | 0.38% | 0.86% | 1.13% | 2.52% | 3.33% | 3.33%
24 h | 10 mbps | 0.38% | 0.38% | 0.86% | 1.13% | 2.52% | 3.33% | 3.33%

“In our simulations, we observed that buffering tasks led to completing fewer tasks. This is a result of the delay bound for tasks. As discussed in Section 2.4, the delay bound is the amount of wall clock time between when the file required for a task is downloaded and when the task must be completed” [7]. Clients using the Buffer None policy (and the Download Early policy in most cases) may use all the available CPU cycles between when it is downloaded and when its delay bound is over. “However, when a client uses the Buffer Multiple Tasks policy, the downloads for all of the files for the tasks assigned at one time complete at approximately the same time. Thus, the deadline for those tasks is almost identical and all of those tasks must be completed in the available CPU cycles between when the files required for the tasks are downloaded and when the delay bound is over. In addition to all of those tasks needing to be completed, the last task from the previous set of” downloaded tasks “must be completed during that same block of available CPU cycles. Thus, if there are x usable CPU cycles between when the files for the tasks are downloaded and when the delay bound is over and if n tasks are downloaded at one time by a client using the Buffer Multiple Tasks policy, n+1 tasks must be completed in the x CPU cycles. In contrast to that, a client using the Buffer None policy” (and the Download Early policy in most cases) “has all x CPU cycles to complete one task. This leads to clients using the Buffer Multiple Tasks policy not completing tasks more often than clients using the” Download Early or Buffer None policies. “If a task is not completed on time, all of the CPU time spent on that task is wasted. Over the course of our simulations of the 28-day traces, some of the time wasted by clients using the Buffer Multiple Tasks policy was used productively by clients using the” Download Early and Buffer None policies, allowing the clients using the Download Early and Buffer None policies to complete as many as 3 tasks more than clients using the Buffer Multiple Tasks policy [7, 9].
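A small worked example (with illustrative numbers, not values from the traces) makes the argument concrete. Suppose x = 100 h of usable CPU time falls inside the delay bound window and each task needs 24 h of CPU time. A Buffer None client only ever needs its single task to fit, while a client that buffered n = 4 tasks must also finish the last task of the previous batch in the same window:

$$1 \cdot 24\,\text{h} \le 100\,\text{h}, \qquad (n + 1) \cdot 24\,\text{h} = 5 \cdot 24\,\text{h} = 120\,\text{h} > 100\,\text{h},$$

so at least one of the buffered tasks misses its deadline and the CPU time already spent on it is wasted.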

3.2 Validation of Simulation Results

It was important to “validate that the results of our simulations would hold in the real world” because simulations can fail to include details that would have a large effect on the results, such as network congestion [9]. We “developed a volunteer computing server program and an emulated volunteer computing client to test the task retrieval policies under more realistic conditions. We ran the server program and the emulated volunteer computing client with different policies using a subset of the traces we collected” [9]. “We used a private network of 52 computers to run the validation. One computer ran nothing but the volunteer computing server program and a web server from which the clients downloaded data files. A second computer functioned as a router. The other 50 computers ran the emulated volunteer computing clients” [9].

“For the validation, we elected to use the parameter settings that we previously determined were the ones we expected to see most commonly in real volunteer computing projects, as discussed in Section 2.4. This gave us a 1 MB file size, a delay bound of seven days, a checkpoint cost of 10 s and a checkpoint frequency of 1574 s. We needed to select a download speed from 300 kbps and 10 mbps and a task duration from 4 h and 24 h. We selected the download speed of 10 mbps because this was easier to implement in our experiment. We selected the task duration of 24 h because we believe our experience with volunteer computing projects is that tasks take an amount of CPU time much closer to 24 h than to 4 h” [9].

Due to hardware constraints, we were only able to test a subset of task retrieval policies and traces. We tested the Buffer None, Download Early, Buffer 1 Task, and Buffer Multiple Tasks policies, with 1 day and 7 days of tasks for the Buffer Multiple Tasks policy. We elected to test 1 day rather than 3.5 days because we felt that a user was more likely to buffer a round number of days like 1 than 3.5. The results for the parameters we were testing were the same for buffering 7 days and 14 days, so we chose to buffer 7 days. To select the traces to test, we selected the traces that caused clients to complete different numbers of tasks for different task retrieval policies. From these traces, we selected the ten traces “that resulted in the greatest variability in the number of tasks completed for the simulations” [9].

We created a number crunching task we believed was similar to tasks volunteer computing clients would perform for some projects. The task was to perform Fast Fourier Transforms using data files for input. We had our computers each run 50 trials to see how many FFTs they could perform in an hour and scaled the average up to the number of FFTs that should be completed in 24 h [9].

“The volunteer computing client is” a “multi-threaded Java application. When the client starts, a single thread (the control thread) starts. The control thread sleeps for some number of minutes equal to 5 * the client’s unique identification number, which is a command line argument. This initial sleeping period staggers the start times of the clients so they are not all downloading the initial files at exactly the same time, making the situation more realistic. Once the client has finished its initial sleep, it wakes up and reads a trace file specified as a command line argument. The trace file is one of the 140 traces we collected”. “The data in the trace file is used by the control thread to instruct other threads to start, pause, resume, or stop, based on the trace at the appropriate times to mimic the status of the computer from which the trace was collected. The other threads are a science application thread, a checkpoint thread, and one or more download threads. Those three threads form the actual volunteer computing client, while the control thread simulates the computer and screensaver status. The control thread runs for the entire duration of the trace, starting, stopping, pausing, and resuming the science application thread and download threads at the appropriate times. The science application thread controls the checkpoint thread. Once the end of the trace has been reached, the client outputs the number of tasks it has completed and terminates” [9]. The results for our validation are shown in Table 4 and show good agreement with the simulation results.

Table 4 Comparison of number of tasks completed by validation and simulation [9]

Trace | Method | Download early | Buffer none | Buffer 1 task | Buffer 1 day | Buffer 7 days
Business 9 | Simulation | 12 | 12 | 11 | 11 | 11
Business 9 | Validation | 11 | 12 | 10 | 9 | 10
Business 26 | Simulation | 16 | 16 | 15 | 15 | 15
Business 26 | Validation | 16 | 16 | 15 | 15 | 12
CCC 7 | Simulation | 16 | 16 | 16 | 16 | 13
CCC 7 | Validation | 16 | 16 | 15 | 15 | 12
CCC 10 | Simulation | 16 | 16 | 16 | 16 | 14
CCC 10 | Validation | 16 | 16 | 15 | 15 | 12
CCC 42 | Simulation | 17 | 17 | 17 | 17 | 15
CCC 42 | Validation | 16 | 16 | 15 | 16 | 16
CCC 55 | Simulation | 20 | 20 | 20 | 20 | 18
CCC 55 | Validation | 20 | 20 | 19 | 20 | 17
CCC 64 | Simulation | 22 | 22 | 22 | 22 | 20
CCC 64 | Validation | 22 | 22 | 21 | 21 | 19
Home 1 | Simulation | 9 | 9 | 8 | 8 | 8
Home 1 | Validation | 10 | 10 | 8 | 8 | 7
Home 24 | Simulation | 10 | 10 | 8 | 8 | 8
Home 24 | Validation | 10 | 10 | 9 | 8 | 8
Student 4 | Simulation | 11 | 11 | 11 | 11 | 9
Student 4 | Validation | 11 | 11 | 11 | 11 | 8

3.3 The Impact of Multi-Core CPUs

Multi-core CPUs have become commonplace today and therefore, it is important to explore the impact multi-core CPUs have on task retrieval policies. “We ran simulations for CPUs with 2, 3, 4, 8, and 16 cores” [8]. “We found that in 138 of the 140 cases, the number of tasks completed exceeded the” linear scaling of the number of tasks completed by each policy with respect to the number of cores, resulting in a slightly better than linear increase in the number of tasks completed in relation to the number of processing cores used [8]. “This means that by having the processing cores in a computer’s CPU all work on one task simultaneously, the number of tasks that the computer would complete would be more than if each core worked on a separate task. The two cases where the number of completed tasks did not exceed the predicted number of completed tasks were when the download speed was slow (300 kbps) and the tasks were short (4 h of CPU time was required for a single core to complete a task). We note that increasing the time required to complete a task or increasing the download speed once again resulted in every policy completing more tasks than predicted” [8].

“In general, more tasks were completed than predicted because some tasks that were aborted when run on a single core CPU were completed because it took less wall clock time to complete them with multiple cores. Also, time that was spent on tasks that were aborted when run on a single core CPU in some cases was no longer wasted when run on multiple cores and could be used to complete additional tasks. We observed that in general, the amount by which the expected number of completed tasks was exceeded increased as the number of processing cores used increased and increased more for longer tasks” [8].

“Our simulations suggest that for the 24 h tasks, a computer with a quad-core CPU could increase the number of tasks it completed by almost 4% for some task retrieval policies and as much as 6.5% for other policies if the cores worked collaboratively on one task at a time rather than working on separate tasks. We also observed that the policies that cause a client to buffer tasks exceed the predicted number of completed tasks by a higher percentage than the policies that do not buffer tasks. Increasing the amount of tasks buffered resulted in exceeding the predicted number of completed tasks by a larger amount, except for the policy that buffered 14 days of tasks. The policy that buffered 14 days of tasks had a percent improvement in completed tasks that was slightly lower than the policy that buffered 7 days of tasks” [8].

“The fact that policies that buffered tasks had larger increases in the percent improvement over the predicted number of completed tasks than the policies that did not buffer tasks is significant. This means that the gap between the numbers of tasks that are completed by clients using different policies shrinks as the number of CPU cores used to work on a single task increases” [8]. “The shrinking of that gap means that using task retrieval policies that buffer work will not result in as significant a decrease in the number of tasks completed as more cores work on a single task. In fact, in some cases, computers using multiple cores to work on a single task complete more tasks if they buffer 14 days of tasks than if they do not buffer any tasks which was not the case for a single core CPU” [8]. “Figure 1 shows the gap between the policy that completes the most tasks and the policy that completes the fewest tasks” [8].

An exception to the gap shrinking as more CPU cores work on a single task occurred for 4-h tasks and a 300 kbps download speed. In this case, “the performance gap decreases when 2 cores are used instead of 1, it then increases as more cores are added” [8]. For 4-h tasks and a 300 kbps download speed, “the clients using the Buffer None policy fail to maintain the slightly greater than linear speedup. This implies that as clients run on multi-core computers, projects should transition away from using the Buffer None policy” [8]. Because “the Download Early policy still outperforms the Buffer Multiple Tasks policy in all cases” and the Buffer None policy fails to maintain the slightly greater than linear speedup, the Download Early policy appears to be the best policy to use for clients running on computers with multi-core CPUs [8].

Fig. 1 Difference between number of tasks completed by policies that complete the most and fewest tasks (© IASTED 2008)

3.4 The Impact of Running Clients while the Screensaver is Off on Single-Core CPU Computers

The environmental movement today may pose a difficult challenge for volunteer computing because some people may put their computers into power-save modes when the screensavers come on instead of running volunteer computing clients. Some people who volunteer their computers may wish to still participate in volunteer computing while trying to conserve electricity. We believe these people may choose to run a volunteer computing client as a background process while the computer’s screensaver is off and then let the computer enter a power-saving mode when the screensaver starts. In an attempt to determine how users who make this change in behavior can complete the most tasks, we ran simulations to determine which task retrieval policies would result in the greatest number of completed tasks under those conditions. In general, we expected to see a change in the number of tasks completed by clients based on the percentage of time clients would run if they ran as screensavers as compared to the percentage of time clients would run if they ran as background processes while screensavers were not running. The computers our traces were collected from had the screensaver running approximately 53% of the time on average and were powered on with the screensaver not running approximately 27% of the time on average. Thus, we expected to see a decrease in productivity of nearly 50% and our simulations confirmed this. However, the important information to determine is which task retrieval policies result in the greatest number of tasks being completed and thus we examined the results of our simulation more closely.
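The near-50% figure follows directly from the two availability numbers just quoted: a client that runs only while the screensaver is off sees

$$\frac{27\%}{53\%} \approx 0.51$$

of the CPU time available to a client that runs as a screensaver, a decrease of roughly 49%.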

For computers with one single-core CPU, our simulations demonstrated that the different task retrieval policies produce a significant difference in the productivity of clients running while a computer’s screensaver is off. Figure 2 illustrates the difference between the numbers of tasks clients completed using the different task retrieval policies while running when the screensaver was off. When clients run while the screensaver is on instead of while the screensaver is off, the difference in the number of tasks clients complete using the different task retrieval policies is much smaller. Clients running while the screensaver was on and using the least productive task retrieval policy completed 0.9% to 2.96% fewer tasks than the number of tasks completed by the clients using the most productive policy. In contrast, clients running while the screensaver was off and using the least productive task retrieval policy completed 8.5% to 40.17% fewer tasks than the clients using the most productive policy. The percent difference in the tasks completed by the most productive and least productive clients is shown in Fig. 3.

Fig. 2 Difference in tasks completed by task retrieval policies

For 4-h tasks, although clients running while the screensaver was off and using the Download Early task retrieval policy completed more tasks than the clients using other task retrieval policies, clients using the Buffer None policy and clients using the Buffer 1 Task policy completed almost as many tasks and there was less than a 0.2% difference. However, clients buffering 1 day of tasks completed over 7% fewer tasks than the clients using the Download Early policy and buffering more than 1 day of tasks made clients complete even fewer tasks. For 24-h tasks, the clients using the Buffer None policy to retrieve tasks completed the most tasks and clients using the Download Early policy completed only 0.24% fewer tasks. However, clients buffering one task completed 17% fewer tasks and buffering more tasks caused the clients to complete even fewer tasks. Due to these differences, we recommend that clients running when the screensaver is off rather than when the screensaver is on use either the Download Early or Buffer None policies to retrieve tasks on computers with one single-core CPU.

Fig. 3 Difference in tasks completed by most and least productive task retrieval policies


3.5 The Impact of Running Clients while the Screensaver is Off on Multi-Core CPU Computers

As computers with multi-core CPUs have become the standard in the last couple of years, we felt it was important to assess which task retrieval policies would result in the greatest number of tasks completed by multi-core computers if the clients ran while the screensaver was off instead of while the screensaver was on. Similar to the results we observed with clients using multiple cores for a single task while the client ran as a screensaver, using multiple cores for a single task when a client ran while the screensaver was off generally produced greater than linear speedup in the number of tasks completed with respect to the number of cores used for a task. However, while the speedup observed for clients running when the screensaver was on was only slightly better than linear, the speedup observed for clients running when the screensaver was off was in some cases much more significant. For 4-h tasks, the speedup was roughly equivalent between clients running while the screensaver was on and clients running while the screensaver was off. However, for 24-h tasks, with four or more cores, the speedup observed in clients running while the screensaver was off was more than 1.5 times the speedup observed in clients running while the screensaver was on for policies that buffered 1 or more days of tasks. The speedups for 24-h tasks and the 300 kbps download speed are shown in Table 5 and are almost identical to the speedups for 24-h tasks and the 10 mbps download speed.

In general, there was a much greater difference between the percent of tasks that would be completed by clients using the most productive and least productive task retrieval policies if clients ran when the screensaver was off instead of when the screensaver was on. The difference between the percent of tasks that would be completed by clients using the most productive and least productive task retrieval policies is shown in Fig. 4. The best and worst task retrieval policies for a computer with a dual-core CPU running a client as a screensaver resulted in a maximum difference of 1.66% fewer tasks being completed. In contrast, the best and worst task retrieval policies for a computer with a dual-core CPU running the volunteer computing client as a background process when the screensaver is off resulted in a difference of as much as 22.1% fewer tasks being completed by the least productive policy than by the most productive policy. Reducing the difference between the percent of tasks completed to less than 5%, which is still larger than 1.66%, required a CPU with more than 8 cores for both 4-h tasks and 24-h tasks. Even a computer with a 16-core CPU would not be able to reduce the difference between the percent of tasks completed by clients using the best and worst task retrieval policies to 1.66%.

Table 5 Speedup comparison for 24-h tasks & 300 kbps download speed

CPU | Mode | Buffer none | Download early | Buffer 1 task | Buffer 1 day of tasks | Buffer 3.5 days of tasks | Buffer 7 days of tasks | Buffer 14 days of tasks
2 Core CPU | Screensaver on | 1.02 | 1.02 | 1.03 | 1.03 | 1.04 | 1.05 | 1.04
2 Core CPU | Screensaver off | 1.11 | 1.12 | 1.30 | 1.52 | 1.47 | 1.43 | 1.43
3 Core CPU | Screensaver on | 1.03 | 1.04 | 1.04 | 1.04 | 1.05 | 1.06 | 1.05
3 Core CPU | Screensaver off | 1.13 | 1.14 | 1.35 | 1.68 | 1.63 | 1.59 | 1.58
4 Core CPU | Screensaver on | 1.04 | 1.04 | 1.04 | 1.04 | 1.06 | 1.07 | 1.07
4 Core CPU | Screensaver off | 1.15 | 1.15 | 1.38 | 1.73 | 1.69 | 1.66 | 1.64
8 Core CPU | Screensaver on | 1.04 | 1.05 | 1.05 | 1.05 | 1.07 | 1.08 | 1.08
8 Core CPU | Screensaver off | 1.16 | 1.16 | 1.40 | 1.82 | 1.80 | 1.79 | 1.78
16 Core CPU | Screensaver on | 1.04 | 1.05 | 1.05 | 1.06 | 1.07 | 1.08 | 1.08
16 Core CPU | Screensaver off | 1.16 | 1.17 | 1.41 | 1.88 | 1.85 | 1.85 | 1.84

Fig. 4 Difference between number of tasks completed by the policies that complete the most and fewest tasks

For 4-h tasks, clients with multiple CPU cores working on a single task and running while the screensaver was off completed more tasks when they used the Download Early and Buffer 1 Task policies. Clients using other task retrieval policies consistently completed fewer tasks and differed by as much as 1% to 5%. For 24-h tasks, clients using the Buffer None and Download Early task retrieval policies completed more tasks than the other clients. Clients using the other policies completed fewer tasks.

Clients using the Buffer 1 Task policy completed almost as many tasks in some instances, but differed by more than 3% in other instances. The inconsistency leads us to believe the Download Early policy is the best choice for computers with multi-core CPUs where the client runs while the screensaver is off.

4 Conclusions & Future Work

Our simulations have shown the productivity of volunteer computing projects can be increased by having clients use certain task retrieval policies under the given assumptions and realistic conditions we presented. Our simulations for single-core computers indicated buffering tasks can lead to fewer tasks being completed and the Download Early task retrieval policy we presented outperforms the existing policies that clients use. These simulations also indicated little benefit could be gained by developing an adaptive policy for single-core computers.

Our simulations for multi-core computers indicated there is significant value in being able to develop tasks for volunteer computing clients that can be split into equal parts that can be worked on in parallel by multiple cores. Multi-core computers processing these tasks can achieve slightly greater than linear speedup relative to the number of cores working on the task. For tasks that can be processed in parallel by multiple cores, there is also a smaller performance loss caused by buffering tasks than the performance loss caused by buffering tasks on a single-core computer.

Introducing the idea that people may run volunteer computing clients while the computer's screensaver is not running, instead of when the screensaver is running, led to a very large performance difference between the policies that buffered work and the policies that did not for single-core computers. In this case, the Download Early policy outperformed the policies that buffered tasks, and increasing the number of buffered tasks widened the performance gap between clients using the Download Early policy and clients using the policies that buffered tasks. For clients running while the screensaver is off, multi-core computers again narrow the performance gap between the policies. However, it takes quad-core CPUs to close the gap between the Download Early policy and the least productive policy to 10%, and this gap is still larger than the gap between the most and least productive policies for clients running when the screensaver is on, whether with single-core or multi-core CPUs. It takes 8-core CPUs to close the gap to 5%. Therefore, until typical systems have CPUs with more than 8 cores, volunteer computing projects would be more productive if the Download Early policy were used. The Buffer 1 Task policy also does reasonably well and could be used if a policy that is slightly simpler to implement, or a policy that buffers tasks, is desired. As the Buffer None policy fails to maintain the slightly greater than linear speedup for 4-h tasks and a 300 kbps download speed as the number of cores increases, we recommend discontinuing the use of this policy.

In future work, we intend to address the assumptions "that tasks could be decomposed into n equal-length sub-tasks without any overhead" and that the "completion of the tasks would not be slowed down by any additional contention for resources such as accessing the disk or communication busses" [8]. We also intend to address the assumption that a volunteer computing client gets all of a computer's CPU cycles while the computer is not in power save mode. We intend to examine how the effectiveness of task retrieval policies changes when people run volunteer computing clients constantly as a low-priority process, regardless of whether the screensaver is active. Finally, we intend to study the effects processor affinity can have on task retrieval policies.

References

1. Anderson, D.P.: BOINC: a system for public resource computing and storage. In: 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, USA, 8 November 2004

2. Anderson, D.P., Cobb, J., Korpela, E., Lebofsky, M., Wertheimer, D.: SETI@home: an experiment in public-resource computing. Commun. ACM 45(11), 56–61 (2002)

3. Folding@Home: Folding@Home distributed computing. http://folding.stanford.edu/. Accessed 26 October 2006

4. GIMPS: Mersenne prime search. http://www.mersenne.org/prime.htm. Accessed 13 July 2005

5. Bohannon, J.: Grassroots supercomputing. Science 308, 810–813 (2005)

6. Toth, D., Finkel, D.: File-based tasks for public-resource computing. In: Proceedings of the 17th IASTED International Conference on Parallel and Distributed Computing and Systems - PDCS 2005, pp. 398–403, Phoenix, Arizona, USA, 14–16 November 2005

7. Toth, D., Finkel, D.: Increasing the amount of work completed by volunteer computing projects with task distribution policies. In: Proceedings of the 2nd Workshop on Desktop Grids and Volunteer Computing Systems - PCGrid 2008, Miami, Florida, USA, 18 April 2008

8. Toth, D.: The impact of multi-core architectures on task retrieval policies for volunteer computing. In: Proceedings of the 20th IASTED International Conference on Parallel and Distributed Computing and Systems - PDCS 2008, pp. 330–335, Orlando, Florida, USA, 16–18 November 2008

9. Toth, D.: Improving the productivity of volunteer computing. Ph.D. Dissertation (2008)

10. Toth, D., Finkel, D.: Characterizing resource availability for volunteer computing and its impact on task distribution methods. In: Proceedings of the 6th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems - SEPADS 2007, Corfu, Greece, 16–19 February 2007

11. Mutka, M.H., Livny, M.: Profiling workstations' available capacity for remote execution. In: Proc. 12th IFIP WG 7.3 International Symposium on Computer Performance Modeling, Measurement and Evaluation, pp. 529–544. Brussels, Belgium (1987)

12. Acharya, A., Edjlali, G., Saltz, J.: The utility of exploiting idle workstations for parallel computation. In: Proceedings of SIGMETRICS'97, pp. 225–234. Seattle, Washington, USA (1997)

13. Kondo, D., Taufer, M., Brooks, C., Casanova, H., Chien, A.: Characterizing and evaluating desktop grids: an empirical study. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS'04) (2004)

14. Kondo, D., Andrzejak, A., Anderson, D.P.: On correlated availability in internet-distributed systems. In: Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, 29 Sept – 1 Oct 2008

15. BOINC: Choosing BOINC projects. http://boinc.berkeley.edu/projects.php. Accessed 26 October 2006

16. Preferences. http://boinc.berkeley.edu/prefs.php. Accessed 24 February 2005

17. Work distribution. http://boinc.berkeley.edu/work_distribution.php. Accessed 23 February 2005


18. Grid.org: GRID.ORG - help: frequently asked questions. http://www.grid.org/help/faq_wus.htm. Accessed 1 April 2005

19. Young, J.: A first order approximation to the optimum checkpoint interval. Commun. ACM 17, 530–531 (1974)

20. SETI@home beta - some questions. http://setiathome.berkeley.edu/forum_thread.php?id=37561. Accessed 8 February 2007

21. Result deadline - unofficial BOINC Wiki. http://boinc-boinc-wiki.ath.cx/index.php?title=Deadline. Accessed 9 February 2007

22. Posts by Keck_Komputers. http://einstein.phys.uwm.edu/forum_user_posts.php?userid=2914. Accessed 9 February 2007

23. Folding@Home News. http://folding.stanford.edu/news.html. Accessed 9 February 2007

24. Folding@Home configuration FAQ. http://folding.stanford.edu/FAQ-settings.html. Accessed 9 February 2007

25. Frequently asked questions (FAQ). http://folding.stanford.edu/faq.html. Accessed 9 February 2007

26. WorkUnits - FaHWiki. http://fahwiki.net/index.php/WorkUnits. Accessed 8 February 2007

27. Workunit size vs. processor. http://einstein.phys.uwm.edu/forum_thread.php?id=4583. Accessed 8 February 2007

28. Einstein@Home FAQ. http://einstein.phys.uwm.edu/faq.php. Accessed 8 February 2007

29. QMC@Home - Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/QMC%40Home. Accessed 8 February 2007

30. Report deadline too short. http://lhcathome.cern.ch/forum_thread.php?id=1977. Accessed 8 February 2007

31. Get wu's with deadline less than 5 days, why?? http://lhcathome.cern.ch/forum_thread.php?id=1619. Accessed 9 February 2007

32. grid.org forums - view topic - READ ME -=- workunits (WU). http://forum.grid.org/phpBB/viewtopic.viewtopic.php?t=8847&highlight=workunit+size. Accessed 8 February 2007

33. Rosetta@Home FAQ (work in progress). http://boinc.bakerlab.org/rosetta/forum_thread.php?id=669. Accessed 23 October 2006

34. ClimatePrediction.Net gateway. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/quick_faq.php. Accessed 8 February 2007

35. BOINCSIMAP :: view topic - Wus. http://boinc.bio.wzw.tum.de/boincsimap/forum/viewtopic.php?t=5. Accessed 8 February 2007

36. The Riesel Sieve project :: view topic - length of WU. http://www.rieselsieve.com/forum/viewtopic.php?t=819. Accessed 9 February 2007

37. PerlBOINC :: RieselSieve. http://boinc.rieselsieve.com/?faq. Accessed 8 February 2007

38. The Riesel Sieve project :: view topic - checkpointing? http://www.rieselsieve.com/forum/viewtopic.php?t=1084. Accessed 9 February 2007

39. World Community Grid - view thread - runtimes for work units - what to expect. http://worldcommunitygrid.org/forums/wcg/viewthread?thread=928. Accessed 8 February 2007