David P. Anderson Space Sciences Laboratory University of California – Berkeley


Page 1

David P. Anderson
Space Sciences Laboratory
University of California – Berkeley

Public Distributed Computing with BOINC

Page 2

Public-resource computing

● 1 billion Internet-connected PCs in 2010
● >50% of PCs are privately owned
● Assume 100M participants
    – At least 100 PetaFLOPs
    – At least 1 Exabyte (10^18) storage
● Problems
    – incentive, security, failures, ...
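The aggregate estimates imply fairly modest per-host contributions. A quick back-of-the-envelope check, using only the figures on the slide (the variable names are illustrative):

```python
# Per-host share implied by the slide's aggregate estimates.
participants = 100_000_000        # "Assume 100M participants"
total_flops = 100e15              # 100 PetaFLOPs
total_storage_bytes = 1e18        # 1 Exabyte

flops_per_host = total_flops / participants
storage_per_host = total_storage_bytes / participants

print(f"{flops_per_host / 1e9:.0f} GFLOPS per host")   # 1 GFLOPS per host
print(f"{storage_per_host / 1e9:.0f} GB per host")     # 10 GB per host
```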

Page 3

SETI@home

● Started May 1999
● ~600,000 active participants
● ~60 TeraFLOPs
● Problems with current software
    – hard to change/add algorithms
    – can't share participants with other projects
    – inflexible data architecture

Page 4

SETI@home data architecture

[Diagram comparing the current and ideal data architectures. Labels: tapes, Berkeley, Stanford, USC, Internet2 (free), commercial Internet, participants, 50 Mbps.]

Page 5

BOINC: Berkeley Open Infrastructure for Network Computing

● Multiple projects
    – easy to develop and operate
    – independent
● Support wide range of tasks
    – computation/storage
    – task “topologies”
● Participant features
    – can choose projects, resource allocation
    – configurable; invisible on participant hosts
    – many platforms supported

Page 6

BOINC server architecture

[Diagram. Components: work generator, project DB, BOINC DB, timeout/retry, validator, assimilator, file deleter, data servers, scheduling server, Web interfaces (PHP).]

Page 7

BOINC client architecture

[Diagram. Components: BOINC core client, screensaver, two applications (each linked with the BOINC library), schedulers, data servers. Connection labels: files, shared memory, messages.]

Page 8

Data architecture

● Files
    – immutable, replicated
    – may originate on client or project
    – may remain resident on client
● Persistent, non-intrusive file transfers
● XML descriptor:

<file_info>
    <name>arecibo_3392474_jun_23_01</name>
    <url>http://ds.ssl.berkeley.edu/a3392474</url>
    <url>http://dt.ssl.berkeley.edu/a3392474</url>
    <md5_cksum>uwi7eyufiw8e972h8f9w7</md5_cksum>
    <nbytes>10000000</nbytes>
</file_info>
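A minimal sketch of how a client might read such a descriptor and check a downloaded file against it. The function names `parse_file_info` and `verify` are hypothetical, not part of BOINC, and the checksum on the slide is a placeholder rather than a real MD5 digest:

```python
import hashlib
import xml.etree.ElementTree as ET

# Descriptor copied from the slide.
DESCRIPTOR = """
<file_info>
    <name>arecibo_3392474_jun_23_01</name>
    <url>http://ds.ssl.berkeley.edu/a3392474</url>
    <url>http://dt.ssl.berkeley.edu/a3392474</url>
    <md5_cksum>uwi7eyufiw8e972h8f9w7</md5_cksum>
    <nbytes>10000000</nbytes>
</file_info>
"""

def parse_file_info(xml_text):
    """Return the descriptor fields as a plain dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("name"),
        "urls": [u.text for u in root.findall("url")],  # replica locations
        "md5": root.findtext("md5_cksum"),
        "nbytes": int(root.findtext("nbytes")),
    }

def verify(data, info):
    """Check downloaded bytes against the descriptor's size and checksum."""
    size_ok = len(data) == info["nbytes"]
    return size_ok and hashlib.md5(data).hexdigest() == info["md5"]

info = parse_file_info(DESCRIPTOR)
print(info["name"], len(info["urls"]), info["nbytes"])
```

Because files are immutable and carry several URLs, a client that fails verification on one replica can simply retry from the next.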

Page 9

BOINC applications

● Any language (C, C++, Fortran)
● BOINC API
    – filename translation
    – checkpoint/restart, % done, CPU time
    – graphics (based on OpenGL, GLUT)
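The checkpoint/restart item can be illustrated with a generic sketch. This is not the BOINC API (which the slide does not spell out); the functions `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical names, and the "computation" is a toy running sum:

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash never leaves a partial checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    """Resume from the last checkpoint, or start fresh if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"i": 0, "total": 0}

def run(path, n_iter, stop_at=None):
    """Sum 0..n_iter-1, checkpointing each step; stop_at simulates an interruption."""
    state = load_checkpoint(path)
    while state["i"] < n_iter:
        if stop_at is not None and state["i"] == stop_at:
            return None  # interrupted before finishing
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(path, state)
    return state["total"]

ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
run(ckpt, 1000, stop_at=400)   # first run is interrupted
total = run(ckpt, 1000)        # second run resumes at iteration 400
print(total)
```

The ratio `state["i"] / n_iter` is the kind of "% done" figure an application could report to the client.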

Page 10

Page 11

Work units

● Template for a computation
● Resource estimates
    – Integer, FP ops; memory; disk space
● Delay bound
    – determines retry, client abort

<file_info>
    <name>arecibo_3392474_jun_23_01</name>
    ...
</file_info>
<workunit>
    <name>ar_13323313</name>
    <file_ref>
        <name>arecibo_3392474_jun_23_01</name>
        <open_name>input_file</open_name>
    </file_ref>
    <command_line>-niter 1000</command_line>
</workunit>
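A sketch of the filename translation suggested by the `<file_ref>` element, assuming that `<open_name>` is the logical name the application opens and `<name>` is the physical file it resolves to. The helper `logical_to_physical` is hypothetical:

```python
import xml.etree.ElementTree as ET

# Workunit copied from the slide.
WORKUNIT = """
<workunit>
    <name>ar_13323313</name>
    <file_ref>
        <name>arecibo_3392474_jun_23_01</name>
        <open_name>input_file</open_name>
    </file_ref>
    <command_line>-niter 1000</command_line>
</workunit>
"""

def logical_to_physical(workunit_xml):
    """Map each logical name used by the application to its physical file name."""
    root = ET.fromstring(workunit_xml.strip())
    return {
        ref.findtext("open_name"): ref.findtext("name")
        for ref in root.findall("file_ref")
    }

mapping = logical_to_physical(WORKUNIT)
print(mapping)  # {'input_file': 'arecibo_3392474_jun_23_01'}
```

With this indirection the same application binary can always open `input_file`, whichever data file a given workunit refers to.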

Page 12

Results

● An instance of a computation (completed or not)
● Includes: host ID, claimed/granted credit

<file_info>
    <name>arecibo_3392474_jun_23_01.out</name>
    ...
</file_info>
<result>
    <workunit_name>ar_13323313</workunit_name>
    <file_ref>
        <name>arecibo_3392474_jun_23_01.out</name>
        <open_name>output_file</open_name>
    </file_ref>
</result>

Page 13

Scheduling

● Work buffering on client
    – upper, lower bounds
● Host attributes
    – FP/int/mem speeds, disk/memory sizes
    – network bandwidth up/down
    – fraction of time connected, computing
● Scheduler policy:
    – send as much work as requested, subject to feasibility, WU deadlines
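One way to read the scheduler policy is as a greedy loop. The sketch below is an assumption about how such a policy could look, not the actual scheduler; the classes, field names, and the runtime estimate are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Workunit:
    name: str
    fp_ops: float       # estimated floating-point operations
    disk_bytes: int     # estimated disk space
    delay_bound: float  # seconds until the result is due

@dataclass
class Host:
    flops: float           # measured FP speed
    free_disk: int         # bytes available
    frac_computing: float  # fraction of time the host is computing

def pick_work(host, queue, seconds_requested):
    """Send work until the requested duration is covered."""
    sent, total, free_disk = [], 0.0, host.free_disk
    for wu in queue:
        if total >= seconds_requested:
            break
        # Expected elapsed time, scaled by how often the host computes.
        est = wu.fp_ops / (host.flops * host.frac_computing)
        feasible = wu.disk_bytes <= free_disk
        meets_deadline = total + est <= wu.delay_bound
        if feasible and meets_deadline:
            sent.append(wu)
            total += est
            free_disk -= wu.disk_bytes
    return sent

host = Host(flops=1e9, free_disk=100_000_000, frac_computing=0.5)
queue = [
    Workunit("a", 3.6e12, 10_000_000, 86400),   # fits
    Workunit("b", 3.6e12, 200_000_000, 86400),  # needs too much disk
    Workunit("c", 3.6e12, 10_000_000, 10000),   # would miss its deadline
    Workunit("d", 1.8e12, 10_000_000, 86400),   # fits
]
sent = pick_work(host, queue, seconds_requested=10000)
print([wu.name for wu in sent])  # ['a', 'd']
```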

Page 14

Client/server protocol (XML-RPC)

● Request
    – Authentication
    – Host description
    – Persistent file descriptions
    – Result descriptions
    – Duration of work requested
● Reply
    – Application, workunit, result descriptors
    – Result acknowledgements
    – Preferences
    – Control messages (redirect, back off, etc.)

Page 15

Work sequences

● Handle long (weeks or months) computations with large local state
● Sequence normally stays on one host; move to different host if failure
● Scheduling, redundancy checking are tricky

[Diagram labels: Upload state; Check for abort]

Page 16

Redundant computing

● Create several results per workunit
● Find “canonical result” with project-specific consensus policy
● Generate additional copies as needed, up to error thresholds
● One result per WU per user
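A minimal sketch of one possible consensus policy: exact agreement among at least `quorum` results. The policy is project-specific, so this is only an illustration; a real project might compare outputs within a numerical tolerance instead. The function names are hypothetical:

```python
from collections import Counter

def find_canonical(outputs, quorum):
    """Return the output that at least `quorum` results agree on, else None."""
    if not outputs:
        return None
    value, votes = Counter(outputs).most_common(1)[0]
    return value if votes >= quorum else None

def needs_more(outputs, quorum, max_results):
    """True if there is no consensus yet and the result limit is not reached."""
    return find_canonical(outputs, quorum) is None and len(outputs) < max_results

print(find_canonical(["a", "a", "b"], quorum=2))        # a
print(find_canonical(["a", "b", "c"], quorum=2))        # None
print(needs_more(["a", "b"], quorum=2, max_results=5))  # True
```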

Page 17

Participant Credit

● Goals:
    – credit for work actually done (CPU, network, storage)
    – don't know workunit size in advance
    – cheat-proof
● Integration with redundancy
    – claimed credit = benchmark * CPU time
    – granted credit = minimum claimed credit
● Handling graphics coprocessors
    – project-specific benchmarks
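The two credit formulas can be written out directly. The sample numbers below are made up for illustration, and the units are left abstract as on the slide:

```python
def claimed_credit(benchmark, cpu_time):
    """claimed credit = benchmark * CPU time"""
    return benchmark * cpu_time

def granted_credit(claims):
    """granted credit = minimum claimed credit among the redundant results"""
    return min(claims)

# Three hosts return results for the same workunit; the third host
# reports an inflated benchmark.
claims = [
    claimed_credit(1.0, 3600),
    claimed_credit(1.2, 3100),
    claimed_credit(5.0, 3600),
]
granted = granted_credit(claims)
print(granted)  # 3600.0
```

Taking the minimum means an inflated claim cannot raise the credit granted, which is how redundancy supports the cheat-proofing goal.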

Page 18

Work unit lifecycle

● Work generator: create WU, N results
● Timeout check
    – create new results if needed
    – detect too many errors, too many results without consensus
● Validator
    – find canonical result; grant credit
● Assimilator
    – merge canonical result into project DB
● File deleter
    – delete input and output files when no longer needed

Page 19

Participating in a BOINC project

[Sequence diagram. Parties: User, Project web site, core client. Steps, in order: create account; email account ID; download core client; enter account ID, project URL; get list of scheduling servers; scheduler RPC.]

Page 20

Windows GUI

● Multi-language
● Operations: suspend/resume, attach/detach projects, etc.

Page 21

Participant preferences

Page 22

Project-specific preferences

Page 23

User-visible web features

● User profiles
    – user of the day
● Forums
● Self-moderating FAQs
● Teams
● XML data export (3rd party statistics reporting)

Page 24

Project configuration file

<boinc>
    <config>
        <db_name>ap</db_name>
        <db_passwd></db_passwd>
        <shmem_key>0x35740417</shmem_key>
        <key_dir>/mydisks/a/users/boincadm/keys</key_dir>
        <upload_url>http://setiboinc.ssl.berkeley.edu/ap_cgi/file_upload_handler</upload_url>
        <upload_dir>/mydisks/a/users/boincadm/projects/AstroPulse_Beta/upload</upload_dir>
        <cgi_url>http://setiboinc.ssl.berkeley.edu/ap_cgi</cgi_url>
        <log_dir>/mydisks/a/users/boincadm/projects/AstroPulse_Beta/log</log_dir>
        <disable_account_creation/>
    </config>
    <daemons>
        <daemon><cmd>feeder -d 1</cmd></daemon>
        <daemon><cmd>validate_test -d 2 -app AstroPulse -quorum 3</cmd></daemon>
        <daemon><cmd>timeout_check -d 2 -app AstroPulse -nerror 10 -ndet 10 -nredundancy 3</cmd></daemon>
        <daemon><cmd>assimilator -d 2 -app AstroPulse</cmd></daemon>
        <daemon><cmd>file_deleter -d 2</cmd></daemon>
    </daemons>
    <tasks>
        <task><cmd>update_stats -update_users -update_hosts -update_teams</cmd><period>1 hour</period></task>
        <task><cmd>get_load</cmd><period>5 min</period></task>
        <task><cmd>db_count "user"</cmd><output>count_users.out</output><period>5 min</period></task>
        <task><cmd>db_count "result"</cmd><output>count_results_all.out</output><period>5 min</period></task>
    </tasks>
</boinc>
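A sketch of how a control program might read the daemon and task lists from such a file. It uses an abbreviated copy of the configuration above; `period_seconds` is a hypothetical helper that handles only the two units appearing on the slide:

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the slide's configuration file.
CONFIG = """
<boinc>
    <daemons>
        <daemon><cmd>feeder -d 1</cmd></daemon>
        <daemon><cmd>file_deleter -d 2</cmd></daemon>
    </daemons>
    <tasks>
        <task><cmd>update_stats -update_users -update_hosts -update_teams</cmd><period>1 hour</period></task>
        <task><cmd>get_load</cmd><period>5 min</period></task>
    </tasks>
</boinc>
"""

UNIT_SECONDS = {"min": 60, "hour": 3600}  # assumption: only these units occur

def period_seconds(text):
    """Convert a period such as '5 min' to seconds."""
    count, unit = text.split()
    return int(count) * UNIT_SECONDS[unit]

root = ET.fromstring(CONFIG.strip())
daemons = [d.findtext("cmd") for d in root.iter("daemon")]
tasks = [
    (t.findtext("cmd"), period_seconds(t.findtext("period")))
    for t in root.iter("task")
]
print(daemons)
print(tasks)
```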

Page 25

Project control

● Single control program
    – enable, disable
    – cron
    – status
● uses PID files to keep track of daemons
● uses timestamp file for periodic tasks
● uses lockfiles for mutual exclusion
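The slide does not say how the control program implements these files. The sketch below shows one common approach, assuming an exclusive-create lock file that also records the owner's PID and a timestamp file whose modification time marks the last run; all function names are hypothetical:

```python
import os
import tempfile

def acquire_lock(path):
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner, as a PID file would
    return True

def release_lock(path):
    """Remove the lock file so another process can acquire it."""
    os.remove(path)

def task_is_due(stamp_path, period, now):
    """Due if the timestamp file is missing or at least `period` seconds old."""
    try:
        last = os.path.getmtime(stamp_path)
    except FileNotFoundError:
        return True
    return now - last >= period

workdir = tempfile.mkdtemp()
lock = os.path.join(workdir, "control.lock")
first = acquire_lock(lock)    # succeeds
second = acquire_lock(lock)   # fails: the lock is already held
release_lock(lock)
print(first, second)  # True False
```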

Page 26

Python-based testing system

● Create objects representing projects, hosts, applications, work, etc.
● Activate objects to realize (create databases and directories, run servers and clients)
● Simulate various types of failures
● Check correctness of final system state (database, result files, etc.)

    # One host and one user, shared by both test projects
    host = Host()
    user = UserUC()
    for i in range(2):
        # resource_share is 1 for the first project and 5 for the second
        ProjectUC(users=[user], hosts=[host],
                  redundancy=5,
                  short_name="test_1sec_%d" % i,
                  resource_share=[1, 5][i])
    run_check_all()

Page 27

Monitoring/debugging tools

● All backend processes create log files
    – web/grep tool for tracking particular WU/result
● Database browsing tools
    – summary of activity; entry point for browsing
● Strip charts
    – record, graph measures of system health
● Watchdogs
    – detect system failures; ring pager

Page 28

Summary and status

● BOINC is funded by a 3-year NSF grant
● Computing projects at Space Sciences Lab
    – Astropulse (in beta test)
    – SETI@home (original, Australian)
● Other projects
    – Folding@home
    – Climateprediction.net
● Source code is free for noncommercial use