BOINC: The Year in Review
David P. Anderson
Space Sciences Laboratory, U.C. Berkeley
22 Oct 2009
Volunteer computing
• Throughput is now 10 PetaFLOPS
– mostly Folding@home
• Volunteer population is constant
– 330K BOINC, 200K F@h
• Volunteer computing still unknown in
– HPC world
– scientific computing world
– general public
ExaFLOPS
• Current PetaFLOPS breakdown by processor type:
– NVIDIA: 4.6
– CPU: 2.4
– PS3 (Cell): 2.2
– ATI: 1.2
• Potential: ExaFLOPS by 2010
– 4M GPUs * 1 TFLOPS * 0.25 availability
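The ExaFLOPS projection is simple arithmetic on the slide's own numbers; as a quick check:

```python
gpus = 4_000_000        # projected volunteer GPUs
flops_per_gpu = 1e12    # 1 TFLOPS each
availability = 0.25     # fraction of time the GPUs are usable

throughput = gpus * flops_per_gpu * availability
print(throughput)       # 1e+18 FLOPS, i.e. 1 ExaFLOPS
```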
Projects
• No significant new academic projects
– but signs of life in Asia
• No new umbrella projects
• AQUA@home: D-Wave Systems
• Several hobbyist projects
BOINC funding
• Funded into 2011
• New NSF proposal
Facebook apps
• Progress thru Processors (Intel/GridRepublic)
– Web-only registration process
– lots of fans, not so many participants
• BOINC Milestones
• IBM WCG
Research
• Host characterization
• Scheduling policy analysis
– EmBOINC: project emulator
• Distributed applications
– Volpex
• Apps in VMs
• Volunteer motivation study
Fundamental changes
• App versions now have dynamically-determined processor usage attributes (#CPUs, #GPUs)
• Server can have multiple app versions per (app, platform) pair
• Client can have multiple versions per app
• An issued job is linked to an app version
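The relationships above can be sketched as data structures (names are illustrative, not BOINC's actual schema): an app version carries its own processor-usage attributes, a server keeps several versions per (app, platform) pair, and each issued job records the version it was issued against.

```python
from dataclasses import dataclass

@dataclass
class AppVersion:
    app: str
    platform: str
    ncpus: float   # dynamically-determined CPU usage (may be fractional)
    ngpus: float   # GPU usage; 0 for CPU-only versions

@dataclass
class Job:
    name: str
    version: AppVersion   # an issued job is linked to one app version

# multiple app versions for the same (app, platform) pair
versions = [
    AppVersion("astropulse", "windows_x86", ncpus=1.0, ngpus=0.0),
    AppVersion("astropulse", "windows_x86", ncpus=0.1, ngpus=1.0),
]
job = Job("wu_123", versions[1])   # this job was issued against the GPU version
```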
Scheduler request
• Old (CPU only)
– requested # seconds
– current queue length
• New: for each resource type (CPU, NVIDIA, ...)
– requested # seconds
– current high-priority queue length
– # of idle instances
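The new request shape can be sketched as one record per resource type (field names are assumptions for illustration, not the actual RPC fields):

```python
# per-resource-type scheduler request: seconds of work wanted,
# high-priority queue length, and idle instance count
request = {
    "CPU":    {"req_seconds": 86400, "hp_queue_seconds": 3600, "idle_instances": 0},
    "NVIDIA": {"req_seconds": 43200, "hp_queue_seconds": 0,    "idle_instances": 1},
}
# an idle GPU instance tells the scheduler to send GPU work
# even if the CPU queue is already full
```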
Schedule reply
• Application versions include
– resource usage (# CPUs, # GPUs)
– FLOPS estimate
• Jobs specify an app version
• A given reply can include both CPU and GPU jobs for a given application
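Correspondingly, one reply can carry both CPU and GPU jobs for the same application, each job naming a version that has its own resource usage and FLOPS estimate (a hypothetical shape, not BOINC's actual reply format):

```python
reply = {
    "app_versions": [
        {"id": 1, "ncpus": 1.0, "ngpus": 0.0, "flops": 2.0e9},   # CPU version
        {"id": 2, "ncpus": 0.1, "ngpus": 1.0, "flops": 9.0e10},  # GPU version
    ],
    "jobs": [
        {"name": "wu_1", "app_version_id": 1},  # CPU job
        {"name": "wu_2", "app_version_id": 2},  # GPU job, same application
    ],
}
```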
Client: work fetch policy
• When? From which project? How much?
• Goals
– maintain enough work
– minimize scheduler requests
– honor resource shares
• per-project “debt”
[Figure: work queued for CPUs 0–3, kept between min and max buffer levels]
Work fetch for GPUs: goals
• Queue work separately for different resource types
• Resource shares apply to aggregate
Example: projects A, B have same resource share
A has CPU and GPU jobs, B has only GPU jobs
[Figure: CPU and GPU time allocation between projects A and B]
Work fetch for GPUs
• For each resource type
– per-project backoff
– per-project debt
• accumulate only while not backed off
• A project’s overall debt is weighted average of resource debts
• Get work from project with highest overall debt
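The selection rule above can be sketched as follows (field names and equal weights are assumptions): each project carries a per-resource debt, its overall debt is a weighted average across resources, and work is fetched from the highest-debt project that is not backed off for the requested resource.

```python
def pick_project(projects, resource, weights):
    """Choose the project to ask for work of the given resource type.

    projects: list of dicts with per-resource 'debt' and 'backoff' maps
    weights:  relative weight of each resource type in overall debt
    """
    total_w = sum(weights.values())

    def overall_debt(p):
        # overall debt = weighted average of per-resource debts
        return sum(weights[r] * p["debt"][r] for r in weights) / total_w

    # skip projects currently backed off for this resource
    candidates = [p for p in projects if not p["backoff"][resource]]
    return max(candidates, key=overall_debt, default=None)

projects = [
    {"name": "A", "debt": {"CPU": 10.0, "GPU": -5.0},
     "backoff": {"CPU": False, "GPU": False}},
    {"name": "B", "debt": {"CPU": 0.0, "GPU": 20.0},
     "backoff": {"CPU": False, "GPU": False}},
]
best = pick_project(projects, "GPU", weights={"CPU": 1.0, "GPU": 1.0})
# B's overall debt (10.0) exceeds A's (2.5), so B is asked for GPU work
```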
Client: job scheduling
• GPU job scheduling
– client allocates GPUs
– GPU prefs
• Multi-thread job scheduling
– handle a mix of single- and multi-thread jobs
– don’t overcommit CPUs
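The "don't overcommit CPUs" rule for a mix of single- and multi-thread jobs can be sketched as greedy packing (illustrative only, not the client's actual policy):

```python
def schedule(jobs, ncpus):
    """Greedily start jobs in priority order without committing
    more CPUs than the host has."""
    running, used = [], 0.0
    for job in jobs:                      # jobs assumed sorted by priority
        if used + job["ncpus"] <= ncpus:  # only start if it still fits
            running.append(job["name"])
            used += job["ncpus"]
    return running

# on 4 CPUs: the 3-thread job plus one single-thread job fit; the rest wait
print(schedule([{"name": "mt",  "ncpus": 3},
                {"name": "st1", "ncpus": 1},
                {"name": "st2", "ncpus": 1}], ncpus=4))
# ['mt', 'st1']
```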
GPU odds and ends
• Default install is non-service
• Dealing with sporadic usability
– e.g. Remote Desktop
• Multiple non-identical GPUs
• GPUs and anonymous platform
Other client changes
• Proxy auto-detection
• Exclusive app feature
• Don’t write state file on each checkpoint
Screensaver
• Screensaver coordinator
– configurable
• New default screensaver
• Intel screensaver
Scheduler/feeder
• Handle multiple app versions per platform
• Handle requests for multiple resources
– app selection
– completion estimate, deadline check
• Show specific messages to users
– “no work because you need driver version N”
• Project-customized job check
– jobs need different # of GPU processors
• Mixed locality and non-locality scheduling
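With per-version FLOPS estimates, the completion-estimate/deadline check becomes resource-aware: estimated runtime depends on which app version would run the job. A sketch (names are illustrative, not BOINC's actual fields):

```python
def passes_deadline_check(fpops_est, version_flops,
                          queued_seconds, deadline_seconds):
    """Would this job, run under this app version, finish before its
    deadline, given work already queued ahead of it on the same resource?"""
    runtime = fpops_est / version_flops   # estimated compute time in seconds
    return queued_seconds + runtime <= deadline_seconds

# a 1e15-FLOP job on a 100-GFLOPS GPU version: ~10,000 s of compute
print(passes_deadline_check(1e15, 1e11,
                            queued_seconds=3600,
                            deadline_seconds=86400))  # True
```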
Server
• Automated DB update
• Protect admin web interface
Manager
• Terms of use feature
• Show only projects supporting platform
– need to extend for GPUs
• Advanced view is keyboard navigable
• Manager can read cookies (Firefox, IE)
– web-only install
Apps
• Enhanced wrapper
– checkpointing, fraction done
• PyMW: master/worker Python system
Community contributions
• Pootle-based translation system
– projects can use this
• Testing
– alpha test project
• Packaging
– Linux client, server packages
• Programming
– lots of flames, little code
What didn’t get done
• Replace runtime system
• Installer: deal with “standby after X minutes”
• Global shutdown switch
Things on hold
• BOINC on mobile devices
• Replace Simple GUI
Important things to do
• New system for credit and runtime estimation
– we have a design!
• Keep track of GPU availability separately
• Steer computers with GPUs towards projects with GPU apps
• Sample CUDA app
BOINC development
• Let us know if you want something
• If you make changes of general utility:
– document them
– add them to trunk