BOINC: The Year in Review
David P. Anderson
Space Sciences Laboratory, U.C. Berkeley
22 Oct 2009
Volunteer computing
• Throughput is now 10 PetaFLOPS
– mostly Folding@home
• Volunteer population is constant
– 330K BOINC, 200K F@h
• Volunteer computing still unknown in
– HPC world
– scientific computing world
– general public
ExaFLOPS
• Current PetaFLOPS breakdown by processor type:
– NVIDIA: 4.6
– CPU: 2.4
– PS3 (Cell): 2.2
– ATI: 1.2
• Potential: ExaFLOPS by 2010
– 4M GPUs * 1 TFLOPS * 0.25 availability
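The ExaFLOPS projection is simple arithmetic on the slide's own numbers; as a quick check:

```python
gpus = 4_000_000        # projected volunteer GPUs
flops_per_gpu = 1e12    # 1 TFLOPS each
availability = 0.25     # fraction of time the GPUs are usable

throughput = gpus * flops_per_gpu * availability
print(throughput)       # 1e+18 FLOPS, i.e. 1 ExaFLOPS
```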
Projects
• No significant new academic projects
– but signs of life in Asia
• No new umbrella projects
• AQUA@home: D-Wave Systems
• Several hobbyist projects
BOINC funding
• Funded into 2011
• New NSF proposal
Facebook apps
• Progress thru Processors (Intel/GridRepublic)
– Web-only registration process
– lots of fans, not so many participants
• BOINC Milestones
• IBM WCG
Research
• Host characterization
• Scheduling policy analysis
– EmBOINC: project emulator
• Distributed applications
– Volpex
• Apps in VMs
• Volunteer motivation study
Fundamental changes
• App versions now have dynamically-determined processor usage attributes (#CPUs, #GPUs)
• Server can have multiple app versions per (app, platform) pair
• Client can have multiple versions per app
• An issued job is linked to an app version
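The relationships above can be sketched as data structures (names are illustrative, not BOINC's actual schema): an app version carries its own processor-usage attributes, a server keeps several versions per (app, platform) pair, and each issued job records the version it was issued against.

```python
from dataclasses import dataclass

@dataclass
class AppVersion:
    app: str
    platform: str
    ncpus: float   # dynamically-determined CPU usage (may be fractional)
    ngpus: float   # GPU usage; 0 for CPU-only versions

@dataclass
class Job:
    name: str
    version: AppVersion   # an issued job is linked to one app version

# multiple app versions for the same (app, platform) pair
versions = [
    AppVersion("astropulse", "windows_x86", ncpus=1.0, ngpus=0.0),
    AppVersion("astropulse", "windows_x86", ncpus=0.1, ngpus=1.0),
]
job = Job("wu_123", versions[1])   # this job was issued against the GPU version
```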
Scheduler request
• Old (CPU only)
– requested # seconds
– current queue length
• New: for each resource type (CPU, NVIDIA, ...)
– requested # seconds
– current high-priority queue length
– # of idle instances
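The new request shape can be sketched as one record per resource type (field names are assumptions for illustration, not the actual RPC fields):

```python
# per-resource-type scheduler request: seconds of work wanted,
# high-priority queue length, and idle instance count
request = {
    "CPU":    {"req_seconds": 86400, "hp_queue_seconds": 3600, "idle_instances": 0},
    "NVIDIA": {"req_seconds": 43200, "hp_queue_seconds": 0,    "idle_instances": 1},
}
# an idle GPU instance tells the scheduler to send GPU work
# even if the CPU queue is already full
```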
Schedule reply
• Application versions include
– resource usage (# CPUs, # GPUs)
– FLOPS estimate
• Jobs specify an app version
• A given reply can include both CPU and GPU jobs for a given application
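Correspondingly, one reply can carry both CPU and GPU jobs for the same application, each job naming a version that has its own resource usage and FLOPS estimate (a hypothetical shape, not BOINC's actual reply format):

```python
reply = {
    "app_versions": [
        {"id": 1, "ncpus": 1.0, "ngpus": 0.0, "flops": 2.0e9},   # CPU version
        {"id": 2, "ncpus": 0.1, "ngpus": 1.0, "flops": 9.0e10},  # GPU version
    ],
    "jobs": [
        {"name": "wu_1", "app_version_id": 1},  # CPU job
        {"name": "wu_2", "app_version_id": 2},  # GPU job, same application
    ],
}
```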
Client: work fetch policy
• When? From which project? How much?
• Goals
– maintain enough work
– minimize scheduler requests
– honor resource shares
• per-project “debt”
[Figure: work queued for CPUs 0–3, kept between min and max buffer levels]
Work fetch for GPUs: goals
• Queue work separately for different resource types
• Resource shares apply to aggregate
Example: projects A, B have same resource share
A has CPU and GPU jobs, B has only GPU jobs
[Figure: CPU and GPU time allocation between projects A and B]
Work fetch for GPUs
• For each resource type
– per-project backoff
– per-project debt
• accumulate only while not backed off
• A project’s overall debt is weighted average of resource debts
• Get work from project with highest overall debt
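The selection rule above can be sketched as follows (field names and equal weights are assumptions): each project carries a per-resource debt, its overall debt is a weighted average across resources, and work is fetched from the highest-debt project that is not backed off for the requested resource.

```python
def pick_project(projects, resource, weights):
    """Choose the project to ask for work of the given resource type.

    projects: list of dicts with per-resource 'debt' and 'backoff' maps
    weights:  relative weight of each resource type in overall debt
    """
    total_w = sum(weights.values())

    def overall_debt(p):
        # overall debt = weighted average of per-resource debts
        return sum(weights[r] * p["debt"][r] for r in weights) / total_w

    # skip projects currently backed off for this resource
    candidates = [p for p in projects if not p["backoff"][resource]]
    return max(candidates, key=overall_debt, default=None)

projects = [
    {"name": "A", "debt": {"CPU": 10.0, "GPU": -5.0},
     "backoff": {"CPU": False, "GPU": False}},
    {"name": "B", "debt": {"CPU": 0.0, "GPU": 20.0},
     "backoff": {"CPU": False, "GPU": False}},
]
best = pick_project(projects, "GPU", weights={"CPU": 1.0, "GPU": 1.0})
# B's overall debt (10.0) exceeds A's (2.5), so B is asked for GPU work
```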
Client: job scheduling
• GPU job scheduling
– client allocates GPUs
– GPU prefs
• Multi-thread job scheduling
– handle a mix of single- and multi-thread jobs
– don’t overcommit CPUs
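The "don't overcommit CPUs" rule for a mix of single- and multi-thread jobs can be sketched as greedy packing (illustrative only, not the client's actual policy):

```python
def schedule(jobs, ncpus):
    """Greedily start jobs in priority order without committing
    more CPUs than the host has."""
    running, used = [], 0.0
    for job in jobs:                      # jobs assumed sorted by priority
        if used + job["ncpus"] <= ncpus:  # only start if it still fits
            running.append(job["name"])
            used += job["ncpus"]
    return running

# on 4 CPUs: the 3-thread job plus one single-thread job fit; the rest wait
print(schedule([{"name": "mt",  "ncpus": 3},
                {"name": "st1", "ncpus": 1},
                {"name": "st2", "ncpus": 1}], ncpus=4))
# ['mt', 'st1']
```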
GPU odds and ends
• Default install is non-service
• Dealing with sporadic usability
– e.g. Remote Desktop
• Multiple non-identical GPUs
• GPUs and anonymous platform
Other client changes
• Proxy auto-detection
• Exclusive app feature
• Don’t write state file on each checkpoint
Screensaver
• Screensaver coordinator
– configurable
• New default screensaver
• Intel screensaver
Scheduler/feeder
• Handle multiple app versions per platform
• Handle requests for multiple resources
– app selection
– completion estimate, deadline check
• Show specific messages to users
– “no work because you need driver version N”
• Project-customized job check
– jobs need different # of GPU processors
• Mixed locality and non-locality scheduling
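With per-version FLOPS estimates, the completion-estimate/deadline check becomes resource-aware: estimated runtime depends on which app version would run the job. A sketch (names are illustrative, not BOINC's actual fields):

```python
def passes_deadline_check(fpops_est, version_flops,
                          queued_seconds, deadline_seconds):
    """Would this job, run under this app version, finish before its
    deadline, given work already queued ahead of it on the same resource?"""
    runtime = fpops_est / version_flops   # estimated compute time in seconds
    return queued_seconds + runtime <= deadline_seconds

# a 1e15-FLOP job on a 100-GFLOPS GPU version: ~10,000 s of compute
print(passes_deadline_check(1e15, 1e11,
                            queued_seconds=3600,
                            deadline_seconds=86400))  # True
```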
Server
• Automated DB update
• Protect admin web interface
Manager
• Terms of use feature
• Show only projects supporting platform
– need to extend for GPUs
• Advanced view is keyboard navigable
• Manager can read cookies (Firefox, IE)
– web-only install
Apps
• Enhanced wrapper
– checkpointing, fraction done
• PyMW: master/worker Python system
Community contributions
• Pootle-based translation system
– projects can use this
• Testing
– alpha test project
• Packaging
– Linux client, server packages
• Programming
– lots of flames, little code
What didn’t get done
• Replace runtime system
• Installer: deal with “standby after X minutes”
• Global shutdown switch
Things on hold
• BOINC on mobile devices
• Replace Simple GUI
Important things to do
• New system for credit and runtime estimation
– we have a design!
• Keep track of GPU availability separately
• Steer computers with GPUs towards projects with GPU apps
• Sample CUDA app
BOINC development
• Let us know if you want something
• If you make changes of general utility:
– document them
– add them to trunk