HTPC - High Throughput Parallel Computing (on the OSG)

Dan Fraser, UChicago, OSG Production Coordinator
Horst Severini, OU
(Greg Thain, UWisc)
OU Supercomputing Symposium, Oct 6, 2010

Page 1: HTPC - High  Throughput Parallel  Computing (on the OSG)

HTPC - High Throughput Parallel Computing (on the OSG)

Dan Fraser, UChicago, OSG Production Coordinator
Horst Severini, OU
(Greg Thain, UWisc)

OU Supercomputing Symposium
Oct 6, 2010

Page 2: HTPC - High  Throughput Parallel  Computing (on the OSG)

Rough Outline

What is the OSG? (think eBay)
HTPC as a new paradigm
Advantages of HTPC for parallel jobs
How does HTPC work?
Who is using it?
The Future
Conclusions

Page 3: HTPC - High  Throughput Parallel  Computing (on the OSG)

Making sense of the OSG

OSG = Technology + Process + Sociology
70+ sites (& growing) -- Supply
  contribute resources to the OSG
Virtual Organizations -- Demand
  VOs are Multidisciplinary Research Groups
Sites and VOs often overlap
OSG Delivers:
  ~1M CPU hours every day
  1 Pbyte of data transferred every day

Page 4: HTPC - High  Throughput Parallel  Computing (on the OSG)

eBay (naïve)

[Diagram: eBay in the middle, between Demand (search/buy) and Supply (list item).]

Page 5: HTPC - High  Throughput Parallel  Computing (on the OSG)

eBay (more realistic)

[Diagram labels: Demand, Supply, Endpoints, Match Maker, Accounting, Ratings, Complaints, PayPal, UPS, Bank 1, Bank 2.]

Page 6: HTPC - High  Throughput Parallel  Computing (on the OSG)

OSG-Bay

[Diagram labels: VOs (Demand), Resource Providers (Supply), Client Stack, Server Stack, Match Making, Accounting, Monitoring, Integration & Testing, Engage Support, Software Packaging, Ticketing, Security, Coordination, Proposal Anchoring …, and three "Servers & Storage" boxes.]

Page 7: HTPC - High  Throughput Parallel  Computing (on the OSG)

Where does HTPC fit?

Page 8: HTPC - High  Throughput Parallel  Computing (on the OSG)

The two familiar HPC Models

High Throughput Computing (e.g. OSG)
  Run ensembles of single core jobs
Capability Computing (e.g. TeraGrid)
  A few jobs parallelized over the whole system
  Use whatever parallel s/w is on the system

Page 9: HTPC - High  Throughput Parallel  Computing (on the OSG)

HTPC – an emerging model

Ensembles of small-way parallel jobs (10’s – 1000’s)
Use whatever parallel s/w you want (It ships with the job)

Page 10: HTPC - High  Throughput Parallel  Computing (on the OSG)

Tackling Four Problems

Parallel job portability
Effective use of multi-core technologies
Identify suitable resources & submit jobs
Job Management, tracking, accounting, …

Page 11: HTPC - High  Throughput Parallel  Computing (on the OSG)

Current plan of attack

Force jobs to consume an entire processor
  Today 4-8+ cores, tomorrow 32+ cores, …
Package jobs with a parallel library
  HTPC jobs as portable as any other job
  MPI, OpenMP, your own scripts, …
  Parallel libraries can be optimized for on-board memory access
  All memory is available for efficient utilization
Submit the jobs via OSG (or Condor-G)

Page 12: HTPC - High  Throughput Parallel  Computing (on the OSG)

Problem areas

Advertising HTPC capability on OSG
Adapting OSG job submission/mgmt tools
  GlideinWMS
Ensure that Gratia accounting can identify jobs and apply the correct multiplier
Support more HTPC scientists
HTPC-enable more sites

Page 13: HTPC - High  Throughput Parallel  Computing (on the OSG)

What’s the magic RSL?

Site Specific
  We’re working on documents/standards
PBS
  (host_xcount=1)(xcount=8)(queue=?)
LSF
  (queue=?)(exclusive=1)
Condor
  (condorsubmit=('+WholeMachine' true))
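The per-scheduler strings above can be kept in one lookup table so that a submit script picks the right RSL for each site. This is a minimal sketch in Python, not part of any OSG tool. It assumes the site supplies the queue name (the slide leaves it as `?`). The function name `whole_node_rsl` and the `cores` parameter are illustrative.

```python
# RSL fragments copied from the slide. "{queue}" replaces the slide's
# site-specific "queue=?", and "{cores}" replaces the literal 8 in the PBS
# string (assumption: xcount is the number of cores on one node).
RSL_TEMPLATES = {
    "pbs": "(host_xcount=1)(xcount={cores})(queue={queue})",
    "lsf": "(queue={queue})(exclusive=1)",
    "condor": "(condorsubmit=('+WholeMachine' true))",
}


def whole_node_rsl(batch_system, queue=None, cores=8):
    """Return the whole-node RSL string for one batch system."""
    key = batch_system.lower()
    if key not in RSL_TEMPLATES:
        raise ValueError(f"no RSL template for batch system {batch_system!r}")
    template = RSL_TEMPLATES[key]
    # PBS and LSF need a queue; the Condor string does not.
    if "{queue}" in template and not queue:
        raise ValueError(f"{batch_system} needs a site-specific queue name")
    return template.format(queue=queue, cores=cores)


if __name__ == "__main__":
    print(whole_node_rsl("pbs", queue="htpc_queue"))  # hypothetical queue name
    print(whole_node_rsl("condor"))
```

Because the strings are site specific, a real table would probably be keyed by site rather than by batch system alone.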

Page 14: HTPC - High  Throughput Parallel  Computing (on the OSG)

Examples of HTPC users:

Oceanographers: Brian Blanton, Howard Lander (RENCI)
  Redrawing flood map boundaries
  ADCIRC
    Coastal circulation and storm surge model
    Runs on 256+ cores, several days
  Parameter sensitivity studies
    Determine best settings for large runs
    220 jobs to determine optimal mesh size
    Each job takes 8 processors, several hours

Page 15: HTPC - High  Throughput Parallel  Computing (on the OSG)

Examples of HTPC users:

Chemists
  UW Chemistry group
  Gromacs
  Jobs take 24 hours on 8 cores
  Steady stream of 20-40 jobs/day
  Peak usage is 320,000 hours per month
  Written 9 papers in 10 months based on this
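As a rough consistency check on these figures, the steady job stream can be converted into core hours. The sketch below assumes that the slide's "hours" are core hours and that a month has 30 days. Both are assumptions, not statements from the talk.

```python
HOURS_PER_JOB = 24   # from the slide
CORES_PER_JOB = 8    # from the slide
DAYS_PER_MONTH = 30  # assumption


def core_hours_per_month(jobs_per_day):
    """Core hours consumed in a month by a steady stream of 8-core jobs."""
    return jobs_per_day * HOURS_PER_JOB * CORES_PER_JOB * DAYS_PER_MONTH


def jobs_per_day_for(core_hours):
    """Average daily job count that would consume the given monthly core hours."""
    return core_hours / (HOURS_PER_JOB * CORES_PER_JOB * DAYS_PER_MONTH)


if __name__ == "__main__":
    for rate in (20, 40):
        print(rate, "jobs/day ->", core_hours_per_month(rate), "core hours/month")
    print("320,000 hours ->", round(jobs_per_day_for(320_000), 1), "jobs/day")
```

Under these assumptions the steady stream accounts for 115,200 to 230,400 core hours per month, and the 320,000-hour peak corresponds to roughly 56 jobs per day.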

Page 16: HTPC - High  Throughput Parallel  Computing (on the OSG)

Chemistry Usage of HTPC

Page 17: HTPC - High  Throughput Parallel  Computing (on the OSG)

OSG sites that allow HTPC

OU
  The first site to run HTPC jobs on the OSG!
Purdue
Clemson
Nebraska
San Diego, CMS Tier-2

Your site can be on this list!

Page 18: HTPC - High  Throughput Parallel  Computing (on the OSG)

Future Directions

More Sites, more cycles!
More users
  Working with Atlas (AthenaMP)
  Working with Amber 9
  There is room for you…
Use glide-in to homogenize access

Page 19: HTPC - High  Throughput Parallel  Computing (on the OSG)

Conclusions

HTPC adds a new dimension to HPC computing – ensembles of parallel jobs
This approach minimizes portability issues with parallel codes
Keep same job submission model
Not hypothetical – we’re already running HTPC jobs
Thanks to many helping hands

Page 20: HTPC - High  Throughput Parallel  Computing (on the OSG)

Additional Slides

Some of these are from Greg Thain (UWisc)

Page 21: HTPC - High  Throughput Parallel  Computing (on the OSG)

The players

Dan Fraser, Computation Inst., University of Chicago
Miron Livny, U Wisconsin
John McGee, RENCI
Greg Thain, U Wisconsin, Key Developer

Funded by NSF-STCI

Page 22: HTPC - High  Throughput Parallel  Computing (on the OSG)

Configuring Condor for HTPC

Two strategies:
  Suspend/drain jobs to open HTPC slots
  Hold empty cores until HTPC slot is open

http://condor-wiki.cs.wisc.edu

Page 23: HTPC - High  Throughput Parallel  Computing (on the OSG)

How to submit

universe = vanilla
requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
+RequiresWholeMachine=true
executable = some job
arguments = arguments
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = inputs
queue

Page 24: HTPC - High  Throughput Parallel  Computing (on the OSG)

MPI on Whole machine jobs

universe = vanilla
requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
+RequiresWholeMachine=true
executable = mpiexec
arguments = -np 8 real_exe
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = real_exe
queue

Whole machine MPI submit file
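The submit file above fixes the process count at 8. A job that may land on machines with different core counts could compute the count at run time instead. This is a minimal sketch of such a wrapper, not the presenters' method. It assumes `mpiexec` and `real_exe` were both transferred into the job's working directory. The names `build_mpi_command` and `run_whole_machine` are illustrative.

```python
import os
import subprocess


def build_mpi_command(exe, cores=None, mpiexec="./mpiexec"):
    """Build an mpiexec command line that starts one process per core."""
    if cores is None:
        # os.cpu_count() reports logical CPUs and may return None;
        # fall back to a single process in that case.
        cores = os.cpu_count() or 1
    return [mpiexec, "-np", str(cores), exe]


def run_whole_machine(exe):
    """Run exe under mpiexec on every core and return its exit status."""
    return subprocess.call(build_mpi_command(exe))


# Example (hypothetical path): run_whole_machine("./real_exe")
```

With a wrapper like this, the submit file would name the wrapper as `executable` and list both `mpiexec` and `real_exe` in `transfer_input_files`.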

Page 25: HTPC - High  Throughput Parallel  Computing (on the OSG)

How to submit to OSG

universe = grid
GridResource = some_grid_host
GlobusRSL = MagicRSL
executable = wrapper.sh
arguments = arguments
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = inputs
transfer_output_files = output
queue
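The grid submit description above differs from the local one mainly in the `universe`, `GridResource`, and `GlobusRSL` lines. This is a minimal sketch that renders it from parameters, using only the keys shown on the slide. The function name `render_grid_submit` and the example values are hypothetical.

```python
def render_grid_submit(grid_resource, globus_rsl, executable="wrapper.sh",
                       arguments="", inputs=(), outputs=()):
    """Render a grid-universe submit description with the slide's keys."""
    lines = [
        "universe = grid",
        f"GridResource = {grid_resource}",
        f"GlobusRSL = {globus_rsl}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        "should_transfer_files = yes",
        "when_to_transfer_output = on_exit",
    ]
    # File lists are optional; Condor expects them comma-separated.
    if inputs:
        lines.append("transfer_input_files = " + ", ".join(inputs))
    if outputs:
        lines.append("transfer_output_files = " + ", ".join(outputs))
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values taken from the slides; a real site supplies its own.
    print(render_grid_submit("some_grid_host", "(queue=?)(exclusive=1)",
                             inputs=["inputs"], outputs=["output"]))
```

The returned text can be written to a file and passed to `condor_submit`.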