An overview of batch processing - Nevis Laboratoriesseligman/root-class/BatchProcessing201… ·...

An overview of batch processing 2-June-2016

  • An overview of batch processing

    2-June-2016

  • Your computer

    Your program

    One-on-one

  • Your computer (multiple cores)

    Your program Your program Your program Your program Your program

    Multiple programs on a single computer

  • Your computer (multiple cores)

    Your program Your program Your program Your program Your program

    A batch system managing multiple programs on a single computer

    Your program Your program Your program Your program Your program

    on hold

  • Batch node

    Your program Your program Your program Your program Your program

    A batch system managing multiple programs on multiple computers

    Your program Your program Your program Your program Your program

    on hold

    Your computer

    Batch node Batch node

    Batch node Batch node Batch node

    Batch manager

  • The standard software for managing batch systems in scientific computing is HTCondor (or just Condor)

    Main web page: http://research.cs.wisc.edu/htcondor/

    Quick start: http://research.cs.wisc.edu/htcondor/quick-start.html

    Full manual: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_Users_Manual.html

    •  We use an older version of Condor in the Nevis particle-physics systems.

    •  Stick to the “vanilla” universe; the “standard” universe won’t work for ROOT or any other particle-physics software (so you don’t need condor_compile).
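    As a rough illustration of the "vanilla" universe advice, here is a minimal sketch that writes a submit description from Python. The submit commands (universe, executable, output, error, log, queue) and the $(Process) macro are standard HTCondor; the helper make_submit and the script name run_analysis.sh are hypothetical names chosen for this example.

```python
def make_submit(executable, n_jobs=1):
    """Return the text of a minimal vanilla-universe submit description."""
    lines = [
        "universe   = vanilla",            # not "standard": no condor_compile needed
        f"executable = {executable}",
        "output     = job-$(Process).out",  # one stdout file per process
        "error      = job-$(Process).err",  # one stderr file per process
        "log        = job.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# run_analysis.sh is a hypothetical wrapper script, not a file from the slides.
submit_text = make_submit("run_analysis.sh")
print(submit_text)
```

    The printed text could be saved to a file and passed to condor_submit.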

  • Batch node

    Your program Your program Your program Your program Your program

    Condor managing multiple programs on multiple computers with multiple queues

    Your program Your program Your program Your program Your program

    on hold

    Submit machine

    Batch node Batch node

    Batch node Batch node Batch node

    Condor master

    Condor pool

  • Batch node

    Your program Your program Your program Your program Your program

    Condor will halt a queue in favor of an interactive program

    Your program Your program Your program Your program Your program

    on hold

    Submit machine

    Batch node Batch node

    Batch node Batch node Batch node

    Condor master

    Condor pool

    Someone logged in!

  • Batch node

    Your program Your program Your program Your program Your program

    Condor managing multiple programs on multiple computers with multiple configurations

    Your program Your program Your program Your program Your program

    on hold

    Submit machine

    Batch node Batch node

    Batch node Batch node Batch node

    Condor master

    Condor pool

  • Batch node

    Your program Your program Your program Your program Your program

    Condor uses “ClassAds” to match your requirements with what each node offers

    Your program Your program Your program Your program Your program

    on hold

    Submit machine

    Batch node Batch node

    Batch node Batch node Batch node

    Condor master

    Condor pool

    Your requirements (job ClassAd)

    What a node offers (machine ClassAd)
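    To make the matching idea concrete, here is a toy sketch in Python. It is not how HTCondor implements ClassAds: real ClassAds are expressions evaluated by Condor, and a machine can also place requirements on the job. The node names and attribute values below are made up; Arch and Memory are used as typical machine-ad attributes.

```python
# Toy illustration of ClassAd-style matchmaking (one-way only).
# All values are hypothetical examples.
machine_ads = [
    {"Machine": "node01", "Arch": "X86_64", "Memory": 2048},
    {"Machine": "node02", "Arch": "X86_64", "Memory": 8192},
    {"Machine": "node03", "Arch": "INTEL", "Memory": 16384},
]

# The job ad carries a requirements test that is applied to each machine ad.
job_ad = {
    "Cmd": "my_program",
    "Requirements": lambda m: m["Arch"] == "X86_64" and m["Memory"] >= 4096,
}


def match(job, machines):
    """Return the names of machines whose ad satisfies the job's requirements."""
    return [m["Machine"] for m in machines if job["Requirements"](m)]


matches = match(job_ad, machine_ads)
print(matches)  # ['node02']
```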

  • Resource Planning

    •  Condor can’t do everything for you.

    •  Think about input files (including programs) and output files and how they’ll be accessed.

    •  Think about disk space. “df -h” and “du -shx *” can help.

    •  Fun fact: The particle-physics Condor pools can’t see your home directory!

    •  Moral: Let condor transfer your files… when possible.

    When you can’t let condor transfer your files, here are disk-sharing methods outside of condor:

    •  NFS – used at Nevis

    •  CVMFS – Fermilab and CERN

    •  Grid, BlueArc – only used at Fermilab

    •  AFS – obsolete, still used at CERN
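    The two planning points above (check disk space, let condor transfer files) can be sketched in Python. The submit commands should_transfer_files, when_to_transfer_output and transfer_input_files are standard HTCondor; the helper names and the file names in the usage lines are hypothetical.

```python
import shutil


def free_gib(path="."):
    """Free space on the filesystem holding `path`, in GiB (similar to `df -h`)."""
    return shutil.disk_usage(path).free / 2**30


def transfer_lines(input_files):
    """Submit-file lines asking condor to copy inputs to the node and outputs back."""
    return [
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
        "transfer_input_files    = " + ",".join(input_files),
    ]


# Hypothetical file names, for illustration only.
print(f"{free_gib():.1f} GiB free in the current directory")
print("\n".join(transfer_lines(["analysis.C", "input.root"])))
```

    With these lines in the submit file, the job does not need to see your home directory.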

  • Resource Planning

    Your server

    node (label repeated for each of the many batch nodes in the diagram)

    /home

    What we don’t do

  • Resource Planning

    Your server

    node (label repeated for each of the many batch nodes in the diagram)

    /home

    File server

    /share

    /data

    What we do

  • Computer Systems at Nevis Linux Cluster

    hypatia administration, NIS

    kolya ATLAS

    franklin Mail

    karthur ATLAS

    hogwarts staff

    Administrative servers    Workgroup/Login servers

    Clients and x-terms

    Clients and x-terms

    Workstations, batch nodes, student boxes

    shang DOE

    annex off-site backup & mail

    ada web server

    sullivan mailing-list server

    tehanu VERITAS

    shelley backup server

    xenia

    tango SMB

    hermes DNS, batch

    virtual machines

    houston Neutrino

    File servers

    xenia2

    vetch ged serret

    amsterdam

    bleeker

    westside

    riverside

  • node05

    Bringing the job to the data

    Submit machine

    node06 node04

    node02 node03 node01

    Condor master

    requirements = (machine == "node04.nevis.columbia.edu")

    bigfile1.root bigfile2.root bigfile3.root

    bigfile4.root bigfile5.root bigfile6.root

    Some wrapper script
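    One way to picture "some wrapper script" is a helper that looks up which node holds a file and emits the matching requirements line. This is a sketch under assumptions: the slide does not say which file lives on which node, so the file_location table is invented. The line uses the ClassAd equality operator == with a quoted host name.

```python
DOMAIN = "nevis.columbia.edu"

# Hypothetical mapping of data files to the nodes that store them.
file_location = {
    "bigfile1.root": "node01",
    "bigfile2.root": "node02",
    "bigfile3.root": "node03",
    "bigfile4.root": "node04",
    "bigfile5.root": "node05",
    "bigfile6.root": "node06",
}


def requirements_for(filename):
    """Build the submit-file line that pins a job to the node holding `filename`."""
    node = file_location[filename]
    return f'requirements = (machine == "{node}.{DOMAIN}")'


print(requirements_for("bigfile4.root"))
# requirements = (machine == "node04.nevis.columbia.edu")
```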

  • Final tips

    •  Split up your task so each condor job takes 20-60 minutes

    •  If your job must be preempted, it will have to run from the beginning on the same machine that cancelled the job

    •  Test your job with one process before submitting it for 10,000 processes!
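    The 20-60 minute guideline can be turned into a quick calculation. This sketch assumes you have timed a single test process to get a seconds-per-event figure; the function name, the 40-minute target and the numbers in the usage line are illustrative choices, not values from the slides.

```python
import math


def plan_jobs(total_events, seconds_per_event, target_minutes=40):
    """Split a task so each job runs for roughly `target_minutes`.

    Returns (events_per_job, n_jobs).
    """
    events_per_job = max(1, int(target_minutes * 60 / seconds_per_event))
    n_jobs = math.ceil(total_events / events_per_job)
    return events_per_job, n_jobs


# Example: 1,000,000 events at 0.5 s each, measured from a one-process test.
events_per_job, n_jobs = plan_jobs(1_000_000, 0.5)
print(events_per_job, n_jobs)  # 4800 209
```

    The resulting n_jobs is the number you would put after "queue" in the submit file, with $(Process) telling each job which slice of events to handle.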

  • Resources

    Main web page: http://research.cs.wisc.edu/htcondor/

    Quick start: http://research.cs.wisc.edu/htcondor/quick-start.html

    Full manual: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_Users_Manual.html

    Nevis particle-physics condor guide: https://twiki.nevis.columbia.edu/twiki/bin/view/Nevis/Condor

    Basic Condor @ Nevis tutorial: http://www.nevis.columbia.edu/~seligman/root-class/