An overview of batch processing - Nevis Laboratoriesseligman/root-class/BatchProcessing201… ·...
-
An overview of batch processing
2-June-2016
-
Your computer
Your program
One-on-one
-
Your computer (multiple cores)
Your program  Your program  Your program  Your program  Your program
Multiple programs on a single computer
-
Your computer (multiple cores)
Your program  Your program  Your program  Your program  Your program
A batch system managing multiple programs on a single computer
Your program  Your program  Your program  Your program  Your program
on hold
-
Batch node
Your program  Your program  Your program  Your program  Your program
A batch system managing multiple programs on multiple computers
Your program  Your program  Your program  Your program  Your program
on hold
Your computer
Batch node  Batch node
Batch node  Batch node  Batch node
Batch manager
-
The standard software for managing batch systems in scientific computing is HTCondor (or just Condor)
Main web page: http://research.cs.wisc.edu/htcondor/
Quick start: http://research.cs.wisc.edu/htcondor/quick-start.html
Full manual: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_Users_Manual.html
• We use an older version of Condor in the Nevis particle-physics systems.
• Stick to the “vanilla” universe; the “standard” universe won’t work for ROOT or any other particle-physics software (so you don’t need condor_compile).
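As a rough illustration of what a “vanilla” universe job looks like, here is a minimal sketch that builds the text of a submit file. The submit commands (universe, executable, arguments, output, error, log, queue) are standard Condor ones; the helper function and the script name run-analysis.sh are hypothetical and only for illustration.

```python
def vanilla_submit_description(executable, arguments="", n_jobs=1):
    """Return the text of a minimal vanilla-universe submit file."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {arguments}",
        # Condor expands $(Process) to 0, 1, 2, ... for each queued job,
        # so every job gets its own output and error file.
        "output     = job-$(Process).out",
        "error      = job-$(Process).err",
        "log        = job.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# run-analysis.sh is a hypothetical wrapper script name.
print(vanilla_submit_description("run-analysis.sh", "$(Process)"))
```

Saving the printed text to a file (say, analysis.cmd) and running condor_submit analysis.cmd on the submit machine would queue the job.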
-
Batch node
Your program  Your program  Your program  Your program  Your program
Condor managing multiple programs on multiple computers with multiple queues
Your program  Your program  Your program  Your program  Your program
on hold
Submit machine
Batch node  Batch node
Batch node  Batch node  Batch node
Condor master
Condor pool
-
Batch node
Your program  Your program  Your program  Your program  Your program
Condor will halt a queue in favor of an interactive program
Your program  Your program  Your program  Your program  Your program
on hold
Submit machine
Batch node  Batch node
Batch node  Batch node  Batch node
Condor master
Condor pool
Someone logged in!
-
Batch node
Your program  Your program  Your program  Your program  Your program
Condor managing multiple programs on multiple computers with multiple configurations
Your program  Your program  Your program  Your program  Your program
on hold
Submit machine
Batch node  Batch node
Batch node  Batch node  Batch node
Condor master
Condor pool
-
Batch node
Your program  Your program  Your program  Your program  Your program
Condor uses “ClassAds” to match your requirements with what each node offers
Your program  Your program  Your program  Your program  Your program
on hold
Submit machine
Batch node  Batch node
Batch node  Batch node  Batch node
Condor master
Condor pool
Your requirements (job ClassAd)
What a node offers (machine ClassAd)
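The matching idea can be pictured with a toy model. This is not how Condor evaluates ClassAds internally; it is a one-way simplification in which the job's requirements are a Python function and each machine ad is a dictionary. The attribute names (Machine, Arch, OpSys, Memory) follow common machine ClassAd attributes, while the node names and values are made up.

```python
# Toy model of ClassAd matchmaking; Condor does the real evaluation itself.
# The machine names and values below are hypothetical.
machine_ads = [
    {"Machine": "node01.example.edu", "Arch": "X86_64", "OpSys": "LINUX", "Memory": 2048},
    {"Machine": "node02.example.edu", "Arch": "X86_64", "OpSys": "LINUX", "Memory": 8192},
    {"Machine": "node03.example.edu", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 16384},
]


def job_requirements(machine):
    """Stand-in for: requirements = (Arch == "X86_64" && Memory >= 4096)"""
    return machine["Arch"] == "X86_64" and machine["Memory"] >= 4096


def matching_machines(requirements, ads):
    """Return the names of the machines whose ads satisfy the job's requirements."""
    return [ad["Machine"] for ad in ads if requirements(ad)]


print(matching_machines(job_requirements, machine_ads))
# prints ['node02.example.edu']
```

In a real pool the match runs both ways: a machine ad can carry its own requirements about which jobs it accepts.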
-
Resource Planning
• Condor can’t do everything for you.
• Think about input files (including programs) and output files and how they’ll be accessed.
• Think about disk space. “df -h” and “du -shx *” can help.
• Fun fact: The particle-physics Condor pools can’t see your home directory!
• Moral: Let condor transfer your files… when possible.
When you can’t let condor transfer your files, here are disk-sharing methods outside of condor:
• NFS – used at Nevis
• CVMFS – Fermilab and CERN
• Grid, BlueArc – only used at Fermilab
• AFS – obsolete, still used at CERN
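Two of the points above can be sketched in a few lines: checking free disk space (the same information “df -h” reports) and asking Condor to transfer files for you. The submit commands should_transfer_files, when_to_transfer_output and transfer_input_files are standard Condor ones; the helper functions and file names are hypothetical.

```python
import shutil


def free_gib(path="."):
    """Free space on the filesystem holding `path`, in GiB (compare `df -h`)."""
    return shutil.disk_usage(path).free / 2**30


def transfer_lines(input_files):
    """Submit-file lines asking Condor to copy files to and from the batch node."""
    return [
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
        "transfer_input_files    = " + ", ".join(input_files),
    ]


print(f"free space here: {free_gib():.1f} GiB")
# The file names below are hypothetical.
print("\n".join(transfer_lines(["analysis.C", "config.txt"])))
```

With these lines in the submit file, the job does not need to see your home directory: its inputs are copied to the batch node, and its output files are copied back when it exits.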
-
Resource Planning
[Diagram: “Your server”, holding /home, and a large grid of batch nodes]
What we don’t do
-
Resource Planning
[Diagram: “Your server”, holding /home; a file server, holding /share and /data; and a large grid of batch nodes]
What we do
-
Computer Systems at Nevis
Linux Cluster
hypatia: administration, NIS
kolya: ATLAS
franklin: Mail
karthur: ATLAS
hogwarts: staff
Administrative servers
Workgroup/Login servers
Clients and x-terms
Clients and x-terms
Workstations, batch nodes, student boxes
shang: DOE
annex: off-site backup & mail
ada: web server
sullivan: mailing-list server
tehanu: VERITAS
shelley: backup server
xenia
tango: SMB
hermes: DNS, batch
virtual machines
houston: Neutrino
File servers
xenia2
vetch  ged  serret
amsterdam
bleeker
westside
riverside
-
node05
Bringing the job to the data
Submit machine
node06  node04
node02  node03  node01
Condor master
requirements = (machine == "node04.nevis.columbia.edu")
bigfile1.root bigfile2.root bigfile3.root
bigfile4.root bigfile5.root bigfile6.root
Some wrapper script
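One way to picture “bringing the job to the data” is a small lookup from file to node that produces the requirements line for each job. This is only a sketch: the slide does not say which file lives on which node, so the mapping below is hypothetical, as is the helper function.

```python
# Hypothetical layout: which batch node holds which file on its local disk.
file_location = {
    "bigfile1.root": "node01.nevis.columbia.edu",
    "bigfile4.root": "node04.nevis.columbia.edu",
}


def requirements_for(filename, locations):
    """Build the submit-file line that pins a job to the node holding `filename`."""
    node = locations[filename]
    # ClassAd expressions compare with == and put string values in double quotes.
    return f'requirements = (machine == "{node}")'


print(requirements_for("bigfile4.root", file_location))
# prints requirements = (machine == "node04.nevis.columbia.edu")
```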
-
Final tips
• Split up your task so each condor job takes 20-60 minutes
• If your job must be preempted, it will have to run from the beginning on the same machine that cancelled the job
• Test your job with one process before submitting it for 10,000 processes!
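The first tip is simple arithmetic, sketched here under assumed numbers. The helper name, the 40-minute target (the middle of the 20-60 minute range), and the event counts are all hypothetical; measure the time per event for your own program.

```python
import math


def plan_jobs(n_events, seconds_per_event, target_minutes=40):
    """Split n_events into jobs of roughly target_minutes each.

    Returns (n_jobs, events_per_job).
    """
    events_per_job = max(1, int(target_minutes * 60 / seconds_per_event))
    n_jobs = math.ceil(n_events / events_per_job)
    return n_jobs, events_per_job


# Hypothetical numbers: 1,000,000 events at 0.5 seconds per event.
n_jobs, events_per_job = plan_jobs(1_000_000, 0.5)
print(n_jobs, events_per_job)
# prints 209 4800
```

Following the last tip, you would first submit with “queue 1”, check the output, and only then change the submit file to “queue 209”.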
-
Resources
Main web page: http://research.cs.wisc.edu/htcondor/
Quick start: http://research.cs.wisc.edu/htcondor/quick-start.html
Full manual: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_Users_Manual.html
Nevis particle-physics condor guide: https://twiki.nevis.columbia.edu/twiki/bin/view/Nevis/Condor
Basic Condor @ Nevis tutorial: http://www.nevis.columbia.edu/~seligman/root-class/