Dan Bradley
Computer Sciences Department
University of Wisconsin-Madison
danb@cs.wisc.edu
http://www.cs.wisc.edu/condor
Schedd On The Side
What is it?
Specialized scheduler operating on schedd’s jobs.
[Diagram: Schedd and Schedd On The Side attached to one job queue holding Job 1, Job 2, Job 3, Job 4, Job 5, …, Job 4*]
Condor Farm Story
[Diagram: Application, condor_submit, Schedd with job queue, Startd resources running Random Seed jobs]
•Now that this is working, how can I use my collaborator’s resources too?
Option #1: Merge Farms
› Combine machines with collaborator into one Condor resource pool.
o Everything works just like it did before.
o Excellent option for small to medium clusters.
o Requires bidirectional connectivity to all startds, or equivalent via GCB.
o Requires some administrative coordination (e.g. upgrades, negotiator policy, security, etc.)
Option #2: Flocking Together
[Diagram: Schedd, local startds, and remote startds running Random Seed jobs]
•full featured (std universe etc)
•automatic matchmaking
•easy to configure

•requires bidirectional connectivity
•both sites must run Condor
Option #3: Grid Universe
[Diagram: Schedd, startds, and Gatekeeper X running Random Seed jobs; labels: vanilla, site X]
•easier to live with private networks
•may use non-Condor resources

•restricted Condor feature set (e.g. no std universe over grid)
•must pre-allocate jobs between vanilla and grid universe
Option #4: Routing Jobs
[Diagram: Schedd, local startds, and Schedd On The Side with gatekeepers X, Y, and Z running Random Seed jobs; labels: vanilla, site X, site Y, site Z]
•dynamic allocation of jobs between vanilla and grid universes.
•not every job is appropriate for transformation into a grid job.
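The routing idea on this slide can be sketched in a few lines of Python. This is an illustration only, not the Schedd On The Side implementation: the `Job` and `Route` types, their field names, and the `gatekeeper-x.example.org` host are all hypothetical. The sketch copies a vanilla job into a grid-universe job for a chosen site and leaves alone any job judged unsuitable for transformation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified stand-in for a job in the schedd's queue."""
    job_id: str
    universe: str                      # "vanilla", "standard", or "grid"
    routable: bool = True              # not every job suits transformation
    grid_resource: Optional[str] = None
    routed_from: Optional[str] = None  # id of the original job, if this is a routed copy


@dataclass(frozen=True)
class Route:
    """Hypothetical routing-table entry for one remote site."""
    site: str
    grid_resource: str


def route_job(job: Job, route: Route) -> Optional[Job]:
    """Return a grid-universe copy of a vanilla job, or None if it should stay put."""
    if job.universe != "vanilla" or not job.routable:
        return None
    return replace(
        job,
        job_id=job.job_id + "*",
        universe="grid",
        grid_resource=route.grid_resource,
        routed_from=job.job_id,
    )


site_x = Route(site="X", grid_resource="gatekeeper-x.example.org")  # made-up host
original = Job(job_id="4", universe="vanilla")
routed_copy = route_job(original, site_x)
print(routed_copy)
```

The original job is left untouched in this sketch, so the copy can be discarded and the job can still run locally.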
What About Flow Control?
› May restrict routing to jobs which have been rejected by negotiator.
› May limit maximum actively routed jobs on a per site basis.
› May limit maximum idle routed jobs per site.
› Periodic remove of idle routed jobs is possible, but no guarantee of optimal rescheduling.
› Routing table may be reconfigured dynamically.
› Multicast? Might be interesting to try.
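As a rough illustration of the first three controls, the Python sketch below routes only jobs that the negotiator has rejected and stops sending jobs to a site once that site reaches its cap on routed jobs or on idle routed jobs. The `SiteRoute` type, the field names, and the `rejected_by_negotiator` key are assumptions made for this example; they are not taken from Condor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SiteRoute:
    """Hypothetical routing-table entry with per-site limits."""
    name: str
    max_jobs: int        # cap on actively routed jobs at this site
    max_idle_jobs: int   # cap on routed jobs still idle at this site
    routed: int = 0      # routed jobs currently at the site (idle + running)
    idle: int = 0        # subset of routed jobs that are still idle


def pick_site(routes: List[SiteRoute]) -> Optional[SiteRoute]:
    """Return the first site with room under both limits, or None."""
    for route in routes:
        if route.routed < route.max_jobs and route.idle < route.max_idle_jobs:
            return route
    return None


def route_rejected_jobs(jobs: List[dict], routes: List[SiteRoute]) -> Dict[str, str]:
    """Assign jobs to sites, considering only jobs the negotiator rejected."""
    assignments: Dict[str, str] = {}
    for job in jobs:
        if not job.get("rejected_by_negotiator", False):
            continue  # leave it for the local pool
        site = pick_site(routes)
        if site is None:
            break  # every site is at a limit; try again later
        site.routed += 1
        site.idle += 1  # a newly routed job starts out idle
        assignments[job["id"]] = site.name
    return assignments


routes = [SiteRoute("X", max_jobs=2, max_idle_jobs=1),
          SiteRoute("Y", max_jobs=5, max_idle_jobs=2)]
jobs = [{"id": "1", "rejected_by_negotiator": True},
        {"id": "2", "rejected_by_negotiator": False},
        {"id": "3", "rejected_by_negotiator": True},
        {"id": "4", "rejected_by_negotiator": True},
        {"id": "5", "rejected_by_negotiator": True}]
assignments = route_rejected_jobs(jobs, routes)
print(assignments)
```

In this run, job 2 stays local because it was never rejected, and job 5 stays unrouted because both sites have hit their idle limits.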
What About I/O?
› Jobs must be sandboxable (i.e. specifying input/output via transfer-files mechanism).
› Routing of standard universe is not supported.
› Additional restrictions may apply, depending on site network and disk.
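A minimal Python sketch of the first two restrictions, assuming a job is described by a plain dictionary: the keys `universe` and `uses_file_transfer` are hypothetical names chosen for this example, not Condor job attributes. Site-specific network and disk restrictions are not modeled.

```python
from typing import Dict, List


def routing_obstacles(job: Dict[str, object]) -> List[str]:
    """Return reasons a job cannot be routed; an empty list means none were found."""
    reasons = []
    if job.get("universe") == "standard":
        reasons.append("standard universe jobs are not routed")
    if not job.get("uses_file_transfer", False):
        reasons.append("input/output is not declared via file transfer")
    return reasons


sandboxed = {"universe": "vanilla", "uses_file_transfer": True}
shared_fs = {"universe": "vanilla", "uses_file_transfer": False}
checkpointing = {"universe": "standard", "uses_file_transfer": True}

for name, job in [("sandboxed", sandboxed),
                  ("shared_fs", shared_fs),
                  ("checkpointing", checkpointing)]:
    print(name, routing_obstacles(job))
```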
What Types of Grids?
› Routing table may contain any combination of grid types supported by the grid universe.
› Example: Condor-C
[Diagram: Schedd and Schedd On The Side submitting to Schedd X at site X, which runs Random Seed jobs]
•for two Condor sites, schedd-to-schedd submission requires no additional software
•however, still not as trivial to use as flocking
Routing Behind the Scenes
[Diagram labels: Gatekeeper X, Schedd, Schedd On The Side, Schedd X3, X2]
•navigate internal firewalls
•provide custom routes for special users
•improve scalability
•However, keep in mind I/O requirements etc.
Future Step: Glidein Factory
[Diagram labels: Schedd, Schedd On The Side, glidein jobs, Gatekeeper X, Startds running Random Seed jobs, home, site X]
•true late binding of jobs to resources
•may run on top of non-Condor sites
•supports full feature set of Condor (e.g. standard universe)

•requires GCB on network boundary (initiated by schedd-on-the-side?)
Glidein in the Works
[Diagram labels: Schedd, Schedd On The Side, glidein factory, site X, schedd-to-schedd, schedd-to-gatekeeper, Random Seed jobs]
•hierarchical strategy for scalability and reliability
•better match for private networks

•may require some additional horsepower from gatekeeper machine, perhaps a dedicated element for “edge services”.
Thanks
Interested? Let us know.
We are currently using job routing for specific users at UW.
Future development will focus on more use-cases.
Dan Bradley