Opportune Job Shredding: An Efficient Approach for
Scheduling Parameter Sweep Applications
Rohan Kurian, Pavan Balaji, P. Sadayappan
The Ohio State University
Parameter Sweep Applications
An important class of applications
Set of independent tasks
MCell Application: 3D simulations for sub-cellular architecture/physiology
GTOMO (Parallel Tomography) Application: multiple view-point simulation
Systems exist for scheduling on the Grid
Cluster-based Scheduling?
Application Level Schedulers
Manage the scheduling of applications
Break the application into appropriate chunks
APST (AppLeS Parameter Sweep Template)
NIMROD
Greedy approach to schedule PSA chunks
Presentation Roadmap
Job Scheduling in Clusters
Multi-Site Job Scheduling
PSA Scheduling Strategies
Multi-Site Scheduling of PSAs
Performance Evaluation
Conclusions
Job Scheduling in Clusters
Mapping arriving jobs to available resources
Multiple Schemes for Scheduling
  First Come First Serve (FCFS)
  Conservative Scheduling
  Aggressive or EASY Scheduling
Fair-Share Constraints
  A user cannot have more than 'N' queued jobs
Submitting the multiple chunks of a PSA job
  Violation of Fair-Share constraints
  Combine chunks to form a single parallel job
Formation of PSAs in Clusters
[Figure: small independent tasks combined into a parallel Parameter Sweep Application]
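One way to picture the fair-share problem and the "combine chunks" remedy is sketched below. The slides do not say how the chunks are packed, so the longest-task-first packing, the function names, and the queue limit are all illustrative assumptions rather than the authors' method.

```python
import heapq

def violates_fair_share(queued_jobs, new_jobs, limit):
    """True if submitting `new_jobs` more jobs would exceed the per-user limit N."""
    return queued_jobs + new_jobs > limit

def combine_chunks(task_runtimes, width):
    """Pack independent tasks onto `width` processors (hypothetical helper).

    Assumption: tasks are placed longest-first on the least-loaded processor.
    Returns (processors requested, estimated runtime) for the single
    parallel job that replaces the individual chunks.
    """
    loads = [0.0] * width              # running total per processor
    heapq.heapify(loads)
    for runtime in sorted(task_runtimes, reverse=True):
        least_loaded = heapq.heappop(loads)
        heapq.heappush(loads, least_loaded + runtime)
    return width, max(loads)

# Six chunks submitted separately would break a limit of N = 4 queued jobs,
# whereas the combined job counts as a single queue entry.
chunks = [4, 3, 3, 2, 2, 2]
too_many = violates_fair_share(queued_jobs=0, new_jobs=len(chunks), limit=4)
combined = combine_chunks(chunks, width=2)
```

Here `too_many` is true for the six separate submissions, and `combined` describes one two-processor job whose estimated runtime is the longest per-processor load.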
Multi-Site Job Scheduling
Multiple Simultaneous Requests
Job submitted to multiple sites
Started on the earliest cluster
Existing schemes have limitations
Heterogeneous Clusters
Different Scheduling Schemes
Multiple-simultaneous-requests
[Figure: Site 1, Site 2, and Site 3, each with a Meta Scheduler above a Local Scheduler, each receiving Jobs]
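The "multiple simultaneous requests" idea can be sketched as follows: the job is queued at every site, runs where it can start earliest, and the remaining requests are withdrawn. The function name and the per-site start-time estimates are hypothetical; a real scheduler would react to the first site that actually starts the job rather than compare estimates up front.

```python
def place_job(estimated_start):
    """Pick the site with the earliest start; the other requests are cancelled.

    `estimated_start` maps a site name to the time at which that site could
    start the job (an assumption made for illustration).
    """
    winner = min(estimated_start, key=estimated_start.get)
    cancelled = sorted(site for site in estimated_start if site != winner)
    return winner, cancelled

winner, cancelled = place_job({"site1": 120.0, "site2": 45.0, "site3": 300.0})
```

With these made-up estimates the job runs at `site2` and its requests at the other two sites are cancelled.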
PSA Scheduling Strategies
Flooding based Job Shredding
  Submit all chunks in the PSA at once
  Greedy approach
  Improves User and System metrics
  Doesn't ensure fairness to Non-PSA jobs
Opportune Job Shredding
  Uses an additional Application-Level Scheduler
  Monitors the current schedule of the system
  If no normal backfill is possible, allow PSA jobs to shred and backfill
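The opportune rule can be illustrated with a minimal sketch. It assumes a simplified backfill test (a waiting job backfills only if it fits in the idle processors and finishes within the window before the head job's reserved start), single-processor PSA tasks, and hypothetical names throughout; the slides do not specify these details.

```python
from typing import List, Tuple

def can_backfill(procs: int, est_runtime: float,
                 idle_procs: int, window: float) -> bool:
    """Simplified backfill test: fits in the idle processors and in the time window."""
    return procs <= idle_procs and est_runtime <= window

def opportune_shred(idle_procs: int, window: float,
                    waiting_jobs: List[Tuple[int, float]],
                    psa_task_runtimes: List[float]) -> List[int]:
    """Return the indices of PSA tasks to start now (empty if none should start).

    waiting_jobs: (processors, estimated runtime) of queued non-PSA jobs.
    psa_task_runtimes: runtimes of pending single-processor PSA tasks.
    """
    # Normal jobs get priority: if any of them can backfill, PSA tasks hold back.
    if any(can_backfill(p, r, idle_procs, window) for p, r in waiting_jobs):
        return []
    chosen = []
    for index, runtime in enumerate(psa_task_runtimes):
        if len(chosen) == idle_procs:
            break                      # every idle processor is now in use
        if runtime <= window:
            chosen.append(index)       # task finishes before the reservation
    return chosen

# Neither waiting job fits (one is too wide, one too long), so PSA tasks shred in.
started = opportune_shred(4, 100.0, [(8, 50.0), (2, 500.0)],
                          [30.0, 120.0, 60.0, 90.0, 10.0, 20.0])
```

Because PSA tasks are released only when no normal job could have used the hole, non-PSA jobs keep their backfilling opportunities, which is the contrast with the flooding strategy.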
Multi-Site Scheduling for PSAs
Two-level Application Level Schedulers
No constraints on sites
Allowed to have different speeds
Allowed to have different scheduling policies
Similar to "Multiple Simultaneous Requests"
Simultaneous requests only for PSAs
Multi-Site Scheduling for PSAs
[Figure: Site 1, Site 2, and Site 3, each with an App-Level Scheduler, a Job Queue, and a Local Scheduler, coordinated by a Meta Application-Level Scheduler]
Performance Metrics
Response Time = Completion Time - Submit Time
Slowdown = Response Time / Runtime
Loss of Capacity (LOC)
  In any state, lost processors = min{processors requested by waiting jobs, idle processors}
  T = time for which this state lasts
  LOC = lost processors x T
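The three metrics can be written out as a short sketch. The `Job` record and the representation of system states as (waiting processors, idle processors, duration) tuples are assumptions made for illustration; LOC is summed over states without normalization, since the slide gives none.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass
class Job:
    submit_time: float       # when the job entered the queue
    completion_time: float   # when the job finished
    runtime: float           # time the job actually spent running

def response_time(job: Job) -> float:
    return job.completion_time - job.submit_time

def slowdown(job: Job) -> float:
    return response_time(job) / job.runtime

def loss_of_capacity(states: Iterable[Tuple[int, int, float]]) -> float:
    """Sum of min(waiting procs, idle procs) x duration over all states."""
    return sum(min(waiting, idle) * duration
               for waiting, idle, duration in states)

job = Job(submit_time=0.0, completion_time=300.0, runtime=100.0)
loc = loss_of_capacity([(8, 4, 10.0), (2, 6, 5.0), (0, 10, 20.0)])
```

In the last state no job is waiting, so the ten idle processors contribute nothing: capacity counts as lost only when idle processors coexist with queued work.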
Evaluation Scheme
Simulation based Approach
CTC trace from Feitelson’s archive
EASY backfilling used
For multi-site evaluation
CTC traces from 3 different months
Processing speeds in the ratio 2:1:3
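The result charts report each metric as a percentage decrease, which can be read with the small helper below. The baseline is assumed here to be the same workload scheduled without job shredding; the slides do not state it explicitly, and the function name is hypothetical.

```python
def percentage_decrease(baseline: float, value: float) -> float:
    """Positive means the metric dropped (improved); negative means it grew (worsened)."""
    return 100.0 * (baseline - value) / baseline

improved = percentage_decrease(200.0, 80.0)    # metric fell from 200 to 80
worsened = percentage_decrease(100.0, 190.0)   # metric rose from 100 to 190
```

Since response time, slowdown, and loss of capacity are all lower-is-better, a negative bar in the charts indicates that the job class did worse than the baseline.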
Flooding Based Job Shredding
[Figure: Average Slowdown (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for All Jobs, PSA Jobs, and Non-PSA Jobs]
[Figure: Average Response Time (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for All Jobs, PSA Jobs, and Non-PSA Jobs]
• Up to 60% improvement for PSA Jobs
• Up to 90% worse performance for Non-PSA Jobs
Flooding: Job Category wise breakup
[Figure: Average Response Time (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for NarrowShort, NarrowLong, WideShort, and WideLong jobs]
[Figure: Average Slowdown (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for NarrowShort, NarrowLong, WideShort, and WideLong jobs]
• Narrow Short Non-PSA jobs suffer most
• Loss of back-filling opportunities is the main reason
Flooding: Loss of Capacity
[Figure: Loss Of Capacity (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5)]
• Up to 75% improvement in the Loss of Capacity
Opportune Job Shredding
[Figure: Average Response Time (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for All Jobs, PSA Jobs, and Non-PSA Jobs]
[Figure: Average Slowdown (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for All Jobs, PSA Jobs, and Non-PSA Jobs]
• Up to 70% improvement for PSA Jobs
• Less than 2% worsening in performance for Non-PSA Jobs
Opportune: Job Category wise breakup
[Figure: Average Response Time (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for NarrowShort, NarrowLong, WideShort, and WideLong jobs]
[Figure: Average Slowdown (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for NarrowShort, NarrowLong, WideShort, and WideLong jobs]
• No category of Non-PSA jobs suffers more than 7%
Opportune: Loss of Capacity
[Figure: Loss Of Capacity (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5)]
• Up to 12% improvement in the Loss of Capacity
Opportune (Multi-Site)
[Figure: Average Response Time (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for PSA Jobs and Non-PSA Jobs on Cluster1, Cluster2, and Cluster3]
[Figure: Average Slowdown (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for PSA Jobs and Non-PSA Jobs on Cluster1, Cluster2, and Cluster3]
• Up to 95% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Response Time
[Figure: Average Response Time (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for PSA Jobs and Non-PSA Jobs on Cluster1, Cluster2, and Cluster3]
• Up to 75% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Slowdown
[Figure: Average Slowdown (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for PSA Jobs and Non-PSA Jobs on Cluster1, Cluster2, and Cluster3]
• Up to 95% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Loss of Capacity
[Figure: Loss Of Capacity (10% PSA Jobs). Percentage decrease vs. Load (1, 1.2, 1.5) for Cluster1, Cluster2, and Cluster3]
• Up to 45% improvement in the Loss of Capacity
Concluding Remarks
Opportune Job Shredding
  Efficient Scheduling of PSAs
  Single Site and Multi-Site versions
  Significant improvement for PSA jobs
  Ensures that Non-PSA jobs are not affected
Plan to integrate this with production schedulers
Thank You!