Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.
-
Upload
elijah-shaw -
Category
Documents
-
view
269 -
download
12
Transcript of Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.
Stochastic Models and Analysis for Resource
Management in Server Farms
Thesis Proposal
VARUN GUPTA
1
SMART : Stochastic Models and Analysis for Resource
managemenT in server farms
Thesis Proposal
VARUN GUPTA
2
SARCASM : Stochastic models and Analysis for ResourCe
mAnagement in Server farMs
Thesis Proposal
VARUN GUPTA
3
Supercomputers
Data center pods
Cloud computing
Multi-core chipsArray-of-Wimpy-Nodes4
Server farm popularity is on the rise
+ high compute capacity+ incremental growth+ fault-tolerance+ efficient resource utilization+ energy efficiency+ high parallelism
5
BackendserversFrontend
dispatcher/router
Design Choice 1:Which server to send a job to?
Design Choice 2:Scheduling policy
for backend servers?
Design Choice 3: When to turn servers on/off for efficiency?
Design Choice 4: How many servers to buy? Of what capacity?
A simple server farm “template”
6
BackendserversFrontend
dispatcher/router
Design Choice 1:Dispatching policy
Design Choice 2:Scheduling policy
Design Choice 3: Dynamic capacity scaling
Design Choice 4: Provisioning
A simple server farm “template”
Thesis GoalStochastic modeling and analysis to answer the
questions faced by server farm designers/managers
Long history of stochastic modeling and analysis• Erlang (1909): Operator provisioning in telephone exchanges• Inventory/production management• Call center staffing
Several gaps between traditional models and compute server farms• New constraints• New opportunities• New metrics
Bridge these gaps by developing new models andanalysis techniques relevant to requirements of today’sserver farms
7
8
Application GAPDesign Choice
Dispatching
Backend Scheduling
Capacity scaling
Provisioning
Web server farms
No analysis for PS server farms
Energy management
in Data centers
Setup penalties + unpredictable demands
Fully replicated DBs
High variance in job sizes
Database servers Thrashing
VM management
VM migration
and departures
A summary of explored gaps
A glimpse of results
9
10
Application GAPDesign Choice
Dispatching
Backend Scheduling
Capacity scaling
Provisioning
Web server farms
No analysis for PS server farms
Energy management
in Data centers
Setup penalties + unpredictable demands
Fully replicated DBs
High variance in job sizes
Database servers Thrashing
VM management
VM migration
and departures
A summary of explored gaps
11
Application 1 : Web server farms/cluster computing
Immediatedispatch
12
PS
12
Q: Good load balancing dispatchers? How many servers? Existing work limited to First-Come-First-Served servers or Exponential job size distribution
GAP : Processor Sharing servers + high-variance job sizes
PS
PS
Application 1 : Web server farms/cluster computing
Immediatedispatch
13
PS
13
PS
PS
???Poisson
arrivals
Join-Shortest-Queue (JSQ) : most popular
• Balances load• Greedy
KHomogeneous
Servers
Q: Is JSQ optimal for general job size distribution?Bonomi [90] : Optimal for Exponential job size distribution when job sizes unknown
Q: Analysis of JSQ for general job size distribution?
Model : Dispatching policies for M/G/K-PS
10
12
14
16
18
20
Det
Exp
Bim-1
Wei
b-1
Wei
b-2
Bim-2
MeanResponseTime
RANDOM
???PS
PSSimulation Results
Increasing job-size variance (same mean) 14
10
12
14
16
18
20
Det
Exp
Bim-1
Wei
b-1
Wei
b-2
Bim-2
MeanResponseTime
RANDOM
JSQ
???PS
PS
Increasing job-size variance (same mean)
Simulation Results
15
10
12
14
16
18
20
Det
Exp
Bim-1
Wei
b-1
Wei
b-2
Bim-2
MeanResponseTime
RANDOM
Round-Robin
JSQ
???PS
PSSimulation Results
Increasing job-size variance (same mean) 16
10
12
14
16
18
20
Det
Exp
Bim-1
Wei
b-1
Wei
b-2
Bim-2
MeanResponseTime
RANDOM
Round-Robin
JSQOPT-0
???PS
PSSimulation Results
Increasing job-size variance (same mean) 17
18
PS
18
PS
PS
JSQPoisson
arrivals
KHomogeneous
Servers
Conjecture: JSQ is near-optimal (even among size-aware dispatching policies)
Performance of JSQ is “nearly-insensitive” to the job size distribution
Model : Dispatching policies for M/G/K-PS
19
Contribution 1: The Single-Queue-Approximation
Goal : Approx. for mean response time under Exponential job sizes
Compensate for the effect of other queues via state-dependent arrival rates
• λ(n) easier to approximate (only need to worry about λ(1), λ(2))
• < 2% error in mean response time for up to 64 servers
PS
Mn/M/1/PS
λ(n)≈
M/M/K-JSQ/PS
JSQ
PS
PSλ(n) = state-dependent arrival rate
[Performance’07]
Contribution 2: Many-server heavy-traffic analysis (PROPOSED)
Goal 1: Quantify the “near-insensitive” behavior Goal 2: Optimal dispatching policies for heterogeneous servers
Hard to prove anything in general, must resort to limiting regimes
The many-server heavy-traffic scaling
• Shows the effect of job size variability• Intuition into behavior of JSQ 20
JSQλ = K - constant
PS
PS
PS
PS
K → ∞
21
Application GAPDesign Choice
Dispatching
Backend Scheduling
Capacity scaling
Provisioning
Web server farms
No analysis for PS server farms
Energy management
in Data centers
Setup penalties + unpredictable demands
fully replicated DBs
High variance in job sizes
Database servers Thrashing
VM management
VM migration
and departures
A summary of explored gaps
22
Q: When to turn servers ON/OFF to adapt to demand?
Existing work assumes zero setup delays, knowledge of future demand pattern
GAP : setup penalties non-zero + unpredictable demand patterns
Application 2 : Energy-Performance trade-off in
Data centers/Cloud computing
23
First-In-First-Out buffer
Poisson
arrivals
ON
ON
SETUP
OFF
Contribution: New traffic-oblivious policy DELAYEDOFF• Servers turn off after idle for twait
• If arrival sees all servers busy, turn a new server ON• Most-Recently-Busy (MRB) dispatching: send job to server which
idled last
Theorem: Under DELAYEDOFF, as the load , the number of ON servers is concentrated around
DELAYEDOFF is asymptotically
optimal
Model : Dynamic capacity scaling in M/M/∞ with setup delays
[Performance’10]
24
Simulation Results for DELAYEDOFF
PROPOSED: Refine DELAYEDOFF, prove performance guarantees
25
Application GAPDesign Choice
Dispatching
Backend Scheduling
Capacity scaling
Provisioning
Web server farms
No analysis for PS server farms
Energy management
in Data centers
Setup penalties + unpredictable demands
fully replicated DBs
High variance in job sizes
Database servers Thrashing
VM management
VM migration
and departures
A summary of explored gaps
26
Application 3 : Fully replicated databases
27
First-In-First-Out buffer
Q: How many servers, and what speed?
No exact analysis, approximations good for low job size variance
GAP : Job sizes have very high variance
Application 3 : Fully replicated databases
28
First-In-First-Out buffer
Poisson
arrivals
Application 3 : Fully replicated databases
29
First-In-First-Out buffer
The Holy Grail of queueing theory (model for many other applications)
yet no exact analysis!
Lee-Longton [1959] :
Poisson
arrivalssquared coeff. of
variation of job size dist.
typically C2 > 20
Model : M/G/K/FCFS
30
Contribution 1: Inapproximability results
Goal: No accurate approx. based only on first 2 moments
Pick a subclass of distributions• Analytically tractable • Large enough to fix 2 moments, but wiggle room to prove gap
0
2
4
6
Increasing 3rd moment →
E[Delay]
Lee-Longton Approximation
H2
{G | 2 moments}
[QUESTA’10]
Contribution 2: Tight moment-based bounds
Goal: Better approximation using n moments?
[QUESTA ’10] : Conjectured extremal distributions
M/G/K/FCFS under light-traffic• Extremality should be invariant to load• Verify conjectures for n = 2,3• Also for other queueing systems with no exact analysis
31
E[Delay]
{G | n moments}
?
?
tight bounds | n moments
, 4, 5, 6… proposed work
32
Application GAPDesign Choice
Dispatching
Backend Scheduling
Capacity scaling
Provisioning
Web server farms
No analysis for PS server farms
Energy management
in Data centers
Setup penalties + unpredictable demands
Fully replicated DBs
High variance in job sizes
Database servers Thrashing
VM management
VM migration
and departures
A summary of explored gaps
33
500MB
1.5GB
1GB
2GB
1GB
1GB
Application 5 : Managing VMs in the cloud
34
Q: Which server to start VM on? What capacity servers to buy?Assumption of permanent itemsGAP : VMs depart + VM migration possible
Contribution : Stochastic bin packing model with job departure/migrationPROPOSED: develop packing/migration schemes for efficient packing
Application 5 : Managing VMs in the cloud
500M
1G
250M
500M
Time line for proposed work
35
Application GAP Status Proposed Work
Web server farms
No analysisfor PS server
farms
70% CompletedNov ‘10-Jan ‘11
• Optimal dispatch policies for heterogeneous servers
• Characterizing “near-insensitivity”
Energymanagement in Data centers
Setup penalties +
unpredictable demands
80% CompletedFeb-Mar ’11
• Refine the proposed DELAYEDOFF policy
• Performance guarantees for traffic- oblivious capacity scaling
Fully replicated DBs
High variance in job sizes
CompletedVerifying conjectures on tight
moment- based bounds beyond the scope of the thesis
Database servers
Thrashing Completed
VM managementVM migration
and departures
10% CompletedOct’ 10-Jan ’11
Develop and analyze heuristics for online stochastic bin packing with item departures and migrations
Expected Graduation: MAY2011
# jobs at server
Speed
ON
SETUP
OFF
JSQPS
PS
M/G/K
36
Dispatch
VM
VM
VM
37
[Performance’07]
V. Gupta, M. Harchol-Balter, K. Sigman, and W. Whitt. Analysis of join-the-shortest-queue routing for web server farms. PERFORMANCE 2007
[Performance’10]
V. Gupta, A. Gandhi, M. Harchol-Balter, and M. Kozuch. Optimality analysis of energy-performance trade-off for server farm management. PERFORMANCE 2010.
[QUESTA’10] V. Gupta, J. Dai, M. Harchol-Balter, and B. Zwart. On the inapproximability of M/G/K: why two moments of job size distribution are not enough. Queueing Systems, Vol 64, 2010.
[TR’08] V. Gupta, J. Dai, M. Harchol-Balter, and B. Zwart. The effect of higher moments of job size distribution on the performance of an M/G/K queueing system. Technical Report ,CMU, 2008.
[Sigmetrics’09] V. Gupta and M. Harchol-Balter. Self-adaptive admission control policies for resource-sharing systems. SIGMETRICS 2009.
V. Gupta and T. Osogami, On Markov-Krein characterization of mean sojourn time in M/G/K. Under submission.
Other Work
Time-varying systems
[Sigmetrics’06] V. Gupta, M. Harchol-Balter, A. Scheller-Wolf, and U. Yechiali. Fundamental characteristics of queues with fluctuating load. Sigmetrics 2006
[MAMA’08a] V. Gupta, and P. Harrison. Fluid level in a reservoir with ON-OFF source. MAMA 2008.
Single Server
Scheduling
[MAMA’08b] V. Gupta. Finding the optimal quantum size: Sensitivity analysis of the M/G/1 round-robin queue. MAMA 2008.
[Performance’10b]
V. Gupta, M. Burroughs, and M. Harchol-Balter. Analysis of scheduling policies under correlated job sizes. Performance 2010.
Distributed Data
placement
[INFOCOM’10] S. Borst, V. Gupta, and A. Walid. Distributed caching algorithms for Content Distribution Networks. INFOCOM 2010.
[SOCC’10]H. Amur, J. Cipar, V. Gupta, M. Kozuch, G.Ganger, and K. Schwan, Robust and flexible power-proportional storage. Symposium on Cloud Computing, 2010.
Epidemics [INFOCOM’08]M. Vojnovic, V. Gupta, T. Karagiannis, and C. Gkantsidis. Sampling strategies for epidemic-style information dissemination. INFOCOM 2008.
Stability analysis
A. Busic, V.Gupta, and J. Mairesse. Stability of the bipartite matching model. Under Submission
References