Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

37
Stochastic Models and Analysis for Resource Management in Server Farms Thesis Proposal VARUN GUPTA 1

Transcript of Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

Page 1: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

Stochastic Models and Analysis for Resource

Management in Server Farms

Thesis Proposal

VARUN GUPTA

1

Page 2: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

SMART : Stochastic Models and Analysis for Resource

managemenT in server farms

Thesis Proposal

VARUN GUPTA

2

Page 3: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

SARCASM : Stochastic models and Analysis for ResourCe

mAnagement in Server farMs

Thesis Proposal

VARUN GUPTA

3

Page 4: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

Supercomputers

Data center pods

Cloud computing

Multi-core chipsArray-of-Wimpy-Nodes4

Server farm popularity is on the rise

+ high compute capacity+ incremental growth+ fault-tolerance+ efficient resource utilization+ energy efficiency+ high parallelism

Page 5: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

5

BackendserversFrontend

dispatcher/router

Design Choice 1:Which server to send a job to?

Design Choice 2:Scheduling policy

for backend servers?

Design Choice 3: When to turn servers on/off for efficiency?

Design Choice 4: How many servers to buy? Of what capacity?

A simple server farm “template”

Page 6: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

6

BackendserversFrontend

dispatcher/router

Design Choice 1:Dispatching policy

Design Choice 2:Scheduling policy

Design Choice 3: Dynamic capacity scaling

Design Choice 4: Provisioning

A simple server farm “template”

Page 7: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

Thesis GoalStochastic modeling and analysis to answer the

questions faced by server farm designers/managers

Long history of stochastic modeling and analysis• Erlang (1909): Operator provisioning in telephone exchanges• Inventory/production management• Call center staffing

Several gaps between traditional models and compute server farms• New constraints• New opportunities• New metrics

Bridge these gaps by developing new models andanalysis techniques relevant to requirements of today’sserver farms

7

Page 8: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

8

Application GAPDesign Choice

Dispatching

Backend Scheduling

Capacity scaling

Provisioning

Web server farms

No analysis for PS server farms

Energy management

in Data centers

Setup penalties + unpredictable demands

Fully replicated DBs

High variance in job sizes

Database servers Thrashing

VM management

VM migration

and departures

A summary of explored gaps

Page 9: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

A glimpse of results

9

Page 10: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

10

Application GAPDesign Choice

Dispatching

Backend Scheduling

Capacity scaling

Provisioning

Web server farms

No analysis for PS server farms

Energy management

in Data centers

Setup penalties + unpredictable demands

Fully replicated DBs

High variance in job sizes

Database servers Thrashing

VM management

VM migration

and departures

A summary of explored gaps

Page 11: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

11

Application 1 : Web server farms/cluster computing

Immediatedispatch

Page 12: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

12

PS

12

Q: Good load balancing dispatchers? How many servers? Existing work limited to First-Come-First-Served servers or Exponential job size distribution

GAP : Processor Sharing servers + high-variance job sizes

PS

PS

Application 1 : Web server farms/cluster computing

Immediatedispatch

Page 13: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

13

PS

13

PS

PS

???Poisson

arrivals

Join-Shortest-Queue (JSQ) : most popular

• Balances load• Greedy

KHomogeneous

Servers

Q: Is JSQ optimal for general job size distribution?Bonomi [90] : Optimal for Exponential job size distribution when job sizes unknown

Q: Analysis of JSQ for general job size distribution?

Model : Dispatching policies for M/G/K-PS

Page 14: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

10

12

14

16

18

20

Det

Exp

Bim-1

Wei

b-1

Wei

b-2

Bim-2

MeanResponseTime

RANDOM

???PS

PSSimulation Results

Increasing job-size variance (same mean) 14

Page 15: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

10

12

14

16

18

20

Det

Exp

Bim-1

Wei

b-1

Wei

b-2

Bim-2

MeanResponseTime

RANDOM

JSQ

???PS

PS

Increasing job-size variance (same mean)

Simulation Results

15

Page 16: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

10

12

14

16

18

20

Det

Exp

Bim-1

Wei

b-1

Wei

b-2

Bim-2

MeanResponseTime

RANDOM

Round-Robin

JSQ

???PS

PSSimulation Results

Increasing job-size variance (same mean) 16

Page 17: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

10

12

14

16

18

20

Det

Exp

Bim-1

Wei

b-1

Wei

b-2

Bim-2

MeanResponseTime

RANDOM

Round-Robin

JSQOPT-0

???PS

PSSimulation Results

Increasing job-size variance (same mean) 17

Page 18: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

18

PS

18

PS

PS

JSQPoisson

arrivals

KHomogeneous

Servers

Conjecture: JSQ is near-optimal (even among size-aware dispatching policies)

Performance of JSQ is “nearly-insensitive” to the job size distribution

Model : Dispatching policies for M/G/K-PS

Page 19: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

19

Contribution 1: The Single-Queue-Approximation

Goal : Approx. for mean response time under Exponential job sizes

Compensate for the effect of other queues via state-dependent arrival rates

• λ(n) easier to approximate (only need to worry about λ(1), λ(2))

• < 2% error in mean response time for up to 64 servers

PS

Mn/M/1/PS

λ(n)≈

M/M/K-JSQ/PS

JSQ

PS

PSλ(n) = state-dependent arrival rate

[Performance’07]

Page 20: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

Contribution 2: Many-server heavy-traffic analysis (PROPOSED)

Goal 1: Quantify the “near-insensitive” behavior Goal 2: Optimal dispatching policies for heterogeneous servers

Hard to prove anything in general, must resort to limiting regimes

The many-server heavy-traffic scaling

• Shows the effect of job size variability• Intuition into behavior of JSQ 20

JSQλ = K - constant

PS

PS

PS

PS

K → ∞

Page 21: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

21

Application GAPDesign Choice

Dispatching

Backend Scheduling

Capacity scaling

Provisioning

Web server farms

No analysis for PS server farms

Energy management

in Data centers

Setup penalties + unpredictable demands

fully replicated DBs

High variance in job sizes

Database servers Thrashing

VM management

VM migration

and departures

A summary of explored gaps

Page 22: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

22

Q: When to turn servers ON/OFF to adapt to demand?

Existing work assumes zero setup delays, knowledge of future demand pattern

GAP : setup penalties non-zero + unpredictable demand patterns

Application 2 : Energy-Performance trade-off in

Data centers/Cloud computing

Page 23: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

23

First-In-First-Out buffer

Poisson

arrivals

ON

ON

SETUP

OFF

Contribution: New traffic-oblivious policy DELAYEDOFF• Servers turn off after idle for twait

• If arrival sees all servers busy, turn a new server ON• Most-Recently-Busy (MRB) dispatching: send job to server which

idled last

Theorem: Under DELAYEDOFF, as the load , the number of ON servers is concentrated around

DELAYEDOFF is asymptotically

optimal

Model : Dynamic capacity scaling in M/M/∞ with setup delays

[Performance’10]

Page 24: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

24

Simulation Results for DELAYEDOFF

PROPOSED: Refine DELAYEDOFF, prove performance guarantees

Page 25: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

25

Application GAPDesign Choice

Dispatching

Backend Scheduling

Capacity scaling

Provisioning

Web server farms

No analysis for PS server farms

Energy management

in Data centers

Setup penalties + unpredictable demands

fully replicated DBs

High variance in job sizes

Database servers Thrashing

VM management

VM migration

and departures

A summary of explored gaps

Page 26: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

26

Application 3 : Fully replicated databases

Page 27: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

27

First-In-First-Out buffer

Q: How many servers, and what speed?

No exact analysis, approximations good for low job size variance

GAP : Job sizes have very high variance

Application 3 : Fully replicated databases

Page 28: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

28

First-In-First-Out buffer

Poisson

arrivals

Application 3 : Fully replicated databases

Page 29: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

29

First-In-First-Out buffer

The Holy Grail of queueing theory (model for many other applications)

yet no exact analysis!

Lee-Longton [1959] :

Poisson

arrivalssquared coeff. of

variation of job size dist.

typically C2 > 20

Model : M/G/K/FCFS

Page 30: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

30

Contribution 1: Inapproximability results

Goal: No accurate approx. based only on first 2 moments

Pick a subclass of distributions• Analytically tractable • Large enough to fix 2 moments, but wiggle room to prove gap

0

2

4

6

Increasing 3rd moment →

E[Delay]

Lee-Longton Approximation

H2

{G | 2 moments}

[QUESTA’10]

Page 31: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

Contribution 2: Tight moment-based bounds

Goal: Better approximation using n moments?

[QUESTA ’10] : Conjectured extremal distributions

M/G/K/FCFS under light-traffic• Extremality should be invariant to load• Verify conjectures for n = 2,3• Also for other queueing systems with no exact analysis

31

E[Delay]

{G | n moments}

?

?

tight bounds | n moments

, 4, 5, 6… proposed work

Page 32: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

32

Application GAPDesign Choice

Dispatching

Backend Scheduling

Capacity scaling

Provisioning

Web server farms

No analysis for PS server farms

Energy management

in Data centers

Setup penalties + unpredictable demands

Fully replicated DBs

High variance in job sizes

Database servers Thrashing

VM management

VM migration

and departures

A summary of explored gaps

Page 33: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

33

500MB

1.5GB

1GB

2GB

1GB

1GB

Application 5 : Managing VMs in the cloud

Page 34: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

34

Q: Which server to start VM on? What capacity servers to buy?Assumption of permanent itemsGAP : VMs depart + VM migration possible

Contribution : Stochastic bin packing model with job departure/migrationPROPOSED: develop packing/migration schemes for efficient packing

Application 5 : Managing VMs in the cloud

500M

1G

250M

500M

Page 35: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

Time line for proposed work

35

Page 36: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

Application GAP Status Proposed Work

Web server farms

No analysisfor PS server

farms

70% CompletedNov ‘10-Jan ‘11

• Optimal dispatch policies for heterogeneous servers

• Characterizing “near-insensitivity”

Energymanagement in Data centers

Setup penalties +

unpredictable demands

80% CompletedFeb-Mar ’11

• Refine the proposed DELAYEDOFF policy

• Performance guarantees for traffic- oblivious capacity scaling

Fully replicated DBs

High variance in job sizes

CompletedVerifying conjectures on tight

moment- based bounds beyond the scope of the thesis

Database servers

Thrashing Completed

VM managementVM migration

and departures

10% CompletedOct’ 10-Jan ’11

Develop and analyze heuristics for online stochastic bin packing with item departures and migrations

Expected Graduation: MAY2011

# jobs at server

Speed

ON

SETUP

OFF

JSQPS

PS

M/G/K

36

Dispatch

VM

VM

VM

Page 37: Thesis Proposal VARUN GUPTA 1. Thesis Proposal VARUN GUPTA 2.

37

[Performance’07]

V. Gupta, M. Harchol-Balter, K. Sigman, and W. Whitt. Analysis of join-the-shortest-queue routing for web server farms. PERFORMANCE 2007

[Performance’10]

V. Gupta, A. Gandhi, M. Harchol-Balter, and M. Kozuch. Optimality analysis of energy-performance trade-off for server farm management. PERFORMANCE 2010.

[QUESTA’10] V. Gupta, J. Dai, M. Harchol-Balter, and B. Zwart. On the inapproximability of M/G/K: why two moments of job size distribution are not enough. Queueing Systems, Vol 64, 2010.

[TR’08] V. Gupta, J. Dai, M. Harchol-Balter, and B. Zwart. The effect of higher moments of job size distribution on the performance of an M/G/K queueing system. Technical Report ,CMU, 2008.

[Sigmetrics’09] V. Gupta and M. Harchol-Balter. Self-adaptive admission control policies for resource-sharing systems. SIGMETRICS 2009.

V. Gupta and T. Osogami, On Markov-Krein characterization of mean sojourn time in M/G/K. Under submission.

Other Work

Time-varying systems

[Sigmetrics’06] V. Gupta, M. Harchol-Balter, A. Scheller-Wolf, and U. Yechiali. Fundamental characteristics of queues with fluctuating load. Sigmetrics 2006

[MAMA’08a] V. Gupta, and P. Harrison. Fluid level in a reservoir with ON-OFF source. MAMA 2008.

Single Server

Scheduling

[MAMA’08b] V. Gupta. Finding the optimal quantum size: Sensitivity analysis of the M/G/1 round-robin queue. MAMA 2008.

[Performance’10b]

V. Gupta, M. Burroughs, and M. Harchol-Balter. Analysis of scheduling policies under correlated job sizes. Performance 2010.

Distributed Data

placement

[INFOCOM’10] S. Borst, V. Gupta, and A. Walid. Distributed caching algorithms for Content Distribution Networks. INFOCOM 2010.

[SOCC’10]H. Amur, J. Cipar, V. Gupta, M. Kozuch, G.Ganger, and K. Schwan, Robust and flexible power-proportional storage. Symposium on Cloud Computing, 2010.

Epidemics [INFOCOM’08]M. Vojnovic, V. Gupta, T. Karagiannis, and C. Gkantsidis. Sampling strategies for epidemic-style information dissemination. INFOCOM 2008.

Stability analysis

A. Busic, V.Gupta, and J. Mairesse. Stability of the bipartite matching model. Under Submission

References