
The Only Constant is Change: Incorporating Time-Varying Bandwidth Reservations in Data Centers

Di Xie, Ning Ding, Y. Charlie Hu, Ramana Kompella

Cloud Computing is Hot

Private Cluster

Key Factors for Cloud Viability

• Cost

• Performance

Performance Variability in Cloud

• BW variation in cloud due to contention [Schad’10 VLDB]

• Causing unpredictable performance

[Figure: bandwidth (Mbps), Local Cluster vs. Amazon EC2]

Reserving BW in Data Centers

• SecondNet [Guo’10]

– Per VM-pair, per VM access bandwidth reservation

• Oktopus [Ballani’11]

– Virtual Cluster (VC)

– Virtual Oversubscribed Cluster (VOC)

How BW Reservation Works

Virtual Cluster Model

[Figure: virtual cluster; VMs connected to a virtual switch, each with bandwidth B]

Request <N, B>

1. Determine the model

2. Allocate and enforce the model

Only fixed-BW reservation

Network Usage for MapReduce Jobs

Hadoop Sort, 4GB per VM

Hadoop Word Count, 2GB per VM

Hive Join, 6GB per VM

Hive Aggregation, 2GB per VM

Time-varying network usage

Motivating Example

• 4 machines, 2 VMs/machine, non-oversubscribed network

• Hadoop Sort

– N: 4 VMs

– B: 500Mbps/VM

[Figure labels: 500Mbps, 50s; annotation: Not enough BW]


Under Fixed-BW Reservation Model

[Figure: bandwidth vs. time (0 to 30) for Job1, Job2, and Job3, with a 500Mbps level marked]

Temporally-Interleaved Virtual Cluster (TIVC)

• Key idea: Time-Varying BW Reservations

• Compared to fixed-BW reservation

– Improves utilization of data center

• Better network utilization

• Better VM utilization

– Increases cloud provider’s revenue

– Reduces cloud user’s cost

– Without sacrificing job performance

Challenges in Realizing TIVC

[Figure: fixed-BW request <N, B> through a virtual switch (constant bandwidth over 0 to T) vs. time-varying request <N, B(t)>]

Q1: What are right model functions?

Q2: How to automatically derive the models?

Q3: How to efficiently allocate TIVC?

Q4: How to enforce TIVC?


How to Model Time-Varying BW?

Hadoop Hive Join

TIVC Models

[Figure: bandwidth-vs-time model shapes: Virtual Cluster (constant over 0 to T) and TIVC pulse models with breakpoints T1, T2 and T11, T12, T21, T22, T31, T32]

Hadoop Sort

Hadoop Word Count

Hadoop Hive Join

Hadoop Hive Aggregation
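A pulse-shaped B(t) like the models above can be written down as a small piecewise-constant data structure. The following is only a sketch: the class name `PulseModel`, its fields, and the example numbers are assumptions made for illustration and are not taken from the slides. It assumes pulses do not overlap and lie within the job duration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PulseModel:
    """Hypothetical piecewise-constant bandwidth model B(t) for one VM."""
    duration: float  # job length T, in seconds
    base_bw: float   # bandwidth reserved outside pulses, in Mbps
    # Each pulse is (start, end, bandwidth in Mbps); assumed non-overlapping.
    pulses: Tuple[Tuple[float, float, float], ...]

    def bandwidth_at(self, t: float) -> float:
        """Reserved bandwidth at time t; 0 outside [0, duration)."""
        if not 0 <= t < self.duration:
            return 0.0
        for start, end, bw in self.pulses:
            if start <= t < end:
                return bw
        return self.base_bw

    def reserved_volume(self) -> float:
        """Integral of B(t) over the job, in megabits (Mbps * s)."""
        pulse_time = sum(end - start for start, end, _ in self.pulses)
        pulse_volume = sum((end - start) * bw for start, end, bw in self.pulses)
        return pulse_volume + (self.duration - pulse_time) * self.base_bw


# A fixed-BW Virtual Cluster request is the special case with no pulses.
vc = PulseModel(duration=30.0, base_bw=500.0, pulses=())
# A single-pulse request reserves 500 Mbps only between t=10 and t=20.
tivc = PulseModel(duration=30.0, base_bw=0.0, pulses=((10.0, 20.0, 500.0),))
```

With these illustrative numbers, the single-pulse request reserves one third of the volume of the fixed-BW request for the same job length.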

What are the right model functions?

Possible Approach

• “White-box” approach

– Given source code and data of cloud application, analyze quantitative networking requirement

– Very difficult in practice

• Observation: Many jobs are repeated many times

– E.g., 40% of jobs are recurring in Bing’s production data center [Agarwal’12]

– Data itself may change across runs, but its size remains about the same

Our Approach

• Solution: “Black-box” profiling based approach

1. Collect traffic trace from profiling run

2. Derive TIVC model from traffic trace

• Profiling: Same configuration as production runs

– Same number of VMs

– Same input data size per VM

– Same job/VM configuration

How much BW should we give to the application?

Impact of BW Capping

No-elongation BW threshold

Choosing BW Cap

• Tradeoff between performance and cost

– Cap > threshold: same performance, costs more

– Cap < threshold: lower performance, may cost less

• Our Approach: Expose tradeoff to user

1. Profile under different BW caps

2. Expose run times and cost to user

3. User picks the appropriate BW cap
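The three steps above can be sketched as a small helper that takes profiled run times per BW cap and returns the options worth showing to the user. This is an illustrative sketch only: the function names, the 5% tolerance used to decide "no elongation", and the profile numbers are all assumptions, since the slides do not say how the threshold is detected.

```python
def no_elongation_threshold(profile, tolerance=0.05):
    """Smallest cap whose run time is within `tolerance` of the fastest run.

    profile maps a BW cap (Mbps) to the measured run time (seconds).
    The tolerance is an assumed knob, not something the slides define.
    """
    fastest = min(profile.values())
    good_caps = [cap for cap, run_time in profile.items()
                 if run_time <= fastest * (1 + tolerance)]
    return min(good_caps)


def options_for_user(profile, tolerance=0.05):
    """Caps at or below the threshold, sorted, with their run times.

    Caps above the threshold give the same performance but cost more,
    so they are not offered.
    """
    threshold = no_elongation_threshold(profile, tolerance)
    return sorted((cap, run_time) for cap, run_time in profile.items()
                  if cap <= threshold)


# Hypothetical profiling results: cap in Mbps -> run time in seconds.
profile = {100: 900.0, 200: 520.0, 300: 410.0, 400: 400.0, 500: 400.0}
threshold = no_elongation_threshold(profile)
options = options_for_user(profile)
```

A real deployment would attach a cost to each option so the user sees run time and price side by side.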

Only below threshold ones

From Profiling to Model Generation

• Collect traffic trace from each VM

– Instantaneous throughput of 10ms bin

• Generate models for individual VMs

• Combine to obtain overall job’s TIVC model

– Simplify allocation by working with one model

– Does not lose efficiency since per-VM models are roughly similar for MapReduce-like applications

Generate Model for Individual VM

1. Choose Bb

2. Periods where B > Bb, set to Bcap

Maximal Efficiency Model

• Enumerate Bb to find the maximal efficiency model

\[ \text{Efficiency} = \frac{\text{Application Traffic Volume}}{\text{Reserved Bandwidth Volume}} \]
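The enumeration over Bb can be sketched as follows, using efficiency defined as application traffic volume divided by reserved bandwidth volume. This is a simplified illustration, not the authors' implementation: the function names and the sample trace are assumptions, the trace is assumed to come from a run already capped at Bcap, and equal-width bins are assumed so the bin width cancels in the ratio.

```python
def reserve(trace, bb, bcap):
    """Per-bin reservation: Bcap where throughput exceeds Bb, else Bb."""
    return [bcap if sample > bb else bb for sample in trace]


def efficiency(trace, reserved):
    """Application traffic volume divided by reserved bandwidth volume."""
    total_reserved = sum(reserved)
    return sum(trace) / total_reserved if total_reserved > 0 else 0.0


def best_base_bandwidth(trace, bcap):
    """Try each observed throughput level (plus 0) as Bb; keep the best."""
    candidates = sorted({min(sample, bcap) for sample in trace} | {0.0})
    return max(candidates,
               key=lambda bb: efficiency(trace, reserve(trace, bb, bcap)))


# Hypothetical per-bin throughput samples in Mbps, profiled under Bcap = 500.
trace = [10.0, 10.0, 480.0, 500.0, 20.0, 10.0]
bcap = 500.0
bb = best_base_bandwidth(trace, bcap)
model = reserve(trace, bb, bcap)
eff = efficiency(trace, model)
```

For this made-up trace, choosing Bb too low turns small bursts into full pulses, and choosing it too high reserves a large base rate that goes unused; the enumeration settles between the two.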

How to automatically derive the models?

TIVC Allocation Algorithm

• Spatio-temporal allocation algorithm

– Extends VC allocation algorithm to time dimension

– Employs dynamic programming

• Properties

– Locality aware

– Efficient and scalable

• 99th percentile 28ms on a 64,000-VM data center in scheduling 5,000 jobs
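The slides do not spell out the dynamic program, so the sketch below covers only the temporal half of the problem: checking whether a time-varying demand fits on a single link next to what is already reserved. The function names, the per-slot discretization, and the numbers are assumptions for illustration; VM placement across the tree is not modeled.

```python
def fits_on_link(capacity, existing, new, start):
    """True if `new` (per-slot demand) fits on the link starting at `start`.

    existing holds the bandwidth already reserved in each time slot;
    slots past its end are treated as unreserved.
    """
    for offset, demand in enumerate(new):
        slot = start + offset
        used = existing[slot] if slot < len(existing) else 0.0
        if used + demand > capacity:
            return False
    return True


def earliest_start(capacity, existing, new, horizon):
    """First start slot below `horizon` where the demand fits, else None."""
    for start in range(horizon):
        if fits_on_link(capacity, existing, new, start):
            return start
    return None


# Hypothetical link: 500 Mbps capacity, one job already using slot 1 fully.
capacity = 500.0
existing = [0.0, 500.0, 0.0, 0.0]

fixed_bw_job = [500.0, 500.0]  # fixed reservation for two slots
pulse_job = [0.0, 500.0]       # same job, bandwidth needed only in slot 2

fixed_start = earliest_start(capacity, existing, fixed_bw_job, horizon=4)
pulse_start = earliest_start(capacity, existing, pulse_job, horizon=4)
```

In this toy case the pulse-shaped request can overlap with the existing job's busy slot and starts one slot earlier than the fixed-BW request.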

How to efficiently allocate TIVC?

Enforcing TIVC Reservation

• Possible to enforce completely in hypervisor

– Does not have control over upper level links

– Requires online rate monitoring and feedback

– Increases hypervisor overhead and complexity

• Observation: Few jobs share a link simultaneously

– Most small jobs will fit into a rack

– Only a few large jobs cross the core

– In our simulations, < 26 jobs share a link in 64,000-VM data center

Enforcing TIVC Reservation

• Enforcing BW reservation in switches

– Avoid complexity in hypervisors

– Can be implemented on commodity switches

• Cisco Nexus 7000 supports 16k policers

How to efficiently allocate TIVC?

How to enforce TIVC?

Proteus: Implementing TIVC Models

1. Determine the model

2. Allocate and enforce the model

Evaluation

• Large-scale simulation

– Performance

– Cost

– Allocation algorithm

• Prototype implementation

– Small-scale testbed

Simulation Setup

• 3-level tree topology

– 16,000 Hosts x 4 VMs

– 4:1 oversubscription

• Workload

– N: exponential distribution around mean 49

– B(t): derive from real Hadoop apps

[Figure: tree topology labeled 20 Aggr Switch, 20 ToR Switch, 40 Hosts, with link speeds 50Gbps, 10Gbps, 1Gbps]
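As a sanity check, the topology numbers quoted above (20 aggregation switches, 20 ToR switches, 40 hosts, and 50/10/1Gbps links) are consistent with 16,000 hosts, 64,000 VMs, and 4:1 oversubscription under one reading of the figure. That reading is an assumption: 40 hosts per ToR on 1Gbps links, 20 ToRs per aggregation switch on 10Gbps uplinks, and 20 aggregation switches on 50Gbps uplinks.

```python
# Assumed reading of the topology figure (not stated explicitly in the slides).
HOSTS_PER_TOR = 40
TORS_PER_AGGR = 20
AGGR_SWITCHES = 20
VMS_PER_HOST = 4

HOST_LINK_GBPS = 1
TOR_UPLINK_GBPS = 10
AGGR_UPLINK_GBPS = 50

hosts = HOSTS_PER_TOR * TORS_PER_AGGR * AGGR_SWITCHES
vms = hosts * VMS_PER_HOST

# Oversubscription = total downstream capacity / uplink capacity.
tor_oversub = HOSTS_PER_TOR * HOST_LINK_GBPS / TOR_UPLINK_GBPS
aggr_oversub = TORS_PER_AGGR * TOR_UPLINK_GBPS / AGGR_UPLINK_GBPS

print(hosts, vms, tor_oversub, aggr_oversub)
```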

Batched Jobs

• Scenario: 5,000 time-insensitive jobs

[Figure: completion time reduction of 42%, 21%, 23%, and 35% across workloads; the mixed workload contains 1/3 of each type]

All remaining results are for the mixed workload

Varying Oversubscription and Job Size

25.8% reduction for non-oversubscribed network

Dynamically Arriving Jobs

• Scenario: Accommodate users’ requests in shared data center

– 5,000 jobs, Poisson arrival, varying load
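A workload of this shape (Poisson arrivals, exponentially distributed job sizes with mean 49) could be generated along the following lines. This is a sketch under assumptions: the function name, the arrival rate, and the rounding of sizes to at least one VM are illustrative choices, and the slides do not describe the authors' generator.

```python
import random


def generate_jobs(num_jobs, mean_size, arrival_rate, seed=0):
    """Return (arrival_time, num_vms) tuples for a synthetic workload.

    Poisson arrivals are produced by drawing exponential inter-arrival
    times with rate `arrival_rate` (jobs per time unit). Job sizes are
    exponential with the given mean, rounded and clamped to at least
    1 VM, which shifts the mean slightly.
    """
    rng = random.Random(seed)
    jobs = []
    now = 0.0
    for _ in range(num_jobs):
        now += rng.expovariate(arrival_rate)
        size = max(1, round(rng.expovariate(1.0 / mean_size)))
        jobs.append((now, size))
    return jobs


# Raising arrival_rate increases the offered load on the data center.
jobs = generate_jobs(num_jobs=5000, mean_size=49, arrival_rate=0.1)
```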

Rejected jobs: VC 9.5%, TIVC 3.4%

Analysis: Higher Concurrency

• Under 80% load

7% higher job concurrency

28% higher VM utilization

Rejected jobs are large

28% higher revenue

Charge VMs

Tenant Cost and Provider Revenue

• Charging model

– VM time T and reserved BW volume B

– Cost = N (kv T + kb B)

– kv = 0.004$/hr, kb = 0.00016$/GB
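The charging model above is easy to evaluate directly. In the sketch below, the constants come from the slide, while the function names, the unit interpretation (T in hours, B in decimal GB of reserved volume per VM), and the example job are assumptions for illustration.

```python
K_V = 0.004    # $ per VM-hour (from the slide)
K_B = 0.00016  # $ per GB of reserved bandwidth volume (from the slide)


def reserved_gb(mbps, hours):
    """Volume reserved by holding `mbps` for `hours`, in decimal GB."""
    return mbps * hours * 3600 / 8 / 1000


def tenant_cost(n_vms, hours, reserved_gb_per_vm):
    """Cost = N (kv T + kb B), with B taken as the per-VM reserved volume."""
    return n_vms * (K_V * hours + K_B * reserved_gb_per_vm)


# Hypothetical 4-VM, 1-hour job at 500 Mbps per VM.
vc_cost = tenant_cost(4, 1.0, reserved_gb(500, 1.0))
# Same job, but bandwidth reserved for only a third of the run.
tivc_cost = tenant_cost(4, 1.0, reserved_gb(500, 1.0 / 3))
```

In this made-up case the VM-time charge is identical and the saving comes entirely from the smaller reserved bandwidth volume.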

12% less cost for tenants

Providers make more money

Amazon target utilization

Testbed Experiment

• Setup

– 18 machines

– tc and NetFPGA rate limiter

• Real MapReduce jobs

• Procedure

– Offline profiling

– Online reservation

Testbed Result

TIVC finishes the job faster than VC; Baseline finishes the fastest

Baseline suffers elongation; TIVC achieves performance similar to VC

Conclusion

• Network reservations in cloud are important

– Previous work proposed fixed-BW reservations

– However, cloud apps exhibit time-varying BW usage

• We propose TIVC abstraction

– Provides time-varying network reservations

– Uses simple pulse functions

– Automatically generates model

– Efficiently allocates and enforces reservations

• Proteus shows TIVC benefits both cloud provider and users significantly
