HCLOUD: RESOURCE-EFFICIENT PROVISIONING IN SHARED …kozyraki/publications/2016.h... · 2016. 4....

Post on 01-Oct-2020

2 views 0 download

Transcript of HCLOUD: RESOURCE-EFFICIENT PROVISIONING IN SHARED …kozyraki/publications/2016.h... · 2016. 4....

Christina Delimitrou1 and Christos Kozyrakis2

1Stanford/Cornell University, 2Stanford University/EPFLhttp://mast.stanford.edu

ASPLOS– April5th 2016

HCLOUD: RESOURCE-EFFICIENT PROVISIONING IN SHARED CLOUD SYSTEMS

2

¨ Problem: cloud provisioning is difficult

¤ Many resource offerings à resource/cost inefficiencies

¤ Interference from other users à performance jitter

¨ HCloud: resource-efficient public cloud provisioning

¤ User provides resource reservations performance goals

¤ Automatic selection of instance type and configuration

¤ Adjust allocations between reserved and on-demand

¤ High utilization and low performance jitter

Executive Summary

3

Motivation

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlarge

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlarge

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

Cost, predictability, availability, flexibility, spin-up overheads,

retention time, load fluctuation, external load, sensitivity to interference, …

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

t2.nanot2.micro t2.small t2.medium t2.largem4.large m4.xlarge m4.2xlargem4.4xlargem4.10xlargem3.medium m3.large m3.xlarge m3.2xlargec4.largec4.xlarge c4.2xlargec4.4xlargec4.8xlargec3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarger3.large r3.xlarge

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

n1-standard-1n1-standard-2n1-standard-4n1-standard-8n1-standard-16n1-standard-32n1-highmem-2n1-highmem-4n1-highmem-8n1-highmem-16n1-highmem-32n1-highcpu-2n1-highcpu-4n1-highcpu-8n1-highcpu-16n1-highcpu-32f1-microg1-small

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

A0A1A2A3A4A5A6A7A8A9A10A11D1D2D3D4D11D12D13D14D1 v2D2 v2D3 v2D4 v2D5 v2D11 v2

9

Cloud Provisioning Goals

✗ Determine appropriate instance size/type

✗ Determine appropriate instance configuration

✗ Dynamically adjust allocation decisions at runtime

10

Cloud Provisioning Scenarios

¨ Batch analytics, latency-critical applications, and scientific workloads in all scenarios¤ Batch: 50%¤ Latency: 40%¤ Scientific: 10%

11

Provisioning Baselines¨ All reserved (SR):

¤ 16vCPU instances on GCE, no external load¤ Predictable performance, high cost, low flexibility¤ Applications provisioned for peak requirements

¨ All on-demand: ¤ Largest instances (OdF): only 16vCPU instances¤ Mixed instances (OdM): mix of large and smaller instances¤ Interference, low cost, high flexibility

¨ Resource management: ¤ Least-loaded scheduler

12

Extracting Resource Preferences

¨ Exhaustive characterization is infeasible

Heterogeneity

Interference

Resources per server

Resource ratio

10 servers40 apps

100 servers300 apps

1000 servers1200 apps

Combinations

1,000

1,000,000

0

1,000,000,000

Systems

Number of servers

Application params

13

Quasar Overview [ASPLOS’14]

Profiling

Cluster

QoS Scheduler

UserApp

App

App

App

App

Resourcepreferences

App

Data mining

14

Quasar Overview [ASPLOS’14]

Cluster

Scheduler

Greedy scheduler

Profiling

User Resourcepreferences

Data mining Resource selection

App

15

Quasar Overview [ASPLOS’14]

Cluster

Scheduler

Greedy scheduler

Profiling

User Resourcepreferences

Data mining Resource selection

App

16

Quasar Overview [ASPLOS’14]

Profiling[5-30sec]

Data mining[20msec]

Resource selection[50msec-2sec]

Cluster

Scheduler

Greedy schedulerUser

App

App

App

App

Resourcepreferences

App

17

Evaluation

18

Evaluation

19

Cloud Provisioning Goals

✔ Determine appropriate instance size/type

✗ Determine appropriate instance configuration

✗ Dynamically adjust allocation decisions at runtime

20

Hybrid Cloud Resource Allocation

¨ Insight: ¤ Combine reserved and on-demand resources

¨ Challenge: ¤ Separate applications between reserved (~private) and

on-demand (public) resources

21

Hybrid Allocation Policies

P1 P2 P3 P4 P5 P6 P7 P8556065707580859095

Perf.

nor

m to

Isol

atio

n (%

) On-Demand Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P875

80

85

90

95

100

Perf.

nor

m to

Isol

atio

n (%

) Reserved Resources

HFHM

✗ Account for application interference sensitivity

22

Hybrid Allocation Policies

P1 P2 P3 P4 P5 P6 P7 P875

80

85

90

95

100

Perf.

nor

m to

Isol

atio

n (%

) Reserved Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P8556065707580859095

100

Perf.

nor

m to

Isol

atio

n (%

) On-Demand Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P875

80

85

90

95

100

Perf.

nor

m to

Isol

atio

n (%

) Reserved Resources

HFHM

✗ Do not overload reserved resources

✓ Account for application interference sensitivity

23

Hybrid Allocation Policies

P1 P2 P3 P4 P5 P6 P7 P875

80

85

90

95

100

Perf.

nor

m to

Isol

atio

n (%

) Reserved Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P8556065707580859095

100

Perf.

nor

m to

Isol

atio

n (%

) On-Demand Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P8556065707580859095

100

Perf.

nor

m to

Isol

atio

n (%

) On-Demand Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P875

80

85

90

95

100

Perf.

nor

m to

Isol

atio

n (%

) Reserved Resources

HFHM

✗ Dynamic decisions (e.g., utilization limits)

✓ Account for application interference sensitivity

✓ Do not overload reserved resources

24

HCloud

¨ Insights: ¤ Account for interference sensitivity¤ Set load limits dynamically

Reserved On-demand

AppApp

App

App

25

HCloud

Reserved On-demand

AppApp

App

App

¨ Insights: ¤ Account for interference sensitivity¤ Set load limits dynamically

26

HCloud

Reserved On-demand

AppApp

App

App

¨ Insights: ¤ Account for interference sensitivity¤ Set load limits dynamically

27

HCloud

Reserved On-demand

AppApp

App

App

¨ Insights: ¤ Account for interference sensitivity¤ Set load limits dynamically (feedback loop on queue length)

28

HCloud

Reserved On-demand

AppApp

App

App

¨ Insights: ¤ Account for interference sensitivity¤ Set load limits dynamically (feedback loop on queue length)

¨ What signals a sensitive job?

29

HCloud

¨ Insights: ¤ Account for interference sensitivity¤ Set load limits dynamically

¨ What signals a sensitive job?

Reserved On-demand

AppApp

App

App

30

Hybrid Allocation Policies

P1 P2 P3 P4 P5 P6 P7 P875

80

85

90

95

100

Perf.

nor

m to

Isol

atio

n (%

) Reserved Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P8556065707580859095

100

Perf.

nor

m to

Isol

atio

n (%

) On-Demand Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P8556065707580859095

100

Perf.

nor

m to

Isol

atio

n (%

) On-Demand Resources

HFHM

P1 P2 P3 P4 P5 P6 P7 P875

80

85

90

95

100

Perf.

nor

m to

Isol

atio

n (%

) Reserved Resources

HFHM

✓ Account for application interference sensitivity

✓ Do not overload reserved resources

✓ Dynamic decisions (e.g., utilization limits)

31

Evaluation

32

Cloud Provisioning Goals

✔ Determine appropriate instance size/type

✔ Determine appropriate instance configuration

✗ Dynamically adjust allocation decisions at runtime

33

Adjusting Allocations at Runtime

¨ Runtime: monitor app performance¤ Remove co-scheduled apps¤ Move to larger instance within cluster¤ Move to reserved cluster

Reserved On-demand

34

Conclusions

¨ HCloud: hybrid cloud provisioning (reserved & on-demand)¤ Account for interference sensitivity¤ Account for performance-cost tradeoffs¤ Adjust to load fluctuations¤ 2.1x better performance than on-demand, 50% lower cost than

reserved

¨ See paper for: ¤ Performance unpredictability analysis on EC2 and GCE¤ Sensitivity studies on system & app parameters

n Spin-up overheads, cost, retention time, load, app characteristics, external load, …

¤ Further workload scenarios

35

¨ HCloud: hybrid cloud provisioning (reserved & on-demand)¤ Account for interference sensitivity¤ Account for performance-cost tradeoffs¤ Adjust to load fluctuations¤ 2.1x better performance than on-demand, 50% lower cost than

reserved

¨ See paper for: ¤ Performance unpredictability analysis on EC2 and GCE¤ Sensitivity studies on system & app parameters

n Spin-up overheads, cost, retention time, load, app characteristics, external load, …

¤ Further workload scenarios

Questions??

36

Thank you

Questions??

37

Resource Efficiency