MOS: Taming the Cloud Object Storage
Ali Anwar★, Yue Cheng★, Aayush Gupta†, Ali R. Butt★
★Virginia Tech & †IBM Research – Almaden
Cloud object stores enable cost-efficient data storage
Cloud object store supports various workloads
(Diagram: object storage serving websites, online gaming, online video sharing, and enterprise backup)
One size does not fit all
Replace the monolithic object store with specialized, fine-grained object stores, each launched on a sub-cluster
Reason 1: Classification of workloads
Applications have different service-level requirements, e.g., average latency per request, queries per second (QPS), and data transfer throughput (MB/s)
Small objects (~1-100 KB)
Online gaming: Get: 5%, Put: 90%, Delete: 5%
Website: Get: 90%, Put: 5%, Delete: 5%
Large objects (~1-100 MB)
Enterprise backup: Get: 5%, Put: 90%, Delete: 5%
Online video sharing: Get: 90%, Put: 5%, Delete: 5%
Reason 2: Heterogeneous resources
¤ Datacenters hosting object stores are becoming increasingly heterogeneous
¤ Hardware-to-application-workload mismatch
¤ Meeting SLA requirements is challenging
Outline
Introduction
Motivation
Contribution
Design
Evaluation
Background: Swift object store
(Diagram: Swift architecture, with a proxy server in front of storage nodes)
Swift: Proxy and Storage servers
(Diagram: a request passes through the proxy server to the storage nodes)
Swift: Ring architecture
(Diagram: the ring maps objects onto the storage nodes)
Benchmark used: CosBench
¤ COSBench is an Intel-developed benchmark for measuring the performance of cloud object storage services
¤ Targets object stores like S3 and OpenStack Swift, not file systems or block devices
¤ Used to compare different hardware and software stacks
¤ Helps identify bottlenecks and guide optimizations
Workload used
Workload   | Object Size Distribution | Workload Characteristics | Application scenario
Workload A | 1 – 128 KB               | G: 90%, P: 5%, D: 5%     | Web hosting
Workload B | 1 – 128 KB               | G: 5%, P: 90%, D: 5%     | Online game hosting
Workload C | 1 – 128 MB               | G: 90%, P: 5%, D: 5%     | Online video sharing
Workload D | 1 – 128 MB               | G: 5%, P: 90%, D: 5%     | Enterprise backup
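As a rough illustration, the mixes in the table above can be encoded as a small request generator. The names, the uniform size draw, and the overall structure below are assumptions for illustration only, not taken from COSBench or MOS:

```python
import random

# Workload mixes from the table; sizes in bytes, operation ratios in percent.
# This encoding is illustrative, not part of COSBench or MOS.
WORKLOADS = {
    "A": {"sizes": (1 * 1024, 128 * 1024), "mix": {"GET": 90, "PUT": 5, "DELETE": 5}},
    "B": {"sizes": (1 * 1024, 128 * 1024), "mix": {"GET": 5, "PUT": 90, "DELETE": 5}},
    "C": {"sizes": (1 * 1024**2, 128 * 1024**2), "mix": {"GET": 90, "PUT": 5, "DELETE": 5}},
    "D": {"sizes": (1 * 1024**2, 128 * 1024**2), "mix": {"GET": 5, "PUT": 90, "DELETE": 5}},
}

def next_request(workload, rng=random):
    """Draw one (operation, object_size) pair for the given workload."""
    spec = WORKLOADS[workload]
    ops, weights = zip(*spec["mix"].items())
    op = rng.choices(ops, weights=weights, k=1)[0]
    lo, hi = spec["sizes"]
    return op, rng.randint(lo, hi)  # uniform size draw is an assumption
```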
Experimental setup for motivational study
(Diagram: COSBench clients (32 cores), proxy servers (8 cores), and storage servers (32 cores, 3 SATA SSDs per node), connected by 1 Gbps and 10 Gbps links)
Configuration 1 – Default monolithic
(Diagram: the same cluster run as a single monolithic store; requests are distributed round-robin across all servers)
Configuration 2 – Favors small objects
(Diagram: the cluster split into a small-object partition and a large-object partition, with the resource split favoring small objects)
Configuration 3 – Favors large objects
(Diagram: the cluster split into a small-object partition and a large-object partition, with the resource split favoring large objects)
Performance under a multi-tenant environment – Workloads A & B
(Chart: throughput in QPS for workloads A-D under Configs 1-3, small and large objects)
Performance under a multi-tenant environment – Workloads A & B
(Chart: throughput in MB/s for workloads A-D under Configs 1-3, small and large objects)
Performance under a multi-tenant environment – latency
(Chart: latency in seconds for workloads A-D under Configs 1-3, small and large objects)
Key Insights
¤ Cloud object store workloads can be classified by the size of the objects they access
¤ When multiple tenants run workloads with drastically different behaviors, they compete with each other for object store resources
Outline
Introduction
Motivation
Contribution
Design
Evaluation
Contributions
¤ We analyze the performance and resource efficiency of the major hardware and software configuration options
¤ We design MOS, Micro Object Storage, which:
¤ 1) dynamically provisions fine-grained microstores
¤ 2) exposes the interfaces of the microstores to tenants
¤ We evaluate MOS to showcase its advantages
Outline
Introduction
Motivation
Contribution
Design
Evaluation
Design criteria for MOS
¤ We studied the effect of three knobs on the performance of a typical object store to derive design rules of thumb:
¤ Proxy server settings
¤ Storage server settings
¤ Hardware changes
Effect of Proxy server settings
(Charts: for small objects, QPS grows with the number of proxy workers until per-node CPU utilization reaches 100%; for large objects, GB/s grows until it hits the 10 Gbps NIC bandwidth limit)
Effect of Storage server settings
(Chart: QPS and GB/s as a function of the number of object storage workers, for small and large objects)
Effect of hardware settings
(Charts: QPS for small objects and GB/s for large objects, comparing HDD vs SSD under 1 Gbps and 10 Gbps networks)
Rules of thumb
¤ CPU on the proxy servers is the first-priority resource for small-object-intensive workloads
¤ For large-object-intensive workloads, network bandwidth on the proxy is more important than CPU
¤ proxyCores = storageNodes ∗ coresPerStorageNode
¤ BWproxies = storageNodes ∗ BWstorageNode
¤ A faster network cannot effectively improve QPS for small-object-intensive workloads – use a weak network (1 Gbps NICs) with good storage devices (SSDs)
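A minimal sketch of applying the two sizing formulas above; the talk gives only the formulas, so the function and parameter names here are illustrative assumptions:

```python
import math

def provision_proxy_tier(storage_nodes, cores_per_storage_node,
                         bw_per_storage_node_gbps, cores_per_proxy):
    """Size the proxy tier using the two rules of thumb.
    Parameter and return names are illustrative, not from MOS."""
    proxy_cores = storage_nodes * cores_per_storage_node      # proxyCores rule
    proxy_bw_gbps = storage_nodes * bw_per_storage_node_gbps  # BWproxies rule
    num_proxies = math.ceil(proxy_cores / cores_per_proxy)    # servers needed for that many cores
    return proxy_cores, proxy_bw_gbps, num_proxies
```

For example, four storage nodes with 8 cores and 10 Gbps each, paired with 8-core proxy machines, would call for 32 proxy cores (four proxies) and 40 Gbps of aggregate proxy bandwidth.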
MOS Design
(Diagram: MOS setup with N microstores, each comprising its own proxy servers, object storage nodes, and a workload monitor, behind a load balancer/load redirector; a resource manager assigns servers from a free resource pool)
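The components in the diagram might be sketched as data structures like the following; every class and field name here is an assumption for illustration, not from the MOS implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Server:
    cores: int
    nic_gbps: int

@dataclass
class Microstore:
    """One fine-grained store: its own proxies, storage nodes, and workload monitor."""
    name: str
    proxies: List[Server] = field(default_factory=list)
    storage_nodes: List[Server] = field(default_factory=list)

    def monitor(self) -> Dict[str, int]:
        # Workload monitor: summarize capacity for the resource manager.
        return {"proxy_cores": sum(s.cores for s in self.proxies),
                "storage_nodes": len(self.storage_nodes)}

@dataclass
class ResourceManager:
    """Assigns servers from the free resource pool to microstores."""
    free_pool: List[Server]

    def grant_proxy(self, store: Microstore) -> bool:
        if not self.free_pool:
            return False
        store.proxies.append(self.free_pool.pop())
        return True
```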
Resource Provisioning Algorithm
¤ Initially, the algorithm conservatively allocates the same amount of resources to each microstore, then uses a greedy approach for subsequent allocation
¤ It keeps track of the free set of resources, including hardware configuration, the current load served, and resource utilization such as CPU and network bandwidth utilization
¤ It periodically collects monitoring data from each microstore, aggressively increasing and linearly decreasing each microstore's resources as needed
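One way to read that policy is an AIMD-style loop over per-microstore allocations; the utilization thresholds, the doubling step, and the dict shapes below are illustrative assumptions, not values from the talk:

```python
# Thresholds are illustrative; the talk does not give concrete values.
HIGH_UTIL, LOW_UTIL = 0.8, 0.3

def rebalance(allocations, utilization, free_units):
    """One periodic step: aggressive (multiplicative) increase for overloaded
    microstores, linear decrease for underloaded ones, bounded by the free pool."""
    for store, util in utilization.items():
        if util > HIGH_UTIL and free_units > 0:
            grant = min(free_units, allocations[store])  # try to double the allocation
            allocations[store] += grant
            free_units -= grant
        elif util < LOW_UTIL and allocations[store] > 1:
            allocations[store] -= 1                      # release one unit linearly
            free_units += 1
    return allocations, free_units
```

For example, a microstore at 90% utilization doubles its allocation (pool permitting) while one idling at 20% hands back a single unit per period.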
Outline
Introduction
Motivation
Contribution
Design
Evaluation
Preliminary evaluation via simulation – Experimental setup
¤ Compute nodes:
- 3 machines with 32 cores
- 4 machines with 16 cores
- 31 machines with 8 cores
- 12 machines with 4 cores
¤ Network:
- 18 nodes with 10 Gbps NICs
- 32 nodes with 1 Gbps NICs
¤ HDD-to-SSD ratio of 70% to 30%
Aggregated throughput
(Charts: aggregated throughput over 20 minutes, QPS for small objects and GB/s for large objects, comparing Default, MOS static, and MOS dynamic)
Timeline under dynamically changing workloads
(Chart: throughput timeline, QPS and GB/s, over 800 minutes under dynamically changing workloads A-D across four stages)
Resource utilization timeline
(Charts: CPU and network utilization timelines, 50-100%, for workloads A-D over 800 minutes)
Related Work
¤ MET proposes several system metrics that are critical for a NoSQL database and that strongly affect the estimation of server utilization
¤ φSched and Walnut propose sharing of hardware resources across clouds of different types
¤ CAST and its extension perform coarse-grained cloud storage (including object stores) management for data analytics workloads
¤ IOFlow solves a similar problem by providing a queue and control functionality at two OS stages – the storage drivers in the hypervisor and the storage server
Conclusion
¤ We performed an exhaustive study of cloud object stores
¤ We proposed a set of rules to help cloud object store administrators utilize resources efficiently
¤ We presented MOS, which outperforms extant object stores in multi-tenant environments
¤ Our analysis shows that the heterogeneity inherent in modern datacenters can be exploited to the advantage of object store providers
Q&A
http://research.cs.vt.edu/dssl/
Ali Anwar Yue Cheng Aayush Gupta Ali R. Butt