Ben Walker Data Center Group Intel Corporation Configuration: 2x Intel® Xeon® E5-2699v3, Intel®...

Page 1:

Ben Walker

Data Center Group

Intel Corporation

Page 2:

Notices and Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.

No computer system can be absolutely secure.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.

Intel, the Intel logo, Xeon, and others are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2017 Intel Corporation.

Page 3:

Agenda

Introduction

Use Cases

Design

Benchmarks

Page 4:
Page 5:

Lots of applications want to use SPDK…

But they aren’t designed to directly use the block device

Page 6:

What does a filesystem do?

Directories

Permissions

Access Times

Byte Granularity

Checksums

Snapshots

TRIM

Sparse Allocation

Caching

I/O Scheduling

RAID

Partitions

Page 7:

What can SPDK do to help?

Let’s build some new components!

Page 8:
Page 9:

What sort of application benefits from SPDK?

Lots of I/O

Latency Sensitive

SAN? Database? Cache?

We picked two use cases:

RocksDB

Dynamic Block Allocation

Page 10:

RocksDB

Log-structured merge tree

Written in C++, Open Source

Pluggable storage backend

Broadly adopted

Recommends XFS

Makes minimal use of XFS

Directory structure

I/O pattern

Minimal caching needs

No other file system features required!

Page 11:
Page 12:

Glossary Of Terms

File: Array of bytes

Mutable, Resizable

String name

Object: Array of bytes

Immutable, replaceable

String name

Page: 4 KiB

Page 13:

Design Goals

Simple and efficient

Design for fast storage media

Support file & object-like semantics

[Figure: layer stack, top to bottom: BlobFS, Blobstore, BDEV]

Page 14:

Blobstore Basics

The user interacts with chunks of data called blobs

Array of pages

Mutable, resizable

ID

Asynchronous: no blocking, queueing, or waiting

Fully parallel operations: no locks in I/O path

I’m very efficient

Page 15:

Blobstore Space Allocation

[Figure: Cluster 0 is made up of Page 0 through Page 255, drawn over LBA 0 through LBA 255 of the device]
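
The layout on this slide reduces to simple arithmetic. The sketch below assumes what the figure appears to show: a 4 KiB page, 256 pages per cluster, and one LBA per page (that is, a device formatted with 4 KiB blocks, which is an assumption). It also assumes clusters are laid out back to back starting at LBA 0. The function names are illustrative and are not SPDK APIs.

```python
PAGE_SIZE = 4 * 1024        # 4 KiB page, from the glossary slide
PAGES_PER_CLUSTER = 256     # Cluster 0 holds Page 0..255 in the figure
LBAS_PER_PAGE = 1           # assumption: 4 KiB logical blocks

CLUSTER_SIZE = PAGE_SIZE * PAGES_PER_CLUSTER  # bytes per cluster


def cluster_first_lba(cluster: int) -> int:
    """First LBA of a cluster, assuming clusters are contiguous from LBA 0."""
    return cluster * PAGES_PER_CLUSTER * LBAS_PER_PAGE


def page_to_lba(cluster: int, page_in_cluster: int) -> int:
    """LBA of a page, given its cluster and its index inside that cluster."""
    if not 0 <= page_in_cluster < PAGES_PER_CLUSTER:
        raise ValueError("page index outside the cluster")
    return cluster_first_lba(cluster) + page_in_cluster * LBAS_PER_PAGE


print(CLUSTER_SIZE)         # 1048576 bytes, i.e. 1 MiB per cluster
print(page_to_lba(0, 255))  # 255, the last LBA of Cluster 0
```

Under these assumptions a cluster is 1 MiB, so space is allocated in 1 MiB steps while I/O is addressed in 4 KiB pages.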

Page 16:

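Blobstore Design

Blob: array of pages implemented as an ordered list of clusters:

[Figure: a blob drawn as an ordered list with entries 0, 1, 2, 3, each pointing to a cluster on a device that spans LBA 0 to LBA N. The clusters labeled in the figure are 52, 87, 455, and 905; the page offsets covered are 0-255, 256-511, 512-767, and 768-1023.]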
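
A minimal sketch of the lookup this design implies: dividing a blob-relative page offset by the cluster size gives the list entry, and the remainder gives the page inside that cluster. The cluster numbers are taken from the figure, but their order in the list is an assumption made for illustration, as are the function name and the 256-page cluster size.

```python
PAGES_PER_CLUSTER = 256  # assumption: 256 pages per cluster


def locate_page(clusters, page_offset):
    """Map a blob-relative page offset to (cluster number, page within cluster)."""
    if page_offset < 0:
        raise IndexError("negative page offset")
    index, page_in_cluster = divmod(page_offset, PAGES_PER_CLUSTER)
    if index >= len(clusters):
        raise IndexError("page offset beyond the end of the blob")
    return clusters[index], page_in_cluster


# Hypothetical blob: list order is assumed, not read from the slide.
blob_clusters = [905, 52, 87, 455]

print(locate_page(blob_clusters, 254))  # (905, 254)
print(locate_page(blob_clusters, 256))  # (52, 0)
```

Because the list is ordered, the lookup is a constant-time index operation, and growing the blob only appends entries to the list.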

Page 17:

Blobstore Sample I/O

Blobs are read/written by specifying a relative page offset and a page count

[Figure: Blob Write (Offset 254, 6 pages) is split at the cluster boundary into two disk writes:

Page offsets 254 and 255 fall in list entry 0, Cluster 905: Disk Write(Offset 232583, 2 LBAs), covering LBA 232583 and LBA 232584

Page offsets 256 through 259 fall in list entry 1, Cluster 52: Disk Write(Offset 13312, 4 LBAs), covering LBA 13312 through LBA 13315]
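
The split shown on this slide can be sketched as a loop that walks the blob's cluster list and emits one disk write per cluster touched. This is an illustration under stated assumptions, not the Blobstore implementation: it assumes 256 pages per cluster, one LBA per page, and that cluster N starts at LBA N × 256. With those assumptions the second extent matches the slide's Disk Write(Offset 13312, 4 LBAs); the starting LBA of the first extent depends on where Cluster 905 begins on the device, and this simple model does not reproduce the 232583 shown on the slide.

```python
PAGES_PER_CLUSTER = 256  # assumption: 256 pages per cluster, one LBA per page


def split_write(clusters, page_offset, page_count):
    """Split a blob write into (start_lba, lba_count) extents, one per cluster."""
    extents = []
    while page_count > 0:
        index, page_in_cluster = divmod(page_offset, PAGES_PER_CLUSTER)
        # Pages that still fit in the current cluster.
        run = min(page_count, PAGES_PER_CLUSTER - page_in_cluster)
        # Assumed layout: cluster N begins at LBA N * PAGES_PER_CLUSTER.
        start_lba = clusters[index] * PAGES_PER_CLUSTER + page_in_cluster
        extents.append((start_lba, run))
        page_offset += run
        page_count -= run
    return extents


# The write from the slide: offset 254, 6 pages, clusters 905 then 52.
extents = split_write([905, 52], page_offset=254, page_count=6)
print(extents)
```

The two extents can be submitted to the device independently, which fits the slides' description of fully parallel operations.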

Page 18:

Blobstore Metadata

Metadata is stored in pages in a reserved region

Metadata pages are not shared between blobs

A blob may have multiple pages of metadata

[Figure: the metadata region on the SSD: Page 0 (Blob 1), Page 1 (Blob 2), Page 2 (Blob 3), Page 3 (Blob 1), Page 4 (Blob 4)]

Page 19:

Blobstore API

open, close, read, write, sync, resize

Asynchronous, callback-driven

Read/write in units of pages, space allocation in clusters

Data is direct

Metadata is cached

Minimal support for xattrs

Independent of BlobFS
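
To make "asynchronous, callback-driven" concrete, here is a toy in-memory model of that calling style. It is not the SPDK API: `ToyBlob`, its methods, and the `(status, data)` callback signature are all hypothetical. The deque stands in for commands in flight on the device, and `poll()` stands in for the application polling for completions.

```python
from collections import deque

PAGE_SIZE = 4096  # 4 KiB page, from the glossary slide


class ToyBlob:
    """Hypothetical page-addressed blob with callback-driven read/write."""

    def __init__(self, num_pages):
        self._pages = [bytes(PAGE_SIZE)] * num_pages
        self._in_flight = deque()

    def write(self, page_offset, data, callback):
        """Submit a page-aligned write and return immediately."""
        if len(data) % PAGE_SIZE:
            raise ValueError("writes are in units of whole pages")
        self._in_flight.append(("write", page_offset, data, callback))

    def read(self, page_offset, page_count, callback):
        """Submit a read of page_count pages and return immediately."""
        self._in_flight.append(("read", page_offset, page_count, callback))

    def poll(self):
        """Complete submitted operations and invoke their callbacks."""
        completed = 0
        while self._in_flight:
            op, offset, arg, callback = self._in_flight.popleft()
            if op == "write":
                for i in range(len(arg) // PAGE_SIZE):
                    self._pages[offset + i] = arg[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]
                callback(0, None)
            else:
                callback(0, b"".join(self._pages[offset:offset + arg]))
            completed += 1
        return completed


results = []
blob = ToyBlob(num_pages=4)
blob.write(1, b"\x01" * PAGE_SIZE, lambda status, _: results.append(("write", status)))
blob.read(1, 1, lambda status, data: results.append(("read", status, data[:1])))
blob.poll()  # callbacks run here, not inside write() or read()
print(results)
```

The caller never blocks in `write()` or `read()`; the result arrives later through the callback.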

Page 20:

BlobFS Design

Layered on Blobstore

User interacts with files

Data can be cached

Synchronous API*

* Asynchronous API possible

[Figure: Core 0 and Core 1 issue open(), write(), open(), and read() calls; an Async I/O Thread on Core 2 sits between them and the I/O Device]
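
One common way to offer a synchronous API on top of a single asynchronous I/O thread is to post each request to that thread and block the caller until it signals completion. The sketch below illustrates that general pattern with Python threads; it is an assumption about the shape of the design, not BlobFS code, and `IoThread`, `call`, and `do_write` are hypothetical names.

```python
import queue
import threading


class IoThread:
    """Runs every request on one dedicated thread; callers block until done."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown marker
                break
            fn, args, done, result = item
            try:
                result["value"] = fn(*args)
            except Exception as exc:  # hand the error back to the caller
                result["error"] = exc
            done.set()

    def call(self, fn, *args):
        """Synchronous wrapper: submit the request and wait for completion."""
        done = threading.Event()
        result = {}
        self._requests.put((fn, args, done, result))
        done.wait()
        if "error" in result:
            raise result["error"]
        return result["value"]

    def stop(self):
        self._requests.put(None)
        self._thread.join()


storage = {}  # stand-in for the I/O device


def do_write(name, data):
    storage[name] = storage.get(name, b"") + data
    return len(data)


io = IoThread()
written = io.call(do_write, "file_a", b"hello")  # blocks until the I/O thread finishes
ran_on_other_thread = io.call(threading.get_ident) != threading.get_ident()
io.stop()
print(written, ran_on_other_thread)
```

Because only the I/O thread touches the device state, the application threads need no locks around it; they only wait on their own completion event.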

Page 21:

BlobFS Caching

Not a general purpose page cache

Read ahead

Sequential write buffering

All other access bypasses cache

[Figure: Core 0 and Core 1 issuing open(), write(), and read() calls, an Async I/O Thread on Core 2, and the I/O Device; additional write(), write(), and read() calls are shown]
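
The read side of this policy can be sketched as follows: a read that continues where the previous one ended is served from a read-ahead buffer, and any other read goes straight to the device. This is a simplified illustration, not the BlobFS cache; the class name, the 4096-byte window, and the `device_reads` counter are assumptions, and sequential write buffering is left out.

```python
class ReadAheadFile:
    """Read-ahead for sequential reads only; other reads bypass the buffer."""

    def __init__(self, backing: bytes, window: int = 4096):
        self._backing = backing   # stand-in for data on the device
        self._window = window     # how much to fetch on a sequential miss
        self._next_offset = 0     # where a sequential reader would read next
        self._buf_start = 0
        self._buf = b""
        self.device_reads = 0     # counts simulated device accesses

    def _device_read(self, offset, length):
        self.device_reads += 1
        return self._backing[offset:offset + length]

    def read(self, offset, length):
        sequential = offset == self._next_offset
        self._next_offset = offset + length
        if not sequential:
            # Non-sequential access bypasses the cache entirely.
            return self._device_read(offset, length)
        buf_end = self._buf_start + len(self._buf)
        if not (self._buf_start <= offset and offset + length <= buf_end):
            # Sequential miss: fetch a whole window in one device read.
            self._buf_start = offset
            self._buf = self._device_read(offset, max(length, self._window))
        start = offset - self._buf_start
        return self._buf[start:start + length]


data = bytes(range(256)) * 64           # 16 KiB of sample data
f = ReadAheadFile(data, window=4096)
first = f.read(0, 1024)                 # sequential miss: one device read
second = f.read(1024, 1024)             # sequential hit: served from the buffer
jump = f.read(8192, 512)                # random read: bypasses the buffer
print(f.device_reads)
```

Keeping the cache this narrow avoids the bookkeeping of a general-purpose page cache while still covering sequential reads.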

Page 22:
Page 23:

Benchmark: db_bench Read/Write Latency

[Figure: two bar charts titled "Percentile Latency", each comparing Kernel and SPDK. First chart: 50th and 75th percentiles, y-axis Latency (ms) from 0 to 2. Second chart: 99th and 99.9th percentiles, y-axis Latency (ms) from 0 to 20.]

System Configuration: 2x Intel® Xeon® E5-2699v3, Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 8x 8GB DDR4 2133 MT/s, 1 DIMM per channel, Fedora* Linux 25, Linux kernel 4.10.8, Intel® P3700 NVMe SSD (800GB), FW 8DV101H0, SPDK 17.03, DPDK 17.02, RocksDB 5.1.2

Page 24:

Benchmark: db_bench Read/Write Throughput

[Figure: bar chart comparing Kernel and SPDK; y-axis Transactions Per Second from 0 to 30000]

System Configuration: 2x Intel® Xeon® E5-2699v3, Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 8x 8GB DDR4 2133 MT/s, 1 DIMM per channel, Fedora* Linux 25, Linux kernel 4.10.8, Intel® P3700 NVMe SSD (800GB), FW 8DV101H0, SPDK 17.03, DPDK 17.02, RocksDB 5.1.2

Page 25:
Page 26:

Next Steps

Major API clarifications

More & better benchmarking

Use blobstore as a dynamic partitioner (bdev)

BlobFS caching strategy is RocksDB-centric

Asynchronous BlobFS API

Sparse allocation of blobs

More open source application integration?

Page 27:
Page 28:
Page 29:

SPDK Blobstore Vs. Kernel: Latency

SPDK Blobstore reduces tail latency by 3.7X

[Figure: two bar charts titled "db_bench 99.99th Percentile Latency" (lower is better), each comparing Kernel (256KB sync) with Blobstore (20GB Cache + Readahead). First chart: Readwrite workload, y-axis Latency (µs) from 0 to 140000, annotated 372%. Second chart: Insert, Randread, and Overwrite workloads, y-axis Latency (µs) from 0 to 7000, annotated 21%, 44%, and 28%.]

System Configuration: 2x Intel® Xeon® E5-2699v3, Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 8x 8GB DDR4 2133 MT/s, 1 DIMM per channel, Fedora* Linux 25, Linux kernel 4.10.8, Intel® P3700 NVMe SSD (800GB), FW 8DV101H0, SPDK 17.03, DPDK 17.03, RocksDB 5.1.2

Page 30:

SPDK Blobstore Vs. Kernel: Transactions Per Second

[Figure: bar chart titled "db_bench Key Transactions" (higher is better) for the Insert, Randread, Overwrite, and Readwrite workloads; y-axis Keys per second from 0 to 1200000, annotated 85%, 8%, 4%, and ~0%.]

System Configuration: 2x Intel® Xeon® E5-2699v3, Intel® Speed Step enabled, Intel® Turbo Boost Technology enabled, 8x 8GB DDR4 2133 MT/s, 1 DIMM per channel, Fedora* Linux 25, Linux kernel 4.10.8, Intel® P3700 NVMe SSD (800GB), FW 8DV101H0, SPDK 17.03, DPDK 17.03, RocksDB 5.1.2