PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 ·...

42
Persistent Memory Performance Benchmarking & Comparison Eden Kim, Calypso Systems, Inc. John Kim, Mellanox Technologies, Inc.

Transcript of PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 ·...

Page 1: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

Persistent  Memory  Performance  Benchmarking  &  Comparison

Eden  Kim,  Calypso  Systems,  Inc.John  Kim,  Mellanox  Technologies,  Inc.

Page 2: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

PM  Benchmarking  &  Comparison

Part  1:  Test  Plan  &  Workloads  – Eden  Kim  of  Calypso

Part  2:    Direct  Attached  PM  &  NVMe  -­ Eden  Kim  of  Calypso

Part  3:    Remote  PM  &  NVMe-­oF  – John  Kim  of  Mellanox

Part  4:    Conclusions  – John  Kim  of  Mellanox  &  Eden  Kim  of  Calypso

Questions  &  Answers

2

Page 3: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

Part  1:  Test  Plan  &  Workloads

A. Test  Plan

B. Real  World  Workload  Source  Captures

C. Test  Workloads

3

Page 4: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

A:  Test  Plan  -­ Steps

4

Step Description Reference

Select & Define Workloads Synthetic SQL WorkloadReal World GPS Nav Portal

SNIA SSS PTS v2.0.1TestMyWorkload.com Demo #7SNIA RWSW PTS v1.0.7

Run Direct Attached TestsPM–NNVMe U.2–HNVMe U.2-I

SQL DIRTH - RND 8K RW65GPS Nav Auto DIRTH – 9 IO StreamReplay - Native & ScaledIndividual Streams – Steady State

Run Fabric/Remote TestsPM–NNVMe U.2–HNVMe U.2-I

RoCE on 100 Gb/s Ethernet

Page 5: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

A.  Test  Plan  -­ Sample  Pool

5

Sample Description Capacity

PM-N NVDIMM-N Single 16 GB NVDIMM-N Module

NVMe U.2-H U.2x4 SSD2.5” U.2x4SFF 86393200 GB

NVMe U.2-I U.2x4 SSD2.5” U.2x4SFF 86392000 GB

Page 6: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

A:  Test  Plan  -­ Platforms

6

Device Under Test Platform Description

PM–N - Direct Smart ModularSupermicro X11 DRI

Ubuntu; DDR4 16GB RAM; 24 Core 2.1 GHz CPU

NVMe U.2–H NVMe U.2-I

Calypso RTP 5Supermicro X10 DRI

CentOS 7x; DDR4 32GB RAM; Dual 24 Core Intel E5 2660 v4 2.0 GHz CPU; PCIe Gen 3

PM–N - Remote Smart ModularSupermicro X11 DRI

Ubuntu; DDR4 16GB RAM; 24 Core 2.1 GHz CPU

Fabric Mellanox100Gb/s

Linux Open Source RoCE ConnectX-4 VPI 100Gb/sConnectX-5 EN 100Gb/s

Test Software Calypso CTS CTS 6.5

Page 7: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

B:  Real  World  Source  Captures

7

Item Description Reference

SSSI Reference Captures Synthetic BenchmarksReal World Workloads

SNIA RWSW PTS v1.0.7TestMyWorkload.com

PM Summit 20182,000 Outlet Retail Web Portal Workload24 hr/5 min resolution Drive0 Block IOVaried Activity/QD – Coarser Grain

TestMyWorkload.com/demo_5

PM Summit 2019GPS Navigation Portal Workload30 min/1 sec resolution Drive0 Block IOHomogenous Activity – Finer Grain

TestMyWorkload.com/demo_7

Page 8: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

B.    PM  Summit  2018  – TestMyWorkload.com/demo_5

8

Retail  Web  Portal• Block  IO  Drive0

• Fewer  IO  Streams  (6)• 30.5%  RND  64K  W• 28.9%  SEQ  0.5K  W• 16.4%  RND  8K  W• 14.2%  SEQ  8K  R• 5.6%  SEQ    64K  R• 4.4%  RND  8K  W

• QD  (Users)  Higher  Range  1-­306

• Coarser  Grain  (5  min)• More  Activities  &  Spikes• Lower  IOPS  Range• Larger  Block  Sizes• Read  &  Write  IOs

12476 65

7 20

181146

30

184

31124

160147104

207142

186

306

48

174

300

65168108169

86

244159

13

0.001

0.01

0.1

1

10

100

1000

10000

0

20,000

40,000

60,000

80,000

100,000

120,000

140,000

160,000

180,000

0 100 200 301 401 501 601 702 804 906 1,007 1,108 1,208 1,308 1,409

IOPSIOs

Time  (Min)

Real  World  Retail  Web  Portal  -­‐ Block  IO:  6  IO  Streams  24h  5m4.4%:  RND  8K  W 5.6%:  SEQ  64K  R 14.2%:  SEQ  8K  R 16.4%:  RND  8K  R28.9%:  SEQ  0.5K  W 30.5%:  RND  64K  R QD  (Users) IOPS

QueueDepth

IOPS

IOStreams

2  am  Back  Up

Boot  Storm

MorningHours

DailyTransactions

ClosingHours

EarlyMorning

Page 9: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

B.    PM  Summit  2019  – TestMyWorkload.com/demo_7

9

GPS  Nav  Portal• Block  IO  Drive0

• More  IO  Streams  (9)• 28.0%  SEQ  4K  W• 19.9%  RND  16K  W• 16.2%  SEQ  16K  W• 15.7%  SEQ  0.5K  W• 6.5%  SEQ  1K  W• 5.1%  RND  4K  W• 3.0%  RND  1K  W• 2.9%  RND  28K  W• 2.8%  SEQ  1.5K  W

• QD  (Users)  Lower  Range  1-­14

• Finer  Grain  (1  sec)• Fewer  Activities• Higher  IOPS  Range• Smaller  Block  Sizes• All  Write  IOs

10

21

8

2

795

1

14

2

9

1

5

2

7

31

9

1

5 7

2

47

1

3

9

2

610

2

8

1

7

1

48

2

685

12

9469

1

5

2

757

1

6 9

1

42

7

1

59 9

1

11

2

96

13

8

1

5942

8

1

9

2575

1

0

1

10

100

1,000

1

10

100

1,000

10,000

100,000

1,000,000

0 2 3 5 7 8 10 12 13 15 17 18 20 22 23 25 27 28

IOPS

IOs

Time  (Min)

Real  World  GPS  NAV  SQL  -­‐ Block  IO:  9  IO  Stream  30m  1s2.8%:  SEQ  1.5K  W 2.9%:  RND  28K  W 3%:  RND  1K  W 5.1%:  RND  4K  W6.5%:  SEQ  1K  W 15.7%:  SEQ  0.5K  W 16.2%:  SEQ  16K  W 19.9%:  RND  16K  W28%:  SEQ  4K  W QD IOPS

QueueDepth

IOPS

IOStreams

Page 10: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      10

Source  Workload: GPS  Nav  Portal -­ TestMyWorkload.com/Demo_7

QueueDepth1  -­ 14

GPS  Nav  Portal  Source  Capture• TestMyWorkload.com  

Demo  No.  7

• Block  IO  – Drive0

• 30  min  Duration                                                      1  sec  Resolution

• 14  Processes                                    176  IO  Streams

• 9  IO  Streams                                                89%of  Total  IOs

• 2  Process  IDs  (PIDs):                                              mysqld.exe                    sqlservr.exe

• QD  Range:  1-­14

Page 11: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

C:  Test  Workloads

11

Test Test Flow Description Reference

Synthetic SQL – DIRTHFixed Synthetic Workload

Demand Intensity Response Time HistogramFixed 2 IO Stream Composite with a QD Range 1-1024Run the Fixed Workload to Steady StateApply Thread Count/Queue Depth Loop (T32Q32 – T1Q1)

SNIA SSS PTS 2.0.1Section 14 page 80www.snia.org/pts#

GPS Nav – Auto DIRTHFixed Real World Workload

GPS Nav Portal: 9 IO Stream – 30min / 1sec captureFixed 9 IO Stream Composite with a QD Range 1-1024Run the Fixed Workload to Steady StateApply Thread Count/Queue Depth Loop (T32Q32 – T1Q1)

TestMyWorkload.com Demo No. 7RWSW PTS 1.0.7, Section 10 page 63www.snia.org/rwsw#

Replay – NativeDynamic Real World Workload

Replay the Sequence of IO Stream Combinations & QDs fromthe Original GPS Nav Portal Workload CaptureThe GPS Nav Portal has a Native QD Range 1-14

RWSW PTS 1.0.7, Section 7 page 39www.snia.org/rwsw#

Replay – ScaledDynamic Real World Workload

Run the GPS Nav Replay Native IO Stream Combinations Scale the Native QDs by x2, x4, x8, x16, x32Native QD Range 1-14 is Multiplied by the Scaling Factor

RWSW PTS 1.0.7, Section 7 page 39www.snia.org/rwsw#

Individual StreamsFixed Real World Workload

Run the 9 IO Individual Streams of the GPS Nav Portal Workload to Steady State and Report

RWSW PTS 1.0.7, Section 9 page 54www.snia.org/rwsw#

Page 12: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

C.    Synthetic  SQL  – DIRTH  RND  8K  RW65

12

• DIRTH  – Demand  Intensity  Response  Time  Histogram

• 2  IO  Streams:  RND  8K  RW65

• Fixed  &  Constant  Synthetic  Workload

• 2  IO  Streams  

• 65%  RND  8K  R

• 35%  RND  8K  W

• Outstanding  IO  (OIO)  Loop      T32Q32  – T1Q1

• QD  (Users)  Range  1-­1024

1

10

100

1000

10000

100000

1000000

10000000

0

20,000,000

40,000,000

60,000,000

80,000,000

100,000,000

120,000,000

140,000,000

160,000,000

180,000,000

200,000,000

0 1,161 2,368 3,464 4,675 5,879 6,980 8,188 9,281 10,495 11,699 12,798 14,006

IOPS

IOs

Time  (Min)

Synthetic  SQL  Workload:  RND  8K  65:35  RW  -­‐ QD  1  to  102435%:  RND  8K  W 65%:  RND  8K  R QD    (Users) IOPS

Queue  Depth1  -­ 1024

IOPS

IOStreams

Page 13: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

C.  Real  World  SQL:  DIRTH  GPS  Nav.  Portal

13

• Auto  DIRTH  – 9  IO  Stream  GPS  Nav  Portal

• Fixed  &  Constant  Workload

• Same  9  IO  Streams  for  each  Test  Step• 22.2%  RND  16K  W• 20.2%  SEQ  4K  W• 15.5%  SEQ  0.5K  W• 14.2%  SEQ  16K  W• 12.9%  RND  4K  W• 5.9%  SEQ  1K  W• 3.7%  RND  1K  W• 2.7%  RND  0.5K  W• 2.7%  SEQ  1.5K  W

• Outstanding  IO  Loop  T32Q32  – T1Q1

• QD  (Users)  Range  1-­1024

102451225612864321684 2 1

102451225612864321684 21

10245122561286432168 421

1024512256128643 168 421

102451225612864321 8 4 2 1

1

10

100

1,000

10,000

100,000

0

1,000,000

2,000,000

3,000,000

4,000,000

5,000,000

6,000,000

1 21 41 61 81 101 121 141 161 181 202 222

IOPSIOs

Time  (Min)

Real  World  GPS  NAV  SQL  DIRTH  -­‐ DriveC  9  IO  Stream  30m  1s2.7%:  SEQ  1.5K  W 2.7%:  RND  0.5K  W 3.7%:  RND  1K  W 5.9%:  SEQ  1K  W

12.9%:  RND  4K  W 14.2%:  SEQ  16K  W 15.5%:  SEQ  0.5K  W 20.2%:  SEQ  4K  W

Queue  Depth1  -­ 1024

IOPS

IOStreams

Page 14: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

C.    Real  World  Replay  – GPS  Nav.  Portal

14

• Replay  IO  Streams  &          QD  for  each  Test  Step

• Dynamic  Sequence  of  IO  Streams  &  QD

• 9  IO  Streams• 28.0%  SEQ  4K  W• 19.9%  RND  16K  W• 16.2%  SEQ  16K  W• 15.7%  SEQ  0.5K  W• 6.5%  SEQ  1K  W• 5.1%  RND  4K  W• 3.0%  RND  1K  W• 2.9%  RND  28K  W• 2.8%  SEQ  1.5K  W

• QD  (Users)  Range  1-­14

• Replay  Original  Capture  IO  Streams  &  QD  for  each  Test  Step

10

21

8

2

7 9 5

1

14

2

9

1

5

2

7

31

9

1

5 7

2

47

1

3

9

2

610

2

8

1

7

1

48

2

6 85

1 2

946

9

1

5

2

7 5 7

1

6 9

1

4

2

7

1

59 9

1

11

2

9 6

13

8

1

5942

8

1

9

2

575

1

0

1

10

100

1,000

1

10

100

1,000

10,000

100,000

1,000,000

0 2 3 5 7 8 10 12 13 15 17 18 20 22 23 25 27 28

IOPSIOs

Time  (Min)

Real  World  GPS  NAV  SQL  -­‐ Block  IO:  9  IO  Stream  30m  1s2.8%:  SEQ  1.5K  W 2.9%:  RND  28K  W 3%:  RND  1K  W

QueueDepth

IOPS

IOStreams

Page 15: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      15

C.  Replay:  Replay  Sequence  of  IO  Streams  &  QDs

QueueDepths

IOPS

• Replay  Test  – Play  the  Same  Sequence  of  IO  Stream  &  QDs  from  the  Original  Capture

• Dynamically  changing  IO  &  QD  combinations

• Different  IO  Streams  for  each  test  step

• 1800  1s  Test  Steps                (30m  Dur  x  1s  Res)

IO Stream & QDCombinations

Page 16: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

Part  2:  Direct  Attached  Block  IO  Test

A. DIRTH:  Synthetic  SQL  v  Real  World  GPS  Nav

B. Real  World:  GPS  Nav  DIRTH  v  Replay  

C. Replay:  Scaled  QD  x1,  x2,  x4,  x8,  x16,  x32

16

Page 17: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

A.  DIRTH:  Synthetic  SQL  v Real  World  GPS  -­ IOPS

17

Syn  SQL  DIRTH  -­ RND  8K  RW65Fixed  Synthetic  -­ 2  IO  Streams

RND  8K  R            65.0%            65.0RND  8K  W          35.0%            35.0

Synthetic  SQL  DIRTHA0

GPS  Nav  Auto  DIRTH  –Fixed  Real  World  -­ 9  IO  Streams

Higher  is  Better

SEQ  4K  W                24.8%            28.0RND  16K  W          17.7%            19.9

GPS  Nav  Auto  DIRTHA0

SEQ  16K  W            14.4%            16.2SEQ  0.5K  W          13.9%            15.7SEQ  1K  W                      5.8%                6.5  RND  4K  W                    4.5%                5.1RND  1K  W                2.65%            2.98RND  28K  W            2.54%            2.86SEQ  1.5K  W            2.44%            2.75

Page 18: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

A.  DIRTH:  Syn.  SQL  – IOPS,  ART,  QoS  (5  9s  RT  – 99.999%)

18

Syn  SQL  DIRTH  – RND  8K  RW65Fixed  Synthetic  -­ 2  IO  Streams

QD  (Users)  Range  1-­1024

RND  8K  R            65.0%            65.0RND  8K  W          35.0%            35.0

Synthetic  SQL  DIRTHA0

ART

PM-­‐N

1,052,928

0.015  mS0.090  mS

IOPS

QoS

ART

U.2-­‐I

72,077

0.890  mS29.940  mS

U.2-­‐H

213,178

0.300  mS11.140  mS

IOPS

QoS

Page 19: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

B.  GPS  Nav:  Auto  DIRTH – IOPS,  ART,  QoS  (5  9s  RT  – 99.999%)

19

ART

PM-­‐N

715,393

0.011  mS0.100  mS

IOPS

QoS

ART

U.2-­‐I

20,832

1.540  mS28.450  mS

U.2-­‐H

135,589

0.230  mS5.550  mS

IOPS

QoS

SEQ  4K  W                24.8%            28.0RND  16K  W          17.7%            19.9

GPS  Nav  Auto  DIRTHA0

SEQ  16K  W            14.4%            16.2SEQ  0.5K  W          13.9%            15.7SEQ  1K  W                      5.8%                6.5  RND  4K  W                    4.5%                5.1RND  1K  W                2.65%            2.98RND  28K  W            2.54%            2.86SEQ  1.5K  W            2.44%            2.75

Page 20: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

B.  GPS  Nav:  Auto  DIRTH  v  Replay  -­ IOPS

20

GPS  Nav  Replay  – Replay  Native  &  Scaled

Replay  Native  QD  Range  1-­‐14Replay  Scaled  QD  x32

GPS  Nav  ReplayA0

GPS  Nav  Auto  DIRTH:  9  IO  Streams

Higher  is  Better

SEQ  4K  W                24.8%            28.0RND  16K  W          17.7%            19.9

GPS  Nav  Auto  DIRTHA0

SEQ  16K  W            14.4%            16.2SEQ  0.5K  W          13.9%            15.7SEQ  1K  W                      5.8%                6.5  RND  4K  W                    4.5%                5.1RND  1K  W                2.65%            2.98RND  28K  W            2.54%            2.86SEQ  1.5K  W            2.44%            2.75

Page 21: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

C.  Replay:  PM-­N – IOPS,  ART,  QoS  (5  9s  RT  – 99.999%)

21

GPS  Nav  Replay  – Scaled  QD

ART

Scaled  QD  x32

822,347

130  mS330  mS

Scaled  QD  x1

381,988

9.0  mS100  mS

IOPS

QoS

• Lowest  Response  Times  at  QDx1

• Highest  IOPS  Spike  at  QDx32

• Response  Times  Increase  Linearly  

Replay  Native  QD  Range  1-­‐14Replay  Scaled:  Native  QD  x2-­‐x32

GPS  Nav  ReplayA0

Page 22: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

C.  Replay:  NVMe  U.2-­H  – IOPS,  ART,  QoS  (5  9s  RT  – 99.999%)

22

ART

QD  x4

155,085

0.090  mS1.680  mS

QD  x1

125,598

0.025  mS0.670  mS

IOPS

QoS

GPS  Nav  Replay  – Scaled  QD

• Lowest  Response  Times  at  QDx1

• Highest  IOPS  at  QDx4,  Decrease  at  QDx8

• Response  Times  begin  to  Spike  at  QDx8

Replay  Native  QD  Range  1-­‐14Replay  Scaled:  Native  QD  x2-­‐x32

GPS  Nav  ReplayA0

Page 23: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

C.  Replay:  NVMe  U.2-­I  – IOPS,  ART,  QoS  (5  9s  RT  – 99.999%)

23

ART

QD  x2

9,934

1.24  mS15.21  mS

QD  x1

8,703

0.68  mS12.34  mS

IOPS

QoS

GPS  Nav  Replay  – Scaled  QD

• Lowest  Response  Times  at  QDx1

• Highest  IOPS  at  QDx2,  Decrease  at  QDx4  

• Response  Times  Increase  Linearly

Replay  Native  QD  Range  1-­‐14Replay  Scaled:  Native  QD  x2-­‐x32

GPS  Nav  ReplayA0

Page 24: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      24

D.  Ind  Streams: PM-­N  – 9  IO  StreamsGPS  Nav  – Individual  Streams9  IO  Streams  to  Steady  State  

• PM-­N  IOPS  are  Higher  than                at  smaller  transfer  sizes  

• Weighted  IO  Stream  Mix  determines  workload  performance

SEQ  4K  W                28.0%RND  16K  W          19.9%

GPS  Nav  Ind  StreamsA0

SEQ  16K  W            16.2%SEQ  0.5K  W          15.7%

SEQ  1K  W                      6.5%

RND  4K  W                    5.1%

RND  1K  W                2.98%

RND  28K  W            2.86%

SEQ  1.5K  W            2.75%

Higher  is  Better

Page 25: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      25

D.  Ind  Streams: U.2-­H  v  U.2-­I  – 9  IO  Streams

• NVMe  U.2-­H  IOPS  are  Higher  than  NVMe  U.2-­I    IOPS  at  all  transfer  sizes

Higher  is  Better

SEQ  4K  W                28.0%RND  16K  W          19.9%

GPS  Nav  Ind  StreamsA0

SEQ  16K  W            16.2%SEQ  0.5K  W          15.7%

SEQ  1K  W                      6.5%

RND  4K  W                    5.1%

RND  1K  W                2.98%

RND  28K  W            2.86%

SEQ  1.5K  W            2.75%

GPS  Nav  – Individual  Streams9  IO  Streams  to  Steady  State  

Page 26: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

Part  3:  Direct  v  Fabric

1. Direct  Attached  v  Remote  Fabric

2. Direct  PM  v  Remote  PM  A. Synthetic  SQL  DIRTHB. Real  World  Auto  DIRTHC. Individual  Streams  WSAT

3. Direct  NVMe  v  NVMe-­oF  A. Synthetic  SQL  DIRTHB. Real  World  Auto  DIRTHC. Individual  Streams  WSAT

26

Page 27: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      27

1.  Direct  v  Fabric: Architectures  – DAS  v  NMVe-­oF

Target  server

Initiator  server

Calypso  CTS  6.5  

NVMe  transport

NVMe-­oF  initiator

Linux  block  device100GbE  NIC

100GbE  NIC

NVDIMM-­N  as  storage

NVMe-­oF  targetNVMe  driver

Fabric  enables  remote  accessThis  test  used  RDMA/RoCE  on  100GbEAlso  can  use  FC,  IB,  TCP/IP

Fabric  always  adds  latencyBut  not  always  noticeableEffect  depends  on  workload  and  media  latencyRDMA  improves  network  efficiency,  reduces  fabric  latency

Page 28: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      28

A.  PM  v  RPM: Syn.  DIRTH  – IOPS,  ART,  QoS  (5  9s  RT  – 99.999%)Syn  SQL  DIRTH  – Fixed  RND  8K  RW65QD  (Users)  Range  1-­1024

RND  8K  R            65.0%            65.0RND  8K  W          35.0%            35.0

Synthetic  SQL  DIRTHA0

IOPSARTQoS

Remote1,036,8340.060  mS0.200  mS

Direct  1,052,9280.015  mS0.090  mS

• PM-­N  IOPS  are  Higher  than                    RPM-­N  IOPS

• PM-­N  RTs  are  Lower  than                    RPM-­N  RTs

Page 29: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      29

B.  PM  v  RPM:  GPS  Nav.  DIRTH  – IOPS,  ART,  QoS  (5  9s  RT  – 99.999%)GPS  Nav  Auto  DIRTH  – Fixed  9  IO  Streams

SEQ  4K  W                24.8%            28.0RND  16K  W          17.7%            19.9

GPS  Nav  Auto  DIRTHA0

SEQ  16K  W            14.4%            16.2SEQ  0.5K  W          13.9%            15.7SEQ  1K  W                      5.8%                6.5  RND  4K  W                    4.5%                5.1RND  1K  W                2.65%            2.98RND  28K  W            2.54%            2.86SEQ  1.5K  W            2.44%            2.75

IOPSARTQoS

Remote795,4380.080  mS0.400  mS

Direct  715,3930.011  mS0.100  mS

• PM-­N  IOPS  are  Lower  than  RPM-­N  

• PM-­N  RTs  are  Lower  than  RPM-­N

Page 30: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      30

C.  PM  v  RPM:  Individual  Streams  -­ IOPSGPS  Nav  – Individual  Streams9  IO  Streams  to  Steady  State  

• RPM-­N  IOPS  are  Higher  than                  PM-­N  IOPS  at  transfer  sizes  <  4K

• Weighted  IO  Stream  Mix  determines  workload  performance

SEQ  4K  W                28.0%RND  16K  W          19.9%

GPS  Nav  Ind  StreamsA0

SEQ  16K  W            16.2%SEQ  0.5K  W          15.7%SEQ  1K  W                      6.5%RND  4K  W                    5.1%RND  1K  W                2.98%RND  28K  W            2.86%SEQ  1.5K  W            2.75%

Higher  is  Better

Page 31: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      31

C.  PM  v  RPM: Individual  Streams  – Average  Response  TimesGPS  Nav  – Individual  Streams9  IO  Streams  to  Steady  State  

• RPM-­N  RTs  are  Lower  than                      PM-­N  RTs  at  transfer  sizes  <  4K,  the  same  at  4K  and  higher  >  4K

• Weighted  IO  Stream  Mix  determines  workload  performance

SEQ  4K  W                28.0%RND  16K  W          19.9%

GPS  Nav  Ind  StreamsA0

SEQ  16K  W            16.2%SEQ  0.5K  W          15.7%SEQ  1K  W                      6.5%RND  4K  W                    5.1%RND  1K  W                2.98%RND  28K  W            2.86%SEQ  1.5K  W            2.75%

Lower  is  Better

Page 32: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      32

A.  NVMe  v  NVMe-­oF  – U.2-­H:  Synthetic  DIRTHSyn  SQL  DIRTH  – Fixed  RND  8K  RW65QD  (Users)  Range  1-­1024

RND  8K  R            65.0%            65.0RND  8K  W          35.0%            35.0

Synthetic  SQL  DIRTHA0

IOPSARTQoS

NVMe-­‐oF290,0500.44  mS12.46  mS

NVMe213,1780.30  mS11.14  mS

• NVMe  U.2-­H  IOPS  are  Lower  than  NVMe-­oF  U.2-­H

• NVMe  U.2-­H  RTs  are  Lower  than  NVMe-­oF  U.2-­H

Page 33: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      33

A.  NVMe  v  NVMe-­oF  – U.2-­I:  Synthetic  DIRTHSyn  SQL  DIRTH  – Fixed  RND  8K  RW65QD  (Users)  Range  1-­1024

RND  8K  R            65.0%            65.0RND  8K  W          35.0%            35.0

Synthetic  SQL  DIRTHA0

IOPSARTQoS

NVMe-­‐oF97,5092.62  mS49.22  mS

NVMe72,0770.89  mS25.94  mS

• NVMe  U.2-­I  IOPS  are  Lower  than  NVMe-­oF  U.2-­I

• NVMe  U.2-­I  RTs  are  Lower  than  NVMe-­oF  U.2-­I

Page 34: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      34

B. NVMe  v  NVMe-­oF  – U.2-­H:  Auto  DIRTHGPS  Nav  Auto  DIRTH  – Fixed  9  IO  Streams

SEQ  4K  W                24.8%            28.0RND  16K  W          17.7%            19.9

GPS  Nav  Auto  DIRTHA0

SEQ  16K  W            14.4%            16.2SEQ  0.5K  W          13.9%            15.7SEQ  1K  W                      5.8%                6.5  RND  4K  W                    4.5%                5.1RND  1K  W                2.65%            2.98RND  28K  W            2.54%            2.86SEQ  1.5K  W            2.44%            2.75

IOPSARTQoS

NVMe-­‐oF125,0520.26  mS7.25  mS

NVMe  136,5890.23  mS5.55  mS

• NVMe  U.2-­H  IOPS  are  Higher  than  NVMe-­oF  U.2-­H  

• NVMe  U.2-­H RTs  are  Lower  than  NVMe-­oF  U.2-­H  

Page 35: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      35

B. NVMe  v  NVMe-­oF  – U.2-­I:  Auto  DIRTHGPS  Nav  Auto  DIRTH  – Fixed  9  IO  Streams

SEQ  4K  W                24.8%            28.0RND  16K  W          17.7%            19.9

GPS  Nav  Auto  DIRTHA0

SEQ  16K  W            14.4%            16.2SEQ  0.5K  W          13.9%            15.7SEQ  1K  W                      5.8%                6.5  RND  4K  W                    4.5%                5.1RND  1K  W                2.65%            2.98RND  28K  W            2.54%            2.86SEQ  1.5K  W            2.44%            2.75

IOPSARTQoS

NVMe-­‐oF20,4841.56  mS28.00  mS

NVMe  20,8321.54  mS28.45  mS

• NVMe  U.2-­I  IOPS  are  substantially  the  same  as  NVMe-­oF  U.2-­I  

• NVMe  U.2-­I  RTs  are  substantially  the  same  NVMe-­oF  U.2-­I  RTs

Page 36: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      36

C. NVMe  v  NVMe-­oF  – U.2-­H:  Ind.  Streams  -­ IOPS

Higher  is  Better

GPS  Nav  – Individual  Streams9  IO  Streams  to  Steady  State  

• NVMe  U.2-­H  IOPS  are  Higher  than  NVMe-­oF  U.2-­H  at  most  transfer  sizes  

• NVMe-­oF  U.2-­H  SEQ  4K  W  IOPS  are  Higher  than  NVMe  U.2-­H

SEQ  4K  W                28.0%RND  16K  W          19.9%

GPS  Nav  Ind  StreamsA0

SEQ  16K  W            16.2%SEQ  0.5K  W          15.7%SEQ  1K  W                      6.5%RND  4K  W                    5.1%RND  1K  W                2.98%RND  28K  W            2.86%SEQ  1.5K  W            2.75%

Page 37: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      37

C. NVMe  v  NVMe-­oF  – U.2-­H:  Ind.  Streams  -­ ART

Lower  is  Better

GPS  Nav  – Individual  Streams9  IO  Streams  to  Steady  State  

• NVMe  U.2-­H  ART  generally    are  Higher  than  NVMe-­oF  U.2-­H  at  larger  transfer  sizes  

• NVMe  U.2-­H  ART  generally    are  Lower  than  NVMe-­oF  U.2-­H  at  smaller  transfer  sizes  

SEQ  4K  W                28.0%RND  16K  W          19.9%

GPS  Nav  Ind  StreamsA0

SEQ  16K  W            16.2%SEQ  0.5K  W          15.7%SEQ  1K  W                      6.5%RND  4K  W                    5.1%RND  1K  W                2.98%RND  28K  W            2.86%SEQ  1.5K  W            2.75%

Page 38: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      38

C. NVMe  v  NVMe-­oF  – U.2-­I:  Ind.  Streams  -­ IOPS

Higher  is  Better

GPS  Nav  – Individual  Streams9  IO  Streams  to  Steady  State  

• NVMe  U.2-­I  IOPS  are  substantially  similar  to    NVMe-­oF  U.2-­I  IOPS

SEQ  4K  W                28.0%RND  16K  W          19.9%

GPS  Nav  Ind  StreamsA0

SEQ  16K  W            16.2%SEQ  0.5K  W          15.7%SEQ  1K  W                      6.5%RND  4K  W                    5.1%RND  1K  W                2.98%RND  28K  W            2.86%SEQ  1.5K  W            2.75%

Page 39: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      39

C. NVMe  v  NVMe-­oF  – U.2-­I:  Ind.  Streams  -­ ART

Lower  is  Better

GPS  Nav  – Individual  Streams9  IO  Streams  to  Steady  State  

• NVMe  U.2-­I  ART  are  substantially  similar  to  NVMe-­oF  U.2-­I  ART

• ART  are  Very  High  for  SEQ  0.5KW                            and  SEQ  1K  W  for  NVMe  U.2-­I  and  NVMe-­oF  U.2-­I

SEQ  4K  W                28.0%RND  16K  W          19.9%

GPS  Nav  Ind  StreamsA0

SEQ  16K  W            16.2%SEQ  0.5K  W          15.7%SEQ  1K  W                      6.5%RND  4K  W                    5.1%RND  1K  W                2.98%RND  28K  W            2.86%SEQ  1.5K  W            2.75%

Page 40: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

Conclusions

40

A. PM  -­ Faster  IOPS  &  lower  RTs  by  order(s)  magnitude  over  NVMe

B. Synthetic  SQL  – Higher  Demand  Intensity,  65%  R  workload  =  Higher  performance

C. Real  World  Replay  -­ Lower  Demand  Intensity  &  Write  workload  =  Lower  performance

D. Scaled  Replay  – Increases  IOPS  and  Response  Times

E. Fabric  – Adds  Response  Time  Overhead

F. NVMe-­oF – Slower  U.2  shows  Equivalent  Performance  for  Direct  and  Fabric  NVMe  Storage

G. Performance  – Affected  by  Workload,  Demand  Intensity,  Storage  Type  and  Inter  connect

Page 41: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      

Questions?

41

[email protected]

[email protected]

Page 42: PM Performance Benchmarking & Comparison - PM v NVMe FINAL · 2019-12-21 · ©2019SNIA(Persistent(MemorySummit.(All(RightsReserved. B:(Real(World(Source(Captures 7 Item Description

©  2019  SNIA  Persistent  Memory  Summit.  All  Rights  Reserved.      42