Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin...

19
Coherent Storage: the Brave New World of Non-Volatile Main Memory ©2016 Western Digital Corporation or affiliates. All rights reserved. Dejan Vucinic July 12 th , 2016

Transcript of Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin...

Page 1: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Coherent Storage: the Brave NewWorld of Non-Volatile Main Memory

©2016 Western Digital Corporation or affiliates. All rights reserved.

Dejan VucinicJuly 12th, 2016

Page 2: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

©2016 Western Digital Corporation or affiliates. All rights reserved.

Credits

• Zvonimir Bandic

• Kiran Gunnam

• Martin Lueker-Boden

• Luis Vittorio Cargnini

• Qingbo Wang

• Damien Le Moal

• Cyril Guyot

• Md Kamruzzaman

• Chao Sun

• Minghai Qin

• Luiz Franca-Neto

• Seung-Hwan Song

• Filip Blagojevic

• Robert Mateescu

Page 3: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Emerging Resistive Non-Volatile Memories

©2016 Western Digital Corporation or affiliates. All rights reserved. 3

From H.-S. P. Wong et al. Proc IEEE 2010

ReRAM

PCM

From Akinaga, Shima, Proc IEEE 2010

Page 4: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

©2016 Western Digital Corporation or affiliates. All rights reserved.

Read Latency of Emerging Resistive NVMs

DRAM PCM ReRAM NAND HDSRAM

Page 5: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Where can we attach eNVMs?

Is it memory?

©2016 Western Digital Corporation or affiliates. All rights reserved. 5

Is it storage?

• Doesn't work today• Major changes required toDDR protocol, controller IP

• Works well today, but meh

• Latency of fast SSD dominatedby PCIe latency—lost mainadvantage of resistive NVM

Page 6: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

DDR

Once upon a time...

©2016 Western Digital Corporation or affiliates. All rights reserved. 6

CPU die

CPUfr

ont

side

bus

DRAMDRAM

DRAM

PCI

Northbridge

SouthbridgeSCSIHBA

Page 7: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

DDR

99.3% of servers today look like this

©2016 Western Digital Corporation or affiliates. All rights reserved. 7

CPU die

CPU

CPU

cohe

renc

e bu

s DRAMDRAM

DRAM

CPU

Memory ctrl

SSD

NANDNANDNAND

PCIePeripheral ctrl HBA SAS, SATA

Page 8: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

PCIe

DDR

DRAM controller keeps coherence state

©2016 Western Digital Corporation or affiliates. All rights reserved. 8

CPU die

CPU

CPU

CSco

here

nce

bus DRAM

DRAMDRAM

CPU

Log Phy

SSD

Phy Phy TL Enc BCHNANDNANDNAND

WL

H

LogRoot

Page 9: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

DDR

Don't do this

©2016 Western Digital Corporation or affiliates. All rights reserved. 9

CPU die

CPU

CPU

CSco

here

nce

bus

CPU

Log PhyTL Enc BCH

WL

NVMNVMNVM

Page 10: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Let's do this instead!

©2016 Western Digital Corporation or affiliates. All rights reserved. 10

CPU die

CPU

CPU

cohe

renc

e bu

s

CPU

Phy

NVM

CS TL Enc BCH

WL

PhyNVMNVMNVM

Page 11: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

©2016 Western Digital Corporation or affiliates. All rights reserved.

Why?

How?

Page 12: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Response latency jitter

• ECC: Hamming, BCH, LDPC?– BER too high for Hamming

– LDPC not needed/too slow

– BCH is well suited but variable latency

– Code should be chosen for a particular NVM, don'ttry and put a universal engine on the CPU

• Other causes of variability in response times– Write/read asymmetry delays reads

– Macroevents: overheating, wear leveling

• It is not cost-effective to architect resistiveNVMs with deterministic latency

– DDR/DIMM was not designed for jitterymemory; coherence protocol was!

©2016 Western Digital Corporation or affiliates. All rights reserved. 12

late

ncy

Increasing BER Decreasin

g fracti

on of pack

ets

Page 13: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Wear leveling, data protection at rest

• Flash-like translation layer is too heavy, probably not needed– e.g. GB table for TB of memory

• Start-gap schemes are lightweight, but vulnerable to malicious code– May not be adequate for some types of resistive NVMs

– Are you sure your scheme has no vulnerabilities?

• Aging controller– We have devised (and patented) translation schemes that are very fast, but have high up-front computational cost

– One cost effective solution is to store pre-computed vectors as fuses in the controller

• Encryption of non-volatile working set– Scrubbing is not adequate, don't trust the programmer; interaction with wear leveling

• Hot-pluggable?

Controller belongs with non-volatile media!

©2016 Western Digital Corporation or affiliates. All rights reserved. 13

Page 14: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

©2016 Western Digital Corporation or affiliates. All rights reserved.

Why?

How?

Page 15: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Coherent storage controller in reconfigurable logic

©2016 Western Digital Corporation or affiliates. All rights reserved. 15

Zynq FPGA

FPGA fabric

CPU subsystem

A9

A9

snoopco

here

nce

bus

DDRor

SERDESCSC IP

?

NVM

ACP

Page 16: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

©2016 Western Digital Corporation or affiliates. All rights reserved. 16

Coherent storage controller in reconfigurable logic

• Comparable latency, but– CPU is 5-8x slower– Coherence bus is 12x slower– Cost is 2-50x lower

Page 17: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Coherent scale-out through external fabric

©2016 Western Digital Corporation or affiliates. All rights reserved. 17

Page 18: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

Risc-V Shopping List*

• Hardware Coherence: Yes, please!– e.g. 300 Gib/s 40 ns on die, chip to chip

• Fast, wide ports for peripherals to join the coherence domain– e.g. opening into programmable logic, or scalable fabric (RapidIO?)– Unique advantage of the Risc-V ecosystem over competition

• Relinquish the non-volatile memory controller for now– Competing technologies make attempts at universal solution

risky over the next decade

• Get used to high variability in main memory response time– Hyperthreads are bad memories' best friend

©2016 Western Digital Corporation or affiliates. All rights reserved. 18

* for the enterprise

Page 19: Coherent Storage: the Brave New World of Non …...• Md Kamruzzaman • Chao Sun • Minghai Qin • Luiz Franca-Neto • Seung-Hwan Song • Filip Blagojevic • Robert Mateescu

©2016 Western Digital Corporation or affiliates. All rights reserved. 19