DDR4 Memory Compliance Testing - memcon.com

Page 1

FuturePlus Systems Corporation

15 Constitution Drive

Bedford NH 03110 USA

Barbara P. Aichinger, Vice President, New Business Development

DDR4 Memory Compliance Testing

Page 2

Agenda

• DDR Memory Standards for Compliance Testing

• Memory problems continue to plague the industry

– Recent Published Papers

– Row Hammer Failures

– Security Issues

• The concept of an Audit for Compliance Testing

– Electrical

– Protocol

– Row Hammer

– SPD/MRS

– Performance/Margin

• Summary

Page 3

Compliance Testing Documents

• Not yet…getting closer…

• FuturePlus Systems is sponsoring a Protocol Checks Document

– Task Group has several industry members and several T&M vendors

– Several ballots have been passed and a document is expected in 2017

Page 4

Memory Errors continue to plague the industry

Page 5

Memory Errors in Modern Systems

Page 6

This is called Thresholding

Page 7

Average ~2%

Page 8

Page 9

Errors in Facebook’s Fleet of Servers

Page 10

If FB has 100K Servers

• ~2% have a memory failure every month

• Of those, 46% result in a DIMM swap

• Doing the math: 2% of 100K is 2000

• 46% of 2000 = 920 DIMM swaps a month!

• 30 days a month, 24 hours a day = 720 hours in a month

At roughly 1.3 DIMM swaps per hour, Facebook is swapping out DIMMs every hour of every day of every month all year long!
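
The arithmetic on this slide can be reproduced in a few lines. This is only a sketch of the slide's own estimate: the 100K fleet size is the slide's hypothetical, and the variable names are chosen here for illustration.

```python
# Inputs as stated on the slide (the fleet size is the slide's hypothetical).
servers = 100_000
monthly_failure_rate = 0.02    # ~2% of servers have a memory failure each month
dimm_swap_fraction = 0.46      # 46% of those failures lead to a DIMM swap
hours_per_month = 30 * 24      # 720 hours

failing_servers = servers * monthly_failure_rate
dimm_swaps = failing_servers * dimm_swap_fraction
swaps_per_hour = dimm_swaps / hours_per_month

print(f"{failing_servers:.0f} failing servers per month")
print(f"{dimm_swaps:.0f} DIMM swaps per month")
print(f"{swaps_per_hour:.2f} DIMM swaps per hour")
```

With these inputs the rate comes out above one swap per hour, which is the basis for the closing claim.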

Page 11

An Update on Row Hammer Failures

• Seen on DDR4

– Passmark Blog

• Several reports of DDR4 failing the Row Hammer test

– ThirdIO paper

• http://www.thirdio.com/rowhammer.pdf

– Usenix

– Blackhat

– SGI seeing DDR4 RH failures in HPC

Page 12

Row Hammer

A quick review!

[Figure: DRAM array of 0 and 1 cells. Labels: Activate Command, Columns, Rows (pages), Victim Row]
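
As a rough illustration of the review above, the sketch below counts ACTIVATE commands per row within one refresh window and reports the rows adjacent to any heavily activated row as potential victims. This is a toy model, not a test method from the presentation: the function name, the trace format, and the threshold value are all hypothetical, and real disturbance thresholds depend on the device.

```python
from collections import Counter

def find_victim_rows(activations, threshold, num_rows):
    """Return rows adjacent to any row activated at least `threshold` times.

    `activations` holds one row number per ACTIVATE command observed within
    a single refresh window. `threshold` is a hypothetical limit chosen for
    illustration, not a JEDEC value.
    """
    counts = Counter(activations)
    victims = set()
    for row, count in counts.items():
        if count >= threshold:
            for neighbor in (row - 1, row + 1):
                if 0 <= neighbor < num_rows:
                    victims.add(neighbor)
    return sorted(victims)

# Rows 5 and 7 are hammered; row 2 is accessed only occasionally.
trace = [5] * 1000 + [7] * 1000 + [2] * 10
print(find_victim_rows(trace, threshold=500, num_rows=16))  # [4, 6, 8]
```

Row 6 sits between both hammered rows, which is the double-sided case commonly described for Row Hammer.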

Page 13

USENIX Security Symposium

August 2016

Page 14

ECC will not save you!

Page 15

Row Hammer Failures on DDR4

https://www.sgi.com/pdfs/4567.pdf

Page 16

Introducing: The concept of an AUDIT for JEDEC Compliance Testing

• Not a repeat of a Design Verification

• A check to make sure the JEDEC specification is being met

Page 17

For the System and DIMM

• Audit the signal integrity of the memory channel

• Monitor the system for Protocol Violations

– BIOS programming errors

– SPD programmed incorrectly

– Memory Controller Issues

• SPD Check

• Row Hammer Testing

• Performance/Margin Testing

Page 18

Using a Scan from a Logic Analyzer instead of a Scope

• Allows for an easy and quick check of:

– Signal Alignment

– Relative Data Valid Eye

– Signal Swing

Page 19

To see all signals at once, a slot interposer is used

Page 20

DIMM Slot Interposer

Allows the system to operate up to 4200MT/s and run any application

Page 21

Audit: Signal Swing

Slide Courtesy of

Overdriving DDR4 DRAM to 1.4V could cause damage.

Potential ODT setting issue. Threshold of first bit in burst has less swing than remainder of burst. Could also be ISI (inter-symbol interference).

Page 22

Audit: Signal Alignment

For READS the Strobe is edge aligned to the Data

For WRITES the Strobe is center aligned to the Data

Page 23

Signal Alignment

All the Data signals in a Byte should be aligned

Page 24

Relative Data Eye

DQ Write Eye overlay on Byte 5, 5000 cycles (2400MT/s)

Eye threshold: centered at 790 mV to 838 mV

Eye size: avg. of 272 mV x 205 ps

Observations: All eyes are consistent in size and alignment.
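
To put the measured eye in context, the eye width can be expressed as a fraction of the unit interval (one bit time). The sketch below uses only the figures quoted on this slide; the variable names are illustrative, and no pass/fail limit is implied since the slide states none.

```python
# Values quoted on the slide.
data_rate_mtps = 2400     # mega-transfers per second
eye_width_ps = 205        # average eye width
eye_height_mv = 272       # average eye height

# One unit interval (UI) is the time for a single transfer.
# 1 / (2400e6 transfers/s) expressed in picoseconds:
unit_interval_ps = 1e6 / data_rate_mtps

# Eye width as a fraction of the bit time.
eye_width_ui = eye_width_ps / unit_interval_ps

print(f"Unit interval: {unit_interval_ps:.2f} ps")
print(f"Eye width: {eye_width_ui:.2f} UI")
```

At 2400MT/s the unit interval is about 417 ps, so a 205 ps eye is open for roughly half of each bit time.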

Page 25

Address Signals

Page 26

Easy to check even at higher speeds

3200MT/s

Read data with Strobe

Write data with Strobe

Page 27

Next: Check for JEDEC Protocol Violations by the memory controller

• The DDR4 JEDEC spec contains rules on event ordering

• Examples

– Do not ACTIVATE a bank that is already open

– Do not PRECHARGE a bank that is already closed

– Do not RD/WR a page that is not open
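
The three example rules above amount to tracking, per bank, whether a page is currently open. A minimal sketch of such a checker follows, assuming a simplified trace of `(command, bank)` tuples. The trace format, function name, and messages are hypothetical, and a real protocol checker covers far more commands and states than this.

```python
def check_bank_protocol(commands):
    """Check a list of (command, bank) tuples against the three example rules.

    Returns a list of (index, message) tuples, one per violation.
    """
    open_banks = set()   # banks that currently have an open page
    violations = []
    for index, (cmd, bank) in enumerate(commands):
        if cmd == "ACT":
            if bank in open_banks:
                violations.append((index, "ACTIVATE to a bank that is already open"))
            open_banks.add(bank)
        elif cmd == "PRE":
            if bank not in open_banks:
                violations.append((index, "PRECHARGE to a bank that is already closed"))
            open_banks.discard(bank)
        elif cmd in ("RD", "WR"):
            if bank not in open_banks:
                violations.append((index, f"{cmd} to a bank with no open page"))
    return violations

trace = [("ACT", 0), ("RD", 0), ("ACT", 0), ("PRE", 0), ("PRE", 0), ("WR", 1)]
for index, message in check_bank_protocol(trace):
    print(index, message)
```

In this trace the commands at indices 2, 4, and 5 each break one of the three rules.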

Page 28

Memory Controller Timing Violations

• Clock edge boundary

– Commands cannot be too close together or too far apart

– Examples

• tREFI - Average refresh interval

• tRC - ACT to ACT or REF

• tMOD - MRS to PDE

• tCCD_L - RD to RD within the same Bank Group
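
Both kinds of timing rule, "too close together" and "too far apart", can be sketched against a timestamped command trace. The example below checks a minimum ACT-to-ACT spacing on the same bank (the ACT-to-ACT case of tRC) and computes the average refresh interval for comparison against tREFI. The trace format, function names, and the limit values passed in are illustrative assumptions; actual limits come from the JEDEC specification for the device's speed grade and density.

```python
def check_trc(events, trc_min_ns):
    """Flag ACT commands that follow a previous ACT to the same bank too soon.

    `events` is a time-ordered list of (time_ns, command, bank) tuples.
    Returns a list of (time_ns, bank) for each violating ACT.
    """
    last_act = {}        # bank -> time of the most recent ACT
    violations = []
    for time_ns, cmd, bank in events:
        if cmd == "ACT":
            if bank in last_act and time_ns - last_act[bank] < trc_min_ns:
                violations.append((time_ns, bank))
            last_act[bank] = time_ns
    return violations

def average_refresh_interval(events):
    """Return the mean spacing between REF commands, or None if fewer than two."""
    ref_times = [t for t, cmd, _ in events if cmd == "REF"]
    if len(ref_times) < 2:
        return None
    return (ref_times[-1] - ref_times[0]) / (len(ref_times) - 1)

events = [
    (0.0, "REF", None),
    (10.0, "ACT", 0),
    (40.0, "ACT", 0),      # only 30 ns after the previous ACT to bank 0
    (100.0, "ACT", 0),
    (7800.0, "REF", None),
    (16000.0, "REF", None),
]

TRC_MIN_NS = 45.0      # illustrative value, not taken from the slides
TREFI_MAX_NS = 7800.0  # illustrative value, not taken from the slides

print(check_trc(events, TRC_MIN_NS))
avg = average_refresh_interval(events)
print(avg, "too far apart" if avg > TREFI_MAX_NS else "ok")
```

The first check catches commands that are too close together, the second catches refreshes that on average are too far apart.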

Page 29

65 violations identified with over 1000 simultaneous checks

Page 30

Protocol and Timing Compliance ‘in the wild’

Page 31

JEDEC Specification Violation

Page 32

The SPD has to be checked!

Serial Presence Detect Device

Mistakes in the SPD can lead to the BIOS not programming the Memory Controller correctly
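
One basic SPD check is verifying the stored checksum before trusting any field. The sketch below is not the presenter's tool. It assumes the DDR4 SPD layout in which a CRC-16 (polynomial 0x1021, initial value 0) over bytes 0 to 125 is stored with its least significant byte at offset 126 and most significant byte at offset 127; confirm those offsets against the JEDEC SPD specification before relying on them. The function names are hypothetical.

```python
def spd_crc16(data: bytes) -> int:
    """CRC-16 with polynomial 0x1021 and initial value 0, processed MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def base_section_crc_ok(spd: bytes) -> bool:
    """Compare the computed CRC of bytes 0-125 with the value stored at 126-127.

    The byte offsets are an assumption about the DDR4 SPD layout.
    """
    stored = spd[126] | (spd[127] << 8)
    return spd_crc16(spd[:126]) == stored

# Build a synthetic 128-byte image with a correct CRC, then corrupt one byte.
base = bytes(range(126))
crc = spd_crc16(base)
good_image = base + bytes([crc & 0xFF, crc >> 8])
bad_image = bytes([good_image[0] ^ 0x01]) + good_image[1:]

print(base_section_crc_ok(good_image))  # True
print(base_section_crc_ok(bad_image))   # False
```

A passing CRC only shows the contents are intact; it does not show that the timing and configuration fields are correct for the module, which still requires a field-by-field check.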

Page 33

Mode Register Settings

Page 34

Performance Metrics

Not necessary for JEDEC compliance but nice to know!

• Which power management features are implemented

– Is the clock stopped in Self Refresh?

– Is Max Power Down implemented?

• Can we look to see if any timing parameters can be improved?

Page 35

Increasing Performance by looking at timing margins

RD to WR same Rank: Spec says 7, system operating at 10

Operating right at Specification

Not happening! No Power Management

Page 36

Making the Measurement

Photos Courtesy of Keysight Technologies

Photos Courtesy of FuturePlus Systems

Page 37

Summary

• Memory Errors in the Field are pervasive!

• DDR Memory Compliance Testing can be achieved using the method outlined

• Tools are available

– Purchase or Rent

• Companies needing help can hire industry experts to perform the testing for them

Page 38

Contact Information

Barbara P. Aichinger

FuturePlus Systems

[email protected]

603-472-5905

www.FuturePlus.com

www.DDRDetective.com