Post on 04-Jun-2020

Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors

ISCA 2014

Yoongu Kim¹, Ross Daly¹, Jeremie Kim¹, Chris Fallin¹, Ji Hye Lee¹, Donghyuk Lee¹, Chris Wilkerson², Konrad Lai, Onur Mutlu¹

¹Carnegie Mellon University, ²Intel Labs

Presented by Sam Schiferl and Pedram Zamirai

Outline

1. Motivation
2. DRAM Structure
3. Disturbance Errors
4. Test System Setup
5. Results
6. Proposed Solution
7. Conclusion
8. Discussion

Motivation

● As DRAM process technology continues to scale down, memory reliability suffers because:

○ Smaller cells hold less charge
○ Cells are closer together, which can lead to electromagnetic coupling
○ Process variation increases

● These issues can lead to violations of memory isolation
○ An access to one memory address should not have unintended side effects on data stored in other addresses

● The authors investigate the vulnerability of commodity DRAM from three major manufacturers to targeted disturbance error attacks

DRAM Structure

[Figures from paper: a single memory cell; rows of cells]

● Charge stored in capacitor to represent 0/1

● Access transistor used to read/write data to specific cell

DRAM Access

[Figures from paper: a single memory cell; rows of cells]

1. Row’s wordline is raised to high
2. Row buffer reads/writes the desired columns
3. Row’s wordline is lowered, closing the row

DRAM Refresh

● The charge of a memory cell constantly leaks, eventually leading to a loss of data

● Data must be refreshed periodically by raising the wordline

● DRAM specifications guarantee a retention time before the cell loses data

○ 64 ms retention time for DDR3

Disturbance Errors

● Unwanted interaction between two circuit components that should be isolated from each other

● Repeatedly toggling the voltage of a wordline can cause cells in nearby rows to leak charge at a faster rate; a cell that leaks too much of its charge before the next refresh loses its data

● Causes:
○ Noise injection
○ Bridges
○ Hot-carrier injection

[Figure: an aggressor row and its neighboring victim rows]

Disturbance Error Attack

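● Repeatedly read data from the same row in DRAM and track bit flips in other DRAM rows

● Flush the line from the cache after each read

Induces errors (X & Y map to the same bank, but different rows):

code1a:
  mov (X), %eax      # read from address X
  mov (Y), %ebx      # read from address Y (different row, same bank)
  clflush (X)        # evict X so the next read goes to DRAM
  clflush (Y)        # evict Y so the next read goes to DRAM
  mfence
  jmp code1a

Does not induce errors:

code1b:
  mov (X), %eax      # read from address X only
  clflush (X)
  mfence
  jmp code1b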
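The difference between the two loops can be illustrated with a toy row-buffer model. This is a minimal sketch, not the authors' test code: it assumes an open-row policy in which a bank keeps the last accessed row open, that every read reaches DRAM (which is what the cache flush is for), and it ignores refresh. The function name and row numbers are hypothetical.

```python
from typing import Iterable


def count_activations(accessed_rows: Iterable[int]) -> int:
    """Count row activations for a sequence of row accesses to one bank.

    Simplified open-row model (an assumption): the bank keeps the most
    recently accessed row in its row buffer, and a row is only opened
    (its wordline raised) when a different row is requested.
    """
    open_row = None
    activations = 0
    for row in accessed_rows:
        if row != open_row:
            # Row-buffer miss: the old row is closed and the new one opened.
            activations += 1
            open_row = row
        # Row-buffer hit: served from the row buffer, wordline not toggled.
    return activations


ROW_X, ROW_Y = 10, 500  # hypothetical rows in the same bank
ITERATIONS = 1000

# Loop that reads X then Y: every access targets a different row than the last.
alternating = count_activations([ROW_X, ROW_Y] * ITERATIONS)

# Loop that reads only X: the row stays open after the first access.
single_row = count_activations([ROW_X] * ITERATIONS)

print(alternating)  # 2000
print(single_row)   # 1
```

Under these assumptions, only the two-address loop toggles the wordlines repeatedly, which matches the slide's observation that it is the one that induces errors.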

Experimental Methodology

● Testing platform
○ 8 Xilinx FPGA boards
○ DDR3-800 memory controller
○ Run at 50 °C

● DRAM modules
○ 129 DDR3 DRAM modules
○ 972 DRAM chips

● Test Parameters
○ Activation Interval (AI)
○ Refresh Interval (RI)
○ Data Pattern (DP)

Types of Tests

1. Toggle all rows in the module repeatedly and locate all disturbed cells
○ Quickly identifies every disturbed cell throughout an entire module

2. Toggle a single row repeatedly and identify the specific cells it disturbs
○ Correlates victim cells with aggressor rows
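Both tests come down to writing a known pattern, toggling rows, and comparing what is read back against what was written. The comparison step might be sketched as follows; the function name is hypothetical and the sample bytes are made up for illustration, not measured data.

```python
def find_flipped_bits(before: bytes, after: bytes):
    """Return (byte_offset, bit_index, old_bit, new_bit) for every differing bit."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    flips = []
    for offset, (old, new) in enumerate(zip(before, after)):
        diff = old ^ new  # set bits mark positions that changed
        for bit in range(8):
            if diff & (1 << bit):
                flips.append((offset, bit, (old >> bit) & 1, (new >> bit) & 1))
    return flips


# Hypothetical example: an all-ones pattern with two bits lost on read-back.
written = bytes([0xFF] * 4)
read_back = bytes([0xFF, 0xFB, 0xFF, 0x7F])

flips = find_flipped_bits(written, read_back)
print(flips)  # [(1, 2, 1, 0), (3, 7, 1, 0)]
```

Recording the old and new bit values alongside each position also makes it easy to tally the direction of each flip.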

Manufacturing Date

● No errors in the 19 oldest modules
● Relatively recent phenomenon

Effective Parameters

● Access patterns
○ Repeated toggling of wordline
○ Opening & closing cause the problem

Access Pattern | Disturbance Errors?
(open-read-close)^N | Yes
(open-write-close)^N | Yes
open-read^N-close | No
open-write^N-close | No

● Refresh interval (RI)
○ RI ↓ ⇒ Errors ↓
■ Less time for charge to leak between refreshes
■ Fewer row openings per refresh interval

● Activation interval (AI)
○ AI ↑ ⇒ Errors ↓
■ Fewer row openings in each RI

● Data Patterns
○ Victim cells lose charge when they are disturbed
○ True-cell: High voltage = 1
○ Anti-cell: High voltage = 0
○ True-cells are dominant
○ Errors are mostly 1 → 0
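The RI and AI trends can be made concrete by counting how many times one row can be opened between two refreshes. The sketch below is simple arithmetic, not the authors' methodology: the 64 ms refresh interval comes from the DDR3 slide above, while the 55 ns activation interval is an assumed value chosen only for illustration.

```python
def max_activations(refresh_interval_ns: int, activation_interval_ns: int) -> int:
    """Upper bound on row openings that fit into one refresh interval."""
    if refresh_interval_ns <= 0 or activation_interval_ns <= 0:
        raise ValueError("intervals must be positive")
    return refresh_interval_ns // activation_interval_ns


RI_NS = 64_000_000  # 64 ms refresh interval (DDR3 figure from the slides)
AI_NS = 55          # assumed activation interval, for illustration only

baseline = max_activations(RI_NS, AI_NS)
halved_ri = max_activations(RI_NS // 2, AI_NS)   # RI ↓
doubled_ai = max_activations(RI_NS, 2 * AI_NS)   # AI ↑

print(baseline)    # 1163636
print(halved_ri)   # 581818
print(doubled_ai)  # 581818
```

Either halving RI or doubling AI halves the number of times an aggressor row can be toggled before its neighbors are refreshed.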

Address Correlation

● No errors in the aggressor row itself
● Strong peaks at ±1
○ Greatest effect on the two immediate neighbors
○ Logical and physical adjacency are highly correlated

● Errors in non-adjacent rows
○ Physically-adjacent ⇎ Logically-adjacent

Sensitivity Results

● Errors are mostly repeatable
○ Ten iterations of testing
○ Relatively constant average number of errors (±0.25%)

● Victim cells ≠ Weak cells
○ Weak cells = cells with shortest retention time

● Not strongly affected by temperature
○ ±20 °C from ambient temperature → No effect

Probabilistic Adjacent Row Activation (PARA)

● After closing a row, the memory controller refreshes one of the adjacent rows with probability P (a small constant)

○ Stateless solution

● It picks one of the two neighbors at random
● Number of accesses ↑ ⇒ probability that a neighbor has been refreshed ↑
● Cannot prevent disturbance errors with absolute certainty
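The behavior described above might be sketched as follows. This is a simplified model built from the bullets, not the paper's implementation: the class and function names are hypothetical, each row close is treated as independent, and the values of P and the number of closes are chosen only for illustration. Because one of two neighbors is picked, a given victim row is refreshed with probability P/2 per close, so it survives N closes unrefreshed with probability (1 - P/2)^N.

```python
import random


def prob_never_refreshed(p: float, closes: int) -> float:
    """Chance that one specific neighbor is never refreshed in `closes` row closes."""
    return (1.0 - p / 2.0) ** closes


class ParaController:
    """Toy stateless controller: no counters, just a coin flip per row close."""

    def __init__(self, p: float, num_rows: int, seed: int = 0):
        self.p = p
        self.num_rows = num_rows
        self._rng = random.Random(seed)

    def close_row(self, row: int):
        """Return the neighbor row to refresh, or None if no refresh is issued."""
        if self._rng.random() >= self.p:
            return None
        # Rows at the edge of the array have only one neighbor.
        neighbors = [r for r in (row - 1, row + 1) if 0 <= r < self.num_rows]
        return self._rng.choice(neighbors)


# Illustrative values only: P = 0.001 and 100,000 closes of the same row.
survival = prob_never_refreshed(0.001, 100_000)
print(f"{survival:.2e}")  # on the order of 1e-22

controller = ParaController(p=0.5, num_rows=1024)
refreshed = [controller.close_row(100) for _ in range(1000)]
issued = [r for r in refreshed if r is not None]
print(len(issued), set(issued))
```

The survival probability shrinks geometrically with the number of closes but never reaches zero, which is why the approach cannot guarantee protection with absolute certainty.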

Conclusion

● Demonstrated, characterized, and analyzed disturbance errors
● Repeated accesses to the same row corrupt data in other rows
● Emerging problem (affects current and future computing systems)
● Proposed several solutions

Discussion Points

● Does the type of processor (ARM vs x86) have an effect on the feasibility of the attack?

● How practical is their PARA solution that relies on probabilistically refreshing candidate victim rows?

● Should this attack be mitigated with a software or a hardware solution?

Potential Solutions

Solution | Probable Defect
Make better chips | Future, smaller cells
Correct errors | High cost & unable to correct multi-bit errors
Refresh all rows frequently | Degrades performance and energy efficiency
Map faulty cells to spare cells (manufacturer) | Not enough spare cells
Retire cells (end-user): 1. Disable/remap faulty addresses; 2. Refresh faulty addresses more frequently | 1: Every row in the module is a victim row; 2: Refreshes victim rows more frequently even when there is no access to the module
Identify “hot” rows and refresh neighbors | High hardware overhead to identify hot rows