CAFO: Cost Aware Flip Optimization for Asymmetric Memories

RAKAN MADDAH*, SEYED MOHAMMAD SEYEDZADEH AND RAMI MELHEM
COMPUTER SCIENCE DEPARTMENT
UNIVERSITY OF PITTSBURGH
HPCA 2015


Introduction

DRAM and NAND Flash are facing physical limitations that put their scalability into question
• DRAM: decrease in cell reliability and increase in power consumption
• NAND Flash: endurance degradation and increase in the number of transient and hard errors

Phase-Change Memory (PCM) and Spin-Transfer Torque Random Access Memory (STT-RAM) are promising alternatives
• Scalability, low access latency and close to zero leakage power
• Initial assessments and evaluations are encouraging

Challenges

PCM and STT-RAM have a number of challenges that need to be dealt with before deployment in functional systems
• PCM suffers from limited endurance
• STT-RAM suffers from a high write bit error rate

Solution: bit flip minimization
• Service write requests while flipping as few bits as possible
• Preserves PCM’s endurance and improves STT-RAM’s write reliability

Previous Work

• Differential Write: compares old data against new data and flips only the differing cells

• Flip-N-Write: encodes write data into either its regular or inverted form and picks the encoding that yields fewer flips when compared against the old data

• Flip-Min: encodes write data into a set of data vectors and picks the vector that yields the fewest flips when compared against the old data

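[Figure: 5-bit write example, Old = 0 0 1 1 1, New = 0 1 0 0 1]
• Differential Write: saves 2 bit flips
• Flip-N-Write: candidates New = 0 1 0 0 1 and New = 1 0 1 1 0 (inverted); saves 3 bit flips
• Flip-Min: candidates New1 = 0 1 0 0 1, New2 = 1 0 1 1 0, New3 = 1 0 1 1 1; saves 4 bit flips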
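The three schemes can be compared on the 5-bit example with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the flag (auxiliary) bits are ignored, and the Flip-Min candidates are simply the three vectors shown on the slide rather than the output of an actual Flip-Min encoder.

```python
def flips(old, new):
    """Number of cells whose value changes when `new` overwrites `old`."""
    return sum(o != n for o, n in zip(old, new))


def pick_fewest_flips(old, candidates):
    """Return the candidate encoding that flips the fewest cells."""
    return min(candidates, key=lambda cand: flips(old, cand))


old = [0, 0, 1, 1, 1]
new = [0, 1, 0, 0, 1]
inverted = [1 - b for b in new]                 # Flip-N-Write's second candidate
flip_min_candidates = [new, inverted, [1, 0, 1, 1, 1]]  # vectors from the slide

diff_write = flips(old, new)
fnw = flips(old, pick_fewest_flips(old, [new, inverted]))
flip_min = flips(old, pick_fewest_flips(old, flip_min_candidates))
print(diff_write, fnw, flip_min)
```

Out of 5 cells, the three schemes flip 3, 2 and 1 cells respectively, which corresponds to the 2, 3 and 4 saved bit flips reported on the slide.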

Write Asymmetries

PCM
• The RESET state is more detrimental to endurance than the SET state

STT-RAM
• Anti-parallel magnetization is more prone to write errors than parallel magnetization

[Figure: PCM programming pulses, power vs. time, for SET (“1”) and RESET (“0”)]

[Figure: STT-RAM cell stack (free layer, oxide layer, reference layer) in parallel magnetization (“0”) and anti-parallel magnetization (“1”)]

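Contribution

Observation: existing schemes fail to exploit the write asymmetry

[Figure: Old = 0 0 0 1. Writing New = 1 1 1 1 saves 1 bit flip; writing New = 0 0 0 0 saves 3 bit flips]

Writing a “0” is 4 times more detrimental to endurance than writing a “1”. In this example, the single flip to “0” (cost 4) is therefore more expensive than the three flips to “1” (cost 3).

The number of bit flips is oblivious to the write asymmetry!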

Contribution

Observation: existing schemes fail to exploit the write asymmetry
• Focusing solely on the number of bit flips is oblivious to the write asymmetry

Proposal: move from the concept of “bit flip reduction” to “cost reduction”

Cost Aware Flip Optimization (CAFO)
• Cost model: captures the write asymmetry and assigns a cost to a given write operation
• Coding engine: encodes the write data into a form that results in an overall cost reduction

Cost Model

Compare write data to currently stored data and associate a cost with each cell

The costs “a”, “b”, “c” and “d” depend on the technology being modeled and the optimization objective (endurance, energy, error rate)

Currently Stored Data:  0 0 1 1 0 1 1 1
New Data:               1 0 1 0 1 0 1 0
Cost of Writing:        a c d b a b d b

a: 0→1, b: 1→0, c: 0→0, d: 1→1

Write cost: the sum of the per-cell costs

With a write cost we can define a gain among different encodings
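As a rough illustration of the cost model, the sketch below labels each cell with its transition and sums the per-cell costs. The function names are hypothetical, and the numeric costs (a = 1, b = 2, c = 0, d = 0) are borrowed from the gain calculation example; real values depend on the technology and objective.

```python
# Cost labels keyed by (stored bit, new bit): a: 0->1, b: 1->0, c: 0->0, d: 1->1.
LABEL = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}


def cost_labels(stored, new):
    """Label every cell with the transition it undergoes."""
    return [LABEL[(s, n)] for s, n in zip(stored, new)]


def write_cost(stored, new, costs):
    """Write cost: sum of the per-cell costs."""
    return sum(costs[label] for label in cost_labels(stored, new))


stored = [0, 0, 1, 1, 0, 1, 1, 1]
new = [1, 0, 1, 0, 1, 0, 1, 0]
example_costs = {"a": 1, "b": 2, "c": 0, "d": 0}  # assumed example values

print(" ".join(cost_labels(stored, new)))  # a c d b a b d b
print(write_cost(stored, new, example_costs))  # 8
```

Changing `example_costs` is all that is needed to model a different technology or objective in this sketch.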

Gain Calculation

a: 0→1, b: 1→0, c: 0→0, d: 1→1
Costs: a = 1, b = 2, c = 0, d = 0

Currently Stored Data:  0 0 1 1 0 1 1 1
New Data:               1 0 1 0 1 0 1 0
Cost of Writing:        a c d b a b d b
Encoded Data:           0 1 0 1 0 1 0 1
Cost of Writing:        c a b d c d b d

C = 2a + 3b + 1c + 2d = 8
Cencoded = 1a + 2b + 2c + 3d = 5
Gain G = C - Cencoded = 8 - 5 = 3

A positive gain implies that it is less costly to write the data encoded
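The gain of inverting a data vector can be sketched as follows. The helper names are hypothetical, the encoding is assumed to be a plain bitwise inversion (as in the example), and auxiliary bits are left out.

```python
# Per-transition costs keyed by (stored bit, written bit): a, b, c, d.
COSTS = {(0, 1): 1, (1, 0): 2, (0, 0): 0, (1, 1): 0}


def write_cost(stored, data, costs=COSTS):
    """Sum of the per-cell costs of overwriting `stored` with `data`."""
    return sum(costs[(s, d)] for s, d in zip(stored, data))


def inversion_gain(stored, data, costs=COSTS):
    """G = C - Cencoded, where the encoding is the bitwise inverse of `data`."""
    encoded = [1 - b for b in data]
    return write_cost(stored, data, costs) - write_cost(stored, encoded, costs)


stored = [0, 0, 1, 1, 0, 1, 1, 1]
new = [1, 0, 1, 0, 1, 0, 1, 0]
print(write_cost(stored, new))                   # C = 8
print(write_cost(stored, [1 - b for b in new]))  # Cencoded = 5
print(inversion_gain(stored, new))               # G = 3
```

A positive return value means the inverted form is the cheaper one to write.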

How to encode Data?

Encoding

Auxiliary bits serve as inversion flags

Coding steps:
1. Compute the gain of each row
2. Flip all rows with positive gain
3. Compute the gain of each column
4. Flip all columns with positive gain
5. Repeat the process until all rows and columns show a zero or negative gain

Alternating between row and column flips yields additional cost reduction
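The coding steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and variable names are hypothetical, the block is a 2D list of bits, the cost of updating the auxiliary bits is ignored, and `costs` is keyed by (stored bit, written bit).

```python
def encode(old, new, costs):
    """Iteratively flip rows and columns that show a positive gain.

    Returns (encoded block, row flags, column flags).
    """
    n, m = len(new), len(new[0])
    row_flag, col_flag = [0] * n, [0] * m

    def written(i, j):
        # Value stored in cell (i, j) under the current inversion flags.
        return new[i][j] ^ row_flag[i] ^ col_flag[j]

    def gain(cells):
        # Cost of the cells as currently encoded minus their cost if inverted.
        now = sum(costs[(old[i][j], written(i, j))] for i, j in cells)
        inv = sum(costs[(old[i][j], 1 - written(i, j))] for i, j in cells)
        return now - inv

    changed = True
    while changed:                      # step 5: repeat until no positive gain
        changed = False
        for i in range(n):              # steps 1-2: rows
            if gain([(i, j) for j in range(m)]) > 0:
                row_flag[i] ^= 1
                changed = True
        for j in range(m):              # steps 3-4: columns
            if gain([(i, j) for i in range(n)]) > 0:
                col_flag[j] ^= 1
                changed = True
    encoded = [[written(i, j) for j in range(m)] for i in range(n)]
    return encoded, row_flag, col_flag


# Tiny usage example with symmetric unit costs (a = b = 1, c = d = 0).
UNIT = {(0, 1): 1, (1, 0): 1, (0, 0): 0, (1, 1): 0}
old = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
new = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
encoded, rows, cols = encode(old, new, UNIT)
print(encoded, rows, cols)
```

Every flip strictly lowers the total write cost and there are finitely many flag settings, so the loop terminates. Rows do not share cells with each other (nor do columns), which is why flipping them one after another within a pass is equivalent to flipping all positive-gain rows at once.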

Encoding example

Costs: a = 1, b = 1, c = 0, d = 0. A “1” represents a cell that is to be flipped, “0” otherwise. All row and column aux bits start at 0.

Initial block (33 flips), with the gain of each row at the right:

1 0 0 1 0 1 1 0    0
1 1 1 0 0 1 0 0    0
0 1 1 0 0 0 1 1    0
0 1 1 0 1 1 0 0    0
1 0 1 1 1 0 0 1   +2
1 1 0 1 0 1 0 0    0
0 1 0 0 1 0 0 1   -2
1 1 1 0 0 1 0 1   +2

Flip rows with + gain (row aux bits: 0 0 0 0 1 0 0 1). Resulting block, with the gain of each column below:

1 0 0 1 0 1 1 0
1 1 1 0 0 1 0 0
0 1 1 0 0 0 1 1
0 1 1 0 1 1 0 0
0 1 0 0 0 1 1 0
1 1 0 1 0 1 0 0
0 1 0 0 1 0 0 1
0 0 0 1 1 0 1 0

-2 +4 -2 -2 -2 +2 0 -4

Flip columns with + gain (column aux bits: 0 1 0 0 0 1 0 0). Resulting block, with the gain of each row at the right:

1 1 0 1 0 0 1 0    0
1 0 1 0 0 0 0 0   -4
0 0 1 0 0 1 1 1    0
0 0 1 0 1 0 0 0   -4
0 0 0 0 0 0 1 0   -6
1 0 0 1 0 0 0 0   -4
0 0 0 0 1 1 0 1   -2
0 1 0 1 1 1 1 0   +2

Flip rows with + gain (row aux bits: 0 0 0 0 1 0 0 0). Final block (21 flips):

1 1 0 1 0 0 1 0
1 0 1 0 0 0 0 0
0 0 1 0 0 1 1 1
0 0 1 0 1 0 0 0
0 0 0 0 0 0 1 0
1 0 0 1 0 0 0 0
0 0 0 0 1 1 0 1
1 0 1 0 0 0 0 1

Encoding terminates as no row or column shows a positive gain

Row only Inversion (FNW)

[Figure: the same block split into sixteen 4-bit words, each with its own inversion flag; flags after encoding: 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0; 33 flips reduced to 25 flips]

Can We do better?

Encoding Optimization

Write cost can be further reduced even if no row or column shows a positive gain
• Flipping both a row and a column leaves their intersecting cell un-inverted
• The local gain of the intersecting cell has to be subtracted from the total gain of the corresponding row and column
• Gain is achieved if G_r + G_c - 2 g_{r+c} > 0, where G_r is the row gain, G_c the column gain and g_{r+c} the local gain of the intersecting cell

Before (5 flips), with row gains at the right and column gains below:

0 1 0 0   -2
0 0 1 1    0
1 0 0 0   -2
1 0 0 0   -2

0 -2 -2 -2

Flip row and column together (row aux bits: 0 1 0 0, column aux bits: 1 0 0 0)

After (3 flips), with row gains at the right and column gains below:

1 1 0 0    0
0 1 0 0   -2
0 0 0 0   -4
0 0 0 0   -4

-2 0 -4 -4
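The joint row and column test can be sketched as a search over all (row, column) pairs. The names below are hypothetical, and the local gains are built for the symmetric case of the 4x4 example (a = b = 1, c = d = 0), where a cell marked “1” has local gain +1 and a cell marked “0” has local gain -1.

```python
def local_gains(flip):
    """Local gain of inverting each cell when every flip costs 1."""
    return [[1 if bit else -1 for bit in row] for row in flip]


def best_joint_flip(g):
    """Find the (row, column) pair with the largest positive joint gain.

    Joint gain = G_r + G_c - 2 * g[r][c]; the intersecting cell is inverted
    twice, so its local gain is removed from both the row and the column.
    Returns (gain, r, c), or None if no pair has a positive gain.
    """
    n, m = len(g), len(g[0])
    row_gain = [sum(g[i]) for i in range(n)]
    col_gain = [sum(g[i][j] for i in range(n)) for j in range(m)]
    best = None
    for r in range(n):
        for c in range(m):
            joint = row_gain[r] + col_gain[c] - 2 * g[r][c]
            if joint > 0 and (best is None or joint > best[0]):
                best = (joint, r, c)
    return best


block = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(best_joint_flip(local_gains(block)))  # (2, 1, 0)
```

The result uses 0-based indices: flipping the second row together with the first column gains 2, which matches the drop from 5 to 3 flips in the example.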

Encoding Optimization (cont.)

Generalize to flipping 1 column together with multiple rows (and vice versa)

Before (6 flips), with row gains at the right and column gains below:

0 0 0 0   -4
0 0 1 1    0
1 0 0 1    0
1 1 0 0    0

0 -2 -2 0

Flip 2 rows and 1 column together (row aux bits: 0 1 1 0, column aux bits: 0 1 0 0)

After (4 flips):

0 1 0 0
1 0 0 0
0 0 1 0
1 0 0 0

Aux. Bits Cost

The cost of updating the auxiliary bits can be easily incorporated in the gain calculation

a: 0→1, b: 1→0, c: 0→0, d: 1→1
Costs: a = 1, b = 2, c = 0, d = 0

                        data             aux
Currently Stored Data:  0 0 1 1 0 1 1 0  1
New Data:               1 0 1 0 1 0 1 0  0   (old aux bit has to be flipped to “0”)
Cost of Writing:        a c d b a b d c  b
Inverted Data:          0 1 0 1 0 1 0 1  1   (old aux bit stays the same)
Cost of Writing:        c a b d c d b a  d

C = 2a + 3b + 2c + 2d = 8
Cinverted = 2a + 2b + 2c + 3d = 6
Gain G = C - Cinverted = 8 - 6 = 2
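One way to fold the auxiliary bit into the gain is to treat it as one more cell of the row. The sketch below assumes a single aux bit per data vector, with aux value 1 meaning "stored inverted" and 0 meaning "stored as is"; the function names are hypothetical.

```python
# Per-transition costs keyed by (stored bit, written bit): a, b, c, d.
COSTS = {(0, 1): 1, (1, 0): 2, (0, 0): 0, (1, 1): 0}


def cost(stored, data):
    """Sum of the per-cell costs of overwriting `stored` with `data`."""
    return sum(COSTS[(s, d)] for s, d in zip(stored, data))


def gain_with_aux(stored, stored_aux, new):
    """Gain of writing `new` inverted, including the aux bit update cost."""
    old = stored + [stored_aux]
    plain = new + [0]                        # aux = 0: data written as is
    inverted = [1 - b for b in new] + [1]    # aux = 1: data written inverted
    return cost(old, plain) - cost(old, inverted)


stored = [0, 0, 1, 1, 0, 1, 1, 0]
new = [1, 0, 1, 0, 1, 0, 1, 0]
print(gain_with_aux(stored, 1, new))  # 8 - 6 = 2
```

Because the stored aux bit is already 1, writing the data un-inverted pays an extra 1→0 transition (cost b), which is what tips the comparison further toward the inverted form here.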

Decoding

Simple: XOR the corresponding vertical and horizontal aux bits
• Output of “1”: read the cell value inverted
• Output of “0”: read the cell value un-inverted

Original data:

0 0 0 0
0 0 1 1
1 0 0 1
1 1 0 0

Encoded data (row aux bits: 0 1 1 0, column aux bits: 0 1 0 0):

0 1 0 0
1 0 0 0
0 0 1 0
1 0 0 0

Decoding the encoded data with its aux bits returns the original data
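The decoding rule amounts to one XOR per cell. A minimal sketch, with a hypothetical `decode` function and the block represented as a 2D list of bits:

```python
def decode(encoded, row_aux, col_aux):
    """Invert a cell exactly when its row and column aux bits XOR to 1."""
    return [
        [bit ^ r ^ c for bit, c in zip(row, col_aux)]
        for row, r in zip(encoded, row_aux)
    ]


encoded = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]
original = decode(encoded, row_aux=[0, 1, 1, 0], col_aux=[0, 1, 0, 0])
for row in original:
    print(*row)
```

Since XOR is its own inverse, applying the same function to the original data with the same aux bits reproduces the encoded block.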

Evaluation

• Compare against Flip-Min and Flip-N-Write (FNW)
• Experiment with various block sizes of matching space overhead
• Compute the average cost reduction achieved by every scheme relative to differential write
• Experiment with a random input stream and memory traces collected from various SPEC benchmark programs
• Model both PCM and STT-RAM by setting the cost labels to match the underlying technology

Cost Reduction vs. Cost oblivious FNW and Flip-Min

[Figure: bar chart of cost (less is better) for FNW, Flip-Min and CAFO with a=1 and b=1, 2, 3, 4, for block sizes of 64 B, 128 B and 512 B; overheads of 12.5%, 6.25% and 3.125%]

Cost Reduction vs. Cost aware FNW and Flip-Min

[Figure: bar chart of cost (less is better) for C-FNW, C-Flip-Min and CAFO with a=1 and b=1, 2, 3, 4, for block sizes of 64 B, 128 B and 512 B; overheads of 12.5%, 6.25% and 3.125%]

Cost Model Improves FNW and Flip-Min

[Figure: Cost Model Improvement. Improvement over cost oblivious for C-FNW and C-Flip-Min with a=1 and b=1, 2, 3, 4; axis from 0.8 to 1.25]

Optimization Isolation

[Figure: cost reduction over FNW (0% to 30%) for CAF-O and CAFO with a=1 and b=1, 2, 3, 4; block size 128 B]

At least 15% of cost reduction without encoding optimization

STT-RAM Cost Reduction

Costs: a = 1, b = 0, c = 0, d = 0

[Figure: bar chart of cost (less is better) for C-FNW, FNW, C-Flip-Min, Flip-Min and CAFO, for block sizes of 64 B, 128 B and 256 B; overheads of 12.5%, 6.25% and 3.125%]

Benchmark Data

Costs: a = 1, b = 2, c = 0, d = 0
Block Size: 128 B (6.25% overhead)

[Figure: bar chart of cost (less is better) for CAFO, C-FNW and C-Flip-Min on zeusmp, soplex, GCC, wrf, leslie3d, libquantum, milc, lbm and their average]

Conclusion

Bit flip minimization techniques are oblivious to write asymmetries

Move from the concept of bit flip minimization to cost reduction

CAFO
◦ Cost model that captures the asymmetry in the write cost
◦ 2D encoder that minimizes the overall cost of write operations

Questions?