CAFO: Cost Aware Flip Optimization for Asymmetric Memories RAKAN MADDAH *, SEYED MOHAMMAD SEYEDZADEH...
CAFO: Cost Aware Flip Optimization for Asymmetric Memories
RAKAN MADDAH *, SEYED MOHAMMAD SEYEDZADEH AND RAMI MELHEM
COMPUTER SCIENCE DEPARTMENT
UNIVERSITY OF PITTSBURGH
HPCA 2015
Introduction
DRAM and NAND Flash are facing physical limitations that put their scalability into question
• DRAM: decreasing cell reliability and increasing power consumption
• NAND Flash: endurance degradation and an increasing number of transient and hard errors
Phase-Change Memory (PCM) and Spin-Transfer Torque Random Access Memory (STT-RAM) are a promising alternative
• Scalability, low access latency and close-to-zero leakage power
• Initial assessments and evaluations are encouraging
Challenges
PCM and STT-RAM have a number of challenges that need to be dealt with before deployment in functional systems
• PCM suffers from limited endurance
• STT-RAM suffers from a high write bit error rate
Solution: bit flip minimization
• Service write requests while flipping as few bits as possible
• Preserves PCM’s endurance and improves STT-RAM’s write reliability
Previous Work
• Differential Write: compares old data against new data and then only flips differing cells.
• Flip-N-Write: encodes the write data into either its regular or its inverted form and then picks the encoding that yields fewer flips relative to the old data
• Flip-Min: encodes the write data into a set of data vectors and then picks the vector that yields the fewest flips relative to the old data
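Example (Old = 0 0 1 1 1, New = 0 1 0 0 1; savings are counted against rewriting all 5 bits):
• Differential write: write New → saves 2 bit flips
• Flip-N-Write: choose between New and its inverse 1 0 1 1 0 → saves 3 bit flips
• Flip-Min: choose among New1 = 0 1 0 0 1, New2 = 1 0 1 1 0 and New3 = 1 0 1 1 1 → saves 4 bit flips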
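The flip counts in this example can be reproduced with a short sketch. It assumes flips are the Hamming distance between stored and written data, ignores the cost of any flag bits, and takes Flip-Min's candidate set straight from the slide instead of generating it with Flip-Min's own coset construction; all function names are hypothetical.

```python
from typing import Sequence


def flips(old: Sequence[int], new: Sequence[int]) -> int:
    """Number of cells whose value changes (Hamming distance)."""
    return sum(o != n for o, n in zip(old, new))


def differential_write(old: Sequence[int], new: Sequence[int]) -> int:
    # Only the differing cells are programmed.
    return flips(old, new)


def flip_n_write(old: Sequence[int], new: Sequence[int]) -> int:
    # Choose the regular or inverted form (flag-bit cost not counted here).
    inverted = [1 - b for b in new]
    return min(flips(old, new), flips(old, inverted))


def flip_min(old: Sequence[int], candidates: Sequence[Sequence[int]]) -> int:
    # Choose the candidate vector closest to the stored data.
    return min(flips(old, c) for c in candidates)


old = [0, 0, 1, 1, 1]
new = [0, 1, 0, 0, 1]
candidates = [new, [1, 0, 1, 1, 0], [1, 0, 1, 1, 1]]  # New1..New3 from the slide

for name, n in [
    ("Differential write", differential_write(old, new)),
    ("Flip-N-Write", flip_n_write(old, new)),
    ("Flip-Min", flip_min(old, candidates)),
]:
    # "Saves" is interpreted as relative to rewriting all five bits.
    print(f"{name}: {n} flips, saves {len(old) - n}")
```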
Write Asymmetries
PCM
• The RESET state is more detrimental to endurance than the SET state
STT-RAM
• Anti-parallel magnetization is more prone to write errors than parallel magnetization
[Figure: PCM write pulses, power vs. time for SET (“1”) and RESET (“0”)]
[Figure: STT-RAM cell (free layer, oxide layer, reference layer) in parallel magnetization (“0”) and anti-parallel magnetization (“1”)]
Contribution
Observation: existing schemes fail to exploit the write asymmetry
Old: 0 0 0 1
New: 1 1 1 1 (saves 1 bit flip)
Inverted New: 0 0 0 0 (saves 3 bit flips)
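Writing a “0” is 4 times more detrimental to endurance than writing a “1”: the regular form needs three 0→1 writes (cost 3), while the inverted form needs one 1→0 write (cost 4)
Number of bit flips is oblivious to the write asymmetry!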
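A minimal sketch of this observation, assuming (as on the slide) that a 1→0 write costs 4, a 0→1 write costs 1 and unchanged cells are free; the cost table and function names are illustrative assumptions:

```python
from typing import Dict, Sequence, Tuple

# (stored bit, written bit) -> cost; writing a "0" over a "1" is 4x costlier.
COSTS: Dict[Tuple[int, int], int] = {(0, 1): 1, (1, 0): 4, (0, 0): 0, (1, 1): 0}


def flips(old: Sequence[int], new: Sequence[int]) -> int:
    return sum(o != n for o, n in zip(old, new))


def cost(old: Sequence[int], new: Sequence[int]) -> int:
    return sum(COSTS[(o, n)] for o, n in zip(old, new))


old = [0, 0, 0, 1]
regular = [1, 1, 1, 1]
inverted = [1 - b for b in regular]

print(f"regular: {flips(old, regular)} flips, cost {cost(old, regular)}")     # regular: 3 flips, cost 3
print(f"inverted: {flips(old, inverted)} flips, cost {cost(old, inverted)}")  # inverted: 1 flips, cost 4
```

A flip-count criterion picks the inverted form, while the cost criterion picks the regular one.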
• Focusing solely on the number of bit flips is oblivious to the write asymmetry
Proposal: move from the concept of “bit flip reduction” to “cost reduction”
Cost Aware Flip Optimization (CAFO)
• Cost model: captures the write asymmetry and assigns a cost to a given write operation
• Coding engine: encodes the write data into a form that results in an overall cost reduction
Cost Model
Compare the write data to the currently stored data and associate a cost with each cell
The costs “a”, “b”, “c” and “d” depend on the technology being modeled and the optimization objective (endurance, energy, error rate)
Currently stored data: 0 0 1 1 0 1 1 1
New data:              1 0 1 0 1 0 1 0
Cost of writing:       a c d b a b d b
(stored bit, new bit) → a: 01, b: 10, c: 00, d: 11
Write cost: C = n_a·a + n_b·b + n_c·c + n_d·d, where n_x is the number of cells labeled x
With a write cost, we can define the gain of one encoding over another
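The cell labeling and the write cost can be sketched as follows. The numeric costs used here (a = 1, b = 2, c = 0, d = 0) are borrowed from the gain example and are just one possible setting; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Transition labels: (stored bit, new bit) -> label.
LABELS: Dict[Tuple[int, int], str] = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}


def label_cells(old: Sequence[int], new: Sequence[int]) -> str:
    return "".join(LABELS[(o, n)] for o, n in zip(old, new))


def write_cost(old: Sequence[int], new: Sequence[int], costs: Dict[str, float]) -> float:
    # C = n_a*a + n_b*b + n_c*c + n_d*d
    counts = Counter(label_cells(old, new))
    return sum(counts[k] * costs[k] for k in "abcd")


stored = [0, 0, 1, 1, 0, 1, 1, 1]
new = [1, 0, 1, 0, 1, 0, 1, 0]
print(label_cells(stored, new))  # acdbabdb
print(write_cost(stored, new, {"a": 1, "b": 2, "c": 0, "d": 0}))  # 8
```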
Gain Calculation
Costs: a = 1, b = 2, c = 0, d = 0 (a: 01, b: 10, c: 00, d: 11)
Currently stored data:   0 0 1 1 0 1 1 1
New data:                1 0 1 0 1 0 1 0
Cost of writing:         a c d b a b d b
Encoded (inverted) data: 0 1 0 1 0 1 0 1
Cost of writing:         c a b d c d b d
C = 2a + 3b + 1c + 2d = 8
Cencoded = 1a + 2b + 2c + 3d = 5
Gain G = C - Cencoded = 8 - 5 = 3
A positive gain implies that it is less costly to write the data in encoded form
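A sketch of this gain calculation, assuming the encoding is inversion of the whole word and leaving the flag bit's own cost out; the cost table follows the slide and the function names are illustrative:

```python
from typing import Dict, Sequence, Tuple

# Costs from the slide: a = 0->1, b = 1->0, c = 0->0, d = 1->1.
COSTS: Dict[Tuple[int, int], int] = {(0, 1): 1, (1, 0): 2, (0, 0): 0, (1, 1): 0}


def write_cost(old: Sequence[int], new: Sequence[int]) -> int:
    return sum(COSTS[(o, n)] for o, n in zip(old, new))


def inversion_gain(old: Sequence[int], new: Sequence[int]) -> int:
    """G = C - C_encoded for the inverted encoding; G > 0 favors inverting."""
    encoded = [1 - b for b in new]
    return write_cost(old, new) - write_cost(old, encoded)


stored = [0, 0, 1, 1, 0, 1, 1, 1]
new = [1, 0, 1, 0, 1, 0, 1, 0]
print(write_cost(stored, new), write_cost(stored, [1 - b for b in new]), inversion_gain(stored, new))  # 8 5 3
```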
How to Encode Data?
Encoding
• The data block is arranged as a 2D array with one auxiliary bit per row and one per column
• Auxiliary bits serve as inversion flags
Coding steps:
1. Compute row gains
2. Flip all rows with positive gain
3. Compute column gains
4. Flip all columns with positive gain
5. Repeat the process until all rows and columns show a zero or negative gain
Alternating between row and column flips yields additional cost reduction
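The coding steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: costs are given per (stored bit, written bit) pair, one inversion flag is kept per row and per column, and the cost of updating the auxiliary bits is ignored. Each pass that flips anything strictly lowers the total cost, so the loop terminates. Run on the 8×8 flip matrix of the encoding example (stored data taken as all zeros, a = b = 1, c = d = 0), it ends at 21 flips with row aux bits 00001000 and column aux bits 01000100. The names `cafo_encode` and `total_cost` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]
# (stored bit, written bit) -> cost of writing that cell.
Costs = Dict[Tuple[int, int], float]


def cafo_encode(old: Matrix, new: Matrix, costs: Costs) -> Tuple[List[int], List[int]]:
    """Greedy row/column inversion; returns (row_flags, col_flags).

    Sketch only: the cost of updating the auxiliary bits is ignored.
    """
    n_rows, n_cols = len(new), len(new[0])
    row_flags, col_flags = [0] * n_rows, [0] * n_cols

    def local_gain(i: int, j: int) -> float:
        # Cost saved at cell (i, j) if its currently written bit were inverted.
        w = new[i][j] ^ row_flags[i] ^ col_flags[j]
        return costs[(old[i][j], w)] - costs[(old[i][j], 1 - w)]

    changed = True
    while changed:  # step 5: repeat until no row or column has positive gain
        changed = False
        # Steps 1-2: rows are disjoint, so all positive-gain rows flip together.
        row_gains = [sum(local_gain(i, j) for j in range(n_cols)) for i in range(n_rows)]
        for i, g in enumerate(row_gains):
            if g > 0:
                row_flags[i] ^= 1
                changed = True
        # Steps 3-4: same for columns, using the updated row flags.
        col_gains = [sum(local_gain(i, j) for i in range(n_rows)) for j in range(n_cols)]
        for j, g in enumerate(col_gains):
            if g > 0:
                col_flags[j] ^= 1
                changed = True
    return row_flags, col_flags


def total_cost(old: Matrix, new: Matrix, costs: Costs, rows: List[int], cols: List[int]) -> float:
    return sum(
        costs[(old[i][j], new[i][j] ^ rows[i] ^ cols[j])]
        for i in range(len(new))
        for j in range(len(new[0]))
    )


# Encoding example: stored data all zeros, so a "1" is a cell that must flip.
flips = [
    [1, 0, 0, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 1],
]
old = [[0] * 8 for _ in range(8)]
costs = {(0, 1): 1, (1, 0): 1, (0, 0): 0, (1, 1): 0}  # a = b = 1, c = d = 0
rows, cols = cafo_encode(old, flips, costs)
print(total_cost(old, flips, costs, [0] * 8, [0] * 8), "->", total_cost(old, flips, costs, rows, cols))  # 33 -> 21
print("row aux:", rows, "col aux:", cols)
```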
Encoding example
Costs: a = 1, b = 1, c = 0, d = 0. Each matrix entry is “1” if the cell is to be flipped and “0” otherwise, so inverting a row or column toggles its entries and its gain is (number of 1s) - (number of 0s).
Initial block (33 flips); all row and column aux bits are 0:
1 0 0 1 0 1 1 0
1 1 1 0 0 1 0 0
0 1 1 0 0 0 1 1
0 1 1 0 1 1 0 0
1 0 1 1 1 0 0 1
1 1 0 1 0 1 0 0
0 1 0 0 1 0 0 1
1 1 1 0 0 1 0 1
Row gains: 0 0 0 0 +2 0 -2 +2
Flip rows with + gain (rows 5 and 8); row aux bits 0 0 0 0 1 0 0 1 (29 flips):
1 0 0 1 0 1 1 0
1 1 1 0 0 1 0 0
0 1 1 0 0 0 1 1
0 1 1 0 1 1 0 0
0 1 0 0 0 1 1 0
1 1 0 1 0 1 0 0
0 1 0 0 1 0 0 1
0 0 0 1 1 0 1 0
Column gains: -2 +4 -2 -2 -2 +2 0 -4
Flip columns with + gain (columns 2 and 6); column aux bits 0 1 0 0 0 1 0 0 (23 flips):
1 1 0 1 0 0 1 0
1 0 1 0 0 0 0 0
0 0 1 0 0 1 1 1
0 0 1 0 1 0 0 0
0 0 0 0 0 0 1 0
1 0 0 1 0 0 0 0
0 0 0 0 1 1 0 1
0 1 0 1 1 1 1 0
Row gains: 0 -4 0 -4 -6 -4 -2 +2
Column gains: -2 -4 -2 -2 -2 -2 0 -4
Flip rows with + gain (row 8); row aux bits 0 0 0 0 1 0 0 0 (21 flips):
1 1 0 1 0 0 1 0
1 0 1 0 0 0 0 0
0 0 1 0 0 1 1 1
0 0 1 0 1 0 0 0
0 0 0 0 0 0 1 0
1 0 0 1 0 0 0 0
0 0 0 0 1 1 0 1
1 0 1 0 0 0 0 1
Encoding terminates as no row or column shows a positive gain: 21 flips instead of 33
Row-only inversion (FNW) with the same 16 aux bits, one per 4-bit word:
• No inversion: 33 flips
• FNW: words 3, 9, 11 and 15 are inverted (aux bits 0010000010100010): 25 flips
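For comparison, the row-only inversion (FNW) baseline with the same 16 flag bits can be sketched by splitting each 8-bit row of the flip matrix into two 4-bit words. This assumes, as the example appears to, that flag-bit updates are not counted; the function name `fnw_flips` is hypothetical.

```python
from typing import List, Tuple

# 8x8 flip matrix from the encoding example ("1" = cell must flip).
FLIPS = [
    [1, 0, 0, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 1],
]


def fnw_flips(flip_rows: List[List[int]], word_bits: int = 4) -> Tuple[int, List[int]]:
    """Invert each word when that yields fewer flips (a = b = 1, flag cost ignored).

    Returns (total flips, one flag bit per word).
    """
    total, flags = 0, []
    for row in flip_rows:
        for k in range(0, len(row), word_bits):
            ones = sum(row[k:k + word_bits])
            zeros = word_bits - ones  # flips left if the word is inverted
            if zeros < ones:
                flags.append(1)
                total += zeros
            else:
                flags.append(0)
                total += ones
    return total, flags


total, flags = fnw_flips(FLIPS)
print(sum(map(sum, FLIPS)), "->", total)  # 33 -> 25
print("".join(map(str, flags)))  # 0010000010100010
```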
Can we do better?
Encoding Optimization
Write cost can be further reduced even if no row or column shows a positive gain
• Flipping both a row and a column leaves their intersecting cell un-inverted
• The local gain of the intersecting cell, $g_{r+c}$, is included in both the row gain $G_r$ and the column gain $G_c$, so it has to be subtracted twice from their sum
• Gain is achieved if $G_r + G_c - 2g_{r+c} > 0$
Example (costs a = 1, b = 1, c = 0, d = 0; “1” = cell to be flipped), 5 flips:
0 1 0 0
0 0 1 1
1 0 0 0
1 0 0 0
Row gains: -2 0 -2 -2; column gains: 0 -2 -2 -2
Flip row 2 and column 1 together (row aux bits 0 1 0 0, column aux bits 1 0 0 0): $G_r + G_c - 2g_{r+c} = 0 + 0 - 2(-1) = 2$, giving 3 flips:
1 1 0 0
0 1 0 0
0 0 0 0
0 0 0 0
Row gains: 0 -2 -4 -4; column gains: -2 0 -4 -4
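The pair-flip rule can be checked on the 4×4 example with a small sketch, assuming a = b = 1 and c = d = 0, so that a cell's local gain is +1 if it must flip and -1 otherwise; the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def local_gain(flips: Matrix, i: int, j: int) -> int:
    # a = b = 1, c = d = 0: inverting saves a flip if the cell must flip, adds one otherwise.
    return 1 if flips[i][j] else -1


def row_gain(flips: Matrix, r: int) -> int:
    return sum(local_gain(flips, r, j) for j in range(len(flips[0])))


def col_gain(flips: Matrix, c: int) -> int:
    return sum(local_gain(flips, i, c) for i in range(len(flips)))


def pair_gain(flips: Matrix, r: int, c: int) -> int:
    """Gain of flipping row r and column c together: G_r + G_c - 2*g_{r+c}."""
    return row_gain(flips, r) + col_gain(flips, c) - 2 * local_gain(flips, r, c)


def flip_pair(flips: Matrix, r: int, c: int) -> Matrix:
    # Cell (i, j) is inverted iff exactly one of "i == r", "j == c" holds.
    return [[b ^ ((i == r) != (j == c)) for j, b in enumerate(row)] for i, row in enumerate(flips)]


flips = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
after = flip_pair(flips, 1, 0)  # second row, first column (0-indexed)
print(pair_gain(flips, 1, 0), sum(map(sum, flips)), "->", sum(map(sum, after)))  # 2 5 -> 3
```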
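Encoding Optimization (cont.)
Generalize to flipping 1 column with multiple rows (and vice versa): the gain is $\sum_{r} G_r + G_c - 2\sum_{r} g_{r+c}$, with both sums taken over the flipped rows r
Example (costs a = 1, b = 1, c = 0, d = 0), 6 flips:
0 0 0 0
0 0 1 1
1 0 0 1
1 1 0 0
Row gains: -4 0 0 0; column gains: 0 -2 -2 0
Flip rows 2 and 3 together with column 2 (row aux bits 0 1 1 0, column aux bits 0 1 0 0): gain = (0 + 0) + (-2) - 2((-1) + (-1)) = 2, giving 4 flips:
0 1 0 0
1 0 0 0
0 0 1 0
1 0 0 0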
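The generalized gain can be sketched the same way, again assuming a = b = 1 and c = d = 0; `group_gain` and `flip_group` are hypothetical names, and the brute-force flip count confirms the formula on this example.

```python
from typing import Iterable, List

Matrix = List[List[int]]


def local_gain(flips: Matrix, i: int, j: int) -> int:
    # a = b = 1, c = d = 0 (same assumption as the slides' examples).
    return 1 if flips[i][j] else -1


def group_gain(flips: Matrix, rows: Iterable[int], col: int) -> int:
    """Sum of row gains + column gain - 2 * local gains of the intersecting cells."""
    rows = list(rows)
    n_rows, n_cols = len(flips), len(flips[0])
    g_rows = sum(local_gain(flips, r, j) for r in rows for j in range(n_cols))
    g_col = sum(local_gain(flips, i, col) for i in range(n_rows))
    overlap = sum(local_gain(flips, r, col) for r in rows)
    return g_rows + g_col - 2 * overlap


def flip_group(flips: Matrix, rows: Iterable[int], col: int) -> Matrix:
    rows = set(rows)
    # Intersecting cells are inverted twice, i.e. left unchanged.
    return [[b ^ ((i in rows) != (j == col)) for j, b in enumerate(row)]
            for i, row in enumerate(flips)]


flips = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
after = flip_group(flips, [1, 2], 1)  # rows 2-3 and column 2 (1-indexed)
print(group_gain(flips, [1, 2], 1), sum(map(sum, flips)), "->", sum(map(sum, after)))  # 2 6 -> 4
```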
Aux. Bits Cost
The cost of updating the auxiliary bits can be easily incorporated in the gain calculation
Costs: a = 1, b = 2, c = 0, d = 0 (a: 01, b: 10, c: 00, d: 11)
Currently stored data | aux bit: 0 0 1 1 0 1 1 0 | 1
New data:                        1 0 1 0 1 0 1 0
Cost of writing:                 a c d b a b d c | b (old aux bit has to be flipped to “0”)
Inverted data:                   0 1 0 1 0 1 0 1
Cost of writing:                 c a b d c d b a | d (old aux bit stays the same)
C = 2a + 3b + 2c + 2d = 8
Cinverted = 2a + 2b + 2c + 3d = 6
Gain G = C - Cinverted = 8 - 6 = 2
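A sketch that folds the flag bit into the cost, assuming (consistently with the slide) that an aux bit of “1” marks inverted data; `cost_with_aux` is a hypothetical helper.

```python
from typing import Dict, Sequence, Tuple

# (stored bit, written bit) -> cost: a = 1, b = 2, c = 0, d = 0.
COSTS: Dict[Tuple[int, int], int] = {(0, 1): 1, (1, 0): 2, (0, 0): 0, (1, 1): 0}


def cost_with_aux(old_data: Sequence[int], old_aux: int, new: Sequence[int], invert: bool) -> int:
    """Write cost of `new` (regular or inverted) including its 1-bit inversion flag."""
    data = [1 - b for b in new] if invert else list(new)
    aux = 1 if invert else 0  # assumption: flag "1" marks inverted data
    cells = list(zip(old_data, data)) + [(old_aux, aux)]
    return sum(COSTS[pair] for pair in cells)


stored, stored_aux = [0, 0, 1, 1, 0, 1, 1, 0], 1
new = [1, 0, 1, 0, 1, 0, 1, 0]
c_regular = cost_with_aux(stored, stored_aux, new, invert=False)
c_inverted = cost_with_aux(stored, stored_aux, new, invert=True)
print(c_regular, c_inverted, c_regular - c_inverted)  # 8 6 2
```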
Decoding
Simple: XOR the corresponding vertical and horizontal aux bits
• Output of “1”: read the cell value inverted
• Output of “0”: read the cell value un-inverted
Example (row aux bits 0 1 1 0, column aux bits 0 1 0 0):
Original data    Encoded (stored)    Decoded
0 0 0 0          0 1 0 0             0 0 0 0
0 0 1 1          1 0 0 0             0 0 1 1
1 0 0 1          0 0 1 0             1 0 0 1
1 1 0 0          1 0 0 0             1 1 0 0
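The decoding rule can be sketched as a single XOR pass; because XOR-ing twice is the identity, the same hypothetical helper `apply_flags` both encodes and decodes.

```python
from typing import List

Matrix = List[List[int]]


def apply_flags(bits: Matrix, row_aux: List[int], col_aux: List[int]) -> Matrix:
    """XOR each cell with its row and column aux bits (encodes and decodes)."""
    return [[b ^ row_aux[i] ^ col_aux[j] for j, b in enumerate(row)] for i, row in enumerate(bits)]


data = [[0, 0, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0]]
row_aux, col_aux = [0, 1, 1, 0], [0, 1, 0, 0]
stored = apply_flags(data, row_aux, col_aux)     # encode
decoded = apply_flags(stored, row_aux, col_aux)  # decode
print(stored)           # [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]
print(decoded == data)  # True
```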
Evaluation
• Compare against Flip-Min and Flip-N-Write (FNW)
• Experiment with various block sizes of matching space overhead
• Compute the average cost reduction achieved by every scheme relative to differential write
• Experiment with a random input stream and memory traces collected from various SPEC benchmark programs
• Model both PCM and STT-RAM by setting the cost labels to match the underlying technology
Cost Reduction vs. Cost-Oblivious FNW and Flip-Min
[Figure: normalized cost (less is better) of FNW, Flip-Min and CAFO for a = 1 and b = 1, 2, 3, 4 at block sizes 64 B, 128 B and 512 B; space overheads shown: 12.5%, 6.25% and 3.125%]
Cost Reduction vs. Cost-Aware FNW and Flip-Min
[Figure: normalized cost (less is better) of C-FNW, C-Flip-Min and CAFO for a = 1 and b = 1, 2, 3, 4 at block sizes 64 B, 128 B and 512 B; overheads: 12.5%, 6.25% and 3.125%]
Cost Model Improves FNW and Flip-Min
[Figure: cost model improvement of C-FNW and C-Flip-Min over their cost-oblivious versions for a = 1 and b = 1, 2, 3, 4]
Optimization Isolation
[Figure: cost reduction over FNW (0% to 30%) of CAFO without the encoding optimization (“CAF-O”) and full CAFO, for a = 1 and b = 1, 2, 3, 4 with 128 B blocks]
At least 15% cost reduction even without the encoding optimization
STT-RAM Cost Reduction
Costs: a = 1, b = 0, c = 0, d = 0
[Figure: normalized cost (less is better) of C-FNW, FNW, C-Flip-Min, Flip-Min and CAFO at block sizes 64 B, 128 B and 256 B; overheads: 12.5%, 6.25% and 3.125%]
Benchmark Data
Costs: a = 1, b = 2, c = 0, d = 0; block size: 128 B (6.25% overhead)
[Figure: normalized cost (less is better) of CAFO, C-FNW and C-Flip-Min on zeusmp, soplex, GCC, wrf, leslie3d, libquantum, milc, lbm and their average]
Conclusion
• Bit flip minimization techniques are oblivious to write asymmetries
• Move from the concept of bit flip minimization to cost reduction
• CAFO
◦ Cost model that captures the asymmetry in the write cost
◦ 2D encoder that minimizes the overall cost of write operations