Sangyeun Cho Hyunjin Lee
description
Transcript of Sangyeun Cho Hyunjin Lee
Flip-N-Write: A Simple Deterministic Technique to Improve PRAM Write Per-
formance, Energy and Endurance
Sangyeun Cho Hyunjin Lee
Dept. of Computer ScienceUniversity of Pittsburgh
University of Pittsburgh
Phase-change RAM (PRAM)
topelectrode
GST
metal
bit line
select TR
currentpulsesource
memory
VCC
Vgate
(Pictures from Hegedüs and Elliott, Nature Materials, March 2008)
Amorphous = high resistivity Crystalline = low resistivity
SET
RESET
University of Pittsburgh
PRAM asymmetries
read latency << write latency
University of Pittsburgh
PRAM asymmetries
read power << write power
University of Pittsburgh
PRAM asymmetries
read endurance >> write endurance
University of Pittsburgh
Writes are Bad…
… and are Hated!
University of Pittsburgh
Partial writes [Lee et al. ‘09]
cache block replaced and written to PRAM
1
0 1introduce multiple dirty bits to isolate and not write clean data
4B granularity gives a 6x improvement (over 64B)
University of Pittsburgh
Differential write [Yang et al. ‘07][Zhou et al. ‘09]
0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 1 cache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0 content of the corresponding memory block
University of Pittsburgh
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
Differential write [Yang et al. ‘07][Zhou et al. ‘09]
cache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0 content of the corresponding memory block
READ out (old) memory content1
0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 1
University of Pittsburgh
0 0 0 1 0 1 1 0
Differential write [Yang et al. ‘07][Zhou et al. ‘09]
cache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 1
Compare old and new data bit by bit2
1 1 1 1 0 1 1 0
0 0 10 0
University of Pittsburgh
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
Differential write [Yang et al. ‘07][Zhou et al. ‘09]
cache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 1
Update bits that differ3
0 0 10 0
Probabilistically, 50% of bits will be updated;In practice, (typically) fewer bits are updated as
bit-level redundancies are common in data
0 1 0 1 0 0 0 1 0 0 10 0
University of Pittsburgh
Contributions of this work We propose Flip-N-Write, a differential write scheme based
on Read-Modify-Write and data encoding• We use a simple bi-modal data encoding strategy: Intact or flipped• Flip bit is introduced to denote the mode
Importantly, Flip-N-Write will update at most N/2 bits at a time when updating N bits• c.f., Differential write updates at most N bits• Write power is deterministically bounded
We perform a comprehensive comparative study• Conventional write scheme• Differential write• Flip-N-Write
University of Pittsburgh
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
Flip-N-Write basic idea
cache block replaced to be written to PRAM
1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1 “New data”
“Old data”
University of Pittsburgh
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
Flip-N-Write basic idea
cache block replaced to be written to PRAM
1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1 “New data”
“Old data”
11 bits are different!
University of Pittsburgh
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
Flip-N-Write basic idea
cache block replaced to be written to PRAM
1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1 “New data”
“Old data”
Only five bits are different!
0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 “Flippednew data”
University of Pittsburgh
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
Flip-N-Write basic idea
cache block replaced to be written to PRAM
1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1 “New data”
“Old data”
(5+1) bits need be updated…
0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 “Flippednew data” 1
0
“Flip bit”
University of Pittsburgh
Flip-N-Write steps
1 1 1 1 1 1 1 10 0 0 1 0 0 0 1cache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0 content of the corresponding memory block
University of Pittsburgh
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
Flip-N-Write steps
cache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0 content of the corresponding memory block
READ out (old) memory content1
1 1 1 1 1 1 1 10 0 0 1 0 0 0 1
University of Pittsburgh
0 0 0 1 0 1 1 0
Flip-N-Write stepscache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
1 1 1 1 1 1 1 10 0 0 1 0 0 0 1
Compare old and new data and compute hamming distance2
1 1 1 1 0 1 1 0
University of Pittsburgh
0 0 0 1 0 1 1 0
Flip-N-Write stepscache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
1 1 1 1 1 1 1 10 0 0 1 0 0 0 1
If hamming distance > N/2, flip new data2
1 1 1 1 0 1 1 0
University of Pittsburgh
0 0 0 1 0 1 1 0
Flip-N-Write stepscache block replaced to be written to PRAM
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
0 0 0 0 0 0 0 01 1 1 0 1 1 1 0
If hamming distance > N/2, flip new data2
1 1 1 1 0 1 1 0
0 0 0 10
1 1 1 01
University of Pittsburgh
Flip-N-Write stepscache block replaced to be written to PRAM
Update bits that differ3
0 0 10 0
At most N/2 bits are updated;In practice, (typically) fewer bits are updated as
bit-level redundancies are common in data
0 0 0 0 0 0 0 01 1 1 0 1 1 1 00 0 0 10 0 0 0 0 0 0 01 1 1 0 1 1 1 00 0 0 100
0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0 0 0 0 10
University of Pittsburgh
Analysis: bit update reduction
2 4 8 16 320%
10%
20%
30%
40%
50%
60%
70%
word width
improvementover conventional
University of Pittsburgh
Analysis: bit update reduction
2 4 8 16 320%
10%
20%
30%
40%
50%
60%
70%
word width
improvementover DW
improvementover conventional
University of Pittsburgh
Analysis: flip bit overheads
2 4 8 16 320%
10%
20%
30%
40%
50%
60%
70%
word width
improvementover conventional
improvementover DW
overhead
University of Pittsburgh
Analysis: comparison with DW
DW
# bi
t upd
ates
/16-
bit w
rite
in-position bit flip probability
0 0.2 0.4 0.600000000000001 0.8 10
4
8
12
16
University of Pittsburgh
Analysis: comparison with DW
DW
# bi
t upd
ates
/16-
bit w
rite
in-position bit flip probability
0 0.2 0.4 0.600000000000001 0.8 10
4
8
12
16
Flip-N-Write
University of Pittsburgh
Implications Savings in bit updates improves power and endurance
On average, in terms of # update bits, Flip-N-Write is better than, but not far superior to DW
However, in terms of maximum # update bits, Flip-N-Write has an edge: N/2 vs. N• This deterministic property is extremely useful with a limited cell write
current requirement
Write current limited write time (M bits, S-bit update rate):• Conventional: (M/S)×TSET
• DW: TREAD + (M/S)×TSET
• Flip-N-Write: TREAD + (M/2S)×TSET
University of Pittsburgh
Circuit design (16): read path
(Baseline design from Lee et al., ISSCC, February 2007)
cell blocks have more bits (flip bits)
larger read buffer (flip bits)
logic for bypassing/flipping data
University of Pittsburgh
Circuit design (16): write prep.
(Baseline design from Lee et al., ISSCC, February 2007)
determine ifflipping is needed
bit mask showingwhat bits need programming bit-wise SET/RESET commands
University of Pittsburgh
Circuit design (16): write driver
(Baseline design from Lee et al., ISSCC, February 2007)
SET operation suppressedif outcome is “0” (not needed) RESET operation suppressed
if outcome is “0”
write (SET/RESET) pulseCell current
University of Pittsburgh
Experimental setup We have three sets of experiments
• Storage environment: # of bit updates from firmware/data update• Main memory environment: Program execution time
Four-core processor chip with ATOM-like cores PRAMs with conventional write, DW, and Flip-N-Write
• Based on Samsung’s 512Mbit prototype chip Workloads
• Firmware update: MiBench compiled with gcc• Data update: photos from New York Times “Pictures of the Day” (late April of
2009), music files “ripped” from two albums
• Program run: SPEC2006 (multiprogrammed), SPLASH-2, SPECjbb
University of Pittsburgh
Firmware update (ARM)
(a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c)
basicmath typeset stringsearch
patricia pgp Average
0
128
256
384
512
640
768
896
1,024
SET
RESET
# bi
t upd
ates
/1,0
24 c
ode
bits
University of Pittsburgh
Firmware update (ARM)
(a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c)
basicmath typeset stringsearch
patricia pgp Average
0
128
256
384
512
640
768
896
1,024
SET
RESET
# bi
t upd
ates
/1,0
24 c
ode
bits
conventionalDWFlip-N-Write
University of Pittsburgh
Firmware update (ARM)
(a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c)
basicmath typeset stringsearch
patricia pgp Average
0
128
256
384
512
640
768
896
1,024
# bi
t upd
ates
/1,0
24 c
ode
bits
Many more “SET” operations than “RESET”
University of Pittsburgh
Firmware update (ARM)
(a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c)
basicmath typeset stringsearch
patricia pgp Average
0
128
256
384
512
640
768
896
1,024
# bi
t upd
ates
/1,0
24 c
ode
bits
For DW & Flip-N-Write, # SET ~ # RESET
University of Pittsburgh
Firmware update (ARM)
(a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c)
basicmath typeset stringsearch
patricia pgp Average
0
128
256
384
512
640
768
896
1,024
# bi
t upd
ates
/1,0
24 c
ode
bits
305.7278.6
Conventional > DW > Flip-N-Write
University of Pittsburgh
Firmware update (x86)
(a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c)
basicmath typeset stringsearch
patricia pgp Average
0128256384512640768896
1,024
385.0331.3
# bi
t upd
ates
/1,0
24 c
ode
bits
Improvement w/ Flip-N-Write over DW ~14%
University of Pittsburgh
Music file update
(a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c) (a)
(b)
(c)
Krall-Medium Stravinsky-Medium
Krall-High Stravinsky-High
0
128
256
384
512
640
768
896
1,024
# bi
t upd
ates
/1,0
24 d
ata
bits
Compressed files have roughly equal 0’s and 1’s
University of Pittsburgh
Main memory performance
LLMM LLHH MMHH cholesky lu ocean SPECjbb0
400
800
1,200
AM
AL
(cyc
les)
conv
entio
nal
DW
Flip
-N-W
rite
NoD
elay
University of Pittsburgh
Main memory performance
LLMM LLHH MMHH cholesky lu ocean SPECjbb0
400
800
1,200
AM
AL
(cyc
les)
Higher memory intensity leads to larger AMAL
University of Pittsburgh
Main memory performance
LLMM LLHH MMHH cholesky lu ocean SPECjbb0
400
800
1,200
AM
AL
(cyc
les)
Flip-N-Write eases bandwidth shortage problem
University of Pittsburgh
Program performance
LLMM LLHH MMHH cholesky lu ocean SPECjbb0
0.2
0.4
0.6
0.8
1
rela
tive
perf
orm
ance
(to
NoD
elay
)
PRAM write bandwidth remains a challenge…
University of Pittsburgh
Program performance
LLMM LLHH MMHH cholesky lu ocean SPECjbb0
0.2
0.4
0.6
0.8
1
rela
tive
perf
orm
ance
(to
NoD
elay
)
Flip-N-Write salvages performance…
University of Pittsburgh
Bandwidth, bandwidth, bandwidth
16 banks 32 banks 64 banks 128 banks
0
200
400
600
800
1,000
1,200
1,400
400ns 200ns 100ns 50ns0
200
400
600
800
1,000
1,200
1,400
2MB 4MB 8MB 16MB0
200
400
600
800
1,000
1,200
1,400
AM
AL
(cyc
les)
conventional
DW
Flip-N-Write
NoDelay
Write bandwidth is the key to high performance
# banks/device L2 cache size PRAM update time
University of Pittsburgh
Conclusions Flip-N-Write is a differential write scheme to reduce PRAM’s
bit update actions using simple encoding• Power/energy reduction• Potential improvement in write endurance
Simple algorithm allows straightforward design Deterministic bound on # update bits beneficial for power
provisioning or bandwidth improvement• Faster firmware/data update latency• Performance improvement can be sizable
Write bandwidth remains a challenge
University of Pittsburgh
Flip-N-Write: A Simple Deterministic Technique to Improve PRAM Write Per-
formance, Energy and Endurance
Sangyeun Cho Hyunjin Lee
Dept. of Computer ScienceUniversity of Pittsburgh
University of Pittsburgh
Flip-N-Write, more formally