Reducing Cache Power with Low-Cost, Multi-Bit Error-Correcting Codes
Chris Wilkerson, Alaa R. Alameldeen, Zeshan Chishti, Wei Wu, Dinesh Somasekhar, Shih-Lien Lu
Intel Labs
Overview
• eDRAM refresh contributes to overall power
• ECC can increase refresh time
  – High cost: storage/logic/latency
• Hi-ECC: a practical multi-bit ECC
  – Addresses traditional obstacles to multi-bit ECC
    • Reduces storage/logic overhead
    • Minimizes latency
  – Reduces eDRAM refresh power by 93%
Why Refresh Power?
[Figure: power (mW) vs. % time active, showing 128MB eDRAM refresh (930mW) against CPU idle and CPU active power.]
• Typical usage: 27% of total power
Sources: A. Naveh, et al., “Power and thermal management in the Intel® Core® Duo processor,” Intel Technology Journal, vol. 10, no. 2, May 2006; Intel® Centrino® 2 Processor Technology, www.intel.com
Reducing eDRAM Refresh Power
• Option 1: Power gating eDRAM banks
  – 50% power gating -> 50% reduction in refresh power
  – Lose 64MB of cache state
    • ~2ms to refetch 64MB from bulk DRAM
    • >> typical idle exit/enter time of 100-200us
• Option 2: Extend refresh time ...
Problem w/ Extending Refresh Time
[Figure: probability of failure (log scale, 1.E-21 to 1.E+00) vs. refresh time (0 to 500us). Curves: pfbit for a single bit (eDRAM cell) and Base for the 128MB eDRAM cache. Markers: target yield loss; baseline operating point at ~30us, ~930mW.]
Reference: Wong et al., ITC 2008
• Extending refresh time introduces random bit failures
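The gap between the single-bit curve and the cache curve follows from the number of bits involved. A minimal sketch, assuming bit failures are independent (the slides do not state the failure model); `p_any_bit_fails` and the example `pfbit` value are illustrative, not taken from the slides:

```python
import math

CACHE_BITS = 128 * 2**20 * 8  # 128MB eDRAM cache, in bits


def p_any_bit_fails(pfbit: float, n_bits: int = CACHE_BITS) -> float:
    """P(at least one of n_bits fails) = 1 - (1 - pfbit)^n_bits.

    ASSUMPTION: failures are independent and identically distributed.
    expm1/log1p keep precision when pfbit is tiny (requires pfbit < 1).
    """
    return -math.expm1(n_bits * math.log1p(-pfbit))


# Illustrative only: a per-bit failure probability of 1e-12
# already gives a cache-level failure probability near 1e-3.
print(p_any_bit_fails(1e-12))
```

With roughly 1e9 bits in the cache, the cache-level probability sits about nine orders of magnitude above the per-bit probability until it saturates near 1.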
Ways to Extend Refresh Time
• Use tests to identify/avoid weak bits:– RAPID: R. Venkatesan et al. (HPCA 06)– BitFix: C. Wilkerson et al. (ISCA 08)
• Eliminate refresh when data is written/read.– Smart Refresh: M. Ghosh, H. Lee (Micro 07)
• Use error correcting codes (ECC) to eliminate refresh related failures– Rethinking Refresh: P. Emma et al. IEEE Micro
Nov 08
Impact of ECC on Refresh Time
[Figure, built up over three slides: probability of failure (log scale, 1.E-21 to 1.E+00) vs. refresh time (0 to 500us). Labeled points and curves: Base (128MB eDRAM cache) at 30us, 930mW; SECDED at 150us, 185mW; 5EC6ED.]
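Why a stronger code tolerates a higher bit failure rate can be illustrated with a binomial model. This is a sketch under stated assumptions, not the authors' analysis: bit failures are taken as independent, a line is lost only when it holds more errors than the code corrects, and the function names, line sizes, and `pfbit` values are hypothetical.

```python
import math


def p_line_fails(pfbit: float, line_bits: int, t: int, terms: int = 40) -> float:
    """P(more than t of line_bits fail): the line exceeds a t-error-correcting code.

    ASSUMPTION: independent bit failures. The binomial tail is truncated
    after `terms` terms, which is accurate when line_bits * pfbit << 1.
    """
    last = min(line_bits, t + terms)
    return sum(
        math.comb(line_bits, k) * pfbit**k * (1.0 - pfbit) ** (line_bits - k)
        for k in range(t + 1, last + 1)
    )


def p_cache_fails(pfbit: float, line_bits: int, t: int, n_lines: int) -> float:
    """P(at least one of n_lines lines is uncorrectable)."""
    return -math.expm1(n_lines * math.log1p(-p_line_fails(pfbit, line_bits, t)))


# 128MB of 64B lines = 2**21 lines; 523 = 512 data bits + 11 check bits (assumed).
no_ecc = p_cache_fails(1e-9, 512, 0, 2**21)
secded = p_cache_fails(1e-9, 523, 1, 2**21)
print(no_ecc, secded)
```

At the same bit failure probability the single-error-correcting case fails far less often, which is what allows the refresh time to be extended before the target yield loss is reached.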
Reducing Cost of ECC Storage
[Figure: ECC overhead as % normalized to 128MB eDRAM (0 to 30), split into ECC logic and storage, for the configurations below.]
Line size   Code       Latency
64B         SECDED     1 cycle
64B         5EC6ED     6 cycle
64B         10EC11ED   11 cycle
1KB         5EC6ED     6 cycle
1KB         10EC11ED   11 cycle
1KB         5EC6ED     165 cycle
1KB         10EC11ED   170 cycle
• Assume 1-xor = 10 DRAM bits
• Partial (64B) reads/writes on 1KB lines
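The storage benefit of larger lines can be estimated with the usual check-bit count for binary BCH codes. This is a sketch that assumes each corrected error costs m check bits, plus one extra bit for the additional error detection; the slides do not state the code construction, and `bch_check_bits` is a hypothetical helper.

```python
def bch_check_bits(data_bits: int, t: int) -> int:
    """Check bits for a t-error-correcting, (t+1)-error-detecting code.

    ASSUMPTION: binary BCH bound of t*m + 1 bits, where m is the smallest
    field degree with 2^m - 1 >= data_bits + t*m.
    """
    m = 1
    while (1 << m) - 1 < data_bits + t * m:
        m += 1
    return t * m + 1


for line_bytes in (64, 1024):
    data_bits = line_bytes * 8
    for name, t in (("SECDED", 1), ("5EC6ED", 5), ("10EC11ED", 10)):
        bits = bch_check_bits(data_bits, t)
        print(f"{line_bytes:>4}B {name:<9} {bits:>3} bits  {100 * bits / data_bits:.2f}%")
```

Under this assumption a 5EC6ED code costs about 10% on a 64B line but under 1% on a 1KB line, because the check-bit count grows with the logarithm of the line size rather than with the line size itself.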
The Partial Read/Write Problem
[Figure: read and write paths through ECC processing for two layouts of a 1KB line (64B x 16).]
• Scenario 1: Segmented ECC (ECC x 16 over 64B x 16)
• Scenario 2: Monolithic ECC (one ECC over the 1KB line)
  – Entire 1KB line must be read to decode ECC
  – 64B write requires read-modify-write
RALT Reduces Reads
• After a line is read and checked, it is guaranteed to be error free for 30us.
• The Recently Accessed Line Table (RALT) identifies lines referenced within the last 30us.
• Lines identified by the RALT do not require ECC checking, so they do not require 1KB reads.
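The RALT behavior can be sketched as a small table keyed by line address. This is a simplified software model, not the hardware design: it stores timestamps rather than period counters, and the `RALT` class, its 64-entry capacity, and the oldest-entry eviction policy are assumptions made for illustration.

```python
class RALT:
    """Toy Recently Accessed Line Table: remembers when each line was last checked."""

    def __init__(self, window_us: float = 30.0, entries: int = 64):
        self.window_us = window_us  # 30us guarantee window from the slides
        self.entries = entries      # ASSUMPTION: table capacity
        self.last_checked = {}      # line address -> time (us) of last full ECC check

    def access(self, line_addr: int, now_us: float) -> str:
        checked_at = self.last_checked.get(line_addr)
        if checked_at is not None and now_us - checked_at < self.window_us:
            # RALT hit: read only the requested 64B and skip ECC checking.
            return "partial"
        # RALT miss: read the full 1KB line, run ECC, then record the check.
        if line_addr not in self.last_checked and len(self.last_checked) >= self.entries:
            oldest = min(self.last_checked, key=self.last_checked.get)
            del self.last_checked[oldest]  # ASSUMPTION: evict the oldest entry
        self.last_checked[line_addr] = now_us
        return "full"


ralt = RALT()
print(ralt.access(0x40, 0.0))   # first touch: full 1KB read
print(ralt.access(0x40, 10.0))  # within 30us: partial read
print(ralt.access(0x40, 31.0))  # window expired: full 1KB read again
```

A hit does not extend the window in this model, since a partial read does not check the whole line.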
Reducing the Cost of ECC Logic
[Figure: ECC overhead as % normalized to 128MB eDRAM (0 to 30), split into ECC logic and storage. Bars: 64B lines with SECDED (1 cycle), 5EC6ED (6 cycle), 10EC11ED (11 cycle); 1KB lines with 5EC6ED (6 cycle), 10EC11ED (11 cycle); 1KB lines with 5EC6ED (165 cycle), 10EC11ED (170 cycle). One configuration is labeled Hi-ECC.]
• Reduced complexity at the cost of high latency.
• Assume 1-xor = 10 DRAM bits
Quick ECC Reduces Latency
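[Figure: the CPU sends an address to the tag/ECC array and the eDRAM; the line read passes through Quick ECC, which checks “>1 fail?”. No: Quick ECC handles the line. Yes: the line goes to high latency ECC processing (~165 cycles).]
• Functionally equivalent to full ECC.
• Optimizes the common case of 1 error or less.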
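The effect of the two-path design on average latency can be sketched with a simple model. The cycle counts are placeholders: 165 cycles comes from the slides, while the 2-cycle fast path and the choice to add the slow path on top of the fast check are assumptions, as are the function names.

```python
def quick_ecc_latency(errors_in_line: int, fast_cycles: int = 2, slow_cycles: int = 165) -> int:
    """ECC cycles for one line: fast path for 0 or 1 errors, slow path otherwise.

    ASSUMPTION: the slow path starts only after Quick ECC reports >1 failure,
    so its latency is added to the fast check.
    """
    if errors_in_line <= 1:
        return fast_cycles
    return fast_cycles + slow_cycles


def expected_latency(p_multi: float, fast_cycles: int = 2, slow_cycles: int = 165) -> float:
    """Average ECC cycles when a fraction p_multi of accesses hit multi-bit-failure lines."""
    return fast_cycles + p_multi * slow_cycles


# Illustrative: 900 of 128K lines with multi-bit failures, accessed uniformly.
print(expected_latency(900 / (128 * 1024)))
```

Because lines with more than one failure are rare, the average stays close to the fast-path latency even though the slow path is two orders of magnitude longer.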
Hi-ECC
• Reduce storage costs of ECC:
  – Amortize ECC bits over larger lines
• RALT:
  – Facilitates partial reads
• Quick-ECC:
  – Minimizes latency for error-detect/single-bit correct
  – High latency (165 cycles) for lines with multi-bit failures
• Disable lines with multi-bit failures to further reduce latency
  – 900 out of 128K (< 1%) lines have multi-bit failures
  – Total overhead (disabled lines + ECC code): ~1.6%
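The ~1.6% total can be reproduced with a short calculation. The 900 disabled lines and 128K total lines are from the slides; the 71 check bits per 1KB line is an assumption (a 5EC6ED BCH-style code over 8192 data bits), not a figure stated here.

```python
LINE_BITS = 1024 * 8      # 1KB line, in bits
TOTAL_LINES = 128 * 1024  # 128MB / 1KB = 128K lines
DISABLED_LINES = 900      # lines with multi-bit failures (from the slides)
ECC_BITS_PER_LINE = 71    # ASSUMPTION: 5EC6ED check bits for a 1KB line


def total_overhead(disabled: int = DISABLED_LINES, ecc_bits: int = ECC_BITS_PER_LINE) -> float:
    """Capacity lost to disabled lines plus storage spent on ECC bits, as a fraction."""
    disabled_fraction = disabled / TOTAL_LINES
    ecc_fraction = ecc_bits / LINE_BITS
    return disabled_fraction + ecc_fraction


print(f"{100 * total_overhead():.2f}%")
```

Under these assumptions the disabled lines cost about 0.7% and the ECC bits about 0.9%, which is consistent with the ~1.6% on the slide.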
Evaluation
• Intel® Core™ i7-like processor (2GHz)
  – 256KB L2 cache / 128MB eDRAM cache (40-cycle)
• SD: SECDED
  – Reads/Writes: 2-cycle latency
• HE: Hi-ECC w/o RALT
  – Reads: 32-cycle latency; Writes: read-modify-write
• HER: Hi-ECC w/ RALT
  – RALT miss/hit: 32/2-cycle latency
• Negligible perf impact: ~0.1-0.5%
Power Analysis
[Figure: eDRAM power (mW, 0 to 1000), split into Rd/Wr and refresh, for BASE, SD, HE, and HER across workload groups ISPEC, FSPEC, GM, OFF, SERV, and GMEAN.]
SD: SECDED
HE: Hi-ECC
HER: Hi-ECC w/ RALT
HER reduces cache power 92% vs BASE, 61% vs SECDED
Conclusions
• Idle power significant for low power systems
• eDRAM refresh ~27% of total power
• ECC can increase refresh time
  – High storage/logic/latency cost
• Hi-ECC: a practical multi-bit ECC
  – Addresses traditional obstacles to multi-bit ECC
    • Reduces storage/logic overhead
    • Minimizes latency
• 93% reduction in eDRAM refresh power
• Still have a problem with writes…
Backup
Power Analysis
[Figure: power (scale 0 to 1000) for BASE, SD, HE, and HER across workload groups DH, ISPEC, FSPEC, GM, MM, OFF, PROD, SERV, WS, and GMEAN. BASE: base eDRAM.]
SD: SECDED
HE: Hi-ECC
HER: Hi-ECC w/ RALT
CPU idle power
[Figure: bar chart comparing DRAM power and CPU power (active, idle); labeled values ~0.5W and ~1.05W.]
• High frequency of idle enters/exits motivates low cost transitions (100-200us)
• Low average power motivates aggressive power management
Source: A. Naveh, et al., “Power and thermal management in the Intel® Core® Duo processor,” Intel Technology Journal, vol. 10, no. 2, May 2006.
eDRAM Idle Power
[Figure: refresh power (0 to 1000) vs. refresh time (0 to 450us) for a 128MB eDRAM, with CPU idle power marked for comparison.]
• Double refresh time -> 50% reduction in refresh power
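The relation between refresh time and refresh power can be written as a one-line model. This is a sketch assuming refresh power is inversely proportional to refresh time, which matches the "double refresh time, 50% reduction" point; the 30us, 930mW baseline is the figure used elsewhere in the deck, and the function names are hypothetical.

```python
def refresh_power_mw(refresh_us: float, base_us: float = 30.0, base_mw: float = 930.0) -> float:
    """Refresh power at a given refresh time.

    ASSUMPTION: power scales as 1 / refresh time from the 30us, 930mW baseline.
    """
    return base_mw * base_us / refresh_us


def refresh_time_for_reduction(reduction: float, base_us: float = 30.0) -> float:
    """Refresh time needed to cut refresh power by `reduction` (0 <= reduction < 1)."""
    return base_us / (1.0 - reduction)


print(refresh_power_mw(60.0))            # doubling the refresh time
print(refresh_power_mw(150.0))           # SECDED operating point
print(refresh_time_for_reduction(0.93))  # refresh time implied by a 93% reduction
```

The model gives 186mW at 150us, close to the 185mW quoted for SECDED, and implies a refresh time of roughly 430us for a 93% reduction.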
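RALT Reduces Reads
[Figure: a 1KB line (64B x 16 + ECC) feeding ECC processing, next to the RALT with entries 0 to 63. Entry fields: line addr, parity, valid, 2 bit period. Other labels: ADDR lookup with HIT/MISS, recency cntr (~30us), and a “Period?” check leading to HIT.]
• First read requires 1KB read.
• Subsequent reads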
Remove Failing Bits w/ BitFix
[Figure: BitFix datapath. Labels: Read, Repaired Line, Quick ECC (> 1 fail? N), Generate ECC, Write, Physical Line, Failure Locations.]
Further Reducing Latency
[Figure: read datapath. Labels: Read, Failing Bits, Repaired Line, Failure Locations, Quick ECC (> 1 fail? Y), High latency ECC processing, ECC.]
• Of 128K lines, < 900 require high latency ECC processing.
• Disable lines with high failure rates.
• If too many lines have multi-bit failures…