Reducing Cache Power with Low-Cost, Multi-Bit Error-Correcting Codes Chris Wilkerson, Alaa R. Alameldeen, Zeshan Chishti, Wei Wu, Dinesh Somasekhar, Shih-Lien Lu Intel Labs


Page 2: Overview

• eDRAM refresh contributes to overall power
• ECC can increase refresh time
  – High cost: storage/logic/latency
• Hi-ECC: a practical multi-bit ECC
  – Addresses traditional obstacles to multi-bit ECC
    • Reduces storage/logic overhead
    • Minimizes latency
  – Reduces eDRAM refresh power by 93%

Page 3: Why Refresh Power?

[Figure: Power (mW, 0 to 12000) versus % Time Active (0 to 100). Series: CPU Idle, CPU Active, and 128MB eDRAM Refresh (930mW).]

• Typical usage: 27% of total power

Sources: A. Naveh, et al., “Power and thermal management in the Intel® Core® Duo processor,” Intel Technology Journal, vol. 10, no. 2, May 2006; Intel® Centrino® 2 Processor Technology, www.intel.com

Page 4: Reducing eDRAM Refresh Power

• Option 1: Power gating eDRAM banks
  – 50% power gating -> 50% reduction in refresh power
  – Lose 64MB of cache state
    • ~2ms to refetch 64MB from bulk DRAM
    • >> typical idle exit/enter of 100-200us
• Option 2: Extend Refresh Time ...
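The Option 1 trade-off can be checked with a few lines of arithmetic. This is a minimal sketch, not the authors' model: it assumes refresh power and refetch time both scale linearly with the gated fraction (the slide only gives the 50% point), and the function name and parameters are hypothetical. The 930mW baseline is the 128MB eDRAM refresh figure quoted elsewhere in the deck.

```python
CACHE_MB = 128.0            # eDRAM cache size from the slides
BASE_REFRESH_MW = 930.0     # baseline refresh power from the slides
REFETCH_MS_PER_64MB = 2.0   # "~2ms to refetch 64MB from bulk DRAM"


def power_gating_tradeoff(gated_fraction):
    """Return (refresh power in mW, cache state lost in MB, refetch time in us).

    Assumption: both quantities scale linearly with the gated fraction.
    """
    lost_mb = CACHE_MB * gated_fraction
    refresh_mw = BASE_REFRESH_MW * (1.0 - gated_fraction)
    refetch_us = REFETCH_MS_PER_64MB * 1000.0 * (lost_mb / 64.0)
    return refresh_mw, lost_mb, refetch_us


refresh_mw, lost_mb, refetch_us = power_gating_tradeoff(0.5)
# Compare the refetch time with the slow end of the 100-200us idle transition.
ratio_vs_idle_transition = refetch_us / 200.0
```

Under these assumptions, gating half the banks saves half the refresh power but costs a refetch that is about ten times longer than even a 200us idle transition, which is the slide's argument against Option 1.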

Page 5: Problem w/ Extending Refresh Time

[Figure: Probability of Failure (log scale, 1.E-21 to 1.E+00) versus Refresh Time (0 to 500 us). Curves: pfbit, single bit (eDRAM cell); Base, 128MB eDRAM cache. Annotations: Target yield loss; ~30us, ~930mW. Reference: Wong et al., ITC 2008.]

• Extending refresh time introduces random bit failures
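The gap between the single-bit curve and the whole-cache curve follows from the number of bits involved. The sketch below is an illustration under an assumption the slide does not state explicitly: that bit failures are independent, so the cache fails if any one of its bits fails. The function name is hypothetical.

```python
import math

CACHE_BITS = 128 * 2**20 * 8  # 128MB cache expressed in bits


def cache_failure_probability(p_bit, n_bits=CACHE_BITS):
    """P(at least one of n_bits fails) = 1 - (1 - p_bit)**n_bits.

    Assumes independent bit failures. log1p/expm1 keep precision when
    p_bit is tiny, where the naive formula would round to zero.
    """
    return -math.expm1(n_bits * math.log1p(-p_bit))


# Hypothetical per-bit failure probability, chosen only for illustration.
p_cache = cache_failure_probability(1e-12)
```

With roughly 10^9 bits in the cache, a per-bit failure probability of 10^-12 already gives a cache-level failure probability near 10^-3, which is why a small extension of refresh time is enough to miss a yield target without ECC.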

Page 6: Ways to Extend Refresh Time

• Use tests to identify/avoid weak bits:
  – RAPID: R. Venkatesan et al. (HPCA 06)
  – BitFix: C. Wilkerson et al. (ISCA 08)
• Eliminate refresh when data is written/read:
  – Smart Refresh: M. Ghosh, H. Lee (Micro 07)
• Use error correcting codes (ECC) to eliminate refresh-related failures:
  – Rethinking Refresh: P. Emma et al. (IEEE Micro, Nov 08)

Pages 7-9: Impact of ECC on Refresh Time

[Figure, built up over three slides: Probability of Failure (log scale, 1.E-21 to 1.E+00) versus Refresh Time (0 to 500 us). Base: 128MB eDRAM cache. Annotated points: baseline at 30us, 930mW (page 7); SECDED at 150us, 185mW (added on page 8); 5EC6ED (added on page 9, no values in the transcript).]
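A stronger code shifts the failure curve because a line only fails when it holds more errors than the code can correct. The sketch below computes that binomial tail. It assumes independent bit failures, and the per-bit probability and line sizes in the usage lines are hypothetical values for illustration, not data from the figure.

```python
import math


def line_failure_probability(p_bit, n_bits, t_correctable):
    """P(more than t_correctable of n_bits fail), assuming independent failures.

    Each binomial term is evaluated in log space so that very small
    probabilities do not underflow before they are summed.
    """
    if p_bit <= 0.0 or t_correctable >= n_bits:
        return 0.0
    if p_bit >= 1.0:
        return 1.0
    log_p = math.log(p_bit)
    log_q = math.log1p(-p_bit)
    total = 0.0
    for k in range(t_correctable + 1, n_bits + 1):
        log_comb = (math.lgamma(n_bits + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1))
        total += math.exp(log_comb + k * log_p + (n_bits - k) * log_q)
    return min(total, 1.0)


# Hypothetical per-bit failure probability for a 1KB (8192-bit) line.
p_secded = line_failure_probability(1e-6, 8192, 1)  # corrects 1 error
p_5ec6ed = line_failure_probability(1e-6, 8192, 5)  # corrects 5 errors
```

For the same per-bit failure rate, the 5-error-correcting code leaves a far smaller residual failure probability than single-error correction, which is what lets the refresh time be pushed further out.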

Page 10: Reducing Cost of ECC Storage

[Figure: stacked bars of ECC Logic and Storage cost, in % normalized to 128MB eDRAM (0 to 30). Bar groups, in order: 64B lines with SECDED (1 cycle), 5EC6ED (6 cycle), 10EC11ED (11 cycle); 1KB lines with 5EC6ED (6 cycle), 10EC11ED (11 cycle); 1KB lines with 5EC6ED (165 cycle), 10EC11ED (170 cycle). Assume 1-xor = 10 DRAM bits.]

• Partial (64B) reads/writes on 1KB lines
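The storage saving from larger lines can be estimated with the usual bound for binary BCH codes: correcting t errors takes at most t*m check bits, where m is the smallest integer with 2^m - 1 at least as large as the codeword, plus one parity bit for the extra detected error. The slides do not say which code construction is used, so treat this as an assumption; the function names are hypothetical.

```python
def bch_check_bits(data_bits, t_correct):
    """Check bits for a t-error-correcting, (t+1)-error-detecting code.

    Assumes a binary BCH-style code: t*m correction bits plus 1 parity bit,
    with m the smallest integer such that 2**m - 1 >= data_bits + t*m.
    """
    m = 1
    while (1 << m) - 1 < data_bits + t_correct * m:
        m += 1
    return t_correct * m + 1


def storage_overhead_percent(line_bytes, t_correct):
    """ECC storage as a percentage of the data bits in one line."""
    data_bits = line_bytes * 8
    return 100.0 * bch_check_bits(data_bits, t_correct) / data_bits


for line_bytes in (64, 1024):
    for name, t in (("SECDED", 1), ("5EC6ED", 5), ("10EC11ED", 10)):
        pct = storage_overhead_percent(line_bytes, t)
        print(f"{line_bytes:5d}B {name:9s} {pct:6.2f}%")
```

Under this bound the check bits grow only logarithmically with line size, so spreading one code over a 1KB line instead of a 64B line cuts the relative storage cost by roughly an order of magnitude.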

Page 11: The Partial Read/Write Problem

[Diagram: Scenario 1, Segmented ECC: a 1KB line (64B x16) with ECC x16, feeding ECC processing. Scenario 2, Monolithic ECC: a 1KB line (64B x16) with a single ECC, feeding ECC processing, with Read and Write paths.]

• Entire 1KB line must be read to decode ECC
• 64B write requires read-modify-write
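The two bullets above can be made concrete with a toy model of a monolithic line. This is only a sketch: `zlib.crc32` stands in for the ECC check bits (it detects but does not correct errors), and the class and its byte counter are hypothetical. The point it illustrates is the data movement, not the code itself.

```python
import zlib

LINE_BYTES = 1024  # one cache line
SUB_BYTES = 64     # one sub-block requested by the CPU


class MonolithicLine:
    """A 1KB line protected by a single code computed over the whole line."""

    def __init__(self):
        self.data = bytearray(LINE_BYTES)
        self.code = zlib.crc32(self.data)  # stand-in for the ECC bits
        self.bytes_read = 0                # eDRAM bytes read so far

    def read_sub_block(self, index):
        # The code covers all 1KB, so the whole line is read and checked.
        whole = bytes(self.data)
        self.bytes_read += LINE_BYTES
        if zlib.crc32(whole) != self.code:
            raise ValueError("line failed its check")
        return whole[index * SUB_BYTES:(index + 1) * SUB_BYTES]

    def write_sub_block(self, index, payload):
        if len(payload) != SUB_BYTES:
            raise ValueError("payload must be one 64B sub-block")
        self.read_sub_block(index)                                    # read
        self.data[index * SUB_BYTES:(index + 1) * SUB_BYTES] = payload  # modify
        self.code = zlib.crc32(self.data)                             # write


line = MonolithicLine()
line.write_sub_block(3, b"\x01" * SUB_BYTES)
```

A single 64B write here moves a full 1KB out of the array before anything is written, which is the read-modify-write cost the slide refers to.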

Page 12: Reducing Cache Power with Low-Cost, Multi-Bit Error-Correcting Codes Chris Wilkerson, Alaa R. Alameldeen, Zeshan Chishti, Wei Wu, Dinesh Somasekhar, Shih-Lien.

• After reading/checking a line, lines are guaranteed to be error free for 30us.

• Recently Accessed Line Table (RALT) identifies lines referenced 30us ago.

• Lines identified by RALT don’t require ECC checking, do not require 1KB reads.

RALT Reduces Reads
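The RALT policy described above can be sketched as a small lookup table. This is a simplified illustration, not the hardware design: it uses an unbounded dictionary and explicit timestamps, whereas a real table would have a fixed number of entries. The class name, method names, and the choice not to extend the window on a hit are all assumptions.

```python
class RecentlyAccessedLineTable:
    """Tracks when each line last had a full 1KB read and ECC check."""

    def __init__(self, window_us=30.0):
        self.window_us = window_us
        self._last_checked_us = {}  # line address -> time of last full check

    def access(self, line_addr, now_us):
        """Return "full" if the line needs a 1KB read and ECC check,
        or "partial" if a 64B read without checking is safe."""
        last = self._last_checked_us.get(line_addr)
        if last is None or now_us - last >= self.window_us:
            # Miss or expired entry: check the whole line, restart the window.
            self._last_checked_us[line_addr] = now_us
            return "full"
        # Hit: the line was checked recently. The window is not extended,
        # a conservative choice made for this sketch.
        return "partial"


ralt = RecentlyAccessedLineTable()
first = ralt.access(0x40, now_us=0.0)     # "full"
second = ralt.access(0x40, now_us=10.0)   # "partial"
```

Workloads with temporal locality hit in the table often, so most accesses take the short partial-read path.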

Page 13: Reducing the Cost of ECC Logic

[Figure: same chart as page 10. Stacked bars of ECC Logic and Storage cost, in % normalized to 128MB eDRAM (0 to 30). Bar groups, in order: 64B lines with SECDED (1 cycle), 5EC6ED (6 cycle), 10EC11ED (11 cycle); 1KB lines with 5EC6ED (6 cycle), 10EC11ED (11 cycle); 1KB lines with 5EC6ED (165 cycle), 10EC11ED (170 cycle), labeled Hi-ECC. Assume 1-xor = 10 DRAM bits.]

• Reduced complexity at the cost of high latency.

Page 14: Quick ECC Reduces Latency

[Diagram: the CPU sends an Address to the TAG/ECC ARRAY and the eDRAM. The data passes through Quick ECC, which tests ">1 fail?". No: high-latency processing is skipped. Yes: high-latency ECC processing (~165 cycles).]

• Functionally equivalent to full ECC.
• Optimizes the common case of 1 error or less.
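The two-path decode above amounts to a simple dispatch plus an expected-latency calculation. The sketch below models it under stated assumptions: the error count is passed in directly (real hardware derives it from the syndrome), the slow path runs after the quick path, and the quick-path latency is a caller-supplied parameter because the slide does not give it. Function names are hypothetical.

```python
FULL_ECC_CYCLES = 165  # high-latency path, from the slide


def decode_path(error_count):
    """Quick ECC handles 0 or 1 errors; anything more goes to full ECC."""
    return "quick" if error_count <= 1 else "full"


def expected_decode_cycles(p_multi_bit, quick_cycles,
                           full_cycles=FULL_ECC_CYCLES):
    """Average decode latency when a fraction p_multi_bit of reads
    fall through to the slow path after the quick path has run."""
    return quick_cycles + p_multi_bit * full_cycles


# Hypothetical inputs: 1-cycle quick path, 1 in 1000 reads multi-bit.
average = expected_decode_cycles(p_multi_bit=0.001, quick_cycles=1)
```

As long as multi-bit failures are rare, the average latency stays close to the quick path even though the slow path is two orders of magnitude longer.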

Page 15: Hi-ECC

• Reduce storage costs of ECC:
  – Amortize ECC bits over larger lines
• RALT:
  – Facilitates partial reads
• Quick-ECC:
  – Minimizes latency for error-detect/single-bit correct
  – High latency (165 cycles) for lines with multi-bit failures
• Disable lines with multi-bit failures to further reduce latency
  – 900 out of 128K (< 1%) lines have multi-bit failures
  – Total overhead (disabled lines + ECC code): ~1.6%
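The overhead figures in the last bullet can be cross-checked with simple arithmetic. The 900 disabled lines and 128K lines of 1KB each come from the slide; the number of ECC bits per line is not stated there, so the 71-bit value used below is an assumption (what a BCH-style 5EC6ED code over 8192 data bits would need).

```python
TOTAL_LINES = 128 * 1024   # 128K lines of 1KB each (128MB)
DISABLED_LINES = 900       # lines with multi-bit failures, from the slide
LINE_BITS = 1024 * 8       # data bits in one 1KB line


def disabled_percent():
    """Capacity lost by disabling lines with multi-bit failures."""
    return 100.0 * DISABLED_LINES / TOTAL_LINES


def total_overhead_percent(ecc_bits_per_line):
    """Disabled-line loss plus ECC storage, as a percentage of capacity."""
    ecc_percent = 100.0 * ecc_bits_per_line / LINE_BITS
    return disabled_percent() + ecc_percent


# Assumed 71 check bits per line; not a figure taken from the slides.
overhead = total_overhead_percent(ecc_bits_per_line=71)
```

With that assumption the two terms come to about 0.69% and 0.87%, which together round to the ~1.6% quoted on the slide.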

Page 16: Evaluation

• Intel® Core™ i7-like processor (2GHz)
  – 256KB L2 cache / 128MB eDRAM cache (40-cycle)
• SD: SECDED
  – Reads/writes: 2-cycle latency
• HE: Hi-ECC w/o RALT
  – Reads: 32-cycle latency; writes: read-modify-write
• HER: Hi-ECC w/ RALT
  – RALT miss/hit: 32/2-cycle latency
• Negligible perf impact: ~0.1 – 0.5%
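The 32/2-cycle figures for HER give a simple way to see how the RALT hit rate drives the added read latency. This sketch is a weighted average only; the hit rates in the usage lines are hypothetical, since the slide does not report them, and the function name is an assumption.

```python
RALT_HIT_CYCLES = 2    # from the slide: RALT hit latency
RALT_MISS_CYCLES = 32  # from the slide: RALT miss latency


def average_ecc_read_cycles(ralt_hit_rate):
    """Average added read latency for Hi-ECC with a RALT."""
    if not 0.0 <= ralt_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return (ralt_hit_rate * RALT_HIT_CYCLES
            + (1.0 - ralt_hit_rate) * RALT_MISS_CYCLES)


# Hypothetical hit rates, for illustration only.
no_ralt = average_ecc_read_cycles(0.0)    # behaves like HE: 32 cycles
with_ralt = average_ecc_read_cycles(0.9)  # about 5 cycles
```

A hit rate of zero reproduces the HE configuration, and a hit rate of one matches the 2-cycle SECDED latency, so the RALT moves HER between those two extremes.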

Page 17: Power Analysis

[Figure: eDRAM Power (mW, 0 to 1000), split into Rd/Wr and Refresh, for BASE, SD, HE, and HER across the workload groups ISPEC, FSPEC, GM, OFF, SERV, and GMEAN. SD: SECDED; HE: Hi-ECC; HER: Hi-ECC w/ RALT.]

• HER reduces cache power 92% vs BASE, 61% vs SECDED

Page 18: Conclusions

• Idle power significant for low power systems
• eDRAM refresh ~27% of total power
• ECC can increase refresh time
  – High storage/logic/latency
• Hi-ECC: a practical multi-bit ECC
  – Addresses traditional obstacles to multi-bit ECC
    • Reduces storage/logic overhead
    • Minimizes latency
• 93% reduction in eDRAM refresh power
• Still have a problem with writes…

Page 19: Backup

Page 20: Power Analysis

[Figure: Base eDRAM power (0 to 1000) for BASE, SD, HE, and HER across the workload groups DH, ISPEC, FSPEC, GM, MM, OFF, PROD, SERV, WS, and GMEAN. SD: SECDED; HE: Hi-ECC; HER: Hi-ECC w/ RALT.]

Page 21: CPU idle power

[Figure from A. Naveh, et al., “Power and thermal management in the Intel® Core® Duo processor,” Intel Technology Journal, vol. 10, no. 2, May 2006. Annotated values: ~0.5W, ~1.05W.]

• High-frequency idle enter/exits motivate low-cost transitions (100-200us)
• Low average power motivates aggressive power management.

Bar chart compares DRAM power & CPU power (active, idle). Power vs Refresh time for eDRAM.

Page 22: eDRAM Idle Power

[Figure: Refresh Power for 128MB eDRAM. Refresh Power (0 to 1000) versus Refresh Time (0 to 450 us), with CPU idle power marked. Annotations: Double Refresh Time; 50% Reduction in Refresh Power. Source: A. Naveh, et al., “Power and thermal management in the Intel® Core® Duo processor,” Intel Technology Journal, vol. 10, no. 2, May 2006.]
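The "double refresh time, halve refresh power" relationship implies refresh power is inversely proportional to refresh time. The sketch below applies that model to the deck's baseline of 930mW at 30us; treating the relationship as exactly inverse over the whole range is an assumption, and the function names are hypothetical.

```python
BASE_REFRESH_US = 30.0   # baseline refresh time from the slides
BASE_REFRESH_MW = 930.0  # baseline refresh power from the slides


def refresh_power_mw(refresh_us):
    """Refresh power, assumed inversely proportional to refresh time."""
    return BASE_REFRESH_MW * BASE_REFRESH_US / refresh_us


def refresh_time_for_reduction(reduction):
    """Refresh time (us) needed to cut refresh power by `reduction` (0 to 1)."""
    return BASE_REFRESH_US / (1.0 - reduction)


doubled = refresh_power_mw(60.0)            # 465.0 mW, a 50% reduction
secded = refresh_power_mw(150.0)            # 186.0 mW
needed = refresh_time_for_reduction(0.93)   # roughly 429 us
```

The model gives 186mW at 150us, close to the 185mW quoted for SECDED, and suggests that a 93% reduction corresponds to a refresh time somewhere above 400us.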

Page 23: RALT Reduces Reads

[Diagram: a 1KB line (64B x16) with its ECC feeds ECC processing. The RALT has entries numbered 0, 1, …, 63, each holding line addr, parity, valid, and a 2-bit period. An incoming ADDR is looked up for HIT or MISS; a recency cntr (~30us) drives the "Period?" check that qualifies a HIT.]

• First read requires 1KB read.
• Subsequent reads

Page 24: Remove Failing Bits w/ BitFix

[Diagram labels: Read; Repaired Line; Quick ECC, >1 fail? N; Generate ECC; Write; Physical Line; Failure Locations.]

Page 25: Further Reducing Latency

[Diagram labels: Read; Failing Bits; Repaired Line; Failure Locations; Quick ECC, >1 fail? Y; High latency ECC processing; ECC.]

• Of 128K lines, < 900 require high latency ECC processing.
• Disable lines with high failure rates.
• If too many lines have multi-bit failures…