Dynamic Zero Compression for Cache Energy Reduction
description
Transcript of Dynamic Zero Compression for Cache Energy Reduction
![Page 1: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/1.jpg)
Dynamic Zero Compression
for Cache Energy Reduction
Luis Villa
Michael Zhang
Krste Asanovic
{luisv|rzhang|krste}@lcs.mit.edu
![Page 2: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/2.jpg)
Conventional Cache Structure
Energy Dissipation Bitlines (~75%) Decoders I/O Drivers Wordlines
wl bit bit_b
addr
BUS
Ad
dre
ss D
ecod
er
I/O
![Page 3: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/3.jpg)
Existing Energy Reduction Techniques
Sub-banking
Hierarchical Bitlines
Low-swing BitlinesOnly for reads, writes
performed full swing.
Wordline Gating
I/O
BUS
addr
Ad
dre
ss D
ecod
er
gwl
lwl
Offset Dec.
offset
SRAM Cells
SenseAmps
lwl
Offset Dec.
offset
SRAM Cells
SenseAmps
32128
![Page 4: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/4.jpg)
Asymmetry of Bits in Cache
>70% of the bits in D-cache accesses are “0”s Measured from SPECint95 and MediaBench Examples: small values, data types
Differential bitlines preferred in large SRAM designs. Better Noise Immunity Faster Sensing
Related work with single-ended bitlines [Tseng and Asanovic ’00] --- Used in register file
design with single-ended bitlines. [Chang et. al. ’99] --- Used in ROM and small
RAM with single-ended bitlines.
![Page 5: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/5.jpg)
Dynamic Zero Compression
Zero Indicator Bit One bit per grouping of bits Set if bits are zeros Controls wordline gating
I/O
addr
Ad
dre
ss D
ecod
er
lwl
SRAM Cells
SnsAmpoff dec
Address-controlled
BUS
lwl
SRAM Cells
Sns Amp
ZIB
Data-Controlled
![Page 6: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/6.jpg)
Data Cache Bitline Swing Reduction%
Bit
lin
e S
win
g R
ed
ucti
on
-10
0
10
20
30
40
50
com
p liijp
eg govo
rtex
m88
kgc
cpe
rl
adpc
m_e
n
adpc
m_d
eep
ic
unep
ic
g721
_en
g721
_de
mpe
g_en
mpe
g_de
pegw
it_en
pegw
it_de Avg
wordhalf-wordbytehalf-byte
Calculation includes the bitline swings introduced by ZIB
![Page 7: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/7.jpg)
Hardware Modifications
Zero Indicator Bit
Wordline Gating Circuitry
Sense Amplifier
CPU Store Driver
Cache Output Driver
![Page 8: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/8.jpg)
ZIB and Wordline Gating Circuitry
ZIB
Wordline Gating Circuitry
LWL
BitBWL
Bit_b
ZIB_b
W_EN
I/O
BUS
addr
Ad
dre
ss D
ecod
er
bw
l
SRAM Cells
Sense Amplifiers
ZIB
Cwl
Cwl/4
Small Delay Overhead
![Page 9: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/9.jpg)
Sense Amplifier Modification
BUS
Modified Sense-Amp
Bit Bit_b
zero
Data Bit
Sense-Amp
ZIB ZIB_bsense
ZIB
I/O
addr
Ad
dre
ss D
ecod
er
bw
l
SRAM Cells
Sense Amplifiers
ZIB
Zero-valued data: Not driven onto bus Not in critical path ZIB read w/o delay
![Page 10: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/10.jpg)
CPU Store and Cache Output Drivers
ZIB
W_EN
wri
te d
ata
CPU
LWL
Data Bits
ZIB
Cache
Data Bits
ZIB
Low
-Sw
ing
Bu
s
To WLG
8
88
8
Reduce Data Bus Energy Dissipation
![Page 11: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/11.jpg)
Area Overhead
Area Overhead: 9% Zero-Indicator-Bits Sense Amplifiers WLG Circuitry I/O Circuitry
Byte slice of the sub-bank
(Data,ZIB,WLG)
![Page 12: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/12.jpg)
Delay Overhead
No delay overhead for writes Zero check performed in parallel with tag check
2 F04 gate-delays for reads A pessimistic 7% worst case delay
Data Bits
Low
-Sw
ing
Bu
s
ZIB
LWL
![Page 13: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/13.jpg)
Data Cache Energy Savings%
of
En
erg
y S
avin
gs
Savings obtained for a low-power cache with sub-banking, wordline gating, and low-swing bitlines
0
5
10
15
20
25
30
35
40
45
comp li
ijpeg go
vorte
x
m88k
gcc
perl
adpc
m_en
adpc
m_de
epic
unep
ic
g721
_en
g721
_de
mpeg_
en
mpeg_
de
pegw
it_en
pegw
it_de Avg
![Page 14: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/14.jpg)
Bits Distribution for Instruction Cache
Zeros are not as prevalent in I-Cache. Use a recoding scheme to increase the zero-byte in I-cache. [Panich ’99] --- IWLG technique that compacts the
instructions. Use two-address form when src reg = dest reg Shorter immediates Three different instruction length: short, medium, long Gate the unused portion of the instruction to avoid bitline swing Faster read-out for top two bytes (opcode, reg. acc., inter-locks)
16 7 9Optimal:
lwl
s/m m/l
![Page 15: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/15.jpg)
IWLG to Dynamic Zero Compression
Adopting IWLG technique for Dynamic Zero Compression Small modification on instruction format
Use 8-8-8-8 instead of 16-7-9 Upper two byte are zero-detected Lower two bytes are usage-detected Able to eliminate bitline swings of zero-valued
bytes in 2 upper bytesExample: Opcode 000000
Slower than IWLG due to wordline gating in the critical path
s/m m/l0? 0?8 8 8 8
lwl
![Page 16: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/16.jpg)
Instruction Cache Bit Swing Reduction%
Red
ucti
on
of
Bit
lin
e
Sw
ing
s
0
5
10
15
20
25
30
35 byte w/o recodingbyte w/ recodingIWLG
![Page 17: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/17.jpg)
Instruction Cache Energy Savings%
En
erg
y S
avin
gs
0
5
10
15
20
25 byte w/o recodingbyte w/ recodingIWLG
![Page 18: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/18.jpg)
Conclusion
A novel hardware technique to reduce cache energy by eliminating the access of zero bytes. Small area and delay overhead
Area: 9%, Delay: 2 F04 gate-delays Average energy saving: D-Cache: 26%, I-
Cache:18%Processor wide: ~10% for typical embedded processors
Completely orthogonal to existing energy reduction techniques
Dynamic Zero Compression is applicable to Second level caches DRAM Datapath [Canal et. al. Micro-33]
![Page 19: Dynamic Zero Compression for Cache Energy Reduction](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813fd6550346895daab911/html5/thumbnails/19.jpg)
Thank You!
http://www.cag.lcs.mit.edu/scale/