Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines. Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt. HPCA 2007

Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines

Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt

HPCA 2007

Introduction

Caches are organized at linesize granularity

Helps when spatial locality is high

Unused words when spatial locality is low

Unused words occupy space without contributing to cache hits

Filtering unused words allows cache to store more cache lines

Problem: Not all words are useful

On average, fewer than 60% of the words in a line are used (4.7 of 8)

Cache line (64B) divided into 8 words of 8B each (1 MB 8-way L2 cache)

[Figure: Words used per line (avg)]

Goal: Improving cache performance

Smaller linesize can result in fewer unused words

Smaller linesize degrades cache performance

Linesize of 32B increases MPKI for 14 of 16 benchmarks

Average MPKI increases by 25%

Insight: Word usage stabilizes as a line moves from MRU to LRU

Goal: Improving cache performance by filtering unused words

Insight

Footprint = 8 bits per line that track word usage

Most footprint updates occur early in the recency stack

[Figure: Max recency position before footprint update, across the recency stack (MRU, Pos 1 to Pos 6, LRU); labeled values: 78%, 5%, 6%, 11%]

Line Distillation (LDIS): Evict unused words when a line crosses a certain recency position
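The footprint idea can be illustrated with a small sketch. It assumes the 64B line and 8B word sizes from the earlier slide; the function names (`word_index`, `touch`, `used_words`, `distill`) are hypothetical and only show how an 8-bit mask could record which words were touched and which words would survive distillation.

```python
LINE_BYTES = 64   # line size from the slides
WORD_BYTES = 8    # word size from the slides
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES  # 8, so the footprint fits in 8 bits


def word_index(addr: int) -> int:
    """Index (0..7) of the 8B word that a byte address touches within its line."""
    return (addr % LINE_BYTES) // WORD_BYTES


def touch(footprint: int, addr: int) -> int:
    """Return the footprint with the bit for the accessed word set."""
    return footprint | (1 << word_index(addr))


def used_words(footprint: int) -> list[int]:
    """Word indices whose footprint bit is set."""
    return [w for w in range(WORDS_PER_LINE) if footprint & (1 << w)]


def distill(footprint: int, line: list[int]) -> dict[int, int]:
    """Keep only the used words of a line (word index -> word value)."""
    return {w: line[w] for w in used_words(footprint)}


# Two accesses fall in word 0 and one in word 7 of the same line.
fp = 0
for addr in (0x1000, 0x1004, 0x1038):
    fp = touch(fp, addr)
```

With these accesses the footprint has bits 0 and 7 set, so distilling the line would keep two of its eight words.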

Outline

Background

Line Distillation

Experimental Evaluation

Interaction with Compression

Related Work and Summary

Framework for LDIS

[Diagram: PROCESSOR with ICACHE and DCACHE above the L2 cache. The L2 is a Distill Cache made of a LOC (Line Organized Cache) and a WOC (Word Organized Cache). Other labels: footprint; valid bits (sectored); line from memory.]

Distill Cache (Operation)

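Traditional cache (4-way) compared with the distill cache (LOC + WOC)

[Diagram: a set holding lines A, B and C, with MRU and LRU positions marked; after the access to line D, the LOC holds D and the WOC holds A0, A7.]

Four cases:

1. Cache Miss (access to line D): Install line D in LOC and update LRU state. For the evicted line A, check which words were used (A0, A7 used): evict A[1:6] and install A0, A7 in WOC.

2. LOC Hit (access to line B): Same as traditional cache.

3. WOC Hit (access to line A, word A0): Send A0 and A7 to L1 along with the valid bits.

4. Hole Miss (access to line A, word A1): Invalidate all words of A in WOC. Fetch A from memory and install in LOC.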
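The four cases can be sketched as a toy model of a single set. This is a minimal illustration under stated assumptions, not the hardware design: the class `DistillSet` and its fields are hypothetical, the WOC is treated as unbounded (real WOC replacement is not modeled), and the L1 valid bits are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DistillSet:
    """One set: the LOC holds whole lines, the WOC holds only used words."""
    loc_ways: int
    loc: list[str] = field(default_factory=list)             # line tags, MRU first
    footprints: dict[str, set[int]] = field(default_factory=dict)
    woc: dict[str, set[int]] = field(default_factory=dict)   # tag -> kept word indices

    def access(self, tag: str, word: int) -> str:
        if tag in self.loc:                      # LOC hit: same as a traditional cache
            self.loc.remove(tag)
            self.loc.insert(0, tag)
            self.footprints[tag].add(word)
            return "LOC hit"
        if tag in self.woc:
            if word in self.woc[tag]:            # WOC hit: the word survived distillation
                return "WOC hit"
            del self.woc[tag]                    # hole miss: invalidate the partial line
            self._install(tag, word)             # and refetch the whole line into the LOC
            return "hole miss"
        self._install(tag, word)                 # plain cache miss
        return "cache miss"

    def _install(self, tag: str, word: int) -> None:
        if len(self.loc) == self.loc_ways:       # distill the LRU line into the WOC
            victim = self.loc.pop()
            self.woc[victim] = self.footprints.pop(victim)
        self.loc.insert(0, tag)
        self.footprints[tag] = {word}


s = DistillSet(loc_ways=3)
for tag, word in (("A", 0), ("A", 7), ("C", 0), ("B", 0)):
    s.access(tag, word)                          # warm up: A is now LRU with words 0 and 7 used

results = [s.access("D", 0), s.access("B", 1), s.access("A", 0), s.access("A", 1)]
```

The last four accesses reproduce the slide's sequence: a miss to D that distills A down to A0 and A7, a LOC hit to B, a WOC hit to A0, and a hole miss to A1.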

Median Threshold Filtering

A line with many used words can evict several lines from WOC

WOC before: A0 B0 C0 D0 E0 F0 G0 H0

Line X has all 8 words used

WOC after: X0 X1 X2 X3 X4 X5 X6 X7

8 lines evicted from WOC

Increase lines in WOC by not installing lines for which used words > threshold “K”

K = median words used in LOC line (computed at runtime)
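A minimal sketch of the filtering rule follows. The slides only say that K is the median computed at runtime; computing it with `statistics.median_low` over a list of recent used-word counts is an assumption made here for illustration, and both function names are hypothetical.

```python
from statistics import median_low


def median_threshold(used_counts: list[int]) -> int:
    """K: median number of used words among recently observed LOC lines."""
    return median_low(used_counts)


def install_in_woc(used_words: int, k: int) -> bool:
    """Skip lines whose used-word count exceeds the threshold K."""
    return used_words <= k


# Hypothetical used-word counts for lines leaving the LOC.
observed = [1, 2, 2, 3, 8, 8, 1]
k = median_threshold(observed)
decisions = {n: install_in_woc(n, k) for n in (1, 2, 8)}
```

In this example a line with all 8 words used is kept out of the WOC, so it cannot displace eight single-word lines as in the slide's scenario.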

Methodology

Configuration:

L2 cache: 1MB 8-way 64B linesize

(Distill cache gives 6 ways to LOC and 2 ways to WOC)

Out-of-order processor with 16KB 2-way L1s

400 cycle memory

Benchmarks:

15 SPEC2K benchmarks + health from the Olden suite

(A 250M instruction slice using SimPoint for SPEC2K)
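The cache geometry implied by this configuration can be worked out with a few lines of arithmetic. The constants come from the slides; the variable names are illustrative, and the final figure is only an upper bound that assumes every distilled line keeps a single word and that WOC tags are not a limit.

```python
CACHE_BYTES = 1 << 20        # 1MB L2
WAYS = 8
LINE_BYTES = 64
WORD_BYTES = 8
LOC_WAYS, WOC_WAYS = 6, 2    # split used by the distill cache

sets = CACHE_BYTES // (WAYS * LINE_BYTES)
words_per_line = LINE_BYTES // WORD_BYTES
loc_bytes = sets * LOC_WAYS * LINE_BYTES
woc_bytes = sets * WOC_WAYS * LINE_BYTES
woc_word_slots_per_set = WOC_WAYS * words_per_line

# Upper bound on lines per set if each distilled line kept exactly one word.
max_lines_per_set = LOC_WAYS + woc_word_slots_per_set
```

This gives 2048 sets, 768KB of LOC data, 256KB of WOC data, and 16 word slots per set in the WOC.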

Results

[Figure: Reduction in L2 MPKI (%); series: LDIS (No MT) and LDIS (with MT)]

LDIS (MT) reduces MPKI by 25%

Reverter Circuit (RC)

Tournament selection: Distill cache vs. traditional cache

Dynamic set sampling with 32 sets [Qureshi+ ISCA’06]

[Diagram: an ATD-LRU and the distill cache, with sets A through H; sets B, E and G are the sampled sets that update the counter SCTR (- / +).]

For sets A, C, D, F, H:

if (SCTR > 75%) Enable LDIS

if (SCTR < 25%) Disable LDIS

(storage overhead of ATD: 1KB)
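The enable/disable rule can be sketched as a saturating counter with hysteresis. Several details below are assumptions rather than facts from the slides: the counter width, the starting value, the initial enabled state, and the update direction (a miss in the ATD-LRU counts in favor of LDIS, a miss in a sampled distill-cache set counts against it). The class name `Reverter` is hypothetical.

```python
class Reverter:
    """Toy model of the SCTR policy: enable above 75%, disable below 25%."""

    def __init__(self, bits: int = 10) -> None:   # counter width is an assumption
        self.max = (1 << bits) - 1
        self.sctr = self.max // 2                 # start in the middle (assumption)
        self.ldis_enabled = True                  # initial state is an assumption

    def record_miss(self, in_atd_lru: bool) -> None:
        # Assumed direction: ATD-LRU misses increment, distill-cache misses decrement.
        if in_atd_lru:
            self.sctr = min(self.max, self.sctr + 1)
        else:
            self.sctr = max(0, self.sctr - 1)
        if self.sctr > 0.75 * self.max:
            self.ldis_enabled = True
        elif self.sctr < 0.25 * self.max:
            self.ldis_enabled = False
        # Between 25% and 75% the previous decision is kept.


r = Reverter(bits=4)                 # small counter (max 15) to keep the example short
for _ in range(4):
    r.record_miss(in_atd_lru=False)  # distill cache keeps missing in sampled sets
disabled_after_distill_misses = not r.ldis_enabled

for _ in range(8):
    r.record_miss(in_atd_lru=True)   # traditional cache now misses more
still_disabled_in_band = not r.ldis_enabled

r.record_miss(in_atd_lru=True)
enabled_again = r.ldis_enabled
```

The band between the two thresholds keeps the decision from flipping on every miss, which is one plausible reason for using two thresholds instead of one.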

Results with RC

[Figure: Reduction in L2 MPKI (%); series: LDIS (MT, No RC) and LDIS (MT, RC)]

RC disables LDIS when it increases MPKI.

LDIS (MT, RC) reduces MPKI by 30%

Overheads

Storage: Tags for WOC + footprint bits: 12.2% overhead

Latency: Tag access (LOC+WOC) increases by one cycle; WOC hits incur two cycles to rearrange words

Power: Additional power of WOC tag-store

IPC Results

LDIS improves average IPC by 12%

[Figure: IPC Improvement (%)]

Compression vs. LDIS

Several proposals to increase capacity via compression

Compression and LDIS are fundamentally different

Compression exploits redundancy in stored data

LDIS leverages unused words for spare capacity

Footprint Aware Compression (FAC) combines both

FAC compresses used words before installing in WOC

Results for FAC

Compression and LDIS interact positively.

FAC reduces MPKI by 50%

[Figure: Reduction in L2 MPKI (%), axis from 0 to 50; series: LDIS, Compression, FAC]

Related work

Spatial-Temporal Cache: Gonzales+ [ICS’95]

Spatial Locality Prediction: Johnson+ [ISCA’97]

Variable Linesize Cache: Veidenbaum+ [ICS’99]

Spatial Footprint Prediction: Kumar+ [ISCA’98], Pujara+ [HPCA’06]

Spatial Pattern Prediction: Chen+ [HPCA’05]

LDIS is particularly suited for large caches and outperforms predictor-based techniques without requiring a separate structure for tracking spatial footprint

Contributions

Line Distillation: Filter unused words without a separate footprint predictor

Distill cache: Utilize extra capacity created by LDIS

Median Threshold Filtering and Reverter Circuit: Improve performance and robustness of LDIS

Result: LDIS (MT+RC) reduces MPKI by 30%

Footprint Aware Compression: LDIS + compression

Result: FAC reduces MPKI by 50%

Questions

Result comparing capacity

Line Size vs. MPKI

Distribution of Hit-Miss

Average words usage (detailed)

Result for 3 types of LDIS

Replacement

LRU in LOC

WOC needs variable sized replacement

Only power-of-two sizes allowed in WOC

Placement constrained to alignment boundary

Random selection in case of multiple candidates
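The placement constraints above can be sketched as follows. This is an illustration under assumptions: each WOC way is taken to hold 8 words (one 64B line's worth), every aligned slot is treated as a candidate, and the helper names (`slot_size`, `candidate_slots`, `pick_slot`) are hypothetical. How candidates are actually narrowed down before the random choice is not specified on the slide.

```python
import random

WOC_WAY_WORDS = 8  # assumed: one WOC way stores 8 words of 8B each


def slot_size(used_words: int) -> int:
    """Round the used-word count (1..8) up to the next power of two."""
    size = 1
    while size < used_words:
        size *= 2
    return size


def candidate_slots(used_words: int, ways: int = 2) -> list[tuple[int, int]]:
    """All (way, start word) positions aligned to the slot size."""
    size = slot_size(used_words)
    return [(way, start)
            for way in range(ways)
            for start in range(0, WOC_WAY_WORDS, size)]


def pick_slot(used_words: int, rng: random.Random, ways: int = 2) -> tuple[int, int]:
    """Choose one aligned candidate at random."""
    return rng.choice(candidate_slots(used_words, ways))


rng = random.Random(0)      # seeded only to make the example repeatable
chosen = pick_slot(3, rng)  # a line with 3 used words needs a 4-word slot
```

A line with 3 used words is rounded up to a 4-word slot, which can start only at word 0 or word 4 of a way, giving four candidates across two WOC ways.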

Background (pictorial)

Result LDIS vs. FAC (detailed)

Comparison with SFP

Appendix A: Other SPEC Benchmarks

Appendix B: Cache Size vs. Density

Summary

Many words in cache lines remain unused

Unused words are unlikely to be accessed in the less recent part of the LRU stack, which motivates Line Distillation (LDIS)

Distill-cache utilizes extra capacity created by LDIS

LDIS reduces MPKI by 30% and improves IPC by 12%

“Footprint Aware Compression” combines LDIS and compression to reduce MPKI by 50%