CDA 5155: Associativity in Caches (Lecture 25)

Description: CDA 5155, Associativity in Caches, Lecture 25. New Topic: Memory Systems. Cache 101 – review of undergraduate material; associativity and other organization issues; advanced designs and interactions with pipelines; tomorrow's cache design (power/performance); advances in memory design.

Transcript of CDA 5155

Page 1: CDA 5155

CDA 5155

Associativity in Caches

Lecture 25

Page 2: CDA 5155

New Topic: Memory Systems

24. Cache 101 – review of undergraduate material
25. Associativity and other organization issues
26. Advanced designs and interactions with pipelines
27. Tomorrow's cache design (power/performance)
28. Advances in memory design
29. Virtual memory (and how to do it fast)

Page 3: CDA 5155

Direct-mapped cache

[Diagram: a small direct-mapped cache (V, d, tag, data per line) in front of a word-addressed memory. The example address 01011 is decomposed into a Tag (2-bit), a Line Index (2-bit), and a Block Offset (1-bit).]

Compulsory Miss: first reference to a memory block
Capacity Miss: working set doesn't fit in the cache
Conflict Miss: working set maps to the same cache line
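The slide's address decomposition can be sketched in C (a minimal model, assuming the slide's 5-bit addresses with a 2-bit tag, 2-bit line index, and 1-bit block offset):

```c
#include <stdint.h>

/* Field widths from the slide's example: a 5-bit address splits into
   a 2-bit tag, a 2-bit line index, and a 1-bit block offset. */
enum { OFFSET_BITS = 1, INDEX_BITS = 2 };

static uint32_t block_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1);
}

static uint32_t line_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}

static uint32_t tag_bits(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

For the slide's address 01011, the tag is 01, the line index is 01, and the block offset is 1.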

Page 4: CDA 5155

2-way set associative cache

[Diagram: the same memory and cache contents, now organized as a 2-way set associative cache (V, d, tag, data per way). The example address 01101 is decomposed into a Larger (3-bit) Tag, a 1-bit Set Index, and an unchanged Block Offset.]

Rule of thumb: increasing associativity decreases conflict misses. A 2-way set associative cache has about the same hit rate as a direct-mapped cache of twice the size.
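The 2-way lookup on this slide can be modeled in C (an illustrative sketch; the Way/Set types and field widths are mine, chosen to match the slide's 5-bit addresses):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative 2-way set associative model matching the slide:
   1-bit block offset, 1-bit set index, 3-bit tag. */
typedef struct { bool valid; uint32_t tag; } Way;
typedef struct { Way way[2]; } Set;

static bool lookup(Set sets[], uint32_t addr) {
    uint32_t set = (addr >> 1) & 1;   /* 1-bit set index */
    uint32_t tag = addr >> 2;         /* 3-bit tag */
    for (int w = 0; w < 2; w++)       /* both ways are compared in parallel in hardware */
        if (sets[set].way[w].valid && sets[set].way[w].tag == tag)
            return true;              /* hit */
    return false;                     /* miss */
}
```

Note the trade: one less index bit than the direct-mapped version, so each tag is one bit larger and two comparators are needed.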

Page 5: CDA 5155

Effects of Varying Cache Parameters

• Total cache size = block size × # sets × associativity
  – Positives:
    • Should decrease miss rate
  – Negatives:
    • May increase hit time
    • Increased area requirements
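The size relation in the first bullet, written out (a trivial helper; the parameter names are mine):

```c
/* Total data capacity in bytes:
   block size (bytes) x number of sets x associativity. */
static unsigned long cache_bytes(unsigned block_size, unsigned num_sets,
                                 unsigned assoc) {
    return (unsigned long)block_size * num_sets * assoc;
}
```

For example, 64-byte blocks, 64 sets, and 4 ways give a 16 KB cache.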

Page 6: CDA 5155

Effects of Varying Cache Parameters

• Bigger block size
  – Positives:
    • Exploit spatial locality; reduce compulsory misses
    • Reduce tag overhead (bits)
    • Reduce transfer overhead (address, burst data mode)
  – Negatives:
    • Fewer blocks for a given size; increase conflict misses
    • Increase miss transfer time (multi-cycle transfers)
    • Wasted bandwidth for non-spatial data

Page 7: CDA 5155

Effects of Varying Cache Parameters

• Increasing associativity
  – Positives:
    • Reduces conflict misses
    • Low-associativity caches can have pathological behavior (very high miss rates)
  – Negatives:
    • Increased hit time
    • More hardware required (comparators, muxes, bigger tags)
    • Diminishing improvement past 4- or 8-way

Page 8: CDA 5155

Effects of Varying Cache Parameters

• Replacement strategy (for associative caches): how is the evicted line chosen?
  1. LRU: intuitive; difficult to implement at high associativity; worst-case behavior possible (e.g., cycling through an array one element larger than the cache)
  2. Random: pseudo-random is easy to implement; performance close to LRU at high associativity
  3. Optimal (Belady) replacement: evict the block whose next reference is farthest in the future; hard to implement
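True LRU can be sketched with per-way timestamps (a software model for illustration; hardware at high associativity typically uses pseudo-LRU approximations instead):

```c
#include <stdint.h>

/* Choose the eviction victim in one set: the way whose last
   access is oldest (smallest "last used" timestamp). */
static int lru_victim(const uint64_t last_used[], int ways) {
    int victim = 0;
    for (int w = 1; w < ways; w++)
        if (last_used[w] < last_used[victim])
            victim = w;
    return victim;
}
```

The cost that makes this hard in hardware is maintaining the timestamps (or an ordering) on every access, not the victim selection itself.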

Page 9: CDA 5155

Other Cache Design Decisions

• Write policy: how to deal with write misses?
  – Write-through / no-allocate
    • Total traffic? read misses × block size + writes
    • Common for L1 caches backed by an L2 (esp. on-chip)
  – Write-back / write-allocate
    • Needs a dirty bit to determine whether the cached data differs from memory
    • Total traffic? (read misses + write misses) × block size + dirty-block evictions × block size
    • Common for L2 caches (memory-bandwidth limited)
  – Variation: write-validate
    • Write-allocate without fetch-on-write
    • Needs a sub-blocked cache with valid bits for each word/byte
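The two traffic formulas above can be written out directly (illustrative helpers; units are bytes, and word_size is the size of each store for the write-through case):

```c
/* Write-through / no-allocate: fetch a block per read miss,
   plus every store goes through to memory. */
static unsigned long wt_traffic(unsigned long read_misses, unsigned long writes,
                                unsigned block_size, unsigned word_size) {
    return read_misses * block_size + writes * word_size;
}

/* Write-back / write-allocate: fetch a block per miss of either kind,
   plus write back a block per dirty eviction. */
static unsigned long wb_traffic(unsigned long read_misses,
                                unsigned long write_misses,
                                unsigned long dirty_evictions,
                                unsigned block_size) {
    return (read_misses + write_misses + dirty_evictions) * block_size;
}
```

Which is cheaper depends on the workload: many small stores to the same line favor write-back; sparse stores with few re-writes favor write-through.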

Page 10: CDA 5155

Other Cache Design Decisions

• Write buffering
  – Delay writes until bandwidth is available
    • Put them in a FIFO buffer
    • Only stall on a write if the buffer is full
    • Use bandwidth for reads first (since they have latency problems)
  – Important for write-through caches, since write traffic is frequent
• Write-back buffer
  – Holds evicted (dirty) lines for write-back caches
  – Also allows reads to have priority on the L2 or memory bus
  – Usually only needs to be a small buffer

Page 11: CDA 5155

Adding a Victim cache

[Diagram: a direct-mapped cache (V, d, tag, data per line) paired with a small fully associative victim cache of 4 lines, exercised by references 11010011 and 01010011.]

A small victim cache adds associativity to "hot" lines:

• Blocks evicted from the direct-mapped cache go to the victim cache
• Tag compares are made against both the direct-mapped cache and the victim cache
• A victim-cache hit causes the lines to swap between L1 and the victim cache
• Not very useful for associative L1 caches
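The victim-cache probe can be sketched as a tag search over the four entries (an illustrative model; the swap on a hit is noted but not shown):

```c
#include <stdbool.h>
#include <stdint.h>

/* A fully associative 4-entry victim cache holding lines evicted
   from the direct-mapped L1; tags here are full line addresses. */
typedef struct { bool valid; uint32_t line_addr; } VictimEntry;

static int victim_lookup(const VictimEntry vc[4], uint32_t line_addr) {
    for (int i = 0; i < 4; i++)        /* all four tags compared in parallel in HW */
        if (vc[i].valid && vc[i].line_addr == line_addr)
            return i;                   /* hit: swap this line with the L1 line */
    return -1;                          /* miss: go to the next level */
}
```

Because the victim cache is probed alongside the L1 lookup, a victim hit costs little more than a normal hit.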

Page 12: CDA 5155

Hash-Rehash Cache

[Diagram: a direct-mapped cache (V, d, tag, data per line) about to service a short reference sequence (animation start).]

Page 13: CDA 5155

Hash-Rehash Cache

[Diagram: reference 11010011 misses at its primary location, and the rehash location also misses ("Miss", "Rehash miss"), so the line is allocated ("Allocate?").]

Page 14: CDA 5155

Hash-Rehash Cache

[Diagram: after the double miss on 11010011, the fetched line is installed and the rehash (R) bit is updated.]

Page 15: CDA 5155

Hash-Rehash Cache

[Diagram: a later reference (11000011) misses at its primary location but hits at the rehash location ("Miss", "Rehash Hit!").]

Page 16: CDA 5155

Hash-Rehash Cache

• Calculating performance:
  – Primary hit time (normal direct-mapped)
  – Rehash hit time (sequential tag lookups)
  – Block swap time?
  – Hit rate comparable to a 2-way set associative cache
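Under assumed probe costs, the average hit time can be estimated as follows (an illustrative calculation; the hit fractions and probe time are hypothetical parameters, not figures from the lecture):

```c
/* Average time per hit when rehash hits pay a second, sequential
   probe: weight each kind of hit by its share of all hits. */
static double hr_hit_time(double primary_hits, double rehash_hits,
                          double probe_time) {
    return (primary_hits * probe_time + rehash_hits * 2.0 * probe_time)
           / (primary_hits + rehash_hits);
}
```

For example, if 90% of hits are primary and a probe takes 1 cycle, the average hit costs 1.1 cycles, versus a uniformly slower hit time for a true 2-way cache.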

Page 17: CDA 5155

Compiler support for caching

• Array merging (array of structs vs. two arrays)
• Loop interchange (row vs. column access)
• Structure padding and alignment (malloc)
• Cache-conscious data placement
  – Pack the working set into the same line
  – Map to non-conflicting addresses if packing is impossible
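Loop interchange from the second bullet, in C: the first version strides down columns of a row-major array and touches a new cache line on almost every access; the interchanged version walks rows and uses every word of each fetched line. Both compute the same sum (the size N is an arbitrary illustrative choice).

```c
#define N 64   /* small size for illustration */

/* Column-major walk of a row-major array: poor spatial locality. */
double sum_by_columns(double a[N][N]) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* Interchanged loops: row-major walk, good spatial locality. */
double sum_by_rows(double a[N][N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}
```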

Page 18: CDA 5155

Prefetching

• Already done: bring in an entire line, assuming spatial locality
• Extend this… next-line prefetch
  – Bring in the next block in memory as well as the missed line (very good for the I-cache)
• Software prefetch
  – Loads to R0 have no data dependency
• Aggressive/speculative prefetch is useful for L2
• Speculative prefetch is problematic for L1
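A software-prefetch sketch using GCC/Clang's __builtin_prefetch (the distance of 8 elements is an arbitrary illustrative choice; the slide's "load to R0" idiom achieves the same effect on ISAs with a hardwired zero register):

```c
/* Scale an array, hinting the cache to start fetching ahead of use. */
void scale(double *a, long n, double k) {
    for (long i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&a[i + 8]);  /* hint only: no fault, no dependency */
        a[i] *= k;
    }
}
```

Like the load-to-R0 trick, the prefetch cannot fault and nothing depends on its result, so a wrong guess costs only bandwidth.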

Page 19: CDA 5155

Calculating the Effects of Latency

• Does a cache miss reduce performance?
  – It depends on whether critical instructions are waiting for the result

Page 20: CDA 5155

Calculating the Effects of Latency

– It depends on whether critical resources are held up
– Blocking: when a miss occurs, all later references to the cache must wait. This is a resource conflict.
– Non-blocking: allows later references to access the cache while a miss is being processed.
  • Generally there is some limit to how many outstanding misses can be bypassed.
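For a blocking cache, the effect of latency reduces to the textbook average-memory-access-time formula (a standard estimate, not a figure from the lecture; non-blocking caches overlap part of the penalty, so this is a worst case for them):

```c
/* AMAT = hit time + miss rate * miss penalty (all in cycles). */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```

For example, a 1-cycle hit, a 5% miss rate, and a 20-cycle penalty give an AMAT of 2 cycles.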