CDA 5155
Transcript of CDA 5155
Associativity in Caches
Lecture 25
New Topic: Memory Systems
24. Cache 101 – review of undergraduate material
25. Associativity and other organization issues
26. Advanced designs and interactions with pipelines
27. Tomorrow's cache design (power/performance)
28. Advances in memory design
29. Virtual memory (and how to do it fast)
Direct-mapped cache
[Figure: a direct-mapped cache (V / d / tag / data per line) beside a small byte-addressed memory. The example address 01011 splits into a 2-bit Tag, a 2-bit Line Index, and a 1-bit Block Offset.]
Compulsory Miss: first reference to a memory block
Capacity Miss: the working set doesn't fit in the cache
Conflict Miss: the working set maps to the same cache line
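As a concrete version of the address split in the figure, here is a minimal C sketch using the slide's field widths (5-bit address; the variable names are mine):

    #include <stdio.h>

    /* Decompose a 5-bit address into the direct-mapped example's
     * fields: 2-bit tag | 2-bit line index | 1-bit block offset. */
    int main(void) {
        unsigned addr   = 0x0B;              /* 01011, the example address */
        unsigned offset =  addr       & 0x1; /* bit 0    : block offset    */
        unsigned index  = (addr >> 1) & 0x3; /* bits 1..2: line index      */
        unsigned tag    = (addr >> 3) & 0x3; /* bits 3..4: tag             */
        printf("tag=%u index=%u offset=%u\n", tag, index, offset);
        return 0;
    }

For 01011 this prints tag=1 index=1 offset=1: the line at index 01 is checked, and the access hits only if that line is valid and its stored tag equals 01.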
2-way set associative cache
[Figure: the same cache reorganized as 2-way set associative. The example address 01101 now splits into a larger 3-bit Tag, a 1-bit Set Index, and an unchanged Block Offset.]
Rule of thumb: increasing associativity decreases conflict misses. A 2-way set-associative cache has about the same hit rate as a direct-mapped cache of twice the size.
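The effect is easy to demonstrate with a toy simulation: two addresses that share index bits thrash a direct-mapped cache but coexist in one set of a 2-way cache. This is my own sketch, assuming a 4-line cache and round-robin fill:

    #include <stdio.h>

    #define LINES 4   /* total lines in both organizations */

    /* Returns 1 on a hit; on a miss, fills a way round-robin.
     * tags[] holds one block address per line; -1u means invalid. */
    static int access_cache(unsigned tags[], int ways, unsigned addr,
                            unsigned *rr) {
        int sets = LINES / ways;
        int set  = addr % sets;                    /* index bits */
        for (int w = 0; w < ways; w++)
            if (tags[set * ways + w] == addr) return 1;
        tags[set * ways + (*rr)++ % ways] = addr;  /* round-robin fill */
        return 0;
    }

    int main(void) {
        unsigned dm[LINES], sa[LINES], rr1 = 0, rr2 = 0;
        for (int i = 0; i < LINES; i++) dm[i] = sa[i] = -1u;
        int hits_dm = 0, hits_sa = 0;
        for (int i = 0; i < 10; i++) {
            unsigned addr = (i % 2) ? 4 : 0;   /* 0 and 4 conflict */
            hits_dm += access_cache(dm, 1, addr, &rr1);
            hits_sa += access_cache(sa, 2, addr, &rr2);
        }
        printf("direct-mapped hits: %d, 2-way hits: %d\n",
               hits_dm, hits_sa);
        return 0;
    }

The alternating references miss every time in the direct-mapped version (0 hits) but hit 8 of 10 times in the 2-way version, since both blocks fit in the same set.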
Effects of Varying Cache Parameters
• Total cache size = block size × #sets × associativity
  – Positives:
    • Should decrease miss rate
  – Negatives:
    • May increase hit time
    • Increased area requirements
Effects of Varying Cache Parameters
• Bigger block size
  – Positives:
    • Exploits spatial locality; reduces compulsory misses
    • Reduces tag overhead (bits)
    • Reduces transfer overhead (address, burst data mode)
  – Negatives:
    • Fewer blocks for a given size; increases conflict misses
    • Increases miss transfer time (multi-cycle transfers)
    • Wastes bandwidth on non-spatial data
Effects of Varying Cache Parameters
• Increasing associativity
  – Positives:
    • Reduces conflict misses
    • Avoids the pathological behavior (very high miss rates) a low-associativity cache can exhibit
  – Negatives:
    • Increased hit time
    • More hardware (comparators, muxes, bigger tags)
    • Diminishing improvement past 4- or 8-way
Effects of Varying Cache Parameters
• Replacement strategy (for associative caches)
  – How is the evicted line chosen?
    1. LRU: intuitive; difficult to implement at high associativity; worst-case behavior exists, e.g. looping over an (N+1)-element array in an N-way set (a sketch follows this list)
    2. Random: pseudo-random is easy to implement; performance is close to LRU at high associativity
    3. Optimal (Belady) replacement: evict the block whose next reference is farthest in the future; hard to implement, since it requires knowing the future
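A minimal sketch of true LRU for a single set, using age counters (my own toy encoding, not a hardware design):

    #define WAYS 4

    /* age 0 = most recently used ... WAYS-1 = least recently used;
     * the ages always form a permutation of 0..WAYS-1. */
    static unsigned age[WAYS] = {0, 1, 2, 3};

    static void touch(int way) {            /* promote `way` to MRU */
        for (int w = 0; w < WAYS; w++)
            if (age[w] < age[way]) age[w]++;
        age[way] = 0;
    }

    static int lru_victim(void) {           /* way with the largest age */
        for (int w = 0; w < WAYS; w++)
            if (age[w] == WAYS - 1) return w;
        return 0;                           /* unreachable if ages stay a permutation */
    }

The counters make the cost visible: every access updates state for the whole set, which is why true LRU gets expensive at high associativity and approximations such as tree pseudo-LRU are used instead.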
Other Cache Design Decisions
• Write policy: how to deal with write misses?
  – Write-through / no-allocate
    • Total traffic? read misses × block size + writes
    • Common for L1 caches backed by an L2 (esp. on-chip)
  – Write-back / write-allocate
    • Needs a dirty bit to record whether the cached data differs from memory
    • Total traffic? (read misses + write misses) × block size + dirty-block evictions × block size (worked example below)
    • Common for L2 caches (memory-bandwidth limited)
  – Variation: write-validate
    • Write-allocate without fetch-on-write
    • Needs a sub-blocked cache with valid bits for each word/byte
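A worked instance of the two traffic formulas (the counts are made-up illustrative numbers, not measurements):

    #include <stdio.h>

    int main(void) {
        long block = 32;                 /* bytes per block */
        long rd_miss = 1000, wr_miss = 400;
        long writes  = 5000, dirty_evict = 300;

        /* Write-through / no-allocate: read misses fetch a block;
         * every write also goes to the next level (4 B words here). */
        long wt = rd_miss * block + writes * 4;

        /* Write-back / write-allocate: both kinds of miss fetch a
         * block; dirty evictions write one back. */
        long wb = (rd_miss + wr_miss) * block + dirty_evict * block;

        printf("write-through: %ld B, write-back: %ld B\n", wt, wb);
        return 0;
    }

With these numbers write-through moves 52,000 B and write-back 54,400 B; which policy wins depends entirely on the write/miss mix.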
Other Cache Design Decisions
• Write buffering
  – Delay writes until bandwidth is available
    • Put them in a FIFO buffer
    • Only stall on a write if the buffer is full
    • Use the bandwidth for reads first, since reads have latency problems (a sketch follows)
  – Important for write-through caches, since write traffic is frequent
• Write-back buffer
  – Holds evicted (dirty) lines for write-back caches
  – Also allows reads priority on the L2 or memory bus
  – Usually only needs a small buffer
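A minimal ring-buffer sketch of the FIFO write buffer described above (sizes and names are mine):

    #define WBUF 8                      /* a small buffer suffices */

    struct wentry { unsigned addr, data; };
    static struct wentry wbuf[WBUF];
    static int head, tail, count;

    /* Returns 0 (stall) only when the buffer is full. */
    static int buffer_write(unsigned addr, unsigned data) {
        if (count == WBUF) return 0;
        wbuf[tail] = (struct wentry){addr, data};
        tail = (tail + 1) % WBUF;
        count++;
        return 1;
    }

    /* Called when the bus is idle, i.e. no read is waiting. */
    static void drain_one(void) {
        if (count == 0) return;
        /* ... issue wbuf[head] to the next level here ... */
        head = (head + 1) % WBUF;
        count--;
    }

A real buffer must also check incoming loads against the buffered addresses (or drain first) so a read never misses a value sitting in the buffer; that check is omitted here.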
Adding a Victim Cache
[Figure: a direct-mapped cache (V / d / tag / data per line) alongside a small fully-associative victim cache.]
A small victim cache adds associativity to "hot" lines.
• Blocks evicted from the direct-mapped cache go to the victim cache
• Tag compares are made against both the direct-mapped cache and the victim cache
• Victim hits cause lines to swap between L1 and the victim cache (a lookup sketch follows the figure below)
• Not very useful for associative L1 caches
[Figure: a 4-line victim cache beside a 16-line direct-mapped cache; the references 11010011 and 01010011 share the same line index, so they conflict in the direct-mapped cache alone but can both be kept once the victim cache is added.]
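A sketch of the lookup-and-swap flow just described, assuming a direct-mapped L1 and a tiny fully-associative victim array (all structure names are mine):

    #define L1_LINES 16
    #define VC_LINES 4

    struct line { int valid; unsigned tag; /* data omitted */ };
    static struct line l1[L1_LINES];
    static struct line vc[VC_LINES];   /* tag holds the full block address */

    /* Probe L1 and the victim cache (in parallel in hardware).
     * On a victim hit, swap lines so the hot block returns to L1. */
    static int lookup(unsigned addr) {
        unsigned idx = addr % L1_LINES, tag = addr / L1_LINES;
        if (l1[idx].valid && l1[idx].tag == tag) return 1;   /* L1 hit */
        for (int v = 0; v < VC_LINES; v++) {
            if (vc[v].valid && vc[v].tag == addr) {          /* victim hit */
                struct line evicted = l1[idx];
                evicted.tag = evicted.tag * L1_LINES + idx;  /* full address */
                l1[idx] = (struct line){1, tag};
                vc[v] = evicted;                             /* the swap */
                return 1;
            }
        }
        return 0;   /* miss in both: fetch into L1; the displaced
                       line moves into the victim cache */
    }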
Hash-Rehash Cache
[Figure sequence (direct-mapped cache, V / d / tag / data): a reference to 11010011 first misses at its primary (hash) index, then misses again at the rehash index, so the block is allocated and its line is marked with the rehash bit R. On a later reference, the primary probe misses but the rehash probe hits: "Miss ... Rehash Hit!"]
Hash-Rehash Cache
• Calculating performance:
  – Primary hit time (normal direct-mapped access)
  – Rehash hit time (sequential tag lookups)
  – Block swap time?
  – Hit rate comparable to a 2-way set-associative cache
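A sketch of the two-probe lookup, assuming the usual rehash function that flips the high index bit; the rehash bit R marks lines stored at their alternate location (names are mine):

    #define LINES 16
    #define HIGH_BIT (LINES / 2)        /* the index bit the rehash flips */

    struct line { int valid, rehash; unsigned tag; };
    static struct line cache[LINES];

    static int lookup(unsigned addr) {
        unsigned idx = addr % LINES, tag = addr / LINES;
        /* Primary probe: fast path, same as a plain direct-mapped hit. */
        if (cache[idx].valid && !cache[idx].rehash && cache[idx].tag == tag)
            return 1;
        /* Second, sequential probe at the rehash index. */
        unsigned ridx = idx ^ HIGH_BIT;
        if (cache[ridx].valid && cache[ridx].rehash && cache[ridx].tag == tag)
            return 2;   /* rehash hit: a real design now swaps the two
                           lines so this block hits fast next time */
        return 0;       /* miss at both locations */
    }

Average access time then weights the three outcomes: primary hits at the fast time, rehash hits at roughly two tag lookups plus any swap time, and misses at the memory latency.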
Compiler support for caching
• Array merging (array of structs vs. two arrays)
• Loop interchange (row- vs. column-order access; example below)
• Structure padding and alignment (malloc)
• Cache-conscious data placement
  – Pack the working set into the same line
  – Map to non-conflicting addresses if packing is impossible
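Loop interchange is the easiest of these to show; a standard illustration on a row-major C array (my own example, not from the slides):

    #define N 1024
    static double a[N][N];

    /* Column-order traversal: consecutive accesses are N*8 bytes
     * apart, so nearly every access touches a new cache line. */
    double sum_cols(void) {
        double s = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    /* Interchanged loops walk each row sequentially, so one 64-byte
     * line serves 8 consecutive doubles. Same result, fewer misses. */
    double sum_rows(void) {
        double s = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }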
Prefetching
• Already done: bring in an entire line, assuming spatial locality
• Extend this... next-line prefetch
  – Bring in the next block in memory as well as the miss line (very good for I-cache)
• Software prefetch (sketch below)
  – Loads to R0 have no data dependency
• Aggressive/speculative prefetch is useful for L2
• Speculative prefetch is problematic for L1
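Modern compilers expose software prefetch directly; a sketch using GCC/Clang's __builtin_prefetch (the loop and the prefetch distance are my own illustration):

    /* Prefetch PF_DIST elements ahead while summing an array.
     * Like the R0 trick, the prefetch has no data dependency: it
     * can only waste bandwidth, never change the result. */
    #define PF_DIST 16

    double sum(const double *x, int n) {
        double s = 0;
        for (int i = 0; i < n; i++) {
            if (i + PF_DIST < n)
                __builtin_prefetch(&x[i + PF_DIST], /*rw=*/0, /*locality=*/1);
            s += x[i];
        }
        return s;
    }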
Calculating the Effects of Latency
• Does a cache miss reduce performance?
  – It depends on whether critical instructions are waiting for the result
Calculating the Effects of Latency
  – It also depends on whether critical resources are held up
  – Blocking: when a miss occurs, all later references to the cache must wait; this is a resource conflict
  – Non-blocking: allows later references to access the cache while the miss is being processed
• Generally there is some limit to how many outstanding misses can be bypassed
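To make the blocking vs. non-blocking distinction concrete, a sketch of code where independence matters (the comments describe hardware behavior under each design, not anything the C source controls):

    /* With a blocking cache, a miss on a[i] stalls the load of b[i]
     * even though it is independent. A non-blocking cache overlaps
     * both misses, up to its limit on outstanding misses. */
    long sum_pair(const long *a, const long *b, int i) {
        long x = a[i];   /* suppose this misses                    */
        long y = b[i];   /* independent: can issue during the miss */
        return x + y;    /* dependent: must wait for both loads    */
    }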