Lecture 14: Caches · 2014. 2. 24. · Lecture 11 & 12: Caches Cache overview 4 Hierarchy questions...
Lecture 11 & 12: Caches
Cache overview
4 Hierarchy questions
More on Locality
Please bring these slides to the next lecture!
Projects 2 and 3
• Regrade issues for 3
– Please resubmit and come to office hours with
a diff.
• Project 2
– Should be back over the weekend.
Exam (updated on Monday)
• Grades:
– 9: 88533211110
– 8: 99988765443332211100
– 7: 988776666555310
– 6: 764431
– 5: 9875
– 4: 84
– 3: 4
• IF your grade was solely
based on this exam, this is
what you’d probably get:
– 82+ A- to A+
– 66-81 B- to B+
– 57-65 C to C+
– 48-56 C-
– <48 D or lower
• Please note that these are
estimates. But the class
median has historically been a
high B+. So…
I’m loath to give “grade equivalents” because people think: “Well, I got a C on this,
but I’m getting good grades on other stuff, so I probably have a B”. The issue is that
homework and the three programming assignments tend to have fairly high
scores. So they don’t pull you up as much as you’d think. Plus they aren’t worth that much.
Class project
• Project restrictions
– The I-cache and D-cache size is limited to the size it has in the
tarball.
• M-token
– Get one from the computer showcase if you haven’t.
• Status:
– You should be working on your high-level diagram.
• Use all of us to bounce ideas off of and just to talk to.
– We may give differing advice.
– Also the module for MS1.
• It is due on Monday
• Self testing, well written.
• Be aware of sample (single) RS.
Memory pyramid
• Reg (100s bytes): 1 cycle access (early in pipeline)
• L1 Cache (several KB): 1-4 cycle access
• L2 Cache (½-32MB): 6-15 cycle access
• Memory (128MB – few GB): 100-500 cycle access
• Disk (Many GB): Millions cycle access!
Cache Design 101
Cache1
• 1 a : a hiding place especially for
concealing and preserving provisions or
implements
b : a secure place of storage
• 3 : a computer memory with very short
access time used for storage of
frequently used instructions or data --
called also cache memory
1From Merriam-Webster on-line
Cache overview
Locality of Reference
• Principle of Locality:
– Programs tend to reuse data and instructions near those they
have used recently.
– Temporal locality: recently referenced items are likely to be
referenced in the near future.
– Spatial locality: items with nearby addresses tend to be
referenced close together in time.
    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    *v = sum;
Locality in Example:
• Data
–Reference array elements in succession (spatial)
• Instructions
–Reference instructions in sequence (spatial)
–Cycle through loop repeatedly (temporal)
Caching: The Basic Idea
• Main Memory
– Stores words (A–Z in example)
• Cache
– Stores subset of the words (4 in example)
– Organized in lines
• Multiple words
• To exploit spatial locality
• Access
– Word must be in cache for
processor to access
[Figure: the Processor talks to a Small, Fast Cache holding words A, B, G, H, which is backed by a Big, Slow Memory holding words A through Z.]
Direct-mapped cache
[Figure: a 32-byte Memory (5-bit addresses, listed in 2-byte blocks from 00000 to 11110) next to a 4-line Cache with columns V, d, tag, data and V bits shown as 0. The example Address 01101 is split into Tag (2-bit), Line Index (2-bit), and Block Offset (1-bit).]
3-C’s
• Compulsory Miss: first reference to memory block
• Capacity Miss: Working set doesn’t fit in cache
• Conflict Miss: Working set maps to same cache line
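A minimal Python sketch of how the example address breaks into its fields, assuming the usual tag | line index | block offset ordering from most- to least-significant bit. The function name `split_address` and its parameters are illustrative and do not come from the slides.

```python
def split_address(addr, offset_bits=1, index_bits=2):
    """Split an address into (tag, line index, block offset).

    Assumes the layout tag | index | offset, most-significant bit first.
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# The slide's example address 01101 with a 1-bit offset and 2-bit line index.
tag, index, offset = split_address(0b01101)
print(f"tag={tag:02b} index={index:02b} offset={offset:b}")
```

Calling the same function with `index_bits=1` models a 1-bit set index, which leaves a 3-bit tag for a 5-bit address.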
2-way set associative cache
[Figure: the same 32-byte Memory next to a 2-way set-associative Cache with columns V, d, tag, data. The example Address 01101 is now split into a larger (3-bit) Tag, a 1-bit Set Index, and an unchanged Block Offset.]
Impact on the 3C’s?
Parameters
• Total cache size
– (block size × # sets × associativity)
• Associativity (Number of “ways”)
• Block size (bytes per block)
• Number of sets
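The relationship between these parameters can be sketched in a few lines of Python. The function names are illustrative, and the 32 KB configuration is a made-up example rather than a number from the lecture.

```python
def total_cache_bytes(block_bytes, num_sets, associativity):
    """Data capacity = block size x number of sets x associativity."""
    return block_bytes * num_sets * associativity


def sets_for(total_bytes, block_bytes, associativity):
    """Number of sets implied by a total size, block size, and associativity."""
    return total_bytes // (block_bytes * associativity)


print(total_cache_bytes(2, 4, 1))   # direct-mapped: 2-byte blocks, 4 lines, 1 way
print(total_cache_bytes(2, 2, 2))   # 2-way: 2-byte blocks, 2 sets, 2 ways
print(sets_for(32 * 1024, 64, 4))   # hypothetical 32 KB, 64-byte blocks, 4-way
```

The first two calls describe caches of the same total size; only the split between sets and ways differs.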
• Performance Measures
– Miss rate
• % of memory references which are not found in the cache.
• A related measure is #misses per 1000 instructions
– Average memory access time
• MR*TMiss + (1-MR)*THit
• THit & TMiss: access time for a hit or miss
• But what do we want to measure?
– Impact on program execution time.
• What are some flaws of using
– Miss Rate?
– Ave. Memory Access Time?
– Program execution time?
– Misses per 1000 instructions?
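As a worked example of the average memory access time formula, here is a small Python sketch. TMiss is taken to be the full access time of a miss, as in the formula above, and the numbers (5% miss rate, 2-cycle hit, 100-cycle miss) are invented for illustration.

```python
def amat(miss_rate, t_hit, t_miss):
    """Average memory access time: MR*TMiss + (1-MR)*THit."""
    return miss_rate * t_miss + (1.0 - miss_rate) * t_hit


# Hypothetical cache A: low miss rate, slower hit.
print(amat(0.05, 2, 100))
# Hypothetical cache B: higher miss rate, faster hit.
print(amat(0.08, 1, 100))
```

Comparing the two calls shows one weakness of quoting miss rate alone: a cache with the lower miss rate may or may not have the lower average access time, depending on the hit time.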
Effects of Varying Cache Parameters
• Total cache size?
– Positives:
• Should decrease miss rate
– Negatives:
• May increase hit time
• Increased area requirements
• Increased power (mainly static)
– Interesting paper:
» Krisztián Flautner, Nam Sung Kim, Steve Martin,
David Blaauw, Trevor N. Mudge: Drowsy Caches:
Simple Techniques for Reducing Leakage Power.
ISCA 2002: 148-157
Effects of Varying Cache Parameters
• Bigger block size?
– Positives:
• Exploit spatial locality ; reduce compulsory misses
• Reduce tag overhead (bits)
• Reduce transfer overhead (address, burst data mode)
– Negatives:
• Fewer blocks for given size; increase conflict misses
• Increase miss transfer time (multi-cycle transfers)
• Wasted bandwidth for non-spatial data
Effects of Varying Cache Parameters
• Increasing associativity
– Positives:
• Reduces conflict misses
• Low-assoc cache can have pathological behavior (very
high miss rate)
– Negatives:
• Increased hit time
• More hardware requirements (comparators, muxes,
bigger tags)
• Minimal improvements past 4- or 8- way.
Effects of Varying Cache Parameters
• Replacement Strategy: (for associative caches)
– LRU: intuitive; difficult to implement with high assoc; worst
case performance can occur (e.g., repeatedly looping over an N+1 element
array with N lines)
– Random: Pseudo-random easy to implement; performance
close to LRU for high associativity
– Optimal: replace block that has next reference farthest in the
future; hard to implement (need to see the future)
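The LRU worst case can be reproduced with a tiny simulation. The sketch below models a single fully-associative set of N lines and a loop over N+1 blocks; the function `hit_rate`, the policy names, and the fixed random seed are all assumptions made for illustration.

```python
import random


def hit_rate(trace, num_lines, policy, seed=0):
    """Simulate one fully-associative set and return its hit rate."""
    rng = random.Random(seed)
    lines = []          # resident blocks; for LRU the most recent is last
    hits = 0
    for block in trace:
        if block in lines:
            hits += 1
            if policy == "lru":
                lines.remove(block)
                lines.append(block)      # mark as most recently used
        else:
            if len(lines) == num_lines:
                # LRU evicts the oldest entry; random picks any resident line.
                victim = 0 if policy == "lru" else rng.randrange(num_lines)
                lines.pop(victim)
            lines.append(block)
    return hits / len(trace)


N = 4
trace = list(range(N + 1)) * 100     # loop over an (N+1)-element array
print("LRU   :", hit_rate(trace, N, "lru"))
print("random:", hit_rate(trace, N, "random"))
```

With LRU, every access evicts exactly the block that will be needed next, so the loop never hits; random replacement keeps some of the blocks around by luck.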
Effects of Varying Cache Parameters
• Write Policy: How to deal with write misses?
– Write-through / no-allocate
• Total traffic? Read misses × block size + writes
• Common for L1 caches backed by L2 (esp. on-chip)
– Write-back / write-allocate
• Needs a dirty bit to determine whether cache data differs
• Total traffic? (read misses + write misses) × block size +
dirty-block-evictions × block size
• Common for L2 caches (memory bandwidth limited)
– Variation: Write validate
• Write-allocate without fetch-on-write
• Needs sub-block cache with valid bits for each word/byte
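The two traffic expressions can be compared with a short Python sketch. It assumes traffic is counted in the same units as the block size (say, words) and that each write-through write sends one such unit; the event counts in the example are invented.

```python
def write_through_traffic(read_misses, writes, block_size):
    """Write-through / no-allocate: read misses x block size + writes."""
    return read_misses * block_size + writes


def write_back_traffic(read_misses, write_misses, dirty_evictions, block_size):
    """Write-back / write-allocate:
    (read misses + write misses) x block size + dirty evictions x block size."""
    return (read_misses + write_misses) * block_size + dirty_evictions * block_size


# Hypothetical counts: 10 read misses, 100 writes (5 of them misses),
# 3 dirty blocks evicted, 8-word blocks.
print(write_through_traffic(10, 100, 8))
print(write_back_traffic(10, 5, 3, 8))
```

Which policy moves less data depends on how many writes hit in the cache and how often dirty blocks are evicted.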
4 Hierarchy questions
• Where can a block be placed?
• How do you find a block (and know you’ve
found it)?
• Which block should be replaced on a miss?
• What happens on a write?
So from here…
• We need to think in terms of both the
hierarchy questions as well as
performance.
– We often will use Average Access Time as a
predictor of the impact on execution time. But
we will try to keep in mind they may not be the
same thing!
• Even all these questions don’t get at
everything!
Set Associative as a change
from Direct Mapped
• Impact of being more associative?
– MR? TMiss? THit?
• Hierarchy questions:
– Where can a block be placed?
– How do you find a block (and know you’ve
found it)?
– Which block should be replaced on a miss?
– What happens on a write?
4 Hierarchy
questions
Hash cache
• Idea:
– Grab some bits from the tag and use them, as well as the old index bits, to select a set.
– Simplest version would be if N sets, grab the 2N lowest order bits after the offset and XOR them in groups of 2.
• Impact:
– Impact of being more associative?
• MR? TMiss? THit?
– Hierarchy questions:
• Where can a block be placed?
• How do you find a block (and know you’ve found it)?
• Which block should be replaced on a miss?
• What happens on a write?
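One possible reading of the "simplest version" is sketched below in Python. It assumes N is the number of set-index bits, and that "XOR in groups of 2" means pairing each old index bit with the tag bit N positions above it. Both are interpretations, and the function name `hashed_set_index` is made up.

```python
def hashed_set_index(addr, offset_bits, index_bits):
    """XOR the old index bits with the lowest tag bits to pick a set."""
    block = addr >> offset_bits
    mask = (1 << index_bits) - 1
    low = block & mask                     # the old index bits
    high = (block >> index_bits) & mask    # the lowest-order tag bits
    return low ^ high


# Two addresses that share the plain index (01) but differ in the tag.
for addr in (0b00010, 0b01010):
    plain = (addr >> 1) & 0b11
    print(f"{addr:05b}: plain index {plain}, hashed index {hashed_set_index(addr, 1, 2)}")
```

In this sketch the two addresses collide under the plain index but land in different sets once the tag bits are mixed in. The stored tag still identifies the block, because the old index bits can be recovered from the set number and the tag.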
Skew cache
• Idea:
– As hash cache but a different and independent
hashing function is used for each way.
• Impact:
– Impact of being more associative?
• MR? TMiss? THit?
– Hierarchy questions:
• Where can a block be placed?
• How do you find a block (and know you’ve found it)?
• Which block should be replaced on a miss?
• What happens on a write?
Victim cache
• Idea:
– A small fully-associative cache (4-8 lines typically) that is accessed in parallel with the main cache. This victim cache is managed as if it were an L2 cache (even though it is as fast as the main L1 cache).
• Impact:
– Impact of being more associative?
• MR? TMiss? THit?
– Hierarchy questions:
• Where can a block be placed?
• How do you find a block (and know you’ve found it)?
• Which block should be replaced on a miss?
• What happens on a write?
Critical Word First
• Idea:
– For caches where the line size is greater than the
word size, send the word which causes the miss first
• Impact:
– Impact of being more associative?
• MR? TMiss? THit?
– Hierarchy questions:
• Where can a block be placed?
• How do you find a block (and know you’ve found it)?
• Which block should be replaced on a miss?
• What happens on a write?
Lots and lots of other things
• How do I really do replacement on highly-
associative caches (4+ ways)?
• We’ve skipped over writing
Pseudo-LRU replacement
• # of bits needed to maintain order among
N items?
• So for N=16 we need: 45 bits.
• Any better ideas?
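The count of 45 follows from needing enough bits to distinguish all N! possible orderings, that is ceil(log2(N!)). A short Python check is below; the tree pseudo-LRU figure of N-1 bits is included for comparison and assumes a binary-tree scheme with a power-of-two number of ways. Both function names are illustrative.

```python
import math


def true_lru_bits(n_ways):
    """Minimum bits to encode one of n! recency orderings: ceil(log2(n!))."""
    orderings = math.factorial(n_ways)
    return (orderings - 1).bit_length()   # integer form of ceil(log2(orderings))


def tree_plru_bits(n_ways):
    """A binary tree over n ways has n-1 internal nodes, one bit each."""
    return n_ways - 1


print(true_lru_bits(16), tree_plru_bits(16))
```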
Pseudo LRU
[Figure: a binary tree of bits above Way 0, Way 1, Way 2, Way 3; each bit points toward one side of the tree.]
General theme:
On a hit or replacement, switch all the bits to point away
from you.
Replace the one pointed to.
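The "point away from you" rule can be sketched for a single 4-way set with three bits. This is an illustrative Python model, not the lecture's implementation; the class name `TreePLRU4` and the convention that 0 points left and 1 points right are assumptions.

```python
class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set, using 3 bits."""

    def __init__(self):
        # Each bit points at the side holding the next victim:
        # 0 = left (lower-numbered way or pair), 1 = right.
        self.root = 0     # chooses between ways {0, 1} and ways {2, 3}
        self.left = 0     # chooses between way 0 and way 1
        self.right = 0    # chooses between way 2 and way 3

    def touch(self, way):
        """On a hit or a fill, flip the bits on the path to point away from `way`."""
        if way < 2:
            self.root = 1            # point at the right pair
            self.left = 1 - way      # point at the other way of the left pair
        else:
            self.root = 0            # point at the left pair
            self.right = 3 - way     # point at the other way of the right pair

    def victim(self):
        """Follow the bits from the root to the way to replace."""
        if self.root == 0:
            return self.left
        return 2 + self.right


plru = TreePLRU4()
for way in (0, 1, 2, 3):
    plru.touch(way)
print(plru.victim())    # way 0, the same choice true LRU would make
plru.touch(0)
print(plru.victim())    # way 2, although true LRU would now pick way 1
```

The second result shows where the approximation departs from true LRU: the tree only remembers which side was touched most recently at each level, not the full ordering.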
Multi-lateral caches (1/3)
• There are two cache structures, the main cache and the companion buffer.
– The main cache is set-associative
– The companion buffer can hold any line
• Any given access can be put in either structure and both need to be searched.
[Figure: a Main cache with Set 0 and Set 1, each with Way 0 and Way 1, beside a Companion buffer.]
Multi-lateral caches (2/3)
• Lots of cache schemes can be viewed as
multi-lateral caches
– Victim caches
– Assist caches
• Some things are super-sets of multi-lateral
caches
– Skew caches (only use one line of one of the
ways)
– Exclusive Multi-level caches (at least in
structure)
Multi-lateral caches (3/3)
• Why is this term important?
– First the idea of having two structures in
parallel has an attractive sound to it.
• Bypassing1 is brave, but keeping low-locality
accesses around for only a short time seems
smart.
– Of course you have to identify low-locality accesses…
• Somehow segregating data streams (much as
an I vs. D cache does) may improve performance.
– Some ideas include keeping the heap, stack, and global
variables in different structures!
1This means not putting the data into the cache
3C’s model
• Break cache misses into three categories
– Compulsory miss
– Capacity miss
– Conflict miss
• Compulsory
– The block in question had never been accessed before.
• Capacity
– A fully-associative cache of the same size would also have missed this access given the same reference stream.
• Conflict
– That fully-associative cache would have gotten a hit.
3C’s example
• Consider the “stream” of blocks 0, 1, 2, 0,
2, 1
– Given a direct-mapped cache with 2 lines,
which would hit, which would miss?
– Classify each type of miss.
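One way to work this example is to run the direct-mapped cache alongside a fully-associative cache of the same size. The Python sketch below does that; it assumes block-number modulo line-count placement for the direct-mapped cache and LRU replacement for the fully-associative one, neither of which the slide states. The name `classify_3c` is illustrative.

```python
from collections import OrderedDict


def classify_3c(blocks, num_lines):
    """Label each direct-mapped access: hit, compulsory, capacity, or conflict."""
    direct = [None] * num_lines     # direct-mapped: line = block % num_lines
    fully = OrderedDict()           # same-size fully-associative LRU cache
    seen = set()
    labels = []
    for b in blocks:
        fa_hit = b in fully
        if fa_hit:
            fully.move_to_end(b)                # now the most recently used
        else:
            if len(fully) == num_lines:
                fully.popitem(last=False)       # evict the least recently used
            fully[b] = True
        line = b % num_lines
        if direct[line] == b:
            labels.append("hit")
        elif b not in seen:
            labels.append("compulsory")
        elif fa_hit:
            labels.append("conflict")
        else:
            labels.append("capacity")
        direct[line] = b
        seen.add(b)
    return labels


print(classify_3c([0, 1, 2, 0, 2, 1], 2))
```

Under these assumptions the final access to block 1 hits in the direct-mapped cache even though the fully-associative LRU cache would have missed it, a case the three categories have no name for.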
3C’s – sum-up.
• What’s the point?
– Well, if you can figure out what kind of misses
you are getting, you might be able to figure
out how to solve the problem.
• How would you “solve” each type?
• What are the problems?
Reference stream
• A memory reference stream is an n-tuple
of addresses which corresponds to n
ordered memory accesses.
– A program which accesses memory locations
4, 8 and then 4 again would be represented
as (4,8,4).
Locality of reference
• The reason small caches get such a good hit-rate is that
the memory reference stream is not random.
– A given reference tends to be to an address that was used
recently. This is called temporal locality.
– Or it may be to an address that is near another address that was
used recently. This is called spatial locality.
• Therefore, keeping recently accessed blocks in the
cache can result in a remarkable number of hits.
– But there is no known way to quantify the amount of locality in
the reference stream.
Stack distance – A measure of locality
• Consider the reference stream
(0, 8, 4, 4, 12, 16, 32, 4)
• If the cache line size is 8 bytes, the block reference stream is
(0, 1, 0, 0, 1, 2, 4, 0)
• Now define the stack distance of a reference to be the number of unique block addresses between the current reference and the previous reference to the same block number. A first reference to a block has no previous reference, so its stack distance is infinite. In this case
(∞, ∞, 1, 0, 1, ∞, ∞, 3)
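The definition can be turned into a short Python function. This is a straightforward sketch rather than an efficient implementation; the name `stack_distances` is illustrative, and a first reference to a block is reported as `None`.

```python
def stack_distances(addresses, line_bytes):
    """Return the stack distance of each reference (None for a first reference)."""
    stack = []          # block numbers, most recently used last
    distances = []
    for addr in addresses:
        block = addr // line_bytes
        if block in stack:
            pos = stack.index(block)
            # Everything above `block` on the stack was touched since its last use.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)      # never referenced before
        stack.append(block)
    return distances


print(stack_distances([0, 8, 4, 4, 12, 16, 32, 4], 8))
```

For the slide's stream this gives distances 1, 0, 1, and 3 for the four repeated references, with the first reference to each block left undefined.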
Stack distance – A measure of locality (2)
Memory reference stream   0   8   4   4  12  16  32   4
Block reference stream    0   1   0   0   1   2   4   0
Stack distance            ∞   ∞   1   0   1   ∞   ∞   3
[Figure: two plots against stack distance: Number non-infinite accesses and Cumulative stack distance.]
Stack distances of the SPEC benchmarks – cumulative
[Figure: cumulative stack-distance curves for SPECint and SPECfp; the x-axis is stack distance on a log scale from 1 to 100000 and the y-axis runs from 0 to 1.]
Stack distances of selected SPECfp benchmarks
[Figure: stack-distance curves for ammp, art, fma3d, galgel, lucas, mesa, mgrid, and wupwise; the x-axis is stack distance on a log scale from 1 to 100000 and the y-axis runs from 0 to 1.]
Why is this interesting?
• It is possible to distinguish locality from conflict.
– Pure miss-rate data of set-associative caches
depends upon both the locality of the reference
stream and the conflict in that stream.
• It is possible to make qualitative statements
about locality
– For example, SPECint has a higher degree of
locality than SPECfp.
Fully-associative caches
• A fully-associative LRU cache consisting
of n lines will get a hit on any memory
reference access with a stack distance of
n-1 or less.
– Fully associative caches of size n store the
last n uniquely accessed blocks.
– This means the locality curves are simply a
graph of the hit rate on a fully associative
cache: the cumulative value at stack distance n-1 is the hit rate with n lines.
Direct-mapped caches
• Consider a direct-mapped cache of n cache
lines and the block reference stream (x, y, x).
– The second reference to x will be a hit unless y is
mapped to the same cache line as x.
– If x and y are independent, the odds of x being a hit are
(n-1)/n
• In general if there are m unique accesses
between the two references to x, the odds of a
hit are: ((n-1)/n)^m
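The direct-mapped expression can be evaluated directly, and the same independence argument extends to set-associative caches. The Python sketch below is that extension, not a formula taken from the slides: it assumes LRU within each set and that every intervening block lands in a uniformly random set, so the block survives when fewer than `ways` of the m intervening blocks map to its set. The function name is illustrative.

```python
from math import comb


def expected_hit_probability(num_lines, ways, m):
    """P(hit) when m unique blocks intervene, each placed in a random set."""
    sets = num_lines // ways
    p = 1.0 / sets          # chance one intervening block lands in our set
    # Binomial probability that fewer than `ways` of the m blocks hit our set.
    return sum(comb(m, k) * p**k * (1.0 - p)**(m - k)
               for k in range(min(ways, m + 1)))


for m in (1, 64, 128, 256):
    print(m,
          round(expected_hit_probability(128, 1, m), 3),     # direct-mapped
          round(expected_hit_probability(128, 2, m), 3),     # 2-way
          round(expected_hit_probability(128, 128, m), 3))   # fully associative
```

With `ways=1` the sum collapses to ((n-1)/n)^m, and with a single set it becomes a step function that hits up to stack distance n-1 and misses beyond it.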
Expected hit rate at a given stack distance for a 128-line cache
[Figure: expected hit rate (0 to 1) against stack distance (0 to 500) for Direct-mapped, 2-way associative, 4-way associative, 8-way associative, and Fully associative caches.]
Verification – direct mapped 128-entry cache
[Figure: two panels, SPECfp and SPECint, each plotting Actual and Predicted curves from 0 to 1 against stack distance from 0 to 500.]
Why is this interesting?
• It is possible to see exactly why more
associative caches do better than less
associative caches
– It also becomes possible to see when they do worse.
– The area under the curve is always equal to the
number of cache lines.
• It provides an expectation of performance.
– If things are worse, there must be excessive conflict.
– If things are better, it is likely due to spatial locality.
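The area claim can be checked numerically under a simple random-placement model. The Python sketch below assumes LRU within each set and intervening blocks that land in uniformly random sets, then sums the expected hit probability over stack distances up to a cutoff. The function names and the cutoff of 4000 are choices made for this illustration.

```python
from math import comb


def hit_probability(num_lines, ways, m):
    """P(hit) at stack distance m under random, independent set placement."""
    sets = num_lines // ways
    p = 1.0 / sets
    return sum(comb(m, k) * p**k * (1.0 - p)**(m - k)
               for k in range(min(ways, m + 1)))


def area_under_curve(num_lines, ways, max_distance=4000):
    """Sum the expected-hit-rate curve over stack distances 0..max_distance."""
    return sum(hit_probability(num_lines, ways, m)
               for m in range(max_distance + 1))


for ways in (1, 2, 4, 8):
    print(ways, round(area_under_curve(128, ways), 6))
```

For a 128-line cache each of these sums comes out at approximately 128, so higher associativity reshapes the curve (more hits at short distances, fewer at long ones) without changing its area.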
The 3C’s model
The 3C’s model describes
misses as conflict, capacity or
compulsory by comparing a
direct-mapped cache to a fully-associative cache.
– Those accesses in the gray area
are conflict misses.
– Those in the green area are
capacity misses.
– The blue area is where both
caches get a hit.
– The yellow area is ignored by the
3C’s model. Perhaps a “conflict
hit”?
The 3C’s model is much more limited.
It cannot distinguish between
expected conflict and excessive
conflict.
[Figure: curves for 64-line DM, 64-line FA, 128-line FA, and 128-line DM caches, plotted from 0 to 1 against an x-axis from 0 to 200.]
The filtering of locality
gcc after a 64KB direct-mapped cache
[Figure: curves labeled Read/Write, All reads, and Unfiltered; the x-axis is a log scale from 1 to 100000 and the y-axis runs from 0 to 1.]
The filtering of locality (2)
• Notice that there are fewer references at the
lowest stack distances than at the highest.
– Yet it is at the lowest stack distances where
highly-associative caches concentrate their
power.
– The locality seen by the L2 cache is
fundamentally different than that seen by the
L1 cache
• A large enough cache can make this effect go
away…
Measuring non-random conflict
• Combining the cache and locality models
makes it possible to predict a hit rate.
– On these reference streams the cache tends to do
better than predicted.
• Although some benchmarks, like mgrid, see significantly
worse performance.
Benchmark   Average predicted hit rate   Average actual hit rate
SPECint     90.81%                       91.38%
SPECfp      83.19%                       83.84%
Measuring non-random conflict (2)
• A more advanced technique, using a hash-cache, allows us to roughly quantify the amount of excessive conflict and scant conflict.
– This can be useful when deciding if a hash cache is appropriate.
– It is also useful to provide feedback to the compiler about its data-layout choices.
• Other compilers (gcc for example) tend to have a higher degree of excessive conflict.
– So this technique may also be able to tell us something about compilers.
Understanding non-standard caches
128-line direct-mapped component and a 6-line victim component
[Figure: curves for Victim cache, Victim component, and Direct-mapped component, plotted from 0 to 1 against an x-axis from 0 to 500.]
Some review
• Consider the access pattern A, B, C, A. Assume
the three accesses are all independently
randomly placed with uniform probability
– In a direct-mapped cache with 8 lines, what is the
probability of a miss?
– A two-way associative cache with 4 lines?
– A victim cache of one line backing up a direct-mapped
cache of 4 lines?
– A skew cache with 8 lines?
• What is bogus about the above assumptions?
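Under the stated independence assumption, the direct-mapped and two-way cases can be checked by brute-force enumeration. The Python sketch below assumes A, B, and C are distinct blocks, LRU replacement within a set, and a cache that starts empty. The victim and skew cases are left out because they depend on details the slide does not fix. The name `miss_probability` is illustrative.

```python
from fractions import Fraction
from itertools import product


def miss_probability(num_sets, ways):
    """P(the second access to A misses) for the pattern A, B, C, A."""
    misses = 0
    total = 0
    for set_a, set_b, set_c in product(range(num_sets), repeat=3):
        # A is evicted once `ways` other blocks have been placed in its set.
        intruders = (set_b == set_a) + (set_c == set_a)
        if intruders >= ways:
            misses += 1
        total += 1
    return Fraction(misses, total)


print(miss_probability(8, 1))   # direct-mapped, 8 lines
print(miss_probability(2, 2))   # two-way associative, 4 lines (2 sets)
```

The direct-mapped result agrees with 1 - ((n-1)/n)^m for n = 8 and m = 2.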
VIPT caches
• (Done on board)