Caches J. Nelson Amaral University of Alberta. Processor-Memory Performance Gap Bauer p. 47.
Caches
J. Nelson Amaral, University of Alberta
Processor-Memory Performance Gap
Bauer p. 47
Memory Hierarchy
Bauer p. 48
Principle of Locality
• Temporal locality: what was used in the past is likely to be reused in the near future
• Spatial locality: what is close to the thing being used now is likely to be used in the near future as well
Bauer p. 48
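As a rough illustration of both kinds of locality, the Python sketch below assumes a hypothetical toy direct-mapped cache (64 lines of 16 bytes, tags only) and counts hits for three access traces. The function `count_hits` and the traces are made up for this example; they are not taken from Bauer.

```python
def count_hits(addresses, num_lines=64, line_size=16):
    """Count hits for a trace of byte addresses in a toy direct-mapped cache."""
    tags = [None] * num_lines              # one stored tag per line; None means invalid
    hits = 0
    for addr in addresses:
        block = addr // line_size          # memory block that holds this byte
        index = block % num_lines          # the only line this block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag              # miss: bring the block in on demand
    return hits


# 1024 four-byte words (4 KB) read in order: adjacent words share a 16-byte line.
sequential = [4 * k for k in range(1024)]
# The same 1024 words, but consecutive reads are 64 bytes apart.
strided = [(64 * k) % 4096 + 4 * (k // 64) for k in range(1024)]
# A 256-byte region (64 words) read 16 times over: the same data is reused.
reused = [4 * (k % 64) for k in range(1024)]

print(count_hits(sequential), count_hits(strided), count_hits(reused))  # 768 0 1008
```

The sequential and strided traces read exactly the same words; only the order differs, so the gap in hits comes from how well each order exploits spatial locality.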
Hits and Misses
• Cache hit: the requested location is in the cache
• Cache miss: the requested location is not in the cache
Bauer p. 48
Cache Organizations
• When to bring the content of a memory location into the cache? On demand.
• Where to put it? Depends on the cache organization.
• How do we know it is there? Tag entries.
• What happens if the cache is full and we need to bring the content of a location into the cache? Use a replacement algorithm.
Bauer p. 49
Cache Organization
Bauer p. 50
Mapping
Bauer p. 51
Content-Addressable Memories (CAMs)
• Indexed by matching (part of) the content of entries
• All entries are searched in parallel
• Drawbacks:
  – expensive hardware
  – consume more power
  – difficult to modify
Bauer p. 50
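A CAM compares a key against every entry simultaneously in hardware. The sketch below only models the outcome of such a search in Python (the loop stands in for the parallel comparators); the (valid, tag) entry layout and the name `cam_lookup` are assumptions for illustration.

```python
from typing import Optional


def cam_lookup(entries: list[tuple[bool, int]], key: int) -> Optional[int]:
    """Return the position of the valid entry whose tag equals key, or None on a miss."""
    # Hardware compares all entries at once; here the comparisons are just enumerated.
    matches = [pos for pos, (valid, tag) in enumerate(entries) if valid and tag == key]
    # A well-formed tag store holds at most one valid copy of a tag.
    return matches[0] if matches else None


tags = [(True, 0x1A2), (False, 0x3F0), (True, 0x3F0), (True, 0x007)]
print(cam_lookup(tags, 0x3F0))  # 2 (entry 1 holds the same tag but is invalid)
print(cam_lookup(tags, 0x555))  # None
```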
Cache Geometry
• C: number of cache lines
• m: number of banks in the cache (associativity)
• L: line size
• S: cache size (or capacity)
• S = C × L
• (S, L, m) gives the geometry of a cache
• d: number of bits needed for displacement
Bauer p. 52
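The relationships among S, L, C, and m can be turned into a small helper that derives the field widths of a memory reference. This is a sketch assuming sizes in bytes that are powers of two and, as in the example slides, a 32-bit address; `geometry_fields` is a hypothetical name.

```python
import math


def geometry_fields(S: int, L: int, m: int, address_bits: int = 32) -> tuple[int, int, int]:
    """Return (t, i, d) for a cache of geometry (S, L, m); S and L in bytes."""
    C = S // L                        # number of cache lines, since S = C x L
    sets = C // m                     # the C lines form C/m sets of m lines each
    for name, value in (("L", L), ("C/m", sets)):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two, got {value}")
    d = int(math.log2(L))             # displacement: selects a byte within the line
    i = int(math.log2(sets))          # index: selects the set
    t = address_bits - i - d          # tag: the remaining high-order bits
    return t, i, d


KB = 1024
print(geometry_fields(32 * KB, 16, 1))  # (17, 11, 4)
print(geometry_fields(32 * KB, 32, 1))  # (17, 10, 5)
print(geometry_fields(32 * KB, 16, 2))  # (18, 10, 4)
```

The three calls correspond to the base geometry, the doubled line size, and the 2-way variant discussed next.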
Hit and Miss Detection
Cache Geometry: (S, L, m) = (32KB, 16B, 1)
Memory Reference (32-bit address): (t, i, d) = (?, ?, ?), where (t, i, d) = (tag, index, displacement)
C = S/L = 32KB/16B = 2048
d = log2 L = log2 16 = 4
i = log2 (C/m) = log2 2048 = 11
t = 32 - i - d = 32 - 11 - 4 = 17
Bauer p. 52
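Once (t, i, d) is known, hit and miss detection starts by slicing the address into its fields: the index selects the set, the tag is compared against the stored tags, and the displacement selects the byte within the line. A minimal sketch with shifts and masks, assuming the fields are laid out as (tag, index, displacement) from high to low bits; `split_address` is a hypothetical helper.

```python
def split_address(addr: int, t: int, i: int, d: int) -> tuple[int, int, int]:
    """Split an address into (tag, index, displacement) fields of widths t, i, d."""
    if not 0 <= addr < (1 << (t + i + d)):
        raise ValueError("address does not fit in t + i + d bits")
    displacement = addr & ((1 << d) - 1)      # low d bits
    index = (addr >> d) & ((1 << i) - 1)      # next i bits
    tag = addr >> (d + i)                     # high t bits
    return tag, index, displacement


# (t, i, d) = (17, 11, 4) for the (32KB, 16B, 1) cache.
print(split_address(0x12345678, 17, 11, 4))  # (9320, 1383, 8), i.e. (0x2468, 0x567, 0x8)
```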
Hit and Miss Detection
What happens to t if we double the line size?
(S, L, m) = (32KB, 16B, 1) becomes (32KB, 32B, 1)
C = S/L = 32KB/32B = 1024
d = log2 L = log2 32 = 5
i = log2 (C/m) = log2 1024 = 10
t = 32 - i - d = 32 - 10 - 5 = 17
The tag width does not change: the displacement gains one bit and the index loses one.
Bauer p. 52
Hit and Miss Detection
What happens to t if we change to 2-way associativity?
(S, L, m) = (32KB, 16B, 1) becomes (32KB, 16B, 2)
C = S/L = 32KB/16B = 2048
d = log2 L = log2 16 = 4
i = log2 (C/m) = log2 (2048/2) = log2 1024 = 10
t = 32 - i - d = 32 - 10 - 4 = 18
The index loses one bit and the tag gains one. The hardware needs one more comparator and a multiplexor.
Bauer p. 52
Replacement Algorithm
• Direct mapped
  – There is only one location for a block
  – If the location is occupied, the block that is there is evicted
• m-way set associative
  – If all m entries of the set are valid, a victim must be selected
  – Low associativity: the Least-Recently Used (LRU) entry should be evicted
  – High associativity: the (two) Most-Recently Used (MRU) entries should not be evicted
Bauer p. 53
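The LRU policy for a low-associativity cache can be sketched with one `collections.OrderedDict` per set, kept in recency order. This toy model tracks tags only (no data), the class name `SetAssociativeLRU` is hypothetical, and it does not implement the MRU-based scheme mentioned for high associativity.

```python
from collections import OrderedDict


class SetAssociativeLRU:
    """Toy m-way set-associative cache that tracks tags only, with LRU replacement."""

    def __init__(self, num_sets: int, m: int, line_size: int) -> None:
        self.num_sets = num_sets
        self.m = m
        self.line_size = line_size
        # One OrderedDict per set; its keys are tags ordered from LRU (first) to MRU (last).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, bring the block in and return False."""
        block = addr // self.line_size
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)        # hit: this tag becomes the most recently used
            return True
        if len(ways) == self.m:          # all m ways are valid: evict the LRU victim
            ways.popitem(last=False)
        ways[tag] = None                 # install the new block as the most recently used
        return False


# A single 2-way set; blocks A, B, A, C, B at 16-byte granularity.
cache = SetAssociativeLRU(num_sets=1, m=2, line_size=16)
trace = [0x00, 0x10, 0x00, 0x20, 0x10]
print([cache.access(a) for a in trace])  # [False, False, True, False, False]
```

C evicts B rather than A because A was referenced more recently. With m = 1 the structure degenerates to a direct-mapped cache, where the single occupant is always the victim.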
Write Strategies (on a hit)
• Write back
  – Write only to the cache (memory becomes stale)
  – Add a dirty bit to each cache line
  – Must write back to memory when the entry is evicted
• Write through
  – Write to both cache and memory
  – No need to have a dirty bit
  – Memory is consistent at all times
Bauer p. 54
Write Strategies (on a miss)
• Write allocate
  – read the line from memory
  – write into the line to modify it
• Write around
  – write to the next level only
• Combinations that make sense:
  – write back with write allocate
  – write through with write around
Bauer p. 54
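The two sensible combinations can be compared by counting the memory traffic a short run of stores generates. The sketch below assumes a hypothetical toy direct-mapped cache (4 lines of 16 bytes) that tracks only tags and dirty bits; the policy labels and `simulate_stores` are made up for illustration. Write-back traffic is counted in whole lines, write-through traffic in individual stores.

```python
def simulate_stores(stores, policy, num_lines=4, line_size=16):
    """Return (line_reads, memory_writes) caused by a trace of store addresses.

    policy: "write-back/allocate" or "write-through/around" (hypothetical labels).
    The toy cache is direct mapped and tracks only tags and dirty bits.
    """
    if policy not in ("write-back/allocate", "write-through/around"):
        raise ValueError(f"unknown policy: {policy}")
    tags = [None] * num_lines
    dirty = [False] * num_lines
    line_reads = memory_writes = 0
    for addr in stores:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        if policy == "write-through/around":
            # Every store goes to memory; a hit also updates the cached copy,
            # and a miss allocates nothing (no tag state changes in this model).
            memory_writes += 1
            continue
        if tags[index] != tag:             # write miss: allocate the line
            if dirty[index]:
                memory_writes += 1         # write back the dirty victim line
            line_reads += 1                # read the line from memory
            tags[index] = tag
        dirty[index] = True                # write only to the cache; memory is now stale
    return line_reads, memory_writes


# Ten stores to one 16-byte line, then one store that maps to the same cache line.
stores = [0x100 + 4 * (k % 4) for k in range(10)] + [0x140]
print(simulate_stores(stores, "write-back/allocate"))   # (2, 1)
print(simulate_stores(stores, "write-through/around"))  # (0, 11)
```

With write back, the last line written is still dirty in the cache when the trace ends; it would reach memory only on a later eviction.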
Write Buffer
[Figure: Processor, Cache, Write Buffer, and Memory, with Read and Write paths between them]
Bauer p. 54
The three C’s
• Compulsory (cold) misses
  – first time a memory block is referenced
• Conflict misses
  – more than m blocks compete for the same cache entries in an m-way cache
• Capacity misses
  – more than C blocks compete for space in a cache with C lines
• Coherence misses
  – needed blocks are invalidated because of I/O or multiprocessor operations
Bauer p. 54
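A common way to attribute misses to these categories in simulation (a standard technique, not necessarily Bauer's) is: a miss is compulsory if the block was never referenced before, a capacity miss if a fully associative LRU cache with the same number of lines would also miss, and a conflict miss otherwise. Coherence misses require I/O or multiprocessor events and are not modeled in this hypothetical sketch.

```python
from collections import OrderedDict


def classify_misses(addresses, num_lines, m, line_size):
    """Classify the misses of an m-way LRU cache as compulsory, capacity, or conflict."""
    num_sets = num_lines // m
    sets = [OrderedDict() for _ in range(num_sets)]  # the cache being studied
    full = OrderedDict()                             # fully associative LRU reference cache
    seen = set()                                     # blocks referenced so far
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in addresses:
        block = addr // line_size
        # Reference cache: same number of lines, but any block may go anywhere.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) == num_lines:
                full.popitem(last=False)
            full[block] = None
        # Studied cache: m-way set associative with LRU replacement.
        ways = sets[block % num_sets]
        if block in ways:
            ways.move_to_end(block)
        else:
            if len(ways) == m:
                ways.popitem(last=False)
            ways[block] = None
            if block not in seen:
                counts["compulsory"] += 1
            elif full_hit:
                counts["conflict"] += 1   # more room would not help; placement did
            else:
                counts["capacity"] += 1   # even a fully associative cache missed
        seen.add(block)
    return counts


# Blocks 0 and 4 map to the same line of a 4-line direct-mapped cache.
print(classify_misses([0, 64, 0, 64], num_lines=4, m=1, line_size=16))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```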
Caches and I/O (read)
Bauer p. 55
What happens to the cache when data need to move from disk to memory?
1. Invalidate the affected cache lines using the valid bit.
2. Update cache with new data.
Caches and I/O (Write)
Bauer p. 55
What happens to the cache when data need to move from memory to disk?
Purge dirty lines (write them back to memory) before the transfer.
Alternative: Hardware Snoopy Protocol.
Cache Performance
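Hit Ratio:
$$h = \frac{\text{number of memory references that hit the cache}}{\text{total number of memory references to the cache}}$$
$$\text{miss ratio} = 1 - h$$
Average Memory Access Time:
$$\text{AMAT} = h \times T_{\text{cache}} + (1 - h) \times T_{\text{mem}}$$
For two levels of cache:
$$\text{AMAT} = h_1 \times T_{L1} + (1 - h_1) \times h_2 \times T_{L2} + (1 - h_1) \times (1 - h_2) \times T_{\text{mem}}$$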
Bauer p. 56
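The two formulas translate directly into code. A minimal sketch, reading each T as the total latency of an access that is satisfied at that level; the latencies (1, 10, and 100 cycles) and hit ratios below are made up for illustration.

```python
def amat_one_level(h: float, t_cache: float, t_mem: float) -> float:
    """AMAT = h * Tcache + (1 - h) * Tmem."""
    return h * t_cache + (1 - h) * t_mem


def amat_two_levels(h1: float, h2: float, t_l1: float, t_l2: float, t_mem: float) -> float:
    """AMAT = h1*T_L1 + (1 - h1)*h2*T_L2 + (1 - h1)*(1 - h2)*Tmem."""
    return h1 * t_l1 + (1 - h1) * h2 * t_l2 + (1 - h1) * (1 - h2) * t_mem


# Made-up latencies in cycles: 1 (L1), 10 (L2), 100 (memory).
print(amat_one_level(0.95, 1, 100))
print(amat_two_levels(0.95, 0.80, 1, 10, 100))
```

With these numbers, adding a second level that catches 80% of the L1 misses lowers AMAT from about 5.95 to about 2.35 cycles.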
Cache Performance
$$\text{AMAT} = h \times T_{\text{cache}} + (1 - h) \times T_{\text{mem}}$$
Goal: Reduce AMAT
Strategies:
1. Increase the hit ratio (h)
2. Reduce Tcache
Parameters:
1. Cache capacity
2. Cache associativity
3. Cache line size
Bauer p. 56
Influence of Capacity on Miss Rate
Bauer p. 57
Cache is (S, L, m) = (S, 64B, 2). Application: 176.gcc
Associativity vs. Miss Rate
Cache is (S, L, m) = (32KB, 64B, m). Application: 176.gcc
Line Size vs. Miss Rate
Cache is (S, L, m) = (16KB, L, 1)
Memory Access time
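$$\text{AMAT} = h \times T_{\text{cache}} + (1 - h) \times T_{\text{mem}}$$
$$T_{\text{mem}} = T_{\text{acc}} + (L / w) \times T_{\text{bus}}$$
Tacc : time to send the address + time to read
L : L2 cache line size
w : bus width (in the same units as L)
Tbus : bus cycle time
$$\text{AMAT} = h \times T_{\text{cache}} + (1 - h) \times \left(T_{\text{acc}} + (L / w) \times T_{\text{bus}}\right)$$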
AMAT Example
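We will study two alternative configurations, CA and CB, for a single level of cache. What is the AMAT in each case?
Tacc : 5 cycles
w : 64 bits (8 bytes)
Tbus : 2 cycles
Cache CA : hA = 0.88, LA = 16 bytes
Cache CB : hB = 0.92, LB = 32 bytes
Both caches (CA and CB) have an access time of 1 cycle.
$$\text{AMAT} = h \times T_{\text{cache}} + (1 - h) \times \left(T_{\text{acc}} + (L / w) \times T_{\text{bus}}\right)$$
$$\text{AMAT}_A = 0.88 \times 1 + (1 - 0.88) \times (5 + (16 / 8) \times 2) = 0.88 + 0.12 \times 9 = 1.96$$
$$\text{AMAT}_B = 0.92 \times 1 + (1 - 0.92) \times (5 + (32 / 8) \times 2) = 0.92 + 0.08 \times 13 = 1.96$$
Both configurations give the same AMAT: the higher hit ratio of CB is offset by the longer transfer of its 32-byte line on each miss.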
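The example can be checked numerically. This sketch assumes L and w are both expressed in bytes (so the 64-bit bus is w = 8 bytes), which is how the slide's AMAT_A computation uses them (16/8); `amat` is a hypothetical helper name.

```python
def amat(h, t_cache, t_acc, line_bytes, bus_bytes, t_bus):
    """AMAT = h * Tcache + (1 - h) * (Tacc + (L / w) * Tbus), with L and w in bytes."""
    t_mem = t_acc + (line_bytes / bus_bytes) * t_bus   # time to service one miss
    return h * t_cache + (1 - h) * t_mem


w = 64 // 8                          # a 64-bit bus moves 8 bytes per bus cycle
amat_a = amat(0.88, 1, 5, 16, w, 2)  # configuration CA
amat_b = amat(0.92, 1, 5, 32, w, 2)  # configuration CB
print(f"{amat_a:.2f} {amat_b:.2f}")  # 1.96 1.96
```

A miss costs 9 cycles with CA and 13 cycles with CB, which is what cancels CB's advantage in hit ratio.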