Transcript of lecture slides (bojnordi/classes/6810/f19/slides/15-cache.pdf)
MEMORY HIERARCHY DESIGN
CS/ECE 6810: Computer Architecture
Mahdi Nazm Bojnordi
Assistant Professor
School of Computing
University of Utah
Overview
• Announcement
  • Homework 4 will be released on Oct. 30
• This lecture
  • Memory hierarchy
  • Memory technologies
  • Principle of locality
• Cache concepts
Memory Hierarchy
“Ideally one would desire an indefinitely large memory capacity such that any particular [...] word would be immediately available [...] We are [...] forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.”
-- Burks, Goldstine, and von Neumann, 1946
[Diagram: a core backed by Level 1, Level 2, and Level 3 memories; each level has greater capacity but is less quickly accessible]
The Memory Wall
• Processor-memory performance gap increased over 50% per year
  • Processor performance historically improved ~60% per year
  • Main memory access time improves ~5% per year
Modern Memory Hierarchy
• Trade-off among memory speed, capacity, and cost
[Diagram: register → cache → memory → SSD → disk, from small/fast/expensive to big/slow/inexpensive]
Memory Technology
• Random access memory (RAM) technology
  • access time is the same for all locations (not so true anymore)
  • Static RAM (SRAM)
    • typically used for caches
    • 6T/bit; fast, but low density, high power, expensive
  • Dynamic RAM (DRAM)
    • typically used for main memory
    • 1T/bit; inexpensive, high density, low power, but slow
RAM Cells
• 6T SRAM cell
  • internal feedback maintains data while power is on
• 1T-1C DRAM cell
  • needs regular refresh to preserve data
[Diagrams: SRAM cell with a wordline and a bitline pair; DRAM cell with a wordline and a single bitline]
Processor Cache
• Occupies a large fraction of die area in modern microprocessors
[Die photo: Intel Core i7, 3-3.5 GHz, ~$1000 (2014); 20 MB of cache]
Cache Hierarchy
• Example three-level cache organization
[Diagram: a core with split L1 instruction and data caches, shared L2 and L3, and off-chip memory]
  • L1: 32 KB, 1 cycle
  • L2: 256 KB, 10 cycles
  • L3: 4 MB, 30 cycles
  • Off-chip memory: 8 GB, ~300 cycles
1. Where to put the application?
2. Who decides?
   a. software (scratchpad)
   b. hardware (caches)
Principle of Locality
• Memory references exhibit localized accesses
• Types of locality
  • spatial: probability of access to A+d at time t+e is highest when d→0
  • temporal: probability of accessing A+e at time t+d is highest when d→0
[Plots: access probability vs. address A (spatial) and vs. time t (temporal)]
• Key idea: store local data in fast cache levels

for (i = 0; i < 1000; ++i) {
    sum = sum + a[i];  /* temporal locality: sum and i; spatial locality: a[i] */
}
Cache Terminology
• Block (cache line): unit of data access
• Hit: accessed data found at current level
  • hit rate: fraction of accesses that finds the data
  • hit time: time to access data on a hit
• Miss: accessed data NOT found at current level
  • miss rate: 1 – hit rate
  • miss penalty: time to get the block from the lower level
• hit time << miss penalty
Cache Performance
• Average Memory Access Time (AMAT)

Problem: hit rate is 90%, hit time is 2 cycles, and accessing the lower level takes 200 cycles; find the average memory access time.

  Outcome   Rate   Access time
  Hit       rh     th
  Miss      rm     th + tp

  AMAT = rh·th + rm·(th + tp); since rh = 1 – rm,
  AMAT = th + rm·tp

Answer: AMAT = 2 + 0.1 × 200 = 22 cycles

[Diagram: a request either hits in the cache after th, or misses and additionally pays tp at the lower level]
Example Problem
• Assume the miss rate for instructions is 5%, the miss rate for data is 8%, data references per instruction are 40%, and the miss penalty is 20 cycles; find the performance relative to a perfect cache with no misses.
  • misses per instruction = 0.05 + 0.08 × 0.4 = 0.082
  • Assuming hit time = 1:
    • AMAT = 1 + 0.082 × 20 = 2.64
    • Relative performance = 1/2.64 ≈ 0.38
Summary: Cache Performance
• Bridging the processor-memory performance gap
[Diagram: core → Level-1 → Level-2 → main memory]
• Main memory access time: 300 cycles
• Two-level cache
  • L1: 2 cycles hit time; 60% hit rate
  • L2: 20 cycles hit time; 70% hit rate
• What is the average memory access time?
  AMAT = th1 + rm1·tp1, where tp1 = th2 + rm2·tp2
  tp1 = 20 + 0.3 × 300 = 110
  AMAT = 2 + 0.4 × 110 = 46 cycles
Cache Addressing
• Instead of specifying a cache address, we specify a main memory address
• Simplest: direct-mapped cache
[Diagram: 16 memory blocks, addresses 0000–1111, mapped onto 4 cache locations, 00–11]
• Note: each memory address maps to a single cache location determined by modulo hashing
• How do we specify exactly which blocks are in the cache?
Direct-Mapped Lookup
• Byte offset: to select the requested byte
• Tag: to maintain the address
• Valid flag (v): whether the content is meaningful
• Data and tag are always accessed
[Diagram: the address splits into tag, index, and byte-offset fields; the index selects one of 1024 entries (valid bit, tag, data); on a tag match (=) with a set valid bit, the hit signal is raised and the data is returned]
Example Problem
• Find the size of the tag, index, and offset bits for an 8 MB, direct-mapped L3 cache with 64 B cache blocks. Assume the processor can address up to 4 GB of main memory.
  • 4 GB = 2^32 B → address bits = 32
  • 64 B = 2^6 B → byte-offset bits = 6
  • 8 MB / 64 B = 2^17 → index bits = 17
  • tag bits = 32 – 6 – 17 = 9
Cache Optimizations
• How to improve cache performance? (AMAT = th + rm·tp)
• Reduce hit time (th)
  • memory technology, critical access path
• Improve hit rate (1 – rm)
  • size, associativity, placement/replacement policies
• Reduce miss penalty (tp)
  • multi-level caches, data prefetching