EECE476: Computer Architecture Lecture 25: Chapter 7, Memory and Caches The University of British...
Posted: 18-Dec-2015
[Page 1]
EECE476: Computer Architecture
Lecture 25: Chapter 7, Memory and Caches
The University of British Columbia, EECE 476, © 2005 Guy Lemieux
[Page 2]
Motivation for Caches: CPU vs. Memory Performance Gap
Memory is getting slower relative to CPU speeds (log scale!)
Goal: Make memory faster!
[Page 3]
Importance of Cache Memory: Fast CPUs are Mostly Cache!

[Die photo labels: 64 kB Data Cache; 64 kB Instr. Cache; Load/Store; Execution Unit; Fetch/Scan/Align; Micro-code; Bus Unit; HyperTransport / DDR Memory Interface; 1 MB Unified Instruction/Data Level 2 Cache; Floating-Point Unit; Memory Controller]
Total Area: 193 mm²

– 42% 1 MB L2 Cache
– 4% Instr. Cache
– 4% Data Cache
(50% is cache)
– 13% HyperTransport
– 10% DDR Memory
(23% is I/O)
– 6% Fetch/Scan/etc.
– 4% Mem Controller
– 4% FPU
– 3% Exec Units
– 2% Bus Unit
(only 20% is actually CPU!)
[Page 4]
Main Memory
• What to use for Main Memory?
– SRAM
– DRAM
– SDRAM
– RAMBUS
– FLASH
– Disk
[Page 5]
Memory Technology

• SRAM: Static RAM
– 6 transistors per bit
• Expensive
– Transistors configured as 2 inverters in a loop
• Stable: positive feedback holds the value strongly (static)
• Actively drives the bit value along bitlines to sense amps
– Fast: can tune transistors and sense amps
• Used to make cache memory!

• DRAM: Dynamic RAM
– 1 transistor per bit
• Inexpensive
– Transistor holds charge (C)
• Loses charge/value when driving the bitline (dynamic)
• Transistor leaks charge over time (dynamic)
• Must recharge the transistor periodically (including after a data read)
– Slow
• Transistors are tiny and hold a small charge
• Sense amps must detect a tiny change in voltage
[Diagrams: a 6-transistor SRAM cell with a word (row select) line and complementary bit/bit lines; a 1-transistor DRAM cell with a word (row select) line, a single bit line, and a storage capacitor C]
[Page 6]
Memory Technology

• SDRAM: Synchronous DRAM (not Static DRAM!)
– New, around 1995-1996
– Like DRAM, but pipelined (needs a clock!)
• Pipeline register on Address inputs
• Pipeline register on Data outputs
• Sometimes additional registers in between!
– Multiple clock cycles to get data
• Latency: CL = 2, 2.5, or 3 cycles
– SDR vs. DDR
• Single data rate: one transfer per clock cycle (SDR)
• Double data rate: two transfers per clock cycle (DDR, both edges)
– Clock rate
• SDR: PC100, PC133 is 100 MHz, 133 MHz
• DDR: PC266, PC333, PC400 is 133 MHz, 167 MHz, 200 MHz
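The clock rates above translate directly into peak transfer rates. A minimal sketch of the arithmetic, assuming a standard 64-bit (8-byte) DIMM data bus (the bus width is not stated on the slide):

```python
def peak_bandwidth_mb_s(clock_mhz, transfers_per_cycle, bus_bits=64):
    """Peak bandwidth in MB/s: clock rate x transfers per cycle x bytes per transfer."""
    return clock_mhz * transfers_per_cycle * (bus_bits // 8)

# SDR PC100: one transfer per cycle at 100 MHz
print(peak_bandwidth_mb_s(100, 1))   # 800 MB/s
# DDR PC400: two transfers per cycle (both clock edges) at 200 MHz
print(peak_bandwidth_mb_s(200, 2))   # 3200 MB/s
```

This peak figure is why DDR modules are also labeled by bandwidth (DDR-400 modules, for example, are commonly sold as PC3200).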
[Page 7]
Memory Technology
• RAMBUS
– New, around SDRAM time
– More complex than SDR/DDR SDRAM
– Faster clock rates (800 MHz!)
• Fancy signaling on the circuit board
• Narrow data width (16 bits)
• Difficult to get working
• Must license the technology from Rambus Inc.
• Rambus lawyers are costly, $$
– Longer latency (e.g., ten cycles)
– Overall memory speed higher (not by a lot!)
– Only used on high-end server PCs (too costly)
[Page 8]
Memory Technology
• FLASH Memory
– Different beast: non-volatile
• Keeps its data even when the power is turned off!
– 1 transistor per bit (sometimes 0.5)
• Very cheap
– Operation
• Trap charge in the floating (disconnected) gate of a transistor (tunneling)
• The floating gate keeps the transistor turned on or off
• Not leaky like DRAM
– Not suitable for main memory
• Physically wears out with use (~100,000 writes)
• Writes are very slow; reads are slow (70 ns)
[Page 9]
Memory Technology Trends
• Semiconductor manufacturing processes
– SRAM & logic: compatible
– DRAM & logic: incompatible
– FLASH memory = logic process + extra masks + some tweaking

• Impact on CPU
– On-chip SRAM feasible
• Can get FAST memory! (but at high cost)
– On-chip DRAM possible, but unlikely
• Cannot get BIG memory
– On-chip FLASH may be feasible
• Can store some non-volatile information
[Page 10]
Memory Technology Trends
Memory is getting slower relative to CPU speeds (log scale!)
[Page 11]
Recent Impact of Memory Speed
• 1996
– 100 MHz CPU clock rate (10 ns)
– 80 ns memory access time
– Memory read: 8 CPU clock cycles
– Add 8 pipeline stages just to access data memory?
• DF+DS+DT+DF+DF+DS+DS+DE ?

• 2003
– 3 GHz CPU clock rate (0.33 ns = 330 ps)
– PC400 DDR (200 MHz, or 5 ns)
– Memory read: 5 ns × 2 cycles = 10 ns = 30 CPU clock cycles
– Add 30 pipeline stages? Impossible to keep up!
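The gap between the two scenarios above comes from one conversion: memory latency in nanoseconds expressed in CPU clock cycles. A minimal sketch of that arithmetic:

```python
def mem_latency_in_cpu_cycles(cpu_clock_ghz, mem_latency_ns):
    """Convert a memory latency in ns into CPU clock cycles.

    A clock rate in GHz is cycles per nanosecond, so the cycle count
    is simply latency_ns * clock_ghz.
    """
    return mem_latency_ns * cpu_clock_ghz

# 1996: 100 MHz CPU (0.1 GHz), 80 ns memory access
print(mem_latency_in_cpu_cycles(0.1, 80))   # 8.0 cycles
# 2003: 3 GHz CPU, 10 ns memory read (PC400 DDR, 2 cycles at 5 ns)
print(mem_latency_in_cpu_cycles(3.0, 10))   # 30.0 cycles
```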
[Page 12]
Memory Technology (1997)
| Memory Technology | Access Time | Cost/MB |
|---|---|---|
| SRAM | 5-25 ns | $100-$250 |
| SDRAM | 50-60 ns | $10-$20 (today: cheaper than DRAM) |
| DRAM | 60-120 ns | $5-$10 |
| Disk | 10-20 million ns | $0.10-$0.20 |
[Page 13]
Cache Memory

• Problem:
– SRAM: fast, but costly
– DRAM: cheap, but slow

• Solution: Cache
– Small SRAM memory
– Holds frequently-used data
– Logically, inserted between the CPU and main memory
– The Memory Hierarchy is born
• Generally, use cheaper/bigger/slower memory as you move farther away from the CPU

• Question: How to access the cache SRAM?
[Page 14]
Memory Hierarchy

[Diagram: multiple levels of memory, Level 1 through Level n, arranged below the CPU; the distance from the CPU in access time increases, and the size of the memory grows, at each level]
[Page 15]
Memory Hierarchy

[Diagram: pyramid of CPU registers, SRAM, SDRAM, and disk and/or tape; going down the hierarchy, speed runs from fastest to slowest, size from smallest to biggest, and cost ($/bit) from highest to lowest]
[Page 16]
Accessing a Cache

• Cache: from the French for a safe place to hide things
• Important concept: transparent to the user/software!
– Wish to speed up ALL programs
• Do not want to rewrite old programs
• Do not want to write programs to specifically use the cache
• How to hide it? Need a general cache management policy
– The CPU manages the cache itself (NOT managed by software)
– Load data
• If the data is in the cache, retrieve it from the cache
• Else, retrieve it from main memory and put a copy in the cache
– Store data (write-through, no-allocate-on-write policy)
• If the data is in the cache, write to that cache location and to memory
• Else, write the data to memory
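The load/store policy above can be sketched in a few lines. This is an illustrative model only, with plain dictionaries standing in for the cache and main memory; the name SimpleCache is hypothetical:

```python
class SimpleCache:
    """Toy model of a cache with write-through, no-allocate-on-write."""

    def __init__(self, memory):
        self.memory = memory   # backing store: addr -> value
        self.lines = {}        # cached copies: addr -> value

    def load(self, addr):
        if addr in self.lines:          # hit: serve from the cache
            return self.lines[addr]
        value = self.memory[addr]       # miss: fetch from main memory...
        self.lines[addr] = value        # ...and put a copy in the cache
        return value

    def store(self, addr, value):
        if addr in self.lines:          # hit: update the cached copy too
            self.lines[addr] = value
        self.memory[addr] = value       # write-through: always update memory
                                        # (no allocation on a write miss)

mem = {0x100: 7}
cache = SimpleCache(mem)
print(cache.load(0x100))     # 7 (miss; a copy is now cached)
cache.store(0x200, 9)        # write miss: memory updated, cache untouched
print(0x200 in cache.lines)  # False
```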
[Page 17]
Using a Cache
• Problems
– Finding the existing location of data in the cache?
– Finding a new location for new cache data?
– Cache is full?
• Must find a location that is no longer needed
• Must evict data presently in the cache

• Various Solutions
– Different styles of caches!
[Page 18]
Associative Cache
• Choosing a location
– An associative cache is very flexible
– New data: any location
– Find existing data: must search all locations
– Difficult, but not impossible

• CAM: content-addressable memory
– Searches all locations (addresses) in "1 cycle"
– Reports the "match" location
– The match location holds the data

• Cache is full?
– Must throw out old data
– Need a replacement or eviction policy
[Page 19]
Associative Cache: Replacement Policies

• Associative cache is full? Possible replacement policies:
– Ideal
• Non-causal: cannot predict what the CPU will do in the future!
• CPU architects use simulation to find the performance of an ideal cache
– Least Frequently Accessed
• Count the # of accesses; evict the one accessed the least
• Problem: you will always choose to evict NEW DATA
– Least Recently Used (LRU)
• Timestamp every time you use data in the cache
• The location with the oldest timestamp is evicted
– Pseudo-LRU
• Periodically "age" the contents of the cache
• Flag data every time it is used
• A location with "aged" status is evicted
– RANDOM works too! (LRU or pseudo-LRU is slightly better, so it is commonly used)
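The LRU policy described above is easy to express in software. A minimal sketch for a small fully-associative cache, using an ordered dictionary as the "timestamp" mechanism (real hardware approximates this, as the pseudo-LRU bullet notes):

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully-associative cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # first entry = least recently used

    def access(self, addr):
        """Touch addr, evicting the least recently used entry if full."""
        if addr in self.entries:
            self.entries.move_to_end(addr)    # refresh its "timestamp"
            return "hit"
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[addr] = True
        return "miss"

c = LRUCache(2)
print(c.access("A"))  # miss
print(c.access("B"))  # miss
print(c.access("A"))  # hit  (A is now the most recently used)
print(c.access("C"))  # miss (evicts B, the least recently used)
print(c.access("B"))  # miss (B was already evicted)
```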
[Page 20]
Direct-mapped Cache
• Choosing a location
– Much more restrictive than an associative cache
– New data: one eligible location
– Find existing data: search one location only
– Location: use the lower bits of the data address
– Easy to use SRAM; fast access!

• Cache is full? Replacement is easy…
– Only one location
– Must evict the old data
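The "use the lower bits of the data address" rule is a simple modulo. A minimal sketch for an 8-entry direct-mapped cache (3 index bits):

```python
NUM_LOCATIONS = 8  # 2**3 entries, so the 3 lowest address bits pick the location

def cache_location(addr):
    """The lower bits of the address choose the one eligible location."""
    return addr % NUM_LOCATIONS

print(cache_location(0b00001))  # 1
print(cache_location(0b01001))  # 1 (collides: same location as 0b00001)
print(cache_location(0b11101))  # 5
```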
[Page 21]
Direct-mapped Cache
[Diagram: memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 mapping onto cache locations 000 through 111]

Each address in memory maps to only one location in a direct-mapped cache. The lowest 3 bits of the address determine the location; for example, addresses 00001, 01001, 10001, and 11001 all map to cache location 001.
[Page 22]
Direct-mapped Cache
[Diagram: a 32-bit Memory Address split into a 20-bit Tag (bits 31-12), a 10-bit Index (bits 11-2), and a 2-bit Byte offset (bits 1-0); the Index selects one of 1024 cache entries (0 to 1023), each holding a Valid bit (V), a Tag, and a 32-bit Data word; Hit is asserted when the entry is valid and its stored Tag matches the address Tag]

Cache Size: 1024 locations × 4 data bytes each = 4 kB cache

Overhead: 1024 locations × 21 bits (Tag + V) = 2.625 kB of tag bits

(more than 50% overhead!)
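The sizing arithmetic on this slide can be checked directly. A short sketch, assuming 32-bit addresses, one 4-byte word per location, and a single valid bit per entry:

```python
locations = 1024
data_bytes = locations * 4                   # 4 data bytes per location

index_bits = 10                              # log2(1024 locations)
offset_bits = 2                              # byte offset within a 4-byte word
tag_bits = 32 - index_bits - offset_bits     # 20 tag bits remain
overhead_bits = locations * (tag_bits + 1)   # tag + valid bit per location

print(data_bytes)                 # 4096 bytes -> 4 kB of data
print(overhead_bits / 8 / 1024)   # 2.625     -> ~2.6 kB of tag/valid overhead
```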