10/16: Lecture Topics
• Memory problem
• Memory solution: caches
• Locality
• Types of caches
– Fully associative
– Direct mapped
– n-way associative
Memory Problem
• If all memory accesses (lw/sw) accessed main memory, programs would run 20 times slower
• And it’s getting worse
– processors speed up by 50% annually
– memory accesses speed up by 9% annually
– it’s becoming harder and harder to keep these processors fed
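To see how quickly that gap compounds, here is a small sketch that uses only the two growth rates quoted above (50% and 9% per year). The function name `speed_gap` is ours for illustration, not something from the lecture.

```python
def speed_gap(years, cpu_growth=0.50, mem_growth=0.09):
    """Ratio of processor speedup to memory speedup after `years` years."""
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years


# Print how far the processor has pulled ahead of memory over time.
for n in (1, 5, 10):
    print(f"after {n:2d} years the gap has grown {speed_gap(n):.1f}x")
```

Under these rates the processor pulls ahead of memory by a bit under 40% every year, so the gap roughly quintuples in five years.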
Solution: Memory Hierarchy
• Keep all data in the big, slow, cheap storage
• Keep copies of the “important” data in the small, fast, expensive storage
[Figure: a fast, small, expensive storage level backed by a slow, large, cheap storage level]
Memory Hierarchy
Memory Level   Fabrication Tech   Access Time (ns)   Size (bytes)   $/MB
Registers      Registers          <0.5               256            1000
L1 Cache       SRAM               2                  8K             100
L2 Cache       SRAM               10                 1M             100
Memory         DRAM               50                 128M           0.75
Disk           Magnetic Disk      10M                32G            0.0035
What is a Cache?
• A cache allows for fast accesses to a subset of a larger data store
• Your web browser’s cache gives you fast access to pages you visited recently
– faster because it’s stored locally
– subset because the web won’t fit on your disk
• The memory cache gives the processor fast access to memory that it used recently
– faster because it’s located on the CPU
Locality
• Temporal locality: the principle that data being accessed now will probably be accessed again soon
– Useful data tends to continue to be useful
• Spatial locality: the principle that data near the data being accessed now will probably be needed soon
– If data item n is useful now, then it’s likely that data item n+1 will be useful soon
Memory Access Patterns
• Memory accesses don’t look like this
– random accesses
• Memory accesses do look like this
– hot variables
– step through arrays
Cache Terminology
• Hit: the data item is in the cache
• Miss: the data item is not in the cache
• Hit rate (hit ratio): the percentage of accesses for which the data item is in the cache
• Miss rate (miss ratio): the percentage of accesses for which the data item is not in the cache
• Hit time: the time required to access data in the cache
• Miss time (miss penalty): the time required to access data not in the cache
• Access time: the time required to access a level of the memory hierarchy
Effective Access Time
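$$t_{\text{effective}} = h \, t_{\text{cache}} + (1 - h) \, t_{\text{memory}}$$
• $t_{\text{effective}}$: effective access time
• $h$: cache hit rate
• $1 - h$: cache miss rate
• $t_{\text{cache}}$: cache access time
• $t_{\text{memory}}$: memory access time
• Also known as Average Memory Access Time (AMAT)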
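As a quick way to get a feel for the formula, the sketch below evaluates it for a few hit rates. The 2 ns and 50 ns figures are the L1 cache and DRAM access times from the hierarchy table; the function name is our own choice.

```python
def effective_access_time(hit_rate, t_cache, t_memory):
    """t_effective = h * t_cache + (1 - h) * t_memory (all times in ns)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * t_cache + (1.0 - hit_rate) * t_memory


# Access times taken from the hierarchy table: L1 cache 2 ns, DRAM 50 ns.
for h in (0.0, 0.90, 0.99, 1.0):
    print(f"h = {h:.2f}: {effective_access_time(h, 2, 50):.2f} ns")
```

Even a 90% hit rate leaves the average access more than three times slower than the cache itself, which is why small changes in hit rate matter so much.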
Access Time Example
• Suppose $t_{\text{memory}}$ for main memory is 50 ns
• Suppose $t_{\text{cache}}$ for the processor cache is 2 ns
• We want an effective access time of 3ns
• What hit rate h do we need?
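One way to work this out is to rearrange the effective access time formula for h, which gives h = (t_memory - t_effective) / (t_memory - t_cache). The sketch below does that arithmetic; the function name `required_hit_rate` is hypothetical.

```python
def required_hit_rate(t_target, t_cache, t_memory):
    """Solve t_target = h * t_cache + (1 - h) * t_memory for h."""
    if not (t_cache <= t_target <= t_memory and t_cache < t_memory):
        raise ValueError("need t_cache <= t_target <= t_memory and t_cache < t_memory")
    return (t_memory - t_target) / (t_memory - t_cache)


h = required_hit_rate(t_target=3, t_cache=2, t_memory=50)
print(f"required hit rate: {h:.4f}")  # 47/48, about 0.979
```

So the cache has to satisfy roughly 98 out of every 100 accesses to bring the average down to 3 ns.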
Cache Contents
• When do we put something in the cache?
– when it is used for the first time
• When do we take something out of the cache?
– when it hasn’t been used for a long time
– the cache holds only a subset because all of memory won’t fit on the CPU
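The two rules above (fill on first use, evict what has gone unused the longest) correspond to what is usually called least-recently-used (LRU) replacement. The following is a minimal software sketch of that policy, not a description of any particular hardware; the class name, the `read` method, and the made-up memory contents are all assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny cache: insert on first use, evict the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, least recently used first

    def read(self, address, memory):
        """Return (value, was_hit) for a read of `address`."""
        if address in self.lines:             # hit: mark as most recently used
            self.lines.move_to_end(address)
            return self.lines[address], True
        if len(self.lines) >= self.capacity:  # full: evict the oldest entry
            self.lines.popitem(last=False)
        self.lines[address] = memory[address]  # miss: fill on first use
        return self.lines[address], False


memory = {addr: addr * 10 for addr in range(8)}  # made-up memory contents
cache = LRUCache(capacity=2)
hits = [cache.read(a, memory)[1] for a in (0, 1, 0, 2, 1)]
print(hits)
```

In this trace only the second read of address 0 hits; address 1 is evicted to make room for address 2, so its later read misses again.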
Concepts in Caching
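• Assume a two-level hierarchy:
– Level 1: a cache that can hold 8 words
– Level 2: a memory that can hold 32 words
[Figure: an 8-word cache next to a 32-word memory; the memory words are labeled with 7-bit byte addresses 0000000, 0000100, 0001000, 0001100, 0010000, 0010100, ..., 1110100, 1111000, 1111100]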
Fully Associative Cache
Address Valid Value
0010100 Y 0x00000001
0000100 N 0x09D91D11
0100100 Y 0x00000410
0101100 Y 0x00012D10
0001100 N 0x00000005
1101100 Y 0xDEADBEEF
0100000 Y 0x000123A8
1111100 N 0x00000200
• In a fully associative cache,
– any memory word can be placed in any cache line
– each cache line stores which address it contains
– accesses are slow (but not as slow as you would think)
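A lookup in a fully associative cache has to compare the requested address against every line. The sketch below models that with the eight lines from the table above; hardware would do the comparisons in parallel, whereas this illustrative `lookup` function simply loops.

```python
# (address, valid, value) for each cache line, copied from the table above.
LINES = [
    (0b0010100, True,  0x00000001),
    (0b0000100, False, 0x09D91D11),
    (0b0100100, True,  0x00000410),
    (0b0101100, True,  0x00012D10),
    (0b0001100, False, 0x00000005),
    (0b1101100, True,  0xDEADBEEF),
    (0b0100000, True,  0x000123A8),
    (0b1111100, False, 0x00000200),
]


def lookup(address):
    """Return the cached value on a hit, or None on a miss."""
    for stored_address, valid, value in LINES:
        if valid and stored_address == address:
            return value
    return None


print(lookup(0b1101100) == 0xDEADBEEF)  # valid line holds this address: hit
print(lookup(0b0000100))                # address present but line invalid: miss
```

Note that a matching address is not enough: the line must also be marked valid for the access to count as a hit.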
Direct Mapped Caches
• Fully associative caches are too slow
• With direct mapped caches the address of the item determines where in the cache to store it
– In this case, the lower five bits of the address (three index bits followed by two byte-offset bits) dictate the cache entry
Direct Mapped Cache
Index   Address   Valid   Value
00000   1100000   Y       0x00000001
00100   1000100   N       0x09D91D11
01000   0101000   Y       0x00000410
01100   0001100   Y       0x00012D10
10000   1010000   N       0x00000005
10100   1110100   Y       0xDEADBEEF
11000   0011000   Y       0x000123A8
11100   1011100   N       0x00000200
• The lower 5 bits of the address determine where it is located in the table
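In contrast to the fully associative case, a direct mapped lookup inspects exactly one line. The sketch below uses the table contents above and assumes word-aligned addresses (byte offset 00); the `lookup` helper is illustrative only.

```python
# Cache lines in table order; row i holds an address whose index bits equal i.
LINES = [
    (0b1100000, True,  0x00000001),
    (0b1000100, False, 0x09D91D11),
    (0b0101000, True,  0x00000410),
    (0b0001100, True,  0x00012D10),
    (0b1010000, False, 0x00000005),
    (0b1110100, True,  0xDEADBEEF),
    (0b0011000, True,  0x000123A8),
    (0b1011100, False, 0x00000200),
]


def lookup(address):
    """Return the cached value on a hit, or None on a miss."""
    low_five = address & 0b11111  # lower five bits select the entry
    row = low_five >> 2           # drop the 2 byte-offset bits -> row 0..7
    stored_address, valid, value = LINES[row]
    if valid and stored_address == address:
        return value
    return None


print(lookup(0b1110100) == 0xDEADBEEF)  # row 5 holds this address: hit
print(lookup(0b0110100))                # same row, different address: miss
```

The second lookup misses even though the cache has valid data in that row, because a different address with the same lower five bits currently occupies it.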
Address Tags
• A tag is a label for a cache entry indicating where it came from
– The upper bits of the data item’s address
7-bit address: 1011101
Tag (2)   Index (3)   Byte Offset (2)
10        111         01
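The field split shown above is just shifting and masking. Here is a minimal sketch using the slide's 2/3/2 layout as default parameters; `split_address` is a name we made up for the example.

```python
def split_address(address, index_bits=3, offset_bits=2):
    """Split an address into (tag, index, byte offset) fields."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


tag, index, offset = split_address(0b1011101)
print(f"tag={tag:02b} index={index:03b} offset={offset:02b}")  # tag=10 index=111 offset=01
```

Only the tag needs to be stored in the cache entry: the index is implied by the entry's position, and the byte offset selects a byte within the word.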
Cache with Address Tag
Index   Tag   Valid   Value
000     11    Y       0x00000001
001     10    N       0x09D91D11
010     01    Y       0x00000410
011     00    Y       0x00012D10
100     10    N       0x00000005
101     11    Y       0xDEADBEEF
110     00    Y       0x000123A8
111     10    N       0x00000200
Reference Stream Example
11010, 10111, 00001, 11010, 11011, 11111, 01101, 11010
index   vb   tag   data
000
001
010
011
100
101
110
111
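One way to check a hand-worked answer is to replay the stream through a small simulator. The sketch below assumes the references are 5-bit word addresses (no byte offset), that the low 3 bits are the index and the high 2 bits the tag, and that the cache starts empty; the slide does not state these details, so treat them as assumptions.

```python
def simulate_direct_mapped(addresses, index_bits=3):
    """Replay word addresses through a direct-mapped cache.

    Returns a list with 'hit' or 'miss' for each access.
    """
    num_lines = 1 << index_bits
    valid = [False] * num_lines
    tags = [0] * num_lines
    results = []
    for address in addresses:
        index = address & (num_lines - 1)  # low bits pick the line
        tag = address >> index_bits        # remaining high bits identify the word
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            valid[index] = True            # fill (or replace) the line
            tags[index] = tag
    return results


stream = [0b11010, 0b10111, 0b00001, 0b11010, 0b11011, 0b11111, 0b01101, 0b11010]
for address, outcome in zip(stream, simulate_direct_mapped(stream)):
    print(f"{address:05b}: {outcome}")
```

Under these assumptions only the two repeat accesses to 11010 hit, and the access to 11111 evicts 10111 because both map to index 111.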
Sample L1 Cache
• Suppose an L1 cache has the following characteristics
– one word stored per entry
– 1024 entries
– direct mapped
• How many total bytes are stored in the cache?
• How many bits are used for the index and tag fields of the address?
• How many total bits are stored in the cache?
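The three questions can be answered with a few lines of arithmetic. The sketch below assumes 32-bit words, 32-bit byte addresses, and one valid bit per entry; none of these is stated on the slide, so adjust the constants if the intended machine differs.

```python
ENTRIES = 1024
WORD_BYTES = 4     # assumed: 32-bit words
ADDRESS_BITS = 32  # assumed: 32-bit byte addresses

data_bytes = ENTRIES * WORD_BYTES
index_bits = ENTRIES.bit_length() - 1           # log2(1024) = 10
offset_bits = WORD_BYTES.bit_length() - 1       # log2(4) = 2
tag_bits = ADDRESS_BITS - index_bits - offset_bits
bits_per_entry = WORD_BYTES * 8 + tag_bits + 1  # data + tag + valid bit
total_bits = ENTRIES * bits_per_entry

print(f"data stored: {data_bytes} bytes")
print(f"index bits: {index_bits}, tag bits: {tag_bits}")
print(f"total storage: {total_bits} bits")
```

With these assumptions the cache holds 4096 bytes of data, uses a 10-bit index and a 20-bit tag, and stores 1024 × 53 = 54272 bits in total once tags and valid bits are counted.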
N-way Set Associative Caches
• Direct mapped caches cannot store two addresses with the same index
• If two addresses collide, then you have to kick one of them out
• 2-way associative caches can store two different addresses with the same index
• Reduces misses due to conflicts
• Also, 3-way, 4-way and 8-way set associative
• Larger sets imply slower accesses
2-way Associative Cache
Index   Way 0: Tag Valid Value   Way 1: Tag Valid Value
000     11 Y 0x00000001          00 Y 0x00000002
001     10 N 0x09D91D11          10 N 0x0000003B
010     01 Y 0x00000410          11 Y 0x000000CF
011     00 Y 0x00012D10          10 N 0x000000A2
100     10 N 0x00000005          11 N 0x00000333
101     11 Y 0xDEADBEEF          10 Y 0x00003333
110     00 Y 0x000123A8          01 Y 0x0000C002
111     10 N 0x00000200          10 N 0x00000005
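To illustrate how a second way removes some conflicts, here is a minimal sketch of a 2-way set-associative cache that tracks tags only. It assumes word addresses with a 3-bit index and LRU replacement within each set; the slides do not specify a replacement policy, so LRU is our assumption.

```python
class TwoWayCache:
    """2-way set-associative cache with LRU replacement inside each set."""

    def __init__(self, index_bits=3):
        self.index_bits = index_bits
        # Each set is a list of up to two tags, least recently used first.
        self.sets = [[] for _ in range(1 << index_bits)]

    def access(self, address):
        """Return 'hit' or 'miss' and update the set for this address."""
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)  # move to the most-recently-used position
            return "hit"
        if len(ways) == 2:
            ways.pop(0)       # set is full: evict the least recently used tag
        ways.append(tag)
        return "miss"


cache = TwoWayCache()
stream = [0b10111, 0b11111, 0b10111, 0b00111, 0b11111]  # all map to index 111
outcomes = [cache.access(a) for a in stream]
print(outcomes)
```

The third access hits here because 10111 and 11111 can share set 111; in a direct mapped cache the second access would have evicted the first. A third distinct tag in the same set still forces an eviction.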
Associativity Spectrum
• Direct mapped
– fast to access
– conflict misses
• N-way associative
– slower to access
– fewer conflict misses
• Fully associative
– slow to access
– no conflict misses