ECE 2300 Digital Logic & Computer Organization
Spring 2018

Lecture 20: Caches
Announcements
• HW7 will be posted tonight
• Instructor OH cancelled today
• Lab sessions resume next week
Course Content
• Binary numbers and logic gates
• Boolean algebra and combinational logic
• Sequential logic and state machines
• Binary arithmetic
• Memories
• Instruction set architecture
• Processor organization
• Caches and virtual memory
• Input/output
• Advanced topics
Review: Pipelined Microprocessor
[Datapath figure: the five pipeline stages separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, with PC, +2 adder, instruction RAM, decoder, register file (RF), sign extender (SE), ALU, data RAM, and the MUXes driven by control signals such as LD, SA, SB, DR, MB, F, MW, MD, PCL, and PCJ]
Example: Data Hazards with Forwarding
• Assume HW forwarding and NO delay slot for load
• Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back

  LW   R2, 0(R1)        X: SW  R2, 4(R1)
  ADDI R3, R2, 1           BEQ R3, R1, X
We Need Fast and Large Memory
[Pipeline figure: IF, ID, EX, MEM, WB stages with the instruction RAM in IF and the data RAM in MEM]
• Processor cycle time: ~300 ps - 2 ns (~3 GHz - 500 MHz)
• DRAM
  – Slow (10-50 ns for a read or write)
  – Cheap (1 transistor + capacitor per bit cell)
• SRAM
  – Fast (100's of ps to a few ns for a read/write)
  – Expensive (6 transistors per bit cell)
Using Caches in the Pipeline
[Pipeline figure: an instruction cache (SRAM) in IF and a data cache (SRAM) in MEM, both backed by main memory (DRAM)]
Cache
• Small SRAM memory that permits rapid access to a subset of instructions or data
  – If the data is in the cache (cache hit), we retrieve it without slowing down the pipeline
  – If the data is not in the cache (cache miss), we retrieve it from the main memory (penalty incurred in accessing DRAM)
• The hit rate is the fraction of memory accesses found in the cache
  – The miss rate is (1 - hit rate)
Memory Access with Cache
• Average memory access time with cache = Hit time + Miss rate × Miss penalty
• Example
  – Main memory access time = 50 ns
  – Cache hit time = 2 ns
  – Miss rate = 10%

  Average memory access time w/o cache = 50 ns
  Average memory access time w/ cache = 2 + 0.1 × 50 = 7 ns
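The average-access-time formula can be checked with a few lines of Python (a sketch; the function name is my own, not from the slides):

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    # Average memory access time = hit time + miss rate * miss penalty
    return hit_time_ns + miss_rate * miss_penalty_ns

# Numbers from the example: 2 ns hit time, 10% miss rate,
# 50 ns main memory access time as the miss penalty
print(amat(2, 0.10, 50))  # → 7.0
```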
Why Caches Work: Principle of Locality
• Temporal locality
  – If memory location X is accessed, then it is likely to be accessed again in the near future
  – Caches exploit temporal locality by keeping a referenced instruction or data in the cache
• Spatial locality
  – If memory location X is accessed, then locations near X are likely to be accessed in the near future
  – Caches exploit spatial locality by bringing a block of instructions or data into the cache on a miss
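As a concrete illustration (my own example, not from the slides), even a trivial loop over an array exhibits both kinds of locality:

```python
data = list(range(1024))

total = 0
for x in data:   # spatial locality: consecutive elements share cache blocks
    total += x   # temporal locality: 'total' is re-referenced every iteration
```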
Some Important Terms
• Cache is partitioned into blocks
  – Each cache block (or cache line) typically contains multiple bytes of data
  – A whole block is read or written during data transfer between cache and main memory
• Each cache block is associated with a tag and a valid bit
  – Tag: a unique ID to differentiate between the different memory blocks that may be mapped into the same cache block
  – Valid bit: indicates whether the data in a cache block is valid (1) or not (0)
Direct Mapped Cache Concepts
• A given memory block is mapped to one and only one cache block
• Example: a cache with 8 blocks
  – Block addresses in decimal
  – Assume the main memory is 4 times larger than the cache (i.e., 32 blocks)

  Cache Block   Memory Blocks
  0             0, 8, 16, 24
  1             1, 9, 17, 25
  2             2, 10, 18, 26
  3             3, 11, 19, 27
  4             4, 12, 20, 28
  5             5, 13, 21, 29
  6             6, 14, 22, 30
  7             7, 15, 23, 31
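The mapping in the table is just a modulo operation; a minimal sketch in Python (names are my own):

```python
NUM_CACHE_BLOCKS = 8  # the 8-block cache from the example

def cache_block(memory_block):
    # Direct mapping: memory block N can only go to cache block N mod 8
    return memory_block % NUM_CACHE_BLOCKS

# Memory blocks 0, 8, 16, 24 all compete for cache block 0
print([cache_block(b) for b in (0, 8, 16, 24)])  # → [0, 0, 0, 0]
print(cache_block(19))                           # → 3
```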
Direct Mapped (DM) Cache Concepts
Same example
• Cache has 8 blocks and main memory has 32 blocks
• Block addresses are in binary
• 4 different memory blocks may be mapped to the same cache location
[Figure: memory blocks 00001, 00101, 01001, 01101, 10001, 10101, 11001, and 11101 all map to cache block 001]
Address Translation for DM Cache
• DM cache parameters
  – Size of each cache block is 2^b bytes
    • "cache block" and "cache line" are synonymous
  – Number of blocks is 2^i
  – Total cache size is 2^b × 2^i = 2^(b+i) bytes
• Breakdown of an n-bit memory address for cache use:

  | n-i-b tag bits | i index bits | b byte offset bits |
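The address breakdown can be sketched with shifts and masks (a hypothetical helper, not from the slides):

```python
def split_address(addr, i, b):
    # b byte-offset bits, i index bits; the remaining high bits are the tag
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << i) - 1)
    tag = addr >> (b + i)
    return tag, index, offset

# 6-bit address 010111 with i = 2 index bits and b = 2 offset bits:
# tag = 01, index = 01, byte offset = 11
print(split_address(0b010111, i=2, b=2))  # → (1, 1, 3)
```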
DM Cache Organization
• 32-bit memory address
• DM cache parameters
  – 2 byte offset bits
  – 10 index bits
  – 20 tag bits
Reading DM Cache
• Use the index bits to retrieve the tag, data, and valid bit
• Compare the tag from the address with the retrieved tag
• If valid & a match in tag (hit), select the desired data using the byte offset
• Otherwise (miss)
  – Bring the memory block into the cache (also set valid)
  – Store the tag from the address with the block
  – Select the desired data using the byte offset
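The read steps can be modeled in a few lines of Python. This is an illustrative sketch under simplifying assumptions (byte-addressable memory modeled as a list, one value per byte); the Line class and function names are my own:

```python
class Line:
    def __init__(self):
        self.valid = False
        self.tag = None
        self.data = None  # the whole cached block (a list of bytes)

def read(cache, memory, addr, i_bits, b_bits):
    offset = addr & ((1 << b_bits) - 1)
    index = (addr >> b_bits) & ((1 << i_bits) - 1)
    tag = addr >> (b_bits + i_bits)
    line = cache[index]
    if line.valid and line.tag == tag:        # hit: use cached data
        return line.data[offset], "hit"
    # miss: bring the aligned memory block in, set valid, store the tag
    base = addr & ~((1 << b_bits) - 1)        # start address of the block
    line.data = [memory[base + k] for k in range(1 << b_bits)]
    line.tag, line.valid = tag, True
    return line.data[offset], "miss"

memory = list(range(64))              # byte i holds value i
cache = [Line() for _ in range(4)]    # 4 blocks of 4 bytes (i=2, b=2)
print(read(cache, memory, 5, 2, 2))   # → (5, 'miss')
print(read(cache, memory, 5, 2, 2))   # → (5, 'hit')
```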
Writing DM Cache
• Use the index bits to retrieve the tag and valid bit
• Compare the tag from the address with the retrieved tag
• If valid & a match in tag (hit), write the data into the cache location
• Otherwise (miss), one option:
  – Bring the memory block into the cache (also set valid)
  – Store the tag from the address with the block
  – Write the data into the cache location
Direct Mapped Cache Example
• Size of each block is 4 bytes
• Cache holds 4 blocks
• Memory holds 16 blocks
• Memory address has 6 bits: 2 tag bits, 2 index bits, 2 byte offset bits
[Cache table: V, tag, data for indices 00, 01, 10, 11]
Direct Mapped Cache Example (walkthrough)

Memory contents (block address in binary → data in decimal):

  0000 → 100   0100 → 140   1000 → 180   1100 → 220
  0001 → 110   0101 → 150   1001 → 190   1101 → 230
  0010 → 120   0110 → 160   1010 → 200   1110 → 240
  0011 → 130   0111 → 170   1011 → 210   1111 → 250

Access sequence and outcomes (tag = address bits 5-4, index = bits 3-2):

  R1 <= M[000000]   tag 00, index 00   miss: load block 0000 (100) into index 00
  R2 <= M[000100]   tag 00, index 01   miss: load block 0001 (110) into index 01
  R3 <= M[010000]   tag 01, index 00   miss: block 0100 (140) replaces block 0000
  R2 <= M[011100]   tag 01, index 11   miss: load block 0111 (170) into index 11
  R1 <= M[000000]   tag 00, index 00   miss: block 0000 (100) replaces block 0100
  R1 <= M[000100]   tag 00, index 01   hit:  110 read from index 01

Result: 5 misses and 1 hit; the accesses to M[000000] and M[010000] conflict because both map to cache block 00.
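The whole trace can be replayed with a small hit/miss simulator (a sketch; it tracks only tags, ignoring the data):

```python
def simulate(addresses, index_bits, offset_bits):
    cache = {}  # index -> tag of the resident block
    results = []
    for addr in addresses:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        if cache.get(index) == tag:
            results.append("hit")
        else:
            results.append("miss")
            cache[index] = tag   # fill (or replace) the block on a miss
    return results

# The six accesses from the example, on the 4-block, 4-byte-block cache
trace = [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]
print(simulate(trace, index_bits=2, offset_bits=2))
# → ['miss', 'miss', 'miss', 'miss', 'miss', 'hit']
```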
Doubling the Block Size
• Size of each block is 8 bytes
• Cache holds 2 blocks
• Memory holds 8 blocks
• Memory address has 6 bits: 2 tag bits, 1 index bit, 3 byte offset bits
[Cache table: V, tag, data for indices 0 and 1; each data entry holds two words]
Doubling the Block Size (walkthrough)

Memory contents (each 8-byte block holds two words, at byte offsets 0 and 4):

  block 000 → 100, 110    block 100 → 180, 190
  block 001 → 120, 130    block 101 → 200, 210
  block 010 → 140, 150    block 110 → 220, 230
  block 011 → 160, 170    block 111 → 240, 250

Same access sequence (tag = address bits 5-4, index = bit 3, offset = bits 2-0):

  R1 <= M[000000]   tag 00, index 0, offset 000   miss: load block 000 (100, 110) into index 0
  R2 <= M[000100]   tag 00, index 0, offset 100   hit:  110 already in cache (spatial locality)
  R3 <= M[010000]   tag 01, index 0               miss: block 010 (140, 150) replaces block 000
  R2 <= M[011100]   tag 01, index 1, offset 100   miss: load block 011 (160, 170) into index 1; read 170
  R1 <= M[000000]   tag 00, index 0               miss: block 000 (100, 110) replaces block 010
  R1 <= M[000100]   tag 00, index 0, offset 100   hit:  110 read from index 0

Result: 4 misses and 2 hits; the larger block turns each access to M[000100] into a hit because that word is fetched together with M[000000].
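The same tag-only simulator sketch, re-run with 1 index bit and 3 offset bits, reproduces the improvement from the larger block:

```python
def simulate(addresses, index_bits, offset_bits):
    cache = {}  # index -> tag of the resident block
    results = []
    for addr in addresses:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        if cache.get(index) == tag:
            results.append("hit")
        else:
            results.append("miss")
            cache[index] = tag   # fill (or replace) the block on a miss
    return results

# Same six accesses, now on the 2-block, 8-byte-block cache
trace = [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]
print(simulate(trace, index_bits=1, offset_bits=3))
# → ['miss', 'hit', 'miss', 'miss', 'miss', 'hit']
```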
Block Size Considerations
• Larger blocks may reduce the miss rate due to spatial locality
• But in a fixed-size cache
  – Larger blocks => fewer of them => increased miss rate due to conflicts
  – Larger blocks => data fetched along with the requested data may not be used
• Larger blocks increase the miss penalty
  – It takes longer to transfer a larger block from memory
Next Time
More Caches