ECE 2300 Digital Logic & Computer Organization
Spring 2018

Lecture 20: Caches
Announcements
• HW7 will be posted tonight
• Instructor OH cancelled today
• Lab sessions resume next week
Course Content
• Binary numbers and logic gates
• Boolean algebra and combinational logic
• Sequential logic and state machines
• Binary arithmetic
• Memories
• Instruction set architecture
• Processor organization
• Caches and virtual memory
• Input/output
• Advanced topics
Review: Pipelined Microprocessor
[Datapath figure: the five pipeline stages separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, with PC, +2 adder, instruction RAM, decoder, register file (RF), sign extender (SE), ALU, data RAM, and the MUXes driven by control signals such as LD, SA, SB, DR, MB, F, MW, MD, PCL, and PCJ]
Example: Data Hazards with Forwarding
• Assume HW forwarding and NO delay slot for load
• Identify all data hazards in the following instruction sequences by circling each source register that is read before the updated value is written back

  LW   R2, 0(R1)        X: SW  R2, 4(R1)
  ADDI R3, R2, 1           BEQ R3, R1, X
We Need Fast and Large Memory
[Pipeline figure: IF, ID, EX, MEM, WB stages with the instruction RAM in IF and the data RAM in MEM]
• Processor cycle time: ~300 ps - 2 ns (~3 GHz - 500 MHz)
• DRAM
  – Slow (10-50 ns for a read or write)
  – Cheap (1 transistor + capacitor per bit cell)
• SRAM
  – Fast (100's of ps to a few ns for a read/write)
  – Expensive (6 transistors per bit cell)
Using Caches in the Pipeline
[Pipeline figure: an instruction cache (SRAM) in IF and a data cache (SRAM) in MEM, both backed by main memory (DRAM)]
Cache
• Small SRAM memory that permits rapid access to a subset of instructions or data
  – If the data is in the cache (cache hit), we retrieve it without slowing down the pipeline
  – If the data is not in the cache (cache miss), we retrieve it from the main memory (penalty incurred in accessing DRAM)
• The hit rate is the fraction of memory accesses found in the cache
  – The miss rate is (1 - hit rate)
Memory Access with Cache
• Average memory access time with cache = Hit time + Miss rate × Miss penalty
• Example
  – Main memory access time = 50 ns
  – Cache hit time = 2 ns
  – Miss rate = 10%

  Average memory access time w/o cache = 50 ns
  Average memory access time w/ cache = 2 + 0.1 × 50 = 7 ns
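The average-access-time formula can be checked with a few lines of Python (a sketch; the function name is my own, not from the slides):

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    # Average memory access time = hit time + miss rate * miss penalty
    return hit_time_ns + miss_rate * miss_penalty_ns

# Numbers from the example: 2 ns hit time, 10% miss rate,
# 50 ns main memory access time as the miss penalty
print(amat(2, 0.10, 50))  # → 7.0
```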
Why Caches Work: Principle of Locality
• Temporal locality
  – If memory location X is accessed, then it is likely to be accessed again in the near future
  – Caches exploit temporal locality by keeping a referenced instruction or data in the cache
• Spatial locality
  – If memory location X is accessed, then locations near X are likely to be accessed in the near future
  – Caches exploit spatial locality by bringing a block of instructions or data into the cache on a miss
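As a concrete illustration (my own example, not from the slides), even a trivial loop over an array exhibits both kinds of locality:

```python
data = list(range(1024))

total = 0
for x in data:   # spatial locality: consecutive elements share cache blocks
    total += x   # temporal locality: 'total' is re-referenced every iteration
```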
Some Important Terms
• Cache is partitioned into blocks
  – Each cache block (or cache line) typically contains multiple bytes of data
  – A whole block is read or written during data transfer between cache and main memory
• Each cache block is associated with a tag and a valid bit
  – Tag: a unique ID to differentiate between the different memory blocks that may be mapped into the same cache block
  – Valid bit: indicates whether the data in a cache block is valid (1) or not (0)
Direct Mapped Cache Concepts
• A given memory block is mapped to one and only one cache block
• Example: a cache with 8 blocks
  – Block addresses in decimal
  – Assume the main memory is 4 times larger than the cache (i.e., 32 blocks)

  Cache Block   Memory Blocks
  0             0, 8, 16, 24
  1             1, 9, 17, 25
  2             2, 10, 18, 26
  3             3, 11, 19, 27
  4             4, 12, 20, 28
  5             5, 13, 21, 29
  6             6, 14, 22, 30
  7             7, 15, 23, 31
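The mapping in the table is just a modulo operation; a minimal sketch in Python (names are my own):

```python
NUM_CACHE_BLOCKS = 8  # the 8-block cache from the example

def cache_block(memory_block):
    # Direct mapping: memory block N can only go to cache block N mod 8
    return memory_block % NUM_CACHE_BLOCKS

# Memory blocks 0, 8, 16, 24 all compete for cache block 0
print([cache_block(b) for b in (0, 8, 16, 24)])  # → [0, 0, 0, 0]
print(cache_block(19))                           # → 3
```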
Direct Mapped (DM) Cache Concepts
Same example
• Cache has 8 blocks and main memory has 32 blocks
• Block addresses are in binary
• 4 different memory blocks may be mapped to the same cache location
[Figure: memory blocks 00001, 00101, 01001, 01101, 10001, 10101, 11001, and 11101 all map to cache block 001]
Address Translation for DM Cache
• DM cache parameters
  – Size of each cache block is 2^b bytes
    • "cache block" and "cache line" are synonymous
  – Number of blocks is 2^i
  – Total cache size is 2^b × 2^i = 2^(b+i) bytes
• Breakdown of an n-bit memory address for cache use:

  | n-i-b tag bits | i index bits | b byte offset bits |
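The address breakdown can be sketched with shifts and masks (a hypothetical helper, not from the slides):

```python
def split_address(addr, i, b):
    # b byte-offset bits, i index bits; the remaining high bits are the tag
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << i) - 1)
    tag = addr >> (b + i)
    return tag, index, offset

# 6-bit address 010111 with i = 2 index bits and b = 2 offset bits:
# tag = 01, index = 01, byte offset = 11
print(split_address(0b010111, i=2, b=2))  # → (1, 1, 3)
```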
DM Cache Organization
• 32-bit memory address
• DM cache parameters
  – 2 byte offset bits
  – 10 index bits
  – 20 tag bits
Reading DM Cache
• Use the index bits to retrieve the tag, data, and valid bit
• Compare the tag from the address with the retrieved tag
• If valid & a match in tag (hit), select the desired data using the byte offset
• Otherwise (miss)
  – Bring the memory block into the cache (also set valid)
  – Store the tag from the address with the block
  – Select the desired data using the byte offset
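The read steps can be modeled in a few lines of Python. This is an illustrative sketch under simplifying assumptions (byte-addressable memory modeled as a list, one value per byte); the Line class and function names are my own:

```python
class Line:
    def __init__(self):
        self.valid = False
        self.tag = None
        self.data = None  # the whole cached block (a list of bytes)

def read(cache, memory, addr, i_bits, b_bits):
    offset = addr & ((1 << b_bits) - 1)
    index = (addr >> b_bits) & ((1 << i_bits) - 1)
    tag = addr >> (b_bits + i_bits)
    line = cache[index]
    if line.valid and line.tag == tag:        # hit: use cached data
        return line.data[offset], "hit"
    # miss: bring the aligned memory block in, set valid, store the tag
    base = addr & ~((1 << b_bits) - 1)        # start address of the block
    line.data = [memory[base + k] for k in range(1 << b_bits)]
    line.tag, line.valid = tag, True
    return line.data[offset], "miss"

memory = list(range(64))              # byte i holds value i
cache = [Line() for _ in range(4)]    # 4 blocks of 4 bytes (i=2, b=2)
print(read(cache, memory, 5, 2, 2))   # → (5, 'miss')
print(read(cache, memory, 5, 2, 2))   # → (5, 'hit')
```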
Writing DM Cache
• Use the index bits to retrieve the tag and valid bit
• Compare the tag from the address with the retrieved tag
• If valid & a match in tag (hit), write the data into the cache location
• Otherwise (miss), one option:
  – Bring the memory block into the cache (also set valid)
  – Store the tag from the address with the block
  – Write the data into the cache location
Direct Mapped Cache Example
• Size of each block is 4 bytes
• Cache holds 4 blocks
• Memory holds 16 blocks
• Memory address has 6 bits: 2 tag bits, 2 index bits, 2 byte offset bits
[Cache table: V, tag, data for indices 00, 01, 10, 11]
Direct Mapped Cache Example (walkthrough)

Memory contents (block address in binary → data in decimal):

  0000 → 100   0100 → 140   1000 → 180   1100 → 220
  0001 → 110   0101 → 150   1001 → 190   1101 → 230
  0010 → 120   0110 → 160   1010 → 200   1110 → 240
  0011 → 130   0111 → 170   1011 → 210   1111 → 250

Access sequence and outcomes (tag = address bits 5-4, index = bits 3-2):

  R1 <= M[000000]   tag 00, index 00   miss: load block 0000 (100) into index 00
  R2 <= M[000100]   tag 00, index 01   miss: load block 0001 (110) into index 01
  R3 <= M[010000]   tag 01, index 00   miss: block 0100 (140) replaces block 0000
  R2 <= M[011100]   tag 01, index 11   miss: load block 0111 (170) into index 11
  R1 <= M[000000]   tag 00, index 00   miss: block 0000 (100) replaces block 0100
  R1 <= M[000100]   tag 00, index 01   hit:  110 read from index 01

Result: 5 misses and 1 hit; the accesses to M[000000] and M[010000] conflict because both map to cache block 00.
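The whole trace can be replayed with a small hit/miss simulator (a sketch; it tracks only tags, ignoring the data):

```python
def simulate(addresses, index_bits, offset_bits):
    cache = {}  # index -> tag of the resident block
    results = []
    for addr in addresses:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        if cache.get(index) == tag:
            results.append("hit")
        else:
            results.append("miss")
            cache[index] = tag   # fill (or replace) the block on a miss
    return results

# The six accesses from the example, on the 4-block, 4-byte-block cache
trace = [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]
print(simulate(trace, index_bits=2, offset_bits=2))
# → ['miss', 'miss', 'miss', 'miss', 'miss', 'hit']
```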
Doubling the Block Size
• Size of each block is 8 bytes
• Cache holds 2 blocks
• Memory holds 8 blocks
• Memory address has 6 bits: 2 tag bits, 1 index bit, 3 byte offset bits
[Cache table: V, tag, data for indices 0 and 1; each data entry holds two words]
Doubling the Block Size (walkthrough)

Memory contents (each 8-byte block holds two words, at byte offsets 0 and 4):

  block 000 → 100, 110    block 100 → 180, 190
  block 001 → 120, 130    block 101 → 200, 210
  block 010 → 140, 150    block 110 → 220, 230
  block 011 → 160, 170    block 111 → 240, 250

Same access sequence (tag = address bits 5-4, index = bit 3, offset = bits 2-0):

  R1 <= M[000000]   tag 00, index 0, offset 000   miss: load block 000 (100, 110) into index 0
  R2 <= M[000100]   tag 00, index 0, offset 100   hit:  110 already in cache (spatial locality)
  R3 <= M[010000]   tag 01, index 0               miss: block 010 (140, 150) replaces block 000
  R2 <= M[011100]   tag 01, index 1, offset 100   miss: load block 011 (160, 170) into index 1; read 170
  R1 <= M[000000]   tag 00, index 0               miss: block 000 (100, 110) replaces block 010
  R1 <= M[000100]   tag 00, index 0, offset 100   hit:  110 read from index 0

Result: 4 misses and 2 hits; the larger block turns each access to M[000100] into a hit because that word is fetched together with M[000000].
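The same tag-only simulator sketch, re-run with 1 index bit and 3 offset bits, reproduces the improvement from the larger block:

```python
def simulate(addresses, index_bits, offset_bits):
    cache = {}  # index -> tag of the resident block
    results = []
    for addr in addresses:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        if cache.get(index) == tag:
            results.append("hit")
        else:
            results.append("miss")
            cache[index] = tag   # fill (or replace) the block on a miss
    return results

# Same six accesses, now on the 2-block, 8-byte-block cache
trace = [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]
print(simulate(trace, index_bits=1, offset_bits=3))
# → ['miss', 'hit', 'miss', 'miss', 'miss', 'hit']
```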
Block Size Considerations
• Larger blocks may reduce the miss rate due to spatial locality
• But in a fixed-size cache
  – Larger blocks => fewer of them => increased miss rate due to conflicts
  – Larger blocks => data fetched along with the requested data may not be used
• Larger blocks increase the miss penalty
  – It takes longer to transfer a larger block from memory
Next Time
More Caches