Caching Chapter 7. Memory Hierarchy CPU L1 L2 Cache DRAM Speed Fastest Slowest Size Smallest Largest...


Caching

Chapter 7

Memory Hierarchy

            CPU   L1              L2 Cache        DRAM
Speed             Fastest                         Slowest
Size              Smallest                        Largest
Cost/bit          Highest                         Lowest
Tech              SRAM (logic)    SRAM (logic)    DRAM (capacitors)

Two design decisions

• What shall we put in the cache?

• How shall we organize the cache to
  – find things quickly
  – hold the most important data
  – freezer or backpack….

What to put in cache?
Try to apply a similar problem’s solution

• Can we predict what data we will use?
  – Instead of predicting branch direction, predict next memory address request
  – Like branch prediction, use previous behavior

• Keep a prediction for every load?
  – Fetch stage for load is *TOO LATE*

• Keep a prediction per-memory address?
  – Given address, guess next likely address
  – Too many choices – table too large or fits too few

Program Characteristics
Find out more about programs

• Temporal Locality
  – If you use one item, you are likely to use it again soon

• Spatial Locality
  – If you use one item, you are likely to use its neighbors soon
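A rough illustration of both kinds of locality, using the address trace of a loop that sums an array. The layout is hypothetical: the array base `0x2000`, the location of `total` at `0x1000`, and the helper name `sum_array_trace` are assumptions made for this sketch only.

```python
def sum_array_trace(array_base, n, word_bytes=4, total_addr=0x1000):
    """Byte addresses touched by `total += a[i]` for i in 0..n-1.

    The addresses are made up for illustration; only the pattern matters.
    """
    trace = []
    for i in range(n):
        trace.append(array_base + i * word_bytes)  # a[i]: neighbor of a[i-1] (spatial)
        trace.append(total_addr)                   # total: same address again (temporal)
    return trace


trace = sum_array_trace(array_base=0x2000, n=4)

# Temporal locality: how many accesses repeat an address seen earlier?
reuses = sum(1 for i, addr in enumerate(trace) if addr in trace[:i])

# Spatial locality: how many array reads touch the word right after the previous one?
array_reads = trace[0::2]
neighbor_steps = sum(1 for a, b in zip(array_reads, array_reads[1:]) if b - a == 4)

print(f"repeat accesses: {reuses}, neighbor steps: {neighbor_steps}")
```

The repeated accesses to `total` are the temporal part; the word-by-word walk through the array is the spatial part.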


Locality

• Programs tend to exhibit spatial & temporal locality. Just a fact of life.

• How can we use this knowledge of program behavior to design a cache?

What does that mean?!?

• 1. Design cache that takes advantage of spatial & temporal locality

• 2. When you program, place data together that is used together to increase spatial & temporal locality
  – Java - difficult to do
  – C - more control over data placement

• Note: Caches exploit locality. Programs have varying degrees of locality. Caches do not have locality!

Cache Design

• Temporal Locality
  – When we obtain the data, store it in the cache.

• Spatial Locality
  – Transfer large block of contiguous data to get item’s neighbors.
  – Block (Line): Amount of data transferred for a single miss (data plus neighbors)
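A minimal sketch of which bytes travel together on a miss, assuming the block size is a power of two so that the block base is the address with its low bits cleared. The function name `block_range` is hypothetical.

```python
def block_range(byte_address, block_size_bytes):
    """Return (first, last) byte address of the block containing byte_address.

    Assumes block_size_bytes is a power of two.
    """
    base = byte_address & ~(block_size_bytes - 1)  # clear the offset bits
    return base, base + block_size_bytes - 1


first, last = block_range(292, 8)
print(f"a miss on byte 292 brings in M[{first}-{last}]")
```

With 8-byte blocks, a miss on byte 292 also fetches its neighbors in bytes 288 through 295.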

Where do we put data?

• Searching whole cache takes time & power

• Direct-mapped
  – Limit each piece of data to one possible position

• Search is quick and simple


What is our “key” for lookup?

• Tools are sorted by tool-type

• Books are sorted by subject (Dewey-Decimal)

• Old LISP machine sorted by data type

• Modern machines have no information – can only sort by address

Direct-Mapped

[Figure: a cache with four entries (Index 00, 01, 10, 11) drawn beside Memory. Memory addresses shown: 000000, 000100, 010000, 010100, 100000, 100100, 110000, 110100. Each box corresponds to one word (4 bytes); one block (line) is marked.]

Draw on the board!!! Show what addresses go where

Direct-Mapped cache
Block (Line) size = 2 words or 8 bytes

Byte Address: 0b100100100

Address fields: Tag 1001 | Index 00 | Block Offset 1 | Byte Offset 00

• Where do we look in the cache?
  – BlockAddress mod #slots
  – BlockAddress & (#slots-1)

• How do we know if it is there?
  – We need a tag & valid bit

• Where is it within the block?

Cache (four entries: Index 00, 01, 10, 11); the entry for this address:

Index  Valid  Tag   Data
00     1      1001  M[292-295] M[288-291]
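The lookup questions above can be sketched as bit operations on the byte address. This assumes the slide's configuration (4 slots, 2 words per block, 4 bytes per word) and that all sizes are powers of two; the function name `split_address` is hypothetical.

```python
def split_address(addr, num_slots=4, words_per_block=2, bytes_per_word=4):
    """Split a byte address into (tag, index, block_offset, byte_offset).

    Assumes num_slots, words_per_block and bytes_per_word are powers of two.
    """
    byte_bits = bytes_per_word.bit_length() - 1
    block_bits = words_per_block.bit_length() - 1
    index_bits = num_slots.bit_length() - 1

    byte_offset = addr & (bytes_per_word - 1)                   # which byte within word
    block_offset = (addr >> byte_bits) & (words_per_block - 1)  # which word within block
    block_address = addr >> (byte_bits + block_bits)
    index = block_address & (num_slots - 1)   # same as block_address % num_slots
    tag = block_address >> index_bits         # all of the upper bits
    return tag, index, block_offset, byte_offset


tag, index, block_offset, byte_offset = split_address(0b100100100)
print(f"tag={tag:04b} index={index:02b} "
      f"block_offset={block_offset} byte_offset={byte_offset:02b}")
# prints: tag=1001 index=00 block_offset=1 byte_offset=00
```

The `&` form only matches `mod` because the number of slots is a power of two.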

Splitting the Address

Direct-Mapped Cache

[Figure: a four-entry cache (Index 00, 01, 10, 11) with Valid, Tag, and Data columns; all valid bits are 0. The address 0b1010001 is split into Tag, Index, Block Offset, and Byte Offset fields.]

Definitions

• Byte Offset: Which byte within word

• Block Offset: Which word within block

• Set: Group of blocks checked each access

• Index: Which set within cache?

• Tag: Is this the right one? (All of the upper bits)

Definitions

• Block (Line) - unit of data transfer – bytes/words

• Hit - data found in this cache

• Miss - data not found in this cache
  – Send request to lower level

• Hit time / Access time
  – Time to access this cache – look for item, return data

• Miss Penalty
  – Time to receive block from lower level
  – Not always constant

Example 1 – Direct-Mapped
Block size = 2 words

Address fields (7-bit byte address): Tag (2 bits) | Index (2 bits) | Block Offset (1 bit) | Byte Offset (2 bits)

The cache starts empty (all valid bits 0).

Reference Stream    Hit/Miss
0b1001000           M
0b0010100           M
0b0111000           M
0b0010000           H
0b0010100           H
0b0100100           M

Miss Rate: 4 / 6 = 67%
Hit Rate: 2 / 6 = 33%

Direct-Mapped Cache (final state)

Index  Valid  Tag  Data
00     1      01   M[36-39] M[32-35]
01     1      10   M[76-79] M[72-75]
10     1      00   M[20-23] M[16-19]
11     1      01   M[60-63] M[56-59]
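The walk-through in Example 1 can be replayed with a small model. This is a sketch under the example's assumptions (4 slots, 8-byte blocks, cache initially empty); it tracks only valid bits and tags, not the data, and the name `simulate_direct_mapped` is hypothetical.

```python
def simulate_direct_mapped(addresses, num_slots=4, block_bytes=8):
    """Replay byte addresses through a direct-mapped cache model.

    Returns a list of 'H'/'M', one per access.
    """
    valid = [False] * num_slots
    tags = [None] * num_slots
    results = []
    for addr in addresses:
        block_address = addr // block_bytes
        index = block_address % num_slots
        tag = block_address // num_slots
        if valid[index] and tags[index] == tag:
            results.append("H")
        else:
            results.append("M")
            valid[index] = True   # fill (or replace) the slot on a miss
            tags[index] = tag
    return results


stream = [0b1001000, 0b0010100, 0b0111000, 0b0010000, 0b0010100, 0b0100100]
results = simulate_direct_mapped(stream)
misses = results.count("M")
print(results, f"miss rate = {misses}/{len(results)}")
```

The two hits come from the block fetched by the second access: 0b0010000 is the neighboring word in the same block (spatial locality), and 0b0010100 is a repeat (temporal locality).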

Implementation

[Figure: direct-mapped cache datapath. The byte address 0b100100100 is split into Tag, Index, Block offset, and Byte Offset. The Index selects one of the entries (00, 01, 10, 11), each with Valid, Tag, and Data fields. A comparator (=) checks the stored tag against the address tag to produce "Hit?", and a MUX controlled by the Block offset selects the Data word.]

Example 2

• You are implementing a 64-Kbyte cache, 32-bit address

• The block size (line size) is 16 bytes.

• Each word is 4 bytes

• How many bits is the block offset?
  – 16 / 4 = 4 words -> 2 bits

• How many bits is the index?
  – 64*1024 / 16 = 4096 -> 12 bits

• How many bits is the tag?
  – 32 - (2 + 12 + 2) = 16 bits
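The same arithmetic as a small helper, assuming a direct-mapped cache (one block per index) and power-of-two sizes. In the tag computation, the three terms subtracted from 32 are the byte offset (2), index (12), and block offset (2). The name `field_widths` is hypothetical.

```python
def field_widths(cache_bytes, block_bytes, word_bytes=4, address_bits=32):
    """Return (tag, index, block_offset, byte_offset) widths in bits.

    Assumes a direct-mapped cache and power-of-two sizes.
    """
    byte_offset_bits = word_bytes.bit_length() - 1                      # log2(bytes per word)
    block_offset_bits = (block_bytes // word_bytes).bit_length() - 1    # log2(words per block)
    index_bits = (cache_bytes // block_bytes).bit_length() - 1          # log2(number of blocks)
    tag_bits = address_bits - (index_bits + block_offset_bits + byte_offset_bits)
    return tag_bits, index_bits, block_offset_bits, byte_offset_bits


tag_bits, index_bits, block_offset_bits, byte_offset_bits = field_widths(64 * 1024, 16)
print(tag_bits, index_bits, block_offset_bits, byte_offset_bits)
# prints: 16 12 2 2
```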

How caches work

• Classic abstraction

• Each level of hierarchy has no knowledge of the configuration of lower level

[Figure: two views of the hierarchy. L1 cache’s perspective: "Me" is L1, and L2 Cache plus DRAM together are "Memory". L2 cache’s perspective: "Me" is L2 Cache, and DRAM is "Memory".]

Memory operation at any level

[Figure: "Me" sends an Address to the Cache, which sits above Memory; Data is returned to "Me". The arrows are numbered 1 to 5.]

1. Cache receives request
2. Look for item in cache
   Hit - return data
   Miss –
3. request memory
4. receive data
5. update cache, return data
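The steps above, sketched so that each level only talks to the level below it. This is a simplified model, not a real cache: the class name `Level` is hypothetical, storage is an unbounded dictionary (no index, tag, or eviction), and the bottom level is assumed to always hold the data.

```python
class Level:
    """One level of the hierarchy. It only knows 'the level below me'."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower   # None means this level is the backing memory
        self.blocks = {}     # block address -> data (unbounded, for brevity)

    def read(self, block_address, log):
        log.append(f"{self.name}: request")          # 1. receive request
        if self.lower is None:
            return f"M[{block_address}]"             # memory always has the data
        if block_address in self.blocks:             # 2. look for item
            log.append(f"{self.name}: hit")
            return self.blocks[block_address]        #    hit: return data
        log.append(f"{self.name}: miss")
        data = self.lower.read(block_address, log)   # 3-4. request memory, receive data
        self.blocks[block_address] = data            # 5. update cache
        return data                                  #    return data


dram = Level("DRAM")
l2 = Level("L2", lower=dram)
l1 = Level("L1", lower=l2)

log = []
first = l1.read(36, log)    # misses in L1 and L2, served by DRAM
second = l1.read(36, log)   # now hits in L1
print(log)
```

L1 never references DRAM directly; from its point of view, everything below is just "memory".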

Timing

[Figure: the same "Me" / Cache / Memory diagram (Address in, Data out), with two intervals marked: Access Time and Miss Penalty.]

1. Cache receives request
2. Look for item in cache
   Hit - return data
   Miss - request memory, receive block, update cache, return data

• Access Time: receiving the request and looking for the item in this cache

• Miss Penalty: requesting memory, receiving the block, and updating the cache

Performance

• Hit: latency = access time

• Miss: latency = access time + miss penalty

• Goal: minimize misses!!!
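A small sketch of why misses dominate, combining the two latency formulas into a weighted average. The timings (1-cycle access, 100-cycle miss penalty) are hypothetical and chosen only for illustration; the miss rate is the 4/6 from Example 1.

```python
def hit_latency(access_time):
    return access_time


def miss_latency(access_time, miss_penalty):
    return access_time + miss_penalty


def average_latency(access_time, miss_penalty, miss_rate):
    """Average over all accesses, weighting hits and misses by how often they occur."""
    hit_rate = 1.0 - miss_rate
    return (hit_rate * hit_latency(access_time)
            + miss_rate * miss_latency(access_time, miss_penalty))


# Hypothetical timings: 1-cycle access time, 100-cycle miss penalty.
avg = average_latency(access_time=1, miss_penalty=100, miss_rate=4 / 6)
print(f"average latency = {avg:.1f} cycles")
```

Under these assumed numbers the average is far closer to the miss latency than to the hit latency, which is the point of the last bullet.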