Caching Chapter 7. Memory Hierarchy CPU L1 L2 Cache DRAM Speed Fastest Slowest Size Smallest Largest...

Caching

Chapter 7

Memory Hierarchy

[Figure: CPU, L1, L2 Cache, DRAM, ordered from fastest to slowest, smallest to largest, and highest to lowest cost/bit. Caches are built from SRAM (logic); main memory is DRAM (capacitors).]

Two design decisions

• What shall we put in the cache?

• How shall we organize cache to
– find things quickly
– hold the most important data
– freezer or backpack…

What to put in cache?
Try to apply a similar problem’s solution

• Can we predict what data we will use?
– Instead of predicting branch direction, predict next memory address request
– Like branch prediction, use previous behavior

• Keep a prediction for every load?
– Fetch stage for load is *TOO LATE*

• Keep a prediction per-memory address?
– Given address, guess next likely address
– Too many choices: table too large or fits too few

Program Characteristics
Find out more about programs

• Temporal Locality
– If you use one item, you are likely to use it again soon

• Spatial Locality
– If you use one item, you are likely to use its neighbors soon

Locality

• Programs tend to exhibit spatial & temporal locality. Just a fact of life.

• How can we use this knowledge of program behavior to design a cache?

What does that mean?!?

• 1. Design cache that takes advantage of spatial & temporal locality

• 2. When you program, place data together that is used together to increase spatial & temporal locality
– Java - difficult to do
– C - more control over data placement

• Note: Caches exploit locality. Programs have varying degrees of locality. Caches do not have locality!

Cache Design

• Temporal Locality
– When we obtain the data, store it in the cache.

• Spatial Locality
– Transfer large block of contiguous data to get item’s neighbors.
– Block (Line): Amount of data transferred for a single miss (data plus neighbors)
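To see why transferring a whole block pays off, here is a minimal Python sketch that is not from the slides. It counts misses for a sequential scan of 16 words at several block sizes. The name `count_misses`, the four-slot cache, and the rule that each block has exactly one possible slot are assumptions made for illustration.

```python
def count_misses(addresses, block_bytes, num_blocks=4):
    """Count misses in a tiny cache model (illustrative assumption, not real hardware)."""
    resident = [None] * num_blocks      # block address currently held in each slot
    misses = 0
    for addr in addresses:
        block_addr = addr // block_bytes
        slot = block_addr % num_blocks  # each block may live in exactly one slot
        if resident[slot] != block_addr:
            misses += 1                 # miss: fetch the item plus its neighbors
            resident[slot] = block_addr
    return misses

# Sequential scan of 16 four-byte words: byte addresses 0, 4, ..., 60.
scan = list(range(0, 64, 4))
for block_bytes in (4, 8, 16):
    print(block_bytes, count_misses(scan, block_bytes))
# prints:
# 4 16
# 8 8
# 16 4
```

With one-word blocks every access misses; with larger blocks the neighbors arrive along with the first word, so later accesses to them hit.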

Where do we put data?

• Searching whole cache takes time & power

• Direct-mapped
– Limit each piece of data to one possible position

• Search is quick and simple

What is our “key” for lookup?

• Tools are sorted by tool-type

• Books are sorted by subject (Dewey-Decimal)

• Old LISP machine sorted by data type

• Modern machines have no information – can only sort by address

Direct-Mapped

[Figure: a direct-mapped cache with four entries (00, 01, 10, 11) beside a Memory drawn as boxes at byte addresses 000000, 000100, 010000, 010100, 100000, 100100, 110000, 110100. Each box corresponds to one word (4 bytes); two adjacent words form one block (line).]

Draw on the board!!! Show what addresses go

Direct-Mapped cache
Block (Line) size = 2 words or 8 bytes

Byte Address: 0b100100100

• Where do we look in the cache?
– BlockAddress mod #sets
– BlockAddress & (#sets-1), which is equivalent when #sets is a power of two

• How do we know if it is there?
– We need a tag & valid bit

• Where is it within the block?
– The block offset selects the word

Index  Valid  Tag   Data
00     1      1001  M[288-291], M[292-295]
01
10
11

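Splitting the Address

[Figure: direct-mapped cache with rows 00, 01, 10, 11 and Valid, Tag, and Data columns. An example address (shown as 0b1010001) is divided into fields, with the Block Offset and Byte Offset labeled.]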
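The address split for the 8-byte-block, four-entry cache can be sketched in Python as follows. This is an illustrative sketch, not code from the slides: the function name `split_address` and its parameter names are assumptions, and the bitwise-AND form of the index assumes the number of sets is a power of two.

```python
def split_address(addr, block_bytes=8, num_sets=4, word_bytes=4):
    """Split a byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr % word_bytes                    # which byte within the word
    block_offset = (addr % block_bytes) // word_bytes  # which word within the block
    block_addr = addr // block_bytes
    index = block_addr & (num_sets - 1)  # same as block_addr % num_sets for powers of two
    tag = block_addr // num_sets         # all of the upper bits
    return tag, index, block_offset, byte_offset

tag, index, block_offset, byte_offset = split_address(0b100100100)
print(bin(tag), index, block_offset, byte_offset)  # prints: 0b1001 0 1 0
```

For byte address 0b100100100 (decimal 292) this gives tag 1001 and index 00, matching the slide's cache entry holding M[288-291] and M[292-295].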

Definitions

• Byte Offset: Which byte within word

• Block Offset: Which word within block

• Set: Group of blocks checked each access

• Index: Which set within cache?

• Tag: Is this the right one? (All of the upper bits)

Definitions

• Block (Line) - unit of data transfer, in bytes/words

• Hit - data found in this cache

• Miss - data not found in this cache
– Send request to lower level

• Hit time / Access time
– Time to access this cache: look for item, return

• Miss Penalty
– Time to receive block from lower level
– Not always constant


Example 1 – Direct-Mapped
Block size = 2 words

Address fields: Tag | Index | Block Offset | Byte Offset

Reference Stream: Hit/Miss
0b1001000  M
0b0010100  M
0b0111000  M
0b0010000  H
0b0010100  H
0b0100100  M

Final cache contents:
Index  Valid  Tag  Data
00     1      01   M[32-35], M[36-39]
01     1      10   M[72-75], M[76-79]
10     1      00   M[16-19], M[20-23]
11     1      01   M[56-59], M[60-63]

Miss Rate: 4 / 6 = 67%
Hit Rate: 2 / 6 = 33%
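The Example 1 walkthrough can be checked with a short simulation. This is a minimal Python sketch, assuming a four-entry direct-mapped cache with 8-byte blocks as in the example; the function name `simulate` and its return shape are illustrative choices, not part of the slides.

```python
def simulate(addresses, block_bytes=8, num_sets=4):
    """Replay byte addresses through a direct-mapped cache; return outcomes and final state."""
    valid = [False] * num_sets
    tags = [0] * num_sets
    outcomes = []
    for addr in addresses:
        block_addr = addr // block_bytes
        index = block_addr % num_sets   # where do we look?
        tag = block_addr // num_sets    # is this the right one?
        if valid[index] and tags[index] == tag:
            outcomes.append("H")
        else:
            outcomes.append("M")        # miss: fetch the block, then update the entry
            valid[index] = True
            tags[index] = tag
    return outcomes, valid, tags

stream = [0b1001000, 0b0010100, 0b0111000, 0b0010000, 0b0010100, 0b0100100]
outcomes, valid, tags = simulate(stream)
print(outcomes)                               # prints: ['M', 'M', 'M', 'H', 'H', 'M']
print(outcomes.count("M"), "/", len(outcomes))  # prints: 4 / 6
```

Only tags and valid bits are tracked, since the data itself does not affect whether an access hits.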

Implementation

[Figure: cache rows 00, 01, 10, 11 with Valid, Tag, and Data columns; the address is divided into Tag, Index, Block offset, and Byte Offset.]

Example 2

• You are implementing a 64-Kbyte cache, 32-bit address

• The block size (line size) is 16 bytes.

• Each word is 4 bytes

• How many bits is the block offset?
– 16 / 4 = 4 words -> 2 bits

• How many bits is the index?
– 64*1024 / 16 = 4096 -> 12 bits

• How many bits is the tag?
– 32 - (2 byte offset + 12 index + 2 block offset) = 16 bits
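The arithmetic in Example 2 can be written as a small Python helper. This is a sketch under stated assumptions: the cache is direct-mapped, all sizes are powers of two, and the name `field_widths` and its parameters are hypothetical.

```python
def field_widths(cache_bytes, block_bytes, word_bytes=4, address_bits=32):
    """Return (byte_offset_bits, block_offset_bits, index_bits, tag_bits).

    Assumes a direct-mapped cache and power-of-two sizes, so that
    (n - 1).bit_length() equals log2(n).
    """
    byte_offset_bits = (word_bytes - 1).bit_length()
    block_offset_bits = (block_bytes // word_bytes - 1).bit_length()
    index_bits = (cache_bytes // block_bytes - 1).bit_length()
    tag_bits = address_bits - (byte_offset_bits + block_offset_bits + index_bits)
    return byte_offset_bits, block_offset_bits, index_bits, tag_bits

print(field_widths(64 * 1024, 16))  # prints: (2, 2, 12, 16)
```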

How caches work

• Classic abstraction

• Each level of hierarchy has no knowledge of the configuration of lower level

[Figure: two side-by-side views, "L1 cache’s perspective" and "L2 cache’s perspective", with boxes labeled Me, L2 Cache, and Memory.]

Memory operation at any level

1. Cache receives request (address)
2. Look for item in cache
– Hit - return data
– Miss - continue with steps 3 to 6
3. Request memory
4. Receive data
5. Update cache
6. Return data
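Because every level follows the same steps, the hierarchy can be modeled by chaining identical objects. The Python sketch below is illustrative only: the class names `Memory` and `CacheLevel`, the direct-mapped placement, and the use of block addresses as requests are all assumptions, not details from the slides.

```python
class Memory:
    """Lowest level: always holds the data (hypothetical backing store)."""
    def __init__(self):
        self.requests = 0

    def read(self, block_addr):
        self.requests += 1
        return f"block{block_addr}"


class CacheLevel:
    """One cache level; it treats whatever is below it simply as 'memory'."""
    def __init__(self, num_sets, lower):
        self.num_sets = num_sets
        self.lower = lower
        self.lines = [None] * num_sets   # each entry is (tag, data) or None
        self.hits = 0
        self.misses = 0

    def read(self, block_addr):
        index = block_addr % self.num_sets       # look for item in cache
        tag = block_addr // self.num_sets
        line = self.lines[index]
        if line is not None and line[0] == tag:
            self.hits += 1
            return line[1]                       # hit: return data
        self.misses += 1
        data = self.lower.read(block_addr)       # miss: request lower level, receive data
        self.lines[index] = (tag, data)          # update cache
        return data                              # return data


memory = Memory()
l2 = CacheLevel(num_sets=8, lower=memory)
l1 = CacheLevel(num_sets=2, lower=l2)
for block_addr in (0, 2, 0, 0):
    l1.read(block_addr)
print(l1.hits, l1.misses, l2.hits, l2.misses, memory.requests)  # prints: 1 3 1 2 2
```

Blocks 0 and 2 conflict in the two-entry L1 but not in the eight-entry L2, so the third read misses in L1 and hits in L2 without reaching memory.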

Timing

[Figure: timeline of a memory operation. Access Time: the cache receives the request and looks for the item; on a hit it returns the data. Miss Penalty: on a miss, the cache requests memory, receives the block, updates the cache, and returns the data.]

Performance

• Hit: latency = access time

• Miss: latency = access time + miss penalty

• Goal: minimize misses!!!
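Weighting the two cases by how often each occurs gives an average latency per access. The Python sketch below is illustrative: the function name `average_latency` is hypothetical, and the 1-cycle access time and 100-cycle miss penalty are assumed numbers, not values from the slides. The miss rate of 4/6 is the one measured in Example 1.

```python
def average_latency(access_time, miss_penalty, miss_rate):
    """Average latency per access, weighting hit and miss latency by their frequency."""
    hit_latency = access_time
    miss_latency = access_time + miss_penalty
    return (1 - miss_rate) * hit_latency + miss_rate * miss_latency

# Assumed timings: 1-cycle access time, 100-cycle miss penalty.
print(round(average_latency(1, 100, 4 / 6), 2))  # prints: 67.67
print(average_latency(1, 100, 0))                # prints: 1
```

Even with a fast access time, a high miss rate leaves the average dominated by the miss penalty, which is why the goal is to minimize misses.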
