Memory and Cache - csce.uark.educsce.uark.edu/.../lectures/lecture14-memory-cache.pdf · Cache...

Memory and Cache Alexander Nelson April 20, 2020 University of Arkansas - Department of Computer Science and Computer Engineering


Memory and Cache

Alexander Nelson

April 20, 2020

University of Arkansas - Department of Computer Science and Computer Engineering


“Ideally one would desire an indefinitely large memory capacity such that any particular ... word would be immediately available. ... We are ... forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible” - Burks, Goldstine, von Neumann (1946)


Memory

What is memory?

SRAM Cell DRAM Cell


Memory

Memory – Safe place to store data

• Includes static & dynamic memory technologies

• Includes magnetic disk storage

Goal – Be able to access data quickly & reliably

May need to access a lot of data


Memory

Why a hierarchy of memories?

Type                          Access Time                   Cost per GiB
SRAM semiconductor memory     0.5ns - 2.5ns                 $500 - $1000
DRAM semiconductor memory     50ns - 70ns                   $10 - $20
Flash semiconductor memory    5,000ns - 50,000ns            $0.75 - $1.00
Magnetic disk                 5,000,000ns - 20,000,000ns    $0.05 - $0.10

(Numbers as of 2012 per our book)

Tradeoff between cost and speed!


Goal: Hierarchy of Memories

Why this structure?

Cache Memory

Use near-CPU memories as cache!

Cache – “A safe place for hiding or storing things”

Copy data from main memory to near memory for quick access

How do we know what will be used?


Principle of Temporal Locality

Principle of Temporal Locality

def. “An item that has been referenced is likely to be referenced

again soon”

e.g. Web Browser cache – Store more static aspects of pages

e.g. Variables in a loop


Principle of Spatial Locality

Principle of Spatial Locality

def. “An item that is referenced is likely near to items that will be

referenced soon”

e.g. Website pre-fetch – Downloading content before you scroll so

that content shows in real-time

e.g. Sequential instruction access


Locality

These two principles of locality enable caches to work well

How do we define working well?

Hit Rate – Fraction of memory accesses found in the upper levels

of cache!

conversely

Miss Rate – Fraction of memory accesses not found in upper

levels of cache

Miss Rate = 1 - Hit Rate

Good cache choices increase hit rate!


Taking Advantage of Locality

Cache Strategy:

• Store everything on disk

• Copy recently accessed items to smaller DRAM memory

• Copy items nearby recent accesses to smaller DRAM memory

• Copy most recent accesses (and nearby) from DRAM to SRAM


Hit Rate & Miss Penalty

Block (a.k.a. line) – Minimum unit of data that is either present or not present in a level of the hierarchy

If block present in the upper level = Hit

If block not present in the upper level = Miss

Hit Time = Time required to access a given level of hierarchy if

item is present

Miss Penalty = Time required to fetch a block into a level of

memory hierarchy from a lower level

Hierarchy performance based on improved Hit Time, reduced Miss

Rate & Miss Penalty


Memory Technologies


SRAM – Static Random Access Memory

SRAM – 6-Transistor SRAM cell

Fixed access & write times – typically very close to cycle time


DRAM – Dynamic Random Access Memory

DRAM – Uses a capacitor to store bit

Must be periodically refreshed

To refresh:

• Read value

• Write it back

Charge kept for several milliseconds


DRAM Structure

Uses two-level decode structure – Allows refreshing entire rows at

the same time


DRAM Organization

Bits in DRAM organized as rectangular array

• DRAM accesses entire row

• Burst Mode – supply successive words from a row with reduced latency

Double Data Rate DRAM (DDR DRAM)

• Transfer on rising and falling clock edges

Quad Data Rate (QDR) DRAM

• Separate DDR inputs & outputs


DRAM – Cost & Capacity

DRAM – Often organized into DIMM

DIMM = Dual Inline Memory Modules

DIMMs contain 4-16 DRAMs

Cost per bit has dramatically reduced


DRAM Performance Factors

Row Buffer

• Allows several words to be read and refreshed in parallel

Synchronous DRAM

• Allow consecutive accesses in bursts without need to send addresses

• Improves bandwidth

DRAM banking

• Allow simultaneous access to multiple DRAMs

• Improves bandwidth


Increasing Memory Bandwidth


Flash Storage

Flash – Nonvolatile Semiconductor Storage

• 100x - 1000x faster than disk

• Smaller, lower power, more robust

• More expensive per bit than disk (but cheaper than DRAM)


Disk Storage

Disk – Nonvolatile, rotating magnetic storage


Disk Sectors and Access

Each sector records

• Sector ID

• Data (512 bytes, 4096 bytes proposed)

• Error correcting code (ECC)

• Used to hide defects and recording errors

• Synchronization fields and gaps

Access to a sector involves:

• Queuing delay if other accesses are pending

• Seek: move the heads

• Rotational latency

• Data transfer

• Controller overhead


Cache


Cache Memory

Cache memory – level of memory hierarchy closest to CPU

Cache memory is smaller than lower levels of hierarchy

How do we know if data is in the cache?

Where would we look?


Cache Memory

Direct Mapped Cache – Location determined by address

Direct mapped = only one choice!

Cache address = (block address) mod (number of blocks in cache)

# of blocks is a power of 2


Cache Memory

How do we know which block is currently stored in a given cache location?

Store block address as well as data!

• Only need high order bits

• i.e. (block address) / (number of blocks in cache), using integer division

• Called the “tag”

What if no data in a location?

Valid bit!

• 1 = present

• 0 = not present

• initialized to 0


Cache Example

Consider the following cache:

8 blocks, 1 word/block, direct mapped

Note, all valid bits = 0


Cache Example – Initial State

Instruction 1:

lw $s0, $zero, 10110₂ # Load from 10110₂ into register

Is this cache hit or miss?

MISS! – Cache is empty, all valid bits are 0!

Load block from memory into cache, then into register


Cache Example – State 1

New state:

Instruction 2:

lw $s1, $zero, 11010₂ # Load from 11010₂ into register

Cache hit or miss?


Cache Example – State 2

MISS! – Index 010 not valid

Load block from memory, into register. New state:

Instruction 3:

lw $s3, $zero, 10110₂ # Load from 10110₂ into register

Cache hit or miss?

Cache Example – State 3

HIT! – Index 110 valid, and Tag == 10

Load directly from cache into register, no state change

Instruction 4:

lw $s4, $zero, 10000₂ # Load from 10000₂ into register

Cache hit or miss?

Cache Example – State 4

MISS – Index 000 not valid

Load into cache then register

Instruction 5: lw $s5, $zero, 00011₂ # Load from 00011₂ into register

Cache hit or miss?

Instruction 6: lw $s6, $zero, 10000₂ # Load from 10000₂ into register

Cache hit or miss?


Cache Example – State 5

Instruction 5: MISS! – index 011 not valid

Instruction 6: HIT! – index 000 valid, tag == 10

New state:

Instruction 7 – lw $s7, $zero, 10010₂

Cache hit or miss?

Cache Example – State 6

MISS! – Index 010 valid, but tag != 10

Replace cache with memory, load into register

Final state:
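The seven-instruction walk-through above can be replayed with a small simulation. This is only a sketch of the 8-block, 1-word-per-block direct-mapped cache from the example; the function name `simulate_direct_mapped` and its return shape are made up for illustration, and addresses are treated as word addresses as on the slides.

```python
def simulate_direct_mapped(addresses, num_blocks=8):
    """Replay word addresses against a direct-mapped cache (1 word per block)."""
    valid = [False] * num_blocks   # valid bits, all initialized to 0
    tags = [None] * num_blocks     # tag stored alongside each cache entry
    results = []
    for addr in addresses:
        index = addr % num_blocks   # low-order bits select the cache entry
        tag = addr // num_blocks    # high-order bits identify the block
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            valid[index] = True     # fetch the block and record its tag
            tags[index] = tag
    return results, valid, tags


# Addresses from instructions 1-7 of the example, written in binary.
trace = [0b10110, 0b11010, 0b10110, 0b10000, 0b00011, 0b10000, 0b10010]
results, valid, tags = simulate_direct_mapped(trace)
for addr, outcome in zip(trace, results):
    print(f"{addr:05b}: {outcome}")
```

The last access (10010₂) maps to index 010, which already holds tag 11, so the entry is replaced and ends up with tag 10.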


Address Subdivision

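Low-order bits (above the byte offset) select which cache block to address

• The “Byte Offset” bits select a byte within the block and are not part of the index

Compare the high-order bits (MSBs) with the stored tag to check for a match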


Example – Larger Block Size

What if we use a larger block?

64 blocks, 16 bytes/block = 4 words/block

• What block number does address 1200 map to?

Block address = (1200/16) = 75

Block Number = 75 mod 64 = 11
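The same mapping can be written as a few integer operations. The sketch below assumes byte addresses and the 64-block, 16-bytes-per-block cache from this example; `map_address` is a hypothetical helper name, and the tag and offset it returns go beyond what the slide computes.

```python
def map_address(byte_address, block_size_bytes=16, num_blocks=64):
    """Split a byte address into block address, cache index, tag and byte offset."""
    block_address = byte_address // block_size_bytes  # which memory block
    index = block_address % num_blocks                # which cache block it maps to
    tag = block_address // num_blocks                 # high-order bits kept as the tag
    offset = byte_address % block_size_bytes          # byte within the block
    return block_address, index, tag, offset


print(map_address(1200))  # (75, 11, 1, 0)
```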


Block Size Considerations

Larger blocks should reduce miss rate!

• What principle helps us here?

• Spatial Locality!

But in a fixed-size cache:

• Larger blocks = fewer # of blocks

• More competition = increased miss rate

• Larger blocks lead to pollution

Larger miss penalty!

• Can override benefit of reduced miss rate

• Early restart and critical-word-first addressing can help


Cache Miss

On cache hit, CPU loads from cache & proceeds normally

What happens on a cache miss?

On cache miss:

• Stall CPU pipeline

• Fetch block from next level of hierarchy

• Instruction Cache Miss?

• Restart instruction fetch

• Data Cache Miss?

• Complete data access


Write-Through Cache

What about cache writing?

On data write hit, could just update block in cache

• Is this true memory hierarchy?

• No! – Memory & cache inconsistent

Write Through – Also update memory on writes!

What is the drawback of this strategy?

Writes take longer!

• e.g. Base CPI = 1

• If 10% of instructions are stores & each write to memory takes 100 cycles

• Effective CPI = 1 + 0.1 × 100 = 11!

• Over an order of magnitude slowdown!


Write Buffer

How do we solve this problem?

Write Buffer! – Hold onto data waiting to be written to memory

CPU continues immediately, only stalls on write if buffer full


Write-Back Cache

Alternate Strategy – Only update memory when a cache block is replaced

Called “Write-Back” cache

Strategy:

• On data-write hit, update in cache

• Keep track of whether a block is “dirty”

• When dirty block replaced – write back to memory

• Can use write buffer to allow replacing block to be read first
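The write-back strategy above can be sketched as a toy model. Everything here is an assumption made for illustration: the class name `WriteBackCache`, a direct-mapped layout with one value per block, a Python dict standing in for main memory, and fetch-on-write-miss. The write buffer is not modeled.

```python
class WriteBackCache:
    """Toy direct-mapped write-back cache; `memory` is a dict standing in for DRAM."""

    def __init__(self, num_blocks, memory):
        self.num_blocks = num_blocks
        self.memory = memory
        self.lines = [None] * num_blocks  # each line: {"tag", "data", "dirty"} or None
        self.writebacks = 0

    def _fill(self, block_addr):
        index = block_addr % self.num_blocks
        tag = block_addr // self.num_blocks
        line = self.lines[index]
        if line is None or line["tag"] != tag:
            # Miss: a dirty victim must be written back before it is replaced.
            if line is not None and line["dirty"]:
                victim_addr = line["tag"] * self.num_blocks + index
                self.memory[victim_addr] = line["data"]
                self.writebacks += 1
            line = {"tag": tag, "data": self.memory.get(block_addr, 0), "dirty": False}
            self.lines[index] = line
        return line

    def read(self, block_addr):
        return self._fill(block_addr)["data"]

    def write(self, block_addr, value):
        line = self._fill(block_addr)  # assumed policy: fetch the block on a write miss
        line["data"] = value           # write hit: update only the cache
        line["dirty"] = True           # remember that memory is now stale


demo_memory = {}
demo_cache = WriteBackCache(4, demo_memory)
demo_cache.write(1, 42)          # memory is not updated yet
demo_cache.read(5)               # block 5 maps to the same entry, evicting dirty block 1
print(demo_memory, demo_cache.writebacks)
```

Memory only sees the value 42 when the dirty block is evicted, which is the point of the dirty bit.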


Write Allocation

What should happen on a write miss?

Alternatives for write-through

• Allocate on miss – fetch the block

• Write around – don’t fetch the block

• Programs often write a whole block before reading it

• e.g. Initialization of arrays

For write-back – Usually fetch the block


Example – Intrinsity FastMATH

Embedded MIPS processor:

• 12-stage pipeline

• Instruction and data access on each cycle

Split cache – separate Instruction and Data caches

• Each 16KB

• 256 blocks, 16 words/block

• D-cache - write-through or write-back


Example – Intrinsity FastMATH

SPEC2000 miss rates:

• I-cache = 0.4%

• D-cache = 11.4%

• Weighted average = 3.2%


Example – Intrinsity FastMATH


Main Memory Supporting Caches

Use DRAMs for main memory

• Fixed width (e.g. 1 word)

• Connected by fixed-width clocked bus

• Bus clock is typically slower than CPU clock

Example cache block read:

• 1 bus cycle for address transfer

• 15 bus cycles per DRAM access

• 1 bus cycle per data transfer

For 4-word block, 1-word-wide DRAM

• Miss penalty = 1 + 4 × 15 + 4 × 1 = 65 cycles

• Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle
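The miss-penalty arithmetic above generalizes to other block sizes. The following is a rough sketch under the slide's assumptions (1-word-wide DRAM, one DRAM access and one bus transfer per word); `block_read_cost` is a hypothetical helper, and 4-byte words are assumed.

```python
def block_read_cost(words_per_block, addr_cycles=1, dram_cycles=15,
                    transfer_cycles=1, bytes_per_word=4):
    """Return (miss penalty in bus cycles, bandwidth in bytes per cycle)."""
    penalty = (addr_cycles
               + words_per_block * dram_cycles       # one DRAM access per word
               + words_per_block * transfer_cycles)  # one bus transfer per word
    bandwidth = words_per_block * bytes_per_word / penalty
    return penalty, bandwidth


penalty, bandwidth = block_read_cost(4)
print(penalty, round(bandwidth, 3))  # 65 0.246
```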


Improving Cache Performance


Measuring Cache Performance

Components of CPU time:

• Program execution cycles – includes cache hit time

• Memory stall cycles – mainly cache misses

If we make a few simplifications:

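\[
\begin{aligned}
\text{Memory stall cycles} &= \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty} \\
&= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}
\end{aligned}
\]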


Cache Performance Example

Given:

• I-cache miss rate = 2%

• D-cache miss rate = 4%

• Miss penalty = 100 cycles

• Base CPI (ideal cache) = 2

• Loads & stores = 36% of instructions

Then, miss cycles per instruction:

• I-cache = 0.02 × 100 = 2

• D-cache = 0.36 × 0.04 × 100 = 1.44

Therefore, actual CPI:

• 2 (base) + 2 (I-cache) + 1.44 (D-cache) = 5.44

• Ideal CPU: 5.44 / 2 = 2.72 times faster
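The CPI calculation above can be checked with a short script. This is a sketch using the same simplified model (every instruction is fetched once, and only loads and stores touch the D-cache); the function name `cpi_with_stalls` is an assumption.

```python
def cpi_with_stalls(base_cpi, icache_miss_rate, dcache_miss_rate,
                    mem_ops_per_instr, miss_penalty):
    """Actual CPI = base CPI + I-cache stall cycles + D-cache stall cycles."""
    icache_stalls = icache_miss_rate * miss_penalty
    dcache_stalls = mem_ops_per_instr * dcache_miss_rate * miss_penalty
    return base_cpi + icache_stalls + dcache_stalls


cpi = cpi_with_stalls(base_cpi=2, icache_miss_rate=0.02, dcache_miss_rate=0.04,
                      mem_ops_per_instr=0.36, miss_penalty=100)
print(round(cpi, 2), round(cpi / 2, 2))  # 5.44 2.72
```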


Average Access Time

Hit time is also important for performance

Average memory access time (AMAT)

• AMAT = Hit time + Miss rate × Miss penalty

Example:

• CPU with 1ns clock

• Hit time = 1 cycle

• Miss penalty = 20 cycles

• I-cache miss rate = 0.05 (5%)

AMAT = 1 + 0.05 × 20 = 2 cycles = 2ns

(2 cycles per instruction)
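As a quick check of the AMAT formula, here is a minimal sketch; `amat` is a hypothetical helper that works in cycles, and the 1ns clock period from the example converts the result to nanoseconds.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


cycles = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
clock_ns = 1.0  # CPU with a 1ns clock, as in the example
print(cycles, cycles * clock_ns)  # 2.0 2.0
```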


Performance Summary

As CPU performance increases, miss penalty becomes more significant!

Decreasing base CPI – greater proportion of time spent on memory stalls (Amdahl’s law)

Increasing clock rate – memory stalls account for more CPU cycles

Can’t neglect cache behavior when evaluating system performance!


Associative Caches

What if we made caches more flexible to improve hit rate?

Associative caches – Blocks have flexibility on where to go

Fully Associative Cache – Allow a block to go in any cache entry

• Requires all entries to be searched at once

• Comparator per entry (expensive!)

n-way set associative – each set contains n entries

• Block number determines which set

• (Block number) mod (#Sets in cache)

• Search all entries in a given set at once

• n comparators (less expensive!)


Associative Caches


Spectrum of Associativity

For a cache with 8 entries:


Associativity Example

Compare 4-block caches:

• Direct mapped, 2-way set associative, fully associative

• Block access sequence: 0, 8, 0, 6, 8

Direct Mapped:


Associativity Example

2-way set associative:

Fully associative:
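The per-access tables for this example did not survive in the transcript, so the three organizations can instead be compared with a small simulation. This sketch assumes LRU replacement within each set, which the slides have not introduced at this point; `count_misses` is a hypothetical helper name.

```python
from collections import OrderedDict


def count_misses(block_sequence, num_blocks, ways):
    """Count misses for an n-way set-associative cache with LRU replacement."""
    num_sets = num_blocks // ways
    # One OrderedDict per set, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_sequence:
        entries = sets[block % num_sets]   # (block number) mod (#sets in cache)
        if block in entries:
            entries.move_to_end(block)     # hit: mark as most recently used
        else:
            misses += 1
            if len(entries) == ways:       # set is full: evict the LRU entry
                entries.popitem(last=False)
            entries[block] = True
    return misses


sequence = [0, 8, 0, 6, 8]
for label, ways in [("direct mapped", 1), ("2-way", 2), ("fully associative", 4)]:
    print(label, count_misses(sequence, num_blocks=4, ways=ways))
```

Under these assumptions the sequence gives 5 misses direct mapped, 4 misses 2-way, and 3 misses fully associative.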


How Much Associativity?

Increased associativity decreases miss rate

• With diminishing returns

Simulation of a system with 64KB D-cache, 16-word blocks,

SPEC2000

• 1-way (direct) – 10.3% Miss rate

• 2-way – 8.6% Miss rate

• 4-way – 8.3% Miss rate

• 8-way – 8.1% Miss rate


Set Associative Cache Organization


Replacement Policy

Direct mapped – no choice!

Set associative:

• Prefer non-valid entry, if exists

• Else, choose among entries in the set

LRU – least recently used:

• Choose the block unused for longest time

• Simple for 2-way, manageable for 4-way, hard beyond that

Random – Approximately same performance as LRU for high associativity


Multilevel Caches

Primary cache attached to CPU – small but fast!

Level-2 cache services misses from primary cache

• Larger, slower, still faster than main memory

Main memory services L-2 cache misses

Some systems include L-3 cache

L-4 Cache? – Very uncommon


Multilevel Cache Example

Given:

• CPU base CPI = 1

• Clock rate = 4 GHz

• Miss rate/instruction = 2%

• Main memory access time = 100ns

With just primary cache:

• Miss penalty = 100ns / 0.25ns = 400 cycles

• Effective CPI = 1 + 0.02 × 400 = 9


Multilevel Cache Example

Now add L-2 cache:

• Access time = 5ns

• Global miss rate to main memory = 0.5%

Primary miss with L-2 hit:

• Penalty = 5ns / 0.25ns = 20 cycles

Primary miss with L-2 miss:

• Extra penalty = 100ns / 0.25ns = 400 cycles

CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4

Performance ratio = 9 / 3.4 = 2.6 times faster!
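The two-level comparison can be reproduced numerically. The sketch below follows the example's model (every primary miss pays the L-2 access time, and global misses additionally pay the main memory access time); the variable names are chosen for illustration only.

```python
clock_ns = 0.25                       # 4 GHz clock
base_cpi = 1
l1_miss_rate = 0.02                   # primary misses per instruction
global_miss_rate = 0.005              # misses per instruction that reach main memory

main_penalty = 100 / clock_ns         # 100ns main memory access = 400 cycles
l2_penalty = 5 / clock_ns             # 5ns L-2 access = 20 cycles

cpi_l1_only = base_cpi + l1_miss_rate * main_penalty
cpi_with_l2 = base_cpi + l1_miss_rate * l2_penalty + global_miss_rate * main_penalty

print(round(cpi_l1_only, 2), round(cpi_with_l2, 2), round(cpi_l1_only / cpi_with_l2, 1))
# 9.0 3.4 2.6
```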


Multilevel Cache Considerations

Primary cache – focus on minimal hit time

L-2 cache:

• Focus on low miss-rate to avoid main memory access

• Hit time has less overall impact

Results:

• L-1 cache usually smaller than a single cache

• L-1 block size smaller than L-2 block size


Interactions with Advanced CPUs

Out-of-order CPUs can execute instructions during cache miss

• Pending store stays in load/store unit

• Dependent instructions wait in reservation stations

• Independent instructions continue

Effect of miss depends on program data flow

• Much harder to analyze

• Use system simulation


Interactions with Software

Misses depend on memory access patterns

• Algorithm behavior

• Compiler optimization for memory access


Dependable Memory


Dependable Memory

Memory hierarchy needs to hold data safely, even in failure

Dependability from redundancy!

Two is one, one is none!


Defining Failure

How to define failure?

1. Service Accomplishment – Service is delivered as specified

2. Service Interruption – Delivered service is different from specification

Failure – Transition from state 1 to state 2

Restoration – Transition from state 2 to state 1

Failures can be intermittent or permanent


Reliability

Reliability – A measure of continuous service accomplishment

Can be expressed in the amount of time to failure

Mean Time to Failure (MTTF)

– Average amount of time until a failure

Annual Failure Rate (AFR)

– Percentage of devices expected to fail in a year

\[ \text{AFR} = 1 - \exp\left(-\frac{8766}{\text{MTTF}}\right) \tag{1} \]

(MTTF in hours; 8766 = hours per year)


Reliability Example

Example: Assume a hard disk with an MTTF of 1M hours (≈ 114 years!) and a server farm with 100k disks. How many disks fail per year?

$$
\begin{aligned}
\mathrm{AFR} &= 1 - \exp\left(-\frac{8766}{1{,}000{,}000}\right) \\
             &= 1 - \exp(-0.008766) \\
             &\approx 1 - 0.9912723 \\
             &\approx 0.0087277 \approx 0.873\%
\end{aligned}
$$

Failures per year = AFR × #components

Failures per year ≈ 0.0087277 × 100,000

Failures per year ≈ 872.77 disks/year!


Reliability Example

Our book uses an approximation for AFR:

$$\mathrm{AFR} \approx \frac{8766}{\mathrm{MTTF}} \tag{2}$$

Using this approximation:

$$\mathrm{AFR} \approx \frac{8766}{1{,}000{,}000} = 0.008766$$

Failures per year = 0.008766 × 100,000 = 876.6 disks/year

Fairly close approximation for small AFR!
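
The two AFR formulas can be compared numerically. This is a minimal sketch using the example's figures (1M-hour MTTF, 100k disks); the function and constant names are illustrative only.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours

def afr_exact(mttf_hours):
    """Annual failure rate from the exponential formula (1)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)

def afr_approx(mttf_hours):
    """Book approximation (2), reasonable when AFR is small."""
    return HOURS_PER_YEAR / mttf_hours

mttf = 1_000_000   # hours, from the example
disks = 100_000    # disks in the server farm, from the example

exact_failures = afr_exact(mttf) * disks
approx_failures = afr_approx(mttf) * disks
print(f"exact: {exact_failures:.2f}  approx: {approx_failures:.2f}")
```

The gap between the two grows as MTTF shrinks, so the approximation is best reserved for components with an MTTF much larger than a year.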


Availability

Service Interruption can be measured in amount of time to repair

Mean Time to Repair (MTTR)

Mean Time Between Failures (MTBF)

– Sum of MTTF + MTTR

Availability – Measure of service accomplishment

Measured with respect to accomplishment and interruption

Therefore:

$$\text{Availability} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} \tag{3}$$
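
As a quick check of the availability formula, the sketch below plugs in a 1M-hour MTTF and a hypothetical 24-hour MTTR; the repair time is an assumption made for illustration, not a figure from the lecture.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures: MTTF + MTTR."""
    return mttf_hours + mttr_hours

def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is delivered as specified."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)

a = availability(1_000_000, 24)   # 24-hour repair time is assumed
print(f"{a:.6f}")
```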


Availability

Goal – keep availability high!

Shorthand = “nines of availability”

• One nine = 90% = 36.5 days of repair/year

• Two nines = 99% = 3.65 days of repair/year

• Three nines = 99.9% = 526 minutes of repair/year

• Four nines = 99.99% = 52.6 minutes of repair/year

• Five nines = 99.999% = 5.26 minutes of repair/year

Good services can provide four or five nines of availability
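
The repair-time figures in the list follow directly from the number of nines. A small sketch, assuming a 365-day year as the list does:

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes in a 365-day year

def downtime_minutes_per_year(nines):
    """Allowed repair time per year for a given number of nines."""
    unavailability = 10 ** (-nines)          # e.g. 3 nines -> 0.001
    return unavailability * MINUTES_PER_YEAR

for n in range(1, 6):
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nines: {minutes:.2f} minutes/year ({minutes / (24 * 60):.3f} days)")
```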


Increasing Availability

Availability can be improved by:

• Increasing MTTF

• Decreasing MTTR

Fault – Failure of any single component

• May or may not result in system failure


Increasing Availability

Three ways to improve MTTF:

• Fault Avoidance – Prevent fault occurrence by construction

• Fault Tolerance – Using redundancy to allow service to comply with specification despite faults

• Fault Forecasting – Predicting presence and creation of faults, allowing component to be replaced before it fails


Example: Hamming ECC Code

Example of improving dependability:

Goal – Reduce errors as the result of individual bits being incorrect

Hamming Distance – # of bits that are different between two bit patterns

What is the Hamming distance between the following:

0b00110011

0b00100001


2!
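
One common way to compute this, sketched here with an illustrative function name, is to XOR the two patterns and count the 1 bits that remain:

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")   # XOR leaves a 1 wherever the bits differ

d = hamming_distance(0b00110011, 0b00100001)
print(d)  # 2
```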


Example: Hamming ECC Code

Minimum Hamming Distance – Smallest Hamming distance between any two valid bit patterns (code words)

If minimum distance == 2, provides single-bit error detection

• e.g. Parity codes

If minimum distance == 3, provides single-bit error correction or 2-bit error detection (but not both at once)
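
To make the link between minimum distance and capability concrete, the sketch below builds a small even-parity code (3 data bits plus 1 parity bit, sizes chosen only for illustration) and measures its minimum distance by brute force. It relies on the standard relations that a code with minimum distance d can detect up to d − 1 bit errors or correct up to ⌊(d − 1)/2⌋.

```python
from itertools import combinations

def min_distance(codewords):
    """Smallest Hamming distance between any two distinct code words."""
    return min(bin(a ^ b).count("1") for a, b in combinations(codewords, 2))

def with_even_parity(data):
    """Append one parity bit so the total number of 1 bits is even."""
    parity = bin(data).count("1") % 2
    return (data << 1) | parity

parity_code = [with_even_parity(d) for d in range(8)]   # all 3-bit data values
d_min = min_distance(parity_code)

detectable = d_min - 1            # errors guaranteed to be detected
correctable = (d_min - 1) // 2    # errors guaranteed to be corrected
print(d_min, detectable, correctable)
```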


Example: Hamming ECC Code

Hamming Error Correction Code (ECC) – distance-3 code

Steps:

1. Start numbering bits from 1 on the left

2. Mark all bit positions that are powers of 2 as parity bits (positions 1, 2, 4, 8, 16, ...)

3. All other bit positions are used for data bits

4. Position of parity bit determines the sequence of data bits that it checks

5. Set parity bits to create even parity for each group
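
The steps above can be sketched as a small encoder. This is an illustrative implementation, not code from the lecture: bits are kept in a list with position 1 first, and it assumes that the parity bit at position p checks every position whose binary index has the p bit set, which is the usual Hamming rule.

```python
def hamming_encode(data_bits):
    """Return the code word as a list of bits; element 0 is bit position 1."""
    n_data = len(data_bits)
    r = 0
    while 2 ** r < n_data + r + 1:        # number of parity bits needed
        r += 1
    n = n_data + r
    code = [0] * (n + 1)                  # index 0 unused so index == position

    # Steps 2-3: data bits fill every position that is not a power of 2.
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if (pos & (pos - 1)) != 0:
            code[pos] = next(data)

    # Steps 4-5: each parity bit gives its group even parity.
    for k in range(r):
        p = 1 << k
        parity = 0
        for pos in range(1, n + 1):
            if (pos & p) and pos != p:
                parity ^= code[pos]
        code[p] = parity
    return code[1:]

codeword = hamming_encode([1, 0, 0, 1, 1, 0, 1, 0])
print("".join(map(str, codeword)))
```

For the 8 data bits 1001 1010 this gives the 12-bit code word 0111 0010 1010, with parity bits at positions 1, 2, 4 and 8.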


Example: Hamming ECC Code

For 8 data bits:

• Parity bits at 1, 2, 4, 8

• Data bits at 3, 5, 6, 7, 9, 10, 11, 12

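• Bits are checked as follows (parity bit p covers every position whose binary index includes p):

    • p1 checks positions 1, 3, 5, 7, 9, 11

    • p2 checks positions 2, 3, 6, 7, 10, 11

    • p4 checks positions 4, 5, 6, 7, 12

    • p8 checks positions 8, 9, 10, 11, 12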


Decoding ECC

Parity check results, read as a binary number (p8 p4 p2 p1), give the position of the bit in error!

• Use numbering from encoding procedure

• E.g.:

• Parity bits = 0000 – No Error!

• Parity bits = 1010 – bit 10 was flipped
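
A sketch of the decoding check, under the same assumed layout as the encoding procedure (list of bits, position 1 first). Each parity group is re-checked, and the positions of the failing parity bits are summed; the function names are illustrative.

```python
def hamming_syndrome(code_bits):
    """Return 0 if every parity group is even, else the position in error."""
    n = len(code_bits)
    syndrome = 0
    p = 1
    while p <= n:
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:                    # position belongs to group p
                parity ^= code_bits[pos - 1]
        if parity:                         # group has odd parity: check fails
            syndrome += p
        p <<= 1
    return syndrome

def correct_single_error(code_bits):
    """Flip the bit named by the syndrome, assuming at most one bit is wrong."""
    s = hamming_syndrome(code_bits)
    fixed = list(code_bits)
    if 1 <= s <= len(fixed):
        fixed[s - 1] ^= 1
    return fixed, s

sent = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0]   # valid 12-bit code word
received = list(sent)
received[10 - 1] ^= 1                          # flip bit position 10
fixed, syndrome = correct_single_error(received)
print(format(syndrome, "04b"), fixed == sent)
```

Here the failing checks are p8 and p2, so the syndrome reads 1010 and bit 10 is flipped back.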


Example Hamming SEC/DED

No need to stop at single-error correction

Add an additional parity bit for the whole word (pn)

Makes minimum Hamming distance == 4

Single Error Correcting, Double Error Detecting (SEC/DED)

Decoding: Let H = SEC parity bits

• H even, pn even – No error

• H odd, pn odd – Correctable single bit error

• H even, pn odd – Error in pn bit

• H odd, pn even – Double error occurred
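
The four cases can be sketched as a classifier. This assumes the code word is a list of Hamming bits (position 1 first) followed by pn as the last element, that "H odd" means at least one SEC parity group fails its check, and that "pn odd" means the parity over the whole word including pn is odd. These readings are an interpretation of the slide, not something it states.

```python
def classify_secded(code_bits):
    """Classify a received SEC/DED word into one of the four decoding cases."""
    *hamming_bits, pn = code_bits

    # XOR of the positions holding a 1 is nonzero exactly when a group fails.
    syndrome = 0
    for pos, bit in enumerate(hamming_bits, start=1):
        if bit:
            syndrome ^= pos

    overall = pn                           # parity over the whole word
    for bit in hamming_bits:
        overall ^= bit

    h_odd = syndrome != 0
    pn_odd = overall == 1
    if not h_odd and not pn_odd:
        return "no error"
    if h_odd and pn_odd:
        return f"correctable single-bit error at position {syndrome}"
    if not h_odd and pn_odd:
        return "error in pn bit"
    return "double error"

word = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0]   # 12 Hamming bits + pn

one_flip = list(word)
one_flip[9] ^= 1                                  # position 10
two_flips = list(one_flip)
two_flips[2] ^= 1                                 # position 3 as well
pn_flip = list(word)
pn_flip[-1] ^= 1

for w in (word, one_flip, pn_flip, two_flips):
    print(classify_secded(w))
```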
