The Memory Hierarchy
CPSC 321, Andreas Klappenecker


Some Results from the Survey

• Issues with the CS curriculum
  • CPSC 111 Computer Science Concepts & Programming
  • CPSC 310 Databases
  • CPSC 431 Software Engineering

• Something from the wish list:
  • More C++
  • More Software Engineering
  • More focus on industry needs
  • Less focus on industry needs

Some Results from the Survey

• Why (MIPS) assembly language?
• More detailed explanations of programming language xyz.
• Implement a slightly reduced version of the Pentium 4 or Athlon processors
• Have another computer architecture class
• Lack of information on the CS website about specialization...

Follow Up

• CPSC 462 Microcomputer Systems
• CPSC 410 Operating Systems

• Go to seminars/lectures by Bjarne Stroustrup, Jaakko Jarvi, or Gabriel Dos Reis

Today’s Menu

Caches

Memory

Current memory is largely implemented in CMOS technology. Two alternatives:

• SRAM
  • fast, but not area efficient
  • value stored in a pair of inverting gates
• DRAM
  • slower, but more area efficient
  • value stored as charge on a capacitor (must be refreshed)

Static RAM


Dynamic RAM


Memory

• Users want large and fast memories
  • SRAM is too expensive for main memory
  • DRAM is too slow for many purposes

• Compromise: build a memory hierarchy

[Figure: levels in the memory hierarchy. The CPU sits at the top; Level 1 is closest, followed by Level 2 down to Level n. Both the access time and the size of the memory at each level increase with distance from the CPU.]

Locality

• If an item is referenced, then
  • it will be referenced again soon (temporal locality)
  • nearby data will be referenced soon (spatial locality)

• Why does code have locality?
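Code has locality because of loops and sequential data access. As a minimal illustration (not from the slides), a simple summing loop exhibits both kinds:

```python
# Illustrative sketch: a summing loop shows both kinds of locality.
data = list(range(1000))

total = 0
for i in range(len(data)):
    # 'total' and 'i' are touched on every iteration -> temporal locality
    # data[0], data[1], ... are adjacent in memory   -> spatial locality
    total += data[i]
```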

Memory Hierarchy

• The memory is organized as a hierarchy
  • each level closer to the processor holds a subset of any level further away
  • the memory can consist of multiple levels, but data is typically copied between two adjacent levels at a time

• Initially, we focus on two levels


Two Level Hierarchy

• Upper level (smaller and faster)
• Lower level (slower)
• A unit of information that is present or not within a level is called a block
• If data requested by the processor is in the upper level, then this is called a hit; otherwise it is called a miss
• If a miss occurs, the data is retrieved from the lower level. Typically, an entire block is transferred

Cache

A cache represents some level of memory between the CPU and main memory

[More general definitions are often used]

A Toy Example

• Assumptions
  • Suppose that processor requests are each one word,
  • and that each block consists of one word

• Example
  • Before the request: C = [X1, X2, …, Xn-1]
  • Processor requests Xn, not contained in C
  • Item Xn is brought from memory to the cache
  • After the request: C = [X1, X2, …, Xn-1, Xn]

• Issues
  • What happens if the cache is full?

Issues

• How do we know whether the data item is in the cache?

• If it is, how do we find it?

• Simple strategy: direct-mapped cache
  • exactly one location where the data might be in the cache

• Mapping: address modulo the number of blocks in the cache, x -> x mod B
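The modulo mapping can be sketched in a few lines of Python (an illustrative sketch; the slides contain no code):

```python
# Minimal sketch of direct-mapped placement.
def cache_index(block_address, num_blocks):
    # x -> x mod B: each memory block maps to exactly one cache slot.
    return block_address % num_blocks

# With B = 8 cache blocks, memory blocks 1, 9, and 17 all map to slot 1,
# so in a direct-mapped cache they evict one another.
```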

Direct Mapped Cache

[Figure: an eight-block direct-mapped cache with indices 000–111, and the memory addresses that all map to cache index 001: 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101.]

• Cache with 1024 = 2^10 words
• The tag stored in the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss

What kind of locality are we taking advantage of?

Direct Mapped Cache

[Figure: direct-mapped cache datapath. The 32-bit address (bits 31–0) splits into a 20-bit tag (bits 31–12), a 10-bit index (bits 11–2), and a 2-bit byte offset (bits 1–0). The index selects one of the 1024 entries (0–1023), each holding a valid bit, a 20-bit tag, and 32 bits of data; a hit is signaled when the stored tag matches the address tag and the valid bit is set.]
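The field widths described above can be checked with a small sketch (hypothetical helper name; the 20/10/2 split is the one used for the 1024-word cache):

```python
def split_address(addr):
    # 32-bit byte address for a 1024-word direct-mapped cache:
    # bits 1-0   byte offset (2 bits)
    # bits 11-2  index       (10 bits -> 1024 entries)
    # bits 31-12 tag         (20 bits)
    byte_offset = addr & 0x3
    index = (addr >> 2) & 0x3FF
    tag = addr >> 12
    return tag, index, byte_offset
```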

Direct Mapped Cache Example


• Taking advantage of spatial locality:

Direct Mapped Cache

[Figure: direct-mapped cache with multi-word blocks. The address splits into a 16-bit tag (bits 31–16), a 12-bit index (bits 15–4), a 2-bit block offset (bits 3–2), and a 2-bit byte offset (bits 1–0). The index selects one of 4K entries, each holding a valid bit, a 16-bit tag, and a 128-bit (four-word) block; the block offset drives a multiplexer that selects one 32-bit word, and a hit is signaled on a tag match with the valid bit set.]
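The same kind of sketch works for the four-word-block cache (hypothetical helper name; the field widths follow the figure above):

```python
def split_address_4word(addr):
    # Address fields for a 4K-entry cache with four-word (128-bit) blocks:
    # bits 1-0   byte offset  (2 bits)
    # bits 3-2   block offset (2 bits -> selects one of four words)
    # bits 15-4  index        (12 bits -> 4K entries)
    # bits 31-16 tag          (16 bits)
    byte_offset = addr & 0x3
    block_offset = (addr >> 2) & 0x3
    index = (addr >> 4) & 0xFFF
    tag = addr >> 16
    return tag, index, block_offset, byte_offset
```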

Hits vs. Misses

• Read hits
  • this is what we want!

• Read misses
  • stall the CPU, fetch the block from memory, deliver it to the cache, restart

• Write hits
  • can replace data in cache and memory (write-through)
  • write the data only into the cache (write back to memory later)

• Write misses
  • read the entire block into the cache, then write the word
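The two write-hit policies can be contrasted with a hypothetical sketch (the dictionary-based "cache" and "memory" and the function names are illustrative only, not a real cache implementation):

```python
cache = {}
memory = {}
dirty = set()   # blocks updated in the cache but not yet in memory

def write_through(addr, value):
    # Write hit, write-through: update the cache and memory together.
    cache[addr] = value
    memory[addr] = value

def write_back(addr, value):
    # Write hit, write-back: update only the cache and mark the block dirty;
    # memory is updated later, when the block is evicted.
    cache[addr] = value
    dirty.add(addr)
```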

Hits vs. Misses Example

What Block Size?

• A large block size reduces cache misses
• But the cache miss penalty increases
• We need to balance these two constraints
• How can we measure cache performance?
• How can we improve cache performance?

The performance of a cache depends on many parameters:

• Memory stall clock cycles

• Read stall clock cycles

• Write stall clock cycles
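These stall components enter performance roughly as memory accesses × miss rate × miss penalty. A worked example with assumed numbers (none of these figures come from the slides):

```python
# Assumed figures for illustration only.
instructions = 1_000_000
accesses_per_instruction = 1.33   # assumption: 1 instruction fetch + 0.33 data accesses
miss_rate = 0.02                  # assumption: 2% of accesses miss
miss_penalty = 100                # assumption: 100 cycles per miss

# Memory stall cycles = memory accesses x miss rate x miss penalty
stall_cycles = instructions * accesses_per_instruction * miss_rate * miss_penalty
```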

Cache Block Mapping

• Direct mapped cache
  • a block goes in exactly one place in the cache

• Fully associative
  • a block can go anywhere in the cache
  • difficult to find a block
  • parallel comparison to speed up the search

Cache Block Mapping

• Set associative
  • Each block maps to a unique set, and the block can be placed into any element of that set
  • Position is given by (block number) modulo (number of sets in the cache)
  • If each set contains n elements, then the cache is called n-way set associative
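Set-associative placement can be sketched the same way as the direct-mapped case (illustrative names, not from the slides):

```python
def set_index(block_number, num_sets):
    # (Block number) modulo (number of sets in the cache)
    return block_number % num_sets

# A 2-way set-associative cache with 8 blocks has 4 sets; memory block 12
# maps to set 0 and may be placed in either of that set's two ways.
```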

Cache Types