Review CPSC 321


Page 1: Review CPSC 321

Review CPSC 321

Andreas Klappenecker

Page 2: Review CPSC 321

Announcements

• Tuesday, November 30, midterm exam

Page 3: Review CPSC 321

Cache

• Placement strategies
  • direct mapped
  • fully associative
  • set-associative

• Replacement strategies
  • random
  • FIFO
  • LRU

Page 4: Review CPSC 321

Direct Mapped Cache

• Mapping: address modulo the number of blocks in the cache, x -> x mod B

[Figure: direct mapped cache with 8 entries (indices 000-111) in front of memory. The memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 map to the cache entry given by the address mod 8, i.e., by the lower three address bits.]
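The mapping can be checked with a short computation. Here is a minimal sketch in Python (the function name and the loop are my own, not from the slides) of the direct mapped placement rule:

def direct_mapped_index(block_address, num_blocks):
    # Direct mapped placement: a block can go into exactly one cache
    # entry, namely (block address) mod (number of blocks in the cache).
    return block_address % num_blocks

# Example: the 8-entry cache from the figure above.
for addr in [0b00001, 0b00101, 0b01001, 0b01101, 0b10001, 0b10101, 0b11001, 0b11101]:
    print(format(addr, "05b"), "->", format(direct_mapped_index(addr, 8), "03b"))

Addresses ending in 001 land in entry 001 and those ending in 101 in entry 101, matching the figure.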

Page 5: Review CPSC 321

Set Associative Caches

• Each block maps to a unique set; the block can be placed into any element of that set.

• The position of the set is given by (block number) modulo (# of sets in the cache); see the sketch below.

• If the sets contain n elements, then the cache is called n-way set associative.
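A minimal sketch of the set associative placement rule (the parameter values in the example are illustrative, not from the slides):

def set_index(block_number, num_sets):
    # Set associative placement: the block may be placed into any of the
    # n entries of set (block number) mod (number of sets in the cache).
    return block_number % num_sets

# Example: 64 blocks organized 4-way give 64 / 4 = 16 sets,
# so block 37 belongs to set 37 mod 16 = 5.
print(set_index(37, 16))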

Page 6: Review CPSC 321

Direct Mapped Cache

• Cache with 1024 = 2^10 words
• The tag from the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits of the address and the valid bit is set, then we have a cache hit; otherwise it is a cache miss

What kind of locality are we taking advantage of?

[Figure: direct mapped cache with 1024 one-word entries (index 0 to 1023). The 32-bit address (bits 31-0) is split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset (bits 1-0). Each entry stores a valid bit, a 20-bit tag, and 32 bits of data; the index selects the entry, and comparing the stored tag with the address tag produces the hit signal. The index is determined by the address mod 1024.]
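For exam-style bit bookkeeping, a minimal sketch (the names and the cache representation are assumptions for illustration, not the slide's hardware) of how this cache splits an address and checks for a hit:

def split_address(addr):
    # 32-bit address layout: | tag (20 bits) | index (10 bits) | byte offset (2 bits) |
    byte_offset = addr & 0x3         # bits 1-0
    index = (addr >> 2) & 0x3FF      # bits 11-2: 10 bits select one of 1024 entries
    tag = addr >> 12                 # bits 31-12
    return tag, index, byte_offset

def is_hit(cache, addr):
    # cache[index] is assumed to hold a (valid, tag, data) triple.
    tag, index, _ = split_address(addr)
    valid, stored_tag, _ = cache[index]
    return valid and stored_tag == tag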

Page 7: Review CPSC 321

Direct Mapped Cache

• Taking advantage of spatial locality: the cache below uses multi-word (128-bit, four-word) blocks, so neighboring words are fetched together.

[Figure: direct mapped cache with 4K entries and 128-bit (four-word) blocks. The 32-bit address is split into a 16-bit tag (bits 31-16), a 12-bit index (bits 15-4), a 2-bit block offset (bits 3-2), and a 2-bit byte offset (bits 1-0). Each entry stores a valid bit, a 16-bit tag, and 128 bits of data; on a hit, a multiplexer uses the block offset to select one of the four 32-bit words.]

Page 8: Review CPSC 321

Address Determination

Reconstruction of the memory address = tag bits || set index bits || block offset || byte offset

Example:
• 32-bit addresses, 32-bit words, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped
• byte offset = 2 bits, block offset = 3 bits, set index = 9 bits, tag = 32 - 9 - 3 - 2 = 18 bits
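A short sketch that derives these field widths from the cache parameters; it assumes power-of-two sizes, and the parameter names are my own:

from math import log2

def address_fields(address_bits, capacity_words, words_per_block, bytes_per_word=4):
    # Field widths for a direct mapped cache with power-of-two sizes.
    byte_offset = int(log2(bytes_per_word))                # 2 bits for 4-byte words
    block_offset = int(log2(words_per_block))              # 3 bits for 8-word blocks
    index = int(log2(capacity_words // words_per_block))   # 4096 / 8 = 512 blocks -> 9 bits
    tag = address_bits - index - block_offset - byte_offset
    return tag, index, block_offset, byte_offset

print(address_fields(32, 4096, 8))   # -> (18, 9, 3, 2)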

Page 9: Review CPSC 321
Page 10: Review CPSC 321

Example

• Suppose you want to realize a cache with a capacity of 8 KB of data (32-bit addresses). Assume that the block size is 4 words and a word consists of 4 bytes.

• How many bits are needed to realize a direct mapped cache?
  • 8 KB = 2K words = 512 blocks = 2^9 blocks
  • direct mapped => # index bits = log2(2^9) = 9
  • 2^9 x (128 + (32 - 9 - 2 - 2) + 1) = 2^9 x 148 bits
    = number of blocks x (data bits per block + tag bits + valid bit)

• How many bits are needed to realize an 8-way set associative cache?
  • The number of tag bits increases by 3. Why? (The 512 blocks form 512 / 8 = 64 = 2^6 sets, so 3 fewer index bits and 3 more tag bits.)
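The same bookkeeping can be scripted. A minimal sketch under the slide's assumptions (the function and parameter names are mine):

from math import log2

def cache_storage_bits(address_bits, capacity_bytes, words_per_block, ways=1, word_bytes=4):
    # Total storage = number of blocks x (data bits + tag bits + valid bit).
    num_blocks = capacity_bytes // (words_per_block * word_bytes)
    index = int(log2(num_blocks // ways))
    block_offset = int(log2(words_per_block))
    byte_offset = int(log2(word_bytes))
    tag = address_bits - index - block_offset - byte_offset
    data_bits = words_per_block * word_bytes * 8
    return num_blocks * (data_bits + tag + 1)

print(cache_storage_bits(32, 8 * 1024, 4))           # direct mapped: 512 x 148 = 75776 bits
print(cache_storage_bits(32, 8 * 1024, 4, ways=8))   # 8-way: tag grows by 3, giving 512 x 151 bits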

Page 11: Review CPSC 321

Typical Questions

• Show the evolution of a cache
• Determine the number of bits needed in an implementation of a cache
• Know the placement and replacement strategies
• Be able to design a cache according to specifications
• Determine the number of cache misses
• Measure cache performance

Page 12: Review CPSC 321

Typical Questions

• What kind of placement is typically used in virtual memory systems?

• What is a translation lookaside buffer?

• Why is a TLB used?

Page 13: Review CPSC 321

Pages: virtual memory blocks

• Page faults: if data is not in memory, retrieve it from disk
  • huge miss penalty, thus pages should be fairly large (e.g., 4KB)
  • reducing page faults is important (LRU is worth the price)
  • the faults can be handled in software instead of hardware
  • write-through takes too long, so we use write-back
• Example: page size 2^12 = 4KB; 2^18 physical pages; main memory <= 1GB; virtual memory <= 4GB

[Figure: address translation. The 32-bit virtual address is split into a 20-bit virtual page number (bits 31-12) and a 12-bit page offset (bits 11-0). Translation maps the virtual page number to an 18-bit physical page number, which is combined with the unchanged page offset to form the 30-bit physical address.]
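A minimal sketch of the translation arithmetic; the Python dict standing in for the page table is only an illustration:

PAGE_OFFSET_BITS = 12              # 4 KB pages

def translate(virtual_address, page_table):
    # Split off the virtual page number, look up its physical page
    # number, and reattach the unchanged 12-bit page offset.
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    ppn = page_table[vpn]          # a missing entry would mean a page fault
    return (ppn << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x00402ABC, {0x00402: 0x1F})))   # -> 0x1fabc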

Page 14: Review CPSC 321

Page Faults

• Incredibly high penalty for a page fault
• Reduce the number of page faults by optimizing page placement
• Use fully associative placement
  • a full search of all pages is impractical
  • pages are located by a full table that indexes the memory, called the page table
  • the page table resides in memory

Page 15: Review CPSC 321

Page Tables

[Figure: page table. Each virtual page number indexes a page table entry containing a valid bit and a physical page or disk address; valid entries point into physical memory, invalid entries refer to pages held on disk.]

The page table maps each page either to a page in main memory or to a page stored on disk.

Page 16: Review CPSC 321

Page Tables

[Figure: translation through the page table. The page table register points to the page table in memory. The 20-bit virtual page number indexes the table; each entry holds a valid bit and an 18-bit physical page number. If the valid bit is 0, the page is not present in memory. The 12-bit page offset is appended unchanged to the physical page number to form the physical address.]
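A sketch of the lookup the figure describes, with the valid bit modeled explicitly (the entry layout is an assumption for illustration):

def page_table_lookup(vpn, page_table):
    # Each page table entry is modeled as a (valid, physical_page_number) pair.
    valid, ppn = page_table[vpn]
    if not valid:
        # Valid bit 0: the page is not present in memory -> page fault,
        # handled by the operating system, which fetches the page from disk.
        raise RuntimeError("page fault for virtual page %d" % vpn)
    return ppn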

Page 17: Review CPSC 321

Making Memory Access Fast

• Page tables slow us down
• Memory access will take at least twice as long:
  • access the page table in memory
  • access the page itself
• What can we do?

Memory access is local => use a cache that keeps track of recently used address translations, called a translation lookaside buffer (TLB).

Page 18: Review CPSC 321

Making Address Translation Fast

A cache for address translations: translation lookaside buffer

[Figure: TLB and page table. The TLB is a small cache whose entries hold a valid bit, a tag (virtual page number), and a physical page address. On a TLB miss, the full page table maps the virtual page number to a physical page or disk address, backed by physical memory and disk storage.]
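A minimal sketch of the TLB-then-page-table lookup order; the dict-based TLB and the entry layout are illustrative assumptions, not the hardware design:

def tlb_lookup(vpn, tlb, page_table):
    # Consult the TLB first; only on a TLB miss touch the page table in
    # memory, then cache the translation for subsequent accesses.
    if vpn in tlb:
        return tlb[vpn]              # TLB hit: no extra memory access
    valid, ppn = page_table[vpn]     # TLB miss: one extra memory access
    if not valid:
        raise RuntimeError("page fault")
    tlb[vpn] = ppn                   # a real TLB also needs a replacement policy
    return ppn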

Page 19: Review CPSC 321

MIPS Processor and Variations

Page 20: Review CPSC 321

Datapath for MIPS instructions

Note the seven control signals!

Page 21: Review CPSC 321

Single Cycle Datapath

Page 22: Review CPSC 321

Pipelined Version

Page 23: Review CPSC 321

Obstacles to Pipelining

• Structural Hazards
  • the hardware cannot support the combination of instructions in the same clock cycle
• Control Hazards
  • need to make a decision based on the result of one instruction while another is still executing
• Data Hazards
  • an instruction depends on the result of an instruction still in the pipeline

Page 24: Review CPSC 321

• Control hazard resolution (for branches):
  • stall the pipeline
  • predict the result
  • delayed branch

Page 25: Review CPSC 321

Stall on Branch

• Assume that all branch computations are done in stage 2

• Delay by one cycle to wait for the result

Page 26: Review CPSC 321

Branch Prediction

• Predict the branch result
  • For example, always predict that the branch is not taken (e.g., reasonable for while loops)
  • if the choice is correct, the pipeline runs at full speed
  • if the choice is incorrect, the pipeline stalls

Page 27: Review CPSC 321

Branch Prediction

Page 28: Review CPSC 321

Delayed Branch

Page 29: Review CPSC 321

Data Hazards

• A data hazard results if an instruction depends on the result of a previous instruction (a small detection sketch follows this list):
  • add $s0, $t0, $t1
  • sub $t2, $s0, $t3 // $s0 to be determined

• These dependencies happen often, so it is not possible to avoid them completely

• Use forwarding to get missing data from internal resources once available
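A toy sketch of detecting such a read-after-write dependence between two instructions; the (dest, src1, src2) tuple encoding is my own simplification, not how the processor's forwarding unit is built:

def has_data_hazard(producer, consumer):
    # Instructions are modeled as (dest, src1, src2) register tuples,
    # e.g. add $s0, $t0, $t1 -> ("$s0", "$t0", "$t1").
    return producer[0] in consumer[1:]

add_instr = ("$s0", "$t0", "$t1")    # add $s0, $t0, $t1
sub_instr = ("$t2", "$s0", "$t3")    # sub $t2, $s0, $t3
print(has_data_hazard(add_instr, sub_instr))   # True: sub reads the $s0 that add writes

In the pipelined datapath the forwarding unit performs essentially this comparison in hardware, between the destination registers held in the EX/MEM and MEM/WB pipeline registers and the source registers of the instruction entering EX.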

Page 30: Review CPSC 321

Forwarding

add $s0, $t0, $t1

sub $t2, $s0, $t3

Page 31: Review CPSC 321
Page 32: Review CPSC 321
Page 33: Review CPSC 321

Typical Questions

• Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards.

• Most typical question: fill in some steps of a timing diagram (almost every exam has such a question; search for examples).

Page 34: Review CPSC 321

Example

add $1, $2, $3 _ _ _ _ _

add $4, $5, $6 _ _ _ _ _

add $7, $8, $9 _ _ _ _ _

add $10, $11, $12 _ _ _ _ _

add $13, $14, $1 _ _ _ _ _ (data arrives early OK)

add $15, $16, $7 _ _ _ _ _ (data arrives on time OK)

add $17, $18, $13 _ _ _ _ _ (uh, oh)

add $19, $20, $17 _ _ _ _ _ (uh, oh)
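The blanks above are presumably the five pipeline stages (IF, ID, EX, MEM, WB) of each instruction. As a practice aid, here is a minimal sketch that prints only the ideal, hazard-free schedule; the stalls suggested by the annotations above still have to be reasoned out by hand, and all names are my own:

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def ideal_schedule(instructions):
    # One instruction enters the pipeline per cycle; instruction i
    # reaches stage s in cycle i + s + 1 (cycles numbered from 1).
    total_cycles = len(instructions) + len(STAGES) - 1
    for i, instr in enumerate(instructions):
        cells = {i + s + 1: name for s, name in enumerate(STAGES)}
        row = " ".join("%-3s" % cells.get(c, "") for c in range(1, total_cycles + 1))
        print("%-20s %s" % (instr, row))

ideal_schedule(["add $1, $2, $3", "add $4, $5, $6", "add $7, $8, $9"])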

Page 35: Review CPSC 321

Verilog

Page 36: Review CPSC 321

Mixed Questions