Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by...

80
Anne Bracy CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.

Transcript of Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by...

Page 1: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

AnneBracyCS3410

ComputerScienceCornellUniversity

The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.

Page 2: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

2

insn0.fetch, dec, exec

Single-cycle

insn1.fetch, dec, exec

Pipelinedinsn0.decinsn0.fetch

insn1.decinsn1.fetchinsn0.exec

insn1.exec

Page 3: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

5-stagePipeline• Implementation• WorkingExample

3

Hazards• Structural• DataHazards• ControlHazards

Page 4: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Write-BackMemory

InstructionFetch Execute

InstructionDecode

extend

registerfile

control

4

alu

memory

din dout

addrPC

memory

newpc

inst

IF/ID ID/EX EX/MEM MEM/WB

imm

BA

ctrl

ctrl

ctrl

BD D

M

computejump/branch

targets

+4

Page 5: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

5

1 2 3 4 5 6 7 8 9Cycle

Latency:Throughput:

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

Latency: 5cyclesThroughput: 1insn/cycle CPI=1

add

nand

lw

add

sw

Page 6: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

• Breakdatapath intomultiplecycles (here5)• Parallelexecutionincreasesthroughput• Balancedpipelineveryimportant

• Sloweststagedeterminesclockrate• Imbalancekillsperformance

• Addpipelineregisters(flip-flops) forisolation• Eachstagebeginsbyreadingvaluesfromlatch• Eachstageendsbywritingvaluestolatch

• Resolvehazards

6

Page 7: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

7

Stage PerformFunctionality Latchvaluesofinterest

Fetch UsePCtoindexProgramMemory,incrementPC

Instructionbits(tobedecoded)PC+4(tocomputebranchtargets)

Decode Decodeinstruction,generatecontrolsignals,readregisterfile

Controlinformation,Rdindex,immediates,offsets,register values(Ra,Rb),PC+4(tocomputebranchtargets)

ExecutePerformALUoperationComputetargets(PC+4+offset,etc.)incasethisisabranch,decideifbranchtaken

Controlinformation,Rdindex, etc.ResultofALUoperation,valueincasethisisastoreinstruction

Memory Performload/storeifneeded,addressisALUresult

Controlinformation,Rdindex,etc.Resultofload,passresultfrom execute

Writeback Selectvalue,writetoregisterfile

Page 8: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

8

PC

instructionmemory

inst

addr mc

00=readword

IF/ID

Restofp

ipeline

+4

PC+4

pc-sel

pc-regpc-rel

pc-abs•PC+4•pc-reg (PCregisters:JR)•pc-rel (PC-relative: BEQ,BNE)•pc-abs(PCabsolute:JandJAL)

Page 9: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

9

ctrl

ID/EX

Restofp

ipeline

PC+4

inst

IF/ID

PC+4

Stage1:InstructionFetch

registerfile

WERd

Ra Rb

DB

A

BA

extend imm

decode

result

dest

Page 10: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Stage2:InstructionDe

code

pc-rel

pc-abs

10

ctrl

EX/MEM

Restofp

ipeline

BD

ctrl

ID/EX

PC+4

BA

alu

+�

branch?im

mpc-sel

pc-reg

target

Page 11: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

11

ctrl

MEM/WB

Restofp

ipeline

Stage3:Execute

MD

ctrl

EX/MEM

BD

memory

din doutaddr

mctarget

branch?pc-sel

pc-rel

pc-abs

pc-reg

Page 12: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

12

Stage4:M

emory

ctrl

MEM/WB

MD

result

dest

Page 13: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addrinst

PC+4

BA

Rt

BD

MD

PC+4

imm

ctrl

target

OP

Rd

OP

PC

instmem

Rd

Ra Rb

DB

A

Rd

13

Page 14: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Consideranon-pipelinedprocessorwithclockperiodC (e.g.,50ns).IfyoudividetheprocessorintoN stages(e.g.,5),yournewclockperiodwillbe:

A. CB. NC. lessthanC/ND. C/NE. greaterthanC/N

14

Page 15: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

• Instructionssamelength• 32bits,easytofetchandthendecode

• 3typesofinstructionformats• Easytoroutebitsbetweenstages• Canreadaregistersourcebeforeevenknowing

whattheinstructionis• Memoryaccessthroughlwandswonly

• AccessmemoryafterALU

15

Page 16: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

5-stagePipeline• Implementation• WorkingExample

16

Hazards• Structural• DataHazards• ControlHazards

Page 17: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

add r3 ß r1, r2 nand r6 ß r4, r5 lw r4 ß 20(r2)add r5 ß r2, r5sw r7 à 12(r3)

Assume8-registermachine

17

Page 18: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

data

dest

IF/ID ID/EX EX/MEM MEM/WB

extend

0MUX

0

Time:018

Page 19: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

nop

0

0

0

040

0

nop

0

0

nop

0

0

0

0

add312

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits26-31

data

dest

Fetch:add312

add312

IF/ID ID/EX EX/MEM MEM/WB

extend

0MUX

0

Time:119

Page 20: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

add

3

9

36

480

0

nop

0

0

nop

0

0

0

0nand645

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

12

Bits26-31

data

dest

Fetch:nand645

nand 645 add312

IF/ID ID/EX EX/MEM MEM/WB

extend

2MUX

3

Time:220

Page 21: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

nand

6

7

18

8124

45

add

3

9

nop

0

0

0

0lw420(2)

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

45

Bits26-31

data

dest

Fetch:lw420(2)

lw420(2) nand 645 add312

36

9

3

IF/ID ID/EX EX/MEM MEM/WB

extend

5MUX

6 32

Time:321

Page 22: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

lw

20

18

9

12168

-3

nand

6

7

add

3

45

0

0add525

912187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

24

Bits26-31

data

dest

Fetch:add525

add525 lw420(2) nand 645 add312

18

7

6

45

3

IF/ID ID/EX EX/MEM MEM/WB

extend

4MUX

0 65

Time:4

nand

18=0100107=000111-------------------3=111101

22

Page 23: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

add

5

7

9

162012

29

lw

4

18

nand

6

-3

0

0sw712(3)

945187

36

41

0

22

R2

R3

R4

R5

R1

R6

R0

R7

25

Bits26-31

data

dest

Fetch:sw712(3)

sw712(3) add525 lw420(2) nand 645add312

9

20

4

-3

6

45

3

IF/ID ID/EX EX/MEM MEM/WB

extend

5MUX

5 04

Time:523

Page 24: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

sw

12

22

45

2016

16

add

5

7

lw

4

29

99

0945187

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

37

Bits26-31

data

dest

Nomoreinstructions

nop sw712(3) add525 lw420(2) nand 645

9

7

5

29

4

-3

6

IF/ID ID/EX EX/MEM MEM/WB

extend

7MUX

0 55

Time:624

Page 25: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

20

57

sw

7

22

add

5

16

0

0945997

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits26-31

data

dest

Nomoreinstructions

nop nop sw712(3) add525 lw420(2)

45

7

12

16

5

99

4

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

07

Time:725

Page 26: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

sw

7

57

0

9459916

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits26-31

data

dest

Nomoreinstructions

nop nop nop sw712(3) add525

2257

22

16

5

SlidesthankstoSallyMcKee

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

Time:826

Page 27: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

PC

Registerfile

MUXA

LU

MUX

4

Datamem

+

MUX

Bits11-15Bits16-20

9459916

36

-3

0

22

R2

R3

R4

R5

R1

R6

R0

R7

Bits21-23

data

dest

Nomoreinstructions

nop nop nop nop sw712(3)

IF/ID ID/EX EX/MEM MEM/WB

extend

MUX

Time:927

Page 28: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Pipeliningisgreatbecause:

A. Youcanfetchanddecodethesameinstructionatthesametime.

B. Youcanfetchtwoinstructionsatthesametime.C. Youcanfetchoneinstructionwhiledecoding

another.D. Instructionsonlyneedtovisitthepipeline

stagesthattheyrequire.E. CandD

28

Page 29: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

5-stagePipeline• Implementation• WorkingExample

29

Hazards• Structural• DataHazards• ControlHazards

Page 30: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Correctnessproblemsassociatedw/processordesign

1. StructuralhazardsSameresourceneededfordifferentpurposesatthesametime(Possible:ALU, RegisterFile,Memory)

2. DatahazardsInstructionoutputneededbeforeit’savailable

3. ControlhazardsNextinstructionPCunknownattimeofFetch

30

Page 31: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

31

addr3,r2,r1nopnop

addr6,r5,r4

datamem

instmem

DB

A

IF ID Ex M WIF ID Ex M W

IF ID Ex M W

add r3, r2,r1nopaddr6,r5,r4

Problem: NeedtoreadfromandwritetoRegisterFileatthesametimeSolution: negateRFclock:writefirsthalf,readsecondhalf

nop

IF ID Ex M W

Page 32: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Dependence:relationshipbetweentwoinsns• Data:twoinsnsusesamestoragelocation• Control: 1insnaffectswhetheranotherexecutesatall• Notabadthing,programswouldbeboring otherwise• Enforcedbymakingolderinsngobeforeyoungerone

– Happensnaturallyinsingle-/multi-cycledesigns– Butnotinapipeline

Hazard:dependence&possibilityofwronginsnorder• Effectsofwronginsnordercannotbeexternallyvisible• Hazardsareabadthing:mostsolutionseithercomplicatethehardwareorreduceperformance

32

Page 33: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

DataHazards• registerfile(RF)readsoccurinstage2(ID)• RFwritesoccurinstage5(WB)• RFwrittenin½half,readinsecond½halfofcycle

x10: add r3 ß r1, r2x14: sub r5 ß r3, r4

1.Isthereadependence?2.Isthereahazard?

33

A) YesB) NoC) Cannottellwiththe

informationgiven.

Page 34: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Whichofthefollowingstatementsistrue?

A.Whetherthereisadatadependencebetweentwoinstructionsdependsonthemachinetheprogramisrunningon.B.Whetherthereisadatahazardbetweentwoinstructionsdependsonthemachinetheprogramisrunningon.C.BothA&BD.NeitherAnorB

34

Page 35: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

35

IF ID MEM

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clockcycle1 2 3 4 5 6 7 8 9

sub r5,r3,r4

lwr6,4(r3)

or r5,r3,r5

sw r6,12(r3)

addr3,r1,r2

time

WBX

X

X

X

X

Page 36: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

36

IF ID MEM

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clockcycle1 2 3 4 5 6 7 8 9

sub r5,r3,r4

lwr6,4(r3)

or r5,r3,r5

sw r6,12(r3)

addr3,r1,r2

time

WBX

X

X

X

X

backwardsarrowsrequiretimetravel

Page 37: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

37

IF ID MEM

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clockcycle1 2 3 4 5 6 7 8 9

sub r5,r3,r4

lwr6,4(r3)

or r5,r3,r5

sw r6,12(r3)

addr3,r1,r2

time

WBX

X

X

X

X

Page 38: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

38

IF ID MEM

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

IF ID MEM WB

Clockcycle1 2 3 4 5 6 7 8 9

sub r5,r3,r4

lwr6,4(r3)

or r5,r3,r5

sw r6,12(r3)

addr3,r1,r2

time

WBX

X

X

X

X

Page 39: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addrinst

PC+4

OP

BA

Rt

BD

MD

PC+4

imm

OP

Rd

OP

Rd

PC

instmem

Rd

Ra Rb

DB

A

Rd

Detecting Data Hazards

IF/ID.Ra ≠0?

39

Ra==? Ra==

?

add r3, r1, r2subr5,r3,r4

Stall=(IF/ID.Ra !=0&& (IF/ID.Ra ==ID/EX.Rd||IF/ID.Ra ==EX/M.Rd))

Page 40: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

1. DoNothing• ChangetheISAtomatchimplementation• “Heycompiler:don’tcreatecodew/datahazards!”

(Wecandobetterthanthis)

2. Stall• Pausecurrentandsubsequentinstructionstillsafe

3. Forward/bypass• Forwarddatavaluetowhereitisneeded

(Onlyworksifvalueactuallyexistsalready)

40

Page 41: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

HowtostallaninstructioninIDstage• preventIF/IDpipelineregisterupdate

– stallstheIDstageinstruction

• convertIDstageinsn intonop forlaterstages– innocuous“bubble”passesthroughpipeline

• preventPCupdate– stallsthenext(IFstage)instruction

41

Page 42: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

IF/ID

+4

ID/EX EX/MEM MEM/WB

mem

din dout

addr

PC

instmem

Rd

Ra Rb

DB

A

42

Rd

addr3,r1,r2subr5,r3,r5orr6,r3,r4addr6,r3,r8

inst

PC+4

OP

BA

Rt

BD

MD

PC+4

imm

OP

Rd

OP

Rd

Ifhazard:

WE=0MemWr=0RegWr=0

detecthazard

Page 43: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

43

datamem

B

A

B

D

M

Dinstmem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

addr3,r1,r2

(MemWr=0RegWr=0)

NOP=If(IF/ID.rA ≠0&&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.Rd))

subr5,r3,r5

orr6,r3,r4 (WE=0)

STALLCONDITIONMET

Page 44: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

44

datamem

B

A

B

D

M

Dinstmem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

nop

(MemWr=0RegWr=0)

NOP=If(IF/ID.rA ≠0&&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.Rd))

addr3,r1,r2subr5,r3,r5

(MemWr=0RegWr=0)

orr6,r3,r4 (WE=0)

STALLCONDITIONMET

Page 45: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

45

datamem

B

A

B

D

M

Dinstmem

DrD B

A

Rd RdRd

WE

WE

Op

WE

Op

rA rB

PC

+4

Opnop

inst

/stall

nop

NOP=If(IF/ID.rA ≠0&&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.Rd))

addr3,r1,r2subr5,r3,r5

(MemWr=0RegWr=0)

orr6,r3,r4 (WE=1)NOSTALLCONDITIONMET:suballowedtoleavedecodestage

nop

Page 46: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

46

Clockcycle1 2 3 4 5 6 7 8

addr3,r1,r2

subr5,r3,r5

or r6,r3,r4

addr6,r3,r8

time

Page 47: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

47

Clockcycle1 2 3 4 5 6 7 8

addr3,r1,r2

subr5,r3,r5

or r6,r3,r4

addr6,r3,r8

r3=10

r3=20

time

IF ID Ex M W

IF ID Ex M W

IF ID Ex M

ID* ID*

IF* IF*

IF ID Ex

2StallCycles

Page 48: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

1. DoNothing• ChangetheISAtomatchimplementation• “Compiler:don’tcreatecodewithdatahazards!”

(Nicetry,wecandobetterthanthis)

2. Stall• Pausecurrentandsubsequentinstructionstillsafe

3. Forward/bypass• Forwarddatavaluetowhereitisneeded

(Onlyworksifvalueactuallyexistsalready)

48

Page 49: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

49

datamem

imm

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MC

Ra

MC

forwardunit

detecthazard

IF/ID ID/Ex Ex/Mem Mem/WB

Page 50: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

50

datamem

imm

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MC

Ra

MC

forwardunit

detecthazard

Twotypesofforwarding/bypass• ForwardingfromEx/Mem registerstoExstage(M®Ex)• ForwardingfromMem/WBregistertoExstage(W® Ex)

IF/ID ID/Ex Ex/Mem Mem/WB

Page 51: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

51

addr3,r1,r2

subr5,r3,r1

datamem

instmem

DB

A

IF ID Ex M W

IF ID Ex M W

addr3,r1,r2subr5,r3,r1

Problem:EXneedsALUresultthatisinMEMstageSolution:addabypassfromEX/MEM.DtostartofEX

Ex/Mem

Page 52: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

52

datamem

instmem

DB

A

DetectionLogicinExStage:forward=(Ex/M.WE&&EX/M.Rd !=0&&

ID/Ex.Ra ==Ex/M.Rd)||(sameforRb)

addr3,r1,r2subr5,r3,r1

Ex/Mem

Page 53: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

53

addr3,r1,r2

subr5,r3,r1

orr6,r3,r4

datamem

instmem

DB

A

IF ID Ex M WIF ID

IF WEx M WID Ex M

Problem:EXneedsvaluebeingwrittenbyWBSolution:AddbypassfromWBfinalvaluetostartofEX

Mem/WB

add r3, r1,r2subr5,r3,r1orr6,r3,r4

Page 54: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

54

datamem

instmem

DB

A

DetectionLogic:forward=(M/WB.WE&&M/WB.Rd !=0&&

ID/Ex.Ra ==M/WB.Rd &&not(ID/Ex.WE &&Ex/M.Rd !=0&&

ID/Ex.Ra ==Ex/M.Rd)||(sameforRb)

Mem/WB

add r3, r1,r2subr5,r3,r1orr6,r3,r4

Page 55: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

55

Clockcycle1 2 3 4 5 6 7 8

addr3,r1,r2

sub r5,r3,r4

lwr6,4(r3)

or r5,r3,r5

sw r6,12(r3)

time

Page 56: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

56

Clockcycle1 2 3 4 5 6 7 8

addr3,r1,r2

sub r5,r3,r4

lwr6,4(r3)

or r5,r3,r5

sw r6,12(r3)

IF ID Ex M W

IF ID

IF W

Ex M W

ID Ex M

IF ID Ex

time

M W

IF ID Ex M W

Page 57: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Datadependencyafteraloadinstruction:• ValuenotavailableuntilaftertheMstageàNextinstructioncannotproceedifdependent

THEKILLERHAZARD57

datamem

instmem

DB

A

lwr4,20(r8)orr6,r3,r4

Page 58: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

58

lwr4,20(r8)

or r6,r3,r4

datamem

instmem

DB

A

lwr4,20(r8)orr6,r4,r1

Page 59: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

59

lwr4,20(r8)

or r6,r3,r4

datamem

instmem

DB

A

IF ID Ex

IF ID

lwr4,20(r8)orr6,r4,r1

Page 60: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

60

datamem

instmem

DB

A

NOPorr6,r4,r1 lwr4,20(r8)

lwr4,20(r8)

or r6,r3,r4

IF ID Ex M W

IF ID Ex M WID*Stall

Page 61: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

61

datamem

instmem

DB

A

NOPorr6,r4,r1 lwr4,20(r8)

Ex

lwr4,20(r8)

or r6,r3,r4

IF ID Ex M W

IF ID Ex M WID*Stall

Page 62: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

62

datamemim

m

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MCRa

MC

forwardunit

detecthazard

IF/ID ID/Ex Ex/Mem Mem/WB

Stall=If(ID/Ex.MemRead &&IF/ID.Ra ==ID/Ex.Rd

RdMC

Page 63: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Mostfrequent3410non-solutiontoload-usehazardsWhyisthis“solution”sosososososoawful? 63

datamemim

m

B

A

B

D

M

Dinstmem

DB

A

Rd Rd

Rb

WE

WE

MCRa

MC

forwardunit

detecthazard

IF/ID ID/Ex Ex/Mem Mem/WB

RdMC

Page 64: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

ForwardingvaluesdirectlyfromMemorytotheExecutestagewithoutstoringtheminaregisterfirst:

A. Doesnotremovetheneedtostall.B. AddsonetoomanypossibleinputstotheALU.C. Willcausethepipelineregistertohavethe

wrongvalue.D. Halvesthefrequencyoftheprocessor.E. BothA&D

64

Page 65: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

TwoMIPSSolutions:• MIPS2000/3000:delayslot

– ISAsaysresultsofloadsarenotavailableuntilonecyclelater

–Assemblerinsertsnop,orreorderstofilldelayslot

• MIPS4000onwards:stall– Butreally,programmer/compilerreorderstoavoidstallingintheloaddelayslot

65

Page 66: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

5-stagePipeline• Implementation• WorkingExample

66

Hazards• Structural• DataHazards• ControlHazards

Page 67: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

67

for (i = 0; i < max; i++) {n += 2;

}i = 7;n--;

x10 addi r1, r0, 0 # i=0x14 Loop: addi r2, r2, 2 # n x+= 2x18 addi r1, r1, 1 # i++x1C blt r1, r3, Loop # i<max?x20 addi r1, r0, 7 # i = 7x24 subi r2, r2, 1 # n++

i à r1Assume:n à r2max à r3

Page 68: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

ControlHazards• instructionsarefetchedinstage1(IF)• branchandjumpdecisionsoccurinstage3(EX)à nextPCnotknownuntil2cycles after branch/jump

x1C blt r1, r3, Loop x20 addi r1, r0, 7

x24 subi r2, r2, 1

68

Branchnot taken?NoProblem!

Branchtaken?Justfetched2addi’sà Zap&Flush

Page 69: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

69

1C blt r1,r3,L20 addi r1,r0,724 subi r2,r2,1

datamem

instmem D

B

A

PC

+4

NOPIF ID Ex M W

IF ID NOP NOPNOPIF NOP NOP NOP

branchcalc

decidebranch

IF ID Ex M W

IfbranchTaken®Zap

• preventPCupdate• clearIF/IDlatch• branchcontinues

NewPC=1C

14 L:addi r2,r2,2

Page 70: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

70

1C blt r1,r3,L20 addi r1,r0,724 subi r2,r2,1

datamem

instmem D

B

A

PC

+4

NOPIF ID Ex M W

IF ID NOP NOPNOPIF NOP NOP NOP

branchcalc

decidebranch

IF ID Ex M W

IfbranchTaken®Zap

• preventPCupdate• clearIF/IDlatch• branchcontinues

NewPC=1C

14 L:addi r2,r2,2

Foreverytakenbranch?OUCH!!!

Page 71: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

1. DelaySlot• YouMUSTdothis• MIPSISA:1insn afterctrlinsn always executed

• Whetherbranchtakenornot

2. ResolveBranchatDecode• SomegroupsdothisforProject2,yourchoice• Movebranchcalc fromEXtoID• Alternative:justzap2nd instructionwhenbranchtaken

3. BranchPrediction• Notin3410,buteveryprocessorworthanythingdoesthis

(nooffense!)

71

Page 72: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

datamem

instmem D

B

A

PC

+4

branchcalc

decidebranchNewPC=1C

1C blt r1, r3, Loop F D X

20 addi r1, r0, 7 F D

24 subi r2, r2, 1 F

Page 73: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

73

for (i = 0; i < max; i++) {n += 2;

}i = 7;n--;x10 addi r1, r0, 0 # i=0x14 Loop: addi r2, r2, 2 # n x+= 2x18 addi r1, r1, 1 # i++x1C blt r1, r3, Loop # i<max?x20 nopx24 addi r1, r0, 7 # i = 7x28 subi r2, r2, 1 # n++

i à r1Assume:n à r2max à r3

Page 74: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

datamem

instmem D

B

A

PC

+4

branchcalc

decidebranchNewPC=1C

1C blt r1, r3, Loop F D X

20 nop F D

24 addi r1, r0, 7 F

Page 75: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

75

datamem

instmem D

B

A

PC

+4

NewPC=1C

branchcalc

decidebranch

1C blt r1, r3, Loop F D X

20 nop F D

14 Loop:addi r2,r2,2 F

Page 76: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

76

x10 addi r1, r0, 0 # i=0x14 Loop: addi r2, r2, 2 # n x+= 2x18 addi r1, r1, 1 # i++x1C blt r1, r3, Loop # i<max?x20 nop

x10 addi r1, r0, 0 # i=0x14 Loop: addi r1, r1, 1 # i++x18 blt r1, r3, Loop # i<max?x1C addi r2, r2, 2 # n x+= 2

Compiler transforms code

Page 77: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

77

datamem

instmem D

B

A

PC

+4

NewPC=1C

branchcalc

decidebranch

1C blt r1, r3, Loop F D X

20 addi r2,r2,2 F D

14 Loop:addi r1,r1,1 F

Page 78: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

MostprocessorsupportSpeculativeExecution• Guess directionofthebranch

– Allowinstructionstomovethroughpipeline– Zapthemlaterifguessturnsouttobewrong

• Amustforlongpipelines

78

Page 79: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

Datahazardsoccurwhenaoperand(register)dependsontheresultofapreviousinstructionthatmaynotbecomputedyet.Pipelinedprocessorsneedtodetectdatahazards.

Stalling,preventingadependentinstructionfromadvancing,isonewaytoresolvedatahazards.StallingintroducesNOPs(“bubbles”)intoapipeline.IntroduceNOPsby(1)preventingthePCfromupdating,(2)preventingwritestoIF/IDregistersfromchanging,and(3)preventingwritestomemoryandregisterfile.Nops significantlydecreaseperformance.

Forwardingbypassessomepipelinedstagesforwardingaresulttoadependentinstructionoperand(register).Betterperformancethanstalling.

79

Page 80: Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by reading values from latch • Each stage ends by writing values to latch • Resolve

ControlhazardsoccurbecausethePCfollowingacontrolinstructionisnotknownuntilcontrolinstructionisexecuted.Ifbranchistakenà needtozapinstructions.1cycleperformancepenalty.

DelaySlotscanpotentiallyincreaseperformanceduetocontrolhazards.Theinstructioninthedelayslotwillalwaysbeexecuted.Requiressoftware(compiler)tomakeuseofdelayslot.Putnop indelayslotifnotabletoputusefulinstructionindelayslot.

WecanreducecostofacontrolhazardbymovingbranchdecisionandcalculationfromExstagetoIDstage.Withadelayslot,thisremovestheneedtoflushinstructionsontakenbranches.

80