Forwarding - University of Washington

1

Forwarding

Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.

2

The pipelined datapath

[Figure: the pipelined datapath. It shows the PC, instruction memory, register file, sign extend and shift-left-2 units, ALU, and data memory, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. The control unit's EX, M, and WB fields travel through the pipeline registers and drive PCSrc, RegDst, ALUSrc, ALUOp, RegWrite, MemRead, MemWrite, and MemToReg.]

3

Pipeline diagram review

This diagram shows the execution of an ideal code fragment.
- Each instruction needs a total of five cycles for execution.
- One instruction begins on every clock cycle for the first five cycles.
- One instruction completes on each cycle from that time on.

Clock cycle          1    2    3    4    5    6    7    8    9
lw  $8, 4($29)       IF   ID   EX   MEM  WB
sub $2, $4, $5            IF   ID   EX   MEM  WB
and $9, $10, $11               IF   ID   EX   MEM  WB
or  $16, $17, $18                   IF   ID   EX   MEM  WB
add $13, $14, $0                         IF   ID   EX   MEM  WB

4

Here is the example instruction sequence used to illustrate pipelining on the previous page.

    lw  $8, 4($29)
    sub $2, $4, $5
    and $9, $10, $11
    or  $16, $17, $18
    add $13, $14, $0

The instructions in this example are independent.
- Each instruction reads and writes completely different registers.
- Our datapath handles this sequence easily, as we saw last time.

But most sequences of instructions are not independent!

Our examples are too simple

6

An example with dependencies

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

There are several dependencies in this new code fragment.
- The first instruction, SUB, stores a value into $2.
- That register is used as a source in the rest of the instructions.

This is not a problem for the single-cycle and multicycle datapaths.
- Each instruction is executed completely before the next one begins.
- This ensures that instructions 2 through 5 above use the new value of $2 (the sub result), just as we expect.

How would this code sequence fare in our pipelined datapath?
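The dependencies in this fragment can also be listed mechanically. The following is a minimal Python sketch, not part of the original slides: the tuple encoding of each instruction and the helper name `raw_dependencies` are assumptions made for illustration, and the register lists were transcribed by hand from the fragment above.

```python
# Each entry is (text, destination register or None, source registers).
program = [
    ("sub $2, $1, $3",   2,    [1, 3]),
    ("and $12, $2, $5",  12,   [2, 5]),
    ("or $13, $6, $2",   13,   [6, 2]),
    ("add $14, $2, $2",  14,   [2, 2]),
    ("sw $15, 100($2)",  None, [2, 15]),  # sw reads $2 and $15, writes no register
]

def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) triples."""
    deps = []
    last_writer = {}  # register number -> index of the latest instruction writing it
    for i, (_, dest, sources) in enumerate(program):
        for reg in sorted(set(sources)):
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

deps = raw_dependencies(program)
for producer, consumer, reg in deps:
    print(f"{program[consumer][0]!r} reads ${reg}, written by {program[producer][0]!r}")
```

Every one of the four later instructions shows up as a consumer of $2, which matches the observation above that SUB's destination is a source for the rest of the fragment.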

7

Clock cycle          1    2    3    4    5    6    7    8    9
sub $2, $1, $3       IF   ID   EX   MEM  WB
and $12, $2, $5           IF   ID   EX   MEM  WB
or  $13, $6, $2                IF   ID   EX   MEM  WB
add $14, $2, $2                     IF   ID   EX   MEM  WB
sw  $15, 100($2)                         IF   ID   EX   MEM  WB

The SUB instruction does not write to register $2 until clock cycle 5. This causes two data hazards in our current pipelined datapath.
- The AND reads register $2 in cycle 3. Since SUB hasn’t modified the register yet, this will be the old value of $2, not the new one.
- Similarly, the OR instruction uses register $2 in cycle 4, again before it’s actually updated by SUB.

Data hazards in the pipeline diagram

8

The ADD instruction is okay, because of the register file design.
- Registers are written at the beginning of a clock cycle.
- The new value will be available by the end of that cycle.

The SW is no problem at all, since it reads $2 after the SUB finishes.

Things that are okay

9

[Figure: the pipeline diagram for the same five instructions over clock cycles 1-9, annotated with dependency arrows for register $2.]

Arrows indicate the flow of data between instructions.
- The tails of the arrows show when register $2 is written.
- The heads of the arrows show when $2 is read.

Any arrow that points backwards in time represents a data hazard in our basic pipelined datapath. Here, hazards exist between instructions 1 & 2 and 1 & 3.
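The "arrow pointing backwards in time" test can be written as a comparison of cycle numbers. This is a small Python sketch under the timing described in these slides (one instruction starts per cycle, registers are read in ID and written in WB, and a read in the same cycle as the write sees the new value). The function names `stage_cycle` and `is_hazard` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_cycle(issue_index, stage):
    """Clock cycle (1-based) in which the instruction at 0-based position
    issue_index occupies the given stage, assuming no stalls."""
    return issue_index + 1 + STAGES.index(stage)

def is_hazard(producer_index, consumer_index):
    """True if the consumer reads the register file (ID) in an earlier cycle
    than the producer writes it (WB). A same-cycle read is not a hazard."""
    write_cycle = stage_cycle(producer_index, "WB")
    read_cycle = stage_cycle(consumer_index, "ID")
    return read_cycle < write_cycle

names = ["sub", "and", "or", "add", "sw"]
hazards = [names[i] for i in range(1, len(names)) if is_hazard(0, i)]
print(hazards)  # ['and', 'or']
```

SUB writes in cycle 5, so only the reads in cycles 3 and 4 (AND and OR) come too early; ADD reads in cycle 5 and SW in cycle 6.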

Dependency arrows

10

A fancier pipeline diagram

[Figure: the same five instructions (sub, and, or, add, sw) over clock cycles 1-9, with each pipeline stage drawn as the hardware unit it uses (labels IM, Reg, DM, Reg).]

11

A more detailed look at the pipeline

We have to eliminate the hazards, so the AND and OR instructions in our example will use the correct value for register $2.

When is the data actually produced and consumed? What can we do?


Let’s look at when the data is actually produced and consumed.
- The SUB instruction produces its result in its EX stage, during cycle 3 in the diagram below.
- The AND and OR need the new value of $2 in their EX stages, during clock cycles 4-5 here.

Clock cycle          1    2    3    4    5    6    7
sub $2, $1, $3       IF   ID   EX   MEM  WB
and $12, $2, $5           IF   ID   EX   MEM  WB
or  $13, $6, $2                IF   ID   EX   MEM  WB

13

Bypassing the register file

The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5.

If we could somehow bypass the writeback and register read stages when needed, then we could eliminate these data hazards.
- Today we’ll focus on hazards involving arithmetic instructions.
- Next time, we’ll examine the lw instruction.

Essentially, we need to pass the ALU output from SUB directly to the AND and OR instructions, without going through the register file.

14

Where to find the ALU result

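The ALU result generated in the EX stage is normally passed through the pipeline registers to the MEM and WB stages, before it is finally written to the register file.

This is an abridged diagram of our pipelined datapath.

[Figure: abridged pipelined datapath showing the PC, instruction memory, register file, ALU, and data memory, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, with the Rt and Rd fields feeding a destination-register mux.]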

15

Forwarding

[Figure: sub $2, $1, $3; and $12, $2, $5; and or $13, $6, $2 over clock cycles 1-7, with each stage drawn as a hardware unit (labels IM, Reg, DM, Reg).]

Since the pipeline registers already contain the ALU result, we could just forward that value to subsequent instructions, to prevent data hazards.
- In clock cycle 4, the AND instruction can get the value $1 - $3 from the EX/MEM pipeline register used by SUB.
- Then in cycle 5, the OR can get that same result from the MEM/WB pipeline register being used by SUB.

16

Outline of forwarding hardware

A forwarding unit selects the correct ALU inputs for the EX stage.
- If there is no hazard, the ALU’s operands will come from the register file, just like before.
- If there is a hazard, the operands will come from either the EX/MEM or MEM/WB pipeline registers instead.

The ALU sources will be selected by two new multiplexers, with control signals named ForwardA and ForwardB.

[Figure: pipeline diagram for sub $2, $1, $3; and $12, $2, $5; and or $13, $6, $2.]

17

Simplified datapath with forwarding muxes

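[Figure: the abridged datapath (PC, instruction memory, register file, ALU, data memory, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers) with two new three-input muxes, numbered 0-2 and controlled by ForwardA and ForwardB, in front of the ALU inputs.]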

19

Detecting EX/MEM data hazards

So how can the hardware determine if a hazard exists?

An EX/MEM hazard occurs between the instruction currently in its EX stage and the previous instruction if:
1. The previous instruction will write to the register file, and
2. The destination is one of the ALU source registers in the EX stage.

There is an EX/MEM hazard between the two instructions below.

    sub $2, $1, $3
    and $12, $2, $5

Data in a pipeline register can be referenced using a class-like syntax. For example, ID/EX.RegisterRt refers to the rt field stored in the ID/EX pipeline register.

20

EX/MEM data hazard equations

The first ALU source comes from the pipeline register when necessary.

    if (EX/MEM.RegWrite = 1
        and EX/MEM.RegisterRd = ID/EX.RegisterRs)
    then ForwardA = 2

The second ALU source is similar.

    if (EX/MEM.RegWrite = 1
        and EX/MEM.RegisterRd = ID/EX.RegisterRt)
    then ForwardB = 2

[Figure: pipeline diagram for sub $2, $1, $3 followed by and $12, $2, $5.]
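The two EX/MEM equations translate almost directly into code. Below is a minimal Python sketch, not taken from the slides: the class names `IdEx` and `ExMem`, the field names, and the function `ex_mem_forward` are hypothetical, and the default selector value 0 (operand comes from the register file) is an assumption based on the mux numbering in the diagrams.

```python
from dataclasses import dataclass

@dataclass
class IdEx:
    register_rs: int  # ID/EX.RegisterRs
    register_rt: int  # ID/EX.RegisterRt

@dataclass
class ExMem:
    reg_write: bool   # EX/MEM.RegWrite
    register_rd: int  # EX/MEM.RegisterRd

def ex_mem_forward(id_ex, ex_mem):
    """Return (ForwardA, ForwardB), considering only EX/MEM hazards."""
    forward_a = 0  # assumed default: use the register file value
    forward_b = 0
    if ex_mem.reg_write and ex_mem.register_rd == id_ex.register_rs:
        forward_a = 2
    if ex_mem.reg_write and ex_mem.register_rd == id_ex.register_rt:
        forward_b = 2
    return forward_a, forward_b

# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX.
result = ex_mem_forward(IdEx(register_rs=2, register_rt=5),
                        ExMem(reg_write=True, register_rd=2))
print(result)  # (2, 0)
```

Only the first ALU source of the AND matches SUB's destination, so ForwardA becomes 2 while ForwardB stays at its default.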

21

Detecting MEM/WB data hazards

A MEM/WB hazard may occur between an instruction in the EX stage and the instruction from two cycles ago.

One new problem is if a register is updated twice in a row.

    add $1, $2, $3
    add $1, $1, $4
    sub $5, $5, $1

Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded.

22

MEM/WB hazard equations

Here is an equation for detecting and handling MEM/WB hazards for the first ALU source.

    if (MEM/WB.RegWrite = 1
        and MEM/WB.RegisterRd = ID/EX.RegisterRs
        and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs or EX/MEM.RegWrite = 0))
    then ForwardA = 1

The second ALU operand is handled similarly.

    if (MEM/WB.RegWrite = 1
        and MEM/WB.RegisterRd = ID/EX.RegisterRt
        and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt or EX/MEM.RegWrite = 0))
    then ForwardB = 1
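Combining the EX/MEM and MEM/WB equations gives the whole selector logic. The sketch below is one possible Python rendering under stated assumptions: the function name `forwarding_unit` and its argument names are hypothetical, and 0 is assumed to mean "use the register file". Checking EX/MEM first and MEM/WB second has the same effect as the extra `EX/MEM.RegisterRd ≠ ... or EX/MEM.RegWrite = 0` term, because the MEM/WB branch is reached only when the EX/MEM test has failed.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return (ForwardA, ForwardB) for the instruction in the EX stage."""
    def select(src):
        if ex_mem_reg_write and ex_mem_rd == src:
            return 2  # most recent result: forward from EX/MEM
        if mem_wb_reg_write and mem_wb_rd == src:
            return 1  # older result: forward from MEM/WB
        return 0      # no hazard: use the register file value
    return select(id_ex_rs), select(id_ex_rt)

# sub $5, $5, $1 is in EX; the second add ($1) is in MEM, the first add ($1) in WB.
double_write = forwarding_unit(id_ex_rs=5, id_ex_rt=1,
                               ex_mem_reg_write=True, ex_mem_rd=1,
                               mem_wb_reg_write=True, mem_wb_rd=1)
print(double_write)  # (0, 2)
```

For the double-write sequence, both pipeline registers name $1 as a destination, and the selector picks 2, so the SUB receives the result of the second ADD.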

23

Simplified datapath with forwarding

[Figure: the simplified datapath with the forwarding unit added. The unit takes ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd as inputs and drives the ForwardA and ForwardB selectors of the two three-input muxes (numbered 0-2) at the ALU inputs.]

24

The forwarding unit

The forwarding unit has several control signals as inputs.

    ID/EX.RegisterRs    EX/MEM.RegisterRd    MEM/WB.RegisterRd
    ID/EX.RegisterRt    EX/MEM.RegWrite      MEM/WB.RegWrite

(The two RegWrite signals are not shown in the diagram, but they come from the control unit.)

The forwarding unit outputs are selectors for the ForwardA and ForwardB multiplexers attached to the ALU. These outputs are generated from the inputs using the equations on the previous pages.

Some new buses route data from pipeline registers to the new muxes.

25

Example

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

Assume again each register initially contains its number plus 100.
- After the first instruction, $2 should contain -2 (101 - 103).
- The other instructions should all use -2 as one of their operands.

We’ll try to keep the example short.
- Assume no forwarding is needed except for register $2.
- We’ll skip the first two cycles, since they’re the same as before.

26

Clock cycle 3

[Figure: the simplified datapath with forwarding during clock cycle 3. EX: sub $2, $1, $3; ID: and $12, $2, $5; IF: or $13, $6, $2. The ALU receives 101 and 103 from the register file (ForwardA = 0, ForwardB = 0) and produces -2. The AND reads the old values 102 and 105 from the register file.]

27

Clock cycle 4: forwarding $2 from EX/MEM

[Figure: the simplified datapath with forwarding during clock cycle 4. EX: and $12, $2, $5; ID: or $13, $6, $2; IF: add $14, $2, $2; MEM: sub $2, $1, $3. EX/MEM.RegisterRd = 2 matches ID/EX.RegisterRs = 2, so ForwardA = 2 and the ALU takes -2 from EX/MEM instead of the stale 102. ForwardB = 0 selects 105, and the ALU result is 104.]

28

Clock cycle 5: forwarding $2 from MEM/WB

[Figure: the simplified datapath with forwarding during clock cycle 5. EX: or $13, $6, $2; ID: add $14, $2, $2; IF: sw $15, 100($2); MEM: and $12, $2, $5; WB: sub $2, $1, $3. MEM/WB.RegisterRd = 2 matches ID/EX.RegisterRt = 2, so ForwardB = 1 and the ALU takes -2 from MEM/WB. ForwardA = 0 selects 106, and the ALU result is -2.]

29

Lots of data hazards

The first data hazard occurs during cycle 4.
- The forwarding unit notices that the ALU’s first source register for the AND is also the destination of the SUB instruction.
- The correct value is forwarded from the EX/MEM register, overriding the incorrect old value still in the register file.

A second hazard occurs during clock cycle 5.
- The ALU’s second source (for OR) is the SUB destination again.
- This time, the value has to be forwarded from the MEM/WB pipeline register instead.

There are no other hazards involving the SUB instruction.
- During cycle 5, SUB writes its result back into register $2.
- The ADD instruction can read this new value from the register file in the same cycle.
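The walk-through of cycles 3 to 5 can be replayed in a few lines. This is an illustrative Python sketch, not the lecture's own code. It models only the EX stage and the two pipeline registers behind it, assumes every register starts at its number plus 100 as in the example, and omits the RegWrite checks because all three instructions shown write a register. The names `select`, `operand`, and `trace` are hypothetical.

```python
def select(src, ex_mem, mem_wb):
    """Forwarding selector for one ALU source register."""
    if ex_mem is not None and ex_mem["rd"] == src:
        return 2
    if mem_wb is not None and mem_wb["rd"] == src:
        return 1
    return 0

def operand(sel, reg, regs, ex_mem, mem_wb):
    """Value delivered to the ALU by a forwarding mux."""
    if sel == 2:
        return ex_mem["value"]
    if sel == 1:
        return mem_wb["value"]
    return regs[reg]

OPS = {"sub": lambda a, b: a - b,
       "and": lambda a, b: a & b,
       "or":  lambda a, b: a | b}

regs = {n: n + 100 for n in range(32)}  # register file as read in ID (never updated here)
program = [("sub", 2, 1, 3), ("and", 12, 2, 5), ("or", 13, 6, 2)]  # (op, rd, rs, rt)

ex_mem = mem_wb = None
trace = []
for cycle, (op, rd, rs, rt) in enumerate(program, start=3):  # SUB is in EX in cycle 3
    fa = select(rs, ex_mem, mem_wb)
    fb = select(rt, ex_mem, mem_wb)
    result = OPS[op](operand(fa, rs, regs, ex_mem, mem_wb),
                     operand(fb, rt, regs, ex_mem, mem_wb))
    trace.append((cycle, op, fa, fb, result))
    # Advance the pipeline registers by one cycle.
    mem_wb, ex_mem = ex_mem, {"rd": rd, "value": result}

for row in trace:
    print(row)
```

The printed rows show ForwardA = 2 for the AND in cycle 4 and ForwardB = 1 for the OR in cycle 5, with ALU results -2, 104, and -2, in line with the description above.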

30

Complete pipelined datapath...so far

[Figure: the complete pipelined datapath so far. It shows the PC, instruction memory, register file, sign extend unit, ALU, and data memory, the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers carrying the EX, M, and WB control fields, and the forwarding unit (inputs Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd) driving the two three-input muxes at the ALU inputs.]

31

What about stores?

Two “easy” cases:

    add $1, $2, $3
    sw  $1, 0($4)

    add $1, $2, $3
    sw  $4, 0($1)

[Figure: pipeline diagrams for each pair of instructions over clock cycles 1-6.]

32

Store Bypassing: Version 1

[Figure: the complete pipelined datapath with forwarding, with EX: sw $4, 0($1) and MEM: add $1, $2, $3.]

33

Store Bypassing: Version 2

[Figure: the complete pipelined datapath with forwarding, with EX: sw $1, 0($4) and MEM: add $1, $2, $3.]

34

What about stores?

A harder case:

    lw $1, 0($2)
    sw $1, 0($4)

In what cycle is:
- The load value available?
- The store value needed?

What do we have to add to the datapath?

[Figure: pipeline diagram for the two instructions over clock cycles 1-6.]

35

Load/Store Bypassing: Extend the Datapath

[Figure: the complete pipelined datapath extended with a new two-input mux (inputs 0 and 1) controlled by a ForwardC signal, shown for the sequence lw $1, 0($2); sw $1, 0($4).]

36

Miscellaneous comments

Each MIPS instruction writes to at most one register.
- This makes the forwarding hardware easier to design, since there is only one destination register that ever needs to be forwarded.

Forwarding is especially important with deep pipelines like the ones in all current PC processors.

Section 6.4 of the textbook has some additional material not shown here.
- Their hazard detection equations also ensure that the source register is not $0, which can never be modified.
- There is a more complex example of forwarding, with several cases covered. Take a look at it!
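The $0 check mentioned above can be sketched as one extra term in each test. This Python fragment is an assumption about how such a check might be written, not a transcription of the textbook's equations. The function name `select_with_zero_check` is hypothetical, and 0 is again assumed to select the register file.

```python
def select_with_zero_check(src, ex_mem_reg_write, ex_mem_rd,
                           mem_wb_reg_write, mem_wb_rd):
    """Forwarding selector that never forwards a value destined for $0."""
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
        return 2
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
        return 1
    return 0

# add $0, $1, $2 followed by sub $3, $0, $4: the ADD result must not be
# forwarded, because $0 always reads as zero.
zero_case = select_with_zero_check(0, True, 0, False, 0)
normal_case = select_with_zero_check(2, True, 2, False, 0)
print(zero_case, normal_case)  # 0 2
```

Without the extra term, an instruction that names $0 as its destination would have its ALU result forwarded to a later reader of $0, which would then see a nonzero value.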

37

Summary

In real code, most instructions are dependent upon other ones.
- This can lead to data hazards in our original pipelined datapath.
- Instructions can’t write back to the register file soon enough for the next two instructions to read.

Forwarding eliminates data hazards involving arithmetic instructions.
- The forwarding unit detects hazards by comparing the destination registers of previous instructions to the source registers of the current instruction.
- Hazards are avoided by grabbing results from the pipeline registers before they are written back to the register file.

Next, we’ll finish up pipelining.
- Forwarding can’t save us in some cases involving lw.
- We still haven’t talked about branches for the pipelined datapath.