Lecture 11: Pipelining

Lecture 11: Pipelining Computer Engineering 585 Fall 2001


Page 2: Lecture 11: Pipelining

Three Generic Data Hazards

Instr I followed by Instr J

Read After Write (RAW): Instr J tries to read an operand before Instr I writes it (also known as a data dependence).

Page 3: Lecture 11: Pipelining

Three Generic Data Hazards

Instr I followed by Instr J

Write After Read (WAR): Instr J tries to write an operand before Instr I reads it, so Instr I gets the wrong operand.

Can't happen in the DLX 5-stage pipeline because all instructions take 5 stages, reads are always in stage 2, and writes are always in stage 5.

Also known as an antidependence.

Page 4: Lecture 11: Pipelining

WAR Example

SW  0(R1),R2    IF  ID  EX  MEM1  MEM2  WB
ADD R2,R3,R4        IF  ID  EX    WB

SW reads R2 in MEM2, the same cycle in which ADD writes R2 in WB.

Page 5: Lecture 11: Pipelining

Three Generic Data Hazards

Instr I followed by Instr J

Write After Write (WAW): Instr J tries to write an operand before Instr I writes it, leaving the wrong result (Instr I's value, not Instr J's).

Can't happen in the DLX 5-stage pipeline because all instructions take 5 stages and writes are always in stage 5.

Also known as an output dependence. We will see WAR and WAW later in more complicated pipelines.
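The three definitions can be checked mechanically by comparing the registers each instruction reads and writes. The sketch below is only an illustration: the `Instr` record and the `dependences` function are hypothetical names, not part of the lecture, and the check reports which dependences exist between Instr I and Instr J, not whether a particular pipeline actually turns them into hazards.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical summary of an instruction: one destination, some sources."""
    text: str
    dest: Optional[str]        # register written, or None (for example a store)
    srcs: Tuple[str, ...]      # registers read


def dependences(i: Instr, j: Instr) -> Set[str]:
    """Return the dependence classes that exist when j follows i."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")   # j reads what i writes: data dependence
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")   # j writes what i reads: antidependence
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")   # both write the same register: output dependence
    return found


# Instruction pairs taken from the slides in this lecture.
lw = Instr("LW R1,45(R2)", dest="R1", srcs=("R2",))
add_raw = Instr("ADD R5,R1,R7", dest="R5", srcs=("R1", "R7"))
sw = Instr("SW 0(R1),R2", dest=None, srcs=("R1", "R2"))
add_war = Instr("ADD R2,R3,R4", dest="R2", srcs=("R3", "R4"))
add_waw = Instr("ADD R1,R2,R3", dest="R1", srcs=("R2", "R3"))

print(dependences(lw, add_raw))   # RAW only
print(dependences(sw, add_war))   # WAR only
print(dependences(lw, add_waw))   # WAW only
```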

Page 6: Lecture 11: Pipelining

WAW Example

LW  R1,0(R2)    IF  ID  EX  MEM1  MEM2  WB
ADD R1,R2,R3        IF  ID  EX    WB
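The timing in the WAR and WAW examples can be reproduced by assigning a clock cycle to each stage. The following is a minimal sketch that assumes no stalls, that the first instruction is fetched in cycle 1 and the second in cycle 2, and that SW reads R2 in MEM2; the function name `stage_cycles` is made up for this illustration.

```python
def stage_cycles(fetch_cycle, stages):
    """Map each stage name to the clock cycle it occupies, assuming no stalls."""
    return {stage: fetch_cycle + k for k, stage in enumerate(stages)}


LONG = ["IF", "ID", "EX", "MEM1", "MEM2", "WB"]   # load/store path in the examples
SHORT = ["IF", "ID", "EX", "WB"]                   # ALU path in the examples
FIVE = ["IF", "ID", "EX", "MEM", "WB"]             # DLX 5-stage pipeline

# WAW example: LW R1,0(R2) followed by ADD R1,R2,R3.
lw = stage_cycles(1, LONG)
add = stage_cycles(2, SHORT)
waw_hazard = add["WB"] < lw["WB"]        # ADD writes R1 first, LW overwrites it

# WAR example: SW 0(R1),R2 followed by ADD R2,R3,R4.
sw = stage_cycles(1, LONG)
add2 = stage_cycles(2, SHORT)
war_hazard = add2["WB"] <= sw["MEM2"]    # ADD writes R2 no later than SW reads it

# In the 5-stage pipeline every write is in stage 5, so writes stay in order.
first = stage_cycles(1, FIVE)
second = stage_cycles(2, FIVE)
five_stage_waw = second["WB"] < first["WB"]
five_stage_war = second["WB"] <= first["ID"]

print(lw["WB"], add["WB"], waw_hazard)
print(sw["MEM2"], add2["WB"], war_hazard)
print(five_stage_waw, five_stage_war)
```

The equality case in the WAR check is a hazard only if the write happens in the first half of the cycle and the read in the second half, which is the assumption made here.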

Page 7: Lecture 11: Pipelining

Summary: data hazard situations

Situation: No dependence
Example code sequence:
    LW  R1,45(R2)
    ADD R5,R6,R7
    SUB R8,R6,R7
    OR  R9,R6,R7
Action: No hazard possible because no dependence exists on R1 in the immediately following three instructions.

Situation: Dependence requiring stall
Example code sequence:
    LW  R1,45(R2)
    ADD R5,R1,R7
    SUB R8,R6,R7
    OR  R9,R6,R7
Action: Comparators detect the use of R1 in the ADD and stall the ADD (and SUB and OR) before the ADD begins EX.

Situation: Dependence overcome by forwarding
Example code sequence:
    LW  R1,45(R2)
    ADD R5,R6,R7
    SUB R8,R1,R7
    OR  R9,R6,R7
Action: Comparators detect use of R1 in SUB and forward the result of the load to the ALU in time for SUB to begin EX.

Situation: Dependence with accesses in order
Example code sequence:
    LW  R1,45(R2)
    ADD R5,R6,R7
    SUB R8,R6,R7
    OR  R9,R1,R7
Action: No action required because the read of R1 by OR occurs in the second half of the ID phase, while the write of the loaded data occurred in the first half.

FIGURE 3.17 Situations that the pipeline hazard detection hardware can see by comparing the destination and sources of adjacent instructions.
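The four situations in Figure 3.17 differ only in how far behind the load the first use of R1 appears. A small sketch of that rule follows; the function names and the `sequences` table are hypothetical, and only the source registers of the three following instructions are modeled.

```python
def first_use_distance(load_dest, following_sources):
    """Return how many instructions after the load its result is first read.

    following_sources lists the source registers of the instructions that
    follow the load, in program order. Returns None if none of them reads it.
    """
    for distance, sources in enumerate(following_sources, start=1):
        if load_dest in sources:
            return distance
    return None


def action(distance):
    """Map the distance to the action described in Figure 3.17."""
    if distance is None or distance > 3:
        return "no hazard"
    if distance == 1:
        return "stall"        # the loaded data is not ready when EX would begin
    if distance == 2:
        return "forward"      # the load result is forwarded to the ALU
    return "no action"        # register write and read fall in the same cycle, in order


# Source registers of ADD, SUB, OR in each example sequence of the figure.
sequences = {
    "no dependence": [("R6", "R7"), ("R6", "R7"), ("R6", "R7")],
    "requiring stall": [("R1", "R7"), ("R6", "R7"), ("R6", "R7")],
    "overcome by forwarding": [("R6", "R7"), ("R1", "R7"), ("R6", "R7")],
    "accesses in order": [("R6", "R7"), ("R6", "R7"), ("R1", "R7")],
}

for name, following in sequences.items():
    print(name, "->", action(first_use_distance("R1", following)))
```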

Page 8: Lecture 11: Pipelining

Load Interlock Logic

Each row gives the opcode field of ID/EX (ID/EX.IR0..5), the opcode field of IF/ID (IF/ID.IR0..5), and the matching operand fields.

ID/EX opcode: Load. IF/ID opcode: Register-register ALU. Match: ID/EX.IR11..15 = IF/ID.IR6..10

ID/EX opcode: Load. IF/ID opcode: Register-register ALU. Match: ID/EX.IR11..15 = IF/ID.IR11..15

ID/EX opcode: Load. IF/ID opcode: Load, store, ALU immediate, or branch. Match: ID/EX.IR11..15 = IF/ID.IR6..10

FIGURE 3.18 The logic to detect the need for load interlocks during the ID stage of an instruction requires three comparisons.
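The three comparisons can be written as one predicate over the two instruction registers. This is a sketch under stated assumptions: the load's destination is compared against the source fields of the instruction in IF/ID, bits are numbered with bit 0 as the most significant (the IRm..n convention of the figure), and the instruction class is passed in as a string instead of being decoded from real DLX opcode values. The helpers `field`, `r_type`, and `i_type` are hypothetical.

```python
def field(ir, first, last):
    """Extract bits first..last of a 32-bit word, with bit 0 the most significant."""
    width = last - first + 1
    return (ir >> (31 - last)) & ((1 << width) - 1)


def r_type(rs1, rs2, rd):
    """Hypothetical encoder: rs1 in bits 6..10, rs2 in 11..15, rd in 16..20."""
    return (rs1 << 21) | (rs2 << 16) | (rd << 11)


def i_type(rs1, rd, imm):
    """Hypothetical encoder: rs1 in bits 6..10, rd in 11..15, immediate in 16..31."""
    return (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


ONE_SOURCE = ("load", "store", "alu_imm", "branch")


def needs_load_interlock(id_ex_is_load, id_ex_ir, if_id_kind, if_id_ir):
    """Apply the three comparisons of the load interlock table."""
    if not id_ex_is_load:
        return False
    load_dest = field(id_ex_ir, 11, 15)
    if if_id_kind == "rr_alu":
        # Rows 1 and 2: either source of a register-register ALU instruction.
        return load_dest in (field(if_id_ir, 6, 10), field(if_id_ir, 11, 15))
    if if_id_kind in ONE_SOURCE:
        # Row 3: the single register source in bits 6..10.
        return load_dest == field(if_id_ir, 6, 10)
    return False


lw = i_type(rs1=2, rd=1, imm=45)              # LW R1,45(R2)
print(needs_load_interlock(True, lw, "rr_alu", r_type(1, 7, 5)))    # ADD R5,R1,R7
print(needs_load_interlock(True, lw, "rr_alu", r_type(6, 7, 5)))    # ADD R5,R6,R7
print(needs_load_interlock(True, lw, "alu_imm", i_type(6, 1, 4)))   # ADDI R1,R6,#4
```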

Page 9: Lecture 11: Pipelining

Comprehensive Forwarding Scenarios

Each row gives the pipeline register containing the source instruction and the opcode of the source instruction, the pipeline register containing the destination instruction and the opcode of the destination instruction, the destination of the forwarded result, and the comparison (if equal then forward).

Source: EX/MEM, register-register ALU. Destination: ID/EX, register-register ALU, ALU immediate, load, store, branch. Forward to: top ALU input. Comparison: EX/MEM.IR16..20 = ID/EX.IR6..10

Source: EX/MEM, register-register ALU. Destination: ID/EX, register-register ALU. Forward to: bottom ALU input. Comparison: EX/MEM.IR16..20 = ID/EX.IR11..15

Source: MEM/WB, register-register ALU. Destination: ID/EX, register-register ALU, ALU immediate, load, store, branch. Forward to: top ALU input. Comparison: MEM/WB.IR16..20 = ID/EX.IR6..10

Source: MEM/WB, register-register ALU. Destination: ID/EX, register-register ALU. Forward to: bottom ALU input. Comparison: MEM/WB.IR16..20 = ID/EX.IR11..15

Source: EX/MEM, ALU immediate. Destination: ID/EX, register-register ALU, ALU immediate, load, store, branch. Forward to: top ALU input. Comparison: EX/MEM.IR11..15 = ID/EX.IR6..10

Source: EX/MEM, ALU immediate. Destination: ID/EX, register-register ALU. Forward to: bottom ALU input. Comparison: EX/MEM.IR11..15 = ID/EX.IR11..15

Source: MEM/WB, ALU immediate. Destination: ID/EX, register-register ALU, ALU immediate, load, store, branch. Forward to: top ALU input. Comparison: MEM/WB.IR11..15 = ID/EX.IR6..10

Source: MEM/WB, ALU immediate. Destination: ID/EX, register-register ALU. Forward to: bottom ALU input. Comparison: MEM/WB.IR11..15 = ID/EX.IR11..15

Source: MEM/WB, load. Destination: ID/EX, register-register ALU, ALU immediate, load, store, branch. Forward to: top ALU input. Comparison: MEM/WB.IR11..15 = ID/EX.IR6..10

Source: MEM/WB, load. Destination: ID/EX, register-register ALU. Forward to: bottom ALU input. Comparison: MEM/WB.IR11..15 = ID/EX.IR11..15

FIGURE 3.19 Forwarding of data to the two ALU inputs (for the instruction in EX) can occur from the ALU result (in EX/MEM or in MEM/WB) or from the load result in MEM/WB.
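The ten rows of Figure 3.19 collapse into a short selection routine. The sketch below is illustrative only: the `Latch` record, the `kind` strings, and the encoders are hypothetical, bits are numbered with bit 0 as the most significant, and giving EX/MEM priority over MEM/WB when both match is an assumption (it selects the most recent result) that the figure itself does not state.

```python
from typing import NamedTuple, Optional


class Latch(NamedTuple):
    """Hypothetical view of a pipeline register: instruction class plus IR."""
    kind: str   # "rr_alu", "alu_imm", "load", "store", "branch", or "none"
    ir: int


def field(ir, first, last):
    """Extract bits first..last of a 32-bit word, with bit 0 the most significant."""
    return (ir >> (31 - last)) & ((1 << (last - first + 1)) - 1)


def r_type(rs1, rs2, rd):
    return (rs1 << 21) | (rs2 << 16) | (rd << 11)


def i_type(rs1, rd, imm):
    return (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


def result_register(latch) -> Optional[int]:
    """Destination field of the source instruction, per the comparison column."""
    if latch.kind == "rr_alu":
        return field(latch.ir, 16, 20)
    if latch.kind in ("alu_imm", "load"):
        return field(latch.ir, 11, 15)
    return None


def forward_from(ex_mem, mem_wb, id_ex, alu_input) -> Optional[str]:
    """Return "EX/MEM", "MEM/WB", or None for the given ALU input ("top"/"bottom")."""
    if alu_input == "top":
        if id_ex.kind not in ("rr_alu", "alu_imm", "load", "store", "branch"):
            return None
        wanted = field(id_ex.ir, 6, 10)
    else:
        if id_ex.kind != "rr_alu":      # only these use a register at the bottom input
            return None
        wanted = field(id_ex.ir, 11, 15)
    # EX/MEM holds ALU results only; load data is not available until MEM/WB.
    if ex_mem.kind in ("rr_alu", "alu_imm") and result_register(ex_mem) == wanted:
        return "EX/MEM"
    if mem_wb.kind in ("rr_alu", "alu_imm", "load") and result_register(mem_wb) == wanted:
        return "MEM/WB"
    return None


idle = Latch("none", 0)
add = Latch("rr_alu", r_type(2, 3, 1))     # ADD R1,R2,R3
sub = Latch("rr_alu", r_type(1, 5, 4))     # SUB R4,R1,R5
print(forward_from(add, idle, sub, "top"))
print(forward_from(add, idle, sub, "bottom"))
```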

Page 10: Lecture 11: Pipelining

Branches in DLX sequential implementation

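[Figure: DLX datapath for the sequential implementation, divided into five steps: instruction fetch, instruction decode/register fetch, execute/address calculation, memory access, and write back. Labeled elements: PC, adder (+4), instruction memory, IR, NPC, registers, A, B, sign extend (16 to 32 bits), Imm, muxes, Zero? test, Cond (branch taken), ALU, ALU output, data memory, LMD.]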

Page 11: Lecture 11: Pipelining

Control Hazard: 3 cycle stall

[Figure: pipeline diagram with time in clock cycles (CC 1 to CC 6) on the horizontal axis and program execution order (in instructions) on the vertical axis; each instruction passes through IM, Reg, ALU, DM, Reg. The instructions shown are:]

100: BEQZ R1, +40

104: ADD R2,R3,R4

108: SUB R6,R7,R8

112: ANDI R12,R10,0xAA

140: ADD R3,R4,R5

Page 12: Lecture 11: Pipelining

Control Hazard: 1 cycle stall

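[Figure: pipelined DLX datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. Labeled elements: PC, the constant 4, two adders (ADD), instruction memory, IR, registers, IR6..10, IR11..15, MEM/WB.IR, Zero? test, sign extend (16 to 32 bits), ALU, data memory, muxes.]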

Page 13: Lecture 11: Pipelining

1 Cycle Stall Pipeline RTL Description

Pipe stage: actions for a branch instruction

IF:  IF/ID.IR ← Mem[PC];
     IF/ID.NPC, PC ← (if ID/EX.cond {ID/EX.NPC} else {PC+4});

ID:  ID/EX.A ← Regs[IF/ID.IR6..10]; ID/EX.B ← Regs[IF/ID.IR11..15];
     ID/EX.NPC ← IF/ID.NPC + (IR16)^16 ## IR16..31;
     ID/EX.IR ← IF/ID.IR;
     ID/EX.cond ← (Regs[IF/ID.IR6..10] op 0);
     ID/EX.Imm ← (IR16)^16 ## IR16..31

EX:

MEM:

WB:
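The ID-stage work in the RTL above amounts to sign-extending the 16-bit immediate, adding it to NPC, and testing the register against zero. A minimal sketch, assuming `(IR16)^16 ## IR16..31` denotes bit 16 replicated 16 times and concatenated with bits 16..31 (that is, sign extension), and using hypothetical function names and example values:

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def id_stage_branch(npc, reg_value, imm16, op="eqz"):
    """Compute (cond, target) as the ID stage does for a branch.

    op is "eqz" for a branch-if-zero test and "nez" for branch-if-not-zero.
    """
    target = npc + sign_extend16(imm16)
    cond = (reg_value == 0) if op == "eqz" else (reg_value != 0)
    return cond, target


def next_pc(pc, cond, target):
    """IF-stage selection: the branch target if cond is set, otherwise PC+4."""
    return target if cond else pc + 4


# Assumed example: a branch fetched from address 100, so NPC = 104,
# with an encoded offset of 36 and a tested register equal to zero.
cond, target = id_stage_branch(npc=104, reg_value=0, imm16=36)
print(cond, target)
print(next_pc(104, cond, target))
```

Because cond and the target are both available at the end of ID, only the one instruction fetched during the branch's ID cycle is affected.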

Page 14: Lecture 11: Pipelining

Branch Stall Impact

If CPI = 1, 30% of instructions are branches, and each branch stalls 3 cycles, then new CPI = 1 + 0.30 × 3 = 1.9!

Two-part solution: determine whether the branch is taken sooner, AND compute the taken-branch address earlier.

The DLX branch tests whether a register = 0.

DLX solution:

Move the Zero? test to the ID/RF stage.

Add an adder to calculate the new PC in the ID/RF stage.

Result: 1 clock cycle penalty for a branch versus 3.
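The CPI figure on this slide follows from adding the average stall cycles per instruction to the base CPI. A short sketch of that arithmetic, assuming every branch pays the full penalty; the function name is made up for illustration:

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Effective CPI when every branch stalls the pipeline for stall_cycles."""
    return base_cpi + branch_fraction * stall_cycles


three_cycle = cpi_with_branch_stalls(1.0, 0.30, 3)   # zero test and target in later stages
one_cycle = cpi_with_branch_stalls(1.0, 0.30, 1)     # zero test and adder moved to ID/RF

print(round(three_cycle, 2))
print(round(one_cycle, 2))
```

Under the same assumptions, moving the test and the adder into ID/RF brings the effective CPI from about 1.9 down to about 1.3.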

Page 15: Lecture 11: Pipelining

Branch Behavior Statistics

[Figure: bar chart of branch frequency per benchmark, as a percentage of instructions executed (axis from 0% to 25%), broken down into forward conditional branches, backward conditional branches, and unconditional branches.]

Benchmarks: compress, eqntott, espresso, gcc, li (integer); doduc, ear, hydro2d, mdljdp, su2cor (floating point)

Int: 13% forward cond., 3% backward cond., 4% unconditional

FP: 7% forward cond., 2% backward cond., 1% unconditional
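The Int and FP averages can be combined into a total branch frequency and an estimate of stall cycles per 100 instructions. This is a rough sketch that assumes every branch, of any kind, pays the same fixed penalty; the dictionary layout and function names are hypothetical.

```python
# Average branch frequencies from the slide, in percent of instructions executed.
branch_mix = {
    "Int": {"forward cond.": 13, "backward cond.": 3, "unconditional": 4},
    "FP": {"forward cond.": 7, "backward cond.": 2, "unconditional": 1},
}


def total_branch_percent(mix):
    """Sum the three branch categories into one frequency."""
    return sum(mix.values())


def stall_cycles_per_100(mix, penalty_cycles):
    """Stall cycles per 100 instructions if every branch costs penalty_cycles."""
    return total_branch_percent(mix) * penalty_cycles


for suite, mix in branch_mix.items():
    print(suite,
          total_branch_percent(mix),
          stall_cycles_per_100(mix, 1),
          stall_cycles_per_100(mix, 3))
```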