ELEC 5200-001/6200-001 Computer Architecture and Design

Fall 2014

Pipelining (Chapter 6)

Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849

http://www.eng.auburn.edu/~vagrawal
[email protected]

Fall 2014, Sep 26 . . . ELEC 5200-001/6200-001 Lecture 6

ILP: Instruction Level Parallelism

Single-cycle and multi-cycle datapaths execute one instruction at a time.

How can we get better performance?

Answer: Execute multiple instructions at a time:

Pipelining – Enhance a multi-cycle datapath to fetch one instruction every cycle.

Parallelism – Fetch multiple instructions every cycle.

Automobile Team Assembly

[Figure: one team performs four tasks of 1 hour each on a car before starting the next car.]

1 car assembled every four hours
6 cars per day
180 cars per month
2,160 cars per year

Automobile Assembly Line

Task 1 (Mechanical): 1 hour
Task 2 (Electrical): 1 hour
Task 3 (Painting): 1 hour
Task 4 (Testing): 1 hour

First car assembled in 4 hours (pipeline latency); thereafter, 1 car completed per hour
21 cars on first day, thereafter 24 cars per day
717 cars per month
8,637 cars per year

What gives 4X increase?
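The car counts for the assembly line can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and a 30-day month and a 360-day year are assumed because those values reproduce the 717 and 8,637 figures on the slide.

```python
def team_cars(hours, n_tasks=4):
    """Cars finished when one team performs all n_tasks one-hour tasks per car."""
    return hours // n_tasks


def line_cars(hours, n_tasks=4):
    """Cars finished by an n_tasks-stage assembly line.

    The first car needs n_tasks hours (latency); after that one car
    is completed every hour.
    """
    return max(0, hours - n_tasks + 1)


# Assumed calendar: 30-day month, 360-day year (hypothetical, chosen to match the slide).
for label, days in [("day", 1), ("month", 30), ("year", 360)]:
    hours = 24 * days
    print(label, "team:", team_cars(hours), "line:", line_cars(hours))
```

The gain comes from overlap alone: every task still takes one hour, but four cars are being worked on at any time.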

Throughput: Team Assembly

[Timeline figure: Mechanical, Electrical, Painting, Testing for the red car, followed by the same four tasks for the blue car. Markers: Red car started, Red car completed, Blue car started, Blue car completed; horizontal axis is Time.]

Time of assembling one car = n hours

where n is the number of nearly equal subtasks, each requiring 1 unit of time

Throughput = 1/n cars per unit time

Throughput: Assembly Line

[Timeline figure: Car 1, Car 2, Car 3, Car 4, . . . each pass through Mechanical, Electrical, Painting, Testing, each car starting one time unit after the previous one. Markers: Car 1 complete, Car 2 complete; horizontal axis is time.]

Time to complete first car = n time units (latency)

Cars completed in time T = T – n + 1

Throughput = 1 – (n – 1)/T cars per unit time

\[
\frac{\text{Throughput (assembly line)}}{\text{Throughput (team assembly)}}
= \frac{1 - (n-1)/T}{1/n}
= n - \frac{n(n-1)}{T} \;\to\; n \quad \text{as } T \to \infty
\]
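A short numeric check of the throughput ratio, as a sketch only (the function names are illustrative, not from the lecture). It evaluates (T – n + 1)/T divided by 1/n for growing T and shows the ratio approaching n.

```python
def line_throughput(T, n):
    """Cars per unit time for an n-stage assembly line observed for T time units."""
    return (T - n + 1) / T if T >= n else 0.0


def team_throughput(n):
    """Cars per unit time when one team performs all n subtasks in sequence."""
    return 1 / n


def speedup(T, n):
    """Ratio of assembly-line throughput to team throughput."""
    return line_throughput(T, n) / team_throughput(n)


n = 4
for T in (4, 24, 720, 8640):
    # The closed form n - n(n-1)/T should agree with the ratio.
    print(T, round(speedup(T, n), 4), round(n - n * (n - 1) / T, 4))
```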

Some Features of Assembly Line

Task 1 (Mechanical): 1 hour
Task 2 (Electrical): 1 hour
Task 3 (Painting): 1 hour
Task 4 (Testing): 1 hour

Electrical parts delivered (JIT)

Defect found

Stall assembly line to fix the cause of defect

3 cars in the assembly line are suspects, to be removed (flush pipeline)

Pros and Cons

Advantages:

Efficient use of labor.

Specialists can do a better job.

Just in time (JIT) methodology eliminates warehouse cost.

Disadvantages:

Penalty of defect latency.

Lack of flexibility in production.

Assembly line work is monotonous and boring.

https://www.youtube.com/watch?v=IjarLbD9r30

https://www.youtube.com/watch?v=ANXGJe6i3G8

https://www.youtube.com/watch?v=5lp4EbfPAtI

Pipelining in a Computer

Divide datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources.

Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task.

Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task.

Break each instruction down into the set of tasks so that instructions can be executed in a staggered fashion.

Pipelining a Single-Cycle Datapath

IF = Instr. fetch; ID = Instr. decode (also reg. file read); EX = Execution (ALU operation); MEM = Data access; WB = Write back (reg. file write)

Instruction class                    IF     ID     EX     MEM    WB     Total time
lw                                   2ns    1ns    2ns    2ns    1ns    8ns
sw                                   2ns    1ns    2ns    2ns    idle   8ns
R-format (add, sub, and, or, slt)    2ns    1ns    2ns    idle   1ns    8ns
B-format (beq)                       2ns    1ns    2ns    idle   idle   8ns

idle: No operation on data; idle times equalize instruction lengths.

Execution Time: Single-Cycle

[Timing diagram: lw $1, 100($0), lw $2, 200($0) and lw $3, 300($0) execute one after another, each passing through IF ID EX MEM WB in one 8 ns clock cycle; horizontal axis is Time (ns).]

Clock cycle time = 8 ns

Total time for executing three lw instructions = 24 ns

Pipelined Datapath

IF = Instr. fetch; ID = Instr. decode (also reg. file read); EX = Execution (ALU operation); MEM = Data access; WB = Write back (reg. file write)

Instruction class                    IF     ID           EX     MEM    WB           Total time
lw                                   2ns    1ns → 2ns    2ns    2ns    1ns → 2ns    10ns
sw                                   2ns    1ns → 2ns    2ns    2ns    1ns → 2ns    10ns
R-format (add, sub, and, or, slt)    2ns    1ns → 2ns    2ns    2ns    1ns → 2ns    10ns
B-format (beq)                       2ns    1ns → 2ns    2ns    2ns    1ns → 2ns    10ns

No operation on data; idle time inserted to equalize instruction lengths.
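The relation between stage delays and clock cycle time can be sketched in Python. The stage delays are the lw values from the single-cycle table; the variable names are illustrative only, and pipeline-register overhead is assumed to be zero.

```python
# Stage delays in ns for lw, taken from the slide's table.
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Single-cycle datapath: one clock cycle must cover every stage in sequence.
single_cycle_clock_ns = sum(STAGE_DELAY_NS.values())

# Pipelined datapath: the clock must cover only the slowest stage
# (assumption: register setup and clock skew are ignored).
pipelined_clock_ns = max(STAGE_DELAY_NS.values())

# Every instruction passes through all five stages, one clock cycle each.
pipelined_latency_ns = pipelined_clock_ns * len(STAGE_DELAY_NS)

print(single_cycle_clock_ns, pipelined_clock_ns, pipelined_latency_ns)
```

A single instruction therefore takes longer in the pipeline (10 ns against 8 ns); the benefit appears only when many instructions overlap.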

Execution Time: Pipeline

[Timing diagram: lw $1, 100($0), lw $2, 200($0) and lw $3, 300($0) each pass through IF ID EX MEM WB, each instruction starting 2 ns after the previous one; horizontal axis is Time (ns).]

Clock cycle time = 2 ns, four times faster than single-cycle clock

Total time for executing three lw instructions = 14 ns

\[
\text{Performance ratio} = \frac{\text{Single-cycle time}}{\text{Pipeline time}} = \frac{24}{14} = 1.7
\]

Pipeline Performance

Clock cycle time = 2 ns

1,003 lw instructions:

Total time for executing 1,003 lw instructions = 2,014 ns

\[
\text{Performance ratio} = \frac{\text{Single-cycle time}}{\text{Pipeline time}} = \frac{8{,}024}{2{,}014} = 3.98
\]

10,003 lw instructions:

Performance ratio = 80,024 / 20,014 = 3.998 → Clock cycle ratio (4)

Pipeline performance approaches clock-cycle ratio for long programs.
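The numbers on this slide follow from counting clock cycles: N instructions need N cycles of 8 ns on the single-cycle datapath and (5 + N – 1) cycles of 2 ns on the five-stage pipeline. A minimal sketch, with illustrative function names and no stalls assumed:

```python
def single_cycle_time_ns(n_instr, clock_ns=8):
    """Each instruction occupies one full 8 ns clock cycle."""
    return n_instr * clock_ns


def pipeline_time_ns(n_instr, stages=5, clock_ns=2):
    """The first instruction takes `stages` cycles; each later one adds one cycle."""
    return (stages + n_instr - 1) * clock_ns


def performance_ratio(n_instr):
    return single_cycle_time_ns(n_instr) / pipeline_time_ns(n_instr)


for n in (3, 1003, 10003):
    print(n, single_cycle_time_ns(n), pipeline_time_ns(n), round(performance_ratio(n), 3))
```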

Single-Cycle Datapath

[Figure: single-cycle datapath with PC, Add, 4, Instr. mem., Reg. File, Sign ext., Shift left 2, ALU, ALU Cont., Data mem., multiplexers and CONTROL. Instruction fields: 26-31 (opcode), 21-25, 16-20, 11-15, 0-15, 0-5. Signals: RegDst, RegWrite, ALUSrc, ALUOp, Branch, zero, MemRead, MemWrite, MemtoReg.]

The datapath is divided into five sections:

IF: Instr. fetch
ID: Instr. decode, reg. file read
EX: Execute, address calc.
MEM: mem. access
WB: write back

Pipelining of RISC Instructions
(From Lecture 3, Slide 6)

Fetch Instruction → Examine Opcode → ALU Operation → Memory Read/Write → Store Result

IF: Instruction Fetch
ID: Decode instruction and Fetch operands
EX: Execute
MEM: Memory Operation
WB: Write Back to Reg file

Although an instruction takes five clock cycles, one instruction is completed every cycle.

Pipeline Registers

[Figure: the single-cycle datapath (PC, Add, 4, Instr. mem., Reg. File, Sign ext., Shift left 2, ALU, ALU Cont., Data mem., multiplexers, CONTROL with signals RegDst, RegWrite, ALUSrc, ALUOp, Branch, zero, MemRead, MemWrite, MemtoReg) with four pipeline registers inserted: IF/ID, ID/EX, EX/MEM, MEM/WB.]

This requires a CONTROL not too different from single-cycle

Pipeline Register Functions

Four pipeline registers are added:

Register name    Data held
IF/ID            PC+4, Instruction word (IW)
ID/EX            PC+4, R1, R2, IW(0-15) sign ext., IW(11-15)
EX/MEM           PC+4, zero, ALUResult, R2, IW(11-15) or IW(16-20)
MEM/WB           M[ALUResult], ALUResult, IW(11-15) or IW(16-20)
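As an illustration of what the first two pipeline registers hold, the sketch below models IF/ID and ID/EX as Python dataclasses and fills ID/EX from IF/ID. The class, field and function names are hypothetical; only the list of stored values comes from the table, and R1 and R2 are assumed to be the register-file outputs selected by IW(21-25) and IW(16-20).

```python
from dataclasses import dataclass


@dataclass
class IFID:
    pc_plus4: int = 0
    instruction: int = 0       # instruction word (IW)


@dataclass
class IDEX:
    pc_plus4: int = 0
    r1: int = 0                # register read with IW(21-25) (assumed)
    r2: int = 0                # register read with IW(16-20) (assumed)
    sign_ext_imm: int = 0      # IW(0-15), sign extended
    rd_field: int = 0          # IW(11-15)


def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(if_id, regs):
    """ID stage: read the register file and fill the ID/EX register."""
    iw = if_id.instruction
    rs = (iw >> 21) & 0x1F
    rt = (iw >> 16) & 0x1F
    return IDEX(
        pc_plus4=if_id.pc_plus4,
        r1=regs[rs],
        r2=regs[rt],
        sign_ext_imm=sign_extend16(iw),
        rd_field=(iw >> 11) & 0x1F,
    )


# Usage: add $t0, $s1, $s2 is the word 0x02324020; $s1 = 5 and $s2 = 7 are made-up values.
regs = [0] * 32
regs[17], regs[18] = 5, 7
id_ex = decode(IFID(pc_plus4=4, instruction=0x02324020), regs)
print(id_ex)
```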

Pipelined Datapath

[Figure: pipelined datapath with PC, Add, 4, Instr mem, Reg. File, Sign ext., Shift left 2, ALU, Data mem. and multiplexers, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Instruction fields: 26-31 (opcode), 21-25, 16-20, 0-15. Destination register field: 11-15 for R-type, 16-20 for I-type lw.]

Five-Cycle Pipeline

[Figure: one instruction over clock cycles CC1 to CC5: PC, IM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, DM, MEM/WB, REG. FILE WRITE.]

Add Instruction

add $t0, $s1, $s2

Machine instruction word:

000000  10001  10010  01000  00000  100000
opcode  $s1    $s2    $t0           function

[Figure: the instruction over clock cycles CC1 to CC5: PC, IM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, DM, MEM/WB, REG. FILE WRITE.]

IF
ID: read $s1, read $s2
EX: add $s1+$s2
MEM
WB: write $t0
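The machine word for add $t0, $s1, $s2 can be rebuilt from its fields. The sketch below assumes the standard MIPS R-format layout (opcode 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6) and standard register numbers; the helper name encode_r is illustrative.

```python
# Standard MIPS register numbers for the registers used on these slides.
REG = {"$t0": 8, "$t1": 9, "$s1": 17, "$s2": 18}


def encode_r(rs, rt, rd, funct, shamt=0, opcode=0):
    """Pack an R-format instruction into a 32-bit word."""
    return (
        (opcode << 26)
        | (REG[rs] << 21)
        | (REG[rt] << 16)
        | (REG[rd] << 11)
        | (shamt << 6)
        | funct
    )


word = encode_r("$s1", "$s2", "$t0", funct=0b100000)  # funct 100000 = add
bits = f"{word:032b}"
# Split into opcode, rs, rt, rd, shamt, funct for comparison with the slide.
fields = [bits[0:6], bits[6:11], bits[11:16], bits[16:21], bits[21:26], bits[26:32]]
print(" ".join(fields))
```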

Pipelined Datapath Executing add

[Figure: pipelined datapath (PC, Add, 4, Instr mem, Reg. File, Sign ext., Shift left 2, ALU, Data mem with addr and data ports, multiplexers; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) annotated for add: fields s1 and s2 select the registers read, the values $s1 and $s2 go to the ALU, and t0 is the destination register field (11-15 for R-type, 16-20 for I-type lw).]

Load Instruction

lw $t0, 1200($t1)

100011  01001  01000  0000 0100 1011 0000
opcode  $t1    $t0    1200

[Figure: the instruction over clock cycles CC1 to CC5: PC, IM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, DM, MEM/WB, REG. FILE WRITE.]

IF
ID: read $t1, sign ext 1200
EX: add $t1+1200
MEM: read M[addr]
WB: write $t0
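The I-format word for lw $t0, 1200($t1) can be checked the same way. This sketch assumes the standard MIPS I-format layout (opcode 6 bits, rs 5, rt 5, immediate 16) and register numbers $t0 = 8, $t1 = 9; encode_i is an illustrative name. Note that 1200 in 16-bit binary is 0000 0100 1011 0000.

```python
LW_OPCODE = 0b100011
SW_OPCODE = 0b101011
T0, T1 = 8, 9  # standard MIPS register numbers for $t0 and $t1


def encode_i(opcode, rs, rt, imm):
    """Pack an I-format instruction; the immediate is truncated to 16 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


lw_word = encode_i(LW_OPCODE, T1, T0, 1200)  # lw $t0, 1200($t1)
sw_word = encode_i(SW_OPCODE, T1, T0, 1200)  # sw $t0, 1200($t1)

for name, word in (("lw", lw_word), ("sw", sw_word)):
    bits = f"{word:032b}"
    print(name, bits[0:6], bits[6:11], bits[11:16], bits[16:32])
```

The store instruction differs from the load only in its opcode field.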

Pipelined Datapath Executing lw

[Figure: pipelined datapath (PC, Add, 4, Instr mem, Reg. File, Sign ext., Shift left 2, ALU, Data mem with addr and data ports, multiplexers; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) annotated for lw: field t1 selects the register read, the value $t1 and the sign-extended 1200 go to the ALU, and t0 is the destination register field (11-15 for R-type, 16-20 for I-type lw).]

Store Instruction

sw $t0, 1200($t1)

101011  01001  01000  0000 0100 1011 0000
opcode  $t1    $t0    1200

[Figure: the instruction over clock cycles CC1 to CC5: PC, IM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, DM, MEM/WB, REG. FILE WRITE.]

IF
ID: read $t1, sign ext 1200
EX: add $t1+1200
MEM: write M[addr], (addr) ← $t0
WB

Pipelined Datapath Executing sw

[Figure: pipelined datapath (PC, Add, 4, Instr mem, Reg. File, Sign ext., Shift left 2, ALU, Data mem with addr and data ports, multiplexers; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) annotated for sw: fields t1 and t0 select the registers read, the value $t1 and the sign-extended 1200 go to the ALU, and the value $t0 goes to the data input of Data mem. Destination register field: 11-15 for R-type, 16-20 for I-type lw.]

Executing a Program

Consider a five-instruction segment:

lw $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw $13, 24($1)
add $14, $5, $6

Program Execution

[Pipeline diagram: program instructions on the vertical axis, time (CC1, CC2, CC3, CC4, CC5, . . .) on the horizontal axis. Each instruction passes through IM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, DM, MEM/WB, REG. FILE WRITE, starting one clock cycle after the previous instruction.]

lw $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw $13, 24($1)
add $14, $5, $6

CC5

[Figure: pipelined datapath (PC, Add, 4, Instr mem, Reg. File, Sign ext., Shift left 2, ALU, Data mem., multiplexers; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; destination register field 11-15 for R-type, 16-20 for I-type lw) showing the instruction in each stage during clock cycle 5.]

IF: add $14, $5, $6
ID: lw $13, 24($1)
EX: add $12, $3, $4
MEM: sub $11, $2, $3
WB: lw $10, 20($1)
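The stage occupancy in any clock cycle follows from the staggered start: instruction i (counting from 0) is in stage s during cycle i + s + 1. A minimal sketch for an ideal pipeline without stalls, with an illustrative function name:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def occupancy(program, cycle):
    """Return {stage: instruction} for a 1-based clock cycle, assuming no stalls."""
    result = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s  # index of the instruction currently in this stage
        if 0 <= i < len(program):
            result[stage] = program[i]
    return result


for stage, instr in occupancy(PROGRAM, 5).items():
    print(f"{stage}: {instr}")
```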

Advantages of Pipeline

After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles.
– Pipeline latency is defined as the number of stages in the pipeline, or
– The number of clock cycles after which the first instruction is completed.

The clock cycle time is about four times shorter than that of single-cycle datapath and about the same as that of multicycle datapath.

For multicycle datapath, CPI = 3. ….

So, pipelined execution is faster, but . . .

Science is always wrong. It never solves a problem without creating ten more.

George Bernard Shaw

Pipeline Hazards

Definition: Hazard in a pipeline is a situation in which the next instruction cannot complete execution one clock cycle after completion of the present instruction.

Three types of hazards:
– Structural hazard (resource conflict)
– Data hazard
– Control hazard

Structural Hazard

Two instructions cannot execute at the same time due to a resource conflict.

Example: Consider a computer with a common data and instruction memory. The fourth cycle of a lw instruction requires memory access (memory read), and at the same time the first cycle of the fourth instruction, counting the lw as the first, requires instruction fetch (memory read). This will cause a memory resource conflict.

Example of Structural Hazard

[Pipeline diagram: program instructions on the vertical axis, time (CC1, CC2, CC3, CC4, CC5, . . .) on the horizontal axis. Each instruction passes through IM/DM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, IM/DM, MEM/WB, REG. FILE WRITE.]

lw $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw $13, 24($1)

Common data and instr. mem. needed by two instructions
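The conflict can be located mechanically. The sketch below is an illustration only: it assumes a single shared memory, an ideal schedule in which instruction i (counting from 0) is fetched in cycle i + 1, and that only lw and sw access data memory in their MEM stage (cycle i + 4). The function name is hypothetical.

```python
def memory_conflicts(program):
    """List (cycle, data_access_instr, fetched_instr) where both need the shared memory."""
    conflicts = []
    for i, instr in enumerate(program):
        if instr.split()[0] in ("lw", "sw"):
            mem_cycle = i + 4          # IF in cycle i + 1, MEM three cycles later
            fetched = mem_cycle - 1    # index of the instruction fetched in mem_cycle
            if fetched < len(program):
                conflicts.append((mem_cycle, instr, program[fetched]))
    return conflicts


PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
]

for cycle, data_instr, fetch_instr in memory_conflicts(PROGRAM):
    print(f"CC{cycle}: data access of '{data_instr}' vs fetch of '{fetch_instr}'")
```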

Possible Remedies for Structural Hazards

Provide duplicate hardware resources in datapath.

Control unit or compiler can insert delays (no-op cycles) between instructions. This is known as pipeline stall or bubble.

Stall (Bubble) for Structural Hazard

[Pipeline diagram: program instructions on the vertical axis, time (CC1, CC2, CC3, CC4, CC5, . . .) on the horizontal axis. Each instruction passes through IM/DM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, IM/DM, MEM/WB, REG. FILE WRITE. A stall (bubble) is inserted so that two instructions do not need the common memory in the same cycle.]

lw $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
Stall (bubble)
lw $13, 24($1)

Data Hazard

Data hazard means that an instruction cannot be completed because the needed data, being generated by another instruction in the pipeline, is not available.

Example: consider two instructions:

add $s0, $t0, $t1
sub $t2, $s0, $t3 # needs $s0

Example of Data Hazard

[Pipeline diagram: program instructions on the vertical axis, time (CC1, CC2, CC3, CC4, CC5, . . .) on the horizontal axis. Each instruction passes through IM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, DM, MEM/WB, REG. FILE WRITE.]

add $s0, $t0, $t1 (Write s0 in CC5)
sub $t2, $s0, $t3 (Read s0 and t3 in CC3)

We need to read s0 from reg file in cycle 3
But s0 will not be written in reg file until cycle 5

However, s0 will only be used in cycle 4
And it is available at the end of cycle 3
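A read-after-write dependence of this kind can be detected by comparing the cycle in which a register is written with the cycle in which a later instruction reads it. The sketch below is illustrative: it assumes an ideal five-stage schedule without forwarding (instruction i reads registers in cycle i + 2 and writes its result in cycle i + 5), treats a read in the same cycle as the write as safe (an assumption about the register file), and uses a simple parser that handles only instructions written like the ones on these slides.

```python
import re


def dest_and_sources(instr):
    """Return (destination register or None, list of source registers)."""
    op, operands = instr.split(None, 1)
    regs = re.findall(r"\$\w+", operands)
    if op in ("sw", "beq"):
        return None, regs      # these write no register
    return regs[0], regs[1:]   # first operand is the destination


def raw_hazards(program):
    """List (register, write_cycle, read_cycle) where the read precedes the write."""
    hazards = []
    for i, producer in enumerate(program):
        dest, _ = dest_and_sources(producer)
        if dest is None:
            continue
        write_cycle = i + 5    # WB stage of the producer
        for j in range(i + 1, len(program)):
            _, sources = dest_and_sources(program[j])
            read_cycle = j + 2  # ID stage of the consumer
            if dest in sources and read_cycle < write_cycle:
                hazards.append((dest, write_cycle, read_cycle))
    return hazards


print(raw_hazards(["add $s0, $t0, $t1", "sub $t2, $s0, $t3"]))
```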

Forwarding or Bypassing

Output of a resource used by an instruction is forwarded to the input of some resource being used by another instruction.

Forwarding can eliminate some, but not all, data hazards.

Forwarding for Data Hazard

[Pipeline diagram: program instructions on the vertical axis, time (CC1, CC2, CC3, CC4, CC5, . . .) on the horizontal axis. Each instruction passes through IM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, DM, MEM/WB, REG. FILE WRITE. An arrow labeled Forwarding connects the first instruction to the second.]

add $s0, $t0, $t1 (Write s0 in CC5)
sub $t2, $s0, $t3 (Read s0 and t3 in CC3)

Forwarding Unit Hardware

[Block diagram: pipeline registers ID/EX, EX/MEM and MEM/WB around the ALU and Data Mem.; two forwarding multiplexers (FORW. MUX) at the ALU inputs, controlled by the Forwarding Unit; the Forwarding Unit receives source reg. IDs from opcode and the destination registers; a MUX after MEM/WB selects the data to reg. file.]
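The slide shows the forwarding unit only as a block. A common textbook formulation of its decision, given here as an assumption and not as the lecture's own logic, compares each ALU source register number with the destination register numbers held in EX/MEM and MEM/WB. The function and argument names are hypothetical.

```python
def forward_select(src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Choose the source for one ALU input.

    Returns "EX/MEM" or "MEM/WB" when the value must be forwarded from that
    pipeline register, or "REG" when the register-file value read in ID is valid.
    Register 0 is never forwarded because it always reads as zero in MIPS.
    """
    # The most recent producer (EX/MEM) takes priority over the older one (MEM/WB).
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "REG"


# Usage: add $s0, $t0, $t1 is in MEM (destination $s0 = 16) while
# sub $t2, $s0, $t3 is in EX (sources $s0 = 16 and $t3 = 11).
print(forward_select(16, True, 16, False, 0))  # first ALU input
print(forward_select(11, True, 16, False, 0))  # second ALU input
```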

Forwarding Alone May Not Work

[Pipeline diagram: program instructions on the vertical axis, time (CC1, CC2, CC3, CC4, CC5, . . .) on the horizontal axis. Each instruction passes through IM, IF/ID, ID and REG. FILE READ, ID/EX, ALU, EX/MEM, DM, MEM/WB, REG. FILE WRITE.]

lw $s0, 20($s1) (Write s0 in CC5)
sub $t2, $s0, $t3 (Read s0 and t3 in CC3)

data needed by sub (data hazard)

data available from memory only at the end of cycle 4


Use Bubble and Forwarding

[Figure: pipeline diagram. Clock cycles CC1 to CC5 run along the time axis and program instructions are listed downward. Each instruction passes through PC, IM, ID/REG. FILE READ, ALU, DM, and REG. FILE WRITE, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]

    lw  $s0, 20($s1)       Write s0 in CC5
    stall (bubble)
    sub $t2, $s0, $t3      Forwarding
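The number of bubbles needed between a producer and a consumer can be worked out from the cycle in which the result first exists. The sketch below assumes the five-stage pipeline of these slides, with the producer fetched in cycle 1, and, for the no-forwarding case, a register file that is written in the first half of a cycle and read in the second half. The function name `stall_cycles` and the table `RESULT_READY` are illustrative.

```python
# Cycle (producer fetched in cycle 1) at whose end the result exists.
RESULT_READY = {"alu": 3, "load": 4}  # end of EX for add/sub, end of MEM for lw


def stall_cycles(producer, distance=1, forwarding=True):
    """Bubbles needed when the consumer follows `distance` instructions later."""
    consumer_fetch = 1 + distance
    if forwarding:
        # The consumer needs the operand at the start of its EX cycle,
        # which is two cycles after its fetch.
        earliest_ex = RESULT_READY[producer] + 1
        return max(0, earliest_ex - (consumer_fetch + 2))
    # No forwarding: the consumer's ID cycle (fetch + 1) must not come
    # before the producer's write-back in cycle 5.
    return max(0, 5 - (consumer_fetch + 1))


print(stall_cycles("alu"))                     # 0: add then sub, forwarding suffices
print(stall_cycles("load"))                    # 1: lw then sub, one bubble plus forwarding
print(stall_cycles("alu", forwarding=False))   # 2
```

With one independent instruction between lw and sub (`distance=2`), the function returns 0, which is the idea behind code reordering.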


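Hazard Detection Unit Hardware

[Figure: block diagram. The forwarding datapath (Forwarding Unit, two FORW. MUX blocks at the ALU inputs, ALU, Data Mem., the ID/EX, EX/MEM, and MEM/WB pipeline registers, data to reg. file) is extended with a Hazard Detection Unit. The unit receives register IDs from the previous instruction and the instruction held in IF/ID. It can disable writes to PC and IF/ID, and a NOP MUX selects 0 in place of the control signals produced by Control.]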
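The detection condition itself is not spelled out on the slide. A common textbook formulation for the load-use case is sketched below as an assumption: stall when the instruction in EX is a load whose destination matches a source register of the instruction in ID. The function name `hazard_detect` and the signal names in the returned dictionary are illustrative.

```python
def hazard_detect(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return the stall controls for one cycle.

    id_ex_mem_read : the instruction in EX is a load (lw)
    id_ex_rt       : destination register of that load
    if_id_rs/rt    : source registers of the instruction in ID
    """
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "pc_write": not stall,      # hold PC so the same instruction is refetched
        "if_id_write": not stall,   # hold IF/ID so the consumer is decoded again
        "insert_nop": stall,        # NOP MUX selects 0 for the control signals
    }


# lw $s0, 20($s1) in EX and sub $t2, $s0, $t3 in ID ($s0 = 16, $t3 = 11).
print(hazard_detect(True, 16, 16, 11)["insert_nop"])    # True: one bubble
# add $s0, $t0, $t1 in EX instead: forwarding handles it, no stall.
print(hazard_detect(False, 16, 16, 11)["insert_nop"])   # False
```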


Resolving Hazards

Hazards are resolved by hazard detection and forwarding units.

Compiler’s understanding of how these units work can improve performance.


Avoiding Stall by Code Reorder

C code:
    A = B + E;
    C = B + F;

MIPS code:
    lw  $t1, 0($t0)        # $t1 written
    lw  $t2, 4($t0)        # $t2 written
    add $t3, $t1, $t2      # $t1, $t2 needed
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)        # $t4 written
    add $t5, $t1, $t4      # $t4 needed
    sw  $t5, 16($t0)


Reordered Code

C code:
    A = B + E;
    C = B + F;

MIPS code:
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2      # no hazard
    sw  $t3, 12($t0)
    add $t5, $t1, $t4      # no hazard
    sw  $t5, 16($t0)
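The benefit of reordering can be checked by counting load-use stalls in both instruction sequences. The sketch below assumes a five-stage pipeline with full forwarding, so the only remaining data stall is a lw followed immediately by an instruction that reads the loaded register. The helpers `parse` and `load_use_stalls` are illustrative and handle only the lw, sw, and add forms used here.

```python
def parse(line):
    """Return (op, dest, sources) for the lw/sw/add forms used on the slide."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op == "lw":   # lw rt, off(base): writes rt, reads base
        return op, args[0], [args[1][args[1].index("(") + 1:-1]]
    if op == "sw":   # sw rt, off(base): reads rt and base, writes no register
        return op, None, [args[0], args[1][args[1].index("(") + 1:-1]]
    return op, args[0], args[1:]   # add rd, rs, rt


def load_use_stalls(code):
    """Count lw instructions whose result is read by the very next instruction."""
    prog = [parse(line) for line in code]
    return sum(1 for prev, cur in zip(prog, prog[1:])
               if prev[0] == "lw" and prev[1] in cur[2])


original = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
            "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
            "sw $t5, 16($t0)"]
reordered = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
             "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
             "sw $t5, 16($t0)"]

for name, code in (("original", original), ("reordered", reordered)):
    stalls = load_use_stalls(code)
    # n instructions need n + 4 cycles in a 5-stage pipeline, plus one per bubble.
    print(name, "stalls:", stalls, "cycles:", len(code) + 4 + stalls)
# original stalls: 2 cycles: 13
# reordered stalls: 0 cycles: 11
```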


Control Hazard

Instruction to be fetched is not known!
Example: Instruction being executed is branch-type, which will determine the next instruction:

        add $4, $5, $6
        beq $1, $2, 40
        next instruction
        . . .
    40  and $7, $8, $9


Stall on Branch

[Figure: pipeline diagram. Clock cycles CC1 to CC5 run along the time axis and program instructions are listed downward. Each instruction passes through PC, IM, ID/REG. FILE READ, ALU, DM, and REG. FILE WRITE, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]

    add $4, $5, $6
    beq $1, $2, 40
    Stall (bubble)
    next instruction or and $7, $8, $9


Why Only One Stall?

Extra hardware in ID phase:
– Additional ALU to compute branch address
– Comparator to generate zero signal
– Hazard detection unit writes the branch address in PC
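The link between where the branch is resolved and the number of bubbles can be written as a one-line rule. This sketch assumes that the next fetch simply waits until the end of the stage in which both the branch decision and the target address become known. The stage numbering and the function name `branch_stalls` are illustrative.

```python
STAGE = {"IF": 1, "ID": 2, "EX": 3, "MEM": 4}


def branch_stalls(resolve_stage):
    """Bubbles after a branch whose outcome is known at the end of `resolve_stage`.

    A branch fetched in cycle c is resolved at the end of cycle
    c + STAGE - 1, so the next useful fetch is in cycle c + STAGE
    instead of c + 1.
    """
    return STAGE[resolve_stage] - 1


print(branch_stalls("ID"))    # 1: with the extra ID-stage hardware listed above
print(branch_stalls("EX"))    # 2: if the main ALU did the comparison
print(branch_stalls("MEM"))   # 3: if the PC were updated only in MEM
```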


Ways to Handle Branch

Stall or bubble
Branch prediction:
– Heuristics
    Next instruction
    Prediction based on statistics (dynamic)
    Hardware decision (dynamic)
– Prediction error: pipeline flush
Delayed branch
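A rough cost comparison between always stalling and the "next instruction" heuristic (predict not taken, flush on a taken branch) is sketched below. It assumes an ideal CPI of 1 with no other hazards and a one-cycle penalty per stall or flush. The branch and taken fractions in the example are hypothetical numbers, not measurements, and the function names are illustrative.

```python
def cpi_stall(branch_fraction, penalty=1):
    """Every branch costs `penalty` bubble cycles."""
    return 1.0 + branch_fraction * penalty


def cpi_predict_next(branch_fraction, taken_fraction, penalty=1):
    """Fetch the next instruction; only taken branches cost a flush."""
    return 1.0 + branch_fraction * taken_fraction * penalty


# Hypothetical workload: 20% of instructions are branches, 60% of them taken.
print(round(cpi_stall(0.2), 2))               # 1.2
print(round(cpi_predict_next(0.2, 0.6), 2))   # 1.12
```

Under these assumptions prediction never does worse than stalling, and the gap widens as fewer branches are taken.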


Delayed Branch Example

Stall on branch:
          add $4, $5, $6
          beq $1, $2, skip
          next instruction
          . . .
    skip  or $7, $8, $9

Delayed branch:
          beq $1, $2, skip
          add $4, $5, $6       Instruction executed irrespective of branch decision
          next instruction
          . . .
    skip  or $7, $8, $9
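That the two orderings compute the same result can be checked with a very small interpreter. This is a sketch under stated assumptions: one delay slot, no branch inside the delay slot, and a made-up `add $10, $11, $12` standing in for "next instruction". The move is safe here because beq reads $1 and $2, which add does not write. The function name `run` and the tuple encoding of instructions are illustrative.

```python
def run(program, regs, delayed):
    """Execute (op, a, b, c) tuples; beq's c is the target index."""
    regs = dict(regs)
    pc, pending = 0, None   # pending: branch target to apply after the delay slot
    while pc < len(program):
        op, a, b, c = program[pc]
        next_pc = pc + 1
        if pending is not None:          # this instruction is the delay slot
            next_pc, pending = pending, None
        if op == "add":
            regs[a] = regs[b] + regs[c]
        elif op == "or":
            regs[a] = regs[b] | regs[c]
        elif op == "beq" and regs[a] == regs[b]:
            if delayed:
                pending = c              # take effect one instruction later
            else:
                next_pc = c
        pc = next_pc
    return regs


SKIP = 3  # index of the instruction labelled skip
stall_order = [("add", 4, 5, 6), ("beq", 1, 2, SKIP),
               ("add", 10, 11, 12), ("or", 7, 8, 9)]
delayed_order = [("beq", 1, 2, SKIP), ("add", 4, 5, 6),
                 ("add", 10, 11, 12), ("or", 7, 8, 9)]

not_taken = {i: i for i in range(13)}   # $1 != $2
taken = dict(not_taken)
taken[2] = taken[1]                     # $1 == $2

for regs in (not_taken, taken):
    print(run(stall_order, regs, delayed=False) == run(delayed_order, regs, delayed=True))
# True
# True
```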


Delayed Branch

[Figure: pipeline diagram. Clock cycles CC1 to CC5 run along the time axis and program instructions are listed downward. Each instruction passes through PC, IM, ID/REG. FILE READ, ALU, DM, and REG. FILE WRITE, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Instruction labels in the figure: add $4, $5, $6; beq $1, $2, skip; next instruction or skip: or $7, $8, $9.]


Summary: Hazards

Structural hazards
– Cause: resource conflict
– Remedies: (i) hardware resources, (ii) stall (bubble)

Data hazards
– Cause: data unavailability
– Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering

Control hazards
– Cause: out-of-sequence execution (branch or jump)
– Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush