The Processor Pipeline - Basavaraj Talawar · The Processor Pipeline Chapter 4, Patterson and...

Post on 16-Aug-2020



The Processor Pipeline

Chapter 4, Patterson and Hennessy, 4ed. Sections 5.3, 5.4: J P Hayes.

Pipeline

A Basic MIPS Implementation

● Memory-reference instructions – Load Word (lw) and Store Word (sw)

● ALU instructions – add, sub, AND, OR and slt

● Branch on equal (beq)

Instruction Fetch – Elements

Instruction Fetch

ALU Operations – Elements

ADD R1, R2, R3

[Figure: register file with address (Addr), data (Data) and write (Write) ports]


Loads and Stores – Elements

LW R1, -8(R2)

Branches – Elements

BEQ R1, R2, LABEL

BEQ R1, R2, -16


Memory and R-type Instructions

Memory Instruction – Load

LW R1, -8(R2)

Memory Instruction – Store

SW R1, -8(R2)
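Both lw and sw form their memory address the same way: the 16-bit offset in the instruction is sign-extended and added to the base register. The following is a minimal Python sketch of that calculation. The function names and the value assumed for R2 (0x1000) are illustrative and do not come from the slides.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base: int, imm16: int) -> int:
    """Address used by lw/sw: base register plus sign-extended offset (32-bit wrap)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


# LW R1, -8(R2) with R2 assumed to hold 0x1000; -8 is encoded as 0xFFF8.
address = effective_address(0x1000, 0xFFF8)
print(hex(address))  # 0xff8
```

The same address is produced for SW R1, -8(R2); only the direction of the data transfer differs.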

R Type Instruction – ADD

ADD R1, R2, R3

The MIPS Datapath

The MIPS Datapath – BEQ

BEQ R1, R2, -16

MIPS Datapath and Control Lines

Pipeline Stages

● IF: Instruction fetch

● ID: Instruction decode/Register file read

● EX: Execution/Address calculation

● MEM: Memory access

● WB: Write back

Pipelined Datapath

[Figure: the datapath divided into the five stages IF (instruction fetch), ID (instruction decode/register file read), EX (execution/address calculation), MEM (memory access) and WB (write back)]

Pipelined vs. Nonpipelined Implementation


● Ratio of total execution times between the two versions for 10^6 instructions?

● Pipelining increases instruction throughput; it does not reduce the execution time of an individual instruction.

IF ID EX MEM WB
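The question about the ratio for 10^6 instructions can be explored with a small sketch. The stage latencies below are assumed for illustration only (the slides give no timing figures). The sketch also assumes that the nonpipelined version spends the sum of all stage latencies on every instruction and that the pipelined clock is set by the slowest stage.

```python
# Assumed per-stage latencies in picoseconds for IF, ID, EX, MEM, WB.
# These numbers are illustrative and are not taken from the slides.
STAGE_PS = [200, 100, 200, 200, 100]


def nonpipelined_time(n: int, stage_ps: list) -> int:
    """Each instruction runs start to finish before the next one begins."""
    return n * sum(stage_ps)


def pipelined_time(n: int, stage_ps: list) -> int:
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    k = len(stage_ps)
    return (k + n - 1) * max(stage_ps)


n = 10**6
ratio = nonpipelined_time(n, STAGE_PS) / pipelined_time(n, STAGE_PS)
print(f"{ratio:.6f}")  # close to 4 for these assumed latencies
```

With unequal stage latencies the ratio approaches (sum of latencies) / (slowest latency) rather than the number of stages.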

Speedup of the Pipeline

● The speedup of a k-stage pipelined processor over an unpipelined processor:

$$S_k = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}} = \frac{n \cdot k}{k + (n - 1)}$$

n: number of instructions in the program. k: number of pipeline stages.
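To see how the speedup behaves as the program grows, the expression n·k / (k + (n − 1)) can be evaluated directly. This is a small sketch; the function name is an assumption, and the formula treats every stage as taking one cycle with no stalls.

```python
def pipeline_speedup(n: int, k: int) -> float:
    """Speedup of a k-stage pipeline over an unpipelined processor for n instructions."""
    return (n * k) / (k + (n - 1))


# For a 5-stage pipeline the speedup starts at 1 and approaches k as n grows.
for n in (1, 10, 1000, 10**6):
    print(n, pipeline_speedup(n, 5))
```

For a single instruction the speedup is exactly 1, since the pipeline offers no overlap; for large n it approaches, but never reaches, k.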

Efficiency of the Pipeline

● Percentage of stages accomplishing tasks related to the instruction in execution

$$\eta = \frac{\text{No. of Instructions}}{\text{Instruction Execution Time}} = \frac{n}{k + (n - 1)}$$

n: number of instructions in the program. k: number of pipeline stages.

Throughput of the Pipeline

● Number of tasks completed in unit time (one second)

$$w = \eta \times f$$

f: frequency of operation
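The efficiency and throughput definitions can be combined in a few lines. This is a sketch under stated assumptions: the function names are made up for illustration, and the 1 GHz clock is a hypothetical value, not one given in the slides.

```python
def pipeline_efficiency(n: int, k: int) -> float:
    """Efficiency eta = n / (k + (n - 1)) for n instructions on k stages."""
    return n / (k + (n - 1))


def pipeline_throughput(n: int, k: int, f_hz: float) -> float:
    """Throughput w = eta * f, in instructions per second."""
    return pipeline_efficiency(n, k) * f_hz


eta = pipeline_efficiency(10**6, 5)
w = pipeline_throughput(10**6, 5, 1e9)  # assumed 1 GHz clock
print(eta, w)
```

For long programs the efficiency approaches 1, so the throughput approaches one instruction per clock cycle.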

Pipeline Hazards

● Hazard: n. An unavoidable danger or risk, even though often foreseeable.

● Situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle

● Reduce the performance from the ideal speedup gained by pipelining

Structural Hazard

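Clock cycles: 1 2 3 4 5 6 7 8 9

i1 (cycles 1-5): MEM ID EX MEM WB

i2 (cycles 2-6): MEM ID EX MEM WB

i3 (cycles 3-7): MEM ID EX MEM WB

i4 (cycles 4-8): MEM ID EX MEM WB

i5 (cycles 5-9): MEM ID EX MEM WB

...

HAZARD!!! (the first MEM in each row is the instruction fetch, so two instructions need the memory in the same cycle, e.g. i1 and i4 in cycle 4)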

● Lack of resources

● Solution: Increase resources

– Use of separate Data and Instruction memories in the MIPS pipeline
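The conflict in the diagram can be enumerated with a short sketch. It assumes a single memory with one port, one instruction entering the pipeline per cycle, and, as drawn on the slide, every instruction using the memory both when it is fetched and in its MEM stage. The function name is hypothetical.

```python
def memory_conflicts(num_instructions: int) -> list:
    """Return (cycle, fetching_instr, mem_stage_instr) for each shared-memory conflict.

    Instruction i (1-based) is fetched in cycle i and reaches MEM in cycle i + 3.
    """
    conflicts = []
    for i in range(1, num_instructions + 1):
        mem_cycle = i + 3      # IF, ID, EX, then MEM
        fetching = mem_cycle   # the instruction fetched in that same cycle
        if fetching <= num_instructions:
            conflicts.append((mem_cycle, fetching, i))
    return conflicts


print(memory_conflicts(5))  # [(4, 4, 1), (5, 5, 2)]
```

With separate instruction and data memories the fetch and the MEM stage use different resources, so these conflicts disappear.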


Data Hazard

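Clock cycles: 1 2 3 4 5 6 7 8 9

ADD R1, R2, R3 (cycles 1-5): IF ID EX MEM WB

SUB R4, R1, R5 (cycles 2-6): IF ID EX MEM WB

WRONG! SUB reads R1 in its ID stage (cycle 3), before ADD writes R1 in its WB stage (cycle 5).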

● Data (input operands) required by the instruction are not ready/available

● Data dependence

● RAW, WAR, WAW dependences

WAR (SUB writes R2, which the earlier ADD reads):

ADD R1, R2, R3

SUB R2, R4, R5

WAW (both instructions write R1):

ADD R1, R2, R3

SUB R1, R4, R5
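The three kinds of dependence can be told apart by comparing the destination and source registers of two instructions. The sketch below uses an assumed representation, a (dest, src1, src2) tuple per instruction, and a hypothetical function name; it is an illustration, not a full dependence analysis.

```python
def classify_dependences(first: tuple, second: tuple) -> set:
    """Classify dependences between two instructions in program order.

    Each instruction is (dest, src1, src2); `first` precedes `second`.
    """
    dest1, *sources1 = first
    dest2, *sources2 = second
    found = set()
    if dest1 in sources2:
        found.add("RAW")  # second reads what first writes
    if dest2 in sources1:
        found.add("WAR")  # second writes what first reads
    if dest1 == dest2:
        found.add("WAW")  # both write the same register
    return found


add = ("R1", "R2", "R3")                                # ADD R1, R2, R3
print(classify_dependences(add, ("R4", "R1", "R5")))    # {'RAW'}
print(classify_dependences(add, ("R2", "R4", "R5")))    # {'WAR'}
print(classify_dependences(add, ("R1", "R4", "R5")))    # {'WAW'}
```

In the in-order five-stage pipeline of these slides only RAW dependences lead to hazards, because registers are read in ID and written in WB in program order.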

Data Hazard

DADD R1,R2,R3

DSUB R4,R1,R5

AND R6,R1,R7

OR R8,R1,R9

XOR R10,R1,R11

[Figure: pipeline diagram over time (clock cycles); each instruction passes through IM, REG, ALU, DM, REG, and every instruction after DADD reads R1]

Avoiding Data Hazards – Forwarding

DADD R1,R2,R3

DSUB R4,R1,R5

AND R6,R1,R7

OR R8,R1,R9

XOR R10,R1,R11

[Figure: the same pipeline diagram over time (clock cycles), with the DADD result forwarded to the instructions that read R1]

Pipeline without Forwarding

Pipeline with Forwarding

Data Hazard – Load Instruction

LD R1,0(R2)

DSUB R4,R1,R5

AND R6,R1,R7

OR R8,R1,R9

[Figure: pipeline diagram over time (clock cycles); each instruction passes through IM, REG, ALU, DM, REG, and the instructions after LD read R1]

Data Hazards – Stalls

LD R1,0(R2)

DSUB R4,R1,R5

AND R6,R1,R7

OR R8,R1,R9

[Figure: pipeline diagram over time (clock cycles) in which the instructions after LD are stalled until the loaded value is available]

Data Hazard – Solutions

● Data Forwarding

● Instruction Reordering
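The effect of forwarding on stalls can be sketched as a small function. It assumes the classic five-stage pipeline, a register file that is written in the first half of a cycle and read in the second half, and forwarding paths from the EX and MEM outputs to the ALU inputs. These assumptions and the function name are illustrative and are not spelled out in the slides.

```python
def stall_cycles(distance: int, producer_is_load: bool, forwarding: bool) -> int:
    """Stall cycles for a consumer that is `distance` instructions after its producer.

    distance = 1 means the consumer immediately follows the producer.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        # An ALU result is ready after EX; a loaded value only after MEM.
        needed_gap = 2 if producer_is_load else 1
    else:
        # The consumer's ID stage must not come before the producer's WB stage.
        needed_gap = 3
    return max(0, needed_gap - distance)


print(stall_cycles(1, producer_is_load=False, forwarding=False))  # 2 (DADD then DSUB)
print(stall_cycles(1, producer_is_load=False, forwarding=True))   # 0
print(stall_cycles(1, producer_is_load=True, forwarding=True))    # 1 (LD then DSUB)
```

Instruction reordering attacks the same problem from the other side: moving an independent instruction between producer and consumer increases `distance` and so removes the stall.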

Control Hazard

● Arise from the pipelining of branches and other instructions that change the PC

● Also called Branch Hazards

Branch Hazards

[Figure: pipeline diagram over time (clock cycles 1 to 9) with rows labelled BEQ, ADD, Branch Successor, Branch Successor + 1 and Branch Successor + 2, each passing through IF ID EX MEM WB]

Assumption: Branch condition evaluation completed in the ID stage
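A common way to quantify the cost of branch hazards is to add the average branch stall to an ideal CPI of 1. The sketch below assumes that every branch pays the full penalty, which is one cycle when the branch is resolved in the ID stage. The branch frequency of 20% is a hypothetical value, not one from the slides.

```python
def cpi_with_branches(branch_fraction: float, penalty_cycles: int,
                      base_cpi: float = 1.0) -> float:
    """Average cycles per instruction when each branch stalls for penalty_cycles."""
    return base_cpi + branch_fraction * penalty_cycles


# Hypothetical workload: 20% branches, one-cycle penalty (branch resolved in ID).
cpi = cpi_with_branches(branch_fraction=0.2, penalty_cycles=1)
print(cpi)
```

A longer penalty, for example when the branch is resolved later than ID, raises the CPI in proportion to the branch frequency.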

Reducing Pipeline Branch Penalties

● Freeze the pipeline

● Predict Taken

● Predict Untaken

● Fill Branch Delay Slot

[Figure: pipeline diagram over time (clock cycles 1 to 9) with rows labelled BEQ, AND, Branch Successor and Branch Successor + 1, and instruction addresses i-1, i, i+16 and i+17]

Dynamic Branch Prediction

● Branch prediction buffers

– Single bit predictors

– Change prediction with branch behaviour

– No. of wrong predictions?

PC Prediction

0x0100 1

0x0154 0

0x0210 1

... 1

BRANCH PREDICTION BUFFER

T T T T N T T T T T T T T T T T T

Wrong Predictions
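The number of wrong predictions for the outcome sequence above can be counted by simulating a single-bit predictor. This is a sketch under stated assumptions: the buffer entry is assumed to start out predicting taken (the slides do not give the initial state), and the function name is hypothetical.

```python
def count_mispredictions_1bit(outcomes: list, initial_prediction: bool = True) -> int:
    """Simulate a 1-bit predictor: always predict what the branch did last time."""
    prediction = initial_prediction
    wrong = 0
    for taken in outcomes:
        if prediction != taken:
            wrong += 1
        prediction = taken  # remember the most recent outcome
    return wrong


# Outcome sequence from the slide: T = taken, N = not taken.
outcomes = [c == "T" for c in "T T T T N T T T T T T T T T T T T".split()]
print(count_mispredictions_1bit(outcomes))  # 2
```

The single not-taken outcome costs two mispredictions: one when it occurs, and one on the following taken branch, because the bit has just been flipped.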

Dynamic Branch Prediction

● 2-bit predictors

[Figure: branch prediction buffer indexed by PC (0x0100, 0x0154, 0x0210) holding 2-bit entries, shown with a state diagram over the states 11, 10, 01 and 00]
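A 2-bit predictor changes its prediction only after two consecutive wrong guesses. The sketch below models it as a saturating counter, which is one common realisation and an assumption here, since the exact state transitions of the slide's diagram are not recoverable. The initial state 11 (strongly taken) and the function name are also assumptions.

```python
def count_mispredictions_2bit(outcomes: list, initial_state: int = 3) -> int:
    """Simulate a 2-bit saturating counter; states 2 and 3 predict taken."""
    state = initial_state  # 0..3, i.e. binary 00..11
    wrong = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            wrong += 1
        # Move toward 3 on taken and toward 0 on not taken, saturating at the ends.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return wrong


# Same outcome sequence as for the single-bit predictor: T = taken, N = not taken.
outcomes = [c == "T" for c in "T T T T N T T T T T T T T T T T T".split()]
print(count_mispredictions_2bit(outcomes))  # 1
```

Under these assumptions the isolated not-taken outcome costs one misprediction instead of two, because a single surprise only moves the counter from 11 to 10 and the prediction stays taken.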