Post on 09-May-2020

CS/ECE 552: Pipelining (Part 3)

Prof. Matthew D. Sinclair

Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, Josh San Miguel, and John Shen

Announcements 2/20

• Project Design Review Monday 2/24
– My office, 6369 CS

• Midterm coming up next week (3/5 in class)
– Closed book, one double-sided hand-written cheat sheet
– Calculators allowed
– MIPS green cards provided
– Covers Weeks 1 through 6
– Will post additional Midterm Details today under Week 7 on Canvas

• HW3 Posted Tomorrow, Due 2/28
• Project Phase 1 due 3/13
• HW1 Grades Released
• HW2 Canvas Submission – per group

Announcements 2/25

• Midterm coming up next week (3/5 in class)
– Closed book, one double-sided hand-written cheat sheet
– Calculators allowed
– MIPS green cards provided
– Covers Weeks 1 through 6
– Posted additional Midterm Details
• Practice Exams posted on Canvas Week 7
• Link to Course Website with topics that will be covered
– Next Tuesday: exam review – bring questions!

• HW3 Posted Friday, Due 2/28
• Project Phase 1 due 3/13
• HW1 Grades Released
– Expectations for your homework and project submissions

• HW2 Grading In Progress

Today’s Learning Objectives

• Analyze how branches impact the performance of pipelined programs

• Identify branch delay slots, and revise code to utilize them

• Demonstrate how branches require additional forwarding

Data Hazards?

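• Pipelining, without forwarding:
– assume RF bypassing
– average CPI = 1 + (1 × 20%) + (2 × 50%) = 2.2
– 770 ps per instruction

Stage latencies: 250 ps, 150 ps, 100 ps, 350 ps, 150 ps

[Figure: instructions I5 I4 I3 I2 I1 in the five pipeline stages, with RAW dependences marked at 20% and 50% of instructions]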
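
The CPI arithmetic on this slide can be checked with a short Python sketch. The function name `average_cpi` and its argument layout are illustrative and not from the slides; the 350 ps clock period is taken to be the slowest of the five stage latencies listed on the slide.

```python
def average_cpi(base_cpi, hazards):
    """Average CPI given (stall_cycles, fraction_of_instructions) pairs."""
    return base_cpi + sum(stalls * fraction for stalls, fraction in hazards)

# Stage latencies from the slide, in ps; the clock must fit the slowest stage.
stage_latencies_ps = [250, 150, 100, 350, 150]
clock_ps = max(stage_latencies_ps)

# Without forwarding: 20% of instructions stall 1 cycle, 50% stall 2 cycles.
cpi_no_fwd = average_cpi(1.0, [(1, 0.20), (2, 0.50)])
time_no_fwd_ps = cpi_no_fwd * clock_ps

print(f"CPI = {cpi_no_fwd:.2f}, {time_no_fwd_ps:.1f} ps per instruction")
```

Passing a different list of (stall cycles, fraction) pairs models other hazard mixes with the same function.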

Data Hazards?

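• Pipelining, with forwarding:
– assume RF bypassing
– average CPI = 1 + (1 × 25%) = 1.25
– 437.5 ps per instruction

Stage latencies: 250 ps, 150 ps, 100 ps, 350 ps, 150 ps

[Figure: instructions I5 I4 I3 I2 I1 in the five pipeline stages, with load-to-use dependences marked at 25% of instructions]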
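
As a rough comparison, the sketch below puts the two data-hazard results side by side and derives the speedup from forwarding. The speedup figure is not stated on the slides; it follows from their numbers under the assumption that both designs run at the same 350 ps clock.

```python
CLOCK_PS = 350  # slowest stage latency from the slides

def time_per_instruction_ps(cpi, clock_ps=CLOCK_PS):
    """Average time per instruction for a given average CPI."""
    return cpi * clock_ps

cpi_without_forwarding = 1 + 1 * 0.20 + 2 * 0.50  # RAW stalls, no forwarding
cpi_with_forwarding = 1 + 1 * 0.25                # only load-to-use stalls remain

t_without = time_per_instruction_ps(cpi_without_forwarding)
t_with = time_per_instruction_ps(cpi_with_forwarding)
speedup = t_without / t_with

print(f"{t_without:.1f} ps vs {t_with:.1f} ps, speedup {speedup:.2f}x")
```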

Control Dependences

• Conditional branches (e.g., beq, bne):
– Branch must execute to determine which instruction to fetch next; subsequent instructions are control-dependent on the branch instruction
– COD Figure 4.65, branches resolved in ID stage:

[Figure: COD Figure 4.65, with the branch target and branch condition logic in the ID stage labeled]

Control Dependences

beq $s1, $s2, SKIP
add $s4, $s5, $s6
...
SKIP: sub $s4, $s5, $s6

With predict-not-taken (flush otherwise):

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
beq         F  D  X  M  W
add            F  =
sub               F  D  X  M  W

(= marks the flushed instruction)

Control Dependences

[Figure: COD Figure 4.65, annotated with the flush actions for a taken branch]
• Set PC to IF/ID.BranchAddr
• Set IF/ID.Instruction to 0x00000000 (sll $0, $0, 0)

Control Hazards?

• Pipelining, with predict-not-taken:
– assume branches resolved in ID, flush if branch taken
– average CPI = 1 + (1 × 60%) = 1.6
– 560 ps per instruction

Stage latencies: 250 ps, 150 ps, 100 ps, 350 ps, 150 ps

[Figure: instructions I5 I4 I3 I2 I1 in the five pipeline stages; 60% branches taken]

Control Hazards?

• Pipelining, with dynamic branch prediction:
– assume branches resolved in ID, flush if branch mispredicted
– average CPI = 1 + (1 × 10%) = 1.1
– 385 ps per instruction

Stage latencies: 250 ps, 150 ps, 100 ps, 350 ps, 150 ps

[Figure: instructions I5 I4 I3 I2 I1 in the five pipeline stages; 90% branches predicted correctly]
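
The control-hazard CPI figures can be reproduced with the sketch below. The slides' formulas charge the one-cycle flush to the stated percentage of all instructions, which is the same as treating every instruction as a branch. The `branch_fraction` parameter is an addition for illustration only, and the 20% branch mix in the last call is a made-up value, not from the slides.

```python
CLOCK_PS = 350  # slowest stage latency from the slides

def branch_cpi(mispredict_rate, penalty_cycles=1, branch_fraction=1.0, base_cpi=1.0):
    """Average CPI when a fraction of branches pays a flush penalty.

    branch_fraction=1.0 reproduces the slides' simplification that every
    instruction is a branch; smaller values are a hypothetical refinement.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

# Predict-not-taken is wrong whenever the branch is taken (60%).
cpi_not_taken = branch_cpi(0.60)
# Dynamic prediction is wrong 10% of the time.
cpi_dynamic = branch_cpi(0.10)
# Hypothetical: only 20% of instructions are branches.
cpi_dynamic_20pct = branch_cpi(0.10, branch_fraction=0.20)

for label, cpi in [("predict-not-taken", cpi_not_taken),
                   ("dynamic", cpi_dynamic),
                   ("dynamic, 20% branches", cpi_dynamic_20pct)]:
    print(f"{label}: CPI = {cpi:.2f}, {cpi * CLOCK_PS:.1f} ps per instruction")
```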

…But RAW Hazard at ID?

[Figure: COD Figure 4.65, with the branch condition logic in the ID stage labeled]

…But RAW Hazard at ID?

add $s1, $s2, $s3
beq $s1, $s4, SKIP

With no forwarding to branch decision circuit in ID (assume RF bypassing):

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
add         F  D  X  M  W
beq            F  *  *  D  X  M  W

(* marks a stall cycle)

…But RAW Hazard at ID?

add $s1, $s2, $s3
beq $s1, $s4, SKIP

With forwarding to branch decision circuit in ID (assume RF bypassing):

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
add         F  D  X  M  W
beq            F  *  D  X  M  W

(* marks a stall cycle)
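
The stall counts in the two diagrams (two cycles without forwarding, one with) can be derived from when the producer's value becomes usable in ID. The sketch below is a simplified timing model, not the hazard-detection logic itself; the function name, the `distance` parameter, and the load case are assumptions added for illustration.

```python
# Cycle in which each stage of the producer runs, if it is fetched in cycle 1.
STAGE_CYCLE = {"F": 1, "D": 2, "X": 3, "M": 4, "W": 5}

def branch_stalls(producer_kind, distance=1, forwarding=True):
    """Stall cycles for a branch resolved in ID that reads the producer's result.

    producer_kind: "alu" (result ready at end of X) or "load" (end of M).
    distance: 1 if the branch immediately follows the producer, 2 if one
              instruction separates them, and so on.
    """
    if forwarding:
        ready_stage = "X" if producer_kind == "alu" else "M"
        # A forwarded value is usable in the cycle after it is produced.
        earliest_decode = STAGE_CYCLE[ready_stage] + 1
    else:
        # RF bypassing: a value written in W can be read in D in the same cycle.
        earliest_decode = STAGE_CYCLE["W"]
    natural_decode = STAGE_CYCLE["D"] + distance
    return max(0, earliest_decode - natural_decode)

print(branch_stalls("alu", forwarding=False))  # add then beq, no forwarding
print(branch_stalls("alu", forwarding=True))   # add then beq, with forwarding
print(branch_stalls("load", forwarding=True))  # lw then beq (not on the slides)
```

Under this model a load immediately before the branch still costs two stall cycles even with forwarding, because the loaded value only exists at the end of M.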

CS/ECE 552: Pipelining (Part 4)

Prof. Matthew D. Sinclair

Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, Josh San Miguel, and John Shen

Pipeline Diagrams

beq $s1, $s2, DEST
add $s4, $s5, $s6
...
DEST: sub $s4, $s5, $s6

With predict-not-taken:

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
beq         F  D  X  M  W
add            F  =
DEST              F  D  X  M  W

(= marks the flushed instruction)

Pipeline Diagrams

beq $s1, $s2, DEST
add $s4, $s5, $s6
...
DEST: sub $s4, $s5, $s6

With predict-not-taken:

insn\cycle 1 2 3 4 5 6 7 8 9 10 11 12 13
beq F D X M W
add F D X M W
DEST F D X M W

Control Hazards

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
beq         F  D  X  M  W
add            F  =
DEST              F  D  X  M  W

Pipeline contents in cycles 2 through 6 (the flushed add travels down the pipeline as a NOP; stages holding other instructions are left blank):

cycle  IF    ID         EX         MEM        WB
2      add   beq
3      DEST  add (NOP)  beq
4            DEST       add (NOP)  beq
5                       DEST       add (NOP)  beq
6                                  DEST       add (NOP)

Branch Delay Slots

beq $s1, $s2, DEST
add $s4, $s5, $s6 # branch delay slot
...
DEST: sub $s4, $s5, $s6

With one branch delay slot:

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
beq         F  D  X  M  W
add            F  D  X  M  W
sub               F  D  X  M  W

Branch Delay Slots

beq $s1, $s2, DEST
sll $0, $0, 0 # branch delay slot
add $s4, $s5, $s6
...
DEST: sub $s4, $s5, $s6

With one branch delay slot:

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
beq         F  D  X  M  W
NOP            F  D  X  M  W
sub               F  D  X  M  W

Branch Delay Slots

sub $s5, $s6, $s7
add $s1, $s2, $s3
beq $s1, $s4, T1
sll $0, $0, 0 # branch delay slot
add $s4, $s5, $s6
j T2
T1: or $s4, $s4, $s7
slt $s2, $s4, $s6
T2: and $s2, $s2, $s3

Which instructions may replace the NOP (sll $0, $0, 0) in the branch delay slot?

• sub $s5, $s6, $s7 (moved from before the branch) – legal? yes
– neither add $s1, $s2, $s3 nor the beq reads or writes $s5
• add $s1, $s2, $s3 (moved from before the branch) – legal? no
– the beq reads $s1, so the add must execute before the branch
• add $s4, $s5, $s6 (moved from the fall-through path) – legal? no
– it would also execute when the branch is taken, and or $s4, $s4, $s7 at T1 reads $s4
• or $s4, $s4, $s7 (moved from the branch target) – legal? yes
– it would also execute when the branch is not taken, but add $s4, $s5, $s6 overwrites $s4 before anything reads it
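
The four legality verdicts can be reproduced with a small register-dependence check. This is a simplified sketch and not a real scheduler: it handles only the three-register and branch/jump forms used in this example, ignores memory dependences and exceptions, and expects each path to be given as straight-line code. All function names are illustrative.

```python
def parse(instr):
    """Split 'op rd, rs, rt' into (dest, sources); branches and jumps write nothing."""
    op, *rest = instr.replace(",", " ").split()
    regs = [r for r in rest if r.startswith("$")]
    if op in ("beq", "bne"):
        return None, set(regs)
    if op == "j":
        return None, set()
    return regs[0], set(regs[1:])

def legal_from_before(candidate, between, branch):
    """Can `candidate`, located before the branch, move into the delay slot?

    `between` lists the instructions separating it from the branch.
    """
    dest, srcs = parse(candidate)
    for other in between + [branch]:
        o_dest, o_srcs = parse(other)
        if dest in o_srcs:  # RAW: a later instruction needs the result
            return False
        if o_dest is not None and (o_dest in srcs or o_dest == dest):
            return False    # WAR / WAW: reordering would change a value
    return True

def live_on_path(reg, path):
    """True if `reg` may be read on `path` before being overwritten."""
    for instr in path:
        dest, srcs = parse(instr)
        if reg in srcs:
            return True
        if dest == reg:
            return False
    return True  # conservative: unknown code follows

def legal_from_other_path(candidate, other_path):
    """Can an instruction from one side of the branch fill the slot?

    It then also runs when control goes the other way, so its destination
    must be dead on `other_path`.
    """
    dest, _ = parse(candidate)
    return not live_on_path(dest, other_path)

branch = "beq $s1, $s4, T1"
fall_through = ["add $s4, $s5, $s6", "j T2", "and $s2, $s2, $s3"]
taken = ["or $s4, $s4, $s7", "slt $s2, $s4, $s6", "and $s2, $s2, $s3"]

results = {
    "sub $s5, $s6, $s7": legal_from_before("sub $s5, $s6, $s7", ["add $s1, $s2, $s3"], branch),
    "add $s1, $s2, $s3": legal_from_before("add $s1, $s2, $s3", [], branch),
    "add $s4, $s5, $s6": legal_from_other_path("add $s4, $s5, $s6", taken),
    "or $s4, $s4, $s7": legal_from_other_path("or $s4, $s4, $s7", fall_through),
}
for instr, ok in results.items():
    print(f"{instr:<20} legal? {'yes' if ok else 'no'}")
```

The split into two checks mirrors the two sources of delay-slot candidates: an instruction from before the branch always executes, so only reordering hazards matter, while one taken from a single path must be harmless on the other path.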

Branch Delay Slots

jal FUNC
sll $0, $0, 0 # branch delay slot
add $s4, $s5, $s6
...
FUNC:
or $s4, $s5, $s6
jr $ra
sll $0, $0, 0 # branch delay slot

BACKUP

Why Pipelining?

[Figure sequence: instructions I1, I2, I3, ... enter the datapath in order; in the final frames I5 I4 I3 I2 I1 occupy the five pipeline stages while I6, I7, I8 wait to enter]

Stage latencies: 250 ps, 150 ps, 100 ps, 350 ps, 150 ps

• Single-cycle:
– clock period = 1 ns
– CPI = 1
– 1 ns per instruction

Why Pipelining?

• Pipelining:
– clock period = max{IF, ID, EX, MEM, WB} = 350 ps
– individual CPI = 5, average CPI = (#insns + 4) / #insns ≈ 1
– 350 ps per instruction

Stage latencies: 250 ps, 150 ps, 100 ps, 350 ps, 150 ps

[Figure: instructions I5 I4 I3 I2 I1 in the five pipeline stages]
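
The clock-period and average-CPI formulas can be tried out for a few program lengths with the sketch below. It assumes an ideal pipeline with no stalls; the function names and the chosen instruction counts are illustrative.

```python
# Stage latencies from the slide, in ps.
stage_latencies_ps = [250, 150, 100, 350, 150]

single_cycle_clock_ps = sum(stage_latencies_ps)  # one long cycle covers all stages
pipelined_clock_ps = max(stage_latencies_ps)     # clock set by the slowest stage

def pipelined_cycles(num_insns, num_stages=len(stage_latencies_ps)):
    """Cycles to run num_insns with no stalls: fill the pipeline, then one per cycle."""
    return num_insns + (num_stages - 1)

def average_cpi(num_insns):
    """Average CPI = (#insns + 4) / #insns for a five-stage pipeline."""
    return pipelined_cycles(num_insns) / num_insns

for n in (1, 10, 1000):
    print(f"{n} insns: average CPI = {average_cpi(n):.3f}")

print(f"single-cycle: {single_cycle_clock_ps} ps per instruction")
print(f"pipelined:    about {pipelined_clock_ps} ps per instruction for long programs")
```

A single instruction still takes five cycles, which is the slide's "individual CPI = 5"; the average only approaches 1 as the instruction count grows.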

Pipeline Diagrams

lw $s1, 0($s2)
lw $s3, 4($s1)
add $s5, $s4, $s3

Assume full forwarding and bypassing:

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
lw $s1      F  D  X  M  W
NOP                  X  M  W
lw $s3         F  D* D  X  M  W
NOP                        X  M  W
add               F* F  D* D  X  M  W

(F* and D* mark stalls; each stall sends a NOP down the X, M, and W stages)
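
The load-to-use stalls in this diagram can be generated by a small timing model. The sketch below is an assumption-laden simplification: it models one in-order five-stage pipeline with full forwarding, where an ALU result is usable by a consumer's X stage one cycle after the producer's X and a loaded value one cycle after the producer's M. The `schedule` function and the tuple layout are illustrative.

```python
def schedule(program):
    """Return (name, {cycle: stage label}) per instruction.

    program: list of (name, dest, sources, is_load) tuples.
    """
    rows = []
    ready = {}  # register -> first cycle in which a consumer may be in X
    prev_d = prev_x = 0
    for name, dest, sources, is_load in program:
        f = prev_d if rows else 1   # enter F when the predecessor enters D
        d = max(f + 1, prev_x)      # enter D when the predecessor enters X
        x = max([d + 1] + [ready.get(s, 0) for s in sources])
        m, w = x + 1, x + 2
        if dest is not None:
            ready[dest] = (m if is_load else x) + 1
        cells = {}
        for c in range(f, d):       # extra cycles held in F are stalls
            cells[c] = "F" if c == d - 1 else "F*"
        for c in range(d, x):       # extra cycles held in D are stalls
            cells[c] = "D" if c == x - 1 else "D*"
        cells.update({x: "X", m: "M", w: "W"})
        rows.append((name, cells))
        prev_d, prev_x = d, x
    return rows

program = [
    ("lw $s1", "$s1", ["$s2"], True),
    ("lw $s3", "$s3", ["$s1"], True),
    ("add", "$s5", ["$s4", "$s3"], False),
]
rows = schedule(program)
last = max(max(cells) for _, cells in rows)
print("insn\\cycle  " + " ".join(f"{c:<3}" for c in range(1, last + 1)))
for name, cells in rows:
    print(f"{name:<12}" + " ".join(f"{cells.get(c, ''):<3}" for c in range(1, last + 1)))
```

The model prints only the instruction rows; the NOP rows in the slide's diagram correspond to the cycles in which an instruction is held in D.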

Data Hazards

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
lw $s1      F  D  X  M  W
NOP                  X  M  W
lw $s3         F  D* D  X  M  W
NOP                        X  M  W
add               F* F  D* D  X  M  W

Pipeline contents in cycles 3 through 8 (stages holding other instructions are left blank):

cycle  IF   ID      EX      MEM     WB
3      add  lw $s3  lw $s1
4      add  lw $s3  NOP     lw $s1
5           add     lw $s3  NOP     lw $s1
6           add     NOP     lw $s3  NOP
7                   add     NOP     lw $s3
8                           add     NOP

MEM-to-EX Forwarding

lw $s1, 0($s2)
lw $s3, 4($s1)
add $s5, $s4, $s3

Assume full forwarding and bypassing:

insn\cycle  1  2  3  4  5  6  7  8  9  10 11 12 13
lw $s1      F  D  X  M  W
NOP                  X  M  W
lw $s3         F  D* D  X  M  W
NOP                        X  M  W
add               F* F  D* D  X  M  W

MEM-to-EX Forwarding

[Figure: COD Figure 4.65]

Why not this?

[Figure: COD Figure 4.65]

How about this?

[Figure: COD Figure 4.65]