CSL718 : Superscalar Processors

Anshul Kumar, CSE IITD. Handling Data Dependencies. 24th Jan, 2006


Page 1: CSL718 : Superscalar Processors

Anshul Kumar, CSE IITD

CSL718 : Superscalar Processors

Handling Data Dependencies

24th Jan, 2006

Page 2: CSL718 : Superscalar Processors

Illustration 1

CDC6600 : score-boarding scheme

• Dispatch bound fetch

• FUs : INT, MUL1, MUL2, ADD/SUB, DIV

• 1 RS per FU

• 1 RF

• In order issue, dispatch order trivial, out of order execution

Page 3: CSL718 : Superscalar Processors

Checking in dispatch bound fetch

[Figure: block diagram. Blocks: Register File; Reservation station (OC, Rs1, Rs2, Rd); EU. Labels: decoded instruction; Rs1, Rs2, Rd; check V bits of sources; reset V bit of Rd; OC (opcode), Os1, Os2 (operand value); result, Rd; update Rd, set V bit.]
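
The V-bit checks named in this figure can be sketched roughly as follows. This is only an illustrative model, not the CDC 6600 logic itself: the names `RegisterFile`, `try_dispatch` and `write_back` are hypothetical, and registers that were never marked invalid are assumed to be valid.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterFile:
    """Register values plus one valid (V) bit per register (assumed model)."""
    values: dict = field(default_factory=dict)
    valid: dict = field(default_factory=dict)

    def is_valid(self, reg):
        # Registers never marked invalid are treated as valid.
        return self.valid.get(reg, True)


def try_dispatch(rf, opcode, rs1, rs2, rd):
    """Check V bits of the sources; on success fetch operands and reset V of Rd."""
    if not (rf.is_valid(rs1) and rf.is_valid(rs2)):
        return None  # the instruction keeps waiting in the reservation station
    operands = (opcode, rf.values.get(rs1, 0.0), rf.values.get(rs2, 0.0), rd)
    rf.valid[rd] = False  # result of this instruction is now pending
    return operands


def write_back(rf, rd, result):
    """Update Rd with the result and set its V bit again."""
    rf.values[rd] = result
    rf.valid[rd] = True
```

With hypothetical values F2 = 3.0 and F4 = 2.0, dispatching MUL F0,F2,F4 succeeds, a following DIV F10,F0,F2 is held back until `write_back` sets the V bit of F0 again.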

Page 4: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT
2   MUL1
3   MUL2
4   ADD
5   DIV

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No

Page 5: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   Y     LF
2   MUL1  Y     MUL
3   MUL2  N
4   ADD   Y     SUB
5   DIV   Y     DIV

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No

Page 6: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   Y     LF   F2   -    R3
2   MUL1  Y     MUL  F0   F2   F4
3   MUL2  N
4   ADD   Y     SUB  F8   F6   F2
5   DIV   Y     DIV  F10  F0   F6

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No

Page 7: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   Y     LF   F2   -    R3   -   -   Y   Y
2   MUL1  Y     MUL  F0   F2   F4   1   -   N   Y
3   MUL2  N
4   ADD   Y     SUB  F8   F6   F2   -   1   Y   N
5   DIV   Y     DIV  F10  F0   F6   2   -   N   Y

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   2   1   -   -   4   5

Page 8: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   Y     LF   F2   -    R3   -   -   N   N
2   MUL1  Y     MUL  F0   F2   F4   1   -   N   Y
3   MUL2  N
4   ADD   Y     SUB  F8   F6   F2   -   1   Y   N
5   DIV   Y     DIV  F10  F0   F6   2   -   N   Y

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   2   1   -   -   4   5

Page 9: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   N
2   MUL1  Y     MUL  F0   F2   F4   -   -   Y   Y
3   MUL2  N
4   ADD   Y     SUB  F8   F6   F2   -   -   Y   Y
5   DIV   Y     DIV  F10  F0   F6   2   -   N   Y

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   2   -   -   -   4   5

Page 10: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   N
2   MUL1  Y     MUL  F0   F2   F4   -   -   N   N
3   MUL2  N
4   ADD   N
5   DIV   Y     DIV  F10  F0   F6   2   -   N   Y

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   2   -   -   -   -   5

Page 11: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   N
2   MUL1  Y     MUL  F0   F2   F4   -   -   N   N
3   MUL2  N
4   ADD   Y     ADD  F6   F8   F2   -   -   Y   Y
5   DIV   Y     DIV  F10  F0   F6   2   -   N   Y

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   2   -   -   4   -   5

Page 12: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   N
2   MUL1  Y     MUL  F0   F2   F4   -   -   N   N
3   MUL2  N
4   ADD   Y     ADD  F6   F8   F2   -   -   N   N
5   DIV   Y     DIV  F10  F0   F6   2   -   N   Y

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   2   -   -   4   -   5

Page 13: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   N
2   MUL1  Y     MUL  F0   F2   F4   -   -   N   N
3   MUL2  N
4   ADD   Y     ADD  F6   F8   F2   -   -   N   N
5   DIV   Y     DIV  F10  F0   F6   2   -   N   Y

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   2   -   -   4   -   5

Page 14: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   N
2   MUL1  N
3   MUL2  N
4   ADD   Y     ADD  F6   F8   F2   -   -   N   N
5   DIV   Y     DIV  F10  F0   F6   -   -   Y   Y

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   -   -   -   4   -   5

Page 15: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   N
2   MUL1  N
3   MUL2  N
4   ADD   Y     ADD  F6   F8   F2   -   -   N   N
5   DIV   Y     DIV  F10  F0   F6   -   -   N   N

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   -   -   -   4   -   5

Page 16: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

No  NAME  BUSY  OP   Fi   Fj   Fk   Qj  Qk  Rj  Rk
1   INT   N
2   MUL1  N
3   MUL2  N
4   ADD   N
5   DIV   Y     DIV  F10  F0   F6   -   -   N   N

RF

        F0  F2  F4  F6  F8  F10  F12  F14
FU No   -   -   -   -   -   5
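
The bookkeeping traced through the functional-unit and RF tables of Illustration 1 can be sketched roughly as below. This is a simplified model under stated assumptions, not the CDC 6600 control logic: the function names and dictionary layout are hypothetical, the load's address register is placed in Fk, the WAR check before writing a result is omitted, and Rj/Rk are not reset after operands are read.

```python
def issue(fu_status, reg_status, unit, op, fi, fj, fk):
    """Issue `op` to functional unit `unit`; return False on a structural or WAW stall."""
    if fu_status.get(unit, {}).get("busy") or fi in reg_status:
        return False
    qj = reg_status.get(fj)  # unit number that will produce Fj, or None
    qk = reg_status.get(fk)
    fu_status[unit] = {
        "busy": True, "op": op, "Fi": fi, "Fj": fj, "Fk": fk,
        "Qj": qj, "Qk": qk, "Rj": qj is None, "Rk": qk is None,
    }
    reg_status[fi] = unit  # RF row: Fi will be written by this unit
    return True


def write_result(fu_status, reg_status, unit):
    """Free `unit` and mark every operand that was waiting on it as ready."""
    fi = fu_status[unit]["Fi"]
    for entry in fu_status.values():
        if entry.get("busy"):
            if entry["Qj"] == unit:
                entry["Qj"], entry["Rj"] = None, True
            if entry["Qk"] == unit:
                entry["Qk"], entry["Rk"] = None, True
    fu_status[unit] = {"busy": False}
    if reg_status.get(fi) == unit:
        del reg_status[fi]


# State after the first load has completed and the next four instructions issue.
fu, regs = {}, {}
issue(fu, regs, 1, "LF", "F2", None, "R3")
issue(fu, regs, 2, "MUL", "F0", "F2", "F4")   # Qj = 1: waits for the load
issue(fu, regs, 4, "SUB", "F8", "F6", "F2")   # Qk = 1
issue(fu, regs, 5, "DIV", "F10", "F0", "F6")  # Qj = 2: waits for MUL1
```

In this sketch ADD F6,F8,F2 cannot issue while unit 4 is still busy with SUB, which matches the order of events in the tables.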

Page 17: CSL718 : Superscalar Processors

Illustration 2

IBM 360/91 - Tomasulo’s scheme

• Issue bound fetch

• FUs : LOAD, STORE, 3 x ADD/SUB, 2 x MUL/DIV

• Group RS’s with 1 slot per FU

• 1 RF

• In order issue, out of order execution

Page 18: CSL718 : Superscalar Processors

Checking in issue bound fetch

[Figure: block diagram. Blocks: Register File; Reservation station (OC, Os1/Is1, Vs1, Os2/Is2, Vs2, Rd); EU. Labels: decoded instruction; Rs1, Rs2, Rd; reset V bit of Rd; Os1, Os2 (operand value); check Vs1, Vs2; OC, Os1, Os2, Rd; result, Rd; update Rd, set V bit; associative update of Is1, Is2 with Rd, set Vs bits.]
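
The issue step and the associative update named in this figure can be sketched as below. This is a simplified illustration, not the IBM 360/91 hardware: the dictionary layout and the names `issue` and `broadcast` are hypothetical, operand values are made up, and the load buffers LD1/LD2 are modelled only as tags.

```python
def issue(stations, reg_tag, reg_val, name, op, rd, rs1, rs2):
    """Fill reservation station `name`; each source gets a value (V) or a producer tag (Q)."""
    entry = {"busy": True, "op": op, "Vj": None, "Vk": None, "Qj": None, "Qk": None}
    for v, q, src in (("Vj", "Qj", rs1), ("Vk", "Qk", rs2)):
        if src in reg_tag:
            entry[q] = reg_tag[src]  # operand is still being produced
        else:
            entry[v] = reg_val[src]  # operand value is available now
    stations[name] = entry
    reg_tag[rd] = name               # Rd will be written by this station


def broadcast(stations, reg_tag, reg_val, tag, result):
    """Associative update: every waiting slot and register matching `tag` takes `result`."""
    for entry in stations.values():
        for v, q in (("Vj", "Qj"), ("Vk", "Qk")):
            if entry.get(q) == tag:
                entry[v], entry[q] = result, None
    for reg in [r for r, t in reg_tag.items() if t == tag]:
        reg_val[reg] = result
        del reg_tag[reg]
    if tag in stations:
        stations[tag] = {"busy": False}


# First load done (F6 holds a value), second load still pending with tag LD2.
stations, reg_tag = {}, {"F2": "LD2"}
reg_val = {"F4": 2.0, "F6": 10.0}    # hypothetical operand values
issue(stations, reg_tag, reg_val, "MUL1", "MUL", "F0", "F2", "F4")
issue(stations, reg_tag, reg_val, "ADD1", "SUB", "F8", "F6", "F2")
issue(stations, reg_tag, reg_val, "MUL2", "DIV", "F10", "F0", "F6")
issue(stations, reg_tag, reg_val, "ADD2", "ADD", "F6", "F8", "F2")
broadcast(stations, reg_tag, reg_val, "LD2", 4.0)
```

Because DIV copies the value of F6 when it issues, the later ADD can take over F6 without waiting for DIV to read it.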

Page 19: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

NAME  BUSY  OP   Vj      Vk      Qj    Qk
ADD1
ADD2
ADD3
MUL1
MUL2

RF

      F0    F2    F4    F6    F8    F10   F12   F14
Qi

Page 20: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

NAME  BUSY  OP   Vj      Vk      Qj    Qk
ADD1  Y     SUB
ADD2  Y     ADD
ADD3  N
MUL1  Y     MUL
MUL2  Y     DIV

RF

      F0    F2    F4    F6    F8    F10   F12   F14
Qi

Page 21: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

NAME  BUSY  OP   Vj      Vk      Qj    Qk
ADD1  Y     SUB  (LD1)   -       -     LD2
ADD2  Y     ADD  -       -       ADD1  LD2
ADD3  N
MUL1  Y     MUL  -       (F4)    LD2
MUL2  Y     DIV  -       (LD1)   MUL1

RF

      F0    F2    F4    F6    F8    F10   F12   F14
Qi    MUL1  LD2   -     ADD2  ADD1  MUL2

Page 22: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

NAME  BUSY  OP   Vj      Vk      Qj    Qk
ADD1  Y     SUB  (LD1)   (LD2)
ADD2  Y     ADD  -       (LD2)   ADD1
ADD3  N
MUL1  Y     MUL  (LD2)   (F4)
MUL2  Y     DIV  -       (LD1)   MUL1

RF

      F0    F2    F4    F6    F8    F10   F12   F14
Qi    MUL1  -     -     ADD2  ADD1  MUL2

Page 23: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

NAME  BUSY  OP   Vj      Vk      Qj    Qk
ADD1  N
ADD2  Y     ADD  (ADD1)  (LD2)
ADD3  N
MUL1  Y     MUL  (LD2)   (F4)
MUL2  Y     DIV  -       (LD1)   MUL1

RF

      F0    F2    F4    F6    F8    F10   F12   F14
Qi    MUL1  -     -     ADD2  -     MUL2

Page 24: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

NAME  BUSY  OP   Vj      Vk      Qj    Qk
ADD1  N
ADD2  Y     ADD  (ADD1)  (LD2)
ADD3  N
MUL1  Y     MUL  (LD2)   (F4)
MUL2  Y     DIV  -       (LD1)   MUL1

RF

      F0    F2    F4    F6    F8    F10   F12   F14
Qi    MUL1  -     -     ADD2  -     MUL2

Page 25: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

NAME  BUSY  OP   Vj      Vk      Qj    Qk
ADD1  N
ADD2  N
ADD3  N
MUL1  Y     MUL  (LD2)   (F4)
MUL2  Y     DIV  -       (LD1)   MUL1

RF

      F0    F2    F4    F6    F8    F10   F12   F14
Qi    MUL1  -     -     -     -     MUL2

Page 26: CSL718 : Superscalar Processors

Instruction status

INSTRUCTION      ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)
LF F2, 45(R3)
MUL F0,F2,F4
SUB F8,F6,F2
DIV F10,F0,F6
ADD F6,F8,F2

Functional units

NAME  BUSY  OP   Vj      Vk      Qj    Qk
ADD1  N
ADD2  N
ADD3  N
MUL1  N
MUL2  Y     DIV  (MUL1)  (LD1)

RF

      F0    F2    F4    F6    F8    F10   F12   F14
Qi    -     -     -     -     -     MUL2

Page 27: CSL718 : Superscalar Processors

End of Illustration

Ref: Hennessy & Patterson’s Book [Ch. 4]

Page 28: CSL718 : Superscalar Processors

RAW, WAR and WAW (in Static Pipeline)

[Figure: three pairs of pipeline timing rows (stages IF D RF EX WB, plus extra EX stages in one row), labelled RAW, WAR and WAW.]

Page 29: CSL718 : Superscalar Processors

RAW, WAR and WAW (in Superscalar)

[Figure: three instructions with stages IF IS DP EX WB, marked write, read and write; the dependencies between them are labelled RAW, WAR and WAW.]
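
As a small illustration of the three dependency types, the sketch below classifies the dependencies of a later instruction on an earlier one. The tuple encoding `(dest, src1, src2)` and the function name `dependencies` are assumptions made for this example.

```python
def dependencies(earlier, later):
    """Return the data dependencies of `later` on `earlier`.

    Each instruction is a (dest, src1, src2) tuple of register names.
    """
    e_dest, *e_srcs = earlier
    l_dest, *l_srcs = later
    found = []
    if e_dest in l_srcs:
        found.append("RAW")  # later reads what earlier writes
    if l_dest in e_srcs:
        found.append("WAR")  # later writes what earlier reads
    if l_dest == e_dest:
        found.append("WAW")  # both write the same register
    return found


# Instructions from the illustrations, encoded as (dest, src1, src2).
MUL = ("F0", "F2", "F4")
SUB = ("F8", "F6", "F2")
DIV = ("F10", "F0", "F6")
ADD = ("F6", "F8", "F2")

print(dependencies(MUL, DIV))  # ['RAW']        (F0)
print(dependencies(DIV, ADD))  # ['WAR']        (F6)
print(dependencies(SUB, ADD))  # ['RAW', 'WAR'] (F8, F6)
```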

Page 30: CSL718 : Superscalar Processors

Implementation using scoreboard bit

[Figure: three instructions with stages IF IS DP EX WB, marked write, read and write; dependencies labelled RAW, WAR and WAW; scoreboard bit annotations "b 0", "b 1", "b 0".]

Page 31: CSL718 : Superscalar Processors

CDC 6600 like Implementation

[Figure: three instructions with stages IF IS DP EX WB, marked write, read and write; dependencies labelled RAW, WAR and WAW; scoreboard bit annotations "b 0", "b 1", "b 0".]

Page 32: CSL718 : Superscalar Processors

IBM 360 like Implementation

[Figure: three instructions with stages IF IS DP EX WB, marked write, read and write; dependencies labelled RAW, WAR and WAW; scoreboard bit annotations "b 0", "b 1", "b 0".]

Page 33: CSL718 : Superscalar Processors

Use of Renaming

[Figure: three instructions with stages IF IS DP EX WB, marked write, read and write; dependencies labelled RAW, WAR and WAW.]
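
One common way to realise renaming is to give every destination a fresh register name, so that only RAW dependencies remain. The sketch below is an illustration under that assumption, not a method taken from the slides; the physical register names `P0`, `P1`, ... and the function `rename` are hypothetical.

```python
from itertools import count


def rename(program):
    """Give every destination a fresh physical register; sources read the latest mapping."""
    fresh = (f"P{i}" for i in count())
    mapping = {}
    renamed = []
    for op, dest, *srcs in program:
        new_srcs = [mapping.get(s, s) for s in srcs]  # read current names first
        mapping[dest] = next(fresh)                   # then allocate a new name for dest
        renamed.append((op, mapping[dest], *new_srcs))
    return renamed


program = [
    ("MUL", "F0", "F2", "F4"),
    ("SUB", "F8", "F6", "F2"),
    ("DIV", "F10", "F0", "F6"),
    ("ADD", "F6", "F8", "F2"),
]
for instr in rename(program):
    print(instr)
# ('MUL', 'P0', 'F2', 'F4')
# ('SUB', 'P1', 'F6', 'F2')
# ('DIV', 'P2', 'P0', 'F6')
# ('ADD', 'P3', 'P1', 'F2')
```

After renaming, ADD writes P3 while DIV still reads the old F6, so the WAR dependency between them disappears; the RAW dependencies through P0 and P1 are preserved.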