
CS 61C: Great Ideas in Computer Architecture

Lecture 13: Pipelining

John Wawrzynek & Nick Weaver
http://inst.eecs.berkeley.edu/~cs61c/sp18


Agenda
• RISC-V Pipeline
• Pipeline Control
• Hazards
− Structural
− Data
▪ R-type instructions
▪ Load
− Control
• Superscalar processors

Recap: Pipelining with RISC-V

[Figure: pipeline timing diagram for the instruction sequence below, marking t_cycle and t_instruction]

add t0, t1, t2
or t3, t4, t5
sll t6, t0, t3

                                 Single Cycle                   Pipelining
Timing                           t_step = 100…200 ps            t_cycle = 200 ps
                                 Register access only 100 ps    All cycles same length
Instruction time, t_instruction  = t_cycle = 800 ps             1000 ps
Clock rate, f_s                  1/800 ps = 1.25 GHz            1/200 ps = 5 GHz
Relative speed                   1x                             4x
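The arithmetic behind the comparison above can be checked with a short sketch. The variable names are illustrative and not from the slides; the only inputs are the 800 ps and 200 ps clock periods and the five pipeline stages.

```python
# Clock periods from the comparison above, in picoseconds.
SINGLE_CYCLE_PERIOD_PS = 800   # one whole instruction per clock
PIPELINED_PERIOD_PS = 200      # one pipeline stage per clock
NUM_STAGES = 5                 # F, D, X, M, W

# 1 / (period in ps) expressed in GHz is 1000 / period.
single_cycle_clock_ghz = 1000 / SINGLE_CYCLE_PERIOD_PS
pipelined_clock_ghz = 1000 / PIPELINED_PERIOD_PS

# Latency of a single instruction: it must pass through every stage.
pipelined_latency_ps = NUM_STAGES * PIPELINED_PERIOD_PS

# Steady-state throughput ratio: one instruction completes per clock in both designs.
relative_speed = SINGLE_CYCLE_PERIOD_PS / PIPELINED_PERIOD_PS

print(single_cycle_clock_ghz, pipelined_clock_ghz, pipelined_latency_ps, relative_speed)
```

Note that the latency of an individual instruction gets worse (1000 ps versus 800 ps); the gain is in throughput.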


RISC-V Pipeline

[Figure: pipeline diagram of the instruction sequence below, with t_cycle = 200 ps and t_instruction = 1000 ps. Annotations: resource use of instruction over time; resource use in a particular time slot]

add t0, t1, t2
or t3, t4, t5
slt t6, t0, t3
sw t0, 4(t3)
lw t0, 8(t3)
addi t2, t2, 1

Single-Cycle RISC-V RV32I Datapath

[Figure: single-cycle RV32I datapath with pc, +4 adder, IMEM, Reg[] (AddrA, AddrB, AddrD, DataA, DataB, DataD), Imm. Gen, Branch Comp., ALU, DMEM (Addr, DataW, DataR), and the write-back mux selecting among mem, alu, and pc+4. Control signals derived from inst[31:0]: ImmSel, RegWEn, BrUn, BrEq, BrLT, ASel, BSel, ALUSel, MemRW, WBSel, PCSel]

Pipelining RISC-V RV32I Datapath

[Figure: the single-cycle RV32I datapath divided into five stages]

• Instruction Fetch (F)
• Instruction Decode/Register Read (D)
• ALU Execute (X)
• Memory Access (M)
• Write Back (W)

Pipelined RISC-V RV32I Datapath

[Figure: pipelined RV32I datapath with pipeline registers pcF, pcD, pcX, pcM; instD, instX, instM, instW; rs1X, rs2X, rs2M; immX; aluX, aluM]

• Recalculate PC+4 in M stage to avoid sending both PC and PC+4 down pipeline
• Must pipeline instruction along with data, so control operates correctly in each stage

Each stage operates on different instruction

[Figure: pipelined RV32I datapath with one instruction of the sequence below in each stage]

add t0, t1, t2
or t3, t4, t5
slt t6, t0, t3
sw t0, 4(t3)
lw t0, 8(t3)

Pipeline registers separate stages, hold data for each instruction in flight
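To make "each stage operates on a different instruction" concrete, the following sketch tabulates which instruction occupies which stage in every clock cycle. It assumes an ideal five-stage pipeline with no stalls; the function name `pipeline_diagram` is made up for this illustration.

```python
STAGES = ["F", "D", "X", "M", "W"]

def pipeline_diagram(instructions):
    """Return one dict per clock cycle mapping stage name -> instruction (or None)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            # Instruction number `idx` enters F in cycle `idx` and advances one stage per cycle.
            idx = cycle - depth
            row[stage] = instructions[idx] if 0 <= idx < len(instructions) else None
        rows.append(row)
    return rows

program = [
    "add t0, t1, t2",
    "or t3, t4, t5",
    "slt t6, t0, t3",
    "sw t0, 4(t3)",
    "lw t0, 8(t3)",
]
diagram = pipeline_diagram(program)
for cycle, row in enumerate(diagram):
    print(cycle, row)
```

In cycle 4 (counting from 0) the pipeline is full: all five instructions are in flight at once, one per stage.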

Clicker Question

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$

Pipelining the single-cycle processor can increase processor performance by:

     Instructions/program   Cycles/instruction   Time/cycle
A    decrease               decrease             same
B    same                   increase             decrease
C    same                   same                 decrease
D    increase               decrease             increase

Pipelined Control
• Control signals derived from instruction
− As in single-cycle implementation
− Information is stored in pipeline registers for use by later stages
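One way to picture control information travelling through the pipeline registers is the sketch below. It is a simplification under stated assumptions: the decode table is hypothetical and covers three instructions only, `RegWEn` is the signal named in the datapath, and the boolean `MemWrite` stands in for the `MemRW` signal.

```python
from collections import deque

# Hypothetical decode table: three instructions, two control signals.
CONTROL_TABLE = {
    "add": {"RegWEn": True, "MemWrite": False},
    "lw": {"RegWEn": True, "MemWrite": False},
    "sw": {"RegWEn": False, "MemWrite": True},
}

def run(program):
    """Record, for each clock cycle, the control bundle visible in the M and W stages."""
    # Four slots model the D, X, M, W pipeline registers (None = empty slot).
    regs = deque([None] * 4, maxlen=4)
    trace = []
    for mnemonic in list(program) + [None] * 4:  # extra cycles drain the pipeline
        bundle = None
        if mnemonic is not None:
            bundle = {"inst": mnemonic, **CONTROL_TABLE[mnemonic]}
        regs.appendleft(bundle)  # new bundle enters D, the others shift one stage
        trace.append({"M": regs[2], "W": regs[3]})
    return trace

trace = run(["add", "sw", "lw"])
for cycle, stages in enumerate(trace):
    print(cycle, stages)
```

Each stage consults only the bundle that arrived with its own instruction, so `sw` asserts its memory write in M during the same cycle in which `add` asserts its register write in W.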


Hazards Ahead


Structural Hazard
• Problem: Two or more instructions in the pipeline compete for access to a single physical resource
• Solution 1: Instructions take turns to use resource, some instructions have to stall
• Solution 2: Add more hardware to machine
• Can always solve a structural hazard by adding more hardware


Regfile Structural Hazards
• Each instruction:
− can read up to two operands in decode stage
− can write one value in writeback stage
• Avoid structural hazard by having separate “ports”
− two independent read ports and one independent write port
• Three accesses per cycle can happen simultaneously


Structural Hazard: Memory Access

[Figure: pipeline diagram of the instruction sequence below]

add t0, t1, t2
or t3, t4, t5
slt t6, t0, t3
sw t0, 4(t3)
lw t0, 8(t3)

• Instruction and data memory used simultaneously
✓ Use two separate memories


Instruction and Data Caches

[Figure: Processor (Control; Datapath with PC, Registers, Arithmetic & Logic Unit (ALU)) connected through an Instruction Cache and a Data Cache to Memory (DRAM), which holds Program and Data bytes]

Caches: small and fast “buffer” memories


Structural Hazards – Summary
• Conflict for use of a resource
• In RISC-V pipeline with a single memory
− Load/store requires data access
− Without separate memories, instruction fetch would have to stall for that cycle
▪ All other operations in pipeline would have to wait
• Pipelined datapaths require separate instruction/data memories
− Or separate instruction/data caches
• RISC ISAs (including RISC-V) designed to avoid structural hazards
− e.g. at most one memory access/instruction
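The single-memory conflict can be located mechanically. The sketch below assumes an ideal five-stage pipeline in which instruction number i is fetched in cycle i and reaches the M stage in cycle i + 3; the helper name `single_memory_conflicts` and the sample program are illustrative.

```python
MEMORY_OPS = {"lw", "sw"}  # instructions that access data memory in the M stage

def single_memory_conflicts(mnemonics):
    """List (cycle, instruction in M, instruction in F) for every fetch that collides with a data access."""
    conflicts = []
    for fetch_idx, fetched in enumerate(mnemonics):
        mem_idx = fetch_idx - 3  # the instruction three slots earlier is in M this cycle
        if mem_idx >= 0 and mnemonics[mem_idx] in MEMORY_OPS:
            conflicts.append((fetch_idx, mnemonics[mem_idx], fetched))
    return conflicts

conflicts = single_memory_conflicts(["lw", "add", "or", "slt", "sw"])
print(conflicts)
```

With one shared memory, the fetch of `slt` would have to wait for the data access of `lw`; with separate instruction and data memories (or caches) both proceed in the same cycle.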


Data Hazard: Register Access

[Figure: pipeline diagram of the instruction sequence below]

add t0, t1, t2
or t3, t4, t5
slt t6, t4, t3
sw t0, 4(t3)
lw t0, 8(t3)

• Separate ports, but what if a register is written and read in the same cycle?
• Does sw in the example fetch the old or new value?


Register Access Policy

[Figure: pipeline diagram of the instruction sequence below]

add t0, t1, t2
or t3, t4, t5
slt t6, t4, t3
sw t0, 4(t3)
lw t0, 8(t3)

• Exploit high speed of register file (100 ps)
1) WB updates value
2) ID reads new value
• Indicated in diagram by shading

Might not always be possible to write then read in same cycle, especially in high-frequency designs.
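The write-then-read policy can be modelled as a register file whose per-cycle method performs the write-back before servicing the two reads. This is a behavioural sketch only; the class and parameter names are invented for the example, and real hardware achieves the same effect with timing inside the clock cycle.

```python
class RegFile:
    """32-entry register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, write_addr, write_data, write_enable, read_addr_a, read_addr_b):
        """Model one clock cycle: the W-stage write happens first, then the D-stage reads."""
        if write_enable and write_addr != 0:  # x0 always reads as zero, so writes to it are dropped
            self.regs[write_addr] = write_data
        return self.regs[read_addr_a], self.regs[read_addr_b]

rf = RegFile()
rf.regs[5] = 5  # t0 is x5; it holds an old value

# In one cycle: write 9 to x5 from the W stage while the D stage reads x5 and x0.
data_a, data_b = rf.cycle(write_addr=5, write_data=9, write_enable=True,
                          read_addr_a=5, read_addr_b=0)
print(data_a, data_b)
```

The read of x5 returns the new value, which is why an instruction in decode can safely consume a result that is being written back in the same cycle.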


Data Hazard: ALU Result

[Figure: pipeline diagram of the instruction sequence below]

add s0, t0, t1
sub t2, s0, t0
or t6, s0, t3
xor t5, t1, s0
sw s0, 8(t3)

Value of s0 by cycle: 5 5 5 5 5/9 9 9 9 9

s0 holds “5”, then the add instruction changes s0 to “9”.

Without some fix, sub and or will calculate wrong result!

Solution 1: Stalling
• Problem: Instruction depends on result from previous instruction
− add s0, t0, t1
  sub t2, s0, t3
• Bubble: stall dependent instruction
− effectively NOP: affected pipeline stages do “nothing”

Stalls and Performance
• Stalls reduce performance
− But stalls are required to get correct results
• Compiler could try to arrange code to avoid hazards and stalls
− Requires knowledge of the pipeline structure
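How many bubbles a dependent instruction needs can be worked out from the stage timing. The sketch assumes the five-stage pipeline of this lecture, no forwarding, and a register file that writes before it reads within a cycle; the function name is illustrative.

```python
def stalls_without_forwarding(distance):
    """Bubbles needed when a consumer follows its producer by `distance` instructions.

    The producer writes the register file in W, four cycles after its fetch.
    The consumer reads in D, one cycle after its own fetch, and may share
    that cycle with the producer's W stage (write happens before read).
    """
    if distance < 1:
        raise ValueError("consumer must come after the producer")
    return max(0, 3 - distance)

for distance in (1, 2, 3, 4):
    print(distance, stalls_without_forwarding(distance))
```

An instruction immediately after its producer needs two bubbles and the next one needs one; from the third instruction on, the register file alone delivers the new value.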

Solution 2: Forwarding

[Figure: pipeline diagram of the instruction sequence below, with forwarding paths]

add t0, t1, t2
or t3, t0, t5
sub t6, t0, t3
xor t5, t1, t0
sw t0, 8(t3)

Value of t0 by cycle: 5 5 5 5 5/9 9 9 9 9

Forwarding: grab operand from pipeline stage, rather than register file

Forwarding (aka Bypassing)
• Use result when it is computed
− Don’t wait for it to be stored in a register
− Requires extra connections in the datapath

1) Detect Need for Forwarding (example)

[Figure: pipeline diagram of the instruction sequence below, comparing instX.rd with instD.rs1]

add t0, t1, t2
or t3, t0, t5
sub t6, t0, t3

Compare destination of older instructions in pipeline with sources of new instruction in decode stage. Must ignore writes to x0!

Example Forwarding Path

[Figure: pipelined RV32I datapath with a forwarding path into the ALU input, selected by Forwarding Control Logic]

Same idea extends to rs2, and to instruction instD, instM pairing
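The comparison performed by the forwarding control logic might be sketched as follows. The `Inst` record and function names are assumptions made for this example; register numbers follow the standard RISC-V naming (t0 = x5, t1 = x6, t2 = x7, t3 = x28, t5 = x30, t6 = x31).

```python
from typing import NamedTuple, Optional

class Inst(NamedTuple):
    name: str
    rd: Optional[int]   # destination register number, None if the instruction writes none
    rs1: Optional[int]
    rs2: Optional[int]

def writes_register(older, source_reg):
    """True if `older` will write the register that is being read (x0 never counts)."""
    return older is not None and older.rd is not None and older.rd != 0 and older.rd == source_reg

def forwarding_select(inst_d, inst_x, inst_m):
    """Choose, for rs1 and rs2 of the instruction in D, where the operand should come from."""
    select = {}
    for field in ("rs1", "rs2"):
        src = getattr(inst_d, field)
        if writes_register(inst_x, src):    # youngest producer takes priority
            select[field] = "X"
        elif writes_register(inst_m, src):
            select[field] = "M"
        else:
            select[field] = "regfile"
    return select

add = Inst("add t0, t1, t2", rd=5, rs1=6, rs2=7)
or_ = Inst("or t3, t0, t5", rd=28, rs1=5, rs2=30)
sub = Inst("sub t6, t0, t3", rd=31, rs1=5, rs2=28)

print(forwarding_select(or_, add, None))   # or in D, add in X
print(forwarding_select(sub, or_, add))    # sub in D, or in X, add in M
```

Checking the X stage before the M stage matters when both older instructions write the same register: the more recent result is the one the program expects.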

Administrivia
• Project 3.1 still due next Wednesday (3/7)
• Homework 2 due Friday (11:59 PM)
• Project party on both Monday (8-10, Cory 293) and Wednesday (7-10, Cory 293)
• Guerrilla session tonight 7-9 pm, Barrows 20!
• Midterm 2, March 20, is moved to 8-10 PM (was 7-9 on the website)
• Alternative exam earlier, 6-8 PM (so people don’t need to be in exams until midnight :)
• Submit the exam conflict form if you haven’t already


Load Data Hazard

[Figure: pipeline diagram of a load followed by dependent instructions. Annotations: 1 cycle stall unavoidable; forward; unaffected]


Stall Pipeline

[Figure: pipeline diagram with a stall after the load. Annotations: Stall; repeat the “and” instruction and forward]


lw Data Hazard
• Slot after a load is called a load delay slot
− If that instruction uses the result of the load, then the hardware will stall for one cycle
− Equivalent to inserting an explicit nop in the slot
▪ except the latter uses more code space
− Performance loss!
• Idea:
− Put unrelated instruction into load delay slot
− No performance loss!


Code Scheduling to Avoid Stalls
• Reorder code to avoid use of load result in the next instruction!
• RISC-V code for D=A+B; E=A+C;

Original Order (13 cycles):
    lw  t1, 0(t0)
    lw  t2, 4(t0)
    add t3, t1, t2      Stall!
    sw  t3, 12(t0)
    lw  t4, 8(t0)
    add t5, t1, t4      Stall!
    sw  t5, 16(t0)

Alternative (11 cycles):
    lw  t1, 0(t0)
    lw  t2, 4(t0)
    lw  t4, 8(t0)
    add t3, t1, t2
    sw  t3, 12(t0)
    add t5, t1, t4
    sw  t5, 16(t0)
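The 13-cycle and 11-cycle totals can be reproduced with a small counter. The sketch assumes a five-stage pipeline with full forwarding, so the only stall is the one-cycle load-use stall; the tuple encoding of instructions (mnemonic, destination, source registers) is invented for the example.

```python
def count_cycles(program, num_stages=5):
    """Cycles to run `program`: pipeline fill + one per instruction + load-use stalls."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        # Load-use hazard: the instruction right after a load reads the loaded register.
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return num_stages + len(program) - 1 + stalls

original = [
    ("lw", "t1", ["t0"]),
    ("lw", "t2", ["t0"]),
    ("add", "t3", ["t1", "t2"]),
    ("sw", None, ["t3", "t0"]),
    ("lw", "t4", ["t0"]),
    ("add", "t5", ["t1", "t4"]),
    ("sw", None, ["t5", "t0"]),
]
alternative = [
    ("lw", "t1", ["t0"]),
    ("lw", "t2", ["t0"]),
    ("lw", "t4", ["t0"]),
    ("add", "t3", ["t1", "t2"]),
    ("sw", None, ["t3", "t0"]),
    ("add", "t5", ["t1", "t4"]),
    ("sw", None, ["t5", "t0"]),
]

print(count_cycles(original), count_cycles(alternative))
```

Moving the third load ahead of the first add fills both load delay slots with useful work, which removes the two stalls.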


Control Hazards

[Figure: pipeline diagram of the instruction sequence below. Annotations: sub and or are executed regardless of branch outcome; PC updated reflecting branch outcome]

beq t0, t1, label
sub t2, s0, t5
or t6, s0, t3
xor t5, t1, s0
sw s0, 8(t3)


Observation
• If branch not taken, then instructions fetched sequentially after branch are correct
• If branch or jump taken, then need to flush incorrect instructions from pipeline by converting to NOPs


Kill Instructions after Branch if Taken

[Figure: pipeline diagram of the instruction sequence below]

beq t0, t1, label      (taken branch)
sub t2, s0, t5         (convert to NOP)
or t6, s0, t3          (convert to NOP)
label: xxxxxx          (PC updated reflecting branch outcome)


Reducing Branch Penalties
• Every taken branch in simple pipeline costs 2 dead cycles
• To improve performance, use “branch prediction” to guess which way branch will go earlier in pipeline
• Only flush pipeline if branch prediction was incorrect
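The cost of the two dead cycles can be folded into an effective CPI. In the sketch below only the 2-cycle penalty comes from the slide; the branch frequency, taken rate, and misprediction rate are made-up illustrative inputs, not measurements.

```python
BRANCH_PENALTY_CYCLES = 2  # dead cycles per flushed branch in the simple pipeline

def cpi_flush_on_taken(base_cpi, branch_fraction, taken_fraction):
    """Effective CPI when every taken branch flushes the two following instructions."""
    return base_cpi + branch_fraction * taken_fraction * BRANCH_PENALTY_CYCLES

def cpi_with_prediction(base_cpi, branch_fraction, mispredict_fraction):
    """Effective CPI when the pipeline is flushed only on a wrong prediction."""
    return base_cpi + branch_fraction * mispredict_fraction * BRANCH_PENALTY_CYCLES

# Assumed workload: 20% of instructions are branches, half of them taken,
# and a hypothetical predictor that is wrong 10% of the time.
no_prediction = cpi_flush_on_taken(1.0, 0.20, 0.50)
with_prediction = cpi_with_prediction(1.0, 0.20, 0.10)
print(no_prediction, with_prediction)
```

Under these assumed numbers the penalty term shrinks in proportion to the misprediction rate, which is the motivation for predicting branches rather than always flushing on taken ones.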


Branch Prediction

[Figure: pipeline diagram of a taken branch, beq t0, t1, label, followed by the instructions at label. Annotations: Taken branch; Guess next PC!; Check guess correct]


Increasing Processor Performance
1. Clock rate
− Limited by technology and power dissipation
2. Pipelining
− “Overlap” instruction execution
− Deeper pipeline: 5 => 10 => 15 stages
▪ Less work per stage → shorter clock cycle
▪ But more potential for hazards (CPI > 1)
3. Multi-issue “super-scalar” processor
− Multiple execution units (ALUs)
▪ Several instructions executed simultaneously
▪ CPI < 1 (ideally)


Superscalar Processor

[Figure: superscalar processor, from P&H p. 340]


Benchmark: CPI of Intel Core i7

[Figure: benchmark CPI of the Intel Core i7, with CPI = 1 marked, from P&H p. 350]


In Conclusion
• Pipelining increases throughput by overlapping execution of multiple instructions
• All pipeline stages have same duration
− Choose partition that accommodates this constraint
• Hazards potentially limit performance
− Maximizing performance requires programmer/compiler assistance
− E.g. Load and Branch delay slots
• Superscalar processors use multiple execution units for additional instruction level parallelism
− Performance benefit highly code dependent


Extra Slides


Pipelining and ISA Design
• RISC-V ISA designed for pipelining
− All instructions are 32-bits
▪ Easy to fetch and decode in one cycle
▪ Versus x86: 1- to 15-byte instructions
− Few and regular instruction formats
▪ Decode and read registers in one step
− Load/store addressing
▪ Calculate address in 3rd stage, access memory in 4th stage
− Alignment of memory operands
▪ Memory access takes only one cycle


Superscalar Processor
• Multiple issue “superscalar”
− Replicate pipeline stages ⇒ multiple pipelines
− Start multiple instructions per clock cycle
− CPI < 1, so use Instructions Per Cycle (IPC)
− E.g., 4 GHz 4-way multiple-issue
▪ 16 BIPS, peak CPI = 0.25, peak IPC = 4
− Dependencies reduce this in practice
• “Out-of-Order” execution
− Reorder instructions dynamically in hardware to reduce impact of hazards
• CS152 discusses these techniques!
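The peak figures for the 4 GHz, 4-way example follow directly from the clock rate and issue width, as this short sketch shows. Variable names are illustrative; these are peak values that assume every issue slot is filled in every cycle.

```python
clock_rate_ghz = 4   # billions of clock cycles per second
issue_width = 4      # instructions that can start in each clock cycle

peak_ipc = issue_width                     # instructions per cycle
peak_cpi = 1 / peak_ipc                    # cycles per instruction
peak_bips = clock_rate_ghz * peak_ipc      # billions of instructions per second

print(peak_ipc, peak_cpi, peak_bips)
```

Dependencies between instructions leave some issue slots empty, so the IPC achieved in practice is below this peak.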
