10/3/17
CS 61C: Great Ideas in Computer Architecture
Lecture 11: RISC-V Processor Datapath
Krste Asanović & Randy Katz
http://inst.eecs.berkeley.edu/~cs61c/fa17
Recap: Complete RV32I ISA
[Figure: table of the RV32I instructions; some entries are marked "Not in CS61C"]
State Required by RV32I ISA
Each instruction reads and updates this state during execution:
• Registers (x0..x31)
  − Register file (or regfile) Reg holds 32 registers x 32 bits/register: Reg[0]..Reg[31]
  − First register read specified by rs1 field in instruction
  − Second register read specified by rs2 field in instruction
  − Write register (destination) specified by rd field in instruction
  − x0 is always 0 (writes to Reg[0] are ignored)
• Program Counter (PC)
  − Holds address of current instruction
• Memory (MEM)
  − Holds both instructions & data, in one 32-bit byte-addressed memory space
  − We'll use separate memories for instructions (IMEM) and data (DMEM)
    § Later we'll replace these with instruction and data caches
  − Instructions are read (fetched) from instruction memory (assume IMEM read-only)
  − Load/store instructions access data memory
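As a rough software model of the register state described above, the sketch below keeps 32 registers of 32 bits each and ignores writes to Reg[0]. The class name `RegFile`, the `MASK32` constant and the `reg_w_en` argument are illustrative assumptions, not names from the slides.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


class RegFile:
    """Hypothetical model of Reg[0]..Reg[31]; x0 always reads as 0."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, index):
        return self.regs[index]

    def write(self, index, value, reg_w_en=True):
        # Writes to Reg[0] are ignored, as are writes with the enable low.
        if reg_w_en and index != 0:
            self.regs[index] = value & MASK32


rf = RegFile()
rf.write(5, 0x1_0000_0001)  # truncated to 32 bits, so Reg[5] becomes 1
rf.write(0, 123)            # ignored: x0 stays 0
print(rf.read(5), rf.read(0))
```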
One-Instruction-Per-Cycle RISC-V Machine
• On every tick of the clock, the computer executes one instruction
• Current state outputs drive the inputs to the combinational logic, whose outputs settle at the values of the next state before the next clock edge
• At the rising clock edge, all the state elements are updated with the combinational logic outputs, and execution moves to the next clock cycle
[Figure: state elements pc, Reg[], IMEM and DMEM connected through a block of combinational logic, all driven by a common clock]
Basic Phases of Instruction Execution
1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Register Write
[Figure: PC, +4, IMEM, Reg[] (rs1, rs2, rd), imm, ALU, DMEM and mux arranged under the five phases, with clock time on the horizontal axis]
Implementing the add instruction
add rd, rs1, rs2
• Instruction makes two changes to machine's state:
  − Reg[rd] = Reg[rs1] + Reg[rs2]
  − PC = PC + 4
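The two state changes made by add can be sketched in a few lines of Python. This is an illustration only: the helper names `fields` and `execute_add` are made up here, the register file is a plain list, and the field positions (rd = inst[11:7], rs1 = inst[19:15], rs2 = inst[24:20]) are the standard R-format ones.

```python
MASK32 = 0xFFFFFFFF


def fields(inst):
    """Extract the register specifiers of an R-format instruction."""
    rd = (inst >> 7) & 0x1F    # inst[11:7]
    rs1 = (inst >> 15) & 0x1F  # inst[19:15]
    rs2 = (inst >> 20) & 0x1F  # inst[24:20]
    return rd, rs1, rs2


def execute_add(inst, reg, pc):
    """Apply both state changes of add and return the next PC."""
    rd, rs1, rs2 = fields(inst)
    result = (reg[rs1] + reg[rs2]) & MASK32
    if rd != 0:                # x0 is hardwired to zero
        reg[rd] = result
    return pc + 4


reg = [0] * 32
reg[2], reg[3] = 7, 5
ADD_X1_X2_X3 = 0x003100B3      # encoding of: add x1, x2, x3
next_pc = execute_add(ADD_X1_X2_X3, reg, 1000)
print(reg[1], next_pc)
```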
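Datapath for add
[Figure: pc feeds IMEM and a +4 adder that produces pc+4; inst[19:15] and inst[24:20] drive AddrA and AddrB of Reg[], inst[11:7] drives AddrD; DataA (Reg[rs1]) and DataB (Reg[rs2]) feed an adder whose output alu returns to DataD; Control Logic reads inst[31:0] and drives RegWriteEnable (RegWEn)]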
Timing Diagram for add
Signal        Cycle 1          Cycle 2
PC            1000             1004
PC+4          1004             1008
inst[31:0]    add x1,x2,x3     add x6,x7,x9
Reg[rs1]      Reg[2]           Reg[7]
Reg[rs2]      Reg[3]           Reg[9]
alu           Reg[2]+Reg[3]    Reg[7]+Reg[9]
Reg[1]        ???              Reg[2]+Reg[3]
Implementing the sub instruction
sub rd, rs1, rs2
• Almost the same as add, except now have to subtract operands instead of adding them
• inst[30] selects between add and subtract
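A minimal sketch of how inst[30] can steer the ALU between add and subtract. The function names are hypothetical; the 0 = add, 1 = subtract encoding follows the ALUSel labeling used on the add/sub datapath slide, and results wrap to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def alu(a, b, alu_sel):
    """alu_sel: 0 = add, 1 = subtract; result wraps to 32 bits."""
    if alu_sel == 1:
        return (a - b) & MASK32
    return (a + b) & MASK32


def alu_sel_for_add_sub(inst):
    """For add/sub, the select is simply bit 30 of the instruction."""
    return (inst >> 30) & 1


ADD = 0x003100B3  # add x1, x2, x3
SUB = 0x403100B3  # sub x1, x2, x3 (same encoding with inst[30] set)
print(alu(7, 5, alu_sel_for_add_sub(ADD)))
print(alu(7, 5, alu_sel_for_add_sub(SUB)))
print(hex(alu(5, 7, 1)))  # a negative result wraps around
```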
Datapath for add/sub
[Figure: same as the add datapath, with the adder replaced by an ALU; Control Logic drives RegWEn (1=write, 0=no write) and ALUSel (Add=0/Sub=1)]
Implementing other R-Format instructions
• All implemented by decoding funct3 and funct7 fields and selecting appropriate ALU function
Implementing the addi instruction
• RISC-V Assembly Instruction: addi x15,x1,-50
imm[11:0]      rs1      funct3   rd       opcode
111111001110   00001    000      01111    0010011
imm=-50        rs1=1    ADD      rd=15    OP-Imm
Adding addi to datapath
[Figure: add/sub datapath plus an Imm. Gen block that turns inst[31:20] into imm[31:0], and a mux on the ALU B input choosing Reg[rs2] (0) or imm[31:0] (1)]
Control settings for addi: ImmSel=I, RegWEn=1, BSel=1, ALUSel=Add
I-Format immediates
[Figure: Imm. Gen with ImmSel=I takes inst[31:20] and produces imm[31:0]; the upper bits are filled with inst[31] (sign-extension) and the low bits come from inst[30:20]]
• High 12 bits of instruction (inst[31:20]) copied to low 12 bits of immediate (imm[11:0])
• Immediate is sign-extended by copying value of inst[31] to fill the upper 20 bits of the immediate value (imm[31:12])
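The two bullets above translate almost directly into code. The sketch below is an illustration with made-up names (`imm_gen_i`, `to_signed32`); it is checked against the addi x15,x1,-50 example, whose 32-bit encoding is 0xFCE08793.

```python
def imm_gen_i(inst):
    """Build imm[31:0] for an I-format instruction."""
    imm = (inst >> 20) & 0xFFF  # inst[31:20] -> imm[11:0]
    if (inst >> 31) & 1:        # copy inst[31] into imm[31:12]
        imm |= 0xFFFFF000
    return imm


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


ADDI_X15_X1_M50 = 0xFCE08793    # addi x15, x1, -50
imm = imm_gen_i(ADDI_X15_X1_M50)
print(hex(imm), to_signed32(imm))
```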
Adding addi to datapath
• Also works for all other I-format arithmetic instructions (slti, sltiu, andi, ori, xori, slli, srli, srai) just by changing ALUSel
TSMC Announces 3nm CMOS Fab
Latest Apple iPhone 8, iPhone X use TSMC's 10nm process technology.
3nm technology should allow 10x more stuff on the same sized chip: (10/3)^2
The new manufacturing plant will occupy nearly 200 acres and cost around $15B, open in around 5 years (~2022).
Currently, fabs use 193nm light to expose masks. For 3nm, some layers will use Extreme Ultra-Violet (13.5nm).
Break!
Implementing Load Word instruction
• RISC-V Assembly Instruction: lw x14, 8(x2)
imm[11:0]      rs1      funct3   rd       opcode
000000001000   00010    010      01110    0000011
imm=+8         rs1=2    LW       rd=14    LOAD
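To make the example concrete, here is a hedged sketch of what executing lw x14, 8(x2) does to the state: the ALU adds Reg[rs1] and the sign-extended immediate to form the address, and the word read from data memory is written to Reg[rd]. The function name, the `bytearray` model of DMEM and the little-endian byte order are assumptions of this sketch.

```python
MASK32 = 0xFFFFFFFF


def execute_lw(inst, reg, dmem, pc):
    """Model lw: Reg[rd] = DMEM[Reg[rs1] + imm]; returns the next PC."""
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    imm = (inst >> 20) & 0xFFF
    if imm & 0x800:                      # sign-extend the 12-bit immediate
        imm -= 0x1000
    addr = (reg[rs1] + imm) & MASK32     # ALU computes the address
    word = int.from_bytes(dmem[addr:addr + 4], "little")
    if rd != 0:
        reg[rd] = word                   # write-back comes from memory
    return pc + 4


reg = [0] * 32
reg[2] = 16
dmem = bytearray(64)                     # hypothetical 64-byte data memory
dmem[24:28] = (0xDEADBEEF).to_bytes(4, "little")
LW_X14_8_X2 = 0x00812703                 # lw x14, 8(x2)
next_pc = execute_lw(LW_X14_8_X2, reg, dmem, 0x100)
print(hex(reg[14]), hex(next_pc))
```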
Adding lw to datapath
[Figure: addi datapath plus DMEM; alu drives the DMEM Addr input and DataR returns mem; a write-back mux selected by WBSel chooses mem (0) or alu (1) as wb into DataD; Control Logic reads inst[31:0] and drives ImmSel, RegWEn, BSel, ALUSel, MemRW, WBSel]
Control settings for lw: ImmSel=I, RegWEn=1, BSel=1, ALUSel=Add, MemRW=Read, WBSel=0
All RV32 Load Instructions
[Figure: table of the RV32 load instruction encodings]
• funct3 field encodes size and signedness of load data
• Supporting the narrower loads requires additional circuits to extract the correct byte/halfword from the value loaded from memory, and sign- or zero-extend the result to 32 bits before writing back to register file.
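The extraction and extension step for narrower loads might look like the sketch below. It assumes the standard RV32I funct3 values (LB=000, LH=001, LW=010, LBU=100, LHU=101), little-endian byte order, and naturally aligned accesses; the function name `extend_load` is hypothetical.

```python
def extend_load(word, byte_offset, funct3):
    """Pick a byte/halfword/word out of a 32-bit memory word and extend it.

    word        -- 32-bit value read from memory
    byte_offset -- low two address bits (which byte the access starts at)
    funct3      -- load type: low two bits give the size, bit 2 means unsigned
    """
    width = funct3 & 0b011            # 0 = byte, 1 = halfword, 2 = word
    unsigned = (funct3 >> 2) & 1
    bits = 8 << width                 # 8, 16 or 32
    value = (word >> (8 * byte_offset)) & ((1 << bits) - 1)
    if not unsigned and value >> (bits - 1):
        # Sign-extend: fill every bit above the loaded field with ones.
        value |= (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return value


word = 0x1234ABCD
print(hex(extend_load(word, 0, 0b000)))  # lb
print(hex(extend_load(word, 0, 0b100)))  # lbu
print(hex(extend_load(word, 0, 0b001)))  # lh
print(hex(extend_load(word, 2, 0b101)))  # lhu
```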
Implementing Store Word instruction
• RISC-V Assembly Instruction: sw x14, 8(x2)
imm[11:5]        rs2      rs1      funct3   imm[4:0]        opcode
0000000          01110    00010    010      01000           0100011
offset[11:5]=0   rs2=14   rs1=2    SW       offset[4:0]=8   STORE
combined 12-bit offset = 8: 0000000 01000
Adding sw to datapath
[Figure: lw datapath with Reg[rs2] (DataB) also wired to the DMEM DataW input, and Imm. Gen now fed by inst[31:7]]
Control settings for sw: ImmSel=S, RegWEn=0, BSel=1, ALUSel=Add, MemRW=Write, WBSel=*
* = "Don't Care"
I & S Immediate Generator
Instruction formats, inst[31:0]:
     31       25 24   20 19   15 14    12 11       7 6        0
I:   imm[11:0]           rs1     funct3   rd         I-opcode
S:   imm[11:5]   rs2     rs1     funct3   imm[4:0]   S-opcode
Immediate produced, imm[31:0]:
     31                       11 10         5 4           0
I:   inst[31] (sign-extension)   inst[30:25]  inst[24:20]
S:   inst[31] (sign-extension)   inst[30:25]  inst[11:7]
• Just need a 5-bit mux to select between two positions where low five bits of immediate can reside in instruction
• Other bits in immediate are wired to fixed positions in instruction
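A sketch of the combined I/S immediate generator, written so that the single 5-bit mux is the only place where the two formats differ. The function name and the string values used for `imm_sel` are illustrative assumptions; the test words are the lw, sw and addi encodings used as examples in this lecture.

```python
def imm_gen(inst, imm_sel):
    """Produce imm[31:0] for imm_sel == "I" or imm_sel == "S"."""
    upper = (inst >> 25) & 0x3F          # inst[30:25] -> imm[10:5], both formats
    if imm_sel == "S":
        low5 = (inst >> 7) & 0x1F        # S: imm[4:0] lives in inst[11:7]
    else:
        low5 = (inst >> 20) & 0x1F       # I: imm[4:0] lives in inst[24:20]
    imm = (upper << 5) | low5
    if (inst >> 31) & 1:                 # inst[31] fills imm[31:11]
        imm |= 0xFFFFF800
    return imm


LW = 0x00812703    # lw   x14, 8(x2)
SW = 0x00E12423    # sw   x14, 8(x2)
ADDI = 0xFCE08793  # addi x15, x1, -50
print(imm_gen(LW, "I"), imm_gen(SW, "S"), hex(imm_gen(ADDI, "I")))
```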
Implementing Branches
• B-format is mostly same as S-Format, with two register sources (rs1/rs2) and a 12-bit immediate
• But now immediate represents values -4096 to +4094 in 2-byte increments
• The 12 immediate bits encode even 13-bit signed byte offsets (lowest bit of offset is always zero, so no need to store it)
Adding branches to datapath
[Figure: sw datapath plus a Branch Comp. block comparing Reg[rs1] and Reg[rs2] (input BrUn, outputs BrEq and BrLT), a mux on the ALU A input choosing Reg[rs1] (0) or pc (1) via ASel, and a mux in front of pc choosing pc+4 or alu via PCSel]
Control settings for branches: ImmSel=B, RegWEn=0, ASel=1, BSel=1, ALUSel=Add, MemRW=Read, WBSel=*, PCSel=taken/not-taken; Control Logic drives BrUn and reads BrEq and BrLT
Branch Comparator
• BrEq = 1, if A = B
• BrLT = 1, if A < B
• BrUn = 1 selects unsigned comparison for BrLT, 0 = signed
• BGE branch: A >= B, if !(A < B)
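The comparator's behavior can be sketched as a small function on 32-bit patterns. The names `branch_comp`, `take_bge` and `to_signed32` are made up for this illustration; the BGE helper shows how "greater than or equal" is derived from BrLT alone.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def branch_comp(a, b, br_un):
    """Return (BrEq, BrLT) for 32-bit inputs A and B."""
    br_eq = a == b
    if br_un:
        br_lt = a < b                            # unsigned comparison
    else:
        br_lt = to_signed32(a) < to_signed32(b)  # signed comparison
    return br_eq, br_lt


def take_bge(a, b, br_un=False):
    """BGE is taken when A >= B, i.e. when BrLT is 0."""
    _, br_lt = branch_comp(a, b, br_un)
    return not br_lt


# 0xFFFFFFFF is -1 when signed but the largest value when unsigned.
print(branch_comp(0xFFFFFFFF, 1, br_un=False))
print(branch_comp(0xFFFFFFFF, 1, br_un=True))
print(take_bge(5, 5))
```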
[Figure: Branch Comp. block with inputs A, B and BrUn and outputs BrEq and BrLT]
Administrivia (1/2)
• Midterm 1 has been graded!
• Regrade Requests will open tonight
  − Due next Tuesday (in one week)
  − Piazza will explain the instructions
Administrivia (2/2)
• Project 1 has been released
  − Part 1 is due next Monday
  − Project Party in Cory 293 on Wednesday 7-9pm (possibly later if needed)
• Homework 2 is due this Friday at 11:59pm
  − Will help to do this before the project!
• No Guerrilla Session this week; will start up again next Tuesday
Break!
Multiply Branch Immediates by Shift?
• 12-bit immediate encodes PC-relative offset of -4096 to +4094 bytes in multiples of 2 bytes
• Standard approach: treat immediate as in range -2048..+2047, then shift left by 1 bit to multiply by 2 for branches
B-format instruction:           s | imm[10:5] | rs2 | rs1 | funct3 | imm[4:0] | B-opcode
S-Immediate:                    sign-extension | s | imm[10:5] | imm[4:0]
B-Immediate (shift left by 1):  sign-extension | s | imm[10:5] | imm[4:0] | 0
• Each instruction immediate bit can appear in one of two places in output immediate value, so need one 2-way mux per bit
RISC-V Branch Immediates
• 12-bit immediate encodes PC-relative offset of -4096 to +4094 bytes in multiples of 2 bytes
• RISC-V approach: keep 11 immediate bits in fixed position in output value, and rotate LSB of S-format to be imm[11] of B-format
S-Immediate:  sign-extension | sign=imm[11] | imm[10:5] | imm[4:0]
B-Immediate:  sign-extension | sign=imm[12] | imm[11] | imm[10:5] | imm[4:1] | 0
• Only one bit changes position between S and B, so only need a single-bit 2-way mux
RISC-V Immediate Encoding
[Figure: table of instruction encodings, inst[31:0], and the 32-bit immediates produced, imm[31:0]]
• Only bit 7 of instruction changes role in immediate between S and B
• Upper bits sign-extended from inst[31] always
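The observation that only inst[7] changes role can be checked with a short sketch that generates both S and B immediates from shared logic. The function name and the `imm_sel` strings are assumptions of this example; the bit positions follow the standard RISC-V S and B formats.

```python
def imm_gen_s_or_b(inst, imm_sel):
    """Produce imm[31:0] for imm_sel == "S" or imm_sel == "B"."""
    sign = (inst >> 31) & 1
    imm = ((inst >> 25) & 0x3F) << 5   # inst[30:25] -> imm[10:5], both formats
    imm |= ((inst >> 8) & 0xF) << 1    # inst[11:8]  -> imm[4:1],  both formats
    bit7 = (inst >> 7) & 1
    if imm_sel == "B":
        imm |= bit7 << 11              # B: inst[7] is imm[11], imm[0] stays 0
        if sign:
            imm |= 0xFFFFF000          # inst[31] fills imm[31:12]
    else:
        imm |= bit7                    # S: inst[7] is imm[0]
        if sign:
            imm |= 0xFFFFF800          # inst[31] fills imm[31:11]
    return imm


SW = 0x00E12423       # sw  x14, 8(x2)
BEQ_FWD = 0x00208463  # beq x1, x2, +8
BEQ_BACK = 0xFE000EE3 # beq x0, x0, -4
print(imm_gen_s_or_b(SW, "S"))
print(imm_gen_s_or_b(BEQ_FWD, "B"))
print(hex(imm_gen_s_or_b(BEQ_BACK, "B")))
```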
Implementing JALR Instruction (I-Format)
• JALR rd, rs1, immediate
  − Writes PC+4 to Reg[rd] (return address)
  − Sets PC = Reg[rs1] + immediate
  − Uses same immediates as arithmetic and loads
    § no multiplication by 2 bytes
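A hedged sketch of the JALR state changes listed above. The function name is hypothetical. One detail goes beyond the slide: the RISC-V specification also clears the least-significant bit of the computed target, which the code marks explicitly.

```python
MASK32 = 0xFFFFFFFF


def execute_jalr(inst, reg, pc):
    """Model jalr: Reg[rd] = PC + 4, then return the new PC."""
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    imm = (inst >> 20) & 0xFFF
    if imm & 0x800:                      # I-format sign extension
        imm -= 0x1000
    # Compute the target first, because rd may be the same register as rs1.
    target = (reg[rs1] + imm) & MASK32
    target &= 0xFFFFFFFE                 # spec detail (not on the slide): clear bit 0
    if rd != 0:
        reg[rd] = (pc + 4) & MASK32      # return address
    return target


reg = [0] * 32
reg[5] = 0x2000
JALR_X1_16_X5 = 0x010280E7               # jalr x1, 16(x5)
pc = execute_jalr(JALR_X1_16_X5, reg, 0x1000)
print(hex(pc), hex(reg[1]))

RET = 0x00008067                         # jalr x0, 0(x1)
pc = execute_jalr(RET, reg, pc)
print(hex(pc), reg[0])
```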
Adding jalr to datapath
[Figure: branch datapath with pc+4 added as a third input (2) of the write-back mux, alongside mem (0) and alu (1)]
Control settings for jalr: ImmSel=I, RegWEn=1, BrUn=*, BrEq=*, BrLT=*, ASel=0, BSel=1, ALUSel=Add, MemRW=Read, WBSel=2, PCSel selects alu (PC = Reg[rs1] + immediate)
Implementing jal Instruction
• JAL saves PC+4 in Reg[rd] (the return address)
• Set PC = PC + offset (PC-relative jump)
• Target somewhere within ±2^19 locations, 2 bytes apart
  − ±2^18 32-bit instructions
• Immediate encoding optimized similarly to branch instruction to reduce hardware cost
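For reference, the J-format immediate can be reassembled as in the sketch below. The bit positions are the standard RISC-V J-format ones (inst[31] is imm[20], inst[30:21] is imm[10:1], inst[20] is imm[11], inst[19:12] is imm[19:12]); the function names are made up for this example.

```python
def imm_gen_j(inst):
    """Produce imm[31:0] for a J-format instruction (jal)."""
    imm = ((inst >> 21) & 0x3FF) << 1    # inst[30:21] -> imm[10:1]
    imm |= ((inst >> 20) & 1) << 11      # inst[20]    -> imm[11]
    imm |= ((inst >> 12) & 0xFF) << 12   # inst[19:12] -> imm[19:12]
    if (inst >> 31) & 1:
        imm |= 0xFFF00000                # inst[31] fills imm[31:20]
    return imm                           # imm[0] is always 0


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


JAL_X1_FWD8 = 0x008000EF                 # jal x1, +8
JAL_X0_BACK4 = 0xFFDFF06F                # jal x0, -4
print(to_signed32(imm_gen_j(JAL_X1_FWD8)))
print(to_signed32(imm_gen_j(JAL_X0_BACK4)))
```

The reachable range follows from the 21-bit signed offset with an implicit zero LSB: -2^20 to +2^20 - 2 bytes, which is ±2^19 two-byte locations.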
Adding jal to datapath
[Figure: same datapath as for jalr; only the control settings change]
Control settings for jal: ImmSel=J, RegWEn=1, BrUn=*, BrEq=*, BrLT=*, ASel=1, BSel=1, ALUSel=Add, MemRW=Read, WBSel=2, PCSel selects alu (PC = PC + offset)
Single-Cycle RISC-V RV32I Datapath
[Figure: complete datapath with pc, +4, IMEM, Reg[], Imm. Gen, Branch Comp., ALU and DMEM, plus muxes on the ALU A and B inputs, the write-back path and the pc input; Control Logic reads inst[31:0], BrEq and BrLT and drives PCSel, ImmSel, RegWEn, BrUn, ASel, BSel, ALUSel, MemRW, WBSel]
And in Conclusion, …
• Universal datapath
  − Capable of executing all RISC-V instructions in one cycle each
  − Not all units (hardware) used by all instructions
• 5 Phases of execution
  − IF, ID, EX, MEM, WB
  − Not all instructions are active in all phases
• Controller specifies how to execute instructions
  − What new instructions can be added with just more control?
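One way to picture the controller is as a lookup table from instruction to control-signal settings. The sketch below collects the settings given on the datapath slides for lw, sw, branches, jalr and jal. Two things are assumptions rather than slide content: ASel=0 for lw and sw (those slides predate the A-input mux, and the ALU needs Reg[rs1] there), and all Python names.

```python
DONT_CARE = "*"

# Settings taken from the datapath slides; see the assumptions noted above.
CONTROL = {
    "lw":     {"ImmSel": "I", "RegWEn": 1, "ASel": 0, "BSel": 1,
               "ALUSel": "Add", "MemRW": "Read", "WBSel": 0},
    "sw":     {"ImmSel": "S", "RegWEn": 0, "ASel": 0, "BSel": 1,
               "ALUSel": "Add", "MemRW": "Write", "WBSel": DONT_CARE},
    "branch": {"ImmSel": "B", "RegWEn": 0, "ASel": 1, "BSel": 1,
               "ALUSel": "Add", "MemRW": "Read", "WBSel": DONT_CARE},
    "jalr":   {"ImmSel": "I", "RegWEn": 1, "ASel": 0, "BSel": 1,
               "ALUSel": "Add", "MemRW": "Read", "WBSel": 2},
    "jal":    {"ImmSel": "J", "RegWEn": 1, "ASel": 1, "BSel": 1,
               "ALUSel": "Add", "MemRW": "Read", "WBSel": 2},
}


def alu_operands(name):
    """Name the two ALU inputs selected by ASel and BSel."""
    ctrl = CONTROL[name]
    a = "pc" if ctrl["ASel"] == 1 else "Reg[rs1]"
    b = "imm" if ctrl["BSel"] == 1 else "Reg[rs2]"
    return a, b


def writes_register(name):
    return CONTROL[name]["RegWEn"] == 1


for name in CONTROL:
    print(name, alu_operands(name), writes_register(name))
```

Seen this way, adding an instruction that needs no new hardware amounts to adding one more row to the table.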