1Designing a multicycle processor
ESE 345 Computer Architecture
Designing a MulticycleProcessor Datapath and Control
Computer Architecture
23
How many cycles will it take to execute this code?
lw $t2, 0($t3)lw $t3, 4($t3)beq $t2, $t3, Label #assume not takenadd $t5, $t2, $t3sw $t5, 8($t3)
Label: ...
What is going on during the 8th cycle of execution? In what cycle does the actual addition of $t2 and $t3 takes place?
Simple Questions
Designing a multicycle processor
24
Value of control signals is dependent upon: what instruction is being executed which step is being performed
Use the information we’ve accumulated to specify a finite state machine specify the finite state machine graphically, or use microprogramming
Implementation can be derived from specification
Implementing the Control for a Multicycle Processor
Designing a multicycle processor
25
Finite state machines: a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input)
We’ll use a Moore machine for the output function output based only on current state
Review: Finite State Machines
Inputs
Current state
Outputs
Clock
Next-statefunction
Outputfunction
Nextstate
Designing a multicycle processor
26
Readregister 1
Readregister 2
Writeregister
Writedata
Registers ALU
Zero
Readdata 1
Readdata 2
Signextend
16 32
Instruction[31–26]
Instruction[25–21]
Instruction[20–16]
Instruction[15–0]
ALUresult
Mux
Mux
Shiftleft 2
Shiftleft 2
Instructionregister
PC 0
1
Mux
0
1
Mux
0
1
Mux
0
1A
B 0123
Mux
0
1
2
ALUOut
Instruction[15–0]
Memorydata
register
Address
Writedata
MemoryMemData
4
Instruction[15–11]
PCWriteCond
PCWrite
IorD
MemRead
MemWrite
MemtoReg
IRWrite
PCSource
ALUOp
ALUSrcB
ALUSrcA
RegWrite
RegDst
26 28
Outputs
Control
Op[5–0]
ALUcontrol
PC [31–28]
Instruction [25-0]
Instruction [5–0]
Jumpaddress[31–0]
Multicycle Processor
Designing a multicycle processor
27
Note: don’t care if not mentioned asserted if name only otherwise exact value
How many state bits will we need?
Graphical Specification of FSMMemRead
ALUSrcA = 0IorD = 0IRWrite
ALUSrcB = 01ALUOp = 00
PCWritePCSource = 00
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
MemReadIorD = 1
MemWriteIorD = 1
RegDst = 1RegWrite
MemtoReg = 0
RegDst = 1RegWrite
MemtoReg = 0
PCWritePCSource = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01
PCWriteCondPCSource = 01
Instruction decode/register fetch
Instruction fetch
0 1
Start
Jumpcompletion
9862
3
4
5 7
Memory readcompleton step
R-type completionMemoryaccess
Memoryaccess
ExecutionBranch
completionMemory address
computation
0
1
Designing a multicycle processor
28
Implementation:Finite State Machine for Control
P C W rite
P C W rite C o n d
Io rD
M e m to R e g
P C S o u rce
A L U O p
A L U S rc B
A L U S rc A
R e g W rite
R e g D s t
N S 3N S 2N S 1N S 0
Op
5
Op
4
Op
3
Op
2
Op
1
Op
0
S3
S2
S1
S0
S ta te re g is te r
IR W rite
M e m R e ad
M e m W rite
In s tru c t io n re g is te ro p c o d e fie ld
O u tp u ts
C o n tro l lo g ic
In p u ts
Designing a multicycle processor
29
PLA ImplementationOp5
Op4
Op3
Op2
Op1
Op0
S3
S2
S1
S0
IorD
IRWrite
MemReadMemWrite
PCWritePCWriteCond
MemtoRegPCSource1
ALUOp1
ALUSrcB0ALUSrcARegWriteRegDstNS3NS2NS1NS0
ALUSrcB1ALUOp0
PCSource0
Designing a multicycle processor
30
ROM = "Read Only Memory" values of memory locations are fixed ahead of time
A ROM can be used to implement a truth table if the address is m-bits, we can address 2m entries in the ROM. our outputs are the bits of data that the address points to.
m is the "height", and n is the "width"
ROM Implementation
m n
0 0 0 0 0 1 10 0 1 1 1 0 00 1 0 1 1 0 00 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 11 1 0 0 1 1 01 1 1 0 1 1 1
Designing a multicycle processor
31
How many inputs are there?6 bits for opcode, 4 bits for state = 10 address lines(i.e., 210 = 1024 different addresses)
How many outputs are there?16 datapath-control outputs, 4 state bits = 20 outputs
ROM is 210 x 20 = 20K bits (and a rather unusual size)
Rather wasteful, since for lots of the entries, the outputs are the same
— i.e., opcode is often ignored
ROM Implementation
Designing a multicycle processor
32
Break up the table into two parts— 4 state bits tell you the 16 outputs, 24 x 16 bits of ROM— 10 bits tell you the 4 next state bits, 210 x 4 bits of ROM— Total: 4.3K bits of ROM
PLA is much smaller— can share product terms— only need entries that produce an active output— can take into account don't cares
Size is (#inputs ´ #product-terms) + (#outputs ´ #product-terms)For this example = (10x17)+(20x17) = 510 PLA cells
PLA cells usually about the size of a ROM cell (slightly bigger)
ROM vs PLA
Designing a multicycle processor
33
Complex instructions: the "next state" is often current state + 1
Another Implementation Style
A d d rC t l
O u tp u ts
P L A o r R O M
S ta te
A d d re s s s e le c t lo g ic
Op
[5–
0]
A d d e r
In s tru c t io n re g is te ro p co d e fie ld
1
C o n tro l u n it
In p u t
P C W ri teP C W ri te C o n dIo r D
M e m to R e gP C S o u rc eA L U O pA L U S rc BA L U S rc AR e g W riteR e g D s t
IR W r ite
M e m R e a dM e m W rite
B W rite
Designing a multicycle processor
34
DetailsDispatch ROM 1 Dispatch ROM 2
Op Opcode name Value Op Opcode name Value000000 R-format 0110 100011 lw 0011000010 jmp 1001 101011 sw 0101000100 beq 1000100011 lw 0010101011 sw 0010
State number Address-control action Value of AddrCtl0 Use incremented state 31 Use dispatch ROM 1 12 Use dispatch ROM 2 23 Use incremented state 34 Replace state number by 0 05 Replace state number by 0 06 Use incremented state 37 Replace state number by 0 08 Replace state number by 0 09 Replace state number by 0 0
State
Adder
1
PLA or ROM
Mux3 2 1 0
Dispatch ROM 1Dispatch ROM 2
0
AddrCtl
Address select logic
Instruction registeropcode field
Designing a multicycle processor
35
Microprogramming
What are the “microinstructions” ?
PCWritePCWriteCondIorD
MemtoRegPCSourceALUOpALUSrcBALUSrcARegWrite
AddrCtl
Outputs
Microcode memory
IRWrite
MemReadMemWrite
RegDst
Control unit
Input
Microprogram counter
Address select logic
Adder
1
Instruction registeropcode field
BWrite
Datapath
Designing a multicycle processor
36
Microprogramming
Designing a multicycle processor
sequencercontrol
micro-PC-sequencer:fetch,dispatch,sequential
DispatchROM
Opcode
Inputs
° Microprogramming is a fundamental concept• implement an instruction set by building a very simple processor
and interpreting the instructions• essential for very complex instructions and when few register
transfers are possible• overkill when ISA matches datapath 1-1
-Code ROM
To DataPath
DecodeDecode
datapath control
microinstruction ()
37
Microprogramming
Designing a multicycle processor
° Microprogramming is a convenient method for implementing structured control state diagrams:
• Random logic replaced by microPC sequencer and ROM• Each line of ROM called a instruction:
contains sequencer control + values for control points• limited state transitions:
branch to zero, next sequential,branch to instruction address from displatch ROM
° Horizontal Code: one control bit in Instruction for every control line in datapath
° Vertical Code: groups of control-lines coded together in Instruction (e.g. possible ALU dest)
° Control design reduces to Microprogramming• Part of the design process is to develop a “language” that
describes control and is easy for humans to understand
38
“Macroinstruction” Interpretation
Designing a multicycle processor
MainMemory
executionunit
controlmemory
CPU
ADDSUBAND
DATA
.
.
.
User program plus Data
this can change!
AND microsequence
e.g., FetchCalc Operand AddrFetch Operand(s)CalculateSave Answer(s)
one of these ismapped into oneof these
39
Designing a Microinstruction Set
Designing a multicycle processor
1) Start with list of control signals2) Group signals together that make sense (vs.
random): called “fields”3) Place fields in some logical order
(e.g., ALU operation & ALU operands first andmicroinstruction sequencing last)
4) To minimize the width, encode operations that will never be used at the same time
5) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
• Use computers to design computers
40
Multicycle datapath (with ORI but w/o Jump)
Designing a multicycle processor
IdealMemoryWrAdrDin
RAdr
32
32
32Dout
MemWr
32
AL
U
3232
ALUOp
ALUControl
32
IRWr
Instruction Reg
32
Reg File
Ra
Rw
busW
Rb5
5
32busA
32busB
RegWr
Rs
Rt
Mux
0
1
Rt
Rd
PCWr
ALUSelA
Mux 01
RegDst
Mux
0
1
32
PC
MemtoReg
Extend
ExtOp
Mux
0
132
0
1
23
4
16Imm 32
<< 2
ALUSelB
Mux
1
0
32
Equal
ZeroPCWrCond PCSrc
32
IorD
Mem
Data R
eg
AL
U O
utB
A
MemRd
41
Step 1 Start with List of control signals
Designing a multicycle processor
Signal name Effect when deasserted Effect when assertedALUSelA 1st ALU operand = PC 1st ALU operand = Reg[rs]RegWrite None Reg. is written MemtoReg Reg. write data input = ALU Reg. write data input = memory RegDst Reg. dest. no. = rt Reg. dest. no. = rdMemRead None Memory at address is read,
MDR <= Mem[addr]MemWrite None Memory at address is written IorD Memory address = PC Memory address = SIRWrite None IR <= MemoryPCWrite None PC <= PCSourcePCWriteCond None IF ALUzero then PC <= PCSourcePCSource PCSource = ALU PCSource = ALUoutExtOp Zero Extended Sign Extended
Sing
le B
it C
ontr
ol
Signal name Value EffectALUOp 00 ALU adds
01 ALU subtracts 10 ALU does function code11 ALU does logical OR
ALUSelB 00 2nd ALU input = 401 2nd ALU input = Reg[rt] 10 2nd ALU input = extended,shift left 2 11 2nd ALU input = extended
Mul
tiple
Bit
Con
trol
42
Step 2Group together related signals
Designing a multicycle processor
IdealMemoryWrAdrDin
RAdr
32
32
32Dout
MemWr
32
AL
U
3232
ALUOp
ALUControl
32
IRWr
Instruction Reg
32
Reg File
Ra
Rw
busW
Rb5
5
32busA
32busB
RegWr
Rs
Rt
Mux
0
1
Rt
Rd
PCWr
ALUSelA
Mux 01
RegDst
Mux
0
1
32
PC
MemtoReg
Extend
ExtOp
Mux
0
132
0
1
23
4
16Imm 32
<< 2
ALUSelB
Mux
1
0
32
Equal
ZeroPCWrCond PCSrc
32
IorD
Mem
Data R
eg
AL
U O
utB
A
MemRd
ALU
SRC1
SRC2
DestinationMemory
PCWrite
43
3&4) Microinstruction Format: unencoded vs. encoded fields
Designing a multicycle processor
Field Name Width Control Signals Setwide narrow
ALU Control 4 2 ALUOpSRC1 2 1 ALUSelASRC2 5 3 ALUSelB, ExtOpALU Destination 3 2 RegWrite, MemtoReg, RegDstMemory 3 2 MemRead, MemWrite, IorDMemory Register 1 1 IRWritePCWrite Control 3 2 PCWrite, PCWriteCond, PCSourceSequencing 3 2 AddrCtlTotal width 24 15 bits
44
Step 5Group into Fields, Order and Assign Names
Designing a multicycle processor2/25/04 ©UCB Spring 2004
Field Name Values for Field Function of Field with Specific ValueALU Add ALU adds
Subt. ALU subtractsFunc ALU does function codeOr ALU does logical OR
SRC1 PC 1st ALU input <= PCrs 1st ALU input <= Reg[rs]
SRC2 4 2nd ALU input <= 4Extend 2nd ALU input <= sign ext. IR[15-0]Extend0 2nd ALU input <= zero ext. IR[15-0] Extshft 2nd ALU input <= sign ex., sl IR[15-0]rt 2nd ALU input <= Reg[rt]
Dest(ination) rd ALU Reg[rd] <= ALUoutrt ALU Reg[rt] <= ALUoutrt Mem Reg[rt] <= Mem
Mem(ory) Read PC Read memory using PC; IR <= Mem [PC]Read ALU Read memory using ALUout for addrWrite ALU Write memory using ALUout for addr
PCwrite ALU PC <= ALUALUout-cond IF Zero then PC <= ALUout
Seq(uencing) Seq Go to next sequential µinstructionFetch Go to the first microinstructionDispatch 1 Dispatch using ROM1 Dispatch 2 Dispatch using ROM2
ALU SRC1 SRC2 Dest Mem Memreg Pcwrite Seq
45
A specification methodology appropriate if hundreds of opcodes, modes, cycles, etc. signals specified symbolically using microinstructions
Will two implementations of the same architecture have the same microcode?
Microprogramming
Designing a multicycle processor
LabelALU
control SRC1 SRC2 Dest Memory PCWrite SequencingFetch Add PC 4 Read PC ALU Seq
Add PC Extshft Dispatch 1Mem1 Add rs Extend Dispatch 2LW2 Read ALU Seq
rt MEM FetchSW2 Write ALU FetchRformat1 Func rs rt Seq
rd ALU FetchBEQ1 Subt rs rt ALUOut-cond FetchORI1 Or rs Extend0 Seq
rt ALU Fetch
46
Microcode: Trade-offs Distinction between specification and implementation is sometimes
blurred Specification Advantages:
Easy to design and write
Design architecture and microcode in parallel
Implementation (off-chip ROM) Advantages
Easy to change since values are in memory
Can emulate other architectures
Can make use of internal registers
Implementation Disadvantages, SLOWER now that:
Control is implemented on same chip as processor
ROM is no longer faster than RAM
No need to go back and make changes
Designing a multicycle processor
47
Historical Perspective In the ‘60s and ‘70s microprogramming was very important for
implementing machines This led to more sophisticated ISAs and the VAX In the ‘80s RISC processors based on pipelining became popular Pipelining the microinstructions is also possible! Implementations of IA-32 architecture processors since 486 use:
“hardwired control” for simpler instructions (few cycles, FSM control implemented using PLA or random
logic) “microcoded control” for more complex instructions
(large numbers of cycles, central control store)
The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store
Designing a multicycle processor
48
Pentium 4 Somewhere in all that “control we must handle complex instructions
Processor executes simple microinstructions, 70 bits wide (hardwired) 120 control lines for integer datapath (400 for floating point) If an instruction requires more than 3 microinstructions to implement,
control from microcode ROM (8000 microinstructions) It’s complicated!
Control
Control
Control
Enhancedfloating pointand multimedia
Control
I/Ointerface
Instruction cache
Integerdatapath
Datacache
Secondarycacheandmemoryinterface
Advanced pipelininghyperthreading support
Designing a multicycle processor
49
Overview of Control
Designing a multicycle processor
° Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.
Initial Representation Finite State Diagram Microprogram
Sequencing Control Explicit Next State Microprogram counterFunction + Dispatch ROMs
Logic Representation Logic Equations Truth Tables
Implementation PLA ROM Technique “hardwired control” “microprogrammed control”
50
Summary If we understand the instructions…
We can build a simple processor!
If instructions take different amounts of time, multi-cycle is better
Datapath implemented using:
Combinational logic for arithmetic
State holding elements to remember bits
Control implemented using:
Combinational logic for single-cycle implementation
Finite state machine for multi-cycle implementation
Designing a multicycle processor
51
Acknowledgements These slides contain material developed and copyright
by: Morgan Kauffmann (Elsevier, Inc.) Arvind (MIT) Krste Asanovic (MIT/UCB) Joel Emer (Intel/MIT) James Hoe (CMU) John Kubiatowicz (UCB) David Patterson (UCB)
Designing a multicycle processor
Top Related