Multi Cycle CPU Mid-term Review Discussion...
CS141-L4-1 Tarun Soni, Summer ‘03
Multi Cycle CPU
• Previously: built a Single Cycle CPU.
• Today:
• Exceptions
• Multi-cycle CPU
• Microprogramming
CS141-L4-2 Tarun Soni, Summer ‘03
Mid-term Review Discussion Session
• Peterson Hall 104
• Tue: 2-3 pm
• Tue: 3-4 pm
[Chart: one data series ("Series1"); y-axis ticks 0 to 50, x-axis ticks 1 to 55]
CS141-L4-3 Tarun Soni, Summer ‘03
The Story so far:
• Instruction Set Architectures
• Performance issues
• 2s complement, Addition, Subtraction
• Multiplication, Division, Floating Point numbers
• ALUs
• Single Cycle CPU
• Exceptions
• Multicycle CPU: datapath; control
• Microprogramming
CS141-L4-4 Tarun Soni, Summer ‘03
Alternative Architectures
• Design alternative:
– provide more powerful operations
– goal is to reduce number of instructions executed
– danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as “RISC vs. CISC”
– virtually all new instruction sets since 1982 have been RISC
– VAX: minimize code size, make assembly language easy; instructions from 1 to 54 bytes long!
• We’ll look at Pentium, UltraSparc and JVM
CS141-L4-5 Tarun Soni, Summer ‘03
Pentium
CS141-L4-6 Tarun Soni, Summer ‘03
Java VM
• Most instructions are one byte
– ADD
– POP
• One-byte argument
– ILOAD IND8
– BIPUSH CON8
• Two-byte argument
– SIPUSH CON16
– IF_ICMPEQ OFFSET16
• Type = int, signed int etc.
CS141-L4-7 Tarun Soni, Summer ‘03
UltraSparc
CS141-L4-8 Tarun Soni, Summer ‘03
Exceptions
or
Oops!
CS141-L4-9 Tarun Soni, Summer ‘03
Exceptions
• There are two sources of non-sequential control flow in a processor
– explicit branch and jump instructions
– exceptions
• Branches are synchronous and deterministic
• Exceptions are typically asynchronous and non-deterministic
• Guess which is more difficult to handle?
• We define exceptions as any unexpected change in control flow
• We define interrupts as any externally-caused exception
• Literature is not consistent
Examples:
– arithmetic overflow
– divide by zero
– I/O device signals completion to CPU
– user program invokes the OS
– memory parity error
– illegal instruction
– timer signal
CS141-L4-10 Tarun Soni, Summer ‘03
Exceptions
• The machine we’ve been designing in class can generate two types of exceptions.
– arithmetic overflow
– illegal instruction
• On an exception, we need to
– save the PC (invisible to user code)
– record the nature of the exception/interrupt
– transfer control to OS
[Figure: user program -> Exception -> System Exception Handler -> return from exception -> user program]
CS141-L4-11 Tarun Soni, Summer ‘03
Exceptions
• MIPS architecture defines the instruction as having no effect if the instruction causes an exception.
• When we get to virtual memory we will see that certain classes of exceptions must prevent the instruction from changing the machine state.
• This aspect of handling exceptions becomes complex and potentially limits performance, which is why it is hard
• Interrupts
– caused by external events
– asynchronous to program execution
– may be handled between instructions
– simply suspend and resume user program
• Traps/Exceptions
– caused by internal events
• exceptional conditions (overflow)
• errors (parity)
• faults (non-resident page)
– synchronous to program execution
– condition must be remedied by the handler
– instruction may be retried or simulated and program continued or program may be aborted
CS141-L4-12 Tarun Soni, Summer ‘03
Exceptions
Addressing the Exception Handler
• Traditional Approach: Interrupt Vector
– PC <– MEM[ IV_base + cause || 00 ]
– 370, 68000, Vax, 80x86, . . .
• RISC Handler Table
– PC <– IT_base + cause || 0000
– saves state and jumps
– Sparc, PA, M88K, . . .
• MIPS Approach: fixed entry
– PC <– EXC_addr
– Actually very small table
• RESET entry
• TLB
• other
[Figure: interrupt vector (base + cause selects a table entry that points to handler code) vs. handler table (base + cause selects handler entry code directly)]
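The three addressing schemes on this slide differ only in how the new PC is formed. A minimal sketch, assuming `||` means bit concatenation (so `cause || 00` is the cause shifted left by 2); the function names and the dictionary-based memory are hypothetical and exist only for illustration:

```python
def vectored_pc(memory, iv_base, cause):
    """Interrupt vector: the new PC is loaded from a table of handler addresses."""
    # cause || 00 appends two zero bits, i.e. cause * 4 (one 4-byte pointer per cause)
    return memory[iv_base + (cause << 2)]


def handler_table_pc(it_base, cause):
    """RISC handler table: jump straight into a table of short code stubs."""
    # cause || 0000 appends four zero bits, i.e. cause * 16 bytes per entry
    return it_base + (cause << 4)


def fixed_entry_pc(exc_addr):
    """MIPS-style fixed entry: every exception starts at the same address."""
    return exc_addr
```

With a vector base of 0x100 and cause 3, the vectored scheme reads the pointer stored at 0x10C, while the handler-table scheme with base 0x8000 jumps directly to 0x8030.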
CS141-L4-13 Tarun Soni, Summer ‘03
Exceptions
Saving State
• Push it onto the stack
– Vax, 68k, 80x86
• Save it in special registers
– MIPS EPC, BadVaddr, Status, Cause
• Shadow Registers
– M88k
– Save state in a shadow of the internal pipeline registers
Saving state is a significant component of “interrupt response time”
CS141-L4-14 Tarun Soni, Summer ‘03
Exceptions
• For our MIPS-subset architecture, we will add two registers:
– EPC: a 32-bit register to hold the user’s PC
– Cause: A register to record the cause of the exception
• we’ll assume undefined inst = 0, overflow = 1
• We will also add three control signals:
– EPCWrite (will need to be able to subtract 4 from PC)
– CauseWrite
– IntCause
• We will extend PCSource multiplexor to be able to latch the interrupt handler address into the PC.
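The effect of the two new registers and three control signals can be sketched as a single state update. This is a rough model, not the hardware: the class name and the handler address value are hypothetical placeholders, while the cause codes (undefined instruction = 0, overflow = 1) come from the slide.

```python
UNDEFINED_INSTRUCTION = 0      # cause codes as assumed on the slide
ARITHMETIC_OVERFLOW = 1
HANDLER_ADDRESS = 0xC0000000   # hypothetical placeholder for the handler entry point


class ExceptionState:
    """Models PC, EPC and Cause for the MIPS-subset machine."""

    def __init__(self, pc):
        self.pc = pc
        self.epc = 0
        self.cause = 0

    def raise_exception(self, int_cause):
        # EPCWrite: the PC has already been incremented, so save PC - 4
        self.epc = (self.pc - 4) & 0xFFFFFFFF
        # CauseWrite with IntCause selecting the value to record
        self.cause = int_cause
        # PCSource selects the handler address, which is latched into the PC
        self.pc = HANDLER_ADDRESS
```

For example, an overflow detected when the PC already reads 0x00400028 leaves EPC = 0x00400024 and Cause = 1, with the PC pointing at the handler.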
CS141-L4-15 Tarun Soni, Summer ‘03
Exceptions
[Figure: single-cycle datapath extended for exceptions. Added elements: Cause register (CauseWrite, IntCause), EPC register (EPCWrite, fed from PC through a sub4 unit), and a PCSource multiplexor input for InterruptHandlerAddress. Unchanged elements: PC, instruction memory, 32 x 32-bit register file, extender, ALU, data memory, and the nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg and Equal signals.]
CS141-L4-16 Tarun Soni, Summer ‘03
Exceptions: Creating a “Control line”
[Figure: control unit (inputs Op and Fun from Instruction<31:0>) and datapath, connected by nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, Equal and the exception signals]
Regs:
– EPC
– Cause
Control signals:
– EPCWrite (subtract 4 from PC)
– CauseWrite
– IntCause
CS141-L4-17 Tarun Soni, Summer ‘03
Exceptions: Creating the data path
[Figure: abstract datapath with PC, next-address logic, ideal instruction memory, 32 x 32-bit register file (Rs, Rt, Rd, Imm16), ALU and ideal data memory]
Regs:
– EPC
– Cause
Control signals:
– EPCWrite (subtract 4 from PC)
– CauseWrite
– IntCause
Extend PCSource MUX to include jump address from int-table
CS141-L4-18 Tarun Soni, Summer ‘03
CPU
Multi Cycle CPU
CS141-L4-19 Tarun Soni, Summer ‘03
CPU
The Big Picture: Where are We Now?
• The Five Classic Components of a Computer
• Datapath Design, then Control Design
[Figure: Processor (Control, Datapath), Memory, Input, Output]
CS141-L4-20 Tarun Soni, Summer ‘03
Recap: Processor Design is a Process
• Bottom-up
– assemble components in target technology to establish critical timing
• Top-down
– specify component behavior from high-level requirements
• Iterative refinement
– establish partial solution, expand and improve
[Figure: Instruction Set Architecture => processor (datapath, control) => Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer => Cells, Gates]
CS141-L4-21 Tarun Soni, Summer ‘03
CPU: The single cycle
Execute an entire instruction:
Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction
° Design hardware for each of these steps!!!
CS141-L4-22 Tarun Soni, Summer ‘03
CPU: Clocking
[Figure: Clk waveform with Setup and Hold windows around each clock edge; signals are "Don’t Care" outside those windows]
• All storage elements are clocked by the same clock edge
CS141-L4-23 Tarun Soni, Summer ‘03
CPU: Main Control
PLA Implementation of the Main Control
[Figure: PLA. Inputs: op<5>..op<0>, decoded into product terms for R-type, ori, lw, sw, beq and jump. Outputs: RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]
CS141-L4-24 Tarun Soni, Summer ‘03
CPU: Main Control
• In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”
– in general, the controller is a finite state machine
– microinstruction can also control sequencing (see later)
[Figure: the opcode (instruction decode) and conditions from the datapath feed the Control Logic / Store (PLA, ROM), which emits a microinstruction that drives the control points of the datapath]
CS141-L4-25 Tarun Soni, Summer ‘03
CPU: Abstract View of a single cycle processor
• looks like a FSM with PC as state
[Figure: PC and Next PC logic, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Reg. Wrt and Result Store. Main Control and ALU control (inputs op, fun) produce nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr and RegWr; Equal comes from the datapath.]
CS141-L4-26 Tarun Soni, Summer ‘03
CPU: Why is a CPI=1 processor bad?
• Long Cycle Time
• All instructions take as much time as the slowest
• Real memory is not so nice as our idealized memory
– cannot always get the job done in one (short) cycle
Paths through the datapath by instruction class:
Arithmetic & Logical: PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
Load (Critical Path): PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup
Store: PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
Branch: PC -> Inst Memory -> Reg File -> cmp -> mux
CS141-L4-27 Tarun Soni, Summer ‘03
CPU: Why is a CPI=1 processor bad?
          I cache   Decode, R-Read   ALU   PC update   D cache   R-Write   Total
R-type    1         1                .9    -           -         .8        3.7
Load      1         1                .9    -           1         .8        4.7
Store     1         1                .9    -           1         -         3.9
beq       1         1                .9    .1          -         -         3.0
• Load needs 5 cycles
• Store and R-type need 4
• beq needs 3
Goal: balance amount of work done each cycle.
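The numbers in this table are enough for a rough comparison of the two designs. The sketch below assumes the multicycle clock period equals the longest single step (1, in the table's units) and uses a made-up instruction mix; `MIX` and the function names are hypothetical and not from the lecture.

```python
# Total latency per instruction class, from the table (same units as the table)
TOTAL_LATENCY = {"R-type": 3.7, "Load": 4.7, "Store": 3.9, "beq": 3.0}
# Cycles per instruction class in the multicycle design, from the slide
CYCLES = {"R-type": 4, "Load": 5, "Store": 4, "beq": 3}
# Longest single step in the table (I cache, decode and D cache each take 1)
LONGEST_STEP = 1.0
# Hypothetical instruction mix, chosen only for illustration
MIX = {"R-type": 0.5, "Load": 0.2, "Store": 0.1, "beq": 0.2}


def single_cycle_time_per_instruction():
    # CPI = 1, but the clock must fit the slowest instruction
    return max(TOTAL_LATENCY.values())


def multicycle_cpi(mix):
    # Weighted average of the per-class cycle counts
    return sum(fraction * CYCLES[name] for name, fraction in mix.items())


def multicycle_time_per_instruction(mix):
    return multicycle_cpi(mix) * LONGEST_STEP
```

With this mix the multicycle CPI works out to 4.0, so the average time per instruction is 4.0 against 4.7 for the single-cycle design. The gain depends entirely on the mix and on how evenly the steps are balanced.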
CS141-L4-28 Tarun Soni, Summer ‘03
CPU: Reducing Cycle Time
• Cut combinational dependency graph and insert register / latch
• Do same work in two fast cycles, rather than one slow one
[Figure: storage element -> Acyclic Combinational Logic -> storage element  =>  storage element -> Acyclic Combinational Logic (A) -> storage element -> Acyclic Combinational Logic (B) -> storage element]
CS141-L4-29 Tarun Soni, Summer ‘03
CPU: Building blocks
• Adder
• MUX
• ALU
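[Figure: Adder (32-bit inputs A and B, Carry In; outputs 32-bit Sum and Carry), MUX (32-bit inputs A and B, Select; 32-bit output Y), ALU (32-bit inputs A and B, OP; 32-bit output Result)]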
CS141-L4-30 Tarun Soni, Summer ‘03
CPU: Building blocks
• Building a 64-bit adder from 2x32-bit adders
[Figure: the low adder takes A[31..0], B[31..0] and Carry In and produces Sum[31..0]; its Carry feeds the Carry In of the high adder, which takes A[63..32] and B[63..32] and produces Sum[63..32] and Carry]
• Speed of addition?
– For one ADD?
– For consecutive ADDs?
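The construction on this slide can be sketched in a few lines. This is a behavioral model only, with hypothetical function names: the carry out of the low 32-bit adder becomes the carry in of the high 32-bit adder.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """32-bit adder: returns (32-bit sum, carry out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a, b, carry_in=0):
    """64-bit adder built from two 32-bit adders chained through the carry."""
    low_sum, low_carry = add32(a & MASK32, b & MASK32, carry_in)
    # The high half cannot finish until the low half has produced its carry
    high_sum, carry_out = add32(a >> 32, b >> 32, low_carry)
    return (high_sum << 32) | low_sum, carry_out
```

The dependency on `low_carry` is the point of the speed question: a single ADD has to wait for both halves in sequence, whereas a register placed between the halves would let consecutive ADDs overlap.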
CS141-L4-31 Tarun Soni, Summer ‘03
Multicycle CPU: Individual operations
• Next address logic
– PC <= branch ? PC + offset : PC + 4
• Instruction Fetch
– InstructionReg <= Mem[PC]
• Register Access
– A <= R[rs]
• ALU operation
– R <= A + B
[Figure: abstract datapath: PC, Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access (Data Mem), Result Store (Reg. File), with Control driving nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr and RegWr]
CS141-L4-32 Tarun Soni, Summer ‘03
Multicycle CPU: Partitioning Time
• Five execution steps (some instructions use fewer)
– IF: Instruction Fetch
– ID: Instruction Decode (& register fetch & add PC+immed)
– EX: Execute
– Mem: Memory access
– WB: Write-Back into registers
IF -> ID -> EX -> Mem -> WB
          I cache   Decode, R-Read   ALU   PC update   D cache   R-Write   Total
R-type    1         1                .9    -           -         .8        3.7
Load      1         1                .9    -           1         .8        4.7
Store     1         1                .9    -           1         -         3.9
beq       1         1                .9    .1          -         -         3.0
CS141-L4-33 Tarun Soni, Summer ‘03
Multicycle CPU: Steps
Note: Reuse of ALU
[Figure: single-cycle datapath (PC, instruction memory, registers, sign extend, shift left 2, two adders, ALU, data memory, muxes; control signals RegWrite, ALUSrc, ALU operation, MemRead, MemWrite, MemtoReg, PCSrc) divided into five steps]
CS141-L4-34 Tarun Soni, Summer ‘03
Multicycle CPU
Partitioning the CPI=1 Datapath
• Add registers between smallest steps
[Figure: abstract datapath: PC, Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access (Data Mem), Result Store (Reg. File), with control signals nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr and RegWr]
CS141-L4-35 Tarun Soni, Summer ‘03
Multicycle CPU
Single Cycle Implementation:
[Figure: Cycle 1 holds a Load and Cycle 2 holds a Store; each cycle is as long as a Load, so the unused part of the Store cycle is waste]
Multiple Cycle Implementation:
[Figure: Cycles 1 to 5: Load (Ifetch, Reg, Exec, Mem, Wr); Cycles 6 to 9: Store (Ifetch, Reg, Exec, Mem); Cycle 10: an R-type begins (Ifetch)]
CS141-L4-36 Tarun Soni, Summer ‘03
Multicycle CPU: Instruction Types
Step: Instruction Fetch (all instructions)
    IR = Mem[PC]
    PC = PC + 4
Step: Instruction Decode / register fetch (all instructions)
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUout = PC + (sign-extend(IR[15-0]) << 2)
Step: Execution, address computation, branch completion
    R-type: ALUout = A op B
    Memory: ALUout = A + sign-extend(IR[15-0])
    Branch: if (A==B) then PC = ALUout
Step: Memory access or R-type completion
    R-type: Reg[IR[15-11]] = ALUout
    Memory: memory-data = Mem[ALUout] or Mem[ALUout] = B
Step: Write-back
    Memory: Reg[IR[20-16]] = memory-data
CS141-L4-37 Tarun Soni, Summer ‘03
Multicycle CPU: Sharing Hardware
• Example: memory is used twice, at different times
– Average memory accesses per instruction = 1 + Flw + Fsw ~ 1.3
– if CPI is 4.8, imem utilization = 1/4.8, dmem = 0.3/4.8
• We could reduce HW without hurting performance
– extra control
Register transfers per instruction:
All: IR <– Mem[PC]; A <– R[rs]; B <– R[rt]
R-type: S <– A + B; R[rd] <– S; PC <– PC+4
LW: S <– A + SX; M <– Mem[S]; R[rt] <– M; PC <– PC+4
ORi: S <– A or ZX; R[rt] <– S; PC <– PC+4
SW: S <– A + SX; Mem[S] <– B; PC <– PC+4
BEQ: PC <– PC+4 (not taken) or PC <– PC+SX (taken)
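The utilization figures on this slide follow from dividing accesses per instruction by cycles per instruction. A small sketch of that arithmetic; the slide only gives the sum Flw + Fsw ~ 0.3, so the 0.2 / 0.1 split used in the usage note is a hypothetical choice, as is the function name.

```python
def memory_utilization(cpi, load_fraction, store_fraction):
    """Fraction of cycles in which each memory is busy.

    Every instruction fetches once; only loads and stores touch data memory.
    """
    data_accesses = load_fraction + store_fraction
    return {
        "imem": 1 / cpi,
        "dmem": data_accesses / cpi,
        # A single shared memory serves both kinds of access
        "shared": (1 + data_accesses) / cpi,
    }
```

Calling `memory_utilization(4.8, 0.2, 0.1)` gives roughly 0.21 for instruction memory, 0.06 for data memory and 0.27 for a shared memory, which is why one memory can serve both roles without becoming a bottleneck.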
CS141-L4-38 Tarun Soni, Summer ‘03
Multicycle CPU: Sharing Functional Units
Single ALU, Common data and instruction memory datapath
[Figure: PC -> Memory (Address; Instruction or data) -> Instruction register and Memory data register -> Registers -> A, B -> ALU -> ALUOut]
Reuse:
• ALU
• Memory
Need more:
• Muxing
• Control
Step: Instruction fetch (all instructions)
    IR = Memory[PC]
    PC = PC + 4
Step: Instruction decode / register fetch (all instructions)
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step: Execution, address computation, branch/jump completion
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step: Memory access or R-type completion
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut]
    Store: Memory[ALUOut] = B
Step: Memory read completion
    Load: Reg[IR[20-16]] = MDR
CS141-L4-39 Tarun Soni, Summer ‘03
Multicycle CPU: Adding State Elements
• Since we reuse logic (e.g. ALU), we need to store results between states
• Need extra registers when:
– signal is computed in one clock cycle and used in another, AND
– the inputs to the combinational circuit can change before the signal is written into a state element.
CS141-L4-40 Tarun Soni, Summer ‘03
Multicycle CPU: Adding State Elements
[Figure: single-cycle datapath (PC, instruction memory, registers, sign extend, shift left 2, adders, ALU, data memory, muxes) with state elements added between the steps]
CS141-L4-41 Tarun Soni, Summer ‘03
Multicycle CPU: The Full Multi-Cycle Implementation
[Figure: multicycle datapath with a single Memory (Address, Write data, MemData), Instruction register, Memory data register, Registers, A, B, ALU, ALUOut, sign extend, shift left 2, and a jump address formed from PC[31-28] and Instruction[25-0]. Control (input Op = Instruction[31-26]) outputs: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst.]
CS141-L4-42 Tarun Soni, Summer ‘03
Cycle 1: Instruction Fetch
Datapath: IR = Memory[PC], PC = PC + 4 (may be revised later)
Control: IorD=0, MemRead=1, MemWr=0, IRwrite=1, ALUsrcA=0, etc
[Figure: full multicycle datapath]
CS141-L4-43 Tarun Soni, Summer ‘03
Cycle 2: Instruction Decode
A = Register[IR[25-21]]
B = Register[IR[20-16]]
ALUout = PC + (sign-extend(IR[15-0]) << 2)
[Figure: full multicycle datapath]
CS141-L4-44 Tarun Soni, Summer ‘03
Cycle 2: Instruction Decode & RegFetch
A = Reg[IR[25-21]]
B = Reg[IR[20-16]]
ALUout = PC + (sign-extend(IR[15-0]) << 2)
We compute the target address even though we don’t know if it will be used
– Operation may not be a branch
– Even if it is, branch may not be taken
Why? Everything up to this point must be instruction-independent, because we haven’t decoded the instruction.
The ALU, the (incremented) PC, and the immed field are now all available
CS141-L4-45 Tarun Soni, Summer ‘03
Cycle 3 for beq: EXecute
• In cycle 1, PC was incremented by 4
• In cycle 2, ALUout was set to branch target
• This cycle, we conditionally reset PC: if (A==B) PC=ALUout
[Figure: full multicycle datapath, with A, B and ALUout marked]
CS141-L4-46 Tarun Soni, Summer ‘03
• Cycle 3 (EXecute)
ALUout = A op B
• Cycle 4 (WriteBack)
Reg[IR[15-11]] = ALUout
R-type instruction is finished
Cycle 3: R-type Instruction
CS141-L4-47 Tarun Soni, Summer ‘03
Cycle 3: R-type Instruction
Cycle 3: ALUout = A op B
Cycle 4: Reg[IR[15-11]] = ALUout
[Figure: full multicycle datapath, with A and B marked]
CS141-L4-48 Tarun Soni, Summer ‘03
Cycle 4: R-type Instruction
Cycle 3: ALUout = A op B
Cycle 4: Reg[IR[15-11]] = ALUout
[Figure: full multicycle datapath, with A, B and ALUout marked]
CS141-L4-49 Tarun Soni, Summer ‘03
Multicycle CPU: The datapath
[Figure: abstract datapath (PC, Next PC, Instruction Fetch, Operand Fetch, Ext, ALU, Mem Access with Data Mem, Result Store with Reg. File) with registers IR, A, B, R and M inserted between the steps; control signals nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegWr, and Equal]
Extra Registers:
• IR
• A, B
• R (sometimes called S or ALUout)
• M
CS141-L4-50 Tarun Soni, Summer ‘03
Multicycle CPU: The datapath
• Logical Register Transfers
ADDU: R[rd] <– R[rs] + R[rt]; PC <– PC + 4
• Physical Register Transfers
ADDU:
    IR <– MEM[pc]
    A <– R[rs]; B <– R[rt]
    S <– A + B
    R[rd] <– S; PC <– PC + 4
[Figure: multicycle datapath with registers IR, A, B, S and M]
CS141-L4-51 Tarun Soni, Summer ‘03
Multicycle CPU: The datapath
• Logical Register Transfers
ORI: R[rt] <– R[rs] OR zx(Im16); PC <– PC + 4
• Physical Register Transfers
ORI:
    IR <– MEM[pc]
    A <– R[rs]; B <– R[rt]
    S <– ( A or ZeroExt(Im16) )
    R[rt] <– S; PC <– PC + 4
[Figure: multicycle datapath with registers IR, A, B, S and M]
CS141-L4-52 Tarun Soni, Summer ‘03
Multicycle CPU: The datapath
• Logical Register Transfers
LW: R[rt] <– MEM(R[rs] + sx(Im16)); PC <– PC + 4
• Physical Register Transfers
LW:
    IR <– MEM[pc]
    A <– R[rs]; B <– R[rt]
    S <– A + SignEx(Im16)
    M <– MEM[S]
    R[rt] <– M; PC <– PC + 4
[Figure: multicycle datapath with registers IR, A, B, S and M]
CS141-L4-53 Tarun Soni, Summer ‘03
Multicycle CPU: The datapath
• Logical Register Transfers
SW: MEM(R[rs] + sx(Im16)) <– R[rt]; PC <– PC + 4
• Physical Register Transfers
SW:
    IR <– MEM[pc]
    A <– R[rs]; B <– R[rt]
    S <– A + SignEx(Im16)
    MEM[S] <– B; PC <– PC + 4
[Figure: multicycle datapath with registers IR, A, B, S and M]
CS141-L4-54 Tarun Soni, Summer ‘03
Multicycle CPU: The datapath
• Logical Register Transfers
BEQ: if R[rs] == R[rt]
     then PC <= PC + sx(Im16) || 00
     else PC <= PC + 4
• Physical Register Transfers
BEQ & ~Eq:
    IR <– MEM[pc]
    PC <– PC + 4
BEQ & Eq:
    IR <– MEM[pc]
    PC <– PC + sx(Im16) || 00
[Figure: multicycle datapath with registers IR, A, B, S and M]
CS141-L4-55 Tarun Soni, Summer ‘03
Multicycle CPU: Summary
Step: Instruction fetch (all instructions)
    IR = Memory[PC]
    PC = PC + 4
Step: Instruction decode / register fetch (all instructions)
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step: Execution, address computation, branch/jump completion
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step: Memory access or R-type completion
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut]
    Store: Memory[ALUOut] = B
Step: Memory read completion
    Load: Reg[IR[20-16]] = MDR
CS141-L4-56 Tarun Soni, Summer ‘03
Multicycle CPU: Mid-term alert !!
• How many cycles will it take to execute this code?
    lw $t2, 0($t3)
    lw $t3, 4($t3)
    beq $t2, $t3, Label    #assume not
    add $t5, $t2, $t3
    sw $t5, 8($t3)
Label: ...
• What is going on during the 8th cycle of execution?
• In what cycle does the actual addition of $t2 and $t3 take place?
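One way to check answers to these questions is to lay the instructions end to end on a cycle timeline. The sketch below assumes the cycle counts given earlier in the lecture (load 5, store and R-type 4, beq 3), that the branch falls through, and that step 3 of every instruction is its ALU step; the names are hypothetical.

```python
# Cycles per instruction in the multicycle design (from the earlier slides)
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}
# The code sequence from the slide, with the branch falling through
PROGRAM = ["lw", "lw", "beq", "add", "sw"]


def timeline(program):
    """Return one (instruction index, opcode, step number) entry per clock cycle."""
    schedule = []
    for index, op in enumerate(program):
        for step in range(1, CYCLES[op] + 1):
            schedule.append((index, op, step))
    return schedule


schedule = timeline(PROGRAM)
total_cycles = len(schedule)                             # 5 + 5 + 3 + 4 + 4
eighth_cycle = schedule[8 - 1]                           # cycles are numbered from 1
add_execute_cycle = schedule.index((3, "add", 3)) + 1    # ALU step of the add
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is the third step (address computation) of the second `lw`, and the addition of `$t2` and `$t3` happens in cycle 16.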
CS141-L4-57 Tarun Soni, Summer ‘03
Multicycle CPU: Sharing Hardware
“Princeton” Organization
• Single memory for instruction and data access
– memory utilization -> 1.3/4.8
• In this case our state diagram does not change
– several additional control signals
– must ensure each bus is only driven by one source on each cycle
[Figure: bus-organized datapath with A-Bus, B-Bus and W-Bus connecting RegFile, A, B, IR, S, PC, nextPC, ZX, SX and Mem]
CS141-L4-58 Tarun Soni, Summer ‘03
Multicycle CPU: Control Line Timing
[Figure: multicycle datapath above a timing diagram. Load occupies cycles 1 to 5 (Ifetch, Reg, Exec, Mem, Wr), Store occupies cycles 6 to 9 (Ifetch, Reg, Exec, Mem), and an R-type begins in cycle 10 (Ifetch); the IRWrite waveform is drawn against Clk]
CS141-L4-59 Tarun Soni, Summer ‘03
Review: Finite State Machines
• Finite state machines:
– a set of states and
– next state function (determined by current state and the input)
– output function (determined by current state and possibly input)
– We’ll use a Moore machine (output based only on current state)
[Figure: inputs and the current state feed the next-state function; the next state is clocked into the state register; the output function produces the outputs]
CS141-L4-60 Tarun Soni, Summer ‘03
Multicycle CPU: Control
[Figure: control logic block. Inputs: instruction register opcode field (Op5..Op0) and state register (S3..S0). Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next state NS3..NS0]
if (State == InstructionFetch)
{
    IRWrite = 1;
    // All other signals are 0;
    State = OperandFetch;
}
if (State == Execute && InstructionOpCode == BEQ)
{
    // Do your thing..
}
ControlOutput = f(State, OpCode)
NextState = f(State, OpCode)
CS141-L4-61 Tarun Soni, Summer ‘03
Multicycle CPU: Our basic FSM
Instruction fetch -> Decode and Register Fetch -> one of:
– Memory instructions
– R-type instructions
– Branch instructions
– Jump instruction
CS141-L4-62 Tarun Soni, Summer ‘03
Multicycle CPU: Control
“instruction fetch”: IR <= MEM[PC]
“decode / operand fetch”: A <= R[rs]; B <= R[rt]
Execute, Memory and Write-back, by instruction:
R-type: S <= A fun B, then R[rd] <= S; PC <= PC + 4
ORi: S <= A or ZX, then R[rt] <= S; PC <= PC + 4
LW: S <= A + SX, then M <= MEM[S], then R[rt] <= M; PC <= PC + 4
SW: S <= A + SX, then MEM[S] <= B; PC <= PC + 4
BEQ & ~Equal: PC <= PC + 4
BEQ & Equal: PC <= PC + SX || 00
CS141-L4-63 Tarun Soni, Summer ‘03
Multicycle CPU: Control
State diagram (Start at state 0):
State 0, Instruction fetch: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next: state 1.
State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: state 2 if (Op = 'LW') or (Op = 'SW'); state 6 if (Op = R-type); state 8 if (Op = 'BEQ'); state 9 if (Op = 'J').
State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: state 3 if (Op = 'LW'); state 5 if (Op = 'SW').
State 3, Memory access: MemRead, IorD = 1. Next: state 4.
State 4, Write-back step: RegDst = 0, RegWrite, MemtoReg = 1. Next: state 0.
State 5, Memory access: MemWrite, IorD = 1. Next: state 0.
State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: state 7.
State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0. Next: state 0.
State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Next: state 0.
State 9, Jump completion: PCWrite, PCSource = 10. Next: state 0.
Number of states?
Number of bits for state?
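The state diagram on this slide can be read as a next-state function of (state, opcode). The sketch below encodes the transitions of the ten-state diagram (states 0 to 9); the table layout, the opcode strings and the helper names are hypothetical, and the transitions assume the usual textbook arrangement where the extracted figure is ambiguous.

```python
import math

# Next state for each state; states with a choice map opcode -> next state
NEXT_STATE = {
    0: lambda op: 1,                                                    # fetch -> decode
    1: lambda op: {"lw": 2, "sw": 2, "R": 6, "beq": 8, "j": 9}[op],     # dispatch on opcode
    2: lambda op: {"lw": 3, "sw": 5}[op],                               # address computed
    3: lambda op: 4,                                                    # load: memory read
    4: lambda op: 0,                                                    # load: write-back
    5: lambda op: 0,                                                    # store: memory write
    6: lambda op: 7,                                                    # R-type: execute
    7: lambda op: 0,                                                    # R-type: completion
    8: lambda op: 0,                                                    # branch completion
    9: lambda op: 0,                                                    # jump completion
}


def cycles(op):
    """Count the clock cycles one instruction spends in the FSM."""
    state, count = 0, 0
    while True:
        count += 1
        state = NEXT_STATE[state](op)
        if state == 0:
            return count


# Minimum number of state-register bits needed to tell the states apart
STATE_BITS = math.ceil(math.log2(len(NEXT_STATE)))
```

Walking the table reproduces the cycle counts quoted earlier (5 for `lw`, 4 for `sw` and R-type, 3 for `beq`), and ten states need a 4-bit state register.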
CS141-L4-64 Tarun Soni, Summer ‘03
Multicycle CPU: Control: Assigning States
0000 “instruction fetch”: IR <= MEM[PC]
0001 “decode”: A <= R[rs]; B <= R[rt]
0010 BEQ & Equal: PC <= PC + SX || 00
0011 BEQ & ~Equal: PC <= PC + 4
0100 R-type, Execute: S <= A fun B
0101 R-type, Write-back: R[rd] <= S; PC <= PC + 4
0110 ORi, Execute: S <= A or ZX
0111 ORi, Write-back: R[rt] <= S; PC <= PC + 4
1000 LW, Execute: S <= A + SX
1001 LW, Memory: M <= MEM[S]
1010 LW, Write-back: R[rt] <= M; PC <= PC + 4
1011 SW, Execute: S <= A + SX
1100 SW, Memory: MEM[S] <= B; PC <= PC + 4
CS141-L4-65 Tarun Soni, Summer ‘03
Multicycle CPU: Detailed control spec.
Columns: State | Op field | Eq | Next | IR | PC (en, sel) | Ops (A, B) | Exec (Ex, Sr, ALU, S) | Mem (R, W, M) | Write-Back (M-R, Wr, Dst)
Each row lists its non-blank control entries from left to right.
0000  ??????  ?  0001   1
0001  BEQ     0  0011   1 1
0001  BEQ     1  0010   1 1
0001  R-type  x  0100   1 1
0001  orI     x  0110   1 1
0001  LW      x  1000   1 1
0001  SW      x  1011   1 1
0010  xxxxxx  x  0000   1 1
0011  xxxxxx  x  0000   1 0
R:
0100  xxxxxx  x  0101   0 1 fun 1
0101  xxxxxx  x  0000   1 0 0 1 1
ORi:
0110  xxxxxx  x  0111   0 0 or 1
0111  xxxxxx  x  0000   1 0 0 1 0
LW:
1000  xxxxxx  x  1001   1 0 add 1
1001  xxxxxx  x  1010   1 0 0
1010  xxxxxx  x  0000   1 0 1 1 0
SW:
1011  xxxxxx  x  1100   1 0 add 1
1100  xxxxxx  x  0000   1 0 0 1
CS141-L4-66 Tarun Soni, Summer ‘03
Multicycle CPU: Implementation styles
• ROM = "Read Only Memory"
– values of memory locations are fixed ahead of time
• A ROM can be used to implement a truth table
– if the address is m-bits, we can address 2^m entries in the ROM.
– our outputs are the bits of data that the address points to.
– 2^m is the "height", and n is the "width"
Example with m = 3 address bits and n = 4 output bits:
address   outputs
0 0 0     0 0 1 1
0 0 1     1 1 0 0
0 1 0     1 1 0 0
0 1 1     1 0 0 0
1 0 0     0 0 0 0
1 0 1     0 0 0 1
1 1 0     0 1 1 0
1 1 1     0 1 1 1
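The truth table on this slide (3 address bits, 4 output bits) maps directly onto a lookup table, which is all a ROM is. A minimal sketch with hypothetical names:

```python
# One 4-bit word per address; the index is the 3-bit address (m = 3, n = 4)
ROM = [
    0b0011,  # address 000
    0b1100,  # address 001
    0b1100,  # address 010
    0b1000,  # address 011
    0b0000,  # address 100
    0b0001,  # address 101
    0b0110,  # address 110
    0b0111,  # address 111
]


def rom_read(address, width=4):
    """Return the output bits stored at `address`, most significant bit first."""
    word = ROM[address]
    return [(word >> bit) & 1 for bit in range(width - 1, -1, -1)]
```

The list has 2^3 = 8 entries (the "height") and each entry is 4 bits wide (the "width"); reading address 001 returns the bits 1, 1, 0, 0.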
CS141-L4-67 Tarun Soni, Summer ‘03
Multicycle CPU: Implementation styles
• How many inputs are there?
6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
• How many outputs are there?
16 datapath-control outputs, 4 state bits = 20 outputs
• ROM is 2^10 x 20 = 20K bits (and a rather unusual size)
• Rather wasteful, since for lots of the entries, the outputs are the same, i.e., opcode is often ignored
CS141-L4-68 Tarun Soni, Summer ‘03
Multicycle CPU: Implementation styles
• Break up the table into two parts
– 4 state bits tell you the 16 outputs, 2^4 x 16 bits of ROM
– 10 bits tell you the 4 next state bits, 2^10 x 4 bits of ROM
– Total: 4.3K bits of ROM
• PLA is much smaller
– can share product terms
– only need entries that produce an active output
– can take into account don't cares
• Size is (#inputs x #product-terms) + (#outputs x #product-terms)
For this example = (10x17)+(20x17) = 510 PLA cells
• PLA cells usually about the size of a ROM cell (slightly bigger)
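The size comparison on these slides is simple arithmetic and can be checked directly. A short sketch; the function names are hypothetical, and the figures (10 inputs, 20 outputs, 17 product terms) are the ones quoted on the slides.

```python
def rom_bits(address_bits, output_bits):
    """A ROM stores one output word for every possible address."""
    return (2 ** address_bits) * output_bits


def pla_cells(inputs, outputs, product_terms):
    """PLA size: AND plane (inputs x terms) plus OR plane (outputs x terms)."""
    return inputs * product_terms + outputs * product_terms


single_rom = rom_bits(10, 20)                     # one table: 1024 x 20
split_rom = rom_bits(4, 16) + rom_bits(10, 4)     # outputs table + next-state table
pla = pla_cells(10, 20, 17)
```

This gives 20480 bits (20K) for the single ROM, 256 + 4096 = 4352 bits (about 4.3K) for the split ROMs, and 170 + 340 = 510 cells for the PLA.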
CS141-L4-69 Tarun Soni, Summer ‘03
Multicycle CPU: Implementation styles
PLA Implementation
[Figure: PLA with inputs Op5..Op0 and S3..S0, and outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0]
IRWrite = (!S0 && !S1 && !S2 && !S3)
NS0 = ( S[3..0] == 0000 ) || ( S[3..0] == 0110 ) || ( S[3..0] == 1001 && OP[5..0] == 000010 ) || (…)(…)
CS141-L4-70 Tarun Soni, Summer ‘03
Microprogramming
[Figure: the multicycle control FSM state diagram, states 0 to 9, from instruction fetch through jump completion]
• Control is the hard part of processor design
° Datapath is fairly regular and well-organized
° Memory is highly regular
° Control is irregular and global
Consider the FSM in case of 100s of instructions !!!
• FSMs get unmanageable quickly as they grow.
– hard to specify
– hard to manipulate
– error prone
– hard to visualize
• The state diagrams that define the controller for an instruction set processor are highly structured
• Use this structure to construct a simple “microsequencer”
• Control reduces to programming this very simple device
– microprogramming
CS141-L4-71 Tarun Soni, Summer ‘03
Microprogramming
[Figure: the opcode and the state register are inputs to the control logic (PLA or ROM), whose outputs drive the multicycle datapath; an adder (state + 1) and the address select logic choose the next state]
Types of “branching”
• Set state to 0
• Dispatch (state 1)
• Use incremented state number
Common case: State += 1;
Microprogramming: A particular strategy for implementing the control unit of a processor by "programming" at the level of register transfer operations
Microarchitecture: Logical structure and functional capabilities of the hardware as seen by the microprogrammer
Historical Note:
IBM 360 Series first to distinguish between architecture & organization. Same instruction set across wide range of implementations, each with different cost/performance
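The three kinds of "branching" listed on this slide are all the address select logic has to implement. The sketch below is one possible model, not the lecture's microcode: the labels, the `"seq"` / `"fetch"` / `"dispatch"` action names and the second dispatch point (for memory instructions) are assumptions based on the ten-state FSM.

```python
# One entry per micro-address: (label, sequencing action)
MICROPROGRAM = [
    ("Fetch",    "seq"),        # 0: use incremented state number
    ("Decode",   "dispatch"),   # 1: dispatch on the opcode
    ("Mem1",     "dispatch"),   # 2: load and store part ways here (assumed)
    ("LW2",      "seq"),        # 3
    ("LW3",      "fetch"),      # 4: set state to 0
    ("SW2",      "fetch"),      # 5
    ("Rformat1", "seq"),        # 6
    ("Rformat2", "fetch"),      # 7
    ("BEQ1",     "fetch"),      # 8
    ("JUMP1",    "fetch"),      # 9
]

# Dispatch tables: micro-address -> opcode -> next micro-address
DISPATCH = {
    1: {"lw": 2, "sw": 2, "R": 6, "beq": 8, "j": 9},
    2: {"lw": 3, "sw": 5},
}


def next_address(address, opcode):
    """Address select logic: increment, return to fetch, or dispatch."""
    action = MICROPROGRAM[address][1]
    if action == "seq":
        return address + 1
    if action == "fetch":
        return 0
    return DISPATCH[address][opcode]


def trace(opcode):
    """List the microinstruction labels executed for one instruction."""
    address, labels = 0, []
    while True:
        labels.append(MICROPROGRAM[address][0])
        address = next_address(address, opcode)
        if address == 0:
            return labels
```

For a load, `trace("lw")` visits Fetch, Decode, Mem1, LW2 and LW3. Most steps simply increment the micro-address, which is why an adder plus a small dispatch table can stand in for the full next-state logic.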
CS141-L4-72 Tarun Soni, Summer ‘03
Macro-Micro programming?
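[Figure: main memory holds the user program plus data (ADD, SUB, AND, ..., DATA); this can change. Inside the CPU, control memory holds the microsequences that drive the execution unit. Each instruction in main memory is mapped into one microsequence in control memory, e.g., AND maps to the AND microsequence.]
A microsequence, e.g.:
• Fetch
• Calc Operand Addr
• Fetch Operand(s)
• Calculate
• Save Answer(s)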
CS141-L4-73 Tarun Soni, Summer ‘03
Horizontal Microinstructions
° “Horizontal” Microcode
– control field for each control point in the machine
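Microinstruction fields: µseq | µaddr | A-mux | B-mux | bus enables | register enables
[Figure: the opcode (instruction decode) and the datapath conditions feed the control logic / store (PLA, ROM); the microinstruction it produces drives the control points of the datapath.]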
Depending on bus organization, many potential control combinations are simply wrong, i.e., they imply transfers that can never happen at the same time.
Idea: encode fields to save ROM space
Example: mem_to_reg and ALU_to_reg should never happen simultaneously => encode them in a single bit which is decoded, rather than in two separate bits
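The encoding idea can be illustrated with a short Python sketch: a group of mutually exclusive control lines is stored as one small encoded field and expanded by a local decoder. The helper names are hypothetical, and the sketch assumes exactly one line in the group is always selected (whether the register is actually written is left to a separate write enable).

```python
def encoded_width(n_signals):
    """Bits needed to select one of n mutually exclusive control lines."""
    return max(1, (n_signals - 1).bit_length())


def decode(value, signals):
    """Local decoder: expand an encoded field into one-hot control lines."""
    return {name: index == value for index, name in enumerate(signals)}


# The slide's example: two lines that must never be asserted together.
REG_SOURCE = ["ALU_to_reg", "mem_to_reg"]

print(encoded_width(len(REG_SOURCE)))   # bits stored in the ROM for this field
print(decode(0, REG_SOURCE))            # field value 0 selects ALU_to_reg
print(decode(1, REG_SOURCE))            # field value 1 selects mem_to_reg
```

The price of the narrower ROM word is the extra decode logic between the control store and the control points.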
CS141-L4-74 Tarun Soni, Summer ‘03
Vertical Microinstructions
° “Vertical” Microcode
– encoded control fields with local decode
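Microinstruction fields: src | dst | other control fields | next states | inputs
[Figure: the src and dst fields each pass through a local decoder (DEC); a MUX also appears in the figure.]
Some of these may have nothing to do with registers!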
CS141-L4-75 Tarun Soni, Summer ‘03
Design Microinstruction Sets
1) Start with list of control signals
2) Group signals together that make sense (vs. random): called “fields”
3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
4) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
– Use computers to design computers
5) To minimize the width, encode operations that will never be used at the same time
CS141-L4-76 Tarun Soni, Summer ‘03
Microinstructions
Start with list of control signals, grouped into fields

Single Bit Control
Signal name   Effect when deasserted        Effect when asserted
ALUSelA       1st ALU operand = PC          1st ALU operand = Reg[rs]
RegWrite      None                          Reg. is written
MemtoReg      Reg. write data input = ALU   Reg. write data input = memory
RegDst        Reg. dest. no. = rt           Reg. dest. no. = rd
TargetWrite   None                          Target reg. = ALU
MemRead       None                          Memory at address is read
MemWrite      None                          Memory at address is written
IorD          Memory address = PC           Memory address = ALU
IRWrite       None                          IR = Memory
PCWrite       None                          PC = PCSource
PCWriteCond   None                          IF ALUzero then PC = PCSource

Multiple Bit Control
Signal name   Value   Effect
ALUOp         00      ALU adds
              01      ALU subtracts
              10      ALU does function code
              11      ALU does logical OR
ALUSelB       000     2nd ALU input = Reg[rt]
              001     2nd ALU input = 4
              010     2nd ALU input = sign extended IR[15-0]
              011     2nd ALU input = sign extended, shift left 2 IR[15-0]
              100     2nd ALU input = zero extended IR[15-0]
PCSource      00      PC = ALU
              01      PC = Target
              10      PC = PC+4[29-26] : IR[25–0] << 2
CS141-L4-77 Tarun Soni, Summer ‘03
Microinstructions
Field Name        Width (wide)  Width (narrow)  Control Signals Set
ALU Control       4             2               ALUOp
SRC1              2             1               ALUSelA
SRC2              5             3               ALUSelB
ALU Destination   6             4               RegWrite, MemtoReg, RegDst, TargetWr.
Memory            4             3               MemRead, MemWrite, IorD
Memory Register   1             1               IRWrite
PCWrite Control   5             4               PCWrite, PCWriteCond, PCSource
Sequencing        3             2               AddrCtl
Total width       30            20              bits
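As a quick arithmetic check on the table, the per-field widths can be summed in a few lines of Python. The field widths are the ones from the table; the count of 10 microinstructions used for the control store size is only an assumed figure for illustration.

```python
# (wide, narrow) width in bits for each microinstruction field, from the table.
FIELDS = {
    "ALU Control": (4, 2),
    "SRC1": (2, 1),
    "SRC2": (5, 3),
    "ALU Destination": (6, 4),
    "Memory": (4, 3),
    "Memory Register": (1, 1),
    "PCWrite Control": (5, 4),
    "Sequencing": (3, 2),
}


def total_widths(fields):
    """Sum the wide and narrow encodings over all fields."""
    wide = sum(w for w, _ in fields.values())
    narrow = sum(n for _, n in fields.values())
    return wide, narrow


def control_store_bits(width, n_microinstructions):
    """Total ROM bits for a microprogram of the given length."""
    return width * n_microinstructions


wide, narrow = total_widths(FIELDS)
print("wide:", wide, "narrow:", narrow)
# Assumed microprogram length of 10 microinstructions, for illustration only.
print(control_store_bits(wide, 10), "vs", control_store_bits(narrow, 10), "bits")
```

The narrow format saves a third of the control store in this count, at the cost of extra decoding between the ROM and the control points.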
CS141-L4-78 Tarun Soni, Summer ‘03
Microinstructions: MIPS field name and values
Field Name        Values for Field   Function of Field with Specific Value
ALU               Add                ALU adds
                  Subt.              ALU subtracts
                  Func code          ALU does function code
                  Or                 ALU does logical OR
SRC1              PC                 1st ALU input = PC
                  rs                 1st ALU input = Reg[rs]
SRC2              4                  2nd ALU input = 4
                  Extend             2nd ALU input = sign ext. IR[15-0]
                  Extend0            2nd ALU input = zero ext. IR[15-0]
                  Extshft            2nd ALU input = sign ex., sl IR[15-0]
                  rt                 2nd ALU input = Reg[rt]
ALU destination   Target             Target = ALUout
                  rd                 Reg[rd] = ALUout
Memory            Read PC            Read memory using PC
                  Read ALU           Read memory using ALU output
                  Write ALU          Write memory using ALU output
Memory register   IR                 IR = Mem
                  Write rt           Reg[rt] = Mem
                  Read rt            Mem = Reg[rt]
PC write          ALU                PC = ALU output
                  Target-cond.       IF ALU Zero then PC = Target
                  jump addr.         PC = PCSource
Sequencing        Seq                Go to sequential µinstruction
                  Fetch              Go to the first microinstruction
                  Dispatch           Dispatch using ROM.
CS141-L4-79 Tarun Soni, Summer ‘03
Microinstructions: The datapath again
[Figure: multicycle datapath with PC, memory, instruction register, memory data register, register file, sign extend, shift left 2, A and B registers, ALU, ALU control and ALUOut; control points IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB and ALUOp.]
Field Name        Values for Field   Function of Field with Specific Value
SRC1              PC                 1st ALU input = PC
                  rs                 1st ALU input = Reg[rs]
SRC2              4                  2nd ALU input = 4
                  Extend             2nd ALU input = sign ext. IR[15-0]
                  Extend0            2nd ALU input = zero ext. IR[15-0]
                  Extshft            2nd ALU input = sign ex., sl IR[15-0]
                  rt                 2nd ALU input = Reg[rt]
ALU destination   Target             Target = ALUout
                  rd                 Reg[rd] = ALUout
CS141-L4-80 Tarun Soni, Summer ‘03
Microinstructions: Pros-Cons
• Specification Advantages:
– Easy to design and write
– Design architecture and microcode in parallel
• Implementation (off-chip ROM) Advantages
– Easy to change since values are in memory
– Can emulate other architectures and instruction sets
– Can make use of internal registers
• Implementation Disadvantages, SLOWER now that:
– Control is implemented on same chip as processor
– ROM is no longer faster than RAM
– No need to go back and make changes
CS141-L4-81 Tarun Soni, Summer ‘03
CPU Control: Methodology
Initial representation:     Finite state diagram          | Microprogram
Sequencing control:         Explicit next state function  | Microprogram counter + dispatch ROMs
Logic representation:       Logic equations               | Truth tables
Implementation technique:   Programmable logic array      | Read only memory
CS141-L4-82 Tarun Soni, Summer ‘03
Microprogramming: the last word ?
Summary: Microprogramming one inspiration for RISC
• If simple instructions could execute at very high clock rate…
• If you could even write compilers to produce microinstructions…
• If most programs use simple instructions and addressing modes…
• If microcode is kept in RAM instead of ROM so as to fix bugs …
• If same memory used for control memory could be used instead as cache for “macroinstructions”…
• Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine? (microprogramming is overkill when ISA matches datapath 1-1)
CS141-L4-83 Tarun Soni, Summer ‘03
Exceptions
Supporting exceptions in our FSM
[Figure: FSM with detection of illegal instructions. Start enters state 0.]
Instruction Fetch, state 0: MemRead, ALUSelA = 0, IorD = 0, IRWrite, ALUSelB = 01, ALUOp = 00, PCWrite, PCSource = 00
Instruction Decode/Register Fetch, state 1: ALUSelA = 0, ALUSelB = 11, ALUOp = 00, TargetWrite
From state 1:
• Opcode = LW or SW: Memory Inst FSM
• Opcode = R-type: R-type Inst FSM
• Opcode = BEQ: Branch Inst FSM
• Opcode = JMP: Jump Inst FSM
• Opcode = anything else: to state 10
CS141-L4-84 Tarun Soni, Summer ‘03
Exceptions
Supporting exceptions in our FSM
[Figure: R-type instructions, with overflow detection.]
From state 1: ALUSelA = 1, ALUSelB = 00, ALUOp = 10
Then: ALUSelA = 1, RegDst = 1, RegWrite, MemtoReg = 0, ALUSelB = 10, ALUOp = 10; to state 0
On overflow: to state 11
CS141-L4-85 Tarun Soni, Summer ‘03
Exceptions
[Figure: exception states added to the FSM, and the datapath additions: EPC and Cause registers with EPCWrite, CauseWrite and IntCause controls, a "sub 4" path from PC into EPC, and the interrupt handler address as a new PCSource input.]
state 10 (illegal instruction): IntCause = 0, CauseWrite
state 11 (arithmetic overflow): IntCause = 1, CauseWrite
state 12: ALUSelA = 0, ALUSelB = 01, ALUOp = 01, EPCWrite
state 13: PCWrite, PCSource = 11; to state 0 (fetch)
Supporting exceptions in our FSM:
• Write Cause into register
• Write PC into EPC
• Load Exception Handler address to PC
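The three steps (write the cause, save the PC into EPC, load the handler address) can be sketched in Python as a sequence of register transfers. This is a rough model only: the handler address value is a placeholder, the IntCause encoding (0 for illegal instruction, 1 for overflow) is assumed, and the class and function names are made up for illustration.

```python
from dataclasses import dataclass

# Assumed IntCause encoding; the values are illustrative.
ILLEGAL_INSTRUCTION = 0
ARITHMETIC_OVERFLOW = 1

# Placeholder value: the slides name the handler address but give no number.
HANDLER_ADDRESS = 0x8000_0080


@dataclass
class ExceptionRegs:
    pc: int          # already incremented by 4 during instruction fetch
    epc: int = 0
    cause: int = 0


def take_exception(regs, int_cause):
    """Apply the three register transfers performed on an exception."""
    regs.cause = int_cause       # CauseWrite with IntCause selecting the value
    regs.epc = regs.pc - 4       # EPCWrite: ALU computes PC - 4
    regs.pc = HANDLER_ADDRESS    # PCWrite with the handler address selected
    return regs


regs = take_exception(ExceptionRegs(pc=0x1008), ARITHMETIC_OVERFLOW)
print(hex(regs.epc), regs.cause, hex(regs.pc))
```

The subtraction of 4 is needed because the PC was already advanced in the fetch state, so PC - 4 is the address of the instruction that caused the exception.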
CS141-L4-86 Tarun Soni, Summer ‘03
Exceptions
[Figure: register-transfer view of the FSM with exceptions. State codes shown in the figure: 0010, 0011.]
Fetch: IR <= MEM[PC]; PC <= PC + 4
Decode: A <= R[rs]; B <= R[rt]
R-type: S <= A fun B; R[rd] <= S
ORi: S <= A op ZX; R[rt] <= S
LW: S <= A + SX; M <= MEM[S]; R[rt] <= M
SW: S <= A + SX; MEM[S] <= B
BEQ: S <= A - B; if Equal: PC <= PC + SX || 00; if ~Equal: no change to PC
other (undefined instruction): EPC <= PC - 4; PC <= exp_addr; cause <= 10 (RI)
overflow (additional condition from datapath): EPC <= PC - 4; PC <= exp_addr; cause <= 12 (Ovf)
CS141-L4-87 Tarun Soni, Summer ‘03
Summary
• multicycle CPUs make things faster: the clock cycle is shorter and each instruction uses only the cycles it needs.
• control is harder.
• microprogramming can simplify (conceptually) CPU control generation
• a microprogram is a small program inside the CPU that executes the individual instructions of the “real” program.
• exception-handling is difficult in the CPU, because the interactions between the executing instructions and the interrupt are complex and unpredictable.
CS141-L4-88 Tarun Soni, Summer ‘03
Mid-Term Review
• Technology trends: Design for the future
• Instruction Set Architectures: types of ISAs: Addressing modes, length of instruction etc.
• MIPS instruction format, basic classes of instructions
• Registers and load store architectures
• Data types, operands, memory organization/addressing
• Basic MIPS instructions: Arithmetic, logical, data transfer, branching, jumps
• Issues in jump/branching distance and immediate addressing modes
• Stacks and frames
• E.g., swap(), leaf_procedure(), nested_procedure()
• Performance: Relative (Boeing e.g.), Metrics, Benchmarking, SPEC marks
• Performance (CPU time) = Instruction Count x Cycles/Instruction x Seconds/Cycle
• Amdahl’s law: Execution Time After Improvement = Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Arithmetic: 2s complement
• Basic digital logic, 1-bit adder, full adder, 32-bit adder/subtractor
• ALU: adder+mux+special conditions
• Delays in combinational logic, clocking
• Ripple carry vs. Carry look ahead adders
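For review, the two performance formulas can be tried on numbers with a small Python sketch. The function names and the example figures (a 100 s program with 80 s spent in the part being improved, sped up 4x) are made up for illustration.

```python
def cpu_time(instruction_count, cycles_per_instruction, seconds_per_cycle):
    """CPU time = Instruction Count x Cycles/Instruction x Seconds/Cycle."""
    return instruction_count * cycles_per_instruction * seconds_per_cycle


def time_after_improvement(unaffected, affected, improvement):
    """Amdahl's law: only the affected part of the time shrinks."""
    return unaffected + affected / improvement


# Illustrative numbers: 20 s unaffected, 80 s affected, affected part 4x faster.
before = 20 + 80
after = time_after_improvement(20, 80, 4)
print("time after improvement:", after, "s")
print("overall speedup:", before / after)

# Illustrative numbers: one million instructions, CPI of 2, 1 ns cycle.
print("CPU time:", cpu_time(1_000_000, 2.0, 1e-9), "s")
```

Even with the affected part running four times faster, the overall speedup stays well below 4 because the unaffected 20 s does not shrink.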
CS141-L4-89 Tarun Soni, Summer ‘03
Mid-Term Review
• Multiplication & Division: grade school version
• 3 incrementally better algorithms (data paths)
• Basics of Booth arithmetic
• Floating point representation
• Floating point operations (+,-,*,/)
• Guard, round and sticky bits
• Single cycle CPU
• Building blocks: Register files, memory etc.
• Storage units, clocking methodology
• PC arithmetic
• Instruction fetch
• Datapath on various operations: Load, Store, Branch, R-type, I-type
• Control: basic control signals for the MIPS subset
• Distributed control: Main control + ALU control
• PLA implementation
• Timing diagrams
CS141-L4-90 Tarun Soni, Summer ‘03
Mid-Term Review
• Multi-cycle CPU
• Datapath: registers/stages: Ifetch, A, B, Execute, Store etc.
• Various instructions through the datapath
• Control: Sharing functional units
• Finite state machine perspective for control: FSM for MIPS
• Implementation styles: ROM, PLA
• Microprogramming: Horizontal, vertical, relationship to RISC
• Exceptions: change in FSM, internal, external; need to save state.