CSE-45432 Introduction to Computer Architecture Chapter …bai/CSE45432/Comp_arch_chp5_3RD.pdf ·...
Transcript of CSE-45432 Introduction to Computer Architecture Chapter …bai/CSE45432/Comp_arch_chp5_3RD.pdf ·...
1
CSE-45432 Introduction to Computer
Architecture
Chapter 5 The Processor:
Datapath & Control
Dr. Izadi
ECE 45432 SUNY New Paltz2
RegistersRegister #
Data
Register #
Data�memory
Address
Data
Register #
PC Instruction ALU
Instruction�memory
AddressProcessorWe're ready to look at an implementation of the MIPSSimplified to contain only:
memory-reference instructions: lw, swarithmetic-logical instructions: add, sub, and, or, sltcontrol flow instructions: beq, j
Generic Implementation:use the program counter (PC) to supply instruction addressget the instruction from memoryread (write) from (to) registersthe op-code determines exactly what to do
All instructions use the ALU after reading the registersWhy? memory-reference? arithmetic? control flow?
ECE 45432 SUNY New Paltz3
More Implementation Details
Abstract / Simplified View:
Two types of functional units:Elements that operate on data values (combinational)
ALU
Elements that contain state (sequential)Registers and memory
RegistersRegister #
Data
Register #
Data�memory
Address
Data
Register #
PC Instruction ALU
Instruction�memory
Address
ECE 45432 SUNY New Paltz4
Unclocked vs. ClockedClocks used in synchronous logic
when should an element that contains state be updated?
State Elements
Clock period Rising edge
Falling edge
cycle time
ECE 45432 SUNY New Paltz5
An Unclocked State Element
The set-reset latchOutput depends on present inputs and also on past inputs
S R Q R
S
Q
Q
0 0 Q
0 1 0
1 0 1
1 1 -
ECE 45432 SUNY New Paltz6
Two inputs:the data value to be stored (D)the clock signal (C) indicating when to read & store D
Two outputs:the value of the internal state (Q) and it's complement
D-latch
Q
C
D
_Q
D Q(t+1)
0 0
1 1
D
C
Q
ECE 45432 SUNY New Paltz7
Latches and Flip-flops
Output is equal to the stored value inside the element(don't need to ask for permission to look at the value)Change of state (value) is based on the clock
Latches: whenever the clock is asserted (high or low) and the input changesFlip-flop: whenever the clock is asserted (low to high or high to low transition) and the input changes (edge-triggered methodology)
Note:"logically true“ could mean electrically low
ECE 45432 SUNY New Paltz8
D flip-flop
Output changes only on the clock edge
D
C
Dlatch
D
C
QD
latch
D
C
Q Q
Q Q
D
C
Q
ECE 45432 SUNY New Paltz9
Our Implementation
An edge triggered methodologyTypical execution:
read contents of some state elements, send values through some combinational logicwrite results to one or more state elements
Stateelement
1
Stateelement
2Combinational logic
Clock cycle
ECE 45432 SUNY New Paltz10
Clocking Methodology A clocking methodology defines when signals can be read and written
wouldn't want to read a signal at the same time it was being written
All storage elements are clocked by the same clock edgeCycle Time
CLK-to-Q + Longest Delay Path + Setup + Clock Skew(CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
Clk
Don’t Care
Setup Hold
.
.
.
.
.
.
.
.
.
.
.
.
Setup Hold
ECE 45432 SUNY New Paltz11
Abstraction
Make sure you understand the abstractions!Sometimes it is easy to think you do, when you don’t
Mux
Select
B31
A31
C31
Mux
B30
A30
C30
Mux
B0
A0
C0
......
Mux
C
Select
32
32
32
B
A
ECE 45432 SUNY New Paltz12
Register File
Read operation using D flip-flops and MUX’s
Read registernumber 1 Read
data 1Read registernumber 2
Readdata 2
Writeregister
WriteWritedata
Register file
Read registernumber 1
Register 0
Register 1
. . .
Register n – 2
Register n – 1
Mux
Read registernumber 2
Mux
Read data 1
Read data 2
What is the function of “Mux” above?
ECE 45432 SUNY New Paltz13
Register File – Read and Write
M�u�x
Register 0Register 1
Register n – 1Register n
M�u�x
Read data 1
Read data 2
Read register�number 1
Read register�number 2
Read registernumber 1 Read
data 1Read registernumber 2
Readdata 2
Writeregister
WriteWritedata
Register file
n-to-1decoder
Register0
Register1
Registern– 1C
C
D
DRegistern
C
C
D
D
01
n– 1n
Write
Write Register
Write Data
How many registers can we read and write at the same time?
Does this support MIPS instructions requirements?
ECE 45432 SUNY New Paltz14
Building the Datapath
Include the functional units we need for each instruction
Use multiplexors to stitch them together
ALU Control:000 AND001 OR010 add110 subtract111 set-on-less-than
A LU control
R egW rite
R eg is te rsW riteregister
R eaddata 1
R eaddata 2
R eadregister 1
R eadregister 2
W rited ata
A LUresu lt
ALU
D ata
D ata
R eg is te rnum bers
a . R egisters b. A LU
Zero5
5
5 3
PC
Instructionmemory
Instructionaddress
Instruction
a. Instruction memory b. Program counter
Add Sum
c. Adder
16 32Sign�
extend
b. Sign-extension unit
MemRead
MemWrite
Data�memory
Write�data
Read�data
a. Data memory unit
Address
ECE 45432 SUNY New Paltz15
Datapath for Fetch and R-Instructions
Portion of datapath for fetching instructions and updating PC
R-format instructions: read two registers and write one register
PC
Instructionmemory
Readaddress
Instruction
4
Add
ALU control
RegWrite
RegistersWriteregister
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Writedata
ALUresult
ALUZero
3
Instruction
ALU Control:000 AND001 OR010 add110 subtract111 set-on-less-than
ECE 45432 SUNY New Paltz16
Datapath for lw and swInsturctions
lw $t1, offset_value($t2) or sw $t1, offset_value ($t2)Compute memory addressSign extend 16 bit to 32 bit
1 6 3 2
R e g is te rs
W rite
re g is te r
W rite
d a ta
R e a d
d a ta 1
R e a d
d a ta 2
R e a d
re g is te r 1
R e a d
re g is te r 2
A L U o p e ra tio n3
R e g W rite
M e m R e a d
M e m W rite
A L U
re su lt
Z e ro
A L U
D a ta
m e m o ry
A d d re s s
W rite
d a ta
R e a
d a ta
S ig n
e x te n d
Instruction
ECE 45432 SUNY New Paltz17
Building beq Instruction
beq $t1, $t2, offsetCompare registers, use ALU to affect Z flagIf not equal, next PC <= PC +4If equal, sign extend the offset and shift by two
Instruction
16 32
Add ALUresult
Registers
W riteregister
W ritedata
R eaddata 1
Readdata 2
Readregister 1R eadregister 2
Shiftleft 2
ALU operation3
RegW rite
Z eroALU
PC + 4
Branch control
Next instruction address
Signextend
ECE 45432 SUNY New Paltz18
A Simple Implementation of aDatapath
Covers: lw, sw, beq, add, sub, and, or, set-on-less-thanUse multiplexors to stitch them together
Instruction
16 32
Registers
Writeregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Mux
ALU operation3
RegWrite
MemRead
MemWriteALUSrc
MemtoReg
ALUresult
ZeroALU
Datamemory
Address
Writedata
Readdata M
ux
Signextend
ECE 45432 SUNY New Paltz19
Implementation of the Datapath
PC
Instructionmemory
Readaddress
Instruction
16 32
Add ALUresult
Mux
Registers
WriteregisterWritedata
Readdata1
Readdata2
Readregister1Readregister2
Shiftleft2
4
Mux
ALUoperation3
RegWrite
MemRead
MemWrite
PCSrc
ALUSrcMemtoReg
ALUresult
ZeroALU
Datamemory
Address
Writedata
Readdata M
ux
Signextend
Add
ECE 45432 SUNY New Paltz20
Three Instruction Classes• R-Type Instruction
op rs rt rd shamt funct
rd: destination
•Load and Store Instructions
op rs rt 16 bit offset
rt: destination
•Branch Instruction
op rs rt 16 bit offset
rt: destination
ECE 45432 SUNY New Paltz21
Completed Data Path
MemtoRegP C
Instructionmemory
Readadd ress
Ins truc tion[31– 0 ]
Instruction [20– 16 ]
Instruction [25– 21 ]
A dd
4
16 32Instruction [15– 0]
0
0Mux
0
1
Add ALUresu lt
Mux
0
1
Regis tersW riteregister
W riteda ta
Readda ta 1
Readda ta 2
R eadregister 1
R eadregister 2
Signextend
S hiftle ft 2
Mux
1
ALUresult
Zero
Datam em ory
W ritedata
Readda ta
Mux
1
Instruction [15– 11 ]
A LUAddress
RegWrite
ALUSrc
MemRead
RegDst
MemWrite
PCSrc
ALU Control
ECE 45432 SUNY New Paltz22
Control
Using the op-code from the instruction, the control issues signals to:
Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexer inputs)
ALU's operation based on instruction type and function code
Example: what should the ALU do with the instruction
add $8, $17, $18
000000 10001 10010 01000 00000 100000op rs rt rd shamt funct
ECE 45432 SUNY New Paltz23
Control
Example: what should the ALU do with the instruction
lw $1, 100($2)35 2 1 100op rs rt 16 bit offset
Why is the ALU code for subtract is 110 and not 011?
ECE 45432 SUNY New Paltz24
ALU Control
Must describe hardware to compute 3-bit ALU control inputGiven instruction type 00 = lw, sw01 = beq, 11 = arithmeticFunction code for arithmeticALUOp Funct field Operation
ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F00 0 X X X X X X 010 add0 1 X X X X X X 110 sub1 1 (X) X X 0 0 0 0 010 add1 1 (X) X X 0 0 1 0 110 sub1 1 (X) X X 0 1 0 0 000 and1 1 (X) X X 0 1 0 1 001 or1 1 (X) X X 1 0 1 0 111 slt
ALU Control:000 AND001 OR010 add110 subtract111 set-on-less-than
ALUOp
computed from instruction type
ECE 45432 SUNY New Paltz25
ALU Control
Operation2
Operation1
Operation0
Operation
ALUOp1
F3
F2
F1
F0
F (5–0)
ALUOp0
ALUOp
ALUcontrol blockMuli-level decoding can reduce size of control unit and increase its speed.
ALUOp Funct field OperationALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 2 1 0
0 0 X X X X X X 010 add0 1 X X X X X X 110 sub1 0 X X 0 0 0 0 010 add1 0 X X 0 0 1 0 110 sub1 0 X X 0 1 0 0 000 and1 0 X X 0 1 0 1 001 or1 0 X X 1 0 1 0 111 slt
Inputs Outputs
ECE 45432 SUNY New Paltz26
A Simple Control
P C
Instructionmemory
Readadd ress
Ins truc tion[31– 0 ]
Instruction [20– 16 ]
Instruction [25– 21 ]
A dd
Instruction [5– 0]
4
16 32Instruction [15– 0]
0
0Mux
0
1
Add ALUresu lt
Mux
0
1
Regis tersW riteregister
W riteda ta
Readda ta 1
Readda ta 2
R eadregister 1
R eadregister 2
Signextend
S hiftle ft 2
Mux
1
ALUresult
Zero
Datam em ory
W ritedata
Readda ta
Mux
1
Instruction [15– 11 ]
A LUcontrol
A LUAddress
RegWrite
ALUSrc
ALUop
MemRead
MemtoReg
RegDst
MemWrite
PCSrc
ECE 45432 SUNY New Paltz27
Control
Instruction Opcode RegDst ALUSrcMemto-
RegReg
WriteMem Read
Mem Write Branch ALUOp1ALUOp0
R-format 00 0000 1 0 0 1 0 0 0 1 0lw 10 0011 0 1 1 1 1 0 0 0 0sw 10 1011 X 1 X 0 0 1 0 0 0beq 00 0100 X 0 X 0 0 0 1 0 1
PC
Instruction�memory
Read�address
Instruction�[31– 0]
Instruction [20– 16]
Instruction [25– 21]
Add
Instruction [5– 0]
MemtoRegALUOpMemWrite
RegWrite
MemReadBranchRegDst
ALUSrc
Instruction [31– 26]
4
16 32Instruction [15– 0]
0
0M�u�x
0
1
Control
Add ALU�result
M�u�x
0
1
RegistersWrite�register
Write�data
Read�data 1
Read�data 2
Read�register 1
Read�register 2
Sign�extend
Shift�left 2
M�u�x
1
ALU�result
Zero
Data�memory
Write�data
Read�data
M�u�x
1
Instruction [15– 11]
ALU�control
ALUAddress
ECE 45432 SUNY New Paltz28
Single Cycle Control
Simple combinational logic
R-format Iw sw beq
Op0Op1Op2Op3Op4Op5
Inputs
Outputs
RegDst
ALUSrcMemtoReg
RegWrite
MemReadMemWrite
BranchALUOp1
ALUOpO
Operation2
Operation1
Operation0
Operation
ALUOp1
F3
F2
F1
F0
F (5–0)
ALUOp0
ALUOp
ALUcontrol block
ECE 45432 SUNY New Paltz29
Our Simple Control Structure
All of the logic is combinational
We wait for everything to settle down, and the right thing to be done
ALU might not produce “right answer” right away
we use write signals along with clock to determine when to write
Cycle time determined by length of the longest path
Combinational logicContent of PC Content
of PC
Clock
ECE 45432 SUNY New Paltz30
Single Cycle Implementation
Calculate cycle time assuming negligible delays except:memory (2ns), ALU and adders (2ns), register file access (1ns)
MemtoReg
MemRead
MemWriteALUSrc
RegDst
PC
Instructionmemory
Readaddress
Instruction[31–0]
Instruction[20–16]
Instruction[25–21]
Add
Instruction[5–0]
RegWrite
4
16 32Instruction[15–0]
0Registers
WriteregisterWritedata
Writedata
Readdata 1
Readdata 2
Readregister 1Readregister 2
Signextend
ALUresult
Zero
Datamemory
Address Readdata M
ux
1
0
Mux
1
Mux
1
0
Mux
1
Instruction[15–11]
Shiftleft 2
PCSrc
ALU
Add ALUresult
ALUOp
0
ALUcontrol
ECE 45432 SUNY New Paltz31
Single Cycle ImplementationCalculate cycle time assuming negligible delays except:
memory (2ns), ALU and adders (2ns), register file access (1ns)
M e m t o R e g
M e m R e a d
M e m W r i te
A L U O p
A L U S rc
R e g D s t
P C
I n s t ru c t io n �m e m o ry
R e a d �a d d re s s
In s tru c t io n �[3 1 – 0 ]
In s tr u c t io n [2 0 – 1 6 ]
In s tr u c t io n [2 5 – 2 1 ]
A d d
I n s tru c t io n [5 – 0 ]
R e g W r i te
4
1 6 3 2In s tr u c t io n [1 5 – 0 ]
0R e g is t e rs
W r ite �re g is te rW r ite �d a ta
W r i te �d a ta
R e a d �d a ta 1
R e a d �d a ta 2
R e a d �re g is te r 1R e a d �re g is te r 2
S ig n �e x te n d
A L U �re s u l t
Z e r o
D a ta �m e m o ry
A d d re s s R e a d �d a ta M �
u �x
1
0
M �u �x
1
0
M �u �x
1
0
M �u �x
1
In s tr u c t io n [1 5 – 1 1 ]
A L U �c o n t ro l
S h i f t�le f t 2
P C S rc
A L U
A d d A L U �re s u lt
InstructionInstr.
MemoryRegister
ReadALU Op.
Data Memory
Reg. Write Total
R-format 2 1 2 0 1 6 nslw 2 1 2 2 1 8 nssw 2 1 2 2 7 nsbeq 2 1 2 5 ns
ECE 45432 SUNY New Paltz32
Single Cycle Problems:
Cycle time should accommodate the longest instruction.
What if we had more complicated instructions like floating point?
Can use a unit only once during a cycleMay need multiple copies of some functional units (wasteful of area)
One Solution:use a “smaller” cycle timehave different instructions take different numbers of cyclesa “multi-cycle” datapath:
ECE 45432 SUNY New Paltz33
Multi-cycle ApproachWe will be reusing functional units
ALU used to compute address and the new PC valueSame memory used for instruction and data
Break up the instructions into steps, each step takes a cycle
Balance the amount of work to be doneRestrict each cycle to use only one major functional unit
Our control signals will not be determined solely by instruction, but also by the current step
e.g., what should the ALU do for an “add” instruction?At the end of a cycle (step)
Store values for use in later steps Introduce additional “internal” registers
ECE 45432 SUNY New Paltz34
Multi-cycle ApproachAdded components
IR and MDR both needed during the same cycleA and B registers to hold operand valuesALUOut register
PC
Memory
Address
Instructionor data
Data
Instructionregister
RegistersRegister #
Data
Register #
Register #
ALUMemory
dataregister
A
B
ALUOut
ECE 45432 SUNY New Paltz35
Multicycle Approach
Shiftleft2
PC
MemoryMemData
Writedata
Mux
0
1
RegistersWriteregister
Writedata
Readdata1
Readdata2
Readregister1
Readregister2
Mux
0
1
Mux
0
1
4
Instruction[15–0]
Signextend
3216
Instruction[25–21]
Instruction[20–16]
Instruction[15–0]
Instructionregister
1 Mux
0
32
Mux
ALUresult
ALUZero
Memorydata
register
Instruction[15–11]
A
B
ALUOut
0
1
Address
Internal registers except IR are updated every clock cycle; No write control
Need to add new MUX’s and expand existing MUX’s
ECE 45432 SUNY New Paltz36
Shiftleft2
PC
MemoryMemData
Writedata
Mux
0
1
RegistersWriteregister
Writedata
Readdata1
Readdata2
Readregister1
Readregister2
Mux
0
1
Mux
0
1
4
Instruction[15–0]
Signextend
3216
Instruction[25–21]
Instruction[20–16]
Instruction[15–0]
Instructionregister
1 Mux
0
32
Mux
ALUresult
ALUZero
Memorydata
register
Instruction[15–11]
A
B
ALUOut
0
1
Address
Shiftleft2
01
2
3226
ALUcontrol
ALUOp
ALUSrcB
ALUSrcA
PCSource
MemtoReg
REGWriteIRWriteMemWriteMemRead
IorD
PCWrite
REGDst
PCWriteCondZero (ALU)
Control Signals
ECE 45432 SUNY New Paltz37
Instructions from ISA PerspectiveConsider each instruction from perspective of ISA.Example:
The add instruction changes a register. Register specified by bits 15:11 of instruction. Instruction specified by the PC. New value is the sum (“op”) of two registers. Registers specified by bits 25:21 and 20:16 of the instruction
Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
In order to accomplish this we must break up the instruction.(kind of like introducing variables when programming)
ECE 45432 SUNY New Paltz38
Breaking Down of an Instruction
ISA definition of arithmetic:Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
Could break down to:IR <= Memory[PC]A <= Reg[IR[25:21]]B <= Reg[IR[20:16]]ALUOut <= A op BReg[IR[15:11]] <= ALUOut
We forgot an important part of the definition of arithmetic!PC <= PC + 4
ECE 45432 SUNY New Paltz39
Idea Behind Multicycle ApproachWe define each instruction from the ISA perspective (do this!)Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps)Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.)Finally try and pack as much work into each step
(avoid unnecessary cycles)while also trying to share steps where possible
(minimizes control, helps to simplify solution)Result: Our book’s multicycle Implementation!
ECE 45432 SUNY New Paltz40
1. Instruction fetch2. Instruction decode and register fetch3. Execution, memory address computation, or
branch completion4. Memory access or R-type instruction completion5. Write-back step
INSTRUCTIONS TAKE FROM 3 INSTRUCTIONS TAKE FROM 3 -- 5 CYCLES!5 CYCLES!
Five Execution Steps1
2
3
4
5
ECE 45432 SUNY New Paltz41
Step 1: Instruction Fetch 1
Use PC to get instruction and put it in the Instruction Register.Increment the PC by 4 and put the result back in the PC
using RTL "Register-Transfer Language"IR <= Memory[PC];PC <= PC + 4;
MemReadALUSrcA =0IorD = 0IRWriteALUSrcB = 01ALUOp = 00PCWritePCSource = 00
Shiftleft2
PC
MemoryMemData
Writedata
Mux
0
1RegistersWriteregister
Writedata
Readdata1Readdata2
Readregister1Readregister2
Mux
0
1
Mux
0
1
4Instruction[15–0]
Signextend
3216
Instruction[25–21]Instruction[20–16]Instruction[15–0]
Instructionregister 1Mux
0
32
MuxALUresult
ALUZero
Memorydataregister
Instruction[15–11]
A
BALUOut
0
1Address
Shiftleft2012
3226
ALUcontrol
ALUOpALUSrcB
ALUSrcA
PCSource
MemtoReg
REGWriteIRWriteMemWriteMemRead
IorD
PCWrite
REGDst
PCWriteCondZero (ALU)
ECE 45432 SUNY New Paltz42
Step 2: Instruction Decode and Register Fetch
Read registers rs and rt (in case we need them) and put them in A & BCompute the branch address in case the instruction is a branch; put it in ALUOut
A <= Reg[IR[25-21]];B <= Reg[IR[20-16]];ALUOut <= PC + (sign-extend(IR[15-0]) << 2);
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
We are Looking at the instruction and determine what to do in the next cycleWe aren't setting any control lines based on the instruction type
Shiftleft2
PC
MemoryMemData
Writedata
Mux
0
1RegistersWriteregister
Writedata
Readdata1Readdata2
Readregister1Readregister2
Mux
0
1
Mux
0
1
4Instruction[15–0]
Signextend
3216
Instruction[25–21]Instruction[20–16]Instruction[15–0]
Instructionregister 1Mux
0
32
MuxALUresult
ALUZero
Memorydataregister
Instruction[15–11]
A
BALUOut
0
1Address
Shiftleft2012
3226
ALUcontrol
ALUOpALUSrcB
ALUSrcA
PCSource
MemtoReg
REGWriteIRWriteMemWriteMemRead
IorD
PCWrite
REGDst
PCWriteCondZero (ALU)
1
23
3 33
jlw or sw
beq
R-type
ECE 45432 SUNY New Paltz43
Step 3: Instruction DependentALU is performing one of three functions, based on instruction type(ignore j instruction for now)
Memory Reference: (lw or sw) R-type: Branch: beqALUOut <= A + sign-extend(IR[15-0]); ALUOut <= A op B; if (A==B) PC <= ALUOut
(compute address) (execute) (complete branch)
ALUSrcA = 1 ALUSrcA = 1 ALUSrcA = 1ALUSrcB = 10 ALUSrcB = 00 ALUSrcB = 00ALUOp = 00 ALUOp = 10 ALUOp = 01
go to step 4 PCWriteCondPCSource = 01
lw swend of execution fetch next instruction
ECE 45432 SUNY New Paltz44
Shiftleft2
PC
Memory
MemData
Writedata
Mux
0
1
RegistersWriteregister
Writedata
Readdata1
Readdata2
Readregister 1
Readregister 2
Mux
0
1
Mux
0
1
4
Instruction[15–0]
Signextend
3216
Instruction[25–21]
Instruction[20–16]
Instruction[15–0]
Instructionregister
1 Mux
0
32
Mux
ALUresult
ALUZero
Memorydata
register
Instruction[15–11]
A
B
ALUOut
0
1
Address
Shiftleft2
01
2
3226
ALUcontrol
ALUOp
ALUSrcB
ALUSrcA
PCSource
MemtoReg
REGWriteIRWriteMemWriteMemRead
IorD
PCWrite
REGDst
PCWriteCondZero (ALU)
For lw and sw
For beq
ECE 45432 SUNY New Paltz45
Step 4: R-type or Memory Access
Lw sw R-type
MDR = Memory[ALUOut]; Memory[ALUOut] = B; Reg[IR[15-11]] = ALUOut;
(read from memory) (write to memory) (store result)MemRead MemWrite RegDst = 1IorD = 1 IorD = 1 REGWrite
MemtoReg = 0
end of execution end of execution fetch next instruction fetch next instruction
ECE 45432 SUNY New Paltz46
PCSource
Shiftleft2
PC
Memory
MemData
Writedata
Mux
0
1
RegistersWriteregister
Writedata
Readdata1
Readdata2
Readregister 1
Readregister 2
Mux
0
1
Mux
0
1
4
Instruction[15–0]
Signextend
3216
Instruction[25–21]
Instruction[20–16]
Instruction[15–0]
Instructionregister
1 Mux
0
32
Mux
ALUresult
ALUZero
Memorydata
register
Instruction[15–11]
A
B
ALUOut
0
1
Address
Shiftleft2
01
2
2826
ALUcontrol
ALUOp
ALUSrcB
ALUSrcA
MemtoReg
REGWriteIRWriteMemWriteMemRead
IorD
PCWrite
REGDst
PCWriteCondZero (ALU)
lw sw
R-type
32
ECE 45432 SUNY New Paltz47
Step 5 – Write Back
Only for lw instructions to store data read from memory into a register
Reg[IR[20-16]]= MDR;
RegDest = 0RegWriteMemtoReg = 1
Shiftleft2
PC
MemoryMemData
Writedata
Mux
0
1RegistersWriteregister
Writedata
Readdata1Readdata2
Readregister1Readregister2
Mux
0
1
Mux
0
1
4Instruction[15–0]
Signextend
3216
Instruction[25–21]Instruction[20–16]Instruction[15–0]
Instructionregister 1Mux
0
32
MuxALUresult
ALUZero
Memorydataregister
Instruction[15–11]
A
BALUOut
0
1Address
Shiftleft2012
3226
ALUcontrol
ALUOpALUSrcB
ALUSrcA
PCSource
MemtoReg
REGWriteIRWriteMemWriteMemRead
IorD
PCWrite
REGDst
PCWriteCondZero (ALU)
ECE 45432 SUNY New Paltz48
Control Summary
Step nameAction for R-type
instructionsAction for memory-reference
instructionsAction for branches
Action for jumps
Instruction fetch IR = Memory[PC]PC = PC + 4
Instruction A = Reg [IR[25-21]]decode/register fetch B = Reg [IR[20-16]]
ALUOut = PC + (sign-extend (IR[15-0]) << 2)Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] IIcomputation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2)jump completionMemory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut]completion ALUOut or
Store: Memory [ALUOut] = B
Memory read completion Load: Reg[IR[20-16]] = MDR
ECE 45432 SUNY New Paltz49
Implementing the Control
Value of control signals is dependent upon:what instruction is being executedwhich step is being performed
Use the information we’ve accumulated to specify a finite state machine
specify the finite state machine graphically, oruse microprogramming
Implementation can be derived from specification
ECE 45432 SUNY New Paltz50
Review: Finite State Machines
Finite state machines:A set of states and Next state function (determined by current state and the input)Output function (determined by current state and possibly input)
We’ll use a Moore machine (output based only on current state)How does Moore machine various from Mealy machine?
Inputs
Current state
Outputs
Clock
Next-statefunction
Outputfunction
Nextstate
ECE 45432 SUNY New Paltz51
Review: finite state machines
Example: A friend would like you to build an “electronic eye” for use as a fake security device. The device consists of three lights lined up in a row, controlled by the outputs Left, Middle, and Right, which, if asserted, indicate that a light should be on. Only one light is on at a time, and the light “moves” from left to right and then from right to left, thus scaring away thieves who believe that the device is monitoring their activity. Draw the graphical representation for the finite state machine used to specify the electronic eye. Note that the rate of the eye’s movement will be controlled by the clock speed (which should not be too great) and that there are essentially no inputs.
Graphical Specification of FSM
PCWritePCSource = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PC WriteCond
PCSource = 01
ALUSrcA = 1ALUSrcB = 00
ALUOp = 10
RegDst = 1RegWrite
MemtoReg = 0MemWriteIorD = 1
MemReadIorD = 1
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
R egDst = 0RegWrite
MemtoReg = 1
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
MemReadALUSrcA = 0
IorD = 0IRWrite
ALUSrcB = 01ALUOp = 00
PCWritePCSource = 00
Instruction fe tchInstruction decode/
register fetch
Jumpcompletion
BranchcompletionExecution
M emory addresscomputation
Memoryaccess
Memoryaccess R-type completion
W rite-back step
(Op = 'LW ') or (Op = 'SW ') (Op = R-type)
(Op =
'BEQ')
(Op
='J
' )
(Op= 'SW
')
(Op
='L
W' )
4
01
9862
753
Start
don’t care if not mentioned
asserted if name only
otherwise exact value
How many state bits will we need?
ECE 45432 SUNY New Paltz53
Simple QuestionsHow many cycles will it take to execute this code?
lw $t2, 0($t3)lw $t3, 4($t3)beq $t2, $t3, Label #assume notadd $t5, $t2, $t3sw $t5, 8($t3)
Label: ...
What is going on during the 8th cycle of execution?In what cycle does the actual addition of $t2 and $t3 takes place?
ECE 45432 SUNY New Paltz54
Multiple Cycle Datapath
IdealMemoryWrAdrDin
RAdr
32
32
32Dout
MemWr32
AL
U
3232
ALUOp
ALUControl
InstructionR
eg
32
IRWr
32
Reg File
Ra
Rw
busW
Rb5
5
32busA
32busB
RegWr
Rs
Rt
Mux
0
1
Rt
Rd
PCWr
ALUSelA
Mux 01
RegDst
Mux
0
1
32
PC
MemtoReg
Extend
ExtOp
Mux
0
132
0
1
23
4
16Imm 32
<< 2
ALUSelB
Mux
1
0
Target32
Zero
ZeroPCWrCond PCSrc BrWr
32
IorD
AL
U O
ut
Readregister 1
Readregister 2
Writeregister
Writedata
Registers ALUZero
Readdata 1
Readdata 2
Signextend
16 32
Instruction[31–26]
Instruction[25–21]
Instruction[20–16]
Instruction[15–0]
ALUresult
Mux
Mux
Shiftleft 2
Shiftleft 2
Instructionregister
PC 0
1
Mux
0
1
Mux
0
1
Mux
0
1A
B 0123
Mux
0
1
2
ALUOut
Instruction[15–0]
Memorydata
register
Address
Writedata
MemoryMemData
4
Instruction[15–11]
PCWriteCond
PCWrite
IorD
MemRead
MemWrite
MemtoReg
IRWrite
PCSource
ALUOp
ALUSrcB
ALUSrcA
RegWrite
RegDst
26 28
Outputs
Control
Op[5–0]
ALUcontrol
PC [31–28]
Instruction [25-0]
Instruction [5–0]
Jumpaddress[31–0]
ECE 45432 SUNY New Paltz56
Finite State Machine for Control
Implementation:PCWrite
PCWriteCondIorD
MemtoRegPCSourceALUOpALUSrcBALUSrcARegWriteRegDst
NS3NS2NS1NS0
Op5
Op4
Op3
Op2
Op1
Op0
S3
S2
S1
S0
State register
IRWrite
MemReadMemWrite
Instruction register�opcode field
Outputs
Control logic
Inputs
ECE 45432 SUNY New Paltz57
PLA ImplementationOp5
Op4
Op3
Op2
Op1
Op0
S3
S2
S1
S0
IorD
IRWrite
MemReadMemWrite
PCWritePCWriteCond
MemtoRegPCSource1
ALUOp1
ALUSrcB0ALUSrcARegWriteRegDstNS3NS2NS1NS0
ALUSrcB1ALUOp0
PCSource0
D Q
D Q
Op
codeIf I picked a horizontal or vertical line could you explain it?
ECE 45432 SUNY New Paltz58
ROM = "Read Only Memory"values of memory locations are fixed ahead of time
A ROM can be used to implement a truth tableif the address is m-bits, we can address 2m entries in the ROM.our outputs are the bits of data that the address points to.
m n
m is the "height", and n is the "width"
ROM Implementation
m n0 0 0 0 0 1 10 0 1 1 1 0 00 1 0 1 1 0 00 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 11 1 0 0 1 1 01 1 1 0 1 1 1
ECE 45432 SUNY New Paltz59
ROM Implementation
How many inputs are there?6 bits for opcode, 4 bits for state = 10 address lines(i.e., 210 = 1024 different addresses)
How many outputs are there?16 datapath-control outputs, 4 state bits = 20 outputs
ROM is 210 x 20 = 20K bits (and a rather unusual size)
Rather wasteful, since for lots of the entries, the outputs are the same
i.e., opcode is often ignored
ECE 45432 SUNY New Paltz60
ROM vs PLA
Break up the table into two parts4 state bits tell you the 16 outputs, 24 x 16 bits of ROM
10 bits tell you the 4 next state bits, 210 x 4 bits of ROM
Total: 4.3K bits of ROM
PLA is much smallerCan share product terms
Only need entries that produce an active output
Can take into account don't cares
Size: (#inputs´ #product-terms) + (#outputs´ #product-terms)For this example = (10x17)+(20x17) = 510 PLA cells
PLA cells usually about the size of a ROM cell (slightly bigger)
ECE 45432 SUNY New Paltz61
Another Implementation StyleComplex instructions
The "next state" is often current state + 1
AddrCtl
Outputs
PLA or ROM
State
Address select logic
Op[
5 –0 ]
Adder
Instruction registeropcode field
1
Control unit
Input
PCWritePCWriteCondIorD
MemtoRegPCSourceALUOpALUSrcBALUSrcARegWriteRegDst
IRWrite
MemReadMemWrite
BWrite
ECE 45432 SUNY New Paltz62
MicroprogrammingControl is the hard part of processor design
Datapath is fairly regular and well-organizedMemory is highly regularControl is irregular and global
Microprogramming:A Particular Strategy for Implementing the Control Unit of a processor by "programming" at the level of register transfer operations
Microarchitecture:Logical structure and functional capabilities of the hardware asseen by the microprogrammer
Historical Note:IBM 360 Series first to distinguish between architecture & organization Same instruction set across wide range of implementations, each with different cost/performance
ECE 45432 SUNY New Paltz63
Macroinstruction Versus Microinstruction
MainMemory
executionunit
controlmemory
CPU
ADDSUBAND
DATA
.
.
.
User program plus Data
this can change!
AND microsequence
e.g., FetchFetch Operand(s)CalculateSave Answer(s)
one of these ismapped into oneof these at RTL level
ECE 45432 SUNY New Paltz64
Controller Designsequencercontrol
datapath control
micro-PCsequencer
microinstruction
The state diagrams that arise define the controller for an instruction set processor are highly structuredUse this structure to construct a simple “microsequencer” Control reduces to programming this very simple device
microprogramming
ECE 45432 SUNY New Paltz65
Microprogramming Implementation of the Control
P C WriteP C Write C on dIo rD
Me m toR e gP C S ourceALUOpALUS rcBALUS rcAR e gWrite
Ad drC tl
O utpu ts
Microcod e me m ory
IR Write
Me m R e a dMe m Write
R e gDs t
C o n trol u nit
Input
Mic ro pro gra m c ou nte r
Addre s s s e le ct lo gic
Op[
5–0]
Add e r
1
Da ta pa th
Ins tru c tion re g is te rop code fie ld
BWrite
ECE 45432 SUNY New Paltz66
Microprogrammingmicroinstruction: low level control instruction which defines a set of datapath control signal.A specification methodology
appropriate if hundreds of opcodes, modes, cycles, etc.signals specified symbolically using microinstructions
LabelALU
control SRC1 SRC2Register control Memory
PCWrite control Sequencing
Fetch Add PC 4 Read PC ALU SeqAdd PC Extshft Read Dispatch 1
Mem1 Add A Extend Dispatch 2LW2 Read ALU Seq
Write MDR FetchSW2 Write ALU FetchRformat1 Func code A B Seq
Write ALU FetchBEQ1 Subt A B ALUOut-cond FetchJUMP1 Jump address Fetch
Field name Value Signals active CommentAdd ALUOp = 00 Cause the ALU to add.
ALU control Subt ALUOp = 01 Cause the ALU to subtract; this implements the compare forbranches.
Func code ALUOp = 10 Use the instruction's function code to determine ALU control.SRC1 PC ALUSrcA = 0 Use the PC as the first ALU input.
A ALUSrcA = 1 Register A is the first ALU input.B ALUSrcB = 00 Register B is the second ALU input.
SRC2 4 ALUSrcB = 01 Use 4 as the second ALU input.Extend ALUSrcB = 10 Use output of the sign extension unit as the second ALU input.Extshft ALUSrcB = 11 Use the output of the shift-by-two unit as the second ALU input.Read Read two registers using the rs and rt fields of the IR as the register
numbers and putting the data into registers A and B.Write ALU RegWrite, Write a register using the rd field of the IR as the register number and
Register RegDst = 1, the contents of the ALUOut as the data.control MemtoReg = 0
Write MDR RegWrite, Write a register using the rt field of the IR as the register number andRegDst = 0, the contents of the MDR as the data.MemtoReg = 1
Read PC MemRead, Read memory using the PC as address; write result into IR (and lorD = 0 the MDR).
Memory Read ALU MemRead, Read memory using the ALUOut as address; write result into MDR.lorD = 1
Write ALU MemWrite, Write memory using the ALUOut as address, contents of B as thelorD = 1 data.
ALU PCSource = 00 Write the output of the ALU into the PC.PCWrite
PC write control ALUOut-cond PCSource = 01, If the Zero output of the ALU is active, write the PC with the contentsPCWriteCond of the register ALUOut.
jump address PCSource = 10, Write the PC with the jump address from the instruction.PCWrite
Seq AddrCtl = 11 Choose the next microinstruction sequentially.Sequencing Fetch AddrCtl = 00 Go to the first microinstruction to begin a new instruction.
Dispatch 1 AddrCtl = 01 Dispatch using the ROM 1.Dispatch 2 AddrCtl = 10 Dispatch using the ROM 2.
ECE 45432 SUNY New Paltz68
DetailsDispatch ROM 1 Dispatch ROM 2
Op Opcode name Value Op Opcode name Value000000 R-format 0110 100011 lw 0011000010 jmp 1001 101011 sw 0101000100 beq 1000100011 lw 0010101011 sw 0010
State number Address-control action Value of AddrCtl0 Use incremented state 31 Use dispatch ROM 1 12 Use dispatch ROM 2 23 Use incremented state 34 Replace state number by 0 05 Replace state number by 0 06 Use incremented state 37 Replace state number by 0 08 Replace state number by 0 09 Replace state number by 0 0
State
Adder
1
PLA or ROM
Mux3 2 1 0
Dispatch ROM 1Dispatch ROM 2
0
AddrCtl
Address select logic
Instruction registeropcode field
ECE 45432 SUNY New Paltz69
No encoding:1 bit for each datapath operationfaster, requires more memory (logic)used for Vax 780 — an astonishing 400K of memory!
Lots of encoding:send the microinstructions through logic to get control signalsuses less memory, slower
Historical context of CISC:Too much logic to put on a single chip with everything elseUse a ROM (or even RAM) to hold the microcodeIt’s easy to add new instructions
Maximally vs. Minimally Encoded
ECE 45432 SUNY New Paltz70
Designing a Microinstruction Set1. Start with list of control signals2. Group signals together that make sense (vs.
random): called “fields”3. Places fields in some logical order e.g., ALU
operation & ALU operands first and microinstruction sequencing last
4. Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
Use computers to design computers5. To minimize the width, encode operations that will
never be used at the same time
ECE 45432 SUNY New Paltz71
Possible Design Paths of Control
Initial Representation Finite State Diagram Microprogram
Sequencing Control Explicit Next State Microprogram counterFunction + Dispatch ROMs
Logic Representation Logic Equations Truth Tables
Implementation PLA ROM Technique “hardwired control” “microprogrammed control”
ECE 45432 SUNY New Paltz72
Microprogramming Pros and Cons
Ease of designFlexibility
Easy to adapt to changes in organization, timing, technologyCan make changes late in design cycle, or even in the field
Can implement very powerful instruction sets (just more control memory)Generality
Can implement multiple instruction sets on same machine.Can tailor instruction set to application.
CompatibilityMany organizations, same instruction set
Costly to implementSlow
ECE 45432 SUNY New Paltz73
Microcode: Trade-offsDistinction between specification and implementation is blurredSpecification Advantages:
Easy to design and write
Design architecture and microcode in parallel
Implementation (off-chip ROM) AdvantagesEasy to change since values are in memory
Can emulate other architectures
Can make use of internal registers
Implementation Disadvantages, SLOWER now that:Control is implemented on same chip as processor
ROM is no longer faster than RAM
No need to go back and make changes
ECE 45432 SUNY New Paltz74
Historical PerspectiveIn the ‘60s and ‘70s microprogramming was very important for implementing machinesThis led to more sophisticated ISAs and the VAXIn the ‘80s RISC processors based on pipelining became popularPipelining the microinstructions is also possible!Implementations of IA-32 architecture since 486
“hardwired control” for simpler instructions (few cycles, FSM control implemented using PLA or random logic)
“microcoded control” for more complex instructions(large numbers of cycles, central control store)
The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store
ECE 45432 SUNY New Paltz75
Pentium 4
Pipelining is important (last IA-32 without it was 80386 in 1985)
Control
Control
Control
Enhancedfloating pointand multimedia
Control
I/Ointerface
Instruction cache
Integerdatapath
Datacache
Secondarycacheandmemoryinterface
Advanced pipelininghyperthreading support
Chapter 6
Chapter 7
ECE 45432 SUNY New Paltz76
Pentium 4
Somewhere in all that “control we must handle complex instructionsProcessor executes simple microinstructions, 70 bits wide (hardwired)120 control lines for integer datapath (400 for floating point)If an instruction requires more than 4 microinstructions to implement, control from microcode ROM (8000 microinstructions)Its complicated!
Control
Control
I/Ointerface
InstructioncacheData
Control
Control
Enhancedfloatingpointandmultimedia Integer
datapath
cache
Secondarycacheandmemoryinterface
Advancedpipelininghyperthreadingsupport
ECE 45432 SUNY New Paltz77
SummaryIf we understand the instructions, we can build a simple processor!
If instructions take different amounts of time, multi-cycle is better
Datapath implemented using:
Combinational logic for arithmetic logic unit
State holding elements for registers and memory
Control implemented using:
Combinational logic for single-cycle implementation
Finite state machine for multi-cycle implementation