CPE232 Basic MIPS Architecture 1
Computer Organization
Multi-cycle Approach
Dr. Iyad Jafar
Adapted from Dr. Gheith Abandah slides
http://www.abandah.com/gheith/Courses/CPE335_S08/index.html
Multicycle Datapath Approach
Let an instruction take more than one clock cycle to complete.
Break up instructions into steps where
- each step takes one cycle, while trying to balance the amount of work done in each step
- each cycle is restricted to use only one major functional unit, unless units are used in parallel
Not every instruction takes the same number of clock cycles.
In addition to faster clock rates, multicycle allows functional units to be used more than once per instruction, as long as they are used on different clock cycles. As a result:
- Need one memory only, but only one memory access per cycle
- Need one ALU/adder only, but only one ALU operation per cycle
Multicycle Datapath Approach, con't
At the end of a cycle, store values needed in a later cycle by the current instruction in internal registers (A, B, IR, MDR, and ALUOut). These registers are invisible to the programmer.
All of these registers except IR hold data only between a pair of adjacent clock cycles, so they don't need a write control signal.
- IR: Instruction Register
- MDR: Memory Data Register
- A, B: register file read data registers
- ALUOut: ALU output register
[Figure: multicycle datapath with PC, Memory (Address, Write Data, Read Data for instruction or data), IR, MDR, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), A, B, ALU, and ALUOut]
Data used by subsequent instructions are stored in programmer-visible state (i.e., register file, PC, or memory).
Multicycle Datapath Approach, con’t
As in the single cycle design, shared functional units need multiplexers at their inputs. There is only one ALU, and it is used to update the PC, perform ALU operations, do the comparison for beq, compute memory addresses, and compute the branch address.
Multicycle Datapath Approach- Control Signals
The Multicycle Datapath with Control Signals
[Figure: the multicycle datapath (PC, Memory, IR, MDR, Register File, A, B, ALU, ALUOut, sign extend, two shift-left-2 units, ALU control) driven by a Control unit. Control inputs: Instr[31-26]. Control outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst. ALU control uses Instr[5-0]; the sign-extended offset comes from Instr[15-0]; the jump address is formed from Instr[25-0] shifted left by 2 and PC[31-28].]
Multicycle Machine: 1-bit Control Signals
Signal | Effect when deasserted | Effect when asserted
RegDst | The destination register number comes from the rt field | The destination register number comes from the rd field
RegWrite | None | Write is enabled to the selected destination register
ALUSrcA | The first ALU operand is the PC | The first ALU operand is register A
MemRead | None | Content of the memory address is placed on Memory data out
MemWrite | None | Memory location specified by the address is replaced by the value on the Write data input
MemtoReg | The value fed to the register file is from ALUOut | The value fed to the register file is from memory
IorD | PC is used as the address to the memory unit | ALUOut is used to supply the address to the memory unit
IRWrite | None | The output of memory is written into IR
PCWrite | None | PC is written; the source is controlled by PCSource
PCWriteCond | None | PC is written if the Zero output from the ALU is also active
Multicycle Machine: 2-bit Control Signals
Signal | Value | Effect
ALUOp | 00 | ALU performs add operation
ALUOp | 01 | ALU performs subtract operation
ALUOp | 10 | The funct field of the instruction determines the ALU operation
ALUSrcB | 00 | The second input to the ALU comes from register B
ALUSrcB | 01 | The second input to the ALU is 4 (to increment PC)
ALUSrcB | 10 | The second input to the ALU is the sign-extended offset (lower 16 bits of IR)
ALUSrcB | 11 | The second input to the ALU is the sign-extended lower 16 bits of IR, shifted left by two bits
PCSource | 00 | Output of the ALU (PC + 4) is sent to the PC for writing
PCSource | 01 | The contents of ALUOut (the branch address) are sent to the PC for writing
PCSource | 10 | The jump address is sent to the PC for writing
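To make the ALUOp encoding concrete, here is a minimal Python sketch of how an ALU control block might turn ALUOp and the funct field into an operation. The function name `alu_operation` and the operation strings are illustrative assumptions, not part of the slides; the funct codes are the standard MIPS encodings for add, sub, and, or, and slt.

```python
# Standard MIPS funct codes for a few R-type instructions.
FUNCT_TO_OP = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}


def alu_operation(alu_op, funct):
    """Pick the ALU operation from the 2-bit ALUOp and the 6-bit funct field.

    Hypothetical helper: the name and return values are for illustration only.
    """
    if alu_op == 0b00:
        return "add"                      # PC update, address computation
    if alu_op == 0b01:
        return "sub"                      # comparison for beq
    if alu_op == 0b10:
        return FUNCT_TO_OP[funct & 0x3F]  # R-type: funct field decides
    raise ValueError("ALUOp value not defined in the table")
```

Note that for ALUOp values 00 and 01 the funct field is ignored, which is what lets the control unit use the ALU before the instruction has been decoded.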
Breaking Instruction Execution into Clock Cycles
1. IFetch: Instruction fetch and PC update (same for all instructions)
Operations
1.1 Instruction fetch: IR <= Memory[PC]
1.2 Update PC: PC <= PC + 4
Control signal values
- IorD = 0, MemRead = 1, IRWrite = 1
- ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCWrite = 1
- PCSource = 00
Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5
IFetch | Dec | Exec | Mem | WB
Breaking Instruction Execution into Clock Cycles
2. Decode: Instruction decode and register fetch (same for all instructions)
The instruction type is not known yet, so perform only operations that are harmless whichever instruction it turns out to be.
Operations
2.1 Read the two source registers rs and rt and place them in registers A and B, respectively.
A <= Reg[IR[25:21]]
B <= Reg[IR[20:16]]
2.2 Compute the branch address
ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
Control signal values
- ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
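The branch address computation in step 2.2 can be sketched in Python as follows. This is only an illustration: the names `sign_extend16` and `branch_target` are made up here, and the result is masked to 32 bits to mimic a 32-bit adder. The PC passed in is the value after the fetch step, so it already holds PC + 4.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, ir):
    """ALUOut <= PC + (sign-extend(IR[15:0]) << 2), wrapped to 32 bits.

    pc is the already-incremented PC (hypothetical helper for illustration).
    """
    offset = sign_extend16(ir & 0xFFFF) << 2
    return (pc + offset) & MASK32
```

For example, with the incremented PC at 0x00400004, an offset field of 0x0003 gives 0x00400010, and an offset field of 0xFFFF (that is, -1) gives 0x00400000.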
Breaking Instruction Execution into Clock Cycles
3. Execution, memory address computation, or branch completion
The operation in this cycle depends on the instruction type.
Operations
* if memory reference, compute address
ALUOut <= A + sign-extend(IR[15:0])
ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
* if arithmetic-logic instruction, perform operation
ALUOut <= A op B
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
Breaking Instruction Execution into Clock Cycles
3. Execution, memory address computation, or branch completion (continued)
The operation in this cycle depends on the instruction type.
Operations
* if branch instruction
if (A == B) PC <= ALUOut
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond = 1, PCSource = 01
* if jump instruction
PC <= {PC[31:28], IR[25:0], 2'b00}
PCSource = 10, PCWrite = 1
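The two PC updates in this step can be written out as bit operations. The sketch below is illustrative Python, with the function names `jump_target` and `pc_after_beq` chosen here rather than taken from the slides.

```python
def jump_target(pc, ir):
    """PC <= {PC[31:28], IR[25:0], 2'b00} as a 32-bit value."""
    upper = pc & 0xF0000000            # PC[31:28] kept in place
    lower = (ir & 0x03FFFFFF) << 2     # IR[25:0] followed by two zero bits
    return upper | lower


def pc_after_beq(a, b, pc, alu_out):
    """PCWriteCond: the PC takes ALUOut only when A == B (ALU Zero output)."""
    return alu_out if a == b else pc
```

For instance, the instruction word 0x08100000 (opcode 2, target field 0x100000) fetched from the low 256 MB region jumps to 0x00400000.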
Breaking Instruction Execution into Clock Cycles
4. Memory access or R-type completion
The operation in this cycle depends on the instruction type.
Operations
* if load instruction : read value from memory into MDR
MDR <= Memory[ALUOut]
MemRead = 1, IorD = 1
* if store instruction: store rt into memory
Memory[ALUOut] <= B
MemWrite = 1, IorD = 1
* if arithmetic-logical instruction: write ALU result into rd
Reg[IR[15:11]] <= ALUOut
MemtoReg = 0, RegDst = 1, RegWrite = 1
Breaking Instruction Execution into Clock Cycles
5. Memory read completion
Needed for the load instruction only.
Operations
5.1 store the loaded value in MDR into rt
Reg[IR[20:16]] <= MDR
RegWrite = 1, MemtoReg = 1, RegDst = 0
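As a recap of the five steps, the following Python sketch walks a single lw instruction through them using the register transfers from the slides. It is a simplified model under stated assumptions: memory is a dict keyed by byte address holding whole words, the register file is a 32-entry list, and the helper names `run_lw` and `sign_extend16` are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(pc, memory, regs):
    """Run one lw through the five steps; returns the updated PC."""
    # Step 1 (IFetch): IR <= Memory[PC]; PC <= PC + 4
    ir = memory[pc]
    pc = pc + 4
    # Step 2 (Decode): A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]] (B unused by lw)
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    # Step 3 (Exec): ALUOut <= A + sign-extend(IR[15:0])
    alu_out = a + sign_extend16(ir & 0xFFFF)
    # Step 4 (Mem): MDR <= Memory[ALUOut]
    mdr = memory[alu_out]
    # Step 5 (WB): Reg[IR[20:16]] <= MDR
    regs[(ir >> 16) & 0x1F] = mdr
    return pc


# lw $8, 8($9): opcode 0x23, rs = 9, rt = 8, offset = 8
LW_WORD = (0x23 << 26) | (9 << 21) | (8 << 16) | 8
memory = {0x0: LW_WORD, 0x1008: 42}
regs = [0] * 32
regs[9] = 0x1000
new_pc = run_lw(0x0, memory, regs)
```

After the call, register 8 holds the word loaded from address 0x1008 and the PC has advanced by 4.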
Breaking Instruction Execution into Clock Cycles
In this implementation, not all instructions take 5 cycles
Instruction Class | Clock Cycles Required
Load | 5
Store | 4
Branch | 3
Arithmetic-logical | 4
Jump | 3
Multicycle Performance
Compute the average CPI of the multicycle implementation for a SPECINT2000 program with the following instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU. Assume the CPI for each instruction class is as given in the previous table.
CPI = Σ (CPI_i × IC_i) / IC
= 0.25 × 5 + 0.10 × 4 + 0.11 × 3 + 0.02 × 3 + 0.52 × 4
= 4.12
Compare this to CPI = 1 for the single cycle implementation. Assume the multicycle clock cycle time is one fifth of the single cycle clock cycle time: CC_M = (1/5) CC_S.
Then
Performance_M / Performance_S = (IC × 1 × CC_S) / (IC × 4.12 × (1/5) CC_S)
= 5 / 4.12 ≈ 1.21
Multicycle is also cost-effective in terms of hardware.
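The arithmetic above can be checked with a few lines of Python. The instruction mix and cycle counts are the ones given on the slides; the function names and the `clock_ratio` parameter are illustrative choices.

```python
# Instruction mix and per-class cycle counts from the slides.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}


def average_cpi(mix, cycles):
    """Weighted average: sum over classes of (fraction x cycles)."""
    return sum(mix[name] * cycles[name] for name in mix)


def speedup_over_single_cycle(cpi_multi, clock_ratio=5):
    """Performance_M / Performance_S, assuming CC_M = CC_S / clock_ratio
    and CPI = 1 for the single cycle machine."""
    return clock_ratio / cpi_multi


cpi = average_cpi(mix, cycles)
speedup = speedup_over_single_cycle(cpi)
print(round(cpi, 2), round(speedup, 2))
```

Changing `clock_ratio` shows how sensitive the result is to the assumed clock speedup: with a ratio of 4 instead of 5, the multicycle design would be slower than the single cycle one for this mix.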
Multicycle Control Unit
Multicycle datapath control signals are not determined solely by the bits in the instruction; e.g., the opcode bits tell what operation the ALU should be doing, but not which instruction cycle is to be done next.
Since the instruction is broken into multiple cycles, we need to know what we did in the previous cycle(s) in order to determine the current action.
Must use a finite state machine (FSM) for control:
- a set of states (current state stored in the State Register)
- a next state function (determined by current state and the input)
- an output function (determined by current state and the input)
[Figure: combinational control logic takes the instruction opcode and the State Register output as inputs, and produces the datapath control points and the next state]
The States of the Control Unit
10 states are required in the FSM control
The sequence of states is determined by five steps of execution and the instruction
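A next state function for such an FSM might look like the Python sketch below. The slides state only that ten states are needed; the state names, their numbering, and the instruction class labels used here are assumptions made for illustration. Counting the states visited per instruction reproduces the cycle counts in the earlier table.

```python
# Ten states; names and numbering are assumed, not taken from the slides.
(FETCH, DECODE, MEM_ADDR, MEM_READ, MEM_WB,
 MEM_WRITE, EXEC, RTYPE_WB, BRANCH, JUMP) = range(10)

AFTER_DECODE = {"lw": MEM_ADDR, "sw": MEM_ADDR, "rtype": EXEC,
                "beq": BRANCH, "j": JUMP}


def next_state(state, op):
    """Next state as a function of the current state and instruction class."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return AFTER_DECODE[op]
    if state == MEM_ADDR:
        return MEM_READ if op == "lw" else MEM_WRITE
    if state == MEM_READ:
        return MEM_WB
    if state == EXEC:
        return RTYPE_WB
    return FETCH  # MEM_WB, MEM_WRITE, RTYPE_WB, BRANCH, JUMP end the instruction


def cycles_for(op):
    """Count clock cycles from FETCH until the FSM returns to FETCH."""
    state, count = FETCH, 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == FETCH:
            return count
```

The first two states are shared by every instruction, which matches the fetch and decode steps being the same for all instructions; the opcode only matters from the decode state onward.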
The Control Unit
1. Logic gates
- inputs: present state + opcode, #bits = 10
- outputs: control + next state, #bits = 20
- truth table size = 2^10 rows × 20 columns
2. ROM
- Can be used to implement the truth table above (2^10 × 20 bits = 20 Kbit)
- Each location stores the control signal values and the next state
- Each location is addressed by the opcode and the present state value
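The bit counts above can be reproduced from the control signal tables. The Python sketch below assumes a 6-bit MIPS opcode and the minimum number of bits needed to encode 10 states; the variable names are illustrative.

```python
# Control signals listed in the 1-bit and 2-bit tables of the slides.
ONE_BIT = ["RegDst", "RegWrite", "ALUSrcA", "MemRead", "MemWrite",
           "MemtoReg", "IorD", "IRWrite", "PCWrite", "PCWriteCond"]
TWO_BIT = ["ALUOp", "ALUSrcB", "PCSource"]

NUM_STATES = 10
OPCODE_BITS = 6                                  # width of the MIPS opcode field
state_bits = (NUM_STATES - 1).bit_length()       # bits needed to encode 10 states

input_bits = state_bits + OPCODE_BITS            # ROM address width
control_bits = len(ONE_BIT) + 2 * len(TWO_BIT)   # datapath control outputs
output_bits = control_bits + state_bits          # ROM word width

rom_bits = (2 ** input_bits) * output_bits
print(input_bits, output_bits, rom_bits, rom_bits // 1024)
```

Most of the 1024 ROM locations correspond to opcode and state combinations that never occur, which is one reason a plain ROM is an expensive way to hold this truth table.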
Micro-programmed Control Unit
A ROM implementation is vulnerable to bugs and expensive, especially for a complex CPU. Its size increases as the number and complexity of instructions (states) increases.
Use microprogramming:
- The next state value may not be sequential
- Generate the next state outside the storage element
- Each state is a microinstruction and the signals are specified symbolically
- Use labels for sequencing
Sequencer
Microprogram
The microassembler converts the microcode into actual signal values
The sequencing field is used along with the opcode to determine the next state
Multicycle Advantages & Disadvantages
Uses the clock cycle efficiently – the clock cycle is timed to accommodate the slowest instruction step
Multicycle implementations allow functional units to be used more than once per instruction as long as they are used on different clock cycles
but
Requires additional internal state registers, more muxes, and more complicated (FSM) control
[Timing diagram: lw occupies cycles 1-5 (IFetch, Dec, Exec, Mem, WB), sw occupies cycles 6-9 (IFetch, Dec, Exec, Mem), and an R-type instruction starts its IFetch in cycle 10]
Single Cycle vs. Multiple Cycle Timing
[Timing diagram, multiple cycle implementation: lw occupies cycles 1-5 (IFetch, Dec, Exec, Mem, WB), sw occupies cycles 6-9 (IFetch, Dec, Exec, Mem), and an R-type instruction starts its IFetch in cycle 10]
[Timing diagram, single cycle implementation: lw takes cycle 1 and sw takes cycle 2; part of the sw cycle is marked as waste]
The multicycle clock period is longer than 1/5 of the single cycle clock period, due to state register overhead.