Single-Cycle CPU DataPath
description
Transcript of Single-Cycle CPU DataPath
Building A CPU
• We’ve built a small ALU
• Add, Subtract, SLT, And, Or• Could figure out Multiply and Divide...
5.1
• What about the rest
• How do we deal with memory and registers?
• What about control operations (branches)?
• How do we interpret instructions?
• The whole thing...
• A CPU’s datapath deals with moving data around
• A CPU’s control manages the data
Datapath Overview
5.1
InstructionMemory
RegistersData Memory
Read reg. num A
Read reg. num B
Write reg num
Write reg data
Read reg data A
Read reg dataB
Read address
Instruction [31-0]
Read address
Write address
Write data
Read dataResult
PC
Instructions: R-type: 3 registers I-type: 2 registers, Data
Instructions: R-type: 3 registers I-type: 2 registers, Data
ALU Computes on: R-type: 2 registers I-type: Register and data
ALU Computes on: R-type: 2 registers I-type: Register and data
Data to write intodest. register from: ALU or Memory
Data to write intodest. register from: ALU or Memory
Memory: Address from ALU Data to/from regs
Memory: Address from ALU Data to/from regs
Current Instruction: PCCurrent Instruction: PC
Instruction Datapath
5.2
InstructionMemory
Read address
InstructionPC
Add
4
• Instructions will be held in the instruction memory
• The instruction to fetch is at the location specified by the PC
• Instr. = M[PC]
Note: Regular instruction width(32 for MIPS) makes this easy
Note: Regular instruction width(32 for MIPS) makes this easy
• After we fetch one instruction, the PC must be incremented to the next instruction
• All instructions are 4 bytes
• PC = PC + 4
R-type Instruction Datapath
5.2
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Result
Zero
ALU
Instruction
• R-type Instructions have three registers
• Two read (Rs, Rt) to provide data to the ALU
• One write (Rd) to receive data from the ALU
• We’ll need to specify the operation to the ALU (later...)
• We might be interested if the result of the ALU is zero (later...)
Read reg num A
Memory Operations
5.2
Data MemoryRead address
Write address
Write data
Read dataResult
Zero
signextend
16 32
• Memory operations first need to compute the effective address
• LW $t1, 450($s3) # E.A. = 450 + $s3
• Add together one register and 16 bits of immediate data
• Immediate data needs to be converted from 16-bit to 32-bit
• Memory then performs load or store using destination register
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
Instruction
Branches
5.2
AddResult
Sh.Left2
Result
Zero
signextend
16 32
PC + 4
To controllogic
Instruction
• Branches conditionally change the next instruction
• BEQ $2, $1, 42
• The offset is specified as the number of words to be added to the next instruction (PC+4)
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
• Control logic has to decide if the branch is taken
• Uses ‘zero’ output of ALU
• Take offset, multiply by 4
• Shift left two
• Add this to PC+4 (from PC logic)
offset
Integrating the R-types and Memory
5.3
• R-types and Load/Stores are similar in many respects
• Differences:
• 2nd ALU source: R-types use register, I-types use Immediate
• Write Data: R-types use ALU result, I-types use memory
• Mux the conflicting datapaths together
Data MemoryRead address
Write address
Write data
Read dataResult
Zero
signextend
16 32
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
Instruction
0
1
1
0
MemoryDatapath
Adding the instruction memory
5.3
InstructionMemory
Add
4
Read address
Instruction [31-0]
Result
PC
Simply add the instruction memoryand PC to the beginning of the datapath.
Separate Instruction and Data memories are needed in order to allowthe entire datapath to complete its job in a single clock cycle.
Data MemoryRead address
Write address
Write data
Read dataResult
Zero 1
00
1
signextend
16 32
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
Adding the Branch Datapath
5.3
InstructionMemory
Add
4
Read address
Instruction [31-0]
Result
PCData MemoryRead address
Write address
Write data
Read dataResult
Zero 1
00
1
signextend
16 32
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
AddResult
Sh.Left2
0
1
Now we have the datapath for R-type, I-type, and branch instructions.
On to the control logic!
When does everything happen?
5.3
InstructionMemory
Data Memory
AddAdd
4
Read address
Instruction [31-0]
Read address
Write address
Write data
Read dataResult
Zero
Result
Result Sh.Left2
0
1
1
00
1
signextend
PC
16 32
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
Combinational Logic:Just does it! Outputs are always just a function of its inputs (with some delay)
Registers: Written at the end of the clock cycle. (Rising edge triggered).
clk
clk
clk
Single-Cycle Design
Example
• Suppose it takes: • memory 100 nsec to read a word, • the ALU and adders take 4 nsec, • the register file can be read or written in 1 nsec, • the PC can be read or written in 0.2 nsec, • all multiplexors take 0.1 nsec. • Assume everything else takes 0 time (control, shift,
sign extend, wires, etc.). • How long will it take to execute an add instruction? • How long will it take to execute a lw instruction? • How long will it take to execute a beq instruction? • How long will it take to execute a j instruction?
What do we need to control?
5.3
InstructionMemory
Data Memory
AddAdd
4
Read address
Instruction [31-0]
Read address
Write address
Write data
Read dataResult
Zero
Result
Result Sh.Left2
0
1
1
00
1
signextend
PC
16 32
ALU -What is theOperation?
ALU -What is theOperation?
Memory-Read/Write/neither?
Memory-Read/Write/neither?
Mux - are webranching or not?
Mux - are webranching or not?
Mux - Wheredoes 2nd ALUoperand come from?
Mux - Wheredoes 2nd ALUoperand come from?
Registers-Should we write data?
Registers-Should we write data? Mux - Result from
ALU or Memory?
Mux - Result fromALU or Memory?
Almost all of the information we need is in the instruction!
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
The ALU• The ALU is stuck right in the middle of everything...
• It must:
• Add, Subtract, And, or Or for arithmetic instructions
• Subtract for a branch on equal
• Subtract and set for a SLT
• Add for a memory access
5.3
0
1
A
Operation
Result
+ 2B
CarryIn
CarryOut
0
1
BInvert
3Less
Function BInvert Op Carryin Result
And 0 00 0 R = A • BOr 0 01 0 R = A BAdd 0 10 0 R = A + BSubtract 1 10 1 R = A - BSLT 1 11 1 R = 1 if A < B
0 if A B
Function BInvert Op Carryin Result
And 0 00 0 R = A • BOr 0 01 0 R = A BAdd 0 10 0 R = A + BSubtract 1 10 1 R = A - BSLT 1 11 1 R = 1 if A < B
0 if A B
Always the same: Combine into one signal called “sub”
Setting the ALU controls• The instruction Opcode and Function give us the info we need
• For R-type instructions, Opcode is zero, function code determines ALU controls
5.3
Instruction Opcode ALUOp Funct. Code ALU action ALU controlsub op
add R-type 10 100000 add 0 10 sub R-type 10 100010 subtract 1 10and R-type 10 100100 and 0 00or R-type 10 100101 or 0 01SLT R-type 10 101010 SLT 1 11
New control signal: ALUOp is 00 for memory, 01 for Branch, and 10 for R-typeNew control signal: ALUOp is 00 for memory, 01 for Branch, and 10 for R-type
• For I-type instructions, Opcode determines ALU controls
load word LW 00 xxxxxx add 0 10store word SW 00 xxxxxx add 0 10branch equal BEQ 01 xxxxxx subtract 1 10
Controlling the ALU
5.3
ALUOp F5 F4 F3 F2 F1 F0 Function ALU Ctrl
00 x x x x x x Add 0 10x1 x x x x x x Sub 1 101x x x 0 0 0 0 Add 0 101x x x 0 0 1 0 Sub 1 101x x x 0 1 0 0 And 0 001x x x 0 1 0 1 Or 0 011x x x 1 0 1 0 SLT 1 11
Since ALUOp can only be 00, 01, or 10, we don’t care what ALUOp2 is when ALUOP1 is 1
Since ALUOp can only be 00, 01, or 10, we don’t care what ALUOp2 is when ALUOP1 is 1
For ALUOp = 00 or 01, function code is unused
For ALUOp = 00 or 01, function code is unused
AluOp is determined by Opcode -separate logic will generate ALUOp
A 6-input truth table - use standard minimization techniques
A 6-input truth table - use standard minimization techniques
ALUOp1
ALUOp0
F0
F3
F1
F2
A0
A1
A2
Decoding the Instruction - DataThe instruction holds the key to all of the data signals
Writereg./Readreg. B
R-type
Memory,Branch
Opcode RS RT RD ShAmt Function
31-26 25-21 20-16 15-11 10-6 5-0
Opcode RS RT Immediate Data
31-26 25-21 20-16 15-0
To ctrllogic
Readreg. A
Memory address or Branch Offset
To ctrllogic
Readreg. A
Readreg. B
Writereg.
To ALUControl
Not Used
One problem - Write register number must come from two different places.
5.3
Instruction Decoding
5.3
InstructionMemory
Data Memory
AddAdd
4
Read address
Instruction [31-0]
Read address
Write address
Write data
Read dataResult
Zero
Result
Result Sh.Left2
0
1
1
00
1
signextend
PC
16 32
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
Imm:[15-0]
Rs:[25-21]
Rt:[20-16]
Rd:[15-11]
Op:[31-26]Ctrl
Read Reg A: Rs
Read Reg B: Rt
Write Reg: Either Rd or Rt
Immediate Data: [15-0]
Opcode: [31-26]
0
1
We can decode the data simply by dividing up the instruction bus
Control Signals
5.3
InstructionMemory
Data Memory
AddAdd
4
Read address
Instruction [31-0]
Read address
Write address
Write data
Read dataResult
Zero
Result
Result Sh.Left2
0
1
1
00
1
signextend
PC
16 32
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
ALUCtrl
6 ALUOp
ALU Control - A function of: ALUOp and the function code
RegWrite
MemToReg
MemWrite
MemRead
ALUSrc
PCSrc
Load
StoreLoad
Memory
Load,R-type BEQ and zero
00: Memory01: Branch10: R-type
0
1
Ctrl
Imm:[15-0]
Rs:[25-21]
Rt:[20-16]
Rd:[15-11]
Op:[31-26]
FC:[5-0]
RegDest
R-type
Inside the control oval
Reg ALU Mem Reg Mem MemInstruction Opcode Write Src To Reg Dest Read Write PCSrc ALUOp
5.3
• This control logic can be decoded in several ways:
• Random logic, PLA, PAL
• Just build hardware that looks for the 4 opcodes
• For each opcode, assert the appropriate signals
Note: BEQ must also check the zero output of the ALU...Note: BEQ must also check the zero output of the ALU...
BEQ 000100 0 0 x x 0 0 1 01
R-format 000000 1 0 0 1 0 0 0 10
LW 100011 1 1 1 0 1 0 0 00
SW 101011 0 1 x x 0 1 0 00
0:Rt1:Rd
0:Reg1:Imm
1:Mem0:ALU
1:Branch
00:Mem01:Branch10:R-type
Control Signals
5.3
InstructionMemory
Data Memory
AddAdd
4
Read address
Instruction [31-0]
Read address
Write address
Write data
Read dataResult
Zero
Result
Result Sh.Left2
0
1
1
00
1
signextend
PC
16 32
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
ALUCtrl
6
ALUOp
RegWrite
MemToReg
MemWriteMemRead
ALUSrc
PCSrc
0
1
Ctrl
Imm:[15-0]
Rs:[25-21]
Rt:[20-16]
Rd:[15-11]
Op:[31-26]
FC:[5-0]
RegDest
BEQ
ReadWrite
We must ANDBEQ and Zero
Jumping
5.3
InstructionMemory
Data Memory
AddAdd
4
Read address
Instruction [31-0]
Read address
Write address
Write data
Read dataResult
Zero
Result
Result Sh.Left2
0
1
1
00
1
signextend
PC
16 32
Read reg. num A
RegistersRead reg num B
Write reg num
Write reg data
Read reg data A
Read reg data B
Read reg num A
ALUCtrl
6
ALUOp
RegWrite
MemToReg
MemWriteMemRead
ALUSrc
PCSrc
0
1
Ctrl
Imm:[15-0]
Rs:[25-21]
Rt:[20-16]
Rd:[15-11]
Op:[31-26]
FC:[5-0]
RegDest
BEQ
ReadWrite
1
0
Sh.Left2
J:[25-0]
Concat.26
4
32
28
[31-28]
Jump
PerformanceWhat major functional units are used by different instructions?
R-type: Instr. Fetch Register Read ALU Register Write
Assume the following times:
Memory Access: 2ns
ALU: 2ns
Registers: 1ns
6ns
8ns
7ns
5ns
2ns
Branch: Instr. Fetch Register Read ALU
LW: Instr. Fetch Register Read ALU Memory Read Register Write
SW: Instr. Fetch Register Read ALU Memory Write
Jump: Instr. Fetch
Since the longest time is 8ns (LW),the cycle time must be at least 8ns.
Example
• Calculate the execution times for the following program in a Single-cycle datapath with a cycle time of 50 ns
main:
add $9, $0, $0 # clear $9
lw $8, Tonto($9) # put Tonto[0] in $8
addi $9, $9, 4 # increment $9
lw $10, Tonto($9) # put Tonto[1] in $10
add $11, $10, $8
Example 2
Calculate the execution times for the following program in a Single-cycle datapath with a cycle time of 50 ns
.dataARRAY: .word 3, 5, 7, 9, 2 #random valuesSUM: .word 0 #initialize sum to zero
.textmain: addi $6, $0, 5 #initialize loop counter to 5
addi $7, $0, 0 #initialize array index to zeroaddi $8, $0, 0 #set $8 (sum temp) to zero
REPEAT: lw $5, ARRAY($7) #R5 = ARRAY[i]add $8, $8, $5 #SUM+= ARRAY[I]addi $7, $7, 4 #increment index (i++)addi $6, $6, -1 #decrement loop counterbne $6, $0, REPEAT #check if 5 repetitionssw $8, SUM($0) #copy sum to memoryaddi $v0, $0, 10 #exit programsyscall