Chapter 4
description
Transcript of Chapter 4
Chapter 4
The Processor
CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Zhao ZhangIowa State UniversityRevised from original slides provided by MKP
Week 8 Overview CPU design overview Datapath and Control Control Unit ALU Control Unit
Chapter 1 — Computer Abstractions and Technology — 2
Announcements Mini-project B starts in week 9
Mini-projects B and C will be revised The grading scale will be discussed by
Friday (week 8)
Chapter 1 — Computer Abstractions and Technology — 3
Chapter 4 — The Processor — 4
Introduction CPU performance factors
Instruction count Determined by ISA and compiler
CPI and Cycle time Determined by CPU hardware
We will examine two MIPS implementations A simplified, single-cycle version A more realistic, pipelined version
§4.1 Introduction
Nine-Instruction MIPS We will first use a MIPS subset of nine
instructions, then extend the subset It’s enough to illustrate the most aspects of CPU
design, particularly datapath and control design
Memory reference: LW and SW
Arithmetic/logic: ADD, SUB, AND, OR, SLT
Branch: BEQ, BNE
Chapter 1 — Computer Abstractions and Technology — 5
Chapter 4 — The Processor — 6
Instruction Execution PC instruction memory, Fetch instruction Register numbers register file, Read registers
Then, depending on instruction class Execute: Use ALU to calculate
Arithmetic result Memory address for load/store Branch target address
Memory access: Access data memory for load/store Register writeback: Write data back to registers
PC update (for all): PC target address or PC + 4
Chapter 4 — The Processor — 7
CPU Overview
Next Sequential PC= PC + 4
Branch Target= (PC+4)+offset
A Sketchy view
An instruction may change1. PC (all instructions)2. Some register (arithmetic/logic, load)3. Some memory word/halfword/byte (store)
Chapter 4 — The Processor — 8
Multiplexers Can’t just join
wires together Use multiplexers
What would happen if you just join signals in VHDL?
Chapter 4 — The Processor — 9
Control
Control signals: mux select, read/write enable, ALU opcode, etc.
Chapter 4 — The Processor — 10
Logic Design Basics§4.2 Logic D
esign Conventions
Combinational element Operate on data Output is a function of input
State (sequential) elements Store information Output is a function of internal state and
input
Chapter 4 — The Processor — 11
Combinational Elements
AND-gate Y = A & B
AB
Y
I0I1
YMux
S
Multiplexer Y = S ? I1 : I0
A
B
Y+
A
B
YALU
F
Adder Y = A + B
Arithmetic/Logic Unit Y = F(A, B)
Chapter 4 — The Processor — 12
Sequential Elements Register: stores data in a circuit
Uses a clock signal to determine when to update the stored value
Edge-triggered: update when Clk changes from 0 to 1
Data output Q is stable for a clock cycle
D
Clk
QClk
D
Q
Chapter 4 — The Processor — 13
Sequential Elements Register with write control
Only updates on clock edge when write control input is 1
VHDL: rising_edge(Clk) AND Write Used when stored value is required later
D
Clk
Q
Write
Write
D
Q
Clk
Chapter 4 — The Processor — 14
Clocking Methodology Combinational logic transforms data during
clock cycles Input from state elements Output must stabilize within one cycle
Longest delay determines clock period Output to state element at the next rising edge
Clocking Methodology Processor is a big state machine
Works like a Moore machine in non-I/O phase Output is a function of the state
States include PC, all registers and memory contents
Chapter 1 — Computer Abstractions and Technology — 15
Chapter 4 — The Processor — 16
Building a Datapath Datapath elements
Elements that process data and addressesin the CPU
Registers, ALUs, mux’s, memories, …
We will build a MIPS datapath incrementally Refining the overview design
§4.3 Building a D
atapath
Chapter 4 — The Processor — 17
Instruction Fetch
32-bit register
Increment by 4 for next instruction
Datapath elements: PC register, instruction memory, 32-bit adder
Chapter 4 — The Processor — 18
R-Format Instructions Read two register operands Perform arithmetic/logical operation Write register result
Datapath elements: Register file, ALU
Chapter 4 — The Processor — 19
Load/Store Instructions Read register operands Calculate address using 16-bit offset
Use ALU, but sign-extend offset Load: Read memory and update register Store: Write register value to memory
Datapath elements: Data memory, sign extender
Chapter 4 — The Processor — 20
Branch Instructions Read register operands Compare operands
Use ALU, subtract and check Zero output Calculate target address
Sign-extend displacement Shift left 2 places (word displacement) Add to PC + 4
Already calculated by instruction fetch
Chapter 4 — The Processor — 21
Branch Instructions
Justre-routes wires
Sign-bit wire replicatedNew: Shifter, 2nd 32-bit Adder
Chapter 4 — The Processor — 22
Composing the Elements First-cut data path does an instruction in
one clock cycle Each datapath element can only do one
function at a time Hence, we need separate instruction and data
memories Use multiplexers where alternate data
sources are used for different instructions
Chapter 4 — The Processor — 23
R-Type/Load/Store Datapath
Chapter 4 — The Processor — 24
Full Datapath
Chapter 4 — The Processor — 25
Performance Issues Longest delay determines clock period
Critical path: load instruction Instruction memory register file ALU data
memory register file Not every instruction requires the same time
Chapter 4 — The Processor — 26
Performance Issues Some instructions may take substantially longer
time, e.g. multiply/division Not feasible to vary clock cycle for different
instructions Must use the worst-case delay as the clock cycle Violates design principle making the common
case fast We will improve performance by pipelining
Chapter 4 — The Processor — 27
ALU Control ALU used for
Load/Store: F = add Branch: F = subtract R-type: F depends on funct field
§4.4 A S
imple Im
plementation S
cheme
ALU control Function
0000 AND
0001 OR
0010 add
0110 subtract
0111 set-on-less-than
1100 NOR
Chapter 4 — The Processor — 28
ALU Control Assume 2-bit ALUOp derived from opcode
Combinational logic derives ALU control
opcode ALUOp Operation funct ALU function ALU control
lw 00 load word XXXXXX add 0010
sw 00 store word XXXXXX add 0010
beq 01 branch equal XXXXXX subtract 0110
R-type 10 add 100000 add 0010
subtract 100010 subtract 0110
AND 100100 AND 0000
OR 100101 OR 0001
set-on-less-than 101010 set-on-less-than 0111
VHDL Notes How to program the ALU control? -- Behavior style process (alu_op, funct) begin case alu_op is when ‘00’ => alu_code <= ‘0010’; when ’01’ => … end case; end process;
Chapter 1 — Computer Abstractions and Technology — 29
Chapter 4 — The Processor — 30
The Main Control Unit Control signals derived from instruction
0 rs rt rd shamt funct
31:26 5:025:21 20:16 15:11 10:6
35 or 43 rs rt address
31:26 25:21 20:16 15:0
4 rs rt address
31:26 25:21 20:16 15:0
R-type
Load/Store
Branch
opcode always read
read, except for load
write for R-type and load
sign-extend and add
Chapter 4 — The Processor — 31
Datapath With Control
Summary of Control Signals RegDst: Write to register rt or rd? ALUSrc: Immediate to ALU? MemtoReg: Write memory or ALU output? RegWrite: Write to regfile at all? MemRead: Read from Data Memory? MemWrite: Write to the Data Memory? Branch: Is it a branch intruction? ALUOp[1:0]: ALU control field
Chapter 1 — Computer Abstractions and Technology — 32
Chapter 4 — The Processor — 33
R-Type Instruction
R-Type: Control Signals
RegDst ALUSrc MemtoRegRegWriteMemReadMemWriteBranchALUOp[1:0]
1 (write to rd)0 (No immediate)0 (wrote not from memory)1 (does write regfile)
0 (no memory read)
0 (no memory write)
0 (does write regfile)
10 (R-type ALU op)
Chapter 1 — Computer Abstractions and Technology — 34
Chapter 4 — The Processor — 35
Load Instruction
Load: Control Signals
RegDst ALUSrc MemtoRegRegWriteMemReadMemWriteBranchALUOp[1:0]
011110000
Chapter 1 — Computer Abstractions and Technology — 36
Store: Control Signals
RegDst ALUSrc MemtoRegRegWriteMemReadMemWriteBranchALUOp[1:0]
X1X001000
Chapter 1 — Computer Abstractions and Technology — 37
Chapter 4 — The Processor — 38
Branch-on-Equal Instruction
BEQ: Control Signals
RegDst ALUSrc MemtoRegRegWriteMemReadMemWriteBranchALUOp[1:0]
X0X000101
Chapter 1 — Computer Abstractions and Technology — 39
Control Signal Setting What’re the control signal values for each
instruction or instruction type?
Inst Reg-Dst
ALU-Src
Mem-toReg
Reg-Write
Mem-Read
Mem-Write
Branch
ALUOp1
ALUOp0
R- 1 0 0 1 0 0 0 1 0
lw 0 1 1 1 1 0 0 0 0
sw X 1 X 0 0 1 0 0 0
beq X 0 X 0 0 0 1 0 1
Chapter 1 — Computer Abstractions and Technology — 40
Note: “R-” means R-format
VHDL Notes How to program the control?
entity control is port (op_code : in m32_6bits; reg_dst : out m32_1bit; alu_src : out m32_1bit; mem_to_reg : out m32_1bit; reg_write : out m32_1bit; mem_read : out m32_1bit; mem_write : out m32_1bit; branch : out m32_1bit; alu_op : out m32_2bits);end control;
Chapter 1 — Computer Abstractions and Technology — 41
VHDL Notesarchitecture rom of control is subtype code_t is m32_vector(8 downto 0); type rom_t is array (0 to 63) of code_t; -- The ROM content for control signals signal rom : rom_t := ( 00 => "100100010", -- R-type 35 => "011110000", -- LW … -- More for other instructions others=>"000000000");
begin (reg_dst, alu_src, mem_to_reg, reg_write, mem_read, mem_write, branch, alu_op(1), alu_op(0)) <= rom(to_integer(unsigned(op_code)));end rom;
Chapter 1 — Computer Abstractions and Technology — 42
Chapter 4 — The Processor — 43
Implementing Jumps
Jump uses word address Update PC with concatenation of
Top 4 bits of old PC 26-bit jump address 00
Need an extra control signal decoded from opcode
2 address
31:26 25:0
Jump
Chapter 4 — The Processor — 44
Datapath With Jumps Added
Grading Scale Tentative grading scale
A: 90, A-: 87
B+: 84, B: 80, B-: 75
C+: 70, C: 65, C-: 60
D: 50 There will be a bonus in lab projects
Chapter 1 — Computer Abstractions and Technology — 45
Mini-Project B, Tentative
Implement single-cycle processor (SCP). There will be three parts
1. Part 1, SCPv1: Implement the nine-instruction ISA plus the J instruction
2. Part 2, SCPv2a: Support all the instructions needed to run bubble sorting
3. Part 3, SCPv2b: Detailed modeling of data elements
Chapter 1 — Computer Abstractions and Technology — 46
Mini-Project B Bonus part, SCPv3: Support all integer
instructions on the green sheet, due in the last lab
Some support files will be provided High-level modeling of Register File, ALU,
Adder, to be used in Parts 1 and 2
Partial sample VHDL code will be provided
Chapter 1 — Computer Abstractions and Technology — 47
Mini-Project B The CPU composition must be strongly
structural Parts 1 and 2 may use behavior/dataflow
modeling for data elements Part 3 must use detailed modeling for data
elements – Reuse your VHDL code in the labs
Chapter 1 — Computer Abstractions and Technology — 48
Extend Single-Cycle MIPS
Consider the following instructions addi: add immediate sll: Shift left logic by a constant bne: branch if not equal jal: Jump and link jr: Jump register
Chapter 1 — Computer Abstractions and Technology — 49
Chapter 4 — The Processor — 50
SCPv0: R-Format, LW/SW, BEQ
Chapter 4 — The Processor — 51
SCPv1: R-Format, LW/SW, BEQ, J
SCPv1: Control Signals What’re the control signal values for each
instruction or instruction type?
Inst Reg-Dst
ALU-Src
Mem-toReg
Reg-Write
MemRead
MemWrite
Branch
ALUOp1
ALUOp0
Jump
R- 1 0 0 1 0 0 0 1 0 0
lw 0 1 1 1 1 0 0 0 0 0
sw X 1 X 0 0 1 0 0 0 0
beq X 0 X 0 0 0 1 0 1 0
j X X X 0 0 0 0 X X 1
Chapter 1 — Computer Abstractions and Technology — 52
Note: “R-” means R-format
Extend the Single-Cycle Processor
For each instruction, do we need
1. Any new or revised datapath element(s)?
2. Any new control signal(s)?
Then revise, if necessary,
3. Datapath: Add new elements or revise existing ones, add new connections
4. Control Unit: Add/extend control signals, extend the truth table
5. ALU Control: Extend the truth tableChapter 1 — Computer Abstractions and Technology — 53
SCPv0 + ADDI
addi rs, rt, immediate
R[rt] = R[rs]+SignExtImm Read register operands (only one is used) Sign extend the immediate (in parallel) Perform arithmetic/logical operation Write register result
Chapter 1 — Computer Abstractions and Technology — 54
001000 rs rt immediate
31:26 25:21 20:16 15:0
SCPv0 + ADDI
Chapter 1 — Computer Abstractions and Technology — 55
What changes to this baseline?
Chapter 4 — The Processor — 56
SCPv0 + ADDI
Do we need new or revised datapath elements?
SCPv0 + ADDI Do we need new or revised datapath
elements? Do we need new control signal(s)?
Inst Reg-Dst
ALU-Src
Mem-toReg
Reg-Write
Mem-Read
Mem-Write
Branch
ALUOp1
ALUOp0
R- 1 0 0 1 0 0 0 1 0
lw 0 1 1 1 1 0 0 0 0
sw X 1 X 0 0 1 0 0 0
beq X 0 X 0 0 0 1 0 1
addi
Chapter 1 — Computer Abstractions and Technology — 57
SCPv0 + ADDI Like LW
I-format instruction Write to register[rt] Use add operation
Inst Reg-Dst
ALU-Src
Mem-toReg
Reg-Write
Mem-Read
Mem-Write
Branch
ALUOp1
ALUOp0
R- 1 0 0 1 0 0 0 1 0
lw 0 1 1 1 1 0 0 0 0
sw X 1 X 0 0 1 0 0 0
beq X 0 X 0 0 0 1 0 1
addi 0 1 0 1 0 0 0 0 0
Chapter 1 — Computer Abstractions and Technology — 58
Like R-format arithmetic Write ALU result to
register file
SCPv0 + SLL
sll rd, rs, shamt
R[rd] = R[rt]<<shamt Read register operands (only one is used) Perform shift operation Write register result
Note: sllv rd, rt, rs for shift left logic variable
Chapter 1 — Computer Abstractions and Technology — 59
000000 rs rt rd shamt 000000
31:26 5:025:21 20:16 15:11 10:6
SCPv0 + SLL
Chapter 1 — Computer Abstractions and Technology — 60
What changes to the datapath elements?
SCPv0 + SLL
Chapter 1 — Computer Abstractions and Technology — 61
ALU needs to do the shift operation
ALU 1st input needs anothersource: shamt extended to 32-bit
SCPv0 + SLL Add another source to the 1st
input of ALU Shamt: Instruction[10-6]
Add a Mux and ALUSrc1 control line 0: R[rs] 1: Shamt (sign-extended)
Rename ALUSrc to ALUSrc2 Extend ALU control
Add an ALU control code for SLL
Chapter 1 — Computer Abstractions and Technology — 62
Chapter 4 — The Processor — 63
SCPv0 + SLL Extend ALU control: Choose a code of
your choice (kkkk shown in the table)
opcode ALUOp Operation funct ALU function ALU control
lw 00 load word XXXXXX add 0010
sw 00 store word XXXXXX add 0010
beq 01 branch equal XXXXXX subtract 0110
R-type 10 add 100000 add 0010
subtract 100010 subtract 0110
AND 100100 AND 0000
OR 100101 OR 0001
set-on-less-than 101010 set-on-less-than 0111
shift-left-logic 000000 shift-left-logic kkkk
SCPv0 + SLLInst Reg-
DstALU-Src
Mem-toReg
Reg-Write
Mem-Read
Mem-Write
Branch
ALUOp
R- 1 0 0 1 0 0 0 1 0
lw 0 1 1 1 1 0 0 0 0
sw X 1 X 0 0 1 0 0 0
beq X 0 X 0 0 0 1 0 1
sll
Chapter 1 — Computer Abstractions and Technology — 64
Inst Reg-Dst
ALUSrc1
ALU-Src2
Mem-toReg
Reg-Write
Mem-Read
Mem-Write
Branch
ALUOp
R- 1 0 0 0 1 0 0 0 1 0
lw 0 0 1 1 1 1 0 0 0 0
sw X 0 1 X 0 0 1 0 0 0
beq X 0 0 X 0 0 0 1 0 1
sll 1 1 0 0 1 0 0 0 1 0
SCPv0 + BNEbne rs, rt, label
PC = (R[Rs]!=R[rt]) ? PC+4+(SignExtImm<<2) : PC+4
Read register operands Compare operands
Use ALU, subtract and check Zero output Calculate target address
Sign-extend displacement Shift left 2 places (word displacement) Add to PC + 4
Already calculated by instruction fetchChapter 1 — Computer Abstractions and Technology — 65
000101 rs rt offset
31:26 25:21 20:16 15:0
Chapter 4 — The Processor — 66
SCPv0 + BNE
Make what changes tothe datapath?
SCPv0 + BNE Extend Branch to two bits
10: Branch-Equal 11: Branch-Not-Equal
Replace the AND gate with the following logic
Can use a different truth table
Chapter 1 — Computer Abstractions and Technology — 67
Branch Zero Branch taken?
1 0 1 1
1 1 0 1
otherwise 0
SCPv0 + BNEInst Reg-
DstALU-Src
Mem-toReg
Reg-Write
Mem-Read
Mem-Write
Branch
ALUOp
R- 1 0 0 1 0 0 0 1 0
lw 0 1 1 1 1 0 0 0 0
sw X 1 X 0 0 1 0 0 0
beq X 0 X 0 0 0 1 0 1
bne
Chapter 1 — Computer Abstractions and Technology — 68
Inst Reg-Dst
ALU-Src
Mem-toReg
Reg-Write
Mem-Read
Mem-Write
Branch
ALUOp
R- 1 0 0 1 0 0 0 0 1 0
lw 0 1 1 1 1 0 0 0 0 0
sw X 1 X 0 0 1 0 0 0 0
beq X 0 X 0 0 0 1 0 0 1
bne X 0 X 0 0 0 1 1 0 1
SCPv1 + JAL
jal target
PC = JumpAddrR[31] = PC+4
Jump uses word address Update PC with JumpAddr: concatenation
of top 4 bits of old PC, 26-bit jump address, and 00 (called pseudo-direct)
Save PC+4 to $raChapter 1 — Computer Abstractions and Technology — 69
000011 address
31:26 25:0
Chapter 4 — The Processor — 70
SCPv1 + JAL
Make what changes tothe datapath?