RISC Central Processing Unit
http://www.cs.nctu.edu.tw/~ldvan/
Lan-Da Van (范倫達), Ph. D.
Department of Computer Science National Chiao Tung University
Taiwan, R.O.C. Spring, 2012
Source: Prof. M. Morris Mano and Prof. Charles R. Kime, Logic and Computer Design Fundamentals, 3rd Edition, 2004, Prentice Hall.
Digital Systems Design
Lecture 10
Outline
Introduction
Pipelined Datapath
Pipelined Control
The Reduced Instruction Set Computer (RISC)
Summary
Introduction
CPU: Datapath and control unit
Datapath:
— Consists of a function unit, regs, and internal buses
— May be nonpipelined or pipelined
Control unit:
— Consists of a program counter, an instr reg, and control logic
— May be hardwired, microprogrammed, or pipelined (if the datapath is pipelined)
Conventional datapath vs. pipelined datapath
Fig 12-1
Detailed block diagram of the pipelined datapath:
— The increased cost: the pipeline platforms
Fig 12-2
Pipeline Execution Pattern (1/2)
Execution of pipelined microoperations:
Fig 12-3
Pipeline Execution Pattern (2/2)
In the first two clock cycles, not all of the pipeline stages are active. => filling
In the next five clock cycles, all stages of the pipeline are active. => fully utilized
In the last two clock cycles, not all pipeline stages are active. => emptying
Speedup = (7 × 12 ns) / (9 × 5 ns) = 84 / 45 ≈ 1.9
More pipeline stages, always better? No:
— The delay contributed by the pipeline platforms
— The difference between the delays of the logic assigned to each stage
— Thus, there is an optimal number of pipeline stages.
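The speedup above can be reproduced with a minimal timing model. It assumes an ideal pipeline with no hazards: n operations passing through k stages take n + k − 1 clocks (k − 1 clocks to fill, then one completion per clock). The function names are hypothetical. The inputs are the slide's numbers: 7 microoperations, 3 stages, 12 ns nonpipelined, and 5 ns per stage.

```python
# Sketch only: an ideal (hazard-free) pipeline timing model; names are hypothetical.

def pipelined_cycles(n_ops: int, k_stages: int) -> int:
    """Clock cycles for n_ops to pass through an ideal k-stage pipeline."""
    # k - 1 cycles to fill the pipeline, then one op completes per cycle.
    return n_ops + k_stages - 1


def pipeline_speedup(n_ops: int, k_stages: int,
                     t_nonpipelined_ns: float, t_stage_ns: float) -> float:
    """Speedup of a k-stage pipeline over a nonpipelined datapath."""
    t_conventional = n_ops * t_nonpipelined_ns
    t_pipelined = pipelined_cycles(n_ops, k_stages) * t_stage_ns
    return t_conventional / t_pipelined


# 7 microoperations, 3 stages: 2 filling + 5 full + 2 emptying = 9 cycles.
print(pipelined_cycles(7, 3))                   # 9
print(round(pipeline_speedup(7, 3, 12, 5), 2))  # 1.87
```

The 4-stage case later in the lecture uses 7 instrs, a 17 ns single-cycle delay, and a 5 ns stage, so pipeline_speedup(7, 4, 17, 5) gives 119/50 = 2.38. As n grows, the 3-stage result approaches 12/5 = 2.4 rather than 3. This is because each 5 ns stage is longer than 12/3 = 4 ns, which reflects the platform delay and unequal stage delays listed above.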
Pipelined Control
Based on the single-cycle computer: (hardwired control)
Fig 12-2 Fig 10-15
Block diagram of the pipelined computer:
Pipeline Stages
Stage 1: Instr fetch (IF)
Stage 2: Instr decoder & reg file read (DOF)
— (decode & operand fetch)
Stage 3: The function unit & data mem read and write (EX)
Stage 4: Reg file write (WB)
Pipeline principle: the pipeline platforms are located so that the delays are partitioned among the stages as evenly as possible.
Timing: 5 ns/stage => 200 MHz clock; speedup = 17 ns / 5 ns = 3.4 (w.r.t. the single-cycle computer)
Pipeline Programming and Performance (1/3)
E.g.: Load constants 1 ~ 7 into regs R1 ~ R7
1 LDI R1, 1
2 LDI R2, 2
3 LDI R3, 3
4 LDI R4, 4
5 LDI R5, 5
6 LDI R6, 6
7 LDI R7, 7
Pipeline Programming and Performance (2/3)
In the first 4 clk periods (20 ns):
— 1/4 + 1/2 + 3/4 + 1 = 2.5 instrs completed
Overall time: 50 ns
— 10 clk cycles for 7 instrs
— Speedup = (7 × 17 ns) / 50 ns = 2.38
Filling: the first 3 clks
Fully utilized: the next 4 clks
— Speedup = (4 × 17 ns) / 20 ns = 3.4
Emptying: the last 3 clks
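The fractional count for the first 4 clocks, and the filling/full/emptying split, can be checked with a short sketch. It assumes each instr advances one stage per clock with no stalls, so after t clocks instr i (counting from 0) has completed min(max(t − i, 0), k) of its k stages. The function names are hypothetical.

```python
from fractions import Fraction

# Sketch only: ideal 4-stage pipeline (IF, DOF, EX, WB); names are hypothetical.


def work_completed(t_clocks: int, n_instrs: int, k_stages: int) -> Fraction:
    """Instructions' worth of work done after t_clocks in an ideal pipeline."""
    total = Fraction(0)
    for i in range(n_instrs):
        # Instr i enters IF in clock i + 1, so it has run for t - i clocks.
        stages_done = min(max(t_clocks - i, 0), k_stages)
        total += Fraction(stages_done, k_stages)
    return total


def stage_occupancy(n_instrs: int, k_stages: int) -> list[int]:
    """Number of busy stages in each clock cycle (cycles numbered from 1)."""
    n_cycles = n_instrs + k_stages - 1
    busy = []
    for c in range(1, n_cycles + 1):
        # Instr i (0-based) occupies stage index c - 1 - i when 0 <= it < k.
        busy.append(sum(1 for i in range(n_instrs) if 0 <= c - 1 - i < k_stages))
    return busy


# The 7-instr LDI program on the 4-stage pipeline.
print(work_completed(4, 7, 4))   # 5/2, i.e. 1 + 3/4 + 1/2 + 1/4
print(stage_occupancy(7, 4))     # [1, 2, 3, 4, 4, 4, 4, 3, 2, 1]
```

The occupancy list shows 3 filling clocks, 4 fully utilized clocks, and 3 emptying clocks, for 10 clocks in total.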
Pipeline Programming and Performance (3/3)
For a k-stage pipelined computer:
— The speedup w.r.t. the single-cycle computer is less than k, because:
The delays cannot be divided into k equal pieces.
The added pipeline platforms contribute their own delays.
The delay of the function unit is larger than each of the ideal k equal delays.
The pipeline must be filled and emptied.
Data hazards occur.
Control hazards occur.
The Reduced Instruction Set Computer
Design goal:
— A RISC with a pipelined datapath and control unit
— The instr set arch:
load/store mem access,
4 addressing modes,
a single instr format length,
instrs that require only elementary ops.
— The ops, resembling those that can be performed by
the single-cycle computer, can be performed by a
single pass through the pipeline.
Instruction Set Architecture
The CPU regs accessible to the programmer: Fig 12-6
— Reg file: 32 regs, 32 bits/reg, R0 = 0
Reg file size: RISC > CISC, because the load/store instr set arch requires operands to be loaded into regs
— PC: 32 bits
— No stack pointer or status reg
Instruction Format
Instr formats:
— 3-reg type
— 2-reg type
— Branch: target addr = PC + target offset
Instructions (1/2)
Instructions (2/2)
— All of the ops are elementary and can be described by a single reg transfer statement.
— The only ops that can access memory: Load & Store
— The immediate field: 15 bits → 32 bits, using zero fill or sign extension
— BZ, BNZ, SLT: compensate for the absence of stored status bits
— JML: Jump and Link
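The two ways of widening the 15-bit immediate field to 32 bits can be sketched as follows. The sketch assumes the field arrives as an unsigned integer in 0 ~ 2^15 − 1 and returns a 32-bit pattern. The function names are hypothetical.

```python
# Sketch only: 15-bit immediate -> 32-bit word; names are hypothetical.
IMM_BITS = 15
WORD_MASK = 0xFFFFFFFF  # 32-bit word


def zero_fill(imm: int) -> int:
    """0 || IM: pad the 15-bit field with 17 leading zeros."""
    return imm & ((1 << IMM_BITS) - 1)


def sign_extend(imm: int) -> int:
    """se IM: copy bit 14 of the field into bits 31..15."""
    imm &= (1 << IMM_BITS) - 1
    if imm & (1 << (IMM_BITS - 1)):  # field is negative in two's complement
        imm |= WORD_MASK & ~((1 << IMM_BITS) - 1)
    return imm


print(hex(zero_fill(0x7FFF)))    # 0x7fff
print(hex(sign_extend(0x7FFF)))  # 0xffffffff  (-1)
print(hex(sign_extend(0x3FFF)))  # 0x3fff      (positive, unchanged)
```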
Addressing Modes
4 addressing modes: specified by the opcode
i. Register: the 3-operand data manipulation instrs
ii. Register indirect: load and store instrs
iii. Immediate: the 2-reg format instrs
iv. Relative: branch and jump instrs
Implementing an addressing mode not directly provided:
— Use a sequence of RISC instrs
— E.g.: Indexed addressing, R15 ← M[R5 + 0 || I]
AIU R9, R5, I    (R9 ← R5 + 0 || I)
LD R15, R9       (R15 ← M[R9])
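That two-instr sequence can be traced as a register-transfer sketch. The sketch makes three assumptions: memory is a dict, addition wraps at 32 bits, and AIU adds the zero-filled immediate, as the transfer R5 + 0 || I indicates. The helper names and the values in the usage lines are hypothetical.

```python
# Sketch only: behavioral model of AIU and LD; names and values are hypothetical.
MASK32 = 0xFFFFFFFF


def aiu(regs: list[int], dr: int, sa: int, imm15: int) -> None:
    """AIU DR, SA, I:  R[DR] <- R[SA] + (0 || I)."""
    if dr != 0:  # R0 is hardwired to 0, so writes to it are ignored
        regs[dr] = (regs[sa] + (imm15 & 0x7FFF)) & MASK32


def ld(regs: list[int], mem: dict[int, int], dr: int, sa: int) -> None:
    """LD DR, SA:  R[DR] <- M[R[SA]] (register indirect)."""
    if dr != 0:
        regs[dr] = mem.get(regs[sa], 0)


regs = [0] * 32
regs[5] = 0x1000          # base address in R5
mem = {0x1010: 42}        # word stored at R5 + 0x10
aiu(regs, 9, 5, 0x10)     # R9 <- R5 + 0 || I
ld(regs, mem, 15, 9)      # R15 <- M[R9]
print(regs[15])           # 42
```

Emulating the missing mode costs one extra instr and a scratch register (R9 here), whose old contents are overwritten.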
Datapath Organization
The pipelined computer in Fig 12-4: 16-bit version
Modified Datapath and Control Unit
Modifications of datapath: Fig 12-4 → Fig 12-8 (32-bit version)
— Register file
— Function unit
— Bus structure
Modifications of control unit:
— Instruction decoder
— Control logic related to the PC
— Pipeline platforms
Register File and Function Unit
Reg file:
— 16 16-bit regs, all identical in function → 32 32-bit regs with R0 = 0
— Edge-triggered, read-after-write reg file
Function unit:
— ALU: 16 bits → 32 bits
— Shifter: single-bit-position shifter → barrel shifter performing lsr or lsl of 0 ~ 31 positions (SH = IR[4:0])
32-bit Barrel Shifter
Left/right: a control signal decoded from OPCODE; SH = IR[4:0], the shift amount field
Both left and right shifts are performed using a right rotate:
p-position right shift → rotate p positions to the right
p-position left shift → rotate 64 − p positions to the right
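The rotate-based scheme can be checked in software. This sketch assumes the 32-bit operand is zero-extended to 64 bits before the right rotate, and that only the low 32 bits of the rotated word are kept. Under that assumption, a rotate of 64 − p (rather than 32 − p) yields a left shift with zeros shifted in. It models behavior, not the gate-level structure, and the function names are hypothetical.

```python
# Sketch only: behavioral model assuming a 64-bit zero-extended rotate;
# names are hypothetical.
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def rotate_right_64(x: int, amount: int) -> int:
    """Rotate a 64-bit value right by amount positions."""
    amount %= 64
    return ((x >> amount) | (x << (64 - amount))) & MASK64


def barrel_shift(a: int, sh: int, left: bool) -> int:
    """lsl/lsr of a 32-bit value by sh (0..31) using only a right rotate."""
    x = a & MASK32                      # upper 32 bits are zeros
    amount = (64 - sh) if left else sh  # left shift: rotate right by 64 - p
    return rotate_right_64(x, amount) & MASK32


# Compare against Python's shift operators for every shift amount.
a = 0x80000001
for p in range(32):
    assert barrel_shift(a, p, left=False) == a >> p
    assert barrel_shift(a, p, left=True) == (a << p) & MASK32
print(hex(barrel_shift(a, 4, left=True)))   # 0x10
```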
Bus Structure
— Zero fill → constant unit:
CS = 0: zero fill, 0 || IM; CS = 1: sign extension, se IM
— MUX A is added: provides a path for PC1 to the reg file for implementing the JML instr
Jump and Link: PC ← PC + se IM, R[DR] ← PC + 1
— MUX D is extended: helps implement the SLT instr
Set if Less Than: if R[SA] < R[SB] then R[DR] ← 1
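The SLT register transfer can be written out as a sketch. Two assumptions are not stated on the slides: the comparison is signed (two's complement), and R[DR] is cleared when the condition is false. The function names are hypothetical.

```python
# Sketch only: SLT semantics under a signed-compare assumption; names are hypothetical.
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(regs: list[int], dr: int, sa: int, sb: int) -> None:
    """SLT DR, SA, SB: R[DR] <- 1 if R[SA] < R[SB], else 0 (assumed)."""
    if dr != 0:  # R0 stays 0
        regs[dr] = 1 if to_signed32(regs[sa]) < to_signed32(regs[sb]) else 0


regs = [0] * 32
regs[1] = 0xFFFFFFFF     # -1
regs[2] = 5
slt(regs, 3, 1, 2)       # -1 < 5  -> R3 = 1
slt(regs, 4, 2, 1)       #  5 < -1 -> R4 = 0
print(regs[3], regs[4])  # 1 0
```

The RISC keeps no stored status bits. A "branch if less than" would therefore be expressed as SLT into a reg followed by BZ or BNZ on that reg.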
Instruction Decoder
Instruction decoder: modified to handle the new instr set
— SH is added as an IR field.
— A 1-bit CS field is added to the instr decoder.
— A 1-bit MA field is added to the instr decoder.
— MD is expanded to two bits.
— A new pipeline platform is added for SH, and the platforms for MD are expanded to 2 bits.
Control Logic (1/2)
Control logic related to the PC: permits loading addrs into the PC to implement branches and jumps
— MUX C (EX stage) selects among 3 sources for the next PC value, based on BS and PS:
PC + 1
BrA, for branches and jumps: PC ← PC + 1 + se IM
R[AA], for reg jumps: PC ← R[AA]
— Pipeline regs PC1 & PC2
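The next-PC selection can be modeled behaviorally. This sketch uses a symbolic selector instead of the actual MUX C select encoding, which is derived from BS and PS and is not modeled here. It also assumes a 15-bit sign-extended IM and 32-bit wraparound. All names are hypothetical.

```python
from enum import Enum

# Sketch only: symbolic MUX C selection; names and encoding are hypothetical.
MASK32 = 0xFFFFFFFF


class NextPC(Enum):
    INCREMENT = "PC + 1"
    BRANCH = "BrA"       # branches and jumps
    REGISTER = "R[AA]"   # register jump


def sign_extend15(imm: int) -> int:
    """se IM for a 15-bit field, as a Python int (may be negative)."""
    imm &= 0x7FFF
    return imm - (1 << 15) if imm & 0x4000 else imm


def next_pc(sel: NextPC, pc: int, imm: int, r_aa: int) -> int:
    """Value MUX C places on the PC input for the given selection."""
    if sel is NextPC.INCREMENT:
        return (pc + 1) & MASK32
    if sel is NextPC.BRANCH:
        return (pc + 1 + sign_extend15(imm)) & MASK32  # BrA = PC + 1 + se IM
    return r_aa & MASK32                                # PC <- R[AA]


print(next_pc(NextPC.BRANCH, 1, 18, 0))       # 20
print(next_pc(NextPC.BRANCH, 20, 0x7FFF, 0))  # 20, since se IM = -1
```

The control hazard example later in the lecture places BZ R1, 18 at address 1. There BrA = 1 + 1 + 18 = 20, which is the address of MOVA R5, R6.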
Control Logic (2/2)
Control Words for Instrs
Data Hazards
Data hazard example:
Solutions to data hazards:
— Program-based solution
— Data hazard stall
— Data forwarding
Program-Based Solution
Disadv. of the program-based sol.:
— The program is longer (unless unrelated instrs can be placed in the NOP positions)
— Reduced throughput
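The program-based solution can be sketched as a small pass that inserts NOPs. The sketch assumes the 4-stage pipeline of this lecture. In that pipeline, as in the stall conditions given for HA and HB, only the instr immediately ahead (in EX) can still owe a write to a reg being read in DOF, and writes to R0 never count. The instr representation and the example program are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch only: hypothetical instr representation for a NOP-insertion pass.


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[int] = None      # reg written in WB (None: no write)
    srcs: tuple[int, ...] = ()      # regs read in DOF


NOP = Instr("NOP")


def insert_nops(program: list[Instr]) -> list[Instr]:
    """Insert a NOP wherever an instr reads a reg the previous instr writes."""
    out: list[Instr] = []
    for ins in program:
        prev = out[-1] if out else None
        if (prev is not None and prev.dest not in (None, 0)
                and prev.dest in ins.srcs):
            out.append(NOP)  # one empty slot separates producer and consumer
        out.append(ins)
    return out


program = [
    Instr("MOVA R1, R5", dest=1, srcs=(5,)),
    Instr("ADD R2, R1, R6", dest=2, srcs=(1, 6)),
    Instr("ADD R3, R1, R2", dest=3, srcs=(1, 2)),
]
for ins in insert_nops(program):
    print(ins.text)  # MOVA R1, R5 / NOP / ADD R2, R1, R6 / NOP / ADD R3, R1, R2
```

The padded program needs 5 + 3 = 8 clocks instead of 3 + 3 = 6, which is the throughput loss noted on this slide.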
Data Hazard Stall (1/4)
Data hazard stall: HW-based sol.
Data Hazard Stall (2/4)
When an operand read in the DOF stage has not been written back yet, the associated execution and write-back are delayed by stalling the pipeline flow in IF and DOF for one clock cycle.
— The pipeline is then said to be stalled, i.e., it contains a bubble in the subsequent clock cycles and stages for that instr.
Disadv.:
— Has the same throughput penalty as the program w/
the NOPs
Data Hazard Stall (3/4)
The following events must all occur for HA.
— MA in the DOF stage must be 0, meaning that the A
operand is coming from the register file.
— AA in the DOF stage equals DA in the EX stage,
meaning that there is potentially a register being
read in the DOF stage that is to be written in the next
clock cycle.
— RW in the EX stage is 1, meaning that register DA in
the EX stage will definitely be written in WB during
the next clock cycle.
— The OR of all bits of DA is 1, meaning that the
register to be written is not R0 and so is a register
that must be written before being read.
Data Hazard Stall (4/4)
Pipelined RISC with Data hazard stall:
— Added or modified hardware:
Data hazard detection: DHS
Pipeline stalling:
DHS is inverted to initiate a bubble in the pipeline for the instr
currently in the IR and to stop the PC and IR from changing.
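$$HA = \overline{MA}_{DOF} \cdot (AA_{DOF} = DA_{EX}) \cdot RW_{EX} \cdot \sum_{i=0}^{4} (DA_{EX})_i$$
$$HB = \overline{MB}_{DOF} \cdot (BA_{DOF} = DA_{EX}) \cdot RW_{EX} \cdot \sum_{i=0}^{4} (DA_{EX})_i$$
$$DHS = HA + HB$$
(Here + and Σ denote OR.)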
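The four conditions for HA, and the matching ones for HB, can be written as a behavioral check. The field containers and function name are hypothetical. The test `da != 0` stands in for the OR of the five bits of DA. When DHS is true, the slide's hardware inverts it to insert a bubble and hold the PC and IR. That part is not modeled here.

```python
from dataclasses import dataclass

# Sketch only: hypothetical containers for the pipeline-register fields.


@dataclass
class DofFields:
    ma: int   # 0: A operand comes from the reg file
    mb: int   # 0: B operand comes from the reg file
    aa: int   # A address (5 bits)
    ba: int   # B address (5 bits)


@dataclass
class ExFields:
    rw: int   # 1: DA will be written in WB
    da: int   # destination address (5 bits)


def hazard_signals(dof: DofFields, ex: ExFields) -> tuple[bool, bool, bool]:
    """Return (HA, HB, DHS) from the DOF- and EX-stage fields."""
    writes_real_reg = ex.rw == 1 and ex.da != 0  # OR of DA bits = 1: not R0
    ha = dof.ma == 0 and dof.aa == ex.da and writes_real_reg
    hb = dof.mb == 0 and dof.ba == ex.da and writes_real_reg
    return ha, hb, ha or hb


# ADD R2, R1, R6 in DOF while MOVA R1, R5 is in EX: the A operand conflicts.
print(hazard_signals(DofFields(ma=0, mb=0, aa=1, ba=6), ExFields(rw=1, da=1)))
# (True, False, True)
```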
Data Forwarding (1/2)
Data forwarding: HW-based sol.
Data Forwarding (2/2)
Pipelined RISC w/ data forwarding:
— Added or modified hardware:
Data hazard detection: HA, HB
Data forwarding:
The information needed to form the result is available on the inputs to the pipeline platform that provides the inputs to MUX D. A MUX D is added to produce the result on Bus D.
An additional input from Bus D is added to MUX A & MUX B.
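The forwarded operand selection can be sketched behaviorally. The priority order is an assumption: when HA (or HB) is asserted, the Bus D value takes precedence over the normal MUX A (or MUX B) choice. The slides state only that Bus D becomes an extra input. The sketch also assumes that MA = 1 selects PC1, as on the bus-structure slide, and that MB = 1 selects the constant unit. Function names are hypothetical.

```python
# Sketch only: assumed priority of the forwarded Bus D input; names are hypothetical.


def mux_a(ma: int, ha: bool, reg_a: int, pc1: int, bus_d: int) -> int:
    """A operand passed to EX when data forwarding is present."""
    if ha:  # HA already requires MA = 0: forward the result being formed in EX
        return bus_d
    return pc1 if ma == 1 else reg_a


def mux_b(mb: int, hb: bool, reg_b: int, constant: int, bus_d: int) -> int:
    """B operand: forwarded result, constant unit output, or reg file."""
    if hb:
        return bus_d
    return constant if mb == 1 else reg_b


# MOVA R1, R5 in EX is producing 7; ADD R2, R1, R6 in DOF reads a stale R1 = 0.
print(mux_a(ma=0, ha=True, reg_a=0, pc1=0, bus_d=7))  # 7
```

With the result forwarded this way, the consumer does not need the NOP or stall cycle used by the other two solutions.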
Control Hazards
Control hazard example (assume R1 = 0, so the branch is taken):
1 BZ R1, 18
2 MOVA R2, R3
3 MOVA R1, R2
20 MOVA R5, R6
Solutions to control hazards:
— Program-based solution
— Branch hazard stall
— Branch prediction
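A cycle-by-cycle sketch shows which instrs are at risk. It assumes the branch is resolved by MUX C in the EX stage (stage 3) and that the new PC is used for the fetch in the following clock. The function name and parameters are hypothetical.

```python
# Sketch only: fetch order around a branch; names are hypothetical.


def fetch_sequence(branch_addr: int, target: int, taken: bool,
                   resolve_stage: int = 3) -> list[int]:
    """Addresses fetched in consecutive clocks, starting with the branch.

    The branch is resolved in stage resolve_stage (EX = stage 3), so the
    fetch logic keeps using PC + 1 for resolve_stage - 1 more clocks before
    the outcome can steer the next fetch.
    """
    fetched = [branch_addr + i for i in range(resolve_stage)]
    fetched.append(target if taken else branch_addr + resolve_stage)
    return fetched


# BZ R1, 18 at address 1 with R1 = 0 (taken); BrA = 1 + 1 + 18 = 20.
seq = fetch_sequence(1, 20, taken=True)
print(seq)        # [1, 2, 3, 20]
print(seq[1:-1])  # [2, 3]: fetched before the branch is resolved
```

The two addresses fetched before resolution (2 and 3) correspond to the 2 NOPs, or 2 bubbles, used by the solutions that follow.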
Program-Based Solution
2 NOPs are inserted after the branch instr.
— (These wasted cycles can sometimes be avoided by rearranging the order of instrs.)
Branch Hazard Stall
Branch hazard stall: HW-based sol.
— Just as in the case of the data hazard, a stall can be
used to deal w/ the control hazard.
Produce 2 bubbles after the branch instr
— Disadv.: The reduction in throughput will be the
same as w/ the insertion of NOPs.
Branch Prediction (1/3)
Branch prediction
— Simplest form: predict that branches will never be
taken
Instrs are fetched and decoded, and their operands fetched, as if the PC were simply incremented by 1.
If the branch is not taken, the instrs already in the pipeline
due to the prediction will be allowed to proceed.
If the branch is taken, the instrs following the branch instr
need to be cancelled.
The cancellation may be done by inserting bubbles into EX
& WB stages.
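A rough cost comparison illustrates the benefit of predicting "not taken". It assumes the 2-cycle penalty of the branch hazard stall applies to every branch under stalling, but only to taken branches under predict-not-taken, where the 2 cancelled instrs become bubbles. The taken fraction in the usage line is a hypothetical input, not a measured value, and the function name is hypothetical.

```python
# Sketch only: branch penalty under two policies; inputs are hypothetical.


def branch_penalty_cycles(n_branches: int, taken_fraction: float,
                          policy: str, penalty: int = 2) -> float:
    """Extra clock cycles caused by branches under a given policy."""
    if policy == "stall":              # 2 bubbles after every branch
        return n_branches * penalty
    if policy == "predict_not_taken":  # bubbles only when the guess is wrong
        return n_branches * taken_fraction * penalty
    raise ValueError(f"unknown policy: {policy}")


print(branch_penalty_cycles(100, 0.5, "stall"))              # 200
print(branch_penalty_cycles(100, 0.5, "predict_not_taken"))  # 100.0
```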
Branch Prediction (2/3)
E.g.: Branch prediction with branch not taken
Figure 12-16, p.555
— If the branch is taken
Branch Prediction (3/3)
Pipelined RISC w/ branch prediction
— Figure 12-17, p.556
— Added or modified hardware:
Branch detection (EX stage): the selection value on the inputs to MUX C is not 00
Instruction canceling: IF & DOF stages
Summary
This lecture covered the following topics:
— Pipelined Datapath
— Pipelined Control
— RISC
Instruction Set Architecture
Addressing Modes
Datapath
Data Hazard
Control Hazard