Single-Cycle MIPS Processor Complete Single Cycle Processor
3/12/2015
Lecture 19, 20 & 21: Processor Pipelining – Part I
March 9, 11, 13, 2015
Reading: Chapter 4, Patterson & Hennessy textbook
Prof. R. Iris Bahar
© 2015 R.I. Bahar
Portions of these slides taken from Professors S. Reda and D. Patterson
Homework #2 and Lab #4
Homework #2 has been posted
  Due Friday, March 20
  Pipeline hazards covered today and next week
Lab #4 has been posted: design a single-cycle processor
  First due date: Friday, March 20
    Demo the first set of instructions executing through your processor design
  Second due date: Friday, April 3
    Demo the full set of instructions executing correctly on your processor
    Final report due at this time
  Before starting on lab, go through the TimingQuest and PLL tutorials
  This lab is to be completed INDIVIDUALLY
2
Datapath Control
Single-Cycle MIPS Processor
3
Fetch instruction @ PC
Decode instruction
Fetch Operands
Execute instruction
Store result
Update PC
4
Complete Single Cycle Processor
Datapath for all instructions except jump
Single Cycle processor with Control
5
[without jumps]
Datapath and control with jumps
6
7
Performance Issues
Longest delay determines clock period
  Critical path: load instruction
  Instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle
  Making the common case fast
We will improve performance by pipelining
8
Pipelining Analogy
Pipelined laundry: overlapping execution
Parallelism improves performance
MIPS stages
9
Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register
Need registers between stages
  Hold information produced in previous cycle
Note that the register file is written in the first half of the cycle and read in the second half.
MIPS datapath pipeline stages
10
Showing optimal resource usage
Pipeline datapath abstraction
11
Multi-cycle datapath pipeline diagram
12
Traditional form
Tracing lw in its journey: 1st cycle
13
Tracing lw in its journey: 2nd cycle
14
Tracing lw in its journey: 3rd cycle
15
Tracing lw in its journey: 4th cycle
16
Tracing lw in its journey: 5th cycle
17
Wrong register number
Corrected pipeline datapath for lw
18
Pipeline state in 5th cycle
19
lw $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw $13, 24($1)
add $14, $5, $6
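As a rough sketch of the 5th-cycle snapshot, the Python below maps each of the five instructions to the stage it occupies in a given cycle. It assumes an ideal five-stage pipeline that issues one instruction per cycle with no stalls; the names `stage_in_cycle` and `snapshot` are illustrative and do not come from the slides.

```python
# Stage names in pipeline order (assumed ideal pipeline, no stalls).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle` (1-based).

    Returns None if the instruction has not entered the pipeline yet
    or has already left it.
    """
    position = cycle - 1 - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def snapshot(program, cycle):
    """Map every instruction in `program` to its stage in `cycle`."""
    return {instr: stage_in_cycle(i, cycle) for i, instr in enumerate(program)}


if __name__ == "__main__":
    for instr, stage in snapshot(PROGRAM, 5).items():
        print(f"{stage:>3}  {instr}")
```

In cycle 5 every stage is busy: the first lw is in WB while the last add is being fetched.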
20
Pipeline Performance
Assume time for stages is
  100ps for register read or write
  200ps for other stages
Compare pipelined datapath with single-cycle datapath

Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw         200ps         100ps           200ps    200ps           100ps            800ps
sw         200ps         100ps           200ps    200ps           -                700ps
R-format   200ps         100ps           200ps    -               100ps            600ps
beq        200ps         100ps           200ps    -               -                500ps
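The totals in the table can be checked with a few lines of Python. This is a minimal sketch: the stage labels (IF, ID, EX, MEM, WB) and the function names are assumptions made for illustration, and the delays are the ones stated above.

```python
# Per-stage delay in picoseconds, as assumed on the slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually uses.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def total_time(instr):
    """Sum of the stage delays an instruction class passes through."""
    return sum(STAGE_PS[stage] for stage in USES[instr])


def single_cycle_period():
    """Single-cycle clock must fit the slowest instruction (the critical path)."""
    return max(total_time(instr) for instr in USES)


def pipelined_period():
    """Pipelined clock must fit the slowest single stage."""
    return max(STAGE_PS.values())


if __name__ == "__main__":
    for instr in USES:
        print(f"{instr:9s} {total_time(instr)} ps")
    print("single-cycle Tc:", single_cycle_period(), "ps")
    print("pipelined Tc:   ", pipelined_period(), "ps")
```

The load sets the single-cycle period at 800ps, while the pipelined period is set by the slowest stage at 200ps.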
Single-cycle vs. pipeline performance
21
Single-cycle (Tc= 800ps)
Pipelined (Tc= 200ps)
22
Pipeline Speedup
If all stages are balanced, i.e., all take the same time:
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
If not balanced, speedup is less
Speedup due to increased throughput
Latency (time for each instruction) does not decrease
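A short sketch of the throughput argument, using the clock periods from the comparison above (800ps single-cycle, 200ps pipelined, five stages). The formula for the pipelined run time, (n + stages - 1) cycles, assumes no stalls; the function names are illustrative.

```python
def single_cycle_time(n, tc_single_ps=800):
    """Time to run n instructions when each takes one long cycle."""
    return n * tc_single_ps


def pipelined_time(n, stages=5, tc_pipe_ps=200):
    """Time to run n instructions in an ideal pipeline (no stalls):
    the first instruction needs `stages` cycles, each later one adds one cycle.
    """
    return (n + stages - 1) * tc_pipe_ps


def speedup(n):
    return single_cycle_time(n) / pipelined_time(n)


if __name__ == "__main__":
    for n in (1, 3, 1000, 1_000_000):
        print(n, single_cycle_time(n), pipelined_time(n), round(speedup(n), 3))
```

With unbalanced stages the speedup approaches 800/200 = 4 rather than 5, and a single instruction takes 1000ps in the pipeline versus 800ps in the single-cycle design, so latency does not improve.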
How many cycles does it take to execute this code?
Pipeline datapath summary
23
Reminder of single-cycle control
24
25
ALU Control
Assume 2-bit ALUOp derived from opcode
  Combinational logic derives ALU control
Define additional ALU control encodings to expand its functionality

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
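The ALU control table can be read as a small combinational function of ALUOp and funct. The sketch below models it in Python with bit patterns held as strings; the function name `alu_control` and the error handling for unlisted encodings are assumptions, not part of the slides.

```python
# ALU control encodings from the table.
ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_SLT = "0010", "0110", "0000", "0001", "0111"

# funct field -> ALU control, used only when ALUOp is 10 (R-type).
FUNCT_TO_ALU = {
    "100000": ALU_ADD,
    "100010": ALU_SUB,
    "100100": ALU_AND,
    "100101": ALU_OR,
    "101010": ALU_SLT,
}


def alu_control(alu_op, funct="XXXXXX"):
    """Return the 4-bit ALU control for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == "00":          # lw / sw: add to form the address
        return ALU_ADD
    if alu_op == "01":          # beq: subtract to compare
        return ALU_SUB
    if alu_op == "10":          # R-type: operation chosen by funct
        if funct not in FUNCT_TO_ALU:
            raise ValueError(f"unsupported funct {funct!r}")
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp {alu_op!r}")


if __name__ == "__main__":
    print(alu_control("00"))            # lw/sw
    print(alu_control("10", "101010"))  # set-on-less-than
```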
Main decoder
26
Instruction   Op5:0    RegWrite   RegDst   ALUSrc   Branch   MemRead   MemWrite   MemtoReg   ALUOp1:0
R-type        000000   1          1        0        0        0         0          0          10
lw            100011   1          0        1        0        1         0          1          00
sw            101011   0          X        1        0        0         1          X          00
beq           000100   0          X        0        1        0         0          X          01
addi          001000   1          0        1        0        0         0          0          00
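The main decoder is a lookup from the 6-bit opcode to the control signals. A minimal Python sketch follows, with the table's X (don't care) entries represented as `None`; the `Control` type and `decode` function are names chosen for illustration.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg_write: int
    reg_dst: Optional[int]     # None stands for X (don't care)
    alu_src: int
    branch: int
    mem_read: int
    mem_write: int
    mem_to_reg: Optional[int]  # None stands for X (don't care)
    alu_op: str


# One row per instruction class, copied from the decoder table.
MAIN_DECODER = {
    "000000": Control(1, 1, 0, 0, 0, 0, 0, "10"),        # R-type
    "100011": Control(1, 0, 1, 0, 1, 0, 1, "00"),        # lw
    "101011": Control(0, None, 1, 0, 0, 1, None, "00"),  # sw
    "000100": Control(0, None, 0, 1, 0, 0, None, "01"),  # beq
    "001000": Control(1, 0, 1, 0, 0, 0, 0, "00"),        # addi
}


def decode(opcode):
    """Return the control signals for a 6-bit opcode string."""
    if opcode not in MAIN_DECODER:
        raise ValueError(f"unsupported opcode {opcode!r}")
    return MAIN_DECODER[opcode]


if __name__ == "__main__":
    print(decode("100011"))  # lw
```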
Control signals
27
Modifications to pipeline control
28
Control signals are derived from instructions
  Same as in single-cycle implementation
Control is carried over to the proper pipeline stage
Pipelined datapath + control
29
Example: Cycle 1
30
Cycle 2
31
Cycle 3
32
Cycle 4
33
Cycle 5
34
Cycle 6
35
Cycle 7
36
Cycle 8
37
Cycle 9
38
39
Pipelining Hazards
Hazards are situations that prevent starting the next instruction in the next cycle
1. Structural hazards
  A required resource is busy
2. Data hazards
  Need to wait for previous instruction to complete its data read/write
3. Control hazards
  Deciding on control action depends on previous instruction
40
1. Structural Hazards
Conflict for use of a resource
What if in MIPS pipeline we had a single memory for instruction and data?
  Load/store requires data access
  Instruction fetch would have to stall for that cycle
  Would cause a pipeline “bubble”
Hence, pipelined datapaths require separate instruction/data memories
  Or separate instruction/data caches
What about having only one adder in the MIPS pipeline?
2. Data Hazards: compute-use
41
[Figure: pipeline diagram, time in cycles 1 to 8, each instruction passing through IM, RF, ALU, DM, RF]
add $s0, $s2, $s3
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
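To make the compute-use hazard concrete, the sketch below scans the four-instruction sequence for read-after-write dependences that fall within two instructions of the producer. It assumes no forwarding, and relies on the register file being written in the first half of a cycle and read in the second half, so a consumer three instructions behind reads the new value. The tuple layout and the name `raw_hazards` are illustrative, and the scan ignores intervening redefinitions of a register.

```python
# (text, destination register, source registers)
PROGRAM = [
    ("add $s0, $s2, $s3", "$s0", ("$s2", "$s3")),
    ("and $t0, $s0, $s1", "$t0", ("$s0", "$s1")),
    ("or $t1, $s4, $s0", "$t1", ("$s4", "$s0")),
    ("sub $t2, $s0, $s5", "$t2", ("$s0", "$s5")),
]


def raw_hazards(program, window=2):
    """List (producer index, consumer index, register) for every
    read-after-write dependence within `window` instructions.

    Simplification: does not notice a later instruction overwriting
    the same register in between.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None or dest in ("$0", "$zero"):
            continue  # no register written, or writes to $0 are discarded
        last = min(i + 1 + window, len(program))
        for j in range(i + 1, last):
            if dest in program[j][2]:
                hazards.append((i, j, dest))
    return hazards


if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(PROGRAM):
        print(f"{PROGRAM[consumer][0]!r} reads {reg} too early "
              f"(written by {PROGRAM[producer][0]!r})")
```

Under these assumptions the and and the or are hazards, while the sub is far enough behind the add to read the updated $s0.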
2. Data Hazard: load-use
42
Handling data hazards
43
A. Compile-time techniques
B. Stall the processor at run time
C. Forward data at run time
44
A. Compile Time Technique: Code Scheduling
Reorder code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;
Compiler must be aware of pipeline structure

Original order, with load-use stalls (13 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
(stall)
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
(stall)
add $t5, $t1, $t4
sw $t5, 16($t0)

Reordered, no stalls (11 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
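The 13-cycle and 11-cycle counts can be reproduced with a small model. This sketch assumes a five-stage pipeline with forwarding, so the only penalty is one stall when an instruction uses the result of the lw immediately before it; the tuple layout and function names are made up for the example.

```python
# (opcode, destination register or None, source registers)
ORIGINAL = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]

REORDERED = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]


def load_use_stalls(program):
    """Count instructions that read the destination of the lw right before them."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "lw" and prev_dest in curr[2]:
            stalls += 1
    return stalls


def total_cycles(program, stages=5):
    """Cycles to drain the pipeline: fill time plus one per instruction plus stalls."""
    return len(program) + (stages - 1) + load_use_stalls(program)


if __name__ == "__main__":
    print("original: ", total_cycles(ORIGINAL), "cycles")
    print("reordered:", total_cycles(REORDERED), "cycles")
```

Moving the third lw up separates each load from its first use, which is what removes both stalls.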
A. Compile Time Technique: Insert NOPs
45
[Figure: pipeline diagram, time in cycles 1 to 10, with two nops inserted after the add]
add $s0, $s2, $s3
nop
nop
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5
Insert enough NOPs until result is ready (wastes cycles)
Doesn’t require HW to detect hazards
46
B. Run time technique: Stall pipeline
Detect dependency at run time and insert “bubbles”
  Prevent new instruction from advancing in pipeline
add $s0, $t0, $t1
sub $t2, $s0, $t3
How do we stall the pipeline?
  Do not update PC or IF/ID
    Instruction in ID stage is decoded again, instruction in IF stage is fetched again
  Force control values in ID/EX register to 0
    Essentially passes on a NOP instruction to the EX stage
Inserting a 2-cycle stall allows results to be written to register file before reading them in ID stage.
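A minimal sketch of the stall decision, assuming no forwarding: the instruction in ID must wait while an earlier instruction that writes one of its source registers is still in EX or MEM. The signal names `PCWrite`, `IF/IDWrite` and `ZeroIDEXControl` are labels chosen here for the three actions listed above, and the model treats both rs and rt as sources, which over-stalls for instructions such as lw or addi that use rt as a destination.

```python
def must_stall(id_rs, id_rt, ex_reg_write, ex_rd, mem_reg_write, mem_rd):
    """True if the instruction in ID reads a register that an instruction
    in EX or MEM has yet to write (register numbers are integers)."""
    def conflicts(reg_write, rd):
        return bool(reg_write) and rd != 0 and rd in (id_rs, id_rt)
    return conflicts(ex_reg_write, ex_rd) or conflicts(mem_reg_write, mem_rd)


def stall_controls(stall):
    """Actions taken on a stall: freeze PC and IF/ID, send a bubble into EX."""
    return {
        "PCWrite": not stall,        # hypothetical name: hold the PC
        "IF/IDWrite": not stall,     # hypothetical name: hold IF/ID
        "ZeroIDEXControl": stall,    # hypothetical name: force ID/EX controls to 0
    }


def count_stalls(producer_rd, consumer_rs, consumer_rt):
    """Stall cycles for a consumer that directly follows its producer."""
    ex = (1, producer_rd)   # producer starts in EX while the consumer is in ID
    mem = (0, 0)
    stalls = 0
    while must_stall(consumer_rs, consumer_rt, ex[0], ex[1], mem[0], mem[1]):
        stalls += 1
        mem = ex            # producer advances one stage
        ex = (0, 0)         # a bubble takes its place in EX
    return stalls


if __name__ == "__main__":
    # add $s0, $t0, $t1 followed by sub $t2, $s0, $t3
    # ($s0 is register 16, $t3 is register 11)
    print(count_stalls(producer_rd=16, consumer_rs=16, consumer_rt=11))
```

For the add/sub pair the model stalls for two cycles, after which the add is in WB and the sub can read $s0 in the second half of that cycle.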
47
Inserting a bubble
C. Data forwarding during runtime
Don’t wait for result to be stored in a register
  Forward the results from wherever they happen to be
Requires extra connections in the datapath
48
Dependencies and forwarding
49
Circuitry for forwarding
50
One more MUX for immediates
51
When should data be forwarded?
EX/MEM.RegWrite and/or MEM/WB.RegWrite are true
Destination register(s) are equal to the source registers of the next one or two instructions. That is,
  EX/MEM.RegisterRd == ID/EX.RegisterRs
  EX/MEM.RegisterRd == ID/EX.RegisterRt
  MEM/WB.RegisterRd == ID/EX.RegisterRs
  MEM/WB.RegisterRd == ID/EX.RegisterRt
Dest. reg. in EX/MEM and/or MEM/WB is not $0.
  EX/MEM.RegisterRd ≠ 0
  MEM/WB.RegisterRd ≠ 0
52
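The three forwarding conditions combine into one select value per ALU operand. The sketch below is one way to express that in Python. Two things in it are assumptions rather than statements from these slides: the select encodings ("00" register file, "10" from EX/MEM, "01" from MEM/WB) and the choice to prefer EX/MEM when both stages match, on the reasoning that it holds the more recent result.

```python
def forward_select(id_ex_reg, ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd):
    """Select the source for one ALU operand.

    Returns "10" to forward from EX/MEM, "01" to forward from MEM/WB,
    "00" to use the value read from the register file.
    (Encodings are an assumption for this sketch.)
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return "10"   # most recent result wins (assumed priority)
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return "01"
    return "00"


def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return the pair of selects for the Rs operand and the Rt operand."""
    select_a = forward_select(id_ex_rs, ex_mem_reg_write, ex_mem_rd,
                              mem_wb_reg_write, mem_wb_rd)
    select_b = forward_select(id_ex_rt, ex_mem_reg_write, ex_mem_rd,
                              mem_wb_reg_write, mem_wb_rd)
    return select_a, select_b


if __name__ == "__main__":
    # Instruction in EX reads registers 16 and 17; the instruction ahead of it
    # (now in EX/MEM) writes register 16.
    print(forwarding_unit(16, 17, 1, 16, 0, 0))
```

The `!= 0` checks mirror the last condition above: a result destined for $0 is never forwarded, since $0 always reads as zero.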