Midterm Review 2
Dr. Zhao Zhang
Iowa State University
CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Announcement
• No quiz today
• No homework this Friday
• Exam on Monday, 9:00-9:50
• HW9 deadline extended to next Friday
• HW8 solutions will be posted today
Chapter 1 — Computer Abstractions and Technology — 2
Exam 2 Coverage
Coverage: Ch. 4, The Processor
• Datapath and control
• Simple MIPS pipeline
• Data hazards and forwarding
• Load-use hazard and pipeline stall
• Control hazards
Arithmetic will NOT be covered
• It will be covered in the final exam
• The final exam is comprehensive
Question Styles and Coverage
• Short answer
• True/False or multiple choice
• Design and analysis
  • Signal values in the datapath and control
  • Identify the critical path
  • Support a new MIPS instruction
• Performance analysis and optimization
  • Identify pipeline bubbles in program execution
  • Reorder instructions to improve performance
• And others
Nine-Instruction MIPS
• These nine instructions are enough to illustrate most aspects of CPU design, particularly datapath and control design
• Some questions will use this subset as the baseline design
• Memory reference: LW and SW
• Arithmetic/logic: ADD, SUB, AND, OR, SLT
• Branch: BEQ, J
Chapter 4 — The Processor — 6
Datapath With Jumps Added
The Control
Control signals for the nine-instruction implementation

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-    1       0       0         1         0        0         0       1       0       0
lw    0       1       1         1         1        0         0       0       0       0
sw    X       1       X         0         0        1         0       0       0       0
beq   X       0       X         0         0        0         1       0       1       0
j     X       X       X         0         0        0         0       X       X       1
Note: “R-” means R-format
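
The control truth table can also be read as a lookup from opcode to control word. The following is a minimal Python sketch of that idea, not the course's implementation. The names `SIGNALS`, `CONTROL_TABLE`, `OPCODES` and `main_control` are hypothetical; the opcode values are the standard MIPS encodings, and 'X' marks a don't-care.

```python
# Control signals in the same column order as the truth table.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0", "Jump")

# One control word per instruction class; 'X' is a don't-care.
CONTROL_TABLE = {
    "R-format": "1001000100",
    "lw":       "0111100000",
    "sw":       "X1X0010000",
    "beq":      "X0X0001010",
    "j":        "XXX0000XX1",
}

# Standard MIPS opcodes, Inst[31:26].
OPCODES = {
    0b000000: "R-format",
    0b100011: "lw",
    0b101011: "sw",
    0b000100: "beq",
    0b000010: "j",
}


def main_control(opcode):
    """Return a dict mapping each control signal to '0', '1' or 'X'."""
    row = CONTROL_TABLE[OPCODES[opcode]]
    return dict(zip(SIGNALS, row))
```

For example, `main_control(0b100011)` gives the lw row, with ALUSrc, MemtoReg, RegWrite and MemRead all set to '1'.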
ALU Control
Truth table for ALU Control
Extend it as a secondary control unit in projects B & C, with more control signal outputs

opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
               subtract          100010  subtract          0110
               AND               100100  AND               0000
               OR                100101  OR                0001
               set-on-less-than  101010  set-on-less-than  0111
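
The ALU Control truth table maps (ALUOp, funct) to a 4-bit ALU control code. A minimal Python sketch of that mapping follows; the names `FUNCT_TO_ALU` and `alu_control` are hypothetical, and the values are copied from the table above.

```python
# funct field (Inst[5:0]) to ALU control code, for R-type instructions.
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op, funct=0):
    """Return the 4-bit ALU control code for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:       # lw / sw: add to form the address
        return 0b0010
    if alu_op == 0b01:       # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:       # R-type: decode the funct field
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp value not used by the nine-instruction design")
```

The funct field only matters when ALUOp is 10, which matches the XXXXXX entries in the lw, sw and beq rows.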
Extend the Single-Cycle Processor
For each instruction, do we need
1. Any new or revised datapath element(s)?
2. Any new control signal(s)?
Then revise, if necessary,
1. Datapath: Add new elements or revise existing ones, add new connections
2. Control Unit: Add/extend control signals, extend the truth table
3. ALU Control: Extend the truth table
Support JAL
jal target

PC = JumpAddr
R[31] = PC_plus_4

PC_plus_4 = PC + 4
JumpAddr = PC_plus_4[31:28] & Inst[25:0] & “00”

Instruction format:
Bits   31:26   25:0
Field  000011  address
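
The jal semantics above can be checked with a few lines of Python. This is a sketch only: it reads "&" in the JumpAddr formula as bit concatenation (an assumption about the slide's notation), and the function name `jal` and its return convention are hypothetical.

```python
def jal(pc, inst):
    """Return (new_pc, link_value) for a 32-bit jal word fetched at address pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = inst & 0x03FFFFFF                 # Inst[25:0]
    # PC_plus_4[31:28] concatenated with Inst[25:0] and two zero bits.
    jump_addr = (pc_plus_4 & 0xF0000000) | (target << 2)
    return jump_addr, pc_plus_4                # PC = JumpAddr, R[31] = PC_plus_4
```

For instance, a jal at address 0x00400000 with a 26-bit address field of 0x0100010 jumps to 0x00400040 and writes 0x00400004 to $ra.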
Support JAL
What changes must be made to the datapath?
Support JAL
Analyze the instruction execution
• Writes register $ra ($31)
• Updates PC with the jump target
  • This part is already done for supporting J
Analyze the datapath
• Needs another input, fixed at 31, to the “Write register” port of the register file
• Needs another input, PC+4, to the “Write data” port of the register file
Revise the control
• Add a “link” signal
• The (main) control unit can determine it from the opcode
SCPv1 + JAL
Revise the two muxes
• Add another input
• Extend the select signals
Alternatively, use an extra mux
Control Signals
Control signals for the nine-instruction implementation

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump  Link
R-    1       0       0         1         0        0         0       1       0       0
lw    0       1       1         1         1        0         0       0       0       0
sw    X       1       X         0         0        1         0       0       0       0
beq   X       0       X         0         0        0         1       0       1       0
j     X       X       X         0         0        0         0       X       X       1
jal

• Add a new row for jal
• Extend RegDst
• Add a control line link
Control Signals
Control signals for the nine-instruction implementation

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump  Link
R-    1       0       0         1         0        0         0       1       0       0     0
lw    0       1       1         1         1        0         0       0       0       0     0
sw    X       1       X         0         0        1         0       0       0       0     0
beq   X       0       X         0         0        0         1       0       1       0     0
j     X       X       X         0         0        0         0       X       X       1     0
jal   0       X       0         1         0        0         X       X       X       1     1

• Extend control input to RegDst Mux: RegDst & Link
• Extend control input to MemtoReg Mux: MemtoReg & Link
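
One way to picture the two extended muxes is as selection functions. The sketch below is an assumption about the encoding, not the slides' circuit: it treats Link = 1 as selecting the new input regardless of the other select bit, which agrees with the jal row (RegDst = 0, MemtoReg = 0, Link = 1). The function names are hypothetical.

```python
def write_register(reg_dst, link, rt, rd):
    """Select the register number sent to the "Write register" port."""
    if link:
        return 31              # jal always writes $ra
    return rd if reg_dst else rt


def write_data(mem_to_reg, link, alu_result, mem_data, pc_plus_4):
    """Select the value sent to the "Write data" port."""
    if link:
        return pc_plus_4       # jal saves the return address
    return mem_data if mem_to_reg else alu_result
```

With Link = 0 both functions reduce to the original two-input muxes, so the other five rows of the table behave as before.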
Simple Pipeline
Add pipeline registers to hold information produced in each cycle
Pipelined Control
Hazards
Situations that prevent starting the next instruction safely in the next cycle
• The simple pipeline won’t work correctly
Structural hazard
• A required resource is busy
Data hazard
• Need to wait for a previous instruction to complete its data read/write
Control hazard
• Deciding on a control action depends on a previous instruction
Data Hazards
Program with data dependence
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)
Program with control dependence
  beq $1, $3, +4
  addi $2, $2, 1
  addi $4, $4, 1
Data Forwarding
  sub $2, $1, $3
  and $12, $2, $5     # MEM=>EX forwarding
  or $13, $6, $2      # WB=>EX forwarding
  add $14, $2, $2
  sw $15, 100($2)

Pipeline contents in three consecutive cycles:
IF   ID   EX   MEM  WB
or   and  sub  …    …
add  or   and  sub  …      and gets the new $2 value forwarded from sub (MEM=>EX)
sw   add  or   and  sub    or gets the new $2 value forwarded from sub (WB=>EX)
Data Forwarding Paths
Detecting the Need to Forward
Input
• rs and rt from EX
• rd and RegWrite from MEM
• rd and RegWrite from WB
Output
• FwdA, FwdB
Caveats
• Check RegWrite
• Check if rd = 0
• Forwarding from MEM wins over WB
Review slides and textbook for details
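
The inputs, outputs and caveats listed above can be sketched as a small Python function. This is an illustration, not the project's forwarding unit: the function names are hypothetical, and the select encodings ("10" for MEM, "01" for WB, "00" for the register file) are assumed to follow the textbook convention.

```python
def forward_select(src, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Choose the source for one ALU operand whose register number is src."""
    # Check MEM first so that the most recent result wins over WB.
    if mem_reg_write and mem_rd != 0 and mem_rd == src:
        return "10"
    if wb_reg_write and wb_rd != 0 and wb_rd == src:
        return "01"
    return "00"


def forwarding_unit(ex_rs, ex_rt, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Return (FwdA, FwdB) for the instruction currently in EX."""
    fwd_a = forward_select(ex_rs, mem_rd, mem_reg_write, wb_rd, wb_reg_write)
    fwd_b = forward_select(ex_rt, mem_rd, mem_reg_write, wb_rd, wb_reg_write)
    return fwd_a, fwd_b
```

For the earlier example, when `and $12, $2, $5` is in EX and `sub $2, $1, $3` is in MEM, `forwarding_unit(2, 5, 2, True, 0, False)` selects the MEM path for the first operand only.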
Load-Use Data Hazard
  lw $s0, 20($t1)
  sub $t2, $s0, $t3
• Can’t always avoid stalls by forwarding
• Must stall the pipeline by one cycle
Datapath with Hazard Detection
Hazard Detection Unit
Input
• rs and rt from ID
• rt and MemRead from EX
Output
• PCWrite, IF/IDWrite (0 for holding instructions)
• Select signal to a MUX to insert a bubble in EX
Read slides/textbook for details
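
The hazard detection unit's inputs and outputs can be sketched in Python as follows. The stall condition is the textbook load-use check; the function name and the output key `InsertBubble` (standing in for the mux select signal) are hypothetical.

```python
def hazard_detection(id_rs, id_rt, ex_rt, ex_mem_read):
    """Return the stall controls for the instruction pair in ID and EX."""
    # Load-use hazard: the instruction in EX is a load whose destination (rt)
    # is a source register of the instruction in ID.
    stall = bool(ex_mem_read) and ex_rt in (id_rs, id_rt)
    return {
        "PCWrite": 0 if stall else 1,      # 0 holds the PC
        "IFIDWrite": 0 if stall else 1,    # 0 holds the IF/ID register
        "InsertBubble": 1 if stall else 0, # 1 zeroes the control signals sent to EX
    }
```

With `lw $s0, 20($t1)` in EX (rt = 16) and `sub $t2, $s0, $t3` in ID (rs = 16, rt = 11), the sketch holds PC and IF/ID and inserts one bubble.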
Pipeline Stall
• The nop has all control signals set to zero
  • It does nothing at EX, MEM and WB
• Prevent update of PC and IF/ID register
  • Using instruction is decoded again (OK)
  • Following instruction is fetched again (OK)
  • 1-cycle stall allows MEM to read data for lw
    • Can subsequently forward from WB to EX
Code Scheduling to Avoid Stalls
Reorder code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;

Original order (13 cycles):
  lw $t1, 0($t0)
  lw $t2, 4($t0)
    stall
  add $t3, $t1, $t2
  sw $t3, 12($t0)
  lw $t4, 8($t0)
    stall
  add $t5, $t1, $t4
  sw $t5, 16($t0)

Reordered (11 cycles):
  lw $t1, 0($t0)
  lw $t2, 4($t0)
  lw $t4, 8($t0)
  add $t3, $t1, $t2
  sw $t3, 12($t0)
  add $t5, $t1, $t4
  sw $t5, 16($t0)
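
The two cycle counts can be reproduced with a short Python sketch. It assumes a 5-stage pipeline with full forwarding in which the only stall is a one-cycle load-use stall, and it only understands the instruction forms used in this example. The names `parse` and `count_cycles` are hypothetical.

```python
import re


def parse(line):
    """Return (opcode, destination register or None, set of source registers)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                      # sw reads both registers, writes none
        return op, None, set(regs)
    return op, regs[0], set(regs[1:])   # first register is the destination


def count_cycles(program, stages=5):
    """Count cycles: instructions + pipeline fill + load-use stalls."""
    stalls = 0
    prev_op, prev_dst = None, None
    for line in program:
        op, dst, srcs = parse(line)
        if prev_op == "lw" and prev_dst in srcs:
            stalls += 1                 # loaded value is used by the next instruction
        prev_op, prev_dst = op, dst
    return len(program) + (stages - 1) + stalls


original = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
            "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
            "sw $t5, 16($t0)"]
reordered = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
             "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
             "sw $t5, 16($t0)"]
print(count_cycles(original), count_cycles(reordered))  # prints "13 11"
```

Seven instructions need 7 + 4 = 11 cycles to drain a 5-stage pipeline; the two load-use stalls in the original order add the remaining 2.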
Control Hazards
Branch determines flow of control
• Two branch outcomes: Taken or Not-Taken
• The CPU doesn’t recognize a branch until it reaches the end of the ID stage
• Every cycle, the CPU has to fetch one instruction
Control Hazards
The MIPS pipeline in the textbook always predicts “not-taken”
• Pipeline flush on every taken branch
• OK to flush because mis-fetched instructions don’t write to register/memory
• But this incurs pipeline bubbles (performance penalty)
The revised MIPS pipeline moves the branch comparison to the ID stage
• Doable for BEQ and BNE
• Reduces pipeline bubbles from 3 to 1 per taken branch
• Complicates data forwarding and hazard detection
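
To see what going from 3 bubbles to 1 per taken branch means for performance, here is a small Python sketch of the usual CPI estimate. The branch frequency (20%) and taken rate (50%) are made-up numbers for illustration only, and the function name `effective_cpi` is hypothetical.

```python
def effective_cpi(branch_fraction, taken_fraction, bubbles_per_taken, base_cpi=1.0):
    """Base CPI plus the average number of bubbles caused by taken branches."""
    return base_cpi + branch_fraction * taken_fraction * bubbles_per_taken


# Hypothetical workload: 20% of instructions are branches, half of them taken.
cpi_branch_in_mem = effective_cpi(0.20, 0.50, 3)  # original pipeline, 3 bubbles
cpi_branch_in_id = effective_cpi(0.20, 0.50, 1)   # revised pipeline, 1 bubble
print(round(cpi_branch_in_mem, 2), round(cpi_branch_in_id, 2))  # prints "1.3 1.1"
```

The estimate ignores data-hazard stalls, including any extra stalls the ID-stage comparison may introduce.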
Revised MIPS Pipeline
Revised MIPS Pipeline
Note: Branch does nothing in EX, MEM and WB
Performance Penalty
Any pipeline bubbles?
add $4, $5, $6
lw $1, addr
beq $1, $4, target
add $4, $5, $6
addi $1, $1, -1
beq $1, $zero, loop
loop:
Delayed Branch
Delayed branch may remove the one-cycle stall
• The instruction right after the beq is executed whether or not the branch is taken (the sub instruction in the example)
• In other words, the execution of beq is delayed by one cycle

  sub $10, $4, $8            beq $1, $3, 7
  beq $1, $3, 7       =>     sub $10, $4, $8
  and $12, $2, $5            and $12, $2, $5

Must find an independent instruction; otherwise
• May have to fill in a nop instruction, or
• Need two variants of beq, delayed and not delayed
Other Topics
• Exception handling
• Multi-issue pipeline
Those topics will be covered in the final exam
• Exam 2 will NOT cover them