Midterm Review 2 Dr. Zhao Zhang Iowa State University CprE 381 Computer Organization and Assembly...

Post on 21-Dec-2015

Midterm Review 2

Dr. Zhao Zhang, Iowa State University

CprE 381 Computer Organization and Assembly Level Programming, Fall 2013

Announcement
• No quiz today
• No homework this Friday
• Exam on Monday 9:00-9:50
• HW9 deadline extended to next Friday
• HW8 solutions will be posted today

Chapter 1 — Computer Abstractions and Technology — 2

Exam 2 Coverage
• Coverage: Ch. 4, The Processor
  • Datapath and control
  • Simple MIPS pipeline
  • Data hazards and forwarding
  • Load-use hazard and pipeline stall
  • Control hazards
• Arithmetic will NOT be covered
  • Will be covered in the final exam
  • Final exam is comprehensive

Question Styles and Coverage
• Short answer
• True/False or multi-choice
• Design and Analysis
  • Signal values in the datapath and control
  • Identify critical path
  • Support a new MIPS instruction
• Performance analysis and optimization
  • Identify pipeline bubbles in program execution
  • Reorder instructions to improve performance
• And others

Nine-Instruction MIPS

The nine instructions are enough to illustrate most aspects of CPU design, particularly datapath and control design

Some questions will use this subset as the baseline design

Memory reference: LW and SW

Arithmetic/logic: ADD, SUB, AND, OR, SLT

Branch: BEQ, J

Chapter 4 — The Processor — 6

Datapath With Jumps Added

The Control

Control signals for the nine-instruction implementation

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-    1       0       0         1         0        0         0       1       0       0
lw    0       1       1         1         1        0         0       0       0       0
sw    X       1       X         0         0        1         0       0       0       0
beq   X       0       X         0         0        0         1       0       1       0
j     X       X       X         0         0        0         0       X       X       1

Note: “R-” means R-format

ALU Control

Truth table for ALU Control

Extend it as a secondary control unit in projects B & C, with more control signal outputs

opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
               subtract          100010  subtract          0110
               AND               100100  AND               0000
               OR                100101  OR                0001
               set-on-less-than  101010  set-on-less-than  0111
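The ALU control truth table can be read as a small decoding function. The sketch below is only an illustration in Python, not the project's VHDL; the names `alu_control` and `FUNCT_TO_ALU` are made up for this example, and the bit patterns are copied from the table.

```python
# 4-bit ALU control codes, copied from the truth table.
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field -> ALU control, used only when ALUOp = 10 (R-type).
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # subtract
    0b100100: ALU_AND,  # AND
    0b100101: ALU_OR,   # OR
    0b101010: ALU_SLT,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 4-bit ALU control for a 2-bit ALUOp and a 6-bit funct."""
    if alu_op == 0b00:   # lw / sw: add for the address, funct is a don't-care
        return ALU_ADD
    if alu_op == 0b01:   # beq: subtract to compare, funct is a don't-care
        return ALU_SUB
    if alu_op == 0b10:   # R-type: decode the funct field
        funct &= 0x3F
        if funct not in FUNCT_TO_ALU:
            raise ValueError(f"funct {funct:06b} is not in the truth table")
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"ALUOp {alu_op:02b} is not in the truth table")
```

Supporting a new R-type instruction in this model amounts to adding one entry to `FUNCT_TO_ALU`, which mirrors "extend the truth table".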

Extend the Single-Cycle Processor

For each instruction, do we need
1. Any new or revised datapath element(s)?
2. Any new control signal(s)?

Then revise, if necessary,
1. Datapath: Add new elements or revise existing ones, add new connections
2. Control Unit: Add/extend control signals, extend the truth table
3. ALU Control: Extend the truth table

Support JAL

jal target

    PC    = JumpAddr
    R[31] = PC_plus_4

where

    PC_plus_4 = PC + 4
    JumpAddr  = PC_plus_4[31:28] & Inst[25:0] & "00"

Instruction format:

    Bits    31:26    25:0
    Field   000011   address
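The register-transfer description of jal can be checked with a few lines of integer arithmetic. This is a minimal sketch assuming 32-bit values held in Python integers; `jal_effects` is a hypothetical helper name, and the concatenation written with & on the slide is expressed here with masks and shifts.

```python
def jal_effects(pc: int, inst: int) -> tuple[int, int]:
    """Return (new PC, value written to R[31]) for a jal at address pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = inst & 0x03FFFFFF                 # Inst[25:0]
    # PC_plus_4[31:28] & Inst[25:0] & "00": keep the top 4 bits of PC+4,
    # append the 26-bit target, then append two zero bits.
    jump_addr = (pc_plus_4 & 0xF0000000) | (target << 2)
    return jump_addr, pc_plus_4


# Example: opcode 000011 with a 26-bit target field of 0x0100004.
inst = (0b000011 << 26) | 0x0100004
new_pc, ra = jal_effects(0x00400018, inst)
print(hex(new_pc), hex(ra))   # 0x400010 0x40001c
```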

Support JAL

What changes must be made to the datapath?

Support JAL
• Analyze the instruction execution
  • Writes register $ra ($31)
  • Updates PC with the jump target (this part is already done for supporting J)
• Analyze the datapath
  • Needs another input, fixed at 31, to the "Write register" port of the register file
  • Needs another input, PC+4, to the "Write data" port of the register file
• Revise control
  • Add a "link" signal
  • The (main) control unit can set it by reading the opcode

SCPv1 + JAL

Revise the two muxes
• Add another input
• Extend the select signals
Alternatively, use an extra mux

Control Signals

Control signals for the nine-instruction implementation, with a Link column and a jal row still to be filled in

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump  Link
R-    1       0       0         1         0        0         0       1       0       0
lw    0       1       1         1         1        0         0       0       0       0
sw    X       1       X         0         0        1         0       0       0       0
beq   X       0       X         0         0        0         1       0       1       0
j     X       X       X         0         0        0         0       X       X       1
jal

• Add a new row for jal
• Extend RegDst
• Add a control line Link

Control Signals

Control signals for the nine-instruction implementation plus jal

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump  Link
R-    1       0       0         1         0        0         0       1       0       0     0
lw    0       1       1         1         1        0         0       0       0       0     0
sw    X       1       X         0         0        1         0       0       0       0     0
beq   X       0       X         0         0        0         1       0       1       0     0
j     X       X       X         0         0        0         0       X       X       1     0
jal   0       X       0         1         0        0         X       X       X       1     1

• Extend control input to RegDst Mux: RegDst & Link
• Extend control input to MemtoReg Mux: MemtoReg & Link
(& is concatenation, so each mux gets a 2-bit select)
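The extended truth table and the two widened muxes can be modelled as a lookup plus a 2-bit select. The sketch below is an illustration only: the function names are invented, don't-cares are arbitrarily read as 0, and the assignment of mux inputs (00, 10, 01) is an assumption that follows from treating "RegDst & Link" as a concatenation with Link as the low bit.

```python
X = None  # don't-care entry in the truth table

COLUMNS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0", "Jump", "Link")

CONTROL = {
    "R-format": (1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0),
    "lw":       (0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
    "sw":       (X, 1, X, 0, 0, 1, 0, 0, 0, 0, 0),
    "beq":      (X, 0, X, 0, 0, 0, 1, 0, 1, 0, 0),
    "j":        (X, X, X, 0, 0, 0, 0, X, X, 1, 0),
    "jal":      (0, X, 0, 1, 0, 0, X, X, X, 1, 1),
}


def signals(inst: str) -> dict:
    """Return the control signals for one instruction as a name -> value dict."""
    return dict(zip(COLUMNS, CONTROL[inst]))


def select(primary, link: int) -> int:
    """Concatenate a select bit with Link (primary is the high bit)."""
    return ((primary or 0) << 1) | link  # a don't-care is read as 0 here


def write_register(inst: str, rt: int, rd: int) -> int:
    """Model the RegDst mux. Assumed inputs: 00 -> rt, 10 -> rd, 01 -> 31."""
    s = signals(inst)
    return {0b00: rt, 0b10: rd, 0b01: 31}[select(s["RegDst"], s["Link"])]


def write_data(inst: str, alu_result: int, mem_data: int, pc_plus_4: int) -> int:
    """Model the MemtoReg mux. Assumed inputs: 00 -> ALU, 10 -> memory, 01 -> PC+4."""
    s = signals(inst)
    return {0b00: alu_result, 0b10: mem_data,
            0b01: pc_plus_4}[select(s["MemtoReg"], s["Link"])]
```

For example, `write_register("jal", 5, 6)` yields 31 and `write_data("jal", 1, 2, 3)` yields the PC+4 argument, while lw and R-format behave as before.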

Simple Pipeline

Add pipeline registers to hold information produced in each cycle

Pipelined Control

Hazards
• Situations that prevent starting the next instruction safely in the next cycle
  • The simple pipeline won't work correctly
• Structural hazard
  • A required resource is busy
• Data hazard
  • Need to wait for a previous instruction to complete its data read/write
• Control hazard
  • Deciding on control action depends on a previous instruction

Data Hazards

Program with data dependence

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

Program with control dependence

    beq  $1, $3, +4
    addi $2, $2, 1
    addi $4, $4, 1

Data Forwarding

    sub $2, $1, $3
    and $12, $2, $5     # $2 arrives by MEM=>EX forwarding
    or  $13, $6, $2     # $2 arrives by WB=>EX forwarding
    add $14, $2, $2
    sw  $15, 100($2)

Pipeline contents in three successive cycles:

    IF    ID    EX    MEM   WB
    or    and   sub   …     …
    add   or    and   sub   …      AND gets the forwarded new $2 value (from sub in MEM)
    sw    add   or    and   sub    OR gets the forwarded new $2 value (from sub in WB)

Data Forwarding Paths

Detecting the Need to Forward

Input
• rs and rt from EX
• rd and RegWrite from MEM
• rd and RegWrite from WB

Output
• FwdA, FwdB

Caveats
• Check RegWrite
• Check whether rd = 0 (never forward for register $zero)
• Forwarding from MEM wins over WB

Review slides and textbook for details
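The inputs, outputs, and caveats listed for the forwarding unit fit in a few lines of logic. The sketch below is a Python model rather than the course's hardware description; the select encoding (10 for MEM, 01 for WB, 00 for the register file) follows the textbook convention and is an assumption here, as are the function and parameter names.

```python
def forward_select(src: int, mem_rd: int, mem_reg_write: bool,
                   wb_rd: int, wb_reg_write: bool) -> int:
    """Return the 2-bit forwarding select for one ALU source register in EX."""
    # MEM is checked first, so it wins when both stages write the same register.
    if mem_reg_write and mem_rd != 0 and mem_rd == src:
        return 0b10   # take the value from the MEM stage
    if wb_reg_write and wb_rd != 0 and wb_rd == src:
        return 0b01   # take the value from the WB stage
    return 0b00       # no forwarding, use the register file value


def forwarding_unit(ex_rs: int, ex_rt: int, mem_rd: int, mem_reg_write: bool,
                    wb_rd: int, wb_reg_write: bool) -> tuple[int, int]:
    """Return (FwdA, FwdB) for the instruction currently in EX."""
    fwd_a = forward_select(ex_rs, mem_rd, mem_reg_write, wb_rd, wb_reg_write)
    fwd_b = forward_select(ex_rt, mem_rd, mem_reg_write, wb_rd, wb_reg_write)
    return fwd_a, fwd_b


# and $12,$2,$5 in EX while sub $2,$1,$3 is in MEM: rs is forwarded from MEM.
print(forwarding_unit(2, 5, 2, True, 0, False))    # (2, 0)
# or $13,$6,$2 in EX, and ($12) in MEM, sub ($2) in WB: rt is forwarded from WB.
print(forwarding_unit(6, 2, 12, True, 2, True))    # (0, 1)
```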

Load-Use Data Hazard

    lw  $s0, 20($t1)
    sub $t2, $s0, $t3

• Can't always avoid stalls by forwarding
• Must stall pipeline by one cycle

Datapath with Hazard Detection

Hazard Detection Unit

Input
• rs and rt from ID
• rt and MemRead from EX

Output
• PCWrite, IF/IDWrite (0 for holding instructions)
• Select signal to a MUX to insert bubble in EX

Read slides/textbook for details
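The hazard detection unit's inputs and outputs map onto one comparison. This is a minimal Python sketch under the textbook's load-use condition; the dictionary keys and the name `InsertBubble` for the mux select are labels chosen for this example, not names from the course design.

```python
def hazard_detection(id_rs: int, id_rt: int, ex_rt: int, ex_mem_read: bool) -> dict:
    """Detect a load-use hazard between the load in EX and the instruction in ID."""
    # Stall when EX holds a load whose destination (rt) is a source in ID.
    stall = ex_mem_read and ex_rt in (id_rs, id_rt)
    return {
        "PCWrite": 0 if stall else 1,       # 0 holds the PC
        "IFIDWrite": 0 if stall else 1,     # 0 holds the instruction in IF/ID
        "InsertBubble": 1 if stall else 0,  # 1 zeroes the control signals entering EX
    }


# lw $s0, 20($t1) in EX and sub $t2, $s0, $t3 in ID ($s0 = 16, $t3 = 11).
print(hazard_detection(id_rs=16, id_rt=11, ex_rt=16, ex_mem_read=True))
```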

Pipeline Stall
• The nop has all control signals set to zero
  • It does nothing at EX, MEM and WB
• Prevent update of PC and IF/ID register
  • Using instruction is decoded again (OK)
  • Following instruction is fetched again (OK)
• 1-cycle stall allows MEM to read data for lw
  • Can subsequently forward from WB to EX

Code Scheduling to Avoid Stalls

Reorder code to avoid use of load result in the next instruction

C code for A = B + E; C = B + F;

Original order (13 cycles):

    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2     # stall: $t2 is loaded by the previous instruction
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4     # stall: $t4 is loaded by the previous instruction
    sw  $t5, 16($t0)

Reordered (11 cycles):

    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
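The 13-cycle and 11-cycle figures can be reproduced by counting load-use stalls. The sketch below is a rough Python model, assuming a 5-stage pipeline with full forwarding, one stall whenever an instruction reads the register loaded by the instruction immediately before it, and no other hazards; `parse` and `count_cycles` are hypothetical helpers that only understand the instruction forms used on this slide.

```python
import re


def parse(line: str):
    """Return (opcode, destination register or None, source registers)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                 # sw rt, offset(rs): both registers are read
        return op, None, regs
    return op, regs[0], regs[1:]   # lw and ALU ops: first register is written


def count_cycles(program, stages: int = 5) -> int:
    """Cycles to run the program: instructions + pipeline fill + load-use stalls."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, srcs = parse(line)
        if prev_op == "lw" and prev_dest in srcs:
            stalls += 1            # load result is used by the very next instruction
        prev_op, prev_dest = op, dest
    return len(program) + (stages - 1) + stalls


original = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
            "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
            "sw $t5, 16($t0)"]
reordered = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
             "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
             "sw $t5, 16($t0)"]
print(count_cycles(original), count_cycles(reordered))   # 13 11
```

Seven instructions need 7 + 4 = 11 cycles to drain a 5-stage pipeline; the two load-use stalls in the original order account for the other 2.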

Control Hazards
• Branch determines flow of control
  • Two branch outcomes: Taken or Not-Taken
• The CPU doesn't recognize a branch until it reaches the end of the ID stage
• Every cycle, the CPU has to fetch one instruction

Control Hazards
• The MIPS pipeline in the textbook always predicts "not-taken"
  • Pipeline flush on every taken branch
  • OK to flush because mis-fetched instructions don't write to register/memory
  • But this incurs pipeline bubbles (performance penalty)
• The revised MIPS pipeline moves branch comparison to the ID stage
  • Doable for BEQ and BNE
  • Reduces pipeline bubbles from 3 to 1 per taken branch
  • Complicates data forwarding and hazard detection

Revised MIPS Pipeline

Revised MIPS Pipeline

Note: Branch does nothing in EX, MEM and WB

Performance Penalty

Any pipeline bubbles?

    add $4, $5, $6
    lw  $1, addr
    beq $1, $4, target

    loop:
    add  $4, $5, $6
    addi $1, $1, -1
    beq  $1, $zero, loop
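One way to reason about the two snippets is to count the data-hazard stalls a branch sees when it is resolved in ID. The sketch below assumes the textbook's rule for that design (with forwarding into ID): 1 stall if a branch source is written by the immediately preceding ALU instruction or by a load two instructions back, 2 stalls if it is written by the immediately preceding load. Whether the course's revised pipeline behaves exactly this way is an assumption, and `branch_stalls` is a made-up helper. The extra bubble for a taken branch is separate and not counted.

```python
def branch_stalls(prev1, prev2, branch_srcs) -> int:
    """Data-hazard stall cycles for a branch resolved in ID.

    prev1 / prev2 are (opcode, destination register) for the instructions one
    and two slots before the branch, or None if there is no such instruction.
    """
    stalls = 0
    if prev1 is not None and prev1[1] in branch_srcs:
        # Immediately preceding producer: a load costs 2 cycles, an ALU op 1.
        stalls = 2 if prev1[0] == "lw" else 1
    if prev2 is not None and prev2[0] == "lw" and prev2[1] in branch_srcs:
        # A load two slots back still needs 1 cycle before ID can compare.
        stalls = max(stalls, 1)
    return stalls


# add $4,$5,$6 ; lw $1,addr ; beq $1,$4,target
print(branch_stalls(("lw", "$1"), ("add", "$4"), ["$1", "$4"]))       # 2
# add $4,$5,$6 ; addi $1,$1,-1 ; beq $1,$zero,loop
print(branch_stalls(("addi", "$1"), ("add", "$4"), ["$1", "$zero"]))  # 1
```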

Delayed Branch
• Delayed branch may remove the one-cycle stall
• The instruction right after the beq is executed whether or not the branch is taken (the sub instruction in the example)
• Put another way, the effect of beq is delayed by one cycle

    sub $10, $4, $8           beq $1, $3, 7
    beq $1, $3, 7       =>    sub $10, $4, $8
    and $12, $2, $5           and $12, $2, $5

• Must find an independent instruction, otherwise
  • May have to fill in a nop instruction, or
  • Need two variants of beq, delayed and not delayed

Other Topics
• Exception handling
• Multi-issue pipeline

Those topics will be covered in the final exam; Exam 2 will NOT cover them
