Out-of-Order Execution, Exception, Branch Prediction, CMP

EE 457 Questions and Answers for Special Topics: Out-of-Order Execution, Exception, Branch Prediction, CMP

Gandhi Puvvada, Weirong Jiang & Tony Toghia, USC 2008


Page 2: Out-of-Order Execution, Exception, Branch Prediction, CMP

Out of Order (OoO) Execution

Dynamic Scheduling of Instructions (The Tomasulo Algorithm)

Page 3: Out-of-Order Execution, Exception, Branch Prediction, CMP

[Figure: block diagram of the first design (without ROB), simplified for EE457; block diagram provided by Prof. Dubois. Labeled blocks: Issue Unit, Integer Multiplier, Int. Divider, Mult, TAG FIFO.]

Page 4: Out-of-Order Execution, Exception, Branch Prediction, CMP

OoO Execution and In-Order Committing with ROB (Re-Order Buffer)

[Figure: block diagram of the second design (with ROB), divided into a Front-end and a Back-end. Labeled blocks: I-Cache, I-Fetch Queue, BPB, Dispatch, Reg File, Re-order Buffer, Integer Queue, Load/Store Queue, Div Queue, Mult Queue, Add Buff, Issue Unit, Exe Units, Cache, CDB.]

Page 5: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#1 What is the important difference between the two block diagrams? Which one supports precise exceptions?

[Figure: the two block diagrams side by side. Left: design with TAG FIFO. Right: design with ROB.]

Page 6: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#1 The ROB is the important difference between the two block diagrams.

The right-side block diagram supports precise exceptions.

[Figure: the two block diagrams side by side. Left: design with TAG FIFO. Right: design with ROB.]

Page 7: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#2 Choose the right attributes to describe the block diagrams.

1. Left Block Diagram: __________ (Out of Order / In-Order) Issue, __________ (Out of Order / In-Order) Execute, __________ (Out of Order / In-Order) Complete.

2. Right Block Diagram: __________ (Out of Order / In-Order) Issue, __________ (Out of Order / In-Order) Execute, __________ (Out of Order / In-Order) Complete.

Page 8: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#2 Choose the right attributes to describe the block diagrams.

1. Left Block Diagram: In-Order Issue, Out of Order Execute, Out of Order Complete.

2. Right Block Diagram: In-Order Issue, Out of Order Execute, In-Order Complete.

Page 9: Out-of-Order Execution, Exception, Branch Prediction, CMP

Out-of-Order Execution (with ROB)

Q#3 When we refer to an out-of-order processor with ROB, do we mean:
a. instructions are issued out-of-order?
b. instructions start execution out-of-order?
c. instructions finish execution out-of-order?
d. instructions retire out of order?

Page 10: Out-of-Order Execution, Exception, Branch Prediction, CMP

• A#3: b and c. Instructions are issued and retired in-order, to maintain the functionality of in-order execution. What happens in between, however (the start and completion of execution in the integer and floating point units), can be done out-of-order.

Page 11: Out-of-Order Execution, Exception, Branch Prediction, CMP

TAG FIFO (Token FIFO) in the left diagram

[Figure: left block diagram (design with TAG FIFO).]

Q#4
Q#4.1 Is it necessary to hold the 64 tokens in the 0 to 63 order initially on reset?
Q#4.2 Is the FIFO used for convenience, or is it necessary that we follow the “First-In-First-Out” order?
Q#4.3 Can the FIFO overflow?
Q#4.4 Can the FIFO become empty?

Page 12: Out-of-Order Execution, Exception, Branch Prediction, CMP

TAG FIFO (Token FIFO)

Q#4.1 Is it necessary to hold the 64 tokens in the 0 to 63 order initially on reset?

A#4.1 It is not necessary to hold the 64 tokens in the 0 to 63 order initially on reset.

Q#4.2 Is the FIFO used for convenience, or is it necessary that we follow the “First-In-First-Out” order?

A#4.2 The FIFO is used for convenience. It is not necessary that we follow the “First-In-First-Out” order.

Q#4.3 Can the FIFO overflow?

A#4.3 The FIFO cannot overflow, as we cannot receive more tokens than we issued.

Q#4.4 Can the FIFO become empty?

A#4.4 The FIFO can become empty if the backend capacity exceeds the total number of tokens.

Page 13: Out-of-Order Execution, Exception, Branch Prediction, CMP

TAGs for destinations or sources or for both? (in ROB-less design)

• A new tag is assigned to the destination register of the instruction being dispatched.

• For each of the source registers (source operands) of the instruction being dispatched, either the value of the source register (if it has not been previously tagged) or the existing tag associated with the source register (if it has been tagged already in the Register Status Table) is conveyed to the instruction.

• If a tag is conveyed for a source, then the instruction needs to wait for the original instruction with that destination tag to go on to the CDB and announce the value.

Page 14: Out-of-Order Execution, Exception, Branch Prediction, CMP

Unique TAG (in ROB-less design)

• Like SSN, we need a unique TAG

• SSNs are reused.

• Similarly TAGs can be reused.

• TAGs are similar to numbered TOKENs.

Page 15: Out-of-Order Execution, Exception, Branch Prediction, CMP

TAGs (= Tokens) (in ROB-less design)

• How many Tokens should the bank cashier have to start with?

• What happens if the tokens run out?

• Does he need to have any order in holding tokens and issuing tokens?

• Does he have to collect tokens back?

Page 16: Out-of-Order Execution, Exception, Branch Prediction, CMP

TAG FIFO (FIFOs are taught in EE560)

• To issue and collect Tokens (TAGs), use a circular FIFO (First-in-First-Out) unit.

• Filled with (say) 64 tokens (in any order) initially on reset.

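• Tokens return out of order anyway.

• Put returned tokens back into the FIFO and issue them again.

[Figure: circular FIFO holding tokens 0 to 63, with a write pointer (wp) and a read pointer (rp), shown in three states: Full (after reset), 2 tokens issued, 1 token returned.]

(in ROB-less design)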
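The token handling described above can be sketched in a few lines of Python. This is only an illustrative model, not the EE457 hardware: the class name `TagFifo` and its methods `issue` and `collect` are hypothetical, and a `deque` stands in for the circular buffer with its `wp`/`rp` pointers.

```python
from collections import deque

class TagFifo:
    """Pool of tags (tokens) for a ROB-less design; names are illustrative."""

    def __init__(self, num_tags=64):
        self.num_tags = num_tags
        # Any initial order works; 0..num_tags-1 is simply convenient.
        self.free = deque(range(num_tags))

    def issue(self):
        """Hand out a tag at dispatch, or None if the FIFO is empty (dispatch stalls)."""
        return self.free.popleft() if self.free else None

    def collect(self, tag):
        """Take a tag back once its instruction has finished with it."""
        # Cannot overflow: only tags that were issued can ever come back.
        assert len(self.free) < self.num_tags, "more tags returned than issued"
        self.free.append(tag)

fifo = TagFifo(num_tags=4)
a, b = fifo.issue(), fifo.issue()   # two tokens issued
fifo.collect(b)                     # tokens may come back out of order
print(a, b, list(fifo.free))        # 0 1 [2, 3, 1]
```

Returning `None` from `issue` models the empty-FIFO case, in which dispatch has to wait for a token to come back.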

Page 17: Out-of-Order Execution, Exception, Branch Prediction, CMP


• Q#5 What is meant by retirement in an out-of-order processor?

• Q#6 What two conditions are required for retirement?

Page 18: Out-of-Order Execution, Exception, Branch Prediction, CMP

• A#5: Retirement is the point at which an instruction’s results can be committed (written into the register file or memory) or, if it is a conditional branch or an exception, it can be taken. In short, its execution is ensured and it is no longer speculative. Note: In speculative execution, conditional branches are executed based on prediction, and if it turns out to be a misprediction, wrong-path instructions are flushed.

• A#6: Execution must be completed, and the instruction must be the oldest instruction not yet retired. (It is the oldest instruction in the re-order buffer.)
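The two retirement conditions in A#6 can be illustrated with a small Python sketch. The entry layout (a dict with `name` and `done` fields) and the function name `retire_ready` are assumptions made for this example, not part of the course design.

```python
from collections import deque

def retire_ready(rob):
    """Retire from the head of the ROB while the oldest entry has finished executing.

    `rob` is a deque of dicts, oldest first; the 'name' and 'done' fields are
    hypothetical.
    """
    retired = []
    while rob and rob[0]["done"]:
        retired.append(rob.popleft()["name"])
    return retired

rob = deque([
    {"name": "lw",  "done": False},  # oldest, still waiting on the cache
    {"name": "add", "done": True},   # finished early, but must wait its turn
    {"name": "sub", "done": True},
])
print(retire_ready(rob))   # [] because the oldest instruction is not done yet
rob[0]["done"] = True      # the load completes
print(retire_ready(rob))   # ['lw', 'add', 'sub']
```

Instructions that finish early simply sit in the buffer until everything older has retired, which is what keeps commits in program order.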

Page 19: Out-of-Order Execution, Exception, Branch Prediction, CMP


• Q#7 __________________ (Architectural / Physical) registers are visible to software (i.e. can be used in instructions)

• Q#8 __________________ (Architectural / Physical) registers allow multiple copies of a register to support out-of-order execution (including speculative execution) via register renaming.

Page 20: Out-of-Order Execution, Exception, Branch Prediction, CMP

• A#7 Architectural registers are visible to software (i.e. can be used in instructions)

• A#8 Physical registers allow multiple copies of a register to support out-of-order execution (including speculative execution) via register renaming.

Page 21: Out-of-Order Execution, Exception, Branch Prediction, CMP

Limited Architectural Registers, More Physical Registers

Register Renaming

lw $8, 40($2);
add $8, $8, $8;
sw $8, 40($2);

lw $8, 60($3);
add $8, $8, $8;
sw $8, 60($3);

It is clear that the compiler is using $8 as a temporary register.

If there is a delay in obtaining $2, the first part of the code cannot proceed.

Unfortunately, without register renaming, the second part of the code cannot proceed either, because of the name dependency on $8.
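A minimal Python sketch of renaming applied to the six instructions above is shown here. The instruction encoding (tuples of opcode, destination, sources), the physical register names `p0`, `p1`, ... and the function `rename` are all hypothetical. Real hardware would use tags and a free list rather than an ever-growing counter.

```python
def rename(program, num_arch_regs=32):
    """Give every register write a fresh physical register (illustrative sketch).

    Each instruction is (op, dest, sources); dest is None for a store.
    """
    mapping = {f"${i}": f"p{i}" for i in range(num_arch_regs)}  # initial 1:1 map
    next_phys = num_arch_regs
    renamed = []
    for op, dest, srcs in program:
        new_srcs = [mapping[s] for s in srcs]   # sources read the current mapping
        new_dest = None
        if dest is not None:
            new_dest = f"p{next_phys}"          # fresh copy for every write
            next_phys += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

program = [
    ("lw",  "$8", ["$2"]),
    ("add", "$8", ["$8", "$8"]),
    ("sw",  None, ["$8", "$2"]),
    ("lw",  "$8", ["$3"]),
    ("add", "$8", ["$8", "$8"]),
    ("sw",  None, ["$8", "$3"]),
]
for instr in rename(program):
    print(instr)
```

After renaming, the second group reads only `p3` and its own fresh copies of `$8`, so it no longer has to wait behind the first group when `$2` is delayed.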

Page 22: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#9 Register renaming can NOT solve
a. RAW hazards
b. WAR hazards
c. WAW hazards

Note: In a design with ROB, WAW and WAR will never occur as all writes are performed strictly in-order. So answer the above question for the ROB-less design.

Page 23: Out-of-Order Execution, Exception, Branch Prediction, CMP

• A#9: a. The RAW (Read After Write) hazard is the only hazard which cannot be solved by register renaming.

• For WAW (Write After Write) hazard:
– if the instruction order is that $1 gets written twice, and if the later write (W2) can execute before the first write (W1), then the register renaming mechanism allows the earlier write to be discarded in a ROB-less design.

• For WAR (Write After Read) hazard:
– register renaming allows the older version of the register to be read and held in the Issue Queues, so that the later write can proceed.

• For RAW (Read After Write) hazard:
– a dependent read MUST wait and cannot execute before a write to the same location. (The to-be-written value must be determined before it can be read by a later instruction.) The dependent instruction waits in the Issue Queues for the operand to be broadcast on the CDB.

Page 24: Out-of-Order Execution, Exception, Branch Prediction, CMP

[Figure: left block diagram (design with TAG FIFO).]

Q#10 What resource is the major bottleneck of the Tomasulo algorithm?

IFQ / Dispatcher / Issue Queues / Execution Units / CDB

Page 25: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#10 What resource is the major bottleneck of the Tomasulo algorithm?

CDB

The issue unit has to throttle issuing instructions to the execution units based on the CDB’s availability. It does not let multiple execution units finish execution at the same time.

Page 26: Out-of-Order Execution, Exception, Branch Prediction, CMP

• Q#11a Suppose the following lw instruction is in progress and is currently waiting for the cache to respond.

lw $2, 0($4)

Which of the following instructions in the integer issue queue will begin execution the earliest?

#4 subi $6, $7, $8
#3 addi $5, $3, $4
#2 sub $4, $4, $6
#1 (oldest) add $1, $2, $3

Page 27: Out-of-Order Execution, Exception, Branch Prediction, CMP

• A#11a #2. #1 cannot begin execution, because it reads $2, which is still being written by the LW instruction (RAW hazard). Instruction #2 can begin execution. (Note: Register renaming solves the WAR hazard on $4.)

#4 subi $6, $7, $8
#3 addi $5, $3, $4
#2 sub $4, $4, $6
#1 (oldest) add $1, $2, $3

Page 28: Out-of-Order Execution, Exception, Branch Prediction, CMP

• Q#11b Given the same situation (lw $2, 0($4)) as the previous problem, now which of the following instructions in the integer issue queue will begin execution the earliest?

#4 subi $6, $7, $8
#3 addi $5, $3, $4
#2 sub $4, $4, $1 (the last operand was $6 in Q#11a)
#1 (oldest) add $1, $2, $3

Page 29: Out-of-Order Execution, Exception, Branch Prediction, CMP

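• A#11b #4. Instruction #4 is the earliest instruction that does not read a value that is modified by an earlier instruction: #1 waits for $2, #2 now reads $1 (written by #1), and #3 reads $4 (written by #2).

#4 subi $6, $7, $8
#3 addi $5, $3, $4
#2 sub $4, $4, $1 (the last operand was $6 in Q#11a)
#1 (oldest) add $1, $2, $3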

Page 30: Out-of-Order Execution, Exception, Branch Prediction, CMP

Without or with ROB?

• Q#11c Are your answers to Q#11a and Q#11b for the first design without ROB or the second design with ROB?

Page 31: Out-of-Order Execution, Exception, Branch Prediction, CMP

Without or with ROB?

• Q#11c Are your answers to Q#11a and Q#11b for the first design without ROB or the second design with ROB?

• A#11c For both! RAW dependency is the true dependency and every implementation has to honor that dependency.
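The reasoning behind A#11a and A#11b can be checked with a short Python sketch. It tracks only true (RAW) dependences, on the assumption that renaming has already removed WAR and WAW. The function name `oldest_ready` and the tuple encoding of the queue are hypothetical.

```python
def oldest_ready(pending_dests, queue):
    """Return the id of the oldest queued instruction whose sources are all available.

    `pending_dests` holds registers still being produced (here, $2 from the lw).
    `queue` lists (id, dest, sources) oldest first.
    """
    waiting_on = set(pending_dests)
    for ident, dest, srcs in queue:
        if not any(s in waiting_on for s in srcs):
            return ident
        waiting_on.add(dest)      # its result is delayed too, so readers must wait
    return None

q11a = [(1, "$1", ["$2", "$3"]), (2, "$4", ["$4", "$6"]),
        (3, "$5", ["$3", "$4"]), (4, "$6", ["$7", "$8"])]
q11b = [(1, "$1", ["$2", "$3"]), (2, "$4", ["$4", "$1"]),
        (3, "$5", ["$3", "$4"]), (4, "$6", ["$7", "$8"])]
print(oldest_ready({"$2"}, q11a))   # 2
print(oldest_ready({"$2"}, q11b))   # 4
```

Nothing in the sketch depends on whether a ROB is present, which matches A#11c.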

Page 32: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#12 ROB is the important difference between the two block diagrams.

Compare and contrast

[Figure: the two block diagrams side by side. Left: design with TAG FIFO. Right: design with ROB.]

Page 33: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#12 Compare and contrast: Without ROB vs. With ROB

1. Without ROB: TAG FIFO provides unique TAGs.
   With ROB: ROB location IDs are TAGs.

2. Without ROB: Register Status Table specifies if a register is obsolete.
   With ROB: ROB needs to be searched associatively to find the latest register content.

3. Without ROB: Allows out-of-order completion.
   With ROB: Enforces in-order-only completion.

Page 34: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#12 Compare and contrast: Without ROB vs. With ROB

4. Without ROB: Cannot support exceptions.
   With ROB: Can support exceptions.

5. Without ROB: Cannot support speculative execution.
   With ROB: Can support speculative execution.

6. Without ROB: No speculation, no BPB.
   With ROB: Has BPB to aid in branch prediction.

7. Without ROB: No good for real implementation.
   With ROB: Good for real implementation.

Page 35: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#12 Compare and contrast: Without ROB vs. With ROB

8. Without ROB: Writes are out of order. Hence dispatch is suspended after dispatching a conditional branch, until the branch is resolved.
   With ROB: Writes are in-order. Dispatch continues based on prediction. Design provides for flushing wrong-path execution.

9. Without ROB: Stores write to cache when they come out of the LSQ (load/store queue).
   With ROB: Stores write to cache when they reach the top of the ROB.

Page 36: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#12 Compare and contrast: Without ROB vs. With ROB

10. Without ROB: Memory disambiguation rules are stricter.
    With ROB: Since WAW and WAR are not present, rules are simpler.

11. Without ROB: Only RAR is irrelevant. So two loads from the same address can execute in any order. The rest of the loads and stores with matching addresses have to go in order.
    With ROB: Only RAW needs to be looked at. Loads read the cache before going into the ROB. Hence, loads have to wait until senior stores with matching addresses finish.

Page 37: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#12 Compare and contrast: Without ROB vs. With ROB

12. Without ROB: Suppose a senior load is yet to calculate its memory address. A junior load (but not a store) can leave the LSQ. (RAR is harmless, but WAR is not.) Suppose a senior store is yet to calculate its memory address. A junior load/store cannot leave. (RAW, WAW)
    With ROB: Stores leave a copy of their address in the Address Buffer near the LSQ, so that junior loads can figure out (without looking up the ROB) if they can read the cache. This means junior stores, with a senior load yet to calculate its address, cannot leave the LSQ. It also means junior stores with an address matching a senior load should not leave the LSQ, unless the senior loads with the matching address make a note of this.

Page 38: Out-of-Order Execution, Exception, Branch Prediction, CMP

Exceptions

• Q#1 What is the definition of an exception?

• Q#2 What is the difference between asynchronous and synchronous exceptions? Give two examples of each.

• Q#3 Precise exceptions are _______________ (synchronous / asynchronous) and the excepting instruction _________ (must be / does not need to be) re-executed.

Page 39: Out-of-Order Execution, Exception, Branch Prediction, CMP

• A#1: Exceptions are very rare events forcing a transfer of program control to a software handler.

• A#2: Synchronous exceptions are triggered by specific instructions (e.g. divide by zero, illegal instruction, page fault, etc.). Asynchronous exceptions include the hardware interrupts and are not tied to a specific executing instruction (e.g. keyboard interrupt, real-time clock, power failure).

• A#3: Precise exceptions are synchronous and the excepting instruction must be re-executed (e.g. in the case of a page fault, ....).

Page 40: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#4
• Interrupts are ___________ (Asynchronous/Synchronous) to program execution.

• Traps are ___________ (Asynchronous/Synchronous) to program execution.

Page 41: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#4
• Interrupts are Asynchronous to program execution. Example: Keyboard interrupt.

• Traps are Synchronous to program execution. Example: addition overflow trap.

Page 42: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#5
• Match the exceptions with the 5 pipeline stages

Exception | IF | ID | EX | MEM | WB
Page Fault | | | | |
Integer Overflow | | | | |
Undefined Opcode | | | | |
Memory Protection Violation | | | | |

Page 43: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#5
• Match the exceptions with the 5 pipeline stages

Exception | IF | ID | EX | MEM | WB
Page Fault | X | | | X |
Integer Overflow | | | X | |
Undefined Opcode | | X | | |
Memory Protection Violation | X | | | X |

Page 44: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#6 For precise exceptions, the exceptions should be taken in

a. process order
b. temporal order

Page 45: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#6 For precise exceptions, the exceptions should be taken in

a. process order
b. temporal order

• A#6: Process order. Exceptions on earlier instructions must be handled before exceptions due to later instructions, regardless of when they are detected.

Page 46: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#7
• For precise exceptions in the 5-stage pipeline, an exception should be taken in which stage? Why?

Page 47: Out-of-Order Execution, Exception, Branch Prediction, CMP

• A#7: WB Stage. This is to ensure that no earlier instruction in program order triggers an exception.

Well, as discussed in our class, an exception can be taken in the MEM stage (instead of the WB stage), as the instruction in the WB stage would not cause a new exception.

Page 48: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#8
• What are the functions of the Cause Register and Exception PC (EPC)?

Page 49: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#8
• What are the functions of the Cause Register and Exception PC (EPC)?

• A#8: The Cause register records what type of exception occurred, and the EPC tells the exception handler on which instruction the exception occurred.

Page 50: Out-of-Order Execution, Exception, Branch Prediction, CMP


Q#9 What are the requirements of precise exception handling in a pipelined processor?

Page 51: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#9 What are the requirements of precise exception handling in a pipelined processor?

A#9:
• All preceding instructions in process order must complete.
• All instructions following the faulting instruction, plus the faulting instruction itself, must be squashed.
• The execution of the handler must be started.
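The process-order rule (Exceptions A#6) can be sketched in Python as follows. The sketch assumes a 5-stage pipeline in which the oldest instruction sits in the deepest stage. The function name `exception_to_take` and the dictionary of detected exceptions are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]   # youngest instruction first

def exception_to_take(detected):
    """Pick the exception that belongs to the oldest instruction (process order).

    `detected` maps a stage name to the exception found there in this cycle.
    """
    for stage in reversed(STAGES):          # deepest stage holds the oldest instruction
        if stage in detected:
            return stage, detected[stage]
    return None

print(exception_to_take({"ID": "illegal instruction", "MEM": "page fault"}))
# ('MEM', 'page fault')
```

Here the page fault is handled first even though the illegal instruction is detected in the same cycle, because the faulting load is earlier in program order.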

Page 52: Out-of-Order Execution, Exception, Branch Prediction, CMP


• Q#10

Page 53: Out-of-Order Execution, Exception, Branch Prediction, CMP


First run (before first exception handled)

Page 54: Out-of-Order Execution, Exception, Branch Prediction, CMP


Second run (after page fault handled)

Page 55: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#10: First run (before first exception handled)

Cycle | IF | ID | EX | MEM | WB
Cycle 1 | SW | Illegal (Exception Detected) | ADD | LW (Exception Detected) |
Cycle 2 | Start of Exception Handler | NOP | NOP | NOP | NOP (Exception)

Page 56: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#10: Second run (after page fault handled)

Cycle | IF | ID | EX | MEM | WB
Cycle 1 | SW | Illegal (Exception Detected) | ADD | LW |
Cycle 2 | NOP | NOP | NOP (Exception) | ADD | LW
Cycle 3 | NOP | NOP | NOP | NOP (Exception) | ADD
Cycle 4 | Start of Exception Handler | NOP | NOP | NOP | NOP (Exception)

Page 57: Out-of-Order Execution, Exception, Branch Prediction, CMP

Branch Prediction

Q#1 Which types of branches need prediction?
a. Indirect branch due to return from function call
b. Conditional branch
c. Unconditional branch

Page 58: Out-of-Order Execution, Exception, Branch Prediction, CMP

Branch Prediction

Q#1 Which types of branches need prediction (direction prediction)?
a. Indirect branch due to return from function call
b. Conditional branch
c. Unconditional branch

A#1: Conditional branch

Page 59: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#2 Given a simple 1-bit (2-state) pattern history predictor, and assuming the initial branch is predicted not taken, what is the misprediction rate for the following loop? (Assume there are no other branches in the loop.)

for (i=0; i<4; i++)

The misprediction rate (increases/decreases/stays the same) if the loop is re-executed.

[Figure: the branch PC indexes the Branch Prediction Buffer; each entry is in state N or T.]

Page 60: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#2 The predictor will predict the 1st branch not taken, and it will predict the 2nd, 3rd, 4th, and 5th branches taken. The 1st and last predictions will be incorrect. So, the misprediction rate is 2/5 = 40%.

for (i=0; i<4; i++)

i | 0 | 1 | 2 | 3 | 4
Pred | N | T | T | T | T

The misprediction rate stays the same for all subsequent runs of the loop.

[Figure: the branch PC indexes the Branch Prediction Buffer; each entry is in state N or T.]
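The 40% figure can be reproduced with a short Python simulation of a 1-bit predictor. The function name `run_one_bit` and the encoding of outcomes as the letters T and N are choices made for this sketch only.

```python
def run_one_bit(outcomes, state="N"):
    """Simulate a 1-bit predictor; return (predictions, mispredictions, final state)."""
    predictions, misses = [], 0
    for actual in outcomes:
        predictions.append(state)
        misses += state != actual
        state = actual            # 1-bit history: remember only the last outcome
    return predictions, misses, state

loop = list("TTTTN")              # branch outcomes for i = 0..4
preds, misses, state = run_one_bit(loop)
print(preds, misses / len(loop))  # ['N', 'T', 'T', 'T', 'T'] 0.4
preds2, misses2, _ = run_one_bit(loop, state)
print(misses2 / len(loop))        # 0.4
```

The second call starts from the state left behind by the first run (N, after the loop exit), which is why re-executing the loop gives the same rate.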

Page 61: Out-of-Order Execution, Exception, Branch Prediction, CMP

Examples

How often is branch outcome != previous outcome?

Branch | Outcome pattern | outcome != previous outcome | Prediction Rate
DC08 | TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT … (100,000 iterations) | 2 / 100,000 (the TN and NT transitions) | 99.998%
DC44 | TTTTT ... TNTTTTT … TNTTTTT … | 2 / 100 | 98.0%
DC50 | TNTNTNTNTNTNTNTNTNTNTNTNTNTNT … | 2 / 2 | 0.0%

© Murali Annavaram, Gabe Loh & Gary Tyson, All rights reserved

Page 62: Out-of-Order Execution, Exception, Branch Prediction, CMP

Brandon Franzke, USC 2006

Use two bit history

• 2-bit history
– Start as strongly not taken
– Update BPB after every branch execution

[Figure: the branch PC indexes the Branch Prediction Buffer; each entry is in one of the states SN, N, T, ST.]

© Murali Annavaram, Gabe Loh & Gary Tyson, All rights reserved

Page 63: Out-of-Order Execution, Exception, Branch Prediction, CMP

TWO-BIT PREDICTOR

2-BIT UP-DOWN SATURATING COUNTER IN EACH ENTRY OF THE BPB

TAKEN ==> ADD 1; UNTAKEN ==> SUBTRACT 1
NOW IT TAKES 2 MISPREDICTIONS IN A ROW TO CHANGE THE PREDICTION
FOR THE NESTED LOOP, THE MISPREDICTION AT ENTRY IS AVOIDED

COULD HAVE MORE THAN 2 BITS, BUT TWO BITS COVER MOST PATTERNS (LOOPS)

[Figure: state diagram of the 2-bit counter. States: 00 (SN, Strongly Not Taken, Predict U), 01 (N, Not Taken, Predict U), 10 (T, Taken, Predict T), 11 (ST, Strongly Taken, Predict T). A taken branch (T) moves one state toward 11; an untaken branch (U) moves one state toward 00. U: Untaken, T: Taken.]

EE557 Michel Dubois USC 2007

Page 64: Out-of-Order Execution, Exception, Branch Prediction, CMP

• Q#3 Show the states and predictions for 2 runs of the loop shown in Q#2 using the 2-bit pattern history predictor. (States: SN, N, T, ST; the initial state is SN.)

First run:
Iteration | 0 | 1 | 2 | 3 | 4
Actual | T | T | T | T | N
State | SN | | | |
Prediction | N | | | |

Second run:
Iteration | 0 | 1 | 2 | 3 | 4
Actual | T | T | T | T | N
State | | | | |
Prediction | | | | |

Page 65: Out-of-Order Execution, Exception, Branch Prediction, CMP

• A#3 The 2-bit predictor works better than the 1-bit predictor after the initial training period (one misprediction per run instead of two). We can improve the initial training period by starting in the T state.

First run:
Iteration | 0 | 1 | 2 | 3 | 4
Actual | T | T | T | T | N
State | SN | N | T | ST | ST
Prediction | N | N | T | T | T

Second run:
Iteration | 0 | 1 | 2 | 3 | 4
Actual | T | T | T | T | N
State | T | ST | ST | ST | ST
Prediction | T | T | T | T | T
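The two runs can be traced with a small Python model of the 2-bit saturating counter. The function name `run_two_bit` and the list-based state encoding are assumptions for this sketch. The counter values 0 to 3 correspond to SN, N, T and ST.

```python
STATES = ["SN", "N", "T", "ST"]           # counter values 00, 01, 10, 11

def run_two_bit(outcomes, state="SN"):
    """Return (states seen, predictions, mispredictions, final state) for one run."""
    counter = STATES.index(state)
    states, predictions, misses = [], [], 0
    for actual in outcomes:
        prediction = "T" if counter >= 2 else "N"
        states.append(STATES[counter])
        predictions.append(prediction)
        misses += prediction != actual
        # Saturating update: taken adds 1 (max 3), not taken subtracts 1 (min 0).
        counter = min(counter + 1, 3) if actual == "T" else max(counter - 1, 0)
    return states, predictions, misses, STATES[counter]

s1, p1, m1, end1 = run_two_bit("TTTTN")
print(s1, p1, m1)   # ['SN', 'N', 'T', 'ST', 'ST'] ['N', 'N', 'T', 'T', 'T'] 3
s2, p2, m2, _ = run_two_bit("TTTTN", end1)
print(s2, p2, m2)   # ['T', 'ST', 'ST', 'ST', 'ST'] ['T', 'T', 'T', 'T', 'T'] 1
```

The first run ends in state T, so the second run mispredicts only the loop exit.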

Page 66: Out-of-Order Execution, Exception, Branch Prediction, CMP


Q#4 (Global / Local) predictors make use of the PC, while (global / local) predictors do not.

Page 67: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#4 (Global / Local) predictors make use of the PC, while (global / local) predictors do not.

A#4 Local (also known as per-address) predictors make use of the PC to distinguish between different branch instructions. Global predictors do not.

Page 68: Out-of-Order Execution, Exception, Branch Prediction, CMP

Correlating Branches

(2,2) predictor
– Behavior of recent branches selects between four predictions of next branch, updating just that prediction

[Figure: correlating predictor. Labels: Branch address (4), 2-bits per branch predictor, Prediction, 2-bit global branch history.]

CS252 UC Berkeley David A. Patterson

Page 69: Out-of-Order Execution, Exception, Branch Prediction, CMP

• Q#5 Two-Level Prediction:
• Given the following branch history / pattern history predictor:
– 2-bit global branch history register (Shift-Left)
– 3 bits of PC used to access the pattern history table.
– All predictors are 2-bit predictors.
– Instruction width = 32 bits
– Assume the next branch instruction is at PC = 8004H, and it will be taken eventually.
• On the following page:
– Provide the bits of the PC used by the predictor.
– Indicate if the prediction is taken/not taken.
– Show any changes to the branch history register and pattern history table after the branch taken outcome info is provided.

Page 70: Out-of-Order Execution, Exception, Branch Prediction, CMP

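BHR: 0 1

PC A__ - A__

Pattern History Table (rows: PC bits 000 to 111; columns: BHR value)

PC bits | 00 | 01 | 10 | 11
000 | 00 | 10 | 11 | 10
001 | 11 | 10 | 01 | 01
010 | 01 | 01 | 01 | 11
011 | 00 | 01 | 00 | 10
100 | 00 | 10 | 11 | 10
101 | 11 | 10 | 01 | 01
110 | 01 | 01 | 01 | 11
111 | 00 | 01 | 00 | 10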

Page 71: Out-of-Order Execution, Exception, Branch Prediction, CMP

BHR: 0 1

PC A4 - A2 = 001

Pattern History Table: as given in Q#5; the entry selected by PC bits 001 and BHR 01 is 10.

A#5: 8004H => PC bits A4-A2 = 001; with BHR = 01 this selects entry 10 => Predict T (Taken)

This branch is eventually taken, as predicted. Hence
• Branch History Register shifts left from 01 to 11 (shift in a 1).
• Pattern changes from state 10 to state 11 (refer to the 2-bit predictor state diagram).
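The lookup and update in A#5 can be sketched in Python. The table contents below are a reading of the slide's Pattern History Table under the assumption that its columns are ordered 00, 01, 10, 11; the function name `predict_and_update` is hypothetical.

```python
# Rows are PC bits A4-A2 (000..111); columns are BHR values 00, 01, 10, 11.
# The values are an assumed reading of the slide's table.
PHT = [
    [0b00, 0b10, 0b11, 0b10],
    [0b11, 0b10, 0b01, 0b01],
    [0b01, 0b01, 0b01, 0b11],
    [0b00, 0b01, 0b00, 0b10],
    [0b00, 0b10, 0b11, 0b10],
    [0b11, 0b10, 0b01, 0b01],
    [0b01, 0b01, 0b01, 0b11],
    [0b00, 0b01, 0b00, 0b10],
]

def predict_and_update(pc, bhr, taken):
    """Return (row, predicted taken?, updated counter, new BHR) for one branch."""
    row = (pc >> 2) & 0b111                 # skip the 2 byte-offset bits, keep A4-A2
    counter = PHT[row][bhr]
    prediction = counter >= 0b10            # 10 and 11 predict taken
    PHT[row][bhr] = min(counter + 1, 3) if taken else max(counter - 1, 0)
    new_bhr = ((bhr << 1) | int(taken)) & 0b11   # shift left, shift in the outcome
    return row, prediction, PHT[row][bhr], new_bhr

result = predict_and_update(0x8004, 0b01, True)
print(result)   # (1, True, 3, 3)
```

Row 1 is PC bits 001, the prediction is taken, the counter moves from 10 to 11, and the BHR becomes 11, in agreement with A#5.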

Page 72: Out-of-Order Execution, Exception, Branch Prediction, CMP


Q#6 Is the following statement true or false? Explain.

“A predictor with more bits can always achieve a better performance”

Page 73: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#6 Is the following statement true or false? Explain.

“A predictor with more bits can always achieve a better performance”

A#6: False. More bits can often just increase training time, which reduces the accuracy for shorter loops. Also, more bits mean more hysteresis, which in turn means “refusing” to “adapt” or “change”.

Page 74: Out-of-Order Execution, Exception, Branch Prediction, CMP
Page 75: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#7 With a branch target buffer, the address of the next instruction can be predicted while the branch is in _____ (IF/ID/EX/MEM/WB) stage.


Page 76: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#7 With a branch target buffer, the address of the next instruction can be predicted while the branch is in _____ (IF/ID/EX/MEM/WB) stage.


A#7: IF Stage. The branch target buffer compares the PC against the known predicted taken branches and supplies the next address. Since only the PCs are being compared, the instruction does not have to be decoded. For accurately predicted branches, this results in zero clock penalty.
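A branch target buffer lookup can be modeled as a table keyed by the PC alone. In this Python sketch the dictionary `btb`, the function `next_pc`, and the addresses are all made up for illustration.

```python
btb = {0x8004: 0x8040}      # branch PC -> predicted target (hypothetical addresses)

def next_pc(pc, instr_bytes=4):
    """Choose the next fetch address during IF using only the PC."""
    # Hit: fetch from the predicted target. Miss: fall through to PC + 4.
    return btb.get(pc, pc + instr_bytes)

print(hex(next_pc(0x8004)))   # 0x8040
print(hex(next_pc(0x8008)))   # 0x800c
```

Because the lookup needs nothing but the PC, it can run in parallel with the instruction fetch, before the instruction is decoded.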

Page 77: Out-of-Order Execution, Exception, Branch Prediction, CMP

CMP

Q#1 Uniprocessor pipelines (with no multithreading) are constrained by ___________ level parallelism

Q#2 Dynamic power considerations favor ____ (Uniprocessor / Parallel Processor)

Page 78: Out-of-Order Execution, Exception, Branch Prediction, CMP

CMP

A#1 Uniprocessor pipelines (with no multithreading) are constrained by instruction level parallelism (ILP)

A#2 Dynamic power considerations favor the Parallel Processor.

Page 79: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#3a Which types of processor multithreading need context switch through Process Control Block?

a. Software multithreading
b. Hardware multithreading

Q#3b Which has high overhead of context switching?

a. Software multithreading
b. Hardware multithreading

Page 80: Out-of-Order Execution, Exception, Branch Prediction, CMP

A#3a Which types of processor multithreading need context switch through Process Control Block?

a. Software multithreading

A#3b Which has high overhead of context switching?

a. Software multithreading

Page 81: Out-of-Order Execution, Exception, Branch Prediction, CMP


Q#4 Does Niagara have the cache coherence issue? If Yes, in which level of cache?

Page 82: Out-of-Order Execution, Exception, Branch Prediction, CMP


Q#4 Does Niagara have the cache coherence issue? If Yes, in which level of cache?

A#4: Yes, in L1 cache since it’s not shared.

Page 83: Out-of-Order Execution, Exception, Branch Prediction, CMP


Q#5a Is L1 cache shared across cores?

Q#5b Is L1 cache shared (used) by the different threads running on a single core?

Page 84: Out-of-Order Execution, Exception, Branch Prediction, CMP

Q#5a Is L1 cache shared across cores?

A#5a No.

Q#5b Is L1 cache shared (used) by the different threads running on a single core?

A#5b Yes.

Page 85: Out-of-Order Execution, Exception, Branch Prediction, CMP


• Q#6 Uniprocessors place greater burden on (hardware / software) designers, while parallel processors place greater burden on (hardware / software) designers.

Page 86: Out-of-Order Execution, Exception, Branch Prediction, CMP

• Q#6 Uniprocessors place greater burden on (hardware / software) designers, while parallel processors place greater burden on (hardware / software) designers.

• A#6 Uniprocessors place greater burden on hardware designers, while parallel processors place greater burden on software designers.