Superscalar - summary


Page 1: Superscalar - summary

Superscalar - summary

• Superscalar machines have multiple functional units (FUs)
  e.g. 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store
• Requires complex IFU
  • Able to issue multiple instructions/cycle (typ. 4)
  • Able to detect hazards (unavailability of operands)
  • Able to re-order instruction issue
• Aim to keep all the FUs busy
• Typically, 6-way superscalars can achieve instruction-level parallelism of 2-3

Page 2: Superscalar - summary

Computer Architecture

Speculation & Branching

Iolanthe II approaches Rangitoto

Page 3: Superscalar - summary

Speculation

• High Tech Gambling?
• Data Prefetch
  • Cache instruction dcbt: data cache block touch
    (PowerPC mnemonic - similar opcodes found in other architectures: SPARC v9, MIPS, …)
  • Attempts to bring data into cache
    • so that it will be “close” when needed
  • Allows SIU to use idle bus bandwidth
    • if there’s no spare bandwidth, this read can be given low priority
  • Speculative because
    • a branch may occur before it’s used
    • we speculate that this data may be needed

Page 4: Superscalar - summary

Speculation - General

• Some functional units almost always idle
• Make them do some (possibly useful) work rather than idle
• If the speculation was incorrect, results are simply abandoned
  • No loss in efficiency; chance of a gain
• Researchers are actively looking at software prefetch schemes
  • Fetch data well before it’s needed
  • Reduce latency when it’s actually needed
• Speculative operations have low priority and use idle resources
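A toy timing model can illustrate why fetching data well before it is needed reduces the latency seen by the program. This is only a sketch: the function name `stall_cycles`, the 40-cycle miss latency and the 10 cycles of work per loop iteration are assumptions chosen for illustration, not figures from the slides.

```python
def stall_cycles(miss_latency, work_per_iter, prefetch_distance):
    """Cycles the processor still waits for data that was prefetched
    `prefetch_distance` loop iterations before it is used (toy model)."""
    # Latency that overlaps with useful work done in the meantime.
    hidden = prefetch_distance * work_per_iter
    return max(0, miss_latency - hidden)


if __name__ == "__main__":
    # Hypothetical numbers: 40-cycle miss, 10 cycles of work per iteration.
    for distance in range(6):
        print(distance, stall_cycles(40, 10, distance))
```

In this model the stall shrinks as the prefetch is issued earlier and disappears once the prefetch distance covers the whole miss latency. If the prefetched data is never used, the model shows no gain, which matches the "possibly useful work" view of speculation.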

Page 5: Superscalar - summary

Branching

• Expensive
  • 2-3 cycles lost in pipeline
    • All instructions following branch ‘flushed’
  • Bandwidth wasted fetching unused instructions
  • Stall while branch target is fetched
• We can speculate about the target of a branch
• Terminology
  • Branch Target: address to which branch jumps
  • Branch Taken: control transfers to non-sequential address (target)
  • Branch Not Taken: next instruction is executed
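The cost of the 2-3 lost cycles can be put into numbers with a simple average. The sketch below assumes a machine with no branch prediction that loses a fixed number of cycles on every taken branch. The function name and the example figures (base CPI of 1.0, 20% branches, 60% of them taken) are hypothetical; only the 3-cycle penalty comes from the slide.

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, penalty_cycles):
    """Average cycles per instruction when every taken branch costs
    `penalty_cycles` lost pipeline cycles (no prediction at all)."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 60% of them taken, 3 cycles lost each.
    print(effective_cpi(1.0, 0.20, 0.60, 3))   # about 1.36
```

Under these assumed figures the pipeline runs roughly a third slower than its ideal rate, which is why predicting the branch outcome is worth the hardware.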

Page 6: Superscalar - summary

Branching - Prediction

• Branches can be
  • unconditional: branch is always taken
    • call subroutine
    • return from subroutine
  • conditional: branch depends on state of computation, e.g. has loop terminated yet?
• Unconditional branches are simple
  • New instructions are fetched as soon as the branch is recognized
  • As early in the pipeline as possible
• Branch units often placed with fetch & decode stages

Page 7: Superscalar - summary

Branching - Branch Unit

• PowerPC 603 logical layout

Page 8: Superscalar - summary

Branching - Speculation

• We have the following code:
    if ( cond ) s1; else s2;
• Superscalar machine
  • Multiple functional units
  • Start executing both branches (s1 and s2)
  • Keep idle functional units busy!
• One is speculative and will be abandoned
• Processor will eventually calculate the branch condition and select which result should be retained (written back)
• MIPS R10000 - up to 4 speculative at once

Page 9: Superscalar - summary

Branching - Speculation

• MIPS R10000
  • Up to 4 speculative at once
  • Instructions are “tagged” with a 4-bit mask
    • Indicates to which branch instruction it belongs
  • As soon as condition is determined, mis-predicted instructions are aborted
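The tagging idea can be sketched in a few lines. This is a toy model, not the R10000's actual logic: the class `SpeculativeWindow`, its method names and its bookkeeping are assumptions made for illustration. Only the limit of 4 unresolved branches and the 4-bit mask come from the slide.

```python
class SpeculativeWindow:
    """Toy model of branch-mask tagging: up to 4 unresolved branches,
    one mask bit per branch (illustrative names, not real hardware)."""

    MAX_BRANCHES = 4

    def __init__(self):
        self.active_mask = 0     # bits of branches that are still unresolved
        self.branch_deps = {}    # bit -> mask of older branches it depends on
        self.in_flight = []      # (instruction name, branch mask) pairs

    def predict_branch(self):
        """Allocate a free mask bit for a newly predicted branch."""
        for bit in range(self.MAX_BRANCHES):
            if not self.active_mask & (1 << bit):
                self.branch_deps[bit] = self.active_mask
                self.active_mask |= 1 << bit
                return bit
        raise RuntimeError("4 branches already speculative: fetch must stall")

    def issue(self, name):
        """Tag an instruction with every unresolved branch it follows."""
        self.in_flight.append((name, self.active_mask))

    def resolve(self, bit, mispredicted):
        """The branch condition is now known."""
        flag = 1 << bit
        if mispredicted:
            # Abort every instruction tagged with this branch.
            self.in_flight = [(n, m) for n, m in self.in_flight if not m & flag]
            # Younger branches predicted on the wrong path are dropped too.
            for other, deps in list(self.branch_deps.items()):
                if deps & flag:
                    self.active_mask &= ~(1 << other)
                    del self.branch_deps[other]
        else:
            # Prediction was right: clear the tag bit, keep the work.
            self.in_flight = [(n, m & ~flag) for n, m in self.in_flight]
            for other in self.branch_deps:
                self.branch_deps[other] &= ~flag
        self.active_mask &= ~flag
        del self.branch_deps[bit]


if __name__ == "__main__":
    w = SpeculativeWindow()
    w.issue("add")               # not speculative
    b0 = w.predict_branch()
    w.issue("lw")                # depends on branch b0
    w.resolve(b0, mispredicted=True)
    print(w.in_flight)           # only "add" survives
```

Because each instruction carries the mask of all branches it depends on, one bitwise test is enough to decide whether it must be aborted when a branch resolves.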

Page 10: Superscalar - summary

Branching - Prediction

• We have a sequence of instructions:

L1: add        ; some mixture of arithmetic,
    lw         ; load, store, etc., instructions
    sub
    brne L1    ; branch on some condition
L2: or         ; some more arithmetic,
    st         ; load, store, etc., instructions

? If you were asked to guess which branch should be preferred, which would you choose:
  ? Next sequential instruction (L2)
  ? Branch target (L1)

Page 11: Superscalar - summary

Branching - Prediction

• Studies show that branches are taken most of the time!
• Because of loops:

L1: add        ; any mix of arith,
    lw         ; load, store, etc.,
    sub        ; instructions
    brne L1    ; branch back to loop start
L2: or         ; some more arith,
    st         ; memory, etc. instructions

Page 12: Superscalar - summary

Branching - Prediction Rule

• A simple prediction rule works amazingly well:
  • Take backward branches
• For a loop with n iterations, this is wrong in 1/n cases only!
• A system working on this rule alone would
  • detect the backward branch and
  • start fetching from the branch target rather than the next instruction
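The 1/n figure can be checked with a short simulation of the loop-closing branch. This is a minimal sketch: the function name is made up for illustration, and it models only a single backward branch that is taken on every iteration except the last.

```python
def backward_taken_mispredictions(n_iterations):
    """Count mispredictions of the loop-closing branch for a loop that
    runs `n_iterations` times under the 'take backward branches' rule."""
    mispredicted = 0
    for i in range(n_iterations):
        actually_taken = i < n_iterations - 1   # last iteration falls through
        predicted_taken = True                  # backward branch: predict taken
        if predicted_taken != actually_taken:
            mispredicted += 1
    return mispredicted


if __name__ == "__main__":
    n = 100
    wrong = backward_taken_mispredictions(n)
    print(f"{wrong} misprediction in {n} executions of the branch")
```

The only wrong guess is the final execution of the branch, when the loop exits, so the error rate falls as the loop gets longer.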

Page 13: Superscalar - summary

Branching - Improving the prediction

• Static prediction systems
  • Compiler can mark branches
    • Likely to be taken or not
  • Instruction fetch unit will use the marking as advice on which instruction to fetch
• Compiler often able to give the right advice
  • Loops are easily detected
  • Other patterns in conditions can be recognized
    • Checking for EOF when reading a file
    • Error checking

Page 14: Superscalar - summary

Branching - Improving the prediction

• Dynamic prediction systems
  • Program history determines most likely branch
  • Branch Target Buffers - Another cache!

Page 15: Superscalar - summary

Branching - Branch Target Buffer

• Instruction address bits [11:3] select BTB entry
• Tag determines “hit”
• Stats select taken/not taken

R10000: 87% prediction accuracy - SPEC’92 integer
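The three steps above (index, tag check, statistics) can be sketched as a small direct-mapped table. This is an illustrative model only. The slide specifies only the index bits [11:3]. The 2-bit saturating counter used as the "stats", the use of the upper address bits as the tag, and the policy of allocating an entry on a taken branch are all assumptions.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB indexed by instruction address bits [11:3]."""

    INDEX_BITS = 9    # bits [11:3] give 512 entries

    def __init__(self):
        # Each entry is [tag, target, counter] or None when empty.
        self.entries = [None] * (1 << self.INDEX_BITS)

    @staticmethod
    def _split(address):
        index = (address >> 3) & 0x1FF   # address bits [11:3]
        tag = address >> 12              # assumed: remaining upper bits
        return index, tag

    def predict(self, address):
        """Return the predicted target, or None for 'not taken'."""
        index, tag = self._split(address)
        entry = self.entries[index]
        # Hit only if the tag matches; counter >= 2 means 'predict taken'.
        if entry is not None and entry[0] == tag and entry[2] >= 2:
            return entry[1]
        return None

    def update(self, address, taken, target):
        """Record the real outcome once the branch has executed."""
        index, tag = self._split(address)
        entry = self.entries[index]
        if entry is None or entry[0] != tag:
            if taken:                    # assumed policy: allocate on taken
                self.entries[index] = [tag, target, 2]
            return
        if taken:
            entry[1] = target
            entry[2] = min(3, entry[2] + 1)   # saturate at 'strongly taken'
        else:
            entry[2] = max(0, entry[2] - 1)   # saturate at 'strongly not taken'


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    btb.update(0x1008, taken=True, target=0x1000)
    print(hex(btb.predict(0x1008)))   # predicted target of the loop branch
```

Two branches whose addresses share bits [11:3] compete for the same entry. The tag comparison stops one of them from using the other's history.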