05-superscalar 2

  • 8/3/2019 05-superscalar 2

    1/38

    Nov 2010 ICS311 - superscalar architecture 2 1

    Superscalar Processor Architecture (2)

    Superscalar Processor
      - Basic concept
      - Basic operation
      - Limitations/challenges
      - Architecture
          - Issue policy
          - Dealing with false dependencies
      - Other design issues
          - Operand fetch and update policies
          - Dealing with conditional branches
          - Preserving the sequential consistency of execution

    Register operand fetching, updating

    operand fetch policies
      - direct issue
      - shelved issue
          - issue bound
          - dispatch bound

    operand fetch policies

    Consider an instruction of the form:

    add Rd, Rs1, Rs2

    (IA-32 has no three-operand add; its nearest equivalent is the two-operand form add AX, BX, which computes AX = AX + BX.)

    How and when do the values of the source operands reach the execution unit?

    How and when does the computed result get committed to the destination operand?

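    operand fetch policies (direct issue)

    [Figure: I-buffer → decode/issue → EU. Decode/issue sends the source register numbers to the reg file, which supplies the source operands to the EU; the opcode and destination register number go to the EU directly.]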

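    operand fetch policies (direct issue)

    From decode/issue, the register file (one V-bit per register) is accessed with Rs1, Rs2, Rd:
      - fetch Rs1, Rs2 (if available, i.e. V-bit set) and reset the V-bit of Rd;
      - the EU receives OC, Os1, Os2, Rd (opcode, source operand values, destination register);
      - when the EU finishes: update Rd and set its V-bit.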
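The V-bit protocol of direct issue can be sketched as a toy Python model. Everything here is an illustrative assumption: the class and method names are hypothetical, registers hold plain integers, and we assume that with direct issue the instruction simply stalls in decode/issue while a source V-bit is reset.

```python
# Toy model of direct issue with one valid (V) bit per register.
# Assumptions (not from the slides): 8 registers, integer values,
# hypothetical names; issue stalls if a source operand is not valid.

class RegisterFile:
    def __init__(self, n=8):
        self.value = [0] * n
        self.valid = [True] * n  # V-bit: True means the value is up to date

    def try_issue(self, rd, rs1, rs2):
        """Fetch Rs1, Rs2 only if both V-bits are set. On success reset
        the V-bit of Rd and return the operand values; otherwise return
        None (the instruction stays in decode/issue)."""
        if not (self.valid[rs1] and self.valid[rs2]):
            return None
        self.valid[rd] = False  # result of this instruction is now pending
        return self.value[rs1], self.value[rs2]

    def write_back(self, rd, result):
        """EU has finished: update Rd and set its V-bit."""
        self.value[rd] = result
        self.valid[rd] = True


rf = RegisterFile()
rf.value[2], rf.value[3] = 5, 7
ops = rf.try_issue(1, 2, 3)   # add R1, R2, R3 issues with operands (5, 7)
print(rf.try_issue(4, 1, 2))  # None: V-bit of R1 is reset, so issue stalls
rf.write_back(1, sum(ops))
print(rf.try_issue(4, 1, 2))  # (12, 5)
```

The stall on a reset V-bit is what makes direct issue blocking; the shelved-issue variants move waiting instructions out of the issue path.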

    shelved issue

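    operand fetch policies (shelved issue): issue bound

    [Figure: I-buffer → decode/issue → shelving buffer → dispatch → EU. At issue, the source register numbers are sent to the reg file, and the source operands, opcode and destination register number are placed in the shelving buffer.]

    Operands are fetched at issue.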

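    operand fetch/update (shelved issue): issue bound

    From decode/issue, the register file (one V-bit per register) is accessed with Rs1, Rs2, Rd:
      - fetch Rs1, Rs2 (if available) and reset the V-bit of Rd;
      - the instruction is shelved in the reservation station as OC, Os1, Vs1, Os2, Vs2, Rd, where each source operand carries its own valid bit;
      - out-of-order dispatch to the EU, 1 instruction/cycle;
      - when the EU finishes: update Rd and set its V-bit.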
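A minimal Python sketch of issue-bound shelving follows. It is a toy under stated assumptions: names are hypothetical; when a source is not valid at issue, the Os field holds the awaited register number as a tag; and the EU result is assumed to be forwarded to waiting reservation-station entries (otherwise they could never become ready). It ignores WAW hazards between in-flight instructions, which register renaming would remove.

```python
from dataclasses import dataclass


@dataclass
class IssueBoundEntry:
    oc: str     # opcode
    os1: int    # operand value if vs1, else the register number awaited
    vs1: bool
    os2: int
    vs2: bool
    rd: int


class IssueBoundRS:
    def __init__(self, rf_value, rf_valid):
        self.value, self.valid = rf_value, rf_valid
        self.entries = []

    def _read(self, r):
        # Issue bound: the register file is read at issue time.
        return (self.value[r], True) if self.valid[r] else (r, False)

    def issue(self, oc, rd, rs1, rs2):
        os1, vs1 = self._read(rs1)
        os2, vs2 = self._read(rs2)
        self.valid[rd] = False  # reset V-bit of Rd
        self.entries.append(IssueBoundEntry(oc, os1, vs1, os2, vs2, rd))

    def dispatch(self):
        """Dispatch at most one ready entry per cycle, oldest first,
        possibly out of program order."""
        for i, e in enumerate(self.entries):
            if e.vs1 and e.vs2:
                return self.entries.pop(i)
        return None

    def write_back(self, rd, result):
        self.value[rd], self.valid[rd] = result, True
        # Assumption: the result is also forwarded to entries waiting on rd.
        for e in self.entries:
            if not e.vs1 and e.os1 == rd:
                e.os1, e.vs1 = result, True
            if not e.vs2 and e.os2 == rd:
                e.os2, e.vs2 = result, True


value, valid = [0] * 8, [True] * 8
value[2], value[3] = 5, 7
rs = IssueBoundRS(value, valid)
rs.issue("imul", 1, 2, 3)   # R1 <- R2 * R3
rs.issue("iadd", 4, 1, 2)   # needs R1: waits in the RS
rs.issue("isub", 5, 2, 3)   # independent
print([rs.dispatch().oc for _ in range(2)])  # ['imul', 'isub']: out of order
```

The shelved `iadd` leaves the reservation station only after `write_back(1, ...)` fills its first operand.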

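    operand fetch policies (shelved issue): dispatch bound

    [Figure: I-buffer → decode/issue → shelving buffer → dispatch → EU. The shelving buffer holds the opcode and the source and destination register numbers; at dispatch, the source register numbers are sent to the reg file, which supplies the source operands to the EU.]

    Operands are fetched at dispatch.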

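    operand fetch/update (shelved issue): dispatch bound

    From decode/issue:
      - the instruction is shelved in the reservation station as OC, Rs1, Rs2, Rd (register numbers, not operand values), and the V-bit of Rd is reset;
      - on dispatch, Rs1, Rs2 are fetched from the register file (if available), and the instruction goes to the EU as OC, Os1, Vs1, Os2, Vs2, Rd;
      - when the EU finishes: update Rd and set its V-bit.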
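For contrast, a dispatch-bound reservation station can be sketched in the same toy style. Assumptions, not from the slides: names are hypothetical, and register renaming has already removed WAR/WAW hazards (so Rd never equals a source register and resetting V(Rd) at issue never blocks an older reader). Because operands are read from the register file at dispatch, no result forwarding into the reservation station is needed.

```python
from dataclasses import dataclass


@dataclass
class DispatchBoundEntry:
    oc: str
    rs1: int  # dispatch bound: the RS holds register numbers, not values
    rs2: int
    rd: int


class DispatchBoundRS:
    """Assumes renaming has removed WAR/WAW hazards, so resetting V(Rd)
    at issue never blocks an older instruction or the instruction itself."""

    def __init__(self, rf_value, rf_valid):
        self.value, self.valid = rf_value, rf_valid
        self.entries = []

    def issue(self, oc, rd, rs1, rs2):
        self.valid[rd] = False  # reset V-bit of Rd at issue
        self.entries.append(DispatchBoundEntry(oc, rs1, rs2, rd))

    def dispatch(self):
        """Operands are fetched at dispatch: the oldest entry whose source
        registers are both valid leaves as (oc, os1, os2, rd)."""
        for i, e in enumerate(self.entries):
            if self.valid[e.rs1] and self.valid[e.rs2]:
                del self.entries[i]
                return e.oc, self.value[e.rs1], self.value[e.rs2], e.rd
        return None

    def write_back(self, rd, result):
        self.value[rd], self.valid[rd] = result, True


value, valid = [0] * 8, [True] * 8
value[2], value[3] = 5, 7
rs = DispatchBoundRS(value, valid)
rs.issue("imul", 1, 2, 3)
rs.issue("iadd", 4, 1, 2)   # waits until R1 becomes valid
rs.issue("isub", 5, 2, 3)
print(rs.dispatch(), rs.dispatch())  # ('imul', 5, 7, 1) ('isub', 5, 7, 5)
```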

    In the case of shelved issue

    Compare issue bound vs. dispatch bound
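One concrete point of comparison is the width of a reservation-station entry. The arithmetic below is a sketch under assumed parameters (a 6-bit opcode, 32 registers, 32-bit operands), not figures from the course: issue-bound entries carry two operand values plus valid bits, while dispatch-bound entries carry only register numbers, at the cost of reading the register file in the dispatch path.

```python
def rs_entry_bits(policy, opcode_bits=6, num_regs=32, word_bits=32):
    """Bits per reservation-station entry; all parameters are assumptions."""
    reg_bits = (num_regs - 1).bit_length()  # bits to name one register
    if policy == "issue_bound":
        # OC, Os1, Vs1, Os2, Vs2, Rd: operand values plus a valid bit each
        return opcode_bits + 2 * (word_bits + 1) + reg_bits
    if policy == "dispatch_bound":
        # OC, Rs1, Rs2, Rd: register numbers only
        return opcode_bits + 3 * reg_bits
    raise ValueError(f"unknown policy: {policy}")


for p in ("issue_bound", "dispatch_bound"):
    print(p, rs_entry_bits(p), "bits per entry")
# issue_bound 77 bits per entry
# dispatch_bound 21 bits per entry
```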

    Dealing with conditional branches: speculative execution

    Non-speculative execution
      On encountering a conditional branch, suspend fetching and processing of dependent instructions until the branch is resolved. This leads to delays.

    Speculative execution
      On encountering a conditional branch, predict the direction of the branch and continue processing along the predicted path. This avoids delays as long as the predictions are correct.
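The difference can be made concrete with a deliberately simple cost model. It is an assumption for illustration only: every conditional branch takes `resolve_latency` cycles to resolve, non-speculative execution stalls for that long on every branch, and speculative execution loses the same number of cycles only when the prediction is wrong.

```python
def stall_cycles(branches, resolve_latency, accuracy=None):
    """Toy model (assumption): accuracy=None means non-speculative
    execution, which stalls on every branch; otherwise cycles are lost
    only on mispredicted branches."""
    if accuracy is None:
        return branches * resolve_latency
    return round(branches * (1 - accuracy) * resolve_latency)


print(stall_cycles(1000, 5))       # non-speculative: 5000
print(stall_cycles(1000, 5, 0.9))  # speculative, 90% correct: 500
```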

    Branch prediction

    Static branch prediction
      - Predict never taken: continue to fetch instructions sequentially until the branch direction is resolved.
      - Predict always taken: as soon as the branch target is decoded, start fetching from the target. (Note that the target may be decoded before the branch condition is evaluated and the branch direction is resolved.)
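How well the two static policies do depends on the branch's behaviour. The sketch below scores both on a hypothetical outcome trace: the loop-closing branch of a 10-iteration loop, taken nine times and then not taken once.

```python
def static_accuracy(outcomes, predict_taken):
    """Fraction of executions a static predictor gets right.
    outcomes: list of booleans, True = branch taken."""
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)


# Hypothetical trace: loop-closing branch of a 10-iteration loop
loop_branch = [True] * 9 + [False]
print(static_accuracy(loop_branch, predict_taken=False))  # 0.1 (never taken)
print(static_accuracy(loop_branch, predict_taken=True))   # 0.9 (always taken)
```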

    Static branch prediction

    Predict by opcode, e.g.:
      - JLE is typically used for loop control, and the branch is usually taken.
      - JE is also used in loop control, but is usually not taken.

    The compiler may exploit this.
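Opcode-based prediction amounts to a fixed lookup table. This sketch encodes the two examples from the slide; the table name is hypothetical, and the default of "not taken" for unlisted opcodes is an assumption. The usage scores it on a hypothetical trace.

```python
# Hypothetical opcode-based static prediction table (slide's examples only).
PREDICT_TAKEN_BY_OPCODE = {
    "JLE": True,   # typically closes a loop: usually taken
    "JE": False,   # used in loop control but usually not taken
}


def predict(opcode, default=False):
    """Static prediction by opcode; `default` for unlisted opcodes is assumed."""
    return PREDICT_TAKEN_BY_OPCODE.get(opcode.upper(), default)


# Hypothetical trace of (opcode, actually taken)
trace = [("JLE", True)] * 9 + [("JLE", False)] + [("JE", False)] * 4 + [("JE", True)]
hits = sum(predict(op) == taken for op, taken in trace)
print(hits, "/", len(trace))  # 13 / 15
```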

    Dynamic branch prediction

      - Taken/not-taken switch: keep a record of the result of the previous execution of the branch and assume the current execution will go the same way.
      - Branch history table: maintain the recent history of a branch's executions and use that history to predict its direction.
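Both dynamic schemes can be sketched in a few lines. The taken/not-taken switch is a 1-bit predictor. For the branch history table we assume one common realization, 2-bit saturating counters indexed by the low bits of the branch address; the slide does not fix the history format, so the table size, initial state and names here are assumptions.

```python
class OneBitPredictor:
    """Taken/not-taken switch: predict what the branch did last time."""

    def __init__(self):
        self.last = {}  # branch address -> last outcome

    def predict(self, pc):
        return self.last.get(pc, False)  # assumption: initially not taken

    def update(self, pc, taken):
        self.last[pc] = taken


class TwoBitBHT:
    """Branch history table of 2-bit saturating counters (0..3); counters
    of 2 or 3 predict taken. One common realization, assumed here."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counter = [1] * entries  # start weakly not taken

    def predict(self, pc):
        return self.counter[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counter[i] = min(3, self.counter[i] + 1)
        else:
            self.counter[i] = max(0, self.counter[i] - 1)


def mispredictions(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical trace: branch of a 4-iteration inner loop, executed 5 times
trace = ([True] * 3 + [False]) * 5
print(mispredictions(OneBitPredictor(), 0x40, trace))  # 10
print(mispredictions(TwoBitBHT(), 0x40, trace))        # 6
```

The 1-bit switch mispredicts twice per loop exit (the exit and the next loop entry), while the 2-bit counter tolerates the single not-taken outcome and mispredicts only the exit once warmed up.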

    Branch prediction

    what happens if the prediction is wrong?!

    Need to flush the speculatively issued, dispatched, and executed instructions from the buffers and EUs.

    Cancel/discard the effects of the speculatively executed instructions.
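A minimal sketch of the flush step, assuming (hypothetically) that every instruction receives an increasing sequence number at issue. On a misprediction, everything younger than the branch is dropped from the I-buffer, the shelving buffers and the EUs. Discarding their effects this simply relies on speculative results not yet having been committed to the architectural state.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    # Each list holds the program-order sequence numbers of in-flight instructions.
    ibuffer: list = field(default_factory=list)
    shelving: list = field(default_factory=list)
    in_eu: list = field(default_factory=list)

    def flush_after(self, branch_seq):
        """Discard every instruction younger than the mispredicted branch
        (larger sequence number); return how many were squashed."""
        squashed = 0
        for name in ("ibuffer", "shelving", "in_eu"):
            kept = [s for s in getattr(self, name) if s <= branch_seq]
            squashed += len(getattr(self, name)) - len(kept)
            setattr(self, name, kept)
        return squashed


p = Pipeline(ibuffer=[9, 10, 11], shelving=[5, 7, 8], in_eu=[3, 4, 6])
print(p.flush_after(6))  # 5: instructions 7..11 are squashed
```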

    preserving the sequential consistency of execution: ROB

    Parallel execution: instructions can finish out of program order! Need to preserve sequential consistency.

    Definitions (distinguish between):
      - finish: the required operation of the instruction is accomplished, except writeback;
      - complete: the last action is performed, e.g. writeback (wb);
      - retire: wb + deletion of the instruction's record.

    preserving sequential consistency of instruction execution using

    Re-order buffer (ROB)

    Re-order buffer (ROB): description & use
      - A cyclic buffer holding a record of all active instructions, i.e. those issued but yet to be retired.
      - Keeps track of the state of each instruction, e.g. i (issued), x (in execution), f (finished).
      - head: first free slot; tail: next instruction to be retired.
      - A new instruction is added at the head on issue.
      - An instruction in the ROB may complete and retire if it has finished and all instructions ahead of it have retired.
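The ROB rules above can be sketched as a small cyclic buffer. The naming follows the slide (head = first free slot, tail = next instruction to be retired, which is the reverse of some textbooks); the method names and the choice to retire every finished instruction at the tail in one call are illustrative assumptions.

```python
class ReorderBuffer:
    """Cyclic ROB sketch: head = first free slot, tail = next instruction
    to be retired. States: 'i' issued, 'x' in execution, 'f' finished."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.head = 0   # first free slot
        self.tail = 0   # next instruction to be retired
        self.count = 0

    def issue(self, name):
        if self.count == self.size:
            return None  # ROB full: issue must stall
        slot = self.head
        self.slots[slot] = [name, "i"]
        self.head = (self.head + 1) % self.size
        self.count += 1
        return slot

    def set_state(self, slot, state):
        self.slots[slot][1] = state  # 'x' on dispatch, 'f' on finish

    def retire(self):
        """Retire, in program order, the finished instructions at the tail;
        stop at the first instruction that has not finished."""
        retired = []
        while self.count and self.slots[self.tail][1] == "f":
            retired.append(self.slots[self.tail][0])
            self.slots[self.tail] = None
            self.tail = (self.tail + 1) % self.size
            self.count -= 1
        return retired


rob = ReorderBuffer(4)
a, b, c = rob.issue("I1"), rob.issue("I2"), rob.issue("I3")
rob.set_state(a, "x")
rob.set_state(b, "f")
rob.set_state(c, "f")
print(rob.retire())  # []: I2 and I3 finished out of order, but I1 has not
rob.set_state(a, "f")
print(rob.retire())  # ['I1', 'I2', 'I3']
```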

    Preserving sequential consistency: ROB

    [Figure: I-buffer → decode/issue unit → shelving buffers → dispatch unit → EUs → completion/retire unit. The ROB records each instruction's state as it moves through the pipeline: issued, in execution, finished, completed/retired.]

    Exercise

    Example superscalar pipeline

      - In-order issue, non-blocking. Issue rate = 1 instruction per cycle.
      - Shelved issue: one 4-entry reservation station. No bypass.
      - 3 execution units with the same functions (iadd, isub, imul, idiv).
      - Out-of-order dispatch possible. Dispatch window = 3. Dispatch rate = 1 instruction per cycle.
      - Execution times: iadd, isub = 3 cycles; imul, idiv = 6 cycles (EUs not pipelined).
      - Register renaming with a very large number of hardware registers.

    Exercise

    Given the following instruction stream

    I1: iadd R1, R2, R3

    I2: imul R4, R5, R6

    I3: iadd R7, R8, R9

    I4: iadd R4, R7, R8

    I5: idiv R10, R8, R3

    I6: isub R2, R5, R6

    I7: iadd R11, R3, R6

    I8: isub R12, R3, R6

    I9: .

    Exercise

    Instruction format:

    opcode Rdest, Rsrc1, Rsrc2
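A first step in the trace is finding which instructions depend on which. The sketch below does a simple pairwise check over I1 to I8 (it does not filter out dependencies shadowed by an intervening write, which does not matter for this stream). Since the example pipeline uses register renaming, only the RAW (true) dependencies should constrain dispatch; WAR and WAW are the false dependencies that renaming removes.

```python
PROGRAM = """
I1: iadd R1, R2, R3
I2: imul R4, R5, R6
I3: iadd R7, R8, R9
I4: iadd R4, R7, R8
I5: idiv R10, R8, R3
I6: isub R2, R5, R6
I7: iadd R11, R3, R6
I8: isub R12, R3, R6
"""


def parse(text):
    """Parse 'label: opcode Rdest, Rsrc1, Rsrc2' lines."""
    instrs = []
    for line in text.strip().splitlines():
        label, rest = line.split(":")
        opcode, operands = rest.split(maxsplit=1)
        dest, src1, src2 = (r.strip() for r in operands.split(","))
        instrs.append((label.strip(), opcode, dest, {src1, src2}))
    return instrs


def dependencies(instrs):
    """Pairwise RAW, WAR and WAW checks between earlier and later instructions."""
    raw, war, waw = [], [], []
    for i, (li, _, di, si) in enumerate(instrs):
        for lj, _, dj, sj in instrs[i + 1:]:
            if di in sj:
                raw.append((li, lj, di))  # later instruction reads earlier result
            if dj in si:
                war.append((li, lj, dj))  # later instruction overwrites a source
            if di == dj:
                waw.append((li, lj, di))  # both write the same register
    return raw, war, waw


raw, war, waw = dependencies(parse(PROGRAM))
print("RAW:", raw)  # RAW: [('I3', 'I4', 'R7')]
print("WAR:", war)  # WAR: [('I1', 'I6', 'R2')]
print("WAW:", waw)  # WAW: [('I2', 'I4', 'R4')]
```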

    Trace the processing of I1 to I8, assuming that initially all 8 instructions are in the I-buffer:
      - Show the contents of the I-buffer, RS and EUs during each cycle.
      - Show the contents of the ROB during each cycle.
      - Derive the total execution time of the 8 instructions in cycles (from the time I1 is issued to the time I8 is retired).

    Exercise

    Superscalar implementation

    Processor hardware requirements: summary

    Processor hardware requirements: summary

      - Multiple pipelined fetch and decode stages, and branch prediction logic.
      - Logic for determining true data dependencies, and mechanisms for communicating values to where they are needed during execution.
      - Mechanisms for issuing multiple instructions in parallel.

    Processor hardware requirements: summary

      - Resources for parallel execution of multiple instructions:
          - Multiple pipelined functional units
          - Memory hierarchies for simultaneous servicing of multiple references
      - Mechanisms for committing the process state in the correct order.

    Recognizing opportunities for parallel processing of instructions

    Recognizing opportunities for parallel processing of instructions

      - Processor hardware: the hardware is designed to look for and recognize opportunities for parallelism. This implies sophisticated, expensive, bulky hardware.
      - Compiler support: the compiler is programmed to look for and recognize opportunities for parallelism, then communicate them to the hardware (the machine language must support this!).

    Recognizing opportunities for parallel processing of instructions

      - Programming language support: the programming language allows the programmer to communicate opportunities for parallel processing. This information is communicated to the hardware via the compiler.

    Reading assignment:
      - Intel Pentium II superscalar processor architecture. Stallings, pp. 515-520.
      - PowerPC 601 superscalar processor architecture. Stallings, pp. 521-526.
      - MPC7450 Microprocessor, Motorola, pp. 1-4 (attached).

    Superscalar Architecture

    practical assignment

    Use the SimpleScalar simulation tool to demonstrate the effects of the superscalarity of a processor on performance, as measured by instruction throughput. You should be able to produce a graph illustrating the relationship between the two variables, i.e. a selected aspect of superscalarity and instruction throughput. You may use a selected benchmark program in carrying out the above.

    Hand in your results, including a report on how you obtained them. Work in groups of no more than four.

    Ref: http://www.simplescalar.com

    next

    superthreading
