Chapter 04 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The...

  • 8/17/2019 Chapter 04 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) 5th Edition

    COMPUTER ORGANIZATION AND DESIGN
    The Hardware/Software Interface

    5th Edition

    Chapter 4

    The Processor

    Chapter 4 — The Processor — 2

    Introduction

    CPU performance factors
      Instruction count
        Determined by ISA and compiler
      CPI and cycle time
        Determined by CPU hardware

    We will examine two MIPS implementations
      A simplified version
      A more realistic pipelined version

    Simple subset, shows most aspects
      Memory reference: lw, sw
      Arithmetic/logical: add, sub, and, or, slt
      Control transfer: beq, j

    §4.1 Introduction

    Instruction Execution

    PC → instruction memory, fetch instruction
    Register numbers → register file, read registers
    Depending on instruction class
      Use ALU to calculate
        Arithmetic result
        Memory address for load/store
        Branch target address
      Access data memory for load/store
      PC ← target address or PC + 4

    CPU Overview

    Multiplexers

    Can't just join wires together
      Use multiplexers

    Control

    Logic Design Basics

    §4.2 Logic Design Conventions

    Information encoded in binary
      Low voltage = 0, High voltage = 1
      One wire per bit
      Multi-bit data encoded on multi-wire buses
    Combinational element
      Operate on data
      Output is a function of input
    State (sequential) elements
      Store information

    Combinational Elements

    AND-gate
      Y = A & B
    Multiplexer
      Y = S ? I1 : I0
    Adder
      Y = A + B
    Arithmetic/Logic Unit
      Y = F(A, B)
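A minimal behavioral sketch of the four combinational elements as pure Python functions. It assumes 32-bit unsigned words, and the function names and the small set of ALU operation names are hypothetical. Because a combinational output depends only on the current inputs, each element is a function with no stored state.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned datapath words


def and_gate(a: int, b: int) -> int:
    """Y = A & B (bitwise AND)."""
    return a & b


def mux2(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0 (2-to-1 multiplexer)."""
    return i1 if s else i0


def adder(a: int, b: int) -> int:
    """Y = A + B, wrapping modulo 2**32 like a 32-bit adder."""
    return (a + b) & MASK32


def alu(f: str, a: int, b: int) -> int:
    """Y = F(A, B); the operation names here are illustrative only."""
    results = {
        "and": a & b,
        "or": a | b,
        "add": a + b,
        "sub": a - b,
    }
    return results[f] & MASK32  # negative differences wrap to two's complement
```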

    Sequential Elements

    Register: stores data in a circuit
      Uses a clock signal to determine when to update the stored value
      Edge-triggered: update when Clk changes from 0 to 1

    Sequential Elements

    Register with write control
      Only updates on clock edge when write control input is 1
      Used when stored value is required later
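The edge-triggered register with write control can be sketched behaviorally. This sketch assumes the clock is sampled as discrete 0/1 levels passed to a hypothetical `tick` method. Q changes only on a 0-to-1 transition of Clk, and only when Write is 1.

```python
class Register:
    """Edge-triggered register with a write-control input (behavioral sketch)."""

    def __init__(self, value: int = 0) -> None:
        self.q = value       # stored value, visible on output Q
        self._prev_clk = 0   # clock level seen at the previous tick

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Present Clk, D and Write; Q updates only on a 0 -> 1 Clk edge with Write = 1."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d
        self._prev_clk = clk
        return self.q
```

Holding Clk high, or producing a rising edge while Write = 0, leaves Q unchanged.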

    Clocking Methodology

    Combinational logic transforms data during clock cycles
      Between clock edges
      Input from state elements, output to state element
      Longest delay determines clock period

    Building a Datapath

    Datapath
      Elements that process data and addresses in the CPU
        Registers, ALUs, muxes, memories, …
    We will build a MIPS datapath incrementally
      Refining the overview design

    §4.3 Building a Datapath

    Instruction Fetch

    PC: 32-bit register
    Increment by 4 for next instruction

    R-Format Instructions

    Read two register operands
    Perform arithmetic/logical operation
    Write register result

    Load/Store Instructions

    Read register operands
    Calculate address using 16-bit offset
      Use ALU, but sign-extend offset
    Load: Read memory and update register
    Store: Write register value to memory
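The address calculation can be sketched in Python. This sketch assumes 32-bit unsigned register values, and `sign_extend16` and `effective_address` are hypothetical helper names. Sign extension copies bit 15 of the offset into bits 31:16, so a negative offset such as -4 (0xFFFC) moves the address downward.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned register values


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit offset to 32 bits by replicating bit 15."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def effective_address(base: int, imm16: int) -> int:
    """lw/sw address: base register + sign-extended offset, computed by the ALU."""
    return (base + sign_extend16(imm16)) & MASK32
```

For example, `lw $t1, -4($t0)` with $t0 = 0x1000 reads address 0x0FFC.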

    Branch Instructions

    Read register operands
    Compare operands
      Use ALU, subtract and check Zero output
    Calculate target address
      Sign-extend displacement
      Shift left 2 places (word displacement)
      Add to PC + 4
        Already calculated by instruction fetch
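The following sketch shows how beq chooses the next PC. It assumes 32-bit unsigned values and uses a hypothetical `beq_next_pc` helper. The comparison is modeled as an ALU subtract whose result is tested for zero, as on the slide.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned values


def sign_extend16(imm: int) -> int:
    """Replicate bit 15 of the 16-bit displacement into bits 31:16."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def beq_next_pc(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    pc_plus_4 = (pc + 4) & MASK32                  # from instruction fetch
    zero = ((rs_val - rt_val) & MASK32) == 0       # ALU subtract, Zero output
    offset = (sign_extend16(imm16) << 2) & MASK32  # word -> byte displacement
    target = (pc_plus_4 + offset) & MASK32
    return target if zero else pc_plus_4
```

A displacement of -1 (0xFFFF) branches back to the branch instruction itself, because the offset is added to PC + 4.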

    Branch Instructions

    Shift left 2: just re-routes wires
    Sign-extend: sign-bit wire replicated

    Composing the Elements

    First-cut datapath does an instruction in one clock cycle
      Each datapath element can only do one function at a time
      Hence, we need separate instruction and data memories
    Use multiplexers where alternate data sources are used for different instructions

    R-Type/Load/Store Datapath

    Full Datapath

    ALU Control

    Assume 2-bit ALUOp derived from opcode
      Combinational logic derives ALU control

    opcode   ALUOp  Operation         funct   ALU function      ALU control
    lw       00     load word         XXXXXX  add               0010
    sw       00     store word        XXXXXX  add               0010
    beq      01     branch equal      XXXXXX  subtract          0110
    R-type   10     add               100000  add               0010
                    subtract          100010  subtract          0110
                    AND               100100  AND               0000
                    OR                100101  OR                0001
                    set-on-less-than  101010  set-on-less-than  0111
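The ALU control block is pure combinational logic, so it can be sketched as a function of ALUOp and funct. This minimal Python version represents both as bit strings, a representation chosen here for readability rather than taken from the slides, and reproduces the ALU control table row by row.

```python
# funct -> ALU control for R-type instructions (ALUOp = 10), per the table.
R_TYPE_CONTROL = {
    "100000": "0010",  # add
    "100010": "0110",  # subtract
    "100100": "0000",  # AND
    "100101": "0001",  # OR
    "101010": "0111",  # set-on-less-than
}


def alu_control(alu_op: str, funct: str) -> str:
    """Return the 4-bit ALU control for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == "00":  # lw/sw: funct ignored, add for address
        return "0010"
    if alu_op == "01":  # beq: funct ignored, subtract to compare
        return "0110"
    if alu_op == "10":  # R-type: decode funct
        return R_TYPE_CONTROL[funct]
    raise ValueError(f"ALUOp {alu_op!r} not in the table")
```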

    The Main Control Unit

    Control signals derived from instruction

    R-type       0         rs      rt      rd      shamt   funct
                 31:26     25:21   20:16   15:11   10:6    5:0

    Load/Store   35 or 43  rs      rt      address
                 31:26     25:21   20:16   15:0

    Branch       4         rs      rt      address
                 31:26     25:21   20:16   15:0

    opcode: bits 31:26
    rs: always read
    rt: read, except for load
    Destination (rd for R-type, rt for load): write for R-type and load
    address: sign-extend and add
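Field extraction is just shifting and masking. This Python sketch assumes the instruction is given as a 32-bit integer. `decode_fields` is a hypothetical helper that returns every field and leaves it to the control unit to use only those that apply to the format.

```python
def decode_fields(instr: int) -> dict[str, int]:
    """Split a 32-bit MIPS instruction word into the fields shown above."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21
        "rt": (instr >> 16) & 0x1F,      # bits 20:16
        "rd": (instr >> 11) & 0x1F,      # bits 15:11 (R-type)
        "shamt": (instr >> 6) & 0x1F,    # bits 10:6 (R-type)
        "funct": instr & 0x3F,           # bits 5:0 (R-type)
        "address": instr & 0xFFFF,       # bits 15:0 (load/store/branch)
    }
```

For example, `decode_fields(0x012A4020)` (the encoding of `add $t0, $t1, $t2`) yields opcode 0, rs 9, rt 10, rd 8, shamt 0 and funct 32. Funct 32 is binary 100000, the add row of the ALU control table.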

    Datapath With Control

    R-Type Instruction

    Load Instruction

    Branch-on-Equal Instruction

    Implementing Jumps

    Jump         2         address
                 31:26     25:0

    Jump uses word address
    Update PC with concatenation of
      Top 4 bits of old PC
      26-bit jump address
      00
    Need an extra control signal decoded from opcode
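The concatenation can be written with shifts and masks. This minimal sketch assumes 32-bit values, and `jump_target` is a hypothetical name. It takes the top 4 bits from PC + 4, which is how the full MIPS definition states it. That matches "old PC" on the slide except when PC + 4 crosses into a new 256 MB region.

```python
def jump_target(pc: int, instr: int) -> int:
    """New PC for j: top 4 bits of PC + 4, then the 26-bit address field, then 00."""
    addr26 = instr & 0x03FFFFFF       # bits 25:0 (word address)
    top4 = (pc + 4) & 0xF0000000      # upper 4 bits of the sequential PC
    return top4 | (addr26 << 2)       # shifting left 2 appends 00
```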

    Datapath With Jumps Added

    Performance Issues

    Longest delay determines clock period
      Critical path: load instruction
      Instruction memory → register file → ALU → data memory → register file
    Not feasible to vary period for different instructions
    Violates design principle
      Making the common case fast
    We will improve performance by pipelining

    Pipelining Analogy

    Pipelined laundry: overlapping execution
      Parallelism improves performance

    §4.5 An Overview of Pipelining

    Four loads:
      Speedup = 8/3.5 = 2.3
    Non-stop:
      Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
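Both speedup figures come from one formula. This sketch assumes four laundry stages of 0.5 hours each, which is what the 2n (sequential) and 0.5n + 1.5 (pipelined) hour totals imply. `laundry_speedup` is a hypothetical name.

```python
def laundry_speedup(n_loads: int, stages: int = 4, stage_hours: float = 0.5) -> float:
    """Sequential time over pipelined time for n loads of laundry."""
    sequential = n_loads * stages * stage_hours  # 2n hours with the defaults
    # The first load passes through every stage; each later load finishes one stage later.
    pipelined = stages * stage_hours + (n_loads - 1) * stage_hours  # 0.5n + 1.5
    return sequential / pipelined
```

As n grows, the 1.5-hour fill time becomes negligible and the ratio approaches the stage count, 4.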

    MIPS Pipeline

    Five stages, one step per stage
      1. IF: Instruction fetch from memory
      2. ID: Instruction decode & register read
      3. EX: Execute operation or calculate address
      4. MEM: Access memory operand
      5. WB: Write result back to register
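A multi-clock-cycle diagram for these five stages can be generated mechanically: instruction i enters IF in cycle i and advances one stage per cycle. The sketch below assumes no hazards or stalls, and `pipeline_diagram` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instrs: list[str]) -> list[str]:
    """Return one text row per instruction, one 5-character column per cycle."""
    n_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instrs):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i is in stage s during cycle i + s
        rows.append((f"{name:<6}" + "".join(f"{c:<5}" for c in cells)).rstrip())
    return rows
```

Calling `print("\n".join(pipeline_diagram(["lw", "sw", "add"])))` prints a staircase spanning 3 + 5 - 1 = 7 cycles.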

    Pipeline Performance

    Assume time for stages is
      100ps for register read or write
      200ps for other stages
    Compare pipelined datapath with single-cycle datapath

    Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
    lw        200ps        100ps          200ps   200ps          100ps           800ps
    sw        200ps        100ps          200ps   200ps                          700ps
    R-format  200ps        100ps          200ps                  100ps           600ps
    beq       200ps        100ps          200ps                                  500ps

    Pipeline Performance

    Single-cycle (Tc = 800ps)
    Pipelined (Tc = 200ps)
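Both clock periods follow from the stage-time table. In this sketch, where the dictionary layout and names are assumptions for illustration, the single-cycle clock must fit the slowest whole instruction. The pipelined clock must fit the slowest single stage.

```python
# Stage times in ps from the table; None marks a stage the class does not use.
# Order: instr fetch, register read, ALU op, memory access, register write.
STAGE_PS = {
    "lw":       [200, 100, 200, 200, 100],
    "sw":       [200, 100, 200, 200, None],
    "R-format": [200, 100, 200, None, 100],
    "beq":      [200, 100, 200, None, None],
}


def total_time(instr: str) -> int:
    """Sum of the stages an instruction class actually uses."""
    return sum(t for t in STAGE_PS[instr] if t is not None)


# Single-cycle: one clock per instruction, long enough for the slowest (lw).
single_cycle_tc = max(total_time(i) for i in STAGE_PS)

# Pipelined: one clock per stage, long enough for the slowest stage.
pipelined_tc = max(t for times in STAGE_PS.values() for t in times if t is not None)
```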

    Pipeline Speedup

    If all stages are balanced
      i.e., all take the same time
      Time between instructions (pipelined)
        = Time between instructions (nonpipelined) / Number of stages
    If not balanced, speedup is less
    Speedup due to increased throughput
      Latency (time for each instruction) does not decrease
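This is a worked example added here, not part of the original slides. It applies the formula to the 200ps/100ps stage times from the pipeline performance example and shows both caveats: the stages are unbalanced, so the speedup falls short of 5, and the latency of a single lw grows.

```latex
\begin{aligned}
\text{Balanced ideal:}\quad & \frac{800\,\text{ps}}{5} = 160\,\text{ps between instructions} \\
\text{Actual } T_c:\quad & \max(200, 100, 200, 200, 100)\,\text{ps} = 200\,\text{ps} \\
\text{Speedup (time between instructions):}\quad & \frac{800\,\text{ps}}{200\,\text{ps}} = 4 < 5 \\
\text{Latency of lw:}\quad & 5 \times 200\,\text{ps} = 1000\,\text{ps} > 800\,\text{ps}
\end{aligned}
```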

    Pipelining and ISA Design

    MIPS ISA designed for pipelining
      All instructions are 32-bits
        Easier to fetch and decode in one cycle
        cf. x86: 1- to 17-byte instructions
      Few and regular instruction formats
        Can decode and read registers in one step
      Load/store addressing
        Can calculate address in 3rd stage, access memory in 4th stage
      Alignment of memory operands
        Memory access takes only one cycle

    Hazards

    Situations that prevent starting the next instruction in the next cycle
    Structure hazards
      A required resource is busy
    Data hazard
      Need to wait for previous instruction to complete its data read/write
    Control hazard
      Deciding on control action depends on previous instruction

    Structure Hazards

    Conflict for use of a resource
    In MIPS pipeline with a single memory
      Load/store requires data access
      Instruction fetch would have to stall for that cycle
        Would cause a pipeline "bubble"
    Hence, pipelined datapaths require separate instruction/data memories
      Or separate instruction/data caches

    Data Hazards

    An instruction depends on completion of data access by a previous instruction
      add $s0, $t0, $t1
      sub $t2, $s0, $t3

    Forwarding (aka Bypassing)

    Use result when it is computed
      Don't wait for it to be stored in a register
      Requires extra connections in the datapath

    Load-Use Data Hazard

    Can't always avoid stalls by forwarding
      If value not computed when needed
      Can't forward backward in time!

    Code Scheduling to Avoid Stalls

    Reorder code to avoid use of load result in the next instruction
    C code for A = B + E; C = B + F;

    Original order (13 cycles):
      lw  $t1, 0($t0)
      lw  $t2, 4($t0)
      stall
      add $t3, $t1, $t2
      sw  $t3, 12($t0)
      lw  $t4, 8($t0)
      stall
      add $t5, $t1, $t4
      sw  $t5, 16($t0)

    Reordered (11 cycles):
      lw  $t1, 0($t0)
      lw  $t2, 4($t0)
      lw  $t4, 8($t0)
      add $t3, $t1, $t2
      sw  $t3, 12($t0)
      add $t5, $t1, $t4
      sw  $t5, 16($t0)
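A small model reproduces the 13- and 11-cycle totals. This sketch assumes a five-stage pipeline with full forwarding, so the only stall it counts is the one-cycle load-use stall that occurs when an instruction reads the register loaded by the lw immediately before it. It only understands lw, sw and three-register ALU instructions like those above. The helper names are hypothetical.

```python
import re

ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]


def parse(instr: str) -> tuple[str, list[str]]:
    """Split 'op $a, $b, ...' into the opcode and its registers in order."""
    op, _, rest = instr.strip().partition(" ")
    return op, re.findall(r"\$\w+", rest)


def sources(op: str, regs: list[str]) -> list[str]:
    """Registers read: sw reads all of them; lw and ALU ops skip the destination."""
    return regs if op == "sw" else regs[1:]


def count_cycles(program: list[str], stages: int = 5) -> int:
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_regs = parse(prev)
        curr_op, curr_regs = parse(curr)
        # lw rt, offset(base): rt is the loaded register.
        if prev_op == "lw" and prev_regs[0] in sources(curr_op, curr_regs):
            stalls += 1  # load-use: one bubble, then forward
    # Fill time for the first instruction, one cycle per later one, plus bubbles.
    return stages + (len(program) - 1) + stalls
```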

    Control Hazards

    Branch determines flow of control
      Fetching next instruction depends on branch outcome
      Pipeline can't always fetch correct instruction
        Still working on ID stage of branch
    In MIPS pipeline
      Need to compare registers and compute target early in the pipeline
      Add hardware to do it in ID stage

    Stall on Branch

    Wait until branch outcome determined before fetching next instruction

    Branch Prediction

    Longer pipelines can't readily determine branch outcome early
      Stall penalty becomes unacceptable
    Predict outcome of branch
      Only stall if prediction is wrong
    In MIPS pipeline
      Can predict branches not taken
      Fetch instruction after branch, with no delay

    MIPS with Predict Not Taken

    More-Realistic Branch Prediction

    Static branch prediction
      Based on typical branch behavior
      Example: loop and if-statement branches
        Predict backward branches taken
        Predict forward branches not taken
    Dynamic branch prediction
      Hardware measures actual branch behavior
        e.g., record recent history of each branch
      Assume future behavior will continue the trend
        When wrong, stall while re-fetching, and update history
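Both schemes fit in a few lines. The static rule needs only the branch address and its target. For the dynamic case, this sketch assumes the simplest possible history: one remembered outcome per branch address. That is an illustrative choice rather than a description of any particular hardware, and all names are hypothetical.

```python
def static_predict(branch_pc: int, target_pc: int) -> bool:
    """Static rule: backward branches (loops) taken, forward branches not taken."""
    return target_pc < branch_pc


class LastOutcomePredictor:
    """Dynamic sketch: remember the most recent outcome of each branch address."""

    def __init__(self) -> None:
        self.history: dict[int, bool] = {}

    def predict(self, branch_pc: int) -> bool:
        return self.history.get(branch_pc, False)  # unseen branch: predict not taken

    def update(self, branch_pc: int, taken: bool) -> None:
        self.history[branch_pc] = taken  # assume the trend continues


def count_mispredictions(pred: LastOutcomePredictor, branch_pc: int,
                         outcomes: list[bool]) -> int:
    """Count wrong predictions; each would cost a stall while re-fetching."""
    misses = 0
    for taken in outcomes:
        if pred.predict(branch_pc) != taken:
            misses += 1
        pred.update(branch_pc, taken)
    return misses
```

Consider a loop branch that is taken nine times and then not taken. This predictor misses twice per pass over the loop: once on the first iteration and once on exit.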

    Pipeline Summary

    Pipelining improves performance by increasing instruction throughput
      Executes multiple instructions in parallel
      Each instruction has the same latency
    Subject to hazards
      Structure, data, control
    Instruction set design affects complexity of pipeline implementation

    The BIG Picture

    MIPS Pipelined Datapath

    §4.6 Pipelined Datapath and Control

    Right-to-left flow (WB, MEM) leads to hazards

    Pipeline registers

    Need registers between stages
      To hold information produced in previous cycle

    Pipeline Operation

    Cycle-by-cycle flow of instructions through the pipelined datapath
      "Single-clock-cycle" pipeline diagram
        Shows pipeline usage in a single cycle
        Highlight resources used
      cf. "multi-clock-cycle" diagram
        Graph of operation over time
    We'll look at "single-clock-cycle" diagrams for load & store

    IF for Load, Store, …

    ID for Load, Store, …

    EX for Load

    MEM for Load

    WB for Load

    Wrong register number

    Corrected Datapath for Load

    EX for Store

    MEM for Store

    WB for Store

    Multi-Cycle Pipeline Diagram

    Form showing resource usage

    Multi-Cycle Pipeline Diagram

    Traditional form

    Single-Cycle Pipeline Diagram

    State of pipeline in a given cycle

    Pipelined Control (Simplified)

    Pipelined Control

    Control signals derived from instruction
      As in single-cycle implementation

    Pipelined Control

    §4.7 Data Hazards: Forwarding vs. Stalling

    Data Hazards in ALU Instructions

    Consider this sequence:
      sub $2, $1,$3
      and $12,$2,$5
      or  $13,$6,$2
      add $14,$2,$2
      sw  $15,100($2)

    We can resolve hazards with forwarding
      How do we detect when to forward?

    Dependencies & Forwarding

    Detecting the Need to Forward

    Forwarding Paths

    Forwarding Conditions

    EX hazard
      if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
          and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        ForwardA = 10
      if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
          and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        ForwardB = 10
    MEM hazard
      if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
          and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
        ForwardA = 01
      if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
          and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
        ForwardB = 01
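The four conditions map directly onto a small combinational function. In this sketch the field names mirror the pipeline-register notation (for example `ex_mem_rd` for EX/MEM.RegisterRd) but are assumptions, and the mux codes are kept as the strings 10, 01 and 00. The sketch takes one liberty. It tests the EX hazard first, so if both conditions match the same source register, the newer EX/MEM value wins. The conditions as listed do not state that priority.

```python
from dataclasses import dataclass


@dataclass
class ForwardingInputs:
    """Pipeline-register fields read by the forwarding unit."""
    ex_mem_reg_write: bool  # EX/MEM.RegWrite
    ex_mem_rd: int          # EX/MEM.RegisterRd
    mem_wb_reg_write: bool  # MEM/WB.RegWrite
    mem_wb_rd: int          # MEM/WB.RegisterRd
    id_ex_rs: int           # ID/EX.RegisterRs
    id_ex_rt: int           # ID/EX.RegisterRt


def forward_select(p: ForwardingInputs, src: int) -> str:
    """Mux select for one ALU operand: 10 = EX/MEM, 01 = MEM/WB, 00 = register file."""
    if p.ex_mem_reg_write and p.ex_mem_rd != 0 and p.ex_mem_rd == src:
        return "10"  # EX hazard
    if p.mem_wb_reg_write and p.mem_wb_rd != 0 and p.mem_wb_rd == src:
        return "01"  # MEM hazard
    return "00"      # no hazard: use the value read in ID


def forwarding_unit(p: ForwardingInputs) -> tuple[str, str]:
    """Return (ForwardA, ForwardB)."""
    return forward_select(p, p.id_ex_rs), forward_select(p, p.id_ex_rt)
```

Take the sequence from the Data Hazards in ALU Instructions slide. While `and $12,$2,$5` is in EX and `sub $2,$1,$3` sits in EX/MEM, the unit gives ForwardA = 10. One cycle later, `or $13,$6,$2` gets ForwardB = 01 from MEM/WB.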

    Datapath with Forwarding

    Load-Use Data Hazard

    Need to stall for one cycle

    Load-Use Hazard Detection

    Check when using instruction is decoded in ID stage
    ALU operand register numbers in ID stage are given by
      IF/ID.RegisterRs, IF/ID.RegisterRt
    Load-use hazard when
      ID/EX.MemRead and
        ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
         (ID/EX.RegisterRt = IF/ID.RegisterRt))
    If detected, stall and insert bubble

    How to Stall the Pipeline

    Force control values in ID/EX register to 0
      EX, MEM and WB do nop (no-operation)
    Prevent update of PC and IF/ID register
      Using instruction is decoded again
      Following instruction is fetched again
      1-cycle stall allows MEM to read data for lw
        Can subsequently forward to EX stage
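The load-use detection condition and the stall actions can be sketched together as combinational functions. The keys in the returned dictionary (`pc_write`, `if_id_write`, `zero_id_ex_controls`) are hypothetical labels for the stall actions, not signal names taken from the slides.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """ID/EX.MemRead and (ID/EX.RegisterRt = IF/ID.RegisterRs or IF/ID.RegisterRt)."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


def stall_signals(hazard: bool) -> dict[str, bool]:
    """Controls for the next clock edge, depending on whether a hazard was detected."""
    return {
        "pc_write": not hazard,         # hold PC: following instruction fetched again
        "if_id_write": not hazard,      # hold IF/ID: using instruction decoded again
        "zero_id_ex_controls": hazard,  # bubble: EX, MEM and WB do nop
    }
```

For example, suppose an lw that loads $2 is in EX while an instruction reading $2 and $5 is in ID. Then `load_use_hazard(True, 2, 2, 5)` is True, and `stall_signals(True)` freezes the PC and IF/ID for one cycle.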

    Stall/Bubble in the Pipeline

    Stall/Bubble in the Pipeline

    Or, more accurately…

    Datapath with Hazard Detection

    Stalls and Performance

    Stalls reduce performance

      But are required to get correct results

    Compiler can arrange code to avoid hazards and stalls

      Requires knowledge of the pipeline structure

    The BIG Picture
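Counting the stalls makes the benefit of rearranging code concrete. Below is a minimal C sketch that does this. It assumes the 5-stage pipeline with full forwarding, where only a load immediately followed by a reader of its result costs one bubble. The Instr encoding is hypothetical. The sample is an illustrative MIPS sequence for A = B + E; C = B + F, shown once as written and once with the third load moved up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch only: hypothetical instruction summary; -1 means "no register". */
typedef struct {
    bool is_load;
    int  dest, src1, src2;
} Instr;

/* Load-use stalls under full forwarding: one bubble whenever a load is
   immediately followed by an instruction that reads its result. */
int load_use_stalls(const Instr *code, size_t n)
{
    assert(code != NULL || n == 0);
    int stalls = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        const Instr *ld = &code[i], *use = &code[i + 1];
        if (ld->is_load && ld->dest >= 0 &&
            (use->src1 == ld->dest || use->src2 == ld->dest))
            stalls++;
    }
    return stalls;
}

/* 5-stage pipeline: 4 cycles to fill, one per instruction, one per stall. */
int total_cycles(const Instr *code, size_t n)
{
    return (int)n + 4 + load_use_stalls(code, n);
}

enum { T0 = 8, T1, T2, T3, T4, T5 };   /* MIPS numbers of $t0..$t5 */

/* lw $t1,0($t0); lw $t2,4($t0); add $t3,$t1,$t2; sw $t3,12($t0);
   lw $t4,8($t0); add $t5,$t1,$t4; sw $t5,16($t0)   (sw has no dest) */
static const Instr original[] = {
    {true,  T1, T0, -1}, {true,  T2, T0, -1}, {false, T3, T1, T2},
    {false, -1, T0, T3}, {true,  T4, T0, -1}, {false, T5, T1, T4},
    {false, -1, T0, T5},
};

/* Same code with lw $t4 moved up, ahead of the first add. */
static const Instr reordered[] = {
    {true,  T1, T0, -1}, {true,  T2, T0, -1}, {true,  T4, T0, -1},
    {false, T3, T1, T2}, {false, -1, T0, T3}, {false, T5, T1, T4},
    {false, -1, T0, T5},
};
```

Under these assumptions the original order takes 13 cycles with 2 stalls. The reordered version takes 11 cycles with no stalls.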

    §4.8 Control Hazards

    Branch Hazards

    If branch outcome determined in MEM

    [Figure: pipeline diagram showing the PC and the note "Flush these instructions (Set control values to 0)"]
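When the outcome is known only in MEM, a taken branch flushes the three younger instructions already in IF, ID and EX. Below is a minimal C sketch of what that costs in CPI. It assumes predict-not-taken, no other stalls, and branch frequencies supplied by the caller. The Stage enum and both function names are hypothetical. Resolving in ID, the change the next slide proposes, leaves one instruction to flush:

```c
#include <assert.h>

/* Sketch only: pipeline stages in order; the value is the position. */
typedef enum { STAGE_IF, STAGE_ID, STAGE_EX, STAGE_MEM, STAGE_WB } Stage;

/* Younger instructions occupying IF..(resolve - 1) when the branch
   resolves in stage `resolve`; all are flushed if the branch is taken. */
int flushed_on_taken(Stage resolve)
{
    assert(resolve >= STAGE_ID && resolve <= STAGE_MEM);
    return (int)resolve;
}

/* Effective CPI = base CPI + branch fraction * taken fraction * penalty.
   The fractions are assumptions supplied by the caller. */
double effective_cpi(double base_cpi, double branch_fraction,
                     double taken_fraction, Stage resolve)
{
    return base_cpi +
           branch_fraction * taken_fraction * flushed_on_taken(resolve);
}
```

As an illustration, suppose 20% of instructions are branches and all are taken. This gives a CPI of 1.6 when resolving in MEM and 1.2 when resolving in ID. The frequencies are made up.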

    Reducing Branch Delay

    Move hardware to determine outcome to ID stage

      Target address adder

      Register comparator

    Example: branch taken

      36: sub $10, $4, $8
      40: beq $1, $3, 7
      44: and $12, $2, $5
      48: or  $13, $2, $6
      52: add $14, $4, $2
      56: slt $15, $6, $7
          ...
      72: lw  $4, 50($7)
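The target address adder that moves into ID computes PC + 4 plus the sign-extended 16-bit offset times 4. Below is a minimal C sketch (branch_target is a hypothetical name). It is checked against the example's 40: beq $1, $3, 7, whose target is 44 + 7 * 4 = 72:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only. MIPS beq/bne: the 16-bit offset counts words from the
   instruction after the branch, so sign-extend it, scale by 4 (the
   "shift left 2"; multiplying avoids shifting a negative value in C),
   and add it to PC + 4. */
uint32_t branch_target(uint32_t branch_pc, int16_t offset)
{
    assert(branch_pc % 4 == 0);
    int32_t byte_offset = (int32_t)offset * 4;     /* sign-extend, scale */
    return branch_pc + 4u + (uint32_t)byte_offset; /* wraps like hardware */
}
```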

    Example: Branch Taken

    Data Hazards for Branches

    If a comparison register is a destination of 2nd or 3rd preceding ALU instruction

      add $1, $2, $3        IF  ID  EX  MEM WB
      add $4, $5, $6            IF  ID  EX  MEM WB
      …                             IF  ID  EX  MEM WB
      beq $1, $4, target                IF  ID  EX  MEM WB

    Can resolve using forwarding

    Data Hazards for Branches

    If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction

      Need 1 stall cycle

      lw  $1, addr          IF  ID  EX  MEM WB
      add $4, $5, $6            IF  ID  EX  MEM WB
      beq stalled                   IF  ID
      beq $1, $4, target                    ID  EX  MEM WB

    Data Hazards for Branches

    If a comparison register is a destination of immediately preceding load instruction

      Need 2 stall cycles

      lw  $1, addr          IF  ID  EX  MEM WB
      beq stalled               IF  ID
      beq stalled                       ID
      beq $1, $0, target                    ID  EX  MEM WB
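For branches compared in ID, the three cases reduce to a lookup on two things: what produced the compared register, and how many instructions back it is. Below is a minimal C sketch (branch_stalls is a hypothetical name). It assumes forwarding into ID as on these slides. It also assumes that a producer three or more instructions back, load or ALU, never stalls the branch; for a load at that distance, the register file is written in the first half of the cycle and read in the second:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only. Stall cycles for a branch whose comparison is in ID.
   distance = how many instructions before the branch the producer of a
   compared register is (1 = immediately preceding). */
int branch_stalls(bool producer_is_load, int distance)
{
    assert(distance >= 1);
    if (producer_is_load) {
        if (distance == 1) return 2;  /* immediately preceding load */
        if (distance == 2) return 1;  /* 2nd preceding load */
        return 0;
    }
    if (distance == 1) return 1;      /* immediately preceding ALU op */
    return 0;                         /* 2nd/3rd preceding ALU op: forward */
}
```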

    Dynamic Branch Prediction

    In deeper and superscalar pipelines, branch penalty is more significant

    Use dynamic prediction

      Branch prediction buffer (aka branch history table)

      Indexed by recent branch instruction addresses

      Stores outcome (taken/not taken)

      To execute a branch

        Check table, expect the same outcome

        Start fetching from fall-through or target

        If wrong, flush pipeline and flip prediction

    1-Bit Predictor: Shortcoming

    Inner loop branches mispredicted twice!

      outer: …
             …
      inner: …
             …
             beq …, …, inner
             …
             beq …, …, outer

    Mispredict as taken on last iteration of inner loop

    Then mispredict as not taken on first iteration of inner loop next time around
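The two mispredictions per inner loop are easy to reproduce. Below is a minimal C sketch. It assumes the inner-loop beq has its own buffer entry, with no aliasing with the outer branch. It also assumes the entry starts out predicting not taken and the inner loop runs iters times per outer iteration. OneBit and both functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only: hypothetical 1-bit prediction-buffer entry. */
typedef struct { bool predict_taken; } OneBit;

/* Predict, then flip on a miss. Returns true on a misprediction. */
static bool onebit_step(OneBit *p, bool taken)
{
    bool miss = (p->predict_taken != taken);
    if (miss)
        p->predict_taken = taken;
    return miss;
}

/* Inner-loop beq: taken iters-1 times, then not taken once (loop exit);
   the outer loop repeats this outer_iters times. Returns total misses. */
int onebit_inner_misses(int outer_iters, int iters)
{
    assert(outer_iters >= 0 && iters >= 2);
    OneBit p = { false };              /* assumed initial state: not taken */
    int misses = 0;
    for (int o = 0; o < outer_iters; o++)
        for (int i = 0; i < iters; i++)
            misses += onebit_step(&p, i < iters - 1);
    return misses;
}
```

onebit_inner_misses(5, 10) returns 10. Every pass through the inner loop misses on its first iteration and again at the exit.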

    2-Bit Predictor

    Only change prediction on two successive mispredictions
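A 2-bit saturating counter is one common way to change the prediction only after two successive mispredictions. Below is a minimal C sketch. It assumes the inner-loop beq has its own entry and the counter starts at 0 (strongly not taken). It also assumes the inner loop runs iters times per outer iteration. TwoBit and both functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only: hypothetical 2-bit saturating counter. Values 0-1 predict
   not taken, 2-3 predict taken; each outcome moves it one step toward
   that outcome, saturating at 0 and 3. */
typedef struct { unsigned counter; } TwoBit;

static bool twobit_step(TwoBit *p, bool taken)
{
    bool predict_taken = p->counter >= 2;
    if (taken && p->counter < 3)
        p->counter++;
    else if (!taken && p->counter > 0)
        p->counter--;
    return predict_taken != taken;     /* true on a misprediction */
}

/* Inner-loop beq: taken iters-1 times, then not taken once (loop exit);
   the outer loop repeats this outer_iters times. Returns total misses. */
int twobit_inner_misses(int outer_iters, int iters)
{
    assert(outer_iters >= 0 && iters >= 2);
    TwoBit p = { 0 };                  /* assumed: strongly not taken */
    int misses = 0;
    for (int o = 0; o < outer_iters; o++)
        for (int i = 0; i < iters; i++)
            misses += twobit_step(&p, i < iters - 1);
    return misses;
}
```

twobit_inner_misses(5, 10) returns 7. There are three misses while the counter warms up on the first pass, then only the loop exit misses on each later pass.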

    Calculating the Branch Target

    Even with predictor, still need to calculate the target address

      1-cycle penalty for a taken branch

    Branch target buffer

      Cache of target addresses

      Indexed by PC when instruction fetched

        If hit and instruction is branch predicted taken, can fetch target immediately
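A branch target buffer is a small cache probed with the fetch PC. Below is a minimal C sketch of a direct-mapped one. It assumes 16 entries and that only branches predicted taken are inserted, so a hit means "fetch the stored target next". The table layout, btb_update and btb_next_fetch are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 16u              /* hypothetical size, power of two */

/* Sketch only: one direct-mapped entry. */
typedef struct {
    bool     valid;
    uint32_t tag;                    /* upper PC bits: which branch this is */
    uint32_t target;                 /* where that branch went */
} BtbEntry;

static BtbEntry btb[BTB_ENTRIES];    /* zero-initialized: all invalid */

static uint32_t btb_index(uint32_t pc) { return (pc >> 2) % BTB_ENTRIES; }
static uint32_t btb_tag(uint32_t pc)   { return (pc >> 2) / BTB_ENTRIES; }

/* Record a taken branch and its target; returns the entry index used. */
uint32_t btb_update(uint32_t pc, uint32_t target)
{
    assert(pc % 4 == 0 && target % 4 == 0);
    BtbEntry *e = &btb[btb_index(pc)];
    e->valid  = true;
    e->tag    = btb_tag(pc);
    e->target = target;
    return btb_index(pc);
}

/* Done during IF: on a hit fetch the stored target next cycle,
   otherwise fall through to PC + 4. */
uint32_t btb_next_fetch(uint32_t pc)
{
    const BtbEntry *e = &btb[btb_index(pc)];
    if (e->valid && e->tag == btb_tag(pc))
        return e->target;
    return pc + 4;
}
```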

    §4.9 Exceptions

    Exceptions and Interrupts

    "Unexpected" events requiring change in flow of control

      Different ISAs use the terms differently

    Exception

      Arises within the CPU, e.g., undefined opcode, overflow, syscall, …

    Interrupt

      From an external I/O controller

    Dealing with them without sacrificing performance is hard

    Handling Exceptions

    In MIPS, exceptions managed by a System Control Coprocessor (CP0)

    Save PC of offending (or interrupted) instruction

      In MIPS: Exception Program Counter (EPC)

    Save indication of the problem

      In MIPS: Cause register

      We'll assume 1-bit

        0 for undefined opcode, 1 for overflow

    Jump to handler at 8000 0180 (hex)

    An Alternate Mechanism

    Vectored Interrupts

      Handler address determined by the cause

    Example:

      Undefined opcode: C000 0000

      Overflow: C000 0020

      …: C000 0040

    Instructions either

      Deal with the interrupt, or

      Jump to real handler
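The two mechanisms differ only in how the new PC is formed. Below is a minimal C sketch. It assumes the slides' 1-bit Cause convention (0 = undefined opcode, 1 = overflow). It also assumes vectors spaced 0x20 bytes apart, matching the example addresses C000 0000 and C000 0020. Real MIPS Cause encodings differ, and handler_address is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: the slides' 1-bit cause code, not real MIPS ExcCode values. */
enum { CAUSE_UNDEFINED_OPCODE = 0, CAUSE_OVERFLOW = 1 };

#define SINGLE_HANDLER  0x80000180u  /* non-vectored MIPS entry point */
#define VECTOR_BASE     0xC0000000u  /* from the vectored example */
#define VECTOR_SPACING  0x20u        /* 32 bytes = 8 instructions per vector */

/* New PC on an exception. Non-vectored: one handler that must read
   Cause itself. Vectored: the cause selects the handler address. */
uint32_t handler_address(bool vectored, unsigned cause)
{
    assert(cause <= CAUSE_OVERFLOW);
    if (!vectored)
        return SINGLE_HANDLER;
    return VECTOR_BASE + cause * VECTOR_SPACING;
}
```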

    Handler Actions

    Read cause, and transfer to relevant handler

    Determine action required

    If restartable

      Take corrective action

      Use EPC to return to program

    Otherwise

      Terminate program

      Report error using EPC, cause, …

    Exceptions in a Pipeline

    Another form of control hazard

    Consider overflow on add in EX stage

      add $1, $2, $1

      Prevent $1 from being clobbered

      Complete previous instructions

      Flush add and subsequent instructions

      Set Cause and EPC register values

      Transfer control to handler

    Similar to mispredicted branch

      Use much of the same hardware

    Pipeline with Exceptions

    Exception Properties

    Restartable exceptions

      Pipeline can flush the instruction

      Handler executes, then returns to the instruction

        Refetched and executed from scratch

    PC saved in EPC register

      Identifies causing instruction

      Actually PC + 4 is saved

        Handler must adjust

    Exception Example

    Exception on add in

      40   sub $11, $2, $4
      44   and $12, $2, $5
      48   or  $13, $2, $6
      4C   add $1,  $2, $1
      50   slt $15, $6, $7
      54   lw  $16, 50($7)

    Handler

      80000180   sw $25, 1000($0)
      80000184   sw $26, 1004($0)
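In this example the overflow in the add at 4C leaves EPC = 50 (PC + 4) and sets Cause to overflow. The PC then goes to the handler at 8000 0180, and the handler subtracts 4 from EPC to find the add. Below is a minimal C sketch of that bookkeeping, assuming the slides' 1-bit Cause value 1 for overflow. Cp0State, take_exception and causing_instruction are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define CAUSE_OVERFLOW 1u              /* slides' 1-bit Cause convention */
#define HANDLER_PC     0x80000180u

/* Sketch only: hypothetical slice of CP0 state plus the PC. */
typedef struct {
    uint32_t epc;
    uint32_t cause;
    uint32_t pc;
} Cp0State;

/* Exception entry: save PC + 4 (what is actually saved), record the
   cause, and redirect fetch to the handler. */
Cp0State take_exception(uint32_t faulting_pc, uint32_t cause)
{
    assert(faulting_pc % 4 == 0);
    Cp0State s = { faulting_pc + 4, cause, HANDLER_PC };
    return s;
}

/* The adjustment the handler makes to identify the causing instruction. */
uint32_t causing_instruction(Cp0State s)
{
    return s.epc - 4;
}
```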

    Exception Example

    Multiple Exceptions

    Pipelining overlaps multiple instructions

      Could have multiple exceptions at once

    Simple approach: deal with exception from earliest instruction

      Flush subsequent instructions

      "Precise" exceptions

    In complex pipelines

      Multiple instructions issued per cycle

      Out-of-order completion

      Maintaining precise exceptions is difficult!
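Precise handling needs only program order. Of the instructions that raise exceptions in the same cycle, the oldest is serviced, and it and everything younger are flushed. Below is a minimal C sketch, assuming each in-flight instruction carries a program-order sequence number. The InFlight type, the sample pipeline contents and both functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch only: seq is program order (smaller is older); has_exception
   marks an instruction that raised an exception this cycle. */
typedef struct {
    unsigned seq;
    bool     has_exception;
} InFlight;

/* Sample: seq 14 in IF ... seq 10 in WB; seq 13 has an undefined opcode
   (detected in ID) in the same cycle that seq 12 overflows (in EX). */
static const InFlight sample[] = {
    {14, false}, {13, true}, {12, true}, {11, false}, {10, false},
};

/* Seq of the oldest excepting instruction, or -1 if there is none. */
long oldest_exception(const InFlight *insts, size_t n)
{
    assert(insts != NULL || n == 0);
    long oldest = -1;
    for (size_t i = 0; i < n; i++)
        if (insts[i].has_exception &&
            (oldest < 0 || (long)insts[i].seq < oldest))
            oldest = (long)insts[i].seq;
    return oldest;
}

/* Precise: the excepting instruction and all younger ones are flushed;
   older ones complete normally. */
bool must_flush(unsigned seq, long oldest)
{
    return oldest >= 0 && (long)seq >= oldest;
}
```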

    Imprecise Exceptions

    Just stop pipeline and save state

      Including exception cause(s)

    Let the handler work out

      Which instruction(s) had exceptions

      Which to complete or flush

        May require "manual" completion

    Simplifies hardware, but more complex handler software

    Not feasible for complex multiple-issue out-of-order pipelines

    §4.10 Parallelism via Instructions

    Instruction-Level Parallelism (ILP)

    Pipelining: executing multiple instructions in parallel

    To increase ILP

      Deeper pipeline

        Less work per stage ⇒ shorter clock cycle

      Multiple issue

        Replicate pipeline stages ⇒ multiple pipelines

        Start multiple instructions per clock cycle

        CPI < 1, so use Instructions Per Cycle (IPC)

        E.g., 4 GHz 4-way multiple-issue

          16 BIPS, peak CPI = 0.25, peak IPC = 4

        But dependencies reduce this in practice
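The peak figures in the example follow from the clock rate and the issue width alone:

```latex
\begin{aligned}
\text{peak rate} &= 4\ \text{GHz} \times 4\ \frac{\text{instructions}}{\text{cycle}}
  = 16 \times 10^{9}\ \frac{\text{instructions}}{\text{s}} = 16\ \text{BIPS} \\
\text{peak IPC} &= 4, \qquad
\text{peak CPI} = \frac{1}{\text{peak IPC}} = \frac{1}{4} = 0.25
\end{aligned}
```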


    Multiple Issue

    Static multiple issue

      Compiler groups instructions to be issued together

      Packages them into "issue slots"

      Compiler detects and avoids hazards

    Dynamic multiple issue

      CPU examines instruction stream and chooses instructions to issue each cycle

      Compiler can help by reordering instructions

      CPU resolves hazards using advanced techniques at runtime

    Speculation

    "Guess" what to do with an instruction

      Start operation as soon as possible

      Check whether guess was right

        If so, complete the operation

        If not, roll back and do the right thing

    Common to static and dynamic multiple issue

    Examples

      Speculate on branch outcome

        Roll back if path taken is different

      Speculate on load

        Roll back if location is updated

    Compiler/Hardware Speculation

    Compiler can reorder instructions

      e.g., move load before branch

      Can include "fix-up" instructions to recover from incorrect guess

    Hardware can look ahead for instructions to execute

      Buffer results until it determines they are actually needed

      Flush buffers on incorrect speculation

    Speculation and Exceptions

    What if exception occurs on a speculatively executed instruction?

      e.g., speculative load before null-pointer check

    Static speculation

      Can add ISA support for deferring exceptions

    Dynamic speculation

      Can buffer exceptions until instruction completion (which may not occur)

    Static Multiple Issue

    Compiler groups instructions into "issue packets"

      Group of instructions that can be issued on a single cycle

      Determined by pipeline resources required

    Think of an issue packet as a very long instruction

      Specifies multiple concurrent operations

      ⇒ Very Long Instruction Word (VLIW)

    Scheduling Static Multiple Issue

    Compiler must remove some/all hazards

      Reorder instructions into issue packets

      No dependencies within a packet

      Possibly some dependencies between packets

        Varies between ISAs; compiler must know!

      Pad with nop if necessary

    MIPS with Static Dual Issue

    Two-issue packets

      One ALU/branch instruction

      One load/store instruction

      64-bit aligned

        ALU/branch, then load/store

        Pad an unused instruction with nop

      Address   Instruction type   Pipeline stages
      n         ALU/branch         IF  ID  EX  MEM WB
      n + 4     Load/store         IF  ID  EX  MEM WB
      n + 8     ALU/branch             IF  ID  EX  MEM WB
      n + 12    Load/store             IF  ID  EX  MEM WB
      n + 16    ALU/branch                 IF  ID  EX  MEM WB
      n + 20    Load/store                 IF  ID  EX  MEM WB
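Whether two instructions can share a packet comes down to two checks: the slot types, and the rule of no dependencies within a packet. Below is a minimal C sketch. It uses a hypothetical Op summary (register numbers, -1 for none). It checks only the read-after-write case, where the load/store would need the ALU result from its own packet. SlotKind, Op and can_pair are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ALU_OR_BRANCH, LOAD_OR_STORE } SlotKind;

/* Sketch only: -1 means "no register". For a store, src1 is the base
   register and src2 the data register. */
typedef struct {
    SlotKind kind;
    int dest, src1, src2;
} Op;

/* first goes in the ALU/branch slot, second in the load/store slot. */
bool can_pair(Op first, Op second)
{
    assert(first.dest >= -1 && first.dest < 32);
    if (first.kind != ALU_OR_BRANCH || second.kind != LOAD_OR_STORE)
        return false;                       /* wrong slot types */
    if (first.dest < 0)
        return true;                        /* e.g., a branch: no result */
    /* No forwarding inside a packet: the load/store may not read the
       ALU result (as base address or as store data). */
    return second.src1 != first.dest && second.src2 != first.dest;
}
```

Using MIPS numbers $t0 = 8, $s0 = 16, $s1 = 17 and $s2 = 18: add $t0, $s0, $s1 cannot pair with lw $s2, 0($t0), but it can pair with lw $s2, 0($s1).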

    Hazards in the Dual-Issue MIPS

    More instructions executing in parallel

    EX data hazard

      Forwarding avoided stalls with single-issue

      Now can't use ALU result in load/store in same packet

        add  $t0, $s0, $s1
        load $s2, 0($t0)

        Split into two packets, effectively a stall

    Load-use hazard

      Still one cycle use latency, but now two instructions

    More aggressive scheduling required

    Scheduling Example

    Schedule this for dual-issue MIPS

      Loop: lw   $t0, 0($s1)       # $t0=array element
            addu $t0, $t0, $s2     # add scalar in $s2
            sw   $t0, 0($s1)       # store result
            addi $s1, $s1, -4      # decrement pointer
            bne  $s1, $zero, Loop  # branch $s1!=0

            ALU/branch               Load/store          cycle
      Loop: nop                      lw   $t0, 0($s1)    1
            addi $s1, $s1, -4        nop                 2
            addu $t0, $t0, $s2       nop                 3
            bne  $s1, $zero, Loop    sw   $t0, 4($s1)    4

    IPC = 5/4 = 1.25 (c.f. peak IPC = 2)
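The IPC figure is useful (non-nop) instructions divided by cycles. Below is a minimal C sketch that recomputes it from the schedule, stored as one {ALU/branch, load/store} pair of strings per cycle. The table and the ipc function are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch only: the schedule above, one row per cycle,
   {ALU/branch slot, load/store slot}. */
static const char *schedule[][2] = {
    {"nop",                   "lw   $t0, 0($s1)"},
    {"addi $s1, $s1, -4",     "nop"},
    {"addu $t0, $t0, $s2",    "nop"},
    {"bne  $s1, $zero, Loop", "sw   $t0, 4($s1)"},
};

/* Useful (non-nop) instructions issued per cycle. */
double ipc(const char *sched[][2], size_t cycles)
{
    assert(cycles > 0);
    size_t useful = 0;
    for (size_t c = 0; c < cycles; c++)
        for (size_t slot = 0; slot < 2; slot++)
            if (strcmp(sched[c][slot], "nop") != 0)
                useful++;
    return (double)useful / (double)cycles;
}
```

ipc(schedule, 4) evaluates to 1.25, matching 5/4.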

    Loop Unrolling

    Replicate loop body to expose more parallelism

      Reduces loop-control overhead

    Use different registers per replication

      Called "register renaming"

      Avoid loop-carried "anti-dependencies"

        Store followed by a load of the same register

        Aka "name dependence"

          Reuse of a register name
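At source level, the scheduling example's loop can be unrolled four times with a separate temporary per copy, as in the minimal C sketch below. The function names are hypothetical, and n is assumed to be a multiple of 4. The temporaries t0..t3 only illustrate the renaming idea; the compiler still chooses the actual registers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch only. Rolled: like the MIPS loop, walk from the end down. */
void add_scalar(int *a, size_t n, int s)
{
    for (size_t i = n; i > 0; i--)
        a[i - 1] += s;
}

/* Unrolled 4x: one loop-control update per four elements, and each copy
   uses its own temporary, so no copy reuses another copy's name and the
   four load/add/store chains are independent. */
void add_scalar_unrolled4(int *a, size_t n, int s)
{
    assert(n % 4 == 0);
    for (size_t i = n; i >= 4; i -= 4) {
        int t0 = a[i - 1], t1 = a[i - 2], t2 = a[i - 3], t3 = a[i - 4];
        t0 += s; t1 += s; t2 += s; t3 += s;
        a[i - 1] = t0; a[i - 2] = t1; a[i - 3] = t2; a[i - 4] = t3;
    }
}

/* Self-check: both versions agree on the array 0, 1, ..., 15. */
bool unrolled_matches_rolled(int s)
{
    int a[16], b[16];
    for (int i = 0; i < 16; i++)
        a[i] = b[i] = i;
    add_scalar(a, 16, s);
    add_scalar_unrolled4(b, 16, s);
    return memcmp(a, b, sizeof a) == 0 && b[15] == 15 + s;
}
```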

    Dynamic Multiple Issue

    "Superscalar" processors

    CPU decides whether to issue 0, 1, 2, … each cycle

      Avoiding structural and data hazards

    Avoids the need for compiler scheduling

      Though it may still help

      Code semantics ensured by the CPU

    Dynamic Pipeline Scheduling

    Allow the CPU to execute instructions out of order to avoid stalls

      But commit result to registers in order

    Example

      lw   $t0, 20($s2)
      addu $t1, $t0, $t2
      sub  $s4, $s4, $t3
      slti $t5, $s4, 20

      Can start sub while addu is waiting for lw
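In dataflow terms, an instruction can start once its source registers are ready, which is why sub need not wait. Below is a minimal C sketch. It assumes unlimited function units, a 10-cycle lw (say, a cache miss) and 1-cycle ALU operations. It tracks only true (read-after-write) dependences. The latencies, the Instr encoding and the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: register numbers, -1 = none. */
typedef struct {
    int dest, src1, src2;
    int latency;                   /* cycles until the result is ready */
} Instr;

/* The example, with assumed latencies. */
static const Instr example[] = {
    { 8, 18, -1, 10},              /* lw   $t0, 20($s2)  (assumed miss) */
    { 9,  8, 10,  1},              /* addu $t1, $t0, $t2 */
    {20, 20, 11,  1},              /* sub  $s4, $s4, $t3 */
    {13, 20, -1,  1},              /* slti $t5, $s4, 20  */
};

/* Earliest cycle each instruction can start if it waits only for its
   operands (registers not written earlier are ready at cycle 0). */
void earliest_start(const Instr *code, size_t n, int *start)
{
    int ready[32] = {0};
    for (size_t i = 0; i < n; i++) {
        int s = 0;
        if (code[i].src1 >= 0 && ready[code[i].src1] > s) s = ready[code[i].src1];
        if (code[i].src2 >= 0 && ready[code[i].src2] > s) s = ready[code[i].src2];
        start[i] = s;
        if (code[i].dest >= 0)
            ready[code[i].dest] = s + code[i].latency;
    }
}

/* Convenience wrapper for the example above. */
int example_start(size_t which)
{
    int start[4];
    assert(which < 4);
    earliest_start(example, 4, start);
    return start[which];
}
```

Under these assumptions addu starts at cycle 10, while sub starts at cycle 0 and slti at cycle 1.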

    Dynamically Scheduled CPU

    [Figure: dynamically scheduled CPU, annotated "Preserves dependencies", "Hold pending operands" (reservation stations), "Results also sent to any waiting reservation stations", "Reorder buffer for register writes", and "Can supply operands for issued instructions"]
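The reorder buffer lets results arrive in any order while committing them in program order. Below is a minimal C sketch, assuming an 8-entry circular buffer that tracks only a done flag per entry; a real one also holds result values and destination registers. Rob and the rob_* functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define ROB_SIZE 8                   /* hypothetical capacity */

/* Sketch only. */
typedef struct {
    bool done[ROB_SIZE];             /* result produced? */
    int  head, tail, count;          /* oldest entry, next free, occupancy */
} Rob;

static Rob rob;                      /* zero-initialized: empty */

/* Issue (in order): allocate the next entry; -1 if the buffer is full. */
int rob_issue(void)
{
    if (rob.count == ROB_SIZE)
        return -1;
    int e = rob.tail;
    rob.done[e] = false;
    rob.tail = (rob.tail + 1) % ROB_SIZE;
    rob.count++;
    return e;
}

/* Execute (any order): a function unit finished entry e. */
bool rob_complete(int e)
{
    assert(e >= 0 && e < ROB_SIZE);
    bool in_flight = (e - rob.head + ROB_SIZE) % ROB_SIZE < rob.count;
    if (in_flight)
        rob.done[e] = true;
    return in_flight;
}

/* Commit (in order): retire finished entries from the head only;
   returns how many were written to the register file this cycle. */
int rob_commit(void)
{
    int n = 0;
    while (rob.count > 0 && rob.done[rob.head]) {
        rob.head = (rob.head + 1) % ROB_SIZE;
        rob.count--;
        n++;
    }
    return n;
}
```

As an example, issue lw, addu and sub, and let sub finish first. rob_commit() retires nothing until lw is done, and after that it retires entries strictly from the head.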

    Speculation

    Predict branch and continue issuing

      Don't commit until branch outcome determined

    Load speculation

      Avoid load and cache miss delay

        Predict the effective address

        Predict loaded value

        Load before completing outstanding stores

        Bypass stored values to load unit

      Don't commit load until speculation cleared

    Why Do Dynamic Scheduling?

    Why not just let the compiler schedule code?

    Not all stalls are predictable

      e.g., cache misses

    Can't always schedule around branches

      Branch outcome is dynamically determined

    Different implementations of an ISA have different latencies and hazards

    Power Efficiency

    Complexity of dynamic scheduling and speculations requires power

    Multiple simpler cores may be better

      Microprocessor   Year   Clock rate   Pipeline stages   Issue width   Out-of-order/Speculation   Cores   Power
      i486             1989   25 MHz       5                 1             No                         1       5 W
      Pentium          1993   66 MHz       5                 2             No                         1       10 W
      Pentium Pro      1997   200 MHz      10                3             Yes                        1       29 W
      P4 Willamette    2001   2000 MHz     22                3             Yes                        1       75 W
      P4 Prescott      2004   3600 MHz     31                3             Yes                        1       103 W
      Core             2006   2930 MHz     14                4             Yes                        2       75 W
      UltraSparc III   2003   1950 MHz     14                4             No                         1       90 W
      UltraSparc T1    2005   1200 MHz     6                 1             No                         8       70 W

    §4.11 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Pipelines

    Cortex A8 and Intel i7

      Processor                       ARM A8                   Intel Core i7 920
      Market                          Personal Mobile Device   Server, cloud
      Thermal design power            2 Watts                  130 Watts
      Clock rate                      1 GHz                    2.66 GHz
      Cores/Chip                      1                        4
      Floating point?                 No                       Yes
      Multiple issue?                 Dynamic                  Dynamic
      Peak instructions/clock cycle   2                        4
      Pipeline stages                 14                       14
      Pipeline schedule               Static in-order          Dynamic out-of-order with speculation
      Branch prediction               2-level                  2-level
      1st level caches/core           32 KiB I, 32 KiB D       32 KiB I, 32 KiB D
      2nd level caches/core           128-1024 KiB             256 KiB
      3rd level caches (shared)       -                        2-8 MB

    ARM Cortex-A8 Performance

    Core i7 Pipeline

    Core i7 Performance

    §4.12 Instruction-Level Parallelism and Matrix Multiply

    Matrix Multiply

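    Unrolled C code

    #include <x86intrin.h>
    #define UNROLL (4)

    /* Assumes column-major n x n matrices, n a multiple of UNROLL*4 (16),
       32-byte-aligned arrays, and AVX support (e.g., gcc -mavx). */
    void dgemm (int n, double* A, double* B, double* C)
    {
      for ( int i = 0; i < n; i+=UNROLL*4 )
        for ( int j = 0; j < n; j++ ) {
          __m256d c[4];  /* UNROLL accumulators: 16 elements of column j of C */
          for ( int x = 0; x < UNROLL; x++ )
            c[x] = _mm256_load_pd(C+i+x*4+j*n);

          for( int k = 0; k < n; k++ )
          {
            __m256d b = _mm256_broadcast_sd(B+k+j*n);  /* 4 copies of B(k,j) */
            for (int x = 0; x < UNROLL; x++)
              c[x] = _mm256_add_pd(c[x],
                     _mm256_mul_pd(_mm256_load_pd(A+n*k+x*4+i), b));
          }

          for ( int x = 0; x < UNROLL; x++ )
            _mm256_store_pd(C+i+x*4+j*n, c[x]);
        }
    }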

    Matrix Multiply

    Assembly code:

    vmovapd (%r11),%ymm4              # Load 4 elements of C into %ymm4
    mov %rbx,%rax                     # register %rax = %rbx
    xor %ecx,%ecx                     # register %ecx = 0
    vmovapd 0x20(%r11),%ymm3          # Load 4 elements of C into %ymm3
    vmovapd 0x40(%r11),%ymm2          # Load 4 elements of C into %ymm2
    vmovapd 0x60(%r11),%ymm1          # Load 4 elements of C into %ymm1
    vbroadcastsd (%rcx,%r9,1),%ymm0   # Make 4 copies of B element
    add $0x8,%rcx                     # register %rcx = %rcx + 8
    vmulpd (%rax),%ymm0,%ymm5         # Parallel mul %ymm0, 4 A elements
    vaddpd %ymm5,%ymm4,%ymm4          # Parallel add %ymm5, %ymm4
    vmulpd 0x20(%rax),%ymm0,%ymm5     # Parallel mul %ymm0, 4 A elements
    vaddpd %ymm5,%ymm3,%ymm3          # Parallel add %ymm5, %ymm3
    vmulpd 0x40(%rax),%ymm0,%ymm5     # Parallel mul %ymm0, 4 A elements
    vmulpd 0x60(%rax),%ymm0,%ymm0     # Parallel mul %ymm0, 4 A elements
    add %r8,%rax                      # register %rax = %rax + %r8
    cmp %r10,%rcx                     # compare %rcx to %r10
    vaddpd %ymm5,%ymm2,%ymm2          # Parallel add %ymm5, %ymm2
    vaddpd %ymm0,%ymm1,%ymm1          # Parallel add %ymm0, %ymm1
    jne 68                            # jump if %rcx != %r10
    add $0x1,%esi                     # register %esi = %esi + 1
    vmovapd %ymm4,(%r11)              # Store %ymm4 into 4 C elements
    vmovapd %ymm3,0x20(%r11)          # Store %ymm3 into 4 C elements
    vmovapd %ymm2,0x40(%r11)          # Store %ymm2 into 4 C elements
    vmovapd %ymm1,0x60(%r11)          # Store %ymm1 into 4 C elements

    Performance Impact

    Pitfalls

    Poor ISA design can make pipelining harder

      e.g., complex instruction sets (VAX, IA-32)

        Significant overhead to make pipelining work

        IA-32 micro-op approach

      e.g., complex addressing modes

        Register update side effects, memory indirection

      e.g., delayed branches

        Advanced pipelines have long delay slots
