ZJ Hyperthreading

  • 7/24/2019 ZJ Hyperthreading

    Hyper-Threading, Chip Multiprocessors and Both

    Zoran Jovanovic

    To Be Tackled in Multithreading

    Review of Threading Algorithms

    Hyper-Threading Concepts

    Hyper-Threading Architecture

    Advantages/Disadvantages

    Threading Algorithms

    Time-slicing
        A processor switches between threads in fixed time intervals.
        High expenses, especially if one of the processes is in the wait state.
        Fine grain.

    Switch-on-event
        Task switching in case of long pauses.
        While a thread waits for data coming from a relatively slow source, CPU resources are given to other processes.
        Coarse grain.
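The difference between the two policies can be sketched with a toy simulation. This is only an illustration under stated assumptions: the `Thread` model, the burst and wait lengths, and the quantum are all made up, and real hardware or OS schedulers are far more involved. A thread alternates between short compute bursts and long waits; time-slicing gives each thread a fixed interval even while it waits, whereas switch-on-event hands the processor over only when the running thread starts a long wait.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    work: int            # compute cycles still to execute (assumed workload)
    burst: int           # compute cycles between two long waits
    wait: int            # length of each long wait, in cycles
    ready_at: int = 0    # first cycle at which the thread may run again
    since_wait: int = 0  # compute cycles executed since the last wait


def runnable(t, now):
    return t.work > 0 and t.ready_at <= now


def step(t, now):
    """Execute one cycle of t; begin a long wait when its burst is used up."""
    t.work -= 1
    t.since_wait += 1
    if t.since_wait == t.burst and t.work > 0:
        t.since_wait = 0
        t.ready_at = now + 1 + t.wait


def time_slicing(threads, quantum):
    """Fixed intervals: the slice owner keeps the processor even while waiting."""
    now = busy = 0
    while any(t.work > 0 for t in threads):
        owner = threads[(now // quantum) % len(threads)]
        if runnable(owner, now):
            step(owner, now)
            busy += 1
        now += 1
    return busy / now  # fraction of cycles doing useful work


def switch_on_event(threads):
    """Stay on one thread; switch only when it cannot run (long wait or done)."""
    now = busy = current = 0
    while any(t.work > 0 for t in threads):
        if not runnable(threads[current], now):
            ready = [i for i, t in enumerate(threads) if runnable(t, now)]
            if not ready:
                now += 1  # every thread is waiting: an idle cycle
                continue
            current = ready[0]
        step(threads[current], now)
        busy += 1
        now += 1
    return busy / now


def make_threads():
    # Hypothetical workload: two identical threads, 4-cycle bursts, 6-cycle waits.
    return [Thread(work=40, burst=4, wait=6) for _ in range(2)]


if __name__ == "__main__":
    print("time-slicing utilization:   ", time_slicing(make_threads(), quantum=4))
    print("switch-on-event utilization:", switch_on_event(make_threads()))
```

With this particular workload, switch-on-event keeps the processor busier because slices owned by a waiting thread are not wasted; other workloads could shift the balance.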

    Threading Algorithms (cont.)

    Multiprocessing
        Distribute the load over many processors.
        Adds extra cost.

    Simultaneous multi-threading
        Multiple threads execute on a single processor without switching.
        Basis of Intel's Hyper-Threading technology.

    Hyper-Threading Concept

    At each point of time only a part of processor resources is used for execution of the program code of a thread.

    Unused resources can also be loaded, for example, with parallel execution of another thread/application.

    Extremely useful in desktop and server applications where many threads are used.

    Quick Recall: Many Resources IDLE!

    From: Tullsen, Eggers, and Levy, "Simultaneous Multithreading: Maximizing On-chip Parallelism", ISCA 1995.

    For an 8-way superscalar.

    Slide source: John Kubiatowicz

    (a) A superscalar processor with no multithreading

    (b) A superscalar processor with coarse-grain multithreading

    (c) A superscalar processor with fine-grain multithreading

    (d) A superscalar processor with simultaneous multithreading (SMT)
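The four cases (a) to (d) can be compared with a small issue-slot count. This is a rough sketch, not a processor model: the issue width `WIDTH`, the per-cycle trace `READY`, and the zero-cost thread switch are all assumptions made for illustration. Each entry of the trace says how many instructions a thread could issue in that cycle, independent of scheduling.

```python
WIDTH = 4  # issue slots per cycle (assumed)

# READY[t][c]: instructions thread t could issue in cycle c (made-up trace;
# a 0 stands for a stall such as a cache miss).
READY = [
    [3, 2, 0, 0, 1, 3, 0, 2],
    [1, 2, 2, 1, 0, 1, 3, 2],
]


def single_thread(ready):
    """(a) No multithreading: only thread 0 ever issues."""
    return sum(min(WIDTH, n) for n in ready[0])


def coarse_grain(ready):
    """(b) Stay on one thread; switch only when it stalls (switch cost ignored)."""
    current, used = 0, 0
    for cycle in range(len(ready[0])):
        if ready[current][cycle] == 0:
            for offset in range(1, len(ready)):
                cand = (current + offset) % len(ready)
                if ready[cand][cycle] > 0:
                    current = cand
                    break
        used += min(WIDTH, ready[current][cycle])
    return used


def fine_grain(ready):
    """(c) Rotate threads every cycle, skipping a thread that is stalled."""
    n, used = len(ready), 0
    for cycle in range(len(ready[0])):
        order = [(cycle + k) % n for k in range(n)]
        pick = next((t for t in order if ready[t][cycle] > 0), order[0])
        used += min(WIDTH, ready[pick][cycle])
    return used


def smt(ready):
    """(d) SMT: slots in the same cycle may be filled from several threads."""
    return sum(min(WIDTH, sum(column)) for column in zip(*ready))


if __name__ == "__main__":
    total = WIDTH * len(READY[0])
    for name, fn in [("none", single_thread), ("coarse", coarse_grain),
                     ("fine", fine_grain), ("SMT", smt)]:
        print(f"{name:>6}: {fn(READY)} of {total} issue slots used")
```

In (b) and (c) only one thread issues per cycle, so slots left empty by limited ILP within that thread stay empty; (d) is the only scheme in this sketch that fills them from another thread.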

    Simultaneous Multithreading (SMT)

    Example: new Pentium with "Hyperthreading"

    Key Idea: Exploit ILP across multiple threads!
        i.e., convert thread-level parallelism into more ILP
        exploit following features of modern processors:
            multiple functional units
                modern processors typically have more functional units available than a single thread can utilize
            register renaming and dynamic scheduling
                multiple instructions from independent threads can co-exist and co-execute!

    Hyper-Threading Architecture

    First used in Intel Xeon MP processor.

    Makes a single physical processor appear as multiple logical processors.

    Each logical processor has a copy of architecture state.

    Logical processors share a single set of physical execution resources.
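From software, the logical processors are what the operating system reports. As a small illustration (the helper name `logical_processor_count` is made up here), Python's standard `os.cpu_count()` returns the number of logical CPUs, so on a machine with Hyper-Threading enabled it is typically larger than the number of physical cores. The standard library does not report the physical core count, so this sketch does not try to.

```python
import os


def logical_processor_count():
    """Number of logical CPUs the OS exposes (falls back to 1 if unknown)."""
    count = os.cpu_count()  # may return None when it cannot be determined
    return count if count is not None else 1


if __name__ == "__main__":
    print("logical processors:", logical_processor_count())
```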

    Power 5 dataflow ...

    Why only two threads?
        With 4, one of the shared resources (physical registers, cache, memory bandwidth) would be prone to bottleneck

    Cost:
        The Power5 core is about 24% larger than the Power4 core because of the addition of SMT support

    Advantages

    Extra architecture only adds about 5% to the total die area.

    No performance loss if only one thread is active.

    Increased performance with multiple threads.

    Better resource utilization.

    Disadvantages

    To take advantage of hyper-threading performance, serial execution cannot be used.

    Threads are non-deterministic and involve extra design.

    Threads have increased overhead.

    Shared resource conflicts.
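The "extra design" that threads require can be illustrated with a shared counter. This is a minimal sketch using Python's standard `threading` module; the function name and the thread and iteration counts are arbitrary choices. Because the interleaving of threads is not deterministic, the shared update is protected by a lock, which is also one source of the added overhead.

```python
import threading


def count_with_lock(n_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:       # serialize the read-modify-write on `total`
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(count_with_lock())
```

Without the lock the final value would depend on how the threads happen to interleave; with it the result is always `n_threads * increments`, at the cost of acquiring the lock on every update.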

    Multicore

    Multiprocessors on a single chip

    CS267 Lecture 6

    Basic Shared Memory Architecture

    Processors all connected to a large shared memory
        Where are caches?

    • Now take a closer look at structure, costs, limits, programming

    [Figure: processors P ... Pn connected through an interconnect to a shared memory]

    Slide source: John Kubiatowicz

    What About Caching???

    Want high performance for shared memory: Use caches!
        Each processor has its own cache (or multiple caches)
        Place data from memory into cache
        Writeback cache: don't send all writes over bus to memory

    Caches reduce average latency
        Automatic replication closer to processor
        More important to multiprocessor than uniprocessor: latencies longer

    Normal uniprocessor mechanisms to access data
        Loads and stores form very low-overhead communication primitive

    Problem: Cache Coherence!

    [Figure: processors P ... Pn, each with a cache ($), on a bus shared with memory and I/O devices]
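The claim that caches reduce average latency can be made concrete with the usual weighted-average estimate. The function name and all numbers below are assumptions chosen for illustration, not measurements: a hit costs the cache latency, a miss costs the memory latency.

```python
def average_latency(hit_rate, cache_cycles, memory_cycles):
    """Expected cycles per access for a single cache level (simplified model)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_cycles + (1.0 - hit_rate) * memory_cycles


if __name__ == "__main__":
    # Hypothetical figures: 2-cycle cache, 100-cycle memory.
    for rate in (0.0, 0.9, 0.99):
        print(f"hit rate {rate:.2f}: {average_latency(rate, 2, 100):.2f} cycles")
```

The longer the memory latency, the more each percentage point of hit rate is worth, which is in line with the slide's remark that caches matter more on a multiprocessor.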

    Example Cache Coherence Problem

    [Figure: three processors, each with a cache ($), on a bus shared with memory and I/O devices. Three copies of u:5 are shown; event 3 writes a new value of u; events 4 and 5 then read u = ?]

    Things to note:
        Processors could see different values for u after event 3
        With write back caches, value written back to memory depends on happenstance of which cache flushes or writes back value when

    How to fix with a bus: Coherence Protocol
        Use bus to broadcast writes or invalidations
        Simple protocols rely on presence of broadcast medium

    Bus not scalable beyond about 64 processors (max)
        Capacity, bandwidth limitations
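The problem and the bus-based fix can be sketched in a few lines. This is a deliberately simplified model, not a real protocol such as MESI: the `Cache` class, the use of a Python list as the "bus", and the written value 7 are all assumptions for illustration. With `coherent=False` each write-back cache works in isolation; with `coherent=True` a write broadcasts an invalidation and a read miss first makes other caches write back a dirty copy.

```python
class Cache:
    """Write-back cache for one processor (toy model, one value per address)."""

    def __init__(self, memory, bus=None):
        self.memory = memory
        self.lines = {}      # addr -> cached value
        self.dirty = set()   # addrs modified but not yet written back
        self.bus = bus       # list of all caches, or None for no coherence
        if bus is not None:
            bus.append(self)

    def _others(self):
        return [c for c in (self.bus or []) if c is not self]

    def read(self, addr):
        if addr not in self.lines:
            for other in self._others():
                other.flush(addr)          # a dirty copy reaches memory first
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        for other in self._others():
            other.invalidate(addr)         # broadcast invalidation on the bus
        self.lines[addr] = value
        self.dirty.add(addr)               # write-back: memory not updated yet

    def flush(self, addr):
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        self.flush(addr)
        self.lines.pop(addr, None)


def run(coherent):
    memory = {"u": 5}
    bus = [] if coherent else None
    p1, p2, p3 = (Cache(memory, bus) for _ in range(3))
    p1.read("u")                           # a processor caches u:5
    p3.read("u")                           # another processor caches u:5
    p3.write("u", 7)                       # event 3: u is written (7 is assumed)
    return p1.read("u"), p2.read("u")      # events 4 and 5: what is u?


if __name__ == "__main__":
    print("no coherence:    ", run(coherent=False))
    print("write-invalidate:", run(coherent=True))
```

Without coherence both later reads return the stale 5 (one from a stale cache line, one from memory that was never updated); with the broadcast they return the newly written value. The sketch relies on every cache seeing every bus transaction, which is exactly the broadcast medium the slide says simple protocols depend on.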

    Slide source: John Kubiatowicz

    CS267 Lecture 6

    Limits of Bus-Based Shared Memory

    [Figure labels: I/O, MEM, MEM, PR]

    Cache

    A Reminder: SMT (Simultaneous Multi Threading)

    SMT vs. CMP

    A Single Chip Multiprocessor
    L. Hammond et al. (Stanford), IEEE Computer, 1997

    • For same area (a billion tr. DRAM area)

    Superscalar and SMT: Very Complex
        • Wide
        • Advanced Branch prediction
        • Register Renaming

    SS and SMT vs. CMP

    CPU Cores: Three main hardware design problems (of SS and SMT):
        • Area increases quadratically with core complexity
        • Number of Registers

    SS and SMT vs. CMP

    Memory:
        • 12 issue SS or SMT require multiport data cache (4-6 ports)
        • 2 X 128 Kbyte (2 cycle latency)
        CMP: 16 X 16 Kbyte (single cycle latency), but secondary cache is slower (multiport)

    Shared memory: write through caches

    [Figure: SMT and CMP]

    Performance comparison

    • Compress: (Integer apps) Low ILP and no TLP
    • Mpeg-2: (MMedia apps) High ILP and TLP and moderate memory requirement (parallelized by hand)
        SMT utilizes core resources better
        But CMP has 16 issue slots instead of 12
    • Tomcatv: (FP applications) Large loop-level parallelism and large memory bandwidth (TLP by compiler)
        CMP has large memory bandwidth on primary cache
        SMT fundamental problem: unified and slow cache
    • Multiprogram: Integer multiprogramming workload, all computation-intensive (Low ILP, High PLP)

    CMP Motivation

    How to utilize available silicon?
        Speculation (aggressive superscalar)
        Simultaneous Multithreading (SMT, Hyperthreading)
        Several processors on a single chip

    What is a CMP (Chip MultiProcessor)?
        Several processors (several masters)
        Both shared and distributed memory architectures
        Both homogeneous and heterogeneous processor types

    Why?
        Wire Delays
        Diminishing of Uniprocessors
        Very long design and verification times for modern processors

    A Single Chip Multiprocessor
    L. Hammond et al. (Stanford), IEEE Computer, 1997

    • TLP and PLP become widespread in future applications
    • Various Multimedia applications
    • Compilers and

    A Reminder: SMT (Simultaneous Multi Threading)

    SMT
        • Pool of execution units (Wide machine)
        • Several Logical processors
        • Copy of State for each
        • Mul. Threads are running concurrently
        • Better utilization and Latency Tolerance

    CMP
        • Simple Cores
        • Moderate amount of parallelism
        • Threads are running concurrently on different cores

    SMT Dual-core: all four threads can run concurrently

    [Figure: pipeline blocks: BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer, Floating Point, L1 D-Cache, D-TLB, uCode ROM]