Multi-Core+Architectures.ppt

  • 8/14/2019 Multi-Core+Architectures.ppt

    Multi-Core Architectures

    Single-core computer

    Single-Core CPU Chip

    Multi-Core Architectures

    Replicate multiple processor cores on a single die.

    Multi-Core CPU Chip

    The cores fit on a single processor socket
    Also called CMP (Chip Multi-Processor)

    The Cores Run in Parallel

    Within each core, threads are time-sliced

    (just like on a uniprocessor)

    Interaction With OS

    OS perceives each core as a separate processor

    OS scheduler maps threads/processes to different cores

    Most major operating systems support multi-core today
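
As a rough illustration of the OS view described above, a program can ask how many logical processors the OS exposes and which ones it may run on. This is a minimal sketch: the helper names `logical_cpu_count` and `usable_cpus` are made up for this example, and `os.sched_getaffinity` is assumed to be available only on some platforms (for example Linux), so a fallback is included.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical processors the OS reports (cores times SMT threads)."""
    return os.cpu_count() or 1  # cpu_count() returns None if it cannot tell


def usable_cpus() -> set[int]:
    """IDs of the logical processors this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):  # not present on every platform
        return set(os.sched_getaffinity(0))  # 0 means "the calling process"
    # Fallback assumption: the process may run on every logical processor.
    return set(range(logical_cpu_count()))


if __name__ == "__main__":
    print("logical processors:", logical_cpu_count())
    print("usable by this process:", sorted(usable_cpus()))
```

The scheduler, not the program, decides which of these processors each thread actually runs on at any moment.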

    Why Multi-Core?

    Difficult to make single-core clock frequencies even higher
    Deeply pipelined circuits:
        heat problems
        speed of light problems
        difficult design and verification
        large design teams necessary
        server farms need expensive air-conditioning
    Many new applications are multithreaded
    General trend in computer architecture (shift towards more parallelism)

    A Multi-Core Processor Is a Special Kind of Multiprocessor:

    All processors are on the same chip
    Multi-core processors are MIMD:
        Different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data)
    Multi-core is a shared-memory multiprocessor:
        All cores share the same memory
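
The MIMD, shared-memory model above can be sketched with two threads that run different code on different parts of one shared list. This is only an illustration of the programming model: the function names are invented for the example, and in standard CPython builds the global interpreter lock means these two threads will not necessarily execute on two cores at the same instant.

```python
import threading


def run_mimd() -> list[int]:
    shared = list(range(8))  # one memory region visible to both threads

    def double_lower_half() -> None:
        # Instruction stream 1: operates on indices 0..3
        for i in range(0, 4):
            shared[i] *= 2

    def negate_upper_half() -> None:
        # Instruction stream 2: different code, operates on indices 4..7
        for i in range(4, 8):
            shared[i] = -shared[i]

    threads = [
        threading.Thread(target=double_lower_half),
        threading.Thread(target=negate_upper_half),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until both instruction streams have finished
    return shared


if __name__ == "__main__":
    print(run_mimd())
```

Because the two threads touch disjoint index ranges, no locking is needed in this sketch; threads that write the same locations would need synchronization.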

    What Applications Benefit From Multi-Core?

    Database servers
    Web servers (Web commerce)
    Compilers
    Multimedia applications
    Scientific applications, CAD/CAM
    In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)

    More Examples

    Editing a photo while recording a TV show through a digital video recorder
    Downloading software while running an anti-virus program
    Anything that can be threaded today will map efficiently to multi-core
    But: some applications are difficult to parallelize

    Instruction-level Parallelism

    Parallelism at the machine-instruction level
    The processor can re-order, pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc.
    Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years

    Thread-level Parallelism (TLP)

    This is parallelism on a coarser scale
    A server can serve each client in a separate thread (Web server, database server)
    A computer game can do AI, graphics, and physics in three separate threads
    Single-core superscalar processors cannot fully exploit TLP
    Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
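
The server case above might look roughly like the following sketch, in which each client request is handled as an independent task on a pool of threads. `handle_client` and `serve` are hypothetical names, and the handler only stands in for real per-client work; a pool is used here instead of literally one new thread per client.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_client(client_id: int) -> str:
    """Hypothetical request handler standing in for real per-client work."""
    return f"response for client {client_id}"


def serve(client_ids: list[int], max_threads: int = 4) -> list[str]:
    # Each request becomes an independent task. The pool runs up to
    # max_threads of them concurrently, and the OS scheduler decides
    # which core each worker thread runs on.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(handle_client, client_ids))  # keeps input order


if __name__ == "__main__":
    for line in serve([1, 2, 3]):
        print(line)
```

The requests do not depend on each other, which is what makes this kind of workload a natural fit for thread-level parallelism.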

    Simultaneous MultiThreading (SMT)

    Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core
    Weaving together multiple threads on the same core
    Example: if one thread is waiting for a floating point operation to complete, another thread can use the integer units

    Simultaneous MultiThreading (SMT)

    Without SMT, only a single thread can run at any given time

    SMT Processor:

    both threads can run concurrently

    But: Can't simultaneously use the same functional unit

    SMT Not a True Parallel Processor

    Enables better threading (e.g. up to 30%)
    OS and applications perceive each simultaneous thread as a separate virtual processor
    The chip has only a single copy of each resource
    Compare to multi-core: each core has its own copy of resources

    Multi-Core: threads can run on separate cores

    Combining Multi-Core and SMT

    Cores can be SMT-enabled (or not)
    The different combinations:
        Single-core, non-SMT: standard uniprocessor
        Single-core, with SMT: different operation function
        Multi-core, non-SMT
        Multi-core, with SMT: our Fish machines
    The number of SMT threads: 2, 4, or sometimes 8 simultaneous threads
    Intel calls them hyper-threads

    SMT Dual-Core: all four threads can run concurrently

    Multi-Core vs SMT

    Advantages/disadvantages?
    Multi-core:
        Since there are several cores, each is smaller and not as powerful, but easier to design and manufacture
        Great with thread-level parallelism
    SMT:
        Can have one large and fast superscalar core
        Great performance on a single thread
        Mostly still only exploits instruction-level parallelism

    Fish Machines

    Dual-core Intel Xeon processors
    Each core is hyper-threaded
    Private L1 caches
    Shared L2 caches

    Designs with Private L2 Caches

    Private vs Shared Caches?

    Advantages/disadvantages?
    Advantages of private:
        They are closer to the core, so faster access
        Reduces contention
    Advantages of shared:
        Threads on different cores can share the same cache data
        More cache space available if a single (or a few) high-performance thread runs on the system

    The Cache Coherence Problem

    Suppose variable x initially contains 15213

    The Cache Coherence Problem

    Core 1 reads x

    The Cache Coherence Problem

    Core 1 writes to x, setting it to 21660

    The Cache Coherence Problem

    Core 2 attempts to read x and gets a stale copy
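
The stale read can be reproduced with a toy software model of two cores that each keep a private cache and have no coherence mechanism. Everything here is an assumption made for illustration: the `Core` class, the write-through policy, and in particular the step where core 2 caches x before core 1 writes, which the transcript does not show explicitly.

```python
class Core:
    """Toy core with a private cache and no coherence mechanism."""

    def __init__(self, memory: dict[str, int]) -> None:
        self.memory = memory               # main memory shared by all cores
        self.cache: dict[str, int] = {}    # this core's private cache

    def read(self, name: str) -> int:
        if name not in self.cache:         # miss: fetch from main memory
            self.cache[name] = self.memory[name]
        return self.cache[name]            # hit: value may be stale

    def write(self, name: str, value: int) -> None:
        self.cache[name] = value           # update own cached copy
        self.memory[name] = value          # write-through (assumed policy)


def stale_read_demo() -> tuple[int, int]:
    memory = {"x": 15213}
    core1, core2 = Core(memory), Core(memory)
    core1.read("x")
    core2.read("x")            # assumed step: core 2 caches x before the write
    core1.write("x", 21660)
    # Core 2 hits in its own cache and never sees the new value.
    return core2.read("x"), memory["x"]


if __name__ == "__main__":
    print(stale_read_demo())
```

In this model main memory holds 21660 while core 2 still reads 15213 from its cache, which is exactly the inconsistency a coherence protocol has to prevent.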

    Solutions for Cache Coherence

    This is a general problem with multiprocessors, not limited just to multi-core
    There exist many solution algorithms, coherence protocols, etc.
    Simple solutions:
        Invalidation-based protocol with snooping
        Update protocol

    Inter-Core Bus

    Invalidation Protocol with Snooping

    Invalidation:
        If a core writes to a data item, all other copies of this data item in other caches are invalidated
    Snooping:
        All cores continuously snoop (monitor) the bus connecting the cores
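
A toy model of invalidation with snooping might look like the sketch below. It is not a description of any real hardware protocol: `Bus`, `SnoopingCore`, the write-through policy, and the way snooping is modeled as a direct method call are all simplifying assumptions for illustration.

```python
class Bus:
    """Toy inter-core bus that delivers invalidations to all other caches."""

    def __init__(self) -> None:
        self.cores: list["SnoopingCore"] = []

    def broadcast_invalidate(self, sender: "SnoopingCore", name: str) -> None:
        for core in self.cores:
            if core is not sender:
                core.snoop_invalidate(name)


class SnoopingCore:
    """Toy core whose private cache listens to (snoops) the bus."""

    def __init__(self, memory: dict[str, int], bus: Bus) -> None:
        self.memory = memory
        self.cache: dict[str, int] = {}
        self.bus = bus
        bus.cores.append(self)             # attach this cache to the bus

    def read(self, name: str) -> int:
        if name not in self.cache:         # miss (or invalidated): reload
            self.cache[name] = self.memory[name]
        return self.cache[name]

    def write(self, name: str, value: int) -> None:
        self.bus.broadcast_invalidate(self, name)  # other copies become invalid
        self.cache[name] = value
        self.memory[name] = value          # write-through (assumed policy)

    def snoop_invalidate(self, name: str) -> None:
        self.cache.pop(name, None)         # drop our copy if we hold one


def invalidation_demo() -> int:
    memory = {"x": 15213}
    bus = Bus()
    core1, core2 = SnoopingCore(memory, bus), SnoopingCore(memory, bus)
    core1.read("x")
    core2.read("x")
    core1.write("x", 21660)                # invalidates core 2's copy
    return core2.read("x")                 # miss, so the fresh value is loaded


if __name__ == "__main__":
    print(invalidation_demo())
```

Compared with a model that has no coherence mechanism, the only change is the broadcast before the write, which forces the other core to miss and reload.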

    Invalidation Protocol

    Core 1 writes to x, setting it to 21660

    Invalidation vs Update Protocols

    Multiple writes to the same location:
        invalidation: only the first time
        update: must broadcast each write
    Writing to adjacent words in the same cache block:
        invalidation: only invalidate block once
        update: must update block on each write
    Invalidation generally performs better: it generates less bus traffic
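
The traffic argument above can be checked with a small counting model. This is a sketch under stated assumptions: `bus_messages` is a hypothetical helper, one core performs all the writes, another core initially holds copies of the affected blocks, and that other core does not re-read between writes (a re-read would bring the block back and require a new invalidation).

```python
def bus_messages(writes: list[int], protocol: str, block_size: int = 4) -> int:
    """Count bus messages caused by one core writing the given word addresses."""
    invalidated: set[int] = set()   # blocks already invalidated in other caches
    messages = 0
    for addr in writes:
        block = addr // block_size  # cache block that contains this word
        if protocol == "update":
            messages += 1           # every write is broadcast
        elif protocol == "invalidate":
            if block not in invalidated:
                invalidated.add(block)
                messages += 1       # only the first write to a block invalidates
        else:
            raise ValueError(f"unknown protocol: {protocol}")
    return messages


if __name__ == "__main__":
    same_location = [0, 0, 0, 0]
    adjacent_words = [0, 1, 2, 3]   # all in one block when block_size is 4
    for name, writes in [("same location", same_location),
                         ("adjacent words", adjacent_words)]:
        print(name,
              "invalidate:", bus_messages(writes, "invalidate"),
              "update:", bus_messages(writes, "update"))
```

In both cases the invalidation count stays at one message while the update count grows with the number of writes, matching the comparison above.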

    Programming for Multi-Core

    Programmers must use threads or processes
    Spread the workload across multiple cores
    Write parallel algorithms

    OS will map threads/processes to cores
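
One way to spread a workload, shown here only as a sketch, is to split the data into chunks, process each chunk in its own worker, and combine the partial results. `split` and `parallel_sum` are hypothetical helpers. Threads are the default so the example runs anywhere; in standard CPython, CPU-bound work generally needs processes to use several cores, so passing `ProcessPoolExecutor` as `executor_cls` (from a script guarded by `if __name__ == "__main__":`) is the assumed route to real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data: list[int], parts: int) -> list[list[int]]:
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # first chunks get the remainder
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data: list[int], workers: int = 4,
                 executor_cls=ThreadPoolExecutor) -> int:
    chunks = split(data, workers)
    with executor_cls(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one task per chunk
    return sum(partial_sums)                        # combine partial results


if __name__ == "__main__":
    print(parallel_sum(list(range(1000)), workers=4))
```

The program only creates the workers; which core each one runs on is left to the OS scheduler.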