Multi-Core+Architectures.ppt
Transcript of Multi-Core+Architectures.ppt
-
8/14/2019 Multi-Core+Architectures.ppt
1/43
Multi-Core Architectures
-
2/43
Multi-Core Architectures
Single-core computer
-
3/43
Single-Core CPU Chip
-
4/43
Multi-Core Architectures
Replicate multiple processor cores on a single die.
-
5/43
Multi-Core CPU Chip
The cores fit on a single processor socket
Also called CMP (Chip Multi-Processor)
-
6/43
The Cores Run in Parallel
-
7/43
Within each core, threads are time-sliced
(just like on a uniprocessor)
-
8/43
Interaction With OS
OS perceives each core as a separate processor
OS scheduler maps threads/processes to different cores
Most major operating systems support multi-core today
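
To make the first point concrete, here is a minimal Python sketch (Python is an assumption; the slides name no language). The helper name visible_processors is hypothetical. It asks the OS how many processors it sees via os.cpu_count(), which reports logical processors, so on SMT hardware the figure may count hardware threads rather than physical cores.

```python
import os


def visible_processors():
    """Return the number of logical processors the OS reports (at least 1)."""
    count = os.cpu_count()  # may be None if the count cannot be determined
    return count if count is not None else 1


print("Logical processors visible to the OS:", visible_processors())
```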
-
9/43
Why Multi-Core?
Difficult to make single-core clock frequencies even higher
Deeply pipelined circuits:
  heat problems
  speed of light problems
  difficult design and verification
  large design teams necessary
  server farms need expensive air-conditioning
Many new applications are multithreaded
General trend in computer architecture (shift towards more parallelism)
-
10/43
Multi-Core Processor is a Special Kind of a Multiprocessor:
All processors are on the same chip
Multi-core processors are MIMD:
  Different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data).
Multi-core is a shared memory multiprocessor:
  All cores share the same memory
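
The MIMD and shared-memory points can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original deck: the function names are hypothetical, and in CPython the global interpreter lock means the two threads may not actually execute on different cores at the same instant. The structure is what matters: two different instruction streams, each working on its own region of one shared list.

```python
import threading


def double_slice(shared, start, stop):
    """First instruction stream: double every element in its own region."""
    for i in range(start, stop):
        shared[i] *= 2


def negate_slice(shared, start, stop):
    """Second instruction stream: negate every element in its own region."""
    for i in range(start, stop):
        shared[i] = -shared[i]


def run_mimd_demo():
    shared = list(range(8))  # one memory, visible to both threads
    mid = len(shared) // 2
    workers = [
        threading.Thread(target=double_slice, args=(shared, 0, mid)),
        threading.Thread(target=negate_slice, args=(shared, mid, len(shared))),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()  # wait until both streams have finished
    return shared


print(run_mimd_demo())
```

Because the two regions do not overlap, no locking is needed in this sketch; threads that touch the same elements would need synchronization.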
-
11/43
What Applications Benefit From Multi-core?
Database servers
Web servers (Web commerce)
Compilers
Multimedia applications
Scientific applications, CAD/CAM
In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)
-
12/43
More Examples
Editing a photo while recording a TV show through a digital video recorder
Downloading software while running an anti-virus program
Anything that can be threaded today will map efficiently to multi-core
BUT: some applications are difficult to parallelize
-
13/43
Instruction-level Parallelism
Parallelism at the machine-instruction level
The processor can re-order, pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc.
Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years
-
14/43
Thread-level Parallelism (TLP)
This is parallelism on a coarser scale
Server can serve each client in a separate thread (Web server, database server)
A computer game can do AI, graphics, and physics in three separate threads
Single-core superscalar processors cannot fully exploit TLP
Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
-
15/43
Simultaneous MultiThreading (SMT)
Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core
Weaving together multiple threads on the same core
Example: if one thread is waiting for a floating point operation to complete, another thread can use the integer units
-
16/43
Simultaneous MultiThreading (SMT)
Without SMT, only a single thread can run at any given time
-
17/43
SMT Processor:
both threads can run concurrently
-
18/43
But: Can't simultaneously use the same functional unit
-
19/43
SMT not a true Parallel Processor
Enables better threading (e.g. up to 30%)
OS and applications perceive each simultaneous thread as a separate virtual processor
The chip has only a single copy of each resource
Compare to multi-core: each core has its own copy of resources
-
20/43
Multi-Core: threads can run on separate cores
-
21/43
Multi-Core: threads can run on separate cores
-
22/43
Combining Multi-Core and SMT
Cores can be SMT-enabled (or not)
The different combinations:
  Single-core, non-SMT: standard uniprocessor
  Single-core, with SMT: different operation function
  Multi-core, non-SMT:
  Multi-core, with SMT: our Fish Machines
The number of SMT threads: 2, 4, or sometimes 8 simultaneous threads
Intel calls them hyper-threads
-
23/43
SMT Dual-Core: all four threads can run concurrently
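
The "four threads" figure is simple arithmetic: cores times SMT threads per core. A small Python sketch, assuming every core has the same number of SMT threads; the function name hardware_threads is hypothetical.

```python
def hardware_threads(cores, smt_threads_per_core=1):
    """Number of threads that can run concurrently: cores x SMT threads per core."""
    if cores < 1 or smt_threads_per_core < 1:
        raise ValueError("cores and smt_threads_per_core must be at least 1")
    return cores * smt_threads_per_core


# The four combinations of core count and SMT, using 2-way SMT as an example.
for label, cores, smt in [
    ("Single-core, non-SMT", 1, 1),
    ("Single-core, with SMT", 1, 2),
    ("Dual-core, non-SMT", 2, 1),
    ("Dual-core, with SMT", 2, 2),
]:
    print(label, "->", hardware_threads(cores, smt))
```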
-
24/43
Multi-Core vs SMT
Advantages/disadvantages?
Multi-core:
  Since there are several cores, each is smaller and not as powerful, but easier to design and manufacture
  Great with thread-level parallelism
SMT:
  Can have one large and fast superscalar core
  Great performance on a single thread
  Mostly still only exploits instruction-level parallelism
-
25/43
-
26/43
Fish Machines
Dual-core Intel Xeon processors
Each core is hyper-threaded
Private L1 caches
Shared L2 caches
-
27/43
Designs with Private L2 Caches
-
28/43
Private vs Shared Caches?
Advantages/disadvantages?
Advantages of private:
  They are closer to core, so faster access
  Reduces contention
Advantages of shared:
  Threads on different cores can share the same cache data
  More cache space available if a single (or a few) high-performance thread runs on the system
-
29/43
-
30/43
The Cache Coherence Problem
Suppose variable x initially contains 15213
-
31/43
The Cache Coherence Problem
Core 1 reads x
-
32/43
-
33/43
The Cache Coherence Problem
Core 1 writes to x, setting it to 21660
-
34/43
The Cache Coherence Problem
Core 2 attempts to read x and gets a stale copy
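
The sequence on these slides can be replayed with a toy model in Python. This is a sketch under several assumptions that the transcript does not confirm: each core has a private write-through cache with no coherence mechanism, and Core 2 also reads x before Core 1 writes (otherwise it would have no copy to go stale). The class and function names are hypothetical.

```python
class PrivateCache:
    """Toy private cache with no coherence mechanism (assumed write-through)."""

    def __init__(self, memory):
        self.memory = memory  # main memory shared by all cores
        self.lines = {}       # this core's private copies

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # miss: fetch from memory
        return self.lines[addr]                   # hit: return the local copy

    def write(self, addr, value):
        self.lines[addr] = value   # update this core's copy
        self.memory[addr] = value  # write through to main memory


def stale_read_demo():
    memory = {"x": 15213}
    core1, core2 = PrivateCache(memory), PrivateCache(memory)
    core1.read("x")          # Core 1 reads x
    core2.read("x")          # Core 2 reads x (assumed step)
    core1.write("x", 21660)  # Core 1 writes to x
    # Core 2 hits in its own cache and never sees the new value.
    return core1.read("x"), core2.read("x"), memory["x"]


print(stale_read_demo())
```

Main memory holds the new value, yet Core 2 keeps returning its old cached copy, which is exactly the coherence problem.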
-
35/43
Solutions for Cache Coherence
This is a general problem with multiprocessors, not limited just to multi-core
There exist many solution algorithms, coherence protocols, etc.
Simple solutions:
  Invalidation-based protocol with snooping
  Update protocol
-
36/43
Inter-Core Bus
-
37/43
Invalidation Protocol with Snooping
Invalidation:
  If a core writes to a data item, all other copies of this data item in other caches are invalidated
Snooping:
  All cores continuously snoop (monitor) the bus connecting the cores.
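
A toy Python model of these two rules, using the deck's running example (x starts at 15213 and Core 1 writes 21660). It is a sketch, not a real protocol implementation: the Bus and SnoopingCache classes are hypothetical, caches are assumed write-through, and "snooping" is modeled as the bus calling a handler on every other attached cache.

```python
class Bus:
    """Inter-core bus: every attached cache observes every invalidation."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr, sender):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class SnoopingCache:
    """Toy private cache that snoops the bus (assumed write-through)."""

    def __init__(self, memory, bus):
        self.memory = memory
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # miss: reload from memory
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(addr, sender=self)  # invalidate other copies
        self.lines[addr] = value
        self.memory[addr] = value

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)  # drop our copy if we hold one


def invalidation_demo():
    memory = {"x": 15213}
    bus = Bus()
    core1, core2 = SnoopingCache(memory, bus), SnoopingCache(memory, bus)
    core1.read("x")
    core2.read("x")
    core1.write("x", 21660)
    was_invalidated = "x" not in core2.lines
    return was_invalidated, core2.read("x")  # the read misses and reloads


print(invalidation_demo())
```

After the write, Core 2 no longer holds a copy, so its next read misses and fetches the current value instead of a stale one.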
-
38/43
-
39/43
Invalidation Protocol
Core 1 writes to x, setting it to 21660
-
40/43
-
41/43
-
42/43
Invalidation vs Update Protocols
Multiple writes to the same location:
  invalidation: only the first time
  update: must broadcast each write
Writing to adjacent words in the same cache block:
  invalidation: only invalidate block once
  update: must update block on each write
Invalidation generally performs better: it generates less bus traffic
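
The traffic difference can be counted with a small Python sketch. It rests on assumptions the slides do not spell out: one core performs all the writes to a single cache block, one other core initially holds a copy and does not read the block in between, and after an invalidation the writer needs no further bus messages for that block. The function name count_bus_messages is hypothetical.

```python
def count_bus_messages(protocol, num_writes):
    """Count bus broadcasts for num_writes writes by one core to one cache block."""
    if protocol not in ("invalidation", "update"):
        raise ValueError("protocol must be 'invalidation' or 'update'")
    other_has_copy = True  # another core starts with the block cached
    messages = 0
    for _ in range(num_writes):
        if not other_has_copy:
            continue  # writer holds the only copy: nothing to broadcast
        messages += 1  # invalidate or update message goes on the bus
        if protocol == "invalidation":
            other_has_copy = False  # the other copy is gone after the first write
    return messages


print("invalidation:", count_bus_messages("invalidation", 10))
print("update:", count_bus_messages("update", 10))
```

The same count applies to writes to adjacent words, since both protocols act on whole cache blocks in this model.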
-
43/43
Programming for Multi-Core
Programmers must use threads or processes
Spread the workload across multiple cores
Write parallel algorithms
OS will map threads/processes to cores
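
A minimal Python sketch of spreading a workload across workers, assuming the task is a sum of squares that splits cleanly into independent chunks. The names chunk_bounds and parallel_sum_of_squares are hypothetical. Note that CPython's global interpreter lock keeps threads from running Python bytecode on several cores at once, so this shows the decomposition pattern rather than a guaranteed speedup; a process pool would be the usual choice for CPU-bound work in Python.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum_of_squares(values, workers=None):
    """Sum v*v over values, one chunk per worker thread."""
    workers = workers or os.cpu_count() or 1

    def partial_sum(bounds):
        start, stop = bounds
        return sum(v * v for v in values[start:stop])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is independent; the OS decides where each thread runs.
        return sum(pool.map(partial_sum, chunk_bounds(len(values), workers)))


print(parallel_sum_of_squares(list(range(1000)), workers=4))
```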