![Page 1: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/1.jpg)
Intel Multi-Core Technology
![Page 2: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/2.jpg)
Intel Multi-Core Technology
• New Energy Efficiency by Parallel Processing
– Multi cores in a single package
– Second generation high k + metal gate 32nm Technology
• Intel Turbo Boost technology
– Changing frequency depending on workload
• Intel Hyper-Threading Technology
– Two threads on a single core
• Tera-scale computing
– Intend to scale multi-core to 100 cores and beyond
![Page 3: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/3.jpg)
Multi-Core and Hyper-Threading
• Multi-core chips place two or more cores in a single package.
• Multi-core chips do more work per clock cycle, so they can run at a lower clock frequency.
• Hyper-Threading makes more efficient use of a single core by allowing multiple threads to share the core’s resources.
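A rough sketch of why two slower cores can be more energy-efficient than one fast core. It assumes the common CMOS dynamic-power approximation P ≈ C·V²·f, a supply voltage that can be lowered together with frequency, and a perfectly parallel workload. None of these assumptions or numbers come from the slides; they are illustrative only.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Approximate dynamic power of one core: P ~ C * V^2 * f (normalized units)."""
    return capacitance * voltage ** 2 * frequency

# Hypothetical, normalized operating points (assumptions, not measured data).
single_power = dynamic_power(1.0, 1.0, 1.0)      # one core at full voltage and frequency
dual_power = 2 * dynamic_power(1.0, 0.8, 0.8)    # two cores, each at 80% voltage and frequency

single_throughput = 1.0                          # work per unit time, normalized
dual_throughput = 2 * 0.8                        # assumes the work splits evenly across cores

single_efficiency = single_throughput / single_power
dual_efficiency = dual_throughput / dual_power

print(f"power: single={single_power:.3f}, dual={dual_power:.3f}")
print(f"throughput per unit power: single={single_efficiency:.3f}, dual={dual_efficiency:.3f}")
```

Under these assumptions the dual-core configuration draws about the same power while delivering more throughput; real chips also have static (leakage) power and imperfect parallel scaling, which this sketch ignores.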
![Page 4: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/4.jpg)
Interaction with the Operating System
• OS perceives each core as a separate processor
• OS scheduler maps threads/processes to different cores
• Most major OSes support multi-core today: Windows, Linux, Mac OS X, …
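As a small illustration of the OS view, the sketch below asks the operating system how many processors it reports and starts one thread per reported processor. The worker function and thread names are hypothetical; which core each thread actually runs on is left entirely to the OS scheduler, and on SMT-enabled chips the reported count typically includes hardware threads, not just physical cores.

```python
import os
import threading

# Number of processors the OS reports; cpu_count() may return None on some platforms.
logical_cpus = os.cpu_count() or 1

def worker(results, index):
    # Record which thread handled this slot; core placement is the scheduler's choice.
    results[index] = threading.current_thread().name

results = [None] * logical_cpus
threads = [
    threading.Thread(target=worker, args=(results, i), name=f"worker-{i}")
    for i in range(logical_cpus)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"OS reports {logical_cpus} logical processor(s)")
```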
![Page 5: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/5.jpg)
Why multi-core?
• Difficult to make single-core clock frequencies even higher
• Deeply pipelined circuits:
– heat problems
– speed of light problems
– difficult design and verification
– large design teams necessary
– server farms need expensive air-conditioning
• Many new applications are multithreaded
• General trend in computer architecture (shift towards more parallelism)
![Page 6: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/6.jpg)
Instruction-level parallelism
• Parallelism at the machine-instruction level
• The processor can re-order and pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc.
• Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years
![Page 7: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/7.jpg)
Thread-level parallelism (TLP)
• This is parallelism on a coarser scale
• Server can serve each client in a separate thread (Web server, database server)
• A computer game can do AI, graphics, and physics in three separate threads
• Single-core superscalar processors cannot fully exploit TLP
• Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
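The server case can be sketched as a thread pool in which each client request is handled by its own thread. This is a minimal illustration: `handle_client`, `serve`, and the response strings are hypothetical stand-ins for real request handling.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_client(client_id):
    # Hypothetical stand-in for parsing a request and building a response.
    return f"response for client {client_id}"

def serve(client_ids, max_threads=4):
    # Each client is handled in a separate thread; the OS scheduler may place
    # those threads on different cores.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(handle_client, client_ids))

responses = serve(range(3))
for line in responses:
    print(line)
```

Note that in standard CPython builds with the global interpreter lock, threads overlap well on I/O waits but do not execute Python bytecode on several cores at once; the sketch shows the program structure rather than a guaranteed speedup.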
![Page 8: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/8.jpg)
What applications benefit from multi-core?
• Database servers
• Web servers (Web commerce)
• Compilers
• Multimedia applications
• Scientific applications, CAD/CAM
• In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)
![Page 9: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/9.jpg)
A technique complementary to multi-core: Simultaneous multithreading
• Problem addressed: The processor pipeline can get stalled:
– Waiting for the result of a long floating point (or integer) operation
– Waiting for data to arrive from memory
Other execution units wait unused
[Figure: processor pipeline block diagram: BTB and I-TLB, Decoder, Trace Cache, uCode ROM, Rename/Alloc, Uop queues, Schedulers, Integer and Floating Point units, L1 D-Cache and D-TLB, L2 Cache and Control, Bus]
![Page 10: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/10.jpg)
Simultaneous multithreading (SMT)
• Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core
• Weaving together multiple “threads” on the same core
• Example: if one thread is waiting for a floating point operation to complete, another thread can use the integer units
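The example above can be mimicked with a toy model. It assumes one integer unit and one floating point unit, one cycle per operation, and in-order issue within each thread; these are simplifying assumptions for illustration, not a description of any real Intel pipeline.

```python
from collections import deque

def cycles_without_smt(threads):
    # Only one thread runs at a time and issues one operation per cycle.
    return sum(len(ops) for ops in threads)

def cycles_with_smt(threads):
    # Each cycle, every thread may issue its next operation if the execution
    # unit it needs ("int" or "fp") has not been claimed by another thread.
    queues = [deque(ops) for ops in threads]
    cycles = 0
    while any(queues):
        busy_units = set()
        for queue in queues:
            if queue and queue[0] not in busy_units:
                busy_units.add(queue.popleft())
        cycles += 1
    return cycles

thread_1 = ["fp"] * 4    # floating point work
thread_2 = ["int"] * 4   # integer work

print("without SMT:", cycles_without_smt([thread_1, thread_2]))
print("with SMT:   ", cycles_with_smt([thread_1, thread_2]))
```

When both threads need the same unit, the toy model shows no gain, which matches the point that SMT shares a single copy of each resource.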
![Page 11: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/11.jpg)
Without SMT, only a single thread can run at any given time
[Figure: processor pipeline block diagram; Thread 1 (floating point) is running]
![Page 12: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/12.jpg)
Without SMT, only a single thread can run at any given time
[Figure: processor pipeline block diagram; Thread 2 (integer operation) is running]
![Page 13: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/13.jpg)
SMT processor: both threads can run concurrently
[Figure: processor pipeline block diagram; Thread 1 (floating point) and Thread 2 (integer operation) run at the same time]
![Page 14: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/14.jpg)
SMT not a “true” parallel processor
• Enables better threading (e.g. up to 30%)
• OS and applications perceive each simultaneous thread as a separate “virtual processor”
• The chip has only a single copy of each resource
• Compare to multi-core: each core has its own copy of resources
![Page 15: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/15.jpg)
Multi-core: threads can run on separate cores
[Figure: two copies of the processor pipeline block diagram, one per core]
![Page 16: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/16.jpg)
Multi-core: threads can run on separate cores
[Figure: two copies of the processor pipeline block diagram, one per core]
![Page 17: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/17.jpg)
Combining Multi-core and SMT
• Cores can be SMT-enabled (or not)
• The different combinations:
– Single-core, non-SMT: standard uniprocessor
– Single-core, with SMT
– Multi-core, non-SMT
– Multi-core, with SMT
• The number of SMT threads: 2, 4, or sometimes 8 simultaneous threads
• Intel calls them “Hyper-Threads” (HT Technology)
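The combinations above differ in how many threads can run at once, which is the number of cores times the number of SMT threads per core. A minimal sketch, with a hypothetical helper name and a dual-core chip standing in for "multi-core":

```python
def logical_processors(cores, smt_threads_per_core=1):
    """Number of hardware threads the OS would see as separate processors."""
    if cores < 1 or smt_threads_per_core < 1:
        raise ValueError("cores and smt_threads_per_core must be at least 1")
    return cores * smt_threads_per_core

combinations = {
    "single-core, non-SMT": logical_processors(1),
    "single-core, with 2-way SMT": logical_processors(1, 2),
    "dual-core, non-SMT": logical_processors(2),
    "dual-core, with 2-way SMT": logical_processors(2, 2),
}

for name, count in combinations.items():
    print(f"{name}: {count} concurrent thread(s)")
```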
![Page 18: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/18.jpg)
SMT Dual-core: all four threads can run concurrently
[Figure: two copies of the processor pipeline block diagram, one per SMT-enabled core]
![Page 19: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/19.jpg)
Comparison: multi-core vs SMT
• Multi-core:
– Since there are several cores, each is smaller and not as powerful (but also easier to design and manufacture)
– However, great with thread-level parallelism
• SMT:
– Can have one large and fast superscalar core
– Great performance on a single thread
– Mostly still only exploits instruction-level parallelism
![Page 20: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/20.jpg)
The memory hierarchy for threading
• If simultaneous multithreading only:
– all caches shared
• Multi-core chips:
– L1 caches private
– L2 caches private in some architectures and shared in others
• Memory is always shared
![Page 21: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/21.jpg)
Intel Xeon Dual-core
• Dual-core Intel Xeon processors
• Each core is hyper-threaded
• Private L1 caches
• Shared L2 caches
[Figure: memory at the top, a shared L2 cache below it, then a private L1 cache for each of CORE 0 and CORE 1, with hyper-threads shown on the cores]
![Page 22: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/22.jpg)
Designs with private L2 caches
[Figure: memory, with a private L2 cache and a private L1 cache for each of CORE 0 and CORE 1]
Both L1 and L2 are private
Examples: AMD Opteron, AMD Athlon, Intel Pentium D
[Figure: memory, with an L3 cache, an L2 cache, and an L1 cache for each of CORE 0 and CORE 1]
A design with L3 caches
Example: Intel Itanium 2
![Page 23: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/23.jpg)
Private vs shared caches
• Advantages of private:
– They are closer to core, so faster access
– Reduces contention
• Advantages of shared:
– Threads on different cores can share the same cache data
– More cache space available if a single (or a few) high-performance thread runs on the system
![Page 24: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/24.jpg)
The cache coherence problem
• Since we have private caches: How to keep the data consistent across caches?
• Each core should perceive the memory as a monolithic array, shared by all the cores
MESI cache coherence protocol
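The slide only names MESI, so the following is a simplified sketch of its four states (Modified, Exclusive, Shared, Invalid) for a single memory line held in several private caches. The class and method names are hypothetical, and bus transactions and write-backs are reduced to comments.

```python
class CacheLine:
    """Tracks the MESI state of one memory line in each core's private cache."""

    def __init__(self, num_cores):
        self.states = ["I"] * num_cores   # every cache starts without the line

    def read(self, core):
        if self.states[core] != "I":
            return                        # read hit: no state change
        holders = [c for c, s in enumerate(self.states) if c != core and s != "I"]
        for c in holders:
            # A Modified copy would be written back first; M and E copies drop to Shared.
            self.states[c] = "S"
        # Exclusive if no other cache holds the line, otherwise Shared.
        self.states[core] = "S" if holders else "E"

    def write(self, core):
        # Gaining ownership invalidates every other copy of the line.
        for c in range(len(self.states)):
            if c != core:
                self.states[c] = "I"
        self.states[core] = "M"

line = CacheLine(num_cores=2)
line.read(0)
print(line.states)    # core 0 holds the only copy
line.read(1)
print(line.states)    # both caches share the line
line.write(0)
print(line.states)    # core 0 modified it, core 1 was invalidated
```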
![Page 25: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/25.jpg)
The Core i3 500 series products are dual cores and they do have hyper-threading and support virtualization, but they do not have Turbo Boost.
The Core i5 600 series products are dual cores which have hyper-threading, Turbo Boost, virtualization, and the AES instruction set.
![Page 26: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/26.jpg)
TDP: Thermal Design Power
![Page 27: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/27.jpg)
The Turbo Boost Technology
• When fewer cores are in use, transistors built into the chip disconnect the idle cores from the power bus
• When a program needs only a single thread, the connected core is automatically given extra voltage and overclocked for a short period of time until the job is done.
• Turbo Boost decides when to do what to maximize performance
![Page 28: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/28.jpg)
The Turbo Boost Technology
• Of course, consistently overclocking a machine can overheat the chip and render it useless fairly rapidly
• Intel has ensured that its Mobile Nehalem parts (codenamed Clarksfield) protect themselves through self-monitoring and shutting down if temperature limits are breached. (How about constantly shutting down the cores!)
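The behavior described on the two Turbo Boost slides can be caricatured as a small decision function. This is not Intel's algorithm: the function name, the base frequency, the per-idle-core frequency step, and the temperature limit are all hypothetical values chosen for illustration.

```python
def turbo_frequency(active_cores, total_cores, temp_c,
                    base_ghz=2.0, bin_ghz=0.1, temp_limit_c=90.0):
    """Toy model: pick a clock frequency (GHz) for the active cores."""
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    if temp_c >= temp_limit_c:
        return 0.0                        # limit breached: shut down to protect the chip
    idle_cores = total_cores - active_cores
    # Idle cores are power-gated; their unused budget raises the active cores' clock.
    return base_ghz + idle_cores * bin_ghz

print(turbo_frequency(active_cores=4, total_cores=4, temp_c=60.0))
print(turbo_frequency(active_cores=1, total_cores=4, temp_c=60.0))
print(turbo_frequency(active_cores=1, total_cores=4, temp_c=95.0))
```

A real controller would also weigh power and current limits and would normally throttle gradually rather than jump straight to shutdown; the sketch keeps only the two ideas stated on the slides.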
![Page 29: Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.](https://reader035.fdocuments.us/reader035/viewer/2022062712/56649c785503460f9492dd96/html5/thumbnails/29.jpg)
Teaching a new course at UCCS, fall 2012
ECE5990/4990 Power Electronics
Graduate students welcome to take this course