COMP-25212
Multithreading
Coarse Grain Multithreading
• Minimal pipeline changes
  – Need to abort instructions in “shadow” of miss
  – Resume instruction stream to recover
• Good to compensate for infrequent, but expensive pipeline disruption
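As a rough illustration of why this pays off only for expensive disruptions, the sketch below compares the cycles lost per miss with and without a coarse-grain switch. The function name, its parameters and the example numbers (a 100-cycle miss, a 5-cycle abort-and-refill penalty) are assumptions for illustration, not figures from the course, and the model assumes the other thread always has useful work to do.

```python
def lost_cycles(misses, miss_latency, switch_penalty, coarse_grain_mt):
    """Cycles in which the pipeline does no useful work.

    Assumption: with coarse-grain MT every miss costs only the switch penalty
    (aborting the instructions in the shadow of the miss and refilling the
    pipeline), because the other thread runs while the miss is serviced.
    """
    per_miss = switch_penalty if coarse_grain_mt else miss_latency
    return misses * per_miss


# Hypothetical numbers: 10 misses, 100-cycle miss, 5-cycle switch penalty.
single = lost_cycles(10, miss_latency=100, switch_penalty=5, coarse_grain_mt=False)
switched = lost_cycles(10, miss_latency=100, switch_penalty=5, coarse_grain_mt=True)
print(single, switched)
```

In this model, a disruption that is cheaper than the switch penalty makes switching a net loss, which is one way to read "infrequent, but expensive".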
CS25212 Fine Grain Multithreading
• Learning Objectives:
  – To be able to describe a fine grain multithreading implementation
  – To be able to describe performance characteristics
  – To be able to describe Simultaneous Multithreading implementations
Fine-Grain Multithreading
• Switch CPU Threads with minimal (zero?) overhead
• Multithreading now helps resolve fine-grain dependencies (e.g. forwarding?)
        1    2    3    4    5    6    7
Inst a  IF   ID   EX   MEM  WB
Inst M       IF   ID   EX   MEM  WB
Inst b            IF   ID   EX   MEM  WB
Inst N                 IF   ID   EX   MEM
Inst c                      IF   ID   EX
Inst P                           IF   ID
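The interleaving in the diagram can be reproduced with a few lines of Python. This is a minimal sketch, assuming a classic five-stage pipeline, strict alternation between two threads (lower-case instructions from one thread, upper-case from the other) and one fetch per cycle; `interleave` and `pipeline_chart` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def interleave(thread_a, thread_b):
    """Round-robin issue order: one instruction from A, then one from B."""
    order = []
    for a, b in zip(thread_a, thread_b):
        order.extend([a, b])
    return order


def pipeline_chart(instructions, cycles):
    """Map each instruction to its stage in every cycle ('' = not in pipeline).

    Assumes one new instruction is fetched per cycle and nothing stalls.
    """
    chart = {}
    for start, name in enumerate(instructions):
        row = [""] * cycles
        for offset, stage in enumerate(STAGES):
            if start + offset < cycles:
                row[start + offset] = stage
        chart[name] = row
    return chart


chart = pipeline_chart(interleave(["a", "b", "c"], ["M", "N", "P"]), cycles=7)
for name, row in chart.items():
    print(f"Inst {name}  " + " ".join(f"{stage:<4}" for stage in row))
```

In this chart, instruction b reaches EX in cycle 5, one cycle after instruction a has left MEM, which hints at why interleaving may relax the forwarding requirements within a thread.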
Fine Grain Multithreading
• What about cache misses?
• Keeping strict alternation, so that the stalled thread simply holds its pipeline slots, has the advantage of simplicity
        4       5       6       7
Inst a  M-MISS  Miss    Miss    WB
Inst M  EX      MEM     WB
Inst b  ID      (ID)    (ID)    EX
Inst N  IF      ID      EX      MEM
Inst c          IF      (IF)    (IF)
Inst P                  IF      ID
Fine Grain Multithreading
• Alternatively, if 1 CPU thread stalled, issue every clock from alternate thread
        1       2       3       4       5       6       7
Inst a  IF      ID      EX      M-MISS  Miss    Miss    WB
Inst M          IF      ID      EX      MEM     WB
Inst b                  IF      ID      (ID)    (ID)    EX
Inst N                          IF      ID      EX      MEM
Inst P                                  IF      ID      EX
Inst Q                                          IF      ID
• Fine-grain dependency assistance?
• Other comments?
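The two policies (strict alternation versus issuing every clock from the alternate thread when one is stalled) can be compared with a small fetch-slot model. This is a sketch under stated assumptions: two threads named A and B, one fetch slot per cycle, and a hypothetical `stalled` map saying in which cycles a thread cannot fetch. It does not model the pipeline stages themselves.

```python
def issue_schedule(cycles, stalled, skip_stalled):
    """Return which thread gets the fetch slot in each cycle ('-' = wasted).

    stalled:      dict thread -> set of cycles in which it cannot fetch.
    skip_stalled: False = strict alternation, a stalled thread wastes its slot;
                  True  = a ready thread takes over the stalled thread's slot.
    """
    threads = ["A", "B"]
    schedule = []
    turn = 0
    for cycle in range(1, cycles + 1):
        preferred, other = threads[turn], threads[1 - turn]
        if cycle not in stalled.get(preferred, ()):
            schedule.append(preferred)
            turn = 1 - turn
        elif skip_stalled and cycle not in stalled.get(other, ()):
            schedule.append(other)  # turn is unchanged: stalled thread goes next
        else:
            schedule.append("-")
            if not skip_stalled:
                turn = 1 - turn
    return schedule


# Hypothetical stall: thread A cannot fetch in cycles 5 to 7.
strict = issue_schedule(8, {"A": {5, 6, 7}}, skip_stalled=False)
alternate = issue_schedule(8, {"A": {5, 6, 7}}, skip_stalled=True)
print(strict)
print(alternate)
```

With these assumed stall cycles, the second policy fetches A, B, A, B, B, B in cycles 1 to 6, the same thread order as instructions a, M, b, N, P, Q in the diagram.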
CPU Support for Fine Grain MT
[Figure: pipeline block diagram. Blocks: Inst Cache, Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic, Data Cache, Address Translation. Per-thread state: PCA and PCB, GPRsA and GPRsB, VA MappingA and VA MappingB.]
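One way to read the figure is that each hardware thread gets its own copy of the architectural state (PC, GPRs, VA mapping) while the caches and pipeline logic are shared. The sketch below models that split; the class names, the 32-register file and the 4096-byte page size are assumptions for illustration, not details from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """State that is duplicated per hardware thread (A and B in the figure)."""
    pc: int = 0
    gprs: list = field(default_factory=lambda: [0] * 32)  # 32 registers assumed
    va_mapping: dict = field(default_factory=dict)  # virtual page -> physical page


@dataclass
class Core:
    """Shared resources plus one ThreadContext per hardware thread."""
    contexts: dict = field(
        default_factory=lambda: {"A": ThreadContext(), "B": ThreadContext()}
    )
    inst_cache: dict = field(default_factory=dict)  # shared by both threads
    data_cache: dict = field(default_factory=dict)  # shared by both threads

    def fetch_address(self, thread, page_size=4096):
        """Translate the selected thread's PC using that thread's own mapping."""
        ctx = self.contexts[thread]
        page, offset = divmod(ctx.pc, page_size)
        return ctx.va_mapping[page] * page_size + offset


core = Core()
core.contexts["A"].pc = core.contexts["B"].pc = 0x1004
core.contexts["A"].va_mapping[1] = 7
core.contexts["B"].va_mapping[1] = 9
print(hex(core.fetch_address("A")), hex(core.fetch_address("B")))
```

Switching threads in this model is only a matter of selecting a different context, with nothing saved or restored, which is the property that lets the switch overhead approach zero.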
Simultaneous Multi-Threading
“permit different threads to occupy the same pipeline stage at the same time”
• This makes most sense with superscalar issue
[Figure: SMT pipeline block diagram. Blocks: Inst Cache, Fetch Logic, Decode + Registers, Inst Issue Logic, Mem Logic, Write Logic, Data Cache.]
Simultaneous MultiThreading
• Let’s look simply at instruction issue:
1 2 3 4 5 6 7 8 9 10
Inst a IF ID EX MEM WB
Inst b IF ID EX MEM WB
Inst M IF ID EX MEM WB
Inst N IF ID EX MEM WB
Inst c IF ID EX MEM WB
Inst P IF ID EX MEM WB
Inst Q IF ID EX MEM WB
Inst d IF ID EX MEM WB
Inst e IF ID EX MEM WB
Inst R IF ID EX MEM WB
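To make the issue behaviour concrete, here is a minimal sketch of filling superscalar issue slots from more than one thread in the same cycle. It assumes an issue width of 2, ignores data dependencies between instructions, and uses a hypothetical `blocked` map for cycles in which a thread cannot issue. The pairing it produces is one possible policy and is not claimed to match the slide.

```python
from collections import deque


def smt_issue(queues, width, blocked=None):
    """Return, for each cycle, the list of instructions issued in that cycle.

    queues:  dict thread -> instruction names in program order.
    blocked: dict thread -> set of cycles in which that thread cannot issue.
    Slots are filled round-robin across threads; if one thread cannot issue,
    the other threads may take all of the slots.
    """
    blocked = blocked or {}
    pending = {thread: deque(instrs) for thread, instrs in queues.items()}
    order = list(pending)
    issued = []
    cycle = 1
    while any(pending.values()):
        slots = []
        progress = True
        while len(slots) < width and progress:
            progress = False
            for thread in order:
                ready = pending[thread] and cycle not in blocked.get(thread, ())
                if len(slots) < width and ready:
                    slots.append(pending[thread].popleft())
                    progress = True
        issued.append(slots)
        order = order[1:] + order[:1]  # rotate priority between threads
        cycle += 1
    return issued


program = {"A": ["a", "b", "c", "d", "e"], "B": ["M", "N", "P", "Q", "R"]}
print(smt_issue(program, width=2))
print(smt_issue(program, width=2, blocked={"A": {2, 3}}))
```

In the second call thread A is blocked in cycles 2 and 3, and thread B takes both issue slots in those cycles, so no slot is wasted.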
SMT issues
• Asymmetric pipeline stall
  – One part of pipeline stalls – we want other pipeline to continue
• Overtaking – want unstalled thread to make progress
• Pipeline overcrowding – may need extra wide pipeline registers (why?)
• Existing implementations (mainly) on O-o-O, register renamed architectures
How Far Can SMT go?
• From Intel Core i7 description:
From Intel publication 248966-020
Core i7 Instruction Issue Logic
• Alternate clock cycles to alternate CPU threads
• Out-of-Order engine supports up to 128 uOps
SMT: Glimpse Into The Future?
• Scout threads?
  – A thread to prefetch memory – reduce cache miss overhead
• Speculative threads?
  – Allow a thread to execute speculatively way past branch/jump/call/miss/etc
  – Needs revised O-o-O logic
  – Needs extra memory support
  – See Transactional Memory
CPU Multithreading Summary
• A cost-effective way of finding additional parallelism for the CPU pipeline
• Available in x86, Itanium, Power and SPARC
• (Most architectures) Present additional CPU thread as additional CPU to Operating System
• Operating Systems Beware!!! (why?)
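The point about presenting each CPU thread as an additional CPU can be observed from user code. The snippet below is a small sketch using Python's standard `os.cpu_count()`, which reports logical CPUs; `logical_cpus` is just a hypothetical wrapper name.

```python
import os


def logical_cpus():
    """Number of logical CPUs the OS reports, or None if it cannot tell."""
    return os.cpu_count()


print("Logical CPUs visible to the OS:", logical_cpus())
```

On a machine with multithreaded cores this count typically includes every hardware thread, so it can be larger than the number of physical cores. A scheduler that treats all of them as equal, independent CPUs may place two busy threads on the same core.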
But…
• Performance problems with multithreading?
a) ………………..
b) ………………..
c) ………………..