COMP25212 SYSTEM ARCHITECTURE Antoniu Pop [email protected] Jan/Feb 2015 COMP25212 Lecture 1.
COMP25212 CPU Multi Threading
![Page 1: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/1.jpg)
COMP25212 CPU Multi Threading
• Learning Outcomes: to be able to:
– Describe the motivation for multithread support in CPU hardware
– Distinguish the benefits and implementations of coarse grain, fine grain and simultaneous multithreading
– Explain when multithreading is inappropriate
– Describe a multithreading implementation
– Estimate the performance of these implementations
– State important assumptions of this performance model
![Page 2: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/2.jpg)
Revision: Increasing CPU Performance
[Figure: pipelined CPU with Fetch Logic, Decode Logic, Exec Logic, Mem Logic and Write Logic stages, an Inst Cache, a Data Cache and a Clock; parts of the diagram are labelled a to f.]
How can throughput be increased?
![Page 3: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/3.jpg)
Increasing CPU Performance
a) By increasing clock frequency
b) By increasing Instructions per Clock
c) Minimizing memory access impact – data cache
d) Maximising Inst issue rate – branch prediction
e) Maximising Inst issue rate – superscalar
f) Maximising pipeline utilisation – avoid instruction dependencies – out of order execution
g) (What does lengthening pipeline do?)
![Page 4: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/4.jpg)
Increasing Program Parallelism
– Keep issuing instructions after branch?
– Keep processing instructions after cache miss?
– Process instructions in parallel?
– Write register while previous write pending?
• Where can we find additional independent instructions?
– In a different program!
![Page 5: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/5.jpg)
Revision – Process States
[Figure: process state diagram. States: New; Ready (waiting for a CPU); Running on a CPU; Blocked (waiting for event); Terminated. Transition labels: Dispatch (scheduler); Needs to wait (e.g. I/O); I/O occurs; Pre-empted (e.g. timer).]
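The state diagram can be read as a small transition table. The following is a minimal Python sketch, not part of the original slides. It assumes the usual pairing of arcs with states (the transcript does not say which arc joins which states), and the event names `admit` and `exit` are hypothetical labels for the two arcs the slide leaves unnamed.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"            # waiting for a CPU
    RUNNING = "running"        # running on a CPU
    BLOCKED = "blocked"        # waiting for an event
    TERMINATED = "terminated"


# (current state, event) -> next state.
# "dispatch", "wait", "event" and "preempt" correspond to the slide's labels;
# "admit" and "exit" are assumed names for the unlabelled arcs.
TRANSITIONS = {
    (State.NEW, "admit"): State.READY,
    (State.READY, "dispatch"): State.RUNNING,    # Dispatch (scheduler)
    (State.RUNNING, "wait"): State.BLOCKED,      # Needs to wait (e.g. I/O)
    (State.BLOCKED, "event"): State.READY,       # I/O occurs
    (State.RUNNING, "preempt"): State.READY,     # Pre-empted (e.g. timer)
    (State.RUNNING, "exit"): State.TERMINATED,
}


def step(state, event):
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state.name}") from None


if __name__ == "__main__":
    s = State.NEW
    for ev in ("admit", "dispatch", "wait", "event", "dispatch", "exit"):
        s = step(s, ev)
        print(ev, "->", s.name)
```

Note that a blocked process never goes straight back to Running in this table: it must become Ready and wait to be dispatched again.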
![Page 6: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/6.jpg)
Revision – Process Control Block
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
![Page 7: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/7.jpg)
Revision: CPU Switch
[Figure: CPU switch timeline with columns Process P0, Process P1 and Operating System. Labels on the Operating System actions: Save state into PCB0; Load state from PCB1; Save state into PCB0; Load state from PCB1.]
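The save/load pattern in the CPU switch can be sketched in software terms. This is an illustrative Python model, not an OS implementation: the `PCB` and `CPU` classes, the choice of eight general registers, and the decision to treat PC, stack pointer, general registers and memory management info as the state that is copied are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    """Hypothetical process control block holding only the register-like state."""
    pid: int
    state: str = "ready"
    pc: int = 0
    sp: int = 0
    gprs: list = field(default_factory=lambda: [0] * 8)
    mm_info: int = 0  # e.g. a page-table base address (assumed representation)


@dataclass
class CPU:
    """Hypothetical CPU with a single set of architectural state."""
    pc: int = 0
    sp: int = 0
    gprs: list = field(default_factory=lambda: [0] * 8)
    mm_info: int = 0


def save_state(cpu, pcb):
    """Copy the CPU's state into the PCB of the process being descheduled."""
    pcb.pc, pcb.sp, pcb.mm_info = cpu.pc, cpu.sp, cpu.mm_info
    pcb.gprs = list(cpu.gprs)
    pcb.state = "ready"


def load_state(cpu, pcb):
    """Copy the PCB's saved state into the CPU for the process being dispatched."""
    cpu.pc, cpu.sp, cpu.mm_info = pcb.pc, pcb.sp, pcb.mm_info
    cpu.gprs = list(pcb.gprs)
    pcb.state = "running"


def context_switch(cpu, old, new):
    """Save the outgoing process, then load the incoming one."""
    save_state(cpu, old)
    load_state(cpu, new)


if __name__ == "__main__":
    cpu = CPU(pc=100, sp=500)
    pcb0 = PCB(pid=0, state="running")
    pcb1 = PCB(pid=1, pc=200, sp=900)
    context_switch(cpu, pcb0, pcb1)
    print(cpu.pc, pcb0.pc, pcb1.state)
```

Every switch in this model copies the whole register set twice, which is the software cost that hardware multithreading support aims to avoid.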
![Page 8: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/8.jpg)
What does CPU load on dispatch?
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
![Page 9: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/9.jpg)
What does CPU need to store on deschedule?
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
![Page 10: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/10.jpg)
CPU Support for Multithreading
[Figure: pipeline (Inst Cache, Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic, Data Cache) with per-thread state duplicated: PC A and PC B; GPRs A and GPRs B; VA Mapping A and VA Mapping B, alongside Address Translation.]
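The duplicated state in the figure can be modelled as two hardware contexts sharing one pipeline. The sketch below is a hypothetical Python model, not a description of any real CPU: the names `HwContext` and `MultithreadedCore`, the eight registers, and the dictionary used as a stand-in for the VA mapping are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HwContext:
    """State the figure shows duplicated per hardware thread."""
    pc: int = 0
    gprs: list = field(default_factory=lambda: [0] * 8)
    va_mapping: dict = field(default_factory=dict)  # virtual page -> physical page


class MultithreadedCore:
    """Two hardware contexts (A and B) in front of one shared pipeline."""

    def __init__(self):
        self.contexts = [HwContext(), HwContext()]
        self.active = 0  # index of the context currently feeding the pipeline

    def current(self):
        return self.contexts[self.active]

    def switch_thread(self):
        # No save or load: both contexts stay resident, only the selector changes.
        self.active = 1 - self.active

    def translate(self, virtual_page):
        # Address translation uses the mapping of whichever thread is active.
        return self.current().va_mapping[virtual_page]


if __name__ == "__main__":
    core = MultithreadedCore()
    core.contexts[0].va_mapping[1] = 40
    core.contexts[1].va_mapping[1] = 77
    print(core.translate(1))
    core.switch_thread()
    print(core.translate(1))
```

The point of the model is that `switch_thread` copies nothing, so a switch can be far cheaper than an operating system context switch.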
![Page 11: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/11.jpg)
How Should OS View Extra Hardware Thread?
• A variety of solutions
• Simplest is probably to declare extra CPU
• Need multiprocessor-aware OS
![Page 12: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/12.jpg)
CPU Support for Multithreading
[Figure: pipeline (Inst Cache, Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic, Data Cache) with per-thread state duplicated: PC A and PC B; GPRs A and GPRs B; VA Mapping A and VA Mapping B, alongside Address Translation.]
Design Issue: when to switch threads
![Page 13: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/13.jpg)
Coarse-Grain Multithreading
• Switch Thread on “expensive” operation:
– E.g. I-cache miss
– E.g. D-cache miss
• Some are easier than others!
![Page 14: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/14.jpg)
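Switch Threads on Icache miss

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Inst a | IF | ID | EX | MEM | WB | | |
| Inst b | | IF | ID | EX | MEM | WB | |
| Inst c | | | IF MISS | ID | EX | MEM | WB |
| Inst d | | | | IF | ID | EX | MEM |
| Inst e | | | | | IF | ID | EX |
| Inst f | | | | | | IF | ID |

Other labels on the slide: Inst X, Inst Y, Inst Z; "- - - -".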
![Page 15: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/15.jpg)
Performance of Coarse Grain
• Assume (conservatively)
– 1 GHz clock (1 ns clock tick!), 20 ns memory (= 20 clocks)
– 1 i-cache miss per 100 instructions
– 1 instruction per clock otherwise
• Then, time to execute 100 instructions without multithreading
– 100 + 20 clock cycles
– Inst per Clock = 100 / 120 = 0.83
• With multithreading: time to exec 100 instructions:
– 100 [+ 1]
– Inst per Clock = 100 / 101 = 0.99
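The arithmetic above can be checked with a few lines of Python. This is a sketch of the slide's model only: the function name `inst_per_clock` is made up for the example, and it inherits the model's assumption that another thread is always ready to run, so only the switch cycle is lost on a miss.

```python
def inst_per_clock(instructions, extra_cycles):
    """Instructions per clock when `extra_cycles` are lost for every
    `instructions` executed at one instruction per clock."""
    return instructions / (instructions + extra_cycles)


MEMORY_CYCLES = 20  # 20 ns memory at a 1 GHz clock
SWITCH_CYCLES = 1   # the bracketed "[+ 1]" on the slide

without_mt = inst_per_clock(100, MEMORY_CYCLES)  # stall for the whole miss
with_mt = inst_per_clock(100, SWITCH_CYCLES)     # switch thread instead

print(f"without multithreading: {without_mt:.2f}")  # prints 0.83
print(f"with multithreading:    {with_mt:.2f}")     # prints 0.99
```

Changing `MEMORY_CYCLES` shows how the gap widens as memory gets slower relative to the clock.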
![Page 16: COMP25212 CPU Multi Threading](https://reader036.fdocuments.us/reader036/viewer/2022062407/56812b8d550346895d8fa953/html5/thumbnails/16.jpg)
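Switch Threads on Dcache miss

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Inst a | IF | ID | EX | M-Miss | WB | | |
| Inst b | | IF | ID | EX | MEM | WB | |
| Inst c | | | IF | ID | EX | MEM | WB |
| Inst d | | | | IF | ID | EX | MEM |
| Inst e | | | | | IF | ID | EX |
| Inst f | | | | | | IF | ID |

Other labels on the slide: MISS MISS MISS; three rows of "- - -"; Inst X; Inst Y; "Abort these" (twice).

Performance: similar calculation (STATE ASSUMPTIONS!)
Where to restart after memory cycle? I suggest instruction “a” – why?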
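One way to set up the "similar calculation" is sketched below in Python, with every assumption stated. None of these numbers come from the slides: the D-cache miss rate of 1 per 100 instructions is hypothetical, and the number of pipeline slots lost to aborted instructions (`flush_cycles`) is left as a parameter because it depends on where the miss is detected and where execution restarts.

```python
def ipc_stalling(insts_per_miss, memory_cycles):
    """No multithreading: the pipeline waits for memory on every miss."""
    return insts_per_miss / (insts_per_miss + memory_cycles)


def ipc_with_switch(insts_per_miss, flush_cycles):
    """Coarse-grain multithreading: on a miss, abort the instructions already
    in the pipeline (costing `flush_cycles` issue slots) and run another thread.
    Assumes another thread is always ready and memory returns before we switch back."""
    return insts_per_miss / (insts_per_miss + flush_cycles)


INSTS_PER_DMISS = 100  # ASSUMPTION: 1 D-cache miss per 100 instructions
MEMORY_CYCLES = 20     # same 20-clock memory as the I-cache example

print(f"stall on miss: {ipc_stalling(INSTS_PER_DMISS, MEMORY_CYCLES):.3f}")
for flush in (1, 2, 3, 4):  # ASSUMPTION: candidate numbers of lost issue slots
    print(f"switch, {flush} slots lost: {ipc_with_switch(INSTS_PER_DMISS, flush):.3f}")
```

Under these assumptions the switch costs more than in the I-cache case, because the miss is discovered later in the pipeline and younger instructions have already been fetched.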