Multi-core, Mega-nonsense
description
Transcript of Multi-core, Mega-nonsense
![Page 1: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/1.jpg)
Multi-core, Mega-nonsense
![Page 2: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/2.jpg)
Will multicore cure cancer?
• Given that multicore is a reality– …and we have quickly jumped from one core to 2 to 4 to 8– It is easy to let one’s imagination run wild – a million cores!
• A lot of misinformation has surfaced
• What multi-core is and what it is not
• And where we go from here
![Page 3: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/3.jpg)
To whet the appetite
• Can multi-core save power via the freq cube law?
• Is ILP dead?
• Should sample benchmarks drive future designs?
• Is hardware really sequential?
• Should multi-core structures be simple?
• Does productivity demand we ignore what’s below?
![Page 4: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/4.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 5: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/5.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 6: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/6.jpg)
How we got here (Moore’s Law)
• The first microprocessor (Intel 4004), 1971– 2300 transistors– 106 KHz
• The Pentium chip, 1992– 3.1 million transistors– 66 MHz
• Today– more than one billion transistors– Frequencies in excess of 5 GHz
• Tomorrow ?
![Page 7: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/7.jpg)
How have we used the available transistors?
Time
Nu
mb
er o
f T
ran
sist
ors
Cache
Microprocessor
![Page 8: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/8.jpg)
Intel Pentium M
![Page 9: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/9.jpg)
Intel Core 2 Duo
• Penryn, 2007• 45nm, 3MB L2
![Page 10: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/10.jpg)
Why Multi-core chips?
• In the beginning: a better and better uniprocessor– improving performance on the hard problems– …until it just got too hard
• Followed by: a uniprocessor with a bigger L2 cache– forsaking further improvement on the “hard” problems– poorly utilizing the chip area– and blaming the processor for not delivering performance
• Today: dual core, quad core, octo core
• Tomorrow: ???
![Page 11: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/11.jpg)
Why Multi-core chips?
• It is easier than designing a much better uni-core …and cheaper!
• It was embarrassing to continue making L2 bigger
• It was the next obvious step
![Page 12: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/12.jpg)
So, What’s the Point
• Yes, Multi-core is a reality
• No, it wasn’t a technological solution to performance improvement • Ergo, we do not have to accept it as is
• i.e., we can get it right the second time, and that means:
What goes on the chipWhat are the interfaces
![Page 13: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/13.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 14: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/14.jpg)
Hardware is the ultimate in parallelism!
It is NOT about cycle by cycle,
It is about what goes on in EACH cycle
![Page 15: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/15.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 16: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/16.jpg)
The Asymmetric Chip Multiprocessor (ACMP)
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Largecore
ACMP Approach
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
Niagara-likecore
“Niagara” Approach
Largecore
Largecore
Largecore
Largecore
“Tile-Large” Approach
![Page 17: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/17.jpg)
Large core vs. Small Core
• Out-of-order• Wide fetch e.g. 4-wide• Deeper pipeline• Aggressive branch
predictor (e.g. hybrid)• Many functional units• Trace cache• Memory dependence
speculation
• In-order• Narrow Fetch e.g. 2-
wide• Shallow pipeline• Simple branch predictor
(e.g. Gshare)• Few functional units
LargeCore
SmallCore
![Page 18: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/18.jpg)
0
1
2
3
4
5
6
7
8
9
0 0.2 0.4 0.6 0.8 1
Degree of Parallelism
Sp
eed
up
vs.
1 L
arg
e C
ore
NiagaraTile-LargeACMP
Throughput vs. Serial Performance
![Page 19: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/19.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 20: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/20.jpg)
Huh?
![Page 21: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/21.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 22: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/22.jpg)
ILP is dead
• We double the number of transistors on the chip– Pentium M: 77 Million transistors (50M for the L2 cache)– 2nd Generation: 140 Million (110M for the L2 cache)
• We see 5% improvement in IPC• Ergo: ILP is dead! • Perhaps we have blamed the wrong culprit.
• The EV4,5,6,7,8 data: from EV4 to EV8:– Performance improvement: 55X– Performance from frequency: 7X– Ergo: 55/7 > 7 -- more than half due to microarchitecture
![Page 23: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/23.jpg)
![Page 24: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/24.jpg)
Moore’s Law
• A law of physics• A law of process technology• A law of microarchitecture• A law of psychology
![Page 25: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/25.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 26: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/26.jpg)
Examine what is (rather than what can be)
Should sample benchmarks drive future designs?
Another bridge over the East River?
![Page 27: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/27.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 28: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/28.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 29: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/29.jpg)
“Abstraction” is Misunderstood
• Taxi to the airport• The Scheme Chip (Deeper understanding)• Sorting (choices)• Microsoft developers (Deeper understanding)
![Page 30: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/30.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 31: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/31.jpg)
Not all programmers are created equal
• Some want to just get their work done– Performance be damned– They could care less about how computers work
• Some want performance above all else– They understand how computers work– They can program at the lowest level
Ergo: At least two interfaces
![Page 32: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/32.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 33: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/33.jpg)
Thinking in Parallel is Hard
• Perhaps: Thinking is Hard
• How do we get people to believe:
Thinking in parallel is natural
![Page 34: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/34.jpg)
Parallel Programming is Hard?
• What if we start teaching parallel thinking
in the first course to freshmen
• For example:
– Factorial– Parallel search– Streaming
![Page 35: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/35.jpg)
Mega-nonsense
• Multi-core was a solution to a performance problem• Hardware works sequentially• Make the hardware simple – thousands of cores• Do in parallel at a slower clock and save power• ILP is dead• Examine what is (rather than what can be)• Communication: off-chip hard, on-chip easy• Abstraction is a pure good• Programmers are all dumb and need to be protected• Thinking in parallel is hard
![Page 36: Multi-core, Mega-nonsense](https://reader035.fdocuments.us/reader035/viewer/2022062314/568147ff550346895db532fb/html5/thumbnails/36.jpg)
!