The Future of ARM in HPC (meseec.ce.rit.edu/756-projects/spring2014/1-1.pdf)
The Future of ARM in HPC
Devrin Talen and James Coddington
Background
● The shift from RISC to x86 supercomputers was driven by cost, availability and performance
● Are the same conditions now present for a shift from x86 to ARM systems?
Rise and Fall of RISC and x86
Source: Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?, 2013
Performance Trends of ARM and x86
Source: Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?, 2013
Motivation
● Energy consumption is now the limiting factor for performance
● ARM processors, originally designed for mobile and embedded systems, are very energy efficient
● This makes them an attractive choice for a power-efficient HPC cluster
Introducing ARM
● Energy-efficient RISC processor
● Architecture developed by ARM Holdings and licensed to manufacturers
● Dominates the large mobile market and is starting to reach the server market
● Cheap!
○ $3.40 for a 32-bit microprocessor running Linux
Quick Comparison of ARM vs. x86

| | ARM | x86 |
|---|---|---|
| Processor | Texas Instruments OMAP4430 (dual-core Cortex-A9) | Intel Core 2 Quad Q9400 |
| Year released | 2011 | 2008 |
| Process | 45 nm | 45 nm |
| Cores | 2 | 4 |
| Clock speed | 1 GHz | 2.66 GHz |
| Memory | 1 GB DDR2 | 8 GB DDR2 |
| Network | 100 Mbps | 1000 Mbps |
| TDP | 1.9 W | 95 W |
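As a rough illustration of the efficiency gap in the comparison table, the sketch below divides aggregate clock rate (cores x GHz) by TDP for each chip. This "aggregate GHz per watt" metric is an assumption made purely for illustration: it ignores instructions per cycle, SIMD width and memory bandwidth, and TDP is a thermal design limit rather than measured power, so the numbers say nothing definitive about delivered performance.

```python
# Spec-sheet values from the ARM vs. x86 comparison table.
# "Aggregate clock" (cores x GHz) is a crude proxy assumed for this sketch;
# it ignores IPC, SIMD width and memory bandwidth, so it is NOT a performance measure.
specs = {
    "ARM (OMAP4430)": {"cores": 2, "ghz": 1.0, "tdp_w": 1.9},
    "x86 (Q9400)": {"cores": 4, "ghz": 2.66, "tdp_w": 95.0},
}


def aggregate_ghz_per_watt(cores: int, ghz: float, tdp_w: float) -> float:
    """Cores x clock (GHz) divided by thermal design power (W)."""
    return cores * ghz / tdp_w


ratios = {name: aggregate_ghz_per_watt(**s) for name, s in specs.items()}
# How many times larger the x86 thermal envelope is than the ARM one.
tdp_ratio = specs["x86 (Q9400)"]["tdp_w"] / specs["ARM (OMAP4430)"]["tdp_w"]

for name, r in ratios.items():
    print(f"{name}: {r:.3f} aggregate GHz per TDP watt")
print(f"x86 TDP is {tdp_ratio:.0f}x the ARM TDP")
```

On this crude proxy the ARM part comes out roughly nine times ahead per watt, while its absolute clock budget (2 GHz vs. 10.64 GHz) is about five times smaller, which mirrors the slides' point that ARM trades raw compute for efficiency.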
Performance
Web server performance measured with three different page sizes: 30 KB, 50 KB and 100 KB.
Source: Energy- and Cost-Efficiency Analysis of ARM-Based Clusters, 2012
Performance
Energy consumption (inversely proportional to performance) of a simulated ARM system. Energy drops, and performance rises, as bottlenecks in the system are identified and addressed.
Source: On Understanding the Energy Consumption of ARM-based Multicore Servers, 2013
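The claim that energy is inversely proportional to performance holds under a simplifying assumption: that average power draw stays roughly constant while bottlenecks are removed. Using notation introduced here for illustration, a fixed workload of $W$ operations run at performance $R$ (operations per second) with average power $\bar{P}$ takes time $t = W/R$, so:

```latex
E = \bar{P}\,t = \bar{P}\,\frac{W}{R}
\qquad\Longrightarrow\qquad
E \propto \frac{1}{R} \quad \text{(for fixed } W \text{ and } \bar{P}\text{)}
```

If removing a bottleneck also changes $\bar{P}$ (for example, by keeping more cores busy), the relationship is only approximate.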
Performance
● ARM processors can’t (yet) compete with x86 processors in raw compute power
● Depending on how much energy consumption matters and how much performance is needed, they may be a viable alternative
● ARM processors are getting more powerful and x86 processors are getting more energy efficient
DARPA Exascale Computing
● Started in 2008 as a challenge to reach an exaflop of computing power
● Design goals
○ Finish by 2018
○ Achieve 1 exaflop within 20 MW
● Limitations found
○ Power
○ Cost
Limited by Power Requirements
● Exascale goal: 20 MW, 50 GFLOPS/W
● Current technology: about 200 MW for an exaflop
● Most efficient supercomputer
○ TSUBAME-KFC - LX 1U-4GPU/104Re-1G Cluster
○ Tokyo Institute of Technology
○ 4,503.17 MFLOPS/W (4.5 GFLOPS/W)
○ Heterogeneous system with Intel processors and NVIDIA GPUs
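The power figures above are linked by simple arithmetic: an exaflop is $10^{18}$ FLOPS, so a 20 MW budget forces 50 GFLOPS/W, and running at TSUBAME-KFC's efficiency would need roughly ten times that budget. A minimal sketch of that calculation, with helper names chosen for illustration:

```python
EXAFLOP = 1e18  # floating-point operations per second
TSUBAME_GFLOPS_PER_W = 4503.17 / 1000  # 4,503.17 MFLOPS/W expressed in GFLOPS/W


def power_mw(flops: float, gflops_per_watt: float) -> float:
    """Power in MW needed to sustain `flops` at the given efficiency."""
    watts = flops / (gflops_per_watt * 1e9)
    return watts / 1e6


def required_efficiency(flops: float, budget_mw: float) -> float:
    """Efficiency in GFLOPS/W needed to reach `flops` within `budget_mw`."""
    return flops / (budget_mw * 1e6) / 1e9


target = required_efficiency(EXAFLOP, 20)          # exascale goal
tsubame = power_mw(EXAFLOP, TSUBAME_GFLOPS_PER_W)  # best 2013 efficiency, scaled up

print(f"Needed efficiency: {target:.0f} GFLOPS/W")
print(f"1 EFLOPS at TSUBAME-KFC efficiency: {tsubame:.0f} MW")
print(f"Efficiency gap: {target / TSUBAME_GFLOPS_PER_W:.1f}x")
```

This prints 50 GFLOPS/W, about 222 MW and an 11.1x gap. The 222 MW estimate is the same order as the 200 MW quoted for current technology, assuming efficiency stays flat when scaling up, which real systems rarely achieve.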
Attempts to Lower Power
● BlueGene (custom PowerPC processor)
● Other RISC machines
● Heterogeneous architectures
○ Xeon Phi co-processor: 4 GFLOPS/W
○ NVIDIA K20X co-processor: 18 GFLOPS/W
Current ARM Clusters
● Tibidabo (2011)
○ Proof-of-concept ARM-based supercomputer
○ 256 NVIDIA Tegra 2 nodes
○ 512 GFLOPS @ 3.4 kW (0.15 GFLOPS/W)
● Mont-Blanc (2014-2016)
○ Currently 810 processor modules with Samsung Exynos 5 SoCs, 4 GB DDR3 and 1 Gb Ethernet
○ 26 TFLOPS @ 18 kW (1.44 GFLOPS/W)
○ Plans to scale higher
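The efficiency figures for Tibidabo and Mont-Blanc come from dividing peak FLOPS by power draw. A minimal sketch that reproduces them and, as an added comparison, measures how far each machine falls short of the 50 GFLOPS/W exascale target:

```python
EXASCALE_TARGET = 50.0  # GFLOPS/W, from the goal of 1 EFLOPS within 20 MW

# (GFLOPS, kW) as quoted for each cluster; 26 TFLOPS is written as 26,000 GFLOPS.
clusters = {
    "Tibidabo": (512.0, 3.4),
    "Mont-Blanc": (26_000.0, 18.0),
}


def gflops_per_watt(gflops: float, kw: float) -> float:
    """Energy efficiency: GFLOPS divided by power in watts."""
    return gflops / (kw * 1000.0)


for name, (gflops, kw) in clusters.items():
    eff = gflops_per_watt(gflops, kw)
    print(f"{name}: {eff:.2f} GFLOPS/W, {EXASCALE_TARGET / eff:.0f}x short of target")
```

Tibidabo comes out at 0.15 GFLOPS/W (about 330x short) and Mont-Blanc at 1.44 GFLOPS/W (about 35x short), consistent with the slide's conclusion that ARM clusters are not yet competitive on their own.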
Tibidabo
● 256 nodes, 1 Gb links
● 9 routers with 48 ports each
● 48-ary tree network
○ Network diameter: 4
○ Average distance: 3.7
○ Bisection width: 1
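Network diameter and average distance are hop counts between compute nodes, found here by breadth-first search over the switch graph. The slides do not give Tibidabo's exact wiring, so the layout below is a hypothetical assumption: one root switch connected to 8 leaf switches with 32 nodes each, which matches the 9 switches and 256 nodes but may not match the real machine.

```python
from collections import deque
from itertools import combinations

# Hypothetical wiring (an assumption, not taken from the slides): one root
# switch connected to 8 leaf switches, each leaf serving 32 compute nodes.
# That is 9 switches and 256 nodes, each switch using at most 48 ports.


def build_two_level_tree(leaves: int, nodes_per_leaf: int) -> dict[str, list[str]]:
    """Return an undirected adjacency list for the assumed two-level tree."""
    adj: dict[str, list[str]] = {"root": []}
    for s in range(leaves):
        leaf = f"sw{s}"
        adj[leaf] = ["root"]
        adj["root"].append(leaf)
        for n in range(nodes_per_leaf):
            node = f"n{s}_{n}"
            adj[node] = [leaf]
            adj[leaf].append(node)
    return adj


def hops_from(adj: dict[str, list[str]], src: str) -> dict[str, int]:
    """Breadth-first search: link count from `src` to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


adj = build_two_level_tree(leaves=8, nodes_per_leaf=32)
compute = [v for v in adj if v.startswith("n")]
dist = {u: hops_from(adj, u) for u in compute}
# Distances between every distinct pair of compute nodes (switches excluded).
pair_hops = [dist[a][b] for a, b in combinations(compute, 2)]

print("diameter:", max(pair_hops))  # 4
print("average distance:", round(sum(pair_hops) / len(pair_hops), 2))  # 3.76
```

Nodes on the same leaf are 2 hops apart and all others are 4 hops apart (node, leaf, root, leaf, node), giving a diameter of 4. The average of about 3.76 is close to the slide's 3.7; the gap could come from a different node-to-switch assignment or a different averaging convention.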
Mont-Blanc
● Level 1: 810 nodes, 1 Gb links
● Level 2: 15-port switch, 10 Gb links
● Level 3: 9-port switch, 10 Gb links
● Top level: 6-port switch
● Fat-tree network
○ Network diameter: 6
○ Average distance: 5
○ Bisection width: 3
Conclusions
● ARM is not there yet in terms of performance
● The gap could be closed by heterogeneous systems that mix ARM SoCs with GPUs
● The huge mobile market drives ARM cost down and performance up
● ARM could be the future of HPC
Questions?
Backup Slide - Energy Requirements
Source: Building Supercomputers from Commodity Embedded Chips, 2014