outthink limits - HPC Advisory Council
outthink limits
Ingolf Wittmann, Technical Director, Leader of HPC
The Computer that Could Be Smarter than Us: Cognitive Computing
IBM HPC/HPDA
Unparalleled acceleration
2.5x Bandwidth
An industry first: POWER8 with NVIDIA NVLink delivers 2.5x the bandwidth to GPU accelerators compared to x86-based systems, letting you experience Kinetica at the speed it was intended to run.
Real-time results
10x Performance
With the unique capabilities of Tesla P100 + POWER8, Kinetica delivers 2.4x the performance of competing systems, enabling you to analyze and visualize large datasets in milliseconds instead of minutes or hours.
Kinetica - Unstructured Databases
POWER8 with NVLink system (Power Systems S822LC with 4 Tesla P100s): 188,852 queries per hour
PCIe x16 3.0 x86 system (Xeon E5-2640 v4 with 4 Tesla K80s): 73,320 queries per hour
What is Kinetica? Kinetica’s in-memory database powered by graphics processing units (GPUs) was built from the ground up to deliver truly real-time insights on data in motion: orders-of-magnitude faster performance (potentially 100x to 1,000x) at 10% to 25% of the cost of traditional data platforms.
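As a toy illustration of the workload shape (not Kinetica's actual API), the data-parallel column scan a GPU database accelerates can be sketched with NumPy's vectorized operations standing in for GPU kernels; table, column names, and sizes are assumptions:

```python
import numpy as np

# Hypothetical columnar table of 1 million trades; in-memory GPU
# databases store data column-wise so scans like this parallelize well.
n = 1_000_000
rng = np.random.default_rng(42)
price = rng.uniform(1.0, 500.0, n)
volume = rng.integers(1, 10_000, n)

# Equivalent of "SELECT SUM(volume) WHERE price > 400": one vectorized
# pass over the price column, the same shape of work a GPU database
# fans out across thousands of cores.
mask = price > 400.0
total_volume = int(volume[mask].sum())
print(total_volume)
```

The point is structural: a filtered aggregation is a single parallel pass over contiguous column data, which is exactly what GPU memory bandwidth rewards.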
What are the Key Markets?
• Retail: Inventory Mgt, BI, Apps, Big Data tools, HPA
• Distribution / Logistics: Supply Chain Mgt
• Financial Services: Fraud Detection, AML
• Ad-Tech: More Targeted Marketing
• IoT: End Point Management, RFID
65% reduction in data transfer time (3x improvement) for the Kinetica GPU-accelerated DB
• Less data-induced latency in all applications
• Unique to POWER8 with NVLink
• Less coding to compensate for slow data movement!
• 1.95x of the 2.5x overall performance improvement attributable to NVLink
Query time comparison (in ticks; * Calculation includes non-overlapping CPU, GPU, and idle times):

System                            Data Transfer   Calculation*   Total
Competing system, PCIe x16 3.0         73              27         100
S822LC for HPC, NVLink                 26              14          40

65% reduction in data transfer time
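The tick counts on this slide are internally consistent; a quick arithmetic check, assuming the numbers as shown:

```python
# Tick counts from the slide's query-time chart
pcie_transfer, pcie_calc = 73, 27
nvlink_transfer, nvlink_calc = 26, 14

# Data-transfer reduction and overall speedup
transfer_reduction = 1 - nvlink_transfer / pcie_transfer
overall_speedup = (pcie_transfer + pcie_calc) / (nvlink_transfer + nvlink_calc)

print(f"{transfer_reduction:.0%}")  # → 64% (the slide rounds to "65%")
print(f"{overall_speedup:.1f}x")    # → 2.5x overall
```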
Deep Learning - Getting HPC to ‘Work Smart Not Hard’
• Typically HPC development is focused on increased speed.
• The fastest calculation is the one which you don’t run!
• Can we use machine learning to make better decisions on which simulations give the most value?
• Can we use machine learning to improve resolution of information?
Cognitive steering of an ensemble of simulations
Application of cognitive techniques in HPC can overcome and go beyond the limits of Moore's law.
A cognitive-driven workflow uses 1/3 of the calculations to achieve a 4x resolution increase.
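One way to read "cognitive steering of an ensemble" is: train a cheap surrogate model on a pilot subset of completed simulations, then run only the candidates the surrogate predicts to be most valuable. The sketch below is a hedged illustration under assumed toy functions (a nearest-neighbor surrogate and a synthetic `expensive_simulation`), not IBM's actual workflow:

```python
import random

random.seed(1)

def expensive_simulation(x):
    # Stand-in for a costly HPC run; its value peaks near x = 0.7.
    return 1.0 - (x - 0.7) ** 2

# Candidate parameter settings for the ensemble.
candidates = [i / 99 for i in range(100)]

# Run a small pilot subset and fit a trivial surrogate:
# nearest-neighbor lookup on the pilot results.
pilot = candidates[::10]
pilot_results = {x: expensive_simulation(x) for x in pilot}

def surrogate(x):
    nearest = min(pilot_results, key=lambda p: abs(p - x))
    return pilot_results[nearest]

# Steer: run only the top third of remaining candidates by predicted value.
remaining = [x for x in candidates if x not in pilot_results]
remaining.sort(key=surrogate, reverse=True)
chosen = remaining[: len(remaining) // 3]

best = max(expensive_simulation(x) for x in chosen)
print(f"ran {len(pilot) + len(chosen)} of {len(candidates)} simulations, "
      f"best value {best:.4f}")
```

Here the workflow evaluates 40 of 100 candidates yet still finds a near-optimal run, which is the "fewer calculations, same insight" effect the slide claims.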
Cognitive landscape: terms and relationship (diagram): Artificial Intelligence & Cognitive Applications > Big Data > Machine Learning > Deep Learning (Neural Nets)
Deep Learning / AI Lexicon
• Artificial Intelligence > Machine Learning > Deep Learning
• Deep Learning = Training (datacenter, compute intensive) + Inference (edge, embedded… closer to user)
• Training = neural “inspired”, fed by millions of data points … repetition drives weighting and connection
• Platform = Frameworks + Supporting Libraries + Compute
• Compute = Acceleration + Extreme Bandwidth
• Desired outcome: higher accuracy in perceptive tasks, a model for inference
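The Training bullet's "repetition drives weighting" can be sketched as a minimal gradient-descent update; a toy single-weight example, not any specific framework:

```python
import random

# Toy training loop: learn y = 2*x with a single weight.
# Repeated exposure to data points nudges the weight ("repetition
# drives weighting"); inference is then just applying the learned weight.
w = 0.0
lr = 0.1
random.seed(0)
for _ in range(1000):
    x = random.uniform(-1.0, 1.0)        # one data point
    y_true = 2.0 * x
    y_pred = w * x                       # forward pass
    grad = 2.0 * (y_pred - y_true) * x   # d(squared error)/dw
    w -= lr * grad                       # weight update

print(round(w, 3))  # → 2.0
```

Training is the compute-intensive loop above; inference (the "edge, embedded" half of the bullet) is just the cheap forward pass `w * x` with the weights frozen.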
P8 vs. TrueNorth
Neuromorphic "Right Brain Computing"
(Chart comparing P8 and TrueNorth; scale: 1x / 20x / 100x / ∞)
http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=5o2q0UxHcFa
Deep Learning areas
“The general idea of deep learning is to use neural networks to build multiple layers of abstraction to solve a complex semantic problem.” - Aaron Chavez, IBM Watson
Voice assisted recognition
Image recognition
Fraud prevention
Critical environment investigation
Neuron Function (Architecture): TrueNorth Chip - Synapse Chip
o Emulation of analog behaviour by a +/- 255 INT variable
o 2-dimensional on-chip synaptic weighted network and off-chip packet-based thru-neuron routing for multi-chip scaling
o Update of the synaptic network every ms (logical / biological clock); internal processing ~1 MHz
o A neuron fires a spike (45 pJ) into the network if a threshold was reached or exceeded in the last update cycle
o Stochastic and leak behavior configurable
Membrane potential for neuron j at time t:

V_j(t) = V_j(t-1) + Σ (i = 0..255) A_i(t) · w_ij · [(1 - b_j) · s_j + sign(s_j) · b_j · F(|s_j|, ρ_j)] + Leak

Leak = Ω · [(1 - c_j) · λ_j + sign(λ_j) · c_j · F(|λ_j|, ρ_j)]

where:
V_j(t) - membrane potential {signed int}
A_i(t) - input spike {0,1}
w_ij - synapse matrix entry {0,1}
s_j - synaptic weight {signed int}
b_j - deterministic/stochastic select {0,1}
F - stochastic step function {0,1}; ρ_j - random number {uint}
λ_j - leak weight {signed int}; c_j - leak select {0,1}; Ω - leak sign factor
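A minimal software sketch of this kind of update rule, in deterministic mode (b_j = c_j = 0) with assumed threshold and reset values; an illustration of the integrate-and-fire idea, not the actual TrueNorth implementation:

```python
# Minimal leaky integrate-and-fire neuron in the spirit of the slide's
# update rule: V_j(t) = V_j(t-1) + sum_i A_i(t) * w_ij * s_j + leak,
# firing a spike when V_j reaches a threshold. Threshold, reset-to-zero,
# and the floor at 0 are assumptions for illustration.
def step(v, spikes_in, w, s, leak=-1, threshold=10):
    """One 1 ms tick for a single neuron j."""
    v = v + sum(a * wij for a, wij in zip(spikes_in, w)) * s + leak
    if v >= threshold:
        return 0, 1          # reset membrane potential, emit a spike
    return max(v, 0), 0      # clamp at 0 (assumed floor), no spike

v = 0
w = [1, 0, 1, 1]             # one row of the synapse matrix {0,1}
s = 3                        # synaptic weight {signed int}
out = []
for t in range(5):
    spikes_in = [1, 1, 0, 1] # axons 0 and 3 connect AND spike each tick
    v, fired = step(v, spikes_in, w, s)
    out.append(fired)
print(out)  # → [0, 1, 0, 1, 0]
```

Each tick integrates +5 (two connected spikes x weight 3, minus leak 1), so the neuron fires on every second update, mirroring the "fire if a threshold was reached in the last update cycle" behavior described above.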
Maintenance in 234 flights
Liquid Synapse - an Extreme Blue project
David Stöckel, Julian Heyne, Maximillian Löhr, Pascal Nieters
IBM Neurosynaptic System: Liquid State Machine
Liquid state machine with the TrueNorth (Synapse) chip
Sensor Data → Neural Network → Readout
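A liquid state machine keeps a fixed random recurrent "liquid" and trains only a linear readout on its states. The sketch below shows that data path (Sensor Data → liquid → Readout) with assumed sizes and random weights; it is not the TrueNorth configuration, and the readout here is an untrained random projection just to show the structure:

```python
import math
import random

random.seed(0)

N = 50  # liquid (reservoir) size; an assumed value for illustration

# Fixed random input and recurrent weights: the "liquid" is never trained.
W_in = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(N)]
W = [[random.uniform(-0.2, 0.2) for _ in range(N)] for _ in range(N)]

def liquid_states(inputs):
    """Drive the liquid with a sequence of 2-d sensor inputs; return final state."""
    x = [0.0] * N
    for u in inputs:
        x = [math.tanh(sum(W[i][j] * x[j] for j in range(N))
                       + sum(W_in[i][k] * u[k] for k in range(2)))
             for i in range(N)]
    return x

# Readout: in practice this linear layer is the ONLY trained component.
w_out = [random.uniform(-1, 1) for _ in range(N)]

sensor_seq = [(0.5, -0.2), (0.1, 0.9), (-0.4, 0.3)]  # toy sensor data
state = liquid_states(sensor_seq)
readout = sum(wi * xi for wi, xi in zip(w_out, state))
print(f"readout: {readout:.3f}")
```

The appeal for neuromorphic hardware is that the untrained liquid maps naturally onto a fixed spiking substrate like TrueNorth, leaving only the lightweight readout to be fitted in software.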
Accelerator Connection Bandwidths
POWER9 PowerAccel

State-of-the-art I/O and acceleration attachment signaling:
• PCIe Gen 4 x 48 lanes - 192 GB/s duplex bandwidth
• 25G Link x 48 lanes - 300 GB/s duplex bandwidth

Robust accelerated compute options with OPEN standards:
• On-chip acceleration - Gzip x1, 842 Compression x2, AES/SHA x2
• CAPI 2.0 - 4x the bandwidth of POWER8, using PCIe Gen 4
• NVLink 2.0 - next generation of GPU/CPU interconnect: up to 2x the bandwidth of NVLink 1.0; easier programming model for complex analytic & cognitive applications; coherency, virtual addressing, low-overhead communication
• OpenCAPI 3.0 - high bandwidth, low latency, and an open interface using the 25G Link
Extreme Processor / Accelerator Bandwidth and Reduced Latency
Coherent Memory and Virtual Addressing Capability for all Accelerators
OpenPOWER Community Enablement – Robust Accelerated Compute Options
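The bandwidth figures above check out arithmetically, assuming 25 Gb/s per lane for the 25G Link and ~16 GT/s per lane with 128b/130b encoding for PCIe Gen 4:

```python
# 25G Link: 48 lanes x 25 Gb/s in each direction
lanes = 48
gbps_per_lane = 25
per_direction_GBs = lanes * gbps_per_lane / 8   # bits -> bytes
duplex_GBs = 2 * per_direction_GBs
print(duplex_GBs)  # → 300.0 GB/s duplex, matching the slide

# PCIe Gen 4: 16 GT/s per lane, 128b/130b encoding -> ~2 GB/s/lane/direction
pcie_per_direction = lanes * 16 * (128 / 130) / 8
print(round(2 * pcie_per_direction))  # → 189 GB/s duplex (slide rounds to 192)
```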
Accelerators and today's Systems (diagram)
Today: Power + CAPI connecting FlashSystem, Mellanox, FPGA, NVIDIA
Tomorrow: Synapse Chip, FPGA, Quantum Chip
Today: focus on data, analytics, cognitive, and full-workflow HPC performance; heterogeneous compute.
Tomorrow: add improved cognitive capabilities, integration of new technologies (e.g. SyNAPSE, quantum computing), and seamless enablement of heterogeneous compute, on-premises and in the cloud.
Summary: Cognitive Computing in an HPC environment
Data-induced latency is an issue for every installation
Data-centric computing is our answer, pioneered by IBM Research
Acknowledged by our competitors, governments, customers
Minimize Data Motion - Data motion is expensive; hardware and software to support & enable compute in data
Enable Compute Everywhere - Allow workloads to run where they run best; introduce "active" system elements including network, memory, storage
Modularity - Balanced, composable architecture for Big Data & analytics, modeling, and simulation; modular and driven by accelerators, with an upgradeable design scalable from subrack to 100s of racks
Application-driven design - Use real workloads/workflows to drive design points; co-design for customer value
Cognitive - Home of AI, deep & machine learning, neuromorphic computing, and neural networks
Ingolf Wittmann, Diplom-Informatiker, Technical Director
IBM-Allee 1, D-71139 Ehningen (Mail: D-71137 Ehningen)
Phone: +49-7034-15-4881
Mobile: [email protected]
@ijwatHAL
https://www.facebook.com/ingolf.wittmann.7
de.linkedin.com/pub/ingolf-wittmann/27/189/132/