outthink limits - HPC Advisory Council
outthink limits
Ingolf Wittmann, Technical Director, Leader of HPC
The Computer that Could Be Smarter than Us: Cognitive Computing
IBM HPC/HPDA
Unparalleled acceleration
2.5x Bandwidth
An industry first: POWER8 with NVIDIA NVLink delivers 2.5x the bandwidth to GPU accelerators compared to x86-based systems, letting you experience Kinetica at the speed it was intended to run.
Real-time results
10x Performance
With the unique capabilities of Tesla P100 + POWER8, Kinetica delivers 2.4x the performance of competing systems, enabling you to analyze and visualize large datasets in milliseconds instead of minutes or hours.
Kinetica - Unstructured Databases
POWER8 with NVLink system (Power Systems S822LC with 4 Tesla P100s): 188,852 queries per hour
PCIe x16 3.0 x86 system (Xeon E5-2640 v4 with 4 Tesla K80s): 73,320 queries per hour
What is Kinetica? Kinetica’s in-memory database powered by graphics processing units (GPUs) was built from the ground up to deliver truly real-time insights on data in motion: orders-of-magnitude faster performance (potentially 100x to 1,000x) at 10% to 25% of the cost of traditional data platforms.
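As a toy illustration of the workload shape (not Kinetica's actual API), the data-parallel column scan a GPU database accelerates can be sketched with NumPy's vectorized operations standing in for GPU kernels; table, column names, and sizes are assumptions:

```python
import numpy as np

# Hypothetical columnar table of 1 million trades; in-memory GPU
# databases store data column-wise so scans like this parallelize well.
n = 1_000_000
rng = np.random.default_rng(42)
price = rng.uniform(1.0, 500.0, n)
volume = rng.integers(1, 10_000, n)

# Equivalent of "SELECT SUM(volume) WHERE price > 400": one vectorized
# pass over the price column, the same shape of work a GPU database
# fans out across thousands of cores.
mask = price > 400.0
total_volume = int(volume[mask].sum())
print(total_volume)
```

The point is structural: a filtered aggregation is a single parallel pass over contiguous column data, which is exactly what GPU memory bandwidth rewards.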
What are the Key Markets?
• Retail: Inventory Mgt, BI, Apps, Big Data tools, HPA
• Distribution / Logistics: Supply Chain Mgt
• Financial Services: Fraud Detection, AML
• Ad-Tech: More Targeted Marketing
• IoT: End Point Management, RFID
65% reduction in data transfer time (3x improvement) for the Kinetica GPU-accelerated DB
• Less data-induced latency in all applications
• Unique to POWER8 with NVLink
• Less coding to compensate for slow data movement!
• 1.95x of the 2.5x overall performance improvement attributable to NVLink
Query time comparison (in ticks; * Calculation includes non-overlapping CPU, GPU, and idle times):

System                            Data Transfer   Calculation*   Total
Competing system, PCIe x16 3.0         73              27         100
S822LC for HPC, NVLink                 26              14          40

65% reduction in data transfer time
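The tick counts on this slide are internally consistent; a quick arithmetic check, assuming the numbers as shown:

```python
# Tick counts from the slide's query-time chart
pcie_transfer, pcie_calc = 73, 27
nvlink_transfer, nvlink_calc = 26, 14

# Data-transfer reduction and overall speedup
transfer_reduction = 1 - nvlink_transfer / pcie_transfer
overall_speedup = (pcie_transfer + pcie_calc) / (nvlink_transfer + nvlink_calc)

print(f"{transfer_reduction:.0%}")  # → 64% (the slide rounds to "65%")
print(f"{overall_speedup:.1f}x")    # → 2.5x overall
```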
Deep Learning - Getting HPC to ‘Work Smart Not Hard’
• Typically HPC development is focused on increased speed.
• The fastest calculation is the one which you don’t run!
• Can we use machine learning to make better decisions on which simulations give the most value?
• Can we use machine learning to improve resolution of information?
Cognitive steering of an ensemble of simulations
Application of cognitive techniques in HPC can overcome and go beyond the limits of Moore's law.
A cognitive-driven workflow uses 1/3 of the calculations to achieve a 4x resolution increase.
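One way to read "cognitive steering of an ensemble" is: train a cheap surrogate model on a pilot subset of completed simulations, then run only the candidates the surrogate predicts to be most valuable. The sketch below is a hedged illustration under assumed toy functions (a nearest-neighbor surrogate and a synthetic `expensive_simulation`), not IBM's actual workflow:

```python
import random

random.seed(1)

def expensive_simulation(x):
    # Stand-in for a costly HPC run; its value peaks near x = 0.7.
    return 1.0 - (x - 0.7) ** 2

# Candidate parameter settings for the ensemble.
candidates = [i / 99 for i in range(100)]

# Run a small pilot subset and fit a trivial surrogate:
# nearest-neighbor lookup on the pilot results.
pilot = candidates[::10]
pilot_results = {x: expensive_simulation(x) for x in pilot}

def surrogate(x):
    nearest = min(pilot_results, key=lambda p: abs(p - x))
    return pilot_results[nearest]

# Steer: run only the top third of remaining candidates by predicted value.
remaining = [x for x in candidates if x not in pilot_results]
remaining.sort(key=surrogate, reverse=True)
chosen = remaining[: len(remaining) // 3]

best = max(expensive_simulation(x) for x in chosen)
print(f"ran {len(pilot) + len(chosen)} of {len(candidates)} simulations, "
      f"best value {best:.4f}")
```

Here the workflow evaluates 40 of 100 candidates yet still finds a near-optimal run, which is the "fewer calculations, same insight" effect the slide claims.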
Cognitive landscape: terms and relationship (diagram): Artificial Intelligence & Cognitive Applications > Big Data > Machine Learning > Deep Learning (Neural Nets)
Deep Learning / AI Lexicon
• Artificial Intelligence > Machine Learning > Deep Learning
• Deep Learning = Training (datacenter, compute intensive) + Inference (edge, embedded… closer to user)
• Training = neural “inspired”, fed by millions of data points … repetition drives weighting and connection
• Platform = Frameworks + Supporting Libraries + Compute
• Compute = Acceleration + Extreme Bandwidth
• Desired outcome: higher accuracy in perceptive tasks, a model for inference
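The Training bullet's "repetition drives weighting" can be sketched as a minimal gradient-descent update; a toy single-weight example, not any specific framework:

```python
import random

# Toy training loop: learn y = 2*x with a single weight.
# Repeated exposure to data points nudges the weight ("repetition
# drives weighting"); inference is then just applying the learned weight.
w = 0.0
lr = 0.1
random.seed(0)
for _ in range(1000):
    x = random.uniform(-1.0, 1.0)        # one data point
    y_true = 2.0 * x
    y_pred = w * x                       # forward pass
    grad = 2.0 * (y_pred - y_true) * x   # d(squared error)/dw
    w -= lr * grad                       # weight update

print(round(w, 3))  # → 2.0
```

Training is the compute-intensive loop above; inference (the "edge, embedded" half of the bullet) is just the cheap forward pass `w * x` with the weights frozen.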
P8 vs. TrueNorth
Neuromorphic "Right Brain Computing"
(Chart comparing P8 and TrueNorth; scale: 1x / 20x / 100x / ∞)
http://research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=5o2q0UxHcFa
Deep Learning areas
“The general idea of deep learning is to use neural networks to build multiple layers of abstraction to solve a complex semantic problem.” - Aaron Chavez, IBM Watson
Voice assisted recognition
Image recognition
Fraud prevention
Critical environment investigation
Neuron Function (Architecture): TrueNorth Chip - Synapse Chip
o Emulation of analog behaviour by a +/- 255 INT variable
o 2-dimensional on-chip synaptic weighted network and off-chip packet-based thru-neuron routing for multi-chip scaling
o Update of the synaptic network every ms (logical / biological clock); internal processing ~1 MHz
o A neuron fires a spike (45 pJ) into the network if a threshold was reached or exceeded in the last update cycle
o Stochastic and leak behavior configurable
Membrane potential for neuron j at time t:

V_j(t) = V_j(t-1) + Σ (i = 0..255) A_i(t) · w_ij · [(1 - b_j) · s_j + sign(s_j) · b_j · F(|s_j|, ρ_j)] + Leak

Leak = Ω · [(1 - c_j) · λ_j + sign(λ_j) · c_j · F(|λ_j|, ρ_j)]

where:
V_j(t) - membrane potential {signed int}
A_i(t) - input spike {0,1}
w_ij - synapse matrix entry {0,1}
s_j - synaptic weight {signed int}
b_j - deterministic/stochastic select {0,1}
F - stochastic step function {0,1}; ρ_j - random number {uint}
λ_j - leak weight {signed int}; c_j - leak select {0,1}; Ω - leak sign factor
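A minimal software sketch of this kind of update rule, in deterministic mode (b_j = c_j = 0) with assumed threshold and reset values; an illustration of the integrate-and-fire idea, not the actual TrueNorth implementation:

```python
# Minimal leaky integrate-and-fire neuron in the spirit of the slide's
# update rule: V_j(t) = V_j(t-1) + sum_i A_i(t) * w_ij * s_j + leak,
# firing a spike when V_j reaches a threshold. Threshold, reset-to-zero,
# and the floor at 0 are assumptions for illustration.
def step(v, spikes_in, w, s, leak=-1, threshold=10):
    """One 1 ms tick for a single neuron j."""
    v = v + sum(a * wij for a, wij in zip(spikes_in, w)) * s + leak
    if v >= threshold:
        return 0, 1          # reset membrane potential, emit a spike
    return max(v, 0), 0      # clamp at 0 (assumed floor), no spike

v = 0
w = [1, 0, 1, 1]             # one row of the synapse matrix {0,1}
s = 3                        # synaptic weight {signed int}
out = []
for t in range(5):
    spikes_in = [1, 1, 0, 1] # axons 0 and 3 connect AND spike each tick
    v, fired = step(v, spikes_in, w, s)
    out.append(fired)
print(out)  # → [0, 1, 0, 1, 0]
```

Each tick integrates +5 (two connected spikes x weight 3, minus leak 1), so the neuron fires on every second update, mirroring the "fire if a threshold was reached in the last update cycle" behavior described above.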
Maintenance in 234 flights
Liquid Synapse - an Extreme Blue project
David Stöckel, Julian Heyne, Maximillian Löhr, Pascal Nieters
IBM Neurosynaptic System: Liquid State Machine
Liquid state machine with the TrueNorth (Synapse) chip
Sensor Data → Neural Network → Readout
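A liquid state machine keeps a fixed random recurrent "liquid" and trains only a linear readout on its states. The sketch below shows that data path (Sensor Data → liquid → Readout) with assumed sizes and random weights; it is not the TrueNorth configuration, and the readout here is an untrained random projection just to show the structure:

```python
import math
import random

random.seed(0)

N = 50  # liquid (reservoir) size; an assumed value for illustration

# Fixed random input and recurrent weights: the "liquid" is never trained.
W_in = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(N)]
W = [[random.uniform(-0.2, 0.2) for _ in range(N)] for _ in range(N)]

def liquid_states(inputs):
    """Drive the liquid with a sequence of 2-d sensor inputs; return final state."""
    x = [0.0] * N
    for u in inputs:
        x = [math.tanh(sum(W[i][j] * x[j] for j in range(N))
                       + sum(W_in[i][k] * u[k] for k in range(2)))
             for i in range(N)]
    return x

# Readout: in practice this linear layer is the ONLY trained component.
w_out = [random.uniform(-1, 1) for _ in range(N)]

sensor_seq = [(0.5, -0.2), (0.1, 0.9), (-0.4, 0.3)]  # toy sensor data
state = liquid_states(sensor_seq)
readout = sum(wi * xi for wi, xi in zip(w_out, state))
print(f"readout: {readout:.3f}")
```

The appeal for neuromorphic hardware is that the untrained liquid maps naturally onto a fixed spiking substrate like TrueNorth, leaving only the lightweight readout to be fitted in software.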
Accelerator Connection Bandwidths
POWER9 PowerAccel

State-of-the-art I/O and acceleration attachment signaling:
• PCIe Gen 4 x 48 lanes - 192 GB/s duplex bandwidth
• 25G Link x 48 lanes - 300 GB/s duplex bandwidth

Robust accelerated compute options with OPEN standards:
• On-chip acceleration - Gzip x1, 842 Compression x2, AES/SHA x2
• CAPI 2.0 - 4x the bandwidth of POWER8, using PCIe Gen 4
• NVLink 2.0 - next generation of GPU/CPU interconnect: up to 2x the bandwidth of NVLink 1.0; easier programming model for complex analytic & cognitive applications; coherency, virtual addressing, low-overhead communication
• OpenCAPI 3.0 - high bandwidth, low latency, and an open interface using the 25G Link
Extreme Processor / Accelerator Bandwidth and Reduced Latency
Coherent Memory and Virtual Addressing Capability for all Accelerators
OpenPOWER Community Enablement – Robust Accelerated Compute Options
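The bandwidth figures above check out arithmetically, assuming 25 Gb/s per lane for the 25G Link and ~16 GT/s per lane with 128b/130b encoding for PCIe Gen 4:

```python
# 25G Link: 48 lanes x 25 Gb/s in each direction
lanes = 48
gbps_per_lane = 25
per_direction_GBs = lanes * gbps_per_lane / 8   # bits -> bytes
duplex_GBs = 2 * per_direction_GBs
print(duplex_GBs)  # → 300.0 GB/s duplex, matching the slide

# PCIe Gen 4: 16 GT/s per lane, 128b/130b encoding -> ~2 GB/s/lane/direction
pcie_per_direction = lanes * 16 * (128 / 130) / 8
print(round(2 * pcie_per_direction))  # → 189 GB/s duplex (slide rounds to 192)
```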
Accelerators and today's Systems (diagram)
Today: Power + CAPI connecting FlashSystem, Mellanox, FPGA, NVIDIA
Tomorrow: Synapse Chip, FPGA, Quantum Chip
Today: focus on data, analytics, cognitive, and full-workflow HPC performance; heterogeneous compute.
Tomorrow: add improved cognitive capabilities, integration of new technologies (e.g. SyNAPSE, quantum computing), and seamless enablement of heterogeneous compute, on-premises and in the cloud.
Summary: Cognitive Computing in an HPC environment
Data-induced latency is an issue for every installation
Data-centric computing is our answer, pioneered by IBM Research
Acknowledged by our competitors, governments, customers
Minimize Data Motion - Data motion is expensive; hardware and software to support & enable compute in data
Enable Compute Everywhere - Allow workloads to run where they run best; introduce "active" system elements including network, memory, storage
Modularity - Balanced, composable architecture for Big Data & analytics, modeling, and simulation; modular and driven by accelerators, with an upgradeable design scalable from subrack to 100s of racks
Application-driven design - Use real workloads/workflows to drive design points; co-design for customer value
Cognitive - Home of AI, deep & machine learning, neuromorphic computing, and neural networks
Ingolf Wittmann, Diplom-Informatiker, Technical Director
IBM-Allee 1, D-71139 Ehningen (Mail: D-71137 Ehningen)
Phone: +49-7034-15-4881
Mobile: [email protected]
@ijwatHAL
https://www.facebook.com/ingolf.wittmann.7
de.linkedin.com/pub/ingolf-wittmann/27/189/132/