Deep Learning –a cookbook view - NVIDIAon-demand.gputechconf.com/gtc-eu/2017/presentation/... ·...

Deep Learning – a cookbook viewor “Comparative Analysis of Different Deep Learning Solutions “

Deep Learning– Unsupervised training– Generic code– Pattern recognition

Systems can– Observe– Test – Refine

Successes– AlphaGO First Computer GO program to

beat a human – Deep Face Facial verification– Libratus AI Poker App– Digital virtual assistants Siri– Google Self-driving cars

The evolution of artificial intelligence

21940 – 1980

Early artificial intelligence– ENIAC Heralded the Giant Brain“;

used for WW II ballistics– Industrial robots

1990 – 2000s

Machine learning– Deep Blue Beating World Chess

Champion Kasparov– DARPA Challenge Autonomous

vehicle drove 132 miles

Today

Small data sets

Massive structured data sets

Massive unstructured big data

Statistical and mathematical models applied to solve problems Advanced Analytics and Heuristic Predictive models defined by

machines based neural networks

Traditional machine learningRequires feature engineering

3

Training Data Machine learning algorithmFeature engineering

Data Learned model (prediction function)Feature extraction Prediction

Training

Prediction

DeepLearning

MachineLearning

Artificial Intelligence

HPC

Deep learningEfficient data representations, no more feature engineering

4

Training Data Deep learning algorithm

DataLearned model

(transformation and prediction function)

Prediction

Training

Prediction (inference)

Types of artificial neural networksTopology to fit data characteristics

5

Input HiddenLayer 1

HiddenLayer 2 Output Input Hidden

Layer 1HiddenLayer 2 Output

Convolutional: Images

Fully connected: Speech, text, sensor

Recurrent: Speech, text, sensor

Input HiddenLayer 1

Output

Terminology

6

Training dataTrue labels

House

Batch

Model

PredictionsFlower

ErrorsIteration

Epoch

Weak scalingWorker 1Worker 1 Worker 2Worker 1

Strong scaling

Worker 2

‒ Search & information extraction

‒ Security/Video surveillance

‒ Self-driving cars‒ Medical imaging‒ Robotics

‒ Interactive voice response (IVR) systems

‒ Voice interfaces (Mobile, Cars, Gaming, Home)

‒ Security (speaker identification)

‒ Health care‒ People with disabilities

‒ Search and ranking‒ Sentiment analysis‒ Machine translation‒ Question answering

‒ Recommendation engines

‒ Advertising‒ Fraud detection‒ AI challenges‒ Drug discovery‒ Sensor data analysis‒ Diagnostic support

Why deep learning?Applications

7

TextVision Speech Other

Applications break down

8

DetectionLook for a known object/pattern

ClassificationAssign a label from a predefined set of labels

GenerationGenerate content

Anomaly detectionLook for abnormal, unknown patterns

Images

Video

Text

Sensor

Other

Speech

Video surveillance

Speech recognition

Sentiment analysis

Predictive maintenance

Fraud detection

Image analysis

How an individual customer’s AI evolves

9

ExploreHow can AI help me?

ExperimentHow can I get started?

Scale up and OptimizeHow can I scale and optimize?

Do things better– Product development– Customer experience– Productivity– Employee experience

Do new things– New disruptions

Boundary constraints(regulations, etc.)

DataData model? Location?How to create a model? – Homegrown solution or open source?– Simple ML or scalable DL?

DesignHow to design and deploy the PoC?– On-prem, cloud?– How to think about inference

PerformanceWhat is the best config to run?How to tune the model to improve accuracy?

Provisioning for inference

Infrastructure scale up– Training– Inference– On-prem / cloud / hybrid

Data management– Between edge and core– Security– Updates– Regulations– Tracing

Key IT challenges are constraining deep learning adoptionLimited knowledge, resources and capabilities

10

How to get started? How to go to production? How to optimize?

“I need simple, infrastructure and software capabilities to rapidly and efficiently support deep learning app development.”

“I could use more expert advice and tailored solutions for migrating and integrating apps in a production environment.”

“I need help integrating the latest technologies into my deep learning environment to accelerate actionable insights.”

Immature, sub-optimalfoundation

Lack of technologyintegration capabilities

Inability to scale and integrate

Content under embargo until Oct 10, 2017

What about AI consumers ?

Do it yourself

Current wave of AI / Machine Learning is core to their business. All in-house

Google, Baidu, Facebook,Microsoft, Apple, etc.

How do I do it ?

Could benefit from better data science, machine learning, but it is not historically their core-competency

Banks, advertisers, healthcare, manufacturing, food, automotive, etc.

I know better

Super-Experts – current wave is woefully inadequate

Government – DoD, DoE, NSA, NASA, etc.

Not ready for an ASIC. Don’t know what they need exactly. Many still developing on CPUs. Can’t use solutions that can’t be verified or understood

Begging for higher performance ASICs. Know exactly what they want to do. Strong technology pull.

Where to start ? Recommend DL stack by vertical application

12

Infrastructure

Frameworks

Typical layers

Data type

Data

ManufacturingVerticals Oil & gas Connected carsVoice interfaces Social media

Speech Images Sensor dataVideo

Small Moderate Large

Convolutional Fully-connected Recurrent

TensorFlow Caffe 2 CNTK …

x86 GPUs FPGAs TPU ? …

…

Torch

Neural Network sits here

Neural Network : Popular Networks

13

Network Model size(# params)

Model size(MB)

GFLOPs(forward pass)

AlexNet 60,965,224 233 MB 0.7

GoogleNet 6,998,552 27 MB 1.6

VGG-16 138,357,544 528 MB 15.5

VGG-19 143,667,240 548 MB 19.6

ResNet50 25,610,269 98 MB 3.9

ResNet101 44,654,608 170 MB 7.6

ResNet152 60,344,387 230 MB 11.3

Today’s scaleModel size, data size, compute requirements

Application Model Training data FLOPs per epoch

Vision 1.7 * 109

~6.8 GB14*106 images~2.5 TB (256x256)~10 TB (512x512)

6*1.7*109*14*106

~1.4*1017

Speech 60 * 106

~240 MB100K hours of audio~34*109 frames~50 TB

6*60*106*34*109

~1.2*1019

Text 6.5 * 106

~260 MB856*106 words 6*6.5*106*856*106

~3.3*1016

Signals 1.2 * 106

~4.8 MB3*106 frames 6*1.2*3*106*3*106

6.5*1013

Today’s hardwareModel size, data size, compute requirements

Application Model Training data FLOPs per epoch

Vision 1.7 * 109

~6.8 GB14*106 images~2.5 TB (256x256)~10 TB (512x512)

6*1.7*109*14*106

~1.4*1017

1 epoch per hour: ~39 TFLOPS

Today’s hardware:Google TPU2: 180 TFLOPS Tensor ops (FP16 ??)NVIDIA Tesla V100: 15 TFLOPS SP (30 TFLOPS FP16 , 120 TFLOPS Tensor ops), 12 GB memoryNVIDIA Tesla P100: 10.6 TFLOPS SP, 16 GB memoryNVIDIA Tesla K40: 4.29 TFLOPS SP, 12 GB memoryNVIDIA Tesla K80: 5.6 TFLOPS SP (8.74 TFLOPS SP with GPU boost), 24 GB memoryINTEL Xeon Phi: 2.4 TFLOPS SP

Superdome X: ~21 TFLOPS SP, 24 TB memory

Hardware

Software

So what to recommend?

16

Building performance models

17

TensorFlow

Tensor RT

Caffe 2

Alex Net

GoogleNet

VGG-16, VGG -19

ResNet 50, 101,152

Eng Acoustic Model

Worker 1

Strong scaling

Worker 2

Weak scaling

Worker 1 Worker 2

Hardware

Populated with 8 GPUs

Scalable, automated real-time intelligence

BVLC Caffe

TensorFlow – Weak Scaling – Training – Different models perfromancein Tensor Flow . Scaling up to 8 GPUs

18

0

1

2

3

4

5

6

7

8

1 2 4 8

Speedup for up to 8 GPUs

DeepMNIST EngAcousticModel GoogleNet ResNet101 ResNet152ResNet50 SensorNet VGG16 VGG19

TensorFlow - Inference ( Inferences per Second) - Different Models witth different Batch numbers

19

0

50000

100000

150000

200000

250000

300000

350000

1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192

DeepMNIST

1 2 4 8

0500

10001500200025003000350040004500

1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192

GoogleNet

1 2 4 8

0200400600800

10001200140016001800

1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192

ResNet50

1 2 4 8

0100200300400500600700800900

1000

1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192

VGG19

1 2 4 8

HOW TO ANALYZE ALL THE DIFFERENT NUMBERS .

AS WE ADD MORE OPTIONS and MORE TECHNOLOGIES IT WOULD BE IMPOSSIBLE TO USE

HPE demystifies deep learning for faster intelligence across all organizationsNew IT expertise, blueprints and technologies to get started, scale, integrate and optimize

20

Get started rapidly: Develop deep learning models

Scale and Integrate: Deliver attractive returns

Optimize Environment: Enhance competitive advantage

IT expertise and solutionsto “get started” with deep learning models

Proven blueprints and servicesfor “scalable” production deployments

Technology integration capabilities to maximize performance

Expertise− Rapid technology selection guides− State of the art trainingSolutions− Integrated purpose-built solutions− Out of the box solutions

Integration capabilities− Enhanced global Centers of Excellence− Next gen technology integration

Proven Blueprints− Reference Architectures− Innovation labs for best practicesServices− Deploy, integrate and support − Flexible, on-demand capacity

Select ideal technology configurationswith HPE Deep Learning Cookbook

21

“Book of recipes” for deep learning workloads

Expert advice to get you started

Availability of complete toolset

− Comprehensive tool set based on extensive benchmarking

− Includes 11 workloads with 8 DL frameworks and 8 HPE hardware systems

− Estimates workload performance and recommends an optimal HW/SW stack for that workload

− Informed decision making - optimal hardware and software configurations

− Eliminates the “guesswork” - validated methodology and data

− Improves efficiency - detects bottlenecks in deep learning workloads

− Deep Learning Benchmarking Suite: available on GitHub Dec 2018

− Deep Learning Performance Analysis Tool: planned to be released in the beginning of 2018.

− Reference configurations: available soon on HPE.com website

Get Started

Deep Learning Cookbook helps to pick the right HW/SW stack

22

Benchmarking Suite

• Benchmarking scripts• Reference models• Performance metrics

Knowledgebase

• 11 reference models• 8 frameworks• 8 hardware systems

Performance results

Performance and scalability models

• Machine learning (SVR)to predict performanceof core operations

• Analytical communication models

• Analytical models for overall performance

• Performance results• Performance prediction for arbitrary

ANNs• Scalability prediction• Optimal HW/SW configuration

for a given workload

Reporting tool

Reference configurations• Image classification• Others to come

will be available externally internal assets

Thank youNatalia [email protected]

Sergey [email protected]

24

Sorin [email protected]

Bruno [email protected]

Deep Learning –a cookbook view - NVIDIAon-demand.gputechconf.com/gtc-eu/2017/presentation/... ·...

Documents

Transcript of Deep Learning –a cookbook view - NVIDIAon-demand.gputechconf.com/gtc-eu/2017/presentation/... ·...