Deep Learning –a cookbook view - NVIDIAon-demand.gputechconf.com/gtc-eu/2017/presentation/... ·...
Transcript of Deep Learning –a cookbook view - NVIDIAon-demand.gputechconf.com/gtc-eu/2017/presentation/... ·...
Deep Learning – a cookbook viewor “Comparative Analysis of Different Deep Learning Solutions “
Deep Learning– Unsupervised training– Generic code– Pattern recognition
Systems can– Observe– Test – Refine
Successes– AlphaGO First Computer GO program to
beat a human – Deep Face Facial verification– Libratus AI Poker App– Digital virtual assistants Siri– Google Self-driving cars
The evolution of artificial intelligence
21940 – 1980
Early artificial intelligence– ENIAC Heralded the Giant Brain“;
used for WW II ballistics– Industrial robots
1990 – 2000s
Machine learning– Deep Blue Beating World Chess
Champion Kasparov– DARPA Challenge Autonomous
vehicle drove 132 miles
Today
Small data sets
Massive structured data sets
Massive unstructured big data
Statistical and mathematical models applied to solve problems Advanced Analytics and Heuristic Predictive models defined by
machines based neural networks
Traditional machine learningRequires feature engineering
3
Training Data Machine learning algorithmFeature engineering
Data Learned model (prediction function)Feature extraction Prediction
Training
Prediction
DeepLearning
MachineLearning
Artificial Intelligence
HPC
Deep learningEfficient data representations, no more feature engineering
4
Training Data Deep learning algorithm
DataLearned model
(transformation and prediction function)
Prediction
Training
Prediction (inference)
Types of artificial neural networksTopology to fit data characteristics
5
Input HiddenLayer 1
HiddenLayer 2 Output Input Hidden
Layer 1HiddenLayer 2 Output
Convolutional: Images
Fully connected: Speech, text, sensor
Recurrent: Speech, text, sensor
Input HiddenLayer 1
Output
Terminology
6
Training dataTrue labels
House
Batch
Model
PredictionsFlower
ErrorsIteration
Epoch
Weak scalingWorker 1Worker 1 Worker 2Worker 1
Strong scaling
Worker 2
‒ Search & information extraction
‒ Security/Video surveillance
‒ Self-driving cars‒ Medical imaging‒ Robotics
‒ Interactive voice response (IVR) systems
‒ Voice interfaces (Mobile, Cars, Gaming, Home)
‒ Security (speaker identification)
‒ Health care‒ People with disabilities
‒ Search and ranking‒ Sentiment analysis‒ Machine translation‒ Question answering
‒ Recommendation engines
‒ Advertising‒ Fraud detection‒ AI challenges‒ Drug discovery‒ Sensor data analysis‒ Diagnostic support
Why deep learning?Applications
7
TextVision Speech Other
Applications break down
8
DetectionLook for a known object/pattern
ClassificationAssign a label from a predefined set of labels
GenerationGenerate content
Anomaly detectionLook for abnormal, unknown patterns
Images
Video
Text
Sensor
Other
Speech
Video surveillance
Speech recognition
Sentiment analysis
Predictive maintenance
Fraud detection
Image analysis
How an individual customer’s AI evolves
9
ExploreHow can AI help me?
ExperimentHow can I get started?
Scale up and OptimizeHow can I scale and optimize?
Do things better– Product development– Customer experience– Productivity– Employee experience
Do new things– New disruptions
Boundary constraints(regulations, etc.)
DataData model? Location?How to create a model? – Homegrown solution or open source?– Simple ML or scalable DL?
DesignHow to design and deploy the PoC?– On-prem, cloud?– How to think about inference
PerformanceWhat is the best config to run?How to tune the model to improve accuracy?
Provisioning for inference
Infrastructure scale up– Training– Inference– On-prem / cloud / hybrid
Data management– Between edge and core– Security– Updates– Regulations– Tracing
Key IT challenges are constraining deep learning adoptionLimited knowledge, resources and capabilities
10
How to get started? How to go to production? How to optimize?
“I need simple, infrastructure and software capabilities to rapidly and efficiently support deep learning app development.”
“I could use more expert advice and tailored solutions for migrating and integrating apps in a production environment.”
“I need help integrating the latest technologies into my deep learning environment to accelerate actionable insights.”
Immature, sub-optimalfoundation
Lack of technologyintegration capabilities
Inability to scale and integrate
Content under embargo until Oct 10, 2017
What about AI consumers ?
Do it yourself
Current wave of AI / Machine Learning is core to their business. All in-house
Google, Baidu, Facebook,Microsoft, Apple, etc.
How do I do it ?
Could benefit from better data science, machine learning, but it is not historically their core-competency
Banks, advertisers, healthcare, manufacturing, food, automotive, etc.
I know better
Super-Experts – current wave is woefully inadequate
Government – DoD, DoE, NSA, NASA, etc.
Not ready for an ASIC. Don’t know what they need exactly. Many still developing on CPUs. Can’t use solutions that can’t be verified or understood
Begging for higher performance ASICs. Know exactly what they want to do. Strong technology pull.
Where to start ? Recommend DL stack by vertical application
12
Infrastructure
Frameworks
Typical layers
Data type
Data
ManufacturingVerticals Oil & gas Connected carsVoice interfaces Social media
Speech Images Sensor dataVideo
Small Moderate Large
Convolutional Fully-connected Recurrent
TensorFlow Caffe 2 CNTK …
x86 GPUs FPGAs TPU ? …
…
Torch
Neural Network sits here
Neural Network : Popular Networks
13
Network Model size(# params)
Model size(MB)
GFLOPs(forward pass)
AlexNet 60,965,224 233 MB 0.7
GoogleNet 6,998,552 27 MB 1.6
VGG-16 138,357,544 528 MB 15.5
VGG-19 143,667,240 548 MB 19.6
ResNet50 25,610,269 98 MB 3.9
ResNet101 44,654,608 170 MB 7.6
ResNet152 60,344,387 230 MB 11.3
Today’s scaleModel size, data size, compute requirements
Application Model Training data FLOPs per epoch
Vision 1.7 * 109
~6.8 GB14*106 images~2.5 TB (256x256)~10 TB (512x512)
6*1.7*109*14*106
~1.4*1017
Speech 60 * 106
~240 MB100K hours of audio~34*109 frames~50 TB
6*60*106*34*109
~1.2*1019
Text 6.5 * 106
~260 MB856*106 words 6*6.5*106*856*106
~3.3*1016
Signals 1.2 * 106
~4.8 MB3*106 frames 6*1.2*3*106*3*106
6.5*1013
Today’s hardwareModel size, data size, compute requirements
Application Model Training data FLOPs per epoch
Vision 1.7 * 109
~6.8 GB14*106 images~2.5 TB (256x256)~10 TB (512x512)
6*1.7*109*14*106
~1.4*1017
1 epoch per hour: ~39 TFLOPS
Today’s hardware:Google TPU2: 180 TFLOPS Tensor ops (FP16 ??)NVIDIA Tesla V100: 15 TFLOPS SP (30 TFLOPS FP16 , 120 TFLOPS Tensor ops), 12 GB memoryNVIDIA Tesla P100: 10.6 TFLOPS SP, 16 GB memoryNVIDIA Tesla K40: 4.29 TFLOPS SP, 12 GB memoryNVIDIA Tesla K80: 5.6 TFLOPS SP (8.74 TFLOPS SP with GPU boost), 24 GB memoryINTEL Xeon Phi: 2.4 TFLOPS SP
Superdome X: ~21 TFLOPS SP, 24 TB memory
Hardware
Software
So what to recommend?
16
Building performance models
17
TensorFlow
Tensor RT
Caffe 2
Alex Net
GoogleNet
VGG-16, VGG -19
ResNet 50, 101,152
Eng Acoustic Model
Worker 1
Strong scaling
Worker 2
Weak scaling
Worker 1 Worker 2
Hardware
Populated with 8 GPUs
Scalable, automated real-time intelligence
BVLC Caffe
TensorFlow – Weak Scaling – Training – Different models perfromancein Tensor Flow . Scaling up to 8 GPUs
18
0
1
2
3
4
5
6
7
8
1 2 4 8
Speedup for up to 8 GPUs
DeepMNIST EngAcousticModel GoogleNet ResNet101 ResNet152ResNet50 SensorNet VGG16 VGG19
TensorFlow - Inference ( Inferences per Second) - Different Models witth different Batch numbers
19
0
50000
100000
150000
200000
250000
300000
350000
1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192
DeepMNIST
1 2 4 8
0500
10001500200025003000350040004500
1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192
GoogleNet
1 2 4 8
0200400600800
10001200140016001800
1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192
ResNet50
1 2 4 8
0100200300400500600700800900
1000
1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192
VGG19
1 2 4 8
HOW TO ANALYZE ALL THE DIFFERENT NUMBERS .
AS WE ADD MORE OPTIONS and MORE TECHNOLOGIES IT WOULD BE IMPOSSIBLE TO USE
HPE demystifies deep learning for faster intelligence across all organizationsNew IT expertise, blueprints and technologies to get started, scale, integrate and optimize
20
Get started rapidly: Develop deep learning models
Scale and Integrate: Deliver attractive returns
Optimize Environment: Enhance competitive advantage
IT expertise and solutionsto “get started” with deep learning models
Proven blueprints and servicesfor “scalable” production deployments
Technology integration capabilities to maximize performance
Expertise− Rapid technology selection guides− State of the art trainingSolutions− Integrated purpose-built solutions− Out of the box solutions
Integration capabilities− Enhanced global Centers of Excellence− Next gen technology integration
Proven Blueprints− Reference Architectures− Innovation labs for best practicesServices− Deploy, integrate and support − Flexible, on-demand capacity
Select ideal technology configurationswith HPE Deep Learning Cookbook
21
“Book of recipes” for deep learning workloads
Expert advice to get you started
Availability of complete toolset
− Comprehensive tool set based on extensive benchmarking
− Includes 11 workloads with 8 DL frameworks and 8 HPE hardware systems
− Estimates workload performance and recommends an optimal HW/SW stack for that workload
− Informed decision making - optimal hardware and software configurations
− Eliminates the “guesswork” - validated methodology and data
− Improves efficiency - detects bottlenecks in deep learning workloads
− Deep Learning Benchmarking Suite: available on GitHub Dec 2018
− Deep Learning Performance Analysis Tool: planned to be released in the beginning of 2018.
− Reference configurations: available soon on HPE.com website
Get Started
Deep Learning Cookbook helps to pick the right HW/SW stack
22
Benchmarking Suite
• Benchmarking scripts• Reference models• Performance metrics
Knowledgebase
• 11 reference models• 8 frameworks• 8 hardware systems
Performance results
Performance and scalability models
• Machine learning (SVR)to predict performanceof core operations
• Analytical communication models
• Analytical models for overall performance
• Performance results• Performance prediction for arbitrary
ANNs• Scalability prediction• Optimal HW/SW configuration
for a given workload
Reporting tool
Reference configurations• Image classification• Others to come
will be available externally internal assets
23
Thank youNatalia [email protected]
Sergey [email protected]
24
Sorin [email protected]
Bruno [email protected]