© 2015 The MathWorks, Inc.
Deploying Deep Neural Networks to Embedded GPUs and CPUs
Dr Rishu Gupta
Senior Application Engineer
Deep Learning Workflow in MATLAB
Deep Neural Network Design + Training → Trained DNN
Application Design: Application logic
Standalone Deployment
Deep Neural Network Design and Training
▪ Design in MATLAB
▪ Manage large data sets
▪ Automate data labeling
▪ Easy access to models
▪ Training in MATLAB
▪ Acceleration with GPUs
▪ Scale to clusters
Paths to a trained DNN: train in MATLAB, model importer, transfer learning from a reference model
Application Design
Pre-processing
Post-processing
Application logic
Embedded
Multi-Platform Deep Learning Deployment
Algorithm Design to Embedded Deployment Workflow
1. DNN Design & Train (Desktop GPU)
2. Application Design (Desktop GPU, C++)
3. Deployment integration-test (Embedded GPU, C++)
4. Real-time test
High-level language
Deep learning framework
Large, complex software stack
Challenges
• Integrating multiple libraries and packages
• Verifying and maintaining multiple implementations
• Algorithm & vendor lock-in
C/C++
Low-level APIs
Application-specific libraries
C/C++
Target-optimized libraries
Optimize for memory & speed
Solution: Use MATLAB Coder & GPU Coder for Deep Learning Deployment
Diagram: Application logic → GPU Coder and MATLAB Coder → Target Libraries (NVIDIA TensorRT & cuDNN Libraries, Intel MKL-DNN Library, ARM Compute Library)
Musashi Seimitsu Industry Co., Ltd.: Detect Abnormalities in Automotive Parts
Automated visual inspection of 1.3 million bevel gears per month
MATLAB use in project:
▪ Preprocessing of captured images
▪ Image annotation for training
▪ Deep learning based analysis
– Various transfer learning methods (combinations of CNN models, classifiers)
– Estimation of defect area using Class Activation Map (CAM)
– Abnormality/defect classification
▪ Deployment to NVIDIA Jetson using GPU Coder
Deep Learning Deployment Workflows
INTEGRATED APPLICATION DEPLOYMENT: pre-processing and post-processing around the trained DNN → codegen → portable target code
INFERENCE ENGINE DEPLOYMENT: trained DNN → cnncodegen → portable target code
Workflow for Inference Engine Deployment
Steps for inference engine deployment
1. Generate the code for the trained model:
   >> cnncodegen(net, 'targetlib', 'arm-compute')
2. Copy the generated code onto the target board
3. Build the code for the inference engine:
   >> make -C ./codegen -f …mk
4. Use a hand-written main function to call the inference engine
5. Generate the executable and test it:
   >> make -C ./ ……
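As a rough illustration of step 4, the sketch below shows the shape a hand-written entry point might take: set up the engine, run one prediction, pick the top-scoring class, and clean up. `StubEngine`, its `setup`/`predict`/`cleanup` methods, and `classify` are hypothetical stand-ins invented for this sketch; the slides do not show the class or method names that `cnncodegen` actually emits, so check the generated headers before adapting it.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the generated inference engine. The real class
// name and method signatures come from the generated headers and are not
// shown on the slide.
struct StubEngine {
    bool ready = false;

    void setup() { ready = true; }     // a real engine would load weights here
    void cleanup() { ready = false; }  // a real engine would free buffers here

    // Stub forward pass: three fixed class scores scaled by the input mean,
    // only so that the sketch runs without a real network.
    std::vector<float> predict(const std::vector<float>& input) const {
        float sum = 0.0f;
        for (float v : input) sum += v;
        const float mean =
            input.empty() ? 0.0f : sum / static_cast<float>(input.size());
        return {0.1f * mean, 0.7f * mean, 0.2f * mean};
    }
};

// Body of a typical hand-written entry point: set up, predict, take the
// index of the highest score, clean up.
template <typename Engine>
std::size_t classify(Engine& engine, const std::vector<float>& image) {
    engine.setup();
    const std::vector<float> scores = engine.predict(image);
    engine.cleanup();
    const auto best = std::max_element(scores.begin(), scores.end());
    return static_cast<std::size_t>(std::distance(scores.begin(), best));
}
```

A real `main` would read an image from a file or camera, pass it to something like `classify`, and print or act on the returned class index.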
Deep Learning Inference Deployment
Pedestrian Detection
How is the performance?
Performance of Generated Code
▪ CNN inference (ResNet-50, VGG-16, Inception V3) on Titan V GPU
▪ CNN inference (ResNet-50) on Jetson TX2
▪ CNN inference (ResNet-50, VGG-16, Inception V3) on Intel Xeon CPU
Single Image Inference on Titan V using cuDNN
Compared: PyTorch (1.0.0), MXNet (1.4.0), GPU Coder (R2019a), TensorFlow (1.13.0)
Test setup: Intel® Xeon® CPU 3.6 GHz; NVIDIA libraries: CUDA 10, cuDNN 7; frameworks: TensorFlow 1.13.0, MXNet 1.4.0, PyTorch 1.0.0
Single Image Inference on Jetson TX2
Network: ResNet-50
Compared: GPU Coder + TensorRT, TensorFlow + TensorRT
Test setup: NVIDIA libraries: CUDA 9, cuDNN 7, TensorRT 3.0.4; framework: TensorFlow 1.12.0
CPU Performance
CPU, Single Image Inference (Linux)
Compared: MATLAB, TensorFlow, MXNet, MATLAB Coder, PyTorch
Test setup: Intel® Xeon® CPU 3.6 GHz; frameworks: TensorFlow 1.6.0, MXNet 1.2.1, PyTorch 0.3.1
Brief Summary
DNN libraries are great for inference, …
MATLAB Coder and GPU Coder generate code that takes advantage of:
▪ NVIDIA® CUDA libraries, including TensorRT & cuDNN
▪ Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN)
▪ ARM® Compute libraries for mobile platforms
But applications require more than just inference.
Deep Learning Workflows: Integrated Application Deployment
Pre-processing + post-processing → codegen → portable target code
Lane and Object Detection using YOLO v2
Network: AlexNet-based YOLO v2
Pipeline blocks: Lane Detection, Object Detection, Post-processing, Strongest Bounding Box
Workflow:
1) Test in MATLAB
2) Generate code and test on desktop
3) Generate code and test on Jetson AGX Xavier GPU
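The Strongest Bounding Box stage is only named on the slide, not described. As a minimal sketch of what such a post-processing step could look like, the function below keeps the single highest-scoring detection above a confidence threshold. This is deliberately simpler than non-maximum suppression, which the actual demo may use instead; the `Detection` struct, its field names, and `strongestDetection` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical detection record; field names are illustrative only.
struct Detection {
    float x, y, width, height;  // box position and size in pixels
    float score;                // detector confidence
};

// Returns the index of the highest-scoring detection whose score is at or
// above `threshold`, or -1 when none qualifies. On equal scores the later
// entry wins.
int strongestDetection(const std::vector<Detection>& detections,
                       float threshold) {
    int best = -1;
    float bestScore = threshold;
    for (std::size_t i = 0; i < detections.size(); ++i) {
        if (detections[i].score >= bestScore) {
            bestScore = detections[i].score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Because this logic is plain scalar code with no library calls, it is the kind of application logic that can be tested on the desktop first and then regenerated for the embedded target alongside the network.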
(1) Test in MATLAB
(2) Generate Code and Test on Desktop GPU
cuDNN/TensorRT optimized code
CUDA optimized code
(3) Generate Code and Test on Jetson AGX Xavier GPU
Accessing Hardware
Deploy Standalone Application
Access Peripheral from MATLAB
Processor-in-Loop Verification
Deploy to Target Hardware via Apps and Command Line
How do MATLAB Coder and GPU Coder achieve these results?
Coders Apply Various Optimizations
▪ Traditional compiler optimizations
▪ Loop optimizations: scalarization, loop perfectization, loop interchange, loop fusion, scalar replacement
▪ CUDA kernel lowering
▪ MATLAB Library function mapping
▪ Parallel loop creation
▪ CUDA kernel creation
▪ cudaMemcpy minimization
▪ Shared memory mapping
▪ CUDA code emission
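To make two of the listed loop optimizations concrete, the sketch below shows loop fusion and scalar replacement applied by hand to a small element-wise computation. It is an illustration of the general techniques, not output from MATLAB Coder or GPU Coder, and the function names are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Unfused: two passes over the data with a temporary array between them.
std::vector<float> scaleThenOffsetUnfused(const std::vector<float>& in,
                                          float scale, float offset) {
    std::vector<float> tmp(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = in[i] * scale;
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = tmp[i] + offset;
    return out;
}

// Fused: one pass over the data. The temporary array is replaced by a
// scalar local (scalar replacement), so the intermediate value no longer
// needs its own array.
std::vector<float> scaleThenOffsetFused(const std::vector<float>& in,
                                        float scale, float offset) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float t = in[i] * scale;
        out[i] = t + offset;
    }
    return out;
}
```

Both functions compute the same result. On a GPU, the fused form would typically map to a single kernel instead of two, which also removes the intermediate buffer and the memory traffic that goes with it.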
Deep Learning Workflow in MATLAB
Deep Neural Network Design + Training: train in MATLAB, model importer, transfer learning from a reference model → Trained DNN
Application Design: Application logic
Standalone Deployment: Coders targeting TensorRT and cuDNN Libraries, MKL-DNN Library, ARM Compute Library
Deep Learning with MATLAB
This two-day course provides a comprehensive introduction to practical deep learning using MATLAB®.
Topics include:
▪ Importing image and sequence data
▪ Using convolutional neural networks for image classification, regression, and object detection
▪ Using long short-term memory networks for sequence classification and forecasting
▪ Modifying common network architectures to solve custom problems
▪ Improving the performance of a network by modifying training options
Email: [email protected],
LinkedIn: https://www.linkedin.com/in/rishu-gupta-72148914/
Please provide feedback for this block of sessions
▪ Scan this QR code or log on to the link below (link also sent to your phone and email)
▪ http://bit.ly/expo19-feedback
▪ Enter the registration ID number displayed on your badge
▪ Provide feedback for this session