HPC Advisory Council Singapore
October 7, 2014
Marc Hamilton, Vice President,
Solution Architecture and Engineering
VISUAL COMPUTING
FOR CLOUD MOBILE
2
THREE TRENDS CONVERGING
Torrent of Data
Exabytes of unstructured data, 2010 to 2015 (SOURCE: IDC)
Deep Neural Networks
GPU Computing
3
MACHINE LEARNING
Branch of Artificial Intelligence
Computers that learn from data
[Figure: example images with recognized labels: person, car, helmet, motorcycle, bird, frog, person, dog, chair, person, hammer, flower pot, power drill]
4
DEEP LEARNING IN A LARGER CONTEXT
Data Science (“Big Data”)
  Data Analysis
    Machine Learning
      Some GPU value: SVM, K-Means Clustering
      Strong GPU value: Deep Learning, Deep Neural Nets, Convolutional Neural Nets
      More research to prove GPU value: Recommender Systems, Collaborative Filtering, Regression, Bayesian Networks, Decision Trees, Random Forests, Semantic Analysis
    Data Mining, e.g. Statistics
  Data Management
    Distributed Storage, e.g. HDFS
    Queries & Indexing, e.g. Map-D, GISFederal, SQream
5
GPUS FOR DEEP LEARNING
Image Recognition CHALLENGE
1.2M training images • 1000 object categories
[Chart: GPU usage for ILSVRC, 2010 to 2014. Series: winning % error; % of teams using GPUs.]
6
NUS WINS IMAGENET 2014 CHALLENGE
7
MACHINE LEARNING USE CASES
Face Detection
Autonomous Driving
Image / Video Tagging
Speech Recognition
Product Recommendations
Object Recognition
Situational Awareness
…machine learning is pervasive
8
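EFFICIENT CONVOLUTIONS ON GPUS
Convolution as GEMM (matrix-matrix product) => Great on GPUs
[Figure: a 3x3 image (A B C / D E F / G H I) and a 3x3 kernel α (a b c / d e f / g h i). For each output position x,y, the image neighborhood is laid out as one matrix row, with "-" marking positions outside the image:
  - A B - D E - G H
  A B C D E F G H I
  B C - E F - H I -
The kernel is reversed (i h g / f e d / c b a) and laid out as a column (i h g f e d c b a), one column per kernel α, so the product of the two matrices gives the output at every x,y for every α.]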
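The convolution-as-GEMM idea on this slide can be sketched in a few lines of pure Python. This is an illustration only, not how cuDNN or Caffe implement it: the function names (`im2col`, `matmul`, `conv_as_gemm`, `conv_direct`) are hypothetical, and it assumes a single-channel image, odd-sized square kernels, and zero padding for out-of-bounds positions (the "-" entries in the figure).

```python
def im2col(image, k):
    """Gather each k x k neighborhood (zero-padded) into one row."""
    h, w = len(image), len(image[0])
    r = k // 2
    rows = []
    for y in range(h):
        for x in range(w):
            row = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < h and 0 <= xx < w
                    row.append(image[yy][xx] if inside else 0)
            rows.append(row)
    return rows


def matmul(a, b):
    """Plain matrix-matrix product (the GEMM step)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def conv_as_gemm(image, kernels):
    """Convolve one image with several k x k kernels via a single GEMM."""
    k = len(kernels[0])
    patches = im2col(image, k)  # (h*w) rows, (k*k) columns
    # Flatten each kernel and reverse it (i h g f e d c b a in the figure).
    flipped = [[v for row in kern for v in row][::-1] for kern in kernels]
    # One column per kernel: (k*k) rows, len(kernels) columns.
    weights = [[f[p] for f in flipped] for p in range(k * k)]
    return matmul(patches, weights)  # (h*w) rows, len(kernels) columns


def conv_direct(image, kernel):
    """Reference: direct zero-padded convolution, for comparison."""
    h, w, k = len(image), len(image[0]), len(kernel)
    r = k // 2
    out = []
    for y in range(h):
        for x in range(w):
            acc = 0
            for i in range(k):
                for j in range(k):
                    yy, xx = y - (i - r), x - (j - r)
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out.append(acc)
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # stands in for A..I
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # stands in for a..i
result = conv_as_gemm(image, [kernel])
print([row[0] for row in result] == conv_direct(image, kernel))
```

With the image values 1..9 standing in for A..I, the rows produced by `im2col` for the middle image row reproduce the three rows drawn in the figure. The point of the rearrangement is that all kernels are applied in one large matrix product, which is the kind of dense operation GPUs handle well.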
9
INTRODUCING NVIDIA CUDNN
Lets DNN researchers focus on DNNs
We provide expertly tuned computational components
Accelerate, don’t replace, existing popular DNN frameworks
Forward and backward convolution routines tuned for NVIDIA GPUs
Optimized for all future NVIDIA GPU generations
Arbitrary dimension ordering, striding, and subregions for 4d tensors means easy integration into any neural net implementation
Download: http://www.nvidia.com/cudnn
Contact: [email protected]
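To illustrate what "arbitrary dimension ordering, striding" can mean for a 4d tensor, here is a minimal pure-Python sketch. It is not cuDNN code: the helpers `strides_for` and `offset` are hypothetical, and the tensor is assumed to have dimensions n (batch), c (channel), h (row), and w (column). The same logical element is located in two different memory layouts just by changing the per-dimension strides.

```python
def strides_for(dims, order):
    """Return per-dimension strides for a densely packed layout.

    dims  -- sizes keyed by dimension name, e.g. {"n": 2, "c": 3, "h": 4, "w": 5}
    order -- dimension names from slowest- to fastest-varying, e.g. "nchw"
    """
    strides, step = {}, 1
    for name in reversed(order):
        strides[name] = step
        step *= dims[name]
    return strides


def offset(index, strides):
    """Linear memory offset of one element, given its 4d index."""
    return sum(index[name] * strides[name] for name in index)


dims = {"n": 2, "c": 3, "h": 4, "w": 5}
nchw = strides_for(dims, "nchw")   # channels stored as separate planes
nhwc = strides_for(dims, "nhwc")   # channels interleaved per pixel

element = {"n": 1, "c": 0, "h": 2, "w": 3}
print(offset(element, nchw), offset(element, nhwc))
```

A library that accepts strides instead of assuming one fixed layout can read either arrangement in place, which is one plausible reading of the "easy integration" claim above.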
10
USING CAFFE WITH CUDNN
Accelerate Caffe layer types by 1.2 – 3x
Example: AlexNet Layer 2 forward:
1.9x faster convolution, 2.7x faster pooling
Integrated into Caffe dev branch today! (targeting official release with Caffe 1.0)
Comparison against SOL (speed of light, i.e. theoretical peak): ~50% headroom
(still trying to figure this out)
CPU could probably get within ~3x
Overall AlexNet training time
Baseline Caffe compared to Caffe accelerated by cuDNN on K40
Caffe (CPU*): 1x
Caffe (GPU): 11x
Caffe (cuDNN): 14x
*CPU is 24 core E5-2697v2 @ 2.4GHz, Intel MKL 11.1.3
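The overall figures quoted on this slide (1x, 11x, 14x relative to the CPU baseline) imply the end-to-end gain of cuDNN over plain GPU Caffe. A small sketch of that arithmetic follows; the dictionary and the helper `relative_speedup` are hypothetical names used only for illustration, and the numbers are the ones from the slide.

```python
# Speedups relative to the CPU baseline, as quoted on the slide.
speedup_vs_cpu = {
    "Caffe (CPU)": 1.0,
    "Caffe (GPU)": 11.0,
    "Caffe (cuDNN)": 14.0,
}


def relative_speedup(table, fast, base):
    """How many times faster `fast` is than `base`."""
    return table[fast] / table[base]


gain = relative_speedup(speedup_vs_cpu, "Caffe (cuDNN)", "Caffe (GPU)")
print(f"cuDNN over GPU Caffe: {gain:.2f}x")
```

The result, roughly 1.27x overall, sits at the low end of the 1.2 – 3x per-layer range quoted above, which would be consistent with only some layer types being accelerated.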
11
Deep Learning with COTS HPC Systems
A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro
Stanford / NVIDIA • ICML 2013
STANFORD AI LAB
3 GPU-Accelerated Servers
12 GPUs • 18,432 cores
4 kWatts
$33,000
“Now You Can Build Google’s $1M Artificial Brain on the Cheap” -Wired
GOOGLE BRAIN
1,000 CPU Servers
2,000 CPUs • 16,000 cores
600 kWatts
$5,000,000
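The two systems on this slide can be compared with simple ratios. The sketch below uses only the figures shown on the slide; the dictionary layout and the helper `ratio` are hypothetical, and the Google Brain cost is read as $5,000,000 (the figure is split across two lines in the transcript).

```python
stanford = {"servers": 3, "processors": 12, "cores": 18_432,
            "kilowatts": 4, "dollars": 33_000}
google = {"servers": 1_000, "processors": 2_000, "cores": 16_000,
          "kilowatts": 600, "dollars": 5_000_000}


def ratio(key):
    """Google Brain figure divided by the Stanford figure for one metric."""
    return google[key] / stanford[key]


for key in ("servers", "kilowatts", "dollars"):
    print(f"{key}: {ratio(key):.1f}x")

# Cores per processor on each side (GPU vs CPU).
print(stanford["cores"] // stanford["processors"],
      google["cores"] // google["processors"])
```

By these numbers the GPU system uses about 150 times less power and costs about 150 times less, while offering a comparable core count. The cores are not directly comparable, since a GPU core is much simpler than a CPU core.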
12
Mobile - More Than Just Phones
13
UNIFIED ARCHITECTURE
MOBILE ARCHITECTURE: Tegra 3, Tegra 4, Tegra K1
GPU ARCHITECTURE: Tesla, Fermi, Kepler, Maxwell
TEGRA K1 – MOBILE SUPER CHIP
BREAKTHROUGH EXPERIENCES
TEGRA TK1
14
JETSON TK1 DEV KIT
1ST MOBILE SUPERCOMPUTER FOR EMBEDDED SYSTEMS
192 CUDA cores
326 GFLOPS
VisionWorks SDK
15
DIGITAL COCKPIT
EVOLUTION OF COMPUTING IN THE CAR
Tegra 4 • Tegra 3 • Tegra K1
Virtual Cockpit • Autonomous Driving • Infotainment
16
COMPUTER VISION ON CUDA
Feature Detection / Tracking ~30 GFLOPS @ 30 Hz
Object Recognition / Tracking ~180 GFLOPS @ 30 Hz
3D Scene Interpretation ~280 GFLOPS @ 30 Hz
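One way to read these figures is as the sustained throughput each workload needs in order to run at 30 frames per second. The sketch below turns each into a per-frame compute budget and, as an assumption not stated on this slide, compares it with the 326 GFLOPS peak quoted for the Jetson TK1 earlier in the deck. The names `workloads`, `per_frame_gflop`, and `share_of_peak` are hypothetical.

```python
FRAME_RATE_HZ = 30
PEAK_GFLOPS = 326.0  # assumption: Jetson TK1 peak figure from the earlier slide

# Approximate sustained requirements from the slide, in GFLOPS.
workloads = {
    "Feature Detection / Tracking": 30.0,
    "Object Recognition / Tracking": 180.0,
    "3D Scene Interpretation": 280.0,
}


def per_frame_gflop(rate_gflops):
    """Billions of floating-point operations available per frame."""
    return rate_gflops / FRAME_RATE_HZ


def share_of_peak(rate_gflops):
    """Fraction of the assumed peak throughput the workload would need."""
    return rate_gflops / PEAK_GFLOPS


for name, rate in workloads.items():
    print(f"{name}: {per_frame_gflop(rate):.2f} GFLOP/frame, "
          f"{share_of_peak(rate):.0%} of assumed peak")
```

Under that assumption, the heaviest workload would need most of the chip's peak throughput, and real code rarely reaches peak, so the comparison is best treated as a rough upper bound on what fits.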
17
NIGHT AND DAY DIFFERENCE
Without GPU • With GPU
HTTP://NVIDIA.COM/TRYGRID
18
Thank You