GTC China 2016
NVIDIA
2
GPU DEEP LEARNING BIG BANG
Deep Learning NVIDIA GPU
NIPS (2012)
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, University of Toronto
Ilya Sutskever, University of Toronto
Geoffrey E. Hinton, University of Toronto
3
GPU DEEP LEARNING ACHIEVES “SUPERHUMAN” RESULTS
2012: Deep Learning researchers worldwide discover GPUs
2015: DNN achieves superhuman image recognition
2015: Deep Speech 2 achieves superhuman voice recognition
Chart: ImageNet accuracy %, 2010 to 2015 (axis 74% to 96%). Series: Hand-coded CV, DL (Microsoft, Google), Human. DL reaches a 3.5% error rate.
5
ANNOUNCING NEW GRAPHICS SDKS
Funhouse VR: Open Source
360 Video 1.0: Real-Time Panoramic VR
Iray VR: Photorealistic VR Ray Tracing
GVDB: Sparse Volumes for Special Effects
Remote Rendering: Video Compositing
Ansel: In-game Photography
Volumetric: Physical Light Models
OptiX 4.0: Multi-GPU Ray-Tracing
MDL 1.0: Physically Based Materials
Mental Ray: Now GPU-Accelerated!
9
GTC — 25X GROWTH IN GPU DL DEVELOPERS
Attendees: 3,700 (2014) to 16,000 (2016), 4X
GPU Developers: 120,000 (2014) to 400,000 (2016), 3X
Deep Learning Developers: 2,200 (2014) to 55,000 (2016), 25X
GTC events, 2014: Japan, United States
GTC events, 2016: Australia, China, Europe, India, Japan, Korea, United States (Silicon Valley, D.C.)
Industry breakdown:
• Higher Ed 35% • Software 19% • Internet 15% • Auto 10%
• Government 5% • Medical 4% • Finance 4% • Manufacturing 4%
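The growth multiples in the headline can be checked with a few lines of Python. This is a minimal sketch. It assumes the flattened numbers pair up as 3,700 to 16,000 attendees, 120,000 to 400,000 GPU developers, and 2,200 to 55,000 deep learning developers (2014 to 2016). That pairing is inferred from the stated 4X, 3X, and 25X ratios and is not spelled out on the slide.

```python
# Slide figures as (2014, 2016) pairs; the pairing is inferred, not stated.
GTC_FIGURES = {
    "Attendees": (3_700, 16_000),
    "GPU developers": (120_000, 400_000),
    "Deep learning developers": (2_200, 55_000),
}


def growth_factor(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


for label, (y2014, y2016) in GTC_FIGURES.items():
    # Prints 4.3X, 3.3X and 25.0X, which the slide rounds to 4X, 3X and 25X.
    print(f"{label}: {growth_factor(y2014, y2016):.1f}X")
```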
14
GPU DEEP LEARNING IS A NEW COMPUTING MODEL
Training
Device
Datacenter
TRAINING
Billions of Trillions of Operations
GPUs train larger models, accelerating time to market
15
GPU DEEP LEARNING IS A NEW COMPUTING MODEL
Training
Device
Datacenter
DATACENTER INFERENCING
Tens of billions of image, voice, and video queries per day
GPU inference delivers fast response and maximizes datacenter throughput
16
GPU DEEP LEARNING IS A NEW COMPUTING MODEL
Training
Device
Datacenter
DEVICE INFERENCING
Billions of intelligent devices
GPUs deliver real-time, accurate responses
17
AI — THE ULTIMATE COMPUTING CHALLENGE
Important property of neural networks: results get better with more data + bigger models + more computation.
(Better algorithms, new insights and improved techniques always help, too!)
IMAGE RECOGNITION: 16X Model
• 2012 AlexNet: 8 layers, 1.4 GFLOP/image, ~16% error
• 2015 ResNet: 152 layers, 22.6 GFLOP/image, ~3.5% error
SPEECH RECOGNITION: 10X Training Ops
• 2014 Deep Speech 1: 2 ExaFLOPS, 25M | 7,000 hours, ~8% error
• 2015 Deep Speech 2: 20 ExaFLOPS, 100M | 12,000 hours, ~5% error
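The 16X and 10X multiples follow directly from the figures on this slide. A minimal Python check is below. It assumes the GFLOP figures are per-image compute. It also assumes the "ExaFLOPS" figures are total training operations rather than a rate; that is our reading and is not stated on the slide.

```python
# Figures from the slide. GFLOP values are per image; the "ExaFLOPS" values
# are read here as total training operations (an interpretation).
ALEXNET_GFLOP_PER_IMAGE = 1.4
RESNET_GFLOP_PER_IMAGE = 22.6
DEEP_SPEECH_1_EXAFLOPS = 2
DEEP_SPEECH_2_EXAFLOPS = 20

model_growth = RESNET_GFLOP_PER_IMAGE / ALEXNET_GFLOP_PER_IMAGE
training_growth = DEEP_SPEECH_2_EXAFLOPS / DEEP_SPEECH_1_EXAFLOPS

print(f"Image model compute, 2012 to 2015: {model_growth:.1f}X")    # 16.1X
print(f"Speech training ops, 2014 to 2015: {training_growth:.0f}X")  # 10X
```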
18
PASCAL “5 MIRACLES” BOOST DEEP LEARNING 65X
Pascal: 5 Miracles | NVIDIA DGX-1 Supercomputer | 65X in 4 Years | Accelerate Every Framework
PaddlePaddle (Baidu Deep Learning)
Pascal | 16nm FinFET | CoWoS HBM2 | NVLink | cuDNN
Chart: Relative speed-up of images/sec vs. K40 in 2013, for Kepler, Maxwell, and Pascal, 2013 to 2016 (y-axis 0X to 70X). AlexNet training throughput based on 20 iterations. CPU: 1x E5-2680v3 12-core 2.5 GHz, 128 GB system memory, Ubuntu 14.04. M40 data point: 8x M40 GPUs in a node. P100: 8x P100, NVLink-enabled.
19
ANNOUNCING NEW IBM SERVER: POWER8 + NVIDIA TESLA P100 FOR THE AI ENTERPRISE
“Putting NVIDIA’s technology into the IBM system will speed
up performance for such emerging workloads as AI, deep
learning and data analytics.” — eWeek
22
ANNOUNCING TESLA P4 & P40 INFERENCING ACCELERATORS
Pascal Architecture | INT8
P4: 50W | 40X Energy Efficient versus CPU
P40: 250W | 40X Performance versus CPU
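INT8 inference stores weights and activations as 8-bit integers together with a floating-point scale. The sketch below uses the simplest scheme: symmetric per-tensor quantization with a max-abs scale. It is an illustration only, and the function names are hypothetical. Production calibration, including TensorRT's, is more involved and is not reproduced here.

```python
def int8_scale(values: list[float]) -> float:
    """Symmetric per-tensor scale mapping the largest magnitude to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values: list[float], scale: float) -> list[int]:
    """Round each value to the nearest step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map 8-bit codes back to approximate floating-point values."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.003, 1.0]
scale = int8_scale(weights)          # about 0.01
codes = quantize(weights, scale)     # [50, -127, 0, 100]
restored = dequantize(codes, scale)

# Round-to-nearest keeps every error within half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error {max_error:.4f}")
```

Small values such as 0.003 collapse to 0. That rounding loss is the accuracy trade-off calibration tries to minimize.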
23
ANNOUNCING TensorRT: PERFORMANCE-OPTIMIZING INFERENCING ENGINE
FP32, FP16, INT8 | Vertical & Horizontal Fusion | Auto-Tuning
VGG, GoogLeNet, ResNet, AlexNet & Custom Layers
Available Today: developer.nvidia.com/tensorrt
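Vertical fusion merges consecutive layers into a single pass, so the intermediate result never round-trips through memory. Below is a toy Python sketch of a bias add followed by ReLU. It assumes plain lists stand in for GPU tensors, and the helper names are hypothetical; TensorRT applies fusion to real layers and kernels internally. Horizontal fusion, which merges parallel layers that read the same input, is not shown.

```python
def bias_add(xs: list[float], bias: float) -> list[float]:
    """Layer 1: writes a full intermediate buffer."""
    return [x + bias for x in xs]


def relu(xs: list[float]) -> list[float]:
    """Layer 2: reads that buffer back and writes another."""
    return [max(0.0, x) for x in xs]


def unfused(xs: list[float], bias: float) -> list[float]:
    # Two passes over the data, one per layer.
    return relu(bias_add(xs, bias))


def fused_bias_relu(xs: list[float], bias: float) -> list[float]:
    # Vertically fused: one pass, no intermediate buffer between the layers.
    return [max(0.0, x + bias) for x in xs]


x = [-2.0, -0.5, 0.0, 1.5]
print(fused_bias_relu(x, 1.0))  # [0.0, 0.5, 1.0, 2.5]
```

Both versions compute the same result. On a GPU, the fused form saves one kernel launch and one full read and write of the intermediate tensor.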
26
NVIDIA GPU DEEP LEARNING EVERYWHERE
Alibaba/Aliyun
iQIYI
Shazam
Amazon
JD.com
Skype
Orange
Flickr
Periscope
Yahoo Supermarket
Yandex
iFLYTEK
Qihoo 360
Yelp
eBay
Tencent
Netflix
Baidu
Sogou
Microsoft
27
>1,500 AI STARTUPS AROUND THE WORLD
Deep Learning for Cybersecurity
Deep Learning for Genomics
Deep Learning for Self-Driving Cars
Deep Learning for Art
28
AI STARTUPS IN CHINA
Weather & Environment Forecast
Eye-tracking for Human-machine Interaction
Medical Imaging
Face Recognition
Product Recognition, Detection, Search
Personal Concierge App
30
“BILLIONS OF INTELLIGENT DEVICES”
“Billions of intelligent devices will take advantage of DNNs to provide personalization and localization as GPUs become faster and faster over the next several years.”
— Tractica
31
AI CITY — 1B CAMERAS BY 2020
~1 billion cameras worldwide by 2020
30 billion inferences/sec
Tesla P40: 2,500 inferences/sec @ 720P
AI City needs ~10M P40 servers
DATA: 1B cameras, IHS, “Video Surveillance Intelligence Service,” Aug. 2016
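The sizing on this slide is back-of-the-envelope arithmetic, reproduced below in Python. Dividing 30 billion inferences/sec across ~1 billion cameras gives 30 inferences/sec per camera, which is plausibly one per frame at 30 fps (an assumption). Dividing 30 billion by 2,500 per P40 gives about 12 million P40s. That matches the slide's ~10M only as an order-of-magnitude figure, and only if each server holds one P40 (also an assumption).

```python
CAMERAS = 1_000_000_000          # ~1B cameras by 2020 (slide, citing IHS)
TOTAL_INFERENCES_PER_SEC = 30e9  # slide: 30 billion inferences/sec
P40_INFERENCES_PER_SEC = 2_500   # slide: one Tesla P40 at 720p

per_camera = TOTAL_INFERENCES_PER_SEC / CAMERAS
p40s_needed = TOTAL_INFERENCES_PER_SEC / P40_INFERENCES_PER_SEC

print(f"{per_camera:.0f} inferences/sec per camera")   # 30
print(f"{p40s_needed / 1e6:.0f} million Tesla P40s")   # 12
```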
32
1/20TH THE SPACE, 1/10TH THE POWER
Hikvision Blade: 16 Jetson TX1s
NVIDIA DGX-1
Traditional Server: ~21 1U Servers | 42 CPUs | ~4,000 W
Hikvision Blade: 1 Hikvision Blade | 16 TX1 + 1 CPU | >8 1080 streams | ~300 W
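The power figures on this slide can be checked with a short Python sketch. One blade at ~300 W versus ~21 1U servers at ~4,000 W is a ratio of 0.075, or about 1/13, so the "1/10th the power" headline is if anything conservative. The space ratio cannot be checked this way because the slide does not give the blade's dimensions.

```python
TRADITIONAL_WATTS = 4_000  # ~21 1U servers, 42 CPUs (slide)
BLADE_WATTS = 300          # 1 Hikvision blade: 16 Jetson TX1 + 1 CPU (slide)

power_ratio = BLADE_WATTS / TRADITIONAL_WATTS
print(f"Blade draws {power_ratio:.3f} of the power, about 1/{1 / power_ratio:.0f}")
```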
34
AI TRANSPORTATION — $10T INDUSTRY
PERCEPTION AI | PERCEPTION AI | LOCALIZATION | DRIVING AI
DEEP LEARNING
37
NVIDIA DRIVE PX 2: AutoCruise to Full Autonomy, One Architecture
Full Autonomy
AutoChauffeur
AutoCruise
AUTONOMOUS DRIVING: Perception, Reasoning, Driving
AI Supercomputing, AI Algorithms, Software
Scalable Architecture
38
NVIDIA DRIVE PX 2 AUTOCRUISE
10W AI Car Computer | Passive Cooling | Automotive IO
AI Highway Driving | Localization & Mapping
41
NVIDIA END-TO-END DEEP LEARNING PLATFORM
TRAINING
PaddlePaddle (Baidu Deep Learning)
DGX-1 | TESLA P100
42
NVIDIA END-TO-END DEEP LEARNING PLATFORM
TRAINING
PaddlePaddle (Baidu Deep Learning)
DGX-1 | TESLA P100
DATACENTER INFERENCING
ANNOUNCING TESLA P4 & P40
ANNOUNCING TensorRT
43
NVIDIA END-TO-END DEEP LEARNING PLATFORM
TRAINING
PaddlePaddle (Baidu Deep Learning)
DGX-1 | TESLA P100
DATACENTER INFERENCING
ANNOUNCING TESLA P4 & P40
ANNOUNCING TensorRT
CUDA
JETPACK | DRIVEWORKS
JETSON TX1
ANNOUNCING DRIVE PX 2 AUTOCRUISE
INTELLIGENT DEVICES
45
AI FOR EVERYONE
AI will Revolutionize Transportation
AI will Revolutionize Healthcare
AI will Revolutionize Society