Introduction to Deep Learning (NVIDIA)


Transcript of Introduction to Deep Learning (NVIDIA)

Page 1: Introduction to Deep Learning (NVIDIA)


Oct 2016

NVIDIA DEEP LEARNING

Page 2: Introduction to Deep Learning (NVIDIA)


ENTERPRISE | AUTO | GAMING | DATA CENTER | PRO VISUALIZATION

THE WORLD LEADER IN VISUAL COMPUTING

Page 3: Introduction to Deep Learning (NVIDIA)


THE BIG BANG IN MACHINE LEARNING

DNN | GPU | BIG DATA

100 hours of video uploaded every minute

350 million images uploaded per day

2.5 Petabytes of customer data hourly

[Chart: compute growth 2008-2014, NVIDIA GPU vs. x86 CPU, in TFLOPS]

Page 4: Introduction to Deep Learning (NVIDIA)


BIG DATA & ANALYTICS

AUTOMOTIVE: auto sensors reporting location, problems

COMMUNICATIONS: location-based advertising

CONSUMER PACKAGED GOODS: sentiment analysis of what's hot, problems

FINANCIAL SERVICES: risk & portfolio analysis, new products

EDUCATION & RESEARCH: experiment sensor analysis

HIGH TECHNOLOGY / INDUSTRIAL MFG.: manufacturing quality, warranty analysis

LIFE SCIENCES: clinical trials

MEDIA / ENTERTAINMENT: viewers / advertising effectiveness

ON-LINE SERVICES / SOCIAL MEDIA: people & career matching

HEALTH CARE: patient sensors, monitoring, EHRs

OIL & GAS: drilling exploration sensor analysis

RETAIL: consumer sentiment

TRAVEL & TRANSPORTATION: sensor analysis for optimal traffic flows

UTILITIES: smart meter analysis for network capacity

LAW ENFORCEMENT & DEFENSE: threat analysis, social media monitoring, photo analysis

Page 5: Introduction to Deep Learning (NVIDIA)


EXPONENTIAL DATA GROWTH

INCREASING DATA VARIETY

Search Marketing

Behavioral Targeting

Dynamic Funnels

User Generated Content

Mobile Web

SMS/MMS

Sentiment

HD Video

Speech To Text

Product/Service Logs

Social Network

Business Data Feeds

User Click Stream

Sensors Infotainment Systems

Wearable Devices

CyberSecurity Logs

ConnectedVehicles

Machine Data

IoT Data

Dynamic Pricing

Payment Record

Purchase Detail

Purchase Record

Support Contacts

Segmentation

Offer Details

Web Logs

Offer History

A/B Testing

BUSINESS PROCESS | WEB | DIGITAL | AI

[Chart: data volume grows from gigabytes through terabytes and petabytes toward exabytes and zettabytes as sources expand across these eras; additional sources include Streaming Video and Natural Language Processing]

90% of the world's data created in the last year - IBM

Page 6: Introduction to Deep Learning (NVIDIA)


Page 7: Introduction to Deep Learning (NVIDIA)


WHAT IS DEEP LEARNING?

Deep learning is a subset of machine learning, which is in turn a subset of artificial intelligence.

ARTIFICIAL INTELLIGENCE: Perception, Reasoning, Planning, Optimization

MACHINE LEARNING: Computational Statistics, Supervised and Unsupervised Learning, Neural Networks

DEEP LEARNING: Distributed Representations, Hierarchical Explanatory Factors, Unsupervised Feature Engineering
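
To make "supervised learning with neural networks" concrete, here is a minimal sketch (not taken from the deck; the XOR task, layer sizes, and learning rate are illustrative choices): a tiny two-layer network fitted to labeled examples with plain gradient descent in NumPy.

import numpy as np

# Minimal supervised-learning sketch: a two-layer neural network learns XOR
# from labeled examples by gradient descent. All sizes are illustrative.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # target labels

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)               # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)               # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)            # forward pass: hidden activations
    p = sigmoid(h @ W2 + b2)            # forward pass: predictions
    dp = (p - y) * p * (1 - p)          # backprop through squared error + sigmoid
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = dp @ W2.T * (1 - h ** 2)       # backprop through tanh
    dW1, db1 = X.T @ dh, dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1      # gradient descent update
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p, 2))  # approaches [[0], [1], [1], [0]] after training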

Page 8: Introduction to Deep Learning (NVIDIA)


DEEP LEARNING FUELING DISCOVERY

Classify Satellite Images for Carbon Monitoring

Analyze Obituaries on the Web for Cancer-related Discoveries

Determine Drug Treatments to Increase Child’s Chance of Survival

NASA AMES

Page 9: Introduction to Deep Learning (NVIDIA)


DEEP LEARNING FOR EVERY APPLICATION

Visual search for e-commerce

Visual Search in Geoinformatics

Improving Agriculture: LettuceBot only sprays weeds

Page 10: Introduction to Deep Learning (NVIDIA)


DEEP LEARNING FOR EVERY APPLICATION

Language Classification

Deep Learning CNN

Super-Human Language Translation

Page 11: Introduction to Deep Learning (NVIDIA)


DEEP LEARNING FOR EVERY APPLICATION

Page 12: Introduction to Deep Learning (NVIDIA)


CONSUMERS LOVE DEEP LEARNING

Page 13: Introduction to Deep Learning (NVIDIA)


MORE THAN 1,500 AI START-UPS AROUND THE WORLD

Deep Learning for Art

Deep Learning for Cybersecurity

Deep Learning for Genomics

Deep Learning for Self-Driving Cars

Page 14: Introduction to Deep Learning (NVIDIA)


IMAGENET CHALLENGE: Where it all started … again

Example image labels: bird, frog, person, hammer, flower pot, power drill, person, car, helmet, motorcycle, person, dog, chair

Challenge: 1.2M training images • 1,000 object categories

Page 15: Introduction to Deep Learning (NVIDIA)


ACHIEVING SUPERHUMAN PERFORMANCE

2012: Deep Learning researchers worldwide discover GPUs

2015: ImageNet — Deep Learning achieves superhuman image recognition

2016: Microsoft achieves speech recognition milestone

Page 16: Introduction to Deep Learning (NVIDIA)


DEEP LEARNING ADOPTION IS EXPONENTIAL

# of Organizations Using Deep Learning

Source: Jeff Dean, Spark Summit 2016

Page 17: Introduction to Deep Learning (NVIDIA)


MASSIVE COMPUTING CHALLENGE

SPEECH RECOGNITION

2014, Deep Speech 1: 80 GFLOP, 7,000 hrs of data, ~8% error

2015, Deep Speech 2: 465 GFLOP, 12,000 hrs of data, ~5% error

10X training ops

IMAGE RECOGNITION

2012, AlexNet: 8 layers, 1.4 GFLOP, ~16% error

2015, ResNet: 152 layers, 22.6 GFLOP, ~3.5% error

16X model
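
One plausible way to read those multipliers (this arithmetic is an illustration, not stated on the slide): total training work scales roughly with per-sample compute times the amount of training data, and the model factor with per-image compute.

# Illustrative arithmetic only; a plausible reading of the slide's multipliers.
ds1_gflop, ds1_hours = 80, 7_000      # Deep Speech 1 (2014)
ds2_gflop, ds2_hours = 465, 12_000    # Deep Speech 2 (2015)
print((ds2_gflop * ds2_hours) / (ds1_gflop * ds1_hours))  # ~10x training ops

alexnet_gflop, resnet_gflop = 1.4, 22.6   # per-image compute
print(resnet_gflop / alexnet_gflop)       # ~16x model compute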

Page 18: Introduction to Deep Learning (NVIDIA)



NVIDIA DEEP LEARNING PLATFORM

TRAINING

DIGITS Training System

Deep Learning Frameworks

Tesla P100, DGX-1

DATACENTER INFERENCING

DeepStream SDK

TensorRT

Tesla P40 & P4

Page 19: Introduction to Deep Learning (NVIDIA)


NVIDIA DEEP LEARNING PLATFORM

TRAINING: Tesla P100, 65X in 3 years

DATACENTER INFERENCING: Tesla P4, 40X vs CPU

Training: comparing to Kepler GPU in 2013 using Caffe; Inference: comparing img/sec/watt to CPU (Intel E5-2697v4) using AlexNet

Page 20: Introduction to Deep Learning (NVIDIA)


TESLA P4: Maximum Efficiency for Scale-out Servers

5.5 TFLOPS • 40x efficiency vs CPU, 8x efficiency vs FPGA

[Chart: images/sec/watt on AlexNet for CPU, FPGA, 1x M4 (FP32), and 1x P4 (INT8)]

TESLA P40: Highest Throughput for Scale-up Servers

4x boost in less than one year

[Chart: images/sec on GoogLeNet and AlexNet for 8x M40 (FP32) vs. 8x P40 (INT8); axis 0 to 100,000]
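
The P4 and P40 figures above rely on 8-bit integer (INT8) inference. A rough NumPy sketch of the idea follows; it illustrates symmetric per-tensor quantization only and is not TensorRT's actual calibration procedure: weights and activations are mapped onto signed 8-bit integers with a scale factor, multiplied with integer math, then rescaled to floating point.

import numpy as np

# Symmetric per-tensor INT8 quantization, sketched for illustration only.
def quantize_int8(x):
    scale = np.abs(x).max() / 127.0                      # map observed range onto [-127, 127]
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(64, 64).astype(np.float32)           # stand-in layer weights
a = np.random.randn(64, 64).astype(np.float32)           # stand-in activations

qw, sw = quantize_int8(w)
qa, sa = quantize_int8(a)

# Integer multiply, accumulate in int32, then rescale back to float.
y_int8 = (qw.astype(np.int32) @ qa.astype(np.int32)).astype(np.float32) * (sw * sa)
y_fp32 = w @ a
print(np.max(np.abs(y_int8 - y_fp32)))                   # small quantization error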

Page 21: Introduction to Deep Learning (NVIDIA)


INTRODUCING TESLA P100

Page Migration Engine: virtually unlimited memory

CoWoS HBM2: 3D stacked memory (i.e., fast!)

NVLink: GPU interconnect for maximum scalability

Page 22: Introduction to Deep Learning (NVIDIA)


NVIDIA DGX-1: AI Supercomputer-in-a-Box

170 TFLOPS | 8x Tesla P100 16GB | NVLink Hybrid Cube Mesh | 2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U, 3200W

Page 23: Introduction to Deep Learning (NVIDIA)


DGX STACK: Fully integrated Deep Learning platform

Instant productivity: plug-and-play, supports every AI framework

Performance optimized across the entire stack

Always up-to-date via the cloud

Mixed framework environments: containerized

Direct access to NVIDIA experts

Page 24: Introduction to Deep Learning (NVIDIA)


NVIDIA POWERS DEEP LEARNING: Every major DL framework leverages NVIDIA SDKs

Mocha.jl

NVIDIA DEEP LEARNING SDK

COMPUTER VISION | SPEECH & AUDIO | NATURAL LANGUAGE PROCESSING

OBJECT DETECTION

IMAGE CLASSIFICATION

VOICE RECOGNITION

LANGUAGE TRANSLATION

RECOMMENDATION ENGINES

SENTIMENT ANALYSIS

Page 25: Introduction to Deep Learning (NVIDIA)


NVIDIA DIGITS: Interactive Deep Learning GPU Training System

Interactive deep neural network development environment for image classification and object detection

Schedule, monitor, and manage neural network training jobs

Analyze accuracy and loss in real time

Track datasets, results, and trained neural networks

Scale training jobs across multiple GPUs automatically

Page 26: Introduction to Deep Learning (NVIDIA)


NVIDIA cuDNN: Accelerating Deep Learning

High-performance building blocks for deep learning frameworks

Drop-in acceleration for widely used deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano, Torch, and others

Accelerates industry-vetted deep learning algorithms such as convolution, LSTM, fully connected, and pooling layers

Fast deep learning training performance tuned for NVIDIA GPUs

[Chart: Deep Learning Training Performance, Caffe AlexNet; speed-up of images/sec vs. K40 in 2013 for K40, K80 + cuDNN, M40 + cuDNN4, and P100 + cuDNN5; axis 0x to 80x]

“NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time.” — Evan Shelhamer, Lead Caffe Developer, UC Berkeley

AlexNet training throughput on CPU: 1x E5-2680v3 12-core 2.5GHz, 128GB system memory, Ubuntu 14.04. M40 bar: 8x M40 GPUs in a node; P100: 8x P100 NVLink-enabled
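
To show what "drop-in acceleration" looks like from a framework user's point of view, here is a minimal PyTorch sketch (PyTorch is one of several frameworks that route convolution, pooling, and LSTM layers to cuDNN when a GPU is present; the layer sizes below are illustrative, not a benchmark):

import torch
import torch.nn as nn

torch.backends.cudnn.enabled = True      # dispatch conv/pool/LSTM kernels to cuDNN on GPU
torch.backends.cudnn.benchmark = True    # let cuDNN pick the fastest convolution algorithm

device = "cuda" if torch.cuda.is_available() else "cpu"

# AlexNet-style first convolution block; sizes are illustrative.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
).to(device)

x = torch.randn(32, 3, 224, 224, device=device)  # a batch of 224x224 RGB images
y = block(x)                                      # runs on cuDNN kernels when device == "cuda"
print(y.shape)                                    # torch.Size([32, 64, 27, 27])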

Page 27: Introduction to Deep Learning (NVIDIA)


INTRODUCING NVIDIA TensorRT: High Performance Inference Engine

User Experience: Instant Response, 45x faster with Pascal + TensorRT

Faster, more responsive AI-powered services such as voice recognition and speech translation

Efficient inference on images, video, and other data in hyperscale production data centers

Inference execution time: Tesla P40: 6 ms | Tesla P4: 11 ms | 1x CPU (14 cores): 260 ms

Page 28: Introduction to Deep Learning (NVIDIA)


NVIDIA DEEPSTREAM SDK: Delivering Video Analytics at Scale

Pipeline: Hardware Decode → Preprocess → Inference → "Boy playing soccer"

Simple, high-performance API for analyzing video

Decode H.264, HEVC, MPEG-2, MPEG-4, VP9

CUDA-optimized resize and scale

Inference with TensorRT

[Chart: concurrent video streams analyzed, 1x Tesla P4 server + DeepStream SDK vs. 13x E5-2650 v4 servers; axis 0 to 100 streams]
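
For intuition only, the decode → preprocess → infer loop can be sketched on the CPU with OpenCV and PyTorch as below. This is not the DeepStream API (DeepStream keeps decode, scaling, and TensorRT inference on the GPU), and the video file name and classifier are placeholders:

import cv2                                         # video decode (CPU here; DeepStream uses NVDEC on the GPU)
import torch
import torchvision.models as models
import torchvision.transforms.functional as F

model = models.resnet18(pretrained=True).eval()    # stand-in classifier; DeepStream would run a TensorRT engine
cap = cv2.VideoCapture("input.mp4")                # hypothetical input file

with torch.no_grad():
    while True:
        ok, frame = cap.read()                     # 1) decode a frame
        if not ok:
            break
        frame = cv2.resize(frame, (224, 224))      # 2) preprocess: resize/scale
        x = torch.from_numpy(frame[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
        x = F.normalize(x, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        logits = model(x.unsqueeze(0))              # 3) inference
        print(int(logits.argmax()))                 # predicted class index
cap.release()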

Page 29: Introduction to Deep Learning (NVIDIA)


“Billions of intelligent devices will take advantage of deep learning to provide personalization and localization as GPUs become faster and faster over the next several years.” — Tractica

BILLIONS OF INTELLIGENT DEVICES

Page 30: Introduction to Deep Learning (NVIDIA)


SMART CITIES OF THE FUTURE

“Pittsburgh's 'predictive policing' program … police car laptops will display maps showing locations where crime is likely to occur, based on data-crunching algorithms developed by scientists at Carnegie Mellon University.” — Science

Page 31: Introduction to Deep Learning (NVIDIA)


ACCELERATED ANALYTICS TECHNOLOGY

Page 32: Introduction to Deep Learning (NVIDIA)


GPU-ACCELERATION HAS NO LIMITS

MapD: 55x to 1,000x faster than comparable CPU databases on billion+ row datasets

Kinetica: hardware costs 1/10 those of standard in-memory databases

BlazeGraph: 200-300x speed-up

Graphistry: see 100x more data at millisecond speed

SQream: the supercomputing power of the GPU combined with SQream's patented technology delivers up to 100x faster analytics performance on terabyte-to-petabyte scale data sets

Page 33: Introduction to Deep Learning (NVIDIA)


MASSIVE SCALE GPU ACCELERATED ANALYTICS

DEA theft of Silk Road bitcoins

SIEM attack escalation

Twitter botnet deconstruction

Page 34: Introduction to Deep Learning (NVIDIA)


GETTING STARTED WITH DEEP LEARNING: developer.nvidia.com/deep-learning

Page 35: Introduction to Deep Learning (NVIDIA)


Thank you!