REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690...

45
Dr. Ettikan Kandasamy Karuppiah Director, Developers’ Ecosystem 2 November 2017 REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING / AI

Transcript of REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690...

Page 1: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

Dr. Ettikan Kandasamy Karuppiah

Director, Developers’ Ecosystem

2 November 2017

REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING / AI

Page 2: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

2

“Find where I parked my car”

AI IS EVERYWHERE

“Find the bag I just saw in this magazine”

“What movie should I watch next?”

Page 3: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

3

Bringing grandmother closer to family by bridging language barrier

TOUCHING OUR LIVES

Predicting sick baby’s vitals like heart rate, blood pressure, survival rate

Enabling the blind to “see” their surrounding, read emotions on faces

Page 4: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

4

Increasing public safety with smart video surveillance at airports & malls

AI FOR PUBLIC GOOD

Providing intelligent services in hotels, banks and stores

Separating weeds as it harvests, reduces chemical usage by 90%

Page 5: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

5

HOW A DEEP NEURAL NETWORK SEES

Image “Audi A7”

Raw data Low-level features Mid-level features High-level features

Page 6: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

6

KNOW YOUR PROBLEM WELL

Page 7: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

7

3000

AI MOMENTUM

TODAY BY 2020

AI startups

85% $47B 20%

of all customer service interactions will

be powered by AI bots

spend on AI technologies

of companies will dedicate workers to monitor and guide neural networks

Page 8: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

8

EVERY INDUSTRY HAS AWOKEN TO AI

2014 2016

1,549

19,439

Higher Ed

Internet

Healthcare

Finance

Automotive

Others

Government

Developer Tools

Organizations engaged with NVIDIA on Deep Learning

Page 9: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

9

1980 1990 2000 2010 2020

GPU-Computing perf

1.5X per year

1000X

by

2025

RISE OF GPU COMPUTING

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte,

O. Shacham, K. Olukotun, L. Hammond, and C. Batten New plot and data collected

for 2010-2015 by K. Rupp

102

103

104

105

106

107

Single-threaded perf

1.5X per year

1.1X per year

APPLICATIONS

SYSTEMS

ALGORITHMS

CUDA

ARCHITECTURE

Page 10: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

10

GPUCPU

Add GPUs: Accelerate Data Processing & Analytics

© NVIDIA 2013

Page 11: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

11

NVIDIA IGNITES

THE AI BIG BANGArtificial intelligence is the use of computers to simulate human intelligence.

AI amplifies our cognitive abilities — letting us solve problems where the

complexity is too great, the information is incomplete, or the details are

too subtle and require expert training.

Learning from data — a computer’s version of life experience — is how AI evolves.

GPU computing powers the computation required for deep neural networks to learn

to recognize patterns from massive amounts of data.

This new computing model sparked the AI era.

Page 12: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

12

DEEP LEARNING FRAMEWORKS

VISION SPEECH BEHAVIOR

Object Detection Voice Recognition Language TranslationRecommendation

EnginesSentiment Analysis

DEEP LEARNING

cuDNN

MATH LIBRARIES

cuBLAS cuSPARSE

MULTI-GPU

NCCL

cuFFT

Mocha.jl

Image Classification

NVIDIA DEEP LEARNING SDKHigh Performance GPU-Acceleration for Deep Learning

Page 13: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

13

ANNOUNCING TESLA V100GIANT LEAP FOR AI & HPCVOLTA WITH NEW TENSOR CORE

21B xtors | TSMC 12nm FFN | 815mm2

5,120 CUDA cores

7.5 FP64 TFLOPS | 15 FP32 TFLOPS

NEW 120 Tensor TFLOPS

20MB SM RF | 16MB Cache

16GB HBM2 @ 900 GB/s

300 GB/s NVLink

Page 14: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

14

NEW TENSOR CORE

New CUDA TensorOp instructions & data formats

4x4 matrix processing array

D[FP32] = A[FP16] * B[FP16] + C[FP32]

Optimized for deep learning

Activation Inputs Weights Inputs Output Results

Page 15: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

15

Page 16: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

16

MODEL COMPLEXITY IS EXPLODING

2016 — Baidu Deep Speech 22015 — Microsoft ResNet 2017 — Google NMT

105 ExaFLOPS8.7 Billion Parameters

20 ExaFLOPS300 Million Parameters

7 ExaFLOPS60 Million Parameters

Page 17: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

17

REVOLUTIONARY AI PERFORMANCE3X Faster DL Training Performance

Over 80x DL Training Performance in 3 Years

1x K80cuDNN2

4x M40cuDNN3

8x P100cuDNN6

8x V100cuDNN7

0x

20x

40x

60x

80x

100x

Q1

15

Q3

15

Q2

17

Q2

16

Googlenet Training Performance(Speedup Vs K80)

Speedup v

s K80

85% Scale-Out EfficiencyScales to 64 GPUs with Microsoft

Cognitive Toolkit

0 5 10 15

64X V100

8X V100

8X P100

Multi-Node Training with NCCL2.0(ResNet-50)

ResNet50 Training for 90 Epochs with 1.28M images dataset | Using Caffe2 | V100 performance measured on pre-production hardware.

1 Hour

7.4 Hours

18 Hours

3X Reduction in Time to Train Over P100

0 10 20

1XV100

1XP100

2XCPU

LSTM Training(Neural Machine Translation)

Neural Machine Translation Training for 13 Epochs |German ->English, WMT15 subset | CPU = 2x Xeon E5 2699 V4 | V100 performance

measured on pre-production hardware.

15 Days

18 Hours

6 Hours

Page 18: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

18

Deep Learning-Inferencing:TESLA V100 DELIVERS NEEDED RESPONSIVENESS WITH UP TO 99X MORE THROUGHPUT

0

1,000

2,000

3,000

4,000

5,000

CPU Server Tesla V100

ResNet-50

4,647 i/s@ 7mslatency

47 i/s@ 21msLatency

0

600

1,200

1,800

CPU Server Tesla V100

VGG-16

1,658i/s@ 7ms

Latency

23 i/s@ 43msLatency

0

500

1,000

1,500

2,000

2,500

3,000

3,500

CPU Server Tesla V100

GoogleNet

3,270 i/s@ 7ms

Latency

136 i/s@ 7ms

LatencyThro

ughput

(Im

ages/

Sec)

Thro

ughput

(Im

ages/

Sec)

Thro

ughput

(Im

ages/

Sec)

CPU: Xeon E5-2690 V4

GPU Servers: Dual Xeon E5-2690 [email protected] with 16GB PCIe GPUs configs as shownUbuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13; NCCL 2.0.4, TensorRT pre-release, data set: ImageNet,GPU Optimal batch size used to achieve 7ms latency; CPU batch size reduced to 1 if latency exceeds 7ms

Page 19: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

NVIDIA METROPOLIS —EDGE TO CLOUD AI CITY PLATFORM

Camera

CLOUDTraining and Inference

EDGE AND ON-PREMISESInference

DGX Appliance

Video recorder Server

JETSON TESLA/QUADRO

JETPACK, TENSOR RT, DEEPSTREAM

Page 20: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

20

0

10

20

30

40

50

60

70

HPCG Performance EquivalencySingle GPU Server vs Multiple CPU-Only Servers

CPU Server: Dual Xeon E5-2690 [email protected], GPU Servers: same CPU server w/ V100s PCIeCUDA Version: CUDA 9.0.103; Dataset: 256x256x256 local sizeTo arrive at CPU node equivalence, we use measured benchmark with up to 8 CPU nodes. Then we use linear scaling to scale beyond 8 nodes.

HPCGBenchmark

Exercises computational and data access patterns that closely match a broad set of important HPC applications

VERSION3

ACCELERATED FEATURESAll

SCALABILITYMulti-GPU and Multi-Node

MORE INFORMATIONhttp://www.hpcg-benchmark.org/index.html

19 CPU Servers

37 CPU Servers

67 CPU Servers # of CPU Only Servers

1 server 8x V100 GPU’s

19x 36x 67xSpeed up vs

CPU server

1 server 2x V100 GPU’s

1 server 4x V100 GPU’s

Page 21: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

REINVENTING OUR COMMUNITY

Page 22: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

REINVENTING OUR COMMUNITY

Page 23: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

23

AUTOMOTIVE DEEP LEARNING

Page 24: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

Deep Learning for Damage Estimation based on photo of the damage

Custom development for one of top 5 Insurance Companies

Total Data Available

90,000 Claims~380,000 car images

Scope of Work

Data Processed to Date*

1000 Claims4500 images

*Time lag in data migration, data clean-up and tagging images

Page 25: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

Visualization of the activations of theconvolutions over a car damage image

Visualization of the layers of a neural network and the back propagation process

Neural Network Demo

https://www.galacticar.ai/ -> Galaxy.ai : Artificial Intelligence Driven Sedan Damage Estimator

Page 26: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

Software as a Service

Annual fee + fixed fee every time API is pinged

Integration with Insurance Mobile app.

Use cases verified by the industry and insurance clients:

1) Whether client should file a claim or not - Claims triaging

2) Claims estimation from damaged sedan vehicle images

Business Model

Page 27: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

27

(1) DEEP LEARNING IN FINANCE - TRADING

http://on-demand.gputechconf.com/gtc/2016/presentation/s6589-masahiko-todoriki-performance-improvement-algorithmic-trading.pdf

http://tradetechasia.wbresearch.com/ 20 Oct 2016

Page 28: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

Generic Bigdata Operations

Correction

& Assurance

Real-time

Data

Sources

Data Analytics

Data Visualization

Scrambling

&

Protecting

Fields/Data

Data

Governanc

e

Published

Data

Marts/Lake

Harmonization

& Ontology

Mapping

Data

Harmonization

Harmonization

Terminologies

Data

Quality &

Integrity

Data

Cleansing

Data

Staging

Data

Acquisition

Unstructured

Data

Sources

Structured

Data

Sources

Data Preparation Data Dissemination

Virtualized Platform & Security Management

Data Warehousing/Storage (including Network Data)

Check for different Representation and usage Ontology

Check forCorrectionException

Data Anonymity &

Data Protection

Structured Data Access

Data Modeling

Sentiment Analytics

Data Reporting & Visualization

Data Interpretation

Social Network Understanding

Real-time Data Ingestion

Data Analytics

NetworkAnalytics

Unstructured Data Collection

Data ……….

Data Statistics

Mobile Sharing

Tablet Sharing

PC Sharing

Push & Pull Data Platform

Data Models

& Schema

Page 29: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

29

GPU DATABASES ARE EVEN FASTER1.1 Billion Taxi Ride Benchmarks

21 30

1560

80 99

1250

150269

2250

372

696

2970

0

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

MapD DGX-1 MapD 4 x P100 Redshift 6-node Spark 11-node

Query 1 Query 2 Query 3 Query 4

Tim

e in M

illise

conds

Source: MapD Benchmarks on DGX from internal NVIDIA testing following guidelines of Mark Litwintschik’s blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS @marklit82

10190 8134 19624 85942

Page 30: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

30NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

Silicon Valley-based Blue River

Technology has developed

a deep learning solution called

LettuceBot that rolls through a

field photographing 5,000

young plants a minute, using

algorithms and machine vision

to identify each sprout as

lettuce or a weed. The

company trained their neural

network with GPUs and the

Caffe deep learning

framework.

Accurate within a quarter inch,

the LettuceBot automatically

pinpoints weeds,

underdeveloped sprouts, and

overplanted areas and then

applies tiny doses of herbicide

to maximize crop production.

Automated Crop Management

Source : https://news.developer.nvidia.com/artificial-intelligence-helping-to-ensure-humanitys-future-food-supply/

Page 31: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

31NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

Researchers from the Costa

Rica Institute of Technology

and French Agricultural

Research Centre for

International Development

developed a deep learning

algorithm to automatically

identify plant specimens that

have been pressed, dried and

mounted on herbarium sheets.

According to the researchers,

this is the first attempt to use

deep learning to tackle the

difficult taxonomic task of

identifying species in natural-

history collections.

Artificial Intelligence Helps Identify Plant Species

Source : https://news.developer.nvidia.com/artificial-intelligence-helps-identify-plant-species-for-science/

Page 32: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

32

Train

whole slide image

sample

sample

training data

no

rmal

tum

or

Test

whole slide imageoverlapping image

patches tumor prob. map

1.0

0.0

0.5

Convolutional Neural Network

P(tumor)

How does it work?HOW DOES IT WORK?

Page 33: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

SAFE AND SMART CITIES IS AN AI PROBLEM

0M

200M

400M

600M

800M

1,000M

2016 2020

1 billion installed security cameras WW (2020)

30 billion frames per day

Challenging real world conditions

Traditional video analytics not trustworthy

74%

9…

2010 2011 2012 2013 2014 2015 2016

Accura

cy

Image Classification

Human

Hand-coded CV

Deep Learning

AI achieves super human results

AI driven intelligent video analytics

Page 34: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

34

Page 35: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

35

Page 36: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

36

Page 37: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;
Page 38: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

38

Page 39: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

39

AI FOR VISUAL SEARCH

IN MARKET PLACE

Page 40: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

40

TESLA PLATFORMLeading Data Center Platform for HPC and AI

TESLA GPU & SYSTEMS

NVIDIA SDK

INDUSTRY TOOLS

APPLICATIONS & SERVICES

C/C++

ECOSYSTEM TOOLS & LIBRARIES

HPC

+400 More Applications

cuBLAS

cuDNN TensorRT

DeepStreamSDK

FRAMEWORKS MODELS

Cognitive Services

AI TRAINING & INFERENCE

Machine Learning Services

ResNetGoogleNetAlexNet

DeepSpeechInceptionBigLSTM

DEEP LEARNING SDK

NCCL

COMPUTEWORKS

NVLINK SYSTEM OEM CLOUDTESLA GPU

Page 41: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

41

Instant productivity — plug-and-play, supports every AI framework

Performance optimized across the entire stack

Docker/Container Development and Deployment model

Always up-to-date via the cloud

Mixed framework environments — containerized

Direct access to NVIDIA experts

DGX STACKFully integrated Deep Learning platform

Page 42: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

42

DEEP LEARNING REQUIREMENTS

DEEP LEARNING NEEDS… DEEP LEARNING CHALLENGES NVIDIA DELIVERS

Data Scientists Demand far exceeds supply DIGITS, DLI Training

Latest Algorithms Rapidly evolvingDL SDK, GPU-Accelerated

Frameworks

Fast Training Impossible -> Practical DGX-1, P100, P40, TITAN X

Deployment Platform Must be available everywhereTensorRT, P40, P4, Jetson,

Drive PX

Page 43: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

43

DEEP LEARNING SOFTWARE

developer.nvidia.com/deep-learning

Page 44: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

44

NVIDIA DEEP LEARNING INSTITUTE

Training organizations and individuals to solve challenging problems using Deep Learning

On-site workshops and online courses presented by certified experts

Covering complete workflows for proven application use cases

Self-driving cars, recommendation engines, medical image classification, intelligent video analytics and more

Hands-on training for data scientists and software engineers

Nvidia AI Conference Singapore

23/24th October 2017

Page 45: REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING … · GPU Servers: Dual Xeon E5-2690 v4@2.6GHz with 16GB PCIe GPUs configs as shown Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 7.0.1.13;

THANK YOU