Post on 20-May-2020
A GENTLE INTRODUCTION TO AI
Miguel Martínez - Solution Architect
AI
AI, ML, DL
[Figure: timeline, 1950 to 2010]
A NEW COMPUTING MODEL
Algorithms that Learn from Examples
Traditional Approach (expert-written computer program)
• Requires domain experts
• Time consuming
• Error prone
• Not scalable to new problems
Deep Learning Approach
• Learn from data
• Easy to extend
• Speedup with GPUs
[Figure: output labels vehicle, coupe, car shown for both approaches]
CATS & DOGS
TRAINING: Learning a new capability from existing data
Deep Learning Framework
Untrained Neural Network Model
Trained Model: New capability
INFERENCE: Applying this capability to new data
Application or Service
Trained Model: Optimized for performance
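The split between training and inference can be illustrated with a deliberately tiny sketch. It is not a deep learning framework: the "model" is a single weight, and the function names `train` and `infer`, the learning rate, and the example data are all assumptions made for illustration.

```python
# Toy illustration only: "training" fits a parameter from existing examples,
# "inference" applies the fitted parameter to new data.

def train(examples, learning_rate=0.01, epochs=500):
    """Fit y ~ w * x by gradient descent on the squared error."""
    w = 0.0  # untrained model: the weight starts with no knowledge
    for _ in range(epochs):
        for x, y in examples:
            error = w * x - y
            w -= learning_rate * error * x  # gradient of 0.5 * error**2 with respect to w
    return w


def infer(w, x):
    """Apply the trained weight to an input the model has not seen."""
    return w * x


# Existing data: the hidden rule is y = 2 * x (hypothetical example data).
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
trained_w = train(examples)
prediction = infer(trained_w, 10.0)
print(f"trained weight: {trained_w:.4f}, prediction for x=10: {prediction:.4f}")
```

Training is the expensive loop over existing data; inference is a single cheap evaluation, which is why the trained model can be optimized separately for deployment.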
ORIGIN OF NEURAL NETWORKS
Biologically inspired computational units
[Figure: biological neuron, from input to output. Labels: dendrites (impulses carried toward cell body), axon (impulses carried away from cell body), axon terminals, nucleus, cell body]
A SIMPLE NEURON
Neurons apply weights to inputs to create output
[Figure: inputs x1 and x2 are weighted as w1x1 and w2x2 by the neuron, which produces output y]
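As a minimal sketch of the neuron on this slide, the output can be computed as the weighted sum y = w1x1 + w2x2. The function name `neuron` and the numeric values are illustrative assumptions; the slide shows neither a bias term nor an activation function, so both are left out here.

```python
def neuron(inputs, weights):
    """Weighted sum of the inputs: y = w1*x1 + w2*x2 + ..."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical values for x1, x2 and w1, w2.
y = neuron(inputs=[3.0, 2.0], weights=[0.5, 0.25])
print(y)
```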
COMBINING NEURONS
Stacking neurons and layers creates a more powerful model
Additional neurons can be added to create a layer.
Multiple layers can also be added, resulting in input, hidden, and output layers.
Expanding the neural network size creates additional predictive power.
In feed-forward neural networks, neurons are fully connected to surrounding layers.
[Figure: network with inputs x1 to x5, an input layer, hidden layers, an output layer, and output y]
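The stacking described above can be sketched as a forward pass through fully connected layers. This is a sketch under stated assumptions: the layer sizes, the random weights, the zero biases, and the tanh activation are illustrative choices and do not come from the slides.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """One fully connected layer: every neuron sees every input."""
    return [
        math.tanh(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def make_layer(n_inputs, n_neurons, rng):
    """Random (untrained) weights and zero biases for one layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def forward(x, layers):
    """Feed the input through each layer in turn."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x


rng = random.Random(0)
layer_sizes = [5, 4, 3, 1]  # 5 inputs, two hidden layers, 1 output (illustrative sizes)
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
y = forward([0.1, 0.2, 0.3, 0.4, 0.5], layers)
print(y)
```

Adding entries to `layer_sizes` makes the network deeper, which is the only structural change needed to move toward the many-layer case.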
DEEP NEURAL NETWORKS (DNNs)
Neural networks with many layers enable deep learning
[Figure: inputs x1 to x5, an input layer, many hidden layers, an output layer, and output y]
WHAT PROBLEM ARE YOU SOLVING?
Different Tasks for Different Problems
INPUTS: Text, Data, Images, Audio, Video
QUESTION | AI/DL TASK | EXAMPLE OUTPUTS
Is “it” present or not? | Detection | Fraud Detection
What type of thing is “it”? | Classification | Image/Text Classification
To what extent is “it” present? | Segmentation | Size/Shape Analysis
What is the likely outcome? | Prediction | Analytic Prediction
What will likely satisfy the objective? | Recommendation | Direction Recommendation
NVIDIA GTC
https://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php
GTC ONLINE FSI CONTENT
S8843: “Building an Enterprise Machine Learning Center of Excellence”, Zachary Hanif (Capital One)
S8802: “Juicing Up Ye Olde GPU Monte Carlo Code”, Richard Hayden, Oleg Rasskazov (JP Morgan Chase)
S8651: “Extracting Data from Tables and Charts in Natural Document Formats”, Philipp Meerkamp, David Rosenberg (Bloomberg)
S8343: “Detection of Financial Statement Fraud using Deep Autoencoder Networks”, Timur Sattarov (PricewaterhouseCoopers GmbH WPG), Marco Schreyer (German Research Center for Artificial Intelligence)
GTC ONLINE FSI CONTENT
S8754: “Deep Thinking: The Challenges of Deep Learning and GPU Acceleration of Financial Data”, Erind Brahimi (Wells Fargo)
S8123: “Finance - Parallel Processing for Derivative Pricing”, Louis Scott (Federal Reserve Bank of New York)
S7417: “GPU Acceleration of Monte Carlo Simulation for Capital Markets and Insurance”, Serguei Issakov (Numerix)
S7696: “Applying Deep Learning to Financial Market Signal Identification with News Data”, Rafael Nicolas Fermin Cota, Andrew Tan (Triumph Asset Management)
GTC ONLINE FSI CONTENT
S7668: “Practical Aspects of Porting Monte Carlo Exotic Derivative Pricing Engines to IBM Power 8+ with Tesla P100 GPUs”, Oleg Rasskazov (JP Morgan Chase)
S6589: “Algorithmic Trading Strategy Performance Improvement Using Deep Learning”, Masahiko Todoriki (Mizuho Securities Co., Ltd.)
S5125: “Designing a GPU-Based Counterparty Credit Risk System”, Patrik Tennberg (TriOptima)
S5570: “Accelerating Derivatives Contracts Pricing Computation with GPGPUs”, Daniel Augusto Magalhães Borges, Alexandre Barbosa (BMFBOVESPA)
GTC ONLINE FSI CONTENT
S5666: “A True Story: GPU in Production for Intraday Risk Calculations”, Regis Fricker (Societe Generale)
S4199: “Effortless GPU Models for Finance”, Ben Young (SunGard)
S4227: “GPU Implementation of Explicit and Implicit Finite Difference Methods in Finance”, Mike Giles (University of Oxford)
S4784: “Monte-Carlo Simulation of American Options with GPUs”, Julien Demouth (NVIDIA)
S3369: “Domain Specific Languages for Financial Payoffs”, Matthew Leslie (Bank of America Merrill Lynch)
GTC ONLINE FSI CONTENT
S3374: “High Performance Counterparty Risk and CVA Calculations in Risk Management”, Dominique Delarue, Azim Siddiqi (BNP Paribas)
S3173: “GPU-enabled Real-time Risk Pricing in Option Market Making”, Cris Doloc (Chicago Trading Company)
S2418: “High Productivity Computational Finance on GPUs”, Peter Phillips, Aamir Mohammad (Aon Benfield Securities)
S2435: “Leveraging GPGPU Technology for Valuation of Complex Insurance Products”, Chris Stiefeling (Oliver Wyman Financial Services)
GTC ONLINE FSI CONTENT
S3173: “GPU-enabled Real-time Risk Pricing in Option Market Making”, Cris Doloc (Chicago Trading Company)
S2656: “kdb+ and GPUs for Market Data Analytics and Trading”, Philip A. Beasley-Harling (Bank of America Merrill Lynch)
GTCE067: “How to Speed Up Financial Risk Management Cost Efficiently for Intra-day and Pre-deal CVA Calculations”, Thomas Moser (Misys)
GTCE017: “Running Risk on GPUs”, Tim Wood (ING Bank nv)
2033: “Accelerating Pricing Models with virtual GPUs”, Scott Donovan (Citadel Investment Group)
GPU ACCELERATED MACHINE LEARNING FOR BOND PRICE PREDICTION
Input: 100+ features per trade.
o Trade size / historical
o Coupon rate / time to maturity
o Bond rating
o Trade type buy/sell
o Reporting delays
o Current yield / yield to maturity
Output: Bond trading price.
Launch as many CUDA threads as there are data elements, and leverage the 5120 cores on a V100 to run multiple kernels in parallel.
https://bit.ly/2GeQLse
NEARLY 10X SPEEDUP OVER CPU IMPLEMENTATION
[Figure: speedup over CPU (axis 0 to 10) for unoptimized and optimized CUDA, plotted against P = 20 to 25, where N = 2^P is the number of rows]
NVIDIA GPU CLOUD
NVIDIA GPU CLOUD (NGC)
CHALLENGES WITH COMPLEX SOFTWARE
DIY GPU-accelerated AI and HPC deployments are complex and time consuming to build, test, and maintain.
Development of software by the community is moving very fast.
Requires a high level of expertise to manage driver, library, and framework dependencies.
[Figure: software stack with Applications or Frameworks, NVIDIA Libraries, NVIDIA Container Runtime for Docker, NVIDIA Driver, and NVIDIA GPU]
NVIDIA GPU CLOUD (NGC)
SIMPLE ACCESS TO GPU-ACCELERATED SOFTWARE
60+ GPU-Accelerated Containers
Deep learning, HPC applications and visualization tools, and partner applications.
Innovate in Minutes, Not Weeks
Optimized, pre-configured, and ready-to-run.
Always up to date
Monthly updates by NVIDIA to ensure maximum performance.
[Figure: container categories DEEP LEARNING, HPC APPS, HPC VIZ]
NVIDIA GPU CLOUD (NGC)
GPU-ACCELERATED CONTAINERS
Tuned and tested to maximize performance.
Cross-stack optimizations.
Pre-integrated and ready-to-run.
Frameworks and applications are isolated.
NGC SOFTWARE STACK
NVIDIA CONTAINER RUNTIME FOR DOCKER
DOCKER ENGINE
NVIDIA DRIVER
HOST OS
MOUNTED NVIDIA DRIVER
CONTAINER OS
CUDA TOOLKIT
DEEP LEARNING FRAMEWORKS
DEEP LEARNING LIBRARIES
APPLICATIONS
USING NGC CONTAINERS
BENEFITS FOR A WIDE VARIETY OF USERS
Data Scientists and Researchers: Eliminate setup time, focus on science and research
Developers: Work with the latest software with a known good starting point
Sysadmins: Deploy to production immediately
WHY GPUs
BEYOND MOORE'S LAW: 1000X EVERY 10 YEARS
ACCELERATED COMPUTING + COMPUTERS WRITING SOFTWARE
[Figure: GPU performance versus CPU performance on a log scale (10^3 to 10^7), 1980 to 2020. Annotations: 1.5X per year, 1.1X per year, GPUs 1000X by 2025. Inset labels: DATA, DEEP NEURAL NETWORK, PROGRAM]
TRADITIONAL COMPUTE CLUSTER
300 Dual-CPU Servers
180 kW
ACCELERATED DATACENTER
12 Accelerated Servers with 4x NVIDIA Tesla V100
1/3 the Cost
1/4 the Space
1/5 the Power
EXPERIMENTAL NATURE OF DEEP LEARNING
[Figure: cycle of Idea, Code, Experiment]
Unacceptable training time
WHAT IS RAPIDS
GPU Accelerated End-to-End Data Science
RAPIDS is a set of open source libraries for GPU accelerating data preparation and machine learning.
[Figure: pipeline kept in GPU memory. Data Preparation: cuDF (analytics). Model Training: cuML (machine learning), cuGraph (graph analytics), deep learning. Visualization: cuXfilter]
rapids.ai
cuDF
• GPU-accelerated data preparation and feature engineering
• Python drop-in Pandas replacement
cuML
• GPU-accelerated traditional machine learning libraries
• XGBoost, PCA, Kalman, K-means, k-NN, DBSCAN, tSVD…
cuGraph
• GPU-accelerated graph analytics libraries
cuXfilter
• Web Data Visualization library
• DataFrame kept in GPU-memory throughout the session
cuML roadmap
cuML Algorithms Available Q2-2019
XGBoost GBDT MGMN
XGBoost Random Forest MGMN
K-Means Clustering MG
K-Nearest Neighbors (KNN) MG
Principal Component Analysis (PCA) SG
Density-based Spatial Clustering of Applications with Noise (DBSCAN) SG
Truncated Singular Value Decomposition (tSVD) SG
Uniform Manifold Approximation and Projection (UMAP) SG MG
Kalman Filters (KF) SG
Ordinary Least Squares Linear Regression (OLS) SG
Stochastic Gradient Descent (SGD) SG
Generalized Linear Model, including Logistic (GLM) MG
Time Series (Holt-Winters) SG
Autoregressive Integrated Moving Average (ARIMA) SG
SG: Single GPU
MG: Multi-GPU
MGMN: Multi-GPU Multi-Node
Last updated 29.03.19
HOW TO START
Source code on GitHub: https://github.com/rapidsai
Containers on NGC & Docker: https://ngc.nvidia.com
Conda & PIP packages: https://anaconda.org/rapidsai
On-premises or in the cloud
Requirements: Pascal architecture or better; Ubuntu 16.04 or 18.04; CUDA 9.2 or 10.0
A step-by-step installation guide (MS Azure)
1. Create an NC6s_v2 virtual machine, and select the NVIDIA GPU Cloud Image for Deep Learning and HPC.
2. Connect to the virtual machine:
$ ssh -L 8080:localhost:8888 -L 8787:localhost:8787 username@public_ip_address
3. Pull the RAPIDS container from NGC. Run it.
$ docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04
$ docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
nvcr.io/nvidia/rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04
4. Run JupyterLab:
(rapids)$ bash /rapids/notebooks/utils/start-jupyter.sh
5. Open your browser, and navigate to http://localhost:8080.
6. Navigate to the cuml folder for cuML examples, or the mortgage folder for XGBoost examples.
A step-by-step installation guide (Amazon Web Services)
1. Create a p3.8xlarge virtual machine, and select the NVIDIA Volta Deep Learning AMI as the image.
2. Connect to the virtual machine:
$ ssh -L 8080:localhost:8888 -L 8787:localhost:8787 ubuntu@public_ip_address
3. Pull the RAPIDS container from NGC. Run it.
$ docker pull nvcr.io/nvidia/rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04
$ docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
nvcr.io/nvidia/rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04
4. Run JupyterLab:
(rapids)$ bash /rapids/notebooks/utils/start-jupyter.sh
5. Open your browser, and navigate to http://localhost:8080.
6. Navigate to the cuml folder for cuML examples, or the mortgage folder for XGBoost examples.
PORT YOUR CODE
PRINCIPAL COMPONENT ANALYSIS (PCA)
CPU vs GPU
Training results
CPU: 57.1 seconds
GPU: 4.28 seconds
System: AWS p3.8xlarge
CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 32 vCPU cores, 244 GB RAM
GPU: Tesla V100 SXM2 16GB
CPU code:
• Specific: Import CPU algorithm
• Common: Data loading and algo params
• Common: Model training
GPU code:
• Specific: Import GPU algorithm
• Common: Data loading and algo params
• Specific: DataFrame from Pandas to GPU
• Common: Model training
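The "specific" and "common" steps listed for PCA can be sketched as follows. This is not the code from the slide, which did not survive extraction: the synthetic data, the column names, and `n_components = 2` are assumptions, and the fallback to scikit-learn is added only so the sketch also runs on a machine without RAPIDS.

```python
import numpy as np
import pandas as pd

try:
    # Specific: import the GPU algorithm (and the GPU DataFrame library)
    import cudf
    from cuml import PCA
    on_gpu = True
except ImportError:
    # Specific: import the CPU algorithm
    from sklearn.decomposition import PCA
    on_gpu = False

# Common: data loading and algorithm parameters (synthetic data as a stand-in)
rng = np.random.default_rng(0)
pdf = pd.DataFrame(rng.normal(size=(1000, 8)), columns=[f"f{i}" for i in range(8)])
n_components = 2

# Specific: copy the DataFrame from pandas to the GPU
data = cudf.DataFrame.from_pandas(pdf) if on_gpu else pdf

# Common: model training
pca = PCA(n_components=n_components)
reduced = pca.fit_transform(data)
print("GPU" if on_gpu else "CPU", reduced.shape)
```

Only the import and the pandas-to-GPU copy differ between the two paths; the parameters and the training call stay the same.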
K-NEAREST NEIGHBORS (KNN)
CPU vs GPU
Training results
CPU: ~9 minutes
GPU: 1.12 seconds
System: AWS p3.8xlarge
CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 32 vCPU cores, 244 GB RAM
GPU: Tesla V100 SXM2 16GB
CPU code:
• Specific: Import CPU algorithm
• Common: Data loading and algo params
• Specific: Model training
GPU code:
• Specific: Import GPU algorithm
• Common: Data loading and algo params
• Specific: DataFrame from Pandas to GPU
• Specific: Model training
TRAINING TIME COMPARISON
CPU vs GPU
The bigger the dataset, the larger the difference in training performance between CPU and GPU.
Dataset size trained in 15 minutes:
CPU: ~130,000 rows.
GPU: ~5,900,000 rows.
Specs (NC6s_v2)
Cores (Broadwell 2.6 GHz): 6
GPU: 1 x P100
Memory: 112 GB
Local Disk: ~700 GB SSD
Network: Azure Network
BENCHMARKS
Benchmark: 200GB CSV dataset; data preparation includes joins and variable transformations.
CPU Cluster Configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX Cluster Configuration: 5x DGX-1 on InfiniBand network
Time in seconds (shorter is better)
Configuration | cuDF: Load and Data Prep | cuML: XGBoost | End-to-End
20 CPU Nodes | 2,741 | 2,290 | 8,763
30 CPU Nodes | 1,675 | 1,956 | 6,147
50 CPU Nodes | 715 | 1,999 | 3,926
100 CPU Nodes | 379 | 1,948 | 3,221
DGX-2 | 42 | 169 | 322
5x DGX-1 | 19 | 157 | 213
End-to-End chart legend: cuDF (Load and Data Preparation), Data Conversion, XGBoost
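To put the end-to-end timings in proportion, the short sketch below computes how many times faster the 5x DGX-1 cluster is than each other configuration. It assumes the end-to-end values on this slide (8,763 down to 213 seconds) map to the configurations in the order they are listed; the variable names are illustrative.

```python
# End-to-end times in seconds, as read from the slide (assumed pairing by order).
end_to_end_seconds = {
    "20 CPU Nodes": 8763,
    "30 CPU Nodes": 6147,
    "50 CPU Nodes": 3926,
    "100 CPU Nodes": 3221,
    "DGX-2": 322,
    "5x DGX-1": 213,
}

fastest = end_to_end_seconds["5x DGX-1"]

# Speedup of 5x DGX-1 relative to each configuration: time_config / time_dgx.
speedup_vs_dgx1_cluster = {
    name: seconds / fastest for name, seconds in end_to_end_seconds.items()
}

for name, factor in speedup_vs_dgx1_cluster.items():
    print(f"{name:>14}: {factor:5.1f}x slower than 5x DGX-1")
```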
WHAT IS XGBOOST
XGBOOST
DEFINITION
XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.
It is a powerful tool for solving classification and regression problems in a supervised learning setting.
WHO ENJOYS COMPUTER GAMES
Example of Decision Trees
Input: age, gender, hair colour, …
Does the person like computer games?
Tree splits: Age < 30, then Is male?
Prediction score in each leaf: +2 +1 -1 -1 -1
COMBINE TREES FOR BETTER PREDICTIONS
Ensembled Decision Trees
Tree 1: Age < 30, then Is male? Scores: +2 +1 -1 -1 -1
Tree 2: Use computer daily? Scores: +0.9 +0.9 +0.9 -0.9 -0.9
f(‘Bill’) = 2 + 0.9 = 2.9
f(‘Sam’) = -1 - 0.9 = -1.9
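The way an ensemble adds up the scores of its trees can be sketched in a few lines. The split conditions and the scores for Bill and Sam follow the slide, but the attributes given to Bill and Sam, and the assignment of the +1 leaf to "under 30 and not male", are assumptions, since the tree diagram itself is not recoverable.

```python
def tree_1(person):
    """Split on age, then on gender (leaf placement of +1 is assumed)."""
    if person["age"] < 30:
        return 2.0 if person["is_male"] else 1.0
    return -1.0


def tree_2(person):
    """Split on daily computer use."""
    return 0.9 if person["uses_computer_daily"] else -0.9


def predict(person, trees):
    """Ensemble prediction: the sum of the scores from every tree."""
    return sum(tree(person) for tree in trees)


# Hypothetical attributes chosen so the results match the scores on the slide.
bill = {"age": 12, "is_male": True, "uses_computer_daily": True}
sam = {"age": 65, "is_male": True, "uses_computer_daily": False}

trees = [tree_1, tree_2]
print("f('Bill') =", predict(bill, trees))
print("f('Sam')  =", predict(sam, trees))
```

Gradient boosting builds such trees one after another, each new tree correcting the errors left by the sum of the previous ones.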
TRAINED MODELS VISUALIZATION
Single Decision Tree vs Ensembled Decision Trees
Source: https://goo.gl/GWNdEm
WHY XGBOOST
A STRONG HISTORY OF SUCCESS
On a Wide Range of Problems
Winner of Caterpillar Kaggle Contest 2015
– Machinery component pricing
Winner of CERN Large Hadron Collider Kaggle Contest 2015
– Classification of rare particle decay phenomena
Winner of KDD Cup 2016
– Research institutions’ impact on the acceptance of submitted academic papers
Winner of ACM RecSys Challenge 2017
– Job posting recommendation
WHICH ML ALGORITHM PERFORMS BETTER
Average Rank Across 165 Datasets
Source: https://goo.gl/R8Y8Pp
[Figure: average rank per algorithm; lower is better]
XGBOOST + RAPIDS
XGBoost
• Tuned for eXtreme performance and high efficiency
• Multi-GPU and Multi-Node Support
RAPIDS
• E2E data science & analytics pipeline entirely on GPU
• User-friendly Python interfaces
• Relies on CUDA primitives, exposes parallelism and high-memory bandwidth
• Dask integration for managing workers and data in distributed environments
LEARN MORE
CODE EXAMPLES
LOADING DATA INTO A GPU DATAFRAME
cuDF code examples
Create an empty DataFrame, and add a column
Create a DataFrame with two columns
Load a CSV file into a GPU DataFrame
Use Pandas to load a CSV file, and copy its content into a GPU DataFrame
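The code shown on this slide did not survive extraction, so the following is a sketch of the four operations rather than the original examples. The alias `xdf`, the column names, and the temporary CSV file are assumptions; the fallback to pandas is included only so the sketch runs on a machine without a GPU.

```python
import os
import tempfile

import pandas as pd

try:
    import cudf as xdf  # GPU DataFrames when RAPIDS is installed
    HAVE_CUDF = True
except ImportError:
    xdf = pd  # assumption of this sketch: run on the CPU with pandas instead
    HAVE_CUDF = False

# Create an empty DataFrame, and add a column
df = xdf.DataFrame()
df["key"] = [0, 1, 2, 3, 4]

# Create a DataFrame with two columns
df2 = xdf.DataFrame({"key": [0, 1, 2], "value": [10.0, 11.0, 12.0]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "example.csv")  # hypothetical file
    with open(csv_path, "w") as f:
        f.write("key,value\n0,10.0\n1,11.0\n2,12.0\n")

    # Load a CSV file into a GPU DataFrame
    df3 = xdf.read_csv(csv_path)

    # Use pandas to load a CSV file, and copy its content into a GPU DataFrame
    pdf = pd.read_csv(csv_path)
    df4 = xdf.DataFrame.from_pandas(pdf) if HAVE_CUDF else pdf.copy()

print(df, df2, df3, df4, sep="\n\n")
```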
WORKING WITH GPU DATAFRAMES
cuDF code examples
Return the first three rows as a new DataFrame
Row slicing with column selection
Find the mean and standard deviation of a column
Count number of occurrences per value, and number of unique values
Transform column values with a custom function
Change the data type of a column
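As with the previous slide, the original snippets are missing, so this is a sketch of the listed operations under stated assumptions: the alias `xdf`, the column names, and the data are illustrative, and pandas is used as a fallback when cuDF is not installed. Recent cuDF releases provide `Series.apply`; older releases may need a different method for custom functions.

```python
try:
    import cudf as xdf  # GPU DataFrames when RAPIDS is installed
except ImportError:
    import pandas as xdf  # assumption of this sketch: CPU fallback

df = xdf.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2.0, 2.0, 4.0, 4.0, 4.0, 8.0],
})

# Return the first three rows as a new DataFrame
first_three = df.head(3)

# Row slicing with column selection (label-based, both ends inclusive)
sliced = df.loc[2:4, ["a"]]

# Find the mean and standard deviation of a column
mean_b = df["b"].mean()
std_b = df["b"].std()

# Count number of occurrences per value, and number of unique values
counts = df["b"].value_counts()
n_unique = df["b"].nunique()

# Transform column values with a custom function
df["a_squared"] = df["a"].apply(lambda v: v * v)

# Change the data type of a column
df["a"] = df["a"].astype("float32")

print(first_three, sliced, counts, df, sep="\n\n")
print(mean_b, std_b, n_unique)
```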
QUERY, SORT, GROUP, JOIN, …
cuDF code examples
Query a DataFrame with a boolean expression
Return the first ‘n’ rows ordered by ‘columns’
Sort a column by its values
One-hot encoding
Group by column with aggregate function
Join and merge DataFrames
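The operations listed above can be sketched as follows. Again, this is not the slide's original code: the alias `xdf`, the column names, and the data are assumptions, and pandas stands in when cuDF is not available. Row order after a merge is not relied on, since a GPU join may return rows in a different order.

```python
try:
    import cudf as xdf  # GPU DataFrames when RAPIDS is installed
except ImportError:
    import pandas as xdf  # assumption of this sketch: CPU fallback

df = xdf.DataFrame({
    "key": [0, 1, 0, 2, 1, 0],
    "value": [3, 1, 4, 1, 5, 9],
})

# Query a DataFrame with a boolean expression
filtered = df.query("value > 2")

# Return the first 'n' rows ordered by 'columns' (largest values first)
top2 = df.nlargest(2, "value")

# Sort a column by its values
sorted_values = df["value"].sort_values()

# One-hot encoding
encoded = xdf.get_dummies(df, columns=["key"])

# Group by column with aggregate function
grouped = df.groupby("key").agg({"value": "sum"})

# Join and merge DataFrames
lookup = xdf.DataFrame({"key": [0, 1, 2], "weight": [0.5, 1.0, 2.0]})
merged = df.merge(lookup, on="key", how="left")

print(filtered, top2, sorted_values, encoded, grouped, merged, sep="\n\n")
```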
SUMMARY
GPU Accelerated Data Science
RAPIDS is a set of open source libraries for GPU accelerating data preparation and machine learning.
www.rapids.ai
ONE MORE THING