Predicting Loan Delinquency at One Million Transactions per Second
David Smith (@revodavid), R Community Lead, Microsoft
Generating Predictions
Batch Mode
• Create many (millions!) of predictions at once
• Time required is proportional to the number of predictions
Real Time
• Only a few (maybe only one!) data points available to predict
– There may be many requests in a short time frame
• Latency is the key metric here
– Many applications require sub-second latency at the endpoint
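The two modes can be contrasted in a few lines of Python. This is a minimal sketch, assuming a toy logistic model whose coefficients (`INTERCEPT`, `COEFS`) are hypothetical and not taken from the Lending Club model: batch mode scores a whole list of rows in one call, so its total time grows with the number of rows, while real-time mode scores one row per request and reports that request's latency.

```python
import math
import time

# Hypothetical coefficients for a toy logistic model (not the Lending Club model).
INTERCEPT = -2.0
COEFS = {"int_rate": 0.15, "revol_util": 0.01}


def score(row):
    """Return the predicted probability for one row (a dict of feature values)."""
    z = INTERCEPT + sum(COEFS[name] * row[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(rows):
    """Batch mode: score all rows in one call; total time grows with len(rows)."""
    return [score(row) for row in rows]


def score_request(row):
    """Real-time mode: score a single row and report this request's latency in ms."""
    start = time.perf_counter()
    prob = score(row)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return prob, latency_ms


if __name__ == "__main__":
    rows = [{"int_rate": 5.0 + i % 20, "revol_util": 50.0} for i in range(100_000)]
    start = time.perf_counter()
    probs = score_batch(rows)
    print(f"batch: {len(probs)} predictions in {time.perf_counter() - start:.3f} s")
    prob, latency_ms = score_request({"int_rate": 12.5, "revol_util": 40.0})
    print(f"real-time: p = {prob:.3f}, latency = {latency_ms:.4f} ms")
```

The scoring function is identical in both modes; what changes is the metric that matters (total throughput for batch, per-request latency for real time).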
Real-Time Operationalization Options
• Rewrite the prediction code in another language or format
– PMML / C++ / Java / …
• OR, use your R code:
– Deploy as a web service with Microsoft R Server
– Deploy as a stored procedure in SQL Server
Lending Club Loan Performance Data
• www.lendingclub.com/info/download-data.action
– Feature selection and generation: aka.ms/lendingclub
Variable (LoanStatNew): Description
all_util: Balance to credit limit on all trades
annual_inc_joint: The combined self-reported annual income provided by the co-borrowers during registration
dti_joint: A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
int_rate: Interest rate on the loan
mths_since_last_record: The number of months since the last public record
revol_util: Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit
total_rec_prncp: Principal received to date
is_bad (generated): Late more than 16 days, Default, or Charged Off
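The generated `is_bad` label can be sketched as a simple mapping from the loan status column. This assumes the raw Lending Club export has a `loan_status` column whose values include strings such as "Late (16-30 days)", "Late (31-120 days)", "Default" and "Charged Off"; those exact strings are an assumption here and should be checked against the downloaded files and the feature-generation code at aka.ms/lendingclub.

```python
# Assumed loan_status values counted as "bad": late more than 16 days, default, or charged off.
BAD_STATUSES = {
    "Late (16-30 days)",
    "Late (31-120 days)",
    "Default",
    "Charged Off",
}


def is_bad(loan_status: str) -> int:
    """Return 1 if the (assumed) loan status string indicates delinquency, else 0."""
    return int(loan_status.strip() in BAD_STATUSES)


if __name__ == "__main__":
    for status in ["Current", "In Grace Period", "Late (16-30 days)", "Charged Off"]:
        print(f"{status!r}: is_bad={is_bad(status)}")
```

"In Grace Period" maps to 0 because it covers loans less than 16 days late. Some Lending Club files also contain prefixed statuses such as "Does not meet the credit policy. Status:Charged Off"; whether to count those as bad is a design choice this sketch leaves out.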
Operationalization with Microsoft R Server
• Deployment (Data Scientist): publish R functions as web services with publishService from Microsoft R Client (mrsdeploy package)
• Integration (Developer): Swagger API service, consumable from any programming language
• Consumption (Quant): explore and consume services directly in R with Microsoft R Client (mrsdeploy package)
• Configuration (IT Administrator): Microsoft R Server configured for operationalizing R analytics, here on a Data Science Virtual Machine (Azure GS5 instance, 32 cores, 448 GB RAM)
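Because each published service is exposed as a REST endpoint described by Swagger, any language with an HTTP client can call it. The sketch below only builds the request with Python's standard library and does not send it. The host, port 12800, the `/api/<name>/<version>` path, the bearer-token header (token obtained from the server's login endpoint, not shown) and the JSON input names are all assumptions for illustration; the service's own Swagger document is the authoritative description of its URL and inputs.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your server, service name and version.
ENDPOINT = "http://localhost:12800"
SERVICE_NAME = "loanService"
SERVICE_VERSION = "v1.0.0"


def build_score_request(token, inputs):
    """Build (but do not send) a POST request to a published scoring service.

    Assumes the service lives at {ENDPOINT}/api/{name}/{version}, takes its
    inputs as a JSON object, and expects a bearer token obtained beforehand.
    """
    url = f"{ENDPOINT}/api/{SERVICE_NAME}/{SERVICE_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_score_request("<access token>", {"int_rate": 12.5, "revol_util": 40.0})
    print(req.get_method(), req.full_url)
    # Against a running server, the request would be sent with urllib.request.urlopen(req).
```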
Flexible vs Real-Time Deployment
Flexible Deployment
Publish R as Web Service
• Any R function or package
• R interpreter runs on demand, exposed as a REST API described by Swagger
Real-Time Deployment
Publish R model object
• RevoScaleR or MicrosoftML models
• Prediction engine generates scores from data via REST API
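Flexible:
library(mrsdeploy)
publishService(
  name = "loanScoreScript",   # hypothetical service name
  code = <<R script or function>>,
  serviceType = "Script")
Real-Time:
library(mrsdeploy)
publishService(
  name = "loanScoreModel",    # hypothetical service name
  code = NULL,                # real-time services publish a model object, not code
  model = <<R object>>,
  serviceType = "Realtime")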
Real-Time Deployment Models
Linear Regression (rxLinMod, rxFastLinear)
Logistic Regression (rxLogit, rxLogisticRegression)
Classification / Regression trees (rxDTree, rxFastTrees)
Classification / Regression forests (rxDForest, rxFastForest)
Stochastic gradient-boosted decision trees (rxBTrees)
One-class Support Vector Machines (rxOneClassSvm)
Convolutional Neural Networks (rxNeuralNet)
Also: pre-trained models for text sentiment and image featurization
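Part of why these model types suit a dedicated prediction engine is that scoring them needs only the fitted parameters, not an R interpreter. As an illustration (not the engine's actual implementation), a fitted tree can be stored as plain data and scored by walking from the root to a leaf, and a forest can average the trees' leaf values. The tree layout and split values below are hypothetical; rxDForest and rxFastForest use their own internal formats and combination rules.

```python
# A node is either a leaf value (float) or a split stored as a dict:
# {"feature": name, "threshold": t, "left": node, "right": node}.


def score_tree(node, row):
    """Walk from the root to a leaf: go left when row[feature] <= threshold."""
    while isinstance(node, dict):
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node


def score_forest(trees, row):
    """Average the leaf values of all trees (here, per-tree probabilities of is_bad)."""
    return sum(score_tree(tree, row) for tree in trees) / len(trees)


# Two hypothetical, hand-written trees; a real model would be fitted to data.
TREES = [
    {
        "feature": "int_rate", "threshold": 15.0,
        "left": 0.05,
        "right": {"feature": "revol_util", "threshold": 80.0, "left": 0.2, "right": 0.4},
    },
    {"feature": "revol_util", "threshold": 60.0, "left": 0.1, "right": 0.3},
]

if __name__ == "__main__":
    print(score_forest(TREES, {"int_rate": 18.0, "revol_util": 90.0}))
```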
FLEXIBLE AND REAL-TIME SCORING WITH MICROSOFT R SERVER
Demonstration
Server: Azure Data Science Virtual Machine, Azure GS5 instance (32 cores, 448 GB memory)
Client: Surface Book / Microsoft R Client
Flexible vs Real-Time Performance Comparison
Server: Standard_D3_v2 (4 CPU cores, 14 GB RAM), Windows

Algorithm (model size)           Real-time (ms)   Flexible (ms)
rxLogit (2 KB)                   3.5              39.2
rxNeuralNet (8 KB)               2.5              122.0

Model size (rxLogisticRegression)   Real-time (ms)   Flexible (ms)
2 MB                                5.0              9215.7
43 MB                               5.4              20255.6
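The gap between the two modes is easier to read as a ratio. This short sketch divides each flexible latency by the matching real-time latency, using only the numbers from the comparison above; the row labels are just names for those rows.

```python
# Latencies copied from the comparison above, in milliseconds: (label, real-time, flexible).
RESULTS = [
    ("rxLogit", 3.5, 39.2),
    ("rxNeuralNet", 2.5, 122.0),
    ("rxLogisticRegression, 2 MB", 5.0, 9215.7),
    ("rxLogisticRegression, 43 MB", 5.4, 20255.6),
]


def speedups(results):
    """Return {label: flexible_ms / realtime_ms} for each benchmark row."""
    return {label: flexible / realtime for label, realtime, flexible in results}


if __name__ == "__main__":
    for label, ratio in speedups(RESULTS).items():
        print(f"{label}: real-time is {ratio:,.1f}x faster")
```

In these numbers real-time latency stays between 2.5 and 5.4 ms while flexible latency grows with model size, so the ratio ranges from about 11x to about 3,750x.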
Deployment in SQL Server 2016
• Flexible: run any R script in-database with sp_execute_external_script
• Real-Time: serialize a model with rxSerializeModel in Microsoft R Client (RevoScaleR package), then score it inside SQL Server 2016 with sp_rxPredict
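The real-time pattern is: serialize a trained model on the client, store it as a binary value in the database, and let the server score rows from a query against it. The sketch below mimics that flow with Python's built-in `sqlite3` and `pickle` modules as stand-ins; it is not how `sp_rxPredict` works internally, and the `models` table, the dict-based model format and its coefficients are hypothetical.

```python
import math
import pickle
import sqlite3


def predict_row(model, row):
    """Score one row with a logistic model stored as plain parameters."""
    z = model["intercept"] + sum(w * row[name] for name, w in model["coefs"].items())
    return 1.0 / (1.0 + math.exp(-z))


def store_model(conn, name, model):
    """Serialize the model to bytes and store it in a table (like saving a serialized model)."""
    conn.execute("CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, payload BLOB)")
    conn.execute(
        "INSERT OR REPLACE INTO models (name, payload) VALUES (?, ?)",
        (name, pickle.dumps(model)),
    )


def predict_from_query(conn, name, query):
    """Load the stored model and score every row returned by `query`."""
    (payload,) = conn.execute(
        "SELECT payload FROM models WHERE name = ?", (name,)
    ).fetchone()
    model = pickle.loads(payload)
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [predict_row(model, dict(zip(columns, values))) for values in cursor]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loans (int_rate REAL, revol_util REAL)")
    conn.executemany("INSERT INTO loans VALUES (?, ?)", [(8.0, 20.0), (24.0, 95.0)])
    model = {"intercept": -2.0, "coefs": {"int_rate": 0.15, "revol_util": 0.01}}  # hypothetical
    store_model(conn, "is_bad_model", model)
    print(predict_from_query(conn, "is_bad_model", "SELECT int_rate, revol_util FROM loans"))
```

`pickle` is used only because it is built in; a pickled blob from an untrusted source should never be loaded.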
blog.revolutionanalytics.com/2016/09/fraud-detection.html
• Flexible operationalization: 1M predictions/sec on SQL Server 2017 (8 sockets, 192 cores, 6 TB RAM)
• Real-Time operationalization: same benchmark with one-sixth the resources
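A rough per-core view of the benchmark, as a sketch. It assumes the slide means that flexible operationalization reached 1M predictions/sec on all 192 cores, that "one-sixth the resources" means one-sixth the cores for real-time, and that throughput divides evenly across cores; none of these is measured here.

```python
def per_core_throughput(predictions_per_sec, cores):
    """Predictions per second handled by each core, assuming even scaling."""
    return predictions_per_sec / cores


FLEXIBLE_CORES = 192                 # from the slide: 8 sockets, 192 cores
REALTIME_CORES = FLEXIBLE_CORES / 6  # assumption: "one-sixth the resources" = one-sixth the cores
TARGET = 1_000_000                   # predictions per second

if __name__ == "__main__":
    print(f"flexible:  {per_core_throughput(TARGET, FLEXIBLE_CORES):,.0f} predictions/s per core")
    print(f"real-time: {per_core_throughput(TARGET, REALTIME_CORES):,.0f} predictions/s per core")
```

Under these assumptions real-time would need 32 cores and handle about 31,250 predictions per second per core, versus about 5,208 for flexible.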
Operationalization Overview
Flexible Operationalization: any R function / package
Real-Time Operationalization: specific RevoScaleR / MicrosoftML models

SQL Server, Flexible:
EXEC sp_execute_external_script
  @language = N'R',
  @script = N'<<R script>>'
SQL Server, Real-Time:
EXEC sp_rxPredict
  @model = <<serialized R object>>,
  @inputData = N'<<SQL query>>'

Microsoft R Server, Flexible:
library(mrsdeploy)
publishService(
  name = "loanScoreScript",   # hypothetical service name
  code = <<R script or function>>,
  serviceType = "Script")
Microsoft R Server, Real-Time:
library(mrsdeploy)
publishService(
  name = "loanScoreModel",    # hypothetical service name
  code = NULL,
  model = <<R object>>,
  serviceType = "Realtime")
• Use Microsoft R Server 9+ or SQL Server 2016+ as the deployment server
• Flexible Operationalization supports any R code / package
• Real-Time Operationalization supports Microsoft R models with much lower latency