Predicting Loan Delinquency at One Million Transactions per Second

David Smith (@revodavid), R Community Lead, Microsoft

It looks like you've created a predictive model… NOW WHAT?

http://hamiltonmusical.wikia.com/wiki/Right_Hand_Man

Generating Predictions

Batch Mode

• Create many predictions (millions!) at once

• Time required is proportional to the number of predictions

Real Time

• Only a few data points (maybe only one!) are available to predict

– There may be multiple requests in a short timeframe

• Latency is the key metric here

– Many applications require sub-second latency at the endpoint
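
The two modes can be compared with a small base R sketch. It is not taken from the talk: the data is simulated, the model is an ordinary glm() logistic regression rather than a RevoScaleR or MicrosoftML model, and the names loans, fit, and one_request are illustrative only. It times one batch call over all rows, then records the latency of single-row requests.

```r
# Simulated loans; columns and coefficients are illustrative assumptions.
set.seed(42)
n <- 100000L
loans <- data.frame(
  int_rate   = runif(n, 5, 25),
  revol_util = runif(n, 0, 100)
)
# Simulated outcome: higher rate and utilization raise the odds of a bad loan.
p_bad <- plogis(-4 + 0.1 * loans$int_rate + 0.02 * loans$revol_util)
loans$is_bad <- rbinom(n, 1, p_bad)

fit <- glm(is_bad ~ int_rate + revol_util, data = loans, family = binomial)

# Batch mode: score every row in a single call.
batch_time <- system.time(
  batch_scores <- predict(fit, newdata = loans, type = "response")
)[["elapsed"]]

# Real time: score one row per call and record the latency of each request.
one_request <- function(i) {
  system.time(
    predict(fit, newdata = loans[i, ], type = "response")
  )[["elapsed"]]
}
latencies <- vapply(1:200, one_request, numeric(1))

cat("Batch:", n, "predictions in", batch_time, "s\n")
cat("Real time: median latency", median(latencies) * 1000, "ms per request\n")
```

Batch cost is spread over many rows, so throughput is the figure of merit there; for single requests the per-call overhead dominates, which is why latency is the metric to watch.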

Real-Time Operationalization Options

• Rewrite prediction code in some other language

– PMML / C++ / Java / …

• OR, use your R code:

– Deploy as a web service with Microsoft R Server

– Deploy as a stored procedure in SQL Server

Lending Club Loan Performance Data

• www.lendingclub.com/info/download-data.action

– Feature selection and generation: aka.ms/lendingclub

LoanStatNew | Description
all_util | Balance to credit limit on all trades
annual_inc_joint | The combined self-reported annual income provided by the co-borrowers during registration
dti_joint | A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
int_rate | Interest rate on the loan
mths_since_last_record | The number of months since the last public record
revol_util | Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit
total_rec_prncp | Principal received to date
is_bad (generated) | Late > 16 days, Default, or Charged Off
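
A minimal base R sketch of how the generated is_bad flag might be derived. The loan_status column name and the status labels below are assumptions about the downloaded file, not something shown on the slides; the actual feature generation is described at aka.ms/lendingclub.

```r
# Hypothetical sample of statuses; the real file may use different labels.
loans <- data.frame(
  loan_status = c("Current", "Fully Paid", "In Grace Period",
                  "Late (16-30 days)", "Late (31-120 days)",
                  "Default", "Charged Off"),
  stringsAsFactors = FALSE
)

# Statuses treated as "bad", following the slide's definition:
# late by 16 days or more, defaulted, or charged off.
bad_statuses <- c("Late (16-30 days)", "Late (31-120 days)",
                  "Default", "Charged Off")

loans$is_bad <- loans$loan_status %in% bad_statuses

print(loans)
```

Using %in% against an explicit list keeps the definition of "bad" in one place, so it is easy to adjust if the status labels differ.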

Operationalization with Microsoft R Server

• Data Scientist (Deployment): publish R functions as web services with publishService, from Microsoft R Client (mrsdeploy package)

• Developer (Integration): Swagger API service; consume with any programming language

• Quant (Consumption): explore and consume services in R directly, from Microsoft R Client (mrsdeploy package)

• IT Administrator (Configuration): Microsoft R Server configured for operationalizing R analytics, on a Data Science Virtual Machine (Azure GS5 instance, 32 cores, 448 GB RAM)

Flexible vs Real-Time Deployment

Flexible Deployment

Publish R as Web Service

• Any R function or package

• R interpreter runs on-demand in Swagger via REST API

Real-Time Deployment

Publish R model object

• RevoScaleR or MicrosoftML models

• Prediction engine generates scores from data via REST API

library(mrsdeploy)
publishService(
  serviceType = 'Script',
  code = <<R script or function>>)

library(mrsdeploy)
publishService(
  serviceType = 'Realtime',
  model = <<R object>>)

Real-Time Deployment Models

Linear Regression (rxLinMod, rxFastLinear)

Logistic Regression (rxLogit, rxLogisticRegression)

Classification / Regression trees (rxDTree, rxFastTrees)

Classification / Regression forests (rxDForest, rxFastForest)

Stochastic gradient-boosted decision trees (rxBTrees)

One-class Support Vector Machines (rxOneClassSvm)

Convolutional Neural Networks (rxNeuralNet)

Also: pre-trained models for text sentiment and image featurization

FLEXIBLE AND REAL-TIME SCORING WITH MICROSOFT R SERVER

Demonstration

Server: Azure Data Science Virtual Machine, Azure GS5 instance (32 cores, 448 GB memory)

Client: Surface Book / Microsoft R Client

Flexible vs Real-Time Performance Comparison

Server: Standard_D3_v2 (4 CPU cores, 14 GB RAM), Windows

Algorithm | Real-time (ms) | Flexible (ms)
rxLogit (model size 2K) | 3.5 | 39.2
rxNeuralNet (model size 8K) | 2.5 | 122.0

Model size | Real-time (ms) | Flexible (ms)
2 MB (rxLogisticRegression) | 5.0 | 9215.7
43 MB (rxLogisticRegression) | 5.4 | 20255.6
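
To put the latency figures above in perspective, this base R sketch computes the flexible-to-real-time ratio for each row. The numbers are copied from the slide; the data frame and column names are illustrative, and the requests-per-second column is only a rough bound that assumes one request at a time with no concurrency.

```r
# Latencies in milliseconds, as reported on the slide.
latency <- data.frame(
  model       = c("rxLogit, 2K", "rxNeuralNet, 8K",
                  "rxLogisticRegression, 2 MB",
                  "rxLogisticRegression, 43 MB"),
  realtime_ms = c(3.5, 2.5, 5.0, 5.4),
  flexible_ms = c(39.2, 122.0, 9215.7, 20255.6)
)

# How many times slower the flexible service is than the real-time one.
latency$speedup <- round(latency$flexible_ms / latency$realtime_ms, 1)

# Rough upper bound on sequential requests per second at real-time latency.
latency$realtime_per_sec <- round(1000 / latency$realtime_ms)

print(latency)
```

The ratio ranges from about 11x for the small rxLogit model to roughly 3,750x for the 43 MB model: flexible latency grows sharply with model size, while real-time latency stays near 5 ms.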

Deployment in SQL Server 2016

• Flexible: sp_execute_external_script

• Real-Time: sp_rxPredict, with the model serialized by rxSerializeObject from Microsoft R Client (RevoScaleR package)

blog.revolutionanalytics.com/2016/09/fraud-detection.html

SQL Server 2017

• 8 sockets, 192 cores

• 6 TB RAM

• Flexible operationalization

Flexible vs Real-Time

• 1M predictions/sec

• Same benchmark

• One-sixth the resources

Operationalization Overview

• Flexible Operationalization: any R function / package

• Real-Time Operationalization: specific RevoScaleR / MicrosoftML models

SQL Server, Flexible:

EXEC sp_execute_external_script
  @language = N'R',
  @script = N'<<R script>>'

SQL Server, Real-Time:

EXEC sp_rxPredict
  @model = <<serialized R object>>,
  @inputData = <<SQL query>>

Microsoft R Server, Flexible:

library(mrsdeploy)
publishService(
  serviceType = 'Script',
  code = <<R script or function>>)

Microsoft R Server, Real-Time:

library(mrsdeploy)
publishService(
  serviceType = 'Realtime',
  model = <<R object>>)

• Use Microsoft R Server 9+ or SQL Server 2016+ as the deployment server

• Flexible Operationalization supports any R code / package

• Real-Time Operationalization supports Microsoft R models with improved latency

Thank You!

David Smith (@revodavid), R Community Lead, Microsoft

Special thanks: Pratik Palnitkar, Microsoft

Arun Gurunathan, Microsoft

Download Microsoft R Client: aka.ms/rclient

Data Science Virtual Machine: aka.ms/dsvm