Production model lifecycle management 2016 09
![Page 1: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/1.jpg)
© 2016 LigaData, Inc. All Rights Reserved.
Production Model Lifecycle Management
Presented: Tue, Sept 20, 2016
[email protected] www.Ligadata.org www.Kamanja.org
![Page 2: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/2.jpg)
Contents
• Develop a Robust Solution (or get fired)
• Selecting the Best Model w/ Model Notebook
• Describing the Model
• Putting a Model in Production
• Model Drift over Time (Non-Stationary)
• Retrain or Refresh the Model
• Kamanja Open Source PMML Scoring Platform

Model: Accurate, General, Understandable
Can you have all 3 model attributes?
![Page 4: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/4.jpg)
Develop a Robust Solution (or get fired)
General

Epsilon (owned by American Express then)
ACG’s first neural network (1992) (~40 quants in the Analytic Consulting Group)

Score 250 million households every month, pick the best 5 million households
Neural net by a previous consultant:
• did great “in the lab”!
• did “reasonable” in month 1
• did “worse” in month 2
• “bad” in month 3 (no lift over random)

The prior consultant was fired. I was hired, and told why I was replacing him.

My model captured the same response with 4 million households mailed
• was stable for 24+ months, saved $1 million / month
• Why? Good KDD Process (Knowledge Discovery in Databases)
![Page 6: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/6.jpg)
Model Notebook
Accurate

Bad vs. Good
![Page 7: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/7.jpg)
Model Notebook
Accurate

R package “caret”
• Same parameter search wrapper over 217 algorithms
• http://topepo.github.io/caret/index.html
• A “section” of a model notebook
• Still need to track the results of each section
![Page 8: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/8.jpg)
Bad vs. Good
Accurate

217 R Algorithms Covered

Do you really want a one-off solution?
• Experimenting with Algorithms
• Experimenting with Algorithm Parameters
• Variable description → refine preprocessing
• :
• Deep Learning architectures have many parameters and network designs
![Page 10: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/10.jpg)
Model Notebook
Accurate in terms of business focus

Bad vs. Good

Q) What is the best outcome metric? ROC, R², Lift, MAD, …

A) Deployment simulation of cost-value-strategy
• Does the business problem mirror the 80-20 rule? Just act on top 1% or top 5%?
• Is the business deployment over all the score range [0 … 1]?
• Or just over the top 1% or 5% of the score? (then NOT ROC, R², corr)
• Are some records 5× or 20× more valuable? → Use cost-profit weighting, or a more complex system

Is this taught in mining competitions or classes?
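A minimal sketch of evaluating a model only on the top slice of its ranked scores, in the spirit of the deployment simulation above. The function names `lift_at_top` and `value_at_top`, and the per-record `values` list used for cost-profit weighting, are illustrative assumptions and not from the original deck.

```python
def lift_at_top(scores, outcomes, fraction=0.05):
    """Response rate in the top `fraction` of records (ranked by score)
    divided by the overall response rate. Assumes at least one responder."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    base_rate = sum(outcomes) / len(outcomes)
    return top_rate / base_rate


def value_at_top(scores, values, fraction=0.05):
    """Total business value captured when acting only on the top `fraction`.
    `values` holds a hypothetical per-record profit (or cost) in dollars."""
    ranked = sorted(zip(scores, values), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return sum(value for _, value in ranked[:n_top])
```

Sorting candidate models by `value_at_top` rather than by a whole-range metric such as ROC area keeps the comparison tied to the part of the score range the business would actually act on.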
![Page 12: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/12.jpg)
Calculate $ of “Business Pain”
Accurate in terms of business focus

[Figure: business pain in $ around zero error, with OverStock and UnderStock sides; labels: 1% bus pain $, 15% business pain $]
← Equal mistakes → Unequal PAIN in $

Need to Deeply Understand Business Metrics
At least use Type I vs. Type II weighting
![Page 13: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/13.jpg)
Calculate $ of “Business Pain”
Accurate in terms of business focus

No way – that could get you fired!
New progress in getting feedback

[Figure: business pain in $ around zero error, with OverStock and UnderStock sides; labels: 1% bus pain $, 15% business pain $, 30% bus pain $]
OverStock: 4 week supply of SKU → 30% off sale
← Equal mistakes → Unequal PAIN in $
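One way to turn "equal mistakes, unequal pain" into a metric is an asymmetric cost function, sketched below. The rates `over_rate` and `under_rate` are hypothetical placeholders loosely borrowed from the percentages on the slide; which side of the error each percentage belongs to, and the real dollar figures, would have to come from the business.

```python
def business_pain(forecast, actual, unit_price, over_rate=0.30, under_rate=0.01):
    """Dollar pain of one forecast error, with different rates per direction.

    over_rate and under_rate are hypothetical fractions of unit price lost
    per unit of overstock or understock (assumptions, not measured values).
    """
    error = forecast - actual
    if error > 0:
        # Overstock: excess units, e.g. cleared later at a discount.
        return error * unit_price * over_rate
    # Understock: units that could not be sold.
    return -error * unit_price * under_rate


def total_pain(forecasts, actuals, unit_price, **rates):
    """Sum the dollar pain over a set of forecasts."""
    return sum(business_pain(f, a, unit_price, **rates)
               for f, a in zip(forecasts, actuals))
```

Two forecasts that miss by the same number of units in opposite directions then score very differently, which a symmetric metric such as MAD or RMSE cannot show.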
![Page 14: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/14.jpg)
Model Notebook
Outcome Details

• My Heuristic Design Objectives: (yours may be different)
  – Accuracy in deployment
  – Reliability and consistent behavior, a general solution
    • Use one or more hold-out data sets to check consistency
    • Penalize more, as the forecast becomes less consistent
  – No penalty for model complexity (if it validates consistently)
  – Develop a “smooth, continuous metric” to sort and find models that perform “best” in future deployment

What would you do?
![Page 15: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/15.jpg)
Model Notebook
Outcome Details

• Training = results on the training set
• Validation = results on the validation hold out
• Gap = abs( Training – Validation )

A bigger gap (volatility) is a bigger concern for deployment, a symptom
• Minimize Senior VP heart attacks! (one penalty for volatility)
• Set expectations & meet expectations
• Regularization helps significantly

• Conservative Result = worst( Training, Validation ) + Gap_penalty
  – Corr / Lift / Profit → higher is better: Cons Result = min(Trn, Val) - Gap
  – MAD / RMSE / Risk → lower is better: Cons Result = max(Trn, Val) + Gap

Business Value or Pain ranking = function of( conservative result )

Generalization: You can’t optimize something you don’t measure
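The conservative result above translates directly into a few lines of code. This is a sketch under the slide's definitions, using the gap itself as the penalty; the function names and the `(name, training, validation)` tuple layout for notebook entries are assumptions made for illustration.

```python
def conservative_result(training, validation, higher_is_better=True):
    """Worst of training and validation, pushed further by the gap between them."""
    gap = abs(training - validation)
    if higher_is_better:
        # Corr / Lift / Profit: take the lower value and subtract the gap.
        return min(training, validation) - gap
    # MAD / RMSE / Risk: take the higher value and add the gap.
    return max(training, validation) + gap


def rank_notebook(entries, higher_is_better=True):
    """Sort (name, training, validation) notebook entries, best first."""
    return sorted(
        entries,
        key=lambda entry: conservative_result(entry[1], entry[2], higher_is_better),
        reverse=higher_is_better,
    )
```

For example, a model with lift 0.90 in training and 0.60 in validation gets a conservative result of 0.30, and ranks below a model with 0.72 and 0.70 (conservative result 0.68), even though its training result is higher.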
![Page 16: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/16.jpg)
Model Notebook
Accurate & General

Bad vs. Good
![Page 17: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/17.jpg)
Model Notebook Process
Tracking Detail ➔ Training the Data Miner

[Figure: “The Data Mining Battle Field” model notebook table; labels: Input / Test Outcome, Regression, Top 5%, Top 10%, Top 20%, AutoNeural, Neural, “Yippeee!”, More]

Heuristic Strategy:
• Try a few models of many algorithm types (seed the search)
• Opportunistically spend more effort on what is working (invest in top stocks)
• Still try a few trials on medium success (diversify, limited by project time-box)
• Try ensemble methods, combining model forecasts & top source vars w/ model
![Page 19: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/19.jpg)
When Rejecting Credit – Law Requires 4 Record Level Reasons
Understandable

The law does not care how complex the model or ensemble was.
• i.e. NOT sex, age, marital status, race, …
• i.e. “over 180 days late on 2+ bills”

There are solutions to this constraint, for an arbitrary black box

The solutions have broad use in many areas of the model lifecycle
![Page 21: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/21.jpg)
Should a data miner cut algorithm choices, so they can come up with reasons?
Understandable

97% of the time, NO! (or let me compete with you)

Focus on the most GENERAL & ACCURATE system first

A VP does not need to know how to program a B+ tree, in order to make a SQL vendor purchase decision. (Be a trusted advisor)

“I understand how a bike works, but I drive a car to work”
“I can explain the model, to the level of detail needed to drive your business”
![Page 22: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/22.jpg)
Description Solution – Sensitivity Analysis
(OAT) One At a Time
https://en.wikipedia.org/wiki/Sensitivity_analysis

[Figure: (S) Source fields → Arbitrarily Complex Data Mining System → Target field]

For source fields with binned ranges, sensitivity tells you importance of the range, i.e. “low”, …, “high”

Can put sensitivity values in Pivot Tables or Cluster

Record Level “Reason codes” can be extracted from the most important bins that apply to the given record
![Page 23: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/23.jpg)
Description Solution – Sensitivity Analysis
(OAT) One At a Time

• Present record N, S times, each input 5% bigger (fixed input delta)
• Record delta change in output, S times per record
• Aggregate: average(abs(delta)), target change per input field delta

[Figure: (S) Source fields → Arbitrarily Complex Data Mining System → Target field, with “Delta in forecast” marked on the output]
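The one-at-a-time procedure above can be sketched in a few lines. The `model` argument is any callable that maps a list of numeric inputs to a single score (a stand-in for the arbitrarily complex system); reading "5% bigger" as multiplying each input by 1.05 is an assumption, and it means inputs that are exactly zero show no sensitivity under this sketch.

```python
def oat_sensitivity(model, records, delta=0.05):
    """One-at-a-time sensitivity: average absolute change in the model output
    when each input field, in turn, is made `delta` (5%) bigger."""
    n_fields = len(records[0])
    totals = [0.0] * n_fields
    for record in records:
        baseline = model(record)
        for i in range(n_fields):
            perturbed = list(record)            # copy, change one field only
            perturbed[i] = record[i] * (1 + delta)
            totals[i] += abs(model(perturbed) - baseline)
    # Aggregate: average(abs(delta in forecast)) per input field.
    return [total / len(records) for total in totals]
```

Because the model is only ever called, never inspected, the same routine works for a regression, a neural net, or an ensemble.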
![Page 24: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/24.jpg)
Description Solution – Sensitivity Analysis
Applying Reasons per record (independent of var ranking)

• Reason codes are specific to the model and record

• Ranked predictive fields:

                            record 1     record 2
                            Mr. Smith    Mr. Jones
  max_late_payment_120d     0            1
  max_late_payment_90d      1            0
  bankrupt_in_last_5_yrs    1            1
  max_late_payment_60d      0            0

• Mr. Smith’s reason codes include:
  max_late_payment_90d      1
  bankrupt_in_last_5_yrs    1
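A small sketch of how record-level reason codes could be pulled from a ranked field list, using the two example records from the slide. The function name `reason_codes` and the cap of four reasons (taken from the "4 record level reasons" requirement mentioned earlier) are illustrative assumptions.

```python
# Fields in order of model importance, as ranked on the slide.
RANKED_FIELDS = [
    "max_late_payment_120d",
    "max_late_payment_90d",
    "bankrupt_in_last_5_yrs",
    "max_late_payment_60d",
]

MR_SMITH = {"max_late_payment_120d": 0, "max_late_payment_90d": 1,
            "bankrupt_in_last_5_yrs": 1, "max_late_payment_60d": 0}
MR_JONES = {"max_late_payment_120d": 1, "max_late_payment_90d": 0,
            "bankrupt_in_last_5_yrs": 1, "max_late_payment_60d": 0}


def reason_codes(record, ranked_fields=RANKED_FIELDS, max_reasons=4):
    """Walk the fields in importance order and keep those that apply
    (value 1) to this particular record, up to max_reasons."""
    applicable = [field for field in ranked_fields if record.get(field) == 1]
    return applicable[:max_reasons]
```

The field ranking is shared by every record, while the reasons returned depend on which flags are set for the individual, so Mr. Smith and Mr. Jones get different lists from the same model.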
![Page 25: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/25.jpg)
Description Solution – Alternatives
Understandable

R’s caret offers some feature selection
• http://topepo.github.io/caret/featureselection.html
• Filter methods (univariate)
• Wrapper methods
  – Recursive feature elimination
  – Simulated Annealing
  – Genetic algorithms

Variable Importance
• http://topepo.github.io/caret/varimp.html
• Algorithm specific (9 kinds)
• Model Independent Metrics
  – If classification: ROC curve analysis (univariate) per predictor
  – If regression: Fit a linear model

With variable ranking, still need to relate field ranking to record reason

Univariate methods do NOT cover variable interactions in the model, or non-linearity
![Page 26: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/26.jpg)
Description Solution
Local Interpretable Model-agnostic Explanations (LIME)
Understandable

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier – Knowledge Discovery in Databases 2016 (August 13-17)
• https://arxiv.org/abs/1602.04938 (PDF)
• https://github.com/marcotcr/lime-experiments (Python code)

Describes models locally, in terms of their variables
Minimize locality-aware loss
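To make "describes models locally" concrete, here is a simplified sketch in the spirit of LIME: sample points near one record, weight them by closeness, and fit a weighted linear model whose slopes act as the local explanation. This is not the `lime` package or the paper's full method (there is no sparsity or complexity penalty); the function names, the Gaussian sampling, and the kernel settings are all assumptions for illustration.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def local_linear_explanation(model, x0, n_samples=500, scale=0.1,
                             kernel_width=0.25, seed=0):
    """Fit a locality-weighted linear surrogate around x0; return its slopes."""
    rng = random.Random(seed)
    d = len(x0)
    # Weighted normal equations (X^T W X) beta = X^T W y, intercept first.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        offsets = [rng.gauss(0.0, scale) for _ in range(d)]
        z = [x + o for x, o in zip(x0, offsets)]
        dist_sq = sum(o * o for o in offsets)
        weight = math.exp(-dist_sq / kernel_width ** 2)  # closer counts more
        row = [1.0] + offsets
        y = model(z)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[1:]  # one local slope per input variable
```

For a model that is already linear the slopes recover its coefficients; for a non-linear black box they describe its behavior only near the chosen record, which is the point of a local explanation.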
![Page 29: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/29.jpg)
Putting a Model in Production

• Cut out extra preprocessed variables not used in final model
• Minimize passes of the data

In many situations, I have had to RECODE prep and/or model to meet production system requirements
• BAD: recode to Oracle, move SAS to mainframe & create JCL
  – Could take 2 months for conversion & full QA
• GOOD: Generate PMML code for model
  – Build up PMML preprocessing library, like Netflix
![Page 30: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/30.jpg)
Putting a Model in Production
www.DMG.org/PMML/products
![Page 32: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/32.jpg)
Tracking Model Drift (easy to see with 2 input dimensions vs. score)
General

[Figure: Training Data vs. Current Scoring Data]
![Page 33: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/33.jpg)
Tracking Model Drift
General

A trained model is only as general as
• the variety of behavior in the training data
• the artifacts abstracted out by preprocessing

A good KDD process and variable design make the analysis universe like the general scoring universe

Over time, there is “drift” between the behavior represented in the scoring data and the original training data

Stock market cycles: Bull → Bear → Bull → …
![Page 34: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/34.jpg)
Tracking Model Drift
General & Description

MODEL DRIFT DETECTOR in N dimensions

• Change in distribution of target (alert over threshold)
  – During training, find thresholds for 10 or 20 equal frequency bins of the score
  – During scoring, look at key thresholds around business decisions (act vs not)
  – Has the % over the fixed threshold changed much?

• Change in distribution of most important input fields
  – Diagnose CAUSES, what is changing, how much…
  – Out of the top 25% of the most important input fields…
  – Which had the largest change in contingency table metric?
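A minimal sketch of the drift checks described above: fix equal-frequency thresholds at training time, then compare how much of the current data falls over a decision threshold and how the bin fractions have moved. The function names, the `max_shift` alert level, and the chi-square-style statistic standing in for the "contingency table metric" are assumptions, not the deck's exact method.

```python
def equal_frequency_thresholds(training_scores, n_bins=10):
    """Bin edges that split the training scores into n_bins equal-count bins."""
    ordered = sorted(training_scores)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def bin_fractions(values, thresholds):
    """Fraction of values that fall into each bin defined by the thresholds."""
    counts = [0] * (len(thresholds) + 1)
    for v in values:
        counts[sum(1 for t in thresholds if v >= t)] += 1
    return [c / len(values) for c in counts]


def fraction_over(scores, threshold):
    """Share of records at or above a fixed business decision threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


def drift_alert(training_scores, current_scores, decision_threshold, max_shift=0.05):
    """Alert when the % over the fixed threshold has moved more than max_shift."""
    shift = abs(fraction_over(current_scores, decision_threshold)
                - fraction_over(training_scores, decision_threshold))
    return shift > max_shift, shift


def distribution_shift(expected_fractions, observed_fractions, floor=1e-6):
    """Chi-square-style distance between two sets of bin fractions."""
    return sum((o - e) ** 2 / max(e, floor)
               for e, o in zip(expected_fractions, observed_fractions))
```

The same `bin_fractions` and `distribution_shift` pair can be run per input field, using thresholds fixed at training time, to see which of the most important fields changed the most.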
![Page 35: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/35.jpg)
Tracking Model Drift
General & Description

A frequent process in companies – RETRAIN EVERY DAY

• Does yesterday’s 4th of July sale training data best represent your 5th of July activity?

• Have you “forgotten” past lessons that are not in yesterday’s data?
  – The Stability vs. Plasticity dilemma, or
  – Learn how to play the guitar without forgetting grandmother
  – What about fraud cases from 6 months ago?
  – Same issues exist in online training

• Drifting vs. forgetting? Choose robustness and transparency, whichever you do
![Page 37: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/37.jpg)
Retrain, Refresh or Update DBC
General

Model Retrain (1-2 months)
• Brute force, most effort, most expense, most reliable
• Repeat the full data mining model training project
• Re-evaluate all algorithms, preprocessing, ensembles

Model Refresh (3-5 days)
• “Minimal retraining”
• Just run the final 1-3 model trainings on “fresher” data
• Do not repeat exploring all algorithms and ensembles
• Assume the “structure” is a reasonable solution
• Go back to your prior Model Notebook – choose the best as a short cut
![Page 39: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/39.jpg)
Solution Architecture for Threat and Compliance
Lambda Architecture with Continuous Decisioning

[Figure: architecture diagram with numbered steps 1-6]
![Page 40: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/40.jpg)
Solution Stack for Threat and Compliance
Leveraging Primarily Open Source Big Data Technologies
![Page 41: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/41.jpg)
Continuous Decisioning
Use Case: Cyber Threat Detection & Response

Use Kamanja to detect potential cyber security breaches

Problem

Diverse Inputs
• Structured and unstructured data, with varying latencies

Data Enrichment
• Long and laborious process, manual and ad hoc

Quality of Threat Intelligence
• Lots of false positives waste analyst resources

Poor Integrations with Response Teams
• Manual and time-consuming process

Solution
• Ingest IP addresses, malware signatures, hash values, email addresses, etc. in real time
• Automatically enrich with third party data
• Check historical logs against new threats continuously
• Predictive analytics based on machine learning flag suspicious activity before it becomes a problem
• Direct integration with dashboards to generate alerts and speed up investigation
![Page 42: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/42.jpg)
Continuous Decisioning
Use Case: Application Monitoring

Use Kamanja to detect insider attacks on sensitive data

Problem
• Legacy system is batch oriented
• Months required to create and implement new alerts
• Slow speed-to-market developing new source system extracts. Months required to assimilate new data.
• Risks to PII and NPI, with compliance implications.

Solution
• Use open source big data stack to migrate to real time data streaming, rapid model deployment, and alerts with no manual intervention.
• Calculate number of times PII/NPI accessed over eight hour period, and calculate risk to generate alerts
• Machine learning to identify normal pattern of out of office hours access. Trigger automatic alerts when anomalies occur.
• Rapid implementation of new models to deal with emerging threats.
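The eight-hour access count in this use case could be kept with a sliding window per user, as in the sketch below. This is plain Python for illustration and is not Kamanja's API; the class name `AccessMonitor`, the `max_accesses` limit, and the assumption that timestamps (in seconds) arrive in order are all hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 8 * 60 * 60  # eight hour period from the use case


class AccessMonitor:
    """Counts PII/NPI accesses per user over a sliding time window."""

    def __init__(self, max_accesses, window_seconds=WINDOW_SECONDS):
        self.max_accesses = max_accesses      # hypothetical alert limit
        self.window_seconds = window_seconds
        self._events = {}                     # user -> deque of timestamps

    def record_access(self, user, timestamp):
        """Record one access; return True if the count in the window
        now exceeds max_accesses (an alert should fire)."""
        events = self._events.setdefault(user, deque())
        events.append(timestamp)
        # Drop accesses that have aged out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) > self.max_accesses
```

A fixed limit is the simplest risk rule; the learned "normal pattern" mentioned in the solution would replace `max_accesses` with a per-user or per-hour baseline.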
![Page 43: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/43.jpg)
Continuous Decisioning
Use Case: Unauthorized Trading Detection

Use Kamanja to reduce the risk of rogue behavior at an investment bank

Problem
• Need timely alerting of potentially unauthorized trading activity
• Must tie together voluminous data, reports, and risk measures
• Meet increasingly stringent time requirements

Solution
• Create a Trader Surveillance Dashboard
• Provide a holistic view of a trader, based on all relevant information about the trader, the marketplace, and peers
• Build supervised and unsupervised machine learning models based on operational, transactional, and financial data.
• Real-time analysis and monitoring of trader activity automatically highlights unusual activity and triggers alerts on trades to investigate
![Page 44: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/44.jpg)
Continuous Decisioning
Use Case: Credit Card Fraud Detection

Use Kamanja to incrementally reduce fraud losses by applying multiple predictive models for transaction authorization

Problem
• $16.3 billion in credit card fraud losses annually
• Fraud is growing more quickly than transaction value
• New types of fraud are one step ahead of existing solutions
• Dependence on third party proprietary systems means slow reaction times and expensive changes

Solution
• Apply Kamanja to IVR, web, and transactional data to trigger alerts
• Initial models detect suspicious web traffic, common purchase points, and application rarity
• Leverage existing infrastructure as well as existing third party systems (Falcon and TSYS)
• Reduce costs by 80% with open source software
![Page 45: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/45.jpg)
Summary

You can have it all: accurate, general & describable
• You may fully understand a bike – but drive a car to work (level of detail)

Control and plan complexity: track in a model notebook
• Reuse notebook when you need to retrain
• Balance accuracy and generalization in the notebook outcomes
• Track business net value per model (be more competitive)

Model and record level description helps model lifecycle
• Helps during model building, to improve preprocessing, DBC
• Helps gain trust
• Helps track model drift and degradation

Use Kamanja, a real time decisioning engine for production deployment

Model: Accurate, General, Understandable
![Page 46: Production model lifecycle management 2016 09](https://reader033.fdocuments.us/reader033/viewer/2022042906/58a437fd1a28ab3e3d8b733f/html5/thumbnails/46.jpg)
© 2015 LigaData, Inc. All Rights Reserved.
Thank You
Tuesday, September 20, [email protected]/in/GregMakowski
www.Kamanja.org (Apache open source licensed)