Special Challenges With Large Data Mining Projects
CAS PREDICTIVE MODELING SEMINAR
Beth Fitzgerald, ISO
October 2006
Agenda
• Project Overview
• Prior to Modeling
• Modeling
• Business Issues
Development of a Model - Project Overview
• Data
• Statistical Tools
• Computer Capacity
• Team Skills
  – Data management
  – Analytical/statistical
  – Technology
  – Business Knowledge
Prior to Modeling
• Formulate the Problem
• Evaluate Possible Data Sources
• Prepare the Data
• Develop Understanding of Modeling Procedures and Diagnostics
• Explore the Data with Simple Modeling Techniques
What percent of a model-building project is data preparation and data management?
• 25%
• 50%
• 75%
• 85%
Formulate the Problem
• What problem are you trying to solve?
• What results do you expect to see?
• How will you know if the results are reasonable?
Prepare the Data
• Do quality checks at the level of detail needed for the project
• Understand how to prepare individual variables for use in models
• Be practical about the number of classification categories models can handle
• Decide on truncation and bucketing of continuous variables
• Create new variables
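The truncation and bucketing step can be sketched as follows. This is a minimal illustration rather than the procedure used in the project: the variable (building age), the cap of 100, and the bucket boundaries are all assumed values chosen for the example.

```python
from bisect import bisect_right


def truncate(value, lower, upper):
    """Cap a continuous value to the closed interval [lower, upper]."""
    return max(lower, min(value, upper))


def bucket(value, edges):
    """Return the index of the bucket that holds value.

    Bucket i covers edges[i-1] <= value < edges[i]; bucket 0 is everything
    below edges[0] and bucket len(edges) is everything at or above edges[-1].
    """
    return bisect_right(edges, value)


# Hypothetical building ages in years; 250 is an outlier that gets capped.
building_ages = [3, 12, 47, 250, 80]
AGE_CAP = 100             # assumed truncation point
AGE_EDGES = [10, 25, 50]  # assumed bucket boundaries

capped = [truncate(age, 0, AGE_CAP) for age in building_ages]
buckets = [bucket(age, AGE_EDGES) for age in capped]
print(capped)
print(buckets)
```

Truncating before bucketing keeps a single extreme value from stretching the top bucket, and a short list of edges keeps the number of categories manageable for the model.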
Develop Understanding of Modeling Procedures and Diagnostics
• Basic modeling training: GLM, Data Mining
• What software is available?
• What software/models work for my data investigation, modeling problem, etc.?
• What computer capacity do I need?
• Learn how to use software
• Learn how to interpret the diagnostics
Development of a Model
• Analyze historical policy and loss data
  – Policy level detail
  – Location level detail
• Link policy and loss data with external and/or internal data:
  – Specific business risk data: operational, financial
  – Specific location data: demographic, weather
  – Other data: building, vehicle, agency
• Need link between policy detail and other data
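The linking step above might look like the following sketch, which joins location attributes onto policy records and reports which policies found no match. The field names (`policy_id`, `zip`, `population_density`) and the choice of ZIP code as the link key are assumptions made for illustration; they are not taken from the presentation.

```python
# Hypothetical policy records; the ZIP code is the assumed link field.
policies = [
    {"policy_id": "P001", "zip": "07310", "incurred_loss": 1200.0},
    {"policy_id": "P002", "zip": "10001", "incurred_loss": 0.0},
    {"policy_id": "P003", "zip": "99999", "incurred_loss": 560.0},
]

# Hypothetical external location data keyed by the same ZIP code.
location_data = {
    "07310": {"population_density": 17000},
    "10001": {"population_density": 28000},
}


def link_policies(policies, location_data, key="zip"):
    """Left-join location attributes onto each policy; collect unmatched ids."""
    linked, unmatched = [], []
    for policy in policies:
        extra = location_data.get(policy[key])
        if extra is None:
            unmatched.append(policy["policy_id"])
            linked.append(dict(policy))  # keep the policy without extra fields
        else:
            linked.append({**policy, **extra})
    return linked, unmatched


linked, unmatched = link_policies(policies, location_data)
match_rate = 1 - len(unmatched) / len(policies)
print("unmatched:", unmatched)
print("match rate:", round(match_rate, 2))
```

Keeping unmatched policies in the output, rather than dropping them, preserves premium and loss totals so they can still be balanced against the source data.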
Explore the Data with Simple Modeling Techniques
• Start with a sample of the data
• Try different classical analyses on the sample, such as:
  – regression
  – linear models
  – correlation matrices
• Make use of graphical options to explore data
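As one way to picture this exploration step, the sketch below draws a random sample from a larger data set and computes a correlation matrix on the sample. The data are synthetic and the column names (`exposure`, `loss`, `noise`) are hypothetical; only the standard library is used so the example stays self-contained.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows, columns):
    """Pairwise Pearson correlations between the named columns of rows."""
    data = {c: [row[c] for row in rows] for c in columns}
    return {a: {b: pearson(data[a], data[b]) for b in columns} for a in columns}


# Synthetic stand-in for a large data set: loss rises with exposure,
# while "noise" is unrelated to either.
rng = random.Random(2006)
full_data = []
for _ in range(10_000):
    exposure = rng.uniform(1, 100)
    full_data.append({
        "exposure": exposure,
        "loss": 5 * exposure + rng.gauss(0, 50),
        "noise": rng.gauss(0, 1),
    })

# Explore a 5% sample first instead of the full data set.
sample = rng.sample(full_data, 500)
corr = correlation_matrix(sample, ["exposure", "loss", "noise"])
for name, row in corr.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Working on a sample keeps run times short while the variable list is still changing; the same functions can be rerun on the full data once the candidates are narrowed down.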
Data Management Issues
• Matching additional internal policy information to premium/loss data
  – Different points in time
  – Tracking & balancing audited exposures
• Different summarization keys: handling of mid-term endorsements
• Address scrubbing
• Matching to external data for correct point in time
• Significance of missing values within a variable
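Matching to external data "for the correct point in time" can be read as an as-of lookup: for each policy, use the most recent external record that was already in effect on the policy's effective date. The sketch below shows that idea under assumed names; the snapshot structure and the `annual_sales` field are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical external snapshots for one business, sorted by effective date.
snapshots = [
    (date(2004, 1, 1), {"annual_sales": 640_000}),
    (date(2005, 1, 1), {"annual_sales": 700_000}),
    (date(2006, 1, 1), {"annual_sales": 580_000}),
]


def as_of(snapshots, when):
    """Return the latest snapshot effective on or before `when`, else None."""
    effective_dates = [d for d, _ in snapshots]
    idx = bisect_right(effective_dates, when)
    return snapshots[idx - 1][1] if idx > 0 else None


# A policy effective mid-2005 should see the 2005 snapshot, not the 2006 one.
print(as_of(snapshots, date(2005, 7, 1)))
# A policy older than every snapshot has no valid match.
print(as_of(snapshots, date(2003, 6, 1)))
```

Returning `None` when no snapshot predates the policy makes the gap explicit, which ties into the next point on the slide: a missing value can itself carry information and should be tracked rather than silently filled.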
Modeling Activities
• Selection of Predictors: variable elimination, variable transformation
• Start with classical models prior to evaluating more complex models
• Methodology Understanding and Evaluation
• Evaluation of Model Performance
Data Mining Techniques
Balance good fit with explanatory power
• Generalized Linear Models
• Classification Trees
• Regression Trees
• Multivariate Adaptive Regression Splines
• Neural Networks
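To give a feel for one technique on this list, the sketch below finds the single best split of a regression tree: the threshold on one predictor that minimizes the squared error when each side is predicted by its own mean. It is a simplified illustration, not a full tree-growing algorithm, and the driver-age and claim-frequency figures are made up for the example.

```python
def best_split(xs, ys):
    """Find the threshold on xs that minimizes the summed squared error of ys
    when each side of the split is predicted by its own mean.

    Returns (threshold, error, left_mean, right_mean), or None if xs has
    fewer than two distinct values.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or error < best[1]:
            best = (threshold, error,
                    sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical data: claim frequency by driver age.
ages = [18, 20, 22, 30, 40, 50, 60]
freq = [0.30, 0.28, 0.26, 0.10, 0.09, 0.11, 0.10]

threshold, error, left_mean, right_mean = best_split(ages, freq)
print(f"split at age {threshold}: "
      f"mean {left_mean:.2f} below, {right_mean:.2f} above")
```

A single split like this is easy to explain to underwriters but fits coarsely; growing more splits improves fit at the cost of that simplicity, which is the balance the slide refers to.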
Data Mining Process
[Diagram with the labels: Business Knowledge, Data Linking, Data Cleansing, Analyze Variables, Determine Predictive Variables, Evaluation, Data Gathering, Data Mining]
Model Performance
• Lift Curve Analysis
  – Score all risks in sample
  – Rank risks by score from Bad to Good
  – Compare loss ratio of risks in each decile to loss ratio for all risks
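The three lift-curve steps above can be sketched in a few lines. The convention that a higher score means a worse risk, the field names, and the synthetic portfolio are assumptions for the example; the sketch also assumes at least ten risks so that every decile is non-empty.

```python
def decile_lift(risks):
    """Loss ratio relativity by score decile.

    risks: list of dicts with 'score' (higher = worse risk, an assumed
    convention), 'premium', and 'loss'. Returns ten relativities ordered
    from the worst-scoring decile to the best-scoring decile.
    """
    # Rank risks by score from bad to good.
    ranked = sorted(risks, key=lambda r: r["score"], reverse=True)
    overall_lr = sum(r["loss"] for r in ranked) / sum(r["premium"] for r in ranked)
    n = len(ranked)
    relativities = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        group_lr = sum(r["loss"] for r in group) / sum(r["premium"] for r in group)
        # Compare the decile loss ratio to the loss ratio for all risks.
        relativities.append(group_lr / overall_lr)
    return relativities


# Synthetic portfolio: 100 risks with equal premium, loss rising with score.
portfolio = [
    {"score": i, "premium": 1000.0, "loss": 500.0 + 4.0 * i}
    for i in range(100)
]

lift = decile_lift(portfolio)
for decile, rel in enumerate(lift, start=1):
    print(decile, round(rel, 3))
```

A model with useful lift shows relativities above 1 in the worst deciles and below 1 in the best ones; a flat line near 1 would mean the score does not separate risks.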
Sample Lift Curve Analysis
Relative Loss Ratio Lift
[Chart: LR Relativity by Decile (Optimal Model). X-axis: Decile of Worst to Best Risk, 1 to 10. Y-axis: Loss Ratio Relativity, 0.7 to 1.3.]
Business Issues
• Model uses information from a third-party vendor
• Model needs to be accessible electronically
• Technology Issues
• Implementation Decisions
Technology Issues
• Develop/Modify Systems
• Integrate into underwriting/rating workflow
  – Decision process
  – Agency system
• Decide on technology
  – Web-based interface
  – API, FTP, MQ, TCP/IP, HTTPS web services
Implementation of Model
Solution focus/usage:
• Suitability of risk for underwriting decision
• Source for additional pricing factors
• Consistency in underwriting/pricing decisions
• Compliance with regulations based on implementation decision
• Consider model alone or model with other information available from application
Implementation of Model
Workflows:
• Underwriting
  – New Business
  – Renewal Business
• Rating
  – Pricing
  – Coverage Adjustment
Business Implementation of Model
• Strategic Plan: need management involvement
• Prepare Announcement/Training Material for Internal & External Customers
• Coordinate Implementation
• Monitor Feedback/Adjust Implementation
Future Plans
• Determine Process for Updates to Model
  – Use of Updated Data
  – Use of New Data Variables
  – Use of New Techniques