AIG Performance Classification


Transcript of AIG Performance Classification

Page 1: AIG Performance Classification

Agency Performance Prediction

Miraj Vashi | 11-Dec-2016

Page 2: AIG Performance Classification


Contents

• Business Case
• Insights Required & Business Benefits
• A Bit About Domain…
• Data Preprocessing
• Modelling
– Approach
– Evaluation Metric
– Outcome
– Best Model Comparison
– Model Interpretation & Key Challenges

Page 3: AIG Performance Classification


Business Case

Azure Insurance Group operates property and casualty (P&C) insurance, life insurance, and insurance brokerage companies. Azure sells policies through direct and indirect sales channels. For indirect selling, Azure has tie-ups with 1,600+ agencies across 6 states. Azure is interested in classifying existing agencies into predefined performance categories in a supervised predictive framework, based on each agency's past performance. Specifically, Azure expects to better understand which agencies are likely to bring more growth in the Personal Line (PL) of business.

Page 4: AIG Performance Classification


Insights Required & Business Benefits

What Insights Are Required?

Classify each agency into one of the following categories:
– GROW: Business from the agency is likely to grow > 5% in 2014
– STABLE: Business from the agency is likely to stay flat, with growth in the range [-5%, 5%], in 2014
– LOSS: Business from the agency is likely to shrink > 5% (< -5% growth) in 2014

Note: Business growth is measured as the % growth in the Average Monthly Written Premium Amount achieved by the agency for a given year
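As a minimal sketch of this metric in R (the function name and example values are illustrative assumptions, not from the actual data):

    # % growth in Average Monthly Written Premium Amount, year over year.
    # Names and example values are hypothetical.
    growth_rate <- function(avg_wpa_curr, avg_wpa_prev) {
      100 * (avg_wpa_curr - avg_wpa_prev) / avg_wpa_prev
    }
    growth_rate(52500, 50000)   # returns 5: the GROW/STABLE boundary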

Potential Business Benefits

• Improved understanding of agency performance, at both a micro and a macro level:
– How is an individual agency likely to perform?
– How are all agencies in a state likely to perform?

• Optimized utilization of Agency Development Funds

Page 5: AIG Performance Classification


A Bit About Domain…

What is Insurance?

• A risk management tool for the customer (individual or business), allowing them to transfer the risk of financial loss to the insurance company

• In exchange for a constant stream of premiums, insurance companies offer to pay consumers a sum of money upon the occurrence of a predetermined event, such as a natural catastrophe, a car crash, or death

• Broadly, from a business perspective, insurance is classified as Life OR Non-Life (General)

Insurance Type (diagram):
Insurance → Life Insurance | General Insurance
General Insurance → Property & Casualty Insurance, Medical Insurance, Motor Vehicle Insurance, Marine Insurance, Fire Insurance, Homeowner’s Insurance

Page 6: AIG Performance Classification


Data Preprocessing

What Data Was Provided By Azure?
• 213K+ observations with 49 dimensions
• Each observation represents yearly aggregated data for an Agency >> for a Year >> for a State >> for a Product
• Key attribute summary:

– 1624 agencies
– 11 years of data (2005-2015)
– 6 states
– 29 products
– 2 product lines

• No target class in the data!

Attribute Analysis
Each input attribute was assessed from 3 different angles:
• Business meaning: What does it mean?
• Domain-expertise-based predictive importance: Can it help in predicting agency performance?
• Sparsity: Does it have enough values?
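The sparsity check, for instance, is a one-liner in R (a sketch; the data frame name `raw` and the 50% threshold are assumptions):

    # Fraction of missing values per attribute in the raw data frame
    sparsity <- colMeans(is.na(raw))
    sort(sparsity, decreasing = TRUE)                 # most sparse attributes first
    highly_sparse <- names(sparsity[sparsity > 0.5])  # candidate drops; threshold assumed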

Page 7: AIG Performance Classification


Data Preprocessing (Cont…)

Key Preprocessing Challenges:

1. Missing Values:
– Identified and dropped highly sparse attributes
– Missing values encoded as "99999" or "Unknown" were converted to NA during file read in R

2. Unwanted Data:
– Agencies appointed as late as 2014, for which the 2014 growth rate cannot be calculated, were removed
– The scope of analysis is "Personal Line (PL)" data, hence Commercial Line (CL) data was filtered out

3. Unavailable Data: New attributes were created for all Quantity and Revenue attributes to average them over the number of months for which data is available

4. Incomplete Data: 2005 and 2015 data were removed as data were available for only 8 and 5 months respectively

5. Repeating Data: Agency-specific attributes were detached from the raw data, processed separately, and later merged back with the main data

6. Format Of Data For Modelling:
– All Quantity and Revenue attributes were aggregated by AGENCY_ID and YEAR
– Each important attribute was expanded with AGENCY_ID in rows and a Year identifier in columns, e.g. the WrittenPremAmount column was converted to 2006_WrittenPremAmount, 2007_WrittenPremAmount, ...

7. No Target Class Present:
– A lag variable was created for Written Premium Amount
– The growth rate for each agency for all years (2006-2014) was calculated
– Each agency was assigned a class label based on its 2014 growth rate:
• GROW class := 2014 growth rate > 5%
• STABLE class := 2014 growth rate in the range [-5%, 5%]
• LOSS class := 2014 growth rate < -5%
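A minimal R sketch of the preprocessing steps above (the file name, column names, and month counts are illustrative assumptions; the deck does not show the actual code):

    # Challenge 1: treat sentinel codes as NA at read time
    raw <- read.csv("agency_data.csv",
                    na.strings = c("NA", "99999", "Unknown"))

    # Challenge 2: keep only Personal Line (PL) business
    pl <- subset(raw, PROD_LINE == "PL")

    # Challenge 3: average premium over the months of available data
    pl$AVG_WPA <- pl$WRITTEN_PREM_AMT / pl$MONTHS_AVAILABLE

    # Challenge 6: aggregate by agency/year, then spread years into columns
    agg  <- aggregate(AVG_WPA ~ AGENCY_ID + YEAR, data = pl, FUN = sum)
    wide <- reshape(agg, idvar = "AGENCY_ID", timevar = "YEAR",
                    direction = "wide")        # AVG_WPA.2006, AVG_WPA.2007, ...

    # Challenge 7: growth rate from the lagged premium, then the class label
    wide$growth_2014 <- 100 * (wide$AVG_WPA.2014 - wide$AVG_WPA.2013) /
                        wide$AVG_WPA.2013
    wide$class_2014  <- cut(wide$growth_2014,   # boundary handling at +/-5 approximate
                            breaks = c(-Inf, -5, 5, Inf),
                            labels = c("LOSS", "STABLE", "GROW"))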

Page 8: AIG Performance Classification


Modelling - Approach

• Important features were identified using the Boruta package (11 attributes dropped)
• As this is a classification problem, the following algorithms were used:

– CART
– C5.0
– Random Forest
– K Nearest Neighbours
– Artificial Neural Network
– Support Vector Machine
– GBM
– Ensemble-Stacking

• Many algorithms were tried on three flavours of data:
– ASIS Data
– ASIS Data + Range transformation
– ASIS Data + Range transformation + Important Features

• 10-fold cross-validation (repeated 3x-10x) was performed to get an initial best estimate of the hyperparameters ("caret" package)

• One or more rounds of grid search were used to fine-tune the hyperparameter values ("caret" package), as sketched below
• Cost-sensitive learning was used in CART and SVM
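A sketch of this workflow with the Boruta and caret packages (the data frame name, target name, and the Random Forest tuning grid are assumptions; the deck does not list the exact grids used):

    library(Boruta)
    library(caret)

    set.seed(42)
    # Feature selection: keep only attributes Boruta confirms as important
    bor  <- Boruta(class_2014 ~ ., data = train_df)
    keep <- getSelectedAttributes(bor, withTentative = FALSE)

    # Repeated 10-fold CV for an initial estimate, then a grid search
    ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
    fit  <- train(class_2014 ~ ., data = train_df[, c(keep, "class_2014")],
                  method     = "rf",                  # e.g. Random Forest
                  preProcess = "range",               # the range transformation
                  trControl  = ctrl,
                  tuneGrid   = expand.grid(mtry = c(2, 4, 8)))
    fit$bestTune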

Page 9: AIG Performance Classification


Modelling – Evaluation Metric

• Interesting Insight:
– Only ~40% of the agencies achieved > 0% growth in 2014
– Of that 40%, only ~50% of the agencies grew > 5%. The same is reflected in the 2014 growth class distribution:

GROW    STABLE    LOSS
21%     37%       42%

• Azure is interested in identifying agencies in the GROW class as accurately as possible

• Model Evaluation Metric:
– Higher recall for the GROW class, AND
– Optimal F1 to balance the recall-precision tradeoff
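A sketch of how these metrics can be read off caret's confusion matrix (`pred` and `actual` are assumed to be factors over the three class labels):

    library(caret)
    cm <- confusionMatrix(pred, actual)
    # Per-class metrics; the row of interest is the GROW class.
    # Sensitivity is recall; Pos Pred Value is precision.
    rec  <- cm$byClass["Class: GROW", "Sensitivity"]
    prec <- cm$byClass["Class: GROW", "Pos Pred Value"]
    f1   <- 2 * prec * rec / (prec + rec)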

Page 10: AIG Performance Classification


Modelling - Outcome

Page 11: AIG Performance Classification


Modelling – Best Model Comparison

Best Model Vs. Baseline Model:

• In the absence of a model, i.e. as a baseline, the best estimate of the 2014 Performance Class is the MODE of the 2014 Performance Class attribute

• The baseline model would predict the "LOSS" class for all agencies since, at 42% of observations, "LOSS" is the most frequent class

Model Metric               Baseline Predictive Model   Best Predictive Model
GROWTH Class – Recall      0                           0.80
GROWTH Class – Precision   0                           0.35
GROWTH Class – F1          NA                          0.49
Overall Accuracy           41.78%                      49.20%
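For reference, a sketch of the mode baseline in R (`actual` is assumed to be the factor of true 2014 class labels):

    # Majority-class baseline: predict the most frequent class for everyone
    mode_class <- names(which.max(table(actual)))     # "LOSS" here
    baseline   <- factor(rep(mode_class, length(actual)),
                         levels = levels(actual))
    mean(baseline == actual)                          # baseline accuracy, ~0.42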

Page 12: AIG Performance Classification


Modelling – Model Interpretation & Key Challenges

Model Interpretation
• If an agency is likely to grow > 5% in 2014:

– the Best Predictive Model is able to accurately label it as "GROW" in 4 out of 5 cases (recall = 0.80)

• If the Best Predictive Model has labeled an agency as "GROW":
– In 1 out of 3 cases the agency will actually grow > 5% in 2014 (precision ≈ 0.35)
– In 2 out of 3 cases the agency will actually fall in the STABLE or LOSS class in 2014

Key Challenges:
• GROW is a minority class; the class distribution is imbalanced and skewed toward the "LOSS" class
• For the majority of algorithms, learning is skewed toward predicting the LOSS class correctly - the class Azure is least interested in
• The data has a lot of variance; it is difficult to get test data truly representative of the training data
• There is not enough data to overcome the class imbalance and the variance in the data

Page 13: AIG Performance Classification