2017 Predictive Analytics Symposium - SOA · A panel of Kaggle Masters share their tips on how to...


Page 1: 2017 Predictive Analytics Symposium - SOA · A panel of Kaggle Masters share their tips on how to get started and what to do to win prizes. Kaggle is a platform for predictive analytics

2017 Predictive Analytics Symposium
Session 35, Kaggle Contests--Tips From Actuaries Who Have Placed Well

Moderator: Kyle A. Nobbe, FSA, MAAA

Presenters:

Thomas DeGodoy
Shea Kee Parkes, FSA, MAAA

SOA Antitrust Compliance Guidelines
SOA Presentation Disclaimer

Page 2

Predictive Modeling Contests

Tom de Godoy

Page 3

Tom de Godoy
CTO & Co-founder, DataRobot

● 15 years of experience in Insurance Analytics
● Previously, Director of Research & Modeling at Travelers Insurance
● Advisor to the DataRobot Insurance practice that works with a large number of insurance companies

DataRobot is an Automated Machine Learning Platform

● Founded in 2012, funding over $100 million
● Experts with 70+ years of insurance analytics experience
● DataRobot’s insurance portfolio includes Fortune 100, regional players, global players and InsurTechs

This session is based on my experience working with leading insurers and then founding a Machine Learning company that is helping hundreds of companies in their Machine Learning journey.

Page 4

Why Kaggle?

- Money prizes? Glory?

- Learn Machine Learning by doing it!

- Be part of a large community of data scientists

Page 5

Why Learn Machine Learning?

open source programming
democratization

Open-Source Innovations
➔ ML driven by open-source and academics

Low-cost computing

Disruptive Competition
➔ New business models around data

Unstructured data
Traditional data

“90% of the data in the world today has been created in the last two years alone”

Avalanche of new data
➔ “Big Data” environment: Velocity, Volume, Variety

Better Product
Better Service
Optimised Operations

Page 6

Predictive Analytics is a Competition

Keys to Building a Competitive Advantage:

1. The ability to identify opportunities

2. The ability to execute on these opportunities

3. Better predictive models than your competitor

Page 7

Keys to Winning This Competition

1. Knowledge of the data and of the business problem

2. Large and diverse set of algorithms

3. Robust model validation

4. Speed

Develop Models Better and Faster Than Your Competitors

Page 8

Know Your Data

Simple ways to “know your data”:

- Data dictionary
- Simple profiles & summaries
- Interactive queries

Insights from machine learning models:

- Identify important features
- Visualize partial dependencies
- Discover non-linear effects and interactions
- Discover prediction outliers and their reasons

Most useful insights about the data come from machine learning models.
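
One common way to "identify important features" from a fitted model is permutation importance: shuffle one feature at a time and measure how much the error grows. The sketch below is a minimal pure-Python illustration. The `toy_model` function, the synthetic data, and every name in it are assumptions made for this example and do not come from the slides.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, seed=0):
    """Increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(y, [predict(r) for r in shuffled]) - baseline)
    return importances


# Hypothetical "fitted" model: strong effect of feature 0, weak effect of
# feature 1, and no dependence on feature 2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, y)
for j, value in enumerate(importances):
    print(f"feature {j}: importance {value:.4f}")
```

In this toy setup the unused feature scores exactly zero, and the ranking of the other two follows the size of their coefficients. With a real model the same loop works on held-out data, although correlated features can share or hide importance.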

Page 9

Quick Prototype & Rapid Iterations

prototype

socialize

rapid iteration

feedback loop

Rapid iteration and early socialization are the key!

Page 10

Leverage a Large and Diverse Set of Algorithms

“For each particular method there are situations for which it is particularly well suited, and others where it performs badly compared to the best that can be done with that data”

Source: http://statweb.stanford.edu/~tibs/ElemStatLearn/

Page 11

How to Leverage More Algorithms?

For each new algorithm, you need to figure out:

● What library/implementation should you use?

● How do you tune the model?

● How do you prepare the data for the model?

● How do you score new data with this model?

● How do you run it faster and at lower cost?

Page 12

How to Leverage More Algorithms?

Automated Machine Learning Platform

Having a diverse set of algorithms is key to maximizing accuracy.
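
To make the point about algorithm diversity concrete, here is a minimal sketch that puts three very different toy models behind one `fit`/`predict` interface and scores them on the same holdout set. The model classes, the synthetic U-shaped data, and the holdout scheme are all assumptions chosen for illustration. This is not how any particular platform works internally.

```python
from statistics import mean


class MeanModel:
    """Baseline: always predicts the training mean."""

    def fit(self, xs, ys):
        self.mu = mean(ys)
        return self

    def predict(self, x):
        return self.mu


class LinearModel:
    """One-variable least-squares line."""

    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, x):
        return self.intercept + self.slope * x


class NearestNeighborModel:
    """Average of the k closest training targets."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, xs, ys):
        self.points = list(zip(xs, ys))
        return self

    def predict(self, x):
        nearest = sorted(self.points, key=lambda p: abs(p[0] - x))[: self.k]
        return mean(y for _, y in nearest)


def holdout_mse(model, train, test):
    """Fit on the training pairs, then score MSE on the held-out pairs."""
    model.fit([x for x, _ in train], [y for _, y in train])
    return mean((y - model.predict(x)) ** 2 for x, y in test)


# Synthetic U-shaped data: a straight line fits it poorly.
data = [(i / 10, (i / 10 - 3) ** 2) for i in range(60)]
test = data[::4]                                       # every 4th point held out
train = [p for i, p in enumerate(data) if i % 4 != 0]

candidates = {
    "mean": MeanModel(),
    "linear": LinearModel(),
    "knn": NearestNeighborModel(k=3),
}
scores = {name: holdout_mse(m, train, test) for name, m in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

On this data the nearest-neighbor model wins because the relationship is non-linear. On linear data the ranking would flip, which is the reason to try several model families instead of committing to one up front.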

Page 13

Robust Validation

On your own cross-validation framework, evaluate your models using:

- Ranking & accuracy metrics (AUC, Gini, R-Squared, MSE, etc.)

- Lift charts & dual-lift charts

- Feature importance plots

- Partial dependency plots

- Reason codes

This cross-validation framework should be used only for evaluation and not for tuning.

Don’t trust the leaderboard. Trust your own cross validation.
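
Two of the evaluation tools listed above, ranking metrics and lift charts, can be sketched in a few lines of plain Python. The function names and the toy labels and scores are assumptions for illustration. The Gini shown is the binary-classification version, computed through the identity Gini = 2 * AUC - 1.

```python
def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count pairwise wins, with ties counted as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini coefficient for a binary target, via Gini = 2 * AUC - 1."""
    return 2 * auc(y_true, scores) - 1


def lift_by_bin(y_true, scores, n_bins):
    """Actual rate per score-ordered bin (highest scores first),
    divided by the overall rate. These are the bars of a lift chart."""
    ordered = [y for _, y in sorted(zip(scores, y_true), key=lambda p: -p[0])]
    overall = sum(ordered) / len(ordered)
    size = len(ordered) // n_bins
    lifts = []
    for b in range(n_bins):
        # The last bin absorbs any remainder rows.
        end = (b + 1) * size if b < n_bins - 1 else len(ordered)
        chunk = ordered[b * size:end]
        lifts.append(sum(chunk) / len(chunk) / overall)
    return lifts


# Toy out-of-fold predictions (hypothetical values).
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

auc_value = auc(y_true, scores)
gini_value = gini(y_true, scores)
lifts = lift_by_bin(y_true, scores, n_bins=4)
print(auc_value, gini_value, lifts)
```

These metrics are only trustworthy when computed on predictions for rows the model did not see during fitting, which is why they belong inside your own cross-validation loop.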

Page 14

A Lesson from Kaggle

Page 15

Trust Your Own Cross-Validation

Page 16

Speed

Speed is a limiting factor for:

- Leveraging a large number of features (and data sources)

- Modeling complex types of data

- Using many models to discover the best solution

- Maximizing model accuracy

- Doing robust validation of any model

You Must be Faster than Your Competitors

Page 17

The #1 Barrier: The Traditional Approach is Hard!

[Venn diagram: DATA SCIENCE at the intersection of Math & Stats, Domain Expertise, and Hacking & Coding Skills]

R, Python, Spark, Hadoop

Logistic Regression, GLM, GBM, Random Forest, Decision Trees, Neural Nets, Deep Learning, Text Mining, Feature Engineering, Blending, Cross Validation

Page 18

Advantages of Automated Machine Learning

1. Time to Value: 10x faster to build and deploy predictive models.

2. Accuracy: Unprecedented accuracy of models “out-of-the-box”.

3. Transparency: Easy to know your model and collaborate on projects.

4. Pervasiveness: Simple UI and workflow for people of various backgrounds to leverage machine learning.

5. Democratization: Not limited to data scientists.

6. Consistency: Best practices in model building, validation and deployment applied consistently in every project.

Page 19

Summary

- Know your data (with multivariate model insights)

- Leverage a large and diverse set of algorithms

- Apply robust model validation

- Speed is critical!

- Leverage automation as much as possible

Page 20

Questions?

Page 21

How to do well at a Kaggle contest

Shea Parkes, FSA, MAAA

Page 22

Limitations

The views expressed in this presentation are those of the presenter, and not those of Milliman or the Society of Actuaries. Nothing in this presentation is intended to represent a professional opinion or be an interpretation of actuarial standards of practice.

Page 23

http://blog.kaggle.com/

Page 24

Focus on a single contest

Page 25

Join a contest when it begins

Page 26

Read all contest information

Page 27

Participate in forums

Page 28

Participate in notebooks and kernels

Page 29

Join a team

Page 30

Spend a ton of time feature engineering
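
As a small illustration of what feature engineering can look like in practice, the sketch below derives a few model-ready features from one raw record. The field names (`policy_start`, `claim_date`, `claim_amount`, `annual_premium`) and the chosen transformations are hypothetical. They are not taken from any specific contest.

```python
import math
from datetime import date


def engineer_features(record):
    """Turn one raw record into derived numeric and boolean features."""
    start = date.fromisoformat(record["policy_start"])
    claim = date.fromisoformat(record["claim_date"])
    return {
        # Elapsed time is often more predictive than either raw date.
        "policy_age_days": (claim - start).days,
        "claim_month": claim.month,
        # weekday(): Monday is 0, Sunday is 6.
        "claim_on_weekend": claim.weekday() >= 5,
        # Ratios put amounts on a comparable scale across policies.
        "claim_to_premium": record["claim_amount"] / record["annual_premium"],
        # log1p tames the long right tail typical of monetary amounts.
        "log_claim_amount": math.log1p(record["claim_amount"]),
    }


example = {
    "policy_start": "2016-12-01",
    "claim_date": "2017-01-01",
    "claim_amount": 500.0,
    "annual_premium": 1000.0,
}
features = engineer_features(example)
print(features)
```

Each derived column encodes a guess about what drives the target. Whether a given feature helps should be checked against your own validation scores, not assumed.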

Page 31

Set up an appropriate validation framework
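
A validation framework can start as simply as a fixed, seeded k-fold split that every experiment reuses. The sketch below is one minimal way to write it in plain Python. The helper names and the `fit(xs, ys) -> predict` calling convention are assumptions for this example. Whether plain k-fold is appropriate depends on the contest: time-ordered or grouped data usually needs a split that respects that structure.

```python
import random


def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k shuffled folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps folds reproducible
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train, test


def cross_val_mse(fit, xs, ys, k=5, seed=0):
    """Average out-of-fold MSE for a `fit(xs, ys) -> predict` callable."""
    fold_errors = []
    for train, test in k_fold_splits(len(xs), k, seed):
        predict = fit([xs[j] for j in train], [ys[j] for j in train])
        errors = [(ys[j] - predict(xs[j])) ** 2 for j in test]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Hypothetical baseline learner: predicts the training mean for every input.
def fit_mean(xs, ys):
    mu = sum(ys) / len(ys)
    return lambda x: mu


xs = list(range(20))
ys = [2.0 * x for x in xs]
splits = list(k_fold_splits(len(xs), k=5))
cv_error = cross_val_mse(fit_mean, xs, ys, k=5)
print(f"5-fold CV MSE of the mean baseline: {cv_error:.2f}")
```

Keeping the seed and fold count fixed means two models are always compared on identical splits, so a difference in score reflects the models and not the partition.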

Page 32

Use existing implementations of common algorithms

Page 33

Write code

Page 34

Use GitHub
