Michael de Toldi BNP Paribas Cardif October...
-
Michael de Toldi – BNP Paribas Cardif
October 2015
-
2
What governance for Data Analytics?
-
3
What do we want as clients?
Personalised offers
Adapted services
Quick answers
Great prices
Contextual offers
Expected contacts
Value for loyalty
Privacy
-
4
Some people think that we should…
-
5
But this is the reality!
-
6
And it is better to know more than one model…
“All models are wrong but some are useful”
George E. P. Box
-
7
“Data Science” vs “Data Mining” culture
Data Mining (1990)
Data Format: SQL, flat files
Tools: Low-level programming frameworks, proprietary software
Training: Books, diploma
Skills: Statistics
Data Science (2015)
Data Format: SQL, flat files, web scraping, web logs, cookies, text, images, .json, .xml, …
Tools: High-abstraction programming, open-source software & libraries
Training: MOOCs, blogs, tutorials, forums, Kaggle competitions
Skills: Statistics, computer science (memory optimization, parallelization)
-
8
“Machine Learning” vs “Data Modelling” culture
The Data Modelling Culture
Diagram labels: X, Y
Toolbox: OLS, GLMs, GAMs, Cox
Objective: Understand model nature
Strategy: Manually design model structure
Generalizing: Combat overfitting with expertise
Validation: “Goodness of fit” statistical tests
Computation: Model simplicity
Experience:
• Provides more insight into how nature associates the response variables with the input variables.
• Works well for small datasets.
• But if the model is a poor emulation of nature, the conclusions based on this insight may be wrong!
The Machine Learning Culture
Diagram labels: y, X
Toolbox: Regularized GLMs, Random Forests, Gradient Boosting, Neural Nets, Blending/Stacking
Objective: Look for best accuracy
Strategy: Hyperparameters to control model complexity
Generalizing: Automated strategies to combat overfitting
Validation: Measured by predictive accuracy
Computation: Parallelizing strategies
Experience:
• Sometimes considered a black box (unfairly for some techniques).
• Often produces higher predictive power with less modelling effort because of automated strategies.
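The machine learning column above (a hyperparameter controlling model complexity, an automated strategy against overfitting, validation by predictive accuracy) can be illustrated with a minimal sketch. It is not taken from the talk: the k-nearest-neighbours model, the synthetic data and all function names are assumptions chosen to keep the example dependency-free.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, test, k):
    """Mean squared prediction error of a k-NN model on `test`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)

def cross_val_mse(data, k, n_folds=5):
    """Average held-out error over n_folds train/validation splits."""
    errors = []
    for i in range(n_folds):
        held_out = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        errors.append(mse(train, held_out, k))
    return sum(errors) / n_folds

def select_k(data, candidates):
    """Pick the candidate k with the lowest cross-validated error."""
    scores = {k: cross_val_mse(data, k) for k in candidates}
    return min(scores, key=scores.get), scores

# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))

# k is the hyperparameter: small k gives a flexible model, large k a smooth one.
best_k, scores = select_k(data, [1, 3, 5, 10, 20])

# With k = 1 each training point is its own nearest neighbour, so the
# training error is zero even though the held-out error is not.
train_error_k1 = mse(data, data, 1)

print("cross-validated MSE per k:", scores)
print("selected k:", best_k)
print("training MSE for k = 1:", train_error_k1)
```

The gap between the zero training error and the cross-validated error for k = 1 is the overfitting that the selection loop guards against automatically, without the modeller having to design the model structure by hand.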
-
9
1st conviction for Data Analytics
“Data Science should be understood and internalised”
Michaël de Toldi
-
10
What about data in insurance companies?
Actuarial Data
Marketing Data
Commercial Data
Finance Data
Client Management Data
Fraud Data
-
11
What about data in insurance companies?
Actuarial
Marketing
Finance
Commercial
Client Management
Fraud
INT / EXT DATA
-
12
2nd conviction for Data Analytics
“Internal & external Data should be freed, shared, controlled and secured”
Michaël de Toldi
-
13
Data Science & IT are deeply linked
-
14
Data Science innovation relies on Open Source
-
15
3rd conviction for Data Analytics
“IT framework should be Data Science friendly”
Michaël de Toldi
-
16
Data Analytics governance in a nutshell…
Data Science should be understood and internalised
Internal & external Data should be freed, shared, controlled and secured
IT framework should be Data Science friendly
Don’t wait too long to understand what is at stake!
-
THANK YOU!
BNP PARIBAS CARDIF
8, rue du Port
92 728 Nanterre Cedex
Tel.: +33 (0)1 41 42 83 00
bnpparibascardif.com