DATA PRODUCTS: 5 DEADLY SINS AND HOW TO PREVENT THEM
Pride Wrath Lust Gluttony Sloth
Mathieu Bastian
Web Summit 2015, Dublin
Credits: The Seven Deadly Sins, Nanatsu no Taizai & nimbus-mage.deviantart.com
ABOUT ME
• Data scientist & engineer
• Led data products team at LinkedIn
• Gephi co-founder
• Open-source contributor
DATA PRODUCTS
Source: http://bit.ly/1kMUPAe.
Tentative definition
User-facing production system based on an automated learning algorithm
DATA PRODUCTS TODAY
PRIDE
"Excessive belief in one’s own abilities or excessive love of oneself"
PRIDE
Source: http://www.themeasurementstandard.com/wp-content/uploads/2015/06/data-scientist-as-superman.jpg
With power comes responsibility
Source: http://www.economist.com/node/15579717
Who are you building it for?
• Understand user intent
• Integrate into the user flow
• Explain recommendations to the user
• Set the right user expectations
• Treat users the way you would like to be treated
Credits: Google
Anticipate edge cases
WRATH
"Choice of violent and hateful actions over love and patience"
Exercise perseverance
Chart: reward (vertical axis) over time (horizontal axis), in three phases
Phase I: Inception
Phase II: Growth
Phase III: Maintenance
But have a plan
LUST
"Depraved thought, unwholesome morality and desire for excitement"
LUST
Credits: Google Data Center
Perform due diligence
Thank the janitor & handyman
GLUTTONY
"The consumption of more of anything than you need"
Avoid solo data scientists
Credits: Lucasfilm
Choose the right problem
M - Measurable
E - Explainable
R - Rapid prototyping
C - Core
I - Iterable
SLOTH
"Not caring about others or living life in a fulfilling way"
Embrace continuous data pipelines
Source: http://azkaban.github.io/
Make data pipelines robust
Diagram: a development loop of Code, Upload, Run workflow, Look at logs, repeated for each change
PigUnit
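The loop on this slide (code, upload, run the workflow, look at logs) is slow because every mistake is only found after a full run. Local unit tests such as PigUnit shorten it. As a rough illustration in Python rather than Pig, the sketch below checks one pipeline step against a few in-memory rows before anything is uploaded. The function `top_skills_per_member`, its input shape, and the sample rows are hypothetical and are not taken from the talk.

```python
from collections import Counter


def top_skills_per_member(rows, k=2):
    """Hypothetical pipeline step: keep each member's k most frequent skills.

    rows is an iterable of (member_id, skill) pairs. Malformed records
    (missing member or blank skill) are skipped instead of failing the run.
    """
    counts = {}
    for member_id, skill in rows:
        skill = (skill or "").strip().lower()  # normalize case and whitespace
        if not member_id or not skill:
            continue
        counts.setdefault(member_id, Counter())[skill] += 1

    result = {}
    for member_id, counter in counts.items():
        # Sort by descending count, then by name so ties are deterministic.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        result[member_id] = [skill for skill, _ in ranked[:k]]
    return result


def test_top_skills_per_member():
    # Small in-memory sample covering the edge cases we expect in production:
    # mixed case, stray whitespace, a missing member id, and a blank skill.
    rows = [
        ("m1", "Python"),
        ("m1", "python "),
        ("m1", "SQL"),
        ("m1", "Pig"),
        ("m1", "pig"),
        ("m1", "Java"),
        (None, "Scala"),
        ("m2", ""),
    ]
    assert top_skills_per_member(rows, k=2) == {"m1": ["pig", "python"]}


if __name__ == "__main__":
    test_top_skills_per_member()
    print("ok")
```

Running the test locally takes seconds, so malformed-input bugs surface before the upload and workflow steps rather than in the logs afterwards.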
THANK YOU!
Mathieu Bastian @mathieubastian
www.linkedin.com/in/mathieubastian