How to Build a Successful Data Team - Florian Douetteau @ PAPIs Connect
Hi ! I’m FLORIAN DOUETTEAU, CEO of Dataiku
It’s Me !!
It’s our software !!
…and our software is
The most complete Data Science platform
Deployment
Dataiku - Data Tuesday
Meet Hal Alowne
Big Guys
• 10B$+ Revenue
• 100M+ customers
• 100+ Data Scientists
Hal Alowne, BI Manager
Dim’s Private Showroom
“Hey Hal ! We need a big data platform like the big guys. Let’s just do as they do!”
Average E-commerce Web site
• 100M$ Revenue
• 1 Million customers
• 1 Data Analyst (Hal Himself)
Dim Sum, CEO & Founder
Dim’s Private Showroom
Big Data
Copy Cat
Project
Technology Disconnect
Welcome to Technoslavia !
LOL PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your Brand New Hadoop Cluster is perceived as slow, underused and unreliable
TECHNO MISMATCH ANTI-PATTERN
Assume Being Polyglot
or
Be a Dictator
The Python Clan VS The R Tribe
The Old Elephant Fraternity VS The New Elephant Club
PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
Website: 2000s winners
Companies that were able to release fast
"Artificial Intelligence with Data for Internet of Things": 2010s winners
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
PEOPLE DISCONNECT
Classic Business Intelligence Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
DBA / IT Data Owner
Specs
Data Science Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data System Engineer / Data Architect
Specs
Data Scientist
Built From Scratch
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
DBA / IT Data Owner
Specs
DATA SCIENTISTS EVERYWHERE
Built From Engineering
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
Specs
DATA ENGINEERS
DATA ANALYSTS
Built From Analysts
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
Specs
Manage Expectations
Data Plumber
Data Engineer
Data Scientist
Data Waiter
Data Cleaner
Data Analyst
REAL JOB
DREAM JOB
Perfectly Natural Hidden thoughts
Business Project Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data Scientist
Managing Extreme Personalities
Data SCIENTIST
Highly Creative
Passionate
Hard to hire ?
Hard to manage ?
Want to take your job ?
Ambitious
Paired for Data
Data Analyst
Discover Patterns
Data Engineer
Make things work
Fight data entropy
Entropy
tech entropy
What do you prefer ?
One Analyst, One Engineer and One Data Scientist that work together ?
Four data scientists
Data Disconnect
What is the main reason for data projects to fail ?
DATA
NOT AVAILABLE
BUT FOR ONLY INCREMENTAL GAIN
Contribution to the overall project performance:
Business Goal Definition and Data: 50%
Feature Engineering: 30%
Algorithm: 20%
How to Get Data if you don’t have it
THE GRASSHOPPER, THE SPIDER, THE FOX
The Cicada : Optimistic and Opportunistic Data
THE CICADA
As a startup
- Build a new product using open data
As a group inside a company
- Benefit from the data sharing initiative within your company
- Wait for data to be available in your data lake
The Spider: Power of the Network
THE SPIDER
As a startup
- Create a network of (web trackers | sensors)
- Make it available for free
- Build your service on people’s collected data
As a group inside a company
- Make a web service available to collect data
- Promote it internally so that people use it
The Fox: Hunt for the Big Money first
THE FOX
As a startup
- Hunt for a Business Group within a large company with a problem
- Build a SaaS solution using their data
- Replicate to competitors
As a group inside a company
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally
PRODUCT DISCONNECT
What is Big Data about ?
The Age Of Distributed Intelligence
Global, Personalised and Real Time Data Driven Services
Data to Visualize or Data to Automate ?
Chart, 2013 to 2018: Automated Decision vs. Visualize To Decide
Moving to a world of automated decision making
Involve product team
Product Feature: Personalised Item Ranking
Product Feature: Notify User Only when Needed
Product Feature: Historical Data For Path Optimisation
Have Product Management Deeply Involved In the Data Team
Where is your added value ?
Is the problem at the Core of my Business Process ? (Yes / No)
Is it a common problem / with shared data ? (Yes / No)
Can I Solve it on my own ? (I can’t / Ok, I can try)
Really ? (Yes! / No!)
Go for Best of Breed SaaS Solution
Build by the data team
Build by the data team ?
Build by the data team
Hire Consultants and Learn
Be aware of the comfort zone
Mission Critical / Sheer Curiosity
Small, Structured / Large, Diverse
Reporting for Finance in Any Industry
Analyze Each Tweet
Web Navigation For E-Merchant
Ticket Data For Discounts in Retail
Phone Call Logs for Security
RTB Data For Advertising
Customer Consumption For Anti-Churn in Utilities
Optimization
Filings For Fraud in Insurance
Not Enough Data To Learn From ?
Not Enough "Hard" Examples So that you can learn
Create an "API" Culture
Do not share
• Random Piece of Code
• Flat File
Do share
• Reproducible documented workflows
• Clean, documented APIs
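To make the contrast with "random piece of code" and "flat file" concrete, here is a minimal sketch of what a small, documented API could look like in Python. Everything in it is an assumption for illustration only: the churn-scoring use case, the names `CustomerActivity`, `ChurnScore` and `score_customers`, and the placeholder scoring rule are hypothetical and do not come from the talk or from any Dataiku product.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CustomerActivity:
    """Input record, one per customer (hypothetical schema)."""
    customer_id: str
    days_since_last_order: int
    orders_last_year: int


@dataclass(frozen=True)
class ChurnScore:
    """Output record: score in [0, 1], higher means more likely to churn."""
    customer_id: str
    score: float


def score_customers(rows: Iterable[CustomerActivity]) -> List[ChurnScore]:
    """Return one ChurnScore per input row, in input order.

    The rule below is a placeholder heuristic, not a trained model:
    it only stands in for whatever the data team would really ship.
    """
    scores = []
    for row in rows:
        # Clamp both signals to [0, 1] so the output range is documented and stable.
        recency = min(max(row.days_since_last_order, 0) / 365.0, 1.0)
        frequency = min(max(row.orders_last_year, 0) / 12.0, 1.0)
        score = round(0.7 * recency + 0.3 * (1.0 - frequency), 3)
        scores.append(ChurnScore(row.customer_id, score))
    return scores
```

Consumers of such a function depend on the typed inputs, the typed outputs and the docstring, not on the column order of a file someone emailed around, so the scoring rule can change without breaking them.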
Food for thought: www.dataiku.com/blog
Free Data Science Software
www.dataiku.com/dss
THANK YOU !
Data Science
Is no longer a science