How to Build a Successful Data Team?
Hi! Self-Introduction
Florian Douetteau, Dataiku
Dataiku - Data Tuesday
Meet Hal Alowne
Big Guys • $10B+ Revenue • 100M+ customers • 100+ Data Scientists
Hal Alowne, BI Manager, Dim’s Private Showroom
Hey Hal! We need a big data platform like the big guys.
Let’s just do as they do!
Average E-commerce Website
• $100M Revenue • 1 Million customers • 1 Data Analyst (Hal himself)
Dim Sum, CEO & Founder, Dim’s Private Showroom
Big Data Copy Cat Project
TECHNOLOGY DISCONNECT
Welcome to Technoslavia!
LOL PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your brand-new Hadoop cluster is perceived as slow, little used, and unreliable
TECHNO MISMATCH ANTI-PATTERN
Embrace Being Polyglot
or
Be a Dictator
The Python Clan vs. The R Tribe
The Old Elephant Fraternity vs. The New Elephant Club
PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
Website: 2000s winners
Companies that were able to release fast
"Artificial Intelligence with Data for Internet of Things": 2010s winners
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
PEOPLE DISCONNECT
Classic BI Team Org
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
DBA / IT Data Owner
Specs
Data Science Team Org
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data System Engineer / Data Architect
Specs
Data Scientist
Built From Scratch
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
DBA / IT Data Owner
Specs
Built From Engineering
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
Specs
Built From Analysts
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
Specs
Manage Expectations
DREAM JOB: Data Engineer, Data Scientist, Data Analyst
REAL JOB: Data Plumber, Data Waiter, Data Cleaner
Perfectly Natural Hidden Thoughts
Business Project Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data Scientist
Managing Extreme Personalities
Data Scientist
Highly Creative
Passionate
Ambitious
Hard to hire?
Hard to manage?
Wants to take your job?
Paired for Data
Data Analyst
Discover patterns
Fight data entropy
Data Engineer
Make things work
Fight tech entropy
Which do you prefer?
One analyst and one engineer who work together?
Two data scientists?
Data Disconnect
What is the main reason data projects fail?
DATA NOT AVAILABLE
BUT FOR ONLY INCREMENTAL GAIN
Contribution to the overall project performance (chart: segments of 20%, 30%, and 50% across Business Goal Definition and Data, Feature Engineering, and Algorithm)
How to Get Data If You Don’t Have It
THE CICADA, THE SPIDER, THE FOX
The Cicada: Optimistic and Opportunistic Data
As a startup
- Build a new product using open data
As a group inside a company
- Benefit from the data sharing initiative within your company
- Wait for data to be available in your data lake
The Spider: Power of the Network
As a startup
- Create a network of (web trackers | sensors)
- Make it available for free
- Build your service on people’s collected data
As a group inside a company
- Make a web service available to collect data
- Promote it internally so that people use it
The Fox: Hunt for the Big Money First
As a startup
- Hunt for a business group with a problem within a large company
- Build a SaaS solution using their data
- Replicate to competitors
As a group inside a company
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally
Product Disconnect
What is Big Data about?
The Age Of Distributed Intelligence
Global, Personalised and Real-Time Data-Driven Services
Data to Visualize or Data to Automate?
Chart, 2013 to 2018: Automated Decision vs. Visualize To Decide
Moving to a world of automated decision-making
Where is your added value?
Is the problem at the Core of my Business Process?
Is it a common problem / with shared data?
Go for Best-of-Breed SaaS Solution
Can I Solve it on my own?
Really?
Build by the data team
Build by the data team?
Build by the data team
Hire Consultants and Learn
Yes
Yes No
I can’t Ok, I can try
Yes!
No!
No
Be aware of the comfort zone
Mission Critical
Small Structured
Large Diverse
Sheer Curiosity
Reporting for Finance in Any Industry
Analyze Each Tweet
Web Navigation for E-Merchant
Ticket Data for Discounts in Retail
Phone Call Logs for Security
RTB Data for Advertising
Customer Consumption for Anti-Churn in Utilities
Optimization
Filings for Fraud in Insurance
Not Enough Data To Learn From?
Not Enough “Hard” Examples To Learn From
Infuse the Data and Try Mindset
Brendan Stern is now a Specialist for Data Science in Healthcare at Dataiku
“When I was 20 and working as a manager at my Starbucks shop, I realised that I could probably increase sales of ground coffee. Depending on the day and the time of day, I kept moving the ground coffee around. I managed to run some A/B tests that improved the average sale amount by 12%.”
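As an illustration of the kind of figure quoted here, the following is a minimal sketch of how a relative uplift in average sale amount could be computed from two sets of sales. The function name `relative_uplift`, the variable names, and all sale amounts are hypothetical and chosen only for the example; they are not data from the talk.

```python
from statistics import mean


def relative_uplift(control, variant):
    """Relative change in mean sale amount of `variant` versus `control`."""
    base = mean(control)
    if base == 0:
        raise ValueError("control mean is zero; uplift is undefined")
    return (mean(variant) - base) / base


# Hypothetical sale amounts for two shelf placements (illustration only).
control = [10.0, 12.0, 8.0, 10.0]   # original placement
variant = [11.0, 13.0, 9.0, 11.8]   # new placement

uplift = relative_uplift(control, variant)
print(f"Relative uplift in average sale amount: {uplift:.1%}")
```

A real test would also need enough observations per variant to tell a genuine effect from day-to-day noise.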
Involve product team
Product Feature: Personalised Item Ranking
Product Feature: Notify User Only When Needed
Product Feature: Historical Data for Path Optimisation
Have Product Management Deeply Involved In the Data Team
Create an "API" Culture
Do not share • Random pieces of code • Flat files
Do share • Reproducible, documented workflows • Clean, documented APIs
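One possible reading of "clean, documented APIs" is sketched below: instead of handing over a flat file or an ad hoc script, a data team exposes a small typed and documented function. The `Order` type and the `average_basket` function are hypothetical names invented for this illustration, not something from the talk.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Order:
    """One order line as consumed by the API (hypothetical schema)."""
    customer_id: str
    amount: float


def average_basket(orders: Iterable[Order]) -> Dict[str, float]:
    """Return the average order amount per customer.

    Args:
        orders: any iterable of Order records.

    Returns:
        A mapping from customer_id to that customer's mean order amount.
        Customers with no orders do not appear in the result.
    """
    amounts: Dict[str, List[float]] = {}
    for order in orders:
        amounts.setdefault(order.customer_id, []).append(order.amount)
    return {cid: sum(vals) / len(vals) for cid, vals in amounts.items()}


# Example call with made-up records.
orders = [Order("a", 10.0), Order("a", 20.0), Order("b", 5.0)]
print(average_basket(orders))
```

The point of the sketch is the contract (typed input, documented output), which lets other teams reuse the result without reading the implementation.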
WAITING FOR QUESTIONS SLIDE
More food for thought on Dataiku’s blog
http://www.dataiku.com/blog/
Find us on Twitter
@fdouetteau @dataiku