Dataiku - Big Data Paris 2015 - A Hybrid Platform, a Hybrid Team
Transcript of Dataiku - Big Data Paris 2015 - A Hybrid Platform, a Hybrid Team
Hi!
I’m Florian, CEO of Dataiku, maker of Data Science Studio, the “Photoshop for Data Science”
COMMUNITY EDITION (it’s FREE): http://www.dataiku.com/dss/trynow/
React on Twitter: @fdouetteau #BigDataParis
WHAT IF THE META GROUP HAD CHOSEN ANOTHER LETTER?
C: Capacity, Complexity, Celerity
S: Size, Serendipity, Speed
B: Big, Blur, Blazing
M LIKE METRICS
How much does it cost to produce and maintain a metric?
How many metrics do I need?
Do I follow the right metrics?
Do I have enough data?
Do I Have Enough Data?
• Self-Service: build your own metrics
• Analytical Capabilities: find your patterns
• Large Volume: store it all
More Metrics Means More Means
DATA MINING
More Metrics Means More Applications
Mission Critical
Small Structured
Large Diverse
Sheer Curiosity
Reporting for Finance in Any Industry
Analyze Each Tweet
Web Navigation For E-Merchant
Ticket Data for Discounts in Retail
Phone Call Logs for Security
RTB Data For Advertising
Customer Consumption For Anti-Churn in Utilities
CLASSIC BI
LARGE PRODUCTION
PLATFORM
DATA EXPLORATION
Optimization
Filings for Fraud in Insurance
DATA MINING
TODAY EACH ONE HAS ITS STORE
Mission Critical
Small Structured
Large Diverse
Sheer Curiosity
CLASSIC BI
LARGE PRODUCTION
PLATFORM
DATA EXPLORATION
Optimization
DATA WAREHOUSING
DATA MINING REPOSITORIES
DATA LAKE
GOOGLE LIKE PLATFORM
Problem is the human
• Cannot make decisions in seconds
• Limited sight (100 rows)
• Limited short-term memory (10k rows)?
Rise of AI
1974 - 1993: AI Winters
1997: Deep Blue
2005: Autonomous Vehicle
2011: Watson’s Jeopardy
2012: Google Cat
www.dataiku.com
APPLICATIONS OF MACHINE LEARNING TO BUSINESS PROBLEMS
• Churn
• Volume Forecast
• Recommender
• Segmentation
• Lifetime Value
• Risk Score
• Hot Location
• Pricing
• Ranking
• Fraud
• Event Paths
PREDICTIVE MAIN COMFORT ZONE
Mission Critical
Small Structured
Large Diverse
Sheer Curiosity
Reporting for Finance in Any Industry
Analyze Each Tweet
Web Navigation For E-Merchant
Ticket Data for Discounts in Retail
Phone Call Logs for Security
RTB Data For Advertising
Customer Consumption For Anti-Churn in Utilities
Optimization
Filings for Fraud in Insurance
Not Enough Data to Learn From?
Not Enough “Hard” Examples to Learn From
Dataiku - Pig, Hive and Cascading
Welcome to Technoslavia
Hadoop Ceph
Sphere Cassandra
Kafka Flume Spark
scikit-learn GraphLab PredictionIO Jubatus
Mahout WEKA
MLBase LibSVM
RapidMiner Pandas
Kibana
InfiniDB Drill Spark SQL
Hive Impala
…
Elasticsearch
Solr MongoDB
Riak Membase
Pig
Cascading
Talend
Machine Learning Mystery Land
Scalability Central
SQL Columnar Republic
Visualization County
Data Clean Wasteland
Statistician Old House
R
Real-time Island
Storm
NoSQL Nihiland
Embrace Many Skills Many-Sets
Data Plumber
BI Manager
Data Scientist
Data Waiter
Data Cleaner
Business Analyst
REAL JOB
DREAM JOB
• Query reformulation
• No answer
• Click on a professional
• Top search
• Navigation click or filter
HOW CAN WE IMPROVE THE RELEVANCE OF OUR ANSWERS BY ANALYZING USER BEHAVIOR?
20M
Analysis & corrections
Automation
>200M searches
1.4M queries with >10 occurrences
✗ ✓
0.5M prioritized queries
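The ">10 occurrences" step of this funnel can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the Pages Jaunes pipeline (which the slides place on Hadoop with Pig and Hive): the function name `prioritize_queries`, the lowercase/trim normalization, and the toy log are all hypothetical.

```python
from collections import Counter


def prioritize_queries(search_log, min_occurrences=10):
    """Return (query, count) pairs seen more than `min_occurrences` times.

    Assumption: queries are normalized by trimming whitespace and
    lowercasing before counting; the real normalization is not described
    in the slides.
    """
    counts = Counter(q.strip().lower() for q in search_log if q.strip())
    frequent = [(q, n) for q, n in counts.items() if n > min_occurrences]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


# Hypothetical toy log standing in for the raw search log.
log = (
    ["plombier paris"] * 12
    + ["Plombier Paris "] * 3
    + ["coiffeur lyon"] * 11
    + ["fleuriste"] * 10
)
result = prioritize_queries(log)
```

Queries that appear exactly 10 times or fewer are dropped, so only the recurring ones remain as candidates for analysis and correction.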
"PREDICTIVE CONTENT MANAGEMENT” FROM PAGES JAUNES
Machine
Management Exploration
pagesjaunes.fr Directory
Hadoop Pig + Hive
Export Indexing
Interpretation engine
Crawl Other reference sources
scikit-learn
Optimizing Last Mile with Data Science Studio
Data Science Studio
Historical delivery and retrieval data
Modeling of a score for each delivery
Cleaning and temporal enrichment of data
Data aggregation by geographic location
Incorporation of new deliveries to the existing model
EXPLORE NEW WORLDS
Mission Critical
Small Structured
Large Diverse
Sheer Curiosity
Optimization
Optimize Existing BI Capabilities
Build Mandatory Large Volume Capabilities
EXPLORE POTENTIAL
NOT BEING RELEVANT DANGER ZONE
Analytics
Predictive
Self Service
Cluster