Dataiku - Google Cloud Platform Roadshow - October 2013

Post on 18-Dec-2014

Data Science Studio

19 customers

Founded in January 2013

Data Science For Everyone

(big) data(s) + machine learning + for practical applications = Data Science

The Project

(c) Dataiku 2013 - Confidential

Hal Alowne, BI Manager, Dim’s Private Showroom

Dim Sum, CEO & Founder, Dim’s Private Showroom

Medium-size e-commerce
•  $100M revenue
•  1 Data Analyst

Big Guys
•  $10B+ revenue
•  100+ Data Scientists

Hey Hal! We need a big data platform, like the big guys! Let’s just do as they do!

Hal Wish #1: Global Customer Value Funnel

SEO

NewsLetter

Display Retargeting Display AdWords

Marketplace Direct Sales

Delivery

View Basket

Support Returns

$

$ $ $

Orders
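
The funnel above can be read as a chain of stage-to-stage conversion rates. Here is a minimal Python sketch of that reading. The stage counts are hypothetical (the slide gives no figures), and the helper name `funnel_conversion` is an assumption made for illustration.

```python
from typing import Dict, List, Tuple


def funnel_conversion(stage_counts: List[Tuple[str, int]]) -> Dict[str, float]:
    """Return the conversion rate from each funnel stage to the next one."""
    rates: Dict[str, float] = {}
    # Pair every stage with the stage that follows it.
    for (stage, count), (next_stage, next_count) in zip(stage_counts, stage_counts[1:]):
        key = f"{stage} -> {next_stage}"
        # Guard against an empty stage to avoid dividing by zero.
        rates[key] = next_count / count if count else 0.0
    return rates


# Hypothetical counts, for illustration only (not taken from the slides).
stages = [("View", 1000), ("Basket", 200), ("Orders", 50)]
rates = funnel_conversion(stages)
for step, rate in rates.items():
    print(f"{step}: {rate:.0%}")
```

Running the same computation per acquisition channel (SEO, newsletter, display, and so on) would give one such chain per channel.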

Hal Wish #2: Why do people drop their baskets?

9/30/13 5

Basket

Payment refused

Credit Refused

Cheaper elsewhere ?

Delivery costs ? Wait Xmas?

ACTION

Hal Wish #3: Which product to put on top?

Original: Most Popular on top

Better: Machine Learning Score (age/discount/margin…)

Advanced: Machine Learning Score + Personalization
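
The difference between the "Original" and "Better" orderings can be sketched in a few lines of Python. This is an assumption-heavy illustration: a fixed-weight linear score stands in for the machine-learned score named on the slide (a real model would learn its weights from data), and the `Product` fields, the weights, and the sample products are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Product:
    name: str
    views: int        # popularity signal
    age_days: int     # days since the product was listed
    discount: float   # fraction between 0 and 1
    margin: float     # fraction between 0 and 1


def rank_by_popularity(products: List[Product]) -> List[Product]:
    """'Original' ordering: most viewed product first."""
    return sorted(products, key=lambda p: p.views, reverse=True)


def score(p: Product, weights: Dict[str, float]) -> float:
    """Hand-weighted stand-in for a learned score over age/discount/margin."""
    freshness = 1.0 / (1.0 + p.age_days)  # newer products score higher
    return (weights["freshness"] * freshness
            + weights["discount"] * p.discount
            + weights["margin"] * p.margin)


def rank_by_score(products: List[Product], weights: Dict[str, float]) -> List[Product]:
    """'Better' ordering: highest score first."""
    return sorted(products, key=lambda p: score(p, weights), reverse=True)


# Hypothetical catalogue and weights, for illustration only.
products = [
    Product("A", views=900, age_days=300, discount=0.0, margin=0.1),
    Product("B", views=400, age_days=5, discount=0.3, margin=0.4),
]
weights = {"freshness": 1.0, "discount": 1.0, "margin": 1.0}

print([p.name for p in rank_by_popularity(products)])
print([p.name for p in rank_by_score(products, weights)])
```

The "Advanced" variant would additionally make the score depend on the visitor, which this sketch does not attempt.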

Why is it so complicated?

Partner Data Spaghetti

Mailing Partner

DMP Partner

Mail Optimizer

Retargeter

Market Data Providers

Social Networks

Databases are Full

1 TB BI Database

20 TB BI Database

Any new computing job takes > 1 day

NEED FOR SCALE

Architecture Bingo

BI Real-Time Batch Real Real-Time

Simple Queries

Statistics

Machine Learning

Hive

Pig

Spark

MongoDB

ElasticSearch

Cascading

R

Hadoop Ceph

Sphere Cassandra Spark

Scikit-Learn

Mahout WEKA

MLBase

RapidMiner

Pandas D3 Crossfilter

InfiniDB LucidDB

Impala

Elastic Search SOLR

MongoDB Riak

Membase

Pig Hive Cascading Talend

Machine Learning Mystery Land

Scalability Central

NoSQL-Slavia

SQL Columnar Republic

Visualization County

Data Cleanup Wasteland

Statistician Old House

R

Hal’s Bingo!

HADOOP Google Cloud Platform Dataiku

Step 1: Get your own data

Dataiku Open Source Web Tracker (WT1)
•  Apache License
•  Javascript & IO
•  Write directly to Google Cloud Storage
•  Full Java, Easy To Deploy

Silent at night. Autoscale during the summer and winter sales.

Step 2: Mix All Your Data

4 VMs on GCE

Tracking Data

Internal Data

Partner Data

Data Science Studio Pig Hive

HADOOP

auto-sync to BigQuery
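
The idea of mixing tracking, internal, and partner data comes down to joining records that share a key. The slide does this with Pig and Hive on Hadoop. The in-memory Python sketch below only illustrates the join logic and is not how the platform implements it. The `customer_id` key, the field names, the sample records, and the helper `mix_sources` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def mix_sources(**sources: Iterable[dict]) -> Dict[str, dict]:
    """Merge records from several named sources on a shared 'customer_id'."""
    merged: Dict[str, dict] = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = record["customer_id"]
            for field, value in record.items():
                if field != "customer_id":
                    # Prefix each field with its source to avoid name clashes.
                    merged[key][f"{source_name}.{field}"] = value
    return dict(merged)


# Hypothetical records, for illustration only.
tracking = [{"customer_id": "c1", "visits": 12}, {"customer_id": "c2", "visits": 3}]
internal = [{"customer_id": "c1", "orders": 3}]
partner = [{"customer_id": "c1", "segment": "vip"}]

mixed = mix_sources(tracking=tracking, internal=internal, partner=partner)
print(mixed["c1"])
```

Customers seen by only one source (here `c2`) keep only that source's fields, which mirrors a full outer join.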

Step 3: Mine your Data

Builtin Predictive Models

Advanced Adhoc Models (R or Python)

Shared Web Based Data Mining Platform

Typical Project Calendar

•  January
   ◦  Choose Partner / Set up the architecture
•  February
   ◦  Initial Deployment: 4TB
   ◦  Replace BI
•  May
   ◦  New Applications (SEO, …)
•  September
   ◦  Scale Deployment to 15TB
   ◦  Integrate all channels

Some Successes for the Project

•  Enhance Daily Report Availability
   ◦  Previous architecture: between H+17 and H+26 (!)
   ◦  Hadoop on GCE: between H+3 and H+7
•  +21% Email Channel Optimization
•  SEO plan optimization
•  and a dozen BI-style “apps”

Thank you!

Follow us on Twitter: @dataiku

Ask any big data question: florian.douetteau@dataiku.com