OWF 2014 - Take back control of your Web tracking - Dataiku

www.dataiku.com Take back control of your Web Tracking @ClementStenac CTO, Dataiku

Why you should probably do your own web tracking, and what the challenges are. Concludes with a presentation of the WT1 open source web tracker.

Transcript of OWF 2014 - Take back control of your Web tracking - Dataiku

Page 1: OWF 2014 - Take back control of your Web tracking - Dataiku

www.dataiku.com

Take back control of your Web Tracking

@ClementStenac, CTO, Dataiku

Page 2: OWF 2014 - Take back control of your Web tracking - Dataiku

Give me dashboards!

Page 3: OWF 2014 - Take back control of your Web tracking - Dataiku

Choose one

• Raw data: do what you want
• Your money: access to raw data is a premium feature

Page 4: OWF 2014 - Take back control of your Web tracking - Dataiku

Who cares about raw data?

• SaaS analytics are full-featured
• Custom variables to link with your backend data
• Did you really join all data for your future needs?
• Do you have access to / want to push to the JS all necessary data?
• What kinds of analysis will you do later on?

Page 5: OWF 2014 - Take back control of your Web tracking - Dataiku

A real example: segmentation and tracking user satisfaction

[Diagram: Raw tracking data → User-level stats → User base segmentation → Metrics per segment → Tracking over time. Data volume labels: TB, GB]
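
The first step of this pipeline, going from raw tracking data to user-level stats, can be sketched as a simple aggregation. This is a minimal illustration, not the pipeline used in the talk: the event fields (`visitor_id`, `session_id`, `page`) and the chosen stats are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UserStats:
    events: int = 0
    sessions: int = 0
    distinct_pages: int = 0


def user_level_stats(events):
    """Aggregate raw tracking events into one row of stats per visitor."""
    stats = defaultdict(UserStats)
    seen_sessions = defaultdict(set)
    seen_pages = defaultdict(set)
    for ev in events:
        vid = ev["visitor_id"]
        stats[vid].events += 1
        seen_sessions[vid].add(ev["session_id"])
        seen_pages[vid].add(ev["page"])
    for vid, s in stats.items():
        s.sessions = len(seen_sessions[vid])
        s.distinct_pages = len(seen_pages[vid])
    return dict(stats)


# Hypothetical raw events, one dict per tracked page load.
raw_events = [
    {"visitor_id": "v1", "session_id": "s1", "page": "/"},
    {"visitor_id": "v1", "session_id": "s1", "page": "/news"},
    {"visitor_id": "v1", "session_id": "s2", "page": "/"},
    {"visitor_id": "v2", "session_id": "s3", "page": "/"},
]
per_user = user_level_stats(raw_events)
```

The resulting per-visitor rows are much smaller than the raw events and can feed the segmentation step.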

Page 6: OWF 2014 - Take back control of your Web tracking - Dataiku

User-level data

Page 7: OWF 2014 - Take back control of your Web tracking - Dataiku

Clustering

Page 8: OWF 2014 - Take back control of your Web tracking - Dataiku

Labeling

• Search for a specific Topic
• Newcomer from Google News
• Foreigner Discovering The Site
• Fan who loves to comment
• Home Page Wanderer
• Dark Bot (Competitor?)

Here you need your business intelligence

Page 9: OWF 2014 - Take back control of your Web tracking - Dataiku

Compute metrics per segment

Segments:
• Search for a specific Topic
• Newcomer from Google News
• Foreigner Discovering The Site
• Fan that loves to comment
• Home Page Wanderer
• Dark Bot (Competitor?)

Figures shown on the slide:
• 0.3€ per session, 0.23€ acquisition costs
• 13k sessions, 1.3€ per session, 0.23€ acquisition costs
• 938k sessions
• 938k sessions, 0.3€ per session, 0.23€ acquisition costs
• 738k sessions, 0.83€ per session, 0.73€ acquisition costs
• 68k sessions, 0.3€ per session, 1.23€ acquisition costs
• 1k sessions, 0€ per session, 0€ acquisition costs

Here you need to cross with your CRM
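
Crossing tracking data with the CRM amounts to a join followed by a per-segment aggregation. The sketch below assumes three hypothetical inputs that are not described in the slides: a list of sessions, a visitor-to-segment mapping from the labeling step, and a session-to-revenue mapping exported from the CRM.

```python
from collections import defaultdict


def metrics_per_segment(sessions, segment_of, revenue_of):
    """Join sessions with segment labels and CRM revenue, then aggregate."""
    totals = defaultdict(lambda: {"sessions": 0, "revenue": 0.0})
    for s in sessions:
        segment = segment_of[s["visitor_id"]]
        totals[segment]["sessions"] += 1
        # Sessions without a CRM record contribute no revenue.
        totals[segment]["revenue"] += revenue_of.get(s["session_id"], 0.0)
    return {
        seg: {
            "sessions": t["sessions"],
            "revenue_per_session": t["revenue"] / t["sessions"],
        }
        for seg, t in totals.items()
    }


# Hypothetical data for illustration only.
sessions = [
    {"session_id": "s1", "visitor_id": "v1"},
    {"session_id": "s2", "visitor_id": "v1"},
    {"session_id": "s3", "visitor_id": "v2"},
]
segment_of = {"v1": "Fan who loves to comment", "v2": "Home Page Wanderer"}
revenue_of = {"s1": 1.0, "s3": 0.5}
metrics = metrics_per_segment(sessions, segment_of, revenue_of)
```

Acquisition costs could be added the same way, from whatever source holds them.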

Page 10: OWF 2014 - Take back control of your Web tracking - Dataiku

Track metrics over time

• Search for a specific Topic
• Newcomer from Google News
• Foreigner Discovering The Site
• Fan that loves to comment
• Home Page Wanderer
• Dark Bot (Competitor?)

Using your already-computed segments

"Damn, our latest release has diverging effects on segments"

Page 11: OWF 2014 - Take back control of your Web tracking - Dataiku

A few other examples

• Churn prediction and explanation

• Customer lifetime value prediction

Page 12: OWF 2014 - Take back control of your Web tracking - Dataiku

OK

I WANT TO DO IT

Page 13: OWF 2014 - Take back control of your Web tracking - Dataiku

So, I have these Apache logs

• First level of web tracking
• "Nothing required"

Page 14: OWF 2014 - Take back control of your Web tracking - Dataiku

Are backend logs a solution?

Challenge 1: Identify a visitor
• IP?
  – NAT / Proxy
  – Not everyone has a public IP address
• IP + user-agent?
  – Big companies!

Page 15: OWF 2014 - Take back control of your Web tracking - Dataiku

Are backend logs a solution?

Challenge 2: Re-create sessions
• Using expiration times
• Advanced SQL / Hive / … makes this easier
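
Re-creating sessions with an expiration time can be sketched as follows: sort the hits per visitor and start a new session whenever the gap since the previous hit exceeds the timeout. The 30-minute timeout and the `(visitor_key, timestamp)` input shape are assumptions for this example; in SQL or Hive the same logic is typically written with window functions.

```python
from itertools import groupby
from operator import itemgetter

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed value


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Assign a session number to each hit.

    hits: iterable of (visitor_key, unix_timestamp) tuples, in any order.
    Returns a list of (visitor_key, unix_timestamp, session_number), where
    session_number starts at 1 for each visitor.
    """
    out = []
    ordered = sorted(hits, key=itemgetter(0, 1))
    for visitor, group in groupby(ordered, key=itemgetter(0)):
        session, previous = 0, None
        for _, ts in group:
            # A first hit or a long gap opens a new session.
            if previous is None or ts - previous > timeout:
                session += 1
            previous = ts
            out.append((visitor, ts, session))
    return out


hits = [("a", 0), ("a", 600), ("a", 4000), ("b", 100)]
sessions = sessionize(hits)
```

Here visitor "a" gets two sessions, because the third hit comes more than 30 minutes after the second.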

Page 16: OWF 2014 - Take back control of your Web tracking - Dataiku

Are backend logs a solution?

Challenge 3: Single-page webapps
• Track behaviour within each page
• Track events, not pages

Also: getting logs from IT is sometimes another challenge

Page 17: OWF 2014 - Take back control of your Web tracking - Dataiku

Client-side tracking

• visitor_id and session_id handled with cookies
• Tracking page loads and various events
• Historically, "tracking" = fetching a 1x1 image
• AJAX

[Diagram: www.website.com, Browser, tracker.com; arrows labeled "JS tracking code" and "Tracking calls"]
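
Whether the call is a 1x1 image fetch or an AJAX request, a tracking call boils down to a URL carrying the identifiers and the event data. The sketch below builds such a URL; the endpoint and parameter names are hypothetical and do not describe any particular tracker's API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

TRACKER_BASE = "https://tracker.com/track"  # hypothetical endpoint


def tracking_url(visitor_id, session_id, event_type, params=None):
    """Build the URL fetched (as an image or via AJAX) to record one event."""
    query = {
        "visitor_id": visitor_id,
        "session_id": session_id,
        "type": event_type,
    }
    query.update(params or {})
    # urlencode escapes values, so a page path with "?" or "&" stays intact.
    return TRACKER_BASE + "?" + urlencode(query)


url = tracking_url("d37ecba", "s1", "page", {"page": "/news?id=3"})
decoded = parse_qs(urlsplit(url).query)
```

Decoding the query string on the tracker side gives back the original values, including the page path.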

Page 18: OWF 2014 - Take back control of your Web tracking - Dataiku

Are cookies good for your (web) health?

• Each cookie belongs to a domain (and its subdomains)
• Who can write a cookie?
  – The HTTP server, who becomes owner (via the Set-Cookie HTTP header)
  – JS code running on the "owner" domain
• Who can read a cookie?
  – The owner HTTP server (sent by the browser)
  – JS code running on the "owner" domain
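
On the server side, writing a cookie means emitting a Set-Cookie header, and reading one means parsing the Cookie header the browser sends back. A minimal sketch with Python's standard `http.cookies` module follows; the cookie name, domain and lifetime are illustrative assumptions.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value, domain, max_age):
    """Return the value of a Set-Cookie header for the given cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    cookie[name]["path"] = "/"
    cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


def read_cookie(cookie_header, name):
    """Extract one cookie from the Cookie header sent by the browser."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie[name].value if name in cookie else None


header = set_cookie_header("visitor_id", "d37ecba", "www.website.com", 3600)
visitor = read_cookie("visitor_id=d37ecba; lang=en", "visitor_id")
missing = read_cookie("lang=en", "visitor_id")
```

The browser only sends the cookie back to the domain named in the header, which is what makes that server the "owner".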

Page 19: OWF 2014 - Take back control of your Web tracking - Dataiku

First-party cookies

• Set by the originating server (HTTP) or JS code
• Belong to the originating domain
• Sent by HTTP to the originating domain only
• Readable by JS code

[Diagram: www.website.com, Browser, tracker.com]
• Browser cookies for www.website.com: None
• Browser → www.website.com: GET / (Cookies: none), answered with the page contents
• Browser → tracker.com: fetch tracking script
• Tracking JS code: read cookies for www.website.com
• Tracking JS code: create visitor id and set cookie

Page 20: OWF 2014 - Take back control of your Web tracking - Dataiku

First-party cookies

• Set by the originating server (HTTP) or JS code
• Belong to the originating domain
• Sent by HTTP to the originating domain only
• Readable by JS code

[Diagram: www.website.com, Browser, tracker.com]
• Browser cookies for www.website.com: visitor_id=d37ecba
• JS code: send AJAX request to tracker.com with visitor_id
• Browser → tracker.com: GET /track?visitor_id=d37ecba (Cookies: None)
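
The first-party flow can be modelled in a few lines. In a real page this logic runs as JS in the browser; the Python below is only a model of it, with a plain dict standing in for the browser's cookie store and a hypothetical `/track` endpoint.

```python
import uuid
from urllib.parse import urlencode


def track_first_party(cookie_jar, site_domain, tracker_base):
    """Model of what the tracking JS does on a page of site_domain.

    cookie_jar maps a domain to its cookies (a dict), standing in for the
    browser's cookie store.
    """
    site_cookies = cookie_jar.setdefault(site_domain, {})
    if "visitor_id" not in site_cookies:
        # First visit: create the id and store it under the site's domain.
        site_cookies["visitor_id"] = uuid.uuid4().hex[:7]
    # The tracker gets no cookie of its own; the id travels in the URL.
    query = urlencode({"visitor_id": site_cookies["visitor_id"]})
    return f"{tracker_base}/track?{query}"


jar = {}
first = track_first_party(jar, "www.website.com", "https://tracker.com")
second = track_first_party(jar, "www.website.com", "https://tracker.com")
```

Both calls produce the same URL, and the cookie jar never holds anything for tracker.com.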

Page 21: OWF 2014 - Take back control of your Web tracking - Dataiku

Third-party cookies

• Set (in HTTP) by the tracker's domain
  – Belong to the tracker's domain
• Not sent by HTTP to the originating domain (does not belong)
• NOT readable by JS code (does not belong)

[Diagram: www.website.com, Browser, tracker.com]
• Browser cookies for www.website.com: None
• Browser cookies for tracker.com: None
• Browser → www.website.com: GET / (Cookies: none), answered with the page contents
• Browser → tracker.com: fetch tracking script

Page 22: OWF 2014 - Take back control of your Web tracking - Dataiku

Third-party cookies

• Set (in HTTP) by the tracker's domain
  – Belong to the tracker's domain
• Not sent by HTTP to the originating domain (does not belong)
• NOT readable by JS code (does not belong)

[Diagram: www.website.com, Browser, tracker.com]
• Browser cookies for www.website.com: None
• Browser cookies for tracker.com: None
• Browser → tracker.com: GET /track (Cookies: None)
• Tracker code: assign visitor_id
• tracker.com → Browser: 200 OK, Set-Cookie: visitor_id=33d7

Page 23: OWF 2014 - Take back control of your Web tracking - Dataiku

Third-party cookies

• Set (in HTTP) by the tracker's domain
  – Belong to the tracker's domain
• Not sent by HTTP to the originating domain (does not belong)
• NOT readable by JS code (does not belong)

[Diagram: www.website.com, Browser, tracker.com]
• Browser cookies for www.website.com: None
• Browser cookies for tracker.com: visitor_id=33d7
• Browser → tracker.com: GET /track (Cookies: visitor_id=33d7)
• Tracker code: read visitor_id
• tracker.com → Browser: 200 OK
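
The tracker side of the third-party flow can be sketched as a single handler: if the request carries no visitor_id cookie, assign one and return it in Set-Cookie; otherwise just read it. The function name, return shape and id format are assumptions made for this example, not taken from a real tracker.

```python
import uuid
from http.cookies import SimpleCookie


def handle_track(cookie_header):
    """Handle GET /track on the tracker.

    Returns (visitor_id, set_cookie) where set_cookie is the Set-Cookie
    header value to send back, or None if the visitor was already known.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    if "visitor_id" in cookies:
        return cookies["visitor_id"].value, None
    # Unknown visitor: assign an id and ask the browser to store it.
    visitor_id = uuid.uuid4().hex[:4]
    out = SimpleCookie()
    out["visitor_id"] = visitor_id
    out["visitor_id"]["path"] = "/"
    return visitor_id, out["visitor_id"].OutputString()


new_id, set_cookie = handle_track(None)
known_id, no_cookie = handle_track(f"visitor_id={new_id}")
```

Because the cookie belongs to the tracker's domain, the same id comes back from every website that embeds the tracker.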

Page 24: OWF 2014 - Take back control of your Web tracking - Dataiku

Why each?

First-party cookie
• Tracks on a single website
• Requires JS code for tracking
• Reduced privacy impact: no exchange of information between sites
• Usage: track your user's behaviour
• Rarely blocked (used for logins)

Third-party cookie
• Tracks across all websites using the same tracker
• More frowned upon
• Usage: generally ads, but also multi-website
• Blocked by up to 40% of visitors

Page 25: OWF 2014 - Take back control of your Web tracking - Dataiku

What are your obligations?

With ALL cookies
• You should ask the user whether he wants cookies
• Even non-tracking-related cookies
• Yes, even login-related ones

Page 26: OWF 2014 - Take back control of your Web tracking - Dataiku

What are your obligations?

With third-party cookies
• Obey the Do-Not-Track header

[Diagram: www.website.com, Browser, tracker.com]
• Browser → tracker.com: GET /track (Cookies: None, DNT: 1)
• Tracker code: DO NOTHING
• tracker.com → Browser: 200 OK
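
Obeying Do-Not-Track is a small check on the request headers: answer normally, but record nothing when `DNT: 1` is present. A minimal sketch follows, with hypothetical function names and a plain dict standing in for the request headers.

```python
def should_track(headers):
    """Return False when the request carries DNT: 1."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {k.lower(): v.strip() for k, v in headers.items()}
    return normalised.get("dnt") != "1"


def handle_request(headers):
    """Answer 200 in every case, but record the hit only if allowed."""
    recorded = should_track(headers)
    return 200, recorded


with_dnt = handle_request({"DNT": "1", "Cookie": ""})
without_dnt = handle_request({"User-Agent": "example"})
```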

Page 27: OWF 2014 - Take back control of your Web tracking - Dataiku

What are your obligations?

With third-party cookies
• Provide an opt-out URL
• Allows the user to /optin, /optout or /status

See it in action: www.youronlinechoices.com
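
One possible way to back the /optin, /optout and /status URLs is to store the user's choice in a cookie on the tracker's domain. The slides do not say how any given tracker implements this, so the cookie name and the return shape below are assumptions.

```python
OPT_OUT_COOKIE = "optout"  # hypothetical cookie name


def handle_opt(path, cookies):
    """Handle /optin, /optout and /status.

    cookies is the dict of cookies received with the request. Returns
    (opted_out, cookies_to_set) where cookies_to_set maps cookie names to
    new values, with None meaning "delete this cookie".
    """
    opted_out = cookies.get(OPT_OUT_COOKIE) == "1"
    if path == "/optout":
        # Drop the identifier and remember the choice.
        return True, {OPT_OUT_COOKIE: "1", "visitor_id": None}
    if path == "/optin":
        return False, {OPT_OUT_COOKIE: None}
    if path == "/status":
        return opted_out, {}
    raise ValueError(f"unknown path: {path}")


after_optout = handle_opt("/optout", {"visitor_id": "33d7"})
status = handle_opt("/status", {"optout": "1"})
after_optin = handle_opt("/optin", {"optout": "1"})
```

Note that the opt-out itself relies on a cookie, so it is lost if the user clears cookies.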

Page 28: OWF 2014 - Take back control of your Web tracking - Dataiku

What are your obligations?

With third-party cookies
• Provide a P3P policy
• Else, older IE blocks you

"What are you doing with my data?"

Looks like this: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"

Page 29: OWF 2014 - Take back control of your Web tracking - Dataiku

Tracking in mobile apps

• Preserve battery
  – Each network call is costly
  – Do not track everything synchronously
• Network access is intermittent
  – Queue events and wait for network access
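
Both constraints point to the same design: record events locally and send them in batches when the network is there. The class below is a minimal sketch of that idea in Python rather than in a mobile language; the class name, batch size and `send_batch` callback are assumptions for the example.

```python
from collections import deque
from itertools import islice


class EventQueue:
    """Buffer events and send them in batches when the network allows."""

    def __init__(self, send_batch, batch_size=20):
        self.send_batch = send_batch  # callable(list) -> True on success
        self.batch_size = batch_size
        self.pending = deque()

    def track(self, event):
        """Record an event without touching the network."""
        self.pending.append(event)

    def flush(self, network_available):
        """Send queued events in batches; keep them if sending fails."""
        sent = 0
        while self.pending and network_available:
            batch = list(islice(self.pending, self.batch_size))
            if not self.send_batch(batch):
                break  # leave the batch queued and retry later
            for _ in batch:
                self.pending.popleft()
            sent += len(batch)
        return sent


received = []


def fake_send(batch):
    """Stand-in for the real network call; always succeeds."""
    received.append(batch)
    return True


queue = EventQueue(fake_send, batch_size=2)
for name in ["open", "tap", "close"]:
    queue.track({"event": name})
offline = queue.flush(network_available=False)
online = queue.flush(network_available=True)
```

Three events end up in two network calls instead of three, and nothing is lost while the device is offline.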

Page 30: OWF 2014 - Take back control of your Web tracking - Dataiku

So, what are my choices?

• You might really want to be your own web tracker
• Most used open source web tracker: Piwik
• Provides both raw data and nice dashboards
  – MySQL backend
  – Raw data via API
  – Slightly less suited for analytics

Page 31: OWF 2014 - Take back control of your Web tracking - Dataiku

WT1

YOUR OWN TRACKER IN MINUTES

Page 32: OWF 2014 - Take back control of your Web tracking - Dataiku

WT1

An open source (Apache License) server to build your own web tracking

https://github.com/dataiku/wt1

• Designed to provide you with raw data, directly usable for analytics
• Very high performance and scalability

Page 33: OWF 2014 - Take back control of your Web tracking - Dataiku

Features

• 1st or 3rd party cookies
  – Handling of DNT and opt-out
  – Helps handling P3P
• Track events or pages with key-value data
• Visitor-scope and session-scope variables
• "Live view" debugging console

Page 34: OWF 2014 - Take back control of your Web tracking - Dataiku

Features

• Dashboards: None
• Events processing and storage
  – Filesystem, S3
  – Event queues: Flume
  – Custom processors
• JSON API for custom tracking
• iOS library

Page 35: OWF 2014 - Take back control of your Web tracking - Dataiku

Architecture

Client-side JS tracker
• 1st or 3rd party cookies
• Event-level tracking

iOS library
• Automatic batching
• Queuing to deal with network interruptions

JSON POST

WT1 Server
• Java
• > 20K events / second
• Handles DNT, P3P, opt-out, …

Raw storage
• Filesystem
• S3

Event processors
• Real-time aggregations
• Custom code

Event queues
• Flume
• Kafka, RabbitMQ, …

Page 36: OWF 2014 - Take back control of your Web tracking - Dataiku

Future work

• Android library
• More event queues supported OOTB
  – Kafka
  – RabbitMQ
• Avro storage

Page 37: OWF 2014 - Take back control of your Web tracking - Dataiku

Thank you!

Clément Stenac
@ClementStenac