Looker's Ben Porterfield - Asking The Right Questions

Posted on 23-Jul-2015


Ben Porterfield Founder, VP Engineering

Business Analytics: Asking the Right Questions

Business Intelligence

Operational control: How many sales did I do today?

Understand and improve experience: Are users engaging? Do they like the new features?

Make business decisions: Should we start delivering in a new city?

—ANDREW LEONARD Salon

“Data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher”

The Analytical Process

1 Tracking Data

2 Storing Data

3 Merging Data (ETL)

4 Retrieving Data

5 Analysis & Decision Making

Tracking Data

What To Track?

Event: views, clicks, in-app actions

Transactional: users, orders, inventory

Tracking - Event Data

Embed in product process: every new feature should come with events.

Server-side too: lots of non-transactional events happen on the server.

Taxonomy matters: a big flat event space becomes unwieldy.
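A minimal Python sketch of one way to keep the event space from going flat, assuming a hypothetical three-level `area.object.action` naming convention and a hypothetical `Event` record (neither comes from the talk). Client-side and server-side events share one shape, and names that don't fit the taxonomy are rejected when the event is created, so events can later be rolled up by area.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical convention: every event name is "area.object.action",
# e.g. "checkout.payment.submitted". Not a Looker or library standard.
@dataclass
class Event:
    name: str
    user_id: int
    source: str  # "client" or "server": server-side events use the same record
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        parts = self.name.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"event name must look like area.object.action, got {self.name!r}")
        if self.source not in ("client", "server"):
            raise ValueError(f"unknown event source {self.source!r}")

    @property
    def area(self) -> str:
        return self.name.split(".")[0]

def count_by_area(events):
    """Roll events up by the top level of the taxonomy."""
    return Counter(e.area for e in events)

if __name__ == "__main__":
    events = [
        Event("checkout.payment.submitted", 1, "client"),
        Event("checkout.payment.settled", 1, "server"),
        Event("search.query.run", 2, "client"),
    ]
    print(count_by_area(events))
```

The depth of the hierarchy is a design choice; the point is that a new feature adds names under an existing area instead of another entry in one long flat list.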

Storing Data

Storing - Transactional Data

Go with SQL: NoSQL could be a burden long-term.

Store all states, even from offline processes.

Keep it clean: messy schema = complicated analytics.

—MICHAEL ERASMUS Back-end Engineer, Buffer

“We were relying on MongoDB…while it was easier for developers to play with the data, it became a hurdle for other team members.”
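"Store all states" can be read as: record every state change instead of overwriting a status column. The sketch below uses Python's built-in `sqlite3` and a hypothetical append-only `order_state_history` table (the table, column and state names are illustrative assumptions). The current state is derived as the latest row per order, and earlier states, including ones written by offline processes, stay queryable.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Append-only history: one row per state change, never updated in place.
    conn.execute("""
        CREATE TABLE order_state_history (
            order_id   INTEGER NOT NULL,
            state      TEXT    NOT NULL,
            changed_at TEXT    NOT NULL  -- ISO-8601, so text order = time order
        )""")
    return conn

def record_state(conn, order_id, state, changed_at):
    conn.execute("INSERT INTO order_state_history VALUES (?, ?, ?)",
                 (order_id, state, changed_at))

def current_states(conn):
    """Latest state per order, derived from the full history."""
    rows = conn.execute("""
        SELECT h.order_id, h.state
        FROM order_state_history AS h
        JOIN (SELECT order_id, MAX(changed_at) AS latest
              FROM order_state_history
              GROUP BY order_id) AS m
          ON h.order_id = m.order_id AND h.changed_at = m.latest
        ORDER BY h.order_id""").fetchall()
    return dict(rows)

def ever_in_state(conn, state):
    """Orders that passed through a state at any point, even if they later moved on."""
    rows = conn.execute(
        "SELECT DISTINCT order_id FROM order_state_history WHERE state = ? ORDER BY order_id",
        (state,)).fetchall()
    return [r[0] for r in rows]

if __name__ == "__main__":
    conn = make_db()
    record_state(conn, 1, "placed", "2015-01-10T10:00:00")
    record_state(conn, 1, "backordered", "2015-01-10T12:00:00")
    record_state(conn, 1, "shipped", "2015-01-12T09:00:00")
    print(current_states(conn), ever_in_state(conn, "backordered"))
```

With a single overwritten status column, the `backordered` step in this example would be lost once the order shipped.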

Storing - Event Data

Own it: or, at a minimum, be able to get it.

Use the ecosystem too: there are lots of great SaaS event platforms.

Store all the IDs: you need to be able to correlate events to transactions.
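Correlating events to transactions only works if events carry the same IDs as the transactional tables. A minimal sketch using `sqlite3`, with hypothetical `orders` and `events` tables that share `user_id`: for each event name, it counts how many buyers fired that event before their first order. Table layouts and event names are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, created_at TEXT NOT NULL);
-- Events carry the same user_id as the transactional tables; without it, no join is possible.
CREATE TABLE events (user_id INTEGER NOT NULL, name TEXT NOT NULL, created_at TEXT NOT NULL);
"""

def events_before_first_order(conn):
    """For each event name: how many distinct buyers fired it before their first order."""
    rows = conn.execute("""
        WITH first_orders AS (
            SELECT user_id, MIN(created_at) AS first_order_at
            FROM orders
            GROUP BY user_id
        )
        SELECT e.name, COUNT(DISTINCT e.user_id)
        FROM events AS e
        JOIN first_orders AS f ON e.user_id = f.user_id
        WHERE e.created_at < f.first_order_at
        GROUP BY e.name
        ORDER BY e.name
    """).fetchall()
    return dict(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO orders (user_id, created_at) VALUES (1, '2015-01-10T12:00')")
    conn.execute("INSERT INTO events VALUES (1, 'product.page.viewed', '2015-01-10T11:30')")
    print(events_before_first_order(conn))
```

Users with events but no orders drop out of the inner join, which is the intended behavior for this particular question.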

Merging Data

Diagram: Transactional Data, Event Data, Other Data? → Raw Queries, Biz-User Tools

Combine transactional and event data, and more.

Use an analytical database: Redshift is the current leader.

It's difficult: the data is heavy.

Application

WITH user_order_activity AS (
    -- one row per user who has placed at least one order
    SELECT user_id
    FROM orders
    GROUP BY user_id
)
SELECT AVG(users.age) AS average_age_of_purchaser
FROM user_order_activity
LEFT JOIN users ON user_order_activity.user_id = users.user_id
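To see why the query first collapses orders to one row per user, here is a small sketch that runs the same SQL in an in-memory SQLite database. The `users(user_id, age)` and `orders(order_id, user_id)` layouts are assumptions; only the query shape comes from the slide.

```python
import sqlite3

QUERY = """
WITH user_order_activity AS (
    SELECT user_id FROM orders GROUP BY user_id   -- one row per purchaser
)
SELECT AVG(users.age) AS average_age_of_purchaser
FROM user_order_activity
LEFT JOIN users ON user_order_activity.user_id = users.user_id
"""

def average_age_of_purchasers(users, orders):
    """users: [(user_id, age)], orders: [(order_id, user_id)] (assumed layouts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER)")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    (avg_age,) = conn.execute(QUERY).fetchone()
    conn.close()
    return avg_age

if __name__ == "__main__":
    users = [(1, 20), (2, 40), (3, 99)]            # user 3 never ordered
    orders = [(100, 1), (101, 1), (102, 1), (103, 2)]
    print(average_age_of_purchasers(users, orders))  # 30.0
```

With this data, joining `orders` straight to `users` would weight user 1 three times and give 25.0; deduplicating purchasers first gives 30.0, the average over the two people who actually bought.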

SUMMARY

Traditional Approach

Siloed: OLAP / data summaries

Limited: restricted Q&A

Difficult & cumbersome: ETL with heavy transformation

Teams involved: end user, BI team, ETL team, EDW team

Want to ask new questions?

Sources: event data, transactional data

Modern Approach

Accessible: anywhere for anyone (3rd-party app, API, any device)

Flexible: transformation at query

Consolidated: simple extract & load

Agile: data modeling layer (data team, end users)

Data Model

- name: first_purchasers
  type: single_value
  base_view: orders
  measures: [orders.first_purchase_count]
  listen:

- name: orders_by_day_and_category
  title: "Orders by Day and Category"
  type: looker_area
  base_view: order_items

Innovation

Sources: transactional data, event data

MPP | Redshift | Impala

Asana’s Data Infrastructure

Retrieving Data

—TODD LEHR SVP Engineering, Dollar Shave Club

“We have a developer named Juan and any reports we needed would flow through him. When he got backlogged, our team didn’t have access to the data immediately.”

—ANNIE CORBETT Business Intelligence Analyst, Venmo

“Initially whenever we were asked for data, we would write a custom script and then repeat this process whenever the product team wanted to extend the timeframe.”

Questions from a retail buyer at an e-commerce store:

What’s selling? What colors and sizes is it selling in?

What’s getting returned? Is there a particular size/color?

Is there a product people buy first that increases their likelihood of becoming a repeat customer?
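The last question can be answered with a short derived computation. A minimal sketch, assuming a hypothetical list of `(user_id, ordered_at, product)` rows and, for simplicity, one product per order: for each product that was someone's first purchase, it reports the share of those users who ordered again.

```python
from collections import defaultdict

def repeat_rate_by_first_product(orders):
    """orders: iterable of (user_id, ordered_at, product), one product per order (assumed).

    Returns {product: share of users whose first order was that product
    and who went on to place at least one more order}.
    """
    by_user = defaultdict(list)
    for user_id, ordered_at, product in orders:
        by_user[user_id].append((ordered_at, product))

    first_buyers = defaultdict(int)  # product -> users whose first order was it
    repeaters = defaultdict(int)     # product -> those users who ordered again
    for history in by_user.values():
        history.sort()               # earliest order first
        first_product = history[0][1]
        first_buyers[first_product] += 1
        if len(history) > 1:
            repeaters[first_product] += 1

    return {p: repeaters[p] / n for p, n in first_buyers.items()}

if __name__ == "__main__":
    orders = [
        (1, "2015-01-01", "socks"), (1, "2015-02-01", "hat"),
        (2, "2015-01-05", "socks"),
        (3, "2015-01-07", "hat"), (3, "2015-03-01", "hat"),
    ]
    print(repeat_rate_by_first_product(orders))
```

A difference in these rates is a lead, not a causal result; the buyer would still want to check group sizes and run an experiment before acting on it.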

Self-Service is Key

Get them the tool: people with questions are running the businesses.

Decisions vs. data science: “Should we open a new market in Maine?”

Game-changing insights don’t only come from the analyst group.

Analysis and Decision Making

1 Clearly define success metrics

2 Look for low-hanging fruit

3 Go one level deeper

Analysis and Decision Making: Success Metrics

Focus on desired outcome: What do you want users to experience?

Measure engagement: In most cases this is first-line business analytics.

Measure retention: Are people coming back?

How To Track Engagement?

Not with page views, and usually not even with time on page.

Upworthy’s attention minutes: lots of indicators (mouse, video, etc.).

Looker’s approximate usage: any event in a 2-minute window.

Deriving Approximate Usage

SELECT
    DATE(event.created_at) AS created_date,
    event.user_id AS user_id,
    COUNT(*) AS count,
    -- each distinct (user, browser, 2-minute bucket) with any event counts as 2 minutes
    COUNT(DISTINCT
        CONCAT(
            CONCAT(event.user_id, '|', event.user_browser_id),
            '|',
            FLOOR(UNIX_TIMESTAMP(event.created_at) / (60*2))
        )
    ) * 2 AS approximate_usage_in_minutes
FROM event
GROUP BY created_date, user_id

created_date   user_id   count   approximate_usage
1/10           1         123     100 minutes
1/10           2         228     50 minutes
1/10           3         45      80 minutes
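The same approximate-usage logic, sketched in Python for readers without a MySQL-style database at hand. It assumes events arrive as hypothetical `(user_id, browser_id, created_at)` tuples with timezone-aware datetimes, buckets each timestamp into 2-minute windows counted from the Unix epoch (as `FLOOR(UNIX_TIMESTAMP(...)/(60*2))` does), and credits 2 minutes per distinct (user, browser, bucket).

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 2 * 60  # the 2-minute window from the slide

def approximate_usage_minutes(events):
    """events: iterable of (user_id, browser_id, created_at) with timezone-aware datetimes.

    Every distinct (user, browser, 2-minute bucket) containing at least one
    event counts as 2 minutes. Returns {(utc_date, user_id): minutes}.
    """
    buckets = set()
    for user_id, browser_id, created_at in events:
        if created_at.tzinfo is None:
            raise ValueError("created_at must be timezone-aware")
        utc = created_at.astimezone(timezone.utc)
        bucket = int(utc.timestamp()) // BUCKET_SECONDS  # FLOOR(UNIX_TIMESTAMP(...)/(60*2))
        buckets.add((utc.date(), user_id, browser_id, bucket))

    usage = defaultdict(int)
    for day, user_id, _browser_id, _bucket in buckets:
        usage[(day, user_id)] += BUCKET_SECONDS // 60
    return dict(usage)

if __name__ == "__main__":
    t = datetime(2015, 1, 10, 10, 0, 5, tzinfo=timezone.utc)
    print(approximate_usage_minutes([(1, "a", t), (1, "a", t)]))  # one bucket -> 2 minutes
```

Counting buckets rather than raw events is what keeps a burst of 200 clicks from looking like 200 units of engagement: in the table above, user 2 has the most events but the least approximate usage.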

Derived Tables


Transactional + Event → Analytical → Derived Table → Insights

Start simple: use subselects until they get slow; SQL on cron works surprisingly well.

Most useful at row level: don’t roll up data, pre-compute facts.

Great for cohorts and sessionization: e.g., a tiered derived dimension vs. some other metric.
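Sessionization is a typical row-level derived table: one row per session, computed from raw events. A minimal sketch, assuming hypothetical `(user_id, created_at)` event tuples and a 30-minute inactivity cutoff (a common convention, not a figure from the talk).

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cutoff

def sessionize(events, gap=SESSION_GAP):
    """events: iterable of (user_id, created_at).

    Returns one row per session:
    (user_id, session_index, started_at, ended_at, event_count).
    """
    sessions = []
    current = {}  # user_id -> index in `sessions` of that user's latest session
    for user_id, created_at in sorted(events):  # grouped by user, then time-ordered
        idx = current.get(user_id)
        if idx is not None and created_at - sessions[idx][3] <= gap:
            uid, n, start, _end, count = sessions[idx]
            sessions[idx] = (uid, n, start, created_at, count + 1)  # extend open session
        else:
            n = sessions[idx][1] + 1 if idx is not None else 1
            sessions.append((user_id, n, created_at, created_at, 1))  # start a new one
            current[user_id] = len(sessions) - 1
    return sessions

if __name__ == "__main__":
    t0 = datetime(2015, 1, 10, 9, 0)
    for row in sessionize([(1, t0), (1, t0 + timedelta(minutes=10)), (1, t0 + timedelta(hours=2))]):
        print(row)
```

Because each row keeps `session_index`, cohort-style questions (for example, what happens in a user's first session versus later ones) become simple filters on the derived table.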

Derived Table - User Order Facts

SELECT
    orders.user_id AS user_id,
    COUNT(*) AS lifetime_orders,
    MIN(orders.created_at) AS first_order,
    MAX(orders.created_at) AS latest_order,
    COUNT(DISTINCT DATE_TRUNC('month', orders.created_at)) AS distinct_months_with_orders
FROM orders
GROUP BY orders.user_id

user_id   lifetime_orders   first_order   latest_order   distinct_months_with_orders
1         10                1/10/15       2/14/15        2
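"SQL on cron" can be as simple as a script that drops and rebuilds the derived table. A minimal sketch with `sqlite3`: since SQLite has no `DATE_TRUNC`, `strftime('%Y-%m', ...)` stands in for the month bucket, and the `orders(order_id, user_id, created_at)` layout and the `rebuild_user_order_facts` name are assumptions.

```python
import sqlite3

# BEGIN/COMMIT make the rebuild one transaction, so readers never see a half-built table.
REBUILD_SQL = """
BEGIN;
DROP TABLE IF EXISTS user_order_facts;
CREATE TABLE user_order_facts AS
SELECT
    orders.user_id                                       AS user_id,
    COUNT(*)                                             AS lifetime_orders,
    MIN(orders.created_at)                               AS first_order,
    MAX(orders.created_at)                               AS latest_order,
    COUNT(DISTINCT strftime('%Y-%m', orders.created_at)) AS distinct_months_with_orders
FROM orders
GROUP BY orders.user_id;
COMMIT;
"""

def rebuild_user_order_facts(conn: sqlite3.Connection) -> None:
    """Recompute the derived table from scratch; safe to run repeatedly."""
    conn.executescript(REBUILD_SQL)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)")
    conn.executemany("INSERT INTO orders (user_id, created_at) VALUES (?, ?)",
                     [(1, "2015-01-10"), (1, "2015-02-14")])
    rebuild_user_order_facts(conn)
    print(conn.execute("SELECT * FROM user_order_facts").fetchall())
```

A crontab entry pointing at a script like this (for example, hourly) is enough to start; moving to incremental updates can wait until the full rebuild gets slow.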

Derived Table + Sourcing

Churn: users who will likely never do X again

Usage: how likely they are to purchase if they do X

Time to transaction: how long until the first X

Retention: are users coming back?

???: invent a metric

Repeat buyers: what’s different about them?
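Two of these facts, time to transaction and churn, reduce to simple per-user computations. A minimal sketch, assuming hypothetical signup and order timestamps and a 90-day churn threshold; the threshold is an illustrative assumption, and each business would pick its own definition of "will likely never do X again".

```python
from datetime import datetime, timedelta

def user_facts(signups, orders, as_of, churn_after=timedelta(days=90)):
    """signups: {user_id: signed_up_at}; orders: iterable of (user_id, ordered_at).

    Returns {user_id: (time_to_first_order or None, churned)}, where a buyer
    counts as churned if their latest order is older than `churn_after`.
    """
    first, last = {}, {}
    for user_id, ordered_at in orders:
        first[user_id] = min(first.get(user_id, ordered_at), ordered_at)
        last[user_id] = max(last.get(user_id, ordered_at), ordered_at)

    facts = {}
    for user_id, signed_up_at in signups.items():
        if user_id in first:
            churned = as_of - last[user_id] > churn_after
            facts[user_id] = (first[user_id] - signed_up_at, churned)
        else:
            facts[user_id] = (None, False)  # never transacted: not "churned" under this definition
    return facts

if __name__ == "__main__":
    signups = {1: datetime(2015, 1, 1)}
    orders = [(1, datetime(2015, 1, 3))]
    print(user_facts(signups, orders, as_of=datetime(2015, 7, 1)))
```

Stored as columns on a user-level derived table, these become dimensions to slice other metrics by, such as comparing repeat buyers with everyone else.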

Pay/Charge Mistake

It was clear some users were accidentally paying instead of charging, but it wasn't clear how widespread the problem was and whether it was worth prioritizing a fix.

Inventing Metrics

Identify behavior: it can be good or bad, just something possibly significant.

Measure % of population: who is doing this thing?

Experiment: the ability to play with numbers is crucial.
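The "measure % of population" step is a predicate applied across users. A minimal sketch, assuming hypothetical per-user event lists; the predicate shown (a payment followed later by a refund request) is an invented proxy for a mistake like the pay/charge one, not the signal used in the example above.

```python
def share_of_population(user_events, exhibits_behavior):
    """user_events: {user_id: ordered list of event names}.

    Returns (sorted matching user_ids, share of all users who match).
    """
    matching = sorted(u for u, evs in user_events.items() if exhibits_behavior(evs))
    share = len(matching) / len(user_events) if user_events else 0.0
    return matching, share

def paid_then_asked_for_refund(events):
    """Hypothetical proxy behavior: a payment followed later by a refund request."""
    try:
        i = events.index("payment.sent")
    except ValueError:
        return False
    return "payment.refund_requested" in events[i + 1:]

if __name__ == "__main__":
    user_events = {1: ["payment.sent", "payment.refund_requested"], 2: ["payment.sent"]}
    print(share_of_population(user_events, paid_then_asked_for_refund))
```

Keeping the behavior as a swappable predicate is what makes it cheap to "play with numbers": tighten or loosen the definition and re-run before deciding whether a fix is worth prioritizing.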

Analysis and Decision Making: Low-hanging Fruit

This is the kind of very visual, very data-driven piece of analysis that helps us think, “Is opening the sale at noon the right decision?”

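The aggregation behind a chart like that can be very small. A minimal sketch, assuming hypothetical order timestamps as `datetime` objects in the store's local time: it counts orders per hour of day, including empty hours, so a spike (or lack of one) around a noon sale opening is easy to see.

```python
from collections import Counter
from datetime import datetime

def orders_by_hour(order_times):
    """Count orders per hour of day (0-23), including hours with zero orders."""
    counts = Counter(t.hour for t in order_times)
    return [counts.get(h, 0) for h in range(24)]

if __name__ == "__main__":
    times = [datetime(2015, 5, 1, 12, 5), datetime(2015, 5, 1, 12, 30), datetime(2015, 5, 2, 9, 15)]
    for hour, n in enumerate(orders_by_hour(times)):
        print(f"{hour:02d}:00 {'#' * n}")
```

Low-hanging fruit like this needs no modeling: the decision comes from looking at the shape of the distribution.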

Low-hanging Fruit

Out of stocks are huge detractors from the customer experience - it sucks ordering something and then not getting it - as well as revenue we failed to capture.

Analysis and Decision Making: One Level Deeper

While this immediate insight might have led us to focus on small groups, this didn’t match our expectations of people planning an outing on a Friday night, prompting us to look further.

Charts: Time To Book; Group Size (axis values 2, 3, 4)
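Going one level deeper often means splitting a headline number by a dimension such as group size. A minimal sketch, assuming hypothetical `(group_size, hours_to_book)` pairs where time to book is measured in hours (the unit and definition are assumptions): it reports the median per group size, which is less sensitive to a few very slow bookings than the mean.

```python
from collections import defaultdict
from statistics import median

def median_time_to_book_by_group_size(bookings):
    """bookings: iterable of (group_size, hours_to_book).

    Returns {group_size: median hours}, ordered by group size.
    """
    by_size = defaultdict(list)
    for size, hours in bookings:
        by_size[size].append(hours)
    return {size: median(values) for size, values in sorted(by_size.items())}

if __name__ == "__main__":
    bookings = [(2, 1.0), (2, 3.0), (4, 20.0), (4, 30.0), (4, 40.0)]
    print(median_time_to_book_by_group_size(bookings))
```

Pairing this with the count of bookings per group size helps tell whether a pattern reflects the typical customer or just a small segment.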

We analyze all the platform data available - when someone attempts to sign up, completes the signup, pushes an app, has spend, etc.
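Steps like "attempts to sign up", "completes the signup" and "has spend" form a funnel, and step-to-step conversion is one way to see what a rising total hides. A minimal sketch, assuming hypothetical step names and a per-user set of steps reached; a user counts at a step only if they also reached every earlier step.

```python
def funnel(user_steps, steps):
    """user_steps: {user_id: set of step names reached}; steps: ordered funnel steps.

    Returns [(step, users_reaching_it, conversion_from_previous_step)];
    conversion is None for the first step.
    """
    rows = []
    previous = None
    for i, step in enumerate(steps):
        required = steps[: i + 1]
        reached = sum(1 for done in user_steps.values() if all(s in done for s in required))
        if previous is None:
            rate = None
        else:
            rate = reached / previous if previous else 0.0
        rows.append((step, reached, rate))
        previous = reached
    return rows

if __name__ == "__main__":
    steps = ["signup_attempted", "signup_completed", "first_spend"]  # hypothetical names
    users = {1: set(steps), 2: {"signup_attempted", "signup_completed"}, 3: {"signup_attempted"}}
    for row in funnel(users, steps):
        print(row)
```

If total signups grow while conversion at a later step falls, the headline increase is not the same thing as success.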

Even though it looks like we were having nice incremental growth, looking into the details we see some things to look into further.

Don’t confuse an increase in a metric with success.

Takeaways

Put data in an analytical database: make sure it’s fast and speaks SQL.

Give business users the tool: empower them to answer their own questions.

Define success metrics: focus on engagement and retention.

Ben Porterfield Founder, VP Engineering

ben@looker.com