Data mining for causal inference: Effect of recommendations on Amazon.com

Transcript of Data mining for causal inference: Effect of recommendations on Amazon.com

Page 1: Data mining for causal inference: Effect of recommendations on Amazon.com

Data mining for causal inference

AMIT SHARMA
Postdoctoral Researcher, Microsoft Research

(Joint work with JAKE HOFMAN and DUNCAN WATTS, Microsoft Research)

http://www.amitsharma.in
@amt_shrma

Page 2: Data mining for causal inference: Effect of recommendations on Amazon.com

My research

Analyzing the effect of online systems
◦ Recommender systems [WWW ’13, EC ’15, CSCW ’15]
◦ Social news feeds [CSCW ’16]
◦ Web search

Methodological
◦ Threats to large-scale observational studies [WWW ’16b]
◦ Mining for natural experiments [EC ’15]
◦ New identification strategies suited for fine-grained data
◦ Testing assumptions for validity of an instrumental variable
◦ Gaps between prediction and understanding [WWW ’16a, ICWSM ’16]

Page 3: Data mining for causal inference: Effect of recommendations on Amazon.com

What is the effect of a recommender system?

Page 4: Data mining for causal inference: Effect of recommendations on Amazon.com

How much do they change user behavior?

Page 5: Data mining for causal inference: Effect of recommendations on Amazon.com

Naively, up to 30% of traffic comes from recommendations

Page 6: Data mining for causal inference: Effect of recommendations on Amazon.com

Naively, up to 30% of traffic comes from recommendations

“Burton Snowboard, a sports retailer, reported that personalized product recommendations have driven nearly 25% of total sales since it began offering them in 2008. Prior to this, Burton’s customer recommendations consisted of items from its list of top-selling products.”

Page 7: Data mining for causal inference: Effect of recommendations on Amazon.com

Almost surely an over-estimate of the actual effect, because of correlated demand between products.

Page 8: Data mining for causal inference: Effect of recommendations on Amazon.com

Example: product browsing on Amazon.com

Page 9: Data mining for causal inference: Effect of recommendations on Amazon.com

Example: product browsing on Amazon.com

Page 10: Data mining for causal inference: Effect of recommendations on Amazon.com

Example: product browsing on Amazon.com

Page 11: Data mining for causal inference: Effect of recommendations on Amazon.com

Counterfactual browsing: no recommendations

Page 12: Data mining for causal inference: Effect of recommendations on Amazon.com

Counterfactual browsing: no recommendations

Page 13: Data mining for causal inference: Effect of recommendations on Amazon.com

Problem: Correlated demand may drive page visits, even without recommendations

Page 14: Data mining for causal inference: Effect of recommendations on Amazon.com

The problem of correlated demand

[Diagram nodes: Demand for winter accessories; Visits to winter hat; Rec. visits to winter gloves]
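To make the correlated-demand problem concrete, here is a small simulation sketch. Everything in it is an illustrative assumption rather than something from the talk: the function name `simulate_hat_page`, the 20% share of hat visitors who already want gloves, and the 3% of the remaining visitors who click only because the recommendation is shown. The point is only that the observed click-through rate mixes both groups, so it overstates the causal one.

```python
import random

def simulate_hat_page(n_users=100_000, p_wants_gloves=0.2, p_persuaded=0.03, seed=0):
    """Simulate visitors to a winter hat page that recommends winter gloves.

    All parameters are hypothetical. A visitor clicks the recommendation if they
    already wanted gloves (correlated demand) or if the recommendation persuaded them.
    Returns (observed click-through rate, causal click-through rate).
    """
    rng = random.Random(seed)
    observed_clicks = 0
    causal_clicks = 0
    for _ in range(n_users):
        wants_gloves = rng.random() < p_wants_gloves  # would reach the gloves page anyway
        persuaded = rng.random() < p_persuaded        # clicks only because of the recommendation
        if wants_gloves or persuaded:
            observed_clicks += 1
            if not wants_gloves:
                causal_clicks += 1                    # click that would not happen without the recommender
    return observed_clicks / n_users, causal_clicks / n_users

naive_ctr, causal_ctr = simulate_hat_page()
print(f"observed CTR = {naive_ctr:.3f}, causal CTR = {causal_ctr:.3f}")
```

Under these assumed parameters the observed rate is several times the causal rate, even though every click is logged as "coming from" the recommendation.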

Page 15: Data mining for causal inference: Effect of recommendations on Amazon.com

Goal: Estimate the causal effect

[Figure: observed click-throughs consist of causal and convenience clicks; without the recommender, the convenience clicks remain and the rest is unknown (?)]

Page 16: Data mining for causal inference: Effect of recommendations on Amazon.com

Ideal experiment: A/B Test

Treatment (A) vs. Control (B)

But experiments:
◦ may be costly
◦ may hamper user experience
◦ require full access to the system

Page 17: Data mining for causal inference: Effect of recommendations on Amazon.com

Can we derive an observational strategy to identify the causal effect of recommendations?

Page 18: Data mining for causal inference: Effect of recommendations on Amazon.com

Using natural variations to simulate an experiment

Page 19: Data mining for causal inference: Effect of recommendations on Amazon.com

Studying sudden spikes, “shocks” to demand for a book

[Carmi et al. 2012]

Page 20: Data mining for causal inference: Effect of recommendations on Amazon.com

The same author’s recommended book may also have a shock

Page 21: Data mining for causal inference: Effect of recommendations on Amazon.com

Past work

Uses statistical models to control for confounds.

Carmi et al. [2012], Oestreicher-Singer and Sundararajan [2012], and Lin [2013] construct “complementary sets” of similar, non-recommended products.

Garfinkel et al. [2006] and Broder et al. [2015] compare to model-predicted clicks without recommendations.

But:
1. These assumptions are hard to verify.
2. Finding examples of valid shocks requires ingenuity and restricts researchers to very specific categories.

Page 22: Data mining for causal inference: Effect of recommendations on Amazon.com

This talk: Using data mining for natural experiments

I. Data-driven instrumental variables
“Shock-IV” method: Mining for sudden spikes (“shocks”) in data

II. General data-driven identification strategy for time series data
“Split-door” criterion: Generalizing the idea of shocks

Throughout, we will use Amazon’s recommendation system as an example.

Page 23: Data mining for causal inference: Effect of recommendations on Amazon.com

I. Shock-IV: Mining for valid natural experiments

Page 24: Data mining for causal inference: Effect of recommendations on Amazon.com

Distinguishing between recommendation and direct traffic

All visits to a product:
◦ Recommender visits
◦ Direct visits (search visits, direct browsing): proxy for unobserved demand

Page 25: Data mining for causal inference: Effect of recommendations on Amazon.com

The Shock-IV strategy: Searching for valid shocks

Page 26: Data mining for causal inference: Effect of recommendations on Amazon.com

The Shock-IV strategy: Filtering out invalid shocks

Page 27: Data mining for causal inference: Effect of recommendations on Amazon.com

Search for products that receive a sudden shock in their traffic but direct traffic for their recommendations remains constant.

Page 28: Data mining for causal inference: Effect of recommendations on Amazon.com

Why does it work? Shock as an instrumental variable

[Causal diagram with nodes: Demand; Sudden shock; Focal visits (X); Rec. visits (Y); Direct visits (Y)]

Page 29: Data mining for causal inference: Effect of recommendations on Amazon.com

Computing the causal estimate

Increase in visits to focal product (Δv)

Increase in recommendation clicks (Δr)

Causal CTR (ρ) = Δr/Δv

*Same as the Wald estimator for instrumental variables
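
The ratio above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `causal_ctr`, the before/during split, and the numbers are all hypothetical.

```python
def causal_ctr(visits_before, visits_during, rec_clicks_before, rec_clicks_during):
    """Wald-style ratio: extra recommendation clicks per extra focal-product visit."""
    delta_v = visits_during - visits_before          # Δv: increase in visits to the focal product
    delta_r = rec_clicks_during - rec_clicks_before  # Δr: increase in recommendation clicks
    if delta_v <= 0:
        raise ValueError("a shock requires an increase in focal-product visits")
    return delta_r / delta_v

# Hypothetical counts, for illustration only.
rho = causal_ctr(visits_before=100, visits_during=600,
                 rec_clicks_before=10, rec_clicks_during=25)
print(f"causal CTR = {rho:.3f}")  # 15 / 500 = 0.030
```

The estimate is only meaningful when the shock satisfies the instrumental-variable conditions described on the surrounding slides; the arithmetic itself does not check them.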

Page 30: Data mining for causal inference: Effect of recommendations on Amazon.com

Application to Amazon.com, using Bing toolbar logs

Anonymized browsing logs:

• 23 million pageviews

• 1.3 million Amazon products

• 2 million Bing Toolbar users

Sept 2013-May 2014

Page 31: Data mining for causal inference: Effect of recommendations on Amazon.com

Recreating sequence of page visits by a user

Search page Focal product page Recommended product page

Page 32: Data mining for causal inference: Effect of recommendations on Amazon.com

Recreating sequence of page visits by a user

Timestamp            URL
2014-01-20 09:04:10  http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders
2014-01-20 09:04:15  http://www.amazon.com/dp/0812984250/ref=sr_1_1
2014-01-20 09:05:01  http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2

Page 33: Data mining for causal inference: Effect of recommendations on Amazon.com

Recreating sequence of page visits by a user

Timestamp            URL
2014-01-20 09:04:10  http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders
                     User searches for George Saunders
2014-01-20 09:04:15  http://www.amazon.com/dp/0812984250/ref=sr_1_1
                     User clicks on the first search result
2014-01-20 09:05:01  http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2
                     User clicks on the second recommendation
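
As a rough sketch of how such a log could be labeled, the snippet below classifies each URL by its path and `ref=` tag. The prefixes it checks (`/s/` for a search page, `sr_` for a search-result click, `pd_sim` for a recommendation click) are inferred only from the three example URLs above, so treat them as assumptions; real logs contain many more tags. The function name `classify_pageview` is hypothetical.

```python
from urllib.parse import urlparse

def classify_pageview(url):
    """Label a logged URL using only the patterns visible in the example log (an assumption)."""
    path = urlparse(url).path
    if path.startswith("/s/"):
        return "search_page"
    if "/dp/" in path:
        # The text after "/ref=" encodes how the user arrived at the product page.
        ref = path.split("/ref=")[-1] if "/ref=" in path else ""
        if ref.startswith("sr_"):
            return "search_click"
        if ref.startswith("pd_sim"):
            return "recommendation_click"
        return "product_other"
    return "other"

log = [
    "http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders",
    "http://www.amazon.com/dp/0812984250/ref=sr_1_1",
    "http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2",
]
labels = [classify_pageview(u) for u in log]
print(labels)  # ['search_page', 'search_click', 'recommendation_click']
```

With labels like these, visits to a product can be split into recommender visits and direct visits, which is the distinction the Shock-IV method relies on.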

Page 34: Data mining for causal inference: Effect of recommendations on Amazon.com

I. Weekly and seasonal patterns in traffic, nearly tripling in holidays

Page 35: Data mining for causal inference: Effect of recommendations on Amazon.com

II. 30% of all pageviews come through recommendations

Page 36: Data mining for causal inference: Effect of recommendations on Amazon.com

III. Books and eBooks are the most popular categories by far

Page 37: Data mining for causal inference: Effect of recommendations on Amazon.com

IV. Apparel and shoes see a substantially higher fraction of visits through recommendations

Page 38: Data mining for causal inference: Effect of recommendations on Amazon.com

Shock-IV: Finding shocks in user visit data

We look for focal products with large and sudden increases in views relative to typical traffic.

Size of shock exceeds:
◦ 5 times median traffic
◦ 5 times the previous day's traffic and 5 times the mean of the last 7 days

Shocked product has:
◦ Visits from at least 10 unique users during the shock
◦ Non-zero visits for at least five out of seven days before and after the shock
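
The criteria above could be checked on a product's daily series roughly as follows. This is a sketch under stated assumptions: the shock is a single day `t`, "median traffic" is the median over the whole series passed in, and the inputs are plain lists of daily visit counts and daily unique-user counts. The function name `is_shock` and the sample data are hypothetical.

```python
from statistics import mean, median

def is_shock(daily_visits, daily_users, t, factor=5, min_users=10, min_active_days=5):
    """Return True if day t meets the shock criteria listed on the slide."""
    if t < 7 or t + 7 >= len(daily_visits):
        return False  # need a full week of data on both sides of the candidate day
    before = daily_visits[t - 7:t]
    after = daily_visits[t + 1:t + 8]
    peak = daily_visits[t]
    large = (peak > factor * median(daily_visits)     # exceeds 5x median traffic
             and peak > factor * daily_visits[t - 1]  # exceeds 5x the previous day
             and peak > factor * mean(before))        # exceeds 5x the mean of the last 7 days
    enough_users = daily_users[t] >= min_users        # at least 10 unique users during the shock
    active = (sum(v > 0 for v in before) >= min_active_days
              and sum(v > 0 for v in after) >= min_active_days)
    return large and enough_users and active

# Hypothetical 15-day series with a spike on day 7.
visits = [3, 4, 2, 5, 3, 4, 3, 40, 6, 4, 3, 5, 2, 4, 3]
users = [3, 4, 2, 5, 3, 4, 3, 25, 6, 4, 3, 5, 2, 4, 3]
print(is_shock(visits, users, t=7))  # True
```

Scanning every product and every day with a check like this is what turns the search for shocks into a data-mining step rather than a hand-picked case study.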

Page 39: Data mining for causal inference: Effect of recommendations on Amazon.com

Shock-IV: Ensuring exclusion restriction

Recommended product (Y) should have constant direct visits during the time of the shock.

(1-β): Ratio of the maximum 14-day variation in direct visits to a recommended product to the size of the shock for the focal product.

β = 1: Direct traffic to Y is stable relative to the shock to the focal product.

β = 0: Direct traffic to Y is no less varying than the shock to the focal product.
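
A minimal sketch of the β computation, assuming that "maximum 14-day variation" means the range (maximum minus minimum) of direct visits to the recommended product over a 14-day window and that the shock size is the increase in focal-product visits. Both readings are assumptions, as are the function name `exclusion_beta` and the numbers.

```python
def exclusion_beta(rec_direct_visits, shock_size):
    """β = 1 - (range of direct visits to Y over the window) / (size of the focal shock)."""
    if shock_size <= 0:
        raise ValueError("shock size must be positive")
    variation = max(rec_direct_visits) - min(rec_direct_visits)  # assumed meaning of "maximum variation"
    return 1 - variation / shock_size

# Hypothetical 14 days of direct visits to the recommended product.
direct = [10, 12, 9, 11, 8, 14, 10, 11, 9, 12, 10, 13, 11, 10]
beta = exclusion_beta(direct, shock_size=60)
print(f"beta = {beta:.2f}")  # 1 - 6/60 = 0.90
```

A pair would then be kept only if β is at or above a chosen threshold; the later slides report results at β = 0.7.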

Page 40: Data mining for causal inference: Effect of recommendations on Amazon.com

How to choose β

[Figure: two panels, each showing focal product visits and recommended product direct visits. Labels: Accept, Reject, Select]

Page 41: Data mining for causal inference: Effect of recommendations on Amazon.com

Using the method, we obtain more than 4,000 natural experiments!

These cover 20% of all products that had at least 10 visits on any single day.

Page 42: Data mining for causal inference: Effect of recommendations on Amazon.com

Estimating the causal click-through rate (ρ)

$\rho = \Delta r_{xyt^*} / \Delta v_{xt^*}$

At β = 0.7, causal CTR = 3%.

Page 43: Data mining for causal inference: Effect of recommendations on Amazon.com

Causal click-through rate by product category

Page 44: Data mining for causal inference: Effect of recommendations on Amazon.com

What fraction of the observed click-throughs are causal?

Page 45: Data mining for causal inference: Effect of recommendations on Amazon.com

Estimating fraction of observed click-throughs that are causal

Compare the number of estimated causal clicks to all observed recommendation clicks (non-shock period).

$\lambda = \rho_{xy} \cdot v_{xt} / r_{xyt}$
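
A short worked sketch of λ: multiply the causal click-through rate by the focal product's visits in a non-shock period to get the expected number of causal clicks, then divide by the recommendation clicks actually observed. The function name `causal_fraction` and the visit and click counts are hypothetical; only the 3% rate comes from the slides.

```python
def causal_fraction(rho, focal_visits, observed_rec_clicks):
    """λ = (ρ * v) / r: share of observed recommendation clicks attributable to the recommender."""
    if observed_rec_clicks <= 0:
        raise ValueError("need at least one observed recommendation click")
    expected_causal_clicks = rho * focal_visits
    return expected_causal_clicks / observed_rec_clicks

# Hypothetical non-shock period: 1000 focal visits, 120 observed recommendation clicks.
lam = causal_fraction(rho=0.03, focal_visits=1000, observed_rec_clicks=120)
print(f"lambda = {lam:.2f}")  # 30 / 120 = 0.25
```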

Page 46: Data mining for causal inference: Effect of recommendations on Amazon.com

Only a quarter of the observed click-throughs are causal

At β = 0.7, only 25% of recommendation traffic is caused by the recommender.

Page 47: Data mining for causal inference: Effect of recommendations on Amazon.com

Generalization?

Shocks may be due to discounts or sales

Lower CTR may be due to the holiday season

Page 48: Data mining for causal inference: Effect of recommendations on Amazon.com

Local average treatment effect (LATE), not fully generalizable

Shocked products are not a representative sample of all products, nor are the users who participate in them.

• Fortunately, the Shock-IV method covers roughly one-fifth of all products with at least 10 visits on any single day.

• Causal estimates are consistent with experimental findings (e.g., Belluf et al. [2012]).

Page 49: Data mining for causal inference: Effect of recommendations on Amazon.com

Summary: Shock-IV method

I. Mining for instruments allows us to study a much larger sample of natural experiments.

II. Fine-grained data allowed us to test for exclusion restriction directly.

A simple, scalable method for causal inference.
◦ Can be used for improving recommender systems through causal metrics.
◦ Can be applied to other domains, such as online ads.
◦ Can be used for finding potential instruments.

Page 50: Data mining for causal inference: Effect of recommendations on Amazon.com

II. Generalizing Shock-IV: “Split-door” criterion

Page 51: Data mining for causal inference: Effect of recommendations on Amazon.com

Shocks are traditionally used to identify causal effects, but they capture a very rare, specialized event.

Page 52: Data mining for causal inference: Effect of recommendations on Amazon.com

Let’s have a look at the model again

[Causal diagram with nodes: Demand; Sudden shock; Focal visits (X); Rec. visits (Y); Direct visits (Y)]

Page 53: Data mining for causal inference: Effect of recommendations on Amazon.com

All we require is that direct traffic to the recommended product is not affected by visits to the focal product (no correlated demand).

Page 54: Data mining for causal inference: Effect of recommendations on Amazon.com

[Figure: traffic for a focal product and its recommended product; both example pairs are labeled Accept]

Page 55: Data mining for causal inference: Effect of recommendations on Amazon.com

The split-door criterion

Instead of searching for shocks, check whether direct traffic for Y is independent of visits to X.

[Causal diagram with nodes: Demand; Focal visits (X); Rec. visits (Y); Direct visits (Y_D)]

Page 56: Data mining for causal inference: Effect of recommendations on Amazon.com

More formal: Why does it work?

Can show: Statistical independence of Y_D and X guarantees unconfoundedness between X and Y.

[Causal diagram with nodes: Demand; Focal visits (X); Rec. visits (Y); Direct visits (Y_D)]

Page 57: Data mining for causal inference: Effect of recommendations on Amazon.com

Two possibilities, both remove the effect of common demand

[Two causal diagrams, each with nodes: Demand; Focal visits (X); Rec. visits (Y); Dir. visits (Y_D)]

Page 58: Data mining for causal inference: Effect of recommendations on Amazon.com

Sidenote: Split-door criterion generalizes Shock-IV

By capturing shocks, we were essentially capturing the notion of independence between X and Y_D.

Split-door will admit all valid shocks, as well as other variations.

Page 59: Data mining for causal inference: Effect of recommendations on Amazon.com

Applying to logs from Amazon recommendations

1. Divide up data into t = 15-day periods.

2. Find product pairs (X and Y) such that X is independent of Y_D.

Y_D: Direct visits to recommended product

3. Compute $\rho = \Delta r_{xyt} / \Delta v_{xt}$
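
The filtering step can be sketched as below. The slides do not say which independence test is used, so this sketch substitutes a simple stand-in: it accepts a pair when the Pearson correlation between focal visits and direct visits to the recommended product is close to zero within a period. Near-zero correlation is necessary but not sufficient for independence, and the threshold `max_abs_corr`, the function names, and the data are all hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length series; 0.0 if either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series shows no linear association
    return cov / sqrt(var_a * var_b)

def passes_split_door(focal_visits, rec_direct_visits, max_abs_corr=0.1):
    """Stand-in check: keep the pair if X and Y_D look uncorrelated over the period."""
    return abs(pearson(focal_visits, rec_direct_visits)) <= max_abs_corr

# Hypothetical 15-day period for one product pair.
focal = [10, 12, 9, 30, 11, 10, 13, 9, 10, 12, 11, 10, 9, 12, 10]
flat_direct = [7] * 15                 # direct visits to Y unaffected by X
comoving_direct = [2 * v for v in focal]  # direct visits to Y move with X (correlated demand)
print(passes_split_door(focal, flat_direct))      # True
print(passes_split_door(focal, comoving_direct))  # False
```

Pairs that pass the check would then go on to the ρ computation in step 3; pairs that fail are discarded as likely confounded by common demand.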

Page 60: Data mining for causal inference: Effect of recommendations on Amazon.com

Using the split-door criterion, the estimated causal CTR is similar to the estimate from Shock-IV.

Page 61: Data mining for causal inference: Effect of recommendations on Amazon.com

Summary: A general identification criterion

Split-door criterion admits a broader sample of natural experiments than shocks.

Automatically tests for valid identification. Can be used whenever Y is separable.

Applications: Evaluate the relationship between any two time series, e.g., social media and news, ads and search.

Page 62: Data mining for causal inference: Effect of recommendations on Amazon.com

Conclusion

The majority of traffic from recommendations may not be causal, simply convenience.

Two data-driven methods:
• Shock-IV: An IV-based method for mining exclusion-valid instruments from observational data.
• Split-door: A general identification strategy for time series data.

Page 63: Data mining for causal inference: Effect of recommendations on Amazon.com

More generally, data mining can augment causal inference methods

Hypothesize about a natural variation
Argue why it resembles a randomized experiment
Compute causal effect

Develop tests for validity of natural variation
Mine for such valid variations in observational data
Compute causal effect

Page 64: Data mining for causal inference: Effect of recommendations on Amazon.com

Thank you!

AMIT SHARMA
MICROSOFT RESEARCH
@amt_shrma
http://www.amitsharma.in

Sharma, A., Hofman, J. M., & Watts, D. J. (2015). Estimating the causal impact of recommendation systems from observational data. In Proceedings of the Sixteenth ACM Conference on Economics and Computation.