Data mining for causal inference: Effect of recommendations on Amazon.com
Data mining for causal inference
AMIT SHARMA, Postdoctoral Researcher, Microsoft Research
(Joint work with JAKE HOFMAN and DUNCAN WATTS, Microsoft Research)
http://www.amitsharma.in
@amt_shrma
My research
Analyzing the effect of online systems
◦ Recommender systems [WWW ’13, EC ’15, CSCW ’15]
◦ Social news feeds [CSCW ’16]
◦ Web search
Methodological
◦ Threats to large-scale observational studies [WWW ’16b]
◦ Mining for natural experiments [EC ’15]
◦ New identification strategies suited for fine-grained data
◦ Testing assumptions for validity of an instrumental variable
◦ Gaps between prediction and understanding [WWW ’16a, ICWSM ’16]
What is the effect of a recommender system?
How much do they change user behavior?
Naively, up to 30% of traffic comes from recommendations
“Burton Snowboard, a sports retailer, reported that personalized product recommendations have driven nearly 25% of total sales since it began offering them in 2008. Prior to this, Burton’s customer recommendations consisted of items from its list of top-selling products.”
Almost surely an over-estimate of the actual effect, because of correlated demand between products.
Example: product browsing on Amazon.com
Counterfactual browsing: no recommendations
Problem: Correlated demand may drive page visits, even without recommendations
The problem of correlated demand
[Diagram nodes: Demand for winter accessories; Visits to winter hat; Rec. visits to winter gloves]
Goal: Estimate the causal effect
[Figure: two bars. Observed click-throughs: Causal, Convenience. Without recommender: Convenience, ?]
Ideal experiment: A/B Test
Treatment (A) vs. Control (B)
But, experiments:
◦ may be costly
◦ hamper user experience
◦ require full access to the system
Can we derive an observational strategy to identify the causal effect of recommendations?
Using natural variations to simulate an experiment
Studying sudden spikes, “shocks” to demand for a book
[Carmi et al. 2012]
The same author’s recommended book may also have a shock
Past work
Uses statistical models to control for confounds
◦ Carmi et al. [2012], Oestreicher-Singer and Sundararajan [2012] and Lin [2013] construct “complementary sets” of similar, non-recommended products.
◦ Garfinkel et al. [2006] and Broder et al. [2015] compare to model-predicted clicks without recommendations.
But,
1. These assumptions are hard to verify.
2. Finding examples of valid shocks requires ingenuity and restricts researchers to very specific categories.
This talk: Using data mining for natural experiments
I. Data-driven instrumental variables
“Shock-IV” method: Mining for sudden spikes (“shocks”) in data
II. General data-driven identification strategy for time series data
“Split-door” criterion: Generalizing the idea of shocks
Throughout, we will use Amazon’s recommendation system as an example.
I. Shock-IV: Mining for valid natural experiments
Distinguishing between recommendation and direct traffic
All visits to a product
◦ Recommender visits
◦ Direct visits (proxy for unobserved demand)
  ◦ Search visits
  ◦ Direct browsing
The Shock-IV strategy: Searching for valid shocks
The Shock-IV strategy: Filtering out invalid shocks
Search for products that receive a sudden shock in their traffic but direct traffic for their recommendations remains constant.
Why does it work? Shock as an instrumental variable
[Causal diagram nodes: Sudden shock; Demand; Focal visits (X); Rec. visits (Y); Direct visits to Y]
Computing the causal estimate
Increase in visits to focal product (Δv)
Increase in recommendation clicks (Δr)
Causal CTR (ρ) = Δr/Δv
*Same as the Wald estimator for instrumental variables
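As a rough illustration of the ratio above, the sketch below computes ρ = Δr/Δv from counts before and during a shock. The function name, the before/during framing, and all numbers are hypothetical and chosen only to show the arithmetic; they are not taken from the study.

```python
def causal_ctr(focal_before, focal_during, rec_before, rec_during):
    """Wald-style ratio: change in recommendation clicks over change in focal visits."""
    delta_v = focal_during - focal_before  # Δv: increase in visits to the focal product
    delta_r = rec_during - rec_before      # Δr: increase in recommendation click-throughs
    if delta_v <= 0:
        raise ValueError("no positive shock in focal visits; the ratio is undefined")
    return delta_r / delta_v


# Hypothetical counts: focal visits jump from 100 to 1100,
# recommendation clicks from 10 to 40.
rho = causal_ctr(focal_before=100, focal_during=1100, rec_before=10, rec_during=40)
print(f"causal CTR = {rho:.3f}")
```

The ratio is only meaningful when the shock moves focal visits and nothing else that drives clicks on the recommended product, which is what the exclusion restriction is meant to guarantee.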
Application to Amazon.com, using Bing toolbar logs
Anonymized browsing logs:
• 23 million pageviews
• 1.3 million Amazon products
• 2 million Bing Toolbar users
Sept 2013-May 2014
Recreating sequence of page visits by a user
Search page → Focal product page → Recommended product page
Timestamp            URL
2014-01-20 09:04:10  http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders
2014-01-20 09:04:15  http://www.amazon.com/dp/0812984250/ref=sr_1_1
2014-01-20 09:05:01  http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2
User searches for George Saunders
User clicks on the first search result
User clicks on the second recommendation
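The three log rows suggest how a visit could be labeled from its URL alone. The sketch below is one possible way to do that in Python; the function name and the URL conventions it relies on (`/s/` for a search page, `/dp/<id>` for a product page, `ref=sr_` for arrival from search results, `ref=pd_sim` for arrival from a recommendation) are assumptions read off these three examples, not a documented Amazon scheme.

```python
import re
from urllib.parse import urlparse


def classify_visit(url):
    """Return (page_type, product_id, source) for a logged URL.

    Assumed conventions, inferred from the three example rows only:
    '/s/' is a search page, '/dp/<id>' is a product page,
    'ref=sr_' marks arrival from search, 'ref=pd_sim' from a recommendation.
    """
    path = urlparse(url).path
    if path.startswith("/s/"):
        return ("search", None, None)
    product = re.search(r"/dp/([A-Z0-9]{10})", path)
    if product is None:
        return ("other", None, None)
    ref = re.search(r"/ref=([^/?]+)", path)
    tag = ref.group(1) if ref else ""
    if tag.startswith("sr_"):
        source = "search"          # counted as a direct visit in the talk's taxonomy
    elif tag.startswith("pd_sim"):
        source = "recommendation"  # recommender visit
    else:
        source = "direct"          # direct browsing
    return ("product", product.group(1), source)


log = [
    "http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders",
    "http://www.amazon.com/dp/0812984250/ref=sr_1_1",
    "http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2",
]
labels = [classify_visit(u) for u in log]
```

In the talk's split of traffic, both search visits and direct browsing count as direct visits; only the `recommendation` label would count as recommender traffic.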
I. Weekly and seasonal patterns in traffic, nearly tripling in holidays
II. 30% of all pageviews come through recommendations
III. Books and eBooks are the most popular categories by far
IV. Apparel and shoes see a substantially higher fraction of visits through recommendations
Shock-IV: Finding shocks in user visit data
We look for focal products with large and sudden increases in views relative to typical traffic.
Size of shock exceeds:
◦ 5 times the median traffic
◦ 5 times the previous day's traffic and 5 times the mean of the last 7 days
Shocked product has:
◦ Visits from at least 10 unique users during the shock
◦ Non-zero visits for at least five out of seven days before and after the shock
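The criteria above can be sketched as a filter over a product's daily series. This is a minimal illustration, not the authors' code: the function name and parameters are hypothetical, the median is taken over whatever series is passed in, and "before and after" is read as requiring five active days on each side separately.

```python
from statistics import mean, median


def is_shock(visits, users, day, factor=5, min_users=10, min_active_days=5):
    """Check the shock criteria for `day` in a daily series.

    visits[i]: page visits to the focal product on day i
    users[i]:  unique users visiting the focal product on day i
    Assumption: "median traffic" is the median of the whole series passed in.
    """
    if day < 7 or day + 7 >= len(visits):
        return False  # need a full week on both sides of the candidate day
    before = visits[day - 7:day]
    after = visits[day + 1:day + 8]
    peak = visits[day]
    large = (
        peak > factor * median(visits)
        and peak > factor * visits[day - 1]
        and peak > factor * mean(before)
    )
    enough_users = users[day] >= min_users
    active = (
        sum(v > 0 for v in before) >= min_active_days
        and sum(v > 0 for v in after) >= min_active_days
    )
    return large and enough_users and active


# Hypothetical 15-day series with a spike on day 7.
visits = [4, 5, 3, 4, 6, 5, 4, 60, 6, 5, 4, 5, 3, 4, 5]
users = [3, 4, 2, 3, 5, 4, 3, 40, 5, 4, 3, 4, 2, 3, 4]
shock_days = [d for d in range(len(visits)) if is_shock(visits, users, d)]
```

Requiring activity before and after the spike screens out products that only appear in the logs because of the spike itself.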
Shock-IV: Ensuring exclusion restriction
Recommended product (Y) should have constant direct visits during the time of the shock.
(1-β): Ratio of the maximum 14-day variation in direct visits to a recommended product to the size of the shock for the focal product.
β = 1: Direct traffic to Y is stable relative to the shock to the focal product.
β = 0: Direct traffic to Y varies at least as much as the shock to the focal product.
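One way to turn the definition of (1-β) into a number is sketched below. The function name is hypothetical, and two readings are assumed: "maximum variation" is taken as the maximum minus the minimum of the 14 daily direct-visit counts, and β is clipped at 0 when the direct traffic varies more than the shock.

```python
def exclusion_beta(rec_direct_visits, shock_size):
    """beta = 1 - (variation in Y's direct visits) / (size of the shock to X).

    Assumptions: `rec_direct_visits` holds the 14 daily direct-visit counts for
    the recommended product around the shock, "maximum variation" is read as
    max minus min, and the result is clipped at 0.
    """
    if shock_size <= 0:
        raise ValueError("shock_size must be positive")
    variation = max(rec_direct_visits) - min(rec_direct_visits)
    return max(0.0, 1.0 - variation / shock_size)


# Hypothetical window: direct visits to Y hover between 9 and 13
# while the focal product's shock adds 40 visits.
direct = [10, 12, 9, 11, 10, 13, 10, 12, 11, 9, 10, 12, 11, 10]
beta = exclusion_beta(direct, shock_size=40)
accept = beta >= 0.7  # threshold value used later in the talk
```

A higher threshold keeps fewer shocks but makes the exclusion restriction more plausible for the ones that remain.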
How to choose β?
[Figure: two panels, each plotting focal product visits and rec. product direct visits; one labeled Accept, the other Reject]
Using this method, we obtain more than 4,000 natural experiments!
Roughly 20% of all products that had at least 10 visits on any single day.
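Estimating the causal click-through rate (ρ)
ρ = Δr_{xy,t*} / Δv_{x,t*}, where x is the focal product, y the recommended product, and t* the time of the shock
At β = 0.7, causal CTR = 3%.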
Causal click-through rate by product category
What fraction of the observed click-throughs are causal?
Estimating fraction of observed click-throughs that are causal
Compare the number of estimated causal clicks to all observed recommendation clicks (non-shock period).
λ = ρ_{xy} · v_{x,t} / r_{xy,t}
Only a quarter of the observed click-throughs are causal
At β = 0.7, only 25% of recommendation traffic is caused by the recommender.
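To make the λ formula concrete, here is a small worked example in Python. The function name and the counts are hypothetical; they were picked so the arithmetic is easy to follow and are not data from the study.

```python
def causal_fraction(rho, focal_visits, observed_rec_clicks):
    """lambda = rho * v / r: estimated causal clicks over observed recommendation clicks."""
    if observed_rec_clicks <= 0:
        raise ValueError("observed_rec_clicks must be positive")
    return rho * focal_visits / observed_rec_clicks


# Hypothetical non-shock period: 1000 focal visits and 120 observed
# recommendation click-throughs, with a causal CTR of 3%.
lam = causal_fraction(rho=0.03, focal_visits=1000, observed_rec_clicks=120)
print(f"fraction of click-throughs that are causal = {lam:.2f}")
```

Here ρ · v gives the clicks the recommender would be expected to cause in the period, and dividing by the observed clicks r gives the share of observed traffic that is causal rather than convenience.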
Generalization?
Shocks may be due to discounts or sales
Lower CTR may be due to the holiday season
Local average treatment effect (LATE), not fully generalizable
Shocked products are not a representative sample of all products, nor are the users who participate in them.
• Fortunately, the Shock-IV method covers roughly one-fifth of all products with at least 10 visits on any single day.
• Causal estimates are consistent with experimental findings (e.g., Belluf et al. [2012])
Summary: Shock-IV method
I. Mining for instruments allows us to study a much larger sample of natural experiments.
II. Fine-grained data allowed us to test the exclusion restriction directly.
A simple, scalable method for causal inference.
◦ Can be used for improving recommender systems through causal metrics.
◦ Can be applied to other domains, such as online ads.
◦ Can be used for finding potential instruments.
II. Generalizing Shock-IV: “Split-door” criterion
Shocks are traditionally used to identify causal effects, but they capture only rare, specialized events.
Let’s have a look at the model again
[Causal diagram nodes: Sudden shock; Demand; Focal visits (X); Rec. visits (Y); Direct visits to Y]
All we require is that direct traffic to the recommended product is not affected by visits to the focal product (no correlated demand).
[Figure: example traffic for a focal product and its recommended product; both examples shown are labeled Accept]
The split-door criterion
Instead of searching for shocks, check whether direct traffic for Y is independent of visits to X.
[Causal diagram nodes: Demand; Focal visits (X); Rec. visits (Y); Direct visits (Y_D)]
More formally: why does it work?
Can show: Statistical independence of Y_D and X guarantees unconfoundedness between X and Y.
[Causal diagram nodes: Demand; Focal visits (X); Rec. visits (Y); Direct visits (Y_D)]
Two possibilities, both remove the effect of common demand
[Two causal diagrams over the same nodes: Demand; Focal visits (X); Rec. visits (Y); Dir. visits (Y_D)]
Sidenote: Split-door criterion generalizes Shock-IV
By capturing shocks, we were essentially capturing the notion of independence between X and Y_D.
Split-door will admit all valid shocks, as well as other variations.
Applying to logs from Amazon recommendations
1. Divide the data into periods of t = 15 days.
2. Find product pairs (X and Y) such that X is independent of Y_D.
   Y_D: Direct visits to the recommended product
3. Compute ρ = Δr_{xy,t} / Δv_{x,t}
Using the split-door criterion, the causal CTR is similar to the estimate from Shock-IV (3% at β = 0.7).
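The procedure for applying split-door to the logs can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: independence of X and Y_D is approximated by a small absolute Pearson correlation (a swapped-in, much weaker check than a proper independence test), ρ is computed as total recommendation clicks over total focal visits within a period (a simplification of the Δr/Δv ratio), and all names, thresholds, and data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def split_door_periods(focal, rec_clicks, rec_direct, period=15, max_abs_corr=0.1):
    """Return [(start_day, rho)] for periods that pass the independence check.

    focal:      daily visits to the focal product (X)
    rec_clicks: daily click-throughs from X to the recommended product
    rec_direct: daily direct visits to the recommended product (Y_D)
    """
    accepted = []
    for start in range(0, len(focal) - period + 1, period):
        x = focal[start:start + period]
        y_d = rec_direct[start:start + period]
        clicks = rec_clicks[start:start + period]
        if sum(x) > 0 and abs(pearson(x, y_d)) <= max_abs_corr:
            accepted.append((start, sum(clicks) / sum(x)))
    return accepted


# Hypothetical pair over two 15-day periods: in the first, direct visits to Y
# are flat while X varies; in the second, they rise together with X.
focal = [10, 20] * 7 + [10] + list(range(10, 25))
rec_direct = [5] * 15 + list(range(5, 20))
rec_clicks = [1, 2] * 7 + [1] + [2] * 15
periods = split_door_periods(focal, rec_clicks, rec_direct)
```

Only the first period is kept in this toy example: in the second, direct traffic to Y moves with X, which is the signature of correlated demand that the criterion is meant to reject.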
Summary: A general identification criterion
Split-door criterion admits a broader sample of natural experiments than shocks.
Automatically tests for valid identification.
Can be used whenever Y is separable into direct visits (Y_D) and recommendation visits.
Applications: Evaluate the relationship between any two time series, e.g., social media and news, ads and search.
Conclusion
The majority of traffic from recommendations may not be causal, but simply convenience.
Two data-driven methods:
• Shock-IV: An IV-based method for mining exclusion-valid instruments from observational data
• Split-door: A general identification strategy for time series data.
More generally, data mining can augment causal inference methods
Hypothesize about a natural variation → Argue why it resembles a randomized experiment → Compute causal effect
Develop tests for validity of natural variation → Mine for such valid variations in observational data → Compute causal effect
Thank you!
AMIT SHARMA, MICROSOFT RESEARCH
@amt_shrma
http://www.amitsharma.in
Sharma, A., Hofman, J. M., & Watts, D. J. (2015). Estimating the causal impact of recommendation systems from observational data. In Proceedings of the Sixteenth ACM Conference on Economics and Computation.