Causal Inference, Reinforcement Learning, and Continuous Optimization
info@scientificrevenue.com Pricing Without Compromise
Nucl.ai, 2016
The Motivational Quote
This book does not take a decision theoretic perspective ... because the problem faced by most economists or intending economists does not seem sensibly described as one of decision. It seems more like that of sensibly and concisely reporting their findings .... this leaves it up to others to use your report as a basis for decision making.
Stated Another Way
….
The First Escape: “A/B” Testing
It’s a compromise
You get engineering to insert some bifurcated code (the test) into the system
Usually define all the variations in advance, and then wait for a new version of the game to be released
After that, someone looks at the test every hour until statistical significance is achieved
You have a “winner” and go with it
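One reason this is a compromise: checking the test every hour and stopping at the first significant result inflates the false-positive rate well above the nominal 5%. The sketch below is an A/A simulation under stated assumptions: a pooled two-proportion z-test, a hypothetical 10% conversion rate shared by both arms (so every "winner" is a false positive), and the same 2,000 users per arm seen either in one look or in 20 hourly looks of 100. The function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments, looks, users_per_look, rate=0.1, alpha=0.05, seed=0):
    """Share of A/A experiments declared significant at some look (hypothetical setup)."""
    rng = random.Random(seed)
    declared = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            if z_test_p_value(conv_a, n, conv_b, n) < alpha:  # peek, and stop at "significance"
                declared += 1
                break
    return declared / n_experiments


single_look = false_positive_rate(400, looks=1, users_per_look=2000)
hourly_peeks = false_positive_rate(400, looks=20, users_per_look=100)
print(f"one look: {single_look:.3f}, 20 looks: {hourly_peeks:.3f}")
```

With a single look the rate stays near alpha; with repeated looks it should come out several times higher, even though the total sample is identical.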
Multivariate Testing
A/B quickly becomes multivariate – 4 or 5 arms is common
This is the most common evaluation methodology today
Problems:
• For most things worth testing, more arms elongates the testing cycle
• Therefore, success requires a long-term test
• Therefore, ability to iterate is limited
During the Test:
• You’ve got potentially bad variations live!
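A rough sample-size calculation shows why more arms lengthen the cycle. This sketch assumes a standard two-proportion power calculation, a hypothetical 5% baseline conversion rate, a 1-point absolute lift to detect, and a Bonferroni correction for comparing each variation against control. If daily traffic is fixed, test duration grows with the total number of users required.

```python
import math
from statistics import NormalDist


def users_per_arm(p_base, lift, alpha=0.05, power=0.8, comparisons=1):
    """Approximate users per arm to detect `lift` over `p_base` with a two-sided test."""
    adjusted_alpha = alpha / comparisons            # Bonferroni correction
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_var = p_base + lift
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return math.ceil((z_alpha + z_power) ** 2 * variance / lift ** 2)


totals = {}
for arms in (2, 3, 4, 5):
    per_arm = users_per_arm(0.05, 0.01, comparisons=arms - 1)  # each variation vs. control
    totals[arms] = per_arm * arms
    print(f"{arms} arms: {per_arm} users per arm, {totals[arms]} in total")
```

Each extra arm both adds another group to fill and tightens the per-comparison significance threshold, so the total grows faster than the number of arms.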
Runge et al. on Churn
Churn detection algorithm worked very well
No churn prevention policy worked well against the general population
A/B Test
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6932875
Levitt et al. on Pricing
Had 4 arms and a control.
Ran for 3 months.
Results were inconclusive.
http://www.pnas.org/content/113/27/7323.full
Next Step: Multi-Arm Bandits
Core idea: vary traffic to arms of test based on performance criteria
• At any given moment in time, either “explore” (focus on learning about performance) or “exploit” (use currently optimal arm)
Very popular in the advertising realm. Huge and interesting literature
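A minimal epsilon-greedy sketch of the explore/exploit idea, assuming hypothetical per-arm conversion rates and a fixed exploration rate `epsilon`. Epsilon-greedy is only one of many bandit policies in that literature; it is used here because it makes the explore/exploit switch explicit.

```python
import random


def epsilon_greedy(true_rates, n_users, epsilon=0.1, seed=0):
    """Return how many users each arm received under epsilon-greedy allocation."""
    rng = random.Random(seed)
    k = len(true_rates)
    pulls = [0] * k
    wins = [0] * k
    for _ in range(n_users):
        if 0 in pulls:
            arm = pulls.index(0)                 # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(k)               # explore: learn about performance
        else:                                    # exploit: use the currently best arm
            arm = max(range(k), key=lambda a: wins[a] / pulls[a])
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:       # simulated conversion
            wins[arm] += 1
    return pulls


pulls = epsilon_greedy([0.02, 0.10, 0.04], n_users=20_000)
print(pulls)
```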
Multi-Arm Bandit Pros
Traffic to bad variations quickly decreases (assuming a robust performance metric)
Generally, helps you get to a “winner” faster
It’s got an O’Reilly book, so you don’t have to explain it to engineering
One major use case: use MAB to eliminate “bad” arms, then multivariate test the rest
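The first point (traffic to bad variations quickly decreases) can be seen with Thompson sampling, swapped in here as one common bandit policy; the deck does not name a specific one. This sketch assumes Bernoulli rewards, Beta(1, 1) priors, and hypothetical true conversion rates, and reports each arm's share of traffic in every batch of 1,000 users.

```python
import random


def thompson_traffic_shares(true_rates, n_users, batch=1000, seed=0):
    """Per-batch traffic share of each arm under Beta-Bernoulli Thompson sampling."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins, losses, counts = [0] * k, [0] * k, [0] * k
    shares = []
    for i in range(1, n_users + 1):
        # Draw a plausible conversion rate for each arm from its Beta posterior
        draws = [rng.betavariate(1 + wins[a], 1 + losses[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        counts[arm] += 1
        if i % batch == 0:
            shares.append([c / batch for c in counts])
            counts = [0] * k
    return shares


shares = thompson_traffic_shares([0.02, 0.10, 0.04], n_users=10_000)
for batch_shares in shares:
    print([round(s, 2) for s in batch_shares])
```

Arms that look bad are still sampled occasionally (their posteriors stay wide), which is what keeps the method from locking onto an early, lucky arm.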
Multi-Arm Bandit “Cons”
You’re not sending traffic to all the arms at the same rate. Statistical significance is very hard to achieve
Changing traffic volumes introduces bias in experimental populations (during analysis, you could conceivably reweight using propensity scores)
Markov assumptions underlying standard reinforcement learning theory are not fully valid
Defining the objective function can be difficult
Bias?
Changing traffic volumes introduces bias in experimental populations (during analysis, you could conceivably reweight using propensity scores)?
Suppose you send 20% of the users to each of 5 arms. Then suppose you send 40% of new users to the first arm, and 15% to each of the remaining arms
The population going to the first arm has a lower percentage of experienced users, and a higher percentage of people from certain locales (depending on the time you alter the percentages)
Is it a big deal? Not as long as you’re aware of it and use propensity scores appropriately.
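A sketch of the example above and of propensity-score reweighting. It assumes hypothetical numbers: the share of experienced users drifts from 80% to 20% between the two allocation periods, experienced users convert at 30% and new users at 10%, and no arm has any real effect. Because the allocation probabilities are set by the experimenter, they are known exactly and serve as the propensity scores; the reweighting used is the Horvitz-Thompson (inverse propensity weighting) estimator.

```python
import random


def simulate_allocation_change(n_per_period=100_000, seed=0):
    rng = random.Random(seed)
    arms = range(5)
    # Period 1: 20% to each of 5 arms; period 2: 40% to arm 0, 15% to each of the others
    allocations = [[0.2] * 5, [0.4] + [0.15] * 4]
    # Hypothetical drift in who shows up: mostly experienced users early, mostly new later
    experienced_share = [0.8, 0.2]
    conversion = {True: 0.3, False: 0.1}  # depends on user type only; arms have no effect
    naive_sum, naive_n, ipw_sum = [0.0] * 5, [0] * 5, [0.0] * 5
    total = 0
    for period, probs in enumerate(allocations):
        for _ in range(n_per_period):
            experienced = rng.random() < experienced_share[period]
            arm = rng.choices(arms, weights=probs)[0]
            y = 1.0 if rng.random() < conversion[experienced] else 0.0
            naive_sum[arm] += y
            naive_n[arm] += 1
            ipw_sum[arm] += y / probs[arm]   # weight by 1 / propensity at assignment time
            total += 1
    naive = [s / n for s, n in zip(naive_sum, naive_n)]
    ipw = [s / total for s in ipw_sum]      # Horvitz-Thompson mean outcome per arm
    return naive, ipw


naive, ipw = simulate_allocation_change()
print("naive:", [round(v, 3) for v in naive])
print("IPW:  ", [round(v, 3) for v in ipw])
```

Under these assumptions the population conversion rate is 0.2 for every arm. The IPW estimates should all land near it, while the naive mean for the first arm is pulled down (toward roughly 0.18) because that arm over-samples the later, mostly new users.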
But … Why Assume There's a Single Best Outcome?
You've parametrized multiple behaviors
You're recording lots of user features
You're already changing system behavior at runtime
You’re running randomized trials already
If you're really a bandit maven, you've got a reserved population already in place for ongoing exploration
Articulating the New Goal
Instead of thinking about “winners” and “losers”
Instead of thinking about “better” and “worse”
Think of a test arm as a population-selecting function
Given an arm of a test, the population it selects is the population it is optimal for, under some objective function
Key Idea: Continuous Optimization Using a Control Framework
If you have an objective function
And you have a control state
And you have multiple treatments
Then you should map the user to the treatment that maximizes the objective function
• In realtime
• On a per-user basis
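A minimal sketch of the per-user mapping, assuming the objective is predicted revenue and the treatments are price points. `predicted_revenue` is a hypothetical toy model (purchase probability falls linearly to zero at a user's willingness to pay), standing in for whatever predictive model a real system would use.

```python
from typing import Callable, Mapping, Sequence


def choose_treatment(user: Mapping[str, float], treatments: Sequence[float],
                     predict: Callable[[Mapping[str, float], float], float]) -> float:
    """Map one user to the treatment with the highest predicted objective value."""
    return max(treatments, key=lambda t: predict(user, t))


def predicted_revenue(user: Mapping[str, float], price: float) -> float:
    """Hypothetical toy model: expected revenue = price * purchase probability."""
    p_buy = max(0.0, 1.0 - price / user["willingness"])
    return price * p_buy


prices = [0.99, 4.99, 9.99]
for user in ({"willingness": 2.0}, {"willingness": 10.0}, {"willingness": 20.0}):
    print(user, "->", choose_treatment(user, prices, predicted_revenue))
```

The runtime step is only a handful of cheap predictions per user, which is what makes realtime, per-user assignment feasible; the expensive part is building the predictor offline.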
The Analytical Two-Step
Run randomized trials
• Users randomly assigned to treatments
• Banditing has much more explore (and much less exploit) than is usual
• Exploration is guided by models
After the trial, run a causal model builder
• Put your eyeballs at the end of the experiment and see if you can figure out how you should have assigned the users (to optimize the objective function)
• This is inherently a counterfactual exercise, and requires causal inference
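A minimal sketch of the two steps, assuming users fall into hand-picked segments (a hypothetical `tenure` feature) and that the "model builder" is simply a per-segment comparison of mean outcomes. Treatment names and effect sizes are invented for illustration; a real model builder would learn the segments with a causal ML method such as causal trees.

```python
import random
from collections import defaultdict
from statistics import fmean

TREATMENTS = ["control", "discount", "bundle"]  # hypothetical treatment names


def outcome(user, treatment, rng):
    """Hypothetical ground truth: 'discount' helps new users, 'bundle' helps veterans."""
    lift = {("new", "discount"): 1.0, ("veteran", "bundle"): 1.0}.get((user["tenure"], treatment), 0.0)
    return 2.0 + lift + rng.gauss(0, 1)


def run_randomized_trial(users, rng):
    """Step 1: every user gets a treatment chosen uniformly at random."""
    records = []
    for user in users:
        t = rng.choice(TREATMENTS)
        records.append((user, t, outcome(user, t, rng)))
    return records


def learn_assignment(records, segment_of):
    """Step 2: after the trial, pick the treatment with the best mean outcome per segment."""
    cells = defaultdict(list)
    for user, t, y in records:
        cells[(segment_of(user), t)].append(y)
    best = {}
    for (segment, t), ys in cells.items():
        if segment not in best or fmean(ys) > best[segment][1]:
            best[segment] = (t, fmean(ys))
    return {segment: t for segment, (t, _) in best.items()}


rng = random.Random(0)
users = [{"tenure": rng.choice(["new", "veteran"])} for _ in range(6000)]
assignment = learn_assignment(run_randomized_trial(users, rng), lambda u: u["tenure"])
print(assignment)
```

The randomization in step 1 is what makes the per-segment means comparable: within a segment, each treatment group is a random sample of the same users.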
Causal Inference Notation
Long history of “counterfactual” or “causal” reasoning – goes back almost 100 years.
(Binary) Notation:
• $D_i$ -- whether user i received a treatment
• $Y_{*i}$ -- the outcome for user i under treatment *
$$Y_i = \begin{cases} Y_{1i} & \text{if } D_i = 1 \\ Y_{0i} & \text{otherwise} \end{cases}$$
$$\tau_i = Y_{1i} - Y_{0i}$$
The Hard Part To Wrap Your Head Around
Note that $Y_{*i}$ and $\tau_i$ are unmeasurable in general: each user either gets the treatment or doesn’t, so only one of $Y_{1i}$ and $Y_{0i}$ is ever observed, and $\tau_i$ never is. This is the hard part to wrap your head around.
Hence the term “counterfactual”
Causal Inference Notation II
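$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$ -- observed difference in outcome
$E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$ -- same thing
$$= \underbrace{E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1]}_{\text{treatment effect on the treated}} + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}$$
(The counterfactual term $E[Y_{0i} \mid D_i = 1]$ is added and subtracted for algebraic convenience.)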
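The decomposition can be checked numerically, but only in a simulation where both potential outcomes are known. This sketch assumes hypothetical users with baseline outcome $Y_{0i}$ drawn from a normal distribution and a treatment that adds exactly 1 for everyone. It compares self-selection (high-baseline users opt in) with coin-flip assignment.

```python
import random
from statistics import fmean


def decompose(potential, treated):
    """potential: (y0, y1) pairs; treated: D_i values. Returns (observed, ATT, selection bias)."""
    y1_treated = [y1 for (y0, y1), d in zip(potential, treated) if d == 1]
    y0_treated = [y0 for (y0, y1), d in zip(potential, treated) if d == 1]  # counterfactual
    y0_control = [y0 for (y0, y1), d in zip(potential, treated) if d == 0]
    observed = fmean(y1_treated) - fmean(y0_control)
    att = fmean(y1_treated) - fmean(y0_treated)
    selection_bias = fmean(y0_treated) - fmean(y0_control)
    return observed, att, selection_bias


rng = random.Random(0)
# Hypothetical users: baseline y0, and the treatment adds exactly 1 for everyone
potential = [(y0, y0 + 1.0) for y0 in (rng.gauss(10, 3) for _ in range(20_000))]

self_selected = [1 if y0 > 11 else 0 for y0, _ in potential]   # high-baseline users opt in
randomized = [rng.randrange(2) for _ in potential]             # coin-flip assignment

for name, d in (("self-selected", self_selected), ("randomized", randomized)):
    obs, att, bias = decompose(potential, d)
    print(f"{name}: observed={obs:.3f} ATT={att:.3f} bias={bias:.3f}")
```

Both cases recover a treatment effect on the treated of 1, but under self-selection the observed difference is several times larger because of selection bias; under randomization the bias term is close to zero.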
Causal Decision Trees
Idea:
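• Split leaves based on MSE across all treatments
• Standard penalization for complex trees ($\lambda \times \#\text{leaves}$)
Estimator:
• $\hat{\tau}_i^{CT}$ -- sample average treatment effect in the leaf containing user i (with propensity scores)
$$Y_i^* = \begin{cases} 2 Y_i & \text{if } D_i = 1 \\ -2 Y_i & \text{if } D_i = 0 \end{cases}$$
$$-\frac{1}{N}\sum_{i=1}^{N}\left(\hat{\tau}_i^{CT} - Y_i^*\right)^2 \quad \text{(in-sample goodness of fit)}$$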
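A sketch of the split criterion on one feature. The factor of 2 in $Y_i^*$ is the general transformed outcome $Y_i (D_i - e)/(e(1 - e))$ evaluated at a treatment probability of $e = 1/2$. As an assumption, the leaf estimate here is the mean of $Y_i^*$ within the leaf, which is an inverse-propensity-weighted estimate of the leaf's average treatment effect. The data, effect size, and candidate thresholds are hypothetical.

```python
import random
from statistics import fmean


def transformed_outcome(y, d, e=0.5):
    """Y* = Y (D - e) / (e (1 - e)); with propensity e = 0.5 this is +2Y if treated, -2Y if not."""
    return y * (d - e) / (e * (1 - e))


def leaf_sse(ystars):
    """Leaf estimate tau_hat = mean of Y* in the leaf; return its sum of squared errors."""
    tau_hat = fmean(ystars)
    return sum((tau_hat - ys) ** 2 for ys in ystars)


def score(leaves, n, lam):
    """-(1/N) * sum_i (tau_hat_i - Y*_i)^2, minus lam per leaf (higher is better)."""
    return -sum(leaf_sse(leaf) for leaf in leaves) / n - lam * len(leaves)


def best_split(rows, thresholds, lam=0.01):
    """rows: (x, Y*) pairs. Return the best threshold on x, or None if not splitting wins."""
    n = len(rows)
    best_t, best_score = None, score([[ys for _, ys in rows]], n, lam)
    for t in thresholds:
        left = [ys for x, ys in rows if x < t]
        right = [ys for x, ys in rows if x >= t]
        if left and right:
            s = score([left, right], n, lam)
            if s > best_score:
                best_t, best_score = t, s
    return best_t


rng = random.Random(0)
rows = []
for _ in range(20_000):
    x = rng.random()
    d = 1 if rng.random() < 0.5 else 0       # randomized with propensity 0.5
    tau = 1.0 if x >= 0.5 else 0.0           # hypothetical: effect only for x >= 0.5
    y = 1.0 + tau * d + rng.gauss(0, 0.5)
    rows.append((x, transformed_outcome(y, d)))

print(best_split(rows, [0.25, 0.5, 0.75]))
```

Since $E[Y_i^* \mid x] = \tau(x)$ under randomization with $e = 1/2$, an ordinary squared-error split on $Y_i^*$ looks for regions with different treatment effects rather than different outcome levels.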
Why Do This
Compare to “two tree model”
• Build two regression trees (control and treatment)
• Predict the outcome for a given user on both trees
• Choose the treatment with the maximal predicted value for that user
Causal DT separates “model construction” from treatment effect estimation. Works well when there is a lot of heterogeneity unrelated to treatment effects
Two trees work well when control outcomes are close to constant – rare in real life
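For comparison, a sketch of the "two tree model". As a stated simplification, each "tree" is a hypothetical stand-in that predicts the mean outcome in equal-width bins of a single feature `x`; the simulated data (a baseline that varies with `x` and an effect that flips sign at 0.5) are also invented.

```python
import random
from collections import defaultdict
from statistics import fmean


def fit_bucket_model(rows, n_bins=4):
    """Stand-in for a regression tree: mean outcome in each equal-width bin of x in [0, 1)."""
    def bin_of(x):
        return min(int(x * n_bins), n_bins - 1)

    buckets = defaultdict(list)
    for x, y in rows:
        buckets[bin_of(x)].append(y)
    means = {b: fmean(ys) for b, ys in buckets.items()}
    overall = fmean(y for _, y in rows)
    return lambda x: means.get(bin_of(x), overall)


def two_model_policy(data):
    """data: (x, d, y) triples. Fit control and treatment models, then pick the larger prediction."""
    control = fit_bucket_model([(x, y) for x, d, y in data if d == 0])
    treated = fit_bucket_model([(x, y) for x, d, y in data if d == 1])
    return lambda x: 1 if treated(x) > control(x) else 0


rng = random.Random(0)
data = []
for _ in range(10_000):
    x, d = rng.random(), rng.randrange(2)
    effect = 0.5 if x >= 0.5 else -0.5       # hypothetical: treatment helps only when x >= 0.5
    data.append((x, d, x + d * effect + rng.gauss(0, 0.3)))

policy = two_model_policy(data)
print(policy(0.1), policy(0.9))
```

Each model here must also learn the baseline (the `x` term), so any error in fitting the control outcome feeds directly into the estimated difference. When the baseline is nearly constant that error is small, which matches the condition the slide gives for two trees working well.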
Drilldown: Causal Random Forests
A Causal Decision Tree … overfits (just like a decision tree!)
A Causal Random Forest is just a bag of Causal Decision Trees
This example applies decision trees to two treatments (treatment vs. control).
But the algebra is similar for m treatments, and for different ML algorithms (which is what we use)
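A sketch of the bagging idea. To keep it short, each "causal tree" is a single split (a stump) on one feature, chosen by squared error of the transformed outcome $Y_i^*$ (with $e = 1/2$), and the forest averages stumps fit on bootstrap resamples. Real random forests also grow deeper trees and subsample features at each split, which this sketch omits; the data are hypothetical.

```python
import random
from statistics import fmean


def fit_causal_stump(rows, thresholds):
    """rows: (x, Y*) pairs. One split on x minimizing squared error of Y* around leaf means."""
    def sse(values):
        m = fmean(values)
        return sum((m - v) ** 2 for v in values)

    best = None
    for t in thresholds:
        left = [ys for x, ys in rows if x < t]
        right = [ys for x, ys in rows if x >= t]
        if len(left) < 2 or len(right) < 2:
            continue
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, fmean(left), fmean(right))
    _, t, tau_left, tau_right = best
    return lambda x: tau_left if x < t else tau_right


def fit_causal_forest(rows, thresholds, n_trees=20, seed=0):
    """Bagging: fit each stump on a bootstrap resample and average their effect estimates."""
    rng = random.Random(seed)
    trees = [fit_causal_stump(rng.choices(rows, k=len(rows)), thresholds) for _ in range(n_trees)]
    return lambda x: fmean(tree(x) for tree in trees)


rng = random.Random(1)
rows = []
for _ in range(4000):
    x, d = rng.random(), rng.randrange(2)
    y = 1.0 + (1.0 if x >= 0.5 else 0.0) * d + rng.gauss(0, 0.5)  # hypothetical effect for x >= 0.5
    rows.append((x, 2 * y if d == 1 else -2 * y))                 # transformed outcome, e = 0.5

forest = fit_causal_forest(rows, [i / 10 for i in range(1, 10)])
print(round(forest(0.1), 2), round(forest(0.9), 2))
```

Averaging over resamples smooths out the split-point jitter that a single overfit tree would commit to.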
The Analytical Two-Step (Revisited)
Run randomized trials
• Users randomly assigned to treatments
• Banditing has much more explore (and much less exploit) than is usual
• Exploration is guided by models
After the trial, run a causal model builder
• Put your eyeballs at the end of the experiment and see if you can figure out how you should have assigned the users (to optimize the objective function)
• This is inherently a counterfactual exercise, and requires causal inference
What Does a Model Builder Produce
Fast segmenters (the primary goal of a model builder is to provide a real-time segmentation algorithm whose segments can be matched to treatments)
Proportional estimates. What percentage of traffic is going to each treatment (thought of as a segment)
Estimates of improvement for each segment (the model should predict the gain)
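A small sketch of how the three outputs fit together, assuming the segmenter is a cheap rule on a hypothetical `level` feature and the per-segment gains are hypothetical model predictions. It computes the proportional estimates (share of traffic per treatment) and the traffic-weighted predicted gain.

```python
from collections import Counter


def summarize_model(users, segmenter, predicted_gain):
    """Traffic share per treatment and the traffic-weighted predicted gain."""
    counts = Counter(segmenter(u) for u in users)
    shares = {t: c / len(users) for t, c in counts.items()}
    expected_gain = sum(share * predicted_gain[t] for t, share in shares.items())
    return shares, expected_gain


def segmenter(user):
    """Hypothetical fast segmenter: a cheap rule mapping a user to a treatment."""
    if user["level"] < 3:
        return "starter_discount"
    if user["level"] < 7:
        return "bundle"
    return "control"


predicted_gain = {"starter_discount": 0.02, "bundle": 0.05, "control": 0.0}  # hypothetical
users = [{"level": level} for level in range(10)]
shares, gain = summarize_model(users, segmenter, predicted_gain)
print(shares, round(gain, 3))
```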
Evaluation Via “Two-Armed Partition-Based Test”
Two arms: Control and Variation
• Control has “Before” (no treatment)
• Variation has entire model (all m treatments)
Partitions divide the user space
• m treatments -> up to m disjoint segments in the partition.
• Disjoint segments are each mapped to different treatments
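A simulation sketch of the two-armed partition-based test with m = 2 treatments. The ground truth is hypothetical: treatment "A" helps users with `x < 0.5` and hurts the rest, "B" does the opposite. Control users get no treatment; variation users are routed by the model's partition. Giving everyone a single treatment is included for contrast.

```python
import random
from statistics import fmean


def outcome(x, treatment, rng):
    """Hypothetical ground truth: 'A' helps x < 0.5, 'B' helps the rest; None means no treatment."""
    base = 1.0 + rng.gauss(0, 0.5)
    if treatment is None:
        return base
    helps = (treatment == "A") == (x < 0.5)
    return base + (0.3 if helps else -0.2)


def partition_test(users, policy, rng):
    """Two arms: control gets the 'before' experience, variation gets the whole model."""
    control, variation = [], []
    for x in users:
        if rng.random() < 0.5:
            control.append(outcome(x, None, rng))
        else:
            variation.append(outcome(x, policy(x), rng))
    return fmean(variation) - fmean(control)


def model_policy(x):
    """m = 2 treatments mapped to 2 disjoint segments of the user space."""
    return "A" if x < 0.5 else "B"


rng = random.Random(0)
users = [rng.random() for _ in range(20_000)]
lift_model = partition_test(users, model_policy, rng)
lift_single = partition_test(users, lambda x: "A", rng)   # one treatment for everyone
print(round(lift_model, 3), round(lift_single, 3))
```

Because the whole model is evaluated as a single arm, the test answers one question (does the partition beat "before"?) with ordinary two-arm statistics, regardless of how many treatments sit inside it.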
Revisiting Runge et Al
Runge et al built a model of churn prediction
This defines both the test population and the objective function
• Test population: Likely to churn
• (Very simpleminded) Objective function: (Number of Days until Actually Churned) – (Predicted Number of Days)
Covariates: they’ve collected a bunch
What should they do next?
Causal inference to see which strategies worked for whom (using the covariates as features)
This Works in Production ….
(SR Customer Dashboard, with identifying info removed)
The Starting Points for Really Understanding This