Foundations of large-scale “doubly-sequential”...
Transcript of Foundations of large-scale “doubly-sequential”...
![Page 1: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/1.jpg)
Foundations of large-scale“doubly-sequential” experimentation
Aaditya Ramdas
Assistant ProfessorDept. of Statistics and Data Science Machine Learning Dept.Carnegie Mellon University
www.stat.cmu.edu/~aramdas/kdd19/
(KDD tutorial in Anchorage, on 4 Aug 2019)
![Page 2: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/2.jpg)
A/B-testing : tech :: clinical trials : pharma
![Page 3: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/3.jpg)
Large-scale A/B-testing or other relatedforms of randomized experimentation hasrevolutionized the tech industry in the last 15yrs.
A/B-testing : tech :: clinical trials : pharma
![Page 4: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/4.jpg)
Large-scale A/B-testing or other relatedforms of randomized experimentation hasrevolutionized the tech industry in the last 15yrs.
In 2013, a team from Microsoft (Bing) claimed that they run tens of thousands of such experiments, leading to millions of dollars in increased revenue.
Kohavi et al. ’13
A/B-testing : tech :: clinical trials : pharma
![Page 5: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/5.jpg)
Large-scale A/B-testing or other relatedforms of randomized experimentation hasrevolutionized the tech industry in the last 15yrs.
In 2013, a team from Microsoft (Bing) claimed that they run tens of thousands of such experiments, leading to millions of dollars in increased revenue.
Kohavi et al. ’13
Much has been discussed about doing A/B testing the “right” way,both theoretically and practically in real-world systems.Many companies contributing to this vast and growing literature.
A/B-testing : tech :: clinical trials : pharma
![Page 6: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/6.jpg)
Audience poll (for the speaker)
![Page 7: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/7.jpg)
How many of you have written papers on A/B testing or online experimentation?(or work in the area, or consider yourselves experts?)
Audience poll (for the speaker)
![Page 8: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/8.jpg)
How many of you have written papers on A/B testing or online experimentation?(or work in the area, or consider yourselves experts?)
How many of you have read papers on A/B testingand know what it is, but want to know more?
Audience poll (for the speaker)
![Page 9: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/9.jpg)
How many of you have written papers on A/B testing or online experimentation?(or work in the area, or consider yourselves experts?)
How many of you have read papers on A/B testingand know what it is, but want to know more?
Audience poll (for the speaker)
How many have no idea what I’m talking about?
![Page 10: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/10.jpg)
Users of app or website
Sign up!
Sign up!……
50% 50%
A B
![Page 11: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/11.jpg)
Users of app or website
Sign up!
Sign up!……
50% 50%
44 conversions 71 conversions
A B
![Page 12: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/12.jpg)
Users of app or website
Sign up!
Sign up!……
50% 50%
44 conversions 71 conversionsB wins!
A B
![Page 13: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/13.jpg)
Users of app or website
Sign up!
Sign up!……
50% 50%
44 conversions 71 conversionsB wins!
A B
![Page 14: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/14.jpg)
What we will NOT cover today
![Page 15: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/15.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differences
![Page 16: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/16.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)
![Page 17: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/17.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterion
![Page 18: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/18.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trends
![Page 19: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/19.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
![Page 20: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/20.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
![Page 21: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/21.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)
![Page 22: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/22.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testing
![Page 23: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/23.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testingSide effects and risks associated with running experiments
![Page 24: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/24.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testingSide effects and risks associated with running experimentsDeployment with controlled, phased rollouts
![Page 25: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/25.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testingSide effects and risks associated with running experimentsDeployment with controlled, phased rollouts
![Page 26: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/26.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testingSide effects and risks associated with running experimentsDeployment with controlled, phased rollouts
Pitfalls of long experiments (survivorship bias, perceived trends)
![Page 27: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/27.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testingSide effects and risks associated with running experimentsDeployment with controlled, phased rollouts
Pitfalls of long experiments (survivorship bias, perceived trends)ML meets causal inference meets online experiments
![Page 28: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/28.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testingSide effects and risks associated with running experimentsDeployment with controlled, phased rollouts
Pitfalls of long experiments (survivorship bias, perceived trends)ML meets causal inference meets online experimentsExperimentation in marketplaces or with network effects
![Page 29: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/29.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testingSide effects and risks associated with running experimentsDeployment with controlled, phased rollouts
Pitfalls of long experiments (survivorship bias, perceived trends)ML meets causal inference meets online experimentsExperimentation in marketplaces or with network effectsEthical aspects of running experiments
![Page 30: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/30.jpg)
What we will NOT cover today
Pre-experiment analysis and difference-in-differencesWhat makes a good metric? (directionality and sensitivity)Combining metrics into an overall evaluation criterionTime-series aspects: dealing with periodicity and trendsCausal inference in observational studies
Parametric methods for A/B testing (like SPRT and variants)Bayesian A/B testingSide effects and risks associated with running experimentsDeployment with controlled, phased rollouts
Pitfalls of long experiments (survivorship bias, perceived trends)ML meets causal inference meets online experimentsExperimentation in marketplaces or with network effectsEthical aspects of running experiments
![Page 31: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/31.jpg)
There are many resources for these topics
Yandex tutorial at The Web Conference ’18
Microsoft tutorial at The Web Conference ’19 (+ ExP Platform webpage)Blog posts by Evan Miller, Etsy, Optimizely, etc.
…
![Page 32: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/32.jpg)
A new “doubly-sequential” perspective: a sequence of sequential experiments
![Page 33: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/33.jpg)
Time / Samples
Exp.
A new “doubly-sequential” perspective: a sequence of sequential experiments
![Page 34: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/34.jpg)
Time / Samples
Exp.
A new “doubly-sequential” perspective: a sequence of sequential experiments
![Page 35: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/35.jpg)
Time / Samples
Exp.
A/B tests
A new “doubly-sequential” perspective: a sequence of sequential experiments
![Page 36: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/36.jpg)
Time / Samples
Exp.
A new “doubly-sequential” perspective: a sequence of sequential experiments
![Page 37: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/37.jpg)
Time / Samples
Exp.
common control
treatments
A new “doubly-sequential” perspective: a sequence of sequential experiments
![Page 38: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/38.jpg)
Time / Samples
Exp.
A new “doubly-sequential” perspective: a sequence of sequential experiments
Zrnic, Ramdas, Jordan ‘18Yang, Ramdas, Jamieson, Wainwright ‘17
![Page 39: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/39.jpg)
What kind of guarantees would we like for doubly sequential experimentation?
![Page 40: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/40.jpg)
What kind of guarantees would we like for doubly sequential experimentation?
(a) inner sequential process (a single experiment) — correct inference when experiment ends (correct p-values for A/B test or correct confidence intervals for treatment effect)
![Page 41: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/41.jpg)
What kind of guarantees would we like for doubly sequential experimentation?
(a) inner sequential process (a single experiment) — correct inference when experiment ends (correct p-values for A/B test or correct confidence intervals for treatment effect)
(b) outer sequential process (multiple experiments) — less clear (is error control on inner process enough?!)
![Page 42: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/42.jpg)
Some potential issues within each experiment
Some potential issues across experiments
Some existing problems in practice
Many other concerns as well
![Page 43: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/43.jpg)
Some potential issues within each experiment
Some potential issues across experiments
Some existing problems in practice
(a) continuous monitoring (b) flexible experiment horizon (c) arbitrary stopping (or continuation) rules
Many other concerns as well
![Page 44: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/44.jpg)
Some potential issues within each experiment
Some potential issues across experiments
Some existing problems in practice
(a) continuous monitoring (b) flexible experiment horizon (c) arbitrary stopping (or continuation) rules
(a) selection bias (multiplicity) (b) dependence across experiments (c) don’t know future outcomes
Many other concerns as well
![Page 45: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/45.jpg)
Solutions for these issues
![Page 46: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/46.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Solutions for these issues
Part I
![Page 47: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/47.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Outer sequential process:
“false coverage rate” for estimation (correspondingly, “false discovery rate” for testing)
Solutions for these issues
Part II
Part I
![Page 48: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/48.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Outer sequential process:
“false coverage rate” for estimation (correspondingly, “false discovery rate” for testing)
Solutions for these issues
Part II
Part I
Modular solutions: fit well together Many extensions to each piece
Part III
![Page 49: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/49.jpg)
The INNER Sequential Process(a single experiment)
[1 hour]
Part I
![Page 50: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/50.jpg)
The “duality” between confidence intervals and p-values
![Page 51: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/51.jpg)
Hypothesis testing is like stochastic proof by contradiction.
![Page 52: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/52.jpg)
Hypothesis testing is like stochastic proof by contradiction.
![Page 53: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/53.jpg)
Hypothesis testing is like stochastic proof by contradiction.
Null hypothesis: The coin is fair (bias = 0)
Alternative:Coin is biased towards H
![Page 54: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/54.jpg)
Hypothesis testing is like stochastic proof by contradiction.
1000 tosses
0 200 400 600 800
Tails Heads
Null hypothesis: The coin is fair (bias = 0)
Alternative:Coin is biased towards H
![Page 55: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/55.jpg)
Hypothesis testing is like stochastic proof by contradiction.
1000 tosses
0 200 400 600 800
Tails Heads
Null hypothesis: The coin is fair (bias = 0)
Alternative:Coin is biased towards H
Apparent contradiction!Should we reject the null hypothesis?
![Page 56: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/56.jpg)
Calculate p-value
![Page 57: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/57.jpg)
Possible observations
Calculate p-value
![Page 58: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/58.jpg)
Possible observationsall
tailsall
heads
Calculate p-value
![Page 59: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/59.jpg)
Possible observationsall
tailsall
heads
Prob.density
Calculate p-value
![Page 60: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/60.jpg)
Possible observationsall
tailsall
heads
Prob.density
data
Calculate p-value
![Page 61: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/61.jpg)
Possible observationsall
tailsall
heads
Prob.density
data
p-value P
Calculate p-value
![Page 62: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/62.jpg)
Possible observationsall
tailsall
heads
Prob.density
data
p-value P
Calculate p-value
Reject null if P ≤ α
![Page 63: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/63.jpg)
Possible observationsall
tailsall
heads
Prob.density
data
p-value P
Calculate p-value
Reject null if P ≤ α ≈ #H − #T ≥ 2N log(1/α) .
![Page 64: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/64.jpg)
Possible observationsall
tailsall
heads
Prob.density
data
p-value P
Calculate p-value
Then, Pr(false positive) ≤ α .
Reject null if P ≤ α ≈ #H − #T ≥ 2N log(1/α) .
![Page 65: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/65.jpg)
An equivalent view via confidence intervals
![Page 66: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/66.jpg)
An equivalent view via confidence intervals
Estimate the coin bias by ̂μ :=#H − #T
N.
![Page 67: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/67.jpg)
An equivalent view via confidence intervals
An asymptotic (1 − α)-CI for μ is given by ( ̂μ −z1−α
N, ̂μ +
z1−α
N ),
Estimate the coin bias by ̂μ :=#H − #T
N.
![Page 68: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/68.jpg)
An equivalent view via confidence intervals
(appealing to the Central Limit Theorem) where z1−α is the (1 − α)-quantile of N(0,1) .
An asymptotic (1 − α)-CI for μ is given by ( ̂μ −z1−α
N, ̂μ +
z1−α
N ),
Estimate the coin bias by ̂μ :=#H − #T
N.
![Page 69: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/69.jpg)
An equivalent view via confidence intervals
If this confidence interval does not contain 0,we may be reasonably confident that the coin is biased,and we may reject the null hypothesis.
(appealing to the Central Limit Theorem) where z1−α is the (1 − α)-quantile of N(0,1) .
An asymptotic (1 − α)-CI for μ is given by ( ̂μ −z1−α
N, ̂μ +
z1−α
N ),
Estimate the coin bias by ̂μ :=#H − #T
N.
![Page 70: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/70.jpg)
An equivalent view via confidence intervals
If this confidence interval does not contain 0,we may be reasonably confident that the coin is biased,and we may reject the null hypothesis.
(appealing to the Central Limit Theorem) where z1−α is the (1 − α)-quantile of N(0,1) .
An asymptotic (1 − α)-CI for μ is given by ( ̂μ −z1−α
N, ̂μ +
z1−α
N ),
Estimate the coin bias by ̂μ :=#H − #T
N.
≈ #H − #T ≥ 2N log(1/α) .
![Page 71: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/71.jpg)
For any parameter µ of interest,
with associated estimator bµ,the following claim holds:
bµ( )
R
| {z }(1�↵) confidence interval
for µ
![Page 72: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/72.jpg)
For any parameter µ of interest,
with associated estimator bµ,the following claim holds:
bµ( )
R
| {z }(1�↵) confidence interval
µ0
for µ
![Page 73: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/73.jpg)
For any parameter µ of interest,
with associated estimator bµ,the following claim holds:
bµ( )
R
| {z }(1�↵) confidence interval
µ0⌘
for µ
![Page 74: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/74.jpg)
For any parameter µ of interest,
with associated estimator bµ,the following claim holds:
bµ( )
R
| {z }(1�↵) confidence interval
µ0⌘
For H0 : µ = µ0
for µ
![Page 75: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/75.jpg)
For any parameter µ of interest,
with associated estimator bµ,the following claim holds:
bµ( )
R
| {z }(1�↵) confidence interval
µ0⌘
For H0 : µ = µ0
we have p-value Pµ0 ↵
for µ
![Page 76: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/76.jpg)
For any parameter µ of interest,
with associated estimator bµ,the following claim holds:
bµ( )
R
| {z }(1�↵) confidence interval
µ0⌘
For H0 : µ = µ0
we have p-value Pµ0 ↵
for µ
(we would reject the nullhypothesis at level ↵)
![Page 77: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/77.jpg)
For any parameter µ of interest,
with associated estimator bµ,the following claim holds:
bµ( )
R
| {z }(1�↵) confidence interval
µ0⌘
For H0 : µ = µ0
we have p-value Pµ0 ↵
for µ
(we would reject the nullhypothesis at level ↵)
(1�↵/2) confidence intervalz }| {
( )
![Page 78: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/78.jpg)
For any parameter µ of interest,
with associated estimator bµ,the following claim holds:
bµ( )
R
| {z }(1�↵) confidence interval
µ0⌘
For H0 : µ = µ0
we have p-value Pµ0 ↵
for µ
(we would reject the nullhypothesis at level ↵)
(1�↵/2) confidence intervalz }| {
( )
⌘ Pµ0 > ↵/2
![Page 79: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/79.jpg)
In summary, tests (p-values) and CIs are “dual”.
![Page 80: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/80.jpg)
In summary, tests (p-values) and CIs are “dual”.
family of tests for θ → CI for θ
CI for θ → family of tests for θ
![Page 81: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/81.jpg)
In summary, tests (p-values) and CIs are “dual”.
A (1 − α)-CI for a parameter θ is the set of all θ0 such that the test for H0 : θ = θ0 has p-value larger than α .
family of tests for θ → CI for θ
CI for θ → family of tests for θ
![Page 82: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/82.jpg)
In summary, tests (p-values) and CIs are “dual”.
A (1 − α)-CI for a parameter θ is the set of all θ0 such that
the smallest q for which the (1 − q)-CI for θ fails to cover θ0 .A p-value for testing the null H0 : θ = θ0 can be given by
the test for H0 : θ = θ0 has p-value larger than α .
family of tests for θ → CI for θ
CI for θ → family of tests for θ
![Page 83: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/83.jpg)
In summary, tests (p-values) and CIs are “dual”.
A (1 − α)-CI for a parameter θ is the set of all θ0 such that
the smallest q for which the (1 − q)-CI for θ fails to cover θ0 .A p-value for testing the null H0 : θ = θ0 can be given by
the test for H0 : θ = θ0 has p-value larger than α .
family of tests for θ → CI for θ
CI for θ → family of tests for θ
CI for θ → composite tests for θ
![Page 84: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/84.jpg)
In summary, tests (p-values) and CIs are “dual”.
A (1 − α)-CI for a parameter θ is the set of all θ0 such that
the smallest q for which the (1 − q)-CI for θ fails to cover θ0 .A p-value for testing the null H0 : θ = θ0 can be given by
the test for H0 : θ = θ0 has p-value larger than α .
family of tests for θ → CI for θ
CI for θ → family of tests for θ
the smallest q for which the (1 − q)-CI for θ fails to intersect Θ0 .A p-value for testing the null H0 : θ ∈ Θ0 can be given byCI for θ → composite tests for θ
![Page 85: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/85.jpg)
In summary, tests (p-values) and CIs are “dual”.
A (1 − α)-CI for a parameter θ is the set of all θ0 such that
the smallest q for which the (1 − q)-CI for θ fails to cover θ0 .A p-value for testing the null H0 : θ = θ0 can be given by
the test for H0 : θ = θ0 has p-value larger than α .
family of tests for θ → CI for θ
CI for θ → family of tests for θ
the smallest q for which the (1 − q)-CI for θ fails to intersect Θ0 .A p-value for testing the null H0 : θ ∈ Θ0 can be given byCI for θ → composite tests for θ
Both of them are useful tools to estimate uncertainty,and like any other tool, they can be used well, or be misused.
![Page 86: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/86.jpg)
However, commonly taught confidence intervalsand p-values are only valid (correctly control error)
if the sample size is fixed in advance.
![Page 87: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/87.jpg)
Start
High-level caricature of an A/B-test
![Page 88: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/88.jpg)
Collect more data (increase sample size)
Start
High-level caricature of an A/B-test
![Page 89: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/89.jpg)
Collect more data (increase sample size) Check if P(n) ≤ α
“peek”Start
High-level caricature of an A/B-test
![Page 90: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/90.jpg)
Collect more data (increase sample size) Check if P(n) ≤ α
“peek”
Stop,Report
“optional stopping”
Start
High-level caricature of an A/B-test
![Page 91: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/91.jpg)
Collect more data (increase sample size) Check if P(n) ≤ α
“peek”
“optional continuation”Stop,
Report
“optional stopping”
Start
High-level caricature of an A/B-test
![Page 92: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/92.jpg)
Collect more data (increase sample size) Check if P(n) ≤ α
“peek”
“optional continuation”Stop,
Report
“optional stopping”
Start
High-level caricature of an A/B-test
With commonly-taught p-values, false positive rate ≫ α .
![Page 93: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/93.jpg)
![Page 94: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/94.jpg)
![Page 95: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/95.jpg)
After 10 people
![Page 96: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/96.jpg)
After 10 people
![Page 97: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/97.jpg)
After 10 people
After 284 people
![Page 98: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/98.jpg)
After 10 people
After 284 people
![Page 99: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/99.jpg)
After 10 people
After 284 people
After 1214 people
![Page 100: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/100.jpg)
After 10 people
After 284 people
After 1214 people
![Page 101: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/101.jpg)
After 10 people
After 284 people
After 1214 people
After 2398 people
![Page 102: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/102.jpg)
After 10 people
After 284 people
After 1214 people
After 2398 people
![Page 103: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/103.jpg)
After 10 people
After 284 people
After 1214 people
After 2398 people
After 7224 people
![Page 104: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/104.jpg)
After 10 people
After 284 people
After 1214 people
After 2398 people
After 7224 people
![Page 105: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/105.jpg)
After 10 people
After 284 people
After 1214 people
After 2398 people
After 7224 people
After 11,219 people, STOP!
![Page 106: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/106.jpg)
After 10 people
After 284 people
After 1214 people
After 2398 people
After 7224 people
After 11,219 people, STOP!
![Page 107: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/107.jpg)
Let P(n) be a classical p-value (eg: t-test), calculated using the first n samples.
![Page 108: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/108.jpg)
Let P(n) be a classical p-value (eg: t-test), calculated using the first n samples.
Under the null hypothesis (no treatment effect),
∀n ≥ 1, Pr(P(n) ≤ α)
prob. of false positive
≤ α .
![Page 109: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/109.jpg)
Let P(n) be a classical p-value (eg: t-test), calculated using the first n samples.
Let τ be the stopping time of the experiment.
Under the null hypothesis (no treatment effect),
∀n ≥ 1, Pr(P(n) ≤ α)
prob. of false positive
≤ α .
Often, τ depends on data, eg: τ := min{n ∈ ℕ : Pn ≤ α} .
![Page 110: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/110.jpg)
Let P(n) be a classical p-value (eg: t-test), calculated using the first n samples.
Let τ be the stopping time of the experiment.
Under the null hypothesis (no treatment effect),
∀n ≥ 1, Pr(P(n) ≤ α)
prob. of false positive
≤ α .
Unfortunately, Pr(P(τ) ≤ α) ≰ α .
Often, τ depends on data, eg: τ := min{n ∈ ℕ : Pn ≤ α} .
In other words, Pr(∃n ∈ ℕ : P(n) ≤ α) ≫ α .
![Page 111: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/111.jpg)
Collect more data (increase sample size) Check if 0 ∉ (1 − α) CI
“peek”
“optional continuation”
Stop
“optional stopping”
Start
Same problem with confidence interval (CI)
Again, false positive rate ≫ α .
Stop,Report
![Page 112: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/112.jpg)
Let (L(n), U(n)) be any classical (1 − α) CI, calculated using the first n samples (eg: CLT).
![Page 113: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/113.jpg)
Let (L(n), U(n)) be any classical (1 − α) CI, calculated using the first n samples (eg: CLT).
When trying to estimate the treatment effect θ,
∀n ≥ 1, Pr(θ ∈ (L(n), U(n)))
prob. of coverage
≥ 1 − α .
![Page 114: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/114.jpg)
Let (L(n), U(n)) be any classical (1 − α) CI, calculated using the first n samples (eg: CLT).
Let τ be the stopping time of the experiment.
When trying to estimate the treatment effect θ,
∀n ≥ 1, Pr(θ ∈ (L(n), U(n)))
prob. of coverage
≥ 1 − α .
Unfortunately, Pr(θ ∈ (L(τ), U(τ))) ≱ 1 − α .
Again, τ may depend on data, eg: τ := min{n ∈ ℕ : L(n) > 0} .
![Page 115: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/115.jpg)
Let (L(n), U(n)) be any classical (1 − α) CI, calculated using the first n samples (eg: CLT).
Let τ be the stopping time of the experiment.
When trying to estimate the treatment effect θ,
∀n ≥ 1, Pr(θ ∈ (L(n), U(n)))
prob. of coverage
≥ 1 − α .
Unfortunately, Pr(θ ∈ (L(τ), U(τ))) ≱ 1 − α .
Again, τ may depend on data, eg: τ := min{n ∈ ℕ : L(n) > 0} .
In other words, Pr(∀n ≥ 1 : θ ∈ (L(n), U(n))) ≪ 1 − α .usually = 0.
![Page 116: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/116.jpg)
Solution: “confidence sequence”(aka “anytime confidence intervals”)
or “sequential p-values” for testing(aka “always-valid p-values”)
![Page 117: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/117.jpg)
A “confidence sequence” for a parameteris a sequence of confidence intervalswith a uniform (simultaneous) coverage guarantee.
✓<latexit sha1_base64="JqEnYvV6PtsKBJYmBVwEpjIMANw=">AAAB7XicbVDLSgNBEJyNrxhfUY9eBoPgKeyKoMegF48RzAOSJcxOOsmY2ZllplcIS/7BiwdFvPo/3vwbJ8keNLGgoajqprsrSqSw6PvfXmFtfWNzq7hd2tnd2z8oHx41rU4NhwbXUpt2xCxIoaCBAiW0EwMsjiS0ovHtzG89gbFCqwecJBDGbKjEQHCGTmp2cQTIeuWKX/XnoKskyEmF5Kj3yl/dvuZpDAq5ZNZ2Aj/BMGMGBZcwLXVTCwnjYzaEjqOKxWDDbH7tlJ45pU8H2rhSSOfq74mMxdZO4sh1xgxHdtmbif95nRQH12EmVJIiKL5YNEglRU1nr9O+MMBRThxh3Ah3K+UjZhhHF1DJhRAsv7xKmhfVwK8G95eV2k0eR5GckFNyTgJyRWrkjtRJg3DySJ7JK3nztPfivXsfi9aCl88ckz/wPn8Ao/ePKA==</latexit><latexit sha1_base64="JqEnYvV6PtsKBJYmBVwEpjIMANw=">AAAB7XicbVDLSgNBEJyNrxhfUY9eBoPgKeyKoMegF48RzAOSJcxOOsmY2ZllplcIS/7BiwdFvPo/3vwbJ8keNLGgoajqprsrSqSw6PvfXmFtfWNzq7hd2tnd2z8oHx41rU4NhwbXUpt2xCxIoaCBAiW0EwMsjiS0ovHtzG89gbFCqwecJBDGbKjEQHCGTmp2cQTIeuWKX/XnoKskyEmF5Kj3yl/dvuZpDAq5ZNZ2Aj/BMGMGBZcwLXVTCwnjYzaEjqOKxWDDbH7tlJ45pU8H2rhSSOfq74mMxdZO4sh1xgxHdtmbif95nRQH12EmVJIiKL5YNEglRU1nr9O+MMBRThxh3Ah3K+UjZhhHF1DJhRAsv7xKmhfVwK8G95eV2k0eR5GckFNyTgJyRWrkjtRJg3DySJ7JK3nztPfivXsfi9aCl88ckz/wPn8Ao/ePKA==</latexit><latexit sha1_base64="JqEnYvV6PtsKBJYmBVwEpjIMANw=">AAAB7XicbVDLSgNBEJyNrxhfUY9eBoPgKeyKoMegF48RzAOSJcxOOsmY2ZllplcIS/7BiwdFvPo/3vwbJ8keNLGgoajqprsrSqSw6PvfXmFtfWNzq7hd2tnd2z8oHx41rU4NhwbXUpt2xCxIoaCBAiW0EwMsjiS0ovHtzG89gbFCqwecJBDGbKjEQHCGTmp2cQTIeuWKX/XnoKskyEmF5Kj3yl/dvuZpDAq5ZNZ2Aj/BMGMGBZcwLXVTCwnjYzaEjqOKxWDDbH7tlJ45pU8H2rhSSOfq74mMxdZO4sh1xgxHdtmbif95nRQH12EmVJIiKL5YNEglRU1nr9O+MMBRThxh3Ah3K+UjZhhHF1DJhRAsv7xKmhfVwK8G95eV2k0eR5GckFNyTgJyRWrkjtRJg3DySJ7JK3nztPfivXsfi9aCl88ckz/wPn8Ao/ePKA==</latexit><latexit sha1_base64="JqEnYvV6PtsKBJYmBVwEpjIMANw=">AAAB7XicbVDLSgNBEJyNrxhfUY9eBoPgKeyKoMegF48RzAOSJcxOOsmY2ZllplcIS/7BiwdFvPo/3vwbJ8keNLGgoajqprsrSqSw6PvfXmFtfWNzq7hd2tnd2z8oHx41rU4NhwbXUpt2xCxIoaCBAiW0EwMsjiS0ovHtzG89gbFCqwecJBDGbKjEQHCGTmp2cQTIeuWKX/XnoKskyEmF5Kj3yl/dvuZpDAq5ZNZ2Aj/BMGMGBZcwLXVTCwnjYzaEjqOKxWDDbH7tlJ45pU8H2rhSSOfq74mMxdZO4sh1xgxHdtmbif95nRQH12EmVJIiKL5YNEglRU1nr9O+MMBRThxh3Ah3K+UjZhhHF1DJhRAsv7xKmhfVwK8G95eV2k0eR5GckFNyTgJyRWrkjtRJg3DySJ7JK3nztPfivXsfi9aCl88ckz/wPn8Ao/ePKA==</latexit>
(Ln, Un)<latexit sha1_base64="PpnXScjDZFhd9U2EC1D0fZEkPzE=">AAAB8XicbVBNS8NAEJ3Ur1q/oh69LBahgpREBD0WvXjwUMG0xTaEzXbTLt1swu5GKKH/wosHRbz6b7z5b9y2OWjrg4HHezPMzAtTzpR2nG+rtLK6tr5R3qxsbe/s7tn7By2VZJJQjyQ8kZ0QK8qZoJ5mmtNOKimOQ07b4ehm6refqFQsEQ96nFI/xgPBIkawNtJj7S4QZ8gLxGlgV526MwNaJm5BqlCgGdhfvX5CspgKTThWqus6qfZzLDUjnE4qvUzRFJMRHtCuoQLHVPn57OIJOjFKH0WJNCU0mqm/J3IcKzWOQ9MZYz1Ui95U/M/rZjq68nMm0kxTQeaLoowjnaDp+6jPJCWajw3BRDJzKyJDLDHRJqSKCcFdfHmZtM7rrlN37y+qjesijjIcwTHUwIVLaMAtNMEDAgKe4RXeLGW9WO/Wx7y1ZBUzh/AH1ucP2MyPtg==</latexit><latexit sha1_base64="PpnXScjDZFhd9U2EC1D0fZEkPzE=">AAAB8XicbVBNS8NAEJ3Ur1q/oh69LBahgpREBD0WvXjwUMG0xTaEzXbTLt1swu5GKKH/wosHRbz6b7z5b9y2OWjrg4HHezPMzAtTzpR2nG+rtLK6tr5R3qxsbe/s7tn7By2VZJJQjyQ8kZ0QK8qZoJ5mmtNOKimOQ07b4ehm6refqFQsEQ96nFI/xgPBIkawNtJj7S4QZ8gLxGlgV526MwNaJm5BqlCgGdhfvX5CspgKTThWqus6qfZzLDUjnE4qvUzRFJMRHtCuoQLHVPn57OIJOjFKH0WJNCU0mqm/J3IcKzWOQ9MZYz1Ui95U/M/rZjq68nMm0kxTQeaLoowjnaDp+6jPJCWajw3BRDJzKyJDLDHRJqSKCcFdfHmZtM7rrlN37y+qjesijjIcwTHUwIVLaMAtNMEDAgKe4RXeLGW9WO/Wx7y1ZBUzh/AH1ucP2MyPtg==</latexit><latexit sha1_base64="PpnXScjDZFhd9U2EC1D0fZEkPzE=">AAAB8XicbVBNS8NAEJ3Ur1q/oh69LBahgpREBD0WvXjwUMG0xTaEzXbTLt1swu5GKKH/wosHRbz6b7z5b9y2OWjrg4HHezPMzAtTzpR2nG+rtLK6tr5R3qxsbe/s7tn7By2VZJJQjyQ8kZ0QK8qZoJ5mmtNOKimOQ07b4ehm6refqFQsEQ96nFI/xgPBIkawNtJj7S4QZ8gLxGlgV526MwNaJm5BqlCgGdhfvX5CspgKTThWqus6qfZzLDUjnE4qvUzRFJMRHtCuoQLHVPn57OIJOjFKH0WJNCU0mqm/J3IcKzWOQ9MZYz1Ui95U/M/rZjq68nMm0kxTQeaLoowjnaDp+6jPJCWajw3BRDJzKyJDLDHRJqSKCcFdfHmZtM7rrlN37y+qjesijjIcwTHUwIVLaMAtNMEDAgKe4RXeLGW9WO/Wx7y1ZBUzh/AH1ucP2MyPtg==</latexit><latexit sha1_base64="PpnXScjDZFhd9U2EC1D0fZEkPzE=">AAAB8XicbVBNS8NAEJ3Ur1q/oh69LBahgpREBD0WvXjwUMG0xTaEzXbTLt1swu5GKKH/wosHRbz6b7z5b9y2OWjrg4HHezPMzAtTzpR2nG+rtLK6tr5R3qxsbe/s7tn7By2VZJJQjyQ8kZ0QK8qZoJ5mmtNOKimOQ07b4ehm6refqFQsEQ96nFI/xgPBIkawNtJj7S4QZ8gLxGlgV526MwNaJm5BqlCgGdhfvX5CspgKTThWqus6qfZzLDUjnE4qvUzRFJMRHtCuoQLHVPn57OIJOjFKH0WJNCU0mqm/J3IcKzWOQ9MZYz1Ui95U/M/rZjq68nMm0kxTQeaLoowjnaDp+6jPJCWajw3BRDJzKyJDLDHRJqSKCcFdfHmZtM7rrlN37y+qjesijjIcwTHUwIVLaMAtNMEDAgKe4RXeLGW9WO/Wx7y1ZBUzh/AH1ucP2MyPtg==</latexit>
Sample sizeℙ(∀n ≥ 1 : θ ∈ (Ln, Un)) ≥ 1 − α .
![Page 118: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/118.jpg)
A “confidence sequence” for a parameteris a sequence of confidence intervalswith a uniform (simultaneous) coverage guarantee.
Darling, Robbins ’67, ‘68Lai ’76, ’84
Howard, Ramdas, McAuliffe, Sekhon ’18
✓<latexit sha1_base64="JqEnYvV6PtsKBJYmBVwEpjIMANw=">AAAB7XicbVDLSgNBEJyNrxhfUY9eBoPgKeyKoMegF48RzAOSJcxOOsmY2ZllplcIS/7BiwdFvPo/3vwbJ8keNLGgoajqprsrSqSw6PvfXmFtfWNzq7hd2tnd2z8oHx41rU4NhwbXUpt2xCxIoaCBAiW0EwMsjiS0ovHtzG89gbFCqwecJBDGbKjEQHCGTmp2cQTIeuWKX/XnoKskyEmF5Kj3yl/dvuZpDAq5ZNZ2Aj/BMGMGBZcwLXVTCwnjYzaEjqOKxWDDbH7tlJ45pU8H2rhSSOfq74mMxdZO4sh1xgxHdtmbif95nRQH12EmVJIiKL5YNEglRU1nr9O+MMBRThxh3Ah3K+UjZhhHF1DJhRAsv7xKmhfVwK8G95eV2k0eR5GckFNyTgJyRWrkjtRJg3DySJ7JK3nztPfivXsfi9aCl88ckz/wPn8Ao/ePKA==</latexit><latexit sha1_base64="JqEnYvV6PtsKBJYmBVwEpjIMANw=">AAAB7XicbVDLSgNBEJyNrxhfUY9eBoPgKeyKoMegF48RzAOSJcxOOsmY2ZllplcIS/7BiwdFvPo/3vwbJ8keNLGgoajqprsrSqSw6PvfXmFtfWNzq7hd2tnd2z8oHx41rU4NhwbXUpt2xCxIoaCBAiW0EwMsjiS0ovHtzG89gbFCqwecJBDGbKjEQHCGTmp2cQTIeuWKX/XnoKskyEmF5Kj3yl/dvuZpDAq5ZNZ2Aj/BMGMGBZcwLXVTCwnjYzaEjqOKxWDDbH7tlJ45pU8H2rhSSOfq74mMxdZO4sh1xgxHdtmbif95nRQH12EmVJIiKL5YNEglRU1nr9O+MMBRThxh3Ah3K+UjZhhHF1DJhRAsv7xKmhfVwK8G95eV2k0eR5GckFNyTgJyRWrkjtRJg3DySJ7JK3nztPfivXsfi9aCl88ckz/wPn8Ao/ePKA==</latexit><latexit sha1_base64="JqEnYvV6PtsKBJYmBVwEpjIMANw=">AAAB7XicbVDLSgNBEJyNrxhfUY9eBoPgKeyKoMegF48RzAOSJcxOOsmY2ZllplcIS/7BiwdFvPo/3vwbJ8keNLGgoajqprsrSqSw6PvfXmFtfWNzq7hd2tnd2z8oHx41rU4NhwbXUpt2xCxIoaCBAiW0EwMsjiS0ovHtzG89gbFCqwecJBDGbKjEQHCGTmp2cQTIeuWKX/XnoKskyEmF5Kj3yl/dvuZpDAq5ZNZ2Aj/BMGMGBZcwLXVTCwnjYzaEjqOKxWDDbH7tlJ45pU8H2rhSSOfq74mMxdZO4sh1xgxHdtmbif95nRQH12EmVJIiKL5YNEglRU1nr9O+MMBRThxh3Ah3K+UjZhhHF1DJhRAsv7xKmhfVwK8G95eV2k0eR5GckFNyTgJyRWrkjtRJg3DySJ7JK3nztPfivXsfi9aCl88ckz/wPn8Ao/ePKA==</latexit><latexit sha1_base64="JqEnYvV6PtsKBJYmBVwEpjIMANw=">AAAB7XicbVDLSgNBEJyNrxhfUY9eBoPgKeyKoMegF48RzAOSJcxOOsmY2ZllplcIS/7BiwdFvPo/3vwbJ8keNLGgoajqprsrSqSw6PvfXmFtfWNzq7hd2tnd2z8oHx41rU4NhwbXUpt2xCxIoaCBAiW0EwMsjiS0ovHtzG89gbFCqwecJBDGbKjEQHCGTmp2cQTIeuWKX/XnoKskyEmF5Kj3yl/dvuZpDAq5ZNZ2Aj/BMGMGBZcwLXVTCwnjYzaEjqOKxWDDbH7tlJ45pU8H2rhSSOfq74mMxdZO4sh1xgxHdtmbif95nRQH12EmVJIiKL5YNEglRU1nr9O+MMBRThxh3Ah3K+UjZhhHF1DJhRAsv7xKmhfVwK8G95eV2k0eR5GckFNyTgJyRWrkjtRJg3DySJ7JK3nztPfivXsfi9aCl88ckz/wPn8Ao/ePKA==</latexit>
(Ln, Un)<latexit sha1_base64="PpnXScjDZFhd9U2EC1D0fZEkPzE=">AAAB8XicbVBNS8NAEJ3Ur1q/oh69LBahgpREBD0WvXjwUMG0xTaEzXbTLt1swu5GKKH/wosHRbz6b7z5b9y2OWjrg4HHezPMzAtTzpR2nG+rtLK6tr5R3qxsbe/s7tn7By2VZJJQjyQ8kZ0QK8qZoJ5mmtNOKimOQ07b4ehm6refqFQsEQ96nFI/xgPBIkawNtJj7S4QZ8gLxGlgV526MwNaJm5BqlCgGdhfvX5CspgKTThWqus6qfZzLDUjnE4qvUzRFJMRHtCuoQLHVPn57OIJOjFKH0WJNCU0mqm/J3IcKzWOQ9MZYz1Ui95U/M/rZjq68nMm0kxTQeaLoowjnaDp+6jPJCWajw3BRDJzKyJDLDHRJqSKCcFdfHmZtM7rrlN37y+qjesijjIcwTHUwIVLaMAtNMEDAgKe4RXeLGW9WO/Wx7y1ZBUzh/AH1ucP2MyPtg==</latexit><latexit sha1_base64="PpnXScjDZFhd9U2EC1D0fZEkPzE=">AAAB8XicbVBNS8NAEJ3Ur1q/oh69LBahgpREBD0WvXjwUMG0xTaEzXbTLt1swu5GKKH/wosHRbz6b7z5b9y2OWjrg4HHezPMzAtTzpR2nG+rtLK6tr5R3qxsbe/s7tn7By2VZJJQjyQ8kZ0QK8qZoJ5mmtNOKimOQ07b4ehm6refqFQsEQ96nFI/xgPBIkawNtJj7S4QZ8gLxGlgV526MwNaJm5BqlCgGdhfvX5CspgKTThWqus6qfZzLDUjnE4qvUzRFJMRHtCuoQLHVPn57OIJOjFKH0WJNCU0mqm/J3IcKzWOQ9MZYz1Ui95U/M/rZjq68nMm0kxTQeaLoowjnaDp+6jPJCWajw3BRDJzKyJDLDHRJqSKCcFdfHmZtM7rrlN37y+qjesijjIcwTHUwIVLaMAtNMEDAgKe4RXeLGW9WO/Wx7y1ZBUzh/AH1ucP2MyPtg==</latexit><latexit sha1_base64="PpnXScjDZFhd9U2EC1D0fZEkPzE=">AAAB8XicbVBNS8NAEJ3Ur1q/oh69LBahgpREBD0WvXjwUMG0xTaEzXbTLt1swu5GKKH/wosHRbz6b7z5b9y2OWjrg4HHezPMzAtTzpR2nG+rtLK6tr5R3qxsbe/s7tn7By2VZJJQjyQ8kZ0QK8qZoJ5mmtNOKimOQ07b4ehm6refqFQsEQ96nFI/xgPBIkawNtJj7S4QZ8gLxGlgV526MwNaJm5BqlCgGdhfvX5CspgKTThWqus6qfZzLDUjnE4qvUzRFJMRHtCuoQLHVPn57OIJOjFKH0WJNCU0mqm/J3IcKzWOQ9MZYz1Ui95U/M/rZjq68nMm0kxTQeaLoowjnaDp+6jPJCWajw3BRDJzKyJDLDHRJqSKCcFdfHmZtM7rrlN37y+qjesijjIcwTHUwIVLaMAtNMEDAgKe4RXeLGW9WO/Wx7y1ZBUzh/AH1ucP2MyPtg==</latexit><latexit sha1_base64="PpnXScjDZFhd9U2EC1D0fZEkPzE=">AAAB8XicbVBNS8NAEJ3Ur1q/oh69LBahgpREBD0WvXjwUMG0xTaEzXbTLt1swu5GKKH/wosHRbz6b7z5b9y2OWjrg4HHezPMzAtTzpR2nG+rtLK6tr5R3qxsbe/s7tn7By2VZJJQjyQ8kZ0QK8qZoJ5mmtNOKimOQ07b4ehm6refqFQsEQ96nFI/xgPBIkawNtJj7S4QZ8gLxGlgV526MwNaJm5BqlCgGdhfvX5CspgKTThWqus6qfZzLDUjnE4qvUzRFJMRHtCuoQLHVPn57OIJOjFKH0WJNCU0mqm/J3IcKzWOQ9MZYz1Ui95U/M/rZjq68nMm0kxTQeaLoowjnaDp+6jPJCWajw3BRDJzKyJDLDHRJqSKCcFdfHmZtM7rrlN37y+qjesijjIcwTHUwIVLaMAtNMEDAgKe4RXeLGW9WO/Wx7y1ZBUzh/AH1ucP2MyPtg==</latexit>
Sample sizeℙ(∀n ≥ 1 : θ ∈ (Ln, Un)) ≥ 1 − α .
![Page 119: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/119.jpg)
Example: tracking the mean of a Gaussianor Bernoulli from i.i.d. observations.
X1, X2, … ∼ N(θ,1) or Ber(θ)
![Page 120: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/120.jpg)
Example: tracking the mean of a Gaussianor Bernoulli from i.i.d. observations.
X1, X2, … ∼ N(θ,1) or Ber(θ)
Producing a confidence interval at a fixed timeis elementary statistics (~100 years old).
![Page 121: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/121.jpg)
Example: tracking the mean of a Gaussianor Bernoulli from i.i.d. observations.
X1, X2, … ∼ N(θ,1) or Ber(θ)
Producing a confidence interval at a fixed timeis elementary statistics (~100 years old).
How do we produce a confidence sequence?(which is like a confidence band over time)
![Page 122: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/122.jpg)
Empirical mean
−1.0
−0.5
0.0
0.5
1.0
101 102 103 104 105
Number of samples, t
Con
fiden
ce b
ounds
0.0
0.2
0.4
0.6
101 102 103 104 105
Number of samples, t
Cum
ula
tive
mis
cove
rage
pro
b.
Pointwise CLT Pointwise Hoeffding Linear boundary Curved boundary
(Fair coin)
Empirical mean
−1.0
−0.5
0.0
0.5
1.0
101 102 103 104 105
Number of samples, t
Con
fiden
ce b
ounds
0.0
0.2
0.4
0.6
101 102 103 104 105
Number of samples, t
Cum
ula
tive
mis
cove
rage
pro
b.
Pointwise CLT Pointwise Hoeffding Linear boundary Curved boundary
Empirical mean
−1.0
−0.5
0.0
0.5
1.0
101 102 103 104 105
Number of samples, t
Con
fiden
ce b
ounds
0.0
0.2
0.4
0.6
101 102 103 104 105
Number of samples, t
Cum
ula
tive
mis
cove
rage
pro
b.
Pointwise CLT Pointwise Hoeffding Linear boundary Curved boundaryAnytime CIPointwise CI (CLT)
![Page 123: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/123.jpg)
Empirical mean
−1.0
−0.5
0.0
0.5
1.0
101 102 103 104 105
Number of samples, t
Con
fiden
ce b
ounds
0.0
0.2
0.4
0.6
101 102 103 104 105
Number of samples, t
Cum
ula
tive
mis
cove
rage
pro
b.
Pointwise CLT Pointwise Hoeffding Linear boundary Curved boundary
(Fair coin)
Empirical mean
−1.0
−0.5
0.0
0.5
1.0
101 102 103 104 105
Number of samples, t
Con
fiden
ce b
ounds
0.0
0.2
0.4
0.6
101 102 103 104 105
Number of samples, t
Cum
ula
tive
mis
cove
rage
pro
b.
Pointwise CLT Pointwise Hoeffding Linear boundary Curved boundary
Empirical mean
−1.0
−0.5
0.0
0.5
1.0
101 102 103 104 105
Number of samples, t
Con
fiden
ce b
ounds
0.0
0.2
0.4
0.6
101 102 103 104 105
Number of samples, t
Cum
ula
tive
mis
cove
rage
pro
b.
Pointwise CLT Pointwise Hoeffding Linear boundary Curved boundaryAnytime CIPointwise CI (CLT)
![Page 124: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/124.jpg)
Eg:∑n
i=1 Xi
n± 1.71
log log(2n) + 0.72 log(5.19/α)n
If Xi is 1-subGaussian, then
is a (1 − α) confidence sequence.
![Page 125: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/125.jpg)
Hoeffding boundCLT bound
Jamieson et al. (2013)Balsubramani (2014)Zhao et al. (2016)Darling & Robbins (1967b)Kaufmann et al. (2014)Normal mixtureDarling & Robbins (1968)Polynomial stitching (ours)Inverted stitching (ours)Discrete mixture (ours)
0
2,000
0 105
Vt
Bou
ndar
y
Eg:∑n
i=1 Xi
n± 1.71
log log(2n) + 0.72 log(5.19/α)n
If Xi is 1-subGaussian, then
is a (1 − α) confidence sequence.
![Page 126: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/126.jpg)
Hoeffding boundCLT bound
Jamieson et al. (2013)Balsubramani (2014)Zhao et al. (2016)Darling & Robbins (1967b)Kaufmann et al. (2014)Normal mixtureDarling & Robbins (1968)Polynomial stitching (ours)Inverted stitching (ours)Discrete mixture (ours)
0
2,000
0 105
Vt
Bou
ndar
y
Eg:∑n
i=1 Xi
n± 1.71
log log(2n) + 0.72 log(5.19/α)n
If Xi is 1-subGaussian, then
is a (1 − α) confidence sequence.
Howard, Ramdas, McAuliffe, Sekhon ’18
![Page 127: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/127.jpg)
ℙ( ⋃n∈ℕ
{θ ∉ (Ln, Un)}) ≤ α .
![Page 128: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/128.jpg)
Some implications:
ℙ( ⋃n∈ℕ
{θ ∉ (Ln, Un)}) ≤ α .
![Page 129: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/129.jpg)
Some implications:
1. Valid inference at any time, even stopping times:
ℙ( ⋃n∈ℕ
{θ ∉ (Ln, Un)}) ≤ α .
![Page 130: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/130.jpg)
Some implications:
1. Valid inference at any time, even stopping times:
ℙ( ⋃n∈ℕ
{θ ∉ (Ln, Un)}) ≤ α .
For any stopping time τ : ℙ(θ ∉ (Lτ, Uτ)) ≤ α .
![Page 131: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/131.jpg)
Some implications:
1. Valid inference at any time, even stopping times:
2. Valid post-hoc inference (in hindsight):
ℙ( ⋃n∈ℕ
{θ ∉ (Ln, Un)}) ≤ α .
For any stopping time τ : ℙ(θ ∉ (Lτ, Uτ)) ≤ α .
![Page 132: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/132.jpg)
Some implications:
1. Valid inference at any time, even stopping times:
2. Valid post-hoc inference (in hindsight):
ℙ( ⋃n∈ℕ
{θ ∉ (Ln, Un)}) ≤ α .
For any stopping time τ : ℙ(θ ∉ (Lτ, Uτ)) ≤ α .
For any random time T : ℙ(θ ∉ (LT, UT)) ≤ α .
![Page 133: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/133.jpg)
Some implications:
1. Valid inference at any time, even stopping times:
2. Valid post-hoc inference (in hindsight):
3. No pre-specified sample size: can extend or stop experiments adaptively.
ℙ( ⋃n∈ℕ
{θ ∉ (Ln, Un)}) ≤ α .
For any stopping time τ : ℙ(θ ∉ (Lτ, Uτ)) ≤ α .
For any random time T : ℙ(θ ∉ (LT, UT)) ≤ α .
![Page 134: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/134.jpg)
The same duality between confidence intervals and p-valuesalso holds in the sequential setting:“confidence sequences” are dual to“always valid p-values”.
![Page 135: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/135.jpg)
Duality between anytime p-value and CI
![Page 136: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/136.jpg)
Define a set of null values ℋ0 for θ .
Duality between anytime p-value and CI
![Page 137: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/137.jpg)
Define a set of null values ℋ0 for θ .
Let P(n) := inf{α : the (1 − α) CI(n) does not intersect ℋ0}
Duality between anytime p-value and CI
![Page 138: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/138.jpg)
Define a set of null values ℋ0 for θ .
Let P(n) := inf{α : the (1 − α) CI(n) does not intersect ℋ0}
If CI(n) is a pointwise CI then P(n) is a classical p-value .
For all fixed times n, Pr(P(n) ≤ α)
prob. of false positive
≤ α .
Duality between anytime p-value and CI
![Page 139: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/139.jpg)
Define a set of null values ℋ0 for θ .
Let P(n) := inf{α : the (1 − α) CI(n) does not intersect ℋ0}
If CI(n) is a pointwise CI then P(n) is a classical p-value .
If CI(n) is an anytime CI then P(n) is an always-valid p-value .
For all fixed times n, Pr(P(n) ≤ α)
prob. of false positive
≤ α .
For all stopping times τ, Pr(P(τ) ≤ α) ≤ α .
Duality between anytime p-value and CI
For all data-dependent times T, Pr(P(T) ≤ α) ≤ α .
![Page 140: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/140.jpg)
Relationship to Sequential Probability Ratio Test
we want to test a null hypothesis H0 : θ = θ0
against an alternative hypothesis H1 : θ = θ1 .
Given a stream of data X1, X2, … ∼ fθ, suppose
Wald ‘48
![Page 141: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/141.jpg)
Relationship to Sequential Probability Ratio Test
we want to test a null hypothesis H0 : θ = θ0
against an alternative hypothesis H1 : θ = θ1 .
Given a stream of data X1, X2, … ∼ fθ, suppose
Wald's SPRT (or SLRT) calculates a probability/likelihood ratio:
L(n) :=∏n
i=1 f1(Xi)
∏ni=1 f0(Xi)
,
and rejects when L(n) > 1/α . Can also use prior/mixture over θ1 .
Wald ‘48
![Page 142: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/142.jpg)
Relationship to Sequential Probability Ratio Test
we want to test a null hypothesis H0 : θ = θ0
against an alternative hypothesis H1 : θ = θ1 .
Given a stream of data X1, X2, … ∼ fθ, suppose
Wald's SPRT (or SLRT) calculates a probability/likelihood ratio:
L(n) :=∏n
i=1 f1(Xi)
∏ni=1 f0(Xi)
,
and rejects when L(n) > 1/α . Can also use prior/mixture over θ1 .
Equivalently, define P(n) = 1/L(n) . Then P(n) is an always-valid p-value.
Wald ‘48
![Page 143: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/143.jpg)
Relationship to Sequential Probability Ratio Test
we want to test a null hypothesis H0 : θ = θ0
against an alternative hypothesis H1 : θ = θ1 .
Given a stream of data X1, X2, … ∼ fθ, suppose
Wald's SPRT (or SLRT) calculates a probability/likelihood ratio:
L(n) :=∏n
i=1 f1(Xi)
∏ni=1 f0(Xi)
,
and rejects when L(n) > 1/α . Can also use prior/mixture over θ1 .
Equivalently, define P(n) = 1/L(n) . Then P(n) is an always-valid p-value.
(And inverting it defines a confidence sequence.) Wald ‘48
![Page 144: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/144.jpg)
Can construct confidence sequences(and hence always valid p-values)in a wide variety of nonparametric settings(eg: random variables that arebounded, or subGaussian, or subexponential)
Howard, Ramdas, McAuliffe, Sekhon ’18
![Page 145: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/145.jpg)
Solutions for these issues
![Page 146: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/146.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Solutions for these issues
Part I
![Page 147: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/147.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Outer sequential process:
“false coverage rate” for estimation (correspondingly, “false discovery rate” for testing)
Solutions for these issues
Part II
Part I
![Page 148: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/148.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Outer sequential process:
“false coverage rate” for estimation (correspondingly, “false discovery rate” for testing)
Solutions for these issues
Part II
Part I
Modular solutions: fit well together Many extensions to each piece
Part III
![Page 149: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/149.jpg)
The OUTER Sequential Process(a sequence of experiments)
[40 mins]
Part I1
![Page 150: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/150.jpg)
Quick recap of A/B testing
A :
![Page 151: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/151.jpg)
Quick recap of A/B testing
A :
B :
![Page 152: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/152.jpg)
Quick recap of A/B testing
Null hypothesis: A is at least
as good as B.
A :
B :
![Page 153: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/153.jpg)
Quick recap of A/B testing
0 200 400 600 800
Misses Clicks
Null hypothesis: A is at least
as good as B.
A :
B :
![Page 154: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/154.jpg)
Quick recap of A/B testing
0 200 400 600 800
Misses Clicks
Null hypothesis: A is at least
as good as B.
A :
B :
P = Pr(observed data or moreextreme, assuming null is true)
Calculate p-value:
![Page 155: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/155.jpg)
Quick recap of A/B testing
0 200 400 600 800
Misses Clicks
Null hypothesis: A is at least
as good as B.
A :
B :
P = Pr(observed data or moreextreme, assuming null is true)
Decision rule : if , then we reject the null
(“discovery”). We change A to B,
ensuring that type-1 error .
Calculate p-value:
P ↵
P ↵
![Page 156: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/156.jpg)
Quick recap of A/B testing
0 200 400 600 800
Misses Clicks
Null hypothesis: A is at least
as good as B.
A :
B :
P = Pr(observed data or moreextreme, assuming null is true)
Decision rule : if , then we reject the null
(“discovery”). We change A to B,
ensuring that type-1 error .
Calculate p-value:
a wrong rejection of the null
is a false discovery and implies
a bad change from A to B.
P ↵
P ↵
![Page 157: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/157.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
![Page 158: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/158.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
![Page 159: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/159.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs. Color
![Page 160: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/160.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs. ColorDecision rule:
![Page 161: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/161.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs. ColorDecision rule:
P1 ↵?
![Page 162: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/162.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
Color
Size
Decision rule:P1 ↵?
![Page 163: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/163.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
Color
Size
Decision rule:P1 ↵?
P2 ↵?
![Page 164: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/164.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
vs.
Color
Size
Orientation
Decision rule:P1 ↵?
P2 ↵?
![Page 165: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/165.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
vs.
Color
Size
Orientation
Decision rule:P1 ↵?
P2 ↵?
P3 ↵?
![Page 166: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/166.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Decision rule:P1 ↵?
P2 ↵?
P3 ↵?
![Page 167: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/167.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Decision rule:P1 ↵?
P2 ↵?
P3 ↵?
P4 ↵?
![Page 168: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/168.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Logo
Decision rule:P1 ↵?
P2 ↵?
P3 ↵?
P4 ↵?
![Page 169: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/169.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Logo
Decision rule:P1 ↵?
P2 ↵?
P3 ↵?
P4 ↵?
P5 ↵?
![Page 170: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/170.jpg)
Reality: internet companies run thousands of different (independent) A/B tests over time.
Time
vs.
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Logo
Decision rule:
Problem!
P1 ↵?
P2 ↵?
P3 ↵?
P4 ↵?
P5 ↵?
![Page 171: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/171.jpg)
![Page 172: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/172.jpg)
Run 10,000 different,
independentA/B tests
![Page 173: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/173.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
![Page 174: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/174.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
type-1error rate (per test)= 0.05
![Page 175: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/175.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
type-1error rate (per test)= 0.05
495 false discoveries
![Page 176: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/176.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
type-1error rate (per test)= 0.05
495 false discoveries
power (per test)= 0.80
![Page 177: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/177.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
type-1error rate (per test)= 0.05
495 false discoveries
80 true discoveries
power (per test)= 0.80
![Page 178: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/178.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
type-1error rate (per test)= 0.05
495 false discoveries
80 true discoveries
power (per test)= 0.80
![Page 179: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/179.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
type-1error rate (per test)= 0.05
495 false discoveries
80 true discoveries
false discovery proportion
FDP = 495/575
power (per test)= 0.80
FDP = # false discoveries# discoveries
![Page 180: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/180.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
type-1error rate (per test)= 0.05
495 false discoveries
80 true discoveries
false discovery proportion
FDP = 495/575
power (per test)= 0.80
FDP = # false discoveries# discoveries
FDR = E[FDP]
![Page 181: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/181.jpg)
Run 10,000 different,
independentA/B tests
9,900 true nulls
100 non-nulls
type-1error rate (per test)= 0.05
495 false discoveries
80 true discoveries
false discovery proportion
FDP = 495/575
power (per test)= 0.80
Summary: FDR can be larger than per-test error rate.(even if hypotheses, tests, data are independent)
FDP = # false discoveries# discoveries
FDR = E[FDP]
![Page 182: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/182.jpg)
Given a possibly infinite sequence of independent tests (p-values), can we guarantee control of the FDR in a fully online fashion?
Foster-Stine ’08Aharoni-Rosset ’14
Javanmard-Montanari ’16Ramdas-Yang-Wainwright-Jordan ’17Ramdas-Zrnic-Wainwright-Jordan ’18
Tian-Ramdas ’19
![Page 183: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/183.jpg)
The aim of online FDR procedures
Time
Decision rule:
![Page 184: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/184.jpg)
The aim of online FDR procedures
Time
vs. ColorDecision rule:
![Page 185: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/185.jpg)
The aim of online FDR procedures
Time
vs. ColorDecision rule:
P1 ↵1?
![Page 186: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/186.jpg)
The aim of online FDR procedures
Time
vs.
vs.
Color
Size
Decision rule:P1 ↵1?
![Page 187: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/187.jpg)
The aim of online FDR procedures
Time
vs.
vs.
Color
Size
Decision rule:P1 ↵1?
P2 ↵2?
![Page 188: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/188.jpg)
The aim of online FDR procedures
Time
vs.
vs.
vs.
Color
Size
Orientation
Decision rule:P1 ↵1?
P2 ↵2?
![Page 189: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/189.jpg)
The aim of online FDR procedures
Time
vs.
vs.
vs.
Color
Size
Orientation
Decision rule:P1 ↵1?
P2 ↵2?
P3 ↵3?
![Page 190: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/190.jpg)
The aim of online FDR procedures
Time
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Decision rule:P1 ↵1?
P2 ↵2?
P3 ↵3?
![Page 191: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/191.jpg)
The aim of online FDR procedures
Time
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Decision rule:P1 ↵1?
P2 ↵2?
P3 ↵3?
P4 ↵4?
![Page 192: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/192.jpg)
The aim of online FDR procedures
Time
vs.
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Logo
Decision rule:P1 ↵1?
P2 ↵2?
P3 ↵3?
P4 ↵4?
![Page 193: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/193.jpg)
The aim of online FDR procedures
Time
vs.
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Logo
Decision rule:P1 ↵1?
P2 ↵2?
P3 ↵3?
P4 ↵4?
P5 ↵5?
![Page 194: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/194.jpg)
The aim of online FDR procedures
Time
vs.
vs.
vs.
vs.
vs.
Color
Size
Orientation
Style
Logo
Decision rule:
How do weset each
error level to control FDR at any time?
P1 ↵1?
P2 ↵2?
P3 ↵3?
P4 ↵4?
P5 ↵5?
![Page 195: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/195.jpg)
Offline FDR methodsdo not control the FDRin online settings
Benjamini-Hochberg ’95
One of the most famous offline FDR methods is the “Benjamini-Hochberg” (BH) method
![Page 196: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/196.jpg)
The following method is not a valid online FDR algorithm:
At the end of experiment t, run BH on P1, …, Pt .
![Page 197: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/197.jpg)
The following method is not a valid online FDR algorithm:
At the end of experiment t, run BH on P1, …, Pt .
The reason is that the decision about the first hypothesis depends on all future hypotheses. We cannot commit to a decision and stick to it.
![Page 198: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/198.jpg)
The following method is not a valid online FDR algorithm:
At the end of experiment t, run BH on P1, …, Pt .
The reason is that the decision about the first hypothesis depends on all future hypotheses. We cannot commit to a decision and stick to it.
We need the error level αt for experiment tto be specified when it starts, and we need to make a final decision when experiment t ends.
![Page 199: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/199.jpg)
This multiple testing issue is not particular to p-values. It also exists when selectively reporting treatment effectswith confidence intervals.
Benjamini, Yekutieli ’05Weinstein, Ramdas ’19
![Page 200: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/200.jpg)
Multiplicity in reported CIs
One rarely cares about all CIs or follows-up on them,one usually reports only the most “promising” CIs.
![Page 201: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/201.jpg)
False coverage proportion
Multiplicity in reported CIs
FCP =# incorrectly reported CIs
# reported CIs
One rarely cares about all CIs or follows-up on them,one usually reports only the most “promising” CIs.
![Page 202: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/202.jpg)
False coverage proportion
Multiplicity in reported CIs
FCP =# incorrectly reported CIs
# reported CIs
FCR = 𝔼[FCP]
False coverage rate
One rarely cares about all CIs or follows-up on them,one usually reports only the most “promising” CIs.
Benjamini-Yekutieli ’06Weinstein-Yekutieli ’14
Fithian et al. ’14
![Page 203: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/203.jpg)
Controlling FCR is nontrivial
Constructing marginal 95% CIs for all parametersfails to control FCR at 0.05.
<latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit>
![Page 204: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/204.jpg)
Controlling FCR is nontrivial
Suppose treatment e↵ect ✓j 2 {±0.1} for all j,and experimental observations are normalized toXj ⇠ N(✓j , 1).
<latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit>
Constructing marginal 95% CIs for all parametersfails to control FCR at 0.05.
<latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit>
![Page 205: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/205.jpg)
Controlling FCR is nontrivial
Suppose we only care about drugs with large e↵ects.So we only pursue phase II of the trial if Xj > 3.
<latexit sha1_base64="eT1OBoZu+q2ozcPHjJOqWpC8g1E=">AAACVXicbVFNbxMxFPQupZTw0QBHLk+kSJxWu+0BTqgqF3orKmkjJVH01vs2a+pdW/Zzqyjqn+wF8U+4IOGkEYKWkSyNZt7Iz+PSauU5z38k6YOth9uPdh73njx99ny3/+LlmTfBSRpKo40blehJq46GrFjTyDrCttR0Xl58Wvnnl+S8Mt1XXliatjjvVK0kcpRmfX0arDWe4IrAdHoBEh0BliYwVC7MPVwpbkCjmxNQXZNkn00mvVPzJ2GD84HANnENOD4GUwM3BOwUalA17I1m3+AjHOxls/4gz/I14D4pNmQgNjiZ9W8mlZGhpY6lRu/HRW55ukTHSmq67k2CJ4vyAuc0jrTDlvx0uW7lGt5GpYLauHg6hrX6d2KJrfeLtoyTLXLj73or8X/eOHD9YbpUnQ1Mnby9qA4a2MCqYqiUizXFZiqF0qm4K8gGHUqOH9GLJRR3n3yfnO1nRZ4VX/YHh0ebOnbEa/FGvBOFeC8OxWdxIoZCihvxM0mSNPme/Eq30u3b0TTZZF6Jf5Du/gY/R7EV</latexit><latexit sha1_base64="eT1OBoZu+q2ozcPHjJOqWpC8g1E=">AAACVXicbVFNbxMxFPQupZTw0QBHLk+kSJxWu+0BTqgqF3orKmkjJVH01vs2a+pdW/Zzqyjqn+wF8U+4IOGkEYKWkSyNZt7Iz+PSauU5z38k6YOth9uPdh73njx99ny3/+LlmTfBSRpKo40blehJq46GrFjTyDrCttR0Xl58Wvnnl+S8Mt1XXliatjjvVK0kcpRmfX0arDWe4IrAdHoBEh0BliYwVC7MPVwpbkCjmxNQXZNkn00mvVPzJ2GD84HANnENOD4GUwM3BOwUalA17I1m3+AjHOxls/4gz/I14D4pNmQgNjiZ9W8mlZGhpY6lRu/HRW55ukTHSmq67k2CJ4vyAuc0jrTDlvx0uW7lGt5GpYLauHg6hrX6d2KJrfeLtoyTLXLj73or8X/eOHD9YbpUnQ1Mnby9qA4a2MCqYqiUizXFZiqF0qm4K8gGHUqOH9GLJRR3n3yfnO1nRZ4VX/YHh0ebOnbEa/FGvBOFeC8OxWdxIoZCihvxM0mSNPme/Eq30u3b0TTZZF6Jf5Du/gY/R7EV</latexit><latexit sha1_base64="eT1OBoZu+q2ozcPHjJOqWpC8g1E=">AAACVXicbVFNbxMxFPQupZTw0QBHLk+kSJxWu+0BTqgqF3orKmkjJVH01vs2a+pdW/Zzqyjqn+wF8U+4IOGkEYKWkSyNZt7Iz+PSauU5z38k6YOth9uPdh73njx99ny3/+LlmTfBSRpKo40blehJq46GrFjTyDrCttR0Xl58Wvnnl+S8Mt1XXliatjjvVK0kcpRmfX0arDWe4IrAdHoBEh0BliYwVC7MPVwpbkCjmxNQXZNkn00mvVPzJ2GD84HANnENOD4GUwM3BOwUalA17I1m3+AjHOxls/4gz/I14D4pNmQgNjiZ9W8mlZGhpY6lRu/HRW55ukTHSmq67k2CJ4vyAuc0jrTDlvx0uW7lGt5GpYLauHg6hrX6d2KJrfeLtoyTLXLj73or8X/eOHD9YbpUnQ1Mnby9qA4a2MCqYqiUizXFZiqF0qm4K8gGHUqOH9GLJRR3n3yfnO1nRZ4VX/YHh0ebOnbEa/FGvBOFeC8OxWdxIoZCihvxM0mSNPme/Eq30u3b0TTZZF6Jf5Du/gY/R7EV</latexit><latexit sha1_base64="eT1OBoZu+q2ozcPHjJOqWpC8g1E=">AAACVXicbVFNbxMxFPQupZTw0QBHLk+kSJxWu+0BTqgqF3orKmkjJVH01vs2a+pdW/Zzqyjqn+wF8U+4IOGkEYKWkSyNZt7Iz+PSauU5z38k6YOth9uPdh73njx99ny3/+LlmTfBSRpKo40blehJq46GrFjTyDrCttR0Xl58Wvnnl+S8Mt1XXliatjjvVK0kcpRmfX0arDWe4IrAdHoBEh0BliYwVC7MPVwpbkCjmxNQXZNkn00mvVPzJ2GD84HANnENOD4GUwM3BOwUalA17I1m3+AjHOxls/4gz/I14D4pNmQgNjiZ9W8mlZGhpY6lRu/HRW55ukTHSmq67k2CJ4vyAuc0jrTDlvx0uW7lGt5GpYLauHg6hrX6d2KJrfeLtoyTLXLj73or8X/eOHD9YbpUnQ1Mnby9qA4a2MCqYqiUizXFZiqF0qm4K8gGHUqOH9GLJRR3n3yfnO1nRZ4VX/YHh0ebOnbEa/FGvBOFeC8OxWdxIoZCihvxM0mSNPme/Eq30u3b0TTZZF6Jf5Du/gY/R7EV</latexit>
Suppose treatment e↵ect ✓j 2 {±0.1} for all j,and experimental observations are normalized toXj ⇠ N(✓j , 1).
<latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit>
Constructing marginal 95% CIs for all parametersfails to control FCR at 0.05.
<latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit>
![Page 206: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/206.jpg)
Controlling FCR is nontrivial
Suppose we only care about drugs with large e↵ects.So we only pursue phase II of the trial if Xj > 3.
<latexit sha1_base64="eT1OBoZu+q2ozcPHjJOqWpC8g1E=">AAACVXicbVFNbxMxFPQupZTw0QBHLk+kSJxWu+0BTqgqF3orKmkjJVH01vs2a+pdW/Zzqyjqn+wF8U+4IOGkEYKWkSyNZt7Iz+PSauU5z38k6YOth9uPdh73njx99ny3/+LlmTfBSRpKo40blehJq46GrFjTyDrCttR0Xl58Wvnnl+S8Mt1XXliatjjvVK0kcpRmfX0arDWe4IrAdHoBEh0BliYwVC7MPVwpbkCjmxNQXZNkn00mvVPzJ2GD84HANnENOD4GUwM3BOwUalA17I1m3+AjHOxls/4gz/I14D4pNmQgNjiZ9W8mlZGhpY6lRu/HRW55ukTHSmq67k2CJ4vyAuc0jrTDlvx0uW7lGt5GpYLauHg6hrX6d2KJrfeLtoyTLXLj73or8X/eOHD9YbpUnQ1Mnby9qA4a2MCqYqiUizXFZiqF0qm4K8gGHUqOH9GLJRR3n3yfnO1nRZ4VX/YHh0ebOnbEa/FGvBOFeC8OxWdxIoZCihvxM0mSNPme/Eq30u3b0TTZZF6Jf5Du/gY/R7EV</latexit><latexit sha1_base64="eT1OBoZu+q2ozcPHjJOqWpC8g1E=">AAACVXicbVFNbxMxFPQupZTw0QBHLk+kSJxWu+0BTqgqF3orKmkjJVH01vs2a+pdW/Zzqyjqn+wF8U+4IOGkEYKWkSyNZt7Iz+PSauU5z38k6YOth9uPdh73njx99ny3/+LlmTfBSRpKo40blehJq46GrFjTyDrCttR0Xl58Wvnnl+S8Mt1XXliatjjvVK0kcpRmfX0arDWe4IrAdHoBEh0BliYwVC7MPVwpbkCjmxNQXZNkn00mvVPzJ2GD84HANnENOD4GUwM3BOwUalA17I1m3+AjHOxls/4gz/I14D4pNmQgNjiZ9W8mlZGhpY6lRu/HRW55ukTHSmq67k2CJ4vyAuc0jrTDlvx0uW7lGt5GpYLauHg6hrX6d2KJrfeLtoyTLXLj73or8X/eOHD9YbpUnQ1Mnby9qA4a2MCqYqiUizXFZiqF0qm4K8gGHUqOH9GLJRR3n3yfnO1nRZ4VX/YHh0ebOnbEa/FGvBOFeC8OxWdxIoZCihvxM0mSNPme/Eq30u3b0TTZZF6Jf5Du/gY/R7EV</latexit><latexit sha1_base64="eT1OBoZu+q2ozcPHjJOqWpC8g1E=">AAACVXicbVFNbxMxFPQupZTw0QBHLk+kSJxWu+0BTqgqF3orKmkjJVH01vs2a+pdW/Zzqyjqn+wF8U+4IOGkEYKWkSyNZt7Iz+PSauU5z38k6YOth9uPdh73njx99ny3/+LlmTfBSRpKo40blehJq46GrFjTyDrCttR0Xl58Wvnnl+S8Mt1XXliatjjvVK0kcpRmfX0arDWe4IrAdHoBEh0BliYwVC7MPVwpbkCjmxNQXZNkn00mvVPzJ2GD84HANnENOD4GUwM3BOwUalA17I1m3+AjHOxls/4gz/I14D4pNmQgNjiZ9W8mlZGhpY6lRu/HRW55ukTHSmq67k2CJ4vyAuc0jrTDlvx0uW7lGt5GpYLauHg6hrX6d2KJrfeLtoyTLXLj73or8X/eOHD9YbpUnQ1Mnby9qA4a2MCqYqiUizXFZiqF0qm4K8gGHUqOH9GLJRR3n3yfnO1nRZ4VX/YHh0ebOnbEa/FGvBOFeC8OxWdxIoZCihvxM0mSNPme/Eq30u3b0TTZZF6Jf5Du/gY/R7EV</latexit><latexit sha1_base64="eT1OBoZu+q2ozcPHjJOqWpC8g1E=">AAACVXicbVFNbxMxFPQupZTw0QBHLk+kSJxWu+0BTqgqF3orKmkjJVH01vs2a+pdW/Zzqyjqn+wF8U+4IOGkEYKWkSyNZt7Iz+PSauU5z38k6YOth9uPdh73njx99ny3/+LlmTfBSRpKo40blehJq46GrFjTyDrCttR0Xl58Wvnnl+S8Mt1XXliatjjvVK0kcpRmfX0arDWe4IrAdHoBEh0BliYwVC7MPVwpbkCjmxNQXZNkn00mvVPzJ2GD84HANnENOD4GUwM3BOwUalA17I1m3+AjHOxls/4gz/I14D4pNmQgNjiZ9W8mlZGhpY6lRu/HRW55ukTHSmq67k2CJ4vyAuc0jrTDlvx0uW7lGt5GpYLauHg6hrX6d2KJrfeLtoyTLXLj73or8X/eOHD9YbpUnQ1Mnby9qA4a2MCqYqiUizXFZiqF0qm4K8gGHUqOH9GLJRR3n3yfnO1nRZ4VX/YHh0ebOnbEa/FGvBOFeC8OxWdxIoZCihvxM0mSNPme/Eq30u3b0TTZZF6Jf5Du/gY/R7EV</latexit>
For these drugs, the standard marginal 95% CIdoes not cover ✓j . So FCR=1.
<latexit sha1_base64="MNZd6d4WqE/DvhH93VkMRXIatwE=">AAACQHicbVA9bxNBEN0LX8F8GShpRtiRKNDpLhJKKJCiWIqgCx+OI/ksa25vzt5kb/e0OxfJsvLT0uQn0FHTUIAQLRV7iQtIeNXbN/NmZ15ea+U5Sb5Eazdu3rp9Z/1u5979Bw8fdR8/OfC2cZKG0mrrDnP0pJWhISvWdFg7wirXNMqPB219dELOK2s+8aKmSYUzo0olkYM07Y72rAOekycoXDPzL9sHeEZToCugQjdTBjX0X7/KNvoweJdlncKSB2MZpA2joZ8FC+P0qB/DRwt7gw9v0nja7SVxcgG4TtIV6YkV9qfdz1lhZVORYanR+3Ga1DxZomMlNZ12ssZTjfIYZzQO1GBFfrK8COAUNoJSQBlOKa1p9wrq344lVt4vqjx0Vshzf7XWiv+rjRsutydLZeqGycjLj8pGA1to04RCOZKsF4GgdCrsCnKODiWHzDshhPTqydfJwWacJnH6frO3s7uKY108E8/FC5GKLbEj3op9MRRSnImv4rv4EZ1H36Kf0a/L1rVo5Xkq/kH0+w9bcayH</latexit><latexit sha1_base64="MNZd6d4WqE/DvhH93VkMRXIatwE=">AAACQHicbVA9bxNBEN0LX8F8GShpRtiRKNDpLhJKKJCiWIqgCx+OI/ksa25vzt5kb/e0OxfJsvLT0uQn0FHTUIAQLRV7iQtIeNXbN/NmZ15ea+U5Sb5Eazdu3rp9Z/1u5979Bw8fdR8/OfC2cZKG0mrrDnP0pJWhISvWdFg7wirXNMqPB219dELOK2s+8aKmSYUzo0olkYM07Y72rAOekycoXDPzL9sHeEZToCugQjdTBjX0X7/KNvoweJdlncKSB2MZpA2joZ8FC+P0qB/DRwt7gw9v0nja7SVxcgG4TtIV6YkV9qfdz1lhZVORYanR+3Ga1DxZomMlNZ12ssZTjfIYZzQO1GBFfrK8COAUNoJSQBlOKa1p9wrq344lVt4vqjx0Vshzf7XWiv+rjRsutydLZeqGycjLj8pGA1to04RCOZKsF4GgdCrsCnKODiWHzDshhPTqydfJwWacJnH6frO3s7uKY108E8/FC5GKLbEj3op9MRRSnImv4rv4EZ1H36Kf0a/L1rVo5Xkq/kH0+w9bcayH</latexit><latexit sha1_base64="MNZd6d4WqE/DvhH93VkMRXIatwE=">AAACQHicbVA9bxNBEN0LX8F8GShpRtiRKNDpLhJKKJCiWIqgCx+OI/ksa25vzt5kb/e0OxfJsvLT0uQn0FHTUIAQLRV7iQtIeNXbN/NmZ15ea+U5Sb5Eazdu3rp9Z/1u5979Bw8fdR8/OfC2cZKG0mrrDnP0pJWhISvWdFg7wirXNMqPB219dELOK2s+8aKmSYUzo0olkYM07Y72rAOekycoXDPzL9sHeEZToCugQjdTBjX0X7/KNvoweJdlncKSB2MZpA2joZ8FC+P0qB/DRwt7gw9v0nja7SVxcgG4TtIV6YkV9qfdz1lhZVORYanR+3Ga1DxZomMlNZ12ssZTjfIYZzQO1GBFfrK8COAUNoJSQBlOKa1p9wrq344lVt4vqjx0Vshzf7XWiv+rjRsutydLZeqGycjLj8pGA1to04RCOZKsF4GgdCrsCnKODiWHzDshhPTqydfJwWacJnH6frO3s7uKY108E8/FC5GKLbEj3op9MRRSnImv4rv4EZ1H36Kf0a/L1rVo5Xkq/kH0+w9bcayH</latexit><latexit sha1_base64="MNZd6d4WqE/DvhH93VkMRXIatwE=">AAACQHicbVA9bxNBEN0LX8F8GShpRtiRKNDpLhJKKJCiWIqgCx+OI/ksa25vzt5kb/e0OxfJsvLT0uQn0FHTUIAQLRV7iQtIeNXbN/NmZ15ea+U5Sb5Eazdu3rp9Z/1u5979Bw8fdR8/OfC2cZKG0mrrDnP0pJWhISvWdFg7wirXNMqPB219dELOK2s+8aKmSYUzo0olkYM07Y72rAOekycoXDPzL9sHeEZToCugQjdTBjX0X7/KNvoweJdlncKSB2MZpA2joZ8FC+P0qB/DRwt7gw9v0nja7SVxcgG4TtIV6YkV9qfdz1lhZVORYanR+3Ga1DxZomMlNZ12ssZTjfIYZzQO1GBFfrK8COAUNoJSQBlOKa1p9wrq344lVt4vqjx0Vshzf7XWiv+rjRsutydLZeqGycjLj8pGA1to04RCOZKsF4GgdCrsCnKODiWHzDshhPTqydfJwWacJnH6frO3s7uKY108E8/FC5GKLbEj3op9MRRSnImv4rv4EZ1H36Kf0a/L1rVo5Xkq/kH0+w9bcayH</latexit>
Suppose treatment e↵ect ✓j 2 {±0.1} for all j,and experimental observations are normalized toXj ⇠ N(✓j , 1).
<latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit><latexit sha1_base64="gfM/xJEjMu7AD/AFwnxRplluGo0=">AAACeHicbVHLbtNAFB2bVwmPBlh2c0Vc0UpVZGdDlxVsWKEiSBspE1nj8XUz6TysmXHVYOUb+Dd2fAgbVozdIEHLXR2dO0dnzrlFLYXzafojiu/df/Dw0c7jwZOnz57vDl+8PHOmsRyn3EhjZwVzKIXGqRde4qy2yFQh8by4fN/tz6/QOmH0F7+ucaHYhRaV4MwHKh9++9zUtXEIPqi8Qu0Bqwq5h4T6JXqWr6jQtKW1gnSc0U0ClbHApIRklRxRCgOmS8DrGq3o5EyCKRzaq97AAbMI2ljFpPiKJXgDlA4gmeUroE4o+Hjwx+coO0zG+XCUjtN+4C7ItmBEtnOaD7/T0vCms+aSOTfP0tovWma94BI3A9o4rBm/ZBc4D1AzhW7R9sVtYD8wZR+oMiF5z/6taJlybq2K8FIxv3S3dx35v9288dXxohW6bjxqfmNUNbKL310BSmFDx3IdAONWhL8CXzLLuA+3GoQSstuR74KzyTgLJ/k0GZ2829axQ/bIa3JAMvKWnJAP5JRMCSc/o70oifajXzHEb+LDm6dxtNW8Iv9MPPkNyNS8yQ==</latexit>
Constructing marginal 95% CIs for all parametersfails to control FCR at 0.05.
<latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit><latexit sha1_base64="FMP/KogEvHFO2tjJge5o3+g3UBs=">AAACP3icbVA9TyMxFPTyfeErHCXNEwkSVbSLhIAOEekEHSASkLJR9NbxBguvvbLfIkUR/4zm/sJ1tDQUnE60dOcNKfh61WjmPXtmklxJR2H4EExNz8zOzS/8qCwuLa+sVtd+tp0pLBctbpSxVwk6oaQWLZKkxFVuBWaJEpfJTbPUL2+FddLoCxrmopvhQMtUciRP9artptGObMFJ6gFkaAdSo4L6wW68VYfmiYPUWEClIEeLmSD/VhxDJUWpHJABbjRZo+BX8xyQoB42wt16o1eteTAe+AqiCaixyZz2qn/ivuFFJjRxhc51ojCn7ggtSa7EXSUunMiR3+BAdDzU3orrjsb572DLM/2x0dS7gTH7/mKEmXPDLPGbGdK1+6yV5Hdap6B0vzuSOi9IaP72UVqoMndZJvSlFZzU0APkVnqvwK99T7ysqeJLiD5H/graO40obERnO7XDo0kdC2yDbbJtFrE9dsiO2SlrMc7u2SN7Zn+D38FT8C94eVudCiY36+zDBK//AfwqrFs=</latexit>
![Page 207: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/207.jpg)
Can we control FCR *online*?
When experiment j starts, we must assigna target confidence level ↵j .When experiment j ends, we must decideif we wish to report ✓j .This must be done such that the FCRis controlled at any time.
<latexit sha1_base64="5F3c2hZSIZVBX2jI9lT37fkUWZU=">AAAC33icbVJNb9NAEF2brxK+Ahy5jGiQOKDI7gWOFZUQx4KaplIcRev1ON52vWvtjluiKBcuHECIK3+LG3+EM+MmSKXtSLae570Zz77ZvDE6UJL8juIbN2/dvrN1t3fv/oOHj/qPnxwG13qFI+WM80e5DGi0xRFpMnjUeJR1bnCcn+x1/PgUfdDOHtCiwWkt51aXWkni1Kz/J7NO2wItwbhCC/ipQa/r7ntwPIBA0lN4BWcIdRsIZAh6brMMehKYmiOBcrbU3EAhGDxFA4NMmqaSs+PBMMt613VFW1zoWaDiepbqssud6VABOfDYOM/6jCqkf90OKh3WVTlC4SxCaBXLK0n8Qni397EbjkU8FnlnDBbAXIZ1Uy2lXayAeI7hrL+dDJPzgKsg3YBtsYn9Wf9XVjjVdkdQhl2YpElD0yW7o5XBVS9rAzZSncg5ThhaWWOYLs/3s4IXnCmgdJ4f2znG2YsVS1mHsKhzVtaSqnCZ65LXcZOWyjfTpbZNS7yA9Y/K1nT2dcuGQntUZBYMpPKaZwVVSS8V8ZXosQnp5SNfBYc7wzQZph92tnffbuzYEs/Ec/FSpOK12BXvxb4YCRVl0efoa/QtlvGX+Hv8Yy2No03NU/FfxD//Aijn46o=</latexit><latexit sha1_base64="5F3c2hZSIZVBX2jI9lT37fkUWZU=">AAAC33icbVJNb9NAEF2brxK+Ahy5jGiQOKDI7gWOFZUQx4KaplIcRev1ON52vWvtjluiKBcuHECIK3+LG3+EM+MmSKXtSLae570Zz77ZvDE6UJL8juIbN2/dvrN1t3fv/oOHj/qPnxwG13qFI+WM80e5DGi0xRFpMnjUeJR1bnCcn+x1/PgUfdDOHtCiwWkt51aXWkni1Kz/J7NO2wItwbhCC/ipQa/r7ntwPIBA0lN4BWcIdRsIZAh6brMMehKYmiOBcrbU3EAhGDxFA4NMmqaSs+PBMMt613VFW1zoWaDiepbqssud6VABOfDYOM/6jCqkf90OKh3WVTlC4SxCaBXLK0n8Qni397EbjkU8FnlnDBbAXIZ1Uy2lXayAeI7hrL+dDJPzgKsg3YBtsYn9Wf9XVjjVdkdQhl2YpElD0yW7o5XBVS9rAzZSncg5ThhaWWOYLs/3s4IXnCmgdJ4f2znG2YsVS1mHsKhzVtaSqnCZ65LXcZOWyjfTpbZNS7yA9Y/K1nT2dcuGQntUZBYMpPKaZwVVSS8V8ZXosQnp5SNfBYc7wzQZph92tnffbuzYEs/Ec/FSpOK12BXvxb4YCRVl0efoa/QtlvGX+Hv8Yy2No03NU/FfxD//Aijn46o=</latexit><latexit sha1_base64="5F3c2hZSIZVBX2jI9lT37fkUWZU=">AAAC33icbVJNb9NAEF2brxK+Ahy5jGiQOKDI7gWOFZUQx4KaplIcRev1ON52vWvtjluiKBcuHECIK3+LG3+EM+MmSKXtSLae570Zz77ZvDE6UJL8juIbN2/dvrN1t3fv/oOHj/qPnxwG13qFI+WM80e5DGi0xRFpMnjUeJR1bnCcn+x1/PgUfdDOHtCiwWkt51aXWkni1Kz/J7NO2wItwbhCC/ipQa/r7ntwPIBA0lN4BWcIdRsIZAh6brMMehKYmiOBcrbU3EAhGDxFA4NMmqaSs+PBMMt613VFW1zoWaDiepbqssud6VABOfDYOM/6jCqkf90OKh3WVTlC4SxCaBXLK0n8Qni397EbjkU8FnlnDBbAXIZ1Uy2lXayAeI7hrL+dDJPzgKsg3YBtsYn9Wf9XVjjVdkdQhl2YpElD0yW7o5XBVS9rAzZSncg5ThhaWWOYLs/3s4IXnCmgdJ4f2znG2YsVS1mHsKhzVtaSqnCZ65LXcZOWyjfTpbZNS7yA9Y/K1nT2dcuGQntUZBYMpPKaZwVVSS8V8ZXosQnp5SNfBYc7wzQZph92tnffbuzYEs/Ec/FSpOK12BXvxb4YCRVl0efoa/QtlvGX+Hv8Yy2No03NU/FfxD//Aijn46o=</latexit><latexit sha1_base64="5F3c2hZSIZVBX2jI9lT37fkUWZU=">AAAC33icbVJNb9NAEF2brxK+Ahy5jGiQOKDI7gWOFZUQx4KaplIcRev1ON52vWvtjluiKBcuHECIK3+LG3+EM+MmSKXtSLae570Zz77ZvDE6UJL8juIbN2/dvrN1t3fv/oOHj/qPnxwG13qFI+WM80e5DGi0xRFpMnjUeJR1bnCcn+x1/PgUfdDOHtCiwWkt51aXWkni1Kz/J7NO2wItwbhCC/ipQa/r7ntwPIBA0lN4BWcIdRsIZAh6brMMehKYmiOBcrbU3EAhGDxFA4NMmqaSs+PBMMt613VFW1zoWaDiepbqssud6VABOfDYOM/6jCqkf90OKh3WVTlC4SxCaBXLK0n8Qni397EbjkU8FnlnDBbAXIZ1Uy2lXayAeI7hrL+dDJPzgKsg3YBtsYn9Wf9XVjjVdkdQhl2YpElD0yW7o5XBVS9rAzZSncg5ThhaWWOYLs/3s4IXnCmgdJ4f2znG2YsVS1mHsKhzVtaSqnCZ65LXcZOWyjfTpbZNS7yA9Y/K1nT2dcuGQntUZBYMpPKaZwVVSS8V8ZXosQnp5SNfBYc7wzQZph92tnffbuzYEs/Ec/FSpOK12BXvxb4YCRVl0efoa/QtlvGX+Hv8Yy2No03NU/FfxD//Aijn46o=</latexit>
![Page 208: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/208.jpg)
Can we control FCR *online*?
When experiment j starts, we must assigna target confidence level ↵j .When experiment j ends, we must decideif we wish to report ✓j .This must be done such that the FCRis controlled at any time.
<latexit sha1_base64="5F3c2hZSIZVBX2jI9lT37fkUWZU=">AAAC33icbVJNb9NAEF2brxK+Ahy5jGiQOKDI7gWOFZUQx4KaplIcRev1ON52vWvtjluiKBcuHECIK3+LG3+EM+MmSKXtSLae570Zz77ZvDE6UJL8juIbN2/dvrN1t3fv/oOHj/qPnxwG13qFI+WM80e5DGi0xRFpMnjUeJR1bnCcn+x1/PgUfdDOHtCiwWkt51aXWkni1Kz/J7NO2wItwbhCC/ipQa/r7ntwPIBA0lN4BWcIdRsIZAh6brMMehKYmiOBcrbU3EAhGDxFA4NMmqaSs+PBMMt613VFW1zoWaDiepbqssud6VABOfDYOM/6jCqkf90OKh3WVTlC4SxCaBXLK0n8Qni397EbjkU8FnlnDBbAXIZ1Uy2lXayAeI7hrL+dDJPzgKsg3YBtsYn9Wf9XVjjVdkdQhl2YpElD0yW7o5XBVS9rAzZSncg5ThhaWWOYLs/3s4IXnCmgdJ4f2znG2YsVS1mHsKhzVtaSqnCZ65LXcZOWyjfTpbZNS7yA9Y/K1nT2dcuGQntUZBYMpPKaZwVVSS8V8ZXosQnp5SNfBYc7wzQZph92tnffbuzYEs/Ec/FSpOK12BXvxb4YCRVl0efoa/QtlvGX+Hv8Yy2No03NU/FfxD//Aijn46o=</latexit><latexit sha1_base64="5F3c2hZSIZVBX2jI9lT37fkUWZU=">AAAC33icbVJNb9NAEF2brxK+Ahy5jGiQOKDI7gWOFZUQx4KaplIcRev1ON52vWvtjluiKBcuHECIK3+LG3+EM+MmSKXtSLae570Zz77ZvDE6UJL8juIbN2/dvrN1t3fv/oOHj/qPnxwG13qFI+WM80e5DGi0xRFpMnjUeJR1bnCcn+x1/PgUfdDOHtCiwWkt51aXWkni1Kz/J7NO2wItwbhCC/ipQa/r7ntwPIBA0lN4BWcIdRsIZAh6brMMehKYmiOBcrbU3EAhGDxFA4NMmqaSs+PBMMt613VFW1zoWaDiepbqssud6VABOfDYOM/6jCqkf90OKh3WVTlC4SxCaBXLK0n8Qni397EbjkU8FnlnDBbAXIZ1Uy2lXayAeI7hrL+dDJPzgKsg3YBtsYn9Wf9XVjjVdkdQhl2YpElD0yW7o5XBVS9rAzZSncg5ThhaWWOYLs/3s4IXnCmgdJ4f2znG2YsVS1mHsKhzVtaSqnCZ65LXcZOWyjfTpbZNS7yA9Y/K1nT2dcuGQntUZBYMpPKaZwVVSS8V8ZXosQnp5SNfBYc7wzQZph92tnffbuzYEs/Ec/FSpOK12BXvxb4YCRVl0efoa/QtlvGX+Hv8Yy2No03NU/FfxD//Aijn46o=</latexit><latexit sha1_base64="5F3c2hZSIZVBX2jI9lT37fkUWZU=">AAAC33icbVJNb9NAEF2brxK+Ahy5jGiQOKDI7gWOFZUQx4KaplIcRev1ON52vWvtjluiKBcuHECIK3+LG3+EM+MmSKXtSLae570Zz77ZvDE6UJL8juIbN2/dvrN1t3fv/oOHj/qPnxwG13qFI+WM80e5DGi0xRFpMnjUeJR1bnCcn+x1/PgUfdDOHtCiwWkt51aXWkni1Kz/J7NO2wItwbhCC/ipQa/r7ntwPIBA0lN4BWcIdRsIZAh6brMMehKYmiOBcrbU3EAhGDxFA4NMmqaSs+PBMMt613VFW1zoWaDiepbqssud6VABOfDYOM/6jCqkf90OKh3WVTlC4SxCaBXLK0n8Qni397EbjkU8FnlnDBbAXIZ1Uy2lXayAeI7hrL+dDJPzgKsg3YBtsYn9Wf9XVjjVdkdQhl2YpElD0yW7o5XBVS9rAzZSncg5ThhaWWOYLs/3s4IXnCmgdJ4f2znG2YsVS1mHsKhzVtaSqnCZ65LXcZOWyjfTpbZNS7yA9Y/K1nT2dcuGQntUZBYMpPKaZwVVSS8V8ZXosQnp5SNfBYc7wzQZph92tnffbuzYEs/Ec/FSpOK12BXvxb4YCRVl0efoa/QtlvGX+Hv8Yy2No03NU/FfxD//Aijn46o=</latexit><latexit sha1_base64="5F3c2hZSIZVBX2jI9lT37fkUWZU=">AAAC33icbVJNb9NAEF2brxK+Ahy5jGiQOKDI7gWOFZUQx4KaplIcRev1ON52vWvtjluiKBcuHECIK3+LG3+EM+MmSKXtSLae570Zz77ZvDE6UJL8juIbN2/dvrN1t3fv/oOHj/qPnxwG13qFI+WM80e5DGi0xRFpMnjUeJR1bnCcn+x1/PgUfdDOHtCiwWkt51aXWkni1Kz/J7NO2wItwbhCC/ipQa/r7ntwPIBA0lN4BWcIdRsIZAh6brMMehKYmiOBcrbU3EAhGDxFA4NMmqaSs+PBMMt613VFW1zoWaDiepbqssud6VABOfDYOM/6jCqkf90OKh3WVTlC4SxCaBXLK0n8Qni397EbjkU8FnlnDBbAXIZ1Uy2lXayAeI7hrL+dDJPzgKsg3YBtsYn9Wf9XVjjVdkdQhl2YpElD0yW7o5XBVS9rAzZSncg5ThhaWWOYLs/3s4IXnCmgdJ4f2znG2YsVS1mHsKhzVtaSqnCZ65LXcZOWyjfTpbZNS7yA9Y/K1nT2dcuGQntUZBYMpPKaZwVVSS8V8ZXosQnp5SNfBYc7wzQZph92tnffbuzYEs/Ec/FSpOK12BXvxb4YCRVl0efoa/QtlvGX+Hv8Yy2No03NU/FfxD//Aijn46o=</latexit>
Weinstein, Ramdas ‘19
![Page 209: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/209.jpg)
A simple solution for both online FDR control, andonline FCR control
![Page 210: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/210.jpg)
Online FCR control: the main idea
Maintain [FCP(T ) :=PT
i=1 ↵i
1 _PT
i=1 Si
↵.<latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit>
Let Si 2 {0, 1} denote the selection decisionmade after experiment i.
<latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit>
![Page 211: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/211.jpg)
Online FCR control: the main idea
Weinstein & Ramdas ’19
Maintain [FCP(T ) :=PT
i=1 ↵i
1 _PT
i=1 Si
↵.<latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit>
Let Si 2 {0, 1} denote the selection decisionmade after experiment i.
<latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit>
![Page 212: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/212.jpg)
Online FCR control: the main idea
Weinstein & Ramdas ’19
Maintain [FCP(T ) :=PT
i=1 ↵i
1 _PT
i=1 Si
↵.<latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit>
Let Si 2 {0, 1} denote the selection decisionmade after experiment i.
<latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit>
For testing, let Ri ∈ {0,1} represent a rejection.
Then maintain ̂FDP (T ) :=∑T
i=1 αi
1 ∨ ∑Ti=1 Ri
≤ α .
![Page 213: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/213.jpg)
Online FCR control: the main idea
Weinstein & Ramdas ’19
Maintain [FCP(T ) :=PT
i=1 ↵i
1 _PT
i=1 Si
↵.<latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit><latexit sha1_base64="h2Dsi8pb9WFrVY+BGXU36brflXA=">AAACVnicbVFNa9wwEJWdpkm3X0567EV0KaSXxS6BhkAgNFB6KWzpbhJYb81YO86KSLIjjdMsxn+yvbQ/pZdS7a4PTdKBgcd7b0bSU14p6SiOfwXhxoPNh1vbj3qPnzx99jza2T11ZW0FjkWpSnueg0MlDY5JksLzyiLoXOFZfnmy1M+u0TpZmhEtKpxquDCykALIU1mkU8Ibaj6BNOSbtzz9Jmc4B2rWyoeTYdvujd4cHqWFBdGkrtZZI4+S9uuIp6CqOWSSt03C02tEfkv+kkm/T+FVZxxkUT8exKvi90HSgT7raphF39NZKWqNhoQC5yZJXNG0AUtSKGx7ae2wAnEJFzjx0IBGN21WsbT8tWdmvCitb0N8xf470YB2bqFz79RAc3dXW5L/0yY1FQfTRpqqJjRifVBRK04lX2bMZ9KiILXwAISV/q5czMGnR/4nej6E5O6T74PTt4MkHiSf9/vH77s4ttlL9ortsYS9Y8fsIxuyMRPsB/sdhMFG8DP4E26GW2trGHQzL9itCqO/IGOz7g==</latexit>
Let Si 2 {0, 1} denote the selection decisionmade after experiment i.
<latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit><latexit sha1_base64="GOHWK6b8TQqolUAuucVsdwvzEVA=">AAACRHicbVA9T8MwEHX4pnwVGFlOtEgMqEpYYESwMDCAoAWpqSrHuVALx45sB1FF/XEs/AA2fgELAwixIpy2A18nWXp67+7d+UWZ4Mb6/pM3MTk1PTM7N19ZWFxaXqmurrWMyjXDJlNC6auIGhRcYtNyK/Aq00jTSOBldHNU6pe3qA1X8sL2M+yk9FryhDNqHdWttkOpuIxRWjhBC/XzLoeQSwgLfycIB3VwkrIItofgtiArxxzJeGkZhlBJaYxAE4sa8C5DzdPSrM7rjW615jf8YcFfEIxBjYzrtFt9DGPF8tKACWpMO/Az2ymotpwJHFTC3GBG2Q29xraDkqZoOsUwhAFsOSaGRGn33AFD9vtEQVNj+mnkOlNqe+a3VpL/ae3cJvudgssstyjZaFGSC7AKykQh5trFIvoOUKa5uxVYj2rKXCSm4kIIfn/5L2jtNgK/EZzt1g4Ox3HMkQ2ySbZJQPbIATkmp6RJGLknz+SVvHkP3ov37n2MWie88cw6+VHe5xcxvK/J</latexit>
For testing, let Ri ∈ {0,1} represent a rejection.
Then maintain ̂FDP (T ) :=∑T
i=1 αi
1 ∨ ∑Ti=1 Ri
≤ α .
This provably controls FCR/FDR at level α .
![Page 214: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/214.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
![Page 215: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/215.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
α1
![Page 216: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/216.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
α1
α2
![Page 217: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/217.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealth
α1
α2
α3
![Page 218: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/218.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealthSelections
earn wealth
α1
α2
α3
α4
![Page 219: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/219.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealthSelections
earn wealth
α1
α2
α3
α4
![Page 220: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/220.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealthSelections
earn wealth
Error budgetis data-dependent
α1
α2
α3
α4
α5
![Page 221: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/221.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealthSelections
earn wealth
Error budgetis data-dependent
Infinite process
α1
α2
α3
α4
α5
α6
![Page 222: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/222.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealthSelections
earn wealth
Error budgetis data-dependent
Infinite process
α1
α2
α3
α4
α5
α6
![Page 223: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/223.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealthSelections
earn wealth
Error budgetis data-dependent
Infinite process
α1
α2
α3
α4
α5
α6
![Page 224: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/224.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealthSelections
earn wealth
Error budgetis data-dependent
Infinite process
α1
α2
α3
α4
α5
α6
![Page 225: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/225.jpg)
Online FCR control : high-level picture
Remaining error budget or “alpha-wealth”
Error budget for first expt.
Error budget for second expt.
Expts. use wealthSelections
earn wealth
Error budgetis data-dependent
Infinite process
α1
α2
α3
α4
α5
α6
![Page 226: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/226.jpg)
Summary of this section
![Page 227: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/227.jpg)
Summary of this section
even if hypotheses, data, p-values are independent.8t, errort ↵ does not imply 8t, FDR(t) ↵,
![Page 228: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/228.jpg)
Summary of this section
even if hypotheses, data, p-values are independent.
Can track a running estimate of the FDP (or FCP):a simple update rule to keep this estimate boundedalso results in the FDR (or FCR) being controlled.
8t, errort ↵ does not imply 8t, FDR(t) ↵,
![Page 229: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/229.jpg)
Handling local dependence
Most online FDR algorithms assume independent p-values(but hypotheses can be dependent).
![Page 230: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/230.jpg)
Handling local dependence
Most online FDR algorithms assume independent p-values(but hypotheses can be dependent).
However, assuming arbitrary dependence between all p-valuesis also extremely pessimistic and unrealistic.
![Page 231: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/231.jpg)
Handling local dependence
Most online FDR algorithms assume independent p-values(but hypotheses can be dependent).
However, assuming arbitrary dependence between all p-valuesis also extremely pessimistic and unrealistic.
A middle ground is a flexible notion of local dependence:Pt arbitrarily depends on the previous Lt p-values,where Lt is a user-chosen lag-parameter .
![Page 232: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/232.jpg)
Handling local dependence
Zrnic, Ramdas, Jordan ’18
Most online FDR algorithms assume independent p-values(but hypotheses can be dependent).
However, assuming arbitrary dependence between all p-valuesis also extremely pessimistic and unrealistic.
A middle ground is a flexible notion of local dependence:Pt arbitrarily depends on the previous Lt p-values,where Lt is a user-chosen lag-parameter .
The online FDR and FCR algorithms can be easily modifiedto handle local dependence.
![Page 233: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/233.jpg)
Solutions for these issues
![Page 234: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/234.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Solutions for these issues
Part I
![Page 235: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/235.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Outer sequential process:
“false coverage rate” for estimation (correspondingly, “false discovery rate” for testing)
Solutions for these issues
Part II
Part I
![Page 236: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/236.jpg)
Inner sequential process:
“confidence sequence” for estimation also called “anytime confidence intervals” (correspondingly, “always valid p-values” for testing)
Outer sequential process:
“false coverage rate” for estimation (correspondingly, “false discovery rate” for testing)
Solutions for these issues
Part II
Part I
Modular solutions: fit well together Many extensions to each piece
Part III
![Page 237: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/237.jpg)
Putting the modular pieces together: the doubly-sequential process
[Next 10 mins]
![Page 238: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/238.jpg)
Putting the modular pieces together: the doubly-sequential process
[Next 10 mins]
Part III
![Page 239: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/239.jpg)
Combining inner and outer solutions (FCR):
α1
α2
α3
αi
α4
α5
![Page 240: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/240.jpg)
Combining inner and outer solutions (FCR):
α1
α2
α3
αi
α4
α5
(a) Online FCR method assigns αi when expt. starts
![Page 241: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/241.jpg)
Combining inner and outer solutions (FCR):
α1
α2
α3
αi
α4
α5
(a) Online FCR method assigns αi when expt. starts(b) We keep track of (1 − αi) confidence sequence
![Page 242: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/242.jpg)
Combining inner and outer solutions (FCR):
α1
α2
α3
αi
α4
α5
(a) Online FCR method assigns αi when expt. starts(b) We keep track of (1 − αi) confidence sequence(c) Adaptively decide to stop, to report final CI or not
![Page 243: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/243.jpg)
Combining inner and outer solutions (FCR):
α1
α2
α3
αi
α4
α5
(a) Online FCR method assigns αi when expt. starts(b) We keep track of (1 − αi) confidence sequence(c) Adaptively decide to stop, to report final CI or not(d) Guarantee FCR(T) ≤ α at any time positive time T
![Page 244: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/244.jpg)
Time / Samples
Exp.
Combining inner and outer solutions (FCR):
α1
α2
α3
αi
α4
α5
(a) Online FCR method assigns αi when expt. starts(b) We keep track of (1 − αi) confidence sequence(c) Adaptively decide to stop, to report final CI or not(d) Guarantee FCR(T) ≤ α at any time positive time T
![Page 245: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/245.jpg)
Time / Samples
Exp.
Combining inner and outer solutions (FCR):
α1
α2
α3
αi
α4
α5
(a) Online FCR method assigns αi when expt. starts(b) We keep track of (1 − αi) confidence sequence(c) Adaptively decide to stop, to report final CI or not(d) Guarantee FCR(T) ≤ α at any time positive time T
![Page 246: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/246.jpg)
Time / Samples
Exp.
Combining inner and outer solutions (FDR):
α1
α2
α3
αi
α4
α5
![Page 247: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/247.jpg)
Time / Samples
Exp.
Combining inner and outer solutions (FDR):
α1
α2
α3
αi
α4
α5
(a) Online FDR method assigns αi when expt. starts
![Page 248: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/248.jpg)
Time / Samples
Exp.
Combining inner and outer solutions (FDR):
α1
α2
α3
αi
α4
α5
(a) Online FDR method assigns αi when expt. starts(b) We keep track of anytime p-value P(n)
i
![Page 249: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/249.jpg)
Time / Samples
Exp.
Combining inner and outer solutions (FDR):
α1
α2
α3
αi
α4
α5
(a) Online FDR method assigns αi when expt. starts(b) We keep track of anytime p-value P(n)
i(c) Adaptively stop at time τ, report discovery if P(τ)
i ≤ αi
![Page 250: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/250.jpg)
Time / Samples
Exp.
Combining inner and outer solutions (FDR):
α1
α2
α3
αi
α4
α5
(a) Online FDR method assigns αi when expt. starts(b) We keep track of anytime p-value P(n)
i(c) Adaptively stop at time τ, report discovery if P(τ)
i ≤ αi(d) Guarantee FDR(T) ≤ α at any time positive time T
![Page 251: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/251.jpg)
PART IV: Advanced topics (inner sequential process)
[Next 25 mins]
![Page 252: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/252.jpg)
Much more traffic needed by an A/B/n test
1. What if we are testing more than one alternative?
![Page 253: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/253.jpg)
1. Multi-armed bandits for hypothesis testing
What would you do?
![Page 254: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/254.jpg)
1. Multi-armed bandits for hypothesis testing
What would you do?Depends on the aim: minimize regret OR identify best arm?We would like to test null hypothesis
H0 : μA ≥ max{μB, μC} .
![Page 255: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/255.jpg)
1. Multi-armed bandits for hypothesis testing
What would you do?Depends on the aim: minimize regret OR identify best arm?We would like to test null hypothesis
Can design variant of UCB algorithms to define anytime p-value, with optimal sample complexity for high power.
H0 : μA ≥ max{μB, μC} .
![Page 256: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/256.jpg)
1. Multi-armed bandits for hypothesis testing
Yang, Ramdas, Jamieson, Wainwright ’17
What would you do?Depends on the aim: minimize regret OR identify best arm?We would like to test null hypothesis
Can design variant of UCB algorithms to define anytime p-value, with optimal sample complexity for high power.
H0 : μA ≥ max{μB, μC} .
![Page 257: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/257.jpg)
MAB-FDR meta algorithm
𝛼𝛼𝑗𝑗 𝑅𝑅𝑗𝑗 (𝛼𝛼𝑗𝑗)
Exp j
MAB Test𝑝𝑝𝑗𝑗 < 𝛼𝛼𝑗𝑗
𝑝𝑝𝑗𝑗 (𝛼𝛼𝑗𝑗)
𝛼𝛼j+1 𝑅𝑅j+1 (𝛼𝛼j+1)
Exp j+1
MAB Test𝑝𝑝j+1 < 𝛼𝛼j+1
𝑝𝑝j+1 (𝛼𝛼j+1)
Online FDR procedure
……
desired FDR level 𝛼𝛼
![Page 258: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/258.jpg)
2. Switch from estimating means to quantiles?
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
![Page 259: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/259.jpg)
2. Switch from estimating means to quantiles?
Howard, Ramdas ‘19
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
![Page 260: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/260.jpg)
2. Switch from estimating means to quantiles?
Howard, Ramdas ‘19
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
Reasons to use quantiles include:
![Page 261: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/261.jpg)
2. Switch from estimating means to quantiles?
Howard, Ramdas ‘19
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
Reasons to use quantiles include:Quantiles always exist for any distribution, while means (moments) do not always exist (eg: Cauchy).
![Page 262: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/262.jpg)
2. Switch from estimating means to quantiles?
Howard, Ramdas ‘19
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
Reasons to use quantiles include:Quantiles always exist for any distribution, while means (moments) do not always exist (eg: Cauchy).Quantiles can be defined for any totally ordered space,eg: ratings A-F, where “distance between ratings” undefined.
![Page 263: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/263.jpg)
2. Switch from estimating means to quantiles?
Howard, Ramdas ‘19
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
Reasons to use quantiles include:Quantiles always exist for any distribution, while means (moments) do not always exist (eg: Cauchy).Quantiles can be defined for any totally ordered space,eg: ratings A-F, where “distance between ratings” undefined.Estimating quantiles can be done sequentially, withoutany tail assumptions, unlike estimating means.
![Page 264: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/264.jpg)
2. Switch from estimating means to quantiles?
Howard, Ramdas ‘19
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
Reasons to use quantiles include:Quantiles always exist for any distribution, while means (moments) do not always exist (eg: Cauchy).Quantiles can be defined for any totally ordered space,eg: ratings A-F, where “distance between ratings” undefined.Estimating quantiles can be done sequentially, withoutany tail assumptions, unlike estimating means.Can run A/B tests and get always valid p-valuesfor testing the difference in quantiles.
![Page 265: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/265.jpg)
2. Switch from estimating means to quantiles?
Howard, Ramdas ‘19
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
Reasons to use quantiles include:Quantiles always exist for any distribution, while means (moments) do not always exist (eg: Cauchy).Quantiles can be defined for any totally ordered space,eg: ratings A-F, where “distance between ratings” undefined.Estimating quantiles can be done sequentially, withoutany tail assumptions, unlike estimating means.Can run A/B tests and get always valid p-valuesfor testing the difference in quantiles.Can run bandit experiments, including best-arm identification.
![Page 266: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/266.jpg)
2. Switch from estimating means to quantiles?
Howard, Ramdas ‘19
Let X ∼ F . Define the α-quantile as qα := sup{x : F(x) ≤ α} .
(hence q1/2 is the median)
Reasons to use quantiles include:Quantiles always exist for any distribution, while means (moments) do not always exist (eg: Cauchy).Quantiles can be defined for any totally ordered space,eg: ratings A-F, where “distance between ratings” undefined.Estimating quantiles can be done sequentially, withoutany tail assumptions, unlike estimating means.Can run A/B tests and get always valid p-valuesfor testing the difference in quantiles.Can run bandit experiments, including best-arm identification.
Can also estimate all quantiles simultaneously!
![Page 267: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/267.jpg)
2. Quantiles are informative for heavy tails
Possible observations
Prob.density
q0 q1/2 q0.8
Mean = + ∞
Eg: amount of time spent on Reddit
(heavy right tail)
![Page 268: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/268.jpg)
2. The mean need not even exist (eg: Cauchy)
Prob.density
q0.3 q1/2 q0.8
Mean is undefined.
Eg: amount of money won/lost in a casino
(heavy right tail)(heavy left tail)
![Page 269: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/269.jpg)
2. The mean need not even exist (eg: Cauchy)
Prob.density
q0.3 q1/2 q0.8
Mean is undefined.
Eg: amount of money won/lost in a casino
Do not need to resort to trimming “outliers”.(How to pick threshold? Throw away or cap?)
(heavy right tail)(heavy left tail)
![Page 270: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/270.jpg)
2. The same could arise in discrete settings
Prob.mass
q0.8q1/2
Mean = + ∞
Eg: number of links clicked
…
pk ∝ 1/k2 ⟹
![Page 271: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/271.jpg)
2. Quantile sensible in totally ordered settings
Prob.mass
q0.8q1/2
Mean is undefined.
Eg: grades or non-numerical ratings
…A < B < C < D < E < …
![Page 272: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/272.jpg)
2. Quantile sensible in totally ordered settings
Prob.mass
q0.8q1/2
Mean is undefined.
Eg: grades or non-numerical ratings
…A < B < C < D < E < …
Do not need to artificially assign numerical values.(Are they equally spaced? Spacing and start point matter.)
![Page 273: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/273.jpg)
2. A/B testing with quantiles
H0 : q0.9(A) = q0.9(B)
First pick target quantile α (say 0.9).
H1 : q0.9(A) < q0.9(B)
![Page 274: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/274.jpg)
2. A/B testing with quantiles
H0 : q0.9(A) = q0.9(B)
First pick target quantile α (say 0.9).
H1 : q0.9(A) < q0.9(B)
Howard, Ramdas ‘19
![Page 275: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/275.jpg)
2. A/B testing with quantiles
H0 : q0.9(A) = q0.9(B)
First pick target quantile α (say 0.9).
H1 : q0.9(A) < q0.9(B)
Howard, Ramdas ‘19
Can construct always valid p-value.
![Page 276: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/276.jpg)
2. A/B testing with quantiles
H0 : q0.9(A) = q0.9(B)
First pick target quantile α (say 0.9).
H1 : q0.9(A) < q0.9(B)
Howard, Ramdas ‘19
If numerical, can construct confidence sequence for q0.9(B) − q0.9(A) .
Can construct always valid p-value.
![Page 277: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/277.jpg)
2. A/B testing with quantiles
H0 : q0.9(A) = q0.9(B)
First pick target quantile α (say 0.9).
H1 : q0.9(A) < q0.9(B)
Howard, Ramdas ‘19
If numerical, can construct confidence sequence for q0.9(B) − q0.9(A) .
Can construct always valid p-value.
(In that case, one way to define sequential p-value is the smallest δ such that the (1 − δ) CS overlaps with ℝ−
0 .)
![Page 278: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/278.jpg)
2. Best-arm identification with quantiles
Howard, Ramdas ‘19
…
Which arm has the highest 80% quantile?
![Page 279: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/279.jpg)
2. Best-arm identification with quantiles
Howard, Ramdas ‘19
…
Which arm has the highest 80% quantile?
Can design MAB algorithms to adaptively determinethe “best” arm with a prescribed failure probability.
![Page 280: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/280.jpg)
2. Best-arm identification with quantiles
Howard, Ramdas ‘19
…
Which arm has the highest 80% quantile?
Can design MAB algorithms to adaptively determinethe “best” arm with a prescribed failure probability.
If the first arm is “special”, can design MAB algorithms to adaptivelytest the null hypothesis that A is best, and get a sequential p-value.
![Page 281: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/281.jpg)
3. Running intersections or minimums: pros/cons
Fact 1: if P(n) is an anytime p-value, so is minm≤n
P(m) .
Fact 2: if (L(n), U(n)) is a confidence sequence, so is ⋂m≤n
(L(n), U(n)) .
![Page 282: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/282.jpg)
3. Running intersections or minimums: pros/cons
Howard, Ramdas, McAuliffe, Sekhon ’19
Fact 1: if P(n) is an anytime p-value, so is minm≤n
P(m) .
Fact 2: if (L(n), U(n)) is a confidence sequence, so is ⋂m≤n
(L(n), U(n)) .
![Page 283: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/283.jpg)
3. Running intersections or minimums: pros/cons
Howard, Ramdas, McAuliffe, Sekhon ’19
Fact 1: if P(n) is an anytime p-value, so is minm≤n
P(m) .
Fact 2: if (L(n), U(n)) is a confidence sequence, so is ⋂m≤n
(L(n), U(n)) .
Pro of taking running intersections of CIs : Smaller width, hence tighter inference, without inflating error.
![Page 284: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/284.jpg)
3. Running intersections or minimums: pros/cons
Howard, Ramdas, McAuliffe, Sekhon ’19
Fact 1: if P(n) is an anytime p-value, so is minm≤n
P(m) .
Fact 2: if (L(n), U(n)) is a confidence sequence, so is ⋂m≤n
(L(n), U(n)) .
Pro of taking running intersections of CIs : Smaller width, hence tighter inference, without inflating error.
Con of taking running intersections of CIs : Can have intervals of decreasing width (great!) and then in thenext step, end up with an empty interval (disconcerting).
![Page 285: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/285.jpg)
3. Running intersections or minimums: pros/cons
Howard, Ramdas, McAuliffe, Sekhon ’19
Fact 1: if P(n) is an anytime p-value, so is minm≤n
P(m) .
Fact 2: if (L(n), U(n)) is a confidence sequence, so is ⋂m≤n
(L(n), U(n)) .
Pro of taking running intersections of CIs : Smaller width, hence tighter inference, without inflating error.
Con of taking running intersections of CIs : Can have intervals of decreasing width (great!) and then in thenext step, end up with an empty interval (disconcerting).
Pro of ending up with zero width : “Failing loudly”: you know you’re in the low-probability errorevent, or assumptions have been violated.
![Page 286: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/286.jpg)
Can change with time!(eg: keep groups balanced)
4. Sequential Average Treatment Effect estimation with adaptive randomization
Users of app or website
Sign up!
Sign up!……
50% 50%
A B
![Page 287: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/287.jpg)
Can change with time!(eg: keep groups balanced)
Howard, Ramdas, McAuliffe, Sekhon ’19
Can infer the treatment effect sequentially (Neyman-Rubin potential outcomes model) using anytime p-value or CI.
4. Sequential Average Treatment Effect estimation with adaptive randomization
Users of app or website
Sign up!
Sign up!……
50% 50%
A B
![Page 288: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/288.jpg)
PART V: Advanced topics (outer sequential process)
[Next 15 mins]
![Page 289: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/289.jpg)
Recent tests may be more relevant than older ones, and hence we may wish to smoothly forget the past.
1. Smoothly forgetting the past
![Page 290: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/290.jpg)
Recent tests may be more relevant than older ones, and hence we may wish to smoothly forget the past.
With this motivation, we may wish to control the
decaying memory FDR: (user-chosen decay d < 1)
1. Smoothly forgetting the past
![Page 291: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/291.jpg)
Recent tests may be more relevant than older ones, and hence we may wish to smoothly forget the past.
With this motivation, we may wish to control the
decaying memory FDR: (user-chosen decay d < 1)
mem-FDR(T ) = EP
t dT�t1(false discoveryt)Pt d
T�t1(discoveryt)
�
Ramdas et al. ’17
1. Smoothly forgetting the past
![Page 292: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/292.jpg)
Recent tests may be more relevant than older ones, and hence we may wish to smoothly forget the past.
With this motivation, we may wish to control the
decaying memory FDR: (user-chosen decay d < 1)
mem-FDR(T ) = EP
t dT�t1(false discoveryt)Pt d
T�t1(discoveryt)
�
Ramdas et al. ’17
1. Smoothly forgetting the past
(similarly mem-FCR)
![Page 293: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/293.jpg)
2. Post-hoc analysis
![Page 294: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/294.jpg)
2. Post-hoc analysis
Katsevich, Ramdas ’18
![Page 295: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/295.jpg)
2. Post-hoc analysis
Katsevich, Ramdas ’18
What if you did not use an online FCR or FDR algorithm,but at the end of the year, you would like to answer “based on the decisions made and error levels used, how large could my FCR or FDR be?”
![Page 296: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/296.jpg)
2. Post-hoc analysis
Katsevich, Ramdas ’18
What if you did not use an online FCR or FDR algorithm,but at the end of the year, you would like to answer “based on the decisions made and error levels used, how large could my FCR or FDR be?”
FDPt ≤1 + ∑i≤t αi
∑i≤t Ri⋅
log(1/δ)log(1 + log(1/δ))
simultaneously for all t .
With probability at least 1 − δ we have
![Page 297: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/297.jpg)
2. Post-hoc analysis
Katsevich, Ramdas ’18
What if you did not use an online FCR or FDR algorithm,but at the end of the year, you would like to answer “based on the decisions made and error levels used, how large could my FCR or FDR be?”
FDPt ≤1 + ∑i≤t αi
∑i≤t Ri⋅
log(1/δ)log(1 + log(1/δ))
simultaneously for all t .
With probability at least 1 − δ we have
FCPt ≤1 + ∑i≤t αi
∑i≤t Si⋅
log(1/δ)log(1 + log(1/δ))
simultaneously for all t .
![Page 298: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/298.jpg)
3. Weighted error metrics and algorithms
![Page 299: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/299.jpg)
3. Weighted error metrics and algorithms
But, different experiments may have differing importances.The usual error metrics count all mistakes equally.
![Page 300: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/300.jpg)
3. Weighted error metrics and algorithms
Benjamini, Hochberg ’97Ramdas, Yang, Jordan, Wainwright ’17
But, different experiments may have differing importances.The usual error metrics count all mistakes equally.
Can define “weighted” variants of FDR and FCR in the natural way: weighted sums in numerator/denominator.
![Page 301: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/301.jpg)
3. Weighted error metrics and algorithms
Benjamini, Hochberg ’97Ramdas, Yang, Jordan, Wainwright ’17
But, different experiments may have differing importances.The usual error metrics count all mistakes equally.
Can define “weighted” variants of FDR and FCR in the natural way: weighted sums in numerator/denominator.
Online FDR and FCR algorithms can be extended to control weighted error metrics.
![Page 302: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/302.jpg)
4. False-sign rate
![Page 303: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/303.jpg)
4. False-sign rate
Sometimes, all we want is a “sign decision” about parameter:an output of +1 if treatment effect is +ve,and an output of -1 if treatment effect is -ve,or no output at all if it is uncertain.
![Page 304: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/304.jpg)
4. False-sign rate
Sometimes, all we want is a “sign decision” about parameter:an output of +1 if treatment effect is +ve,and an output of -1 if treatment effect is -ve,or no output at all if it is uncertain.
We may correspondingly define the false sign rate as
FSR := 𝔼 [ # incorrect sign decisions made# sign decisions made ]
![Page 305: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/305.jpg)
4. False-sign rate
Weinstein, Ramdas ’19
Sometimes, all we want is a “sign decision” about parameter:an output of +1 if treatment effect is +ve,and an output of -1 if treatment effect is -ve,or no output at all if it is uncertain.
We may correspondingly define the false sign rate as
FSR := 𝔼 [ # incorrect sign decisions made# sign decisions made ]
To control the FSR, just using the online FCR algorithm, and report the sign iff the CI does not contain zero.
![Page 306: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/306.jpg)
Open Problems [5 mins]
![Page 307: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/307.jpg)
1. Errors and incentives in large organizations
A large number of different teams run such A/B tests or randomized experiments
From the larger organization’s perspective, coordination is necessary to control FDR or FCR,since that might affect the bottom line of the company.
![Page 308: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/308.jpg)
1. Errors and incentives in large organizations
A large number of different teams run such A/B tests or randomized experiments
From the larger organization’s perspective, coordination is necessary to control FDR or FCR,since that might affect the bottom line of the company.
But each individual group or team might feel“why do we have to pay if some other groupis running lots of random tests/experiments”?
![Page 309: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/309.jpg)
1. Errors and incentives in large organizations
A large number of different teams run such A/B tests or randomized experiments
From the larger organization’s perspective, coordination is necessary to control FDR or FCR,since that might affect the bottom line of the company.
But each individual group or team might feel“why do we have to pay if some other groupis running lots of random tests/experiments”?
How do we align incentives? Should our notion of error be hierarchical?
![Page 310: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/310.jpg)
1. A hierarchical FDR or FCR control?
Company
. . .
Product 1(Group 1)
Product 2(Group 2)
Product 15(Group 15)
desires FDR ≤ 0.1
The average of group FDRs does not give company FDR.
FDR is additive in the worst case: if each group separately controls FDR at 0.1, the company FDR could be trivial.
![Page 311: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/311.jpg)
2. Utilizing contextual information
Often, we have contextual information abouteach visitor (sample), like age, gender, etc.These have been utilized for contextualbandit algorithms that minimize regret.
![Page 312: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/312.jpg)
2. Utilizing contextual information
Often, we have contextual information abouteach visitor (sample), like age, gender, etc.These have been utilized for contextualbandit algorithms that minimize regret.
Is such information useful for hypothesis testing?How do we use contextual bandits for hypothesis testing?
![Page 313: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/313.jpg)
3. Designing systems that fail loudly
When our assumptions are wrong, and the system is not behaving like intended or expected, howcan we automatically detect and report this?
![Page 314: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/314.jpg)
3. Designing systems that fail loudly
When our assumptions are wrong, and the system is not behaving like intended or expected, howcan we automatically detect and report this?
Is it possible to design such self-critical systemsthat “announce” failures?
![Page 315: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/315.jpg)
Summary [15 mins]
![Page 316: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/316.jpg)
A selective history (inner process)
Time
![Page 317: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/317.jpg)
A selective history (inner process)
Fisher (1925) null hypothesis testing basicsrandomization for causal inference
Time
![Page 318: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/318.jpg)
A selective history (inner process)
Fisher (1925) null hypothesis testing basicsrandomization for causal inferenceWald (1948)
sequential probability ratio test(the first always-valid p-values)
Time
![Page 319: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/319.jpg)
A selective history (inner process)
Fisher (1925) null hypothesis testing basicsrandomization for causal inferenceWald (1948)
sequential probability ratio test(the first always-valid p-values)
Robbins (1952) multi-armed bandits
Time
![Page 320: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/320.jpg)
A selective history (inner process)
Fisher (1925) null hypothesis testing basicsrandomization for causal inferenceWald (1948)
sequential probability ratio test(the first always-valid p-values)
Darling & Robbins (1967) confidence sequences
(the first always valid CIs)
Robbins (1952) multi-armed bandits
Time
![Page 321: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/321.jpg)
A selective history (inner process)
Fisher (1925) null hypothesis testing basicsrandomization for causal inferenceWald (1948)
sequential probability ratio test(the first always-valid p-values)
Darling & Robbins (1967) confidence sequences
(the first always valid CIs)
Robbins (1952) multi-armed bandits
Lai, Siegmund,… (1970s) confidence sequences, inference after stopping experiments
Time
![Page 322: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/322.jpg)
A selective history (inner process)
Fisher (1925) null hypothesis testing basicsrandomization for causal inferenceWald (1948)
sequential probability ratio test(the first always-valid p-values)
Darling & Robbins (1967) confidence sequences
(the first always valid CIs)
Jennison & Turnbull (1980s) group sequential methods
(peeking only 2 or 3 times)
Robbins (1952) multi-armed bandits
Lai, Siegmund,… (1970s) confidence sequences, inference after stopping experiments
Time
![Page 323: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/323.jpg)
A selective history (outer process)
Time
![Page 324: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/324.jpg)
A selective history (outer process)
Tukey (1953) an unpublished book on the problem of multiple comparisons
Time
![Page 325: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/325.jpg)
A selective history (outer process)
Tukey (1953) an unpublished book on the problem of multiple comparisons
Eklund & Seeger (1963) define false discovery proportion
suggested heuristic algorithm
Time
![Page 326: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/326.jpg)
A selective history (outer process)
Tukey (1953) an unpublished book on the problem of multiple comparisons
Eklund & Seeger (1963) define false discovery proportion
suggested heuristic algorithmBenjamini & Hochberg (1995) rediscovered Eklund-Seeger methodfirst proof of FDR control
Time
![Page 327: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/327.jpg)
A selective history (outer process)
Tukey (1953) an unpublished book on the problem of multiple comparisons
Eklund & Seeger (1963) define false discovery proportion
suggested heuristic algorithm
Benjamini & Yekutieli (2005) false coverage rate (FCR)first methods to control it
Benjamini & Hochberg (1995) rediscovered Eklund-Seeger methodfirst proof of FDR control
Time
![Page 328: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/328.jpg)
A selective history (outer process)
Tukey (1953) an unpublished book on the problem of multiple comparisons
Eklund & Seeger (1963) define false discovery proportion
suggested heuristic algorithm
Benjamini & Yekutieli (2005) false coverage rate (FCR)first methods to control it
Benjamini & Hochberg (1995) rediscovered Eklund-Seeger methodfirst proof of FDR control
Foster & Stine (2008) conceptualized online FDR control first method to control it
Time
![Page 329: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/329.jpg)
In this tutorial, you learnt the basics of
![Page 330: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/330.jpg)
In this tutorial, you learnt the basics of
• How to think about a single experimentA. Why peeking is an issue in practiceB. Why applying a t-test repeatedly inflates errorsC. Anytime confidence intervals and p-values
![Page 331: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/331.jpg)
In this tutorial, you learnt the basics of
• How to think about a single experimentA. Why peeking is an issue in practiceB. Why applying a t-test repeatedly inflates errorsC. Anytime confidence intervals and p-values
• How to think about a sequence of experiments A. Why selective reporting is an issue in practiceB. Why Benjamini-Hochberg fails in the online settingC. Online FCR and FDR controlling algorithms
![Page 332: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/332.jpg)
In this tutorial, you learnt the basics of
• How to think about a single experimentA. Why peeking is an issue in practiceB. Why applying a t-test repeatedly inflates errorsC. Anytime confidence intervals and p-values
• How to think about a sequence of experiments A. Why selective reporting is an issue in practiceB. Why Benjamini-Hochberg fails in the online settingC. Online FCR and FDR controlling algorithms
• How to think about doubly-sequential experimentationA. Using anytime CIs with online FCR controlB. Using anytime p-values with online FDR controlC. Handling asynchronous tests with local dependence
![Page 333: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/333.jpg)
You also learnt some advanced topics:
![Page 334: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/334.jpg)
You also learnt some advanced topics:• Within a single experiment:
A. Using bandits for hypothesis testingB. Quantiles can be estimated sequentiallyC. The pros and cons of running intersectionsD. SATE with adaptive randomization
![Page 335: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/335.jpg)
You also learnt some advanced topics:• Within a single experiment:
A. Using bandits for hypothesis testingB. Quantiles can be estimated sequentiallyC. The pros and cons of running intersectionsD. SATE with adaptive randomization
• Across experiments: A. Error metrics with decaying-memoryB. The false sign rateC. Weighted error metricsD. Post-hoc analysis
![Page 336: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/336.jpg)
You also learnt some advanced topics:• Within a single experiment:
A. Using bandits for hypothesis testingB. Quantiles can be estimated sequentiallyC. The pros and cons of running intersectionsD. SATE with adaptive randomization
• Across experiments: A. Error metrics with decaying-memoryB. The false sign rateC. Weighted error metricsD. Post-hoc analysis
• Open problems: A. Incentives/errors within hierarchical organizationsB. Utilizing contextual information for testingC. Designing systems that fail loudly
![Page 337: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/337.jpg)
SOFTWARE
• Within a single experiment:Python package called “confseq”Maintained by Steve Howard (Berkeley)Frequent updates + wrappers for months to come
• Across experiments:R package called “onlineFDR”Maintained by David Robertson (Cambridge)Frequent updates + wrappers for months to come
References and links atwww.stat.cmu.edu/~aramdas/kdd19/
![Page 338: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/338.jpg)
Collaborators from this talk
Martin Wainwright
Michael Jordan
TijanaZrnic
Steve Howard
JasjeetSekhon
Jon McAuliffe
Asaf Weinstein
Kevin Jamieson
Eugene Katsevich
Jinjin Tian
Akshay Balsubramani
DavidRobertson
Fanny Yang
![Page 339: Foundations of large-scale “doubly-sequential” experimentationstat.cmu.edu/~aramdas/kdd19/ramdas-kdd-tut.pdfLarge-scale A/B-testing or other related forms of randomized experimentation](https://reader035.fdocuments.us/reader035/viewer/2022071108/5fe36e4cb14ace304e6b8231/html5/thumbnails/339.jpg)
Foundations of large-scale“doubly-sequential” experimentation
Aaditya Ramdas
Assistant ProfessorDept. of Statistics and Data Science Machine Learning Dept.Carnegie Mellon University
www.stat.cmu.edu/~aramdas/kdd19/
(KDD tutorial in Anchorage, on 4 Aug 2019)
Thank you! Questions?Funding welcomed!