SAMPLE SIZE – The indispensable A/B test calculation that you’re not making
![Page 1: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/1.jpg)
Sample Size
The indispensable A/B test calculation
that you’re not making.
![Page 2: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/2.jpg)
As Marketers, many of us run A/B Tests
![Page 3: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/3.jpg)
We test copy
![Page 4: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/4.jpg)
We test design
![Page 5: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/5.jpg)
We test subject lines
![Page 6: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/6.jpg)
We choose winners
![Page 7: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/7.jpg)
Version A is converting better than Version B and statistical significance
has breached 95%.
So, Version A won.
![Page 8: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/8.jpg)
OR DID IT?
![Page 9: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/9.jpg)
That math is half-baked
![Page 10: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/10.jpg)
Suppose you check an A/B test twice: once after 200 impressions and again after 500.
Then you end the test.
![Page 11: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/11.jpg)
Now, instead, suppose you stop the test once you reach significance:
![Page 12: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/12.jpg)
Now, suppose you stop the experiment as soon
as there is a significant result:
FALSE POSITIVE!
![Page 13: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/13.jpg)
![Page 14: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/14.jpg)
How often will you get a false positive?
![Page 15: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/15.jpg)
Assuming you check results after every impression and stop once you reach significance…
26.1%
So you just went from 95% confidence to 74%.
This is a worst-case scenario. BUT, some test platforms do this automatically!
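To see why peeking inflates false positives, here is a minimal simulation sketch. It is not the calculation behind the 26.1% figure: the 20% conversion rate, the 500 visitors per version, the 20-visitor warm-up before the first peek, and the plain two-proportion z-test are all assumptions chosen for illustration. Both versions are identical, so every "significant" result is a false positive.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for two versions with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 0.0  # no variation yet, nothing to compare
    return (conv_a / n - conv_b / n) / se


def simulate_peeking(n_experiments=400, n_per_version=500, rate=0.2,
                     min_n=20, seed=1):
    """Return (false positive rate when stopping at first significance,
    false positive rate when checking only once at the end)."""
    rng = random.Random(seed)
    peek_hits = 0
    fixed_hits = 0
    for _ in range(n_experiments):
        a = b = 0
        stopped_early = False
        for n in range(1, n_per_version + 1):
            # Both versions convert at the same true rate: there is no real winner.
            a += rng.random() < rate
            b += rng.random() < rate
            # Peek after every visitor pair once the warm-up is over.
            if n >= min_n and not stopped_early and abs(z_stat(a, b, n)) > 1.96:
                stopped_early = True
        peek_hits += stopped_early
        fixed_hits += abs(z_stat(a, b, n_per_version)) > 1.96
    return peek_hits / n_experiments, fixed_hits / n_experiments


peek_rate, fixed_rate = simulate_peeking()
print(f"stop at first significance: {peek_rate:.1%} false positives")
print(f"check once at the end:      {fixed_rate:.1%} false positives")
```

The exact percentages depend on the assumed parameters and the random seed, but the peeking rate should come out well above the single-check rate.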
![Page 16: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/16.jpg)
OK…well, then when should I stop an A/B test?
![Page 17: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/17.jpg)
SAMPLE SIZE
Dictates how long to run a test
![Page 18: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/18.jpg)
SAMPLE SIZE
• Used religiously in the pharmaceutical industry, economic studies, etc.
![Page 19: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/19.jpg)
https://www.optimizely.com/resources/sample-size-calculator
![Page 20: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/20.jpg)
Agenda
1. How we put this into practice on a website test
2. How we applied these learnings to email testing:
• Open rates
• Click to Open Rates
• Conversion Rates
![Page 21: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/21.jpg)
A/B Testing on your website
Here’s your new test process:
1. Determine your baseline conversion rate (or click rate, or download rate, etc.)
2. Decide how long you are willing to wait for a result. Convert your unique traffic metric to a sample size.
3. Adjust MDE (Minimum Detectable Effect) until your Sample Size is just under the target you determined in #2 above.
4. Re-adjust MDE until you are content.
5. Start the test, and don’t stop until you hit the sample size.
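Steps 1 to 4 can also be sketched in code rather than in an online calculator. The function below uses the textbook normal-approximation formula for comparing two proportions. The one-sided 5% significance level and 80% power are assumptions (the deck does not state the calculator's settings), the function name is made up for this example, and the 3% baseline in the usage loop is a hypothetical value.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, alpha=0.05,
                              power=0.80, two_sided=False):
    """Visitors needed in EACH variation to detect a relative lift of
    `relative_mde` over `baseline` (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)   # significance quantile
    z_beta = NormalDist().inv_cdf(power)       # power quantile
    p_bar = (p1 + p2) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil((root / (p2 - p1)) ** 2)


# Steps 3 and 4: try different MDE values and see what sample size each demands.
for mde in (0.30, 0.20, 0.10, 0.05):
    n = sample_size_per_variation(baseline=0.03, relative_mde=mde)
    print(f"MDE {mde:.0%}: {n:,} visitors per variation")
```

Passing `two_sided=True` gives the more conservative two-sided requirement; whichever settings you choose, fix them before the test starts.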
![Page 22: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/22.jpg)
Case Study: Item Urgency
![Page 23: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/23.jpg)
Case Study: Item Urgency
TEST (VERSION A): INVENTORY NOTIFICATION
CONTROL (VERSION B): NO INVENTORY NOTIFICATION
![Page 24: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/24.jpg)
STEP 1 – We determined our baseline conversion rate
![Page 25: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/25.jpg)
STEP 2 – Calculate Target Sample Size
We initially decided we wanted a result in 2 weeks. So we took the last 2 weeks of unique product page views:
![Page 26: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/26.jpg)
STEP 2 – Calculate Target Sample Size
We then divided that number by two (since we’ll have two test segments)
Divided by two again to account for desktop traffic only
Then multiplied by 5% (since the message only displays on 5% of product pages)
Sample Size -> 12,351
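The arithmetic on this slide, as a small sketch. The two-week page view count is not shown in this transcript; the 988,080 used below is back-calculated from the 12,351 result and is an assumption, as are the function and parameter names.

```python
def target_sample_size(unique_views, variations=2, desktop_divisor=2,
                       eligible_pct=5):
    """Turn raw unique page views into a per-variation sample size."""
    per_variation = unique_views // variations        # two test segments
    desktop_only = per_variation // desktop_divisor   # desktop traffic only
    return desktop_only * eligible_pct // 100         # pages that show the message


two_week_views = 988_080  # assumed: back-calculated, not taken from the slide
print(target_sample_size(two_week_views))  # 12351
```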
![Page 27: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/27.jpg)
This gave us a 30% MDE (conversion lift), which is unrealistic.
![Page 28: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/28.jpg)
How about 10% ?
![Page 29: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/29.jpg)
107,105 unique visits ~ 17 weeks
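The 17-week estimate follows from the two-week sample size above: 12,351 eligible visits per variation every two weeks is roughly 6,175 per week. A quick sketch of that conversion (the function name is made up for this example, and it assumes traffic stays at the same level):

```python
def weeks_to_reach(required_sample, sample_per_period, period_weeks=2):
    """Weeks needed to collect `required_sample` at the observed traffic rate."""
    weekly_sample = sample_per_period / period_weeks
    return required_sample / weekly_sample


weeks = weeks_to_reach(required_sample=107_105, sample_per_period=12_351)
print(round(weeks, 1))  # 17.3
```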
![Page 30: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/30.jpg)
Wow, that’s a long time…
![Page 31: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/31.jpg)
Yep.
![Page 32: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/32.jpg)
You’re probably not running your tests long enough
![Page 33: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/33.jpg)
WAIT A MINUTE.
MY A/B TEST PLATFORM SAYS NOTHING ABOUT SAMPLE SIZE…
![Page 34: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/34.jpg)
EVERYONE WANTS INSTANT GRATIFICATION
![Page 35: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/35.jpg)
YOUR A/B TEST PLATFORM IS HAPPY TO SELL IT
![Page 36: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/36.jpg)
Quietly assuming you have calculated sample size on your own
![Page 37: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/37.jpg)
Item Urgency - Test Results
We are over 4 weeks in…
*Conv. rate is higher than expected because the test platform runs on a 7-day conversion window.
![Page 38: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/38.jpg)
Lift is over 10%
Note the spike in the beginning and the increased stabilization with time
Item Urgency - Test Results
![Page 39: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/39.jpg)
The effect is slowly approaching the MDE
Test Results
![Page 40: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/40.jpg)
Significance is now over 95%, but it’s been up and down.
Many marketers would stop the test on 9/5 and declare a 57% Lift.
Test Results
![Page 41: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/41.jpg)
Email Testing
![Page 42: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/42.jpg)
After learning about Sample Size, we reconsidered our email testing strategy
• Open Rate (Subject line testing)
• Click-to-Open (CTO) Rate
• Conversion Rate
![Page 43: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/43.jpg)
OPEN RATE
We used sample size to gut check the size of our subject line test segments
![Page 44: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/44.jpg)
OPEN RATE
Remember, the sample size calculator takes the baseline conversion rate and the sample size, and gives you the MDE.
![Page 45: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/45.jpg)
OPEN RATE
First, we needed the baseline open rate
![Page 46: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/46.jpg)
OPEN RATE
Our open rates typically end up around 17%, but when we make the call on our winning subject line, open rates are usually around 7%.
![Page 47: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/47.jpg)
OPEN RATE
Next we need the sample size
![Page 48: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/48.jpg)
OPEN RATE
We always test 4 different subject lines.
We had been sending each subject line to 10,000 customers.
So, sample size ~ 10,000
![Page 49: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/49.jpg)
OPEN RATE
Plugging these numbers in, the test would only detect an open rate lift of 13% or higher
![Page 50: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/50.jpg)
OPEN RATE
A 13% lift on a 17% open rate is 19.2%.
We rarely see subject lines perform this well.
We needed a lower MDE to make sure we could detect more winners…
![Page 51: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/51.jpg)
OPEN RATE
We ended up doubling our subject line test segment to 80,000 (20,000 per subject line), giving us an MDE ~ 9.2%
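Going from a sample size back to an MDE can be sketched by searching for the lift whose required sample size equals the segment size. Everything below is an assumption layered on the deck: the normal-approximation formula, the one-sided 5% significance level, 80% power, the function names, and reading 80,000 as 20,000 recipients per subject line. Under those settings the results should land close to the 13% and 9.2% figures above, but they will not match any particular calculator exactly.

```python
import math
from statistics import NormalDist


def required_n(p1, rel_mde, z_alpha, z_beta):
    """Per-variation sample size needed to detect a relative lift of rel_mde."""
    p2 = p1 * (1 + rel_mde)
    p_bar = (p1 + p2) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return (root / (p2 - p1)) ** 2


def detectable_lift(baseline, n_per_variation, alpha=0.05, power=0.80,
                    two_sided=False):
    """Smallest relative lift (MDE) detectable with n_per_variation recipients."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    lo, hi = 1e-6, (1 - baseline) / baseline  # hi keeps the lifted rate <= 100%
    for _ in range(100):  # bisection: required_n shrinks as the lift grows
        mid = (lo + hi) / 2
        if required_n(baseline, mid, z_alpha, z_beta) > n_per_variation:
            lo = mid   # lift too small to detect with this sample
        else:
            hi = mid   # detectable, try a smaller lift
    return hi


for segment in (10_000, 20_000):
    mde = detectable_lift(baseline=0.07, n_per_variation=segment)
    print(f"{segment:,} per subject line: MDE about {mde:.1%}")
```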
![Page 52: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/52.jpg)
CTO
First we needed the baseline
![Page 53: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/53.jpg)
CTO
We averaged the last 10 weeks -> 11% CTO
![Page 54: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/54.jpg)
CTO
Sample size = ½ of the avg opens count
![Page 55: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/55.jpg)
CTO
We averaged the last 10 weeks -> Avg opens = 107,000, so sample size = 107,000 / 2 = 53,500
![Page 56: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/56.jpg)
CTO
![Page 57: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/57.jpg)
CTO
4.4% CTO lift is a very reasonable goal for a test.
This showed us that we could trust most of the results of our past CTO tests.
![Page 58: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/58.jpg)
GRID vs. FREE FORM
15.7% CTO Lift
![Page 59: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/59.jpg)
PRODUCT NAMES vs. NO PRODUCT NAMES
22.6% CTO Lift
![Page 60: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/60.jpg)
Conversion Rate
We had been making many email decisions after reaching significance on a conversion rate lift
![Page 61: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/61.jpg)
Conversion Rate
Time for a reality check.
![Page 62: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/62.jpg)
Conversion Rate
Baseline Conversion Rate ~ 1.5%
![Page 63: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/63.jpg)
Conversion Rate
Sample Size = ½ Average # Clicks -> 6,000
![Page 64: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/64.jpg)
Conversion Rate
![Page 65: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/65.jpg)
Conversion Rate
38% is ASTRONOMICAL
![Page 66: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/66.jpg)
Conversion Rate
To get meaningful results for conversion rate, consider running an email test many times, so that you can eventually reach the necessary sample size.
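A rough sketch of how many repeat sends that could mean. The 10% target lift is a hypothetical goal, not a number from the deck. The formula, the one-sided 5% significance level, 80% power, and the function names are assumptions as well, and the sketch presumes the same two versions are re-sent each time so the results can be pooled.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    """Clicks needed in EACH variation (one-sided normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil((root / (p2 - p1)) ** 2)


def sends_needed(baseline, relative_mde, clicks_per_variation_per_send):
    """How many times the same test must be sent to pool enough clicks."""
    needed = sample_size_per_variation(baseline, relative_mde)
    return math.ceil(needed / clicks_per_variation_per_send)


# Baseline 1.5% and 6,000 clicks per variation come from the slides above;
# the 10% lift target is hypothetical.
print(sends_needed(baseline=0.015, relative_mde=0.10,
                   clicks_per_variation_per_send=6_000))
```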
![Page 67: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/67.jpg)
Takeaways
This is the MDE curve again. Remember what this looks like.
The longer you run a test, the lower MDE will be.
The more traffic volume you have, the faster MDE will drop.
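The shape of that curve follows from the usual normal-approximation sample size formula. As a rough sketch (the symbols are not from the deck: $p$ is the baseline rate, $n$ the sample size per variation, and $z_{\alpha}$, $z_{\beta}$ the normal quantiles for the chosen significance level and power):

```latex
n \approx \frac{2\,(z_{\alpha} + z_{\beta})^{2}\, p\,(1 - p)}{(p \cdot \mathrm{MDE})^{2}}
\quad\Longrightarrow\quad
\mathrm{MDE} \approx \frac{z_{\alpha} + z_{\beta}}{p}\,\sqrt{\frac{2\,p\,(1 - p)}{n}}
\;\propto\; \frac{1}{\sqrt{n}}
```

So under this approximation, halving the MDE takes about four times the sample, which is why the curve drops quickly at first and then flattens.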
![Page 68: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/68.jpg)
Takeaways
For Web Testing
• If you stop your A/B tests once you reach statistical significance, you are increasing your chances of finding false positives
• Calculating sample size will give you a clear stop date and an MDE
• MDE and sample size are inversely related – The lower the MDE, the larger the sample size
• Most likely, your A/B tests need to run much longer than you realize
For Email Testing
• Use sample size to determine the size of your subject line test segments
• Your CTO tests are probably reaching the necessary sample size
• Your Conversion tests are probably not hitting sample size
![Page 69: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/69.jpg)
Sources
Kyle Rush – Mozcon 2014 Presentation
https://seomoz.box.com/shared/static/2fw6yevkkmmdumz431j4.pdf
Evan Miller – How Not To Run an A/B Test
http://www.evanmiller.org/how-not-to-run-an-ab-test.html
![Page 70: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/70.jpg)
Zack Notes
Digital Marketing Manager
@zacknotes
slideshare.net/zacknotes1/presentations
![Page 71: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/71.jpg)
Appendix
![Page 72: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/72.jpg)
GRID vs. FREE FORM
![Page 73: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/73.jpg)
PRODUCT NAMES vs. NO PRODUCT NAMES
![Page 74: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/74.jpg)
What do you do if a test reaches sample size and your lift < MDE?
![Page 75: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making](https://reader033.fdocuments.us/reader033/viewer/2022052908/559453881a28abd94f8b478f/html5/thumbnails/75.jpg)
You can either extend the test and accept a lower MDE, or move on.