Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

#SMX #12C3 @AdriaK How to Avoid the First Two When Producing the Latter Lies, Damned Lies, and Search Marketing Statistics Adria Kyne Vistaprint

Transcript of Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Page 1: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

#SMX #12C3 @AdriaK

Lies, Damned Lies, and Search Marketing Statistics

How to Avoid the First Two When Producing the Latter

Adria Kyne, Vistaprint

Page 2: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Today’s Topics

Problems
• Using samples that are too small
• Using significance as a stopping point for a test

Solutions
• More rigor with fixed-sample tests
• Using sequential sampling tests
• Bayesian testing

Bonus: Pro Tip for achieving valid samples

Page 3: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

What is the Whole Point of This Anyway?

1) Make sure that we understand what actually happened

2) Be sure that we can use these results to predict the future

Page 4: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Basics of Hypothesis Testing

1. We want to know whether the variation is better than, worse than, or the same as the original.

2. We don’t want to see a positive outcome that isn’t really there: a false positive, or Type I error.

3. We don’t want to miss a positive outcome that is really there: a false negative, or Type II error.

Page 5: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

#1 A Common (Sad) Story

Your product page has an average 2.0% CR. You make a bunch of tweaks to the design, and after 30,000 visits, your CR is 2.25%.

You think you’re a genius, and so you tell your boss. Score!

Page 6: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

You spoke too soon.

At the end of the month, your revenue is no higher. You look bad.

The change you saw was not “significant,” because your sample size wasn’t big enough.

Yes, 30,000 visits was not enough.

Page 7: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

I gotta be cruel to be kind.

Page 8: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

For standard A/B hypothesis tests

The smaller the difference, the bigger the sample you’ll need:

• 2.0% to 3.0% is a 50% increase
• 2.0% to 2.5% is a 25% increase
• 2.0% to 2.25% is a 12.5% increase

Page 9: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Decide on How Much Impact Your Change Should Have

Visits    CR      Orders   AOV   Revenue   Annual Increase
20,000    2.00%   400      $50   $20,000
20,000    2.25%   450      $50   $22,500   $30,000
20,000    2.50%   500      $50   $25,000   $60,000

How much of a difference do you want to be able to detect with your test?
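The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, and it assumes the 20,000 visits are a monthly figure, which is what makes the annual increase equal twelve times the revenue difference.

```python
def annual_revenue_increase(monthly_visits, base_cr, new_cr, aov):
    """Extra revenue per year if the conversion rate moves from base_cr to new_cr.

    Assumes monthly_visits and the average order value (aov) stay constant.
    """
    base_revenue = monthly_visits * base_cr * aov   # e.g. 20,000 * 2.00% * $50
    new_revenue = monthly_visits * new_cr * aov     # e.g. 20,000 * 2.25% * $50
    return 12 * (new_revenue - base_revenue)


for new_cr in (0.0225, 0.025):
    increase = annual_revenue_increase(20_000, 0.02, new_cr, 50)
    print(f"CR {new_cr:.2%}: about ${increase:,.0f} more per year")
```

Running the same function with your own traffic and order value is a quick way to decide which uplift is worth the cost of detecting.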

Page 10: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Pick a Sample Size Calculator

minimum sample size

“power analysis for two independent proportions”

• Independent: we’re showing the variants to different visitors
• Proportions: we’re comparing rates, which are proportions

Page 11: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Calculator Options

http://bit.ly/25zI5Rv

Is the variation higher or lower than the original? Checking both directions makes this a “two-tailed test.”

A 5% significance level is common: if there is no real difference, there’s a 5% chance of a false positive.

80% statistical power is common: if there is a real effect, there’s a 20% chance (1 in 5) that we’d miss it.

P1 = your control CR, e.g. 0.02 for 2%
P2 = your likely test CR, e.g. 0.025 for 2.5%
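For readers who want to see what such a calculator does, here is a minimal sketch using one common textbook formula for comparing two independent proportions (pooled variance under the null hypothesis). Whether the linked calculator uses exactly this formula is an assumption; the function name is hypothetical. With a 2.0% control it gives 3,826, 13,809 and 52,238 visits per variant for test rates of 3.0%, 2.5% and 2.25%, which are the figures quoted in this deck.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Minimum visits per variant to detect p1 vs. p2 with a two-tailed test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed significance
    z_beta = NormalDist().inv_cdf(power)            # statistical power
    p_bar = (p1 + p2) / 2                           # pooled rate under "no difference"
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


for p2 in (0.03, 0.025, 0.0225):
    print(f"2.0% vs {p2:.2%}: {sample_size_per_variant(0.02, p2):,} visits per variant")
```

Halving the difference you want to detect roughly quadruples the sample, which is why small uplifts are so expensive to confirm.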

Page 12: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Consequences of Significance and Power Choices

The effect of using 0.05 and 80% is that we are 4 times more likely to get a false negative than a false positive (20% vs. 5%)

We’re more concerned about making things worse

We accept a higher chance that we won’t see a positive effect that is actually there

Page 13: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

NOBODY IS GOING TO DIE

Those are arbitrary choices.

We’re not testing pharmaceuticals.

Are we really so terrified that we’ll roll out a page that isn’t an improvement?

Page 14: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Means that I love you. Baby.

Page 15: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Necessary Sample Sizes

1% change: 3,826

0.5% change: 13,809

0.25% change: 52,238

Page 16: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Detecting a 12.5% increase in Conversion Rate

Requires 52,238 Visits

For each sample

Page 17: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Photo by Marilynn Windust https://ronmitchelladventure.com

Page 18: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

#2 Another Common (Sad) Story

You’re hoping for a 0.25-percentage-point uplift on a 2.0% average CR (2.0% to 2.25%).

The Control is getting 2.0% CR, and the Variant is getting 3.0% CR!

“Why haven’t we switched to the test variant? It’s CLEARLY WINNING.”

Page 19: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

And this is how things go awry

So you test the significance level.

Success! The difference is significant. You roll out the new page, and...

...nothing happens

Page 20: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Why didn’t it work?

A significance calculation assumes that the sample size was fixed in advance

It assumes that you have a valid sample

So when you ignore this and run until you get a “significant result,” you’re misusing the math

Page 21: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Friends don’t let friends test significance prematurely

If you hit a period that happens to be performing well, you may succumb to the temptation to stop while you’re ahead

Repeated significance testing increases the rate of false positives

Image: Public Domain, via Wikipedia

Page 22: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Why repeated significance testing is a problem

Page 23: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Remember what significance means?

5% significance means that even if there is no difference between the test and the control, we’ll see an imaginary difference in the test 5% of the time

Page 24: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Repeated Significance Testing is The Devil

Given: there is no actual difference between two test variants

                   Option 1       Option 2
1st observation    Significant    No difference
Likelihood         5% chance      95% chance

                   Option 1       Option 2        Option 3        Option 4
1st observation    Significant    No difference   Significant     No difference
2nd observation    Significant    Significant     No difference   No difference
End of Test        Significant    Significant     No difference   No difference
Likelihood         5% chance (Significant), 95% chance (No difference)

                   Option 1       Option 2        Option 3        Option 4
1st observation    Significant    No difference   Significant     No difference
2nd observation    -              Significant     -               No difference
End of Test        Significant    Significant     Significant     No difference
Likelihood         26% chance (Significant), 74% chance (No difference)
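A small simulation can make the inflation concrete. The sketch below is an illustration under simplifying assumptions, not the calculation behind the slide's figures: it models an A/A test (no real difference) checked after equal-sized batches of traffic, treats each batch's standardized difference as a standard normal draw, and stops the first time the running z-score crosses the 5% threshold. All names are hypothetical.

```python
import random
from math import sqrt


def false_positive_rate(looks, trials=20_000, z_crit=1.96, seed=1):
    """Share of no-difference tests declared "significant" at any of `looks` checks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = 0.0
        for k in range(1, looks + 1):
            total += rng.gauss(0, 1)      # a new batch of data with no true effect
            z = total / sqrt(k)           # z-score of everything collected so far
            if abs(z) > z_crit:           # looks significant: stop and call a winner
                hits += 1
                break
    return hits / trials


for looks in (1, 2, 5, 10):
    print(f"{looks:>2} look(s): {false_positive_rate(looks):.1%} false positives")
```

With a single look the rate stays near the promised 5%; every additional look gives chance another opportunity to cross the line, so the rate climbs well above it.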

Page 25: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

See the slippery slope in action!

                  Day 1    Day 2    Day 3    Day 4    Day 5         Day 6    Day 7
Control (2.00%)
  Conversions     150      313      448      636      750           922      1,098
  CR              2.01%    2.10%    2.00%    2.13%    2.01%         2.06%    2.10%
Variant (2.25%)
  Conversions     175      332      498      695      835           993      1,174
  CR              2.35%    2.23%    2.23%    2.33%    2.24%         2.22%    2.25%
Visits/Variant    7,460    14,920   22,380   29,840   37,300        44,760   52,220
Significant?      not      not      not      not      SIGNIFICANT   not      not
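The daily verdicts can be checked with a standard two-tailed, pooled two-proportion z-test on the cumulative counts. The deck does not say which test produced them, so treat the choice of test as an assumption; the function name is hypothetical. With the numbers from the table, only Day 5 falls below the 0.05 threshold.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, conv_b, visits_per_variant):
    """Two-tailed p-value for a difference in conversion rate (equal sample sizes)."""
    p_a = conv_a / visits_per_variant
    p_b = conv_b / visits_per_variant
    pooled = (conv_a + conv_b) / (2 * visits_per_variant)
    std_err = sqrt(2 * pooled * (1 - pooled) / visits_per_variant)
    z = (p_b - p_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Cumulative conversions per day, copied from the table.
control = [150, 313, 448, 636, 750, 922, 1098]
variant = [175, 332, 498, 695, 835, 993, 1174]
visits = [7460 * day for day in range(1, 8)]   # 7,460 new visits per variant per day

verdicts = []
for day, (c, v, n) in enumerate(zip(control, variant, visits), start=1):
    p = two_proportion_p_value(c, v, n)
    verdicts.append(p < 0.05)
    print(f"Day {day}: p = {p:.3f} -> {'SIGNIFICANT' if p < 0.05 else 'not'}")
```

Someone who peeked on Day 5 would have shipped the variant; by Day 7, at the planned sample size, the result is not significant.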

Page 26: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Therefore:

Smart marketers PRE-COMMIT to a valid sample size

And do not test for significance before they’ve collected it!

Page 27: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

But I neeeeeeed to test significance repeatedly!

Because you have to be able to satisfy impatient observers

Page 28: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Sequential A/B Testing

• Solves the problem of repeated significance testing
• Allows you to stop the test early if the Variant is a winner
• Works with low conversion rates (under 10%)

Image: http://geneticsandbeyond.blogspot.com/2014/08/the-puffinss-lair-sweat-of-hippos.html

Page 29: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Sequential experiment design

http://bit.ly/1sSDz29

1. Determine your sample size N (number of total conversions)

2. Measure the success of your Control and Variant groups

3. Check for stopping points (Variant and Control are each group’s conversion count so far)
• If Variant - Control reaches 2.25√N, the Variant wins
• If Control - Variant reaches 2.25√N, the Control wins
• If Variant + Control reaches N, there is no winner
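The stopping rule is simple enough to express directly. This is a minimal sketch of the check described on the slide; the function name and the example N of 400 conversions are illustrative only, and N should come from a sequential sampling calculator.

```python
from math import sqrt


def sequential_verdict(variant_conversions, control_conversions, n):
    """Apply the two-sided sequential stopping rule to the running conversion counts."""
    threshold = 2.25 * sqrt(n)
    if variant_conversions - control_conversions >= threshold:
        return "variant wins"
    if control_conversions - variant_conversions >= threshold:
        return "control wins"
    if variant_conversions + control_conversions >= n:
        return "no winner"
    return "keep going"


# With N = 400 the lead needed to stop early is 2.25 * 20 = 45 conversions.
print(sequential_verdict(120, 70, 400))    # lead of 50 for the variant
print(sequential_verdict(205, 195, 400))   # N conversions reached, lead too small
```

Because the rule is evaluated on every new conversion, checking it as often as you like is part of the design rather than a violation of it.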

Page 30: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Sequential Sampling Calculator

http://bit.ly/1TM1LKv

Page 31: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

When to choose a fixed sample vs. a sequential test

Given a baseline conversion rate p

Minimum detectable effect you want to see is d

1.5p + d < 36%

When less than 36%, a sequential test will be shorter

p = 2.0%, d = 12.5% (2.25% CR)

1.5p + d = 15.5%
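As a quick check, the rule of thumb fits in one function. The name is hypothetical, and both inputs are expressed in percent: the baseline conversion rate p and the relative minimum detectable effect d.

```python
def sequential_is_shorter(baseline_cr_pct, relative_mde_pct):
    """Rule of thumb: a sequential test is shorter when 1.5p + d is under 36%."""
    return 1.5 * baseline_cr_pct + relative_mde_pct < 36


# The example from the slide: p = 2.0%, d = 12.5%, so 1.5p + d = 15.5%.
print(sequential_is_shorter(2.0, 12.5))
```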

Page 32: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

When Good Math Leads to Bad Career Moves

Variant CR = better than Control
P-value = 0.18 (i.e. greater than our 0.05 significance level)

So how did the test go?

We stopped this morning.

So which version won?

Neither. We didn’t achieve significance.

So why did you stop it?!

Well, the null hypothesis... blah blah blah

Blah blah p-value blah blah blah blah

Just show it to another 10,000 visitors.

We can’t do that. We have to accept that the test is over.

This guy is not a team player.

I am so screwed.

Image: 20th Century Fox via Amazon

Page 33: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Communicating results is hard.

So which one performs better?

There is a 95% probability that the results we saw are not due to random chance!

Why can’t this guy just answer a straight question?

I hate my life.

Image: 20th Century Fox via Amazon

Page 34: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Bayes’ Theorem

How to stop your test at any time and still make valid inferences!!

Much easier to understand and explain the results!!

Image via Wikipedia

Page 35: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

What’s the Difference?

Frequentist
• Assumes that there is no difference, and finds the probability that chance alone could have produced the experimental results seen
• Focuses on not getting Type I errors
• Most people don’t understand what the results mean

Bayesian
• Finds the probability that the test is better
• More forgiving of Type I errors
• Easier to understand and communicate to non-technical audiences

Page 36: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Why Don’t Marketers Use Bayes’ Theorem?

Calculus

This formula determines the probability that B will beat A in the long run. There’s a slightly different one if you have three test groups, etc.
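The calculus can be sidestepped by simulation. The sketch below is not the closed-form formula from the slide; it is a Monte Carlo stand-in that assumes a uniform Beta(1, 1) prior for each conversion rate, draws plausible rates from each posterior, and counts how often B comes out ahead. The function name and the example counts are hypothetical.

```python
import random


def probability_b_beats_a(conv_a, visits_a, conv_b, visits_b, draws=20_000, seed=1):
    """Estimate P(CR of B > CR of A) from Beta posteriors with a uniform prior."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate: Beta(1 + conversions, 1 + non-conversions)
        rate_a = rng.betavariate(1 + conv_a, 1 + visits_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + visits_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Made-up data: A converts 100 of 1,000 visits, B converts 130 of 1,000.
print(f"P(B beats A) is roughly {probability_b_beats_a(100, 1000, 130, 1000):.0%}")
```

The output is a direct answer to "how likely is it that the new page is better?", which is the question stakeholders usually ask.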

Page 37: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

But Wait!

Online calculators are your friends!

Page 38: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Cool online Bayesian calculator

http://bit.ly/24mKJaY

Wins and losses data

Graph
• Probability distributions

Table
• Probability of being best
• Spread of conversion rates

Page 39: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

How to use this calculator

1. Decide on the probability you’re comfortable with

2. Decide how much variance you’re willing to accept

Page 40: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

High spread, less overlap

• 96% probability that B is better
• But what’s the real CR?
• Needs more data

Page 41: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Low spread, high overlap

• Not very much CR variance
• But B is only 70% likely to be better

Page 42: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Less spread, less overlap

• Variance of CR isn’t as bad
• Separation of peaks means that the CRs are different
• 94% probability that B is better
• We aren’t certain about the actual CR

Sample size is only 100 conversions each!

Page 43: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

You might actually see

Page 44: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Bayesian’s interesting twist

Allows you to start the test with some assumptions, called “priors”

Can include:
• The prior success probability (our belief about the average conversion rate)
• How much variance you expect
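One common way to encode such priors for a conversion rate is a Beta distribution, where the expected rate sets the mean and a "strength" (how many imaginary past visits back up the belief) controls the variance. The sketch below assumes that parameterization; it is not necessarily what any particular calculator uses, and every name and number in it is hypothetical.

```python
def beta_prior(expected_cr, strength):
    """Turn a belief about the average CR into Beta(alpha, beta) parameters.

    `strength` acts like a number of imaginary prior visits: higher means
    less expected variance, so the test data moves the estimate more slowly.
    """
    return expected_cr * strength, (1 - expected_cr) * strength


def posterior_mean(prior, conversions, visits):
    """Updated estimate of the CR after combining the prior with test data."""
    alpha, beta = prior
    return (alpha + conversions) / (alpha + beta + visits)


# Belief: CR is about 2%, held as firmly as 1,000 past visits would justify.
prior = beta_prior(0.02, 1000)
# Observed: 30 conversions in 1,000 visits (3%). The estimate lands in between.
print(f"{posterior_mean(prior, 30, 1000):.2%}")
```

A weak prior lets a small test speak for itself; a strong one keeps a lucky first day from running away with the estimate.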

Page 45: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Different cool Bayesian calculator

http://bit.ly/1Wzrtro

1. Set your “priors”
2. Input your test data
3. Get back the probability that the test variant performs better

Page 46: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Actual, Understandable Results

Page 47: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Advantage of Bayesian results

You can make inferences from low traffic and low conversions

When someone says "What's the probability that the new page outperforms the old one?", you can give them an answer!

Page 48: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

So now what?

1. You know how not to run a fixed sample test
2. You know you can run a sequential sample test when you need ongoing information about the results
3. You know how to run a Bayesian test, where you can keep checking your progress AND explain the results easily

Page 49: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Review: How to Design your Experiment

Are you trying to detect a big difference, or a small difference?

Use the formula 1.5p + d
• Big difference (> 36%): use a normal fixed sample test
• Small difference (< 36%): use a sequential test

Do the people you report to get confused or unhappy when you try to explain significance and p-values to them?

Run a Bayesian test

Page 50: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

So That’s It, Then?

Tests using significance
1. Use a sample calculator
2. Run the test for the specified sample
3. Profit!

Bayesian test
1. Decide how solid you want your probability estimate to be
2. Run the test and update the data
3. Profit!

Page 51: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

I’m all about the tough love.

Page 52: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

The Problem of Illusory Lift

We are not measuring consistent user groups
• Time of day
• Day of week
• Seasonality
• Sales

Page 53: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Account for business cycles

Run your tests long enough to cover at least one entire traffic/conversion cycle

Monday-Sunday or equivalent full week

Page 54: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Daily differences in performance

Page 55: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Account for user behavior

Don’t run your test too long

Visitors delete their cookies and will pollute your samples

Page 56: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

It’s probably more than you think

Nearly 40 percent of Internet users delete cookies from their primary computers on at least a monthly basis (JupiterResearch 2005)

53 percent delete cookies, cache or browsing history to help protect their privacy online (TRUSTe/National Cyber Security Alliance U.S. Consumer Privacy Index, January 2016)

Page 57: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Summary

• Pre-commit to a sample size/experimental design
• Fixed Sample A/B testing: no peeking before it’s done
• Sequential A/B testing: built-in peeking
• Bayesian: easier to understand the results
• Collect samples for a full business cycle, but not too long

Page 58: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

Calculators I used

• Fixed sample calculator: Stats Dept., U of British Columbia, http://bit.ly/25zI5Rv
• Sequential sampling calculator: Evan Miller, http://bit.ly/1TM1LKv
• Simple Bayesian calculator: Peak Conversion, http://bit.ly/24mKJaY
• Bayesian calculator with priors: Lyst, http://bit.ly/1Wzrtro

Page 59: Lies, Damned Lies, & Search Marketing Statistics By Adria Kyne

LEARN MORE: UPCOMING @SMX EVENTS

THANK YOU! SEE YOU AT THE NEXT #SMX