A/B Testing

By: Kapil Saxena

What Is A/B Testing?

Develop two versions of a page

Randomly show different versions to users (see the sketch below)

Track how users perform

Evaluate (that's where statistics comes in)

Use the better version
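
The "randomly show different versions" step is usually done by hashing a user id rather than flipping a coin on every request, so a returning user keeps seeing the same version. A minimal sketch in Python (the function name and the 50/50 split are my own illustration, not from the talk):

```python
import hashlib

def assign_version(user_id: str) -> str:
    # Hash the user id into a stable bucket 0-99; the same user
    # always lands in the same bucket, so assignment is random
    # across users but consistent across visits.
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "A" if bucket < 50 else "B"
```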

Why A/B Test?

A typical website converts 2% of visitors into customers

People can't explain why they left

Small changes can make a big difference

How about +40%?

Google believes it works; see Content Experiments in Google Analytics

What Can You A/B Test?

Removing form fields

Adding relevant form fields

Marketing landing pages

Different explanations

Having interstitial pages

Email content

Any casual decisions you care about

A/B Tests Do Not Substitute For

Talking to users

Usability tests

Acceptance tests

Unit tests

Thinking

The G-Test

A method for comparing 2 data sets

A close relative of the chi-square test, which Karl Pearson introduced in 1900

It is our main method for evaluating A/B tests

There are alternatives

Limitations Of The G-Test

Only answers yes/no questions (but you pick the question)

Only handles 2 versions (there is a workaround)

Requires independence in samples

Does not do confidence intervals

What To Measure

Start your A/B test

Divide your users into groups A and B

Decide whether each person did what you want

Reduce your results to 4 numbers: ($a_yes, $a_no, $b_yes, $b_no)
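
The ($a_yes, $a_no, $b_yes, $b_no) names are Perl-style; in Python the same reduction might look like this (the data layout is invented for illustration):

```python
# Hypothetical per-user results: (version shown, did what we wanted?)
results = [("A", True), ("A", False), ("B", True), ("B", True), ("B", False)]

a_yes = sum(1 for v, did in results if v == "A" and did)
a_no  = sum(1 for v, did in results if v == "A" and not did)
b_yes = sum(1 for v, did in results if v == "B" and did)
b_no  = sum(1 for v, did in results if v == "B" and not did)
```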

G-Test Evaluation

Select a yes/no question about users

Divide users in A and B into yes and no counts

Perform the G-test calculation to get $p (sketched below)

Our confidence is 1-$p

Make a decision if our confidence is near 100% and we have enough samples

Enough samples means at least 10 yes and at least 10 no results in each group
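
For reference, here is one self-contained sketch of that "complicated G-test calculation" for a 2x2 table. This is my own implementation of the standard formula, not code from the talk, and it assumes the counts are independent samples:

```python
import math

def g_test_p_value(a_yes, a_no, b_yes, b_no):
    """Return $p for a 2x2 table of A/B counts (G-test, 1 degree of freedom)."""
    total = a_yes + a_no + b_yes + b_no
    observed = [a_yes, a_no, b_yes, b_no]
    # Expected counts if version had no effect on the yes/no outcome:
    # (row total) * (column total) / (grand total) for each cell.
    expected = [
        (a_yes + a_no) * (a_yes + b_yes) / total,
        (a_yes + a_no) * (a_no + b_no) / total,
        (b_yes + b_no) * (a_yes + b_yes) / total,
        (b_yes + b_no) * (a_no + b_no) / total,
    ]
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o)
    # A 2x2 table has 1 degree of freedom; the chi-square survival
    # function with 1 dof reduces to erfc(sqrt(g / 2)).
    return math.erfc(math.sqrt(g / 2))

p = g_test_p_value(30, 70, 50, 50)
print(f"p = {p:.4f}, confidence = {1 - p:.1%}")  # roughly 99.6% confidence
```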

The Conversion Funnel

Your Conversion Funnel

Every company has one or more conversion funnels

You should know yours, and be actively trying to improve each step

Each step can be tracked with some metric

Most A/B tests concentrate on one step in the funnel

Expect to run multiple A/B tests against each step

Standardize these metrics

Examples Of Metrics

Sessions, sessions with registration

People who searched, who viewed a detail page, who contacted, who leased

People who saved favorites, started a cart, completed purchase

People who saw at least 3 pages, clicked on an ad

Anything measurable and important to your business

Too Many Metrics?

You may have many metrics

High confidence on one may be chance (see the arithmetic below)

Believe it if it was the metric you tried to change

Believe it if the confidence is very high

Believe it if several metrics agree

Conflicting metrics require a business decision
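
Why one high-confidence metric among many may be chance: at a 95% threshold, each extra metric is another chance for a false alarm. Rough arithmetic (assuming independent metrics, which is a simplification):

```python
# Probability that at least one of k independent metrics crosses a
# 95% confidence threshold by luck alone.
for k in (1, 5, 10, 20):
    print(f"{k:2d} metrics -> {1 - 0.95 ** k:.0%} chance of a false alarm")
# 1 -> 5%, 5 -> 23%, 10 -> 40%, 20 -> 64%
```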

Is That It?

You now know enough to run a successful A/B test!

If you do everything right

If you do it wrong you won't know

You'll just get random answers

And believe them

Compare Apples To Apples

Traffic behaves differently at different times

Friday night ≠ Monday morning

First week in month ≠ last week in month

Last week's visitors have done more than this week's

Do not try to compare people from different times

Be Careful When Changing The Mix

A and B can receive unequal traffic

But do not change the mix they get

Wrong: changing (90/10) A vs B to (80/20)

You are implicitly comparing old A's with new B's

Right: changing (10/10/80) A vs B vs untested to (20/20/60), as sketched below

This comes up repeatedly
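
One way to make the (10/10/80) to (20/20/60) change concrete: keep bucket assignments sticky, and grow A and B out of the untested pool at the same moment. A hypothetical sketch (the bucket ranges are my own illustration):

```python
import hashlib

def bucket(user_id: str) -> int:
    # Map a user to a stable bucket in 0..99.
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100

# Before: (10/10/80) A vs B vs untested.
def version_before(user_id: str) -> str:
    b = bucket(user_id)
    return "A" if b < 10 else "B" if b < 20 else "untested"

# After: (20/20/60). A keeps buckets 0-9 and grows into 20-29;
# B keeps 10-19 and grows into 30-39. Existing members stay put,
# and both groups draw their new members from the untested pool
# at the same moment, so A and B remain comparable.
def version_after(user_id: str) -> str:
    b = bucket(user_id)
    if b < 10 or 20 <= b < 30:
        return "A"
    if 10 <= b < 20 or 30 <= b < 40:
        return "B"
    return "untested"
```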

What Is Wrong With This?

Suppose you are A/B testing a change to your product page

You log hits on your product page

You log clicks on Buy Now

You plug those numbers into the A/B calculator

Is this OK?

Beware Hidden Correlations!

Correlations increase variability, and therefore inflate $g_test

Some people look at many product pages

Their buying behaviour is correlated on those pages

This increases the size of chance fluctuations

Leading to wrong results

Guarantee Independence

Whatever granularity (session, person, event) you make A/B decisions on...

Needs to be what you test on

In this case, measure people who hit your product page

Measure people who clicked on Buy Now

Those are the right statistics to use (see the sketch below)

This comes up repeatedly
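
A sketch of the fix for the product-page example, assuming a simple (person_id, event) log: deduplicate by person before counting, so each person contributes one independent sample.

```python
# Hypothetical event log: one person can hit the product page (and
# buy) many times, so count distinct people, not raw events.
events = [
    ("p1", "product_page"), ("p1", "product_page"), ("p1", "buy_now"),
    ("p2", "product_page"),
]

viewers = {p for p, e in events if e == "product_page"}
buyers  = {p for p, e in events if e == "buy_now"}

yes = len(buyers & viewers)   # people who viewed and clicked Buy Now
no  = len(viewers - buyers)   # people who viewed but did not click
```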

Wrong Metric

At Rent.com we changed the title of our lease report email

The new email had improved opens and clicks

That was because it interested people who were still looking for a place to live

That email needed to interest people who had already found a place to live

We looked at the wrong metric, and it cost us millions

This mistake is fairly rare

That's It!

Those are the big mistakes that I've seen

You now know how to do an A/B test

...and should have good odds of getting it right

Of course there is more to know

But this is the core

Thanks!!!