Digital travel summit a b testing 2014-04-02

C O N F I D E N T I A L

Testing and Tracking To Improve Conversion Rates

Jonathan Isernhagen

Digital Marketing Fundamentals Workshop Day 2

Digital Travel Summit

4/2/2014


Home Page search widget modification

Control



Variant A



Variant B



Variant C


Hotel Detail Page “Flexible Dates” tab name

Control



Variant A: Removal



Variant B: “Best Dates”



Variant C: “Cheapest Dates”


Agenda

1) Examples

2) Program goals

3) Test approach philosophies
   a) Eisenberg
   b) Anderson

4) Process
   a) solicitation
   b) prioritization
   c) pre-test
   d) testing
   e) reportage

5) Alternative method: the multi-armed bandit

[email protected] @jon_isernhagen www.linkedin.com/in/isernhagen


What is A/B testing?

Comparing conversion performance between shoppers on:

• Current site (=“control”);

• Modified version(s) (=“variants”).

Highest Success / View ratio (=“conversion”) is the winner.

“Success” can be defined in many ways:

1) Some sites aim for visitor pages viewed or time-on-site

2) Transaction sites count booking completion pages

3) Sophisticated testers incorporate all revenue (even of ads)

Control page

Test variant

Completion “success” page

Other site pages
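The Success / View ratio described above can be sketched in a few lines. This is a minimal illustration only: the `Arm` class, the `pick_winner` helper and all visitor counts are hypothetical, and picking the highest ratio says nothing yet about statistical confidence.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One side of the test: the control or a variant (hypothetical structure)."""
    name: str
    views: int       # shoppers who saw this version of the page
    successes: int   # shoppers who went on to reach the "success" page

    @property
    def conversion(self) -> float:
        # Conversion = successes / views; guard against an arm with no traffic.
        return self.successes / self.views if self.views else 0.0


def pick_winner(arms):
    """Return the arm with the highest success/view ratio."""
    return max(arms, key=lambda arm: arm.conversion)


# Invented counts, for illustration only.
arms = [
    Arm("control", views=10_000, successes=500),
    Arm("variant A", views=10_000, successes=540),
    Arm("variant B", views=10_000, successes=470),
]
winner = pick_winner(arms)
print(winner.name, f"{winner.conversion:.2%}")
```

Changing what counts as a "success" (pages viewed, bookings, total revenue) only changes how `successes` is tallied; the comparison itself stays the same.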



Why do we A/B Test? Conversion Improvement

Continuous conversion improvement subject to market competition ≈ Building in Venice


Why do we A/B Test? HIPPO* Defense

*Highest Income Person’s Opinion

“If we have data, let’s look at data.

If all we have are opinions, let’s go with mine.”

– Jim Barksdale, Netscape CEO



Why do we A/B test? Causality is Optional

“A British ship’s captain observed the lack of scurvy among sailors serving on the naval ships of Mediterranean countries, where citrus fruit was part of their rations…

He then gave half his crew limes (the Treatment group) while the other half (the Control group) continued with their regular diet...

While the captain did not realize that scurvy is a consequence of vitamin C deficiency, and that limes are rich in vitamin C, the intervention worked.”

– Ron Kohavi, Practical Guide to Controlled Experiments on the Web









What should we test?

Travelocity relied heavily on “Always be Testing”

Contains a laundry list of every combination of testable elements on web pages.

Good starting point for test brainstorming.


Approaches: Eisenbergs (Monetate)


Credentials: author of “Always Be Testing,” lecturer, consultant

Philosophy:

• understand shopper psychology;

• sustain “scent;”

• note where shoppers bail;

• write formal hypotheses/draw wireframe, and;

• use matrix to prioritize.

Pace: high volume: 30+/month

Reporting: record results for historical memory/learning

Rigor: 80-95% confidence threshold, extensive path analysis


Approaches: Andrew Anderson (Adobe)

Credentials: top consultant of top A/B test firm; 300+ clients gained an average 10-40% conversion improvement using his methodology.

Philosophy:

• cares about the shopper’s actions, not thoughts

• test with an open mind, no preconceived notions

• start with page element elimination or MVT, then test permutations

• prioritize:
  – low-converting high-traffic pages, or
  – high-conversion-correlated pages

Pace: limited only by sequential testing, 10-14 days/test

Reporting: five metrics max, revenue per visitor, no path reporting

Rigor: 80% confidence threshold, 5% rise, frequent re-testing.



Comparing the methodologies

Topic area            Eisenberg / Monetate   Adobe / Andrew        Sabre Research / Travelocity
Test prioritization   5 x 5 x 5 product      Page element          Effort/impact
Customer intent       Profile/channel        Ignore                Understand
Design bias           Univariate             Multivariate          Univariate A/B/C…
Contamination         Embrace/ignore         Acknowledge           Accept sometimes
Negative lift         Unfortunate            Interesting           Unfortunate
“Micro-conversion”    Exciting               Worthless             Evaluating
Prone to find         Occasional big wins    Constant small wins   1 win per 7-10 tests
Test ends             Eyeball assessment     Eyeball assessment    Precalculated size
Error risk            High                   Medium                Low









Process: test suggestion database

• Test name

• Page name

• Page abbreviation

• Site section

• Date submitted

• Description

• Wireframe

• Effort expected

• Uplift expected

• Submitter

• Current owner

• Status

• Status change comments
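One possible shape for a record in such a database is sketched below. The field names follow the list above; the types, the default status value, the 1-5 rating check (borrowed from the solicitation step) and the `change_status` helper are assumptions made for illustration, not a description of any real system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Suggestion:
    """One row of a hypothetical test suggestion database."""
    test_name: str
    page_name: str
    page_abbreviation: str
    site_section: str
    date_submitted: date
    description: str
    submitter: str
    effort_expected: int              # assumed 1-5 rating
    uplift_expected: int              # assumed 1-5 rating
    wireframe: Optional[str] = None   # e.g. a path to a scanned drawing
    current_owner: Optional[str] = None
    status: str = "submitted"         # assumed initial status
    status_change_comments: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject ratings outside the assumed 1-5 scale.
        for label, value in (("effort", self.effort_expected),
                             ("uplift", self.uplift_expected)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} rating must be between 1 and 5")

    def change_status(self, new_status: str, comment: str) -> None:
        """Record a status change together with the reason for it."""
        self.status_change_comments.append(f"{self.status} -> {new_status}: {comment}")
        self.status = new_status


suggestion = Suggestion(
    test_name="Rename Flexible Dates tab",
    page_name="Hotel Detail Page",
    page_abbreviation="HDP",
    site_section="Hotels",
    date_submitted=date(2014, 4, 2),
    description="Rename the tab to 'Best Dates'",
    submitter="example submitter",
    effort_expected=1,
    uplift_expected=3,
)
suggestion.change_status("prioritized", "selected for the next wave")
```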


Process: solicitation

• Request everyone’s ideas (there is no monopoly on good ones)
  – Direct them to the online test database (include the link in the message)

• Ask them to:
  – do a filtered search for similar ideas on the same page
  – fill in the form as completely as possible, including:
    • Rate (from 1-5) the business improvement they expect, and the expected effort
    • Include a wireframe picture of the change, even if it’s a scanned crayon drawing
    • Describe the effect they’re expecting



Process: prioritization

1) Schedule periodic prioritization meetings

2) For each page on the site:
   a) Review recent test results and how they affect your thinking
   b) Multiply projected impact numbers * ease of implementation numbers
   c) Rank in descending order
   d) Select the next wave of tests

3) For each site section/pathway:
   a) Sample conversion and traffic volume as applies to prioritized test pages
   b) Decide what sensitivity/confidence/variant count you wish to specify
   c) Calculate sample size and minimum test length
   d) Adjust test parameters as appropriate

4) Evaluate selected tests’ potential to interfere with each other

5) Ratify final test set and test parameters

6) Assign the developers and testers their tasks
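Steps 2b and 2c (multiply impact by ease, then rank in descending order) could be sketched as follows. The dictionary keys, the suggestion names and the ratings are invented for illustration; `ease` is assumed to be a 1-5 rating where 5 means easiest to implement.

```python
def rank_suggestions(suggestions):
    """Sort suggestions by impact * ease, highest score first."""
    return sorted(suggestions, key=lambda s: s["impact"] * s["ease"], reverse=True)


# Hypothetical suggestions for one page, each rated 1-5 on both scales.
page_suggestions = [
    {"name": "Redesign search widget", "impact": 5, "ease": 2},      # score 10
    {"name": "Rename Flexible Dates tab", "impact": 4, "ease": 5},   # score 20
    {"name": "Remove Flexible Dates tab", "impact": 3, "ease": 4},   # score 12
]

ranked = rank_suggestions(page_suggestions)
for s in ranked:
    print(s["impact"] * s["ease"], s["name"])
```

The next wave of tests would then be taken from the top of `ranked`, subject to the traffic and interference checks in steps 3 and 4.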



Avoiding errors

Testing doesn’t reveal The Truth; it enables us to make accurate guesses about the truth. Those guesses can be wrong in one of two ways:

• Type I error: accepting a bad variant.

• Type II error: rejecting a good variant.

                     Decision: reject null   Decision: accept null
Truth: null          Type I error            Right decision
Truth: alternative   Right decision          Type II error



Test length for statistical significance

Sample size = 2 * Z^2 * Conversion * (1 - Conversion) / (Conversion * Change)^2

You have to test longer…

• Change: …the smaller the lift you want to detect

• Confidence: …the greater the confidence you want to have

• Conversion: …the closer the page’s conversion is to 50%, for a given absolute lift (Conversion * (1 - Conversion) peaks at 50%). For a given relative Change, as in the formula above, lower-converting pages need the larger samples.

• Contamination: …the purer you want the results to be. If you let experiments re-use each other’s traffic, you can get more data faster.
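A minimal sketch of the sample size formula follows. Several things are assumptions rather than statements from the slide: Z is taken as the two-sided standard normal quantile for the chosen confidence, Change is a relative lift (0.10 = 10%), and the result is read as visitors per variant. The formula as given has no statistical power term, so treat the output as a lower bound rather than a full power calculation.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(conversion, change, confidence=0.95):
    """Sample size = 2 * Z^2 * p * (1 - p) / (p * change)^2, rounded up.

    conversion: baseline conversion rate p, between 0 and 1
    change:     relative lift to detect, e.g. 0.10 for 10% (assumed meaning)
    confidence: e.g. 0.95; converted to a two-sided Z value (assumption)
    """
    if not 0 < conversion < 1:
        raise ValueError("conversion must be between 0 and 1")
    if change <= 0:
        raise ValueError("change must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")

    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = 2 * z ** 2 * conversion * (1 - conversion) / (conversion * change) ** 2
    return math.ceil(n)


# Hypothetical page: 5% baseline conversion, looking for a 10% relative lift.
n_95 = sample_size_per_variant(0.05, 0.10, confidence=0.95)
n_80 = sample_size_per_variant(0.05, 0.10, confidence=0.80)
n_small_lift = sample_size_per_variant(0.05, 0.05, confidence=0.95)
print(n_95, n_80, n_small_lift)
```

Halving the detectable lift roughly quadruples the required sample, which is why small expected improvements call for much longer tests.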


Test length: additional considerations

Beyond statistical minimum sample size, there are other test-sizing factors to consider:

• Cyclicality: shoppers often demonstrate different conversion behavior on different days of week, to a different degree across variants. Inoculate your test by stopping it on a multiple of 7 days.

• Re-shop: the benefits of a superior variant may not express themselves during the same session. Know the average shop-to-book incubation period for the tested page and run several weeks longer.

• Interface shock: as above, the control has a home court advantage by virtue of being well-known to past visitors. Even superior variants play at a disadvantage until shoppers become accustomed.
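One way to fold these considerations into a planned test length is sketched below. The function name, its parameters and the traffic figures are hypothetical; adding the incubation period as extra days and then rounding up to whole weeks is only one reading of the advice above.

```python
import math


def planned_test_days(sample_per_variant, variant_count, daily_visitors,
                      incubation_days=0):
    """Days needed to fill every arm, plus incubation, rounded up to whole weeks.

    variant_count is assumed to include the control.
    daily_visitors is the traffic reaching the tested page, split across all arms.
    """
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    total_visitors = sample_per_variant * variant_count
    days = math.ceil(total_visitors / daily_visitors) + incubation_days
    # Stop on a multiple of 7 so every day of the week is equally represented.
    return math.ceil(days / 7) * 7


# Hypothetical inputs: 14,598 visitors per arm, control plus two variants,
# 4,000 visitors a day, and a one-week shop-to-book incubation period.
print(planned_test_days(14_598, 3, 4_000, incubation_days=7))
```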



Process: pre-test

1) Ensure all variants have been fully tested

2) Announce forthcoming tests, with special attention to:
   a) Help Desk
   b) Site Health

3) Re-communicate your reporting procedure:
   a) All results and your interpretations will be immediately posted to the database
   b) Interim test results are not scientific and will not be:
      i. Subject to early speculation
      ii. Cause to prematurely terminate tests or extend them beyond the agreed interval



During the tests

• Keep an eye on site metrics, both overall and on the pages your tests touch

• Avoid “premature exclamation”
  – Observe, but do not report on, the test variants’ performance
  – Remember that measures of statistical confidence are valid only at the exact moment you pre-determined to end the test.
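The reason interim readouts mislead can be illustrated with a small simulation of A/A tests, where both arms share the same true conversion rate and any "winner" is a false alarm. Everything here is an assumption chosen for illustration: the 10% rate, 2,000 visitors per arm, a look every 100 visitors, and a pooled two-proportion z-test at roughly 95% confidence.

```python
import math
import random


def z_score(succ_a, succ_b, n):
    """Pooled two-proportion z statistic for two arms of equal size n."""
    pooled = (succ_a + succ_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return 0.0 if se == 0 else (succ_b / n - succ_a / n) / se


def simulate(n_tests=300, visitors_per_arm=2000, look_every=100,
             rate=0.10, z_crit=1.96, seed=1):
    """Return (false-alarm rate when peeking, false-alarm rate at the planned end)."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(n_tests):
        succ_a = succ_b = 0
        crossed_at_any_look = False
        for n in range(1, visitors_per_arm + 1):
            succ_a += rng.random() < rate   # both arms convert at the same rate
            succ_b += rng.random() < rate
            is_look = n % look_every == 0 or n == visitors_per_arm
            if is_look and abs(z_score(succ_a, succ_b, n)) > z_crit:
                crossed_at_any_look = True
        # Significance at the pre-determined end of the test only.
        final_hits += abs(z_score(succ_a, succ_b, visitors_per_arm)) > z_crit
        peek_hits += crossed_at_any_look
    return peek_hits / n_tests, final_hits / n_tests


peek_rate, final_rate = simulate()
print(f"declared a winner at some look: {peek_rate:.1%}")
print(f"declared a winner at planned end: {final_rate:.1%}")
```

Stopping at the first interim look that shows "confidence" corresponds to the first figure, which is why a readout is only trustworthy at the stopping point chosen in advance.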


Process: reportage

Once each test has run to its pre-determined conclusion trigger:

1) Copy the results from test tool into the database

2) Complete the after-test fields, including:
   a) What do we conclude from this test, especially if results are significant?
   b) What (if any) concerns do we have about its accuracy?
   c) What follow-on tests and actions do we recommend?

3) Decide whether the results are interesting/controversial enough to warrant calling a meeting and/or distributing an explanatory .ppt.

4) Periodically review the relative performance of the various test groups in your site metrics tool if it permits selection that way.






Testing alternatives: Multi-armed bandit method

Like a hive of bees that always reconnoiter but sends most bees to known gardens.
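The bee analogy maps most directly onto an epsilon-greedy strategy, which is one simple bandit method among several; the slide does not say which variant was intended. In the sketch below, a small share of visitors (`epsilon`) is sent to a random version to keep scouting, and everyone else goes to the version with the best observed conversion so far. The conversion rates, visitor count and seed are invented.

```python
import random


def epsilon_greedy(true_rates, n_visitors=20_000, epsilon=0.1, seed=7):
    """Simulate epsilon-greedy traffic allocation; return (pulls, wins) per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms   # visitors sent to each version
    wins = [0] * n_arms    # conversions observed for each version

    def observed_rate(i):
        # Untried versions rank first so every version gets at least one visitor.
        return wins[i] / pulls[i] if pulls[i] else float("inf")

    for _ in range(n_visitors):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)               # scout: pick at random
        else:
            arm = max(range(n_arms), key=observed_rate)  # exploit best so far
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]   # simulated conversion
    return pulls, wins


# Hypothetical true conversion rates for control, variant A and variant B.
pulls, wins = epsilon_greedy([0.05, 0.10, 0.20])
print(pulls)
```

Unlike a fixed-split A/B test, the allocation shifts during the run, so less traffic is spent on weak versions, at the cost of a less clean comparison between them.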



Take-aways

1) Commission a company-wide online test idea database

2) Be clear about your testing methodology:
   a) document your hypotheses, be able to tell a narrative
   b) know what error risks you’re exposing yourself to
   c) quick tests with low confidence/sensitivity can be okay if followed up

3) Ignore your test tool vendor rep
   a) “Let’s test longer to see if it goes to confidence.” = disqualifying statement
   b) The test tool readout is accurate only on the pre-chosen test day.

4) Consider multi-armed bandit optimization

