Digital travel summit a b testing 2014-04-02
C O N F I D E N T I A L
Testing and Tracking To Improve Conversion Rates
Jonathan Isernhagen
Digital Marketing Fundamentals Workshop Day 2
Digital Travel Summit
4/2/2014
Home Page search widget modification
[Screenshots: Control, Variant A, Variant B, Variant C; the final slide shows Variant B]
Hotel Detail Page “Flexible Dates” tab name
[Screenshots: Control; Variant A: removal; Variant B: “Best Dates”; Variant C: “Cheapest Dates”; the final slide shows Variant C]
Agenda
1) Examples
2) Program goals
3) Test approach philosophies
   a) Eisenberg
   b) Anderson
4) Process
   a) solicitation
   b) prioritization
   c) pre-test
   d) testing
   e) reportage
5) Alternative method: the multi-armed bandit
[email protected] @jon_isernhagen www.linkedin.com/in/isernhagen
What is A/B testing?
Comparing conversion performance between shoppers on:
• Current site (=“control”);
• Modified version(s) (=“variants”).
The version with the highest success/view ratio (= “conversion”) is the winner.
“Success” can be defined in many ways:
1) Some sites aim for visitor pages viewed or time-on-site
2) Transaction sites count booking completion pages
3) Sophisticated testers incorporate all revenue (even ad revenue)
[Diagram labels: Control page; Test variant; Completion (“success”) page; Other site pages]
Why do we A/B Test? Conversion Improvement
Continuous conversion improvement
subject to market competition
≈ Building in Venice
Why do we A/B Test? HIPPO* Defense
*Highest Income Person’s Opinion
“If we have data, let’s look at data.
If all we have are opinions, let’s go with mine.”
– Jim Barksdale, Netscape CEO
Why do we A/B test? Causality is Optional
“A British ship’s captain observed the lack of scurvy among sailors serving on the naval ships of Mediterranean countries, where citrus fruit was part of their rations…
He then gave half his crew limes (the Treatment group) while the other half (the Control group) continued with their regular diet...
While the captain did not realize that scurvy is a consequence of vitamin C deficiency, and that limes are rich in vitamin C, the intervention worked.”
– Ron Kohavi, Practical Guide to Controlled Experiments on the Web
What should we test?
Travelocity relied heavily on the book “Always Be Testing.”
It contains a laundry list of every combination of testable elements on web pages.
It is a good starting point for test brainstorming.
Approaches: Eisenbergs (Monetate)
Credentials: author of “Always Be Testing,” lecturer, consultant
Philosophy:
• understand shopper psychology;
• sustain “scent;”
• note where shoppers bail;
• write formal hypotheses and draw wireframes; and
• use a matrix to prioritize.
Pace: high volume (30+ tests/month)
Reporting: record results for historical memory/learning
Rigor: 80-95% confidence threshold, extensive path analysis
Approaches: Andrew Anderson (Adobe)
Credentials: top consultant at the top A/B testing firm; 300+ clients gained an average 10-40% conversion improvement using his methodology.
Philosophy:
• cares about the shopper’s actions, not thoughts;
• test with an open mind, with no preconceived notions;
• start with page-element elimination or MVT (multivariate testing), then test permutations;
• prioritize low-converting, high-traffic pages or high-conversion-correlated pages.
Pace: limited only by sequential testing, 10-14 days/test
Reporting: five metrics max, revenue per visitor, no path reporting
Rigor: 80% confidence threshold, 5% rise, frequent re-testing.
Comparing the methodologies
Topic area | Eisenberg / Monetate | Adobe / Andrew Anderson | Sabre Research / Travelocity
Test prioritization | 5 x 5 x 5 product | Page element | Effort/impact
Customer intent | Profile/channel | Ignore | Understand
Design bias | Univariate | Multivariate | Univariate A/B/C…
Contamination | Embrace/ignore | Acknowledge | Accept sometimes
Negative lift | Unfortunate | Interesting | Unfortunate
“Micro-conversion” | Exciting | Worthless | Evaluating
Prone to find | Occasional big wins | Constant small wins | 1 win per 7-10 tests
Test ends | Eyeball assessment | Eyeball assessment | Precalculated size
Error risk | High | Medium | Low
Process: test suggestion database
• Test name
• Page name
• Page abbreviation
• Site section
• Date submitted
• Description
• Wireframe
• Effort expected
• Uplift expected
• Submitter
• Current owner
• Status
• Status change comments
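To make the field list concrete, here is a minimal sketch of one database record, plus the “filtered search for similar ideas on the same page” that the solicitation step asks submitters to run. The class name `Suggestion`, the helper `similar_ideas`, the `wireframe` path convention and the status values such as "submitted" are hypothetical. A real deployment would sit on whatever database or issue tracker your company already uses.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Suggestion:
    """One row of a test suggestion database (field names are illustrative)."""
    test_name: str
    page_name: str
    page_abbreviation: str
    site_section: str
    date_submitted: date
    description: str
    wireframe: str            # hypothetical: path or URL of the wireframe image
    effort_expected: int      # submitter's 1-5 rating
    uplift_expected: int      # submitter's 1-5 rating
    submitter: str
    current_owner: str
    status: str = "submitted"  # hypothetical status vocabulary
    status_change_comments: list[str] = field(default_factory=list)

    def change_status(self, new_status: str, comment: str) -> None:
        """Log why the status changed, then apply the new status."""
        self.status_change_comments.append(f"{self.status} -> {new_status}: {comment}")
        self.status = new_status


def similar_ideas(db: list[Suggestion], page_abbreviation: str) -> list[Suggestion]:
    """Filtered search: existing ideas for the same page, to check before submitting."""
    return [s for s in db if s.page_abbreviation == page_abbreviation]
```

Logging every status change together with its comment preserves the “historical memory” that later prioritization meetings rely on.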
Process: solicitation
• Request everyone’s ideas (there is no monopoly on good ones)
  – Direct them to the online test database (include the link in the message)
• Ask them to:
  – do a filtered search for similar ideas on the same page;
  – fill in the form as completely as possible, including:
    • a rating (from 1-5) of the business improvement they expect, and of the expected effort;
    • a wireframe picture of the change, even if it’s a scanned crayon drawing;
    • a description of the effect they’re expecting.
Process: prioritization
1) Schedule periodic prioritization meetings
2) For each page on the site:
   a) Review recent test results and how they affect your thinking
   b) Multiply projected impact scores by ease-of-implementation scores
   c) Rank in descending order
   d) Select the next wave of tests
3) For each site section/pathway:
   a) Sample conversion rates and traffic volumes for the prioritized test pages
   b) Decide what sensitivity, confidence and variant count you wish to specify
   c) Calculate sample size and minimum test length
   d) Adjust test parameters as appropriate
4) Evaluate selected tests’ potential to interfere with each other
5) Ratify final test set and test parameters
6) Assign the developers and testers their tasks
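Steps 2b-2d reduce to a simple score-and-sort. The sketch below assumes both ratings use a 1-5 scale on which higher is better, so “ease” is 5 for the easiest change. If your database stores expected effort instead, you would first convert it, for example with ease = 6 - effort; that conversion is an assumption and is not part of the process described above. The names `Idea` and `next_wave` and the sample ideas are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int  # projected business impact, 1 (low) to 5 (high)
    ease: int    # ease of implementation, 1 (hard) to 5 (easy); assumed scale

    @property
    def score(self) -> int:
        """Priority score: projected impact multiplied by ease of implementation."""
        return self.impact * self.ease


def next_wave(ideas: list[Idea], slots: int) -> list[Idea]:
    """Rank ideas by score in descending order and keep the top `slots`."""
    ranked = sorted(ideas, key=lambda idea: idea.score, reverse=True)
    return ranked[:slots]


ideas = [
    Idea("Rename date tab", impact=3, ease=5),         # score 15
    Idea("Redesign search widget", impact=5, ease=2),  # score 10
    Idea("Remove promo banner", impact=2, ease=4),     # score 8
]
print([idea.name for idea in next_wave(ideas, 2)])
```

Python's sort is stable, so ideas with equal scores keep their submission order. The meeting can then break ties by discussion.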
Avoiding errors
Testing doesn’t reveal The Truth; it enables us to make accurate guesses about the truth. Those guesses can be wrong in one of two ways:
• Accepting a bad variant (Type I error)
• Rejecting a good variant (Type II error)
Truth \ Decision | Reject null | Accept null
Null | Type I error | Right decision
Alternative | Right decision | Type II error
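The reject/accept decision in this table is typically made with a significance test. One common choice for conversion rates is the pooled two-proportion z-test, sketched below. The test tools mentioned in this deck may use different statistics, so treat this as an illustration only. Strictly speaking, “Accept null” means “not enough evidence to reject the null.” The function name and the visitor counts in the usage example are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control, conv_variant, n_variant, alpha=0.05):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value, decision)."""
    p_control = conv_control / n_control
    p_variant = conv_variant / n_variant
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    if se == 0:  # no conversions at all (or all converted): nothing to test
        return 0.0, 1.0, "Accept null"
    z = (p_variant - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "Reject null" if p_value < alpha else "Accept null"
    return z, p_value, decision


# 200 vs 260 bookings out of 10,000 visitors each (illustrative numbers)
print(two_proportion_z_test(200, 10_000, 260, 10_000))
```

With `alpha=0.05`, roughly 1 in 20 A/A comparisons would still come out “Reject null.” That is the Type I risk in the table above.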
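Test length for statistical significance
$$ \text{Sample size} = \frac{2 Z^2 \cdot \text{Conversion} \cdot (1 - \text{Conversion})}{(\text{Conversion} \cdot \text{Change})^2} $$
(visitors per variant; Z is the standard normal z-score for the chosen confidence level, Conversion is the page’s current conversion rate, and Change is the relative lift to detect)
You have to test longer…
• Change: …the smaller the lift you want to detect
• Confidence: …the greater the confidence you want to have
• Conversion: …the lower the page’s current conversion rate (for a fixed relative Change, sample size scales with (1 - Conversion) / Conversion)
• Contamination: …the purer you want the results to be. If you let experiments re-use each other’s traffic, you can get more data faster.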
Test length: additional considerations
Beyond statistical minimum sample size, there are other test-sizing factors to consider:
• Cyclicality: shoppers often demonstrate different conversion behavior on different days of week, to a different degree across variants. Inoculate your test by stopping it on a multiple of 7 days.
• Re-shop: the benefits of a superior variant may not express themselves during the same session. Know the average shop-to-book incubation period for the tested page and run several weeks longer.
• Interface shock: as above, the control has a home-court advantage by virtue of being well known to past visitors. Even superior variants play at a disadvantage until shoppers become accustomed to them.
Process: pre-test
1) Ensure all variants have been fully QA-tested
2) Announce forthcoming tests, with special attention to:
   a) Help Desk
   b) Site Health
3) Re-communicate your reporting procedure:
   a) All results and your interpretations will be posted to the database immediately
   b) Interim test results are not scientific and will not be:
      i. subject to early speculation
      ii. grounds to terminate tests prematurely or extend them beyond the agreed interval
During the tests
• Keep an eye on site metrics generally and where your tests touch
• Avoid “premature exclamation”
  – Observe, but do not report on, the test variants’ performance
  – Remember that measures of statistical confidence are valid only at the exact moment you pre-determined to end the test.
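A small simulation shows why confidence readouts are valid only at the pre-determined end point. The sketch runs A/A tests, in which both arms share the same true conversion rate, and compares two policies. Under the first, significance is checked once at the planned end. Under the second, a winner is declared if any interim look crosses the threshold. All parameters are hypothetical: a 10% conversion rate, 20 looks, 200 visitors per arm per look, and a 1.96 cutoff (about 95% confidence, two-sided). The any-look false-positive rate typically comes out several times higher than the roughly 5% produced by the single fixed-horizon check.

```python
import math
import random


def z_stat(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def simulate_aa(sims=300, looks=20, visitors_per_look=200, rate=0.1, z_crit=1.96, seed=42):
    """Return (fixed-horizon false-positive rate, any-look false-positive rate)."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        flagged_early = False
        for _ in range(looks):
            # Both arms convert at the same true rate: any "winner" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            if abs(z_stat(conv_a, n, conv_b, n)) > z_crit:
                flagged_early = True  # a peeker would have stopped here
        peeking_hits += flagged_early
        fixed_hits += abs(z_stat(conv_a, n, conv_b, n)) > z_crit  # single planned look
    return fixed_hits / sims, peeking_hits / sims


print(simulate_aa())
```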
Process: reportage
Once each test has run to its pre-determined conclusion trigger:
1) Copy the results from test tool into the database
2) Complete the after-test fields, including:
   a) What do we conclude from this test, especially if results are significant?
   b) What concerns (if any) do we have about its accuracy?
   c) What follow-on tests and actions do we recommend?
3) Decide whether the results are interesting or controversial enough to warrant calling a meeting and/or distributing an explanatory .ppt.
4) Periodically review the relative performance of the various test groups in your site metrics tool, if it lets you segment by test group.
Testing alternatives: Multi-armed bandit method
Like a hive of bees that always keeps reconnoitering but sends most of its bees to known gardens.
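Epsilon-greedy is one of the simplest bandit strategies, and it maps directly onto the bee analogy. A fixed share of visitors (`epsilon`) keeps “reconnoitering” by going to a random variant. Everyone else goes to the variant with the best observed conversion so far. Other bandit strategies, such as Thompson sampling or UCB, exist, and the deck does not say which one it has in mind. The true rates, visitor count and seed below are hypothetical simulation inputs.

```python
import random


def epsilon_greedy(true_rates, visitors=20_000, epsilon=0.1, seed=7):
    """Simulate epsilon-greedy traffic allocation.

    Returns (visitors shown per arm, conversions per arm).
    """
    rng = random.Random(seed)
    k = len(true_rates)
    shown = [0] * k
    converted = [0] * k
    for _ in range(visitors):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: scout a random variant
        else:
            # exploit: best observed conversion so far; unseen arms get tried first
            arm = max(range(k),
                      key=lambda i: converted[i] / shown[i] if shown[i] else float("inf"))
        shown[arm] += 1
        if rng.random() < true_rates[arm]:
            converted[arm] += 1
    return shown, converted


shown, converted = epsilon_greedy([0.02, 0.06, 0.03])
print(shown, converted)
```

Compared with an evenly split A/B test, a bandit sends less traffic to losing variants while the test is running. The cost is that the weaker arms are measured less precisely, which makes the final comparison harder to read with classical confidence measures.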
Take-aways
1) Commission a company-wide online test idea database
2) Be clear about your testing methodology:
   a) document your hypotheses; be able to tell a narrative
   b) know what error risks you’re exposing yourself to
   c) quick tests with low confidence/sensitivity can be okay if followed up
3) Ignore your test tool vendor rep:
   a) “Let’s test longer to see if it goes to confidence.” = disqualifying statement
   b) The test tool readout is accurate only on the pre-chosen test end day.
4) Consider multi-armed bandit optimization