What is A/B-testing? An Introduction

A/B-Testing: An Introduction

What is it? Why Use it?

Prediction in Predictable Environments
Predictable Models Excel in Deterministic Environments
• Statics & Dynamics Don’t Change
• ‘Fitness’ for purpose always measured the same
• Frictionless Pendulum swing Very Predictable
– Simple Harmonic Motion
• Control Systems
– e.g. Anti-lock Braking System
Sacrilege: Learning is pointless (it’s all known), thus Waterfall/Heavy Development Methods Excel! :-O
Time Period given by
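For a frictionless simple pendulum making small swings, the standard textbook result for the period is sketched below. The symbols $L$ (pendulum length) and $g$ (gravitational acceleration) are assumptions introduced here for illustration.

```latex
T = 2\pi\sqrt{\frac{L}{g}}
```

Nothing in the expression changes from one swing to the next, which is why the outcome can be predicted far ahead.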

Uncertain/Unpredictable Contexts
• Human Interaction Uncertain.
• Everyone is…
– Different
– [Relatively] fickle
– Growing Older
– Influenced By Other Stuff
– …
• Definition of fitness for purpose changes
• In fact, Everything Changes!

Story of the Foot
• Once upon a time there was a foot which Belonged to the King of a Powerful Kingdom
• He Reigned Supreme because All Swords Had to be 7 ft Long
• King dies naturally and a new King is Crowned
• But he has a Big Ego and Really Small Feet
– Half the length of Previous King
• He Ordains All Swords Now not Fit for Purpose
• So they’re Melted & Remade to 7 of his feet
• Along comes an Evil Army with swords now Twice as Long
• Nobody in the Kingdom Lived Happily Ever After! :-(

Q: HOW CAN WE EVER BE PREDICTABLE?

Pick Your Tool: Certainty v Uncertainty
Predictable Environments
• Lots known up front
• ‘Variables/factors’ can all be identified…
• …So can predict with high certainty where whole systems will be in t time-steps
– seconds, minutes, hours, days, weeks, months, years…
• Little Need to Adapt
• Most appropriate for Standards Models
– SI Units
– HTTP/SMTP/POP3…
• ‘Dictate works’, not nice, but true
• e.g. ‘7ft’ Swords would have continued to exist
– Even if the heads of the blacksmiths didn’t.
Uncertain Environments
• Very little known up front
• Variable levels of traffic, experience etc.
• ‘Fitness function’ itself changes
– e.g. King changes = Foot changes
• Continual need to check the fitness function…
– e.g. Customer reviews, performance metrics
• Implies Continual Need to Change/Improve Systems

EXAMPLE: Running a Bath (Uncertain)
Predictable Models
• Don’t know the water temperature
• Never done it before
1. Put hot tap on for 5 minutes
2. Put cold tap on for 2 minutes
3. Get in
Risks
Scalding your Jewels and More!
Uncertainty Models
• Don’t know the water temperature
• Never done it before
1. Put hot tap on for 5 seconds
2. Put cold tap on for 2 seconds
3. Dip toe in
4. If
• Too hot add cold water
• Too cold add hot water
• Else get in & relax
5. Go to 1 (Rinse, Repeat)
Risks
Slightly more time to get to ideal temperature, but gets there with much less risk of burning crucial elements and potentially less water waste.
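The ‘uncertainty model’ for the bath can be sketched as a small feedback loop. This is only a toy simulation: the `Bath` class, the tap temperatures, the 38 °C target and the one-litre bursts are all assumptions made up for illustration, not figures from the slides.

```python
from dataclasses import dataclass


@dataclass
class Bath:
    """Toy bath: tracks litres of water and their mixed temperature."""
    litres: float = 0.0
    temp_c: float = 0.0

    def add(self, litres: float, temp_c: float) -> None:
        # Mixing rule (assumed): volume-weighted average of the temperatures.
        total = self.litres + litres
        self.temp_c = (self.litres * self.temp_c + litres * temp_c) / total
        self.litres = total


def run_bath(hot_c=60.0, cold_c=10.0, target_c=38.0, tolerance_c=1.0,
             burst_litres=1.0, max_cycles=200):
    """Small bursts, test with the 'toe', adjust, repeat."""
    bath = Bath()
    # Steps 1 and 2: a short burst from each tap (5 units hot, 2 units cold).
    bath.add(5 * burst_litres, hot_c)
    bath.add(2 * burst_litres, cold_c)
    for cycle in range(1, max_cycles + 1):
        # Step 3: dip a toe in (measure).
        error = bath.temp_c - target_c
        # Step 4: decide what to do next (learn), then act (build).
        if error > tolerance_c:
            bath.add(burst_litres, cold_c)   # too hot: add cold water
        elif error < -tolerance_c:
            bath.add(burst_litres, hot_c)    # too cold: add hot water
        else:
            return bath, cycle               # just right: get in & relax
        # Step 5: go round again.
    raise RuntimeError("did not reach the target temperature")


bath, cycles = run_bath()
print(f"{bath.temp_c:.1f} C after {cycles} checks, {bath.litres:.0f} litres used")
```

Each pass through the loop risks only one small burst of water, which is the point of the slide: the cost of being wrong is bounded by the size of the step.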

EXAMPLE: Running a Bath Cycle
• Run Water (Hot and Cold) - Build
• Test with ‘Toe’ - Measure
• Evaluate Temperature - Learn
“Best test this with my toe, so I don’t scald myself…”
“Ahh, F@#*!!! THAT’S HOT!”
“I burnt my toe! Not doing that again!”

Dealing with Uncertainty
• More variables than equations to solve them…
• …Hence optimisation problem (no unique solution)
• Like it or not, iterative cycles work best
– Build-Measure-Learn; DMAIC
• Frequent Experiments & Actionable Change
• Control by Experimental Design Principles
– Test one change in isolation
– Compare against a control group/result
– Randomise Groupings
– Double Blind
• Plus, smaller tasks = smaller variance = greater certainty

Gold Standard: Randomised Double Blind Controlled Trial

Definition: Randomised
• Two groups
• Randomly Assign Subjects to Each Group
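A minimal sketch of random assignment, using only Python's standard library. The function name `randomise`, the group labels and the `user-N` identifiers are hypothetical; the seed is there only so that an allocation can be reproduced for audit.

```python
import random


def randomise(subjects, seed=None):
    """Shuffle subjects and split them into two near-equal groups."""
    rng = random.Random(seed)      # seed only so an allocation can be reproduced
    shuffled = list(subjects)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}


subjects = [f"user-{i}" for i in range(10)]   # hypothetical subject ids
groups = randomise(subjects, seed=42)
print(len(groups["control"]), len(groups["experimental"]))
```

Shuffling first and then cutting the list in half keeps the groups the same size, which a coin flip per subject would not guarantee.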

Definition: Double Blind
Neither Researcher nor Subject knows which group the subject is assigned to.
So researcher and subject behave the same for A and B tests.

TIP: Automated allocation

Image via ’John the Math Guy’
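One possible way to automate allocation so that nobody picks groups by hand is to derive the group from a hash of the subject's identifier. This is a sketch under assumptions: the function `allocate`, the salt value, the experiment name and the opaque labels `group-1`/`group-2` are all invented here. Which label is the control would be recorded separately and revealed only at analysis time.

```python
import hashlib


def allocate(subject_id: str, experiment: str, salt: str = "keep-this-secret") -> str:
    """Map a subject to an opaque group code; same inputs always give the same code."""
    digest = hashlib.sha256(f"{salt}:{experiment}:{subject_id}".encode()).digest()
    # Parity of the first byte picks the group. The codes themselves carry
    # no hint of which group is the control.
    return "group-1" if digest[0] % 2 == 0 else "group-2"


codes = [allocate(f"user-{i}", "checkout-button") for i in range(1000)]
print(codes.count("group-1"), codes.count("group-2"))
```

Because the result depends only on the inputs, a returning subject always lands in the same group without any lookup table that a researcher could peek at.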

Definition: Controlled

Every potential factor is fixed aside from the factor under test.

Minimises ‘confounding variables’
e.g. If someone goes outside and gets wet, does it mean it’s raining?

Image via ‘Not the average’ blog

Designing Experiments
• Start with Hypothesis
– Include theory if analytical
• Experiment AGAINST a control group!
– Control Group = Baseline to compare against (B-test)
– Experimental Group is A-test
• Randomly Allocate Control & Experimental Group
– Ideally Researcher & Subject Can’t Know
• Analyse Results, Conclude AND Act!

Caution
• Change only one thing at once!
– Can do A/B/n tests, but have to be linearly independent variables
• statistically, not a certainty!
• Objective: Make sure results aren’t by chance (e.g. against placebo)!
• Analyse against ‘Null’ Hypothesis
– Opposite of what you are trying to prove
• Factor in type 1 & 2 statistical errors
– False Positives and False Negatives
• Your test is the alternate hypothesis
• If the result would be very, very unlikely under the Null hypothesis (Chance), reject it and accept the Alternate hypothesis…
– ‘Small-p’ = probability of seeing a result at least this extreme if the null hypothesis were true
• …which you are trying to prove!
• Otherwise, no choice but to keep the null hypothesis (fail to reject it)
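As an illustration of testing against the null hypothesis, here is a sketch of a two-sided two-proportion z-test using only Python's standard library. The conversion counts and the 0.05 threshold are invented for the example, and the z-test is just one common choice; the slides do not prescribe a particular test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate: the best estimate if the null (no difference) were true.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # p-value: chance of a gap at least this large if the null were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


ALPHA = 0.05  # accepted Type 1 (false positive) rate, fixed before the test

# Hypothetical counts: A = experimental group, B = control group.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=90, n_b=1000)
decision = "reject null" if p < ALPHA else "fail to reject null"
print(f"z = {z:.2f}, p = {p:.3f}: {decision}")
```

A small p-value does not say the alternate hypothesis is certainly true. It says the observed gap would be surprising if only chance were at work.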

Q: Where Can A/B-testing Be Used?

A: EVERYWHERE!

Where Can A/B-Tests Be Used?
• Guerrilla testing
• Lean-Startup A/B-Tests (tech, marketing etc.)
• Pilots
• Experiments
• Proofs of Concept
• Software Development Team Retrospectives
• Manufacturing Processes
• Change Programmes
• Departmental Effectiveness
• …

Q: What tools can we use?

A: STATISTICS

Toolbox: Normal Distribution
Data that is normally distributed shown as a continuous line.
Fixed width histogram = Same (right)
Pros:
1. Incredibly diverse
2. Tables/Excel Functions exist
Cons:
3. Needs many samples (25+)
– Errors significantly impact result & need other ways (e.g. t-test)
4. Can’t Always Force Normality
– But story point estimates can!
Source: Critical Numbers Group Sheffield University
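The same lookups that tables or Excel functions provide are available in Python's standard `statistics.NormalDist`. The metric below (a completion time with mean 30 and standard deviation 5) is hypothetical, chosen only to show the calls.

```python
from statistics import NormalDist

# Hypothetical metric: task completion time, mean 30 min, std dev 5 min.
completion_time = NormalDist(mu=30, sigma=5)

# Share of results expected within 1 and 2 standard deviations of the mean.
within_1_sigma = completion_time.cdf(35) - completion_time.cdf(25)
within_2_sigma = completion_time.cdf(40) - completion_time.cdf(20)

# Share expected to take longer than 45 min (more than 3 sigma above the mean).
slower_than_45 = 1 - completion_time.cdf(45)

print(f"{within_1_sigma:.3f} {within_2_sigma:.3f} {slower_than_45:.4f}")
```

These figures only hold if the data really is close to normal, which is the second ‘con’ on the slide.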

Toolbox: Confidence Intervals
Indicates reliability of estimate, given data = Likelihood that result falls within x standard deviations of the mean.
Answers “How sure are you that this result was expected?”
Pros:
1. Easy to do
2. Excel Functions/Libraries exist
Cons:
3. Same weakness as normal distribution
4. Arbitrary confidence levels
– Researcher chooses, but 95% is the de facto standard (roughly 2 sigma)
Source: Moz.com
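A sketch of a confidence interval for a mean, using the normal approximation and only the standard library. The function name `mean_confidence_interval` and the 28 sample values are made up for illustration. With few samples a t-distribution would be the safer choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `samples`."""
    n = len(samples)
    centre = mean(samples)
    standard_error = stdev(samples) / sqrt(n)
    # z is about 1.96 for 95%, i.e. roughly 2 standard errors either side.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical measurements: 28 values cycling through 20..26.
samples = [20 + (i % 7) for i in range(28)]
low, high = mean_confidence_interval(samples)
low99, high99 = mean_confidence_interval(samples, confidence=0.99)
print(f"95%: {low:.2f} to {high:.2f}   99%: {low99:.2f} to {high99:.2f}")
```

Asking for more confidence widens the interval, which is why the chosen level has to be stated alongside the result.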

Toolbox: Correlation Matrix
Matrix of elements. Each is the correlation coefficient of one data series v another.
“How strongly does this relate to that?”
High correlation -> dig deeper
Pros:
1. Excel Functions/Libraries exist
Cons:
2. Correlation isn’t Causation!
3. More of a ‘faff’ in Excel
– Prone to human error in analysis
Source: Genome biology
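A correlation matrix can be built with a few lines of plain Python, which avoids the manual steps in a spreadsheet. The sketch below uses the Pearson coefficient. The column names and numbers are hypothetical, invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of named columns."""
    return {a: {b: pearson(xs, ys) for b, ys in columns.items()}
            for a, xs in columns.items()}


# Hypothetical data, invented purely for illustration.
data = {
    "page_load_s": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bounce_rate": [0.20, 0.25, 0.35, 0.40, 0.50],
    "signups": [50, 44, 41, 33, 30],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

A strong coefficient here is only a prompt to dig deeper. It says nothing about which variable, if either, drives the other.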

Toolbox: Factor Analysis
Using correlation matrix to identify factors, determine independent variables for dependent variables.
Pros:
1. Linear Algebra tools to help
2. Identifies combinations of factors
Cons:
3. Excel doesn’t support it natively
4. ‘Cancelling’ factors or confounding factors problematic
5. Have to understand linear algebra
6. Basically an approximation (so what’s good enough?)
Source: Kovach Computing Services
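To give a feel for the linear algebra involved, the sketch below pulls the single dominant component out of a correlation matrix by power iteration. This is a principal-component style shortcut, not a full factor analysis, and the 3x3 matrix, the function name `dominant_factor` and the variable names are assumptions made up for the example.

```python
from math import sqrt


def dominant_factor(corr, iterations=200):
    """Power iteration: leading eigenvalue and unit eigenvector of `corr`."""
    n = len(corr)
    vec = [1.0 / sqrt(n)] * n          # start from an even mix of all variables
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        eigenvalue = sqrt(sum(p * p for p in product))   # length of corr @ vec
        vec = [p / eigenvalue for p in product]          # renormalise
    return eigenvalue, vec


# Hypothetical correlation matrix for three measured variables.
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
eigenvalue, weights = dominant_factor(corr)
share_of_variance = eigenvalue / len(corr)
print([round(w, 2) for w in weights], round(share_of_variance, 2))
```

The first two variables receive the largest weights because they are strongly correlated with each other. How much explained variance counts as ‘good enough’ remains a judgement call, as the slide notes.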

Definitions
TERM: DESCRIPTION
Dependent Variable: A variable that depends on one or more other variables (in y = x + 2, y is dependent, x is independent)
Independent Variable: A variable that does not depend on the value of any other variable.
Confounding Variable: A variable that could independently present the same result as some other variable. This reduces the credibility and certainty of a result (e.g. if I go outside and I get wet, is it because it was raining?)
Distribution: The ‘shape’ of the graph of a random variable
Type 1 Error (False Positive): Declaring a result as confirmed when it’s not, usually through experimental error.
Type 2 Error (False Negative): Declaring a result as false when it’s true, usually through experimental or interpretive error.

Thanks for Viewing
Further Reading
• Random Variables and Probability Distributions, Khan Academy: https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/random-variables
• Confidence Intervals, Wikipedia: http://en.wikipedia.org/wiki/Confidence_interval
• Normal Distribution, Wikipedia: http://en.wikipedia.org/wiki/Normal_distribution
• “Correlation & Dependence”, Wikipedia: http://en.wikipedia.org/wiki/Correlation_and_dependence
• Factor Analysis, Wikipedia: http://en.wikipedia.org/wiki/Factor_analysis
• Genome Biology (publishes research, software and new methods): http://genomebiology.com/

Ethar Alali @EtharUK @Dynacognetics
Managing Director & Chief Architect
Polymath-MathMo. Programming since 9 years old. TOGAF 9 Certified, change agent. Blog: GoadingtheITGeek.blogspot.co.uk
About Us
Specialist ICT Strategists & Advisors. Member of HiveMind Network for some of the biggest household and corporate multi-nationals.
Accreditations & Associations
Accredited Growth Voucher Advisors certified to deliver IT & Web Growth Consultancy as part of the government’s Growth Voucher Scheme.
