Murphy Power Analysis

8/3/2019 Murphy Power Analysis

1/40

Power Analysis for Traditional and

Modern Hypothesis Tests

Kevin R. Murphy

Pennsylvania State University


2/40

Power Analysis

Helps you plan better studies

Helps you make better sense of existing studies

Is not limited to traditional null hypothesis tests

Application of power analysis to minimum-effect tests will be discussed


3/40

Errors in Null Hypothesis Tests

                       True State of Affairs
Your Decision          No Effect (H0)                  Some Effect
Reject Null            Type I Error - reject null      Power = 1 - β
                       when it is true (α)
Fail to Reject Null    (correct decision)              Type II Error - fail to reject
                                                       null when you should (β)


4/40

Power Depends On

Effect Size

How large is the effect in the population?

Sample Size (N)

You are using a sample to make inferences about

the population. How large is the sample?

Decision Criteria - How do you define significant and why?


5/40

Power Analysis and the F Distribution

The power of most statistical tests in the social sciences (e.g., ANOVA, regression, t-tests, other linear model statistics) can be evaluated via the familiar F distribution

F is a ratio of observed effect to error: F = MS treatments / MS error

F = (True Effect + Error) / Error

The larger the true treatment effect, the larger the F you expect to find

If the null hypothesis is correct, E(F) is approximately 1.0
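
The claim about E(F) can be checked by simulation. The sketch below is illustrative only: the design (four groups of ten), the seed, and the function names are assumptions, not part of the slides. It draws data with and without a true effect and averages the resulting F ratios. In theory the mean of a central F is df_error / (df_error - 2), which is slightly above 1.0 for finite samples.

```python
import random
import statistics

def one_way_f(groups):
    """F = MS treatments / MS error for equally sized groups."""
    k, n = len(groups), len(groups[0])
    grand_mean = statistics.fmean([x for g in groups for x in g])
    means = [statistics.fmean(g) for g in groups]
    ss_treat = n * sum((m - grand_mean) ** 2 for m in means)
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_treat / (k - 1)) / (ss_error / (k * (n - 1)))

def mean_f(shift=0.0, k=4, n=10, reps=2000, seed=1):
    """Average F over simulated studies; `shift` is added to the last group."""
    rng = random.Random(seed)
    fs = []
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        groups[-1] = [x + shift for x in groups[-1]]
        fs.append(one_way_f(groups))
    return statistics.fmean(fs)

print("mean F, no effect:  ", round(mean_f(0.0), 2))
print("mean F, true effect:", round(mean_f(1.0), 2))
```

With no effect the average F should sit close to 1; with a true effect it should be clearly larger, which is the property power analysis exploits.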


6/40

How Does Power Analysis Work?

In the familiar F distribution below, 95% of the values are below 2.00 (distribution for df = 7, 200)

F = 2.0 represents the cutoff for rejecting H0

[Figure: central F distribution; x-axis: F value, 0 to 4]


7/40

The Noncentral F Distribution

If the null hypothesis is false, the noncentral F distribution is needed. In the noncentral F distribution below, 75% of the values are below 2.00. Therefore, power = .25

[Figure: central F and noncentral F distributions; x-axis: F value, 0 to 4]


8/40

A Larger Effect

In the noncentral F distribution below, in which the effect is larger, 30% of the values are below 2.00. Therefore, power = .70

[Figure: central F and noncentral F distributions; x-axis: F value, 0 to 4]
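
The logic of the last three slides (find the cutoff from the central F, then ask how much of the noncentral F lies above it) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the calculator the slides refer to: the function names are hypothetical, the incomplete beta function uses a textbook continued-fraction routine, and the noncentral F is computed as a Poisson-weighted sum of central terms. SciPy users could obtain the same quantities from `scipy.stats.f` and `scipy.stats.ncf`.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction used by the regularized incomplete beta function."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def ncf_cdf(f, dfh, dfe, lam=0.0, terms=300):
    """CDF of the noncentral F; lam = 0 gives the central F."""
    if f <= 0.0:
        return 0.0
    y = dfh * f / (dfh * f + dfe)
    weight = math.exp(-lam / 2.0)          # Poisson weight for j = 0
    total = 0.0
    for j in range(terms):
        total += weight * betainc(dfh / 2.0 + j, dfe / 2.0, y)
        weight *= (lam / 2.0) / (j + 1)
        if weight < 1e-17 and j > lam / 2.0:
            break
    return total

def f_crit(alpha, dfh, dfe, lam=0.0):
    """F value exceeded with probability alpha (found by bisection)."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - ncf_cdf(mid, dfh, dfe, lam) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(lam, dfh, dfe, alpha=0.05):
    """Share of the noncentral F lying above the central-F cutoff."""
    return 1.0 - ncf_cdf(f_crit(alpha, dfh, dfe), dfh, dfe, lam)

print("cutoff for df = 7, 200:", round(f_crit(0.05, 7, 200), 2))
for lam in (0.0, 5.0, 10.0, 15.0):       # illustrative noncentrality values
    print("lambda =", lam, "power =", round(power(lam, 7, 200), 2))
```

The noncentrality values in the loop are arbitrary illustrations; the slides do not say which values produced power = .25 and .70.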


9/40

Power Functions

[Figure: power function; x-axis: effect size, 0 to 1; y-axis: likelihood of rejecting H0, 0 to 1]


10/40

Power Functions

[Figure: power function; x-axis: sample size, 25 to 275; y-axis: 0 to 1]
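
Power functions like the two plotted on these slides can be approximated with a few lines of code. The sketch below assumes the simplest case, a two-sided comparison of two equal-sized groups using a normal approximation; the function name and the chosen values of d and N are illustrative assumptions, not the settings behind the original figures.

```python
from statistics import NormalDist
import math

Z = NormalDist()

def power_two_groups(d, n_total, alpha=0.05):
    """Approximate two-sided power for comparing two equal groups."""
    delta = d * math.sqrt(n_total / 4.0)      # expected value of the z statistic
    z_crit = Z.inv_cdf(1.0 - alpha / 2.0)
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)

# Power as a function of effect size, with N fixed at 100
for d in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print("d =", d, "power =", round(power_two_groups(d, 100), 2))

# Power as a function of sample size, with d fixed at .50
for n in range(25, 276, 50):
    print("N =", n, "power =", round(power_two_groups(0.5, n), 2))
```

Both loops show the same pattern as the figures: power starts at alpha when there is no effect and climbs toward 1 as either the effect size or the sample size grows.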



11/40

How to Increase Power

Increase N

Effects of adding more subjects are not identical to those of adding more observations

Increase ES

Choose a different research question

Use stronger treatments or interventions

Use better measures

Use a more lenient alpha



12/40

Effects of Implementing

Power Analysis

Stronger studies

Larger samples, better measures

Fewer studies

Adequate studies are harder to do than most people

realize

Less emphasis, in the long term, on null

hypothesis testing


13/40

Conducting a Power Analysis

The classic text in this field is still one of the best sources

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Erlbaum.

More current (and more accessible) sources include

Lipsey, M. (1990). Design sensitivity. Sage.

Murphy, K. & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Erlbaum.


14/40


Power Analysis software

Power and Precision - Biostat

www.PowerAnalysis.com

One-Stop F Calculator - included in Murphy & Myors (2004)

PASS - NCSS software www.ncss.com/pass.html


15/40


In planning studies, you should

Assume relatively small effects

If it were reasonable to expect a large effect, you probably wouldn't need to do the study or the test

Aim for power of .80 or better

Power of .50 means that significance tests have

become a coin flip


16/40

Effect Size Conventions

In behavioral and social sciences, there are widely-

followed conventions for describing small, moderate,

and large effects

            d (standardized       Percentage of
            mean difference)      variance explained
Small       .20                   1%
Moderate    .50                   10%
Large       .80                   25%


17/40

Applications of Power Analysis

Study planning - Given ES and α, solve for N

If you wanted to compare the effects of four types of training programs and:

You expected small to moderate effects (programs account for 5% of variation in performance)

You use an α level of .05

You need N = 214 to achieve Power = .80


18/40


Study evaluation - Given N and α, solve for ES

If you wanted to compare the effects of four safety interventions and:

You have 44 subjects available

You use an α level of .05

You will achieve Power = .80 only if the effects of interventions are truly large (accounting for 25% of the variance in outcomes)


19/40


Making a rational choice regarding α - Given N and ES, solve for α

If you wanted to compare the effects of two leadership development programs and:

You have 200 subjects available

You expect a small difference (d = .20, or 1% of the variance explained by programs)

You will achieve Power = .64 using α = .05

You will achieve Power = .37 using α = .01
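
The trade-off on this slide is easy to tabulate. The slide does not state how .64 and .37 were obtained; one reading that reproduces both figures is a one-tailed normal approximation with 200 subjects in each program. That reading is an assumption made for this sketch, as is the function name, and other test setups (for example a two-tailed test with 200 subjects in total) give lower power.

```python
from statistics import NormalDist
import math

def one_tailed_power(d, n_per_group, alpha):
    """Normal-approximation power for a one-tailed two-group comparison."""
    z = NormalDist()
    delta = d * math.sqrt(n_per_group / 2.0)   # expected z statistic
    return z.cdf(delta - z.inv_cdf(1.0 - alpha))

# Assumption: 200 subjects per program, d = .20
for alpha in (0.05, 0.01):
    print("alpha =", alpha, "power =", round(one_tailed_power(0.20, 200, alpha), 2))
# prints power = 0.64 for alpha = 0.05 and power = 0.37 for alpha = 0.01
```

Whatever the exact setup, the direction of the result is the same: tightening alpha from .05 to .01 buys protection against Type I errors at a substantial cost in power.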


20/40

Moving Beyond Traditional

Significance Testing

Traditional null hypothesis tests are the focus of most power analyses

These tests are deeply flawed, and there is

relatively little research on the power of

alternatives

Minimum effect tests represent one useful

alternative


21/40

Nil Hypothesis Testing

Testing the hypothesis that treatments, interventions, etc. have no effect (Nil Hypothesis Test - NHT) is the most common and least useful thing social and behavioral scientists do

Two problems loom largest:

Confusion over Type I errors

Likelihood of rejecting the null hypothesis approaches 1.0 as N increases, regardless of the research question


22/40

Type I Errors are Very Rare

Type I error - reject H0 when it is true

If H0 is never true, it is impossible to make a Type I

error

If H0 is very unlikely, a Type I error is even less likely

H0 - treatment had NO effect at all

H1 - SOMETHING happened

Most things we do to minimize Type I errors lead to more Type II errors


23/40

This Implies

Large literature on protecting yourself from Type I

errors is not really useful

NHTs yield one of two outcomes:

Confirm the obvious - reject H0, which you already know is likely to be wrong

Confuse you - accept H0 even though you know it is likely to be wrong


24/40

In NHT, All You Need is N

As N increases, the likelihood of rejecting the

nil hypothesis approaches 1.0

Power to reject H0 does not depend all that much on the phenomenon

if N is big enough, you will reject H0

if N is small enough, you won't

Significance tests are an indirect index of how

many subjects showed up
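
A quick calculation makes the point concrete. The sketch below uses the standard normal-approximation formula for a two-sided comparison of two equal groups, n per group = 2 (z for alpha/2 + z for power)^2 / d^2. The function name and the list of effect sizes are illustrative assumptions.

```python
from statistics import NormalDist
import math

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate subjects per group needed to detect a standardized difference d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / d ** 2)

# Any nonzero effect, however trivial, has some N that reaches 80% power
for d in (0.50, 0.20, 0.05, 0.02):
    print("d =", d, "needs about", n_per_group(d), "subjects per group")
```

The required N grows with 1 / d squared, so it is always finite for a nonzero effect: rejecting the nil hypothesis is a matter of recruiting enough subjects.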


25/40

There Must be a Better Way

Stop doing significance tests (e.g., Schmidt,

1992)

Confidence intervals (e.g., APA Task Force,

American Psychologist, August, 1999)

Bayesian methods (e.g., Rouanet, Psychological Bulletin, 1996)


26/40

There Must be a Better Way

Minimum-Effect Tests

Test the hypothesis that something nontrivial happened

Murphy, K. & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (2nd ed.). Mahwah, NJ: Erlbaum.

Murphy, K. & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology, 84, 234-248.


27/40


H0 - treatments have a negligible effect (e.g., they account for 1% or less of the variance)

H1 - the effect of treatments is big enough to care about

This approach addresses the two biggest flaws of traditional tests

H0 really is plausible. Treatments rarely have zero effect, but they often have negligible effects

Increasing N does not automatically increase the likelihood of rejecting H0


28/40


With Minimum Effect Tests (METs)

Type I errors are once again possible, but can be minimized

the question asked in MET is no longer trivial

you can actually learn something by doing the test

Power analysis works exactly the same way in MET as in NHT


29/40

Performing Minimum-Effect Tests

Put your test statistics in a simple, common form, e.g., F

Decide what you mean by a negligible effect

Find or create an F table based on that definition of a negligible effect - Noncentral F distribution

Proceed as you would for any traditional NHT
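
The four steps above can be sketched for the simplest case. The sketch assumes a two-group comparison (one hypothesis degree of freedom) with a sample large enough that F behaves like the square of a normal variable shifted by the square root of the noncentrality parameter, and it takes the noncentrality under the negligible-effect hypothesis to be N * PV / (1 - PV). Both are simplifying assumptions for illustration, as are the function names; an exact table would use the noncentral F with finite error degrees of freedom.

```python
from statistics import NormalDist
import math

Z = NormalDist()

def reject_prob(sqrt_c, lam):
    """P(F > c) when F behaves like (Z + sqrt(lam))**2 (df_h = 1, large df_e)."""
    delta = math.sqrt(lam)
    return (1.0 - Z.cdf(sqrt_c - delta)) + Z.cdf(-sqrt_c - delta)

def met_critical_f(n_total, pv_negligible, alpha=0.05):
    """Critical F for H0: treatments explain pv_negligible or less of the variance."""
    lam0 = n_total * pv_negligible / (1.0 - pv_negligible)
    lo, hi = 0.0, 50.0
    for _ in range(100):                     # bisection on sqrt(F)
        mid = (lo + hi) / 2.0
        if reject_prob(mid, lam0) > alpha:
            lo = mid
        else:
            hi = mid
    return ((lo + hi) / 2.0) ** 2

print("nil test cutoff, N = 200:        ", round(met_critical_f(200, 0.0), 2))
print("MET cutoff (1% negligible), N = 200:", round(met_critical_f(200, 0.01), 2))
```

With the cutoff in hand, the test proceeds as usual: compute the observed F and reject the negligible-effect hypothesis only if it exceeds the MET cutoff, which is larger than the nil cutoff and keeps growing with N.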


30/40

Working with the Noncentral F

Calculating or deriving noncentral F

distributions was once a daunting task

Many simple calculators now available

http://calculators.stat.ucla.edu/cdf/ncf/ncfcalc.php

Noncentrality parameter (λ) is a measure of effect size

λ = [dfh * (MSh - MSe)] / MSe
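
As a small worked example of this formula, the sketch below computes the noncentrality parameter from mean squares. The function name and the numbers (dfh = 3, MSh = 12, MSe = 4) are hypothetical values chosen for illustration. Because F = MSh / MSe, the same quantity can also be written as dfh * (F - 1).

```python
def noncentrality(df_h, ms_h, ms_e):
    """Noncentrality estimate: [dfh * (MSh - MSe)] / MSe."""
    return df_h * (ms_h - ms_e) / ms_e

# Hypothetical ANOVA results, for illustration only
df_h, ms_h, ms_e = 3, 12.0, 4.0
f_ratio = ms_h / ms_e

print(noncentrality(df_h, ms_h, ms_e))   # 6.0
print(df_h * (f_ratio - 1.0))            # 6.0, the same value via F
```

When MSh equals MSe (F = 1), the estimate is zero, which matches the central F expected under the nil hypothesis.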


31/40

What Constitutes a Negligible

Effect ?

Standards for negligible effects depend on the research area and on the consequences of decisions

Aspirin use accounts for very little variance in heart attacks, but the use of aspirin saves thousands of lives at minimal cost

In personnel selection, it is relatively easy to account for a large proportion of the variance in performance with simple cognitive tests, so the increase in effectiveness that is defined as negligible might be larger


32/40

Defining a Negligible Effect

Effect Size conventions are useful, but by themselves

may not be sufficient

Consequences of errors must also be considered

            d (standardized       Percentage of
            mean difference)      variance explained
Small       .20                   1%
Moderate    .50                   10%


33/40

Power Analysis for MET:

Small Effect - d=.20, PV=.01

[Figure: power function for MET; x-axis: effect size, 0 to 1; y-axis: 0 to 1]



34/40



[Figure: likelihood of rejecting H0 given population d = .30; x-axis: sample size, 25 to 275; y-axis: 0 to 0.8]


35/40



[Figure: x-axis: effect size, 0 to 1; y-axis: 0 to 1]



36/40



[Figure: x-axis: effect size, 0 to 0.25; y-axis: 0 to 0.07]



37/40

Errors in MET

The potential downsides of MET are:

Type I errors could actually occur

Lower power than corresponding NHT

You can reduce Type I errors by using larger

samples

The loss of power is more than balanced by the fact that the hypothesis being tested is not a trivial one


38/40

Type I Error Rates of Minimum-

Effect Tests

[Figure: Type I error rate (0 to 0.08) by effect size (0 to 0.25), with separate curves for a smaller sample and a larger sample]


39/40

Type I vs Type II Errors

The tradeoff between Type I and Type II errors is more complicated in METs than in Nil tests

In MET, alpha is precise only if the true effect size is exactly the same as your definition of negligible

Type II errors are more of a problem with METs

METs are less powerful than NHTs (it is easier to reject the hypothesis that nothing happened than the hypothesis that nothing important happened), but this is not necessarily a bad thing

METs place an even greater premium on large samples, but small samples cause problems even where there is substantial power
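
The point about alpha can be illustrated numerically. The sketch below assumes a two-group comparison, a large-sample normal approximation to the noncentral F, and a noncentrality of N * PV / (1 - PV); these are simplifying assumptions, and the function names and sample sizes are illustrative. It computes how often a MET with a 1% definition of negligible rejects H0 when the true effect is at or below that threshold.

```python
from statistics import NormalDist
import math

Z = NormalDist()

def reject_prob(sqrt_c, lam):
    """P(F > c) when F behaves like (Z + sqrt(lam))**2 (df_h = 1, large df_e)."""
    delta = math.sqrt(lam)
    return (1.0 - Z.cdf(sqrt_c - delta)) + Z.cdf(-sqrt_c - delta)

def critical_sqrt_f(lam0, alpha=0.05):
    """Square root of the critical F under noncentrality lam0 (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if reject_prob(mid, lam0) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def met_type1_rate(n_total, pv_true, pv_negligible=0.01, alpha=0.05):
    """Chance of rejecting H0 when the true PV is at or below the threshold."""
    lam = lambda pv: n_total * pv / (1.0 - pv)
    return reject_prob(critical_sqrt_f(lam(pv_negligible), alpha), lam(pv_true))

for n in (200, 1000):                        # smaller and larger sample
    for pv in (0.0, 0.005, 0.01):
        print("N =", n, "true PV =", pv,
              "Type I rate =", round(met_type1_rate(n, pv), 4))
```

The rate equals the nominal alpha only when the true effect sits exactly at the negligible threshold; for smaller true effects it falls below alpha, and it falls further as N grows.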


40/40

Examples - comparing two

treatments

                           N needed, by true effect
                           PV = .05      PV = .10
Nil                        149           79
MET (1% = negligible)      375           117
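
A comparison of this kind can be approximated with a short search over N. The sketch assumes a target power of .80, alpha = .05, a large-sample normal approximation for the two-group F, and a noncentrality of N * PV / (1 - PV); none of these settings is stated on the slide, and the function names are hypothetical. Its results land in the same neighborhood as the table but will not match it exactly, presumably because the table rests on exact noncentral F calculations.

```python
from statistics import NormalDist
import math

Z = NormalDist()

def reject_prob(sqrt_c, lam):
    """P(F > c) when F behaves like (Z + sqrt(lam))**2 (df_h = 1, large df_e)."""
    delta = math.sqrt(lam)
    return (1.0 - Z.cdf(sqrt_c - delta)) + Z.cdf(-sqrt_c - delta)

def critical_sqrt_f(lam0, alpha=0.05):
    """Square root of the critical F under noncentrality lam0 (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if reject_prob(mid, lam0) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def n_needed(pv_true, pv_negligible=0.0, power=0.80, alpha=0.05):
    """Smallest N whose approximate power reaches the target (None if not found)."""
    for n in range(4, 100000):
        lam0 = n * pv_negligible / (1.0 - pv_negligible)
        lam1 = n * pv_true / (1.0 - pv_true)
        if reject_prob(critical_sqrt_f(lam0, alpha), lam1) >= power:
            return n
    return None

for pv in (0.05, 0.10):
    print("true PV =", pv,
          "| nil test N:", n_needed(pv),
          "| MET (1% negligible) N:", n_needed(pv, 0.01))
```

The pattern matters more than the exact figures: the minimum-effect test always needs more subjects than the nil test, and the gap is widest when the true effect is close to the negligible threshold.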
