Murphy Power Analysis
8/3/2019 Murphy Power Analysis
Power Analysis for Traditional and
Modern Hypothesis Tests
Kevin R. Murphy
Pennsylvania State University
-
Power Analysis
Helps you plan better studies
Helps you make better sense of existing studies
Is not limited to traditional null hypothesis tests
Application of power analysis to minimum-effect tests will be discussed
-
Errors in Null Hypothesis Tests
                       True State of Affairs
Your Decision          No Effect (H0)                                   Some Effect
Reject Null            Type I Error: reject null when it is true (α)    Power = 1 - β
Fail to Reject Null    (no error)                                       Type II Error: fail to reject null when you should (β)
-
Power Depends On
Effect Size
How large is the effect in the population?
Sample Size (N)
You are using a sample to make inferences about
the population. How large is the sample?
Decision Criteria - How do you define "significant," and why?
-
Power Analysis and the F Distribution
The power of most statistical tests in the social sciences (e.g., ANOVA, regression, t-tests, other linear model statistics) can be evaluated via the familiar F distribution
F is a ratio of observed effect to error: F = MS treatments / MS error
F = (True Effect + Error) / Error
The larger the true treatment effect, the larger the F you expect to find
If the null hypothesis is correct, E(F) is approximately 1.0 (exactly df error / (df error - 2))
-
How Does Power Analysis Work?
In the familiar F distribution below, 95% of the values are below 2.00 (distribution for df = 7, 200)
[Figure: central F distribution; x-axis: F value (0 to 4); F = 2.0 marks the cutoff for rejecting H0]
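A minimal simulation sketch of the central F distribution for df = 7, 200, using only the Python standard library. The seed, the number of draws, and the function name are illustrative choices, not part of the slides. Under these assumptions the simulated 95th percentile should come out slightly above 2.0, which the slide rounds to 2.00, and the mean should sit close to 1.0.

```python
import random


def simulate_central_f(df1, df2, draws, rng):
    """Draw central F values as a ratio of two scaled chi-square variates."""
    values = []
    for _ in range(draws):
        # A chi-square variate with k df is a Gamma(shape=k/2, scale=2) variate.
        numerator = rng.gammavariate(df1 / 2, 2) / df1
        denominator = rng.gammavariate(df2 / 2, 2) / df2
        values.append(numerator / denominator)
    return values


rng = random.Random(2019)  # fixed seed so the run is repeatable
f_values = sorted(simulate_central_f(7, 200, 100_000, rng))

mean_f = sum(f_values) / len(f_values)
cutoff_95 = f_values[int(0.95 * len(f_values))]
share_below_2 = sum(v < 2.0 for v in f_values) / len(f_values)

print(f"mean F under H0:            {mean_f:.3f}")
print(f"simulated 95th percentile:  {cutoff_95:.2f}")
print(f"share of values below 2.00: {share_below_2:.3f}")
```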
-
The Noncentral F Distribution
If the null hypothesis is false, the noncentral F distribution is needed. In the noncentral F distribution below, 75% of the values are below 2.00. Therefore, power = .25
[Figure: central F and noncentral F distributions; x-axis: F value (0 to 4)]
-
A Larger Effect
In the noncentral F distribution below, in which the effect is larger, 30% of the values are below 2.00. Therefore, power = .70
[Figure: central F and noncentral F distributions for a larger effect; x-axis: F value (0 to 4)]
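A small simulation sketch of the same idea: power is the share of the noncentral F distribution that falls above the cutoff. The noncentrality values below are hypothetical (the slides do not say which values produce .25 and .70), and the function names are illustrative.

```python
import math
import random


def simulate_noncentral_f(df1, df2, lam, draws, rng):
    """Draw noncentral F values with noncentrality lam (requires df1 >= 2)."""
    values = []
    for _ in range(draws):
        # Noncentral chi-square(df1, lam): one shifted squared normal
        # plus a central chi-square with df1 - 1 degrees of freedom.
        shifted = (rng.gauss(0.0, 1.0) + math.sqrt(lam)) ** 2
        rest = rng.gammavariate((df1 - 1) / 2, 2)
        numerator = (shifted + rest) / df1
        denominator = rng.gammavariate(df2 / 2, 2) / df2
        values.append(numerator / denominator)
    return values


def simulated_power(df1, df2, lam, cutoff, draws=50_000, seed=1):
    """Share of simulated F values that exceed the rejection cutoff."""
    rng = random.Random(seed)
    values = simulate_noncentral_f(df1, df2, lam, draws, rng)
    return sum(v > cutoff for v in values) / draws


CUTOFF = 2.0  # rounded critical value used on the slides for df = 7, 200
for lam in (0.0, 3.0, 6.0, 12.0):  # hypothetical noncentrality values
    power = simulated_power(7, 200, lam, CUTOFF)
    print(f"lambda = {lam:4.1f}  power = {power:.2f}")
```

With lambda = 0 the distribution is the central F, so the rejection rate stays near the alpha level; larger values shift the distribution to the right and raise power.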
-
Power Functions
[Figure: power function; x-axis: effect size (0 to 1); y-axis: likelihood of rejecting H0 (0 to 1)]
-
Power Functions
[Figure: power function; x-axis: sample size (25 to 275); y-axis: likelihood of rejecting H0 (0 to 1)]
-
How to Increase Power
Increase N
Effects of adding more subjects are not identical to those of adding more observations
Increase ES
Choose a different research question
Use stronger treatments or interventions
Use better measures
Use a more lenient alpha
-
Effects of Implementing
Power Analysis
Stronger studies
Larger samples, better measures
Fewer studies
Adequate studies are harder to do than most people
realize
Less emphasis, in the long term, on null
hypothesis testing
-
Conducting a Power Analysis
The classic text in this field is still one of the best sources
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Erlbaum.
More current (and more accessible) sources include
Lipsey, M. (1990). Design sensitivity. Sage.
Murphy, K. & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Erlbaum.
-
Conducting a Power Analysis
Power Analysis software
Power and Precision - Biostat
www.PowerAnalysis.com
One-Stop F Calculator
Included in Murphy & Myors (2004)
PASS - NCSS software
www.ncss.com/pass.html
-
Conducting a Power Analysis
In planning studies, you should
Assume relatively small effects
If it were reasonable to expect a large effect, you
probably don't need to do the study or the test
Aim for power of .80 or better
Power of .50 means that significance tests have
become a coin flip
-
Effect Size Conventions
In behavioral and social sciences, there are widely-followed conventions for describing small, moderate, and large effects
            d (standardized mean difference)    Percentage of variance explained
Small       .20                                 1%
Moderate    .50                                 10%
Large       .80                                 25%
-
Applications of Power Analysis
Study planning - Given ES and α, solve for N
If you wanted to compare the effects of four types of training programs and:
You expected small to moderate effects (programs account for 5% of variation in performance)
You use an α level of .05
You need N = 214 to achieve Power = .80
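A rough check of this planning example, sketched with the standard library only. It assumes the noncentrality parameter can be approximated as lambda = df error * PV / (1 - PV); that conversion is not stated on the slide, so treat it as an assumption. The noncentral F probability is computed as a Poisson-weighted sum of incomplete beta terms, and all function names are illustrative.

```python
import math


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Use the symmetry relation where the continued fraction converges slowly.
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - reg_inc_beta(b, a, 1.0 - x)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        m2 = 2 * m
        # Even step of the modified Lentz recurrence.
        num = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_front) * h / a


def noncentral_f_cdf(f, df1, df2, lam, terms=400):
    """P(F <= f) for a noncentral F; lam = 0 gives the central F."""
    x = df1 * f / (df1 * f + df2)
    half = lam / 2.0
    total = 0.0
    weight = math.exp(-half)  # Poisson weight for j = 0
    for j in range(terms):
        if weight < 1e-16 and j > half:
            break  # remaining Poisson weights are negligible
        total += weight * reg_inc_beta(df1 / 2.0 + j, df2 / 2.0, x)
        weight *= half / (j + 1)
    return total


def f_critical(df1, df2, alpha):
    """Central F cutoff found by bisection on the CDF."""
    lo, hi = 0.0, 1000.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if noncentral_f_cdf(mid, df1, df2, 0.0) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def power_nil_test(n_total, groups, pv, alpha=0.05):
    """Power of the traditional (nil) F test for a one-way design."""
    df1 = groups - 1
    df2 = n_total - groups
    lam = df2 * pv / (1.0 - pv)  # assumed conversion, see the note above
    return 1.0 - noncentral_f_cdf(f_critical(df1, df2, alpha), df1, df2, lam)


print(round(power_nil_test(214, 4, 0.05), 2))
```

Under these assumptions the result should land close to the .80 quoted on the slide; small differences from published tables are to be expected.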
-
Applications of Power Analysis
Study evaluation - Given N and α, solve for ES
If you wanted to compare the effects of four safety interventions and:
You have 44 subjects available
You use an α level of .05
You will achieve Power = .80 only if the effects of interventions are truly large (accounting for 25% of the variance in outcomes)
-
Applications of Power Analysis
Making a rational choice regarding α - Given N and ES, solve for α
If you wanted to compare the effects of two leadership development programs and:
You have 200 subjects available
You expect a small difference (d = .20, or 1% of the variance explained by programs)
You will achieve Power = .64 using α = .05
You will achieve Power = .37 using α = .01
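A short sketch of the alpha trade-off using a normal approximation. The slide does not say whether 200 is the total or the per-group N, or whether the test is one- or two-tailed. As an assumption, the code below uses 200 subjects per group and a one-tailed test, because that reading reproduces the quoted figures closely; a two-tailed test or a smaller per-group N gives noticeably lower power.

```python
from statistics import NormalDist


def two_group_power(d, n_per_group, alpha, tails=1):
    """Normal-approximation power for comparing two equal-sized groups."""
    z = NormalDist()
    delta = d * (n_per_group / 2) ** 0.5  # expected value of the z statistic
    z_crit = z.inv_cdf(1 - alpha / tails)
    power = z.cdf(delta - z_crit)
    if tails == 2:
        power += z.cdf(-delta - z_crit)  # rejections in the wrong direction
    return power


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha:.2f}  power = {two_group_power(0.20, 200, alpha):.2f}")
```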
-
Moving Beyond Traditional
Significance Testing
Traditional null hypothesis tests are the focus
of most power analyses
These tests are deeply flawed, and there is
relatively little research on the power of
alternatives
Minimum effect tests represent one useful
alternative
-
Nil Hypothesis Testing
Testing the hypothesis that treatments, interventions, etc. have no effect (Nil Hypothesis Test - NHT) is the most common and least useful thing social and behavioral scientists do
Two problems loom largest:
Confusion over Type I errors
Likelihood of rejecting the null hypothesis approaches 1.0 as N increases, regardless of the research question
-
Type I Errors are Very Rare
Type I error - reject H0 when it is true
If H0 is never true, it is impossible to make a Type I
error
If H0 is very unlikely, a Type I error is even less likely
H0 - treatment had NO effect at all
H1 - SOMETHING happened
Most things we do to minimize Type I errors lead to more Type II errors
-
This Implies
Large literature on protecting yourself from Type I
errors is not really useful
NHTs yield one of two outcomes:
confirm the obvious
reject H0, which you already know is likely to be
wrong
confuse you
accept H0 even though you know it is likely to
be wrong
-
In NHT, All You Need Is N
As N increases, the likelihood of rejecting the
nil hypothesis approaches 1.0
Power to reject H0 does not depend all that much on the phenomenon
if N is big enough, you will reject H0
if N is small enough, you won't
Significance tests are an indirect index of how
many subjects showed up
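A brief illustration of this point, using a two-tailed normal approximation for a two-group comparison. The effect size of d = .05 and the sample sizes are hypothetical values chosen only to show the pattern: even a practically negligible effect is rejected almost surely once N is large enough.

```python
from statistics import NormalDist


def nil_test_power(d, n_per_group, alpha=0.05):
    """Two-tailed normal-approximation power for two equal-sized groups."""
    z = NormalDist()
    delta = d * (n_per_group / 2) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)


TINY_EFFECT = 0.05  # hypothetical, practically negligible standardized difference
for n in (50, 500, 5_000, 50_000):
    print(f"n per group = {n:>6}  power = {nil_test_power(TINY_EFFECT, n):.3f}")
```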
-
There Must be a Better Way
Stop doing significance tests (e.g., Schmidt,
1992)
Confidence intervals (e.g., APA Task Force,
American Psychologist, August, 1999)
Bayesian methods (e.g., Rouanet, Psychological Bulletin, 1996)
-
There Must be a Better Way
Minimum-Effect Tests
Test the hypothesis that something nontrivial happened
Murphy, K. & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (2nd ed.). Mahwah, NJ: Erlbaum.
Murphy, K. & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology, 84, 234-248.
-
Minimum-Effect Tests
H0 - treatments have a negligible effect (e.g., they account for 1% or less of the variance)
H1 - the effect of treatments is big enough to care about
This approach addresses the two biggest flaws of traditional tests
H0 really is plausible. Treatments rarely have zero effect, but they often have negligible effects
Increasing N does not automatically increase the likelihood of rejecting H0
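A sketch contrasting the two tests when the true effect is below the negligible threshold. It uses a large-sample normal approximation for a two-group comparison and assumes lambda = df error * PV / (1 - PV); neither the approximation nor the true PV of .005 comes from the slides, and the function names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()


def delta_from_pv(n_total, pv):
    """Approximate expected |z| for two groups (assumed conversion from PV)."""
    return ((n_total - 2) * pv / (1 - pv)) ** 0.5


def rejection_prob(delta, crit):
    """P(|Z + delta| > crit) for a standard normal Z."""
    return Z.cdf(delta - crit) + Z.cdf(-delta - crit)


def critical_value(delta_null, alpha=0.05):
    """Cutoff exceeded with probability alpha when the effect equals delta_null."""
    lo, hi = 0.0, delta_null + 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if rejection_prob(delta_null, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def compare(n_total, pv_true, pv_negligible=0.01, alpha=0.05):
    """Rejection probabilities of the nil test and the minimum-effect test."""
    true_delta = delta_from_pv(n_total, pv_true)
    nil = rejection_prob(true_delta, critical_value(0.0, alpha))
    met_crit = critical_value(delta_from_pv(n_total, pv_negligible), alpha)
    return nil, rejection_prob(true_delta, met_crit)


for n in (100, 1_000, 10_000, 100_000):
    nil, met = compare(n, pv_true=0.005)  # hypothetical true effect below 1%
    print(f"N = {n:>7}  nil test: {nil:.3f}  MET: {met:.3f}")
```

In this setup the nil test rejects more and more often as N grows, while the minimum-effect test rejects less and less often, because the true effect really is negligible by the chosen definition.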
-
Minimum-Effect Tests
With Minimum Effect Tests (METs)
Type I errors are once again possible, but can be minimized
the question asked in MET is no longer trivial
you can actually learn something by doing the test
Power analysis works exactly the same way in MET as in NHT
-
Performing Minimum-Effect Tests
Put your test statistics in a simple, common form, e.g., F
Decide what you mean by a negligible effect
Find or create an F table based on that definition of a negligible effect (noncentral F distribution)
Proceed as you would for any traditional NHT
-
Working with the Noncentral F
Calculating or deriving noncentral F
distributions was once a daunting task
Many simple calculators now available
http://calculators.stat.ucla.edu/cdf/ncf/ncfcalc.php
Noncentrality parameter (λ)
λ is a measure of effect size
λ = [dfh * (MSh - MSe)] / MSe
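A small helper sketching the noncentrality estimate above; note that it is algebraically the same as dfh * (F - 1). The ANOVA numbers are hypothetical, and flooring the estimate at zero (since the parameter cannot be negative) is a choice made for this sketch rather than something stated on the slide.

```python
def estimate_noncentrality(df_h, ms_h, ms_e):
    """Estimate lambda = df_h * (MS_h - MS_e) / MS_e, floored at zero."""
    if ms_e <= 0:
        raise ValueError("MS error must be positive")
    return max(0.0, df_h * (ms_h - ms_e) / ms_e)


# Hypothetical ANOVA summary: 3 df for the effect, MS_h = 30, MS_e = 10 (F = 3.0)
lam = estimate_noncentrality(df_h=3, ms_h=30.0, ms_e=10.0)
print(lam)  # 6.0, i.e. 3 * (3.0 - 1)
```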
-
What Constitutes a Negligible
Effect?
Standards for negligible effects depend on the research area and on the consequences of decisions
Aspirin use accounts for very little variance in heart attacks, but the use of aspirin saves thousands of lives at minimal cost
In personnel selection, it is relatively easy to account for a large proportion of the variance in performance with simple cognitive tests, so the increase in effectiveness that is defined as negligible might be larger
-
Defining a Negligible Effect
Effect Size conventions are useful, but by themselves
may not be sufficient
Consequences of errors must also be considered
            d (standardized mean difference)    Percentage of variance explained
Small       .20                                 1%
Moderate    .50                                 10%
-
Power Analysis for MET:
Small Effect - d=.20, PV=.01
[Figure: power function for MET; x-axis: effect size (0 to 1); y-axis: likelihood of rejecting H0 (0 to 1)]
-
Power Analysis for MET:
Small Effect - d=.20, PV=.01
[Figure: power function for MET, given population d = .30; x-axis: sample size (25 to 275); y-axis: likelihood of rejecting H0 (0 to 0.8)]
-
Power Analysis for MET:
Small Effect - d=.20, PV=.01
[Figure: power function for MET; x-axis: effect size (0 to 1); y-axis: likelihood of rejecting H0 (0 to 1)]
-
Power Analysis for MET:
Small Effect - d=.20, PV=.01
[Figure: power function for MET at small effect sizes; x-axis: effect size (0 to 0.25); y-axis: likelihood of rejecting H0 (0 to 0.07)]
-
Errors in MET
The potential downsides of MET are:
Type I errors could actually occur
Lower power than corresponding NHT
You can reduce Type I errors by using larger
samples
The loss of power is more than balanced by the fact that the hypothesis being tested is not
a trivial one
-
Type I Error Rates of Minimum-Effect Tests
[Figure: Type I error rate (0 to 0.08) by effect size (0 to 0.25), plotted for a smaller sample and a larger sample]
-
Type I vs Type II Errors
The tradeoff between Type I and Type II errors is more complicated in METs than in Nil tests
In MET, alpha is precise only if the true effect size is exactly the same as your definition of negligible
Type II errors are more of a problem with METs
METs are less powerful than NHTs (it is easier to reject the hypothesis that nothing happened than the hypothesis that nothing important happened), but this is not necessarily a bad thing
METs place an even greater premium on large samples, but small samples cause problems even where there is substantial power
-
Examples - comparing two treatments
N needed, by true effect
                           PV = .05    PV = .10
Nil                        149         79
MET (1% = negligible)      375         117
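A rough cross-check of the table above, sketched with a large-sample normal approximation for two groups and the assumed conversion lambda = df error * PV / (1 - PV). Neither the approximation nor the conversion is stated on the slide, and the sketch assumes the table targets power of about .80. Under those assumptions, each tabled N should give power in the neighborhood of .80; exact agreement is not expected because published tables rest on the F distribution rather than the normal.

```python
from statistics import NormalDist

Z = NormalDist()


def delta_from_pv(n_total, pv):
    """Approximate expected |z| for two groups (assumed conversion from PV)."""
    return ((n_total - 2) * pv / (1 - pv)) ** 0.5


def rejection_prob(delta, crit):
    """P(|Z + delta| > crit) for a standard normal Z."""
    return Z.cdf(delta - crit) + Z.cdf(-delta - crit)


def critical_value(delta_null, alpha=0.05):
    """Cutoff exceeded with probability alpha when the effect equals delta_null."""
    lo, hi = 0.0, delta_null + 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if rejection_prob(delta_null, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def approx_power(n_total, pv_true, pv_null, alpha=0.05):
    """Power against pv_true when H0 says the effect is at most pv_null."""
    crit = critical_value(delta_from_pv(n_total, pv_null), alpha)
    return rejection_prob(delta_from_pv(n_total, pv_true), crit)


# (test, PV treated as negligible under H0, true PV, N from the table)
TABLE = [
    ("Nil", 0.00, 0.05, 149),
    ("Nil", 0.00, 0.10, 79),
    ("MET", 0.01, 0.05, 375),
    ("MET", 0.01, 0.10, 117),
]
for label, pv_null, pv_true, n in TABLE:
    power = approx_power(n, pv_true, pv_null)
    print(f"{label}  true PV = {pv_true:.2f}  N = {n:>3}  approx. power = {power:.2f}")
```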