
On teaching statistical inference:

What do p values (not) mean?

Bruce Blaine, PhD, PStat®
Department of Mathematical and Computing Sciences
St. John Fisher College
[email protected]



Limitations of NHST

The misapplication of null hypothesis significance testing (NHST) procedures for statistical inference is well known.

NHST procedures do not address what researchers most want to know.

• NHST procedures test a (nil) null hypothesis, which is rarely true and therefore uninformative to reject.

• NHST procedures deliver a conditional probability, p(D|Ho), which is commonly misinterpreted.

• NHST procedures do not test research hypotheses.

• NHST procedures do not quantify effect size.



Misinterpretations of p values

Two misinterpretations of p values from NHST procedures are common in the social sciences (cf. Kline, 2004):

1. Magnitude fallacy: p values are misunderstood as an effect size statistic, such that p is inversely proportional to the evidence for the treatment effect.

“…the effect was marginally significant, p=.07”

“…the effect was highly (or extremely) significant, p<.001”

2. Validity fallacy: p(D|Ho) is misunderstood as p(H1|D).

“…the treatment improved the outcome, p<.05”

“…the treatment had no effect on the outcome, p>.05”
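As an illustrative aside (not on the original slide), Bayes' theorem makes the distinction concrete: the quantity NHST reports, p(D|Ho), is only one ingredient of the quantity researchers usually want, the probability of a hypothesis given the data.

```latex
% p(D|Ho), which NHST reports, is not p(Ho|D) or p(H1|D).
\[
  p(H_0 \mid D) \;=\; \frac{p(D \mid H_0)\, p(H_0)}{p(D)},
  \qquad
  p(H_1 \mid D) \;=\; 1 - p(H_0 \mid D)
  \quad \text{(when $H_0$ and $H_1$ are exhaustive),}
\]
% so a small p(D|Ho) does not by itself determine p(H1|D); the prior p(Ho)
% and the marginal p(D) also matter.
```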



Classroom exercise 1: Addressing the magnitude fallacy

1. In Excel (using the Analysis ToolPak add-in), have students enter the data from the hypothetical experiment in Table 1.

2. Provide, or have students create, the shell of Table 2.

3. Have students run an independent-samples t test (assume equal variances).

4. Copy and paste the treatment and control data to increase each group size by 5, repeating the t test each time.

5. Fill in Table 2 with values from the analyses (a scripted version of the exercise is sketched after Table 2).


Table 1.

Treatment   Control
    1          3
    2          4
    3          5
    4          6
    5          7

Table 2.

                      Group size (n)
Statistic             5       10      15
Mean difference
Pooled variance
t
p (2-tailed)
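For instructors who prefer a scripted environment, the following Python sketch mirrors the exercise; the slides use Excel's Analysis ToolPak, so the use of scipy.stats here is an assumption on my part. It replicates the Table 1 data to grow each group by 5 and reruns the equal-variance t test.

```python
# A minimal Python sketch of Classroom exercise 1 (the original exercise uses
# Excel's Analysis ToolPak; scipy is assumed here as an alternative).
from scipy import stats

treatment = [1, 2, 3, 4, 5]   # Table 1, Treatment group
control   = [3, 4, 5, 6, 7]   # Table 1, Control group

for copies in (1, 2, 3):                 # group sizes n = 5, 10, 15
    t_grp = treatment * copies           # "copy and paste" the data
    c_grp = control * copies
    n = len(t_grp)
    mean_diff = sum(t_grp) / n - sum(c_grp) / n
    # Independent-samples t test assuming equal variances (pooled variance)
    t_stat, p_two_tailed = stats.ttest_ind(t_grp, c_grp, equal_var=True)
    print(f"n = {n:2d}  mean diff = {mean_diff:+.1f}  "
          f"t = {t_stat:+.2f}  p (2-tailed) = {p_two_tailed:.4f}")
```

The printed values should reproduce the Results table for this exercise: the mean difference stays at 2 (in absolute value) while t grows and p shrinks as n increases.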


Classroom exercise 1: Results

This exercise should show that the p value decreases across the three analyses even though the treatment effect is the same in each. Why?

Students should come to appreciate that larger samples yield smaller estimated standard errors. For a constant mean difference (which does not change in this exercise), this produces larger t values (in absolute value) and smaller p values.


                      Group size (n)
Statistic             5        10       15
Mean difference       2        2        2
Pooled variance       2.5      2.2      2.1
t                    -2.00    -3.00    -3.74
p (2-tailed)          0.0800   0.0080   0.0008
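The pattern in the table follows directly from the pooled-variance t statistic; as a worked check, the n = 15 column (with the unrounded pooled variance, 60/28 ≈ 2.14) gives:

```latex
% Pooled-variance independent-samples t statistic, with n_T = n_C = 15.
\[
  t \;=\; \frac{\bar{X}_T - \bar{X}_C}
               {\sqrt{s_p^{2}\left(\frac{1}{n_T}+\frac{1}{n_C}\right)}}
  \;=\; \frac{-2}{\sqrt{\frac{60}{28}\left(\frac{1}{15}+\frac{1}{15}\right)}}
  \;\approx\; -3.74
\]
```

The numerator is fixed at the mean difference, so only the standard error in the denominator changes with n; as n grows the standard error shrinks, |t| grows, and p falls.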


Classroom exercise 2: Addressing the validity fallacy

Imagine 3 studies that compare students with high Facebook time (Treatment, or T) and low Facebook time (Control, or C) on GPA; descriptive statistics for the studies appear in the table below:

1. Have students observe (via hand-calculated t tests or 95% confidence intervals) that none of the 3 studies would reject Ho at p<.05.

2. In Excel (using the MetaEasy add-in), have students enter the data from the 3 hypothetical studies and generate a meta-analysis of the effect of Facebook time on GPA (a scripted sketch of the pooling step follows the table below).



STUDY       T mean   C mean   T SD   C SD   T n   C n
Apple        2.7      3.3     0.6    0.6    20    20
Blueberry    2.9      3.2     0.9    0.9    20    20
Cherry       2.7      2.9     0.8    0.8    30    30
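For a non-Excel alternative, the sketch below is my own simplified stand-in for the MetaEasy step (MetaEasy's exact computations and effect size options may differ): it pools a standardized mean difference (Cohen's d) across the three studies with fixed-effect, inverse-variance weights and reports a 95% CI for the combined effect.

```python
# A minimal fixed-effect meta-analysis sketch for Classroom exercise 2.
# The slides use the MetaEasy Excel add-in; this inverse-variance pooling of
# Cohen's d is an assumed, simplified stand-in for that tool.
import math

# (study, T mean, C mean, T SD, C SD, T n, C n) from the table above
studies = [
    ("Apple",     2.7, 3.3, 0.6, 0.6, 20, 20),
    ("Blueberry", 2.9, 3.2, 0.9, 0.9, 20, 20),
    ("Cherry",    2.7, 2.9, 0.8, 0.8, 30, 30),
]

weights, weighted_effects = [], []
for name, mt, mc, sdt, sdc, nt, nc in studies:
    # Pooled SD and Cohen's d (treatment minus control)
    sp = math.sqrt(((nt - 1) * sdt**2 + (nc - 1) * sdc**2) / (nt + nc - 2))
    d = (mt - mc) / sp
    # Large-sample approximation to the sampling variance of d
    var_d = (nt + nc) / (nt * nc) + d**2 / (2 * (nt + nc))
    weights.append(1 / var_d)
    weighted_effects.append(d / var_d)
    print(f"{name:9s}  d = {d:+.2f}  var(d) = {var_d:.3f}")

# Fixed-effect (inverse-variance) pooled estimate and its 95% CI
pooled_d = sum(weighted_effects) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))
lo, hi = pooled_d - 1.96 * se_pooled, pooled_d + 1.96 * se_pooled
print(f"Pooled d = {pooled_d:+.2f}, 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

With these data the pooled 95% CI falls entirely below zero, mirroring the FE diamond referenced in the Results for this exercise.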


Classroom exercise 2: Results

The exercise should show that although none of the 3 studies is statistically significant (defined as p<.05) on its own, the Facebook effect on GPA is significant when the studies are combined.

Students should notice that the 95% CI estimate of the Facebook effect on GPA (the fixed-effect, or FE, diamond in the forest plot) does not include 0.



Summary lessons

These exercises allow the data to teach students where p values come from and how to interpret them properly.

o Exercise 1 shows that because p values are influenced by both the mean difference and the sample size, they cannot be trusted to quantify the mean difference alone.

o Exercise 2 shows that evidence from “nonsignificant” studies, when taken as evidence against H1, can be misleading. Genuine treatment effects may be obscured in studies with small samples, high variability, or both.




On teaching statistical inference: more estimation, less NHST

o Typical social science statistics textbooks and curricula are overdependent upon NHST methods for statistical inference.

o These exercises can be part of a larger effort to teach more estimation methods in basic statistics courses, including confidence intervals, effect size statistics, and meta-analysis.

o Estimation methods are more intuitive because they speak to research hypotheses rather than null hypotheses (a brief estimation sketch follows below).
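As one small illustration of that shift (my own sketch, reusing the Exercise 1 data), the same comparison can be reported as an estimate with uncertainty rather than as a p value:

```python
# An estimation-oriented companion to Exercise 1 (illustrative sketch):
# report the mean difference with a 95% CI and Cohen's d rather than only p.
import math
from statistics import mean, variance   # variance() uses n - 1
from scipy import stats

treatment = [1, 2, 3, 4, 5]   # Table 1 data from Exercise 1
control   = [3, 4, 5, 6, 7]
nt, nc = len(treatment), len(control)

mean_diff = mean(treatment) - mean(control)
sp2 = ((nt - 1) * variance(treatment) +
       (nc - 1) * variance(control)) / (nt + nc - 2)      # pooled variance
se = math.sqrt(sp2 * (1 / nt + 1 / nc))
t_crit = stats.t.ppf(0.975, df=nt + nc - 2)               # two-sided 95% CI

lo, hi = mean_diff - t_crit * se, mean_diff + t_crit * se
d = mean_diff / math.sqrt(sp2)                            # Cohen's d
print(f"Mean difference = {mean_diff:+.2f}, "
      f"95% CI [{lo:+.2f}, {hi:+.2f}], d = {d:+.2f}")
```

With the n = 5 data the interval includes 0, but unlike a bare p value it also conveys how large the effect could plausibly be.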
