SADC Course in Statistics Comparing Means from Independent Samples (Session 12)


Learning Objectives

By the end of this session, you will be able to:

• explain how means from two populations may be compared

• describe the assumptions associated with the independent samples t-test

• interpret computer output from a two-sample t-test

• present and write up conclusions resulting from such tests

• explain the difference between statistical significance and an important result


An example: Comparing 2 means

Agric   Non-agric
156     223
282     131
222     137
172     146
183     130
206     122
210     141
198     192
199     188
211     212

As part of a health survey, cholesterol levels of men in a small rural area were measured, including those working in agriculture and those employed in non-agricultural work.

Aim: To see if mean cholesterol levels were different between the two groups.


Summary statistics

Begin with summarising each column of data.

            Agric   Non-agric
Mean        203.9   162.2
Std. dev.   33.9    37.6
Variance    1147    1412

There appears to be a substantial difference between the two means.

Our question of interest is:

Is this difference showing a real effect, or could it merely be a chance occurrence?
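The summary statistics above can be reproduced with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names `agric` and `non_agric` are illustrative choices, and the values are those listed in the data table.

```python
from statistics import mean, stdev, variance

# Cholesterol levels from the data table (one list per group)
agric = [156, 282, 222, 172, 183, 206, 210, 198, 199, 211]
non_agric = [223, 131, 137, 146, 130, 122, 141, 192, 188, 212]

for label, values in (("Agric", agric), ("Non-agric", non_agric)):
    # stdev() and variance() use the sample (n - 1) divisor
    print(f"{label:10s} mean = {mean(values):.1f}  "
          f"std. dev. = {stdev(values):.1f}  "
          f"variance = {variance(values):.0f}")
```

Note that `stdev` and `variance` divide by n - 1, which is the convention the slide values appear to follow.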


Setting up the hypotheses

To answer the question, we set up:

Null hypothesis H0: no difference between the two groups (in terms of mean response), i.e. $\mu_1 = \mu_2$

Alternative hypothesis H1: there is a difference, i.e. $\mu_1 \neq \mu_2$

The resulting test will be two-sided since the alternative is “not equal to”.


Test for comparing means

• Use a two-sample (unpaired) t-test
  - appropriate with 2 independent samples

• Assumptions
  - normal distributions for each sample
  - constant variance (so test uses a pooled estimate of variance)
  - observations are independent

• Procedure
  - assess how large the difference in means is, relative to the noise in this difference, i.e. the std. error of the difference.


Test Statistic

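The test statistic is:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_2}}} \quad \text{with } n_1 + n_2 - 2 \text{ d.f.} \]

where $s^2$, the pooled estimate of variance, is given by

\[ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]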
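The pooled-variance t-statistic can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the output of any particular statistics package; the function name `pooled_t_statistic` is a hypothetical choice, and the data are the cholesterol values from the example.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(sample1, sample2):
    """Return (difference, pooled variance, s.e. of difference, t, d.f.)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled estimate of variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    diff = mean(sample1) - mean(sample2)
    return diff, pooled_var, se_diff, diff / se_diff, df

agric = [156, 282, 222, 172, 183, 206, 210, 198, 199, 211]
non_agric = [223, 131, 137, 146, 130, 122, 141, 192, 188, 212]

diff, pooled_var, se_diff, t, df = pooled_t_statistic(agric, non_agric)
print(f"difference = {diff:.1f}, pooled variance = {pooled_var:.1f}")
print(f"s.e. of difference = {se_diff:.2f}, t = {t:.2f} on {df} d.f.")
```

The printed values can be checked against the numerical results reported for this example.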


Numerical Results

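The pooled estimate of variance is:

\[ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9(1147) + 9(1412)}{18} = 1279.5 \]

Hence the t-statistic is:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_2}}} = \frac{41.7}{\sqrt{2 \times 1279.5 / 10}} = 2.61, \quad \text{based on 18 d.f.} \]

Comparing with tables of $t_{18}$, this result is significant at the 2% level, so reject H0.

Note: The exact p-value = 0.018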
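The exact p-value quoted above can be checked without statistical tables. The sketch below integrates the density of the t-distribution numerically with Simpson's rule, using only the standard library; the function names `t_pdf` and `two_sided_p` are hypothetical, and in practice a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)

# t-statistic and degrees of freedom from the numerical results
p_value = two_sided_p(2.61, 18)
print(f"two-sided p-value = {p_value:.3f}")
```

The factor of 2 reflects the two-sided alternative: both tails of the distribution count as evidence against H0.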


Presenting the results

• For comparisons, should report:
  - difference between means
  - s.e. of difference in means
  - 95% confidence interval for true diff.

• In addition, may report for each group:
  - mean
  - s.e. of each mean
  - sample size for each mean

• Conclusions will then follow…


Results and conclusions

Difference of means: 41.7

Standard error of difference: 15.99

95% confidence interval for difference in means: (8.09, 75.3).

Conclusions: There is some evidence (p=0.018) that the mean cholesterol levels differ between those working in agriculture and others. The difference in means is 42 mg/dL with 95% confidence interval (8.1, 75.3).
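The confidence interval reported above is the difference in means plus or minus a t-multiplier times the standard error. A minimal sketch follows; the multiplier 2.101 is the two-sided 5% point of the t-distribution on 18 d.f. taken from standard tables (it is not stated on the slide), and the variable names are illustrative.

```python
# Values reported in the results
diff = 41.7        # difference of means
se_diff = 15.99    # standard error of the difference

# ASSUMPTION: two-sided 5% critical value of t on 18 d.f., from standard tables
t_crit = 2.101

lower = diff - t_crit * se_diff
upper = diff + t_crit * se_diff
print(f"95% CI for difference in means: ({lower:.1f}, {upper:.1f})")
```

Because the standard error is entered here to two decimal places, the limits may differ from the slide values in the second decimal place.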


Significance ideas again!

e.g. Farmers report that using a fungicide increased crop yields by 2.7 kg ha⁻¹, s.e.m. = 0.41

This gave a t-statistic of 6.6 (p-value<0.001)

Recall that the p-value is the probability, when the null hypothesis is true, of obtaining a result at least as extreme as the one observed.

i.e. a small p-value means that an increase this large would rarely occur by chance alone if the fungicide had no effect!


How important are sig. tests?

In relation to the example on the previous slide, we may find one of the following situations for different crops.

Mean yields: with and without fungicide.

With    Without
589.9   587.2     Not an important finding!
9.9     7.2       Very important finding!

It is likely that in the first of these results, either too much replication or the incorrect level of replication had been used (e.g. plant level variation, rather than plot level variation used to compare means).
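To illustrate the remark about replication, the sketch below holds the difference in means fixed at the 2.7 from the example and varies the number of replicates per treatment. The standard deviation of 10 and the replicate counts are hypothetical values chosen only for illustration; they do not come from any data in this session.

```python
import math

diff = 2.7   # difference in mean yield, from the example
sd = 10.0    # HYPOTHETICAL common standard deviation per unit

def t_for_replicates(n):
    """t-statistic for two groups of n units each with common sd."""
    se_diff = sd * math.sqrt(2 / n)  # s.e. of a difference of two means
    return diff / se_diff

# HYPOTHETICAL replicate counts: the same difference gives a larger t as n grows
for n in (4, 20, 100, 500):
    print(f"n = {n:3d} per group: t = {t_for_replicates(n):.2f}")
```

The t-statistic grows in proportion to the square root of n, so with enough replicates even a practically negligible difference can become statistically significant.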


What does non-significance tell us?

e.g. There was insufficient evidence in the data to demonstrate that using a fungicide had any effect on plant yields (p=0.128).

Mean yields: with and without fungicide. 157.2 89.9

This difference may be an important finding, but the statistical analysis was unable to pick up this difference as being statistically significant.

HOW CAN THIS HAPPEN?

• Too small a sample size?

• High variability in the experimental material?

• One or two outliers?

• All sources of variability not identified?


Significance – Key Points

• Statistical significance alone is not enough. Consider whether the result is also scientifically meaningful and important.

• When a significant result is found, report the finding in terms of the corresponding estimates, their standard errors and C.I.s


Some practical work follows…
