SADC Course in Statistics Comparing Means from Independent Samples (Session 12)

Posted: 20-Dec-2015

Learning Objectives

By the end of this session, you will be able to:

• explain how means from two populations may be compared

• describe the assumptions associated with the independent samples t-test

• interpret computer output from a two-sample t-test

• present and write up conclusions resulting from such tests

• explain the difference between statistical significance and an important result


An example: Comparing 2 means

Agric Non-agric

156 223

282 131

222 137

172 146

183 130

206 122

210 141

198 192

199 188

211 212

As part of a health survey, cholesterol levels of men in a small rural area were measured, including those working in agriculture and those employed in non-agricultural work.

Aim: To see if mean cholesterol levels were different between the two groups.


Summary statistics

Begin by summarising each column of data.

Agric Non-agric

Mean = 203.9 162.2

Std. dev. = 33.9 37.6

Variance = 1147 1412
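As a check, the summary statistics can be reproduced from the raw data. This is a minimal sketch using Python's standard `statistics` module, assuming the table lists ten readings per group (the pairing of values within a row carries no meaning, since the samples are independent). Standard deviation and variance use the usual n - 1 divisor.

```python
from statistics import mean, stdev, variance

# Cholesterol readings transcribed from the data table (10 men per group).
agric = [156, 282, 222, 172, 183, 206, 210, 198, 199, 211]
non_agric = [223, 131, 137, 146, 130, 122, 141, 192, 188, 212]

def summarise(values):
    """Return (mean, sample std. dev., sample variance), using the n - 1 divisor."""
    return mean(values), stdev(values), variance(values)

for label, data in [("Agric", agric), ("Non-agric", non_agric)]:
    m, sd, var = summarise(data)
    print(f"{label:<10} mean={m:.1f} sd={sd:.1f} var={var:.0f}")
```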

There appears to be a substantial difference between the two means.

Our question of interest is:

Is this difference showing a real effect, or could it merely be a chance occurrence?


Setting up the hypotheses

To answer the question, we set up:

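Null hypothesis H0:

no difference between the two groups (in terms of mean response), i.e. $\mu_1 = \mu_2$

Alternative hypothesis H1:

there is a difference, i.e. $\mu_1 \neq \mu_2$

Here $\mu_1$ and $\mu_2$ are the population mean cholesterol levels of men working in agriculture and in non-agricultural work, respectively.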

The resulting test will be two-sided since the alternative is “not equal to”.


Test for comparing means

• Use a two-sample (unpaired) t-test
  - appropriate with 2 independent samples

• Assumptions
  - each sample comes from a normal distribution
  - constant variance (so the test uses a pooled estimate of variance)
  - observations are independent

• Procedure
  - assess how large the difference in means is, relative to the noise in this difference, i.e. the std. error of the difference.


Test Statistic

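The test statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_2}}}, \quad \text{with } n_1 + n_2 - 2 \text{ d.f.},$$

where $s^2$, the pooled estimate of variance, is given by

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$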
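The pooled t-statistic translates directly into a small function. This is a sketch working from summary statistics only (means, sample variances and sample sizes); the function name `pooled_t` is our own, not from the course material.

```python
import math

def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """Two-sample (unpaired) t-statistic using the pooled estimate of variance.

    Returns (t, df, s2): the t-statistic, its degrees of freedom
    n1 + n2 - 2, and the pooled variance s2.
    """
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    s2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference in means.
    se_diff = math.sqrt(s2 / n1 + s2 / n2)
    return (mean1 - mean2) / se_diff, df, s2

t, df, s2 = pooled_t(203.9, 162.2, 1147, 1412, 10, 10)
print(f"s2={s2:.1f} t={t:.2f} df={df}")
```

Applied to the cholesterol summaries, it gives the pooled variance, t-statistic and degrees of freedom used in the numerical results.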


Numerical Results

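The pooled estimate of variance is:

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9 \times 1147 + 9 \times 1412}{18} = 1279.5$$

Hence the t-statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_2}}} = \frac{41.7}{\sqrt{2 \times 1279.5/10}} = 2.61, \quad \text{based on 18 d.f.}$$

Comparing with tables of $t_{18}$, this result is significant at the 2% level, so we reject H0.

Note: The exact p-value = 0.018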
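The slide quotes an exact p-value of 0.018, presumably from statistical software (in Python, for example, `scipy.stats.t.sf` would give the tail area directly). As a standard-library-only sketch, the two-sided p-value can be approximated by integrating the t density from 0 to |t| with Simpson's rule and using the symmetry of the t distribution; the function names here are our own.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) for T following a t distribution with df d.f.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be
    even), then uses symmetry about zero: p = 1 - 2 * P(0 <= T <= |t|).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2  # Simpson's 1-4-2-4-...-4-1 weights
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 1 - 2 * area

p = two_sided_p(2.61, 18)
print(f"two-sided p-value = {p:.3f}")
```

The same function reproduces the familiar table value: at t = 2.101 with 18 d.f. it returns a p-value of about 0.05.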


Presenting the results

• For comparisons, should report:
  - difference between means
  - s.e. of difference in means
  - 95% confidence interval for true diff.

• In addition, may report for each group:
  - mean
  - s.e. of each mean
  - sample size for each mean

• Conclusions will then follow…
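The per-group quantities listed above can be computed from the raw data. This sketch assumes the usual standard error of a mean, s / sqrt(n), with s the sample standard deviation; the helper name `group_report` is hypothetical.

```python
import math
from statistics import mean, stdev

# Cholesterol readings from the example (10 men per group).
agric = [156, 282, 222, 172, 183, 206, 210, 198, 199, 211]
non_agric = [223, 131, 137, 146, 130, 122, 141, 192, 188, 212]

def group_report(values):
    """Return (mean, standard error of the mean, sample size) for one group."""
    n = len(values)
    return mean(values), stdev(values) / math.sqrt(n), n

for label, data in [("Agric", agric), ("Non-agric", non_agric)]:
    m, se, n = group_report(data)
    print(f"{label:<10} mean={m:.1f} s.e.={se:.1f} n={n}")
```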


Results and conclusions

Difference of means: 41.7

Standard error of difference: 16.00

95% confidence interval for difference in means: (8.09, 75.3).

Conclusions: There is some evidence (p=0.018) that mean cholesterol levels differ between men working in agriculture and others. The mean for those working in agriculture is 42 mg/dL higher, with 95% confidence interval (8.1, 75.3) mg/dL.
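The 95% confidence interval is the difference in means plus or minus a t critical value times the standard error of the difference. In this sketch the critical value 2.101 (the 97.5th percentile of t with 18 d.f.) is taken from t tables rather than computed, and the helper name `ci_difference` is our own.

```python
import math

def ci_difference(diff, se_diff, t_crit):
    """Confidence interval for a difference in means: diff +/- t_crit * se_diff."""
    half_width = t_crit * se_diff
    return diff - half_width, diff + half_width

# Standard error of the difference from the pooled variance (n = 10 per group).
se = math.sqrt(2 * 1279.5 / 10)
# t_crit = 2.101: 97.5th percentile of t with 18 d.f., from t tables.
low, high = ci_difference(41.7, se, 2.101)
print(f"({low:.2f}, {high:.2f})")
```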


Significance ideas again!

e.g. Farmers report that using a fungicide increased crop yields by 2.7 kg ha⁻¹, s.e.m. = 0.41

This gave a t-statistic of 6.6 (p-value<0.001)
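The t-statistic for testing whether the true effect is zero is the estimated effect divided by its standard error. A one-line sketch (the function name is hypothetical):

```python
def t_from_estimate(estimate, standard_error):
    """t-statistic for H0: true effect = 0, computed as estimate / standard error."""
    return estimate / standard_error

t = t_from_estimate(2.7, 0.41)  # fungicide effect (kg/ha) and its s.e.m.
print(f"t = {t:.1f}")
```

Here 2.7 / 0.41 is about 6.59, which rounds to the 6.6 quoted on the slide.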

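Recall that the p-value is the probability, calculated assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. It is not the probability that the null hypothesis is true.

i.e. a p-value below 0.001 means a difference this large would be very unlikely if the fungicide had no effect, so there is strong evidence of an effect due to the fungicide!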


How important are sig. tests?

In relation to the example on the previous slide, we may find one of the following situations for different crops.

Mean yields with and without fungicide:

With Without
589.9 587.2 Not an important finding!
9.9 7.2 Very important finding!

It is likely that in the first of these results either too much replication was used, or replication was at the wrong level (e.g. plant-level rather than plot-level variation was used to compare means).
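Both situations involve the same 2.7 difference, so the p-value alone cannot separate them. One simple way to make the context explicit, which is our own illustration rather than a method from the slides, is to express the difference as a percentage of the yield without fungicide.

```python
def percent_increase(with_treatment, without_treatment):
    """Difference in means as a percentage of the untreated (without fungicide) mean."""
    return 100 * (with_treatment - without_treatment) / without_treatment

# (with fungicide, without fungicide) mean yields from the two situations above.
for with_f, without_f in [(589.9, 587.2), (9.9, 7.2)]:
    diff = with_f - without_f
    print(f"diff={diff:.1f} increase={percent_increase(with_f, without_f):.1f}%")
```

The first case is an increase of under half a percent; the second is an increase of 37.5%.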


What does non-significance tell us?

e.g. There was insufficient evidence in the data to demonstrate that using a fungicide had any effect on plant yields (p=0.128).

Mean yields with and without fungicide: 157.2 and 89.9

This difference may be an important finding, but the statistical analysis was unable to detect it as statistically significant.

HOW CAN THIS HAPPEN?

• Too small a sample size?
• High variability in the experimental material?
• One or two outliers?
• All sources of variability not identified?


Significance – Key Points

• Statistical significance alone is not enough. Consider whether the result is also scientifically meaningful and important.

• When a significant result is found, report the finding in terms of the corresponding estimates, their standard errors and confidence intervals (C.I.s).


Some practical work follows…