1
Session 8
Tests of Hypotheses
2
By the end of this session, you will be able to
set up, conduct and interpret results from a test of hypothesis concerning a population mean
explain how means from two populations may be compared, and state assumptions associated with the independent samples t-test
interpret computer output from one or two-sample t-tests, present and write up conclusions resulting from such tests
explain the difference between statistical significance and an important result
Learning Objectives
3
Farmers growing maize in a certain area were getting average yields of 2900 kg/ha.
A “new” Integrated Pest Management (IPM) approach was attempted with 16 farmers.
Objective: To determine if the new approach results in an increase in maize yields.
Yields from these 16 farmers (after using IPM) gave mean = 3454 kg/ha, with standard deviation = 672 kg/ha hence s.e. = 168.
Can we determine whether IPM has really increased maize yields?
An illustrative example
4
In the above example, the sample mean of 3454 kg/ha is clearly greater than 2900 kg/ha
But the question of interest is “does this result indicate a significant increase in the yield
or might it just be a result of the usual random variation of yield”
Hypothesis testing seeks to answer such questions by looking at the observed change relative to the “noise”, i.e.
the standard error in the sample estimate
Is the yield increase real?
5
Null hypothesis H0: $\mu = 2900$, where $\mu$ is the true mean yield of farmers in the area using the new approach
The promoters of the new approach are confident that yields with the new approach cannot possibly decrease.
Hence the above null hypothesis needs to be tested against the alternative hypothesis
H1: $\mu > 2900$
Null H0 & Alternative H1
6
Testing the hypothesis
Compute the t test statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3454 - 2900}{168} = 3.30$$
where $\bar{x}$ is the sample mean and $\mu_0 = 2900$ is the mean under H0. When H0 is true, this statistic follows a t-distribution with n - 1 = 15 degrees of freedom. Use values of the t-distribution to find the probability of getting a result as extreme as, or more extreme than, the one observed (3.30), given that H0 is true.
The smaller this probability, the greater the evidence against the null hypothesis. This probability is called the p-value, or observed significance level, of the test.
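The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch working from the summary figures quoted in the example; the variable names are illustrative and not part of the original material.

```python
import math

# Summary figures quoted in the maize example
n = 16                 # number of farmers using IPM
sample_mean = 3454.0   # kg/ha
sample_sd = 672.0      # kg/ha
mu0 = 2900.0           # mean yield under H0, kg/ha

standard_error = sample_sd / math.sqrt(n)       # 672 / 4
t_stat = (sample_mean - mu0) / standard_error   # 554 / 168
df = n - 1

print(f"s.e. = {standard_error:.0f}, t = {t_stat:.2f}, d.f. = {df}")
```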
7
Analysis in Stata
Type db ttesti or look for the One-sample mean comparison calculator on the menu
8
Results
[Screenshot of Stata output, annotated to show the t-value, the t-probabilities from formulae or table, and the result from the one-sided test done here]
9
Interpretation and conclusions
It is clear from t-tables that the p-value is smaller than 0.01.
Using statistical software, we get the exact p-value as 0.0024.
This p-value is so small that there is sufficient evidence to reject H0.
Conclusion:
Use of the new IPM technology has led to an increase in maize yields (p-value=0.0024)
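For readers without Stata to hand, the one-sided p-value can be approximated using only the Python standard library. The sketch below integrates the t-density numerically with Simpson's rule; the function names `t_pdf` and `t_upper_tail` are illustrative helpers, not library functions, and the approach is just one convenient way to get the tail area.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat = (3454 - 2900) / 168   # test statistic from the maize example
p_value = t_upper_tail(t_stat, df=15)
print(f"one-sided p-value = {p_value:.4f}")
```

The result should agree with the exact p-value quoted from statistical software, to the four decimal places shown.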
10
Agric Non-agric
156 223
282 131
222 137
172 146
183 130
206 122
210 141
198 192
199 188
211 212
As part of a health survey, cholesterol levels of men in a small rural area were measured, including those working in agriculture and those employed in non-agricultural work.
Aim: To see if mean cholesterol levels were different between the two groups.
An example: Comparing 2 means
11
Begin with summarising each column of data.
            Agric    Non-agric
Mean      = 203.9    162.2
Std. dev. = 33.9     37.6
Variance  = 1147     1412
There appears to be a substantial difference between the two means.
Our question of interest is:
Is this difference showing a real effect, or could it merely be a chance occurrence?
Summary statistics
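The summary statistics can be reproduced from the raw cholesterol data. A minimal sketch using Python's `statistics` module follows; the list names are illustrative.

```python
import statistics

# Cholesterol levels from the data slide
agric = [156, 282, 222, 172, 183, 206, 210, 198, 199, 211]
non_agric = [223, 131, 137, 146, 130, 122, 141, 192, 188, 212]

for name, values in (("Agric", agric), ("Non-agric", non_agric)):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample std. dev. (divisor n - 1)
    var = statistics.variance(values)    # sample variance (divisor n - 1)
    print(f"{name:10s} mean = {mean:.1f}  std. dev. = {sd:.1f}  variance = {var:.0f}")
```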
12
To answer the question, we set up:
Null hypothesis H0:
no difference between the two groups (in terms of mean response), i.e. $\mu_1 = \mu_2$
Alternative hypothesis H1:
there is a difference, i.e. $\mu_1 \neq \mu_2$
The resulting test will be two-sided since the alternative is “not equal to”.
Setting up the hypotheses
13
• Use a two-sample (unpaired) t-test
- appropriate with 2 independent samples
• Assumptions
- normal distributions for each sample
- constant variance (so test uses a pooled estimate of variance)
- observations are independent
• Procedure
- assess how large the difference in means is, relative to the noise in this difference, i.e. the std. error of the difference.
Test for comparing means
14
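The test statistic is:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_2}}}$$
where $s^2$, the pooled estimate of variance, is given by
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \quad \text{with } n_1 + n_2 - 2 \text{ d.f.}$$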
Test Statistic
15
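The pooled estimate of variance is:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9 \times 1147 + 9 \times 1412}{18} = 1279.5$$
Hence the t-statistic is:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s^2}{n_1} + \dfrac{s^2}{n_2}}} = \frac{41.7}{\sqrt{2 \times 1279.5/10}} = 2.61, \quad \text{based on 18 d.f.}$$
Comparing with tables of the t-distribution with 18 d.f., this result is significant at the 2% level, so reject H0.
Note: The exact p-value = 0.018
Numerical Results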
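The numerical results can be reproduced from the group sizes, means and variances given in the summary statistics. This is a sketch only; the variable names are illustrative.

```python
import math

# Summary statistics for the two groups (Agric, Non-agric)
n1, mean1, var1 = 10, 203.9, 1147.0
n2, mean2, var2 = 10, 162.2, 1412.0

# Pooled estimate of variance, assuming a common variance in both groups
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t-statistic
se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
t_stat = (mean1 - mean2) / se_diff
df = n1 + n2 - 2

print(f"pooled variance = {pooled_var:.1f}")
print(f"s.e. of difference = {se_diff:.2f}")
print(f"t = {t_stat:.2f} on {df} d.f.")
```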
16
Difference of means: 41.7
Standard error of difference: 15.99
95% confidence interval for difference in means: (8.09, 75.3).
Conclusions: There is some evidence (p=0.018) that the mean cholesterol levels differ between those working in agriculture and others. The difference in means is 42 mg/dL with 95% confidence interval (8.1, 75.3).
Results and conclusions
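The confidence interval is the difference in means plus or minus a t-multiplier times its standard error. In the sketch below the multiplier 2.101 is the tabulated two-sided 5% point of the t-distribution with 18 d.f.; it is typed in from tables rather than computed, and the variable names are illustrative.

```python
import math

diff = 203.9 - 162.2                     # difference of means
pooled_var = 1279.5                      # pooled estimate of variance
se_diff = math.sqrt(2 * pooled_var / 10) # s.e. of difference, n = 10 per group

t_crit = 2.101                           # from t-tables: 18 d.f., two-sided 5% point

lower = diff - t_crit * se_diff
upper = diff + t_crit * se_diff
print(f"95% C.I. for the difference in means: ({lower:.2f}, {upper:.1f})")
```

Because the interval excludes zero, it leads to the same conclusion as the two-sided test at the 5% level.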
17
Analysis in Stata
Input the data and do a t-test
Or complete the dialogue as shown below
Or type ttesti 10 203.9 33.9 10 162.2 37.6
18
Results
This was a 2-sided test
19
Take care to report results according to size of p-value.
For example, evidence of an effect is:
• almost conclusive if p-value < 0.001, and could be said to be strong if p-value < 0.01
• If 0.01 < p-value < 0.05, results indicate some evidence of an effect.
• If p-value > 0.05, but close to 0.05, it may indicate something is going on, but further confirmatory study is needed.
General reporting of the results
20
e.g. Farmers report that using a fungicide increased crop yields by 2.7 kg ha⁻¹, s.e.m. = 0.41
This gave a t-statistic of 6.6 (p-value<0.001)
Recall that the p-value is the probability of getting a result as extreme as, or more extreme than, the one observed, given that the null hypothesis is true.
i.e. an increase this large would be very unlikely to arise by chance if the fungicide had no effect!
Significance: further comments
21
In relation to the example on the previous slide, we may find one of the following situations for different crops.
Mean yields: with and without fungicide.
With     Without
589.9    587.2    Not an important finding!
9.9      7.2      Very important finding!
It is likely that in the first of these results, either too much replication or the incorrect level of replication had been used (e.g. plant level variation, rather than plot level variation used to compare means).
How important are sig. tests?
22
e.g. There was insufficient evidence in the data to demonstrate that using a fungicide had any effect on plant yields (p=0.128).
Mean yields: with and without fungicide. 157.2 89.9
This difference may be an important finding, but the statistical analysis was unable to pick up this difference as being statistically significant.
HOW CAN THIS HAPPEN?
Too small a sample size?
High variability in the experimental material?
One or two outliers?
All sources of variability not identified?
What does non-significance tell us?
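The effect of sample size on the test can be sketched numerically. The difference in means below is taken from the slide, but the pooled standard deviation and the sample sizes are hypothetical values chosen purely for illustration (the slide does not report them), and `two_sample_t` is an illustrative helper.

```python
import math

def two_sample_t(diff, pooled_sd, n_per_group):
    """t-statistic for two equal-sized groups with a common standard deviation."""
    se_diff = pooled_sd * math.sqrt(2 / n_per_group)
    return diff / se_diff

diff = 157.2 - 89.9   # difference in mean yields from the slide
pooled_sd = 100.0     # HYPOTHETICAL: not given in the original example

for n in (4, 10, 20):  # HYPOTHETICAL sample sizes per group
    t = two_sample_t(diff, pooled_sd, n)
    print(f"n per group = {n:2d}: t = {t:.2f}")
```

With the same difference and the same variability, the t-statistic grows with the square root of the sample size, so a small study can fail to detect a difference that a larger one would find significant.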
23
• Statistical significance alone is not enough. Consider whether the result is also scientifically meaningful and important.
• When a significant result is found, report the finding in terms of the corresponding estimates, their standard errors and C.I.s (as is done by Stata)
Significance – Key Points