Unit 4 Tests of Significance
rai-university -
Unit-4
Tests of Significance
Once sample data has been gathered through an observational study or experiment, statistical
inference allows analysts to assess evidence in favor of some claim about the population from which the sample has been drawn. The methods of inference used to support or reject claims
based on sample data are known as tests of significance.
Every test of significance begins with a null hypothesis H0. H0 represents a theory that has been
put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved.
For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is
not better on average, than the current drug. We would write H0: there is no difference between the two drugs on average.
The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to establish.
For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new
drug has a different effect, on average, compared to that of the current drug. We would write Ha: the two drugs have different effects, on average. The alternative hypothesis might also be
that the new drug is better, on average, than the current drug. In this case we would write Ha: the new drug is better than the current drug, on average.
The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude "reject
Ha", or even "accept Ha".
If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true, it only suggests that there is not sufficient evidence against H0 in favor of Ha; rejecting the
null hypothesis then, suggests that the alternative hypothesis may be true.
(Definitions taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Hypotheses are always stated in terms of a population parameter, such as the mean $\mu$. An alternative hypothesis may be one-sided or two-sided. A one-sided hypothesis claims that a parameter is either larger or smaller than the value given by the null hypothesis. A two-sided hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis; the direction does not matter.
Hypotheses for a one-sided test for a population mean take the following form:
$H_0: \mu = k$, $H_a: \mu > k$
or
$H_0: \mu = k$, $H_a: \mu < k$.
Hypotheses for a two-sided test for a population mean take the following form:
$H_0: \mu = k$
$H_a: \mu \neq k$.
A confidence interval gives an estimated range of values which is likely to include an unknown
population parameter, the estimated range being calculated from a given set of sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Example
Suppose a test has been given to all high school students in a certain state. The mean test score for the entire state is 70, with standard deviation equal to 10. Members of the school board suspect that female students have a higher mean score on the test than male students, because the mean score $\bar{x}$ from a random sample of 64 female students is equal to 73. Does this provide strong evidence that the overall mean for female students is higher?
The null hypothesis H0 claims that there is no difference between the mean score for female students and the mean for the entire population, so that $\mu = 70$. The alternative hypothesis claims that the mean for female students is higher than the entire student population's mean, so that $\mu > 70$.
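The question posed in this example can be checked numerically. The following is a minimal sketch, assuming the statewide standard deviation of 10 also applies to female students and that the sample mean is approximately normal; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 70.0     # statewide mean, the value claimed by H0
sigma = 10.0   # population standard deviation (assumed to hold for the subgroup)
n = 64         # number of female students sampled
xbar = 73.0    # observed sample mean

standard_error = sigma / sqrt(n)
z = (xbar - mu0) / standard_error

# One-sided p-value P(Z >= z) under H0, because Ha is mu > 70.
p_value = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

Under these assumptions the statistic is z = 2.4 and the one-sided p-value is below 0.01, which would count as strong evidence against H0.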
Types of errors:-
Two types of error can arise in the testing of a hypothesis.
When a statistical hypothesis is tested, four outcomes are possible:
1. The hypothesis is true but our test rejects it. (Type-I error)
2. The hypothesis is false but our test accepts it. (Type-II error)
3. The hypothesis is true and our test accepts it. (Correct decision)
4. The hypothesis is false and our test rejects it. (Correct decision)
The first two possibilities lead to errors.
In a statistical hypothesis testing experiment, a Type-I error is committed by rejecting the null hypothesis when it is true. The probability of committing a Type-I error is denoted by $\alpha$ (pronounced alpha), where
$\alpha$ = Prob. (Type-I error)
= Prob. (Rejecting $H_0$ | $H_0$ is true)
On the other hand, a Type-II error is committed by not rejecting (i.e. accepting) the null hypothesis when it is false. The probability of committing a Type-II error is denoted by $\beta$ (pronounced beta), where
$\beta$ = Prob. (Type-II error)
= Prob. (Not rejecting $H_0$ | $H_0$ is false)
The distinction between these two types of error can be made by an example.
Assume that the difference between the two population means is actually zero. If our test of significance, when applied to the sample means, is significant, we make a Type-I error. On the other hand, suppose there is a true difference between the two population means. If our test of significance now leads to the judgment "not significant", we commit a Type-II error. We thus find ourselves in the situation which is described by the following table:
Decision            H0 is true          H0 is false
Reject H0           Type-I error        Correct decision
Do not reject H0    Correct decision    Type-II error
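To make the two error probabilities concrete, here is a rough sketch that reuses the figures of the school-board example (H0: mean 70, standard deviation 10, n = 64) with a one-sided test at the 5% level. The "true" mean of 73 used for the Type-II calculation is purely an assumption chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 70.0, 10.0, 64
alpha = 0.05        # chosen probability of a Type-I error
mu_true = 73.0      # hypothetical true mean, assumed only for this illustration

se = sigma / sqrt(n)

# Reject H0 when the sample mean exceeds this cut-off (right-tailed test).
critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# Type-II error: the sample mean falls below the cut-off although mu = mu_true.
beta = NormalDist(mu_true, se).cdf(critical_mean)

print(f"reject H0 if sample mean > {critical_mean:.3f}")
print(f"alpha = {alpha}, beta = {beta:.3f}")
```

Lowering alpha moves the cut-off further from 70 and so raises beta; the two error probabilities trade off against each other for a fixed sample size.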
Hypothesis test
As we know sometimes we cannot survey or test all persons or objects; therefore, we have to take a sample. From the results of analysis from the sample data, we can predict the results from the population. Some questions that one may want to answer are
1. Are unmarried workers more likely to be absent from work than married workers?
2. In Fall 1996, did students in Math 163-01 score the same on the exam as students in Math 163-02?
3. Is there any difference between the strengths of steel wire produced by the XY Company and Bob's Wire Company?
4. A hospital spokesperson claims that the average daily room charge for a specific procedure is $622. Can we reject this claim?
Hypothesis testing is a procedure, based on sample evidence and probability theory, used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is
unreasonable and should be rejected.
Hypothesis test:- A statistical hypothesis test is a method of statistical inference used for testing a statistical hypothesis. A test result is called statistically significant if it has been predicted as unlikely to have occurred by chance alone, according to a threshold probability, the significance level.
Steps in the hypothesis testing procedure
1. State the null hypothesis and the alternate hypothesis.
Null Hypothesis: a statement about the value of a population parameter.
Alternate Hypothesis: a statement that is accepted if the sample evidence leads us to reject the null hypothesis.
2. Select the appropriate test statistic and level of significance. When testing a hypothesis of a proportion, we use the z-statistic or z-test and the formula
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
where $\hat{p}$ is the sample proportion, $p$ is the hypothesized population proportion, $q = 1 - p$ and $n$ is the sample size.
When testing a hypothesis of a mean, we use the z-statistic or the t-statistic according to the following conditions. If the population standard deviation, $\sigma$, is known and either the data is normally distributed or the sample size n > 30, we use the normal distribution (z-statistic). When the population standard deviation, $\sigma$, is unknown and either the data is normally distributed or the sample size is greater than 30 (n > 30), we use the t-distribution (t-statistic).
A traditional guideline for choosing the level of significance is as follows: (a) the 0.10 level for political polling, (b) the 0.05 level for consumer research projects, and (c) the 0.01 level for quality assurance work.
3. State the decision rules. The decision rules state the conditions under which the null hypothesis will be accepted or rejected. The critical value for the test statistic is determined by the level of significance. The critical value is the value that divides the non-reject region from the reject region.
4. Compute the appropriate test statistic and make the decision. When we use the z-statistic, we use the formula
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
When we use the t-statistic, we use the formula
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Compare the computed test statistic with critical value. If the computed value is within the
rejection region(s), we reject the null hypothesis; otherwise, we do not reject the null hypothesis.
5. Interpret the decision. Based on the decision in Step 4, we state a conclusion in the context of the original problem.
The average test score for an entire school is 75 with a standard deviation of 10. What is the probability that the mean score of a random sample of 5 students is above 80?
Conditions for using t-test:
1. $\sigma$ is unknown
2. $n < 30$
Here $\mu = 75$, $\sigma = 10$, $n = 5$, $\bar{x} = 80$.
The first condition is not satisfied, so in this problem we will use the z-test.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{80 - 75}{10/\sqrt{5}} = \frac{5}{10/2.236} = \frac{5}{4.472} = 1.118$$
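The worked value z = 1.118 still has to be turned into the probability the question asks for. A minimal sketch, assuming the sample mean is normally distributed (which for n = 5 requires the scores themselves to be roughly normal):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, xbar = 75.0, 10.0, 5, 80.0

z = (xbar - mu) / (sigma / sqrt(n))

# Probability that the sample mean exceeds 80, i.e. P(Z > z).
prob_above = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.3f}, P(sample mean > 80) = {prob_above:.3f}")
```

Under this assumption the probability comes out at roughly 0.13.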
The average test score for an entire school is 75. The standard deviation of a random sample of 40 students is 10. What is the probability that the mean score of the sample is above 80?
Conditions for using t-test:
1. $\sigma$ is unknown
2. $n < 30$
Here $\mu = 75$, $n = 40$, $s = 10$, $\bar{x} = 80$.
The second condition is not satisfied, so in this problem we will use the z-test, with the sample standard deviation $s$ standing in for $\sigma$.
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
The average test score for an entire school is 75. The standard deviation of a random sample of 9 students is 10. What is the probability that the average test score for the sample is above 80?
Conditions for using t-test:
1. $\sigma$ is unknown
2. $n < 30$
Here $\mu = 75$, $s = 10$, $n = 9$, $\bar{x} = 80$.
Here both conditions for the t-test are satisfied, so we will use the t-test.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Example:-
The average score of all sixth graders in school District A on a math aptitude exam is 75 with a
standard deviation of 8.1. A random sample of 100 students in one school was taken. The
mean score of these 100 students was 71. Does this indicate that the students of this school are
significantly less skilled in their mathematical abilities than the average student in the district?
(Use a 5% level of significance.)
Solution:-
Here mean $\mu = 75$, standard deviation $\sigma = 8.1$, $n = 100$, $\bar{x} = 71$.
Conditions for using t-test:
1. $\sigma$ is unknown
2. $n < 30$
Since $\sigma$ is known and $n > 30$, we use the z-test that is based on the normal curve or normal distribution.
Step 1:- State the null hypothesis (contains =, ≥, or ≤) and alternate hypothesis (usually contains "not"). Think of the statement "Does this indicate that the students of this school are significantly less skilled in their mathematical abilities than the average student in the district?" From "...students of this school are significantly less skilled...," we write the alternate hypothesis as $H_1: \mu < 75$.
$H_0: \mu \geq 75$
$H_1: \mu < 75$
Step 2:- Select a level of significance. Stated in the problem as 5%, or $\alpha = 0.05$.
Step 3:- Identify the statistical test to use. Use the z-test because $\sigma$ is known and the sample (n = 100) is a large sample (n > 30).
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Recall that in the normal curve, Z = 0 corresponds to the mean. Z = 1, 2, 3 represent 1, 2, and 3 standard deviations above the mean; the negatives are below the mean.
Step 4:- Formulate a decision rule. Since the alternate hypothesis states $\mu < 75$, this is a one-tailed test to the left. For $\alpha = 0.05$, we find the $z$ in the normal curve table that gives a probability of 0.05 to the left of $z$. This means the negative of the z value (critical value) corresponding to a table value of 0.5 - 0.05 = 0.45. Because 0.4500 is exactly halfway between 0.4495 and 0.4505, we take the value halfway between 1.640 and 1.650 to get z = 1.645. Since 71 is to the left of 75, we have $z = -1.645$. That is, $P(z < -1.645) = 0.05$.
Thus, we reject the null hypothesis if z < -1.645 and accept the alternate hypothesis that the students in the school sampled are less skilled in math aptitude than those in District A.
Step 5:- Take a sample; arrive at a decision. A sample of 100 students has been tested, and their mean score was found to be 71. Using the statistical test (z-test) identified in Step 3, compute the test statistic by the formula from Step 3:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{71 - 75}{8.1/\sqrt{100}} = -4.938$$
Since the computed $z = -4.938 < -1.645$ (critical z value), we reject the null hypothesis that the students in the school are not less skilled in mathematical ability. Thus, we conclude that the sixth graders in the school are less skilled in mathematical ability than the sixth graders in District A.
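The five steps of this example can be sketched in a few lines of Python. The function name `left_tailed_z_test` is a hypothetical helper introduced here for illustration; it obtains the critical value from the standard normal distribution instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, reject H0?) for the alternative mu < mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(alpha)   # about -1.645 for alpha = 0.05
    return z, critical, z < critical

# District A example: mu = 75, sigma = 8.1, n = 100, sample mean = 71.
z, critical, reject = left_tailed_z_test(xbar=71, mu0=75, sigma=8.1, n=100)

print(f"z = {z:.3f}, critical value = {critical:.3f}")
print("reject H0" if reject else "do not reject H0")
```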
The following problem is presented for students to work:
A sample of 250 married workers showed 22 missed more than 5 days last year for any reason. A sample of 300 unmarried workers showed 35 missed more than 5 days. Use the 5% level of
significance to test and answer the question: Are unmarried workers more likely to be absent from work than married workers?
Test of significance for Large samples:-
If the size of the sample exceeds 30, we apply the test of significance for large samples.
The assumptions made while dealing with problems relating to large samples are:
a) The random sampling distribution of a statistic is approximately normal, and
b) Values given by the samples are sufficiently close to the population value and can be used in its place for calculating the standard error of the estimate.
Standard error of Mean:-
a) When the standard deviation of the population is known
$$\text{S.E.}(\bar{x}) = \frac{\sigma_p}{\sqrt{n}}$$
where $\text{S.E.}(\bar{x})$ refers to the standard error of the mean,
$\sigma_p$ = standard deviation of the population,
$n$ = number of observations in the sample.
b) When the standard deviation of the population is not known, we have to use the standard deviation of the sample in calculating the standard error of the mean.
The formula for calculating the standard error is
$$\text{S.E.}(\bar{x}) = \frac{\sigma_{\text{(sample)}}}{\sqrt{n}}$$
where $\sigma_{\text{(sample)}}$ denotes the standard deviation of the sample.
Note: If the standard deviations of both the sample and the population are available, the standard deviation of the population is preferred in calculating the standard error of the mean.
Example:- Calculate the standard error of mean from the following data showing the amount
paid by 100 firms in Calcutta on the occasion of Durga Puja:
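Mid value (Rs.)   39   49   59   69   79   89   99
No. of firms       2    3   11   20   32   25    7
Solution:-
$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
CALCULATION OF STANDARD DEVIATION
Mid value $m$   $f$    $d = (m - 69)/10$   $fd$    $fd^2$
39               2     -3                   -6      18
49               3     -2                   -6      12
59              11     -1                  -11      11
69              20      0                    0       0
79              32     +1                  +32      32
89              25     +2                  +50     100
99               7     +3                  +21      63
$N = 100$, $\sum fd = 80$, $\sum fd^2 = 236$
$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{236}{100} - \left(\frac{80}{100}\right)^2} \times 10$$
where $i = 10$ is the common factor by which the deviations were divided.
$$= \sqrt{2.36 - 0.64} \times 10 = 1.311 \times 10 = 13.11$$
$$\text{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{13.11}{\sqrt{100}} = \frac{13.11}{10} = 1.311$$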
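The step-deviation calculation above can be reproduced as follows. This is only a sketch of the arithmetic in the example; the names `assumed_mean` and `step` stand for the 69 and 10 used in forming d.

```python
from math import sqrt

mid_values = [39, 49, 59, 69, 79, 89, 99]
firms = [2, 3, 11, 20, 32, 25, 7]
assumed_mean, step = 69, 10

N = sum(firms)
d = [(m - assumed_mean) / step for m in mid_values]       # step deviations
sum_fd = sum(f * di for f, di in zip(firms, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(firms, d))

# Standard deviation by the step-deviation formula, then the standard error.
sigma = sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * step
standard_error = sigma / sqrt(N)

print(f"N = {N}, sum fd = {sum_fd:.0f}, sum fd^2 = {sum_fd2:.0f}")
print(f"sigma = {sigma:.2f}, S.E. of mean = {standard_error:.3f}")
```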
Two-tailed test for the Difference between the Means of two Samples:-
i. If two independent random samples with $n_1$ and $n_2$ observations respectively (both sample sizes greater than 30) are drawn from the same population of standard deviation $\sigma$, the standard error of the difference between the sample means is given by the formula:
S.E. of the difference between sample means
$$= \sqrt{\sigma^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
If $\sigma$ is unknown, the sample standard deviation for the combined sample must be substituted.
ii. If two random samples with $\bar{x}_1, \sigma_1, n_1$ and $\bar{x}_2, \sigma_2, n_2$ respectively are drawn from different populations, then the S.E. of the difference between the means is given by the formula:
$$= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
And where $\sigma_1$ and $\sigma_2$ are unknown,
S.E. of the difference between the means
$$= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $s_1$ and $s_2$ represent the standard deviations of the two samples.
The null hypothesis to be tested is that there is no significant difference in the means of the two samples, i.e.,
$H_0: \mu_1 = \mu_2$ (null hypothesis, there is no difference)
$H_a: \mu_1 \neq \mu_2$ (alternative hypothesis, a difference exists).
Example-1:-
Intelligence test on two groups of boys and girls gave the following results:
         Mean   S.D.   N
Girls    75     15     150
Boys     70     20     250
Is there a significant difference in the mean scores obtained by boys and girls?
Solution:-
Let us take the hypothesis that there is no significant difference in the mean scores obtained by boys and girls.
$$\text{S.E.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$s_1 = 15$, $s_2 = 20$, $n_1 = 150$, $n_2 = 250$
Substituting these values
$$\text{S.E.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(15)^2}{150} + \frac{(20)^2}{250}} = \sqrt{1.5 + 1.6} = 1.761$$
$$\frac{\text{Difference}}{\text{S.E.}} = \frac{75 - 70}{1.761} = 2.84$$
Since the difference is more than 2.58 S.E. (1% level of significance), the hypothesis is rejected.
There seems to be a significant difference in the mean scores obtained by boys and girls.
Example-2:-
A man buys 50 electric bulbs of "Philips" and 50 electric bulbs of "HMT". He finds that "Philips" bulbs give an average life of 1500 hours with a standard deviation of 60 hours and "HMT" bulbs give an average life of 1512 hours with a standard deviation of 80 hours. Is there a significant difference in the mean life of the two makes of bulbs?
Solution:-
Let us take the hypothesis that there is no significant difference in the mean life of the two makes of bulbs. Calculating the standard error of the difference of means
$$\text{S.E.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$s_1 = 60$, $n_1 = 50$, $s_2 = 80$, $n_2 = 50$
Substituting these values
$$\text{S.E.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(60)^2}{50} + \frac{(80)^2}{50}} = \sqrt{\frac{3600 + 6400}{50}} = \sqrt{200} = 14.14$$
Observed difference between the means = 1512 - 1500 = 12
$$\frac{\text{Difference}}{\text{S.E.}} = \frac{12}{14.14} = 0.849$$
Since the difference is less than 2.58 S.E. (1% level of significance), it could have arisen due to fluctuation of sampling. Hence the difference in the mean life of the two makes is not significant.
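Both large-sample examples follow the same recipe, which can be sketched with one small helper. The function name `large_sample_z` is hypothetical, and the 2.58 cut-off is the 1% two-tailed value used in the text.

```python
from math import sqrt

def large_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Return (|difference| / S.E., S.E.) for two large independent samples."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(mean1 - mean2) / se, se

# Example 1: girls vs. boys; Example 2: Philips vs. HMT bulbs.
z_scores, se_scores = large_sample_z(75, 15, 150, 70, 20, 250)
z_bulbs, se_bulbs = large_sample_z(1500, 60, 50, 1512, 80, 50)

for label, z in (("test scores", z_scores), ("bulb life", z_bulbs)):
    verdict = "significant" if z > 2.58 else "not significant"
    print(f"{label}: ratio = {z:.3f} -> {verdict} at the 1% level")
```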
Test of significance for small samples:-
When the sample size is small (less than 30), the tests for large samples do not work well. Special tests are therefore used for small samples, such as the t-test and the F-test.
Student t-distribution
Theoretical work on the t-distribution was done by W.S. Gosset (1876-1937) in the early 1900s. Gosset was employed by Guinness & Son, a Dublin brewery in Ireland, which did not permit employees to publish research findings under their own names. So Gosset adopted the pen name "Student" and published his findings under this name. Therefore, the t-distribution is commonly called Student's t-distribution.
The t-distribution is used when the sample size is 30 or less and the population standard deviation is unknown. The t-statistic is defined as:
$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$
where $S = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Test the significance of the mean of a Random Sample:-
In determining whether the mean of a sample drawn from a normal distribution deviates significantly from a stated value (the hypothetical value of the population mean), when the variance of the population is unknown, we calculate the statistic:
$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$
$\bar{x}$ = the mean of the sample
$\mu$ = the actual or hypothetical mean of the population
$n$ = the sample size
$S$ = the standard deviation of the sample
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{or} \quad S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}$$
where $d$ = deviation from the assumed mean.
If the calculated value of $|t|$ exceeds $t_{0.05}$, we say that the difference between $\bar{x}$ and $\mu$ is significant at the 5% level; if it exceeds $t_{0.01}$, the difference is said to be significant at the 1% level. If $|t| < t_{0.05}$, we conclude that the difference between $\bar{x}$ and $\mu$ is not significant and hence the sample might have been drawn from a population with mean = $\mu$.
Fiducial limits of population Mean:-
Assuming that the sample is a random sample from a normal population of unknown mean, the 95% fiducial limits of the population mean ($\mu$) are:
$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.05}$$
And the 99% limits are
$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.01}$$
Example:- The manufacturer of a certain make of electric bulbs claims that his bulbs have a mean life of 25 months with a standard deviation of 5 months. A random sample of 6 such bulbs gave the following values. Life in months: 24, 26, 30, 20, 20, 18.
Can you regard the producer's claim to be valid at the 1% level of significance? (Given that the table values of the appropriate test statistic at the said level are 4.032, 3.707 and 3.499 for 5, 6 and 7 degrees of freedom respectively.)
Solution:- Let us take the hypothesis that there is no significant difference in the mean life of bulbs in the sample and that of the population. Applying the t-test
$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$
CALCULATION OF $\bar{x}$ and $S$
$x$     $(x - \bar{x})$    $(x - \bar{x})^2$
24      +1                  1
26      +3                  9
30      +7                 49
20      -3                  9
20      -3                  9
18      -5                 25
$\sum x = 138$, $\sum (x - \bar{x})^2 = 102$
$$\bar{x} = \frac{\sum x}{n} = \frac{138}{6} = 23$$
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{102}{5}} = \sqrt{20.4} = 4.517$$
$$t = \frac{|\bar{x} - \mu|}{S} \times \sqrt{n} = \frac{|23 - 25|}{4.517} \times \sqrt{6} = \frac{2 \times 2.449}{4.517} = 1.084$$
$v = n - 1 = 6 - 1 = 5$. For $v = 5$, $t_{0.01} = 4.032$.
The calculated value of t is less than the tabulated value, so the hypothesis is accepted. Hence the producer's claim is valid at the 1% level of significance.
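The bulb example can be recomputed directly from the six observations. A minimal sketch using only the standard library; the critical value 4.032 is the table value quoted in the example, not something the code derives.

```python
from math import sqrt
from statistics import mean, stdev

lifetimes = [24, 26, 30, 20, 20, 18]   # life in months of the six bulbs
claimed_mean = 25
t_critical = 4.032                      # table value for 5 d.f. at the 1% level

n = len(lifetimes)
xbar = mean(lifetimes)
s = stdev(lifetimes)                    # sample standard deviation, divisor n - 1
t = abs(xbar - claimed_mean) / s * sqrt(n)

print(f"mean = {xbar}, S = {s:.3f}, t = {t:.3f}, d.f. = {n - 1}")
print("reject the claim" if t > t_critical else "claim not rejected")
```

Without intermediate rounding the statistic comes out very slightly above the hand-computed 1.084, which does not affect the conclusion.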
Example:- A random sample of size 16 has 53 as its mean. The sum of the squares of the deviations taken from the mean is 135. Can this sample be regarded as taken from the population having 56 as its mean? Obtain the 95% and 99% confidence limits of the mean of the population. (For $v = 15$, $t_{0.05} = 2.13$; for $v = 15$, $t_{0.01} = 2.95$.)
Solution:-
Let us take the hypothesis that there is no significant difference between the sample mean and the hypothetical population mean. Applying the t-test
$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$
$\bar{x} = 53$, $\mu = 56$, $n = 16$, $\sum (x - \bar{x})^2 = 135$
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{135}{15}} = 3$$
$$t = \frac{|53 - 56|}{3} \times \sqrt{16} = \frac{3 \times 4}{3} = 4$$
$v = 16 - 1 = 15$. For $v = 15$, $t_{0.05} = 2.13$.
The calculated value of t is more than the tabulated value, so the hypothesis is rejected. Hence, the sample has not come from the population having 56 as its mean.
95% confidence limits of the population mean
$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.05} = 53 \pm \frac{3}{\sqrt{16}} \times 2.13 = 53 \pm \frac{3}{4} \times 2.13 = 53 \pm 1.6 = 51.4 \text{ to } 54.6$$
99% confidence limits of the population mean
$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.01} = 53 \pm \frac{3}{\sqrt{16}} \times 2.95 = 53 \pm \frac{3}{4} \times 2.95 = 53 \pm 2.212 = 50.788 \text{ to } 55.212$$
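The confidence limits in this example follow mechanically from the fiducial-limit formula. A short sketch, taking the two table values of t as given in the problem; `limits` is a hypothetical helper name.

```python
from math import sqrt

n, xbar, sum_sq_dev = 16, 53.0, 135.0
t_05, t_01 = 2.13, 2.95        # table values for 15 d.f., as given in the example

s = sqrt(sum_sq_dev / (n - 1))  # sample standard deviation

def limits(t_value):
    """Return the (lower, upper) limits xbar -/+ (S / sqrt(n)) * t."""
    margin = s / sqrt(n) * t_value
    return xbar - margin, xbar + margin

low95, high95 = limits(t_05)
low99, high99 = limits(t_01)

print(f"95% limits: {low95:.2f} to {high95:.2f}")
print(f"99% limits: {low99:.2f} to {high99:.2f}")
```

Note that the hypothesized mean 56 lies outside both intervals, which agrees with the rejection of the hypothesis.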
Testing difference between means of two samples (Independent Samples):-
Given two independent random samples of size $n_1$ and $n_2$ with the means $\bar{x}_1$ and $\bar{x}_2$ and the standard deviations $s_1$ and $s_2$, we may be interested in testing the hypothesis that the samples come from the same normal population. To carry out the test, we calculate the statistic as follows:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
where $\bar{x}_1$ = mean of the first sample
$\bar{x}_2$ = mean of the second sample
$n_1$ = number of observations in the first sample
$n_2$ = number of observations in the second sample
$S$ = combined standard deviation.
The value of $S$ is calculated by the following formula:
$$S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}}$$
When the actual means are fractional, the deviations should be taken from assumed means. In such a case the combined standard deviation is obtained by applying the following formula:
$$S = \sqrt{\frac{\sum (x_1 - A_1)^2 + \sum (x_2 - A_2)^2 - n_1(\bar{x}_1 - A_1)^2 - n_2(\bar{x}_2 - A_2)^2}{n_1 + n_2 - 2}}$$
$A_1$ = assumed mean of the first sample
$A_2$ = assumed mean of the second sample
$\bar{x}_1$ = actual mean of the first sample
$\bar{x}_2$ = actual mean of the second sample
The degrees of freedom = $n_1 + n_2 - 2$.
When we are given the number of observations and the standard deviations of the two samples, the pooled estimate of the standard deviation can be obtained as follows:
$$S = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
If the calculated value of $t$ exceeds $t_{0.05}$ ($t_{0.01}$), the difference between the sample means is said to be significant at the 5% (1%) level of significance; otherwise the data are said to be consistent with the hypothesis.
Example:- Two types of drug were used on 5 and 7 patients for reducing their weight. Drug A was imported and drug B was indigenous. The decreases in weight after using the drugs for six months were as follows:
Drug A   10  12  13  11  14
Drug B    8   9  12  14  15  10   9
Solution:- Let us take the hypothesis that there is no significant difference in the efficacy of the two drugs. Applying the t-test
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
$x_1$   $(x_1 - \bar{x}_1)$   $(x_1 - \bar{x}_1)^2$   $x_2$   $(x_2 - \bar{x}_2)$   $(x_2 - \bar{x}_2)^2$
10      -2                     4                        8      -3                     9
12       0                     0                        9      -2                     4
13      +1                     1                       12      +1                     1
11      -1                     1                       14      +3                     9
14      +2                     4                       15      +4                    16
                                                       10      -1                     1
                                                        9      -2                     4
$\sum x_1 = 60$, $\sum (x_1 - \bar{x}_1)^2 = 10$, $\sum x_2 = 77$, $\sum (x_2 - \bar{x}_2)^2 = 44$
$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{60}{5} = 12; \quad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{77}{7} = 11$$
$$S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{10 + 44}{5 + 7 - 2}} = \sqrt{\frac{54}{10}} = 2.324$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}} = \frac{12 - 11}{2.324} \times \sqrt{\frac{5 \times 7}{5 + 7}} = \frac{1.708}{2.324} = 0.735$$
$v = n_1 + n_2 - 2 = 5 + 7 - 2 = 10$
For $v = 10$, $t_{0.05} = 2.228$.
Since the calculated value of t is less than the table value, the hypothesis is accepted. Hence, there is no significant difference in the efficacy of the two drugs. Since drug B is indigenous and there is no difference in the efficacy of the imported and indigenous drugs, we should buy the indigenous drug B.
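The pooled two-sample t calculation for the drug data can be sketched as below. `pooled_t` is a hypothetical helper written for this illustration; it follows the formula with the combined standard deviation S and n1 + n2 - 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean

drug_a = [10, 12, 13, 11, 14]
drug_b = [8, 9, 12, 14, 15, 10, 9]

def pooled_t(sample1, sample2):
    """Return (t, combined S, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)   # sum of squared deviations
    ss2 = sum((x - m2) ** 2 for x in sample2)
    s = sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * sqrt(n1 * n2 / (n1 + n2))
    return t, s, n1 + n2 - 2

t, s, df = pooled_t(drug_a, drug_b)
print(f"S = {s:.3f}, t = {t:.3f}, d.f. = {df}")
print("significant" if abs(t) > 2.228 else "not significant at the 5% level")
```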
Testing Difference between Means of two samples (Dependent samples or Matched Paired Samples):-
Two samples are said to be dependent when the elements in one sample are related to those in the other in any significant or meaningful manner. In fact the two samples may consist of pairs of observations made on the same objects, individuals or, more generally, on the same selected population elements. The t-test based on the paired observations is defined by the following formula:
$$t = \frac{\bar{d} - 0}{S} \times \sqrt{n} \quad \text{or} \quad t = \frac{\bar{d}\sqrt{n}}{S}$$
where $\bar{d}$ = the mean of the differences
$S$ = the standard deviation of the differences
The value of $S$ is calculated as follows:
$$S = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} \quad \text{or} \quad S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}}$$
It should be noted that $t$ is based on $n - 1$ degrees of freedom.
Example:-
To verify whether a course in accounting improved performance, a similar test was given to 12 participants both before and after the course. The original marks, recorded in alphabetical order of the participants, were 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and 72. After the course, the marks were, in the same order, 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60 and 78. Was the course useful?
Solution:-
Let us take the hypothesis that there is no significant difference in the marks obtained before and after the course, i.e. the course has not been useful.
Applying the t-test (difference formula):
$$t = \frac{\bar{d}\sqrt{n}}{S}$$
Participant   Before (1st Test)   After (2nd Test)   2nd - 1st Test ($d$)   $d^2$
A             44                  53                  +9                      81
B             40                  38                  -2                       4
C             61                  69                  +8                      64
D             52                  57                  +5                      25
E             32                  46                 +14                     196
F             44                  39                  -5                      25
G             70                  73                  +3                       9
H             41                  48                  +7                      49
I             67                  73                  +6                      36
J             72                  74                  +2                       4
K             53                  60                  +7                      49
L             72                  78                  +6                      36
$\sum d = 60$, $\sum d^2 = 578$
$$\bar{d} = \frac{\sum d}{n} = \frac{60}{12} = 5$$
$$S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{578 - 12(5)^2}{12 - 1}} = \sqrt{\frac{278}{11}} = 5.03$$
$$t = \frac{\bar{d}\sqrt{n}}{S} = \frac{5 \times \sqrt{12}}{5.03} = \frac{5 \times 3.464}{5.03} = 3.443$$
$v = n - 1 = 12 - 1 = 11$. For $v = 11$, $t_{0.05} = 2.201$.
The calculated value of t is greater than the tabulated value, so the hypothesis is rejected. Hence the course has been useful.
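The paired calculation can be verified from the raw marks. A minimal sketch; the table value 2.201 is the one quoted in the example.

```python
from math import sqrt
from statistics import mean, stdev

before = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]
after = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]

# Differences are taken as 2nd test minus 1st test for each participant.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
d_bar = mean(differences)
s = stdev(differences)          # standard deviation of the differences (n - 1)
t = d_bar * sqrt(n) / s

print(f"mean difference = {d_bar}, S = {s:.2f}, t = {t:.3f}, d.f. = {n - 1}")
print("significant" if t > 2.201 else "not significant at the 5% level")
```

Without rounding S to 5.03 first, t comes out marginally higher than the hand-computed 3.443; the conclusion is unchanged.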
The F-test or the variance ratio test:-
The F-test is named in honor of the great statistician R.A. Fisher. The object of the F-test is to find out whether two independent estimates of population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance. For carrying out the test of significance, we calculate the ratio F. F is defined as
$$F = \frac{S_1^2}{S_2^2},$$
where $S_1^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}$ and $S_2^2 = \dfrac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$
It should be noted that $S_1^2$ is always the larger estimate of variance, i.e. $S_1^2 > S_2^2$.
$$F = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$
$v_1 = n_1 - 1$ and $v_2 = n_2 - 1$
$v_1$ = degrees of freedom of the sample having the larger variance
$v_2$ = degrees of freedom of the sample having the smaller variance
The calculated value of F is compared with the tabulated value for $v_1$ and $v_2$ at the 5% or 1% level of significance. If the calculated value of F is greater than the tabulated value, the F ratio is considered significant and the null hypothesis is rejected. On the other hand, if the calculated value of F is less than the tabulated value, the null hypothesis is accepted and it is inferred that both samples have come from populations having the same variance.
Since the F-test is based on the ratio of two variances, it is also called the Variance Ratio Test.
Example:-
Two random samples were drawn from two normal populations and their values are
A   66  67  75  76  82  84  88  90  92
B   64  66  74  78  82  85  87  92  93  95  97
Test whether the two populations have the same variance at the 5% level of significance ($F = 3.36$ at the 5% level for $v_1 = 10$ and $v_2 = 8$).
Solution:-
Let us take the hypothesis that the two populations have the same variance. Applying the F-test
$$F = \frac{S_1^2}{S_2^2}$$
where $S_1^2$ is the larger of the two variance estimates.
A ($X_A$)   $(X_A - \bar{X}_A) = x_A$   $x_A^2$   B ($X_B$)   $(X_B - \bar{X}_B) = x_B$   $x_B^2$
66          -14                          196       64          -19                          361
67          -13                          169       66          -17                          289
75           -5                           25       74           -9                           81
76           -4                           16       78           -5                           25
82           +2                            4       82           -1                            1
84           +4                           16       85           +2                            4
88           +8                           64       87           +4                           16
90          +10                          100       92           +9                           81
92          +12                          144       93          +10                          100
                                                   95          +12                          144
                                                   97          +14                          196
$\sum X_A = 720$, $\sum x_A = 0$, $\sum x_A^2 = 734$, $\sum X_B = 913$, $\sum x_B = 0$, $\sum x_B^2 = 1298$
$$\bar{X}_A = \frac{\sum X_A}{n_A} = \frac{720}{9} = 80; \quad \bar{X}_B = \frac{\sum X_B}{n_B} = \frac{913}{11} = 83$$
$$S_A^2 = \frac{\sum x_A^2}{n_A - 1} = \frac{734}{9 - 1} = 91.75$$
$$S_B^2 = \frac{\sum x_B^2}{n_B - 1} = \frac{1298}{11 - 1} = 129.8$$
Sample B has the larger variance, so $S_1^2 = S_B^2$ with $v_1 = 11 - 1 = 10$, and $S_2^2 = S_A^2$ with $v_2 = 9 - 1 = 8$.
$$F = \frac{S_1^2}{S_2^2} = \frac{129.8}{91.75} = 1.415$$
For $v_1 = 10$ and $v_2 = 8$, $F_{0.05} = 3.36$.
The calculated value of F is less than the tabulated value, so the hypothesis is accepted. Hence it may be concluded that the two populations have the same variance.
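The variance ratio for this example can be checked as follows. The sketch places the larger variance estimate in the numerator, as the definition of F requires; the cut-off 3.36 is the table value quoted in the example.

```python
from statistics import variance

sample_a = [66, 67, 75, 76, 82, 84, 88, 90, 92]
sample_b = [64, 66, 74, 78, 82, 85, 87, 92, 93, 95, 97]

var_a = variance(sample_a)   # unbiased estimate, divisor n - 1
var_b = variance(sample_b)

# F is always larger estimate / smaller estimate.
larger, smaller = max(var_a, var_b), min(var_a, var_b)
f_ratio = larger / smaller

print(f"variance A = {var_a:.2f}, variance B = {var_b:.2f}, F = {f_ratio:.3f}")
print("variances differ" if f_ratio > 3.36 else "no significant difference in variance")
```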
Chi-Square Test:-
The $\chi^2$ test (pronounced chi-square test) is one of the simplest and most widely used non-parametric tests in statistical work. The symbol $\chi$ is the Greek letter chi. The $\chi^2$ test was first used by Karl Pearson in the year 1900. The quantity $\chi^2$ describes the magnitude of the discrepancy between theory and observation. It is defined as:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ refers to the observed frequencies and $E$ refers to the expected frequencies.
Example:- In an antimalarial campaign in a certain area, quinine was administered to 812 persons out of a total population of 3248. The number of fever cases is shown below:
Treatment     Fever   No fever   Total
Quinine        20      792        812
No quinine    220     2216       2436
Total         240     3008       3248
Discuss the usefulness of quinine in checking malaria.
Solution:- Let us take the hypothesis that quinine is not effective in checking malaria.
Applying the $\chi^2$ test:
$$E_{11} = \text{Expectation of } (AB) = \frac{(A) \times (B)}{N} = \frac{240 \times 812}{3248} = 60$$
The expected frequency corresponding to the first row and first column is 60.
$$E_{12} = \frac{3008 \times 812}{3248} = 752$$
$$E_{21} = \frac{240 \times 2436}{3248} = 180$$
$$E_{22} = \frac{3008 \times 2436}{3248} = 2256$$
The table of expected frequencies is:
 60     752     812
180    2256    2436
240    3008    3248
$O$     $E$     $(O - E)^2$   $(O - E)^2/E$
20      60      1600          26.667
220     180     1600           8.889
792     752     1600           2.128
2216    2256    1600           0.709
$\sum (O - E)^2/E = 38.393$
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 38.393$$
$v = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$
For $v = 1$, $\chi^2_{0.05} = 3.84$.
The calculated value of $\chi^2$ is greater than the tabulated value, so the hypothesis is rejected. Hence quinine is useful in checking malaria.
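The expected frequencies and the statistic for the quinine table can be computed directly from the observed counts. A minimal sketch with illustrative variable names; the cut-off 3.84 is the table value quoted in the example.

```python
observed = [[20, 792],      # quinine:    fever, no fever
            [220, 2216]]    # no quinine: fever, no fever

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected frequency of each cell = row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected:", expected)
print(f"chi-square = {chi_square:.3f}, d.f. = {df}")
print("reject H0" if chi_square > 3.84 else "do not reject H0")
```

Summing the unrounded terms gives a value that differs from the hand-computed 38.393 only in the third decimal place.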
Yates Correction
The Yates correction is a correction made to account for the fact that both Pearson's chi-square test and McNemar's chi-square test are biased upwards for a 2 x 2 contingency table. An upwards bias gives a larger result than it should be, so the Yates correction is usually recommended, especially if the expected cell frequency is below 5.
Calculating the Yates Correction
In the Yates correction, 0.5 is subtracted from the absolute difference between the observed frequencies and expected frequencies. It is just the chi-square formula with the 0.5 subtraction:
$$\chi^2_{\text{Yates}} = \sum \frac{(|O - E| - 0.5)^2}{E}$$
Arguments for why the Yates Correction should not be used
Although some people recommend that you should use the correction only if your expected cell frequency is below 5, others recommend that you don't use it at all. A large body of research has found that the correction is too strict. Several researchers, including Yates, have used known statistical data to test whether the correction works. If you are using a statistical program like SPSS to calculate the chi-square value for a contingency table, the program will usually force you to incorporate the correction. However, knowing that the correction may be too strict allows you to make a judgment call on your data.
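To see how much the correction changes a result, the Yates-corrected statistic can be computed for the quinine data from the earlier example. This is only an illustrative sketch; `chi_square` is a hypothetical helper, and the expected frequencies are those worked out in that example.

```python
observed = [20, 792, 220, 2216]
expected = [60, 752, 180, 2256]   # expected frequencies from the quinine example

def chi_square(obs, exp, yates=False):
    """Pearson chi-square, optionally subtracting 0.5 from each |O - E|."""
    correction = 0.5 if yates else 0.0
    return sum((abs(o - e) - correction) ** 2 / e for o, e in zip(obs, exp))

plain = chi_square(observed, expected)
corrected = chi_square(observed, expected, yates=True)

print(f"uncorrected chi-square = {plain:.3f}")
print(f"Yates-corrected chi-square = {corrected:.3f}")
```

With frequencies this large the correction lowers the statistic only slightly and leaves the conclusion unchanged; it matters most when expected frequencies are small.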