Small Sampling Theory Presentation1
Small Sampling Theory
• Small sample theory: The study of statistical inference with small samples (i.e. n ≤ 30). It includes the t-distribution and the F-distribution, both of which are defined in terms of the “number of degrees of freedom”.
• Degrees of freedom ν: The number of useful items of information generated by a sample of a given size with respect to the estimation of a given population parameter,
OR the total number of observations minus the number of independent constraints imposed on the observations:
n = no. of observations, k = no. of independent constraints, so n – k = no. of degrees of freedom.
Example: X = A + B + C with 10 = 2 + 3 + C, so C = 5. Here n = 4 and k = 3, so n – k = 1, i.e. 1 degree of freedom.
Introduction
t - Distribution
• William Sealy Gosset published the t-distribution in 1908 in Biometrika under the pen name “Student”.
• When the sample size is larger than 30, the sampling distribution of the mean approximately follows the normal distribution.
• If the sample size is less than 30 (and σ is unknown), the sample statistic follows the t-distribution.
• Probability density function of the t-distribution:
$$f(t) = Y_0 \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2}$$
where Y0 is a constant depending on n such that the area under the curve is 1.
• The t-table gives the probability integral of the t-distribution.
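As a numerical check on the density, here is a minimal Python sketch. It assumes the standard closed form for the constant, Y0 = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)), which the slides do not state; the function names `t_pdf` and `area_under_curve` are illustrative only.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    # Y0 (assumed closed form) makes the total area under the curve equal 1
    y0 = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return y0 * (1 + t * t / nu) ** (-(nu + 1) / 2)

def area_under_curve(nu, lo=-100.0, hi=100.0, steps=100_000):
    """Trapezoidal approximation of the area under t_pdf on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (t_pdf(lo, nu) + t_pdf(hi, nu))
    for i in range(1, steps):
        total += t_pdf(lo + i * h, nu)
    return total * h
```

For ν = 5 the approximated area comes out very close to 1, and the density is symmetric about t = 0, as the properties of the distribution require.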
Properties of t-Distribution
• Ranges from –∞ to ∞.
• Bell-shaped and symmetrical around mean zero.
• Its shape changes as the no. of degrees of freedom changes. Hence ν is a parameter of the t-distribution.
• Variance is always greater than one and is defined only when ν ≥ 3, given as
$$\mathrm{Var}(t) = \frac{\nu}{\nu - 2}$$
• It is less peaked at the centre and higher in the tails than the normal distribution (strictly, these heavy tails make it leptokurtic rather than platykurtic).
• It has greater dispersion than the normal distribution. As n gets larger, the t-distribution approaches the normal form.
Steps involved in testing of hypothesis.
1. Establish a null hypothesis.
2. Suggest an alternate hypothesis.
3. Calculate the t value.
4. Find the degrees of freedom.
5. Set up a suitable significance level.
6. From the t-table, find the critical value of t using α (risk of type 1 error, significance level) and ν (degrees of freedom).
7. If the absolute value of the calculated t is less than the critical value obtained from the table, the null hypothesis is not rejected (accepted). Otherwise it is rejected in favour of the alternate hypothesis.
Applications of t - distribution
1. Test of Hypothesis about the population mean.
2. Test of Hypothesis about the difference between two means.
3. Test of Hypothesis about the difference between two means with dependent samples.
4. Test of Hypothesis about coefficient of correlation.
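1. Test of Hypothesis about the population mean (σ unknown and small sample size)
• Null hypothesis: H0: μ = μ0, where μ0 is the hypothesised value of the population mean.
• t value is given as:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
• Standard deviation of sample is given as:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
• Degrees of freedom = n – 1
• Calculate table value at specified significance level & d.f.
• If calculated value is more than table value then null hypothesis is rejected.
• 100(1-α)% Confidence interval for population mean (with ν = n – 1):
$$\bar{x} - t_{\alpha/2,\,\nu}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,\nu}\,\frac{s}{\sqrt{n}}$$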
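The one-sample procedure can be sketched in Python as follows. This is only an illustration: the function name `one_sample_t` is hypothetical, and the critical value `t_crit` is assumed to be looked up by the reader in a t-table for α/2 and n – 1 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu0, t_crit):
    """One-sample t test of H0: mu = mu0, with sigma unknown.

    t_crit is the two-tailed table value t(alpha/2, n - 1).
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample s.d., divides by n - 1
    se = s / math.sqrt(n)               # standard error of the mean
    t = (x_bar - mu0) / se
    ci = (x_bar - t_crit * se, x_bar + t_crit * se)
    return {"t": t, "df": n - 1, "reject_h0": abs(t) > t_crit, "ci": ci}
```

With the made-up sample 12, 11, 13, 12, 14, 10, 13, 12, 11, 12, μ0 = 11 and the table value 2.262 (α = 0.05, 9 d.f.), the calculated t is about 2.74, so the null hypothesis is rejected; the 95% interval is roughly 11.17 to 12.83.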
Test of hypothesis about the difference between two means
When population variances are unknown, the t-test can be used in two ways:
(a) When variances are equal.
(b) When variances are not equal.
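(a) Case of equal variances
• Null hypothesis: μ1 = µ2
• t value is given as:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where,
$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
and
$$s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$
• Degrees of freedom: n1 + n2 – 2
• Calculate table value at specified significance level & d.f.
• If calculated value is more than table value then null hypothesis is rejected.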
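A minimal Python sketch of the equal-variance (pooled) calculation is given here for illustration. The function name `pooled_t` is hypothetical, and the comparison with the table value is left to the reader.

```python
import math
import statistics

def pooled_t(x1, x2):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)     # unbiased, divides by n1 - 1
    s2_sq = statistics.variance(x2)     # unbiased, divides by n2 - 1
    # pooled standard deviation s
    s = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, s
```

For the made-up samples 20, 22, 24, 26, 28 and 15, 17, 19, 21, 23, both sample variances are 10, the pooled s is √10 and t = 2.5 with 8 d.f. The table value at α = 0.05 (two-tailed) is 2.306, so the null hypothesis would be rejected.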
(b) Case of unequal variances
• When population variances are not equal, we use the unbiased estimators s1² and s2² to replace σ1² and σ2².
• Here, the sampling distribution has larger variability than the population variability.
• t value:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• Degrees of freedom:
$$\text{d.f.} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$$
• Calculate table value at specified significance level & d.f.
• If calculated value is more than table value then null hypothesis is rejected.
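The unequal-variance statistic and its degrees of freedom can be sketched as below. The function name `welch_t` is an assumption for illustration, and the statistic is computed under the null hypothesis μ1 – μ2 = 0. The resulting d.f. is generally not a whole number, so it is usually rounded down before the table is consulted.

```python
import math
import statistics

def welch_t(x1, x2):
    """t statistic and approximate d.f. when variances are not assumed equal."""
    n1, n2 = len(x1), len(x2)
    a = statistics.variance(x1) / n1    # s1^2 / n1
    b = statistics.variance(x2) / n2    # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df
```

When the two samples have the same size and the same sample variance, this d.f. formula reduces to n1 + n2 – 2, the same value as in the equal-variance case.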
Confidence interval for the difference between two means
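Two samples of sizes n1 and n2 are randomly and independently drawn from two normally distributed populations with unknown but equal variances. The 100(1-α)% confidence interval for µ1 - µ2 is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1 + n_2 - 2}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where s is the pooled standard deviation of the two samples.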
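As a small worked illustration, the interval can be computed from summary statistics. The function name `pooled_ci` is hypothetical; `s` is assumed to be the pooled standard deviation and `t_crit` the table value for α/2 and n1 + n2 – 2 degrees of freedom.

```python
import math

def pooled_ci(x1_bar, x2_bar, s, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 under equal variances."""
    half_width = t_crit * s * math.sqrt(1 / n1 + 1 / n2)
    diff = x1_bar - x2_bar
    return diff - half_width, diff + half_width
```

With made-up values x̄1 = 24, x̄2 = 19, s = √10, n1 = n2 = 5 and t = 2.306 (α = 0.05, 8 d.f.), the interval is about 0.39 to 9.61. It does not contain zero, which agrees with rejecting μ1 = μ2 at the 5% level.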
(3) Test of hypothesis about the difference between two means with dependent samples (paired t-test)
• Samples are dependent: each observation in one sample is associated with some particular observation in the second sample.
• Observations in the two samples should be collected in the form called matched pairs.
• The two samples should have the same number of units.
• Instead of 2 samples we can get one random sample of pairs, and the two measurements associated with a pair will be related to each other. Example: in before-and-after type experiments, or when observations are matched by size or some other criterion.
• Null hypothesis: μ1 = µ2
• t value is given as:
$$t = \frac{\bar{d}\,\sqrt{n}}{s}$$
where, mean of differences,
$$\bar{d} = \frac{\sum d}{n}$$
standard deviation of differences,
$$s = \sqrt{\frac{\sum d^2}{n - 1} - \frac{n\,\bar{d}^2}{n - 1}}$$
• Degrees of freedom = n – 1
• Calculate table value at specified significance level & d.f.
• If calculated value is more than table value then null hypothesis is rejected.
• Confidence interval for the mean of the difference:
$$\bar{d} \pm t_{\alpha/2,\,n - 1}\,\frac{s}{\sqrt{n}}$$
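The paired calculation reduces to a one-sample test on the differences, which the following sketch makes explicit. The function name `paired_t` and the before/after argument names are illustrative assumptions.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic computed from matched before/after measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same number of units")
    d = [a - b for a, b in zip(after, before)]   # differences within each pair
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)                    # divides by n - 1
    t = d_bar * math.sqrt(n) / s_d
    return t, n - 1, d_bar, s_d
```

For the made-up pairs before = 70, 65, 80, 75, 60 and after = 74, 68, 81, 80, 62, the differences are 4, 3, 1, 5, 2, giving a mean difference of 3 and t of about 4.24 with 4 d.f. This exceeds the table value 2.776 at α = 0.05.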
(4) Testing of hypothesis about coefficient of correlation.
• Case 1: testing the hypothesis when the population coefficient of correlation equals zero, i.e., Ho : ρ=0
• Case 2: testing the hypothesis when the population coefficient of correlation equals some other value than zero, i.e., Ho: ρ= ρo
• Case 3: testing the hypothesis for the difference between two independent correlation coefficients.
Case 1: testing the hypothesis when the population coefficient of correlation equals zero, i.e., Ho : ρ=0
• Null hypothesis: there is no correlation in population, i.e., Ho: ρ=0
• t value is given as:
$$t = \frac{r}{\sqrt{1 - r^2}}\,\sqrt{n - 2}$$
• Degrees of freedom: n-2
• Calculate table value at specified significance level & d.f.
• If calculated value is more than table value then null hypothesis is rejected, i.e. there is a linear relationship between the variables.
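A short sketch of this statistic, with a hypothetical function name `corr_t`, taking the sample correlation r and the sample size n as inputs:

```python
import math

def corr_t(r, n):
    """t statistic and d.f. for testing H0: rho = 0."""
    if not -1 < r < 1 or n <= 2:
        raise ValueError("need -1 < r < 1 and n > 2")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t, n - 2
```

For example, r = 0.6 from n = 18 pairs gives t = 3.0 with 16 d.f. The two-tailed table value at α = 0.05 is 2.120, so the hypothesis of no correlation would be rejected.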
Case 2: testing the hypothesis when the population coefficient of correlation equals some other value than zero, i.e., Ho: ρ= ρo
• When ρ≠0, a test based on the t-distribution will not be appropriate, but Fisher’s z-transformation will be applicable:
$$z = \tfrac{1}{2}\log_e \frac{1 + r}{1 - r} = 1.1513\,\log_{10} \frac{1 + r}{1 - r}$$
• z is approximately normally distributed with mean
$$z_\rho = \tfrac{1}{2}\log_e \frac{1 + \rho}{1 - \rho}$$
• Standard deviation: σz = 1/√(n-3)
• This test is more applicable if the sample size is large (at least 10).
• Null hypothesis: Ho: ρ= ρo
• Test statistic:
$$Z = \frac{z - z_\rho}{\sigma_z}$$
• which follows approximately the standard normal distribution.
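The transformation and the test statistic can be sketched as follows. The function names `fisher_z` and `rho_z_statistic` are illustrative, not taken from the slides.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation, 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def rho_z_statistic(r, n, rho0):
    """Approximate standard normal statistic for H0: rho = rho0."""
    sigma_z = 1 / math.sqrt(n - 3)
    return (fisher_z(r) - fisher_z(rho0)) / sigma_z
```

With r = 0.5, n = 28 and ρo = 0.3 the statistic is about 1.20. This is below 1.96, so the hypothesis ρ = 0.3 would not be rejected at the 5% level. The same value of z for r = 0.5 (about 0.5493) is obtained from 1.1513 log10(3), the base-10 form of the transformation.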
Case 3: testing the hypothesis for the difference between two independent correlation coefficients
• To test the hypothesis of 2 correlation coefficients derived from two separate samples, compare the difference of the 2 corresponding values of z with the standard error of that difference.
• Formula used:
$$Z = \frac{z_1 - z_2}{\sigma_{z_1 - z_2}} = \frac{z_1 - z_2}{\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}}$$
where
$$z_1 = \tfrac{1}{2}\log_e \frac{1 + r_1}{1 - r_1} = 1.1513\,\log_{10} \frac{1 + r_1}{1 - r_1}$$
$$z_2 = \tfrac{1}{2}\log_e \frac{1 + r_2}{1 - r_2} = 1.1513\,\log_{10} \frac{1 + r_2}{1 - r_2}$$
• If the absolute value of this statistic is greater than 1.96, the difference will be significant at 5% significance level.
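A minimal sketch of the comparison, using the hypothetical name `compare_correlations`. It relies on the fact that Python's `math.atanh(r)` equals 0.5 ln((1 + r)/(1 – r)), i.e. Fisher's z.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Z statistic for the difference between two independent correlations."""
    z1 = math.atanh(r1)                 # Fisher's z for the first sample
    z2 = math.atanh(r2)                 # Fisher's z for the second sample
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se
```

For made-up values r1 = 0.8 and r2 = 0.5, each from 53 pairs, the standard error is 0.2 and the statistic is about 2.75. This is greater than 1.96, so the difference would be significant at the 5% level.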
The F - Distribution
• Named in honour of R.A. Fisher who studied it in 1924.
• It is defined in terms of the ratio of the variances of two normally distributed populations. So, it is sometimes also called the variance ratio distribution.
• F – distribution:
$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$
where,
$$s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$
s1², s2² are unbiased estimators of σ1², σ2² resp.
• Degrees of freedom: ν1 = n1 – 1, ν2 = n2 – 1
• If σ1² = σ2², then F = s1²/s2²
• It depends on ν1 and ν2 for numerator and denominator resp., so ν1 and ν2 are parameters of the F distribution.
• For different values of ν1 and ν2 we will get different distributions.
Probability density function
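• Probability density function of F-distribution:
$$f(F) = Y_0\,\frac{F^{(\nu_1/2) - 1}}{\left(1 + \frac{\nu_1 F}{\nu_2}\right)^{(\nu_1 + \nu_2)/2}}$$
where Y0 is a constant depending on ν1 and ν2 such that the area under the curve is 1.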
Properties of F-distribution
• It is positively skewed and its skewness decreases with increase in ν1 and ν2.
• Value of F must always be positive or zero, since variances are squares. So its value lies between 0 and ∞.
• Mean and variance of F-distribution:
Mean = ν2/(ν2 – 2), for ν2 > 2
Variance = 2ν2²(ν1 + ν2 – 2) / [ν1(ν2 – 2)²(ν2 – 4)], for ν2 > 4
• Shape of F-distribution depends upon the number of degrees of freedom.
• The areas in the left-hand side of the distribution can be found by taking the reciprocal of the F values corresponding to the right-hand side, when the numbers of degrees of freedom in the numerator and in the denominator are interchanged. It is known as the reciprocal property:
F1-α,ν1,ν2 = 1/Fα,ν2,ν1
So we can find lower-tail F values from the corresponding upper-tail F values, which are given in the appendix.
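The mean and variance formulas, written out as mean = ν2/(ν2 – 2) and variance = 2ν2²(ν1 + ν2 – 2)/[ν1(ν2 – 2)²(ν2 – 4)], can be evaluated with a small helper. The name `f_mean_variance` is hypothetical, and returning `None` where a moment is undefined is a design choice made only for this sketch.

```python
def f_mean_variance(v1, v2):
    """Mean and variance of the F-distribution with (v1, v2) d.f.

    A moment that is not defined for the given v2 is returned as None.
    """
    mean = v2 / (v2 - 2) if v2 > 2 else None
    if v2 > 4:
        variance = 2 * v2 ** 2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))
    else:
        variance = None
    return mean, variance
```

For ν1 = 5 and ν2 = 10 this gives a mean of 1.25 and a variance of about 1.354. The mean depends only on the denominator degrees of freedom and approaches 1 as ν2 grows.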
Testing of hypothesis for equality of two variances
It is based on the variances in two independently selected random samples drawn from two normal populations.
• Null hypothesis Ho: σ1² = σ2²
• Test statistic:
$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}, \quad \text{which under Ho reduces to} \quad F = \frac{s_1^2}{s_2^2}$$
• Place the larger sample variance in the numerator.
• Degrees of freedom ν1 and ν2.
• Find table value using ν1 and ν2.
• If calculated F value exceeds table F value, null hypothesis is rejected.
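The variance-ratio calculation can be sketched as below, with a hypothetical function name `variance_ratio_f`. The sketch places the larger sample variance in the numerator and returns the degrees of freedom in the matching order; the table lookup is left to the reader.

```python
import statistics

def variance_ratio_f(x1, x2):
    """F statistic with the larger sample variance in the numerator.

    Returns (F, numerator d.f., denominator d.f.).
    """
    s1_sq = statistics.variance(x1)     # unbiased, divides by n1 - 1
    s2_sq = statistics.variance(x2)     # unbiased, divides by n2 - 1
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, len(x1) - 1, len(x2) - 1
    return s2_sq / s1_sq, len(x2) - 1, len(x1) - 1
```

For the made-up samples 19, 20, 21, 22, 23 (variance 2.5) and 20, 22, 24, 26, 28, 30 (variance 14), F = 5.6 with (5, 4) d.f. The upper 5% table value for (5, 4) d.f. is 6.26, so the null hypothesis of equal variances would not be rejected.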
Confidence interval for the ratio of two variances
• 100(1-α)% confidence interval for the ratio of the variances of two normally distributed populations is given by:
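$$\frac{s_1^2 / s_2^2}{F_{\alpha/2,\,\nu_1,\,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2 / s_2^2}{F_{1 - \alpha/2,\,\nu_1,\,\nu_2}}$$
where F with subscript p, ν1, ν2 denotes the value exceeded with probability p (upper-tail area p), so that F(1-α/2) is the lower-tail value and F(α/2) is the upper-tail value.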
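A minimal sketch of the interval calculation, under a stated assumption about notation: both F values passed in are upper-tail table values with area α/2, one for (ν1, ν2) d.f. and one for (ν2, ν1) d.f. The lower-tail value is then obtained through the reciprocal property, 1/F(α/2, ν2, ν1). The function and parameter names are hypothetical.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper_12, f_upper_21):
    """Confidence interval for sigma1^2 / sigma2^2.

    f_upper_12: upper-tail table value F(alpha/2) with (v1, v2) d.f.
    f_upper_21: upper-tail table value F(alpha/2) with (v2, v1) d.f.
    """
    ratio = s1_sq / s2_sq
    lower = ratio / f_upper_12
    upper = ratio * f_upper_21          # same as ratio / F(1 - alpha/2, v1, v2)
    return lower, upper
```

With made-up values s1² = 10, s2² = 2.5 and ν1 = ν2 = 4, the upper 2.5% table value is 9.60 for both orderings, giving a 95% interval of about 0.42 to 38.4 for the ratio of variances. The interval contains 1, so equality of the two variances is not ruled out.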