Small Sampling Theory Presentation1

Small Sampling Theory

• Small sample theory: The study of statistical inference with small samples (i.e. n ≤ 30). It includes the t-distribution and the F-distribution, both of which are defined in terms of the "number of degrees of freedom".

• Degrees of freedom ν: Number of useful items of information generated by a sample of given size with respect to the estimation of a given population parameter.

OR: the total number of observations minus the number of independent constraints imposed on the observations.

If n is the number of observations and k is the number of independent constraints, then n - k is the number of degrees of freedom.

Example: X = A + B + C. With 10 = 2 + 3 + C, C must equal 5. Here n = 4 and k = 3, so n - k = 1, i.e. 1 degree of freedom.

Introduction

t - Distribution

• William Sealy Gosset published the t-distribution in 1908 in Biometrika under the pen name "Student".

• When the sample size is larger than 30, the sampling distribution of the mean is approximately normal.

• If the sample size is less than 30 and the population standard deviation is unknown, the sample statistic follows the t-distribution (assuming the population is normally distributed).

• Probability density function of the t-distribution:

$$f(t) = Y_0 \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$

where Y_0 is a constant depending on n (through ν) such that the area under the curve is 1.

• The t-table gives the probability integral of the t-distribution.
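The density f(t) = Y_0 (1 + t²/ν)^(-(ν+1)/2) can be checked numerically. The sketch below is only an illustration: it assumes the standard closed form of Y_0 in terms of gamma functions (not stated on the slide), and the names `t_pdf` and `area_under_curve` are made up for this example.

```python
import math

def t_pdf(t, nu):
    """Density f(t) = Y0 * (1 + t^2/nu)^(-(nu + 1)/2) for nu degrees of freedom."""
    # Assumption: Y0 written with gamma functions (standard closed form).
    y0 = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return y0 * (1 + t * t / nu) ** (-(nu + 1) / 2)

def area_under_curve(nu, lo=-50.0, hi=50.0, steps=100_000):
    """Trapezoidal approximation of the area under f(t) between lo and hi."""
    h = (hi - lo) / steps
    total = 0.5 * (t_pdf(lo, nu) + t_pdf(hi, nu))
    for i in range(1, steps):
        total += t_pdf(lo + i * h, nu)
    return total * h

# With 5 degrees of freedom the area should come out very close to 1.
area_5 = area_under_curve(5)
```

The integration limits are cut off at ±50, so the computed area is slightly below 1; the missing tail mass is negligible for ν = 5.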

Properties of t-Distribution

• Ranges from -∞ to ∞.

• Bell-shaped and symmetrical around mean zero.

• Its shape changes as the number of degrees of freedom changes. Hence ν is a parameter of the t-distribution.

• Variance is always greater than one and is defined only when ν ≥ 3, given as

$$\mathrm{Var}(t) = \frac{\nu}{\nu - 2}$$

• It is less peaked at the centre and higher in the tails than the normal distribution.

• It has greater dispersion than the normal distribution. As n gets larger, the t-distribution approaches the normal form.
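A small sketch of the variance formula Var(t) = ν/(ν - 2), showing how the dispersion shrinks towards 1 (the variance of the standard normal) as ν grows. The function name `t_variance` and the chosen values of ν are illustrative only.

```python
def t_variance(nu):
    """Variance of the t-distribution, nu / (nu - 2), for nu >= 3."""
    if nu < 3:
        raise ValueError("the formula is used here only for nu >= 3")
    return nu / (nu - 2)

# Variance for a few degrees of freedom: always above 1, decreasing towards 1.
variances = {nu: t_variance(nu) for nu in (3, 5, 10, 30, 100)}
```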

Steps involved in testing of hypothesis.

1. Establish a null hypothesis.

2. Suggest an alternate hypothesis.

3. Calculate the t value.

4. Find the degrees of freedom.

5. Set up a suitable significance level.

6. From the t-table, find the critical value of t using α (the risk of a type I error, i.e. the significance level) and ν (the degrees of freedom).

7. If the calculated t value (in absolute value, for a two-tailed test) is less than the critical value obtained from the table, the null hypothesis is not rejected. Otherwise it is rejected in favour of the alternate hypothesis.

Applications of t - distribution

1. Test of Hypothesis about the population mean.

2. Test of Hypothesis about the difference between two means.

3. Test of Hypothesis about the difference between two means with dependent samples.

4. Test of Hypothesis about coefficient of correlation.

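1. Test of Hypothesis about the population mean (σ unknown and small sample size)

• Null hypothesis: $H_0: \mu = \mu_0$, where $\mu_0$ is the hypothesised value of the population mean.

• t value is given as:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

• Standard deviation of the sample is given as:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

• Degrees of freedom = n - 1

• Calculate the table value at the specified significance level and d.f.

• If the calculated value is more than the table value, the null hypothesis is rejected.

• 100(1-α)% confidence interval for the population mean:

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$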
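The one-sample procedure can be sketched in a few lines. This is a minimal illustration, not a full implementation: the data are made up, the function name `one_sample_t` is hypothetical, and the critical value is passed in by hand as it would be read from a t-table (2.776 for 4 d.f. at the two-tailed 5% level).

```python
import math
import statistics

def one_sample_t(sample, mu0, t_crit):
    """Return (t, d.f., confidence interval, reject H0?) for H0: mu = mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample s.d. with divisor n - 1
    se = s / math.sqrt(n)                 # standard error of the mean
    t = (xbar - mu0) / se
    ci = (xbar - t_crit * se, xbar + t_crit * se)
    return t, n - 1, ci, abs(t) > t_crit

# Hypothetical data: five observations, testing H0: mu = 12.
t, df, ci, reject = one_sample_t([12, 14, 11, 13, 15], mu0=12, t_crit=2.776)
```

Here the sample mean is 13 and s² = 2.5, so t = 1/√0.5 ≈ 1.414, which is below 2.776; the null hypothesis is not rejected and the interval contains 12.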

Test of hypothesis about the difference between two means

When the population variances are unknown, the t-test takes one of two forms:

(a) When the variances are equal.

(b) When the variances are not equal.

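(a) Case of equal variances

• Null hypothesis: μ1 = μ2

• t value is given as:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where

$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

and

$$s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$

• Degrees of freedom: n1 + n2 - 2

• Calculate the table value at the specified significance level and d.f.

• If the calculated value is more than the table value, the null hypothesis is rejected.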
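A minimal sketch of the equal-variance (pooled) t statistic, using made-up samples; the function name `pooled_t` is hypothetical. The critical value still has to be looked up in a t-table for n1 + n2 - 2 degrees of freedom.

```python
import math
import statistics

def pooled_t(x1, x2):
    """Return (t, d.f., pooled s) for H0: mu1 = mu2 with equal variances."""
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)       # unbiased, divisor n1 - 1
    s2_sq = statistics.variance(x2)       # unbiased, divisor n2 - 1
    # Pooled standard deviation s.
    s = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, s

# Hypothetical samples: means 12 and 10, variances 4 and 20/3.
t, df, s = pooled_t([10, 12, 14], [7, 9, 11, 13])
```

For these samples the pooled variance is (2·4 + 3·20/3)/5 = 5.6 and t ≈ 1.107 with 5 degrees of freedom.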

(b) Case of unequal variances

• When the population variances are not equal, we use the unbiased estimators $s_1^2$ and $s_2^2$ to replace $\sigma_1^2$ and $\sigma_2^2$.

• Here, the sampling distribution has larger variability than the population variability.

• t value:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

• Degrees of freedom:

$$\text{d.f.} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

• Calculate the table value at the specified significance level and d.f.

• If the calculated value is more than the table value, the null hypothesis is rejected.
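The unequal-variance statistic and its degrees of freedom can be sketched as follows. The samples are made up and the function name `unequal_variance_t` is hypothetical; the d.f. is usually not a whole number, so in practice it is rounded before the table lookup.

```python
import math
import statistics

def unequal_variance_t(x1, x2):
    """Return (t, d.f.) for H0: mu1 = mu2 without assuming equal variances."""
    n1, n2 = len(x1), len(x2)
    a = statistics.variance(x1) / n1      # s1^2 / n1
    b = statistics.variance(x2) / n2      # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical samples: s1^2/n1 = 4/3 and s2^2/n2 = 5/3.
t, df = unequal_variance_t([10, 12, 14], [7, 9, 11, 13])
```

For these samples t = 2/√3 ≈ 1.155 and d.f. = 243/49 ≈ 4.96.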

Confidence interval for the difference between two means

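Two samples of sizes n1 and n2 are randomly and independently drawn from two normally distributed populations with unknown but equal variances. The 100(1-α)% confidence interval for μ1 - μ2 is given by:

$$(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2,\; n_1 + n_2 - 2}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where s is the pooled standard deviation of the two samples.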
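A sketch of this interval computed from summary statistics. The numbers are hypothetical, the function name `mean_difference_ci` is made up, and the critical value 2.571 is the two-tailed 5% t-table entry for 5 degrees of freedom, supplied by hand.

```python
import math

def mean_difference_ci(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2, assuming equal population variances."""
    # Pooled standard deviation of the two samples.
    s = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    half_width = t_crit * s * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width

# Hypothetical summary statistics: n1 = 3, n2 = 4, so d.f. = 5.
lo, hi = mean_difference_ci(12.0, 10.0, 4.0, 20 / 3, 3, 4, t_crit=2.571)
```

The interval is centred on the observed difference of 2 and, for these numbers, includes zero, which agrees with not rejecting μ1 = μ2 at the 5% level.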

(3) Test of hypothesis about the difference between two means with dependent samples (paired t-test)

• Samples are dependent: each observation in one sample is associated with some particular observation in the second sample.

• Observations in the two samples should be collected in the form called matched pairs.

• The two samples should have the same number of units.

• Instead of 2 samples we can get one random sample of pairs, and the two measurements associated with a pair will be related to each other. Example: in before and after type experiments or when observations are matched by rise or some other criterion.

• Null hypothesis: μ1 = μ2

• t value is given as:

$$t = \frac{\bar{d}\sqrt{n}}{s}$$

where the mean of differences is

$$\bar{d} = \frac{\sum d}{n}$$

and the standard deviation of differences is

$$s = \sqrt{\frac{\sum d^2}{n - 1} - \frac{n\,\bar{d}^2}{n - 1}}$$

• Degrees of freedom = n - 1

• Calculate the table value at the specified significance level and d.f.

• If the calculated value is more than the table value, the null hypothesis is rejected.

• Confidence interval for the mean of the difference:

$$\bar{d} \;\pm\; t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
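A minimal sketch of the paired t-test on made-up before/after measurements. The function name `paired_t` is hypothetical; the standard deviation of the differences is computed with divisor n - 1, which equals the expression √(Σd²/(n-1) - n·d̄²/(n-1)).

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, d.f., mean difference, s.d. of differences) for matched pairs."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same number of units")
    d = [a - b for b, a in zip(before, after)]   # one difference per pair
    n = len(d)
    d_bar = statistics.mean(d)
    s = statistics.stdev(d)                      # divisor n - 1
    t = d_bar * math.sqrt(n) / s
    return t, n - 1, d_bar, s

# Hypothetical before/after data for four units; differences are 2, 2, 1, 3.
t, df, d_bar, s = paired_t([10, 12, 9, 11], [12, 14, 10, 14])
```

Here d̄ = 2 and s² = 2/3, so t = 2√6 ≈ 4.899 with 3 degrees of freedom, which exceeds the two-tailed 5% table value of 3.182.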

(4) Testing of hypothesis about coefficient of correlation.

• Case 1: testing the hypothesis that the population coefficient of correlation equals zero, i.e., H0: ρ = 0

• Case 2: testing the hypothesis that the population coefficient of correlation equals some value other than zero, i.e., H0: ρ = ρ0

• Case 3: testing the hypothesis for the difference between two independent correlation coefficients.

Case 1: testing the hypothesis when the population coefficient of correlation equals zero, i.e., H0: ρ = 0

• Null hypothesis: there is no correlation in the population, i.e., H0: ρ = 0

• t value is given as:

$$t = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2}$$

• Degrees of freedom: n - 2

• Calculate the table value at the specified significance level and d.f.

• If the calculated value is more than the table value, the null hypothesis is rejected, and we conclude that there is a linear relationship between the variables.
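The statistic t = r√(n - 2)/√(1 - r²) is easy to sketch. The values of r and n below are hypothetical and the function name `correlation_t` is made up; the table value 2.120 is the two-tailed 5% entry for 16 degrees of freedom.

```python
import math

def correlation_t(r, n):
    """Return (t, d.f.) for testing H0: rho = 0 from a sample correlation r."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t, n - 2

# Hypothetical sample: r = 0.6 from n = 18 pairs.
t, df = correlation_t(0.6, 18)
significant = abs(t) > 2.120   # compare with the t-table value for 16 d.f.
```

With r = 0.6 and n = 18, t = 0.6·4/0.8 = 3, so the hypothesis of no correlation is rejected at the 5% level.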

Case 2: testing the hypothesis when the population coefficient of correlation equals some value other than zero, i.e., H0: ρ = ρ0

• When ρ ≠ 0, a test based on the t-distribution is not appropriate, but Fisher's z-transformation is applicable:

$$z = \frac{1}{2}\log_e\frac{1 + r}{1 - r} \quad \text{or} \quad z = 1.1513\,\log_{10}\frac{1 + r}{1 - r}$$

• z is approximately normally distributed with mean $z_\rho = \frac{1}{2}\log_e\frac{1 + \rho}{1 - \rho}$

• Standard deviation: $\sigma_z = 1/\sqrt{n - 3}$

• This test is more applicable if the sample size is large (at least 10).

• Null hypothesis: H0: ρ = ρ0

• Test statistic:

$$Z = \frac{z - z_\rho}{\sigma_z}$$

with $z_\rho$ evaluated at ρ = ρ0, which follows approximately the standard normal distribution.
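A short sketch of Fisher's z-transformation and the resulting test statistic Z = (z - z_ρ)/σ_z. The function names `fisher_z` and `rho_test_statistic` and the sample values are illustrative assumptions.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def rho_test_statistic(r, rho0, n):
    """Approximate standard normal statistic for H0: rho = rho0."""
    sigma_z = 1 / math.sqrt(n - 3)
    return (fisher_z(r) - fisher_z(rho0)) / sigma_z

# The base-10 form with the constant 1.1513 gives (almost) the same value.
z_natural = fisher_z(0.5)
z_base10 = 1.1513 * math.log10((1 + 0.5) / (1 - 0.5))

# Hypothetical test: r = 0.8 from n = 28 pairs against H0: rho = 0.5.
Z = rho_test_statistic(0.8, 0.5, 28)
```

The constant 1.1513 is just 0.5·ln(10) rounded to four decimals, which is why the two forms agree to about four decimal places.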

Case 3: testing the hypothesis for the difference between two independent correlation coefficients

• To test the hypothesis about 2 correlation coefficients derived from two separate samples, compare the difference of the 2 corresponding values of z with the standard error of that difference.

• Formula used:

$$Z = \frac{z_1 - z_2}{\sigma_{z_1 - z_2}} = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$

where

$$z_1 = \frac{1}{2}\log_e\frac{1 + r_1}{1 - r_1} = 1.1513\,\log_{10}\frac{1 + r_1}{1 - r_1}$$

$$z_2 = \frac{1}{2}\log_e\frac{1 + r_2}{1 - r_2} = 1.1513\,\log_{10}\frac{1 + r_2}{1 - r_2}$$

• If the absolute value of this statistic is greater than 1.96, the difference is significant at the 5% significance level.
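The comparison of two independent correlation coefficients can be sketched as below. The correlations and sample sizes are hypothetical, and the function names are made up for the illustration.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1, n1, r2, n2):
    """Z statistic for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    return (fisher_z(r1) - fisher_z(r2)) / se

# Hypothetical samples: r1 = 0.8 and r2 = 0.3, each from 28 pairs.
Z = compare_correlations(0.8, 28, 0.3, 28)
significant_at_5_percent = abs(Z) > 1.96
```

For these values Z is about 2.79, so the two correlations differ significantly at the 5% level.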

The F - Distribution

• Named in honour of R.A. Fisher, who studied it in 1924.

• It is defined in terms of the ratio of the variances of two normally distributed populations, so it is sometimes also called the variance ratio distribution.

• F-distribution:

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$

where

$$s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$

and $s_1^2$, $s_2^2$ are unbiased estimators of $\sigma_1^2$, $\sigma_2^2$ respectively.

• Degrees of freedom: ν1 = n1 - 1, ν2 = n2 - 1

• If $\sigma_1^2 = \sigma_2^2$, then $F = s_1^2 / s_2^2$.

• It depends on ν1 and ν2 for the numerator and denominator respectively, so ν1 and ν2 are the parameters of the F-distribution.

• For different values of ν1 and ν2 we get different distributions.

Probability density function

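• Probability density function of the F-distribution:

$$f(F) = Y_0 \, \frac{F^{\frac{\nu_1}{2} - 1}}{\left(1 + \dfrac{\nu_1}{\nu_2}F\right)^{\frac{\nu_1 + \nu_2}{2}}}$$

where Y_0 is a constant depending on ν1 and ν2 such that the area under the curve is 1.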
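A sketch of the F density f(F) = Y_0 F^(ν1/2 - 1) / (1 + ν1·F/ν2)^((ν1+ν2)/2). It assumes the standard closed form of Y_0 in terms of gamma functions, which is not given on the slide; the function name `f_pdf` is made up.

```python
import math

def f_pdf(x, nu1, nu2):
    """Density of the F-distribution with (nu1, nu2) degrees of freedom, x > 0."""
    if x <= 0:
        raise ValueError("this sketch evaluates the density only for x > 0")
    # Assumption: Y0 = (nu1/nu2)^(nu1/2) / B(nu1/2, nu2/2), with B the beta function.
    beta = math.gamma(nu1 / 2) * math.gamma(nu2 / 2) / math.gamma((nu1 + nu2) / 2)
    y0 = (nu1 / nu2) ** (nu1 / 2) / beta
    return y0 * x ** (nu1 / 2 - 1) * (1 + nu1 * x / nu2) ** (-(nu1 + nu2) / 2)

# For nu1 = nu2 = 2 the density simplifies to 1 / (1 + F)^2.
value_at_1 = f_pdf(1.0, 2, 2)
```

The special case ν1 = ν2 = 2 is a convenient check because Y_0 = 1 there, so the density at F = 1 is exactly 0.25.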

Properties of F-distribution

• It is positively skewed and its skewness decreases with increase in ν1 and ν2.

• The value of F must always be positive or zero, since variances are squares. So its value lies between 0 and ∞.

• Mean and variance of the F-distribution:

$$\text{Mean} = \frac{\nu_2}{\nu_2 - 2}, \quad \text{for } \nu_2 > 2$$

$$\text{Variance} = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}, \quad \text{for } \nu_2 > 4$$

• The shape of the F-distribution depends upon the number of degrees of freedom.

• The areas in the left hand side of the distribution can be found by taking the reciprocal of the F values corresponding to the right hand side, when the numbers of degrees of freedom in the numerator and in the denominator are interchanged. It is known as the reciprocal property:

$$F_{1-\alpha,\,\nu_1,\,\nu_2} = \frac{1}{F_{\alpha,\,\nu_2,\,\nu_1}}$$

So we can find lower tail F values from the corresponding upper tail F values, which are the ones given in F-tables.
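The mean and variance formulas can be evaluated directly. This is a small sketch with made-up function names and hypothetical degrees of freedom.

```python
def f_mean(nu2):
    """Mean of the F-distribution, nu2 / (nu2 - 2), defined for nu2 > 2."""
    if nu2 <= 2:
        raise ValueError("mean is defined only for nu2 > 2")
    return nu2 / (nu2 - 2)

def f_variance(nu1, nu2):
    """Variance of the F-distribution, defined for nu2 > 4."""
    if nu2 <= 4:
        raise ValueError("variance is defined only for nu2 > 4")
    return 2 * nu2 ** 2 * (nu1 + nu2 - 2) / (nu1 * (nu2 - 2) ** 2 * (nu2 - 4))

# Hypothetical degrees of freedom: nu1 = 5, nu2 = 10.
mean_5_10 = f_mean(10)            # 10 / 8
variance_5_10 = f_variance(5, 10) # 2600 / 1920
```

Note that the mean depends only on the denominator degrees of freedom ν2 and approaches 1 as ν2 grows.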

Testing of hypothesis for equality of two variances

It is based on the variances in two independently selected random samples drawn from two normal populations.

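• Null hypothesis H0: $\sigma_1^2 = \sigma_2^2$

• Test statistic:

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}, \quad \text{which under } H_0 \text{ reduces to } F = \frac{s_1^2}{s_2^2}$$

• Place the larger sample variance in the numerator.

• Degrees of freedom: ν1 (numerator) and ν2 (denominator).

• Find the table value using ν1 and ν2.

• If the calculated F value exceeds the table F value, the null hypothesis is rejected.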
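A minimal sketch of the variance ratio statistic, with the larger sample variance placed in the numerator as described. The samples are made up and the function name `variance_ratio_f` is hypothetical; the table value for the returned degrees of freedom still has to be looked up.

```python
import statistics

def variance_ratio_f(x1, x2):
    """Return (F, numerator d.f., denominator d.f.) for H0: equal variances."""
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    # Put the larger sample variance in the numerator so that F >= 1.
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1

# Hypothetical samples with variances 4 and 20/3.
F, df_num, df_den = variance_ratio_f([10, 12, 14], [7, 9, 11, 13])
```

Because the second sample has the larger variance here, the degrees of freedom are swapped accordingly: F = (20/3)/4 ≈ 1.667 with (3, 2) degrees of freedom.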

Confidence interval for the ratio of two variances

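• The 100(1-α)% confidence interval for the ratio of the variances of two normally distributed populations is given by:

$$\frac{s_1^2 / s_2^2}{F_{\alpha/2,\,\nu_1,\,\nu_2}} \;<\; \frac{\sigma_1^2}{\sigma_2^2} \;<\; \frac{s_1^2 / s_2^2}{F_{1-\alpha/2,\,\nu_1,\,\nu_2}}$$

where $F_{\alpha,\,\nu_1,\,\nu_2}$ denotes the F value with upper tail area α, consistent with the reciprocal property $F_{1-\alpha,\,\nu_1,\,\nu_2} = 1/F_{\alpha,\,\nu_2,\,\nu_1}$.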
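A sketch of the interval for σ1²/σ2², assuming F values are indexed by their upper tail area so that the lower tail value can be replaced by a reciprocal (reciprocal property). The sample variances are hypothetical, the function name `variance_ratio_ci` is made up, and 4.03 is the usual table value for upper tail area 0.025 with (9, 9) degrees of freedom, supplied by hand.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper_12, f_upper_21):
    """Confidence interval for sigma1^2 / sigma2^2.

    f_upper_12: F value with upper tail area alpha/2 and (nu1, nu2) d.f.
    f_upper_21: F value with upper tail area alpha/2 and (nu2, nu1) d.f.
    """
    ratio = s1_sq / s2_sq
    lower = ratio / f_upper_12
    # Dividing by the lower tail value equals multiplying by the swapped upper tail value.
    upper = ratio * f_upper_21
    return lower, upper

# Hypothetical case: n1 = n2 = 10 (9 d.f. each), s1^2 = 8, s2^2 = 4.
lower, upper = variance_ratio_ci(8.0, 4.0, 4.03, 4.03)
```

For these numbers the interval runs from about 0.50 to 8.06; it contains 1, so equality of the two variances is not contradicted at the 5% level.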
