Estimation Bias, Standard Error and Sampling Distribution


Page 1: Estimation Bias, Standard Error and Sampling Distribution, Topic 9


Page 2

From sample to population

Inductive (inferential) statistical methods

Make inferences about a population based on information from a sample drawn from that population

[Diagram: a sample is drawn from the population; inductive statistical methods carry conclusions from the sample back to the population]

Page 3

Statistical Concepts of Sampling

• Suppose we want to estimate the mean birthweight of Malay male live births in Singapore, 1992

• Due to logistical constraints, we decide to take a random sample of 50 live births from the records of all Malay male live births for that year

Page 4

Sampling from Target Population

Target population: all Malay male live births in Singapore, 1992

Sample: random sample of 50 Malay male live births in Singapore, 1992

Suppose

• sample mean = 3.55 kg

• sample SD (S) = 0.92 kg

What can we say about the population mean?

Page 5

Statistical Modeling

• Assume the population values follow a normal or some other appropriate distribution. This means a relative frequency histogram of the population values will look like a normal or that appropriate distribution.

• Assume we have a random sample, i.e., we sample n (=50 in example) values independently from the population

Page 6

Notation

Sample data: $X_1, \ldots, X_n$

Assume $X_1, \ldots, X_n$ are independent and each is distributed according to, say, a normal distribution

Population parameters:

Population mean $\mu$ = mean of the normal population

Population variance $\sigma^2$ = variance of the normal population

Population standard deviation $\sigma$

Page 7

Statistical Inference

Two general areas:

(a) Statistical Estimation

i.e. estimating population parameters based on sample statistics

(b) Hypothesis Testing

i.e. testing certain assumptions about the population

Also called Test of Statistical Significance

Page 8

Statistical Estimation

There are two ways by which a population parameter can be estimated from a sample:

(1) Point estimate

(2) Interval estimate

Page 9

Point Estimate

Estimate the population parameter by a single value:

Sample mean → population mean

Sample median → population median

Sample variance → population variance

Sample SD → population SD

Sample proportion → population proportion
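The pairings above can be sketched in a few lines of Python. The ten birthweights and the 3.0 kg threshold used for the proportion are made-up illustrative values, not data from the slides; only the standard-library `statistics` module is assumed.

```python
import statistics

# Hypothetical birthweights in kg (illustrative values, not from the slides)
birthweights = [3.2, 3.9, 2.8, 3.6, 4.1, 3.4, 3.0, 3.7, 3.5, 3.8]

# Each sample statistic is a point estimate of the matching population parameter
point_estimates = {
    "mean": statistics.mean(birthweights),
    "median": statistics.median(birthweights),
    "variance": statistics.variance(birthweights),  # sample variance, divides by n - 1
    "sd": statistics.stdev(birthweights),           # square root of the sample variance
    # Proportion of births below a hypothetical 3.0 kg threshold
    "proportion_low": sum(w < 3.0 for w in birthweights) / len(birthweights),
}

for name, value in point_estimates.items():
    print(f"{name}: {value:.3f}")
```

Each printed number is a single value with no indication of how far it may be from the population parameter it estimates.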

Page 10

Point Estimate

If the average birthweight for a random sample of Malay male births was 3.55 kg and we use it to estimate $\mu$, the mean birthweight of all Malay male births in the population, we would be making a point estimate for $\mu$.

• Poor practice to report just the point estimate because people cannot judge how good the estimate is

• Should also report the accuracy of the estimate.

• Remember that the quality of an estimator is judged by its performance over REPEATED SAMPLING although we have just one sample in hand.

Inference for population parameter should make allowance for sampling error

Page 11

Accuracy of statistical estimation

Two types of error:

(a) Sampling error or fluctuation: "random" error or fluctuation that is due entirely to chance in the process of sampling. Minimizing the sampling error maximizes the precision of a statistical estimate.

(b) Systematic error or bias: non-random error which is either a property of the estimator itself or due to bias in the sampling or measurement process. Minimizing the systematic error maximizes the validity of a statistical estimate. Systematic errors can be minimized by reducing the sources of bias in sampling and measurement (e.g. non-random sampling, non-response and non-coverage, untruthful answers, unreliable calibration, errors in data recording and coding, etc.)
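A small simulation can make the two error types concrete. This is only a sketch under assumed values: a normal population with mean 3.5 kg and SD 0.9 kg, and a hypothetical scale that reads 0.1 kg too high. None of these numbers come from the slides.

```python
import random
import statistics

random.seed(0)

TRUE_MEAN = 3.5   # assumed population mean (kg), illustrative only
TRUE_SD = 0.9     # assumed population SD (kg), illustrative only
N = 50            # sample size, as in the birthweight example
OFFSET = 0.1      # hypothetical calibration error of the scale (kg)
REPS = 2000       # number of repeated samples

def sample_mean(offset=0.0):
    """Mean of one random sample; `offset` mimics a miscalibrated scale."""
    return statistics.fmean(
        random.gauss(TRUE_MEAN, TRUE_SD) + offset for _ in range(N)
    )

good = [sample_mean() for _ in range(REPS)]
biased = [sample_mean(OFFSET) for _ in range(REPS)]

avg_good, avg_biased = statistics.fmean(good), statistics.fmean(biased)
sd_good, sd_biased = statistics.stdev(good), statistics.stdev(biased)

# Sampling error: both sets of sample means fluctuate by a similar amount.
# Systematic error: only the miscalibrated set is centred away from TRUE_MEAN.
print(f"well calibrated: average {avg_good:.3f}, spread {sd_good:.3f}")
print(f"miscalibrated:   average {avg_biased:.3f}, spread {sd_biased:.3f}")
```

Taking a larger sample would shrink the spread in both cases but would not remove the offset, which is why precision and validity are treated separately.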

Page 12

Unbiased estimation of the mean

$E(\bar{X}) = \mu$, i.e., the sample mean equals the population mean when averaged over repeated samples

Page 13

Hypothetical results of repeated sampling

Sample   Mean (kg)
1        3.55
2        3.59
3        3.48
4        3.51
5        3.49
6        3.46
7        3.48
8        3.52
9        3.51
10       3.49

• Unbiasedness means the sample mean equals the population mean when averaged over repeated samples

• However, there is fluctuation from sample to sample

• Variance = ?
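As a quick check on the ten hypothetical sample means, their average and spread can be computed directly. This only summarises the listed values; it says nothing about the true population mean, which the slides do not give.

```python
import statistics

# The ten hypothetical sample means from the slide (kg)
sample_means = [3.55, 3.59, 3.48, 3.51, 3.49, 3.46, 3.48, 3.52, 3.51, 3.49]

grand_mean = statistics.mean(sample_means)    # average over the repeated samples
spread_var = statistics.variance(sample_means)  # sample variance of the ten means
spread_sd = statistics.stdev(sample_means)      # its square root

print(f"average of the sample means:  {grand_mean:.3f} kg")
print(f"variance of the sample means: {spread_var:.4f} kg^2")
print(f"SD of the sample means:       {spread_sd:.3f} kg")
```

The SD of the sample means, roughly 0.04 kg here, is an empirical measure of the sample-to-sample fluctuation.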

Page 14
Page 15
Page 16

Standard Error (SE) of an estimator

• The SE of an estimator (e.g., the sample mean) is just the standard deviation (SD) of the estimator. It measures the variability of the estimator under "repeated" sampling

• SE is just a special case of SD

• The standard deviation of an estimator is called the standard error because it measures the magnitude of the estimation error due to sampling fluctuation

Page 17

Standard Deviation vs Standard Error

• The population standard deviation (SD) measures the amount of variation among the individual measurements that make up the population and can be estimated from a sample using the sample standard deviation.

• The standard error (e.g. of the sample mean), on the other hand, measures how much the value of the estimator changes from sample to sample under repeated sampling.

• As we take only 1 sample rather than repeated samples in practice, it seems impossible at first to estimate the standard error, which is defined with reference to repeated sampling.

• Fortunately, the standard error of the sample mean is a function of the population SD. As the latter is estimable from a single sample, so is the standard error.

Page 18

Estimated standard error of the sample mean

• Let $\sigma$ denote the population SD

• It was shown earlier that SE = SD(sample mean) = $\sigma/\sqrt{n}$, where n is the sample size

• Since $\sigma$ can be estimated by the sample standard deviation S, we can estimate the standard error by $\widehat{SE} = S/\sqrt{n}$

Note that SE decreases with n at the rate $1/\sqrt{n}$, i.e., the precision of the sample mean improves as sample size increases
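The estimated standard error is a one-line calculation. The sketch below uses the birthweight figures quoted earlier (S = 0.92 kg, n = 50); the function name `estimated_se` and the comparison sample size of 200 are illustrative choices.

```python
import math

def estimated_se(sample_sd, n):
    """Estimated standard error of the sample mean: S / sqrt(n)."""
    return sample_sd / math.sqrt(n)

se_50 = estimated_se(0.92, 50)    # birthweight example: S = 0.92 kg, n = 50
se_200 = estimated_se(0.92, 200)  # same S with four times the sample size

print(f"SE with n = 50:  {se_50:.3f} kg")
print(f"SE with n = 200: {se_200:.3f} kg")
```

Because of the $1/\sqrt{n}$ rate, quadrupling the sample size only halves the standard error.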

Page 19

Knowing the mean and standard error of an estimator still doesn’t tell us the whole story

The whole story is told by the sampling distribution since that helps in calculating the probabilities

Page 20

Sampling distribution of the sample mean

• The distribution of the sample mean under "repeated" sampling from the population

Sample   Mean (kg)
1        3.55
2        3.59
3        3.48
4        3.51
5        3.49
6        3.46
7        3.48
8        3.52
9        3.51
10       3.49

• Distribution of the sample mean rather than individual measurements

• In practice, we take only one sample, not repeated samples, and so the sampling distribution is unobserved, but fortunately it can often be derived theoretically

Demo: http://www.ruf.rice.edu/~lane/stat_sim/index.html

Page 21

Exact result when sampling from a normal population

• If the population is normal with mean $\mu$ and variance $\sigma^2$, then the sample mean based on a random sample of size n is also normal with mean $\mu$ and variance $\sigma^2/n$, i.e., $\bar{X} \sim N(\mu, \sigma^2/n)$

• Note how we can derive theoretically the distribution of the sample mean under repeated sampling without actually drawing repeated samples

• This is important because we usually only have one sample at our disposal in practice
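Although repeated sampling is not available in practice, a computer can imitate it, which gives a rough check of the theoretical result. The population values below (mean 3.5 kg, SD 0.9 kg) are assumptions chosen for illustration, not figures from the slides.

```python
import random
import statistics

random.seed(1)

MU = 3.5      # assumed population mean (kg), illustrative only
SIGMA = 0.9   # assumed population SD (kg), illustrative only
N = 50        # sample size
REPS = 5000   # number of simulated repeated samples

# Draw REPS independent samples of size N and keep each sample mean
means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

emp_mean = statistics.fmean(means)    # should be close to MU
emp_var = statistics.variance(means)  # should be close to SIGMA^2 / N
theory_var = SIGMA ** 2 / N

print(f"mean of sample means:     {emp_mean:.3f} (theory {MU})")
print(f"variance of sample means: {emp_var:.4f} (theory {theory_var:.4f})")
```

The simulated values agree with the theory only approximately, since REPS is finite.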

Page 22

Topic 10: Interval Estimate

• Provides an estimate of the population parameter by defining an interval or range of plausible values within which the population parameter could be found with a given confidence.

• This interval is called a confidence interval.

• The sampling distribution is used in constructing confidence intervals.

Page 23

Confidence interval for the mean of a normal population

Fact: With probability 0.95, a normally distributed variable is within 1.96 standard deviations of its mean.

Now $\bar{X} \sim N(\mu, \sigma^2/n)$, with $SD(\bar{X}) = SE = \sigma/\sqrt{n}$

• It follows that the sample mean must be within 1.96 standard errors of the population mean with probability 0.95.

• Equivalently, the population mean is within 1.96 standard errors of the sample mean.
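The 1.96 figure can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8), which is a tooling choice of this example rather than anything used in the slides.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Probability that a normal variable lies within 1.96 SDs of its mean
prob_within = z.cdf(1.96) - z.cdf(-1.96)

# Going the other way: the point that leaves 2.5% in the upper tail
z_crit = z.inv_cdf(0.975)

print(f"P(-1.96 <= Z <= 1.96) = {prob_within:.4f}")
print(f"upper 2.5-percentile  = {z_crit:.4f}")
```

Both directions give the same pairing of 0.95 with 1.96, up to rounding.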

Page 24

$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

We call

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = \bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = \bar{X} \pm 1.96\,SE(\bar{X})$$

a 95% confidence interval for the population mean.

If $\sigma$ is unknown, replace it by the sample SD $S$ and replace 1.96 by the upper 2.5-percentile of a t-distribution with n-1 degrees of freedom to yield

Page 25

$$\left(\bar{X} - t_{n-1,0.025}\,\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,0.025}\,\frac{S}{\sqrt{n}}\right) = \bar{X} \pm t_{n-1,0.025}\,\frac{S}{\sqrt{n}} = \bar{X} \pm t_{n-1,0.025}\,\widehat{SE}(\bar{X})$$

as a 95% confidence interval for the population mean

Page 26

The t densities

• t densities are symmetric and similar in appearance to N(0,1) density but with heavier tails

• Tables for t distributions are widely available

• As d.f. increases, t distribution converges to standard normal distribution
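The heavier tails and the convergence to the standard normal can be checked with the textbook formula for the t density. The helper names `t_pdf` and `normal_pdf`, the evaluation point x = 3 and the chosen degrees of freedom are all illustrative; only the standard `math` module is assumed.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom at x."""
    # Work on the log scale so large df does not overflow the gamma function
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution N(0, 1) at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare tail heights at x = 3 for increasing degrees of freedom
for df in (1, 5, 30, 1000):
    print(f"d.f. = {df:4d}: t density at 3 = {t_pdf(3.0, df):.5f}")
print(f"N(0,1):       density at 3 = {normal_pdf(3.0):.5f}")
```

The t density at x = 3 is largest for small d.f. and moves toward the normal value as d.f. grows.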

Demo: http://www.isds.duke.edu/sites/java.html

Page 27

95% confidence interval for the population mean

$$\left(\bar{X} - t_{n-1,0.025}\,\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,0.025}\,\frac{S}{\sqrt{n}}\right) = \bar{X} \pm t_{n-1,0.025}\,\frac{S}{\sqrt{n}} = \bar{X} \pm t_{n-1,0.025}\,\widehat{SE}(\bar{X})$$

Birthweight data revisited

• n = 50, sample mean = 3.55 kg, S = 0.92 kg

• SE = 0.92/sqrt(50) = 0.13 kg

• d.f. = 49, upper 2.5-percentile of t = 2.01

• 95% C.I. for the mean Malay male birthweight is 3.55 +/- 2.01 (0.13) = (3.29 kg, 3.81 kg)
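The arithmetic can be reproduced in a few lines, taking n = 50 (the sample size used in the SE and d.f. lines) and the t percentile 2.01 as quoted on the slide rather than computing it. Variable names are illustrative.

```python
import math

n = 50          # sample size
xbar = 3.55     # sample mean (kg)
s = 0.92        # sample SD (kg)
t_crit = 2.01   # upper 2.5-percentile of t with 49 d.f., as quoted on the slide

se = s / math.sqrt(n)   # estimated standard error of the sample mean
margin = t_crit * se    # half-width of the interval
ci = (xbar - margin, xbar + margin)

print(f"SE = {se:.2f} kg")
print(f"95% C.I. = ({ci[0]:.2f} kg, {ci[1]:.2f} kg)")
```

Rounded to two decimals this gives the same interval as the slide.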

Page 28

The meaning of confidence interval

Under repeated sampling,

$$\bar{X} \pm t_{n-1,0.025}\,\frac{S}{\sqrt{n}}$$

will contain the true mean 95% of the time.
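The repeated-sampling interpretation can be imitated by simulation. This is a sketch under assumed values: a normal population with mean 3.5 kg and SD 0.9 kg (illustrative, not from the slides), samples of size 50, and the t percentile 2.01 quoted earlier for 49 d.f.

```python
import math
import random
import statistics

random.seed(2)

MU = 3.5        # assumed true mean (kg), illustrative only
SIGMA = 0.9     # assumed true SD (kg), illustrative only
N = 50          # sample size
T_CRIT = 2.01   # upper 2.5-percentile of t with 49 d.f., as quoted on the slides
REPS = 5000     # number of simulated repeated samples

def interval_covers_mu():
    """Draw one sample, build the 95% interval, report whether it contains MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    return xbar - T_CRIT * se <= MU <= xbar + T_CRIT * se

coverage = sum(interval_covers_mu() for _ in range(REPS)) / REPS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should land near 0.95. Any single interval either contains the true mean or it does not; the 95% describes the procedure over many samples.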

Page 29

Demo: http://www.isds.duke.edu/sites/java.html