Lesson7-1 Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data

Lesson 7: Estimation and Confidence Intervals

Outline

Point and interval estimates

Confidence intervals

Student’s t-distribution

Degrees of freedom

Confidence interval for population mean

Confidence interval for a population proportion

Selecting a sample size

Summary

Point and Interval Estimates

A point estimate is a single value (statistic) used to estimate a population value (parameter).

A confidence interval is a range of values within which the population parameter is expected to occur.

Confidence Intervals

The degree to which we can rely on the statistic is as important as the initial calculation.

Samples give us estimates of the population parameter – only estimates. Ultimately, we are concerned with the accuracy of the estimate.

1. A confidence interval provides a range of values, based on observations from one sample.

2. A confidence interval gives information about closeness to the unknown population parameter. It is stated in terms of probability. The exact closeness is not known, because knowing the exact closeness requires knowing the unknown population parameter.

Areas Under the Normal Curve

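The area under the normal curve between:

µ ± 1σ is 68.26%

µ ± 2σ is 95.44%

µ ± 3σ is 99.74%

[Figure: normal curve centered at µ, with the horizontal axis marked at µ-3σ, µ-2σ, µ-1σ, µ, µ+1σ, µ+2σ, µ+3σ.]

If we draw an observation from a normally distributed population, the drawn value is likely (a chance of 68.26%) to lie inside the interval (µ-1σ, µ+1σ).

P(µ-1σ < x < µ+1σ) = 0.6826.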
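The three areas can be checked numerically. The sketch below is an illustration added to the slides, not part of them; it assumes only Python's standard `math` module and uses the identity P(µ-kσ < x < µ+kσ) = erf(k/√2) for a normally distributed x. The helper name `central_area` is hypothetical. The exact values differ from the table-based percentages on this slide only in the last digit.

```python
import math

def central_area(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    # Prints the probability that a normal draw lands within k sigma of mu.
    print(f"within {k} sigma: {central_area(k):.4f}")
```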

P(µ-1σ < x < µ+1σ) vs P(x-1σ < µ < x+1σ)

P(µ-1σ < x < µ+1σ) is the probability that a randomly drawn observation will lie between (µ-1σ, µ+1σ).

P(µ-1σ < x < µ+1σ)

= P(µ-1σ-µ-x < x-µ-x < µ+1σ-µ-x)

= P(-1σ-x < -µ < 1σ-x)

= P(-(-1σ-x) > -(-µ) > -(1σ-x))

= P(1σ+x > µ > -1σ+x)

= P(x-1σ < µ < x+1σ)

P(x-1σ < µ < x+1σ) is the probability that the population mean will lie between (x-1σ, x+1σ).

P(µ-1σm < m < µ+1σm) vs P(m-1σm < µ < m+1σm) (m = sample mean, σm = standard deviation of the sample mean)

P(µ-1σm < m < µ+1σm) is the probability that a randomly drawn sample will have a sample mean between (µ-1σm, µ+1σm).

P(µ-1σm < m < µ+1σm)

= P(µ-1σm-µ-m < m-µ-m < µ+1σm-µ-m)

= P(-1σm-m < -µ < 1σm-m)

= P(-(-1σm-m) > -(-µ) > -(1σm-m))

= P(1σm+m > µ > -1σm+m)

= P(m-1σm < µ < m+1σm)

P(m-1σm < µ < m+1σm) is the probability that the population mean will lie between (m-1σm, m+1σm).

P(µ-a <x<µ+b) vs P(x-a<µ <x+b)

P(µ-a <x<µ+b) is the probability that a drawn observation will lie between (µ-a, µ+b).

P(x-a <µ <x+b) is the probability that the population mean will lie between (x - a, x+ b).

Is P(µ-a < x < µ+b) = P(x-a < µ < x+b) in general? No.

Generally, P(µ-a < x < µ+b) and P(x-a < µ < x+b) are not equal. They are guaranteed to be equal when a = b, that is, when the confidence interval is symmetric.

P(µ-a <x<µ+b) = P(x-b <µ <x+a)

P(µ-a <x<µ+b) is the probability that a drawn observation will lie between (µ-a, µ+b).

P(µ-a < x < µ+b)

= P(µ-a-µ-x < x-µ-x < µ+b-µ-x)

= P(-a-x < -µ < b-x)

= P(-(-a-x) > -(-µ) > -(b-x))

= P(a+x > µ > -b+x)

= P(x-b < µ < x+a)

P(x-b <µ <x+a) is the probability that the population mean will lie between (x - b, x+ a).
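A small simulation can make the role of a and b concrete. This is only a sketch under stated assumptions: it draws x from an exponential distribution with mean µ = 1 (a skewed population chosen for illustration, not one used in the slides), takes a = 0.5 and b = 1, and uses a hypothetical helper `compare_events`. The first two estimated probabilities should agree, while the third, which keeps a and b in their original positions, should not.

```python
import random

def compare_events(a=0.5, b=1.0, draws=100_000, seed=1):
    """Estimate three probabilities for x ~ Exponential(1), whose mean is mu = 1."""
    rng = random.Random(seed)
    mu = 1.0
    obs = swapped = unswapped = 0
    for _ in range(draws):
        x = rng.expovariate(1.0)
        obs += mu - a < x < mu + b        # event: mu-a < x < mu+b
        swapped += x - b < mu < x + a     # event: x-b < mu < x+a (same event rewritten)
        unswapped += x - a < mu < x + b   # event: x-a < mu < x+b (a different event)
    return obs / draws, swapped / draws, unswapped / draws

p_obs, p_swapped, p_unswapped = compare_events()
print(p_obs, p_swapped, p_unswapped)
```

For a symmetric population such as the normal, the third probability would also match, which is why symmetric intervals avoid the issue.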

Elements of Confidence Interval Estimation

[Figure: a confidence interval drawn around the sample statistic $\bar{X}$, running from the lower confidence limit $\bar{X} - Z\sigma_{\bar{X}}$ to the upper confidence limit $\bar{X} + Z\sigma_{\bar{X}}$.]

We are concerned about the probability that the population parameter falls somewhere within the interval around the sample statistic.

Generally, we consider symmetric confidence intervals only.

Confidence Intervals

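The likelihood (probability) that the sample mean of a randomly drawn sample will fall within the interval, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$:

90% of samples: $\mu - 1.645\,\sigma_{\bar{X}}$ to $\mu + 1.645\,\sigma_{\bar{X}}$

95% of samples: $\mu - 1.96\,\sigma_{\bar{X}}$ to $\mu + 1.96\,\sigma_{\bar{X}}$

99% of samples: $\mu - 2.58\,\sigma_{\bar{X}}$ to $\mu + 2.58\,\sigma_{\bar{X}}$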

Confidence Intervals

The likelihood (or probability) that the sample mean will fall within “1 standard deviation” of the population mean is the same as the likelihood (or probability) that the population mean will fall within “1 standard deviation” of the sample mean.

Z        $P(\mu - Z\sigma_{\bar{X}} < \bar{X} < \mu + Z\sigma_{\bar{X}})$        $P(\bar{X} - Z\sigma_{\bar{X}} < \mu < \bar{X} + Z\sigma_{\bar{X}})$
1.645    0.90    0.90
1.96     0.95    0.95
2.58     0.99    0.99

Level of Confidence

1. Probability that the unknown population parameter falls within the interval.

2. Denoted (1 - α), the level of confidence; α is the probability that the parameter is not within the interval.

3. Typical values are 99%, 95%, 90%.

Interpreting Confidence Intervals

Once a confidence interval has been constructed, it will either contain the population mean or it will not.

For a 95% confidence interval, suppose we draw 1000 samples and construct the 95% confidence interval for the population mean from each of the 1000 samples.

Some of the intervals contain the population mean, some do not.

If the interval is a 95% confidence interval, about 950 of the confidence intervals will contain the population mean.

That is, about 95% of the intervals will contain the population mean.

In general, for a (1-α) confidence interval, about (1-α)×1000 of the 1000 intervals will contain the population mean.
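The repeated-sampling interpretation can be tried out with a short simulation. The sketch below is illustrative only: the population (normal with µ = 10 and σ = 2), the sample size n = 30, and the function name `coverage` are assumptions, not values from the slides. It builds a z-based interval with known σ from each of 1000 samples and reports the share of intervals that contain µ.

```python
import random
from statistics import NormalDist, mean

def coverage(num_samples=1000, n=30, mu=10.0, sigma=2.0, level=0.95, seed=0):
    """Fraction of z-based confidence intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(num_samples):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / num_samples

print(coverage())  # a value close to 0.95
```

Changing `level` to 0.90 or 0.99 should move the reported share toward that level.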

Intervals & Level of Confidence

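[Figure: the sampling distribution of the mean, centered at $\mu_{\bar{X}} = \mu$, with area $1-\alpha$ in the middle and $\alpha/2$ in each tail; beneath it, a large number of intervals, one per sample.]

Intervals extend from $\bar{X} - Z\sigma_{\bar{X}}$ to $\bar{X} + Z\sigma_{\bar{X}}$.

$100(1-\alpha)\%$ of the intervals contain $\mu$; $100\alpha\%$ do not.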

Point Estimates and Interval Estimates

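The factors that determine the width of a confidence interval are:

1. The size of the sample (n) from which the statistic is calculated.

2. The variability in the population, usually estimated by s.

3. The desired level of confidence.

$$\bar{X} \pm Z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

[Figure: sampling distribution of $\bar{X}$ centered at $\mu_{\bar{X}} = \mu$, with area $1-\alpha$ in the middle and $\alpha/2$ in each tail.]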

Point and Interval Estimates

We may use the z distribution if one of the following conditions holds:

The population is normal and its standard deviation is known.

The sample has more than 30 observations (the population standard deviation can be known or unknown).

$$\bar{X} \pm z\,\frac{s}{\sqrt{n}}$$

Technical note:

If the random variables A and B are normally distributed (and independent), Y = A+B and X = (A+B)/2 will be normally distributed.

If the population is normal, the sample mean of a random sample of n observations (for any integer n) will be normally distributed.

Point and Interval Estimates

Use the t distribution if all of the following conditions are fulfilled:

The population is normal.

The population standard deviation is unknown and the sample has fewer than 30 observations.

$$\bar{X} \pm t\,\frac{s}{\sqrt{n}}$$

Note that the t distribution does not cover non-normal populations.

The normality assumption in student t

Is it possible to check whether the population is normal? Yes, using a normal probability plot and a test of moments.

In small samples, however, such a check is likely to be very imprecise.

In practice, when the sample size is small and the population distribution is known, we use Student's t. However, if we are very concerned about whether we should use Student's t (i.e., about the assumption of normality), we might consider some other approach to conduct the inference. One such approach is known as the “bootstrap”, which is based on simulations (it is covered only in very advanced statistics courses).
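To give a rough idea of what a simulation-based approach looks like, here is a minimal sketch of one common variant, the percentile bootstrap for the mean. It is not taken from the slides: the data values are made up, and the function name `bootstrap_ci` and its parameters are hypothetical. The idea is to resample the observed data with replacement many times and read the interval off the percentiles of the resampled means.

```python
import random
from statistics import mean

def bootstrap_ci(data, level=0.95, num_resamples=5000, seed=0):
    """Percentile-bootstrap interval for the mean of `data` (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values from the data with replacement.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(num_resamples))
    lo_idx = int((1 - level) / 2 * num_resamples)
    hi_idx = int((1 + level) / 2 * num_resamples) - 1
    return means[lo_idx], means[hi_idx]

sample = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 11.8]  # made-up observations
low, high = bootstrap_ci(sample)
print(f"{low:.2f} to {high:.2f}")
```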

Student’s t-Distribution

The t-distribution is a family of distributions that is bell-shaped and symmetric like the standard normal distribution but with greater area in the tails. Each distribution in the t-family is defined by its degrees of freedom. As the degrees of freedom increase, the t-distribution approaches the normal distribution.

Student is a pen name for a statistician named William S. Gosset who was not allowed to publish under his real name. Gosset assumed the pseudonym Student for this purpose. Student’s t distribution is not meant to reference anything regarding college students.

Student’s t-Distribution

[Figure: density curves of the standard normal (Z), t (df = 13), and t (df = 5), all centered at 0.]

Bell-shaped

Symmetric

‘Fatter’ tails

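Student’s t Table

Upper tail area

df    .25      .10      .05
1     1.000    3.078    6.314
2     0.817    1.886    2.920
3     0.765    1.638    2.353

Assume: n = 3, df = n - 1 = 2, α = .10, α/2 = .05.

The t value is 2.920.

[Figure: t density centered at 0, with upper tail area α/2 = .05 to the right of t = 2.920.]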
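Table entries like these can be recomputed without a printed table. The sketch below is one possible way, assuming only Python's standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names `t_cdf` and `t_upper` are hypothetical, and a statistics package would normally be used instead. The results agree with the table up to rounding in the last digit.

```python
import math

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on the t density over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_upper(alpha: float, df: int) -> float:
    """t value whose upper-tail area is alpha (0 < alpha < 0.5), by bisection."""
    lo, hi = 0.0, 1.0
    while 1 - t_cdf(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 3):
    print(df, [round(t_upper(a, df), 3) for a in (0.25, 0.10, 0.05)])
```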

Degrees of freedom (df)

Degrees of freedom refers to the number of independent data values available to estimate the population’s standard deviation. If k parameters must be estimated before the population’s standard deviation can be calculated from a sample of size n, the degrees of freedom are equal to n - k.

Example

The sum of 3 numbers is 6.

X1 = 1 (or any number)

X2 = 2 (or any number)

X3 = 3 (cannot vary)

Sum = 6

Degrees of freedom = n -1 = 3 -1= 2

t-Values

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

where:

$\bar{x}$ = sample mean

µ = population mean

s = sample standard deviation

n = sample size

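Confidence interval for mean (σ unknown in small sample)

A random sample of n = 25 has $\bar{X}$ = 50 and S = 8. Set up a 95% confidence interval estimate for µ.

$$\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$

$$50 - 2.0639\cdot\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.0639\cdot\frac{8}{\sqrt{25}}$$

$$46.70 \le \mu \le 53.30$$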

Constructing General Confidence Intervals for µ

In general, a confidence interval for the mean is computed by:

$$\bar{X} \pm z\,\frac{s}{\sqrt{n}}$$

The term $z\,s/\sqrt{n}$ is sometimes called the margin of error (ME).

Margin of Error

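The confidence interval,

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

can also be written as $\bar{x} \pm ME$, where ME is called the margin of error:

$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

The interval width, w, is equal to twice the margin of error.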

Reducing the Margin of Error

The margin of error can be reduced if

the population standard deviation can be reduced (σ↓)

the sample size is increased (n↑)

the confidence level is decreased, (1 - α)↓

$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
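Each of the three effects can be seen by plugging numbers into the margin-of-error formula. The following sketch is illustrative: the baseline values (σ = 20, n = 100, 95% confidence) and the function name `margin_of_error` are assumptions chosen for the example, and the z value is taken from `statistics.NormalDist` in the Python standard library.

```python
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, level: float) -> float:
    """ME = z_{alpha/2} * sigma / sqrt(n) for a given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / n ** 0.5

base = margin_of_error(sigma=20.0, n=100, level=0.95)
print(f"baseline:           {base:.2f}")
print(f"smaller sigma (10): {margin_of_error(10.0, 100, 0.95):.2f}")
print(f"larger n (400):     {margin_of_error(20.0, 400, 0.95):.2f}")
print(f"lower level (90%):  {margin_of_error(20.0, 100, 0.90):.2f}")
```

Note that quadrupling n only halves the margin of error, because n enters through its square root.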

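Finding the Reliability Factor, $z_{\alpha/2}$

Consider a 95% confidence interval: 1 - α = .95, so α = .05 and α/2 = .025 in each tail.

[Figure: standard normal curve with area 1 - α = .95 between z = -1.96 and z = 1.96 and area α/2 = .025 in each tail. In Z units the axis is marked -1.96, 0, 1.96; in X units the same points are the lower confidence limit, the point estimate, and the upper confidence limit.]

Find z.025 = 1.96 from the standard normal distribution table.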

Common Levels of Confidence

Commonly used confidence levels are 90%, 95%, and 99%

Confidence level    Confidence coefficient, 1 - α    $Z_{\alpha/2}$ value
80%      .80     1.28
90%      .90     1.645
95%      .95     1.96
98%      .98     2.33
99%      .99     2.58
99.8%    .998    3.09
99.9%    .999    3.29
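The z values for any confidence level can also be computed directly rather than looked up. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` provides the inverse normal CDF; the helper name `z_critical` is hypothetical. It prints each value to three decimals, so printed tables that round to two decimals may differ slightly.

```python
from statistics import NormalDist

def z_critical(level: float) -> float:
    """z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  z = {z_critical(level):.3f}")
```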

EXAMPLE 3

The Dean of the Business School wants to estimate the mean number of hours worked per week by students. A sample of 49 students showed a mean of 24 hours with a standard deviation of 4 hours. What is the population mean?

The value of the population mean is not known. Our best estimate of this value is the sample mean of 24.0 hours. This value is called a point estimate.

Example 3 continued

Find the 95 percent confidence interval for the population mean.

$$\bar{X} \pm 1.96\frac{s}{\sqrt{n}} = 24.00 \pm 1.96\frac{4}{\sqrt{49}} = 24.00 \pm 1.12$$

The confidence limits range from 22.88 to 25.12. About 95 percent of the similarly constructed intervals include the population parameter.
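The arithmetic of this example can be reproduced in a few lines. This is a sketch with a hypothetical helper, `mean_ci`, that takes the critical value as an argument so the same function works with either a z or a t value; only the inputs (mean 24, standard deviation 4, n = 49, z = 1.96) come from the example.

```python
import math

def mean_ci(xbar: float, s: float, n: int, crit: float):
    """Return (lower, upper) for xbar +/- crit * s / sqrt(n)."""
    margin = crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_ci(xbar=24.0, s=4.0, n=49, crit=1.96)
print(f"{low:.2f} to {high:.2f}")  # 22.88 to 25.12
```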

Confidence Interval for a Population Proportion

The confidence interval for a population proportion is estimated by:

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

or

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n-1}}$$

The first uses a biased estimator for the variance of the sample proportion; the second uses an unbiased estimator.

The difference between the two estimators is small when n is very large. The textbook uses the former.

EXAMPLE 4

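A sample of 500 executives who own their own home revealed 175 planned to sell their homes and retire to Arizona. Develop a 98% confidence interval for the proportion of executives that plan to sell and move to Arizona.

$$1-\alpha = 0.98,\quad \alpha = 0.02,\quad \alpha/2 = 0.01,\quad Z_{\alpha/2} = 2.33$$

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .35 \pm 2.33\sqrt{\frac{(.35)(.65)}{500}} = .35 \pm .049701$$

or

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n-1}} = .35 \pm 2.33\sqrt{\frac{(.35)(.65)}{500-1}} = .35 \pm .04975$$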
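Both versions of the calculation can be checked with a short script. The sketch below uses a hypothetical helper, `proportion_ci`, with a flag that switches between the n and n - 1 denominators; the inputs (175 of 500, Z = 2.33) are those of the example.

```python
import math

def proportion_ci(successes: int, n: int, z: float, unbiased: bool = False):
    """Return (p_hat, margin) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / d).

    d is n by default, or n - 1 when unbiased=True.
    """
    p_hat = successes / n
    denom = n - 1 if unbiased else n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / denom)
    return p_hat, margin

p_hat, margin_n = proportion_ci(175, 500, 2.33)
_, margin_n1 = proportion_ci(175, 500, 2.33, unbiased=True)
print(f"{p_hat:.2f} +/- {margin_n:.6f}")
print(f"{p_hat:.2f} +/- {margin_n1:.5f}")
```

With n = 500 the two margins differ only in the fifth decimal place, which is the sense in which the difference is small for large n.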

Finite-Population Correction Factor

A population that has a fixed upper bound is said to be finite.

For a finite population, where the total number of objects is N and the size of the sample is n, the following adjustment is made to the standard errors of the sample means and the proportion:

Standard error of the sample means when σ is known:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

Standard error of the sample means when σ is NOT known and needs to be estimated by s:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}}$$

Finite-Population Correction Factor

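Standard error of the sample proportions:

$$\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}}$$

or

$$\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n-1}}\sqrt{\frac{N-n}{N}}$$

This adjustment is called the finite-population correction factor.

If n/N < .05, the finite-population correction factor is ignored.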
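The effect of the correction, and of the n/N < .05 rule, can be sketched for the case of a sample mean with known σ, using the factor √((N-n)/(N-1)). The numbers below (σ = 10, N = 1000) are made up for illustration, and the function name `standard_error_mean` is hypothetical.

```python
import math

def standard_error_mean(sigma: float, n: int, N=None) -> float:
    """sigma / sqrt(n), with the finite-population correction when n/N >= 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        # Sampling a sizeable share of a finite population shrinks the standard error.
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error_mean(10.0, 100))          # no correction (population treated as infinite)
print(standard_error_mean(10.0, 100, N=1000))  # n/N = 0.10, correction applied
print(standard_error_mean(10.0, 40, N=1000))   # n/N = 0.04, correction ignored
```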

Selecting a Sample Size

There are 3 factors that determine the size of a sample, none of which has any direct relationship to the size of the population. They are:

The degree of confidence selected.

The maximum allowable error.

The variation in the population.

Selecting a Sample Size

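To find the sample size for a variable:

$$E = z\,\frac{s}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{z \cdot s}{E}\right)^2$$

where E is the allowable error (sometimes called the margin of error), z is the z-value corresponding to the selected level of confidence, and s is the sample standard deviation from the pilot survey.

This follows from the confidence interval $\bar{X} \pm Z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$.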

EXAMPLE 6

A consumer group would like to estimate the mean monthly electricity charge for a single family house in July within $5 using a 99 percent level of confidence. Based on similar studies the standard deviation is estimated to be $20.00. How large a sample is required?

$$n = \left(\frac{(2.58)(20)}{5}\right)^2 = (10.32)^2 \approx 106.5$$

Rounding up, a sample of 107 is required.
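The same calculation as a small function, as a sketch: `sample_size_mean` is a hypothetical name, and rounding up with `math.ceil` reflects the usual convention that the sample must be at least as large as the formula requires.

```python
import math

def sample_size_mean(z: float, s: float, error: float) -> int:
    """Smallest n with z * s / sqrt(n) <= error, i.e. n = (z * s / error)^2 rounded up."""
    return math.ceil((z * s / error) ** 2)

print(sample_size_mean(z=2.58, s=20.0, error=5.0))  # 107
```

Halving the allowable error to $2.50 would roughly quadruple the required sample.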

Sample Size for Proportions

The formula for determining the sample size in the case of a proportion is:

$$n = p(1-p)\left(\frac{Z}{E}\right)^2$$

where p is the estimated proportion, based on past experience or a pilot survey; Z is the z value associated with the degree of confidence selected; E is the maximum allowable error the researcher will tolerate.

EXAMPLE 7

The American Kennel Club wanted to estimate the proportion of children that have a dog as a pet. If the club wanted the estimate to be within 3% of the population proportion, how many children would they need to contact? Assume a 95% level of confidence and that the club estimated that 30% of the children have a dog as a pet.

$$n = (.30)(.70)\left(\frac{1.96}{.03}\right)^2 = 897$$

or

$$n - 1 = (.30)(.70)\left(\frac{1.96}{.03}\right)^2 = 897, \qquad n = 898$$
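A quick check of this example, as a sketch with a hypothetical helper `sample_size_proportion`. The raw formula value is about 896.4, which is rounded up to 897; the variant based on n - 1 adds one more observation.

```python
import math

def sample_size_proportion(p: float, z: float, error: float) -> int:
    """n = p * (1 - p) * (z / error)^2, rounded up to a whole observation."""
    return math.ceil(p * (1 - p) * (z / error) ** 2)

n = sample_size_proportion(p=0.30, z=1.96, error=0.03)
print(n)      # 897
print(n + 1)  # 898, when the formula is applied to n - 1 instead of n
```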

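Summary: Confidence interval for population mean

General confidence interval:

$$\hat{\mu} \pm r(\alpha, n)\,\sigma_{\hat{\mu}}$$

(µ = population mean; 1 - α = confidence level; σ = standard deviation)

Sample size n < 30, population normal, σ known: $\hat{\mu} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$

Sample size n < 30, population normal, σ unknown: $\hat{\mu} \pm t_{\alpha/2,\,n-1}\,\hat{\sigma}/\sqrt{n}$

Sample size n < 30, population distribution unknown (σ known or unknown): ?

Sample size n ≥ 30, population normal or unknown, σ known: $\hat{\mu} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$

Sample size n ≥ 30, population normal or unknown, σ unknown: $\hat{\mu} \pm Z_{\alpha/2}\,\hat{\sigma}/\sqrt{n}$

where $\hat{\sigma} = \left[\sum_i (x_i - \hat{\mu})^2/(n-1)\right]^{1/2}$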

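Summary: Confidence interval for population proportion

General confidence interval:

$$\hat{p} \pm r(\alpha, n)\,\sigma_{\hat{p}}$$

(p = population proportion; 1 - α = confidence level; σ = standard deviation)

[Diagram: the same decision tree as for the mean, split by sample size (n < 30 or n ≥ 30), by whether σ is known or unknown, and by whether the population distribution is normal or unknown.]

σ known: $\hat{p} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$, with $\sigma = [p(1-p)]^{1/2}$

σ unknown, n < 30: $\hat{p} \pm t_{\alpha/2,\,n-1}\,\hat{\sigma}/\sqrt{n}$ (or with $\hat{\sigma}/\sqrt{n-1}$)

σ unknown, n ≥ 30: $\hat{p} \pm Z_{\alpha/2}\,\hat{\sigma}/\sqrt{n}$ (or with $\hat{\sigma}/\sqrt{n-1}$)

where $\hat{\sigma} = [\hat{p}(1-\hat{p})]^{1/2}$

Because the variance of one draw from the population is σ² = p(1-p), we know σ if and only if we know p. If we know p, there is no need to estimate p or to construct the confidence interval for p.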

- END -
