Topic 2: Estimation Section 9.1 Introduction


Objective: Using ______ information to learn about population characteristics.

(i) Estimation: Want to measure the value of a __________ parameter. For example, one could use the sample statistic \(\bar{X}\) to measure _.

⇒ How “____” is the estimator?
⇒ What are the _______ for estimating population parameters?

(ii) Hypothesis Testing: Want to test the ________ of some statement about the __________ characteristics.

(iii) __________: Forecast a population parameter in the future.

For now, we will:

1) Present _______ for judging how well a ______ statistic estimates the population parameter.

2) Analyse several popular _______ for estimating these parameters.

We will now apply our knowledge of the probability distributions of sample statistics, and use sample measures, to learn about population measures.

• Developing _______ for accurately estimating the value of __________ parameters is an extremely important part of statistical analysis.


Estimation: Estimators and Estimates

Definition:

* ESTIMATOR: Random ________ used to estimate population parameters are called estimators.

* ESTIMATES: Specific ______ of these variable estimators of population parameters are called estimates.

Example:

\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \]

⇐ \(\bar{X}\) is an estimator of µ: this formula is an _________.

“\(\bar{X} = 10\)” ⇐ A specific value of the estimator is an ________ of _.

Remarks:

It is important to note the distinction between the two:

⇒ _________: is a formula or ____ used to measure a population parameter on the basis of sample information. E.g. the formula defines a statistic.

⇒ ________: is the value computed for a given ______ when using an estimator. E.g. we want to determine the mean of a population, and must use \(\bar{X} = \sum X_i / n\) as the estimator. The numerical value of \(\bar{X}\) is the estimate. ⇐ The value \(\bar{X} = 10\) is the point estimate of _.

E.g. We could also use the sample ______ or mode as estimators of the population mean _.
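The distinction between an estimator (a rule) and an estimate (a number) can be sketched in a few lines of Python. This is only an illustration: the data values and the function name `sample_mean` are invented here and do not come from the notes.

```python
from statistics import median

def sample_mean(sample):
    """Estimator: a rule (formula) that maps any sample to a number."""
    return sum(sample) / len(sample)

# Hypothetical sample, invented for illustration only.
sample = [8, 12, 10, 9, 11]

estimate = sample_mean(sample)   # the estimate: one specific value
alt_estimate = median(sample)    # a different estimator applied to the same data

print(f"sample mean estimate   = {estimate}")
print(f"sample median estimate = {alt_estimate}")
```

The function is the estimator; the value it returns for this particular sample is the estimate. Swapping in the median gives a different estimator of the same parameter.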


• It is not necessary for an ________ of a population parameter to be one single value. The estimate could be a _____ of values.

Two Types of Estimation:

1) _____ Estimation: Estimates that specify a single _____ of the population are point estimates.

Example: The estimated value of new ________ cars is $45,000. I.e. \(\bar{X}\) = $45,000.

2) ________ Estimation: Estimates that specify a range of ______ are called interval estimates.

Example: The mean price of new Japanese cars will fall somewhere between $25,000 and $45,000.

⇒ Specifies a ____ of values, indicating that we believe the mean price for the population lies within this ________.

Remarks:

• Point estimation can be ___________ because we must use __________ to determine the estimate.

⇒ Hence, there is an element of ___________ introduced into the estimation process.

• Must ______ some measure of the ___________ of the estimation process.

Reliability of Estimation Depends On:

⇒ estimation ______
⇒ sampling ______
⇒ ____ of the sample
⇒ characteristics (parameters) of the parent population.


Question: “Which estimator is ____?”

• The choice of optimal estimator needs to be made with any estimation problem.

• Generally, the choice of estimator in a given situation depends on how well the _________ satisfies certain criteria.

• \(\bar{X}\) and \(s^2\) are functions of the random ______ values of the \(X_i\)’s.
⇒ These statistics have a sampling ____________.
⇒ The information in the sampling distribution of an estimator provides one basis for determining how effective or good an estimator is.

Ex: We use \(\bar{X}\) to estimate _.
⇒ We know \(E(\bar{X}) = \_\) and \(V(\bar{X}) = \_^2/\_\).
⇒ So, on average, using \(\bar{X}\) to generate an estimate of _ gives the right answer, but not necessarily for any __________ ______:
   • Suppose we know that _ = 10.
   • We draw three samples: \(\bar{X}_1 = 12\), \(\bar{X}_2 = 8\), \(\bar{X}_3 = 10\).

\[ \frac{1}{k}\sum_{i=1}^{k} \bar{X}_i = 10 = \mu \]

⇒ Also, as the sample ____ gets bigger, the _______ of \(\bar{X}\), \(V(\bar{X})\), decreases.

⇒ So, as an estimator of µ, \(\bar{X}\) is more reliable if we have _____ samples of data.
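A small simulation can make the two claims about the sample mean concrete: its average over many samples sits near the population mean, and its variance shrinks as the sample size grows. The sketch below assumes a normal population with mean 10 and standard deviation 2; those values, the seed and the helper name are illustrative choices, not part of the notes.

```python
import random
from statistics import mean, pvariance

def simulate_sample_means(mu, sigma, n, reps, rng):
    """Draw `reps` samples of size n from N(mu, sigma^2); return their means."""
    return [mean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(0)      # fixed seed so the run is repeatable
MU, SIGMA = 10.0, 2.0       # assumed population values (hypothetical)

results = {}
for n in (10, 100):
    means = simulate_sample_means(MU, SIGMA, n, reps=2000, rng=rng)
    # Store (average of the sample means, variance of the sample means).
    results[n] = (mean(means), pvariance(means))
    print(f"n={n:4d}  average of X-bar = {results[n][0]:.3f}  "
          f"variance of X-bar = {results[n][1]:.4f}  "
          f"sigma^2/n = {SIGMA**2 / n:.4f}")
```

Individual sample means still miss the target; it is only their long-run average that lines up with the population mean.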


Sampling Properties of ____ Estimators:

(I) ________ness: On _______, the value of the estimate should equal the __________ parameter being estimated.

⇒ If the average value of the estimator does not equal the actual parameter value, the estimator is a ______ estimator.

• Ideally, an estimator has a bias of ____, in which case it is said to be un______:

“An estimator is said to be ________ if the expected value of the estimator is equal to the ____ value of the parameter being estimated.”

If \(\theta\) is a __________ parameter to be estimated, and \(\hat{\theta}\) is an _________, where \(\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)\), then \(\hat{\theta}\) is said to be an ________ estimator of \(\theta\) if:

\[ E(\hat{\theta}) = \_ . \]

Example 1) \(E(\bar{X}) = \mu\) under simple random sampling (Topic 1); so \(\bar{X}\) is an ________ estimator.


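Example 2) If \(X_i \sim N(\mu, \sigma^2)\), then it can be shown that:

\[ \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{\nu} . \]

Taking the expected value of the \(\chi^2\) we get:

\[ E\left[ \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} \right] = (\_\,\_), \]

since \( s^2 = \frac{1}{(\_\,\_)} \sum_{i=1}^{n} (X_i - \bar{X})^2 \). If we rearrange the equation above:

\[ E\left[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = \_ \]

\[ E(s^2) = \sigma^2 . \]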

So the sample variance is an ________ estimator of the population variance.
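The unbiasedness of the sample variance can be checked numerically. The sketch below compares the divisor n-1 version with the divisor n version over many simulated normal samples; the population settings, the seed and the number of replications are assumptions made for the illustration.

```python
import random
from statistics import mean, variance, pvariance

rng = random.Random(1)
MU, SIGMA, N, REPS = 0.0, 2.0, 5, 10000   # hypothetical settings; sigma^2 = 4

s2_values, divisor_n_values = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    s2_values.append(variance(sample))          # divisor n - 1
    divisor_n_values.append(pvariance(sample))  # divisor n

avg_s2 = mean(s2_values)
avg_divisor_n = mean(divisor_n_values)
print(f"average of s^2 (divisor n-1): {avg_s2:.3f}")
print(f"average with divisor n:       {avg_divisor_n:.3f}")
print(f"sigma^2 = {SIGMA**2}, (n-1)/n * sigma^2 = {(N - 1) / N * SIGMA**2}")
```

With these settings the divisor n-1 average should land near 4, while the divisor n average is pulled down towards 3.2, which is the bias that the n-1 divisor removes.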

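[Figure: sampling distributions of an unbiased estimator, with \(E(\hat{\theta}) = \theta\), and a biased estimator, with \(E(\tilde{\theta}) \neq \theta\); the distance between \(E(\tilde{\theta})\) and \(\theta\) is the bias of \(\tilde{\theta}\).]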


• The _______ness criterion requires only that the _______ value of the estimator equal the population parameter. It says nothing about the potential __________ of the estimator values around the population parameter. It seems only sensible that most of the values of a good estimator should be reasonably close to the population parameter.

For this reason, the next property of a good estimator is important.

(II) __________: The most _________ estimator among a group of unbiased estimators is the one with the _______ variance.

• If \(\hat{\theta}\) and \(\tilde{\theta}\) are both unbiased estimators of \(\theta\), and the variance of \(\hat{\theta}\) is ____ than or equal to the variance of \(\tilde{\theta}\), \(V(\hat{\theta}) \le V(\tilde{\theta})\), then \(\hat{\theta}\) is an _________ estimator of \(\theta\), relative to \(\tilde{\theta}\).

• The most ________ estimator is called the ____ unbiased estimator, where ‘best’ implies ________ variance.

Relative Efficiency: is defined as the _____ of the variances of the two estimators:

\[ \text{Relative Efficiency} = \frac{V(\hat{\theta})}{V(\tilde{\theta})} = \frac{\text{Variance of the first estimator}}{\text{Variance of the second estimator}} \]

⇒ R.E. < 1 if \(\hat{\theta}\) is _________ relative to \(\tilde{\theta}\).

⇒ R.E. > 1 if \(\tilde{\theta}\) is efficient relative to \(\hat{\theta}\).

⇒ R.E. = 1: equally efficient.
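As a rough illustration of relative efficiency, the sketch below estimates the variance of two unbiased estimators of a normal mean, the sample mean and the sample median, by simulation and takes their ratio. The population, sample size, seed and number of replications are hypothetical choices for the demonstration.

```python
import random
from statistics import mean, median, pvariance

rng = random.Random(2)
MU, SIGMA, N, REPS = 0.0, 1.0, 25, 4000   # hypothetical settings

mean_estimates, median_estimates = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    mean_estimates.append(mean(sample))      # first estimator
    median_estimates.append(median(sample))  # second estimator

var_mean = pvariance(mean_estimates)
var_median = pvariance(median_estimates)

# Ratio of the variance of the first estimator to that of the second.
relative_efficiency = var_mean / var_median
print(f"V(mean)   = {var_mean:.4f}")
print(f"V(median) = {var_median:.4f}")
print(f"R.E.      = {relative_efficiency:.3f}")
```

A ratio below 1 here says that, for normal data, the sample mean has the smaller variance of the two.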

What if one or both of the estimators are ______?


Definition: “Mean Squared Error” ( MSE)

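The mean squared error (MSE) of \(\hat{\theta}\) is \(E(\hat{\theta} - \theta)^2\):

\[
\begin{aligned}
MSE(\hat{\theta}) &= E(\hat{\theta} - \theta)^2 && \Rightarrow \text{add \& subtract } E(\hat{\theta}) \\
&= E\left[\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\right]^2 && \Rightarrow \text{group in two} \\
&= E\left[(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\right]^2 && \Rightarrow \text{expand} \\
&= \underbrace{E(\hat{\theta} - E(\hat{\theta}))^2}_{\_\_\_\_\_\_\_(\hat{\theta})} + \underbrace{E(E(\hat{\theta}) - \theta)^2}_{[\_\_\_\_(\hat{\theta})]^2} + \underbrace{2E\left[(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)\right]}_{0} \\
&= \_(\hat{\theta}) + [\_\_\_\_(\hat{\theta})]^2 + 0
\end{aligned}
\]

The cross term is zero because \(E(\hat{\theta}) - \theta\) is a constant and \(E[\hat{\theta} - E(\hat{\theta})] = 0\).

• The MSE enables us to compare ______ estimators.

Note:

\(MSE(\hat{\theta}) = V(\hat{\theta})\) if the \(\_\_\_\_(\hat{\theta}) = 0\).

Definition:

Let \(\hat{\theta}\) and \(\tilde{\theta}\) be two estimators of \(\theta\). Then \(\hat{\theta}\) is efficient compared to \(\tilde{\theta}\) if \(MSE(\hat{\theta}) \le MSE(\tilde{\theta})\).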
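The decomposition of the MSE into a variance part and a squared-bias part can be verified numerically. The sketch below uses a deliberately biased estimator of a mean, 0.9 times the sample mean, which is an invented example rather than one from the notes; the population settings and seed are likewise assumptions.

```python
import random
from statistics import mean, pvariance

rng = random.Random(3)
THETA, SIGMA, N, REPS = 10.0, 2.0, 8, 5000   # hypothetical settings

# A deliberately biased estimator of the mean: shrink X-bar toward zero.
estimates = []
for _ in range(REPS):
    sample = [rng.gauss(THETA, SIGMA) for _ in range(N)]
    estimates.append(0.9 * mean(sample))

mse = mean([(t - THETA) ** 2 for t in estimates])  # average squared error
var = pvariance(estimates)                         # spread around its own mean
bias = mean(estimates) - THETA                     # average miss

print(f"MSE               = {mse:.4f}")
print(f"variance + bias^2 = {var + bias**2:.4f}")
```

The two printed numbers agree up to floating-point rounding, because the identity holds exactly for the simulated values and not only in expectation.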


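Graphical illustration:

[Figure: sampling distributions of \(\hat{\theta}\), centred at \(E(\hat{\theta}) = \theta\), and of \(\tilde{\theta}\), centred at \(E(\tilde{\theta})\), with Bias(\(\tilde{\theta}\)) marked.]

In this example, \(VAR(\hat{\theta}) < VAR(\tilde{\theta})\), and \(\hat{\theta}\) is unbiased while \(\tilde{\theta}\) is biased.

⇒ Hence, \(MSE(\hat{\theta}) < MSE(\tilde{\theta})\).

[Figure: a second pair of sampling distributions, with \(E(\hat{\theta}) = \theta\) and \(E(\tilde{\theta})\) marked.]

Here \(VAR(\tilde{\theta}) < VAR(\hat{\theta})\), but \(\hat{\theta}\) is unbiased while \(\tilde{\theta}\) is biased. ⇒ MSE trade-off!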


(III) __________: “An estimator is said to be ________ if it uses all the ___________ about the population parameter that the sample can provide.”

• The estimator incorporates all of the information available from the ______.

• __________ estimators take into account each ______ observation and any information that is generated by these observations.

Example: ‘Not sufficient’: the ______ is not a sufficient estimator because it _____ the observations to obtain the middle value.

⇒ Median: \(X_i = (1, 3, 5)\); median = 3.

Example: ‘Not sufficient’: the ____ is not sufficient because it uses only the most frequently occurring observation as an estimator of the population mean.

⇒ Mode: \(X_i = (1, 1, 1, 3, 4, 4, 5)\); mode = _.

Example: ‘Sufficient’: The sample ____ is a sufficient estimator because it uses the entire data set.

\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right) \]

• Sufficiency is a necessary condition for __________.


(IV) ___________: ‘Large Sample Property’

• Usually the ____________ of an estimator will change as the sample ____ changes.

⇒ The properties of estimators for large sample sizes (as n → N or infinity) are important.

• Properties of estimators based on the distributions approached as n becomes large are called __________ properties.

• These properties may differ from the ______ or small-sample properties.

• ___________ is the most important asymptotic property: a consistent estimator achieves ___________ in probability of the estimator to the population parameter as the sample size n increases. (Beyond this course.)

What we will discuss is a ‘stronger’ notion of consistency, Mean Square Consistency:

• Recall, MSE = variance + bias².

• An estimator is mean square consistent if its MSE → 0 as the sample size, n, becomes _____.

• “An estimator, \(\hat{\theta}\), is mean square consistent if its MSE, \(E(\hat{\theta} - \theta)^2\), approaches _____ as the sample size becomes large.”

\[ E(\hat{\theta}) \to \theta \text{ and } V(\hat{\theta}) \to \_ \text{ as } n \to \_ . \]
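Mean square consistency can be illustrated with closed-form MSE expressions. The sketch below uses two standard results for normal sampling that are not derived in these notes: the sample mean has MSE σ²/n, and the variance estimator with divisor n has bias -σ²/n and variance 2σ⁴(n-1)/n². The function names and the value σ² = 4 are illustrative.

```python
def mse_sample_mean(sigma2, n):
    """MSE of X-bar as an estimator of mu: unbiased, so MSE equals its variance."""
    return sigma2 / n

def mse_variance_divisor_n(sigma2, n):
    """MSE of the divisor-n variance estimator under normal sampling."""
    bias = -sigma2 / n
    variance = 2 * sigma2**2 * (n - 1) / n**2
    return variance + bias**2

SIGMA2 = 4.0   # hypothetical population variance
for n in (10, 100, 1000, 10000):
    print(f"n={n:6d}  MSE(X-bar) = {mse_sample_mean(SIGMA2, n):.5f}  "
          f"MSE(divisor-n variance) = {mse_variance_divisor_n(SIGMA2, n):.5f}")
```

The second estimator is biased for every finite n, yet both its bias and its variance vanish as n grows, so its MSE still goes to zero.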


Note: If an estimator is mean square consistent, then it will also be consistent in the convergence in ___________ sense. But an estimator may be consistent in the convergence in probability sense, yet not be mean square consistent.

• Mean square consistency is not a ‘________’ condition for convergence in probability.

• Consistency implies that the probability ____________ of the estimator for large samples becomes smaller and smaller (i.e. the variance is __________ as more information about the population is used in each sample). The distribution becomes more _______ about the true value of the parameter (bias getting smaller). And in the limit as n → ∞, the probability distribution of the estimator degenerates into a single “_____” at the true value.

When n → ∞, var → _ & bias² → _.

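Example #1: Unbiased Estimator \(\hat{\theta}\): \(E(\hat{\theta}) = \theta\)

[Figure: sampling density \(f(\hat{\theta})\) plotted against \(\hat{\theta}\), centred at \(E(\hat{\theta}) = \theta\).]

[Figure: sampling densities \(f(\hat{\theta} \mid n = n_1)\), \(f(\hat{\theta} \mid n = n_2)\), \(f(\hat{\theta} \mid n = n_3)\) and \(f(\hat{\theta} \mid n = \infty)\), with \(\theta\) marked on the horizontal axis.]

Example #2: ______ Estimator \(\hat{\theta}\): \(E(\hat{\theta}) \neq \theta\), but

(i) Bias(\(\hat{\theta}\)) → 0 as n → _. I.e. \(E(\hat{\theta}) \to \theta\) as \(n \to \infty\).

(Asymptotic unbiasedness.)

(ii) Var(\(\hat{\theta}\)) → 0 as n → _.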


Section 9.3: Estimating ________ Parameters

_________ Estimation

• Recall, the major weakness of a point estimate is that it can be misleading.
⇒ It does not permit the “expression” of any degree of ___________ about the estimate.

⇒ We need to _________ this uncertainty.

• The usual way to express this uncertainty (i.e. probability of error) about an estimate is to define an ________ of values in which the population parameter is likely to be.

⇒ This is known as ________ estimation.

Confidence Intervals

• In Topic 1, we determined the values of a and b such that \(P(a \le \bar{X} \le b)\) = some predetermined value.

⇒ The interval (a, b) is called a ___________ interval for \(\bar{X}\).

Example: \(P(a \le \bar{X} \le b) = 0.90\), based on a ______ sample of size n, drawn from a population with a known mean µ.

[Figure: standard normal density, Z axis.]

⇒ We know the random variable \(\bar{X}\) will fall in the probability interval (a, b) 90% of the time.


Question: “What if _ is unknown and we want to construct a confidence interval for _ based on \(\bar{X}\)?”

• In a confidence interval, the exact ________ distribution of the estimator is applied to construct confidence limits that assume a ______ probability of _____ for the interval estimate.

• The confidence interval defines an interval based on \(\bar{X}\), such that µ is likely to fall _____ such intervals 90% of the time the method is employed ⇒ creating a 90% confidence interval.

• Hence, on average, __ out of ___ times, such calculated intervals from samples of size n will contain µ.

• So, our objective is to present a range of _____s which _____ the true parameter with some fixed ___________.

Note: This example does ___ state that there is a 90% probability that _ lies within the interval.

⇒ _ is not a random variable.

⇒ It is a ________: E(µ) = µ ⇒ no distribution.

• If µ lies within the interval, then the probability that µ is in the interval = _.

• If µ lies outside the interval, the probability is ____.

• It is improper to make __________ statements about the values of population parameters.


When constructing an interval estimate, the __________ (endpoints) are based on sample information. Each sample will generate different values. Hence, these intervals are ______ variables.

• So, we can make ______ probability statements about the “proportion of intervals” that would include a ________ parameter value.

Notation: Use the term confidence interval when specifying the upper and lower limits on the likely value of a parameter.

A 90% confidence interval for a parameter declares:

“The probability is 90% that the _______ to be determined on the basis of the ______ evidence would be one that includes the population parameter.”


Confidence Interval For µ, (_ Known)

• Illustration with reference to estimating the mean of a normal population. Assume:

\(X_i \sim N(\mu, \sigma^2)\), where i = 1, 2, 3, ...
• Also assume \(\sigma^2\) is ______.

• Use \(\bar{X}\) as a _____ estimator of µ, where \(\bar{X} \sim N(\mu, \sigma^2/n)\).

If we standardize:

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{\_}} \sim N(0, 1). \]

So:

\[ P\left[-\_.\_\_ \le Z \le \_.\_\_\right] = 0.95 \]

\[ P\left[-\_.\_\_ \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \_.\_\_\right] = 0.95 \]

If we rearrange this:

\[ P\left[\mu - \_.\_\_\,\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + \_.\_\_\,\frac{\sigma}{\sqrt{n}}\right] = 0.95 \]

• This is the range of values over which we are 95% confident \(\bar{X}\) will lie.
• Use this to determine the interval which has 95% probability of containing µ, i.e. the 95% confidence interval for µ.


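i) Upper limit:

\[ \bar{X} \le \mu + \_.\_\_\,\frac{\sigma}{\sqrt{n}} \quad\text{rearrange:}\quad \mu \ge \bar{X} - \_.\_\_\,\frac{\sigma}{\sqrt{n}} \]

ii) Lower limit:

\[ \mu - \_.\_\_\,\frac{\sigma}{\sqrt{n}} \le \bar{X} \quad\text{rearrange:}\quad \mu \le \bar{X} + \_.\_\_\,\frac{\sigma}{\sqrt{n}} \]

• So the probability that the random interval

\[ \left[\bar{X} - \_.\_\_\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + \_.\_\_\,\frac{\sigma}{\sqrt{n}}\right] \]

covers µ is 0.95.

“In 95% of _______, _ will lie ______ the interval calculated for each sample.”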


Remarks:

(i) The interval is ______ because it depends on the random variable \(\bar{X}\).

• µ is a _____ value. Hence, µ does not have a probability distribution.

(ii) The intervals are based on the sampling distribution of \(\bar{X}\). So, the interpretation of this “95% confidence interval for µ” is:

⇒ Fix n (sample size) and repeatedly draw samples.

⇒ Compute \(\bar{X}\) and the interval \(\bar{X} \pm 1.96\,\sigma/\sqrt{n}\) each time.

⇒ 95% of such intervals will _______ µ.

But there is no guarantee that on any particular _____, the interval will cover µ.

• Either it does or it does not.
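The repeated-sampling interpretation above can be acted out directly. The sketch below fixes n, draws many samples from a normal population with known σ, builds the interval around each sample mean, and counts how often the true mean is inside. The population values, seed and number of replications are assumptions chosen for the illustration.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(4)
MU, SIGMA, N, REPS = 50.0, 5.0, 30, 5000   # hypothetical settings

z = NormalDist().inv_cdf(0.975)            # about 1.96 for a 95% interval
half_width = z * SIGMA / N ** 0.5          # same for every sample, since sigma is known

covered = 0
for _ in range(REPS):
    xbar = mean([rng.gauss(MU, SIGMA) for _ in range(N)])
    # Each interval either contains MU or it does not.
    if xbar - half_width <= MU <= xbar + half_width:
        covered += 1

coverage = covered / REPS
print(f"proportion of intervals covering mu: {coverage:.3f}")
```

The 95% figure describes the long-run proportion across intervals; any single interval in the loop simply covers the mean or misses it.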

What Determines the ______ of the Interval?

(I) _, the ______ size

(II) \(\sigma^2\), the population variance

(III) The chosen probability (confidence level)

(IV) Sampling ______


The Probability of An Error

• It is often more convenient to refer to the probability that a confidence interval will ___ include the _________ of interest.

Notation: Let “α” = the probability that a confidence interval will ___ include the parameter of interest.
⇒ The value of α is known as the probability of making an _____.

⇒ α indicates the proportion of times that one will be incorrect in assuming that the intervals would contain the population parameter.

⇒ Usually the confidence interval is referred to as being of ____ 100(1-α)%.

Example: (Continued from above) In _% of the cases, on the average, the intervals calculated as \(\bar{X} \pm 1.96\,\sigma/\sqrt{n}\) would not contain µ.

So, if α = 0.05, then the associated interval is a 100(1-α)% confidence interval (i.e. a 95% confidence interval).

⇒ If a 90% confidence interval is specified, then α = 1-(90/100) = 0.10.

• There is a ‘___-___’ between the value of α and the size of the confidence interval.
⇒ The _____ the value of α, the ______ the interval must be. (Assuming all other factors are held constant.)

⇒ To be very confident that the interval will cover the population parameter, a relatively _____ interval is necessary.
⇒ If one need not be very confident that the population parameter will be within the interval, then a small interval is fine.


• The value of “α” is usually arbitrarily set at either 1%, 5%, or 10%, representing 99%, 95% and 90% confidence intervals, respectively.

⇒ Although this choice is standard, it may not necessarily lead to an optimal trade-off between the size of the confidence interval and the ___ of making an error.

Note:

(i) Confidence intervals are constructed on the basis of ______ information. So, changes in ‘α’ and ‘n’ affect the size of the interval. The larger the ______ size, the more confident one can be about the estimate of the population parameter. Hence, the smaller the _______ needs to be to guarantee a given level of confidence.

(ii) It is usually desirable to have as ____ a confidence interval as possible.
⇒ But the optimal interval size must be decided on by considering the “trade off” between the ____ of sampling and the amount of risk of making an error that one wants to assume.

* The risk of making an error and sample size trade-off is an important issue... more on this later.


Determining A Confidence Interval

• Both n and α are usually set in advance.
⇒ But we can solve for an _______ ‘α’ and n under certain conditions.

• In addition to specifying ‘α’ before we construct a confidence interval, we must also specify how much of the total error, ‘α’, is attributed to the possibility that the ____ parameter might be larger than the upper limit of the confidence interval. The remainder of the error is attributed to the possibility that the true parameter might be smaller than the lower bound of the interval.

⇒ Usually this _____ of ‘α’ is obtained by ________ ‘α’ equally between the upper and lower tails of the distribution:

“The common procedure for determining confidence intervals is.... to _______ half of ‘α’ (i.e. α/2) on the high side and half of ‘α’ on the low side.”

• In applying this procedure, we are assuming that the penalty of under- and over-estimating the true parameter is the same.

⇒ This may ___ be ____!

• The decision on how to divide ‘α’ should depend on how _______ or costly it is to make errors on the high side relative to errors on the low side.

Problem: It can be very _________ to determine the cost of making an error.


Section 9.4: Confidence Interval for µ With σ _____

• We will examine in this section a confidence interval for the population parameter µ based on a random sample drawn from a _______ parent population with a _______ population standard deviation.

⇒ We use \(\bar{X}\) as the sample statistic for estimating µ (see previous section for details) because:

⇒ \(E(\bar{X}) = \_\_\)

⇒ \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\)

⇒ \(Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\) has a standard normal distribution.

Notation:
• Let \(Z_\alpha\) represent the point for which the probability of observing values of Z _______ than \(Z_\alpha\) is α:

\(P(Z > Z_\alpha) = \alpha\), and the __________ probability at this point is \(F(Z_\alpha) = 1 - \alpha\).

Example: \(F(Z_{0.25}) = 0.\_\_\_\_\)

[Figure: standard normal density, Z axis.]

• This notation allows us to represent the proportion of the total ____ under a normal probability density function in two ways:

⇒ F(Z): area to the ____ of a point Z.
⇒ \(Z_\alpha\): area to the _____ of a point Z having an area of ‘α’ to its right.

• Since the normal distribution is _________, the negative values (\(-Z_\alpha\)) can denote points in the lower ____ tail of the normal distribution below which a proportion of ‘α’ of the area is excluded.

[Figure: standard normal density, Z axis.]

• For finding the upper and lower __________ interval boundaries, one only has to find the points excluding (α/2) proportion of the area in each ____ of the normal distribution, such that the total area excluded equals __ and the area included in the interval between the upper and lower limits is (1-_).


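Method:

Let \(\alpha/2 = P(Z \ge Z_{\alpha/2})\) and \(\alpha/2 = P(Z \le -Z_{\alpha/2})\).

Example: If α = 0.__ (__% confidence interval) ⇒ \(Z_{\alpha/2} = Z_{0.10/2} = Z_{0.05}\).

The value that satisfies \(P(Z \ge Z_{0.05}) = 0.05\) is the same point as F(Z) = 0.__, which gives Z = 1.645.

[Figure: standard normal density with −1.645, 0 and 1.645 marked on the Z axis.]

So, \(P(-1.645 \le Z \le 1.645) = 1 - 0.05 - 0.05 = 0.90\). Thus, the probability that “Z” falls between the two limits \(-Z_{\alpha/2}\) and \(Z_{\alpha/2}\) is represented by:

\[ P(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1 - \alpha . \]

[Figure: standard normal density with \(-Z_{\alpha/2}\) and \(Z_{\alpha/2}\) marked.]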


• Which is the 100(1-α)% probability interval for Z, any standardized normal variable:

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} . \]

• To derive a confidence interval for µ, we simply _________ the expression \(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}\):

\[ \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad \text{(Equation 9.1)} \]

• Equation 9.1 represents the 100(1-α)% confidence interval for µ, when σ is _____ and the parent population is ______.
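Equation 9.1 translates into a short function. The sketch below is one possible way to code it with Python's standard library; the function name and the input values (sample mean 100, σ = 15, n = 36) are hypothetical and chosen only to exercise the formula.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha):
    """100(1 - alpha)% interval for mu when sigma is known (Equation 9.1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}, upper-tail point
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs, invented for illustration.
low95, high95 = z_confidence_interval(xbar=100.0, sigma=15.0, n=36, alpha=0.05)
low99, high99 = z_confidence_interval(xbar=100.0, sigma=15.0, n=36, alpha=0.01)

print(f"95% interval: {low95:.2f} to {high95:.2f}")
print(f"99% interval: {low99:.2f} to {high99:.2f}")
```

Lowering α widens the interval for the same data, which is the trade-off between confidence and interval size described earlier in the notes.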

Example: If α = 0.0__, the value of \(Z_{\alpha/2}\) that satisfies \(P(Z \ge Z_{\alpha/2}) = P(Z \ge Z_{0.0125}) = 0.0125\) is the same point as F(Z) = 0.____.

⇒ From the Z table, the value of \(Z_{\alpha/2} = Z_{0.0125}\) = _.__.

[Figure: standard normal density, Z axis.]


Example 1: A marketing firm is interested in determining the average percentage of university students who pay off their ______ ____ each month. A random sample of __ students out of a population of ____ is taken to determine the average percent. The sample average is __%.

(i) Assuming a normal population with a standard deviation of _%, find a __% confidence interval for the population ____.

⇒ α = 0.__.
⇒ α/2 = 0.__/2 = 0.0__.
⇒ Z(0.___) = _.__

\[ \bar{X} - Z_{0.025}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{0.025}\frac{\sigma}{\sqrt{n}} \]

= 23 ± (_.__)(8/√__)

= __ ± 3.136

__% Confidence Interval = {19.864 to 26.136}; Width = 6.272.

Hence, if many repeated samples were taken, __% of the intervals constructed as \(\bar{X} \pm \_.\_\_\,(8/5)\) for a sample size of __ will include the true value of the unknown average. Whether this particular interval does or does not contain µ is _______; however, we now have a measurement of the __________ associated with our statement about the actual value of µ.


ii) Give a __% confidence interval for the average % of students who pay off their ______ cards each month. Compare it to (i).

__% Confidence interval is:
⇒ α = 0.__.
⇒ α/2 = 0.__/2 = 0.0__.
⇒ Z(0.___) = 2.576

\[ \bar{X} - Z_{0.005}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{0.005}\frac{\sigma}{\sqrt{n}} \]

= 23 ± (_.___)(_/5)

= 23 ± 4.1216

__% Confidence Interval = {18.8784 to 27.1216}; Width = 8.2432.

Compared to part (i), the __% confidence interval is _____.


(iii) Suppose a __% confidence interval is required but you want the interval to be ________ than part (i). What do you do?

Change the sample ____. Let n = __. A __% confidence interval has α = 0.__.

\(\bar{X} \pm Z_{0.025}\,\sigma/\sqrt{n}\) ⇒ implies the width of the CI equals \(2 \cdot Z_{0.025}\,\sigma/\sqrt{n}\).

The confidence interval:

= 23 ± (_.__) __

= 23 ± 1.96

__% Confidence Interval = {21.04 to 24.96}; Width = 3.92.

More on this later.......


If we wanted to dictate the _____ of the confidence interval, we could solve for _:

Let:

\[ \text{Width} = 2 Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \_.\_ \]

\[ \frac{2(1.96)(8)}{\sqrt{n}} = \_.\_ \]

\[ \frac{2(15.68)}{\sqrt{n}} = \frac{31.36}{\sqrt{n}} = \_.\_ \]

rearranging:

\[ \sqrt{n} = \frac{31.36}{\_.\_} = 12.544 \iff n = (12.544)^2 = 157.352 \]

⇒ We need a sample ____ of 158 to achieve a __% confidence interval.
• Given some ‘α’, a _______ confidence interval can be derived with a ______ sample ____.


“What if the assumption of __________ for the population is relaxed?”

• If the sample ____, n, is _____, then the distribution of \(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\) is not ______ and there is no easy way to determine a confidence interval.

⇒ But, according to the ___, when n is _____ (n > __), \(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\) is _____________ normally distributed and we can apply formula 9.1.

Applied Questions:

9.12: When properly adjusted, the period of intense heat applied in a fabricating process is normally distributed with a variance of σ² = 0.__ (seconds)². _____ random cycles of the process are monitored and the results are heat periods of 51, 52, 50, and 51 seconds.

A) Find an ________ estimate of µ.

\[ \bar{X} = \frac{51 + 52 + 50 + 51}{4} = \_\_ \]

Since \(E(\bar{X}) = \mu\), the estimator is _________.

B) Construct a __% confidence interval for µ.

\[ \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

= 51 ± _.__ (0.3535)

= 51 ± 0.6929 ⇒ confidence interval = 50.3 to 51.7


Section 9.5: Confidence Intervals for µ, With σ ___________

• The assumption that the population standard deviation σ is ______ may not be applicable to many practical problems. I.e. if the population mean is _______, it is highly unlikely that the standard deviation of the _______ mean will be _____.

• The population σ must be estimated by using ______ information to compute ‘s’ (the sample std. deviation).

⇒ Hence, usually the _-distributed random variable,

\[ t = \frac{\bar{X} - \mu}{\_/\sqrt{n}} \quad \text{with } (\_ - 1) = \nu \text{ degrees of freedom,} \]

is employed to solve these problems. (Assuming the parent population is ______.)

• The methodology to determine a 100(1-α)% confidence interval is the ____ as the ‘σ known’ case, except the “______” of the confidence interval are found by using _-values instead of _-values.

Notation:

• The t-value is labelled with two __________: \(t_{\alpha/2,\nu}\).
⇒ The α/2 is the probability in the _____ tail;
⇒ The ν represents the degrees of _______.

Recall, ‘t’ represents a series of distributions whose shape depends on the sample ____. As n _________, the t-distribution approaches the ______ distribution.


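⇒ Example: n = __, d.f. = __, __% confidence interval:

\[ P(t_{19} \ge \_.\_\_\_) = 0.025 \]

[Figure: \(t_{19}\) density with −2.093, 0 and 2.093 marked.]

So,

\[ P\left[-t_{\alpha/2,\nu} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le t_{\alpha/2,\nu}\right] = 1 - \alpha \]

The 100(1-α)% confidence interval for µ, population ______, σ _______:

Rearranging and solving for µ:

\[ \bar{X} - t_{\alpha/2,\nu}\frac{s}{\sqrt{\_}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{s}{\sqrt{\_}} \qquad \text{(Equation 9.2)} \]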


Example: ___ B.C. flies daily to Prince George to deliver mail. The trip from _________ follows an identical route. Owing to variations in weather patterns and landing clearance, the actual flight time varies. In a sample of _ trips, the following flight times are recorded: (55, 57, 65, 47, 76) minutes. Determine a __% confidence interval for µ.

Determine a ________ estimator of µ:

\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{5}\left[55 + 57 + 65 + 47 + 76\right] = \frac{300}{5} = 60 \text{ minutes.} \]

The sample variance:

\[ s^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2 \]

   Xi       (Xi − X̄)     (Xi − X̄)²
   55
   57
   65
   47
   76
 Σ = 300     Σ = 0        Σ = 484

\[ s^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2 = \frac{1}{\_}(484) = \_\_\_ \]

s = 11 minutes

ν = n-1 = _ degrees of freedom
\(F(t_4) = 1 - \alpha/2 = 1 - 0.\_\_ = 0.\_\_\)
\(t_{0.05,4} = \_.\_\_\_\)

Substituting values into formula 9.2:


\[ \bar{X} - t_{\alpha/2,\nu}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{s}{\sqrt{n}} \]

= __ − _.___ (11/√5) ≤ µ ≤ 60 + _.___ (11/√5)

= 60 − 10.489 ≤ µ ≤ 60 + 10.489

= 49.51 ≤ µ ≤ 70.49

__% confidence interval: {49.51 to 70.49}; Width = 20.98.

• The interval (49.51 to 70.49) represents a __% confidence interval for the mean flight time per trip to Prince George, B.C.

Remarks:

• This confidence interval depicts how often the ____ mean is likely __ to lie in the intervals calculated as α = 0.__, if this procedure is repeated over and over.

• We can ________ the ____ of the interval by either increasing the chance of error α, or by increasing the sample size n.

• So, allowing a greater risk of error, or incurring a greater sampling cost to ________ n, results in a _______ confidence interval.
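The flight-time calculation can be reproduced in a few lines. Python's standard library has no t-distribution quantile function, so the sketch below takes the critical value as an input; 2.132 is the tabulated t value with 4 degrees of freedom and 0.05 in the upper tail, matching the \(t_{0.05,4}\) notation in the example. The function name is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(sample, t_crit):
    """Interval for mu when sigma is unknown (Equation 9.2).

    t_crit is t_{alpha/2, n-1}, looked up in a t table by the caller.
    """
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)                  # sample standard deviation, divisor n - 1
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

flight_times = [55, 57, 65, 47, 76]    # minutes, from the example above
T_CRIT = 2.132                         # t_{0.05, 4} from a standard t table

low, high = t_confidence_interval(flight_times, T_CRIT)
print(f"interval: {low:.2f} to {high:.2f} minutes")
```

Passing the table value in keeps the sketch dependency-free; a statistics package with a t quantile function could supply it instead.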


Section 9.7: Determining the ___ of the Sample (n)

• Many times researchers do not know what the _______ sample size is for determining good _________ from our estimators of population parameters.

• In this situation, researchers can determine the optimal sample ____ if they:

1) Know what level of __________ is desired (100(1-α)%). I.e. the 95% confidence interval.

2) Know what the _______ difference (D) is that is allowed between the _____ estimate of the population parameter and the true value of the population parameter.

_ = largest allowable sampling _____.

3 Cases:

(I) Population ______ and σ _____

• We want to determine “n” for a 100(1-α)% confidence interval for µ.

Since \(Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\) is N(0, 1), if we require a level of confidence = (1-α), then the Z variable equation results in a 100(1-α)% interval for \((\bar{X} - \mu)\):

\[ -Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le (\bar{X} - \mu) \le Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} . \]

• Concentrating on the right (upper) tail, \((\bar{X} - \mu) \le Z_{\alpha/2}\,\sigma/\sqrt{n}\) means the ________ value that \((\bar{X} - \mu)\) can assume is \(Z_{\alpha/2}\,\sigma/\sqrt{n}\).

• Recall, the researcher dictates the _______ sampling _____ allowed for \((\bar{X} - \mu)\) = D.

• Since this error is on either side of the true mean, we can set

\[ D = |\bar{X} - \mu| = Z_{\alpha/2}\,\_\_\_ . \]

• Solving for _, we can determine the value of ‘n’ that guarantees with 100(1-α)% confidence that \(|\bar{X} - \mu|\) will be no ______ than D.

⇒ To Determine the _______ Sample ____ in Estimating the Mean:

\[ \_ = \frac{(Z_{\alpha/2})^2\,\sigma^2}{\_^2} \qquad \text{(Equation 9.6)} \]


Example: Credit Card Debt

We want a __% confidence interval for the mean percentage of students who pay off their credit cards, such that our sample mean differs from µ, \(|\bar{X} - \mu|\), by no more than _._%.

\[ |\bar{X} - \mu| \le D = \_.\_\% \]

If: σ = _, \(Z_{\alpha/2}\) = _.___ when α/2 = 0.___

• Substitute into Equation 9.6:

\[ n = \frac{(Z_{\alpha/2})^2\,\sigma^2}{D^2} \]

\[ n = \frac{(2.576)^2 (8)^2}{\_.\_\_} \]

\[ n = \frac{(20.608)^2}{\_.\_\_} \]

\[ n = (13.738)^2 = \_\_\_.75 \]

• We need a random sample of size ___ students to ensure that __% of the time the value of __ will be within _._% of the true population mean µ.
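The sample-size rule can be written as a one-line function that rounds up to the next whole observation. The sketch below is an illustration only: the function name is invented, and the inputs D = 1.5 and D = 1.25 are inferred from the intermediate figures printed in the worked examples (13.738 = 20.608/1.5, and a total width of 31.36/12.544), not stated outright in the notes.

```python
from math import ceil

def required_sample_size(z, sigma, max_error):
    """Smallest n with z * sigma / sqrt(n) <= max_error (sigma known)."""
    return ceil((z * sigma / max_error) ** 2)

# Z = 2.576 and sigma = 8 appear in the credit card example; D = 1.5 is inferred.
n_credit = required_sample_size(z=2.576, sigma=8, max_error=1.5)

# Fixing the total width of an interval is the same as fixing D = width / 2.
n_width = required_sample_size(z=1.96, sigma=8, max_error=1.25)

print(n_credit, n_width)
```

Rounding up rather than to the nearest integer matters: a fractional requirement such as 157.35 needs 158 observations to keep the error within the stated bound.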


(II) Population Not ______ and σ _____

• By the CLT, the sampling distribution of \(\bar{X}\) approaches the normal distribution as ‘n’ increases.

• Hence, once _ is determined using the same method as above, check to see if ‘n’ is greater than __.

• If it is, our method of solution is appropriate.

(III) Population ______ and σ _______

• If the population is ______, but the standard deviation is _______, use the _-variable

\[ \_ = \frac{\bar{X} - \mu}{\_/\sqrt{n}} \]

instead of Z.

So, first specify the ________ sampling error D = \(|\bar{X} - \mu|\).

PROBLEMS:

• _ is calculated from the sample, but we have not determined the optimal sample size.
• The t-value \(t_{\alpha/2,\nu}\) depends on the sample ____, which is unknown.

⇒ Solve using iteration.


Section 9.8: Confidence Interval for σ2

• If a researcher is interested in determining the “___________” or “volatility” of a population characteristic, it is often desirable to construct a confidence interval for an estimate of an _______ population variance σ².

Example: May be interested in not just the _______ time but the ___________ of time people spend on the Internet.

Example: Variation in ___________ in some tropical location.

Construction of A 100(1-α)% Confidence Interval for σ²

• Recall, when sampling from a ______ population, that

(i) the variable \(\dfrac{(n-1)s^2}{\sigma^2}\) has a \(\chi^2_\nu\) distribution,

(ii) \(\chi^2_\nu\) values are always ________, and

(iii) the \(\chi^2_\nu\) distribution is not ___________. It is asymmetric.

Hence, (1) we ______ use values such as \(\pm\chi^2_{\alpha/2,\nu}\), as were used for the Z and t-distributions.

(2) We must look up two separate values, one for each ____ of the cumulative ___-square distribution \(F(\chi^2_\nu)\).


• So in Table VII:

⇒ The \(\chi^2_\nu\) in the ____ half of the table, \(F(\chi^2_\nu) = \alpha/2\), gives the _____ limit values of the confidence interval.

⇒ The \(\chi^2_\nu\) in the _____ half of the table, \(F(\chi^2_\nu) = 1 - \alpha/2\), gives the _____ limit values of the confidence interval.

Example: ν = (n-1) = __, α = __%, α/2 = 0.__

\(\chi^2_{upper}\) = _ _ for α/2 = 0.__.

\(\chi^2_{lower}\) = _._ _ for α/2 = 0.__.

⇒ The ____ in between these two limits of the chi-square distribution must contain (0.950 - 0.05) = 0.__ of the total area.

⇒ These two limits define the __% probability interval for a chi-square distributed random variable.

[Figure: \(\chi^2_{15}\) density with 0, 7.26, 15 (the mean) and 25 marked on the horizontal axis.]


⇒ Using these \(\chi^2_\nu\) limit values and substituting in the variable \(\chi^2_\nu = \dfrac{(n-1)s^2}{\sigma^2}\), we can derive the following probability ________:

\[ P\left[\chi^2_{lower} \le \frac{(n-1)\,\_\_}{\_\_} \le \chi^2_{upper}\right] = 1 - \alpha . \]

• Solving these inequalities for the unknown parameter σ², we derive the 100(1-α)% confidence interval for σ², parent population normal:

\[ \frac{(n-1)s^2}{\chi^2_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{lower}} \qquad \Leftarrow \text{Equation 9.8} \]

• Notice the _____ value of \(\chi^2_\nu\) lands in the __________ of the lower endpoint for σ², and the _____ value of \(\chi^2_\nu\) is located in the denominator of the _____ endpoint for σ².


Example: A consumer research group is interested in the variation of minutes of operation of several brands of _________. We label each type of battery ‘A’, ‘B’ and ‘C’. Company A claims its ____ battery lifetime is ___ minutes. Company B claims an average of ___ minutes and Company C claims its average life is ___ minutes. The consumer group is interested in estimating the ________ of the population of batteries from each company, using a sample of size _.

The best point estimate of σ² is s², which will be determined from the samples. To create an interval estimate for σ² with a confidence level of __% (and hence a risk of _____ = 0.05), compute the confidence interval for each company.

       Company A               Company B               Company C
   Xi      (Xi − X̄)²       Xi      (Xi − X̄)²       Xi      (Xi − X̄)²
   745                     685                     735
   755                     600                     740
   751                     710                     760
   747                     779                     690
   731                     650                     682
 Σ = 3729  Σ = 332.8     Σ = 3424  Σ = 17,910.8  Σ = 3607  Σ = 4,557.28

 X̄A = ___.8              X̄B = 684.8              X̄C = 721.4
 s²A = 83.2               s²B = 4477.7            s²C = 1139.32

Recall: \( s^2 = \dfrac{1}{n-1}\sum (X_i - \bar{X})^2 \)


⇒ To compute the confidence interval for each, we need the \(\chi^2_\nu\) limit values for ν = _, α = 0.__, α/2 = 0.___.

Looking at the chi-square distribution table:

\(\chi^2_{lower}\) = 0._ _ _
\(\chi^2_{upper}\) = _ _._

\[ \frac{(n-1)s^2}{\chi^2_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{lower}} \qquad \Leftarrow \text{Equation 9.8} \]

Company A:

= (_)(83.2)/_ _._ ≤ σ² ≤ (_)(83.2)/0._ _ _

= 29.98 ≤ σ² ≤ 687.60 ⇐ width of battery A = 657.62 minutes²

Company B:

= (_)(4477.7)/_ _._ ≤ σ² ≤ (_)(4477.7)/0._ _ _

= 1613.58 ≤ σ² ≤ 37,005.78 ⇐ width of battery B = 35,392.21 minutes²

Company C:

= (_)(1139.32)/_ _._ ≤ σ² ≤ (_)(1139.32)/0._ _ _

= 410.57 ≤ σ² ≤ 9415.87 ⇐ width of battery C = 9005.3 minutes²


• Taking square roots, we convert back to the original units:

Company   Lower Boundary   Upper Boundary   Width (minutes)
A              5.48             26.22
B             40.17            192.37
C             20.26             97.04

Example: (9.43) A sample of size __ has a standard deviation _. The population measure is the distance (kilometres) between home and the work location of commuters. Find a __% confidence limit on the true population ________. Assume that the population is ________ distributed.

n = __
n-1 = __
s = _
α = 0.__
α/2 = 0.__

A confidence interval for the population variance: Looking at Table VII:

\(\chi^2_{upper}\) = _ _._ for α/2 = 0.05. \(\chi^2_{lower}\) = _._ _ for α/2 = 0.05.

\[ \frac{(n-1)s^2}{\chi^2_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{lower}} \]

= (_ _)(9)/_ _._ ≤ σ² ≤ (_ _)(9)/_._ _

= (5.32 ≤ σ² ≤ 19.18)


Example: Suppose \(\Psi \sim \chi^2_{20}\). Find \(\chi^2_{lower}\) and \(\chi^2_{upper}\) such that \(P(\chi^2_{lower} \le \Psi \le \chi^2_{upper})\) = 0.__

(i) Degrees of freedom ν = (n-1) = __

(1-α) = 0.__, which implies that α = 0.05 and α/2 = 0.025.

(ii) \(\chi^2_{lower}\) is such that \(P(\Psi \le \chi^2_{lower})\) = 0.0__, where \(\Psi \sim \chi^2_{20}\).

⇒ From Table VII, \(\chi^2_{lower}\) = _____

(iii) \(\chi^2_{upper}\) is such that \(P(\Psi \ge \chi^2_{upper})\) = 0.0__, which implies that \(P(\Psi \le \chi^2_{upper})\) = 1 - 0.025 = 0.___.

⇒ From Table VII, \(\chi^2_{upper}\) = ____

So: \(P(9.59 \le \Psi \le 34.2)\) = 0.__.


⇒ Assume s² = __. Now we can construct a __% confidence interval for σ²:

Let \(\Psi = \dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}\).

So, a 100(1-α)% confidence interval for σ², parent population normal, is:

\[ \frac{(n-1)s^2}{\chi^2_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{lower}} \]

= (20)(39)/_ _._ ≤ σ² ≤ (20)(39)/_._ _

= (780)/_ _._ ≤ σ² ≤ (780)/_._ _

= (22.81) ≤ σ² ≤ (81.33)
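The variance interval is a direct substitution once the two chi-square table values are known. Python's standard library has no chi-square quantile function, so the sketch below takes the table values as inputs; 9.59 and 34.2 are the limits for 20 degrees of freedom quoted in the example above, and 20 and 39 are the figures that appear in its arithmetic. The function name is an illustrative choice.

```python
def variance_confidence_interval(s2, df, chi2_lower, chi2_upper):
    """Interval for sigma^2 from a normal population (Equation 9.8).

    The upper chi-square value goes in the denominator of the lower endpoint,
    and the lower chi-square value in the denominator of the upper endpoint.
    """
    return df * s2 / chi2_upper, df * s2 / chi2_lower

low, high = variance_confidence_interval(
    s2=39, df=20, chi2_lower=9.59, chi2_upper=34.2
)
print(f"{low:.2f} <= sigma^2 <= {high:.2f}")

# Square roots convert the limits back to the original units.
print(f"{low ** 0.5:.2f} <= sigma <= {high ** 0.5:.2f}")
```

Unlike the intervals for the mean, this one is not symmetric around s², which reflects the asymmetry of the chi-square distribution.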
