
STAT355 - Probability & Statistics
Chapter 7: Statistical Intervals Based on a Single Sample

Fall 2011


Chapter 7 - Statistical Intervals Based on a Single Sample

7.1 Basic Properties of Confidence Intervals

7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion

7.3 Intervals Based on a Normal Population Distribution

7.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population


Basic Properties of Confidence Intervals

Consider a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$, and let $x_1, \ldots, x_n$ be the actual observations of the random sample.

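Sample mean: $\bar{X} \sim N(\mu, \sigma^2/n)$.

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$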
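As a quick numerical check of the 0.95 figure, the sketch below evaluates $\Phi(1.96) - \Phi(-1.96)$ with Python's standard-library `statistics.NormalDist`. It is only an illustration; the course itself uses tables and R.

```python
from statistics import NormalDist

# Standard normal distribution N(0, 1)
Z = NormalDist(mu=0.0, sigma=1.0)

# P(-1.96 <= Z <= 1.96) = Phi(1.96) - Phi(-1.96)
prob = Z.cdf(1.96) - Z.cdf(-1.96)
print(round(prob, 4))
```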


Basic Properties of Confidence Intervals

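$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$

Multiplying each part by $\sigma/\sqrt{n}$, subtracting $\bar{X}$, and multiplying by $-1$ (which reverses both inequalities) shows that this is equivalent to

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

Thus,

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

is a random interval that covers the true value of µ with probability 0.95.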


Basic Properties of Confidence Intervals

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \qquad (1)$$

is a random interval that covers the true value of µ with probability 0.95.

Definition

If, after observing $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into (1) in place of $\bar{X}$, the resulting fixed interval

$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

is called a 95% confidence interval for µ.
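The 95% describes the long-run behavior of the procedure. Over many samples, about 95% of the intervals computed this way contain µ, while any single observed interval either does or does not. The Monte Carlo sketch below illustrates this. The population values (µ = 10, σ = 2, n = 25), the seed, and the number of repetitions are arbitrary choices for illustration, not values from the course.

```python
import math
import random

# Hypothetical population parameters, chosen only for illustration
MU, SIGMA, N = 10.0, 2.0, 25
REPS = 10_000

rng = random.Random(355)  # fixed seed so the run is reproducible
half_width = 1.96 * SIGMA / math.sqrt(N)

covered = 0
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # The observed interval (xbar - half_width, xbar + half_width)
    if xbar - half_width <= MU <= xbar + half_width:
        covered += 1

coverage = covered / REPS
print(f"fraction of intervals covering mu: {coverage:.3f}")
```

The printed fraction should be close to 0.95, and it fluctuates slightly with the seed.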


Basic Properties of Confidence Intervals

Definition

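A $100(1-\alpha)\%$ confidence interval for the mean µ of a normal population when the value of σ² is known is given by

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

or, equivalently, by

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$\alpha = 0.10$: $z_{\alpha/2} = z_{0.05} = 1.645$

$\alpha = 0.05$: $z_{\alpha/2} = z_{0.025} = 1.96$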
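Rather than reading $z_{\alpha/2}$ from a table, it can be computed as the $(1-\alpha/2)$ quantile of N(0, 1). Here is a minimal sketch using `statistics.NormalDist.inv_cdf`. The helper names `z_critical` and `z_interval` are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Return z_{alpha/2}, the value with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar: float, sigma: float, n: int, alpha: float = 0.05):
    """100(1 - alpha)% CI for mu when sigma is known: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    half = z_critical(alpha) * sigma / math.sqrt(n)
    return xbar - half, xbar + half

print(round(z_critical(0.10), 3))  # 1.645
print(round(z_critical(0.05), 3))  # 1.96
```

For example, with hypothetical values x̄ = 80, σ = 5, and n = 25, `z_interval(80.0, 5.0, 25)` gives roughly (78.04, 81.96).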


Example

Exercise 1: Consider a normal population with the value of σ known.

1 What is the confidence level for the interval $\bar{x} \pm 2.81\sigma/\sqrt{n}$?

2 What is the confidence level for the interval $\bar{x} \pm 1.44\sigma/\sqrt{n}$?

3 What value of $z_{\alpha/2}$ will result in a confidence level of 99.7%?
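One way to check answers to Exercise 1 uses two facts. The interval $\bar{x} \pm z\sigma/\sqrt{n}$ has confidence level $P(-z \le Z \le z) = 2\Phi(z) - 1$. Conversely, $z_{\alpha/2}$ for a given level is the $(1-\alpha/2)$ quantile of N(0, 1). The sketch below computes both with the standard library. The function names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()

def confidence_level(z: float) -> float:
    """Confidence level of xbar +/- z*sigma/sqrt(n): P(-z <= Z <= z) = 2*Phi(z) - 1."""
    return 2 * Z.cdf(z) - 1

def z_for_level(level: float) -> float:
    """z_{alpha/2} giving the requested confidence level, with alpha = 1 - level."""
    alpha = 1 - level
    return Z.inv_cdf(1 - alpha / 2)

level_a = confidence_level(2.81)
level_b = confidence_level(1.44)
z_c = z_for_level(0.997)
print(f"{level_a:.3f}  {level_b:.3f}  {z_c:.2f}")
```

This gives levels of about 99.5% and 85.0%, and $z_{\alpha/2} \approx 2.97$ for a 99.7% level.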


Large-Sample Confidence Intervals for a Population Mean

Consider a random sample $X_1, \ldots, X_n$ from a population with mean µ and variance σ². Often σ² is unknown. Let S be the sample standard deviation.

Proposition

If n is sufficiently large, the standardized variable

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has approximately a standard normal distribution. This implies that

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

is a large-sample confidence interval for µ with confidence level approximately $100(1-\alpha)\%$. This formula is valid regardless of the shape of the population distribution.
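The large-sample interval replaces σ by s and otherwise matches the known-σ formula. Here is a minimal sketch using only the standard library. The function name `large_sample_ci` is hypothetical, and the sketch does not check whether n is "sufficiently large"; that judgment is left to the user.

```python
import math
import statistics
from statistics import NormalDist

def large_sample_ci(data, alpha=0.05):
    """Approximate 100(1 - alpha)% CI for mu: xbar +/- z_{alpha/2} * s / sqrt(n).

    Intended for large n; the population need not be normal.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half
```

For instance, with the toy input `list(range(1, 101))` (not course data), the interval is centered at 50.5 with a half-width of about 5.69.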


A Confidence Interval for a Population Proportion

Let p denote the proportion of “successes” in a population.

A random sample of n individuals is to be selected, and X is the number of successes in the sample.

Provided that n is small compared to the population size, X can be regarded as a binomial rv with

$$E(X) = np \quad \text{and} \quad \sigma_X = \sqrt{np(1-p)}$$

Furthermore, if both $np \ge 10$ and $n(1-p) \ge 10$, then X has approximately a normal distribution.


A Confidence Interval for a Population Proportion

The natural estimator of p is p̂ = X/n, the sample fraction of successes.

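Since p̂ is just X multiplied by the constant 1/n, p̂ also has approximately a normal distribution.

We know that $E(\hat{p}) = p$ (p̂ is unbiased) and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.

The standard deviation $\sigma_{\hat{p}}$ involves the unknown parameter p. Standardizing p̂ by subtracting p and dividing by $\sigma_{\hat{p}}$ then implies that

$$P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) \approx 1 - \alpha$$

Replacing each ≤ by = and solving the resulting quadratic equation for p gives the two endpoints of the interval in the following proposition.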


A Confidence Interval for a Population Proportion

Proposition

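Let

$$\tilde{p} = \frac{\hat{p} + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n}.$$

Then a confidence interval for a population proportion p with confidence level approximately $100(1-\alpha)\%$ is

$$\tilde{p} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}$$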
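Below is a direct transcription of the proposition into Python. It assumes $z_{\alpha/2}$ is taken from `NormalDist` rather than a table. This interval is often called the Wilson score interval. The function name `score_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def score_interval(x: int, n: int, alpha: float = 0.05):
    """Approximate 100(1 - alpha)% CI for a proportion p.

    x: number of successes, n: sample size.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    z2 = z * z
    p_hat = x / n
    denom = 1 + z2 / n
    p_tilde = (p_hat + z2 / (2 * n)) / denom  # center of the interval
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return p_tilde - half, p_tilde + half
```

For example, `score_interval(50, 100)` (hypothetical counts) gives about (0.404, 0.596), which is symmetric about 0.5.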


Exercise (7.2) 21

In a sample of 1000 randomly selected consumers who had opportunities to send in a rebate claim form after purchasing a product, 250 of these people said they never did so. Calculate an upper confidence bound at the 95% confidence level for the true proportion of such consumers who never apply for a rebate. Based on this bound, is there compelling evidence that the true proportion of such consumers is smaller than 1/3?
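The slides do not state the one-sided version of the score interval. The sketch below assumes the usual textbook approach: replace $z_{\alpha/2}$ by $z_\alpha$ and keep only the + sign. Under that assumption, the hypothetical helper `score_upper_bound` computes the bound for Exercise 21.

```python
import math
from statistics import NormalDist

def score_upper_bound(x: int, n: int, alpha: float = 0.05) -> float:
    """One-sided 100(1 - alpha)% upper confidence bound for p.

    Assumption: obtained from the two-sided score formula by replacing z_{alpha/2}
    with z_alpha and keeping only the '+' sign.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # z_alpha, about 1.645 for alpha = 0.05
    z2 = z * z
    p_hat = x / n
    denom = 1 + z2 / n
    p_tilde = (p_hat + z2 / (2 * n)) / denom
    return p_tilde + z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom

ub = score_upper_bound(250, 1000)
print(f"95% upper bound: {ub:.4f}; below 1/3: {ub < 1/3}")
```

The bound comes out to roughly 0.273, well below 1/3. Under this approach, then, the data support the claim that the true proportion is smaller than 1/3.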


Intervals Based on a Normal Population Distribution

The CI for µ presented earlier is valid provided that n is large.

The resulting interval can be used whatever the nature of the population distribution.

The CLT cannot be invoked, however, when n is small.

In this case, one way to proceed is to make a specific assumption about the form of the population distribution and then derive a CI tailored to that assumption.


Intervals Based on a Normal Population Distribution

Assumption

The population of interest is normal, so that $X_1, \ldots, X_n$ constitutes a random sample from a normal distribution with both µ and σ² unknown.


Intervals Based on a Normal Population Distribution

The key result underlying the interval in the earlier section was that for large n, the rv $Z = (\bar{X} - \mu)/(S/\sqrt{n})$ has approximately a standard normal distribution.

When n is small, S is no longer likely to be close to σ, so the variability in the distribution of Z arises from randomness in both the numerator and the denominator.

This implies that the probability distribution of $(\bar{X} - \mu)/(S/\sqrt{n})$ will be more spread out than the standard normal distribution.


Intervals Based on a Normal Population Distribution

The result on which inferences are based introduces a new family of probability distributions called t distributions.

Theorem

When X̄ is the mean of a random sample of size n from a normal distribution with mean µ, the rv

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a probability distribution called a t distribution with n − 1 degrees of freedom (df).
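A small simulation can show why the t distribution is needed. When n is small, T lands beyond ±1.96 noticeably more often than the 5% a standard normal variable would. The settings below (a N(0, 1) population, n = 5, 20,000 repetitions, the seed) are hypothetical choices for illustration.

```python
import math
import random
import statistics

# Hypothetical settings for illustration: a normal population and a small n
MU, SIGMA, N = 0.0, 1.0, 5
REPS = 20_000
rng = random.Random(2011)

beyond = 0
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    t = (statistics.fmean(sample) - MU) / (statistics.stdev(sample) / math.sqrt(N))
    if abs(t) > 1.96:
        beyond += 1

frac = beyond / REPS
# For a standard normal variable this tail fraction would be about 0.05
print(f"fraction of |T| > 1.96 with n = {N}: {frac:.3f}")
```

With n = 5 (4 df), the fraction is roughly 0.12, more than double the normal-curve value.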


Properties of t Distributions

Although the variable of interest is still $(\bar{X} - \mu)/(S/\sqrt{n})$, we now denote it by T to emphasize that it does not have a standard normal distribution when n is small.

We know that a normal distribution is governed by two parameters; each different choice of µ in combination with σ² gives a particular normal distribution.

Any particular t distribution results from specifying the value of a single parameter, called the number of degrees of freedom, abbreviated df.


Properties of t Distributions

We'll denote this parameter by the Greek letter ν. Possible values of ν are the positive integers 1, 2, 3, .... So there is a t distribution with 1 df, another with 2 df, yet another with 3 df, and so on.

For any fixed value of ν, the density function that specifies the associated t curve is even more complicated than the normal density function.

Fortunately, we need concern ourselves only with several of the more important features of these curves.


Properties of t Distributions

Let $t_\nu$ denote the t distribution with ν df.

1 Each $t_\nu$ curve is bell-shaped and centered at 0.

2 Each $t_\nu$ curve is more spread out than the standard normal (z) curve.

3 As ν increases, the spread of the corresponding $t_\nu$ curve decreases.

4 As ν → ∞, the sequence of $t_\nu$ curves approaches the standard normal curve (so the z curve is often called the t curve with df = ∞).


Properties of t Distributions

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

The number of df for T is n − 1 because, although S is based on the n deviations $X_1 - \bar{X}, \ldots, X_n - \bar{X}$, the fact that $\sum (X_i - \bar{X}) = 0$ implies that only n − 1 of these are “freely determined.”

The number of df for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of T is based.

The use of t distributions in making inferences requires notation for capturing t-curve tail areas, analogous to $z_\alpha$ for the z curve.


Properties of t Distributions

Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with ν df to the right of $t_{\alpha,\nu}$ is α; $t_{\alpha,\nu}$ is called a t critical value.

For example, $t_{0.05,6}$ is the t critical value that captures an upper-tail area of 0.05 under the t curve with 6 df.

Because t curves are symmetric about zero, $-t_{\alpha,\nu}$ captures lower-tail area α.

Appendix Table A.5 gives $t_{\alpha,\nu}$ for selected values of α and ν.

The columns of the table correspond to different values of α. To obtain $t_{0.05,15}$, go to the α = 0.05 column, look down to the ν = 15 row, and read $t_{0.05,15} = 1.753$.
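Table A.5 is not reproduced here. As a stand-in, the sketch below computes t critical values from the t density numerically, using Simpson's rule for the tail area and bisection for the quantile. It needs only the Python standard library. The helper names are hypothetical, and the crude integration is intended for moderate α and ν only. In R, `qt(1 - alpha, df)` returns the same quantity.

```python
import math

def t_pdf(x: float, nu: int) -> float:
    """Density of the t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(t: float, nu: int, steps: int = 2000) -> float:
    """Area to the right of t (t >= 0): 0.5 minus the area on [0, t], via Simpson's rule."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def t_critical(alpha: float, nu: int) -> float:
    """t_{alpha,nu} for 0 < alpha < 0.5, by bisection on the upper-tail area."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, nu) > alpha:  # grow the bracket until it contains t_{alpha,nu}
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > alpha:
            lo = mid  # too much area to the right: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 15), 3))  # 1.753, the Table A.5 entry
for nu in (5, 30, 1000):
    # t_{0.025,nu} shrinks toward z_{0.025} = 1.96 as nu grows (property 4)
    print(nu, round(t_critical(0.025, nu), 3))
```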


The One-Sample t Confidence Interval

Proposition

Let x̄ and s be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean µ. Then a $100(1-\alpha)\%$ confidence interval for µ is

$$\left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$$

or, more compactly,

$$\bar{x} \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}$$


The One-Sample t Confidence Interval

Example (11):

Even as traditional markets for sweetgum lumber have declined, large section solid timbers traditionally used for construction bridges and mats have become increasingly scarce.

The article “Development of Novel Industrial Laminated Planks from Sweetgum Lumber” (J. of Bridge Engr., 2008: 64-66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber.

Here is data on the modulus of rupture:

6807.99 7637.06 6663.28 6165.03 6991.41 6992.23 6981.46 7569.75
7437.88 6872.39 7663.18 6032.28 6906.04 6617.17 6984.12 7093.71
7659.50 7378.61 7295.54 6702.76 7440.17 8053.26 8284.75 7347.95
7422.69 7886.87 6316.67 7713.65 7503.33 7674.99


The One-Sample t Confidence Interval

Use R software.
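In R, `t.test(x)$conf.int` reports this interval. For readers without R, here is a minimal Python sketch of the same one-sample t interval for the modulus of rupture data. The critical value $t_{0.025,29} = 2.045$ is a standard t-table value entered by hand rather than computed.

```python
import math
import statistics

# Modulus of rupture data from Example (11), n = 30
mor = [
    6807.99, 7637.06, 6663.28, 6165.03, 6991.41, 6992.23, 6981.46, 7569.75,
    7437.88, 6872.39, 7663.18, 6032.28, 6906.04, 6617.17, 6984.12, 7093.71,
    7659.50, 7378.61, 7295.54, 6702.76, 7440.17, 8053.26, 8284.75, 7347.95,
    7422.69, 7886.87, 6316.67, 7713.65, 7503.33, 7674.99,
]

n = len(mor)
xbar = statistics.fmean(mor)
s = statistics.stdev(mor)  # sample standard deviation, divisor n - 1
t_crit = 2.045             # t_{0.025, 29} from a t table
half = t_crit * s / math.sqrt(n)
lower, upper = xbar - half, xbar + half
print(f"n = {n}, xbar = {xbar:.3f}, s = {s:.3f}")
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

This gives x̄ ≈ 7203.19, s ≈ 543.54, and a 95% CI of roughly (7000.25, 7406.13).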


The One-Sample t Confidence Interval

Example (12)

Consider the following sample of fat content (in percentage) of n = 10 randomly selected hot dogs (“Sensory and Mechanical Assessment of the Quality of Frankfurters,” J. of Texture Studies, 1990: 395-409):

25.2 21.3 22.8 17.0 29.8 21.0 25.5 16.0 20.9 19.5

Assuming that these were selected from a normal population distribution, find a 95% CI for (interval estimate of) the population mean fat content.

Use your calculator to obtain x̄ and s.
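As a cross-check on calculator work, the sketch below computes x̄, s, and the one-sample t interval under the stated normality assumption. The value $t_{0.025,9} = 2.262$ is entered by hand from a t table.

```python
import math
import statistics

fat = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]

n = len(fat)
xbar = statistics.fmean(fat)
s = statistics.stdev(fat)  # sample standard deviation, divisor n - 1
t_crit = 2.262             # t_{0.025, 9} from a t table
half = t_crit * s / math.sqrt(n)
print(f"xbar = {xbar:.2f}, s = {s:.3f}")
print(f"95% CI for mu: ({xbar - half:.2f}, {xbar + half:.2f})")
```

This gives x̄ = 21.90, s ≈ 4.134, and a 95% CI of about (18.94, 24.86).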


The Chi-Squared (χ²) Distribution

Definition

Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with parameters µ and σ². Then the rv

$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared (χ²) probability distribution with ν = n − 1 df.

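Notation: Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the horizontal axis such that α of the area under the chi-squared curve with ν df lies to the right of $\chi^2_{\alpha,\nu}$.

Remark: The chi-squared distribution is not symmetric, so a two-sided interval needs both $\chi^2_{1-\alpha/2,\nu}$ and $\chi^2_{\alpha/2,\nu}$, each looked up separately.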


Confidence Interval for σ²

From the theorem,

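$$P\left(\chi^2_{1-\alpha/2,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,n-1}\right) = 1 - \alpha$$

we get the inequalities

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}$$

A $100(1-\alpha)\%$ confidence interval for the variance σ² of a normal population is

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$$

Taking square roots of the endpoints gives a confidence interval for σ.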
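To illustrate the formula, the sketch applies it to the hot dog sample from Example (12), which the slides already assume comes from a normal population. This pairing is an illustration chosen here, not a worked example from the slides. The critical values $\chi^2_{0.025,9} = 19.023$ and $\chi^2_{0.975,9} = 2.700$ are standard table values entered by hand.

```python
import math
import statistics

# Illustration only: reuse the hot dog fat-content sample from Example (12)
fat = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]
n = len(fat)
ss = (n - 1) * statistics.variance(fat)  # (n - 1) s^2

# Chi-squared critical values for nu = 9 df from a chi-squared table
chi2_upper = 19.023  # chi^2_{0.025, 9}
chi2_lower = 2.700   # chi^2_{0.975, 9}

var_ci = (ss / chi2_upper, ss / chi2_lower)
sd_ci = tuple(math.sqrt(v) for v in var_ci)  # square roots give a CI for sigma
print(f"95% CI for sigma^2: ({var_ci[0]:.2f}, {var_ci[1]:.2f})")
print(f"95% CI for sigma:   ({sd_ci[0]:.2f}, {sd_ci[1]:.2f})")
```

This yields roughly (8.09, 56.97) for σ² and (2.84, 7.55) for σ. The interval is not symmetric about s² = 17.09, reflecting the skewness of the chi-squared distribution.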


(Suppl) 51

An April 2009 survey of 2253 American adults conducted by the Pew Research Center’s Internet & American Life Project revealed that 1262 of the respondents had at some point used wireless means for online access.

1 Calculate and interpret a 95% CI for the proportion of all American adults who at the time of the survey had used wireless means for online access.

2 What sample size is required if the desired width of the 95% CI is to be at most 0.04, irrespective of the sample results?
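A sketch for both parts, under stated assumptions. Part 1 uses the score interval from Section 7.2. For part 2, the slides give no sample-size formula, so one is derived here. With the worst case p̂ = 0.5, the score-interval width simplifies to $z_{\alpha/2}/\sqrt{n + z_{\alpha/2}^2}$. Requiring this to be at most w gives $n \ge z_{\alpha/2}^2/w^2 - z_{\alpha/2}^2$. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def score_interval(x, n, alpha=0.05):
    """Two-sided score CI for p (the proposition in Section 7.2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    z2 = z * z
    p_hat = x / n
    denom = 1 + z2 / n
    p_tilde = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return p_tilde - half, p_tilde + half

def n_for_width(w, alpha=0.05):
    """Smallest n whose score-CI width is at most w for every p_hat.

    The width is largest at p_hat = 0.5, where it equals z / sqrt(n + z^2);
    setting that <= w gives n >= z^2 / w^2 - z^2.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z * z / (w * w) - z * z)

lo, hi = score_interval(1262, 2253)
print(f"(1) 95% CI for p: ({lo:.4f}, {hi:.4f})")
print(f"(2) required n: {n_for_width(0.04)}")
```

Part 1 gives about (0.540, 0.581). With 95% confidence, then, roughly 54% to 58% of American adults had used wireless means for online access at the time. Part 2 gives n = 2398. The simpler large-sample formula $n = z_{\alpha/2}^2/w^2$ with p̂ = 0.5 gives 2401 instead; the slide does not say which formula the course expects.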
