Chap7

Post on 21-Jan-2016


STAT355 - Probability & Statistics
Chapter 7: Statistical Intervals Based on a Single Sample

Fall 2011


Chapter 7 - Statistical Intervals Based on a Single Sample

1 7.1 Basic Properties of Confidence Intervals

2 7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion

3 7.3 Intervals Based on a Normal Population Distribution

4 7.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population


Basic Properties of Confidence Intervals

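Consider a random sample $X_1, \dots, X_n$ from $N(\mu, \sigma^2)$, and let $x_1, \dots, x_n$ be the actual observations of the random sample.

The sample mean satisfies $\bar{X} \sim N(\mu, \sigma^2/n)$, so

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$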


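The probability statement

$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$

is equivalent to

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$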

Thus,

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \tag{1}$$

is a random interval that includes (covers) the true value of $\mu$ with probability 0.95.

Definition

If, after observing $X_1 = x_1, X_2 = x_2, \dots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into (1) in place of $\bar{X}$, the resulting fixed interval

$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

is called a 95% confidence interval for $\mu$.
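The definition can be tried out numerically. The following is a minimal Python sketch, assuming a known $\sigma$; the function name `z_interval`, the sample values, and the value `sigma=0.3` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(data, sigma, level=0.95):
    """Two-sided CI for the mean of a normal population with known sigma."""
    n = len(data)
    alpha = 1 - level
    # z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    x_bar = mean(data)
    return x_bar - half_width, x_bar + half_width

# Hypothetical observations and an assumed known sigma (not from the slides).
sample = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7, 5.1]
lower, upper = z_interval(sample, sigma=0.3)
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```

The interval is fixed once the data are observed; the 95% refers to the long-run behaviour of the random interval (1), not to this particular pair of numbers.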


Definition

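A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma^2$ is known is given by

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

or, equivalently, by

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

For example:

$\alpha = 0.1$: $z_{\alpha/2} = z_{0.05} = 1.645$

$\alpha = 0.05$: $z_{\alpha/2} = z_{0.025} = 1.96$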


Example

Exercise 1: Consider a normal population with the value of $\sigma$ known.

1 What is the confidence level for the interval $\bar{x} \pm 2.81\sigma/\sqrt{n}$?

2 What is the confidence level for the interval $\bar{x} \pm 1.44\sigma/\sqrt{n}$?

3 What is the value of $z_{\alpha/2}$ that will result in a confidence level of 99.7%?
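Questions of this kind move back and forth between a multiplier $z$ and a confidence level. One way to check hand calculations is the short Python sketch below; the helper names `confidence_level` and `multiplier` are hypothetical, and the only library call used is `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

def confidence_level(z):
    """Level of the interval x_bar +/- z*sigma/sqrt(n): P(-z <= Z <= z)."""
    return 2 * STD_NORMAL.cdf(z) - 1

def multiplier(level):
    """z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - level
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

for z in (2.81, 1.44):
    print(f"z = {z}: confidence level = {confidence_level(z):.4f}")
print(f"level = 0.997: z = {multiplier(0.997):.3f}")
```

The two helpers are inverses of each other, so `multiplier(confidence_level(z))` returns `z` up to rounding.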


Large-Sample Confidence Intervals for a Population Mean

Consider $X_1, \dots, X_n$ from $N(\mu, \sigma^2)$. Often, $\sigma^2$ is unknown. Let $S$ be the sample standard deviation.

Proposition

If $n$ is sufficiently large, the standardized variable

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has approximately a standard normal distribution. This implies that

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$. This formula is valid regardless of the shape of the population distribution.
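The claim that the large-sample interval works regardless of the population shape can be explored by simulation. The sketch below is an illustration under stated assumptions: the exponential population with rate 1 (true mean 1), the sample size `n=100`, the number of repetitions, and the seed are all arbitrary choices, and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def large_sample_interval(data, level=0.95):
    """x_bar +/- z_{alpha/2} * s / sqrt(n), with s the sample standard deviation."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * stdev(data) / sqrt(n)
    x_bar = mean(data)
    return x_bar - half_width, x_bar + half_width

def coverage(n=100, reps=2000, seed=355):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of the exponential distribution with rate 1
    hits = 0
    for _ in range(reps):
        data = [rng.expovariate(1.0) for _ in range(n)]
        lower, upper = large_sample_interval(data)
        hits += lower <= true_mean <= upper
    return hits / reps

print(f"Estimated coverage of the nominal 95% interval: {coverage():.3f}")
```

For a skewed population like this one, the estimated coverage should land near, though not exactly at, the nominal 95%, which is what "approximately" means in the proposition.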


A Confidence Interval for a Population Proportion

Let p denote the proportion of “successes” in a population.

A random sample of $n$ individuals is to be selected, and $X$ is the number of successes in the sample.

Provided that $n$ is small compared to the population size, $X$ can be regarded as a binomial rv with

$$E(X) = np \quad\text{and}\quad \sigma_X = \sqrt{np(1-p)}$$

Furthermore, if both $np \ge 10$ and $n(1-p) \ge 10$, then $X$ has approximately a normal distribution.


The natural estimator of $p$ is $\hat{p} = X/n$, the sample fraction of successes.

Since $\hat{p}$ is just $X$ multiplied by the constant $1/n$, $\hat{p}$ also has approximately a normal distribution.

We know that $E(\hat{p}) = p$ (unbiasedness) and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.

The standard deviation $\sigma_{\hat{p}}$ involves the unknown parameter $p$. Standardizing $\hat{p}$ by subtracting $p$ and dividing by $\sigma_{\hat{p}}$ then implies that

$$P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) \approx 1 - \alpha$$


Proposition

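Let

$$\tilde{p} = \frac{\hat{p} + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n}$$

Then a confidence interval for a population proportion $p$ with confidence level approximately $100(1-\alpha)\%$ is

$$\tilde{p} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}$$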
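The formula in the proposition is easier to get right in code than by hand. Here is a minimal Python sketch of it; the function name `score_interval` and the counts `x=40`, `n=100` are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist

def score_interval(x, n, level=0.95):
    """CI for a proportion: p_tilde +/- z*sqrt(p_hat*q_hat/n + z^2/(4n^2)) / (1 + z^2/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    denom = 1 + z**2 / n
    p_tilde = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_tilde - half_width, p_tilde + half_width

# Hypothetical data: 40 successes in 100 trials.
lower, upper = score_interval(x=40, n=100)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

Note that the interval is centered at $\tilde{p}$ rather than at $\hat{p}$, so it is not symmetric about the sample proportion.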


Exercise (7.2) 21

In a sample of 1000 randomly selected consumers who had opportunities to send in a rebate claim form after purchasing a product, 250 of these people said they never did so. Calculate an upper confidence bound at the 95% confidence level for the true proportion of such consumers who never apply for a rebate.

Based on this bound, is there compelling evidence that the true proportion of such consumers is smaller than 1/3?
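One possible way to set up this exercise in Python is sketched below. It assumes the usual one-sided modification of the interval from the preceding proposition: $z_{\alpha/2}$ is replaced by $z_{\alpha}$ and only the upper endpoint is kept. That modification is not spelled out on the slides, and the function name `score_upper_bound` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def score_upper_bound(x, n, level=0.95):
    """One-sided upper confidence bound for p (assumed one-sided score form)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(level)  # z_alpha for a one-sided bound
    center = p_hat + z**2 / (2 * n)
    spread = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (center + spread) / (1 + z**2 / n)

# Data from the exercise: 250 of 1000 sampled consumers never sent in the form.
bound = score_upper_bound(x=250, n=1000)
print(f"95% upper confidence bound for p: {bound:.3f}")
print("Bound is below 1/3:", bound < 1 / 3)
```

Comparing the bound with 1/3 then answers the second part of the exercise.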


Intervals Based on a Normal Population Distribution

The CI for µ presented earlier is valid provided that n is large.

The resulting interval can be used whatever the nature of the population distribution.

The CLT cannot be invoked, however, when $n$ is small.

In this case, one way to proceed is to make a specific assumption about the form of the population distribution and then derive a CI tailored to that assumption.


Assumption

The population of interest is normal, so that $X_1, \dots, X_n$ constitutes a random sample from a normal distribution with both $\mu$ and $\sigma^2$ unknown.


The key result underlying the interval in the earlier section was that for large $n$, the rv $Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has approximately a standard normal distribution.

When $n$ is small, $S$ is no longer likely to be close to $\sigma$, so the variability in the distribution of $Z$ arises from randomness in both the numerator and the denominator.

This implies that the probability distribution of $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ will be more spread out than the standard normal distribution.
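The extra spread can be seen in a small simulation. The sketch below draws repeated samples from $N(0, 1)$, so $\mu = 0$, and records how often the standardized variable falls outside $\pm 1.96$; for a standard normal variable that would happen about 5% of the time. The function name, sample sizes, repetition count, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def tail_fraction(n, reps=5000, cutoff=1.96, seed=7):
    """Fraction of samples of size n with |(x_bar - mu) / (s / sqrt(n))| > cutoff."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mu is 0
        t = mean(data) / (stdev(data) / sqrt(n))
        exceed += abs(t) > cutoff
    return exceed / reps

for n in (5, 30):
    print(f"n = {n:2d}: fraction beyond +/-1.96 = {tail_fraction(n):.3f}")
```

With a very small sample the fraction should come out clearly above 0.05, and it should move toward 0.05 as $n$ grows.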


The result on which inferences are based introduces a new family of probability distributions called t distributions.

Theorem

When $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the rv

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a probability distribution called a t distribution with $n - 1$ degrees of freedom (df).


Properties of t Distributions

Although the variable of interest is still $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$, we now denote it by $T$ to emphasize that it does not have a standard normal distribution when $n$ is small.

We know that a normal distribution is governed by two parameters; each different choice of $\mu$ in combination with $\sigma^2$ gives a particular normal distribution.

Any particular t distribution results from specifying the value of a single parameter, called the number of degrees of freedom, abbreviated df.


We'll denote this parameter by the Greek letter $\nu$. Possible values of $\nu$ are the positive integers 1, 2, 3, ... So there is a t distribution with 1 df, another with 2 df, yet another with 3 df, and so on.

For any fixed value of $\nu$, the density function that specifies the associated t curve is even more complicated than the normal density function.

Fortunately, we need concern ourselves only with several of the more important features of these curves.


Let tν denote the t distribution with ν df.

1 Each tν curve is bell-shaped and centered at 0.

2 Each tν curve is more spread out than the standard normal (z) curve.

3 As ν increases, the spread of the corresponding tν curve decreases.

4 As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve (so the z curve is often called the t curve with df $= \infty$).


$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

The number of df for $T$ is $n - 1$ because, although $S$ is based on the $n$ deviations $X_1 - \bar{X}, \dots, X_n - \bar{X}$, the fact that $\sum (X_i - \bar{X}) = 0$ implies that only $n - 1$ of these are "freely determined."

The number of df for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of $T$ is based.

The use of the t distribution in making inferences requires notation for capturing t-curve tail areas, $t_\alpha$, analogous to $z_\alpha$ for the z curve.


Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.

For example, $t_{0.05,6}$ is the t critical value that captures an upper-tail area of 0.05 under the t curve with 6 df.

Because t curves are symmetric about zero, $-t_{\alpha,\nu}$ captures lower-tail area $\alpha$.

Appendix Table A.5 gives $t_{\alpha,\nu}$ for selected values of $\alpha$ and $\nu$.

The columns of the table correspond to different values of $\alpha$. To obtain $t_{0.05,15}$, go to the $\alpha = 0.05$ column, look down to the $\nu = 15$ row, and read $t_{0.05,15} = 1.753$.
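When the table is not at hand, a t critical value can be approximated numerically. The sketch below is one way to do it with only the Python standard library: it integrates the standard t density with Simpson's rule and finds the critical value by bisection. This is an illustrative approach, not how Table A.5 was produced; the function names are hypothetical, and `math.gamma` overflows for very large df, so the sketch is meant for moderate degrees of freedom.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t >= 0: 0.5 minus the Simpson integral over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Approximate t_{alpha, df} for 0 < alpha < 0.5 by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha:  # expand until the tail area drops below alpha
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t_(0.05, 15) is approximately {t_critical(0.05, 15):.3f}")
```

The result for $\alpha = 0.05$ and 15 df can be compared with the table lookup described above.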


The One-Sample t Confidence Interval

Proposition

Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$$

or, more compactly,

$$\bar{x} \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}$$


Example (11):

Even as traditional markets for sweetgum lumber have declined, large section solid timbers traditionally used for construction bridges and mats have become increasingly scarce.

The article "Development of Novel Industrial Laminated Planks from Sweetgum Lumber" (J. of Bridge Engr., 2008: 64-66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber.

Here is data on the modulus of rupture:

6807.99 7637.06 6663.28 6165.03 6991.41 6992.23
6981.46 7569.75 7437.88 6872.39 7663.18 6032.28
6906.04 6617.17 6984.12 7093.71 7659.50 7378.61
7295.54 6702.76 7440.17 8053.26 8284.75 7347.95
7422.69 7886.87 6316.67 7713.65 7503.33 7674.99


Use R software.
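The slides point to R for this computation. The same steps can be sketched in Python as follows; this is an illustrative equivalent, not the course's R code. The critical value $t_{0.025,29} = 2.045$ is hard-coded from a standard t table (an assumption, since the slides do not quote it), the function name `t_interval` is hypothetical, and the 30 observations are read from the data above assuming two decimal places per value.

```python
from math import sqrt
from statistics import mean, stdev

# Modulus of rupture data from Example 11 (n = 30).
modulus_of_rupture = [
    6807.99, 7637.06, 6663.28, 6165.03, 6991.41, 6992.23,
    6981.46, 7569.75, 7437.88, 6872.39, 7663.18, 6032.28,
    6906.04, 6617.17, 6984.12, 7093.71, 7659.50, 7378.61,
    7295.54, 6702.76, 7440.17, 8053.26, 8284.75, 7347.95,
    7422.69, 7886.87, 6316.67, 7713.65, 7503.33, 7674.99,
]

def t_interval(data, t_crit):
    """One-sample t CI: x_bar +/- t_crit * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    half_width = t_crit * stdev(data) / sqrt(n)
    return x_bar - half_width, x_bar + half_width

T_CRIT_025_29 = 2.045  # t_{0.025, 29} from a standard t table (assumed value)

lower, upper = t_interval(modulus_of_rupture, T_CRIT_025_29)
print(f"n = {len(modulus_of_rupture)}, x_bar = {mean(modulus_of_rupture):.3f}, "
      f"s = {stdev(modulus_of_rupture):.3f}")
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

The interval is only trustworthy if the normality assumption is plausible for these data, which can be checked with a normal probability plot.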


Example (12)

Consider the following sample of fat content (in percentage) of $n = 10$ randomly selected hot dogs ("Sensory and Mechanical Assessment of the Quality of Frankfurters," J. of Texture Studies, 1990: 395-409):

25.2 21.3 22.8 17.0 29.8 21.0 25.5 16.0 20.9 19.5

Assuming that these were selected from a normal population distribution, find a 95% CI for (interval estimate of) the population mean fat content.

Use your calculator to obtain x̄ and s.
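As a check on the calculator work, the arithmetic can be laid out as follows. The summary values are computed from the ten observations listed above (their sum is 219.0 and the sum of squared deviations is 153.82), and $t_{0.025,9} = 2.262$ is taken from a standard t table, which is an assumption here since the slides do not quote it.

```latex
\bar{x} = \frac{219.0}{10} = 21.90,
\qquad
s = \sqrt{\frac{153.82}{9}} \approx 4.134

\bar{x} \pm t_{0.025,9}\,\frac{s}{\sqrt{n}}
  = 21.90 \pm 2.262 \cdot \frac{4.134}{\sqrt{10}}
  \approx 21.90 \pm 2.96
  = (18.94,\ 24.86)
```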


The Chi-Squared (χ2) Distribution

Definition

Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma^2$. Then the rv

$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared ($\chi^2$) probability distribution with $\nu = n - 1$ df.

Notation: Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the horizontal axis such that $\alpha$ of the area under the chi-squared curve with $\nu$ df lies to the right of $\chi^2_{\alpha,\nu}$.

Remark: The chi-squared distribution is not symmetric.


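Confidence Interval for $\sigma^2$

From the theorem,

$$P\left(\chi^2_{1-\alpha/2,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,n-1}\right) = 1 - \alpha$$

we get the inequalities

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}$$

A $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population is

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$$

A confidence interval for $\sigma$ is obtained by taking the square root of each endpoint.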
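To see the variance interval in action, the sketch below applies it to the hot dog fat-content data from Example 12. Reusing that data set here is an illustrative choice, not something the slides do. The critical values $\chi^2_{0.025,9} = 19.023$ and $\chi^2_{0.975,9} = 2.700$ are taken from a standard chi-squared table (assumed values), and the function name `variance_interval` is hypothetical.

```python
from math import sqrt
from statistics import variance

def variance_interval(data, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2 / chi2_{alpha/2}, (n-1)s^2 / chi2_{1-alpha/2})."""
    n = len(data)
    scaled = (n - 1) * variance(data)  # (n - 1) * s^2
    return scaled / chi2_upper, scaled / chi2_lower

# Fat content data from Example 12 (n = 10, so 9 df).
fat_content = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]

CHI2_025_9 = 19.023  # chi-squared critical value, upper-tail area 0.025, 9 df (table value)
CHI2_975_9 = 2.700   # chi-squared critical value, upper-tail area 0.975, 9 df (table value)

var_lo, var_hi = variance_interval(fat_content, CHI2_025_9, CHI2_975_9)
print(f"95% CI for sigma^2: ({var_lo:.2f}, {var_hi:.2f})")
print(f"95% CI for sigma:   ({sqrt(var_lo):.2f}, {sqrt(var_hi):.2f})")
```

Because the chi-squared distribution is not symmetric, the interval is not centered at $s^2$: the larger critical value produces the lower endpoint.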


(Suppl) 51

An April 2009 survey of 2253 American adults conducted by the Pew Research Center's Internet & American Life Project revealed that 1262 of the respondents had at some point used wireless means for online access.

1 Calculate and interpret a 95% CI for the proportion of all American adults who at the time of the survey had used wireless means for online access.

2 What sample size is required if the desired width of the 95% CI is to be at most 0.04, irrespective of the sample results?
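For the sample-size question, one common approach is sketched below. It assumes the simpler traditional interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, which is not the interval given on these slides. Its width $2z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ is largest when $\hat{p} = 0.5$, so requiring $z_{\alpha/2}/\sqrt{n} \le w$ guarantees the width whatever the sample shows. A calculation based on the score interval from the earlier proposition may give a slightly different $n$. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_width(width, level=0.95):
    """Smallest n with worst-case width z/sqrt(n) <= width (traditional interval, p_hat = 0.5)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    return ceil((z / width) ** 2)

n_required = sample_size_for_width(0.04)
print(f"Required sample size for width 0.04 at 95%: {n_required}")
```

Halving the target width roughly quadruples the required sample size, since $n$ grows like $1/w^2$.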
