
STAT355 - Probability & Statistics

Chapter 7: Statistical Intervals Based on a Single Sample

Fall 2011


Chapter 7 - Statistical Intervals Based on a Single Sample

7.1 Basic Properties of Confidence Intervals

7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion

7.3 Intervals Based on a Normal Population Distribution

7.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population

Basic Properties of Confidence Intervals

Consider a random sample $X_1, \dots, X_n$ from $N(\mu, \sigma^2)$, and let $x_1, \dots, x_n$ be the actual observations of the random sample.

The sample mean satisfies $\bar{X} \sim N(\mu, \sigma^2/n)$.

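$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$

is equivalent to

$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$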

The interval

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \qquad (1)$$

is a random interval that includes or covers the true value of $\mu$ with probability 0.95.

Definition

If, after observing $X_1 = x_1, X_2 = x_2, \dots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into (1) in place of $\bar{X}$, the resulting fixed interval

$$\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$

is called a 95% confidence interval for $\mu$.

Definition

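A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma^2$ is known is given by

$$\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$

or, equivalently, by

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

For $\alpha = 0.10$: $z_{\alpha/2} = z_{0.05} = 1.645$.

For $\alpha = 0.05$: $z_{\alpha/2} = z_{0.025} = 1.96$.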
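The known-σ interval above can be sketched in a few lines of Python. This is only an illustration: the function names `z_critical` and `z_interval` and the numeric inputs are hypothetical and do not come from the notes, and the critical value is obtained from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(alpha):
    """Return z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar, sigma, n, alpha):
    """100(1 - alpha)% CI for mu when sigma is known."""
    half_width = z_critical(alpha) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical inputs chosen only for illustration (not from the notes).
low, high = z_interval(xbar=80.0, sigma=2.0, n=25, alpha=0.05)
print(round(z_critical(0.10), 3), round(z_critical(0.05), 3))
print(round(low, 3), round(high, 3))
```

Note that the interval narrows as n grows and widens as the confidence level increases, since the half-width is $z_{\alpha/2}\,\sigma/\sqrt{n}$.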

Example

Exercise 1: Consider a normal population with the value of $\sigma$ known.

1. What is the confidence level for the interval $\bar{x} \pm 2.81\,\sigma/\sqrt{n}$?

2. What is the confidence level for the interval $\bar{x} \pm 1.44\,\sigma/\sqrt{n}$?

3. What value of $z_{\alpha/2}$ will result in a confidence level of 99.7%?
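One way to check answers to this exercise is to convert between a multiplier and a confidence level numerically. The sketch below assumes the two-sided setup of the definition above; the helper names `confidence_level` and `multiplier` are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)


def confidence_level(z):
    """Confidence level 1 - alpha of the interval xbar ± z * sigma / sqrt(n)."""
    return 2 * std_normal.cdf(z) - 1


def multiplier(level):
    """z_{alpha/2} that yields the requested two-sided confidence level."""
    alpha = 1 - level
    return std_normal.inv_cdf(1 - alpha / 2)


print(round(confidence_level(2.81), 4))
print(round(confidence_level(1.44), 4))
print(round(multiplier(0.997), 2))
```

The first two values should come out close to 99.5% and 85%, and the third close to 2.97, which is what a normal table lookup would also suggest.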

Large-Sample Confidence Intervals for a Population Mean

Consider $X_1, \dots, X_n$ from $N(\mu, \sigma^2)$. Often, $\sigma^2$ is unknown. Let $S$ be the sample standard deviation.

Proposition

If $n$ is sufficiently large, the standardized variable

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has approximately a standard normal distribution. This implies that

$$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$. This formula is valid regardless of the shape of the population distribution.

A Confidence Interval for a Population Proportion

Let p denote the proportion of “successes” in a population.

A random sample of $n$ individuals is to be selected, and $X$ is the number of successes in the sample.

Provided that $n$ is small compared to the population size, $X$ can be regarded as a binomial rv with

$$E(X) = np \quad \text{and} \quad \sigma_X = \sqrt{np(1-p)}$$

Furthermore, if both $np \ge 10$ and $n(1-p) \ge 10$, then $X$ has approximately a normal distribution.

The natural estimator of p is p̂ = X/n, the sample fraction of successes.

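Since $\hat{p}$ is just $X$ multiplied by the constant $1/n$, $\hat{p}$ also has approximately a normal distribution.

Recall that $E(\hat{p}) = p$ (unbiasedness) and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.

The standard deviation $\sigma_{\hat{p}}$ involves the unknown parameter $p$. Standardizing $\hat{p}$ by subtracting $p$ and dividing by $\sigma_{\hat{p}}$ then implies that

$$P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) \approx 1 - \alpha$$

Solving the inequalities inside the parentheses for $p$ (they are quadratic in $p$) gives the interval in the following proposition.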

Proposition

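Let

$$\tilde{p} = \frac{\hat{p} + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n}$$

Then a confidence interval for a population proportion $p$ with confidence level approximately $100(1-\alpha)\%$ is

$$\tilde{p} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}$$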

Exercise (7.2) 21

In a sample of 1000 randomly selected consumers who had opportunities to send in a rebate claim form after purchasing a product, 250 of these people said they never did so. Calculate an upper confidence bound at the 95% confidence level for the true proportion of such consumers who never apply for a rebate.

Based on this bound, is there compelling evidence that the true proportion of such consumers is smaller than 1/3?
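A possible way to set up this exercise numerically is sketched below. It assumes that a one-sided upper bound is obtained from the proportion interval of the preceding proposition by replacing $z_{\alpha/2}$ with $z_{\alpha}$ and keeping only the plus sign; that convention is not stated in these notes, and the function name `score_upper_bound` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_upper_bound(successes, n, conf=0.95):
    """Upper confidence bound for p (assumption: z_alpha replaces z_{alpha/2})."""
    z = NormalDist().inv_cdf(conf)  # z_alpha, e.g. about 1.645 for 95%
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center + half


upper = score_upper_bound(250, 1000)
print(round(upper, 3))
print(upper < 1 / 3)  # is the whole plausible range below 1/3?
```

Under these assumptions the bound comes out at roughly 0.27, which lies below 1/3, so the data would appear to support the claim that the true proportion is smaller than 1/3.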

Intervals Based on a Normal Population Distribution

The CI for µ presented earlier is valid provided that n is large.

The resulting interval can be used whatever the nature of the population distribution.

The CLT cannot be invoked, however, when n is small.

In this case, one way to proceed is to make a specific assumption about the form of the population distribution and then derive a CI tailored to that assumption.

Assumption

The population of interest is normal, so that $X_1, \dots, X_n$ constitutes a random sample from a normal distribution with both $\mu$ and $\sigma^2$ unknown.

The key result underlying the interval in the earlier section was that, for large $n$, the rv $Z = (\bar{X} - \mu)/(S/\sqrt{n})$ has approximately a standard normal distribution.

When $n$ is small, $S$ is no longer likely to be close to $\sigma$, so the variability in the distribution of $Z$ arises from randomness in both the numerator and the denominator.

This implies that the probability distribution of $(\bar{X} - \mu)/(S/\sqrt{n})$ will be more spread out than the standard normal distribution.

The result on which inferences are based introduces a new family of probability distributions called t distributions.

Theorem

When $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the rv

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a probability distribution called a t distribution with $n - 1$ degrees of freedom (df).

Properties of t Distributions

Although the variable of interest is still $(\bar{X} - \mu)/(S/\sqrt{n})$, we now denote it by $T$ to emphasize that it does not have a standard normal distribution when $n$ is small.

We know that a normal distribution is governed by two parameters; each different choice of $\mu$ in combination with $\sigma^2$ gives a particular normal distribution.

Any particular t distribution results from specifying the value of a single parameter, called the number of degrees of freedom, abbreviated df.

We'll denote this parameter by the Greek letter $\nu$. Possible values of $\nu$ are the positive integers 1, 2, 3, ... So there is a t distribution with 1 df, another with 2 df, yet another with 3 df, and so on.

For any fixed value of $\nu$, the density function that specifies the associated t curve is even more complicated than the normal density function.
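For reference, the density in question has the closed form below. It is the standard textbook expression for the t density with $\nu$ df and is not written out in these notes; $\Gamma$ denotes the gamma function.

```latex
f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
       \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2},
\qquad -\infty < t < \infty
```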

Fortunately, we need concern ourselves only with several of the more important features of these curves.

Let tν denote the t distribution with ν df.

1. Each $t_\nu$ curve is bell-shaped and centered at 0.

2. Each $t_\nu$ curve is more spread out than the standard normal (z) curve.

3. As $\nu$ increases, the spread of the corresponding $t_\nu$ curve decreases.

4. As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve (so the z curve is often called the t curve with df $= \infty$).

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

The number of df for $T$ is $n - 1$ because, although $S$ is based on the $n$ deviations $X_1 - \bar{X}, \dots, X_n - \bar{X}$, the fact that $\sum (X_i - \bar{X}) = 0$ implies that only $n - 1$ of these are "freely determined."

The number of df for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of $T$ is based.

The use of the t distribution in making inferences requires notation for capturing t-curve tail areas, $t_\alpha$, analogous to $z_\alpha$ for the z curve.

Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.

For example, $t_{0.05,6}$ is the t critical value that captures an upper-tail area of 0.05 under the t curve with 6 df.

Because t curves are symmetric about zero, $-t_{\alpha,\nu}$ captures lower-tail area $\alpha$.

Appendix Table A.5 gives $t_{\alpha,\nu}$ for selected values of $\alpha$ and $\nu$.

The columns of the table correspond to different values of $\alpha$. To obtain $t_{0.05,15}$, go to the $\alpha = 0.05$ column, look down to the $\nu = 15$ row, and read $t_{0.05,15} = 1.753$.
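If a table is not at hand, a t critical value can be approximated numerically. The sketch below is one possible approach, not the method used in the notes: it integrates the standard t density with Simpson's rule to get the upper-tail area and then bisects on that area. All function names are hypothetical.

```python
from math import exp, lgamma, log, pi


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule area over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Approximate t_{alpha,df} for 0 < alpha < 0.5 by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha:  # expand until the tail area drops below alpha
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 15), 3))
print(round(t_critical(0.05, 6), 3))
print(round(t_upper_tail(1.96, 5), 4), round(t_upper_tail(1.96, 30), 4))
```

The first value should agree with the table entry quoted above to about three decimals. The last line illustrates the heavier tails: the area to the right of 1.96 exceeds the normal value 0.025 and shrinks toward it as the df increase.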

The One-Sample t Confidence Interval

Proposition

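Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right)$$

or, more compactly,

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$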

Example (11):

Even as traditional markets for sweetgum lumber have declined, large-section solid timbers traditionally used for construction bridges and mats have become increasingly scarce.

The article "Development of Novel Industrial Laminated Planks from Sweetgum Lumber" (J. of Bridge Engr., 2008: 64-66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber.

Here are data on the modulus of rupture:

6807.99 7637.06 6663.28 6165.03 6991.41 6992.23 6981.46 7569.75 7437.88 6872.39
7663.18 6032.28 6906.04 6617.17 6984.12 7093.71 7659.50 7378.61 7295.54 6702.76
7440.17 8053.26 8284.75 7347.95 7422.69 7886.87 6316.67 7713.65 7503.33 7674.99

Use R software.
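The notes call for R here; as an alternative, the same kind of computation can be sketched in Python with only the standard library. The sketch assumes that a 95% one-sample t interval for the mean is what is wanted (the notes do not state the level), and it takes the critical value $t_{0.025,29} = 2.045$ from a standard t table instead of computing it.

```python
from math import sqrt
from statistics import mean, stdev

# Modulus of rupture data from the example (30 observations).
modulus_of_rupture = [
    6807.99, 7637.06, 6663.28, 6165.03, 6991.41, 6992.23, 6981.46, 7569.75,
    7437.88, 6872.39, 7663.18, 6032.28, 6906.04, 6617.17, 6984.12, 7093.71,
    7659.50, 7378.61, 7295.54, 6702.76, 7440.17, 8053.26, 8284.75, 7347.95,
    7422.69, 7886.87, 6316.67, 7713.65, 7503.33, 7674.99,
]

n = len(modulus_of_rupture)
xbar = mean(modulus_of_rupture)
s = stdev(modulus_of_rupture)  # sample standard deviation (divisor n - 1)

t_crit = 2.045  # assumed table value of t_{0.025,29} for a 95% interval
half_width = t_crit * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)

print(n, round(xbar, 3), round(s, 3))
print(round(ci[0], 2), round(ci[1], 2))
```

In R, `t.test(x)$conf.int` should report a comparable interval for the same data.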

Example (12)

Consider the following sample of fat content (in percentage) of n = 10 randomly selected hot dogs ("Sensory and Mechanical Assessment of the Quality of Frankfurters," J. of Texture Studies, 1990: 395-409):

25.2 21.3 22.8 17.0 29.8 21.0 25.5 16.0 20.9 19.5

Assuming that these were selected from a normal population distribution, find a 95% CI for (interval estimate of) the population mean fat content.

Use your calculator to obtain x̄ and s.
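As a check on the calculator work, the computation can be laid out as follows. The sums are computed from the ten observations above, and the critical value $t_{0.025,9} = 2.262$ is taken from a standard t table (assumed to agree with Table A.5).

```latex
\begin{align*}
\bar{x} &= \frac{219.0}{10} = 21.90, \qquad
s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{153.82}{9}} \approx 4.134, \\
\bar{x} \pm t_{0.025,\,9}\,\frac{s}{\sqrt{n}}
  &= 21.90 \pm 2.262 \cdot \frac{4.134}{\sqrt{10}}
  \approx 21.90 \pm 2.96 = (18.94,\ 24.86).
\end{align*}
```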

The Chi-Squared ($\chi^2$) Distribution

Theorem

Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma^2$. Then the rv

$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared ($\chi^2$) probability distribution with $\nu = n - 1$ df.

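Notation: Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the horizontal axis such that $\alpha$ of the area under the chi-squared curve with $\nu$ df lies to the right of $\chi^2_{\alpha,\nu}$.

Remark: The chi-squared distribution is not symmetric, so the lower-tail critical value $\chi^2_{1-\alpha/2,\nu}$ must be looked up separately; it cannot be obtained by negating the upper-tail value.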

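Confidence Interval for $\sigma^2$

From the theorem,

$$P\left(\chi^2_{1-\alpha/2,\,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha$$

Solving for $\sigma^2$, we get the inequalities

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$

A $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population is

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$

A confidence interval for $\sigma$ has lower and upper limits that are the square roots of the corresponding limits for $\sigma^2$.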
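To see the variance interval in action, the sketch below applies it to the hot dog fat-content data from Example (12). This pairing is an illustration of my own, not an exercise from the notes. The critical values $\chi^2_{0.025,9} = 19.023$ and $\chi^2_{0.975,9} = 2.700$ are taken from a standard chi-squared table and hard-coded as assumptions.

```python
from math import sqrt
from statistics import variance

# Fat content (%) of n = 10 hot dogs, from Example (12).
fat = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]

n = len(fat)
s2 = variance(fat)  # sample variance s^2 (divisor n - 1)

# Assumed table values for nu = n - 1 = 9 df and a 95% interval.
chi2_upper = 19.023  # chi^2_{0.025, 9}
chi2_lower = 2.700   # chi^2_{0.975, 9}

# The larger critical value gives the lower limit, and vice versa.
var_ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
sd_ci = (sqrt(var_ci[0]), sqrt(var_ci[1]))

print(round(s2, 3))
print(round(var_ci[0], 2), round(var_ci[1], 2))
print(round(sd_ci[0], 2), round(sd_ci[1], 2))
```

The resulting interval is noticeably asymmetric about $s^2$, reflecting the skewness of the chi-squared distribution.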

(Suppl) 51

An April 2009 survey of 2253 American adults conducted by the Pew Research Center's Internet & American Life Project revealed that 1262 of the respondents had at some point used wireless means for online access.

1. Calculate and interpret a 95% CI for the proportion of all American adults who at the time of the survey had used wireless means for online access.

2. What sample size is required if the desired width of the 95% CI is to be at most 0.04, irrespective of the sample results?
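Both parts of this exercise can be explored with the proportion interval from Section 7.2. The sketch below is one possible setup: the function names are hypothetical, and part 2 is handled by brute force, searching for the smallest n whose interval is narrow enough in the worst case $\hat{p} = 0.5$ (where $\hat{p}(1-\hat{p})$ is largest) rather than by a closed-form sample-size formula.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(successes, n, conf=0.95):
    """Two-sided CI for p using the interval from the Section 7.2 proposition."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def min_n_for_width(width, conf=0.95):
    """Smallest n whose interval is at most `width` wide even when p_hat = 0.5."""
    n = 1
    while True:
        low, high = score_interval(n / 2, n, conf)  # worst case: p_hat = 0.5
        if high - low <= width:
            return n
        n += 1


low, high = score_interval(1262, 2253)
print(round(low, 3), round(high, 3))
print(min_n_for_width(0.04))
```

For part 1, the interpretation would be that we can be about 95% confident that the true proportion lies between the two printed limits. For part 2, the search returns a sample size a little under 2400.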
