IV. CONFIDENCE INTERVAL AND ESTIMATION
Mathacle
PSet ----- Stats, Confidence Intervals and Estimation
Level ---- 1
Number --- 1
Name: ___________________ Date: _____________
Unbiased Estimators – So we don’t have favorites.
IV. CONFIDENCE INTERVAL AND ESTIMATION
4.1. Significance Level and Critical Values z and t
The significance level, denoted by $\alpha$, is the tail probability used to decide whether a sample statistic, such as the sample mean $\bar{x}$ or the sample proportion $\hat{p}$, is “significantly” different from the population mean $\mu$ or the population proportion $p$, respectively. The significance level is associated with the critical z-score $z_\alpha$ in the $N(0,1)$ distribution or the critical value $t_\alpha$ in the t-distribution, depending on which distribution the application uses.
Three cases are common: left-tailed, right-tailed, and two-tailed. The two-tailed case is used to determine confidence intervals in the normal and t distributions. $df$ is the degrees of freedom of the t-distribution.
On the TI-84:
Left-tailed: $z_\alpha = \mathrm{invNorm}(\alpha)$, $t_\alpha = \mathrm{invT}(\alpha, df)$
Right-tailed: $z_\alpha = \mathrm{invNorm}(1-\alpha)$, $t_\alpha = \mathrm{invT}(1-\alpha, df)$
Two-tailed: $z_{\alpha/2} = \mathrm{invNorm}(\alpha/2)$, $t_{\alpha/2} = \mathrm{invT}(\alpha/2, df)$
Note that invNorm($\alpha/2$) and invT($\alpha/2$, $df$) return the negative critical value; the two-tailed critical values are $\pm$ that number.
Example 4.1.1. For the given $\alpha$, $df$, and type of tail, find $z$ or $t$ as indicated.
| $\alpha$ | Type | $df$ | $z$ or $t$ | Graph |
|---|---|---|---|---|
| 0.1 | Z-distribution, right-tailed | - | | |
| 0.1 | Z-distribution, left-tailed | - | | |
| 0.1 | Z-distribution, two-tailed | - | | |
| 0.05 | Z-distribution, two-tailed | - | | |
| 0.01 | Z-distribution, two-tailed | - | | |
| 0.1 | t-distribution, two-tailed | 5 | | |
| 0.1 | t-distribution, two-tailed | 20 | | |
| 0.1 | t-distribution, two-tailed | 30 | | |
| 0.1 | t-distribution, two-tailed | 60 | | |
| 0.05 | t-distribution, two-tailed | 5 | | |
| 0.05 | t-distribution, two-tailed | 20 | | |
| 0.05 | t-distribution, two-tailed | 30 | | |
| 0.01 | t-distribution, two-tailed | 5 | | |
| 0.01 | t-distribution, two-tailed | 30 | | |
Solution:
| $\alpha$ | Type | $df$ | $z$ or $t$ | Graph |
|---|---|---|---|---|
| 0.1 | Z-distribution, right-tailed | - | $z = 1.282$ | |
| 0.1 | Z-distribution, left-tailed | - | $z = -1.282$ | |
| 0.1 | Z-distribution, two-tailed | - | $z = \pm 1.645$ | |
| 0.05 | Z-distribution, two-tailed | - | $z = \pm 1.960$ | |
| 0.01 | Z-distribution, two-tailed | - | $z = \pm 2.576$ | |
| 0.1 | t-distribution, two-tailed | 5 | $t = \pm 2.015$ | |
| 0.1 | t-distribution, two-tailed | 20 | $t = \pm 1.725$ | |
| 0.1 | t-distribution, two-tailed | 30 | $t = \pm 1.697$ | |
| 0.1 | t-distribution, two-tailed | 60 | $t = \pm 1.671$ | |
| 0.05 | t-distribution, two-tailed | 5 | $t = \pm 2.571$ | |
| 0.05 | t-distribution, two-tailed | 20 | $t = \pm 2.086$ | |
| 0.05 | t-distribution, two-tailed | 30 | $t = \pm 2.042$ | |
| 0.01 | t-distribution, two-tailed | 5 | $t = \pm 4.032$ | |
| 0.01 | t-distribution, two-tailed | 30 | $t = \pm 2.750$ | |
4.2. Confidence Interval (CI)
The confidence level (CL) is defined as $1-\alpha$. In the two-tailed case, when $\alpha$ is given, the confidence interval (CI) is $(-z_{\alpha/2}, z_{\alpha/2})$ for the Z-distribution, and the CI is $(-t_{\alpha/2}, t_{\alpha/2})$ for the t-distribution. The area in each tail of the two-sided case is $\frac{\alpha}{2}$.
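Example 4.2.1. To compare the Z-distribution and the t-distribution, find the CI in each case.
| $\alpha$ | Distribution | $df$ | CI |
|---|---|---|---|
| 0.1 | Z | - | |
| 0.05 | Z | - | |
| 0.01 | Z | - | |
| 0.1 | t | 5 | |
| 0.1 | t | 30 | |
| 0.05 | t | 5 | |
| 0.05 | t | 30 | |
| 0.01 | t | 5 | |
| 0.01 | t | 30 | |
Solution:
| $\alpha$ | Distribution | $df$ | CI |
|---|---|---|---|
| 0.1 | Z | - | $(-1.645, 1.645)$ |
| 0.05 | Z | - | $(-1.960, 1.960)$ |
| 0.01 | Z | - | $(-2.576, 2.576)$ |
| 0.1 | t | 5 | $(-2.015, 2.015)$ |
| 0.1 | t | 30 | $(-1.697, 1.697)$ |
| 0.05 | t | 5 | $(-2.571, 2.571)$ |
| 0.05 | t | 30 | $(-2.042, 2.042)$ |
| 0.01 | t | 5 | $(-4.032, 4.032)$ |
| 0.01 | t | 30 | $(-2.750, 2.750)$ |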
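The Z rows of this table can be reproduced without a calculator. The following is a minimal sketch using only Python's standard library; the function name `z_critical` and its `tail` argument are illustrative choices, not part of the worksheet. For the two-tailed case it returns the positive critical value, so the interval is plus or minus that number. The standard library has no inverse t-distribution, so the t rows still need a table or the TI-84 `invT` function.

```python
from statistics import NormalDist


def z_critical(alpha: float, tail: str = "two") -> float:
    """Critical z-score of N(0,1) for a given significance level.

    Mirrors the TI-84 invNorm usage: left -> invNorm(alpha),
    right -> invNorm(1 - alpha), two -> the positive value |invNorm(alpha/2)|.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Print the two-tailed intervals for the three significance levels above.
for alpha in (0.1, 0.05, 0.01):
    z = z_critical(alpha)
    print(f"alpha = {alpha}: CI = ({-z:.3f}, {z:.3f})")
```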
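For the distribution $N(\mu, \sigma^2)$, the z-score $z = \frac{x-\mu}{\sigma}$ can be used to transform $N(\mu, \sigma^2) \to N(0,1)$.
In summary:
| Distribution | Confidence Interval (CI) |
|---|---|
| $N(0,1)$ | $CI = (-z_{\alpha/2}, z_{\alpha/2})$ or $0 \pm z_{\alpha/2}$ |
| $t(k)$ | $CI = (-t_{\alpha/2}, t_{\alpha/2})$ or $0 \pm t_{\alpha/2}$, where $k = df$ |
| $N(\mu, \sigma^2)$ | $CI = (\mu - z_{\alpha/2}\sigma, \mu + z_{\alpha/2}\sigma)$ or $\mu \pm z_{\alpha/2}\sigma$, when $\mu$ and $\sigma$ are known |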
4.3. Definition of Unbiased Estimators
An estimator $T$ is called an unbiased estimator for the parameter $\theta$ if the expected value of $T$ is $\theta$:
$$E[T] = \theta$$
The difference $E[T] - \theta$ is called the bias of $T$. Intuitively, an unbiased estimator is one that does not systematically overestimate or underestimate $\theta$.
4.4. The Proportion Estimator for Binomial Distributions
Let $N$ denote the total number of units or items in the population. Suppose $X$ is the sum of $n$ independent binary random variables $X_1, X_2, \ldots, X_n$ that take the value one or zero, each with success probability $p$:
$$X = X_1 + X_2 + \cdots + X_n$$
where $n \le N$. The mean and variance of $X$ are $E[X] = np$ and $Var(X) = np(1-p)$.
If the sample proportion $\hat{p} = \frac{X}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$ is defined as the rescaled or average binomial variable $X$, then the mean of $\hat{p}$ is
$$\mu_{\hat{p}} = E[\hat{p}] = E\left[\frac{X}{n}\right] = \frac{E[X]}{n} = \frac{np}{n} = p$$
That is, the estimator $\hat{p}$ has the property $E[\hat{p}] = p$. Therefore, the sample proportion $\hat{p}$ is an unbiased estimator for the population proportion $p$. Its variance is
$$\sigma_{\hat{p}}^2 = Var[\hat{p}] = Var\left[\frac{X}{n}\right] = \frac{Var[X]}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$
The proportion problem should really be modeled as a hypergeometric problem, that is, $k$ successes in $n$ draws without replacement. The variance for the hypergeometric model is
$$\sigma_{\hat{p}}^2 = \frac{N-n}{N-1} \cdot \frac{p(1-p)}{n}$$
where $\frac{N-n}{N-1}$ is the modification factor (the finite population correction). When the sampling fraction $\frac{n}{N}$ is small, say $\frac{n}{N} < 10\%$, the standard deviation can be approximated as
$$\sigma_{\hat{p}} = \sqrt{\frac{1-\frac{n}{N}}{1-\frac{1}{N}}} \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{p(1-p)}{n}}$$
This is the “10% rule”, and in this case the binomial approximation is satisfactory.
When $np(1-p)$ is large enough, in practice when $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ as noted later, the binomial distribution can be approximated by the Z distribution. The z-score in this case can be calculated as
$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \approx \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}$$
When $p$ is unknown and the 10% rule is satisfied, $\hat{p}$ can be used to approximate the standard deviation, $\sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. At significance level $\alpha$, the z-score satisfies
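$$-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \approx \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} \le z_{\alpha/2}$$
The confidence interval is then
$$CI = \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
or in the interval form:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
or in the form of error terms:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$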
In summary, the conditions for obtaining a CI from the sample proportion (statistic) $\hat{p}$ with sample size $n$ are
1.) The sample is a simple random sample.
2.) The sample size is less than 10% of the population.
3.) $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$, so that the binomial distribution can be approximated by the normal distribution.
Example 4.4.1. THS administrators wanted to know how many 10th graders and 11th graders did either internships or community service in the past summer. A random sample of 75 students indicated that 60 students did one of the two. Find the 95% confidence interval for the school proportion. Assume that the school population of these two classes is 800 and both grades are equally likely to do summer internships or community service.
Solution:
$\hat{p} = 60/75 = 0.8$, $\alpha = 0.05$, $z_{\alpha/2} = 1.96$ ($-z_{\alpha/2} = \mathrm{invNorm}(0.025)$)
$n = 75 < 10\% \times 800 = 80$, $n\hat{p} = 75(0.8) = 60 \ge 10$, $n(1-\hat{p}) = 75(0.2) = 15 \ge 10$
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.8 \pm 1.96\sqrt{\frac{0.8(1-0.8)}{75}} = 0.8 \pm 0.091$$
or
$$CI = (0.709, 0.891)$$
That is, the school is 95% confident that the true proportion of students in those two grades who did summer internships or community service is between 0.709 and 0.891.
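The calculation in Example 4.4.1 can be checked with a short script. This is a minimal sketch using only the standard library; the function name `proportion_ci` and its parameters are illustrative, and it assumes the three conditions (random sample, 10% rule, at least 10 successes and 10 failures) have already been verified by hand.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Return (low, high) of the z-interval p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    # Positive two-tailed critical value z_{alpha/2}, with alpha = 1 - confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example 4.4.1: 60 of 75 sampled students, 95% confidence.
low, high = proportion_ci(60, 75)
print(f"CI = ({low:.3f}, {high:.3f})")
```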
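Example 4.4.2. A previous study has suggested that about 19.3% of teens (aged 12-19) are obese. How large a sample is needed to estimate the true proportion of obese teens with 95% confidence and a margin of error of no more than 1%?
Solution:
$\hat{p} = 0.193$, $\alpha = 0.05$, $z_{\alpha/2} = 1.96$ ($-z_{\alpha/2} = \mathrm{invNorm}(0.025)$)
$$z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le 0.01 \implies n \ge \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{0.01^2} = \frac{1.96^2(0.193)(1-0.193)}{0.0001} \approx 5983.3$$
Rounding up, the sample size needs to be at least 5984.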
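Solving the margin-of-error inequality for $n$ is mechanical, so it can be sketched in code. The helper below is a hypothetical illustration (the name `sample_size_for_proportion` is not from the worksheet); it uses the exact normal quantile rather than the rounded 1.96 and rounds up, because a fractional sample size must be raised to the next whole number to keep the margin within the target.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess: float, margin: float,
                               confidence: float = 0.95) -> int:
    """Smallest n with z * sqrt(p(1-p)/n) <= margin, for a planning value p_guess."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Rearranged inequality: n >= z^2 * p(1-p) / margin^2, then round up.
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Example 4.4.2: planning value 0.193, margin 1%, 95% confidence.
print(sample_size_for_proportion(0.193, 0.01))
```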
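4.5. The Mean Estimator for the Normal Distribution $N(\mu, \sigma^2)$
Suppose $X_1, X_2, \ldots, X_n$ are continuous independent random variables from a normal distribution with known mean $\mu$ and known variance $\sigma^2$. Then
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
is the unbiased mean estimator:
$$E[\bar{X}] = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \frac{\sum_{i=1}^{n} E[X_i]}{n} = \frac{n\mu}{n} = \mu$$
The variance is
$$\sigma_{\bar{x}}^2 = Var[\bar{X}] = Var\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \frac{\sum_{i=1}^{n} Var[X_i]}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$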
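Similar to the proportion problem, sampling without replacement from a finite population of size $N$ adds the correction factor to the variance:
$$\sigma_{\bar{x}}^2 = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}$$
When the 10% rule is satisfied, the standard deviation is
$$\sigma_{\bar{x}} = \sqrt{\frac{1-\frac{n}{N}}{1-\frac{1}{N}}} \cdot \frac{\sigma}{\sqrt{n}} \approx \frac{\sigma}{\sqrt{n}}$$
The Central Limit Theorem says that when the sample size is large, this unbiased estimator satisfies
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \implies \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
For the confidence level $1-\alpha$,
$$\left|\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right| \le z_{\alpha/2}$$
The confidence interval is
$$CI = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
or
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
or
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The conditions for obtaining a CI from the sample mean (statistic) $\bar{x}$ with sample size $n$ are
1.) The sample is a simple random sample.
2.) The sample size is less than 10% of the population.
3.) $\sigma$ is known and $n \ge 30$, so that the normal distribution can be used.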
Example 4.5.1. The President of a large university wishes to estimate the average age of
the students presently enrolled. From past studies, the standard deviation is known to be
2 years. A sample of 50 students is selected randomly, and the sample mean is found to be 23.2 years. Find the 95% confidence interval for the university’s population mean.
Solution:
$\sigma = 2$, $n = 50$, $\bar{x} = 23.2$, $\alpha = 0.05$, $z_{\alpha/2} = 1.96$, $CI = ?$
$$23.2 - 1.96\frac{2}{\sqrt{50}} \le \mu \le 23.2 + 1.96\frac{2}{\sqrt{50}}$$
or
$$CI = (22.6, 23.8)$$
That is, the President can say with 95% confidence that the average age of students is between 22.6 and 23.8 years old.
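The same interval can be computed in a few lines. This is a minimal sketch with an illustrative function name (`mean_ci_known_sigma`), assuming $\sigma$ is known and the sample is large enough for the normal model.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(x_bar: float, sigma: float, n: int,
                        confidence: float = 0.95):
    """Return (low, high) of the z-interval x_bar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 4.5.1: sample mean 23.2, sigma 2, n 50, 95% confidence.
low, high = mean_ci_known_sigma(23.2, 2, 50)
print(f"CI = ({low:.1f}, {high:.1f})")
```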
Example 4.5.2. From the last example, the President would like to be 99% confident that the estimate of the average age is accurate within 1 year when the standard deviation of the ages is 3 years. How large a sample is necessary?
Solution:
$\sigma = 3$, $\alpha = 0.01$, $z_{\alpha/2} = 2.58$, $n = ?$
$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le 1 \implies \sqrt{n} \ge 2.58 \times 3 = 7.74 \implies n \ge 59.9$$
The sample size needs to be at least 60.
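As a sketch of the same rearrangement in code, the hypothetical helper below solves $z_{\alpha/2}\,\sigma/\sqrt{n} \le E$ for $n$ and rounds up. It uses the exact normal quantile instead of the rounded 2.58; for this example the rounded-up sample size is unchanged.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma: float, margin: float,
                         confidence: float = 0.95) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Rearranged inequality: n >= (z * sigma / margin)^2, then round up.
    return ceil((z * sigma / margin) ** 2)


# Example 4.5.2: sigma 3, margin 1 year, 99% confidence.
print(sample_size_for_mean(3, 1, confidence=0.99))
```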
4.6. The Mean Estimator for t-Distribution
For a random variable $X \sim T(n-1)$, where $T(n-1)$ is the t-distribution with $k = n-1$ degrees of freedom and sample size $n$, the mean is $E[X] = 0$, the variance is $Var[X] = \frac{k}{k-2}$ for $k > 2$, the skewness is 0 for $k > 3$, and the excess kurtosis is $\frac{6}{k-4}$ for $k > 4$.
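When the variance $\sigma^2$ is unknown for $X_1, X_2, \ldots, X_n$, the unbiased variance estimator is:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
That is, $E[s^2] = S^2$, where $S^2$ is the population variance, and $s_{\bar{x}}^2$, the estimate of the squared standard error of $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$, is
$$s_{\bar{x}}^2 = \frac{N-n}{N-1} \cdot \frac{s^2}{n}$$
When the 10% rule is satisfied, the standard error of $\bar{x}$ is
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
Note that the sample standard deviation estimator
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
is not an unbiased estimator of the population standard deviation $S$.
When the sample size is less than 30 and/or $\sigma$ is unknown, the mean estimator $\bar{X}$ can be studentized, in the same way the normal distribution was used above:
$$\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim T(n-1)$$
and for the confidence level $1-\alpha$,
$$\left|\frac{\bar{x} - \mu}{s/\sqrt{n}}\right| \le t_{\alpha/2}^{k=n-1}$$
The confidence interval is
$$CI = \left(\bar{x} - t_{\alpha/2}^{n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}^{n-1}\frac{s}{\sqrt{n}}\right)$$
or
$$\bar{x} - t_{\alpha/2}^{n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}^{n-1}\frac{s}{\sqrt{n}}$$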
or
$$\bar{x} \pm t_{\alpha/2}^{n-1}\frac{s}{\sqrt{n}}$$
Example 4.6.1. A sample of 28 THS teachers travels an average (mean) of 14.3 miles to school. The standard deviation of their travel distance was 2 miles. Find the 95% confidence interval for the true mean, or population mean.
Solution:
$n = 28$, $s = 2$, $\bar{x} = 14.3$, $\alpha = 0.05$, $k = 28 - 1 = 27$, $t_{\alpha/2}^{27} = 2.05$, $CI = ?$
$$14.3 - 2.05\frac{2}{\sqrt{28}} \le \mu \le 14.3 + 2.05\frac{2}{\sqrt{28}}$$
$$13.5 \le \mu \le 15.1$$
or
$$CI = (13.5, 15.1)$$
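A sketch of the t-interval follows. Python's standard library has no inverse t-distribution, so this illustration takes the critical value as an argument (here 2.05, the value used in Example 4.6.1, as read from a t-table or TI-84 `invT`); the function name `mean_ci_t` is an assumption for illustration.

```python
from math import sqrt


def mean_ci_t(x_bar: float, s: float, n: int, t_crit: float):
    """Return (low, high) of the t-interval x_bar +/- t_crit * s / sqrt(n).

    t_crit is the positive two-tailed critical value for n - 1 degrees of
    freedom, supplied by the caller from a t-table or calculator.
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 4.6.1: n = 28, s = 2, sample mean 14.3, t critical value 2.05 (df = 27).
low, high = mean_ci_t(14.3, 2, 28, t_crit=2.05)
print(f"CI = ({low:.1f}, {high:.1f})")
```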
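4.7. Compare the Difference of Two Means with Unequal Known Variances and Large Sample Size
It is much more common to compare the difference of two means or proportions than to estimate the means or proportions themselves.
For two normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ with sample means $\bar{x}_1$ and $\bar{x}_2$, when $n_1, n_2 \ge 30$ and the variances $\sigma_1^2$ and $\sigma_2^2$ are known and unequal, the difference of the two sample means $\bar{x}_1 - \bar{x}_2$, used to estimate the difference of population means $\mu_1 - \mu_2$, has the distribution $N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$. The confidence interval is
$$CI = \left((\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$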
or
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Example 4.7.1. A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. A sample of 12 individuals with Down's syndrome yielded a mean of $\bar{x}_1 = 4.5$ mg/100 ml. A sample of 15 normal individuals of the same age and sex were found to have a mean value of $\bar{x}_2 = 3.4$ mg/100 ml. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95 percent confidence interval for $\mu_1 - \mu_2$.
Solution:
$n_1 = 12$, $\bar{x}_1 = 4.5$, $\sigma_1^2 = 1$
$n_2 = 15$, $\bar{x}_2 = 3.4$, $\sigma_2^2 = 1.5$
$\bar{x}_1 - \bar{x}_2 = 4.5 - 3.4 = 1.1$
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1}{12} + \frac{1.5}{15}} = 0.4282$$
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 1.1 \pm 1.96(0.4282) = 1.1 \pm 0.8393$$
or $CI = (0.26, 1.94)$
That is, the point estimate of the difference between the two population means is 1.1, and we are 95% confident that the true difference between the means lies between 0.26 and 1.94.
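The two-sample z-interval can be sketched the same way. The function name and argument order below are illustrative; the variances are passed directly (not standard deviations), matching how Example 4.7.1 states them.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci_known_var(x1: float, var1: float, n1: int,
                            x2: float, var2: float, n2: int,
                            confidence: float = 0.95):
    """Return (low, high) for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - z * std_error, diff + z * std_error


# Example 4.7.1: means 4.5 and 3.4, variances 1 and 1.5, sizes 12 and 15.
low, high = diff_means_ci_known_var(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"CI = ({low:.2f}, {high:.2f})")
```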
4.8. Compare the Difference of Two Means with Small Sample Size and/or with Unknown Equal Variances
For sample sizes $n_1, n_2 < 30$ and/or with unknown equal variances, the pooled standard deviation is used for estimating the difference of population means $\mu_1 - \mu_2$ from the difference of the two sample means $\bar{x}_1 - \bar{x}_2$:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
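and
$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T(n_1 + n_2 - 2)$$
The confidence interval is
$$CI = \left((\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}^{n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}^{n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$$
or
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}^{n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$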
Example 4.8.1. An experiment was done to compare the mean number of tapeworms in the stomachs of sheep that had been treated for worms versus those not treated. There were 7 sheep in the treatment group and 7 in the control group. The means and variances are
| | Treatment | Control |
|---|---|---|
| $\bar{x}$ | 28.57 | 40.0 |
| $s^2$ | 198.62 | 215.33 |
| $n$ | 7 | 7 |
What is the confidence interval for the difference of the two means at significance level $\alpha = 0.1$?
Solution:
The sample size is small and it is reasonable to assume the variances are equal. So the pooled estimate will be used.
$\bar{x}_T = 28.57$, $\bar{x}_C = 40.0$, $s_T^2 = 198.62$, $s_C^2 = 215.33$, $n_1 = 7$, $n_2 = 7$, $\alpha = 0.1$,
$$t_{\alpha/2}^{n_1+n_2-2} = t_{0.05}^{7+7-2} = t_{0.05}^{12} = 1.782$$
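The pooled standard deviation is
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(7-1)(198.62) + (7-1)(215.33)}{7 + 7 - 2}} = 14.387$$
$$(\bar{x}_T - \bar{x}_C) \pm t_{\alpha/2}^{n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (28.57 - 40.00) \pm 1.782(14.387)\sqrt{\frac{1}{7} + \frac{1}{7}} = -11.43 \pm 13.70$$
or $CI = (-25.13, 2.27)$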
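The pooled calculation has several steps, so a short sketch may help keep them straight. The function name `pooled_ci` is illustrative; as with the one-sample t-interval, the critical value (1.782 for 12 degrees of freedom at $\alpha = 0.1$) is passed in from a table because the standard library has no inverse t-distribution. The sample variances $s^2$ are passed directly.

```python
from math import sqrt


def pooled_ci(x1: float, var1: float, n1: int,
              x2: float, var2: float, n2: int, t_crit: float):
    """Return (low, high, s_p) for mu1 - mu2 using the pooled standard deviation.

    var1 and var2 are the sample variances; t_crit is the positive two-tailed
    critical value for n1 + n2 - 2 degrees of freedom.
    """
    s_p = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    margin = t_crit * s_p * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin, s_p


# Example 4.8.1: treatment vs. control, 7 sheep each, t critical value 1.782.
low, high, s_p = pooled_ci(28.57, 198.62, 7, 40.0, 215.33, 7, t_crit=1.782)
print(f"s_p = {s_p:.3f}, CI = ({low:.2f}, {high:.2f})")
```

Because the resulting interval contains 0, the data do not show a difference between the treatment and control means at this significance level.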
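4.9. Compare the Difference of Two Proportions
For two binomial distributions $b(n_1, p_1, x_1)$ and $b(n_2, p_2, x_2)$, when $n_1$ and $n_2$ are less than 10% of their populations and $n_1\hat{p}_1,\ n_1(1-\hat{p}_1),\ n_2\hat{p}_2,\ n_2(1-\hat{p}_2) \ge 10$, the difference of the two sample proportions $\hat{p}_1 - \hat{p}_2$, used to estimate the difference of population proportions $p_1 - p_2$, has approximately the distribution $N\left(p_1 - p_2, \frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}\right)$. The confidence interval is
$$CI = \left((\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},\ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\right)$$
or
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$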
Example 4.9.1. Independent random samples of 100 luxury cars and 250 non-luxury cars
in a certain city are examined to see if they have bumper stickers. Of the 250 non-luxury
cars, 125 have bumper stickers and of the 100 luxury cars, 30 have bumper stickers. What
is a 90% confidence interval for the difference in the proportion of non-luxury cars with
bumper stickers and the proportion of luxury cars with bumper stickers for the population
of cars represented by these samples?
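Solution:
$n_1 = 250$, $\hat{p}_1 = \frac{125}{250} = 0.5$, $n_2 = 100$, $\hat{p}_2 = \frac{30}{100} = 0.3$, $\alpha = 0.1$
Assume that $n_1$ and $n_2$ are less than 10% of the total numbers of non-luxury cars and luxury cars, respectively. $z_{\alpha/2} = z_{0.05} = 1.645$, and $n_1\hat{p}_1 = 125$, $n_1(1-\hat{p}_1) = 125$, $n_2\hat{p}_2 = 30$, $n_2(1-\hat{p}_2) = 70$ are all at least 10.
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (0.5 - 0.3) \pm 1.645\sqrt{\frac{0.5(1-0.5)}{250} + \frac{0.3(1-0.3)}{100}} = 0.2 \pm 0.092$$
or $CI = (0.108, 0.292)$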
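A minimal sketch of the two-proportion interval, with an illustrative function name and the counts passed as (successes, sample size) pairs. It assumes the 10% rule and the success/failure counts have been checked separately.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(x1: int, n1: int, x2: int, n2: int,
                        confidence: float = 0.95):
    """Return (low, high) of the z-interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


# Example 4.9.1: 125 of 250 non-luxury cars, 30 of 100 luxury cars, 90% confidence.
low, high = diff_proportions_ci(125, 250, 30, 100, confidence=0.90)
print(f"CI = ({low:.3f}, {high:.3f})")
```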
4.10. Summary and Examples
Case #1: use the sample proportion to estimate the population proportion
Given: the significance level $\alpha$, the sample size $n$, the number of favorable outcomes $x$
Distribution: $b(n, p, x)$, assume the number of favorable outcomes follows the binomial distribution.
Sample Statistic: $\hat{p} = \frac{x}{n}$
Population Parameter: proportion $p$
Confidence Interval (CI): $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Conditions: $n < 10\%$ of population, $n\hat{p} \ge 10$, $n(1-\hat{p}) \ge 10$
Case #2: use the sample mean to estimate the population mean
Given: the significance level $\alpha$, the sample size $n$, the sample mean $\bar{x}$
Distribution: $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, assume the sample mean follows the normal distribution.
Sample Statistic: $\bar{x}$
Population Parameter: mean $\mu$
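Confidence Interval (CI): $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Conditions: $n \ge 30$, $\sigma$ is known
Case #3: use the sample mean to estimate the population mean
Given: the significance level $\alpha$, the sample size $n$, the sample mean $\bar{x}$, the sample standard deviation $s$
Distribution: $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim T(n-1)$, assume $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the t distribution with $df = n - 1$.
Sample Statistic: $\bar{x}$
Population Parameter: mean $\mu$
Confidence Interval (CI): $\bar{x} \pm t_{\alpha/2}^{n-1}\frac{s}{\sqrt{n}}$
Conditions: $n < 30$, and/or $\sigma$ is unknown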
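| Distribution | Sample Statistic | Confidence Interval | Comments |
|---|---|---|---|
| $b(n, p, x)$ | $\hat{p}$ | $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ | 1.) $n < 10\%$ of pop 2.) $n\hat{p} \ge 10$ 3.) $n(1-\hat{p}) \ge 10$ |
| $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ | $\bar{x}$ | $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ | 1.) $\sigma$ is known 2.) $n \ge 30$ |
| $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim T(n-1)$ | $\bar{x}$ | $\bar{x} \pm t_{\alpha/2}^{n-1}\frac{s}{\sqrt{n}}$ | 1.) $k = n - 1$ deg. of freedom 2.) $5 \le n < 30$ or 3.) $\sigma$ is unknown |
| $N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ | $\bar{x}_1 - \bar{x}_2$ | $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | 1.) $\sigma_1, \sigma_2$ are known and unequal. 2.) $n_1, n_2 \ge 30$ |
| $\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T(n_1 + n_2 - 2)$ | $\bar{x}_1 - \bar{x}_2$ | $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}^{n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $\bar{x}_1, \bar{x}_2$ are the sample means, $s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$, and $s_1, s_2$ are the sample standard deviations | 1.) $k = n_1 + n_2 - 2$ is deg. of freedom 2.) $5 \le n_1, n_2 < 30$ and/or $\sigma_1, \sigma_2$ are equal. 3.) $\sigma_1, \sigma_2$ are unknown and equal |
| $b(n, p, x)$ | $\hat{p}_1 - \hat{p}_2$ | $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ | 1.) $n_1, n_2 < 10\%$ of pop 2.) $n_1\hat{p}_1, n_2\hat{p}_2 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2(1-\hat{p}_2) \ge 10$ |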
Example 4.10.1. I want to construct a 99% confidence interval for the proportion of Americans who think that the government has placed too many regulations on businesses, and I want a margin of error of no more than 3%. Assume the population proportion is 0.5. How large a sample will this require?
Solution:
$\alpha = 0.01$, $p = 0.5$, $z_{\alpha/2} = z_{0.005} = 2.576$, $z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le 3\%$, $n = ?$
$$2.576\sqrt{\frac{0.5(0.5)}{n}} \le 0.03 \implies n \ge \left(\frac{2.576(0.5)}{0.03}\right)^2 \approx 1843.3$$
Rounding up, the sample needs at least 1844 people.
Example 4.10.2. A study of 5302 people aged 60 or older in the US found 124 with rheumatoid arthritis. Construct a 90% confidence interval for the actual proportion of all people aged 60 and older who have rheumatoid arthritis.
Solution:
$\alpha = 0.1$, $\hat{p} = \frac{124}{5302} = 0.0234$, $z_{\alpha/2} = z_{0.05} = 1.6449$, $n = 5302$, $CI = ?$
$$0.0234 \pm 1.6449\sqrt{\frac{0.0234(1-0.0234)}{5302}} = 0.0234 \pm 0.0034$$
or
$$CI = (0.0200, 0.0268)$$
Example 4.10.3. [AP practice question, College Board] A large company is considering opening a franchise in St. Louis and wants to estimate the mean household income for the area using a simple random sample of the households. Based on information from a pilot study, the company assumes that the standard deviation of household incomes is $\sigma = \$7{,}200$. What is the least number of households that should be surveyed to obtain an estimate that is within \$200 of the true mean household income with 95 percent confidence?
Solution:
The standard deviation is assumed known. $\sigma = 7{,}200$, $\alpha = 0.05$, $z_{\alpha/2} = 1.96$, $z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le 200$
$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{200}\right)^2 = \left(\frac{1.96 \times 7200}{200}\right)^2 \approx 4978.7$$
Rounding up, at least 4979 households should be surveyed.
Example 4.10.4. [AP practice question, College Board] Courtney has constructed a
cricket out of paper and rubber bands. According to the instructions for making the
cricket, when it jumps it will land on its feet half of the time and on its back the other half
of the time. In the 50 jumps, Courtney’s cricket landed on its feet 35 times. In the next 10
jumps, it landed on its feet only twice. Based on this experience, Courtney can conclude
that
(A) the cricket was due to land on its feet less than half the time during the final 10 jumps,
since it had landed too often on its feet during the first 50 jumps.
(B) a confidence interval for estimating the cricket’s true probability of landing on its feet
is wider after the final 10 jumps than it was before the final 10 jumps.
(C) a confidence interval for estimating the cricket’s true probability of landing on its feet
after the final 10 jumps is exactly the same as it was before the final 10 jumps.
(D) a confidence interval for estimating the cricket’s true probability of landing on its feet
is more narrow after the final 10 jumps than it was before the final 10 jumps.
(E) a confidence interval for estimating the cricket’s true probability of landing on its feet
based on the initial 50 jumps does not include 0.2, so there must be a defect in the
cricket’s construction to account for the poor showing in the final 10 jumps.
Solution:
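The answer is D. The proportion is assumed to be $p = 0.5$. The error term (half-width) of the confidence interval is calculated by $z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$. So, when the sample size increases, the error term decreases. That is, the CI narrows as the sample size increases. Using $\hat{p}$ in place of $p$ leads to the same conclusion: $\sqrt{\frac{0.7(0.3)}{50}} \approx 0.065$ after 50 jumps versus $\sqrt{\frac{0.617(0.383)}{60}} \approx 0.063$ after 60 jumps.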
Example 4.10.5. [2015 AP Stats FRQ, #2] To increase business, the owner of a restaurant
is running a promotion in which a customer’s bill can be randomly selected to receive a
discount. When a customer’s bill is printed, a program in the cash register randomly
determines whether the customer will receive a discount on the bill. The program was
written to generate a discount with a probability of 0.2, that is, giving 20 percent of the
bills a discount in the long run. However, the owner is concerned that the program has a
mistake that results in the program not generating the intended long-run proportion of 0.2.
The owner selected a random sample of bills and found that only 15 percent of them
received discounts. A confidence interval for p, the proportion of bills that will receive a
discount in the long run, is $0.15 \pm 0.06$. All conditions for inference were met.
a.) Consider the confidence interval $0.15 \pm 0.06$.
i. Does the confidence interval provide convincing statistical evidence that the
program is not working as intended? Justify your answer.
ii. Does the confidence interval provide convincing statistical evidence that the
program generates the discount with a probability of 0.2? Justify your answer.
A second random sample of bills was taken that was four times the size of the original
sample. In the second sample, 15 percent of the bills received the discount.
b.) Determine the value of the margin of error based on the second sample of bills that
would be used to compute an interval for p with the same confidence level as that of the
original interval.
c) Based on the margin of error in part (b) that was obtained from the second sample,
what do you conclude about whether the program is working as intended? Justify your
answer.
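Solution:
a.)
i. No. The intended proportion 0.2 lies within the CI $0.15 \pm 0.06 = (0.09, 0.21)$. So, there is no convincing statistical evidence that the program is not working as intended.
ii. No. The interval only shows that 0.2 is a plausible value; any other number within the CI could equally be the true probability.
b.) The margin of error is proportional to $\frac{1}{\sqrt{n}}$, so a sample four times as large halves it: $\frac{0.06}{\sqrt{4}} = 0.03$.
c.) Now the CI is $0.15 \pm 0.03 = (0.12, 0.18)$, so 0.2 is not within the CI. So, there is convincing evidence that the program is not working as intended.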
Example 4.10.6. [2011 AP Stats FRQ, #6] Every year, each student in a nationally
representative sample is given tests in various subjects. Recently, a random sample of
9,600 12th-grade students from US were administered a multiple-choice US history exam.
One of the multiple-choice questions is below. (The correct answer is C.)
Of the 9,600 students, 28 percent answered the multiple-choice question correctly.
a.). Let p be the proportion of all United States twelfth-grade students who would answer
the question correctly. Construct and interpret a 99 percent confidence interval for p.
Assume that students who actually know the correct answer have a 100 percent chance of
answering the question correctly, and students who do not know the correct answer to the
question guess completely at random from among the four options. Let k represent the
proportion of all United States twelfth-grade students who actually know the correct
answer to the question.
b.) A tree diagram of the possible outcomes for a randomly selected twelfth-grade student
is provided below. Write the correct probability in each of the five empty boxes. Some of
the probabilities may be expressions in terms of k.
c.) Based on the completed tree diagram, express the probability, in terms of k, that a
randomly selected twelfth-grade student would correctly answer the history question.
d.) Using your interval from part (a) and your answer to part (c), calculate and interpret a
99 percent confidence interval for k, the proportion of all United States twelfth-grade
students who actually know the answer to the history question. You may assume that the
conditions for inference for the confidence interval have been checked and verified.
Solution:
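a.) $\hat{p} = 0.280$, $n\hat{p} = 9600(0.280) = 2688 \ge 10$, $n(1-\hat{p}) = 9600(1-0.280) = 6912 \ge 10$, $\alpha = 0.01$, $z_{\alpha/2} = z_{0.005} = 2.576$, $\sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.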
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.280 \pm 2.576\sqrt{\frac{0.280(1-0.280)}{9600}} = 0.28 \pm 0.012$$
$$CI = (0.268, 0.292)$$
We are 99 percent confident that the interval from 0.268 to 0.292 contains the population proportion of all United States twelfth-grade students who would answer this question correctly.
b.) (Completed tree diagram omitted.)
c.)
$$P(\text{Ans\_Corr}) = P(\text{Know\_Corr}) + P(\text{Guess\_Corr}) = k + 0.25(1-k) = 0.25 + 0.75k$$
d.)
Since $p = 0.25 + 0.75k$, then
$$0.268 \le p \le 0.292$$
$$0.268 \le 0.25 + 0.75k \le 0.292$$
$$0.024 \le k \le 0.056$$
We are 99 percent confident that the interval from 0.024 to 0.056 contains the proportion of all United States twelfth-grade students who actually know the answer to the history question.
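Part (d) works because $p = 0.25 + 0.75k$ is an increasing linear function of $k$, so each endpoint of the interval for $p$ maps to an endpoint of the interval for $k$. A minimal sketch of that mapping follows; the function name `knowledge_interval` and the `guess_rate` parameter are illustrative, not from the exam.

```python
def knowledge_interval(p_low: float, p_high: float, guess_rate: float = 0.25):
    """Map a CI for p (answers correctly) to a CI for k (knows the answer).

    Uses p = k + guess_rate * (1 - k), solved for k:
    k = (p - guess_rate) / (1 - guess_rate).
    """
    def to_k(p: float) -> float:
        return (p - guess_rate) / (1 - guess_rate)

    return to_k(p_low), to_k(p_high)


# Interval for p from part (a), four answer options so guess_rate = 0.25.
k_low, k_high = knowledge_interval(0.268, 0.292)
print(f"CI for k = ({k_low:.3f}, {k_high:.3f})")
```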
Example 4.10.7. [2013 AP Stats FRQ, #1] An environmental group conducted a study to
determine whether crows in a certain region were ingesting food containing unhealthy
levels of lead. A biologist classified lead levels greater than 6.0 parts per million (ppm) as
unhealthy. The lead levels of a random sample of 23 crows in the region were measured
and recorded. The data are shown in the stemplot below.
a.) What proportion of crows in the sample had lead levels that are classified by the
biologist as unhealthy?
b.) The mean lead level of the 23 crows in the sample was 4.90 ppm and the standard
deviation was 1.12 ppm. Construct and interpret a 95 percent confidence interval for the
mean lead level of crows in the region.
Solution:
a.) $\hat{p} = \frac{4}{23} = 0.174$
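b.) $\bar{x} = 4.90$, $s = 1.12$, $n = 23$, $df = 23 - 1 = 22$, $t_{\alpha/2}^{df} = t_{0.025}^{22} = 2.074$
$$\bar{x} \pm t_{\alpha/2}^{n-1}\frac{s}{\sqrt{n}} = 4.90 \pm 2.074\frac{1.12}{\sqrt{23}} = 4.90 \pm 0.484$$
$$CI = (4.416, 5.384)$$
We can be 95% confident that the population mean lead level among all crows in this region is between 4.416 and 5.384 parts per million.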
Quiz -- Confidence Interval