Post on 31-Mar-2018
STAT 111 Recitation 8
Linjun Zhang
March 17, 2017
Misc
- Midterm grades will be posted next Tuesday or Wednesday.
- The slides can be found at http://stat.wharton.upenn.edu/~linjunz/
- Send me email at linjunz@wharton.upenn.edu if you have any feedback (e.g., less review, more practice problems?).
Confidence intervals
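A general formula. For a parameter $\theta$, suppose we estimate it by $\hat\theta$.
Then an approximate 95% confidence interval for $\theta$ is
$\hat\theta - 2 \cdot \widehat{\mathrm{s.d.}}(\hat\theta)$ to $\hat\theta + 2 \cdot \widehat{\mathrm{s.d.}}(\hat\theta)$,
where $\mathrm{s.d.}(\hat\theta)$ is the standard deviation of $\hat\theta$, and $\widehat{\mathrm{s.d.}}(\hat\theta)$ is the
estimate of $\mathrm{s.d.}(\hat\theta)$.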
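A short sketch of where the factor 2 comes from, assuming the estimator is approximately normally distributed around the true parameter (an assumption not stated on the slide):

```latex
\[
P\bigl(|\hat\theta - \theta| \le 1.96 \cdot \mathrm{s.d.}(\hat\theta)\bigr) \approx 0.95
\]
```

Rounding 1.96 up to 2 and replacing the unknown standard deviation by its estimate gives the interval above.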
Confidence intervals
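- If $X$ has a binomial distribution $\mathrm{Binomial}(n, \theta)$ and we observe $X = x$, a (conservative) approximate 95% confidence interval for $\theta$ is
  $\frac{x}{n} - \frac{1}{\sqrt{n}}$ to $\frac{x}{n} + \frac{1}{\sqrt{n}}$.
- If $X_1, X_2, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, and we observe $x_1, x_2, \ldots, x_n$, an approximate 95% confidence interval for $\mu$ is
  $\bar{x} - \frac{2s}{\sqrt{n}}$ to $\bar{x} + \frac{2s}{\sqrt{n}}$,
  where $\bar{x} = \frac{x_1 + \cdots + x_n}{n}$ and $s^2 = \frac{x_1^2 + \cdots + x_n^2 - n\bar{x}^2}{n-1}$.
- If $X_1$ has a binomial distribution $\mathrm{Binomial}(n_1, \theta_1)$, $X_2$ has a binomial distribution $\mathrm{Binomial}(n_2, \theta_2)$, and we observe $X_1 = x_1$, $X_2 = x_2$, a (conservative) approximate 95% confidence interval for $\theta_1 - \theta_2$ is
  $p_1 - p_2 - \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ to $p_1 - p_2 + \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$,
  where $p_i = \frac{x_i}{n_i}$ for $i = 1, 2$.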
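The three intervals above can be sketched in a few lines of Python. The function names and the sample inputs are illustrative assumptions, not part of the slides; the formulas are the ones stated above.

```python
import math

def binomial_ci(x, n):
    """Conservative approximate 95% CI for theta: x/n -/+ 1/sqrt(n)."""
    p = x / n
    half_width = 1 / math.sqrt(n)
    return p - half_width, p + half_width

def mean_ci(xs):
    """Approximate 95% CI for mu: xbar -/+ 2s/sqrt(n)."""
    n = len(xs)
    xbar = sum(xs) / n
    # Sample variance via the shortcut formula from the slide.
    s2 = (sum(v * v for v in xs) - n * xbar ** 2) / (n - 1)
    half_width = 2 * math.sqrt(s2 / n)
    return xbar - half_width, xbar + half_width

def two_binomial_ci(x1, n1, x2, n2):
    """Conservative approximate 95% CI for theta1 - theta2."""
    diff = x1 / n1 - x2 / n2
    half_width = math.sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width

# Hypothetical inputs, chosen only to exercise the functions.
print(binomial_ci(40, 100))
print(mean_ci([1, 2, 3, 4, 5]))
print(two_binomial_ci(30, 100, 20, 100))
```

The binomial intervals are called conservative because $\theta(1-\theta) \le 1/4$, so twice the standard deviation of $x/n$ is never larger than $1/\sqrt{n}$.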
Estimating the difference between two means
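If $X_{11}, X_{12}, \ldots, X_{1n}$ are i.i.d. with mean $\mu_1$ and variance $\sigma_1^2$, $X_{21}, X_{22}, \ldots, X_{2m}$
are i.i.d. with mean $\mu_2$ and variance $\sigma_2^2$, and we observe $x_{11}, \ldots, x_{1n}$,
$x_{21}, \ldots, x_{2m}$, what can we say about $\mu_1 - \mu_2$?
- We estimate $\mu_1 - \mu_2$ by $\bar{x}_1 - \bar{x}_2$, where $\bar{x}_1 = \frac{x_{11} + \cdots + x_{1n}}{n}$ and $\bar{x}_2 = \frac{x_{21} + \cdots + x_{2m}}{m}$.
- When the two samples are independent, the variance of $\bar{X}_1 - \bar{X}_2$ is $\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}$, and we estimate $\sigma_1^2$ and $\sigma_2^2$ by
  $s_1^2 = \frac{x_{11}^2 + \cdots + x_{1n}^2 - n\bar{x}_1^2}{n-1}$, $s_2^2 = \frac{x_{21}^2 + \cdots + x_{2m}^2 - m\bar{x}_2^2}{m-1}$.
- An approximate 95% confidence interval for $\mu_1 - \mu_2$ is
  $\bar{x}_1 - \bar{x}_2 - 2\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}$ to $\bar{x}_1 - \bar{x}_2 + 2\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}$.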
Practice problem
Question
We are interested in investigating any potential difference between the mean blood
sugar level of diabetics (µ1) and that of non-diabetics (µ2). To do this we took a
sample of six diabetics and found the following blood sugar levels: 127, 144, 140, 136,
118, 138. We also took a sample of eight non-diabetics and found the following blood
sugar levels: 125, 128, 133, 141, 109, 125, 126, 122. (a) Estimate µ1 − µ2. (b) Find
two numbers between which we are about 95% certain that µ1 − µ2 lies.
Solution
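$\hat\mu_1 = \bar{x}_1 = \frac{1}{6}(127 + 144 + 140 + 136 + 118 + 138) = \frac{803}{6} = 133.83$.
$\hat\sigma_1^2 = s_1^2 = \frac{1}{6-1}\bigl(127^2 + 144^2 + 140^2 + 136^2 + 118^2 + 138^2 - 6 \times (803/6)^2\bigr) = 92.17$.
$\hat\mu_2 = \bar{x}_2 = \frac{1}{8}(125 + 128 + 133 + 141 + 109 + 125 + 126 + 122) = 126.125 \approx 126.13$.
$\hat\sigma_2^2 = s_2^2 = \frac{1}{8-1}\bigl(125^2 + 128^2 + 133^2 + 141^2 + 109^2 + 125^2 + 126^2 + 122^2 - 8 \times 126.125^2\bigr) = 83.55$.
(a) The estimate of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2 = 7.71$.
(b) The approximate 95% confidence interval is $\bar{x}_1 - \bar{x}_2 \pm 2\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}$ with $n = 6$ and $m = 8$, which is $-2.45$ to $17.87$.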
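As a check on the arithmetic, here is a minimal Python sketch that applies the two-sample formula to the blood sugar data from the question. The helper names are assumptions made for illustration. The sketch carries the unrounded sample means through the calculation; rounding the first mean to 133.83 before squaring shifts $s_1^2$ from about 92.17 to about 93.24 and moves the interval endpoints in the second decimal place.

```python
import math

diabetics = [127, 144, 140, 136, 118, 138]
non_diabetics = [125, 128, 133, 141, 109, 125, 126, 122]

def mean_and_var(xs):
    """Return the sample mean and the sample variance (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = (sum(v * v for v in xs) - n * xbar ** 2) / (n - 1)
    return xbar, s2

def two_mean_ci(xs, ys):
    """Estimate of mu1 - mu2 and its approximate 95% confidence interval."""
    xbar, s2x = mean_and_var(xs)
    ybar, s2y = mean_and_var(ys)
    diff = xbar - ybar
    half_width = 2 * math.sqrt(s2x / len(xs) + s2y / len(ys))
    return diff, diff - half_width, diff + half_width

estimate, lower, upper = two_mean_ci(diabetics, non_diabetics)
print(f"estimate = {estimate:.2f}, interval = {lower:.2f} to {upper:.2f}")
```

Because the interval contains 0, these data alone do not rule out equal mean blood sugar levels in the two groups.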
Regression
Suppose we observe $n$ data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$.
There appears to be a linear relationship between the random
variables $X_i$ and $Y_i$, $i = 1, 2, \ldots, n$, i.e.
$Y_i = \alpha + \beta X_i + \varepsilon_i$,
where $\varepsilon_i$ denotes the noise term (we assume that each $y_i$ is observed with noise
$\varepsilon_i$ that has mean 0 and variance $\sigma^2$).
Regression
- We can view $Y$ as a random, non-controllable quantity and $X$ as a non-random, controllable quantity.
Examples:
  - $Y$ is the growth height of a tree, and $X$ is the amount of water.
  - In a basketball game analysis, $Y$ is the points a player scores and $X$ is the minutes that player plays.
Regression: before/after the experiment
- Before the experiment
  - Think of $Y_1, Y_2, \ldots, Y_n$ as random variables whose values are not yet known.
  - $Y_1$ corresponds to $x_1$, $Y_2$ corresponds to $x_2$, and so on.
  - The mean of $Y_i$ is $\alpha + \beta x_i$ and the variance of $Y_i$ is $\sigma^2$.
  - The various $Y_i$ are independent but not identically distributed.
- After the experiment
  - Obtain observed values $y_1, y_2, \ldots, y_n$.
  - Plot the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ in the x-y plane.
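The before/after distinction can be illustrated with a small simulation. This is only a sketch: the values of `alpha`, `beta`, `sigma`, and `xs` are hypothetical, and normally distributed noise is an added assumption (the slides only require mean 0 and variance $\sigma^2$).

```python
import random

def simulate_responses(xs, alpha, beta, sigma, rng):
    """Draw one y_i = alpha + beta * x_i + eps_i for each fixed x_i."""
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]

rng = random.Random(0)               # fixed seed so the run is repeatable
xs = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical controlled x values
alpha, beta, sigma = 2.0, 0.5, 1.0   # hypothetical true parameters

# Before the experiment: each Y_i is random with its own mean alpha + beta * x_i.
means = [alpha + beta * x for x in xs]

# After the experiment: one realized set of observations y_1, ..., y_n.
ys = simulate_responses(xs, alpha, beta, sigma, rng)

for x, m, y in zip(xs, means, ys):
    print(f"x = {x:.1f}  mean of Y = {m:.2f}  observed y = {y:.2f}")
```

The means differ from one $x_i$ to the next, which is why the $Y_i$ are not identically distributed even though they share the same variance.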
Regression: auxiliary quantities
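$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
$s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
$s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$
$s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\,\bar{y}$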
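Each auxiliary quantity has a definitional form and a shortcut form. The sketch below computes both on a small made-up data set (the values of `xs` and `ys` are hypothetical) so the two forms can be compared.

```python
def centered_sums(xs, ys):
    """s_xx, s_yy, s_xy from the definitions (sums of centered products)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def shortcut_sums(xs, ys):
    """The same quantities from the shortcut formulas."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * xbar ** 2
    syy = sum(y * y for y in ys) - n * ybar ** 2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    return sxx, syy, sxy

xs = [1, 2, 3, 4]   # hypothetical data
ys = [2, 4, 5, 9]   # hypothetical data
print(centered_sums(xs, ys))
print(shortcut_sums(xs, ys))
```

The shortcut form is convenient by hand, but it subtracts two large numbers, so intermediate values should not be rounded before the subtraction.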
Regression: estimating α, β, σ2
Unbiased estimates:
- Estimate $\beta$ by $b = \frac{s_{xy}}{s_{xx}}$.
- Estimate $\alpha$ by $a = \bar{y} - b\bar{x}$.
- Estimate $\sigma^2$ by $s_r^2 = \frac{s_{yy} - b^2 s_{xx}}{n-2}$.
- Estimate the regression line by $y = a + bx$.
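The estimates above can be sketched as one small Python function. The name `fit_line` and the data are hypothetical; the function assumes more than two data points and at least two distinct x values, so that the divisions are defined.

```python
def fit_line(xs, ys):
    """Return (a, b, s2r): intercept, slope, and the estimate of sigma^2."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope estimate
    a = ybar - b * xbar                    # intercept estimate
    s2r = (syy - b ** 2 * sxx) / (n - 2)   # residual variance estimate
    return a, b, s2r

xs = [1, 2, 3, 4]   # hypothetical data
ys = [2, 4, 5, 9]   # hypothetical data
a, b, s2r = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, s2r = {s2r:.3f}")
```

For data lying exactly on a line, `s2r` comes out as 0 up to rounding, since there is no residual scatter left to estimate.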
Regression: practice problem
Practice Problem
Suppose we have observations of average income and total pizza sales for a
1-month period for eight different towns:
Estimate the mean pizza sales of a town with income x via the formula
“estimated mean = a+bx”. (That is, calculate a and b.)
Solution
- $\bar{x} = 10$, $\bar{y} = 43.625$, $s_{xx} = 210$, $s_{yy} = 1829.875$, $s_{xy} = 610$.
- $b = \frac{s_{xy}}{s_{xx}} = \frac{610}{210} = 2.905$; $a = \bar{y} - b\bar{x} = 43.625 - 2.904762 \times 10 = 14.57738$.
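The same calculation can be reproduced from the summary statistics alone, as in the sketch below. The variable names are illustrative. The residual variance estimate and the prediction at an income of 12 go beyond what the problem asks and are included only to show how the remaining formulas would be applied; the income value 12 is hypothetical.

```python
# Summary statistics quoted in the solution; n = 8 towns.
n = 8
xbar, ybar = 10, 43.625
sxx, syy, sxy = 210, 1829.875, 610

b = sxy / sxx                          # slope estimate
a = ybar - b * xbar                    # intercept estimate
s2r = (syy - b ** 2 * sxx) / (n - 2)   # estimate of sigma^2 (not asked for)

def estimated_mean_sales(income):
    """Estimated mean pizza sales for a town with the given income."""
    return a + b * income

print(f"b = {b:.6f}, a = {a:.5f}, s2r = {s2r:.4f}")
print(f"estimated mean at income 12: {estimated_mean_sales(12):.3f}")
```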