Inference in Simple Linear Regression
KNNL – Chapter 2
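Least Squares Estimate of β1
$$b_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{SS_{XY}}{SS_{XX}}$$
Note the following results:
$$\sum_{i=1}^{n}(X_i-\bar{X}) = \sum_{i=1}^{n}X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0$$
$$\sum_{i=1}^{n}(X_i-\bar{X})^2 = \sum_{i=1}^{n}X_i^2 - 2\bar{X}\sum_{i=1}^{n}X_i + n\bar{X}^2 = \sum_{i=1}^{n}X_i^2 - n\bar{X}^2 = SS_{XX}$$
$$\sum_{i=1}^{n}(X_i-\bar{X})X_i = \sum_{i=1}^{n}X_i^2 - \bar{X}\sum_{i=1}^{n}X_i = \sum_{i=1}^{n}X_i^2 - n\bar{X}^2 = SS_{XX}$$
$$\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}) = \sum_{i=1}^{n}(X_i-\bar{X})Y_i - \bar{Y}\sum_{i=1}^{n}(X_i-\bar{X}) = \sum_{i=1}^{n}(X_i-\bar{X})Y_i$$
Therefore b1 is a linear function of the Y_i:
$$b_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})Y_i}{SS_{XX}} = \sum_{i=1}^{n}k_iY_i \qquad \text{where: } k_i = \frac{X_i-\bar{X}}{SS_{XX}}$$
Note:
$$\sum_{i=1}^{n}k_i = \frac{\sum_{i=1}^{n}(X_i-\bar{X})}{SS_{XX}} = 0 \qquad \sum_{i=1}^{n}k_iX_i = \frac{\sum_{i=1}^{n}(X_i-\bar{X})X_i}{SS_{XX}} = \frac{SS_{XX}}{SS_{XX}} = 1 \qquad \sum_{i=1}^{n}k_i^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{SS_{XX}^2} = \frac{1}{SS_{XX}}$$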
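A minimal Python sketch of this result, using a small made-up data set and hypothetical function names (`slope_weights`, `slope_from_weights`): it computes b1 as the weighted sum of the Y_i and checks the three weight identities numerically.

```python
# Sketch: b1 written as a linear combination of the Y values,
# b1 = sum(k_i * Y_i) with k_i = (X_i - Xbar) / SS_XX.
# The data below are made up for illustration.

def slope_weights(x):
    """Return the weights k_i = (x_i - xbar) / SS_XX."""
    xbar = sum(x) / len(x)
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    return [(xi - xbar) / ss_xx for xi in x]


def slope_from_weights(x, y):
    """Least squares slope b1 = sum(k_i * y_i)."""
    return sum(ki * yi for ki, yi in zip(slope_weights(x), y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
k = slope_weights(x)

b1 = slope_from_weights(x, y)
sum_k = sum(k)                                   # should be 0
sum_kx = sum(ki * xi for ki, xi in zip(k, x))    # should be 1
sum_k2 = sum(ki ** 2 for ki in k)                # should be 1 / SS_XX
print(b1, sum_k, sum_kx, sum_k2)
```

For these data SS_XX = 10 and SS_XY = 6, so b1 = 0.6 and the sum of the squared weights is 1/SS_XX = 0.1 (up to floating-point rounding).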
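Sampling Distribution of b1 – Normal Error Model
Simple linear regression model with normal and independent errors:
$$Y_i = \beta_0 + \beta_1X_i + \varepsilon_i \qquad \varepsilon_i \sim N(0,\sigma^2)\ \text{independent} \quad\Rightarrow\quad Y_i \sim N(\beta_0+\beta_1X_i,\ \sigma^2)\ \text{independent}$$
Note the following results assuming independent Normal Random Variables:
$$Y_1,\dots,Y_n\ \text{independent}, \quad Y_i \sim N(\mu_i,\sigma_i^2), \quad U = \sum_{i=1}^{n}a_iY_i \quad\Rightarrow\quad U \sim N(\mu_U,\sigma_U^2)$$
$$\text{where: } \mu_U = E\{U\} = \sum_{i=1}^{n}a_i\mu_i \qquad \sigma_U^2 = \sigma^2\{U\} = \sum_{i=1}^{n}a_i^2\sigma_i^2$$
Applying these results with $a_i = k_i$, $\mu_i = \beta_0 + \beta_1X_i$ and $\sigma_i^2 = \sigma^2$:
$$b_1 = \sum_{i=1}^{n}k_iY_i \qquad k_i = \frac{X_i-\bar{X}}{SS_{XX}} \qquad \text{with: } \sum_{i=1}^{n}k_i = 0,\quad \sum_{i=1}^{n}k_iX_i = 1,\quad \sum_{i=1}^{n}k_i^2 = \frac{1}{SS_{XX}}$$
$$E\{b_1\} = \sum_{i=1}^{n}k_i(\beta_0+\beta_1X_i) = \beta_0\sum_{i=1}^{n}k_i + \beta_1\sum_{i=1}^{n}k_iX_i = \beta_0(0) + \beta_1(1) = \beta_1$$
$$\sigma^2\{b_1\} = \sum_{i=1}^{n}k_i^2\sigma^2 = \frac{\sigma^2}{SS_{XX}} \quad\Rightarrow\quad b_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{SS_{XX}}\right)$$
In practice σ² is unknown and is estimated by s² = MSE:
$$s^2\{b_1\} = \frac{MSE}{SS_{XX}} \qquad s\{b_1\} = \sqrt{\frac{MSE}{SS_{XX}}}$$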
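A quick Monte Carlo check of these results, as a sketch: the parameter values (β0 = 2, β1 = 0.5, σ = 1), the fixed X values, the seed, and the number of replications are all arbitrary assumptions. It simulates the normal error model many times and compares the simulated mean and variance of b1 with β1 and σ²/SS_XX.

```python
import random
import statistics

# Sketch: simulate Y_i = beta0 + beta1*X_i + eps_i, eps_i ~ N(0, sigma^2), with fixed X,
# and compare the mean/variance of the simulated b1 values with beta1 and sigma^2 / SS_XX.
# All parameter values below are arbitrary choices for illustration.

def slope_ls(x, y):
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return ss_xy / ss_xx


random.seed(1)
x = [1, 2, 3, 4, 5]
beta0, beta1, sigma = 2.0, 0.5, 1.0
reps = 20000

b1_draws = []
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    b1_draws.append(slope_ls(x, y))

xbar = sum(x) / len(x)
ss_xx = sum((xi - xbar) ** 2 for xi in x)
print("mean of b1:", statistics.fmean(b1_draws), "vs beta1 =", beta1)
print("var of b1:", statistics.variance(b1_draws), "vs sigma^2/SS_XX =", sigma ** 2 / ss_xx)
```

With SS_XX = 10 the theoretical variance is 0.1; the simulated values should land close to β1 and 0.1, with only Monte Carlo noise separating them.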
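Sampling Distribution of (b1-β1)/s{b1}
$$b_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{SS_{XX}}\right) \quad\Rightarrow\quad \frac{b_1-\beta_1}{\sigma\{b_1\}} \sim N(0,1) \qquad \sigma\{b_1\} = \sqrt{\frac{\sigma^2}{SS_{XX}}}$$
Also:
$$\frac{(n-2)\,s^2\{b_1\}}{\sigma^2\{b_1\}} = \frac{(n-2)\,MSE/SS_{XX}}{\sigma^2/SS_{XX}} = \frac{(n-2)\,MSE}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2(n-2) \quad \text{and is independent of } b_1$$
$$\frac{b_1-\beta_1}{s\{b_1\}} = \frac{(b_1-\beta_1)/\sigma\{b_1\}}{\sqrt{\dfrac{(n-2)\,s^2\{b_1\}/\sigma^2\{b_1\}}{n-2}}} \sim t(n-2)$$
$$\Pr\left\{t(\alpha/2;n-2) \le \frac{b_1-\beta_1}{s\{b_1\}} \le t(1-\alpha/2;n-2)\right\} = 1-\alpha$$
Since the t distribution is symmetric, $t(\alpha/2;n-2) = -t(1-\alpha/2;n-2)$, so:
$$\Pr\left\{-t(1-\alpha/2;n-2) \le \frac{b_1-\beta_1}{s\{b_1\}} \le t(1-\alpha/2;n-2)\right\} = 1-\alpha$$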
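(1-α)100% Confidence Interval for β1
$$\Pr\left\{-t(1-\alpha/2;n-2) \le \frac{b_1-\beta_1}{s\{b_1\}} \le t(1-\alpha/2;n-2)\right\} = 1-\alpha$$
$$\Pr\left\{-t(1-\alpha/2;n-2)\,s\{b_1\} \le b_1-\beta_1 \le t(1-\alpha/2;n-2)\,s\{b_1\}\right\} = 1-\alpha$$
$$\Pr\left\{-b_1-t(1-\alpha/2;n-2)\,s\{b_1\} \le -\beta_1 \le -b_1+t(1-\alpha/2;n-2)\,s\{b_1\}\right\} = 1-\alpha$$
$$\Pr\left\{b_1-t(1-\alpha/2;n-2)\,s\{b_1\} \le \beta_1 \le b_1+t(1-\alpha/2;n-2)\,s\{b_1\}\right\} = 1-\alpha$$
(1-α)100% Confidence Interval for β1:
$$b_1 \pm t(1-\alpha/2;n-2)\,s\{b_1\} \qquad s\{b_1\} = \sqrt{\frac{MSE}{SS_{XX}}}$$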
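A minimal sketch of this interval in Python. The function name `slope_ci` and the data are hypothetical; the critical value `t_crit` is passed in (for example, read from a t table) because the Python standard library has no t quantile function.

```python
import math

# Sketch: (1 - alpha)100% CI for beta1, b1 +/- t(1 - alpha/2; n - 2) * s{b1}.
# t_crit is supplied by the caller (e.g. from a t table).

def slope_ci(x, y, t_crit):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_b1 = math.sqrt(mse / ss_xx)
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1


# Made-up data with n = 5; t(0.975; 3) is about 3.182, giving a 95% interval.
lo, hi = slope_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(lo, hi)
```

For these data b1 = 0.6, MSE = 0.8 and s{b1} = √0.08 ≈ 0.283, so the interval is roughly (-0.30, 1.50), which contains 0.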
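Test of Hypothesis for β1
2-sided test: $H_0: \beta_1 = \beta_{10}$ vs $H_A: \beta_1 \ne \beta_{10}$ (almost always $\beta_{10} = 0$)
Test Statistic: $t^* = \dfrac{b_1-\beta_{10}}{s\{b_1\}}$ (Note: if $\beta_{10} = 0$, then $t^* = \dfrac{b_1}{s\{b_1\}}$)
Decision Rule: $|t^*| \ge t(1-\alpha/2;n-2) \Rightarrow$ Reject $H_0$, otherwise Fail to Reject
P-value: $2\Pr\{t(n-2) \ge |t^*|\}$
Upper-tail test: $H_0: \beta_1 \le \beta_{10}$ vs $H_A: \beta_1 > \beta_{10}$
Decision Rule: $t^* \ge t(1-\alpha;n-2) \Rightarrow$ Reject $H_0$, otherwise Fail to Reject
P-value: $\Pr\{t(n-2) \ge t^*\}$
Lower-tail test: $H_0: \beta_1 \ge \beta_{10}$ vs $H_A: \beta_1 < \beta_{10}$
Decision Rule: $t^* \le -t(1-\alpha;n-2) \Rightarrow$ Reject $H_0$, otherwise Fail to Reject
P-value: $\Pr\{t(n-2) \le t^*\}$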
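The three decision rules can be sketched in Python as follows. The function names and the `alternative` labels are hypothetical, the data are made up, and critical values are supplied by the caller from a t table: t(1-α/2; n-2) for the 2-sided test and t(1-α; n-2) for the one-sided tests.

```python
import math

# Sketch of the t test for H0: beta1 = beta10 against the three alternatives.
# Critical values come from a t table and are passed in by the caller.

def slope_t_stat(x, y, beta10=0.0):
    """Return t* = (b1 - beta10) / s{b1}."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return (b1 - beta10) / math.sqrt(mse / ss_xx)


def reject_h0(t_star, t_crit, alternative="two-sided"):
    """Apply the decision rule; True means reject H0."""
    if alternative == "two-sided":
        return abs(t_star) >= t_crit
    if alternative == "greater":   # HA: beta1 > beta10
        return t_star >= t_crit
    if alternative == "less":      # HA: beta1 < beta10
        return t_star <= -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_star = slope_t_stat(x, y)                               # about 2.12
print(t_star)
print(reject_h0(t_star, 3.182))                           # t(0.975; 3) is about 3.182
print(reject_h0(t_star, 2.353, alternative="greater"))    # t(0.95; 3) is about 2.353
```

Here t* ≈ 2.12 falls short of both tabled critical values, so neither the 2-sided nor the upper-tail test rejects H0 at α = 0.05.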
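Inference Concerning β0
$$b_0 = \bar{Y} - b_1\bar{X} \qquad \text{where: } \bar{Y} = \frac{1}{n}\sum_{i=1}^{n}Y_i \ \text{ and } \ b_1 = \sum_{i=1}^{n}k_iY_i$$
Aside: $\sigma^2\{\bar{Y}\} = \dfrac{\sigma^2}{n}$, $\sigma^2\{b_1\} = \dfrac{\sigma^2}{SS_{XX}}$, $\sigma\{\bar{Y},b_1\} = \sum_{i=1}^{n}\dfrac{1}{n}k_i\sigma^2 = \dfrac{\sigma^2}{n}\sum_{i=1}^{n}k_i = 0$
$$E\{b_0\} = E\{\bar{Y}\} - \bar{X}E\{b_1\} = (\beta_0 + \beta_1\bar{X}) - \beta_1\bar{X} = \beta_0$$
$$\sigma^2\{b_0\} = \sigma^2\{\bar{Y}\} + \bar{X}^2\sigma^2\{b_1\} - 2\bar{X}\sigma\{\bar{Y},b_1\} = \frac{\sigma^2}{n} + \frac{\bar{X}^2\sigma^2}{SS_{XX}} - 0 = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{SS_{XX}}\right]$$
$$b_0 \sim N\left(\beta_0,\ \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{SS_{XX}}\right]\right)$$
Estimated Standard Error:
$$s\{b_0\} = \sqrt{MSE\left[\frac{1}{n} + \frac{\bar{X}^2}{SS_{XX}}\right]}$$
$$\frac{b_0-\beta_0}{s\{b_0\}} \sim t(n-2) \qquad (1-\alpha)100\%\ \text{CI for } \beta_0:\quad b_0 \pm t(1-\alpha/2;n-2)\,s\{b_0\}$$
Test Statistic for testing $H_0: \beta_0 = \beta_{00}$ vs $H_A: \beta_0 \ne \beta_{00}$: $t^* = \dfrac{b_0-\beta_{00}}{s\{b_0\}}$. Reject $H_0$ if $|t^*| \ge t(1-\alpha/2;n-2)$.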
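A sketch of the intercept computations under the same made-up data; the function name `intercept_inference` is hypothetical and `t_crit` is again assumed to come from a t table.

```python
import math

# Sketch: b0, its estimated standard error s{b0}, and the interval
# b0 +/- t(1 - alpha/2; n - 2) * s{b0}. t_crit is supplied by the caller.

def intercept_inference(x, y, t_crit):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_b0 = math.sqrt(mse * (1 / n + xbar ** 2 / ss_xx))
    return b0, s_b0, (b0 - t_crit * s_b0, b0 + t_crit * s_b0)


# Made-up data; t(0.975; 3) is about 3.182.
b0, s_b0, ci = intercept_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(b0, s_b0, ci)
```

For these data b0 = 2.2 and s{b0} = √(0.8 × (1/5 + 9/10)) = √0.88 ≈ 0.938.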
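Interval Estimation of E{Yh} = β0+β1Xh
Goal: Estimate the population mean when $X = X_h$:
Parameter: $E\{Y_h\} = \beta_0 + \beta_1X_h$
Estimator: $\hat{Y}_h = b_0 + b_1X_h = (\bar{Y} - b_1\bar{X}) + b_1X_h = \bar{Y} + b_1(X_h - \bar{X})$
$$E\{\hat{Y}_h\} = E\{\bar{Y}\} + (X_h-\bar{X})E\{b_1\} = \beta_0 + \beta_1\bar{X} + \beta_1(X_h-\bar{X}) = \beta_0 + \beta_1X_h$$
$$\sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y}\} + (X_h-\bar{X})^2\sigma^2\{b_1\} + 2(X_h-\bar{X})\,\sigma\{\bar{Y},b_1\} = \sigma^2\left[\frac{1}{n} + \frac{(X_h-\bar{X})^2}{SS_{XX}}\right]$$
Note: $\sigma\{\bar{Y},b_1\} = 0$
$$s\{\hat{Y}_h\} = \sqrt{MSE\left[\frac{1}{n} + \frac{(X_h-\bar{X})^2}{SS_{XX}}\right]}$$
$$(1-\alpha)100\%\ \text{CI for } E\{Y_h\}:\quad \hat{Y}_h \pm t(1-\alpha/2;n-2)\,s\{\hat{Y}_h\}$$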
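A sketch of the mean-response interval; `mean_response_ci` is a hypothetical name, the data are made up, and `t_crit` is supplied from a t table. Evaluating at two X_h values shows that s{Ŷ_h} is smallest at X_h = X̄.

```python
import math

# Sketch: estimate E{Y_h} at X = X_h with Yhat_h = Ybar + b1 (X_h - Xbar) and the
# interval Yhat_h +/- t(1 - alpha/2; n - 2) * s{Yhat_h}. t_crit is supplied by the caller.

def mean_response_ci(x, y, xh, t_crit):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    yhat_h = ybar + b1 * (xh - xbar)
    s_yhat = math.sqrt(mse * (1 / n + (xh - xbar) ** 2 / ss_xx))
    return yhat_h, s_yhat, (yhat_h - t_crit * s_yhat, yhat_h + t_crit * s_yhat)


x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
for xh in (3, 4):          # X_h = Xbar, and one unit away from it
    print(xh, mean_response_ci(x, y, xh, t_crit=3.182))
```

At X_h = 4 the estimate is Ŷ_h = 4.6 with s{Ŷ_h} = √(0.8 × 0.3) = √0.24; at X_h = X̄ = 3 the standard error drops to √0.16 = 0.4.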
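Prediction Interval for Yh(new) when X=Xh
Goal: Predict a new (future) observation when $X = X_h$:
Target: $Y_{h(new)} = \beta_0 + \beta_1X_h + \varepsilon_{h(new)}$
Prediction: $\hat{Y}_h = b_0 + b_1X_h$
Prediction Error: $pred = Y_{h(new)} - \hat{Y}_h$
$$\sigma^2\{pred\} = \sigma^2\{Y_{h(new)}\} + \sigma^2\{\hat{Y}_h\} - 2\sigma\{Y_{h(new)},\hat{Y}_h\} = \sigma^2 + \sigma^2\left[\frac{1}{n} + \frac{(X_h-\bar{X})^2}{SS_{XX}}\right] = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_h-\bar{X})^2}{SS_{XX}}\right]$$
Note: $\sigma\{Y_{h(new)},\hat{Y}_h\} = 0$, since the new observation is independent of the sample used to fit the line.
$$s\{pred\} = \sqrt{MSE\left[1 + \frac{1}{n} + \frac{(X_h-\bar{X})^2}{SS_{XX}}\right]}$$
$$(1-\alpha)100\%\ \text{Prediction Interval for } Y_{h(new)}:\quad \hat{Y}_h \pm t(1-\alpha/2;n-2)\,s\{pred\}$$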
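A sketch of the prediction interval (hypothetical function name, made-up data, `t_crit` from a t table). The only change from the mean-response interval is the extra 1 inside the bracket, which accounts for the new observation's own error term.

```python
import math

# Sketch: prediction interval for a new observation at X = X_h,
# Yhat_h +/- t(1 - alpha/2; n - 2) * s{pred},
# s{pred}^2 = MSE [1 + 1/n + (X_h - Xbar)^2 / SS_XX]. t_crit is supplied by the caller.

def prediction_interval(x, y, xh, t_crit):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    yhat_h = b0 + b1 * xh
    s_pred = math.sqrt(mse * (1 + 1 / n + (xh - xbar) ** 2 / ss_xx))
    return yhat_h, s_pred, (yhat_h - t_crit * s_pred, yhat_h + t_crit * s_pred)


yhat_h, s_pred, pi = prediction_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 4, t_crit=3.182)
print(yhat_h, s_pred, pi)
```

At X_h = 4, s{pred} = √(0.8 × 1.3) = √1.04 ≈ 1.02, roughly twice the √0.24 ≈ 0.49 standard error of the estimated mean at the same X_h.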
Confidence Band for Regression Line
Goal: Simultaneously estimate the population mean for all $X_h$ values (not extrapolating):
Parameter: $E\{Y_h\} = \beta_0 + \beta_1X_h$
Estimator: $\hat{Y}_h = b_0 + b_1X_h = (\bar{Y} - b_1\bar{X}) + b_1X_h = \bar{Y} + b_1(X_h - \bar{X})$
$$s\{\hat{Y}_h\} = \sqrt{MSE\left[\frac{1}{n} + \frac{(X_h-\bar{X})^2}{SS_{XX}}\right]}$$
$$(1-\alpha)100\%\ \text{CI for } E\{Y_h\}:\quad \hat{Y}_h \pm W\,s\{\hat{Y}_h\} \qquad W = \sqrt{2F(1-\alpha;2,n-2)}$$
This can be used for any number of specific $X_h$ levels, simultaneously.
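A sketch of the band over a grid of X_h values inside the observed X range. The function name and data are hypothetical; `f_crit` must be supplied from an F table (F(0.95; 2, 3) ≈ 9.552 is used for n = 5).

```python
import math

# Sketch of the band Yhat_h +/- W s{Yhat_h}, W = sqrt(2 F(1 - alpha; 2, n - 2)),
# evaluated on a grid of X_h values. f_crit is supplied by the caller.

def confidence_band(x, y, xh_values, f_crit):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    w = math.sqrt(2 * f_crit)
    band = []
    for xh in xh_values:
        yhat_h = b0 + b1 * xh
        s_yhat = math.sqrt(mse * (1 / n + (xh - xbar) ** 2 / ss_xx))
        band.append((xh, yhat_h - w * s_yhat, yhat_h + w * s_yhat))
    return w, band


# F(0.95; 2, 3) is about 9.552; the grid stays within the observed X range.
w, band = confidence_band([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], [1, 2, 3, 4, 5], f_crit=9.552)
print(w)
for row in band:
    print(row)
```

Here W ≈ 4.37 exceeds t(0.975; 3) ≈ 3.182, so at any single X_h the band is wider than the one-at-a-time interval; that extra width is the price of simultaneous coverage.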
Analysis of Variance Approach to Regression
Deviation of the ith observation from the Mean: $Y_i - \bar{Y}$
Total Sum of Squares: $SSTO = \sum_{i=1}^{n}(Y_i-\bar{Y})^2$
Deviation of the ith observation from the Regression Line: $Y_i - \hat{Y}_i$
Error Sum of Squares: $SSE = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$
Deviation of the ith fitted value from the Mean: $\hat{Y}_i - \bar{Y}$
Regression Sum of Squares: $SSR = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2$
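ANOVA Partitioning - I
$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$$
$$(Y_i-\bar{Y})^2 = (Y_i-\hat{Y}_i)^2 + (\hat{Y}_i-\bar{Y})^2 + 2(Y_i-\hat{Y}_i)(\hat{Y}_i-\bar{Y})$$
$$\sum_{i=1}^{n}(Y_i-\bar{Y})^2 = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 + \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 + 2\sum_{i=1}^{n}(Y_i-\hat{Y}_i)(\hat{Y}_i-\bar{Y})$$
Note (from normal equations, Chapter 1, Slide 5): with residuals $e_i = Y_i - \hat{Y}_i$, $\sum e_i = 0$ and $\sum e_i\hat{Y}_i = 0$, so:
$$\sum_{i=1}^{n}(Y_i-\hat{Y}_i)(\hat{Y}_i-\bar{Y}) = \sum_{i=1}^{n}e_i\hat{Y}_i - \bar{Y}\sum_{i=1}^{n}e_i = 0 - 0 = 0$$
$$\Rightarrow\quad \sum_{i=1}^{n}(Y_i-\bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 + \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 \qquad SSTO = SSR + SSE$$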
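A numerical check of the partition, as a sketch with made-up data and a hypothetical function name: it computes the three sums of squares and the cross-product term, which should vanish up to floating-point rounding.

```python
# Sketch: compute SSTO, SSE, SSR for a least squares line and check that
# the cross-product term is (numerically) zero, so SSTO = SSR + SSE.

def anova_sums(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    ssr = sum((fi - ybar) ** 2 for fi in yhat)
    cross = sum((yi - fi) * (fi - ybar) for yi, fi in zip(y, yhat))
    return ssto, sse, ssr, cross


ssto, sse, ssr, cross = anova_sums([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(ssto, sse, ssr, cross)   # approximately 6, 2.4, 3.6 and 0
```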
ANOVA Partitioning
Note useful result regarding SSR:
$$SSR = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 = \sum_{i=1}^{n}(b_0 + b_1X_i - \bar{Y})^2 = \sum_{i=1}^{n}(\bar{Y} - b_1\bar{X} + b_1X_i - \bar{Y})^2 = b_1^2\sum_{i=1}^{n}(X_i-\bar{X})^2 = b_1^2SS_{XX}$$
Degrees of Freedom associated with each sum of squares:
Total: $SSTO = \sum_{i=1}^{n}(Y_i-\bar{Y})^2$, $df_{TO} = n-1$ (One parameter estimated)
Error: $SSE = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$, $df_E = n-2$ (Two parameters estimated)
Regression: $SSR = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2$, $df_R = 2-1 = 1$ (Fitted equation has 2 parameters, the mean removes 1)
$$df_{TO} = df_R + df_E$$
Note: Mean Squares are Sums of Squares divided by degrees of freedom.
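Analysis of Variance Table
$$\begin{array}{lcccc}
\text{Source} & df & SS & MS & E\{MS\} \\
\hline
\text{Regression} & 1 & SSR = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 & MSR = SSR/1 & \sigma^2 + \beta_1^2\sum_{i=1}^{n}(X_i-\bar{X})^2 \\
\text{Error} & n-2 & SSE = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 & MSE = SSE/(n-2) & \sigma^2 \\
\text{Total} & n-1 & SSTO = \sum_{i=1}^{n}(Y_i-\bar{Y})^2 & & 
\end{array}$$
Note:
$$\frac{SSE}{\sigma^2} \sim \chi^2(n-2) \ \Rightarrow\ E\left\{\frac{SSE}{\sigma^2}\right\} = n-2 \ \Rightarrow\ E\left\{\frac{SSE}{n-2}\right\} = E\{MSE\} = \sigma^2$$
$$E\{MSR\} = E\{SSR\} = E\{b_1^2SS_{XX}\} = SS_{XX}E\{b_1^2\} = SS_{XX}\left[\sigma^2\{b_1\} + (E\{b_1\})^2\right] = SS_{XX}\left[\frac{\sigma^2}{SS_{XX}} + \beta_1^2\right] = \sigma^2 + \beta_1^2\sum_{i=1}^{n}(X_i-\bar{X})^2$$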
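A sketch that assembles the table rows and F* = MSR/MSE for made-up data. The function name and the row layout are hypothetical choices; SSR is computed through the identity SSR = b1²·SS_XX.

```python
# Sketch: ANOVA table (Source, df, SS, MS) for simple linear regression,
# plus F* = MSR / MSE. SSR is computed as b1^2 * SS_XX.

def anova_table(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssto = sum((yi - ybar) ** 2 for yi in y)
    ssr = b1 ** 2 * ss_xx               # equals sum((Yhat_i - Ybar)^2)
    msr, mse = ssr / 1, sse / (n - 2)
    rows = [
        ("Regression", 1, ssr, msr),
        ("Error", n - 2, sse, mse),
        ("Total", n - 1, ssto, None),
    ]
    return rows, msr / mse


rows, f_star = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for source, df, ss, ms in rows:
    ms_text = "" if ms is None else f"{ms:.4f}"
    print(f"{source:<11}{df:>4}{ss:>10.4f}{ms_text:>10}")
print("F* =", f_star)
```

For these data MSR = 3.6 and MSE = 0.8, so F* = 4.5; with F(0.95; 1, 3) ≈ 10.13 from an F table, H0: β1 = 0 would not be rejected at α = 0.05.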
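F-Test for H0: β1=0 vs HA: β1≠0
$H_0: \beta_1 = 0$ vs $H_A: \beta_1 \ne 0$
Test Statistic: $F^* = \dfrac{MSR}{MSE}$. Reject $H_0$ if $F^* \ge F(1-\alpha;1,n-2)$ (See below for why)
Reject for large $F^*$ since:
$$\frac{E\{MSR\}}{E\{MSE\}} = \frac{\sigma^2 + \beta_1^2\sum_{i=1}^{n}(X_i-\bar{X})^2}{\sigma^2}$$
which equals 1 under $H_0$ and exceeds 1 under $H_A$.
Sampling Distribution of $F^*$ under $H_0: \beta_1 = 0$ (Cochran's Theorem, p. 70):
1) $SSTO = SSR + SSE$ and $df_R + df_E = 1 + (n-2) = n-1 = df_{TO}$
2) $\dfrac{SSE}{\sigma^2} \sim \chi^2(n-2)$ and $\dfrac{SSR}{\sigma^2} \sim \chi^2(1)$, independent
3) $\dfrac{(SSR/\sigma^2)/1}{(SSE/\sigma^2)/(n-2)} \sim F(1,n-2)$
$$\Rightarrow\quad F^* = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)} \sim F(1,n-2)$$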
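Comments on F-Test
1) $\dfrac{SSE}{\sigma^2} \sim \chi^2(n-2)$ regardless of whether or not $\beta_1 = 0$. When $\beta_1 \ne 0$:
2) $\dfrac{SSR}{\sigma^2} \sim$ non-central $\chi^2(1)$, independent of $SSE$ regardless of the value of $\beta_1$
3) $F^* = \dfrac{MSR}{MSE} = \dfrac{SSR/1}{SSE/(n-2)} \sim$ non-central $F(1,n-2)$
4) The F-test and t-test are equivalent for $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \ne 0$:
$$F^* = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)} = \frac{b_1^2SS_{XX}}{MSE} = \frac{b_1^2}{MSE/SS_{XX}} = \frac{b_1^2}{s^2\{b_1\}} = \left(\frac{b_1}{s\{b_1\}}\right)^2 = (t^*)^2$$
Critical Value for $F^*$: $F(1-\alpha;1,n-2) = \big[t(1-\alpha/2;n-2)\big]^2$, the square of the Critical Value for $|t^*|$.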
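A numerical check of item 4, as a sketch with made-up data and a hypothetical function name: F* and (t*)² are computed separately and should agree up to rounding.

```python
import math

# Sketch: confirm F* = (t*)^2 for H0: beta1 = 0 on made-up data.

def slope_f_and_t(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    f_star = (b1 ** 2 * ss_xx) / mse        # MSR / MSE with MSR = SSR / 1
    t_star = b1 / math.sqrt(mse / ss_xx)    # b1 / s{b1}
    return f_star, t_star


f_star, t_star = slope_f_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f_star, t_star ** 2)     # agree up to floating-point rounding
print(10.13, 3.182 ** 2)       # tabled F(0.95; 1, 3) vs t(0.975; 3)^2, equal up to table rounding
```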
General Linear Test – Very Flexible Method
1) Fit the Full/Unrestricted Model (No restrictions on $\beta_0, \beta_1$): $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$
Compute $b_0, b_1$ by least squares and $\hat{Y}_i(F) = b_0 + b_1X_i$
Error sum of squares for Full Model: $SSE(F) = \sum_{i=1}^{n}\big(Y_i - \hat{Y}_i(F)\big)^2$, $df_F = n-2$
2) Fit the Reduced/Restricted Model (Restriction(s) on $\beta_0$ and/or $\beta_1$): $H_0: \beta_0 = \beta_{00}$ and/or $\beta_1 = \beta_{10}$
Estimate any unspecified parameters by least squares with the restriction(s) and obtain $\hat{Y}_i(R)$
Error sum of squares for Reduced Model: $SSE(R) = \sum_{i=1}^{n}\big(Y_i - \hat{Y}_i(R)\big)^2$, $df_R = n - (\#\ \text{of unrestricted parameters})$
3) Compute Test Statistic:
$$F^* = \frac{\big[SSE(R) - SSE(F)\big]/(df_R - df_F)}{SSE(F)/df_F}$$
4) Reject $H_0$ if $F^* \ge F(1-\alpha;\ df_R - df_F,\ df_F)$
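A sketch of steps 1 to 4 for a reduced model that fixes β1 at a value β10 while leaving β0 free. The function names and data are hypothetical. With β1 fixed, least squares for β0 alone gives b0 = mean(Y_i - β10·X_i).

```python
# Sketch of the general linear test for H0: beta1 = beta10 (beta0 unrestricted).

def general_linear_f(sse_r, df_r, sse_f, df_f):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)


def full_model_sse(x, y):
    """SSE(F) and df_F = n - 2 for Y_i = beta0 + beta1 X_i + e_i."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)), n - 2


def reduced_model_sse(x, y, beta10):
    """SSE(R) and df_R = n - 1 when beta1 = beta10 and only beta0 is estimated."""
    n = len(x)
    z = [yi - beta10 * xi for xi, yi in zip(x, y)]
    b0 = sum(z) / n                    # least squares estimate of beta0 alone
    return sum((zi - b0) ** 2 for zi in z), n - 1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sse_f, df_f = full_model_sse(x, y)
for beta10 in (0.0, 1.0):
    sse_r, df_r = reduced_model_sse(x, y, beta10)
    print(beta10, general_linear_f(sse_r, df_r, sse_f, df_f))
```

With β10 = 0 the reduced model is Y_i = β0 + ε_i, so SSE(R) = SSTO and F* = 4.5 equals MSR/MSE for these data; with β10 = 1, F* = 2.0, which matches the square of t* = (0.6 - 1)/√0.08.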
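Example 1 – H0: β1 = 0
Full (Unrestricted) Model: $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$
$$b_1 = \frac{SS_{XY}}{SS_{XX}} \qquad b_0 = \bar{Y} - b_1\bar{X} \qquad \hat{Y}_i(F) = b_0 + b_1X_i \qquad SSE(F) = \sum_{i=1}^{n}\big(Y_i - \hat{Y}_i(F)\big)^2 = SSE \qquad df_F = n-2$$
Reduced (Restricted) Model: $Y_i = \beta_0 + 0X_i + \varepsilon_i$
$$b_0 = \bar{Y} - (0)\bar{X} = \bar{Y} \qquad \hat{Y}_i(R) = b_0 = \bar{Y} \qquad SSE(R) = \sum_{i=1}^{n}\big(Y_i - \hat{Y}_i(R)\big)^2 = \sum_{i=1}^{n}(Y_i-\bar{Y})^2 = SSTO \qquad df_R = n-1$$
(only 1 "free" parameter: $\beta_0$)
Test Statistic:
$$F^* = \frac{\big[SSE(R) - SSE(F)\big]/(df_R - df_F)}{SSE(F)/df_F} = \frac{(SSTO - SSE)/\big[(n-1)-(n-2)\big]}{SSE/(n-2)} = \frac{SSR/1}{SSE/(n-2)} = \frac{MSR}{MSE}$$
Reject $H_0$ if $F^* \ge F(1-\alpha;1,n-2)$ (The ANOVA F-test)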
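Descriptive Measures of Linear Association
Coefficient of Determination (Proportionate Reduction in Error):
Ignoring Predictor (Setting $\beta_1 = 0$): "Error SS" $= \sum_{i=1}^{n}(Y_i-\bar{Y})^2 = SSTO$
Accounting for Predictor: "Error SS" $= \sum_{i=1}^{n}(Y_i - b_0 - b_1X_i)^2 = SSE$
Difference: Portion "accounted" for by $X$: $SSTO - SSE = SSR$
$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} \qquad 0 \le R^2 \le 1$$
Note: See plots on slides 12-14
Coefficient of Correlation (Often used when both $X$ and $Y$ are random):
$$r = \pm\sqrt{R^2} = \frac{SS_{XY}}{\sqrt{SS_{XX}SS_{YY}}} = b_1\sqrt{\frac{SS_{XX}}{SS_{YY}}} \qquad -1 \le r \le 1$$
1) $r$ and $b_1$ are of the same sign
2) $r$ (but not $b_1$) is unchanged by linear transformations of $Y$ and/or $X$ with positive slopes (a negative slope changes only the sign of $r$)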
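A sketch of these measures on made-up data (hypothetical function name; the transformation Y' = 3Y + 7 is an arbitrary example of a positive-slope linear transformation). It checks that r² = R², that r has the sign of b1, and that the transformation changes b1 but not r.

```python
import math

# Sketch: R^2 = SSR / SSTO and r = SS_XY / sqrt(SS_XX * SS_YY),
# before and after the linear transformation Y' = 3Y + 7.

def fit_summary(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    ss_yy = sum((yi - ybar) ** 2 for yi in y)    # SS_YY = SSTO
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    ssr = b1 ** 2 * ss_xx
    return b1, ssr / ss_yy, ss_xy / math.sqrt(ss_xx * ss_yy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, r2, r = fit_summary(x, y)
b1_t, r2_t, r_t = fit_summary(x, [3 * yi + 7 for yi in y])   # Y' = 3Y + 7
print(b1, r2, r)
print(b1_t, r2_t, r_t)
```

For these data R² = 3.6/6 = 0.6 and r = 6/√60 ≈ 0.775; after the transformation b1 triples to 1.8 while r is unchanged.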
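Correlation Models – Y1,Y2 Bivariate Normal
$Y_1, Y_2$: 2 Characteristics (Random) observed on each Experimental Unit
Joint Density (at specific pairs of values $y_1, y_2$):
$$f(y_1,y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho_{12}^2}}\exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\frac{y_1-\mu_1}{\sigma_1}\right)\left(\frac{y_2-\mu_2}{\sigma_2}\right) + \left(\frac{y_2-\mu_2}{\sigma_2}\right)^2\right]\right\}$$
where $\rho_{12} = \dfrac{\sigma_{12}}{\sigma_1\sigma_2}$ is the correlation between $Y_1, Y_2$, with $\sigma_{12} = E\{(Y_1-\mu_1)(Y_2-\mu_2)\}$
Marginal Densities:
$$f_1(y_1) = \int_{-\infty}^{\infty} f(y_1,y_2)\,dy_2 = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left\{-\frac{1}{2}\left(\frac{y_1-\mu_1}{\sigma_1}\right)^2\right\} \qquad Y_1 \sim N(\mu_1,\sigma_1^2) \qquad Y_2 \sim N(\mu_2,\sigma_2^2)$$
Conditional Densities $Y_1 \mid Y_2 = y_2$ and $Y_2 \mid Y_1 = y_1$:
$$f(y_1 \mid y_2) = \frac{f(y_1,y_2)}{f_2(y_2)} = \frac{1}{\sqrt{2\pi}\,\sigma_{1|2}}\exp\left\{-\frac{1}{2}\left(\frac{y_1 - \alpha_{1|2} - \beta_{12}y_2}{\sigma_{1|2}}\right)^2\right\}$$
where:
$$\alpha_{1|2} = \mu_1 - \mu_2\rho_{12}\frac{\sigma_1}{\sigma_2} \qquad \beta_{12} = \rho_{12}\frac{\sigma_1}{\sigma_2} \qquad \sigma_{1|2}^2 = \sigma_1^2(1-\rho_{12}^2)$$
$$Y_1 \mid Y_2 = y_2 \ \sim\ N\left(\alpha_{1|2} + \beta_{12}y_2,\ \sigma_{1|2}^2\right)$$
Similarly, $Y_2 \mid Y_1 = y_1 \sim N\left(\alpha_{2|1} + \beta_{21}y_1,\ \sigma_{2|1}^2\right)$, with subscripts 1 and 2 interchanged in the formulas above.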
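A simulation sketch of the conditional-distribution result. The parameter values, sample size, and seed are arbitrary assumptions, and the function names are hypothetical. Pairs are generated from independent standard normals so that Corr(Y1, Y2) = ρ12; a least squares regression of Y1 on Y2 should then recover α_{1|2}, β_{12} and σ²_{1|2} approximately.

```python
import math
import random

# Sketch: simulate bivariate normal (Y1, Y2) and regress Y1 on Y2 by least squares.
# Y1 = mu1 + s1*Z1 and Y2 = mu2 + s2*(rho*Z1 + sqrt(1 - rho^2)*Z2) give Corr(Y1, Y2) = rho.

def conditional_params(mu1, mu2, s1, s2, rho):
    """Return alpha_{1|2}, beta_{12}, sigma^2_{1|2}."""
    beta12 = rho * s1 / s2
    alpha12 = mu1 - mu2 * beta12
    var12 = s1 ** 2 * (1 - rho ** 2)
    return alpha12, beta12, var12


def simulate_pairs(mu1, mu2, s1, s2, rho, n, seed=2):
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y1 = mu1 + s1 * z1
        y2 = mu2 + s2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((y1, y2))
    return pairs


mu1, mu2, s1, s2, rho = 10.0, 5.0, 2.0, 1.0, 0.6
pairs = simulate_pairs(mu1, mu2, s1, s2, rho, n=50000)
y1 = [p[0] for p in pairs]
y2 = [p[1] for p in pairs]
m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
slope = sum((b - m2) * (a - m1) for a, b in pairs) / sum((b - m2) ** 2 for b in y2)
intercept = m1 - slope * m2
resid_var = sum((a - intercept - slope * b) ** 2 for a, b in pairs) / (len(pairs) - 2)

print(conditional_params(mu1, mu2, s1, s2, rho))   # about (4.0, 1.2, 2.56)
print(intercept, slope, resid_var)
```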
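Inferences on Correlation Coefficients
Parameter: $\rho_{12} = \dfrac{\sigma_{12}}{\sigma_1\sigma_2}$
Point (maximum likelihood) Estimator (aka Pearson product-moment correlation coefficient):
$$r_{12} = \frac{\sum_{i=1}^{n}(Y_{i1}-\bar{Y}_1)(Y_{i2}-\bar{Y}_2)}{\sqrt{\sum_{i=1}^{n}(Y_{i1}-\bar{Y}_1)^2\sum_{i=1}^{n}(Y_{i2}-\bar{Y}_2)^2}} \qquad \left(\text{with } X = Y_1,\ Y = Y_2:\ r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} = \frac{SS_{XY}}{\sqrt{SS_{XX}SS_{YY}}}\right)$$
Testing $H_0: \rho_{12} = 0$ vs $H_A: \rho_{12} \ne 0$:
Test Statistic: $t^* = \dfrac{r_{12}\sqrt{n-2}}{\sqrt{1-r_{12}^2}}$. Reject $H_0$ if $|t^*| \ge t(1-\alpha/2;n-2)$
For 1-sided tests:
$H_0: \rho_{12} \le 0$ vs $H_A: \rho_{12} > 0$: Reject $H_0$ if $t^* \ge t(1-\alpha;n-2)$
$H_0: \rho_{12} \ge 0$ vs $H_A: \rho_{12} < 0$: Reject $H_0$ if $t^* \le -t(1-\alpha;n-2)$
This test is mathematically equivalent to the t-test for $H_0: \beta_1 = 0$.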
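A sketch of that equivalence on made-up data (hypothetical function names): the correlation-based t* and the slope-based t* = b1/s{b1} are computed independently and should coincide.

```python
import math

# Sketch: t* from r (test of rho12 = 0) versus t* = b1 / s{b1} (test of beta1 = 0).

def corr_t_stat(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    ss_yy = sum((yi - ybar) ** 2 for yi in y)
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return r, r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def slope_t_stat(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b1 / math.sqrt(mse / ss_xx)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, t_r = corr_t_stat(x, y)
t_b = slope_t_stat(x, y)
print(r, t_r, t_b)
```

Both statistics equal about 2.12 here (their square is 4.5, the ANOVA F* for the same data).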
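Confidence Interval for ρ12
Problem: When $\rho_{12} \ne 0$, the sampling distribution of $r_{12}$ is messy.
Fisher's z transformation:
$$z' = \frac{1}{2}\ln\left(\frac{1+r_{12}}{1-r_{12}}\right)$$
For large $n$ (typically at least 25):
$$z' \ \overset{approx}{\sim}\ N\left(\zeta,\ \frac{1}{n-3}\right) \qquad \zeta = \frac{1}{2}\ln\left(\frac{1+\rho_{12}}{1-\rho_{12}}\right)$$
Compute an approximate $(1-\alpha)100\%$ CI for $\zeta$ and transform back for $\rho_{12}$:
$$(1-\alpha)100\%\ \text{CI for } \zeta:\quad z' \pm z(1-\alpha/2)\sqrt{\frac{1}{n-3}}$$
After computing the CI for $\zeta$, use the identity $\rho_{12} = \dfrac{e^{2\zeta}-1}{e^{2\zeta}+1}$ to transform each endpoint back to the $\rho_{12}$ scale.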
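A sketch of the Fisher z interval using only the Python standard library: `statistics.NormalDist().inv_cdf` supplies z(1-α/2), `math.atanh` is the z' transformation, and `math.tanh` is the back-transformation identity. The function name `fisher_z_ci` and the inputs (r = 0.5 from n = 28 pairs) are hypothetical.

```python
import math
from statistics import NormalDist

# Sketch of the approximate Fisher z interval for rho12.
# atanh(r) = 0.5 * ln((1 + r) / (1 - r)); tanh(zeta) = (e^{2 zeta} - 1) / (e^{2 zeta} + 1).

def fisher_z_ci(r, n, alpha=0.05):
    if n <= 3:
        raise ValueError("need n > 3 so that 1 / (n - 3) is defined")
    z_prime = math.atanh(r)
    half = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n - 3)
    lo, hi = z_prime - half, z_prime + half      # approximate CI for zeta
    return math.tanh(lo), math.tanh(hi)          # endpoints on the rho12 scale


print(fisher_z_ci(0.5, 28))   # hypothetical r = 0.5 from n = 28 pairs
```

The interval is symmetric on the ζ scale but not around r on the ρ12 scale, which reflects the skewness of the sampling distribution of r12 when ρ12 ≠ 0.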
Spearman Rank Correlation Coefficient
When the data are not normal, and no transformation makes them normal:
Spearman's Rank Correlation Method:
1) Rank $Y_{11},\dots,Y_{n1}$ from 1 to $n$ (smallest to largest) and label: $R_{11},\dots,R_{n1}$
2) Rank $Y_{12},\dots,Y_{n2}$ from 1 to $n$ (smallest to largest) and label: $R_{12},\dots,R_{n2}$
3) Compute Spearman's rank correlation coefficient:
$$r_S = \frac{\sum_{i=1}^{n}(R_{i1}-\bar{R}_1)(R_{i2}-\bar{R}_2)}{\sqrt{\sum_{i=1}^{n}(R_{i1}-\bar{R}_1)^2\sum_{i=1}^{n}(R_{i2}-\bar{R}_2)^2}} \qquad \bar{R}_1 = \bar{R}_2 = \frac{n+1}{2}$$
To Test: $H_0$: No Association Between $Y_1, Y_2$ vs $H_A$: Association Exists
Test Statistic: $t^* = \dfrac{r_S\sqrt{n-2}}{\sqrt{1-r_S^2}}$. Reject $H_0$ if $|t^*| \ge t(1-\alpha/2;n-2)$
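A sketch of the three steps plus the test statistic, with hypothetical function names and made-up data. Tied values receive the average of the tied positions; this tie rule is an assumption, since the steps above do not discuss ties. Averaging keeps the mean rank at (n+1)/2. The t statistic is undefined when r_S = ±1.

```python
import math

# Sketch of Spearman's method: rank each variable (ties -> average rank, an assumption),
# take the Pearson correlation of the ranks, then t* = r_S sqrt(n - 2) / sqrt(1 - r_S^2).

def average_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1    # average of ranks i+1, ..., j+1
        i = j + 1
    return ranks


def spearman(y1, y2):
    r1, r2 = average_ranks(y1), average_ranks(y2)
    n = len(r1)
    rbar = (n + 1) / 2                           # mean rank for both variables
    num = sum((a - rbar) * (b - rbar) for a, b in zip(r1, r2))
    den = math.sqrt(sum((a - rbar) ** 2 for a in r1) * sum((b - rbar) ** 2 for b in r2))
    r_s = num / den
    t_star = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)
    return r_s, t_star


print(spearman([10, 20, 30, 40], [1, 3, 2, 4]))   # made-up data without ties
print(average_ranks([2, 4, 5, 4, 5]))             # ties receive average ranks
```

For the first data set the ranks are (1, 2, 3, 4) and (1, 3, 2, 4), giving r_S = 0.8, which agrees with the no-ties shortcut 1 - 6Σd²/[n(n²-1)] = 1 - 12/60.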