Topic 3: Simple Linear Regression
Outline
• Simple linear regression model
– Model parameters
– Distribution of error terms
• Estimation of regression parameters
– Method of least squares
– Maximum likelihood
Data for Simple Linear Regression
• Observe n pairs of variables (Xi, Yi), i = 1, 2, ..., n
• Each pair often called a case
• Yi = ith response variable
• Xi = ith explanatory variable
Simple Linear Regression Model
• Yi = β0 + β1Xi + εi
• β0 is the intercept
• β1 is the slope
• εi is a random error term
– E(εi) = 0 and σ2(εi) = σ2
– εi and εj (i ≠ j) are uncorrelated
Simple Linear Normal Error Regression Model
• Yi = β0 + β1Xi + εi
• β0 is the intercept
• β1 is the slope
• εi is a Normally distributed random error with mean 0 and variance σ2
• εi and εj (i ≠ j) are uncorrelated → independent (for Normal errors, uncorrelated implies independent)
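As a quick illustration of this model, here is a minimal Python simulation sketch (the slides' examples use SAS; the parameter values β0 = 2, β1 = 0.5, σ = 1 below are illustrative, not from the slides):

```python
# Minimal simulation of the Normal error regression model:
#   Yi = beta0 + beta1*Xi + eps_i,  eps_i ~ N(0, sigma^2), independent.
# Parameter values are illustrative, not taken from the slides.
import random

random.seed(1)
beta0, beta1, sigma = 2.0, 0.5, 1.0
X = [float(x) for x in range(1, 101)]
Y = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in X]

# The errors average out: the mean deviation of the Yi from the true
# line beta0 + beta1*Xi should be near zero.
mean_dev = sum(y - (beta0 + beta1 * x) for x, y in zip(X, Y)) / len(X)
```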
Model Parameters
• β0 : the intercept
• β1 : the slope
• σ2 : the variance of the error term
Features of Both Regression Models
• Yi = β0 + β1Xi + εi
• E(Yi) = β0 + β1Xi + E(εi) = β0 + β1Xi
• Var(Yi) = 0 + Var(εi) = σ2
– Mean of Yi determined by value of Xi
– All possible means fall on a line
– The Yi vary about this line
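The two moment facts above, E(Yi) = β0 + β1Xi and Var(Yi) = σ2, can be checked by Monte Carlo at a single fixed X; a Python sketch with illustrative parameter values:

```python
# Monte Carlo check of E(Y) = beta0 + beta1*X and Var(Y) = sigma^2
# at one fixed X value (parameter values are illustrative).
import random
import statistics

random.seed(2)
beta0, beta1, sigma, x = 1.0, 3.0, 2.0, 4.0
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for _ in range(20000)]

mean_y = statistics.fmean(ys)    # should be near beta0 + beta1*x = 13
var_y = statistics.variance(ys)  # should be near sigma^2 = 4
```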
Features of Normal Error Regression Model
• Yi = β0 + β1Xi + εi
• If εi is Normally distributed then
Yi is N(β0 + β1Xi , σ2) (A.36)
• This does not imply that the Yi, taken together, follow a single Normal distribution; each Yi has a different mean, β0 + β1Xi
Fitted Regression Equation and Residuals
• Ŷi = b0 + b1Xi
– b0 is the estimated intercept
– b1 is the estimated slope
• ei : residual for ith case
• ei = Yi – Ŷi = Yi – (b0 + b1Xi)
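In code, fitted values and residuals follow directly from these definitions; a Python sketch (b0, b1 and the data are toy values assumed for illustration, not fit from real data):

```python
# Fitted values and residuals for a given fitted line Y-hat = b0 + b1*X.
# b0, b1 and the data are illustrative toy values.
b0, b1 = 1.0, 2.0
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.1, 4.9, 7.2, 8.8]

Y_hat = [b0 + b1 * x for x in X]             # Y-hat_i = b0 + b1*Xi
resid = [y - yh for y, yh in zip(Y, Y_hat)]  # e_i = Yi - Y-hat_i
```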
Example: for the case with X = 82 (year 82 in the pisa data),
Ŷ82 = b0 + b1(82) and e82 = Y82 – Ŷ82

Plot the residuals (continuation of pisa.sas):

proc gplot data=a2;
  plot resid*year / vref=0;
  where lean ne .;
run;

• Uses the data set from the output statement
• vref=0 adds a horizontal reference line at zero to the plot
Least Squares
• Want to find “best” b0 and b1
• Will minimize Σ(Yi – (b0 + b1Xi) )2
• Use calculus: take the derivative with respect to b0 and with respect to b1,
set the two resulting equations equal to zero, and solve for b0 and b1
• See KNNL pgs 16-17
Least Squares Solution
• b1 = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)2
• b0 = Ȳ – b1X̄
• These are also the maximum likelihood estimators for the Normal error model, see KNNL pp 30-32
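The closed-form solution translates directly to code; a Python sketch on toy data (the slides' own examples use SAS):

```python
# Least-squares estimates from the closed-form solution:
#   b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
#   b0 = Ybar - b1*Xbar
# Toy data for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(X)
Xbar = sum(X) / n
Ybar = sum(Y) / n
b1 = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / \
     sum((x - Xbar) ** 2 for x in X)
b0 = Ybar - b1 * Xbar
```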
Maximum Likelihood
• Yi ~ N(β0 + β1Xi , σ2), independent
• Density for the ith observation:
fi = (1 / √(2πσ2)) exp( –(Yi – β0 – β1Xi)2 / (2σ2) )
• Likelihood function: L = f1 · f2 · ⋯ · fn
• Find β0 and β1 which maximize L
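Since log L = –(n/2) log(2πσ2) – SSE/(2σ2), maximizing L over β0 and β1 is the same as minimizing SSE, which is why the maximum likelihood and least squares estimates coincide. A Python check on toy data:

```python
# For fixed sigma^2, the Normal log-likelihood is
#   log L = -(n/2) log(2*pi*sigma^2) - SSE(beta0, beta1) / (2*sigma^2),
# so the (beta0, beta1) maximizing L also minimize SSE.
import math

X = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data for illustration
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(X)

def log_lik(beta0, beta1, sigma2=1.0):
    sse = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(X, Y))
    return -(n / 2) * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)

# Closed-form least-squares solution:
Xbar, Ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / \
     sum((x - Xbar) ** 2 for x in X)
b0 = Ybar - b1 * Xbar
```

The log-likelihood at the least-squares solution beats any perturbed line, e.g. `log_lik(b0, b1) > log_lik(b0 + 0.1, b1)`.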
Estimation of σ2
• s2 = MSE = SSE / dfE = Σ(Yi – Ŷi)2 / (n – 2) = Σ ei2 / (n – 2)
• Root MSE = s, the estimate of σ
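The estimate follows mechanically from the residuals; a Python sketch continuing the same toy data:

```python
# Estimating sigma^2 by s^2 = MSE = SSE / (n - 2) after a least-squares fit.
# Toy data; b0, b1 are the least-squares estimates computed in-line.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / \
     sum((x - Xbar) ** 2 for x in X)
b0 = Ybar - b1 * Xbar

resid = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
sse = sum(e * e for e in resid)
mse = sse / (n - 2)       # dfE = n - 2: two parameters were estimated
root_mse = mse ** 0.5     # the "Root MSE" reported by Proc REG
```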
Standard output from Proc REG

Analysis of Variance

Source           DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1          15804           15804    904.12   <.0001
Error            11      192.28571        17.48052
Corrected Total  12          15997

Root MSE          4.18097    R-Square   0.9880
Dependent Mean  693.69231    Adj R-Sq   0.9869
Coeff Var         0.60271

• s = √MSE = Root MSE, with dfE = n – 2 = 11
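The printed quantities are tied together by a few identities (MSE = SSE/dfE, Root MSE = √MSE, F = MSM/MSE, R2 = SSM/SSTO, Coeff Var = 100 · Root MSE / Dependent Mean); a Python sketch that checks them, up to rounding, using values copied from the output above:

```python
# Checking (up to rounding) the relations among the Proc REG quantities.
# All numeric values are copied from the output table above.
import math

ss_model, ss_error, ss_total = 15804.0, 192.28571, 15997.0
df_model, df_error = 1, 11
dep_mean = 693.69231

ms_error = ss_error / df_error               # MSE = SSE / dfE
root_mse = math.sqrt(ms_error)               # s = sqrt(MSE)
f_value = (ss_model / df_model) / ms_error   # F = MSM / MSE
r_square = ss_model / ss_total               # R^2 = SSM / SSTO
coeff_var = 100 * root_mse / dep_mean        # coefficient of variation, in %
```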
Properties of Least Squares Line
• The line always goes through (X̄, Ȳ)
• The residuals always sum to zero:
ei = Yi – b0 – b1Xi
Σei = ΣYi – nb0 – b1ΣXi = nȲ – n(Ȳ – b1X̄) – b1nX̄ = 0
• Other properties on pgs 23-24
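Both properties are easy to verify numerically; a Python sketch on toy data:

```python
# Numerical check of two properties of the least-squares line:
# (1) the residuals sum to zero, (2) the line passes through (Xbar, Ybar).
X = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data for illustration
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / \
     sum((x - Xbar) ** 2 for x in X)
b0 = Ybar - b1 * Xbar

sum_resid = sum(y - (b0 + b1 * x) for x, y in zip(X, Y))
line_at_xbar = b0 + b1 * Xbar  # should equal Ybar
```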
Background Reading
• Chapter 1
– 1.6 : Estimation of regression function
– 1.7 : Estimation of error variance
– 1.8 : Normal regression model
• Chapter 2
– 2.1 and 2.2 : inference concerning β's
• Appendix A
– A.4, A.5, A.6, and A.7