Lecture 6: Introduction to Linear Regression
24 April 2007

Linear regression: main idea
Linear regression can be used to study an outcome as a linear function of a predictor.
Example: 60 cities in the US were evaluated for numerous characteristics, including:
- the percentage of the population that was “disadvantaged”
- median education level

Binary education variable
[Figure: % of population with income < $3000 (y-axis, 10 to 30) by education group (Low Education, High Education)]

Linear regression vs. ANOVA
These means could be compared by a t-test or ANOVA:
- Mean in low education group: 15.7%
- Mean in high education group: 13.2%

Regression provides a unified equation:
$$\hat{Y}_i = \beta_0 + \beta_1 X_i$$
$$\hat{Y}_i = 15.7 - 2.5 X_i$$
where $X_i = 1$ for high education and $X_i = 0$ for low education ($X$ is a “dummy variable” or “indicator variable” that designates group).

Interpreting the model
$$\hat{Y}_i = \beta_0 + \beta_1 X_i$$
$$\hat{Y}_i = 15.7 - 2.5 X_i$$
$\hat{Y}_i$ is the predicted mean of the outcome for $X_i$, that observation’s value for $X$.

$X_i = 0$ (Low education): $\hat{Y}_i = 15.7 - 2.5(0) = 15.7 = \beta_0$
$X_i = 1$ (High education): $\hat{Y}_i = 15.7 - 2.5(1) = 13.2 = \beta_0 + \beta_1$

Interpretation
$\beta_0$ is the mean outcome for the reference group, or the group for which $X_i = 0$.
Here, $\beta_0$ is the average percent of the population that is disadvantaged for cities with low education.

Interpretation
$\beta_1$ is the difference in the mean outcome between the two groups (when $X_i = 1$ vs. when $X_i = 0$).
Here, $\beta_1$ is the difference in the average percent of the population that is disadvantaged for cities with high education compared to cities with low education.
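The link between the dummy-variable regression and the two group means can be checked numerically. The sketch below uses a small hypothetical data set (not the lecture's 60 cities), chosen so that the group means match the 15.7% and 13.2% quoted earlier; the helper name `fit_simple_ols` is an assumption made for illustration.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical cities: x = 1 for high education, 0 for low education.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [16.0, 14.5, 17.1, 15.2, 13.0, 12.4, 14.1, 13.3]

b0, b1 = fit_simple_ols(x, y)
mean_low = mean([yi for xi, yi in zip(x, y) if xi == 0])
mean_high = mean([yi for xi, yi in zip(x, y) if xi == 1])

print("intercept:", round(b0, 2), "| mean of low group:", round(mean_low, 2))
print("slope:", round(b1, 2), "| high minus low:", round(mean_high - mean_low, 2))
```

With a single 0/1 predictor, the fitted intercept equals the reference-group mean and the fitted slope equals the difference in means, which is why the regression reproduces the t-test comparison.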

Why use linear regression?
Linear regression is very powerful. It can be used for many things:
- Binary X
- Continuous X
- Categorical X
- Adjustment for confounding
- Interaction
- Curved relationships between X and Y

Regression Analysis
A regression is a description of a response measure, Y, the dependent variable, as a function of an explanatory variable, X, the independent variable.
Goal: prediction or estimation of the value of one variable, Y, based on the value of the other variable, X.

Regression Analysis
A simple relationship between the two variables is a linear relationship (straight line relationship).
Other names: linear, simple linear, least squares regression

Galton’s Example
1000 records of heights of family groups
Really tall fathers tend on average to have tall sons, but not quite as tall as the really tall fathers.
There is a “regression” of a son’s height toward the average height for sons.

Galton’s Example
Regression of Son's Stature on Father's: E(Y) = 33.73 + 0.516*X
[Figure: Son's Height (y-axis, 64 to 74) vs. Father's Height in inches (x-axis, 60 to 74), with the regression line]

Regression Analysis: Population Model
Probability Model: independent responses $y_1, y_2, \ldots, y_n$ are sampled from
$$Y_i \sim N(\mu_i, \sigma^2)$$
Systematic Model: $\mu_i = E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$
where: $\beta_0$ = intercept, $\beta_1$ = slope

Another way to write the model
Systematic: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Probability: $\varepsilon_i \sim N(0, \sigma^2)$
The response, $Y_i$, is a linear function of $X_i$ plus some random, normally distributed error, $\varepsilon_i$.
Data = Signal + noise
Geometric Interpretation

Model
1) $Y_i \sim N(\mu_i, \sigma^2)$
2) $\mu_i = E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$

OR

1) $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
2) $\varepsilon_i \sim N(0, \sigma^2)$

where: $\beta_0$ = intercept, $\beta_1$ = slope
The response, $Y_i$, is a linear function of $X_i$ plus some random, normally distributed error, $\varepsilon_i$.
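The "Data = Signal + noise" reading of the model can be illustrated with a short simulation. This is only a sketch: the intercept and slope are borrowed from the Galton plot shown earlier (33.73 and 0.516), while the error standard deviation of 2.0, the seed, and the function name `simulate_responses` are assumptions made for the example.

```python
import random

def simulate_responses(x_values, beta0, beta1, sigma, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + eps_i, with eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    signal = [beta0 + beta1 * x for x in x_values]        # systematic part
    noise = [rng.gauss(0.0, sigma) for _ in x_values]     # random part
    data = [s + e for s, e in zip(signal, noise)]
    return signal, noise, data

# Hypothetical father heights in inches; sigma = 2.0 is an assumed value.
fathers = [60, 62, 64, 66, 68, 70, 72, 74]
signal, noise, data = simulate_responses(fathers, beta0=33.73, beta1=0.516, sigma=2.0)

for x, s, y in zip(fathers, signal, data):
    print(f"x={x}  mean response={s:.2f}  simulated response={y:.2f}")
```

Setting `sigma=0.0` removes the noise, so every simulated response falls exactly on the line $\beta_0 + \beta_1 x$.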

Interpretation of Coefficients
Mean Model: $\mu = E(y \mid x) = \beta_0 + \beta_1 x$
$\beta_0$ = expected response when $X = 0$
Since: $E(y \mid x = 0) = \beta_0 + \beta_1(0) = \beta_0$
$\beta_1$ = change in expected response per 1 unit increase in $X$
Since: $E(y \mid x + 1) = \beta_0 + \beta_1(x + 1)$
And: $E(y \mid x) = \beta_0 + \beta_1 x$
$\Delta E(y)$ from $x$ to $x + 1$ = $\beta_1$

From Galton’s Example
$E(Y \mid x) = \beta_0 + \beta_1 x$
$E(Y \mid x) = 33.7 + 0.52x$
where: $Y$ = son’s height (inches), $x$ = father’s height (inches)
Expected son’s height = 33.7 inches when father’s height is 0 inches
Expected difference in heights for sons whose fathers’ heights differ by one inch = 0.52 inches
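As a quick illustration of Galton's "regression" toward the average, the fitted line can be evaluated for a tall and a short father. The function name `expected_son_height` and the two example heights are assumptions; the coefficients are the rounded values quoted above.

```python
def expected_son_height(father_height_in):
    """Mean son's height (inches) from the rounded Galton fit E(Y|x) = 33.7 + 0.52x."""
    return 33.7 + 0.52 * father_height_in

tall_father, short_father = 74, 62
tall_son = expected_son_height(tall_father)
short_son = expected_son_height(short_father)

print(f"father {tall_father} in -> expected son {tall_son:.2f} in")
print(f"father {short_father} in -> expected son {short_son:.2f} in")
```

Under this fit, sons of 74-inch fathers are expected to be tall but shorter than their fathers, and sons of 62-inch fathers are expected to be taller than theirs.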

City/Education Example
[Figure: % of population with income < $3000 (y-axis, 10 to 30) vs. Median education (x-axis, 9 to 13)]

Model
$$\hat{Y}_i = \beta_0 + \beta_1 X_i$$
$$\hat{Y}_i = 36.2 - 2.0 X_i$$
where $X_i$ = the median education level in city $i$

when $X_i = 0$: $\hat{Y}_i = 36.2 - 2.0(0) = 36.2 = \beta_0$
when $X_i = 1$: $\hat{Y}_i = 36.2 - 2.0(1) = 34.2 = \beta_0 + \beta_1$
when $X_i = 2$: $\hat{Y}_i = 36.2 - 2.0(2) = 32.2 = \beta_0 + 2\beta_1$

Interpretation
$\beta_0$ is the mean outcome for the reference group, or the group for which $X_i = 0$.
Here, $\beta_0$ is the average percent of the population that is disadvantaged for cities with median education level of 0.

Interpretation
$\beta_1$ is the difference in the mean outcome for a one unit change in $X$.
Here, $\beta_1$ is the difference in the average percent of the population that is disadvantaged between two cities, when the first city has a median education level one unit (one year) higher than the second city.

Finding $\beta$’s from the graph
$\beta_0$ is the Y-intercept of the line, or the average value of Y when X = 0.
$\beta_1$ is the slope of the line, or the average change in Y per unit change in X.
$y = mx + b$, where $b = \beta_0$ and $m = \beta_1$
$$\hat{\beta}_1 = \frac{y_1 - y_2}{x_1 - x_2}$$
Notation:
$\beta_1$ represents the true slope (in the population)
$b_1$ and $\hat{\beta}_1$ are sample estimates of the slope

Where is our intercept?
[Figure: % of population with income < $3000 (y-axis, 10 to 60) vs. Median education (x-axis, 0 to 14)]

Centering
$\beta_0$ makes no sense!
We can change X to fix this problem by a process called centering:
1. Pick a value of X (c) within the range of the data
2. For each observation, generate X_centered = $X_i - c$
3. Redo the regression with X_centered

We’ll use c=12, a high school degree
[Figure: % of population with income < $3000 (y-axis, 10 to 30) vs. Median education (x-axis, 9 to 13)]

New equation
$$\hat{Y}_i = \beta_0 + \beta_1 (X_i - 12)$$
$$\hat{Y}_i = 12.2 - 2.0 (X_i - 12)$$
$\beta_1$ has not changed
$\beta_0$ now corresponds to $X = 12$, not $X = 0$
Note: with $X = 0$, we have $\hat{Y}_i = 12.2 - 2.0(0 - 12) = 12.2 + 24 = 36.2$

Interpretation
$\beta_0$ is the mean outcome for the reference group, or the group for which $X_i - 12 = 0$, or when $X_i = 12$.
Here, $\beta_0$ (12.2%) is the average percent of the population that is disadvantaged for cities with a median education level of 12, the equivalent of a high school degree.
The interpretation of $\beta_1$ has not changed.
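The effect of centering can be checked on any data set: refitting with $X - c$ leaves the slope alone and moves the intercept to the predicted value at $X = c$. The sketch below uses hypothetical city values (not the lecture's data) and an assumed helper `fit_simple_ols`.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical cities: median education (years) and % with income < $3000.
education = [9.5, 10.2, 10.9, 11.4, 12.0, 12.3, 12.8]
pct_low_income = [17.8, 15.1, 14.9, 13.0, 12.6, 11.1, 10.9]

c = 12  # centering value, a high school degree
b0, b1 = fit_simple_ols(education, pct_low_income)
b0_centered, b1_centered = fit_simple_ols([x - c for x in education], pct_low_income)

print("slope, uncentered vs centered:", round(b1, 3), round(b1_centered, 3))
print("centered intercept:", round(b0_centered, 3))
print("uncentered prediction at X = 12:", round(b0 + b1 * c, 3))
```

The same relationship holds for the lecture's rounded estimates: 36.2 - 2.0(12) = 12.2.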

Centering in Galton Example
Make 6 feet (72 inch) fathers the ‘reference group’
Create a new X variable, X*, by subtracting 72 from our old X variable: $X^* = X - 72$
Then: $E(Y \mid x^*) = \beta_0 + \beta_1 x^* = \beta_0 + \beta_1 (x - 72)$
So, $\beta_0$ = expected response when $X = 72$, since $E(Y \mid x = 72) = \beta_0 + \beta_1 (72 - 72) = \beta_0$
Center X’s whenever interpretations call for it!
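As a worked check, assuming the rounded Galton coefficients quoted earlier (33.7 and 0.52), the centered intercept can be obtained by rewriting the original line in terms of $x^* = x - 72$:

$$
\begin{aligned}
E(Y \mid x) &= 33.7 + 0.52x \\
            &= 33.7 + 0.52(72) + 0.52(x - 72) \\
            &= 71.14 + 0.52\,x^*
\end{aligned}
$$

Under these rounded values, the centered intercept would be about 71.1 inches, the expected height of sons of 72-inch fathers, and the slope stays at 0.52.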

Population Comparisons
$\beta_0$: changes depending on centering of X, which doesn’t affect association of interest
Real concern: is X associated with Y?
Assess by testing $\beta_1$: does $\beta_1 = 0$ in the population from which this sample was drawn?
- Hypothesis testing
- Confidence interval

Hypothesis testing
$H_0$: $\beta_1 = 0$
Test statistic:
$$t_{obs} = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)}$$
df = n-k-1
n = number of observations
k = number of predictors (X’s)

Hypothesis testing for education example
$H_0$: $\beta_1 = 0$
Test statistic:
$$t_{obs} = \frac{-2.0 - 0}{0.59} = -3.36$$
df = n-k-1 = 60-1-1 = 58
n = number of observations = 60
k = number of predictors (X’s) = 1
p < 2*(1-0.995)
p < 0.01
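The test statistic and degrees of freedom can be reproduced with a few lines. This is a sketch using the rounded values on the slide (estimate -2.0, standard error 0.59); with those rounded inputs the ratio comes out near -3.39, slightly different in magnitude from the 3.36 shown, presumably because the slide used unrounded estimates. The function names are assumptions.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """Observed t: (estimate - null value) / standard error."""
    return (estimate - null_value) / std_error

def residual_df(n_obs, n_predictors):
    """Degrees of freedom n - k - 1."""
    return n_obs - n_predictors - 1

t_obs = t_statistic(-2.0, 0.59)
df = residual_df(60, 1)

print("t_obs =", round(t_obs, 2))
print("df =", df)
```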

Interpretation and conclusion
If there were no association between median education and percentage of disadvantaged citizens in the population, there would be less than a 1% chance of observing data as or more extreme than ours.
The null probability is very small, so:
- reject the null hypothesis
- conclude that median education level and percentage of disadvantaged citizens are associated in the population

Confidence Interval
No need to specify a hypothesis:
$$\hat{\beta}_1 \pm t_{cr} \cdot SE(\hat{\beta}_1)$$
$$-2.0 \pm 2.021 \times 0.59 = (-3.2, -0.8)$$

Interpretation and conclusion
We are 95% confident that the true population decrease in percentage of disadvantaged citizens per additional year of median education is between 0.8 and 3.2 percentage points.
Since this interval does not contain 0, we believe percentage of disadvantaged citizens and median education are associated among cities in the United States.
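The interval arithmetic can be sketched as follows, using the slide's rounded slope (-2.0), standard error (0.59) and critical value (2.021). The function name `confidence_interval` is an assumption.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper) = estimate -/+ t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin

lower, upper = confidence_interval(-2.0, 0.59, 2.021)
contains_zero = lower <= 0.0 <= upper

print("95% CI:", (round(lower, 1), round(upper, 1)))
print("contains 0:", contains_zero)
```

Because the whole interval lies below zero, it leads to the same conclusion as the hypothesis test.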

So far…
Linear regression is used for continuous outcome variables
$\beta_0$: mean outcome when X=0
- Binary X = “dummy variable” for group
  - $\beta_1$: mean difference in outcome between groups
- Continuous X
  - $\beta_1$: mean difference in outcome corresponding to a 1-unit increase in X
  - Center X to give meaning to $\beta_0$

Test $\beta_1 = 0$ in the population

Linear Regression: Multiple covariates and confounding

Dataset
Hourly wage information from 9,918 workers, along with information regarding age, gender, years of experience, etc.
We’ll focus on predicting hourly wage with available information.

Regression: Hourly wage vs. Years of experience
[Figure: Hourly Wage (y-axis, 0 to 50) vs. Years of Experience (x-axis, 0 to 60)]

What are the parameters?
For each person, their actual hourly wage ($Y_i$) and predicted hourly wage ($\hat{Y}_i$) are known.
$\varepsilon_i = Y_i - \hat{Y}_i = Y_i - (\beta_0 + \beta_1 X_i)$ is the residual or error.
The parameters are found by minimizing the sum of the squared errors:
$$\min \sum_{i=1}^{n} \left( Y_i - (\beta_0 + \beta_1 X_i) \right)^2$$
The parameters are the “least squares” estimates.
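For one predictor, the minimization has a well-known closed-form solution (not shown on the slide): the slope is the sum of cross-products divided by the sum of squares of X, and the intercept is $\bar{Y} - \hat{\beta}_1 \bar{X}$. The sketch below applies it to hypothetical wage data and confirms that nudging either estimate increases the sum of squared errors; the data and function names are assumptions.

```python
from statistics import mean

def sum_squared_errors(x, y, b0, b1):
    """Sum over observations of (Y_i - (b0 + b1 * X_i))^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Closed-form least-squares estimates (intercept, slope) for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical workers: years of experience and hourly wage.
experience = [0, 2, 5, 10, 20, 30]
wage = [7.5, 9.0, 8.1, 9.4, 8.8, 10.2]

b0, b1 = least_squares(experience, wage)
best_sse = sum_squared_errors(experience, wage, b0, b1)

# Any other line should fit worse in the least-squares sense.
perturbed = [
    sum_squared_errors(experience, wage, b0 + d0, b1 + d1)
    for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.01), (0.0, -0.01)]
]
print("SSE at the estimates:", round(best_sse, 4))
print("SSE after nudging the estimates:", [round(s, 4) for s in perturbed])
```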

Notes
$\hat{Y}_i = \beta_0 + \beta_1 X_i$ for any known point on the line
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ is always true
The regression line equation: $\hat{Y} = \beta_0 + \beta_1 X$

Model 1
Model 1: Predict income by years of experience
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i = 8.38 + 0.04 X_i$$
$\hat{\beta}_0 = 8.38$, so the average hourly wage for someone with no experience at all is about \$8.40.
$\hat{\beta}_1 = 0.04$, so for every additional year of experience, the predicted hourly wage increases about 4 cents.
For 10 years of additional experience, the predicted hourly wage increases about 40 cents.

Should we center X?
0 years of experience is within the range of the data
The average hourly wage corresponding to 0 years of experience makes sense
No need to center X

What happens if we also consider gender? (Model 2)
[Figure: Hourly Wage (y-axis, 0 to 50) vs. Years of Experience (x-axis, 0 to 60); legend: Men's hourly wage, Women's hourly wage, fit2_men, fit2_women]

Model 2: Gender effect, no experience
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Gender}_i)$$
$$\hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i)$$
For a man with no experience:
$$\hat{Y}_i = 9.27 + 0.04(0) - 2.20(0) = \$9.27 = \hat{\beta}_0$$
For a woman with no experience:
$$\hat{Y}_i = 9.27 + 0.04(0) - 2.20(1) = \$7.07 = \hat{\beta}_0 + \hat{\beta}_2$$

Model 2: Gender effect, 10 years experience
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Gender}_i)$$
$$\hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i)$$
For a man with 10 years of experience:
$$\hat{Y}_i = 9.27 + 0.04(10) - 2.20(0) = \$9.67 = \hat{\beta}_0 + \hat{\beta}_1(10)$$
For a woman with 10 years of experience:
$$\hat{Y}_i = 9.27 + 0.04(10) - 2.20(1) = \$7.47 = \hat{\beta}_0 + \hat{\beta}_1(10) + \hat{\beta}_2(1)$$

Model 2: Experience effect, males
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Gender}_i)$$
$$\hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i)$$
For a man with no experience:
$$\hat{Y}_i = 9.27 + 0.04(0) - 2.20(0) = \$9.27 = \hat{\beta}_0$$
For a man with 10 years of experience:
$$\hat{Y}_i = 9.27 + 0.04(10) - 2.20(0) = \$9.67 = \hat{\beta}_0 + \hat{\beta}_1(10)$$

Model 2: Experience effect, females
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Gender}_i)$$
$$\hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i)$$
For a woman with no experience:
$$\hat{Y}_i = 9.27 + 0.04(0) - 2.20(1) = \$7.07 = \hat{\beta}_0 + \hat{\beta}_2$$
For a woman with 10 years of experience:
$$\hat{Y}_i = 9.27 + 0.04(10) - 2.20(1) = \$7.47 = \hat{\beta}_0 + \hat{\beta}_1(10) + \hat{\beta}_2$$

Interpretation: Model 2
$\hat{\beta}_0 = 9.27$: the average hourly wage for a man with no experience at all is about \$9.30.
$\hat{\beta}_1 = 0.04$: for every additional year of experience, the predicted hourly wage increases about 4 cents for both men and women.
$\hat{\beta}_2 = -2.20$: the expected hourly wage is \$2.20 lower for women than it is for men at any experience level.
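The four Model 2 predictions worked out on the preceding slides can be reproduced with a small function. This is a sketch using the rounded coefficients from the slides; the function name `predict_model2` and the coding of gender as 1 for women and 0 for men (implied by the slide calculations) are stated here as assumptions.

```python
def predict_model2(experience_years, is_woman):
    """Predicted hourly wage from Model 2 with the slides' rounded coefficients."""
    b0, b1, b2 = 9.27, 0.04, -2.20
    gender = 1 if is_woman else 0  # assumed coding: 1 = woman, 0 = man
    return b0 + b1 * experience_years + b2 * gender

for years in (0, 10):
    man = predict_model2(years, is_woman=False)
    woman = predict_model2(years, is_woman=True)
    print(f"{years:>2} years: man ${man:.2f}, woman ${woman:.2f}, gap ${man - woman:.2f}")
```

The gap between the two predictions is the same at every experience level, which is why the graph shows two parallel lines.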

Model 1 vs. Model 2
Model 1: $\hat{Y}_i = 8.38 + 0.04(\text{Experience}_i)$
Model 2: $\hat{Y}_i = 9.27 + 0.04(\text{Experience}_i) - 2.20(\text{Gender}_i)$
95% CI for $\beta_1$ in Model 1: (0.001, 0.07), and $\hat{\beta}_1$ from Model 2 is within this CI
Gender is not a confounder

What happens if we consider age, instead? (Model 3)
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40)$$
The relationship is harder to graph with two continuous predictors, since now the regression is in a 3-dimensional space.
Notice that age is centered at 40 years. Age ranged between 18 and 64 in this dataset.

Model 3: Age effect, no experience
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40)$$
$$\hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40)$$
For a 40-year-old with no experience:
$$\hat{Y}_i = 26.5 - 0.82(0) + 0.92(40 - 40) = \$26.50 = \hat{\beta}_0$$
For a 41-year-old with no experience:
$$\hat{Y}_i = 26.5 - 0.82(0) + 0.92(41 - 40) = \$27.42 = \hat{\beta}_0 + \hat{\beta}_2$$

Model 3: Age effect, 10 years experience
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40)$$
$$\hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40)$$
For a 40-year-old with 10 years of experience:
$$\hat{Y}_i = 26.5 - 0.82(10) + 0.92(40 - 40) = \$18.30 = \hat{\beta}_0 + \hat{\beta}_1(10)$$
For a 41-year-old with 10 years of experience:
$$\hat{Y}_i = 26.5 - 0.82(10) + 0.92(41 - 40) = \$19.22 = \hat{\beta}_0 + \hat{\beta}_1(10) + \hat{\beta}_2(1)$$

Model 3: Experience effect, 40 year old
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40)$$
$$\hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40)$$
For a 40-year-old with no experience:
$$\hat{Y}_i = 26.5 - 0.82(0) + 0.92(40 - 40) = \$26.50 = \hat{\beta}_0$$
For a 40-year-old with 10 years of experience:
$$\hat{Y}_i = 26.5 - 0.82(10) + 0.92(40 - 40) = \$18.30 = \hat{\beta}_0 + \hat{\beta}_1(10)$$

Model 3: Experience effect, 41 year old
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1(\text{Experience}_i) + \hat{\beta}_2(\text{Age}_i - 40)$$
$$\hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40)$$
For a 41-year-old with no experience:
$$\hat{Y}_i = 26.5 - 0.82(0) + 0.92(41 - 40) = \$27.42 = \hat{\beta}_0 + \hat{\beta}_2$$
For a 41-year-old with 10 years of experience:
$$\hat{Y}_i = 26.5 - 0.82(10) + 0.92(41 - 40) = \$19.22 = \hat{\beta}_0 + \hat{\beta}_1(10) + \hat{\beta}_2(1)$$

Interpretation: Model 3
$\hat{\beta}_0 = 26.5$: the average hourly wage for a 40-year-old with no experience at all is about \$26.50
$\hat{\beta}_1 = -0.82$: for every additional year of experience, the predicted hourly wage decreases about 82 cents for two people of the same age (or “adjusting for age”)
$\hat{\beta}_2 = 0.92$: for every additional year of age, the expected hourly wage increases about 92 cents for two people with the same amount of experience (or “adjusting for experience”)
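The Model 3 predictions from the preceding slides can be reproduced the same way. This is a sketch with the slides' rounded coefficients and age centered at 40; the function name `predict_model3` is an assumption.

```python
def predict_model3(experience_years, age_years):
    """Predicted hourly wage from Model 3 (age centered at 40), rounded coefficients."""
    b0, b1, b2 = 26.5, -0.82, 0.92
    return b0 + b1 * experience_years + b2 * (age_years - 40)

for age in (40, 41):
    for years in (0, 10):
        wage = predict_model3(years, age)
        print(f"age {age}, {years:>2} years of experience: ${wage:.2f}")
```

Holding age fixed, each extra year of experience lowers the prediction by 82 cents; holding experience fixed, each extra year of age raises it by 92 cents.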

Model 1 vs. Model 3
Model 1: $\hat{Y}_i = 8.38 + 0.04(\text{Experience}_i)$
Model 3: $\hat{Y}_i = 26.5 - 0.82(\text{Experience}_i) + 0.92(\text{Age}_i - 40)$
95% CI for $\beta_1$ in Model 1: (0.001, 0.07), and $\hat{\beta}_1$ from Model 3 is outside this CI
Age is a confounder. When we adjust for age, the apparent effect of experience on wage changes.
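The comparison used in this lecture (does the adjusted slope still fall inside the unadjusted model's 95% CI?) can be written as a one-line check. This sketch only encodes the lecture's rule of thumb, not a formal test for confounding; the function name is an assumption and the numbers are the slides' rounded values.

```python
def slope_within_interval(adjusted_slope, ci_lower, ci_upper):
    """True if the adjusted slope lies inside the unadjusted model's CI."""
    return ci_lower <= adjusted_slope <= ci_upper

model1_ci = (0.001, 0.07)          # 95% CI for the experience slope in Model 1
slope_adjusted_for_gender = 0.04   # experience slope in Model 2
slope_adjusted_for_age = -0.82     # experience slope in Model 3

gender_confounds = not slope_within_interval(slope_adjusted_for_gender, *model1_ci)
age_confounds = not slope_within_interval(slope_adjusted_for_age, *model1_ci)

print("gender flagged as confounder:", gender_confounds)
print("age flagged as confounder:", age_confounds)
```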

The Coefficient of Determination
$R^2$ is the “coefficient of determination”
$R^2$ measures the ability to predict Y using X
Variability explained by X is SSM = $\sum (\hat{y}_i - \bar{y})^2$
Total variability is SST = $\sum (y_i - \bar{y})^2$

The Coefficient of Determination
$R^2$ is defined as
$$R^2 = \frac{SSM}{SST} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$
Measures the proportion of total variability explained by the model

The Coefficient of Determination
$R^2$ is the square of r, “Pearson’s correlation coefficient”
r is a rough way of evaluating the association between two continuous variables.
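For a single predictor, the two descriptions of $R^2$ (SSM/SST, and the square of Pearson's r) can be checked against each other. The sketch below does this on hypothetical data; the data and function names are assumptions.

```python
from math import sqrt
from statistics import mean

def r_squared_simple(x, y):
    """R^2 = SSM / SST for a one-predictor least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssm = sum((fi - y_bar) ** 2 for fi in fitted)   # variability explained by X
    sst = sum((yi - y_bar) ** 2 for yi in y)        # total variability
    return ssm / sst

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: median education and % with income < $3000.
x = [9.5, 10.2, 10.9, 11.4, 12.0, 12.3, 12.8]
y = [17.8, 15.1, 14.9, 13.0, 12.6, 11.1, 10.9]

r2 = r_squared_simple(x, y)
r = pearson_r(x, y)
print("R^2 from SSM/SST:", round(r2, 4))
print("r squared:", round(r ** 2, 4))
```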

So, what is $R^2$?
The coefficient of determination, $R^2$, evaluates the entire model.
$R^2$ shows the proportion of the total variation in Y that has been predicted by this model.
- Model 1: 0.0076; 0.8% of variation explained
- Model 2: 0.05; 5% of variation explained
- Model 3: 0.20; 20% of variation explained

What is the adjusted $R^2$?
In both models 2 and 3, the new predictor added a great deal to the model:
- $R^2$ increased a lot
- More importantly, both new predictors were statistically significant

But $R^2$ never goes down when a predictor is added, even an unhelpful one!
The adjusted $R^2$ is adjusted for the number of X’s in the model, so it only goes up when helpful predictors are added.
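The slides do not give a formula for the adjusted $R^2$; a commonly used definition is $1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, with n observations and k predictors. The sketch below assumes that definition and uses a hypothetical small-sample case (n = 20) to show the adjustment penalizing a predictor that barely raises $R^2$.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical small study: a second predictor nudges R^2 from 0.500 to 0.505.
before = adjusted_r_squared(0.500, n_obs=20, n_predictors=1)
after = adjusted_r_squared(0.505, n_obs=20, n_predictors=2)
print("adjusted R^2 with 1 predictor:", round(before, 3))
print("adjusted R^2 with 2 predictors:", round(after, 3))

# With the lecture's n = 9,918 workers the penalty is tiny.
model3 = adjusted_r_squared(0.20, n_obs=9918, n_predictors=2)
print("Model 3 adjusted R^2:", round(model3, 4))
```

In the hypothetical case the raw $R^2$ rises while the adjusted value falls, which is the behavior described above.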

Summary
Regression by least squares
Interpreting regression coefficients
Adding a 2nd predictor to a model
- Binary X added: 2 parallel lines
- Continuous X added: 3-dimensional graph
- for both, new interpretation reflecting new model

Is the new X a confounder?
- Compare $\beta_1$ across models