Lecture 20: Simple Linear Regression
API-201Z
Maya Sen
Harvard Kennedy School
http://scholar.harvard.edu/msen
Announcements
- Midterms nearly graded
- Executive summaries now due on 11/29 (Thursday, as part of PS #10)
- We’ll set up an online poll for which groups will present on 12/4 (due date)
- Regular office hours resume post-Thanksgiving – happy to chat with you at any point about the final exercises!
Roadmap
- Introduce the concept of the Ordinary Least Squares (OLS) method of estimating linear regression
- Discuss the simplest application, Simple Linear Regression: the relationship between two continuous variables
- Hypothesis tests and CIs for regression parameters
- Sets us up to cover regression with more than one explanatory variable and interpretation of tables
Last time
- We have covered several more advanced inference techniques:
  - ANOVA: a global test comparing means across groups
  - Chi-square test: a test of independence between the rows and columns of a frequency table
- But both suffer from a weakness →
  - If the null is rejected, what can we say about the strength/direction of the association?
  - Can we predict anything?
- Linear regression allows us to assess (1) the strength and (2) the direction of the relationship between two variables
- Useful across many different applications and for prediction
- Along with the difference in means, one of the most widely used statistical techniques; we’ll cover only the basics in this course
State Unemployment Example
- Motivate linear regression with a simple example:
- Suppose our policy area is labor unemployment – think unemployment is “sticky” and lags over time
- Is there a relationship between state-level unemployment rates in the U.S. in 1995 and in 2000?
- A random sample of 30 states was taken
- For each state, data was collected on:
  - Unemployment rate in 1995
  - Unemployment rate in 2000
State Unemployment Example

State        1995   2000
Alabama       5.3    4.0
Alaska        7.1    6.2
Arizona       5.4    4.1
Arkansas      4.8    4.1
California    8.0    5.0
Colorado      4.3    3.0
...           ...    ...
State Unemployment Example
[Scatterplot of 1995 unemployment rates against 2000 unemployment rates for the sampled states]
State Unemployment Example
- The two variables are obviously correlated → could use correlation to examine the relationship
- Correlation: measures the strength of the linear association between 2 variables
- The 2 variables are treated in a similar manner → the variables are interchangeable (the correlation of x with y and of y with x is the same; see the R sketch below)
- The correlation coefficient r takes values between −1 and 1
- State unemployment rate example:
  - Strong positive correlation
  - Correlation coefficient r = 0.78
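
As a minimal R sketch (using the six states shown in the table above as a hypothetical mini-dataset, with the yr1995/yr2000 variable names that appear in the regression output later in the lecture):

# Hypothetical mini-dataset: the six states from the table above
unemp <- data.frame(
  yr1995 = c(5.3, 7.1, 5.4, 4.8, 8.0, 4.3),
  yr2000 = c(4.0, 6.2, 4.1, 4.1, 5.0, 3.0)
)

# Correlation is symmetric: cor(x, y) equals cor(y, x)
cor(unemp$yr1995, unemp$yr2000)
cor(unemp$yr2000, unemp$yr1995)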
State Unemployment Example
However: we can put more structure on the relationship with regression
- Regression: each variable has a specific role
- x is the explanatory (or independent or predictor) variable
  - Always represented on the horizontal (X) axis
  - Can be binary, categorical, or continuous (will discuss a bit in this class)
- y is the outcome (or dependent or response) variable, the variable we are trying to predict
  - Always represented on the vertical (Y) axis
  - Here: continuous (expanded to include dichotomous and categorical outcomes next semester)
State Unemployment Example
[Scatterplot with the predictor, explanatory, or independent variable on the horizontal axis and the outcome, response, or dependent variable on the vertical axis]
Correlation versus Regression
Regression offers key advantages:
1. Assess whether there is a statistically significant relationship between the 2 variables
2. Assess the magnitude of that relationship
3. Use the explanatory variable to generate predicted values of the outcome variable
4. Eventually will allow us to take other variables into account
Simple Linear Regression
Let’s explore with the simplest kind of regression:
- Simple: only one independent variable (so bivariate)
- Linear: straight-line relationship
- Regression: a method of fitting data to a (linear) model
- However: how do we find the line that best describes the dataset we have collected?
State Unemployment Example
[Scatterplot of the state unemployment data]
Simple Linear Regression
- If we had a true linear relationship between 1995 unemployment (x) and 2000 unemployment (y), it would be expressed by:

  y = β0 + β1x

- where y is the outcome
- β0 is the intercept
- β1 is the slope
- and x is the explanatory variable
- Much of our interest is in the size and sign of β1, the slope
- The slope captures the linear relationship between x and y
Positive Relationship Between X and Y
[Plot of Y against X with an upward-sloping line: slope is positive]

Negative Relationship Between X and Y
[Plot of Y against X with a downward-sloping line: slope is negative]

No Relationship Between X and Y
[Plot of Y against X with a flat line: slope is 0]
Simple Linear Regression
- However: the simple line yi = β0 + β1xi assumes a perfectly deterministic relationship between x and y
- Maybe good for understanding, e.g., the relationship of Fahrenheit to Celsius, but not much else!
- More realistic → x and y are related linearly, but there is some noise around that, so it’s not a single perfect line
- Thus, for a single observation (xi, yi):

  yi = β0 + β1xi + εi
  (β0: intercept; β1: slope; εi: error)

- where the εi are also known as random errors
Simple Linear Regression
- This describes the “true” relationship between x and y:

  yi = β0 + β1xi + εi

- However: we can never observe β0 and β1 → these are population parameters!
- The best we can do is estimate them using our data
- Thus, we have an estimated linear relationship:

  yi = b0 + b1xi + ei

- Sometimes also denoted using “hat” notation as

  yi = β̂0 + β̂1xi + ε̂i

- Residuals (ei) represent estimates of the random errors, εi
Simple Linear Regression
- Note: an important alternative way of thinking about linear regression is via expected values
- E[yi | xi] gives the expected (or mean) value of yi for a given value of the independent variable, xi
- Under the linear specification,

  E[yi | xi] = β0 + β1xi

- All predicted values fall exactly on the regression line
- Why no error term here? Because E[εi | xi] = 0
- (You’ll see violations of this in API 202)
How to find the best estimated line?
Going back to our data: how do we fit the best line?
[Scatterplot of the state unemployment data]
We’ll take the line that minimizes the sum of squared residuals
How to find the best estimated line?
- Specifically, we will choose the values of β0 and β1 that minimize:

  ∑ᵢ₌₁ⁿ (yi − ŷi)²

- Or:

  ∑ᵢ₌₁ⁿ (yi − β0 − β1xi)²

- This gives the Ordinary Least Squares estimators (see appendix for proof; a numerical sketch follows below)
- Could calculate other ways to fit a line, but OLS has very attractive properties
- Under the Gauss-Markov Theorem, the least squares line is “BLUE” (Best Linear Unbiased Estimator)
  - For properties, see Wikipedia (Link)
  - Video of proof at Khan Academy (Link)
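
To make the “minimize the sum of squared residuals” idea concrete, here is a rough R sketch (not how software actually fits OLS – real software uses the closed-form solution on the next slide): treat the SSR as a function of a candidate (β0, β1) pair and minimize it numerically, reusing the hypothetical unemp data frame from above.

# Sum of squared residuals for a candidate pair b = (b0, b1)
ssr <- function(b, x, y) sum((y - b[1] - b[2] * x)^2)

# Numerically minimize the SSR, starting from (0, 0)
fit_num <- optim(c(0, 0), ssr, x = unemp$yr1995, y = unemp$yr2000)
fit_num$par                               # numerical (b0, b1)

# Agrees (up to optimizer tolerance) with the closed-form OLS fit
coef(lm(yr2000 ~ yr1995, data = unemp))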
OLS Estimates for One Explanatory Variable
- The proof (in the Appendix) gives us the equation for the slope estimate:

  b1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)²

- and the equation for the intercept estimate:

  b0 = ȳ − b1x̄

- where x̄ is the average of the x values (explanatory variable)
- and ȳ is the average of the y values (outcome variable)
- Note that b1 = r(sy/sx) – see the sketch below
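
These formulas are short enough to verify directly; a sketch in R with the hypothetical unemp data from above:

x <- unemp$yr1995
y <- unemp$yr2000

# Slope: cross-deviations over squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: forces the line through the point of means (x-bar, y-bar)
b0 <- mean(y) - b1 * mean(x)

# Same slope via the correlation: b1 = r * sd(y) / sd(x)
b1_via_r <- cor(x, y) * sd(y) / sd(x)
c(b0 = b0, b1 = b1, b1_via_r = b1_via_r)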
State Unemployment Example
- Rare to calculate by hand except for the simplest cases
- In Stata:

. regress yr2000 yr1995
-----------------------------------------------------
      yr2000 |     Coef.   Std. Err.      t     P>|t|
-----------------------------------------------------
      yr1995 |  .5398317   .0818083     6.60    0.000
       _cons |  1.077917   .4571589     2.36    0.026
-----------------------------------------------------

- In R, use the lm (linear model) command: lm(yr2000 ~ yr1995) – see the sketch below
- Statistical software will give you:
  - Intercept coefficient estimate (b0 or β̂0): 1.077917
  - Slope coefficient estimate (b1 or β̂1): 0.5398317
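
Spelled out slightly (a sketch assuming the variables live in a data frame, here the hypothetical unemp from above; its 6-state coefficients will differ from the lecture’s 30-state estimates):

# Fit the regression of 2000 rates on 1995 rates
fit <- lm(yr2000 ~ yr1995, data = unemp)

coef(fit)     # intercept (b0) and slope (b1) estimates
summary(fit)  # adds std. errors, t statistics, and p-values,
              # analogous in format to the Stata output above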
State Unemployment Example
- This gives us the estimated regression line:

  ŷ = 1.08 + 0.54x

- How to interpret?
- A one-unit increase in x is associated with a b1 increase/decrease in y
- Here: based on our data, an increase of 1 percentage point in the 1995 unemployment rate is associated with an increase of 0.54 percentage points in the 2000 unemployment rate
State Unemployment Example
[Scatterplot of the state unemployment data with the fitted regression line]
OLS Assumptions
OLS relies on several key assumptions:
1. There is a linear relationship in the population between the independent variable x and the outcome y
2. Observations are independent (i.e., one observation on each state)
3. Errors are not correlated with one another
→ You’ll study violations of these assumptions in API 202
Using Regression for Prediction
- We can use information from the estimated regression line to predict relationships between x and y
- Ex) Suppose we are interested in predicting the 2000 unemployment rate for another state not included in the sample
- One state has an unemployment rate of 7.5% in 1995 → what is the predicted 2000 rate?

  ŷ = 1.08 + 0.54x = 1.08 + 0.54(7.5) = 5.13

- Another state has an unemployment rate of 14% in 1995 → what is the predicted 2000 rate?

  ŷ = 1.08 + 0.54x = 1.08 + 0.54(14.0) = 8.64

- These are called predicted values (see the R sketch below)
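
In R, the same kind of numbers come out of predict() on a fitted model object (a sketch, continuing with the hypothetical fit from above; its toy-data coefficients will differ somewhat from the lecture’s):

# Predicted 2000 rates for 1995 rates of 7.5% and 14%
new_states <- data.frame(yr1995 = c(7.5, 14.0))
predict(fit, newdata = new_states)

# Equivalently, by hand with the lecture's rounded coefficients:
1.08 + 0.54 * c(7.5, 14.0)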
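A minimal sketch of these predictions in R (the course's other tool). The variable names yr1995/yr2000 mirror the Stata example, but the data here are simulated for illustration, not the actual 30-state sample:

# Simulated stand-in for the state unemployment data
set.seed(1)
yr1995 <- runif(30, 3, 10)                       # 1995 rates, in percent
yr2000 <- 1.08 + 0.54 * yr1995 + rnorm(30, sd = 0.55)
fit <- lm(yr2000 ~ yr1995)                       # simple linear regression

# Predicted 2000 rates for states with 1995 rates of 7.5% and 14%
predict(fit, newdata = data.frame(yr1995 = c(7.5, 14.0)))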
State Unemployment Example
Some notes on prediction:
1. These are good predictions, but not necessarily correct!
2. The regression line is only good for predicting values in the range for which we have data
I Best not to extrapolate, i.e., predict values outside this range
I 1995 state unemployment in our data ranges from 3% to around 10%
I Should we use the regression equation to predict 2000 unemployment for a state w/ 40% 1995 unemployment?
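In practice, checking the observed range of x first is a one-line habit; a sketch in R, continuing the simulated example above:

range(yr1995)      # roughly 3 to 10 here
# Predicting at yr1995 = 40 would extrapolate far beyond the data,
# so the fitted line offers no reliable guidance there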
Using Regression for Hypothesis Tests of Slope
I Can also use OLS estimators in a hypothesis testing framework
I Remember that for OLS we estimate the slope via:
b1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)²
I and the intercept via:
b0 = ȳ − b1x̄
I Both b1 and b0 are sums and means of random variables
I Means that the CLT kicks in!
I → b1 and b0 are approximately normally distributed in large samples!
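These formulas can be computed directly; a sketch in R, continuing the simulated example above:

xbar <- mean(yr1995); ybar <- mean(yr2000)
b1 <- sum((yr1995 - xbar) * (yr2000 - ybar)) / sum((yr1995 - xbar)^2)
b0 <- ybar - b1 * xbar
c(b0 = b0, b1 = b1)      # matches coef(fit) from lm()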
Using Regression for Hypothesis Tests of Slope
I Can use this fact to conduct hypothesis tests, usually two-tailed
I Specifically: If the slope β1 is zero, then there is no linear relationship between the two variables
I Null and alternative hypotheses:
I H0: β1 = 0
I Ha: β1 ≠ 0
I Test statistic given by
t(n−2) = (b1 − 0) / SE(b1)
I Where we use a t distribution with n − 2 degrees of freedom and (usually) a two-tailed test, with
SE(b1) = √[ ∑(yi − ŷi)² / (n − 2) ] / √[ ∑(xi − x̄)² ]
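A sketch in R of the slope t-test computed from these formulas, continuing the simulated example (the same numbers appear in the yr1995 row of summary(fit)):

n <- length(yr1995)
res <- resid(fit)
se_b1 <- sqrt(sum(res^2) / (n - 2)) / sqrt(sum((yr1995 - mean(yr1995))^2))
t_stat <- unname(coef(fit)["yr1995"]) / se_b1    # (b1 - 0) / SE(b1)
p_val <- 2 * pt(-abs(t_stat), df = n - 2)        # two-tailed p-value
c(t = t_stat, p = p_val)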
State Unemployment Example
I STATA and R report results of the two-tailed hypothesis test:

. regress yr2000 yr1995
-----------------------------------------------------
      yr2000 |      Coef.   Std. Err.      t    P>|t|
-----------------------------------------------------
      yr1995 |   .5398317   .0818083     6.60   0.000
       _cons |   1.077917   .4571589     2.36   0.026
-----------------------------------------------------

I Note: For β1, the hypothesis test yields a p-value of < 0.001
I Note: The hypothesis test for β0 tests the null hypothesis that the intercept equals zero → that the mean of y is zero when x is zero
State Unemployment Example
I Statistical interpretation?
I Since the p-value < 0.001, we can reject the null hypothesis that β1 = 0 at the α = 0.05 level
I Substantive interpretation?
I Strong evidence against the slope being zero
I Implies that there appears to be some relationship between state unemployment rates in 1995 and in 2000
I In addition: Estimated slope suggests a positive association → a higher 1995 rate is linked w/ a higher 2000 rate
Using Regression for Confidence Intervals of Slope
I Just as we can conduct hypothesis tests, we can also construct confidence intervals for the true slope, β1
I Follows the same formula as before:
b1 ± t(n−2, α/2) × SE(b1)
I In our example (w/ 30 observations):
0.5398 ± t(28, α/2) × 0.0818 → (0.372, 0.707)
I Interpretation: In repeated sampling, expect about 95 out of 100 such confidence intervals to contain the true slope
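A sketch in R computing the interval by this formula and checking against confint, continuing the simulated example (n and se_b1 come from the earlier sketch):

tcrit <- qt(0.975, df = n - 2)       # t(28, 0.025) ≈ 2.048 when n = 30
unname(coef(fit)["yr1995"]) + c(-1, 1) * tcrit * se_b1
confint(fit, "yr1995", level = 0.95) # same interval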
State Unemployment Example
STATA and R will also report 95% CIs:

. regress yr2000 yr1995

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  1,    28) =   43.54
       Model |  13.3338426     1  13.3338426           Prob > F      =  0.0000
    Residual |  8.57415592    28  .306219854           R-squared     =  0.6086
-------------+------------------------------           Adj R-squared =  0.5947
       Total |  21.9079986    29  .755448226           Root MSE      =  .55337

------------------------------------------------------------------------------
      yr2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr1995 |   .5398317   .0818083     6.60   0.000      .372255    .7074084
       _cons |   1.077917   .4571589     2.36   0.026     .1414697    2.014365
------------------------------------------------------------------------------
Model Fit of a Simple Linear Regression
I Model fit is a measure of how “well” the line fits the data
I In linear regression, R² is the most commonly used measure
I R²: Proportion of variance in y explained by variance in x
I With one explanatory variable (one x), the correlation coefficient r is the square root of R², with the sign of the slope:
r = ±√R²
I Here: √0.6086 ≈ 0.780
I Substantive interpretation: High R² → the two variables are highly correlated, and the regression explains a lot of the variance in the outcome
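A sketch in R of the R²–correlation link, continuing the simulated example:

r2 <- summary(fit)$r.squared
c(R2 = r2, r = cor(yr1995, yr2000), sqrt_R2 = sqrt(r2))
# with one x and a positive slope, cor() equals sqrt(R2)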
Some Notes About Residuals
I Residuals represent estimates of the random errors, εi
I Empirically: Represent the “left-over” distance from each observation to the regression line after fitting
I Differences observed in our sample data between each point and the regression line (vertically):
Residual = Observed y − Predicted y
I The least-squares line makes the sum of the squared residuals as small as possible
I Any other way of drawing the line yields a larger value for this sum
Some Notes About Residuals
I The sum of residuals equals zero using least-squares regression
I → Plotting residuals against x values should result in a plot that looks random, i.e., no pattern present
I If there is a pattern, a line might not be a good fit for the data
Some Notes About Residuals
In Stata (save the residuals with predict, then plot them against x):

predict res, r
scatter res yr1995
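The same check in R, continuing the simulated example:

plot(yr1995, resid(fit),
     xlab = "1995 unemployment rate", ylab = "Residual")
abline(h = 0, lty = 2)   # residuals should scatter randomly around zero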
Some Notes About Residuals
[Two residual-vs-x plots, shown side by side]
I Left-hand side: Looks random
I Right-hand side: Looks like the errors get bigger with larger x values → heteroskedasticity
Outliers and Leverage Points
I Outlier: Observation that has an unusual y value, conditional on x
I Leverage point: Observation that has an unusual x value (far from the mean of x)
I An observation is influential if it substantially changes the regression line → that is, it is an outlier and has high leverage
I Outliers, leverage points, and influential observations raise interesting questions to examine more closely
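A sketch in R of how a single high-leverage observation can move the fitted line, continuing the simulated example (the added point is hypothetical):

x_new <- c(yr1995, 40)     # one state far from the others in x
y_new <- c(yr2000, 5)      # with a y value well off the line
fit_infl <- lm(y_new ~ x_new)
rbind(original = unname(coef(fit)),
      with_point = unname(coef(fit_infl)))   # slope shifts noticeably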
Warning about Association versus Causation
I Linear regression allows us, under certain circumstances, to
I Make statements about whether a relationship between two variables exists
I Make statements about the size of that relationship
I Predict one variable using another
I However: At this point, it is not OK to say one variable “causes” change in another → this requires additional assumptions about the relationship between x and y
I You'll visit the additional assumptions required to make causal statements in API 202
Next Time
I More on interpretation
I Multiple regression: regression with two or more explanatoryvariables
Appendix: Proof of Least Squares Coefficient Estimators
Taking the partial derivatives:

S(b0, b1) = ∑_{i=1}^n (Yi − b0 − b1Xi)²
          = ∑_{i=1}^n (Yi² − 2Yib0 − 2Yib1Xi + b0² + 2b0b1Xi + b1²Xi²)

∂S(b0, b1)/∂b0 = ∑_{i=1}^n (−2Yi + 2b0 + 2b1Xi) = −2 ∑_{i=1}^n (Yi − b0 − b1Xi)

∂S(b0, b1)/∂b1 = ∑_{i=1}^n (−2YiXi + 2b0Xi + 2b1Xi²) = −2 ∑_{i=1}^n Xi(Yi − b0 − b1Xi)
Appendix: Proof of Least Squares Coefficient Estimators
I One condition for b0 and b1 to minimize the sum of the squared residuals is that they must make the partial derivatives equal to 0
I Each of these conditions is called a first order condition
I The first order conditions are:

0 = −2 ∑_{i=1}^n (Yi − b0 − b1Xi)
0 = −2 ∑_{i=1}^n Xi(Yi − b0 − b1Xi)
Appendix: Proof of Least Squares Coefficient Estimators
I Let's solve for the estimator of the intercept first:

0 = −2 ∑_{i=1}^n (Yi − b0 − b1Xi)
0 = ∑_{i=1}^n (Yi − b0 − b1Xi)
0 = ∑_{i=1}^n Yi − ∑_{i=1}^n b0 − ∑_{i=1}^n b1Xi
b0·n = (∑_{i=1}^n Yi) − b1 (∑_{i=1}^n Xi)
b0 = Ȳ − b1X̄
Appendix: Proof of Least Squares Coefficient Estimators
I Now, we can plug this back in to get an estimate for the slope:

0 = −2 ∑_{i=1}^n Xi(Yi − b0 − b1Xi)
0 = ∑_{i=1}^n Xi(Yi − b0 − b1Xi)
0 = ∑_{i=1}^n Xi(Yi − (Ȳ − b1X̄) − b1Xi)
0 = ∑_{i=1}^n Xi(Yi − Ȳ − b1(Xi − X̄))
0 = ∑_{i=1}^n Xi(Yi − Ȳ) − b1 ∑_{i=1}^n Xi(Xi − X̄)
b1 ∑_{i=1}^n Xi(Xi − X̄) = ∑_{i=1}^n Xi(Yi − Ȳ) − X̄ ∑_{i=1}^n (Yi − Ȳ)
Appendix: Proof of Least Squares Coefficient Estimators

b1 [∑_{i=1}^n Xi(Xi − X̄) − X̄ ∑_{i=1}^n (Xi − X̄)] = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)
b1 ∑_{i=1}^n (Xi − X̄)² = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)
b1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)²
Appendix: Proof of Least Squares Coefficient Estimators
I Note: We used a key fact about sums and means: ∑_{i=1}^n (Yi − Ȳ) = 0
I Deviations from the mean sum to 0
I Intuitively this makes sense because the mean is just the sum of the observations divided by n
I Allows us to write ∑_{i=1}^n Xi(Yi − Ȳ) = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)
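A quick numerical check in R of this key fact (the numbers are arbitrary; any would do):

x <- c(1, 2, 3, 4)
y <- c(4.2, 5.1, 6.3, 5.8)
sum(y - mean(y))                          # 0, up to floating-point error
all.equal(sum(x * (y - mean(y))),
          sum((x - mean(x)) * (y - mean(y))))   # TRUE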