Regression using lm lmRegression.R Basics Prediction World Bank CO2 Data.

Regression using lm (lmRegression.R)

• Basics
• Prediction
• World Bank CO2 Data

Simple linear regression

• Simple linear model: y = b1 + x b2 + error

y: the dependent variable
x: the independent variable
b1, b2: intercept and slope coefficients
error: random departures between the model and the response

Coefficients are estimated by least squares.
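As a minimal sketch of what "estimated by least squares" means, the example below computes the slope and intercept from the standard closed-form formulas and compares them with the output of lm. The data are simulated for illustration (the values 2 and 0.5 are assumptions, not numbers from the lecture).

```r
# Simulated data: hypothetical intercept 2 and slope 0.5, not from the lecture
set.seed(1)
x <- seq(1, 20)
y <- 2 + 0.5 * x + rnorm(length(x), sd = 1)

# Least squares estimates from the closed-form formulas
b2 <- cov(x, y) / var(x)        # slope
b1 <- mean(y) - b2 * mean(x)    # intercept

# The same estimates from lm
lmFit <- lm(y ~ x)
print(coef(lmFit))
print(c(b1, b2))
```

The two printed vectors should agree up to rounding, since lm solves the same least squares problem.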

Multiple regression

• y = b0 + x1 b1 + x2 b2 + x3 b3 + … + error
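The multiple regression model can be fit the same way. The sketch below, using simulated data with three hypothetical predictors, solves the least squares normal equations directly and checks the result against lm. The normal-equations step is shown only to illustrate the idea; lm itself uses a more numerically stable QR decomposition.

```r
# Simulated data: three hypothetical predictors, coefficients chosen arbitrarily
set.seed(2)
n  <- 50
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x3 + rnorm(n, sd = 0.1)

# Design matrix with a column of ones for the intercept b0
X <- cbind(1, x1, x2, x3)

# Least squares solution of the normal equations (X'X) b = X'y
bHat <- solve(t(X) %*% X, t(X) %*% y)

# The same fit with lm
lmFit <- lm(y ~ x1 + x2 + x3)
print(cbind(normalEquations = as.numeric(bHat), lm = coef(lmFit)))
```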

Annual Boulder Temperatures

• Temperature is the dependent variable; Year is the independent variable
• Errors = ???
• Linear = ???

CO2 Emissions by Country

• Independent: GDP/capita
• Dependent: CO2 emission
• Linear?? Errors??

The R lm function

• Takes a formula to describe the regression, where ~ plays the role of the equals sign (read "is modeled by")

• Works best when the data set is a data frame

• Returns a complicated list that can be used in summary, predict, print, plot

lmFit <- lm(y ~ x1 + x2)
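A short sketch of how the object returned by lm might be used with these functions. The data are simulated and the variable names other than lmFit are illustrative assumptions.

```r
# Simulated data for illustration only
set.seed(3)
x1 <- rnorm(30)
x2 <- rnorm(30)
y  <- 1 + 2 * x1 - x2 + rnorm(30, sd = 0.5)

lmFit <- lm(y ~ x1 + x2)

print(lmFit)                      # the call and the estimated coefficients
print(names(lmFit))               # components of the returned list

fitSummary <- summary(lmFit)      # standard errors, t statistics, R-squared
print(fitSummary$coefficients)

yHat <- predict(lmFit)            # fitted values at the observed x1, x2

# plot(lmFit)  # diagnostic plots; uncomment in an interactive session
```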

Or more generally using a data frame

lmFit <- lm( y ~ x1 + x2, data=dataset)

The variables in the formula are taken from the columns dataset$y, dataset$x1, dataset$x2
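Fitting from a data frame also makes prediction at new values straightforward. The sketch below builds a simulated data frame (column values are assumptions for illustration), fits the model, and predicts at three new points with 95% prediction intervals.

```r
# Simulated data frame; the column names match the formula above
set.seed(4)
dataset <- data.frame(x1 = runif(40, 0, 10), x2 = runif(40, 0, 5))
dataset$y <- 3 + 1.5 * dataset$x1 - 2 * dataset$x2 + rnorm(40, sd = 1)

lmFit <- lm(y ~ x1 + x2, data = dataset)

# New points must be a data frame with the same predictor names
newData <- data.frame(x1 = c(2, 5, 8), x2 = c(1, 2.5, 4))

# Columns of the result: fit (prediction), lwr and upr (interval limits)
pred <- predict(lmFit, newdata = newData, interval = "prediction", level = 0.95)
print(pred)
```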

Analysis of World Bank data set

• It is best to work on a log scale; GDP has the strongest linear relationship

• Some additional pattern is left over in the residuals

• Try other variables

• Try a more complex curve

• Check the predictions using cross-validation
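The sketch below illustrates these steps on simulated data, not the World Bank data set: the column names gdp and co2 and all numeric values are assumptions. It fits a straight line on the log scale, extracts the residuals, and then tries a more complex curve by adding a quadratic term.

```r
# Simulated stand-in for the real data (hypothetical columns gdp and co2)
set.seed(5)
n      <- 100
gdp    <- exp(runif(n, log(500), log(50000)))
logGDP <- log(gdp)
# Mild curvature on the log scale so the straight-line fit leaves a pattern
co2 <- exp(-8 + 1.0 * logGDP - 0.05 * (logGDP - 8)^2 + rnorm(n, sd = 0.3))
sim <- data.frame(gdp = gdp, co2 = co2)

# Straight line on the log scale
fitLinear <- lm(log(co2) ~ log(gdp), data = sim)
res <- resid(fitLinear)
# plot(log(sim$gdp), res); abline(h = 0)  # look for leftover pattern

# A more complex curve: add a quadratic term in log(gdp)
fitQuad <- lm(log(co2) ~ log(gdp) + I(log(gdp)^2), data = sim)

# F test comparing the two nested models
print(anova(fitLinear, fitQuad))
```

A small p-value in the anova table would suggest the quadratic term captures structure the straight line misses.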

Leave-one-out Cross-validation

• Robust way to check a model's predictions and the uncertainty measure

• Four steps:
1. Sequentially leave out each observation
2. Refit model with remaining data
3. Predict the omitted observation
4. Compare prediction and confidence interval to the actual observation

This is a check on the consistency of the statistical model, because the omitted observation is not used to make its prediction.
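The four steps above can be sketched as a simple loop. The data are simulated for illustration, and the sketch assumes that the interval compared to the held-out observation is a 95% prediction interval (interval = "prediction" in predict), which accounts for the error in a new observation.

```r
# Simulated data for illustration only
set.seed(6)
n   <- 30
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 1)

cvPred <- matrix(NA, nrow = n, ncol = 3,
                 dimnames = list(NULL, c("fit", "lwr", "upr")))

for (i in 1:n) {
  # Steps 1 and 2: leave out observation i and refit with the rest
  fitI <- lm(y ~ x, data = dat[-i, ])
  # Step 3: predict the omitted observation with a 95% prediction interval
  cvPred[i, ] <- predict(fitI, newdata = dat[i, , drop = FALSE],
                         interval = "prediction", level = 0.95)
}

# Step 4: compare predictions and intervals to the actual observations
cvError <- dat$y - cvPred[, "fit"]
covered <- dat$y >= cvPred[, "lwr"] & dat$y <= cvPred[, "upr"]

cat("Cross-validation RMSE:", sqrt(mean(cvError^2)), "\n")
cat("Fraction of intervals containing the observation:", mean(covered), "\n")
```

If the statistical model is consistent with the data, the fraction of intervals containing the omitted observation should be roughly the nominal level.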