Regression and correlation · people.tamu.edu/~alawing/materials/ESSM689/RCtutorial... · 2015. 2....
+Tutorial: Regression and correlation
Presented by Jessica Raterman and Shannon Hodges
+Setting and checking your data
■ Install the package (only needed if MASS is not already installed)
> install.packages("MASS")
■ Load the data
> library(MASS)
or
> data(birthwt, package="MASS")
■ Look over the raw data
> print(birthwt)
or
> birthwt
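The install and load steps can be combined so that the package is only downloaded when it is missing. This is a minimal sketch, assuming a CRAN mirror is reachable when MASS is absent; `requireNamespace()` is a base R function that was not used on the slide.

```r
# Install MASS only if it is missing (assumes a CRAN mirror is reachable)
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Load just the birthwt data frame without attaching the whole package
data(birthwt, package = "MASS")

dim(birthwt)       # number of rows and columns
head(birthwt, 3)   # first three rows instead of the full printout
```

Using `head()` rather than printing the whole data frame keeps the console readable for larger data sets.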
• Raw Data
   low age lwt race smoke ptl ht ui ftv  bwt
85   0  19 182    2     0   0  0  1   0 2523
86   0  33 155    3     0   0  0  0   3 2551
87   0  20 105    1     1   0  0  0   1 2557
■ Check data form and structure
• Find variable names
> names(birthwt)
• Look at data structure
> str(birthwt)
• Variable names
[1] "low"   "age"   "lwt"   "race"  "smoke" "ptl"   "ht"    "ui"
[9] "ftv"   "bwt"
• Look at data structure
'data.frame': 189 obs. of 10 variables:
 $ low : int 0 0 0 0 0 0 0 0 0 0 ...
 $ age : int 19 33 20 21 18 21 22 17 29 26 ...
 $ lwt : int 182 155 105 108 107 124 118 103 123 113 ...
• Explore the variables
> ?birthwt
or
> help(birthwt)
• Check data summary
> summary(birthwt)
• Rename the data if desired, e.g.
> bw <- birthwt
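Working on a renamed copy also makes it safe to recode variables. The sketch below is an illustration, not part of the original slides: it labels `race` as a factor using the coding given in `?birthwt` (1 = white, 2 = black, 3 = other) and counts missing values per column.

```r
data(birthwt, package = "MASS")

bw <- birthwt   # work on a copy so the original stays untouched

# race is stored as integers 1, 2, 3; the labels follow ?birthwt
bw$race <- factor(bw$race, levels = 1:3,
                  labels = c("white", "black", "other"))

colSums(is.na(bw))   # missing values per column
table(bw$race)       # group sizes after recoding
```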
• Explore the variables
lwt: mother's weight in pounds at last menstrual period.
bwt: birth weight in grams.
• Check data summary
      low              age
 Min.   :0.0000   Min.   :14.00
 1st Qu.:0.0000   1st Qu.:19.00
 Median :0.0000   Median :23.00
 Mean   :0.3122   Mean   :23.24
 3rd Qu.:1.0000   3rd Qu.:26.00
 Max.   :1.0000   Max.   :45.00
■ Examine all scatterplots
> pairs(birthwt)
• Choose two variables to scatterplot
> plot(birthwt$bwt, birthwt$lwt)
• Examine correlation results
> cor(birthwt$bwt, birthwt$lwt)
lwt x bwt
[1] 0.1857333
[Figure: scatterplot of birthwt$lwt (y-axis, 100 to 250) against birthwt$bwt (x-axis, 1000 to 5000)]
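`cor()` returns only the coefficient. If a confidence interval and a p-value are also wanted, `cor.test()` from base R's stats package reports them; this is a small sketch that goes one step beyond the slide.

```r
data(birthwt, package = "MASS")

# Pearson correlation test (the default method)
ct <- cor.test(birthwt$bwt, birthwt$lwt)

ct$estimate   # sample correlation, the same value cor() returns
ct$conf.int   # 95% confidence interval for the correlation
ct$p.value    # test of the null hypothesis of zero correlation
```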
■ Check normality and distribution
> hist(birthwt$lwt)
and/or
> stem(birthwt$lwt)
> hist(birthwt$bwt)
[Figure: Histogram of birthwt$lwt; x-axis birthwt$lwt (100 to 250), y-axis Frequency (0 to 70)]
[Figure: Histogram of birthwt$bwt; x-axis birthwt$bwt (1000 to 5000), y-axis Frequency (0 to 40)]
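Histograms give a visual impression only. As a complementary check, not shown on the slides, a normal Q-Q plot and the Shapiro-Wilk test from base R can be applied to the same variables; the sketch below assumes nothing beyond the `birthwt` data.

```r
data(birthwt, package = "MASS")

# Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(birthwt$lwt)
qqline(birthwt$lwt)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw_lwt <- shapiro.test(birthwt$lwt)
sw_bwt <- shapiro.test(birthwt$bwt)
c(lwt = sw_lwt$p.value, bwt = sw_bwt$p.value)
```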
■ Transform your data if needed
• Create a new vector (column) for this
> sqrtlwt <- sqrt(birthwt$lwt)
> loglwt <- log(birthwt$lwt)
• Recheck your data
> hist(sqrtlwt)
> hist(loglwt)
[Figure: Histogram of loglwt; x-axis loglwt (4.4 to 5.6), y-axis Frequency (0 to 50)]
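One way to compare the transformations numerically is to look at sample skewness before and after. In this sketch `skew()` is a hypothetical helper written for illustration (moment-based skewness); it is not a base R function and was not part of the slides.

```r
data(birthwt, package = "MASS")

# skew() is a hypothetical helper: third central moment over variance^1.5
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Values closer to 0 indicate a more symmetric distribution
sk <- sapply(list(raw  = birthwt$lwt,
                  sqrt = sqrt(birthwt$lwt),
                  log  = log(birthwt$lwt)),
             skew)
sk
```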
+Parametric: correlation
■ Recheck your results
> cor(loglwt, birthwt$bwt)
The default setting uses Pearson’s r
> plot(loglwt, birthwt$bwt)
[1] 0.2036035
[Figure: scatterplot of birthwt$bwt (y-axis, 1000 to 5000) against loglwt (x-axis, 4.4 to 5.4)]
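To see what Pearson's r measures, it can be computed by hand as the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The sketch below recomputes it for the same two variables; the variable names are chosen for illustration.

```r
data(birthwt, package = "MASS")

x <- log(birthwt$lwt)
y <- birthwt$bwt

# Deviations from the means
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r from its definition
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

c(manual = r_manual, builtin = cor(x, y))
```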
+Parametric: Linear Regression
■ Specify model for simple regression
> m1=lm(birthwt$bwt~loglwt)
■ Check your results with summary
> summary(m1)
You will want to check the p-value, R², slope, and F-statistic
■ Summary
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -390.8     1174.0  -0.333  0.73958
loglwt         688.9      242.2   2.844  0.00495 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 715.8 on 187 degrees of freedom
Multiple R-squared: 0.04145, Adjusted R-squared: 0.03633
F-statistic: 8.087 on 1 and 187 DF, p-value: 0.004954
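The quantities in the printed summary can also be pulled out of the summary object directly, which helps when they are needed in later calculations. This is a sketch under the same model as above; it also checks that, for simple regression, R-squared equals the squared Pearson correlation.

```r
data(birthwt, package = "MASS")
loglwt <- log(birthwt$lwt)
m1 <- lm(birthwt$bwt ~ loglwt)

s <- summary(m1)
coef(s)["loglwt", ]   # slope estimate, std. error, t value, p-value
s$r.squared           # multiple R-squared

# Overall model p-value, recomputed from the F-statistic
f <- s$fstatistic
p_model <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
p_model

# For one predictor, R-squared is the squared correlation
all.equal(s$r.squared, cor(loglwt, birthwt$bwt)^2)
```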
■ Plot your model, check normality
> plot(m1)
Plot shows:
• Residuals vs fitted - Numbered points are potential problem points that may be skewing the model.
• Q-Q plot
[Figure: Residuals vs Fitted for lm(birthwt$bwt ~ loglwt); x-axis Fitted values (2600 to 3400), y-axis Residuals (-2000 to 2000); labeled points 130, 131, 133]
[Figure: Normal Q-Q for lm(birthwt$bwt ~ loglwt); x-axis Theoretical Quantiles (-3 to 3), y-axis Standardized residuals (-3 to 3); labeled points 130, 131, 133]
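The two diagnostic plots can be drawn side by side, and the residuals inspected numerically as well. The sketch below is one possible way to do this; the Shapiro-Wilk test on the residuals and the use of `rstandard()` are additions for illustration, not steps from the slides.

```r
data(birthwt, package = "MASS")
loglwt <- log(birthwt$lwt)
m1 <- lm(birthwt$bwt ~ loglwt)

# Draw only the first two diagnostic plots, side by side
op <- par(mfrow = c(1, 2))
plot(m1, which = 1:2)   # 1 = residuals vs fitted, 2 = normal Q-Q
par(op)                 # restore the previous plotting layout

# Numerical check of residual normality
res <- residuals(m1)
shapiro.test(res)

# The three observations with the largest standardized residuals
head(sort(abs(rstandard(m1)), decreasing = TRUE), 3)
```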
■ Confidence and Prediction
• Confidence intervals for all parameters
> confint(m1)
> confint(m1, level = 0.95)
• CI for mean response
> predict.lm(m1, interval="confidence")
• Prediction intervals for single (individual) responses
> predict.lm(m1, interval="prediction")
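To get intervals at a new predictor value rather than at the observed data, `predict()` accepts a `newdata` data frame. This sketch refits the same model with a `data =` argument so that `newdata` lines up by column name; the 130 lb mother's weight is an arbitrary example value, and the object names are assumptions made for illustration.

```r
data(birthwt, package = "MASS")

# Same model as m1, but fitted from a data frame column
bw <- transform(birthwt, loglwt = log(lwt))
m  <- lm(bwt ~ loglwt, data = bw)

# Hypothetical new observation: a mother weighing 130 lb
new <- data.frame(loglwt = log(130))

conf_int <- predict(m, newdata = new, interval = "confidence")
pred_int <- predict(m, newdata = new, interval = "prediction")

# Same fitted value, but the prediction interval is wider
rbind(confidence = conf_int[1, ], prediction = pred_int[1, ])
```

The confidence interval describes uncertainty in the mean birth weight at that mother's weight, while the prediction interval also includes the scatter of individual births around that mean.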
■ Add the line of best fit
> abline(m1)
Rerun the plot first if needed:
> plot(loglwt, birthwt$bwt)
■ Find the regression equation
• Infer from the summary coefficients: y = B0 + B1x (B1 carries its own sign)
(Intercept)  -390.8
loglwt        688.9
[Figure: plot of birthwt$bwt (y-axis, 1000 to 5000) against loglwt (x-axis, 4.4 to 5.4)]
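With these two coefficients the regression equation reads bwt = -390.8 + 688.9 × loglwt. As a sketch, the coefficients can be extracted with `coef()` and the equation applied by hand; the 130 lb example value is an assumption chosen for illustration.

```r
data(birthwt, package = "MASS")
loglwt <- log(birthwt$lwt)
m1 <- lm(birthwt$bwt ~ loglwt)

b  <- coef(m1)
b0 <- unname(b["(Intercept)"])   # intercept B0
b1 <- unname(b["loglwt"])        # slope B1

# Predicted birth weight (grams) for a mother weighing 130 lb
b0 + b1 * log(130)

# The equation reproduces the model's fitted values
fitted_by_hand <- b0 + b1 * loglwt
all.equal(fitted_by_hand, unname(fitted(m1)))
```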
+Nonparametric
Use when the residuals are not normally distributed, or when a linear relationship between x and y cannot be assumed.
■ Correlation
• Change the correlation coefficient to a nonparametric option
> ?cor
> cor(birthwt$bwt, birthwt$lwt, method="spearman")
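Spearman's coefficient is Pearson's r computed on the ranks of the data, which is why it does not depend on the shape of the distributions. The sketch below illustrates this and also shows that a monotone transformation such as `log()` leaves it unchanged; the variable names are chosen for illustration.

```r
data(birthwt, package = "MASS")
x <- birthwt$lwt
y <- birthwt$bwt

rho       <- cor(x, y, method = "spearman")
rho_ranks <- cor(rank(x), rank(y))                # Pearson on ranks
rho_log   <- cor(log(x), y, method = "spearman")  # after log-transforming x

c(spearman = rho, pearson_on_ranks = rho_ranks, spearman_log = rho_log)
```

Because the ranks are unchanged by the log transformation, no data transformation step is needed before a rank-based correlation.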
■ Smooth with loess, then use linear regression
> m1.lo <- loess(birthwt$bwt~loglwt, span = 100, degree = 1)
> j <- order(loglwt)
> plot(m1.lo)
> lines(loglwt[j],m1.lo$fitted[j],col="red",lwd=3)
• Check residuals again
> summary(m1.lo)
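For comparison, here is a sketch of the same smoother with `span = 0.75`, which is the `loess()` default and gives a more local fit than the very large span used above. The grid of prediction points and the object names are assumptions made for illustration.

```r
data(birthwt, package = "MASS")
bw <- transform(birthwt, loglwt = log(lwt))

# Local linear fit; span = 0.75 is the loess() default
lo <- loess(bwt ~ loglwt, data = bw, span = 0.75, degree = 1)

# Evaluate the smooth on an evenly spaced grid within the data range
grid <- data.frame(loglwt = seq(min(bw$loglwt), max(bw$loglwt),
                                length.out = 100))
grid$fit <- predict(lo, newdata = grid)

plot(bwt ~ loglwt, data = bw)
lines(grid$loglwt, grid$fit, col = "red", lwd = 3)
```

Predicting on a sorted grid avoids the need to reorder the fitted values with `order()` before drawing the line.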
+Further practice
■ Try one run-through of the tutorial with a new data set that meets the parametric requirements, and one with a data set that calls for nonparametric methods.
• For new data:
> data()
• Or browse online: https://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html
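As a starting point for practice, the sketch below runs a shortened version of the workflow on `cars`, a data set that ships with base R. The choice of data set is only an example; whether it meets the parametric requirements is for the reader to judge from the output.

```r
# cars: speed (mph) and stopping distance (ft), from the datasets package
str(cars)

# Parametric and nonparametric correlation
cor(cars$speed, cars$dist)
cor(cars$speed, cars$dist, method = "spearman")

# Simple linear regression and a residual normality check
m_cars <- lm(dist ~ speed, data = cars)
coef(m_cars)
shapiro.test(residuals(m_cars))
```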
+Sources
■ Hartlaub, BA. 2011. “Introduction to R.” [internet]. Downloaded on January 26, 2015. Available at http://www2.kenyon.edu/Depts/Math/hartlaub/Math305%20Fall2011/R.htm
■ Hosmer DW, Lemeshow S, and Sturdivant RX, editors. 1989. Applied Logistic Regression, 3rd edition. New York: John Wiley & Sons Inc.
■ Stack Exchange. [internet]. “Fit a Line with LOESS in R.” Downloaded on January 30, 2015. Available at http://stackoverflow.com/questions/15337777/fit-a-line-with-loess-in-r