Correlation analysis. Regression. - UCLrmjbale/Stat/5.pdf · 2018. 10. 10. · Correlation...

Correlation analysis.

Regression.

7.1

Values from the same group tend to be similar.

There is no tendency for values from the same group to be similar.

7.15

Modelling of data: Linear regression

7.16

7.17 Overview

7.18 Overview of the model fitting process

7.20 Linear regression

7.21 Estimation using ordinary least squares (OLS)

7.22 Normal equations

7.23 OLS solutions and predictions
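The slide body for the normal equations did not survive in this transcript. As a minimal sketch, assuming a simple model y = b0 + b1*x with simulated data (the variable names and the simulated values are illustrative assumptions, not taken from the slides), the OLS estimate can be obtained by solving (X'X) b = X'y and compared with what lm() returns:

```r
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(length(x), sd = 1)   # simulated data (assumption)

# Design matrix: a column of ones for the intercept, then x
X <- cbind(1, x)

# Normal equations: (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Predictions from the OLS solution
y_hat <- X %*% beta_hat

# Reference fit with lm(), which uses a QR decomposition internally
fit <- lm(y ~ x)
print(cbind(normal_eq = as.numeric(beta_hat), lm = as.numeric(coef(fit))))
```

Both columns should agree up to rounding error; lm() is preferable in practice because solving the normal equations directly is less stable numerically.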

7.24 A statistical model

7.25 Model assumptions

7.26 Maximum likelihood estimation

7.28 Correlation coefficient

7.30 Hypothesis testing

7.32 Hypothesis testing for intercept and slope

7.35 Check data before doing a regression!!

7.37 Model diagnostic: variance and linearity

7.38 Variance stabilizing methods

7.39 Normal residuals

7.40 Non-normal errors

7.41 Correlated residuals (autocorrelation)

7.42 Other variable transformations

7.43 Box-Cox family of transformations
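As a hedged illustration of the Box-Cox family, the sketch below uses boxcox() from the MASS package to profile the log-likelihood over the transformation parameter lambda. The data are simulated so that log(y) is linear in x (an assumption made purely for illustration), in which case the best lambda should land near 0, the log transformation:

```r
library(MASS)   # provides boxcox()

set.seed(1)
x <- runif(100, 1, 10)
y <- exp(0.3 * x + rnorm(100, sd = 0.2))   # positive response, linear on the log scale

# Keep y and the QR decomposition in the fit so boxcox() can reuse them
fit <- lm(y ~ x, y = TRUE, qr = TRUE)

# Profile log-likelihood over a grid of lambda values (no plot)
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Lambda with the highest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)

# Refit on the transformed scale; lambda = 0 corresponds to log(y)
y_bc <- if (abs(lambda_hat) < 1e-8) log(y) else (y^lambda_hat - 1) / lambda_hat
fit_bc <- lm(y_bc ~ x)
```

Setting plotit = TRUE instead draws the profile together with an approximate 95% confidence interval for lambda, which is often the more useful view when choosing a convenient rounded value.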

7.44 Parameter tuning

7.46 Model building

7.47 Separate linear regressions. Examples: consider the following scenario

7.48 Basis functions

B_r(x)

7.53 Goodness of fit criteria

7.54 Recap

7.55 Linear regression in R:

    case <- read.csv("case1.txt", header=TRUE, sep="\t")
    plot(case[,5], case[,6])
    t0 = lm(case[,6] ~ case[,5])
    k = coef(t0)[2]    # slope
    b = coef(t0)[1]    # intercept
    x = seq(5, 70, by=1)
    points(x, k*x+b, type="l", col="red")    # draw the fitted line by hand
    # OR, equivalently:
    abline(t0)

7.56 Before accepting the result of a linear regression, it is important to evaluate its suitability for explaining the data.

    layout(matrix(1:4,2,2))    # 2x2 grid for the four diagnostic plots
    plot(t0)

7.57 One more example:

    x=seq(1,10,by=1)
    y=x+rnorm(10,0,1)
    y[5]=50    # introduce a single outlier
    t0=lm(y~x)
    plot(x,y); abline(t0)

7.58

    layout(matrix(1:4,2,2))
    plot(t0)

7.59 Leverage and Cook’s distance: Cook’s distance measures the effect of deleting a given observation. Points with a large Cook’s distance merit closer analysis. For a given point, it is the sum, over all observations, of the squared differences between the prediction from the full regression model and the prediction from the model fitted without that point, divided by p times MSE, where p is the number of fitted parameters and MSE is the mean square error of the regression model.
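The definition above can be checked numerically. The following is a sketch, assuming the outlier example from slide 7.57 with a fixed random seed (the seed and the helper names are illustrative assumptions): each point is deleted in turn, the model is refitted, and the scaled sum of squared prediction differences is compared with R's built-in cooks.distance().

```r
set.seed(1)
x <- 1:10
y <- x + rnorm(10)
y[5] <- 50                      # single outlier, as in the earlier example

fit <- lm(y ~ x)
p   <- length(coef(fit))                       # number of fitted parameters
mse <- sum(residuals(fit)^2) / df.residual(fit) # mean square error of the full model

# Leave-one-out version of Cook's distance
d_manual <- sapply(seq_along(y), function(i) {
  fit_i  <- lm(y[-i] ~ x[-i])                      # refit without point i
  pred_i <- coef(fit_i)[1] + coef(fit_i)[2] * x    # predict at all x values
  sum((fitted(fit) - pred_i)^2) / (p * mse)
})

# Built-in computation for comparison
d_builtin <- unname(cooks.distance(fit))
print(round(cbind(d_manual, d_builtin), 4))
```

The two columns should match, and the largest value should belong to the fifth observation, the one that was replaced by an outlier.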

Robust regression. As the residual goes down, the weight goes up.

7.60
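One way to see this weighting in action is M-estimation with rlm() from the MASS package, which by default uses Huber weights in an iteratively reweighted least squares fit. This is a sketch under that assumption; the slides do not say which robust method they have in mind, and the data reuse the outlier example from slide 7.57 with an assumed seed.

```r
library(MASS)   # provides rlm()

set.seed(1)
x <- 1:10
y <- x + rnorm(10)
y[5] <- 50                       # single outlier

fit_ols <- lm(y ~ x)             # ordinary least squares, pulled by the outlier
fit_rob <- rlm(y ~ x, maxit = 50)  # robust fit, Huber psi by default

# Final IWLS weights: points with small residuals keep weight 1,
# points with large residuals are down-weighted
print(round(fit_rob$w, 2))

print(rbind(ols = coef(fit_ols), robust = coef(fit_rob)))
```

The outlier should receive the smallest weight, so it influences the robust coefficients far less than the OLS ones.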

7.62 Nonlinear regression in R:

    x=seq(0,10,by=0.1)
    y=3*sin(x)+1+rnorm(length(x),mean=0,sd=0.3)
    plot(x,y)
    t1=nls(y~b*sin(x)+a,start=list(a=0.1,b=0.1))
    # overlay the fitted curve using the estimated parameters
    points(x,coef(t1)[["b"]]*sin(x)+coef(t1)[["a"]],type="l",col="red")

7.55 Defining Models in R

It is necessary to understand the syntax for defining models in R. Let’s assume that the dependent variable being modeled is Y and that A, B and C are independent variables that might affect Y. The table below provides some useful examples. Note that the mathematical symbols used to define models do not have their normal meanings!
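The original table is not preserved in this transcript. As a stand-in sketch, the snippet below uses model.matrix() on simulated data (the data frame d and its values are assumptions for illustration) to show which columns a few common formula forms actually generate, which makes the special meaning of +, *, :, - and ^ visible:

```r
set.seed(1)
d <- data.frame(A = rnorm(20), B = rnorm(20), C = rnorm(20))
d$Y <- 1 + 2 * d$A - d$B + rnorm(20)

# Helper: names of the design-matrix columns implied by a formula
terms_of <- function(f) colnames(model.matrix(f, d))

terms_of(Y ~ A)                # intercept and A
terms_of(Y ~ A + B)            # "+" adds a term, it does not sum A and B
terms_of(Y ~ A * B)            # A, B and their interaction A:B
terms_of(Y ~ A:B)              # interaction only
terms_of(Y ~ A - 1)            # "- 1" removes the intercept
terms_of(Y ~ I(A + B))         # I() restores the arithmetic meaning of "+"
terms_of(Y ~ (A + B + C)^2)    # main effects plus all two-way interactions
```

Any of these formulas can be passed directly to lm() or glm() in place of model.matrix().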

7.63

Risk, Odds, Odds ratio and Logistic regression

7.65

p = exp(b0 + b1x) / (1 + exp(b0 + b1x))

7.66 Logistic regression in R

    mylogit <- glm(y~x, family=binomial(link="logit"))
    b0 = mylogit$coefficients[1]    # intercept
    b1 = mylogit$coefficients[2]    # slope
    summary(mylogit)

7.71 Extracting parameters of logistic regression

    mylogit <- glm(as.formula(data[,1]~data[,2]+data[,3]), family=binomial(link="logit"), na.action=na.pass)

    koef1=exp(mylogit$coefficients[2])    ##### Odds ratio
    koef2=exp(confint(mylogit))[2,1]    ##### Confidence interval of odds ratio, left (lower) bound
    koef3=exp(confint(mylogit))[2,2]    ##### Confidence interval of odds ratio, right (upper) bound
    koef4=summary(mylogit)[["coefficients"]][,"Pr(>|z|)"][2]    ##### P-value of odds ratio

Stepwise regression

• Any stepwise procedure in logistic regression is based on a statistical algorithm that checks for the "importance" of variables, and either includes or excludes them on the basis of a fixed decision rule.

• The "importance" of a variable is defined in terms of a measure of the statistical significance of the coefficient for the variable.

• The statistic used depends on the assumptions of the model. In stepwise linear regression an F-test is used since the errors are assumed to be normally distributed. In logistic regression the errors are assumed to follow a binomial distribution, and significance is assessed via likelihood ratio chi-square test.

• Thus at any step in the procedure the most important variable, in statistical terms, is the one that produces the greatest change in the log-likelihood relative to a model not containing the variable.
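The likelihood-ratio comparison described in the points above can be sketched as follows. The data are simulated so that x1 affects the outcome and x2 does not (an assumption made for illustration; all variable names are hypothetical), and each candidate variable is compared against the intercept-only model:

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.5 * x1))   # only x1 has a real effect

m0 <- glm(y ~ 1,  family = binomial)   # model without either variable
m1 <- glm(y ~ x1, family = binomial)
m2 <- glm(y ~ x2, family = binomial)

# Likelihood-ratio statistic: twice the gain in log-likelihood
lr1 <- as.numeric(2 * (logLik(m1) - logLik(m0)))
lr2 <- as.numeric(2 * (logLik(m2) - logLik(m0)))

# Chi-square p-values with 1 degree of freedom (one added coefficient)
p1 <- pchisq(lr1, df = 1, lower.tail = FALSE)
p2 <- pchisq(lr2, df = 1, lower.tail = FALSE)
print(c(lr_x1 = lr1, lr_x2 = lr2, p_x1 = p1, p_x2 = p2))

# The same test via anova()
print(anova(m0, m1, test = "Chisq"))
```

In a forward step, the variable with the larger change in log-likelihood (here expected to be x1) would be the one entered first.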

Stepwise regression in R

Any stepwise regression procedure is an algorithm for forward selection followed by backward elimination.

stepAIC(object, direction = c("both", "backward", "forward"))

    library(MASS)    # provides stepAIC
    X=runif(250,-2.5,2.5)
    Y=runif(250,-2.5,2.5)
    Z=runif(250,-2.5,2.5)
    # K is nearly determined by the sign of X, so glm() may warn about fitted probabilities of 0 or 1
    K=round(1/(1+exp(-X))+runif(250,-0.01,0.01))
    data=cbind(K,X,Y,Z)
    mylogit <- glm(as.formula(data[,1]~data[,2]+data[,3]+data[,4]+data[,2]*data[,3]+data[,2]*data[,4]+data[,3]*data[,4]), family=binomial(link="logit"), na.action=na.pass)
    step <- stepAIC(mylogit, direction="both")