Lecture 15: Logistic Regression: Inference and link functions
Transcript of Lecture 15: Logistic Regression: Inference and link functions
BMTRY 701: Biostatistical Methods II
More on our example
```r
> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family = binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29 on 379 degrees of freedom
Residual deviance: 403.90 on 377 degrees of freedom
AIC: 409.9
```
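To interpret the fitted coefficients on the odds-ratio scale, exponentiate the estimates and their Wald confidence limits. A minimal sketch in Python, plugging in the estimates and standard errors from the summary above:

```python
import math

# Coefficients (estimate, std. error) from the fitted model
# glm(cap.inv ~ log(psa) + gleason, family = binomial)
coefs = {"log(psa)": (0.4812, 0.1448), "gleason": (1.0229, 0.1595)}

z = 1.96  # approximate 97.5th percentile of the standard normal
for name, (beta, se) in coefs.items():
    odds_ratio = math.exp(beta)
    lo, hi = math.exp(beta - z * se), math.exp(beta + z * se)
    print(f"{name}: OR = {odds_ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

So, for example, each one-unit increase in Gleason score multiplies the odds of capsular involvement by about 2.8, holding log(PSA) fixed.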
Other covariates: Simple logistic models
| Covariate | Beta | exp(Beta) | Z |
|---|---|---|---|
| Age | -0.0082 | 0.99 | -0.51 |
| Race | -0.054 | 0.95 | -0.15 |
| Vol | -0.014 | 0.99 | -2.26 |
| Dig Exam: Unilobar left (vs. no nodule) | 0.88 | 2.41 | 2.81 |
| Dig Exam: Unilobar right (vs. no nodule) | 1.56 | 4.76 | 4.78 |
| Dig Exam: Bilobar (vs. no nodule) | 2.10 | 8.17 | 5.44 |
| Detection in RE | 1.71 | 5.53 | 4.48 |
| LogPSA | 0.87 | 2.39 | 6.62 |
| Gleason | 1.24 | 3.46 | 8.12 |
What is a good multiple regression model?
Principles of model building are analogous to linear regression. We use the same approach:
- Look for significant covariates in simple models
- Consider multicollinearity
- Look for confounding (i.e., change in betas when a covariate is removed)
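A common rule-of-thumb screen for confounding (a sketch, not something prescribed in the lecture) is to flag a covariate when dropping it shifts another coefficient by more than roughly 10%:

```python
def pct_change(beta_full, beta_reduced):
    """Percent change in a coefficient when a covariate is removed."""
    return 100 * (beta_reduced - beta_full) / beta_full

# Illustrative values: the gleason coefficient with and without
# factor(dpros) in the model (0.93147 vs 0.99931, from two fits
# shown later in this lecture)
change = pct_change(0.93147, 0.99931)
print(f"gleason coefficient changes by {change:.1f}%")
```

Here the shift is about 7%, below the 10% rule of thumb, though such cutoffs are heuristics rather than formal tests.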
Multiple regression model proposal
Gleason, logPSA, Volume, Digital Exam result, detection in RE
But what about collinearity? With 5 covariates, there are 5 choose 2 = 10 pairs to check.
```r
         gleason log.psa.   vol
gleason     1.00     0.46 -0.06
log.psa.    0.46     1.00  0.05
vol        -0.06     0.05  1.00
```

[Pairs scatterplot matrix of gleason, log.psa., and vol]
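Each entry of a correlation matrix like the one above is just a pairwise Pearson correlation. A self-contained sketch on hypothetical mini-data (standing in for gleason and log(psa), not the lecture's dataset):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical mini-data for illustration only
gleason = [5, 6, 7, 6, 8, 5]
log_psa = [1.2, 1.9, 2.4, 1.7, 2.9, 1.1]
print(round(pearson_r(gleason, log_psa), 2))
```

In R this is simply `cor(cbind(gleason, log.psa., vol), use = "complete.obs")`, which produces the matrix shown above.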
Categorical pairs
```r
> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1  oddsratio     lower     upper      p.value
        1  95 0.2802360  4 0.09756098   1.000000        NA        NA           NA
        2 123 0.3628319  9 0.21951220   1.737805 0.5193327  5.815089 4.050642e-01
        3  84 0.2477876 12 0.29268293   3.392857 1.0540422 10.921270 3.777900e-02
        4  37 0.1091445 16 0.39024390  10.270270 3.2208157 32.748987 1.271225e-05
> fisher.test(table(dpros, dcaps))

        Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided
```
Categorical vs. continuous
t-tests and ANOVA: means by category

```r
> summary(lm(log(psa) ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917,  Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF,  p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348,  Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF,  p-value: 2.547e-06
```
Categorical vs. continuous
```r
> summary(lm(vol ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162,  Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF,  p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598,  Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF,  p-value: 0.7164
```
Categorical vs. continuous
```r
> summary(lm(gleason ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394,  Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF,  p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411,  Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF,  p-value: 2.251e-06
```
Lots of “correlation” between covariates
We should expect some insignificant coefficients and some confounding.

Still, fit the 'full model' and see what happens.
Full model results
```r
> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros),
+             family = binomial)
> summary(mreg)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -8.617036   1.102909  -7.813 5.58e-15 ***
gleason         0.908424   0.166317   5.462 4.71e-08 ***
log(psa)        0.514200   0.156739   3.281  0.00104 **
vol            -0.014171   0.007712  -1.838  0.06612 .
dcaps           0.464952   0.456868   1.018  0.30882
factor(dpros)2  0.753759   0.355762   2.119  0.03411 *
factor(dpros)3  1.517838   0.372366   4.076 4.58e-05 ***
factor(dpros)4  1.384887   0.453127   3.056  0.00224 **
---
    Null deviance: 511.26 on 378 degrees of freedom
Residual deviance: 376.00 on 371 degrees of freedom
  (1 observation deleted due to missingness)
AIC: 392
```
What next?
Drop or retain? How to interpret?
Likelihood Ratio Test
Recall testing multiple coefficients in linear regression:
- Approach: ANOVA
- We don't have the ANOVA F test for logistic regression
- More general approach: the Likelihood Ratio Test
- Based on the likelihood (or log-likelihood) for "competing" nested models
Likelihood Ratio Test
H0: small model; Ha: large model. Example:

$$\text{logit}(\pi) = \beta_0 + \beta_1 \text{GS} + \beta_2 \log(\text{PSA}) + \beta_3 \text{vol} + \beta_4 I(D=2) + \beta_5 I(D=3) + \beta_6 I(D=4)$$

$$H_0: \beta_4 = \beta_5 = \beta_6 = 0$$

$$H_1: \beta_4 \neq 0,\ \beta_5 \neq 0,\ \text{and/or}\ \beta_6 \neq 0$$
Recall the likelihood function
$$L(\beta_0, \beta_1; x, y) = \prod_{i=1}^{n} \left[\frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}\right]^{y_i} \left[\frac{1}{1 + \exp(\beta_0 + \beta_1 x_i)}\right]^{1 - y_i}$$

$$\log L(\beta_0, \beta_1; x, y) = \sum_{i=1}^{n} y_i(\beta_0 + \beta_1 x_i) - \sum_{i=1}^{n} \log\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)$$
Estimating the log-likelihood
Recall that we use the log-likelihood because it is simpler to work with (as in linear regression).

MLEs:
- Betas are selected to maximize the likelihood
- The same betas also maximize the log-likelihood
- If we plug in the estimated betas, we get the 'maximized' log-likelihood for that model

We compare the maximized log-likelihoods from competing (nested) models.
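The Bernoulli log-likelihood can be evaluated directly. A toy sketch (made-up data and a coarse grid search rather than the Newton-type optimization glm actually uses):

```python
import math

def log_lik(b0, b1, x, y):
    """Bernoulli log-likelihood for a simple logistic model:
    sum_i [ y_i*(b0 + b1*x_i) - log(1 + exp(b0 + b1*x_i)) ]"""
    return sum(yi * (b0 + b1 * xi) - math.log1p(math.exp(b0 + b1 * xi))
               for xi, yi in zip(x, y))

# Toy data: the outcome tends to be 1 for larger x
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0,   0,   1,   0,   1,   1]

# Coarse grid search over (b0, b1): the maximizer approximates the MLE
best = max(((b0 / 10, b1 / 10)
            for b0 in range(-50, 51) for b1 in range(-50, 51)),
           key=lambda b: log_lik(b[0], b[1], x, y))
print(best, round(log_lik(*best, x, y), 3))
```

Whatever betas maximize this quantity give the 'maximized' log-likelihood for the model, which is what the likelihood ratio test compares across nested fits.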
Likelihood Ratio Test
LR statistic: G2 = -2*(LogL(H0) - LogL(H1))

Under the null: G2 ~ χ2(p-q), where p and q are the numbers of parameters in the large and small models.

If G2 < χ2(p-q),1-α: fail to reject H0 (the smaller model is adequate)

If G2 > χ2(p-q),1-α: reject H0 in favor of H1
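A sketch of this decision rule in Python, using the deviances from the DPROS comparison later in the lecture; the chi-square survival function for 3 df is hand-coded from its closed form rather than taken from a stats library:

```python
import math

def chi2_sf_3df(x):
    """P(X > x) for a chi-square with 3 df.
    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Residual deviances of the two nested fits (reduced and full model)
dev0, dev1 = 399.02, 377.06
g2 = dev0 - dev1          # LR statistic; df = p - q = 3 here
p_value = chi2_sf_3df(g2)
print(f"G^2 = {g2:.2f}, p = {p_value:.2e}")
```

Since the p-value is far below 0.05 (equivalently, G2 exceeds the 0.95 quantile of χ2 with 3 df, about 7.81), the large model wins.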
LRT in R
-2LogL = Residual Deviance, so G2 = Dev(0) - Dev(1). Fit two models:

$$H_0: \beta_4 = \beta_5 = \beta_6 = 0$$

$$H_1: \beta_4 \neq 0,\ \beta_5 \neq 0,\ \text{and/or}\ \beta_6 \neq 0$$
```r
> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family = binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family = binomial)
> mreg1
Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1        AIC: 391.1

> mreg0
Coefficients:
(Intercept)      gleason     log(psa)          vol
   -7.76759      0.99931      0.50406     -0.01583

Degrees of Freedom: 378 Total (i.e. Null);  375 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 399          AIC: 407
```
Testing DPROS
Dev(0) - Dev(1) = 399.02 - 377.06 = 21.96

p - q = 3

χ2(3),0.95 = 7.81

Conclusion? 21.96 > 7.81, so reject H0: retain DPROS in the model.

p-value? 1 - pchisq(21.96, 3) ≈ 6.6e-05
More in R
```r
> qchisq(0.95, 3)
> -2 * (logLik(mreg0) - logLik(mreg1))
> 1 - pchisq(21.96, 3)

> anova(mreg0, mreg1)
Analysis of Deviance Table

Model 1: cap.inv ~ gleason + log(psa) + vol
Model 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros)
  Resid. Df Resid. Dev Df Deviance
1       375     399.02
2       372     377.06  3    21.96
```
Notes on LRT
Again, the models have to be NESTED. For comparing models that are not nested, you need to use other approaches. Examples:
- AIC
- BIC
- DIC

Next time….
For next time, read the following article
Low Diagnostic Yield of Elective Coronary Angiography
Patel, Peterson, Dai, et al.
NEJM, 362(10), pp. 886-95
March 11, 2010
http://content.nejm.org/cgi/content/short/362/10/886?ssource=mfv