1 Logistic Regression Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data.
-
date post
22-Dec-2015 -
Category
Documents
-
view
219 -
download
2
Transcript of 1 Logistic Regression Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data.
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
2
Exercise 11.1
• Predict risk of malaria from age and log transformed antibody level using logistic regression
• First examine some plots
• Then fit the logistic regression
• Interpret the results
• Check model assumptions
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
3
> library(ISwR)Loading required package: survival Loading required package: splines > data(malaria)> summary(malaria) subject age ab mal Min. : 1.00 Min. : 3.00 Min. : 2.0 Min. :0.00 1st Qu.: 25.75 1st Qu.: 5.75 1st Qu.: 29.0 1st Qu.:0.00 Median : 50.50 Median : 9.00 Median : 111.0 Median :0.00 Mean : 50.50 Mean : 8.86 Mean : 311.5 Mean :0.27 3rd Qu.: 75.25 3rd Qu.:12.00 3rd Qu.: 373.8 3rd Qu.:1.00 Max. :100.00 Max. :15.00 Max. :2066.0 Max. :1.00 > attach(malaria)> plot(age,mal)> plot(log(ab),mal)> plot(age, log(ab))
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
7
> mal.glm <- glm(mal ~ age+log(ab),binomial)> summary(mal.glm)
Call:glm(formula = mal ~ age + log(ab), family = binomial)
Deviance Residuals: Min 1Q Median 3Q Max -1.8492 -0.7536 -0.4838 0.8809 2.5796
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.57234 0.95184 2.702 0.006883 ** age -0.06546 0.06772 -0.967 0.333703 log(ab) -0.68235 0.19552 -3.490 0.000483 ***---(Dispersion parameter for binomial family taken to be 1)
Null deviance: 116.652 on 99 degrees of freedomResidual deviance: 98.017 on 97 degrees of freedomAIC: 104.02Residual deviance shows no lack of fit
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
8
> summary(glm(mal ~ log(ab),binomial))
Call:glm(formula = mal ~ log(ab), family = binomial)
Deviance Residuals: Min 1Q Median 3Q Max -1.9159 -0.7339 -0.4854 0.8813 2.4722
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.1552 0.8401 2.565 0.010305 * log(ab) -0.7122 0.1932 -3.686 0.000228 ***---
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 116.652 on 99 degrees of freedomResidual deviance: 98.968 on 98 degrees of freedomAIC: 102.97
Number of Fisher Scoring iterations: 4
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
9
Exercise 11.2
• The 'gvhd' data frame has 37 rows and 7 columns. It contains data from patients receiving a nondepleted allogenic bone marrow transplant, with the purpose of finding variables associated with the development of acute graft-versus-host disease.
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
10
• pnr a numeric vector. Patient number.• rcpage a numeric vector. Age of recipient (years).• donage a numeric vector. Age of donor (years).• type a numeric vector, type of leukaemia coded 1: AML,
2: ALL, 3: CML for acute myeloid, acute lymphatic, and chronic myeloid leukaemia.
• preg a numeric vector code, indicating whether donor has been pregnant. 0: no, 1: yes.
• index a numeric vector giving an index of mixed epidermal cell-lymphocyte reactions.
• gvhd a numeric vector code, graft versus host disease. 0: no, 1: yes.
• time a numeric vector. Follow-up time• dead a numeric vector code 0: no (censored), 1: yes
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
11
> summary(graft.vs.host) pnr rcpage donage type preg Min. : 1 Min. :13.00 Min. :14.00 Min. :1.000 Min. :0.0000 1st Qu.:10 1st Qu.:20.00 1st Qu.:20.00 1st Qu.:1.000 1st Qu.:0.0000 Median :19 Median :23.00 Median :23.00 Median :2.000 Median :0.0000 Mean :19 Mean :25.43 Mean :25.81 Mean :1.973 Mean :0.2703 3rd Qu.:28 3rd Qu.:29.00 3rd Qu.:34.00 3rd Qu.:3.000 3rd Qu.:1.0000 Max. :37 Max. :43.00 Max. :43.00 Max. :3.000 Max. :1.0000 index gvhd time dead Min. : 0.270 Min. :0.0000 Min. : 41.0 Min. :0.0000 1st Qu.: 0.920 1st Qu.:0.0000 1st Qu.: 177.0 1st Qu.:0.0000 Median : 2.010 Median :0.0000 Median : 667.0 Median :0.0000 Mean : 2.556 Mean :0.4595 Mean : 669.8 Mean :0.4865 3rd Qu.: 3.730 3rd Qu.:1.0000 3rd Qu.:1105.0 3rd Qu.:1.0000 Max. :10.110 Max. :1.0000 Max. :1504.0 Max. :1.0000
> attach(graft.vs.host)> hist(index)> hist(sqrt(index))> hist(log(index))
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
15
> summary(glm(gvhd ~ rcpage+donage+type+preg+log(index),binomial))
Deviance Residuals: Min 1Q Median 3Q Max -2.0007 -0.3646 -0.0956 0.5796 1.6828
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.47420 2.59647 -2.108 0.0350 *rcpage 0.01597 0.08205 0.195 0.8457 donage 0.10724 0.08262 1.298 0.1943 type2 -0.42824 1.25389 -0.342 0.7327 type3 1.61517 1.30743 1.235 0.2167 preg 1.66236 1.18103 1.408 0.1593 log(index) 1.81874 0.88919 2.045 0.0408 *---
Null deviance: 51.049 on 36 degrees of freedomResidual deviance: 26.252 on 30 degrees of freedomAIC: 40.252
Number of Fisher Scoring iterations: 6
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
16
> drop1(glm(gvhd ~ rcpage+donage+type+preg+log(index), binomial),test="Chisq")Single term deletions
Model:gvhd ~ rcpage + donage + type + preg + log(index) Df Deviance AIC LRT Pr(Chi) <none> 26.252 40.252 rcpage 1 26.291 38.291 0.038 0.84495 donage 1 28.082 40.082 1.829 0.17622 type 2 28.959 38.959 2.706 0.25841 preg 1 28.447 40.447 2.195 0.13849 log(index) 1 32.343 44.343 6.090 0.01359 *
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
17
> drop1(glm(gvhd ~ donage+type+preg+log(index),binomial), test="Chisq")Single term deletions
Model:gvhd ~ donage + type + preg + log(index) Df Deviance AIC LRT Pr(Chi) <none> 26.291 38.291 donage 1 28.832 38.832 2.541 0.11092 type 2 29.403 37.403 3.113 0.21092 preg 1 28.812 38.812 2.522 0.11228 log(index) 1 32.630 42.630 6.340 0.01181 *---Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
18
> drop1(glm(gvhd ~ donage+preg+log(index),binomial),test="Chisq")Single term deletions
Model:gvhd ~ donage + preg + log(index) Df Deviance AIC LRT Pr(Chi) <none> 29.403 37.403 donage 1 33.654 39.654 4.251 0.0392375 * preg 1 31.068 37.068 1.665 0.1968922 log(index) 1 41.723 47.723 12.320 0.0004482 ***---Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
19
> drop1(glm(gvhd ~ donage+log(index),binomial),test="Chisq")Single term deletions
Model:gvhd ~ donage + log(index) Df Deviance AIC LRT Pr(Chi) <none> 31.068 37.068 donage 1 37.740 41.740 6.671 0.0097993 ** log(index) 1 45.476 49.476 14.407 0.0001473 ***---Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
20
> summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq")
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 **donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 **Null deviance: 51.049 on 36 degrees of freedomResidual deviance: 31.068 on 34 degrees of freedomAIC: 37.068
> summary(glm(gvhd ~ donage+sqrt(index),binomial),test="Chisq")Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -8.28968 2.77653 -2.986 0.00283 **donage 0.14353 0.06218 2.308 0.02100 * sqrt(index) 2.96830 1.08381 2.739 0.00617 **Null deviance: 51.049 on 36 degrees of freedomResidual deviance: 31.591 on 34 degrees of freedomAIC: 37.591
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
21
> summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq")
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 **donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 **Null deviance: 51.049 on 36 degrees of freedomResidual deviance: 31.068 on 34 degrees of freedomAIC: 37.068
> summary(glm(gvhd ~ donage+index,binomial),test="Chisq")Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -6.08600 2.10453 -2.892 0.00383 **donage 0.14075 0.06002 2.345 0.01902 * index 0.94439 0.35914 2.630 0.00855 **Null deviance: 51.049 on 36 degrees of freedomResidual deviance: 32.362 on 34 degrees of freedomAIC: 38.362
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
22
Other Issues
• If we want to analyze time to GVHD, we could use the time and the censoring variable
• This is complex because someone who does not have GVHD may develop it later (one type of censoring) and cannot develop it if dead (another type)
• We have competing risks• Some of these issues will be addressed next
quarter
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
23
Exercise 11.3
• The function confint() gives parameter confidence intervals for nonlinear models that are more accurate than the ones given by default
• This uses a profile likelihood technique, which varies the parameter until the change in likelihood/deviance is too big
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
24
> mal.glm <- glm(mal ~ age+log(ab),binomial)> summary(mal.glm)Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.57234 0.95184 2.702 0.006883 ** age -0.06546 0.06772 -0.967 0.333703 log(ab) -0.68235 0.19552 -3.490 0.000483 ***
-.68235 +- (1.960)(.19552) = (-1.066, -.2991)
> library(MASS)> confint(mal.glm)Waiting for profiling to be done... 2.5 % 97.5 %(Intercept) 0.7811646 4.54544887age -0.2024561 0.06556186log(ab) -1.0985746 -0.32397509
December 1, 2004 EPP 245 Statistical Analysis of Laboratory Data
25
> gvhd.glm <- glm(gvhd ~ donage+log(index),binomial)> summary(gvhd.glm)Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 **donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 **
2.17773 +- (1.960)(0.78986) = (0.6296, 3.7259)0.14594 +- (1.960)(0.06465) = (0.0192, 0.2727)
> confint(gvhd.glm)Waiting for profiling to be done... 2.5 % 97.5 %(Intercept) -10.38957922 -1.9553765donage 0.03262422 0.2946919log(index) 0.89230108 4.0712645