AMMBR II
Gerrit Rooks
Checking assumptions in logistic regression
• Hosmer & Lemeshow
• Residuals
• Multicollinearity
• Cook's distance
Hosmer & Lemeshow
The test divides the sample into subgroups and checks whether the observed and predicted numbers of cases are about equal within each of these groups.
The test should not be significant (a non-significant result indicates no difference between observed and predicted values).
Hosmer & Lemeshow
Average probability in the j-th group
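The formula that this annotation belongs to did not survive in the transcript. A sketch of the statistic in its standard textbook form, assumed here rather than copied from the slide, with $\bar{\pi}_j$ standing for the average predicted probability in the j-th group:

```latex
\hat{C} = \sum_{j=1}^{g} \frac{\left(O_j - n_j \bar{\pi}_j\right)^2}{n_j \, \bar{\pi}_j \left(1 - \bar{\pi}_j\right)}
```

Here $g$ is the number of groups, $O_j$ the observed number of cases with outcome 1 in group $j$, and $n_j$ the number of observations in group $j$. Under a well-fitting model the statistic is approximately chi-square distributed with $g - 2$ degrees of freedom, which matches the chi2(8) that Stata reports for 10 groups.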
First logistic regression
. logit hiqual yr_rnd meals cred_ml

Iteration 0:   log likelihood = -349.01971
Iteration 1:   log likelihood = -199.10312
Iteration 2:   log likelihood = -160.11854
Iteration 3:   log likelihood = -156.27132
Iteration 4:   log likelihood = -156.25612
Iteration 5:   log likelihood = -156.25611

Logistic regression                               Number of obs   =        707
                                                  LR chi2(3)      =     385.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -156.25611                       Pseudo R2       =     0.5523

      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.189537   .5022235    -2.37   0.018    -2.173877   -.2051967
       meals |     -.0936   .0084587   -11.07   0.000    -.1101786   -.0770213
     cred_ml |   .7406536   .3152647     2.35   0.019     .1227463    1.358561
       _cons |   2.425635   .3995025     6.07   0.000     1.642624    3.208645
Then postestimation command
. estat gof, table group(10)

Logistic model for hiqual, goodness-of-fit test

(Table collapsed on quantiles of estimated probabilities)

  Group |   Prob   Obs_1   Exp_1   Obs_0   Exp_0   Total
--------+------------------------------------------------
      1 | 0.0008       1     0.0      71    72.0      72
      2 | 0.0019       1     0.1      71    71.9      72
      3 | 0.0037       0     0.2      71    70.8      71
      4 | 0.0078       0     0.4      68    67.6      68
      5 | 0.0208       1     0.9      71    71.1      72
      6 | 0.0560       2     2.4      68    67.6      70
      7 | 0.1554       4     7.4      68    64.6      72
      8 | 0.4960      23    22.0      47    48.0      70
      9 | 0.7531      44    43.5      26    26.5      70
     10 | 0.9595      62    61.1       8     8.9      70

      number of observations =       707
            number of groups =        10
     Hosmer-Lemeshow chi2(8) =     40.45
                 Prob > chi2 =    0.0000
Including interaction term helps
. gen ym=yr_rnd*meals

. logit hiqual yr_rnd meals cred_ml ym , nolog

Logistic regression                               Number of obs   =        707
                                                  LR chi2(4)      =     390.46
                                                  Prob > chi2     =     0.0000
Log likelihood = -153.78831                       Pseudo R2       =     0.5594

      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -2.834458   .8630901    -3.28   0.001    -4.526083   -1.142832
       meals |  -.1019211   .0098691   -10.33   0.000    -.1212641   -.0825781
     cred_ml |   .7789823   .3206881     2.43   0.015     .1504452    1.407519
          ym |   .0463257   .0188326     2.46   0.014     .0094145    .0832368
       _cons |   2.686005   .4307661     6.24   0.000     1.841719    3.530291
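The gain from the interaction term can be checked by hand from the two reported log likelihoods (-156.25611 without ym, -153.78831 with ym). A sketch of the likelihood-ratio arithmetic, assuming both models are fitted on the same 707 observations, as the two outputs indicate:

```latex
\begin{aligned}
LR &= 2\,\bigl(\ell_{\text{with } ym} - \ell_{\text{without } ym}\bigr) \\
   &= 2\,\bigl(-153.78831 - (-156.25611)\bigr) \\
   &= 2 \times 2.46780 = 4.9356, \qquad df = 1
\end{aligned}
```

This agrees with the difference between the two reported LR chi2 values (390.46 - 385.53 = 4.93). It exceeds the 5% critical value of 3.84 for a chi-square with one degree of freedom, in line with the z-test on ym (p = 0.014).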
Multicollinearity
. reg hiqual avg_ed yr_rnd meals

      Source |       SS       df       MS              Number of obs =    1158
-------------+------------------------------           F(  3,  1154) =  518.61
       Model |  145.983509     3  48.6611696           Prob > F      =  0.0000
    Residual |  108.279876  1154  .093830049           R-squared     =  0.5741
-------------+------------------------------           Adj R-squared =  0.5730
       Total |  254.263385  1157  .219760921           Root MSE      =  .30632

      hiqual |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avg_ed |   .1729601    .021089     8.20   0.000     .1315831    .2143371
      yr_rnd |  -.0008586   .0248112    -0.03   0.972    -.0495386    .0478215
       meals |  -.0076084    .000527   -14.44   0.000    -.0086423   -.0065744
       _cons |   .2445202   .0824989     2.96   0.003     .0826554    .4063849

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
       meals |      3.31    0.301982
      avg_ed |      3.25    0.307731
      yr_rnd |      1.11    0.903460
-------------+----------------------
    Mean VIF |      2.56
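The two columns of the vif output are linked by a simple identity. As a reminder, using the standard definition (not shown on the slide), with $R_j^2$ the R-squared from regressing predictor $j$ on the other predictors:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\text{tolerance}_j = \frac{1}{\mathrm{VIF}_j} = 1 - R_j^2
```

For meals, 1/VIF = 0.301982 corresponds to $R_j^2 \approx 0.698$, so roughly 70% of the variance in meals is shared with avg_ed and yr_rnd. Stata's vif command is available after regress, which is presumably why the slide fits a linear model on the same predictors to obtain these diagnostics.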
Residuals
• Residual = (observed value - predicted value) / square root of the variance
. predict p
(option pr assumed; Pr(hiqual))
(42 missing values generated)

. predict stdres, rstand
(42 missing values generated)
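Written out, the verbal definition on this slide is the Pearson residual. The rstandard option of predict additionally divides by $\sqrt{1 - h_i}$, where $h_i$ is the leverage of observation $i$. A sketch for ungrouped binary data, with $\hat{\pi}_i$ as assumed notation for the predicted probability:

```latex
r_i = \frac{y_i - \hat{\pi}_i}{\sqrt{\hat{\pi}_i \,(1 - \hat{\pi}_i)}},
\qquad
r_i^{\text{std}} = \frac{r_i}{\sqrt{1 - h_i}}
```

The denominator of $r_i$ is the estimated standard deviation of a binary outcome with success probability $\hat{\pi}_i$, which is the "square root of the variance" in the bullet.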
Residuals
. scatter stdres p, mlabel(snum)

[Figure: scatter plot of the standardized Pearson residual (y-axis) against Pr(hiqual) (x-axis, 0 to 1), with each point labelled by its school number (snum).]
Inspect observations with large residuals (larger than about 2.5 to 3)
. list if snum==1403

458.  snum   dnum   schqual   hiqual   yr_rnd   meals   enroll   cred   cred_ml
      1403    315      high     high       nd     100      497    low       low

      cred_hl    pared   pared_ml   pared_hl   api00   api99   full   some_col
          low   medium     medium          .     808     824     59         28

      awards   ell   avg_ed   hicred    ym
          No    27     2.19        0   100
. logit hiqual yr_rnd meals avg_ed, nolog

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(3)      =     914.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -273.66402                       Pseudo R2       =     0.6255

      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -.9913148   .3743452    -2.65   0.008    -1.725018   -.2576117
       meals |  -.0758864   .0074453   -10.19   0.000     -.090479   -.0612938
      avg_ed |    1.98805   .2884154     6.89   0.000     1.422766    2.553334
       _cons |  -3.566451    1.01715    -3.51   0.000    -5.560028   -1.572874

. logit hiqual yr_rnd meals avg_ed if snum != 1403

Iteration 0:   log likelihood = -729.56398
Iteration 1:   log likelihood = -332.43297
Iteration 2:   log likelihood = -270.06297
Iteration 3:   log likelihood = -265.70542
Iteration 4:   log likelihood = -265.68934
Iteration 5:   log likelihood = -265.68934

Logistic regression                               Number of obs   =       1157
                                                  LR chi2(3)      =     927.75
                                                  Prob > chi2     =     0.0000
Log likelihood = -265.68934                       Pseudo R2       =     0.6358

      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |    -1.1328   .3842377    -2.95   0.003    -1.885892   -.3797077
       meals |  -.0790397   .0076984   -10.27   0.000    -.0941283   -.0639511
      avg_ed |   2.010791   .2947269     6.82   0.000     1.433137    2.588445
       _cons |  -3.528875   1.037345    -3.40   0.001    -5.562035   -1.495716
Cook's distance (< 1)
Mean square error
Number of parameters
Prediction for j from all observations
Prediction for j from the observations excluding observation i
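The annotations on this slide belong to the usual formula for Cook's distance. A reconstruction in its standard linear-regression form, assumed here because the formula itself did not survive in the transcript:

```latex
D_i = \frac{\sum_{j=1}^{n} \bigl(\hat{Y}_j - \hat{Y}_{j(i)}\bigr)^2}{p \cdot \mathrm{MSE}}
```

Here $\hat{Y}_j$ is the prediction for $j$ from all observations, $\hat{Y}_{j(i)}$ the prediction for $j$ when observation $i$ is excluded, $p$ the number of parameters, and MSE the mean square error. For logit models, Stata provides the analogous influence statistic, Pregibon's dbeta, through predict with the dbeta option.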
. logit hiqual meals yr_rnd cred_ml, nolog

Logistic regression                               Number of obs   =        707
                                                  LR chi2(3)      =     385.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -156.25611                       Pseudo R2       =     0.5523

      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |     -.0936   .0084587   -11.07   0.000    -.1101786   -.0770213
      yr_rnd |  -1.189537   .5022235    -2.37   0.018    -2.173877   -.2051967
     cred_ml |   .7406536   .3152647     2.35   0.019     .1227463    1.358561
       _cons |   2.425635   .3995025     6.07   0.000     1.642624    3.208645

. predict cook, dbeta
(493 missing values generated)

. summ cook

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        cook |       707    .0257177    .0899176   2.11e-07   .6101257
. graph twoway scatter cook p, mlabel(snum)

[Figure: scatter plot of Pregibon's dbeta (y-axis) against Pr(hiqual) (x-axis, 0 to 1), with each point labelled by its school number (snum).]
To Stata
• Use apilog.dta
• Awards = dependent variable
• For Awards, inspect the frequency counts
• Recode Awards into a binary variable
• Estimate a LR model using yr_rnd, meals and enroll as predictors
To Stata
• Inspect classification table
• Perform Hosmer & Lemeshow test
• Inspect standardized residuals
• Inspect Cook's distance
• See if interaction effects improve fit
• Is the Wald test an accurate test of the significance of coefficients in logistic regression analysis?
a) Yes, just like in regression analysis.
b) Yes, it is accurate, although a likelihood ratio test is more efficient.
c) No, unlike in regression analysis, the Wald test is biased, especially for relatively small coefficients.
d) No, unlike in regression analysis, the Wald test is biased, especially for relatively large coefficients.
• Use lrtest to check the significance of the effect of the variable yr_rnd
• Use auto.dta (if it is not on your PC, then: use http://www.stata-press.com/data/r11/auto)
• Predict which car will be foreign, using weight and mpg as predictors
• Is the interaction between weight and mpg significant?
• Tip: always center the variables before making an interaction variable.
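The centering tip can be sketched in Stata as follows. This is a minimal illustration, not the course's reference solution and not run here. It loads the auto data with sysuse (the copy shipped with Stata) instead of the URL given on the slide, and the names c_weight, c_mpg, c_wm, main and inter are made up for this example.

```stata
* Load the auto data shipped with Stata (assumed equivalent to the r11 copy)
sysuse auto, clear

* Center both predictors at their sample means
summarize weight, meanonly
generate c_weight = weight - r(mean)
summarize mpg, meanonly
generate c_mpg = mpg - r(mean)

* Interaction variable built from the centered predictors
generate c_wm = c_weight * c_mpg

* Model without the interaction term
logit foreign c_weight c_mpg, nolog
estimates store main

* Model with the interaction term
logit foreign c_weight c_mpg c_wm, nolog
estimates store inter

* Likelihood-ratio test of the interaction term
lrtest main inter
```

Centering leaves the interaction coefficient itself unchanged, but it reduces the correlation between the product term and its components and makes the main effects interpretable as effects at the sample mean of the other predictor.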
• use http://www.stata-press.com/data/r11/choice
• Do income, gender or type of car (European, Japanese or American) predict whether a car will be bought (choice)?