Ed231C- Generalized Linear Models
26/02/10 12:26Ed231C: Generalized Linear Models
Page 1 sur 9http://www.gseis.ucla.edu/courses/ed231c/notes1/glm.html
Applied Categorical & Nonnormal Data Analysis
Generalized Linear Models
Most students are introduced to linear models through either multiple regression or analysis of variance. With these methods the expected value of the response variable is modeled statistically; that is, it is expressed as a linear combination of the explanatory variables. With categorical and count response variables, the expected value cannot be modeled directly as a linear function of the explanatory variables. The problem of nonlinearity is handled through nonlinear functions that transform the expected value of the categorical or count variable into a linear function of the explanatory variables. Such transformations are referred to as link functions.
For example, in the analysis of count data, the expected frequencies must be nonnegative. To ensure that the predicted values from the linear model satisfy this constraint, the log link is used to transform the expected value of the response variable. This log transformation serves two purposes: it ensures that the fitted values are appropriate for count data, and it permits the unknown regression parameters to lie anywhere on the real line.
Different types of response variables use different link functions: both the logit and probit link functions work with binomial response variables, while the log link function works with both Poisson and negative binomial response variables. Growing out of the work of Nelder & Wedderburn (1972) and McCullagh & Nelder (1989), generalized linear models provide a unified framework that can be applied to various 'linear' models.
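As a small illustration of what a link function does, here is a sketch in Python (standard library only; Python is not part of these notes, and the names LINKS and round_trip are made up for this example). Each link g maps a mean to the linear-predictor scale, and its inverse maps any real number back to a valid mean.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link

# Each entry maps a link name to (g, g_inverse): g takes a mean mu and
# returns the linear predictor eta; g_inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
}

def round_trip(link, mu):
    """Apply a link and then its inverse; the result should equal mu."""
    g, g_inverse = LINKS[link]
    return g_inverse(g(mu))

if __name__ == "__main__":
    for name in LINKS:
        print(name, round_trip(name, 0.3))
    # The inverse log link turns any real number into a positive mean.
    print(LINKS["log"][1](-5.0))
```

Note that the inverse of the log link is always positive and the inverses of the logit, probit and cloglog links always fall between 0 and 1, which is why these links suit count and binomial responses.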
Generalized linear models take the form:
g(E(y)) = xb, y -> {F}
where xb = b0 + b1X1 + b2X2 + ... is the linear predictor, F is the distribution family and g( ) is the link function.
You might recognize this form more easily if the familiar OLS regression model is written as follows:
Y' = b0 + b1X1 + b2X2 + ... y -> {gaussian}
Now we can replace Y' with E(y),
E(y) = b0 + b1X1 + b2X2 + ... y -> {gaussian}
In OLS the distribution family is gaussian (normal), i.e., y -> {gaussian}, and the link function is the identity, i.e., g(y) = y. Thus, we can write g(E(y)) as just E(y).
Another example is Poisson regression, in which the distribution family is Poisson, i.e., y -> {poisson}, and the link function is the natural log, i.e., g(y) = ln(y). The GLM would then be written as,
g(E(y)) = b0 + b1X1 + b2X2 + ... y -> {poisson}
Here are examples of distributions and link functions for some common estimation procedures:
type of estimation         distribution family    link function
OLS regression             gaussian               identity
logistic regression        binomial               logit
probit                     binomial               probit
cloglog                    binomial               cloglog
poisson regression         poisson                log
neg binomial regression    neg binomial           log
Stata's GLM Procedure
Stata's glm procedure estimates generalized linear models in which the user can specify both the distribution family and the link function. Here is the basic syntax of the glm procedure:
glm depvar indvars [if exp] [in range] [, family(fname) link(lname) eform ]
where fname can take on the values gaussian | igaussian | binomial | poisson | nbinomial | gamma and lname can take on the values identity | log | logit | probit | cloglog | nbinomial | power | opower.
An OLS regression would look like this using regress and glm:
regress write read math gender
glm write read math gender, family(gaus) link(iden)
A logistic regression would look like this:
logistic honors read math gender
glm honors read math gender, family(binom) link(logit)
A poisson regression would look like this:
poisson days read math gender
glm days read math gender, family(poisson) link(log)
A negative binomial regression would look like this:
nbreg days read math gender
glm days read math gender, family(nbinom) link(log)
Here is a list of the allowable distribution families:
gaussian (normal)
inverse gaussian
bernoulli (binomial)
poisson
negative binomial
gamma
And here is a list of the link functions that are available:
identity
log
logit
probit
complementary log-log
odds power
power
negative binomial
log-log
log-complement
Of course, if all that glm could do was duplicate OLS, logistic, Poisson and negative binomial regression, then it would not appear to be very useful. However, it is possible to combine distribution families and link functions in ways that do not duplicate existing estimation procedures. The table below gives the possible combinations that make sense from a data analysis perspective:
                    iden  log  logit  probit  cloglog  nbinom  power  opower  loglog  logc
gaussian             X     X                                     X
inverse gaussian     X     X                                     X
binomial             X     X     X      X        X               X      X       X      X
poisson              X     X                                     X
negative binomial    X     X                             X       X
gamma                X     X                                     X
Examples
use http://www.gseis.ucla.edu/courses/data/hsb2
generate hon = write>=60
regress write read math female
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   72.52
       Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
    Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
-------------+------------------------------           Adj R-squared =  0.5188
       Total |   17878.875   199   89.843593           Root MSE      =  6.5751

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
        math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
      female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
       _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
------------------------------------------------------------------------------
glm write read math female, link(iden) fam(gauss) nolog
Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =  43.23228
Deviance         =  8473.526357                    (1/df) Deviance =  43.23228
Pearson          =  8473.526357                    (1/df) Pearson  =  43.23228

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   = -658.4261736                    AIC             =  6.624262
BIC              =  7435.056153

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2062009     .444277
        math |   .3974826   .0664037     5.99   0.000     .2673336    .5276315
      female |    5.44337   .9349987     5.82   0.000     3.610806    7.275934
       _cons |   11.89566   2.862845     4.16   0.000      6.28459    17.50674
------------------------------------------------------------------------------
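The glm header statistics and confidence limits above can be reproduced by hand. The following Python sketch (standard library only; the variable and function names are invented here) assumes that the scale parameter is deviance divided by residual df, that AIC is (-2 log likelihood + 2k)/n, that BIC is deviance minus residual df times ln(n), and that the interval is a normal-based Wald interval. Those formulas are assumptions checked only against the printed numbers. The z-based interval is also why the glm limits differ slightly from the t-based limits reported by regress.

```python
import math
from statistics import NormalDist

# Values copied from the glm output above
n = 200                      # number of observations
k = 4                        # estimated coefficients (read, math, female, _cons)
log_likelihood = -658.4261736
deviance = 8473.526357
residual_df = n - k          # 196

# 97.5th percentile of the standard normal, roughly 1.96
z_crit = NormalDist().inv_cdf(0.975)

def wald_ci(coef, std_err):
    """Normal-based 95% confidence interval for one coefficient."""
    return coef - z_crit * std_err, coef + z_crit * std_err

scale = deviance / residual_df
aic = (-2 * log_likelihood + 2 * k) / n
bic = deviance - residual_df * math.log(n)

if __name__ == "__main__":
    print("scale parameter:", round(scale, 5))
    print("AIC:", round(aic, 6))
    print("BIC:", round(bic, 6))
    print("95% CI for read:", wald_ci(0.3252389, 0.0607348))
```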
logit hon read math female, nolog
Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424    .027577     2.73   0.006     .0211924    .1292924
        math |   .1317117   .0324607     4.06   0.000       .06809    .1953335
      female |   1.154801   .4340856     2.66   0.008      .304009    2.005593
       _cons |  -13.12749   1.850769    -7.09   0.000    -16.75493    -9.50005
------------------------------------------------------------------------------
logit, or
Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
         hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.078145   .0297321     2.73   0.006     1.021419    1.138023
        math |   1.140779   .0370305     4.06   0.000     1.070462    1.215716
      female |   3.173393   1.377524     2.66   0.008     1.355281    7.430502
------------------------------------------------------------------------------
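The odds-ratio table is the coefficient table moved to the exponentiated scale. A minimal Python sketch (standard library only; the names LOGIT_ROWS and to_odds_ratio are hypothetical) shows the arithmetic, assuming the reported standard error is the delta-method value, odds ratio times the coefficient's standard error, and that the confidence limits are the exponentiated logit-scale limits. Both assumptions agree with the printed figures.

```python
import math

# (coefficient, std. err., CI lower, CI upper) on the logit scale,
# copied from the logit output above
LOGIT_ROWS = {
    "read":   (0.0752424, 0.027577, 0.0211924, 0.1292924),
    "math":   (0.1317117, 0.0324607, 0.06809, 0.1953335),
    "female": (1.154801, 0.4340856, 0.304009, 2.005593),
}

def to_odds_ratio(coef, std_err, lower, upper):
    """Return (odds ratio, delta-method std. err., CI lower, CI upper)."""
    odds_ratio = math.exp(coef)
    return odds_ratio, odds_ratio * std_err, math.exp(lower), math.exp(upper)

if __name__ == "__main__":
    for name, row in LOGIT_ROWS.items():
        print(name, [round(value, 6) for value in to_odds_ratio(*row)])
```

Because the z statistic and p-value test the coefficient on the logit scale, they are unchanged between the two tables.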
glm hon read math female, link(logit) fam(bin) nolog
Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -75.20982717                    AIC             =  .7920983
BIC              = -888.0505495

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424   .0275779     2.73   0.006     .0211906    .1292941
        math |   .1317117   .0324623     4.06   0.000     .0680869    .1953366
      female |   1.154801   .4341012     2.66   0.008     .3039785    2.005624
       _cons |  -13.12749   1.850893    -7.09   0.000    -16.75517   -9.499808
------------------------------------------------------------------------------
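To see the logit link at work, the fitted coefficients can be turned into a predicted probability by applying the inverse link to the linear predictor. The Python sketch below (standard library only) does this for a hypothetical student with read = 60, math = 60 and female = 1; the student's values and the function name are illustrative assumptions, not part of the original analysis.

```python
import math

# Coefficients copied from the glm logit output above
COEF = {"_cons": -13.12749, "read": 0.0752424,
        "math": 0.1317117, "female": 1.154801}

def predicted_probability(read, math_score, female):
    """Inverse logit of the linear predictor xb."""
    xb = (COEF["_cons"] + COEF["read"] * read
          + COEF["math"] * math_score + COEF["female"] * female)
    return 1.0 / (1.0 + math.exp(-xb))

if __name__ == "__main__":
    # Hypothetical student: read = 60, math = 60, female = 1
    print(round(predicted_probability(60, 60, 1), 4))
    # Same scores with female = 0, for comparison
    print(round(predicted_probability(60, 60, 0), 4))
```

Whatever values are plugged in, the result stays strictly between 0 and 1, which is the constraint the link function is there to enforce.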
glm, eform
Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -75.20982717                    AIC             =  .7920983
BIC              = -888.0505495

------------------------------------------------------------------------------
         hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.078145    .029733     2.73   0.006     1.021417    1.138025
        math |   1.140779   .0370323     4.06   0.000     1.070458     1.21572
      female |   3.173393   1.377573     2.66   0.008      1.35524    7.430728
------------------------------------------------------------------------------
probit hon read math female, nolog
Probit estimates                                  Number of obs   =        200
                                                  LR chi2(3)      =      81.80
                                                  Prob > chi2     =     0.0000
Log likelihood = -74.745943                       Pseudo R2       =     0.3537

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0473262   .0157561     3.00   0.003     .0164449    .0782076
        math |   .0735256   .0173216     4.24   0.000     .0395759    .1074754
      female |   .6824682   .2447275     2.79   0.005     .2028112    1.162125
       _cons |  -7.663304   .9921289    -7.72   0.000    -9.607841   -5.718767
------------------------------------------------------------------------------
glm hon read math female, link(probit) fam(bin) nolog
Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  149.4918859                    (1/df) Deviance =  .7627137
Pearson          =  160.9679286                    (1/df) Pearson  =  .8212649

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = invnorm(u)               [Probit]
Standard errors  : OIM

Log likelihood   = -74.74594294                    AIC             =  .7874594
BIC              = -888.978318

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0473262   .0157561     3.00   0.003     .0164448    .0782077
        math |   .0735256   .0173217     4.24   0.000     .0395758    .1074755
      female |   .6824681   .2447281     2.79   0.005     .2028098    1.162126
       _cons |  -7.663303   .9921345    -7.72   0.000    -9.607851   -5.718755
------------------------------------------------------------------------------
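With the probit link, the inverse link is the standard normal distribution function rather than the inverse logit. As a rough sketch in Python (standard library only), the fitted probit coefficients give the following predicted probability for a hypothetical student with read = 60, math = 60 and female = 1; these input values and the function name are assumptions chosen purely for illustration.

```python
from statistics import NormalDist

# Coefficients copied from the glm probit output above
COEF = {"_cons": -7.663303, "read": 0.0473262,
        "math": 0.0735256, "female": 0.6824681}

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def probit_probability(read, math_score, female):
    """Standard normal CDF of the linear predictor xb."""
    xb = (COEF["_cons"] + COEF["read"] * read
          + COEF["math"] * math_score + COEF["female"] * female)
    return _STD_NORMAL.cdf(xb)

if __name__ == "__main__":
    # Hypothetical student: read = 60, math = 60, female = 1
    print(round(probit_probability(60, 60, 1), 4))
```

The probit coefficients are on a different scale from the logit coefficients, so the two sets of estimates should be compared through fitted probabilities rather than coefficient by coefficient.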
use http://www.gseis.ucla.edu/courses/data/lahigh, clear
poisson daysabs langnce gender, nolog
Poisson regression                                Number of obs   =        316
                                                  LR chi2(2)      =     171.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -1549.8567                       Pseudo R2       =     0.0524

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
       _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
------------------------------------------------------------------------------
poisson, irr
Poisson regression                                Number of obs   =        316
                                                  LR chi2(2)      =     171.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -1549.8567                       Pseudo R2       =     0.0524

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
      gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
------------------------------------------------------------------------------
glm daysabs langnce gender, link(log) fam(poisson) nolog
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  2238.317597                    (1/df) Deviance =  7.151174
Pearson          =  2752.913231                    (1/df) Pearson  =   8.79525

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -1549.85665                     AIC             =  9.828207
BIC              =  436.7702841

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
       _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
------------------------------------------------------------------------------
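The (1/df) Deviance and (1/df) Pearson entries in the glm header are simply the fit statistics divided by the residual degrees of freedom. The short Python sketch below (standard library only; the function name is invented for this example) reproduces them. Under a Poisson model both ratios would be expected to sit near 1, so values this far above 1 are commonly read as a sign of overdispersion. That reading is a general rule of thumb rather than a claim made in these notes.

```python
def per_df(statistic, residual_df):
    """Fit statistic divided by the residual degrees of freedom."""
    return statistic / residual_df

# Values copied from the glm poisson output above
RESIDUAL_DF = 313
deviance_ratio = per_df(2238.317597, RESIDUAL_DF)
pearson_ratio = per_df(2752.913231, RESIDUAL_DF)

if __name__ == "__main__":
    print("(1/df) Deviance:", round(deviance_ratio, 6))
    print("(1/df) Pearson :", round(pearson_ratio, 5))
```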
glm, eform
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  2238.317597                    (1/df) Deviance =  7.151174
Pearson          =  2752.913231                    (1/df) Pearson  =   8.79525

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -1549.85665                     AIC             =  9.828207
BIC              =  436.7702841

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
      gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
------------------------------------------------------------------------------
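With a log link the coefficients act multiplicatively on the expected count, which is what the IRR column expresses. The Python sketch below (standard library only) computes expected days absent from the Poisson coefficients and shows that raising langnce by one unit multiplies the expected count by the reported IRR, whatever the starting value. The covariate values used (langnce of 50 and 51, gender of 1 and 2) and the function name are hypothetical, since the notes do not describe how the variables are coded.

```python
import math

# Coefficients copied from the glm poisson output above
COEF = {"_cons": 2.646977, "langnce": -0.01467, "gender": -0.4093528}

def expected_days(langnce, gender):
    """Inverse log link: expected count is exp of the linear predictor."""
    return math.exp(COEF["_cons"] + COEF["langnce"] * langnce
                    + COEF["gender"] * gender)

# Ratio of expected counts for a one-unit increase in each predictor
langnce_ratio = expected_days(51, 1) / expected_days(50, 1)
gender_ratio = expected_days(50, 2) / expected_days(50, 1)

if __name__ == "__main__":
    print("expected days at langnce=50, gender=1:", expected_days(50, 1))
    print("ratio for +1 langnce:", round(langnce_ratio, 7))
    print("ratio for +1 gender :", round(gender_ratio, 7))
```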
nbreg daysabs langnce gender, nolog
Negative binomial regression                      Number of obs   =        316
                                                  LR chi2(2)      =      20.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -880.9274                        Pseudo R2       =     0.0116

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156493   .0039485    -3.96   0.000    -.0233882   -.0079104
      gender |  -.4312069   .1396913    -3.09   0.002    -.7049968   -.1574169
       _cons |    2.70344   .2292762    11.79   0.000     2.254067    3.152813
-------------+----------------------------------------------------------------
    /lnalpha |     .25394    .095509                      .0667457    .4411342
-------------+----------------------------------------------------------------
       alpha |   1.289094   .1231201                      1.069024    1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1337.86 Prob>=chibar2 = 0.000
glm daysabs langnce gender, link(log) fam(nbin) nolog
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =   425.603464                    (1/df) Deviance =  1.359755
Pearson          =  415.6288036                    (1/df) Pearson  =  1.327888

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -884.4953535                    AIC             =  5.617059
BIC              = -1375.943849

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156357   .0035438    -4.41   0.000    -.0225814   -.0086899
      gender |  -.4307736   .1253082    -3.44   0.001    -.6763732    -.185174
       _cons |   2.702606   .2052709    13.17   0.000     2.300282    3.104929
------------------------------------------------------------------------------
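The nbreg and glm results are close but not identical. One likely reason, judging from the variance function line V(u) = u+(1)u^2, is that this glm run holds the negative binomial dispersion parameter at 1, whereas nbreg estimated alpha as 1.289094. The Python sketch below (standard library only; the function name and the mean of 5 days are made up for illustration) compares the variance implied by each value of alpha and checks that the reported alpha is the exponential of /lnalpha.

```python
import math

def nb_variance(mu, alpha):
    """Negative binomial variance function: mu + alpha * mu^2."""
    return mu + alpha * mu ** 2

# From the nbreg output above
LN_ALPHA = 0.25394
alpha_nbreg = math.exp(LN_ALPHA)   # reported as 1.289094

# Dispersion value shown in the glm variance function line
ALPHA_GLM = 1.0

if __name__ == "__main__":
    mu = 5.0  # hypothetical mean number of days absent
    print("alpha from /lnalpha:", round(alpha_nbreg, 6))
    print("Poisson variance (alpha = 0):", nb_variance(mu, 0.0))
    print("glm variance (alpha = 1):", nb_variance(mu, ALPHA_GLM))
    print("nbreg variance:", round(nb_variance(mu, alpha_nbreg), 4))
```

Setting alpha to 0 recovers the Poisson variance V(u) = u, which is the hypothesis examined by the likelihood ratio test of alpha=0 in the nbreg output.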
glm, eform
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =   425.603464                    (1/df) Deviance =  1.359755
Pearson          =  415.6288036                    (1/df) Pearson  =  1.327888

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -884.4953535                    AIC             =  5.617059
BIC              = -1375.943849

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9844859   .0034888    -4.41   0.000     .9776716    .9913477
      gender |    .650006   .0814511    -3.44   0.001     .5084577    .8309596
------------------------------------------------------------------------------
glm daysabs langnce gender, fam(gamma) link(log) nolog
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =  1.583724
Deviance         =  251.8270233                    (1/df) Deviance =  .8045592
Pearson          =  495.7055497                    (1/df) Pearson  =  1.583724

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -856.2487643                    AIC             =  5.438283
BIC              = -1549.72029

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156852   .0040626    -3.86   0.000    -.0236478   -.0077226
      gender |  -.4326492   .1443719    -3.00   0.003    -.7156129   -.1496854
       _cons |   2.705757   .2383799    11.35   0.000     2.238541    3.172973
------------------------------------------------------------------------------
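In the gamma output the scale parameter is no longer fixed at 1; it matches the Pearson statistic divided by the residual df. Taking the variance to be the scale times V(u) = u^2 (an assumption based on the usual GLM variance structure, not something stated in the output), the standard deviation is proportional to the mean, so the coefficient of variation is constant. A minimal Python sketch, standard library only, with invented names:

```python
import math

# Values copied from the glm gamma output above
PEARSON = 495.7055497
RESIDUAL_DF = 313

scale = PEARSON / RESIDUAL_DF   # reported as 1.583724

def gamma_variance(mu):
    """Assumed variance under the gamma family: scale * mu^2."""
    return scale * mu ** 2

def coefficient_of_variation(mu):
    """Standard deviation divided by the mean; equals sqrt(scale) for any mu."""
    return math.sqrt(gamma_variance(mu)) / mu

if __name__ == "__main__":
    print("scale parameter:", round(scale, 6))
    for mu in (2.0, 5.0, 10.0):   # hypothetical means
        print(mu, round(coefficient_of_variation(mu), 4))
```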
glm, eform
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =  1.583724
Deviance         =  251.8270233                    (1/df) Deviance =  .8045592
Pearson          =  495.7055497                    (1/df) Pearson  =  1.583724

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -856.2487643                    AIC             =  5.438283
BIC              = -1549.72029

------------------------------------------------------------------------------
     daysabs |       ExpB   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9844372   .0039994    -3.86   0.000     .9766296    .9923071
      gender |   .6487881   .0936668    -3.00   0.003     .4888924    .8609788
------------------------------------------------------------------------------
Categorical Data Analysis Course
Phil Ender
http://www.gseis.ucla.edu/courses/ed231c/231c.html
http://www.gseis.ucla.edu/ender/