Ed231C- Generalized Linear Models


Transcript of Ed231C- Generalized Linear Models

  • 26/02/10 12:26  Ed231C: Generalized Linear Models

    Page 1 of 9  http://www.gseis.ucla.edu/courses/ed231c/notes1/glm.html

    Applied Categorical & Nonnormal Data Analysis

    Generalized Linear Models

    Most students are introduced to linear models through either multiple regression or analysis of variance. With these methods the expected value of the response variable is statistically modeled, that is, it is expressed as a linear combination of the explanatory variables. With categorical and count response variables, the regression cannot be linear. The problem of nonlinearity is handled through nonlinear functions that transform the expected value of the categorical or count variable into a linear function of the explanatory variables. Such transformations are referred to as link functions.

    For example, in the analysis of count data, the expected frequencies must be nonnegative. To ensure that the predicted values from the linear models fit these constraints, the log link is used to transform the expected value of the response variable. This loglinear transformation serves two purposes: it ensures that the fitted values are appropriate for count data, and it permits the unknown regression parameters to lie within the real number space.
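    The two purposes of the log link can be made concrete with a short Python sketch (an illustration added to these notes, not part of the original course material): whatever real value the linear predictor takes, the inverse log link exp(xb) returns a strictly positive fitted mean, which is exactly what count data require.

```python
import math

# Illustrative sketch of the log link (not from the original notes).
# The linear predictor xb = b0 + b1*X1 + ... may be any real number,
# but the inverse link exp(xb) always yields a positive fitted mean.
def inverse_log_link(xb):
    """Map a linear predictor back to the mean scale: E(y) = exp(xb)."""
    return math.exp(xb)

# Even a strongly negative linear predictor maps to a positive mean.
for xb in (-5.0, 0.0, 2.5):
    print(xb, inverse_log_link(xb))
```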

    Different types of response variables utilize different link functions: both the logit and probit link functions work with binomial response variables, while the log link function works with both poisson and negative binomial response variables. Growing out of the work of Nelder & Wedderburn (1972) and McCullagh & Nelder (1989), generalized linear models provide a unified framework which can be applied to various 'linear' models.

    Generalized linear models take the form:

    g(E(y)) = xβ,  y -> {F}

    where F is the distribution family and g( ) is the link function.

    You might recognize this example more easily if it were rewritten as follows:

    Y' = b0 + b1X1 + b2X2 + ... y -> {gaussian}

    Now we can replace Y' with E(y),

    E(y) = b0 + b1X1 + b2X2 + ... y -> {gaussian}

    In OLS the distribution family is gaussian (normal), i.e., y -> {gaussian} and the link function is identity, i.e., g(y) = y. Thus, we can write g(E(y)) as just E(y).


    Another example is poisson regression in which the distribution family is poisson, i.e., y -> {poisson} and the link function is the natural log, i.e., g(y) = ln(y). The glm model would then be written as,

    g(E(y)) = b0 + b1X1 + b2X2 + ... y -> {poisson}

    Here are examples of distributions and link functions for some common estimation procedures:

    type of                  distribution     link
    estimation               family           function
    -----------------------  --------------   --------
    OLS regression           gaussian         identity
    logistic regression      binomial         logit
    probit                   binomial         probit
    cloglog                  binomial         cloglog
    poisson regression       poisson          log
    neg binomial regression  neg binomial     log

    Stata's GLM Procedure

    Stata's glm procedure estimates generalized linear models in which the user can specify both the distribution family and the link function. Here is the basic syntax of the glm procedure:

    glm depvar indvars [if exp] [in range] [, family(fname) link(lname) eform ]

    where fname can take on the values gaussian | igaussian | binomial | poisson | nbinomial | gamma and lname can take on the values identity | log | logit | probit | cloglog | nbinomial | power | opower.

    An OLS regression would look like this using regress and glm:

    regress write read math gender
    glm write read math gender, family(gaus) link(iden)

    A logistic regression would look like this:

    logistic honors read math gender
    glm honors read math gender, family(binom) link(logit)

    A poisson regression would look like this:

    poisson days read math gender
    glm days read math gender, family(poisson) link(log)

    A negative binomial regression would look like this:

    nbreg days read math gender
    glm days read math gender, family(nbinom) link(log)

    Here is a list of the allowable distribution families:

    gaussian (normal)
    inverse gaussian
    bernoulli (binomial)
    poisson


    negative binomial
    gamma

    And here is a list of the link functions that are available:

    identity
    log
    logit
    probit
    complementary log-log
    odds power
    power
    negative binomial
    log-log
    log-complement

    Of course, if all that glm could do was duplicate OLS, logistic, poisson and negative binomial regression, then it would not appear to be very useful. However, it is possible to combine distribution families and link functions in ways that do not duplicate existing estimation procedures. The table below gives the possible combinations that make sense from a data analysis perspective:

                       iden  log  logit  probit  cloglog  nbinom  power  opower  loglog  logc
    gaussian             X    X                                      X
    inverse gaussian     X    X                                      X
    binomial             X    X     X      X        X                X      X       X      X
    poisson              X    X                                      X
    negative binomial    X    X                        X             X
    gamma                X    X                                      X

    Examples

    use http://www.gseis.ucla.edu/courses/data/hsb2

    generate hon = write>=60

    regress write read math female

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  3,   196) =   72.52
           Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
        Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
    -------------+------------------------------           Adj R-squared =  0.5188
           Total |   17878.875   199   89.843593           Root MSE      =  6.5751

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
            math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
          female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
           _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
    ------------------------------------------------------------------------------

    glm write read math female, link(iden) fam(gauss) nolog

    Generalized linear models                No. of obs      =        200


    Optimization     : ML: Newton-Raphson    Residual df     =        196
                                             Scale parameter =   43.23228
    Deviance         =  8473.526357          (1/df) Deviance =   43.23228
    Pearson          =  8473.526357          (1/df) Pearson  =   43.23228

    Variance function: V(u) = 1              [Gaussian]
    Link function    : g(u) = u              [Identity]
    Standard errors  : OIM

    Log likelihood   = -658.4261736          AIC             =   6.624262
                                             BIC             = 7435.056153

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   .3252389   .0607348     5.36   0.000     .2062009     .444277
            math |   .3974826   .0664037     5.99   0.000     .2673336    .5276315
          female |    5.44337   .9349987     5.82   0.000     3.610806    7.275934
           _cons |   11.89566   2.862845     4.16   0.000      6.28459    17.50674
    ------------------------------------------------------------------------------

    logit hon read math female, nolog

    Logit estimates                          Number of obs   =        200
                                             LR chi2(3)      =      80.87
                                             Prob > chi2     =     0.0000
    Log likelihood = -75.209827              Pseudo R2       =     0.3496

    ------------------------------------------------------------------------------
             hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   .0752424    .027577     2.73   0.006     .0211924    .1292924
            math |   .1317117   .0324607     4.06   0.000       .06809    .1953335
          female |   1.154801   .4340856     2.66   0.008      .304009    2.005593
           _cons |  -13.12749   1.850769    -7.09   0.000    -16.75493    -9.50005
    ------------------------------------------------------------------------------
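    The logit coefficients live on the log-odds scale; a predicted probability comes from applying the inverse logit to the linear predictor. The Python sketch below is an illustration added to these notes: the coefficients are taken from the output above, while the covariate values passed in are hypothetical.

```python
import math

# Coefficients copied from the logit output above; the covariate
# values used below are hypothetical, chosen only to illustrate.
b_cons, b_read, b_math, b_female = -13.12749, 0.0752424, 0.1317117, 1.154801

def prob_honors(read, math_score, female):
    """Inverse logit: p = 1 / (1 + exp(-xb))."""
    xb = b_cons + b_read * read + b_math * math_score + b_female * female
    return 1.0 / (1.0 + math.exp(-xb))

# A higher reading score raises the predicted probability of honors.
p_low, p_high = prob_honors(50, 50, 1), prob_honors(70, 50, 1)
```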

    logit, or

    Logit estimates                          Number of obs   =        200
                                             LR chi2(3)      =      80.87
                                             Prob > chi2     =     0.0000
    Log likelihood = -75.209827              Pseudo R2       =     0.3496

    ------------------------------------------------------------------------------
             hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   1.078145   .0297321     2.73   0.006     1.021419    1.138023
            math |   1.140779   .0370305     4.06   0.000     1.070462    1.215716
          female |   3.173393   1.377524     2.66   0.008     1.355281    7.430502
    ------------------------------------------------------------------------------
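    The odds ratios in this table are simply the exponentiated coefficients from the preceding logit output; a quick Python check (an illustration added to these notes):

```python
import math

# Exponentiating the logit coefficients reproduces the odds ratios:
# e.g., the female coefficient 1.154801 gives exp(1.154801) ~ 3.1734,
# matching the Odds Ratio column above.
coefs = {"read": 0.0752424, "math": 0.1317117, "female": 1.154801}
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
```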

    glm hon read math female, link(logit) fam(bin) nolog

    Generalized linear models                No. of obs      =        200
    Optimization     : ML: Newton-Raphson    Residual df     =        196
                                             Scale parameter =          1
    Deviance         =  150.4196543          (1/df) Deviance =   .7674472
    Pearson          =  164.2509104          (1/df) Pearson  =   .8380148

    Variance function: V(u) = u*(1-u)        [Bernoulli]


    Link function    : g(u) = ln(u/(1-u))    [Logit]
    Standard errors  : OIM

    Log likelihood   = -75.20982717          AIC             =   .7920983
                                             BIC             = -888.0505495

    ------------------------------------------------------------------------------
             hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   .0752424   .0275779     2.73   0.006     .0211906    .1292941
            math |   .1317117   .0324623     4.06   0.000     .0680869    .1953366
          female |   1.154801   .4341012     2.66   0.008     .3039785    2.005624
           _cons |  -13.12749   1.850893    -7.09   0.000    -16.75517   -9.499808
    ------------------------------------------------------------------------------

    glm, eform

    Generalized linear models                No. of obs      =        200
    Optimization     : ML: Newton-Raphson    Residual df     =        196
                                             Scale parameter =          1
    Deviance         =  150.4196543          (1/df) Deviance =   .7674472
    Pearson          =  164.2509104          (1/df) Pearson  =   .8380148

    Variance function: V(u) = u*(1-u)        [Bernoulli]
    Link function    : g(u) = ln(u/(1-u))    [Logit]
    Standard errors  : OIM

    Log likelihood   = -75.20982717          AIC             =   .7920983
                                             BIC             = -888.0505495

    ------------------------------------------------------------------------------
             hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   1.078145    .029733     2.73   0.006     1.021417    1.138025
            math |   1.140779   .0370323     4.06   0.000     1.070458     1.21572
          female |   3.173393   1.377573     2.66   0.008      1.35524    7.430728
    ------------------------------------------------------------------------------

    probit hon read math female, nolog

    Probit estimates                         Number of obs   =        200
                                             LR chi2(3)      =      81.80
                                             Prob > chi2     =     0.0000
    Log likelihood = -74.745943              Pseudo R2       =     0.3537

    ------------------------------------------------------------------------------
             hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   .0473262   .0157561     3.00   0.003     .0164449    .0782076
            math |   .0735256   .0173216     4.24   0.000     .0395759    .1074754
          female |   .6824682   .2447275     2.79   0.005     .2028112    1.162125
           _cons |  -7.663304   .9921289    -7.72   0.000    -9.607841   -5.718767
    ------------------------------------------------------------------------------

    glm hon read math female, link(probit) fam(bin) nolog

    Generalized linear models                No. of obs      =        200
    Optimization     : ML: Newton-Raphson    Residual df     =        196
                                             Scale parameter =          1
    Deviance         =  149.4918859          (1/df) Deviance =   .7627137
    Pearson          =  160.9679286          (1/df) Pearson  =   .8212649


    Variance function: V(u) = u*(1-u)        [Bernoulli]
    Link function    : g(u) = invnorm(u)     [Probit]
    Standard errors  : OIM

    Log likelihood   = -74.74594294          AIC             =   .7874594
                                             BIC             = -888.978318

    ------------------------------------------------------------------------------
             hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   .0473262   .0157561     3.00   0.003     .0164448    .0782077
            math |   .0735256   .0173217     4.24   0.000     .0395758    .1074755
          female |   .6824681   .2447281     2.79   0.005     .2028098    1.162126
           _cons |  -7.663303   .9921345    -7.72   0.000    -9.607851   -5.718755
    ------------------------------------------------------------------------------

    use http://www.gseis.ucla.edu/courses/data/lahigh, clear

    poisson daysabs langnce gender, nolog

    Poisson regression                       Number of obs   =        316
                                             LR chi2(2)      =     171.50
                                             Prob > chi2     =     0.0000
    Log likelihood = -1549.8567              Pseudo R2       =     0.0524

    ------------------------------------------------------------------------------
         daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
          gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
           _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
    ------------------------------------------------------------------------------

    poisson, irr

    Poisson regression                       Number of obs   =        316
                                             LR chi2(2)      =     171.50
                                             Prob > chi2     =     0.0000
    Log likelihood = -1549.8567              Pseudo R2       =     0.0524

    ------------------------------------------------------------------------------
         daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
          gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
    ------------------------------------------------------------------------------
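    As with odds ratios in the logit model, the incidence-rate ratios here are just the exponentiated poisson coefficients; a short Python check (an illustration added to these notes):

```python
import math

# IRR = exp(coefficient); using the coefficients from the poisson output:
irr_langnce = math.exp(-0.01467)     # ~ .9854, as in the IRR column
irr_gender  = math.exp(-0.4093528)   # ~ .6641, as in the IRR column
```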

    glm daysabs langnce gender, link(log) fam(poisson) nolog

    Generalized linear models                No. of obs      =        316
    Optimization     : ML: Newton-Raphson    Residual df     =        313
                                             Scale parameter =          1
    Deviance         =  2238.317597          (1/df) Deviance =   7.151174
    Pearson          =  2752.913231          (1/df) Pearson  =    8.79525

    Variance function: V(u) = u              [Poisson]
    Link function    : g(u) = ln(u)          [Log]
    Standard errors  : OIM


    Log likelihood   = -1549.85665           AIC             =   9.828207
                                             BIC             = 436.7702841

    ------------------------------------------------------------------------------
         daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
          gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
           _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
    ------------------------------------------------------------------------------

    glm, eform

    Generalized linear models                No. of obs      =        316
    Optimization     : ML: Newton-Raphson    Residual df     =        313
                                             Scale parameter =          1
    Deviance         =  2238.317597          (1/df) Deviance =   7.151174
    Pearson          =  2752.913231          (1/df) Pearson  =    8.79525

    Variance function: V(u) = u              [Poisson]
    Link function    : g(u) = ln(u)          [Log]
    Standard errors  : OIM

    Log likelihood   = -1549.85665           AIC             =   9.828207
                                             BIC             = 436.7702841

    ------------------------------------------------------------------------------
         daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
          gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
    ------------------------------------------------------------------------------

    nbreg daysabs langnce gender, nolog

    Negative binomial regression             Number of obs   =        316
                                             LR chi2(2)      =      20.63
                                             Prob > chi2     =     0.0000
    Log likelihood = -880.9274               Pseudo R2       =     0.0116

    ------------------------------------------------------------------------------
         daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |  -.0156493   .0039485    -3.96   0.000    -.0233882   -.0079104
          gender |  -.4312069   .1396913    -3.09   0.002    -.7049968   -.1574169
           _cons |    2.70344   .2292762    11.79   0.000     2.254067    3.152813
    -------------+----------------------------------------------------------------
        /lnalpha |     .25394    .095509                      .0667457    .4411342
    -------------+----------------------------------------------------------------
           alpha |   1.289094   .1231201                      1.069024    1.554469
    ------------------------------------------------------------------------------
    Likelihood ratio test of alpha=0: chibar2(01) = 1337.86  Prob>=chibar2 = 0.000
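    Note that nbreg reports the overdispersion parameter both on the log scale (/lnalpha) and on the natural scale (alpha); the latter is just the exponentiated former, as this small check illustrates (an addition to these notes, using the values from the output above):

```python
import math

# alpha is estimated on the log scale and then exponentiated:
lnalpha = 0.25394              # /lnalpha from the nbreg output above
alpha = math.exp(lnalpha)      # ~ 1.2891, the reported alpha
```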

    glm daysabs langnce gender, link(log) fam(nbin) nolog

    Generalized linear models                No. of obs      =        316
    Optimization     : ML: Newton-Raphson    Residual df     =        313
                                             Scale parameter =          1
    Deviance         =  425.603464           (1/df) Deviance =   1.359755
    Pearson          =  415.6288036          (1/df) Pearson  =   1.327888


    Variance function: V(u) = u+(1)u^2       [Neg. Binomial]
    Link function    : g(u) = ln(u)          [Log]
    Standard errors  : OIM

    Log likelihood   = -884.4953535          AIC             =   5.617059
                                             BIC             = -1375.943849

    ------------------------------------------------------------------------------
         daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |  -.0156357   .0035438    -4.41   0.000    -.0225814   -.0086899
          gender |  -.4307736   .1253082    -3.44   0.001    -.6763732    -.185174
           _cons |   2.702606   .2052709    13.17   0.000     2.300282    3.104929
    ------------------------------------------------------------------------------

    glm, eform

    Generalized linear models                No. of obs      =        316
    Optimization     : ML: Newton-Raphson    Residual df     =        313
                                             Scale parameter =          1
    Deviance         =  425.603464           (1/df) Deviance =   1.359755
    Pearson          =  415.6288036          (1/df) Pearson  =   1.327888

    Variance function: V(u) = u+(1)u^2       [Neg. Binomial]
    Link function    : g(u) = ln(u)          [Log]
    Standard errors  : OIM

    Log likelihood   = -884.4953535          AIC             =   5.617059
                                             BIC             = -1375.943849

    ------------------------------------------------------------------------------
         daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |   .9844859   .0034888    -4.41   0.000     .9776716    .9913477
          gender |    .650006   .0814511    -3.44   0.001     .5084577    .8309596
    ------------------------------------------------------------------------------

    glm daysabs langnce gender, fam(gamma) link(log) nolog

    Generalized linear models                No. of obs      =        316
    Optimization     : ML: Newton-Raphson    Residual df     =        313
                                             Scale parameter =   1.583724
    Deviance         =  251.8270233          (1/df) Deviance =   .8045592
    Pearson          =  495.7055497          (1/df) Pearson  =   1.583724

    Variance function: V(u) = u^2            [Gamma]
    Link function    : g(u) = ln(u)          [Log]
    Standard errors  : OIM

    Log likelihood   = -856.2487643          AIC             =   5.438283
                                             BIC             = -1549.72029

    ------------------------------------------------------------------------------
         daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |  -.0156852   .0040626    -3.86   0.000    -.0236478   -.0077226
          gender |  -.4326492   .1443719    -3.00   0.003    -.7156129   -.1496854
           _cons |   2.705757   .2383799    11.35   0.000     2.238541    3.172973
    ------------------------------------------------------------------------------

    glm, eform


    Generalized linear models                No. of obs      =        316
    Optimization     : ML: Newton-Raphson    Residual df     =        313
                                             Scale parameter =   1.583724
    Deviance         =  251.8270233          (1/df) Deviance =   .8045592
    Pearson          =  495.7055497          (1/df) Pearson  =   1.583724

    Variance function: V(u) = u^2            [Gamma]
    Link function    : g(u) = ln(u)          [Log]
    Standard errors  : OIM

    Log likelihood   = -856.2487643          AIC             =   5.438283
                                             BIC             = -1549.72029

    ------------------------------------------------------------------------------
         daysabs |       ExpB   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         langnce |   .9844372   .0039994    -3.86   0.000     .9766296    .9923071
          gender |   .6487881   .0936668    -3.00   0.003     .4888924    .8609788
    ------------------------------------------------------------------------------

    Categorical Data Analysis Course

    Phil Ender

    http://www.gseis.ucla.edu/courses/ed231c/231c.html
    http://www.gseis.ucla.edu/ender/