Ordinal and Multinomial Models

William Simpson
Research Computing Services

http://intranet.hbs.edu/dept/research/statistics/

Types of Models
Ordinal and multinomial models are generalizations of the binary logit and probit models.
Ordinal logit and probit handle ordered outcomes with more than 2 categories.
Multinomial logit handles unordered outcomes with more than 2 categories. (Multinomial probit is not commonly used because it is computationally difficult.)

Outline of Talk
Review of Binary Models
Ordinal Models
Multinomial Logit

Binary Data View 1 (CDF)
In View 1, we compute a linear combination of our predictors, $y = \alpha + \beta x$. We then convert y into a probability p using a cumulative distribution function (CDF). The final outcome is 1 with probability p.
[Figure: S-shaped CDF, probability (0 to 1) plotted against a + bX over the range -3 to 3]
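
To make View 1 concrete, here is a small Python sketch (the deck's own code is SAS and Stata). It forms $\alpha + \beta x$ and passes it through either the standard normal CDF (probit) or the logistic CDF (logit). The coefficient values are hypothetical, chosen only for illustration.

```python
import math

def probit_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Standard logistic CDF, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_of_one(alpha, beta, x, cdf=probit_cdf):
    """View 1: form y = alpha + beta*x, then map y to a probability with a CDF."""
    y = alpha + beta * x
    return cdf(y)

# Hypothetical coefficients, for illustration only.
alpha, beta = -0.5, 1.0
for x in (-2.0, 0.0, 0.5, 2.0):
    print(x,
          round(prob_of_one(alpha, beta, x), 3),             # probit
          round(prob_of_one(alpha, beta, x, logit_cdf), 3))  # logit
```

Both CDFs give p = 0.5 where $\alpha + \beta x = 0$; they differ mainly in how quickly p approaches 0 and 1.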

Another CDF View
[Figure: Y plotted against X, showing the line a + bX and the probability p running from p=0 to p=1]

Binary Data View 2 (Latent or Unobserved Variable)
In View 2, we compute a linear combination of our predictors and add an error term: $y^* = \alpha + \beta x + u$. The outcome is 1 if $y^* \ge 0$ and 0 if $y^* < 0$. Here the probabilistic element is the error term u, and $y^*$ is an unobserved (latent) variable.

Binary Data Unobserved Variable View
[Figure: latent Y* plotted against X with the line a + bX, the threshold t, and the PDF of Y* drawn around the line]

What Happens When the Standard Deviation of u Changes
[Figure: Y* plotted against X with the line a + bX and threshold t, for $y^* = \alpha + \beta x + v$ where std(v) > std(u)]
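
A larger error standard deviation flattens the probability curve. The data cannot separate the scale of u from the coefficients, though. The Python sketch below, using hypothetical values, shows that multiplying $\alpha$, $\beta$, and std(u) by the same constant leaves every probability unchanged. Only $\alpha/\sigma_u$ and $\beta/\sigma_u$ are identified, which is why probit fixes std(u) = 1.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_one(alpha, beta, x, sd_u):
    """P(y* >= 0) for y* = alpha + beta*x + u with u ~ Normal(0, sd_u**2).

    By symmetry of the normal, P(u >= -(alpha + beta*x)) = Phi((alpha + beta*x) / sd_u).
    """
    return phi((alpha + beta * x) / sd_u)

# Hypothetical values: doubling alpha, beta and std(u) together changes nothing.
for x in (-1.0, 0.0, 1.5):
    original = p_one(0.5, 1.0, x, sd_u=1.0)
    rescaled = p_one(1.0, 2.0, x, sd_u=2.0)
    print(x, round(original, 4), round(rescaled, 4))

# With alpha and beta held fixed, a larger std(u) pulls probabilities toward 0.5.
print(round(p_one(0.5, 1.0, 1.5, sd_u=1.0), 4), round(p_one(0.5, 1.0, 1.5, sd_u=2.0), 4))
```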

Comparing CDF and Latent Variable Views
The two views are equivalent: each can be converted into the other, with the CDF in View 1 equal to the CDF of the error u in View 2. (For a symmetric error distribution such as the normal or logistic, $P(y^* \ge 0) = P(u \ge -(\alpha + \beta x)) = F(\alpha + \beta x)$.)
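
One way to check the equivalence is by simulation. The Python sketch below draws many normal errors u, applies the View 2 rule ($y^* \ge 0$), and compares the share of ones with the View 1 probability $\Phi(\alpha + \beta x)$. The parameter values, sample size, and seed are arbitrary assumptions.

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_share_of_ones(alpha, beta, x, n=200_000, seed=1):
    """View 2: draw u ~ N(0, 1), form y* = alpha + beta*x + u, and count y* >= 0 as a 1."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(n) if alpha + beta * x + rng.gauss(0.0, 1.0) >= 0.0)
    return ones / n

alpha, beta, x = -0.5, 1.0, 1.0               # hypothetical values
simulated = simulate_share_of_ones(alpha, beta, x)
exact = phi(alpha + beta * x)                  # View 1: Phi(alpha + beta*x)
print(round(simulated, 3), round(exact, 3))
```

The two numbers should agree up to simulation noise, which shrinks as n grows.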

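Combining the Two Views
[Figure: Y and Y* plotted against X with the line a + bX, combining the CDF view (p=0 to p=1) with the latent-variable view]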

Ordinal Outcomes
Three or more categorical outcomes that can be treated as ordered, for example:
Bond ratings (AAA, AA, B, C, …)
Likert scales (e.g., responses on a 1-7 scale, from strongly disagree to strongly agree)
Such outcomes are often analyzed as if they were continuous.

Ordinal Outcomes (Latent Variable View)
[Figure: latent Y* plotted against bX, with cutpoints t1, t2, t3 dividing it into ordered categories]

Ordinal Outcomes (CDF and Latent Variable View)
[Figure: bX with cutpoints t1, t2, t3, shown together with cumulative probability curves running from p=0 to p=1]
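
The latent-variable rule for ordinal outcomes can be sketched in a few lines of Python. Category j is observed when $t_{j-1} < y^* \le t_j$, with $t_0 = -\infty$ and $t_K = +\infty$. The cutpoints and slope below are hypothetical, and the boundary convention (which side of a cutpoint a tie falls on) is an assumption that matches Stata's documented rule.

```python
import bisect
import random

def category(y_star, cutpoints):
    """Return the ordered category (1..K) for a latent value y_star.

    cutpoints must be sorted: t1 < t2 < ... < t_{K-1}. Category j is returned
    when t_{j-1} < y_star <= t_j, with t_0 = -infinity and t_K = +infinity.
    """
    # bisect_left counts how many cutpoints lie strictly below y_star.
    return bisect.bisect_left(cutpoints, y_star) + 1

# Hypothetical cutpoints and slope, for illustration only.
cuts = [-1.0, 0.5, 1.5]   # t1, t2, t3 -> four ordered categories
b = 0.8

rng = random.Random(0)
for x in (-1.0, 0.0, 2.0):
    y_star = b * x + rng.gauss(0.0, 1.0)   # latent variable: bX + u
    print(x, round(y_star, 2), category(y_star, cuts))
```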

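SAS and Stata Code
Stata:
    oprobit outcome x
or
    ologit outcome x
SAS (DESCENDING models the probability of higher outcomes, matching Stata's sign convention):
    proc logistic descending;
        model outcome = x / link=probit;
    run;
or, for ordinal logit (the default logit link):
    proc logistic descending;
        model outcome = x;
    run;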

Sample Output (Stata oprobit)
------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|
-------------+----------------------------------------------
           x |   1.074575   .1209108     8.89   0.000
-------------+----------------------------------------------
       _cut1 |  -2.076242   .1548201   (Ancillary parameters)
       _cut2 |  -.9736895   .0807119
       _cut3 |  -.4528313    .073509
       _cut4 |   1.106628   .0781733
       _cut5 |   2.079342   .0932966
       _cut6 |   3.176076    .167065
------------------------------------------------------------

Interpretation of Stata Output
           x |   1.074575   .1209108
-------------+----------------------
       _cut1 |  -2.076242   .1548201
       _cut2 |  -.9736895   .0807119

Outcome will be in the second ordered category or higher (not the first), if 1.07*x+u > -2.08. Outcome will be in the third ordered category or higher (not the first or second), if 1.07*x+u > -.97. Outcome will be in the second ordered category exactly, if -.97 > 1.07*x+u > -2.08.
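
The same reasoning gives the probability of every category. Since $y^* = bx + u$, $P(y^* \le \text{cut}_j) = \Phi(\text{cut}_j - bx)$, and $P(y = j)$ is the difference between adjacent cumulative probabilities. The Python sketch below plugs in the estimates from the Stata oprobit output above. The value x = 0 is a hypothetical choice.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimates copied from the Stata oprobit output above.
b = 1.074575
cuts = [-2.076242, -0.9736895, -0.4528313, 1.106628, 2.079342, 3.176076]

def category_probs(x, b=b, cuts=cuts):
    """P(y = j) = Phi(cut_j - b*x) - Phi(cut_{j-1} - b*x), with cut_0 = -inf and cut_K = +inf."""
    xb = b * x
    cdf_at_cuts = [0.0] + [phi(c - xb) for c in cuts] + [1.0]   # P(y* <= cut_j)
    return [hi - lo for lo, hi in zip(cdf_at_cuts, cdf_at_cuts[1:])]

probs = category_probs(0.0)        # x = 0 is a hypothetical value
p_ge_2 = 1.0 - probs[0]            # second category or higher: b*x + u > -2.08
p_exactly_2 = probs[1]             # -2.08 < b*x + u <= -.97
print([round(p, 3) for p in probs])
print(round(p_ge_2, 3), round(p_exactly_2, 3))
```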

Sample Output (SAS PROC LOGISTIC with LINK=PROBIT)
Parameter      DF   Estimate   Std Error
Intercept 7     1    -3.1758      0.1666
Intercept 6     1    -2.0793      0.0933
Intercept 5     1    -1.1066      0.0781
Intercept 4     1     0.4528      0.0734
Intercept 3     1     0.9737      0.0807
Intercept 2     1     2.0762      0.1555
x               1     1.0746      0.1208

Interpretation of SAS Output
Parameter      DF   Estimate   Std Error
Intercept 3     1     0.9737      0.0807
Intercept 2     1     2.0762      0.1555
x               1     1.0746      0.1208

Outcome will be in the second ordered category or higher (not the first), if 1.07*x + 2.08 + u > 0. Outcome will be in the third ordered category or higher, if 1.07*x + .97 + u > 0. Outcome will be in the second ordered category if 1.07*x + 2.08 + u > 0 and 1.07*x + .97 + u < 0.
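
Comparing the two outputs, SAS "Intercept k" appears to equal minus Stata's _cut(k-1). That is why $1.07x + 2.08 + u > 0$ and $1.07x + u > -2.08$ describe the same event. The Python sketch below checks this numerically against the estimates printed above. The small fourth-decimal differences (e.g., 3.1758 vs 3.176076) are presumably due to the two programs' optimizers and convergence criteria, which is an assumption rather than something the outputs state.

```python
# Estimates copied from the Stata oprobit and SAS PROC LOGISTIC (probit) outputs above.
stata_cuts = {1: -2.076242, 2: -0.9736895, 3: -0.4528313,
              4: 1.106628, 5: 2.079342, 6: 3.176076}
sas_intercepts = {2: 2.0762, 3: 0.9737, 4: 0.4528,
                  5: -1.1066, 6: -2.0793, 7: -3.1758}

for k in range(2, 8):
    implied = -stata_cuts[k - 1]   # SAS Intercept k corresponds to -_cut(k-1)
    print(k, round(implied, 4), sas_intercepts[k], round(implied - sas_intercepts[k], 4))
```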

Interpreting Coefficients
Ordinal models have either multiple cutpoints and no intercept term (as in Stata) or multiple intercept terms (as in SAS).
Each modeled probability is the probability of an outcome >= k, compared with all outcomes < k.
Interpret the coefficients the same way as in the corresponding binary model.

Interpreting Coefficients (Ordinal Probit)
$p_2 = \Phi(\alpha_2 + \beta X)$
$p_3 = \Phi(\alpha_3 + \beta X)$
where $\Phi$ is the cumulative distribution function of a standard normal, $p_2$ is the probability of outcome 2 or higher, and $p_3$ is the probability of outcome 3 or higher. Then
$\Pr(\text{exactly } 2) = p_2 - p_3$

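Interpreting Coefficients (Ordinal Logit)
$\log\left(\frac{p_2}{1 - p_2}\right) = \alpha_2 + \beta X$
$p_2 = \frac{\exp(\alpha_2 + \beta X)}{1 + \exp(\alpha_2 + \beta X)}$
$p_3 = \frac{\exp(\alpha_3 + \beta X)}{1 + \exp(\alpha_3 + \beta X)}$
where $p_2$ is the probability of outcome 2 or higher and $p_3$ is the probability of outcome 3 or higher. Then
$\Pr(\text{exactly } 2) = p_2 - p_3$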
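
The ordinal logit formulas can be evaluated directly. The Python sketch below uses the estimates from the ordered logit PROC LOGISTIC (DESCENDING) output in the appendix. It assumes that, under that ordering, Intercept k models the log-odds of $p_k = P(y \ge k)$, which is how the appendix note "cumulated over the lower Ordered Values" reads when ordered value 1 is y = 7. The predictor value x = 0 is hypothetical.

```python
import math

def logistic(z):
    """exp(z) / (1 + exp(z)), written in the equivalent form 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates from the appendix ordered logit output (DESCENDING ordering assumed):
# Intercept k is alpha_k in log(p_k / (1 - p_k)) = alpha_k + beta*x, p_k = P(y >= k).
intercepts = {2: 4.3014, 3: 1.7093, 4: 0.7326, 5: -1.8611, 6: -3.6194, 7: -6.1912}
beta = 1.8479

def cumulative_probs(x):
    """p_k = P(y >= k) for k = 2..7."""
    return {k: logistic(a + beta * x) for k, a in intercepts.items()}

x = 0.0                            # hypothetical predictor value
p = cumulative_probs(x)
prob_exactly_2 = p[2] - p[3]       # Pr(exactly 2) = p_2 - p_3
print(round(p[2], 4), round(p[3], 4), round(prob_exactly_2, 4))
```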

Assumptions of Ordinal Models
The relationship between the probabilities and $\alpha + \beta x$ follows the assumed form (normal for probit, logistic for logit).
Parallel regressions: the coefficient $\beta$ is the same for every hurdle (also called equal slopes, or proportional odds for logistic models).
If the parallel-regressions assumption does not hold, use a generalized ordered logit model.

Parallel Regressions
[Figure: parallel lines a1 + bX, a2 + bX, a3 + bX plotted against X, each mapped to probabilities running from p=0 to p=1]

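Proportional Odds
$\log\left(\frac{p_2}{1 - p_2}\right) = \alpha_2 + \beta X$
$\log\left(\frac{p_3}{1 - p_3}\right) = \alpha_3 + \beta X$
$\log\left(\frac{p_3}{1 - p_3}\right) - \log\left(\frac{p_2}{1 - p_2}\right) = \alpha_3 - \alpha_2$
$\log\left(\frac{\text{odds}_3}{\text{odds}_2}\right) = \alpha_3 - \alpha_2$
$\text{odds}_3 = \text{odds}_2 \cdot \exp(\alpha_3 - \alpha_2)$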
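
Because $\beta$ cancels in the derivation above, the ratio of the two cumulative odds does not depend on X. The Python sketch below checks this for a few X values. The parameters $\alpha_2$, $\alpha_3$, and $\beta$ are hypothetical.

```python
import math

def logistic(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical cumulative-logit parameters: p_k = P(y >= k), logit(p_k) = alpha_k + beta*X.
alpha2, alpha3, beta = 1.5, -0.5, 0.8

for x in (-2.0, 0.0, 3.0):
    p2 = logistic(alpha2 + beta * x)
    p3 = logistic(alpha3 + beta * x)
    # odds_3 / odds_2 should equal exp(alpha3 - alpha2) at every X.
    print(x, round(odds(p3) / odds(p2), 6), round(math.exp(alpha3 - alpha2), 6))
```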

Interpreting Cutpoints

Sample Likert Scale with Extra Points
1    2    2.3    3     4    4.2    5     6    7
SD   D    MoD    SoD   N    VSA    SoA   A    SA
----------------------------------------------------------
SD = Strongly Disagree, D = Disagree, MoD = Moderately Disagree, SoD = Somewhat Disagree, N = Neutral, VSA = Very Slightly Agree, SoA = Somewhat Agree, A = Agree, SA = Strongly Agree

Probability of Responses
[Figure: probability of each response category: SD, D, MoD, SoD, N, VSA, SoA, A, SA]

Sample Likert Scale with Uneven Points
SD (1), D (2), MoD (2.3), SoD (3), N (4), VSA (4.2), SA (7)
----------------------------------------------------------
SD = Strongly Disagree, D = Disagree, MoD = Moderately Disagree, SoD = Somewhat Disagree, N = Neutral, VSA = Very Slightly Agree, SA = Strongly Agree

Probabilities with Uneven Scale
[Figure: probability of each response category: SD, D, MoD, SoD, N, VSA, SA]

Interpreting Cutpoints
Model is Y* = u (no predictor variables), fitted with oprobit y:
             |      Coef.   Std. Err.
-------------+----------------------
       _cut1 |  -2.494879    .014093
       _cut2 |  -1.501138   .0061005
       _cut3 |  -.4976369   .0041469
       _cut4 |   .5008453   .0041494
       _cut5 |   1.506652   .0061208
       _cut6 |   2.519265   .0144766

Uneven Cutpoints
             |      Coef.   Std. Err.
-------------+----------------------
       _cut1 |  -2.494879    .014093
       _cut2 |  -2.217338   .0106104
       _cut3 |  -.4976369   .0041469
       _cut4 |   .5008453   .0041494
       _cut5 |   1.506652   .0061208
       _cut6 |   2.519265   .0144766
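
With no predictors, $P(y \le j) = \Phi(\text{cut}_j)$. Each category's share is therefore the normal probability between adjacent cutpoints, so the spacing of the cutpoints shows how much of the latent scale each response occupies. The Python sketch below converts the two sets of probit cutpoints above into implied category shares. Moving _cut2 from -1.50 to -2.22 shrinks category 2 and enlarges category 3, while the other shares stay the same.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def shares_from_cutpoints(cuts):
    """With Y* = u, u ~ N(0, 1): P(y = j) = Phi(cut_j) - Phi(cut_{j-1}), cut_0 = -inf, cut_K = +inf."""
    cdf = [0.0] + [phi(c) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Cutpoints copied from the two oprobit outputs above.
even = [-2.494879, -1.501138, -0.4976369, 0.5008453, 1.506652, 2.519265]
uneven = [-2.494879, -2.217338, -0.4976369, 0.5008453, 1.506652, 2.519265]

for name, cuts in (("even", even), ("uneven", uneven)):
    print(name, [round(s, 3) for s in shares_from_cutpoints(cuts)])
```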

Cutpoints for Ordinal Logit
             |      Coef.   Std. Err.
-------------+----------------------
       _cut1 |  -5.090054   .0405469
       _cut2 |  -2.623044   .0125897
       _cut3 |  -.8017561   .0068396
       _cut4 |   .8111758   .0068519
       _cut5 |   2.630199   .0126288
       _cut6 |    5.05293   .0398104

Multinomial Logit
A generalization of logistic regression to more than two outcomes
Outcomes are not ordered
We are interested in the relative probabilities of the outcomes

Examples
Choice of transportation: bus, taxi, private car
Choice of product brand
Occupational choice (considered as unordered): craft, blue collar, professional, white collar

Example Data
ID   Distance   Income   Choice
1     5         10       Bus
2     1         25       Car
3    30          2       Car
4     1         15       Bus
5    10         12       Taxi
6    18         40       Bus
7    20          8       Taxi

Using a Reference Level
[Slide repeats the example data above: IDs 1-7 with Distance, Income, and Choice]

Sample Results
----------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|
-------------+--------------------------------------
Taxi         |
    distance |  -.0757664   .1305456   -0.58   0.562
      income |    .319901   .0830162    3.85   0.000
       _cons |   -6.22562   1.734012   -3.59   0.000
-------------+--------------------------------------
Car          |
    distance |   .4482523   .1129979    3.97   0.000
      income |   .0447404   .0581754    0.77   0.442
       _cons |  -2.587764   1.214103   -2.13   0.033
----------------------------------------------------
(Outcome outcome==Bus is the comparison group)
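
The coefficients above turn into choice probabilities through the multinomial logit formula $P(k) = \exp(\eta_k) / \sum_j \exp(\eta_j)$, where the linear predictor of the comparison group (Bus) is fixed at 0. The Python sketch below applies this to a hypothetical traveller with distance = 10 and income = 15.

```python
import math

# Coefficients copied from the sample results above (Bus is the comparison group).
coefs = {
    "Taxi": {"distance": -0.0757664, "income": 0.319901, "_cons": -6.22562},
    "Car":  {"distance": 0.4482523, "income": 0.0447404, "_cons": -2.587764},
}

def choice_probs(distance, income):
    """P(k) = exp(eta_k) / sum_j exp(eta_j), with eta_Bus = 0 for the reference level."""
    eta = {"Bus": 0.0}
    for k, c in coefs.items():
        eta[k] = c["_cons"] + c["distance"] * distance + c["income"] * income
    denom = sum(math.exp(v) for v in eta.values())
    return {k: math.exp(v) / denom for k, v in eta.items()}

probs = choice_probs(distance=10, income=15)   # hypothetical traveller
print({k: round(p, 3) for k, p in probs.items()})
```

In this model, log(P(Car) / P(Bus)) equals the Car linear predictor, which is the sense in which the coefficients describe relative probabilities.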

Sample Results (2)
----------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|
-------------+--------------------------------------
Bus          |
    distance |   .0757664   .1305456    0.58   0.562
      income |   -.319901   .0830162   -3.85   0.000
       _cons |    6.22562   1.734012    3.59   0.000
-------------+--------------------------------------
Car          |
    distance |   .5240187   .1245058    4.21   0.000
      income |  -.2751607    .080734   -3.41   0.001
       _cons |   3.637855   1.705811    2.13   0.033
----------------------------------------------------
(Outcome outcome==Taxi is the comparison group)

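Coefficients on Distance
(A → B denotes the coefficient for outcome B when A is the comparison group.)
Taxi → Bus     .0757664
Bus → Taxi    -.0757664
Bus → Car      .4482523
Taxi → Car     .5240187

(Bus → Taxi) + (Taxi → Car) = (Bus → Car): -.0757664 + .5240187 = .4482523

(Bus → Car) = (Taxi → Car) - (Taxi → Bus): .5240187 - .0757664 = .4482523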
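
The identities on this slide hold for every predictor, not just distance. To change the comparison group, subtract the new reference level's coefficients from every outcome's coefficients, treating the old reference level as all zeros. The Python sketch below does this with a hypothetical helper `rebase`. Starting from the Bus-based results, it reproduces the Taxi-based table up to rounding in the last printed digit.

```python
# Coefficients copied from the first sample results table (Bus is the comparison group).
vs_bus = {
    "Taxi": {"distance": -0.0757664, "income": 0.319901, "_cons": -6.22562},
    "Car":  {"distance": 0.4482523, "income": 0.0447404, "_cons": -2.587764},
}

def rebase(coefs, old_ref, new_ref):
    """Hypothetical helper: re-express multinomial logit coefficients relative to new_ref.

    b(k vs new_ref) = b(k vs old_ref) - b(new_ref vs old_ref), where the old
    reference level has implicit coefficients of zero.
    """
    zeros = {name: 0.0 for name in coefs[new_ref]}
    full = {old_ref: zeros, **coefs}      # add the old reference level explicitly
    base = full[new_ref]
    return {
        k: {name: c[name] - base[name] for name in base}
        for k, c in full.items()
        if k != new_ref
    }

vs_taxi = rebase(vs_bus, old_ref="Bus", new_ref="Taxi")
for outcome, c in vs_taxi.items():
    print(outcome, {name: round(v, 7) for name, v in c.items()})
```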

Probability Change Plot
[Figure: change in the predicted probability of Taxi (T), Bus (B), and Car (C) for a change of +/-sd/2 in distance and in income; the horizontal axis runs from -.18 to .33]

Odds Ratio Plot
[Figure: odds-ratio plot of the standardized coefficients (Std Coef) for distance and income, placing T, B, and C on a factor-change scale (.23 to 4) and a logit-coefficient scale (-1.48 to 1.39), both relative to a reference category]

Independence from Irrelevant Alternatives (IIA)
The relative odds of two categories should not change when a new category is added.
E.g., if the choices are car, bus, and Yellow Cab, the relative proportions of those choices should not change when a new choice, such as Black & White Cab, is added.
This is not realistic in this case, so the assumption should be examined carefully.
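
The Python sketch below shows why IIA is unrealistic in the cab example. The utilities are hypothetical, and the new cab company is assumed to be identical to Yellow Cab. Under multinomial logit, the car/bus ratio stays exactly the same after the new choice is added. The total cab share, however, jumps, because the new option draws riders from car and bus in proportion instead of mostly splitting the existing cab riders.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit: P(k) = exp(V_k) / sum_j exp(V_j)."""
    denom = sum(math.exp(v) for v in utilities.values())
    return {k: math.exp(v) / denom for k, v in utilities.items()}

# Hypothetical utilities; the new cab company is assumed identical to Yellow Cab.
before = mnl_probs({"car": 1.0, "bus": 0.5, "Yellow Cab": 0.0})
after = mnl_probs({"car": 1.0, "bus": 0.5, "Yellow Cab": 0.0, "Black & White Cab": 0.0})

print(before["car"] / before["bus"], after["car"] / after["bus"])        # same ratio
print(before["Yellow Cab"], after["Yellow Cab"] + after["Black & White Cab"])  # cab share rises
```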

Other Models for Nominal Outcomes
Conditional Logit: attributes of the choices can be used as predictors.
Nested Logit: treats a set of choices as a hierarchy, so the IIA assumption can be relaxed.

Appendix

Programming Examples
By James Zeitler

Ordered Logit (SAS)
proc logistic data = work.ordinals descending;
    model y = x;
run;

The LOGISTIC Procedure
Model Information
Data Set                  WORK.ORDINALS
..............................................
Model                     cumulative logit
Optimization Technique    Fisher's scoring

Response Profile
Ordered                 Total
  Value      y      Frequency
      1      7              6
.............................
      7      1              6

Probabilities modeled are cumulated over the lower Ordered Values.

Analysis of Maximum Likelihood Estimates
                             Standard          Wald
Parameter     DF  Estimate      Error    Chi-Square    Pr > ChiSq
Intercept 7    1   -6.1912     0.4312      206.1863
Intercept 6    1   -3.6194     0.1804      402.7389
Intercept 5    1   -1.8611     0.1414      173.2883
Intercept 4    1    0.7326     0.1275       33.0150
Intercept 3    1    1.7093     0.1520      126.4030
Intercept 2    1    4.3014     0.4189      105.4418
x              1    1.8479     0.2176       72.1016