ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS
Pooja Shivraj Southern Methodist University
KINDS OF REGRESSION ANALYSES
Linear Regression
Logistic Regression – dichotomous dependent variable (yes/no, died/didn't die, at risk/not at risk, etc.); predicts the probability of a person belonging to that category.
QUICK REVIEW: LOGISTIC REGRESSION
Values calculated from linear regression are continuous and need to be transformed onto a 0-1 scale to represent a probability, since 0 ≤ p ≤ 1.
Logistic regression calculates the probability as:
p̂ = e^(B1x + B0) / (1 + e^(B1x + B0))
CLASS EXAMPLE: LOGISTIC REGRESSION
Probability of a person complying for a mammogram, based on whether or not they get a physician’s recommendation
p̂ = e^(B1x + B0) / (1 + e^(B1x + B0))

Probability of complying if NOT recommended by physician:
p̂ = e^(2.29(0) - 1.84) / (1 + e^(2.29(0) - 1.84)) = 0.14

Probability of complying if recommended by physician:
p̂ = e^(2.29(1) - 1.84) / (1 + e^(2.29(1) - 1.84)) = 0.61
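As a quick check in R (a sketch using the slope 2.29 and intercept -1.84 from this example; plogis() is base R's logistic function):

# Probability of complying with a mammogram, from the fitted logistic coefficients.
b0 <- -1.84   # intercept
b1 <-  2.29   # physician recommendation (0 = no, 1 = yes)
plogis(b1*0 + b0)   # not recommended: about 0.14
plogis(b1*1 + b0)   # recommended: about 0.61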
ORDERED MULTINOMIAL LOGISTIC REGRESSION ANALYSIS
Type of logistic regression that allows more than two discrete outcomes
Outcomes are ordinal, for example: yes/maybe/no; first/second/third place; gold/silver/bronze medals; strongly agree/agree/neutral/disagree/strongly disagree.
ASSUMPTION
No perfect predictions – no single value of a predictor variable may correspond to only one value of the dependent variable. Check this using crosstabs, as sketched below.
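A generic form of this check (a sketch; mydata, predictor, and outcome are placeholder names, and the concrete version for the class example appears a few slides later):

# Cross-tabulate each predictor against the ordinal outcome; a row whose counts
# fall entirely in one outcome column signals perfect prediction for that value.
xtabs(~ predictor + outcome, data = mydata)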
ORDERED LOGISTIC REGRESSION EXAMPLE
Load libraries:
library(arm)
library(psych)
Load data:
pooj <- read.csv("http://www.ats.ucla.edu/stat/r/dae/ologit.csv")
ORDERED LOGISTIC REGRESSION EXAMPLE
Variables:
apply – college juniors' reported likelihood of applying to grad school (0 = unlikely, 1 = somewhat likely, 2 = very likely)
pared – whether at least one parent has a graduate degree (0 = no, 1 = yes)
public – whether the undergraduate institution is public or private (0 = private, 1 = public)
gpa – college GPA
> str(pooj)
'data.frame': 400 obs. of 4 variables:
 $ apply : int 2 1 0 1 1 0 1 1 0 1 ...
 $ pared : int 0 1 1 0 0 0 0 0 0 1 ...
 $ public: int 0 0 1 0 0 1 0 0 0 0 ...
 $ gpa   : num 3.26 3.21 3.94 2.81 2.53 ...

> table(pooj$apply)
  0   1   2
220 140  40

> table(pooj$pared)
  0   1
337  63

> table(pooj$public)
  0   1
343  57
CHECK ASSUMPTION – CROSS-TABS
> xtabs(~pooj$pared+pooj$apply)
          pooj$apply
pooj$pared   0   1   2
         0 200 110  27
         1  20  30  13

> xtabs(~pooj$public+pooj$apply)
           pooj$apply
pooj$public   0   1   2
          0 189 124  30
          1  31  16  10

Why is this important? If some predictor value corresponded to only one outcome category (a row with a single nonzero cell), the outcome would be perfectly predicted for that value and the maximum-likelihood estimates could become unstable or infinite (the separation problem). Here every cell is nonzero, so the assumption is met.
SINGLE PREDICTOR MODEL - GPA
> library(arm)
> summary(m1 <- bayespolr(as.ordered(pooj$apply) ~ pooj$gpa))
Call:
bayespolr(formula = as.ordered(pooj$apply) ~ pooj$gpa)

Coefficients:
          Value Std. Error t value
pooj$gpa 0.7109     0.2471   2.877

Intercepts:
    Value  Std. Error t value
0|1 2.3306 0.7502     3.1065
1|2 4.3505 0.7744     5.6179

Residual Deviance: 737.6921
AIC: 743.6921
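For comparison, the same model can also be fit with the classical polr() from the MASS package (loaded as a dependency of arm); this is a sketch under the assumption that MASS is available, and its estimates should be close to, but not identical to, the bayespolr() fit because bayespolr() places a prior on the coefficients.

# Classical proportional-odds fit for comparison; Hess = TRUE is needed for summary().
library(MASS)
summary(m1.classical <- polr(as.ordered(pooj$apply) ~ pooj$gpa, Hess = TRUE))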
CUMULATIVE DISTRIBUTION FUNCTION
[Plot: the cumulative logistic distribution with the fitted cutpoints 0|1 and 1|2 marked]
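One way such a figure could be reproduced from the m1 fit (a sketch; plogis() is base R's logistic CDF, and the cutpoint values are taken from m1$zeta):

# Logistic CDF with the two fitted cutpoints marked.
curve(plogis(x), from = -6, to = 8,
      xlab = "Latent scale", ylab = "Cumulative probability")
abline(v = m1$zeta, lty = 2)                    # cutpoints 0|1 and 1|2
text(m1$zeta, 0.05, labels = names(m1$zeta))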
LABELING COEFFICIENTS
Coefficients:
Value Std. Error t value
pooj$gpa 0.7109 0.2471 2.877
Intercepts:
Value Std. Error t value
0|1 2.3306 0.7502 3.1065
1|2 4.3505 0.7744 5.6179
Coefficient of the model: coef <- m1$coef
Intercepts of the model: intercept <- m1$zeta
Let us look at the likelihood of students with an average GPA applying to graduate school.
> (x <- mean(pooj$gpa))
[1] 2.998925
TRANSFORMING OUTCOMES TO PROBABILITIES
prob <- function(input){exp(input)/(1+exp(input))}

(p0 <- prob(intercept[1] - coef*x))
0.5493198
(p1 <- prob(intercept[2] - coef*x) - p0)
0.3525213
(p2 <- 1 - (p0 + p1))
0.0981589
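The hand-rolled prob() is just the logistic CDF, so the same three probabilities can be cross-checked with base R's plogis() (a sketch using the coef, intercept, and x objects defined above):

# Same computation via plogis(): cumulative probabilities at the two cutpoints,
# differenced to give the probability of each ordinal category.
(p0.check <- plogis(intercept[1] - coef*x))          # P(apply = 0)
(p1.check <- plogis(intercept[2] - coef*x) - p0.check)  # P(apply = 1)
(p2.check <- 1 - (p0.check + p1.check))              # P(apply = 2)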
WHY NOT USE LINEAR REGRESSION?
> summary(linreg <- lm(pooj$apply ~ pooj$gpa))
Call:
lm(formula = pooj$apply ~ pooj$gpa)
Residuals:
Min 1Q Median 3Q Max
-0.7917 -0.5554 -0.3962 0.4786 1.6012
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.22016 0.25224 -0.873 0.38329
pooj$gpa 0.25681 0.08338 3.080 0.00221 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6628 on 398 degrees of freedom
Multiple R-squared: 0.02328, Adjusted R-squared: 0.02083
F-statistic: 9.486 on 1 and 398 DF, p-value: 0.002214
AND OUR ASSUMPTIONS AREN’T MET…
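This slide refers to diagnostic plots; a minimal sketch of how such diagnostics could be produced for linreg (standard base-R plots, not necessarily the exact figures shown in the original slides):

# Residuals vs. fitted, Q-Q, scale-location, and residuals vs. leverage.
# With a 3-level ordinal outcome the residuals fall into discrete bands,
# violating the normality and constant-variance assumptions.
par(mfrow = c(2, 2))
plot(linreg)
par(mfrow = c(1, 1))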
LINEAR REGRESSION VERSUS ORDERED LOGISTIC REGRESSION
The decision between linear regression and ordered multinomial regression is not always black and white. When you have a large number of categories that can be considered equally spaced, simple linear regression is a reasonable alternative (Gelman & Hill, 2007).
Moral of the story: always start by checking the assumptions of the model.
USING MULTIPLE PREDICTORS
> summary(m2 <- bayespolr(as.ordered(apply) ~ gpa + pared + public, pooj))
Call:
bayespolr(formula = as.ordered(apply) ~ gpa + pared + public, pooj)

Coefficients:
            Value Std. Error    t value
gpa     0.6041463  0.2577039  2.3443424
pared   1.0274106  0.2636348  3.8970973
public -0.0528103  0.2931885 -0.1801240

Intercepts:
    Value  Std. Error t value
0|1 2.1638 0.7710     2.8064
1|2 4.2518 0.7955     5.3449

Residual Deviance: 727.002
AIC: 737.002
TRANSFORMING OUTCOMES TO PROBABILITIES
(coef <- m2$coef)
gpa pared public
0.6041463 1.0274106 -0.0528103
(intercept<-m2$zeta)
0|1 1|2
2.163841 4.251774
(x1<-cbind(0:4, 0 , .14))
[,1] [,2] [,3]
[1,] 0 0 0.14
[2,] 1 0 0.14
[3,] 2 0 0.14
[4,] 3 0 0.14
[5,] 4 0 0.14
(x2<-cbind(0:4, 1 , .14))
[,1] [,2] [,3]
[1,] 0 1 0.14
[2,] 1 1 0.14
[3,] 2 1 0.14
[4,] 3 1 0.14
[5,] 4 1 0.14
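Each row of x1 and x2 is one hypothetical student (GPA from 0 to 4, pared fixed at 0 or 1, and public held at its sample mean of about 0.14), with columns in the same order as coef (gpa, pared, public). A small sanity check of the matrix form (a sketch using the objects defined above):

# Linear predictor for the first hypothetical student (GPA = 0, pared = 0,
# public = 0.14), computed with the matrix product and by hand.
eta.matrix <- (x1 %*% coef)[1]
eta.byhand <- 0*coef["gpa"] + 0*coef["pared"] + 0.14*coef["public"]
all.equal(as.numeric(eta.matrix), as.numeric(eta.byhand))   # should be TRUE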
TRANSFORMING OUTCOMES TO PROBABILITIES
prob <- function(VAR){exp(VAR)/(1+exp(VAR))}
> (p1<-prob(intercept[1]-x1 %*% coef))
[,1]
[1,] 0.9119769
[2,] 0.8498732
[3,] 0.7556908
[4,] 0.6282669
[5,] 0.4801055
> (p2<-prob(intercept[2]-x1 %*% coef)-p1)
[,1]
[1,] 0.07538029
[2,] 0.12722869
[3,] 0.20318345
[4,] 0.29895089
[5,] 0.39428044
> (p3<-1-(p1+p2))
[,1]
[1,] 0.01264281
[2,] 0.02289816
[3,] 0.04112575
[4,] 0.07278223
[5,] 0.12561404
TRANSFORMING OUTCOMES TO PROBABILITIES
> (p4 <- prob(intercept[1] - x2 %*% coef))
[,1]
[1,] 0.7876055
[2,] 0.6695483
[3,] 0.5254116
[4,] 0.3769123
[5,] 0.2484150
> (p5 <- prob(intercept[2] - x2 %*% coef) - p4)
[,1]
[1,] 0.05348287
[2,] 0.08867445
[3,] 0.13730004
[4,] 0.19186675
[5,] 0.23347632
> (p6<-1-(p4+p5))
[,1]
[1,] 0.1589117
[2,] 0.2417772
[3,] 0.3372883
[4,] 0.4312209
[5,] 0.5181087
PLOTTING THE RESULTS
Undergrad.GPA <- 0:4
plot(Undergrad.GPA, p1, type="l", col=1, ylim=c(0,1))
lines(0:4, p2, col=2)
lines(0:4, p3, col=3)
lines(0:4, p4, col=1, lty = 2)
lines(0:4, p5, col=2, lty = 2)
lines(0:4, p6, col=3, lty = 2)
legend(1.5, 1, legend=c("P(unlikely)", "P(somewhat likely)", "P(very likely)",
       "Line Type when Pared = 0", "Line Type when Pared = 1"),
       col=c(1:3,1,1), lty=c(1,1,1,1,2))
PRACTICE
Read in the following table (Quinn, n.d.):
practice <- read.table("http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat", header=TRUE)
Task: Run a regression using the ordered multinomial logistic model to predict the variation in the dependent variable ClinLR using the independent variables PID and educ.
ClinLR = Ordinal variable from 1-7 indicating one's view of Bill Clinton's political leanings, where 1 = extremely liberal, 2 = liberal, 3 = slightly liberal, 4 = moderate, 5 = slightly conservative, 6 = conservative, 7 = extremely conservative.
PID = Ordinal variable from 0-6 indicating one's own political identification, where 0 = Strong Democrat and 6 = Strong Republican.
educ = Ordinal variable from 1-7 indicating one's own level of education, where 1 = 8 grades or less and no diploma, 2 = 9-11 grades, no further schooling, 3 = high school diploma or equivalency test, 4 = more than 12 years of schooling, no higher degree, 5 = junior or community college level degree (AA degree), 6 = BA level degree; 17+ years, no postgraduate degree, 7 = advanced degree.
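A minimal starting point for the practice task, following the same pattern as the class example (a sketch, not the official solution, assuming the column names in the file match the variable descriptions above):

# Ordered multinomial logistic regression of ClinLR on PID and educ.
library(arm)
practice <- read.table("http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat",
                       header = TRUE)
summary(m.practice <- bayespolr(as.ordered(ClinLR) ~ PID + educ, data = practice))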
REFERENCES
Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
Quinn, K. (n.d.). Retrieved from http://www.stat.washington.edu/quinn/classes/536/data/nes96r.dat
UCLA: Academic Technology Services. (n.d.). Retrieved from http://www.ats.ucla.edu/stat/r/dae/ologit.csv