Model Selections and Comparisons
(Categorical Data Analysis, Ch 9.2)
Yumi Kubo, Alvin Hsieh
Model 1
Model 2
Survey Data
1992 survey by Wright State University School of Medicine and United Health Services in Dayton, Ohio
• 2276 students in the last year of high school (nonurban area)
• We add more dimensions to the example in Section 8.2.4
• Variables: Alcohol (A), Cigarette (C), Marijuana (M)
• Added variables: Gender (G), Race (R)
Association Graphs (Definitions)
• association graph - set of vertices, each vertex is a variable
• edge - conditional association between 2 variables
• path - sequence of edges leading from one variable to another
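As a rough illustration of these definitions, the sketch below stores an association graph as a symmetric adjacency matrix in base R and checks whether a path joins two variables. The edge list is taken from the model (AC, AM, CM, AG, AR, GR); the names `edges`, `adj`, and `has_path` are made up for this example.

```r
# Vertices: one per variable
vars <- c("A", "C", "M", "G", "R")

# Edges: one per conditional association in (AC, AM, CM, AG, AR, GR)
edges <- rbind(c("A", "C"), c("A", "M"), c("C", "M"),
               c("A", "G"), c("A", "R"), c("G", "R"))

# Symmetric adjacency matrix: TRUE where an edge joins two vertices
adj <- matrix(FALSE, length(vars), length(vars), dimnames = list(vars, vars))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- TRUE
  adj[edges[i, 2], edges[i, 1]] <- TRUE
}

# has_path (hypothetical helper): breadth-first search from `from` to `to`,
# optionally ignoring the vertices listed in `blocked`
has_path <- function(adj, from, to, blocked = character(0)) {
  seen <- from
  queue <- from
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    nbrs <- setdiff(names(which(adj[v, ])), c(seen, blocked))
    if (to %in% nbrs) return(TRUE)
    seen <- c(seen, nbrs)
    queue <- c(queue, nbrs)
  }
  FALSE
}

adj["C", "G"]                            # FALSE: no edge between C and G
has_path(adj, "C", "G")                  # TRUE: for example C - A - G
has_path(adj, "C", "G", blocked = "A")   # FALSE: every C-G path passes through A
```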
Association Graphs (Saturated)
[Figure: association graph on the vertices M, A, C, R, G, with callouts marking a variable (vertex), a conditional association (edge), and a path]
Association Graphs (Reduced)
[Figure: association graph on the vertices M, A, C, R, G]
Data Set
Marijuana Use (yes/no)
==================================================================
                      Race = White            Race = Other
                      Female      Male        Female      Male
Alcohol   Cigarette   yes   no    yes   no    yes   no    yes   no
==================================================================
yes       yes         405   268   453   228   23    23    30    19
yes       no          13    218   28    201   2     19    1     18
no        yes         1     17    1     17    0     1     1     8
no        no          1     117   1     133   0     12    0     17
SAS Program
Too large to place here:
Go to survey.sas
R Program
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))
library(MASS)  # provides stepAIC
# mutual independence + GR
fit.GR <- glm(count ~ . + gender*race, data = survey, family = poisson)
# homogeneous association
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)
# all three factor terms
fit.3fact <- glm(count ~ .^3, data = survey, family = poisson)
summary(res <- stepAIC(fit.homog.assoc,
  scope = list(lower = ~ cigarette + alcohol + marijuana + gender*race),
  direction = "backward"))
fit.AC.AM.CM.AG.AR.GM.GR.MR <- res
fit.AC.AM.CM.AG.AR.GM.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR.MR, ~ . - marijuana:race)
fit.AC.AM.CM.AG.AR.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR, ~ . - marijuana:gender)
Original code (modified here): http://math.cl.uh.edu/~thompsonla/RCode.txt
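To read G2 and df off a fitted model, one option is `deviance()` and `df.residual()`; for a Poisson loglinear fit with an intercept, the residual deviance is the likelihood-ratio statistic against the saturated model. The sketch below rebuilds the `survey` data frame from the counts in the R program so that it runs on its own. The helper `gof` is a name introduced here and is not part of the original program.

```r
# Same 32 cell counts as in the R program (cigarette varies fastest)
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

# Model 1: mutual independence + GR; Model 2: homogeneous association
fit.GR <- glm(count ~ . + gender * race, data = survey, family = poisson)
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)

# gof (hypothetical helper): G2 and residual df of a fitted loglinear model
gof <- function(fit) c(G2 = deviance(fit), df = df.residual(fit))

round(rbind(model1 = gof(fit.GR), model2 = gof(fit.homog.assoc)), 1)
```

The two rows should correspond to Models 1 and 2 of the goodness-of-fit table.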
R Program (P-values)
1-pchisq((15.8-15.3),1)
1-pchisq((16.7-15.8),1)
1-pchisq((19.9-16.7),1)
1-pchisq((28.8-19.9),1)
1-pchisq((40.3-28.8),1)
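The same five tests can also be written as one vectorized call. The sketch below takes the successive G2 values used in the calls above, differences them, and calls `pchisq()` with `lower.tail = FALSE`, which gives the same upper-tail probability as `1 - pchisq(...)`. The vector names `G2`, `change`, and `p` are assumptions of this example.

```r
# Successive G2 values, in the order used in the calls above
G2 <- c(15.3, 15.8, 16.7, 19.9, 28.8, 40.3)

# Change in G2 at each step; each step removes one term, so df = 1
change <- diff(G2)

# Upper-tail chi-squared probabilities
p <- pchisq(change, df = 1, lower.tail = FALSE)

round(p, 4)
p < 0.05   # FALSE FALSE FALSE TRUE TRUE
```

Only the last two changes exceed the 0.05 critical value of 3.84 for one degree of freedom.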
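Model Selection
1. Select an alpha level (0.05 by default)
2. Look at the P-values of the model
• Use (in R): 1-pchisq(G2, df); to compare two nested models, use the change in G2 and the change in df
3. Stop removing terms once the P-value drops below the alpha level from step 1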
4. Model 1: G+R+A+C+M+GR
5. Model 2: G+R+A+C+M+GR+(all pairs)
Model Selection (Continued)
6. Model 3: G+R+A+C+M+GR+(all pairs)+(all 3 factors)
7. Model 4g: lowest change in G2, taking out CR
8. Model 5: lowest change in G2, taking out CG
9. Model 6: lowest change in G2, taking out MR
10. Model 7: lowest change in G2, taking out GM
11. Consider: A+C+M+AC+AM+CM
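Steps 7 to 10 each remove the two-factor term whose deletion changes G2 the least. A minimal sketch of one such step uses `drop1()` from R's stats package with `test = "Chisq"`, which refits the model without each droppable term and reports the change in deviance (column `LRT`) with its P-value. The data frame is rebuilt from the counts in the R program so the block runs alone. `tab` and `candidates` are names introduced here, and the GR row is filtered out on the assumption that GR stays in every model, as in the slides.

```r
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

# Model 2: homogeneous association (all two-factor terms)
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)

# One row per two-factor term: deviance after dropping it, change in G2, P-value
tab <- drop1(fit.homog.assoc, test = "Chisq")

# Keep GR in the model: exclude it (and the "<none>" row) from the candidates
candidates <- tab[!rownames(tab) %in% c("<none>", "gender:race"), ]

# Candidates sorted by change in G2, smallest first
candidates[order(candidates$LRT), c("Df", "Deviance", "LRT", "Pr(>Chi)")]

# Term with the lowest change in G2
rownames(candidates)[which.min(candidates$LRT)]
```

Repeating the same call on each reduced model gives the later steps.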
Goodness-of-Fit Tests (Table 9.2)
G = Gender, R = Race, A = Alcohol, C = Cigarette, M = Marijuana
Model                                     G2      df
1.  Mutual independence + GR              1325.1  25
2.  Homogeneous association               15.3    16
3.  All three-factor terms                5.3     6
4a. (2) - AC                              201.2   17
4b. (2) - AM                              107.0   17
4c. (2) - CM                              513.5   17
4d. (2) - AG                              18.7    17
4e. (2) - AR                              20.3    17
4f. (2) - CG                              16.3    17
4g. (2) - CR                              15.8    17
4h. (2) - GM                              25.2    17
4i. (2) - MR                              18.9    17
5.  (AC, AM, CM, AG, AR, GM, GR, MR)      16.7    18
6.  (AC, AM, CM, AG, AR, GM, GR)          19.9    19
7.  (AC, AM, CM, AG, AR, GR)              28.8    20
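The df column can be checked by hand. With five binary variables the table has 2^5 = 32 cells, and each main effect or interaction term contributes one parameter, so df = 32 - 1 - (number of terms). A small sketch of that arithmetic follows; `resid_df` is a made-up helper name, and the one-parameter-per-term count holds only because every variable has two levels.

```r
# resid_df (hypothetical helper): cells minus parameters for a loglinear model
# on binary variables, where each term costs exactly one parameter
resid_df <- function(n_main, n_pair = 0, n_triple = 0, n_cells = 2^5) {
  n_cells - (1 + n_main + n_pair + n_triple)   # the 1 is the intercept
}

resid_df(5, 1)        # Model 1: main effects + GR          -> 25
resid_df(5, 10)       # Model 2: all choose(5, 2) = 10 pairs -> 16
resid_df(5, 10, 10)   # Model 3: plus all 10 triples         -> 6
resid_df(5, 9)        # Models 4a-4i: one pair removed       -> 17
resid_df(5, 8)        # Model 5                              -> 18
resid_df(5, 7)        # Model 6                              -> 19
resid_df(5, 6)        # Model 7                              -> 20
```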
Thank You!
Any Questions???