Model Selections and Comparisons


(Categorical Data Analysis, Ch 9.2)

Yumi Kubo, Alvin Hsieh


Survey Data

1992 survey by Wright State University School of Medicine and United Health Services in Dayton, Ohio

• 2276 students in the last year of high school (nonurban area)

• We add more dimensions to the example in Section 8.2.4

• Variables: Alcohol (A), Cigarette (C), Marijuana (M)

• Added variables: Gender (G), Race (R)

Association Graphs (Definitions)

• association graph - set of vertices, each vertex is a variable

• edge - conditional association between 2 variables

• path - sequence of edges leading from one variable to another

Association Graphs (Saturated)

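[Figure: association graph for the saturated model on the five variables A, C, M, G, R, with labels pointing out a variable (vertex), a conditional association (edge), and a path.]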

Association Graphs (Reduced)

[Figure: association graph for a reduced model on the variables A, C, M, G, R.]
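The definitions of edge and path can be tried out on a small example. The sketch below assumes the edge set of model 7 from Table 9.2, (AC, AM, CM, AG, AR, GR), as the reduced graph; the helper `has_path` and its `blocked` argument are hypothetical names introduced only for this illustration.

```r
# Vertices are the five variables; edges are the two-factor terms of
# model 7 in Table 9.2 (assumed here to be the reduced model).
vertices <- c("A", "C", "M", "G", "R")
edges <- rbind(c("A", "C"), c("A", "M"), c("C", "M"),
               c("A", "G"), c("A", "R"), c("G", "R"))

# Symmetric adjacency matrix: TRUE where an edge joins two variables.
adj <- matrix(FALSE, 5, 5, dimnames = list(vertices, vertices))
adj[edges] <- TRUE
adj[edges[, 2:1]] <- TRUE

# Breadth-first search: is there a path from `from` to `to` that avoids
# every vertex listed in `blocked`?
has_path <- function(adj, from, to, blocked = character(0)) {
  keep <- setdiff(rownames(adj), blocked)
  visited <- from
  frontier <- from
  while (length(frontier) > 0) {
    nbrs <- keep[colSums(adj[frontier, keep, drop = FALSE]) > 0]
    frontier <- setdiff(nbrs, visited)
    visited <- c(visited, frontier)
  }
  to %in% visited
}

c_to_g         <- has_path(adj, "C", "G")                 # path C - A - G exists
c_to_g_given_a <- has_path(adj, "C", "G", blocked = "A")  # every such path runs through A
print(c(c_to_g, c_to_g_given_a))
```

Under this edge set, every path from C or M to G or R passes through A, which is the graphical way of saying those pairs are conditionally independent given alcohol use.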

Data Set: Marijuana Use

Cell entries are counts by marijuana use (yes / no).

                      Race = White                  Race = Other
                      Female        Male            Female        Male
Alcohol   Cigarette   yes    no     yes    no       yes    no     yes    no
yes       yes         405    268    453    228      23     23     30     19
yes       no          13     218    28     201      2      19     1      18
no        yes         1      17     1      17       0      1      1      8
no        no          1      117    1      133      0      12     0      17

SAS Program

Too large to place here: go to survey.sas

R Program

survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"),
              alcohol   = c("Yes", "No"),
              marijuana = c("Yes", "No"),
              gender    = c("female", "male"),
              race      = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

library(MASS)

fit.GR <- glm(count ~ . + gender*race, data = survey, family = poisson)  # mutual independence + GR
fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)     # homogeneous association
fit.3fact <- glm(count ~ .^3, data = survey, family = poisson)           # all three factor terms

summary(res <- stepAIC(fit.homog.assoc,
                       scope = list(lower = ~ + cigarette + alcohol + marijuana + gender*race),
                       direction = "backward"))

fit.AC.AM.CM.AG.AR.GM.GR.MR <- res
fit.AC.AM.CM.AG.AR.GM.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR.MR, ~. - marijuana:race)
fit.AC.AM.CM.AG.AR.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR, ~. - marijuana:gender)

Original code (modified here): http://math.cl.uh.edu/~thompsonla/RCode.txt
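As a minimal sketch of how the G2 and df columns of a goodness-of-fit table can be read off fitted models, the block below refits the first two models and extracts the residual deviance and residual degrees of freedom. The data are copied from the R program; the names `fits` and `gof` are illustrative and not part of the original code.

```r
# Data exactly as in the R program above.
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

# Two of the loglinear models, fitted as Poisson GLMs.
fits <- list(
  "1. Mutual independence + GR" =
    glm(count ~ . + gender:race, data = survey, family = poisson),
  "2. Homogeneous association" =
    glm(count ~ .^2, data = survey, family = poisson))

# For a Poisson loglinear model the residual deviance is G2,
# and the residual degrees of freedom are the df of the test.
gof <- data.frame(G2 = sapply(fits, deviance),
                  df = sapply(fits, df.residual))
print(round(gof, 1))
```

The same two calls, `deviance()` and `df.residual()`, apply to any of the other fitted models in the program.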

R Program (P-values)

1-pchisq((15.8-15.3),1)

1-pchisq((16.7-15.8),1)

1-pchisq((19.9-16.7),1)

1-pchisq((28.8-19.9),1)

1-pchisq((40.3-28.8),1)
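The five calls above can also be written as one vectorized computation. This is only a sketch: it takes the same G2 values that appear in those calls, treats each step as a 1-df change, and uses `lower.tail = FALSE` in place of `1 - pchisq(...)`. The names `g2`, `delta_g2` and `p_values` are illustrative.

```r
# G2 of the successive models, in the order used in the calls above.
g2 <- c(15.3, 15.8, 16.7, 19.9, 28.8, 40.3)

# Change in G2 between consecutive models (each removes one term, so 1 df).
delta_g2 <- diff(g2)

# Upper-tail chi-squared probabilities for each change.
p_values <- pchisq(delta_g2, df = 1, lower.tail = FALSE)

print(data.frame(delta_g2 = delta_g2, p_value = signif(p_values, 3)))
```

At an alpha of 0.05 the first three changes are not significant and the last two are.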

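Model Selection

1. Select an alpha level (0.05 by default)

2. Look at the P-value for each change in G2 between nested models

• Use (in R): 1-pchisq(G2, df), with G2 and df taken as the differences between the two models

3. Stop removing terms once a P-value falls below the alpha chosen in (1)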

4. Model 1: G+R+A+C+M+GR

5. Model 2: G+R+A+C+M+GR+(all pairs)

Model Selection (Continued)

6. Model 3: G+R+A+C+M+GR+(all pairs)+(all three-factor terms)

7. Model 4g: Model 2 without CR (lowest change in G2)

8. Model 5: Model 4g without CG (lowest change in G2)

9. Model 6: Model 5 without MR (lowest change in G2)

10. Model 7: Model 6 without GM (lowest change in G2)

11. Consider: A+C+M+AC+AM+CM
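One backward step of the procedure above can be sketched with `drop1()`, which refits the model once for each candidate term and reports the change in deviance. The sketch assumes the data from the R program and keeps GR in every model, as the steps do; `fit2`, `candidates`, `step1` and `weakest` are illustrative names.

```r
# Data exactly as in the R program.
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

# Model 2: homogeneous association (all two-factor terms).
fit2 <- glm(count ~ .^2, data = survey, family = poisson)

# Candidates for removal: every two-factor term except gender:race.
labels <- attr(terms(fit2), "term.labels")
candidates <- setdiff(grep(":", labels, value = TRUE), "gender:race")

# Likelihood-ratio test for dropping each candidate in turn (1 df each).
step1 <- drop1(fit2, scope = candidates, test = "Chisq")
print(step1)

# The term whose removal changes G2 the least is the first to go.
weakest <- rownames(step1)[which.min(step1$LRT)]
print(weakest)
```

Repeating the same call on the reduced model gives the next step, and the loop ends when the smallest change is significant at the chosen alpha.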

Goodness-of-Fit Tests (Table 9.2)

(G = Gender, R = Race, A = Alcohol, C = Cigarette, M = Marijuana)

Model                                       G2        df
1.  Mutual independence + GR                1325.1    25
2.  Homogeneous association                 15.3      16
3.  All three-factor terms                  5.3       6
4a. (2) - AC                                201.2     17
4b. (2) - AM                                107.0     17
4c. (2) - CM                                513.5     17
4d. (2) - AG                                18.7      17
4e. (2) - AR                                20.3      17
4f. (2) - CG                                16.3      17
4g. (2) - CR                                15.8      17
4h. (2) - GM                                25.2      17
4i. (2) - MR                                18.9      17
5.  (AC, AM, CM, AG, AR, GM, GR, MR)        16.7      18
6.  (AC, AM, CM, AG, AR, GM, GR)            19.9      19
7.  (AC, AM, CM, AG, AR, GR)                28.8      20
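Each G2 in the table can also be turned into an overall goodness-of-fit P-value by referring it to a chi-squared distribution with the listed df. The sketch below does this for the numbered models, using only values from the table; the data frame `gof` and its column names are illustrative.

```r
# G2 and df for models 1, 2, 3, 5, 6 and 7, as listed in the table.
gof <- data.frame(
  model = c("1", "2", "3", "5", "6", "7"),
  G2    = c(1325.1, 15.3, 5.3, 16.7, 19.9, 28.8),
  df    = c(25, 16, 6, 18, 19, 20))

# Upper-tail probability of each G2 under a chi-squared with its df.
gof$p_value <- pchisq(gof$G2, df = gof$df, lower.tail = FALSE)

print(gof)
```

A small P-value signals lack of fit. Only model 1 is rejected at the 0.05 level here, so the choice among the remaining models rests on the term-by-term comparisons.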

Thank You!

Any Questions???
