The Multigraph for Loglinear Models
Harry Khamis
Statistical Consulting Center
Wright State University
Dayton, Ohio, USA
OUTLINE
1. LOGLINEAR MODEL (LLM)
- two-way table
- three-way table
- examples
2. MULTIGRAPH
- construction
- maximum spanning tree
- conditional independencies
- collapsibility
3. EXAMPLES
Loglinear Model
Goal
Identify the structure of associations among a set of categorical variables.
LLM: two variables
                         Y
          1     2     3     …     J     Total
---------------------------------------------------
     1    n11   n12   n13   …     n1J   n1+
     2    n21   n22   n23   …     n2J   n2+
     .    .     .     .           .     .
X    .    .     .     .           .     .
     .    .     .     .           .     .
     I    nI1   nI2   nI3   …     nIJ   nI+
Total     n+1   n+2   n+3   …     n+J   n
LLM: two variables
Example
Survey of High School Seniors in Dayton, Ohio
Collaboration: WSU Boonshoft School of Medicine and United Health Services of Dayton
                          Marijuana Use?
                          Yes     No      Total
---------------------------------------------------
Cigarette Use?   Yes      914     581     1495
                 No       46      735     781
                 Total    960     1316    2276
LLM: two variables
Two discrete variables, X and Y
Model of independence: generating class is [X][Y]
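LLM: two variables
LLM of independence:
$\log m_{ij} = \lambda + \lambda^X_i + \lambda^Y_j$
where $\sum_i \lambda^X_i = \sum_j \lambda^Y_j = 0$
and $m_{ij}$ is the expected frequency in cell (i, j)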
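LLM: two variables
Saturated LLM: generating class is [XY]:
$\log m_{ij} = \lambda + \lambda^X_i + \lambda^Y_j + \lambda^{XY}_{ij}$
where $\sum_i \lambda^X_i = \sum_j \lambda^Y_j = \sum_i \lambda^{XY}_{ij} = \sum_j \lambda^{XY}_{ij} = 0$
Note: $\lambda^{XY}_{ij}$ is related to the odds ratio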
LLM: two variables
Interpretation           Generating Class    Probabilistic Model
-------------------------------------------------------------------
X and Y independent      [X][Y]              pij = pi+p+j
X and Y dependent        [XY]                pij
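As a rough illustration of the two models in the table above, the sketch below computes the expected counts under independence, m_ij = n_i+ n_+j / n, and the sample odds ratio for the cigarette-by-marijuana table. The function names `independence_fit` and `odds_ratio` are illustrative choices, not part of the original slides.

```python
# Observed counts from the cigarette-by-marijuana table.
# Rows: cigarette use (Yes, No); columns: marijuana use (Yes, No).
observed = [[914, 581],
            [46, 735]]

def independence_fit(table):
    """Expected counts under [X][Y]: m_ij = n_i+ * n_+j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def odds_ratio(table):
    """Sample odds ratio of a 2x2 table: (n11 * n22) / (n12 * n21)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

expected = independence_fit(observed)
theta = odds_ratio(observed)
print(expected)
print(theta)
```

Under the independence model the fitted table always has odds ratio 1, so an observed odds ratio far from 1 points toward the saturated model [XY].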
LLM: three variables
Example: Dayton High School Data
Alcohol    Cigarette    Marijuana Use
Use        Use          Yes     No
-----------------------------------------
Yes        Yes          911     538
           No           44      456
No         Yes          3       43
           No           2       279
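LLM: three variables
Saturated LLM, [XYZ]:
$\log m_{ijk} = \lambda + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk} + \lambda^{XYZ}_{ijk}$
where $\sum_i \lambda^X_i = \sum_j \lambda^Y_j = \sum_i \lambda^{XY}_{ij} = \sum_j \lambda^{XY}_{ij} = \dots = \sum_k \lambda^{XYZ}_{ijk} = 0$
and $m_{ijk}$ is the expected frequency in cell (i, j, k)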
LLM: three variables
Interpretation               Generating Class    Probabilistic Model
------------------------------------------------------------------------
mutual independence          [X][Y][Z]           pijk = pi++p+j+p++k
joint independence           [XZ][Y]             pijk = pi+kp+j+
conditional independence     [XY][XZ]            pijk = pij+pi+k/pi++
homogeneous association*     [XY][XZ][YZ]        *
saturated model              [XYZ]               pijk
*nondecomposable model
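The closed-form expression in the conditional independence row can be applied directly to counts. The sketch below fits [XY][XZ] to the Dayton data, taking X = alcohol, Y = cigarette, Z = marijuana (an assignment chosen here only for illustration), and computes the likelihood-ratio statistic G² for the fit. The function names are hypothetical.

```python
from math import log

# counts[i][j][k]: X = alcohol (i), Y = cigarette (j), Z = marijuana (k);
# index 0 = Yes, 1 = No, as in the Dayton table.
counts = [[[911, 538], [44, 456]],
          [[3, 43], [2, 279]]]

def fit_conditional_independence(n):
    """Fitted counts for [XY][XZ]: m_ijk = n_ij+ * n_i+k / n_i++."""
    fitted = []
    for layer in n:                                # one level i of X
        n_ij = [sum(row) for row in layer]         # n_ij+
        n_ik = [sum(col) for col in zip(*layer)]   # n_i+k
        n_i = sum(n_ij)                            # n_i++
        fitted.append([[a * b / n_i for b in n_ik] for a in n_ij])
    return fitted

def g_squared(n, m):
    """G^2 = 2 * sum of n * log(n / m), skipping empty cells."""
    total = 0.0
    for n_i, m_i in zip(n, m):
        for n_ij, m_ij in zip(n_i, m_i):
            for obs, fit in zip(n_ij, m_ij):
                if obs > 0:
                    total += obs * log(obs / fit)
    return 2 * total

fitted = fit_conditional_independence(counts)
g2 = g_squared(counts, fitted)
print(fitted)
print(g2)
```

No iteration is needed because the model is decomposable; the homogeneous association model [XY][XZ][YZ] has no such closed form.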
Decomposable LLMs
- closed-form expression for MLEs
- closed-form expression for asymptotic variances (Lee, 1977)
- conditional G² statistic simplifies
- allow for causal interpretations
- easier to interpret the LLM
3 Categorical Variables: X, Y, and Z
If [X⊗Y] and [Y⊗Z], then [X⊗Z].
FALSE!
LLM: three variables
Interpretation               Generating Class    Probabilistic Model
------------------------------------------------------------------------
mutual independence          [X][Y][Z]           pijk = pi++p+j+p++k
joint independence           [XZ][Y]             pijk = pi+kp+j+
conditional independence     [XY][XZ]            pijk = pij+pi+k/pi++
homogeneous association      [XY][XZ][YZ]        pijk = ψijφikωjk
saturated model              [XYZ]               pijk
3 Categorical Variables: X, Y, and Z
If [Y⊗Z] for all X = 1, 2, …, then [Y⊗Z].
FALSE!
3 Categorical Variables: X, Y, and Z
If [Y⊗Z], then [Y⊗Z] for all X = 1, 2, 3, ….
FALSE!
Which Treatment is Better?
                      TRIAL 1                          TRIAL 2
                      CURED?                           CURED?
                 Yes         No     Total         Yes          No     Total
------------------------------------------------------------------------------
TREATMENT   A    40 (.20)    160    200           85 (.85)     15     100
            B    30 (.15)    170    200           300 (.75)    100    400

Combine TRIALS 1 and 2:
                      CURED?
                 Yes          No     Total
----------------------------------------------
TREATMENT   A    125 (.42)    175    300
            B    330 (.55)    270    600
“Ask Marilyn”, PARADE section, DDN, pages 6-7, April 28, 1996
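The reversal in the treatment tables can be checked with a few lines of arithmetic. The sketch below recomputes the cure rates per trial and after pooling; the dictionary layout and the helper names `cure_rate` and `combine` are illustrative choices.

```python
# (cured, total) per treatment in each trial, from the tables above.
trials = {
    "Trial 1": {"A": (40, 200), "B": (30, 200)},
    "Trial 2": {"A": (85, 100), "B": (300, 400)},
}

def cure_rate(cured, total):
    """Proportion cured."""
    return cured / total

def combine(trials):
    """Add cured counts and totals across trials for each treatment."""
    pooled = {}
    for arms in trials.values():
        for arm, (cured, total) in arms.items():
            c, t = pooled.get(arm, (0, 0))
            pooled[arm] = (c + cured, t + total)
    return pooled

pooled = combine(trials)
for name, arms in trials.items():
    print(name, {arm: round(cure_rate(*ct), 2) for arm, ct in arms.items()})
print("Combined", {arm: round(cure_rate(*ct), 2) for arm, ct in pooled.items()})
```

Treatment A has the higher cure rate within each trial, yet B has the higher rate in the combined table, because the two treatments are spread very unevenly across trials with very different baseline cure rates.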
Florida Homicide Convictions Resulting in Death Penalty
ML Radelet and GL Pierce, Florida Law Review 43: 1-34, 1991
                               Death Penalty
                               Yes          No
--------------------------------------------------
Defendant’s Race    White      53 (0.11)    430
                    Black      15 (0.08)    176

                               White Victim             Black Victim
                               Death Penalty            Death Penalty
                               Yes          No          Yes         No
--------------------------------------------------------------------------
Defendant’s Race    White      53 (0.11)    414         0 (0.00)    16
                    Black      11 (0.23)    37          4 (0.03)    139
Multigraph Representation of LLMs
Vertices = generators of the LLM
Multiedges = edges joining two vertices, equal in number to the number of indices the two vertices share
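The construction rule above can be sketched in code: each generator becomes a vertex, and each pair of vertices is joined by as many edges as they have indices in common. The helper names and the assumption that every factor is a single letter are illustrative, not part of the original slides.

```python
from itertools import combinations

def parse_generating_class(text):
    """Turn '[XY][XZ]' into a list of generators, e.g. ['XY', 'XZ'].
    Assumes every factor is a single letter."""
    return [g for g in text.replace("]", "").split("[") if g]

def multigraph(generators):
    """Map each pair of vertices to the set of indices they share.
    The size of the set is the number of edges joining the pair."""
    edges = {}
    for u, v in combinations(generators, 2):
        shared = set(u) & set(v)
        if shared:
            edges[(u, v)] = shared
    return edges

edges = multigraph(parse_generating_class("[AS][ACR][MCS][MAC]"))
for (u, v), shared in edges.items():
    print(u, v, len(shared), sorted(shared))
```

For this generating class every pair of vertices shares at least one index; ACR and MAC are joined by a double edge (A, C), as are MCS and MAC (C, M).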
Multigraph: three variables
[XY][XZ]
Multigraph: XY ----- XZ (one edge, for the shared index X)
Examples of Multigraphs
[AS][ACR][MCS][MAC]
[Figure: multigraph with vertices AS, ACR, MAC, MCS]
Examples of Multigraphs
[ABCD][ACE][BCG][CDF]
[Figure: multigraph with vertices ABCD, CDF, ACE, BCG]
Maximum Spanning Tree
The maximum spanning tree of a multigraph M:
• tree (connected graph with no circuits)
• includes each vertex
• sum of the edges is maximum
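One way to find such a tree is Kruskal's algorithm run on the heaviest pairs first, where the weight of a pair is the number of indices its two generators share. This is a sketch under that assumption; the slides do not say which algorithm is intended, and the function name is hypothetical.

```python
from itertools import combinations

def maximum_spanning_tree(generators):
    """Kruskal's algorithm on the multigraph, taking heaviest pairs first.
    The weight of a pair is the number of indices the two generators share.
    Returns a list of (u, v, shared_indices) branches; if the multigraph
    is disconnected, the result is a spanning forest."""
    pairs = [(u, v, set(u) & set(v)) for u, v in combinations(generators, 2)]
    pairs = [p for p in pairs if p[2]]
    pairs.sort(key=lambda p: len(p[2]), reverse=True)

    parent = {g: g for g in generators}

    def find(g):
        # Follow parent links to the representative of g's component.
        while parent[g] != g:
            g = parent[g]
        return g

    tree = []
    for u, v, shared in pairs:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # adding this branch creates no circuit
            parent[root_u] = root_v
            tree.append((u, v, shared))
    return tree

tree = maximum_spanning_tree(["ABCD", "ACE", "BCG", "CDF"])
total = sum(len(shared) for _, _, shared in tree)
for u, v, shared in tree:
    print(u, v, sorted(shared))
```

For [ABCD][ACE][BCG][CDF] the three branches all leave ABCD, with index sets {A, C}, {B, C}, and {C, D}. When several pairs tie in weight, the tree returned depends on the order in which the pairs are visited.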
Examples of maximum spanning trees
[XY][XZ]
[Figure: maximum spanning tree with vertices XY, XZ]
Examples of maximum spanning trees
[AS][ACR][MCS][MAC]
[Figure: maximum spanning tree with vertices AS, ACR, MAC, MCS]
Examples of maximum spanning trees
[ABCD][ACE][BCG][CDF]
[Figure: maximum spanning tree with vertices ABCD, CDF, ACE, BCG]
Fundamental Conditional Independencies for a Decomposable LLM
1. Let S be the set of indices in a branch of the maximum spanning tree
2. Remove each factor of S from the multigraph, M; the resulting multigraph is M/S
3. An FCI is determined as:
[C1 ⊗ C2 ⊗ … ⊗ Ck | S]
where C1, C2, …, Ck are the sets of factors in the components of M/S
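The three steps above can be sketched as follows, given the generators and a branch index set S read off a maximum spanning tree. The helper names `components` and `fci` are hypothetical, and each factor is assumed to be a single letter.

```python
def components(vertices):
    """Group vertices (sets of factors) into connected components,
    where two vertices are connected if they share a factor.
    Returns the set of factors in each component."""
    groups = []
    for vertex in vertices:
        if not vertex:                 # vertex emptied by removing S
            continue
        merged = set(vertex)
        rest = []
        for group in groups:
            if group & vertex:
                merged |= group        # vertex links this group in
            else:
                rest.append(group)
        groups = rest + [merged]
    return groups

def fci(generators, s):
    """FCI for branch index set s: remove s from every generator,
    then read off the components C1, ..., Ck of M/S."""
    reduced = [set(g) - set(s) for g in generators]
    parts = sorted(",".join(sorted(c)) for c in components(reduced))
    return "[" + " ⊗ ".join(parts) + " | " + ",".join(sorted(s)) + "]"

print(fci(["XY", "XZ"], "X"))
print(fci(["ACE", "MAC", "MCG"], "AC"))
print(fci(["EG", "CP", "RC", "PG"], "C"))
```

Factors inside each set are printed alphabetically, so a component written M,G on a slide appears here as G,M; the independence statement is the same.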
FCIs
[XY][XZ]
M: XY --X-- XZ
S = {X}
M/S: Y   Z
[Y⊗Z|X]
Collapsibility Conditions
Consider a conditional independence relationship of the form
[C1 ⊗ C2 | S].
If the levels of all factors in C1 are collapsed, then all relationships among the remaining factors are
undistorted EXCEPT for relationships among factors in S.
Example: Ob-Gyn Study
(Darrocca, et al., 1996)
n = 201 pregnant mothers
Variables:
E: EGA (Early, Late)
B: Bishop score (High, Low)
T: Treatment (Prostin, Placebo)
Example: Ob-Gyn Study
                      BISHOP SCORE (B)
                  High                 Low
                  EGA (E)              EGA (E)
TREATMENT (T)     Early    Late        Early    Late
--------------------------------------------------------
Prostin           34       24          27       21
Placebo           22       16          35       22
Best-fitting model: [E][TB]
Example: Ob-Gyn Study
Generating Class: [E][TB]
Multigraph: E    TB (no edge; the two vertices share no index)
FCI: [E ⊗ T,B]
Example: Ob-Gyn Study
Collapsed Table (collapse over EGA):
                            BISHOP SCORE (B)
                            High         Low    Total
---------------------------------------------------------
TREATMENT (T)    Prostin    58 (0.55)    48     106
                 Placebo    38 (0.40)    57     95
P = 0.037
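The collapsed table can be reproduced by summing the three-way counts over EGA. The sketch below also computes a Pearson chi-square test on the collapsed table; the slides do not say which test produced P = 0.037, so the uncorrected Pearson test is an assumption here, as are the function names.

```python
from math import erfc, sqrt

# counts[t][b][e]: T = treatment (Prostin, Placebo), B = Bishop score (High, Low),
# E = EGA (Early, Late), copied from the three-way table.
counts = [[[34, 24], [27, 21]],
          [[22, 16], [35, 22]]]

def collapse_over_last(table):
    """Sum out the last factor (E), leaving the T-by-B table."""
    return [[sum(cell) for cell in row] for row in table]

def pearson_chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and its 1-df p-value."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p_value

tb = collapse_over_last(counts)
chi2, p_value = pearson_chi_square_2x2(tb)
print(tb)
print(round(chi2, 3), round(p_value, 3))
```

Because the FCI is [E ⊗ T,B], collapsing over E leaves the T-B relationship undistorted, which is what justifies testing it in the two-way table.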
Example: WSU-United Way Study
M: Marijuana (No, Yes)
A: Alcohol (No, Yes)
C: Cigarettes (No, Yes)
R: Race (Other, White)
S: Sex (Female, Male)
Observed cell frequencies (n = 2,276):
12 0 19 2 1 0 23 23
117 1 218 13 17 1 268 405
17 0 18 1 8 1 19 30
133 1 201 28 17 1 228 453
Example: WSU-United Way Study
Generating class: [ACE][MAC][MCG]
Multigraph, M:
[Figure: multigraph with vertices ACE, MCG, MAC]
Example: WSU-United Way Study
M: vertices ACE, MCG, MAC
S = {A,C}
M/S: vertices E, MG, M
[E ⊗ M,G | A,C]
A = Alcohol   C = Cigarette   E = Ethnic   G = Gender   M = Marijuana
Example: WSU PASS Program
“Preparing for Academic Success”
GPA below 2.0 at the end of first quarter
Example: WSU PASS Program
Variables (n = 972):
FACTOR                LABEL    LEVELS
-------------------------------------------------------------------------
Retention             R        1=No, 2=Yes
Cohort                C        1, 2, 3, 4
PASS Participation    P        1=No, 2=Yes
Ethnic Group          E        1=Caucasian, 2=African-American, 3=Other
Gender                G        1=Male, 2=Female
Example: WSU PASS Program
The best-fitting LLM has generating class [EG][CP][RC][PG]
Multigraph, M: EG --G-- PG --P-- CP --C-- RC
Example: WSU PASS Program
M: vertices EG, PG, RC, CP
S = {C}
M/S: vertices EG, PG, R, P
[E,G,P ⊗ R | C]
C = Cohort   E = Ethnic   G = Gender   P = PASS Participation   R = Retention
Example: Affinal Relations in Bosnia-Herzegovina
Data courtesy of Dr. Keith Doubt, Department of Sociology, Wittenberg University, Springfield, Ohio
N = 861 couples from Bosnia-Herzegovina were surveyed concerning affinal relations.
M: Marriage Type (traditional, elopement)
L: Location of Man and Wife (same, different)
E: Ethnicity (Bosniak, Serb, Croat)
S: Settlement (rural, urban)
Best-fitting model: [MLES]
Consider structural associations among M, L, and S for each ethnic group (E) separately.
Example: Affinal Relations in Bosnia-Herzegovina
Bosniaks: [ML][LS]
Serbs: [MS][SL]
Croats: [M][L][S]
M: Marriage Type L: Location of Man and Wife S: Settlement
Conclusions
The generator multigraph uses mathematical graph theory to analyze and interpret LLMs in a facile manner
Properties of the multigraph allow one to:
- Find all conditional independencies
- Determine all collapsibility conditions
REFERENCE
Khamis, H.J. (2011). The Association Graph and the Multigraph for Loglinear Models, SAGE series Quantitative Applications in the Social Sciences, No. 167.
Without data, you’re just one more person with an opinion