Lecture 15: Hierarchical Latent Class Models
Based on
N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, to appear.
COMP 538
Introduction to Bayesian Networks
Outline
  Motivation
    Application of LCA in medicine
    Model-based clustering and TCM diagnosis
    Need for more general models
  Theoretical Issues
  Learning Algorithm
  Empirical Results
  Related Work
Motivations/LCA in Medicine
In medical diagnosis, a gold standard sometimes exists.
Example: lung cancer
  Symptoms: persistent cough, hemoptysis (coughing up blood), constant chest pain, shortness of breath, fatigue, etc.
  Information for diagnosis: symptoms, medical history, smoking history, X-ray, sputum.
  Gold standard: biopsy, the removal of a small sample of tissue for examination under a microscope by a pathologist.
Sometimes a gold standard does not exist.
Example: rheumatoid arthritis (RA)
  Symptoms: back pain, neck pain, joint pain, joint swelling, morning joint stiffness, etc.
  Information for diagnosis: symptoms, medical history, physical exam, lab tests including a test for rheumatoid factor. (Rheumatoid factor is an antibody found in the blood of about 80 percent of adults with RA.)
  No gold standard:
    None of the symptoms or their combinations are clear-cut indicators of RA.
    The presence or absence of rheumatoid factor does not by itself indicate whether one has RA.
Questions:
  How many diagnostic categories should there be?
  What rules should be used when making a diagnosis?
Note: these questions cannot be answered using regression (supervised learning) because the true “disease type” is never directly observed. It is latent.
Ideas:
  Each “disease type” must correspond to a cluster of people.
  People in different clusters demonstrate different symptom patterns (otherwise diagnosis is hopeless).
Possible solution: perform cluster analysis of the symptom data to reveal the patterns.
Latent class analysis (LCA): cluster analysis based on the latent class (LC) model
  Observed variables Y_j: symptoms
  Latent variable X: “disease type”
  Assumption: the Y_j’s are independent of each other given X
Given: data on the Y_j’s
Determine:
  Number of states for X
  Prevalence: P(X)
  Class-specific probabilities P(Y_j|X)
[Figure: the LC model, a tree with latent variable X as root and observed children Y1, Y2, …, Yp]
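Written out, the independence assumption gives the LC model's joint distribution over the manifest variables as a finite mixture (the standard LCA factorization, shown here for reference):

```latex
P(Y_1, \dots, Y_p) \;=\; \sum_{x} \underbrace{P(X = x)}_{\text{prevalence}}
    \prod_{j=1}^{p} \underbrace{P(Y_j \mid X = x)}_{\text{class-specific probabilities}}
```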
LC Analysis of Hannover Rheumatoid Arthritis Data
Class-specific probabilities:
  Cluster 1: “disease-free”
  Cluster 2: “back-pain type”
  Cluster 3: “joint type”
  Cluster 4: “severe type”
Model-Based Clustering and TCM Diagnosis
Diagnosis in traditional Chinese medicine (TCM)
Example: kidney deficiency (肾虚)
  Symptoms: soreness and pain in the loins (腰酸软而痛), tinnitus (耳鸣), dribbling urine (小便余沥不尽), etc.
Similar to rheumatoid arthritis:
  Diagnosis is based on symptoms.
  No gold standard exists.
Current status:
  Researchers have been searching for laboratory indices that can serve as gold standards. All such efforts have failed.
  In practice, diagnosis is quite subjective and differs considerably between doctors.
  This hinders practice and prevents international recognition.
How to lay TCM diagnosis on a scientific foundation?
Model-based cluster analysis
Statistical methods might be the answer:
  TCM diagnosis is based on experience (of contemporary practitioners and ancient doctors).
  Experience is a summary of patient cases.
  Summarizing patient cases with human brains leads to subjectivity.
  Summarizing patient cases with a computer avoids subjectivity.
Need for More General Models
Preliminary analysis of TCM data using LCA: could not find models that fit the data well.
Reason: latent class (LC) models are too simplistic.
  Local independence: the observed variables are mutually independent given the single latent variable.
Need: more realistic models.
Hierarchical latent class (HLC) models: tree-structured Bayesian networks where
  Leaf nodes are observed; internal nodes are not.
  Manifest variables = observed variables.
Maybe still too simplistic, but a good first step:
  More general than LC models.
  Nice computational properties.
Task: learn HLC models from data, i.e., learn latent structures from what we can observe.
Theoretical Issues
What latent structures can be learned from data?
An HLC model M is parsimonious if there does NOT exist another model M' that
  is marginally equivalent to M, i.e., P(manifest vars | M) = P(manifest vars | M'), and
  has fewer independent parameters than M.
Occam’s razor prefers parsimonious models over non-parsimonious ones.
Regular HLC models
An HLC model is regular if for any latent node Z with neighbors X1, X2, …, Xk

  |Z| ≤ ( ∏_{i=1}^{k} |X_i| ) / max_i |X_i|

where the inequality is strict when Z has only two neighbors (see the worked examples below).
Irregular models are not parsimonious. (This gives an operational characterization of parsimony.)
The set of all possible regular HLC models for a given set of manifest variables is finite. (This gives a finite search space for the learning algorithm.)
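To make the condition concrete, two worked instances (the numbers are mine, chosen for illustration):

```latex
\text{Three binary neighbors } (k = 3,\ |X_i| = 2):\quad
    |Z| \le \frac{2 \cdot 2 \cdot 2}{2} = 4.
\qquad
\text{Two ternary neighbors } (k = 2,\ \text{strict}):\quad
    |Z| < \frac{3 \cdot 3}{3} = 3,\ \text{so } |Z| \le 2.
```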
Model equivalence
Root walking:
  M1: the root walks to X2
  M2: the root walks to X3
Root walking leads to equivalent models
Unrooted HLC models
The root of an HLC model can walk to any latent node.
Unrooted model: an HLC model with undirected edges.
We can only learn unrooted models.
Question: which latent node should be the class node?
Answer: any latent node, depending on the semantics and purpose of the clustering. Learn one model, obtain multiple clusterings.
Measure of model complexity
With no latent variables: the number of free parameters (standard dimension).
With latent variables: use the effective dimension instead.
  With no constraints, P(Y1, Y2, …, Yn) spans a (2^n − 1)-dimensional space S (for binary manifest variables).
  An HLC model imposes constraints on the joint, so the model spans a subspace of S.
  Effective dimension of the model: the dimension of that subspace. HARD to compute (a formal characterization is sketched below).
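One standard way to make "dimension of that subspace" precise, drawn from the model-selection literature on networks with hidden variables (e.g., Geiger, Heckerman and Meek) rather than from these slides, is as the maximal rank of the Jacobian of the map from parameters to the joint distribution over the manifest variables:

```latex
D_{\mathrm{eff}}(M) \;=\; \max_{\theta}\ \operatorname{rank}
    \frac{\partial\, P(Y_1, \dots, Y_n \mid M, \theta)}{\partial\, \theta}
```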
Reduction theorem for regular HLC models (Kocka and Zhang 2002):
  D(M) = D(M1) + D(M2) − (number of shared parameters)
The problem thus reduces to computing the effective dimension of LC models, for which a good approximation exists.
Example:
  Standard dimension: 110
  Effective dimension: 61
Learning HLC Models
Given: i.i.d. samples generated by some regular HLC model.
Task: reconstruct the HLC model from the data.
Hill-climbing algorithm:
  Scoring metric: we experiment with AIC, BIC, CS, and holdout LS (experiments with the effective dimension are yet to be run); two of these scores are sketched after this list.
  Search space: the set of all possible regular HLC models for the given manifest variables.
We structure the space into two levels, according to two subtasks:
  Given a model structure, estimate the cardinalities of the latent variables.
  Find an optimal model structure.
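For reference, a minimal sketch of the AIC and BIC scores named above, in score-maximization (log-likelihood) form; `loglik` and `dim` are placeholders for a candidate model's maximized log-likelihood on the data and its number of free parameters:

```python
import math

def aic_score(loglik: float, dim: int) -> float:
    """AIC in log-likelihood form: log-likelihood penalized by the parameter count."""
    return loglik - dim

def bic_score(loglik: float, dim: int, n_samples: int) -> float:
    """BIC in log-likelihood form: log-likelihood penalized by (d/2) * log N."""
    return loglik - (dim / 2.0) * math.log(n_samples)
```

For latent-variable models, `dim` would ideally be the effective dimension discussed earlier rather than the standard dimension.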
Estimate cardinalities of latent variables given a model structure
Search space: all regular models with the given model structure.
Hill-climbing (a sketch of this inner loop follows):
  Start: all latent variables at their minimum cardinality (usually 2).
  Search operator: increase the cardinality of one latent variable by one.
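A minimal sketch of this inner loop, assuming a hypothetical helper `fit_and_score` that runs EM on a fully specified model and returns a model-selection score (e.g., BIC) on the data:

```python
def estimate_cardinalities(structure, latent_vars, data, fit_and_score):
    """Hill-climb over latent-variable cardinalities for a fixed structure."""
    # Start: every latent variable at its minimum cardinality (usually 2).
    cards = {z: 2 for z in latent_vars}
    best = fit_and_score(structure, cards, data)
    improved = True
    while improved:
        improved = False
        # Search operator: increase the cardinality of ONE latent variable by one.
        for z in latent_vars:
            trial = dict(cards)
            trial[z] += 1
            score = fit_and_score(structure, trial, data)
            if score > best:
                best, cards, improved = score, trial, True
    return cards, best
```

A fuller version would also skip candidates that violate the regularity bound on cardinalities.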
Find optimal model structures
Search space: the set of all regular unrooted HLC model structures for the given manifest variables.
Hill-climbing (a sketch of this outer loop follows):
  Start: the unrooted LC model structure.
  Search operators: node introduction, node elimination, neighbor relocation.
  One can move between any two model structures using these operators.
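A matching sketch of the outer loop. The operator generators and `score_structure` (which would run the inner cardinality search and return, e.g., a BIC score) are hypothetical stand-ins for the pieces described above:

```python
def learn_structure(initial_structure, data, score_structure, operators):
    """Hill-climb over regular unrooted HLC model structures.

    `initial_structure`: the unrooted LC structure (one latent node
    connected to every manifest variable).
    `operators`: candidate generators for node introduction, node
    elimination, and neighbor relocation; each yields the regular
    structures reachable from the current one in one step.
    """
    current = initial_structure
    best = score_structure(current, data)
    improved = True
    while improved:
        improved = False
        for op in operators:
            for candidate in op(current):
                s = score_structure(candidate, data)
                if s > best:
                    best, current, improved = s, candidate, True
    return current, best
```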
Motivations for the search operators:
  Node introduction (M1' → M2'): deals with local dependence. Its opposite: node elimination.
  Neighbor relocation (M2' → M3'): the result of a trade-off. Its opposite: itself.
The operators are not allowed to yield irregular model structures.
Empirical Results
Synthetic data:
  Generative models randomly parameterized; all latent variables have 3 states.
  Sample sizes: 5k, 10k, 50k, 100k.
Log scores on testing data:
  Close to those of the generative model.
  Do not vary much across scoring metrics.
Learned structures: numbers of steps to the true structure.
Cardinality of latent variables:
  Better results with more skewed parameters.
Hannover rheumatoid arthritis data:
  5 binary manifest variables: back pain, neck pain, joint swelling, …
  7,162 records.
  Analysis by Kohlmann and Formann (1997): a 4-class LC model. Our algorithm: exactly the same model.
Coleman data:
  4 binary manifest variables, 3,398 records.
  Analysis by Goodman (1974) and Hagenaars (1988): M1, M2. Our algorithm: M3.
HIV data:
  4 binary manifest variables, 428 records.
  Analysis by Uebersax (2000) and by our algorithm: [model structures shown in figures]
House-building data:
  4 binary manifest variables, 1,185 records.
  Analysis by Hagenaars (1988): M2, M3, M4.
  Our algorithm: a 4-class LC model that fits the data poorly. A failure.
  Reason: limitations of HLC models.
Related Work
Phylogenetic trees:
  Represent the relationships among a set of species.
Probabilistic model:
  Taxa are aligned; sites evolve i.i.d.
  Conditional probabilities come from a character-evolution model.
  Parameters: edge lengths, representing time.
Restricted to one site, a phylogenetic tree is an HLC model where
  the tree structure is binary and all variables share the same state space,
  the conditional probabilities are parameterized by edge lengths, and
  the model is the same across sites.
[Figure: phylogenetic tree whose nodes carry aligned sequences, e.g., AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT, AAGACTT, AGCACTT, AAGGCCT]
Tree reconstruction:
  Given: the current taxa. Find: tree topology and edge lengths.
Methods:
  Hill-climbing:
    Stepwise addition of taxa.
    Star decomposition, similar to node introduction in HLC models.
    Branch swapping, similar to neighbor relocation in HLC models.
  Structural EM (Friedman et al. 2002): exploits the fact that all variables share the same state space.
  Neighbor joining (Saitou & Nei, 1987): exploits the facts that the parameters are edge lengths and that distances are additive.
Connolly (1993):
  Heuristic method for constructing HLC models.
  Mutual information is used to group variables; one latent variable is introduced for each group.
  Cardinalities of latent variables are determined using conceptual clustering.
Martin and VanLehn (1994):
  Heuristic method for learning two-level Bayesian networks where the top level is latent.
Elidan et al. (2001):
  Learning latent variables for general Bayesian networks. Aim: simplification. Idea: structural signatures.
Model-based hierarchical clustering (Hansen et al. 1991):
  Hierarchically organizes the state space of ONE cluster variable.
Diagnostics for local dependence in LC models:
  Hagenaars (1988): standardized residuals.
  Espeland & Handelmann (1988): likelihood ratio statistic.
  Garrett & Zeger (2000): log odds ratios.
Modeling local dependence in LC models:
  Joint variables (M2), multiple indicators (M3), loglinear models (M4).