Latent Tree Models Part II: Definition and Properties Nevin L. Zhang Dept. of Computer Science & Engineering The Hong Kong Univ. of Sci. & Tech. http://www.cse.ust.hk/~lzhang AAAI 2014 Tutorial



Part II: Concept and Properties

Latent Tree Models

Definition

Relationship with finite mixture models

Relationship with phylogenetic trees

Basic Properties


Basic Latent Tree Models (LTM)

Bayesian network

All variables are discrete

Structure is a rooted tree

Leaf nodes are observed (manifest variables)

Internal nodes are not observed (latent variables)

Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), ...

Semantics: the joint distribution over all variables is the product of these conditional distributions

Also known as hierarchical latent class (HLC) models (Zhang, JMLR 2004)


Joint Distribution over Observed Variables

Marginalizing out the latent variables in the joint distribution of an LTM, we get a joint distribution over the observed variables.

In comparison with a Bayesian network without latent variables, an LTM:

Is computationally very simple to work with.

Can represent complex relationships among manifest variables.

What does the structure look like without the latent variables?
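The marginalization can be sketched in a few lines of Python. The tree used here (root Y1 with children Y2 and X3, and Y2 with children X1 and X2), the binary state spaces, and all probability values are made-up assumptions for illustration, not taken from the tutorial.

```python
from itertools import product

# Hypothetical binary LTM: Y1 -> Y2, Y1 -> X3, Y2 -> X1, Y2 -> X2.
# All numbers below are illustrative assumptions.
P_Y1 = [0.6, 0.4]                          # P(Y1)
P_Y2_given_Y1 = [[0.8, 0.2], [0.3, 0.7]]   # P(Y2 | Y1), rows indexed by Y1
P_X1_given_Y2 = [[0.9, 0.1], [0.2, 0.8]]   # P(X1 | Y2), rows indexed by Y2
P_X2_given_Y2 = [[0.7, 0.3], [0.1, 0.9]]   # P(X2 | Y2), rows indexed by Y2
P_X3_given_Y1 = [[0.6, 0.4], [0.5, 0.5]]   # P(X3 | Y1), rows indexed by Y1


def joint(y1, y2, x1, x2, x3):
    """Bayesian network semantics: product of all conditional distributions."""
    return (P_Y1[y1] * P_Y2_given_Y1[y1][y2] * P_X1_given_Y2[y2][x1]
            * P_X2_given_Y2[y2][x2] * P_X3_given_Y1[y1][x3])


def observed_marginal():
    """Sum the joint over the latent variables Y1 and Y2."""
    table = {}
    for x1, x2, x3 in product(range(2), repeat=3):
        table[(x1, x2, x3)] = sum(
            joint(y1, y2, x1, x2, x3)
            for y1, y2 in product(range(2), repeat=2)
        )
    return table


P_X = observed_marginal()
```

With these numbers, P(X1=0, X2=0) is 0.386 while P(X1=0) P(X2=0) is 0.62 x 0.46 = 0.2852, so X1 and X2 are dependent once the latent variables are summed out, even though they are independent given Y2.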


Pouch Latent Tree Models (PLTM)

An extension of basic LTM

Rooted tree

Internal nodes represent discrete latent variables

Each leaf node consists of one or more continuous observed variables, called a pouch.

(Poon et al. ICML 2010)


More General Latent Variable Tree Models

Some internal nodes can be observed

Internal nodes can be continuous

Forest

Primary focus of this tutorial: the basic LTM

(Choi et al. JMLR 2011)


Finite Mixture Models (FMM)

Gaussian Mixture Models (GMM): Continuous attributes

Graphical model
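For reference, the standard form of a Gaussian mixture density can be written as below. The symbols (K components, a single latent cluster variable y, means \mu_k and covariances \Sigma_k) are notation assumed here, not taken from the slide.

```latex
P(\mathbf{x}) = \sum_{k=1}^{K} P(y = k)\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
```

In the graphical model, y is the only parent of the continuous attributes in \mathbf{x}.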


Finite Mixture Models (FMM)

GMM with independence assumption

Block diagonal covariance matrix

Graphical Model


Finite Mixture Models

Latent class models (LCM): Discrete attributes

Distribution for cluster k:

Product multinomial distribution:

All FMMs

One latent variable

Yielding one partition of data

Graphical Model
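A latent class model can be sketched as below, assuming the usual product form in which the distribution for cluster k is the product of the per-attribute distributions P(X_i | Y = k). The two clusters, the three binary attributes and all probabilities are made up for illustration.

```python
# Hypothetical LCM: one latent variable Y with 2 clusters, 3 binary attributes.
# All numbers are illustrative assumptions.
P_Y = [0.5, 0.5]
# P_X_given_Y[i][k][v] = P(X_i = v | Y = k)
P_X_given_Y = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.7, 0.3], [0.1, 0.9]],
]


def cluster_likelihood(x, k):
    """P(x | Y = k): product of per-attribute multinomials."""
    p = 1.0
    for i, v in enumerate(x):
        p *= P_X_given_Y[i][k][v]
    return p


def posterior(x):
    """P(Y | x) by Bayes' rule; the mixture P(x) is the normalizer."""
    weighted = [P_Y[k] * cluster_likelihood(x, k) for k in range(len(P_Y))]
    total = sum(weighted)
    return [w / total for w in weighted]


def assign(x):
    """Hard assignment of a data case to its most probable cluster."""
    post = posterior(x)
    return max(range(len(post)), key=post.__getitem__)
```

Calling `assign` on every data case yields the single partition of the data that the one latent variable induces.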


From FMMs to LTMs

Start with several GMMs,

Each based on a distinct subset of attributes

Each partitions data from a certain perspective.

Different partitions are independent of each other

Link them up to form a tree model

Get Pouch LTM

Consider different perspectives in a single model

Multiple partitions of data that are correlated.


From FMMs to LTMs

Start with several LCMs,

Each based on a distinct subset of attributes

Each partitions data from a certain perspective.

Different partitions are independent of each other

Link them up to form a tree model

Get LTM

Consider different perspectives in a single model

Multiple partitions of data that are correlated.

Summary: An LTM can be viewed as a collection of FMMs, with their latent variables linked up to form a tree structure.


Phylogenetic Trees

Taxa (sequences) identify species

Edge lengths represent evolution time

Usually, bifurcating tree topology

Durbin, et al. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.


Probabilistic Models of Evolution

Two assumptions

There are only substitutions, no insertions/deletions (aligned)

One-to-one correspondence between sites in different sequences

Each site evolves independently and identically

P(x | y, t) = \prod_{i=1}^{m} P(x(i) | y(i), t)

m is sequence length

P(x(i) | y(i), t): Jukes-Cantor (Character Evolution) Model [1969]

Rate of substitution α
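The per-site model and the product over sites can be sketched as follows. This assumes the standard Jukes-Cantor parameterization, in which each base changes to each of the three other bases at the same rate (called `alpha` here); the function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def jc_site_prob(x, y, t, alpha):
    """P(x | y, t) for one site under Jukes-Cantor with substitution rate alpha."""
    decay = math.exp(-4.0 * alpha * t)
    if x == y:
        return 0.25 * (1.0 + 3.0 * decay)   # site ends in the same base
    return 0.25 * (1.0 - decay)             # site ends in one specific other base


def jc_sequence_prob(xs, ys, t, alpha):
    """P(x | y, t) for aligned sequences: product over independent sites."""
    if len(xs) != len(ys):
        raise ValueError("sequences must be aligned to the same length")
    p = 1.0
    for x, y in zip(xs, ys):
        p *= jc_site_prob(x, y, t, alpha)
    return p
```

At t = 0 a site keeps its base with probability 1, and as t grows every base approaches probability 1/4 regardless of the starting base.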


Phylogenetic Trees are Special LTMs

When we focus on one site, phylogenetic trees are special latent tree models

The structure is a binary tree

The variables share the same state space.

Each conditional distribution is characterized by only one parameter, i.e., the length of the corresponding edge


Hidden Markov Models

Hidden Markov models are also special latent tree models

All latent variables share the same state space.

All observed variables share the same state space.

P(y_t | s_t) and P(s_{t+1} | s_t) are the same for different t's.


Two Concepts of Models

So far, a model consists of

Observed and latent variables

Connections among the variables

Probability values

For the rest of Part II, a model consists of

Observed and latent variables

Connections among the variables

Probability parameters


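Model Inclusion

m includes m’ if every joint distribution over the observed variables that m’ can represent can also be represented by m.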


Model Equivalence

If m includes m’ and vice versa, then they are marginally equivalent.

If they also have the same number of free parameters, then they are equivalent.

It is not possible to distinguish between equivalent models based on data.
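Counting free parameters is mechanical for a rooted tree: the root Y contributes |Y| - 1, and every other node Z with parent W contributes |W| (|Z| - 1). A minimal sketch, using a hypothetical encoding of the tree as two dictionaries (`card` and `parent`) and made-up cardinalities:

```python
def free_parameters(card, parent):
    """Number of free parameters of a rooted tree-structured Bayesian network.

    card:   dict node -> number of states
    parent: dict node -> parent node, or None for the root
    """
    total = 0
    for node, par in parent.items():
        if par is None:
            total += card[node] - 1                 # P(root)
        else:
            total += card[par] * (card[node] - 1)   # P(node | parent)
    return total


# Hypothetical example: the same undirected tree rooted at Y1 and at Y2.
card = {"Y1": 3, "Y2": 2, "X1": 2, "X2": 2, "X3": 2}
rooted_at_y1 = {"Y1": None, "Y2": "Y1", "X3": "Y1", "X1": "Y2", "X2": "Y2"}
rooted_at_y2 = {"Y2": None, "Y1": "Y2", "X3": "Y1", "X1": "Y2", "X2": "Y2"}
```

Both rootings give 12 free parameters: the edge between Y1 and Y2 together with the root costs |Y1| |Y2| - 1 whichever end is the root.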


Root Walking


Root Walking Example

Root walks to X2; Root walks to X3


Root Walking

Theorem: Root walking leads to equivalent latent tree models. (Zhang, JMLR 2004)

Special case of covered arc reversal in general Bayesian networks:

Chickering, D. M. (1995). A transformational characterization of equivalent Bayesian network structures. UAI.
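A numerical check of one direction of the theorem, assuming that a root walk from Y1 to its latent child Y2 means reversing the arc Y1 -> Y2, so that P(Y1) and P(Y2|Y1) are replaced by P(Y2) and P(Y1|Y2). The tree and all probability values are made up for illustration.

```python
from itertools import product

# Hypothetical binary LTM rooted at Y1: Y1 -> Y2, Y1 -> X3, Y2 -> X1, Y2 -> X2.
P_Y1 = [0.6, 0.4]
P_Y2_given_Y1 = [[0.8, 0.2], [0.3, 0.7]]   # rows indexed by Y1
P_X1_given_Y2 = [[0.9, 0.1], [0.2, 0.8]]   # rows indexed by Y2
P_X2_given_Y2 = [[0.7, 0.3], [0.1, 0.9]]   # rows indexed by Y2
P_X3_given_Y1 = [[0.6, 0.4], [0.5, 0.5]]   # rows indexed by Y1

# Root walks from Y1 to Y2: reverse the arc Y1 -> Y2 with Bayes' rule.
P_Y2 = [sum(P_Y1[a] * P_Y2_given_Y1[a][b] for a in range(2)) for b in range(2)]
P_Y1_given_Y2 = [
    [P_Y1[a] * P_Y2_given_Y1[a][b] / P_Y2[b] for a in range(2)]
    for b in range(2)
]                                          # rows indexed by Y2


def marginal(latent_joint):
    """P(X1, X2, X3), given a function returning P(Y1 = a, Y2 = b)."""
    table = {}
    for x1, x2, x3 in product(range(2), repeat=3):
        table[(x1, x2, x3)] = sum(
            latent_joint(a, b) * P_X1_given_Y2[b][x1]
            * P_X2_given_Y2[b][x2] * P_X3_given_Y1[a][x3]
            for a, b in product(range(2), repeat=2)
        )
    return table


rooted_at_y1 = marginal(lambda a, b: P_Y1[a] * P_Y2_given_Y1[a][b])
rooted_at_y2 = marginal(lambda a, b: P_Y2[b] * P_Y1_given_Y2[b][a])
```

The two tables agree entry by entry, so the model rooted at Y2 can represent this distribution of the model rooted at Y1. The same argument with the roles swapped gives the other direction, and the number of free parameters is unchanged by the reversal.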


Implication

Edge orientations in latent tree models are not identifiable.

Technically, it is better to start with an alternative definition of LTM:

A latent tree model (LTM) is a Markov random field over an undirected tree (a tree-structured Markov network) where variables at leaf nodes are observed and variables at internal nodes are hidden.


Implication

For technical convenience, we often root an LTM at one of its latent nodes and regard it as a directed graphical model.

Rooting the model at different latent nodes leads to equivalent directed models.

This is why we introduced LTMs as directed models.


Regularity

|X|: Cardinality of variable X, i.e., the number of states.
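A sketch of a regularity check. It assumes the condition as recalled from Zhang (JMLR 2004): for every latent node Z with neighbors Z1, ..., Zk, |Z| <= (|Z1| ... |Zk|) / max_i |Zi|, with strict inequality when Z has exactly two neighbors and at least one of them is latent. The exact wording of the strictness clause should be checked against the paper; the function name and the dictionary encoding are hypothetical.

```python
from math import prod


def is_regular(card, neighbors, latent):
    """Check the (assumed) regularity condition for every latent node.

    card:      dict node -> number of states
    neighbors: dict latent node -> list of its neighbors in the tree
    latent:    set of latent node names
    """
    for z in latent:
        nbr_cards = [card[n] for n in neighbors[z]]
        # The maximum is one of the factors, so integer division is exact.
        bound = prod(nbr_cards) // max(nbr_cards)
        # Assumption: strictness applies with exactly two neighbors,
        # at least one of which is latent.
        strict = len(nbr_cards) == 2 and any(n in latent for n in neighbors[z])
        if card[z] > bound or (strict and card[z] == bound):
            return False
    return True


# Hypothetical example: one latent variable Y over three binary attributes.
neighbors = {"Y": ["X1", "X2", "X3"]}
ok = is_regular({"Y": 4, "X1": 2, "X2": 2, "X3": 2}, neighbors, {"Y"})
too_big = is_regular({"Y": 5, "X1": 2, "X2": 2, "X3": 2}, neighbors, {"Y"})
```

Here the bound is 2 x 2 x 2 / 2 = 4, so a latent variable with 4 states passes the check and one with 5 states does not.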


Regularity

Can focus on regular models only

Irregular models can be made regular

Regularized models are better than irregular models

Theorem: The set of all regular models for a given set of observed variables is finite. (Zhang, JMLR 2004)