Page 1

Tensors and graphical models

Mariya Ishteva with Haesun Park, Le Song

Dept. ELEC, VUB Georgia Tech, USA

INMA Seminar, May 7, 2013, LLN

Page 2

Outline

Tensors

Random variables and graphical models

Tractable representations

Structure learning

Page 3

Tensors

$\mathbb{R}^{M \times N \times P}$

Page 4

Ranks

• Multilinear rank $(R_1, R_2, R_3)$

• Rank-$R$:
$R = \min r \quad \text{s.t.} \quad \mathcal{A} = \sum_{i=1}^{r} \{\text{rank-1 tensor}\}_i$

• Rank-1 tensor: an outer product of one vector per mode
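A minimal numerical sketch of these definitions (MATLAB/Octave, in the style of the reshape/permute snippet later in the talk; all sizes are arbitrary example values): a rank-1 tensor is an outer product of vectors, and a rank-R tensor is a sum of R such terms.

    % Build a rank-R tensor as a sum of R rank-1 (outer product) terms.
    I = 4; J = 5; K = 3; R = 2;
    T = zeros(I, J, K);
    for r = 1:R
        a = randn(I,1); b = randn(J,1); c = randn(K,1);
        T = T + reshape(kron(c, kron(b, a)), [I J K]);   % outer product a o b o c
    end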

Page 5

Matrix representations of tensors

• Mode-1 unfolding: $A_{(1)}$

• Mode-2 unfolding: $A_{(2)}$

• Mode-3 unfolding: $A_{(3)}$

• Multilinear rank: $(\operatorname{rank}(A_{(1)}),\ \operatorname{rank}(A_{(2)}),\ \operatorname{rank}(A_{(3)}))$
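A minimal sketch of the three mode unfoldings and the multilinear rank (MATLAB/Octave; the exact column ordering within each unfolding is a convention and does not affect the ranks):

    I = 4; J = 5; K = 3;
    A = randn(I, J, K);                              % example third-order tensor
    A1 = reshape(A, I, J*K);                         % mode-1 unfolding, I x (J*K)
    A2 = reshape(permute(A, [2 1 3]), J, I*K);       % mode-2 unfolding, J x (I*K)
    A3 = reshape(permute(A, [3 1 2]), K, I*J);       % mode-3 unfolding, K x (I*J)
    mlrank = [rank(A1), rank(A2), rank(A3)];         % multilinear rank (R1, R2, R3)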

Page 6

Tensor-matrix multiplication

• Tensor–matrix product

• Contraction: for $\mathcal{A} \in \mathbb{R}^{I \times J \times M}$ and $\mathcal{B} \in \mathbb{R}^{K \times L \times M}$,

$\mathcal{C} = \langle \mathcal{A}, \mathcal{B} \rangle_3, \qquad \mathcal{C}(i,j,k,l) = \sum_{m=1}^{M} a_{ijm}\, b_{klm},$

a 4th-order tensor $\mathcal{C} \in \mathbb{R}^{I \times J \times K \times L}$
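A minimal sketch of this contraction (MATLAB/Octave, example dimensions), computed by unfolding both tensors along the contracted mode and taking a matrix product:

    I = 3; J = 4; K = 2; L = 5; M = 6;
    A = randn(I, J, M);
    B = randn(K, L, M);
    Amat = reshape(A, I*J, M);                       % rows indexed by (i,j), columns by m
    Bmat = reshape(B, K*L, M);                       % rows indexed by (k,l), columns by m
    C = reshape(Amat * Bmat', [I J K L]);            % C(i,j,k,l) = sum_m A(i,j,m)*B(k,l,m)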

Page 7

Basic decompositions

Singular value decomposition (SVD)

MLSVD / HOSVD

CP / CANDECOMP / PARAFAC

Page 8

Outline

Tensors

Random variables and graphical models

Tractable representations

Structure learning

Page 9

Discrete random variables

• Random variable $X$ with states $1, \ldots, n$ and probabilities $P_X(1), \ldots, P_X(n)$, collected in a vector $P_X \in \mathbb{R}^n$ (nonnegative entries in $[0,1]$)

• Two variables $X_1, X_2$ with joint $P(X_1, X_2)$, stored as a matrix $P_{12} \in \mathbb{R}^{n \times n}$ whose $(x_1, x_2)$ entry is $P_{12}(x_1, x_2)$, for $x_1, x_2 \in \{1, \ldots, n\}$

• Notation: $P(x_1, x_2) := P(X_1 = x_1, X_2 = x_2)$

Page 10

2 random variables

$X_1, X_2$ with joint $P(X_1, X_2)$, stored as $P_{12} \in \mathbb{R}^{n \times n}$

• $X_1 \perp X_2$:

$P(x_1, x_2) = P(x_1)\, P(x_2)$, i.e., $P_{12}$ is a rank-1 matrix

• Hidden variable $H$ (graphical model $X_1 \leftarrow H \rightarrow X_2$):

$P(x_1, x_2) = \sum_h P(x_1|h)\, P(x_2|h)\, P(h)$, i.e., $P_{12}$ is a low-rank matrix (rank $k$, $k < n$)

• Conditional probability tables (CPTs): $P(X_1|H)$, $P(X_2|H)$
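A minimal sketch of this low-rank structure (MATLAB/Octave; the CPT and prior names P1H, P2H, pH are illustrative):

    n = 10; k = 3;                                   % n observed states, k hidden states
    P1H = rand(n, k); P1H = P1H ./ sum(P1H, 1);      % CPT P(X1 | H), columns sum to 1
    P2H = rand(n, k); P2H = P2H ./ sum(P2H, 1);      % CPT P(X2 | H)
    pH  = rand(k, 1); pH  = pH / sum(pH);            % prior P(H)
    P12 = P1H * diag(pH) * P2H';                     % P(x1,x2) = sum_h P(x1|h) P(x2|h) P(h)
    rank(P12)                                        % rank k (< n), up to numerical tolerance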

Page 11

3 random variables

$X_1, X_2, X_3$ with joint $P(X_1, X_2, X_3)$, stored as $P_{123} \in \mathbb{R}^{n \times n \times n}$

• $X_1, X_2, X_3$ independent:

$P(x_1, x_2, x_3) = P(x_1)\, P(x_2)\, P(x_3)$, i.e., $P_{123}$ is a rank-1 tensor

• Hidden variable $H$ with children $X_1, X_2, X_3$:

$P(x_1, x_2, x_3) = \sum_h P(x_1|h)\, P(x_2|h)\, P(x_3|h)\, P(h)$, i.e., $P_{123}$ is a rank-$k$ tensor, $k < n$
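A minimal sketch of the same construction one order higher (MATLAB/Octave; P1H, P2H, P3H, pH are illustrative names): the joint is a sum of $k$ rank-1 terms, i.e., a CP decomposition whose factors are the CPTs.

    n = 10; k = 3;
    P1H = rand(n, k); P1H = P1H ./ sum(P1H, 1);      % CPT P(X1 | H)
    P2H = rand(n, k); P2H = P2H ./ sum(P2H, 1);      % CPT P(X2 | H)
    P3H = rand(n, k); P3H = P3H ./ sum(P3H, 1);      % CPT P(X3 | H)
    pH  = rand(k, 1); pH  = pH / sum(pH);            % prior P(H)
    P123 = zeros(n, n, n);
    for h = 1:k                                      % sum of k rank-1 terms
        P123 = P123 + pH(h) * reshape(kron(P3H(:,h), kron(P2H(:,h), P1H(:,h))), [n n n]);
    end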

Page 12

4 random variables

• $X_1, X_2, X_3, X_4$ with joint $P(X_1, X_2, X_3, X_4)$, stored as $P_{1234} \in \mathbb{R}^{n \times n \times n \times n}$

• $X_1, X_2, X_3, X_4$ independent: rank-1 tensor

• Hidden variable $H$ with children $X_1, X_2, X_3, X_4$:

$P(x_1, x_2, x_3, x_4) = \sum_h P(x_1|h)\, P(x_2|h)\, P(x_3|h)\, P(x_4|h)\, P(h)$

• More variables → more hidden variables

Page 13

Challenges

• 10 variables, 10 states each → $10^{10}$ entries

• We need tractable representations
• Latent variable models / low-rank factors
• # parameters: exponential → polynomial (a rough count follows below)

[Figure: latent variable model — hidden variable(s) connecting the observed variables]

• Challenges:
  • Choose a good representation ✓
  • Learn the correct structure ✓
  • Estimate the parameters ✗
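As a rough count for the single-hidden-variable model from the previous slides (with $d$ observed variables, $n$ states each, and $k$ hidden states; normalization constraints ignored):

$\underbrace{n^{d}}_{\text{full joint table}} \quad\longrightarrow\quad \underbrace{d\,n\,k + k}_{d \text{ CPTs } P(X_i|H) \text{ plus } P(H)}$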

Page 14

Outline

Tensors

Random variables and graphical models

Tractable representations

Structure learning

Page 15

Tensors and graphical models

Tensor decompositions ↔ graphical models:

• CP / CANDECOMP / PARAFAC ↔ a single hidden variable $H$ with children $X_1, X_2, \ldots, X_n$

• Tensor train ↔ HMM: a chain of hidden variables $H_1, H_2, H_3, \ldots, H_n$ with observations $X_1, X_2, X_3, \ldots, X_n$

• Hierarchical Tucker ↔ latent tree model

• Tucker / MLSVD, block term decomposition ↔ ✗

Page 16

Tensor train (TT) decomposition

$A(i_1, \ldots, i_d) = \sum_{\alpha_0, \ldots, \alpha_d} G_1(\alpha_0, i_1, \alpha_1)\, G_2(\alpha_1, i_2, \alpha_2) \cdots G_d(\alpha_{d-1}, i_d, \alpha_d)$

[I. V. Oseledets, SIAM J. Scientific Computing, 2011]

• Avoids the curse of dimensionality
• Small number of parameters, compared to the Tucker model
• Slightly more parameters than CP, but more stable
• The core $G_k(\alpha_{k-1}, i_k, \alpha_k)$ has dimensions $r_{k-1} \times n_k \times r_k$, with $r_0 = r_d = 1$
• The $r_k$ are called compression ranks: $A_k = A_k(i_1, \ldots, i_k;\ i_{k+1}, \ldots, i_d)$, $\operatorname{rank}(A_k) = r_k$
• Computation based on the SVD (a sketch follows below)
• Computation order: top → bottom

[Figure: graphical-model view — a chain of hidden variables $H_1, H_2, H_3, \ldots, H_n$ with observations $X_1, X_2, X_3, \ldots, X_n$]
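A minimal sketch of an SVD-based computation in the spirit of TT-SVD [Oseledets, 2011] (MATLAB/Octave; the fixed truncation tolerance and the absence of error control are simplifications, and all sizes are example values):

    n = [4 5 6 3]; d = numel(n); tol = 1e-10;        % example dimensions
    A = randn(n);                                    % d-th order tensor
    G = cell(1, d);                                  % TT cores, G{k} is r_{k-1} x n(k) x r_k
    C = reshape(A, n(1), []);                        % start from the mode-1 unfolding
    r_prev = 1;
    for k = 1:d-1
        [U, S, V] = svd(C, 'econ');
        r = max(1, sum(diag(S) > tol));              % compression rank r_k
        U = U(:, 1:r); S = S(1:r, 1:r); V = V(:, 1:r);
        G{k} = reshape(U, r_prev, n(k), r);          % k-th core
        C = reshape(S * V', r * n(k+1), []);         % regroup the remainder for the next step
        r_prev = r;
    end
    G{d} = reshape(C, r_prev, n(d), 1);              % last core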

Page 17

Hierarchical Tucker decomposition

[L. Grasedyck, SIMAX, 2010]

• Similar properties to the TT decomposition
• Computation order: bottom → top

[Figure: hierarchical Tucker structure — a tree of hidden variables with the observed variables at the leaves]

Page 18

Potential advantages of tensor approach

• Real data are often multi-way

• Provides higher-level view

• Flexibility: different ranks in each mode (Tucker)

• Uniqueness: CP, block term decomposition

• No curse of dimensionality: tensor train, hierarchical Tucker

Page 19

Outline

Tensors

Random variables and graphical models

Tractable representations

Structure learning

Page 20

Structure learning

• Given: (samples of) observed variables

• Assumption: the variables can be connected via hidden variables in a tree structure in a meaningful way

• Find: the tree / the relationships between the variables

• Additional difficulty: unknown number of hidden states

[Figure: candidate latent tree structures connecting the observed variables (e.g. $X_3$, $X_5$, $X_2$, $X_1$, ...) via hidden variables — which one is correct?]

Page 21

Quartet relationships: topologies

[Figure: the three possible quartet topologies over $X_1, X_2, X_3, X_4$, each with two hidden variables $H$ and $G$: $\{X_1, X_2 \mid X_3, X_4\}$, $\{X_1, X_3 \mid X_2, X_4\}$, and $\{X_1, X_4 \mid X_2, X_3\}$]

For the first topology:

$P(x_1, x_2, x_3, x_4) = \sum_{h,g} P(x_1|h)\, P(x_2|h)\, P(h, g)\, P(x_3|g)\, P(x_4|g)$

Page 22

Building trees based on quartet relationships

Choose 3 variables and form a tree

Add all other variables, one by one

• Split the current tree into 3 subtrees
• Choose 3 variables from different subtrees
• Resolve the quartet relation with the current and chosen variables
• Insert the current variable into a subtree or connect it to the tree

[For simplicity, assume each latent variable has 3 neighbors]

Page 23

Tensor view of quartets

[Figure: tensor network view of the quartet $\{X_1, X_2 \mid X_3, X_4\}$ — $P(X_1, X_2, X_3, X_4)$ is composed of the factors $P_{1|H}$, $P_{2|H}$, $P_{HG}$, $P_{3|G}$, $P_{4|G}$, with identity tensors $I_H$ and $I_G$ at the hidden nodes]

A = reshape(P, n^2, n^2);                      % pairing {X1,X2 | X3,X4}
B = reshape(permute(P, [1,3,2,4]), n^2, n^2);  % pairing {X1,X3 | X2,X4}
C = reshape(permute(P, [1,4,2,3]), n^2, n^2);  % pairing {X1,X4 | X2,X3}

Notation: P1|H , P2|H , etc. stand for P(X1|H), P(X2|H), etc.

Page 24

Rank properties of matrix representations

$A = \big(P_{2|H} \odot P_{1|H}\big)\ P_{HG}\ \big(P_{4|G} \odot P_{3|G}\big)^{\top}$

$B = \big(P_{3|G} \odot P_{1|H}\big)\ \operatorname{diag}\big(P_{HG}(:)\big)\ \big(P_{4|G} \odot P_{2|H}\big)^{\top}$

($\odot$: column-wise Kronecker / Khatri-Rao product)

• $\operatorname{rank}(A) = \operatorname{rank}(P_{HG}) = k$, while $\operatorname{rank}(B) = \operatorname{rank}(C) = \operatorname{nnz}(P_{HG})$, so

$\operatorname{rank}(A) \ll \operatorname{rank}(B) = \operatorname{rank}(C)$

• Sampling noise → nuclear norm relaxation:

$\|A\|_* = \sum_{i=1}^{n^2} \sigma_i(A)$

Page 25

Resolving quartet relations

Algorithm 1: $i^* = \text{Quartet}(X_1, X_2, X_3, X_4)$

1: Estimate $P(X_1, X_2, X_3, X_4)$ from a set of $m$ i.i.d. samples.
2: Unfold $P$ into matrices $A$, $B$ and $C$, and compute $a_1 = \|A\|_*$, $a_2 = \|B\|_*$ and $a_3 = \|C\|_*$.
3: Return $i^* = \arg\min_{i \in \{1,2,3\}} a_i$.

• Easy to compute

• Recovery conditions

• Finite sample guarantees

• Agnostic to the number of hidden states

• Compares favorably to alternatives
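A minimal sketch of Algorithm 1 above (MATLAB/Octave), reusing the unfoldings from the previous slide; Phat stands for the estimated joint of size n × n × n × n (the name is illustrative):

    function istar = quartet(Phat)
        % Nuclear-norm quartet test: pick the pairing whose unfolding has the
        % smallest nuclear norm (sum of singular values).
        n = size(Phat, 1);
        A = reshape(Phat, n^2, n^2);                          % pairing {X1,X2 | X3,X4}
        B = reshape(permute(Phat, [1 3 2 4]), n^2, n^2);      % pairing {X1,X3 | X2,X4}
        C = reshape(permute(Phat, [1 4 2 3]), n^2, n^2);      % pairing {X1,X4 | X2,X3}
        a = [sum(svd(A)), sum(svd(B)), sum(svd(C))];          % a1, a2, a3
        [~, istar] = min(a);                                  % i* = arg min_i a_i
    end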

Page 26

Example: stock data

Given: stock prices (25 years, discretized into 10 values)

Find: relations between stocks

Finance:
• C (Citigroup)
• JPM (JPMorgan Chase)
• AXP (American Express)
• F (Ford Motor: Automotive and Financial Services)

Retailers:
• TGT (Target)
• WMT (WalMart)
• RSH (RadioShack)

Page 27

Conclusions

• Tensor decompositions are related to graphical models

• A common goal: tractable representations

• Tensors can be used for structure learning

Page 28

Thank you!

[email protected]
