
Tensors and graphical models

Mariya Ishteva with Haesun Park, Le Song

Dept. ELEC, VUB; Georgia Tech, USA

INMA Seminar, May 7, 2013, LLN

Outline

Tensors

Random variables and graphical models

Tractable representations

Structure learning


Tensors

A ∈ R^{M×N×P} (a third-order tensor)

Ranks

• Multilinear rank (R1, R2, R3)

• Rank R

Rank-1 tensor: outer product of vectors, A(i, j, k) = u(i) v(j) w(k)

R = min r   such that   A = ∑_{i=1}^{r} {rank-1 tensor}_i

Matrix representations of tensors

• Mode-1 unfolding: A(1)
• Mode-2 unfolding: A(2)
• Mode-3 unfolding: A(3)
• Multilinear rank: (rank(A(1)), rank(A(2)), rank(A(3))); see the sketch below
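As a concrete illustration (not from the slides; the sizes and the column-major unfolding convention are assumptions), the three unfoldings and the multilinear rank of a small tensor can be computed in MATLAB with permute and reshape:

% Mode-n unfoldings of a 3-way tensor (illustrative sketch)
M = 3; N = 4; P = 5;
A = randn(M, N, P);

A1 = reshape(A, M, N*P);                    % mode-1 unfolding A(1), size M x (N*P)
A2 = reshape(permute(A, [2 1 3]), N, M*P);  % mode-2 unfolding A(2), size N x (M*P)
A3 = reshape(permute(A, [3 1 2]), P, M*N);  % mode-3 unfolding A(3), size P x (M*N)

mlrank = [rank(A1), rank(A2), rank(A3)]     % multilinear rank (R1, R2, R3)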

Tensor-matrix multiplication

• Tensor-matrix product

• Contraction: for A ∈ R^{I×J×M} and B ∈ R^{K×L×M},

  C = 〈A, B〉_3,   C(i, j, k, l) = ∑_{m=1}^{M} a_{ijm} b_{klm},

  a 4th-order tensor C ∈ R^{I×J×K×L} (sketch below)
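A minimal sketch of this contraction (sizes are assumptions), using the fact that summing over the shared mode is an ordinary matrix product of two unfoldings:

% Contraction <A, B>_3 over the shared third mode (illustrative sketch)
I = 2; J = 3; K = 4; L = 5; M = 6;
A = randn(I, J, M);
B = randn(K, L, M);

% C(i,j,k,l) = sum_m A(i,j,m) * B(k,l,m), a 4th-order tensor of size I x J x K x L
C = reshape(reshape(A, I*J, M) * reshape(B, K*L, M).', [I, J, K, L]);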

Basic decompositions

Singular value decomposition (SVD)

MLSVD / HOSVD

CP / CANDECOMP / PARAFAC
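The slide only names these decompositions; as a rough MLSVD/HOSVD sketch (an illustration with assumed sizes, not the speaker's implementation), the factor matrices are left singular vectors of the three unfoldings and the core follows by multiplying the tensor with their transposes in each mode:

% MLSVD / HOSVD of a 3-way tensor via SVDs of its unfoldings (no truncation)
A = randn(4, 5, 6);
[I, J, K] = size(A);
[U1, ~, ~] = svd(reshape(A, I, J*K), 'econ');
[U2, ~, ~] = svd(reshape(permute(A, [2 1 3]), J, I*K), 'econ');
[U3, ~, ~] = svd(reshape(permute(A, [3 1 2]), K, I*J), 'econ');

% Core tensor S = A x_1 U1' x_2 U2' x_3 U3', computed mode by mode via unfoldings
S = reshape(U1' * reshape(A, I, J*K), [I, J, K]);
S = permute(reshape(U2' * reshape(permute(S, [2 1 3]), J, I*K), [J, I, K]), [2 1 3]);
S = permute(reshape(U3' * reshape(permute(S, [3 1 2]), K, I*J), [K, I, J]), [2 3 1]);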


Outline

Tensors

Random variables and graphical models

Tractable representations

Structure learning


Discrete random variables

• Random variable X with values 1, . . . , n and probabilities Px(1), . . . , Px(n); Px ∈ R^n, entries in [0, 1] (Px ∈ R^n_+)

• X1, X2; P(X1, X2), P12 ∈ R^{n×n}

           x2 = 1       · · ·    x2 = n
x1 = 1   P12(1, 1)    · · ·    P12(1, n)
  ...        ...                   ...
x1 = n   P12(n, 1)    · · ·    P12(n, n)

• P(x1, x2) := P(X1 = x1,X2 = x2)


2 random variables

X1, X2; P(X1, X2), P12 ∈ R^{n×n}

X1 ⊥ X2

P(x1, x2) = P(x1) P(x2)  →  rank-1 matrix

[Figure: latent variable H with observed children X1 and X2]

P(x1, x2) = ∑_h P(x1|h) P(x2|h) P(h)  →  low-rank matrix (rank k < n)

Conditional probability tables (CPTs) P(X1|H),P(X2|H)
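A small numerical sketch of this factorization (the state counts and variable names below are assumptions, not from the slides): the joint matrix equals P1|H · diag(P(h)) · P2|H^⊤ and therefore has rank at most k.

% Latent variable model with two observed variables -> rank-k joint matrix
n = 10; k = 3;                                % observed / hidden state counts (assumed)
Ph  = rand(k, 1);  Ph  = Ph / sum(Ph);        % P(h)
P1H = rand(n, k);  P1H = P1H ./ sum(P1H, 1);  % columns: P(X1 | H = h)
P2H = rand(n, k);  P2H = P2H ./ sum(P2H, 1);  % columns: P(X2 | H = h)

P12 = P1H * diag(Ph) * P2H';                  % P(x1, x2) = sum_h P(x1|h) P(x2|h) P(h)
rank(P12)                                     % at most k (here 3), while the matrix is 10 x 10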


3 random variables

X1, X2, X3; P(X1, X2, X3), P123 ∈ R^{n×n×n}

X1,X2,X3 independent

P(x1, x2, x3) = P(x1) P(x2) P(x3)  →  rank-1 tensor

[Figure: latent variable H with observed children X1, X2 and X3]

P(x1, x2, x3) = ∑_h P(x1|h) P(x2|h) P(x3|h) P(h)  →  rank-k tensor, k < n
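The same construction in the 3-variable case (again an illustrative sketch with assumed sizes) builds the joint as a sum of k rank-1 tensors, i.e. a CP decomposition with nonnegative factors:

% Latent variable model with three observed variables -> rank-k joint tensor
n = 10; k = 3;                                % assumed state counts
Ph  = rand(k, 1);  Ph  = Ph / sum(Ph);        % P(h)
P1H = rand(n, k);  P1H = P1H ./ sum(P1H, 1);  % P(X1 | H)
P2H = rand(n, k);  P2H = P2H ./ sum(P2H, 1);  % P(X2 | H)
P3H = rand(n, k);  P3H = P3H ./ sum(P3H, 1);  % P(X3 | H)

P123 = zeros(n, n, n);
for h = 1:k
    % weighted rank-1 term P(h) * P(.|h) o P(.|h) o P(.|h)
    T = reshape(kron(P3H(:, h), kron(P2H(:, h), P1H(:, h))), [n, n, n]);
    P123 = P123 + Ph(h) * T;
end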

4 random variables

• X1, X2, X3, X4; P(X1, X2, X3, X4), P1234 ∈ R^{n×n×n×n}

• X1,X2,X3,X4 independent

• [Figure: latent variable H with observed children X1, X2, X3 and X4]

  P(x1, x2, x3, x4) = ∑_h P(x1|h) P(x2|h) P(x3|h) P(x4|h) P(h)

• more variables

• more hidden variables


Challenges

• 10 variables, 10 states each −→ 10^10 entries

• We need tractable representations
• Latent variable models / low-rank factors
• # parameters: exponential −→ polynomial (rough count below)
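For a rough sense of the gap (the exact count depends on the chosen structure, so this is only an illustration): the full joint table over 10 variables with 10 states each has 10^10 entries, whereas a latent tree whose hidden variables also take 10 states needs only a 10 × 10 conditional probability table per edge, and a tree on 10 leaves has on the order of 20 edges, i.e. roughly 2000 parameters instead of 10 billion.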

[Figure: latent tree model, hidden variables connecting the observed variables]

• Challenges:
  • Choose a good representation ✓
  • Learn the correct structure ✓
  • Estimate the parameters ✗


Outline

Tensors

Random variables and graphical models

Tractable representations

Structure learning


Tensors and graphical models

• CP / CANDECOMP / PARAFAC  ↔  a single hidden variable H with observed children X1, X2, . . . , Xn
• Tensor train  ↔  hidden Markov model (HMM): hidden chain H1, H2, H3, . . . , Hn with observations X1, X2, X3, . . . , Xn
• Hierarchical Tucker  ↔  latent tree model
• Tucker / MLSVD, block term decomposition  ↔  ✗ (no graphical-model counterpart shown)

Tensor train (TT) decomposition

A(i1, . . . , id) = ∑_{α0, . . . , αd} G1(α0, i1, α1) G2(α1, i2, α2) · · · Gd(αd−1, id, αd)

[I. V. Oseledets, SIAM J. Scientific Computing, 2011]

• Avoids the curse of dimensionality
• Small number of parameters compared to the Tucker model
• Slightly more parameters than CP, but more stable
• Gk(αk−1, ik, αk) has dimensions rk−1 × nk × rk, with r0 = rd = 1
• rk are called compression ranks:
  Ak = A(i1, . . . , ik ; ik+1, . . . , id), rank(Ak) = rk
• Computation based on SVD (sketch below)
• Computation: top → bottom

[Figure: tensor-train graphical model, a hidden chain H1, H2, H3, . . . , Hn with observed variables X1, X2, X3, . . . , Xn]
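To make the "computation based on SVD" bullet concrete, here is a rough TT-SVD sketch in the spirit of Oseledets' algorithm (the sizes, the rank-selection rule and the variable names are assumptions):

% TT-SVD sketch: sequential SVDs of reshaped remainders (illustrative)
n = [4 5 6 7];                     % mode sizes n1..nd (assumed)
A = randn(n);                      % tensor to decompose
d = numel(n);
G = cell(1, d);                    % TT cores G_k of size r_{k-1} x n_k x r_k
C = reshape(A, n(1), []);          % first mode vs the rest
r = 1;                             % r0 = 1
for k = 1:d-1
    C = reshape(C, r * n(k), []);  % group (alpha_{k-1}, i_k) against the remaining modes
    [U, S, V] = svd(C, 'econ');
    rk = rank(S);                  % compression rank (could also truncate by a tolerance)
    U = U(:, 1:rk);  S = S(1:rk, 1:rk);  V = V(:, 1:rk);
    G{k} = reshape(U, r, n(k), rk);
    C = S * V';                    % carry the remainder to the next mode
    r = rk;
end
G{d} = reshape(C, r, n(d), 1);     % last core, rd = 1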


Hierarchical Tucker decomposition

[L. Grasedyck, SIMAX, 2010]

• Similar properties to the TT decomposition
• Computation: bottom → top

[Figure: hierarchical Tucker graphical model, a tree of hidden variables with the observed variables as leaves]

Potential advantages of tensor approach

• Real data are often multi-way

• Provides higher-level view

• Flexibility: different ranks in each mode: Tucker

• Uniqueness: CP, Block term decomposition

• No curse of dimensionality: Tensor train, hierarch. Tucker


Outline

Tensors

Random variables and graphical models

Tractable representations

Structure learning


Structure learning

• Given: (samples of) observed variables

• Assumption: the variables can be connected via hidden variables in a tree structure in a meaningful way

• Find: the tree / the relationships between the variables

• Additional difficulty: unknown number of hidden states

[Figure: candidate latent trees connecting the observed variables (X1, X2, X3, X5, . . .) through hidden nodes; which tree is the right one?]

Quartet relationships: topologies

[Figure: the three possible quartet topologies over X1, X2, X3, X4, each with two hidden nodes H and G: {X1, X2 | X3, X4}, {X1, X3 | X2, X4} and {X1, X4 | X2, X3}]

P(x1, x2, x3, x4) = ∑_{h,g} P(x1|h) P(x2|h) P(h, g) P(x3|g) P(x4|g)

Building trees based on quartet relationships

Choose 3 variables and form a tree

Add all other variables, one by one

• Split the current tree into 3 subtrees
• Choose 3 variables from different subtrees
• Resolve the quartet relation with the current and chosen variables
• Insert the current variable in a subtree or connect it to the tree

[For simplicity, assume each latent variable has 3 neighbors]


Tensor view of quartets

[Figure: quartet topology {X1, X2 | X3, X4} with hidden nodes H and G]

P(X1, X2, X3, X4) = [tensor-network figure: P1|H and P2|H attached to a diagonal tensor IH, P3|G and P4|G attached to a diagonal tensor IG, with IH and IG linked through PHG]

% Matrix unfoldings of the n x n x n x n joint probability table P:
A = reshape(P, n^2, n^2);                        % groups (X1, X2) vs (X3, X4)
B = reshape(permute(P, [1, 3, 2, 4]), n^2, n^2); % groups (X1, X3) vs (X2, X4)
C = reshape(permute(P, [1, 4, 2, 3]), n^2, n^2); % groups (X1, X4) vs (X2, X3)

Notation: P1|H , P2|H , etc. stand for P(X1|H), P(X2|H), etc.


Rank properties of matrix representations

A = (P2|H ⊙ P1|H) PHG (P4|G ⊙ P3|G)^⊤

B = (P3|G ⊗ P1|H) diag(PHG(:)) (P4|G ⊗ P2|H)^⊤

(⊙: Khatri–Rao, i.e. columnwise Kronecker, product; ⊗: Kronecker product)

• rank(A) = rank(PHG) = k,   rank(B) = rank(C) = nnz(PHG)

  rank(A) ≪ rank(B) = rank(C)

• Sampling noise → nuclear norm relaxation:

  ‖A‖∗ = ∑_{i=1}^{n^2} σi(A)

Resolving quartet relations

Algorithm 1 i∗ = Quartet(X1, X2, X3, X4)

1: Estimate P(X1, X2, X3, X4) from a set of m i.i.d. samples.
2: Unfold P into matrices A, B and C, and compute a1 = ‖A‖∗, a2 = ‖B‖∗ and a3 = ‖C‖∗.
3: Return i∗ = arg min_{i∈{1,2,3}} ai.

• Easy to compute

• Recovery conditions

• Finite sample guarantees

• Agnostic to the number of hidden states

• Compares favorably to alternatives
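As an illustrative sanity check of this quartet test (all sizes, variable names and the random CPTs below are assumptions, and the exact joint is used in place of samples), one can build a quartet with topology {X1, X2 | X3, X4} and verify that the corresponding unfolding tends to have the smallest nuclear norm:

% Synthetic quartet with topology {X1, X2 | X3, X4} (illustrative sketch)
n = 5; k = 2;                                   % observed / hidden state counts (assumed)
normcols = @(M) M ./ sum(M, 1);                 % normalize columns to sum to 1
P1H = normcols(rand(n, k));  P2H = normcols(rand(n, k));   % P(X1|H), P(X2|H)
P3G = normcols(rand(n, k));  P4G = normcols(rand(n, k));   % P(X3|G), P(X4|G)
PHG = rand(k, k);  PHG = PHG / sum(PHG(:));     % joint P(H, G)

% Exact joint P(x1, x2, x3, x4) for this topology
P = zeros(n, n, n, n);
for h = 1:k
  for g = 1:k
    T = reshape(kron(P4G(:, g), kron(P3G(:, g), kron(P2H(:, h), P1H(:, h)))), [n n n n]);
    P = P + PHG(h, g) * T;
  end
end

% The three unfoldings from the previous slide and their nuclear norms
A = reshape(P, n^2, n^2);
B = reshape(permute(P, [1, 3, 2, 4]), n^2, n^2);
C = reshape(permute(P, [1, 4, 2, 3]), n^2, n^2);
nuc = @(M) sum(svd(M));
[~, istar] = min([nuc(A), nuc(B), nuc(C)])      % expected: istar = 1 under the recovery conditions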


Example: stock data

Given: stock prices (25 years, discretized into 10 values)

Find: relations between stocks

Finance:
• C (Citigroup)
• JPM (JPMorgan Chase)
• AXP (American Express)
• F (Ford Motor: Automotive and Financial Services)

Retailers:
• TGT (Target)
• WMT (WalMart)
• RSH (RadioShack)

Conclusions

• Tensor decompositions are related to graphical models

• A common goal: tractable representations

• Tensors can be used for structure learning


Thank you!

mariya.ishteva@vub.ac.be
