Transcript of "Learning with matrix and tensor based models using low-rank penalties"

Page 1

Learning with matrix and tensor based models using low-rank penalties

Johan Suykens

KU Leuven, ESAT-SCD/SISTA, Kasteelpark Arenberg 10

B-3001 Leuven (Heverlee), Belgium
Email: [email protected]

http://www.esat.kuleuven.be/scd/

Nonsmooth optimization in machine learning, Liege, March 4 2013

(joint work with Marco Signoretto, Quoc Tran Dinh, Lieven De Lathauwer)

Page 2

Learning with matrices and tensors

neuroscience: EEG data

(time samples × frequency × electrodes)

computer vision: image (/video) compression/completion/· · · (pixel × illumination × expression × · · ·)

web mining: analyzing user behavior

(users × queries × webpages)

data vector $x$ $\longrightarrow$ data matrix $X$ $\longrightarrow$ data tensor $\mathcal{X}$

vector model: $y = w^T x$ $\longrightarrow$ matrix model: $y = \langle W, X \rangle$ $\longrightarrow$ tensor model: $y = \langle \mathcal{W}, \mathcal{X} \rangle$

[Signoretto M., Tran Dinh Q., De Lathauwer L., Suykens J.A.K., "Learning with Tensors: a Framework Based on Convex Optimization and Spectral Regularization", 2011]

Page 3

Overview

• Sparsity

• Matrix completion and tensor completion

• Learning with matrices and low rank penalty

• Learning with tensors

• Optimization algorithms

Page 4

Learning with matrices and tensors

data vector $x$ $\longrightarrow$ data matrix $X$ $\longrightarrow$ data tensor $\mathcal{X}$

vector model: $y = w^T x$ $\longrightarrow$ matrix model: $y = \langle W, X \rangle$ $\longrightarrow$ tensor model: $y = \langle \mathcal{W}, \mathcal{X} \rangle$

Page 5

Sparsity in machine learning

• through the loss function: model $y = \sum_i \alpha_i K(x, x_i) + b$

$\min\; w^T w + \gamma \sum_i L(e_i) \quad \Rightarrow \quad \text{sparse } \alpha$

• through regularization: model $y = w^T x + b$

$\min\; \sum_j |w_j| + \gamma \sum_i e_i^2 \quad \Rightarrow \quad \text{sparse } w$

(figure: loss function that is zero on the interval $[-\varepsilon, +\varepsilon]$)

Page 6

Sparsity (1)

• Underdetermined linear system:

$Ax = b, \quad A \in \mathbb{R}^{n \times m},\; n < m$

• Minimum norm solution:

$\min_x \|x\|_2^2 \;\text{ s.t. }\; Ax = b \quad \Rightarrow \quad x = A^T (A A^T)^{-1} b$

• Sparsest solution:

$(P_0) \quad \min_x \|x\|_0 \;\text{ s.t. }\; Ax = b \qquad (\text{with } \|x\|_0 = \#\{i : x_i \neq 0\})$

• Alternatives: $\ell_p$-norms $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$

$(P_p) \quad \min_x \|x\|_p \;\text{ s.t. }\; Ax = b$

Nonconvex for 0 < p < 1, convex for p = 1.
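
To make the contrast concrete, here is a small numerical sketch (an addition to this transcript, not from the slides): it computes the minimum $\ell_2$-norm solution in closed form and solves $(P_1)$ as a linear program via the standard splitting $x = x^+ - x^-$ using scipy; the problem sizes and the 3-sparse ground truth are made-up test data.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 20, 50                                  # underdetermined: n < m
A = rng.standard_normal((n, m))
x0 = np.zeros(m)
x0[rng.choice(m, 3, replace=False)] = rng.standard_normal(3)
b = A @ x0                                     # b comes from a 3-sparse ground truth

# minimum l2-norm solution: x = A^T (A A^T)^{-1} b  (typically dense)
x_l2 = A.T @ np.linalg.solve(A @ A.T, b)

# (P1): min ||x||_1 s.t. Ax = b, posed as an LP with x = xp - xm, xp, xm >= 0
res = linprog(c=np.ones(2 * m), A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * m))
x_l1 = res.x[:m] - res.x[m:]

print("nonzeros  l2:", np.sum(np.abs(x_l2) > 1e-6),
      " l1:", np.sum(np.abs(x_l1) > 1e-6))
```

Typically the $\ell_1$ solution recovers the sparse ground truth here, while the minimum-norm solution has all entries nonzero.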

Page 7

Sparsity (2)

• Mutual coherence: $\mu(A) = \max_{1 \le k, j \le m,\; k \ne j} \dfrac{|a_k^T a_j|}{\|a_k\|_2 \|a_j\|_2}$

For a full rank $A \in \mathbb{R}^{n \times m}$, $n < m$: if a solution $x$ exists satisfying

$\|x\|_0 < \dfrac{1}{2}\left(1 + \dfrac{1}{\mu(A)}\right)$

it is the unique solution of both $(P_1)$ and $(P_0)$.

• Restricted Isometry Property (RIP): a matrix $A \in \mathbb{R}^{n \times m}$ has, by definition, RIP$(\delta, k)$ if each submatrix $A_I$ (formed by combining at most $k$ columns of $A$) has its nonzero singular values bounded between $1 - \delta$ and $1 + \delta$.

A matrix $A$ with RIP$(0.41, 2k)$ implies that $(P_1)$ and $(P_0)$ have identical solutions on all $k$-sparse vectors.

[Bruckstein et al., SIAM Review, 2009; Candes & Tao, 2005; Donoho & Elad, 2003; ...]
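
The mutual coherence is straightforward to compute; the following sketch (my illustration, with a made-up random $A$) evaluates $\mu(A)$ and the resulting uniqueness bound on $\|x\|_0$.

```python
import numpy as np

def mutual_coherence(A):
    """mu(A) = max over column pairs k != j of |a_k^T a_j| / (||a_k|| ||a_j||)."""
    G = A / np.linalg.norm(A, axis=0)   # columns normalized to unit length
    C = np.abs(G.T @ G)                 # absolute inner products of all column pairs
    np.fill_diagonal(C, 0.0)            # exclude the diagonal (k = j)
    return C.max()

A = np.random.default_rng(0).standard_normal((20, 50))
mu = mutual_coherence(A)
print(f"mu(A) = {mu:.3f};  unique if ||x||_0 < {0.5 * (1 + 1 / mu):.2f}")
```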

Page 8

Learning with matrices and tensors

data vector $x$ $\longrightarrow$ data matrix $X$ $\longrightarrow$ data tensor $\mathcal{X}$

vector model: $y = w^T x$ $\longrightarrow$ matrix model: $y = \langle W, X \rangle$ $\longrightarrow$ tensor model: $y = \langle \mathcal{W}, \mathcal{X} \rangle$

Page 9

Matrix completion: example

Given image (80 % missing entries)

[experiments by M. Signoretto]

Page 10

Matrix completion: example

Given image (80 % missing entries) and completed image

[experiments by M. Signoretto]

Page 11

Matrix completion: example

Given image (40 % missing entries)

[experiments by M. Signoretto]

Page 12

Matrix completion: example

Given image (40 % missing entries) and completed image

[experiments by M. Signoretto]

Page 13

Matrix completion: example

Original image

Page 14

Matrix completion (1)

Given: matrix $X$ with missing entries
Goal: complete the missing entries
Assumption: $X$ has low rank

$\min_X \|X\|_* \quad \text{subject to} \quad X_{ij} = Y_{ij},\; (i,j) \in S$

• given values $Y_{ij}$ with $(i,j) \in S$, a subset of all entries of the matrix

• nuclear norm $\|X\|_* = \sum_i \sigma_i$ with $\sigma_i$ the singular values of $X$ (singular value decomposition: $X = \sum_i \sigma_i u_i v_i^T$)

• $\|X\|_*$ is the convex envelope of $\mathrm{rank}\, X$ on $\{X : \|X\| \le 1\}$ [Fazel, 2002]

• $\|X\| \le \|X\|_F \le \|X\|_* \le \sqrt{r}\, \|X\|_F \le r\, \|X\|$ [Recht et al., 2010]

Page 15

Matrix completion (2)

This can be written as a semidefinite program (SDP):

$\min_{X, W_1, W_2} \; \mathrm{tr}(W_1) + \mathrm{tr}(W_2) \quad \text{subject to} \quad X_{ij} = Y_{ij},\; (i,j) \in S, \quad \begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0$

At the matrix level, the nuclear norm plays a role analogous to that of the $\ell_1$ norm for vectors.

[Fazel et al., 2001; Candes & Recht, 2009]
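
For small problems this SDP can be handed to a general-purpose solver; below is a minimal sketch with cvxpy (my formulation of the slide's program, not the speaker's code), with placeholder data Y and index set S. An SDP-capable backend such as SCS is assumed.

```python
import numpy as np
import cvxpy as cp

m, n = 5, 4
Y = np.arange(20.0).reshape(m, n)            # placeholder data
S = [(0, 0), (1, 2), (3, 3), (4, 1)]         # observed entries

X  = cp.Variable((m, n))
W1 = cp.Variable((m, m), symmetric=True)
W2 = cp.Variable((n, n), symmetric=True)
M  = cp.bmat([[W1, X], [X.T, W2]])           # block matrix of the SDP

constraints = [M >> 0] + [X[i, j] == Y[i, j] for (i, j) in S]
prob = cp.Problem(cp.Minimize(cp.trace(W1) + cp.trace(W2)), constraints)
prob.solve()
print("||X||_* =", 0.5 * prob.value)         # tr(W1) + tr(W2) = 2*||X||_* at optimum
```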

Page 16

Matrix completion: RIP property

• Consider:

$(P_0): \; \min \mathrm{rank}\, X \;\text{ s.t. }\; \mathcal{A}(X) = b$

$(P_1): \; \min \|X\|_* \;\text{ s.t. }\; \mathcal{A}(X) = b$

• $r$-restricted isometry constant: the smallest number $\delta_r(\mathcal{A})$ such that

$1 - \delta_r(\mathcal{A}) \le \dfrac{\|\mathcal{A}(X)\|}{\|X\|_F} \le 1 + \delta_r(\mathcal{A})$

holds for all $X$ of rank at most $r$, with $\mathcal{A} : \mathbb{R}^{m \times n} \to \mathbb{R}^p$ a linear map.

• Suppose that $\delta_{2r} < 1$ for an integer $r \ge 1$. Then the solution to $(P_0)$ is the only matrix of rank at most $r$ satisfying $\mathcal{A}(X) = b$.

• Suppose that $r \ge 1$ is such that $\delta_{5r} < 1/10$. Then the solution to $(P_1)$ equals the solution to $(P_0)$.

[Recht, Fazel, Parrilo, Siam Rev, 2010]

Page 17

Tensor completion

Given: $N$-th order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ with missing entries
Goal: complete the missing entries
Assumption: $\mathcal{X}$ has low rank

$\min_{\mathcal{X}} \|\mathcal{X}\|_* \quad \text{subject to} \quad \mathcal{X}_{i_1 i_2 \ldots i_N} = \mathcal{Y}_{i_1 i_2 \ldots i_N},\; (i_1, i_2, \ldots, i_N) \in S$

with

• given entries $\mathcal{Y}_{i_1 i_2 \ldots i_N}$ with $(i_1, i_2, \ldots, i_N) \in S$, a subset of the tensor entries

• nuclear norm $\|\mathcal{X}\|_* = \frac{1}{N} \sum_{n \in \mathbb{N}_N} \|X_{\langle n \rangle}\|_*$ with $X_{\langle n \rangle}$ the $n$-th mode matrix unfolding

[Signoretto M., Van De Plas R., De Moor B., Suykens J.A.K., IEEE-SPL, 2011; Gandy et al., 2011; Tomioka et al., 2011]
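
A sketch (my addition) of this overloaded nuclear norm in numpy: unfold the tensor along each mode and average the matrix nuclear norms of the unfoldings.

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding: the n-mode vectors become the columns."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def tensor_nuclear_norm(A):
    """(1/N) * sum over the N modes of the nuclear norm of the unfolding."""
    return sum(np.linalg.svd(unfold(A, n), compute_uv=False).sum()
               for n in range(A.ndim)) / A.ndim

A = np.random.default_rng(0).standard_normal((3, 4, 5))
print(tensor_nuclear_norm(A))
```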

Page 18

Mass spectral imaging - digital staining

Data tensor: 51 × 34 pixels × 6490 variables/spectrum
Given a partial labelling (4 classes), SVM prediction on all pixels

(panel labels: cerebellar cortex, Ammon's horn section of hippocampus, caudate-putamen, lateral ventricle area)

[Luts J., Ojeda F., Van de Plas R., De Moor B., Van Huffel S., Suykens J.A.K., ACA 2010]

Page 19

Tensor completion on mass spectral imaging

Mass spectral imaging: sagittal section mouse brain [data: E. Waelkens, R. Van de Plas]

Tensor completion using nuclear norm regularization [Signoretto et al., IEEE-SPL, 2011]

Page 20

Multichannel EEG for patient-specific seizure detection

• The electroencephalogram (EEG) measures the electrical activity of the brain and is a well-established technique in epilepsy diagnosis and monitoring.

• Automatic seizure detection would drastically decrease the workload of clinicians; EEG can provide accurate information about the onset of the seizure.

• As the seizure spreads quickly through the brain, early detection of the seizure is essential.

[Hunyadi B., Signoretto M., Van Paesschen W., Suykens J., Van Huffel S., De Vos A., Clinical Neurophysiology, 2012]

Page 21

Feature-channel matrix

Extracted features:

Time domain features:
1.-3. Number of zero crossings, max & min
4. Skewness (skew)
5. Kurtosis (kurt)
6. Root mean square amplitude (rmsa)

Frequency domain features:
7. Total power (TP)
8. Peak frequency (PF)
9.-16. Mean and normalized power in frequency bands:
delta: 1-3 Hz (D, nD), theta: 4-8 Hz (T, nT), alpha: 9-13 Hz (A, nA), beta: 14-20 Hz (B, nB)

EEG data: CHB-MIT database - scalp EEG recordings, 23 pediatric patients, 18 channels

(figure: 10 s multichannel EEG trace, 18 bipolar channels from FP1−F7 to T8−P8, time axis in seconds, amplitude scale 365 µV)

Page 22

Model with nuclear norm regularization

• Synchronization between EEG channels is a generally occurring characteristic. Representing the data in matrix form makes it possible to exploit the common information among the channels.

• Model (per patient):

$y = \langle W, X \rangle + b$

where $\langle W, X \rangle = \sum_{ij} W_{ij} X_{ij}$ with $X, W \in \mathbb{R}^{d \times p}$, $d$ the number of features and $p$ the number of channels. Classifier with decision rule $\mathrm{sign}[y]$.

• Training from given data $(X_k, y_k)_{k=1}^{N}$:

$\min_{W, b} \; \sum_{k=1}^{N} (y_k - \hat{y}_k)^2 + \mu \|W\|_*$

with $\hat{y}_k = \langle W, X_k \rangle + b$ and nuclear norm $\|W\|_* = \sum_i \sigma_i$ with singular values $\sigma_i$; the labels $\pm 1$ correspond to seizure and non-seizure epochs.
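
This training problem is a nuclear-norm regularized least-squares problem; the sketch below (my illustration, not the paper's solver) solves it by proximal gradient descent, where the prox step is singular value thresholding. Xs and ys are placeholder data, and the step size is a crude Lipschitz-based choice.

```python
import numpy as np

def svt(W, tau):
    """prox of tau*||.||_*: soft-threshold the singular values of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def train(Xs, ys, mu=1.0, iters=500):
    """min_{W,b} sum_k (y_k - <W,X_k> - b)^2 + mu*||W||_* by proximal gradient.
    Xs: (N, d, p) stack of feature-channel matrices; ys: (N,) labels in {-1,+1}."""
    N, d, p = Xs.shape
    W, b = np.zeros((d, p)), 0.0
    step = 1.0 / (2.0 * (np.sum(Xs * Xs) + N))          # crude 1/L step size
    for _ in range(iters):
        r = np.tensordot(Xs, W, axes=([1, 2], [0, 1])) + b - ys   # residuals
        W = svt(W - step * 2.0 * np.tensordot(r, Xs, axes=(0, 0)), step * mu)
        b -= step * 2.0 * r.sum()
    return W, b
```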

Page 23

Multichannel EEG for patient-specific seizure detection

[Hunyadi B., Signoretto M., Van Paesschen W., Suykens J., Van Huffel S., De Vos A., Clinical Neurophysiology, 2012]

Page 24

Learning with matrices and tensors

data vector $x$ $\longrightarrow$ data matrix $X$ $\longrightarrow$ data tensor $\mathcal{X}$

vector model: $y = w^T x$ $\longrightarrow$ matrix model: $y = \langle W, X \rangle$ $\longrightarrow$ tensor model: $y = \langle \mathcal{W}, \mathcal{X} \rangle$

Page 25

Tensors

• $N$-th order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$

• inner product: $\langle \mathcal{A}, \mathcal{B} \rangle := \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} A_{i_1 i_2 \cdots i_N} B_{i_1 i_2 \cdots i_N}$

• norm: $\|\mathcal{A}\| := \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}$

• $n$-mode vector: obtained by varying index $i_n$ and keeping the other indices fixed

• $n$-rank $\mathrm{rank}_n(\mathcal{A})$: dimension of the space spanned by the $n$-mode vectors

• rank-$(r_1, r_2, \ldots, r_N)$ tensor: tensor for which $r_n = \mathrm{rank}_n(\mathcal{A})$ for $n \in \mathbb{N}_N$

• multilinear rank: the $N$-tuple $(r_1, r_2, \ldots, r_N)$

• rank: $\mathrm{rank}(\mathcal{A}) := \min \left\{ R \in \mathbb{N} : \mathcal{A} = \sum_{r \in \mathbb{N}_R} u_r^{(1)} \otimes u_r^{(2)} \otimes \cdots \otimes u_r^{(N)},\; u_r^{(n)} \in \mathbb{R}^{I_n}\; \forall\, r \in \mathbb{N}_R,\, n \in \mathbb{N}_N \right\}$

• property: $\mathrm{rank}_n(\mathcal{A}) \le \mathrm{rank}(\mathcal{A})\; \forall n$

• special case of a matrix: $\mathrm{rank}_1(A) = \mathrm{rank}_2(A) = \mathrm{rank}(A)$

Page 28

Mode unfoldings of a tensor

• $n$-mode unfolding $A_{\langle n \rangle} \in \mathbb{R}^{I_n \times J}$ (matricization): matrix whose columns are the $n$-mode vectors, with $J := \prod_{j \in \mathbb{N}_N \setminus \{n\}} I_j$

• unfolding operator $\cdot_{\langle n \rangle} : \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N} \to \mathbb{R}^{I_n \times J}$

• refolding: $\cdot^{\langle n \rangle} : \mathbb{R}^{I_n \times J} \to \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$

• property: $\mathrm{rank}_n(\mathcal{A}) = \mathrm{rank}(A_{\langle n \rangle})$
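
In numpy, unfolding and refolding reduce to a moveaxis plus a reshape; a small sketch (my addition) that also checks the rank property numerically:

```python
import numpy as np

def unfold(A, n):
    """A_<n>: an I_n x J matrix whose columns are the n-mode vectors."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def refold(M, n, shape):
    """Inverse map: refold an I_n x J matrix into a tensor of the given shape."""
    rest = tuple(s for j, s in enumerate(shape) if j != n)
    return np.moveaxis(M.reshape((shape[n],) + rest), 0, n)

A = np.random.default_rng(0).standard_normal((3, 4, 5))
assert np.allclose(refold(unfold(A, 1), 1, A.shape), A)   # round trip
print([np.linalg.matrix_rank(unfold(A, n)) for n in range(A.ndim)])  # the n-ranks
```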

Page 29

Multilinear SVD (1)

[De Lathauwer L., De Moor B., Vandewalle J., 2000]

Page 30

Multilinear SVD (2)

• $n$-mode product $\mathcal{A} \times_n U \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_{n-1} \times J_n \times I_{n+1} \times \cdots \times I_N}$: product of the tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with the matrix $U \in \mathbb{R}^{J_n \times I_n}$

• multilinear SVD:

$\mathcal{A} = \mathcal{S} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 \cdots \times_N U^{(N)}$

with

– core tensor $\mathcal{S} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$

– $U^{(n)} \in \mathbb{R}^{I_n \times I_n}$ a matrix of $n$-mode singular vectors, i.e., left singular vectors of the $n$-mode unfolding $A_{\langle n \rangle}$ with SVD

$A_{\langle n \rangle} = U^{(n)} \mathrm{diag}(\sigma(A_{\langle n \rangle})) V^{(n)\top}$
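
A compact numpy sketch of this construction (my rendering, not reference code): take the left singular vectors of each unfolding, then form the core by $n$-mode products with their transposes.

```python
import numpy as np

def unfold(A, n):
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def nmode_product(A, U, n):
    """A x_n U: contract mode n of A against the second axis of U."""
    return np.moveaxis(np.tensordot(U, A, axes=(1, n)), 0, n)

def hosvd(A):
    """Multilinear SVD: A = S x_1 U(1) x_2 U(2) ... x_N U(N)."""
    Us = [np.linalg.svd(unfold(A, n), full_matrices=False)[0]
          for n in range(A.ndim)]
    S = A
    for n, U in enumerate(Us):
        S = nmode_product(S, U.T, n)    # core: S = A x_1 U(1)^T x_2 ...
    return S, Us

A = np.random.default_rng(0).standard_normal((3, 4, 5))
S, Us = hosvd(A)
B = S
for n, U in enumerate(Us):
    B = nmode_product(B, U, n)
assert np.allclose(A, B)                # exact reconstruction
```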

Page 31

Inductive and transductive learning

transductive learning with tensors:

• soft-completion
  data: partially specified input data tensor and matrix of target labels
  output: latent features and missing labels

• hard-completion
  data: partially specified input data tensor
  output: missing input data

inductive learning with tensors:

• data: pairs of fully specified input features and vectors of target labels
  output: models for out-of-sample evaluations of multiple tasks

[Signoretto M., Tran Dinh Q., De Lathauwer L., Suykens J.A.K., 2011]

Page 32

Inductive learning with tensors: setting

• Training data $\mathcal{D}_N = \left\{ \left( \mathcal{X}^{(n)}, y^{(n)} \right) \in \mathbb{R}^{D_1 \times D_2 \times \cdots \times D_M} \times \mathbb{R}^T : n \in \mathbb{N}_N \right\}$

$n = 1, \ldots, N$ training data; $t = 1, \ldots, T$ outputs (tasks); $M$-th order input data tensor

• Model:

$\hat{y}_t = \langle \mathcal{W}^{(t)}, \mathcal{X} \rangle + b_t, \quad t = 1, \ldots, T$

• Assumptions:

– $\mathcal{X} = \check{\mathcal{X}} + \mathcal{E}$ with $\check{\mathcal{X}}$ a rank-$(r_1, r_2, \ldots, r_M)$ tensor

– for core tensors: $\langle \mathcal{W}^{(t)}, \mathcal{X} \rangle = \langle \mathcal{S}_{\mathcal{W}^{(t)}}, \mathcal{S}_{\mathcal{X}} \rangle$; low multilinear rank in $\mathcal{W}^{(t)} = \mathcal{S}_{\mathcal{W}^{(t)}} \times_1 U_1 \times_2 U_2 \times \cdots \times_M U_M$

– target labels $y_t$ generated according to $p(y_t \mid \hat{y}_t) = 1/(1 + \exp(-y_t \hat{y}_t))$

Page 33

Inductive learning with tensors: training

• Penalized empirical risk minimization:

$\min_{\mathcal{W}, b} \; f_{\mathcal{D}_N}(\mathcal{W}, b) + \sum_{m \in \mathbb{N}_{M+1}} \lambda_m \|W_{\langle m \rangle}\|_*$

with misclassification error, e.g., based on the logistic loss:

$f_{\mathcal{D}_N} : (\mathcal{W}, b) \mapsto \sum_{n \in \mathbb{N}_N} \sum_{t \in \mathbb{N}_T} \log\left( 1 + \exp\left( -y_t^{(n)} \left( \langle \mathcal{X}^{(n)}, \mathcal{W}^{(t)} \rangle + b_t \right) \right) \right)$

• gives a predictive model, applicable to input data $\mathcal{X}$ beyond the training data; a sketch of evaluating this objective follows below
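
A sketch (my addition) of evaluating this penalized objective in numpy, stacking the $T$ task tensors $\mathcal{W}^{(t)}$ into one $(M+1)$-mode tensor so that all $M+1$ unfoldings can be penalized; the stacking convention and variable shapes are assumptions of this sketch.

```python
import numpy as np

def unfold(A, m):
    return np.moveaxis(A, m, 0).reshape(A.shape[m], -1)

def objective(W, b, Xs, Y, lams):
    """Logistic empirical risk plus sum_m lam_m * ||W_<m>||_*.
    W: (T, D1, ..., DM) stacked task tensors; Xs: (N, D1, ..., DM);
    Y: (N, T) labels in {-1, +1}; b: (T,); lams: M+1 penalty weights."""
    axes = list(range(1, Xs.ndim))
    scores = np.tensordot(Xs, W, axes=(axes, axes)) + b   # (N, T): <X(n), W(t)> + b_t
    risk = np.log1p(np.exp(-Y * scores)).sum()
    penalty = sum(lam * np.linalg.svd(unfold(W, m), compute_uv=False).sum()
                  for m, lam in enumerate(lams))
    return risk + penalty
```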

Page 34

Transductive learning with X and Y completion

Page 35

Transductive learning with tensors: setting

• Tensors $\mathcal{X} \in \mathbb{R}^{D_1 \times D_2 \times \cdots \times D_M \times N}$ and $Y = [y^{(1)} y^{(2)} \cdots y^{(N)}] \in \mathbb{R}^{T \times N}$

• Missing entries both in $\mathcal{X}$ and $Y$:

$S_{\mathcal{X}}, S_Y$: index sets of observed entries of $\mathcal{X}$ and $Y$

$\Omega_{S_{\mathcal{X}}}, \Omega_{S_Y}$: sampling operators related to the index sets

• Implicit model:

$\hat{y}_t^{(n)} = \langle \mathcal{W}^{(t)}, \mathcal{X}^{(n)} \rangle + b_t, \quad t = 1, \ldots, T$

• Assumptions:

– $\mathcal{X} = \check{\mathcal{X}} + \mathcal{E}$ with $\check{\mathcal{X}}$ a rank-$(r_1, r_2, \ldots, r_M, r_{M+1})$ tensor

– targets $y_t$ generated according to $p(y_{tn} \mid \hat{y}_{tn}) = 1/(1 + \exp(-y_{tn} \hat{y}_{tn}))$

– $\mathrm{rank}\left( \left[ X_{\langle M+1 \rangle}, Y^\top \right] \right) \le r_{M+1} \ll \min(N, J + T)$ with $J = \prod_{j \in \mathbb{N}_M} D_j$

Page 36

Transductive learning with tensors: estimation

• Estimation of $\mathcal{X}$, $Y$, $b$:

$\min_{(\mathcal{X}, Y, b) \in \mathcal{V}} \; f_{\lambda_0}(\mathcal{X}, Y, b) + \sum_{m \in \mathbb{N}_M} \lambda_m \|X_{\langle m \rangle}\|_* + \lambda_{M+1} \left\| \left[ X_{\langle M+1 \rangle}, Y^\top \right] \right\|_*$

• objective function:

– $\mathcal{V}$ has module spaces $\left( \mathbb{R}^{D_1 \times D_2 \times \cdots \times D_M \times N} \right) \times \left( \mathbb{R}^{T \times N} \right) \times \mathbb{R}^T$ and inner product $\langle (\mathcal{X}_1, Y_1, b_1), (\mathcal{X}_2, Y_2, b_2) \rangle_{\mathcal{V}} = \langle \mathcal{X}_1, \mathcal{X}_2 \rangle + \langle Y_1, Y_2 \rangle + \langle b_1, b_2 \rangle$

– objective

$f_{\lambda_0}(\mathcal{X}, Y, b) = f_x(\mathcal{X}) + \lambda_0 f_y(Y, b)$

with $f_x : \mathcal{X} \mapsto \sum_{p \in \mathbb{N}_P} l_x\left( (\Omega_{S_{\mathcal{X}}} \mathcal{X})_p, z^x_p \right)$

$f_y : (Y, b) \mapsto \sum_{q \in \mathbb{N}_Q} l_y\left( (\Omega_{S_Y}(Y + b \otimes 1_N))_q, z^y_q \right)$

(the bias $b$ is broadcast over the $N$ columns of $Y$)

– losses e.g. $l_x : (u, v) \mapsto \frac{1}{2}(u - v)^2$, $l_y : (u, v) \mapsto \log(1 + \exp(-uv))$

– $z^x, z^y$ are the vectors of observed entries

Page 37

Transductive soft completion: Olivetti faces

(figure: three Olivetti face examples; columns show the original image, the input data with missing entries, the matrix soft-completion (matrix-sc) and the tensor soft-completion (tensor-sc); for one face with true label 5, matrix-sc predicts 3 while tensor-sc predicts 5; for a face with true label 3, both predict 3)

Page 38

Inpainting color images by hard completion

(panels: original, given image, completed)

Tensor: modes 1 and 2: pixel space; mode 3: 8-bit RGB color information

Page 39

Inpainting color images by hard completion

(panels: original, given image, completed)

Page 40

Inpainting color images by hard completion

(panels: original, given image, completed)

Page 41

Inpainting color images by hard completion

(panels: original, given image, completed)

Page 42

Inpainting color images by hard completion

(panels: original, given image, completed)

Page 43

Optimization algorithm (1)

The learning problems are instances of the following convex optimization problem on an abstract vector space:

$\min_{w \in \mathcal{W}} \; f(w) + g(w) \quad \text{subject to} \quad w \in C$

with

- $f$: convex and differentiable functional; $\nabla f$ is $L_f$-Lipschitz:

$\|\nabla f(w) - \nabla f(v)\|_{\mathcal{W}} \le L_f \|w - v\|_{\mathcal{W}} \quad \forall\, w, v \in \mathcal{W}$

- $g$: convex but possibly non-differentiable functional

- $C \subseteq \mathcal{W}$: a non-empty, closed and convex set

Page 44

Optimization algorithm (2)

• Problem restatement

$\min_{w \in \mathcal{W}} h(w) = f(w) + g(w) + \delta_C(w), \qquad \delta_C : w \mapsto \begin{cases} 0, & \text{if } w \in C \\ \infty, & \text{otherwise} \end{cases}$

• Proximity operator

$x^{(t+1)} = \mathrm{prox}_{\tau h}\left( x^{(t)} \right)$

with

$\mathrm{prox}_{\tau h} : x \mapsto \arg\min_{w \in \mathcal{W}} \; h(w) + \frac{1}{2\tau} \|w - x\|^2$
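
Two standard proximity operators, for intuition (added here, not specific to the talk): the prox of the $\ell_1$ norm is elementwise soft-thresholding, and the prox of the indicator $\delta_C$ of a box $C$ is the Euclidean projection onto $C$.

```python
import numpy as np

def prox_l1(x, tau):
    """prox_{tau*||.||_1}(x): elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_indicator_box(x, lo, hi):
    """prox of delta_C for C = [lo, hi]^n: the projection onto C."""
    return np.clip(x, lo, hi)
```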

Page 45

Optimization algorithm (3)

• Operator splitting approach: split $h(w) = f(w) + g(w) + \delta_C(w)$ into $f(w) + \delta_C(w)$ and the non-smooth term $g(w)$

• Douglas-Rachford splitting:

$y^{(k)} = \arg\min_{x \in C} \; f(x) + \frac{1}{2\tau} \left\| x - w^{(k)} \right\|_{\mathcal{W}}^2 \quad \to \text{(solved inexactly)}$

$r^{(k)} = \mathrm{prox}_{\tau g}\left( 2y^{(k)} - w^{(k)} \right)$

$w^{(k+1)} = w^{(k)} + \gamma^{(k)}\left( r^{(k)} - y^{(k)} \right)$

• Projection onto $C$

Proof of convergence for the sequence $\{y^{(k)}\}_k$

Stopping criterion based on $h$

A toy instance of this iteration is sketched below.
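
The following is a toy rendering of the iteration (my sketch, not the talk's implementation), with $f$ quadratic and $C$ the nonnegative orthant so that the $y$-update has a closed form, and $g = \lambda \|\cdot\|_1$; the data vector a and weight lam are made up.

```python
import numpy as np

def douglas_rachford(prox_fC, prox_g, w0, tau=1.0, gamma=1.0, iters=200):
    """The iteration above: y-update, reflected prox of g, relaxation step."""
    w = w0.copy()
    for _ in range(iters):
        y = prox_fC(w, tau)          # argmin_{x in C} f(x) + ||x - w||^2 / (2 tau)
        r = prox_g(2 * y - w, tau)
        w = w + gamma * (r - y)
    return y

# toy instance: min_x 0.5*||x - a||^2 + lam*||x||_1  s.t.  x >= 0
a, lam = np.array([1.5, -2.0, 0.3]), 0.5
prox_fC = lambda w, t: np.maximum((w + t * a) / (1 + t), 0.0)
prox_g = lambda w, t: np.sign(w) * np.maximum(np.abs(w) - t * lam, 0.0)
print(douglas_rachford(prox_fC, prox_g, np.zeros(3)))   # approaches [1.0, 0.0, 0.0]
```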

Page 46

Note: matrix case - Singular Value Thresholding

Given a matrix $Y$, the solution to

$\min_X \; \frac{1}{2} \|X - Y\|_F^2 + \lambda \|X\|_*$

with $\lambda > 0$, is given by a shrinkage operation on the singular values of $Y$:

$\mathrm{prox}^{\mathrm{tr}}_{\lambda}(Y) = U \max(S - \lambda I, 0) V^T$

(with $Y = U S V^T$ a singular value decomposition of $Y$)

[Cai et al., 2008; Tomioka et al., 2011]
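
In numpy, a direct sketch of the formula above (my addition); the printed spectra show the singular values shifted down by $\lambda$ and floored at zero.

```python
import numpy as np

def svt(Y, lam):
    """Singular value thresholding: the prox of lam*||.||_* at Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

Y = np.random.default_rng(0).standard_normal((6, 4))
print(np.linalg.svd(Y, compute_uv=False))             # spectrum of Y
print(np.linalg.svd(svt(Y, 1.0), compute_uv=False))   # shrunk spectrum
```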

Page 47

Tensor case

• Learning problems involve the tensor modes:

$\sum_{m \in \mathbb{N}_{M+1}} \lambda_m \|W_{\langle m \rangle}\|_*$

• Consider a space $\mathcal{W}$ given by the Cartesian product $\mathcal{W}_1 \times \mathcal{W}_2 \times \cdots \times \mathcal{W}_I$ with inner product $\langle x, y \rangle = \sum_{i \in \mathbb{N}_I} \langle x_i, y_i \rangle_i$.

• Assume a function $g : \mathcal{W} \to \mathbb{R}$ defined by

$g : (x_1, x_2, \ldots, x_I) \mapsto \sum_{i \in \mathbb{N}_I} g_i(x_i)$

where for any $i \in \mathbb{N}_I$, $g_i : \mathcal{W}_i \to \mathbb{R}$ is convex. Then we have:

$\mathrm{prox}_g(x) = \left( \mathrm{prox}_{g_1}(x_1), \mathrm{prox}_{g_2}(x_2), \cdots, \mathrm{prox}_{g_I}(x_I) \right)$
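
In code, this separability means the prox on a product space is applied blockwise; a one-line sketch (my addition):

```python
def prox_product(proxes, xs):
    """prox of g(x1,...,xI) = sum_i g_i(x_i): apply each prox_{g_i} to its block.
    proxes: sequence of per-block prox functions; xs: matching sequence of blocks."""
    return tuple(p(x) for p, x in zip(proxes, xs))
```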

Page 48

Duplication - transductive learning case

• Duplication of the tensors leads to considering the set:

$C := \left\{ (\mathcal{X}_{[1]}, \mathcal{X}_{[2]}, \ldots, \mathcal{X}_{[M]}, \mathcal{X}_{[M+1]}, Y, b) \in \mathcal{W} : \mathcal{X}_{[1]} = \mathcal{X}_{[2]} = \cdots = \mathcal{X}_{[M+1]} \right\}$

• This gives the problem statement:

$\min_{(\mathcal{X}_{[1]}, \ldots, \mathcal{X}_{[M+1]}, Y, b) \in \mathcal{W}} \; f(\mathcal{X}_{[1]}, \ldots, \mathcal{X}_{[M+1]}, Y, b) + g(\mathcal{X}_{[1]}, \ldots, \mathcal{X}_{[M+1]}, Y)$

$\text{subject to} \quad (\mathcal{X}_{[1]}, \ldots, \mathcal{X}_{[M+1]}, Y, b) \in C$
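
For this consensus set $C$, the Euclidean projection replaces every duplicate by their average (a standard fact; the sketch below is my addition and assumes the duplicates are equally shaped numpy arrays).

```python
def project_consensus(Xs):
    """Projection onto {X[1] = ... = X[M+1]}: replace each copy by the mean."""
    mean = sum(Xs) / len(Xs)
    return [mean.copy() for _ in Xs]
```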

Page 49

Prox and tensor modes

We apply

$\mathrm{prox}_{\tau g}(\mathcal{X}_{[1]}, \ldots, \mathcal{X}_{[M+1]}, Y) = \left( \mathrm{prox}_{\tau \lambda_1 \|\sigma(\cdot_{\langle 1 \rangle})\|_1}(\mathcal{X}_{[1]}), \cdots, \mathrm{prox}_{\tau \lambda_M \|\sigma(\cdot_{\langle M \rangle})\|_1}(\mathcal{X}_{[M]}), Z_1, Z_2 \right)$

where $[Z_1(\mathcal{X}, Y), Z_2(\mathcal{X}, Y)]$ is a partitioning of

$Z(\mathcal{X}, Y) = U \mathrm{diag}\left( \mathrm{prox}_{\tau \lambda_{M+1} \|\sigma(\cdot)\|_1}\left( \left[ X_{\langle M+1 \rangle}, Y^\top \right] \right) \right) V^\top$

(with $U$, $V$ the singular vector matrices of $[X_{\langle M+1 \rangle}, Y^\top]$) and with

$\mathrm{prox}_{\lambda \|\sigma(\cdot_{\langle n \rangle})\|_1}(\mathcal{W}) = \left( U^{(n)} \mathrm{diag}(d_\lambda) V^{(n)\top} \right)^{\langle n \rangle}$

where $(d_\lambda)_i := \max(\sigma_i(W_{\langle n \rangle}) - \lambda, 0)$.
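
A sketch (my addition) of the per-mode shrinkage operator used above: threshold the singular values of the mode-$n$ unfolding and refold to the original tensor shape.

```python
import numpy as np

def unfold(A, n):
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def refold(M, n, shape):
    rest = tuple(s for j, s in enumerate(shape) if j != n)
    return np.moveaxis(M.reshape((shape[n],) + rest), 0, n)

def mode_svt(W, n, lam):
    """prox of lam*||sigma(.<n>)||_1: shrink the spectrum of the mode-n
    unfolding by lam, floor at zero, and refold to the original shape."""
    U, s, Vt = np.linalg.svd(unfold(W, n), full_matrices=False)
    return refold((U * np.maximum(s - lam, 0.0)) @ Vt, n, W.shape)
```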

Page 50

Conclusions

• Sparsity: from vectors to matrices, from matrices to tensors

• Transductive and inductive learning with matrices/tensors: going beyond matrix/tensor completion

• Further details: Signoretto M., Tran Dinh Q., De Lathauwer L., Suykens J.A.K., "Learning with Tensors: a Framework Based on Convex Optimization and Spectral Regularization", 2011

• Software:
https://sites.google.com/site/marcosignoretto/codes
http://www.esat.kuleuven.be/sista/ADB/software.php

Page 51

Acknowledgements

Page 52

Thank you
