On the k-Support and Related Norms - sorbonne-universite.fr
Transcript of On the k-Support and Related Norms - sorbonne-universite.fr
![Page 1: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/1.jpg)
On the k-Support and Related Norms
Massimiliano Pontil
Department of Computer ScienceCentre for Computational Statistics and Machine Learning
University College London
(Joint work with Andrew McDonald and Dimitris Stamos)
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 1 / 14
![Page 2: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/2.jpg)
Plan
Problem
Spectral regularization
k-support norm
Box norm
Link to cluster norm
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 2 / 14
![Page 3: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/3.jpg)
Problem
Learn a matrix from a set of linear measurements:
yi = 〈W ∗,Xi 〉+ noisei , i = 1, ..., n
Method
minW∈Rd×m
n∑i=1
(yi − 〈W ,Xi 〉)2 + λΩ(W )
Matrix completion: Xi = erec>
Multitask learning: Xi = erxi>
Regularizer Ω encourages matrix structure
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 3 / 14
![Page 4: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/4.jpg)
Spectral Regularization
minW∈Rd×m
n∑i=1
(yi − 〈W ,Xi 〉)2 + λΩ(W )
Ω favors matrix structure (low rank, low variance, clustering, etc.)
Choose an OI-norm: Ω(W ) ≡ ‖W ‖ = ‖UWV ‖, ∀U,V orthogonal
von Neumann (1937): ‖W ‖ = g(σ(W )), with g is an SG-function
Well studied example is trace norm: g(·) = ‖ · ‖1
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 4 / 14
![Page 5: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/5.jpg)
k-Support Norm [Argyriou et al. 2012]
Special case of group lasso with overlap [Jacob et al., 2009]
‖w‖(k) = inf
∑|J|≤k
‖vJ‖2 :∑|J|≤k
vJ = w , supp(vJ) ⊂ J
Includes the `1-norm (k = 1) and `2-norm (k = d)
Unit ball of ‖ · ‖(k) is the convex hull of card(w) ≤ k , ‖w‖2 ≤ 1
Dual norm: ‖u‖∗,(k) =
√k∑
i=1(|u|↓i )2
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 5 / 14
![Page 6: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/6.jpg)
Spectral k-Support Norm
k-support norm is an SG-function, inducing the OI-norm
‖W ‖(k) := ‖σ(W )‖(k)
Proposition. Unit ball of ‖σ(·)‖(k) is the convex hull of
rank(W ) ≤ k , ‖W ‖F ≤ 1
Includes trace norm (k = 1) and Frobenius norm (k = d)
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 6 / 14
![Page 7: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/7.jpg)
Matrix Completion Experiment
dataset norm test error r k a
ML 100k tr 0.2017 13 - -ρ = 50% en 0.2017 13 - -
ks 0.1990 9 1.87 -box 0.1989 10 2.00 1e-5
ML 1M tr 0.1790 17 - -ρ = 50% en 0.1789 15 - -
ks 0.1782 17 1.80 -box 0.1777 19 2.00 1e-6
Jester1 tr 0.1752 11 - -20 per en 0.1752 11 - -line ks 0.1739 11 6.38 -
box 0.1726 11 6.40 2e-5
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 7 / 14
![Page 8: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/8.jpg)
MTL Experiment
Table: Multitask learning clustering on Lenk dataset, with simple thresholding.
dataset norm test error k a
Lenk fr 3.7869 (0.07) - -8 per task tr 1.9058 (0.04) - -
en 1.8974 (0.04) - -ks 1.8933 (0.04) 1.02 -box 1.8916 (0.04) 1.01 5.5e-3c-fr 1.8667 (0.08) - -c-tr 1.7904 (0.03) - -c-en 1.7896 (0.03) - -c-ks 1.7775 (0.03) 1.89 -c-box 1.7754 (0.03) 1.12 9.5e-3
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 8 / 14
![Page 9: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/9.jpg)
Box Norm
Let Θ ⊆ Rd++, bounded and convex and consider the norm:
‖w‖2Θ = inf
θ∈Θ
d∑i=1
w2i
θi, ‖u‖2
∗,Θ = supθ∈Θ
d∑i=1
θiu2i
Box norm: Θ =a < θi ≤ b,
∑di=1 θi ≤ c
Includes k-support norm for a = 0, b = 1, c = k
Unit ball is the convex hull of
⋃|J|≤k
w ∈ Rd :
∑i∈J
w2i
b+∑i /∈J
w2i
a≤ 1
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 9 / 14
![Page 10: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/10.jpg)
Unit Balls
Figure: Unit balls of the box norm in R2 for k = 1, a ∈ 0.01, 0.25, 0.50.
Figure: Unit balls of the dual box norm in R2 for k = 1, a ∈ 0.01, 0.25, 0.50.
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 10 / 14
![Page 11: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/11.jpg)
Cluster Norm
Box norm is an SG-function, inducing the OI-norm
‖W ‖2Θ = ‖σ(W )‖2
Θ = inf d∑
i=1
σi (W )2
θi: θ ∈ (a, b]d ,
d∑i=1
θi ≤ c
Associated OI-norm has been used to favour task clustering [Jacob et al.
2008]. It can be written as
‖W ‖2Θ = inf
tr(WΣ−1W T ) : aI Σ bI , tr Σ ≤ c
Includes spectral k-support norm for a = 0, b = 1, c = k
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 11 / 14
![Page 12: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/12.jpg)
Interpretation of “a”
Proposition. If c = da + k(b − a), the solution of the regularizationproblem is given by W = V + Z , where
(V , Z ) = argminV ,Z
n∑i=1
(yi − 〈V + Z ,Xi 〉)2 + λ
(1
a‖V ‖2
F +1
b − a‖Z‖2
(k)
)
Parameter ‘a’ balances the relative importance of the two components
Cluster norm is the Moureau envelope of spectral k-support norm:
‖W ‖2Θ = min
Z∈Rd×m
1
a‖W − Z‖2
F +1
b − a‖Z‖2
(k)
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 12 / 14
![Page 13: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/13.jpg)
Computation of the Θ norm
Assume w.l.o.g. w ≥ 0 with non increasing components
‖w‖2Θ = 1
b‖w[1:q]‖22 + 1
c−qb−`a‖w[q+1:d−`]‖21 + 1
a‖w[`+1:d ]‖22,
where q, ` ∈ 0, ..., d are uniquely determined
In particular: ‖w‖(k) = ‖w[1:q]‖22 + 1
k−q‖w[q+1:d ]‖21
where q ∈ 0, ..., k − 1 is determined by |w |↓q ≥ 1k−q
d∑j=q+1
|w |↓j > |w |↓q+1
Computation of norm is O(d log(d))
For k-support improves previous O(kd) method
Efficient optimization using proximal-gradient methods
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 13 / 14
![Page 14: On the k-Support and Related Norms - sorbonne-universite.fr](https://reader031.fdocuments.us/reader031/viewer/2022012508/6184c0f1b28ebe404107e979/html5/thumbnails/14.jpg)
Extensions/Open Problems
Other sets Θ allow for exact prox, e.g. Θ = θ1 ≥ . . . θd > 0.Can give a general characterization?
Online learning / stochastic optimization
Kernel extensions
Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 14 / 14