
New Algorithms for Learning Incoherent and Overcomplete Dictionaries

Sanjeev Arora (Princeton), Rong Ge (Microsoft Research), Tengyu Ma (Princeton), Ankur Moitra (MIT)

ICERM Workshop, May 7

Dictionary Learning

• Simple "dictionary elements" build complicated objects

• Given the objects, can we learn the dictionary?

Why dictionary learning? [Olshausen, Field '96]

[Figure: natural image patches → dictionary learning → Gabor-like filters.]

Example: Image Completion [Mairal, Elad & Sapiro '08]

Outline

• Dictionary Learning problem

• Getting a crude estimate

• Refining the solution

Dictionary Learning Problem

• Given samples of the form Y = AX
• X is a sparse matrix
• Goal: learn A (the dictionary)
• Interesting case: m > n (overcomplete)

[Figure: Y = A·X — the columns of Y are the samples, A is the n × m dictionary whose columns are the dictionary elements, and the columns of X are the sparse combinations.]
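To make the setup concrete, here is a minimal sketch of the generative model in Python (the dimension values, the ±1 coefficients, and all variable names are illustrative assumptions, not the paper's experimental setup):

```python
# Sketch of the generative model Y = A X (illustrative parameters).
import numpy as np

rng = np.random.default_rng(0)
n, m, k, p = 64, 128, 5, 1000      # signal dim, dictionary size (m > n), sparsity, #samples

A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)     # unit-norm dictionary elements

X = np.zeros((m, p))
for t in range(p):
    support = rng.choice(m, size=k, replace=False)    # random k-sparse support
    X[support, t] = rng.choice([-1.0, 1.0], size=k)   # e.g. random-sign coefficients

Y = A @ X   # observed samples; the goal is to recover A from Y alone
```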

Previous Approach

[Figure: alternate between the dictionary A and the sparse code X.]

• Fix A, update the sparse code X: LASSO, Basis Pursuit, Matching Pursuit
• Fix X, update the dictionary A: Least Squares, K-SVD
• Together: Alternating Minimization (a sketch follows below)
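A minimal sketch of this alternating scheme, assuming a crude top-k-correlation decoder in place of LASSO/Basis Pursuit/Matching Pursuit and a plain least-squares dictionary update in place of K-SVD (all helper names are hypothetical):

```python
import numpy as np

def sparse_code(A, Y, k):
    """Crude decoder: keep the k atoms most correlated with each sample."""
    X = np.zeros((A.shape[1], Y.shape[1]))
    for t in range(Y.shape[1]):
        support = np.argsort(-np.abs(A.T @ Y[:, t]))[:k]           # top-k correlations
        coef, *_ = np.linalg.lstsq(A[:, support], Y[:, t], rcond=None)
        X[support, t] = coef                                       # least squares on the support
    return X

def update_dictionary(Y, X):
    """Least-squares dictionary update, then renormalize the columns."""
    A = Y @ np.linalg.pinv(X)
    return A / np.maximum(np.linalg.norm(A, axis=0), 1e-12)

def alternating_minimization(Y, A0, k, iters=20):
    A = A0.copy()
    for _ in range(iters):
        X = sparse_code(A, Y, k)      # fix A, solve for the sparse code
        A = update_dictionary(Y, X)   # fix X, solve for the dictionary
    return A
```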

Problem with Alternating Minimization

• Is the objective function even monotone?
• Local minimum issues

[Figure: the same alternation between the dictionary A and the sparse code X.]

Empirical Behavior

• Synthetic experiment, "qualitative" plot
• K-SVD with random samples as the initial dictionary: converges with probability 1/3, gets stuck with probability 2/3, and is slow at the beginning

[Plot: log accuracy vs. iterations, showing the converging and stuck runs.]

This Talk

Provable Algorithms

• Run in polynomial time, use polynomially many samples, learn the ground truth
• Separate modeling error from optimization error
• Design new algorithms / tweak old algorithms
• Work only on "reasonable instances"

• Consider Image Completion
• The representation should be unique and robust!

When is the solution "reasonable"?

[Figure: a dictionary and an image — when does the dictionary explain the image?]

Sparse Recovery

• Given A and y, find x

• Incoherence [Donoho, Huo '99]
  • Dictionary elements have pairwise inner products of magnitude at most μ/√n (a quick check appears below)
  • The solution is unique and robust
  • Long line of work [Logan; Donoho, Stark; Elad; …]
  • Handles sparsity up to √n
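As a sanity check, the incoherence parameter μ of a unit-norm dictionary can be computed directly (a sketch; the function name is ours):

```python
import numpy as np

def incoherence(A):
    """Return mu such that max over i != j of |<A_i, A_j>| equals mu / sqrt(n)."""
    n = A.shape[0]
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)      # ignore the diagonal <A_i, A_i> = 1
    return G.max() * np.sqrt(n)
```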

Our Results

• Thm: If the dictionary A is incoherent and X is randomly k-sparse from a "nice distribution", we can learn the dictionary A with accuracy ε when the sparsity satisfies k ≤ min{ √n/(μ·log m), m^0.4 }

• Handles sparsity up to √n
• Sample complexity O*(m/ε²)
• Independently, [Agarwal et al.] obtain a similar result with slightly different assumptions and weaker sparsity; later, [Barak et al.] obtained a stronger result using SOS

Our Results

• Thm: Given an estimated dictionary that is ε-close to the true dictionary, one iteration of K-SVD outputs an ε/2-close dictionary

• Works whenever ε < 1/log m (previous analyses required ε < 1/poly)
• Sample complexity O(m·log m)

• Combined: we can learn an incoherent dictionary with O*(m) samples in polynomial time

Outline

• Dictionary Learning problem

• Getting a crude estimate

• Refining the solution

Ideas

• Find the support of X, without knowing A

• Given the support of X, find an approximate A

Finding the Support

• Tool: test whether two columns of X intersect, via the inner product of the corresponding samples (a sketch follows below)
  • Disjoint supports ≈ small inner product
  • Intersecting supports ≈ large inner product
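A minimal sketch of the overlap test, assuming the generative model sketched earlier with random-sign coefficients; the threshold tau is illustrative and would have to be tuned to the sparsity and incoherence:

```python
import numpy as np

def supports_intersect(y1, y2, tau=0.5):
    """Guess whether the sparse codes behind two samples share an element."""
    return abs(np.dot(y1, y2)) > tau   # large inner product => likely overlap
```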

Finding the Support: Overlapping Clustering

• Vertex = sample
• Connect pairs of samples with a large inner product
• Cluster = rows of X!

Overlapping Clustering

• Main problem: the clusters overlap

• Idea: count the number of common neighbors (sketch below)
• If a pair of points shares a unique cluster, their common neighbors recover that cluster

• Many common neighbors ⇒ same cluster
• Few common neighbors ⇒ different clusters
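A sketch of the common-neighbor test on the sample graph (the thresholds are illustrative, not the paper's constants):

```python
import numpy as np

def build_graph(Y, tau=0.5):
    """Edge between two samples iff their inner product is large."""
    G = np.abs(Y.T @ Y) > tau
    np.fill_diagonal(G, False)
    return G

def same_cluster(G, s, t, min_common=10):
    """Declare s, t in a common cluster iff they are adjacent and share many neighbors."""
    return bool(G[s, t]) and int(np.logical_and(G[s], G[t]).sum()) >= min_common
```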

Estimate Dictionary Elements

• Focus on one row of X / one column of A
• Can use the SVD to find the maximum-variance direction
• Or take the samples with the same sign and average them (both sketched below)
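Both estimators in a minimal sketch (Y_cluster is assumed to hold the samples of one cluster as columns; under the model, the cluster's maximum-variance direction approximates the dictionary element):

```python
import numpy as np

def estimate_by_svd(Y_cluster):
    """Top left singular vector = maximum-variance direction of the cluster."""
    U, _, _ = np.linalg.svd(Y_cluster, full_matrices=False)
    return U[:, 0]

def estimate_by_averaging(Y_cluster, ref):
    """Flip each sample to agree in sign with a reference sample, then average."""
    signs = np.sign(Y_cluster.T @ ref)       # one sign per sample
    v = (Y_cluster * signs).mean(axis=1)
    return v / np.linalg.norm(v)
```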

Outline

• Dictionary Learning problem

• Getting a crude estimate

• Alternating Minimization works!

K-SVD [Aharon, Elad, Bruckstein '06]

• Given: a good guess Â
• Goal: find an even better dictionary
• Update one dictionary element (sketch below):
  • Take all samples that use the element
  • Decode: y ≈ Â·x̂
  • Residual: r = y − Σ_{j≠i} Â_j x̂_j = ±A_i + Σ_{j≠i} (A_j x_j − Â_j x̂_j), where the sum is the noise term
  • Use the top singular vector of the residuals
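A sketch of this single-element update, following the residual formula above; decode() stands for any per-sample sparse-recovery routine (e.g. the crude decoder sketched earlier), and we assume ±1 coefficients as in the model:

```python
import numpy as np

def update_element(Ahat, Y, i, decode, k):
    """One K-SVD-style update of dictionary element i."""
    residuals = []
    for t in range(Y.shape[1]):
        xhat = decode(Ahat, Y[:, t], k)           # decode y ≈ Ahat xhat
        if xhat[i] == 0:
            continue                              # keep only samples that use element i
        # r = y - sum over j != i of Ahat_j xhat_j  ≈  ±A_i + noise
        r = Y[:, t] - Ahat @ xhat + Ahat[:, i] * xhat[i]
        residuals.append(np.sign(xhat[i]) * r)    # align signs so r ≈ +A_i + noise
    R = np.stack(residuals, axis=1)
    U, _, _ = np.linalg.svd(R, full_matrices=False)
    return U[:, 0]                                # top singular vector of the residuals
```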

K-SVD Illustrated

[Figure: blue = true dictionary, dashed = estimated dictionary. Take all samples that use the same element and compute their residuals.]

• Hope: in the residuals the noise is small and random, and the top singular vector is robust to random noise

K-SVD: Intuition

• When the error Â_i − A_i is random:
  • Still incoherent
  • Can "decode"
  • The noise looks random

• When the error is adversarial:
  • May not be incoherent
  • The noise can be correlated

• Bad case: the errors are highly correlated, all pointing in the same direction

Making the noise "random"

• Observation: we can detect the bad case! Highly correlated errors produce a large singular value

• To handle the bad case:
  • Perturb the estimated dictionary
  • Keep the perturbation small
  • Ensure the result has low spectral norm

• min ‖B‖ subject to ‖B_i − Â_i‖ ≤ ε for all i

• This program is convex, and OPT ≤ ‖A‖, since the true dictionary A is feasible (a sketch follows below)
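The convex program can be written down directly; here is a sketch using cvxpy as an illustration (the talk specifies only the program, not an implementation):

```python
import cvxpy as cp

def low_spectral_norm_nearby(Ahat, eps):
    """min ||B|| (spectral norm) s.t. each column B_i is eps-close to Ahat_i."""
    B = cp.Variable(Ahat.shape)
    constraints = [cp.norm(B[:, i] - Ahat[:, i], 2) <= eps
                   for i in range(Ahat.shape[1])]
    cp.Problem(cp.Minimize(cp.sigma_max(B)), constraints).solve()
    return B.value   # since the true A is feasible, the optimum is at most ||A||
```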

Low spectral norm is enough

• Key Lemma: when B has small spectral norm and |⟨B_i, B_j⟩| ≤ 1/log m, random k columns of B are "almost orthogonal"
• ⇒ decoding is accurate for a random sample

• Proof sketch: consider BᵀB (numerical check below)
  • The diagonal entries are large
  • The off-diagonal entries are small in expectation
  • Concentration ⇒ a random submatrix is diagonally dominant
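A quick numerical check of the lemma (a sanity check, not a proof; the trial count is an illustrative choice):

```python
import numpy as np

def min_diag_dominance(B, k, trials=100, seed=0):
    """Over random k-column submatrices of B, return the worst diagonal-dominance
    margin of the Gram matrix; a positive value means every sampled submatrix
    is diagonally dominant, hence well conditioned ("almost orthogonal")."""
    rng = np.random.default_rng(seed)
    worst = np.inf
    for _ in range(trials):
        S = rng.choice(B.shape[1], size=k, replace=False)
        G = B[:, S].T @ B[:, S]
        off = np.abs(G - np.diag(np.diag(G))).sum(axis=1)   # off-diagonal row sums
        worst = min(worst, float((np.diag(G) - off).min()))
    return worst
```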

Conclusion & Open Problems

• K-SVD provably works from a good initialization
• Does the proof give any insight in practice?
• Whitening

• Is "the error behaves randomly" useful in other settings?
• Can we handle larger sparsity?
• Can we work with an RIP assumption?

• Lower bounds?

Thank you! Questions?


Other Applications

Image Denoising [Mairal et al. '09]

Digital Zooming [Couzinie-Devy '10]


Refining the solution

• Use the other columns to reduce the variance!
• Get ε accuracy with poly(m, n)·log(1/ε) samples