Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the...

24
Fast Algorithms for Data Recovery Erin Tripp Mathematics Department Syracuse University February 3, 2017 Erin Tripp (Mathematics Department Syracuse University) Fast Algorithms for Data Recovery February 3, 2017 1 / 17

Transcript of Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the...

Page 1: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Fast Algorithms for Data Recovery

Erin Tripp

Mathematics DepartmentSyracuse University

February 3, 2017

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 1 / 17

Page 2: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Overview

Table of Contents

1 Overview

2 Low Rank Matrix Recovery

3 Related Research

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 2 / 17

Page 3: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Overview

We look at three variations of an algorithm for recovering data fromcorrupted observations and some related work. All variations are based onmodifying one classical optimization method: gradient descent.

We also look at how we can extend these methods to more generalproblems and what performance guarantees we might see.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 3 / 17

Page 4: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Table of Contents

1 Overview

2 Low Rank Matrix Recovery

3 Related Research

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 4 / 17

Page 5: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Problem

From observed data Y ∈ Rd1×d2 where Y = M + S , we want to recoverthe low rank matrix M. When does this sort of problem come up?

Foreground background separation:Y is video data encoded into a matrix of pixel valuesM is the stable backgroundS is the foreground of moving objects

Principal component analysis:Y is the observed dataM is the “true data”S is some sort of corruption or noise

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 5 / 17

Page 6: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Problem

From observed data Y ∈ Rd1×d2 where Y = M + S , we want to recoverthe low rank matrix M. When does this sort of problem come up?

Foreground background separation:Y is video data encoded into a matrix of pixel valuesM is the stable backgroundS is the foreground of moving objects

Principal component analysis:Y is the observed dataM is the “true data”S is some sort of corruption or noise

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 5 / 17

Page 7: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Problem

From observed data Y ∈ Rd1×d2 where Y = M + S , we want to recoverthe low rank matrix M. When does this sort of problem come up?

Foreground background separation:Y is video data encoded into a matrix of pixel valuesM is the stable backgroundS is the foreground of moving objects

Principal component analysis:Y is the observed dataM is the “true data”S is some sort of corruption or noise

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 5 / 17

Page 8: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Problem

We formulate the problem of recovering M as a minimization problem:

minU,V ,S

‖UV T + S − Y ‖2F + ‖UTU − V TV ‖2

F (1)

where UV T = M.Next, we make some assumptions on M and S :

M is not near sparse

M is µ-incoherent, i.e. we place a bound on the size of its singularvectors.The more spread out the principal components of M are, the more weneed to know about it to reconstruct it.

S is sufficiently sparse

S will have at most α-fraction nonzero entries.For example, if α = 0.1, only 10% of the entries of S will be nonzero.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 6 / 17

Page 9: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Problem

We formulate the problem of recovering M as a minimization problem:

minU,V ,S

‖UV T + S − Y ‖2F + ‖UTU − V TV ‖2

F (1)

where UV T = M.Next, we make some assumptions on M and S :

M is not near sparse

M is µ-incoherent, i.e. we place a bound on the size of its singularvectors.The more spread out the principal components of M are, the more weneed to know about it to reconstruct it.

S is sufficiently sparse

S will have at most α-fraction nonzero entries.For example, if α = 0.1, only 10% of the entries of S will be nonzero.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 6 / 17

Page 10: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Problem

We formulate the problem of recovering M as a minimization problem:

minU,V ,S

‖UV T + S − Y ‖2F + ‖UTU − V TV ‖2

F (1)

where UV T = M.Next, we make some assumptions on M and S :

M is not near sparse

M is µ-incoherent, i.e. we place a bound on the size of its singularvectors.The more spread out the principal components of M are, the more weneed to know about it to reconstruct it.

S is sufficiently sparse

S will have at most α-fraction nonzero entries.For example, if α = 0.1, only 10% of the entries of S will be nonzero.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 6 / 17

Page 11: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Algorithm

In 2016, a team of engineers from the University of Texas at Austin1 andCornell University2 introduced fast algorithms[1] to solve this optimizationproblem in both the fully observed and partially observed cases.

Sketch of Algorithm:

1 Produce an initial sparse estimate S0 from Y

2 Compute the rank r SVD of Y − S = [L,Σ,R]

3 Set U0 = LΣ1/2,V0 = RΣ1/2

4 Proceed by projected gradient descent on the factors Ut ,Vt

1Xinyang Yi, Dohyung Park, and Constantine Caramanis2Yudong Chen

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 7 / 17

Page 12: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Algorithm 1 RPCA via Gradient Descent

Input: observed matrix Y , target rank r , sparsity fraction α, parameters γand η, number of iterations T

procedure rpca gd(Y, r, α, params)Sinit = Tα(Y )[L,Σ,R] = SVDr (Y − Sinit)

U0 = LΣ12 , V0 = RΣ

12

U0 = ΠU (U0), V0 = ΠV(V0)for t = 0:T-1 do

St = Tγα(Y − UtVTt )

Ut+1 = ΠU (Ut − η∇UL(Ut ,Vt , St)− 12ηUt(U

Tt Ut − V T

t Vt))Vt+1 = ΠV(Vt − η∇VL(Ut ,Vt ,St)− 1

2ηVt(VTt Vt − UT

t Ut))end for

end procedure

Output: (UT ,VT )

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 8 / 17

Page 13: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Our Contribution

As presented, RPCA GD satisfies the following convergence bound:

d2(Ut ,Vt ;U∗,V ∗) ≤ (1− σ∗r η

8)td2(U0,V0;U∗,V ∗) (2)

where d2(Ut ,Vt ;U∗,V ∗) is the distance from the t-th iterate to the

optimal solution.

How to choose η?

To guarantee the above convergence, η ≤ 136σ∗

1

This step size is not related to the Lipschitz constant of the gradient,and as long as it meets that bound, the choice is somewhat arbitrary.

Algorithm 1 uses a single η for every iteration, but a larger step sizemight be feasible for some.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 9 / 17

Page 14: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Our Contribution

As presented, RPCA GD satisfies the following convergence bound:

d2(Ut ,Vt ;U∗,V ∗) ≤ (1− σ∗r η

8)td2(U0,V0;U∗,V ∗) (2)

where d2(Ut ,Vt ;U∗,V ∗) is the distance from the t-th iterate to the

optimal solution.

How to choose η?

To guarantee the above convergence, η ≤ 136σ∗

1

This step size is not related to the Lipschitz constant of the gradient,and as long as it meets that bound, the choice is somewhat arbitrary.

Algorithm 1 uses a single η for every iteration, but a larger step sizemight be feasible for some.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 9 / 17

Page 15: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Our Contribution

As presented, RPCA GD satisfies the following convergence bound:

d2(Ut ,Vt ;U∗,V ∗) ≤ (1− σ∗r η

8)td2(U0,V0;U∗,V ∗) (2)

where d2(Ut ,Vt ;U∗,V ∗) is the distance from the t-th iterate to the

optimal solution.

How to choose η?

To guarantee the above convergence, η ≤ 136σ∗

1

This step size is not related to the Lipschitz constant of the gradient,and as long as it meets that bound, the choice is somewhat arbitrary.

Algorithm 1 uses a single η for every iteration, but a larger step sizemight be feasible for some.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 9 / 17

Page 16: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Our Contribution

As presented, RPCA GD satisfies the following convergence bound:

d2(Ut ,Vt ;U∗,V ∗) ≤ (1− σ∗r η

8)td2(U0,V0;U∗,V ∗) (2)

where d2(Ut ,Vt ;U∗,V ∗) is the distance from the t-th iterate to the

optimal solution.

How to choose η?

To guarantee the above convergence, η ≤ 136σ∗

1

This step size is not related to the Lipschitz constant of the gradient,and as long as it meets that bound, the choice is somewhat arbitrary.

Algorithm 1 uses a single η for every iteration, but a larger step sizemight be feasible for some.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 9 / 17

Page 17: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Our Contribution

We introduce a backtracking algorithm to choose the largest step sizewhich results in an objective decrease.

1 Start with η

2 Compute the gradient descent step with this step size to getU+t ,V

+t ,S

+t

3 If F(U+t ,V

+t , S

+t )−F(Ut ,Vt , St) < m, accept U+

t = Ut+1,V+t = Vt+1, S+

t = St+1.

4 Else, η = τ η

In practice, we used τ = 12 and m = 0.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 10 / 17

Page 18: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Low Rank Matrix Recovery

Application to Foreground-Background Separation

(Original frame)

(GD, 30 iterations, 14.84 sec)

(Backtracking, 5 iterations, 5.73 sec)

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 11 / 17

Page 19: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Related Research

Table of Contents

1 Overview

2 Low Rank Matrix Recovery

3 Related Research

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 12 / 17

Page 20: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Related Research

Linearly Coupled Gradient and Mirror Descent

We also implemented a version of Algorithm 1 with coupled gradient andmirror descent (AGDM) [2]. This algorithm performed better than eitherthe original RPCA GD or the backtracking algorithm on large, randommatrices.

We are currently working on extending the AGDM algorithm to generalfunctions which are not Lipschitz smooth.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 13 / 17

Page 21: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Related Research

Nonzero Median Filter

The erasures in the partially observed footage manifest as zero entries inthe matrix Y . Traditional median filtering does not improve matrices withvery few observed entries as the median is almost always zero.

We developed a median filter which works around these zeros and improvethe performance of all variations of the RPCA algorithm on partiallyobserved matrices.

A partially observed frame before and after filtering.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 14 / 17

Page 22: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Related Research

Nonconvex optimization on Manifolds

Another area of interest is optimization on Riemannian manifolds. Thesearise naturally in many problems:

The Netflix problem can be formulated as a search for two low rankmatrices U and W such that the ratings matrix X = UW . Here, thespace of fixed rank matrices is a manifold.

Sychronization of rotations: we scan a 3D object from several anglesand want to piece together these scans to get a completerepresentation of the image. We need to find the correct rotationsand translations based on the observed data. The set of rotationmatrices forms a differentiable manifold.

There is recent work extending methods of classical nonconvexoptimization to this setting, which will allow us to develop algorithms for awider scope of problems.

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 15 / 17

Page 23: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Related Research

References

Xinyang Yi, Dohyung Park, Yudong Chen, and ConstantineCaramanis. Fast Algorithms for Robust PCA via Gradient Descent.Preprint, arXiv: 1605.07784v1 [cs.IT] (2016).

Zeyuan Allen-Zhu and Lorenzo Orecchia. Linear Coupling: AnUltimate Unification of Gradient and Mirror Descent. Preprint, arXiv:1407.1537v4 [cs.DS] (2015).

Heinz H. Bauschke, Jerome Bolte, and Marc Teboulle. A descentLemma beyond Lipschitz Gradient continuity: first-order methodsrevisited and applications Accepted, Mathematics of OperationsResearch (2016).

Nicolas Boumal, P.-A. Absil, and Coralia Cartis. Global rates ofconvergence for nonconvex optimization on manifolds. Preprint, arXiv:https://arxiv.org/abs/1605.08101 [math.OC] (2016).

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 16 / 17

Page 24: Fast Algorithms for Data RecoveryY is video data encoded into a matrix of pixel values M is the stable background S is the foreground of moving objects Principal component analysis:

Related Research

Thank you!

Erin Tripp (Mathematics Department Syracuse University)Fast Algorithms for Data Recovery February 3, 2017 17 / 17