Analyzing Task Driven Learning Algorithms: Mid Year Status
Transcript of a presentation by Mike Pekala (AMSC663, Fall 2011), December 6, 2011

Page 1

Analyzing Task Driven Learning Algorithms: Mid Year Status

Mike Pekala

December 6, 2011

Advisor: Prof. Doron Levy (dlevy at math.umd.edu)

UMD Dept. of Mathematics & Center for Scientific Computation and

Mathematical Modeling (CSCAMM)

Mike Pekala (UMD) AMSC663 December 6, 2011 1 / 28

Page 2

Overview and Status Overview

Today

1. Overview and Status (Overview; Schedule)
2. Least Angle Regression (Geometric View; Rank 1 Updating; Validation)
3. Dictionary Learning (Preliminary Results)
4. Summary & Next Steps

Page 3

Overview and Status Overview

Overview

The underlying notion of sparse coding is that, in many domains, data vectors can be concisely represented as a sparse linear combination of basis elements or dictionary atoms. Recent results suggest that, for many tasks, performance improvements can be obtained by explicitly learning dictionaries directly from the data (vs. using predefined dictionaries, such as wavelets). Further results suggest that additional gains are possible by jointly optimizing the dictionary for both the data and the task (e.g. classification, denoising).

Consider the Task Driven Learning algorithm [Mairal et al., 2010]:

Outer loop: Stochastic gradient descent to learn the dictionary atoms and the classification weight vector. We talked about this at the kickoff.

Inner loop: Sparse approximation via L1-penalized least squares. The authors use the Least Angle Regression (LARS) algorithm for this purpose. This was left vague at the kickoff; we'll talk about it a bit more today.
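The two-loop structure can be sketched in code. The following is a minimal, hypothetical Python/NumPy sketch, not the authors' implementation: the inner sparse-coding step uses ISTA as a simple stand-in for LARS, and the dictionary update uses a plain reconstructive gradient rather than the full task-driven gradient through the sparse code derived in [Mairal et al., 2010]. All function and variable names here are our own.

```python
import numpy as np

def ista(D, x, lam, n_iter=100):
    """Sparse code a ~ argmin ||x - D a||_2^2 + lam * ||a||_1.
    ISTA here is a simple stand-in for the LARS inner loop."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2          # Lipschitz const. of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - (1.0 / L) * 2.0 * D.T @ (D @ a - x)   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def tddl_sketch(X, y, m=30, lam=0.1, lr=0.01, epochs=5, seed=0):
    """Skeleton of the outer SGD loop: sparse-code each sample, then take
    stochastic gradient steps on the classifier w and the dictionary D.
    For illustration D gets a reconstructive gradient only; the true
    task-driven gradient is derived in [Mairal et al., 2010]."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    D = rng.standard_normal((n, m))
    D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
    w = np.zeros(m)
    for _ in range(epochs):
        for i in rng.permutation(X.shape[1]):
            x, yi = X[:, i], y[i]
            a = ista(D, x, lam)                  # inner loop: sparse code
            w -= lr * (w @ a - yi) * a           # task (classifier) update
            D -= lr * np.outer(D @ a - x, a)     # dictionary update (reconstructive)
            D /= np.maximum(np.linalg.norm(D, axis=0), 1e-8)  # renormalize atoms
    return D, w

# orthonormal sanity case: with D = I, the lasso is plain soft-thresholding
a_demo = ista(np.eye(3), np.array([1.0, -0.2, 0.05]), lam=0.2)

# toy run: 16-dimensional data, 8 atoms, +/-1 labels
rng = np.random.default_rng(0)
Xd = rng.standard_normal((16, 40))
yd = np.sign(rng.standard_normal(40))
D, w = tddl_sketch(Xd, yd, m=8, epochs=2)
```

The point of the sketch is only the control flow: an inexpensive inner solver called once per sample, wrapped in a stochastic outer loop over (dictionary, classifier).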

Page 4

Overview and Status Schedule

Schedule and Milestones

Schedule and milestones from the kickoff:

Phase I: Algorithm development (Sept 23 - Jan 15)

Phase Ia: Implement LARS (Sept 23 ∼ Oct 24)
X Milestone: LARS code available

Phase Ib: Validate LARS (Oct 24 ∼ Nov 14)
X Milestone: results on diabetes data and hand-crafted problems

Phase Ic: Implement SGD framework (Nov 14 ∼ Dec 15)
90% Milestone: Initial SGD code available

Phase Id: Validate SGD framework (Dec 15 ∼ Jan 15)
25% Milestone: TDDL results on MNIST/USPS

Phase II: Analysis on new data sets (Jan 15 - May 1)

Milestone: Preliminary results on selected dataset (∼ Mar 1)
Milestone: Final report and presentation (∼ May 1)

Page 5

Least Angle Regression


Page 6

Least Angle Regression Geometric View

Problem: Constrained Least Squares

Recall the Lasso: given X = [x_1, . . . , x_m] ∈ R^{n×m} and t ∈ R_+, solve

    min_β ||y − Xβ||_2^2   s.t.   ||β||_1 ≤ t

which has an equivalent unconstrained formulation:

    min_β ||y − Xβ||_2^2 + λ||β||_1

for some scalar λ ≥ 0. The L1 penalty improves upon OLS by introducing parsimony (feature selection) and regularization (improved generality).

There are multiple ways to solve this problem:

1. Directly, via convex optimization (can be expensive)

2. Iterative techniques:
   Forward selection ("matching pursuit"), forward stagewise, others
   Least Angle Regression (LARS) [Efron et al., 2004]
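Before trusting any iterative solver, it helps to have a case with a known answer. For an orthonormal design (X^T X = I), the unconstrained problem has the closed-form soft-thresholding solution β_j = sign(b_j) · max(|b_j| − λ/2, 0) with b = X^T y the OLS fit, which also makes a convenient validation target later. A small NumPy sketch (the helper name is our own):

```python
import numpy as np

def lasso_orthonormal(X, y, lam):
    """Closed-form lasso solution when X^T X = I:
    beta_j = sign(b_j) * max(|b_j| - lam/2, 0), with b = X^T y the OLS fit."""
    b_ols = X.T @ y
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam / 2.0, 0.0)

# orthonormal design: reduced-QR factor of a random matrix (X^T X = I)
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((50, 5)))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 0.5])

beta = lasso_orthonormal(X, y, lam=1.2)

# verify the lasso optimality (subgradient) conditions directly:
# active coords: gradient of the smooth part balances lam * sign(beta_j)
# inactive coords: |gradient| <= lam
g = 2.0 * X.T @ (X @ beta - y)
active = beta != 0
assert np.allclose(g[active], -1.2 * np.sign(beta[active]))
assert np.all(np.abs(g[~active]) <= 1.2 + 1e-12)
```

Checking the subgradient conditions rather than comparing against another solver keeps the test self-contained.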

Page 7

Least Angle Regression Geometric View

Visualizing the algorithm

[Figure: geometry when m = 2; the response y, covariates x1 and x2, and the column space of X = [x1 x2]]

Page 8

Least Angle Regression Geometric View

Ordinary Least Squares (OLS)

[Figure: OLS projects y onto the column space of X = [x1 x2]]

    y_2 = Xβ = Py,  where  β = arg min_β ||y − Xβ||_2^2

(P is the orthogonal projector onto the column space of X.)

Page 9

Least Angle Regression Geometric View

Least Angle Regression (LARS)

[Figure: covariates x1, x2 and response y; assume ||β_OLS||_1 > t, so the constraint is active]

    β = arg min_β ||y − Xβ||_2^2   s.t.   ||β||_1 ≤ t

Active set A = {}, β_0 = 0.

Page 10

Least Angle Regression Geometric View

Least Angle Regression (LARS)

[Figure: choose initial direction u1 = x1/||x1||_2, the covariate most correlated with y]

    β = arg min_β ||y − Xβ||_2^2   s.t.   ||β||_1 ≤ t

Active set A = {x1}.

Page 11

Least Angle Regression Geometric View

Least Angle Regression (LARS)

[Figure: move along u1 to µ1 = γ1 u1, stopping when x2 becomes equally correlated with the residual]

    β = arg min_β ||y − Xβ||_2^2   s.t.   ||β||_1 ≤ t

Active set A = {x1}.

Page 12

Least Angle Regression Geometric View

Least Angle Regression (LARS)

[Figure: identify the equiangular vector u2, making equal angles with x1 and x2]

    β = arg min_β ||y − Xβ||_2^2   s.t.   ||β||_1 ≤ t

Active set A = {x1, x2}.

Page 13

Least Angle Regression Geometric View

Least Angle Regression (LARS)

[Figure: move along the equiangular direction to µ2 = µ1 + γ2 u2]

    β = arg min_β ||y − Xβ||_2^2   s.t.   ||β||_1 ≤ t

Active set A = {x1, x2}.

Page 14

Least Angle Regression Geometric View

Relationship to OLS

[Figure: µ2 approaches the OLS solution y_2]

LARS solutions at step k are related to the OLS solution of min_β ||y − X_k β||_2^2.
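The first step of the walk-through above can be reproduced numerically. The following is our own NumPy sketch of a single LARS step with one active covariate, using the step-length rule from [Efron et al., 2004]: after moving γ along the first direction, a second covariate ties the first in absolute correlation with the residual.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 3
X = rng.standard_normal((n, m))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)   # standardized, unit-norm columns
y = X @ np.array([4.0, 2.0, 0.0]) + 0.1 * rng.standard_normal(n)
y -= y.mean()

c = X.T @ y                      # current correlations with the residual
j = int(np.argmax(np.abs(c)))    # most correlated covariate enters first
C = np.abs(c[j])
u = np.sign(c[j]) * X[:, j]      # unit-length step direction u1

# Smallest positive gamma at which another covariate ties the active one:
# solve |c_k - gamma * a_k| = C - gamma, where a_k = x_k^T u.
candidates = []
for k in range(m):
    if k == j:
        continue
    ak = X[:, k] @ u
    for g in ((C - c[k]) / (1.0 - ak), (C + c[k]) / (1.0 + ak)):
        if g > 0:
            candidates.append(g)
gamma = min(candidates)
mu1 = gamma * u                  # first LARS estimate mu1 = gamma_1 * u1

c_new = X.T @ (y - mu1)          # two covariates now share the max |correlation|
```

At γ the active covariate's absolute correlation has dropped to C − γ, exactly matching the entering covariate's, which is the "equally correlated" stopping condition in the figures above.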

Page 15

Least Angle Regression Rank 1 Updating

Some Algorithm Properties (full details in [Efron et al., 2004])

(2.22) Successive LARS estimates µ_k always approach but never reach the OLS estimate y_k (except possibly on the final iteration).

(Theorem 1) With a small modification to the LARS step size calculation, and assuming covariates are added/removed one at a time from the active set, the complete LARS solution path yields all Lasso solutions.

(Sec. 3.1) With a change to the covariate selection rule, LARS can be modified to solve the Positive Lasso problem:

    min_β ||y − Xβ||_2^2   s.t.   ||β||_1 ≤ t,  0 ≤ β_j

(Sec. 7) The cost of LARS is comparable to that of a least squares fit on m variables. The LARS sequence incrementally generates a Cholesky factorization of X^T X in a very specific order.

Page 19

Least Angle Regression Rank 1 Updating

LARS & Cholesky Decomposition

At iteration k, to determine the equiangular vector u_k, one must invert the k × k matrix G_k := X_k^T X_k.

Well, don't really invert: generate the Cholesky decomposition G_k = R^T R and solve triangular linear systems.

(Recall: G_k is symmetric positive semi-definite, and symmetric positive definite if X_k is full rank):

    ∀ z ∈ R^k, z ≠ 0:  z^T G_k z = z^T X_k^T X_k z = (X_k z)^T (X_k z) = ||X_k z||_2^2 ≥ 0

Could call chol(Gk) each iteration, but there’s a more efficient way...
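The more efficient way: when a covariate joins the active set, G_{k+1} is just G_k bordered by one new row and column, so R can be extended with one triangular solve instead of a full chol(). A NumPy sketch of this bordering update (the helper name is our own, not the project code):

```python
import numpy as np

def chol_append(R, Xk, x_new):
    """Given upper-triangular R with R^T R = Xk^T Xk, return the Cholesky
    factor of the Gram matrix after appending column x_new to Xk."""
    if R.size == 0:                          # first covariate in the active set
        return np.array([[np.sqrt(x_new @ x_new)]])
    # border G_k: solve R^T r = Xk^T x_new (a triangular solve in practice)
    r = np.linalg.solve(R.T, Xk.T @ x_new)
    rho2 = x_new @ x_new - r @ r             # > 0 while Xk stays full rank
    k = R.shape[0]
    R_new = np.zeros((k + 1, k + 1))
    R_new[:k, :k] = R
    R_new[:k, k] = r
    R_new[k, k] = np.sqrt(rho2)
    return R_new

# grow the factorization one covariate at a time, as LARS does
rng = np.random.default_rng(2)
X = rng.standard_normal((40, 4))
R = np.empty((0, 0))
for j in range(X.shape[1]):
    R = chol_append(R, X[:, :j], X[:, j])
```

Each append costs one k × k triangular solve plus O(nk) inner products, versus O(k^3) for refactoring from scratch.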

Page 20

Least Angle Regression Rank 1 Updating

QR Rank 1 Updates (Golub and Van Loan [1996, sec. 12.5.2])

Recall: if A = QR, then A^T A = (R^T Q^T)(QR) = R^T R

Suppose we have Q, R, z and want the QR decomposition of

    Ã = [a_1, . . . , a_k, z, a_{k+1}, . . . , a_n]

Let w := Q^T z. Then

    Q^T Ã = [Q^T a_1, . . . , Q^T a_k, w, Q^T a_{k+1}, . . . , Q^T a_n]

which is upper triangular except for a "spike" in the column holding w, e.g.

    × × × × × ×
    0 × × × × ×
    0 0 × × × ×
    0 0 0 × × ×
    0 0 0 × 0 ×
    0 0 0 × 0 0
    0 0 0 × 0 0

Page 21

Least Angle Regression Rank 1 Updating

QR Rank 1 Updates (Golub and Van Loan [1996, sec. 12.5.2])

Use Givens rotations to remove the "spike" introduced by w:

    × × × × × ×        × × × × × ×
    0 × × × × ×        0 × × × × ×
    0 0 × × × ×        0 0 × × × ×
    0 0 0 × × ×   ->   0 0 0 × × ×
    0 0 0 × 0 ×        0 0 0 0 × ×
    0 0 0 × 0 0        0 0 0 0 0 ×
    0 0 0 × 0 0        0 0 0 0 0 0

The operation requires mn flops.
An analogous approach downdates R when removing a column.
Matlab functions qrinsert(), qrdelete() (Octave has cholupdate()...).
Ran into trouble with non-uniqueness of the QR decomposition; used

    QR = QIR = (Q I_k)(I_k R)

to swap the sign of R_{k,k} (where I_k is the identity matrix with I_{k,k} = −1, so I_k I_k = I).

Page 22

Least Angle Regression Validation

Validation: Diabetes Data Set

[Figure: Diabetes validation test, coefficient paths β_j (covariates 1-10) plotted against ||β||_1, with coefficient values ranging roughly from −800 to 800]

m = 10, n = 442; compares well with Figure 1 in [Efron et al., 2004]. Also validated by comparing orthogonal designs against the theoretical result.

Page 23

Dictionary Learning


Page 24

Dictionary Learning Preliminary Results

Dictionary Learning: Progress

Warning:

The following is preliminary - work in progress.

No intelligent parameter selection; only looking at dictionary learning at the moment.

Page 25

Dictionary Learning Preliminary Results

[Figure: learned dictionary atoms] Atoms: LARS+LASSO; time = 6230.27 (sec), m = 30, nIters = 1000, λ = 0.01

Page 26

Dictionary Learning Preliminary Results

Experiment: LARS+LASSO, USPS 5023; num. atoms > 1e-4: 30

[Figure: true digit vs. reconstruction; largest coefficients:
 a28 = 2.632 (0.12 %), a16 = 1.662 (0.07 %), a29 = 1.571 (0.07 %),
 a11 = 1.438 (0.06 %), a5 = −1.276 (0.06 %)]

Page 27

Dictionary Learning Preliminary Results

[Figure: learned dictionary atoms] Atoms: LARS+LASSO-NN; time = 2435.80 (sec), m = 30, nIters = 1000, λ = 0.01

Page 28

Dictionary Learning Preliminary Results

Experiment: LARS+LASSO-NN, USPS 5023; num. atoms > 1e-4: 13

[Figure: true digit vs. reconstruction; largest coefficients:
 a27 = 2.520 (0.20 %), a21 = 1.994 (0.16 %), a28 = 1.882 (0.15 %),
 a9 = 1.822 (0.14 %), a8 = 1.270 (0.10 %)]

Page 29

Summary & Next Steps


Page 30

Summary & Next Steps

Summary

Progress

Milestones met; currently on schedule.

Also completed a few extra tasks not in the original plan (non-negative LARS, incremental Cholesky).

Near Term

Finish Task Driven Learning Framework and validation

On to hyperspectral data!

Optional Steps

Parallel SGD (e.g. [Zinkevich et al., 2010])

Page 31

Summary & Next Steps

Bibliography I

Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. Annals of Statistics, 32:407-499, 2004.

Gene Golub and Charles Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.

Julien Mairal, Francis Bach, and Jean Ponce. Task-Driven Dictionary Learning. Rapport de recherche RR-7400, INRIA, 2010.

Martin Zinkevich, Markus Weimer, Alex Smola, and Lihong Li. Parallelized stochastic gradient descent. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2595-2603, 2010.
