Constrained optimization on Hierarchies B Ravi Kiran Part III ICIP 2014 Tutorial T9 Optimizations on Hierarchies of Partitions B Ravi Kiran : Constrained Optimization on HOP 1/31


Constrained optimization on Hierarchies

B Ravi Kiran

Part III, ICIP 2014 Tutorial T9

Optimizations on Hierarchies of Partitions

B Ravi Kiran : Constrained Optimization on HOP 1/31


Outline of the lecture

1 Review on Constrained Optimization
    Optimally Pruned Decision Trees
    Decision Trees in Information Theory

2 Constrained optimization on Hierarchies of Partitions
    Moving to hierarchies
    Optimal Cuts on Hierarchies of Partitions
    Lagrangian Formulation
    λ-cuts are Upper Bounds

3 Conclusion

4 References


Review on Constrained Optimization

Decision Trees and Optimal Pruning

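[Figure: a 2D feature space with axes X1 and X2, split by the thresholds t1, t2, t3 into regions R1, R2, R3, R4, together with the corresponding binary tree whose internal nodes test t1, t2, t3 and whose leaves are R2, R1 and R3, R4.]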

A 2D feature space is recursively partitioned, producing a binary tree.

Grow the tree until each class contains a minimal number of points.

How do we find a good classifier or regressor f(X1, X2)?


Cost complexity Pruning [Breiman 1984]

Given a grown tree T, we can write a cost:

\[
R_\lambda(T) = \sum_{m=1}^{|\tilde{T}|} \sum_{x_i \in R_m} (y_i - \mu_{R_m})^2 + \lambda\,|\tilde{T}|
\]

µRm: mean value of the observed variable y in region Rm.

λ: parameter that governs the trade-off between tree size and fidelity to the data.

T̃: set of terminal nodes of tree T.

The error term for classification uses impurity measures such as the Gini coefficient.

Constrained optimization problem: trade-off between classifier complexity and error.
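The cost above can be evaluated directly once the responses are grouped by leaf region. The following is a minimal sketch, assuming a regression tree; the function name `cost_complexity` and the sample values are hypothetical and only illustrate how the penalty λ·(number of leaves) eventually favors the smaller tree.

```python
from statistics import fmean


def cost_complexity(leaves, lam):
    """R_lambda(T): squared error summed over the leaf regions, plus lam per leaf."""
    error = 0.0
    for ys in leaves:
        mu = fmean(ys)  # mean response mu_{R_m} in this region
        error += sum((y - mu) ** 2 for y in ys)
    return error + lam * len(leaves)


# Hypothetical responses y_i, grouped by leaf region R_m.
full_tree = [[1.0, 1.2], [3.0, 3.4], [7.0, 7.2]]    # three leaves
pruned_tree = [[1.0, 1.2, 3.0, 3.4], [7.0, 7.2]]    # first two leaves merged

for lam in (0.0, 1.0, 10.0):
    print(lam, cost_complexity(full_tree, lam), cost_complexity(pruned_tree, lam))
```

With these made-up numbers the full tree has the lower cost for small λ, and the pruned tree takes over once λ is large enough to outweigh the extra squared error.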


Cost complexity Pruning [Breiman 1984]

Figure: Variation of Training Error vs Tree Cost.


Pruning Example

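Node costs of the same tree, listed level by level from the root:

λ = 0:    10;   3, 5;      1, 1, 1, 2, 1
λ = 0.5:  10.5; 3.5, 5.5;  1.5, 1.5, 1.5, 2.5, 1.5
λ = 1:    11;   4, 6;      2, 2, 2, 3, 2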

Figure: Pruning example demonstrating cost-complexity pruning. The cost function is given for each node of the tree. For λ = 0 (left), 0.5 (center), 1 (right), the pruned optimal subtrees are shown. As λ increases one gets shorter trees. The ideal value of λ is chosen by cross-validation, by re-running the fit over k folds of the original data.

The value of λ at which a pruned parent node is kept w.r.t. its children is (with |T̃| the number of terminal nodes of T):

\[
\lambda(T) = \frac{\mathrm{Error}(T, t) - \mathrm{Error}(T)}{|\tilde{T}| - 1} \tag{1}
\]
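Equation (1) can be checked numerically: at the returned λ, keeping the subtree and collapsing it to a single leaf cost the same. This is a small sketch with hypothetical error values; `critical_lambda` is an illustrative name, not something defined in the slides.

```python
def critical_lambda(error_pruned, error_subtree, n_leaves):
    """Lambda at which collapsing a subtree with n_leaves leaves breaks even."""
    if n_leaves < 2:
        raise ValueError("a prunable subtree needs at least two leaves")
    return (error_pruned - error_subtree) / (n_leaves - 1)


# Hypothetical errors: 5.0 once collapsed to one leaf, 2.0 with its 3 leaves kept.
lam = critical_lambda(error_pruned=5.0, error_subtree=2.0, n_leaves=3)
cost_kept = 2.0 + lam * 3      # error + lam * number of leaves
cost_pruned = 5.0 + lam * 1    # the collapsed node counts as one leaf
print(lam, cost_kept, cost_pruned)
```

For λ above this value the pruned node is cheaper, for λ below it the subtree is.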


Decision Trees in Information Theory

CART trees in Rate Distortion Minimization Framework:

\[
D(R) = \inf_{P_{Y|X}} \bigl\{\, \mathbb{E}[\rho(X, Y)] \;\big|\; I(X; Y) \le R \,\bigr\}
\]

[Chou Lookabaugh & Gray 1988] Tree structured source coding and modeling


Decision Trees in Information Theory

CART trees have since had diverse applications in information theory and classifier design:

[Ramchandran & Vetterli 1988] Best Wavelet Packet Bases in a Rate-Distortion Sense


Decision Trees in Information Theory

[Chou Lookabaugh & Gray 1988] Tree structured source coding and modeling

[Ramchandran & Vetterli 1988] Best Wavelet Packet Bases in a Rate-Distortion Sense

[Donoho 1997] CART and Best-ortho-basis: A connection

[Wakin, Romberg, Choi, & Baraniuk 2002] Rate-distortion optimized image compression using Wedgelets

[Chiang & Boyd 2004] Geometric Programming Duals of Channel Capacity and Rate Distortion

[Shukla & Vetterli 2005] Tree structured Compression for Piecewise Polynomial Images


Constrained optimization on Hierarchies of Partitions

Moving to Hierarchies

CART motivated applications in hierarchies of segmentations:

[Salembier-Garrido 2000] Binary partition tree as an efficient representation for image processing, segmentation and information retrieval.

[Guigues 2003] Scale-Sets.

[Ballester, Caselles, Igual, Garrido 2006] Level Lines Selection with Variational Models for Segmentation and Encoding.

[Calderero-Marques 2010] Region merging techniques using information theory statistical measures.

Wide new domain of hierarchical processing:

[Sylvia Valero 2011] Hyper-spectral data representation using Binary Partition trees.

[Camille Kurtz 2012] Extraction of complex patterns from multiresolution remote sensing images.

[Xu et al. 2012] Morphological Filtering in Shape Spaces.


Binary Partition Trees [Salembier-Garrido 2000]

Problem

Given the Max-tree representation of a gray-scale image:

Calculate the partition with least distortion under a rate constraint.

Calculate the optimal trade-off parameter λ which achieves a constrained bandwidth rate.


Binary Partition Trees [Salembier-Garrido 2000]

Algorithm

Inputs: Distortion D; Rate C; Budget Rate: C0; Lagrange parameter λ

λl = 0;                          \\ Compute D and C for a very low λ
∗ BottomUpAnalysis(Input: λl, output: C, D)
if C < C0 then { no solution; exit; }
Cl = C; Dl = D;
λh = 10^20;                      \\ Compute D and C for a very high λ
∗ BottomUpAnalysis(Input: λh, output: C, D)
if C > C0 then { no solution; exit; }
Ch = C; Dh = D;
do {                             \\ Find the optimum λ value
    λ = (Dl − Dh) / (Ch − Cl);
    ∗ BottomUpAnalysis(Input: λ, output: C, D)
    if C < C0 then { Ch = C; Dh = D; }
    else { Cl = C; Dl = D; }
} until (C ≈ C0)
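The search loop above can be sketched in Python as follows. Several things here are assumptions rather than part of the slide: `bottom_up_analysis` is a stand-in that picks the best (C, D) pair from a short hypothetical list instead of analysing a real tree, ties are broken toward the lower rate, and the loop gets two extra exits (an iteration cap and a check that the bracket still shrinks), because the `until (C ≈ C0)` test may never succeed when no operating point has rate C0.

```python
def bottom_up_analysis(lam, points):
    """Stand-in for BottomUpAnalysis: the (C, D) pair minimising D + lam * C.

    Ties are broken toward the lower rate C (an assumption of this sketch).
    """
    return min(points, key=lambda p: (p[1] + lam * p[0], p[0]))


def search_lambda(points, c0, tol=1e-9, max_iter=50):
    """Return (lam, C, D) for the budget c0, or None when the slide says 'no solution'."""
    c_l, d_l = bottom_up_analysis(0.0, points)       # very low lambda
    if c_l < c0:
        return None
    if abs(c_l - c0) <= tol:                          # budget already met exactly
        return 0.0, c_l, d_l
    c_h, d_h = bottom_up_analysis(1e20, points)      # very high lambda
    if c_h > c0:
        return None
    lam = 1e20
    for _ in range(max_iter):
        if c_h == c_l:                                # degenerate bracket
            break
        lam = (d_l - d_h) / (c_h - c_l)
        c, d = bottom_up_analysis(lam, points)
        if abs(c - c0) <= tol:
            return lam, c, d
        if (c, d) in ((c_l, d_l), (c_h, d_h)):        # bracket stopped shrinking
            break
        if c < c0:
            c_h, d_h = c, d
        else:
            c_l, d_l = c, d
    return lam, c_h, d_h                              # feasible end of the bracket


# Hypothetical operating points as (rate C, distortion D).
POINTS = [(9, 6), (8, 8), (6, 14), (4, 30)]
print(search_lambda(POINTS, 6))     # a budget that one point meets exactly
print(search_lambda(POINTS, 7.5))   # a budget that falls between two points
```

For the budget 7.5 no point has that rate, so the sketch falls back to the feasible end of the bracket, the point with rate 6.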


Dynamic Program

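\[
\omega^*(\pi(S)) = \min\Bigl\{ \omega(\{S\}),\ \sum_{a \in \pi(S)} \omega(a) \Bigr\}
\]

[Figure: class {S} above its children a, b, c, with π(S) = a ⊔ b ⊔ c.]

\[
\pi^*(S) =
\begin{cases}
\{S\}, & \text{if } \omega(S) \le \sum_{a \in \pi(S)} \omega(a) \\
\pi(S), & \text{otherwise}
\end{cases}
\]

Here ω(S) = ωϕ(S) + λ · ω∂(S), we will see why.

D ← ωϕ, C ← ω∂ for Salembier-Garrido.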
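The recursion can be sketched in a few lines of Python. The energies below are borrowed from the Demonstration slide later in the deck; two points are assumptions of this sketch: the class over {e, f} is named k (as in the later figure caption, while the dendrogram prints it as i), and the sum over children uses each child's already-optimal energy, which is the usual dynamic-programming reading of the rule.

```python
# Hierarchy and energies taken from the Demonstration slide (class names as assumed above).
CHILDREN = {"E": ["j", "k"], "j": ["g", "h"],
            "g": ["a", "b"], "h": ["c", "d"], "k": ["e", "f"]}
W_PHI = {"E": 30, "j": 20, "k": 4, "g": 5, "h": 5,
         "a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}
W_PARTIAL = {"E": 4, "j": 3, "k": 2, "g": 2, "h": 2,
             "a": 1, "b": 2, "c": 1, "d": 2, "e": 1, "f": 2}


def optimal_cut(node, lam):
    """Return (energy, cut) of the minimal cut of the sub-hierarchy rooted at node."""
    own = W_PHI[node] + lam * W_PARTIAL[node]
    kids = CHILDREN.get(node, [])
    if not kids:                       # a leaf is its own only cut
        return own, [node]
    results = [optimal_cut(k, lam) for k in kids]
    child_energy = sum(energy for energy, _ in results)
    if own <= child_energy:            # keep {S} on ties, as in the rule above
        return own, [node]
    return child_energy, [c for _, cut in results for c in cut]


for lam in (1.0, 2.5, 3.5):
    print(lam, optimal_cut("E", lam))
```

Each class is visited once, so one pass costs time linear in the number of classes for a fixed λ.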


Scale-Sets [Guigues 2003]

Extraction of a set of optimal cuts from a hierarchy characterized by:

Energy functional/model: Mumford-Shah

A scale parameter λ


Scale-Sets [Guigues 2003]

Energy formulation in Guigues case:

ω(S , λ) = ωϕ(S) + λω∂(S)

Remark

Start from a hierarchy and calculate the scale function

\[
\lambda(S) = -\frac{\Delta\omega_\varphi}{\Delta\omega_\partial}
\]

for the classes S in the hierarchy H.

Calculate the indexed hierarchy (H, λ+) consisting of minimal cuts for increasing λ's:

\[
\{\Pi(\lambda, H)\}_{\lambda \in \mathbb{R}^+} \to (H, \lambda^+)
\]

Furthermore, minimization of an energy on Π(H, E) is NP-hard.

Instead, choose minimal cuts corresponding to the scale parameter λ.
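The scale function follows from asking at which λ a class S costs exactly as much as its children. The derivation below is a sketch under one assumption that the slide does not spell out: Δ is read as the parent's energy minus the sum over its children π(S).

```latex
\begin{aligned}
\omega_\varphi(S) + \lambda\,\omega_\partial(S)
  &= \sum_{a \in \pi(S)} \omega_\varphi(a) + \lambda \sum_{a \in \pi(S)} \omega_\partial(a) \\[4pt]
\lambda(S)
  &= -\,\frac{\omega_\varphi(S) - \sum_{a \in \pi(S)} \omega_\varphi(a)}
             {\omega_\partial(S) - \sum_{a \in \pi(S)} \omega_\partial(a)}
   = -\,\frac{\Delta\omega_\varphi}{\Delta\omega_\partial}
\end{aligned}
```

With the values of the Demonstration slide, class g has ωϕ(g) = 5 and ω∂(g) = 2 while its children a, b sum to 2 and 3, so Δωϕ = 3, Δω∂ = −1 and λ(g) = 3, which matches the λ-tree shown there.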


Guigues' Problems

We have a constrained optimization problem on hierarchies.

Problem

Which conditions on the objective function ωϕ and the constraint ω∂ yield a set of optimal cuts that is monotonically ordered with λ, and thus an indexed optimal hierarchy?

Which conditions on the energies ωϕ, ω∂ ensure a unique optimum for a given λ?


Problem Formulation

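Given energies ωϕ, ω∂ : D(E) → R

\[
\begin{aligned}
\underset{\pi \in \Pi(E,H)}{\text{minimize}} \quad & \sum_{S \in \pi} \omega_\varphi(S) \\
\text{subject to} \quad & \sum_{S \in \pi} \omega_\partial(S) \le C
\end{aligned}
\]

\[
\begin{aligned}
\underset{\pi \in \Pi(E,H)}{\text{minimize}} \quad & \sum_{S \in \pi} \omega_\partial(S) \\
\text{subject to} \quad & \sum_{S \in \pi} \omega_\varphi(S) \le K
\end{aligned}
\]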


Level Line selection [Caselles et al 2006]

Hierarchy: Tree of Shapes, an inclusion tree built from the upper and lower level sets of a scalar function.

Rate-distortion framework for compression of images.

Distortion: |f(x) − µ(S)|², the quadratic error.

Rate: ∂S, the contour length.


Dynamic Program

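\[
\omega^*(\pi(S)) = \min\Bigl\{ \omega(\{S\}),\ \sum_{a \in \pi(S)} \omega(a) \Bigr\}
\]

[Figure: class {S} above its children a, b, c, with π(S) = a ⊔ b ⊔ c.]

\[
\pi^*(S) =
\begin{cases}
\{S\}, & \text{if } \omega(S) \le \sum_{a \in \pi(S)} \omega(a) \\
\pi(S), & \text{otherwise}
\end{cases}
\]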


Primal and Dual problems

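Lagrangian primal problem:

\[
\begin{aligned}
\underset{\pi \in \Pi(E,B)}{\text{minimize}} \quad & \omega_\varphi(\pi) \\
\text{subject to} \quad & \omega_\partial(\pi) \le C
\end{aligned}
\]

Written as sums over the classes of the cut, the problem reads:

\[
\begin{aligned}
\underset{\pi \in \Pi(E,H)}{\text{minimize}} \quad & \sum_{S \in \pi} \omega_\varphi(S) \\
\text{subject to} \quad & \sum_{S \in \pi} \omega_\partial(S) \le C
\end{aligned}
\]

Now the domain of the feasible cuts is the subset Π′ of Π:

\[
\Pi' = \{\pi \in \Pi : \omega_\partial(\pi) \le C\}
\]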


Lagrangian Multipliers

Remark

For the constrained optimization problem, [Salembier, Guigues et al.] use the Lagrangian multiplier method to formulate an unconstrained optimization problem.

As we know from optimization theory, the Lagrangian is given by:

ω(π, λ) = ωϕ(π) + λ · ω∂(π)

Minimal cuts are the family of cuts with least energy ω(π, λ) for a given λ.


Unconstrained minimization of Lagrangian

Remark

Guigues assumes a sub-additive constraint ω∂ and a super-additive objective ωϕ to extract λ-ordered cuts from the input hierarchy.

Salembier et al. propose a gradient-search-based method to find the λ which achieves the constraint rate C approximately, that is ω∂(π(λ)) ≈ C.

Breiman, Salembier, Guigues and many others ensure uniqueness by choosing the smallest cut that satisfies C. This is basically the condition of uniqueness.


Demonstration

[Figure: four trees drawn on the same hierarchy H, listed here level by level from the root.]

Dendrogram:  E;  j, i;   g, h, i;  a, b, c, d, e, f
ωϕ tree:     30; 20, 4;  5, 5, 4;  1, 1, 1, 1, 1, 1
ω∂ tree:     4;  3, 2;   2, 2, 2;  1, 2, 1, 2, 1, 2
λ-tree:      6;  10, 2;  3, 3, 2;  -, -, -, -, -, -

Cuts labelled π, π′ and π1, π2, π3 are marked in the figure.

Figure: Bottom left: hierarchy H. Top row: two energies (ωϕ, ω∂) for corresponding classes. Bottom right: λ values obtained by equating parent and child energies, whose level sets give the minimal cuts w.r.t. ωλ. Scale-sets or λ-cuts shown for λ = 2, 3, 4 as π2, π3, π4.
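The λ-tree can be recomputed from the two energy trees by equating each parent's energy with that of its children. The sketch below assumes that Δ means parent minus sum of children, and it names the class over {e, f} k (as in the next figure caption) while the dendrogram prints it as i; the function name `scale` is illustrative.

```python
# Hierarchy and energies read off the figure (class names as assumed above).
CHILDREN = {"E": ["j", "k"], "j": ["g", "h"],
            "g": ["a", "b"], "h": ["c", "d"], "k": ["e", "f"]}
W_PHI = {"E": 30, "j": 20, "k": 4, "g": 5, "h": 5,
         "a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}
W_PARTIAL = {"E": 4, "j": 3, "k": 2, "g": 2, "h": 2,
             "a": 1, "b": 2, "c": 1, "d": 2, "e": 1, "f": 2}


def scale(node):
    """lambda(S) = -(delta w_phi) / (delta w_partial), parent minus children."""
    kids = CHILDREN[node]
    d_phi = W_PHI[node] - sum(W_PHI[c] for c in kids)
    d_partial = W_PARTIAL[node] - sum(W_PARTIAL[c] for c in kids)
    return -d_phi / d_partial


scales = {node: scale(node) for node in CHILDREN}
print(scales)
```

The values agree with the λ-tree in the figure. Note that λ(j) = 10 exceeds λ(E) = 6, so these raw values do not increase toward the root.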


λ-cuts are Upper Bounds

[Plot: ωϕ(π∗λ) and ω∂(π∗λ) against λ from 0 to 5; the vertical axis marks the values 6, 8, 9, 14, and a horizontal line marks the cost C = 7.5.]

Figure: For 2 < λ < 3 the minimal cut is (a, b, c, d, k) and ω∂ = 8; for λ ≥ 3 the minimal cut is (g, h, k) and ω∂ = 6, i.e. ω∂ is never equal to the cost C = 7.5 at any time.


λ-cuts are Upper Bounds

Remark

Lack of Cost→Multiplier mapping: for a given cost ω∂ ≤ C one is not assured a corresponding multiplier λ.

Uniqueness is lost, even when ωϕ is strictly h-increasing.

π∗(λ∗) is only the upper bound of the constrained minimal cuts.

The error |ω∂(π∗(λ∗)) − C| gives no information about the error |ωϕ(π∗(λ∗)) − ωϕ(π)|, where π is a constrained minimal cut.

On the ω∂-tree the structure of the solution space forms a lattice.
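These remarks can be checked by brute force on the small hierarchy of the Demonstration slide. The sketch below enumerates every cut, solves the constrained problem for C = 7.5 directly, and compares it with the λ-cuts. It is only feasible because the example is tiny; the class over {e, f} is named k as in the figure caption, and the helper names are illustrative.

```python
from itertools import product

CHILDREN = {"E": ["j", "k"], "j": ["g", "h"],
            "g": ["a", "b"], "h": ["c", "d"], "k": ["e", "f"]}
W_PHI = {"E": 30, "j": 20, "k": 4, "g": 5, "h": 5,
         "a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}
W_PARTIAL = {"E": 4, "j": 3, "k": 2, "g": 2, "h": 2,
             "a": 1, "b": 2, "c": 1, "d": 2, "e": 1, "f": 2}


def cuts(node):
    """Yield every cut of the sub-hierarchy rooted at node, as a tuple of classes."""
    yield (node,)
    kids = CHILDREN.get(node)
    if kids:
        for combo in product(*(list(cuts(k)) for k in kids)):
            yield tuple(c for part in combo for c in part)


def energy(cut, table):
    return sum(table[s] for s in cut)


ALL_CUTS = list(cuts("E"))
C = 7.5

# Constrained problem: least w_phi among the cuts with w_partial <= C.
feasible = [c for c in ALL_CUTS if energy(c, W_PARTIAL) <= C]
best = min(energy(c, W_PHI) for c in feasible)
constrained_minima = [c for c in feasible if energy(c, W_PHI) == best]


def lambda_cut(lam):
    """Minimal cut of the Lagrangian w_phi + lam * w_partial, by enumeration."""
    return min(ALL_CUTS, key=lambda c: energy(c, W_PHI) + lam * energy(c, W_PARTIAL))


print(best, constrained_minima)
for lam in (2.5, 3.5):
    cut = lambda_cut(lam)
    print(lam, cut, energy(cut, W_PHI), energy(cut, W_PARTIAL))
```

On this example two different cuts reach the constrained minimum ωϕ = 11, so uniqueness is lost, and the first feasible λ-cut (g, h, k) has ωϕ = 14, which only bounds that minimum from above.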


Everett’s Theorem

Remark

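The family of cuts is an abstract set, with energies ωϕ and ω∂ that are neither differentiable, convex, nor smooth.

Given the multiplier λ ∈ R

\[
\min_{\pi \in \Pi(E,H)} \Bigl\{ \sum_{S \in \pi} \omega_\varphi(S) + \lambda \sum_{S \in \pi} \omega_\partial(S) \Bigr\}
\]

The solution π(λ) to this unconstrained minimization is also an optimal solution to the perturbed primal problem:

\[
\underset{\pi \in \Pi(E,H)}{\text{minimize}} \ \sum_{S \in \pi} \omega_\varphi(S)
\quad \text{subject to} \quad
\sum_{S \in \pi} \omega_\partial(S) \le \sum_{S \in \pi(\lambda)} \omega_\partial(S)
\]

This solution solves the constrained problem, where the constraint is λ-dependent.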
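The argument behind this statement is short and needs no convexity or differentiability. The sketch below assumes λ ≥ 0 (the slide only writes λ ∈ R) and abbreviates the sums over classes as ωϕ(π) and ω∂(π).

```latex
\begin{aligned}
&\text{Let } \pi(\lambda) \text{ minimise } \omega_\varphi(\pi) + \lambda\,\omega_\partial(\pi)
  \text{ over } \Pi(E,H), \text{ and take any cut } \pi \text{ with }
  \omega_\partial(\pi) \le \omega_\partial(\pi(\lambda)). \\[4pt]
&\omega_\varphi(\pi(\lambda)) + \lambda\,\omega_\partial(\pi(\lambda))
  \;\le\; \omega_\varphi(\pi) + \lambda\,\omega_\partial(\pi) \\[4pt]
&\Longrightarrow\quad
\omega_\varphi(\pi(\lambda))
  \;\le\; \omega_\varphi(\pi) + \lambda\,\bigl[\omega_\partial(\pi) - \omega_\partial(\pi(\lambda))\bigr]
  \;\le\; \omega_\varphi(\pi)
\end{aligned}
```

The last step uses λ ≥ 0 together with the assumed inequality on ω∂, so no feasible cut of the perturbed problem has a smaller ωϕ than π(λ).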


Optimal λ

The two problems will be solved jointly by introducing

\[
\lambda^* = \inf\{\lambda \mid \omega_\partial(\pi^*(\lambda)) \le 0\}
\]

The constraint function ω∂ being h-increasing, we have

\[
0 \le \lambda^* \le \lambda
\;\Rightarrow\; \pi^*(\lambda^*) \preceq_{\partial} \pi^*(\lambda)
\;\Rightarrow\; 0 \ge \omega_\partial(\pi^*(\lambda^*)) \ge \omega_\partial(\pi^*(\lambda)).
\]

The domain of the feasible λ is therefore λ ≥ λ∗.

We can now set the minimization problem more precisely. Three conditions are needed:

1 Primal constraint qualification: the set Π′ is not empty,
2 Dual constraint qualification: λ∗ exists and is ≥ 0,
3 Multiplier based constraint: ω∂(π∗(λ∗)) = 0.


Conclusion

Conclusion

Figure: A brief overview on constrained optimization of Hierarchies.


Tree structured constraints are predominantly used in the fields of coding theory, machine learning, image compression and segmentation.

Rate-distortion minimization, cost-complexity and minimum description length are various types of constrained optimization problems which have their tree structured counterparts.

Due to the discrete nature of the functions, we use Lagrangian multipliers and perturbation methods to reach a minimum.

Dual parameter searches can at best provide an upper bound on the minimum.

Uniqueness in most cases is ensured by singularity.


References

References

P. Salembier and L. Garrido. Binary partition tree as an efficient representation for image processing, segmentation and information retrieval. ITIP, vol. 9, pp. 561–576, 2000.

Laurent Guigues, Jean Pierre Cocquerez, and Herve Le Men. Scale-sets image analysis. International Journal of Computer Vision, 68(3):289–317, 2006.

Coloma Ballester, Vicent Caselles, Laura Igual, and Luis Garrido. Level lines selection with variational models for segmentation and encoding. JMIV, 27(1):5–27, 2007.

Y. Shoham and A. Gersho. Efficient bit allocation for an arbitrary set of quantizers [speech coding]. Acoustics, Speech and Signal Processing, IEEE Transactions on, 36(9):1445–1453.


Hugh Everett. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research, 11(3):399–417, 1963.

P. A. Chou, T. Lookabaugh, and R. M. Gray. Optimal pruning with applications to tree-structured source coding and modeling. Information Theory, IEEE Transactions on, 35(2):299–315, Mar 1989. ISSN 0018–9448.

Yongchao Xu, T. Geraud, and L. Najman. Context-based energy estimator: Application to object segmentation on the tree of shapes. ICIP 2012.

B Ravi Kiran. Energetic-Lattice Based optimization. PhD Thesis, to be defended 31 Oct, ESIEE Paris.
