Source: Saifuddin Syed, University of British Columbia, saif.syed/papers/mlrg_submodularity.pdf
Submodularity in Machine Learning
Saifuddin Syed
MLRG Summer 2016
Outline
1 What are submodular functions
    Motivation
    Submodularity and Concavity
    Examples
2 Properties of submodular functions
    Submodularity and Convexity
    Lovasz Extension
3 Submodular minimization
    Symmetric Submodular Functions
    Example: Clustering
    Example: Image Denoising
4 Maximization
    Greedy algorithm
    Examples
5 References
Motivation
In combinatorial optimization we are interested in solving problems of the form
$$\max\{f(S) : S \in \mathcal{F}\} \quad \text{or} \quad \min\{f(S) : S \in \mathcal{F}\},$$
where $f$ is some function and $\mathcal{F}$ is some discrete set of feasible solutions. To make these problems tractable we can either
work with each problem individually, or
try to capture the properties of $f$ and $\mathcal{F}$ that make them tractable.
In the continuous case, a function $f : \mathbb{R}^n \to \mathbb{R}$ can be
minimized efficiently if $f$ is convex, and
maximized efficiently if $f$ is concave.
We want to find the analogue for discrete functions.
Submodularity plays the role of concavity/convexity in the discrete regime.
Why should you care about submodularity?
There are many problems in machine learning that can be reformulated in the context of submodular optimization. Submodular methods have provided elegant solutions to many important problems, including:
Coverage of sensor networks
Variable selection/regularization
Clustering
MAP decoding in graphical models
Notation
For the rest of this talk we will assume $V$ is a set of size $n$ and
$$F : 2^V \to \mathbb{R},$$
where $2^V$ is the set of all subsets of $V$. Furthermore, we will assume $F(\emptyset) = 0$.
Given $S \in 2^V$, we define $F_S : V \to \mathbb{R}$ by
$$F_S(i) = F(S \cup \{i\}) - F(S).$$
$F_S(i)$ represents the marginal value of $i$ with respect to $S$.
Submodularity
Definition
$F$ is submodular if for all $S \subset T \subset V$ and $j \in V \setminus T$,
$$F_S(j) \ge F_T(j).$$
$F$ is supermodular if $-F$ is submodular.
$F$ is modular (or additive) if it is both submodular and supermodular.
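The definition can be checked directly on small ground sets. The following is a minimal brute-force sketch, not an efficient test: it enumerates every pair $S \subset T$, so it takes exponential time. The helper names (`subsets`, `marginal`, `is_submodular`) and the two example functions are assumptions made for illustration.

```python
from itertools import combinations

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def marginal(F, S, i):
    """Marginal value F_S(i) = F(S + {i}) - F(S)."""
    return F(S | {i}) - F(S)

def is_submodular(F, ground, tol=1e-12):
    """Check F_S(j) >= F_T(j) for all S in T in V and j in V minus T."""
    ground = frozenset(ground)
    for T in subsets(ground):
        for S in subsets(T):
            for j in ground - T:
                if marginal(F, S, j) < marginal(F, T, j) - tol:
                    return False
    return True

V = {0, 1, 2, 3}
truncated = lambda A: min(len(A), 2)   # gains 1, 1, 0, 0: diminishing
squared = lambda A: len(A) ** 2        # gains 1, 3, 5, 7: increasing

print(is_submodular(truncated, V))
print(is_submodular(squared, V))
```

The first function has non-increasing marginal gains and passes the check; the second is supermodular and fails it.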
Intuitively, the submodular condition says that “you have more to gain from something new if you have less to begin with.”
Note: Sometimes the less intuitive (but equivalent) definition of submodularity is used: $F$ is submodular if for all $A, B \subset V$,
$$F(A) + F(B) \ge F(A \cup B) + F(A \cap B).$$
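A short derivation, offered as a sketch, of why the marginal-gain definition and the inequality on unions and intersections agree. The enumeration $A \setminus B = \{j_1, \dots, j_m\}$ and the sets $U_k$ are notation introduced here for illustration only.

```latex
% Marginal-gain form implies the union/intersection form.
Let $A \setminus B = \{j_1, \dots, j_m\}$, $U_0 = \emptyset$ and $U_k = \{j_1, \dots, j_k\}$.
Since $(A \cap B) \cup U_{k-1} \subset B \cup U_{k-1}$ and $j_k \notin B \cup U_{k-1}$,
\begin{align*}
F(A \cup B) - F(B)
  &= \sum_{k=1}^{m} F_{B \cup U_{k-1}}(j_k) \\
  &\le \sum_{k=1}^{m} F_{(A \cap B) \cup U_{k-1}}(j_k)
   = F(A) - F(A \cap B).
\end{align*}
% Conversely, take $A = S \cup \{j\}$ and $B = T$ with $S \subset T$, $j \notin T$.
% Then $A \cup B = T \cup \{j\}$ and $A \cap B = S$, so
\begin{align*}
F(S \cup \{j\}) + F(T) \ge F(T \cup \{j\}) + F(S)
  \iff F_S(j) \ge F_T(j).
\end{align*}
```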
More Notation
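Note that $F : 2^V \to \mathbb{R}$ induces a function on $\{0,1\}^n$, which we also denote by $F$, via
$$F(1_A) = F(A),$$
where $1_A$ is the indicator vector of $A$, i.e.,
$$1_A = (x^A_1, \dots, x^A_n),$$
where $x^A_i = 1$ if $i \in A$ and $0$ otherwise.
We will use the set function and the induced function on $\{0,1\}^n$ interchangeably.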
Submodularity and Concavity
In some sense submodular functions are the discrete analogue of concave functions.
A differentiable $f : \mathbb{R} \to \mathbb{R}$ is concave if the derivative $f'(x)$ is non-increasing in $x$.
$F : \{0,1\}^n \to \mathbb{R}$ is submodular if for all $i$ the discrete derivative
$$\partial_i F(x) = F(x + e_i) - F(x),$$
defined for $x$ with $x_i = 0$, is non-increasing in $x$.
Furthermore, if $g : \mathbb{R}_+ \to \mathbb{R}$ is concave, then $F(A) = g(|A|)$ is submodular.
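For $F(A) = g(|A|)$ the marginal gain $F_S(j) = g(|S| + 1) - g(|S|)$ depends only on $|S|$, so submodularity reduces to the increments of $g$ being non-increasing. A small sketch of that check follows; the helper names and the choice of test functions are illustrative assumptions.

```python
import math

def increments(g, n):
    """Marginal gains g(k+1) - g(k) of F(A) = g(|A|) as |A| grows from k to k+1."""
    return [g(k + 1) - g(k) for k in range(n)]

def is_non_increasing(values, tol=1e-12):
    """True if each value is at least as large as the next one."""
    return all(a >= b - tol for a, b in zip(values, values[1:]))

n = 6
examples = [("sqrt", math.sqrt), ("log1p", math.log1p), ("square", lambda t: t * t)]
for name, g in examples:
    gains = increments(g, n)
    print(name, [round(x, 3) for x in gains], is_non_increasing(gains))
```

The two concave choices have shrinking increments, while the convex `square` has growing ones and so does not give a submodular function.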
Examples of submodular functions
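Coverage function. Suppose $(A_i)_{i \in V}$ are measurable sets and $|\cdot|$ denotes their measure (the cardinality, for finite sets). Then
$$F(S) = \Big|\bigcup_{i \in S} A_i\Big|$$
is submodular.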
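A minimal sketch of a coverage function on finite sets, showing the diminishing gain of one element as the base set grows. The sensor names and the locations they cover are made up for illustration.

```python
def coverage(sets):
    """Return F with F(S) = size of the union of sets[i] for i in S."""
    def F(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return len(covered)
    return F

# Hypothetical sensors and the locations each one observes.
sensors = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}
F = coverage(sensors)

gain_small = F({"a", "b"}) - F({"a"})            # b contributes location 4
gain_large = F({"a", "b", "c"}) - F({"a", "c"})  # location 4 is already covered by c
print(gain_small, gain_large)
```

Adding sensor `b` helps when only `a` is present and adds nothing once `c` is also present, which is the inequality $F_S(j) \ge F_T(j)$ for $S = \{a\} \subset T = \{a, c\}$.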
Cut functions. Given a directed or undirected graph $(V, E)$, the function $F(A)$ defined as the total number of edges from $A$ to $V \setminus A$ is submodular.
More generally, if $d : V \times V \to \mathbb{R}_+$ then
$$F(A) = \sum_{i \in A,\, j \in V \setminus A} d(i, j)$$
is submodular.
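The weighted cut can be evaluated directly from the formula. The sketch below assumes the weights are given as a dictionary keyed by ordered pairs; the small graph is hypothetical, and each undirected edge is stored in both directions.

```python
def cut_value(weights, A):
    """F(A) = sum of d(i, j) over i in A and j outside A, for a dict {(i, j): d}."""
    A = set(A)
    return sum(d for (i, j), d in weights.items() if i in A and j not in A)

# Hypothetical undirected graph on V = {0, 1, 2, 3}, stored in both directions.
edges = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 3.0, (0, 3): 0.5}
d = {**edges, **{(j, i): w for (i, j), w in edges.items()}}

S, T, j = {0}, {0, 1}, 2
gain_S = cut_value(d, S | {j}) - cut_value(d, S)
gain_T = cut_value(d, T | {j}) - cut_value(d, T)
print(gain_S, gain_T)
```

Here the gain of adding vertex 2 to the smaller set is at least the gain of adding it to the larger set. A marginal gain of a cut function can also be negative, since cut functions are not monotone.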
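Entropy. Given $n$ random variables $(X_i)_{i \in V}$, define
$$F(A) = H(X_A)$$
to be the joint entropy. Then $F$ is submodular.
Indeed, suppose that $A \subset B$ and $k \in V \setminus B$. Since conditioning on more variables cannot increase entropy,
$$F(A \cup \{k\}) - F(A) = H(X_A, X_k) - H(X_A) = H(X_k \mid X_A) \ge H(X_k \mid X_B) = F(B \cup \{k\}) - F(B).$$
Mutual information is also submodular:
$$I(A) = F(A) + F(V \setminus A) - F(V).$$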
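To make the entropy example concrete, the sketch below computes $H(X_A)$ from an explicit joint distribution and compares two marginal gains. The joint distribution (two fair independent bits and their XOR) and the function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict

def joint_entropy(pmf, A):
    """H(X_A) in bits, where pmf maps outcome tuples to probabilities."""
    idx = sorted(A)
    marginal = defaultdict(float)
    for outcome, p in pmf.items():
        marginal[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

# Hypothetical joint pmf: X0 and X1 are fair independent bits, X2 = X0 XOR X1.
pmf = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

F = lambda A: joint_entropy(pmf, A)
gain_given_less = F({0, 2}) - F({0})        # H(X2 | X0)
gain_given_more = F({0, 1, 2}) - F({0, 1})  # H(X2 | X0, X1)
print(gain_given_less, gain_given_more)
```

Knowing only $X_0$ leaves $X_2$ fully uncertain, while knowing both $X_0$ and $X_1$ determines it, so the marginal gain drops as the conditioning set grows.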
Properties of Submodular Functions
Positive linear combinations: If $F_i$ are submodular and $\alpha_i \ge 0$, then
$$\sum_i \alpha_i F_i$$
is submodular.
Restriction/marginalization: If $B \subset V$ and $F$ is submodular, then
$$A \mapsto F(A \cap B)$$
is submodular on $V$ and on $B$.
Contraction/conditioning: If $B \subset V$ and $F$ is submodular, then
$$A \mapsto F(A \cup B) - F(B)$$
is submodular on $V$ and on $V \setminus B$.
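As an illustration of how such closure properties are verified, here is a sketch of the argument for contraction. The name $G$ for the contracted function is introduced here for convenience.

```latex
Let $G(A) = F(A \cup B) - F(B)$, and take $S \subset T \subset V$, $j \in V \setminus T$.
If $j \in B$, then $G_S(j) = G_T(j) = 0$. Otherwise $j \notin T \cup B$ and
\begin{align*}
G_S(j) &= F(S \cup B \cup \{j\}) - F(S \cup B) = F_{S \cup B}(j) \\
       &\ge F_{T \cup B}(j) = F(T \cup B \cup \{j\}) - F(T \cup B) = G_T(j),
\end{align*}
because $S \cup B \subset T \cup B$ and $F$ is submodular. Note also $G(\emptyset) = 0$.
```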
Remark: If F ,G are submodular then
max{F ,G},min{F ,G}
need NOT be submodular.
Submodularity and Convexity
Although submodular functions are defined like concave functions, their behaviour is very similar to that of convex functions. Before we explore this relation, we will need more notation.
Given $x \in \mathbb{R}^n_+$ and $A \subset V$, define
$$x(A) = \sum_{i \in A} x_i = x^T 1_A,$$
where $1_A \in \mathbb{R}^n$ is the indicator of $A$.
Lovasz Extension
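Given $F : \{0,1\}^n \to \mathbb{R}$ we define the Lovász extension $f : \mathbb{R}^n \to \mathbb{R}$ as follows. For $w \in \mathbb{R}^n$, order the components $w_{j_1} \ge \dots \ge w_{j_n}$ and set
$$\begin{aligned}
f(w) &= w_{j_1} F(\{j_1\}) + \sum_{k=2}^{n} w_{j_k} \left[ F(\{j_1, \dots, j_k\}) - F(\{j_1, \dots, j_{k-1}\}) \right] \\
     &= w_{j_1} F(\{j_1\}) + \sum_{k=2}^{n} w_{j_k} F_{V_{k-1}}(j_k),
\end{aligned}$$
where $V_k = \{j_1, \dots, j_k\}$.
Intuitively you are summing the marginal gains of $F$, weighted by the components of $w$.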
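The definition translates almost line by line into code: sort the coordinates in decreasing order, then accumulate weighted marginal gains. This is a minimal sketch; the function name and the example $F$ are assumptions, and $F$ is taken to accept a frozenset of indices with $F(\emptyset) = 0$.

```python
def lovasz_extension(F, w):
    """Evaluate the Lovász extension of F at w by summing weighted marginal gains."""
    order = sorted(range(len(w)), key=lambda j: -w[j])  # w[j1] >= ... >= w[jn]
    value, prefix, prev = 0.0, frozenset(), 0.0         # prev starts at F(empty set) = 0
    for j in order:
        prefix = prefix | {j}
        cur = F(prefix)
        value += w[j] * (cur - prev)  # component of w times the marginal gain of j
        prev = cur
    return value

F = lambda A: min(len(A), 2)  # hypothetical submodular example

print(lovasz_extension(F, [1.0, 0.0, 1.0]))  # w is the indicator of {0, 2}
print(lovasz_extension(F, [0.5, 0.2, 0.9]))
```

At an indicator vector the sum collapses to the value of $F$ on the corresponding set; at a general point it interpolates between the values of $F$ on the chain $V_1 \subset \dots \subset V_n$.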
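The following are equivalent definitions of the Lovász extension.
$$\begin{aligned}
f(w) &= w_{j_1} F(\{j_1\}) + \sum_{k=2}^{n} w_{j_k} F_{V_{k-1}}(j_k) && (1) \\
     &= \sum_{k=1}^{n-1} (w_{j_k} - w_{j_{k+1}}) F(V_k) + w_{j_n} F(V) && (2) \\
     &= \int_{w_{j_n}}^{\infty} F(\{w \ge z\})\, dz + w_{j_n} F(V) && (3) \\
     &= \sup_{x \in P(F)} w^T x && (4)
\end{aligned}$$
where $\{w \ge z\} = \{i \in V : w_i \ge z\}$ and $P(F) = \{x \in \mathbb{R}^n : \forall A \subset V,\ x(A) \le F(A)\}$ is the submodular polyhedron. Form (4) requires $F$ to be submodular and $w \ge 0$; the supremum is infinite if some $w_i < 0$.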
Properties of Lovasz Extension
$f$ is indeed an extension of $F$: for $A \subset V$,
$$f(1_A) = F(A).$$
$f$ is piecewise affine.
$f$ is convex iff $F$ is submodular.
If $f$ is restricted to $[0,1]^n$, then $f$ attains its minimum at a corner, i.e.
$$\min_{w \in [0,1]^n} f(w) = \min_{x \in \{0,1\}^n} F(x).$$
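These properties can be probed numerically. The sketch below evaluates the extension through the level-set form (2) and then checks, on a hypothetical submodular $F$, that $f$ agrees with $F$ at the corners of the cube and that random interior points never beat the best corner. The function names, the cost vector and the random sampling are illustrative assumptions; sampling is evidence, not a proof.

```python
import itertools
import random

def lovasz_ext(F, w):
    """Lovász extension via form (2): sum_k (w_jk - w_jk+1) F(V_k) + w_jn F(V)."""
    order = sorted(range(len(w)), key=lambda j: -w[j])
    total = w[order[-1]] * F(frozenset(order))
    for k in range(1, len(w)):
        total += (w[order[k - 1]] - w[order[k]]) * F(frozenset(order[:k]))
    return total

# Hypothetical submodular F: a concave function of |A| minus a modular cost.
n = 3
cost = [0.3, 1.2, 0.4]
F = lambda A: len(A) ** 0.5 - sum(cost[i] for i in A)

corners = list(itertools.product([0.0, 1.0], repeat=n))
as_set = lambda x: frozenset(i for i in range(n) if x[i] == 1.0)
corner_min = min(F(as_set(x)) for x in corners)

random.seed(0)
samples = [[random.random() for _ in range(n)] for _ in range(2000)]
sample_min = min(lovasz_ext(F, w) for w in samples)
print(corner_min, sample_min)
```

On each region where the ordering of the coordinates is fixed, $f$ is a convex combination of the values $F(\emptyset), F(V_1), \dots, F(V_n)$, which is why no interior point can fall below the smallest corner value.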
Minimization of Submodular functions
Suppose we now want to find the minimizing set of a submodular function, i.e., we want to find
$$A^* = \operatorname{argmin}\{F(A) : A \subset V\}.$$
By the properties of the Lovász extension, this is equivalent to finding
$$\operatorname{argmin}\{f(w) : w \in [0,1]^n\},$$
where $f$ is the Lovász extension of $F$.
Theorem
$f$ can be minimized using the ellipsoid method in $O(n^8 \log^2 n)$ time.
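The ellipsoid method is rarely implemented by hand. As a swapped-in illustration of the same reduction (minimize the convex extension over the cube, then read off a set), the sketch below uses projected subgradient descent instead, which is a different and much simpler technique with no claim to the running time above. The step size, iteration count, function names and example $F$ are all assumptions.

```python
import itertools

def greedy_subgradient(F, w):
    """Subgradient of the Lovász extension at w: g[jk] = F(V_k) - F(V_{k-1})."""
    order = sorted(range(len(w)), key=lambda j: -w[j])
    g, prefix, prev = [0.0] * len(w), frozenset(), 0.0
    for j in order:
        prefix = prefix | {j}
        cur = F(prefix)
        g[j] = cur - prev
        prev = cur
    return g, order

def minimize_submodular(F, n, steps=200):
    """Projected subgradient descent on [0,1]^n; each iterate is rounded to its best level set."""
    w = [0.5] * n
    best_set, best_val = frozenset(), 0.0  # F(empty set) = 0 is always available
    for t in range(1, steps + 1):
        g, order = greedy_subgradient(F, w)
        for k in range(1, n + 1):  # level sets V_1, ..., V_n of the current iterate
            cand = frozenset(order[:k])
            if F(cand) < best_val:
                best_set, best_val = cand, F(cand)
        step = 1.0 / t ** 0.5
        w = [min(1.0, max(0.0, wi - step * gi)) for wi, gi in zip(w, g)]
    return best_set, best_val

# Hypothetical submodular F: concave in |A| minus a modular cost.
cost = [0.3, 1.2, 0.4]
F = lambda A: len(A) ** 0.5 - sum(cost[i] for i in A)

best_set, best_val = minimize_submodular(F, n=3)
brute_min = min(F(frozenset(c)) for r in range(4) for c in itertools.combinations(range(3), r))
print(best_set, best_val, brute_min)
```

The brute-force minimum is included only to sanity-check the sketch on a three-element ground set; it is not part of the method.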
Symmetric Submodular Functions
We can bring down that $O(n^8)$ running time if we impose some extra structure on $F$.
We say that $F$ is symmetric if $F(A) = F(V \setminus A)$. Examples include:
Mutual information. Given random variables $(X_i)_{i \in V}$,
$$F(A) = I(X_A; X_{V \setminus A}) = I(X_{V \setminus A}; X_A) = F(V \setminus A).$$
Cut functions. Given a weighted undirected graph $(V, E)$ with weights $\{d(e)\}_{e \in E}$,
$$F(A) = \sum_{i \in A,\, j \in V \setminus A} d(i, j) = F(V \setminus A).$$
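Note that for symmetric submodular functions
$$\begin{aligned}
2F(A) &= F(A) + F(V \setminus A) \\
      &\ge F(A \cap (V \setminus A)) + F(A \cup (V \setminus A)) \\
      &= F(\emptyset) + F(V) \\
      &= 2F(\emptyset) = 0.
\end{aligned}$$
So $F$ is trivially minimized at $\emptyset$ and at $V$. We are interested in
$$\operatorname{argmin}\{F(A) : A \subset V,\ 0 < |A| < n\}.$$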
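On a small graph the constrained problem can be solved by exhaustive search, which makes the statements above easy to check. The sketch below uses an undirected cut function on a hypothetical four-vertex graph; the edge weights and names are illustrative, and the enumeration is exponential in $n$.

```python
from itertools import combinations

def cut(edges, A):
    """Undirected cut: total weight of edges with exactly one endpoint in A."""
    A = set(A)
    return sum(w for (i, j), w in edges.items() if (i in A) != (j in A))

# Hypothetical weighted graph on V = {0, 1, 2, 3}.
V = [0, 1, 2, 3]
edges = {(0, 1): 3.0, (1, 2): 0.5, (2, 3): 4.0, (0, 3): 1.0, (0, 2): 0.2}
F = lambda A: cut(edges, A)

# All subsets with 0 < |A| < n.
proper = [c for r in range(1, len(V)) for c in combinations(V, r)]
best = min(proper, key=F)
print(best, F(best))
```

The empty set and $V$ both have value zero, every proper subset has a non-negative value, and the best proper subset separates the two heavily connected pairs.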
Theorem (Queyranne 98)
If $F$ is a symmetric submodular function, then there is a fully combinatorial algorithm for solving
$$\operatorname{argmin}\{F(A) : A \subset V,\ 0 < |A| < n\}$$
with run time $O(n^3)$.
The algorithm is very easy to implement but requires some new machinery that we don’t have time for.
See slides 47-53 of “http://submodularity.org/submodularity-slides.pdf”.
Example: Clustering
Suppose we want to partition $V$ into $k$ clusters $A_1, \dots, A_k$ such that
$$F(A_1, \dots, A_k) = \sum_{i=1}^{k} E(A_i)$$
is minimized, where $E$ is some submodular function such as entropy or a cut function.
In the special case $k = 2$,
$$F(A) = E(A) + E(V \setminus A)$$
is symmetric and submodular, and thus we can apply Queyranne’s algorithm.
When $k > 2$ we can apply a greedy splitting algorithm.
1. Initially let the partition $P_1 = \{V\}$.
2. For $i = 1, \dots, k-1$:
   - For each $C_j \in P_i$, get a partition $P_i^j$ from splitting $C_j$ in 2 using Queyranne's algorithm.
   - Set $P_{i+1} = \operatorname{argmin}_j F(P_i^j)$.
Theorem
If $P$ is the partition of size $k$ from the greedy splitting algorithm, then
$$F(P) \le \left(2 - \frac{2}{k}\right) F(P_{\mathrm{opt}}).$$
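The greedy splitting steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two-way split is found by brute-force enumeration, which stands in for Queyranne's algorithm and is exponential in the cluster size, and the function names, the cut-function choice for $E$, and the toy graph are all hypothetical.

```python
from itertools import combinations

def cut_value(S, weights):
    """E(S): total weight of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(w for (u, v), w in weights.items() if (u in S) != (v in S))

def best_split(C, E):
    """Brute-force stand-in for Queyranne's algorithm (not the real thing).

    Returns (cost, S, T) minimizing E(S) + E(T) over splits of C into two
    non-empty parts, or None if C has fewer than two elements.
    """
    items = sorted(C)
    first, rest = items[0], items[1:]
    best = None
    # Fixing `first` in S avoids visiting each split twice; r < len(rest)
    # keeps the complement T non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            S = frozenset((first,) + extra)
            T = frozenset(items) - S
            cost = E(S) + E(T)
            if best is None or cost < best[0]:
                best = (cost, S, T)
    return best

def greedy_splitting(V, E, k):
    """Start from P_1 = {V}; repeatedly apply the cheapest single split."""
    partition = [frozenset(V)]
    for _ in range(k - 1):
        candidates = []
        for j, C in enumerate(partition):
            split = best_split(C, E)
            if split is None:          # singleton clusters cannot be split
                continue
            _, S, T = split
            new_partition = partition[:j] + [S, T] + partition[j + 1:]
            candidates.append((sum(E(A) for A in new_partition), new_partition))
        if not candidates:
            break
        partition = min(candidates, key=lambda c: c[0])[1]
    return partition

# Toy graph (hypothetical): two triangles joined by one weak edge.
weights = {(0, 1): 3, (1, 2): 3, (0, 2): 3,
           (3, 4): 3, (4, 5): 3, (3, 5): 3,
           (2, 3): 1}
E = lambda S: cut_value(S, weights)
two_clusters = greedy_splitting(range(6), E, 2)
three_clusters = greedy_splitting(range(6), E, 3)
```

On this toy graph the first split separates the two triangles, since that is the only split that cuts just the weak edge.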
Example: Image Denoising
Suppose we have a noisy image and we want to find the true underlying image.
Suppose we have a pairwise Markov random field. Suppose $Y_i$ are the true pixels and $X_i$ are the "noisy" ones.
So we have the graphical model
$$P(X_1, \dots, X_n, Y_1, \dots, Y_n) = \prod_{i,j} \psi_{i,j}(Y_i, Y_j) \prod_{i} \phi_i(X_i, Y_i).$$
To find the MAP estimate we want
$$\operatorname{argmax}_Y P(Y \mid X) = \operatorname{argmax}_Y P(X, Y) = \operatorname{argmin}_Y \sum_{i,j} E_{i,j}(Y_i, Y_j) + \sum_{i} E_i(Y_i),$$
where
$$E_{i,j}(Y_i, Y_j) = -\log \psi_{i,j}(Y_i, Y_j), \qquad E_i(Y_i) = -\log \phi_i(X_i, Y_i).$$
When is MAP inference efficiently solvable (in high tree-width graphical models)? In general it is NP-hard.
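To make the energy formulation concrete, here is a minimal sketch that finds the MAP estimate by exhaustive search on a tiny one-dimensional "image" of binary pixels. The specific potentials are assumptions chosen for illustration: a pairwise penalty `lam` when neighbouring pixels disagree and a unary penalty `eta` when a pixel disagrees with its noisy observation. Exhaustive search costs $2^n$ evaluations, which is exactly why the general problem is hard.

```python
from itertools import product

def energy(y, x, lam=1.5, eta=2.0):
    """Pairwise plus unary energy for a 1-D chain of binary pixels.

    lam and eta are hypothetical penalty weights, not taken from the talk.
    """
    pairwise = sum(lam * (y[i] != y[i + 1]) for i in range(len(y) - 1))
    unary = sum(eta * (y[i] != x[i]) for i in range(len(y)))
    return pairwise + unary

def map_estimate_bruteforce(x, lam=1.5, eta=2.0):
    """argmin over all 2^n labelings; only feasible for very small n."""
    return min(product((0, 1), repeat=len(x)),
               key=lambda y: energy(y, x, lam, eta))

noisy = (0, 0, 1, 0, 0)          # one flipped pixel in an all-zero signal
denoised = map_estimate_bruteforce(noisy)
```

With these weights, keeping the isolated 1 costs two disagreement penalties (energy 3.0), while flipping it costs one unary penalty (energy 2.0), so the minimizer smooths it away.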
Suppose the $Y_i$ are binary. Then we have:
Theorem (Kolmogorov, Zabih, '04)
The MAP inference problem is solvable by graph cuts iff for all $i, j$,
$$E_{i,j}(0, 0) + E_{i,j}(1, 1) \le E_{i,j}(0, 1) + E_{i,j}(1, 0),$$
i.e., iff each $E_{i,j}$ is submodular.
See http://www.cs.cornell.edu/~rdz/papers/kz-pami04.pdf if you are interested in seeing the details.
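The condition in the theorem is easy to test for a given pairwise term. The sketch below represents a binary pairwise energy as a dictionary keyed by label pairs; this representation and the two example tables are assumptions made for illustration.

```python
def is_submodular_pairwise(E):
    """Check E(0,0) + E(1,1) <= E(0,1) + E(1,0) for one pairwise term.

    E maps each label pair (a, b) in {0,1}^2 to a real-valued energy.
    """
    return E[(0, 0)] + E[(1, 1)] <= E[(0, 1)] + E[(1, 0)]

# A smoothing term penalizes disagreement between neighbours: submodular.
smoothing = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): 1.5, (1, 0): 1.5}

# A term that rewards disagreement violates the condition.
anti_smoothing = {(0, 0): 1.5, (1, 1): 1.5, (0, 1): 0.0, (1, 0): 0.0}

smoothing_ok = is_submodular_pairwise(smoothing)
anti_smoothing_ok = is_submodular_pairwise(anti_smoothing)
```

Smoothness priors of the first kind are the usual choice in denoising, which is why graph cuts apply there.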
Maximization
Submodular maximization
Again, even though submodular functions are defined to emulate concave functions, in practice they behave like convex ones.
Convex functions:
Minimizing ⇒ polynomial time
Maximizing ⇒ NP-hard
Submodular functions:
Minimizing ⇒ polynomial time
Maximizing ⇒ NP-hard
BUT all hope is not lost, as we can sometimes efficiently obtain approximation guarantees!
Monotonic Functions
We say that $F$ is monotonic if $A \subset B$ implies
$$F(A) \le F(B).$$
Some examples include:
- Coverage function. If $(A_i)_{i \in V}$ are measurable sets, then for $A \subset B \subset V$,
$$F(A) = \left|\bigcup_{i \in A} A_i\right| \le \left|\bigcup_{i \in B} A_i\right| = F(B).$$
- Entropy. If $(X_i)_{i \in V}$ are random variables and $B = A \cup C \subset V$, then
$$F(B) = H(X_A, X_C) = H(X_A) + H(X_C \mid X_A) \ge H(X_A) = F(A).$$
Similarly, information gain is another example.
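For a small ground set, monotonicity and submodularity can both be checked by brute force. The sketch below does this for a coverage function. The helper names and the example sets are hypothetical, and the checks enumerate all subsets, so they are only practical for a handful of elements.

```python
from itertools import combinations

def coverage(sets, A):
    """F(A) = size of the union of the sets indexed by A."""
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

def subsets(V):
    """All subsets of V as frozensets."""
    V = list(V)
    return [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]

def is_monotone(F, V):
    """A subset of B implies F(A) <= F(B)."""
    return all(F(A) <= F(B) for A in subsets(V) for B in subsets(V) if A <= B)

def is_submodular(F, V):
    """Diminishing returns: the gain of x at A is at least its gain at B, for A inside B."""
    V = frozenset(V)
    return all(F(A | {x}) - F(A) >= F(B | {x}) - F(B)
               for B in subsets(V) for A in subsets(B) for x in V - B)

# Hypothetical example: four sensors, each covering a few locations.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
F = lambda A: coverage(sets, A)
square = lambda A: len(A) ** 2   # monotone but with increasing returns

coverage_checks = (is_monotone(F, sets.keys()), is_submodular(F, sets.keys()))
square_checks = (is_monotone(square, sets.keys()), is_submodular(square, sets.keys()))
```

The second function, `square`, is included as a contrast: it is monotone, but its marginal gains grow with the set, so it fails the diminishing-returns check.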
Greedy Algorithm
For monotonic functions, $F$ is clearly maximized at $V$. So we are interested in the constrained problem
$$\operatorname{argmax}_{|A| \le k} F(A).$$
We will apply the greedy approach.
1. Initialize $A_0 = \emptyset$.
2. For $i = 1$ to $k$:
   - $x_i = \operatorname{argmax}_x F_{A_{i-1}}(x) = \operatorname{argmax}_x \left[ F(A_{i-1} \cup \{x\}) - F(A_{i-1}) \right]$
   - $A_i = A_{i-1} \cup \{x_i\}$
Theorem (Nemhauser et al. '78)
Given a monotonic submodular function $F$,
$$F(A_{\mathrm{greedy}}) \ge \left(1 - \frac{1}{e}\right) \max_{|A| \le k} F(A) \approx 0.63 \max_{|A| \le k} F(A).$$
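The greedy rule above can be sketched in a few lines of Python. The example below applies it to a coverage function on a small instance, chosen (as an assumption for illustration) so that greedy is not optimal, and compares the result with the brute-force optimum and the $(1 - 1/e)$ bound. Function names and data are hypothetical.

```python
from itertools import combinations
import math

def coverage(sets, A):
    """F(A) = size of the union of the sets indexed by A."""
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

def greedy_maximize(F, V, k):
    """Add, k times, the element with the largest marginal gain."""
    A = frozenset()
    for _ in range(k):
        candidates = [x for x in V if x not in A]
        if not candidates:
            break
        best = max(candidates, key=lambda x: F(A | {x}) - F(A))
        A = A | {best}
    return A

def brute_force_max(F, V, k):
    """Exact optimum over subsets of size k (enough when F is monotone)."""
    return max(F(frozenset(c)) for c in combinations(V, k))

# Set 0 is the largest, so greedy takes it first, but sets 1 and 2
# together cover all eight elements.
sets = {0: {1, 2, 3, 4, 5, 6}, 1: {1, 2, 3, 7}, 2: {4, 5, 6, 8}}
F = lambda A: coverage(sets, A)
V = [0, 1, 2]

greedy_value = F(greedy_maximize(F, V, 2))
optimal_value = brute_force_max(F, V, 2)
bound = (1 - 1 / math.e) * optimal_value
```

Here greedy covers 7 elements against an optimum of 8, comfortably above the guaranteed fraction.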
Example: Variance Reduction
Suppose we have the linear model
$$Y = \sum_{i=1}^{n} \alpha_i X_i.$$
- Each $X_i$ represents a measurement by some sensor $i$, with joint distribution $P(X_1, \dots, X_n)$.
- Let $V$ denote the set of possible sensors.
- Sensors are expensive, so we want to pick the best $k$ sensors that minimize the variance in the prediction $Y$.
We want to find $A$ with $|A| \le k$ such that $\operatorname{Var}(Y \mid X_A)$ is minimized. Equivalently, we want to find $A$ such that the variance reduction
$$F(A) = \operatorname{Var}(Y) - \operatorname{Var}(Y \mid X_A)$$
is maximized.
$$\operatorname{argmax}_{|A| \le k} F(A) = \operatorname{argmax}_{|A| \le k} \left[ \operatorname{Var}(Y) - \operatorname{Var}(Y \mid X_A) \right]$$
In general this problem is NP-hard, but it should be noted that $F$ is always monotonic.
Theorem (Das & Kempe, '08)
If $X_1, \dots, X_n$ are jointly Gaussian, then $F$ is submodular.
Thus we can apply the greedy algorithm!
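A minimal sketch of greedy sensor selection for this example, assuming the $X_i$ are jointly Gaussian with a known covariance matrix. Under that assumption, conditioning on a sensor $j$ updates the covariance by a rank-one (Schur complement) correction, and the marginal variance reduction from adding $j$ is $\operatorname{Cov}(Y, X_j \mid X_A)^2 / \operatorname{Var}(X_j \mid X_A)$. The function name and the toy covariance are hypothetical, and the code simply runs the greedy rule without verifying submodularity.

```python
def greedy_variance_reduction(cov, alpha, k):
    """Greedily pick up to k sensors for Y = sum_i alpha[i] * X[i].

    cov is the covariance matrix of X (list of lists). Returns the chosen
    sensor indices and the remaining variance Var(Y | X_A).
    """
    n = len(alpha)
    S = [row[:] for row in cov]      # covariance of X given the chosen sensors
    chosen = []
    for _ in range(k):
        # b[j] = Cov(Y, X_j | X_A)
        b = [sum(S[j][i] * alpha[i] for i in range(n)) for j in range(n)]
        gains = {j: b[j] ** 2 / S[j][j]
                 for j in range(n) if j not in chosen and S[j][j] > 1e-12}
        if not gains:
            break
        j = max(gains, key=gains.get)
        chosen.append(j)
        # Condition on X_j: S <- S - S[:, j] S[j, :] / S[j, j]
        col = [S[r][j] for r in range(n)]
        pivot = S[j][j]
        S = [[S[r][c] - col[r] * col[c] / pivot for c in range(n)]
             for r in range(n)]
    residual = sum(alpha[r] * S[r][c] * alpha[c]
                   for r in range(n) for c in range(n))
    return chosen, residual

# Toy case: three independent unit-variance sensors, so Var(Y) = 1 + 9 + 4 = 14.
cov = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
alpha = [1.0, 3.0, 2.0]
chosen, residual = greedy_variance_reduction(cov, alpha, 2)
```

With independent sensors the greedy choice just follows the largest coefficients, leaving only the variance contributed by the unobserved sensor.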
References
These are some of the sources I used to prepare for this talk, and I think they are good to check out in case you are further interested in submodularity or want a more rigorous treatment.
Some slides worth reading:
http://www.di.ens.fr/~fbach/submodular_fbach_mlss2012.pdf
http://submodularity.org/submodularity-slides.pdf
http://theory.stanford.edu/~jvondrak/data/submod-tutorial-1.pdf
The following notes from Francis Bach were very helpful, especially if you are interested in the theory as opposed to a big-picture overview.
http://arxiv.org/pdf/1010.4207.pdf