Multi-objective optimization NSGA-II


Page 1: Multi-objective optimization NSGA-II


Last class

• Multi-objective optimization

• NSGA-II

• SMS-EMOA

• MOEA/D

Popular variants of MOEA

Page 2: Multi-objective optimization NSGA-II

Heuristic Search and Evolutionary Algorithms

Chao Qian (钱超)

Associate Professor, Nanjing University, China

Email: [email protected]

Homepage: http://www.lamda.nju.edu.cn/qianc/

Lecture 12: Evolutionary Algorithms for Constrained Optimization

Page 3: Multi-objective optimization NSGA-II


Constrained optimization

π‘šπ‘–π‘›π‘₯βˆŠπ’³ 𝑓 π‘₯

𝑠. 𝑑. 𝑔𝑖 π‘₯ = 0, 1 ≀ 𝑖 ≀ π‘ž;

β„Žπ‘– π‘₯ ≀ 0, π‘ž + 1 ≀ 𝑖 ≀ π‘š

General formulation:

objective function

equality constraints

inequality constraints

The goal: find a feasible solution minimizing the objective 𝑓

A solution is (in)feasible if it does (not) satisfy the constraints

Page 4: Multi-objective optimization NSGA-II


Example – knapsack

Knapsack problem: given $n$ items, each with a weight $w_i$ and a value $v_i$, select a subset of items maximizing the sum of values while keeping the summed weight within some capacity $W_{max}$:

$$\arg\max_{\boldsymbol{x} \in \{0,1\}^n} \sum_{i=1}^n v_i x_i \quad \text{s.t.} \quad \sum_{i=1}^n w_i x_i \le W_{max}$$

$x_i = 1$: the $i$-th item is included; $\sum_{i=1}^n v_i x_i$ is the objective function and $\sum_{i=1}^n w_i x_i \le W_{max}$ is the constraint.

Page 5: Multi-objective optimization NSGA-II


Example – minimum spanning tree

Minimum spanning tree problem:

given an undirected connected graph $G = (V, E)$ on $n$ vertices and $m$ edges with positive weights $w: E \to \mathbb{R}^+$, find a connected subgraph $E' \subseteq E$ with the minimum weight:

$$\arg\min_{\boldsymbol{x} \in \{0,1\}^m} \sum_{i=1}^m w_i x_i \quad \text{s.t.} \quad c(\boldsymbol{x}) = 1$$

$x_i = 1$: the $i$-th edge is selected; $\sum_{i=1}^m w_i x_i$ is the objective function and $c(\boldsymbol{x}) = 1$ is the constraint, where $c(\boldsymbol{x})$ denotes the number of connected components.

Page 6: Multi-objective optimization NSGA-II


Example – traveling salesman

Traveling salesman problem:

given an undirected connected graph $G = (V, E)$ on $n$ vertices and $m$ edges with positive weights $w: E \to \mathbb{R}^+$, find a Hamiltonian cycle with the minimum weight:

$$\arg\min_{x} w(x) \quad \text{s.t.} \quad x \text{ is a Hamiltonian cycle}$$

Objective function: the sum of the edge weights on the cycle

Constraint: visit each vertex exactly once, starting and ending at the same vertex

Page 7: Multi-objective optimization NSGA-II


EAs for constrained optimization

How to deal with constraints when EAs are used for constrained optimization?

Optimization problems in real-world applications often come with constraints.

Page 8: Multi-objective optimization NSGA-II


Constraint handling strategies

The final output solution must satisfy the constraints

Common constraint handling strategies

• Penalty functions

• Repair functions

• Restricting search to the feasible region

• Decoder functions

Page 9: Multi-objective optimization NSGA-II


Penalty functions

Penalty functions: add penalties to the fitness of infeasible solutions.

Constrained:
$$\min f(x) \quad \text{s.t.} \quad g_i(x) = 0,\ 1 \le i \le q; \quad h_i(x) \le 0,\ q+1 \le i \le m$$

Unconstrained:
$$\min f(x) + \sum_{i=1}^m \lambda_i \cdot f_i(x)$$

where $f_i(x)$ is the $i$-th constraint violation degree:
$$f_i(x) = \begin{cases} |g_i(x)| & 1 \le i \le q \\ \max\{0,\, h_i(x)\} & q+1 \le i \le m \end{cases}$$

Page 10: Multi-objective optimization NSGA-II


Penalty functions

Constrained:
$$\min f(x) \quad \text{s.t.} \quad g_i(x) = 0,\ 1 \le i \le q; \quad h_i(x) \le 0,\ q+1 \le i \le m$$

Unconstrained:
$$\min f(x) + \sum_{i=1}^m \lambda_i \cdot f_i(x)$$

Penalty functions: add penalties to the fitness of infeasible solutions.

Requirement: the optimal solutions of the original and the transformed problems should coincide.

• e.g., if all $\lambda_i$ are equal and large enough, the comparison is equivalent to: compare the constraint violation degrees first; if they are equal, compare the objective values $f$.
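A minimal Python sketch (not from the slides; all names are illustrative) of such a penalized fitness with one large, equal coefficient for all constraints:

```python
# A minimal sketch: penalized fitness for a generic constrained minimization problem.
# `objective`, `equalities` (functions g_i that should equal 0) and `inequalities`
# (functions h_i that should be <= 0) are hypothetical names chosen for illustration.
def penalized_fitness(x, objective, equalities, inequalities, lam=1e6):
    violation = sum(abs(g(x)) for g in equalities)           # violation degree of g_i(x) = 0
    violation += sum(max(0.0, h(x)) for h in inequalities)   # violation degree of h_i(x) <= 0
    return objective(x) + lam * violation                    # one large, equal penalty coefficient

# Toy usage: minimize f(x) = x0 + x1 subject to x0 + x1 >= 1 (i.e., 1 - x0 - x1 <= 0)
print(penalized_fitness([0, 0], sum, [], [lambda x: 1 - sum(x)]))   # infeasible: large penalty
print(penalized_fitness([1, 0], sum, [], [lambda x: 1 - sum(x)]))   # feasible: fitness 1
```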

Page 11: Multi-objective optimization NSGA-II


Penalty functions

Minimum spanning tree problem: given an undirected connected graph $G = (V, E)$ on $n$ vertices and $m$ edges with positive weights $w: E \to \mathbb{R}^+$, find a connected subgraph $E' \subseteq E$ with the minimum weight:

$$\arg\min_{\boldsymbol{x} \in \{0,1\}^m} \sum_{i=1}^m w_i x_i \quad \text{s.t.} \quad c(\boldsymbol{x}) = 1$$

Fitness function:
$$\min\ \big(c(\boldsymbol{x}) - 1\big) \cdot w_{ub} + \sum_{i:\, x_i = 1} w_i,$$
where $c(\boldsymbol{x}) - 1$ is the constraint violation degree, $w_{ub} = n^2 \cdot w_{max}$ is a sufficiently large penalty coefficient, and $\sum_{i:\, x_i=1} w_i$ is the original objective function.
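A minimal Python sketch (not from the slides; the edge-list representation and helper names are assumptions for illustration) of this fitness, computing $c(\boldsymbol{x})$ with a union-find structure:

```python
# The graph is a list of (u, v, w) edges over vertices 0..n-1; x is a 0/1 list over the edges.
def num_components(n, edges, x):
    parent = list(range(n))                       # union-find over the selected edges
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]         # path halving
            a = parent[a]
        return a
    for (u, v, _), bit in zip(edges, x):
        if bit:
            parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})       # c(x): number of connected components

def mst_penalty_fitness(n, edges, x):
    w_max = max(w for _, _, w in edges)
    w_ub = n * n * w_max                          # penalty coefficient n^2 * w_max
    weight = sum(w for (_, _, w), bit in zip(edges, x) if bit)
    return (num_components(n, edges, x) - 1) * w_ub + weight

# Toy usage: a triangle on 3 vertices
edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
print(mst_penalty_fitness(3, edges, [1, 1, 0]))   # spanning tree: fitness 3
print(mst_penalty_fitness(3, edges, [1, 0, 0]))   # 2 components: heavily penalized
```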

Page 12: Multi-objective optimization NSGA-II


Repair functions

Repair functions: repair infeasible solutions into feasible ones

Example - Knapsack: given $n$ items, each with a weight $w_i$ and a value $v_i$, select a subset of items maximizing the sum of values while keeping the summed weight within some capacity $W_{max}$:

$$\arg\max_{\boldsymbol{x} \in \{0,1\}^n} \sum_{i=1}^n v_i x_i \quad \text{s.t.} \quad \sum_{i=1}^n w_i x_i \le W_{max}$$

Repairing: scan the solution from left to right, and keep a 1 only if the summed weight so far does not exceed $W_{max}$.

Example with $v_i$: 4, 2, 6, 10, 4, 3, 7, 2; $w_i$: 2, 3, 3, 8, 6, 5, 7, 1; $W_{max} = 25$:

infeasible 1 1 0 1 1 0 1 1 (total weight 27) $\to$ feasible 1 1 0 1 1 0 0 1 (total weight 20)
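The repair rule can be written as a short function; a minimal Python sketch (not from the slides; names are illustrative):

```python
# Left-to-right repair of an infeasible knapsack solution: drop items that no longer fit.
def repair(x, weights, W_max):
    total, repaired = 0, []
    for bit, w in zip(x, weights):
        if bit and total + w <= W_max:   # keep the item only if it still fits
            repaired.append(1)
            total += w
        else:
            repaired.append(0)
    return repaired

# Example from the slide: weights 2,3,3,8,6,5,7,1 and W_max = 25
print(repair([1, 1, 0, 1, 1, 0, 1, 1], [2, 3, 3, 8, 6, 5, 7, 1], 25))
# -> [1, 1, 0, 1, 1, 0, 0, 1]
```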

Page 13: Multi-objective optimization NSGA-II


Restricting search to the feasible region

Restricting search to the feasible region: preserving feasibility by special initialization and reproduction

Example - traveling salesman: given an undirected connected graph $G = (V, E)$ on $n$ vertices and $m$ edges with positive weights $w: E \to \mathbb{R}^+$, find a Hamiltonian cycle with the minimum weight:

$$\arg\min_{x} w(x) \quad \text{s.t.} \quad x \text{ is a Hamiltonian cycle}$$

Integer vector representation: the order of visiting the vertices, e.g., 1 6 2 5 7 4 8 3.

Initialize with a random permutation; apply mutation and recombination operators designed for the permutation representation.

Any permutation is feasible.
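A minimal Python sketch (not from the slides) of feasibility-preserving initialization and mutation for the permutation representation; swap mutation is used here as one common permutation operator:

```python
import random

def init_permutation(n):
    tour = list(range(1, n + 1))
    random.shuffle(tour)                        # any permutation is a feasible Hamiltonian cycle
    return tour

def swap_mutation(tour):
    child = tour[:]
    i, j = random.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]     # swapping two cities keeps it a permutation
    return child

print(swap_mutation(init_permutation(8)))
```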

Page 14: Multi-objective optimization NSGA-II


Decoder functions

Decoder functions: map each genotype to a feasible phenotype

Example - Knapsack: given $n$ items, each with a weight $w_i$ and a value $v_i$, select a subset of items maximizing the sum of values while keeping the summed weight within some capacity $W_{max}$:

$$\arg\max_{\boldsymbol{x} \in \{0,1\}^n} \sum_{i=1}^n v_i x_i \quad \text{s.t.} \quad \sum_{i=1}^n w_i x_i \le W_{max}$$

Decoding: scan the genotype from left to right, and keep a 1 only if the summed weight so far does not exceed $W_{max}$.

Example with $v_i$: 4, 2, 6, 10, 4, 3, 7, 2; $w_i$: 2, 3, 3, 8, 6, 5, 7, 1; $W_{max} = 25$:

genotype 1 1 0 1 1 0 1 1 $\to$ phenotype 1 1 0 1 1 0 0 1

Page 15: Multi-objective optimization NSGA-II


Constraint handling strategies

The final output solution must satisfy the constraints

Common constraint handling strategies

• Penalty functions

• Repair functions

• Restricting search to the feasible region

• Decoder functions

Other effective constraint handling strategies?

Page 16: Multi-objective optimization NSGA-II


(1+1)-EA for MST

Minimum spanning tree (MST):

• Given: an undirected connected graph $G = (V, E)$ on $n$ vertices and $m$ edges with positive integer weights $w: E \to \mathbb{N}^+$

• The goal: find a connected subgraph $E' \subseteq E$ with the minimum weight

Formulation:
$$\arg\min_{\boldsymbol{x} \in \{0,1\}^m} \sum_{i=1}^m w_i x_i \quad \text{s.t.} \quad c(\boldsymbol{x}) = 1$$

$x_i = 1$ means that edge $e_i$ is selected; a vector $\boldsymbol{x} \in \{0,1\}^m$ corresponds to a subgraph.

Page 17: Multi-objective optimization NSGA-II


(1+1)-EA: Given a pseudo-Boolean function $f$ to be minimized:

1. $\boldsymbol{x} :=$ a solution selected from $\{0,1\}^n$ uniformly at random.
2. Repeat until some termination criterion is met:
3.     $\boldsymbol{x}' :=$ flip each bit of $\boldsymbol{x}$ with probability $1/n$.
4.     if $f(\boldsymbol{x}') \le f(\boldsymbol{x})$:
5.         $\boldsymbol{x} := \boldsymbol{x}'$.

(1+1)-EA for MST

Theorem. [Neumann & Wegener, TCS’07; Doerr et al., Algorithmica’12] The expected running time of the (1+1)-EA solving the MST problem is 𝑂(π‘š2(log 𝑛 + logπ‘€π‘šπ‘Žπ‘₯)).

Fitness function (using the penalty-function strategy):
$$\min\ \big(c(\boldsymbol{x}) - 1\big) \cdot w_{ub} + \sum_{i:\, x_i = 1} w_i,$$
where $c(\boldsymbol{x}) - 1$ is the constraint violation degree, $w_{ub} = n^2 \cdot w_{max}$, and $\sum_{i:\, x_i=1} w_i$ is the original objective function.
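A minimal Python sketch of the (1+1)-EA loop (not from the slides; the fitness is passed in as a callable, and the toy usage minimizes the number of zeros rather than the MST fitness):

```python
import random

def one_plus_one_ea(f, n, iterations=10000):
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(iterations):
        # bit-wise mutation: flip each bit independently with probability 1/n
        y = [bit ^ 1 if random.random() < 1.0 / n else bit for bit in x]
        if f(y) <= f(x):                      # accept the offspring if it is not worse
            x = y
    return x

# Toy usage: minimizing the number of zeros (i.e., maximizing OneMax)
print(one_plus_one_ea(lambda x: x.count(0), n=20))
```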

Page 18: Multi-objective optimization NSGA-II


MST by MOEAs

Bi-objective reformulation: $\min\ \big(c(\boldsymbol{x}),\ \sum_{i:\, x_i=1} w_i\big)$ for the original problem $\arg\min_{\boldsymbol{x} \in \{0,1\}^m} \sum_{i=1}^m w_i x_i$ s.t. $c(\boldsymbol{x}) = 1$.

Theorem. [Neumann & Wegener, Natural Computing'06] The expected running time of GSEMO solving the MST problem is $O(mn(n + \log w_{max}))$.

Penalty functions: $O(m^2(\log n + \log w_{max}))$

Bi-objective reformulation: $O(mn(n + \log w_{max}))$

The bi-objective reformulation is better for dense graphs, e.g., $m \in \Theta(n^2)$.

Page 19: Multi-objective optimization NSGA-II


MST by MOEAs

Bi-objective reformulation: $\min\ \big(c(\boldsymbol{x}),\ \sum_{i:\, x_i=1} w_i\big)$ for the original problem $\arg\min_{\boldsymbol{x} \in \{0,1\}^m} \sum_{i=1}^m w_i x_i$ s.t. $c(\boldsymbol{x}) = 1$.

GSEMO: Given a pseudo-Boolean function vector $\boldsymbol{f}$ to be minimized:

1. $\boldsymbol{x} :=$ a solution selected from $\{0,1\}^n$ uniformly at random.
2. $P := \{\boldsymbol{x}\}$.
3. Repeat until some termination criterion is met:
4.     Choose $\boldsymbol{x}$ from $P$ uniformly at random.
5.     $\boldsymbol{x}' :=$ flip each bit of $\boldsymbol{x}$ with probability $1/n$.
6.     if $\nexists\, \boldsymbol{z} \in P$ such that $\boldsymbol{z} \prec \boldsymbol{x}'$:
7.         $P := \big(P \setminus \{\boldsymbol{z} \in P \mid \boldsymbol{x}' \preceq \boldsymbol{z}\}\big) \cup \{\boldsymbol{x}'\}$.

GSEMO keeps the non-dominated solutions generated so far.
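A minimal Python sketch of GSEMO (not from the slides; the toy usage applies it to the bi-objective LOTZ benchmark, which is not discussed in this lecture and is used only to make the snippet runnable):

```python
import random

def strictly_dominates(a, b):
    # a is no worse in every objective and strictly better in at least one (minimization)
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def weakly_dominates(a, b):
    return all(x <= y for x, y in zip(a, b))

def gsemo(f, n, iterations=10000):
    x = [random.randint(0, 1) for _ in range(n)]
    population = {tuple(x): f(x)}                  # archive of non-dominated solutions
    for _ in range(iterations):
        parent = list(random.choice(list(population)))
        child = [b ^ 1 if random.random() < 1.0 / n else b for b in parent]
        fc = f(child)
        # accept the child unless some archived solution strictly dominates it,
        # and remove the archived solutions that the child weakly dominates
        if not any(strictly_dominates(fz, fc) for fz in population.values()):
            population = {z: fz for z, fz in population.items() if not weakly_dominates(fc, fz)}
            population[tuple(child)] = fc
    return population

# Toy usage: LOTZ (leading ones, trailing zeros), both maximized, so negated for minimization
def lotz(x):
    lo = next((i for i, b in enumerate(x) if b == 0), len(x))
    tz = next((i for i, b in enumerate(reversed(x)) if b == 1), len(x))
    return (-lo, -tz)

print(sorted(gsemo(lotz, n=8, iterations=5000).values()))
```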

Page 20: Multi-objective optimization NSGA-II


Proof

Main idea:

(1) obtain the empty subgraph $0^m$

(2) obtain a minimum spanning tree

The analysis of phase (1), for $\min\ \big(c(\boldsymbol{x}),\ w(\boldsymbol{x}) = \sum_{i:\, x_i=1} w_i\big)$, uses multiplicative drift analysis:

• design the distance function: $V(P) = \min\{w(\boldsymbol{x}) \mid \boldsymbol{x} \in P\}$, the minimum weight in the population

• analyze the expected drift:
$$\mathrm{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = P] \ \ge\ \frac{1}{n} \cdot \frac{1}{m}\Big(1 - \frac{1}{m}\Big)^{m-1} \cdot \sum_{i=1}^{|\boldsymbol{x}^*|} \big(w(\boldsymbol{x}^*) - w(\boldsymbol{y}_i)\big),$$
which corresponds to selecting the solution $\boldsymbol{x}^*$ with the smallest $w(\boldsymbol{x})$ value in the population and flipping only one of its 1-bits; $\boldsymbol{y}_i$ denotes the solution resulting from flipping the $i$-th 1-bit of $\boldsymbol{x}^*$.

Page 21: Multi-objective optimization NSGA-II


Proof

Continuing, since flipping the $i$-th 1-bit of $\boldsymbol{x}^*$ removes exactly that edge's weight, the differences $w(\boldsymbol{x}^*) - w(\boldsymbol{y}_i)$ sum over all 1-bits to $w(\boldsymbol{x}^*)$, which equals $V(\xi_t)$ because $\boldsymbol{x}^*$ has the minimum weight in the population:
$$\mathrm{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = P] \ \ge\ \frac{1}{n} \cdot \frac{1}{m}\Big(1 - \frac{1}{m}\Big)^{m-1} \cdot \sum_{i=1}^{|\boldsymbol{x}^*|} \big(w(\boldsymbol{x}^*) - w(\boldsymbol{y}_i)\big) \ =\ \frac{1}{n} \cdot \frac{1}{m}\Big(1 - \frac{1}{m}\Big)^{m-1} \cdot w(\boldsymbol{x}^*) \ =\ \frac{1}{n} \cdot \frac{1}{m}\Big(1 - \frac{1}{m}\Big)^{m-1} \cdot V(\xi_t).$$

(Recall the main idea: phase (1) obtains the empty subgraph $0^m$; phase (2) obtains a minimum spanning tree.)

Page 22: Multi-objective optimization NSGA-II


Proof

Main idea:

(1) obtain the empty subgraph $0^m$

(2) obtain a minimum spanning tree

The analysis of phase (1): $\min\ \big(c(\boldsymbol{x}),\ w(\boldsymbol{x}) = \sum_{i:\, x_i=1} w_i\big)$

Using multiplicative drift analysis:

• design the distance function: $V(P) = \min\{w(\boldsymbol{x}) \mid \boldsymbol{x} \in P\}$

• analyze the expected drift: $\mathrm{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = P] \ \ge\ \frac{1}{emn} \cdot V(\xi_t)$, i.e., the drift factor is $\delta = \frac{1}{emn}$

Upper bound on the expected running time of phase (1):
$$\sum_P \pi_0(P) \cdot \frac{1 + \ln\big(V(P)/V_{min}\big)}{\delta} \ \le\ emn\,\big(1 + \ln(m \cdot w_{max})\big),$$
using $V(P) \le m \cdot w_{max}$ and $V_{min} \ge 1$ (the weights are positive integers).

Page 23: Multi-objective optimization NSGA-II


Proof

As on the previous slide, the expected running time of phase (1) is at most
$$\sum_P \pi_0(P) \cdot \frac{1 + \ln\big(V(P)/V_{min}\big)}{\delta} \ \le\ emn\,\big(1 + \ln(m \cdot w_{max})\big) \ \in\ O\big(mn(\log n + \log w_{max})\big),$$
since $m \le n^2$ implies $\ln m = O(\log n)$.

Page 24: Multi-objective optimization NSGA-II


Proof

The analysis of phase (2), for $\min\ \big(c(\boldsymbol{x}),\ w(\boldsymbol{x}) = \sum_{i:\, x_i=1} w_i\big)$:

• the Pareto optimal solutions found so far are always kept in the population

• follow the path $\boldsymbol{x}_n \to \boldsymbol{x}_{n-1} \to \cdots \to \boldsymbol{x}_2 \to \boldsymbol{x}_1$, where $\boldsymbol{x}_i$ denotes the Pareto optimal solution with $i$ connected components; the path starts from $\boldsymbol{x}_n = 0^m$ (the empty subgraph) and ends at $\boldsymbol{x}_1$, a minimum spanning tree

The probability of each step on the path (selecting the current $\boldsymbol{x}_i$ and flipping one specific 0-bit) is at least
$$\frac{1}{n} \cdot \frac{1}{m}\Big(1 - \frac{1}{m}\Big)^{m-1} \ \ge\ \frac{1}{emn},$$
so the expected running time of phase (2) is at most $(n-1) \cdot emn \in O(mn^2)$.

The expected running time of phase (1): $O(mn(\log n + \log w_{max}))$

The total expected running time: $O(mn(n + \log w_{max}))$

Page 25: Multi-objective optimization NSGA-II


MST by MOEAs

Bi-objective reformulation: $\min\ \big(c(\boldsymbol{x}),\ \sum_{i:\, x_i=1} w_i\big)$ for the original problem $\arg\min_{\boldsymbol{x} \in \{0,1\}^m} \sum_{i=1}^m w_i x_i$ s.t. $c(\boldsymbol{x}) = 1$.

Penalty functions: $O(m^2(\log n + \log w_{max}))$

Bi-objective reformulation: $O(mn(n + \log w_{max}))$

The bi-objective reformulation is better for dense graphs, e.g., $m \in \Theta(n^2)$.

Theorem. [Neumann & Wegener, Natural Computing'06] The expected running time of GSEMO solving the MST problem is $O(mn(n + \log w_{max}))$.

Page 26: Multi-objective optimization NSGA-II


More examples

Problem | Penalty functions | Bi-objective reformulation
Set cover | exponential | $O(mn(\log c_{max} + \log n))$ [Friedrich et al., ECJ'10]
Minimum cut | exponential | $O(Fm(\log c_{max} + \log n))$ [Neumann et al., Algorithmica'11]
Minimum label spanning tree | $\Omega(k e^k)$ | $O(k^2 \log k)$ [Lai et al., TEC'14]
Minimum cost coverage | exponential | $O(Nn(\log n + \log w_{max} + N))$ [Qian et al., IJCAI'15]

In each case, the bi-objective reformulation achieves the better bound.

Page 27: Multi-objective optimization NSGA-II


Bi-objective reformulation

Main idea:

1. transform the original constrained optimization problem into a bi-objective optimization problem

Constrained:
$$\min f(x) \quad \text{s.t.} \quad g_i(x) = 0,\ 1 \le i \le q; \quad h_i(x) \le 0,\ q+1 \le i \le m$$

Bi-objective:
$$\min\ \Big(f(x),\ \sum_{i=1}^m f_i(x)\Big)$$

where $\sum_{i=1}^m f_i(x)$ is the constraint violation degree, with
$$f_i(x) = \begin{cases} |g_i(x)| & 1 \le i \le q \\ \max\{0,\, h_i(x)\} & q+1 \le i \le m \end{cases}$$

Page 28: Multi-objective optimization NSGA-II


Bi-objective reformulation

Main idea:

1. transform the original constrained optimization problem $\min f(x)$ s.t. $g_i(x) = 0,\ 1 \le i \le q;\ h_i(x) \le 0,\ q+1 \le i \le m$ into the bi-objective problem $\min\ \big(f(x),\ \sum_{i=1}^m f_i(x)\big)$

2. employ a multi-objective EA to solve the transformed problem

3. output the feasible solution (constraint violation degree $= 0$) from the generated non-dominated solution set

Page 29: Multi-objective optimization NSGA-II


Constraint handling strategies

The final output solution must satisfy the constraints

Common constraint handling strategies

• Penalty functions

• Repair functions

• Restricting search to the feasible region

• Decoder functions

• Bi-objective reformulation

Strategies such as restricting search and decoder functions search only in the feasible region, while strategies such as penalty functions and bi-objective reformulation allow infeasible solutions in the search.

Better algorithms?

Page 30: Multi-objective optimization NSGA-II


Subset selection

Subset selection is to select a subset of size at most $k$ from a total set of $n$ items so as to optimize some objective function.

Formally stated: given all items $V = \{v_1, \ldots, v_n\}$, an objective function $f: 2^V \to \mathbb{R}$ and a budget $k$, find a subset $X \subseteq V$ such that
$$\max_{X \subseteq V} f(X) \quad \text{s.t.} \quad |X| \le k.$$

(Illustration: from the ground set $V$, choose a subset $X \subseteq V$ maximizing $f(X)$ subject to $|X| \le k$.)

Page 31: Multi-objective optimization NSGA-II


Application - sparse regression

Sparse regression [Tropp, TIT'04]: select a few observation variables to best approximate the predictor variable by linear regression.

Item $v_i$: an observation variable

Objective $f$: the squared multiple correlation
$$R^2_{z,X} = \frac{\mathrm{Var}(z) - \mathrm{MSE}_{z,X}}{\mathrm{Var}(z)},$$
where $\mathrm{Var}(z)$ is the variance of the predictor variable $z$ and $\mathrm{MSE}_{z,X}$ is the mean squared error of predicting $z$ from a subset $X$ of observation variables.

Page 32: Multi-objective optimization NSGA-II


Application - influence maximization

Influence maximization [Kempe et al., KDD'03]: select a subset of users from a social network to maximize its influence spread.

Item $v_i$: a social network user

Objective $f$: influence spread, measured by the expected number of social network users activated by diffusion

Page 33: Multi-objective optimization NSGA-II


Application - document summarization

Document summarization [Lin & Bilmes, ACL'11]: select a few sentences to best summarize the documents.

Item $v_i$: a sentence

Objective $f$: summary quality

Page 34: Multi-objective optimization NSGA-II


Application - sensor placement

Sensor placement [Krause & Guestrin, IJCAI'09 Tutorial]: select a few places to install sensors such that the information gathered is maximized.

Examples: water contamination detection, fire detection

Item $v_i$: a place to install a sensor

Objective $f$: entropy

Page 35: Multi-objective optimization NSGA-II


Subset selection

Subset selection arises in many areas, e.g., sparse regression (machine learning), influence maximization (data mining), document summarization (natural language processing), and sensor placement (networks).

[Mathematical Programming 1978] When $f$ is monotone and submodular, the greedy algorithm achieves a $(1 - 1/e)$-approximation. (George Nemhauser, recipient of the John von Neumann Theory Prize.)

Best Paper / Test of Time Awards: [Kempe et al., KDD'03], [Das & Kempe, ICML'11], [Iyer & Bilmes, NIPS'13]

Page 36: Multi-objective optimization NSGA-II


Subset representation

A subset $X \subseteq V$ can be naturally represented by a Boolean vector $\boldsymbol{x} \in \{0,1\}^n$:

• the $i$-th bit $x_i = 1$ if the item $v_i \in X$, and $x_i = 0$ otherwise

• $X = \{v_i \mid x_i = 1\}$

For example, with $V = \{v_1, v_2, v_3, v_4, v_5\}$:

subset $X \subseteq V$ | Boolean vector $\boldsymbol{x} \in \{0,1\}^5$
$\emptyset$ | 00000
$\{v_1\}$ | 10000
$\{v_2, v_3, v_5\}$ | 01101
$\{v_1, v_2, v_3, v_4, v_5\}$ | 11111
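A tiny Python sketch (not from the slides) of this correspondence between subsets and Boolean vectors:

```python
# Convert between a subset of 0-based item indices and its Boolean-vector representation.
def subset_to_vector(indices, n):
    return [1 if i in indices else 0 for i in range(n)]

def vector_to_subset(x):
    return {i for i, bit in enumerate(x) if bit == 1}

print(subset_to_vector({1, 2, 4}, 5))     # items v2, v3, v5 -> [0, 1, 1, 0, 1]
print(vector_to_subset([0, 1, 1, 0, 1]))  # -> {1, 2, 4}
```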

Page 37: Multi-objective optimization NSGA-II


POSS algorithm

POSS algorithm [Qian, Yu and Zhou, NIPS'15]

Transformation: the original problem $\max_{X \subseteq V} f(X)$ s.t. $|X| \le k$ is reformulated as the bi-objective problem $\min_{X \subseteq V} \big({-f(X)}, |X|\big)$.

Initialization: put the special solution $0^n$ (the empty subset) into the population $P$.

Parent selection & Reproduction: pick a solution $\boldsymbol{x}$ randomly from $P$, and flip each bit of $\boldsymbol{x}$ with probability $1/n$ to generate a new solution.

Evaluation & Survivor selection: if the new solution is not dominated by any solution in $P$, put it into $P$ and weed out the solutions that it weakly dominates.

Output: select the best feasible solution (i.e., with $|X| \le k$) from $P$.

Page 38: Multi-objective optimization NSGA-II


Sparse regression

Sparse regression: given all observation variables $V = \{v_1, \ldots, v_n\}$, a predictor variable $z$ and a budget $k$, find a subset $X \subseteq V$ such that
$$\max_{X \subseteq V} \ R^2_{z,X} = \frac{\mathrm{Var}(z) - \mathrm{MSE}_{z,X}}{\mathrm{Var}(z)} \quad \text{s.t.} \quad |X| \le k,$$
where $\mathrm{Var}(z)$ is the variance of $z$ and $\mathrm{MSE}_{z,X}$ is the mean squared error of predicting $z$ by using the observation variables in $X$.

Page 39: Multi-objective optimization NSGA-II


Experimental results

greedy algorithms relaxation methods

POSS is significantly better than all the compared state-of-the art algorithms on all data sets

the size constraint: k= πŸ– the number of iterations of POSS: πŸπ’†π’ŒπŸπ’

exhaustive search

● denotes that POSS is significantly better by the 𝑑-test with confidence level 0.05

Page 40: Multi-objective optimization NSGA-II


Theoretical analysis

Theorem 1. For subset selection with monotone objective functions, POSS with expected number of iterations $\mathrm{E}[T] \le 2ek^2n$ finds a solution $X$ with $|X| \le k$ and $f(X) \ge (1 - e^{-\gamma}) \cdot \mathrm{OPT}$.

Thus POSS achieves the optimal polynomial-time approximation guarantee: $(1 - e^{-\gamma})$ is the optimal polynomial-time approximation guarantee for monotone $f$ [Harshaw et al., ICML'19].

Page 41: Multi-objective optimization NSGA-II


Proof

Lemma 1. For any $X \subseteq V$, there exists one item $v \in V \setminus X$ such that
$$f(X \cup \{v\}) - f(X) \ \ge\ \frac{\gamma}{k}\,\big(\mathrm{OPT} - f(X)\big),$$
where $\mathrm{OPT}$ denotes the optimal function value and $\gamma$ is the submodularity ratio [Das & Kempe, ICML'11].

Roughly speaking, the improvement by adding a specific item is proportional to the current distance to the optimum.

Page 42: Multi-objective optimization NSGA-II


Proof

Main idea:

• consider a solution $\boldsymbol{x}$ (a subset) with $|\boldsymbol{x}| \le i$ and $f(\boldsymbol{x}) \ge \big(1 - (1 - \frac{\gamma}{k})^i\big) \cdot \mathrm{OPT}$

• at $i = 0$: the initial solution $00\ldots0$ satisfies this, since $f(00\ldots0) = 0$ and $|00\ldots0| = 0$

• at $i = k$: the bound becomes
$$\Big(1 - \big(1 - \tfrac{\gamma}{k}\big)^{k}\Big) \cdot \mathrm{OPT} \ =\ \Big(1 - \big(1 - \tfrac{1}{k/\gamma}\big)^{(k/\gamma)\cdot\gamma}\Big) \cdot \mathrm{OPT} \ \ge\ (1 - e^{-\gamma}) \cdot \mathrm{OPT},$$
using $(1 - 1/m)^m \le 1/e$ with $m = k/\gamma$.

Page 43: Multi-objective optimization NSGA-II


Proof

Main idea (continued):

• consider a solution $\boldsymbol{x}$ (a subset) with $|\boldsymbol{x}| \le i$ and $f(\boldsymbol{x}) \ge \big(1 - (1 - \frac{\gamma}{k})^i\big) \cdot \mathrm{OPT}$

• at $i = 0$, the initial solution $00\ldots0$ satisfies this ($f(00\ldots0) = 0$, $|00\ldots0| = 0$); at $i = k$, the bound is at least $(1 - e^{-\gamma}) \cdot \mathrm{OPT}$, the desired approximation guarantee

• the question: how does such a solution progress from $i$ to $i + 1$?

Page 44: Multi-objective optimization NSGA-II


Proof

Main idea:

• consider a solution $\boldsymbol{x}$ (a subset) with $|\boldsymbol{x}| \le i$ and $f(\boldsymbol{x}) \ge \big(1 - (1 - \frac{\gamma}{k})^i\big) \cdot \mathrm{OPT}$

• in each iteration of POSS:

  – select $\boldsymbol{x}$ from the population $P$

  – flip one specific 0-bit of $\boldsymbol{x}$ to 1 (i.e., add the specific item $v$ in Lemma 1)

  – then $|\boldsymbol{x}'| = |\boldsymbol{x}| + 1 \le i + 1$ and $f(\boldsymbol{x}') \ge \big(1 - (1 - \frac{\gamma}{k})^{i+1}\big) \cdot \mathrm{OPT}$

Page 45: Multi-objective optimization NSGA-II


Proof

By Lemma 1, there is an item whose addition to $\boldsymbol{x}$ gives a solution $\boldsymbol{x}'$ with
$$f(\boldsymbol{x}') - f(\boldsymbol{x}) \ \ge\ \frac{\gamma}{k}\,\big(\mathrm{OPT} - f(\boldsymbol{x})\big), \quad \text{i.e.,} \quad f(\boldsymbol{x}') \ \ge\ \Big(1 - \frac{\gamma}{k}\Big) f(\boldsymbol{x}) + \frac{\gamma}{k} \cdot \mathrm{OPT}.$$
Combining this with $f(\boldsymbol{x}) \ge \big(1 - (1 - \frac{\gamma}{k})^i\big) \cdot \mathrm{OPT}$ gives
$$f(\boldsymbol{x}') \ \ge\ \Big(1 - \frac{\gamma}{k}\Big)\Big(1 - \big(1 - \tfrac{\gamma}{k}\big)^{i}\Big) \cdot \mathrm{OPT} + \frac{\gamma}{k} \cdot \mathrm{OPT} \ =\ \Big(1 - \big(1 - \tfrac{\gamma}{k}\big)^{i+1}\Big) \cdot \mathrm{OPT}.$$

Page 46: Multi-objective optimization NSGA-II


Proof

Main idea:

• consider a solution $\boldsymbol{x}$ (a subset) with $|\boldsymbol{x}| \le i$ and $f(\boldsymbol{x}) \ge \big(1 - (1 - \frac{\gamma}{k})^i\big) \cdot \mathrm{OPT}$

• in each iteration of POSS:

  – select $\boldsymbol{x}$ from the population $P$: probability $\frac{1}{|P|}$

  – flip one specific 0-bit of $\boldsymbol{x}$ to 1 (i.e., add the specific item $v$ in Lemma 1) and keep the other bits unchanged: probability $\frac{1}{n}\big(1 - \frac{1}{n}\big)^{n-1} \ge \frac{1}{en}$

• hence the progress from $i$ to $i + 1$ happens in one iteration with probability at least $\frac{1}{|P|} \cdot \frac{1}{en}$

Page 47: Multi-objective optimization NSGA-II


Proof

The progress from $i$ to $i + 1$ happens in one iteration with probability at least $\frac{1}{|P|} \cdot \frac{1}{en} \ \ge\ \frac{1}{2ekn}$, since $|P| \le 2k$:

• solutions with size at least $2k$ are excluded from the population

• the solutions in $P$ are always incomparable (mutually non-dominated)

• hence, for each size in $\{0, 1, \ldots, 2k-1\}$, there exists at most one solution in $P$

Page 48: Multi-objective optimization NSGA-II


Proof

• the progress from $i$ to $i + 1$ happens with probability at least $\frac{1}{|P|} \cdot \frac{1}{en} \ge \frac{1}{2ekn}$ in each iteration (using $|P| \le 2k$), so the expected number of iterations for one such step is at most $2ekn$

• going from $i = 0$ to $i = k$ requires $k$ such steps, so the expected number of iterations is at most $k \cdot 2ekn = 2ek^2n$

This yields Theorem 1: within $\mathrm{E}[T] \le 2ek^2n$ iterations in expectation, POSS finds a solution $X$ with $|X| \le k$ and $f(X) \ge (1 - e^{-\gamma}) \cdot \mathrm{OPT}$.

Page 49: Multi-objective optimization NSGA-II


Theoretical analysis: the advantage of bi-objective reformulation for handling constraints.

Algorithm design: the POSS algorithm for subset selection, with applications to sparse regression, influence maximization, document summarization, and sensor placement.

Page 50: Multi-objective optimization NSGA-II


POSS algorithm

POSS algorithm [Qian, Yu and Zhou, NIPS'15]

Transformation: $\max_{X \subseteq V} f(X)$ s.t. $|X| \le k$ (original) $\;\longrightarrow\;$ $\min_{X \subseteq V} \big({-f(X)}, |X|\big)$ (bi-objective)

Parent selection & Reproduction: pick a solution $\boldsymbol{x}$ randomly from $P$, and flip each bit of $\boldsymbol{x}$ with probability $1/n$ to generate a new solution, i.e., POSS uses bit-wise mutation only.

Page 51: Multi-objective optimization NSGA-II


PORSS algorithm

PORSS algorithm [Qian, Bian and Feng, AAAI'20]

Transformation: $\max_{X \subseteq V} f(X)$ s.t. $|X| \le k$ (original) $\;\longrightarrow\;$ $\min_{X \subseteq V} \big({-f(X)}, |X|\big)$ (bi-objective)

Parent selection & Reproduction:

• pick two solutions randomly from $P$

• apply a recombination (crossover) operator

• apply the bit-wise mutation operator

Page 52: Multi-objective optimization NSGA-II


PORSS algorithm

• One-point crossover (in this illustration the crossover point is after the sixth bit):

Parent1:    1 0 1 1 1 0 0 0
Parent2:    0 0 1 0 1 0 1 0
Offspring1: 1 0 1 1 1 0 1 0
Offspring2: 0 0 1 0 1 0 0 0

• Uniform crossover (each position is exchanged independently with probability 1/2; in this illustration only the seventh bit happened to be exchanged among the differing positions):

Parent1:    1 0 1 1 1 0 0 0
Parent2:    0 0 1 0 1 0 1 0
Offspring1: 1 0 1 1 1 0 1 0
Offspring2: 0 0 1 0 1 0 0 0
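A minimal Python sketch (not from the slides) of the two crossover operators:

```python
import random

def one_point_crossover(p1, p2):
    point = random.randint(1, len(p1) - 1)        # crossover point
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_crossover(p1, p2):
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if random.random() < 0.5:                 # exchange each position independently
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2

print(one_point_crossover([1,0,1,1,1,0,0,0], [0,0,1,0,1,0,1,0]))
print(uniform_crossover([1,0,1,1,1,0,0,0], [0,0,1,0,1,0,1,0]))
```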

Page 53: Multi-objective optimization NSGA-II


With the size constraint $k = 8$, PORSS (using either one-point or uniform crossover) performs the best among the compared state-of-the-art algorithms. [Results table omitted.]

Page 54: Multi-objective optimization NSGA-II


Summary

• Constrained optimization

• Constraint handling strategies

  – Penalty functions

  – Repair functions

  – Restricting search to the feasible region

  – Decoder functions

  – Bi-objective reformulation (an example of algorithm design guided by theoretical analysis)

Page 55: Multi-objective optimization NSGA-II


References

• A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing. Chapter 13.

• T. Friedrich, J. He, N. Hebbinghaus, F. Neumann and C. Witt. Approximating covering problems by randomized search heuristics using multi-objective models. Evolutionary Computation, 2010, 18(4): 617-633.

• B. Doerr, D. Johannsen and C. Winzen. Multiplicative drift analysis. Algorithmica, 2012, 64: 673-697.

• X. Lai, Y. Zhou, J. He and J. Zhang. Performance analysis of evolutionary algorithms for the minimum label spanning tree problem. IEEE Transactions on Evolutionary Computation, 2014, 18(6): 860-872.

• F. Neumann and I. Wegener. Minimum spanning trees made easier via multi-objective optimization. Natural Computing, 2006, 5(3): 305-319.

Page 56: Multi-objective optimization NSGA-II


References

• F. Neumann and I. Wegener. Randomized local search, evolutionary algorithms, and the minimum spanning tree problem. Theoretical Computer Science, 2007, 378(1): 32-40.

• F. Neumann, J. Reichel and M. Skutella. Computing minimum cuts by randomized search heuristics. Algorithmica, 2011, 59(3): 323-342.

• C. Qian, Y. Yu and Z.-H. Zhou. On constrained Boolean Pareto optimization. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI'15), 2015, pages 389-395, Buenos Aires, Argentina.

• C. Qian, Y. Yu and Z.-H. Zhou. Subset selection by Pareto optimization. In: Advances in Neural Information Processing Systems 28 (NIPS'15), 2015, pages 1765-1773, Montreal, Canada.

• C. Qian, C. Bian and C. Feng. Subset selection by Pareto optimization with recombination. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI'20), 2020, pages 2408-2415, New York, NY.