
Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TEVC.2013.2291790, IEEE Transactions on Evolutionary Computation


Performance Analysis of Evolutionary Algorithms for the Minimum Label Spanning Tree Problem

Xinsheng Lai, Yuren Zhou, Jun He, Member, IEEE, Jun Zhang, Senior Member, IEEE

Abstract—A few experimental investigations have shown that evolutionary algorithms (EAs) are efficient for the minimum label spanning tree (MLST) problem. However, little is known about this in theory. In this paper, we theoretically analyze the performances of the (1+1) EA, a simple version of EA, and a simple multi-objective evolutionary algorithm called GSEMO on the MLST problem. We reveal that for the MLSTb problem the (1+1) EA and GSEMO achieve a (b+1)/2-approximation ratio in expected polynomial runtime with respect to n, the number of nodes, and k, the number of labels. We also find that GSEMO achieves a (2 ln n)-approximation ratio for the MLST problem in expected polynomial runtime with respect to n and k. At the same time, we show that the (1+1) EA and GSEMO outperform local search algorithms on three instances of the MLST problem. We also construct an instance on which GSEMO outperforms the (1+1) EA.

Index Terms—Evolutionary algorithm; runtime complexity; approximation ratio; minimum label spanning tree; multi-objective

I. INTRODUCTION

The minimum label spanning tree (MLST) problem is an issue arising from practice, which seeks a spanning tree with the minimum number of labels in a connected undirected graph with labeled edges. This problem has many applications in real life. For example, in communication networks, a communication node may communicate with other nodes via different types of channels, such as optic fiber, coaxial cable, microwave, and telephone line [1]. Given a set of communication nodes, we want to find a spanning tree (a connected communication network) that uses the minimum possible number of types of channels, which can reduce the construction cost and the complexity of the network [2]. Another example is that the MLST problem has been used in data compression to increase compression rates [3], [4]. The MLST problem, proposed by Chang and Leu, has been proved to be NP-hard [2].

This work was supported in part by the National Natural Science Foundation of China (61170081, 61165003, 61300044, 61332002), in part by the EPSRC under Grant EP/I009809/1, in part by the National High-Technology Research and Development Program (863 Program) of China No. 2013AA01A212, and in part by the NSFC for Distinguished Young Scholars 61125205. (Corresponding author: Y. Zhou).

X. Lai and Y. Zhou are with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China (e-mail: [email protected]).

X. Lai is also with the School of Mathematics and Computer Science, Shangrao Normal University, Shangrao 334001, China.

J. He is with the Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3DB, U.K.

J. Zhang is with the Department of Computer Science, Sun Yat-Sen University, Guangzhou 510275, China, with the Key Laboratory of Digital Life, Ministry of Education, China, and also with the Key Laboratory of Software Technology, Education Department of Guangdong Province, China.

Copyright (c) 2012 IEEE

For this problem, Chang and Leu have proposed two heuristic algorithms. One is the edge replacement algorithm, ERA for short, and the other is the maximum vertex covering algorithm, MVCA for short. Their experiments showed that ERA is not stable, while MVCA is very efficient.

The genetic algorithm, belonging to the larger class of EAs, is a general-purpose optimization algorithm [5], [6], [7] with a strong global search capacity [8]. Consequently, Xiong, Golden, and Wasil proposed a one-parameter genetic algorithm for the MLST problem. The experimental results on extensive randomly generated instances showed that the genetic algorithm outperforms MVCA [9]. Nummela and Julstrom also proposed an efficient genetic algorithm for solving the MLST problem [10].

Besides, many methods have been recently proposed for solving this NP-hard problem. Consoli and Moreno-Pérez proposed a hybrid local search combining variable neighborhood search and simulated annealing [11]. Chwatal and Raidl presented exact methods including branch-and-cut and branch-and-cut-and-price [12]. Cerulli et al. utilized several metaheuristic methods for this problem, such as simulated annealing, reactive tabu search, the pilot method, and variable neighborhood search [13]. Consoli et al. proposed a greedy randomized adaptive search procedure and a variable neighborhood search for solving the MLST problem [14].

Since ERA and MVCA are the two original heuristic algorithms for the MLST problem, the worst-case performance analysis of these two algorithms, especially MVCA, has been a hot research topic in recent years. Krumke and Wirth proved that MVCA has a logarithmic performance guarantee of 2 ln n + 1, where n is the number of nodes in the input graph, and presented an instance to show that ERA might perform as badly as possible [15]. Wan, Chen, and Xu further proved that MVCA has a better performance guarantee of ln(n − 1) + 1 [16]. Xiong, Golden, and Wasil proved another bound on the worst-case performance of MVCA for MLSTb problems, namely Hb = ∑_{i=1}^{b} 1/i, where the subscript b denotes that each label appears at most b times and is also called the maximum frequency of the labels [17].

The performance of MVCA on the MLST problem has been deeply investigated. However, there is still no theoretical work focusing on the performance analysis of EAs for the MLST problem, though a few experimental investigations have shown that EAs are efficient for this problem.

Recently, the theoretical analysis of EAs' performance on combinatorial optimization problems has received much attention. During the past few years, theoretical investigations of EAs focused on the runtime for finding globally optimal solutions of combinatorial optimization problems or their variants. These problems include plateaus of constant fitness [18], linear function problems [19], [20], [21], minimum cut problems [22], satisfiability problems [23], minimum spanning tree problems [24], Eulerian cycle problems [25], Euclidean traveling salesperson problems [26], the maximum matching problem [27], and so on.

Nevertheless, many combinatorial optimization problems, including the MLST problem, are NP-hard, and it is commonly believed that there is no polynomial-time algorithm for them. Therefore, in practice we usually ask only for satisfactory solutions to such NP-hard problems. Thus, we are interested in whether an approximate solution of a given satisfactory quality can be obtained efficiently. In fact, the approximation performance analysis of randomized heuristics, including EAs, on NP-hard problems has received more and more attention.

Oliveto, He, and Yao found that for minimum vertex cover problems the (1+1) EA may find arbitrarily bad approximation solutions on some instances, but can efficiently find the minimum cover of them by using a restart strategy [28]. Friedrich et al. proved that the (1+1) EA may find almost arbitrarily bad approximation solutions for minimum vertex cover problems and minimum set cover problems as well [29]. Witt proved that in the worst case the (1+1) EA and the randomized local search algorithm need an expected runtime of O(n^2) to produce a 4/3-approximation solution to the partition problem [30].

On the approximation performance of multi-objective EAs, Friedrich et al. revealed that the multi-objective EA efficiently finds a (ln n)-approximation solution to the minimum set cover problem [29]. Neumann and Reichel found that multi-objective EAs can find a k-approximation solution for the minimum multicuts problem in expected polynomial runtime [31]. Recently, Yu, Yao, and Zhou studied the approximation performance of SEIP, a simple evolutionary algorithm with isolated population, on set cover problems. They found that SEIP can efficiently obtain an Hn-approximation solution for unbounded set cover problems, and an (Hn − (k−1)/(8k^9))-approximation solution for k-set cover problems as well [32].

Though the MLST problem is NP-hard while the minimum spanning tree problem is in the complexity class P, both belong to the class of problems of finding spanning trees with some measure to be optimized. The measure of the MLST problem is the number of labels used in the tree, whereas the measure of the minimum spanning tree problem is the total weight of the edges in the tree. The performance of EAs on the minimum spanning tree problem has been investigated recently. Neumann and Wegener showed that the (1+1) EA solves the minimum spanning tree problem in expected polynomial runtime [24]. They also showed that the minimum spanning tree problem can be solved more efficiently via multi-objective optimization [33]. Another spanning-tree-related problem is the multi-objective minimum spanning tree problem, which is NP-hard. Recently, the performance of EAs on this NP-hard problem has also been theoretically investigated [34], [35]. However, there is no theoretical work focusing on EAs' performance for the MLST problem.

In this paper, we concentrate on the performance analysis of the (1+1) EA and GSEMO (a simple multi-objective evolutionary algorithm with bit-wise mutation) for the MLST problem. We analyze the approximation performances of the (1+1) EA and GSEMO on the MLST problem. For the MLSTb problem, we prove that the (1+1) EA and GSEMO are (b+1)/2-approximation algorithms. We also reveal that GSEMO can efficiently achieve a (2 ln n)-approximation ratio for the MLST problem. Though the MLST problem is NP-hard, we show that on three instances the (1+1) EA and GSEMO efficiently find their global optima, while local search algorithms may be trapped in local optima. Meanwhile, we construct an additional instance where GSEMO outperforms the (1+1) EA.

The rest of this paper is organized as follows. The next section describes the MLST problem and the algorithms considered in this paper. Section III analyzes the approximation performances of the (1+1) EA and GSEMO on the MLST problem, while Section IV investigates the performances of the (1+1) EA and GSEMO on four instances. Finally, Section V concludes this paper and gives possible directions for further research.

II. THE MLST PROBLEM AND ALGORITHMS

First of all, we give the concepts of a spanning subgraph and the MLST and MLSTb problems.

Definition 1: (Spanning subgraph) Let G = (V, E) and H = (V′, E′) be two graphs, where V (V′) is the set of nodes of G (H), and E (E′) is the set of edges of G (H). If V′ = V and E′ ⊆ E, then H is a spanning subgraph of G.

Definition 2: (The MLST problem) Let G = (V, E, L) be a connected undirected graph, where V, E, and L = {1, 2, . . . , k} are respectively the sets of n nodes, m edges, and k labels. Each edge is associated with a label by a surjective function l : E → L. The MLST problem is to seek a spanning tree with the minimum number of labels in the input graph G.

From the definition of the MLST problem, it is easy to see that each edge e ∈ E has a unique label l(e) ∈ L, and each label in L has at least one edge associated with it.

Definition 3: (The MLSTb problem) For an MLST problem, if each label appears at most b times, then it is an MLSTb problem.

Clearly, the MLSTb problem is a special case of the MLST problem.

Our goal in this paper is to seek a connected spanning subgraph with the minimum number of labels rather than a spanning tree with the minimum number of labels, since any spanning tree contained in such a spanning subgraph is an MLST. This is an alternative formulation of the MLST problem which has been adopted in [9], [10].

Since each edge has exactly one label, if a label is selected then all edges with that label are selected. Therefore, we aim to find a set of selected labels such that the edges with these labels form a spanning subgraph of the input graph and the number of selected labels is minimized. Thus, a set of selected labels is a solution.

Further, we encode a solution as a bit string X = (x1, . . . , xk), as used in [9], where k = |L| and bit xi ∈ {0, 1} corresponds to label i. If xi = 1 (i = 1, 2, . . . , k), then label i is selected; otherwise it is not. Thus, a bit string X represents a label subset, and |X| represents the number of labels contained in X.

We consider the spanning subgraph H(X) of G, where H(X) is the spanning subgraph of G restricted to edges with labels whose corresponding bits in X are set to 1. We call a solution X such that H(X) is a connected spanning subgraph a feasible solution. A feasible solution with the minimum number of labels is a globally optimal solution.

For solving the MLST problem, the (1+1) EA uses a fitness function, defined as

fit(X) = (c(H(X)) − 1) · k^2 + |X|,   (1)

where c(H(X)) is the number of connected components in H(X), k is the total number of labels in L, and |X| = ∑_{i=1}^{k} xi, i.e., the number of labels contained in X.

The fitness function is to be minimized. The first objective is to ensure that H(X) is a connected spanning subgraph, and the second is to minimize the number of labels in the connected spanning subgraph.

Recall that a feasible solution X satisfies that H(X) is a connected spanning subgraph, i.e., c(H(X)) = 1. Therefore, the fitness value of a feasible solution equals the number of labels contained in it.

We also define the fitness vector for GSEMO as the vector (c(H(X)), |X|), where c(H(X)) and |X| are simultaneously minimized by GSEMO.
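To make the definitions concrete, the component count c(H(X)) and the fitness fit(X) of equation (1) can be computed from an edge list with a simple union-find structure. This is an illustrative sketch, not code from the paper: the graph representation ((u, v, label) triples) and the function names are assumptions.

```python
def count_components(n, edges, x):
    """Number of connected components of H(X): the spanning subgraph
    restricted to edges whose label i has x[i] == 1."""
    parent = list(range(n))

    def find(a):
        # iterative find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v, label in edges:
        if x[label]:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    return len({find(v) for v in range(n)})

def fitness(n, k, edges, x):
    """fit(X) = (c(H(X)) - 1) * k^2 + |X|, to be minimized."""
    c = count_components(n, edges, x)
    return (c - 1) * k * k + sum(x)

# Example: a triangle on 3 nodes whose edges carry labels 0, 1, 2.
edges = [(0, 1, 0), (1, 2, 1), (0, 2, 2)]
```

For this triangle, X = (1, 1, 0) induces a connected H(X) and has fitness 2, whereas X = (1, 0, 0) leaves two components and is penalized by k^2, giving fitness 10.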

The following algorithms are those considered in this paper.

Algorithm 1: The (1+1) EA for the MLST problem
01: Begin
02: Initialize a solution X ∈ {0, 1}^k uniformly at random;
03: While termination criterion is not fulfilled
04:   Obtain an offspring Y by flipping each bit in X with probability 1/k;
05:   If fit(Y) < fit(X) then X := Y;
06: End while
07: End

The (1+1) EA starts with an arbitrary solution, and repeatedly uses the mutation operator to generate an offspring solution from the current one. If the offspring solution is strictly better than the current one, the (1+1) EA uses it to replace the current solution.
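This loop can be sketched in a few lines of Python. The sketch is generic in the fitness function so it stays self-contained: the demonstration below minimizes sum(x), whose optimum is the all-zeros string, rather than equation (1), which would require a concrete graph. The function name is illustrative.

```python
import random

def one_plus_one_ea(k, fit, max_iters=10000, rng=random):
    """Minimal sketch of Algorithm 1: the (1+1) EA with bit-wise mutation."""
    x = [rng.randint(0, 1) for _ in range(k)]        # step 02: random init
    for _ in range(max_iters):
        # step 04: flip each bit independently with probability 1/k
        y = [b ^ (rng.random() < 1.0 / k) for b in x]
        # step 05: a strictly better offspring replaces the parent
        if fit(y) < fit(x):
            x = y
    return x
```

For example, one_plus_one_ea(6, sum) drives the bit string to all zeros, mirroring how the strict elitist selection never lets the fitness increase.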

Another algorithm for the MLST problem is the local search algorithm with the 2-switch neighborhood (LS2N), which was proposed by Brüggemann, Monnot, and Woeginger. We now describe some concepts related to it.

We use X1 − X2 to denote the set of labels that are contained in X1 but not in X2, where X1 and X2 are two solutions. For example, if X1 = {1, 2, 3, 4, 5} and X2 = {1, 2, 3, 6}, then X1 − X2 = {4, 5} and X2 − X1 = {6}. Thus, |X1 − X2| = 2 and |X2 − X1| = 1.

Definition 4: [36] (h-switch neighborhood) Let h ≥ 1 be an integer, and let X1 and X2 be two feasible solutions for some instance of the MLST problem. We say that X2 is in the h-switch neighborhood of X1, denoted by X2 ∈ h-SWITCH(X1), if and only if

|X1 − X2| ≤ h and |X2 − X1| ≤ h.   (2)

In other words, X2 ∈ h-SWITCH(X1) means that X2 can be derived from X1 by first removing at most h labels from X1 and then adding at most h labels to it.
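Representing solutions as Python sets of selected labels, the membership test of Definition 4 is immediate (the helper name in_h_switch is illustrative):

```python
def in_h_switch(x1, x2, h):
    """True iff X2 lies in the h-switch neighborhood of X1 (Definition 4):
    the two set differences each contain at most h labels."""
    return len(x1 - x2) <= h and len(x2 - x1) <= h
```

For X1 = {1, 2, 3, 4, 5} and X2 = {1, 2, 3, 6} as above, |X1 − X2| = 2 and |X2 − X1| = 1, so X2 lies in 2-SWITCH(X1) but not in 1-SWITCH(X1).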

LS2N: in Algorithm 1, if the initial solution X is an arbitrary feasible solution, and the offspring Y is selected from the 2-switch neighborhood of X, then the resulting algorithm is LS2N [36].

GSEMO has been investigated on covering problems [29], minimum spanning tree problems [33], [34], [35], and also on pseudo-Boolean functions [37], [38]. GSEMO for the MLST problem is described as follows.

Algorithm 2: GSEMO for the MLST problem
01: Begin
02: Initialize a solution X ∈ {0, 1}^k uniformly at random;
03: P ← {X};
04: While termination criterion is not fulfilled
05:   Choose a solution X from P uniformly at random;
06:   Obtain an offspring Y by flipping each bit in X with probability 1/k;
07:   If Y is not dominated by any X ∈ P then
08:     Q := {X | X ∈ P, and Y dominates X};
09:     P ← (P ∪ {Y}) \ Q;
10:   End if
11: End while
12: End

In Algorithm 2, P is a population used to preserve those solutions that are not dominated by any other solution in the population. The concept of dominance is defined as follows.

Suppose that the fitness vectors of solutions X and Y are (c(H(X)), |X|) and (c(H(Y)), |Y|), respectively. We say that X dominates Y if one of the following two conditions is satisfied:
(1) c(H(X)) < c(H(Y)) and |X| ≤ |Y|;
(2) c(H(X)) ≤ c(H(Y)) and |X| < |Y|.
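The dominance test and the population update (steps 07-09 of Algorithm 2) can be sketched as follows. For brevity the sketch works directly on fitness vectors (c(H(X)), |X|); both helper names are illustrative.

```python
def dominates(fx, fy):
    """fx dominates fy per conditions (1)/(2): no worse in both
    objectives and strictly better in at least one."""
    return fx[0] <= fy[0] and fx[1] <= fy[1] and fx != fy

def update_population(pop, fy):
    """Steps 07-09 of Algorithm 2, on fitness vectors: keep fy only if
    no member dominates it, and drop the members that fy dominates."""
    if any(dominates(fx, fy) for fx in pop):
        return pop                                   # fy is rejected
    return [fx for fx in pop if not dominates(fy, fx)] + [fy]
```

For example, inserting (1, 3) into the population [(3, 1), (1, 4)] removes (1, 4), since (1, 3) is no worse in the component count and strictly better in the label count.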

For the sake of completeness, we describe two more algorithms in the following, both of which are greedy algorithms.

The first one is the modified MVCA. It starts with a solution containing no labels, and each time selects a label such that, when this label is chosen, the decrease in the number of connected components is the largest.

Algorithm 3: The modified MVCA [17]
Input: A given connected undirected graph G = (V, E, L), L = {1, . . . , k}.
01: Let C be the set of used labels, C := ∅;
02: Repeat
03:   Let H be the spanning subgraph of G restricted to edges with labels from C;
04:   For all i ∈ L \ C do
05:     Determine the number of connected components when inserting all edges labeled by i in H;
06:   End for
07:   Choose the label i with the smallest resulting number of connected components: C := C ∪ {i};
08: Until H is connected.
Output: H
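This greedy rule can be sketched in Python over the same (u, v, label) edge-list representation assumed earlier. The names are illustrative, and the input graph is assumed connected, as in the problem definition (otherwise the loop would not terminate).

```python
def num_components(n, edges, labels):
    """Connected components of the subgraph using only the given labels."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for u, v, lab in edges:
        if lab in labels:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    return len({find(v) for v in range(n)})

def mvca(n, k, edges):
    """Modified MVCA: greedily add the label whose edges cause the
    largest drop in the number of connected components."""
    chosen = set()
    while num_components(n, edges, chosen) > 1:
        best = min((i for i in range(k) if i not in chosen),
                   key=lambda i: num_components(n, edges, chosen | {i}))
        chosen.add(best)
    return chosen
```

On a triangle whose two edges (0,1) and (1,2) carry label 0 and whose edge (0,2) carries label 1, the algorithm picks label 0 alone, since it already connects all three nodes.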


The second is called, in this paper, the modified MVCA with contraction, which has been investigated in [15].

The modified MVCA with contraction: in Algorithm 3, if we contract each connected component in H to a supernode after step 3, then we obtain the modified MVCA with contraction.

III. APPROXIMATION PERFORMANCES OF THE (1+1) EA AND GSEMO ON THE MLST PROBLEM

The following is the concept of approximation ratio (solution). Given a minimization problem R and an algorithm A, if for each instance I of R the value of the best solution obtained in polynomial runtime by A is A(I), and max_{I∈R} A(I)/OPT(I) = r, where OPT(I) is the value of the optimal solution of I, then we say that A achieves an r-approximation ratio (solution) for R.

Although the MLST problem is NP-hard, we reveal that the (1+1) EA and GSEMO are guaranteed to achieve an approximation ratio for the MLSTb problem in expected polynomial runtime with respect to n, the number of nodes, and k, the number of labels, and that GSEMO is guaranteed to obtain an approximation ratio for the MLST problem in expected polynomial runtime with respect to n and k.

A. The approximation guarantees of the (1+1) EA and GSEMO on the MLSTb problem

In this subsection, let G = (V, E, L) be an arbitrary instance of the MLSTb problem, where |V| = n, |L| = k, and b ≥ 2, and let OPT(G) denote the number of labels used in the global optimum of G.

We show in two steps that the (1+1) EA and GSEMO are guaranteed to achieve a (b+1)/2-approximation ratio for G in expected polynomial runtime with respect to n and k. Since G is an arbitrary instance of the MLSTb problem, this reveals that the (1+1) EA and GSEMO achieve a (b+1)/2-approximation ratio for the MLSTb problem in expected runtime polynomial in n and k. First, we prove that the (1+1) EA and GSEMO, starting with any initial solution, find a feasible solution for G in expected runtime polynomial in n and k; then we prove that, starting with any feasible solution, the (1+1) EA and GSEMO find a (b+1)/2-approximation solution for G in expected polynomial runtime with respect to k by adapting the result proved in [36].

We now prove that, starting with any initial solution, the (1+1) EA can efficiently find a feasible solution of G.

Lemma 1: The (1+1) EA starting with any initial solution finds a feasible solution for G in expected runtime O(nk).

Proof: According to the fitness function (1), during the optimization process of the (1+1) EA, the number of connected components never increases.

Let X be the current solution of G. If X is not a feasible solution, then the number of connected components of the spanning subgraph H(X) is greater than one. Note that G is connected. There must exist a label such that, when it is added to X, the number of connected components decreases by at least one. The probability of adding this label to X is (1/k)(1 − 1/k)^{k−1} ≥ 1/(ek), which implies that in expected runtime O(k) the number of connected components decreases by at least one.

Since there are at most n connected components, a feasible solution of G will be found by the (1+1) EA starting with any initial solution in expected runtime O(nk).
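The mutation-probability bound used in this proof, (1/k)(1 − 1/k)^{k−1} ≥ 1/(ek), holds because (1 − 1/k)^{k−1} decreases towards 1/e from above for k ≥ 2; it is easy to check numerically:

```python
import math

def single_flip_prob(k):
    """Probability that bit-wise mutation with rate 1/k flips exactly
    one designated bit and leaves the other k-1 bits unchanged."""
    return (1.0 / k) * (1.0 - 1.0 / k) ** (k - 1)

# (1 - 1/k)^(k-1) >= 1/e for all k >= 2, so the 1/(ek) lower bound holds.
assert all(single_flip_prob(k) >= 1.0 / (math.e * k) for k in range(2, 1000))
```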

Brüggemann, Monnot, and Woeginger have proved that LS2N is a (b+1)/2-approximation algorithm for the MLSTb problem [36], so we have the following lemma.

Lemma 2: LS2N can find a feasible solution for G with at most OPT(G) · (b+1)/2 labels.

We partition all feasible solutions of G into two disjoint sets. One is S1 = {X | X ∈ {0, 1}^k, X is a feasible solution of G, |X| ≤ OPT(G) · (b+1)/2}, and the other is S2 = {X | X ∈ {0, 1}^k, X is a feasible solution of G, |X| > OPT(G) · (b+1)/2}.

From Lemma 2, we derive a property regarding the 2-switch neighborhood.

Corollary 1: If X is a feasible solution of G and X ∈ S2, then there must exist a feasible solution X′ ∈ 2-SWITCH(X) whose fitness value is 1 or 2 less than that of X.

Next, we show that, starting with an arbitrary feasible solution, the (1+1) EA can efficiently find a (b+1)/2-approximation solution of G.

Lemma 3: For G, the (1+1) EA starting with an arbitrary feasible solution finds a (b+1)/2-approximation solution in expected runtime O(k^4).

Proof: Let X be the current feasible solution of G. By Corollary 1, if X ∈ S2, then there must exist a feasible solution X′ ∈ 2-SWITCH(X) whose fitness value is 1 or 2 less than that of X. So, replacing X with X′ decreases the fitness value by at least 1. Since a feasible solution belonging to S2 has at most k labels, after at most k − OPT(G) · (b+1)/2 such replacing steps a feasible solution belonging to S1 will be found.

Now, we calculate the expected runtime for the (1+1) EA to find such an X′, where X′ ∈ 2-SWITCH(X) and |X′| < |X|. There are three cases. The first is that X′ is obtained by removing exactly one label from X. The second is that X′ is obtained by removing exactly two labels from X. The third is that X′ is obtained by removing exactly two labels from X and adding exactly one label to it.

Obviously, the worst case is the third one, since in this case three bits of X must be flipped simultaneously by the (1+1) EA. In this case, the probability that the (1+1) EA finds X′ is (1/k^3)(1 − 1/k)^{k−3} ≥ 1/(ek^3). So, the expected runtime for the (1+1) EA to find a feasible solution X′ ∈ 2-SWITCH(X) is O(k^3), which means that the expected runtime for the (1+1) EA to reduce the fitness value by at least one is O(k^3).

Therefore, the expected runtime for the (1+1) EA starting with an arbitrary feasible solution to find a (b+1)/2-approximation solution for G is O((k − OPT(G) · (b+1)/2) · k^3) = O(k^4), as OPT(G) · (b+1)/2 ≤ k.

Combining Lemmas 1 and 3, and noting that G is an arbitrary instance of the MLSTb problem, we obtain the following theorem.

Theorem 1: The (1+1) EA starting with any initial solution finds a (b+1)/2-approximation solution for the MLSTb problem in expected runtime O(nk + k^4).


As we will see below, GSEMO can also efficiently obtain such an approximation solution for the MLSTb problem.

Theorem 2: GSEMO starting with any initial solution finds a (b+1)/2-approximation solution for the MLSTb problem in expected runtime O(nk^2 + k^5).

Proof: For GSEMO, the fitness vector of any solution X is (c(H(X)), |X|). For a given value d of |X|, there is at most one solution in the population whose fitness vector is (∗, d), so the population size is O(k), as |X| takes values in {0, 1, . . . , k}, where k is the number of labels contained in the label set.

Consider the arbitrary instance G, and let X be the solution in the population for which c(H(X)) is minimal. If c(H(X)) > 1, then there exists a label l such that adding it reduces the number of connected components by at least one, as G is connected. The probability of selecting X from the population is Ω(1/k), as the population size is O(k), and the probability of flipping only the bit corresponding to label l is (1/k)(1 − 1/k)^{k−1} = Ω(1/k), so a solution X′ with c(H(X′)) at least one less than c(H(X)) will be included in the population in expected runtime O(k^2).

Since there are at most n connected components in the spanning subgraph induced by any solution, a feasible solution will be included in the population in expected runtime O(nk^2).

At least one feasible solution is now included in the population. Let X be the feasible solution with the minimum number of labels among all feasible solutions in the population. If the number of labels in X is greater than (b+1)/2 · OPT(G), then according to Corollary 1 there exists a feasible solution X′ ∈ 2-SWITCH(X) such that |X′| is at least one less than |X|. According to the proof of Lemma 3, the expected runtime to generate such a solution X′ from X is O(k^3). Since the expected runtime to select X from the population is O(k), such a solution X′ will be included in the population in expected runtime O(k^4).

Therefore, a (b+1)/2-approximation solution of G will be included in the population in expected runtime O((k − (b+1)/2 · OPT(G)) · k^4) = O(k^5) once a feasible solution is found. Hence, GSEMO starting with any initial solution will find a (b+1)/2-approximation solution for G in expected runtime O(nk^2 + k^5). Noting that G is an arbitrary instance of the MLSTb problem, we obtain the theorem.
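The dominance-based population update that this proof and the later ones rely on can be sketched as follows. The toy instance and all names are hypothetical; the bi-objective fitness (c(H(X)), |X|) is minimised component-wise.

```python
import random

# Hypothetical toy instance: edges (u, v, label) over N nodes, K labels.
EDGES = [(0, 1, 0), (1, 2, 0), (2, 3, 1), (0, 3, 2), (1, 3, 2)]
N, K = 4, 3

def fitness(x):
    """Bi-objective fitness vector (c(H(X)), |X|), both minimised."""
    labels = {i for i, bit in enumerate(x) if bit}
    parent = list(range(N))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for u, v, lab in EDGES:
        if lab in labels:
            parent[find(u)] = find(v)
    return (len({find(i) for i in range(N)}), len(labels))

def weakly_dominates(f, g):
    return f[0] <= g[0] and f[1] <= g[1]

def gsemo(steps=5000, seed=2):
    """GSEMO sketch: keep only mutually non-dominated solutions and
    mutate a uniformly selected member in each iteration."""
    rng = random.Random(seed)
    x = tuple(rng.randint(0, 1) for _ in range(K))
    pop = {x: fitness(x)}
    for _ in range(steps):
        parent = rng.choice(list(pop))
        child = tuple(bit ^ (rng.random() < 1.0 / K) for bit in parent)
        fc = fitness(child)
        if not any(weakly_dominates(f, fc) for f in pop.values()):
            pop = {s: f for s, f in pop.items()
                   if not weakly_dominates(fc, f)}
            pop[child] = fc
    return pop
```

Since at most one solution per value of |X| survives the dominance filter, the population size stays O(k), which is the fact the proof's Ω(1/k) selection probability rests on.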

B. The approximation guarantee of GSEMO on the MLST problem

In this subsection, let G′ = (V′, E′, L′) be an arbitrary instance of the MLST problem, where |V′| = n and |L′| = k, and let OPT(G′) be the number of labels used in the minimum label spanning tree T∗ of G′.

Now we analyze the approximation guarantee of GSEMO on the MLST problem by simulating the modified MVCA with contraction. Similar to Lemma 2 in [15], we have the following lemma.

Lemma 4: For G′, if n ≥ 2, then there exists a label such that the number of connected components of the spanning subgraph restricted to edges with this label is not more than ⌊n(1 − 1/(2OPT(G′)))⌋.

Proof: Note that G′ is a connected undirected graph. Since the minimum label spanning tree T∗ has exactly n − 1 edges, there must exist a label, say j, such that the number of edges in T∗ labeled by j is at least ⌈(n−1)/OPT(G′)⌉. Hence, the number of connected components of the spanning subgraph, restricted to edges with label j, is not more than n − ⌈(n−1)/OPT(G′)⌉ = ⌊n(1 − 1/OPT(G′)) + 1/OPT(G′)⌋. When n ≥ 2, we have ⌊n(1 − 1/OPT(G′)) + 1/OPT(G′)⌋ ≤ ⌊n(1 − 1/(2OPT(G′)))⌋.
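The chain of floor and ceiling estimates in this proof is easy to get wrong, so a quick exhaustive check over small parameters is reassuring. The sketch below uses exact integer arithmetic; the function name and test ranges are mine.

```python
def lemma4_bound_holds(n, opt):
    """Check n - ceil((n-1)/opt) <= floor(n * (1 - 1/(2*opt)))
    with exact integer arithmetic, as claimed in Lemma 4."""
    lhs = n - (-(-(n - 1) // opt))          # n - ceil((n-1)/opt)
    rhs = (n * (2 * opt - 1)) // (2 * opt)  # floor(n * (1 - 1/(2*opt)))
    return lhs <= rhs

# The inequality holds for every n >= 2 and opt >= 1 tried here.
assert all(lemma4_bound_holds(n, opt)
           for n in range(2, 300) for opt in range(1, 60))
```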

Further, for a spanning subgraph H(X) of G′, we have the following corollary.

Corollary 2: Let s be the number of connected components of H(X). If s ≥ 2, then there is a label such that when it is added to X the number of connected components will be reduced to not more than ⌊s(1 − 1/(2OPT(G′)))⌋.

Proof: Contract each connected component of H(X) to a supernode; then G′ is converted into a graph G′′ with s nodes. Suppose that the number of labels used in the minimum label spanning tree of G′′ is OPT(G′′). According to Lemma 4, there is a label in G′′ such that the number of connected components of the spanning subgraph, restricted to edges with this label, is not more than ⌊s(1 − 1/(2OPT(G′′)))⌋. Noting that the number of labels used in the minimum label spanning tree of G′ is OPT(G′), it is clear that OPT(G′′) ≤ OPT(G′). Thus, ⌊s(1 − 1/(2OPT(G′′)))⌋ ≤ ⌊s(1 − 1/(2OPT(G′)))⌋. In other words, there is a label such that when it is added to X the number of connected components of H(X) will be reduced to not more than ⌊s(1 − 1/(2OPT(G′)))⌋.

Based on Corollary 2, we prove that GSEMO is guaranteed to find a (2 ln n)-approximation solution for the MLST problem in expected polynomial runtime with respect to n and k.

Theorem 3: GSEMO starting with any initial solution finds a (2 ln n)-approximation solution for the MLST problem in expected runtime O(k^2 ln k + k^3 ln n).

Proof: Consider the arbitrary instance G′ of the MLST problem. We first show that GSEMO starting with any initial solution finds the all-zeros bit string for G′ in expected runtime O(k^2 ln k), and then show that GSEMO finds a (2 ln n)-approximation solution for G′ in expected runtime O(k^3 ln n) after the all-zeros bit string has been included in the population. Combining the two bounds, and noting that G′ is an arbitrary instance of the MLST problem, we obtain the theorem.

We now investigate the expected runtime for GSEMO starting with any initial solution to find the all-zeros bit string, whose Pareto optimal fitness vector is (n, 0). Once it is found, it can never be removed from the population. If it is not yet included in the population, then GSEMO selects the solution X which contains the minimum number of labels among all solutions in the population with probability Ω(1/k), as the population size is O(k). The event of flipping exactly one of the |X| bits whose value is 1 decreases the number of labels, and the probability of this event is |X| · (1/k)(1 − 1/k)^{k−1} ≥ |X|/(ek). So, the expected runtime for GSEMO to include a solution containing |X| − 1 labels is O(k^2/|X|). Continuing in this way, the all-zeros bit string will be included in the population in expected runtime O(∑_{i=1}^{|X|} k^2/i) = O(k^2 ln |X|) = O(k^2 ln k).
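The summed waiting times in this argument form a harmonic series, which is what produces the ln k factor. A quick numeric check (function name mine):

```python
from math import log

def all_zeros_drift_bound(k):
    """sum_{i=1}^{k} k^2 / i = k^2 * H_k, the total expected waiting
    time to shrink a solution from k labels down to zero."""
    return sum(k ** 2 / i for i in range(1, k + 1))

# H_k <= ln k + 1, so the total is O(k^2 ln k)
for k in (10, 50, 200):
    assert all_zeros_drift_bound(k) <= k ** 2 * (log(k) + 1)
```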


Fig. 1. Instance G1.

Now, the all-zeros bit string with fitness vector (n, 0) is included in the population. It is easy to see that n ≥ 2, as otherwise G′ is trivial. According to Corollary 2, there is a label such that when it is added to the all-zeros bit string the number of connected components will be reduced to not more than ⌊n(1 − 1/(2OPT(G′)))⌋. The probability of choosing this label is (1/k)(1 − 1/k)^{k−1} = Ω(1/k). Since the population size is O(k), the probability that GSEMO selects the all-zeros bit string from P is Ω(1/k). So a solution X1 with fitness vector (c1, 1), where c1 ≤ ⌊n(1 − 1/(2OPT(G′)))⌋ ≤ n(1 − 1/(2OPT(G′))), can be included in the population in expected runtime O(k^2).

If c1 ≥ 2, then there is still a label such that when it is added to X1 the number of connected components will be reduced to not more than n(1 − 1/(2OPT(G′)))^2. So a solution X2 with fitness vector (c2, 2), where c2 ≤ n(1 − 1/(2OPT(G′)))^2, can be included in the population in expected runtime O(k^2) after X1 has been included in the population.

Similarly, suppose that solution X_{h−1} with fitness vector (c_{h−1}, h − 1), where c_{h−1} ≤ n(1 − 1/(2OPT(G′)))^{h−1}, has now been included in the population. If c_{h−1} ≥ 2, then a solution X_h with fitness vector (c_h, h), where c_h ≤ n(1 − 1/(2OPT(G′)))^h, will be included in the population in expected runtime O(k^2) after X_{h−1} has been included in the population.

Note that when h = 2 · OPT(G′) · ln n, we have n(1 − 1/(2OPT(G′)))^h ≤ 1. So, a connected spanning subgraph with at most 2 · OPT(G′) · ln n labels will finally be included in the population in expected runtime O(hk^2) = O(2 · OPT(G′) · k^2 ln n) = O(k^3 ln n) after the all-zeros bit string has been included in the population.

TABLE I
APPROXIMATION PERFORMANCES OF THE (1+1) EA AND GSEMO. 'r' AND '—' REFER TO THE APPROXIMATION RATIO AND UNKNOWN, RESPECTIVELY.

Problem | (1+1) EA: r | (1+1) EA: runtime | GSEMO: r | GSEMO: runtime
MLSTb   | (b+1)/2     | O(nk + k^4)       | (b+1)/2  | O(nk^2 + k^5)
MLST    | —           | —                 | 2 ln n   | O(k^2 ln k + k^3 ln n)

Table I summarizes the approximation performances of the (1+1) EA and GSEMO for the minimum label spanning tree problem. For the MLSTb problem, the (1+1) EA and GSEMO can efficiently achieve a (b+1)/2-approximation ratio. However, the order of the upper bound on the expected runtime of GSEMO is higher than that of the (1+1) EA, because GSEMO has to select a promising solution to mutate from a population of size O(k). For the MLST problem, GSEMO efficiently achieves a (2 ln n)-approximation ratio, but the approximation performance of the (1+1) EA is unknown.

IV. PERFORMANCES OF THE (1+1) EA AND GSEMO ON FOUR INSTANCES

In this section, we first present an instance where GSEMO outperforms the (1+1) EA, and then show that the (1+1) EA and GSEMO outperform local search algorithms on three instances of the MLST problem.

A. An instance where GSEMO outperforms the (1+1) EA

In this subsection, we construct an instance G1 = (V1, E1, L1) on which GSEMO is superior to the (1+1) EA, where L1 = {1, . . . , k}.

Given µ (0 < µ < 1/2), we construct instance G1 as follows.

For simplicity, we assume that µk is an integer, so (1 − µ)k is also an integer. First, we construct (1 − µ)k subgraphs G′_1, . . . , G′_{(1−µ)k}. Each G′_i (1 ≤ i ≤ (1 − µ)k) contains a (µk − 1)-sided regular polygon whose edges are all labeled by the same label µk + i, together with an inner node at the center of this regular polygon. Since the number of sides of a regular polygon is at least 3, µk − 1 ≥ 3, i.e., µk ≥ 4. From the inner node of G′_i, µk − 1 edges labeled by 1 through µk − 1 connect to the µk − 1 outer nodes v_{i,1}, v_{i,2}, . . . , v_{i,µk−1}. Then three edges connect G′_i (1 ≤ i ≤ (1 − µ)k − 1) to G′_{i+1}: the first, labeled µk + i, runs from the inner node of G′_i to outer node v_{i+1,1} of G′_{i+1}; the second, labeled µk + i, runs from outer node v_{i,1} of G′_i to outer node v_{i+1,1} of G′_{i+1}; and the third, labeled µk, runs from the inner node of G′_i to the inner node of G′_{i+1}. Finally, v_0 is connected to the inner node and to v_{(1−µ)k,1} of G′_{(1−µ)k} with two edges labeled by k, and v_0 is also connected to the inner node of G′_1 with an edge labeled by µk. Figure 1 shows instance G1.

When 0 < µ < 1/2, X∗_1 = (1, . . . , 1, 0, . . . , 0), consisting of µk ones followed by (1 − µ)k zeros, is the global optimum of G1. X^l_1 = (0, . . . , 0, 1, . . . , 1), consisting of µk zeros followed by (1 − µ)k ones, is a local optimum for LS2N: X∗_1 is the only feasible solution that contains fewer labels than X^l_1, but |X^l_1 − X∗_1| = (1 − µ)k and |X∗_1 − X^l_1| = µk, which implies that X∗_1 ∉ 2-SWITCH(X^l_1), as µk ≥ 4 and (1 − µ)k > µk.

For instance G1, the expected runtime for the (1+1) EA to jump out of X^l_1 is exponential.

Theorem 4: For instance G1, starting with X^l_1, the expected runtime for the (1+1) EA to find the global optimum is Ω(k^{µk}).

Proof: For instance G1, when the current solution is X^l_1, the (1+1) EA only accepts the event that adds all µk labels from {1, . . . , µk} and simultaneously removes more than µk labels from {µk + 1, . . . , k}. So, the probability of escaping from the local optimum is

∑_{i=1}^{k−2µk} C(k−µk, µk+i) (1/k)^{2µk+i} (1 − 1/k)^{k−2µk−i}
= (1/k)^{µk} ∑_{i=1}^{k−2µk} C(k−µk, µk+i) (1/k)^{µk+i} (1 − 1/k)^{k−2µk−i}
< (1/k)^{µk},

where C(n, m) denotes the binomial coefficient. The last inequality holds because

∑_{i=1}^{k−2µk} C(k−µk, µk+i) (1/k)^{µk+i} (1 − 1/k)^{k−2µk−i}
< ∑_{i=1}^{k−2µk} C(k−µk, µk+i) (1/k)^{µk+i} (1 − 1/k)^{k−2µk−i} + ∑_{i=−µk}^{0} C(k−µk, µk+i) (1/k)^{µk+i} (1 − 1/k)^{k−2µk−i}
= ∑_{i=0}^{k−µk} C(k−µk, i) (1/k)^{i} (1 − 1/k)^{k−µk−i} = 1.

Thus, starting with X^l_1, the expected runtime for the (1+1) EA to find the global optimum of G1 is Ω(k^{µk}).
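The escape probability bounded in this proof can be evaluated exactly for small parameters, confirming that it is indeed below k^{−µk}. The function below is an illustrative check, with m standing for µk; the name and parameter choices are mine.

```python
from math import comb

def escape_probability(k, m):
    """Exact probability that one (1+1) EA mutation escapes the local
    optimum of G1 (m = mu*k): all m missing labels are added and more
    than m of the k - m present labels are removed simultaneously."""
    p = 0.0
    for i in range(1, k - 2 * m + 1):
        p += (comb(k - m, m + i)
              * (1 / k) ** (2 * m + i)
              * (1 - 1 / k) ** (k - 2 * m - i))
    return p

# The proof's bound: the escape probability is strictly below k**(-m)
for k, m in ((20, 4), (40, 4), (40, 10)):
    assert escape_probability(k, m) < k ** (-m)
```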

While the (1+1) EA needs expected exponential runtime to jump out of X^l_1, GSEMO can efficiently find the global optimum of instance G1.

Theorem 5: For instance G1, GSEMO finds the global optimum in expected runtime O(k^2 ln k).

Proof: We first determine the expected runtime for GSEMO starting with any initial solution to find the all-zeros solution. Then we determine the expected runtime for GSEMO, starting with the all-zeros solution, to produce the whole Pareto front.

Adding a label from L′_1 = {1, . . . , µk} to the all-zeros bit string reduces the number of connected components by (1 − µ)k, while adding a label from L′′_1 = {µk + 1, . . . , k} reduces the number of connected components by µk. Note that when 0 < µ < 1/2, (1 − µ)k is larger than µk. Hence, the Pareto front contains µk + 1 Pareto optimal solutions with fitness vectors (n, 0), (n − (1 − µ)k, 1), . . . , (n − (1 − µ)jk, j), . . . , (1, µk), respectively. It is clear that the population size is O(k).

It has been proved in Theorem 3 that the expected runtime for GSEMO starting with any initial solution to include the all-zeros bit string in the population is O(k^2 ln k).

Now we calculate the expected runtime to produce the whole Pareto front after the all-zeros bit string has been found; starting from the all-zeros bit string is the worst case. Suppose that the population contains a Pareto optimal solution X with fitness vector (n − (1 − µ)jk, j), which has the maximum number of labels. Another Pareto optimal solution with fitness vector (n − (1 − µ)(j + 1)k, j + 1) can be produced by adding a label from L′_1 which is not in X. The probability of adding such a label is (µk − j) · (1/k)(1 − 1/k)^{k−1} ≥ (µk − j)/(ek), which implies that the expected runtime is O(ek^2/(µk − j)), as the expected runtime for GSEMO to select X from P is O(k). So, considering the worst case of starting from the all-zeros bit string, the expected runtime for GSEMO to produce the whole Pareto front is ∑_{j=0}^{µk−1} ek^2/(µk − j) = O(k^2 ln k).

B. An instance where the (1+1) EA and GSEMO outperform ERA

ERA is a local search algorithm. It takes an arbitrary spanning tree as input, then considers each non-tree edge and tests whether the number of used labels can be reduced by adding this non-tree edge and deleting a tree edge on the induced cycle.

In this subsection, we show that the (1+1) EA and GSEMO outperform ERA on an instance proposed by Krumke and Wirth [15], which is denoted by G2 in this paper.

This instance can be constructed in two steps. First, a star-shaped graph is constructed by selecting one node out of n nodes and connecting it to the other n − 1 nodes with n − 1 edges, which are labeled by n − 1 distinct labels 1, 2, . . . , n − 1. Second, a complete graph is obtained by adding edges, all with the same label k, to the star-shaped graph. Thus, we get a complete graph G2 = (V2, E2, L2), where |V2| = n, |E2| = n(n − 1)/2, and L2 = {1, 2, . . . , k} is the set of labels. It is clear that |L2| = k and k = n. Figure 2 shows an example with n = 5, where the dashed edges form a spanning tree with the minimum number of labels.
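Instance G2 is simple enough to build programmatically. The sketch below (node 0 as the star's hub, helper names mine) checks the property the argument rests on: label k alone does not connect the graph, while any one star label together with k does.

```python
from itertools import combinations

def build_g2(n):
    """Instance G2: a star from hub 0 with distinct labels 1..n-1,
    completed to a clique by edges that all carry label k = n."""
    edges = [(0, v, v) for v in range(1, n)]             # star edges
    edges += [(u, v, n) for u, v in combinations(range(1, n), 2)]
    return edges

def connected(n, edges, labels):
    """Union-find connectivity of the subgraph restricted to labels."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for u, v, lab in edges:
        if lab in labels:
            parent[find(u)] = find(v)
    return len({find(i) for i in range(n)}) == 1

n = 5
edges = build_g2(n)
assert not connected(n, edges, {n})   # label k alone isolates the hub
assert connected(n, edges, {1, n})    # one star label plus label k suffice
```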

Fig. 2. An example of instance G2 with n = 5.

For instance G2, the global optimum X∗_2 = (x∗_{2,1}, . . . , x∗_{2,k}) contains two labels: one from {1, . . . , k − 1} and the other label k, i.e., |X∗_2| = 2, ∑_{i=1}^{k−1} x∗_{2,i} = 1, and x∗_{2,k} = 1.

Krumke and Wirth used instance G2 to demonstrate that ERA might perform as badly as possible. In fact, X^l_2 = (1, . . . , 1, 0), consisting of k − 1 ones followed by a zero, is a local optimum for ERA, since starting with X^l_2 the number of labels used in H(X^l_2) cannot be reduced by adding any non-tree edge and deleting a tree edge on the induced cycle. The local optimum uses k − 1 labels, while the global optimum uses only 2. However, the (1+1) EA can efficiently solve instance G2.

Theorem 6: For instance G2, the (1+1) EA starting with any initial solution finds a global optimum in expected runtime O(k ln k).

Proof: For simplicity, let L′_2 denote the label set {1, . . . , k − 1}.


Let A = {X | c(H(X)) = 1, x_k = 1, 2 ≤ |X| ≤ k − 1}, i.e., a solution X ∈ A contains label k and at least one but at most k − 2 labels from L′_2. Obviously, any solution X ∈ A is feasible.

To find a global optimum, a solution X ∈ A must be found first. Once a solution X ∈ A has been found, the global optimum can be reached by removing all |X| − 2 redundant labels from L′_2. Once such a label is removed from X, it can never be added back. According to the Coupon Collector's theorem [39], all redundant labels contained in X will be removed in expected runtime O(k ln k).

In order to analyze the expected runtime to find a solution X ∈ A, we further partition all solutions that do not belong to A into five disjoint subsets B1, B2, B3, B4, B5:
B1 = {X | c(H(X)) = 1, |X| = k, and x_k = 1};
B2 = {X | c(H(X)) = 1, |X| = k − 1, and x_k = 0};
B3 = {X | c(H(X)) > 1, 1 ≤ |X| ≤ k − 2, and x_k = 0};
B4 = {X | c(H(X)) > 1, |X| = 1, and x_k = 1};
B5 = {X | c(H(X)) > 1, |X| = 0}.

If X ∈ B1, then X will be transformed into A by removing one label from L′_2. The probability of this event is (k − 1) · (1/k)(1 − 1/k)^{k−1} = Ω(1), which implies that the expected runtime is O(1).

If X ∈ B2, then X will be transformed into A by adding label k and simultaneously removing one label from L′_2. The probability of this event is (k − 1) · (1/k)^2 (1 − 1/k)^{k−2} = Ω(1/k), which implies that the expected runtime is O(k).

If X ∈ B3 (B4), then X will be transformed into A by adding label k (respectively, one label from L′_2). The probability of this event is (1/k)(1 − 1/k)^{k−1} = Ω(1/k) (respectively, (k − 1) · (1/k)(1 − 1/k)^{k−1} = Ω(1)), which implies that the expected runtime is O(k).

If X ∈ B5, then X will be transformed into A by simultaneously adding label k and a label from L′_2. The probability of this event is (k − 1) · (1/k)^2 (1 − 1/k)^{k−2} = Ω(1/k), which implies that the expected runtime is O(k).

So, any solution will be transformed into A in expected runtime O(k).

Adding the expected runtime to remove all redundant labels from a solution belonging to A, the expected runtime for the (1+1) EA starting with any initial solution to find a global optimum is O(k ln k).

As shown in the following theorem, GSEMO can also efficiently solve instance G2.

Theorem 7: For instance G2, GSEMO starting with any initial solution finds a global optimum in expected runtime O(k^2 ln k).

Proof: Let L′_2 denote the label set {1, . . . , k − 1}. The optimization process consists of two independent phases: the first phase lasts until a solution with fitness vector (1, ∗), i.e., a feasible solution, is included in the population, and the second phase ends when a global optimum is found.

We now analyze the expected runtime of the first phase. Let X be the solution with fitness vector (c(H(X)), |X|), where c(H(X)) is the minimum among all solutions in the population. If c(H(X)) > 1, then there are three cases: X contains no label; X contains label k but no label from L′_2; or X contains at least one but at most k − 2 labels from L′_2 and not label k. In all three cases, a solution with fitness vector (1, ∗) will be included in expected runtime O(k^2), since the probability of selecting X from P is Ω(1/k), and the probability of transforming X into a solution with fitness vector (1, ∗) is Ω(1/k).

Once a solution with fitness vector (1, ∗) is included in the population, we show that a global optimum will be found in expected runtime O(k^2 ln k). To this end, we partition the second phase into two subphases: the first subphase lasts until a solution belonging to A = {X | x_k = 1, 2 ≤ |X| ≤ k − 1} is found, i.e., a solution containing label k and at least one but at most k − 2 labels from L′_2; the second subphase ends when a global optimum is found.

If a solution X has fitness vector (1, ∗) but X ∉ A, then two cases need to be considered: either X contains label k and all labels from L′_2, or X contains all labels from L′_2 but not label k.

For the first case, removing any one of the labels from L′_2 will transform X into A. The probability of this event is (k − 1) · (1/k)(1 − 1/k)^{k−1} = Ω(1), which implies that the expected runtime is O(1). For the second case, removing one label from L′_2 and simultaneously adding label k will transform X into A. The probability of this event is (k − 1) · (1/k)^2 (1 − 1/k)^{k−2} = Ω(1/k), which implies that the expected runtime is O(k). Noting that the probability of selecting X from the population is Ω(1/k), a solution belonging to A will be included in the population in expected runtime O(k^2) after a solution with fitness vector (1, ∗) has been included.

Now at least one solution belonging to A is included in the population. Let X be the solution with the minimum number of labels among all solutions in the population belonging to A. If X is not the global optimum of G2, then the global optimum will be found by removing all |X| − 2 redundant labels from L′_2. Once such a label is removed from X, it can never be added back. According to the Coupon Collector's theorem [39], all redundant labels will be removed in expected runtime O(k ln k); since the probability of selecting X from P is Ω(1/k), a global optimum will be found in expected runtime O(k^2 ln k).

Altogether, GSEMO starting with any initial solution finds a global optimum in expected runtime O(k^2 ln k).

C. An instance where the (1+1) EA and GSEMO outperform LS2N

Brüggemann, Monnot, and Woeginger proposed an instance, denoted by G3 in this paper, to show that there exists a local optimum with respect to LS2N [36].

As shown in Figure 3, this instance is a graph G3 = (V3, E3, L3), where V3 = {v_0, x_0, x_1, . . . , x_{k−4}, y_0, y_1, . . . , y_{k−4}}, L3 = {1, 2, . . . , k}, |V3| = 2k − 5, |E3| = 4k − 12, and |L3| = k. Figure 4 shows the minimum label spanning tree.

In this instance, the global optimum is X∗_3 = (0, . . . , 0, 1, 1), consisting of k − 2 zeros followed by two ones.


Fig. 3. Instance G3.

Fig. 4. The MLST of instance G3.

LS2N might be trapped in the local optimum which contains labels 1, 2, . . . , k − 2. In fact, to jump out of this local optimum, at least three labels from {1, 2, . . . , k − 2} must be removed and the two labels k − 1 and k simultaneously added, but the resulting solution is not in the 2-switch neighborhood of the local optimum. Hence, LS2N may not find the global optimum of G3. However, the (1+1) EA is efficient for instance G3.

Theorem 8: For instance G3, the (1+1) EA starting with any initial solution finds the global optimum in expected runtime O(k^2).

Proof: Let L′_3 denote the label set {1, . . . , k − 2}, and let C = {X | c(H(X)) = 1, x_{k−1} = 1, x_k = 1, 2 ≤ |X| ≤ k − 1}, i.e., a solution X ∈ C contains labels k − 1 and k and at most k − 3 labels from L′_3.

Note that the global optimum contains only the two labels k − 1 and k. The optimization process consists of two independent phases: the first phase lasts until a solution X ∈ C is constructed from an arbitrary solution, and the second phase ends when all |X| − 2 redundant labels from L′_3 are removed.

To analyze the expected runtime of finding a solution X ∈ C, we partition all solutions that do not belong to C into seven disjoint subsets D1, . . . , D7:
D1 = {X | c(H(X)) = 1, x_{k−1} = 0, x_k = 0, |X| = k − 2};
D2 = {X | c(H(X)) = 1, x_{k−1} = 0, x_k = 1, |X| = k − 2 or |X| = k − 1};
D3 = {X | c(H(X)) = 1, x_{k−1} = 1, x_k = 0, |X| = k − 2 or |X| = k − 1};
D4 = {X | c(H(X)) = 1, x_{k−1} = 1, x_k = 1, |X| = k};
D5 = {X | c(H(X)) > 1, x_{k−1} = 0, x_k = 0};
D6 = {X | c(H(X)) > 1, x_{k−1} = 0, x_k = 1};
D7 = {X | c(H(X)) > 1, x_{k−1} = 1, x_k = 0}.

If X ∈ D1, then X will be transformed into C by adding labels k − 1 and k and simultaneously removing three labels from L′_3. The probability of this event is C(k−2, 3) · (1/k)^5 (1 − 1/k)^{k−5} = Ω(1/k^2), which implies that the expected runtime is O(k^2).

If X ∈ D2 (D3), then X will be transformed into C by adding label k − 1 (respectively, k) and simultaneously removing two labels from L′_3. The probability of this event is C(k−3, 2) · (1/k)^3 (1 − 1/k)^{k−3} = Ω(1/k), which implies that the expected runtime is O(k).

If X ∈ D4, then X will be transformed into C by removing a label from L′_3. The probability of this event is (k − 2) · (1/k)(1 − 1/k)^{k−1} = Ω(1), which implies that the expected runtime is O(1).

If X ∈ D5, then X will be transformed into C by simultaneously adding labels k − 1 and k. The probability of this event is (1/k)^2 (1 − 1/k)^{k−2} = Ω(1/k^2), which implies that the expected runtime is O(k^2).

If X ∈ D6 (D7), then X will be transformed into C by adding label k − 1 (respectively, k). The probability of this event is (1/k)(1 − 1/k)^{k−1} = Ω(1/k), which implies that the expected runtime is O(k).

So, a solution belonging to C will be found in expected runtime O(k^2).

In the second phase, removing any label from L′_3 contained in a solution belonging to C reduces the fitness value, and once such a label is removed it can never be added back. According to the Coupon Collector's theorem [39], the second phase ends in expected runtime O(k ln k).

Altogether, the expected runtime for the (1+1) EA starting with any initial solution to find the global optimum is O(k^2).

GSEMO is also efficient for instance G3.

Theorem 9: For instance G3, the expected runtime for GSEMO starting with any initial solution to find the global optimum is O(k^2 ln k).

Proof: We first analyze the runtime until GSEMO, starting with an arbitrary solution, finds the all-zeros solution, and then the runtime until the global optimum is found once the all-zeros solution is included in the population.

It has been proved in Theorem 3 that the expected runtime for GSEMO starting with any initial solution to find the all-zeros bit string is O(k^2 ln k).

Once the all-zeros bit string is included in the population, the Pareto optimal solution X1 with fitness vector (k − 2, 1) will be found by adding label k − 1 or k to the all-zeros bit string. The probability that GSEMO selects the all-zeros bit string from P is Ω(1/k), and the probability of flipping only a bit corresponding to one of these two labels is (2/k)(1 − 1/k)^{k−1} = Ω(1/k). So, X1 will be included in the population in expected runtime O(k^2) after the all-zeros bit string has been included. Then the global optimum X∗_3 with fitness vector (1, 2) will be found by adding the remaining label from {k − 1, k} to solution X1, and the expected runtime to produce X∗_3 from X1 is also O(k^2).

Therefore, the expected runtime for GSEMO starting with any initial solution to find the global optimum is O(k^2 ln k).


D. An instance where the (1+1) EA and GSEMO outperform the modified MVCA

In this subsection, we show that the (1+1) EA and GSEMO outperform the modified MVCA on an instance proposed by Xiong, Golden, and Wasil [17], which is denoted by G4 in this paper.

Given the bound b (b ≥ 2) on the labels' frequency, let n = b · b! + 1. We construct G4 = (V4, E4, L4) as follows, where V4 = {1, 2, . . . , n}, |V4| = n, and L4 = I_b ∪ I_{b−1} ∪ . . . ∪ I_2 ∪ I′.

We construct b! groups from V4, each containing b + 1 nodes:
V′_1 = {1, 2, . . . , b + 1},
V′_2 = {b + 1, b + 2, . . . , 2b + 1},
. . .
V′_j = {(j − 1)b + 1, (j − 1)b + 2, . . . , jb + 1},
. . .
V′_{b!} = {(b! − 1)b + 1, (b! − 1)b + 2, . . . , b!b + 1}.
In V′_j (j = 1, 2, . . . , b!), all pairs of consecutive nodes ((j − 1)b + 1, (j − 1)b + 2), . . . , (jb, jb + 1) are connected by b edges, and all these b edges are labeled by one label. Thus, b! labels are needed, which constitute the label set I′.

In each V′_j, node (j − 1)b + 1 is connected to nodes (j − 1)b + 3, . . . , jb + 1. The label subset I_h (h = b, b − 1, . . . , 2) is obtained as follows. We choose edge ((j − 1)b + 1, (j − 1)b + 1 + h) in each V′_j, so there are b! such edges. We label the first h of these edges with one label, the next h edges with a second label, and so on. So, b!/h labels are needed, and they constitute I_h.

Hence, |I_h| = b!/h (h = b, b − 1, . . . , 2), |I′| = b!, and the total number of labels is k = ∑_{j=2}^{b} b!/j + b!. The edges with the b! labels from I′ form the minimum label spanning tree, so in this instance the global optimum is X∗_4 = (0, . . . , 0, 1, . . . , 1), consisting of ∑_{j=2}^{b} b!/j zeros followed by b! ones.

Figure 5 shows an example with b = 4, where the dashed edges form the spanning tree with the minimum number of labels.
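The instance sizes used in Theorems 10 and 11 follow directly from the construction; the sketch below (helper names mine) computes n and k and confirms the inequality n > k that the proof of Theorem 10 uses.

```python
from math import factorial

def g4_label_count(b):
    """k = sum_{j=2}^{b} b!/j + b!, the number of labels in G4."""
    return sum(factorial(b) // j for j in range(2, b + 1)) + factorial(b)

def g4_node_count(b):
    """n = b * b! + 1, the number of nodes in G4."""
    return b * factorial(b) + 1

# e.g. b = 4 gives n = 97 nodes and k = 50 labels; n > k for all b >= 2
for b in range(2, 8):
    assert g4_node_count(b) > g4_label_count(b)
```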

Xiong, Golden, and Wasil used this instance to show that the modified MVCA may obtain a solution using all labels from I_b ∪ I_{b−1} ∪ . . . ∪ I_2 ∪ I′, which is an H_b-approximation solution, where H_b = ∑_{i=1}^{b} 1/i. However, the (1+1) EA can efficiently find the global optimum of instance G4.

Theorem 10: For instance G4, the (1+1) EA starting with any initial solution finds the global optimum in expected runtime O(nk), where n = b · b! + 1, k = ∑_{j=2}^{b} b!/j + b!, and b is the maximum frequency of the labels.

Proof: The optimization process consists of two independent phases. The first phase ends when the (1+1) EA finds a feasible solution, and the second phase lasts until the (1+1) EA removes all redundant labels from {1, 2, . . . , ∑_{j=2}^{b} b!/j}.

Let X be the current solution. Note that a feasible solution contains all labels from I′. If X is not a feasible solution, then there must exist a bit x_h from {x_i | ∑_{j=2}^{b} b!/j + 1 ≤ i ≤ ∑_{j=2}^{b} b!/j + b!} whose value is 0. So, the (1+1) EA can decrease the number of connected components by at least one with probability (1/k)(1 − 1/k)^{k−1} ≥ 1/(ek). This is the probability of the event that bit x_h is flipped from 0 to 1 and the other bits remain unchanged. Hence, the expected runtime to decrease the number of connected components from n to 1 is O(nk), i.e., a feasible solution will be found by the (1+1) EA in expected runtime O(nk).

Once a feasible solution X is found, each bit from {x_i | ∑_{j=2}^{b} b!/j + 1 ≤ i ≤ ∑_{j=2}^{b} b!/j + b!} takes value 1, and flipping any of them cannot be accepted by the (1+1) EA, as such a flip would create a disconnected spanning subgraph. For each bit x_i (1 ≤ i ≤ ∑_{j=2}^{b} b!/j), if x_i = 1, then it can be flipped from 1 to 0, since this decreases the fitness value; otherwise, its flip cannot be accepted by the (1+1) EA, as this would increase the fitness value. So, once every such bit has been selected to flip at least once, the global optimum will be found. According to the Coupon Collector's theorem [39], the expected runtime for this to happen is O(k ln k).
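The coupon-collector bound invoked above has the closed form E[T] = k·H_k, where H_k is the k-th harmonic number; since H_k ≤ ln k + 1, this is O(k ln k). A small check (our own illustration):

```python
import math

def coupon_collector_expectation(k):
    """Expected number of uniform random draws needed to see each of
    k coupon types at least once: E[T] = k * H_k, H_k = sum_{i=1}^k 1/i."""
    return k * sum(1.0 / i for i in range(1, k + 1))

# H_k <= ln(k) + 1, hence E[T] <= k ln k + k = O(k ln k).
assert all(
    coupon_collector_expectation(k) <= k * math.log(k) + k
    for k in (2, 10, 100, 1000)
)
```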

Hence, the expected runtime for the (1+1) EA starting with any initial solution to find the global optimum is O(nk + k ln k) = O(nk). Note that n = b · b! + 1 > k = b!(1 + 1/2 + . . . + 1/b), and k > ln k. So, n > ln k, and nk > k ln k.

GSEMO can solve instance G4 in expected runtime polynomial in k.

Theorem 11: For instance G4, GSEMO starting with any initial solution finds the global optimum in expected runtime O(k³).

Proof: The optimization process consists of two phases. The first phase lasts until GSEMO starting with any initial solution finds a solution with fitness vector (1, ∗), and the second phase ends when GSEMO finds the global optimum.

Note that a connected spanning subgraph contains all labels from I′. If X is a solution such that c(H(X)) > 1, then at least one label from I′ is not contained in it, and the number of connected components can be decreased by adding such a label.

We now analyze the expected runtime for GSEMO starting with any initial solution to find a solution with fitness vector (1, ∗). If such a solution has not yet been included in the population, then there is a solution X from P such that c(H(X)) is minimal, and adding some label l from I′ to X will reduce the number of connected components. The probability that GSEMO chooses X from P is Ω(1/k), as the population size is O(k), and the probability of flipping only the bit corresponding to label l is (1/k)(1 − 1/k)^{k−1} = Ω(1/k), so a solution with a smaller number of connected components will be found in expected runtime O(k²). After all labels from I′ have been added, a connected spanning subgraph is constructed. Thus, a solution with fitness vector (1, ∗) will be included in expected runtime O(b! · k²) = O(k³).

Now at least one solution with fitness vector (1, ∗) is included in the population. Let X be the solution with the minimum number of labels among all solutions in the population with fitness vector (1, ∗). If X is not the global optimum of G4, then GSEMO can finish the second phase by removing all redundant labels from I_b ∪ I_{b−1} ∪ . . . ∪ I_2. Since the probability of selecting X from P is Ω(1/k), and removing all redundant labels needs an expected runtime of O(k ln k), the global optimum will be found in expected runtime O(k² ln k).
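For reference, GSEMO itself is short. The sketch below is a generic minimization variant in Python (our own illustration; the function names and the toy fitness used in testing are not from the paper): keep a population of mutually non-dominated solutions, mutate a uniformly chosen member with independent bit flips of probability 1/k, and accept the child unless it is dominated.

```python
import random

def dominates(f1, f2):
    """f1 Pareto-dominates f2 (minimization): no worse in every
    objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(f1, f2)) and f1 != f2

def gsemo(fitness, k, steps, seed=0):
    """Minimal GSEMO sketch on bit strings of length k: start from a
    random solution, mutate a random population member, and keep the
    child unless it is dominated, discarding members it dominates."""
    rng = random.Random(seed)
    x = tuple(rng.randint(0, 1) for _ in range(k))
    pop = {x: fitness(x)}
    for _ in range(steps):
        parent = rng.choice(list(pop))
        child = tuple(b ^ (rng.random() < 1.0 / k) for b in parent)
        fc = fitness(child)
        if not any(dominates(f, fc) for f in pop.values()):
            pop = {y: f for y, f in pop.items() if not dominates(fc, f)}
            pop[child] = fc
    return pop  # the non-dominated solutions found so far
```

With a degenerate fitness where both objectives coincide, the population collapses to the best solutions found, so GSEMO behaves much like the (1+1) EA.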

Combining the expected runtimes of the two phases, we finish the proof.

Fig. 5. Instance G4 with b = 4.

TABLE II
UPPER BOUNDS ON THE EXPECTED RUNTIME OF THE (1+1) EA, GSEMO, ERA, LS2N, AND THE MODIFIED MVCA TO FIND THE GLOBAL OPTIMA ON FOUR INSTANCES. '—' AND 'INF' MEAN UNKNOWN AND INFINITE, RESPECTIVELY.

                     Instance G1   Instance G2   Instance G3   Instance G4
The (1+1) EA         Ω(kµk)        O(k ln k)     O(k²)         O(nk)
GSEMO                O(k² ln k)    O(k² ln k)    O(k² ln k)    O(k³)
ERA                  —             Inf           —             —
LS2N                 —             —             Inf           —
The modified MVCA    —             —             —             Inf

Note: Ω(kµk) is the lower bound on the expected runtime that the (1+1) EA starting with the given initial solution X^l_1 finds the global optimum of G1.

Table II shows that GSEMO outperforms the (1+1) EA on G1. This is mainly because GSEMO behaves greedily.

It also shows that ERA (respectively LS2N, the modified MVCA) may be trapped in the local optimum of G2 (respectively G3, G4), and thus cannot efficiently solve G2 (respectively G3, G4). However, the (1+1) EA and GSEMO can efficiently solve them.

This theoretically shows that EAs outperform local search algorithms on some instances. The reason is that EAs are global search algorithms, as they use global mutation, while a local search algorithm searches only the neighborhood of the current solution and thus may be trapped in a local optimum.

Nevertheless, local search algorithms are not always worse than EAs.

For example, the modified MVCA can efficiently solve instance G2. Recall that the modified MVCA begins with the all-zeros bit string, then at each step chooses a label whose addition yields the largest decrease in the number of connected components. Therefore, the modified MVCA first tests all labels and chooses label k, since choosing it gives the largest decrease in the number of connected components. Next, the modified MVCA tests the remaining k − 1 labels 1, 2, . . ., k − 1, and randomly chooses one of them, as the decrease in the number of connected components is the same for each of them. Hence, a global optimum of G2 is found by the modified MVCA in runtime 2k − 1, which is superior to that of the (1+1) EA and GSEMO.
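The greedy rule just described can be sketched as follows (our own illustration with a hypothetical edge-list representation; for simplicity ties are broken by `min` rather than randomly, and the input graph is assumed connected):

```python
def modified_mvca(n, edges, labels):
    """Greedy sketch of the modified MVCA on a connected graph with
    nodes 1..n and edges given as (u, v, label) triples: repeatedly
    add the label whose addition most decreases the number of
    connected components of the chosen-label subgraph."""
    def n_components(chosen):
        parent = list(range(n + 1))         # union-find over nodes
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]  # path halving
                v = parent[v]
            return v
        comps = n
        for u, v, l in edges:
            if l in chosen:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    comps -= 1
        return comps

    chosen = set()
    while n_components(chosen) > 1:
        chosen.add(min(labels - chosen,
                       key=lambda l: n_components(chosen | {l})))
    return chosen
```

On a path 1-2-3 labeled 1, an edge (3, 4) labeled 2, and an edge (1, 4) labeled 3, the greedy step first picks label 1 (largest drop in components) and then needs exactly one more label to connect node 4.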

V. CONCLUSION

In this paper, we investigate the performance of the (1+1) EA and GSEMO for the minimum label spanning tree problem. We reveal that the (1+1) EA and GSEMO can guarantee to achieve some approximation ratios for the MLST problem. This gives a nice example showing that EAs can guarantee certain approximation ratios for difficult NP-hard problems. We also theoretically show that the (1+1) EA and GSEMO defeat local search algorithms on some instances, and that GSEMO outperforms the (1+1) EA on an instance.

On the MLST_b problem, the (1+1) EA and GSEMO achieve the same approximation ratio of (b+1)/2. To find a (b+1)/2-approximation solution for the MLST_b problem, the upper bound on the expected runtime of GSEMO is larger than that of the (1+1) EA. However, we still have no idea how to obtain lower bounds on their runtime for such an approximation ratio.

As for the approximation ratio of the (1+1) EA on the MLST problem, we still know nothing about it. Apart from this, since the (1+1) EA and GSEMO are randomized algorithms, it is natural to ask whether they can achieve better approximation ratios than those guaranteed by some other algorithms.

Usually, EAs in practice use a population of individuals, and in order to retain the diversity of individuals an appropriate population size is needed, which is different from the (1+1) EA using only one individual. Thus, the performance analysis of population-based EAs is an important research topic, e.g., [19], [40], [41], [42].

How do population-based EAs such as (µ+λ) EAs perform on the MLST problem? Are they better than the (1+1) EA? Crossover is usually an important operator for population-based EAs. Several theoretical investigations have recently proven that crossover is essential or useful on some problems, e.g., [43], [44]. Therefore, it is interesting to investigate the performance of population-based EAs using not only mutation but also crossover on the MLST problem.

REFERENCES

[1] A. S. Tanenbaum, Computer Networks. Englewood Cliffs, NJ: Prentice Hall, 1989.


[2] R.-S. Chang and S.-J. Leu, “The minimum labeling spanning trees,” Inform. Process. Lett., vol. 63, no. 5, pp. 277-282, 1997.

[3] A. Chwatal, G. Raidl, and O. Dietzel, “Compressing fingerprint templates by solving an extended minimum label spanning tree problem,” in Proc. 7th Metaheuristics International Conference (MIC’07), Montreal, Canada, Jun. 2007, pp. 105/1-3.

[4] A. Chwatal, G. Raidl, and K. Oberlechner, “Solving a k-node minimum label spanning arborescence problem to compress fingerprint templates,” J. Math. Mod. Algor., vol. 8, pp. 293-334, 2009.

[5] J. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press, 1975.

[6] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. New York, NY: Addison-Wesley, 1989.

[7] F. Herrera, M. Lozano, and J. Verdegay, “Tackling real-coded genetic algorithms: Operators and tools for behavioural analysis,” Artif. Intell. Rev., vol. 12, no. 4, pp. 265-319, 1998.

[8] K. Gallagher and M. Sambridge, “Genetic algorithms: a powerful tool for large-scale non-linear optimization problems,” Comput. Geosci., vol. 20, no. 7-8, pp. 1229-1236, 1994.

[9] Y. Xiong, B. Golden, and E. Wasil, “A one-parameter genetic algorithm for the minimum labeling spanning tree problem,” IEEE Trans. Evol. Comput., vol. 9, no. 1, pp. 55-60, 2005.

[10] J. Nummela and B. Julstrom, “An effective genetic algorithm for the minimum-label spanning tree problem,” in Proc. Genetic Evol. Comput. Conf. (GECCO’06), Seattle, Washington, Jul. 2006, pp. 553-558.

[11] S. Consoli and J. Moreno-Perez, “Solving the minimum labelling spanning tree problem using hybrid local search,” Electronic Notes in Discrete Mathematics, vol. 39, pp. 75-82, 2012.

[12] A. Chwatal and G. Raidl, “Solving the minimum label spanning tree problem by mathematical programming techniques,” Vienna University of Technology, Institute of Computer Graphics and Algorithms, Tech. Rep. TR 186-1-10-03, Jun. 2010.

[13] R. Cerulli, A. Fink, M. Gentili, and S. Voß, “Metaheuristics comparison for the minimum labelling spanning tree problem,” in The Next Wave on Computing, Optimization, and Decision Technologies, B. L. Golden, S. Raghavan, and E. A. Wasil, Eds. New York: Springer, 2005, pp. 93-106.

[14] S. Consoli, K. Darby-Dowman, N. Mladenovic, and J. Moreno-Perez, “Greedy randomized adaptive search and variable neighbourhood search for the minimum labelling spanning tree problem,” Eur. J. Oper. Res., vol. 196, pp. 440-449, 2009.

[15] S. O. Krumke and H. Wirth, “On the minimum label spanning tree problem,” Inform. Process. Lett., vol. 66, no. 2, pp. 81-85, 1998.

[16] Y. Wan, G. Chen, and Y. Xu, “A note on the minimum label spanning tree,” Inform. Process. Lett., vol. 84, pp. 99-101, 2002.

[17] Y. Xiong, B. Golden, and E. Wasil, “Worst-case behavior of the MVCA heuristic for the minimum labeling spanning tree problem,” Oper. Res. Lett., vol. 33, pp. 77-80, 2005.

[18] T. Jansen and I. Wegener, “Evolutionary algorithms: how to cope with plateaus of constant fitness and when to reject strings of the same fitness,” IEEE Trans. Evol. Comput., vol. 5, no. 6, pp. 589-599, 2001.

[19] J. He and X. Yao, “Drift analysis and average time complexity of evolutionary algorithms,” Artif. Intell., vol. 127, no. 1, pp. 57-85, 2001.

[20] S. Droste, T. Jansen, and I. Wegener, “On the analysis of the (1+1) evolutionary algorithm,” Theor. Comput. Sci., vol. 276, no. 1-2, pp. 51-81, 2002.

[21] J. He and X. Yao, “Towards an analytic framework for analysing the computation time of evolutionary algorithms,” Artif. Intell., vol. 145, pp. 59-97, 2003.

[22] F. Neumann, J. Reichel, and M. Skutella, “Computing minimum cuts by randomized search heuristics,” Algorithmica, vol. 59, pp. 323-342, 2011.

[23] Y. Zhou, J. He, and Q. Nie, “A comparative runtime analysis of heuristic algorithms for satisfiability problems,” Artif. Intell., vol. 173, pp. 240-257, 2009.

[24] F. Neumann and I. Wegener, “Randomized local search, evolutionary algorithms, and the minimum spanning tree problem,” Theor. Comput. Sci., vol. 378, pp. 32-40, 2007.

[25] F. Neumann, “Expected runtimes of evolutionary algorithms for the Eulerian cycle problem,” Comput. Oper. Res., vol. 35, pp. 2750-2759, 2008.

[26] A. Sutton and F. Neumann, “A parameterized runtime analysis of evolutionary algorithms for the Euclidean traveling salesperson problem,” in Proc. 26th Conference on Artificial Intelligence (AAAI’12), 2012, pp. 1105-1111.

[27] O. Giel and I. Wegener, “Evolutionary algorithms and the maximum matching problem,” in Proc. 20th Annu. Symp. Theor. Aspects Comput. Sci. (STACS), LNCS vol. 2607, Berlin, Germany, 2003, pp. 415-426.

[28] P. Oliveto, J. He, and X. Yao, “Analysis of the (1+1)-EA for finding approximate solutions to vertex cover problems,” IEEE Trans. Evol. Comput., vol. 13, no. 5, pp. 1006-1029, 2009.

[29] T. Friedrich, J. He, N. Hebbinghaus, F. Neumann, and C. Witt, “Approximating covering problems by randomized search heuristics using multi-objective models,” Evol. Comput., vol. 18, no. 4, pp. 617-633, 2010.

[30] C. Witt, “Worst-case and average-case approximations by simple randomized search heuristics,” in Proc. 22nd Annu. Symp. Theor. Aspects Comput. Sci. (STACS), LNCS vol. 3404, Stuttgart, Germany, Feb. 2005, pp. 44-56.

[31] F. Neumann and J. Reichel, “Approximating minimum multicuts by evolutionary multi-objective algorithms,” in Proc. 10th Int. Conf. Parallel Problem Solving Nature (PPSN X), Dortmund, Germany, 2008, pp. 72-81.

[32] Y. Yu, X. Yao, and Z. Zhou, “On the approximation ability of evolutionary optimization with application to minimum set cover,” Artif. Intell., vol. 180-181, pp. 20-33, 2012.

[33] F. Neumann and I. Wegener, “Minimum spanning trees made easier via multi-objective optimization,” Natural Computing, vol. 5, no. 3, pp. 305-319, 2006.

[34] F. Neumann, “Expected runtimes of a simple evolutionary algorithm for the multi-objective minimum spanning tree problem,” Eur. J. Oper. Res., vol. 181, pp. 1620-1629, 2007.

[35] C. Qian, Y. Yu, and Z.-H. Zhou, “An analysis on recombination in multi-objective evolutionary optimization,” Artif. Intell., vol. 204, pp. 99-119, 2013.

[36] T. Bruggemann, J. Monnot, and G. Woeginger, “Local search for the minimum label spanning tree problem with bounded color classes,” Oper. Res. Lett., vol. 31, pp. 195-201, 2003.

[37] O. Giel, “Expected runtimes of a simple multi-objective evolutionary algorithm,” in Proc. IEEE Congr. Evol. Comput., 2003, pp. 1918-1925.

[38] M. Laumanns, L. Thiele, and E. Zitzler, “Running time analysis of multiobjective evolutionary algorithms on pseudo-boolean functions,” IEEE Trans. Evol. Comput., vol. 8, no. 2, pp. 170-182, 2004.

[39] M. Mitzenmacher and E. Upfal, Probability and Computing. Cambridge: Cambridge University Press, 2005.

[40] T. Jansen, K. A. D. Jong, and I. Wegener, “On the choice of the offspring population size in evolutionary algorithms,” Evol. Comput., vol. 13, no. 4, pp. 413-440, 2005.

[41] T. Chen, J. He, G. Sun, G. Chen, and X. Yao, “A new approach to analyzing average time complexity of population-based evolutionary algorithms on unimodal problems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 5, pp. 1092-1106, 2009.

[42] T. Chen, K. Tang, G. Chen, and X. Yao, “A large population size can be unhelpful in evolutionary algorithms,” Theor. Comput. Sci., vol. 436, no. 8, pp. 54-70, 2012.

[43] T. Jansen and I. Wegener, “Real royal road functions—where crossover provably is essential,” Discrete Appl. Math., vol. 149, pp. 111-125, 2005.

[44] B. Doerr, E. Happ, and C. Klein, “Crossover can provably be useful in evolutionary computation,” Theor. Comput. Sci., vol. 425, pp. 17-33, 2012.

Xinsheng Lai received the M.Sc. degree in Computer Applied Technology from Guizhou University, Guiyang, China, in 2004, and he is now working towards the Ph.D. degree at South China University of Technology, Guangzhou, China. His main research interests include evolutionary computation, neural computation, and their applications in the real world.


Yuren Zhou received the B.Sc. degree in Mathematics from Peking University, Beijing, China, in 1988, the M.Sc. degree in Mathematics from Wuhan University, Wuhan, China, in 1991, and the Ph.D. degree in Computer Science from Wuhan University in 2003. He is currently a professor in the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. His current research interests are focused on design and analysis of algorithms, evolutionary computation, and data mining.

Jun He (M’06) received the B.S. and M.Sc. degrees in mathematics and the Ph.D. degree in computer science, all from Wuhan University, Wuhan, China, in 1989, 1992, and 1995, respectively.

He is currently a Senior Lecturer at Aberystwyth University, United Kingdom. His major research interests include evolutionary computation, global optimization, and network security. He has published over 80 papers in these areas.

Jun Zhang (M’02-SM’08) received the Ph.D. degree in Electrical Engineering from the City University of Hong Kong, Kowloon, Hong Kong, in 2002. From 2003 to 2004, he was a Brain Korea 21 Post-Doctoral Fellow with the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Daejeon, Korea. Since 2004, he has been with Sun Yat-Sen University, Guangzhou, China, where he is currently a Cheung Kong Professor with the School of Advanced Computing. He has authored seven research books and book chapters, and over 100 technical papers in his research areas. His current research interests include computational intelligence, cloud computing, high performance computing, data mining, wireless sensor networks, operations research, and power electronic circuits.

Dr. Zhang was a recipient of the China National Funds for Distinguished Young Scientists from the National Natural Science Foundation of China in 2011 and the First-Grade Award in Natural Science Research from the Ministry of Education, China, in 2009. He is currently an Associate Editor of the IEEE Transactions on Evolutionary Computation, the IEEE Transactions on Industrial Electronics, the IEEE Transactions on Cybernetics, and the IEEE Computational Intelligence Magazine. He is the Founding and Current Chair of the IEEE Guangzhou Subsection and the IEEE Beijing (Guangzhou) Section Computational Intelligence Society Chapters. He is the Founding and Current Chair of the ACM Guangzhou Chapter.