
Large Neighborhood Search with Quality Guarantees for Distributed Constraint Optimization Problems

Ferdinando Fioretto1,2, Federico Campeotto2, Agostino Dovier2, Enrico Pontelli1, and William Yeoh1

1 Department of Computer Science, New Mexico State University
2 Department of Mathematics & Computer Science, University of Udine

{ffiorett,wyeoh,epontell}@cs.nmsu.edu, {federico.campeotto,agostino.dovier}@uniud.it

Abstract. The field of Distributed Constraint Optimization has gained momentum in recent years, thanks to its ability to address various applications related to multi-agent cooperation. Nevertheless, solving Distributed Constraint Optimization Problems (DCOPs) optimally is NP-hard. Therefore, in large-scale applications, incomplete DCOP algorithms are desirable. Current incomplete search techniques have subsets of the following limitations: (a) they find local minima without quality guarantees; (b) they provide loose quality assessment; or (c) they cannot exploit certain problem structures such as hard constraints. Therefore, capitalizing on strategies from the centralized constraint reasoning community, we propose to adapt the Large Neighborhood Search (LNS) strategy to solve DCOPs, resulting in the general Distributed LNS (D-LNS) framework. The characteristics of this framework are as follows: (i) it is anytime; (ii) it provides quality guarantees by refining online upper and lower bounds on its solution quality; and (iii) it can learn online the best neighborhood to explore. Experimental results show that D-LNS outperforms other incomplete DCOP algorithms for both random and scale-free network instances.

1 Introduction

In a distributed constraint optimization problem (DCOP), agents coordinate their value assignments to maximize the sum of resulting constraint utilities [17, 29]. Researchers have used them to model various multi-agent coordination and resource allocation problems [14, 30].

Complete DCOP algorithms [17, 21] find optimal DCOP solutions at the cost of a large runtime or network load, while incomplete approaches trade off optimality for lower usage of resources. Since finding optimal DCOP solutions is NP-hard, incomplete algorithms are often necessary to solve large problems of interest. Unfortunately, several local search algorithms (e.g., DBA [12], MGM [15]) and local inference algorithms (e.g., Max-Sum [6]) do not provide any guarantees on the quality of the solutions found. More recent developments, such as region-optimal algorithms [20, 27], Bounded Max-Sum [23], and Divide-and-Conquer (DaC) algorithms [26, 11], alleviate this limitation. Region-optimal algorithms allow one to specify regions with a maximum size of k agents or t hops from each agent, and they optimally solve the subproblem within each region.


Solution quality bounds are provided as a function of k or t. Bounded Max-Sum is an extension of Max-Sum, which optimally solves an acyclic version of the DCOP graph, bounding its solution quality as a function of the edges removed from the cyclic graph. Finally, DaC-based algorithms use Lagrangian decomposition techniques to solve agent subproblems sub-optimally and to provide quality-bounded solutions. Researchers have also developed extensions to complete algorithms that trade off solution quality and runtime. For example, complete search algorithms have mechanisms that allow users to specify absolute or relative error bounds [28, 22]. Another class of DCOP algorithms that provides quality guarantees is that of sampling-based algorithms [18], which provide probabilistic bounds as a function of the number of samples. Researchers have also used distributed learning algorithms to solve Markovian Dynamic DCOPs (MD-DCOPs), a hybrid MDP and DCOP model, where the agents use distributed Q-learning to choose actions that maximize their cumulative average reward [19].

Large Neighborhood Search (LNS) [25] is a centralized local search algorithm that is very effective in solving large centralized optimization problems. It alternates two phases: a destroy phase, which selects a subset of variables to freeze and assigns them values from the previous iteration, and a repair phase, in which it determines an improved solution by searching over the values of non-frozen variables (i.e., a large neighborhood).

Good quality assessment is essential for sub-optimal DCOP algorithms. However, current incomplete DCOP approaches can provide arbitrarily poor quality assessments (as illustrated in our experimental results). In addition, they are unable to exploit the problem structure provided by the presence of hard constraints. In this paper, we address these limitations by introducing the Distributed LNS (D-LNS) framework, which builds on the strengths of LNS to solve DCOPs. This work advances the state of the art in DCOP resolution: (1) We provide a novel distributed local search framework for DCOPs, based on LNS, which learns online the neighborhood to search at each iteration; (2) We introduce two novel distributed search algorithms, D-BLNS and D-RLNS, characterized by low network usage and low computational complexity per agent; and (3) Given our D-LNS framework, we show that, compared to existing incomplete DCOP algorithms, D-BLNS converges to better solutions, converges to them faster, provides tighter bounds, and scales better to large problems.

2 Background: DCOP

A DCOP is defined by a tuple ⟨X, D, F, A, α⟩, where X = {x_1, . . . , x_n} is a set of variables, D = {D_1, . . . , D_n} is a set of finite domains (i.e., x_i ∈ D_i), F = {f_1, . . . , f_e} is a set of utility functions (also called constraints), where f_i : ∏_{x_j ∈ scope(f_i)} D_j → N ∪ {−∞} and scope(f_i) ⊆ X is the set of the variables relevant to f_i, A = {a_1, . . . , a_p} is a set of agents, and α : X → A is a function that maps each variable to one agent. Following common conventions, we restrict our attention to binary utility functions and assume that α is a bijection: each agent controls exactly one variable. We will use the terms "variable" and "agent" interchangeably and assume that α(x_i) = a_i. Each f_j specifies the utility of each combination of values of the variables in scope(f_j).
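To make the definition concrete, the following is a minimal sketch (our illustration, not part of the paper; all identifiers are hypothetical) of one way to represent binary utility functions in Python and to evaluate the utility of an assignment, with −∞ encoding hard-constraint violations:

from math import inf

# Hypothetical binary utility table f(x1, x2) over domains {0, 1};
# the pair (1, 1) is forbidden by a hard constraint (utility -inf).
f_12 = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): -inf}

def utility(functions, assignment):
    """Sum the utilities of all functions whose scope is fully assigned.
    `functions` maps scope tuples such as ("x1", "x2") to utility tables;
    `assignment` maps variable names to values."""
    total = 0
    for scope, table in functions.items():
        if all(v in assignment for v in scope):
            total += table[tuple(assignment[v] for v in scope)]
    return total

print(utility({("x1", "x2"): f_12}, {"x1": 0, "x2": 0}))  # prints 10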


Fig. 1: Example DCOP. (a) Constraint Graph; (b) Pseudo-tree; (c) Constraints.

A solution σ is a value assignment for a set X_σ ⊆ X of variables that is consistent with their respective domains. The utility F(σ) = ∑_{f ∈ F, scope(f) ⊆ X_σ} f(σ) is the sum of the utilities across all the applicable utility functions in σ (with a slight abuse of notation, we use F to denote both the set of constraints and the overall utility function of the DCOP). A solution σ is complete if X_σ = X. We will denote by x a complete solution, while x_i is the value of x_i in x. The goal is to find an optimal complete solution x* = argmax_x F(x).

Given a DCOP P, G_P = (X, E_F) is the constraint graph of P, where {x, y} ∈ E_F iff ∃f_i ∈ F s.t. {x, y} ⊆ scope(f_i). A DFS pseudo-tree arrangement for G_P is a spanning tree T = ⟨X, E_T⟩ of G_P s.t. if f_i ∈ F and {x, y} ⊆ scope(f_i), then x and y appear in the same branch of T. Edges of G_P that are in (resp. out of) E_T are called tree edges (resp. backedges). Tree edges connect parent-child nodes, while backedges connect a node with its pseudo-parents and its pseudo-children. We use N(a_i) = {a_j ∈ A | {x_i, x_j} ∈ E_F} to denote the neighbors of agent a_i.

Figure 1(a) shows the constraint graph of a DCOP with four agents a_1, . . . , a_4, each controlling one variable with domain {0, 1}. Figure 1(b) shows one possible pseudo-tree (solid lines are tree edges and dotted lines are backedges). Each of the five edges is associated with one utility function, labeled fc or fn. Figure 1(c) shows their utilities.

When pseudo-trees are used within iterative algorithms, T^1 denotes the initial pseudo-tree and T^k denotes the pseudo-tree at the k-th iteration.

3 D-LNS Framework

In this section, we introduce D-LNS, a general distributed LNS framework to solve DCOPs that allows (1) different strategies to choose the neighborhood as well as (2) different strategies to search over that neighborhood. We propose two learning-based approaches to choose neighborhoods and two search strategies to search over the neighborhoods. Our D-LNS solutions take into account factors such as network loads (i.e., number and size of messages exchanged by agents) as well as the restriction that each agent is only aware of its local subproblem (i.e., its neighbors and the constraints whose scope includes its variables).

Algorithm 1: D-LNS
 1:  k ← 1
 2:  ⟨x_i^1, LB_i^1, UB_i^1, Q_i^1, Q_i^2⟩ ← VALUE-INITIALIZATION()
 3:  while termination condition is not met do
 4:      k ← k + 1
 5:      Q_i^k ← UPDATE-RL(Q_i^{k-1})
 6:      z_i^k ← DESTROY-ALGORITHM(Q_i^k)
 7:      if z_i^k = fr then x_i^k ← x_i^{k-1}
 8:      else x_i^k ← NULL
 9:      ⟨x_i^k, LB_i^k, UB_i^k⟩ ← REPAIR-ALGORITHM()
10:      if ¬Accept(x_i^k, x_i^{k-1}) then x_i^k ← x_i^{k-1}

Algorithm 1 describes the general structure of the D-LNS framework, which is executed by each agent a_i ∈ A. Agent a_i first initializes its counter k (line 1) and calls the VALUE-INITIALIZATION function to initialize its current local value assignment x_i^1, its current lower and upper bounds LB_i^1 and UB_i^1 on the optimal utility, and its Q-values Q_i^1 and Q_i^2, which are used to learn good neighborhoods (see Sections 3.1 and 3.2). In the rest of the algorithm (lines 3-10), we iterate through the destroy and repair phases as in LNS, until a termination condition occurs (line 3), such as reaching a maximum value of k, a timeout limit, or a threshold on the confidence error of the reported best solution.
• DESTROY PHASE: This phase aims at destroying part of the current solution by freezing some variables (i.e., maintaining their values from the previous iteration) and enabling the optimization over the non-frozen variables (the large neighborhood). Locally, agent a_i calls the function DESTROY-ALGORITHM to determine whether its local variable x_i should be frozen or not, as indicated by the flag z_i^k ∈ {fr, ¬fr} (line 6). This decision is made through Q-values that are updated according to the reinforcement learning function UPDATE-RL (line 5). D-LNS allows the agent to use any learning algorithm; we discuss two distributed Q-learning processes in Sect. 3.1. Once the frozen variables are selected, the agent assigns them the values from the previous iteration and resets the values of the other variables (lines 7-8).
• REPAIR PHASE: In this phase, the agent seeks a new value assignment for the non-frozen variables using the REPAIR-ALGORITHM function (line 9). The generality of D-LNS allows the agent to use any algorithm to solve this problem; we introduce two distributed algorithms to do so in Sect. 3.2. Once the agent finds and evaluates a new solution, it either accepts or rejects it (line 10).
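The per-agent control flow of Algorithm 1 can be summarized by the following sketch (ours, not the authors' implementation). The subroutines correspond to VALUE-INITIALIZATION, UPDATE-RL, DESTROY-ALGORITHM, REPAIR-ALGORITHM, and Accept, and are passed in as callables since the paper leaves their realizations open; the two Q-value structures of line 2 are collapsed into a single q for brevity.

def d_lns(value_initialization, update_rl, destroy_algorithm,
          repair_algorithm, accept, max_iterations=100):
    """Sketch of the D-LNS loop executed by each agent a_i (Algorithm 1)."""
    k = 1
    x, lb, ub, q = value_initialization()            # line 2
    while k < max_iterations:                        # line 3: termination condition
        k += 1                                       # line 4
        q = update_rl(q)                             # line 5
        z = destroy_algorithm(q)                     # line 6: z in {"fr", "not_fr"}
        x_prev = x
        x = x_prev if z == "fr" else None            # lines 7-8: freeze or reset
        x, lb, ub = repair_algorithm(x)              # line 9
        if not accept(x, x_prev):                    # line 10: e.g., reject hard-constraint violations
            x = x_prev
    return x, lb, ub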

While most of the current local search DCOP algorithms fail to guarantee the consistency of the solution returned with respect to the hard constraints of the problem [20, 12, 18], D-LNS naturally accommodates the consistency check during the repair phase. In our two distributed algorithms, the agent accepts the solution if it does not violate any hard constraints, that is, if its utility is not −∞.

3.1 Destroy Phase

The destroy phase determines a set of variables to include in the large neighborhood, such that the repair algorithm can find good solutions quickly. This process can be challenging without taking into account the structural properties of the problem at hand.


We propose to use a model-free distributed reinforcement learning approach called Distributed Q-Learning. Q-Learning makes no assumption on the model; its convergence to an optimal policy is guaranteed for an arbitrary exploration strategy; and, crucially, the optimal policy can be learned in distributed, deterministic environments. Let us first provide some background on Distributed Q-Learning [16].

Distributed Q-Learning is used to find an optimal action-choice policy in multi-agent Markov Decision Processes (MDPs) by analyzing the history of the action-reward pairs obtained by the agent. It works by learning an action-value function Q : S × A → R, where S is a finite set of states and A is a finite set of executable actions (e.g., possible choices of large neighborhoods). In such a problem, an optimal policy π* : S → A can be decomposed into n component policies π*_i : S → A_i, with A_i being the set of actions executable by agent a_i and π*(s) = (π*_1(s), . . . , π*_n(s)). Distributed Q-learning for multi-agent deterministic MDPs is described by the following iterative rule:

    Q_i^{k+1}(s, z_i) = max{ Q_i^k(s, z_i),  R_i(s, z_i) + λ · max_{z'_i ∈ A_i} Q_i^k(s', z'_i) }        (1)

where Q_i^1(s, z_i) = 0, R_i : S × A_i → R is a reward function of agent a_i that determines the reward of performing action z_i in state s, and s' is the resulting state after executing action z_i in state s, as described by the transition function T_i : S × A_i → S. The idea is that the results of the selected transitions are accumulated into Q_i using successive maximizations.

One can map the problem of selecting the large neighborhood to a deterministic multi-agent MDP, where: S is the set of all possible complete solutions; for a state s, action z_i of agent a_i, and iteration k, A_i is the set {fr, ¬fr}; the reward computed at iteration k is R_i(s, z_i) = F(x^k), with x^k the complete solution at that iteration; and T_i(s, z_i) is the next solution found by the repair algorithm. The goal is to find a mapping π*_i : S → A_i that determines the action for each agent at each state, which corresponds to the decision of the agent to freeze its variable or not based on the complete solution in the previous iteration.
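A minimal sketch (an assumption of ours, not the authors' code) of the update of Equation (1) for this neighborhood-selection MDP, where the two actions are fr and ¬fr, states are complete solutions encoded as tuples, and the discount factor lam is an arbitrary illustrative value:

from collections import defaultdict

ACTIONS = ("fr", "not_fr")

def update_q(q, s, z, reward, s_next, lam=0.9):
    """Apply Q(s, z) <- max{ Q(s, z), R(s, z) + lam * max_{z'} Q(s', z') }.
    q maps (state, action) pairs to values and defaults to 0 (Q^1 = 0)."""
    best_next = max(q[(s_next, z2)] for z2 in ACTIONS)
    q[(s, z)] = max(q[(s, z)], reward + lam * best_next)
    return q

# Hypothetical usage: the reward is the utility F(x^k) of the complete solution
# produced by the repair phase, and s_next is that solution.
q = defaultdict(float)
q = update_q(q, s=(0, 0, 0, 0), z="fr", reward=30.0, s_next=(0, 0, 1, 0))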

To update their Q-values based on Equation (1), each agent needs to know the complete solutions of states s and s'. Thus, we allow the agents to broadcast their local assignments to the other agents so that all agents can construct the complete solution. While allowing agents to broadcast is not uncommon in DCOP algorithms [8], it may be a shortcoming for applications where privacy is a concern.

We also introduce a second learning approach, called Distributed Local Q-Learning, that preserves privacy by considering only local states. In this approach, S = ∏_{a_i ∈ A} S_i, where the set of local states S_i = ∏_{a_j ∈ N(a_i)} D_j × D_i is the set of local solutions. For a local state s_i, action z_i of agent a_i, and iteration k, R_i(s_i, z_i) = ∑_{f ∈ F | x_i ∈ scope(f)} f(x^k) is the utility of the local solution in iteration k, and T_i(s_i, z_i) is the next local solution found by the repair algorithm. Although convergence guarantees are not available, our experimental results in Sect. 5 show that this approach works as well as the (broadcasting) Distributed Q-Learning approach.
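Under the same assumptions as the sketch above, the only change for Distributed Local Q-Learning is the reward, which an agent can compute from its local subproblem alone:

def local_reward(agent_var, functions, assignment):
    """Sum of the utilities of the constraints whose scope includes the agent's
    own variable, evaluated on the current complete solution x^k.
    `functions` maps scope tuples to utility tables (as in the earlier sketch)."""
    return sum(table[tuple(assignment[v] for v in scope)]
               for scope, table in functions.items()
               if agent_var in scope)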


3.2 Repair Phase

The goal of the repair phase is to find a new improved solution by searching over a large neighborhood. We developed two distributed algorithms, Distributed Bounded LNS and Distributed Random LNS, where the former provides an online error bound and the latter does not.

Distributed Bounded LNS  The Distributed Bounded LNS (D-BLNS) algorithm attempts to iteratively refine the lower and upper bounds of the DCOP solution. In each iteration k, it executes the following phases.
• RELAXATION PHASE: Given a DCOP P, this phase computes two relaxations⁴ of P, denoted P̌^k and P̂^k, which are used to compute, respectively, a lower and an upper bound on the optimal utility for P at each iteration k of the algorithm. Let LN^k be the set of large-neighborhood variables (i.e., non-frozen variables) in iteration k. Both problem relaxations are solved using the same underlying pseudo-tree structure T^k = ⟨LN^k, E_{T^k}⟩, computed from the subgraph of G_P restricted to the nodes in LN^k, i.e., ⟨LN^k, E^k⟩ where E^k = {{x, y} | {x, y} ∈ E_F; x, y ∈ LN^k}. Note that this process may create a forest, which does not affect the correctness of the algorithm.⁵ In particular, the heuristic used for building T^k prefers the inclusion of edges of G_P that have not been used in T^1, . . . , T^{k-1}. In the relaxed DCOP P̌^k, we wish to find a solution x̌^k using⁶

    x̌^k = argmax_x F̌^k(x) = argmax_x [ ∑_{f ∈ E_{T^k}} f(x_i, x_j) + ∑_{x_i ∈ LN^k, x_j ∉ LN^k} f(x_i, x̌_j^{k-1}) ]        (2)

  where x̌_j^{k-1} is the value assigned to variable x_j for problem P̌^{k-1} in the previous iteration. The first summation is over all functions listed as tree edges in T^k, while the second is over all functions between a non-frozen variable and a frozen variable. Thus, solving problem P̌^k means optimizing over all the non-frozen variables under the assumption that the frozen variables take on their previous values, ignoring backedges. This solution is used to compute lower bounds during the bounding phase. In the problem P̂^k, we wish to find a solution x̂^k using

    x̂^k = argmax_x F̂^k(x) = argmax_x ∑_{f ∈ F} f̂^k(x_i, x_j)        (3)

  where f̂^k(x_i, x_j) is defined as:

    f̂^k(x_i, x_j) = max_{d_i ∈ D_i, d_j ∈ D_j} f(d_i, d_j)                                      if Γ_f^k = ∅
    f̂^k(x_i, x_j) = max{ f(x_i, x_j), max_{k' ∈ Γ_f^{k-1}} f(x̂_i^{k'}, x̂_j^{k'}) }             otherwise

⁴ I.e., defined on a relaxed version of the constraint graph G_P.
⁵ One can execute the algorithm in each tree of the forest.
⁶ We identify each edge {x, y} of the tree with the constraint f(x, y) that generated it.


where Γ_f^k is the set of past iteration indices for which the function f was a tree edge in the pseudo-tree. Specifically,

    Γ_f^k = {k' | f ∈ E_{T^{k'}} ∧ 1 ≤ k' ≤ k}        (4)

Therefore, the utility F̂^k(x̂^k) is composed of two parts. The first part involves all functions that have never been part of a pseudo-tree up to the current iteration, and the second part involves all the remaining functions. The utility of each function in the first part is the maximal utility over all possible pairs of value combinations of the variables in the scope of that function, and the utility of each function in the second part is the largest utility over all pairs of value combinations in previous solutions. In summary, solving P̂^k means finding the solution x̂^k that maximizes F̂^k(x̂^k). This solution is used to compute upper bounds (during the bounding phase).

• SOLVING PHASE: Next, D-BLNS solves the relaxed DCOPs P̌^k and P̂^k using Equations (2) and (3). At a high level, D-BLNS is an inference-based algorithm, similar to DPOP [21], where each agent, starting from the leaves of the pseudo-tree, projects out its own variable and sends its projected utilities to its parent. These utilities are propagated up the pseudo-tree until they reach the root. The hard constraints of the problem are naturally handled in this phase, as each agent prunes all inconsistent values before sending a message to its parent. Once the root receives the utilities from each of its children, it selects the value that maximizes its utility and sends it down to its children, which repeat the same process. These values are propagated down the pseudo-tree until they reach the leaves, at which point the problem is solved. D-BLNS solves the two relaxed DCOPs in parallel. In the utility propagation phase, each agent computes two sets of utilities, one for each relaxed problem, and sends them to its parent. In the value propagation phase, each agent selects two values, one for each relaxed problem, and sends them to its children.
• BOUNDING PHASE: Once the relaxed problems are solved, D-BLNS computes the lower and upper bounds based on the solutions x̌^k and x̂^k. As we show in Sect. 4, F(x̌^k) ≤ F(x*) ≤ F̂^k(x̂^k) (Theorems 1 and 2). Therefore, ρ = min_k F̂^k(x̂^k) / max_k F(x̌^k) is a guaranteed approximation ratio for P (a small bookkeeping sketch follows below).
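As a bookkeeping illustration of the bounding phase (our sketch, under the assumption that each iteration yields one lower bound F(x̌^k) and one upper bound F̂^k(x̂^k)), the best bounds and the ratio ρ can be maintained as follows:

class BoundTracker:
    """Keeps max_k F(x_check^k), min_k F_hat^k(x_hat^k), and their ratio rho."""
    def __init__(self):
        self.best_lb = float("-inf")   # best lower bound seen so far
        self.best_ub = float("inf")    # best upper bound seen so far

    def update(self, lb_k, ub_k):
        self.best_lb = max(self.best_lb, lb_k)
        self.best_ub = min(self.best_ub, ub_k)
        return self.best_ub / self.best_lb   # guaranteed approximation ratio rho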

Example Trace: Figure 2 presents a running example illustrating the behavior of D-BLNS during the first three LNS iterations. The example uses the DCOP of Figure 1. The pseudo-trees T^1, T^2, and T^3 are represented by bold lines (solid for tree edges and dotted for backedges). The frozen variables in each LN^k are shaded gray, and the constraints they participate in are represented by thin dotted lines. Nodes representing non-frozen variables are labeled with red values representing x_i^k; nodes representing frozen variables are labeled with black values representing x_i^{k-1}. Each tree edge is labeled with a black bold value representing the utility f(x_i^k, x_j^k) of the function corresponding to that edge. Each backedge is labeled with a pair of values representing the utilities f̂^k(x_i^k, x_j^k) (top, in parentheses) and f(x_i^k, x_j^k) (bottom) of the corresponding functions. The lower and upper bounds of each iteration are shown below the pseudo-tree.

Fig. 2: Distributed Bounded LNS Example Trace

In the first iteration (k=1), the algorithm solves P̌^1, finding solution x̌^1 with utility F̌^1(x̌^1) = f(x_1^1, x_2^1) + f(x_2^1, x_3^1) + f(x_2^1, x_4^1) = 10 + 10 + 10 = 30. It evaluates this solution with the true utility function F(x̌^1) = F̌^1(x̌^1) + f(x_1^1, x_3^1) + f(x_1^1, x_4^1) = 30 + 0 + 0 = 30 to get the lower bound. The algorithm also solves P̂^1 and finds a solution x̂^1 with utility F̂^1(x̂^1) = f̂^1(x_1^1, x_2^1) + f̂^1(x_2^1, x_3^1) + f̂^1(x_2^1, x_4^1) + f̂^1(x_1^1, x_3^1) + f̂^1(x_1^1, x_4^1) = 10 + 10 + 10 + 10 + 10 = 50, which is the upper bound.

In the second iteration (k=2), the algorithm freezes x_4 and assigns it its value from the previous iteration, x_4^2 = x_4^1 = 0. Then, it builds the pseudo-tree with the remaining variables and chooses f(x_1, x_3) as a tree edge, as it has never been chosen as a tree edge before. Solving P̌^2 yields solution x̌^2 with utility F̌^2(x̌^2) = f(x_1^2, x_3^2) + f(x_2^2, x_3^2) + f(x_1^2, x_4^2) + f(x_2^2, x_4^2) = 6 + 9 + 0 + 10 = 25, which results in a lower bound F(x̌^2) = F̌^2(x̌^2) + f(x_1^2, x_2^2) = 25 + 10 = 35. Solving P̂^2 yields solution x̂^2 with utility F̂^2(x̂^2) = f̂^2(x_1^2, x_2^2) + f̂^2(x_2^2, x_3^2) + f̂^2(x_2^2, x_4^2) + f̂^2(x_1^2, x_3^2) + f̂^2(x_1^2, x_4^2) = 10 + 10 + 10 + 6 + 10 = 46, which is the current upper bound.

Finally, in the third iteration (k=3), the algorithm freezes x_3 and assigns it its value from the previous iteration, x_3^3 = x_3^2 = 1, and builds the new pseudo-tree with the remaining variables. Solving P̌^3 and P̂^3 yields solutions x̌^3 and x̂^3, respectively, with utilities F̌^3(x̌^3) = 6 + 9 + 9 + 6 = 30, which results in a lower bound F(x̌^3) = 30 + 10 = 40, and an upper bound F̂^3(x̂^3) = 10 + 10 + 10 + 6 + 6 = 42.
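Using only the bounds reported in this trace, the guaranteed approximation ratio of Corollary 1 after the third iteration would be ρ = min_k F̂^k(x̂^k) / max_k F(x̌^k) = min{50, 46, 42} / max{30, 35, 40} = 42/40 = 1.05.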

Distributed Random LNS  In complex application domains with a large number of variables and/or large domain sizes, D-BLNS may be computationally inefficient. To cope with this problem, we propose Distributed Random LNS (D-RLNS), an alternative approach relying on the use of random value selections to reduce computation and communication costs. This algorithm is similar to D-BLNS except for the solving and bounding phases.

During the solving phase, D-RLNS starts by having each leaf agent randomly sample S values from its domain and send them to its parent. Each agent, upon receiving the values from all of its children, performs the same operations, aggregating the sum of the utilities in its subtree while considering only tree edges and edges with frozen variables. These utilities are propagated up the pseudo-tree until they reach the root. Once the root agent has received the utilities from each of its children, it chooses the value (from among the S samples) that maximizes its utility and sends this value down to its children, which, in turn, repeat the same process. Finally, at each iteration, the D-RLNS bounding phase computes only the lower bound of the current solution.
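A rough sketch of the per-agent sampling step of the D-RLNS solving phase (our assumption of one possible realization, not the authors' implementation):

import random

def sample_candidates(domain, local_utility, S=5):
    """Draw S candidate values for the agent's variable and pair each with the
    utility the agent can evaluate locally (its tree-edge constraints and its
    constraints toward frozen neighbors); `local_utility` is assumed to compute
    that sum for a given value. The (value, utility) pairs would then be sent to
    the parent, which aggregates them with its own samples up the pseudo-tree."""
    values = [random.choice(domain) for _ in range(S)]
    return [(v, local_utility(v)) for v in values]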

4 Theoretical Properties

Theorem 1. For each large neighborhood LN^k, F(x̌^k) ≤ F(x*).

PROOF (SKETCH). The result trivially follows from the fact that x̌^k is an optimal solution of the relaxed problem P̌^k, whose functions are a subset of F. □

Lemma 1. For each iteration k,

    ∑_{f ∈ E_{T^k}} f(x̂_i^k, x̂_j^k) ≥ ∑_{f ∈ E_{T^k}} f(x*_i, x*_j)

where x̂_i^k is the value assigned to variable x_i when solving the relaxed DCOP P̂^k and x*_i is the value assigned to variable x_i when solving the original DCOP P.

PROOF (SKETCH). This lemma follows from the fact that, in each iteration k, the functions associated with the tree edges in T^k are solved optimally. □

Lemma 2. For each iteration k,

    ∑_{f ∈ Θ^k} f̂^k(x̂_i^k, x̂_j^k) ≥ ∑_{f ∈ Θ^k} f(x*_i, x*_j)

where Θ^k = {f ∈ F | Γ_f^k ≠ ∅} is the set of functions that have been chosen as tree edges of a pseudo-tree in a previous iteration.

PROOF (SKETCH). We prove the claim by induction on the iteration k. For k = 0, Θ^0 = ∅ and, thus, the statement vacuously holds. Assume the claim holds up to iteration k−1. For iteration k it follows that

    ∑_{f ∈ Θ^k} f̂^k(x̂_i^k, x̂_j^k)
      = ∑_{f ∈ Θ^k \ Θ^{k-1}} f̂^k(x̂_i^k, x̂_j^k) + ∑_{f ∈ Θ^{k-1}} f̂^k(x̂_i^k, x̂_j^k)
      = ∑_{f ∈ Θ^k \ Θ^{k-1}} f̂^k(x̂_i^k, x̂_j^k) + ∑_{f ∈ Θ^{k-1} ∩ E_{T^k}} f̂^k(x̂_i^k, x̂_j^k) + ∑_{f ∈ Θ^{k-1} \ E_{T^k}} f̂^k(x̂_i^k, x̂_j^k)
      = ∑_{f ∈ Θ^k \ Θ^{k-1}} f̂^k(x̂_i^k, x̂_j^k) + ∑_{f ∈ Θ^{k-1} ∩ E_{T^k}} max{ f(x̂_i^k, x̂_j^k), f̂^{k-1}(x̂_i^{k-1}, x̂_j^{k-1}) } + ∑_{f ∈ Θ^{k-1} \ E_{T^k}} f̂^k(x̂_i^k, x̂_j^k)
            (by definition of f̂^k)
      ≥ ∑_{f ∈ E_{T^k}} f̂^k(x̂_i^k, x̂_j^k) + ∑_{f ∈ Θ^{k-1}} f̂^{k-1}(x̂_i^{k-1}, x̂_j^{k-1})
            − ∑_{f ∈ E_{T^k} ∩ Θ^{k-1}} max{ f(x̂_i^k, x̂_j^k), f̂^{k-1}(x̂_i^{k-1}, x̂_j^{k-1}), f(x*_i, x*_j) }
      ≥ ∑_{f ∈ E_{T^k}} f(x*_i, x*_j) + ∑_{f ∈ Θ^{k-1}} f(x*_i, x*_j)
            − ∑_{f ∈ E_{T^k} ∩ Θ^{k-1}} max{ f(x̂_i^k, x̂_j^k), f̂^{k-1}(x̂_i^{k-1}, x̂_j^{k-1}), f(x*_i, x*_j) }
            (by Lemma 1 and the induction assumption)
      ≥ ∑_{f ∈ Θ^k \ Θ^{k-1}} f(x*_i, x*_j) + ∑_{f ∈ Θ^{k-1}} f(x*_i, x*_j)
            (since Θ^k \ Θ^{k-1} = E_{T^k} \ (E_{T^k} ∩ Θ^{k-1}))
      = ∑_{f ∈ Θ^k} f(x*_i, x*_j)

which concludes the proof. □

Theorem 2. For each large neighborhood LN^k, F̂^k(x̂^k) ≥ F(x*).

PROOF (SKETCH).

    F̂^k(x̂^k) = ∑_{f ∈ F} f̂^k(x̂_i^k, x̂_j^k)
              = ∑_{f ∈ Θ^k} f̂^k(x̂_i^k, x̂_j^k) + ∑_{f ∉ Θ^k} f̂^k(x̂_i^k, x̂_j^k)
              = ∑_{f ∈ Θ^k} f̂^k(x̂_i^k, x̂_j^k) + ∑_{f ∉ Θ^k} max_{d_i, d_j} f(d_i, d_j)        (by definition of f̂^k)
              ≥ ∑_{f ∈ Θ^k} f(x*_i, x*_j) + ∑_{f ∉ Θ^k} f(x*_i, x*_j)                            (by Lemma 2)
              = F(x*)

which concludes the proof. □

Corollary 1. An approximation ratio for the problem is

    ρ = min_k F̂^k(x̂^k) / max_k F(x̌^k) ≥ F(x*) / max_k F(x̌^k)

PROOF (SKETCH). This result follows from max_k F(x̌^k) ≤ F(x*) (Theorem 1) and min_k F̂^k(x̂^k) ≥ F(x*) (Theorem 2). □

Property 1. In each iteration, D-BLNS requires O(|A|) messages of size O(d), where d = max_{a_i ∈ A} |D_i|.

Property 2. In each iteration, the computation complexity of each D-BLNS agent is O(d²), where d = max_{a_i ∈ A} |D_i|.


Fig. 3: D-LNS Destroy (left) and Repair (right) Methods. The left plot reports the normalized solution quality (lower and upper bounds) over simulated time (ms) for the Random, Q-Learning, and LQ-Learning destroy methods. The right table reports the simulated runtime and approximation ratio ρ of the two repair methods at varying domain sizes:

            |Di|=10         |Di|=25         |Di|=50         |Di|=100
            time    ρ       time    ρ       time    ρ       time     ρ
  D-RLNS    60      2.07    75      2.17    120     2.31    261      2.48
  D-BLNS    364     2.00    1507    2.07    5389    2.17    19792    2.23

5 Experimental Results

We evaluate our D-LNS framework against state-of-the-art incomplete DCOP algorithms with and without quality guarantees: Bounded Max-Sum (BMS) [23], k-optimal algorithms (KOPT) [20], t-optimal algorithms (TOPT) [13], the Distributed Breakout Algorithm (DBA) [12], and Distributed Gibbs (D-Gibbs) [18]. We use publicly available implementations of BMS, KOPT, and TOPT and our own implementations of DBA and D-Gibbs. All experiments are performed on an Intel i7 Quadcore 3.3GHz machine with 8GB of RAM with a 300-second timeout. If an algorithm fails to solve a problem, it is due to either memory limitations or timeout. We conduct our experiments on random graphs [5], where we systematically vary the constraint density p1 (the ratio between the number of binary constraints in the problem and the maximum possible number of binary constraints), and on distributed Radio Link Frequency Assignment (RLFA) problems [3, 7], where we vary the number of agents |A| in the problem. We generated 20 instances for each experimental setting, and we report the average runtime, measured using the simulated runtime metric [24]. All experiments are statistically significant with p-values < 1.6 × 10^-6.

Random Graphs: We set |A| = 50, |Di| = 20 for all variables, and p1 = 0.5. We did not impose restrictions on the tree-width, which is determined based on the underlying randomly generated graph.

We first evaluate the two learning algorithms, Distributed Q-Learning (labeled Q-Learning) and Distributed Local Q-Learning (labeled LQ-Learning), against a random neighborhood selection (labeled Random), on learning good neighborhoods. In our implementation, the action-value function Q is constructed incrementally in order to keep memory usage as small as possible. Figure 3 (left) shows the quality of the solutions found with the three algorithms. They all run D-BLNS in the repair phase. For each algorithm, we report the lower (upper) bound found at each time interval in the bottom (top) part of the plot. These measures are normalized with respect to the best lower (upper) bound found by the three algorithms, which is illustrated by a bold red line. The closer an algorithm is to the red line, the better. Both learning algorithms find solutions of almost identical quality, and they both outperform random selections, as expected.

Next, we evaluate the two repair algorithms, D-BLNS and D-RLNS. They run LQ-Learning in the destroy phase. Figure 3 (right) reports the runtimes at varying domain sizes |Di|. For D-RLNS, we set S = 5. D-RLNS is always faster than D-BLNS, but it finds worse solutions.


Fig. 4: Experimental Results on Random Graphs. The top-left and bottom-left plots show the convergence results (normalized lower and upper bounds over simulated time, ms) for D-BLNS, BMS, KOPT-2, KOPT-3, D-Gibbs, and DBA at p1 = 0.4 and p1 = 0.8, respectively; the table (right) reports the approximation ratio ρ of each algorithm with quality guarantees:

  p1     D-BLNS   BMS    KOPT3   KOPT2
  0.1    1.10     1.47   13.5    17.6
  0.2    1.16     1.63   13.5    17.6
  0.3    1.16     1.90   13.5    17.6
  0.4    1.17     1.96   13.5    17.6
  0.5    2.00     3.14   13.5    17.6
  0.6    2.07     3.37   13.5    17.6
  0.7    2.34     3.53   13.5    17.6
  0.8    2.39     3.50   13.5    17.6
  0.9    2.45     3.57   13.5    17.6
  1.0    2.50     3.60   13.5    17.6

We now evaluate D-BLNS with LQ-Learning against state-of-the-art incomplete algorithms. We ran KOPT for k = 2 (KOPT2) and k = 3 (KOPT3), and ran TOPT for t = 1 (TOPT1). Figure 4 shows the convergence results (normalized lower and upper bounds) at p1 = 0.4 (top-left) and p1 = 0.8 (bottom-left), as well as the approximation ratio ρ of each algorithm with quality guarantees (right). We omitted TOPT1 as it solved problems with p1 = 0.1 only (reporting ρ = 26.0). The best approximation ratio is shown in bold. As the upper bounds of the k-optimal algorithms were very loose, we omitted them from the figures so that the lower bounds are easier to distinguish.

These results clearly show that D-BLNS not only converges to better solutions and converges to them faster, but also provides tighter upper bounds. Consequently, it finds better approximation ratios compared to the other algorithms.

Distributed RLFA Problem: Each link, consisting of a transmitter and a receiver, must be assigned a frequency from a given set such that the total interference at the receivers falls within an acceptable level. We model each transmitter as a variable (and an agent), whose domain consists of the frequencies that can be assigned to the corresponding transmitter.


         D-BLNS                BMS                   KOPT2                 KOPT3                 D-RLNS           D-Gibbs           DBA
  |A|    sim. time  ρ    ε     sim. time  ρ    ε     sim. time  ρ     ε    sim. time  ρ     ε    sim. time  ε     sim. time   ε     sim. time  ε
  10     11       1.03  1.01   211      1.05  1.29   44       4.33  1.06   57       3.50  1.03   10        1.03   13         1.17   7         1.51
  25     14       1.42  1.00   595      1.83  1.28   231      9.33  1.25   438      7.25  1.18   11        1.07   87         1.43   29        3.85
  50     29       1.59  1.00   861      2.20  1.34   709      17.6  1.44   1264     13.5  1.51   19        1.13   375        1.59   108       3.48
  100    112      1.66  1.00   2074     2.21  1.28   3208     34.3  1.72   9113     26.0  1.92   57        1.17   1630       1.64   582       2.12
  250    1405     1.63  1.00   18372    2.34  1.43   86675    42.2  1.68   –        –     –      312       1.21   23091      2.97   6533      3.22
  500    12938    1.65  1.00   242236   2.75  1.63   –        –     –      –        –     –      1622      1.28   135366     1.67   54188     2.22
  1000   133047   1.59  1.00   –        –     –      –        –     –      –        –     –      24728     1.35   278300     1.60   –         –

Table 1: Experimental Results on Distributed RLFA Problems

The interference between transmitters is modeled as constraints of the form |x_i − x_j| > s, where x_i, x_j ∈ X and s ≥ 0 is a randomly generated required frequency separation.
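As an illustration (our sketch; the 0 utility for a satisfied pair is an assumption), the RLFA separation requirement can be encoded as a hard binary constraint:

from math import inf

def rlfa_constraint(s):
    """Return a binary utility function enforcing |x_i - x_j| > s:
    0 if the separation holds, -inf (hard-constraint violation) otherwise."""
    return lambda xi, xj: 0 if abs(xi - xj) > s else -inf

f = rlfa_constraint(3)
print(f(1, 6), f(2, 4))   # prints: 0 -inf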

We model the constraint graphs as scale-free graphs [1], where the fraction P(k) of nodes in the network having k connections is modeled by P(k) ∼ k^{−γ}. We set γ = 2.5, vary |A| from 10 to 1000, set |Di| = 10, and randomly choose s ∈ [1, |A|/2].

Table 1 shows the results for the different algorithms, using the same algorithm settings as in the previous experiments. We report the time needed to find the best solution, its approximation ratio ρ, and the ratio ε of the best quality found to its own solution quality. Best runtimes, approximation ratios, and quality ratios are shown in bold. TOPT1 failed to solve any of the problems, while KOPT2 and KOPT3 ran out of memory on instances with more than 500 and 250 agents, respectively. The results show that D-BLNS can scale better to large problems than the other algorithms with quality guarantees (i.e., BMS, KOPT2, and KOPT3). Additionally, similar to the trends on random graphs, it also finds better solutions (i.e., better approximation ratios ρ and better quality ratios ε) than those of the competing algorithms. The main reason why the D-LNS algorithms are faster than BMS is that, on average, about half of the agents are frozen at each iteration, and the search space is thus significantly smaller. Lastly, as in random graphs, D-RLNS converges to its solution fastest, but its solutions are not as good as those found by D-BLNS.

6 Conclusions

In this paper, we proposed the Distributed Large Neighborhood Search (D-LNS) framework, which can be used to find quality-bounded solutions to DCOPs. Like centralized LNS, D-LNS is composed of a destroy phase, which selects the large neighborhood to search, and a repair phase, which performs the search over the selected neighborhood. We introduced Distributed Q-learning and Distributed Local Q-learning to select the neighborhood in the destroy phase. Distributed Q-learning is guaranteed to converge to the optimal neighborhood, but both learning algorithms perform equally well empirically. We also introduced D-BLNS and D-RLNS to perform the search in the repair phase, both characterized by low network load and low computational complexity per agent. D-RLNS finds solutions faster, but D-BLNS finds better solutions. Empirical results also show that, compared to existing incomplete DCOP algorithms, D-BLNS with Distributed Local Q-learning converges to better solutions, converges to them faster, provides tighter upper bounds, and scales better to large problems, clearly highlighting the strengths of this approach.


In the future, we plan to investigate other schemes that can be incorporated into the repair phase of D-LNS, such as propagation techniques to better prune the search space, some of which have been applied to DCOP algorithms [2, 10, 9, 7], as well as the use of graphics processing units (GPUs) to parallelize the search for better speedups [4].

Acknowledgment

We would like to thank the anonymous reviewers for their useful comments. This research is partially supported by INdAM-GNCS 2015 and NSF grant HRD-1345232. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations, agencies, or the U.S. government.

References

1. A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
2. C. Bessiere, P. Gutierrez, and P. Meseguer. Including soft global constraints in DCOPs. In Proceedings of the International Conference on Principles and Practice of Constraint Programming (CP), pages 175–190, 2012.
3. B. Cabon, S. De Givry, L. Lobjois, T. Schiex, and J. P. Warners. Radio link frequency assignment. Constraints, 4(1):79–89, 1999.
4. F. Campeotto, A. Dovier, F. Fioretto, and E. Pontelli. A GPU implementation of large neighborhood search for solving constraint optimization problems. In Proceedings of the European Conference on Artificial Intelligence (ECAI), pages 189–194, 2014.
5. P. Erdos and A. Renyi. On random graphs I. Publicationes Mathematicae Debrecen, 6:290, 1959.
6. A. Farinelli, A. Rogers, A. Petcu, and N. Jennings. Decentralised coordination of low-power embedded devices using the Max-Sum algorithm. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 639–646, 2008.
7. F. Fioretto, T. Le, W. Yeoh, E. Pontelli, and T. C. Son. Improving DPOP with branch consistency for solving distributed constraint optimization problems. In Proceedings of the International Conference on Principles and Practice of Constraint Programming (CP), pages 307–323, 2014.
8. A. Gershman, A. Meisels, and R. Zivan. Asynchronous Forward-Bounding for distributed COPs. Journal of Artificial Intelligence Research, 34:61–88, 2009.
9. P. Gutierrez, J. H.-M. Lee, K. M. Lei, T. W. K. Mak, and P. Meseguer. Maintaining soft arc consistencies in BnB-ADOPT+ during search. In Proceedings of the International Conference on Principles and Practice of Constraint Programming (CP), pages 365–380, 2013.
10. P. Gutierrez and P. Meseguer. Improving BnB-ADOPT+-AC. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 273–280, 2012.
11. D. Hatano and K. Hirayama. DeQED: An efficient divide-and-coordinate algorithm for DCOP. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 566–572, 2013.
12. K. Hirayama and M. Yokoo. The distributed breakout algorithms. Artificial Intelligence, 161(1–2):89–115, 2005.
13. C. Kiekintveld, Z. Yin, A. Kumar, and M. Tambe. Asynchronous algorithms for approximate distributed constraint optimization with quality bounds. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 133–140, 2010.
14. R. N. Lass, W. C. Regli, A. Kaplan, M. Mitkus, and J. J. Sim. Facilitating communication for first responders using dynamic distributed constraint optimization. In Proceedings of the Symposium on Technologies for Homeland Security (IEEE HST), pages 316–320, 2008.
15. R. Maheswaran, J. Pearce, and M. Tambe. Distributed algorithms for DCOP: A graphical game-based approach. In PDCS, pages 432–439, 2004.
16. M. Lauer and M. Riedmiller. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In ICML, pages 535–542, 2000.
17. P. Modi, W.-M. Shen, M. Tambe, and M. Yokoo. ADOPT: Asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence, 161(1–2):149–180, 2005.
18. D. T. Nguyen, W. Yeoh, and H. C. Lau. Distributed Gibbs: A memory-bounded sampling-based DCOP algorithm. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 167–174, 2013.
19. D. T. Nguyen, W. Yeoh, H. C. Lau, S. Zilberstein, and C. Zhang. Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 1447–1455, 2014.
20. J. Pearce and M. Tambe. Quality guarantees on k-optimal solutions for distributed constraint optimization problems. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1446–1451, 2007.
21. A. Petcu and B. Faltings. A scalable method for multiagent constraint optimization. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1413–1420, 2005.
22. A. Petcu and B. Faltings. A hybrid of inference and local search for distributed combinatorial optimization. In Proceedings of the International Conference on Intelligent Agent Technology (IAT), pages 342–348, 2007.
23. A. Rogers, A. Farinelli, R. Stranders, and N. Jennings. Bounded approximate decentralised coordination via the max-sum algorithm. Artificial Intelligence, 175(2):730–759, 2011.
24. E. Sultanik, P. J. Modi, and W. C. Regli. On modeling multiagent task scheduling as a distributed constraint optimization problem. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1531–1536, 2007.
25. P. Van Hentenryck and L. Michel. Constraint-Based Local Search. The MIT Press, 2009.
26. M. Vinyals, M. Pujol, J. A. Rodriguez-Aguilar, and J. Cerquides. Divide-and-coordinate: DCOPs by agreement. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 149–156, 2010.
27. M. Vinyals, E. Shieh, J. Cerquides, J. Rodriguez-Aguilar, Z. Yin, M. Tambe, and E. Bowring. Quality guarantees for region optimal DCOP algorithms. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 133–140, 2011.
28. W. Yeoh, X. Sun, and S. Koenig. Trading off solution quality for faster computation in DCOP search algorithms. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 354–360, 2009.
29. W. Yeoh and M. Yokoo. Distributed problem solving. AI Magazine, 33(3):53–65, 2012.
30. R. Zivan, S. Okamoto, and H. Peled. Explorative anytime local search for distributed constraint optimization. Artificial Intelligence, 212:1–26, 2014.