
Brahim Hnich, Patrick Prosser, Barbara Smith (Eds.)

Modelling and Reformulating Constraint Satisfaction Problems

Fourth International Workshop
Sitges (Barcelona), Spain, 1 October 2005
Proceedings

Held in conjunction with the Eleventh International Conference on Principles and Practice of Constraint Programming (CP 2005)

Foreword

Constraint Programming (CP) is a powerful technology for solving combinatorial problems, which are ubiquitous in academia and industry. The last ten years or so have witnessed significant research devoted to modelling and solving problems with constraints. CP is now a mature field and has been successfully used for tackling a wide range of complex real-life applications. However, the technology is currently accessible to only a small number of experts. For CP to be more widely used by non-experts, more research effort is needed to make the technology easier to use.

This “Fourth International Workshop on Modelling and Reformulating Constraint Satisfaction Problems” was convened to provide a forum for researchers who share these goals.

This volume contains nine contributed papers, which help to widen the use of CP technology. Some are application papers describing interesting problems and interesting ways to model them; some contribute to an understanding of modelling that could guide the manual or automatic formulation of models; some identify criteria that should be used in evaluating models and design pragmatic techniques that facilitate the choice among alternative models; some present higher-level modelling languages; and some propose automatic reformulation techniques.

We wish to thank all the authors who submitted papers to this workshop; the members of the programme committee; and the CP 2005 Tutorial and Workshop Chairs, Alan Frisch and Ian Miguel.

August 2005
Brahim Hnich, Patrick Prosser, and Barbara Smith
Programme Chairs

Programme Committee

Chris Beck, University of Toronto, Canada
Nicholas Beldiceanu, École des Mines de Nantes, France
Alan M. Frisch, University of York, United Kingdom
François Laburthe, Bouygues, France
Jimmy Lee, The Chinese University of Hong Kong, Hong Kong
Marco Cadoli, Università di Roma “La Sapienza”, Italy
Brahim Hnich (Joint Chair), University College Cork, Ireland
Jean-François Puget, ILOG, France
Steve Prestwich, University College Cork, Ireland
Patrick Prosser (Joint Chair), Glasgow University, Scotland
Paul Shaw, ILOG, France
Barbara Smith (Joint Chair), University College Cork, Ireland

Table of Contents

Increasing Solution Density by Dominated Relaxation
    Steven Prestwich

Sudoku as a Constraint Problem
    Helmut Simonis

A Constraint Programming Approach to the Hospitals / Residents Problem
    David F. Manlove, Gregg O'Malley, Patrick Prosser, and Chris Unsworth

Optimization Models for Generating Graduation Roadmaps
    Avi Dechter and Rina Dechter

Models for a Variable-sized Bin Packing Problem
    Diego Olivier Fernandez Pons

The Essence of Essence: A Constraint Language for Specifying Combinatorial Problems
    Alan M. Frisch, Matthew Grum, Christopher Jefferson, Bernadette Martínez Hernández, Ian Miguel

The Systematic Generation of Channelling Constraints
    Bernadette Martínez Hernández, Alan M. Frisch

Representations of Sets and Multisets in Constraint Programming
    Christopher Jefferson, Alan M. Frisch

Modelling and Solving Temporal Reasoning as Propositional Satisfiability
    Duc Nghia Pham, John Thornton, Abdul Sattar

Contributed Papers

Increasing Solution Density by Dominated Relaxation

Steven Prestwich

Cork Constraint Computation Centre
Department of Computer Science
University College Cork, Ireland

Abstract. The solution density of a constraint problem can have a significant effect on local search performance. This paper describes a reformulation technique for artificially increasing solution density: relaxing the model in such a way that relaxed solutions are dominated by real solutions, and can be transformed into them. On an open stack minimization problem this technique can exponentially increase solution density, boosting local search performance.

1 Introduction

The solution density of a constraint model may be defined as the number of solutions divided by the total number of variable assignments. The idea that higher solution density should make a problem easier to solve is natural, particularly for a local search algorithm [5, 12, 14, 17, 20, 22], but little has been done to exploit this conjecture in practice. This paper describes a reformulation technique that uses relaxation with dominance to increase solution density. We relax a model in such a way that any relaxed solution is dominated by at least one real solution, and can be transformed into it. This increases the solution density of the model and boosts local search performance (we shall ignore its effect on backtrack search, but discuss this issue in Section 4). We shall call this a dominated relaxed model and the new solutions pseudo-solutions.

We illustrate the idea on the minimization of open stacks problem (MOSP) used in the First Constraint Modelling Challenge,¹ and studied in several papers including [9, 21]. A manufacturer has a number of orders from customers to satisfy. Each order is for a number of different products, and only one product can be made at a time. Once a customer's order is started a stack is created for that customer. When all the products that a customer requires have been made the order is sent to the customer, so that the stack is closed. Because of limited space in the production area, the number of stacks that are simultaneously open should be minimized.

¹ http://www.dcs.st-and.ac.uk/~ipg/challenge/


1.1 A matrix representation

MOSP can be modelled as a matrix $M$ of binary variables, in which the columns correspond to the products required by the customers, and the rows to the customers' orders. Matrix entry $M_{ij} = 1$ if and only if customer $i$ has ordered some quantity of product $j$ (the quantity ordered is irrelevant). Any 0 with a 1 in a column to its left, and another 1 in a column to its right, is counted as a 1. A score is assigned to each column: its number of 1s (including 0s counted as 1s). The score of the matrix is the maximum of its column scores. The problem is to permute the columns of the matrix to minimise its score. An example is shown in Figure 1. The zeros in brackets have 1s to their left and right and are counted as 1s. The matrix score is 3, which is the maximum column score. The column permutation shown is optimal.

    0  1 [0] [0] 1
    0  0  1   1  0
    1  0  0   0  0
    0  0  1   1  1
    0  1  0   0  0

    1  2  3   3  2

Fig. 1. A MOSP instance with column scores
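The scoring rule can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function names `column_scores` and `matrix_score` are illustrative, and the matrix is the instance of Figure 1 with its columns in the order shown.

```python
def column_scores(matrix):
    """Return the score of every column of a 0/1 matrix."""
    scores = [0] * len(matrix[0])
    for row in matrix:
        ones = [k for k, value in enumerate(row) if value == 1]
        if not ones:
            continue  # a customer with no products never opens a stack
        # Every column from the first 1 to the last 1 counts as 1,
        # which includes the 0s enclosed between two 1s.
        for k in range(ones[0], ones[-1] + 1):
            scores[k] += 1
    return scores


def matrix_score(matrix):
    """The score of the matrix is the maximum of its column scores."""
    return max(column_scores(matrix))


FIG1 = [
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
]

print(column_scores(FIG1))  # [1, 2, 3, 3, 2]
print(matrix_score(FIG1))   # 3
```

The printed column scores agree with the bottom row of Figure 1.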

1.2 An integer model

The matrix representation can be modelled as an integer program. Suppose the matrix has $R$ rows and $C$ columns. Assume that each customer orders at least one product (if not then that customer can be removed from the problem), so that every row has an open order of length at least 1. In the following, $i$ is an integer with range $1 \ldots R$ and $j, k$ are integers with range $1 \ldots C$. Define binary variables $p_{jk}$ such that $p_{jk} = 1$ if and only if product $j$ is placed in column $k$. Each product must be placed in exactly one column and each column must receive exactly one product:

\[ \sum_k p_{jk} = 1 \qquad \sum_j p_{jk} = 1 \tag{1} \]

To model the idea of 1s influencing 0s to their left and right, define variables $l_{ik}$ and $r_{ik}$ such that $l_{ik} = 1$ if and only if row $i$ has a 1 in column $k$ or to its left, and $r_{ik} = 1$ if and only if row $i$ has a 1 in column $k$ or to its right. Add constraints

\[ p_{jk} \le l_{ik} \qquad p_{jk} \le r_{ik} \tag{2} \]

for $i, j$ such that $M_{ij} = 1$, and

\[ l_{ik} \le l_{i\,k+1} \qquad r_{i\,k+1} \le r_{ik} \tag{3} \]

Also define variables $o_{ik}$ such that $o_{ik} = 1$ if and only if there is an open order on row $i$ at column $k$:

\[ l_{ik} + r_{ik} \le 1 + o_{ik} \tag{4} \]

The objective is to find consistent values for the $p, l, r, o$ variables while minimising

\[ \max_k \sum_i o_{ik} \]

This optimisation problem can be solved as a series of constraint satisfaction problems (CSPs) with additional linear constraints:

\[ \sum_i o_{ik} \le \Omega \tag{5} \]

where $\Omega$ is an integer variable with range $0 \ldots R$. The objective is to minimise $\Omega$. Starting with $\Omega = R$ a feasible solution is found; $\Omega$ is then decremented and the search resumed with the same variable assignments (to exploit any solution clustering); and so on until timeout occurs or a desired value is reached.
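To make constraints (1) to (5) concrete, the sketch below derives values for the $p, l, r, o$ variables from a given product sequence and then checks each constraint. It is an illustration under stated assumptions rather than the paper's implementation: the helper names `build_assignment` and `satisfies` are hypothetical, indices are 0-based, $l_{ik}$ and $r_{ik}$ are read as "a 1 at or to the left (right) of column $k$", and the test instance is the matrix of Figure 1 with products left in the order shown.

```python
def build_assignment(M, order):
    """Derive p, l, r, o from a product sequence (0-based products)."""
    R, C = len(M), len(M[0])
    # p[j][k] = 1 iff product j is placed in column k
    p = [[int(order[k] == j) for k in range(C)] for j in range(C)]
    # placed[i][k] = 1 iff customer i ordered the product in column k
    placed = [[M[i][order[k]] for k in range(C)] for i in range(R)]
    l = [[int(any(placed[i][:k + 1])) for k in range(C)] for i in range(R)]
    r = [[int(any(placed[i][k:])) for k in range(C)] for i in range(R)]
    o = [[l[i][k] * r[i][k] for k in range(C)] for i in range(R)]
    return p, l, r, o


def satisfies(M, p, l, r, o, omega):
    """Check constraints (1) to (5) for a candidate assignment."""
    R, C = len(M), len(M[0])
    rows, cols = range(R), range(C)
    c1 = (all(sum(p[j][k] for k in cols) == 1 for j in cols)
          and all(sum(p[j][k] for j in cols) == 1 for k in cols))
    c2 = all(p[j][k] <= l[i][k] and p[j][k] <= r[i][k]
             for i in rows for j in cols if M[i][j] == 1 for k in cols)
    c3 = all(l[i][k] <= l[i][k + 1] and r[i][k + 1] <= r[i][k]
             for i in rows for k in range(C - 1))
    c4 = all(l[i][k] + r[i][k] <= 1 + o[i][k] for i in rows for k in cols)
    c5 = all(sum(o[i][k] for i in rows) <= omega for k in cols)
    return c1 and c2 and c3 and c4 and c5


M = [
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
]
order = [0, 1, 2, 3, 4]  # keep the columns in the order shown
p, l, r, o = build_assignment(M, order)
objective = max(sum(o[i][k] for i in range(len(M))) for k in range(len(M[0])))
print(objective)                      # 3
print(satisfies(M, p, l, r, o, 3))    # True
print(satisfies(M, p, l, r, o, 2))    # False
```

The assignment is feasible for $\Omega = 3$ but violates (5) for $\Omega = 2$, matching the score of 3 reported for this instance.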

1.3 Creating pseudo-solutions

We would like to remove or weaken some constraints in order to create new pseudo-solutions, thus increasing the solution density of the problem. But we cannot simply remove arbitrary constraints, because solutions to the resulting problem might not be MOSP solutions, or they might be solutions with larger scores. We must do it in such a way that any pseudo-solution can be transformed into a solution of equal or lower score: a dominating solution. Suppose we weaken constraints (1) to

\[ \sum_k p_{jk} \ge 1 \tag{6} \]

Now each product may be placed in more than one position in the sequence, and some sequence positions might receive no products. At first sight this appears useless because solutions might not be permutations. However, it can be seen as a model for a generalised problem: find a sequence of sets of products such that each product appears in at least one set, where an order is open in set $k$ if one of its products appears in $k$ itself, or if its products appear both in a set before $k$ and in a set after $k$. Such a sequence is a pseudo-solution for MOSP.

For example, consider the 5 × 5 MOSP given by the matrix in Figure 2(i). A solution with stack usage 3 is the product sequence (4, 5, 1, 2, 3) giving the permutation in Figure 2(ii). A pseudo-solution

(4, 4, 4, 5, ∅, 1, 2, 3)

that is dominated by this permutation is shown in Figure 2(iii), so there are at least two representations of the permutation. In fact there are several others: for example the empty set can occur anywhere in the sequence, and two of the three occurrences of product 4 can be removed. It can be verified that all these pseudo-solutions are optimal because they have score 3.

    (i) problem        (ii) optimal solution    (iii) optimal pseudo-solution

    1 2 3 4 5          4 5 1 2 3                4 4 4 5 ∅ 1 2 3

    0 0 1 0 1          0 1 0 0 1                0 0 0 1 - 0 0 1
    1 1 0 0 0          0 0 1 1 0                0 0 0 0 - 1 1 0
    0 0 0 1 0          1 0 0 0 0                1 1 1 0 - 0 0 0
    1 1 1 0 0          0 0 1 1 1                0 0 0 0 - 1 1 1
    0 0 0 0 1          0 1 0 0 0                0 0 0 1 - 0 0 0

Fig. 2. An example of a pseudo-solution
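The claim that both sequences have score 3 can be verified with a short sketch. It assumes a set-based encoding that is not in the paper: each customer's order is a set of product numbers (read off the rows of Figure 2(i)), a pseudo-solution is a list of sets, and the names `covers_all` and `pseudo_score` are illustrative.

```python
def covers_all(products, sets):
    """Constraint (6): every product appears in at least one set."""
    return all(any(p in s for s in sets) for p in products)


def pseudo_score(orders, sets):
    """Maximum number of orders open at any position of the sequence."""
    counts = [0] * len(sets)
    for needed in orders:
        hits = [k for k, s in enumerate(sets) if needed & s]
        if hits:
            # The order stays open from its first to its last product.
            for k in range(hits[0], hits[-1] + 1):
                counts[k] += 1
    return max(counts)


# Rows of Figure 2(i): the products required by each customer.
ORDERS = [{3, 5}, {1, 2}, {4}, {1, 2, 3}, {5}]
SOLUTION = [{4}, {5}, {1}, {2}, {3}]
PSEUDO = [{4}, {4}, {4}, {5}, set(), {1}, {2}, {3}]

print(covers_all(range(1, 6), PSEUDO))   # True
print(pseudo_score(ORDERS, SOLUTION))    # 3
print(pseudo_score(ORDERS, PSEUDO))      # 3
```

A permutation is simply the special case in which every set is a singleton, so the same function scores solutions and pseudo-solutions alike.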

1.4 From pseudo-solutions to solutions

A dominating solution can be derived from a pseudo-solution as follows. For any product that appears in more than one set, remove all but one of its appearances. For example the pseudo-solution

(4, 4, 4, 5, ∅, 1, 2, 3)

becomes

(4, ∅, 5, ∅, 1, 2, 3)

if we delete all but the first appearance of each product. We now have the same number of products as sets, and can obtain a permutation by moving products without violating the ordering among sets. For example 5 can be moved one set to the left to obtain

(4, 5, ∅, ∅, 1, 2, 3)

then 1 moved two sets to the left to obtain

(4, 5, 1, ∅, 2, 3)

and finally 2 moved one set to the left to obtain the permutation

(4, 5, 1, 2, 3)

It is clear that this transformation is correct and never increases $\Omega$: removing a product cannot increase $\Omega$, and it is always safe to move a product into a neighbouring empty set.
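The transformation can be sketched in a few lines. This is a simplification of the step-by-step procedure above, not the paper's own code: it keeps the first appearance of each product and reads the products off from left to right, which has the same effect as dropping empty sets. Splitting a set that holds several products in ascending product order is an arbitrary choice made here for illustration, and the names `to_permutation` and `score` are hypothetical.

```python
def to_permutation(sets):
    """Keep the first appearance of each product, scanning left to right."""
    seen = set()
    permutation = []
    for s in sets:
        for product in sorted(s):  # arbitrary but fixed order within a set
            if product not in seen:
                seen.add(product)
                permutation.append(product)
    return permutation


def score(orders, sets):
    """Maximum number of orders open at any position of the sequence."""
    counts = [0] * len(sets)
    for needed in orders:
        hits = [k for k, s in enumerate(sets) if needed & s]
        if hits:
            for k in range(hits[0], hits[-1] + 1):
                counts[k] += 1
    return max(counts)


ORDERS = [{3, 5}, {1, 2}, {4}, {1, 2, 3}, {5}]
PSEUDO = [{4}, {4}, {4}, {5}, set(), {1}, {2}, {3}]

perm = to_permutation(PSEUDO)
print(perm)  # [4, 5, 1, 2, 3]
print(score(ORDERS, [{p} for p in perm]) <= score(ORDERS, PSEUDO))  # True
```

On the running example the result is the permutation (4, 5, 1, 2, 3), whose score does not exceed that of the pseudo-solution it was derived from.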


1.5 Increase in optimal solution density

The effect on optimal solution density can be spectacular. Consider a MOSP instance that we shall call $A_N$, represented by a $2 \times 2N$ matrix:

\[ \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 & \cdots & 1 & 1 \\ 1 & 1 & \cdots & 1 & 0 & \cdots & 0 & 0 \end{bmatrix} \]

The column permutation shown has 1 open stack in any column, and this is optimal. We may permute the left and right columns separately, and reverse all columns, to obtain other optimal permutations. Thus there are $2(N!)^2$ optimal permutations, and any other permutation such as

\[ \begin{bmatrix} 0 & \cdots & 0 & 1 & \mathbf{0} & 1 & \cdots & 1 \\ 1 & \cdots & 1 & \mathbf{0} & 1 & 0 & \cdots & 0 \end{bmatrix} \]

has at least one column with two open stacks (because of the zeroes shown in bold). However, there are many more optimal pseudo-solutions. Considering only those cases in which all (0,1)-columns are in the $N$ left (or right) columns and all (1,0)-columns are in the $N$ right (or left) columns, there are $2(2^N - 1)^{2N}$ optimal pseudo-solutions. So the number of optimal solutions has been increased by a factor of at least

\[ \frac{2(2^N - 1)^{2N}}{2(N!)^2} \approx \frac{2^{2N^2}}{(N!)^2} > \frac{2^{N^2}}{N^{2N}} = \frac{2^{N^2}}{2^{\log_2 N^{2N}}} = 2^{N^2 - 2N \log_2 N} \approx 2^{N^2} \]

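Thus we have an exponential increase in solution density. At the other extreme, for some problems there will be exactly the same number of solutions in both models: in other words the reformulation introduces no pseudo-solutions. Consider a MOSP instance we shall call $B_N$, represented by the $2N \times N$ matrix

\[ \begin{bmatrix} U_N \\ L_N \end{bmatrix} \]

where $U_N$ and $L_N$ are upper and lower triangular $N \times N$ matrices respectively:

\[ U_N = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 & 1 & 1 \\ 0 & 1 & 1 & \cdots & 1 & 1 & 1 \\ 0 & 0 & 1 & \cdots & 1 & 1 & 1 \\ & & & \vdots & & & \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix} \qquad L_N = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 1 & 1 & \cdots & 0 & 0 & 0 \\ & & & \vdots & & & \\ 1 & 1 & 1 & \cdots & 1 & 1 & 1 \end{bmatrix} \]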

Notice that every column has exactly $(N + 1)$ 1s and no 0 has a 1 to the left and the right, but in any other permutation this is no longer true. For example in $B_4$ if we exchange the middle two columns:

\[ \begin{bmatrix} 1&1&1&1 \\ 0&1&1&1 \\ 0&0&1&1 \\ 0&0&0&1 \\ 1&0&0&0 \\ 1&1&0&0 \\ 1&1&1&0 \\ 1&1&1&1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1&1&1&1 \\ 0&1&1&1 \\ 0&1&0&1 \\ 0&0&0&1 \\ 1&0&0&0 \\ 1&0&1&0 \\ 1&1&1&0 \\ 1&1&1&1 \end{bmatrix} \]

then both of these columns have 6 open stacks instead of 5. So the matrix has two optimal solutions, each with $(N + 1)$ open stacks: one is with $U_N$ and $L_N$ as shown, the other is obtained by reversing their columns. Notice also that, for any two columns, each has at least one 1 where the other has a 0 in the same row. Therefore no two columns can be placed in the same set without increasing the maximum number of open stacks, so there are no optimal pseudo-solutions.
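For small $N$ the counts of optimal permutations quoted in this section can be confirmed by exhaustive enumeration. The sketch below is illustrative only: it enumerates all column permutations, so it is practical only for very small instances, and the helper names (`score`, `count_optimal`, `a_instance`, `b_instance`) are not from the paper.

```python
from itertools import permutations


def score(matrix, order):
    """Score of the matrix when its columns are arranged as `order`."""
    counts = [0] * len(order)
    for row in matrix:
        ones = [k for k, j in enumerate(order) if row[j] == 1]
        if ones:
            for k in range(ones[0], ones[-1] + 1):
                counts[k] += 1
    return max(counts)


def count_optimal(matrix):
    """Return (optimal score, number of permutations achieving it)."""
    n_cols = len(matrix[0])
    scores = [score(matrix, order) for order in permutations(range(n_cols))]
    best = min(scores)
    return best, scores.count(best)


def a_instance(n):
    """The 2 x 2N instance A_N."""
    return [[0] * n + [1] * n, [1] * n + [0] * n]


def b_instance(n):
    """The 2N x N instance B_N: U_N stacked on top of L_N."""
    upper = [[0] * i + [1] * (n - i) for i in range(n)]
    lower = [[1] * (i + 1) + [0] * (n - i - 1) for i in range(n)]
    return upper + lower


print(count_optimal(a_instance(2)))  # (1, 8)
print(count_optimal(a_instance(3)))  # (1, 72)
print(count_optimal(b_instance(4)))  # (5, 2)
```

The results for $A_2$ and $A_3$ agree with $2(N!)^2$, and $B_4$ has exactly two optimal permutations with $N + 1 = 5$ open stacks.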

1.6 An intermediate model

We shall refer to the original integer model as model 1, and the new one as model 3. We shall also consider an intermediate model 2 in which all sets in the pseudo-solution must be non-empty, which is obtained by replacing (1) by

\[ \sum_k p_{jk} \ge 1 \qquad \sum_j p_{jk} \ge 1 \tag{7} \]

Model 2 can also have exponentially more solutions than model 1. Considering only optimal pseudo-solutions to $A_N$ in which product 1 appears in every set, there are $2(2^{N-1} - 1)^{2(N-1)}$ of them, which is still at least $O(2^{N^2})$ more than the number of model 1 solutions.

1.7 Model size

All three models have $O(R^2)$ $p, l, r$ variables and $O(RC)$ $o$ variables, giving a total of $O(R(R + C))$. There are $O(C)$ constraints (1) or (6) or (7) of size $O(R)$, $O(\Delta R C^2)$ constraints (2) of constant size where $\Delta$ is the matrix density,² $O(RC)$ constraints (3) of constant size, $O(RC)$ constraints (4) of constant size, and $O(C)$ constraints (5) of size $O(R)$, giving a total space complexity of $O(RC(1 + C\Delta))$. The models are suitable for large problems with small $\Delta$, in other words problems in which each customer orders a small number of products.

² Defined as the number of 1-entries divided by the total number of entries.

2 Experiments

We compare the three models using a new local search algorithm for (non-binary) integer programs. This is based on a recent SAT algorithm called VW2 [15], and uses the VW2 modified objective function that dynamically weights variables to improve search diversification. In all experiments below the noise parameter $p$ is set to 0.05, the $s$ parameter is set to 0.1, and the $c$ parameter to 0.000001 (the algorithm will be described in future publications; see [15] for details on the parameters). These values were found to be robust across a range of problem instances. Results shown are the number of local moves (flips) required to find a solution. The differences in flip rates (number of flips per second) between the three models were negligible, as they have almost the same number of constraints.

For the $A_N$ benchmark (with many optimal pseudo-solutions) the results are shown in Figure 3. The graph is a log-log plot so the straight lines show that search effort is polynomial in $N$. Models 2 and 3 are indistinguishable, but model 1 has a steeper gradient and therefore a higher polynomial degree. We also experimented with $N = 500$ and carefully tuned the search algorithm parameters, and the results were very similar: models 2 and 3 took tens of seconds to solve while model 1 took tens of minutes. A merely polynomial improvement may seem disappointing given the exponential increase in solution density. An explanation may be that most pseudo-solutions for this problem occur in the basins of attraction of real solutions. Nevertheless, the improvement is significant.

[Figure: log-log plot of flips (vertical axis) against N (horizontal axis) for models 1, 2 and 3]

Fig. 3. Results on the $A_N$ benchmarks
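The reading of a log-log plot used here rests on a simple identity: if flips $= a N^d$ then $\log(\text{flips}) = \log a + d \log N$, so a straight line of slope $d$ indicates polynomial growth of degree $d$. The sketch below estimates that slope by least squares. The data points are synthetic and generated from an exact power law purely for illustration; they are not measurements from the experiments, and `loglog_slope` is a hypothetical helper name.

```python
import math


def loglog_slope(ns, flips):
    """Least-squares slope of log(flips) against log(N)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(f) for f in flips]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic data only: flips = 5 * N**2, not results from the paper.
ns = [100, 200, 400, 800]
flips = [5 * n ** 2 for n in ns]
print(round(loglog_slope(ns, flips), 6))  # 2.0
```

A steeper fitted slope for one model than another corresponds to the "higher polynomial degree" comparison made in the text.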

For the $B_N$ benchmark (with no optimal pseudo-solutions) the results are shown in Figure 4 and are quite different. Models 1 and 2 are now similar, and have lower polynomial degree than model 3. An explanation may be that when there are no pseudo-solutions the omitted constraints of model 3 are redundant because they exclude no solutions, and redundant constraints are known to aid local search on some problems [4, 13, 16].

[Figure: log-log plot of flips (vertical axis) against N (horizontal axis) for models 1, 2 and 3]

Fig. 4. Results on the $B_N$ benchmarks

These results show that reformulation to increase solution density can be very beneficial for local search, but that when the solution density is unchanged it can be harmful. Care must therefore be taken to choose a robust model. On this problem model 2 is the most robust, behaving like the better of the other two models on each problem class.

To test the models further we experimented with the individual instances from the Constraint Modelling Challenge for which optimal solutions are known. We performed ten runs per instance and took the median number of flips taken to solve the problem to optimality.³ The results are shown in Table 1, table entries “-” denoting that the median number of flips was greater than $10^7$. The results are consistent with those above: model 2 again gives the most robust results, model 1 gives poor results, and model 3 sometimes gives the best and sometimes the worst results.

³ Optimal results were obtained by a complete algorithm, thanks to Nic Wilson.

    problem    stacks    model 1      model 2      model 3
    Miller19       13    2,453,224      227,425      157,637
    GP1            45    6,785,315    2,022,414    4,940,263
    GP2            40            -    1,386,156    8,902,904
    GP3            40            -    2,829,814            -
    GP4            30    8,729,306    4,901,462            -
    NWRS1           3    2,475,203       29,472       17,102
    NWRS2           4      670,150      122,263       29,004
    NWRS3           7    2,819,739      113,894       42,625
    NWRS4           7      231,640      167,744       50,389
    NWRS5          12      693,360       60,100       79,034
    NWRS6          12      332,528      260,164       51,750
    NWRS7          10            -      577,905      559,957
    NWRS8          16    4,159,016    1,458,881      432,312
    SP1             9    1,076,624      222,584      832,262

Table 1. Results on selected challenge instances

3 Related work

Dominated relaxed modelling is related to the safe delay of constraints [2, 3] in which certain constraints are not enforced until after the search has finished. Safe delay constraints have been applied to logical specifications of CSPs, and delaying them allows multiple assignments of values to CSP variables. A real solution can be obtained by taking any one of the multiple assignments to each variable. This is analogous to the removal of at-most-one clauses in a well-known SAT encoding of CSPs, which is known to improve local search performance [17, 19]. The constraints that we omit from our first model can be regarded as safe delay constraints for MOSP instead of SAT, and a more complex transformation is required to obtain solutions.

Local search on pseudo-solutions has some resemblance to Abstract Local Search (ALS) [6]. ALS solves combinatorial optimization problems by applying local search to a space of abstract solutions while ignoring details. For example, abstract solutions may be task prioritizations for a scheduling problem, and the details may be task start times. But there are differences between the approaches. Firstly, the objective function of the problem is not defined on abstract solutions, but it is defined on pseudo-solutions. Secondly, dominance plays a different role: any pseudo-solution must be dominated by a solution, whereas for any solution S there must exist an abstract solution that maps to a real solution that dominates S. Thirdly, pseudo-solutions are purely a modelling concept that may be exploited by any local search algorithm, whereas abstract solutions are used in a special search architecture. Pseudo-solutions require a polynomial-time algorithm for constructing solutions from them, which is used only once after search. Abstract solutions require a fast greedy algorithm to construct solutions from them, which may be invoked many times during search. Repairs to an abstract solution are derived by analysing conflicts in the solution derived from it.

Dominated relaxed modelling is also related to supersymmetric modelling [16, 18]. Any solution to a supersymmetric model is either a solution to the problem, or can be transformed to one by applying a symmetry transformation. Symmetric solutions may be considered to dominate each other, so supersymmetry is a special case of dominated relaxed modelling.

4 Conclusion

Choosing a good model can be as important as choosing a good search algorithm. But modelling for local search and for complete search can require different methods, and relatively little work has been done on the former. Dominated relaxed modelling is a promising technique, though it may require a degree of ingenuity. The same is true of supersymmetric reformulations, perhaps because supersymmetry is the inverse of symmetry breaking by reformulation, which is known to be powerful but non-trivial to apply [10]. It is therefore unlikely that either technique can be automated. Nevertheless, they can have a significant effect on local search performance, and should form part of the constraint modeller's toolbox.

Is dominated relaxed modelling suitable for use with complete search algorithms? It is the reverse of the usual application of dominance reasoning (to reduce the number of solutions and the size of the search space), which seems to imply that it will not work well with backtrackers. On the other hand, it is known that adding symmetry breaking constraints can have a positive or negative effect on the effort taken by a backtracker to find a solution, an effect that motivated the development of techniques such as SBDS [1, 11] and SBDD [7, 8]. If the effect of removing solutions by adding constraints is unpredictable then so (presumably) is the effect of creating new solutions by removing constraints. Thus we expect the effect of the technique on backtracker performance to be highly problem-dependent.

In future work we hope to find further applications of this technique. We also hope to analyse its effect on search space structure, as it is currently unproven that the performance benefits are caused by the increase in solution density.⁴

⁴ Thanks to Ken Brown for pointing this out.

Acknowledgement

This material is based in part upon works supported by the Science Foundation Ireland under Grant No. 00/PI.1/C075.

References

1. R. Backofen, S. Will. Excluding Symmetries in Constraint-Based Search. Fifth International Conference on Principles and Practice of Constraint Programming, Lecture Notes in Computer Science vol. 1713, Springer-Verlag, 1999, pp. 73–87.

2. M. Cadoli, T. Mancini. Automated Reformulation of Specifications by Safe Delay of Constraints. Principles of Knowledge Representation and Reasoning: Proceedings of the Ninth International Conference, AAAI Press, 2004, pp. 388–398.

3. M. Cadoli, T. Mancini, F. Patrizi. SAT as an Effective Solving Technology for Constraint Problems. Atti della Giornata di Lavoro: Analisi sperimentale e benchmark di algoritmi per l'Intelligenza Artificiale, Dipartimento di Ingegneria, Università di Ferrara, Italy, 2005.

4. B. Cha, K. Iwama. Adding New Clauses for Faster Local Search. Fourteenth National Conference on Artificial Intelligence, American Association for Artificial Intelligence, 1996, pp. 332–337.

5. D. Clark, J. Frank, I. P. Gent, E. MacIntyre, N. Tomov, T. Walsh. Local Search and the Number of Solutions. Second International Conference on Principles and Practice of Constraint Programming, Lecture Notes in Computer Science vol. 1118, Springer, 1996, pp. 119–133.

6. J. M. Crawford, M. Dalal, J. P. Walser. Abstract Local Search. AIPS'98 Workshop on Planning as Combinatorial Search, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, 1998.

7. T. Fahle, S. Schamberger, M. Sellmann. Symmetry Breaking. Seventh International Conference on Principles and Practice of Constraint Programming, Lecture Notes in Computer Science vol. 2239, Springer, 2001, pp. 93–107.

8. F. Focacci, M. Milano. Global Cut Framework for Removing Symmetries. Seventh International Conference on Principles and Practice of Constraint Programming, Lecture Notes in Computer Science vol. 2239, Springer, 2001, pp. 77–92.

9. A. Fink, S. Voss. Applications of Modern Heuristic Search Methods to Pattern Sequencing Problems. Computers and Operations Research vol. 26, 1999, pp. 17–34.

10. I. P. Gent, J.-F. Puget. Symmetry Breaking in Constraint Programming. Tutorial, Tenth International Conference on Principles and Practice of Constraint Programming, Toronto, Canada, 2004.

11. I. P. Gent, B. M. Smith. Symmetry Breaking During Search in Constraint Programming. Fourteenth European Conference on Artificial Intelligence, Berlin, Germany, 2000, pp. 599–603.

12. Y. Hanatani, T. Horiyama, K. Iwama. Density Condensation of Boolean Formulas. Sixth International Conference on the Theory and Applications of Satisfiability Testing, Lecture Notes in Computer Science vol. 2919, Springer, 2003, pp. 69–77.

13. K. Kask, R. Dechter. GSAT and Local Consistency. Fourteenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 1995, pp. 616–622.

14. A. Parkes. Clustering at the Phase Transition. Fourteenth National Conference on Artificial Intelligence and Ninth Innovative Applications of Artificial Intelligence Conference, AAAI Press / MIT Press, 1997, pp. 340–345.

15. S. D. Prestwich. Random Walk With Continuously Smoothed Variable Weights. Eighth International Conference on Theory and Applications of Satisfiability Testing, Lecture Notes in Computer Science vol. 3569, Springer, 2005, pp. 203–215.

16. S. D. Prestwich. Negative Effects of Modeling Techniques on Search Performance. Annals of Operations Research vol. 118, Kluwer Academic Publishers, 2003, pp. 137–150.

17. S. D. Prestwich. Local Search on SAT-Encoded Colouring Problems. Sixth International Conference on the Theory and Applications of Satisfiability Testing, Lecture Notes in Computer Science vol. 2919, Springer, 2003, pp. 105–119.

18. S. D. Prestwich, A. Roli. Symmetry Breaking and Local Search Spaces. Second International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, Lecture Notes in Computer Science vol. 3524, Springer, 2005, pp. 273–287.

19. B. Selman, H. Levesque, D. Mitchell. A New Method for Solving Hard Satisfiability Problems. Tenth National Conference on Artificial Intelligence, MIT Press, 1992, pp. 440–446.

20. J. Singer, I. P. Gent, A. Smaill. Backbone Fragility and the Local Search Cost Peak. Journal of Artificial Intelligence Research vol. 12, 2000, pp. 235–270.

21. H. H. Yanasse. On a Pattern Sequencing Problem to Minimize the Maximum Number of Open Stacks. European Journal of Operational Research vol. 100, 1997, pp. 454–463.

22. M. Yokoo. Why Adding More Constraints Makes a Problem Easier for Hill-climbing Algorithms: Analyzing Landscapes of CSPs. Third International Conference on Principles and Practice of Constraint Programming, Lecture Notes in Computer Science vol. 1330, Springer-Verlag, 1997, pp. 356–370.

Sudoku as a Constraint Problem

Helmut Simonis

IC-Parc
Imperial College

Abstract. Constraint programming has finally reached the masses: thousands of newspaper readers (especially in the UK) are solving their daily constraint problem. They apply complex propagation schemes with names like “X-Wing” and “Swordfish” to find solutions of a rather simple-looking puzzle called Sudoku. Unfortunately, they are not aware that this is constraint programming. In this paper we try to understand the puzzle from a constraint point of view, show models to solve and generate puzzles, and give an objective measure of the difficulty of a puzzle instance. This measure seems to correlate well with the grades (e.g. easy to hard) that are assigned to problem instances for the general public. We also show how the model can be strengthened with redundant constraints and how these can be implemented using bipartite matching and flow algorithms.

1 Introduction

Sudoku [1] is a puzzle played on a partially filled 9x9 grid. The task is to complete the assignment using numbers from 1 to 9 such that the entries in each row, each column and each major 3x3 block are pairwise different. As with many logical puzzles, the challenge in Sudoku does not just lie in finding a solution. Well-posed puzzles have a unique solution and the task is to find it without guessing, i.e. without search. In this paper we compare a number of propagation schemes on how many problem instances they can solve by constraint propagation alone.

The basic Sudoku problem can be modelled with constraint programming [2] by a combination of alldifferent constraints [3]. Using different consistency techniques for these constraints we derive a number of propagation schemes with differing strength. We can extend the model either by shaving [4], testing each value in each domain with a one-step lookahead technique, or by adding redundant constraints. We use simplified versions of the colored matrix constraint [5, 6] and the same with cardinality constraint [7], and propose additional, new constraints which are also based on matching and flow algorithms.

We have evaluated our methods on different sets of example problems and can see how the grades assigned by the puzzle designers often, but not always, match the propagation schemes described here. This evaluation also gives a fair comparison of the difficulty of the different puzzle sources.

The paper is structured as follows: We first discuss some related work in section 2 and then give a formal problem definition in section 3. This is followed in section 4 by a description of the constraint model used to express the problem. Adding redundant constraints can improve the reasoning power of a constraint model; we present different alternatives in section 5. We continue in section 6 by listing the different combinations of propagation schemes used for the evaluation in section 7. Finally, we discuss methods to generate puzzles in section 8, before ending with some conclusions.

2 Related work

There is a whole sub area in constraint programming concerned with backtrack-free search[2]. But its focus is on identifying problem classes where all instancescan be solved without backtracking. For the Sudoku puzzle we want to knowwhich problem instances we can solve without search.

Solving puzzles has always been a favorite activity for the constraint program-ming community. Starting with Lauriere[8], puzzles were used to come up withnew constraint mechanisms and to compare different solution methods. Exam-ples are n-queens[9], the five houses puzzle[2], the progressive party problem[10]and more recently the social golfer problem[11]. Some well known games likePeg Solitaire[12] and minesweeper[13] were also approached with constraint pro-gramming. The Sudoku problem can be easly modelled using the alldifferent[3]constraint. There are other problems where this constraint plays a central role,in particular quasi-group completion[6][14][15][16], but also industrial problemslike aircraft stand allocation[17] [18].

An alternative view of the Sudoku problem is provided by [19]; this could be the basis for additional, redundant constraints.

3 Problem Definition

Definition 1. A Sudoku square of order n consists of $n^4$ variables formed into an $n^2 \times n^2$ grid with values from 1 to $n^2$ such that the entries in each row, each column and in each of the $n^2$ major $n \times n$ blocks are alldifferent.

An example of a Sudoku square is shown in table 1. The major blocks are outlined by a thicker framing.

Currently, only Sudoku problems of order 3 (9 × 9 grid) are widely used. It was claimed in [1] that there are 6,670,903,752,021,072,936,960 valid Sudoku squares of order 3.

Definition 2. A Sudoku problem (SP) consists of a partial assignment of the variables in a Sudoku square. The objective is to find a completion of the assignment which extends the partial assignment and satisfies the constraints.

Table 2 shows a SP in the form normally presented. Its solution is the square shown in table 1.

We are only interested in puzzles which have a unique solution. If a partial assignment allows more than one solution, we need to add hints until it is well posed.


7 2 6 | 4 9 3 | 8 1 5
3 1 5 | 7 2 8 | 9 4 6
4 8 9 | 6 5 1 | 2 3 7
------+-------+------
8 5 2 | 1 4 7 | 6 9 3
6 7 3 | 9 8 5 | 1 2 4
9 4 1 | 3 6 2 | 7 5 8
------+-------+------
1 9 4 | 8 3 6 | 5 7 2
5 6 7 | 2 1 4 | 3 8 9
2 3 8 | 5 7 9 | 4 6 1

Table 1. Example Sudoku Square
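Definition 1 can be checked mechanically. The following is a minimal Python sketch (the programs in this paper are written in ECLiPSe; the function name `is_sudoku_square` and the list-of-lists grid encoding are assumptions made for illustration). It tests the square of table 1 against the row, column and block conditions.

```python
def is_sudoku_square(grid, n):
    """Return True if grid is a Sudoku square of order n (Definition 1)."""
    size = n * n
    values = set(range(1, size + 1))
    if len(grid) != size or any(len(row) != size for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(size)} for c in range(size)]
    blocks = [
        {grid[br + i][bc + j] for i in range(n) for j in range(n)}
        for br in range(0, size, n)
        for bc in range(0, size, n)
    ]
    # n*n cells that are alldifferent over 1..n*n use every value exactly once
    return all(group == values for group in rows + cols + blocks)


# The square of table 1
TABLE_1 = [
    [7, 2, 6, 4, 9, 3, 8, 1, 5],
    [3, 1, 5, 7, 2, 8, 9, 4, 6],
    [4, 8, 9, 6, 5, 1, 2, 3, 7],
    [8, 5, 2, 1, 4, 7, 6, 9, 3],
    [6, 7, 3, 9, 8, 5, 1, 2, 4],
    [9, 4, 1, 3, 6, 2, 7, 5, 8],
    [1, 9, 4, 8, 3, 6, 5, 7, 2],
    [5, 6, 7, 2, 1, 4, 3, 8, 9],
    [2, 3, 8, 5, 7, 9, 4, 6, 1],
]

print(is_sudoku_square(TABLE_1, 3))
```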

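. 2 6 | . . . | 8 1 .
3 . . | 7 . 8 | . . 6
4 . . | . 5 . | . . 7
------+-------+------
. 5 . | 1 . 7 | . 9 .
. . 3 | 9 . 5 | 1 . .
. 4 . | 3 . 2 | . 5 .
------+-------+------
1 . . | . 3 . | . . 2
5 . . | 2 . 4 | . . 9
. 3 8 | . . . | 4 6 .

Table 2. Example Sudoku Problem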


Definition 3. A SP is well posed if it admits exactly one solution.

Ideally, the hints of a puzzle should not contain any redundant information, i.e. there should be no hint that is itself a logical consequence of the other hints.

Definition 4. A well posed SP is locally minimal if no restriction of the assignment is a well posed problem.
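Definitions 3 and 4 can be tested by counting solutions. The sketch below is a plain backtracking counter in Python, not the constraint model of this paper; the names `count_solutions`, `is_well_posed` and `is_locally_minimal` are illustrative, empty cells are encoded as 0, and the hints are assumed to be mutually consistent. Because removing hints can only add solutions, it is enough to test the removal of each single hint to decide local minimality.

```python
def count_solutions(grid, n, limit=2):
    """Count completions of a partial grid (0 = empty), stopping at `limit`."""
    size = n * n
    grid = [row[:] for row in grid]  # work on a copy

    def allowed(r, c, v):
        if any(grid[r][k] == v for k in range(size)):
            return False
        if any(grid[k][c] == v for k in range(size)):
            return False
        br, bc = r - r % n, c - c % n
        return all(grid[br + i][bc + j] != v
                   for i in range(n) for j in range(n))

    def search():
        for r in range(size):
            for c in range(size):
                if grid[r][c] == 0:
                    found = 0
                    for v in range(1, size + 1):
                        if allowed(r, c, v):
                            grid[r][c] = v
                            found += search()
                            grid[r][c] = 0
                            if found >= limit:
                                break
                    return found
        return 1  # no empty cell left: one complete solution

    return min(search(), limit)


def is_well_posed(grid, n):
    return count_solutions(grid, n) == 1


def is_locally_minimal(grid, n):
    """Well posed, and dropping any single hint loses uniqueness."""
    if not is_well_posed(grid, n):
        return False
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v:
                smaller = [line[:] for line in grid]
                smaller[r][c] = 0
                if is_well_posed(smaller, n):
                    return False  # this hint was redundant
    return True


# A Sudoku square of order 2, used as a small test case
SQUARE = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
one_blank = [row[:] for row in SQUARE]
one_blank[0][0] = 0
print(is_well_posed(one_blank, 2), is_locally_minimal(one_blank, 2))
```

The counter stops as soon as a second solution is found, since only the distinction between zero, one and several solutions matters here.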

Whether a puzzle can be solved without guessing depends on the deduction mechanism used. We only use an informal notion here; a more formal treatment can be found in [2].

Definition 5. A SP is search free wrt. a propagation method if it is solved without search, just by applying the constraint reasoning.
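As an illustration of Definition 5, the following Python sketch uses the weakest reasoning considered in this paper, forward checking on the binary decomposition, as the propagation method. The helper names (`peers_of`, `forward_check`, `is_search_free`) and the encoding of empty cells as 0 are assumptions; this is not the ECLiPSe implementation. A puzzle is search free for this method if propagation alone reduces every domain to a single value.

```python
def peers_of(n):
    """Map each cell to the cells sharing a row, column or major block with it."""
    size = n * n
    peers = {}
    for r in range(size):
        for c in range(size):
            br, bc = r - r % n, c - c % n
            group = {(r, k) for k in range(size)} | {(k, c) for k in range(size)}
            group |= {(br + i, bc + j) for i in range(n) for j in range(n)}
            peers[(r, c)] = group - {(r, c)}
    return peers


def forward_check(grid, n):
    """Propagate to a fixpoint; return the domains, or None if one is emptied."""
    size = n * n
    peers = peers_of(n)
    dom = {(r, c): ({grid[r][c]} if grid[r][c] else set(range(1, size + 1)))
           for r in range(size) for c in range(size)}
    queue = [cell for cell in dom if len(dom[cell]) == 1]
    while queue:
        cell = queue.pop()
        (v,) = dom[cell]               # the cell is fixed to v
        for p in peers[cell]:
            if v in dom[p]:
                dom[p].discard(v)
                if not dom[p]:
                    return None        # failure: empty domain
                if len(dom[p]) == 1:
                    queue.append(p)    # newly fixed cell, propagate it later
    return dom


def is_search_free(grid, n):
    dom = forward_check(grid, n)
    return dom is not None and all(len(d) == 1 for d in dom.values())


SQUARE = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
puzzle = [row[:] for row in SQUARE]
puzzle[0][0] = 0
print(is_search_free(puzzle, 2))
```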

4 Constraint Model

We now discuss the basic model for the Sudoku puzzle. The programs are written in ECLiPSe 5.8 [20, 21]. The program uses the IC library (which defines a forward checking [3, 22] version of alldifferent) and its extension ic_global, which provides a bound-consistent [23, 24] alldifferent. A propagator for a hyper arc-consistent alldifferent [25] was added for the study.

The program uses an $n^2 \times n^2$ matrix of finite domain variables with domain 1 to $n^2$. We set up $3n^2$ alldifferent constraints, one for each row, each column and each major $n \times n$ block.
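The scopes of these alldifferent constraints can be enumerated directly. A short Python sketch, assuming cells are addressed as zero-based (row, column) pairs and using the illustrative name `alldifferent_scopes`:

```python
def alldifferent_scopes(n):
    """Return the 3*n*n alldifferent scopes of a Sudoku of order n."""
    size = n * n
    rows = [[(r, c) for c in range(size)] for r in range(size)]
    cols = [[(r, c) for r in range(size)] for c in range(size)]
    blocks = [
        [(br + i, bc + j) for i in range(n) for j in range(n)]
        for br in range(0, size, n)
        for bc in range(0, size, n)
    ]
    return rows + cols + blocks


scopes = alldifferent_scopes(3)
print(len(scopes))  # 27 scopes of 9 cells each for order 3
```

Every cell belongs to exactly three scopes: its row, its column and its major block.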

4.1 Channeling

In our problem, each alldifferent constraint sets up a bijection between $n^2$ variables and $n^2$ values. In this bijection the roles of variables and values are interchangeable, but the constraint handling of the forward checking alldifferent is oriented. For example, it will assign a variable which has only one value left in its domain, but it will not assign a value which occurs only once in the domains of all variables. A method of overcoming this limitation is the use of channeling [26]. For each alldifferent constraint we introduce a dual set of variables, which are linked to the original variables via an inverse constraint [5], and which themselves must be alldifferent. This improves the reasoning obtained from the forward checking and the bound-consistent versions of the alldifferent, but obviously is useless for the hyper arc-consistent alldifferent.
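The effect of the dual variables can be seen on a small example. The Python sketch below is an illustration only: `dual_domains` and `channel_step` are hypothetical names, and the inverse constraint is reduced to the single inference that matters here, namely that a value with only one candidate variable fixes that variable.

```python
def dual_domains(domains):
    """Dual view: for each value, the set of variables that can still take it."""
    values = set().union(*domains.values())
    return {v: {x for x, d in domains.items() if v in d} for v in values}


def channel_step(domains):
    """Fix every variable that is the only remaining candidate for some value."""
    new = {x: set(d) for x, d in domains.items()}
    for v, candidates in dual_domains(domains).items():
        if len(candidates) == 1:
            (x,) = candidates
            new[x] &= {v}  # an empty result would signal a failure
    return new


# Three variables and three values, one alldifferent constraint.
# No variable is fixed, so forward checking on the primal variables is idle,
# but value 3 occurs in one domain only.
domains = {"x1": {1, 2}, "x2": {1, 2}, "x3": {1, 2, 3}}
print(channel_step(domains))
```

On this example the dual view assigns 3 to x3, an inference that the oriented forward checking version misses.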

5 Redundant Constraints

Even if we use hyper arc-consistency as the method for each of the alldifferent constraints we will not always achieve global consistency. The constraints interact in multiple ways and the local reasoning on each constraint alone cannot exploit these connections. A common technique to help with this problem is the use of redundant constraints which can strengthen the deduction by combining several of the original constraints. We show four such combinations here. The first is a simplified version of the colored matrix [5]. This was already used in [6] to improve a method for quasi-group completion. The others are new, but use bipartite matching and flow techniques inspired by [27].

5.1 Row/Column interaction

This constraint handles the interaction of rows and columns in the matrix. The colored matrix constraint [5], also called cardinality matrix in [6], expresses constraints on the number of occurrences of the values in rows and columns of a matrix. This reduces to a set of simple matching problems for a permutation matrix. Each value must occur exactly once in each row and column; this corresponds to a matching between rows and columns (the two sets of nodes), with edges which link a row and a column if the given value is in the domain of the corresponding matrix element. By finding a maximal matching and then identifying strongly connected components in a re-oriented graph we can eliminate those values from all domains which do not belong to any maximal matching. Figure 1 shows the form of the graph considered.

[Figure 1: flow network with source S, row nodes r1, ..., r9, column nodes c1, ..., c9 and sink T; row ri is linked to column cj when v ∈ dom(mij).]

Fig. 1. Row/Column Matching

We have to solve one matching problem per value, which makes 9 constraints.
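The filtering for one value can be sketched as follows in Python. This is an illustration under stated assumptions, not the propagator used in the experiments: rows and columns are indexed 0 to n-1, `adj[i]` holds the columns where the value is still possible in row i, the matching is found by simple augmenting paths, and the strongly connected components are obtained from a transitive closure rather than a linear-time algorithm. The name `supported_edges` is hypothetical.

```python
def supported_edges(adj):
    """Return the (row, column) edges that occur in some perfect matching,
    or None if no perfect matching exists."""
    n = len(adj)
    mate_of_col = {}  # column -> row in the current matching

    def augment(i, seen):
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                if j not in mate_of_col or augment(mate_of_col[j], seen):
                    mate_of_col[j] = i
                    return True
        return False

    if not all(augment(i, set()) for i in range(n)):
        return None  # some row cannot be matched: the constraint fails

    # Re-oriented graph contracted onto the rows: arc i -> k whenever row i
    # has an edge to the column that is matched to row k.
    reach = [[i == k for k in range(n)] for i in range(n)]
    for i in range(n):
        for j in adj[i]:
            reach[i][mate_of_col[j]] = True
    for m in range(n):  # transitive closure
        for i in range(n):
            for k in range(n):
                reach[i][k] = reach[i][k] or (reach[i][m] and reach[m][k])

    keep = set()
    for i in range(n):
        for j in adj[i]:
            k = mate_of_col[j]
            # matching edge, or both ends lie on a common alternating cycle
            if k == i or (reach[i][k] and reach[k][i]):
                keep.add((i, j))
    return keep


# The value can go to column 0 in row 0, columns 0-1 in row 1, columns 1-2 in row 2
print(supported_edges([{0}, {0, 1}, {1, 2}]))
```

Edges that are not returned correspond to matrix elements from whose domain the value can be removed.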


5.2 Row/Block interaction

We now look at the interaction of a single row (column) with a single major block as shown in figure 2. The row alldifferent constraint and the alldifferent constraint for the major block coincide in three variables. One necessary condition is that the set of variables B = {x14, x15, x16, x17, x18, x19} uses the same set of values as the set C = {x21, x22, x23, x31, x32, x33} of variables. We can exploit this condition by removing from the variables in set B all values which are not supported by values in set C and vice versa. This constraint is a special case of the same with cardinality constraint of [7], which achieves hyper arc-consistency by solving a flow problem. For this simpler variant, our method seems equivalent.

x11 x12 x13 x14 x15 x16 x17 x18 x19
x21 x22 x23
x31 x32 x33

Fig. 2. Row/Block Interaction

Considering all rows/columns and the major blocks they intersect leads to 54 such constraints.
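The support-based pruning between the sets B and C can be written in a few lines. A Python sketch, with the hypothetical name `filter_same` and domains given as lists of sets:

```python
def filter_same(dom_b, dom_c):
    """Prune two groups of variables that must use the same set of values:
    a value survives only if it is still possible on both sides."""
    values_b = set().union(*dom_b)
    values_c = set().union(*dom_c)
    common = values_b & values_c
    return [d & common for d in dom_b], [d & common for d in dom_c]


# Value 9 is no longer possible anywhere in C, value 3 nowhere in B
b, c = filter_same([{1, 2}, {2, 9}], [{1, 2}, {2, 3}])
print(b, c)
```

One pass reaches the fixpoint of this rule, because after pruning both sides range over exactly the common values. The sketch only implements the support test described above and does not solve the flow problem of [7].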

5.3 Rows/Blocks interaction

We can also consider the interaction of all major blocks in a line (column) with the three intersecting rows (columns), as shown in figure 3.

x11 x12 x13 x14 x15 x16 x17 x18 x19
x21 x22 x23 x24 x25 x26 x27 x28 x29
x31 x32 x33 x34 x35 x36 x37 x38 x39

Fig. 3. Rows/Blocks interaction

Each value must occur exactly once in each row, but also in each block. For every value, we can express this as a matching problem between rows and blocks, which are the nodes in the bipartite graph shown in figure 4. There is an arc between a row and a block if the given value is in the domain of any of the three overlapping variables. If that edge does not belong to any matching, we can remove the value from the domain of all three variables. The algorithm works in the usual way: we first find a maximal matching, reorient the edges not in the matching and search for strongly connected components (SCC). Edges which are not in the matching and whose ends do not belong to the same SCC can be removed.


There are 6 ∗ 9 = 54 of these constraints, one per value for each of the six groups of three rows or three columns.
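For a single value and a single group of three rows, the graph has only three nodes on each side, so the supported edges can also be found by enumerating all row-to-block assignments. The Python sketch below does this instead of the matching and SCC computation described above; the name `filter_rows_blocks` and the layout of `domains` (one list of column domains per row of the group) are assumptions.

```python
from itertools import permutations


def filter_rows_blocks(domains, v, n=3):
    """domains[r][c] is the domain of the cell in row r and column c of one
    group of n rows (n*n columns). Remove v from every row/block overlap
    that is not part of any row-to-block matching; None signals a failure."""
    edges = {(r, b) for r in range(n) for b in range(n)
             if any(v in domains[r][b * n + k] for k in range(n))}
    supported = set()
    for perm in permutations(range(n)):
        pairs = {(r, perm[r]) for r in range(n)}
        if pairs <= edges:            # this assignment is a perfect matching
            supported |= pairs
    if not supported:
        return None
    return [[set(d) if (r, c // n) in supported else d - {v}
             for c, d in enumerate(row)]
            for r, row in enumerate(domains)]


# Value 5 is possible in block 0 for row 0, blocks 0-1 for row 1, blocks 1-2 for row 2
full = set(range(1, 10))
possible = {(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)}
band = [[set(full) if (r, c // 3) in possible else full - {5}
         for c in range(9)] for r in range(3)]
pruned = filter_rows_blocks(band, 5)
print([sorted({c // 3 for c in range(9) if 5 in pruned[r][c]}) for r in range(3)])
```

In the example the only matching places the value in block 0 for row 0, block 1 for row 1 and block 2 for row 2, so it is removed from the other two overlaps.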

[Figure 4: flow network with source S, row nodes r1, r2, r3, block nodes b1, b2, b3 and sink T.]

Fig. 4. Rows/Blocks matching problem

5.4 Rows/Columns/Blocks interaction

Ideally, we would like to capture the interaction of row, column and block constraints together. But we then have three sets of nodes to consider, so that a simple bipartite matching seems no longer possible. We would need a more complex flow model like the one considered in [27] to capture the interaction of the alldifferent constraints.

5.5 Shaving

Shaving [4] is a very simple, yet effective technique which considers the complete constraint set by trying to set variables to values in their domain. If the assignment fails, then this value can be removed from the domain, possibly eliminating many inconsistent values before starting search. In many (especially scheduling) problems the shaving is only applied to the minimal and maximal values in the domain. For a puzzle like Sudoku the order of the values is not significant; we therefore need to test all values in the domain. We use a simple program, which operates on a list of variables, and which removes any value in the domain of each variable which directly leads to a failure.
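A shaving step of this kind can be sketched in Python as follows. The propagator used here is a simple forward checking loop over alldifferent scopes, chosen only to keep the example self-contained; the names `propagate` and `shave` are illustrative and this is not the ECLiPSe program used in the evaluation.

```python
def propagate(domains, scopes):
    """Forward checking on alldifferent scopes; returns new domains or None."""
    dom = {x: set(d) for x, d in domains.items()}
    changed = True
    while changed:
        changed = False
        for scope in scopes:
            for x in scope:
                if len(dom[x]) == 1:
                    (v,) = dom[x]
                    for y in scope:
                        if y != x and v in dom[y]:
                            dom[y].discard(v)
                            if not dom[y]:
                                return None  # failure
                            changed = True
    return dom


def shave(domains, scopes):
    """Remove every value whose tentative assignment fails under propagation."""
    dom = propagate(domains, scopes)
    if dom is None:
        return None
    for x in list(domains):
        for v in sorted(dom[x]):
            trial = dict(dom)
            trial[x] = {v}                      # one-step lookahead: try x = v
            if propagate(trial, scopes) is None:
                dom[x] = dom[x] - {v}           # v directly leads to a failure
                if not dom[x]:
                    return None
        dom = propagate(dom, scopes)            # propagate the removals
        if dom is None:
            return None
    return dom


scopes = [["x", "y", "z"]]
domains = {"x": {1, 2}, "y": {1, 2}, "z": {1, 2, 3}}
print(propagate(domains, scopes))
print(shave(domains, scopes))
```

Forward checking alone leaves the three domains unchanged, while shaving discovers that z cannot take 1 or 2, since either choice leaves x and y with a single shared value.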

6 Propagation Schemes

The survey [3] describes different consistency methods for the alldifferent constraint. We combine them with some of the redundant constraints above to create the propagation schemes in table 3. The propagation schemes form a lattice as shown in figure 5; a dotted line indicates that in our evaluation we have not found an example to differentiate the two schemes.


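[Figure 5: Hasse diagram of the propagation schemes. Node rows as drawn, from top to bottom: ⊤; HACSC3; HACSC, HACC3, HACS3; HACC, HACS, HAC3, HACV; HAC; BCI, BCV; BC; FCI, FCV; FC; ⊥.]

Fig. 5. Lattice of propagation schemes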

Scheme   Description
FC       alldifferent with arc consistency for binary decomposition (forward checking)
FCI      forward checking with channeling
BC       alldifferent with bound-consistency
BCI      bound-consistency with channeling
HAC      alldifferent with hyper arc-consistency
HACS     HAC with same constraints
HACC     HAC with colored matrix
HAC3     HAC with 3rows/blocks interaction
HACSC    HAC with colored matrix and same constraints
HACS3    HAC with same constraints and 3rows/blocks interaction
HACC3    HAC with colored matrix and 3rows/blocks interaction
HACSC3   HAC with same constraints, colored matrix and 3rows/blocks interaction
FCV      alldifferent with forward checking plus shaving
BCV      alldifferent with bound-consistency plus shaving
HACV     alldifferent with hyper arc-consistency plus shaving

Table 3. Propagation Schemes


7 Evaluation

We decided to test our programs on different sets of published puzzle instances. In the UK, several newspapers print a daily Sudoku puzzle and have published collections of these. We use sets from “The Times” [28], “The Daily Telegraph” [29], “The Guardian” [30], “The Independent” [31], “The Daily Mail” [32] and “The Sun” [33]. There are also magazines which specialize in Sudoku puzzles; of particular note is the Japanese puzzle publisher Nikoli [34–36], whose puzzles also appear in the UK as [37]. In addition, there are books of puzzle instances [38–41]. Usually, each instance is given a problem difficulty by the puzzle designer. In discussion boards, people often complain about the arbitrary way this difficulty is assigned. We have also found collections [42, 43] of 450 and 7611 puzzles which have only 17 presets, currently the smallest known number of presets for well posed problems. These are not classified by difficulty. The collection [44] contains mainly very hard problems.

In table 4 we summarize the results of our tests. Each entry is for a group of instances with the same grade. We present the source, the grade, the number of instances (Inst), and the percentage of problems solved search free for the different propagation schemes considered.

Source Grade Inst | Percentage Searchfree: FC FCI BC BCI HAC HACS HACC HACSC HAC3
[28] easy 4 | 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[28] mild 26 | 42.31 100.00 92.31 100.00 100.00 100.00 100.00 100.00 100.00
[28] difficult 45 | 17.78 93.33 80.00 97.78 97.78 100.00 100.00 100.00 100.00
[28] fiendish 25 | 0.00 36.00 28.00 80.00 88.00 100.00 88.00 100.00 100.00
[29] gentle 32 | 21.88 100.00 93.75 100.00 100.00 100.00 100.00 100.00 100.00
[29] moderate 66 | 7.58 100.00 81.82 100.00 100.00 100.00 100.00 100.00 100.00
[29] tough 22 | 0.00 0.00 4.55 18.18 18.18 27.27 27.27 27.27 27.27
[29] diabolical 12 | 0.00 0.00 0.00 8.33 8.33 16.67 16.67 16.67 16.67
[30] easy 20 | 20.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[30] medium 40 | 15.00 97.50 92.50 97.50 97.50 100.00 100.00 100.00 100.00
[30] hard 40 | 0.00 45.00 42.50 90.00 92.50 100.00 97.50 100.00 100.00
[31] elementary 10 | 80.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[31] intermediate 50 | 48.00 98.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[31] advanced 68 | 22.06 95.59 94.12 100.00 100.00 100.00 100.00 100.00 100.00
[32] none 150 | 77.33 99.33 98.00 100.00 100.00 100.00 100.00 100.00 100.00
[33] teasers 40 | 87.50 100.00 97.50 100.00 100.00 100.00 100.00 100.00 100.00
[33] toughies 50 | 58.00 100.00 98.00 100.00 100.00 100.00 100.00 100.00 100.00
[33] terminators 35 | 37.14 97.14 80.00 97.14 97.14 97.14 97.14 97.14 97.14
[34] easy 37 | 48.65 100.00 97.30 100.00 100.00 100.00 100.00 100.00 100.00
[34] medium 45 | 15.56 97.78 86.67 100.00 100.00 100.00 100.00 100.00 100.00
[34] hard 17 | 0.00 100.00 47.06 100.00 100.00 100.00 100.00 100.00 100.00
[35, 36] level2 18 | 11.11 100.00 83.33 100.00 100.00 100.00 100.00 100.00 100.00
[35, 36] level3 23 | 4.35 100.00 73.91 100.00 100.00 100.00 100.00 100.00 100.00
[35, 36] level4 24 | 8.33 100.00 66.67 100.00 100.00 100.00 100.00 100.00 100.00
[35, 36] level5 25 | 0.00 100.00 60.00 100.00 100.00 100.00 100.00 100.00 100.00
[35, 36] level6 28 | 0.00 35.71 42.86 89.29 89.29 100.00 92.86 100.00 100.00
[35, 36] level7 29 | 0.00 6.90 37.93 79.31 79.31 100.00 93.10 100.00 100.00
[35, 36] level8 29 | 0.00 3.45 20.69 65.52 68.97 100.00 93.10 100.00 100.00
[35, 36] level9 20 | 0.00 0.00 25.00 55.00 65.00 100.00 80.00 100.00 100.00
[35, 36] level10 14 | 0.00 0.00 0.00 28.57 28.57 92.86 57.14 100.00 92.86
[37] easy 27 | 70.37 100.00 96.30 100.00 100.00 100.00 100.00 100.00 100.00
[37] medium 20 | 10.00 95.00 80.00 95.00 95.00 100.00 95.00 100.00 100.00
[37] hard 12 | 0.00 8.33 33.33 83.33 83.33 100.00 100.00 100.00 100.00
[41] easy 10 | 70.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[41] moderate 40 | 65.00 100.00 97.50 100.00 100.00 100.00 100.00 100.00 100.00
[41] tricky 40 | 55.00 92.50 97.50 100.00 100.00 100.00 100.00 100.00 100.00
[41] difficult 40 | 15.00 92.50 90.00 100.00 100.00 100.00 100.00 100.00 100.00
[41] challenging 40 | 0.00 92.50 60.00 95.00 95.00 100.00 100.00 100.00 100.00
[40] level1 40 | 80.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[40] level2 40 | 80.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[40] level3 40 | 80.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[40] level4 40 | 72.50 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[40] level5 41 | 24.39 100.00 92.68 100.00 100.00 100.00 100.00 100.00 100.00
[39] easy 4 | 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[39] harder 4 | 50.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[39] even-harder 4 | 0.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[38] easy 50 | 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[38] medium 60 | 90.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00 100.00
[38] difficult 50 | 22.00 100.00 92.00 100.00 100.00 100.00 100.00 100.00 100.00
[38] super-difficult 40 | 0.00 77.50 60.00 92.50 92.50 100.00 100.00 100.00 100.00
[42] none 450 | 0.00 55.11 40.22 81.11 81.56 90.44 85.11 90.44 90.44
[43] none 7611 | 0.00 48.69 29.01 69.82 70.28 85.51 75.27 85.55 85.51
[44] none 100 | 0.00 0.00 3.00 5.00 5.00 14.00 10.00 14.00 14.00

Table 4: Summary

The programs seem to work quite well in finding solutions without search, except for the “tough” and “diabolical” puzzles from [29] and those from [44]. These puzzles require either more powerful reasoning, or some form of search. Testing values with a shaving step works very well on these examples, as table 5 shows.

Source Grade Inst | Preset: Min Avg Max | Percentage Searchfree: FC BC HAC FCV BCV HACV
[28] easy 4 | 34 34.75 36 | 100.00 100.00 100.00 100.00 100.00 100.00
[28] mild 26 | 27 29.12 32 | 42.31 92.31 100.00 100.00 100.00 100.00
[28] difficult 45 | 23 27.09 30 | 17.78 80.00 97.78 97.78 100.00 100.00
[28] fiendish 25 | 22 25.32 28 | 0.00 28.00 88.00 76.00 100.00 100.00
[29] gentle 32 | 26 28.28 31 | 21.88 93.75 100.00 96.88 100.00 100.00
[29] moderate 66 | 26 27.88 30 | 7.58 81.82 100.00 98.48 100.00 100.00
[29] tough 22 | 26 28.05 31 | 0.00 4.55 18.18 95.45 100.00 100.00
[29] diabolical 12 | 24 28.00 30 | 0.00 0.00 8.33 100.00 100.00 100.00
[30] easy 20 | 24 28.05 31 | 20.00 100.00 100.00 100.00 100.00 100.00
[30] medium 40 | 20 25.10 31 | 15.00 92.50 97.50 87.50 100.00 100.00
[30] hard 40 | 20 22.88 27 | 0.00 42.50 92.50 55.00 100.00 100.00
[31] elementary 10 | 28 31.20 34 | 80.00 100.00 100.00 100.00 100.00 100.00
[31] intermediate 50 | 24 29.02 34 | 48.00 100.00 100.00 100.00 100.00 100.00
[31] advanced 68 | 25 28.43 33 | 22.06 94.12 100.00 100.00 100.00 100.00
[32] none 150 | 30 32.01 33 | 77.33 98.00 100.00 100.00 100.00 100.00
[33] teasers 40 | 35 35.98 37 | 87.50 97.50 100.00 100.00 100.00 100.00
[33] toughies 50 | 32 35.54 36 | 58.00 98.00 100.00 100.00 100.00 100.00
[33] terminators 35 | 30 34.20 36 | 37.14 80.00 97.14 100.00 100.00 100.00
[34] easy 37 | 24 29.54 37 | 48.65 97.30 100.00 100.00 100.00 100.00
[34] medium 45 | 20 25.91 36 | 15.56 86.67 100.00 88.89 100.00 100.00
[34] hard 17 | 21 25.29 29 | 0.00 47.06 100.00 58.82 100.00 100.00
[35, 36] level2 18 | 22 24.39 28 | 11.11 83.33 100.00 100.00 100.00 100.00
[35, 36] level3 23 | 20 23.87 26 | 4.35 73.91 100.00 65.22 100.00 100.00
[35, 36] level4 24 | 20 23.63 29 | 8.33 66.67 100.00 66.67 100.00 100.00
[35, 36] level5 25 | 20 23.96 30 | 0.00 60.00 100.00 80.00 100.00 100.00
[35, 36] level6 28 | 20 23.61 28 | 0.00 42.86 89.29 60.71 100.00 100.00
[35, 36] level7 29 | 20 24.10 32 | 0.00 37.93 79.31 58.62 100.00 100.00
[35, 36] level8 29 | 21 23.28 29 | 0.00 20.69 68.97 41.38 100.00 100.00
[35, 36] level9 20 | 22 23.95 27 | 0.00 25.00 65.00 40.00 100.00 100.00
[35, 36] level10 14 | 22 24.64 29 | 0.00 0.00 28.57 35.71 100.00 100.00
[37] easy 27 | 24 29.41 37 | 70.37 96.30 100.00 100.00 100.00 100.00
[37] medium 20 | 20 25.15 30 | 10.00 80.00 95.00 85.00 100.00 100.00
[37] hard 12 | 20 24.58 28 | 0.00 33.33 83.33 58.33 100.00 100.00
[41] easy 10 | 28 31.10 34 | 70.00 100.00 100.00 100.00 100.00 100.00
[41] moderate 40 | 26 29.50 34 | 65.00 97.50 100.00 100.00 100.00 100.00
[41] tricky 40 | 24 28.48 33 | 55.00 97.50 100.00 100.00 100.00 100.00
[41] difficult 40 | 24 28.27 34 | 15.00 90.00 100.00 100.00 100.00 100.00
[41] challenging 40 | 24 28.32 32 | 0.00 60.00 95.00 97.50 100.00 100.00
[40] level1 40 | 32 34.58 36 | 80.00 100.00 100.00 100.00 100.00 100.00
[40] level2 40 | 31 32.02 33 | 80.00 100.00 100.00 100.00 100.00 100.00
[40] level3 40 | 29 29.93 31 | 80.00 100.00 100.00 100.00 100.00 100.00
[40] level4 40 | 26 28.25 30 | 72.50 100.00 100.00 100.00 100.00 100.00
[40] level5 41 | 24 25.80 31 | 24.39 92.68 100.00 92.68 100.00 100.00
[39] easy 4 | 32 35.00 36 | 100.00 100.00 100.00 100.00 100.00 100.00
[39] harder 4 | 29 30.50 32 | 50.00 100.00 100.00 100.00 100.00 100.00
[39] even-harder 4 | 24 24.50 25 | 0.00 100.00 100.00 50.00 100.00 100.00
[38] easy 50 | 35 43.32 48 | 100.00 100.00 100.00 100.00 100.00 100.00
[38] medium 60 | 26 35.48 40 | 90.00 100.00 100.00 100.00 100.00 100.00
[38] difficult 50 | 24 27.84 30 | 22.00 92.00 100.00 98.00 100.00 100.00
[38] super-difficult 40 | 24 27.68 31 | 0.00 60.00 92.50 92.50 100.00 100.00
[42] none 450 | 17 17.00 17 | 0.00 40.22 81.56 1.56 100.00 100.00
[43] none 7611 | 17 17.00 17 | 0.00 29.01 70.28 0.62 99.89 100.00
[44] none 100 | 17 21.51 31 | 0.00 3.00 5.00 4.00 93.00 100.00

Table 5: Shave Summary

We can see that all problems can be solved by using shaving techniques combined with hyper arc-consistency, and nearly all by combining bound-consistency with shaving. Using only forward checking together with shaving works well as long as enough presets are given. On the minimal problems it nearly always fails to find solutions.

Table 6 shows the execution times needed for solving the puzzles. We use a simple most-constrained/indomain, complete search routine (L) together with our different propagation schemes. The models with channeling are penalized by our naive hyper arc-consistent implementation of the inverse constraint. The times show the minimal, maximal, average and median times over all examples from the data sets above. All times are in milliseconds with ECLiPSe 5.8 on a 1.5 GHz/1 GB laptop. The best results are obtained using bound-consistent alldifferent constraints together with one shaving step before search.

Program   Min    Max   Avg  Median
HACV        0   7281   349     321
FCL         0  25077   745     140
BCL         0   3715    51      40
FCIL      340  10686   761     741
BCIL      190   2053   702     730
HACL        0   1812   322     330
FCVL        0  25026   725     111
BCVL        0    771    66      40

Table 6. Runtime (ms)


We were also interested in whether the published puzzles were locally minimal, i.e. did not contain redundant hints. Table 7 shows the results of some tests. Most of the published puzzles are not locally minimal; they often contain more than 10 redundant hints which can be removed without losing the uniqueness of the solution. The instances from [42, 43] are an exception, as they were collected to be locally minimal. Most of the hard puzzles from [44] are also minimal. We computed this reduction with a simple, committed choice program. We search for a hint that can be eliminated; if one is found, we commit to this choice and try to find additional redundant hints. The method does not guarantee minimality of the reduction, but leads to locally minimal instances.
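The committed choice reduction can be sketched in Python as below. The solution counter is a plain backtracking search and the names `count_solutions` and `reduce_hints` are illustrative, so this is a stand-in for the program used here, shown on a square of order 2 to keep the run time negligible. Hints are visited in a fixed order; a different order may give a different locally minimal puzzle.

```python
def count_solutions(grid, n, limit=2):
    """Count completions of a partial grid (0 = empty), stopping at `limit`."""
    size = n * n
    grid = [row[:] for row in grid]

    def allowed(r, c, v):
        br, bc = r - r % n, c - c % n
        return (all(grid[r][k] != v for k in range(size))
                and all(grid[k][c] != v for k in range(size))
                and all(grid[br + i][bc + j] != v
                        for i in range(n) for j in range(n)))

    def search():
        for r in range(size):
            for c in range(size):
                if grid[r][c] == 0:
                    found = 0
                    for v in range(1, size + 1):
                        if allowed(r, c, v):
                            grid[r][c] = v
                            found += search()
                            grid[r][c] = 0
                            if found >= limit:
                                break
                    return found
        return 1

    return min(search(), limit)


def reduce_hints(grid, n):
    """Drop each hint whose removal keeps the solution unique, and commit."""
    grid = [row[:] for row in grid]
    for r in range(n * n):
        for c in range(n * n):
            if grid[r][c]:
                saved, grid[r][c] = grid[r][c], 0
                if count_solutions(grid, n) != 1:
                    grid[r][c] = saved  # the hint was needed: restore it
    return grid


SQUARE = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
reduced = reduce_hints(SQUARE, 2)
print(reduced)
```

A single pass is enough for local minimality: a hint that could not be removed when it was tested cannot become removable later, because further removals only add solutions.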

Source Grade Inst | Preset: Min Avg Max | Locally Minimal | Reduced: Min Avg Max
[28] easy 4 | 34 34.75 36 | 0 | 23 23.75 24
[28] mild 26 | 27 29.12 32 | 0 | 22 23.62 25
[28] difficult 45 | 23 27.09 30 | 1 | 22 23.89 25
[28] fiendish 25 | 22 25.32 28 | 6 | 21 23.80 26
[29] gentle 32 | 26 28.28 31 | 0 | 23 24.03 26
[29] moderate 66 | 26 27.88 30 | 0 | 22 23.77 26
[29] tough 22 | 26 28.05 31 | 0 | 22 24.32 26
[29] diabolical 12 | 24 28.00 30 | 0 | 23 24.42 26
[30] easy 20 | 24 28.05 31 | 0 | 22 22.95 25
[30] medium 40 | 20 25.10 31 | 6 | 19 22.30 25
[30] hard 40 | 20 22.88 27 | 14 | 19 21.80 26
[31] elementary 10 | 28 31.20 34 | 0 | 21 23.80 26
[31] intermediate 50 | 24 29.02 34 | 0 | 22 23.94 27
[31] advanced 68 | 25 28.43 33 | 0 | 22 23.79 26
[32] none 150 | 30 32.01 33 | 0 | 22 23.84 26
[33] teasers 40 | 35 35.98 37 | 0 | 22 24.35 26
[33] toughies 50 | 32 35.54 36 | 0 | 22 24.68 27
[33] terminators 35 | 30 34.20 36 | 0 | 22 24.91 27
[34] easy 37 | 24 29.54 37 | 0 | 21 23.11 25
[34] medium 45 | 20 25.91 36 | 4 | 20 22.64 26
[34] hard 17 | 21 25.29 29 | 3 | 20 22.82 25
[35, 36] level2 18 | 22 24.39 28 | 2 | 20 22.72 25
[35, 36] level3 23 | 20 23.87 26 | 7 | 20 22.35 24
[35, 36] level4 24 | 20 23.63 29 | 6 | 19 22.13 25
[35, 36] level5 25 | 20 23.96 30 | 2 | 19 22.40 25
[35, 36] level6 28 | 20 23.61 28 | 9 | 20 22.54 24
[35, 36] level7 29 | 20 24.10 32 | 6 | 19 22.38 25
[35, 36] level8 29 | 21 23.28 29 | 10 | 19 22.24 24
[35, 36] level9 20 | 22 23.95 27 | 7 | 22 22.90 25
[35, 36] level10 14 | 22 24.64 29 | 4 | 22 23.36 25
[37] easy 27 | 24 29.41 37 | 1 | 20 23.26 26
[37] medium 20 | 20 25.15 30 | 2 | 20 22.20 24
[37] hard 12 | 20 24.58 28 | 2 | 20 22.83 25
[41] easy 10 | 28 31.10 34 | 0 | 22 23.70 25
[41] moderate 40 | 26 29.50 34 | 0 | 22 24.07 27
[41] tricky 40 | 24 28.48 33 | 1 | 21 23.57 25
[41] difficult 40 | 24 28.27 34 | 0 | 22 24.13 26
[41] challenging 40 | 24 28.32 32 | 0 | 22 24.05 26
[40] level1 40 | 32 34.58 36 | 0 | 21 23.55 26
[40] level2 40 | 31 32.02 33 | 0 | 21 23.55 25
[40] level3 40 | 29 29.93 31 | 0 | 22 23.40 25
[40] level4 40 | 26 28.25 30 | 0 | 21 23.02 25
[40] level5 41 | 24 25.80 31 | 0 | 20 22.68 24
[39] easy 4 | 32 35.00 36 | 0 | 23 24.00 25
[39] harder 4 | 29 30.50 32 | 0 | 23 24.00 26
[39] even-harder 4 | 24 24.50 25 | 1 | 21 22.75 25
[38] easy 50 | 35 43.32 48 | 0 | 22 24.22 28
[38] medium 60 | 26 35.48 40 | 0 | 22 23.50 25
[38] difficult 50 | 24 27.84 30 | 0 | 22 24.22 26
[38] super-difficult 40 | 24 27.68 31 | 1 | 22 24.40 29
[42] none 450 | 17 17.00 17 | 450 | 17 17.00 17
[44] none 100 | 17 21.51 31 | 82 | 17 21.14 26

Table 7: Reduction Summary

8 Problem Generator

We have seen that with different variants of our model we can solve puzzle instances of varying difficulty. Can we use the same techniques to generate puzzles as well? We can consider three possible approaches to the puzzle generation:

1. We start with a random partial assignment of values to the grid and check if this is a well posed puzzle, and whether it is search free for some strategy. This gives no guarantee of finding puzzles of a predetermined difficulty; in addition, the partial assignment may be inconsistent, and we have to decide a priori how many preset values we want to generate.

2. We start with an empty grid and add preset values one by one until there is a unique solution or we detect inconsistency. If required, we can add redundant hints until the problem is search free for a given propagation scheme. We will need a backtracking mechanism to escape inconsistent assignments.

3. We start with a full grid and remove values as long as the problem remains well posed, or remains search free for some propagation scheme. This will lead to problems which are locally minimal and well posed. Generating initial starting solutions is quite simple, but may still require some backtracking.

The third approach can either generate well posed, locally minimal problems or can be used to find search-free puzzles of a given difficulty grade that cannot be further reduced without losing the search free property. This bottom-up problem generation has been used in [16] to generate solvable problem instances of quasi-group completion for CSP or SAT solvers. In that case one starts with a completed quasi-group, removes a number of values from the grid, and is left with a feasible quasi-group completion problem. The generated problem may have multiple solutions, but that is acceptable in the given context. As we are interested in well posed problems, we have to make the removal steps one by one as long as the solution stays unique.
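The third approach, with the search free property as stopping criterion, can be sketched as follows. The Python code uses forward checking as the propagation scheme and starts from the square of table 1; the names `search_free_fc` and `generate`, the random visiting order and the `seed` parameter are assumptions made for the illustration. A puzzle that is search free under a sound propagation method is also well posed, so no separate uniqueness test is needed.

```python
import random


def search_free_fc(grid, n):
    """True if forward checking alone fixes every cell (0 = empty cell)."""
    size = n * n
    cells = [(r, c) for r in range(size) for c in range(size)]

    def peers(r, c):
        br, bc = r - r % n, c - c % n
        group = {(r, k) for k in range(size)} | {(k, c) for k in range(size)}
        group |= {(br + i, bc + j) for i in range(n) for j in range(n)}
        return group - {(r, c)}

    dom = {(r, c): ({grid[r][c]} if grid[r][c] else set(range(1, size + 1)))
           for r, c in cells}
    queue = [cell for cell in cells if len(dom[cell]) == 1]
    while queue:
        cell = queue.pop()
        (v,) = dom[cell]
        for p in peers(*cell):
            if v in dom[p]:
                dom[p].discard(v)
                if not dom[p]:
                    return False
                if len(dom[p]) == 1:
                    queue.append(p)
    return all(len(d) == 1 for d in dom.values())


def generate(solution, n, seed=0):
    """Remove values one by one while the puzzle stays search free."""
    rng = random.Random(seed)
    grid = [row[:] for row in solution]
    cells = [(r, c) for r in range(n * n) for c in range(n * n)]
    rng.shuffle(cells)
    for r, c in cells:
        saved, grid[r][c] = grid[r][c], 0
        if not search_free_fc(grid, n):
            grid[r][c] = saved  # removal would require search: keep the hint
    return grid


TABLE_1 = [
    [7, 2, 6, 4, 9, 3, 8, 1, 5],
    [3, 1, 5, 7, 2, 8, 9, 4, 6],
    [4, 8, 9, 6, 5, 1, 2, 3, 7],
    [8, 5, 2, 1, 4, 7, 6, 9, 3],
    [6, 7, 3, 9, 8, 5, 1, 2, 4],
    [9, 4, 1, 3, 6, 2, 7, 5, 8],
    [1, 9, 4, 8, 3, 6, 5, 7, 2],
    [5, 6, 7, 2, 1, 4, 3, 8, 9],
    [2, 3, 8, 5, 7, 9, 4, 6, 1],
]
puzzle = generate(TABLE_1, 3)
for row in puzzle:
    print(" ".join(str(v) if v else "." for v in row))
```

Replacing `search_free_fc` by a stronger propagation scheme would yield puzzles with fewer hints that need the stronger reasoning to be solved without search.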

9 Conclusions

In this paper we have discussed a constraint formulation of the Sudoku puzzle, and have seen that we can use different modelling techniques to find puzzle solutions without search. The problem is closely related to the quasi-group completion problem, which has been extensively studied in the constraint community. The additional alldifferent constraints on the major blocks give us more chances to find redundant constraints that can help in the solution process. Our methods can solve many, but not all, published examples without search, some requiring a shaving technique. We can use the model to generate puzzles of a given difficulty, again following techniques developed for quasi-group completion. Sudoku puzzles are not only an interesting addition to the problem set for developing constraint techniques, but also provide a unique opportunity to make more people interested in constraint programming.

Acknowledgment

We want to thank J. Schimpf for suggesting a more elegant shaving implementation, and G. Stertenbrink for pointing out additional references and data sets.

References

1. various: Sudoku wikipedia entry. http://en.wikipedia.org/wiki/Sudoku (2005)
2. Apt, K.: Principles of Constraint Programming. Cambridge University Press (2003)
3. van Hoeve, W.: The alldifferent constraint: A survey. sherry.ifi.unizh.ch/article/vanhoeve01alldifferent.html (2001)
4. Torres, P., Lopez, P.: Overview and possible extensions of shaving techniques for job-shop problems. In: 2nd International Workshop on Integration of AI and OR techniques in Constraint Programming for Combinatorial Optimization Problems (CP-AI-OR’2000). (2000) 181–186
5. Beldiceanu, N., Carlsson, M., Rampon, J.: Global constraint catalog. Technical Report T2005:08, SICS (2005)
6. Regin, J., Gomes, C.: The cardinality matrix constraint. In: CP 2004. (2004)
7. Beldiceanu, N., Katriel, I., Thiel, S.: GCC-like restrictions on the same constraint. In: Recent Advances in Constraints (CSCLP 2004). Volume 3419 of LNCS., Springer-Verlag (2005) 1–11
8. Lauriere, J.: A language and a program for stating and solving combinatorial problems. Artificial Intelligence 10 (1978) 29–127
9. Haralick, R., Elliott, G.: Increasing tree search efficiency for constraint satisfaction problems. Artificial Intelligence 14 (1980) 263–313
10. Smith, B., Brailsford, S., Hubbard, P., Williams, H.: The progressive party problem: Integer linear programming and constraint programming compared. Constraints 1 (1996) 119–138
11. Harvey, W.: The fully social golfer problem. In: SymCon’03: Third International Workshop on Symmetry in Constraint Satisfaction Problems. (2003) 75–85
12. Jefferson, C., Miguel, A., Miguel, I., Tarim, A.: Modelling and solving English peg solitaire. In: Fifth International Workshop on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CPAIOR’03). (2003) 261–275
13. Simonis, H.: Interactive problem solving in ECLiPSe. ECLiPSe User Group Newsletter (2003)
14. Stergiou, K., Walsh, T.: The difference all-difference makes. In: IJCAI-99. (1999)
15. Shaw, P., Stergiou, K., Walsh, T.: Arc consistency and quasigroup completion. In: ECAI-98 workshop on binary constraints. (1998)
16. Achlioptas, D., Gomes, C., Kautz, H., Selman, B.: Generating satisfiable problem instances. In: AAAI-2000. (2000)
17. Dincbas, M., Simonis, H.: APACHE - a constraint based, automated stand allocation system. In: Advanced Software Technology in Air Transport (ASTAIR’91). (1991) 267–282
18. Simonis, H.: Building industrial applications with constraint programming. In Common, H., Marche, C., Treinen, R., eds.: Constraints in Computational Logics - Theory and Applications, Springer Verlag (2001)
19. Eppstein, D.: Nonrepetitive paths and cycles in graphs with application to Sudoku. ACM Computing Research Repository (2005)
20. Wallace, M., Novello, S., Schimpf, J.: ECLiPSe: A platform for constraint logic programming. ICL Systems Journal 12 (1997)
21. Cheadle, A.M., Harvey, W., Sadler, A.J., Schimpf, J., Shen, K., Wallace, M.G.: ECLiPSe: An introduction. Technical Report IC-Parc-03-1, IC-Parc, Imperial College London (2003)
22. Dincbas, M., Simonis, H., Van Hentenryck, P.: Solving large combinatorial problems in logic programming. J. Log. Program. 8 (1990) 75–93
23. Puget, J.F.: A fast algorithm for the bound consistency of alldiff constraints. In: AAAI. (1998) 359–366
24. Lopez-Ortiz, A., Quimper, C.G., Tromp, J., van Beek, P.: A fast and simple algorithm for bounds consistency of the alldifferent constraint. In: IJCAI. (2003)
25. Regin, J.C.: A filtering algorithm for constraints of difference in CSPs. In: AAAI. (1994) 362–367
26. Cheng, B.M.W., Lee, J.H.M., Wu, J.C.K.: Speeding up constraint propagation by redundant modeling. In Freuder, E.C., ed.: CP. Volume 1118 of Lecture Notes in Computer Science., Springer (1996) 91–103
27. Beldiceanu, N., Katriel, I., Thiel, S.: Filtering algorithms for the same constraint. In: Proceedings of International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimisation Problems (CP’AI’OR) 2004. (2004)
28. Gould, W.: The Times Su Doku, Book 1. The Times (2005)
29. Mepham, M.: The Daily Telegraph Sudoku. The Daily Telegraph (2005)
30. n/a: The Guardian Sudoku Book 1. Guardian Books (2005)
31. Norris, H.: The Independent Book of Sudoku Volume 1. The Independent (2005)
32. n/a: Daily Mail Sudoku. Associated Newspapers Ltd (2005)
33. Perry, J.: The Sun Doku. The Sun (2005)
34. various: Sudoku 1. Nikoli (1988) In Japanese.
35. various: Gekikara Sudoku 1. Nikoli (2004) In Japanese.
36. various: Gekikara Sudoku 2. Nikoli (2004) In Japanese.
37. n/a: Sudoku - the original hand-made puzzles. Puzzler 1 (2005)
38. Vorderman, C.: Carol Vorderman’s How to do Sudoku. Random House (2005)
39. Wilson, R.: How to solve Sudoku. The Infinite Ideas Company (2005)
40. Sinden, P.: The Little Book of Sudoku Volume 1. Michael O’Mara Books (2005)
41. Huckvale, M.: The Big Book of Su Doku. Orion (2005)
42. Royle, G.: Minimum Sudoku. http://www.csse.uwa.edu.au/~gordon/sudokumin.php (2005)
43. Royle, G.: Minimum Sudoku. http://www.csse.uwa.edu.au/~gordon/sudoku17 (2005)
44. Stertenbrink, G., Meyrignac, J.C.: 100 Sudoku problems. http://magictour.free.fr/top100 (2005)


A Constraint Programming Approach to the Hospitals / Residents Problem

David F. Manlove*,†, Gregg O’Malley†, Patrick Prosser, and Chris Unsworth‡

Department of Computing Science, University of Glasgow, Scotland. davidm/gregg/pat/[email protected].

Abstract. An instance I of the Hospitals / Residents problem (HR) involves a set of residents (graduating medical students) and a set of hospitals, where each hospital has a given capacity. The residents have preferences for the hospitals, as do hospitals for residents. A solution of I is a stable matching, which is an assignment of residents to hospitals that respects the capacity conditions and preference lists in a precise way. In this paper we present constraint encodings for HR that give rise to important structural properties. We also present a computational study using both randomly-generated and real-world instances. Our study suggests that Constraint Programming is indeed an applicable technology for solving this problem, in terms of both theory and practice.

1 Introduction

Gale and Shapley described in their seminal paper [4] the classical Hospitals / Residents problem (HR), referred to by the authors as the College Admissions problem. An instance of HR involves a set of residents (i.e. graduating medical students) and a set of hospitals. Each resident ranks in order of preference a subset of the hospitals. Each hospital has an integral capacity, and ranks in order of preference those residents who ranked it. We seek to match each resident to an acceptable hospital, in such a way that a hospital’s capacity is never exceeded. Moreover the matching must be stable. A formal definition of stability follows, but informally stability ensures that no resident and hospital, not already matched together, would rather be assigned to one another than remain with their assignees. Such a resident and hospital could form a private arrangement outside the matching, undermining its integrity. Gale and Shapley [4] described a linear-time algorithm for finding a stable matching, given an instance of HR.

Many centralised matching schemes that automate the process of assigning residents to hospitals employ algorithms that solve HR and its variants [22]. For example, the National Resident Matching Program (NRMP) in the US [20] is perhaps the largest such scheme. The NRMP has been in operation since 1952 and handles the annual allocation of some 31,000 residents to hospitals. Counterparts of the NRMP elsewhere are the Canadian Resident Matching Service (CaRMS) [3] and the Scottish PRHO Allocation scheme (SPA) [11]. Similar matching schemes are also used in educational and vocational contexts.

* Supported by RSE / Scottish Executive Personal Research Fellowship.
† Supported by EPSRC grant GR/R84597/01.
‡ Supported by an EPSRC studentship.

A special case of HR occurs when each hospital has capacity 1; this is the Stable Marriage problem with Incomplete lists (SMI). In this context, residents are referred to as men, whilst hospitals are referred to as women. A special case of SMI occurs when the numbers of men and women are equal, and each man finds all women acceptable and vice versa; this is the classical Stable Marriage problem (SM), also introduced by Gale and Shapley [4]. A specialised linear-time algorithm for SM, known as the Gale / Shapley (GS) algorithm [4], can be generalised to the SMI case [10, Section 1.4.2]. Using a process known as “cloning hospitals” (described in more detail in Section 3), a given instance I of HR may be transformed to an instance J of SMI, and the GS algorithm can be applied to J in order to give a stable matching in I. However in general this method expands the instance size, so that in practice specialised algorithms (such as those described in [10, Section 1.6]; see also Figure 2) are used to solve HR directly and achieve a better worst-case time complexity.

Over the last few decades, stable matching problems, and SM in particular, have been the focus of much attention in the literature [4, 13, 10, 24]. Several encodings of SM and its variants as a Constraint Satisfaction Problem (CSP) have been formulated [1, 6, 14, 7–9, 17, 25, 26]. However, no encoding for HR has been considered before now.

This paper is concerned with a Constraint Programming (CP) approach to solving HR. We first present in Section 3 a cloned model for HR, indicating how existing formulations of SMI as a CSP [6] can be used in order to model HR. We then present in Section 4 a constraint-based model of HR that deals directly with an HR instance without cloning, achieving improved time and space complexities. We show that Arc Consistency (AC) propagation [2] applied to this model yields the same structure as the action of established algorithms for HR [4, 10]. As a consequence, a stable matching for the given HR instance can be obtained without search (in fact we can in general obtain two complementary stable matchings following AC propagation, with optimality properties for the residents and hospitals respectively). We also demonstrate how a failure-free enumeration can be used to find all solutions for a given HR instance without search. These results therefore extend analogous results presented in [6] for SMI. In Sections 5 and 6, we present specialised binary and n-ary constraints for HR, comparing and contrasting the time and space requirements for establishing AC with the models presented in Sections 3 and 4. Then, in Section 7, we describe the results of an empirical study which compares the various models presented in this paper in practice, on both randomly-generated and real-world data. Finally, Section 8 presents some concluding remarks, and discusses future work.

The models in Sections 4-6 are non-trivial extensions of earlier constraint models presented for SMI [6, 17, 25, 26]. In the SMI case, clearly each woman can be assigned at most one man, but to model an HR instance without cloning, the main challenges are to maintain a representation of the set of assignees of a given hospital $h_j$, and of the identity of the worst resident assigned to $h_j$.

The benefits of our approach are two-fold: firstly, the CSP models presented here for HR indicate that AC propagation using a CP toolkit yields the same structure as given by established linear-time algorithms for HR, from which all solutions for a given instance can be generated in a failure-free manner without search. Secondly, and more importantly, our models can be used as a basis on which additional constraints can be imposed, covering variants of HR that arise naturally in practical applications, but which cannot be accommodated easily by existing algorithms. Examples of such variants include the Hospitals / Residents problem with Ties (in which preference lists may include ties; see Section 8 for more details), the Hospitals / Residents problem with Couples (in which couples submit joint preference lists), and the generalisation of the Sex-Equal Stable Marriage problem (in which one seeks a stable matching such that the sums of the ranks of the men’s and women’s partners are as close as possible) to the HR case. All of these variants are known to be NP-hard [16, 21, 12].

Residents’ preferences    M0    Mz    Hospitals’ preferences
r1 : h1 h3                –     –     h1 : (2) : r3 r7 r5 r2 r4 r6 r1
r2 : h1 h5 h4 h3          h1    h3    h2 : (3) : r5 r6 r3 r4
r3 : h1 h2 h5             h1    h1    h3 : (1) : r2 r5 r6 r1 r7
r4 : h1 h2 h4             h2    h2    h4 : (1) : r8 r2 r4 r7
r5 : h3 h1 h2             h3    h1    h5 : (1) : r3 r7 r6 r8 r2
r6 : h3 h2 h1 h5          h2    h2
r7 : h3 h4 h5 h1          h4    h5
r8 : h5 h4                h5    h4

Fig. 1. An HR instance. The GS-list entries are underlined, and the middle two columns indicate the residents’ assigned hospitals in M0 and Mz (r1 is unassigned in both).

In the next section we present notation and terminology relating to HR, which will be assumed in the remainder of this paper, and we also present some important structural and algorithmic results.

2 Definitions and fundamental results

We now give a formal definition of HR. An instance $I$ of HR comprises a set $R = \{r_1, \ldots, r_n\}$ of residents and a set $H = \{h_1, \ldots, h_m\}$ of hospitals. Each resident $r_i \in R$ has an acceptable set of hospitals $A_i \subseteq H$; moreover $r_i$ ranks $A_i$ in strict order of preference. For each $h_j \in H$, denote by $B_j \subseteq R$ those residents who find $h_j$ acceptable; $h_j$ ranks $B_j$ in strict order of preference. Finally, each hospital $h_j \in H$ has an associated capacity, denoted by $c_j \in \mathbb{Z}^+$, indicating the number of posts that $h_j$ has. For each $r_i \in R$, let $l^r_i$ denote the length of $r_i$’s preference list, and for each $h_j \in H$, let $l^h_j$ denote the length of $h_j$’s preference list; we assume that $c_j \le l^h_j$. Let $L$ denote the total length of the residents’ preference lists in $I$. Given $r_i \in R$ and $h_j \in A_i$, define $\mathrm{rank}(r_i, h_j)$ to be the position of $h_j$ in $r_i$’s preference list; $\mathrm{rank}(h_j, r_i)$ is defined similarly. An example HR instance is shown in Figure 1 (the hospital capacities are indicated in brackets).


An assignment $M$ is a subset of $R \times H$ such that $(r_i, h_j) \in M$ implies that $h_j \in A_i$ (i.e. $r_i$ finds $h_j$ acceptable). If $(r_i, h_j) \in M$, we say that $r_i$ is assigned to $h_j$, and $h_j$ is assigned $r_i$. For any $q \in R \cup H$, we denote by $M(q)$ the set of assignees of $q$ in $M$. If $r_i \in R$ and $M(r_i) = \emptyset$, we say that $r_i$ is unassigned, otherwise $r_i$ is assigned. Similarly, any hospital $h_j \in H$ is under-subscribed, full or over-subscribed according as $|M(h_j)|$ is less than, equal to, or greater than $c_j$, respectively.

A matching $M$ is an assignment such that $|M(r_i)| \le 1$ for each $r_i \in R$ and $|M(h_j)| \le c_j$ for each $h_j \in H$ (i.e. each resident is assigned to at most one hospital, and no hospital is over-subscribed). For convenience, given a resident $r_i \in R$ such that $M(r_i) \ne \emptyset$, where there is no ambiguity the notation $M(r_i)$ is also used to refer to the single member of $M(r_i)$.

A blocking pair relative to a matching $M$ is a (resident, hospital) pair $(r_i, h_j) \in (R \times H) \setminus M$ such that (i) $h_j \in A_i$, (ii) either $r_i$ is unassigned in $M$ or prefers $h_j$ to $M(r_i)$, and (iii) either $h_j$ is under-subscribed or prefers $r_i$ to at least one member of $M(h_j)$. A matching is stable if it admits no blocking pair.
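The blocking-pair conditions (i)-(iii) translate almost directly into a check. The following is a minimal Python sketch, not part of the paper: the dictionary-based representation (preference lists as Python lists, best first; a matching as a map from resident to hospital or None) and the name `is_stable` are assumptions made for illustration, and the input is assumed to be a valid matching already.

```python
def is_stable(res_prefs, hosp_prefs, capacity, matching):
    """Return True if `matching` admits no blocking pair.

    res_prefs[r]  : hospitals acceptable to r, most preferred first
    hosp_prefs[h] : residents acceptable to h, most preferred first
    matching[r]   : hospital assigned to r, or None if r is unassigned
    """
    assignees = {h: [] for h in hosp_prefs}
    for r, h in matching.items():
        if h is not None:
            assignees[h].append(r)
    for r, prefs in res_prefs.items():
        current = matching.get(r)
        for h in prefs:                      # (i) h is acceptable to r
            if h == current:
                break                        # later hospitals are worse than M(r)
            # (ii) holds here: r is unassigned or prefers h to M(r)
            rank = hosp_prefs[h].index
            if len(assignees[h]) < capacity[h]:
                return False                 # (iii) h is under-subscribed
            if any(rank(r) < rank(s) for s in assignees[h]):
                return False                 # (iii) h prefers r to an assignee
    return True


# Tiny illustrative instance (hypothetical, not from the paper).
res_prefs = {"r1": ["h1", "h2"], "r2": ["h1"]}
hosp_prefs = {"h1": ["r2", "r1"], "h2": ["r1"]}
capacity = {"h1": 1, "h2": 1}
print(is_stable(res_prefs, hosp_prefs, capacity, {"r1": "h2", "r2": "h1"}))
print(is_stable(res_prefs, hosp_prefs, capacity, {"r1": "h1", "r2": None}))
```

In the second call, r2 is unassigned and h1 prefers r2 to its assignee r1, so (r2, h1) is a blocking pair.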

Gale and Shapley [4] described an algorithm for finding a stable matching in a given HR instance $I$, which is known as the resident-oriented Gale/Shapley (RGS) algorithm [10, Section 1.6.3]. This algorithm finds the resident-optimal stable matching $M_0$ in $I$, in which each assigned resident is assigned to the best hospital that he could obtain in any stable matching. On the other hand, the hospital-oriented (HGS) algorithm [10, Section 1.6.2] is a second algorithm for HR that finds the hospital-optimal stable matching $M_z$ in $I$, in which each hospital is assigned the best set of residents that it could obtain in any stable matching. Figure 1 includes columns that give $M_0$ and $M_z$ for the example HR instance shown. In general, the optimality property of each of $M_0$ and $M_z$ is achieved at the expense of the hospitals and residents respectively (the “pessimality” of each of these matchings for the relevant parties is discussed in Sections 1.6.2 and 1.6.5 of [10]). The RGS and HGS algorithms for HR are shown in Figure 2 (the term “delete the pair $(r_i, h_j)$” refers to the operations of deleting $r_i$ from $h_j$’s preference list and vice versa). Using a suitable choice of data structures (extending those described in [10, Section 1.2.3]), both the RGS and the HGS algorithms can be implemented to run in $O(L)$ time and $O(nm)$ space.
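As an illustration of the resident-oriented procedure of Figure 2, the following Python sketch runs it on the instance of Figure 1. It is a sketch under stated assumptions rather than the authors’ implementation: the dictionary representation and the names `rgs`, `RES`, `HOSP` and `CAP` are hypothetical, and the simple list operations used here do not achieve the $O(L)$ bound quoted above.

```python
# Instance of Figure 1 (preference lists, most preferred first).
RES = {"r1": ["h1", "h3"], "r2": ["h1", "h5", "h4", "h3"],
       "r3": ["h1", "h2", "h5"], "r4": ["h1", "h2", "h4"],
       "r5": ["h3", "h1", "h2"], "r6": ["h3", "h2", "h1", "h5"],
       "r7": ["h3", "h4", "h5", "h1"], "r8": ["h5", "h4"]}
HOSP = {"h1": ["r3", "r7", "r5", "r2", "r4", "r6", "r1"],
        "h2": ["r5", "r6", "r3", "r4"],
        "h3": ["r2", "r5", "r6", "r1", "r7"],
        "h4": ["r8", "r2", "r4", "r7"],
        "h5": ["r3", "r7", "r6", "r8", "r2"]}
CAP = {"h1": 2, "h2": 3, "h3": 1, "h4": 1, "h5": 1}


def rgs(res_prefs, hosp_prefs, capacity):
    """Resident-oriented Gale/Shapley; returns (matching, reduced resident lists)."""
    res_list = {r: list(p) for r, p in res_prefs.items()}    # reduced lists
    hosp_list = {h: list(p) for h, p in hosp_prefs.items()}
    assigned = {h: [] for h in hosp_prefs}                    # M(h)
    match = {r: None for r in res_prefs}                      # M(r)
    free = list(res_prefs)                                    # residents to process
    while free:
        r = free.pop()
        if match[r] is not None or not res_list[r]:
            continue
        h = res_list[r][0]                    # r applies to h
        match[r] = h
        assigned[h].append(r)
        rank = hosp_list[h].index
        if len(assigned[h]) > capacity[h]:    # h is over-subscribed
            worst = max(assigned[h], key=rank)
            assigned[h].remove(worst)
            match[worst] = None
            free.append(worst)
        if len(assigned[h]) == capacity[h]:   # h is full
            cut = rank(max(assigned[h], key=rank)) + 1
            for rz in hosp_list[h][cut:]:     # delete the pairs (rz, h)
                res_list[rz].remove(h)
            del hosp_list[h][cut:]
    return match, res_list


matching, rgs_lists = rgs(RES, HOSP, CAP)
print(matching)
```

The returned matching can be compared against the M0 column of Figure 1.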

The deletions made by each of the RGS and HGS algorithms have the effect of reducing the original set of preference lists in $I$. The reduced lists returned by the RGS (respectively HGS) algorithm are known as the RGS-lists (respectively HGS-lists). The intersection of the RGS-lists and the HGS-lists yields the GS-lists. (E.g. the GS-lists for the HR instance shown in Figure 1 are represented as underlined preference list entries.) The GS-lists in $I$ have several useful properties, which are summarised below (these properties follow as a consequence of Lemmas 1.6.2 and 1.6.4, and Theorems 1.6.1 and 1.6.2 of [10]):

RGS algorithm:

    M = ∅;
    while (some ri ∈ R is unassigned and ri has a non-empty list) {
        hj = first hospital on ri’s list;
        /* ri applies to hj */
        M = M ∪ {(ri, hj)};
        if (hj is over-subscribed) {
            rk = worst resident assigned to hj;
            M = M \ {(rk, hj)};
        }
        if (hj is full) {
            rk = worst resident assigned to hj;
            for (each successor rz of rk on hj’s list)
                delete the pair (rz, hj);
        }
    }

HGS algorithm:

    M = ∅;
    while (some hj ∈ H is under-subscribed and some ri ∈ Bj is not assigned to hj) {
        ri = first such resident on hj’s list;
        /* hj offers a post to ri */
        if (ri is assigned) {
            hk = M(ri);
            M = M \ {(ri, hk)};
        }
        M = M ∪ {(ri, hj)};
        for (each successor hz of hj on ri’s list)
            delete the pair (ri, hz);
    }

Fig. 2. RGS algorithm for HR; HGS algorithm for HR.

Theorem 1. For a given instance of HR,
(i) all stable matchings are contained in the GS-lists;
(ii) in $M_0$, each resident with a non-empty GS-list is assigned to the first hospital on his GS-list, whilst each resident with an empty GS-list is unassigned;
(iii) in $M_z$, each hospital $h_j$ is assigned the first $m_j$ members of its GS-list, where $m_j = \min\{c_j, g^h_j\}$ and $g^h_j$ is the length of $h_j$’s GS-list.

Given any $q \in R \cup H$, we denote by $GS(q)$ the set of hospitals or residents (as appropriate) that belong to $q$’s GS-list in $I$.

Additional important results, attributed to Gale and Sotomayor [5] and Roth [23], concern residents who are unassigned, and hospitals that are under-subscribed, in stable matchings in $I$. These results are collectively known as the Rural Hospitals Theorem [10, Section 1.6.4], and may be stated as follows:

Theorem 2. For a given instance of HR,
(i) each hospital is assigned the same number of residents in all stable matchings;
(ii) exactly the same residents are unassigned in all stable matchings;
(iii) any hospital that is under-subscribed in one stable matching is assigned precisely the same set of residents in all stable matchings.
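The three statements of Theorem 2 can be observed on the two stable matchings M0 and Mz listed in Figure 1. The short Python sketch below only inspects those two matchings (it is a check on one example, not a proof); the dictionary layout and the helper name `assignees` are assumptions made for illustration.

```python
from collections import defaultdict

# Capacities and the M0 / Mz columns of Figure 1 (None = unassigned).
CAP = {"h1": 2, "h2": 3, "h3": 1, "h4": 1, "h5": 1}
M0 = {"r1": None, "r2": "h1", "r3": "h1", "r4": "h2",
      "r5": "h3", "r6": "h2", "r7": "h4", "r8": "h5"}
MZ = {"r1": None, "r2": "h3", "r3": "h1", "r4": "h2",
      "r5": "h1", "r6": "h2", "r7": "h5", "r8": "h4"}


def assignees(matching):
    """Map each hospital to the set of residents assigned to it."""
    by_hospital = defaultdict(set)
    for resident, hospital in matching.items():
        if hospital is not None:
            by_hospital[hospital].add(resident)
    return by_hospital


a0, az = assignees(M0), assignees(MZ)
for h, cap in CAP.items():
    same_set = a0[h] == az[h]
    print(h, len(a0[h]), len(az[h]), "under-subscribed" if len(a0[h]) < cap else "full",
          "same set" if same_set else "different set")
print("unassigned:", [r for r in M0 if M0[r] is None], [r for r in MZ if MZ[r] is None])
```

In this example only h2 is under-subscribed, and it receives the same residents in both matchings, while the full hospitals h1, h3, h4 and h5 receive different sets of the same size.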

3 A cloned model

In this section we indicate how an instance of HR may be reduced to an instance of SMI by “cloning” hospitals. This technique is described in [10, p.38]; see also [24, pp.131-132]. For completeness, we briefly restate the construction here. Let $I$ be an instance of HR. We form an instance $J$ of SMI by replacing each hospital $h_j \in H$ by $c_j$ women in $J$, denoted by $h_j^k$ ($1 \le k \le c_j$). The preference list of $h_j^k$ in $J$ is identical to that of $h_j$ in $I$. Each resident $r_i$ in $I$ corresponds to a man $r_i$ in $J$, and each hospital $h_j$ in $r_i$’s list in $I$ is replaced by $h_j^1\ h_j^2 \ldots h_j^{c_j}$, in that order, in $J$. It may then be shown that the stable matchings in $I$ are in one-one correspondence with the stable matchings in $J$.
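The cloning construction is mechanical, and a small Python sketch may make it concrete. The representation is an assumption made for illustration: clone $h_j^k$ is modelled as the tuple `(h, k)`, and `clone_instance` is a hypothetical helper name.

```python
def clone_instance(res_prefs, hosp_prefs, capacity):
    """Build the SMI instance J from the HR instance I by cloning hospitals.

    Returns (men_prefs, women_prefs); the clone h_j^k is the tuple (h, k).
    """
    clones = {h: [(h, k) for k in range(1, capacity[h] + 1)] for h in hosp_prefs}
    # Each hospital on a resident's list is replaced by its clones, in order.
    men_prefs = {r: [w for h in prefs for w in clones[h]]
                 for r, prefs in res_prefs.items()}
    # Every clone inherits the preference list of the original hospital.
    women_prefs = {w: list(hosp_prefs[h]) for h in hosp_prefs for w in clones[h]}
    return men_prefs, women_prefs


# Hypothetical two-hospital example.
men, women = clone_instance({"r1": ["h1", "h2"]},
                            {"h1": ["r1"], "h2": ["r1"]},
                            {"h1": 2, "h2": 1})
print(men["r1"])
print(sorted(women))
```

The number of women produced is the sum of the capacities, which is the source of the size blow-up discussed next.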

1. $y_{j,k} < y_{j,k+1}$  ($1 \le j \le m$, $1 \le k \le c_j - 1$)
2. $y_{j,k} \ge q \Rightarrow x_i \le p$  ($1 \le j \le m$, $1 \le k \le c_j$, $1 \le q \le l^h_j$)
3. $x_i \ne p \Rightarrow y_{j,k} \ne q$  ($1 \le i \le n$, $1 \le p \le l^r_i$, $1 \le k \le c_j$)
4. $(x_i \ge p \wedge y_{j,k-1} < q) \Rightarrow y_{j,k} \le q$  ($1 \le i \le n$, $1 \le p \le l^r_i$, $1 \le k \le c_j$)
5. $y_{j,c_j} < q \Rightarrow x_i \ne p$  ($1 \le j \le m$, $c_j \le q \le l^h_j$)

Fig. 3. Constraints for the CSP model of an HR instance.

In order to obtain the GS-lists of $I$, we can model $J$ using the “conflict matrices” encoding of SMI as presented in [6]. In general AC may be established in $O(ed^r)$ time, where $e$ is the number of constraints, $d$ is the domain size, and $r$ is the arity of each constraint [2]. Due to the cloning technique, the number of women in $J$ is $\sum_{j=1}^{m} c_j = O(cm)$, where $c = \max\{c_j : h_j \in H\}$. Given the construction of the encoding in $J$ [6], it follows that $e = O(nmc)$, $d = O(n + m)$ and $r = 2$, so that the time and space complexities for finding the GS-lists in $I$ using the cloned model are $O((n + m)^4 c)$ and $O((nmc)^2)$ respectively.
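One way to read the stated time bound (our own unpacking of the step, using only the quantities given above and the elementary inequality $nm \le (n+m)^2$) is:

\[
O(ed^r) \;=\; O\bigl(nmc \cdot (n+m)^2\bigr) \;\subseteq\; O\bigl((n+m)^2 \, c \, (n+m)^2\bigr) \;=\; O\bigl((n+m)^4 c\bigr).
\]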

4 A direct CSP-based model

We now present a direct CSP encoding of an HR instance that avoids cloning. Let $I$ be an instance of HR. For $r_i \in R$ and $h_j \in H$, we use the terminology $r_i$ applies (or is assigned) to $h_j$’s $k$th post ($1 \le k \le c_j$) in the case that $h_j$ prefers exactly $k - 1$ members of $M(h_j)$ to $r_i$. Also given a matching $M$, we denote the resident who is assigned to $h_j$’s $k$th post in $M$ by $M_k(h_j)$ ($1 \le k \le |M(h_j)|$).

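We construct a CSP instance $J$ with variables $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_{j,k} : 1 \le j \le m \wedge 0 \le k \le c_j\}$, whose domains are initially defined as follows:

\[
\begin{aligned}
\mathrm{dom}(x_i) &= \{1, 2, \ldots, l^r_i\} \cup \{m + 1\} && (1 \le i \le n) \\
\mathrm{dom}(y_{j,0}) &= \{0\} && (1 \le j \le m) \\
\mathrm{dom}(y_{j,k}) &= \{k, k + 1, \ldots, l^h_j\} \cup \{n + k\} && (1 \le j \le m \wedge 1 \le k \le c_j).
\end{aligned}
\]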

For the $x_i$ variables ($1 \le i \le n$), the value $m + 1$ corresponds to the case that $r_i$’s GS-list is empty, whilst the remaining values correspond to the ranks of preference list entries that belong to the GS-lists. A similar meaning applies to the $y_{j,k}$ variables ($1 \le j \le m$, $1 \le k \le c_j$), except that the value $n + k$ corresponds to the case that $h_j$’s GS-list contains fewer than $k$ entries.
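The initial domains can be generated directly from the preference lists. The Python sketch below is illustrative only: it assumes residents and hospitals are numbered by their order in the input dictionaries, represents domains as Python sets, and uses the hypothetical name `initial_domains`.

```python
def initial_domains(res_prefs, hosp_prefs, capacity):
    """Initial domains of the x_i and y_{j,k} variables of the direct model."""
    n, m = len(res_prefs), len(hosp_prefs)
    # dom(x_i) = {1, ..., l^r_i} ∪ {m + 1}
    dom_x = {i: set(range(1, len(prefs) + 1)) | {m + 1}
             for i, prefs in enumerate(res_prefs.values(), start=1)}
    dom_y = {}
    for j, (h, prefs) in enumerate(hosp_prefs.items(), start=1):
        dom_y[j, 0] = {0}                       # dummy variable y_{j,0}
        for k in range(1, capacity[h] + 1):
            # dom(y_{j,k}) = {k, ..., l^h_j} ∪ {n + k}
            dom_y[j, k] = set(range(k, len(prefs) + 1)) | {n + k}
    return dom_x, dom_y


# Hypothetical instance: two residents, one hospital with two posts.
dom_x, dom_y = initial_domains({"r1": ["h1"], "r2": ["h1"]},
                               {"h1": ["r1", "r2"]},
                               {"h1": 2})
print(dom_x)
print(dom_y)
```

Note that the second post starts at value 2: the $k$th post can never be filled by one of the hospital’s first $k - 1$ choices.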

More specifically, if $\min(\mathrm{dom}(x_i)) \ge p$ ($1 \le p \le l^r_i$), then during the RGS algorithm, $r_i$ applies to his $p$th-choice hospital or worse, so that in $M_0$, either $r_i$ is assigned to such a hospital or is unassigned. Similarly if $\max(\mathrm{dom}(x_i)) \le p$, then during the HGS algorithm, $r_i$ was offered a post by his $p$th-choice hospital or better, so that $r_i$ is assigned to such a hospital in $M_z$.

From the hospitals’ point of view, if $\min(\mathrm{dom}(y_{j,k})) \ge q$ ($1 \le q \le l^h_j$), then during the HGS algorithm, $h_j$ offers its $k$th post to its $q$th-choice resident or worse, so that in $M_z$, either $h_j$’s $k$th post is filled by such a resident, or is unfilled. Similarly if $\max(\mathrm{dom}(y_{j,k})) \le q$, then during the RGS algorithm, some resident $r_i$ applied to $h_j$’s $k$th post, where $\mathrm{rank}(h_j, r_i) \le q$, so that $h_j$’s $k$th post is filled by $r_i$ or better in $M_0$.

The constraints in $J$ are given in Figure 3 (in the context of Constraints 2-5, $p$ denotes the rank of $h_j$ in $r_i$’s list and $q$ denotes the rank of $r_i$ in $h_j$’s list). An interpretation of the constraints is now given. Constraint 1 ensures that $h_j$’s filled posts are occupied by residents in preference order, and that if post $k - 1$ is unfilled then so is post $k$. Constraint 2 states that if $h_j$’s $k$th post is filled by a resident no better than $r_i$ or is unfilled, then $r_i$ must be assigned to a hospital no worse than $h_j$. Constraints 3 and 5 reflect the consistency of deletions carried out by the HGS and RGS algorithms respectively (i.e. if $h_j$ is deleted from $r_i$’s list, then $r_i$ is deleted from $h_j$’s list, and vice versa). Finally Constraint 4 states that if $r_i$ is assigned to a hospital no better than $h_j$ or is unassigned, and $h_j$’s first $k - 1$ posts are filled by residents better than $r_i$, then $h_j$’s $k$th post must be filled by a resident at least as good as $r_i$.

It turns out that establishing AC in $J$ yields a set of domains that correspond to the GS-lists in $I$. To demonstrate this, we define some additional notation. For each $j$ ($1 \le j \le m$), define $S_j = \{\mathrm{rank}(h_j, r_i) : r_i \in GS(h_j)\}$. Let $d_j$ denote the number of residents assigned to hospital $h_j$ in any stable matching in $I$. For each $k$ ($1 \le k \le d_j$), let $q_{j,k} = \mathrm{rank}(h_j, (M_z)_k(h_j))$ and $t_{j,k} = \mathrm{rank}(h_j, (M_0)_k(h_j))$. The GS-domains for the variables in $J$ are defined as follows:

\[
\mathrm{dom}(x_i) =
\begin{cases}
\{\mathrm{rank}(r_i, h_j) : h_j \in GS(r_i)\}, & \text{if } GS(r_i) \ne \emptyset \\
\{m + 1\}, & \text{otherwise}
\end{cases}
\]

\[
\mathrm{dom}(y_{j,k}) =
\begin{cases}
\{s \in S_j : q_{j,k} \le s \le t_{j,k}\}, & \text{if } 1 \le k \le d_j \\
\{n + k\}, & \text{if } d_j + 1 \le k \le c_j.
\end{cases}
\]

We prove in [18] (we omit the proof here for space reasons) that, following AC propagation in $J$, the domain of each variable is a subset of its GS-domain, and conversely, the GS-domains are arc consistent in $J$. Given that AC algorithms find the unique maximal set of arc consistent domains [2], we therefore have:

Theorem 3. Let $I$ be an instance of HR, and let $J$ be a CSP instance obtained by the encoding of this section. Then the domains remaining after AC propagation in $J$ correspond exactly to the GS-lists in $I$.

For example, in the context of the HR instance given in Figure 1, the GS-domains for $x_2$, $y_{1,1}$ and $y_{1,2}$ are $\{1, 3, 4\}$, $\{1\}$ and $\{3, 4\}$ respectively. In general, following AC propagation in $J$, matchings $M_0$ and $M_z$ may be obtained as follows. Let $x_i \in X$. If $\mathrm{dom}(x_i) = \{m + 1\}$, resident $r_i$ is unassigned in both $M_0$ and $M_z$. Otherwise, in $M_0$ (respectively $M_z$), $r_i$ is assigned to the hospital $h_j$ such that $\mathrm{rank}(r_i, h_j) = p$, where $p = \min(\mathrm{dom}(x_i))$ (respectively $p = \max(\mathrm{dom}(x_i))$).
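Reading the two matchings off the propagated domains is a one-pass operation. The Python sketch below assumes the arc-consistent domains of the resident variables are already available as sets keyed by resident name (no propagation is performed here); `matchings_from_domains` is a hypothetical helper name. The data for r1 and r2 are taken from Figure 1 and the GS-domain of $x_2$ quoted above.

```python
def matchings_from_domains(dom_x, res_prefs, m):
    """Extract (M0, Mz) from the arc-consistent domains of the x variables."""
    m0, mz = {}, {}
    for r, dom in dom_x.items():
        if dom == {m + 1}:                        # empty GS-list: unassigned
            m0[r] = mz[r] = None
        else:
            m0[r] = res_prefs[r][min(dom) - 1]    # first hospital on the GS-list
            mz[r] = res_prefs[r][max(dom) - 1]    # last hospital on the GS-list
    return m0, mz


# r1 and r2 of Figure 1 (m = 5 hospitals).
res_prefs = {"r1": ["h1", "h3"], "r2": ["h1", "h5", "h4", "h3"]}
dom_x = {"r1": {6}, "r2": {1, 3, 4}}
m0, mz = matchings_from_domains(dom_x, res_prefs, m=5)
print(m0, mz)
```

For r2 this yields h1 in M0 and h3 in Mz, in agreement with the middle columns of Figure 1.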

In the context of the time complexity function for establishing AC as mentioned in Section 3, for this encoding we have $e = O(Lc)$ and $d = O(n + m)$ (recall that $L$ is the total length of the residents’ preference lists in $I$). The constraints shown in Figure 3 may be revised in $O(1)$ time, assuming that upper and lower bounds for the variables’ domains are maintained throughout propagation. It follows by [27] that the time complexity for establishing AC in this model is $O(Lc(n + m))$. Since the space complexity is $O(Lc)$, the model presented in this section is more efficient than the cloned model in terms of both time and space.

The next result, proved in [18] (we also omit the proof here), states that the encoding presented above can be used to enumerate all the solutions of $I$ in a failure-free manner using AC propagation with a value-ordering heuristic.


Theorem 4. Let $I$ be an instance of HR and let $J$ be a CSP instance obtained by the encoding of this section. Then the following search process enumerates all solutions in $I$ without repetition and without ever failing due to an inconsistency:

– AC is established as a preprocessing step, and after each branching decision including the decision to remove a value from a domain;
– if all domains are arc consistent and some variable $x_i$ has two or more values in its domain then search proceeds by setting $x_i$ to the minimum value $p$ in its domain. On backtracking, the value $p$ is removed from the domain of $x_i$;
– when a solution is found, it is reported and backtracking is forced.

5 A specialised binary constraint

We now present a specialised binary constraint HR2 that acts between an integer variable, representing a resident, and an object of type Hospital, enforcing stability and consistency. The model of this section involves an HR2 constraint between each acceptable (resident, hospital) pair.

5.1 Preliminaries

Our model involves a constrained integer variable $x_i$ corresponding to each resident $r_i \in R$, as in Section 4, whose domain is initially defined as before, with similar meanings for the domain values. In addition, we associate a Hospital object $y_j$ with each hospital $h_j \in H$, with the following attributes:

– cap: an integer constant equal to $c_j$ (the capacity of hospital $h_j$).
– post: an array of integers of length cap, which stores assignments to hospital posts. Each array element is initialised to ∞ (i.e. the largest integer).
– pref: a constrained integer variable whose initial domain is $\{1, 2, \ldots, l^h_j\}$ (corresponding to the ranks of residents in $h_j$’s list), plus the value $n + 1$ (corresponding to $h_j$ being under-subscribed).

We also assume that we have the following functions, each being of $O(1)$ complexity, that operate over constrained integer variables:

– getMin(v) delivers the smallest value in dom(v).
– getMax(v) delivers the largest value in dom(v).
– getNext(v, a) returns the smallest value greater than a in dom(v), assuming that a < getMax(v), otherwise the function returns a.
– setMax(v, a) sets the maximum value in dom(v) to be min(getMax(v), a).
– remVal(v, a) removes the value a from dom(v).

We assume that constraints are processed by an arc consistency algorithm such as AC5 [27] or AC3 [15]. That is, the algorithm has a stack of constraints that are awaiting revision, and if a variable v loses a value then all constraints involving v are added to the stack along with the method that must be applied to those constraints (so that the stack contains methods and their arguments). Furthermore, we also assume that a call to a method, together with its argument, is only added to the stack if it is not already on the stack. In our pseudocode below we use the . (dot) operator as an attribute selector, such that a.b delivers the b attribute of a.

(a)
1. xAppliesTo(y,yRx)
2.   r = yRx;
3.   for (i = 1 to y.cap)
4.     if (y.post[i] = r)
5.       return;
6.     if (y.post[i] > r)
7.       swap(y.post[i],r);
8.   if (y.post[y.cap] < ∞)
9.     setMax(y.pref,y.post[y.cap]);

(b)
1. getLastChoice(y)
2.   choice = getMin(y.pref);
3.   for (i = 2 to y.cap)
4.     choice = getNext(y.pref,choice);
5.   return choice;

Fig. 4. (a) Method xAppliesTo. (b) Method getLastChoice.

The xAppliesTo method of Figure 4(a) is called when a resident $r_i$ (represented by variable x) applies to a hospital $h_j$ (represented by object y). In the pseudocode we assume that yRx represents $\mathrm{rank}(h_j, r_i)$. The method stores all assignments involving hospital $h_j$ in strict preference order, with the most-preferred resident in y.post[1]. The method loops through each element of the y.post array (lines 3 to 7). If $r_i$ is already in the list of $h_j$’s assignees then no action is taken (lines 4 and 5). If the current value of r (which is initially $\mathrm{rank}(h_j, r_i)$) is less than the value in y.post[i] (line 6), then the value in r is swapped with the value in y.post[i] (line 7) and the loop continues, so that the value of r is inserted in order into the y.post array. On termination of the loop, if the last element of y.post has been assigned a value (line 8), then $h_j$ is assigned $c_j$ residents, and consequently we can set the maximum value of y.pref (line 9). This method contains only one loop, which iterates $c_j$ times, and all methods used are of $O(1)$ complexity. Hence the complexity of xAppliesTo is $O(c)$.

A hospital $h_j$ (represented by object y) offers a post to a resident $r_i$ (represented by variable x) if $r_i$ occupies one of the first $c_j$ undeleted entries in $h_j$’s preference list. Correspondingly, y offers a post to x if $\mathrm{rank}(h_j, r_i)$ is one of the first y.cap values in dom(y.pref). To test for this condition we use the getLastChoice method of Figure 4(b), which returns $h_j$’s rank of the worst resident that it can currently offer a post to. Firstly the lowest value in dom(y.pref) is found (line 2). The loop then iterates to find the r-th smallest value in dom(y.pref), where r = y.cap (lines 3 and 4). This value is then returned via variable choice (line 5). The time complexity of this method is again $O(c)$.
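The two methods of Figure 4 can be exercised in isolation. The following Python sketch is a simplification made for illustration: the domain of pref is a plain Python set rather than a constrained variable (so setMax becomes a set comprehension and no propagation is triggered), arrays are 0-based, and `math.inf` stands in for ∞. The example hospital has the capacity and list length of h1 in Figure 1.

```python
import math


class Hospital:
    """Toy Hospital object with the cap, post and pref attributes."""
    def __init__(self, cap, list_len, n):
        self.cap = cap
        self.post = [math.inf] * cap                      # post[0] is the best assignee
        self.pref = set(range(1, list_len + 1)) | {n + 1}


def x_applies_to(y, y_r_x):
    """Insert rank y_r_x into y.post in order (Figure 4(a))."""
    r = y_r_x
    for i in range(y.cap):
        if y.post[i] == r:
            return                                        # already an assignee
        if y.post[i] > r:
            y.post[i], r = r, y.post[i]                   # swap and carry on
    if y.post[-1] < math.inf:                             # hospital is full
        y.pref = {v for v in y.pref if v <= y.post[-1]}   # setMax


def get_last_choice(y):
    """Rank of the worst resident y can currently offer a post to (Figure 4(b))."""
    return sorted(y.pref)[:y.cap][-1]


h1 = Hospital(cap=2, list_len=7, n=8)
for rank in (7, 4, 1, 4):        # r1, r2, r3 apply, then r2 applies again
    x_applies_to(h1, rank)
print(h1.post, sorted(h1.pref), get_last_choice(h1))
```

After the three distinct applications the hospital holds its rank-1 and rank-4 residents, and the tail of its list beyond rank 4 has been cropped; the repeated application changes nothing.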

5.2 The HR2 constraint

A binary Hospitals / Residents constraint (HR2) is an object that acts between a variable x (representing a resident $r_i \in R$) and an object y (representing a hospital $h_j \in H$), and has attributes x, y, xRy and yRx. Here, yRx is as above (representing $\mathrm{rank}(h_j, r_i)$), whilst xRy represents $\mathrm{rank}(r_i, h_j)$.


(a)
1. deltaX(C)
2.   if getMin(C.x) = C.xRy
3.     xAppliesTo(C.y,C.yRx);
4.   if getMax(C.x) < C.xRy
5.     remVal(C.y.pref,C.yRx);

(b)
1. deltaY(C)
2.   if C.yRx ≤ getLastChoice(C.y)
3.     setMax(C.x,C.xRy);
4.   if getMax(C.y.pref) < C.yRx
5.     remVal(C.x,C.xRy);

Fig. 5. (a) Method deltaX(C). (b) Method deltaY(C).

Therefore a constraint C between $x_i$ and $y_j$ is constructed via a call to the function C = HR2($x_i$, $\mathrm{rank}(r_i, h_j)$, $y_j$, $\mathrm{rank}(h_j, r_i)$). This will construct a constraint C such that C.x = $x_i$, C.y = $y_j$, C.xRy = $\mathrm{rank}(r_i, h_j)$ and C.yRx = $\mathrm{rank}(h_j, r_i)$. To construct our encoding we would then make calls to HR2 for all $i$ and $j$ where $r_i$ and $h_j$ find each other acceptable, thus creating $O(nm)$ constraints.

Three methods, deltaX, deltaY, and init, act on a constraint C and achieve arc consistency between a resident x and hospital y across C. The deltaX method, shown in Figure 5(a), is called when a value is removed from dom(x). If $r_i$’s most-preferred undeleted hospital is $h_j$ (line 2) then $r_i$ applies to $h_j$ (line 3). In the call to xAppliesTo, $r_i$ becomes assigned to $h_j$ if the assignment has not already been made (line 7 of xAppliesTo), and if $h_j$ is now full, then the tail of $h_j$’s preference list is cropped (line 9 of xAppliesTo), and this will in turn generate a call to deltaY (described below). If $r_i$ prefers his worst undeleted hospital to $h_j$ (line 4), then $h_j$ has been deleted from $r_i$’s preference list, and consequently $r_i$ is deleted from $h_j$’s list (line 5). This in turn will generate a call to deltaY, which is now described.

The deltaY method, shown in Figure 5(b), is called when a value is removed from dom(y.pref). If resident $r_i$ is among the first $c_j$ undeleted residents on $h_j$’s preference list (line 2), then $r_i$ need consider no hospital that it finds inferior to $h_j$ (line 3). This action may delete values from the domain of x and subsequently generate calls to deltaX. If $h_j$ prefers its worst undeleted resident to $r_i$ (line 4), then $r_i$ has been deleted from $h_j$’s preference list, and consequently $h_j$ is deleted from $r_i$’s list (line 5). This may then generate calls to deltaX. Note also that lines 4 and 5 of deltaY are symmetrical to lines 4 and 5 in deltaX.

Finally, the init(C) method is called to start the process of making constraint C arc consistent, and makes calls to the deltaX(C) and deltaY(C) methods.
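To see how deltaX and deltaY interact, the following self-contained Python sketch wires the methods of Figures 4 and 5 to a very small propagation loop. It is a toy under several assumptions, not the authors’ toolkit code: domains are Python sets (so the operations are not $O(1)$), the worklist is a FIFO `deque` rather than a stack, init(C) is modelled by queueing deltaX(C) and deltaY(C), and all class and function names are hypothetical.

```python
import math
from collections import deque

queue = deque()          # pending (method, constraint) calls


class IntVar:
    """Toy constrained integer variable; a change re-queues its watchers."""
    def __init__(self, values):
        self.dom, self.watchers = set(values), []

    def _changed(self):
        for call in self.watchers:
            if call not in queue:        # a call is queued at most once
                queue.append(call)

    def get_min(self): return min(self.dom)
    def get_max(self): return max(self.dom)
    def get_next(self, a): return min((v for v in self.dom if v > a), default=a)

    def set_max(self, a):
        if self.get_max() > a:
            self.dom = {v for v in self.dom if v <= a}
            self._changed()

    def rem_val(self, a):
        if a in self.dom:
            self.dom.remove(a)
            self._changed()


class Hospital:
    def __init__(self, cap, list_len, n):
        self.cap, self.post = cap, [math.inf] * cap
        self.pref = IntVar(list(range(1, list_len + 1)) + [n + 1])


def x_applies_to(y, yRx):
    r = yRx
    for i in range(y.cap):
        if y.post[i] == r:
            return
        if y.post[i] > r:
            y.post[i], r = r, y.post[i]
    if y.post[-1] < math.inf:
        y.pref.set_max(y.post[-1])


def get_last_choice(y):
    choice = y.pref.get_min()
    for _ in range(2, y.cap + 1):
        choice = y.pref.get_next(choice)
    return choice


def delta_x(C):
    if C.x.get_min() == C.xRy:
        x_applies_to(C.y, C.yRx)
    if C.x.get_max() < C.xRy:
        C.y.pref.rem_val(C.yRx)


def delta_y(C):
    if C.yRx <= get_last_choice(C.y):
        C.x.set_max(C.xRy)
    if C.y.pref.get_max() < C.yRx:
        C.x.rem_val(C.xRy)


class HR2:
    def __init__(self, x, xRy, y, yRx):
        self.x, self.xRy, self.y, self.yRx = x, xRy, y, yRx
        x.watchers.append((delta_x, self))
        y.pref.watchers.append((delta_y, self))
        queue.extend([(delta_x, self), (delta_y, self)])   # stands in for init(C)


def propagate():
    while queue:
        method, C = queue.popleft()
        method(C)


# Hypothetical instance: r1 and r2 both rank only h1; h1 (capacity 1) ranks r1, r2.
x1, x2 = IntVar({1, 2}), IntVar({1, 2})      # value m + 1 = 2 means "unassigned"
h1 = Hospital(cap=1, list_len=2, n=2)
HR2(x1, 1, h1, 1)
HR2(x2, 1, h1, 2)
propagate()
print(x1.dom, x2.dom, h1.pref.dom)
```

On this instance propagation leaves r1 with h1 as his only option, collapses r2’s domain to the value $m + 1$, and crops h1’s list to its first entry, which matches the GS-lists one would compute by hand.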

5.3 Complexity

The deltaX method has no loops and thus its time complexity is that of the most complex method it calls, which is the xAppliesTo method with a complexity of $O(c)$; consequently deltaX has a complexity of $O(c)$. Similarly the complexity of the deltaY method is that of the most complex method it calls, which is getLastChoice, with a complexity of $O(c)$. Both of the methods called by init thus have a time complexity of $O(c)$, and hence init’s complexity is also $O(c)$.

Each HR2 constraint C has three methods. The init(C) method will be called only once and is of complexity $O(c)$. The deltaX(C) method can at worst be called once for each value in the domain of C.x. As the maximum length of a resident’s preference list is $m$, and deltaX(C) has a complexity of $O(c)$, the combined worst case complexity of all possible calls to deltaX(C) is $O(mc)$. Similarly deltaY(C) can at worst be called once for each of the $n$ possible values in the domain of C.y.pref. As deltaY(C) has a complexity of $O(c)$, the combined worst case complexity of all possible calls to deltaY(C) is $O(nc)$. Therefore the overall worst case time complexity for a single constraint is $O(c(m + n))$, and as there are $L$ of the HR2 constraints, the overall time complexity of enforcing arc consistency on this model is $O(Lc(n + m))$, which is the same as the time complexity for the model of Section 4. Furthermore, as there are $O(nm)$ HR2 constraints, each of size $O(1)$, the space complexity of a model using the HR2 constraint is $O(nm)$.

6 A specialised n-ary constraint

We now present a specialised n-ary constraint HRN for the Hospitals / Residents problem. This constraint acts between an array of integer variables, x[1], . . . , x[n], representing the residents (as before), and an array of objects of type Hospital, y[1], . . . , y[m], representing the hospitals (again, as before). (Strictly speaking the arity of the HRN constraint is $n + m$, but for simplicity we refer to it as an n-ary constraint.) A model based on HRN requires only one constraint for the whole problem. Henceforth we assume that we have access to the Hospital class and all the same functions as with the binary constraint defined in Section 5.

6.1 The Constraint

An n-ary Hospitals / Residents constraint (HRN) is an object that acts between an array of residents and an array of hospitals, and has the following attributes:

– x is an array of constrained integer variables representing the residents, such that resident $r_i \in R$ is represented by x[i].
– y is an array of objects of type Hospital representing the hospitals, such that hospital $h_j \in H$ is represented by y[j].
– xRy is an $n \times m$ integer array such that xRy[i][j] = $\mathrm{rank}(r_i, h_j)$ if $r_i$ finds $h_j$ acceptable, and is 0 otherwise.
– yRx is an $m \times n$ integer array such that yRx[j][i] = $\mathrm{rank}(h_j, r_i)$ if $h_j$ finds $r_i$ acceptable, and is 0 otherwise.
– xpl is an $n \times m$ integer array such that, for each $i$ ($1 \le i \le n$) and $k$ ($1 \le k \le l^r_i$), xpl[i][k] = j if and only if $\mathrm{rank}(r_i, h_j) = k$.
– ypl is an $m \times n$ integer array such that, for each $j$ ($1 \le j \le m$) and $k$ ($1 \le k \le l^h_j$), ypl[j][k] = i if and only if $\mathrm{rank}(h_j, r_i) = k$.
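The four lookup arrays can be filled in a single pass over the preference lists. In the Python sketch below the input format is an assumption made for illustration: `res_prefs[i-1]` lists the 1-based hospital indices ranked by $r_i$, and similarly for `hosp_prefs`. Row and column 0 of each array are unused padding so that the 1-based indexing of the text carries over; `build_rank_arrays` is a hypothetical name.

```python
def build_rank_arrays(res_prefs, hosp_prefs):
    """Return (xRy, yRx, xpl, ypl) as 1-based nested lists (index 0 is padding)."""
    n, m = len(res_prefs), len(hosp_prefs)
    xRy = [[0] * (m + 1) for _ in range(n + 1)]   # xRy[i][j] = rank(r_i, h_j) or 0
    yRx = [[0] * (n + 1) for _ in range(m + 1)]   # yRx[j][i] = rank(h_j, r_i) or 0
    xpl = [[0] * (m + 1) for _ in range(n + 1)]   # xpl[i][k] = j iff rank(r_i, h_j) = k
    ypl = [[0] * (n + 1) for _ in range(m + 1)]   # ypl[j][k] = i iff rank(h_j, r_i) = k
    for i, prefs in enumerate(res_prefs, start=1):
        for k, j in enumerate(prefs, start=1):
            xRy[i][j] = k
            xpl[i][k] = j
    for j, prefs in enumerate(hosp_prefs, start=1):
        for k, i in enumerate(prefs, start=1):
            yRx[j][i] = k
            ypl[j][k] = i
    return xRy, yRx, xpl, ypl


# Hypothetical instance with n = 2 residents and m = 3 hospitals.
xRy, yRx, xpl, ypl = build_rank_arrays([[1, 3], [3, 1, 2]],
                                       [[2, 1], [2], [1, 2]])
print(xRy[1][3], xRy[1][2], xpl[2][1], yRx[1][2], ypl[3][2])
```

The arrays xRy and xpl are inverse views of the same data (rank by hospital index, and hospital index by rank), which is what lets deltaX and deltaY move between a removed domain value and the entity it denotes in constant time.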

Again, we have three methods that act on an n-ary constraint C, namely deltaX, deltaY and init. The deltaX method, shown in Figure 6(a), is called when a value a, where a < $m + 1$, is removed from dom(x[i]). If a is the rank of a hospital $h_k$ that $r_i$ prefers to his most-preferred undeleted hospital (line 2) (i.e. $r_i$ has been rejected by $h_k$), the index j of $r_i$’s new favourite hospital is found (lines 3 and 4) and $r_i$ applies to $h_j$ (line 5). This may result in a subsequent call to deltaY via the xAppliesTo method. If the rank of $r_i$’s most-preferred undeleted hospital is not larger than a, the hospital $h_j$ at position a of $r_i$’s list is found (line 7), and $r_i$ is deleted from $h_j$’s preference list (line 8). This will generate a call to deltaY(C, j, C.yRx[j][i]), which is now described.

(a)
1. deltaX(C,i,a)
2.   if (a < getMin(C.x[i]))
3.     k = getMin(C.x[i]);
4.     j = C.xpl[i][k];
5.     xAppliesTo(C.y[j],C.yRx[j][i]);
6.   else
7.     j = C.xpl[i][a];
8.     remVal(C.y[j].pref,C.yRx[j][i]);

(b)
1. deltaY(C,j,a)
2.   if (a > getMax(C.y[j].pref))
3.     i = C.ypl[j][a];
4.     remVal(C.x[i],C.xRy[i][j]);
5.   else
6.     k = getMin(C.y[j].pref);
7.     for (z=1 to C.y[j].cap)
8.       i = C.ypl[j][k];
9.       setMax(C.x[i],C.xRy[i][j]);
10.      k = getNext(C.y[j].pref,k);

Fig. 6. (a) Method deltaX. (b) Method deltaY.

The deltaY method, shown in Figure 6(b), is called when a value a, wherea < n + 1, is removed from dom(y[j].pref). If the removed value a is largerthan the rank of hj ’s worst undeleted resident (line 2), then the resident ri atposition a of hj ’s list is found (line 3), and hj is deleted from ri’s preference list(line 4). This will in turn generate a call to deltaX(C, i, C.xRy[i][j]). If a is notlarger than the rank of hj ’s worst undeleted resident (line 5), then hj will offera post to the first cj undeleted residents on its list (lines 6 to 10). Lines 6 and 8identify the most-preferred undeleted resident ri and his corresponding rank kin hj ’s list. All hospitals inferior to hj are then deleted from ri’s list (line 9). Wethen identify the next undeleted resident in hj ’s list (line 10) whilst respectinghj ’s capacity (controlled by the loop condition in line 7). Essentially, lines 6to 10 reconstruct the offers from hospital hj following the removal of a fromdom(y[j].pref). Note that the call to setMax in line 9 may in turn generatecalls to deltaX. Therefore the propagation of this constraint results from themutual recursion between methods deltaX and deltaY .

Finally the init method makes calls to deltaX(C, i, 0) for all i (1 ≤ i ≤ n), and deltaY(C, j, 0) for all j (1 ≤ j ≤ m).

6.2 Complexity

The deltaX method of this section contains no loops, but calls the xAppliesTo() method which has a complexity of O(c), and thus deltaX also has a complexity of O(c). The deltaY method contains only one loop, which iterates c_j times, and all methods used run in O(1) time. Therefore the time complexity of deltaY is also O(c). The deltaX method can be called at most once for each value in the domain of an x[i] variable, and similarly deltaY can be called at most once for each value in the domain of the pref attribute of a y[j] variable. Therefore


Model:   Cloned            CBM             HR2             HRN
Time:    O((n + m)^4 c)    O(Lc(n + m))    O(Lc(n + m))    O(Lc)
Space:   O((nmc)^2)        O(Lc)           O(nm)           O(nm)

Table 1. Summary of time and space complexities for the HR models of this paper.

we have a time complexity of O(Lc). Hence the time complexity for the HRN constraint improves those of the models presented in earlier sections. The space complexity of this encoding is dominated by the ranking arrays xRy and yRx, and is O(nm), though comparable to that of the model presented in Section 5. However, if preference lists are short we may economically trade time for space, or use some sparse data structure, or a hash table to map preferences to indices.

Table 1 summarises the time and space complexities for the HR models in this paper (the columns refer respectively to the models in Sections 3, 4, 5 and 6).

6.3 Searching for all solutions, using HR2 or HRN

Arc consistency processing on the HR2 and HRN constraints yields the GS-domains as defined in Section 4. A search process need only consider the resident variables (and need not instantiate the hospital variables), following a similar process to that outlined in Theorem 4. Because the search process will backtrack, the variable y[j].post would need to be reversible, in order that values corresponding to assignment information can be restored on backtracking.

Until now we have assumed that values are removed only as a result of arc consistency processing. This is not true with the backtracking search. Consequently we require minor modifications to our methods. For the HR2 constraint the deltaX method needs to consider the case when C.xRy < getMin(C.x), i.e. r_i prefers h_j to each undeleted hospital on his preference list. Therefore to prevent (r_i, h_j) being a blocking pair, h_j must be full and must prefer its worst resident to r_i, i.e. we then make a call to setMax(C.y, C.yRx − 1).

For the n-ary constraint HRN, deltaX must consider the case where the deleted value a is less than the smallest remaining value in the domain of C.x[i], i.e. a < getMin(C.x[i]). Therefore again, to prevent (r_i, h_j) being a blocking pair (where j = C.xpl[i][a]), we make the call setMax(C.y[j].pref, C.yRx[j][i] − 1).

7 Computational experience

The four encodings presented in this paper were implemented using the JSolver toolkit, i.e. the Java version of ILOG Solver, in order to carry out an empirical analysis. The objective was to compare the runtimes for these models as applied to randomly-generated and real-world data. Our studies were carried out using a 2.8 GHz Pentium 4 processor with 512 MB of RAM, running Microsoft Windows XP Professional and Java2 SDK 1.4.2.6 with an increased heap size of 512 MB.


n/m/c     50/13/4   100/20/5   500/63/8   1k/100/10   5k/250/20   20k/550/37   50k/1.2k/42
Cloned    5.84      −          −          −           −           −            −
CBM       0.24      0.36       1.69       4.75        −           −            −
HR2       0.15      0.18       0.42       0.88        9.91        112          −
HRN       0.12      0.15       0.19       0.22        0.53        1.42         4.2

Table 2. Average computation times in seconds to find all solutions to 100 randomly-generated HR instances with attributes n/m/c.

Random problem instances were generated with varying number of residents n, number of hospitals m, capacity c (uniform for each hospital), and a fixed residents’ preference list size of 10. Hence we classify problems via the triple n/m/c. Instances were generated as follows. First, a uniformly random preference list of length 10 was produced for each resident, then a preference list was produced for each hospital by randomly permuting their acceptable residents. A sample size of 100 was used for each value of n/m/c.
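The generation procedure described above can be sketched as follows. This is only an illustrative reading of the text, not the generator used for the experiments: the function name `generate_hr_instance`, the `seed` parameter, the 0-based indexing, and the choice to draw each resident's list without replacement are all assumptions.

```python
import random


def generate_hr_instance(n, m, c, list_len=10, seed=None):
    """Return (resident_prefs, hospital_prefs, capacities) for a random HR instance.

    Assumption: each resident's list is a uniform sample, without
    replacement, of `list_len` distinct hospitals (or all m if m is smaller).
    """
    rng = random.Random(seed)
    k = min(list_len, m)

    # Step 1: a random preference list of length k for every resident.
    resident_prefs = [rng.sample(range(m), k) for _ in range(n)]

    # Step 2: a hospital's acceptable residents are those who listed it.
    acceptable = [[] for _ in range(m)]
    for r, prefs in enumerate(resident_prefs):
        for h in prefs:
            acceptable[h].append(r)

    # Step 3: each hospital ranks its acceptable residents in random order.
    hospital_prefs = []
    for residents in acceptable:
        rng.shuffle(residents)
        hospital_prefs.append(residents)

    # Capacity is uniform across hospitals, as in the n/m/c classification.
    capacities = [c] * m
    return resident_prefs, hospital_prefs, capacities
```

Calling `generate_hr_instance(50, 13, 4, seed=1)` would give one instance of the smallest class in Table 2; repeating the call with 100 different seeds would mirror the stated sample size.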

Table 2 shows the mean time in seconds to construct the model and find all solutions, for each of the four models applied to random instances with varying n/m/c attributes. A table entry of − signifies that there was insufficient space to create the model of that size using the specified encoding. Table 3 shows the time to establish AC (shown as “AC”) and find all solutions (shown as “ALL”) to three anonymised HR instances arising from SPA [11]. The first column indicates n/m/c, where c is the average hospital capacity; also lri ≤ 5 in each case. (For each instance, the Cloned model ran out of memory.)

The results indicate that the HRN model was typically able to handle larger problem instances than the other models, and the average runtime was faster than for the other models in all cases. The HRN model was also applied to instances as large as 500k/11.8k/85, finding all solutions on average in 35 seconds. As mentioned in the Introduction, instances of the NRMP typically involve around 31,000 residents and 2,300 hospitals, with residents’ preference lists of size between 4 and 7 [20]. The HRN model finds all solutions to problems of size 200k/3k/67 in 22 seconds on average. This leads us to believe that Constraint Programming is indeed a suitable technology for the HR problem.

                             CBM            HR2            HRN
n/m/c          # Solutions   AC     ALL     AC     ALL     AC     ALL
502/41/13.2    1             1.61   1.64    0.26   0.28    0.17   0.17
510/43/11.5    1             1.64   1.7     0.27   0.31    0.17   0.17
245/34/3.9     1             0.26   0.26    0.14   0.16    0.12   0.12

Table 3. Time taken to establish AC and find all solutions to three SPA instances.


8 Conclusions and future work

In this paper we have presented four CP models of an HR instance. The empirical results for the models as presented in Section 7 are broadly in line with what may be expected, given the summary of time and space complexities presented in Table 1. Our results indicate that, as is the case for SMI [6], CSP encodings of HR are “tractable”, a notion that has been explored in detail by Green and Cohen [9]. However it remains open as to whether there exists a CSP encoding of HR that gives rise to the GS-lists, for which AC may be established in O(L) time and using O(nm) space. The time complexity of O(L) is optimal, since SM is a special case of HR, and a lower bound of Ω(L) holds for the problem of finding a stable matching, given an instance of SM [19].

The natural extension of this work is to build additional constraints on top of one of the models presented here, in order to cope with generalisations of HR for which the RGS and HGS algorithms are inapplicable. Section 1 described three possible variants of HR that are relevant in this context. One of these was the Hospitals / Residents problem with Ties (HRT), which arises when ties are permitted in the preference lists of hospitals and/or residents. For example, a popular hospital may be indifferent among several applicants. The SPA scheme [11] already permits ties in the hospitals’ lists. However it is known [16] that, in the presence of ties, stable matchings can be of different sizes, and the problem of finding a maximum stable matching is NP-hard, even for very restricted instances of SMI with ties. It has already been demonstrated [7, 8] that the earlier encodings of [6] can be extended to the case where preference lists in a given SMI instance may involve ties. We have begun to consider the corresponding extension of the models presented in Sections 4, 5 and 6 to the HRT case, and further details will appear elsewhere.

Acknowledgement

The authors are grateful to ILOG SA for providing access to the JSolver toolkit via an Academic Grant Licence.

References

1. B. Aldershof, O.M. Carducci, and D.C. Lorenc. Refined inequalities for stable marriage. Constraints, 4:281–292, 1999.

2. C. Bessiere and J-C. Regin. Arc consistency for general constraint networks: Preliminary results. In Proceedings of IJCAI ’97, vol. 1, pp. 398–404. Morgan Kaufmann, 1997.

3. Canadian Resident Matching Service. How the matching algorithm works. Web document available at http://www.carms.ca/matching/algorith.htm.

4. D. Gale and L.S. Shapley. College admissions and the stability of marriage. American Mathematical Monthly, 69:9–15, 1962.

5. D. Gale and M. Sotomayor. Some remarks on the stable matching problem. Discrete Applied Mathematics, 11:223–232, 1985.

6. I.P. Gent, R.W. Irving, D.F. Manlove, P. Prosser, and B.M. Smith. A constraint programming approach to the stable marriage problem. In Proceedings of CP ’01, LNCS vol. 2239, pp. 225–239. Springer, 2001.

7. I.P. Gent and P. Prosser. An empirical study of the stable marriage problem with ties and incomplete lists. In Proceedings of ECAI ’02, pp. 141–145. IOS Press, 2002.

8. I.P. Gent and P. Prosser. SAT encodings of the stable marriage problem with ties and incomplete lists. In Proceedings of SAT ’02, pp. 133–140, 2002.

9. M.J. Green and D.A. Cohen. Tractability by approximating constraint languages. In Proceedings of CP ’03, LNCS vol. 2833, pp. 392–406. Springer, 2003.

10. D. Gusfield and R.W. Irving. The Stable Marriage Problem: Structure and Algorithms. MIT Press, 1989.

11. R.W. Irving. Matching medical students to pairs of hospitals: a new variation on a well-known theme. In Proceedings of ESA ’98, LNCS vol. 1461, pp. 381–392. Springer, 1998.

12. A. Kato. Complexity of the sex-equal stable marriage problem. Japan Journal of Industrial and Applied Mathematics, 10:1–19, 1993.

13. D.E. Knuth. Mariages Stables. Les Presses de L’Universite de Montreal, 1976.

14. I.J. Lustig and J. Puget. Program does not equal program: constraint programming and its relationship to mathematical programming. Interfaces, 31:29–53, 2001.

15. A.K. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8:99–118, 1977.

16. D.F. Manlove, R.W. Irving, K. Iwama, S. Miyazaki, and Y. Morita. Hard variants of stable marriage. Theoretical Computer Science, 276(1-2):261–279, 2002.

17. D.F. Manlove and G. O’Malley. Modelling and solving the stable marriage problem using constraint programming. In Proceedings of the Fifth Workshop on Modelling and Solving Problems with Constraints, held at IJCAI ’05, pp. 10–17, 2005.

18. D.F. Manlove, G. O’Malley, P. Prosser and C. Unsworth. A Constraint Programming Approach to the Hospitals / Residents Problem. Technical Report TR-2005-196 of the Computing Science Department of Glasgow University, 2005.

19. C. Ng and D.S. Hirschberg. Lower bounds for the stable marriage problem and its variants. SIAM Journal on Computing, 19:71–77, 1990.

20. National Resident Matching Program. About the NRMP. Web document available at http://www.nrmp.org/about_nrmp/how.html.

21. E. Ronn. NP-complete stable matching problems. J. Algorithms, 11:285–304, 1990.

22. A.E. Roth. The evolution of the labor market for medical interns and residents: a case study in game theory. Journal of Political Economy, 92(6):991–1016, 1984.

23. A.E. Roth. On the allocation of residents to rural hospitals: a general property of two-sided matching markets. Econometrica, 54:425–427, 1986.

24. A.E. Roth and M.A.O. Sotomayor. Two-sided matching: a study in game-theoretic modeling and analysis. Cambridge University Press, 1990.

25. C. Unsworth and P. Prosser. An n-ary constraint for the stable marriage problem. In Proceedings of the Fifth Workshop on Modelling and Solving Problems with Constraints, held at IJCAI ’05, pp. 32–38, 2005.

26. C. Unsworth and P. Prosser. A specialised binary constraint for the stable marriage problem. In Proceedings of SARA ’05, LNAI vol. 3607, pp. 218–233. Springer, 2005.

27. P. van Hentenryck, Y. Deville, and C-M. Teng. A generic arc-consistency algorithm and its specializations. Artificial Intelligence, 57:291–321, 1992.


Optimization Models for Generating Graduation Roadmaps

Avi Dechter1, Rina Dechter2

1 California State University, Northridge

2 University of California, Irvine

Abstract. Most bachelor’s degree granting institutions in the U.S. are “four-year” colleges in name only. The majority of students who enter such colleges take far longer than four years to graduate. In an effort to help students plan their studies more effectively and reduce time-to-degree, the California State University has introduced “graduation roadmaps,” which are sample plans that demonstrate how a particular degree program may be completed within a given amount of time. In this paper we develop integer programming and constraint programming models for facilitating the process of generating graduation roadmaps, which also allow customizing these roadmaps to the preferences of individual students.

INTRODUCTION

A recent report ([Carey, 2004]), citing data collected by the U.S. Department of Education’s Graduation Rate Survey, shows that only 37% of first-time freshmen entering four-year bachelor’s degree programs in American universities and colleges complete their degrees within four years. Only 63% of the students complete their degrees in six years – the more common time frame used to report graduation rates. Graduation rates vary wildly among institutions with overall six-year graduation rates ranging from less than 10% to almost 100%. Understandably, graduation rates and their companion measure of institutional effectiveness – the time to bachelor’s degree completion (time-to-degree) – are of major concern to many universities and colleges.

The California State University, where graduation rates have been significantly lower than national averages (although somewhat better than its peer public institutions), has been under pressure to improve its graduation rates. The overall four-year graduation rate at CSU is just 8% and the six-year graduation rate is 40%. A CSU Task Force on Facilitating Graduation, which had been formed to address this problem, considered many policy options that are available to the institution ([CSU, 2002]). Prominent among the task force’s recommendations was that the CSU campuses should develop 4-year, 5-year, and 6-year “graduation roadmaps” for all academic degree programs. These roadmaps “should be term-by-term depictions of the courses in which students should enroll over the entirety of their academic careers.” The rationale for this recommendation is two-fold: (1) such roadmaps would serve as examples for students to assist them in planning their own individual pathways for graduation, and (2) the development of the roadmaps would require departments to review their curricula and their scheduling practices and to make sure that students can, in fact, graduate on a 4-, 5-, or 6-year timetable.

Pursuing a college degree has many of the characteristics of a project. Like the building of a warehouse or the development of computer software, completing a college degree is a “one-time endeavor with a well-defined end result, defined in terms of identifiable activities that must be accomplished in order to bring the project to completion” [Meredith, 2000]. Enrolling in and successfully completing courses are the activities that define a degree program; course prerequisites are similar to the precedence relationships common among project activities; and the timely completion of the degree, just like most projects, is a primary objective. Table 1 summarizes the parallels between a degree program and a project.

Project                            Degree Program
Activities                         Courses
Precedence relationships           Prerequisite requirements
Resource limitations               Study-load limits
Goal: minimize completion time     Goal: minimize time-to-degree

Table 1. Parallels between a Project and Degree Program

There are, however, two significant differences between completing a college degree and a “standard” project. First, a standard project is defined in terms of a single set of activities, all of which must be accomplished in order to complete the project. In contrast, a given college degree can be earned by taking many different sets of courses. Second, precedence constraints in projects involve a fixed set of activities all of which must be completed before the activity can start. Course prerequisites can often be met by completing different sets of courses. For these reasons, established project planning and scheduling techniques cannot be used directly for degree planning.

In this paper we address the modeling of the degree planning problem using both integer programming and constraint programming. Integer programming (IP) is an extension of linear programming where variables are restricted to take only integer values. Integer programming has been for several decades an established tool for solving combinatorial optimization problems such as resource-constrained project scheduling problems (e.g., [Patterson, 1974], [Stinson, 1978]). Constraint Programming (CP) is an emergent software technology for declaring and solving constraint satisfaction and constrained optimization problems, including scheduling problems ([Baptiste, 2001]). Unlike integer programming, it is not restricted to linear constraints.

The remainder of the paper is organized as follows. In the next section we discuss the structure of academic degree requirements and, with the help of a small example, we demonstrate some of the unique characteristics of degree planning. We then develop both an IP model and CP model for solving this example and demonstrate the advantages of the constraint programming modeling approach. Lastly, we discuss the potential benefits of using optimization technology for degree planning and suggest future research.

THE STRUCTURE OF DEGREE REQUIREMENTS

The requirements for a degree are typically specified in terms of required and elective courses. A required course is a course that every student in a degree program must take to qualify for the degree. Typically the required courses are grouped in clusters of related courses, all of which must be taken. For example, the requirements for the BS degree in accountancy at CSUN include a cluster of required business courses and a cluster of required accounting courses. While usually a required course is a specific course that must be taken, students may be offered more than one way to satisfy a particular requirement. For example, the required business courses for the BS in accountancy include a choice between MKT 304 and MGT 360.

Elective courses are typically specified in terms of “baskets” of courses from which the student must select a specified number of units (or a given number of courses if all the courses in the basket carry the same number of units). For example, the requirements for the BS in Business Administration (BSBA) with option in Marketing at CSUN include a basket of Option Elective Courses consisting of 14 courses (all are 3-unit courses) from which the student must elect 6 units. As in the case of required courses, a basket of elective courses may include a group of courses only one of which may be used to satisfy the total units required from this basket. For example, the BSBA with option in Supply Chain Management allows selecting either BUS 491 or SOM 498 (but not both) as one of the two courses that have to be elected from a basket of 10 Option Elective Courses.

In addition to the course requirements for a degree, the student’s plan must satisfy all course prerequisites. The prerequisite requirements are designed to ensure that courses are taken in a logical order that promotes the student’s learning. As demonstrated above, prerequisite requirements could be quite detailed and involve not just courses, but also minimum grade requirements and passing of examinations. We will assume, however, that the prerequisite requirements for a course specify sets of courses all of which the student must have completed before enrolling in that course. There is often more than one way to satisfy the prerequisite requirements. For example, for MATH 255A the prerequisite requirement is MATH 105 or both MATH 102 and MATH 104. SOM 409, whose prerequisites are given as SOM 306 and either SOM 307 or MATH 340, provides a different example. (Co-requisites for a course are just like prerequisites except that the student may enroll in them contemporaneously with the course.)

A particular course may be part of more than one requirement. For example, ECON 160 (Microeconomics Principles), which is one of the Lower-Division Business Core required courses for all BSBA students at CSUN, is in the basket of elective courses that may be used to satisfy the Social Sciences General Education requirement of the university. Also, a course may be at the same time a prerequisite for another course and a required (or elective) course in a particular degree program. On the other hand, prerequisite requirements may require students to take courses that are not otherwise required by their programs.

To illustrate the problem of degree planning and scheduling, consider the requirements for a small, fictitious degree program as described in Table 2.

Course   Units   Prerequisite Requirements

Prerequisite Courses – Select as needed
C1       4       None
C2       3       None
C3       2       C4
C4       3       None

Required Courses
C5       3       C1
C6       4       C1 and either C2 or C3
  - or -
C7       4       C4
C8       3       C4

Electives – Select at least 6 units from the following
C9       3       C5 and C10
C10      3       C6
C11      3       C7, C8 and C10
  - or -
C12      3       C13
C13      4       C8

Minimum number of units required for degree – 24

Table 2. Sample Requirements for Degree

A network representation of the requirements for our fictitious degree is provided in Figure 1. The nodes of the network represent the courses and the directed arcs represent prerequisite requirements in the obvious way. Additional constructs are used to represent the clustering of the courses and the choices available to students in satisfying the various requirements.

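[Figure: network diagram with one node per course, C1(4), C2(3), C3(2), C4(3), C5(3), C6(4), C7(4), C8(3), C9(3), C10(3), C11(3), C12(3), C13(4), with “Or” nodes marking alternatives; the courses are grouped into Prerequisites, Required Courses, and Electives (min 6 units).]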

Fig. 1. A Network Representation of Degree Requirements

Degree planning requires making two distinct but interdependent decisions: the selection of a subset of courses to take and the scheduling of the selected courses for specific terms. The selected courses must satisfy the degree requirements. The scheduling of the courses must satisfy additional constraints (study load constraints and perhaps limited availability of courses). We refer to a subset of courses that satisfy the requirement for a degree as a degree plan. The total number of different degree plans in this case is 58. Two of these plans are shown in Figure 2, where the courses selected for each plan are highlighted.

[Figure: two copies of the course network, labelled Plan A and Plan B, each with the courses selected for that plan highlighted.]
Fig. 2. Two Course Degree Plans for the Example

Plan A calls for taking 7 courses totaling 24 units. Plan B consists of 8 courses totaling 27 units. However, if there are no limits on the number of units a student may take in a single term, plan B may be completed in fewer terms than plan A. This is because the longest path of highlighted nodes (selected courses) in the graph of plan A consists of 4 courses while the longest path in the graph of plan B consists of only 3 courses. A schedule for completing Plan A in its shortest time-to-degree of 4 terms and a schedule for completing Plan B in its shortest time-to-degree of 3 terms are shown in Figure 3.
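The longest-path argument can be checked with a small sketch. The prerequisite data below is one reading of Table 2 (each course maps to a list of alternative prerequisite sets), and the two course sets are assumptions: `PLAN_B` is the 8-course, 27-unit selection and `PLAN_A` is a 7-course, 24-unit selection consistent with Table 2, since the highlighted figures are not reproduced here. All names are hypothetical.

```python
from functools import lru_cache

# Units per course, as listed in Table 2.
UNITS = {"C1": 4, "C2": 3, "C3": 2, "C4": 3, "C5": 3, "C6": 4, "C7": 4,
         "C8": 3, "C9": 3, "C10": 3, "C11": 3, "C12": 3, "C13": 4}

# Each course maps to alternative prerequisite sets (any one set suffices).
# Courses that are absent from the mapping have no prerequisites.
PREREQS = {
    "C3": [{"C4"}],
    "C5": [{"C1"}],
    "C6": [{"C1", "C2"}, {"C1", "C3"}],
    "C7": [{"C4"}],
    "C8": [{"C4"}],
    "C9": [{"C5", "C10"}],
    "C10": [{"C6"}],
    "C11": [{"C7", "C8", "C10"}],
    "C12": [{"C13"}],
    "C13": [{"C8"}],
}

# Assumed course selections (see the lead-in above).
PLAN_A = {"C1", "C4", "C5", "C7", "C8", "C12", "C13"}
PLAN_B = {"C1", "C2", "C4", "C5", "C6", "C8", "C10", "C13"}


def earliest_terms(plan):
    """Earliest term of each selected course when there is no study-load limit."""
    plan = frozenset(plan)

    @lru_cache(maxsize=None)
    def term(course):
        options = PREREQS.get(course)
        if not options:
            return 1
        # Only prerequisite sets fully contained in the plan can be used.
        usable = [alt for alt in options if alt <= plan]
        if not usable:
            raise ValueError(f"{course} has no satisfied prerequisite set")
        return 1 + min(max(term(p) for p in alt) for alt in usable)

    return {course: term(course) for course in plan}


def min_terms(plan):
    """Length of the longest prerequisite chain within the plan."""
    return max(earliest_terms(plan).values())


def total_units(plan):
    return sum(UNITS[course] for course in plan)
```

Under these assumptions `min_terms(PLAN_A)` is 4 (the chain C4, C8, C13, C12) and `min_terms(PLAN_B)` is 3, while the unit totals are 24 and 27, matching the figures quoted in the text.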

As this example demonstrates, a smaller program (in terms of the number of units needed to complete a degree) is not necessarily shorter. In fact, Plan A is the only 24-unit plan that satisfies the requirements, which means that in order to minimize the time-to-degree the student must take more than 24 units. For this reason, it would generally not be correct to separate the planning decision from the scheduling decision: they must be considered together. For example, a strategy that would focus on first finding minimum-unit programs is not guaranteed to find plans that are optimal from the time-to-degree standpoint. The problem of simultaneously selecting and scheduling courses so as to minimize the length of the program is more complex to model and solve than standard project scheduling problems where the only issue is that of scheduling activities. When resource constraints are added, complexity increases even further. In the next section we develop both an integer programming model and a constraint programming model for solving the graduation roadmap problem.

[Figure: term-by-term schedules (columns Term 1 to Term 4) showing the courses of Plan A and Plan B, each course labelled with its units.]

Fig. 3. – Minimum Length Schedules for the Two Plans

MODELING THE DEGREE PLANNING PROBLEM

Traditionally, resource-constrained project scheduling problems have been modeled as integer programming problems. More recently, constraint programming has been used to model and solve a variety of scheduling problems. It has been claimed (e.g., [Caseau, 1997], [Van Hentenryck, 2002]) that constraint programming offers a more flexible and natural way for expressing “real life” constraints than integer programming. In some cases, constraint programming schemes yield computational advantages as well. In this section we develop, in parallel, an integer programming (IP) model and a constraint programming (CP) model for the fictitious example of the preceding section. By considering both approaches, we wish to determine the effectiveness of each approach to our domain and, at the same time, to provide another real-life benchmark through which the two approaches can be compared and contrasted.

In order to make the example more representative, we assume that no more than 6 units may be scheduled in any term. Both modeling approaches are similar in that they require the specification of (decision) variables, the cost (objective) function, and the constraints.

Decision Variables

The IP model requires the use of binary (zero-one) decision variables indicating whether or not a particular course is scheduled for a particular term. Defining these variables requires specifying a sufficiently large “planning horizon,” T, such that it can be shown that the degree program can be completed in T terms or less. In our example, an easy choice of a value for T is 13, the total number of courses available for selection, which would be the number of terms it would take to complete the program if all the courses are taken and only one course is taken per term. It is easy to show that the degree program can be completed in less than 8 terms so we reduce the value assigned to T to 8. Thus, we define the variables as follows:

$$x_{it} = \begin{cases} 1 & \text{if course } i \text{ is scheduled for term } t \\ 0 & \text{otherwise} \end{cases} \qquad 1 \le i \le 13,\ 1 \le t \le 8$$

It is convenient to define in the IP model an additional binary variable, $y_i$, for each of the courses, indicating whether the course is selected for the program:

$$y_i = \begin{cases} 1 & \text{if course } i \text{ is selected for the program} \\ 0 & \text{otherwise} \end{cases} \qquad 1 \le i \le 13$$

The relationships between the two sets of variables require the following constraints:

$$y_i = \sum_{t=1}^{8} x_{it} \qquad 1 \le i \le 13$$

These constraints also guarantee that each course is scheduled for no more than one term.

The reason the IP model must use binary variables is that they enable, as will be shown later, expressing the prerequisite requirements as linear constraints. Since constraint programming is not restricted to the use of linear constraints, it allows a much more natural choice of decision variables that indicate the term to which each course is scheduled. In the CP model we associate with each course i a variable $s_i$ defined as follows:

$$s_i = \begin{cases} t & \text{if course } i \text{ is scheduled for term } t,\ t = 1, 2, \ldots, 8 \\ 0 & \text{if course } i \text{ is not selected for the program} \end{cases} \qquad 1 \le i \le 13$$
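The relationship between the two encodings can be made concrete with a short sketch that converts a CP assignment of the term variables into the binary IP variables and back. The function names and the dictionary representation are assumptions made for illustration; the sample assignment is arbitrary and is not claimed to satisfy the degree requirements.

```python
N_COURSES = 13  # courses C1..C13
HORIZON = 8     # planning horizon T


def cp_to_ip(s):
    """Map CP variables s[i] in 0..8 to IP variables x[i, t] and y[i]."""
    courses = range(1, N_COURSES + 1)
    terms = range(1, HORIZON + 1)
    # x[i, t] = 1 exactly when course i is scheduled for term t.
    x = {(i, t): int(s[i] == t) for i in courses for t in terms}
    # Linking constraint: y[i] is the sum of x[i, t] over all terms.
    y = {i: sum(x[i, t] for t in terms) for i in courses}
    return x, y


def ip_to_cp(x):
    """Recover s[i] as the sum of t * x[i, t]; this is 0 when course i is not selected."""
    courses = range(1, N_COURSES + 1)
    terms = range(1, HORIZON + 1)
    return {i: sum(t * x[i, t] for t in terms) for i in courses}


# Arbitrary sample assignment: C1 and C4 in term 1, C5 and C8 in term 2.
sample_s = {i: 0 for i in range(1, N_COURSES + 1)}
sample_s.update({1: 1, 4: 1, 5: 2, 8: 2})
sample_x, sample_y = cp_to_ip(sample_s)
```

The sketch shows the size difference between the two models: 13 integer variables in the CP encoding against 13 × 8 = 104 binary variables plus 13 selection variables in the IP encoding.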

The Objective Function

The objective is to minimize the time-to-degree, that is, the value of the index t of the latest term for which any course is scheduled. For this purpose we define, in both modeling approaches, an additional integer variable, Z, to represent this value, so that the objective function is simply stated as

Minimize Z

To guarantee that no course is scheduled later than Z, the IP model must include the following set of constraints:

$$\sum_{t=1}^{8} t\,x_{it} \le Z \qquad 1 \le i \le 13$$

Similarly, in the CP model the following constraints must be satisfied:

$$s_i \le Z \qquad 1 \le i \le 13$$

The Constraints

Both IP and CP models must include constraints that will ensure that the solutions they produce satisfy all the degree requirements (i.e., required courses, elective courses, prerequisite requirements and minimum unit requirement) as well as term study-load limits. These constraints will now be discussed in order.

Required Courses

Courses C5 and C8, which must be included, require the following constraints in the IP model:

$$\begin{aligned} y_5 &= 1 \\ y_8 &= 1 \end{aligned}$$

Similarly, in the CP model we need to include

$$\begin{aligned} s_5 &> 0 \\ s_8 &> 0 \end{aligned}$$

Since either C6 or C7 may be selected to satisfy the remaining requirement, the IP model must include the following constraint:

$$y_6 + y_7 \ge 1$$

In the CP model, which permits logical, as well as mathematical constraints, this requirement may be simply specified as

$$(s_6 > 0) \ \text{or} \ (s_7 > 0)$$

Elective Course Requirement

This requirement has two components: the list of electives from which courses may be selected (in our case, courses C9 through C13) and a minimum number of units that the selected courses must represent (6 units in this example). Courses C11 and C12 may not both be counted toward satisfying the unit requirement. The constraints that ensure that the two courses are not both counted should not, however, exclude the possibility that both courses are included in the program (perhaps to satisfy other requirements). In the IP model, this can be accomplished by defining two auxiliary variables, $y'_{11}$ and $y'_{12}$, to represent whether C11 and C12, respectively, are used to satisfy the elective requirement. These variables are defined as follows:

$$y'_i = \begin{cases} 1 & \text{if course } i \text{ is used to satisfy the elective requirement} \\ 0 & \text{otherwise} \end{cases} \qquad i \in \{11, 12\}$$

Now the elective requirement can be represented in the model by including the following constraints:

$$\begin{aligned} y'_{11} &\le y_{11} \\ y'_{12} &\le y_{12} \\ y'_{11} + y'_{12} &\le 1 \\ 3y_9 + 3y_{10} + 3y'_{11} + 3y'_{12} + 3y_{13} &\ge 6 \end{aligned}$$

The first two constraints ensure that courses C11 and C12, respectively, cannot be used to satisfy the elective requirement if they are not included in the program. The third constraint ensures that at most one of them is used to satisfy the requirement, and the last constraint takes care of the minimum units that need to be selected.

Representing the elective course requirement in the CP model does not require any auxiliary variables and can be accomplished by the following constraint:

$$3(s_9 > 0) + 3(s_{10} > 0) + \max\bigl(3(s_{11} > 0),\ 3(s_{12} > 0)\bigr) + 3(s_{13} > 0) \ge 6$$

This constraint requires an explanation. In constraint programming, when an expression is true it is given a value of 1, and when it is false it evaluates to 0. Thus, the expression $(s_9 > 0)$ has a value of 1 if course C9 is included in the schedule. The expression $\max\bigl(3(s_{11} > 0),\ 3(s_{12} > 0)\bigr)$ makes sure that, if both C11 and C12 are included in the schedule, the units of just one of them will be counted toward satisfying the requirement.
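As a sanity check, the two formulations of the elective requirement can be compared by brute force. The sketch below is illustrative only: the function names are hypothetical, the coefficients are copied from the constraints as printed, and Python's convention that `True` counts as 1 stands in for the reified comparisons of a CP language.

```python
from itertools import product


def cp_elective_units(s9, s10, s11, s12, s13):
    """Left-hand side of the CP elective constraint (True counts as 1)."""
    return (3 * (s9 > 0) + 3 * (s10 > 0)
            + max(3 * (s11 > 0), 3 * (s12 > 0))
            + 3 * (s13 > 0))


def ip_elective_feasible(y9, y10, y11, y12, y13):
    """True if some choice of the auxiliary variables satisfies the IP constraints."""
    for aux11, aux12 in product((0, 1), repeat=2):
        if (aux11 <= y11 and aux12 <= y12 and aux11 + aux12 <= 1
                and 3 * y9 + 3 * y10 + 3 * aux11 + 3 * aux12 + 3 * y13 >= 6):
            return True
    return False


def formulations_agree():
    """Compare both formulations on every selection pattern of C9..C13."""
    # A 0/1 selection vector doubles as a term vector here, because the
    # CP expression only tests whether each s value is positive.
    return all(
        ip_elective_feasible(*sel) == (cp_elective_units(*sel) >= 6)
        for sel in product((0, 1), repeat=5)
    )
```

For instance, `cp_elective_units(0, 0, 3, 4, 0)` evaluates to 3, not 6: with only C11 and C12 scheduled, just one of them is counted, so the requirement is not met.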

Prerequisite Requirements

The prerequisite requirements for a course have to be satisfied only if that course is included in the degree plan. For example, since C1 is a prerequisite of C5, a conditional constraint must be added, of the form “if C5 is scheduled for period $t_5$, then C1 must be scheduled for a period $t_1 < t_5$.” In the IP model this would be achieved by adding the following two constraints to the problem:

( ) ( )1 5

8 8

1 5 51 1

1 9 1t tt t

y y

tx t x y= =

≥

− − ≤ −∑ ∑

The first constraint guarantees that if C5 is included in the plan, C1 must also be included. The second constraint guarantees that if C5 is included in the plan, then the term for which C1 is scheduled must be earlier than the term for which C5 is scheduled. The coefficient 9 (that is, the planning horizon + 1) is sufficiently large so that if $y_5 = 0$ the constraint becomes redundant.

In the CP model, the constraint that would satisfy the relationship between C5 and C1 can simply be written as:

$$(s_5 > 0) \Rightarrow (0 < s_1 < s_5)$$

where the symbol ⇒ indicates logical implication. Similar constraints are needed, in both the IP and the CP models, for all pairs of courses between which there is a simple prerequisite relationship like that between C5 and C1.
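As a sanity check, the two formulations of the C1/C5 prerequisite can be compared exhaustively over all term assignments. The sketch below assumes the usual linking between the models ($x_{it} = 1$ exactly when $s_i = t$, and $y_i = 1$ exactly when $s_i > 0$); the function names are hypothetical.

```python
HORIZON = 8           # number of terms in the example
BIG_M = HORIZON + 1   # the coefficient 9 used in the IP constraint


def ip_prerequisite_ok(s1, s5):
    """IP form, with x_{it} = 1 iff s_i == t and y_i = 1 iff s_i > 0."""
    y1, y5 = int(s1 > 0), int(s5 > 0)
    sum_t_x1 = s1                            # sum over t of t * x_{1t}
    sum_tm1_x5 = (s5 - 1) if s5 > 0 else 0   # sum over t of (t - 1) * x_{5t}
    return y1 >= y5 and sum_t_x1 - sum_tm1_x5 <= BIG_M * (1 - y5)


def cp_prerequisite_ok(s1, s5):
    """CP form: (s5 > 0) implies (0 < s1 < s5)."""
    return (not s5 > 0) or (0 < s1 < s5)


# Compare both forms on every pair of terms, 0 meaning "not scheduled".
agree = all(ip_prerequisite_ok(a, b) == cp_prerequisite_ok(a, b)
            for a in range(HORIZON + 1) for b in range(HORIZON + 1))
print(agree)  # True
```

When C5 is not scheduled, the right-hand side equals 9, which no term index can exceed, so the IP constraint is vacuous, exactly as the implication is.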

The case of course C6, where either C2 or C3 would satisfy a prerequisite requirement, is somewhat more involved. Constraints are needed to ensure that if C6 is selected for the program, then at least one of the courses C2 and C3 is scheduled for an earlier term. These constraints, however, should not go so far as to require that both C2 and C3 satisfy this condition. In the IP model, two auxiliary variables, $y''_{6,2}$ and $y''_{6,3}$, are needed to represent whether C2 and C3, respectively, satisfy this prerequisite requirement for C6. These variables are defined as follows:

$$y''_{6,j} = \begin{cases} 1 & \text{if course } j \text{ satisfies the prerequisite requirement of C6} \\ 0 & \text{otherwise} \end{cases} \qquad j \in \{2, 3\}$$

Now the prerequisite requirement can be represented in the model by including the following constraints:

$$\begin{aligned}
y''_{6,2} &\le y_2 \\
y''_{6,3} &\le y_3 \\
y''_{6,2} + y''_{6,3} &\ge y_6 \\
\sum_{t=1}^{8} t\,x_{2,t} - \sum_{t=1}^{8} (t-1)\,x_{6,t} &\le 9\,(1 - y''_{6,2}) \\
\sum_{t=1}^{8} t\,x_{3,t} - \sum_{t=1}^{8} (t-1)\,x_{6,t} &\le 9\,(1 - y''_{6,3})
\end{aligned}$$

The first two constraints indicate that courses C2 and C3, respectively, cannot satisfy the prerequisite requirement for C6 if they are not included in the program. The third constraint ensures that at least one of them satisfies the prerequisite requirement, and the last two constraints make sure that for C2 (respectively C3) to satisfy the requirement it must be scheduled for a term that is earlier than the term for which C6 is scheduled.

As before, representing this requirement in a CP model is far simpler. It does not require any auxiliary variables and can be accomplished with a single constraint:

$$(s_6 > 0) \Rightarrow (0 < s_2 < s_6) \text{ or } (0 < s_3 < s_6)$$

Minimum Unit Requirement

The requirement that a student must take at least 24 units to earn the degree is represented in the IP model by the following constraint:

$$\sum_{i=1}^{13} u_i\,y_i \ge 24$$

where $u_i$ is the number of units of course $i$.

Similarly, in the CP model we need

$$\sum_{i=1}^{13} u_i\,(s_i > 0) \ge 24$$

Study-Load Limits

The number of units taken in each term may not exceed 6. In the IP model this is enforced by the following set of constraints:

$$\sum_{i=1}^{13} u_i\,x_{it} \le 6, \qquad 1 \le t \le 8$$

Similarly, in the CP model we need the constraints:

$$\sum_{i=1}^{13} u_i\,(s_i = t) \le 6, \qquad 1 \le t \le 8$$

The complete CP model and IP model for this problem are given in Figure 4 and Figure 5, respectively.

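$$\begin{aligned}
\text{Minimize } & Z \\
\text{subject to } & \\
& s_i \le Z, && 1 \le i \le 13 \\
& (s_5 > 0) \text{ and } \bigl((s_6 > 0) \text{ or } (s_7 > 0)\bigr) \text{ and } (s_8 > 0) \\
& 3(s_9 > 0) + 3(s_{10} > 0) + \max\bigl(3(s_{11} > 0),\, 3(s_{12} > 0)\bigr) + 3(s_{13} > 0) \ge 6 \\
& (s_i > 0) \Rightarrow (0 < s_j < s_i), && \forall \langle i, j \rangle \text{ s.t. } j \text{ is a unique prerequisite of } i \\
& (s_6 > 0) \Rightarrow (0 < s_2 < s_6) \text{ or } (0 < s_3 < s_6) \\
& \sum_{i=1}^{13} u_i\,(s_i > 0) \ge 24 \\
& \sum_{i=1}^{13} u_i\,(s_i = t) \le 6, && 1 \le t \le T \\
& s_i \in \{0, 1, 2, \ldots, 8\}, && 1 \le i \le N
\end{aligned}$$

Fig. 4. The CP Model for the Example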

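$$\begin{aligned}
\text{Minimize } & Z \\
\text{subject to } & \\
& y_i = \sum_{t=1}^{8} x_{it}, && 1 \le i \le 13 \\
& \sum_{t=1}^{8} t\,x_{it} \le Z, && 1 \le i \le 13 \\
& y_5 = 1; \quad y_8 = 1; \quad y_6 + y_7 \ge 1 \\
& y'_{11} \le y_{11}; \quad y'_{12} \le y_{12}; \quad y'_{11} + y'_{12} \le 1 \\
& 3y_9 + 3y_{10} + 3y'_{11} + 3y'_{12} + 3y_{13} \ge 6 \\
& y_j \ge y_i, && \forall \langle i, j \rangle \text{ s.t. } j \text{ is a unique prerequisite of } i \\
& \sum_{t=1}^{8} t\,x_{jt} - \sum_{t=1}^{8} (t-1)\,x_{it} \le 9\,(1 - y_i), && \forall \langle i, j \rangle \text{ s.t. } j \text{ is a unique prerequisite of } i \\
& y''_{6,2} \le y_2; \quad y''_{6,3} \le y_3; \quad y''_{6,2} + y''_{6,3} \ge y_6 \\
& \sum_{t=1}^{8} t\,x_{2,t} - \sum_{t=1}^{8} (t-1)\,x_{6,t} \le 9\,(1 - y''_{6,2}) \\
& \sum_{t=1}^{8} t\,x_{3,t} - \sum_{t=1}^{8} (t-1)\,x_{6,t} \le 9\,(1 - y''_{6,3}) \\
& \sum_{i=1}^{13} u_i\,y_i \ge 24 \\
& \sum_{i=1}^{13} u_i\,x_{it} \le 6, && 1 \le t \le 8 \\
& x_{it} \in \{0, 1\}, && 1 \le i \le 13,\ 1 \le t \le 8 \\
& y_i \in \{0, 1\}, && 1 \le i \le 13 \\
& y'_{11},\, y'_{12},\, y''_{6,2},\, y''_{6,3} \in \{0, 1\}
\end{aligned}$$

Fig. 5. The IP Model for the Example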

As Figures 4 and 5 illustrate, the CP model is a more compact representation of the problem than the IP model. This is because the CP model, in contrast with the IP model, is not restricted to linear constraints. Some of the variables and constraints in the IP model represent “formulation tricks” whose only role is to enable the representation of the degree requirements as linear inequalities. The logic-based constraints of the CP model represent these requirements in a much more “natural” way. This advantage of constraint programming ([Van Hentenryck, 2002]) becomes more significant in “real-life” situations where degree requirements can be very complex, especially if we wish the user (e.g., the student) to incorporate some of his/her specific constraints into the model.

We used OPL Studio to solve the two models developed in this section. IP models are solved by CPLEX and CP models are submitted to ILOG Solver for solution. Both show that a minimum of five terms is needed to complete the degree program. There are seven different plans that would result in completing the degree within that time frame. One of them (the one that requires the smallest number of units, 25) is shown in Figure 6.

[Figure: courses C4 (3), C5 (3), C1 (4), C6 (4), C8 (3), C3 (2), C10 (3) and C9 (3), with units in parentheses, laid out over Term 1 to Term 5]

Fig. 6. A Minimum-Length Degree Plan

Conclusion and suggestions for further research

Completing the requirements for a college degree within a reasonable length of time is a daunting task for many students. Complex degree requirements and rules, the freedom afforded to students to choose from a large number of courses, and the need to satisfy prerequisite requirements for these courses often make it difficult for students to plan their individual programs in a way that would reduce, if not minimize, their time-to-degree. In this paper we point to the similarities, as well as the differences, between degree planning and resource-constrained project planning and argue that degree planning could benefit from models similar to those used for project planning and scheduling.

With the help of a small example that, nevertheless, illustrates the essence and some of the complexities of the degree-planning problem, we show how the problem may be modeled and solved by two different approaches. Integer programming is the more traditional, operations research based, approach, which depends on “formulation tricks” to overcome the need to represent all the constraints in terms of linear equalities and inequalities. Constraint programming is a newer, computer science based, methodology, which allows a more straightforward and “natural” representation of the problem. Further study, currently underway, is needed to assess the practicability of such models for “real life” degree program requirements and to compare computationally the performance of the two approaches. Preliminary experiments show that both models generate optimal plans for real degree programs taken from the CSUN catalogue within an acceptable amount of time. The computational aspects of these models, however, are not addressed in this paper and will be studied in the future.

Universities and perhaps other educational institutions may use models like those developed in this paper in several possible ways. They can be used to generate a large variety of “graduation roadmaps” of the type currently required by the California State University for all its degree programs. They can also be used as a basis for automated advisement systems that will allow students to develop and update their own, individual, degree plans. Finally, these models may be very useful for developers of new curricula who could use them to assess the expected time-to-degree of new and modified degree programs.

The models we have proposed in this paper take into account the effects of degree requirements, course prerequisites, and study-load limits on the time-to-degree. Implicitly, we have assumed that all courses are available for the student in each term. This assumption is often unjustified because courses may not be offered in each term and, even when they are offered, they may not have enough room to accommodate all the students who may wish to take them. Since course availability is determined by the course schedule, an interesting subject for further study would be investigating how degree planning models may be used to improve course scheduling decisions.

References

Baptiste, P., C. Le Pape and W. Nuijten, 2001, Constraint-Based Scheduling: Applying Constraint Programming to Scheduling Problems, Kluwer Academic Publishers: Boston.

Carey, K., 2004, ‘A Matter of Degree: Improving Graduation Rates in Four-Year Colleges and Universities,’ A Report by the Education Trust.

Caseau, Y. and F. Laburthe, 1997, ‘A Constraint Based Approach to the RCPSP,’ Proceedings of the CP97 Workshop on Industrial Constraint-Directed Scheduling, Schloss Hagenberg, Austria.

California State University, 2002, ‘Facilitating Student Success in Achieving the Baccalaureate Degree,’ Report of the California State University Task Force on Facilitating Graduation.

California State University, Northridge, 2003, Graduation Rates Task Force Report.

ILOG, 2000, ILOG OPL Optimization Programming Language Reference Manual.

Meredith, J.R. and S.J. Mantel, Jr., 2000, Project Management: A Managerial Approach, John Wiley & Sons: New York.

Patterson, J.H. and W.D. Huber, 1974, ‘A Horizon-Varying, Zero-One Approach to Project Scheduling,’ Management Science, Vol.20, No.6, pp.990-998.

Stinson, J.P., E.W. Davis and B.M. Khumawala, 1978, ‘Multiple Resource-Constrained Scheduling Using Branch-and-Bound,’ AIIE Transactions, Vol.10, No.3, pp.252-259.

Van Hentenryck, P., 2002, ‘Constraint and Integer Programming in OPL,’ INFORMS Journal on Computing, Vol.14, No.4, pp.345-372.

Models for a variable-sized bin packing problem

Diego Olivier Fernandez Pons

Universite Pierre et Marie Curie, Paris [email protected]

ILOG [email protected]

Abstract. Decomposable structure is classically exploited in linear programming with polyhedral-based decomposition methods such as Benders decomposition.

We will show how to use the decomposable structure of a variable-sized bin packing problem in constraint programming to derive lower bounds and solve the problem more efficiently.

We will also give evidence that there is still room for improvement, especially in the resolution of the sub-problems generated by the decomposition.

1 A variable-sized bin packing problem

In the unsplittable multi-commodity network flow problem with step-increasing cost functions [8] one has to build a network (choose a capacity for every arc of a graph) and assign a path to each demand in such a way that all commodities can flow simultaneously without exceeding the installed capacity (figure 1).

Fig. 1. Paths and arc capacity of a network

Benders decomposition [4], branch and cut [3] and branch, cut and price [2] methods use a node-path reformulation of the node-arc linear model that introduces bi-partition inequalities (and more generally metric inequalities). These inequalities state that for any partition $(S, \bar{S})$ of the nodes, the total transversal capacity must be at least the total transversal flow (figure 2).

Fig. 2. Demands traversing a cut. The total capacity of the transversal arcs must be at least the total transversal flow.

$$\forall S, \quad \sum_{i \in S,\, j \notin S} \text{Demand}_{ij} \le \sum_{i \in S,\, j \notin S} \text{Capacity}_{ij}$$

If only the partition $(S, \bar{S})$ is considered and unsplittable-flow constraints are added, computing the minimum cost cut such that all the demands can cross the cut is equivalent to the variable-sized bin packing problem: choose a capacity for each bin (arc) and assign a bin (arc) to each item (demand).

Side-constraints on the original network design problem generate several kinds of constraints on the packing sub-problem: limiting the flow or the degree on a node (respectively traffic and port with nomult constraints in [8]) become aggregated capacity and cardinality constraints.

$$\forall i \in S, \quad \sum_{j \notin S} \text{Capacity}_{ij} \le \text{NodeCapacity}_i$$

$$\forall i \in S, \quad \bigl|\{\, j \notin S : \text{Capacity}_{ij} \ne 0 \,\}\bigr| \le \text{NodeDegree}_i$$

We have retained a simpler version that constrains, for each bin, the total capacity and the number of items it can contain. This variant is known in the CSPLib (http://4c.ucc.ie/~tw/csplib/) as “the rack configuration problem” 031.

2 Simple linear and constraint programming models

2.1 A linear formulation with partial symmetry breaking

We begin by giving a linear model of the problem.

Let $y_b \in \{0, 1\}^n$ be the indicator vector of the bin type, $y_b = (0, 0, 0, 1, 0, 0, \ldots)$; that is, $y_b^k = 1$ means bin $b$ has type $k$.

Let $x_b^i$ be the number of items of type $i$ in bin $b$.

Let $M_i$ be the number of items of type $i$ (constant).

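$$\min \sum_b \Bigl( \sum_k \text{Cost}_k\, y_b^k \Bigr)$$

$$\forall b, \quad \sum_i \text{Size}_i \times x_b^i \le \sum_k \text{MaxCapacity}_k \times y_b^k \qquad \text{(LP 1)}$$

$$\forall b, \quad \sum_i x_b^i \le \sum_k \text{MaxCardinality}_k \times y_b^k \qquad \text{(LP 2)}$$

$$\forall i, \quad \sum_b x_b^i = M_i \qquad \text{(LP 3)}$$

$$\forall b, \quad \sum_k y_b^k = 1 \qquad \text{(LP 4)}$$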

Constraint (LP4) defines the indicator vector $y_b$, while constraints (LP1) and (LP2) ensure each bin is big enough in terms of capacity and cardinality for the items assigned to it. Constraint (LP3) ensures all items have been assigned to a bin.

Adding lexicographic constraints on the bins ($\forall b,\ x_b \le_{lex} x_{b+1}$) to the linear model proved less efficient than adding an ordering constraint on bin types, even if the latter is only a partial symmetry breaking.

Let $t_b \in \mathbb{N}$ be the type of bin $b$, that is, $[y_b^k = 1] \Leftrightarrow [t_b = k]$.

$$\forall b, \quad t_b = \sum_k k \times y_b^k \qquad \text{(LP 5)}$$

$$\forall b, \quad t_b \le t_{b+1} \qquad \text{(LP 6)}$$

2.2 A constraint programming model

The variable-sized bin packing problem “rack configuration problem” appears in [9], where a simple constraint programming model is suggested.

Let $y_b$ be the type of bin $b$.

Let $x_b^i$ be the number of items of type $i$ in bin $b$.

Let $M_i$ be the number of items of type $i$ (constant).

$$\min \sum_b \text{Cost}[y_b]$$

$$\forall b, \quad \sum_i \text{Size}_i \times x_b^i \le \text{MaxCapacity}[y_b] \qquad (1)$$

$$\forall b, \quad \sum_i x_b^i \le \text{MaxCardinality}[y_b] \qquad (2)$$

$$\forall i, \quad \sum_b x_b^i = M_i \qquad (3)$$

Constraints (1), (2) and (3) are similar to those of the linear formulation.

Two symmetry breaking constraints are also introduced: one on bin types, one on the bin content when bins have the same type.

$$\forall b, \quad y_b \le y_{b+1} \qquad (4)$$

$$\forall b, \quad [y_b = y_{b+1}] \Rightarrow [x_b^0 \le x_{b+1}^0] \qquad (5)$$

Kiziltan [7] replaced constraint (5) with a lexicographic ordering of bins with the same type.

$$\forall b, \quad [y_b = y_{b+1}] \Rightarrow [x_b \le_{lex} x_{b+1}] \qquad (5\ \text{bis})$$

Both authors first instantiate the bin type and then fill it.

2.3 First experiments

The linear programming model given to ILOG CPLEX 9.1 [5] instantaneously solves all instances of the CSPLib and all instances generated by Kiziltan. On the other hand, the constraint programming model of Van Hentenryck fails to solve half of the CSPLib problems.

We reproduce his results and those reported by Kiziltan on the instances of [7] (Kiziltan’s experiments were done with ILOG Solver 5.3 on a Pentium III 1 GHz with 256 MB of RAM; ours with a similar configuration at 850 MHz).

| instance | model | number of fails | time (seconds) |
|---|---|---|---|
| K1 | van Hentenryck | 3 179 609 | 151 |
| K1 | Kiziltan ≤lex | 1 938 252 | 70 |
| ... | | | |
| K6 | van Hentenryck | 199 015 | 12 |
| K6 | Kiziltan ≤lex | 115 308 | 5 |

Table 1. Simple models results on the rack configuration problem

The analysis of the results shows that for Kiziltan’s instances, most of the time is spent in the optimality proof. In the CSPLib problems 3 and 4, the optimal solution still hasn’t been found after 1 hour.

3 CP based decomposition

Benders decomposition methods for the network design problem [4] obtain lower bounds by projection [1], that is, by removing part of the initial problem, here the flow variables. The underlying idea is that the number of potential networks (combinations of capacities) is significantly smaller than the number of potential flows (a path in the graph for each demand).

More formally, the problem

$$\min \{\, \text{Cost}(y) \mid \exists x,\ (x, y) \text{ is a feasible network and routing plan} \,\}$$

where $y$ corresponds to the arc-capacity variables and $x$ to the flow variables, is transformed into a problem on the $y$ variables only

$$\min \{\, \text{Cost}(y) \mid C_\lambda(y) \,\}$$

where $C_\lambda$ is an exponential set of constraints, namely the metric inequalities. Bi-partition inequalities

$$\forall S, \quad \sum_{i \in S,\, j \notin S} \text{Demand}_{ij} \le \sum_{i \in S,\, j \notin S} \text{Capacity}_{ij}$$

are a particular case of metric inequalities.

Because the number of potential combinations of $y$ is small, just enforcing a subset of the constraints $C_\lambda$ may be enough to find good lower bounds efficiently.

3.1 Projection of the packing problem

We mimic Benders decomposition by considering the problem restricted to the capacity variables $y$ only, together with the following aggregated inequalities (we keep the notations of section 2.2):

$$\min \sum_b \text{Cost}[y_b]$$

$$\sum_i \text{Size}_i \times M_i \le \sum_b \text{MaxCapacity}[y_b] \qquad (\pi\,1)$$

$$\sum_i M_i \le \sum_b \text{MaxCardinality}[y_b] \qquad (\pi\,2)$$

$$\forall b, \quad y_b \le y_{b+1} \qquad (4)$$

Constraints (π1) and (π2) are obtained by injecting (3) in (1) and (2) and summing over $b$. In the process $x_b^i$ disappears, which means we relax its integrality. In other words, any solution of the projected model is optimal if items are allowed to split among several bins.
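For small instances the projected model can be solved by plain enumeration of bin-type combinations. The sketch below is an illustration, not the Solver model used in the experiments; it uses the data of the easy instance (figure 3) and adds an "empty" bin type of zero capacity and cost as an assumption, so that fewer than ten bins may be opened.

```python
from itertools import combinations_with_replacement

# Bin types as (capacity, cardinality, cost). The first entry is an assumed
# "empty" type, so that a combination may leave some bins unused.
BIN_TYPES = [(0, 0, 0), (200, 4, 125), (150, 8, 150), (200, 6, 500), (250, 6, 600)]
ITEMS = [(20, 15), (30, 8), (40, 15), (50, 10), (60, 10)]  # (size, number)
NUM_BINS = 10


def projection_lower_bound(bin_types, items, num_bins):
    """Cheapest combination of bin types satisfying (pi 1) and (pi 2)."""
    total_size = sum(size * count for size, count in items)
    total_count = sum(count for _, count in items)
    best = None
    # Non-decreasing type indices implement the ordering constraint (4).
    for combo in combinations_with_replacement(range(len(bin_types)), num_bins):
        capacity = sum(bin_types[k][0] for k in combo)
        cardinality = sum(bin_types[k][1] for k in combo)
        if capacity >= total_size and cardinality >= total_count:
            cost = sum(bin_types[k][2] for k in combo)
            if best is None or cost < best:
                best = cost
    return best  # None plays the role of an infinite bound


print(projection_lower_bound(BIN_TYPES, ITEMS, NUM_BINS))  # 4500
```

The enumeration visits 1001 multisets of types for ten bins, which is why this kind of bound is cheap compared with a search over the item variables.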


Lower bounds were tested on the CSPLib and Kiziltan instances (table 2). K 5 is an interesting case because it has the longest resolution time with the naive constraint programming models while there is a single bin combination that may be feasible. These problems are examples of pairs (x, y) where the number of candidate y is very small compared to the size of x.

| instance | lower bound | table size | choice points | fails | seconds |
|---|---|---|---|---|---|
| CSPLib 1 | 550 | 13 | 13 | 1 | 0 |
| CSPLib 2 | 1100 | 40 | 40 | 1 | 0.01 |
| CSPLib 3 | 1200 | 61 | 60 | 0 | 0 |
| CSPLib 4 | 1150 | 3486 | 4104 | 619 | 0.15 |
| K 1 | 950 | 2 | 2 | 1 | 0 |
| K 2 | 950 | 2 | 2 | 1 | 0 |
| K 3 | ∞ | 0 | 0 | 1 | 0 |
| K 4 | 900 | 3 | 3 | 1 | 0 |
| K 5 | 1000 | 1 | 0 | 0 | 0.01 |
| K 6 | ∞ | 0 | 0 | 1 | 0 |

Table 2. Projection lower bounds for CSPLib and Kiziltan instances

Comparing the lower bounds with the CPLEX root node and the optimal solutions shows that the bounds are exact for the CSPLib and Kiziltan problems.

| instance | CPLEX root | lower bound | optimal |
|---|---|---|---|
| CSPLib 1 | 535 | 550 | 550 |
| CSPLib 2 | 1070 | 1100 | 1100 |
| CSPLib 3 | 1170 | 1200 | 1200 |
| CSPLib 4 | 1140 | 1150 | 1150 |
| K 1 | 940 | 950 | 950 |
| K 2 | 930 | 950 | 950 |
| K 4 | 900 | 900 | 900 |
| K 5 | 1000 | 1000 | 1000 |

Table 3. Lower bounds compared to CPLEX root and optimal solutions

A small instance named easy (figure 3) has been designed in such a way that there is a gap between the lower bound of the splittable item relaxation and the optimal solution. easy×2 means that the number of bins and the number of items of each size have been doubled. We compare (table 4) the value of the CPLEX root node, the lower bound and the optimal solution. The gap indicates the number of solutions of the projection problem that lie between the lower bound and the optimal value (both included).

| instance | CPLEX root | lower bound | optimal | gap |
|---|---|---|---|---|
| easy | 4 230 | 4 500 | 4 625 | 3 |
| easy×2 | 8 640 | 8 750 | 9 125 | 4 |
| easy×4 | 16 920 | 17 375 | 17 875 | 6 |
| easy×8 | 33 840 | 32 025 | 35 400 | 27 |
| easy×16 | 67 680 | 67 950 | 70 075 | 112 |
| easy×32 | 135 360 | 135 650 | 140 150 | 946 |

# bin types (capacity, cardinality, cost)
200 4 125
150 8 150
200 6 500
250 6 600

# item types (size, number)
20 15
30 8
40 15
50 10
60 10

# total number of bins
10

Fig. 3. easy instance used in our experiments

3.2 Tighter lower bounds

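The covering projection problem can be strengthened by introducing waste variables. The constraints

$$\sum_b \text{MaxCapacity}[y_b] \ge \Bigl( \sum_i \text{Size}_i \times M_i \Bigr) \qquad (\pi\,1)$$

$$\sum_b \text{MaxCardinality}[y_b] \ge \Bigl( \sum_i M_i \Bigr) \qquad (\pi\,2)$$

become

$$\sum_b \text{MaxCapacity}[y_b] = \Bigl( \sum_i \text{Size}_i \times M_i \Bigr) + \sum_b \text{WastedCapacity}_b \qquad (\sigma\,1)$$

$$\sum_b \text{MaxCardinality}[y_b] = \Bigl( \sum_i M_i \Bigr) + \sum_b \text{WastedCardinality}_b \qquad (\sigma\,2)$$

where the wasted variables are linked to bin capacities by the relations

$$\forall b, \quad \Bigl( \sum_i \text{Size}_i \times x_b^i \Bigr) + \text{WastedCapacity}_b = \text{MaxCapacity}[y_b]$$

$$\forall b, \quad \Bigl( \sum_i x_b^i \Bigr) + \text{WastedCardinality}_b = \text{MaxCardinality}[y_b]$$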

To eliminate the variables $x_b^i$ we enumerate them; in other words, we consider all possible packings of a bin (that is, all $x^i$ vectors) and compute every feasible tuple $(y_b, \text{WastedCapacity}_b, \text{WastedCardinality}_b)$.

For the easy family we obtain 223 tuples, which can be reduced to 77 using a domination rule: only bins of minimal cost are kept, because there is no point in using a bin more expensive than a bin where $x^i$ already fits.

With ILOG Solver [6], variables can be constrained to take their values in a tuple list with a table constraint. To be able to compute strengthened lower bounds with a linear solver as well, tuples are expanded: with each tuple $t = (\text{type}, \text{cap}, \text{card})$ is associated a variable $n_t$ which counts the number of times a bin of type type with wasted capacity cap and wasted cardinality card appears in the optimal solution.
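The tuple enumeration can be sketched as follows. This is one possible reading of the domination rule (each content vector is kept only for the cheapest bin type it fits in, ties broken by the lowest type index); the function name and the data layout are assumptions, and the sketch is not claimed to reproduce the tuple counts reported above.

```python
from itertools import product


def waste_tuples(bin_types, items):
    """Return the set of (type, wasted_capacity, wasted_cardinality) tuples.

    bin_types: list of (capacity, cardinality, cost).
    items: list of (size, number).
    """
    tuples = set()
    # Every content vector x: between 0 and M_i items of each type i.
    ranges = [range(number + 1) for _, number in items]
    for x in product(*ranges):
        load = sum(size * n for (size, _), n in zip(items, x))
        count = sum(x)
        fitting = [(cost, k) for k, (cap, card, cost) in enumerate(bin_types)
                   if load <= cap and count <= card]
        if not fitting:
            continue
        _, k = min(fitting)  # domination: cheapest type that holds x
        cap, card, _ = bin_types[k]
        tuples.add((k, cap - load, card - count))
    return tuples


# Tiny hypothetical instance: an empty type and a 45-capacity type,
# with three items of size 30.
print(sorted(waste_tuples([(0, 0, 0), (45, 2, 500)], [(30, 3)])))
```

On the tiny instance only two tuples survive: the empty bin with no waste, and a 45-capacity bin holding one item with 15 units of capacity and one slot wasted.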

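The problem becomes

$$\min \sum_t \text{Cost}[\text{type}_t] \times n_t$$

$$\sum_t (\text{MaxCapacity}[\text{type}_t] - \text{cap}_t) \times n_t = \sum_i \text{Size}_i \times M_i \qquad (\sigma\ \text{LP}\ 1)$$

$$\sum_t (\text{MaxCardinality}[\text{type}_t] - \text{card}_t) \times n_t = \sum_i M_i \qquad (\sigma\ \text{LP}\ 2)$$

$$\sum_t n_t = \text{NumberOfBins} \qquad (\sigma\ \text{LP}\ 3)$$

For instances easy to easy×32, the strengthened lower bound reaches the optimal solution of the problem.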

4 Two phases constraint programming models

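It is interesting to notice that just introducing waste variables in (1) and (2), adding their summations (σ1) and (σ2), and slightly changing the branching scheme (all bin types are instantiated and then the bins are filled) makes it possible to solve all Kiziltan problems instantaneously. Adding the table of discrete wasted values makes it possible to solve all CSPLib instances and a few easy instances (table 5).

$$\min \sum_b \text{Cost}[y_b]$$

$$\forall b, \quad \Bigl( \sum_i \text{Size}_i \times x_b^i \Bigr) + \text{WastedCapacity}_b = \text{MaxCapacity}[y_b] \qquad (1\ \text{bis})$$

$$\forall b, \quad \Bigl( \sum_i x_b^i \Bigr) + \text{WastedCardinality}_b = \text{MaxCardinality}[y_b] \qquad (2\ \text{bis})$$

$$\sum_b \text{MaxCapacity}[y_b] = \Bigl( \sum_i \text{Size}_i \times M_i \Bigr) + \sum_b \text{WastedCapacity}_b \qquad (\sigma\,1)$$

$$\sum_b \text{MaxCardinality}[y_b] = \Bigl( \sum_i M_i \Bigr) + \sum_b \text{WastedCardinality}_b \qquad (\sigma\,2)$$

$$\forall i, \quad \sum_b x_b^i = M_i \qquad (3)$$

$$\forall b, \quad y_b \le y_{b+1} \qquad (4)$$

$$\forall b, \quad [y_b = y_{b+1}] \Rightarrow [x_b \le_{lex} x_{b+1}] \qquad (5\ \text{bis})$$

$$\forall b, \quad (y_b, \text{WastedCapacity}_b, \text{WastedCardinality}_b) \in \text{table} \qquad (6)$$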

Constraints (σ1) and (σ2) prevent trivially infeasible combinations of bins from being generated (e.g. all the bins assigned to the empty type). Then, the total wasted capacity, the total wasted cardinality (fixed) and the table of discrete wasted values constrain the waste values of individual bins, which ends up imposing a minimum load on some bins and reduces the number of candidate $x^i$ vectors. Finally, because bin types are fixed, the lexicographic symmetry breaking constraint can prune efficiently.

The CP model tends however to get stuck in a bin combination without being able to fill it or prove it infeasible. This suggests that the fixed-size bin packing problem is not properly solved. Putting all possible bin vectors $x^i$ in a table induced a small overhead, and was therefore retained to help the filling phase.

Finally, these first results suggest that a two-phase approach may better capture the structure of the problem.

| instance | easy | easy ×2 | easy ×4 | easy ×8 |
|---|---|---|---|---|
| Solver 6.1 waste | 0.24 | 0.29 | - | - |
| Solver 6.1 waste + table | 0.17 | 0.28 | 116.79 | - |
| CPLEX 9.1 balance | 0.75 | 61.07 | (19925) - | (47050) - |
| CPLEX 9.1 integer | 0.54 | (9125) - | - | - |
| CPLEX 9.1 optimality | 0.84 | 6.94 | (21200) - | - |
| CPLEX 9.1 best bound | 1.95 | 9.57 | - | - |
| CPLEX 9.1 hidden | 1.23 | (9125) - | (17975) - | - |

Table 5. Time is in seconds; ’-’ means the solver was stopped after 2 hours, and the best solution found, if any, is given in parentheses. CPLEX was tried with all of its MIP Emphasis parameters. Experiments were done on a Pentium III 500 MHz with 256 MB of RAM.

4.1 A descending approach

The idea of descending approaches is to use the lower bound argument at every node of the search tree. Instead of computing an optimal solution of the projected problems of section 3.1 or 3.2 at every node, we enumerate all their solutions at the root node and store them in a table.

| cost | capacity | # 0 | # 45 |
|---|---|---|---|
| 1500 | 135 | 0 | 3 |
| 1000 | 90 | 1 | 2 |

Fig. 4. An example of lower bound raising after a branching

Figure 4 shows how the lower bound may rise while the bins are filled: 3 items of size 30 have to be packed and the only available bins are empty bins or bins of capacity 45 and cost 500. The table contains all combinations of 3 bins such that their total capacity is at least 90. At the root node (no item has been packed), the minimal cost configuration (45, 45, 0) is still possible. When the first item is assigned to the first bin, a wasted capacity of 15 = 45 - 30 is computed. The total wasted capacity rises from 0 to 15 and the total capacity lower bound rises from 90 = 3 × 30 to 105 = 3 × 30 + 15 (equation σ1). The table then increases the lower bound on the total cost to 1500, because this is the minimal cost of a bin combination with a capacity of at least 105 (3 bins of size 45).
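The bound update of this small example can be replayed in a few lines. The sketch below rebuilds the two-row table at the root and then looks up the cheapest row whose capacity covers the demand plus the waste known so far; the names `TABLE` and `cost_lower_bound` are illustrative assumptions, not identifiers from the Solver implementation.

```python
from itertools import combinations_with_replacement

BIN_COSTS = {0: 0, 45: 500}   # capacity -> cost (capacity 0 is the empty bin)
NUM_BINS, ITEM_SIZE, NUM_ITEMS = 3, 30, 3

# Root-node table: (cost, capacity, combination) for every combination of
# three bins whose total capacity covers the total demand.
TABLE = [(sum(BIN_COSTS[c] for c in combo), sum(combo), combo)
         for combo in combinations_with_replacement(sorted(BIN_COSTS), NUM_BINS)
         if sum(combo) >= ITEM_SIZE * NUM_ITEMS]


def cost_lower_bound(total_wasted_capacity):
    """Cheapest table row whose capacity covers demand plus known waste."""
    needed = ITEM_SIZE * NUM_ITEMS + total_wasted_capacity
    return min(cost for cost, capacity, _ in TABLE if capacity >= needed)


print(cost_lower_bound(0))         # root node: 1000
print(cost_lower_bound(45 - 30))   # after packing one item in a 45-bin: 1500
```

Because the table is computed once, each node of the search only pays for a lookup, which is the point of enumerating the projected solutions at the root.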

The descending approach results appear in table 6 as ’descent table’.


4.2 An ascending approach

Ascending approaches fix the bin combination of lowest cost and try to fill it. If it is proved infeasible, the second best bin combination is tried, and so on.

The ascending scheme was first implemented by enumerating all solutions of the projection problems in section 3.1 (ascent table) and 3.2 (ascent table strong), and storing them in a table. The simple lower bound table is larger than the strong one, but faster to compute. Tables become really large when the size of the problem increases: for easy×16, the simple lower bound table contains 121 000 tuples of arity 160 and the strong lower bound table 100 000 tuples.

A second implementation labels bin types in a best-first search way (using the cost of the combination, which only depends on $y_b$, as evaluator), which circumvents the very high memory consumption of table-based implementations. Candidate configurations are guaranteed to be visited in increasing order of cost. Only the simple lower bound is used because the strong model results in higher computation times. Results appear as ’ascent bfs’ in table 6.
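A minimal sketch of the ascending scheme is given below. It is a simplification made for illustration: candidate combinations are fully enumerated and sorted by cost (closer in spirit to the table-based variant than to best-first labelling), and the filling phase is a plain backtracking check that ignores cardinality limits. The function names and the capacity-to-cost dictionary are assumptions.

```python
from itertools import combinations_with_replacement


def can_fill(capacities, sizes):
    """Backtracking check that all items fit (capacity constraint only)."""
    if not sizes:
        return True
    first, rest = sizes[0], sizes[1:]
    tried = set()
    for b, free in enumerate(capacities):
        # Bins with the same free space are interchangeable: try one of them.
        if free >= first and free not in tried:
            tried.add(free)
            remaining = capacities[:b] + (free - first,) + capacities[b + 1:]
            if can_fill(remaining, rest):
                return True
    return False


def ascending_search(bin_costs, num_bins, sizes):
    """Try bin combinations by increasing cost; return the first that fills."""
    combos = combinations_with_replacement(sorted(bin_costs), num_bins)
    items = tuple(sorted(sizes, reverse=True))
    for combo in sorted(combos, key=lambda c: sum(bin_costs[k] for k in c)):
        if can_fill(tuple(combo), items):
            return sum(bin_costs[k] for k in combo), combo
    return None


# Three items of size 30, empty bins or bins of capacity 45 and cost 500.
print(ascending_search({0: 0, 45: 500}, 3, [30, 30, 30]))  # (1500, (45, 45, 45))
```

On this example the combinations of cost 0, 500 and 1000 are all rejected by the filling phase before the combination of cost 1500 is accepted, so the first feasible combination found is also optimal.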

Table 6 shows the results of the two-phase descending and ascending methods on the easy problems.

| instance | easy | easy ×2 | easy ×4 | easy ×8 | easy ×16 | easy ×32 |
|---|---|---|---|---|---|---|
| Solver 6.1 descent table | 0 | 2 | (18800) - | | | |
| Solver 6.1 ascent table | 0 | 0 | 2 | 15 | - | |
| Solver 6.1 ascent table strong | 0 | 1 | 4 | 93 | - | |
| Solver 6.1 ascent bfs | 0 | 0 | 1 | 17 | 270 | 6076 |

Table 6. Time is in seconds; ’-’ means the solver was killed by the OS because of too high memory consumption, and the best solution found, if any, is given in parentheses. An empty cell means the test wasn’t performed. Experiments were done on a Pentium III 850 MHz with 256 MB of RAM.

5 Hard sub-problems

For both descending and ascending methods, the dominating time in easy instances is the lower bound computation. We built a family of problems named medium for which lower bounds are poor, hence most of the time is spent trying to fill an infeasible combination of bins. The ascending solvers end up finding a level where they get trapped.

medium problems are easier than easy ones for linear programming methods. Linear solvers seem to be in more trouble when the problem is very discrete (e.g. when bin sizes are small with respect to item sizes), whereas decomposition-based constraint programming approaches fail when the problem tends to be continuous.

# bin types (capacity, cardinality, cost)
200 4 125
150 8 150
400 6 250
300 8 300
200 15 500
400 8 600
400 10 900
500 10 1000

# item types (size, number)
20 15
30 8
40 15
50 10
60 10

# total number of bins
10

Fig. 5. medium instance used in our experiments

| instance | CPLEX root | lower bound | gap | strong lower bound | gap | optimal |
|---|---|---|---|---|---|---|
| medium | 1 595 | 1 700 | 25 | 1 725 | 5 | 1 875 |
| medium×2 | 3 190 | 3 300 | 209 | 3 540 | 33 | 3 675 |
| medium×4 | 6 380 | 6 475 | 3 188 | 6 900 | 244 | 7 275 |
| medium×8 | 12 760 | 12 825 | 103 331 | 13 700 | 3 026 | 14 475 |

| instance | medium | medium ×2 | medium ×4 | medium ×8 |
|---|---|---|---|---|
| Solver 6.1 ascent bfs | 1.48 | - | | |
| CPLEX 9.1 balance | 0.76 | 8.48 | 65.82 | (28250) - |
| CPLEX 9.1 integer | 0.4 | 4.57 | 375.62 | 7262.87 |
| CPLEX 9.1 optimality | 0.58 | 12.49 | 98.52 | (14600) - |
| CPLEX 9.1 best bound | 2.34 | 22.24 | 147.99 | 3622.41 |
| CPLEX 9.1 hidden | 0.7 | 11.41 | 126.77 | (14550) - |

Table 8. Time is in seconds; ’-’ means the solver was stopped after 2 hours, and the best solution found, if any, is given in parentheses. CPLEX was tried with all of its MIP Emphasis parameters. Experiments were done on a Pentium III 850 MHz with 256 MB of RAM.

While computing lower bounds for both easy and medium problems is almost instantaneous with the linear and combined linear-CP models described in sections 3.1 and 3.2, their corresponding constraint programming models are quite slow. Since the covering problems that appear in lower bound computations are common, a large number of applications would benefit from significant improvements in this direction.

# bin types (capacity, cardinality, cost)
200 4 125
150 8 150
400 6 250
300 8 300
200 15 500
400 8 600
400 10 900
500 10 1000

# item types (size, number)
20 15
30 8
40 15
50 10
60 10
70 20
80 3
90 5
100 12
110 9

# total number of bins
24

Fig. 6. hard instance

On the hard instance, linear programming solvers find the optimal value quickly but aren’t able to prove its optimality. Our constraint programming approaches don’t return any solution.

6 Conclusion

Projection and decomposition techniques usually used in linear programming to obtain lower bounds can also be useful in constraint programming based methods.

We showed on the variable-sized bin packing problem (the “rack configuration problem” 031 in the CSPLib) how one could obtain lower bounds, strengthen them with discrete arguments and integrate them into several kinds of constraint programming approaches that improve previous models.

We also gave efficient linear programs and combined linear-CP programs to compute these lower bounds, highlighting directions where significant improvements are desirable.

References

1. Egon Balas. Computational Combinatorial Optimization: Projection and Lifting in Combinatorial Optimization, pages 26–56. Springer-Verlag, 2001.

2. Alain Chabrier, Emilie Danna, Claude Le Pape, and Laurent Perron. Solving a network design problem. Annals of Operations Research, 130(1-4):217–239, 2004.

3. Mervat Chouman, Teodor Gabriel Crainic, and Bernard Gendron. A cutting-plane algorithm based on cutset inequalities for multicommodity capacitated fixed charge network design. Technical Report CRT-2003-16, Centre de recherche sur les transports, Université de Montréal, 2003.

4. Virginie Gabrel, Arnaud Knippel, and Michel Minoux. Exact solution of multicommodity network optimization problem with general step cost functions. Operations Research Letters, 25:15–23, 1999.

5. ILOG. ILOG Cplex 9.1 User Manual. ILOG, April 2005.

6. ILOG. ILOG Solver 6.1 User Manual. ILOG, April 2005.

7. Zeynep Kiziltan. Symmetry breaking ordering constraints. PhD thesis, Uppsala University, 2004.

8. Claude Le Pape, Laurent Perron, Jean-Charles Regin, and Paul Shaw. Robust and parallel solving of a network design problem. In 8th International conference on principles and practice of constraint programming (CP), pages 633–648, 2002.

9. Pascal Van Hentenryck. The OPL optimization language. The MIT Press, 1999.

The Essence of Essence: A Constraint Language for Specifying Combinatorial Problems

Alan M. Frisch¹, Matthew Grum¹, Chris Jefferson¹, Bernadette Martínez Hernández¹, and Ian Miguel²

¹ Artificial Intelligence Group, Dept. of Computer Science, Univ. of York, York, UK
² School of Computer Science, University of St Andrews, St Andrews, UK

Abstract. Essence is a new language for specifying combinatorial (decision or optimisation) problems at a high level of abstraction. The key feature enabling this abstraction is the provision of decision variables whose values can be combinatorial objects, such as tuples, sets, multisets, relations, partitions and functions. Essence also allows these combinatorial objects to be nested to arbitrary depth, thus providing, for example, sets of partitions, sets of sets of partitions, and so forth.

1 Introduction

This paper describes Essence, a new language for specifying combinatorial (de-cision or optimisation) problems at a high level of abstraction. Essence is theresult of our attempt to design a formal language that enables problem specifi-cations that are similar to rigorous natural-language specifications, such as thosecatalogued by Garey and Johnson [1]. Formal problem specifications could facili-tate communication between humans better than the informal specifications thatare currently used, such as those in CSPLib.

A related goal in designing Essence has been to allow problems to be specified at a level of abstraction above that at which decisions are made when modelling the problem in a constraint language, a mathematical programming language or via a SAT encoding. Our principal motivation for such a language is to formalise the modelling process: modelling is the transformation of an Essence specification into an equivalent specification in the modelling language of choice. This, in turn, has enabled us to work on the automation of modelling.

Our working hypothesis has been that a problem specification language would not be some form of logical language, such as Z or NP-SPEC, but would be, to a first approximation, a constraint language, such as OPL, F or ESRA, enhanced with features that increase its level of abstraction. Most importantly, as a combinatorial problem requires finding a certain type of combinatorial object, the language should have decision variables whose domain elements are combinatorial objects of that type. This enables problems to be stated directly and naturally; without decision variables of the appropriate type the problem would have to be “modelled” by encoding the desired combinatorial object as a collection of constrained decision variables of some other type.

Constraint programming languages have gradually evolved a greater range of types for decision variables. For example, Eclipse [2] supports decision variables whose domain elements are sets; similarly F [3] supports functions, ESRA [4] supports relations and functions, and NP-SPEC [5] supports sets, permutations, partitions and integer functions. Each increase in abstraction allows the user to ignore additional modelling decisions, leaving these to the computer. Essence makes a large leap in this direction by providing a type system for constructing decision variables of arbitrarily complex domains. This, together with other features, gives Essence a high level of abstraction: enough that we call Essence a specification language as opposed to a modelling language.

One reason why new types of decision variables have been incorporated into constraint programming languages so slowly has been the difficulty of implementing the enhancements. Elsewhere we have shown how this difficulty can be overcome. In particular, we have shown how Essence specifications can be translated (we say refined) into declarative models at the level of abstraction supported by existing constraint programming languages [6]. We have implemented a rule-based system, called Conjure, that can refine specifications in a fragment of Essence that contains all of the main characteristics of Essence, but only a few constructs of the language. We believe that all the techniques needed to refine the full language are present in the existing Conjure implementation. Thus, our development of Essence has been untethered by the demands of refinement; in no case have we omitted a feature or construct from Essence because we could not refine it. Nonetheless, the demands of effective refinement have influenced some of the finer decisions in designing Essence.

The high level of abstraction provided by Essence is primarily a consequence of four features. (1) The language supports a wide range of types (including sets, multisets, relations, functions and partitions) and decision variables can have domains containing values of any one of these types. For example, the Social Golfers Problem (SGP, [7]) requires partitioning a set of golfers into groups (e.g., foursomes) in each week of play subject to a certain constraint. The partitions are easily represented by a decision variable whose domain elements are of type partition. We say that the type of a decision variable is the type of its domain elements. (2) All the types can be nested to arbitrary depth; for example, a decision variable can be of type set, set of sets, set of sets of sets, and so forth. For example, the SONET problem [7] requires placing each of a set of communicating nodes onto one or more communication rings in such a way that the specified communication demand is met. Thus, the goal is to find a set of rings, each of which is a set of nodes, and this can be stated in Essence by using a decision variable of type set of sets. (3) Constraints can contain quantifiers that range over decision variables. For example, if a decision variable X is of type set of sets, a constraint can be of the form ∀x ∈ X. φ. The Golomb Ruler Problem (GRP, [7]) requires finding a set of integers such that no two distinct pairs of distinct elements have the same difference. This can be expressed directly in Essence by quantifying over the pairs in the set, even though the set is unknown. (4) The language provides types containing unnamed, indistinguishable elements. This is useful because many problems involve some set of elements, yet do not mention particular elements. A guiding design principle is that the language should not force a specification to provide unnecessary information. We know of no specification or model of the SGP in which the constraints name any particular golfer. Yet most models name the golfers and, in doing so, introduce symmetry into the model. Taken together, these four features are almost unique to Essence; the exception is that ESRA and F support quantifiers ranging over decision variables.

We present a truth-conditional semantics for Essence, a first (we believe) for a language of this kind.¹ It features a proper treatment of undefined expressions, which can arise from, for example, division by zero or array indices out of bounds.

2 An Introduction to Essence by Example

Let us begin by considering the specification of the Golomb Ruler Problem (GRP, problem 6 at www.csplib.org, shown in Fig. 1). A specification is a list of statements, of which there are seven kinds, signalled by the keywords given, where, letting, find, maximising, minimising and such that. Statements are composed into specifications according to the regular expression:

(given | letting | where)∗ find+ [minimising | maximising] (such that)∗

letting statements declare constant identifiers and user-defined types. given statements declare problem parameters, whose values are input to specify the instance of the problem class. As in other modelling languages, parameter values are not part of the problem specification. where statements are used to constrain parameter values; only valid parameter values specify a problem instance. find statements declare decision variables. A minimising or maximising statement gives the objective function, if any. Finally, such that statements give the problem’s constraints.
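The ordering rule above can be checked mechanically. The following is a minimal sketch, assuming each statement has already been reduced to its leading keyword; the one-letter encoding and the name `well_ordered` are illustrative choices, not part of Essence.

```python
import re

# One-letter codes for the seven statement keywords (an encoding chosen for this sketch).
CODES = {
    "given": "g", "letting": "l", "where": "w", "find": "f",
    "minimising": "m", "maximising": "M", "such that": "s",
}

# (given | letting | where)* find+ [minimising | maximising] (such that)*
ORDER = re.compile(r"[glw]*f+[mM]?s*")

def well_ordered(keywords):
    """Return True if the keyword sequence matches the statement-order pattern."""
    encoded = "".join(CODES[k] for k in keywords)
    return ORDER.fullmatch(encoded) is not None

# Keyword sequence of the Golomb Ruler specification in Fig. 1.
grp = ["given", "where", "letting", "find", "minimising", "such that"]
print(well_ordered(grp))                 # True
print(well_ordered(["find", "given"]))   # False: a given may not follow a find
```

The check covers statement order only; it says nothing about whether identifiers are declared before use or whether expressions are well typed.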

The GRP specification begins by declaring the parameter n (valid when non-negative) and the identifier bound. The declaration of bound uses n, so n must be declared first. Identifiers must be declared before use; this prevents cyclic definitions and prevents decision variables from being used to define constants or parameters.

Essence is a statically typed language. It supports the atomic types int (integer), bool (Boolean), user-defined enumerated types and user-defined unnamed types, the last denoted by type of size α, a type comprising α unnamed and indistinguishable elements. Compound types are built with the constructors for sets, multisets, functions, tuples, relations, partitions and matrices. For example, set of int and relation on int × int are both types.

The domain of a decision variable is a finite set of elements all of the same type. One can specify a domain merely by giving its type, in which case the domain consists of all elements of that type. For example, given the user-defined enumerated type colour consisting of the elements red, green and blue, one can specify a decision variable with the domain colour. One can also specify a domain by giving both a type and restrictions that select a subset of the elements of the type. Consider, for example, the GRP decision variable Ticks. Its domain is set of int, with the restrictions that the set must have cardinality n, and its elements must be drawn from the range 0..bound. Quantified variables must also have finite domains. However, parameters can have infinite domains (e.g. int).

¹ Constraint logic programming languages’ semantics focus on integrating constraints with logic programming and abstract away from the semantics of particular constraints.

Given n, put n integer ticks on a ruler of size m such that all inter-tick distances are unique. Minimise m.

    given n : int
    where n ≥ 0
    letting bound be 2^n
    find Ticks : set (size n) of int (0..bound)
    minimising max(Ticks)
    such that ∀{i, j} ⊆ Ticks. ∀{k, l} ⊆ Ticks. {i, j} ≠ {k, l} → |i − j| ≠ |k − l|

A SONET communication network comprises a number of rings, each joining a number of nodes. A node is installed on a ring using an ADM and there is a capacity bound on the number of ADMs that can be installed on a ring. Each node can be installed on more than one ring. Communication can be routed between a pair of nodes only if both are installed on a common ring. Given the capacity bound and a specification of which pairs of nodes must communicate, allocate a set of nodes to each ring so that the given communication demands are met. The objective is to minimise the number of ADMs used. (This is a common simplification of the full SONET problem, as described in [8].)

    given nrings : int, nnodes : int, capacity : int
    where nrings ≥ 1, nnodes ≥ 1, capacity ≥ 1
    letting Nodes be int(1..nnodes)
    given demand : set (size m) of set (size 2) of Nodes
    find rings : mset (size nrings) of set (maxsize capacity) of Nodes
    minimising Σ r∈rings. |r|
    such that ∀pair ∈ demand. ∃r ∈ rings. pair ⊆ r

In a golf club there are a number of golfers who wish to play together in g groups of size s. Find a schedule of play for w weeks such that no pair of golfers play together more than once. (This transforms into a decision problem and parameterises problem number 10 in CSPLib.)

    1. given w : int, g : int, s : int
    2. where w > 0, g > 0, s > 0
    2. letting golfers be type of size g × s
    3. find sched : mset (size w) of regpart (size s) of golfers
    4. such that ∀{week1, week2} ⊆ sched. ∀group1 ∈ week1, group2 ∈ week2. |group1 ∩ group2| < 2
    Alternative constraint:
    4′. such that ∀{golfer1, golfer2} ⊆ golfers. (Σ week∈sched. together(golfer1, golfer2, week)) < 2

Fig. 1. Essence specifications of the Golomb Ruler, SONET, and Social Golfers problems.

Essence domains correspond to the notion of domains in existing constraint languages. The key advance made by Essence is that it is the first constraint language to support fully compositional domains. For example, a decision variable may have domain int (lb .. ub), set (size s) of int (lb .. ub), set (size r) of set (size s) of int (lb .. ub), etc.

Constraints are built from parameters, constants and decision variables using operators commonly found in mathematics and other constraint specification languages. The language also includes variable binders such as ∀x, ∃x and Σx, where x can range over any specified finite domain (e.g. an integer range, but not the whole of int). The GRP constraint can be paraphrased “For any two unordered pairs of ticks, {i, j} and {k, l}, if the two pairs are different then the distance between i and j is not the same as the distance between k and l.” To clarify, the expression {i, j} ⊆ Ticks means that two distinct elements are drawn from Ticks and, without loss of generality, one is called i and the other is called j.
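To make the paraphrase concrete, the sketch below checks the GRP constraint on a candidate set of ticks by enumerating unordered pairs explicitly. It is only an illustration of what the constraint means, not of how an Essence specification is refined or solved; the function name `satisfies_grp` is hypothetical.

```python
from itertools import combinations

def satisfies_grp(ticks):
    """Check that no two different unordered pairs of ticks have the same distance."""
    # All unordered pairs {i, j} of distinct ticks.
    pairs = list(combinations(sorted(ticks), 2))
    # Every choice of two different unordered pairs {i, j} != {k, l}.
    for (i, j), (k, l) in combinations(pairs, 2):
        if abs(i - j) == abs(k - l):
            return False
    return True

print(satisfies_grp({0, 1, 4, 6}))   # True: the six distances 1, 2, 3, 4, 5, 6 are all different
print(satisfies_grp({0, 1, 2}))      # False: |0 - 1| equals |1 - 2|
print(max({0, 1, 4, 6}))             # 6, the value of the objective max(Ticks)
```

Because the implication is symmetric in the two pairs, it is enough to visit each choice of two different pairs once.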


Now consider the specification of the SONET problem (Fig. 1). Notice that Nodes is declared to be a domain whose elements are the integers in the range 1..nnodes. A subtle point is that the given statement for demand declares two parameters, demand and m. When parameter demand is instantiated to a particular set of sets, the size of the outer set is known. Hence, the value of m is given indirectly. This declaration also requires the inner sets to have cardinality two. The goal is to find a multiset (the rings), each element of which is a set of Nodes (the nodes on that ring). The objective is to minimise the sum of the number of nodes installed on each ring. The constraint ensures that every pair of nodes that must communicate is installed on a common ring.
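As a reading aid, the following sketch evaluates the SONET constraint and objective on a candidate allocation. It assumes rings are given as a Python list of sets (standing in for the multiset) and demands as frozensets of two nodes; the names `sonet_ok` and `adm_count` are hypothetical.

```python
def sonet_ok(rings, demand, capacity):
    """Check the domain restriction and the constraint of the SONET specification."""
    # Each ring holds at most `capacity` nodes (the maxsize restriction on the inner sets).
    within_capacity = all(len(ring) <= capacity for ring in rings)
    # Every demanded pair is a subset of at least one ring.
    demands_met = all(any(pair <= ring for ring in rings) for pair in demand)
    return within_capacity and demands_met

def adm_count(rings):
    """Objective: the total number of node installations over all rings."""
    return sum(len(ring) for ring in rings)

demand = {frozenset({1, 2}), frozenset({2, 3})}
print(sonet_ok([{1, 2, 3}, set()], demand, capacity=3))   # True
print(adm_count([{1, 2, 3}, set()]))                      # 3
print(sonet_ok([{1, 2}, {3}], demand, capacity=3))        # False: nodes 2 and 3 share no ring
```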

Finally, Fig. 1 gives two specifications of the Social Golfers problem (SGP). The golfers are not referred to individually in the problem description, and so are specified naturally with an unnamed type. The decision variable is represented straightforwardly as a multiset (the fact that it is a set is an implied constraint) of regular partitions (a regpart guarantees equal-sized parts), each representing a week of play. The specifications differ only in the expression of the socialisation constraint. The constraint at line 4 quantifies over the weeks, ensuring that the size of the intersection between every pair of elements of the corresponding partitions is at most one (otherwise the same two golfers are in a group together more than once). The constraint at line 4′ quantifies over the pairs of golfers, ensuring that they are partitioned together (via the global constraint together) over the weeks of the schedule at most once. Note that here we make use of a facility common to constraint languages: treating Booleans as 0/1 for the purpose of counting.
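The two formulations of the socialisation constraint can be compared on small schedules. The sketch below assumes a schedule is a list of weeks and a week is a list of frozensets of golfers; the function names are hypothetical, and `together` is written here as a plain helper rather than as a global constraint.

```python
from itertools import combinations

def socialise_by_weeks(sched):
    """Line 4: groups taken from two different weeks share fewer than two golfers."""
    return all(len(group1 & group2) < 2
               for week1, week2 in combinations(sched, 2)
               for group1 in week1
               for group2 in week2)

def together(golfer1, golfer2, week):
    """True if the two golfers are in the same group in the given week."""
    return any(golfer1 in group and golfer2 in group for group in week)

def socialise_by_pairs(sched, golfers):
    """Line 4': each pair of golfers is grouped together in fewer than two weeks."""
    # Booleans are summed as 0/1, as in the specification.
    return all(sum(together(a, b, week) for week in sched) < 2
               for a, b in combinations(sorted(golfers), 2))

golfers = {0, 1, 2, 3}
week_a = [frozenset({0, 1}), frozenset({2, 3})]
week_b = [frozenset({0, 2}), frozenset({1, 3})]
good, bad = [week_a, week_b], [week_a, week_a]
print(socialise_by_weeks(good), socialise_by_pairs(good, golfers))   # True True
print(socialise_by_weeks(bad), socialise_by_pairs(bad, golfers))     # False False
```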

Which of these two alternatives is the more natural is down to the taste of the writer of the specification. This example does, however, highlight that, although Essence allows the user to avoid many more modelling decisions than existing constraint languages, it is possible to write distinct but equivalent Essence specifications. The natural question is whether we are simply requiring users to become experts in writing Essence specifications rather than in traditional constraint programming languages. Central to this question is whether alternative Essence specifications of the same problem are refined to the same models. If so, which of the alternative Essence specifications a user happens to choose is immaterial. However, since our Conjure refinement system is at a relatively early stage of development [6], we cannot fully answer this question at present.

Let us consider the most pessimistic case: different Essence specifications of the same problem are refined to radically different models. If so, we have replaced one modelling bottleneck with another: expertise in Essence is required to obtain the best models. However, even this is a substantial improvement over the current state of affairs, since the degree of choice is vastly reduced. Typically, an Essence specification can be implemented by many constraint models. Choosing among a small number of Essence specifications is significantly less daunting than choosing among a relatively large number of corresponding constraint models. In future, we aim to reduce the number of Essence specifications of the same problem by employing normalisation procedures to transform each specification into a canonical form, simplifying specification still further.


We close this section by noting that the complexity of the problems expressible in Essence is not clear as, like many other constraint languages, it allows one to build instances with an exponential number of variables (array [1..2^n]) and exponential domain sizes.

3 The Syntax of Essence

The syntax of well-formed Essence problem specifications is defined by a combination of a grammar and correct type and domain characterisations. The grammar of Essence is detailed in Section 3.1. The domains and the type system are explained in Section 3.2.

3.1 The ESSENCE grammar

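The grammar is presented in BNF format using the following conventions: a marked as a list stands for a non-empty list of a’s, optionally separated by a given symbol (which can be any symbol), and [a] stands for nil or one occurrence of a. In the productions below the list mark appears as a trailing ’ (for example, domainId’), and ‖ separates alternatives.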

Identifiers are strings whose first character is a letter and whose remaining characters are alphanumeric, “_” or “’”. Identifiers are names introduced by the user, hence they must be different from the built-in reserved keywords (indicated in teletype font). A number is any string of numeric characters.

Problem specifications in ESSENCE. A grammatically well-formed Essence problem specification is composed of a (possibly empty) preamble, a (possibly empty) list of find statements, an (optional) objective statement and a (possibly empty) list of constraints. The preamble includes the given, letting and where statements. Given statements consist of parameter declarations, that is, a list of domainId’s, where a domainId is an identifier and its domain (e.g. i:int). Constant declarations in letting statements associate identifiers with domains, expressions, or user-defined types. Where statements are conditions that the parameters and constants of a valid problem instance must satisfy. For example, in the first specification of Fig. 1 we have the statement where n ≥ 0, which ensures that parameter n is non-negative. Find statements are composed of a list of decision variable declarations. Parameter, constant and decision variable declarations comprise the declarations of the specification. The objective statement specifies an objective, either minimising or maximising, and an arithmetic expression (arithExpression) as the objective function of the specification.

spec        ::= [preamble] [find domainId’] [objective] [constraints]
preamble    ::= given domainId’
              ‖ letting constantId’
              ‖ where expression’
objective   ::= minimising arithExpression
              ‖ maximising arithExpression
constraints ::= such that expression’
constantId  ::= identifier be domain
              ‖ identifier [“:” domain] be expression
              ‖ identifier be new enum identifier’
              ‖ identifier be new type of size arithExpression
domainId    ::= identifier “:” domain


Types and Domains in ESSENCE. The atomic types supported in Essence are: Boolean, integer and user-defined types. The latter are of two kinds, enumeration types and unnamed types. Since user-defined types are associated with an identifier in the constant declarations, we use the non-terminals enumTypeId and unnamedTypeId to represent them in the grammar. The type system of Essence allows the construction of arbitrarily compound (multi)sets, partial and total functions, sequences, permutations, relations, tuples, (regular) partitions and matrices of other types.

Domains are (finite) sets of elements all of the same type. One can specify a domain merely by giving its type, in which case the domain consists of all elements of that type (e.g. i:int). One can also specify a domain by giving both a type and domain restrictions that select a subset of the elements of the type (e.g. i:int(1..3)). The non-terminals intRangeRestriction, enumRangeRestriction and sizeRestriction represent the domain restrictions that can be applied to integer types (e.g. int(1..3)), enumeration types (e.g. days(saturday, sunday), where days is an enumeration type consisting of the days of the week) and other types (e.g. set (size 2) of int) respectively.

domain    ::= “(” domain “)” ‖ bool
            ‖ int [“(” intRangeRestriction “)”]
            ‖ enumTypeId [“(” enumRangeRestriction “)”]
            ‖ unnamedTypeId
            ‖ set [sizeRestriction] of domain
            ‖ mset [sizeRestriction] of domain
            ‖ domain “↦” [funcClass] domain
            ‖ domain “→” [funcClass] domain
            ‖ rel on [sizeRestriction] domain [sizeRestriction]×
            ‖ tuple “〈” domain’ “〉”
            ‖ part [sizeRestriction] of domain
            ‖ regpart [sizeRestriction] of domain
            ‖ matrix “[” indexed by domain’ “]” of domain
funcClass ::= bij ‖ inj ‖ sur

Notice that types are not mentioned explicitly in the grammar; however, they exist in Essence as abstractions used by the type checker. Types are produced by removing all the domain restrictions (if any) from domains.

Expressions in ESSENCE. Expressions in Essence can be very complex. For reasons of space we give only a brief description.

The cardinality and negation operators are among the unary operators used to construct expressions. The binary operators can be arithmetic (e.g. +, −), Boolean (e.g. ∧, ∨), comparison (e.g. =, ≠) or set related (e.g. ∈, ∪). Many global constraints (e.g. allDiff, atmost, etc.) and operators over functions and relations (e.g. domain, inverse, etc.) can be used to compose expressions.

We can name elements of certain types, such as matrix (e.g. [2, a, b]), set (e.g. {2, a, b}) and multiset (e.g. #2, a, b#). An important requirement is that empty (multi)sets must have their domain attached, for example {} : set of int. The reason is that the type of a named (multi)set is deduced from its elements and this is not possible for an empty (multi)set.

Essence allows the user to create quantified expressions. Examples of quantified expressions can be found in the constraints of Figure 1. Each quantified expression (e.g. ∀{x, y} ∈ S. x + y ≥ z) consists of a quantifier (e.g. ∀), a binding expression (e.g. {x, y} ∈ S), and an expression to quantify (e.g. x + y ≥ z). The binding expression implicitly declares the quantified variables (e.g. x, y), and it often contains a sub-expression with named elements of various types (e.g. {x, y}). We show here only some of the cases (binder).

quantExp   ::= quantifier bindingExp expression
quantifier ::= “∀” ‖ “∃” ‖ “Σ”
bindingExp ::= “(” bindingExp “)” ‖ domainId ‖ binder setBindOp expression
setBindOp  ::= “⊂” ‖ “⊆” ‖ “∈” ‖ “∉”
binder     ::= identifier ‖ “{” identifier’ “}” ‖ “#” identifier’ “#” ‖ “〈” identifier’ “〉” ‖ ...

3.2 The Type Checker and the Finite Domain Checker.

The Essence grammar does not give enough information to define a syntactically correct specification. It does not ensure that identifiers are combined properly in expressions and that those identifiers have adequate domains. To avoid these problems we need to use the Type Checker (TC) and the Finite Domain Checker (FDC). The TC guarantees that all expressions and subexpressions have valid types. The FDC ensures that decision and quantified variables have finite domains.

Before performing any of the type and domain verifications we need to construct a table of identifiers from the declarations. In the table every identifier has a unique entry that records the identifier name and its associated information. The information depends on the declaration of the identifier. For parameters, constant domains and decision variables the domain must be saved. The domain and associated expression are stored for constant expressions, whereas we store the type construction for a user-defined type. During the construction of the table we need to check that every identifier is uniquely declared before it is used in another declaration. Also, associated expressions and domains must be type-checked and they must not contain decision variables.

The Type Checker. The TC is described here as a recursive function typeCheck: expression × env → expression that, given an expression, returns a well-typed expression with respect to a table of identifiers, also called the “environment” (env). typeCheck must be applied to all the expressions in a specification. The obtained type of an expression must be consistent with its context; for example, a constraint must be Boolean. When an expression cannot be typed or its type is inconsistent with its context, it is rejected.

For a built-in constant c, typeCheck assigns the correct built-in type, except in the cases of the empty set and multiset, where the type is obtained from the attached domain. For an identifier i the function typeCheck finds its domain in the environment and returns its type after verifying it is valid. For example, for the domain matrix indexed by [int(a..b)] of int, it returns matrix indexed by [int] of int after ensuring the indexing type int is an ordered type. The function typeCheck uses several subfunctions to perform the checking of more complex expressions. One of the most important is apTypeRule, the function that applies the type production rules. To clarify the operation of typeCheck let us show an example.

Suppose we have the expression x+4 where x has a domain int(1..4). The function typeCheck for binary operators (assume □ represents any binary operator, and a and b stand for any expression), defined as

typeCheck (a □ b) env → apTypeRule ((typeCheck a env) □ (typeCheck b env))

obtains the integer type for x and 4 after type-checking them. Then it applies the type rule for integer addition

a:int + b:int ⇝ (a + b):int

and returns (x+4):int. That is, the expression (x+4) has type integer. typeCheck performs similarly for other operators, types and type rules. For quantified expressions, typeCheck also includes the quantified variables in the current environment.
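The recursion for binary operators can be sketched as follows. This is a simplification under stated assumptions: expressions are nested tuples, the environment maps identifiers directly to types (restrictions already removed), the function returns only the type rather than an annotated expression, and the three-entry `TYPE_RULES` table is a hypothetical stand-in for the rules applied by apTypeRule.

```python
# Hypothetical type rules: (operator, left type, right type) -> result type.
TYPE_RULES = {
    ("+", "int", "int"): "int",
    ("=", "int", "int"): "bool",
    ("and", "bool", "bool"): "bool",
}

def type_check(expr, env):
    """Return the type of expr, or raise TypeError if it cannot be typed.

    expr is a literal, an identifier (str), or a tuple (op, left, right).
    env maps identifiers to their types.
    """
    if isinstance(expr, bool):          # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):
        if expr not in env:
            raise TypeError(f"undeclared identifier {expr!r}")
        return env[expr]
    op, left, right = expr
    # Type-check both operands, then look up the rule for the operator.
    key = (op, type_check(left, env), type_check(right, env))
    if key not in TYPE_RULES:
        raise TypeError(f"no type rule for {key}")
    return TYPE_RULES[key]

env = {"x": "int"}                                # x : int(1..4) with the range removed
print(type_check(("+", "x", 4), env))             # int
print(type_check(("=", ("+", "x", 4), 5), env))   # bool
```

An ill-typed expression such as `("+", "x", True)` has no matching rule and is rejected with a `TypeError`.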

The Finite Domain Checker. Decision variables and quantified variables (in a domain binding expression) must have a finite domain. Parameters and constants do not need a finite domain since they will have a finite value assigned when implementing an instance. We define below the rules to identify a finite domain (FDomain).

FDomain ::= bool ‖ int(iRange) ‖ enumIdentifier[(eRange)]
          ‖ unnamedIdentifier
          ‖ set [sizeRestrict] of FDomain
          ‖ mset [sizeRestrict] of FDomain
          ‖ FDomain “→” [funcType] FDomain
          ‖ FDomain “↦” [funcType] FDomain
          ‖ tuple “〈” FDomain’ “〉”
          ‖ rel [sizeRestrict] FDomain [sizeRestrict]×
          ‖ part [sizeRestrict] of FDomain
          ‖ regpart [sizeRestrict] of FDomain
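The FDomain rules amount to a structural recursion over domains. The sketch below follows the rules as listed, assuming a hypothetical encoding of domains as nested tuples in which size restrictions are simply omitted; only an integer domain without a range makes a domain fail the check.

```python
def is_finite(domain):
    """Return True if the domain matches the FDomain rules.

    Encoding assumed for this sketch: ("bool",), ("int",) or ("int", lo, hi),
    ("enum", name), ("unnamed", name), ("set", d), ("mset", d), ("part", d),
    ("regpart", d), ("func", d1, d2), ("tuple", d1, ..., dn), ("rel", d1, ..., dn).
    """
    kind = domain[0]
    if kind in ("bool", "enum", "unnamed"):
        return True
    if kind == "int":
        return len(domain) == 3                 # finite only when a range is attached
    if kind in ("set", "mset", "part", "regpart"):
        return is_finite(domain[1])
    if kind in ("func", "tuple", "rel"):
        return all(is_finite(d) for d in domain[1:])
    raise ValueError(f"no FDomain rule for constructor {kind!r}")

print(is_finite(("set", ("int", 0, 8))))             # True
print(is_finite(("set", ("int",))))                  # False: unrestricted int
print(is_finite(("func", ("int", 1, 3), ("bool",)))) # True
```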

4 The Semantics of Essence

This section presents a semantic account of Essence that identifies the conditions under which an Essence specification is satisfied, that is, what assignments to decision variables satisfy what specifications. To be clear, our focus is on defining the truth conditions of the language, not on defining the behaviour of a decision procedure for satisfiability or any other program. Our semantic account specifies a denotation function from Essence specifications and assignments to the truth values. This function is not total; the denotation of a specification relative to an assignment may be undefined. The denotation function is defined compositionally, and thus applies to every expression and subexpression of Essence.

The semantic rules assign denotations to Essence specifications in which each expression is annotated with its type. We also assume that specifications are normalised in that each statement that declares multiple symbols has been replaced with a sequence of statements each declaring a single symbol. Similarly, a such that statement with multiple constraints is assumed to have been replaced with multiple such that statements, each with a single constraint. The semantic rules do not deal with given statements. It suffices to consider a statement of the form “given p” to be synonymous with a statement of the form “letting p be exp”, where exp is an expression provided by the user to define the instance.


The denotation of an expression P is taken relative to three assignments: an assignment A of values to decision variables, an assignment θ of values to user-defined identifiers, and an assignment g of values to quantified variables. The denotation of P relative to A, θ and g is written as $[\![P]\!]_{A,\theta,g}$. Thus, $[\![\,\cdot\,]\!]$ is a partial function of arity four. A consequence of the definition of the denotation function is that if P is an Essence specification and A is an assignment to its decision variables, then the value of $[\![P]\!]_{A,\theta,g}$ is the same regardless of θ and g. Thus, we can simply write $[\![P]\!]_{A}$ and speak of the denotation of P with respect to A.

The denotation function is not total; $[\![\mathit{exp}]\!]_{A,\theta,g}$ may be undefined. This is necessary to handle cases such as where an array index is out of bounds or a divisor is zero. An attempt to do otherwise leads to problems. If “5/0 = 1” is false, then we must accept that “5/0 ≠ 1” is true. What, then, does one say about the truth of “5/0 = 4/0”? We also say that the denotation of an Essence specification is undefined if any of its where constraints are not met or if a decision variable is assigned a value outside its domain. The definition of the denotation function should not be considered as an incomplete definition; rather it is a complete definition of a partial function.

Our primary intuition regarding partiality is that a specification of a decision problem is satisfied by an assignment if the denotation of the specification with respect to the assignment is T; and this is the case even if the denotation is undefined with respect to other assignments. Though constraint programming languages typically do not have semantic specifications, their behaviour often conflicts with our intuition; for example, if one assignment leads to an array index being out of bounds, an exception is raised and (by default) the program is halted, even if other computation paths lead to a solution.

In the rules that follow, an expression of the form s; S stands for a sequence of Essence statements, the first of which is s and the remainder of which is the possibly empty sequence S. If σ is an assignment of any kind (either A, θ, or g), then σ[x ↦ d] is the assignment that is identical to σ with the possible exception that it maps x to d.

Let us first handle letting and where statements. A letting statement declares a new identifier and assigns it a value; the semantics accounts for this assignment by recording it in θ.

• $[\![\texttt{letting}\ c\ \texttt{be}\ \mathit{exp};\ R]\!]_{A,\theta,g} = [\![R]\!]_{A,\theta[c \mapsto e],g}$, where $e = [\![\mathit{exp}]\!]_{A,\theta,g}$.

• $[\![\texttt{letting}\ e\ \texttt{be new enum}\ c_1, \ldots, c_n;\ R]\!]_{A,\theta,g} = [\![R]\!]_{A,\theta',g}$, where $\theta' = \theta[e \mapsto \{c'_1, \ldots, c'_n\},\ c_1 \mapsto c'_1, \ldots, c_n \mapsto c'_n]$ and $\{c'_1, \ldots, c'_n\}$ is an arbitrary set of size $n$. This set is totally ordered by “≤”: $c'_1 \le c'_2 \le \cdots \le c'_n$.

• $[\![\texttt{letting}\ u\ \texttt{be new type of size}\ \mathit{exp};\ R]\!]_{A,\theta,g} = [\![R]\!]_{A,\theta[u \mapsto U],g}$, where $U$ is an arbitrary set of size $[\![\mathit{exp}]\!]_{A,\theta,g}$.

• $[\![\texttt{where}\ \mathit{constraint};\ R]\!]_{A,\theta,g} = [\![R]\!]_{A,\theta,g}$, provided $[\![\mathit{constraint}]\!]_{A,\theta,g} = T$.

The last semantic rule does not specify a denotation if the constraint is not satisfied. Our convention is that where a denotation is not specified, it is undefined.

The only role of find statements that must be accounted for by the semantics is to ensure that the denotation of a specification relative to an assignment A is undefined if A maps a decision variable to a value not in its domain. A minimising or maximising statement must be handled together with all the find statements, as optimisation is over all assignments to the decision variables. Thus, we give two rules: the first handles all the find statements in the absence of optimisation and the second handles minimisation by considering all solutions to the find and such that statements. The rule for maximisation is similar to that for minimisation and therefore is not shown.

• $[\![\texttt{find}\ x_1{:}\tau_1; \ldots; \texttt{find}\ x_n{:}\tau_n;\ R]\!]_{A,\theta,g} = [\![R]\!]_{A,\theta,g}$, provided $A(x_i) \in [\![\tau_i]\!]_{A,\theta,g}$ ($1 \le i \le n$) and $R$ contains no find, minimising or maximising statements.

• \[
[\![\texttt{find}\ x_1{:}\tau_1; \ldots; \texttt{find}\ x_n{:}\tau_n;\ \texttt{minimising}\ \mathit{exp};\ R]\!]_{A,\theta,g} =
\begin{cases}
T & \text{if } [\![FR]\!]_{A,\theta,g} = T \text{ and, for all } A' \text{ such that } [\![FR]\!]_{A',\theta,g} = T,\ [\![\mathit{exp}]\!]_{A,\theta,g} \le [\![\mathit{exp}]\!]_{A',\theta,g}, \\
F & \text{if } [\![FR]\!]_{A,\theta,g} = F, \\
F & \text{if } [\![FR]\!]_{A,\theta,g} = T \text{ and, for some } A' \text{ such that } [\![FR]\!]_{A',\theta,g} = T,\ [\![\mathit{exp}]\!]_{A,\theta,g} > [\![\mathit{exp}]\!]_{A',\theta,g},
\end{cases}
\]
where $FR$ is ($\texttt{find}\ x_1{:}\tau_1; \ldots; \texttt{find}\ x_n{:}\tau_n;\ R$).

As the first of the two rules above does not specify a denotation to a find statement that does not meet the stated provision, by our convention, the denotation of such a statement is undefined.
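The minimisation rule can be read as a brute-force check over all assignments. The following sketch assumes small finite domains, an objective and constraints that are never themselves undefined, and uses None for an undefined denotation; the name `denote_minimising` and its parameters are hypothetical.

```python
from itertools import product

def denote_minimising(assignment, domains, objective, constraints_hold):
    """Denotation of 'find ...; minimising objective; such that ...' at one assignment.

    Returns True or False, or None when the denotation is undefined because the
    assignment maps a decision variable to a value outside its domain.
    """
    names = list(domains)
    if any(assignment[x] not in domains[x] for x in names):
        return None                                # FR is undefined at this assignment
    if not constraints_hold(assignment):
        return False                               # FR denotes F
    # Smallest objective value over all assignments A' at which FR denotes T.
    best = min(
        objective(dict(zip(names, values)))
        for values in product(*(domains[x] for x in names))
        if constraints_hold(dict(zip(names, values)))
    )
    return objective(assignment) <= best

domains = {"x": range(0, 5), "y": range(0, 5)}
holds = lambda a: a["x"] + a["y"] >= 3
cost = lambda a: a["x"] + 2 * a["y"]
print(denote_minimising({"x": 3, "y": 0}, domains, cost, holds))   # True: cost 3 is minimal
print(denote_minimising({"x": 1, "y": 2}, domains, cost, holds))   # False: a cheaper solution exists
print(denote_minimising({"x": 9, "y": 0}, domains, cost, holds))   # None: 9 is outside the domain of x
```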

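The only role that domains play in the semantics is in the previous equation, which involves only finite domains. The denotations of some finite domains are defined as follows; the remainder are similar to these. Here we consider $\tau_o$ to be an ordered type.

• $[\![\texttt{bool}]\!]_{A,\theta,g} = \{T, F\}$.

• $[\![\tau_o(r_1, \ldots, r_n)]\!]_{A,\theta,g} = [\![\tau_o(r_1)]\!]_{A,\theta,g} \cup [\![\tau_o(r_2, \ldots, r_n)]\!]_{A,\theta,g}$, provided $n \ge 2$.

• $[\![\tau_o(l..u)]\!]_{A,\theta,g} = \{\, i \mid [\![l]\!]_{A,\theta,g} \le i \le [\![u]\!]_{A,\theta,g} \,\}$.

• $[\![\tau_o(\mathit{exp})]\!]_{A,\theta,g} = \{[\![\mathit{exp}]\!]_{A,\theta,g}\}$, where $\mathit{exp}$ is not of the form $l..u$ or $r_1, \ldots, r_n$.

• $[\![\texttt{set (size}\ \mathit{exp}\texttt{) of}\ \tau]\!]_{A,\theta,g} = \{\, S \subseteq [\![\tau]\!]_{A,\theta,g} \mid [\![\mathit{exp}]\!]_{A,\theta,g} = |S| \,\}$.

• $[\![\texttt{set (maxsize}\ \mathit{exp}\texttt{) of}\ \tau]\!]_{A,\theta,g} = \{\, S \subseteq [\![\tau]\!]_{A,\theta,g} \mid [\![\mathit{exp}]\!]_{A,\theta,g} \ge |S| \,\}$.

• $[\![\texttt{matrix indexed by}\ [d_1, \ldots, d_n]\ \texttt{of}\ \tau]\!]_{A,\theta,g} = [\![d_1]\!]_{A,\theta,g} \times \cdots \times [\![d_n]\!]_{A,\theta,g} \to [\![\tau]\!]_{A,\theta,g}$.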

Notice that a matrix denotes a function that is total when the indices are within bounds, and undefined when they are out of bounds.
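For small instances the domain denotations above can be enumerated directly. The sketch below covers integer ranges and the two set restrictions, assuming the restriction expressions have already been evaluated to integers; the helper names are hypothetical.

```python
from itertools import combinations

def int_range(lo, hi):
    """Denotation of int(lo..hi): the integers i with lo <= i <= hi."""
    return set(range(lo, hi + 1))

def sets_of_size(elements, size):
    """Denotation of set (size n) of tau: all subsets of tau's denotation with exactly n elements."""
    return {frozenset(c) for c in combinations(sorted(elements), size)}

def sets_of_maxsize(elements, maxsize):
    """Denotation of set (maxsize n) of tau: all subsets with at most n elements."""
    return {s for k in range(maxsize + 1) for s in sets_of_size(elements, k)}

print(len(sets_of_size(int_range(0, 3), 2)))      # 6 two-element subsets of {0, 1, 2, 3}
print(len(sets_of_maxsize(int_range(1, 3), 2)))   # 7 subsets of {1, 2, 3} with at most two elements
```

The counts grow combinatorially with the range, which is one reason the enumeration is useful only as an illustration of the definitions.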

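A sequence of such that statements is satisfied if the first statement is satisfied and the rest are satisfied. The empty sequence is always satisfied.

• \[
[\![\texttt{such that}\ C;\ R]\!]_{A,\theta,g} =
\begin{cases}
T & \text{if } [\![C]\!]_{A,\theta,g} = [\![R]\!]_{A,\theta,g} = T, \\
F & \text{if } [\![C]\!]_{A,\theta,g} = F, \\
F & \text{if } [\![R]\!]_{A,\theta,g} = F.
\end{cases}
\]

• $[\![\ ]\!]_{A,\theta,g} = T$.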

Now consider the atomic expressions. In all assignments, “T”, “F”, “1”, “2”, “3”, etc. denote T, F, 1, 2, 3, etc. For other atomic expressions we have:

• $[\![c]\!]_{A,\theta,g} = \theta(c)$ where $c$ is any user-defined identifier.

• $[\![x]\!]_{A,\theta,g} = A(x)$ where $x$ is a decision variable.

• $[\![x]\!]_{A,\theta,g} = g(x)$ where $x$ is a quantified variable.

Now consider the operators that are used to build up constraints and other expressions. We mostly focus here on the treatment of undefined denotations, as in other respects the semantics is both obvious and similar to that of other languages. Our intuition is that existentially-quantified variables should behave like decision variables, which suggests the following definition:

• $[\![\exists x{:}\tau .\ constr]\!]_{A,\theta,g}$
  $= T$ if $[\![constr]\!]_{A,\theta,g[x \mapsto d]} = T$ for some $d \in [\![\tau]\!]_{A,\theta,g}$
  $= F$ if $[\![constr]\!]_{A,\theta,g[x \mapsto d]} = F$ for all $d \in [\![\tau]\!]_{A,\theta,g}$.

Another intuition is that disjunction should behave like existential quantification, suggesting the following rule:

• $[\![exp_1 \lor exp_2]\!]_{A,\theta,g}$
  $= F$ if $[\![exp_1]\!]_{A,\theta,g} = [\![exp_2]\!]_{A,\theta,g} = F$
  $= T$ if $[\![exp_1]\!]_{A,\theta,g} = T$
  $= T$ if $[\![exp_2]\!]_{A,\theta,g} = T$.

The rules for conjunction and universal quantification are readily obtained from the above definitions by interchanging “T” and “F”.

Now consider the division operator:

• $[\![exp_1 / exp_2]\!]_{A,\theta,g} = [\![exp_1]\!]_{A,\theta,g} \,/\, [\![exp_2]\!]_{A,\theta,g}$

Observe that since division by zero is not defined, if $exp_2$ denotes zero, then the denotation of $exp_1/exp_2$ is undefined.
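The treatment of undefined denotations in the quantifier, disjunction and division rules can be sketched with a three-valued evaluator. This is an illustrative reading rather than the authors' implementation: `None` models an undefined denotation, division is taken to be integer division, and the equality helper `eq` is assumed to be undefined whenever an operand is undefined (the text above gives no rule for equality).

```python
UNDEF = None  # assumption: None models an undefined denotation


def disj(a, b):
    """T if either operand is T, F if both are F, otherwise undefined."""
    if a is True or b is True:
        return True
    if a is False and b is False:
        return False
    return UNDEF


def conj(a, b):
    """Obtained from disj by interchanging T and F."""
    if a is False or b is False:
        return False
    if a is True and b is True:
        return True
    return UNDEF


def exists(domain, constr):
    """T if constr is T for some d, F if it is F for all d, else undefined."""
    results = [constr(d) for d in domain]
    if any(r is True for r in results):
        return True
    if all(r is False for r in results):
        return False
    return UNDEF


def forall(domain, constr):
    """Obtained from exists by interchanging T and F."""
    results = [constr(d) for d in domain]
    if any(r is False for r in results):
        return False
    if all(r is True for r in results):
        return True
    return UNDEF


def div(a, b):
    """Division is undefined when the divisor denotes zero."""
    if a is UNDEF or b is UNDEF or b == 0:
        return UNDEF
    return a // b  # assumption: integer division


def eq(a, b):
    """Assumed rule: equality is undefined if either side is undefined."""
    if a is UNDEF or b is UNDEF:
        return UNDEF
    return a == b
```

For instance, `exists(range(0, 3), lambda d: eq(div(6, d), 3))` is true because d = 2 is a witness, even though d = 0 makes the body undefined; restricting the domain to `range(0, 2)` leaves the whole expression undefined.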

Some operator symbols are overloaded; the operator they denote depends on the type (never the denotation) of their operands. Consider ∪, which can denote set union or multiset union. This operator requires two syntactic rules, and it requires a semantic rule for each.

• $[\![exp_1 \cup exp_2]\!]_{A,\theta,g}$ = the multiset union of $[\![exp_1]\!]_{A,\theta,g}$ and $[\![exp_2]\!]_{A,\theta,g}$, provided both $exp_1$ and $exp_2$ are of type mset of $\tau$.

• $[\![exp_1 \cup exp_2]\!]_{A,\theta,g}$ = the set union of $[\![exp_1]\!]_{A,\theta,g}$ and $[\![exp_2]\!]_{A,\theta,g}$, provided both $exp_1$ and $exp_2$ are of type set of $\tau$.

The rule for indexing into a matrix is straightforward.

• $[\![M[i_1, \ldots, i_n]]\!]_{A,\theta,g} = [\![M]\!]([\![i_1]\!]_{A,\theta,g}, \ldots, [\![i_n]\!]_{A,\theta,g})$.

5 An Evaluation of the Use of Essence

To evaluate the effectiveness of Essence we decided to specify a large suite of problems in the language and reflect on the process and resulting specifications. A suite of 57 problems was selected, 25 drawn from www.csplib.org, and 32 from constraint journals and conferences. This list includes both problems that are theoretical in nature and problems drawn from the real world.

The catalogue² contains Essence specifications for all the problems as well as some previously-published specifications in other languages, including Z, ESRA, OPL and F. The catalogue demonstrates the expressiveness and elegance of Essence when compared to other constraint languages. In all cases the length of the specification was in proportion to the size of the problem statement, with large examples being just as easy to read.

The task of specifying the problem suite in Essence was undertaken by the second author after studying computer science for three years as an undergraduate. He had no previous experience of constraint programming or logic programming. At the outset the language had no formal description, so he had to learn the language from a small set of example specifications. He found that he was able to develop his understanding of Essence by drawing heavily on his understanding of the conventional set and function operators of discrete mathematics.

The process of writing an Essence specification began by obtaining an unambiguous natural language description of the problem. The flexibility of the language allowed specifications to be written directly from this description, starting with the parameters, and following with the decision variable, objective, and constraints.

Attention had to be paid to abstraction during specification in order to make full use of the language, as some of the problems were described in the literature in terms of low-level objects such as arrays. With the goal of producing the most abstract specification possible there was usually a single choice for the type of the decision variable. In the SONET problem the configuration can be represented as a set of sets of nodes or as a relation from rings to nodes. Both possibilities are present in the problem catalogue.

² Available at www.cs.york.ac.uk/aig/constraints/AutoModel/

Once several specifications had been written, the process became easier as the specifications contained common idioms that could be reused. Patterns were present in the high-level specifications that would not necessarily appear in the concrete constraint programs due to differing representations of complex objects. Certain constructs were repeated in their entirety, for example minimising a total cost by summing the costs of each item in a set.

6 Comparison of ESSENCE with other languages

Algebraic modelling languages, dating from the 1970s and originating in the field of mathematical programming (e.g. GAMS [9] and MGG [10]), made three significant advances. Firstly, they were developed to simplify the user’s role in solving mathematical programming problems, providing a syntax much closer to the expression of these problems found in the literature. Secondly, they are declarative, characterising the solutions to a problem rather than how the solutions are to be found. This lifts a burden from the user, and allows different solvers to be easily applied to the same problem. Finally, they separate the specification from instance data, allowing parameterised specifications of problems. As mathematical programming solvers become increasingly powerful and complex, the importance of these languages has increased (e.g. AMPL [11]).

The success of algebraic modelling languages as an interface to mathematical programming solvers suggests that a similar approach might be fruitful for constraint solving. Early in the development of the field, the ALICE language [12] shared many features with algebraic modelling languages of the time, including its declarative nature. Following this, however, the trend in constraint solving was to extend a general-purpose programming language, such as Prolog (ECLiPSe [13], CHIP [14]) or C++ (ILOG Solver [15]), with a constraint library. Although this approach is undoubtedly powerful and efficient, it is not conducive to reducing the modelling bottleneck, as such systems are often complex, require knowledge of the host general-purpose language, and interleave the specification of a problem with how it is solved.

More recently, in parallel with an increasing awareness of the modelling bottleneck in constraints, abstract constraint modelling has re-emerged in languages such as EaCL [16], opl [17], esra [4], and F [3]. opl has enjoyed particular success, not least because of its ability to support models intended to be solved by a combination of mathematical and constraint programming techniques. esra and F are more abstract than opl, providing support for relations and functions respectively.

However, to be truly valuable, a modelling language should not force the user to make modelling decisions when a specification is written down. For example, if the problem involves partitioning, it should not be necessary for the user to decide to represent the partition as, say, a constrained set of sets. Otherwise, the onus is on the user to make the most effective choice. If an ineffective choice is selected, the early commitment to a representation may inhibit automated attempts to improve the resultant model.

Therefore, a modelling language should have a level of abstraction above that at which modelling decisions are made. Current algebraic/constraint modelling languages are not abstract enough. It may be argued that general specification languages, such as B, Z, or (more focused on combinatorial problems) NP-Spec are sufficiently abstract to avoid forcing the user to make modelling decisions. However, we believe that an abstract specification language tailored to constraint problems provides the most natural means of specifying such problems.

In order to illustrate the advances made by Essence, we compare the specifications of the SGP in Essence, NP-Spec, esra and opl. Both an English definition and an Essence specification of this problem are given in Fig. 2; specifications in esra (from [4]), opl (written by the authors) and NP-Spec³ are given in Fig. 6. Each specification is divided into four sections. The first declares any parameters and types, the second declares the variables, the third imposes that the variables represent a multiset of partitions, and the fourth imposes that the “sociability constraint” (that is, no pair of players is in the same group more than once) is satisfied.

The most natural way in which to represent the solution of this problem is as a multiset (as the weeks are not distinguished) of partitions of the players, where each partition gives the games played in a week. The Essence specification is able to capture this entirely in the decision variable it uses. While NP-Spec provides a partition type, it is not possible to create either a list or multiset of partitions and so it cannot be used. Each of the NP-Spec, opl and esra specifications therefore uses a specific implementation of multisets of partitions for the variables, requiring the user to make modelling decisions. The NP-Spec and esra specifications both implement the multiset of partitions by introducing a range of values which represent which group a player plays in, and then use a function from “players × weeks” to the group that player is in during that week. The opl specification’s decision variable is a refinement of this, where the function is represented as a two-dimensional matrix indexed by players and weeks, with elements drawn from a range to denote the group.

Each of the esra, opl and NP-Spec specifications has to give a constraint which imposes that the decision variables represent a multiset of regular partitions. These constraints are contained in the third section of each specification.

As they label the players, weeks and groups, the esra, opl and NP-Spec specifications introduce a large amount of symmetry which is not present in the original specification. The symmetry of the weeks and players can be removed by using unnamed types, but the symmetry of the groups is more subtle. Labelling groups in different weeks using the same type introduces a dependency between the groups in different weeks that is not present in the specification. The true symmetry group allows the groups to be freely permuted independently in each week. By not having to label the groups, the Essence specification avoids this problem. The full extent of the symmetries of the groups has been missed previously (for example [18]), which shows how the use of Conjure can aid even expert modellers.

³ From http://www.dis.uniroma1.it/˜cadoli/research/projects/NP-SPEC/

One unusual feature in esra’s definition of “Sched”, on line 2, is the notation →groupsize∗weeks. This represents that each element of Group is mapped to by exactly groupsize ∗ weeks assignments. This is implied by the fact that this function is representing a multiset of partitions, but is insufficient to fully impose this condition, as this requires the stronger condition that each group contains groupsize players in each week.

The main constraint of the SGP, contained in part four of each specification, is that no pair of players may play together more than once. Both the opl and esra specifications impose this constraint by quantifying over the players. NP-Spec’s specification implements this constraint by imposing that there must not exist two distinct players and two distinct weeks such that the players play together in both weeks. This can be seen as an alternative method of specifying the constraint in the Essence specification, using the common transformation that instead of imposing that fewer than n elements of a set satisfy a condition, it is equivalent to check that there is no subset of size n where all elements satisfy the condition.

esra
1a  cst weeks, groups, groupsize : N
1b  dom Players = 1..groups ∗ groupsize, Weeks = 1..weeks, Groups = 1..groups;
2   var Sched : (Players × Weeks) →groupsize∗weeks Groups
    solve
3   ∀(h : Groups, w : Weeks) count(groupsize)(p : Players | Sched(p, w) = h)
4   ∧ ∀(p1 < p2 : Players) count(0..1)(w : Weeks | Sched(p1, w) = Sched(p2, w))

opl
1a  int weeks = ... ; int groups = ... ; int groupsize = ... ;
1b  range Weeks 1..weeks; range Groups 1..groups; range Players 1..groups∗groupsize;
2   var Groups Schedule[Players, Weeks];
    subject to
3a  forall(w in Weeks & g in Groups) (sum(p in Players) (Schedule[p,w] = g) = groupsize);
4a  forall(ordered p1,p2 in Players) (sum(w in Weeks) (Schedule[p1,w] = Schedule[p2,w]) < 2);

NP-Spec
    DATABASE
1   weeks = 6; groups = 8; groupsize = 4;
    SPECIFICATION
2   IntFunc(1..groups∗groupsize><1..weeks, Schedule, 1..groups).
3   fail <-- COUNT(Schedule(*,W,Gr),X), X != groupsize.
4   fail <-- Schedule(P1,W1,Gr1), Schedule(P2,W1,Gr1), P1 != P2,
             Schedule(P1,W2,Gr2), Schedule(P2,W2,Gr2), W1 != W2.

Fig. 2. esra, opl and NP-Spec specifications of the Social Golfers Problem
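To make the comparison concrete, the following sketch checks a candidate schedule in the "players × weeks → group" representation shared by the three specifications above. It is an illustration only: the function names and the dictionary encoding are hypothetical, and the two sociability checks correspond to the counting formulation (opl, esra) and the forbidden-pattern formulation (NP-Spec) discussed in the text.

```python
from itertools import combinations


def from_weeks(weeks_list):
    """Build sched[(player, week)] = group from a list of weekly partitions."""
    sched = {}
    for w, week in enumerate(weeks_list, start=1):
        for g, group in enumerate(week, start=1):
            for p in group:
                sched[p, w] = g
    return sched


def groups_are_regular(sched, players, weeks, groups, groupsize):
    """Part 3: every group holds exactly groupsize players in every week."""
    return all(
        sum(1 for p in players if sched[p, w] == g) == groupsize
        for w in weeks
        for g in groups
    )


def sociable_by_counting(sched, players, weeks):
    """Part 4 as in opl and esra: each pair shares a group in fewer than 2 weeks."""
    return all(
        sum(1 for w in weeks if sched[p1, w] == sched[p2, w]) < 2
        for p1, p2 in combinations(players, 2)
    )


def sociable_by_forbidden_pattern(sched, players, weeks):
    """Part 4 as in NP-Spec: no two players share a group in two distinct weeks."""
    return not any(
        sched[p1, w1] == sched[p2, w1] and sched[p1, w2] == sched[p2, w2]
        for p1, p2 in combinations(players, 2)
        for w1, w2 in combinations(weeks, 2)
    )
```

The two sociability functions agree on every schedule, which is the transformation described above: "fewer than 2 weeks satisfy the condition" is the same as "no 2-element subset of weeks satisfies it throughout".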

7 Conclusion

Essence allows combinatorial problems to be specified at a high level of abstraction. The result is that problems can be specified without (or almost without) modelling them. The central, unique feature of Essence is that it supports complex, arbitrarily-nested types. Consequently, a problem that requires finding a complex combinatorial object can be directly specified by using a decision variable whose type is precisely that combinatorial object.

References

1. Garey, M., Johnson, D.: Computers and Intractability. W. H. Freeman (1979)

2. Gervet, C.: Conjunto: Constraint logic programming with finite set domains. In Bruynooghe, M., ed.: Logic Programming - Proceedings of the 1994 International Symposium, Massachusetts Institute of Technology, The MIT Press (1994) 339–358

3. Hnich, B.: Function Variables for Constraint Programming. PhD thesis, Computer Science Division, Department of Information Science, Uppsala University (2003)

4. Flener, P., Pearson, J., Ågren, M.: Introducing ESRA, a relational language for modelling combinatorial problems. In: Proceedings of LOPSTR ’03: Revised Selected Papers. Volume 3018 of LNCS. (2004)

5. Cadoli, M., Ianni, G., Palopoli, L., Schaerf, A., Vasile, D.: NP-SPEC: An executable specification language for solving all problems in NP. Computer Languages 26 (2000) 165–195

6. Frisch, A.M., Jefferson, C., Martínez Hernández, B., Miguel, I.: The rules of constraint modelling. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence. (2005) 109–116

7. Gent, I.P., Walsh, T.: CSPLib: A problem library for constraints. www.csplib.org (2005)

8. Frisch, A.M., Hnich, B., Miguel, I., Smith, B.M., Walsh, T.: Transforming and refining abstract constraint specifications. In: Proceedings of the 6th Symposium on Abstraction, Reformulation and Approximation. Volume 3607 of LNCS., Springer (2005) 76–91

9. Brooke, A., Kendrick, D., Meeraus, A.: GAMS: A Users’ Guide. The Scientific Press, Danvers, Massachusetts (1988)

10. Simons, R.: Mathematical programming modeling using MGG. IMA Journal of Mathematics in Management 1 (1987) 267–276

11. Fourer, R., Gay, D.M., Kernighan, B.W.: AMPL: A Modeling Language for Mathematical Programming. Second edn. Thomson/Brooks/Cole, Pacific Grove, California (2003)

12. Laurière, J.L.: ALICE: A language and a program for stating and solving combinatorial problems. Artificial Intelligence 10 (1978) 29–127

13. Cheadle, A., Harvey, W., Sadler, A.J., Schimpf, J., Shen, K., Wallace, M.: ECLiPSe: An introduction. Technical Report IC-Parc-03-1, Imperial College London (2003)

14. Aggoun, A., Beldiceanu, N.: Overview of the CHIP compiler system. In Benhamou, F., Colmerauer, A., eds.: Constraint Logic Programming: Selected Research. MIT Press, London (1993) 421–436

15. ILOG: ILOG Solver 5.1 User’s Manual. (2000)

16. Mills, P., Tsang, E., Williams, R., Ford, J., Borrett, J.: EaCL 1.5: An easy abstract constraint optimisation programming language. Technical report, University of Essex, Colchester, UK (1999)

17. Van Hentenryck, P.: The OPL Optimization Programming Language. The MIT Press (1999)

18. Flener, P., Frisch, A., Hnich, B., Kızıltan, Z., Miguel, I., Walsh, T.: Matrix modelling. In: Proceedings of the CP’01 Workshop on Modelling and Problem Formulation. (2001) 1–7

The Systematic Generation of Channelling Constraints

Bernadette Martínez-Hernández and Alan M. Frisch

Artificial Intelligence Group, Dept. of Computer Science, Univ. of York, York, UK

Abstract. The automatic modelling tool Conjure generates CSP models from problem specifications. The generated models may contain several redundant representations of the same specification variable. The consistency between the alternative representations is maintained by imposing channelling constraints. In this paper we present an algorithm that produces correct channelling constraints for the generated models using only the facilities already provided by the Conjure system.

1 Introduction

The multiple (concrete) representations of an (abstract) specification variable produce the alternatives to be found in a combined model. The simultaneous assignments of values to each concrete representation must be constrained to stand for the same value of the abstract variable. Hence, we need a method to maintain the consistency between simultaneous redundant representations. This is achieved by imposing channelling constraints. The systematic generation of representations allows us to trace how each of the alternatives arises. This understanding of representations helps to produce the correct channelling constraints. In this paper we introduce an algorithm for the systematic generation of channelling constraints between simultaneous redundant representations (under the Conjure framework). The basic notions related to modelling, channelling constraints and Conjure, and an outline of the rest of the paper, are given as follows.

Solving a problem using CSP technology requires mapping its informal description (often natural language) into a formal description in a particular formal system adequate for constraint solving. This process is called modelling. Modelling is a hard task. In many cases the conceptual gap between the informal problem description and the constraint program is large. We may find that a problem is easily modelled with a variable whose domain is unsupported by current solvers. For example, the Sonet problem (shown in Figure 1) requires assigning nodes to a group of rings. This is easily modelled with a variable (rings) whose domain is composed of multisets of sets of elements drawn from an integer range (Nodes). Variables with this sort of domain are not supported by any current solver. Modellers must use variables with supported domains to represent and implement variables with unsupported domains. In our example, we can use a two-dimensional matrix of integer variables to represent the variable rings. Often, the transformation from an unsupported variable to a supported one involves the addition of constraints to the model. Following the example, to ensure the soundness of the transformation from rings to the two-dimensional matrix we need to impose a group of allDifferent constraints, one for each represented set.

In an attempt to reduce the burden of hand-crafted modelling, several systems have been created to automate some of the modelling decisions. One of these systems is Conjure [1], a system that transforms problem specifications into CSP models. Abstract variables in a specification fed to Conjure may have domains currently unsupported by solvers. Conjure uses a set of refinement rules to compositionally refine the variables (and constraints) into concrete representations that can be implemented in current solvers. We briefly describe Conjure in Section 2.

Modellers often come up with several alternatives to represent an abstract variable. In Section 3 we discuss the definition of representation. We also present some examples of alternative redundant representations.

Different representations may have different strengths. Good modellers know that combining alternative representations in the same model can improve propagation, among other benefits. To maintain the consistency between these simultaneous alternatives we need to add channelling constraints [2] to the model. Section 4 discusses channelling constraints (also called channels) between alternative representations.

Conjure is able to produce multiple redundant representations of the same variable. Also, these different alternatives can be generated simultaneously in the same model. Currently, Conjure does not automatically produce the channelling constraints that maintain the consistency between the alternative representations. We show in Section 5 that it is possible to generate those channels systematically. More importantly, we can use Conjure itself for the generation; that is, we do not need to implement a new subsystem.

Final remarks and future work details can be found in the last section of this paper.

2 Refinement and CONJURE

Essence version one (EV1) [4] is the language used to specify problems fed to Conjure. Variables in EV1 may have associated domains unsupported by current solvers, for example, multisets, functions, relations and partitions. Unlike other languages such as F [5], the domain system of EV1 allows domains to be compound to arbitrary depth, thus providing variables with a domain of sets of integers, sets of sets of integers, and so forth.

The currently implemented version of Conjure refines variables whose (arbitrarily compound) domains are composed of integers, Booleans, sets or multisets. It is planned to extend this implementation of Conjure to support the full range of domains allowed by EV1. A full description of the EV1 language and the performance of Conjure is given in http://www.cs.york.ac.uk/aig/constraints/AutoModel/.

A Sonet communication network comprises a number of rings, each joining a number of nodes. A node is installed on a ring using an ADM. There is a capacity bound on the number of nodes that can be installed on a ring. Each node can be installed on more than one ring. Communication can be routed between a pair of nodes only if both are installed on a common ring. Given the capacity bound and a specification of which pairs of nodes must communicate, allocate a set of nodes to each ring so that the given communication demands are met. The objective is to minimise the number of ADMs used. (This is a common simplification of the full Sonet problem, as described in [3].)

given nrings:int, nnodes:int, capacity:int
where nrings ≥ 1, nnodes ≥ 1, capacity ≥ 1
letting Nodes be int(1..nnodes)
given demand: set of set (size 2) of Nodes
find rings: mset (size nrings) of set (maxsize capacity) of Nodes
minimising ∑r∈rings |r|
such that ∀pair∈demand. ∃r∈rings. pair ⊆ r

Fig. 1. Essence specification of the Sonet Problem.

An example of an EV1 specification can be found in Figure 1. Keywords identifying the various statements of the specification are shown in teletype font. The parameters of the problem, namely the integers nnodes, nrings and capacity and the set of node pairs demand, are declared after the keyword given. The restrictions on the parameters of a problem instance are specified after the keyword where. Following the keyword letting, Nodes is declared as a ‘short-cut’ for the integer range int(1..nnodes). The decision variable rings is declared as a multiset of sets of integers, after the keyword find. Notice that when declared, each parameter and each variable has its domain attached after the symbol ‘:’. The objective function follows the keyword minimising and the constraints of the problem follow the keywords such that.
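As a concrete reading of the specification in Figure 1, the sketch below checks a candidate value of rings against the domain and the such that constraint, and evaluates the objective. It is only an illustration: the function names are hypothetical, and a Python list of frozensets stands in for the multiset of sets (the order of the list is ignored by both functions).

```python
def sonet_feasible(rings, nrings, nnodes, capacity, demand):
    """Check rings against its declared domain and the demand constraint."""
    nodes = set(range(1, nnodes + 1))
    return (
        len(rings) == nrings                               # mset (size nrings)
        and all(len(r) <= capacity for r in rings)         # set (maxsize capacity)
        and all(r <= nodes for r in rings)                 # elements drawn from Nodes
        and all(any(pair <= r for r in rings) for pair in demand)  # pair ⊆ r for some ring
    )


def sonet_objective(rings):
    """Number of ADMs used: the sum of |r| over all rings."""
    return sum(len(r) for r in rings)
```

For example, with four nodes, two rings of capacity three and demands {1,2}, {2,3} and {3,4}, the rings {1,2,3} and {3,4} are feasible and use five ADMs, whereas the rings {1,2} and {3,4} leave the demand {2,3} unmet.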

The variable rings has an associated compound domain. To refine this variable and others of arbitrarily compound domain, Conjure performs the refinement process in a compositional manner. That is, it first reduces the multiset layer by refining rings into a new variable(s) that represents the multiset. This reduction of layers is performed by the recursive application of the refinement rules. All refinement rules output a correct representation of their input, where an input can be a variable or a constraint.

To exemplify the compositional construction carried out in each rule, let us explain the SizedMultiset1 rule shown in Figure 2. This rule accepts as input a variable whose domain is composed of multisets of τ (τ represents any domain), where each multiset has a restricted size m (m is a placeholder for any expression). Thus, the variable rings matches the input of the SizedMultiset1 rule and this rule can be applied. The SizedMultiset1 rule ‘peels off’ the multiset layer of the variable $X_1$. It does so by generating a new variable with the instruction genSymbol. The new variable $X_1'$ has a matrix domain. To refine the following layer (τ in the rule, sets for the variable rings) of the new variable, the refinement function ρ must be called recursively. The function ρ matches its input with the input of a rule and then applies the rule. After the recursive refinement is finished, the SizedMultiset1 rule returns $X_1''$, a correct representation of $X_1$. For the variable rings the output depends on the refinement rules used by ρ to transform the set layer.

SizedMultiset1:
$\rho(X_1 : \text{mset (size } m\text{) of } \tau) \xrightarrow{\;ref\;} \{\, X_1'' \;\; \text{represent } X_1 \text{ by } expmset(X_1') \;\mid\; X_1'' \in \rho(X_1') \,\}$
where $X_1' = genSymbol(X_1,\ \text{matrix (indexed by } 1..m\text{) of } \tau)$

Fig. 2. SizedMultiset1 Rule

Often, there are several refinement rules that can be applied to an expression. In fact, ρ returns a set of alternative refinements. Hence, by the successive application of refinement rules, Conjure may transform a high-level specification into various different CSP models. Each of the generated models is a correct representation of the original specification and it can be easily implemented in a current solver. At the moment, Conjure does not implement any heuristic to select effective models from the alternatives generated.
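The recursive, layer-by-layer behaviour of ρ described above can be sketched as follows. This is not Conjure's implementation: domains are encoded as hypothetical nested tuples, only fixed-size sets and multisets are handled, the annotations are plain strings, and the Boolean "occurrence" alternative for sets of integers is an assumed second rule included only to show how several alternatives arise.

```python
def refine(domain):
    """Return a list of (concrete_domain, annotations) alternatives.

    Hypothetical encoding of domains:
      ("int", lo, hi), ("bool",), ("set", size, inner), ("mset", size, inner)
    Concrete matrices are ("matrix", (first_index, last_index), element_domain).
    """
    kind = domain[0]
    if kind in ("int", "bool"):
        return [(domain, [])]  # already supported by solvers
    if kind == "mset":
        # In the spirit of SizedMultiset1: a matrix indexed by 1..size,
        # whose elements are refined recursively.
        _, size, inner = domain
        return [
            (("matrix", (1, size), concrete), ["expmset"] + notes)
            for concrete, notes in refine(inner)
        ]
    if kind == "set":
        _, size, inner = domain
        # Explicit alternative: a matrix of distinct elements.
        alternatives = [
            (("matrix", (1, size), concrete), ["expset", "allDifferent"] + notes)
            for concrete, notes in refine(inner)
        ]
        # Assumed occurrence alternative, only for sets of integers.
        if inner[0] == "int":
            _, lo, hi = inner
            alternatives.append(
                (("matrix", (lo, hi), ("bool",)), ["occurrence", f"sum = {size}"])
            )
        return alternatives
    raise ValueError(f"unsupported domain: {domain!r}")
```

Calling `refine(("mset", 2, ("set", 3, ("int", 1, 5))))` yields two alternatives, one per way of refining the inner set layer, which mirrors how the output for rings depends on the rules used for the set layer.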

Conjure is also able to generate models including multiple alternative constructions related to an abstract variable if this variable appears more than once in the constraints of an EV1 specification. For example, in the Sonet problem Conjure may generate a model with two simultaneous alternatives for rings, one to be used in the objective function ($\sum_{r \in rings} |r|$), and a very different one for the problem constraint (∀pair∈demand. ∃r∈rings. pair ⊆ r). It is expected that each of the constructed alternatives represents the same value of the specification variable when all the variables of an implementation of the model are instantiated to values. This consistency between alternatives is maintained through channelling constraints.

The current implementation of Conjure does not generate the channelling constraints. However, the rules add information tags to the model. Some of these tags annotate the relation between the variables used in the rule. For example, we find in the SizedMultiset1 rule the channelling annotation tag ‘represent $X_1$ by expmset($X_1'$)’ to express the relation between $X_1$ and $X_1'$. In general, the channelling annotations provide information about the type of representation(s) generated by the rule. A human modeller can interpret the information of these tags and use it to construct the needed channelling constraints. The next section discusses the representations and the channelling annotations used by Conjure to tag the derivation (and relation) of representations. Channelling constraints are formally defined and discussed in Section 4.

3 Representations

An abstract variable whose domain is unsupported by a solver must be implemented with a variable or variables of supported domain and possibly some associated constraints over the implemented variables. During the refinement process, the input of a rule can be a variable or a constraint (with its variables), whereas variables (and their domains) and possibly some constraints compose the output of a refinement rule. In both cases a group of variables and constraints is transformed into another group of variables and constraints. The following definition of CSP instance captures the notion of a general unit of transformation.

Definition 1 A viewpoint is a pair $V = (Z, D_Z)$ where $Z$ is a set of variables and $D_Z$ is a set containing, for every variable $z \in Z$, an associated domain $D_Z(z)$ defining the set of possible values of $z$.

An assignment in $V$ is a pair 〈z, a〉, which means that variable $z \in Z$ is assigned the value $a \in D_Z(z)$. A total assignment is a set containing exactly one assignment for each of the variables in $Z$.

A CSP instance is a pair R = (V, C) where V is a viewpoint and C is a (possibly empty) set of constraints. The variables (and domains) of each constraint in C must be included in V.

A total assignment of the variables of the CSP instance R is a solution if it satisfies all the constraints in R.

Example of CSP instance: The CSP instance S1 (set) consists of a single variable S whose domain consists of all sets of size n whose elements are drawn from the integer range a..b, where a, b and n are integer numbers such that a ≤ b and n > 0. The set of constraints in S1 is empty. These definitions of a, b, n, S and S1 are used throughout this document.

Viewpoints are defined and used by Law and Lee in their model induction [6] and model algebra [7]. The concept of CSP instance is very similar to their concept of model. The main difference is that CSP instances (and viewpoints) are not required to be associated with a problem (immediately). We do, however, describe the conditions for a CSP instance to be represented by another CSP instance. Intuitively, we aim to transform an initial CSP instance into another one that represents it consistently. Hence, the definition of representation must be strongly related to the preservation of solutions.

Definition 2 R′ represents R via ψ if R′ = (V′, C′) and R = (V, C) are CSP instances and ψ is a partial function from the total assignments of the variables in V′ into the total assignments of the variables in V such that:

– For each total assignment x′ of the variables in V′, x′ satisfies the constraints in C′ if and only if ψ(x′) is defined and satisfies the constraints in C,

– For each assignment x of the variables in V satisfying the constraints in C, there is an assignment x′ of the variables in V′ such that ψ(x′) = x.

The triple 〈R, R′, ψ〉 specifies the representation of R (under R′ and ψ).

Example of representation: The one-dimensional array of integer variables VaE1S indexed by 1..n composes the variable section of the CSP instance E1 (explicit set). Each element of the array has an integer domain a..b. The CSP instance E1 includes constraints that ensure all the elements of the array VaE1S are different (allDifferent). Let $\psi_E$ be a function from the assignments of VaE1S into the assignments of S. The application of $\psi_E$ is defined only for arrays whose values are all different. The function $\psi_E$ returns a set containing the values present in the array. Notice that $\psi_E$ fulfils all the requirements of Definition 2, hence E1 represents S1 via $\psi_E$.
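For small values of a, b and n the two conditions of Definition 2 can be checked exhaustively for this example. The sketch below is an illustration under the assumption that an assignment of the explicit array is a Python tuple and that `None` models "ψE is undefined"; the names `psi_E` and `check_E1_represents_S1` are hypothetical.

```python
from itertools import combinations, product


def psi_E(array):
    """Map an explicit array to the set of its values; undefined (None)
    unless all the values are different."""
    if len(set(array)) != len(array):
        return None
    return frozenset(array)


def check_E1_represents_S1(a, b, n):
    """Brute-force check of the two conditions of Definition 2."""
    # Solutions of S1: every set of size n drawn from a..b (S1 has no constraints).
    s1_solutions = {frozenset(c) for c in combinations(range(a, b + 1), n)}
    arrays = list(product(range(a, b + 1), repeat=n))
    # Condition 1: x' satisfies allDifferent iff psi_E(x') is defined
    # and is a solution of S1.
    cond1 = all(
        (len(set(x)) == n) == (psi_E(x) is not None and psi_E(x) in s1_solutions)
        for x in arrays
    )
    # Condition 2: every solution of S1 is the image of some array.
    cond2 = all(any(psi_E(x) == s for x in arrays) for s in s1_solutions)
    return cond1, cond2
```

Note that `psi_E((1, 2))` and `psi_E((2, 1))` give the same set, so ψE is not injective; each set of size n is the image of n! arrays.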

The variable representation discussed by Jefferson and Frisch [8] is similar to this definition; however, they do not base representations on CSP instances. In fact, they disregard constraints in their definition. A stronger definition of the function ψ defines the equivalence of CSP instances.

Definition 3 R′ is equivalent to R via ψ if and only if R′ represents R via ψ, ψ has an inverse, and R represents R′ via $\psi^{-1}$.

Examples of equivalent representation: Any CSP instance R is equivalent to itself via the identity function. An example more related to sets is the CSP instance O1 (occurrence set), which contains a one-dimensional array of Boolean variables VaO1S indexed by the integer range a..b. The constraint sum(VaO1S) = n is the only element of the set of constraints. The function ψO from the assignments of VaO1S to the assignments of S is defined only for the assignments satisfying sum(VaO1S) = n. ψO returns the set of the indices of VaO1S to which the Boolean value True is assigned. It is not difficult to conclude that O1 represents S1 via ψO and S1 represents O1 via ψO⁻¹. Hence O1 is equivalent to S1 via ψO.
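The equivalence of O1 and S1 can be illustrated by checking that ψO is a one-to-one correspondence on a small instance. This is a sketch under the same assumption as before (S ranges over the subsets of a..b of size n); the function names are hypothetical.

```python
from itertools import combinations, product


def psi_O(bools, a, n):
    """Partial map from Boolean arrays (index 0 stands for value a) to sets."""
    if sum(bools) != n:  # the constraint sum(VaO1S) = n is violated
        return None
    return frozenset(a + i for i, bit in enumerate(bools) if bit)


def psi_O_inverse(s, a, b):
    """Inverse map: the Boolean array that marks the members of s."""
    return tuple(j in s for j in range(a, b + 1))


def is_equivalence(n, a, b):
    """Check that psi_O is one-to-one and onto the subsets of a..b of size n."""
    sets = {frozenset(c) for c in combinations(range(a, b + 1), n)}
    arrays = product([False, True], repeat=b - a + 1)
    defined = [x for x in arrays if psi_O(x, a, n) is not None]
    images = [psi_O(x, a, n) for x in defined]
    one_to_one = len(set(images)) == len(images)
    onto = set(images) == sets
    round_trip = all(psi_O_inverse(psi_O(x, a, n), a, b) == x for x in defined)
    return one_to_one and onto and round_trip
```

By contrast, the explicit representation maps both (1, 2) and (2, 1) to the set {1, 2}, so ψE has no inverse.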

The one-to-one correspondence between assignments assumed by the equivalence is similar to the requirement for redundancy of Law and Lee [6]. Among the representations of Conjure we find important non-equivalent representations of some CSP instances. For example, the CSP instance E1 is not equivalent to S1 (nor to O1), but it is still a representation of both. Therefore, we use the definition of representation to describe redundancy, that is, two CSP instances are redundant if they represent the same CSP instance. In the Conjure framework, two different CSP instances generated by the refinement of the same specification variable are redundant if the generation ensures that they are representations of the specification variable. So far, we have discussed CSP instances without focusing on how these CSP instances were constructed to represent another CSP instance. Let us now focus on the core of this construction, the refinement rules.

Notice that in Figure 2 we specify the input of the rule SizedMultiset1 as only one variable (X1). Similarly, a single constraint can be used as the input of a rule. In fact, a single variable can be seen as the CSP instance consisting only of the variable and the empty set of constraints. In the case of a constraint, the CSP instance is composed of the variables and domains of the constraint, with the constraint as the only element of the set of constraints.

SizedSet1   ρ(X1 : set (size m) of τ)  →ref
    X′′1
    represent X1 by expset(X′1)
    such that χ
    | X′′1 ∈ ρ(X′1),
      χ ∈ ρ(allDifferent(X′1))

Fig. 3. SizedSet1 Rule

Also, observe that the rule SizedMultiset1 composes an intermediate CSP instance. The variable X′1 is the only variable of this intermediate CSP instance. The intermediate CSP instance contains no constraints and it represents the input X1. Furthermore, each CSP instance of ρ(X′1) represents the CSP instance containing only the variable X′1. The following theorem ensures, for the cases where an intermediate CSP instance is generated, that each final CSP instance returned by the refinement rule is a representation of the input CSP instance.

Theorem 1 (Transitivity) If R′′ represents R′ via ψ′ and R′ represents R via ψ, then R′′ represents R via ψ ∘ ψ′.

Proof. Straightforward.
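One way to spell the argument out uses only the two conditions of Definition 2. The notation below is an assumption made for illustration: C, C′ and C′′ denote the constraint sets of R, R′ and R′′, and x′′ is a total assignment of the variables of R′′.

```latex
\begin{align*}
x'' \text{ satisfies } C''
  &\iff \psi'(x'') \text{ is defined and satisfies } C' \\
  &\iff \psi'(x'') \text{ is defined, and } \psi(\psi'(x'')) \text{ is defined and satisfies } C \\
  &\iff (\psi \circ \psi')(x'') \text{ is defined and satisfies } C .
\end{align*}
% Second condition: if x satisfies C, choose x' with \psi(x') = x.
% By the first condition for R', x' satisfies C', so some x'' has \psi'(x'') = x',
% and therefore (\psi \circ \psi')(x'') = x.
```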

Previously, we introduced the channelling annotations as information tags (attached to the model by the rules of Conjure) used to indicate the relation between two variables. More formally, they provide information to construct the representation relation between two CSP instances. To illustrate this construction let us use some of the former examples.

The SizedSet1 rule shown in Figure 3 can be used to generate E1 when refining the variable S. The variable S on its own is seen as the CSP instance S1. This rule would introduce the channelling annotation ‘represent S by expset(VaE1S)’, stating that VaE1S is the variable used for the explicit set representation of S. That means E1 represents S1 via ψE. Notice that the function ψE and the allDifferent constraints of E1 are implicitly included in the annotation as part of the representation. If the CSP instance O1 is generated to represent S1 using the SizedSet2 rule (see Figure 4), the channelling annotation ‘represent S by occset(VaO1S)’ is attached to the model to indicate VaO1S as the variable used in the occurrence representation of the set S. Similarly, O1 represents S1 via ψO, where ψO and the constraint sum(VaO1S) = n are implicitly included in the annotation as part of the obtained representation. From the refinement rules we can produce a catalogue of channelling annotations expressing the different representations used in the refinement.


SizedSet2   ρ(X1 : set (size m) of τ)  →ref
    X′′1
    represent X1 by occset(X′1)
    such that ω
    | X′′1 ∈ ρ(X′1),
      ω ∈ ρ(sum(X′1) = m)
    X′1 = genSymbol(X1, matrix (indexed by τ) of bool)

Fig. 4. SizedSet2 Rule

4 Alternative Representations and Channels

Alternative representations of the same variable remain independent until we add channelling constraints to maintain the consistency between them. There is no precise definition of channelling constraints, hence we present below a definition based on the framework discussed above.

Definition 4 Let P1 = 〈R, R1, ψ1〉 and P2 = 〈R, R2, ψ2〉 be representations of R. Let vars(R1) and vars(R2) be disjoint sets of variables. The set of constraints Ch is considered a set of channelling constraints between P1 and P2 if:

– For each total assignment x1 (of the variables in R1) satisfying the constraints in R1 there is at least one total assignment x2 (of the variables in R2) such that the composed assignment x1 ∪ x2 satisfies the constraints in Ch. Similarly, for each satisfying x2 there must be an assignment x1 such that the composed assignment x1 ∪ x2 satisfies the constraints in Ch.

– For all total assignments x1 and x2 where the composed assignment x1 ∪ x2 satisfies the constraints in Ch, ψ1(x1) and ψ2(x2) are either both undefined or take the same value.

This definition extends the definition of channelling constraints between variable representations of Jefferson and Frisch [8]. It ensures the correct connection between assignments that represent the same value. It also takes into account that some assignments in a representation may be undefined in another representation.

Let us now show the channels between some of the former examples of CSP instances. The CSP instances S1 and E1 both represent S1. Observe that S1 represents S1 via the identity function (called here ψid). The following constraint is a channelling constraint between S1 and E1.

[CHS1E1] ∀j∈a..b. (j ∈ S ⇔ ∃i∈1..n. VaE1S[i] = j)

Clearly, for every assignment of S there is at least one assignment of VaE1S such that S ∪ VaE1S satisfies the channel CHS1E1 and VaE1S satisfies the constraints of E1. The same holds the other way around. Furthermore, the definitions of ψid and ψE ensure that ψid(S) and ψE(VaE1S) are either equal or both undefined when S ∪ VaE1S satisfies CHS1E1.

Similarly, the following constraint is a channelling constraint between S1 and O1.

[CHS1O1] ∀j∈a..b. (j ∈ S ⇔ VaO1S[j])

Suppose that Conjure produces a model with both E1 and O1 as representations of S1. It is not difficult to see that the following channelling constraint, which connects the alternative representations E1 and O1, is correct according to Definition 4.

[CHE1O1] ∀j∈a..b. (VaO1S[j] ⇔ ∃i∈1..n. VaE1S[i] = j)
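The two conditions of Definition 4 can be checked by enumeration for the channel between E1 and O1. The sketch below fixes a hypothetical small instance (sets of size 2 drawn from 1..4); the constants and function names are illustrative assumptions.

```python
from itertools import product

A, B, N = 1, 4, 2  # hypothetical instance: sets of size N drawn from A..B


def psi_E(x):
    """Explicit representation: defined only when all entries differ."""
    return frozenset(x) if len(set(x)) == len(x) else None


def psi_O(y):
    """Occurrence representation: defined only when exactly N entries are True."""
    if sum(y) != N:
        return None
    return frozenset(A + j for j, bit in enumerate(y) if bit)


def channel(x, y):
    """The channel between E1 and O1: y marks j exactly when some entry of x equals j."""
    return all(y[j - A] == (j in x) for j in range(A, B + 1))


E_ALL = list(product(range(A, B + 1), repeat=N))
O_ALL = list(product([False, True], repeat=B - A + 1))


def check_definition_4():
    e_solutions = [x for x in E_ALL if psi_E(x) is not None]
    o_solutions = [y for y in O_ALL if psi_O(y) is not None]
    # First condition: every satisfying assignment has a channelled partner.
    cond1 = all(any(channel(x, y) for y in O_ALL) for x in e_solutions) and all(
        any(channel(x, y) for x in E_ALL) for y in o_solutions
    )
    # Second condition: channelled pairs agree, or are both undefined (None).
    cond2 = all(
        psi_E(x) == psi_O(y) for x in E_ALL for y in O_ALL if channel(x, y)
    )
    return cond1, cond2
```

Note how an array with a repeated value, such as (1, 1), is channelled only to a Boolean array with fewer than N true entries, so both maps are undefined on that pair.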

We can join two alternative representations of an initial CSP instance together with their channelling constraints. The following theorem ensures that a CSP instance composed this way is also a representation of the initial CSP instance.

Theorem 2 Let 〈R, R1, ψ1〉 and 〈R, R2, ψ2〉 be variable-disjoint representations of R, and Ch a set of channelling constraints between them. We can compose a function ψ from ψ1 or ψ2 such that the triple 〈R, R1 ∪ R2 ∪ Ch, ψ〉 is a representation of R.

Proof. Define ψ as the extended version of ψ1, such that it can be applied to any total assignment x of the union of the variables of R1 and R2. The function ψ returns the same value that ψ1 returns whenever the given assignment satisfies the constraints in Ch, and is undefined otherwise. The definition of ψ and the correctness of the channelling constraints ensure that every total assignment x of the union of the variables of R1 and R2 satisfies the constraints in R1, R2 and Ch if and only if ψ(x) is defined. Also, using the definitions, it is not hard to see that each solution of R is obtained by the application of ψ to some total assignment x. Hence the CSP instance R1 ∪ R2 ∪ Ch satisfies the requirements to represent R via ψ. □

Note that 〈S1, E1 ∪ O1 ∪ CHE1O1, ψ′O〉 is a representation of S1, where ψ′O is the extension of ψO.

The refinement of R returns a CSP instance R′ and a set of annotations declaring all the intermediate CSP instances representing R, where R′ represents all of them. That is, the annotations describe a sequence of refinements from R′ to R. We call R′ the final representation of R. In the next section we discuss the generation of channelling constraints between two alternative final (redundant) representations.

5 Systematic Generation of Channelling Constraints

We have shown how we can relate two CSP instances using the information provided by the channelling annotations; more importantly, we presented a definition of channelling constraints between representations based on the notions of CSP instances and representations. The examples of channelling constraints shown were fairly simple; however, the generation of channelling constraints based on the annotations becomes challenging for human modellers when the input variables have deeply compound domains. It is not unreasonable to think that the generation of the channelling constraints can be automated by the refinement system already provided by Conjure.

Let P be a specification refined by Conjure into P′, where the variable X in P has two final representations in P′, X1 and Y1. Suppose we can use the annotations to force Conjure to produce only certain representations (with the same domains) for certain variables. That is, we can ask Conjure to generate specific representations following a path of refinements given by the annotations. Let Y be a new variable with exactly the same domain as X. The constraint X = Y fulfils the definition of channelling constraint between X and Y, where both are representations of X. We can then re-refine X = Y, forcing X to refine into X1 and Y to refine into Y1. Such a refinement produces a representation of the channelling constraints between X1 and Y1. Hence, we can compose a representation of X, 〈X, X1 ∪ Y1 ∪ ρX1,Y1(X = Y), ψX1〉, where ρX1,Y1(X = Y) is the conditioned refinement of X = Y. We call this generation algorithm the post-processing algorithm.

Theorem 3 If the rules of Conjure produce only representations of their inputs, then the post-processing algorithm generates equivalent (correct) models.

Proof Sketch: If each specification variable has only one representation in a Conjure-generated model, we can ensure that the produced model is a representation of the specification problem, provided the refinement always returns representations.

Let us restrict Conjure to produce only models with one representation of each variable (if any). Suppose we have a variable A with two occurrences in the constraints of the specification M. Let M′ be the model M with a new variable A′ with the same domain as A, the constraint A = A′, and one of the occurrences of A substituted by A′. It is not hard to show that M′ represents M as long as all the rules preserve the representation property. Then, a model M′′ obtained by the post-processing algorithm representing M′ represents M too. In fact, we can prove that there is a model M′ which is equivalent (if not identical) to each M′′ generated by the post-processing algorithm. □

We now show an example of the post-processing algorithm with a modified version of the variable rings of the Sonet problem. Let us define the domain of the variable rings as all the multisets of size nrings, where each of the multisets has as elements sets of size capacity drawn from the integer range Nodes. Suppose we apply the SizedMultiset1 rule to rings; then a new variable rings′ is introduced. The elements of the domain of this variable rings′ are arrays of sets of integers. Each array is indexed by the integer range 1..n. The variable rings′ is used as input for the recursive call to the refinement function. This call will eventually use either the SizedSet1 rule or the SizedSet2 rule.

The application of the SizedSet1 rule produces the variable rings′1, whose domain elements are two-dimensional matrices of integer variables. All matrices of this domain must be indexed by the integer ranges 1..nrings and 1..capacity. The channelling annotations introduced by this refinement are ‘represent rings by expmset(rings′)’ and ‘∀i∈1..nrings. represent rings′[i] by expset(rings′1[i])’.

SizedMultisetEquality1   ρ(X1 : mset (size m) of τ = X2 : mset (size m) of τ)  →ref
    φ
    represent X1 by expmset(X′1)
    represent X2 by expmset(X′2)
    | φ ∈ ρ(∀i∈1..m. ∃j∈1..m. X′1[i] = X′2[j])

Fig. 5. SizedMultisetEquality1 Equality Rule

SizedSetEquality1   ρ(X1 : set (size m) of τ = X2 : set (size m) of τ)  →ref
    φ
    represent X1 by expset(X′1)
    represent X2 by occset(X′2)
    such that χ ∧ ω
    | φ ∈ ρ(∀j∈τ. X′2[j] ⇔ ∃i∈1..m. X′1[i] = j),
      χ ∈ ρ(allDifferent(X′1)),
      ω ∈ ρ(sum(X′2) = m)
    X′2 = genSymbol(X2, matrix (indexed by τ) of bool)

Fig. 6. SizedSetEquality1 Equality Rule

On the other hand, the application of the SizedSet2 rule produces the variable rings′2, whose domain consists of two-dimensional matrices of Boolean variables. All matrices of this domain must be indexed by the integer ranges 1..nrings and 1..nnodes. The channelling annotations introduced by this refinement are ‘represent rings by expmset(rings′)’ and ‘∀j∈1..nrings. represent rings′[j] by occset(rings′2[j])’.

If these two representations, rings′1 and rings′2, are generated in the same model, we need to construct the correct channelling constraint. For the post-processing algorithm we introduce the new variable rings to be restricted to refine into rings′2. The restricted refinement of the constraint rings = rings uses the SizedMultisetEquality1 rule (Fig. 5) first, and then, by means of the recursive calls to the refinement function, the SizedSetEquality1 rule (Fig. 6) is applied. Finally, the following channelling constraint is obtained.

[CHrings] ∀i∈1..nrings. ∃j∈1..nrings. ∀k∈1..nnodes. (rings′2[j, k] ⇔ ∃l∈1..capacity. rings′1[i, l] = k)


6 Conclusion and Future Work

We present in this paper a framework relating representations of the same variable at different levels of abstraction. The generation of channelling constraints presented is based on the sequence of representations of a specification variable.

The algorithm presented, based on generalisations over CSP instances, reduces the problem of automatic generation of channelling constraints to the problem of proving the correctness of the refinement rules of Conjure. This method of generation may work not only in the Conjure system, but also in any system transforming CSP instances with an engine powerful enough to transform the constraint (A = A′).

It is important to notice that the post-processing algorithm can be generalised to produce the channels between three or more representations of the same specification variable. We may even modify the strategy of generation and produce only some channelling constraints instead of all the multiple channels between these representations.

The work is far from complete. The conditions and performance of the generation of channels for non (totally) redundant representations are not included in this paper. Also, correct channels may need post-processing to make them easier for the user to read and/or to generate efficient code for a CSP solver. What is more, due to the nature of the refinement process we may generate several valid alternative channelling constraints. This alternative generation increases the complexity of the model selection task.

7 Acknowledgements

We thank Ian Miguel and Chris Jefferson for the fruitful discussions and comments on this work.

References

1. Frisch, A.M., Jefferson, C., Martínez-Hernández, B., Miguel, I.: The rules of constraint modelling. In: Nineteenth Int. Joint Conf. on Artificial Intelligence. (2005)

2. Cheng, B.M.W., Choi, K.M.F., Lee, J.H.M., Wu, J.C.K.: Increasing constraint propagation by redundant modeling: An experience report. Constraints 4 (1999) 167–192

3. Frisch, A.M., Hnich, B., Miguel, I., Smith, B.M., Walsh, T.: Transforming and refining abstract constraint specifications. In: Symposium on Abstraction, Reformulation and Approximation (SARA). (2005)

4. Frisch, A.M., Grum, M., Jefferson, C., Martínez-Hernández, B., Miguel, I.: The Essence of Essence. In: International Workshop on Modelling and Reformulating Constraint Satisfaction Problems. (2005) Held at the 11th International Conference on Principles and Practice of Constraint Programming.

5. Hnich, B.: Function Variables for Constraint Programming. PhD thesis, Computer Science Division, Department of Information Science, Uppsala University (2003)


6. Law, Y.C., Lee, J.H.M.: Model induction: A new source of CSP model redundancy. In: Eighteenth National Conference on Artificial Intelligence, American Association for Artificial Intelligence (2002) 54–60

7. Law, Y.C., Lee, J.H.M.: Algebraic properties of CSP model operators. In: Proceedings of the International Workshop on Reformulating Constraint Satisfaction Problems: Towards Systematisation and Automation (Held in Conjunction with CP-2002). (2002) 57–71 Shorter paper in CP-2002.

8. Jefferson, C., Frisch, A.M.: Representations of sets and multisets in constraint programming. In: International Workshop on Modelling and Reformulating Constraint Satisfaction Problems. (2005) Held at the 11th International Conference on Principles and Practice of Constraint Programming.


Representations of Sets and Multisets in Constraint Programming

Christopher Jefferson and Alan M. Frisch

AI Group, Department of Computer Science, University of York, UK

[caj,frisch]@cs.york.ac.uk

Abstract. Constraint programming is a powerful and general-purpose tool, but its use is limited, as the process of refining a specification of a problem into an efficient constraint program (known as modelling) is more of an art than a science at present and must be learned through years of experience. This paper theoretically analyses one frequently occurring pattern in modelling: how to choose between different representations of high-level structures, in particular sets and multisets. It differs from previous work by providing methods of comparing very different representations, and by abstracting away from a particular implementation of a representation. It demonstrates useful theoretical dominance results between representations in both a problem-dependent and a problem-independent context.¹

1 Introduction

Constraint programming has proved to be an efficient method of solving combinatorial problems, and there are now a number of powerful and efficient constraint solvers available. Unfortunately, constraint programs which implement the same problem in different ways can have hugely differing runtimes and search sizes.

The first, and arguably most important, part of the modelling process is deciding how to represent each variable in a problem specification, either by creating a CSP where some variables of the original specification are represented as multiple CSP variables or by choosing how the constraint solver will internally represent each variable. This decision is affected by a number of issues, including the constraints, the domain size of the variables and whether the solver being used offers special support for some choices of representation.

Many previous papers have looked at the most efficient methods of implementing one or more representations of high-level structures, including functions [6], permutations [7] and sets and multisets [1]. These papers compared the search size and runtime required for the implementation of constraints on particular representations, making use of the specialised constraints available in specific solvers, and how logically equivalent constraints on the same variables achieve varying levels of propagation. The purpose of this paper is to avoid studying a particular implementation of a representation and instead, given the CSP variables which will represent a particular variable in an abstract specification, to investigate how well the best implementation which used those variables would perform in terms of search size. The closest previous work to this paper, by Walsh [8], compared two representations of multisets, and this paper extends some of the results in that paper. This paper begins by giving a more general definition of representation than the one considered in [8], and then goes on to prove a number of useful results, including when representations dominate each other (where one representation always produces smaller searches than another), and also how representations can be shown to produce as small a search as the best possible representation (which is the representation which can represent all possible sub-domains of a variable). The aim is that this theory will be useful to both modellers and automated modelling tools (such as Conjure [3]), to provide guidance on the best representation to use for a particular problem. The examples used throughout this paper involve sets and multisets, as these are very frequently occurring constructs in constraint programming. The theory however applies to representing any type of variable, and has been usefully applied to a large range of constructs including graphs, relations and sparse matrices.

¹ Thanks to both Ian Miguel and Bernadette Martinez-Hernandez for helpful discussions on this work.

1.1 Background

An instance of a finite domain CSP consists of a triple 〈V, Dom, C〉, where V is a finite set of variables, Dom is a function which maps each variable v ∈ V to the finite set of assignments of that variable and C is a finite set of constraints. Each constraint c ∈ C specifies a set of assignments to some subset of V, denoted by vars(c), which the constraint allows.

Given a CSP variable X with domain D, a sub-domain of X is defined as a subset of D. For a vector of CSP variables V, a sub-domain of V is defined as a vector VD, where VD[i] is a sub-domain of V[i]. Given a CSP instance P = 〈V, D, C〉, a sub-domain of P is a sub-domain of V. Given a sub-domain S of either a variable, a vector of variables or a CSP, the assignments allowed by the sub-domain (denoted assign(S)) are defined only on the variables in the sub-domain, and consist of all possible assignments to the variables allowed by the given sub-domains. One sub-domain is (strictly) contained in another if the set of assignments allowed by the first is (strictly) contained in the assignments allowed by the second. A solution of a CSP 〈V, Dom, C〉 is an assignment to V which satisfies all the constraints in C.

Example 1. A finite domain CSP is given by 〈{X, Y}, {X ∈ {1, 2, 3}, Y ∈ {1, 2, 3}}, {X + Y > 1, X < 3}〉. The domain of X is {1, 2, 3}. One sub-domain of the CSP is X ∈ {1, 2}, Y ∈ {2, 3}, which allows the assignments 〈X, Y〉 ∈ {〈1, 2〉, 〈1, 3〉, 〈2, 2〉, 〈2, 3〉}. Of these assignments, 〈1, 2〉 is the only solution, as it is the only one which satisfies both constraints.
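The notion of the assignments allowed by a sub-domain can be sketched in a few lines. The dictionary encoding of sub-domains and the names `assign` and `contained_in` below are assumptions made for illustration only.

```python
from itertools import product


def assign(sub_domain):
    """All assignments allowed by a sub-domain, given as {variable: set of values}."""
    names = sorted(sub_domain)
    value_lists = [sorted(sub_domain[name]) for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def contained_in(first, second):
    """True if every assignment allowed by `first` is also allowed by `second`."""
    allowed_by_second = assign(second)
    return all(a in allowed_by_second for a in assign(first))


complete = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
smaller = {"X": {1, 2}, "Y": {2, 3}}
pairs = [(a["X"], a["Y"]) for a in assign(smaller)]
```

Here `pairs` lists the four assignments allowed by the sub-domain X ∈ {1, 2}, Y ∈ {2, 3}, and `contained_in(smaller, complete)` holds because each of them is also allowed by the complete sub-domain.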

The method for solving CSPs considered in the majority of this paper will be branch-and-propagate search. As the name implies, there are two major operations which must be performed while using this kind of search, branching and propagating. Both act on sub-domains of the CSP.

Definition 1. A propagation algorithm is defined by a function f which maps between sub-domains of a CSP and which satisfies the following two conditions.

– f(S) = T → assign(T) ⊆ assign(S) and any solutions in assign(S) are also in assign(T)

– assign(S) ⊆ assign(T) → assign(f(S)) ⊆ assign(f(T))
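Both conditions can be tested exhaustively for a concrete propagator on a tiny CSP. The sketch below uses a hypothetical constraint X < Y over the domain {1, 2, 3} and a propagator that removes unsupported values; these choices are illustrative assumptions.

```python
from itertools import combinations, product

DOMAIN = (1, 2, 3)


def allowed(x, y):
    """Hypothetical binary constraint X < Y."""
    return x < y


def assign(sd):
    """Assignments allowed by a sub-domain (dx, dy)."""
    return set(product(sd[0], sd[1]))


def solutions(sd):
    return {a for a in assign(sd) if allowed(*a)}


def propagate(sd):
    """Remove every value that has no support under the constraint."""
    dx, dy = sd
    new_dx = frozenset(x for x in dx if any(allowed(x, y) for y in dy))
    new_dy = frozenset(y for y in dy if any(allowed(x, y) for x in dx))
    return (new_dx, new_dy)


def subsets(values):
    return [frozenset(c) for k in range(len(values) + 1)
            for c in combinations(values, k)]


def satisfies_definition_1():
    sub_domains = list(product(subsets(DOMAIN), repeat=2))
    for s in sub_domains:
        t = propagate(s)
        # First condition: nothing is added and no solution is lost.
        if not (assign(t) <= assign(s) and solutions(s) <= assign(t)):
            return False
    for s, t in product(sub_domains, repeat=2):
        # Second condition: smaller inputs give smaller outputs.
        if assign(s) <= assign(t) and not assign(propagate(s)) <= assign(propagate(t)):
            return False
    return True
```

The identity function also passes both checks, which shows that the definition places no lower bound on how much a propagation algorithm must remove.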

One common family of propagation algorithms are constraint-based propagation algorithms, which, given a sub-domain of a CSP P and a constraint c ∈ P, only remove values which do not satisfy c, and therefore clearly cannot be part of any assignment which satisfies P. MAC propagation [2] removes all such values, and when no such values can be removed the sub-domain is defined to satisfy GAC with respect to c.

The second operation used during search is branching, which simply takes a sub-domain of a CSP P and generates n new CSPs P1, . . . , Pn, where Pi is identical to P except with an added constraint Ci, such that all assignments in the sub-domain are allowed by at least one of the Ci. Commonly the Ci are unary constraints. If any constraints can be used for branching, it is possible for search not to be finite. The simplest way to prevent this is to ensure either that at each propagation at least one value is removed from the sub-domain of some variable in each branch or, alternatively, on finite CSPs, that no constraint is imposed more than once.

Definition 2. N-way branch-and-propagate search of a CSP P begins from the complete sub-domain of P and is defined recursively. Given a sub-domain of a CSP, branch-and-propagate search first performs propagation algorithms on the sub-domain. If the sub-domain of each variable consists of exactly one value, then this is an assignment node. If this assignment satisfies all the constraints then a solution has been found, else this is a failure node. Failure nodes also arise if the sub-domain of any variable becomes empty. If neither of these conditions holds, then the current sub-domain is branched, generating a set of new CSPs to solve.
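The recursive structure of Definition 2 can be sketched as follows. This is a minimal illustration, not a solver: sub-domains are dictionaries from variable names to sets, constraints are predicates on complete assignments, branching adds one unary constraint per remaining value of the first undecided variable, and the propagation algorithm is passed in as a parameter. All names are hypothetical.

```python
def search(sub_domain, constraints, propagate):
    """Return all solutions found by n-way branch-and-propagate search."""
    sub_domain = propagate(sub_domain)
    if any(len(values) == 0 for values in sub_domain.values()):
        return []  # failure node: some sub-domain became empty
    if all(len(values) == 1 for values in sub_domain.values()):
        # assignment node: every variable has exactly one value left
        assignment = {var: next(iter(vals)) for var, vals in sub_domain.items()}
        if all(check(assignment) for check in constraints):
            return [assignment]  # solution
        return []  # failure node
    # branch on the first variable that still has more than one value
    var = next(v for v, vals in sub_domain.items() if len(vals) > 1)
    found = []
    for value in sorted(sub_domain[var]):
        child = dict(sub_domain)
        child[var] = {value}  # the added unary constraint var = value
        found.extend(search(child, constraints, propagate))
    return found


# Hypothetical problem: X < Y and X + Y = 4 over {1, 2, 3}.
constraints = [lambda a: a["X"] < a["Y"], lambda a: a["X"] + a["Y"] == 4]
start = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
result = search(start, constraints, lambda sd: sd)  # identity propagator
```

With the identity propagator every assignment node is visited; a stronger propagation algorithm would prune branches earlier without changing the set of solutions returned.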

2 Representations

Given a specification of a problem, it is frequently not possible to map it directly into a CSP, as it contains variables of types which are not supported by the constraint solver being used.

Example 2. The Social Golfer Problem involves trying to schedule w rounds of golf, where in each round the g × s players are arranged into g groups of size s such that no pair of players appear in a group together more than once.

The most obvious formulation of Example 2 would consist of a single variable to represent the schedule. This leads to variables with huge domains; for example, 16 golfers split into 4 games on each of 6 weeks would lead to a schedule variable with over 6 × 10^45 possible assignments. For this, and a number of other reasons, it is often not efficient or even feasible to use an initial choice of representation. There are a number of ways of reducing memory usage to a feasible amount, each of which has its own trade-offs. Studying these trade-offs is the major purpose of this paper.

Example 3. Consider a CSP variable X with domain D = {1, 2, . . . , 10}. The “bounds representation” of X allows only the sub-domains of X (and therefore subsets of D) represented by the set {〈l, u〉 | 1 ≤ l ≤ u ≤ 10} ∪ {∅}, where 〈l, u〉 represents the sub-domain {x | l ≤ x ≤ u} and ∅ represents the empty sub-domain.

Propagating the constraint X > 5 from the complete sub-domain of X as much as possible leads to the sub-domain {6, 7, 8, 9, 10}, which the “bounds representation” represents exactly with 〈6, 10〉. Consider now the constraint “X is even”. The assignments to X allowed by this constraint are {2, 4, 6, 8, 10}. Any element of the “bounds representation” which allows all these assignments (for example 〈2, 10〉) would also allow the set of assignments {3, 5, 7, 9}.
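The effect described in Example 3 can be reproduced by searching for the tightest state of the bounds representation that covers a given set of values. The encoding below (None for the empty state, pairs for 〈l, u〉) and the function names are illustrative assumptions.

```python
def bounds_states(lo, hi):
    """All states of the bounds representation: None (empty) or a pair (l, u)."""
    return [None] + [(l, u) for l in range(lo, hi + 1) for u in range(l, hi + 1)]


def represented(state):
    """The sub-domain a state stands for."""
    if state is None:
        return set()
    l, u = state
    return set(range(l, u + 1))


def tightest_state(values, lo, hi):
    """Smallest state whose represented sub-domain contains `values`."""
    covering = [s for s in bounds_states(lo, hi) if values <= represented(s)]
    return min(covering, key=lambda s: len(represented(s)))


greater_than_five = {x for x in range(1, 11) if x > 5}
even = {x for x in range(1, 11) if x % 2 == 0}
exact = tightest_state(greater_than_five, 1, 10)
loose = tightest_state(even, 1, 10)
extra = represented(loose) - even
```

The state found for X > 5 represents exactly the satisfying values, whereas the state found for “X is even” also admits the odd values collected in `extra`.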

Example 4. Consider a CSP variable X with the domain “all subsets of {0, 1, 2, 3} of size 2”. This is represented under the “explicit representation” by 2 variables, V1 and V2, each of domain {0, 1, 2, 3}, and the constraint V1 ≠ V2. Assignments to V1 and V2 represent assignments to X by the mapping X = {V1, V2}.

Consider the problem “find all assignments to X such that 1 ∉ X”. This would be represented with the explicit representation by the CSP “find all assignments to V1 and V2 such that V1 ≠ 1, V2 ≠ 1 and V1 ≠ V2”. At the first node of search, MAC would remove 1 from the sub-domain of V1 and V2, and the remaining assignments to V1 and V2 which satisfy V1 ≠ V2 all represent assignments to X which satisfy 1 ∉ X.

Consider now the problem “find all assignments to X such that the sum of the members of X is 3”. This would be represented by the CSP “find all assignments to V1 and V2 such that V1 ≠ V2 and V1 + V2 = 3”. At the first node of search, no values can be removed from the domains of either V1 or V2, as they are all part of an assignment to the variables which satisfies all the constraints. There are however many assignments which represent assignments to X whose sum is not 3, for example V1 = 2, V2 = 3.
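The two cases of Example 4 can be replayed with a small pruning routine. As a simplifying assumption, the sketch prunes against the conjunction of the constraints at once, which removes at least as much as propagating each constraint separately; the names `prune` and `represented_sets` are hypothetical.

```python
from itertools import product

VALUES = {0, 1, 2, 3}


def prune(domains, constraint):
    """Keep only the values that occur in some assignment satisfying `constraint`."""
    supported = [a for a in product(*domains) if constraint(a)]
    return [{a[i] for a in supported} for i in range(len(domains))]


def represented_sets(domains):
    """Sets represented by the assignments with V1 != V2 in the given sub-domain."""
    return {frozenset(a) for a in product(*domains) if a[0] != a[1]}


def one_not_member(a):
    return a[0] != 1 and a[1] != 1 and a[0] != a[1]


def sum_is_three(a):
    return a[0] != a[1] and a[0] + a[1] == 3


after_first = prune([set(VALUES), set(VALUES)], one_not_member)
after_second = prune([set(VALUES), set(VALUES)], sum_is_three)
```

After pruning for the first problem, every represented set excludes 1. For the second problem no value is removed, so the sub-domain still represents sets such as {2, 3} whose sum is not 3.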

Example 5. Given a natural number n, consider the CSP containing a single set variable A with domain MS({1, . . . , 2n}, 1, [0 . . . 2n]) and the two constraints |A| = n and |A| = n + 1. If a solver can achieve GAC propagation for these two constraints, propagating both constraints will empty the sub-domain of A without search. Now consider A being represented by a Boolean vector V of length 2n, where V[i] is true if and only if i ∈ A. On any sub-domain of V where fewer than n − 1 variables are instantiated, neither the constraint sum(V) = n nor the constraint sum(V) = n + 1 would remove any values from the domains of any variables if propagated, so the size of the search tree will be at least 2^(n−2) nodes.

Examples 3 and 4 give two examples of representing a variable in a way which limits the number of possible sub-domains which are allowed. These aim to show the problems involved with choosing how to represent a variable in a CSP. In both examples, the first constraint considered is unaffected by the choice of representation, as it is possible to represent the sub-domain of the original variable which contains only those assignments which satisfy the constraint. In both examples, the second constraint considered suffers from the choice of representation, as the smallest sub-domain which contains all assignments which satisfy the constraint also contains a number of assignments which do not. As the sub-domain of the CSP is the only method by which propagation algorithms can pass information to each other, this can impair the performance of search, and lead to greatly increased search size. This is shown in Example 5, where a poor choice of representation of a set of size n causes search to increase in size from a single node to a search tree of size exponential in n. This paper aims to formalise the idea of a representation and investigate how the states which can be represented limit the effectiveness of a particular representation.

Example 3 shows an example where the representation can be considered as integral to the solver. Example 4, on the other hand, generates a new CSP whose solutions can be mapped to the solutions of the original. These should not be considered as disjoint families of representations, as the second can be considered as a subset of the first, and many solvers implement nested types internally directly as a vector of CSP variables. For example, Conjunto [4] represents sets using a representation known as the occurrence representation (Definition 8). Both of these kinds of representations will be considered, and much of the theory of representations which will be presented applies equally to all representations, although some results can be extended or simplified by considering only representations which map one CSP to another.

Definition 3. A representation of a CSP domain D is defined as a pair 〈R, f〉, where R is a partially ordered set and f : R → P(D). R is the set of states which the representation can take, and for each r ∈ R, f(r) is the sub-domain of D this state represents. A state r1 ∈ R is defined to be reachable from a state r2 ∈ R if r1 < r2. To be a valid representation, 〈R, f〉 must satisfy the following conditions:

1. r1 ≤ r2 → f(r1) ⊆ f(r2)
If r1 is reachable from r2, then r1 must represent a subset of the assignments allowed by r2.

2. ∃1 rD ∈ R. (f(rD) = D ∧ ∀r ∈ R. r ≠ rD → r < rD)
There must be some initial representational state from which all other states can be reached, to begin search.

3. ∀r ∈ R. ∃r∅ ∈ R. (f(r∅) = ∅ ∧ r∅ ≤ r)
From all representational states it must be possible to reach the state which represents nothing, else it would not be possible to fail.

4. ∀r ∈ R, ∀d ∈ f(r). ∃rd ∈ R. f(rd) = {d} ∧ rd ≤ r.
If a representational state represents an assignment d, it must be possible to reach a state which represents just the assignment d, else it would not be possible to reach all assignments a representational state represents.
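The four conditions can be checked mechanically for a finite representation. The sketch below does so for a bounds representation of the domain {1, . . . , n}, ordering states by inclusion of the sub-domains they represent; the encoding and names are assumptions made for illustration.

```python
def make_bounds_representation(n):
    """States are None (empty) or pairs (l, u) with 1 <= l <= u <= n."""
    states = [None] + [(l, u) for l in range(1, n + 1) for u in range(l, n + 1)]

    def f(state):
        if state is None:
            return frozenset()
        return frozenset(range(state[0], state[1] + 1))

    def leq(r1, r2):
        # Assumed ordering: r1 <= r2 when r1 represents a subset of r2.
        return f(r1) <= f(r2)

    return states, f, leq


def check_conditions(states, f, leq, domain):
    """Return the truth value of each of the four conditions of Definition 3."""
    cond1 = all(f(r1) <= f(r2) for r1 in states for r2 in states if leq(r1, r2))
    tops = [r for r in states
            if f(r) == domain and all(leq(s, r) for s in states)]
    cond2 = len(tops) == 1
    cond3 = all(any(f(e) == frozenset() and leq(e, r) for e in states)
                for r in states)
    cond4 = all(any(f(rd) == frozenset({d}) and leq(rd, r) for rd in states)
                for r in states for d in f(r))
    return cond1, cond2, cond3, cond4


states, f, leq = make_bounds_representation(4)
domain = frozenset(range(1, 5))
valid = check_conditions(states, f, leq, domain)
# Dropping the singleton states (l, l) breaks condition 4.
no_singletons = [s for s in states if s is None or s[0] != s[1]]
broken = check_conditions(no_singletons, f, leq, domain)
```

Removing the singleton states leaves the first three conditions intact but makes it impossible to reach a state representing a single assignment, which is what condition 4 rules out.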

Definition 4. A simple representation of a CSP domain D is a representation 〈R, f〉 where the extra condition ∀r1, r2 ∈ R. f(r1) ⊆ f(r2) → r1 ≤ r2 holds. Combining this with Condition 1 of the definition of representation gives ∀r1, r2 ∈ R. f(r1) ⊆ f(r2) ↔ r1 ≤ r2.


The definition of n-way branch-and-propagate search in Section 1.1 did not refer to representations. The addition of representations limits the search, so that only certain sub-domains are allowed during search, and which sub-domains may be reached from a given sub-domain is also limited by the representation. The most important change to the definition of propagation algorithms is that, instead of the result of applying the propagation algorithm being a subset of the assignments, the state of search achieved must be less in the ordering on the representation than the state before. Due to lack of space, the proof that the limitations placed on representations in Definition 3 are sufficient to ensure search is still correct is only outlined in Theorem 1. We are now in a position to justify the definition of representation given in Definition 3, and also prove that this definition is sufficient for search to be correct.

Theorem 1. The four conditions in Definition 3 are necessary and sufficient for a representation to be used to represent sub-domains in an n-way branch-and-propagate search such that any branching strategy still leads to all solutions, finite search, and all nodes being either solution nodes, failure nodes, or branched on.

Proof. If all these conditions are satisfied, then by condition 2, the first node can allow all assignments to the CSP. Condition 1 ensures search will be finite, condition 3 ensures that failure can be reached from any node, and condition 4 ensures that from any node it is possible (although not necessary) to choose a variable whose sub-domain allows more than one assignment and generate a new node for every allowed assignment. □

2.1 Variable representations

As discussed by example at the beginning of Section 2, an important family of representations are those which represent a CSP variable by replacing it with a vector of CSP variables and mapping constraints involving the original variable to new constraints involving the new variables. These are known as variable representations (Definition 5). One important feature of variable representations is that, as the result of applying them is another CSP, the resulting variables can themselves be replaced with representations.

Definition 5. A variable representation of a variable X with domain D is a pair 〈V, f〉 where V is a vector of CSP variables and f is a partial surjective function from assignments of V to D. An assignment to V which is in the domain of f is said to represent its image under f. The representation constraint is defined as the constraint “V is in the domain of f”. For a sub-domain V′ of V, f(V′) is defined as the sub-domain of X generated by applying f to all assignments in V′.

The induced representation of a variable representation 〈V, f〉 is defined to be the representation 〈R, g〉 where R is defined as all sub-domains of V, R is ordered by r1 ≤ r2 ↔ ∀v ∈ V. r1[v] ⊆ r2[v] and g(r) = {f(x) | x is allowed by r}.

Lemma 1. The representation 〈R, g〉 induced from a variable representation 〈V, f〉 is simple if and only if f is injective.


Proof. If the function f is not injective, then there must exist two distinct assignments v and w of V such that f(v) = f(w). The sub-domains containing only these assignments will clearly be incomparable but represent the same set of assignments. Consider now the case where f is injective. If v ≤ w then w allows all assignments v does, and therefore f(v) ⊆ f(w). Similarly if f(v) ⊆ f(w) then as f is injective v ≤ w. □

Definition 6. Given a constraint C and a variable representation 〈V, f〉 of a variable X ∈ vars(C), then X is defined to be replaced in C by 〈V, f〉 as follows:

Given vars(C) = {X, Y1, . . . , Yn}, then C can be expressed as the subset S of 〈D(X) × D(Y1) × · · · × D(Yn)〉 which contains the assignments to vars(C) which are allowed by C. C is then replaced by a new constraint over the Vi and Yi which allows the assignments {〈v1, . . . , vm, y1, . . . , yn〉 | 〈f(〈v1, . . . , vm〉), y1, . . . , yn〉 ∈ S}.

Example 6. Consider a set variable X of size 2 drawn from {0, 1, 2, 3}. This is represented under the explicit representation by 〈V, f〉, where V = 〈V1, V2〉 and the Vi have domain {0, 1, 2, 3}. f is defined as mapping each pair of values 〈V1, V2〉 to the set {V1, V2} where V1 and V2 are distinct. Consider the constraint ∑_{i∈X} i = 3, expressed as a set of tuples with {〈{0, 3}〉, 〈{1, 2}〉}. When X is replaced in this constraint by the representation 〈V, f〉, the original constraint is replaced by the constraint 〈V[1], V[2]〉 ∈ {〈0, 3〉, 〈3, 0〉, 〈1, 2〉, 〈2, 1〉}.
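The replacement in Example 6 can be reproduced mechanically. The sketch below is an illustration rather than the authors' implementation: the names `f`, `g` and `replace_constraint` are hypothetical, and `None` is used as an assumed marker for assignments outside the domain of f.

```python
from itertools import product

DOMAIN = range(4)  # the values 0..3 of Example 6

def f(v):
    """Explicit representation: a pair of distinct values maps to a 2-element set."""
    v1, v2 = v
    return frozenset(v) if v1 != v2 else None  # None: outside the domain of f

def g(sub_domain):
    """Induced representation: the sets represented by a sub-domain of 〈V1, V2〉."""
    return {f(v) for v in product(*sub_domain) if f(v) is not None}

def replace_constraint(allowed_sets):
    """Definition 6 for a unary constraint on X: keep V-assignments whose image is allowed."""
    return {v for v in product(DOMAIN, repeat=2) if f(v) in allowed_sets}

# "The elements of X sum to 3", given explicitly as the allowed values of X.
sum_is_3 = {frozenset({0, 3}), frozenset({1, 2})}

print(sorted(replace_constraint(sum_is_3)))
# [(0, 3), (1, 2), (2, 1), (3, 0)]
```

Each allowed set appears twice in the result because f is not injective here: both orderings of a pair represent the same set.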

Theorem 2. a) Given a CSP P, a variable X from P and a variable representation 〈V, f〉 of X, a new CSP P′ can be generated by adding the variables V and the constraint f(V) = X to P. All solutions of P′ can be generated by taking a solution of P and then adding an assignment to V which satisfies the constraint f(V) = X, and there will be at least one such assignment to V for each assignment to X, and exactly one when the representation is simple.

b) Given a CSP P which contains a variable X, a vector of variables V and the constraint cR: f(V) = X where 〈V, f〉 is a representation of X, a new CSP P′ can be generated by taking any constraint C in P (except cR) where X ∈ vars(C) and replacing X with V by 〈V, f〉 in C. The solutions of P and P′ are identical.

c) Given a CSP P which contains a vector of variables V and a variable X which occurs only in the constraint f(V) = X, where 〈V, f〉 is a representation of X, a new CSP P′ can be generated by removing the variable X and the constraint f(V) = X. The solutions to P′ are exactly the solutions to P without the assignment to X.

Proof. a) Trivial, as for each assignment to X there exists at least one (and exactly one if the representation is simple) assignment to V such that f(V) = X is satisfied.

b) Consider an assignment s to vars(P) which is a solution to P. This will be a solution to P′ if and only if s satisfies C with X replaced by V, as described in Definition 6. However, we know that s satisfies every constraint of P, and therefore in particular it satisfies C and f(V) = X. By the definition of replacing a variable in a constraint with a representation, the original constraint will be satisfied if and only if the new one is. The reverse argument follows identically.

c) Clearly any solution to P when limited to vars(P′) will be a solution to P′, as the constraints on P′ are a subset of the constraints on P. Given any solution to P′, adding the assignment to X which satisfies f(V) = X will also clearly give a solution to P, as P is only P′ with one extra variable and constraint, and this new constraint is the only one which involves X. □

Theorem 2 proves that Definition 6 can be used to apply representations to a CSP and get a new CSP whose solutions can be mapped to the solutions of the original CSP. One minor problem with Definition 6 is that it assumes constraints are expressed explicitly. Most constraint solvers operate most efficiently on constraints expressed implicitly using a small language of constraints. How an implicit representation of a constraint can be refined to an implicit constraint on a representation is a complex issue in its own right; see [3]. In this paper refined constraints will be given implicitly where possible, but how these implicit constraints could be generated will not be discussed.

2.2 Representations of Sets and Multisets

Sets and multisets are among the most common types for variables which occur in specifications of combinatorial problems, and they will be used as the primary example throughout this paper. There are a number of representations of sets and multisets in common usage in constraint programming, and some of these will be explored in this paper. Definition 7 gives a formal way of referring to the various families of sets and multisets which will arise.

Definition 7. Given a set X, a natural number occ and an integer range Size, S = MS(X, occ, Size) denotes the set of all (multi)sets drawn from X with size in the range Size and where each value occurs at most occ times. Three common special cases of this definition are where occ is 1 (so S contains only sets), where Size allows a single value (so S contains (multi)sets of a single fixed size), and where Size is a range at least as large as [0, |X| ∗ occ], so that it places no limit on the (multi)sets in S and S is unsized.
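For small parameters the family MS(X, occ, Size) can simply be enumerated. The following sketch assumes, for illustration only, that multisets are encoded as sorted tuples and that Size is passed as an inclusive (low, high) pair; the function name `MS` mirrors the notation of Definition 7 but the encoding is not taken from the paper.

```python
from collections import Counter
from itertools import combinations_with_replacement

def MS(X, occ, size_range):
    """Enumerate MS(X, occ, Size) as a list of sorted tuples."""
    low, high = size_range
    result = []
    for k in range(low, high + 1):
        for combo in combinations_with_replacement(sorted(X), k):
            # Keep only multisets in which no value occurs more than occ times.
            if all(n <= occ for n in Counter(combo).values()):
                result.append(combo)
    return result

print(MS({1, 2, 3}, 1, (2, 2)))       # fixed-size sets: [(1, 2), (1, 3), (2, 3)]
print(len(MS({1, 2}, 2, (0, 4))))     # unsized multisets: 9
```

In the unsized case each of the |X| values independently occurs between 0 and occ times, which gives (occ + 1)^|X| multisets, here 3² = 9.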

Definition 8. Given a (multi)set variable X with domain MS(S, occ, Size), the occurrence representation of X is defined to be 〈V, f〉, where V is a vector indexed by S of variables with domain {0, . . . , occ}. The function f maps assignments of V to X by f(v) = {v[s] × s | s ∈ S}m, that is, the (multi)set containing v[s] copies of each s, under the condition that the resulting (multi)set is in the domain of X. The representation constraint is therefore given by sum(V) ∈ Size. V is called the occurrence vector.

Definition 9. Given a (multi)set variable X with domain MS(S, occ, Size), the explicit representation of X is defined to be 〈E, f〉, where E is a vector of variables of domain S ∪ {∅} (where ∅ represents a value not in S) indexed by the range 1, . . . , max(size). f maps assignments of E by f(v) = {E[i] | 1 ≤ i ≤ max(size) ∧ E[i] ≠ ∅}m, where this (multi)set is in MS(S, occ, Size). E is known as the element vector.

Definition 10. Given an abstract variable X with domain MS(S, occ, Size), it is represented under the explicit with check representation by 〈E.C, f〉, where E and C are both of length max(size), the elements of C are Boolean variables and the elements of E have domain S. The function f maps assignments of E and C to the (multi)set which contains each E[i] for which C[i] is True, where this (multi)set is in MS(S, occ, Size). E is called the element vector, and C is called the check vector.

Lemma 2. For (multi)sets of domain and size greater than 1 the explicit representation is not simple.

Proof. It is possible to freely permute the vector of variables in the explicit representation and they will represent the same (multi)set, so as long as this vector has length greater than 1 and more than one possible assignment, the representation cannot be simple. □
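The three mappings of Definitions 8 to 10, and the permutation argument of Lemma 2, can be made concrete with small decoding functions. This is a sketch under stated assumptions: the function names are hypothetical, multisets are encoded as `collections.Counter` objects, `None` stands in for the special value ∅, and the check that the result lies in MS(S, occ, Size) is omitted for brevity.

```python
from collections import Counter

EMPTY = None  # assumed stand-in for the special value ∅ of Definition 9

def occurrence_decode(v, values):
    """Definition 8: the multiset containing v[s] copies of each s."""
    return Counter({s: n for s, n in zip(values, v) if n > 0})

def explicit_decode(e):
    """Definition 9: the multiset of the non-empty entries of the element vector."""
    return Counter(x for x in e if x is not EMPTY)

def explicit_check_decode(e, c):
    """Definition 10: the multiset of entries E[i] whose check C[i] is True."""
    return Counter(x for x, keep in zip(e, c) if keep)

# Lemma 2: two distinct assignments of the element vector, one multiset.
print(explicit_decode((1, 2)) == explicit_decode((2, 1)))  # True

# The same multiset {1, 3} under the other two representations.
print(occurrence_decode((1, 0, 1), (1, 2, 3)) ==
      explicit_check_decode((1, 2, 3), (True, False, True)))  # True
```

Because `(1, 2)` and `(2, 1)` decode to the same multiset while neither assignment is below the other in the induced ordering, the explicit representation cannot be simple.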

The explicit with check representation can be simpler to implement than the explicit representation, as it does not require constraints which can cope with a special domain element indicating that a variable does not represent a value in the (multi)set. Both these representations have the problem of variable symmetry, as permuting an assignment to the element vector (and identically permuting the check vector in the case of the explicit with check representation) generates an assignment which represents the same (multi)set. Only one method of breaking this symmetry shall be considered in this paper, which is to impose that an assignment to the element vector only represents a (multi)set when it is ordered according to some ordering of the elements the (multi)set is drawn from. After performing symmetry breaking, both these representations are simple.

Definition 11. The Gent representation² was designed to try to combine the strengths of the occurrence and explicit with symmetry breaking representations. Given a (multi)set variable X with domain MS(S, occ, size), it is represented with the Gent representation by 〈V, f〉, where V is a vector of length |S| with variables of domain {0, 1, . . . , max(size)}. f maps assignments of V to assignments of X by the following rules, assuming a total ordering on the elements of S. 1) If a is the smallest element in S, f(v) contains v[a] occurrences of a. 2) If v[b] ≠ 0, then f(v) contains v[b] − max{v[i] | i < b} occurrences of b. 3) f(v) is only defined if a < b → (v[b] = 0 ∨ v[a] < v[b]) and the resulting (multi)set is a valid assignment to X.

Example 7. If X = MS({1, 2, 3, 4, 5}, 5, [0, 5]), then if X is represented under the Gent representation then 〈0, 2, 0, 4, 0〉 represents {2, 2, 4, 4}M (where M denotes a multiset) and 〈1, 2, 4, 0, 0〉 represents {1, 2, 3, 3}M. The representation of both 〈0, 1, 0, 1, 0〉 and 〈0, 2, 0, 1, 0〉 is undefined, as the non-zero elements are not in strictly increasing order.
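The rules of Definition 11 amount to reading the non-zero entries as a running count of elements. The sketch below reproduces Example 7; the function name `gent_decode`, the encoding of the result as a sorted list, the use of `None` for "undefined" and the inclusive (low, high) encoding of the size range are assumptions made for illustration.

```python
def gent_decode(v, values, occ, size_range):
    """Return the elements represented by v as a sorted list, or None where f is undefined.

    `values` lists the elements of S in increasing order.
    """
    elements, running_max = [], 0
    for entry, value in zip(v, values):
        if entry == 0:
            continue
        if entry <= running_max:      # rule 3: non-zero entries must strictly increase
            return None
        copies = entry - running_max  # rules 1 and 2: difference to the previous maximum
        if copies > occ:              # the result must be a valid assignment to X
            return None
        elements.extend([value] * copies)
        running_max = entry
    low, high = size_range
    return elements if low <= len(elements) <= high else None

values = [1, 2, 3, 4, 5]
print(gent_decode((0, 2, 0, 4, 0), values, 5, (0, 5)))  # [2, 2, 4, 4]
print(gent_decode((1, 2, 4, 0, 0), values, 5, (0, 5)))  # [1, 2, 3, 3]
print(gent_decode((0, 1, 0, 1, 0), values, 5, (0, 5)))  # None
print(gent_decode((0, 2, 0, 1, 0), values, 5, (0, 5)))  # None
```

One consequence visible in the code is that the largest entry of a defined vector equals the size of the represented (multi)set.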

There are a number of variants of these representations, some of which modify them in a minor manner, and others which perform large additions. One very common extension, and the only one discussed in this paper, is adding a variable whose value is the size of the (multi)set. It is simple to add this variable to each of the representations considered so far. These extended representations will be denoted by adding “+ size” to their title, for example the “occurrence + size representation”.

² Discovered by Ian Gent, unpublished.

3 Comparing representations

The definition of representations given in Section 2 provides a method of specifying representations and, in the case of variable representations, using them to map one CSP to another. In order to become useful to modellers, these definitions must be used to aid choosing the “best representation” to use to solve a particular CSP.

The first obvious problem is to decide on a definition of “best representation”. In particular, solver-specific implementation issues and optimisations mean that even on different versions of the same solver the representation which will solve a problem fastest can change drastically. The next most obvious method of comparing representations, and the one considered here, is to compare the size of search, ignoring how long a particular implementation of a given representation may take. Of course it is not possible to entirely ignore time and memory usage, in particular because, except in extreme cases, using a representation should always decrease memory usage and time taken per node, but may increase (and should never decrease) search size.

The strongest possible relationship between two representations A and B of the same variable would be if it could be shown that using A always produced smaller searches than B. Definition 12 defines exactly this property. It is obvious that given this definition, if A dominates B, then given any search which was performed where a variable is represented using B, B could be replaced with A.

Definition 12. Rep1 = 〈R1, f1〉 dominates Rep2 = 〈R2, f2〉 (or Rep2 is embedded in Rep1) if there is a function M : R2 → R1 such that ∀r ∈ R2. f1(M(r)) = f2(r) and ∀r1, r2 ∈ R2. r1 ≤ r2 → M(r1) ≤ M(r2). Rep1 and Rep2 are equivalent if Rep1 dominates Rep2 and Rep2 dominates Rep1. Rep1 strictly dominates Rep2 if Rep1 dominates Rep2 and Rep2 does not dominate Rep1.

Dominance is very similar to the idea of expressivity from [8], and for simple representations Lemma 3 shows that they are identical. For non-simple representations, however, it is necessary to also consider the ordering relation between states, as well as the sub-domains those states represent. Considering only the states which can be represented can suggest a representation is more useful than it actually is during search, as Example 8 demonstrates.

Lemma 3. Given two simple representations Rep1 = 〈R1, f1〉 and Rep2 = 〈R2, f2〉 then Rep1 dominates Rep2 if and only if {f2(r2) | r2 ∈ R2} ⊆ {f1(r1) | r1 ∈ R1}.

Proof. If Rep1 dominates Rep2, then there exists a function M : R2 → R1 such that ∀r ∈ R2. f1(M(r)) = f2(r) and therefore ∀r2 ∈ R2. ∃r1 ∈ R1. f1(r1) = f2(r2).

In the opposite direction, as the set of sub-domains represented by Rep2 is a subset of those represented by Rep1 and as both representations are simple (and so each sub-domain is represented at most once), it is simple to define a function M from R2 to R1 by M(r2) = r1 ↔ f1(r1) = f2(r2). As the ordering on simple representations is entirely determined by the domains they represent, this function defines the dominance between Rep1 and Rep2. □
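For simple representations, Lemma 3 turns dominance into a set-inclusion test. The sketch below compares two representations of the domain {1, 2, 3} chosen purely for illustration (they do not appear in the paper): one whose states are all subsets, and one whose states are only the intervals. Both are ordered by inclusion, so both are simple. The name `dominates_simple` is hypothetical.

```python
from itertools import combinations

def dominates_simple(states1, f1, states2, f2):
    """Lemma 3 test: valid only when both representations are simple."""
    images1 = {frozenset(f1(r)) for r in states1}
    images2 = {frozenset(f2(r)) for r in states2}
    return images2 <= images1

D = [1, 2, 3]
# Every subset of D is a state.
all_subsets = [frozenset(c) for k in range(len(D) + 1) for c in combinations(D, k)]
# Only contiguous ranges (and the empty set) are states.
intervals = [frozenset(range(lo, hi + 1)) for lo in D for hi in D if lo <= hi]
intervals.append(frozenset())
identity = lambda r: r

print(dominates_simple(all_subsets, identity, intervals, identity))  # True
print(dominates_simple(intervals, identity, all_subsets, identity))  # False
```

The second test fails because the sub-domain {1, 3} is not an interval, so the subset-based representation strictly dominates the interval-based one.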

Example 8. Consider representing two element multisets drawn from {1, 2, 3} using the explicit representation with and without symmetry breaking.

The sub-domain {{1, 1}M, {1, 2}M, {2, 3}M} can be represented by the sub-domains 〈{1, 2}, {1, 3}〉 if symmetry breaking is not imposed, but if symmetry breaking is imposed the smallest sub-domains which contain these assignments are 〈{1, 2}, {1, 2, 3}〉, which also allows {2, 2}M.

Consider now representing the sub-domain {{1, 2}M, {2, 3}M}. These can be represented exactly by the sub-domains 〈{2}, {1, 3}〉 without symmetry breaking, but require at least the sub-domains 〈{1, 2}, {2, 3}〉 with symmetry breaking, which also allow the multiset {2, 2}M.

Without symmetry breaking, there are more possible sets of sub-domains which can be represented. This comes at the cost, however, that the domains which were shown second are not reachable from those which were generated first. It is therefore misleading to simply list the states which are reachable during search in the case of non-simple representations.
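The second case of Example 8 can be checked by enumerating the assignments allowed by each pair of sub-domains. In this sketch multisets are encoded as sorted tuples, and symmetry breaking is assumed to mean that only non-decreasing element vectors represent a multiset; the function name `represented` is hypothetical.

```python
from itertools import product

def represented(sub_domain, symmetry_breaking):
    """Multisets (as sorted tuples) represented by a sub-domain of the element vector."""
    result = set()
    for e in product(*sub_domain):
        if symmetry_breaking and list(e) != sorted(e):
            continue  # with symmetry breaking only ordered assignments count
        result.add(tuple(sorted(e)))
    return result

# Without symmetry breaking, 〈{2}, {1, 3}〉 represents exactly {1, 2} and {2, 3}.
print(sorted(represented([{2}, {1, 3}], symmetry_breaking=False)))
# [(1, 2), (2, 3)]

# With symmetry breaking, 〈{1, 2}, {2, 3}〉 is needed, and it allows extra multisets.
print(sorted(represented([{1, 2}, {2, 3}], symmetry_breaking=True)))
# [(1, 2), (1, 3), (2, 2), (2, 3)]
```

The second sub-domain contains the multiset {2, 2} mentioned in the example, which the sub-domain without symmetry breaking avoids.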

3.1 Dominance in variable representations

While dominance is a useful property, proving that one representation dominates another directly from the definition is non-trivial, and therefore finding simpler methods of proving dominance is useful. For simple variable representations, Theorem 3 gives a sufficient, although not necessary, condition for one simple representation to dominate another.

Definition 13. A set of channelling constraints between two variable representations R1 = 〈V1, f1〉 and R2 = 〈V2, f2〉 of a variable X is a set of constraints over V1 and V2 such that an instantiation v1 of V1 and v2 of V2 satisfies all the constraints if and only if either f1(v1) is equal to f2(v2), or both are undefined.

Theorem 3. Given two simple variable representations R1 = 〈V, f〉 and R2 = 〈W, g〉 of a variable X and a set C of channelling constraints between V and W, where each c ∈ C can be expressed in the form v = x ↔ (w1 = y1 ∨ · · · ∨ wn = yn) for constants x, y1, . . . , yn and variables v ∈ V and wi ∈ W, then R2 dominates R1.

Proof. Consider a sub-domain V′ of V which is GAC with respect to the representation constraint, and construct the maximal sub-domain W′ of W which is GAC with respect to the representation constraint and such that W′ and V′ together are GAC with respect to C. By construction, for each assignment v′ ∈ V′ where f(v′) is defined, there will exist a w′ ∈ W′ such that g(w′) = f(v′).


Consider an assignment w′ ∈ W′ such that g(w′) is defined. There must exist a unique assignment v ∈ V such that g(w′) = f(v) and c(w′, v) holds for all c ∈ C. It remains to show this assignment is allowed by V′.

Consider a single channelling constraint in C. If the constraint is true, either both the left and right hand side of the constraint are false, or the comparison on the left hand side and at least one comparison on the right hand side are true. Given a sub-domain of W, therefore, by looking at each channelling constraint in turn it is possible to construct a list of assignments to variables in V which must be allowed, and which must be forbidden. Considering a larger sub-domain of W can only increase the number of assignments to variables in V which must be allowed. Therefore, as c(w′, v) holds for all c ∈ C and W′ contains w′, V′ must contain v, because V′ and W′ together are GAC with respect to C. □

Theorem 4. 1. The Gent representation dominates the occurrence representation for representing sets.

2. The Gent + size representation dominates the occurrence + size representation for representing sets.

3. The Gent representation dominates the explicit with symmetry breaking representation for fixed sized sets and multisets.

4. The Gent + size representation dominates the explicit with symmetry breaking representation for variable sized sets and multisets.

5. The explicit and explicit with check representations with symmetry breaking are equivalent on variable and fixed sized sets and multisets.

Proof. From Theorem 3, we only need to find a set of channelling constraints of the appropriate form to prove dominance. The required channelling constraints are given below. In each case, a CSP variable X = MS(S, occ, size) is represented in multiple ways.

1. Represent X as 〈V, f〉 under the ordered occurrence representation and 〈W, g〉 under the occurrence representation. Consider the set of constraints {W[i] = 0 ↔ V[i] = 0 | i ∈ X} ∪ {W[i] = 1 ↔ V[i] = 1 ∨ · · · ∨ V[i] = size | i ∈ X}.

2. Represent X as 〈V.S1, f〉 under the ordered occurrence + size representation (where S1 is a variable which represents the size of X), and 〈W.S2, g〉 under the occurrence + size representation. Consider the sets of constraints {W[i] = 0 ↔ V[i] = 0 | i ∈ X} ∪ {W[i] = 1 ↔ V[i] = 1 ∨ · · · ∨ V[i] = size | i ∈ X} and {S1 = i ↔ S2 = i | i ∈ size}.

3. Represent X as 〈V, f〉 under the ordered occurrence representation and 〈W, g〉 under the explicit representation. Consider the set of constraints {W[i] = x ↔ V[x] = i | x ∈ X, i ∈ {1, . . . , size}} ∪ {W[i] = ∅ ↔ False | i ∈ {1, . . . , size}}.

4. Represent X as 〈V.S, f〉 under the ordered occurrence + size representation and as 〈W, f〉 under the explicit representation. Consider the set of constraints {W[i] = x ↔ V[x] = i | x ∈ X} ∪ {W[i] = ∅ ↔ S = 0 ∨ · · · ∨ C = i | i ∈ size}.

5. Represent X as 〈V, f〉 under the explicit representation and as 〈W.C, g〉 under the explicit with check representation. The set of constraints {V[i] = x ↔ W[i] = x | x ∈ X, i ∈ size} ∪ {V[i] = ∅ ↔ C[i] = 1 | x ∈ X} shows that the explicit with check representation dominates the explicit representation, and the set of constraints {C[i] = 0 ↔ V[i] = ∅ | i ∈ size} ∪ {C[i] = 1 ↔ V[i] ∈ X | i ∈ size} ∪ {W[i] = x ↔ V[i] = x ∨ V[i] = ∅ | i ∈ size, x ∈ X} demonstrates the converse.

□

4 Perfect representations

While dominance provides a powerful method of comparing representations, it is often too coarse. A more precise system would compare representations with respect to a specific problem. Unfortunately, performing this usefully in general has proved difficult, but this section studies a useful special case of this problem, showing when a representation of some variable is equivalent to the complete representation with respect to one or more constraints.

Definition 14. Given a constraint C, a set T = {〈Rv, fv〉} of representations for some subset of vars(C), some 〈Rt, ft〉 ∈ T which represents a variable X, and the constraint C′ obtained by applying each representation in T to variables in vars(C), then “〈Rt, ft〉 is perfect with respect to C and T” if, given a sub-domain A of vars(C′) which is GAC with respect to C′, any assignment to Rt which is allowed by A and has an image under ft can be extended to an assignment to A which satisfies C′.

Note that the definition of perfect considers a set of representations. This is because replacing many variables with representations may be perfect with respect to each of these representations, while replacing any single variable with a representation is not.

It follows from the definition of perfect that using a perfect representation instead of the original variable will not lead to an increase in search space assuming GAC propagation, as whenever GAC propagation occurs, all assignments which represent an assignment to the original variable and which do not satisfy the constraint will be removed.

Example 9. Consider a variable A with domain MS({1, . . . , n}, 1, [0 . . . n]) represented with the occurrence representation with occurrence vector V. The constraint a ∈ A for a constant value a is mapped via the representation to the constraint V[a] = 1. Any sub-domain of V which is GAC with respect to this constraint will clearly have V[a] with only 1 in its domain, and therefore any assignment to V which represents an element of A will satisfy the original constraint. The occurrence representation is therefore perfect with respect to the constraint a ∈ A.
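Example 9 can be confirmed by enumeration for a small n. The sketch below builds the sub-domain that remains after propagating V[a] = 1 and checks that every assignment it allows represents a set containing a. The function name `membership_is_entailed` and the 0/1 encoding of the occurrence vector are assumptions made for illustration.

```python
from itertools import product

def membership_is_entailed(n, a):
    """Check that every assignment left after propagating V[a] = 1 satisfies a ∈ A."""
    elements = range(1, n + 1)
    # GAC on V[a] = 1 leaves only the value 1 for V[a]; other entries stay free.
    sub_domain = [{1} if s == a else {0, 1} for s in elements]

    def decode(v):
        # Occurrence representation for sets: s is in the set iff v[s] = 1.
        return {s for s, bit in zip(elements, v) if bit}

    return all(a in decode(v) for v in product(*sub_domain))

print(membership_is_entailed(3, 2))  # True
```

No assignment in the propagated sub-domain violates the original constraint, which is the property that Definition 14 asks of a perfect representation.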

Example 9 gives an example of a perfect representation with respect to a specific constraint, while Example 5 gave an example of a non-perfect one, which leads to an exponential increase in search size compared to using the complete representation. In a similar fashion to Theorem 3, which showed a useful sufficient condition for one variable representation to dominate another, Theorem 5 shows a necessary and sufficient (although only sufficiency is proved due to space restrictions) condition for a variable representation to be perfect with respect to a given constraint.

Definition 15. Given a constraint C and a set of constraints K, then a set of constraints S is defined to be a split of C in the context of K if vars(C) = ∪{vars(s) | s ∈ S}, (s, t ∈ S ∧ s ≠ t) → vars(s) ∩ vars(t) = ∅, and C ∧ K is logically equivalent to (∧{s | s ∈ S}) ∧ K.

Lemma 4. Given a split S of a constraint C in the context of K, then the set of constraints S′ generated by creating a new constraint s′ for each s ∈ S, where vars(s′) is the same as vars(s) and an assignment is allowed by s′ if it can be extended to a valid assignment of C, is also a split of C in the context of K.

Proof. There cannot be an assignment to vars(C) which is allowed by C ∧ K and not by (∧S′) ∧ K, as each s′ ∈ S′ accepts any assignment which can be extended to a complete assignment of vars(C) which satisfies C. Furthermore, all assignments to (∧S′) ∧ K must be allowed by (∧S) ∧ K, as the constraints in S′ allow the minimal possible set of assignments required to satisfy that they are a split. Therefore C ∧ K → (∧S′) ∧ K → (∧S) ∧ K ↔ C ∧ K, and therefore all 3 must be equivalent. □

Theorem 5. Given a constraint C and the constraint C′ generated by applying a set of variable representations R = {〈Vi, fi〉} to C, a distinguished representation 〈Vr, fr〉 ∈ R, and the set of constraints K generated by taking the representation constraints of each element of R, then 〈Vr, fr〉 is perfect with respect to C and R if C′ can be split into a set of constraints S in the context of K, each of which contains at most one variable from Vr.

Proof. Without loss of generality, we will assume any splits are in the form discussed in Lemma 4. Consider the case where a split exists. We must show that this implies that given any sub-domain of vars(C′), if it is GAC with respect to C′, any assignment to Vr can be extended to a valid assignment of vars(C′) with respect to both C′ and the representation constraints of all the elements of R. If this was not the case, this would mean there was a sub-domain of vars(C′) which is GAC with respect to C′ but there was an assignment to one of the Vr which could not be extended to a valid assignment of vars(C′). This would mean also it could not be extended to a valid assignment which satisfied all the constraints in a split of C′. As no variable is in more than one constraint of the split, this would mean that at least one constraint in the split had an assignment which was not satisfiable. However each constraint in the split contains at most one variable from Vi, so that variable can be pruned, and therefore one constraint in the split of C′ was not GAC, and therefore neither was C′. □

Corollary 1. The occurrence representation is perfect when all variables in the constraints A ∪ B = C, A ∩ B = C and A ⊆ B are represented by the occurrence representation, and the (multi)set variables A, B and C are unsized.


Proof. If A, B and C are represented by the occurrence representation with vectors A′, B′ and C′, then each of the constraints trivially splits to satisfy Theorem 5. For example, for A, B, C = MS({1, . . . , n}, 1, [0, n]), A ∪ B = C becomes the set of constraints {A′[i] ∨ B′[i] = C′[i] | i ∈ {1, . . . , n}}. □
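The split used in this proof can be checked exhaustively for small n. For set union, the constraint on each element is a disjunction of the two occurrence entries, and the sketch below compares it against the union constraint on the decoded sets for every assignment. The function name `split_matches_union` and the 0/1 encoding are assumptions made for illustration.

```python
from itertools import product

def split_matches_union(n):
    """Compare A ∪ B = C with its element-wise split over all 0/1 occurrence vectors."""
    def decode(v):
        return {i for i, bit in enumerate(v, start=1) if bit}

    vectors = list(product((0, 1), repeat=n))
    for a, b, c in product(vectors, repeat=3):
        whole = (decode(a) | decode(b)) == decode(c)
        # One small constraint per element, each touching one entry of each vector.
        split = all((a[i] or b[i]) == c[i] for i in range(n))
        if whole != split:
            return False
    return True

print(split_matches_union(3))  # True
```

Each constraint of the split mentions exactly one variable from each occurrence vector, which is the shape required by Theorem 5.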

Although we shall not prove it here, the explicit representation is not perfect for any of these constraints. This can be seen intuitively, as all these constraints involve examining every variable in the representation to know if a certain element is in the set or not.

5 Conclusion

This paper has extended the study of representations by giving an implementation-independent definition of representations and has shown how this can be used to usefully compare representations to each other in both a problem-dependent and a problem-independent manner. Of particular interest is that a previously unstudied representation of sets and multisets has been proved to perform better than two frequently used representations. Further, we have shown that for problems which involve most common set constraints excluding cardinality, the occurrence representation performs as well as the theoretically best representation. The work in this paper is being further expanded to cover other families of constraints, in particular quantified constraints, and other types of variables, such as graphs, are also being investigated.

References

1. C. Bessiere, E. Hebrard, B. Hnich, and T. Walsh. Disjoint, partition and intersection constraints for set and multiset variable. In Proceedings of CP04, 2004.

2. C. Bessiere and J.-C. Regin. Arc consistency for general constraint networks: Preliminary results. In IJCAI (1), pages 398–404, 1997.

3. A. M. Frisch, C. Jefferson, B. Martínez Hernández, and I. Miguel. The rules of constraint modelling. In Proceedings of the 19th International Joint Conferences on Artificial Intelligence, 2005.

4. C. Gervet. Conjunto: constraint logic programming with finite set domains. In M. Bruynooghe, editor, Logic Programming - Proceedings of the 1994 International Symposium, pages 339–358, Massachusetts Institute of Technology, 1994. The MIT Press.

5. W. Harvey and P. Stuckey. Improving linear constraint propagation by changing constraint representation. Constraints, 8(2):173–207, 2003.

6. B. Hnich. Function Variables for Constraint Programming. PhD thesis, Uppsala, 2004.

7. B. Hnich, B. Smith, and T. Walsh. Dual modelling of permutation and injection problems. Journal of Artificial Intelligence Research, Volume 21.

8. T. Walsh. Consistency and propagation with multiset constraints: A formal viewpoint. In Proceedings of CP, 2003.


Modelling and Solving Temporal Reasoning as Propositional Satisfiability *

Duc Nghia Pham, John Thornton, and Abdul Sattar

Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia

d.n.pham, j.thornton, [email protected]

Abstract. Recent research has shown that it is often preferable to encode real-world problems as propositional satisfiability (SAT) problems, and then solve using general purpose solvers. In this way the efficiencies of SAT technology can be exploited, and the development of specialised solution techniques can be avoided. However, in the interval algebra (IA) domain of temporal reasoning, the state-of-the-art still involves the use of specialised techniques that exploit the particular structure of interval relations. In this paper we investigate the feasibility of representing and solving IA problems as SAT problems. We firstly introduce two methods of representing IA as a constraint satisfaction problem (CSP), and then use three SAT-encoding schemes to produce six different IA to SAT representations. In an empirical study, we examine the performance of existing SAT local and complete search solvers on these SAT representations, and perform a comparison with solvers that operate on native IA representations. Our results show that the best performance over a range of algorithms is produced using a support SAT encoding of a point algebra-based CSP. The results also show that a state-of-the-art complete SAT solver (zChaff) can solve these instances significantly faster than existing IA solvers working on equivalent native IA representations.

1 Introduction

Representation and reasoning with time information, or temporal reasoning, is a fundamental research area in computer science and artificial intelligence. Basic tasks in this domain include the design and development of efficient reasoning methods for determining consistency of temporal representations, and effectively answering temporal queries. More generally, results from temporal reasoning research have been successfully applied in many real world AI applications such as planning, plan recognition, natural language understanding, and medical diagnosis [15].

In this paper we are specifically concerned with the interval algebra (IA) representation of the temporal reasoning problem [1]. This is firstly because IA offers considerable expressivity in terms of representing qualitative information and secondly because it is the most popular and well studied temporal reasoning formalism. Existing IA temporal reasoning techniques are generally based on the backtracking approach (proposed by Ladkin and Reinefeld [10]), which uses path consistency as forward checking. Although this approach has been further improved [15, 22], it and its variants still rely on path consistency checking at each step to prune the search space. This native IA approach has the advantage of being fairly compact, but is disadvantaged by the overhead of continually ensuring path-consistency. Additionally, the native IA representation of variables and constraints means that state-of-the-art local search and complete search heuristics (such as unit propagation look ahead in Satz [12] or no-good recording and non-chronological backtracking in Chaff [14]) cannot be easily transferred to the temporal domain.

* We would like to thank Peter van Beek and Jochen Renz for helpful comments on the earlier version of this paper.

In practice, existing native IA backtracking approaches are only able to find consistent solutions for relatively small general IA instances [21, 19]. This has motivated research in applying stochastic local search (SLS) techniques to the IA problem, to see if performance increases over complete search observed in other SAT and CSP domains can be translated to IA. The first step in this direction was taken by Thornton et al. [19], with the development of the end-point ordering model, specifically designed to represent IA problems in a form suitable for processing by SLS. In this research the TSAT local search algorithm was shown to significantly outperform an existing complete search technique on a set of larger, more difficult IA problems. However, the end-point ordering model, like the native IA model, has a specialised structure that is carefully exploited by the TSAT algorithm. This means it is not a suitable representation for the application of general purpose techniques.

In this paper we ask whether the representation of IA problems using specialised models that require specialised algorithms is necessary in the general case. Given that the development of such approaches takes considerable effort, we would expect significant performance benefits to result. To answer this question, we look at expressing IA as a propositional satisfiability problem. This enables us to apply a range of state-of-the-art SAT solvers and to compare the performance of these with the existing native IA approaches. To our knowledge, no explicit and thorough work has tried to formulate temporal problems as SAT instances. Nebel and Bürckert [16] pointed out that qualitative temporal instances can be translated to SAT instances but that such a translation causes an exponential blowup in problem size. Hence, no further investigation was provided in their work.¹

A second issue addressed by the paper is the discovery of the best SAT representation for IA. This question divides into two parts: firstly the development of an appropriate CSP representation of IA (i.e. one suitable for the efficient application of general purpose solution techniques), and secondly the best choice of a CSP to SAT encoding. One of the main contributions of the paper is the development of a point-based CSP encoding of the IA problem which uses point algebra-based relations [23] but retains the full expressivity of IA. This is compared to a more straightforward interval-based model. We also extend the literature on the relative performance of existing CSP to SAT encodings by giving an empirical comparison of three encodings for each of our CSP models and for both complete and local search approaches.

¹ Recent independent work [6] has proposed representing IA as SAT, but the authors do not specify the transformation in detail, and fail to provide an adequate empirical evaluation.


The remainder of the paper is structured as follows: in the next section we review the basic definitions of IA, and in Section 3 we introduce the two methods to transform IA instances to CSP instances. Using these two transformation methods, combined with three CSP to SAT encodings, six IA to SAT encodings are presented. In Section 4.1 we describe the generation of our test set instances and in Sections 4.2-4.5 we present an empirical study to evaluate the performance of these SAT encodings relative to each other, and the performance of existing complete and SLS SAT solvers in comparison to the native IA backtracking and TSAT solvers. Finally, Section 5 presents the conclusions and the future research suggested by this study.

2 Interval Algebra

Interval Algebra [1] is the most commonly used formalism to represent temporal interval events. It consists of a set of 13 atomic relations between two time intervals: $I = \{eq, b, bi, m, mi, o, oi, d, di, s, si, f, fi\}$ (see Table 1). Indefinite information between two time intervals can be expressed as a subset of $I$ (i.e. a disjunction of atomic relations). For example, the statement “Event X can happen either before or after event Y” can be expressed as $X\{b, bi\}Y$. Hence there are a total of $2^{|I|} = 8{,}192$ possible relations between pairs of temporal intervals.
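As a small illustration of the counting argument above, the following sketch represents an IA relation as a set of atomic relation names and enumerates all subsets of $I$. The names `ATOMIC`, `before_or_after` and `all_relations` are illustrative choices, not part of the original formalism.

```python
from itertools import combinations

# The 13 atomic relations of the interval algebra, named as in the text.
ATOMIC = ("eq", "b", "bi", "m", "mi", "o", "oi", "d", "di", "s", "si", "f", "fi")

# An indefinite relation is a set of atomic relations, e.g. X {b, bi} Y.
before_or_after = frozenset({"b", "bi"})


def all_relations(atoms=ATOMIC):
    """Yield every subset of the atomic relations, including the empty set."""
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            yield frozenset(subset)


relation_count = sum(1 for _ in all_relations())
print(relation_count)  # equals 2 ** 13
```

The empty set is counted as well; it denotes an inconsistent constraint rather than a usable label.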

| Atomic relation meaning | Symbol | PA representation |
|---|---|---|
| X before Y / Y after X | b / bi | $X^- < Y^-$, $X^- < Y^+$, $X^+ < Y^-$, $X^+ < Y^+$ |
| X meets Y / Y met by X | m / mi | $X^- < Y^-$, $X^- < Y^+$, $X^+ = Y^-$, $X^+ < Y^+$ |
| X overlaps Y / Y overlapped by X | o / oi | $X^- < Y^-$, $X^- < Y^+$, $X^+ > Y^-$, $X^+ < Y^+$ |
| X during Y / Y includes X | d / di | $X^- > Y^-$, $X^- < Y^+$, $X^+ > Y^-$, $X^+ < Y^+$ |
| X starts Y / Y started by X | s / si | $X^- = Y^-$, $X^- < Y^+$, $X^+ > Y^-$, $X^+ < Y^+$ |
| X finishes Y / Y finished by X | f / fi | $X^- > Y^-$, $X^- < Y^+$, $X^+ > Y^-$, $X^+ = Y^+$ |
| X equals Y | eq | $X^- = Y^-$, $X^- < Y^+$, $X^+ > Y^-$, $X^+ = Y^+$ |

Table 1. The thirteen IA atomic relations

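The four operators of IA: union (denoted by $\cup$), intersection (denoted by $\cap$), inversion (denoted by $^{-1}$), and composition (denoted by $\circ$), can be defined as follows:

$\forall X, Y : X (R_1 \cup R_2) Y \leftrightarrow (X R_1 Y \vee X R_2 Y)$

$\forall X, Y : X (R_1 \cap R_2) Y \leftrightarrow (X R_1 Y \wedge X R_2 Y)$

$\forall X, Y : X (R_1^{-1}) Y \leftrightarrow Y R_1 X$

$\forall X, Y : X (R_1 \circ R_2) Y \leftrightarrow \exists Z : (X R_1 Z \wedge Z R_2 Y)$.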

Hence, the intersection and union of any two temporal relations ($R_1$, $R_2$) are simply the standard set-theoretic intersection and union of the two sets of atomic relations describing $R_1$ and $R_2$, respectively. The inversion of a temporal relation $R$ is the union of the inversion of each atomic relation $r_i \in R$. The composition of any pair of temporal relations ($R_1$, $R_2$) is the union of all results of the composition operation on each pair of atomic relations ($r_{1i}$, $r_{2j}$), where $r_{1i} \in R_1$ and $r_{2j} \in R_2$. The full composition results of these IA atomic relations are available in [1].
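The four operators can be sketched directly on sets of atomic relation names. Rather than transcribing the composition table of [1], this sketch derives it by brute force: it enumerates all triples of intervals with integer end-points in 0..5 (six values are enough to realise every qualitative arrangement of three intervals) and records which relations co-occur. The function and table names (`atomic`, `COMP`, `INVERSE`, `compose`, ...) are assumptions made for illustration.

```python
from itertools import combinations, product


def atomic(x, y):
    """Return the atomic IA relation between intervals x and y.

    Each interval is a pair (start, end) with start < end.
    """
    (xs, xe), (ys, ye) = x, y
    if xe < ys:
        return "b"
    if ye < xs:
        return "bi"
    if xe == ys:
        return "m"
    if ye == xs:
        return "mi"
    if xs == ys and xe == ye:
        return "eq"
    if xs == ys:
        return "s" if xe < ye else "si"
    if xe == ye:
        return "f" if xs > ys else "fi"
    if xs < ys:
        return "o" if xe < ye else "di"
    return "d" if xe < ye else "oi"


# All intervals over six integer points.
INTERVALS = list(combinations(range(6), 2))

# Composition of atomic relations, derived by enumeration.
COMP = {}
for x, y, z in product(INTERVALS, repeat=3):
    COMP.setdefault((atomic(x, y), atomic(y, z)), set()).add(atomic(x, z))

# Inverse of each atomic relation, derived the same way.
INVERSE = {atomic(x, y): atomic(y, x) for x, y in product(INTERVALS, repeat=2)}


def union(r1, r2):
    return frozenset(r1) | frozenset(r2)


def intersection(r1, r2):
    return frozenset(r1) & frozenset(r2)


def inverse(r):
    return frozenset(INVERSE[a] for a in r)


def compose(r1, r2):
    """Union of the compositions of every pair of atomic relations."""
    return frozenset().union(*(COMP[a, b] for a in r1 for b in r2))


print(sorted(compose({"o"}, {"o"})))
```

Deriving the table by enumeration keeps the sketch self-contained; a production solver would more likely hard-code the 13 x 13 table.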

An IA network can be modelled as a temporal CSP (TCSP), where each interval event is a CSP variable with a domain of ordered pairs of real numbers and each binary constraint $C_{ij}$ is labelled with the interval relations between the $i$th and $j$th intervals [15]. In general, the solution for a standard CSP is an assignment of domain values to all variables such that all the constraints are satisfied. Unfortunately, as the domain of an IA interval is infinite, this representation is not appropriate for a discrete domain CSP or SAT solver. However, the problem can be relaxed such that an I-instantiation of a given IA network is an assignment of interval relations to all binary constraints in the corresponding TCSP. An I-instantiation is singleton, also known as a scenario, iff each binary constraint is assigned exactly one atomic relation. An IA network with $n$ interval variables is globally consistent iff it is strongly $n$-consistent [13]. Hence, the ISAT problem of determining whether a given IA network is satisfiable becomes the problem of determining whether a globally consistent I-instantiation of the corresponding TCSP exists [1, 15]. ISAT is the fundamental reasoning task in the temporal reasoning community because all other interesting reasoning problems can be reduced to it in polynomial time [7] and it is one of the most important tasks in practical applications [22].

Figure 1(a) shows an example of an IA network expressing the situation: “Fred was reading the paper while eating his breakfast. He put the paper down and drank the last of his coffee. After breakfast he went for a walk.”² In this example, variables $X_p$, $X_b$, $X_c$ and $X_w$ represent the intervals in which Fred is reading the paper, eating breakfast, drinking coffee and walking, respectively. Figure 1(b) shows a consistent solution of this network on the timeline and Figure 1(c) shows the corresponding I-instantiation of that solution.

As ISAT is known to be NP-complete [23], the application of some sort of exhaustive search method is generally required to determine the satisfiability of a full IA network. Ladkin and Reinefeld [10] proposed an efficient backtracking approach to solve the ISAT problem by enforcing path consistency as forward checking at every branching node. This allows the elimination of relations that are path inconsistent with the current partial solution. They also pointed out that the instantiation of each constraint can be extended from atomic relations to any set of relations for which path consistency guarantees global consistency, and hence considerably reduced the branching factor of the algorithm. In addition, various variable and value ordering techniques were developed and empirically shown to significantly improve overall performance [21, 11, 15].

² This example was originally used in [21].

[Figure 1: (a) IA network, (b) timeline solution, (c) consistent scenario]

Fig. 1. An example of an IA network and its consistent solution.

Recently, Thornton et al. [19] developed a new transformation method, called end-point ordering, that reformulates IA networks into CSPs. In this model variables are interval end-points and constraints are end-point relations as defined in Table 1. The domain of each end-point variable is defined as the integer value position or rank of that variable within the total ordering of all end-points [19]. To solve these problems, a specialised TSAT local search algorithm was developed that exploits the structure of the end-point domains and constraints. The main difficulty with this approach is the generation of very large variable domains (representing all possible orderings of each interval). Without the special TSAT pruning heuristics a standard general purpose solver would not prove competitive. Therefore, neither the native IA model nor the end-point ordering model is appropriate for use with a general purpose SAT or CSP solver. For this reason we decided to explore the development of alternative representations that could produce variables with reasonable domain sizes and easily represented constraints.

3 Encoding IA into SAT

Recent research has shown that modelling and solving hard combinatorial problems as SAT instances can produce significant performance benefits over solving problems in their original form [9, 8, 18]. This at least indicates that encoding and solving IA problems as SAT instances using state-of-the-art SAT solvers is a promising line of enquiry.

A common approach to encode combinatorial problems into SAT is to divide the task into two steps: (i) modelling the original problem as a CSP and (ii) mapping the new CSP into SAT. In the next two subsections, we propose two transformation methods to model IA networks as CSPs such that these CSPs can be feasibly translated into SAT. We then discuss three SAT encoding schemes to map the CSP formulations of IA networks into SAT. This results in six different approaches to encode IA networks into SAT.³

³ The proof for these reformulating methods will appear in a longer version of this paper.

3.1 The Interval-Based Transformation Method

A straightforward method to formulate IA networks as CSPs is to represent each arc between a pair of intervals in the original IA network as a CSP variable. We then limit the domain values of each CSP variable to the set of permissible IA atomic relations for that arc, rather than the set of all subsets of $I$ used in existing IA approaches. This allows us to reduce the domain size of each CSP variable from $2^{13}$ to a maximum of 13 values. In addition, an instantiation of the new model is now a singleton I-instantiation, as a property of a CSP is that one and only one value can be assigned to a variable at any given time. Hence, the global constraint that an I-instantiation has to be globally consistent becomes equivalent to it being path consistent, as a singleton I-instantiation is globally consistent iff it is path consistent [21].

In his original work, Allen [1] proposed a path consistency algorithm for a regular I-instantiation that repeatedly computes

$R_{ik} = R_{ik} \cap (R_{ij} \circ R_{jk})$

for all triples of intervals $(i, j, k)$ until no further change occurs or until $R_{ik} = \emptyset$. These operations remove all the relations that cause an inconsistency between any triple $(i, j, k)$ of intervals. If $R_{ik} = \emptyset$, then the original I-instantiation is path inconsistent.

For a singleton I-instantiation, the algorithm can be simplified, without loss of completeness, so that we only need to check

$R_{ik} \subseteq (R_{ij} \circ R_{jk})$

for all triples of intervals $(i < j < k)$ once. The intersection ($\cap$) operation is unnecessary as $R_{ik}$ is instantiated with exactly one atomic relation.
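The fixed-point computation described above can be sketched as follows. This is a naive illustration rather than Allen's queue-based procedure or the implementation used in the experiments: it simply re-scans all triples until nothing changes. The composition table is again derived by enumeration over small integer intervals, and the names `path_consistency`, `COMP`, `INV` and `ALL` are assumptions of this sketch.

```python
from itertools import combinations, product


def atomic(x, y):
    """Atomic IA relation between intervals x=(start, end) and y=(start, end)."""
    (xs, xe), (ys, ye) = x, y
    if xe < ys:
        return "b"
    if ye < xs:
        return "bi"
    if xe == ys:
        return "m"
    if ye == xs:
        return "mi"
    if xs == ys and xe == ye:
        return "eq"
    if xs == ys:
        return "s" if xe < ye else "si"
    if xe == ye:
        return "f" if xs > ys else "fi"
    if xs < ys:
        return "o" if xe < ye else "di"
    return "d" if xe < ye else "oi"


PTS = list(combinations(range(6), 2))
COMP = {}
for x, y, z in product(PTS, repeat=3):
    COMP.setdefault((atomic(x, y), atomic(y, z)), set()).add(atomic(x, z))
INV = {atomic(x, y): atomic(y, x) for x, y in product(PTS, repeat=2)}
ALL = frozenset(INV)


def compose(r1, r2):
    return frozenset().union(*(COMP[a, b] for a in r1 for b in r2))


def path_consistency(n, labels):
    """Enforce R_ik = R_ik & (R_ij o R_jk) until a fixed point is reached.

    labels maps (i, j) with i < j to a set of atomic relations; arcs that are
    not listed are unconstrained. Returns the tightened relations, or None if
    some relation becomes empty (the network is path inconsistent).
    """
    R = {}
    for i, j in combinations(range(n), 2):
        r = frozenset(labels.get((i, j), ALL))
        R[i, j] = r
        R[j, i] = frozenset(INV[a] for a in r)
    changed = True
    while changed:
        changed = False
        for i, j, k in product(range(n), repeat=3):
            if len({i, j, k}) < 3:
                continue
            new = R[i, k] & compose(R[i, j], R[j, k])
            if new != R[i, k]:
                if not new:
                    return None
                R[i, k] = new
                R[k, i] = frozenset(INV[a] for a in new)
                changed = True
    return R


tight = path_consistency(3, {(0, 1): {"b"}, (1, 2): {"b"}})
print(sorted(tight[0, 2]))
```

With interval 0 before 1 and 1 before 2, the unconstrained arc (0, 2) is tightened to the single relation b; adding the label bi on that arc makes the function return `None`.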

Formally, using this interval-based reduction method, the corresponding CSP of a given IA network is defined as follows:

Definition 1. Given an IA network with $n$ intervals, the corresponding interval-based CSP is $(X, D, C)$, where $X = \{\upsilon_{ij} \mid i, j \in [1..n], i < j\}$; each variable $\upsilon_{ij}$ represents a relation between two intervals $i$ and $j$, having a domain $D_{ij} = R_{ij}$; and $C$ consists of the following constraints:

$\upsilon_{ij} = x \wedge \upsilon_{jk} = y \implies \upsilon_{ik} \in \{z_1, ..., z_m\}, \quad (i, j, k \in [1..n], i < j < k) \qquad (1)$

where $\{z_1, ..., z_m\} = D_{ik} \cap (x \circ y)$. Note that $R_{ij}$ is the IA relation between $i$ and $j$; and $x \in R_{ij}$, $y \in R_{jk}$.

This complete interval-based reduction method requires $O(n^3)$ time, where $n$ is the number of intervals.
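The cubic bound can be made concrete by counting the objects that Definition 1 creates. The helper below is a back-of-the-envelope sketch (the name `interval_csp_size` and the worst-case assumption that every domain holds all 13 atomic relations are ours): there is one CSP variable per pair of intervals and one family of implications of type (1) per ordered triple $i < j < k$.

```python
from math import comb


def interval_csp_size(n, max_domain=13):
    """Return (#CSP variables, #triples, worst-case #implications of type (1))."""
    variables = comb(n, 2)            # one variable per arc i < j
    triples = comb(n, 3)              # one constraint family per i < j < k
    implications = triples * max_domain ** 2  # one per value pair (x, y)
    return variables, triples, implications


for n in (20, 30, 40):
    print(n, interval_csp_size(n))
```

The number of triples grows as $n(n-1)(n-2)/6$, which is where the $O(n^3)$ behaviour comes from; actual instances have smaller domains after pre-processing, so the worst-case implication count is an upper bound only.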

3.2 The Point-Based Transformation Method

Vilain and Kautz [23] proposed the Point Algebra (PA) to model qualitative information between time points. PA consists of a set of 3 atomic relations $P = \{<, =, >\}$ and four operators defined in a similar manner to IA. As an interval event $X$ can be represented as an ordered pair of end-points $(X^-, X^+)$ where $X^- < X^+$, it follows that the relations between two intervals can be expressed as the relations between their end-points.

Although each atomic IA relation can be uniquely represented by a combination of relations on these points (see Table 1), representing non-atomic IA relations is more complex, as not all IA relations can be translated into point relations. For example, the following combination of point relations, $(X^- \neq Y^-) \wedge (X^- < Y^+) \wedge (X^+ \neq Y^-) \wedge (X^+ < Y^+)$, represents not only $X\{b, d\}Y$ but also $X\{b, d, o\}Y$. Therefore, PA can only cover 2% of IA [15].
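The example can be checked mechanically. The sketch below stores the PA representation of the seven base relations from Table 1 as sign tuples, derives the six inverse relations by swapping the roles of X and Y, and then lists every atomic IA relation that fits a given combination of point relations. The names `BASE`, `PA`, `inverse_pa` and `covered` are illustrative.

```python
FLIP = {"<": ">", ">": "<", "=": "="}

# PA representation of the base relations (Table 1), as the signs of
# (X- ? Y-, X- ? Y+, X+ ? Y-, X+ ? Y+).
BASE = {
    "b":  ("<", "<", "<", "<"),
    "m":  ("<", "<", "=", "<"),
    "o":  ("<", "<", ">", "<"),
    "d":  (">", "<", ">", "<"),
    "s":  ("=", "<", ">", "<"),
    "f":  (">", "<", ">", "="),
    "eq": ("=", "<", ">", "="),
}


def inverse_pa(t):
    """PA tuple of the inverse relation: swap X and Y, then flip each sign."""
    a, b, c, d = t
    return (FLIP[a], FLIP[c], FLIP[b], FLIP[d])


PA = dict(BASE)
for name, t in BASE.items():
    if name != "eq":
        PA[name + "i"] = inverse_pa(t)


def covered(allowed):
    """Atomic IA relations whose PA tuple fits the allowed point relations."""
    return {r for r, t in PA.items() if all(s in a for s, a in zip(t, allowed))}


# (X- != Y-) and (X- < Y+) and (X+ != Y-) and (X+ < Y+)
example = ({"<", ">"}, {"<"}, {"<", ">"}, {"<"})
print(sorted(covered(example)))  # ['b', 'd', 'o']
```

The output shows that the point relations needed to admit b and d inevitably admit o as well, which is the gap the extra constraints of the point-based model are meant to close.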

However, we can simply introduce new constraints to prevent the representation of undesired IA relations in the CSP model. Such constraints are formed using the negation of the PA representation of the undesired atomic IA relation.⁴ For instance, to disallow $XoY$ in the above example, we would include the constraint:

$\neg(X^- < Y^-) \vee \neg(X^- < Y^+) \vee \neg(X^+ > Y^-) \vee \neg(X^+ < Y^+)$

Formally, using this point-based reduction method, the corresponding CSP of a given IA network is defined as follows:

Definition 2. Let $\mu^r_{st}$ be the PA representation for an atomic IA relation $r$ between two intervals $s$ and $t$. Given an IA network with $n$ intervals, the corresponding point-based CSP is $(X, D, C)$, where $X = \{\upsilon_{ij} \mid i, j \in [1..2n], i < j\}$; each variable $\upsilon_{ij}$ represents a relation between two points $i$ and $j$, having a domain $D_{ij} \subseteq P$; and $C$ consists of the following constraints:

$\upsilon_{ij} = x \wedge \upsilon_{jk} = y \implies \upsilon_{ik} \in \{z_1, ..., z_m\}, \quad (i, j, k \in [1..2n], i < j < k) \qquad (2)$

$\neg\mu^r_{st}, \quad (r \notin R_{st}) \qquad (3)$

where $\{z_1, ..., z_m\} = D_{ik} \cap (x \circ y)$. Note that $x$, $y$, $z$ are PA atomic relations; $s$, $t$ are intervals in the original problem and $R_{st}$ is the relation between them.

3.3 Mapping CSP Representations of IA into SAT

Using either of the above CSP formulations, an IA network can be easily encoded as a SAT instance, where each Boolean variable $\upsilon^r_{ij}$ represents an assignment of domain value $r$ to a CSP variable $\upsilon_{ij}$. Two sets of at-least-one (ALO) and at-most-one (AMO) clauses can then be used to ensure that each CSP variable can only be instantiated with exactly one value at any time. It is common practice to encode a general CSP into SAT without the AMO clauses, thereby allowing CSP variables to be instantiated with more than one value [24]. A CSP solution can then be extracted by taking any single SAT-assigned value for each CSP variable. However, our two CSP reduction methods depend on the fact that each CSP variable can only be instantiated with exactly one value at any time. This maintains the completeness of formulation by ensuring not only the correctness of our translation of path consistency constraints (required by intersection ($\cap$) operations) but also the global consistency of the solution. Hence, the AMO clauses cannot be removed from our translation.

⁴ Table 1 shows the PA representations of 13 IA atomic relations.
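A minimal sketch of the ALO and AMO clauses, assuming clauses are written as lists of signed integers in the usual DIMACS convention (a positive integer is a Boolean variable, a negative one its negation). The function name `alo_amo` and the numbering scheme are illustrative choices.

```python
from itertools import combinations


def alo_amo(domains):
    """Build ALO and AMO clauses for a CSP.

    domains maps each CSP variable to the list of its domain values.
    Returns (ids, clauses), where ids maps (variable, value) to a Boolean
    variable number.
    """
    ids = {}
    clauses = []
    for var, dom in domains.items():
        lits = []
        for value in dom:
            ids[var, value] = len(ids) + 1
            lits.append(ids[var, value])
        # ALO: at least one value is chosen.
        clauses.append(list(lits))
        # AMO: no two values are chosen together.
        clauses.extend([-a, -b] for a, b in combinations(lits, 2))
    return ids, clauses


ids, clauses = alo_amo({("X", "Y"): ["b", "d", "o"], ("Y", "Z"): ["m", "mi"]})
print(len(ids), len(clauses))
```

A domain of size $k$ contributes one ALO clause and $k(k-1)/2$ binary AMO clauses, so the pairwise AMO encoding stays small for the 3-valued point domains and is at most 78 clauses for a full 13-valued interval domain.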

A natural way to encode the path consistency constraints, i.e. constraints (1) and (2) above, is to add the following support (SUP) clause for each pair of domain values $(x, y)$ for the two CSP variables $(\upsilon_{ij}, \upsilon_{jk})$:

$\neg\upsilon^{x}_{ij} \vee \neg\upsilon^{y}_{jk} \vee \upsilon^{z_1}_{ik} \vee ... \vee \upsilon^{z_m}_{ik}$

where $\{z_1, ..., z_m\} = D_{ik} \cap (x \circ y)$. Note that we use the IA composition table for the interval-based reduction method and the PA composition table for the point-based reduction method [23].
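For the point-based model the SUP clauses can be sketched in a few lines, because the PA composition table is small enough to derive by enumerating three integer points. The `Numbering` helper, which hands out Boolean variable numbers on demand, and the function name `sup_clauses` are assumptions of this sketch.

```python
from itertools import product


def rel(a, b):
    """PA atomic relation between two points."""
    return "<" if a < b else ">" if a > b else "="


# PA composition table, derived by enumeration over three points.
PA_COMP = {}
for a, b, c in product(range(3), repeat=3):
    PA_COMP.setdefault((rel(a, b), rel(b, c)), set()).add(rel(a, c))


class Numbering:
    """Maps (i, j, value) to a Boolean variable number, allocated on demand."""

    def __init__(self):
        self.ids = {}

    def __call__(self, i, j, value):
        return self.ids.setdefault((i, j, value), len(self.ids) + 1)


def sup_clauses(var, i, j, k, d_ij, d_jk, d_ik):
    """One SUP clause per value pair (x, y) of the variables (i, j) and (j, k)."""
    clauses = []
    for x, y in product(d_ij, d_jk):
        support = [z for z in d_ik if z in PA_COMP[x, y]]
        clauses.append([-var(i, j, x), -var(j, k, y)] + [var(i, k, z) for z in support])
    return clauses


var = Numbering()
P = ["<", "=", ">"]
sup = sup_clauses(var, 1, 2, 3, P, P, P)
print(len(sup), sup[0])
```

With full domains there are nine SUP clauses for the triple; the first one reads "if point 1 < point 2 and point 2 < point 3 then point 1 < point 3". When a value pair has no support left, the clause degenerates into a binary nogood.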

For the extra, but essential, constraints that forbid undesired IA relations in the point-based model, i.e. constraints of type (3) that cannot be excluded in a standard PA representation, we add a forbidden (FOR) clause for each of the undesired IA relations involving the CSP variable in the original problem. The FOR clause of an atomic relation $r$ is defined based on the negation of the PA representation of that relation. For example, given the PA representation of $XoY$ is

$(X^- < Y^-) \wedge (X^- < Y^+) \wedge (X^+ > Y^-) \wedge (X^+ < Y^+)$

then the forbidden clause to rule out the relation $XoY$ is

$\neg\upsilon^{<}_{X^-Y^-} \vee \neg\upsilon^{<}_{X^-Y^+} \vee \neg\upsilon^{>}_{X^+Y^-} \vee \neg\upsilon^{<}_{X^+Y^+}$
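The construction of a FOR clause is mechanical once the PA representation is available as a sign tuple. The sketch below assumes DIMACS-style integer literals and uses a plain dictionary to number the Boolean variables; `for_clause` and `var` are hypothetical helper names.

```python
def for_clause(x, y, pa, var):
    """Negate the PA representation `pa` of an atomic relation between x and y.

    pa lists the signs of (x- ? y-, x- ? y+, x+ ? y-, x+ ? y+); var(p, q, sign)
    returns the Boolean variable stating that point p stands in relation
    `sign` to point q.
    """
    ends = (
        (x + "-", y + "-"),
        (x + "-", y + "+"),
        (x + "+", y + "-"),
        (x + "+", y + "+"),
    )
    return [-var(p, q, sign) for (p, q), sign in zip(ends, pa)]


ids = {}


def var(p, q, sign):
    return ids.setdefault((p, q, sign), len(ids) + 1)


# PA representation of X o Y, taken from Table 1.
overlaps = ("<", "<", ">", "<")
clause = for_clause("X", "Y", overlaps, var)
print(clause)
for key, number in ids.items():
    print(number, key)
```

The clause has one negative literal per end-point pair, so it is falsified only by the assignment that spells out the forbidden relation.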

We call the above the support encoding scheme as it encodes the support values of the original problem. In Gent’s support encoding scheme [5], the support clauses are necessary for both implication directions of the CSP constraints. However, in our scheme, only one SUP clause is needed for each triple of intervals ($i < j < k$), and not for all permutation orders of this triple.

Formally, the SAT support encoding scheme for IA networks is defined as follows:

Proposition 1. The support SAT-encoded instance of a given interval-based representation of an IA network consists of appropriate ALO, AMO and SUP clauses. The support SAT-encoded instance of a point-based representation is defined in a similar manner with the use of extra FOR clauses.

Another way of representing CSP constraints as SAT clauses is to encode the conflict values, e.g. nogoods, between any pair of CSP variables [8, 24]. This direct encoding scheme for IA networks can be derived from our support encoding scheme by replacing the SUP clauses with the conflict (CON) clauses. If we represent SUP clauses between a triple of intervals ($i < j < k$) as a 3D array of allowable values for the CSP variable $\upsilon_{ik}$ given the values of $\upsilon_{ij}$ and $\upsilon_{jk}$, then the CON clauses can be defined as:

$\neg\upsilon^{x}_{ij} \vee \neg\upsilon^{y}_{jk} \vee \neg\upsilon^{z_m}_{ik}$

where $z_m \in D_{ik} - (x \circ y)$. The multivalued encoding [18] is a variation of the direct encoding, where all AMO clauses are omitted. As discussed earlier, we did not consider such an encoding because in the IA transformations the AMO clauses play a necessary role.


Proposition 2. The direct SAT-encoded instance of a given IA network is derived from the support SAT-encoded instance of that network by replacing SUP clauses with the appropriate CON clauses.
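The relationship between the two encodings can be illustrated for a single triple of points. In this sketch (same assumptions as before: DIMACS-style literals, a PA composition table derived by enumeration, and hypothetical helper names `con_clauses` and `Numbering`), every value of the third variable that lies outside the composition becomes one ternary CON clause.

```python
from itertools import product


def rel(a, b):
    """PA atomic relation between two points."""
    return "<" if a < b else ">" if a > b else "="


PA_COMP = {}
for a, b, c in product(range(3), repeat=3):
    PA_COMP.setdefault((rel(a, b), rel(b, c)), set()).add(rel(a, c))


class Numbering:
    """Maps (i, j, value) to a Boolean variable number, allocated on demand."""

    def __init__(self):
        self.ids = {}

    def __call__(self, i, j, value):
        return self.ids.setdefault((i, j, value), len(self.ids) + 1)


def con_clauses(var, i, j, k, d_ij, d_jk, d_ik):
    """One CON clause per conflicting value z of (i, k) for each pair (x, y)."""
    clauses = []
    for x, y in product(d_ij, d_jk):
        for z in d_ik:
            if z not in PA_COMP[x, y]:
                clauses.append([-var(i, j, x), -var(j, k, y), -var(i, k, z)])
    return clauses


P = ["<", "=", ">"]
con = con_clauses(Numbering(), 1, 2, 3, P, P, P)
print(len(con))
```

For full 3-valued point domains this gives 14 CON clauses where the support encoding needs 9 SUP clauses; with 13-valued interval domains the gap is much wider, because most values of the third variable conflict with a given pair.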

A more compact version of the direct encoding is the log encoding [8, 24]. Here, a Boolean variable $x_{ij}$ is true iff the corresponding CSP variable $X_i$ is assigned a value in which the $j$-th bit of that value is 1. We can linearly derive log encoded IA instances from direct encoded IA instances by replacing each Boolean variable in the direct encoding with its bitwise representation. As a single instantiation of the underlying CSP variable is enforced by the bitwise representation, the AMO and ALO clauses can be omitted. However, extra bitwise prevention (PRE) clauses may be needed to prevent bitwise representations of undesired Boolean variables from being instantiated. For example, if the domain of variable $X$ has 3 values then we have to add the clause $\neg x_{30} \vee \neg x_{31}$ to prevent the fourth value from being assigned to $X$.

Proposition 3. The log SAT-encoded instance of a given IA network is derived from the direct SAT-encoded instance of that network by replacing each Boolean variable with its bitwise representation, removing all AMO and ALO clauses and adding PRE clauses as necessary.
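The PRE clauses can be sketched as follows, assuming that the domain values of a CSP variable are numbered from 0 and that bit $j$ of the value is stored in the $j$-th Boolean variable of that CSP variable. The names `bits_needed` and `pre_clauses` are illustrative.

```python
def bits_needed(domain_size):
    """Number of bits required to give each domain value a distinct code."""
    return max(1, (domain_size - 1).bit_length())


def pre_clauses(domain_size, bit_vars):
    """Forbid every bit pattern that does not correspond to a domain value.

    bit_vars[j] is the Boolean variable number holding bit j. Each returned
    clause is falsified exactly when the bits spell out one unused code.
    """
    nbits = len(bit_vars)
    clauses = []
    for code in range(domain_size, 2 ** nbits):
        clauses.append(
            [-bit_vars[j] if (code >> j) & 1 else bit_vars[j] for j in range(nbits)]
        )
    return clauses


# A 3-valued domain needs two bits; the unused fourth code (binary 11) is blocked.
print(bits_needed(3), pre_clauses(3, [1, 2]))
# A 13-valued interval domain needs four bits and three PRE clauses.
print(bits_needed(13), len(pre_clauses(13, [1, 2, 3, 4])))
```

For a 3-valued domain the single PRE clause says that the two bits cannot both be 1, which matches the example in the text.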

4 Experimental Study

4.1 Generating SAT-encoded Test Instances

As a large collection of IA benchmarks is not available, the majority of work in the area has relied on randomly generated IA problems [22, 15]. This reliance on randomly generated problems has the advantage of allowing the average difficulty of a problem set to be controlled. We therefore based our empirical study on random problems generated using Nebel’s A(n, d, s) model [15]. The resulting instances are temporal constraint graphs with n nodes (i.e. intervals) and an average degree of d constrained arcs (i.e. interval relations). Constrained arcs are then labelled with an average of s IA atomic relations, where 1 ≤ s ≤ 12. Unconstrained arcs are labelled with all 13 IA atomic relations.

Unlike the S(n, d, s) model used in earlier studies (e.g. [19]), the A(n, d, s) model allows us to control the difficulty of generated problems. Nebel [15] suggested that problem instances can be generated in the phase transition using the A(n, d, s) model with the values of d and s fixed to 9.5 and 6.5, respectively. It is worth noting that the phase transition of a problem type is the border between two regions: one where most of the problems have many solutions and are relatively easy to solve; and one where most of the problems are unsatisfiable and this is also relatively easy to prove [3]. In addition, research has empirically shown that instances in the phase transition are harder to solve for both complete and incomplete search solvers. We therefore used these settings to limit the study to harder instances.

To investigate the performance effects of our six different SAT-encoding schemes, we followed Nebel’s original study and randomly generated three test sets of 20, 30 and 40 nodes, each containing 100 satisfiable instances. We then pre-processed these instances using the path consistency algorithm before encoding them into SAT. We found this pre-processing significantly reduced the problem size of SAT-encoded IA instances both in terms of the number of variables and clauses.

Table 2 shows the average problem size of these instances together with the average time required to encode them into SAT instances. These results indicate that the point-based support encoding produces the SAT-encoded instances with the fewest clauses, and does so in the shortest encoding time.

| Problem | Reduction | Encoding | #vars | #clauses | time (secs) |
|---|---|---|---|---|---|
| n = 20, d = 9.5, s = 6.5 | Interval-based | support | 1,685 | 75,667 | 0.06 |
| | | direct | 1,685 | 695,713 | 0.19 |
| | | log | 639 | 688,180 | 0.43 |
| | Point-based | support | 1,954 | 42,157 | 0.02 |
| | | direct | 1,954 | 81,094 | 0.03 |
| | | log | 1,299 | 79,264 | 0.03 |
| n = 30, d = 9.5, s = 6.5 | Interval-based | support | 4,484 | 371,486 | 0.19 |
| | | direct | 4,484 | 3,697,416 | 0.83 |
| | | log | 1,577 | 3,675,095 | 2.37 |
| | Point-based | support | 4,741 | 173,746 | 0.06 |
| | | direct | 4,741 | 340,366 | 0.10 |
| | | log | 3,146 | 335,800 | 0.14 |
| n = 40, d = 9.5, s = 6.5 | Interval-based | support | 8,421 | 1,019,320 | 0.45 |
| | | direct | 8,421 | 10,415,055 | 2.26 |
| | | log | 2,877 | 10,351,622 | 6.80 |
| | Point-based | support | 8,637 | 443,354 | 0.14 |
| | | direct | 8,637 | 873,715 | 0.23 |
| | | log | 5,744 | 865,344 | 0.36 |

Table 2. Interval versus Point Based Encodings: A comparison on problem generation.

4.2 SAT Solver Selection

Table 3 shows the results of zChaff [14], PAWS [20] and MV-PAWS [17] for the three test sets generated in Section 4.1.⁵ We ran PAWS and MV-PAWS for 1,000 runs on each 20 node instance and 100 runs on each 30 and 40 node instance. All three methods were timed out after 1,200 seconds for each run.

We chose PAWS as our SLS solver [20], as PAWS was recently shown to be one of the most competitive SAT solvers on a range of larger and more difficult problems, while also requiring considerably less effort in terms of parameter tuning than other comparable weighting schemes. For complete search, we chose zChaff [14] version 2004.11.25 as it has won the championship in the SAT competition for three consecutive years. In addition, we included a recently developed version of PAWS, MV-PAWS [17], which implements the same local search heuristic as PAWS, but is specifically designed to exploit CSP structure in SAT-encoded instances. MV-PAWS does this by automatically recognising the structure of CSP variables in a SAT problem and ensuring that each underlying CSP variable is only ever instantiated with a single value during the search. This built-in MV-PAWS mechanism means it can discard all ALO and AMO clauses, as they are now redundant.

⁵ All our experiments were performed on a Sun supercomputer with 8 × Sun Fire V880 servers, each with 8 × UltraSPARC-III 900MHz CPU and 8GB memory per node.

| Problem | Reduction | Encoding | zChaff mean time (secs) | PAWS Param | PAWS mean time (secs) | PAWS mean flips | MV-PAWS Param | MV-PAWS mean time (secs) | MV-PAWS mean flips |
|---|---|---|---|---|---|---|---|---|---|
| n = 20, d = 9.5, s = 6.5 | Interval-based | support | 0.17 | 33 | 1.42 | 67,441 | 100 | 0.85 | 15,955 |
| | | direct | 2.66 | 29 | 18.04 | 133,572 | n/a | n/a | n/a |
| | | log | 1,200 | 67 | 65.82 | 190,436 | n/a | n/a | n/a |
| | Point-based | support | 0.07 | 52 | 1.13 | 124,388 | 100 | 0.26 | 11,696 |
| | | direct | 0.17 | 50 | 3.53 | 246,886 | n/a | n/a | n/a |
| | | log | 3.49 | 68 | 1.16 | 54,734 | n/a | n/a | n/a |
| n = 30, d = 9.5, s = 6.5 | Interval-based | support | 2.17 | 40 | 42.83 | 533,668 | 100 | 20.59 | 134,887 |
| | | direct | 84.54 | 31 | 147.94 | 382,124 | n/a | n/a | n/a |
| | | log | 1,200 | 69 | 271.55 | 278,265 | n/a | n/a | n/a |
| | Point-based | support | 0.53 | 61 | 23.28 | 941,819 | 100 | 3.53 | 67,462 |
| | | direct | 1.95 | 60 | 78.80 | 1,691,735 | n/a | n/a | n/a |
| | | log | 104.90 | 70 | 28.57 | 440,037 | n/a | n/a | n/a |
| n = 40, d = 9.5, s = 6.5 | Interval-based | support | 14.61 | 40 | 210.38 | 1,274,465 | 100 | 120.13 | 389,524 |
| | | direct | 354.99 | n/a | n/a | n/a | n/a | n/a | n/a |
| | | log | 1,200 | n/a | n/a | n/a | n/a | n/a | n/a |
| | Point-based | support | 3.58 | 80 | 179.14 | 2,821,202 | 100 | 34.36 | 287,986 |
| | | direct | 13.74 | n/a | n/a | n/a | n/a | n/a | n/a |
| | | log | 914.36 | n/a | n/a | n/a | n/a | n/a | n/a |

Table 3. A comparison of Interval versus Point Based Encodings.

4.3 Results for Different Encodings

Overall, the results in Table 3 and graphed in Figure 2 clearly show that the point-based transformation produced better results than the interval-based encoding, regardless of problem size, solver or the SAT encoding method employed. Similarly, the support encoding produced the best performance of all three SAT encoding schemes, regardless of problem size, solver⁶ or transformation method (this is consistent with earlier work of [5]). Putting this together leads to the strong conclusion that a combination of point-based transformation and support encoding produces the best results, at least for the randomly generated IA problems considered.

⁶ As the support encoding was so strongly favoured by PAWS and zChaff we did not test MV-PAWS on the other encodings.

The superior performance of the support encoding can be partly explained by the significantly smaller number of clauses generated (in Table 2, roughly ten times fewer than for direct or log encoded instances under the interval-based transformation, and roughly half as many under the point-based transformation). However, it should be noted that the search space (i.e. the number of variables) of support and direct encoded instances are the same, whereas the search space of log encoded instances is $O(n \times (|s| - \log|s|))$ times smaller [8]. A further possible reason for the superiority of the support encoding (suggested by Gent [5]) is the reduced bias to falsify clauses, i.e. the numbers of positive and negative literals in support encoding are more balanced than in direct encoding and hence this may prevent the search from resetting variables to false shortly after they are set to true.
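The clause-count comparison can be recomputed directly from the averages reported in Table 2. The snippet below only re-uses those published figures; the dictionary layout and the name `ratios` are ours.

```python
# Average #clauses from Table 2, keyed by (transformation, n).
CLAUSES = {
    ("interval", 20): {"support": 75_667, "direct": 695_713, "log": 688_180},
    ("interval", 30): {"support": 371_486, "direct": 3_697_416, "log": 3_675_095},
    ("interval", 40): {"support": 1_019_320, "direct": 10_415_055, "log": 10_351_622},
    ("point", 20): {"support": 42_157, "direct": 81_094, "log": 79_264},
    ("point", 30): {"support": 173_746, "direct": 340_366, "log": 335_800},
    ("point", 40): {"support": 443_354, "direct": 873_715, "log": 865_344},
}

# How many times more clauses the direct and log encodings need than support.
ratios = {
    key: (row["direct"] / row["support"], row["log"] / row["support"])
    for key, row in CLAUSES.items()
}

for key, (direct_ratio, log_ratio) in ratios.items():
    print(key, round(direct_ratio, 2), round(log_ratio, 2))
```

The ratios come out between about 9 and 10 for the interval-based transformation and just under 2 for the point-based transformation.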

Although our experiments confirmed the superiority of direct over log encoding (as per Hoos [8]) for the interval-based transformation, our results differed for the point-based transformation, where under PAWS the log encoding was found to be better than the direct encoding, both in terms of runtimes and flips. To investigate this further, we used two features of the search space originally developed in the study of Hoos [8]: the standard deviation of the objective function (sdnclu) and the local minima branching along SLS trajectories (blmin) (we did not look at the solution density as this was the same for both encodings). Intuitively, the larger the sdnclu and blmin values are, the more effective the SLS solver. However, this hypothesis did not fit with our results. This may be explained by the reduced size of the point-based log clauses, resulting from the smaller number of PA atomic relations (in comparison to IA).

4.4 Results for SAT Solvers

As with the encoding results, a comparison of the relative performance of the three solvers produces a fairly unambiguous picture: zChaff outperforms MV-PAWS, and MV-PAWS outperforms PAWS, with relative performance being fairly independent of the transformation or SAT encoding method employed. The superior performance of MV-PAWS over PAWS confirms the results of [17], where MV-PAWS had a similar advantage on a range of SAT-encoded non-temporal CSP instances. This also confirms the results of earlier studies showing that SAT algorithms which exploit CSP structure generally produce better performance (e.g. [4, 2]).

However, the overall domination of zChaff is more surprising, as the earlier TSAT study [19] indicated that local search has an advantage over complete search in the temporal domain. In these results, zChaff is not only better across the board, but appears to be scaling as well as or better than both of the local search approaches. It should also be considered that there is a version of zChaff [2] (analogous to MV-PAWS) that also exploits underlying CSP structure, and we would expect this solver, when released, to produce even better performance on these SAT-encoded instances.

4.5 SAT versus Existing Approaches

The experimental results in the previous sections have clearly shown that a point-based transformation using a support SAT encoding is the best encoding, that MV-PAWS is the better local search solver, and that zChaff is the best overall solver. The final question to address is how these SAT approaches compare to the existing state-of-the-art specialised IA solvers, namely TSAT and Nebel’s backtracking/path consistency algorithm.

[Figure 2: two panels, zChaff and PAWS; x-axis: SAT-encoded IA instances (n=20, n=30, n=40); y-axis: CPU time (seconds); series: IS, ID, IL, PS, PD, PL]

Fig. 2. Scalability of zChaff and PAWS on different SAT encodings.

For this study we generated three test sets of 80, 90 and 100 nodes using the same method discussed in Section 4.1 (with each set containing 10 satisfiable instances). In addition to using zChaff and MV-PAWS on support encoded point-based transformations, we used TSAT on its end-point ordering representation and used two variants of Nebel’s backtracking algorithm (NBT+I and NBT+H) on the native IA representations. NBT+I instantiates each arc with an atomic relation in $I$, whereas NBT+H assigns a relation in the set $\mathcal{H}$ of ORD-Horn relations to each arc. Other heuristics used in Nebel’s backtracking algorithm were set to default.

The results in Table 4 provide strong evidence that the approach of modelling IA problems as SAT has met with success. In particular, zChaff was the only method capable of solving all instances in the problem set within 1 hour. While Nebel’s NBT+H was superior on the 80 node problems, it failed to scale up on the 90 and 100 node instances, and while TSAT remained competitive with zChaff in terms of median time, it proved unable to consistently solve all instances (falling to 76% success on the 100 node problems). Of secondary interest is that TSAT does appear to have the advantage over MV-PAWS, especially on the larger instances.

5 Conclusions and Future Work

In conclusion, the experiments indicate that our new point-based support encoding is the most suitable scheme to encode IA problems into SAT instances. The results further show that running a state-of-the-art complete SAT solver on such representations can produce superior results to two of the fastest specialised IA solvers reported in the current literature. This suggests that a SAT approach to solving larger and more difficult IA problems may be preferable to developing specialised representations and algorithms.

Problem   Solver    Success (%)   Median time (secs)   Mean time (secs)

n = 80    zChaff            100               191.46             244.04
d = 9.5   MV-PAWS            90               416.41             925.41
s = 6.5   TSAT               87                27.01             594.66
          NBT+H             100                64.81             210.06
          NBT+I              50             3,309.31           3,024.78

n = 90    zChaff            100               393.77             576.67
d = 9.5   MV-PAWS            85               944.91           1,364.66
s = 6.5   TSAT               82               539.88             915.25
          NBT+H              60             1,792.86           2,238.46
          NBT+I              30             3,600.00           2,581.89

n = 100   zChaff            100             1,132.23           1,120.03
d = 9.5   MV-PAWS            64             2,059.51           2,229.51
s = 6.5   TSAT               76             1,307.80           1,525.73
          NBT+H              30             3,600.00           2,571.44
          NBT+I              20             3,600.00           3,270.14

Table 4. A comparison of SAT versus existing approaches.

In future work we anticipate that the performance of our SAT-based approach can be further improved by exploiting the special structure of IA problems in a manner analogous to the work on TSAT. The possibility also opens up of integrating our approach to temporal reasoning into other well known real world problem domains such as planning. Given the success of SAT solvers in many other real world domains, our work promises to expand the reach of temporal reasoning approaches for IA to encompass larger and more practical problems.

References

1. J. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26:832–843, November 1983.

2. Carlos Ansotegui, Jose Larrubia, and Felip Manya. Boosting chaff’s performance by incorporating CSP heuristics. In Proceedings of the Ninth International Conference on Principles and Practice of Constraint Programming (CP-03), pages 96–107, 2003.

3. P. Cheeseman, B. Kanefsky, and W. Taylor. Where the really hard problems are. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91), pages 331–337, 1991.

4. A. Frisch and T. Peugniez. Solving non-Boolean satisfiability problems with stochastic local search. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), 2001.

5. I. Gent. Arc consistency in SAT. In Proceedings of the Fifteenth European Conference on Artificial Intelligence (ECAI-02), pages 121–125, 2002.

6. K. Ghiathi and G. Ghassem-Sani. Using satisfiability in temporal planning. WSEAS Transactions on Computers, 3(4):963–969, 2004.

7. M. Golumbic and R. Shamir. Complexity and algorithms for reasoning about time: A graph-theoretic approach. Journal of the ACM, pages 1108–1133, 1993.

8. H. Hoos. SAT-encodings, search space structure, and local search performance. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), pages 296–302, 1999.

9. H. Kautz, D. McAllester, and B. Selman. Encoding plans in propositional logic. In Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning (KR-96), pages 374–384, 1996.

10. P. Ladkin and A. Reinefeld. Effective solution of qualitative interval constraint problems. Artificial Intelligence, 57(1):105–124, 1992.

11. Peter Ladkin and Alexander Reinefeld. Fast algebraic methods for interval constraint problems. Annals of Mathematics and Artificial Intelligence, 19:383–411, 1997.

12. C. Li and Anbulagan. Look-ahead versus look-back for satisfiability problems. In Proceedings of the Third International Conference on Principles and Practice of Constraint Programming (CP-97), pages 341–355, 1997.

13. A. K. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8:99–118, 1977.

14. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In Proceedings of the 38th Design Automation Conference (DAC-01), Las Vegas, 2001.

15. B. Nebel. Solving hard qualitative temporal reasoning problems: Evaluating the efficiency of using the ORD-Horn class. Constraints, 1(3):175–190, 1997.

16. B. Nebel and H.-J. Bürckert. Reasoning about temporal relations: A maximal tractable subclass of Allen’s Interval Algebra. Journal of the ACM, 42(1):43–66, 1995.

17. D. N. Pham, J. Thornton, A. Sattar, and A. Ishtaiwi. SAT-based versus CSP-based constraint weighting for satisfiability. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), to appear, 2005.

18. S. Prestwich. Local search on SAT-encoded colouring problems. Lecture Notes in Computer Science, 2003.

19. J. Thornton, M. Beaumont, A. Sattar, and M. Maher. A local search approach to modelling and solving Interval Algebra problems. Journal of Logic and Computation, 14(1):93–112, 2004.

20. John Thornton, Duc Nghia Pham, Stuart Bain, and Valnir Ferreira Jr. Additive versus multiplicative clause weighting for SAT. In Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), pages 191–196, 2004.

21. P. van Beek. Reasoning about Qualitative Temporal Information. Artificial Intelligence, 58:297–326, 1992.

22. P. van Beek and D. Manchak. The design and experimental analysis of algorithms for temporal reasoning. Journal of Artificial Intelligence Research, 4:1–18, 1996.

23. M. Vilain and H. Kautz. Constraint propagation algorithms for temporal reasoning. In Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI-86), pages 377–382, 1986.

24. Toby Walsh. SAT v CSP. In Proceedings of the Sixth International Conference on Principles and Practice of Constraint Programming (CP-00), pages 441–456, 2000.
