
The Asymmetric Travelling Salesman Problem

A Linear Programming Solution Approach

SOHOF DASTMARD

Degree Project in Computer Science, DD143X

Supervisor: Per Austrin

Examiner: Orjan Ekeberg

May 4, 2014

Abstract

The travelling salesman problem is a well-known optimization problem. The goal is to find the shortest tour that visits each city in a given list exactly once. Despite the simple problem statement, it belongs to the class of NP-complete problems. Its importance arises from a plethora of applications as well as a theoretical appeal. The asymmetric TSP is not as well researched as the symmetric TSP; in this paper we focus on a solution approach suitable for the asymmetric case. We demonstrate how a linear programming formulation can be used to solve the problem. We also show the limitations of this solution approach and provide suggestions for improving it.


Contents

1 Introduction
1.1 Problem Description
1.2 Why study this problem?
1.3 Method
1.4 Aim of the study
1.5 Limitations
1.5.1 Euclidean TSP and the Triangle inequality
1.5.2 Type of Algorithm
1.5.3 Time

2 Preliminaries
2.1 Algorithm Analysis
2.2 Computational Complexity

3 Linear Programming
3.1 Matrix Algebra as Linear Function Algebra
3.2 Linear Program as an Optimization Problem
3.3 Standard Form of Linear Programs
3.4 Finding the Optima
3.5 Efficient Algorithms for Linear Programs

4 The Asymmetric Travelling Salesman Problem
4.1 Graphical Representation
4.2 Linear Programming Formulation
4.3 The Simplex Method

5 Results

6 Conclusions


1 Introduction

1.1 Problem Description

Consider a salesman who wishes to visit a number of cities in order to sell a certain product and then return home. Given the distances between the cities, he would like to find a tour (an order in which to visit the cities) with the following properties: the tour should visit each city on his itinerary exactly once, and it should cause him to travel as little distance as possible, i.e. it should be the tour with minimal distance. This is known as the travelling salesman problem, abbreviated TSP. The problem statement is very simple but quite deceptive. In fact, the TSP is one of the most famous and intensely studied problems in computational mathematics.

The origin of the TSP is unclear, but it is known that it was mathematically formulated and studied in the 1800s by the Irish mathematician W. R. Hamilton. Amused by this problem, Hamilton even invented a game based on it, the Icosian Game. The game is a recreational puzzle based on finding a Hamiltonian cycle along the edges of a dodecahedron, such that every vertex is visited exactly once and the cycle ends at the vertex where it started. The general form of the TSP today is attributed to Karl Menger, who in the 1930s defined the problem and considered some obvious brute-force approaches.

When considering a solution technique it makes no difference whether we say that we want to find the minimal distance, the minimum cost or some other equivalent quantity; they are all ways of stating the same problem. In our formulation above we have implicitly assumed that the distances between two cities are symmetric. This means that for any two cities i and j, the distance from i to j is equal to the distance from j to i. In general there are many ways in which this symmetry could break down, e.g. we could consider one-way streets or the price of plane tickets being different depending on which way we travel. Thus the symmetric TSP can be viewed as a special case of the more general asymmetric travelling salesman problem, or ATSP. Despite the TSP being so intensely studied, less is known about finding optimal tours in the ATSP case, and this is the subject of our study.

So how should our salesman (or saleswoman) go about finding this optimal tour? Clearly, one way would be to list all possible tours, calculate the total distance involved for each tour and then select the tour with the minimal total distance. This simple approach, also known as brute-force or exhaustive search, is a general problem-solving technique: in a systematic way we enumerate all possible candidates for a solution and then check each one to see if it satisfies our problem requirement. Suppose our salesman has four cities on his list and he wishes to list all possible tours. In selecting the first city to visit he can choose among any of the four cities, thus he has four options on how to start. For the second city he has three options, for the third city two options and for the last city only one option remains. Therefore, the multiplication principle tells us that he has a total of 4! = 24 possible tours. Listing all 24 possible tours and then choosing the one with the minimal distance does not seem impossible. However, if we increase the number of cities to 12, our salesman would need to check a whopping 12! = 479,001,600 possible solutions. Evidently, this brute-force approach is infeasible for larger problem sizes!
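The exhaustive search described above can be sketched in a few lines of Python. This is only an illustration and not part of the study itself: the function name `brute_force_tour` and the four-city cost matrix are hypothetical. The sketch fixes city 0 as the starting point, so it inspects (n − 1)! orderings rather than n!, since rotating the starting city does not change the cost of a cycle.

```python
from itertools import permutations

def brute_force_tour(cost):
    """Return (best_cost, best_tour) by checking every tour that starts at city 0.

    cost[i][j] is the cost of travelling from city i to city j and may be
    asymmetric, i.e. cost[i][j] != cost[j][i] is allowed.
    """
    n = len(cost)
    best_cost, best_tour = float("inf"), None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        # Sum the cost of consecutive legs, including the return leg to city 0.
        total = sum(cost[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if total < best_cost:
            best_cost, best_tour = total, tour
    return best_cost, best_tour

# Hypothetical asymmetric cost matrix for four cities (the diagonal is unused).
example_cost = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
best_cost, best_tour = brute_force_tour(example_cost)
```

The loop body runs once per permutation, so the running time grows factorially with the number of cities, which is exactly why the approach breaks down beyond a dozen or so cities.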

1.2 Why study this problem?

It turns out that the TSP has numerous applications; to present only a few: it is used in the problem of drilling holes in circuit boards. The holes may be of different sizes, and therefore the drilling machine has to change the drill head for the different holes. This is a very time-consuming and costly process. Clearly a better solution is to first drill all holes of the same size, change the head, continue with all holes of the next diameter, and so on. Thus, this problem can be thought of as a series of TSPs, one for each hole diameter. Another application is the vehicle routing problem, where serving the customers in an urban area calls for the optimal solution of an ATSP. Furthermore we have applications in such diverse fields as overhauling gas turbine engines, X-ray crystallography, order picking in inventory systems (sequencing the movements of e.g. a crane to pick up a set of items), different kinds of scheduling problems and many more.

However, the main motivation behind continuing to study this problem is the success it has had as an engine of discovery for general-purpose techniques in applied mathematics. One of the greatest accomplishments of studying the TSP is the aid it has given to developing the field of mixed-integer programming. The widely known branch-and-bound method has its origins in work on the TSP. The TSP has also played an important role in the development of the most used heuristic-search algorithms. Furthermore, finding an efficient algorithm for the TSP would help in settling one of the major unresolved problems in computer science and mathematics, namely the P = NP question. The consequences of the answer in either direction would advance theory tremendously and possibly have huge practical consequences.

1.3 Method

We will solve our problem using linear programming and the simplex method. We will introduce the theoretical background in chapter 3 and formulate our problem as a linear programming problem in section 4.2. Then we will use the GNU Linear Programming Kit (GLPK), which implements the simplex method, to solve some generated instances of the ATSP. Thus we will be able to study the techniques in practice and, when appropriate, draw conclusions.

1.4 Aim of the study

As mentioned earlier, the brute-force approach is not feasible for large problem sizes. Our aim is to investigate the linear programming approach and the theoretical background of why this method is applicable to our problem. We hope to demonstrate why this approach is suitable and use it to solve some problem instances. We also aim to investigate how this solution technique scales with large problem sizes, and perhaps offer some insights on the flaws of the method as well as some suggestions on how one can improve it.

1.5 Limitations

1.5.1 Euclidean TSP and the Triangle inequality

In the Euclidean TSP, distances are the ordinary Euclidean distances, which is a special case of the metric TSP. We will assume that all our distances are Euclidean. Basically this means that the distance between two points p and q is the length of the line segment connecting them. This is an important simplification, since it is known that Euclidean/orthodromic distances have little resemblance to real distances between nodes or locations linked through transportation networks or roads, as shown by Daganzo [1]. The usage of Euclidean distances is motivated by the huge cost and difficulty of obtaining a real distance matrix. However, this simplification does not affect the complexity of the problem or our solution techniques.

The triangle inequality is an important concept in mathematics. It simply states that for any triangle the sum of the lengths of any two sides is always greater than or equal to the length of the remaining side. This is a very natural restriction of the TSP, and for our Euclidean TSP it applies to all our distances. At first glance this might seem straightforward and not worthy of being mentioned. However, we can easily imagine situations where the distance or cost of travel from city A to city C is lower if we choose to go through an intermediary city B first. For example, direct non-stop flights are normally more expensive than indirect flights, or the direct road from A to C might be very tortuous.

1.5.2 Type of Algorithm

Research on the TSP is a rich and vast field, and therefore there are different angles of approach to the problem. One such angle is the type of algorithm used, which decides what type of result one can expect. We can base an approach on exact algorithms, heuristics or approximation algorithms. In general there is also a distinction made between constructive algorithms, i.e. algorithms that actually construct a tour with some specified cost, and non-constructive algorithms, which return a cost for a tour but don't actually construct the tour. In this thesis we will restrict our attention to constructive algorithms which render mathematically provable optimal solutions, and we will not discuss approaches based on heuristics or approximations. Other approaches could lead to more efficient algorithms which can handle large problem sizes. However, they can only guarantee suboptimal solutions.

1.5.3 Time

Time is probably the main limitation of our investigation. This thesis gives 6 hp credits, effectively four weeks of study. Thus we will only have time for a cursory review of this fascinating problem. With more time it would be interesting to rigorously test and compare different techniques in software implementations.

2 Preliminaries

In this section we introduce a few concepts which are helpful in order to understand algorithm analysis and computational complexity. The section can be used as a reference or skipped entirely if the reader feels comfortable with the concepts presented.

2.1 Algorithm Analysis

An algorithm is a step-by-step instruction devised to solve a problem. When we analyze an algorithm, we are in essence trying to understand how its resource requirements, i.e. the amount of time and space it uses, scale with increasing input size. Therefore, we begin by describing the algorithm using a mathematical model and then we analyze the running time mathematically as a function of the input size. To this end we usually limit our focus to analyzing the worst-case running time. Initially this approach might seem draconian: what if an algorithm does very well on almost all inputs except a few rare instances where it performs unsatisfactorily? Even though this is a valid point and deserves our attention in specific cases, the worst-case approach has proven to be a rather successful one when examining efficiency in real-world problems. So what is an efficient algorithm? Well, if the input size is increased by a constant factor, a desirable property would of course be that the algorithm only slows down by some constant factor.

Suppose we have an algorithm such that on every input of size x, its running time is bounded by $f(x) = k e^{x}$ (where k > 0 is a constant). Now if we increase the input size from x to 2x we get $f(2x) = k e^{2x} = k \, (e^{x} \cdot e^{x})$. Clearly the bound on the running time increases multiplicatively by a factor of $e^{x}$, itself a function of the input size.

Now, suppose the running time of our algorithm, on every input of size x, is bounded by the polynomial expression $p(x) = k x^{d}$ (where k and d are constants > 0). An algorithm whose running time is bounded by a polynomial expression is said to have a polynomial running time. When we in this case increase the input size from x to 2x, we get $p(2x) = k (2x)^{d} = k \, 2^{d} x^{d}$. Thus the bound on the running time has increased by a factor of $2^{d}$, so we have a slowdown by a factor of $2^{d}$. However, $2^{d}$ is a constant factor and does not depend on the input size. In general an algorithm is considered to be efficient if it has polynomial running time.

From this brief discussion we see that analyzing algorithms is inherently based on the ability to express the idea that an algorithm's worst-case running time on inputs of size n grows at a rate at most proportional to some function f(n). Naturally we also need to express the growth rate of functions in ways where we can be insensitive to constant factors and low-order terms, e.g. in the example above the dominating factor of p(x) is clearly $x^{d}$. For this purpose we use the "Big O notation", which is a convenient way of describing the asymptotic behaviour of a function. So for our example we would say that p(x) is $O(x^{d})$.
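The difference between the two bounds can be checked numerically. The sketch below is illustrative only; the helper names and the choices k = 1 and d = 3 are assumptions made for the example. It computes the factor by which each bound grows when the input size doubles.

```python
import math

def exponential_bound(x, k=1.0):
    """The bound f(x) = k * e^x from the text."""
    return k * math.exp(x)

def polynomial_bound(x, k=1.0, d=3):
    """The bound p(x) = k * x^d from the text, with d = 3 as an example."""
    return k * x ** d

def slowdown(bound, x):
    """Factor by which the bound grows when the input size doubles."""
    return bound(2 * x) / bound(x)

sizes = (5, 10, 20)
poly_factors = [slowdown(polynomial_bound, x) for x in sizes]
exp_factors = [slowdown(exponential_bound, x) for x in sizes]
```

For the polynomial bound the factor is 2^d = 8 at every input size, whereas for the exponential bound the factor equals e^x and therefore keeps growing with the input.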

2.2 Computational Complexity

In the theory of computational complexity our goal is to classify computational problems according to their inherent difficulty and then relate them to each other. A problem is considered to be difficult if it requires a lot of resources to compute, whatever algorithm is used. When we have a set of problems of related complexity we say that they belong to the same complexity class. Usually in this context the problems are formulated as decision problems, i.e. they have yes or no answers.

The general class of problems for which there exist algorithms that can provide an answer in polynomial time is simply called P. Now, for some problems there are no known algorithms that can provide answers in polynomial time, but these problems have the peculiar characteristic that if we were provided with a possible solution we would in fact be able to verify the solution in polynomial time. The class of problems where the answer can be verified in polynomial time is called NP.
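To make the idea of polynomial-time verification concrete, consider the decision version of the TSP: "is there a tour of total cost at most some budget?". A proposed tour serves as the certificate. The sketch below is a hypothetical illustration (the function name `verify_tour`, the parameter `budget` and the three-city matrix are assumptions); it checks a certificate using only a sort and a single pass over the tour.

```python
def verify_tour(cost, tour, budget):
    """Check whether `tour` certifies a yes-answer to the decision question.

    The question is: does a tour of total cost at most `budget` exist?
    """
    n = len(cost)
    # The certificate must list every city exactly once.
    if sorted(tour) != list(range(n)):
        return False
    # Add up the legs of the tour, including the return to the first city.
    total = sum(cost[tour[k]][tour[(k + 1) % n]] for k in range(n))
    return total <= budget

# Hypothetical asymmetric cost matrix for three cities.
cost = [
    [0, 2, 9],
    [1, 0, 6],
    [15, 7, 0],
]
accepted = verify_tour(cost, [0, 2, 1], budget=20)   # tour cost 9 + 7 + 1 = 17
rejected = verify_tour(cost, [0, 1, 2], budget=20)   # tour cost 2 + 6 + 15 = 23
```

Checking a given tour is cheap; the difficulty lies in finding a tour that passes the check, or in showing that none exists.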

Now, clearly we have P ⊆ NP, since if we actually do have a polynomial time algorithm for a problem, that same algorithm can be used to construct an efficient certifier for the solution. But is P = NP? In fact, this is one of the major open questions in complexity theory and indeed in all of mathematics. It is one of the Clay Mathematics Institute's Millennium Problems and whoever solves it will be awarded $1,000,000. Settling this question will have deep and profound implications for such diverse fields as mathematics, cryptography, algorithm research, artificial intelligence, philosophy, economics etc.

When investigating this question we come across the concept of NP-completeness. This is the class of problems to which all other NP problems can be reduced in polynomial time and whose solutions can still be verified in polynomial time. Thus any NP problem can be transformed into an NP-complete problem in polynomial time. In a sense, NP-complete problems are the most difficult ones in NP, and thus it would suffice to find a polynomial time algorithm for any NP-complete problem to settle the P versus NP question. We also come across the concept of NP-hard, a class of problems which are at least as difficult as the hardest problems in NP. But despite the prefix NP, these problems do not actually have to be elements of NP.

Problems which we can solve in theory, but where computing a solution for large inputs takes so long that it is effectively useless for practical purposes, are known as intractable, i.e. not manageable, problems. Therefore, in complexity theory problems that lack polynomial time solutions are considered to be intractable.

It turns out that many natural and important decision problems are in fact NP-complete, and thus considered intractable, as shown by the works of Cook [2] and Karp [3]. We mentioned in the introduction that an efficient solution to our problem would answer an open and fundamental question in computer science and mathematics. The motivation behind this claim is the fact that the decision versions of the TSP and the ATSP are both NP-complete and considered intractable problems; a proof for the ATSP case can be found in Kleinberg [4]. In general it is not believed that P = NP.

3 Linear Programming

In this section we give a short introduction to the field of linear programming. Most of the material in this section can be found in any book on linear programming; we studied the work of T. Sottinen [5]. The aim of this section is to give a general picture of linear programs as well as to provide some theoretical background for the simplex method. Linear programming (LP) or linear optimization is a method used to carry out optimization in a mathematical model where the underlying relationships are linear. If the reader recalls, in linear algebra we solve simultaneous equations written Ax = b using matrix-vector notation. Linear programming can be thought of as a more complex version of this, with inequalities instead of equations, thus in the form Ax ≤ b.

3.1 Matrix Algebra as Linear Function Algebra

We will present our ideas using matrix algebra, so some understanding of why we can model linear functions with matrices is helpful. In general we will omit proofs and refer the reader to any introductory book on linear optimization. The definition of a linear function is:

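Definition 3.1.1. A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is linear if

$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y) \quad \text{for all } x, y \in \mathbb{R}^n \text{ and } \alpha, \beta \in \mathbb{R}.$$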

The following theorem shows the important connection between linear functions and matrices.

Theorem 3.1.1. Let $f : \mathbb{R}^n \to \mathbb{R}^m$. The function f is linear if and only if there is an (m × n) matrix A such that

$$f(x) = Ax.$$

The theorem has two parts: the first part says that if a function f is defined by f(x) = Ax then it is linear. The second part says that if a function is linear, then there is a matrix A such that f(x) = Ax. Now since a row vector $[c_1\ c_2\ \dots\ c_n]$ is a (1 × n) matrix, we have the following corollary:

Corollary 3.1.2. A function $f : \mathbb{R}^n \to \mathbb{R}$ is linear if and only if there is a vector $c \in \mathbb{R}^n$ such that

$$f(x) = c^{T}x.$$


3.2 Linear Program as an Optimization Problem

In general, optimization problems can be very diverse. In a linear optimization problem, usually referred to as a linear program, we are given a real-valued linear function z = f(x) (the objective function) and a set of variables $x_1, x_2, \dots, x_n$. We want to find an assignment to these variables which maximizes or minimizes f(x) subject to a collection of linear equality or inequality constraints. An assignment of values to the variables which satisfies the constraints is called a feasible solution, and the value z = f(x) for a given assignment is called the cost of the assignment. Every assignment can be thought of as a point in $\mathbb{R}^n$. If we require all our unknown variables to be integers, then the problem is called integer programming (IP), and if we restrict them to the set {0, 1} we call it binary integer programming (BIP).

3.3 Standard Form of Linear Programs

The simplest and most common form of representing a linear program is standard form. Furthermore, every linear program can be converted to this form. Normally we need to convert our LPs to canonical form before they can be processed by solution algorithms, such as the simplex algorithm. A linear program in standard form consists of three parts:

The objective function to be maximized:

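$$z = f(x_1, x_2, \dots, x_n) = c_1 x_1 + c_2 x_2 + \dots + c_n x_n$$

Problem constraints written in the following form:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &\le b_1 \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &\le b_2 \\
&\ \ \vdots \\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &\le b_m
\end{aligned}$$

Non-negative variables:

$$x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0$$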

Using linear algebra we can present this in shorter notation:

Maximize

$$z = c^{T}x$$

subject to

$$Ax \le b, \qquad x \ge 0.$$

The vector c is the column vector of coefficients of the objective function and x is the column vector of variables, sometimes called the decision variables. The matrix A is the matrix of coefficients of the left-hand sides of the inequalities and b is the m-dimensional vector of the right-hand sides of the inequalities. If any of the constraints have the form $a^{T}x \ge b$ we turn them into standard form by multiplying them by −1. If we have a variable x which is not required to be ≥ 0 we can introduce two new variables x′ and x′′ and perform the substitution x = x′ − x′′, requiring that x′ ≥ 0 and x′′ ≥ 0. After the transformation, any feasible solution of the new linear program corresponds to a feasible solution of the original one with the same cost. Some definitions of linear programs have equations as constraints. Given an equation $a^{T}x = b$, we can rewrite it as the two inequalities $a^{T}x \le b$ and $a^{T}x \ge b$. In addition, minimizing any objective f(x) can be seen as maximizing −f(x).
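As a worked illustration of these rules, consider a small hypothetical program (not taken from the thesis) that contains a minimization objective, a ≥ constraint, an equation and an unrestricted variable. Applying the transformations one at a time gives an equivalent program in standard form:

```latex
\begin{align*}
\text{Original: } & \min\ 2x_1 - x_2 \\
& \text{s.t. } x_1 + x_2 \ge 1,\quad x_1 - x_2 = 3,\quad x_1 \ge 0,\ x_2 \text{ unrestricted} \\[4pt]
\text{Standard form: } & \max\ -2x_1 + x_2' - x_2'' \\
& \text{s.t. } -x_1 - x_2' + x_2'' \le -1 \\
& \phantom{\text{s.t. }} x_1 - x_2' + x_2'' \le 3 \\
& \phantom{\text{s.t. }} -x_1 + x_2' - x_2'' \le -3 \\
& \phantom{\text{s.t. }} x_1,\ x_2',\ x_2'' \ge 0
\end{align*}
```

Here $x_2$ has been replaced by $x_2' - x_2''$, the ≥ constraint has been multiplied by −1, the equation has been split into two inequalities (the second one again multiplied by −1), and the objective has been negated to turn the minimization into a maximization.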


3.4 Finding the Optima

The point that optimizes the objective function is called the optimum. Inequalities of the form $a_1x_1 + a_2x_2 + \dots + a_nx_n \le b$ divide the space $\mathbb{R}^n$ into a region where they are satisfied and a region where they are not. An interesting result is that an optimum, when one exists, is found at one of the corners of the feasible region. We call the decisions corresponding to these corners Basic Feasible Solutions (BFS). Finding the best BFS is thus our goal.

Definition 3.4.1. The feasible region R of an LP

$$\max\ z = c^{T}x \quad \text{s.t.} \quad Ax \le b,\ x \ge 0$$

is the set of decisions x that satisfy $Ax \le b$ and $x \ge 0$:

$$R = \{x \in \mathbb{R}^n : Ax \le b,\ x \ge 0\}.$$

A convex set is a set of points such that, given any two points in the set, the line segment between them lies entirely within the set. Intuitively, it means the set is connected and that we can go from any point to any other in a straight line without leaving the set. It thus has no dents in its perimeter. See figure 1 for an illustration.

Figure 1: A convex and a non-convex set

Theorem 3.4.1. The feasible region of an LP is a convex set: if x and y belong to the feasible region, then $\alpha x + (1 - \alpha) y$ also belongs to the feasible region for every $\alpha \in [0, 1]$.

Theorem 3.4.1 basically says that if we have two feasible solutions and we draw a line segment between them, then every point on that segment is also a feasible solution.
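Although proofs are generally omitted here, the argument for this theorem is short enough to sketch. It uses only the linearity of the map x ↦ Ax and the fact that both α and 1 − α are non-negative for α ∈ [0, 1]:

```latex
\begin{align*}
A\bigl(\alpha x + (1-\alpha) y\bigr)
  &= \alpha A x + (1-\alpha) A y \\
  &\le \alpha b + (1-\alpha) b = b, \\[4pt]
\alpha x + (1-\alpha) y &\ge 0
  \quad \text{since } x \ge 0,\ y \ge 0,\ \alpha \ge 0,\ 1-\alpha \ge 0 .
\end{align*}
```

Both defining conditions of the feasible region therefore hold for every point on the segment between x and y.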

When examining feasible solutions we say that constraint i is active at the decision x if $a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n = b_i$, which basically means that the particular resource is fully used by decision x.

Definition 3.4.2. A decision, or feasible solution, of an LP is

• Inner point if we don’t have any active constraints at that solution.

• Boundary Point if we have at least one active constraint at that solution.

• Corner Point if we have at least n linearly independent active constraints at that solution.

In general the boundary between two regions, given by $a_1x_1 + a_2x_2 + \dots + a_nx_n = K$, is a hyperplane and the two regions are called half-spaces. The intersection of these half-spaces forms the feasible region and is called a polytope. A polytope is a geometric object with flat sides, e.g. a polygon is a polytope in two dimensions and a polyhedron is a polytope in three dimensions. To provide a more concrete example, consider the following inequalities:


2x1 + 3x2 ≤ 2 (3.4.1)

−3x1 + 2x2 ≤ 3 (3.4.2)

2x1 + x2 ≤ 4 (3.4.3)

In figure 2 below, we illustrate these inequalities and see how each inequality divides the plane into two regions, one where the inequality is satisfied and one where it is not. The intersection of these regions, a convex set, forms a polygon as seen.

Figure 2: Plot of inequalities (3.4.1), (3.4.2) and (3.4.3). Source: T. Sottinen [5]

Now we need to search the feasible region for solutions, but there are infinitely many points in the interior of the feasible region. Luckily, as the following theorems demonstrate, we don't need to check the whole region.

Theorem 3.4.2. Let $x^{*}$ be an optimal solution to an LP. Then $x^{*}$ is a boundary point of the feasible region.

Actually we can do much better than theorem 3.4.2: the following theorem states that we don't even need to examine the entire boundary. It suffices to check only the corners.

Theorem 3.4.3. An optimal solution $x^{*}$, if it exists, can be found at one of the corner points of the feasible region. Thus, an optimal solution is a BFS.

Thus the theory of linear programming tells us that for linear systems like this, the optimum, i.e. the point where the objective function attains its smallest (or largest) value, will always be at one of the corners of the feasible region. The coordinates of the corners of the feasible region are intersections of the hyperplanes. In our example we have only five corner points to check! This observation of course has a huge impact on the search for the optimum point and underlies the simplex method, which we will discuss in section 4.3. An LP algorithm such as the simplex method finds this point on the polytope if such a point exists. It can fail to exist if the intersection of the half-spaces is empty, in which case the problem is infeasible. Another possibility is that the feasible region is unbounded in the direction of optimization, in which case the problem is unbounded.
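The corner-point idea can be tried out directly in two dimensions. The sketch below is a naive illustration and not the simplex method: it intersects every pair of boundary lines, keeps the intersection points that satisfy all constraints, and evaluates the objective at each of them. The small LP used here is hypothetical (it is not the example plotted in figure 2), and the approach assumes a bounded, non-empty feasible region.

```python
from fractions import Fraction
from itertools import combinations

def corner_points(constraints):
    """Corner points of {(x, y) : a*x + b*y <= c for every (a, b, c)}."""
    corners = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines do not meet in a single point
        # Cramer's rule for the intersection of the two boundary lines.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c for a, b, c in constraints):
            corners.add((x, y))
    return corners

# Hypothetical LP: maximize 3x + 2y subject to
#   x + y <= 4,  x <= 3,  y <= 2,  x >= 0,  y >= 0.
# Non-negativity is written as -x <= 0 and -y <= 0.
constraints = [(1, 1, 4), (1, 0, 3), (0, 1, 2), (-1, 0, 0), (0, -1, 0)]
corners = corner_points(constraints)
best = max(corners, key=lambda p: 3 * p[0] + 2 * p[1])
```

Exact rational arithmetic (`Fraction`) avoids rounding problems in the feasibility test. Enumerating all pairs of constraints does not scale to higher dimensions, which is one reason a guided search such as the simplex method is used instead.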


3.5 Efficient Algorithms for Linear Programs

For a long time LP problems were not known to have efficient algorithms, but in 1948 Dantzig [6] presented a breakthrough with the introduction of the simplex method for solving LP problems. The simplex method is remarkably efficient in practice and constituted a major improvement over earlier methods. Despite this success, in 1972 Klee and Minty [7] showed that the simplex method has exponential worst-case complexity. However, following the works of Leonid Khachiyan in 1979 (ellipsoid algorithm) and Narendra Karmarkar in 1984 (interior point algorithm), polynomial time algorithms were finally presented. Now, of great importance is that even though we have efficient algorithms for LP problems, it turns out that IP problems are in many practical situations NP-hard. In 1972 Richard Karp [3] proved that the decision version of BIP is NP-complete, and as it turns out the TSP can be formulated as a BIP problem.

4 The Asymmetric Travelling Salesman Problem

4.1 Graphical Representation

We can visualize our problem using notation from graph theory. In the asymmetric case, we are given a complete directed graph G = (V, E). The finite set of vertices (sometimes referred to as nodes) $V = \{v_1, v_2, \dots, v_n\}$ corresponds to the cities, and the finite set of edges $E = \{e_{ij} = (v_i, v_j) : v_i, v_j \in V,\ i \neq j\}$ corresponds to the roads connecting the cities. Each edge $e_{ij}$ is thus represented by an ordered tuple $(v_i, v_j)$, meaning there is an edge from $v_i$ to $v_j$, and in general $e_{ij} \neq e_{ji}$. We can now define a cost matrix $C = (c_{ij})$ on E which satisfies the triangle inequality. Each element $c_{ij}$ represents the cost of traversing the edge $e_{ij}$. This cost could represent the distance or some other attribute of interest. A directed graph is simply one where the edges have a direction, and you can only traverse an edge in the direction given. A complete graph is one where each pair of distinct vertices is connected by an edge. Figure 3 illustrates the different kinds of graphs and how costs can be depicted on a graph. If the graph is not complete, our cost matrix will have undefined costs for the missing edges, and we also need to deal with elements of type $c_{ii}$. One solution to both these problems is to just set an arbitrarily large cost in these cases. We would thus set $c_{ij} = \infty$ for all $(i, j) \notin E$ and $c_{ii} = \infty$ for each $i \in V$ in the cost matrix C. This effectively excludes loops and missing edges from a tour. Now we can state our goal as finding a Hamiltonian cycle, i.e. a cycle that visits each vertex precisely once, with the least cost. A graph-theoretic formulation of our problem would lead to methods involving finding minimum spanning trees for constructing tours and edge exchange algorithms to improve the quality of an existing tour. However, as stated in the introduction, we will take a linear programming approach as described in section 4.2.
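The convention of assigning an infinite cost to loops and missing edges is easy to express in code. The following sketch is illustrative only: the function name `build_cost_matrix`, the dictionary-based edge format and the three-vertex graph are assumptions, and vertices are numbered from 0 rather than from 1.

```python
import math

def build_cost_matrix(n, edges):
    """Build an n x n cost matrix from directed edges given as {(i, j): cost}.

    Missing edges and the diagonal entries c_ii get infinite cost, which
    excludes them from any tour of finite cost.
    """
    cost = [[math.inf] * n for _ in range(n)]
    for (i, j), c in edges.items():
        if i != j:  # ignore loops, so that c_ii stays infinite
            cost[i][j] = c
    return cost

# Hypothetical directed graph on three vertices; the edge 2 -> 1 is missing.
edges = {(0, 1): 4, (1, 0): 7, (1, 2): 3, (2, 0): 5, (0, 2): 6}
cost = build_cost_matrix(3, edges)
```

In practice an LP solver may prefer a large finite constant over a true infinity; which of the two is appropriate depends on the solver and is not settled by this sketch.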

Figure 3: Three different kinds of graphs: (a) an undirected graph, (b) a directed graph with costs, (c) a complete undirected graph

4.2 Linear Programming Formulation

In this section we formulate our problem as an integer linear program. We introduce the binary decisionvariable xij associated with each edge eij .

$$x_{ij} = \begin{cases} 1 & \text{if edge } e_{ij} \text{ is in the optimal tour} \\ 0 & \text{otherwise} \end{cases}$$


Any tour or solution can now be represented by a vector x. The components of this vector consist of our decision variables. The associated costs for each edge are stored as components in the vector c. Our goal is then to minimize $c^{T}x$, or stated differently, to minimize (4.2.1):

$$\sum_{i \in V} \sum_{j \in V} c_{ij} x_{ij} \tag{4.2.1}$$

Subject to:

$$\sum_{j \in V} x_{ij} = 1 \quad (i \in V,\ i \neq j) \tag{4.2.2}$$

$$\sum_{i \in V} x_{ij} = 1 \quad (j \in V,\ j \neq i) \tag{4.2.3}$$

$$\sum_{i \in S} \sum_{j \in S} x_{ij} \le |S| - 1 \quad (S \subset V,\ |S| \ge 2) \tag{4.2.4}$$

$$x_{ij} \in \{0, 1\} \quad (i, j \in V) \tag{4.2.5}$$

Constraints 4.2.2 and 4.2.3 are known as degree constraints and are easy to understand: 4.2.2 says that for every node i ∈ V, exactly one of the edges leaving that node can be in the tour, and thus the sum over the binary decision variables of the edges leaving that node must equal 1. The same argument applies to 4.2.3, only now we are talking about all edges going into node j ∈ V. One important realization is that a solution satisfying only the degree constraints might contain subtours, i.e. cycles on subsets S ⊂ V, see figure 4. Therefore we introduce the subtour elimination constraints 4.2.4, which are linear inequality constraints that eliminate these types of cycles from our solution set. Finally we have the integer constraint 4.2.5.
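The constraints above can be checked mechanically for a given 0/1 solution. The sketch below is a hypothetical illustration (all function names are assumptions, and the six-node solution is made up in the spirit of figure 4(b), not copied from it). It verifies the degree constraints, splits the solution into its cycles, and reports every cycle that violates a subtour elimination constraint.

```python
def degree_constraints_hold(x):
    """Check (4.2.2) and (4.2.3): one selected edge leaves and enters each node."""
    n = len(x)
    out_ok = all(sum(x[i][j] for j in range(n) if j != i) == 1 for i in range(n))
    in_ok = all(sum(x[i][j] for i in range(n) if i != j) == 1 for j in range(n))
    return out_ok and in_ok

def find_subtours(x):
    """Split a 0/1 solution that satisfies the degree constraints into cycles."""
    n = len(x)
    successor = {i: next(j for j in range(n) if j != i and x[i][j] == 1)
                 for i in range(n)}
    unvisited, cycles = set(range(n)), []
    while unvisited:
        node, cycle = min(unvisited), []
        while node in unvisited:
            unvisited.remove(node)
            cycle.append(node)
            node = successor[node]
        cycles.append(cycle)
    return cycles

def violated_subtour_sets(x):
    """Cycles on a proper subset S contain |S| edges, which violates (4.2.4)."""
    return [c for c in find_subtours(x) if len(c) < len(x)]

# Hypothetical solution on six nodes with subtours 0->1->2->0 and 3->4->5->3.
x = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    x[i][j] = 1
subtours = find_subtours(x)
```

For this example the degree constraints hold, yet the solution is not a tour: each of the two cycles uses three edges inside a set of three nodes, while (4.2.4) allows at most two.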

The problem as currently formulated turns out to be very difficult to solve and cannot be approached with the linear programming method directly. Also, the number of subtour elimination constraints grows exponentially with the number of nodes, and therefore they cannot all be considered explicitly. Now trivially we have

0 ≤ xij ≤ 1 for all edges (4.2.6)

So if we replace constraint 4.2.5 with 4.2.6 we obtain a relaxation of our problem, meaning every solution to our original problem is also a solution under 4.2.6. The linear relaxation of our problem can now be approached using linear programming techniques, such as the simplex method. An optimal solution $x^{*}$ of the relaxation need not constitute a valid tour, since the LP solution can be fractional. However, it follows that no tour can have cost less than $c^{T}x^{*}$. The significance of a lower bound is that it can be used to certify that a tour we have found is good, or possibly the best, compared to other tours which we have not examined. In [8] Dantzig, Fulkerson and Johnson (DFJ) realized that solving the relaxation of this problem can help in solving the original problem in a more substantial way than just providing a lower bound. There is an astronomical number of subtour constraints, so simply adding them all to the LP relaxation would produce a list far too big for any LP solver to handle directly. DFJ proposed to work in an iterative fashion, generating and adding the necessary constraints as they come up. When an optimal solution $x^{*}$ found by the simplex algorithm is not a valid tour, we know that there must be a linear inequality which is satisfied by all valid solutions but violated by $x^{*}$. When we reach such a point we include this inequality in our LP relaxation and start the simplex process over again. This procedure is now known as the cutting plane method, and the added inequality is known as a cut. Having found a cut we add it to our system, which results in a tighter relaxation, and we iterate this process until a valid optimal solution is found. In the 1950s Ralph Gomory introduced an algorithm for generating these cuts automatically in an efficient way from a simplex tableau. The elegance of this method is that it generates the inequalities only when and if they are needed.
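The iterative idea of adding subtour constraints only when they are violated can be sketched without an LP solver. In the illustration below, a brute-force search over all assignments stands in for the simplex method, so the relaxation being solved is the integer assignment problem rather than the LP relaxation, and fractional solutions never occur. This is a deliberate simplification to keep the example self-contained; it is not the procedure used with GLPK in this study, and all names and the cost matrix are hypothetical. The sketch assumes at least two cities and finite off-diagonal costs.

```python
from itertools import permutations

def cycles_of(successor):
    """Decompose a successor tuple (i is followed by successor[i]) into cycles."""
    unvisited, cycles = set(range(len(successor))), []
    while unvisited:
        node, cycle = min(unvisited), []
        while node in unvisited:
            unvisited.remove(node)
            cycle.append(node)
            node = successor[node]
        cycles.append(cycle)
    return cycles

def edges_inside(successor, subset):
    """Left-hand side of (4.2.4): chosen edges with both endpoints in subset."""
    return sum(1 for i in subset if successor[i] in subset)

def solve_with_lazy_cuts(cost):
    """Return (tour_cost, tour, number_of_cuts_added)."""
    n = len(cost)
    cuts = []  # subsets S whose constraint (4.2.4) has been added so far
    while True:
        # Stand-in for the solver: cheapest assignment that satisfies the
        # degree constraints and every cut added so far.
        best_cost, best = float("inf"), None
        for succ in permutations(range(n)):
            if any(succ[i] == i for i in range(n)):
                continue  # loops are not allowed
            if any(edges_inside(succ, s) > len(s) - 1 for s in cuts):
                continue  # violates a previously added cut
            total = sum(cost[i][succ[i]] for i in range(n))
            if total < best_cost:
                best_cost, best = total, succ
        cycles = cycles_of(best)
        if len(cycles) == 1:
            return best_cost, cycles[0], len(cuts)
        # A shortest subtour yields a violated inequality; add it as a cut.
        cuts.append(frozenset(min(cycles, key=len)))

# Hypothetical asymmetric costs; the cheapest assignment has two subtours.
cost = [
    [0, 1, 8, 9],
    [2, 0, 7, 6],
    [9, 8, 0, 1],
    [5, 7, 2, 0],
]
result = solve_with_lazy_cuts(cost)
```

On this instance the first solve returns the two subtours 0 → 1 → 0 and 2 → 3 → 2 with total cost 6, which is a lower bound on every tour. Adding the single cut for S = {0, 1} is enough: the second solve returns the tour 0 → 1 → 2 → 3 → 0 with cost 14, and since it is optimal for a relaxation and also a valid tour, it is an optimal tour.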

[Figure omitted: two panels drawn on six nodes numbered 1 to 6]

(a) An ATSP graph

(b) Solution with two subtours

Figure 4: The reason we need subtour elimination constraints

4.3 The Simplex Method

In 1948, G. Dantzig [6] introduced the simplex method to solve linear programs. Solving a problem manually with the simplex method is a long and tedious process, and today optimization software packages such as CPLEX, MINTO and CLP do this work for us. Many of these LP or MIP (mixed integer programming) solvers use the simplex method. The reason for studying the simplex method is that it enhances our understanding of linear programming and of how the solutions are derived. We will not cover the simplex method in depth in this paper; we restrict ourselves to presenting the general ideas. We begin by pointing out three important properties of linear programs:

1. As a consequence of the fact that all of the constraints and the objective function are linear, the optimum is always attained at a feasible corner-point. There can be multiple optima, but this happens only when at least two of the optima are adjacent corner-points. This property means we can restrict our attention to the corner-points, rather than the infinitely many points in the feasible region.

2. If a certain corner-point has an objective function value at least as good as that of all adjacent corner-points, then it is optimal. Also a consequence of modelling with lines, this property means that we can easily recognize when we have an optimal solution, thereby eliminating the need to continue looking at all other feasible corner-points.

3. There are only a finite number of corner-point feasible solutions. So any method that concentrates on examining only corner-point feasible solutions, as the simplex method does, will eventually terminate.
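The three properties above can be checked by hand on a very small problem. The Python sketch below does so for a hypothetical two-variable LP (maximise 3x + 5y subject to x ≤ 4, 2y ≤ 12, 3x + 2y ≤ 18 and x, y ≥ 0), which is not taken from this study. It enumerates every intersection of two constraint lines, keeps the feasible ones as corner-points and picks the best. The function names are illustrative assumptions, and brute-force enumeration like this is only practical for tiny problems.

```python
from fractions import Fraction
from itertools import combinations


def corner_points(constraints):
    """Feasible corner-points of {(x, y) : a1*x + a2*y <= b for each (a1, a2, b)}."""
    corners = set()
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel lines never intersect in a single point
        # Cramer's rule for the 2x2 system formed by the two constraint lines.
        x = Fraction(b * c2 - a2 * d, det)
        y = Fraction(a1 * d - b * c1, det)
        if all(p * x + q * y <= r for p, q, r in constraints):
            corners.add((x, y))
    return corners


def best_corner(constraints, objective):
    """Return (corner, value) maximising objective[0]*x + objective[1]*y."""
    c1, c2 = objective
    corner = max(corner_points(constraints),
                 key=lambda p: c1 * p[0] + c2 * p[1])
    return corner, c1 * corner[0] + c2 * corner[1]


# x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0 (written as -x <= 0, -y <= 0)
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
corner, value = best_corner(constraints, (3, 5))
print(corner, value)
```

Only five of the ten pairwise intersections are feasible, and the maximum is attained at one of those five corner-points.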

To provide a general outline, the simplex method can be said to have two general phases:

1. Start-up: We start by finding any corner-point feasible solution. One reason we want our linear program in standard form is that the origin (0, 0, . . . , 0) is then always a corner-point feasible solution. In any given problem there are of course smarter choices for the starting point; in the TSP, for example, one can start from a tour which one thinks is close to the optimal tour.

2. Iterate: We now repeatedly move to a better adjacent corner-point feasible solution, until no such better point can be found. The corner-point at which this search fails is the optimum point.
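The two phases can be sketched in a few lines of Python. The code below is a minimal textbook-style tableau implementation written for illustration only. It assumes a maximisation problem of the form A x ≤ b, x ≥ 0 with non-negative right-hand sides, so that the origin is a valid starting corner, and it uses Bland's rule to choose pivots. It is not the implementation inside GLPK or any other solver, and the function name `simplex_max` and the sample problem are hypothetical.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximise c.x subject to A x <= b, x >= 0, assuming every b[i] >= 0."""
    m, n = len(A), len(c)
    # Start-up: tableau rows [A | I | b]; the slack columns form the first
    # basis, which corresponds to the corner-point at the origin.
    rows = [[Fraction(v) for v in A[i]]
            + [Fraction(int(i == k)) for k in range(m)]
            + [Fraction(b[i])] for i in range(m)]
    # Objective row holds -c; a negative entry marks an improving direction.
    obj = [Fraction(-v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]
    while True:
        # Iterate: pick the lowest-index improving column (Bland's rule).
        enter = next((j for j in range(n + m) if obj[j] < 0), None)
        if enter is None:
            break  # no adjacent corner-point is better, so we are optimal
        # Ratio test: how far can we move before a constraint becomes active?
        candidates = [(rows[i][-1] / rows[i][enter], basis[i], i)
                      for i in range(m) if rows[i][enter] > 0]
        if not candidates:
            raise ValueError("LP is unbounded")
        leave = min(candidates)[2]
        # Pivot: Gaussian elimination on the entering column.
        pivot = rows[leave][enter]
        rows[leave] = [v / pivot for v in rows[leave]]
        for i in range(m):
            if i != leave and rows[i][enter] != 0:
                f = rows[i][enter]
                rows[i] = [v - f * p for v, p in zip(rows[i], rows[leave])]
        f = obj[enter]
        obj = [v - f * p for v, p in zip(obj, rows[leave])]
        basis[leave] = enter
    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = rows[i][-1]
    return x, obj[-1]


# Hypothetical example: maximise 3x + 5y with x <= 4, 2y <= 12, 3x + 2y <= 18.
x, value = simplex_max([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18])
print(x, value)
```

On this example the method moves from the origin through the corners (4, 0) and (4, 3) before stopping at (2, 6), where no adjacent corner improves the objective.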

The answer to finding our corner-points algebraically lies in converting the inequalities from the standard form LP into equations, and subsequently solving for the intersection of a subset of the equations. We have to solve for a subset since not all of the equations derived from the original inequalities can hold simultaneously. To find the intersection of linear equations, the well-known Gaussian elimination is used. We therefore also need some way of keeping track of which of the equations are currently selected, or active. These difficulties are solved by introducing so-called slack, surplus and artificial variables, which take positive values only when the constraint they appear in is not active.
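As a small worked illustration of a slack variable, consider the single hypothetical constraint 3x_1 + 2x_2 ≤ 18, which does not come from the ATSP model. The symbol s for the slack variable is introduced here only for this example. Adding s turns the inequality into an equation, and the value of s shows whether the constraint is active at a given point:

```latex
\begin{align*}
3x_1 + 2x_2 \le 18 \quad &\Longrightarrow \quad 3x_1 + 2x_2 + s = 18, \qquad s \ge 0,\\
(x_1, x_2) = (2, 6): \quad & s = 18 - 3\cdot 2 - 2\cdot 6 = 0 \quad \text{(constraint active)},\\
(x_1, x_2) = (4, 0): \quad & s = 18 - 3\cdot 4 - 2\cdot 0 = 6 \quad \text{(constraint not active)}.
\end{align*}
```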

5 Results

We decided to test our approach on three sets of problems. First we tested our method against some sample problems from TSPLIB, a library of sample instances for the TSP (and related problems) from different sources and of various types. Then we generated pseudo-random cost matrices with 10 up to 100 nodes, to see how our method scales with the size of the input. These matrices had edge weights ranging from 1 to 100. We also tested the method on some matrices with the same number of nodes but with edge weights ranging from 1 to 30. All tests were done on an Acer laptop with 4 GB of SDRAM, using a Pentium(R) dual-core processor running at 2 GHz.
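The random test instances described above could be produced along the following lines. This Python sketch is an assumption about how such matrices might be generated, not the generator used in this study: the function name `random_cost_matrix`, the use of a fixed seed and the choice of 0 on the diagonal are all illustrative.

```python
import random


def random_cost_matrix(n, low=1, high=100, seed=None):
    """Return an n x n asymmetric cost matrix with integer weights in [low, high].

    The diagonal is set to 0 as a placeholder, since a tour never uses an
    arc from a city to itself (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    return [[0 if i == j else rng.randint(low, high) for j in range(n)]
            for i in range(n)]


# One instance per problem size, for both weight ranges mentioned in the text.
wide = {n: random_cost_matrix(n, 1, 100, seed=n) for n in range(10, 101, 10)}
narrow = {n: random_cost_matrix(n, 1, 30, seed=n) for n in range(10, 101, 10)}
print(len(wide[10]), len(narrow[100]))
```

Because each entry is drawn independently, the cost from i to j generally differs from the cost from j to i, which is what makes the instances asymmetric.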

To solve our linear programming problem we used the GNU Linear Programming Kit (GLPK). The GLPK is an optimization software package, written in ANSI C and organized in the form of a callable library. The GLPK is used for solving large-scale linear programming and mixed integer programming problems. It is part of the GNU project and released under the GNU General Public License. The GLPK is not as powerful as some commercial optimization software packages, such as the IBM CPLEX Optimization Studio. The GLPK can be configured to use different algorithms when looking for a solution, but we processed our data using the simplex method for solving the LP relaxation, together with Gomory's cuts to enforce integer solutions.

We chose to model our problem using the GNU MathProg language, since it has a closer connection to the mathematical formulation. Mainly, the subtour elimination constraints were easier to express using this modelling language. In order to limit the margin of error, each problem was processed four times with the GLPK solver, after which the arithmetic mean was calculated. After obtaining our results we approximated a curve of best fit using the cubic splines method in Matlab. In figure 5 the results are shown for the randomly generated matrices with edge weights 1-100. Clearly we can solve problems of up to 80 nodes in under two minutes. However, as we can see from the fitted curve, the time it takes appears to grow exponentially and our method does not scale well. Just adding five more nodes, from 95 to 100, doubles the time it takes to calculate a solution.
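The measurement procedure (four runs per problem, then the arithmetic mean) can be sketched as a small timing harness. The Python code below is only an illustration of that procedure: `mean_runtime` is a hypothetical helper, and `solve` stands for any callable that runs the solver on one instance, which is an assumption, since the study invoked GLPK directly.

```python
import time
from statistics import mean


def mean_runtime(solve, instance, repeats=4):
    """Run solve(instance) `repeats` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        solve(instance)
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Stand-in workload: sorting a list takes the place of a solver call here.
print(mean_runtime(sorted, list(range(100000, 0, -1))))
```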

In figure 6 the results are shown for the randomly generated matrices with edge weights 1-30. Here we can see an interesting development. As before, the time it takes to calculate a solution seems to grow exponentially with the input size, but if we compare our results with figure 5 we notice that we get consistently different results for the same number of nodes. It is apparent that in this problem set it takes longer to solve problems with the same number of nodes. Calculating a solution for the 70 node problem took approximately the same amount of time as calculating the 100 node problem in the previous problem set.

[Plot: "Randomly generated matrices 10 − 100 nodes. Edge weights 0 − 100." x-axis: Number of nodes; y-axis: Time in seconds]

Figure 5: Test results of randomly generated matrices with edge cost 1-100

[Plot: "Randomly generated matrices 10 − 70 nodes. Edge weights 0 − 30." x-axis: Number of nodes; y-axis: Time in seconds]

Figure 6: Test results of randomly generated matrices with edge weights 1-30

In figure 7 we show the results plotted in the same figure. Here we also include the test results on problem instances from the TSPLIB. These problems have names as depicted and can be easily downloaded from the TSPLIB website. With our method we managed to solve two instances, namely ftv33 and ftv44. But when we tried br17, ftv55 and ftv70, our method took more than 10 minutes and we decided to halt the computation.

[Plot: "Comparison of different asymmetric problems." x-axis: Number of nodes; y-axis: Time in seconds. Annotations: TSPLIB br17, +10 minutes; TSPLIB ftv55 and ftv70, +10 minutes; TSPLIB ftv33 and ftv44, solved in < 130 sec]

Figure 7: All test results presented in one figure

6 Conclusions

We started our study with the goal of studying the ATSP using the linear programming approach. Clearly the linear programming approach does solve the ATSP. However, as we have seen from the data gathered and the discussion of efficient algorithms, our method is not an efficient one. One very interesting finding was that we got different results depending on how we generated the edge weights of the matrix. It seems that the closer the weights are to one another, the harder the problem becomes. Thus there is some inherent complexity in the ATSP besides the number of nodes one needs to process. Our method performed well on the sample problem ftv44 with 44 nodes but could not solve br17, a 17 node instance, within reasonable time. This is a clear indication that there is some inherent complexity in addition to increasing problem size.

In all our tests, calculating the LP relaxation took just a few seconds, demonstrating the power behind this method; the Achilles' heel, however, was trying to enforce the integer constraints. Even though we proceed by generating cuts as they are needed, the cuts used by the GLPK solver are not as efficient as one would like. The generated cuts added to the LP relaxation to enforce an integer solution can become ineffective, since many rounds of cuts may be needed to make progress towards the solution. This is a fact acknowledged by most experts and even Gomory himself. The Gomory cuts have the exceptional characteristic that they can be used for any integer problem, and they are therefore not designed for solving this problem in particular. We should therefore be able to improve the results if we designed the cuts particularly for the ATSP. There are actually a number of other cuts one can use. One example is a broad class of inequalities called comb inequalities.


During the computation it was evident that after a while the improvements in the objective function due to adding cuts got smaller and smaller. In this situation we could branch the problem into two sub-problems to be minimized separately. The branching would be done iteratively, leading to a binary tree of sub-problems. Each of these sub-problems would either be solved without further branching or found to be irrelevant. A sub-problem is irrelevant if solving it cannot provide a better result than the one we already have. This approach, called branch-and-cut, is used in the most advanced symmetric TSP solvers, such as Concorde. As an example, we tested Concorde on a 1000 node problem and obtained an optimal solution within four seconds. It is worth mentioning that there are algorithms which transform an asymmetric problem into a symmetric one, doubling the size of the problem. In conclusion, the ATSP/TSP is a fascinating problem, and at this moment there are no efficient algorithms which can calculate optimal solutions to arbitrarily large problem sizes. Thus this problem will continue to be a hot research field for the foreseeable future.
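The transformation from an asymmetric to a symmetric problem mentioned above can be illustrated with a commonly described ghost-node construction. The text does not specify which transformation is meant, so the Python sketch below is an assumption. Each city i gets a copy n + i, the two are joined by an edge of weight −M, and the edge between copy n + i and city j carries the cost of the arc from i to j. The names `atsp_to_stsp`, `brute_force_optimum` and the value of `big_m` are hypothetical, and the brute-force check is only feasible for tiny instances.

```python
from itertools import permutations


def atsp_to_stsp(cost, big_m):
    """Build a symmetric 2n x 2n distance matrix from an asymmetric n x n one.

    big_m should exceed the sum of all arc costs so that an optimal symmetric
    tour always visits each city and its ghost copy consecutively.
    """
    n = len(cost)
    inf = float("inf")
    sym = [[inf] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        # City i and its ghost n + i are tied together by a very cheap edge.
        sym[i][n + i] = sym[n + i][i] = -big_m
        for j in range(n):
            if i != j:
                # The edge between ghost(i) and j stands for the arc i -> j.
                sym[n + i][j] = sym[j][n + i] = cost[i][j]
    return sym


def brute_force_optimum(dist):
    """Cost of the cheapest tour, found by trying every permutation."""
    n = len(dist)
    best = float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        total = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
        best = min(best, total)
    return best


# Hypothetical three-city asymmetric instance.
cost = [[0, 1, 2],
        [6, 0, 3],
        [5, 4, 0]]
big_m = 100
sym = atsp_to_stsp(cost, big_m)
# Adding n * big_m back recovers the asymmetric optimum.
print(brute_force_optimum(cost), brute_force_optimum(sym) + len(cost) * big_m)
```

The symmetric instance has twice as many nodes, which is the doubling in problem size referred to in the text.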


References

[1] Daganzo, C. (1984). The length of tours in zones of different shapes. Transportation Research Part B, 18(2):135–145.

[2] S. A. Cook, The complexity of theorem-proving procedures. Proc. 3rd Annual ACM Symp. Theory of Computing, 1971.

[3] R. M. Karp, Reducibility among combinatorial problems. In R. E. Miller and J. W. Thatcher (eds.) Complexity of Computer Computations, Plenum Press 1972.

[4] J. Kleinberg and Eva Tardos, Algorithm Design. Pearson Addison-Wesley, 2006.

[5] T. Sottinen: www.uwasa.fi/ tsottine/orms1020. April 2014.

[6] G.B. Dantzig 1948. Programming in a linear structure. US Air Force Comptroller, USAF, Washington DC, USA.

[7] Klee, Victor; Minty, George J. (1972). "How good is the simplex algorithm?". In Shisha, Oved. Inequalities III (Proceedings of the Third Symposium on Inequalities held at the University of California, Los Angeles, Calif., September 1–9, 1969, dedicated to the memory of Theodore S. Motzkin). New York-London: Academic Press. pp. 159–175. MR 332165

[8] G.B. Dantzig, D.R. Fulkerson, S. Johnson, Solution of a large scale traveling salesman problem. Operations Research, 2 (1954), pp. 393–410.
