Section 3.6 The Traveling Salesperson Problem
In this section we discuss the Travelling Salesperson problem. In Subsection 3.6.1 we introduce the problem and give some explanation of why it is very hard in general. Rather than try to solve it exactly, we will resort to providing upper and lower bounds for the solution. In Subsection 3.6.2 we discuss various methods of constructing upper bounds. In Subsection 3.6.3 we give a method of constructing lower bounds.
Subsection 3.6.1 Introduction to the Traveling Salesperson Problem
Let us first describe the Traveling Salesperson Problem, or TSP for short, in informal language, and then translate it into a question about graph theory.
Imagine you work for a company, travelling from city to city, trying to sell some product in each (for instance, encyclopedias). You are assigned a list of cities you need to visit, and you need to start from your home, travel from city to city visiting them all, and finally return to your home. Of course, travelling from city to city is expensive (either in terms of money, travel time, or something else), and to turn a profit your company wants you to organize the order you visit each of cities so that the total cost is as cheap as possible. This minimiziation problem is the TSP.
Translated into graph theory, the TSP can be succinctly stated as follows: given a weighted graph \(\bfG\text{,}\) find the cheapest Hamiltonian path. That is, the cheapest closed walk on \(\bfG\) that visits every vertex exactly once.
First, note that it is enough to consider the complete graph \(K_n\text{.}\) If we are given some other weighted graph \(\bfG\text{,}\) we can add all the edges not in \(\bfG\) but make their weights much larger than any of the weights inside \(\bfG\text{.}\)
Another important point is that the problem of determining whether a given graph \(\bfG\) has a Hamiltonian cycle is a special case of the traveling salesman problem. To see this, suppose we're given a graph \(\bfG\text{,}\) and we want to determine whether it is Hamiltonian. We create a weighted \(K_n\text{,}\) with vertices the vertices of \(\bfG\) by giving the edge \(v-w\) a very small weight \(\epsilon\) if \(v\) and \(w\) are adjacent in \(\bfG\text{,}\) and a very large weight \(M\) if \(v\) and \(w\) *are not* adjacent in \(\bfG\text{.}\) Then, any Hamiltonian path in \(\bfG\) would have cost \(n\epsilon\text{,}\) where as any path that uses an edge not in \(\bfG\) costs more than \(M\text{.}\) So, if we make \(M>n\epsilon\text{,}\) the TSP for our weighted \(K_n\) will have a solution with cost less than \(M\) if and only if \(\bfG\) had a Hamiltonian cycle.
Since determining whether a graph \(\bfG\) is Hamiltonian is difficult (NP complete), the TSP will also be. As such, we will not discuss any algorithms for actually solving TSP. Instead, we will discuss methods for giving upper and lower bounds for the TSP.
Subsection 3.6.2 Finding upper bounds to the TSP
Getting good upper bounds to the TSP will turns out to be difficult. However, finding not so good upper bounds will turn out to be quite easy.
For instance, any solution to the TSP will be a Hamiltonian cycle, and in particular if \(\bfG\) contains \(n\) vertices, the TSP solution will contain \(n\) edges. Let \(M\) be the weight of the most expensive edge in \(\bfG\text{.}\)
Since the TSP asks for the cheapeast Hamiltonian cycle, taking any Hamiltonian cycle and calculating its cost will be an upper bound for the TSP. Just choosing a random Hamiltonian cycle will in general be very expensive and silly -- for instance, going from Sheffield to London to Rotherham to Edinburgh to Chesterfield to Glasgow to Nottingham to Brighton is clearly not optimal.
A greedy algorithm will give a heuristically better result: we call it the nearst neighbor algorithm. At each step, simply go to the nearest city you have not already visited. This will give good results at the beginning, but since we do not do any planning ahead, it will in general give bad results, as the following example illustrates:
Consider running the Nearest Neighbor algorithm starting at \(v_0\text{.}\) At the first step, we have a choice -- we could go to \(v_1\) or to \(v_9\text{.}\) Suppose we go to \(v_1\text{.}\) After that, our choice is forced -- \(v_1-v_2-v_3-v_4-v_5-v_6-v_7-v_8-v_9\) costs one at each step. Now, we still have to visit \(T\) before returning to \(V_0\text{,}\) which will cost us 10 to detour through. We clearly should have planned ahead and visited $$T$$ in between vertices \(v_4\) and \(v_5\) at a cost of 4.
Clearly the nearest neighbour algorithm is not in general very good, and better algorithms are possible. We present it first to give a quick but reasonable way to get a solution to TSP that isn't completely horrible, and second to illustrate that greedy algorithms in general will not be efficient. We briefly mention two other ways to get lower bounds.
Another, slightly better, greedy algorithm might be called nearest insertion. It inductively builds bigger and bigger closed loops one vertex at time. When there is a closed loop with k vertices \(v_1-v_2-v_3-\cdots - v_k-v_1\) and we want to add vertex \(w\) to the loop, we look at each of the adjacent legs \(v_i-v_{i+1}\text{,}\) and determine how much it would raise the cost to insert the next vertex \(w\) in between those two cities (changing the path to \(v_1-w-v_{i+1}\)), being sure to also check for inserting it between \(v_k\) and \(v_1\text{.}\) This does much better at our example above, but can run into other problems, and involves a little more bookkeeping and arithmetic, so I won't ask you to implement it on the exam.
Another method inolves a qualitive change of view. The greedy algorithms we describe so far are only heuristics to getting a decent path. There is no guarantee that they produce an output that is in any way close to the optimal path, and indeed examples can be engineered to make them extremely bad. It would be nice to have an upper bound that was guaranteed to not be too far off the optimal solution. The Christofides algorithm does just that, by producing a Hamiltonian cycle that is guaranteed to have weight at most 3/2 of the weight of the optimal solution. Very briefly, it does this by starting with a minimal weight spanning tree, makes a subgraph by adding edges to the tree until every vertex has even degree, taking an Eulerian circuit of that, and then removing edges to get a Hamiltonian cycle.
For nearly fifty years, Christofides algorithm was the best known guaranteed upper bound on the Travelling Salesperson problem, but in the summer 2020 of Nathan Klein, Anna Karlin and Shayan Oveis Gharan managed to modify the algorithm to give a very slight improvement, producing a cycle guaranteed to be within \(3/2-\varepsilon\) for some \(\varepsilon>10^{-36}\text{.}\) See this Quanta article for a popular account of their work.
Subsection 3.6.3 A lower point for TSP
To get a lower bound for TSP we have to be a little more intelligent. Suppose we had a solution \(C\) to the TSP for \(\Gamma\text{,}\) and that we deleted one vertex \(v\) from \(C\text{.}\) Deleting a vertex from a cycle gives us a path \(P\text{,}\) and in particular a tree. Furthermore, \(P\) visits every vertex in \(\Gamma\) except for \(v\text{,}\) and so it is a spanning tree of \(\Gamma\setminus v\text{.}\)
We can use Kruskal's algorithm (or another) to find a minimal spanning tree \(T\) of \(\Gamma\setminus v\text{,}\) and we have that \(w(P)\geq w(T)\text{.}\) The cycle \(C\) contains just two more edges, from \(v\) to two other vertices, say \(a\) and \(b\text{.}\) We can obtain lower bounds on the weights of the edges \(v-a\) and \(v-b\) by taking the weights of the lowest two edges out of \(v\text{,}\) maybe \(e_1\) and \(e_2\text{.}\) We have
giving us a lower bound on solutions to the TSP.