Lecture 10: Kruskal proof, Traveling Salesman
Kruskal’s completed:
In this morning’s lecture we described Kruskal’s algorithm for finding a minimal weight spanning tree in a weighted graph. After demonstrating the algorithm, we showed that it always produces a spanning tree, but we have not yet shown that this spanning tree has minimal weight. We do that now.
The proof will be inductive. Let $T_0$ be the subgraph that Kruskal’s algorithm starts with, namely, all the vertices of $G$ and none of the edges. Let $T_k$ be the subgraph Kruskal’s algorithm produces after adding $k$ edges to $T_0$. We will inductively prove that each $T_k$ is contained in some minimal weight spanning tree of $G$. In particular, since we know the final subgraph produced is a spanning tree, we see that it will be a minimal weight spanning tree.
The base case is clear: any minimal weight spanning tree contains all the vertices of $G$, which is exactly the initial graph $T_0$, and since $G$ is finite there exists some minimal weight spanning tree.
For the inductive step, we assume that the graph $T_k$ produced by Kruskal’s algorithm is contained in some minimal weight spanning tree $T$, and that Kruskal’s algorithm tells us to construct $T_{k+1}$ from $T_k$ by adding the edge $e$. We need to prove that $T_{k+1}$ is contained in some minimal weight spanning tree $T'$. If $e$ is an edge of $T$ we may simply take $T' = T$, so suppose it is not. Then $T + e$ contains a cycle through $e$, and this cycle must contain some edge $f$ that is not in $T_{k+1}$ (otherwise $T_{k+1}$ itself would contain a cycle). We claim we can take $T' = T + e - f$.
First, note that $T'$ is indeed a spanning tree: since $T + e$ has a cycle containing both $e$ and $f$, removing $f$ leaves $T' = T + e - f$ connected, and it has the right number of edges.
Now, we must show $T'$ has minimal weight. Since $T'$ only differs from $T$ (which had minimal weight) by adding $e$ and removing $f$, we have $w(T') = w(T) + w(e) - w(f)$, and so $w(T') \le w(T)$ if and only if $w(e) \le w(f)$.
Since $T$ is by supposition a minimal weight spanning tree, we have that $w(T) \le w(T')$. Since Kruskal’s algorithm chose to add the edge $e$ instead of $f$, we have that either $w(e) \le w(f)$, or that adding $f$ to $T_k$ creates a cycle. But $T_k + f$ is contained in $T$, which is a tree, so adding $f$ cannot create a cycle, and we must have that $w(e) \le w(f)$.
Thus, $w(T') = w(T)$, and so $T'$ is a minimal weight spanning tree that contains $T_{k+1}$, completing the induction.
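The algorithm whose correctness we have just proved can be sketched in a few lines. Here is one possible Python version using a union-find structure to detect when an edge would close a cycle; the function names and the `(weight, u, v)` edge format are our own choices, not fixed by the lecture.

```python
# A minimal sketch of Kruskal's algorithm with path-compressing union-find.
def kruskal(vertices, edges):
    """Return the edges of a minimal spanning tree of a connected weighted graph.

    edges: iterable of (weight, u, v) triples for an undirected graph.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the representative of v's component, compressing as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):      # consider edges cheapest-first
        ru, rv = find(u), find(v)
        if ru != rv:                   # different components, so no cycle is created
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

For instance, on a four-vertex graph with edges of weights 1, 2, 3, 4 the algorithm picks the weight-1 and weight-2 edges, rejects the weight-3 edge that would close a cycle, and finishes with the weight-4 edge.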
Comments on minimal spanning trees
Although Kruskal’s algorithm only finds a single minimal spanning tree, the only time Kruskal’s algorithm has any choice is in what order to add edges with the same weight. Thus, it is possible to find all minimal spanning trees by analyzing what happens when we have a choice of edges to add.
There are several other algorithms for finding minimal spanning trees:
- Prim’s algorithm is very similar to Kruskal’s algorithm, but we start from a single vertex, and always add the cheapest edge that is connected to what we already have and doesn’t create any cycles.
- The Reverse-delete algorithm is the opposite of the greedy algorithm. Rather than adding the cheapest edge possible, the Reverse-delete algorithm worries about getting “stuck” having to add an expensive edge. It repeatedly finds the most expensive edge remaining. If removing that edge does not disconnect the graph, it removes it. If removing it would disconnect the graph, it keeps that edge as part of the spanning tree and moves on to the next most expensive edge.
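Prim’s algorithm as described above admits a similarly short sketch. Here is one possible version keeping a heap of candidate edges out of the growing tree; the adjacency-list input format and the names are our own assumptions.

```python
import heapq

# A sketch of Prim's algorithm: grow a tree from `start`, always taking the
# cheapest edge that leaves the current tree without creating a cycle.
def prim(graph, start):
    """graph: {vertex: [(weight, neighbour), ...]}. Returns MST edges as (w, u, v)."""
    visited = {start}
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    tree = []
    while frontier:
        w, u, v = heapq.heappop(frontier)  # cheapest edge out of the current tree
        if v in visited:                   # both ends already in the tree: a cycle
            continue
        visited.add(v)
        tree.append((w, u, v))
        for w2, x in graph[v]:
            if x not in visited:
                heapq.heappush(frontier, (w2, v, x))
    return tree
```

On the same input graph, Prim’s algorithm produces a spanning tree of the same total weight as Kruskal’s, though it may discover the edges in a different order.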
Traveling Salesman Problem
The Traveling Salesman Problem, abbreviated TSP, is the following: given a weighted graph $G$, find the cheapest Hamiltonian cycle; that is, the cheapest closed walk on $G$ that visits every vertex exactly once.
We begin with some basic observations:
- It is enough to consider the complete graph $K_n$. If we are given some other weighted graph $G$, we can add all the edges not in $G$ but make their weights much larger than any of the weights inside $G$.
- The problem of determining whether a given graph has a Hamiltonian cycle is a special case of the traveling salesman problem. To see this, suppose we’re given a graph $G$ with $n$ vertices, and we want to determine whether it is Hamiltonian. We create a weighted $K_n$, with vertices the vertices of $G$, by giving the edge between $u$ and $v$ a small weight $1$ if $u$ and $v$ are adjacent in $G$, and a very large weight $W$ if $u$ and $v$ are not adjacent in $G$. Then, any Hamiltonian cycle in $G$ would have cost $n$, whereas any cycle that uses an edge not in $G$ costs more than $W$. So, if we make $W > n$, the TSP for our weighted $K_n$ will have a solution with cost less than $W$ if and only if $G$ had a Hamiltonian cycle.
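The weighting in this reduction can be sketched directly. The helper below is our own illustration (the edge set is given as two-element frozensets, an assumed format): edges of $G$ get weight 1 and non-edges get the large weight $W$.

```python
# Build the TSP instance used in the Hamiltonicity reduction sketched above:
# weight 1 on edges of G, a large weight W on non-edges of the complete graph.
def hamiltonian_weights(vertices, edges, W):
    """edges: collection of two-element frozensets. Returns a weight dict on K_n."""
    weight = {}
    for u in vertices:
        for v in vertices:
            if u != v:
                weight[(u, v)] = 1 if frozenset((u, v)) in edges else W
    return weight
```

With $W$ chosen larger than the number of vertices, a TSP solution of cost below $W$ on this instance uses only edges of $G$, and so traces out a Hamiltonian cycle of $G$.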
Since determining whether a graph is Hamiltonian is difficult (NP-complete), the TSP will also be. As such, we will not discuss any algorithms for actually solving TSP. Instead, we will discuss methods for giving upper and lower bounds for the TSP.
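To make the difficulty concrete, here is a brute-force solver (our own illustration, not part of the lecture): it tries all $(n-1)!$ tours, which is only feasible for a handful of vertices, exactly the blow-up that makes TSP hard in general.

```python
from itertools import permutations

# Exhaustively check every tour of the complete graph; exponential in n.
def tsp_brute_force(vertices, weight):
    """weight[(u, v)] = cost of edge uv, with both orientations present.

    Returns (cost, tour) for a cheapest closed tour through all vertices.
    """
    start, rest = vertices[0], vertices[1:]
    best_cost, best_tour = float('inf'), None
    for order in permutations(rest):
        tour = (start,) + order + (start,)
        cost = sum(weight[(tour[i], tour[i + 1])] for i in range(len(tour) - 1))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour
```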
Upper bounds for TSP
Since the TSP asks for the cheapest Hamiltonian cycle, taking any Hamiltonian cycle and calculating its cost will be an upper bound for the TSP. Just choosing a random Hamiltonian cycle will in general be very expensive and silly – for instance, going from Sheffield to London to Rotherham to Edinburgh to Chesterfield to Glasgow to Nottingham to Brighton is clearly not optimal.
A greedy algorithm will give a heuristically better result: we call it the nearest neighbour algorithm. At each step, simply go to the nearest city you have not already visited. This will give good results at the beginning, but since we do not do any planning ahead, it will in general give bad results, as the following example illustrates:
Consider running the nearest neighbour algorithm starting at the vertex $a$ of the example graph. At the first step, we have a choice – we could go to $b$ or to $e$. Suppose we go to $b$. After that, our choice is forced, costing one at each step as we pass through $c$ and $d$. Now, we still have to visit $e$ before returning to $a$, which will cost us 10 to detour through. We clearly should have planned ahead and visited $e$ in between vertices $a$ and $b$ at a cost of 4.
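The heuristic itself is easy to state in code. A minimal sketch, assuming the graph is complete and given as a symmetric weight dictionary (the instance in the test below is our own, chosen to reproduce the planning failure just described):

```python
# The nearest neighbour heuristic: from each city, greedily move to the
# cheapest unvisited city, then close the tour. Gives an upper bound for TSP.
def nearest_neighbour(vertices, weight, start):
    """weight[(u, v)] = cost of edge uv, symmetric. Returns (cost, closed tour)."""
    tour = [start]
    unvisited = set(vertices) - {start}
    while unvisited:
        here = tour[-1]
        nxt = min(unvisited, key=lambda v: weight[(here, v)])  # nearest unvisited city
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)  # return to where we started
    cost = sum(weight[(tour[i], tour[i + 1])] for i in range(len(tour) - 1))
    return cost, tour
```

On a four-city instance where the greedy choice forces a final expensive detour, the heuristic returns a tour noticeably worse than the optimum, as in the example above.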
The nearest neighbour algorithm for finding upper bounds is demonstrated in this video.
Clearly the nearest neighbour algorithm is not very good, and better algorithms are possible; we present it first to give a quick but reasonable way to get a solution to TSP that isn’t completely horrible, and second to illustrate that greedy algorithms in general will not produce optimal results.
Lower bounds for TSP
To get a lower bound for TSP we have to be a little more intelligent. Suppose we had a solution $C$ to the TSP for $G$, and that we deleted one vertex $v$ from $C$. Deleting a vertex from a cycle gives us a path $P$, and in particular a tree. Furthermore, $P$ visits every vertex in $G$ except for $v$, and so it is a spanning tree of $G - v$.
We can use Kruskal’s algorithm (or another) to find a minimal spanning tree $T$ of $G - v$, and we have that $w(T) \le w(P)$. The cycle $C$ contains just two more edges than $P$, going from $v$ to two other vertices, say $x$ and $y$. We can obtain lower bounds on the weights of the edges $vx$ and $vy$ by taking the weights of the two cheapest edges out of $v$, say $e_1$ and $e_2$. We have
$$w(C) = w(P) + w(vx) + w(vy) \ge w(T) + w(e_1) + w(e_2),$$
giving us a lower bound on solutions to the TSP.
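This recipe can be sketched directly in code. In the sketch below a small Prim-style routine stands in for Kruskal’s algorithm when computing the spanning tree weight; the complete-graph weight-dictionary format and all names are our own assumptions.

```python
import heapq

# Weight of a minimal spanning tree of the complete graph on `vertices`,
# computed Prim-style from a symmetric weight dictionary.
def mst_weight(vertices, weight):
    start, visited = vertices[0], {vertices[0]}
    frontier = [(weight[(start, u)], u) for u in vertices if u != start]
    heapq.heapify(frontier)
    total = 0
    while len(visited) < len(vertices):
        w, u = heapq.heappop(frontier)
        if u in visited:
            continue
        visited.add(u)
        total += w
        for x in vertices:
            if x not in visited:
                heapq.heappush(frontier, (weight[(u, x)], x))
    return total

# The lower bound described above: MST weight of G - v, plus the two
# cheapest edges out of the deleted vertex v.
def tsp_lower_bound(vertices, weight, v):
    rest = [u for u in vertices if u != v]
    two_cheapest = sorted(weight[(v, u)] for u in rest)[:2]
    return mst_weight(rest, weight) + sum(two_cheapest)
```

Any tour must cost at least this bound, so comparing it with the upper bound from the nearest neighbour heuristic brackets the true optimum.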
This method of finding lower bounds is illustrated in this video.