home *** CD-ROM | disk | FTP | other *** search
- A Poor Man's Solution to the Traveling Salesman Problem
-
- Kevin E Knauss
-
- 12 Mar 89
-
- Given a map and a means of transportation, a competent traveling
- salesman can pick a reasonable route to all his customers. The route he
- picks needn't be optimal just practical. In this article, we'll explore
- a hypothetical salesman's intuitive approach to finding a practical
- route to all his customers and its implementation in the C programming
- language. In an attempt to find elegant optimal solutions to tough
- problems, we often overlook solutions which appear to be rather
- "brutish." Upon closer inspection, however, we see that these solutions
- are more elegant than they appear and, better yet, they work!
-
- BACKGROUND
-
- Routing and scheduling problems are inherently difficult to solve
- because they often require total enumeration of all possible outcomes.
- As the number of data points in the problem increases, the possible
- outcomes increase exponentially. With the Traveling Salesman Problem
- (TSP) for instance, studies have shown that an algorithm which yields an
- exact solution is relatively infeasible for networks containing in
- excess of 100 points. In fact, there are problems with as few as 48
- cities for which the best answer found has not been proven to be
- optimal. By using heuristics of various types, one is enabled to find
- feasible (though not necessarily optimal) solutions to these otherwise
- "unsolvable" problems.
-
- The TSP simply stated is: "A salesman, starting in one city, wishes to
- visit each n─1 other cities once and only once and return to the start.
- In what order should he visit the cities to minimize the total distance
- traveled?" (8) This may seem too trivial a task to have generated active
- research for over three decades (the literature I've researched goes
- back to 1958), but there are many practical applications for the
- solution of this basic problem. Automated drilling machines and the
- newer laser drills used to drill printed circuit boards, for example,
- may have hundreds or thousands of holes to drill and often spend as much
- time traveling to a position for a hole as they do drilling it.
- Programming efficient head travel for these machines could make the
- difference for a company to turn a profit or loss. Solve the more basic
- TSP, and the marginal printed circuit board company may be able to stay
- in business by applying the same techniques.
-
- The TSP is one which is, to quote an old cliche, "easier said than
- done." That is, it is easy to explain the problem but difficult to
- solve; at least it seems difficult to solve when we look at many of the
- purely mathematical models that are, too often, not related back to the
- original problem. I chose to attack the TSP in a simplistic manner in
- an attempt to find one or more algorithms which would approximate a
- person's intuitive approach. In so doing, I was able to find an
- efficient way to "solve" the problem by producing acceptable results to
- large─scale traveling salesman problems. (Note that the word
- "acceptable" implies that judgments are required.)
-
- TERMINOLOGY
-
- Before we begin traveling around, a review of TSP terminology is in
- order. We'll begin with the city, our most basic term. This is what
- will be visited and may also be referred to as a town, point, node, or
- vertex. One goes from one city to the next by traveling a given
- distance or incurring a specified cost. The terms link, arc, and edge
- are also used in place of distance or cost. The collection of all the
- cities and the distances between each pair is a network or graph and is
- often represented by a distance matrix. A salesman will follow a route
- to visit each of the cities in the network, and this route may also be
- called a tour, path, or circuit. Finally, if we remove two or more
- links from the completed tour, we will break it into sub─paths or chains
- of cities.
-
- The distance matrix is a two dimensional array where the horizontal or
- row vector (dimension) is identical to the vertical or column vector.
- The cell found at each intersection contains the distance or cost
- between the city represented by the horizontal coordinate and the city
- represented by the vertical coordinate. Those familiar with graph
- theory haven't seen anything new here. If you've seen a lot of these
- terms for the first time, however, don't be afraid to refer back, for
- I'll be using many of them interchangeably.
-
- APPROACH
-
- Let's now consider the problem in terms of a salesman who must visit a
- dozen or so cities in the state of Hypothetica. Since the salesman must
- leave and return to the same city and visit all other cities in the
- process, his tour will be some sort of loop through the state.
- Obviously, as a loop is unbroken, one may start at any point on the the
- tour and still trace the same loop. Thus the starting city is of no
- consequence; rather we want to find the best route irregardless of the
- salesman's starting point.
-
- Intuitively, one would want to travel to cities nearby and to cities
- near those. We can build a procedure based on this thought by first
- finding the closest two cities and then continuing to the next closest
- city that hasn't been visited. This should produce a fairly good tour,
- or at least would seem so at first. It may turn out that this tour
- isn't optimal, but it's a reasonable solution for starters.
-
- As the cities are exhausted from our network, we have fewer choices to
- make. Intuitively, we may reason that the choices left to us may not be
- as good as those we're offered in the early stages of tour building.
- Our salesman may be forced to backtrack and cross previously traversed
- arcs. If we check the proximity of neighboring cities, however,
- especially those near the end of the initial tour, we may be able to
- find improvements.
-
- One approach we may try involves the removal of a single city from the
- tour and testing it between each pair of cities in the remaining tour.
- Once we've tested it in each location, we'll place it in the location
- where the overall circuit cost is lowest (i.e. the shortest distance
- the salesman must travel). This same approach may be tried with chains
- of cities of varying lengths, but with chains we must also check for
- orientation (that is try the chain both frontward and backward between
- each pair of cities). This leaves us with our last thought of simply
- checking the orientation of a chain in its original location. If you
- think a picture is worth a thousand words, see Figure 1 so we can cut
- 4,000 words from this article. By testing the proximity of every city
- or the proximity and orientation of every chain, we can be fairly
- confident that any ill effects produced by our original technique will
- be cleaned up.
-
- If we look through related literature, we find that our tour building
- and improvement techniques have already been studied and named. Our
- tour building algorithm is known as the nearest neighbor or greedy
- algorithm. Our tour improvement algorithms generally fall into a
- category known as k─optimality or k─opt for short. A tour is said to be
- k─optimal if we are unable to improve it by removing any k arcs and
- replacing them with k others. Checking chain orientation in place is
- the same as removing two arcs and replacing them with two others and is
- thus the 2─opt algorithm. Likewise, chain proximity and orientation is
- the 3─opt algorithm with point proximity a special case where the chain
- to be tested has length one. Even though these improvement techniques
- are related, we'll evaluate each on its own merits.
-
- IMPLEMENTATION
-
- To evaluate the intuitive approach we may embark upon an elaborate
- mathematical analysis that may or may not produce any conclusive
- results, or we may implement the solutions in practical models that may
- be run against live data. If this was a scientific journal, we'd follow
- the mathematical tack; but since this is a practical journal, we'll try
- the modeling approach.
-
- The main functions are programmed in their own modules called:
- NearNeighbor, PointOpt, TwoOpt, Hybrid, and ThreeOpt (listings 1 through
- 5 respectively). NearNeighbor generates the initial tour from the
- distance matrix while the other routines take turns improving it.
- PointOpt performs point proximity improvement only, and TwoOpt performs
- only chain orientation improvement. Hybrid combines point proximity and
- chain orientation improvements while ThreeOpt adds chain proximity and
- orientation. The nearest neighbor, 2─Opt, and 3─Opt algorithms have
- been studied in detail within the field, but are normally regarded as
- independent techniques. To my knowledge, this is the first that point
- proximity has been considered either independently (PointOpt) or in
- conjunction with the 2─Opt algorithm (Hybrid).
-
- We'll use six distance matrices found in the literature to test our
- procedures since these networks have known optima (or at least a best
- known solution as is the case of the 48 city problem). We'll need to
- know the tour length each procedure produces and the time it takes to
- find the tour. We can calculate from this information how much
- improvement is made by each technique and what percentage each solution
- is from the known optimum. To see how the improvement techniques work
- on different initial tours, we'll reverse the initial nearest neighbor
- tour and generate a bad initial tour.
-
- To capture the time, we'll need a system dependent routine. GetTime
- samples the clock counter by issuing an interrupt under MS─DOS;
- ElapsedTime calls GetTime and compares the new time with a previous time
- passed in. Listing 6 shows GetTime implemented using the MIX C compiler
- for MS─DOS and ElapsedTime in a plain vanilla implementation. For the
- bad initial tour, we'll simply reverse the logic of the nearest neighbor
- algorithm to generate a farthest neighbor tour (listing 7). The driver
- program (listing 8) is far from elegant but gets the job done.
-
- OBSERVATIONS
-
- One might assume that, since it embodies all the techniques used in the
- other improvement algorithms, the 3─Opt algorithm would produce a tour
- at least as good as the others. Our results show that one might be
- wrong in such an assumption! In fact, the 3─Opt algorithm only found the
- best solution in the two smallest problems and other algorithms found
- the same solutions (see Figure 2). The total time to run the three tour
- building routines and the three lesser improvement routines on each is
- less than the time needed to run the 3─Opt routine just once on one of
- the larger problems (see Figure 3). This indicates that the 3─Opt
- algorithm may not be very valuable as a tour improvement algorithm.
-
- The independent point proximity algorithm is likewise not very valuable.
- It was 15% faster than the Hybrid routine overall but fell well behind
- in performance. PointOpt found the best solution in the 10 city
- problem, but Hybrid found the same solution. Hybrid outperformed
- PointOpt in all the other solutions so nothing is gained by running both
- routines.
-
- We can see that the TwoOpt and Hybrid routines compliment each other
- very nicely. Where Hybrid didn't find the best solution, TwoOpt did.
- The 2─Opt algorithm was consistently the fastest and the point proximity
- 2─Opt hybrid found the best answer in four of the six problems. Their
- combined times plus the time to build the initial tour was comparable to
- the time required to read the distance matrix on my 12Mhz AT clone with
- 28ms access hard drive.
-
- One thing that may be a surprise, is that our tour improvement
- algorithms are dependent upon how compatible the initial tour is with
- the techniques being applied and not necessarily how close to optimum
- that tour is. In all but one problem, the best improvement was found
- when starting from the nearest neighbor solution. In the 42 city
- problem, however, the best improvement was found by starting with the
- farthest neighbor solution. In addition to being found from the nearest
- neighbor solution, the same improvement was found in less time from the
- farthest neighbor solution in the 10 and 20 city problems.
-
- The MIX C compiler/linker with its .COM executable files causes some
- problems for large scale applications such as the drilling machines. I
- had no problem with our small to medium scale problems, but when I
- compiled the program with dimensions over 100 the program ran out of
- space. Dynamic storage allocation won't cure the problem since the heap
- space for dynamic storage allocation and the stack space for static data
- structures all come from the same pot. Answers will have to come from a
- C environment that allows for .EXE executable files or the use of disk
- resident dynamic arrays. The latter of these will degrade execution
- time, but taking a long time to find a solution is better than taking a
- short time to not find one at all! Note that by using the function
- ArcCost to access distance matrix data we've made it easy to try
- differing storage methods.
-
- CONCLUSION
-
- Following an intuitive approach to a problem and implementing that
- approach can often produce very acceptable results. Though there can be
- no substitute for thorough analysis, neither can there be a substitute
- for experimentation and testing of hypotheses. The modularity of C with
- its procedures and functions allows the building blocks of
- experimentation to become the building blocks of application. With our
- TSP modeling, we need only develop a new driver program to build an
- efficient production package.
-