Free Time Research
A Dynamic Data Structure Based Solution to the Traveling Salesman Problem
Hangjiong Chen
Abstract
In order to find a solution to the traveling salesman problem with accuracy and efficiency, I introduced a simple, yet very powerful data structure, which possesses three interesting properties, each of which can contribute to the optimization of a path in its own way: position exchange, block reversal, and cross-join removal via rearrangement. This data structure is named TspGap, and consists of left position pl and right position pr, and a list called block that contains any number of positions between pl and pr. TspGap as a data structure is dynamic. It can expand or slide along the path to cover and represent any part of the path, thus allowing its three properties to be fully exploited and utilized. The first property, position exchange, allows rapid exchange of a position to a different place where this position is deemed to be fit better in terms of the distance to its neighbors. The second property, block reversal, offers a faster, more efficient and general alternative to position exchange, applicable after any operations on the path. Position exchange and block reversal can bring a random path only to a semi-optimal state, which is burdened with many cross-joins. Optimization of a semi-optimal path requires removal of these cross-joins through a rearrangement process. When a position’s distance to one of its immediate neighbors is longer than the distance to another position that is one or more positions away, the position’s optimal state is considered dubious. Such a dubiousness is faithfully reflected in a TspGap, in which the direct distance of its pl and pr is shorter than its side gap’s direct distance. What’s interesting is that the dubiousness of a TspGap is correlated to cross-joins so well and in an intrinsic manner. This correlation sheds light on how the rearrangement could be proceeded in a fast and precise way. Related dubious TspGaps usually exist in a chain, and the next TspGap in the chain decides how the TspGap ahead of it to be processed. Such a dubious TspGap chain works as a unit to remove a few wrong gaps at once, which include cross-joins. Implementation of dubiousness-aided rearrangement should be as general as possible so that the amount of code could be reduced to minimum. An implementation based on alignments of TspGaps in the chain was used in this report.
1. Introduction
In this paper I report a data structure-based solution to the travelling salesman problem (TSP) defined in Wikipedia as follow: given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?
TSP has been subject to intensive study in computer science, and many algorithms have been developed to find a solution. As a result, shortest paths involving many thousands of cities are now available with high accuracy. Despite current great progress in the field, it is always desirable if new algorithm can be discovered, even better if new algorithm can provide a faster and more efficient way to achieve the optimal tour.
TSP can be solved exactly using brutal force algorithm, but ((n-1)!)/2 possible combinations make this approach infeasible.Nevertheless ((n-1)!)/2 possibilities are a great exaggeration. For a given city on the path, its optimal neighbors are actually limited to on average no more than 5 of its nearest neighbors.Using tour of 500 best colleges in the US as an example, the optimal tour in the Boston area is not much related to the optimal tour in San Francisco area. Only one college in the Boston area will be indirectly connected with a particular college in the San Francisco area. In this sense dealing with a 5000-city path is not much more complex than a 500-city path.
In this paper I will introduce two classes, one representing two adjacent cities in the path, and another, a data structure, representing any part of a path, and use them to find a potential solution to the traveling salesman problem.
2. Data Structure Classes
2.1 Gap, TspGap, and Exchanger
Figure 1 shows class diagrams for Gap, TspGap, and Exchanger classes with examples below each diagram.
Figure 1. Class Diagrams representing three classes used in this paper
Gap class represents any two adjacent positions pl and pr on the path:
Gap isn’t very useful, but it can help assess roughly if the two positions that make up a gap’s pl and pr are optimal to a certain degree. If they are not optimal at all, one of the positions may be moved to elsewhere, or a third position may be placed between them. A Gap object is also used to mark a path where operations will be performed to rearrange the path. A Gap has a gap distance from pl to pr.
TspGap is a natural extension of Gap class by including a list, called block, that contains all positions between left side position pl and right-side position pr:
The value of n can range from 0 to the size of entire path minus 2. Therefore, TspGap can represent the entire path if necessary. The most prominent feature of a TspGap is the various distances we can get from it based on relative positioning of each element. The TspGap possesses a left-side gap pl-p0 and a right-side gap pn-pr. The total distance is the sum of left-side gap distance plus the entire block distance plus right-side gap distance. If the block is empty, the total distance is equal to the distance between pl and pr.
TspGap class is a dynamic data structure as shown in Figure 2. It can extend in either direction starting from a given size to the size of the entire path, and it can also slide in either direction to traverse the path. The top half of Figure 2 shows that starting from gap 0-5, it slides down the path to where it is intended to stop. This is a gap-wise operation used during exchange as well as insertion site selection when putting positions back into the path. The bottom half illustrates how to scan the path to collect TspGaps that meet collection criteria. The initial TspGap of a given size is generated from a starting position, and extends one position at a time down the path, and in the meantime, the extended TspGap is checked against criteria. If the TspGap meets the criteria, a clone of it is made and saved. Extension will continue and the TspGap grows in size, until there is no need to continue or reach the end of the open path. The whole process repeats as many times as needed by beginning one position down from the previous starting position. This dynamic nature of TspGap forms the basis of its extraordinary capability in finding a better algorithm for the traveling salesman problem.
Figure 2. Dynamics of TspGap via extension and sliding to perform its functions
A direct application of TspGap is implemented in class Exchanger. The Exchanger class consists of two TspGaps, a donor TspGap and a recipient TspGap. The donor TspGap is directly obtained from the path with its donor positions in the block, while the recipient TspGap, which comes also from the path, possesses an empty block to accept positions from the donor. There is usually only one position in the block of a donor TspGap to exchange, though this is not required.
3. Experiments
In this paper, three national Tsps from the web site www.math.uwaterloo.ca/tsp/world/countries.htmlare used for simplicity: Western Sahara - 29 cities (WI29), Djibouti - 38 cities (DJ38), and Qatar - 194 cities (QA194). Paths are represented as a doubly linked list.
3.1 Data Preparation
The raw map data were converted to a two-dimensional array using standard conversion method so that distance data match what were used by the other researchers. The index map contains a key set representing all cities, each of which maps to a list that contains its first n nearest neighbors in the map. In general n ranges from 6 to 10, depending on the size of the map.
Figure 3. Data preparation. WI29 data were shown here.
The index of a position in the list is an indicator of this position’s closeness to the key. The index compatibility test uses the index map to check if two positions in the path are optimal to a certain degree. If a gap’s pl has an index in pr’s nearest neighbor list greater than 2 and or pr has an index in pl’s nearest neighbor list greater than 2, this gap is considered index incompatible. In addition, the positions in the nearest neighbor list are the neighbor candidates for the key when exchange and rearrangement are used to optimize the path. The key is thus called key position. Because of bidirectional nature of the path, two-way selection method (Figure 4) must be used to select potential neighbor candidates for exchange as well as rearrangement described later.
Figure 4. Two-way selection method to select potential neighbor positions
3.2 Exchange
Exchange can be applied to any path that requires optimization. It can rapidly bring a random path to a semi-optimal state.
The path was scanned gap-wise from a fixed position towards right and the gaps were assessed as targets for exchange using index compatibility test. If a gap is considered index incompatible, it should be subject to exchange. Once an exchange target was identified, the target was processed as both a donor and a recipient depending on which way yielded best result. The potential good neighbors are identified using two-way selection method. The selected neighbor candidates were located on the path, and converted into TspGaps. If the TspGaps were used as donors in Exchanger objects, the neighbor candidates are in the donors’ block, and the incompatible gap was the recipient. Otherwise, the TspGaps were used as recipients with empty blocks in the Exchanger objects, and the incompatible gap was extended into either side by one position to become the donors. Many Exchanger objects constructed in this way formed an exchanger pool. Upon evaluation of the total distance change from the donor and recipient as a whole (see Figure 5), the exchanger that resulted in the largest distance saving was selected and the position in the donor’s block was transferred to the recipient’s block, and finally changes were applied to the path. The exchange continued until exchange yielded no additional distance reduction to the path.
Exchange is effective and brings a random path into a semi-optimal state rapidly. It can achieve optimal tours for short paths like WI29 and DJ38 one out of 10 random paths. Since only exchangers that resulted in a net total distance reduction would be applied to the path, exchange performed in this way always gave rise to an improved path. In the same time, a number of disoriented blocks were also generated during exchange throughout the path. The next step was to identify them and then bring them into the right orientation.
Figure 5. Distance change calculation to evaluate if the exchange will lead to total distance reduction.
3.3 Block Reversal
The structure of TspGap suggests there are two ways to calculate the distance from pl to pr as shown in Figure 6. The obvious way described in earlier section was the forward total distance. If the block’s orientation was reversed, while the gap’s pl and pr remained the same, then the TspGap’s left-side gap and right-side gap were changed, so their distances. The total distance calculated with a reversed block was the reverse total distance. If the reverse total distance is smaller than the forward total distance, the block should be reversed for the TspGap. This simple distance test is called block reversal.
Block reversal can easily detect and correct a disoriented block. However, the power of block reversal is far more than what was illustrated in the figure. Since TspGap was a dynamic data structure, it could extend towards either direction as illustrated in Figure 2, and hence identify all blocks that required reversal in the entire path. Block reversal alone is able to achieve semi-optimal paths from their random parents with results comparable with exchange, even achieve optimal tours for short paths like WI29 and DJ38 one out of ten times. This is a very remarkable feature since block reversal is such a simple and straightforward operation at first glance. Figure 7 illustrated 100 semi-optimal paths generated from their random parents by block reversal only.
Figure 6. Two ways to calculate a TspGap’s distance can reveal the wrong block orientation
Block reversal is effective, fast, and easy to implement. Block reversal should be applied to the whole path progressively, starting from an initial position with a TspGap of size 4. This TspGap extends towards right. As the TspGap grows, total forward distance was checked against total reverse distance. When a condition existed, blocks were reversed and changes applied to the path. When the size reached about 70% of path’s length, a new TspGap of size 4 was created at the position next to the last initiation site to start a new round of operation. Such a process was continued repeatedly until no distance improvement was achieved. A path after block reversals shows remarkable stability despite of being semi-optimal.As a general rule, any action on a path should be followed by block reversal to remove any disoriented blocks that might have been introduced during the operations.
Figure 7. Optimization of 100 random paths created from QA194 using TspGap’s block reversal alone
Exchange is prone to producing disoriented blocks, and hence must combine with block reversal to bring down the distance further. Similarly block reversal on a random path should be followed by exchange to achieve maximum effect. In general, exchange should be performed before block reversal, not the other way around.
3.4 More on Exchange
In addition to the most obvious way of evaluating an exchanger object described in section 3.2, there is another way to determine if an exchanger could be used. It was hidden so deep that it eluded my eye for quite a long time.
A slight variation in calculating the path distance between TspGap’s pl and pr led to the discovery of block reversal mechanism. Let’s look at a short stretch of path, called oligomer, as shown in Figure 8. This oligomer’s start and end positions were correctly aligned to the optimal path, but the positions between them were not in optimal order.Furthermore, the oligomer was immune to exchange and block reversal. In general, certain dramatic approaches were required in order to make an oligomer optimal.
Two sections of the path from semi-optimal QA194 was shown in Figure 8, one of which was the oligomer 98-100-110-113-125-124-126-129-131-133-136. All exchanger objects identified for this oligomer showed negative net distance changes. However, careful inspection of the donor and recipient TspGaps from the best exchanger strongly favored an exchange between them, because recipient TspGap, upon receiving the position from the donor, still had a significantly smaller total distance despite of an overall net distance increase of 6 units to the path. The result of the exchange was shown at the right side of the figure.
The exchange result was already presented in Figure 6 to illustrate how two distance comparison led to block reversal. Therefore, a follow-up block reversal would immediately result in a distance reduction by 31 + (-6) = 25 units to the path.
Figure 8. Anti-sense exchange
As a general conclusion, many short sub-optimal oligomers like the one shown above, which don’t contain cross-joins on a larger scale, usually must experience distance increases before they become optimal. Exchange that will increase oligomers distances but pave the way for block reversal to act upon is called anti-sense exchange. If an anti-sense exchange really didn’t make sense, the block reversal operation usually would immediately bring the oligomers back to their original sequences. That’s a beauty of the block reversal.
Block reversal is ineffective against oligomers of certain position sequences. Instead of turning them into optimal, block reversal reduced their distance to a certain degree only, but made them very stable against normal exchange and further block reversal. When this happened, anti-sense exchange could play a significant role.
Exchange and block reversal act together on any random paths or any unknown paths to reduce their distances until they are no longer able to reduce further. Figure 9 shows two examples of semi-optimal QA194 obtained from exchange and block reversal.
Figure 9. Comparison of Two semi-optimal QA194 tours with the optimal QA194 tour. Green lines were not optimal, and optimal lines replaced with green lines were crossed out with =.
Despite a big distance difference between the two semi-optimal tours, they shared the same number of wrong gaps. Both of them missed 46 optimal gaps, while created 46 new gaps in place of them, suggesting that the number of wrong gaps was pre-fixed to a certain degree in the random paths. It’s also interesting to point out that certain areas were more error-prone as the same errors appeared in both paths, while others were unique to only one of them.
Detailed examinations of these paths led to a general conclusion. Loss of one optimal gap was always accompanied by generation of another wrong gap in its immediate vicinity. It’s hard to see a stretch, even a short stretch of path, which consisted of continuous wrong gaps. A direct result of such a distribution of wrong gaps was cross-joins all over the path, which revealed the limitation of exchange and block reversal on a larger picture. Exchange could be refined on a particular random path to shorten the distance further, but it was just the redistribution of wrong gaps, not the reduction of their number in the path. Of course, these observations are not new to the problem, the challenge is how to remove them all.
3.5 Rearrangement
The paths depicted in Figure 9 suggest that a wrong gap could be broken up to free its two sides, one of which then re-attached to its rightful neighbor sitting just nearby. This rightful neighbor itself must be part of another wrong gap that when breaking up, left another position free, which would look for its own rightful neighbor. As the process moved forward and corrected wrong gaps along the path, it would end up where it started when all wrong gaps were gone. In theory, it’s possible, but in practice, there are difficulties to do so.
The process described above requires movements of stretches of the path from their current locations to where they are optimal. Because of the nature of movement, we call the process rearrangement. There were two issues to be resolved for it to work. Firstly, we must identify the right neighbor, and secondly, we had to determine which side of the selected neighbor to make a break, so that a gap with the new neighbor could be formed. Breaking at the wrong side would obviously produce new wrong gaps, which could proliferate quickly.
When the first wrong gap took apart to start the rearrangement process, we focused on one side only. This side would actively look for its right new neighbor. Since it got all possible candidates from the index map using two-way selection method, we called it the key position. Map stores key/value pairs, so we call new neighbor candidates value positions, only one of which was the right neighbor. The other side would remain free until the whole rearrangement process came to an end by forming the last gap with the last key position as its value position.
In presentation followed below, color lines were used in all figures involving partial paths to mark gaps that are common to both optimal and semi-optimal tour (black), optimal tour only (green) and semi-optimal only (red).
3.5.1 A rearrangement overview
Figure 10 shows how rearrangement could be done. It seems to give an impression that rearrangement was an easy task. Nevertheless, two issues mentioned in previous section became clear too.
Figure 10. An overview of rearrangement
There are two value positions (79 and 75) available for key position 70, and which one to choose? If 79 is chosen, then which side of it to break, 79-86 or 79-75? After numerous trials, determination of a value position was found to be relatively easy, but which side of the value position to break is hard.
3.5.2 The dubiousness of a TspGap
A TspGap is dubious because of its dubious nature in the path. To illustrate this point, let’s look at Figure 11. The stretch of path used in the figure was taken from the semi-optimal path QA194-9616 shown in the right side of Figure 9. And all examples used late on were also taken from the same path.
Figure 11. Illustration of two dubious TspGaps from the semi-optimal path QA194-9617
For the convenience of discussion, we will give different names to different parts of a TspGap. The Gap consisting of TspGap’s pl and pr is named center gap, in contrast to a TspGap’s left side gap and right-side gap. The pl, pr, block’s front p0 and rear pn are called significant positions, because all other positions between p0 and pn are not relevant to the dubiousness of a TspGap.
The two TspGaps in the Figure 11 shared a common feature, that is, their center gap’s distance was shorter than one of their side gaps, in other words, the positions represented by the TspGap was looping back, forming a non-optimal configuration. Comparing this against the optimal path, it was found out immediately that the center gaps were supposed to form, while the side gaps supposed not to form if the path was optimal. The TspGap 2 was even more conspicuous in that pl came back close to pr again after a long journey between them. Another prominent feature is that TspGap 1 and TspGap 2 shared a common side gap 170-169, notwithstanding sitting on opposite sides of the TspGaps. If a TspGap’s one side gap has a distance longer than the center gap, this TspGap is said to be dubious.
Based on which side of a TspGap is dubious, we divide dubious TspGaps into three groups shown in Figure 12. The dubious side is critically important for TspGaps when it came to their use in rearrangement.
Figure 12. There are three groups of dubious TspGaps based on which side they are dubious
3.5.3 Distribution of dubious TspGaps in the path
A dubious TspGap itself doesn’t indicate if the stretch of path where it is located is not optimal. Rather it can work together with other dubious TspGaps to provide a precise way to break the value position from its left or right-side neighbor. Dubious TspGaps are rich around the wrong gaps that form cross-joins. The number of dubious TspGaps in an optimal tour is significantly less than in a semi-optimal tour. The large number of dubious TspGaps from some stretches of path could obscure their true usefulness in the rearrangement process. Figure 13 shows how to get all dubious TspGaps from a short stretch of path and how incredible their rich existence is in that stretch of path. Same colored gaps form a series.
Figure 13. A short stretch of path contains incredible number of dubious TspGaps.
3.5.4 Selection of Value Positions
Selection of value positions is important, not because poor selection will miss the right neighbor for the key position, but because it will generate high background noise that can make the rearrangement process less efficient and prone to errors.
The possible neighbor candidate pool was obtained using two-way selection method, with only first 5 positions in the list in the index map for the key position used. The pool was filtered to remove all positions that had been processed in previous operations. A short piece of path starting from the key position was used to separate positions in the pool into two groups. If the candidate positions were found in this piece, they were saved in a local pool. Otherwise they were saved in a remote pool. If remote pool was not empty, the candidates in the remote pool were used as value positions, and further filtered and prioritized based on dubious TspGaps that could serve as primary TspGap for these value positions (see below). In general, no more than 2 value positions could survive the process. If the remote pool was empty, the local pool took the place.
Separation of neighbor candidates into local and remote pools is very important because it separates the rearrangement process into two types. In the case of local pool, the key position is at a dead end. Rearrangement of a dead end can be quite different from rearrangement of remote value position. Furthermore, there may exit a short cut to bring a dead end into optimal configuration.
3.5.5 Dubious TspGap Selection Rules
Figure 13 also illustrates another facet of dubious TspGaps. They form an intrinsic, interlaced complex, and the TspGaps that can serve our rearrangement purposes are buried deep in this complex. Therefore, selection of the right TspGaps is paramount important. As a general approach, we first obtained all TspGaps from an oligomer, of which the value position for the key position was sitting in the middle of the oligomer. The length of the oligomer needs no longer than 30, and in majority of cases 25 already suffice.
If two or more gaps share the same position at one side, and their size differs by one in the ladder of smallest to largest, they form a dubious gap series in the gap pool. Figure 13 contains 3 series, and each series was labeled with a different color.
A value position is called primary position in the world of dubious TspGaps.
In the following sub-sections, we discuss a few selection rules that will help identify the right TspGaps unambiguously, and delineate their exact relationship as related to the rearrangement process.
3.5.5.1 Selection Rule I: Primary TspGap
To initiate a rearrangement process, we start from the primary position. The TspGap that contains the primary position is called primary TspGap, in which the primary position is either the first or last position of the block, not the left or right of the TspGap.
Although the primary position has a corresponding primary gap in majority of cases, there are exceptions in which the primary positions don’t belong to any TspGap. If this happens to a primary position, we check primary position’s left or right neighbor position in the path. If one of the neighbors appears in a TspGap, and is one of this TspGap’s significant positions, this TspGap is called pseudo primary gap, and this neighbor position is called pseudo primary position.
Primary and pseudo primary TspGaps are processed essentially in the same way, but the final result from the pseudo primary position may require block reversal to correct possible orientation errors because the pseudo primary position is further away from the key position than the primary position.
A special type of dubious TspGap is called triplet because it has a size of 3. If there are two gaps that fit Rule I, and one of them is a triplet, then the triplet should be selected as the primary gap. For illustrative purpose, the gap of size 4 or larger called big gap.
3.5.5.2 Selection Rule II: TspGap of the Smallest Center Gap Distance
If the TspGap is the one from a series, then select the one in the series that possesses the smallest center gap distance. Either primary or pseudo primary position is at the fixed side of the series, not the side that changes.
3.5.5.3 Selection Rule III: Dubious TspGap Chain
Figure 11 illustrates that the center gap of a dubious TspGap is actually present in the optimal path, while the dubious side gap isn’t, suggesting that the path should be rearranged by forming a new gap made of the center gap and removing the dubious side gap. The primary TspGaps are exactly such ones illustrated in the figure. To form a new gap from a dubious TspGap’s left and right positions requires breakup of two pairs of existing gaps. The dubious side gap is one target of breakup, but where is the other existing gap to be cleaved? Figure 14 shows that in order to form a new gap 174-183, the dubious gap 174-182 is sure to be broken up, but either side of the non-dubious side position 183 can be cleaved to free 183 and form the new gap 174-183. The problem is which side of the position 183 should be subject to cleavage, gap 187-183 or gap 183-176? The fact that position 183 must be cleaved at one side indicates that either gap 187-183 or gap 183-176 must be non-optimal, and a non-optimal gap could be the dubious side gap of another dubious TspGap. In fact, this is the case. Figure 14 shows such a dubious TspGap illustrated below the primary TspGap. Although in majority of the cases, the side gap shared by the two dubious TspGaps is dubious in one TspGap, and non-dubious in the other, there are situations where it is either non-dubious to both or dubious to both. The shared side gap 187-183 in Figure 14 is non-dubious to both dubious TspGaps. The dubious TspGap that shares one side gap with the primary TspGap is called secondary TspGap. As a matter of fact, the secondary TspGap can have its own secondary TspGap in such a way that starting from the primary TspGap they form a dubious TspGap chain. When the secondary TspGap fails to find its own secondary TspGap, the chain stops.
Figure 14. Breakup of non-dubious side gap of a TspGap must be aided by the second TspGap.
To select all gaps that form the dubious TspGap chain, the position to select the next gap is at the opposite side of the position that is used to select the current one. This position must be at one of the significant positions of selected TspGap. The secondary gaps so identified are called type I secondary gap.
If no gap fits the above selection rule, the position to select next gap can be at the same side of the primary position that is used to select the primary TspGap, which means the left or right of the TspGap. The gaps selected in this way is called type II secondary gap, and is processed differently from the type I secondary gap described above. Type II secondary gap is valid for primary TspGap only.
If neither type I nor type II secondary gap was found for the primary TspGap, then select any dubious gaps that were present around the primary gap. As usual the secondary gap identified in this way is called pseudo secondary gap. Processing pseudo secondary gaps is more complex than type I and type II secondary gaps. Like type II secondary gap, pseudo secondary gap is valid for primary TspGap only.
The chain ends when no more gap can be selected from this oligomer, and this oligomer is considered optimized.
3.5.5.4 Selection Rule IV: Pseudo Center Gap
There are exceptions to Rule II. In rare occasion the TspGap in a series that has the smallest center gap distance may not be the right primary or secondary gap, instead the one in a series with a larger center gap distance is the one to choose if this gap’s non-dubious side position falls into another dubious TspGap’s significant positions.
Rule IV usually occurs to pseudo dubious TspGap. A pseudo dubious TspGap’s center gap isn’t the gap found in the optimal path, but it plays the same role as primary or secondary gaps in facilitating rearrangement.
Rule IV is observed from some continuous positions that form a very close, linear arrangement. Although we are unable to give them a quantitative measurement, they can be considered as an inseparable unit. Under such conditions the smallest center gap distance becomes irrelevant if the outermost position is one of another dubious TspGap’s significant positions.
Figure 15. Two examples of dubious TspGap chains and their gap pools from QA194-9617
By applying Rules I to IV, we were able to create dubious gap chain for each value position belonging to a key position. Figure 15 shows two dubious gap chains from two large dubious gap pools from QA196-9616. The first secondary gap belongs to primary gap, and second secondary gap belongs to the first secondary gap, and so on. During implementation, they are used in order they appear in the chain.
There are 46 wrong gaps in QA194-9616, and all except two (Gap 101-86 and Gap 79-75) have their own primary gaps, and majority of these primary gaps can’t exist without secondary gaps except a few primary gaps from dead end oligomers. Therefore, their presence seems ubiquitous in semi-optimal paths generated via exchange and block reversal.
3.6 Implementation of Rearrangement
Having obtained the dubious gap chain for a value position, we now put it into use to solve our ultimate goal - rearrange the path and make it optimal. Since each wrong gap is wrong in its own way, we need to find common grounds and make the implementation as general as possible.
As we have mentioned briefly in previous sections, when one wrong gap breaks, one of its side positions, that is, value position, will form a new gap with the key position, and the other side position becomes next key position to start a new round of rearrangement. This view becomes blurry when the operation is carried out on dubious gap chains. With the help of dubious gap chain, more than one wrong gap will be consumed per chain, making the process much quicker and more precise.
TspGaps in the chain are chained together through their significant positions, and the use of one is aided by the next one in the chain, which will be illustrated with examples below.
Dubious gap aided operations during rearrangement include reversal, removal and reattach, and permanent removal. These operations were performed locally on the oligomer only. When program control returns to where each key position starts its journey, the changes to the oligomer are applied to the path, and the new key position is rightfully exposed at the end of the path to accept new value position.
3.6.1 Implementation of Rearrangement via Remote Value Positions
We have developed a general scheme to implement dubious gap aided rearrangement via remote value positions shown in Figure 16. This scheme was for the alignments of primary gap and the first secondary gap from QA194-9616. Its use obviously needs to improve using maps with more cities. Nevertheless, the principle behind its creation stands.
Figure 16. Alignment based scenarios to follow when rearrangement via remote value positions was implemented. The primary TspGap can be larger or smaller than the secondary TspGap
Figure 16 lists all 12 possible configurations between two TspGaps in terms of alignments of primary TspGap’s four significant positions against the secondary TspGap’s four significant positions. Since rearrangements are operated on the local oligomers, the alignments 7 to 12 can be converted to alignments 1 to 6 by reversing the orientations of the oligomers and chain members. Therefore, there exist only 6 alignments to consider, making implementation much easier. These alignments determine how the first cleavage of the primary gap is initiated. After first cut was finished, the following operations involving next secondary gaps from the chain become simpler. The implementation needs to consider each of these alignments and at the same time makes code as general as possible. We won’t describe each of these alignments in this paper, but a few of them to show how it can be done.
Figure 17 illustrates an example that fits the alignment 2 listed in Figure 16. It also shows dubious gap chain, and how these gaps in the chain aligned one against next.
The primary gap doesn’t have Type I, but Type II secondary gap, and cleavage occurs at gap 83-76, which will expose value position 76. Since the secondary gap is type II, it’s primary gap’s left side, not the right side, that will be subject to rearrangement via secondary gap.
The primary gap is embedded in the first secondary gap, and share the same dubious side, so cleavage at the secondary gap occurs at gap 99-83. Two cleavages leave 83 as a free position, and must be put back into the oligomer via gap-wise insertion. All insertion operations should be checked with block reversal to correct possible errors.
The first secondary gap now is cleaved at gap 99-83, and its own secondary gap (there are two there, and both work) indicates that cleavage at gap 96-107 can occur, which means a new gap 99-107 as suggested by the first secondary gap after cutting positions in between out:
…109-99-107-106-104… + 76-83-80-78-82-87-90-92-95-94-91-96
Without second secondary gap, cleavage at gap 96-107 can’t be certain. No more secondary gap from the chain, and the rearrangement ends naturally.
Figure 17. An example that demonstrates how the dubious gap chain works in rearrangement
At last 76-83-80-78-82-87-90-92-95-94-91-96 will be appended to key position 69, which is the last node in the path, and 96 becomes the next key position.
Here we removed three wrong gaps 76-78, 83-99, 107-96, among them 83-99 and 107-96 were cross joins. They were replaced with three optimal gaps 69-76, 83-80, and 99-107.
Note that the next secondary gap in the chain plays a crucial role in determining how the gap before them can be handled because there is no sufficient information to make a right decision by previous gap itself. This example highlights the importance of a gap chain and the beauty of how they work together to make rearrangement precise.
Figure 18 illustrates another example that fits the alignment 7 shown in Figure 16. As we mentioned earlier, alignment type 7 can be converted into alignment type 1, although we didn’t do it here.
Figure 18. An example that demonstrates how the dubious gap chain works in rearrangement
The primary gap was a triplet. If a triplet is either a primary or secondary gap in the chain, it usually must change its configuration to become either non-dubious or in another dubious form depending on its context. A path often deviates from the optimal route at a triplet, and as a result, the triplet must be dedubiosified first to make the following steps in right order, if it is the outmost dubious gap in the chain. In addition, the primary position 146 must be exposed so it can form a gap with 150.
Since the triplet was embedded in the first secondary gap, and shared the same dubious side and same dubious position with the first secondary gap, its dedubiousification turned the secondary gap into the following sequence:
The first secondary gap indicated the existence of gap 143-140. What role could the second secondary gap play in rearranging the first secondary gap? The second secondary gap was dubious at both sides. A double dubious gap was the most versatile form of dubiousness. In order to form the gap 149-152, it could reverse in both directions, depending on which side was in a wrong gap. It’s also possible to cut its block off the gap and then put back into the oligomer through gap-wise insertion. Generally, block cut off and reinsertion gave rise to the right result that was also achieved by one of the two reversal operations. In this example, gap 152-140 was the result of dedubiousification of the primary gap and considered to be the wrong gap. As a result, it reversed as follow:
This step also implied that the gap 143-149 was not cleavable. Therefore, the external gap 138-143 must be broken to form the gap 143-140. Since the second secondary gap was independent of the first, we performed the first secondary gap reversal after the second gap reversal because reversal of the second gap wouldn’t change the orientation of the first secondary gap.
Here we removed five wrong gaps 138-143, 153-149, 152-151, 140-146, and 140-121, among them only 140-121 was a cross join. The next key position was 121. Once again, every gap in the chain is indispensable for a precise rearrangement.
The next example illustrated a case involving a pseudo primary position showed in Figure 19. If the primary position was missing its primary gap, a pseudo primary position took its place to initiate the selection process, and the selection of gap chain should also be a little lax. Dubious gaps in the vicinity of the primary position should be grabbed for use following Rule III.
The dubious gap chain consisted of only two gaps. The primary gap was a triplet, and embedded entirely in its secondary gap, which was just one position larger. An interesting observation about this chain was that these two gaps had opposite dubious sides. This relationship played a deterministic role in the rearrangement process here.
Figure 19. An example that demonstrates how the dubious gap chain works in rearrangement
Figure 19 showed that the formation of gap 133-136 requires cleavage of gap 136-141. Position 141 was the long side of the primary triplet, which meant that dedubiosification of the primary gap would not result in the cleavage of 136-141. However, the opposite dubiousness of its secondary gap made it possible as illustrated below. The new gap 136-148 is not optimal. Therefore, after position 136 formed a gap with the key position 133, position 148 became the next key position.
3.6.1.1 Separability Test
In general, more gaps in the chain, easier to do rearrangement. And more tightly coupled gaps are in the chain, easier to do rearrangement too. Let’s look at a case where gaps in the chain just share no positions. Obviously, the secondary gap is pseudo. Figure 20 illustrates the dubious gap chain and their alignments.
Figure 20. A rearrangement that requires separability test
When the primary gap has neither Type I nor Type II secondary gap, but a pseudo secondary gap, how do we make sure that this pseudo secondary gap is indeed relevant to the rearrangement of the primary gap? Lack of common positions between them implies the possibility that they do not belong to the same stretch of path. Separability test will demonstrate if taking the primary gap apart from the pseudo secondary gap will lead to distance saving.
Figure 21 illustrates how separability test is conducted. Firstly, we will remove the positions between the two gaps from the oligomer if there are positions between them, which in this case are positions 125-124. The result of this operation is the formation of a non-optimal gap 113-126. Since the two gaps are both dubious, rearrangements were performed on both of them. The primary gap is dubious at right, and its new external left side 113-126 is not optimal, therefore, a simple reverse will bring the primary gap into an optimal state. Secondly, since the pseudo secondary gap is dubious at right, and its new external right side 113-126 is not optimal, the position 113 is actually wrong at both sides, and needs to be removed and reinserted to the left side of position 108. Finally, the positions 125-124 will be reinserted into the oligomer. The insertion process is gap-wise, and more tedious than it sounds. The insertion that gives rise to the shortest distance is saved. The distance saving was roughly 710 – 591 = 119 units. A positive distance change guarantees this rearrangement is needed.
Figure 21. Separability test to roughly determine if the breakup of a gap can lead to distance reduction
The steps depicted in Figure 21 successfully exposed primary position 133 to form a new gap with the key position 136. The position 108 became the next key position.
Separability test can also be applied to situations where the primary gap is missing, but primary position is possible to break away from a dubious gap a few positions ahead.
Two points were worth mentioning about separability test. First, the two gaps should be relatively close. In this case the primary gap and pseudo secondary gap were apart by only two positions, but they could be considered as real gap chain members. The two gaps were equally important because the key position could be position 101 when the rearrangement came from the opposite direction. Second, the calculation was qualitative only. Throughout this paper we avoided using any quantitative measurement to help decide how and where. In fact, we only say gap one was longer than gap two, but never say how much gap one was longer than gap two, simply how much had no importance to our question.
3.6.2 Implementation of Rearrangement via Local Value Positions
When a key position had its value positions all located downstream and only a few positions apart, it formed a dead end (Figure 22). Dead end rearrangement also follows the scenarios listed in Figure 16.
Figure 22. Key position 160 forms a dead end because it can’t revert back to 155.
The name dead end suggested we must find an exit in order to break the dead end. Indeed, such an exit was always there to make the rearrangement happen. This wasn’t surprising because the path was bidirectional, and this exit position was really the value position for another key position if rearrangement was proceeded from the other direction. The exit position was quite easy to identify. In all cases examined here, it was one of a dubious gap block’s side. The selection could be further confirmed by two-way selection method as there existed remote value positions for the exit position, because the exit position was in fact the next key position.
Observing from QA194-9616, the key position was often part of a triplet or next to a triplet. When this was the case, the triplet must be dedubiousified or rearranged into another configuration, which was a critical step for a precise final outcome. If the exit position was not close to the key position, there existed a short cut that could make the process faster and more general. The dead end in Figure 22 was rearranged using the short cut shown in Figure 23.
Figure 23. Rearranging a dead end using a short cut. Block reversal had no effect here.
For the short cut to work, the exit position must be at the block side next to the dubious position (in this case the secondary gap was dubious at both sides). It was important to treat the oligomer with block reversal as the last step so that any disorientations introduced by the short cut could be eliminated. The steps in Figure 23 could be applied to the dead end in Figure 13, where block reversal was a must.
3.6.3 Implementation of Rearrangement without the Assistance from Dubiousness
There were dubiousless wrong gaps existing in the path. Dubiouslessness brought us back into the dark age when we had so much difficulty making right choices to find an optimal tour. Good thing was that there were only few dubiousless wrong gaps in the path which was achieved through exchange and block reversal. Nevertheless, we were still facing them, must deal with them, and must deal with them well so that they wouldn’t mess up our effort. The question was if we had solutions to it? In reality, we didn’t have reliable solutions that were general enough to be applicable to all situations. Fortunately, it wouldn’t make our dubiousness-aided rearrangements too hard because they were just a few that were embedded in the sea of dubious TspGaps. Figure 24 shows the only two dubiousless gaps from QA194-9616.
Figure 24. The lone two dubiousless gaps 101-86 and 79-75 from QA194-9616
If this stretch of path was approached from key position 108, then there was little difficulty, because value position 101 must break with position 86 to form a gap 108-101, which made 86 the dead-end key position for the next round. The exit position could be either 79 or 75, though neither of them fit the requirement for an exit position, but their value position 70 formed a wrong gap with position 24, which was full of dubious TspGaps. However, when position 70 was the key position, the means to aid rearrangement was not obviously available. The gap 75-74 could be the target for breakup, because of the secondary triplet 71-[77]-74. But it would generate another dead-end without an exit. Furthermore, another triplet that overlapped two positions with it shared the same dubious side, and no sensible rearrangement could be delineated from them. By the way two tandem triplets with such an alignment could be used as a pattern to filter out those false dubious TspGaps if they posed a misleading possibility. In order to rearrange the key position 70 and its value positions 79 and 75, both sides of the value positions were scanned to search for positions that could serve as next key positions. Such a search yielded results immediately. Position 101 was quickly identified to be the next key position after separating from position 86. A short cut could be used here to make the next few steps much simpler and more general. Position 70 directly formed a gap with 86 and a piece of path below was then subject to block reversal as shown in Figure 25. This example showed the power of block reversal in action in such a clean way. This approach was feasible, but rearrangement without dubiousness obviously made the whole process more complicated and the outcome less certain.
Figure 25. Optimization of an oligomer via block reversals
3.7 A Hypothetic Self-correcting Mechanism
Figure 26 illustrates the distance changes over the entire rearrangement process for QA194-9616, which consisted of about 16 individual rearrangements, depending on how each dubious gap chain was used for rearrangement. Distance reduction wasn’t guaranteed after each rearrangement, and we saw a distance increase instead in certain cases. Calculation of distance changes wasn’t accurate, and was a sort of estimation only, but in general this was enough for the purpose.
Rearrangement could go wrong. What can we do if it goes wrong indeed?The first thing is that we must be able to find out if the rearrangement has gone wrong.If rearrangement goes well in each step, we should see a continuous reduction in path distance as shown in Figure 26. To avoid misleading distance hikes in certain steps, we would need to measure distance changes over the previous three to four rearrangements. By doing so, we will be able to determine whether the general trend is increasing or decreasing in distance over the process. A continuous increase would clearly indicate that the process had moved into optimal part of the path, destroying good gaps instead of correcting wrong ones.
Figure 26. Distance changes over each rearrangement. Total of 16 individual rearrangements were carried out to complete the entire rearrangement process (QA194-9617).
Another general trend was that rearrangement should remove dubious TspGaps from the path and make it less dubious. Figure 27 shows dubious TspGaps from two oligomers before and after rearrangements. An absolute comparison wasn’t possible because the oligomer could have been rearranged and divided into two unrelated pieces. Nevertheless, the difference was still dramatic and stark. Example I was from a dead-end rearrangement. There was still same number of dubious gaps before and after, but dubious gaps after rearrangement was the result of a large gap 16-6, which formed a series with its neighbors. Example II was almost free of dubious gaps. Therefore, an increase in dubiousness could be a sign of rearrangement gone awry.
Figure 27. The numbers of dubious TspGaps before and after rearrangements
The final way to monitor the rearrangement was to reverse the order of the process since the path is bidirectional. If the rearrangement was moving along the track of wrong gaps, the final result should be the same from both directions.
Figure 28. A class diagram showing TspNode to be used to redo rearrangements in case of errors
Since the choice of value positions for a given key position was limited, and the choice of break site left or right to the value position for cleavage was limited, we would be able to redo the rearrangement using either the remaining value positions or another break site of the current value position. As a general approach, we could build a doubly linked list as shown in Figure 28, and each node in the list would maintain a complete record of all relevant information for the current rearrangement, including a copy of value position list, copies of open path and all closed paths prior to the current rearrangement. After rearrangement, the current node would do a distance check up to the previous three to four rearrangements. If the distances deviated from a decline trend, a correction action was deemed to be necessary.Such a self-correcting mechanism could ensure that the rearrangement process was always on the wrong track, eventually resulting in the path that was optimal. In this way, the chance to reach an optimal tour could be improved significantly.
4. Discussions and Conclusions
We have presented a quite complete picture of how to bring a random path to semi-optimal states and then to final optimal state using a simple, yet dynamic and powerful data structure, called TspGap. The structure offers various views on distances between different positions from different angles, and the distance differences so observed can reveal some intrinsic abnormalities of a non-optimal path. Although a single such structure has very limited use in solving this complex graph problem, its super dynamic nature makes it possible to check the entire path and find out all potential gaps that make the path not optimal, and at the same time it provides different means to correct abnormalities at the different stages of optimization.
The most obvious use of TspGaps is to exchange positions in the path to reduce the path distance via Exchanger object, which consists of a donor TspGap and recipient TspGap. The recipient TspGaps are selected using a general two-way selection method. There are three different methods to determine if an exchange should be committed. The first method is based on total distance change calculated from both donor and recipient gaps, which is best used when the path is still in a very rough stage. As the path becomes shorter, and total distance change is no longer able to achieve further improvement, another method called anti-sense exchange can detect certain subtle abnormalities based on gap distance and TspGap’s total distance. Anti-sense exchange will result in a total distance increase, which is not allowed by the first method, yet pave the way for block reversal action to bring down the distance. The third method is called minimum distance increase. This method was not described in the Experiment section and its use is limited to a path near-optimal. In this method, you calculate the distance increase upon accepting positions into an empty block. The one with a smaller distance increase is the winner.
The structure offers another great, universal capability called block reversal. Since a TspGap’s block distance is fixed, its placement between TspGap’s left and right can have two orientations, thus two different distances. If the reverse distance is smaller than its forward distance, we must reverse block’s orientation for the smaller distance. This simple operation is so powerful that it can achieve something far beyond belief. There seemed to exist some pivotal positions in the path, which contributed to the effectiveness of block reversal in a critical way. Without changing their positions in the path, the block reversal could reverse all other changes back to the original state. A path wide scan can reverse all blocks that need to be reversed. Block reversal action is a must, not an option to clear away any orientation related issues left by operations like exchange, position insertions, and so on.
Exchange followed by block reversal can quickly transform a random path into a semi-optimal one. A path is semi-optimal not because the path itself is not optimal but because it is plagued with cross-joins. However, neither exchange nor block reversal could eliminate existing cross-joins or create new cross-joins. Eliminating cross-joins requires movements of many pieces in their entirety to where they really belong in its optimal state. This process is called rearrangement. The key position is the position looking for its optimal neighbor, and value positions are candidates for such an optimal neighbor. Value positions are usually the first five positions in the index map’s value list for the key position. The ambiguity in choosing the right value positions and how to break the value position from its current neighbors often made rearrangement inaccurate.
The structure’s last property provides us with what was needed most in our effort to bring a semi-optimal path into its final optimal state. If a TspGap’s direct distance between its pl and pr, excluding block distance, is smaller than any of its two side gaps, this TspGap is said to be dubious. A TspGap’s dubiousness was first observed during optimizing an oligomer, and its application did make algorithms simpler but not much (data not shown). In addition, dubious TspGaps are ubiquitous even throughout the optimal path, and seemed quite random in one gap’s relation to others, which obscured their true usefulness.
Upon careful observation it was found that dubious TspGap pool collected from an oligomer where the value position was located in the middle contained TspGaps that were intrinsically related to each other and well correlated to sub-optimal gaps in the path. There were rules established to select the initial TspGap, called primary TspGap, which contains the value position. The primary TspGap determines how the value position is cleaved to form a new gap with the key position. Despite of the certainty of breaking dubious gap to bring the center gap of the TspGap into being, it was still a challenge to determine how to break the non-dubious gap in order to form the center gap. Since any gaps that need to be broken to make way for new gaps are sub-optimal, and it was assumed that they would appear in other dubious TspGaps. And it is really the case. The breakup of a dubious TspGap’s non-dubious gap was decided by another dubious TspGap, called secondary TspGap, in which the non-dubious gap of the previous TspGap becomes dubious and like the previous dubious TspGap it can be broken with certainty. In this way they form a dubious gap chain, in which the next dubious gap will aid the rearrangement of its previous one in a precise manner. More complex of the nature of sub-optimal gaps in the oligomer, longer the length of the chain. Generally, the primary TspGap acts on a cross-join, and secondary TspGaps act on any sub-optimal gaps, including cross-joins. In majority of cases, the chain contains 3 dubious TspGaps, including the primary TspGap. The chain expanded the scope of the primary TspGap considerably, and made the rearrangement much faster with many fewer steps.
The implementation of chain-based rearrangement must be as general as possible to reduce the amount of code to deal with complexity of different kinds of dubious gap chains. The implementation that based on gap-alignment was found to be quite general and clean among other choices and used in this work. The alignment usually means that two gaps share at least two positions in whatever manner. If gaps in the chain do not align with each other or share only one position in this sense, separability test could be employed to ascertain the correctness of rearrangement between them. If the key position and its value positions were co-located in a short oligomer that started with the key position, they formed a dead-end. The critical step to process a dead-end oligomer was to identify the exit position so that the dead end could be broken and move forward. In the absence of dubiousness for some wrong gaps, the implementation assumed that this dubiousness-free piece was sandwiched between pieces that were rich in dubiousness so that the uncertainty of rearrangement could be obliviated to maximum.
Dubiousness-aided rearrangement was fast, efficient, and most of all, precise, which came as a surprise. Every dubious gap in the chain plays a role to make the outcome optimal. The dubiousness seems to suit particularly well on cross-joins and wrong gaps caused by cross-joins. Is the correlation between dubiousness and cross-joins a mere coincident or a consequence of exchange and block reversal operations? There was a big “IF” if the dubious gap’s distance were a few units shorter that made the dubiousness disappear, cross-joins would still exist but in other ways? Why majority of gaps involved in cross-joins appear in TspGaps as dubious gaps? Do many else conditions exist? If many of those else conditions did exist, then the observations presented here would collapse. Taking as granted, and assuming yes because it is in this way.
The solution described here has only been tested on a couple of sub-optimal QA194 paths, its generality seems obvious, although it needs to be verified using paths with more cities. In future work, scenarios listed in Figure 16 need to be expanded to include all possible ones so that all kinds of situations from much bigger maps could be processed accordingly. Even if this solution alone is not sufficient to get an optimal tour, it might complement other existing algorithms with speed, efficiency, and precision.
QA194-9616
0-->5(329)-->7(69)-->15(102)-->12(78)-->10(66)-->6(107)-->16(143)-->13(50)-->22(77)-->24(30)-->70(109)-->75(61)-->74(55)-->77(20)-->71(37)-->73(44)-->68(36)-->59(51)-->56(31)-->44(25)-->63(37)-->69(26)-->76(39)-->78(27)-->80(4)-->82(27)-->87(35)-->91(35)-->96(23)-->94(19)-->95(12)-->92(17)-->90(50)-->102(74)-->101(66)-->86(80)-->79(50)-->81(110)-->61(116)-->58(13)-->35(46)-->62(98)-->19(125)-->64(110)-->84(106)-->85(13)-->97(89)-->89(90)-->88(34)-->93(41)-->98(30)-->100(16)-->103(33)-->110(39)-->129(188)-->126(88)-->131(45)-->133(11)-->136(50)-->139(25)-->141(29)-->145(26)-->148(19)-->144(37)-->155(60)-->160(83)-->162(12)-->163(24)-->168(53)-->175(48)-->181(37)-->171(53)-->178(26)-->173(33)-->172(3)-->174(34)-->182(54)-->185(21)-->186(21)-->193(73)-->189(67)-->191(48)-->190(11)-->188(9)-->187(29)-->183(38)-->176(39)-->180(19)-->177(42)-->179(19)-->192(80)-->184(83)-->170(57)-->169(34)-->166(33)-->167(38)-->164(35)-->158(42)-->157(26)-->161(26)-->165(59)-->159(68)-->154(90)-->150(36)-->146(58)-->151(43)-->140(45)-->143(43)-->149(20)-->152(21)-->156(31)-->153(27)-->138(68)-->137(20)-->125(125)-->124(8)-->113(115)-->108(83)-->112(26)-->118(85)-->121(91)-->117(121)-->130(111)-->135(38)-->147(72)-->142(30)-->134(62)-->128(28)-->132(40)-->127(74)-->123(51)-->122(10)-->119(81)-->120(37)-->116(45)-->115(24)-->114(24)-->111(37)-->109(8)-->107(54)-->106(1)-->105(45)-->104(22)-->99(87)-->83(85)-->67(92)-->65(26)-->72(24)-->66(19)-->60(16)-->57(19)-->55(8)-->52(7)-->51(1)-->53(4)-->54(9)-->48(4)-->49(9)-->41(11)-->43(6)-->47(9)-->45(3)-->40(5)-->42(8)-->46(13)-->50(10)-->38(15)-->36(23)-->28(39)-->27(24)-->32(33)-->25(72)-->23(4)-->20(49)-->17(16)-->21(59)-->26(46)-->33(35)-->39(10)-->37(10)-->34(13)-->30(16)-->31(14)-->29(10)-->18(73)-->14(45)-->11(71)-->9(50)-->8(6)-->4(109)-->2(229)-->1(195)-->3(150)-->0(370)-->9616, size = 194, optimal distance = 9352.
Posted on: March 14, 2021