NetNews Usenet Archive 1992 #31

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #31 / NN_1992_31.iso / spool / comp / theory / 2772 < prev next >

Wrap

Text File | 1992-12-31 | 14.0 KB | 383 lines

Newsgroups: comp.theory Path: sparky!uunet!munnari.oz.au!metro!mama!naylor From: naylor@research.canon.oz.au (William Naylor) Subject: randomized-hash graph-iso algorithm - more discussion Message-ID: <C0402I.L7p@research.canon.oz.au> Sender: news@research.canon.oz.au Organization: Canon Information Systems Research Australia Date: Thu, 31 Dec 1992 05:47:06 GMT Lines: 372 A month ago I posted details of a randomized algorithm to attack the graph isomorphism problem, based on hashing both graphs with a carefully, but randomly constructed hash function. I argued that the success of this algorithm in solving very large practical problems suggests that graph isomorphism is probably in P. Then I argued that existence of a P algorithm for graph isomorphism probably implies the existence of a P algorithm for graph sub-isomorphism. My conclusion was that the Gismondi-Swart strategy for proving P=NP has a reasonable chance of success, even if their details are wrong. My post is repeated below, for convenience of the reader. Several people responded to my post; I summarize the responses here, and then give a detailed discussion of a particular response below. To summarize: 1) [1] states that graph-iso is in NP, but not known to be in P and not known to be NP-complete. As far as I know, this is still the state of knowledge (before the Gismondi- Swart work). I had claimed that graph-iso is known to be in NP ^ co-NP but NOT NP-complete. This was based on a hasty (incorrect) reading of [1]. Thank you all for correcting me on this point. 2) bdm@cs.anu.edu.au (Brendan McKay) claims that P algorithms for graph-iso are already known for a) graphs with bounded vertex degree b) graphs of bounded genus He has given no references to substantiate his claim. Could somebody out there help substantiate or refute his claim (preferably with references). 3) bdm@cs.anu.edu.au (Brendan McKay) claims that my algorithm does not work properly on large classes of graphs which are of interest in various mathematical work. If what he says is true, this weakens my argument in favor of the Gismondi-Swart strategy. Clearly, the algorithm I gave does not work on graphs where all vertices have the same degree. This problem can be easily patched (see below). Can somebody out there point out some classes of graphs for which my algorithm usually fails, which cannot be easily patched? 4) My original argument still seems correct. ==================================================================== MY ORIGINAL POST ==================================================================== I have some experience in writing programs to attack the graph isomorphism problem in practice. The application is from electrical CAD: determine if 2 circuits, represented by 2 netlists with the wires and part instances unlabeled, represent the same actual circuit. I have used a (well known?) randomized algorithm quite successfully, which usually runs in time n*log(n). I present the details of the algorithm below. The relevance of my algorithm to the discussion of the paper of Swart et. al. is this: the existence of this successful randomized algorithm for graph isomorphism, together with the fact that graph isomorphism is known to be in NP ^ co-NP but NOT NP-complete, is strong evidence that graph isomorphism is an easy problem, probably in P. It would be very odd indeed if this problem could be solved by a randomized algorithm in time n*log(n) or n^3, but by a deterministic algorithm in time no better than exponential. It is my experience that most problems which are in P can be phrased as linear programming problems (this is often not the most effective procedure for solving them, but it is still polynomial). Since graph isomorphism is very likely in P, it can probably be attacked with linear programming. It seems very plausible to me that a linear programming attack on graph isomorphism can be generalized to graph sub-isomorphism, which would, of course, prove P=NP. My point is this: whether or not the Swart algorithm and proof are correct in detail, I think that the strategy of their attack has a good chance of success. We in network-land should not dismiss their work at the first appearance of an error or inconsistency, but instead try to patch their proof and their algorithm. If this proves hopeless, we should try to invent another attack based on their strategy: use linear programming to attack graph isomorphism, then generalize to graph sub-isomorphism. Comments? ------------------------------------------------------------------------ DESCRIPTION of a RANDOMIZED GRAPH ISOMORPHISM ALGORITHM The idea is to hash both graphs with a carefully, but randomly constructed hash function. If the hashes are different, this proves that the graphs are different. If the hashes are the same, this suggests, but does not prove, that the graphs are the same. By using a wider hash word, and by applying successive hash functions that involve successively larger macroscopic properties of the graph, it is possible to drive the probability of random aliasing toward 0 (exponentially, with the amount of computation). In detail, the algorithm is: bool graphs_isomorphic(g1,g2) graph g1,g2; { int hash_radius,hash1,hash2; for(hash_radius=0;hash_radius<RADIUS_LIMIT;++hash_radius) { randomize_hash_function(); /* consult a random oracle to determine hash function details */ hash1 = hash_graph(g1,hash_radius); hash2 = hash_graph(g2,hash_radius); if(hash1 != hash2) { return(FALSE); /* proved not isomorphic */ } } return(TRUE); /* seem isomorphic, but not proved */ } int hash_graph(g,hash_radius) graph g; int hash_radius; { int hash,i; node n; edge e; /* initialize node hash values */ for_all_nodes_of_graph(g,n) { n->old_node_hash = node_degree(n); } /* hash the nodes with neighbors */ for(i=0;i<hash_radius;++i) { for_all_nodes_of_graph(g,n) { n->new_node_hash = n->old_node_hash; for_all_edges_of_node(n,e) { n->new_node_hash = n->new_node_hash + hash(other_node_of_edge(e,n)->old_node_hash); } } for_all_nodes_of_graph(g,n) { n->old_node_hash = n->new_node_hash; } } hash = 0; /* compute a summary hash value */ for_all_nodes_of_graph(g,n) { hash = hash + n->old_node_hash; } return(hash); } Notice that this algorithm can be systematically fooled into thinking that certain degenerate graphs are the same when they are not; but these cases never seem to come up in practice. I believe most or all if these degeneracies can be fixed by a slightly more complicated algorithm which runs slower. I first learned of this algorithm from Paul Filseth. I can think of no way to generalize this algorithm to solve graph sub-isomorphism (which is NP-complete). ==================================================================== DISCUSSION of a POST by bdm@cs.anu.edu.au (Brendan McKay) ==================================================================== In article 5857 of comp.theory, bdm@cs.anu.edu.au (Brendan McKay) wrote a detailed response to my post. I have attempted to clarify several points with him by e-mail, but I have received no reply. Therefore, I must discuss his post without benefit of discussion with him. In article 5857 of comp.theory, bdm@cs.anu.edu.au (Brendan McKay) writes: >In article G88@research.canon.oz.au, naylor@research.canon.oz.au (William Naylor) writes: >>I have some experience in writing programs to attack the graph isomorphism >>problem in practice. The application is from electrical CAD: determine >>if 2 circuits, represented by 2 netlists with the wires and part instances >>unlabeled, represent the same actual circuit. I have used a (well known?) >>randomized algorithm quite successfully, which usually runs in time n*log(n). >>I present the details of the algorithm below. > >First off is that your space of sample graphs is highly biased towards >easy graphs. Graphs made from circuits nearly always have a low genus >and only have vertices of low degree. My algorithm does not rely on the graphs having low genus or low degree. >Isomorphism for graphs of >bounded genus or bounded degree is already known to be in P. Could somebody out there help substantiate or refute this claim (preferably with references). >These graphs are also highly likely to have vertices with a mixture >of degrees, and vertices easily distinguishable from other vertices. The algorithm I presented relies on "enough" vertices having different degrees. How much is enough? This problem can be patched (see below). >By the way, your algorithm is O(# edges), which is more than >O(n log n) for dense graphs. If n is the amount of data presented to the algorithm, rather than the number of vertices, it is O(n*log(n)) >>The relevance of my algorithm to the discussion of the paper of >>Swart et. al. is this: >>the existence of this successful randomized algorithm for graph isomorphism, > >There is no evidence for such an algorithm. Yours only works for >restricted classes of graphs. The original algorithm I presented fails to distinguish graphs of the same order with vertices all having the same degree. This is a very small fraction of all possible graphs. Anyway, this problem can be patched (see below). Does anybody out there see any classes of graphs that can fool my (patched) algorithm? >>together with the fact that graph isomorphism is known to be in >>NP ^ co-NP but NOT NP-complete, is strong evidence that graph isomorphism is > >Graph isomorphism is not known to be in co-NP, as far as I know. Thanks for the correction. ... >Two comments: >(a) Just by looking at the number of nonisomorphic graphs ( about > 2^(n(n-1)/2)/n! ) you can see that your hash words need to be > at least about n(n-1)/2 bits long before you have a chance of > distinguishing all pairs of graphs. Remember, this is a randomized algorithm, not a deterministic algorithm. The objective is NOT to get an answer which is definitely correct, but rather to get an answer which is incorrect with probability <= P for each and every pair of graphs, where P is some constant < 1 (like 0.5). The algorithm can be run multiple times to reduce the probability of error. If we use a hash word of 32 bits, the probability of error due to aliasing is P = 2^-32 regardless of the size of the graphs involved. This is sufficiently small for most purposes. Nothing like this is possible with a deterministic algorithm. If a deterministic algorithm ever fails for a pair of graphs, it always fails for that pair (P=1). My randomized algorithm constructs its hash function by consultation with a random oracle. A pair that fails in one call to my algorithm may succeed on the next call because of different random numbers. The hope is that for each and every pair, the probability of failure can be <= P. >(b) As far as I can see, this fails to distinguish any two regular > graphs of the same order and degree. See the patch below for a fix to this "bug". > Probably many other pairs > also. Could somebody post such pairs? >>Notice that this algorithm can be systematically fooled into thinking >>that certain degenerate graphs are the same when they are not; >>but these cases never seem to come up in practice. > >Maybe not in your practice, but people who use graph isomorphism tests >in mathematical work (especially in coding and design theory) will find >that your method fails to work on almost every example. Can somebody please post some failure cases for the patched algorithm given below? >>I believe >>most or all if these degeneracies can be fixed by a slightly more complicated >>algorithm which runs slower. > >I would be very pleased if this is true, but frankly I doubt it. The original algorithm I presented fails to distinguish graphs of the same order with vertices all having the same degree. This can be patched as follows: 1) Construct N different hashes (as described below) for both graphs, where N is the number of vertices. 2) Try to find a matching between the N hashes from one graph to the N hashes of the other (this can be done in N*log(N) time). If a matching does not exist, this proves that the graphs are different. If a matching exists, this suggests, but does not prove, that the graphs are identical. 3) Construct the hashes as in "hash_graph" above, with these changes: a) Always use hash_radius = N, with N = number of vertices b) Rather than initializing all node hash values with the node degree, initialize a selected node with hash(node_degree(node)) and all others as before, with node_degree(node). This introduces an artificial initial distinction between the graph vertices, which propagates throughout the graph during the hash process. Select each possible node as "special" for the N hash values. This certainly fixes the degeneracy described above. I cannot think of any pairs of graphs which can consistently fool this algorithm. If anybody can see such a pair, could you please post it? >Methods that are as strong or stronger date back to the 1960s. >Modern approaches supplement invariant computations with backtracking. >Several programs exist that can handle thousands of vertices >(one is mine, enquiries welcome). Could somebody post some references to these algorithms? My (unpatched) program can work on graphs with 50,000 vertices in 5-10 minutes. I have been aware of this patch for several years, but I have never found it necessary to implement it for my application. REFERENCES [1] Garey & Johnson: Computers and Intractibility -- A Guide to the Theory of NP-Completeness, page 155 -- Will Naylor net: naylor@research.canon.oz.au mail: Canon Information Systems Research Australia phone: (61-2) 805-2921 P.O. Box 313 North Ryde, NSW 2113 fax: (61-2) 805-2929 Australia