home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.theory
- Path: sparky!uunet!munnari.oz.au!metro!mama!naylor
- From: naylor@research.canon.oz.au (William Naylor)
- Subject: randomized-hash graph-iso algorithm - more discussion
- Message-ID: <C0402I.L7p@research.canon.oz.au>
- Sender: news@research.canon.oz.au
- Organization: Canon Information Systems Research Australia
- Date: Thu, 31 Dec 1992 05:47:06 GMT
- Lines: 372
-
- A month ago I posted details of a randomized algorithm to attack the
- graph isomorphism problem, based on hashing both graphs with a
- carefully, but randomly constructed hash function.
- I argued that the success of this algorithm
- in solving very large practical problems suggests that graph isomorphism
- is probably in P. Then I argued that existence of a P algorithm for
- graph isomorphism probably implies the existence of a P algorithm
- for graph sub-isomorphism. My conclusion was that the Gismondi-Swart
- strategy for proving P=NP has a reasonable chance of success,
- even if their details are wrong.
-
- My post is repeated below, for convenience of the reader.
-
- Several people responded to my post; I summarize the responses here, and
- then give a detailed discussion of a particular response below.
-
- To summarize:
-
- 1) [1] states that graph-iso is in NP, but not known to be in P and not
- known to be NP-complete.
- As far as I know, this is still the state of knowledge
- (before the Gismondi- Swart work).
-
- I had claimed that graph-iso is known to be in NP ^ co-NP but
- NOT NP-complete. This was based on a hasty (incorrect) reading of [1].
- Thank you all for correcting me on this point.
-
- 2) bdm@cs.anu.edu.au (Brendan McKay) claims that P algorithms for
- graph-iso are already known for
-
- a) graphs with bounded vertex degree
- b) graphs of bounded genus
-
- He has given no references to substantiate his claim.
-
- Could somebody out there help substantiate or refute his claim
- (preferably with references).
-
- 3) bdm@cs.anu.edu.au (Brendan McKay) claims that my algorithm does
- not work properly on large classes of graphs which are of interest in
- various mathematical work. If what he says is true, this
- weakens my argument in favor of the Gismondi-Swart strategy.
-
- Clearly, the algorithm I gave does not work on graphs where
- all vertices have the same degree. This problem can be easily patched
- (see below).
-
- Can somebody out there point out some classes of graphs
- for which my algorithm usually fails, which cannot be easily patched?
-
- 4) My original argument still seems correct.
-
-
- ====================================================================
- MY ORIGINAL POST
- ====================================================================
-
- I have some experience in writing programs to attack the graph isomorphism
- problem in practice. The application is from electrical CAD: determine
- if 2 circuits, represented by 2 netlists with the wires and part instances
- unlabeled, represent the same actual circuit. I have used a (well known?)
- randomized algorithm quite successfully, which usually runs in time n*log(n).
- I present the details of the algorithm below.
-
- The relevance of my algorithm to the discussion of the paper of
- Swart et. al. is this:
- the existence of this successful randomized algorithm for graph isomorphism,
- together with the fact that graph isomorphism is known to be in
- NP ^ co-NP but NOT NP-complete, is strong evidence that graph isomorphism is
- an easy problem, probably in P. It would be very odd indeed if this
- problem could be solved by a randomized algorithm in time n*log(n) or n^3,
- but by a deterministic algorithm in time no better than exponential.
-
- It is my experience that most problems which are in P can be phrased as
- linear programming problems (this is often not the most effective
- procedure for solving them, but it is still polynomial).
- Since graph isomorphism is very likely in P, it can probably be attacked
- with linear programming. It seems very plausible to me that a
- linear programming attack on graph isomorphism can be generalized to
- graph sub-isomorphism, which would, of course, prove P=NP.
-
- My point is this: whether or not the Swart algorithm and proof are
- correct in detail, I think that the strategy of their attack has a good
- chance of success. We in network-land should not dismiss their work
- at the first appearance of an error or inconsistency, but instead
- try to patch their proof and their algorithm. If this proves hopeless,
- we should try to invent another attack based on their strategy:
- use linear programming to attack graph isomorphism, then generalize
- to graph sub-isomorphism.
-
- Comments?
-
-
- ------------------------------------------------------------------------
-
- DESCRIPTION of a RANDOMIZED GRAPH ISOMORPHISM ALGORITHM
-
- The idea is to hash both graphs with a carefully, but randomly
- constructed hash function.
- If the hashes are different, this proves that the graphs are different.
- If the hashes are the same, this suggests, but does not prove, that
- the graphs are the same. By using a wider hash word, and by applying
- successive hash functions that involve successively larger macroscopic
- properties of the graph, it is possible to drive the probability
- of random aliasing toward 0 (exponentially, with the amount of computation).
-
- In detail, the algorithm is:
-
-
- bool graphs_isomorphic(g1,g2)
- graph g1,g2;
- {
- int hash_radius,hash1,hash2;
-
- for(hash_radius=0;hash_radius<RADIUS_LIMIT;++hash_radius)
- {
- randomize_hash_function(); /* consult a random oracle to determine
- hash function details */
-
- hash1 = hash_graph(g1,hash_radius);
- hash2 = hash_graph(g2,hash_radius);
-
- if(hash1 != hash2)
- {
- return(FALSE); /* proved not isomorphic */
- }
- }
-
- return(TRUE); /* seem isomorphic, but not proved */
- }
-
- int hash_graph(g,hash_radius)
- graph g;
- int hash_radius;
- {
- int hash,i;
- node n;
- edge e;
-
- /* initialize node hash values */
- for_all_nodes_of_graph(g,n)
- {
- n->old_node_hash = node_degree(n);
- }
-
- /* hash the nodes with neighbors */
- for(i=0;i<hash_radius;++i)
- {
- for_all_nodes_of_graph(g,n)
- {
- n->new_node_hash = n->old_node_hash;
-
- for_all_edges_of_node(n,e)
- {
- n->new_node_hash = n->new_node_hash +
- hash(other_node_of_edge(e,n)->old_node_hash);
- }
- }
-
- for_all_nodes_of_graph(g,n)
- {
- n->old_node_hash = n->new_node_hash;
- }
- }
-
- hash = 0;
-
- /* compute a summary hash value */
- for_all_nodes_of_graph(g,n)
- {
- hash = hash + n->old_node_hash;
- }
-
- return(hash);
- }
-
-
- Notice that this algorithm can be systematically fooled into thinking
- that certain degenerate graphs are the same when they are not;
- but these cases never seem to come up in practice. I believe
- most or all if these degeneracies can be fixed by a slightly more complicated
- algorithm which runs slower.
-
- I first learned of this algorithm from Paul Filseth.
-
- I can think of no way to generalize this algorithm to solve
- graph sub-isomorphism (which is NP-complete).
-
-
- ====================================================================
- DISCUSSION of a POST by bdm@cs.anu.edu.au (Brendan McKay)
- ====================================================================
-
- In article 5857 of comp.theory, bdm@cs.anu.edu.au (Brendan McKay) wrote
- a detailed response to my post. I have attempted to clarify several
- points with him by e-mail, but I have received no reply. Therefore,
- I must discuss his post without benefit of discussion with him.
-
- In article 5857 of comp.theory, bdm@cs.anu.edu.au (Brendan McKay) writes:
-
- >In article G88@research.canon.oz.au, naylor@research.canon.oz.au (William Naylor) writes:
- >>I have some experience in writing programs to attack the graph isomorphism
- >>problem in practice. The application is from electrical CAD: determine
- >>if 2 circuits, represented by 2 netlists with the wires and part instances
- >>unlabeled, represent the same actual circuit. I have used a (well known?)
- >>randomized algorithm quite successfully, which usually runs in time n*log(n).
- >>I present the details of the algorithm below.
- >
- >First off is that your space of sample graphs is highly biased towards
- >easy graphs. Graphs made from circuits nearly always have a low genus
- >and only have vertices of low degree.
-
- My algorithm does not rely on the graphs having low genus or low degree.
-
- >Isomorphism for graphs of
- >bounded genus or bounded degree is already known to be in P.
-
- Could somebody out there help substantiate or refute this claim
- (preferably with references).
-
- >These graphs are also highly likely to have vertices with a mixture
- >of degrees, and vertices easily distinguishable from other vertices.
-
- The algorithm I presented relies on "enough" vertices having different
- degrees. How much is enough? This problem can be patched (see below).
-
- >By the way, your algorithm is O(# edges), which is more than
- >O(n log n) for dense graphs.
-
- If n is the amount of data presented to the algorithm, rather than the
- number of vertices, it is O(n*log(n))
-
- >>The relevance of my algorithm to the discussion of the paper of
- >>Swart et. al. is this:
- >>the existence of this successful randomized algorithm for graph isomorphism,
- >
- >There is no evidence for such an algorithm. Yours only works for
- >restricted classes of graphs.
-
- The original algorithm I presented fails to distinguish graphs of the
- same order with vertices all having the same degree. This is
- a very small fraction of all possible graphs. Anyway, this problem can
- be patched (see below).
-
- Does anybody out there see any classes of graphs that can fool my
- (patched) algorithm?
-
- >>together with the fact that graph isomorphism is known to be in
- >>NP ^ co-NP but NOT NP-complete, is strong evidence that graph isomorphism is
- >
- >Graph isomorphism is not known to be in co-NP, as far as I know.
-
- Thanks for the correction.
-
- ...
-
- >Two comments:
- >(a) Just by looking at the number of nonisomorphic graphs ( about
- > 2^(n(n-1)/2)/n! ) you can see that your hash words need to be
- > at least about n(n-1)/2 bits long before you have a chance of
- > distinguishing all pairs of graphs.
-
- Remember, this is a randomized algorithm, not a deterministic algorithm.
- The objective is NOT to
- get an answer which is definitely correct, but rather to get an
- answer which is incorrect with probability <= P for each and every pair of
- graphs, where P is some constant < 1 (like 0.5). The algorithm
- can be run multiple times to reduce the probability of error.
-
- If we use a hash word of 32 bits, the probability of error due to
- aliasing is
-
- P = 2^-32
-
- regardless of the size of the graphs involved. This is sufficiently
- small for most purposes.
-
- Nothing like this is possible with a deterministic algorithm. If a
- deterministic algorithm
- ever fails for a pair of graphs, it always fails for that pair (P=1).
- My randomized algorithm constructs its hash function by consultation
- with a random oracle. A pair that fails in one call to my algorithm
- may succeed on the next call because of different random numbers.
- The hope is that for each and every pair, the probability of failure
- can be <= P.
-
- >(b) As far as I can see, this fails to distinguish any two regular
- > graphs of the same order and degree.
-
- See the patch below for a fix to this "bug".
-
- > Probably many other pairs
- > also.
-
- Could somebody post such pairs?
-
- >>Notice that this algorithm can be systematically fooled into thinking
- >>that certain degenerate graphs are the same when they are not;
- >>but these cases never seem to come up in practice.
- >
- >Maybe not in your practice, but people who use graph isomorphism tests
- >in mathematical work (especially in coding and design theory) will find
- >that your method fails to work on almost every example.
-
- Can somebody please post some failure cases for the patched algorithm given
- below?
-
- >>I believe
- >>most or all if these degeneracies can be fixed by a slightly more complicated
- >>algorithm which runs slower.
- >
- >I would be very pleased if this is true, but frankly I doubt it.
-
- The original algorithm I presented fails to distinguish graphs of the
- same order with vertices all having the same degree. This can be patched
- as follows:
-
- 1) Construct N different hashes (as described below) for both graphs,
- where N is the number of vertices.
-
- 2) Try to find a matching between the N hashes from one graph to the
- N hashes of the other (this can be done in N*log(N) time).
- If a matching does not exist, this proves
- that the graphs are different. If a matching exists, this suggests,
- but does not prove, that the graphs are identical.
-
- 3) Construct the hashes as in "hash_graph" above, with these changes:
-
- a) Always use hash_radius = N, with N = number of vertices
-
- b) Rather than initializing all node hash values with the
- node degree, initialize a selected node with
-
- hash(node_degree(node))
-
- and all others as before, with
-
- node_degree(node).
-
- This introduces an artificial initial distinction between the
- graph vertices, which propagates throughout the graph during
- the hash process.
-
- Select each possible node as "special" for the N hash values.
-
- This certainly fixes the degeneracy described above. I cannot
- think of any pairs of graphs which can consistently fool this algorithm.
- If anybody can see such a pair, could you please post it?
-
- >Methods that are as strong or stronger date back to the 1960s.
- >Modern approaches supplement invariant computations with backtracking.
- >Several programs exist that can handle thousands of vertices
- >(one is mine, enquiries welcome).
-
- Could somebody post some references to these algorithms?
-
- My (unpatched) program can work on graphs with 50,000 vertices in 5-10 minutes.
-
- I have been aware of this patch for several years, but I have never
- found it necessary to implement it for my application.
-
-
- REFERENCES
-
- [1] Garey & Johnson: Computers and Intractibility -- A Guide to the
- Theory of NP-Completeness, page 155
-
- --
- Will Naylor net: naylor@research.canon.oz.au
- mail: Canon Information Systems Research Australia
- phone: (61-2) 805-2921 P.O. Box 313 North Ryde, NSW 2113
- fax: (61-2) 805-2929 Australia
-