home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!gatech!ncar!noao!amethyst!organpipe.uug.arizona.edu!news
- From: dave@cs.arizona.edu (Dave Schaumann)
- Newsgroups: comp.programming
- Subject: Re: Locating duplicates
- Message-ID: <1993Jan2.041658.21596@organpipe.uug.arizona.edu>
- Date: 2 Jan 93 04:16:58 GMT
- References: <C07FJo.4vq@ux1.cso.uiuc.edu>
- Sender: news@organpipe.uug.arizona.edu
- Reply-To: dave@cs.arizona.edu (Dave Schaumann)
- Organization: University of Arizona
- Lines: 27
- In-Reply-To: ceblair@ux1.cso.uiuc.edu (Charles Blair)
-
- In article <C07FJo.4vq@ux1.cso.uiuc.edu>, ceblair@ux1 (Charles Blair) writes:
- > Given a large array of integers. Is there a way to identify duplicates
- >which is substantially less work than sorting the array?
-
- Hmm... imagine we have an array with just two elements that are equal.
- Now assume we have some duplicate finder which does not sort the array.
- Assuming that the values and positions of the data are not related in
- some way, it seems reasonable to assume that we can arrange the values
- so that that the algorithm must try every possible combination before
- it finds the duplicate pair.
-
- Not an air-tight proof, but it seems reasonable to assert that no
- algorithm that asserts anything about the order of the values in
- the array can find a duplicate in less than O(n*(n-1)/2) time in the
- worst case, and clearly using an O(nlogn) sort and a linear scan
- is better than that. To beat sort-and-scan, you would need some kind
- of linear-or-better ordering algorithm that would allow you to scan
- for duplicates with a linear-or-better scanning algorithm.
-
- However, I think that generally you can do some things to speed up the
- average case of sort-and-scan. For instance, you could modify a heap
- sort algorithm so that it stops as soon as it finds a duplicate. Also,
- you could employ a hashing scheme based on the lowest bytes of the values,
- which would break up your data into 256 independant groups.
-
- --
- Dave Schaumann dave@cs.arizona.edu
-