From: ajz@mentor.cc.purdue.edu.UUCP
Newsgroups: alt.sources
Subject: Re: Sorting Algorithm
Message-ID: <FooBarmentor.cc.purdue.edu>
Date: 27 Aug 90 07:18:24 GMT

I recently took a course where a 3-week project was based upon sorting, so
here are my notes...

Here's the background: each item to be sorted consisted of four 9-digit
integers, and a separate file told which of those 4 integers was the most
significant key and which was the least. The total time spent in the program
had to be less than 4 seconds for a 4000-item list (ours took about 1 second,
and although another .15 seconds could have been shaved off of it [i.e., by
killing all the error checking routines], I saw one that did it in .5 seconds).

The first thing you should know is that textbook routines like quicksort,
mergesort, and even radix sort all failed to sort a random 4000-item list in
under 4 seconds on our Gould NP-1.

This is what my team learned...
1) Traversing a linked list to locate an item is very time consuming.
2) Dumb O(n^2) routines have very little overhead, making them MUCH FASTER than
O(n log n) routines (which have high overheads) for small values of n (< 15).
3) An insertion sort [O(n^2)] is one of the fastest sorts for nearly sorted
data, especially when the out-of-place items are close to their proper
positions (see the sketch after this list).
4) Recursive routines took more time than non-recursive ones.
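
To make 3) concrete, here is a minimal sketch of an array insertion sort
(plain ints rather than the real four-key records, just for illustration):

/* Insertion sort an array of ints in place.  On nearly sorted input the
 * inner loop exits almost immediately, which is what makes this so fast
 * in that case.                                                         */
void insertion_sort(int a[], int n)
{
    int i, j, key;

    for (i = 1; i < n; i++) {
        key = a[i];
        /* shift larger items one slot right until key fits */
        for (j = i - 1; j >= 0 && a[j] > key; j--)
            a[j + 1] = a[j];
        a[j + 1] = key;
    }
}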

The best routine I saw chain hashed the values into a 33333-item array, then
rebuilt a linked list out of that array, and finally ran an insertion sort on
the list. For random data, there should be very few cases where more than 5%
of the items in the rebuilt linked list are out of place, and those out-of-place
items should be uniformly distributed within the list, so none of them has to
be moved more than 3 spaces to get back into its proper position.
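
In case the idea isn't clear, here is a rough sketch of that approach for a
single 9-digit key per item. The details are guesses, not the winning code:
in particular I'm assuming an order-preserving hash (key / 30000), so that
walking the buckets in index order yields the nearly sorted list, and the
re-insertion step below scans from the head of the list, where the real
routine could relink locally since nothing is more than a few places out of
position.

#include <stdlib.h>

#define TABLE_SIZE 33333

struct node {
    long key;               /* one 9-digit integer key          */
    struct node *next;      /* chain within a bucket, then list */
};

static struct node *table[TABLE_SIZE];

/* Order-preserving hash: bigger keys land in higher-numbered buckets,
 * so bucket order roughly follows key order.                          */
static int hash(long key)
{
    int h = (int)(key / 30000);
    return h < TABLE_SIZE ? h : TABLE_SIZE - 1;     /* clamp top bucket */
}

/* Chain hash one item into its bucket (prepend; order inside a bucket
 * doesn't matter, the final insertion sort cleans it up).             */
void hash_insert(long key)
{
    struct node *p = malloc(sizeof *p);
    int h = hash(key);

    p->key = key;
    p->next = table[h];
    table[h] = p;
}

/* Rebuild one linked list by splicing the bucket chains together in
 * bucket order; the result is a nearly sorted list.                  */
struct node *rebuild_list(void)
{
    struct node *head = NULL, **tail = &head;
    struct node *p;
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
        for (p = table[i]; p != NULL; p = p->next) {
            *tail = p;
            tail = &p->next;
        }
    *tail = NULL;
    return head;
}

/* Insertion sort the list: nodes already in order are just walked past;
 * an out-of-place node is unlinked and re-inserted by scanning from the
 * head of the list.                                                     */
struct node *list_insertion_sort(struct node *head)
{
    struct node *prev, *cur;

    if (head == NULL)
        return NULL;
    prev = head;
    cur = head->next;
    while (cur != NULL) {
        if (cur->key >= prev->key) {            /* already in order */
            prev = cur;
            cur = cur->next;
        } else {                                /* unlink and re-insert */
            struct node **q = &head;

            prev->next = cur->next;
            while ((*q)->key <= cur->key)
                q = &(*q)->next;
            cur->next = *q;
            *q = cur;
            cur = prev->next;
        }
    }
    return head;
}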

Since a 4000-item list only fills the hash table to about 12%, there should be
on average only about 1.1 probes needed for each item. The whole routine should
therefore take O(1.1n + n + c), where the second "n" comes from the insertion
sort having to traverse the entire list at least once, and the "c" is a term
depending upon how many items were out of place and how far they had to be
moved on average (which shouldn't be more than 4000 * 5% * 3 = 600 moves, i.e.
c = n * 5% * 3 = .15n). Adding those up, it should average O(1.1n + n + .15n) =
O(2.25n), with a best case of O(2n) and a worst case of O(2n^2).

The routine my team wrote insertion sorted every group of 12 integers, placed
each group into a larger array, and then merge sorted that array. This gave
an average time of O((n / 12) log (n / 12) + (12^2 / 4) * (n / 12)), a best
case of O((n / 12) log (n / 12) + n), and a worst case of O((n / 12) log
(n / 12) + (12^2 / 2) * (n / 12)).
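
In sketch form, that kind of routine looks something like this (again on
single long keys instead of the real four-key records, and not our actual
code):

#include <stdlib.h>
#include <string.h>

#define GROUP 12

/* insertion sort a[lo..hi-1] in place */
static void insertion_sort_range(long a[], int lo, int hi)
{
    int i, j;

    for (i = lo + 1; i < hi; i++) {
        long key = a[i];
        for (j = i - 1; j >= lo && a[j] > key; j--)
            a[j + 1] = a[j];
        a[j + 1] = key;
    }
}

/* merge the adjacent sorted runs a[lo..mid-1] and a[mid..hi-1] */
static void merge_runs(long a[], long tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof a[0]);
}

/* insertion sort each group of 12, then merge the sorted runs bottom up
 * (non-recursively; see lesson 4 above)                                 */
void hybrid_sort(long a[], int n)
{
    long *tmp = malloc((size_t)n * sizeof a[0]);
    int lo, width;

    for (lo = 0; lo < n; lo += GROUP)
        insertion_sort_range(a, lo, lo + GROUP < n ? lo + GROUP : n);

    for (width = GROUP; width < n; width *= 2)
        for (lo = 0; lo + width < n; lo += 2 * width)
            merge_runs(a, tmp, lo, lo + width,
                       lo + 2 * width < n ? lo + 2 * width : n);

    free(tmp);
}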

For 4000 items...
                First routine       Second routine
average time    O(9000)             O(14793.6)
best time       O(8000)             O(6793.6)
worst time      O(3.2 * 10^7)       O(26793.6)
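
(If you want to check those numbers, the throwaway program below just plugs
n = 4000 into the two formulas above; the logs need to be taken base 2 for
the second routine's figures to come out as listed.)

#include <stdio.h>
#include <math.h>

int main(void)
{
    double n = 4000.0;
    double g = n / 12.0;                    /* n / 12, the number of groups  */
    double m = g * (log(g) / log(2.0));     /* (n / 12) log (n / 12), base 2 */

    printf("first  routine: avg %g, best %g, worst %g\n",
           2.25 * n, 2.0 * n, 2.0 * n * n);
    printf("second routine: avg %g, best %g, worst %g\n",
           m + (12.0 * 12.0 / 4.0) * g,     /* + (12^2 / 4) * (n / 12) */
           m + n,                           /* + n                     */
           m + (12.0 * 12.0 / 2.0) * g);    /* + (12^2 / 2) * (n / 12) */
    return 0;
}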

I should stress that the worst case for the second routine is pretty
straightforward to find, but determining the worst case of the first routine
would be quite a task. In practice, though, chances are the second routine
will run under unfavorable conditions many more times than the first routine.

--

T. Tim Hsu              UUCP    ...pur-ee!ajz@mentor.cc.purdue.edu
                        ARPA    ajz@mentor.cc.purdue.edu
FAX 1 317 494 0566      BITNET  xajz@PURCCVM