- All timings run on an A1000 w/ 68010, 4Meg of fast ram & 1/2Meg of
- chip ram, off of a Supra 60 drive. My standard work environment was in
- place: interlaced morerow'ed WB screen and 50K stack, with the
- following active processes:
-
-   Task  Pri  Address  Command                 Directory
-      1    0   251948  jobs                    src:treewalk
-      2    0   250878  emacs                   src:treewalk
-      3    0   213e40  SupraMount
-      4   20   24fb38  bin:startprogs/machII   RAM DISK:
-      6    0   2552f0  bin:startprogs/wicon    RAM DISK:
-
- In addition, snipit, ARexx 1.10, srt, FF, conman, installbeep the wb
- and mymenu were in place.
-
- The current directory was src:treewalk, the directory tree scanned was
- rooted at src:tmp, consisting of about 20Meg of random stuff. It
- wasn't changed throughout.
-
- Find is a PA find, available on fish disk 197. Treewalk is the binary
- included in this distribution. files is Lattice's files command,
- version 1.01, from Lattice C 5.02.
-
- Though files & treewalk are residentable, find is not. Therefore, all
- three commands were run non-resident to even the field. The object was
- to measure the algorithms used, not the implementation details.
-
-
- First test: walking a large tree.
-
- Timings labeled "no output" were made from Rexx, via a script that
- ran each command 10 times, following each run by the time elapsed
- during the run, measured in seconds. The output was run through "grep
- -v src:" to throw away all output but timings and error messages.
-
- Timings labeled "output" consisted of one run, with the output going
- to standard out.
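
The "no output" procedure above can be sketched in Python. This is only an illustration of the method, not the Rexx script that was actually used: the function names time_runs and average are made up for the example, and the command in the demo is a stand-in. Standard output is discarded while error messages are left alone, which approximates the effect of the grep filter.

```python
import subprocess
import sys
import time


def time_runs(command, runs=10):
    """Run command `runs` times, discarding stdout, and return seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # stderr is inherited so that error messages still show up.
        subprocess.run(command, stdout=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings


def average(timings):
    """Arithmetic mean of a non-empty list of timings."""
    return sum(timings) / len(timings)


if __name__ == "__main__":
    # Stand-in command (an assumption): start the interpreter and do nothing.
    results = time_runs([sys.executable, "-c", "pass"], runs=3)
    print(" ".join(f"{t:.2f}" for t in results))
    print(f"average: {average(results):.3f}")
```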
-
-
- find src:tmp -print, output: 115.62, no output:
- 28.04 28.00 28.14 28.08 28.10 28.02 28.02 28.08 28.00 28.18
- average: 28.066
-
- treewalk dir src:tmp, output: 122.48, no output:
- 46.66 46.54 46.78 46.60 46.48 46.54 46.74 46.72 46.68 46.46
- average: 46.620
-
- files src:tmp, output: 240.70, no output:
- 163.58 163.64 163.50 163.30 163.54 163.96 164.10 163.44 163.00 163.36
- average: 163.542
-
- note: files complained about multiple directories having too many files
- or being empty. It doesn't state which.
-
-
- Second test: listing files from a large file system that need to be
- backed up.
-
- This was lifted from my backup script, which normally processes the
- output. To make this more realistic, the output was run through "grep -v
- src:", which matches the actual use during backup (being run into
- execio for processing by the Rexx backup script). Once again, 10
- iterations were run. Note: only two files actually met the entire
- selection criteria, which isn't unusual.
-
-
- find src:tmp -type f -newer src:last-backup ! -name *~ ! -name *.o -print
- 24.88 25.06 25.06 25.08 25.12 25.10 25.16 25.10 25.22 25.20
- average: 25.098
-
- treewalk dir src:tmp filter "file && src:last-backup.date < date
- && !(filename *= '*~' || filename *= '*.o')"
- 25.24 25.42 25.32 25.48 25.48 25.52 25.34 25.24 25.42 25.38
- average: 25.384
-
- Files is unable to perform this search, as it lacks the ability to
- test for files not matching a name.
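
For readers who know neither tool, the selection both commands express (a regular file, newer than the last-backup marker, whose name ends in neither "~" nor ".o") can be sketched in Python. This is an illustration of the criteria only, not how find or treewalk evaluate them; needs_backup and files_to_back_up are names invented for the example.

```python
import fnmatch
import os


def needs_backup(path, reference_mtime, excluded=("*~", "*.o")):
    """True for a regular file newer than reference_mtime with no excluded name."""
    if not os.path.isfile(path):
        return False
    if os.path.getmtime(path) <= reference_mtime:
        return False
    name = os.path.basename(path)
    return not any(fnmatch.fnmatchcase(name, pattern) for pattern in excluded)


def files_to_back_up(root, reference_path):
    """Return the sorted paths below root that pass needs_backup."""
    reference_mtime = os.path.getmtime(reference_path)
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if needs_backup(path, reference_mtime):
                selected.append(path)
    return sorted(selected)
```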
-
-
- Third test: cleaning up a large working directory.
-
- A copy of the tree was created, and the copy is deleted in two passes:
- first, all files matching "*.o" were deleted, and then everything else
- was deleted. The deletion utility is "rm", which is a version of
- delete without the limit on the number of arguments. This allows
- treewalk to not have to invoke the command multiple times. While this
- may seem unfair to find, part of the purpose of creating treewalk was
- to overcome this disability in find. To make the test realistic, rm
- was resident for all runs of the test. Files has an option to cause
- file deletion which was used so that files would run in reasonable
- time. The sources to "rm" are available upon request.
-
- To avoid having to copy the tree multiple times, this test was run
- only one time for each command. Since the multiple run tests show
- little variance, it isn't expected that these will show much variance
- either.
-
- time
- find tmp:mg -name *.o -exec rm "{}" ";" 53.76
- find tmp:mg -exec rm "{}" ";" 150.68
-
- Note: Find doesn't support AmigaDOS wildcarding.
- Note: Find failed to delete any directories during the second phase of
- the trial, even though it deleted all regular files.
-
- treewalk dir tmp:mg filter "filename#='#?.o'" rm 24.40
- treewalk post dir tmp:mg rm 50.66
-
- Note: to ensure that directories are seen after files, treewalk needs
- to be told to do a postorder traversal of the tree during the second
- phase.
- Note: treewalk did not delete the top-level directory, but this is to
- be expected from its documentation.
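
Why the second phase needs a postorder traversal can be seen in a small Python sketch: a directory can only be removed once it is empty, so it has to be visited after everything inside it. This is not treewalk's code. The names postorder and remove_contents are invented for the example, and leaving the root in place simply mirrors the behavior noted above.

```python
import os


def postorder(root):
    """Yield each path below root, every directory after its contents.

    The root itself is not yielded. Symbolic links to directories are
    neither followed nor reported in this sketch.
    """
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            yield os.path.join(dirpath, name)
        if dirpath != root:
            yield dirpath


def remove_contents(root):
    """Delete everything below root, leaving root itself in place."""
    for path in postorder(root):
        if os.path.isdir(path):
            os.rmdir(path)  # empty by now: its contents were yielded earlier
        else:
            os.remove(path)
```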
-
- files -rerase -name #?.o tmp:mg 353.14
- files -rerase tmp:mg 159.88
-
- Note: files complains that it can't delete certain directories during
- the first phase. This is odd and somewhat annoying.
-
- As a couple of asides, I ran the filtered treewalk file removal,
- forcing treewalk to run a single copy of rm for each file to delete
- (the same behavior that find uses) to gain some measure of how
- important the ability to stack file names on a command is. I then ran
- the full delete using the standard AmigaDOS delete command, to see how
- that compared with the other cases.
-
- treewalk dir tmp:mg filter "filename#='#?.o'" single rm 43.38
- delete tmp:mg all quiet 32.64
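
The difference between the two filtered treewalk runs (24.40 with names stacked on one command, 43.38 with one command per file) comes down to how many processes get started. A minimal Python sketch of the two strategies follows. It only builds the argument lists and runs nothing, and build_invocations is a name invented for the example.

```python
def build_invocations(command, paths, single=False):
    """Return the argument vectors needed to apply command to every path.

    With single=True each path gets its own invocation, which is what find
    does with -exec. Otherwise all paths are stacked onto one invocation.
    """
    paths = list(paths)
    if single:
        return [[command, path] for path in paths]
    return [[command, *paths]] if paths else []


if __name__ == "__main__":
    names = ["a.o", "b.o", "c.o"]
    print(len(build_invocations("rm", names)))               # one process
    print(len(build_invocations("rm", names, single=True)))  # one per file
```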
-
-
- Final note: program sizes.
-
- find 13044 ----rwed 19-Apr-89 02:29:43
- treewalk 19904 --p-rwed Today 21:37:31
- files 24096 --p-rwed 19-Apr-89 02:30:22
-
-
- Some statistics:
-
- Running time as a percentage of the slowest program. Times for
- multiple run tests are the average.
-
- program                files  find  treewalk  <aside>
- test
- 1 output                 100    48        51
- 1 no output              100    17        29
- 2               not possible    99       100
- 3 filtered               100    15         7       12
- 3 unfiltered             100    94        32       20
- total 1 & 3              100    38        26
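
The per-test rows can be recomputed from the timings reported earlier. The Python sketch below does so, taking the slowest program in each test as 100 and rounding to the nearest whole percent. The dictionary layout and the name percent_of_slowest are choices made for the example; the numbers are copied from the test results above, with "aside" holding the single-rm and delete timings.

```python
TIMES = {
    "1 output":     {"files": 240.70, "find": 115.62, "treewalk": 122.48},
    "1 no output":  {"files": 163.542, "find": 28.066, "treewalk": 46.620},
    "2":            {"find": 25.098, "treewalk": 25.384},
    "3 filtered":   {"files": 353.14, "find": 53.76, "treewalk": 24.40,
                     "aside": 43.38},
    "3 unfiltered": {"files": 159.88, "find": 150.68, "treewalk": 50.66,
                     "aside": 32.64},
}


def percent_of_slowest(times):
    """Express each running time as a whole percentage of the slowest one."""
    slowest = max(times.values())
    return {name: round(100 * t / slowest) for name, t in times.items()}


if __name__ == "__main__":
    for test, times in TIMES.items():
        print(test, percent_of_slowest(times))
```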
-
-
- Conclusions:
-
- It should be clear that files is the least worthwhile tool of the lot.
- It's far slower than either of the other two, not as flexible, and
- much larger. Its inability to distinguish between an empty directory
- and too many files in a directory is a serious handicap for unattended
- use on large devices. That source is available for the other two tools,
- but not for files, doesn't help. Finally, its insistence on blaming
- Lattice for its existence every time it starts just adds insult to
- injury.
-
- Find appears to do the actual directory scanning faster than
- treewalk, but does most everything else slower. Possibly moving to a
- newer compiler technology would change this lack of speed. However,
- its inability to execute a command with multiple file arguments seems
- to be a major performance hit, and that appears to be inherent in its
- user interface, and not solvable without a major redesign (i.e. -
- treewalk). It is less flexible than treewalk, not having the ability
- to do things like select all files that were last modified on a
- particular date. However, it is smaller, which could be a benefit in
- disk-tight situations.
-
- Treewalk does ok for speed, but not wonderfully. In particular, if
- there is no filtering and the output is going to memory instead of the
- console, it runs slightly faster than 1/2 the speed of find. This cost
- is probably incurred by 1) not using the stack to store the visitation
- history, so as to avoid using up a vital resource, and 2) using a
- general treewalking routine instead of one that's inseparable from the
- program. However, its ability to select which files to process is
- better than either alternative. In particular, rather than choosing a
- small set of primitives about files and hardwiring them into the
- program, it allows the user to access the data in the file's
- FileInfoBlock, and manipulate it via C-like expressions. The addition
- of the ability to use ARexx macros as primitives is of unknown utility,
- but does allow treewalk to mimic the multiple-exec and the '-ok'
- features of find.
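
The first of the two suspected costs, keeping the visitation history off the program stack, can be illustrated with a short Python sketch. It shows the general technique only and is not taken from treewalk's source. The name walk_iterative and the directory-at-a-time visiting order are choices made for the example.

```python
import os


def walk_iterative(root):
    """Yield every path below root without recursion.

    The directories still to be visited are kept in an ordinary list, so
    the depth of the tree does not affect how much program stack is used.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as entries:
            children = sorted(
                (e.name, e.path, e.is_dir(follow_symlinks=False))
                for e in entries
            )
        subdirs = []
        for _name, path, is_dir in children:
            yield path
            if is_dir:
                subdirs.append(path)
        # Reverse so the alphabetically first subdirectory is popped next.
        pending.extend(reversed(subdirs))
```

The price is the bookkeeping for the list itself, which a recursive walk gets for free from its call frames.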
-
- The bottom line is that there is no technical reason to use files.
- Find may be preferable in some cases, but treewalk is probably to be
- preferred in the general case.
-
-
- Copyright 1989, Mike W. Meyer
- These files may be used and redistributed under the terms
- found in the file LICENSE.
-