- { SORT: merge and sort multiple text files, replaces DOS SORT }
-
- { Copyright, 1988, 1989, by J. W. Rider }
-
- { Syntax :
-
- SORT [options] [<unsorted-file-spec> ... ]
-
- Where available options are:
-
- "/r" reverses the sense of the sort,
-
- "/+"# sorts the lines on the data starting in column #
- -- a second # defines the last column of the key field.
- -- subsequent #'s are ignored.
-
- "/b" ignores leading blanks (spaces, tabs) in determining
- the key.
-
- "/c" makes the sort case-insensitive 'a'='A',
-
- "/d" performs a "dictionary" rather than an "ASCII" sort: only
- alphanumerics count.
-
- "/f" interprets column numbers as "awk" field numbers,
- -- does not automatically assume "/b"
-
- "/h" displays the help message instead of sorting the input.
-
- "/k" outputs only the key, not the whole line.
-
- "/n" sorts the lines numerically vice alphabetically.
- -- "/n" automatically assumes "/b"; in fact, "/n" will search
- -- an entire key field for any hint of a number: "DOS1", "DOS2",
- -- and "DOS3.3" will all be sorted correctly.
-
- "/t"C makes "C" a field delimiter vice blanks. To include
- blanks, use "/t" without any character.
-
- "/u" eliminates multiple copies of identical lines
- -- "/u" might not work correctly if keys other than the whole
- -- original line are specified: "/+#","/c","/n","/b"
-
- If no file names are given, or a file name is '-', reads from standard input.
- Writes sorted lines to standard output.
-
- The first two options, "/r" and "/+"#, and the default use of standard
- input and output are provided for syntax compatibility with MS-DOS
- SORT. The other options and command-line file naming are extensions
- inspired by Unix implementations of SORT. }
-
- { If the heap is not large enough to completely hold the sorted file,
- or if there is a problem with input/output file names, then
- SORT displays an error message to 'CON' and returns ERRORLEVEL 1
- to the parent process. }
-
- { Even if there is not enough room in the heap to sort the file in
- memory, SORT tries to provide a partial sorting of the file. The
- output can be passed through SORT again until the file is completely
- sorted. }
-
- {$A+,B-,D+,E-,F-,I-,L-,N-,O-,R-,S-,V-}
- {$M 16384,0,655360}
-
- program sort;
-
- uses dos; { added to facilitate wildcards in filenames }
-
- { In fact, my personal SORT utility "uses" considerably more units than
- are listed here. However, my units are not standard. I would not expect
- the average user to have ever heard of them, nor the advanced user to
- even *want* to use them. Instead, I have extracted the components that
- SORT references and 'included' them. }
-
- const grain = 16; { heap granularity in bytes; must be a power of two }
- defaultcase = true; { some SORTs start out with different
- case sensitivity. Change it here. TRUE means case sensitive. }
-
- { Granularity of heap is set to 'grain' bytes, see Turbo Pascal
- Reference Guide, pg 199.
-
- SYSTEM.FREEMIN is also set to 16000 in SORTINIT. No investigation
- has been made as to whether or not these values are optimal.
- Failure to set FREEMIN large enough will cause a run-time error
- for files that are too large. }
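-
- { Illustrative sketch, not part of the original SORT and not called by
- it. The name DemoRoundToGrain is hypothetical, and the real allocation
- code lives in heapmem.fun, which is not shown. The sketch suggests why
- 'grain' must be a power of two: a request of n bytes can then be
- rounded up to a multiple of 'grain' with a mask rather than a division.
- With grain = 16, DemoRoundToGrain(20) = 32 and DemoRoundToGrain(16) = 16.
- Requests larger than 65520 bytes overflow the word result, and this
- sketch does not check for that. }
-
- function DemoRoundToGrain(n: word): word;
- begin
-   { adding grain-1 and then clearing the low bits rounds n up to the
-     next multiple of grain; valid only when grain is a power of two }
-   DemoRoundToGrain := (n + (grain - 1)) and not (grain - 1);
- end;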
-
- {$I sort.typ } { Defines the binary tree records. }
- {$I sort.var } { Defines all global variables. Includes
- procedure "sortinit".}
-
-
- { General functions: Some of these functions are written in such a way as
- to be generally useful. }
-
- {$I anstr.fun } { Strip all non-alphanumerics from string }
- {$I posnum.fun } { Searches a string for a numeric substring. }
- {$I bval.fun } { Extract a number from a string }
- {$I errexit.inc } { Type message; Set ErrorCode on exit. }
- {$I findfld.fun } { Find starting pos for "awk" field in string }
- {$I heapmem.fun } { Some "suggested" mods to GetMem and FreeMem. }
- {$I isatty.fun } { Determines whether input has been redirected. }
- {$I iswild.fun } { Does a string contain either "*" or "?" }
- {$I lcase.fun } { Changes all upper case chars in string to lower }
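-
- { Illustrative sketch, not part of the original SORT. The real "/n"
- logic is in posnum.fun and bval.fun, which are not shown, and the name
- DemoNumericKey is hypothetical. It takes the first run of digits in a
- key, allowing one decimal point, and converts it with Val. The keys
- "DOS1", "DOS2" and "DOS3.3" therefore yield 1, 2 and 3.3. Signs and
- exponents are ignored, and a key with no digits yields 0. }
-
- function DemoNumericKey(key: string): real;
- var i, j, code: integer;
-     seenpoint: boolean;
-     r: real;
- begin
-   DemoNumericKey := 0.0;
-   { skip to the first digit; give up if there is none }
-   i := 1;
-   while (i <= length(key)) and not (key[i] in ['0'..'9']) do inc(i);
-   if i > length(key) then exit;
-   { extend the run over further digits and at most one '.' }
-   j := i; seenpoint := false;
-   while (j < length(key)) and ((key[j+1] in ['0'..'9']) or
-         ((key[j+1] = '.') and not seenpoint)) do begin
-     if key[j+1] = '.' then seenpoint := true;
-     inc(j);
-   end;
-   { drop a trailing '.', as in "DOS3." }
-   if key[j] = '.' then dec(j);
-   val(copy(key, i, j - i + 1), r, code);
-   if code = 0 then DemoNumericKey := r;
- end;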
-
-
- { Special functions: These functions are unique to SORT and have
- questionable utility outside of this package. }
-
- {$I btsort.inc } { Binary tree manipulation, output included }
- {$I sortargs.inc } { Handle option switches from command line. }
- {$I sorthlp.fun } { Displays the Sort Help Message. }
- {$I stdinhdr.inc } { Prompt user for data if input not redirected }
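-
- { Illustrative sketch, not part of the original SORT. The real tree
- records are defined in sort.typ and handled by btsort.inc, neither of
- which is shown. The names DemoNodePtr, DemoNode, DemoInsert and DemoWalk
- are hypothetical. The sketch shows the general tree-sort technique:
- each line is inserted into an unbalanced binary search tree, and an
- in-order walk writes the lines back in ascending order. Equal lines go
- to the right, so duplicates keep their input order. Usage: set root to
- nil, call DemoInsert(root, s) for each line read, then call
- DemoWalk(root, output). Each node holds a full 256-byte string. DemoWalk
- recurses to the depth of the tree, which for already-sorted input equals
- the number of lines. Unlike SORT (see the notes above on partial
- sorting), the sketch does not handle heap exhaustion. }
-
- type DemoNodePtr = ^DemoNode;
-      DemoNode = record
-        line: string;
-        left, right: DemoNodePtr;
-      end;
-
- procedure DemoInsert(var root: DemoNodePtr; aline: string);
- var p, q: DemoNodePtr;
- begin
-   new(q);
-   q^.line := aline; q^.left := nil; q^.right := nil;
-   if root = nil then begin root := q; exit; end;
-   { walk down iteratively until an empty child slot is found }
-   p := root;
-   while true do
-     if aline < p^.line then begin
-       if p^.left = nil then begin p^.left := q; exit; end;
-       p := p^.left;
-     end
-     else begin
-       if p^.right = nil then begin p^.right := q; exit; end;
-       p := p^.right;
-     end;
- end;
-
- procedure DemoWalk(p: DemoNodePtr; var f: text);
- begin
-   { in-order: left subtree, this node, right subtree }
-   if p <> nil then begin
-     DemoWalk(p^.left, f);
-     writeln(f, p^.line);
-     DemoWalk(p^.right, f);
-   end;
- end;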
-
-
- { Process1file: is what it is all about. The text variable "fi" has
- already been assigned to the file to be read. This procedure starts
- from the beginning of the file, reads each line and stores it in the
- binary tree structure until no more lines can be read. The file "fi"
- is closed when we are done. }
-
- procedure process1file;
- begin
-   reset(fi);
-   while not eof(fi) do begin
-     readln(fi,s);  { read the next line into the global string "s" }
-     storeln(s);    { insert it into the binary tree }
-   end;
-   close(fi);
- end;
-
-
- { MAIN: Most of this is abstracted from studies that I have made concerning
- a standard method of handling multiple arguments and filenames in
- standard DOS filters. The general approach is to decompose the large task
- of merging multiple files into a series of single-file tasks. This
- skeleton can be modified to handle arguments in another manner. }
-
- begin { sort main }
-
- { SORT title line: If there are no arguments on the command line and
- standard input has not been redirected from a file, then we assume
- that the user may not be completely certain of the proper method of
- using SORT. The program provides a short message that indicates
- how further help may be obtained. The message is not sent to
- standard output, so the user will not be inconvenienced if he really
- does know what he is doing. }
-
- if (paramcount=0) and isatty(0) then begin
- assign(fe,'CON'); rewrite(fe); writeln(fe,
- ' SORT: Copyright 1988,1989, by J. W. Rider, use "SORT /h" for help.');
- close(fe); end;
-
-
- { SORT INITialization: Initializing variables in this manner is
- time-consuming, but the cost is trivial for sorting files of even
- moderate size. My goal for the final program is to have these
- variables as typed constants. }
-
- sortinit;
-
-
- { Get command line ARGUMENTS: This version of SORT requires all option
- switches be positioned before any file names. Once filenames have
- started being read, no options can be changed. }
-
- arguments;
-
-
- { Needs Help?: If the user specifies that help is desired or if an error
- is made in the command line option switches, then just list the help
- page and quit the program without error. }
-
- if helponly then begin helpmsg; close(output); close(fi); exit; end;
-
-
- { Key fields: If the user has not specified a subset of columns for the
- key, use the whole line. }
-
- if keycol=0 then keycol:=1; if keycol2=0 then keycol2:=255;
-
-
- { No file names: If no input files are specified, use standard input
- as the source. }
-
- if parmcount>paramcount then begin
-
- { If input has not been redirected, provide a little more instruction
- on how to get SORT to work correctly. In any case, just handle
- standard input like any other file. }
-
- stdinhdr; assign(fi,''); process1file; end
-
-
- { otherwise merge in each file listed on the command line }
-
- else for i:=parmcount to paramcount do
-
- { Use standard input if the command line filename is "-". }
- if paramstr(i)='-' then begin stdinhdr; assign(fi,''); process1file; end
-
- { Otherwise, open each file individually. }
- else begin
-
- { get complete file name and extension for entry }
- fstr:=fexpand(paramstr(i)); fsplit(fstr,d,n,x);
-
- { If a directory is referenced, merge all included files }
- if (n='') and (x='') then fstr:=fstr+'*.*';
-
- { Search for all reasonable files. Be sure to include
- directories. }
- findfirst(fstr,directory+readonly+archive,sr);
-
- { My preference for SORT was to ignore any attempt
- by the user to sort non-existent files. (This could
- be modified to detect such attempts. I just decided that
- there was little that my program could tell me about what
- files I wanted to sort.) }
- while doserror=0 do begin
- assign(fi,d+sr.name);
- if (sr.attr and directory)<>0 then
-
- { Search subdirectories only if they are specifically
- named. Do not perform recursive subdir searches. }
- if not iswild(fstr) then begin
-
- { This time through, it is safe to ignore directories }
- fstr:=fstr+'\*.*';
- fsplit(fstr,d,n,x); findfirst(fstr,readonly+archive,sr);
- while doserror=0 do begin
-   assign(fi,d+sr.name);
-   process1file;
-   findnext(sr);
- end; { while: files in the named subdirectory }
- end { if not iswild }
-
- { Ignore ambiguous directories }
- else findnext(sr)
-
- { Merge all non-directory files found. }
- else begin
-   process1file;
-   findnext(sr);
- end;
- end { while doserror=0 }
- end; { else: named files and directories }
-
-
- { after all files have been read, write the sorted tree out }
-
- retrieveln; close(output); { IMPORTANT!: close output before exit }
-
- { If the program is unable to guarantee that the output has been correctly
- sorted, a message is sent to the console and a DOS error code is
- returned. At worst, the output will be "partially" sorted. (Whatever
- *that* might mean.) }
-
- if sorterror then errexit('Output may not be completely sorted.');
-
- end. {program sort}
-