home *** CD-ROM | disk | FTP | other *** search
-
-
- WildFile for MS-DOS systems
-
- A *IX SH style file globber written in C
- V1.13 Dedicated to the Public Domain
-
- April 22, 1991
- J. Kercheval
- [72450,3702] -- johnk@wrq.com
-
-
-
- 05-07-91
-
- This is V1.13 of Wildfile a *IX SH style file Globber.
-
- The purpose of this code is to enable the programmer to allow *real*
- wildcard file specification. The UNIX (*IX) SH style wildcard system
- has been around for decades and there is absolutely no good reason for
- it's lack of presence within MS/PC DOS and its associated tools.
-
- I submit this without copyright and with the clear understanding that
- this code may be used by anyone, for any reason, with any modifications
- and without any guarantees, warrantee or statements of usability of any
- sort.
-
-
- jbk
-
-
-
- *IX SH style wildcard file globbing
- ===================================
-
- The unix style of wildcard globbing (matching files to a wildcard
- specification) is quite a bit more flexible than the standard
- approach seen on the MS\PC DOS machines. The full power of *IX SH
- style regular expressions are allowed to specify a file name. For
- instance:
- "*t*" would match to the filenames test.doc, wet.goo,
- itsy.bib, foo.tic, etc.
- "th?[a-eg]." would match to any file without an extension,
- whose first two letters were "th", with any third
- letter and whose last letter was a,b,c,d,e or g.
- (ie. thug, thod, thud, etc.)
- "*" would match all filenames.
-
- The regular expression syntax is described in detail in the source
- code and below.
-
-
- Implementation
- ==============
-
- The implementation of the wildcard package is similar in type to the
- standard MS/PC DOS function calls for file searches. There is a
- find_firstfile call which begins a search initially and a
- find_nextfile call which continues a previous search. This approach
- will normally yeild a very quick port from existing *standard*
- implementations of wildcard file searching.
-
- The include file WILDFILE.H does a good job of describing the
- specifics required here.
-
-
- WD
- ==
-
- WD is a very quick implementation of a directory lister to try to
- show the usage of the wildfile module as intended. The program is a
- fully functional program complete with usage messages and command
- line argument parsing.
-
-
- Languages
- =========
-
- WILDFILE (and its associated module MATCH) were developed and
- compiled using both MicroSoft C V6.00A and Borland C++.
-
-
-
- ============================================================================
-
-
- 05-07-91
-
- Wildfile uses MATCH V1.10 modified for the specific nits required
- the MS/PC DOS environment. The documentation for MATCH is included
- for convenience.
-
- jbk
-
-
- ============================================================================
-
-
- MATCH110
-
-
- REGEX Globber (Wild Card Matching)
-
- A *IX SH style pattern matcher written in C
- V1.10 Dedicated to the Public Domain
-
- March 12, 1991
- J. Kercheval
- [72450,3702] -- johnk@wrq.com
-
-
-
-
- *IX SH style Regular Expressions
- ================================
-
- The *IX command SH is a working shell similar in feel to the MSDOS
- shell COMMAND.COM. In point of fact much of what we see in our
- familiar DOS PROMPT was gleaned from the early UNIX shells available
- for many of machines the people involved in the computing arena had
- at the time of the development of DOS and it's much maligned
- precursor CP/M (although the UNIX shells were and are much more
- flexible and powerful then those on the current flock of micro
- machines). The designers of DOS and CP/M did some fairly strange
- things with their command processor and OS. One of those things was
- to only selectively adopt the regular expressions allowed within the
- *IX shells. Only '?' and '*' were allowed in filenames and even with
- these the '*' was allowed only at the end of a pattern and in fact
- when used to specify the filename the '*' did not apply to extension.
- This gave rise to the all too common expression "*.*".
-
- REGEX Globber is a SH pattern matcher. This allows such
- specifications as *75.zip or * (equivelant to *.* in DOS lingo).
- Expressions such as [a-e]*t would fit the name "apple.crt" or
- "catspaw.bat" or "elegant". This allows considerably wider
- flexibility in file specification, general parsing or any other
- circumstance in which this type of pattern matching is wanted.
-
- A match would mean that the entire string TEXT is used up in matching
- the PATTERN and conversely the matched TEXT uses up the entire
- PATTERN.
-
- In the specified pattern string:
- `*' matches any sequence of characters (zero or more)
- `?' matches any character
- `\' suppresses syntactic significance of a special character
- [SET] matches any character in the specified set,
- [!SET] or [^SET] matches any character not in the specified set.
-
- A set is composed of characters or ranges; a range looks like
- 'character hyphen character' (as in 0-9 or A-Z). [0-9a-zA-Z_] is the
- minimal set of characters allowed in the [..] pattern construct.
- Other characters are allowed (ie. 8 bit characters) if your system
- will support them (it almost certainly will).
-
- To suppress the special syntactic significance of any of `[]*?!^-\',
- and match the character exactly, precede it with a `\'.
-
- To view several examples of good and bad patterns and text see the
- output of MATCHTST.BAT
-
-
-
- MATCH() and MATCHE()
- ====================
-
- The match module as written has two parsing routines, one is matche()
- and the other is match(). Since match() is a call to matche() which
- simply has its output mapped to a BOOLEAN value (ie TRUE if pattern
- matches or FALSE otherwise), I will concentrate my explanations here
- on matche().
-
- The purpose of matche() is to match a pattern against a string of
- text (usually a file name or specification). The match routine has
- extensive pattern validity checking built into it as part of the
- parser and allows for a robust pattern match.
-
- The parser gives an error code on return of type int. The error code
- will be one of the the following defined values (defined in match.h):
-
- MATCH_PATTERN - bad pattern or misformed pattern
- MATCH_LITERAL - match failed on character match (standard
- character)
- MATCH_RANGE - match failure on character range ([..] construct)
- MATCH_ABORT - premature end of text string (pattern longer
- than text string)
- MATCH_END - premature end of pattern string (text longer
- than pattern called for)
- MATCH_VALID - valid match using pattern
-
- The functions are declared as follows:
-
- BOOLEAN match (char *pattern, char *text);
-
- int matche(register char *pattern, register char *text);
-
-
-
- IS_VALID_PATTERN() and IS_PATTERN()
- ===================================
-
- There are two routines for determining properties of a pattern
- string. The first, is_pattern(), is designed simply to determine if
- some character exists within the text which is consistent with a SH
- regular expression (this function returns TRUE if so and FALSE if
- not). The second, is_valid_pattern() is designed to check the
- validity of a given pattern string (TRUE return if valid, FALSE if
- not). By 'validity', I mean well formed or syntactically correct.
-
- In addition, is_valid_pattern() has as one of it's parameters a
- return code for determining the type of error found in the pattern if
- one exists. The error codes are as follows and defined in match.h:
-
- PATTERN_VALID - pattern is well formed
- PATTERN_ESC - pattern has invalid literal escape ('\' at end of
- pattern)
- PATTERN_RANGE - [..] construct has a no end range in a '-' pair
- (ie [a-])
- PATTERN_CLOSE - [..] construct has no end bracket (ie [abc-g )
- PATTERN_EMPTY - [..] construct is empty (ie [])
-
- The functions are declared as follows:
-
- BOOLEAN is_valid_pattern (char *pattern, int *error_type);
-
- BOOLEAN is_pattern (char *pattern);
-