home *** CD-ROM | disk | FTP | other *** search
Text File | 1988-09-11 | 36.2 KB | 1,274 lines |
- Subject: v13i073: Split file based on patterns or line numbers
- Newsgroups: comp.sources.unix
- Sender: sources
- Approved: rsalz@uunet.UU.NET
-
- Submitted-by: Gary Puckering <cognos!garyp>
- Posting-number: Volume 13, Issue 73
- Archive-name: slice
-
- Slice splits up a file into lots of little files. It reads its input
- a line at a time, and starts a new output file when
-
- * the input line matches a pattern, or
- * there have been n lines written to the current output file.
-
- You can use it to split a mailbox or an archive of news articles into
- one article per file, for example. In fact, you can do this with
- about 5 lines of awk, but you run into problems with long lines (and
- speed, if it bothers you!).
-
- Slice was originally contributed by Russell Quinn as the program
- "mailsplit". It was subsequently revised by me and posted to
- net.sources. A patch for System V was also issued. Since then, I've
- made quite a few changes, enough so that this version is not 100%
- compatible with the net.sources versions. Also, I've not retested
- this version on System V, so there may be some problems.
-
- Slice allows multiple output formats to be specified (rather than
- multiple input files). This makes it possible to deposit the pieces
- (slices!) into files named whatever your want. For example:
-
- slice -f article -x -r README '^--* [Cc]ut' article.sh
-
- will deposit everything up to the cut line into README and everything
- after it into article.sh. The -x option causes the matched line to be
- excluded. The -r option specifies the destination of rejected lines
- (which in this example includes all lines appearing before the cut line).
-
- There are even options to make slicing mailboxes and files containing
- shell scripts easier (-m and -s). The mailbox option enables you to
- easily reorder messages in your mailbox into proper date sequence:
- just slice the mailbox with -m then cat the resulting files back
- together. This neat trick is accomplished by the use of substitution
- parameters in the output pattern. If an output pattern contains, say,
- a #3 then the 3rd token from each matched line is substituted into the
- file name. Thus, to split a mailbox the default output pattern causes
- each message to be put in a file whose name is constructed from the
- date and time on the "From" line.
-
- There are some good examples in the man page.
-
- Source, Makefile and manual entry enclosed. To install, do the
- following:
-
- 1: Edit the Makefile: you'll need to alter the "R=/usr/local" if
- you don't want slice to live in /usr/local/usr/bin.
-
- 2: make slice
-
- 3: have a play with it & satisfy yourself that it behaves reasonably
-
- 4: make install
-
- Make "install" will do a "$(MAKE) $(CLEAN)" afterwards. If you don't
- want to remove the binary, say
-
- CLEAN="" make install
-
- at step 4.
-
-
- --------------------- cut here ----------------------------------------
- #!/bin/sh
- # This is a shell archive, meaning:
- # 1. Remove everything above the #!/bin/sh line.
- # 2. Save the resulting text in a file.
- # 3. Execute the file with /bin/sh (not csh) to create the files:
- # slice.1
- # Makefile
- # opts.h
- # slice.c
- # CHANGES
- # By: Gary Puckering ()
- export PATH; PATH=/bin:$PATH
- echo shar: extracting "'slice.1'" '(10249 characters)'
- if test -f 'slice.1'
- then
- echo shar: over-writing existing file "'slice.1'"
- fi
- sed 's/^X//' << \SHAR_EOF > 'slice.1'
- X.TH SLICE 1L "1987 September 14" "Cognos Inc."
- X.SH NAME
- Xslice \- split a file into pieces, by pattern or every n'th line
- X.SH SYNOPSIS
- X.B slice
- X.tr _-
- X[\|\fB_f\ \fIfilename\fP\|]
- X[\|\fB_i\fP\fIn\fP\|]
- X[\|\fB_a\fP\|]
- X[\|\fB_x\fP\|]
- X[\|\fB_r\fP\ \fIrejectfile\fP\|]
- X[\|\fB_\&m\fP\||\ \fB_\&s\fP\||\ \fB_\&n\fP\fIn\fP\|]
- X[\|\fIformat\fP\ .\|.\|.\|]
- X.LP
- X.sp
- X.B slice
- X[\|\fB_f\ \fIfilename\fP\|]
- X[\|\fB_i\fP\fIn\fP\|]
- X[\|\fB_a\fP\|]
- X[\|\fB_x\fP\|]
- X[\|\fB_r\fP\ \fIrejectfile\fP\|]
- X\ \fB_\&e\ \fIexpression\fP\
- X[\|\fIformat\fP\ .\|.\|.\|]
- X.LP
- X.sp
- X.B slice
- X[\|\fB_f\ \fIfilename\fP\|]
- X[\|\fB_i\fP\fIn\fP\|]
- X[\|\fB_a\fP\|]
- X[\|\fB_x\fP\|]
- X[\|\fB_r\fP\ \fIrejectfile\fP\|]
- X\ \fIexpression\fP\
- X[\|\fIformat\fP\ .\|.\|.\|]
- X.tr
- X.SH DESCRIPTION
- X.I Slice
- Xsplits a file into pieces (called \fIslices\fR) which are deposited
- Xinto one or more output files. The output files are named according
- Xto the \fIformat\fR strings provided. The input file is split
- Xwhenever a pattern is matched or every \fIn\fR lines, depending on the
- Xoptions selected. Because some of the options are mutually exclusive,
- Xthere are three forms of the command.
- X.LP
- XWhenever a pattern match is used to slice the file, lines occurring
- Xbefore the first match are sent to the \fIreject\fR file (which is
- X\fI/dev/null\fR by default). Lines are also rejected when an output file
- Xcannot be opened, for whatever reason.
- X.LP
- X.SH OPTIONS
- X.LP
- X.IP "\fB-f\fR \fIfilename\fR"
- XInput for \fIslice\fR is taken from the named file rather than \fIstdin\fR.
- X.IP "\fB-i\fP\fIn\fP"
- XThe starting number for numbering output files generated by formats
- Xcontaining "#\&n" (see \fISubstitutions\fR below). The default
- Xstarting number is one.
- X.IP "\fB-a\fR"
- XCauses the file to be split \fBafter\fR the line matching the pattern,
- Xrather than before (as is normally the case). This option will not
- Xwork properly if an output format contains the substitution parameters
- X"#\&1", "#\&2", etc. (See \fIformat\fR below).
- X.IP "\fB-x\fR"
- XCauses the matched line to be excluded from the output files. Handy
- Xfor eliminating cut lines, etc.
- X.IP "\fB-m\fR"
- XUses the pattern "^From " to split the file. This is convenient for
- Xbreaking up a mailbox file. It also changes the default output
- Xpattern so that the output files are named according to the date and
- Xtime of each message. This makes it easy to sort the messages in a
- Xmailbox into chronological order \- just slice the box and cat the
- Xresulting files back together again.
- X.IP "\fB-s\fR"
- XUses the pattern "^#\&!\ */bin/sh" to split the file. This is ideal for
- Xbreaking up mail or news article containing a Bourne shell script.
- X.IP "\fB-n\fIn\fR"
- XSplit the file every \fIn\fR lines. In this case, no pattern matching
- Xis performed. This is the behaviour of \fIsplit (1)\fR,
- Xexcept that the default output filename format for
- X\fIslice\fR is different.
- X.IP "\fB-r\fR \fIrejectfile\fR"
- XEstablishes the name of a file to which rejected lines are sent
- X(default is \fI/dev/null\fR). Lines appearing before the first matched
- Xpattern are rejected. Also, any time an output file cannot be opened,
- Xsource lines are sent to the reject file instead. Substitution
- Xparameters are not allowed in the reject file name, but a leading "+"
- Xis allowed (to indicated that the reject file should be opened for append).
- XThe name \fIstderr\fR can also be used for the standard error file.
- X.IP "\fB-e\fR \fIexpression\fR"
- XUses the pattern specified by \fIexpression\fR to identify where the
- Xfile is to be split. The pattern may contain newlines (which match
- Xthemselves). See
- X.I ed (1)
- Xfor details of the regular expressions.
- X.IP \fIformat\fR
- XAll three forms of the command allow the specification of zero or more
- X\fIformat\fR strings as non-flag parameters. Formats are used to
- Xgenerate output filenames. You can provide as many as you like.
- XEvery time a split is required, a format is used to generate the file
- Xname. If the name is the same as the last name generated from that
- Xformat, a new format is selected. If \fIslice\fR runs out of formats,
- Xa warning will be issued (unless the last output file is \fIstdout\fR)
- Xand the remaining lines will be sent to the reject file. The format
- Xstrings can contain substitution parameters.
- X.SH SUBSTITUTIONS
- X.IP "#\&f"
- XThe name of the input file, less any leading pathname, is substituted.
- XIf the input file is \fIstdin\fR, the name \fIslice\fR is used.
- X.IP "#\&n"
- XA unique number, beginning with the value specified in the \fB-i\fR
- Xoption, is substituted. This number is incremented by 1 for each
- Xoutput file produced by the current output format. When an output
- Xformat produces the same name twice, a new format is selected and
- Xnumbering begins again with the initial value.
- X.IP "#\&1, #\&2 ..."
- XParameters of the form #\&1, #\&2, ... #\&9 are replaced by corresponding
- Xtokens drawn from the source line which matched the slice pattern.
- XFor example, if each procedure in a C program began with a comment
- Xline of the following form:
- X.sp
- X\ \ \ \ /\&* Procedure:\ \ \ main */
- X.br
- X\ \ \ \ #\&1\ \ \ #\&2\ \ \ \ \ \ \ \ \ \ \ #\&3\ \ #\&4
- X.IP
- Xthen parameter #\&3 could be used to extract the procedure name (in
- Xthis case, "main"). If #\&3 appears in an output format, it will be
- Xreplaced by this value.
- X.IP "#$"
- XThis parameter is used to select the last non-blank parameter on the
- Xmatched line. For example, if the matched line were:
- X.sp
- X\ \ \ \ \From garyp@cognos Tue Sep 15 15:08:23 EDT 1987
- X.sp
- Xthen "#$" would select "1987", the last token on the line.
- X.SH FORMAT SPEC's
- X.LP
- XSubstitution parameters can be followed by an optional
- X.I printf
- Xstyle format specification. They can be used to control the width of
- Xthe field and formatting of a numeric token into a string.
- X.LP
- XIf a numeric format like %d is used, the token is assumed to contain
- Xan integer value and the function \fIatoi\fR is used to convert it.
- XThe resulting value is then formatted with \fIsprintf\fR using the
- Xformat code.
- X.LP
- XFor example, the output format zzz.#\&n%02d will produce files of the
- Xform zzz.01, zzz.02, zzz.03, etc. The #\&n is replaced with the current
- Xcount value, which is then formatted by %02d (signed decimal integer,
- X2 digits wide, print leading zeros).
- X.LP
- XFloating point format codes ("f", "e" and "g") are not allowed, nor is the
- Xsingle-character code "c".
- X.LP
- XThere is a special format code (not one of the usual \fIprintf\fR set)
- Xfor conversion of 3 character month names to numbers. The code "m"
- Xcan be used to convert "Jan", "Feb", ... "Dec" to the numbers 1, 2,
- X\&... 12. The numeric month value is then formatted as if the format
- Xstring were of the %d type. For example, the code %02m will result in
- Xthe month name token "Aug" being transformed into the string "08".
- X.SH DEFAULT FORMATS
- XIf no output formats are supplied, the following default format is used:
- X.sp .5v
- X\ \ \ \ #\&f:#\&n%03d
- X.sp .5v
- XThis results in files having names like \fIfilename\fR:001,
- X\fIfilename\fR:002, \fIfilename\fR:003 and so on.
- XThe default format was chosen because the resulting files are listed
- Xin numerical order by
- X.I ls
- Xor by
- X.I echo *
- Xwhich is sometimes useful.
- X.LP
- XIf the \fB-m\fR option is used, the default output format is:
- X.sp .5v
- X\ \ \ \ #\&f:#\&$-#\&4%02m-#\&5%02d.#\&6
- X.sp .5v
- Xwhich results in files have names like \fIfilename\fR:1987-01-25.08:15:30.
- XThe date and time are taken from the "From" line of each message in
- Xthe mailbox.
- X.LP
- XYou can take advantage of this naming convention to reassemble the
- Xmessages in chronological order by \fIcat\fR'ing the files back
- Xtogether again using the command:
- X.sp
- X\ \ \ \ cat \fIfilename\fR:*\ \ >\fImailbox.new\fR
- X.SH SPECIAL CONVENTIONS
- X.LP
- XNormally, \fIslice\fR opens each output file for "write". By
- Xprepending an output format string with "+" (a plus sign) the
- Xdesignated output file will be opened for "append". Thus, a filename
- Xcan appear more than once.
- X.LP
- XIf a format string is simply a "+", then \fIstdout\fR is
- Xused. This is handy for piping one of the pieces to another
- Xcommand.
- X.LP
- XThe special format string "@" is used as an abbreviation for
- X/dev/null. This is handy for throwing away slices.
- X.LP
- X.SH EXAMPLES
- XSplit up a mail folder into a series of files, one for each message,
- Xnamed after the folder and date/time of each message.
- X.sp
- X slice -m -f FOLDER
- X.sp
- XPipe a news article containing a shell script to slice, saving
- Xeverything before the line "#!/bin/sh" line to a file named HEADER and
- Xpiping the script portion to \fIsh\fR to unshar it:
- X.sp
- X | slice -s -r HEADER +\ \ | sh
- X.sp
- XSplit \fIstdin\fR at every line of dashes into the files 0, 1, 2, etc.:
- X.sp
- X cat anyfile | slice -i0\ \ "^--* *$"\ \ #\&n
- X.sp
- XPipe the middle portion of a file to sh, keeping the head and tail in a
- Xfile called README. Exclude the cut lines:
- X.sp
- X slice -f myfile -x -r README "^--* [Cc]ut "\ \ +\ \ +README\ \ | sh
- X.sp
- XKeep the middle portion of a file, discarding the head and tail:
- X.sp
- X slice -f myfile -r @ "^--* [Cc]ut "\ \ middle @
- X.sp
- XSplit a C program which has an identifying comment before each
- Xprocedure into separate files for each procedure (named after the
- Xprocedure):
- X.sp
- X slice -f program.c "/[*] *Procedure: *.* *[*]/"\ \ #\&3.c
- X.sp
- X.SH BUGS
- X.IP a) 4
- XWatch out for filename expansion by the shell. This could cause
- X\fIslice\fR to interpret the extra filenames as output formats causing
- Xfiles to be overwritten. To be safe, do "set noclobber" in csh.
- X.IP b) 4
- XWhen the -a option is used and an output format contains token
- Xsubstitutions (#\&1, #\&2, etc.) the wrong filenames are generated. To
- Xgenerate the correct filenames, either slice has to lookahead to find
- Xthe next match line or it has to direct lines for the current slice
- Xinto a temporary file until it finds the line matching the pattern.
- X.SH DIAGNOSTICS
- X``Internal Error'' indicates a bug in \fIslice\fR, and should be reported.
- XExit status 1 indicates an error parsing options \- for example, if an unknown
- Xflag was used.
- XExit status 2 indicates a meaningless combination was detected and rejected.
- XExit status 3 indicates a run-time problem \- for example, if a file couldn't
- Xbe opened.
- X.LP
- XIf a reject file is not provided, a count of rejected lines is reported.
- X.SH "SEE ALSO"
- X.I cat (1),
- X.I ed (1),
- X.I ls (1),
- X.I mail (1),
- X.I split (1),
- X.I mailsplit (1L),
- X.I atoi (3),
- X.I printf (3).
- SHAR_EOF
- if test 10249 -ne "`wc -c 'slice.1'`"
- then
- echo shar: error transmitting "'slice.1'" '(should have been 10249 characters)'
- fi
- echo shar: extracting "'Makefile'" '(1301 characters)'
- if test -f 'Makefile'
- then
- echo shar: over-writing existing file "'Makefile'"
- fi
- sed 's/^X//' << \SHAR_EOF > 'Makefile'
- X# Makefile for slice
- X#
- X# Originally contributed at mailsplit, written by:
- X# R E Quin, October 1986 University of Warwick (UK) Computer Science
- X# warwick!req +44 203 523193
- X#
- X# Modified and recontributed by:
- X# Gary Puckering 3755 Riverside Dr.
- X# Cognos Incorporated Ottawa, Ontario
- X# (613) 738-1440 CANADA K1G 3N3
- X#
- X# This makefile is intended for the sys5 Augmented make.
- X#
- XMAKE=make
- XCLEAN=clean
- XCC=cc
- X
- X# R is the root of the filesystem -- i.e. where to install things.
- X# PROG is what to make; BINDIR is where to put it.
- X# The binaries are installed in $R/$(BINDIR).
- XR=/usr/local
- XBINDIR=$R/bin
- XMANDIR=$R/man/man1
- XPROG=slice
- XMANPAGE=slice.1
- X
- X# HACKS are for -DBUGFIX style things.
- XHACKS=
- X
- X# For BSD 4.2 or 4.3
- XUSG=
- X
- X# For USG System V version
- X# USG=-DUSG -lPW
- X
- XCFLAGS=-O $(HACKS) $(USG)
- X
- X
- X# "make install " does a $(MAKE) $(CLEAN) at the end, so you can say
- X# CLEAN= make -e install
- X# if you don't want to remove the garbage at the end, for example.
- X# This is useful primarily for testing the install: entry!
- X
- Xall: $(PROG)
- X
- Xslice: opts.h slice.o
- X $(CC) $(CFLAGS) -o $(PROG) slice.o
- X
- Xinstall: slice slice.1
- X chmod a+x $(PROG)
- X /bin/cp $(PROG) $(BINDIR)
- X chmod a+r $(MANPAGE)
- X /bin/cp $(MANPAGE) $(MANDIR)
- X $(MAKE) $(CLEAN)
- X
- Xclean:
- X rm -rf core *.o $(PROG) a.out
- SHAR_EOF
- if test 1301 -ne "`wc -c 'Makefile'`"
- then
- echo shar: error transmitting "'Makefile'" '(should have been 1301 characters)'
- fi
- echo shar: extracting "'opts.h'" '(1008 characters)'
- if test -f 'opts.h'
- then
- echo shar: over-writing existing file "'opts.h'"
- fi
- sed 's/^X//' << \SHAR_EOF > 'opts.h'
- X
- X#define FALSE 0
- X#define TRUE 1
- Xtypedef int bool;
- X
- X#define EXIT_SYNTAX 1 /* syntax error parsing commandline options */
- X#define EXIT_SEMANT 2 /* options are correct but meaningless */
- X#define EXIT_RUNERR 3 /* error opening a file, for example */
- X#define EXIT_INTERN 4 /* internal error -- bug!! */
- X
- X#define nextstr(s,count,array,failure) \
- X {if (((count)<2) && !((array)[0][1])) {failure;}\
- X else {if ((array)[0][1]) { s = &((array)[0][1]); } \
- X else {s = array[1]; --count; array++;}}}
- X
- X#define DFLTNAME "slice" /* input filename (for stdin) */
- X#define DFLTREJECT "/dev/null" /* default reject file */
- X#define BUFLEN BUFSIZ /* the maximum length of an input line (incl. "\n\0") */
- X#define MAXFILENAMELEN BUFSIZ /* longer than the longest possible file name */
- X#define DFLTOUTNAME "#f:#n%03d" /* o/p file name format */
- X#define MBOXFORMAT "+#f:#$-#4%02m-#5%02d.#6" /* default for mailbox */
- X#define MAXPARM 9 /* maximum parm index value */
- X#define PARMESCAPE '#' /* parameter escape character */
- SHAR_EOF
- if test 1008 -ne "`wc -c 'opts.h'`"
- then
- echo shar: error transmitting "'opts.h'" '(should have been 1008 characters)'
- fi
- echo shar: extracting "'slice.c'" '(15915 characters)'
- if test -f 'slice.c'
- then
- echo shar: over-writing existing file "'slice.c'"
- fi
- sed 's/^X//' << \SHAR_EOF > 'slice.c'
- X/*
- X * Program name: slice
- X *
- X * Copyright (c) 1987 by Gary Puckering
- X *
- X * This program may be freely used and/or modified,
- X * with the following provisions:
- X * 1. This notice and the above copyright notice must remain intact.
- X * 2. Neither this program, nor any modification of it,
- X * may be sold for profit without written consent of the author.
- X *
- X * -----------------------------------------------------------------
- X *
- X * This program allows you to cut a file into pieces, either at every
- X * n lines (like fsplit) or based on a pattern match. Slices are sent
- X * to output files, which are named by providing a format string. A
- X * file name may be a constant, or may contain substitution parameters,
- X * such as #f for the input file name or #1, #2, ... #9 for tokens
- X * 1 through 9 on the line matching the pattern.
- X *
- X * -----------------------------------------------------------------
- X */
- X
- Xchar version[] = "@(#)slice.c 2.4";
- X
- X#include <stdio.h>
- X#include <ctype.h>
- X#include <string.h>
- X#include <sys/file.h>
- X
- X#include "opts.h" /* defines nextstr() etc */
- X
- Xbool exclude = FALSE; /* exclude matched line from o/p files */
- Xbool split_after = FALSE; /* split after matched line */
- Xbool m_flag = FALSE; /* was -m option used */
- X
- XFILE *output = (FILE *) NULL; /* fd of current output file */
- XFILE *rejectfd = (FILE *) NULL; /* fd of reject file */
- X
- Xint n_format; /* number of format strings */
- Xint filenumber = 0; /* #n substitution for each file */
- Xint every_n_lines = 0; /* split every n lines */
- Xint rejectcnt = 0; /* count of rejected lines */
- X
- Xchar inbuffer[BUFLEN]; /* input buffer */
- Xchar *progname = "slice"; /* for error messages */
- Xchar *pattern = (char *) NULL; /* reg expr used to split file */
- Xchar **format; /* ptr for format strings */
- Xchar *defaultfmt[] = {DFLTOUTNAME}; /* default format string */
- Xchar *mboxformat[] = {MBOXFORMAT}; /* default format for mailboxes */
- Xchar parmbuf[BUFLEN]; /* parameter buffer */
- Xchar *parm[MAXPARM+1]; /* array of pointers to parms */
- Xchar nullstring[1] = {""}; /* a null string */
- Xchar *infile = (char *) NULL; /* input file name */
- Xchar rejectfile[MAXFILENAMELEN+2] = {DFLTREJECT}; /* reject file name */
- X
- X/* forward declarations */
- XFILE * openfile();
- XFILE * mkreject();
- Xchar * mkname();
- Xchar * rmpath();
- Xchar getfmt();
- X
- X
- Xmain(argc, argv)
- X char *argv[];
- X{
- X /* split files at points that match a given pattern */
- X
- X /* initialise things */
- X char *buffer;
- X int i;
- X int getnum(); /* does more checking than atoi */
- X char *rmpath(); /* removes leading pathname from a filename */
- X
- X for (i=0; i<=MAXPARM; i++) parm[i] = nullstring;
- X
- X /* now remove possible leading pathname
- X * (e.g. /usr/bin/slice is to report it's errors as slice
- X */
- X progname = rmpath(argv[0]);
- X
- X
- X while (--argc) {
- X if (**++argv == '-') {
- X switch(*++*argv) {
- X case 'a': { /* split after pattern */
- X split_after = TRUE;
- X break;
- X }
- X case 'e': { /* pattern (expression) */
- X ++argv; argc--;
- X if (argc==0 || !**argv) {
- X error("Pattern after -e missing or null\n");
- X usage(1);
- X }
- X pattern = *argv;
- X break;
- X }
- X case 'm': { /* mailbox pattern */
- X pattern = "^From ";
- X m_flag = TRUE;
- X break;
- X }
- X case 's': { /* shell pattern */
- X pattern = "^#! *\/bin\/sh";
- X break;
- X }
- X case 'n': { /* -n n_lines -- split every n lines */
- X nextstr(buffer,argc,argv,usage(2));
- X every_n_lines = getnum(buffer);
- X if (every_n_lines <= 0) {
- X error("-n: number must be at least 1\n");
- X exit(EXIT_SYNTAX);
- X }
- X break;
- X }
- X case 'f': {
- X ++argv; argc--;
- X if (argc==0 || !**argv) {
- X error("Filename after -f missing or null\n");
- X usage(1);
- X }
- X infile = *argv;
- X break;
- X }
- X case 'r': {
- X ++argv; argc--;
- X if (argc==0 ||!**argv) {
- X error("Filename after -r missing or null\n");
- X usage(1);
- X }
- X strcpy(rejectfile,*argv);
- X break;
- X }
- X case 'i': { /* -i initial_number */
- X nextstr(buffer,argc,argv,usage(2));
- X filenumber = getnum(buffer);
- X if (filenumber < 0) {
- X error("-i must be followed by a positive number\n");
- X exit(EXIT_SYNTAX);
- X }
- X filenumber--; /* needs to be one less to start with */
- X break;
- X }
- X case 'x': { /* exclude matched lines */
- X exclude = TRUE;
- X break;
- X }
- X default: {
- X error("Unknown flag -%c\n", **argv);
- X usage(1);
- X }
- X } /* end switch */
- X } else {
- X if (!pattern) pattern = *argv; /* first non-flag is pattern */
- X else break; /* break while loop */
- X } /* end if */
- X } /* end while */
- X
- X if (!argc) {
- X if (m_flag) {
- X format = mboxformat;
- X } else {
- X format = defaultfmt;
- X }
- X n_format = 1;
- X } else {
- X format = argv;
- X n_format = argc;
- X }
- X
- X#ifdef DEBUG
- X printf("argc=%d\n",argc);
- X printf("format='%s'\n",*format);
- X printf("pattern='%s'\n",pattern);
- X#endif
- X
- X if (!infile) split(stdin, DFLTNAME, pattern);
- X else fsplit(infile, pattern);
- X
- X exit(0);
- X}
- X
- X
- X/* split a file that hasn't been opened yet */
- X
- Xfsplit(name, pat)
- X char *name;
- X char *pat;
- X{
- X FILE *fd;
- X
- X if (!name || !*name) {
- X error("Can't split a file with an empty name\n");
- X usage(2);
- X }
- X
- X if ( (fd = fopen(name, "r")) == NULL) {
- X error("Can't open %s\n", name);
- X return;
- X }
- X
- X (void) split(fd, name, pat);
- X
- X if (fclose(fd) == EOF) { /* something's gone wrong */
- X error("Can't close %s -- giving up\n", name);
- X exit(EXIT_RUNERR);
- X }
- X}
- X
- X
- X/* Split a file that's already been opened */
- X
- Xsplit(input, name, pattern)
- X FILE *input; /* fd of input file */
- X char *name; /* input filename */
- X char *pattern; /* pattern used to split file */
- X{
- X
- X#ifndef USG
- X extern char *re_comp(); /* compile string into automaton */
- X extern int re_exec(); /* try to match string */
- X int reg_status = 0; /* regular expression status */
- X char *errmessage;
- X#define REMATCH 1
- X#define RENOMATCH 0
- X#define REFAULT -1
- X#define match(expr) ((expr) == REMATCH)
- X#else
- X extern char *regcmp(); /* compile string into automaton */
- X extern char *regex(); /* match string with automaton */
- X char *reg_status; /* regular expression status */
- X char *rex;
- X#define match(expr) ((expr) != NULL)
- X#endif
- X
- X char *fname;
- X int line = 0;
- X
- X rejectcnt = 0;
- X
- X if (split_after && exclude) {
- X error("Can't specify both -a and -x\n");
- X usage(2);
- X }
- X
- X if (every_n_lines && exclude) {
- X error("Can't specify both -n and -x\n");
- X usage(2);
- X }
- X
- X if (every_n_lines && split_after) {
- X error("Can't specify both -n and -a\n");
- X usage(2);
- X }
- X
- X if (every_n_lines && pattern) {
- X error("Can't specify both -n and pattern\n");
- X usage(2);
- X }
- X
- X if (!every_n_lines && (!pattern || !*pattern)) {
- X error("Can't match an empty pattern\n");
- X usage(2);
- X }
- X
- X#ifndef USG
- X if (!every_n_lines && (errmessage = re_comp(pattern)) != NULL) {
- X error("Error in pattern <%s>: %s\n", pattern, errmessage);
- X exit(EXIT_RUNERR);
- X }
- X /* errmessage is NULL here */
- X#else
- X if (!every_n_lines && (rex = regcmp(pattern,(char *)0)) == NULL) {
- X error("Erron in pattern <%s>\n", pattern);
- X exit(EXIT_RUNERR);
- X }
- X /* rex is pointer to compiled expression.... */
- X#endif
- X
- X /* if split after mode, open file at start */
- X if (split_after) {
- X fname = mkname(name);
- X output = openfile(fname);
- X}
- X
- X /* the -2 to fgets is because of the null and \n appended */
- X while (fgets(inbuffer, BUFLEN - 2, input) != NULL) {
- X
- X if ((every_n_lines > 0 && (++line == every_n_lines)) || /* nth line */
- X (!every_n_lines &&
- X#ifndef USG
- X ( match(reg_status = re_exec(inbuffer) ) ) ) ) { /* matches pat */
- X#else
- X ( match(reg_status = regex(rex, inbuffer) ) ) ) ) { /* matches pat */
- X#endif
- X
- X if (split_after) putbuff(inbuffer,output);
- X
- X if (!every_n_lines) get_parms(inbuffer);
- X
- X /* close the current file */
- X if (output && output != stderr &&
- X output != stdout &&
- X output != rejectfd) {
- X if (fclose(output) == EOF) {
- X error("Can't close output file\n");
- X exit(EXIT_RUNERR);
- X }
- X }
- X
- X fname = mkname(name);
- X
- X if (*fname) {
- X /* open a new file */
- X output = openfile(fname);
- X } else {
- X /* no filename to open, so use reject file */
- X error("Insufficient formats -- remainder rejected\n");
- X output = (FILE *) NULL;
- X }
- X
- X line = 0; /* reset input line count */
- X
- X /* if matched lines are excluded, skip the putbuff */
- X if (exclude && match(reg_status)) continue;
- X
- X /* if file is to be split after pattern, put already done */
- X if (split_after && match(reg_status)) continue;
- X#ifndef USG
- X } else {
- X if (reg_status == REFAULT) { /* the re_exec failed */
- X error("Internal error trying to match <%s> to <%s>\n",pattern, inbuffer);
- X exit(EXIT_INTERN);
- X }
- X#endif
- X } /* end match pattern test */
- X
- X putbuff(inbuffer, output); /* now put line out */
- X
- X } /* end while */
- X
- X if (rejectcnt && strcmp(rejectfile,"/dev/null")==0) {
- X error("%d lines rejected to /dev/null\n",rejectcnt);
- X }
- X
- X return (filenumber == -1); /* exit status for main */
- X}
- X
- X
- X/* Make a reject file */
- X
- XFILE *
- Xmkreject()
- X{
- X if (rejectfd) return(rejectfd); /* if there's already one, don't bother */
- X
- X if ( strcmp(rejectfile,"stderr")==0 )
- X rejectfd = stderr;
- X else rejectfd = openfile(rejectfile);
- X
- X if (!rejectfd) {
- X error("Cannot open reject file %s\n",rejectfile);
- X exit(EXIT_RUNERR);
- X }
- X
- X return (rejectfd);
- X}
- X
- X
- X/* Open an output file (or the reject file) */
- X
- X/* If filename starts with '+' open for append */
- X/* If filename is '@' use /dev/null */
- X
- XFILE *
- Xopenfile(fname)
- X char *fname; /* file to be opened */
- X{
- X FILE *fd = (FILE *) NULL;
- X bool exists = FALSE;
- X
- X switch (fname[0]) {
- X case '+': {
- X fname++;
- X /* check for output file = input file */
- X if (infile && (strcmp(fname,infile)==0) ) {
- X error("Output file %s same as input file -- slice rejected\n",fname);
- X break;
- X }
- X if (fname[0]==NULL) {
- X fd = stdout;
- X break;
- X } else {
- X if ((fd = fopen(fname, "a")) == NULL) {
- X error("Can't open output file %s for append -- slice rejected\n", fname[1]);
- X break;
- X }
- X }
- X break;
- X }
- X case '@': {
- X fname++;
- X if (fname[0]==NULL) {
- X if ((fd = fopen("/dev/null", "w")) == NULL) {
- X error("Can't open output file /dev/null -- slice rejected\n");
- X break;
- X }
- X break;
- X } else {
- X fname--;
- X /* fall through to process as normal filename */
- X }
- X }
- X default: {
- X if (access(fname,F_OK)==0) exists = TRUE;
- X if ((fd = fopen(fname, "w")) == NULL) {
- X error("Can't open output file %s for write -- slice rejected\n", fname);
- X break;
- X }
- X if (exists && strcmp(fname,"/dev/null")!=0) {
- X error("File %s overwritten\n",fname);
- X }
- X }
- X }
- X return (fd);
- X}
- X
- X
- X/* Make a new file name using a format string */
- X
- Xchar *
- Xmkname(name)
- X char *name; /* file name for #f substitution */
- X{
- X static char fnambuf[MAXFILENAMELEN + 2]; /* +1 for null, +1 for overflow */
- X static char lastname[MAXFILENAMELEN + 2];
- X
- X int i, j;
- X char *p, *q, *fn;
- X char fmtcode;
- X char fmt[MAXFILENAMELEN];
- X char tempbuf[MAXFILENAMELEN];
- X
- X do_format:
- X for (p=(*format), q=fnambuf; *p; p++) {
- X if (*p != PARMESCAPE) {
- X *q = *p;
- X q++;
- X } else {
- X *q = NULL;
- X switch (*++p) {
- X case PARMESCAPE: {
- X *q = PARMESCAPE;
- X q++;
- X break;
- X }
- X case 'f': {
- X fn = rmpath(name);
- X strcat(q,fn);
- X q += strlen(fn);
- X break;
- X }
- X case 'n': {
- X ++filenumber;
- X if (*(p+1) == '%') {
- X p++;
- X fmtcode = getfmt(fmt,p);
- X p += strlen(fmt) - 1;
- X sprintf(tempbuf,fmt,filenumber);
- X } else {
- X sprintf(tempbuf,"%d",filenumber);
- X }
- X strcat(q,tempbuf);
- X q += strlen(tempbuf);
- X break;
- X }
- X case '1':
- X case '2':
- X case '3':
- X case '4':
- X case '5':
- X case '6':
- X case '7':
- X case '8':
- X case '9':
- X case '$': {
- X if (*p == '$') {
- X i = lastparm();
- X } else {
- X i = (*p) - '1';
- X }
- X if (*(p+1) == '%') {
- X p++;
- X fmtcode = getfmt(fmt,p);
- X p += strlen(fmt) - 1;
- X if (fmtcode != 's') {
- X if (fmtcode == 'm') {
- X j = mtoi(parm[i]);
- X } else {
- X j = atoi(parm[i]);
- X }
- X sprintf(tempbuf,fmt,j);
- X } else {
- X sprintf(tempbuf,fmt,parm[i]);
- X }
- X } else {
- X strcpy(tempbuf,parm[i]);
- X }
- X strcat(q,tempbuf);
- X q += strlen(tempbuf);
- X break;
- X }
- X default: {
- X error("Invalid substitution #%c in format '%s'\n",*p,*format);
- X exit(EXIT_RUNERR);
- X }
- X } /* end switch */
- X } /* end if-else */
- X } /* end for */
- X
- X *q = NULL;
- X
- X /* if name is same, try next format */
- X if (strcmp(fnambuf,lastname)==0) {
- X if (n_format>1) { /* must be a format left to try */
- X format++;
- X --n_format;
- X filenumber=0;
- X lastname[0] = NULL;
- X goto do_format;
- X } else { /* we've run out of formats */
- X fnambuf[0] = NULL;
- X }
- X }
- X
- X if (fnambuf[0]) strcpy(lastname,fnambuf);
- X
- X return(fnambuf);
- X
- X} /* end routine */
- X
- X
- X
- X/* Get a printf-style format string from within an output format */
- X
- Xchar
- Xgetfmt(fmt,p) /* returns last character of string (format code) */
- X char *fmt; /* target of format string */
- X char *p; /* p should point to the '%' starting the format */
- X{
- X char *q;
- X char fmtcode;
- X
- X if (*p != '%') {
- X error("Internal error -- getfmt called when not pointing at %");
- X exit(EXIT_RUNERR);
- X }
- X q = strpbrk(p,"mduxhocsfeg"); /* 'm' is a special extension */
- X
- X if (!q) {
- X error("Can't find end of format spec");
- X exit(EXIT_RUNERR);
- X }
- X
- X fmtcode = *q;
- X strncpy(fmt,p,(q-p+1));
- X
- X switch (*q) {
- X case 'h':
- X case 'c':
- X case 'f':
- X case 'e':
- X case 'g': {
- X error("Format '%s' is not supported\n",fmt);
- X exit(EXIT_RUNERR);
- X }
- X case 'm': {
- X fmt[strlen(fmt) - 1] = 'd'; /* change to d */
- X }
- X }
- X
- X return(fmtcode);
- X}
- X
- X
- X
- X/* Write an input line to the output file */
- X/* If the output file isn't open, write it to the reject file */
- X
- Xputbuff(buffer,fd)
- X char *buffer;
- X FILE *fd;
- X{
- X if (fd) fputs(buffer,fd);
- X else {
- X rejectcnt++;
- X fputs( buffer,mkreject() );
- X }
- X}
- X
- X
- X
- X/* getnum(s) returns the value of the unsigned int in s. If there's any
- X * trailing garbage, or the number isn't +ve, we return -1
- X */
- X
- Xgetnum(s)
- X char *s;
- X{
- X register char *p;
- X
- X for (p = s; *p; p++) {
- X if (!isdigit(*p)) {
- X return -1;
- X }
- X }
- X return atoi(s);
- X}
- X
- X
- X
- X/* Remove the leading pathname from a filename */
- X
- Xchar *
- Xrmpath(fullname)
- X char *fullname;
- X{
- X register char *p;
- X char *q = (char *) NULL;
- X
- X for (p = fullname; p && *p; p++) {
- X if (*p == '/')
- X q = ++p;
- X }
- X if (q && *q) {
- X return(q);
- X }
- X return(fullname);
- X}
- X
- X
- X
- X/* Get tokens (parameters) from matched input lines */
- X
- Xget_parms(buffer)
- X char *buffer;
- X{
- X int i,l;
- X
- X strcpy(parmbuf,buffer);
- X l = strlen(parmbuf);
- X if(parmbuf[l-1]=='\n') parmbuf[l-1]=NULL;
- X
- X parm[0]=strtok(parmbuf," ");
- X
- X for (i=1; i<=MAXPARM; i++) {
- X parm[i]=strtok(NULL," ");
- X }
- X
- X for (i=0; i<=MAXPARM; i++) {
- X if (!parm[i]) parm[i] = nullstring;
- X }
- X}
- X
- X
- X
- X/* Find last non-null parameter */
- X
- Xlastparm()
- X{
- X int i;
- X
- X for (i=MAXPARM; i>0; i--) {
- X if (parm[i] && *parm[i]) return(i);
- X }
- X return(0);
- X}
- X
- X
- X
- X/* Convert three character month name to integer */
- X
- Xmtoi(monthname)
- X char *monthname;
- X{
- X static char *months[] = {
- X "Jan", "Feb", "Mar", "Apr", "May", "Jun",
- X "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
- X };
- X
- X int i;
- X
- X for (i=1; i<=12; i++) {
- X if (strcmp(monthname,months[i-1])==0) return(i);
- X }
- X
- X error("Invalid month '%s' found -- zero used",monthname);
- X i=0;
- X return(i);
- X}
- X
- X
- Xerror(fmt, a1, a2, a3, a4)
- X char *fmt;
- X{
- X fputs(progname, stderr);
- X fputs(": ", stderr);
- X fprintf(stderr, fmt, a1, a2, a3, a4);
- X}
- X
- X
- Xusage(status)
- X int status; /* exit if status != 0 */
- X{
- X fprintf(stderr,"Usage: %s [-f filename] [-a|-x] [-i<n>] [-w] [-m|-s|-n<n>] [-r file] [-e expression | expression] [format...]\n", progname);
- X if (status)
- X exit(status);
- X}
- X
- SHAR_EOF
- if test 15915 -ne "`wc -c 'slice.c'`"
- then
- echo shar: error transmitting "'slice.c'" '(should have been 15915 characters)'
- fi
- echo shar: extracting "'CHANGES'" '(2731 characters)'
- if test -f 'CHANGES'
- then
- echo shar: over-writing existing file "'CHANGES'"
- fi
- sed 's/^X//' << \SHAR_EOF > 'CHANGES'
- X
- X------------------------------ CHANGES ------------------------------
- X
- Xv2.3 87/09/20 21:36:52 garyp
- X Make default reject file /dev/null. Report count of lines
- X rejected, unless reject file specified.
- X
- Xv2.2 87/09/15 17:33:33 garyp
- X Fix bug that caused dump when parm was null. Changed mkname so
- X that the path is removed from the input filename. Added #$
- X substitution parameter, to correctly split mailboxes. Changed
- X the default pattern for -m to use #$ instead of #7.
- X
- Xv2.1 87/09/15 00:20:07 garyp
- X New release of slice, now supports substitution parameters.
- X Use of %s and %d replaced by #f and #n respectively.
- X Parameters #1, #2, ... #9 are replaced by tokens from the
- X matched line. Printf-style format (e.g. %02d) may immediately
- X follow any substitution.
- X
- Xv1.12 87/09/10 17:05:04 garyp
- X Reject file now added.
- X
- Xv1.11 87/09/08 23:53:29 garyp
- X Add features "+<filename>" and "@"
- X
- Xv1.10 87/02/12 10:32:50 garyp
- X Minor revision to Sys V changes Added #define match() to reduce
- X number of #ifndef USG Added some comments
- X
- Xv1.9 87/02/09 10:10:27 garyp
- X Revisions to support System V regular expression routines.
- X Provided by Jon A. Tankersley.
- X
- Xv1.8 86/12/12 22:28:40 garyp
- X Add -x and -a options. Improve semantic checking. Fix -n
- X (wasn't working -- caused core dump).
- X
- Xv1.7 86/12/12 11:54:56 garyp
- X Fix problem with multiple '+' formats Switch from 'w' to 'a'
- X (append) access for output files.
- X
- Xv1.6 86/12/11 21:18:08 garyp
- X Cleaned up error status codes. Allowed '+' format to designate
- X stdout.
- X
- Xv1.5 86/12/11 17:29:25 garyp
- X New improved mailsplit with lots of features. Renamed slice.
- X
- Xv1.4 86/12/09 17:03:05 garyp
- X First parameter is now either -m, for mailbox pattern, or -s,
- X for shell pattern, or -n for line count split, or a user
- X specified pattern (not starting with '-'). These options
- X replace -p. The default output filename is now "%s:%03d" where
- X %s is the input filename.
- X
- Xv1.3 86/12/09 12:57:49 garyp
- X Fixed search for %d. All forms now allowed. Only %s allowed,
- X though (no other form seems useful anyway).
- X
- Xv1.2 86/12/09 12:10:09 garyp
- X Changed to allow output pattern to contain %s to represent
- X input filename. However, code only does exact match on %d and
- X %s. It should do a pattern match to allow things like %6d.
- X
- Xv1.1 86/12/08 21:08:12 garyp
- X date and time created 86/12/08 21:08:12 by garyp
- X---------------------------------------------------------------------
- X
- SHAR_EOF
- if test 2731 -ne "`wc -c 'CHANGES'`"
- then
- echo shar: error transmitting "'CHANGES'" '(should have been 2731 characters)'
- fi
- # End of shell archive
- exit 0
-
-
-