Text File | 1993-09-12 | 35.7 KB | 1,242 lines |
- Newsgroups: comp.sources.unix
- From: tcl@hellfudge.asd.sgi.com (Tom Lawrence)
- Subject: v27i035: encode - utilities encode/decode binary files in ascii format, Part01/01
- Message-id: <1.747861129.4245@gw.home.vix.com>
- Sender: unix-sources-moderator@gw.home.vix.com
- Approved: vixie@gw.home.vix.com
-
- Submitted-By: tcl@hellfudge.asd.sgi.com (Tom Lawrence)
- Posting-Number: Volume 27, Issue 35
- Archive-Name: encode/part01
-
- ----------- What are encode/decode?
-
- Encode and decode are utilities which encode binary data into printable
- format suitable for transmission via email, posting to usenet, etc. They are
- intended to replace the aging uuencode and uudecode.
-
- ----------- Features:
-
- Encode features a very flexible encoding scheme which allows the user to
- specify exactly which printable characters to use in the output. The
- default is to use all 95 printable characters in the encoding process, as
- this produces the least expansion of the input data. However, for cases
- such as file transfer to a mainframe or to a foreign country where some
- characters may be modified en route, these characters can simply be removed
- from the output character set. Encoding is possible with as few as 2
- characters in the output character set.
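
To make the character-set idea concrete: the README in this archive states a rule for deriving one variable-length bit token per output character. What follows is only a minimal sketch of that rule, not the code from codes.c. The names `struct token` and `make_tokens` are invented for this illustration.

```c
#include <assert.h>

/* Hypothetical type for this sketch: one variable-length token. */
struct token {
    unsigned value;   /* token bits, right-justified */
    int length;       /* number of significant bits */
};

/*
 * Fill out[0..n-1] with the token set for an alphabet of n characters,
 * following the rule given in the README: start from 2^b tokens of b
 * bits (b being the smallest integer with 2^b >= n) and merge the
 * numerically greatest pairs into b-1 bit tokens until n remain.
 * Tokens are produced in mapping order: all b-bit tokens first, then
 * all (b-1)-bit tokens. Returns b.
 */
int make_tokens(int n, struct token *out)
{
    int b = 1, p = 2, half, whole, i;
    unsigned value = 0;

    assert(n >= 2 && n <= 256);
    while (p < n) {
        p <<= 1;
        b++;
    }

    half = p - n;       /* number of merged (b-1)-bit tokens */
    whole = n - half;   /* number of untouched b-bit tokens */

    for (i = 0; i < whole; i++) {
        out[i].value = value++;
        out[i].length = b;
    }

    /* drop the least significant bit to continue with short tokens */
    value >>= 1;
    for (; i < n; i++) {
        out[i].value = value++;
        out[i].length = b - 1;
    }
    return b;
}
```

For n = 6 this yields 000, 001, 010, 011, 10, 11, which is the token set worked out by hand in the README.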
-
- Regardless of how many characters are specified in the output character set,
- encode only expands the data by a factor very close to the theoretical limit
- for that number of characters (see the theory section of the README).
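
The expansion claim can be checked numerically. The README gives the average expansion as 8 / (pb - 2 + 2n/p), where p is the smallest power of 2 that is >= n and pb = log2(p). The sketch below evaluates that formula and, as a cross-check, recomputes the average token length from the token counts, assuming uniformly random input bits. The function names are made up for this example.

```c
#include <assert.h>

/* Smallest b such that 2^b >= n (pb in the README's formula). */
static int token_bits(int n)
{
    int b = 1, p = 2;

    assert(n >= 2 && n <= 256);
    while (p < n) {
        p <<= 1;
        b++;
    }
    return b;
}

/* Closed form quoted in the README: 8 / (pb - 2 + 2n/p). */
double encode_expansion(int n)
{
    int pb = token_bits(n);
    int p = 1 << pb;

    return 8.0 / (pb - 2 + 2.0 * n / p);
}

/*
 * Average number of input bits consumed per output character on
 * random data. Each of the `whole` pb-bit tokens occurs with
 * probability 1/p; each of the `half` (pb-1)-bit tokens occurs with
 * probability 2/p.
 */
double average_token_bits(int n)
{
    int pb = token_bits(n);
    int p = 1 << pb;
    int half = p - n;
    int whole = n - half;

    return ((double)whole * pb + 2.0 * half * (pb - 1)) / p;
}
```

For n = 95, encode_expansion gives about 1.234, the figure quoted in the README. For n = 64 it gives about 1.333, which matches the 33% the README attributes to uuencode.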
-
- The implementation is simple (less than 500 lines total without comments) and
- efficient (runs at a speed comparable to uuencode/uudecode).
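
The README's implementation notes describe the encoder as a 256-entry table indexed by the next 8 input bits. The following is a rough sketch of that idea only, written for an in-memory buffer rather than a file. `struct entry`, `build_table` and `encode_bits` are hypothetical names, and padding the final partial token with zero bits is an assumption of this sketch, not something confirmed by the sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: bits to consume and character to emit. */
struct entry {
    unsigned char len;
    char ch;
};

/*
 * Build the lookup table for an alphabet of n characters, which must
 * be sorted by ascending character code. Every index that has a given
 * token as its leading bits maps to that token's character.
 */
void build_table(const char *alphabet, int n, struct entry table[256])
{
    int b = 1, p = 2, whole, i, len;
    unsigned code = 0, idx, stop;

    assert(n >= 2 && n <= 256);
    while (p < n) {
        p <<= 1;
        b++;
    }
    whole = n - (p - n);        /* count of full-length tokens */

    for (i = 0; i < n; i++) {
        if (i == whole)
            code >>= 1;         /* switch to the short tokens */
        len = (i < whole) ? b : b - 1;
        idx = code << (8 - len);
        stop = (code + 1) << (8 - len);
        for (; idx < stop; idx++) {
            table[idx].len = (unsigned char)len;
            table[idx].ch = alphabet[i];
        }
        code++;
    }
}

/*
 * Encode nbytes of input into out, which must have room for
 * 8 * nbytes + 1 characters. Returns the number of characters
 * written, not counting the terminating NUL.
 */
size_t encode_bits(const unsigned char *in, size_t nbytes,
                   const struct entry table[256], char *out)
{
    size_t total = nbytes * 8, pos = 0, k = 0;

    while (pos < total) {
        size_t byte = pos / 8;
        unsigned shift = (unsigned)(pos % 8);
        unsigned window = (unsigned)in[byte] << 8;

        /* past the end of input the window is padded with zero bits */
        if (byte + 1 < nbytes)
            window |= in[byte + 1];
        window = (window >> (8 - shift)) & 0xFF;

        out[k++] = table[window].ch;
        pos += table[window].len;
    }
    out[k] = '\0';
    return k;
}
```

With the six-character alphabet "abcdef", the single byte 0xE5 (11100101) splits into the tokens 11, 10, 010 and a final 1 that is padded to 10, so the sketch emits "fece".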
-
- #! /bin/sh
- # This is a shell archive. Remove anything before this line, then unpack
- # it by saving it into a file and typing "sh file". To overwrite existing
- # files, type "sh file -c". You can also feed this as standard input via
- # unshar, or by typing "sh <file". If this archive is complete, you
- # will see the following message at the end:
- # "End of archive 1 (of 1)."
- # Contents: MANIFEST Makefile README codes.c codes.h decode.1 decode.c
- # encode.1 encode.c
- # Wrapped by vixie@gw.home.vix.com on Sun Sep 12 12:10:53 1993
- PATH=/bin:/usr/bin:/usr/ucb ; export PATH
- if test -f 'MANIFEST' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'MANIFEST'\"
- else
- echo shar: Extracting \"'MANIFEST'\" \(393 characters\)
- sed "s/^X//" >'MANIFEST' <<'END_OF_FILE'
- X File Name Archive # Description
- X-----------------------------------------------------------
- X MANIFEST 1 This shipping list
- X Makefile 1
- X README 1
- X codes.c 1
- X codes.h 1
- X decode.1 1
- X decode.c 1
- X encode.1 1
- X encode.c 1
- END_OF_FILE
- if test 393 -ne `wc -c <'MANIFEST'`; then
- echo shar: \"'MANIFEST'\" unpacked with wrong size!
- fi
- # end of 'MANIFEST'
- fi
- if test -f 'Makefile' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'Makefile'\"
- else
- echo shar: Extracting \"'Makefile'\" \(601 characters\)
- sed "s/^X//" >'Makefile' <<'END_OF_FILE'
- X# $Header: /d/tcl/src/uutar/RCS/Makefile,v 1.2.1.2 1993/09/10 21:39:24 tcl Exp $
- X
- XCC = cc
- XCFLAGS = -O
- X
- XCOMMON_SRCS = codes.c
- XCOMMON_BINARIES = ${COMMON_SRCS:.c=.o}
- X
- XENCODE_SRCS = encode.c
- XENCODE_BINARIES = ${ENCODE_SRCS:.c=.o}
- X
- XDECODE_SRCS = decode.c
- XDECODE_BINARIES = ${DECODE_SRCS:.c=.o}
- X
- Xdefault: encode decode
- X
- Xencode: $(COMMON_BINARIES) $(ENCODE_BINARIES)
- X $(CC) -o encode $(COMMON_BINARIES) $(ENCODE_BINARIES)
- X
- Xdecode: $(COMMON_BINARIES) $(DECODE_BINARIES)
- X $(CC) -o decode $(COMMON_BINARIES) $(DECODE_BINARIES)
- X
- Xclean:
- X @touch bunk.o bunk~ encode decode
- X /bin/rm -f *.o *~ encode decode
- END_OF_FILE
- if test 601 -ne `wc -c <'Makefile'`; then
- echo shar: \"'Makefile'\" unpacked with wrong size!
- fi
- # end of 'Makefile'
- fi
- if test -f 'README' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'README'\"
- else
- echo shar: Extracting \"'README'\" \(8859 characters\)
- sed "s/^X//" >'README' <<'END_OF_FILE'
- X$Header: /usr/people/tcl/src/uutar/RCS/README,v 1.3 1993/09/12 00:40:52 tcl Exp $
- X
- X----------- What are encode/decode?
- X
- XEncode and decode are utilities which encode binary data into
- Xprintable format suitable for transmission via email, posting to
- Xusenet, etc. They are intended to replace the aging uuencode and
- Xuudecode.
- X
- X----------- Features:
- X
- XEncode features a very flexible encoding scheme which allows the user
- Xto specify exactly which printable characters to use in the output.
- XThe default is to use all 95 printable characters in the encoding
- Xprocess, as this produces the least expansion of the input data.
- XHowever, for cases such as file transfer to a mainframe or to a
- Xforeign country where some characters may be modified en route, these
- Xcharacters can simply be removed from the output character set.
- XEncoding is possible with as few as 2 characters in the output
- Xcharacter set.
- X
- XRegardless of how many characters are specified in the output
- Xcharacter set, encode only expands the data by a factor very close to
- Xthe theoretical limit for that number of characters. (see next
- Xsection)
- X
- XMy implementation is simple (less than 500 lines total without
- Xcomments) and efficient (runs at a speed comparable to
- Xuuencode/uudecode)
- X
- X----------- Some theory on file expansion during encoding:
- X
- XThe number of bits required to encode n distinct values is log2(n)
- X(log base 2 of n). For example, to encode 256 distinct values, you
- Xneed log2(256) = 8 bits. Let's think of the input file before encoding
- Xas a raw stream of bits without byte boundaries. If we want to
- Xrepresent this data with 256 distinct characters, we will consume 8
- Xbits of the input bitstream per output character. This is how files
- Xare normally encoded. However, if we can't use all 256 output
- Xcharacters, we will consume fewer than 8 input bits per output
- Xcharacter, and thus we will require more output characters to
- Xrepresent the input bitstream than if we had 256 output characters.
- XThus, the process of encoding a binary file in printable format will
- Xnecessarily expand the file. For example if we use the 95 printable
- Xcharacters, we'll consume an average of log2(95) = 6.57 bits in the
- Xinput stream for each output character. Thus the file will be expanded
- Xby a factor of log2(256)/log2(95) = log(256)/log(95) = 1.217 or 21.7%.
- XNote that this is a theoretical figure. In practice, we can't
- Xsubdivide bits, but this figure does provide a theoretical estimate of
- Xthe smallest amount of expansion we can hope to get with n output
- Xcharacters. In practice some coding schemes should be able to do
- Xbetter for select cases, but for a very large sample space of random
- Xdata, no encoding scheme should ever be able to do better than this
- Xtheoretical limit.
- X
- XUuencode maps 3 input characters to 4 output characters for an
- Xexpansion of 33% (not including control information). Lately several
- Xencoding schemes which map 4 input characters to 5 output characters
- Xhave popped up, for an expansion of 25%.
- X
- XAn analysis of encode shows that the average expansion over a very
- Xlarge input file of random data is
- X8 / (pb - 2 + 2n/p)
- Xwhere n is the number of output characters, p is the smallest power of
- X2 greater than or equal to n, and pb is log2(p), or the number of bits
- Xneeded to represent p values. A graph of this function for values of n
- Xfrom 2 to 256 shows a very close approximation of the theoretical
- Xexpansion of log(256)/log(n). For example, for n = 95, the expansion
- Xfactor is
- X8 / (7 - 2 + 2*95/128) = 1.234 or 23.4%
- X
- XNote that all expansion factors given above fail to take into account
- Xthe addition of newline characters to limit output width.
- X
- X----------- The encoding process:
- X
- XThe encoding process used by encode is simply to throw away the byte
- Xboundaries in the input bitstream and insert new byte boundaries in
- Xsuch a manner that there are only n distinct "tokens" in the input
- Xstream where n is the number of output characters. These tokens can
- Xthen be mapped one-to-one with the output characters, both during
- Xencoding and decoding. A good example of this process is uuencode,
- Xwhich discards the byte boundaries which occur every 8 bits and
- Xinserts byte boundaries every 6 bits. The result is a series of tokens
- Xwith a maximum of 64 possible values, each of which is mapped
- Xone-to-one with the output character set of 64 printable characters.
- XThis process is trivial for any n which is a power of two: you simply
- Xinsert byte boundaries every log2(n) bits. When n is not a power of 2,
- Xhowever, the process is somewhat more complicated.
- X
- XWe can no longer insert the byte boundaries at regular intervals of b
- Xbits, since this would imply 2^b output characters. If we select b
- Xsuch that 2^b < n, then we aren't using all n output characters, and
- Xwe're expanding the file more than necessary. On the other hand if we
- Xselect b such that 2^b > n, we don't have enough output characters to
- Xencode the data. The solution is to start with the smallest b such
- Xthat 2^b >= n and then eliminate some of the input tokens until there
- Xare exactly n of them; we can then map one-to-one with the output
- Xcharacters. Input tokens can be eliminated by taking two input tokens
- Xand combining them to form a single, shorter token. This is best
- Xexplained by giving an example.
- X
- XLet's say we have 6 output characters. We start with 8 input tokens:
- X000,001,010,011,100,101,110,111
- XThis set of tokens has the property that any input bitstream can
- Xbe broken down to a series of these tokens in exactly one way.
- XNow let's combine two of the tokens. The tokens to be combined must
- Xhave identical bits except for the last bit, and the process of
- Xcombining strips that bit from the tokens. e.g. 110 and 111 can be
- Xcombined into the token 11, so we now have the token set
- X000,001,010,011,100,101,11
- XIf we combine two more tokens, 100 and 101 -> 10, we get
- X000,001,010,011,10,11
- XThis token set still has the property that any input bitstream can be
- Xbroken down into a series of these tokens in exactly one way, and
- Xsince there are 6 of them, we can map one-to-one with the output
- Xcharacter set.
- X
- XThe standard for the generation of these tokens will be as follows:
- XStart with 2^b distinct tokens of length b bits, where b is the
- Xsmallest integer such that 2^b >= n, where n is the number of output
- Xcharacters. Then, as above, while there are more than n tokens of any
- Xlength, replace the two numerically greatest b length tokens with a
- Xsingle b-1 length token such that the b-1 length token is equivalent
- Xto the b-1 most significant bits of either b length token. (It is
- Xasserted that at any time in the procedure, the two numerically
- Xgreatest b length tokens differ only in the least significant bit).
- X
- XThe standard for the one-to-one mapping between tokens and output
- Xcharacters will be as follows: tokens will be sorted such that all b
- Xlength tokens come first, in numerical order, followed by all b-1
- Xlength tokens, in numerical order. Output characters will be sorted by
- Xascii code in numerical order. A one-to-one mapping will be
- Xestablished between these two sets.
- X
- XThe standard for the checksum will be as follows: The checksum will be
- Xcomputed on the decoded data. It will be 32 bits wide. For each
- Xcharacter read from the input file during encoding or written to the
- Xoutput file during decoding, the checksum will first be rolled 7 bits
- Xto the left (the 7 bits which slide off the MSB end will be reinserted
- Xinto the LSB end) and then the character will be xor'd onto the low
- Xorder 8 bits of the checksum.
- X
- X----------- Implementation:
- X
- XDecoding with this scheme is trivial: you simply map the printable
- Xcharacter from the input to the corresponding variable length token,
- Xand then append that token to the decoded bitstream.
- X
- XEncoding is a bit more tricky however, since the token length is
- Xvariable, and the input bitstream has no token boundaries in it. The
- Xsolution is to set up a 256 element array which is indexed by the next
- X8 bits in the input bitstream. Note that these 8 bits are not
- Xnecessarily byte-aligned in the input file. The indexed element in the
- Xarray will indicate how many bits should be consumed in the input, and
- Xwhat printable character to append to the output. For example, in
- Xorder to recognize the token 010, all elements of the array whose
- Xindex is 010xxxxx for all xxxxx should be set up to indicate that 3
- Xbits were seen and give the printable character that maps to 010. The
- Xinput bitstream will then be advanced by 3 bits and the operation is
- Xrepeated, using the next 8 bits to index the array again.
- X
- XMy implementation of this encoding process is fairly simplistic and
- Xincorporates no more than the basic functionality provided by
- Xuuencode/uudecode. It is intended primarily to introduce this encoding
- Xscheme to the public in the hopes that it will be widely adopted.
- XShould such adoption occur, this file should be used as a standard
- Xreference for the encoding algorithm.
- X
- END_OF_FILE
- if test 8859 -ne `wc -c <'README'`; then
- echo shar: \"'README'\" unpacked with wrong size!
- fi
- # end of 'README'
- fi
- if test -f 'codes.c' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'codes.c'\"
- else
- echo shar: Extracting \"'codes.c'\" \(5195 characters\)
- sed "s/^X//" >'codes.c' <<'END_OF_FILE'
- X/*
- X * $Header: /usr/people/tcl/src/uutar/RCS/codes.c,v 1.1.1.4 1993/09/11 22:42:44 tcl Exp $
- X * Tom Lawrence
- X * tcl@sgi.com
- X */
- X
- X#include <stdio.h>
- X#include <stdlib.h>
- X#include "codes.h"
- X
- X/* see codes.h */
- Xint numchars;
- Xstruct code codes[256];
- X
- X/* initialize a subset of the codes array. val and len define a variable
- X * length bitfield. The array elements will be set up so that any element
- X * of the array whose index is a left-aligned superset of this bitfield
- X * will contain the given output ascii character. E.g. if the bitfield is
- X * 10010, then all elements in the array with subscript 10010xxx for all
- X * xxx, will store the given ascii code, and the length of the bitfield,
- X * in this case 5.
- X */
- Xstatic void
- Xinit_encodeval(codes, val, len, ascii)
- X struct code *codes;
- X int val;
- X int len;
- X int ascii;
- X{
- X int shift, stop;
- X
- X /* determine how far the code must be shifted to be
- X * MSB justified in the byte
- X */
- X shift = 8 - len;
- X
- X /* calculate the upper bound of indices which this bitfield
- X * will match
- X */
- X stop = (val + 1) << shift;
- X
- X /* shift the code over to the left edge of the byte */
- X val <<= shift;
- X
- X /* thus, for every index in the 256 element array which has
- X * this code as a prefix
- X */
- X for(; val < stop; val++) {
- X /* store the code length and the printable character
- X * it represents
- X */
- X codes[val].len = (char)len;
- X codes[val].code = (char)ascii;
- X }
- X}
- X
- X/* convert an ascii character code to an integer. The character code may
- X * be in decimal, hex or octal, or it may be an actual character escaped
- X * with a back-slash
- X */
- Xstatic int
- Xstr2val(str)
- X char *str;
- X{
- X int val;
- X char *end;
- X
- X while(*str == ' ' || *str == '\t')
- X str++;
- X
- X /* check if this is an escaped character */
- X if (*str == '\\') {
- X str++;
- X if (*str == 0) {
- X fprintf(stderr, "missing character in alphabet\n");
- X exit(1);
- X }
- X return((int)*str);
- X }
- X
- X val = (int)strtol(str, &end, 0);
- X if (end == str) {
- X if (*str)
- X fprintf(stderr, "invalid char \'%c\' in alphabet\n", *str);
- X else
- X fprintf(stderr, "empty numerical field in alphabet\n");
- X exit(1);
- X }
- X return(val);
- X}
- X
- X/* parse a range of characters for the output character set and mark each
- X * character as in use in the codes array. A range is either in the form
- X * num-num or just num
- X */
- Xstatic void
- Xparse_charval_range(range)
- X char *range;
- X{
- X char *c, savec = 0;
- X int start, end, x;
- X
- X for(c = range; *c && *c != '-'; c++);
- X savec = *c;
- X *c = 0;
- X
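- X start = str2val(range);
- X if (savec == '-') {
- X end = str2val(c + 1);
- X *c = savec;
- X }
- X else
- X end = start;
- X
- X /* reject codes that fall outside the 256 entry codes array */
- X if (start < 0 || end > 255) {
- X fprintf(stderr, "character code out of range in alphabet\n");
- X exit(1);
- X }
- X for(x = start; x <= end; x++)
- X codes[x].inuse = 1;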
- X}
- X
- X/* parse a list of character ranges for the output character set and then
- X * parse each range found. A list is of the form range,range,...
- X */
- Xvoid
- Xparse_charval_list(list)
- X char *list;
- X{
- X char *c1, *c2, savec2;
- X int x;
- X
- X for(x = 0; x < 256; x++)
- X codes[x].inuse = 0;
- X
- X c1 = list;
- X while(*c1) {
- X while(*c1 == ',')
- X c1++;
- X if (*c1 == 0)
- X return;
- X for(c2 = c1; *c2 && *c2 != ','; c2++);
- X savec2 = *c2;
- X *c2 = 0;
- X parse_charval_range(c1);
- X *c2 = savec2;
- X c1 = c2;
- X }
- X return;
- X}
- X
- X/* print out the character set in the form of a list of ranges, encoded
- X * in decimal
- X */
- Xvoid
- Xprint_charval_list(fp)
- X FILE *fp;
- X{
- X int x, usecomma;
- X
- X usecomma = 0;
- X for(x = 0; x < 256; x++) {
- X if (codes[x].inuse) {
- X if (usecomma)
- X putc(',', fp);
- X fprintf(fp, "%d", x);
- X usecomma = 1;
- X if (x < 255 && codes[x+1].inuse) {
- X putc('-', fp);
- X while(++x < 256 && codes[x].inuse);
- X fprintf(fp, "%d", x-1);
- X }
- X }
- X }
- X}
- X
- X/*
- X * Initialize the tables for encoding or decoding depending on the given
- X * direction.
- X */
- Xvoid
- Xinit_codes(direction)
- X int direction;
- X{
- X int x, code;
- X int pof2, pof2len, half, whole;
- X
- X /* count how big our character set is */
- X numchars = 0;
- X for(x = 0; x < 256; x++)
- X if (codes[x].inuse)
- X numchars++;
- X
- X if (numchars < 2) {
- X fprintf(stderr,
- X "uutar: alphabet doesn't contain enough characters.\n");
- X exit(1);
- X }
- X
- X /* determine the lowest power of 2 that is >= numchars, and the number
- X * of bits needed to store that many values.
- X */
- X for(pof2 = 2, pof2len = 1; pof2 < numchars;
- X pof2 <<= 1, pof2len++);
- X
- X /* compute how many half codes we need */
- X half = pof2 - numchars;
- X
- X /* compute how many whole codes we need */
- X whole = numchars - half;
- X
- X /* create a variable length code for each valid entry */
- X code = 0;
- X x = -1;
- X
- X /* create the whole codes */
- X while(whole--) {
- X /* get next slot */
- X do x++; while(codes[x].inuse == 0);
- X
- X if (direction == DECODE) {
- X codes[x].code = (char)code;
- X codes[x].len = (char)pof2len;
- X }
- X else
- X init_encodeval(codes, code, pof2len, x);
- X code++;
- X }
- X
- X /* chop off LSB to form the half codes */
- X code >>= 1;
- X pof2len--;
- X
- X /* create the half codes */
- X while(half--) {
- X do x++; while(codes[x].inuse == 0);
- X
- X if (direction == DECODE) {
- X codes[x].code = (char)code;
- X codes[x].len = (char)pof2len;
- X }
- X else
- X init_encodeval(codes, code, pof2len, x);
- X code++;
- X }
- X}
- END_OF_FILE
- if test 5195 -ne `wc -c <'codes.c'`; then
- echo shar: \"'codes.c'\" unpacked with wrong size!
- fi
- # end of 'codes.c'
- fi
- if test -f 'codes.h' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'codes.h'\"
- else
- echo shar: Extracting \"'codes.h'\" \(1188 characters\)
- sed "s/^X//" >'codes.h' <<'END_OF_FILE'
- X/*
- X * $Header: /usr/people/tcl/src/uutar/RCS/codes.h,v 1.1.1.2 1993/09/11 18:41:46 tcl Exp $
- X * Tom Lawrence
- X * tcl@sgi.com
- X */
- X
- X/* number of printable characters in output character set */
- Xextern int numchars;
- X
- X/* encoding/decoding table. inuse indicates whether or not the character
- X * whose ascii code is the offset into this array is part of the output
- X * printable character set.
- X *
- X * When encoding, the next 8 bits (not necessarily byte aligned) in the
- X * input binary bitstream are used to index into this array. The code
- X * field then indicates the printable output character to append to the
- X * output, and the len field indicates how many of the input 8 bits
- X * should be consumed by this operation, i.e. the input bitstream is
- X * advanced by len bits.
- X *
- X * When decoding, the input printable ascii character is used to index
- X * into this array. The variable length (8 bits or less) bitfield stored
- X * in code and whose length is len, is appended to the output binary
- X * bitstream.
- X */
- Xextern struct code {
- X char inuse;
- X char code;
- X char len;
- X} codes[256];
- X
- Xvoid init_codes();
- Xvoid parse_charval_list();
- Xvoid print_charval_list();
- X
- X#define ENCODE 0
- X#define DECODE 1
- END_OF_FILE
- if test 1188 -ne `wc -c <'codes.h'`; then
- echo shar: \"'codes.h'\" unpacked with wrong size!
- fi
- # end of 'codes.h'
- fi
- if test -f 'decode.1' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'decode.1'\"
- else
- echo shar: Extracting \"'decode.1'\" \(717 characters\)
- sed "s/^X//" >'decode.1' <<'END_OF_FILE'
- X.\" $Header: /usr/people/tcl/src/uutar/RCS/decode.1,v 1.1 1993/09/11 20:06:09 tcl Exp $
- X.TH decode 1 "11 Sept 1993"
- X.SH NAME
- Xdecode \- decode a file encoded with the encode(1) utility
- X.SH SYNOPSIS
- X.B decode
- X[
- X.B \-i \c
- X.I <inputfile>
- X]
- X[
- X.B \-o \c
- X.I <outputfile>
- X]
- X.SH DESCRIPTION
- X.LP
- XDecode decodes a file which has been encoded in printable format with
- Xthe encode(1) utility.
- X.SH OPTIONS
- X.TP
- X.B \-i\c
- X.I <inputfile>
- X.br
- Xspecifies the file to read input from. If this argument is omitted,
- Xstdin is used.
- X.TP
- X.B \-o\c
- X.I <outputfile>
- X.br
- Xspecifies the file to write output to. If this argument is omitted,
- Xthe name of the output file is obtained from the first line of the
- Xinput file.
- X.SH "SEE ALSO"
- X.BR encode (1),
- END_OF_FILE
- if test 717 -ne `wc -c <'decode.1'`; then
- echo shar: \"'decode.1'\" unpacked with wrong size!
- fi
- # end of 'decode.1'
- fi
- if test -f 'decode.c' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'decode.c'\"
- else
- echo shar: Extracting \"'decode.c'\" \(5584 characters\)
- sed "s/^X//" >'decode.c' <<'END_OF_FILE'
- X/*
- X * $Header: /usr/people/tcl/src/uutar/RCS/decode.c,v 1.1.1.3 1993/09/11 18:42:17 tcl Exp $
- X * Tom Lawrence
- X * tcl@sgi.com
- X */
- X
- X#include <stdio.h>
- X#include <fcntl.h>
- X#include <stdlib.h>
- X#include <string.h>
- X#include <strings.h>
- X#include <ctype.h>
- X#include "codes.h"
- X
- X/*
- X * given a string with n tokens separated by white space in it, and a
- X * pointer to a char vector, create a vector with each pointer pointing
- X * to a successive token and null terminate the tokens. Return the
- X * number of tokens or -1 on error. This routine is destructive to the
- X * passed string
- X */
- X#define IS_WHITE_SPACE(c) (c == ' ' || c == '\t')
- X#define VECLEN 10
- X
- XFILE *infp, *outfp;
- X
- Xstatic int
- Xtokenize(string, vector)
- X char *string;
- X char **vector;
- X{
- X int tokens;
- X enum {
- X WHITE_SPACE,
- X TOKEN
- X } state;
- X char *c;
- X
- X /* scan through the string setting up the vector pointers and
- X * null terminating the tokens
- X */
- X tokens = 0;
- X state = WHITE_SPACE;
- X
- X for(c = string; *c; c++) {
- X if (state == WHITE_SPACE && !IS_WHITE_SPACE(*c)) {
- X /* just hit beginning of a token */
- X vector[tokens] = c;
- X tokens++;
- X state = TOKEN;
- X
- X if (tokens >= VECLEN) {
- X fprintf(stderr, "too many tokens in input\n");
- X exit(1);
- X }
- X }
- X else if (state == TOKEN && IS_WHITE_SPACE(*c)) {
- X /* just ended a token */
- X *c = 0;
- X state = WHITE_SPACE;
- X }
- X }
- X return(tokens);
- X}
- X
- X/* normally I'd use strtol for this, but strtol can't handle
- X * unsigned values greater than 0x7FFFFFFF on some machines.
- X */
- Xstatic unsigned int
- Xhex2long(str)
- X char *str;
- X{
- X unsigned long ret = 0;
- X char *c, c1;
- X
- X for(c = str; *c; c++) {
- X c1 = *c;
- X if (c1 >= '0' && c1 <= '9')
- X c1 -= '0';
- X else if (c1 >= 'a' && c1 <= 'f')
- X c1 -= ('a' - 10);
- X else if (c1 >= 'A' && c1 <= 'F')
- X c1 -= ('A' - 10);
- X ret = (ret << 4) + c1;
- X }
- X return(ret);
- X}
- X
- Xstatic void
- Xusage()
- X{
- X printf("options:\n");
- X printf("-i <inputfile>\n");
- X printf("-o <outputfile>\n");
- X exit(1);
- X}
- X
- X/* parse command line arguments */
- Xstatic void
- Xparse(argc, argv)
- X int argc;
- X char **argv;
- X{
- X char *infile, *outfile;
- X
- X infile = outfile = 0;
- X
- X while(--argc) {
- X argv++;
- X if (!strcmp(*argv, "-i")) {
- X if (argc < 2)
- X usage();
- X argc--;
- X argv++;
- X infile = *argv;
- X }
- X else if (!strcmp(*argv, "-o")) {
- X if (argc < 2)
- X usage();
- X argc--;
- X argv++;
- X outfile = *argv;
- X }
- X else
- X usage();
- X }
- X
- X /* open input stream */
- X if (infile) {
- X if ((infp = fopen(infile, "r")) == 0) {
- X perror(infile);
- X exit(1);
- X }
- X }
- X else
- X infp = stdin;
- X
- X /* open output stream or leave it for later if no output file
- X * was specified
- X */
- X if (outfile) {
- X if ((outfp = fopen(outfile, "w")) == 0) {
- X perror(outfile);
- X exit(1);
- X }
- X }
- X else
- X outfp = 0;
- X}
- X
- Xmain(argc, argv)
- X int argc;
- X char **argv;
- X{
- X char buffer[1024], *tokens[VECLEN], *c, out;
- X int state, numtokens, outfd, buf_offset, lookforend;
- X unsigned int cksum;
- X unsigned short buf;
- X
- X /* parse command line arguments */
- X parse(argc, argv);
- X
- X state = 0;
- X
- X /* clear the output buffer */
- X buf = 0;
- X buf_offset = 16;
- X
- X cksum = 0;
- X lookforend = 0;
- X
- X /* scan the input file */
- X while(fgets(buffer, sizeof(buffer), infp)) {
- X /* remove any newlines */
- X if (c = index(buffer, '\n'))
- X *c = 0;
- X
- X /* if this line is blank, check for an END keyword
- X * on the next line
- X */
- X if (*buffer == 0) {
- X lookforend = 1;
- X continue;
- X }
- X
- X /* state 0 == haven't seen BEGIN yet */
- X if (state == 0) {
- X if (!strncmp(buffer, "BEGIN ", 6)) {
- X state = 1;
- X numtokens = tokenize(buffer, tokens);
- X if (numtokens < 4) {
- X fprintf(stderr, "incomplete BEGIN line in encoded file\n");
- X exit(1);
- X }
- X
- X /* if output file wasn't specified on command line, use the
- X * one encoded in the input file
- X */
- X if (outfp == 0) {
- X /* use open() so we can specify the mode */
- X if ((outfd = open(tokens[2], O_WRONLY | O_CREAT | O_TRUNC,
- X strtol(tokens[1], 0, 8))) < 0) {
- X perror(tokens[2]);
- X exit(1);
- X }
- X outfp = fdopen(outfd, "w");
- X }
- X /* parse the character set and initialize the
- X * codes accordingly
- X */
- X parse_charval_list(tokens[3]);
- X init_codes(DECODE);
- X }
- X }
- X
- X /* state != 0 and we're looking for the END token */
- X else if (lookforend && !strncmp(buffer, "END ", 4)) {
- X numtokens = tokenize(buffer, tokens);
- X
- X /* issue checksum error if there's a mismatch */
- X if (numtokens < 2 || hex2long(tokens[1]) != cksum) {
- X fprintf(stderr, "checksum error.\n");
- X fprintf(stderr, "saw %X, computed %X\n",
- X hex2long(tokens[1]), cksum);
- X exit(1);
- X }
- X exit(0);
- X }
- X
- X /* state != 0 so this is a data line. Decode it */
- X else {
- X for(c = buffer; *c; c++) {
- X
- X /* check for garbage characters in the input; index with an
- X * unsigned value so that bytes above 127 cannot produce a
- X * negative subscript
- X */
- X if (!codes[(unsigned char)*c].inuse) {
- X fprintf(stderr, "invalid char ");
- X if (isprint((unsigned char)*c))
- X fprintf(stderr, "\'%c\' ", *c);
- X fprintf(stderr, "(%d) in input\n", *c);
- X exit(1);
- X }
- X
- X /* append the variable length bitfield that maps to
- X * this input character to the output bitstream
- X */
- X buf_offset -= codes[(unsigned char)*c].len;
- X buf |= (((unsigned short)(unsigned char)(codes[(unsigned char)*c].code) << buf_offset));
- X
- X /* if we've got an entire byte available in the output
- X * buffer, append it to the output file
- X */
- X if (buf_offset < 9) {
- X out = (char)(buf >> 8);
- X putc(out, outfp);
- X cksum = ((cksum << 7) | (cksum >> 25)) ^
- X (unsigned char)out;
- X
- X /* advance the output buffer */
- X buf_offset += 8;
- X buf <<= 8;
- X }
- X }
- X }
- X lookforend = 0;
- X }
- X}
- END_OF_FILE
- if test 5584 -ne `wc -c <'decode.c'`; then
- echo shar: \"'decode.c'\" unpacked with wrong size!
- fi
- # end of 'decode.c'
- fi
- if test -f 'encode.1' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'encode.1'\"
- else
- echo shar: Extracting \"'encode.1'\" \(2628 characters\)
- sed "s/^X//" >'encode.1' <<'END_OF_FILE'
- X.\" $Header: /usr/people/tcl/src/uutar/RCS/encode.1,v 1.2 1993/09/11 22:21:31 tcl Exp $
- X.TH encode 1 "11 Sept 1993"
- X.SH NAME
- Xencode \- encode binary files into printable format
- X.SH SYNOPSIS
- X.B encode
- X[
- X.B \-i \c
- X.I <inputfile>
- X]
- X[
- X.B \-o \c
- X.I <outputfile>
- X]
- X[
- X.B \-n \c
- X.I <name>
- X]
- X[
- X.B \-c \c
- X.I <charset>
- X]
- X.SH DESCRIPTION
- X.LP
- XEncode takes a binary file as input and encodes it into a printable
- Xformat that can be transferred via email.
- X.SH OPTIONS
- X.TP
- X.B \-i\c
- X.I <inputfile>
- X.br
- Xspecifies the file to read input from. If this argument is omitted,
- Xstdin is used.
- X.TP
- X.B \-o\c
- X.I <outputfile>
- X.br
- Xspecifies the file to write output to. If this argument is omitted,
- Xstdout is used.
- X.TP
- X.B \-n\c
- X.I <name>
- X.br
- Xspecifies the filename to store in the output file. This filename will
- Xbe the default filename used to create the decoded file. If this
- Xargument is omitted, the name of the input file is used. If the input
- Xfile is stdin, the string "stdin" is used.
- X.TP
- X.B \-c\c
- X.I <charset>
- X.br
- X
- Xspecifies the character set to encode with. A character set is
- Xspecified as a list of ranges. A range is either a single character
- Xcode or two character codes separated by a hyphen, e.g. 23 or 45-51. A
- Xlist of ranges is 1 or more ranges separated by commas, e.g. 23,45-51.
- XOverlaps in ranges are not a problem; each character is counted only
- Xonce. A character code is any valid number between 0 and 255 decimal,
- Xor the equivalent in octal, hex, or raw escaped characters. Note,
- Xhowever, that it only makes sense to use printable characters in the
- Xrange 32-126. Octal codes must be preceded by a 0, e.g. 023. Hex
- Xcodes must be preceded by 0x, e.g. 0x6e. Raw escaped character codes
- Xmay be specified with a backslash followed by the character itself,
- Xe.g. \\t. If this argument is omitted, the entire set of printable
- Xcharacters, 32-126, is used. The character set is included in the
- Xencoded file in decimal notation with any overlaps removed regardless
- Xof how it is specified on the command line.
- X
- X.SH OUTPUT FORMAT
- X.LP
- XThe first line of the output contains the keyword BEGIN followed by
- Xthe file mode of the input file in octal, the filename to be used when
- Xcreating the decoded file, and the character set used. Immediately
- Xfollowing this line is the encoded data, using only the characters in
- Xthe specified character set. Output width is limited to 79 columns by
- Xinserting a newline every 79 characters. The encoded data terminates
- Xwhen two consecutive newlines are seen. Immediately following the
- Xsecond newline is a line containing the keyword END and a 32 bit
- Xchecksum of the input file in hex.
- X.SH "SEE ALSO"
- X.BR decode (1),
- END_OF_FILE
- if test 2628 -ne `wc -c <'encode.1'`; then
- echo shar: \"'encode.1'\" unpacked with wrong size!
- fi
- # end of 'encode.1'
- fi
- if test -f 'encode.c' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'encode.c'\"
- else
- echo shar: Extracting \"'encode.c'\" \(4700 characters\)
- sed "s/^X//" >'encode.c' <<'END_OF_FILE'
- X/*
- X * $Header: /usr/people/tcl/src/uutar/RCS/encode.c,v 1.1.1.5 1993/09/11 22:42:56 tcl Exp $
- X * Tom Lawrence
- X * tcl@sgi.com
- X */
- X
- X#include <stdio.h>
- X#include <sys/types.h>
- X#include <sys/stat.h>
- X#include <stdlib.h>
- X#include <string.h>
- X#include "codes.h"
- X
- Xstatic FILE *infp, *outfp;
- Xstatic char *name, *charset;
- Xstatic mode_t inmode;
- X
- Xstatic void
- Xusage()
- X{
- X fprintf(stderr, "options:\n");
- X fprintf(stderr, "-i <inputfile>\n");
- X fprintf(stderr, "-o <outputfile>\n");
- X fprintf(stderr, "-n <name>\n");
- X fprintf(stderr, "-c <charset>\n");
- X exit(1);
- X}
- X
- X/* parse command line arguments */
- Xstatic void
- Xparse(argc, argv)
- X int argc;
- X char **argv;
- X{
- X char *infile, *outfile;
- X struct stat statbuf;
- X
- X infile = outfile = 0;
- X name = charset = 0;
- X
- X while(--argc) {
- X argv++;
- X if (!strcmp(*argv, "-i")) {
- X if (argc < 2)
- X usage();
- X argc--;
- X argv++;
- X infile = *argv;
- X }
- X else if (!strcmp(*argv, "-o")) {
- X if (argc < 2)
- X usage();
- X argc--;
- X argv++;
- X outfile = *argv;
- X }
- X else if (!strcmp(*argv, "-n")) {
- X if (argc < 2)
- X usage();
- X argc--;
- X argv++;
- X name = *argv;
- X }
- X else if (!strcmp(*argv, "-c")) {
- X if (argc < 2)
- X usage();
- X argc--;
- X argv++;
- X charset = *argv;
- X }
- X else
- X usage();
- X }
- X
- X /* open the input stream in binary mode; the input is arbitrary bytes */
- X if (infile) {
- X if ((infp = fopen(infile, "rb")) == 0) {
- X perror(infile);
- X exit(1);
- X }
- X if (stat(infile, &statbuf) < 0) {
- X perror(infile);
- X exit(1);
- X }
- X inmode = statbuf.st_mode & 0777;
- X }
- X else {
- X infp = stdin;
- X inmode = 0666;
- X }
- X
- X /* open the output stream */
- X if (outfile) {
- X if ((outfp = fopen(outfile, "w")) == 0) {
- X perror(outfile);
- X exit(1);
- X }
- X }
- X else
- X outfp = stdout;
- X
- X /* get the filename to store in the encoded file */
- X if (name == 0) {
- X if (infile == 0)
- X name = "stdin";
- X else
- X name = infile;
- X }
- X
- X /* set default character set if none was specified */
- X if (charset == 0)
- X charset = "32-126";
- X}
- X
- Xint
- Xmain(argc, argv)
- X int argc;
- X char **argv;
- X{
- X int c;
- X unsigned short buf;
- X int buf_offset, inlen, cols = 0, pattern;
- X unsigned int cksum;
- X
- X /* parse command line arguments */
- X parse(argc, argv);
- X
- X /* parse the supplied character set specification and initialize
- X * tables based on that set
- X */
- X parse_charval_list(charset);
- X init_codes(ENCODE);
- X
- X fprintf(outfp, "BEGIN %o %s ", (unsigned int)inmode, name);
- X print_charval_list(outfp);
- X putc('\n', outfp);
- X
- X /* clear the sliding input buffer */
- X buf = 0;
- X buf_offset = 16;
- X
- X cksum = 0;
- X
- X /* read in the input file */
- X while((c = getc(infp)) != EOF) {
- X
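- X /* compute a checksum on the input file: rotate the running
- X * value left by 7 bits, then XOR in the new byte (this assumes
- X * a 32 bit unsigned int)
- X */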
- X cksum = ((cksum << 7) | (cksum >> 25)) ^ (unsigned)c;
- X
- X /* shift the byte just read in into our sliding buffer */
- X buf_offset -= 8;
- X buf |= ((unsigned short)c << buf_offset);
- X
- X /* see if there are any complete variable length bitfields
- X * in the input buffer. If so, output their corresponding
- X * printable output character and advance the input buffer
- X * by their length in bits
- X */
- X while (1) {
- X
- X /* grab the next 8 bits in the input bitstream */
- X pattern = (int)(buf >> 8);
- X
- X /* determine how many of those bits we will need
- X * to extract from the sliding buffer
- X */
- X inlen = codes[pattern].len;
- X
- X /* if there are not enough bits in the sliding
- X * buffer, stop for now. (interestingly, you don't need
- X * to have all of the needed bits in order to determine
- X * that you don't have all of the needed bits)
- X */
- X if (inlen > (16 - buf_offset))
- X break;
- X
- X /* output the printable character associated with
- X * the variable length bitfield recognized in the
- X * input bitstream
- X */
- X putc(codes[pattern].code, outfp);
- X
- X /* limit our width */
- X if (++cols == 79) {
- X cols = 0;
- X putc('\n', outfp);
- X }
- X
- X /* advance the input bitstream by the length of the bitfield
- X * just recognized
- X */
- X buf_offset += inlen;
- X buf <<= inlen;
- X }
- X }
- X
- X /* flush the buffer. The last byte read in may still have some
- X * of its bits in the sliding buffer. If so, print out one more
- X * output character. This will necessarily append some garbage
- X * bits to the output but what can we do? we can't write files
- X * at a finer granularity than the byte. The decoder will ignore
- X * them so it's ok
- X */
- X if (buf_offset < 16) {
- X putc(codes[pattern].code, outfp);
- X cols++;
- X }
- X
- X /* indicate end of encoded data by 2 consecutive newlines followed
- X * by the keyword END. This is necessary since the END line itself
- X * is potentially valid encoded data
- X */
- X if (cols)
- X putc('\n', outfp);
- X fprintf(outfp, "\nEND %X\n", cksum);
- X return 0;
- X}
- END_OF_FILE
- if test 4700 -ne `wc -c <'encode.c'`; then
- echo shar: \"'encode.c'\" unpacked with wrong size!
- fi
- # end of 'encode.c'
- fi
- echo shar: End of archive 1 \(of 1\).
- cp /dev/null ark1isdone
- MISSING=""
- for I in 1 ; do
- if test ! -f ark${I}isdone ; then
- MISSING="${MISSING} ${I}"
- fi
- done
- if test "${MISSING}" = "" ; then
- echo You have the archive.
- rm -f ark[1-9]isdone
- else
- echo You still need to unpack the following archives:
- echo " " ${MISSING}
- fi
- ## End of shell archive.
- exit 0
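A note for anyone writing an independent decoder or checking an encoded
file by hand: the checksum printed on the END line can be reproduced in
isolation. The sketch below restates the checksum step from the read loop
in encode.c as a stand-alone function. The name rotxor_cksum and the use
of uint32_t are assumptions of this sketch; the original uses a plain
unsigned int and relies on it being 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the archive: computes the same
 * value as the checksum loop in encode.c.  For each input byte the
 * running value is rotated left by 7 bits and the byte is XORed in.
 */
static uint32_t
rotxor_cksum(const unsigned char *data, size_t len)
{
    uint32_t cksum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        cksum = ((cksum << 7) | (cksum >> 25)) ^ (uint32_t)data[i];
    return cksum;
}
```

Printing the result with "%X" should match the value on the END line for
the same input bytes, provided unsigned int was 32 bits wide on the
machine that did the encoding.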
-
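The character set syntax described in encode.1 (comma-separated ranges,
each a single code or two codes joined by a hyphen) can be sketched as
follows. This is not the archive's parse_charval_list, which does not
appear in this part of the listing; parse_code and parse_ranges are
hypothetical names. The sketch assumes that a backslash followed by a
character stands for that character's own code, as the manual page
wording suggests, and it uses strtol with base 0 to accept decimal,
octal (leading 0) and hex (leading 0x) codes.

```c
#include <stdlib.h>
#include <string.h>

/* Parse one character code at *sp and advance *sp past it.
 * Returns the code (0-255), or -1 on a syntax or range error.
 */
static int
parse_code(const char **sp)
{
    const char *s = *sp;
    char *end;
    long v;

    /* raw escaped character: backslash, then the character itself */
    if (*s == '\\' && s[1] != '\0') {
        *sp = s + 2;
        return (unsigned char)s[1];
    }
    /* insist on a digit so strtol cannot skip blanks or a sign */
    if (*s < '0' || *s > '9')
        return -1;
    v = strtol(s, &end, 0);
    if (end == s || v < 0 || v > 255)
        return -1;
    *sp = end;
    return (int)v;
}

/* Mark every code named by spec in set[]; overlapping ranges are
 * counted once.  Returns the number of distinct codes, or -1 on error.
 */
static int
parse_ranges(const char *spec, unsigned char set[256])
{
    int lo, hi, i, count = 0;

    memset(set, 0, 256);
    for (;;) {
        if ((lo = hi = parse_code(&spec)) < 0)
            return -1;
        if (*spec == '-') {
            spec++;
            /* also catches hi == -1, since lo >= 0 here */
            if ((hi = parse_code(&spec)) < lo)
                return -1;
        }
        for (i = lo; i <= hi; i++) {
            if (!set[i]) {
                set[i] = 1;
                count++;
            }
        }
        if (*spec == '\0')
            return count;
        if (*spec != ',')
            return -1;
        spec++;
    }
}
```

With this sketch, "23,45-51" yields 8 distinct codes and the default
"32-126" yields 95. Rejecting a reversed range such as "51-45" is a
choice made here; the manual page does not say how encode treats one.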