Source Code 1994 March

Text File | 1993-09-12 | 35.7 KB | 1,242 lines

Newsgroups: comp.sources.unix
From: tcl@hellfudge.asd.sgi.com (Tom Lawrence)
Subject: v27i035: encode - utilities encode/decode binary files in ascii format, Part01/01
Message-id: <1.747861129.4245@gw.home.vix.com>
Sender: unix-sources-moderator@gw.home.vix.com
Approved: vixie@gw.home.vix.com

Submitted-By: tcl@hellfudge.asd.sgi.com (Tom Lawrence)
Posting-Number: Volume 27, Issue 35
Archive-Name: encode/part01

----------- What are encode/decode?

Encode and decode are utilities which encode binary data into
printable format suitable for transmission via email, posting to
usenet, etc. They are intended to replace the aging uuencode and
uudecode.

----------- Features:

Encode features a very flexible encoding scheme which allows the user
to specify exactly which printable characters to use in the output.
The default is to use all 95 printable characters in the encoding
process, as this produces the least expansion of the input data.
However, for cases such as file transfer to a mainframe or to a
foreign country where some characters may be modified en route, these
characters can simply be removed from the output character set.
Encoding is possible with as few as 2 characters in the output
character set.

Regardless of how many characters are specified in the output
character set, encode only expands the data by a factor very close to
the theoretical limit for that number of characters. (see next
section)

The implementation is simple (less than 500 lines total without
comments) and efficient (runs at a speed comparable to
uuencode/uudecode)

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of archive 1 (of 1)."
# Contents:  MANIFEST Makefile README codes.c codes.h decode.1 decode.c
#   encode.1 encode.c
# Wrapped by vixie@gw.home.vix.com on Sun Sep 12 12:10:53 1993
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'MANIFEST' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'MANIFEST'\"
else
echo shar: Extracting \"'MANIFEST'\" \(393 characters\)
sed "s/^X//" >'MANIFEST' <<'END_OF_FILE'
X   File Name            Archive #       Description
X-----------------------------------------------------------
X MANIFEST                   1           This shipping list
X Makefile                   1
X README                     1
X codes.c                    1
X codes.h                    1
X decode.1                   1
X decode.c                   1
X encode.1                   1
X encode.c                   1
END_OF_FILE
if test 393 -ne `wc -c <'MANIFEST'`; then
    echo shar: \"'MANIFEST'\" unpacked with wrong size!
fi
# end of 'MANIFEST'
fi
if test -f 'Makefile' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'Makefile'\"
else
echo shar: Extracting \"'Makefile'\" \(601 characters\)
sed "s/^X//" >'Makefile' <<'END_OF_FILE'
X# $Header: /d/tcl/src/uutar/RCS/Makefile,v 1.2.1.2 1993/09/10 21:39:24 tcl Exp $
X
XCC = cc
XCFLAGS = -O
X
XCOMMON_SRCS = codes.c
XCOMMON_BINARIES = ${COMMON_SRCS:.c=.o}
X
XENCODE_SRCS = encode.c
XENCODE_BINARIES = ${ENCODE_SRCS:.c=.o}
X
XDECODE_SRCS = decode.c
XDECODE_BINARIES = ${DECODE_SRCS:.c=.o}
X
Xdefault: encode decode
X
Xencode: $(COMMON_BINARIES) $(ENCODE_BINARIES)
X	$(CC) -o encode $(COMMON_BINARIES) $(ENCODE_BINARIES)
X
Xdecode: $(COMMON_BINARIES) $(DECODE_BINARIES)
X	$(CC) -o decode $(COMMON_BINARIES) $(DECODE_BINARIES)
X
Xclean:
X	@touch bunk.o bunk~ encode decode
X	/bin/rm -f *.o *~ encode decode
END_OF_FILE
if test 601 -ne `wc -c <'Makefile'`; then
    echo shar: \"'Makefile'\" unpacked with wrong size!
fi
# end of 'Makefile'
fi
if test -f 'README' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\" \(8859 characters\)
sed "s/^X//" >'README' <<'END_OF_FILE'
X$Header: /usr/people/tcl/src/uutar/RCS/README,v 1.3 1993/09/12 00:40:52 tcl Exp $
X
X----------- What are encode/decode?
X
XEncode and decode are utilities which encode binary data into
Xprintable format suitable for transmission via email, posting to
Xusenet, etc. They are intended to replace the aging uuencode and
Xuudecode.
X
X----------- Features:
X
XEncode features a very flexible encoding scheme which allows the user
Xto specify exactly which printable characters to use in the output.
XThe default is to use all 95 printable characters in the encoding
Xprocess, as this produces the least expansion of the input data.
XHowever, for cases such as file transfer to a mainframe or to a
Xforeign country where some characters may be modified en route, these
Xcharacters can simply be removed from the output character set.
XEncoding is possible with as few as 2 characters in the output
Xcharacter set.
X
XRegardless of how many characters are specified in the output
Xcharacter set, encode only expands the data by a factor very close to
Xthe theoretical limit for that number of characters. (see next
Xsection)
X
XMy implementation is simple (less than 500 lines total without
Xcomments) and efficient (runs at a speed comparable to
Xuuencode/uudecode)
X
X----------- Some theory on file expansion during encoding:
X
XThe number of bits required to encode n distinct values is log2(n)
X(log base 2 of n). For example, to encode 256 distinct values, you
Xneed log2(256) = 8 bits. Let's think of the input file before encoding
Xas a raw stream of bits without byte boundaries. If we want to
Xrepresent this data with 256 distinct characters, we will consume 8
Xbits of the input bitstream per output character. This is how files
Xare normally encoded.
However, if we can't use all 256 output
Xcharacters, we will consume fewer than 8 input bits per output
Xcharacter, and thus we will require more output characters to
Xrepresent the input bitstream than if we had 256 output characters.
XThus, the process of encoding a binary file in printable format will
Xnecessarily expand the file. For example if we use the 95 printable
Xcharacters, we'll consume an average of log2(95) = 6.57 bits in the
Xinput stream for each output character. Thus the file will be expanded
Xby a factor of log2(256)/log2(95) = log(256)/log(95) = 1.217 or 21.7%.
XNote that this is a theoretical figure. In practice, we can't
Xsubdivide bits, but this figure does provide a theoretical estimate of
Xthe smallest amount of expansion we can hope to get with n output
Xcharacters. In practice some coding schemes should be able to do
Xbetter for select cases, but for a very large sample space of random
Xdata, no encoding scheme should ever be able to do better than this
Xtheoretical limit.
X
XUuencode maps 3 input characters to 4 output characters for an
Xexpansion of 33% (not including control information). Lately several
Xencoding schemes which map 4 input characters to 5 output characters
Xhave popped up, for an expansion of 25%.
X
XAn analysis of encode shows that the average expansion over a very
Xlarge input file of random data is
X8 / (pb - 2 + 2n/p)
Xwhere n is the number of output characters, p is the smallest power of
X2 greater than or equal to n, and pb is log2(p), or the number of bits
Xneeded to represent p values. A graph of this function for values of n
Xfrom 2 to 256 shows a very close approximation of the theoretical
Xexpansion of log(256)/log(n). For example, for n = 95, the expansion
Xfactor is
X8 / (7 - 2 + 2*95/128) = 1.234 or 23.4%
X
XNote that all expansion factors given above fail to take into account
Xthe addition of newline characters to limit output width.
X
X----------- The encoding process:
X
XThe encoding process used by encode is simply to throw away the byte
Xboundaries in the input bitstream and insert new byte boundaries in
Xsuch a manner that there are only n distinct "tokens" in the input
Xstream where n is the number of output characters. These tokens can
Xthen be mapped one-to-one with the output characters, both during
Xencoding and decoding. A good example of this process is uuencode,
Xwhich discards the byte boundaries which occur every 8 bits and
Xinserts byte boundaries every 6 bits. The result is a series of tokens
Xwith a maximum of 64 possible values, each of which is mapped
Xone-to-one with the output character set of 64 printable characters.
XThis process is trivial for any n which is a power of two; you simply
Xinsert byte boundaries every log2(n) bits. When n is not a power of 2,
Xhowever, the process is somewhat more complicated.
X
XWe can no longer insert the byte boundaries at regular intervals of b
Xbits, since this would imply 2^b output characters. If we select b
Xsuch that 2^b < n, then we aren't using all n output characters, and
Xwe're expanding the file more than necessary. On the other hand if we
Xselect b such that 2^b > n, we don't have enough output characters to
Xencode the data. The solution is to start with the smallest b such
Xthat 2^b >= n and then eliminate some of the input tokens until there
Xare exactly n of them, then we can map one-to-one with the output
Xcharacters. Input tokens can be eliminated by taking two input tokens
Xand combining them to form a single, shorter token. This is best
Xexplained by giving an example.
X
XLet's say we have 6 output characters. We start with 8 input tokens:
X000,001,010,011,100,101,110,111
XThis set of tokens has the property that any input bitstream can
Xbe broken down to a series of these tokens in exactly one way.
XNow let's combine two of the tokens.
The tokens to be combined must
Xhave identical bits except for the last bit, and the process of
Xcombining strips that bit from the tokens. e.g. 110 and 111 can be
Xcombined into the token 11, so we now have the token set
X000,001,010,011,100,101,11
XIf we combine two more tokens, 100 and 101 -> 10, we get
X000,001,010,011,10,11
XThis token set still has the property that any input bitstream can be
Xbroken down into a series of these tokens in exactly one way, and
Xsince there are 6 of them, we can map one-to-one with the output
Xcharacter set.
X
XThe standard for the generation of these tokens will be as follows:
XStart with 2^b distinct tokens of length b bits, where b is the
Xsmallest integer such that 2^b >= n, where n is the number of output
Xcharacters. Then, as above, while there are more than n tokens of any
Xlength, replace the two numerically greatest b length tokens with a
Xsingle b-1 length token such that the b-1 length token is equivalent
Xto the b-1 most significant bits of either b length token. (It is
Xasserted that at any time in the procedure, the two numerically
Xgreatest b length tokens differ only in the least significant bit).
X
XThe standard for the one-to-one mapping between tokens and output
Xcharacters will be as follows: tokens will be sorted such that all b
Xlength tokens come first, in numerical order, followed by all b-1
Xlength tokens, in numerical order. Output characters will be sorted by
Xascii code in numerical order. A one-to-one mapping will be
Xestablished between these two sets.
X
XThe standard for the checksum will be as follows: The checksum will be
Xcomputed on the decoded data. It will be 32 bits wide. For each
Xcharacter read from the input file during encoding or written to the
Xoutput file during decoding, the checksum will first be rolled 7 bits
Xto the left (the 7 bits which slide off the MSB end will be reinserted
Xinto the LSB end) and then the character will be xor'd onto the low
Xorder 8 bits of the checksum.
X
X----------- Implementation:
X
XDecoding with this scheme is trivial: you simply map the printable
Xcharacter from the input to the corresponding variable length token,
Xand then append that token to the decoded bitstream.
X
XEncoding is a bit more tricky however, since the token length is
Xvariable, and the input bitstream has no token boundaries in it. The
Xsolution is to set up a 256 element array which is indexed by the next
X8 bits in the input bitstream. Note that these 8 bits are not
Xnecessarily byte-aligned in the input file. The indexed element in the
Xarray will indicate how many bits should be consumed in the input, and
Xwhat printable character to append to the output. For example, in
Xorder to recognize the token 010, all elements of the array whose
Xindex is 010xxxxx for all xxxxx should be set up to indicate that 3
Xbits were seen and give the printable character that maps to 010. The
Xinput bitstream will then be advanced by 3 bits and the operation is
Xrepeated, using the next 8 bits to index the array again.
X
XMy implementation of this encoding process is fairly simplistic and
Xincorporates no more than the basic functionality provided by
Xuuencode/uudecode. It is intended primarily to introduce this encoding
Xscheme to the public in the hopes that it will be widely adopted.
XShould such adoption occur, this file should be used as a standard
Xreference for the encoding algorithm.
X
END_OF_FILE
if test 8859 -ne `wc -c <'README'`; then
    echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
if test -f 'codes.c' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'codes.c'\"
else
echo shar: Extracting \"'codes.c'\" \(5195 characters\)
sed "s/^X//" >'codes.c' <<'END_OF_FILE'
X/*
X * $Header: /usr/people/tcl/src/uutar/RCS/codes.c,v 1.1.1.4 1993/09/11 22:42:44 tcl Exp $
X * Tom Lawrence
X * tcl@sgi.com
X */
X
X#include <stdio.h>
X#include <stdlib.h>
X#include "codes.h"
X
X/* see codes.h */
Xint numchars;
Xstruct code codes[256];
X
X/* initialize a subset of the codes array. val and len define a variable
X * length bitfield. The array elements will be set up so that any element
X * of the array whose index is a left-aligned superset of this bitfield
X * will contain the given output ascii character. E.g. if the bitfield is
X * 10010, then all elements in the array with subscript 10010xxx for all
X * xxx, will store the given ascii code, and the length of the bitfield,
X * in this case 5.
X */
Xstatic void
Xinit_encodeval(codes, val, len, ascii)
X    struct code *codes;
X    int val;
X    int len;
X    int ascii;
X{
X    int shift, stop;
X
X    /* determine how far the code must be shifted to be
X     * MSB justified in the byte
X     */
X    shift = 8 - len;
X
X    /* calculate the upper bound of indices which this bitfield
X     * will match
X     */
X    stop = (val + 1) << shift;
X
X    /* shift the code over to the left edge of the byte */
X    val <<= shift;
X
X    /* thus, for every index in the 256 element array which has
X     * this code as a prefix
X     */
X    for(; val < stop; val++) {
X        /* store the code length and the printable character
X         * it represents
X         */
X        codes[val].len = (char)len;
X        codes[val].code = (char)ascii;
X    }
X}
X
X/* convert an ascii character code to an integer.
The character code may
X * be in decimal, hex or octal, or it may be an actual character escaped
X * with a back-slash
X */
Xstatic int
Xstr2val(str)
X    char *str;
X{
X    int val;
X    char *end;
X
X    while(*str == ' ' || *str == '\t')
X        str++;
X
X    /* check if this is an escaped character */
X    if (*str == '\\') {
X        str++;
X        if (*str == 0) {
X            fprintf(stderr, "missing character in alphabet\n");
X            exit(1);
X        }
X        return((int)*str);
X    }
X
X    val = (int)strtol(str, &end, 0);
X    if (end == str) {
X        if (*str)
X            fprintf(stderr, "invalid char \'%c\' in alphabet\n", *str);
X        else
X            fprintf(stderr, "empty numerical field in alphabet\n");
X        exit(1);
X    }
X    return(val);
X}
X
X/* parse a range of characters for the output character set and mark each
X * character as in use in the codes array. A range is either in the form
X * num-num or just num
X */
Xstatic void
Xparse_charval_range(range)
X    char *range;
X{
X    char *c, savec = 0;
X    int start, end, x;
X
X    for(c = range; *c && *c != '-'; c++);
X    savec = *c;
X    *c = 0;
X
X    start = str2val(range);
X    if (savec == '-') {
X        end = str2val(c + 1);
X        *c = savec;
X        for(x = start; x <= end; x++)
X            codes[x].inuse = 1;
X    }
X    else
X        codes[start].inuse = 1;
X}
X
X/* parse a list of character ranges for the output character set and then
X * parse each range found. A list is of the form range,range,...
X */
Xvoid
Xparse_charval_list(list)
X    char *list;
X{
X    char *c1, *c2, savec2;
X    int x;
X
X    for(x = 0; x < 256; x++)
X        codes[x].inuse = 0;
X
X    c1 = list;
X    while(*c1) {
X        while(*c1 == ',')
X            c1++;
X        if (*c1 == 0)
X            return;
X        for(c2 = c1; *c2 && *c2 != ','; c2++);
X        savec2 = *c2;
X        *c2 = 0;
X        parse_charval_range(c1);
X        *c2 = savec2;
X        c1 = c2;
X    }
X    return;
X}
X
X/* print out the character set in the form of a list of ranges, encoded
X * in decimal
X */
Xvoid
Xprint_charval_list(fp)
X    FILE *fp;
X{
X    int x, usecomma;
X
X    usecomma = 0;
X    for(x = 0; x < 256; x++) {
X        if (codes[x].inuse) {
X            if (usecomma)
X                putc(',', fp);
X            fprintf(fp, "%d", x);
X            usecomma = 1;
X            if (x < 255 && codes[x+1].inuse) {
X                putc('-', fp);
X                while(++x < 256 && codes[x].inuse);
X                fprintf(fp, "%d", x-1);
X            }
X        }
X    }
X}
X
X/*
X * Initialize the tables for encoding or decoding depending on the given
X * direction.
X */
Xvoid
Xinit_codes(direction)
X    int direction;
X{
X    int x, code, numchars;
X    int pof2, pof2len, half, whole;
X
X    /* count how big our character set is */
X    numchars = 0;
X    for(x = 0; x < 256; x++)
X        if (codes[x].inuse)
X            numchars++;
X
X    if (numchars < 2) {
X        fprintf(stderr,
X            "uutar: alphabet doesn't contain enough characters.\n");
X        exit(1);
X    }
X
X    /* determine the lowest power of 2 that is >= numchars, and the number
X     * of bits needed to store that many values.
X     */
X    for(pof2 = 2, pof2len = 1; pof2 < numchars;
X        pof2 <<= 1, pof2len++);
X
X    /* compute how many half codes we need */
X    half = pof2 - numchars;
X
X    /* compute how many whole codes we need */
X    whole = numchars - half;
X
X    /* create a variable length code for each valid entry */
X    code = 0;
X    x = -1;
X
X    /* create the whole codes */
X    while(whole--) {
X        /* get next slot */
X        do x++; while(codes[x].inuse == 0);
X
X        if (direction == DECODE) {
X            codes[x].code = (char)code;
X            codes[x].len = (char)pof2len;
X        }
X        else
X            init_encodeval(codes, code, pof2len, x);
X        code++;
X    }
X
X    /* chop off LSB to form the half codes */
X    code >>= 1;
X    pof2len--;
X
X    /* create the half codes */
X    while(half--) {
X        do x++; while(codes[x].inuse == 0);
X
X        if (direction == DECODE) {
X            codes[x].code = (char)code;
X            codes[x].len = (char)pof2len;
X        }
X        else
X            init_encodeval(codes, code, pof2len, x);
X        code++;
X    }
X}
END_OF_FILE
if test 5195 -ne `wc -c <'codes.c'`; then
    echo shar: \"'codes.c'\" unpacked with wrong size!
fi
# end of 'codes.c'
fi
if test -f 'codes.h' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'codes.h'\"
else
echo shar: Extracting \"'codes.h'\" \(1188 characters\)
sed "s/^X//" >'codes.h' <<'END_OF_FILE'
X/*
X * $Header: /usr/people/tcl/src/uutar/RCS/codes.h,v 1.1.1.2 1993/09/11 18:41:46 tcl Exp $
X * Tom Lawrence
X * tcl@sgi.com
X */
X
X/* number of printable characters in output character set */
Xextern int numchars;
X
X/* encoding/decoding table. inuse indicates whether or not the character
X * whose ascii code is the offset into this array is part of the output
X * printable character set.
X *
X * When encoding, the next 8 bits (not necessarily byte aligned) in the
X * input binary bitstream are used to index into this array. The code
X * field then indicates the printable output character to append to the
X * output, and the len field indicates how many of the input 8 bits
X * should be consumed by this operation, i.e.
the input bitstream is
X * advanced by len bits.
X *
X * When decoding, the input printable ascii character is used to index
X * into this array. The variable length (8 bits or less) bitfield stored
X * in code and whose length is len, is appended to the output binary
X * bitstream.
X */
Xextern struct code {
X    char inuse;
X    char code;
X    char len;
X} codes[256];
X
Xvoid init_codes();
Xvoid parse_charval_list();
Xvoid print_charval_list();
X
X#define ENCODE 0
X#define DECODE 1
END_OF_FILE
if test 1188 -ne `wc -c <'codes.h'`; then
    echo shar: \"'codes.h'\" unpacked with wrong size!
fi
# end of 'codes.h'
fi
if test -f 'decode.1' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'decode.1'\"
else
echo shar: Extracting \"'decode.1'\" \(717 characters\)
sed "s/^X//" >'decode.1' <<'END_OF_FILE'
X.\" $Header: /usr/people/tcl/src/uutar/RCS/decode.1,v 1.1 1993/09/11 20:06:09 tcl Exp $
X.TH decode 1 "11 Sept 1993"
X.SH NAME
Xdecode \- decode a file encoded with the encode(1) utility
X.SH SYNOPSIS
X.B decode
X[
X.B \-i \c
X.I <inputfile>
X]
X[
X.B \-o \c
X.I <outputfile>
X]
X.SH DESCRIPTION
X.LP
XDecode decodes a file which has been encoded in printable format with
Xthe encode(1) utility.
X.SH OPTIONS
X.TP
X.B \-i\c
X.I <inputfile>
X.br
Xspecifies the file to read input from. If this argument is omitted,
Xstdin is used.
X.TP
X.B \-o\c
X.I <outputfile>
X.br
Xspecifies the file to write output to. If this argument is omitted,
Xthe name of the output file is obtained from the first line of the
Xinput file.
X.SH "SEE ALSO"
X.BR encode (1),
END_OF_FILE
if test 717 -ne `wc -c <'decode.1'`; then
    echo shar: \"'decode.1'\" unpacked with wrong size!
fi
# end of 'decode.1'
fi
if test -f 'decode.c' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'decode.c'\"
else
echo shar: Extracting \"'decode.c'\" \(5584 characters\)
sed "s/^X//" >'decode.c' <<'END_OF_FILE'
X/*
X * $Header: /usr/people/tcl/src/uutar/RCS/decode.c,v 1.1.1.3 1993/09/11 18:42:17 tcl Exp $
X * Tom Lawrence
X * tcl@sgi.com
X */
X
X#include <stdio.h>
X#include <fcntl.h>
X#include <stdlib.h>
X#include <string.h>
X#include <strings.h>
X#include <ctype.h>
X#include "codes.h"
X
X/*
X * given a string with n tokens separated by white space in it, and a
X * pointer to a char vector, create a vector with each pointer pointing
X * to a successive token and null terminate the tokens. Return the
X * number of tokens or -1 on error. This routine is destructive to the
X * passed string
X */
X#define IS_WHITE_SPACE(c) (c == ' ' || c == '\t')
X#define VECLEN 10
X
XFILE *infp, *outfp;
X
Xstatic int
Xtokenize(string, vector)
X    char *string;
X    char **vector;
X{
X    int tokens;
X    enum {
X        WHITE_SPACE,
X        TOKEN
X    } state;
X    char *c;
X
X    /* scan through the string setting up the vector pointers and
X     * null terminating the tokens
X     */
X    tokens = 0;
X    state = WHITE_SPACE;
X
X    for(c = string; *c; c++) {
X        if (state == WHITE_SPACE && !IS_WHITE_SPACE(*c)) {
X            /* just hit beginning of a token */
X            vector[tokens] = c;
X            tokens++;
X            state = TOKEN;
X
X            if (tokens >= VECLEN) {
X                fprintf(stderr, "too many tokens in input\n");
X                exit(1);
X            }
X        }
X        else if (state == TOKEN && IS_WHITE_SPACE(*c)) {
X            /* just ended a token */
X            *c = 0;
X            state = WHITE_SPACE;
X        }
X    }
X    return(tokens);
X}
X
X/* normally I'd use strtol for this, but strtol can't handle
X * unsigned values greater than 0x7FFFFFFF on some machines.
X */
Xstatic unsigned int
Xhex2long(str)
X    char *str;
X{
X    unsigned long ret = 0;
X    char *c, c1;
X
X    for(c = str; *c; c++) {
X        c1 = *c;
X        if (c1 >= '0' && c1 <= '9')
X            c1 -= '0';
X        else if (c1 >= 'a' && c1 <= 'f')
X            c1 -= ('a' - 10);
X        else if (c1 >= 'A' && c1 <= 'F')
X            c1 -= ('A' - 10);
X        ret = (ret << 4) + c1;
X    }
X    return(ret);
X}
X
Xstatic void
Xusage()
X{
X    printf("options:\n");
X    printf("-i <inputfile>\n");
X    printf("-o <outputfile>\n");
X    exit(1);
X}
X
X/* parse command line arguments */
Xstatic void
Xparse(argc, argv)
X    int argc;
X    char **argv;
X{
X    char *infile, *outfile;
X
X    infile = outfile = 0;
X
X    while(--argc) {
X        argv++;
X        if (!strcmp(*argv, "-i")) {
X            if (argc < 2)
X                usage();
X            argc--;
X            argv++;
X            infile = *argv;
X        }
X        else if (!strcmp(*argv, "-o")) {
X            if (argc < 2)
X                usage();
X            argc--;
X            argv++;
X            outfile = *argv;
X        }
X        else
X            usage();
X    }
X
X    /* open input stream */
X    if (infile) {
X        if ((infp = fopen(infile, "r")) == 0) {
X            perror(infile);
X            exit(1);
X        }
X    }
X    else
X        infp = stdin;
X
X    /* open output stream or leave it for later if no output file
X     * was specified
X     */
X    if (outfile) {
X        if ((outfp = fopen(outfile, "w")) == 0) {
X            perror(outfile);
X            exit(1);
X        }
X    }
X    else
X        outfp = 0;
X}
X
Xmain(argc, argv)
X    int argc;
X    char **argv;
X{
X    char buffer[1024], *tokens[VECLEN], *c, out;
X    int state, numtokens, outfd, buf_offset, lookforend;
X    unsigned int cksum;
X    unsigned short buf;
X
X    /* parse command line arguments */
X    parse(argc, argv);
X
X    state = 0;
X
X    /* clear the output buffer */
X    buf = 0;
X    buf_offset = 16;
X
X    cksum = 0;
X    lookforend = 0;
X
X    /* scan the input file */
X    while(fgets(buffer, sizeof(buffer), infp)) {
X        /* remove any newlines */
X        if (c = index(buffer, '\n'))
X            *c = 0;
X
X        /* if this line is blank, check for an END keyword
X         * on the next line
X         */
X        if (*buffer == 0) {
X            lookforend = 1;
X            continue;
X        }
X
X        /* state 0 == haven't seen BEGIN yet */
X        if (state == 0) {
X            if (!strncmp(buffer, "BEGIN ", 6)) {
X                state = 1;
X                numtokens = tokenize(buffer, tokens);
X                if (numtokens < 4) {
X                    fprintf(stderr, "incomplete BEGIN line in encoded file\n");
X                    exit(1);
X                }
X
X                /* if output file wasn't specified on command line, use the
X                 * one encoded in the input file
X                 */
X                if (outfp == 0) {
X                    /* use open() so we can specify the mode */
X                    if ((outfd = open(tokens[2], O_WRONLY | O_CREAT | O_TRUNC,
X                        strtol(tokens[1], 0, 8))) < 0) {
X                        perror(tokens[2]);
X                        exit(1);
X                    }
X                    outfp = fdopen(outfd, "w");
X                }
X                /* parse the character set and initialize the
X                 * codes accordingly
X                 */
X                parse_charval_list(tokens[3]);
X                init_codes(DECODE);
X            }
X        }
X
X        /* state != 0 and we're looking for the END token */
X        else if (lookforend && !strncmp(buffer, "END ", 4)) {
X            numtokens = tokenize(buffer, tokens);
X
X            /* issue checksum error if there's a mismatch */
X            if (numtokens < 2 || hex2long(tokens[1]) != cksum) {
X                fprintf(stderr, "checksum error.\n");
X                fprintf(stderr, "saw %X, computed %X\n",
X                    hex2long(tokens[1]), cksum);
X                exit(1);
X            }
X            exit(0);
X        }
X
X        /* state != 0 so this is a data line. Decode it */
X        else {
X            for(c = buffer; *c; c++) {
X
X                /* check for garbage characters in the input */
X                if (!codes[*c].inuse) {
X                    fprintf(stderr, "invalid char ");
X                    if (isprint(*c))
X                        fprintf(stderr, "\'%c\' ", *c);
X                    fprintf(stderr, "(%d) in input", *c);
X                    exit(1);
X                }
X
X                /* append the variable length bitfield that maps to
X                 * this input character to the output bitstream
X                 */
X                buf_offset -= codes[*c].len;
X                buf |= (((unsigned short)(codes[*c].code) << buf_offset));
X
X                /* if we've got an entire byte available in the output
X                 * buffer, append it to the output file
X                 */
X                if (buf_offset < 9) {
X                    out = (char)(buf >> 8);
X                    putc(out, outfp);
X                    cksum = ((cksum << 7) | (cksum >> 25)) ^
X                        (unsigned char)out;
X
X                    /* advance the output buffer */
X                    buf_offset += 8;
X                    buf <<= 8;
X                }
X            }
X        }
X        lookforend = 0;
X    }
X}
END_OF_FILE
if test 5584 -ne `wc -c <'decode.c'`; then
    echo shar: \"'decode.c'\" unpacked with wrong size!
fi
# end of 'decode.c'
fi
if test -f 'encode.1' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'encode.1'\"
else
echo shar: Extracting \"'encode.1'\" \(2628 characters\)
sed "s/^X//" >'encode.1' <<'END_OF_FILE'
X.\" $Header: /usr/people/tcl/src/uutar/RCS/encode.1,v 1.2 1993/09/11 22:21:31 tcl Exp $
X.TH encode 1 "11 Sept 1993"
X.SH NAME
Xencode \- encode binary files into printable format
X.SH SYNOPSIS
X.B encode
X[
X.B \-i \c
X.I <inputfile>
X]
X[
X.B \-o \c
X.I <outputfile>
X]
X[
X.B \-n \c
X.I <name>
X]
X[
X.B \-c \c
X.I <charset>
X]
X.SH DESCRIPTION
X.LP
XEncode takes a binary file as input and encodes it into a printable
Xformat that can be transferred via email.
X.SH OPTIONS
X.TP
X.B \-i\c
X.I <inputfile>
X.br
Xspecifies the file to read input from. If this argument is omitted,
Xstdin is used.
X.TP
X.B \-o\c
X.I <outputfile>
X.br
Xspecifies the file to write output to. If this argument is omitted,
Xstdout is used.
X.TP
X.B \-n\c
X.I <name>
X.br
Xspecifies the filename to store in the output file. This filename will
Xbe the default filename used to create the decoded file. If this
Xargument is omitted, the name of the input file is used. If the input
Xfile is stdin, the string "stdin" is used.
X.TP
X.B \-c\c
X.I <charset>
X.br
X
Xspecifies the character set to encode with. A character set is
Xspecified as a list of ranges. A range is either a single character
Xcode or two character codes separated by a hyphen, e.g. 23 or 45-51. A
Xlist of ranges is 1 or more ranges separated by commas, e.g. 23,45-51.
XOverlaps in ranges are not a problem; each character is counted only
Xonce. A character code is any valid number between 0 and 255 decimal,
Xor the equivalent in octal, hex, or raw escaped characters. Note,
Xhowever, that it only makes sense to use printable characters in the
Xrange 32-126. Octal codes must be preceded by a 0, e.g. 023. Hex
Xcodes must be preceded by 0x, e.g. 0x6e.
Raw escaped character codes
Xmay be specified with a backslash followed by the character itself,
Xe.g. \\t. If this argument is omitted, the entire set of printable
Xcharacters, 32-126, is used. The character set is included in the
Xencoded file in decimal notation with any overlaps removed regardless
Xof how it is specified on the command line.
X
X.SH OUTPUT FORMAT
X.LP
XThe first line of the output contains the keyword BEGIN followed by
Xthe file mode of the input file in octal, the filename to be used when
Xcreating the decoded file, and the character set used. Immediately
Xfollowing this line is the encoded data, using only the characters in
Xthe specified character set. Output width is limited to 79 columns by
Xinserting a newline every 79 characters. The encoded data terminates
Xwhen two consecutive newlines are seen. Immediately following the
Xsecond newline is a line containing the keyword END and a 32 bit
Xchecksum of the input file in hex.
X.SH "SEE ALSO"
X.BR decode (1),
END_OF_FILE
if test 2628 -ne `wc -c <'encode.1'`; then
    echo shar: \"'encode.1'\" unpacked with wrong size!
fi
# end of 'encode.1'
fi
if test -f 'encode.c' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'encode.c'\"
else
echo shar: Extracting \"'encode.c'\" \(4700 characters\)
sed "s/^X//" >'encode.c' <<'END_OF_FILE'
X/*
X * $Header: /usr/people/tcl/src/uutar/RCS/encode.c,v 1.1.1.5 1993/09/11 22:42:56 tcl Exp $
X * Tom Lawrence
X * tcl@sgi.com
X */
X
X#include <stdio.h>
X#include <sys/types.h>
X#include <sys/stat.h>
X#include <stdlib.h>
X#include <string.h>
X#include "codes.h"
X
Xstatic FILE *infp, *outfp;
Xstatic char *name, *charset;
Xstatic mode_t inmode;
X
Xstatic void
Xusage()
X{
X    printf("options:\n");
X    printf("-i <inputfile>\n");
X    printf("-o <outputfile>\n");
X    printf("-n <name>\n");
X    printf("-c <charset>\n");
X    exit(1);
X}
X
X/* parse command line arguments */
Xstatic void
Xparse(argc, argv)
X    int argc;
X    char **argv;
X{
X    char *infile, *outfile;
X    struct stat statbuf;
X
X    infile = outfile = 0;
X    name = charset = 0;
X
X    while(--argc) {
X        argv++;
X        if (!strcmp(*argv, "-i")) {
X            if (argc < 2)
X                usage();
X            argc--;
X            argv++;
X            infile = *argv;
X        }
X        else if (!strcmp(*argv, "-o")) {
X            if (argc < 2)
X                usage();
X            argc--;
X            argv++;
X            outfile = *argv;
X        }
X        else if (!strcmp(*argv, "-n")) {
X            if (argc < 2)
X                usage();
X            argc--;
X            argv++;
X            name = *argv;
X        }
X        else if (!strcmp(*argv, "-c")) {
X            if (argc < 2)
X                usage();
X            argc--;
X            argv++;
X            charset = *argv;
X        }
X        else
X            usage();
X    }
X
X    /* open the input stream */
X    if (infile) {
X        if ((infp = fopen(infile, "r")) == 0) {
X            perror(infile);
X            exit(1);
X        }
X        if (stat(infile, &statbuf) < 0) {
X            perror(infile);
X            exit(1);
X        }
X        inmode = statbuf.st_mode & 0777;
X    }
X    else {
X        infp = stdin;
X        inmode = 0666;
X    }
X
X    /* open the output stream */
X    if (outfile) {
X        if ((outfp = fopen(outfile, "w")) == 0) {
X            perror(outfile);
X            exit(1);
X        }
X    }
X    else
X        outfp = stdout;
X
X    /* get the filename to store in the encoded file */
X    if (name == 0) {
X        if (infile == 0)
X            name = "stdin";
X        else
X            name = infile;
X    }
X
X    /*
       set default character set if none was specified */
X    if (charset == 0)
X        charset = "32-126";
X}
X
Xmain(argc, argv)
X    int argc;
X    char **argv;
X{
X    int c;
X    unsigned short buf;
X    int buf_offset, inlen, cols = 0, pattern;
X    unsigned int cksum;
X
X    /* parse command line arguments */
X    parse(argc, argv);
X
X    /* parse the supplied character set specification and initialize
X     * tables based on that set
X     */
X    parse_charval_list(charset);
X    init_codes(ENCODE);
X
X    fprintf(outfp, "BEGIN %o %s ", inmode, name);
X    print_charval_list(outfp);
X    putc('\n', outfp);
X
X    /* clear the sliding input buffer */
X    buf = 0;
X    buf_offset = 16;
X
X    cksum = 0;
X
X    /* read in the input file */
X    while((c = getc(infp)) != EOF) {
X
X        /* compute a checksum on the input file */
X        cksum = ((cksum << 7) | (cksum >> 25)) ^ (unsigned)c;
X
X        /* shift the byte just read in into our sliding buffer */
X        buf_offset -= 8;
X        buf |= ((unsigned short)c << buf_offset);
X
X        /* see if there are any complete variable length bitfields
X         * in the input buffer. If so, output their corresponding
X         * printable output character and advance the input buffer
X         * by their length in bits
X         */
X        while (1) {
X
X            /* grab the next 8 bits in the input bitstream */
X            pattern = (int)(buf >> 8);
X
X            /* determine how many of those bits we will need
X             * to extract from the sliding buffer
X             */
X            inlen = codes[pattern].len;
X
X            /* if there are not enough bits in the sliding
X             * buffer, stop for now.
(interestingly, you don't need
X             * to have all of the needed bits in order to determine
X             * that you don't have all of the needed bits)
X             */
X            if (inlen > (16 - buf_offset))
X                break;
X
X            /* output the printable character associated with
X             * the variable length bitfield recognized in the
X             * input bitstream
X             */
X            putc(codes[pattern].code, outfp);
X
X            /* limit our width */
X            if (++cols == 79) {
X                cols = 0;
X                putc('\n', outfp);
X            }
X
X            /* advance the input bitstream by the length of the bitfield
X             * just recognized
X             */
X            buf_offset += inlen;
X            buf <<= inlen;
X        }
X    }
X
X    /* flush the buffer. The last byte read in may still have some
X     * of its bits in the sliding buffer. If so, print out one more
X     * output character. This will necessarily append some garbage
X     * bits to the output but what can we do? we can't write files
X     * at a finer granularity than the byte. The decoder will ignore
X     * them so it's ok
X     */
X    if (buf_offset < 16) {
X        putc(codes[pattern].code, outfp);
X        cols++;
X    }
X
X    /* indicate end of encoded data by 2 consecutive newlines followed
X     * by the keyword END. This is necessary since the END line itself
X     * is potentially valid encoded data
X     */
X    if (cols)
X        putc('\n', outfp);
X    fprintf(outfp, "\nEND %X\n", cksum);
X}
END_OF_FILE
if test 4700 -ne `wc -c <'encode.c'`; then
    echo shar: \"'encode.c'\" unpacked with wrong size!
fi
# end of 'encode.c'
fi
echo shar: End of archive 1 \(of 1\).
cp /dev/null ark1isdone
MISSING=""
for I in 1 ; do
    if test ! -f ark${I}isdone ; then
        MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have the archive.
    rm -f ark[1-9]isdone
else
    echo You still need to unpack the following archives:
    echo "        " ${MISSING}
fi
##  End of shell archive.
exit 0
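
----------- Worked example (not part of the archive):

The README's rules for token generation, its expansion estimate
8 / (pb - 2 + 2n/p), and its checksum standard can be checked with a
few lines of C. What follows is only a sketch under stated assumptions:
the names token_layout, layout_for, expected_expansion and
roll_checksum are hypothetical and do not appear in the archive, and
the code uses ANSI prototypes and <stdint.h> rather than the K&R style
of the sources above. It mirrors the whole/half code counting done in
init_codes() and the rotate-then-xor step used in encode.c and
decode.c, but it is not the author's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the token layout for an alphabet of n
 * output characters, as described in the README.
 */
struct token_layout {
    int bits;   /* b, the smallest integer such that 2^b >= n */
    int whole;  /* number of b-bit tokens */
    int half;   /* number of (b-1)-bit tokens */
};

/* Count the tokens of each length for n output characters
 * (2 <= n <= 256). Combining two b-bit tokens into one (b-1)-bit
 * token removes one token, so p - n combinations are needed, where
 * p = 2^b.
 */
struct token_layout layout_for(int n)
{
    struct token_layout t;
    int p = 2;

    assert(n >= 2 && n <= 256);
    t.bits = 1;
    while (p < n) {
        p <<= 1;
        t.bits++;
    }
    t.half = p - n;
    t.whole = n - t.half;
    return t;
}

/* Average expansion on random input, 8 / (pb - 2 + 2n/p), where the
 * denominator is the expected number of input bits consumed per
 * output character.
 */
double expected_expansion(int n)
{
    struct token_layout t = layout_for(n);
    double p = (double)(1 << t.bits);

    return 8.0 / (t.bits - 2 + 2.0 * n / p);
}

/* One checksum step: roll the 32 bit checksum 7 bits to the left,
 * then xor the data byte onto the low order 8 bits.
 */
uint32_t roll_checksum(uint32_t cksum, unsigned char byte)
{
    cksum = (cksum << 7) | (cksum >> 25);
    return cksum ^ byte;
}
```

For the README's 6 character example, layout_for(6) gives b = 3 with
four 3-bit tokens and two 2-bit tokens, matching the token set
000,001,010,011,10,11. For the default 95 character set it gives 62
tokens of 7 bits and 33 tokens of 6 bits, and expected_expansion(95)
comes out close to the 1.234 quoted in the README.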