- ;*************************************************************************
- ;
- ; atob.asm: Fast assembly-language version of atob.c
- ; Version 1.1
- ;
- ; atob converts files from ASCII to binary, undoing the encoding
- ; of btoa.
- ;
- ; USAGE:
- ; atob output.fil <input.fil
- ;
- ; DESCRIPTION:
- ;
- ; atob reads its stdin and sends its output to the designated output file.
- ; If it already exists, the output file is overwritten *without*warning*.
- ;
- ; The encoding is performed by the program btoa. The btoa/atob encoding
- ; has a 25% expansion rate (as opposed to 33% by uuencode). Simple error
- ; checking is performed at the end to confirm that the entire file was
- ; decoded correctly. No attempt to localize the error is made. Btoa/atob
- ; is suitable only for ASCII transmission and hence is not appropriate for
- ; USENET transmission of binaries.
- ;
- ; I use btoa/atob to transfer binary files from a UNIX host to my IBM PC
- ; over a seven-bit channel, using the command line
- ;
- ; btoa < desired.fil | kermit -s -
- ;
- ; receiving the file in MS-Kermit, then decoding it on my IBM PC. This is
- ; faster than sending the binary directly since kermit uses eighth-bit quoting
- ; which essentially means that it takes 50% longer to transmit binaries
- ; as compared to ASCII files.
- ;
- ; I have used this program to decode dozens of binaries (including some
- ; ridiculously huge ones) and they all decoded fine, so I'm pretty certain
- ; that the bugs have been stomped out. If you find one, contact me at the
- ; email address below.
- ;
- ; SPEED:
- ;
- ; rjb = Ray Berry's version of atob (1/21/89)
- ; v10 = Version 1.0 of my atob
- ; v11 = Version 1.1 of my atob
- ;
- ; Time to decode Ralf Brown's interrupt list (inter490.zoo) on a PC-XT:
- ;
- ; rjb = 1583 seconds (26 min 23 sec)
- ; v10 = 113 seconds ( 1 min 53 sec) Speedup: 14x
- ; v11 = 70 seconds ( 1 min 10 sec) Speedup: 22x [1.6x faster than v10]
- ;
- ;
- ; ENCODING:
- ;
- ; Four bytes from the input are viewed as a 32-bit integer and converted
- ; to base 85. "!" represents zero, the double-quote represents 1, and
- ; so on up to "u" representing 84. As a special case, the 32-bit number
- ; zero is represented by the single character "z". The file is headed
- ; by the string "xbtoa Begin" and is followed by
- ;
- ; xbtoa End N Clen Clen E Ceor S Csum R Crot
- ;
- ; where Clen is the length of the original, unencoded file (first in
- ; decimal, then in hex) and Ceor, Csum and Crot are three checksums
- ; (encoded in hex). Csum is the sum of the characters plus one for each
- ; character summed (padding nulls included). Ceor is the exclusive-or
- ; of all the characters, and Crot is a checksum computed via the formula
- ;
- ; Crot = (Crot rotated left one bit) + next_character
- ;
- ; These are crude checksums (not as robust as, say, CRC).
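- ;
- ; A worked example, computed by hand from the rules above (not captured
- ; from a real btoa run, so treat the exact layout as an assumption):
- ; the four bytes "ABCD" form the integer 41424344h = 1094861636, whose
- ; base-85 digits are 20, 82, 67, 80, 11. Adding "!" (33) to each digit
- ; gives the quintet "5sdq,". A file holding just those four bytes would
- ; then encode as
- ;
- ;     xbtoa Begin
- ;     5sdq,
- ;     xbtoa End N 4 4 E 4 S 10e R 3da
- ;
- ; with Ceor = 41h xor 42h xor 43h xor 44h = 4,
- ; Csum = (65+66+67+68) + 4 = 270 = 10eh, and Crot = 3dah.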
- ;
- ; IMPLEMENTATION:
- ;
- ; The program is pretty much a straightforward implementation of the
- ; decoding algorithm in assembly. Of course, it is rather heavily
- ; hand-optimized. In particular, SI and DI are used as global register
- ; values and inline macros are used rather frequently. The only room
- ; for improvement I can think of is replacing the multiply-by-85
- ; with a shift-and-add algorithm. (Which might even be slower on
- ; the 80386, whose multiplication algorithm has been pretty well-
- ; optimized.)
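- ;
- ; A sketch of what such a shift-and-add might look like for the first
- ; digit only (untested and not part of this program; it assumes the
- ; digit is in AL, is less than 85, and that BX is free as scratch).
- ; It uses 85 = 5 + 80, so 85x = 5x + 16 * 5x:
- ;
- ;     xor ah, ah      ; AX = x
- ;     mov bx, ax      ; BX = x
- ;     shl ax, 1
- ;     shl ax, 1       ; AX = 4x
- ;     add ax, bx      ; AX = 5x
- ;     mov bx, ax      ; BX = 5x
- ;     shl ax, 1
- ;     shl ax, 1
- ;     shl ax, 1
- ;     shl ax, 1       ; AX = 80x
- ;     add ax, bx      ; AX = 85x (at most 84 * 85 = 7140, so no overflow)
- ;
- ; The later stages need 32-bit products, so they would need a
- ; double-register version of the same idea.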
- ;
- ; HOW TO MAKE IT:
- ;
- ;     If you have TLINK       If you have MS's LINK
- ;     -----------------       ---------------------
- ;     masm atob;              masm atob;
- ;     tlink atob;             link atob;
- ;
- ; DISCLAIMER/COPYRIGHT:
- ;
- ; As usual, the author claims no responsibility for the behavior of
- ; the program, although he is pretty sure that it works fine.
- ;
- ; The program remains Copyright 1990 by Raymond Chen, but I make
- ; no attempt to restrict distribution in any form. Just don't try
- ; to pass it off as your own. This copyright is of dubious legal
- ; significance since the string "Copyright" appears nowhere in the
- ; binary. I don't care. If you want to violate my copyright, there
- ; isn't too much I can do to stop you.
- ;
- ; AUTHORSHIP:
- ;
- ; The program was written by Raymond Chen (raymond@math.berkeley.edu)
- ; in January 1990 or thereabouts.
- ;
- ; And now... The code:
-
- name atob
-
- ;*************************************************************************
- ; General equates
- ;*************************************************************************
-
- DOSvec equ 21h ; DOS interrupt service vector
-
- stdin equ 0 ; DOS file handle
-
- print equ 9 ; Print a string to the console
- creat equ 3ch ; Create a new file
- close equ 3eh ; Close a file
- read equ 3fh ; Read from a handle
- write equ 40h ; Write to a handle
- exit equ 4ch ; End the process
-
- openflg equ 20h ; read-only, deny write
-
- cr equ 0dh
- lf equ 0ah
-
- bufsiz equ 10240 ; size of I/O buffers.
- ; five times it must be less than 64K.
-
- ; 5 = 4 + 1. 1 = a copy of the incoming data.
- ; 4 = each "z" codes four output bytes.
- ; worst case expansion is when somebody btoa's a
- ; file consisting entirely of zeros.
-
- ;*************************************************************************
- ; Global register assignments:
- ;
- ; SI = where to get the next byte from the input buffer
- ; DI = where to send next byte to the output buffer
- ; BP = scratch (high word of the product in decode; number base in convert)
- ;*************************************************************************
-
- ;*************************************************************************
- ; Segmentation nonsense.
- ;*************************************************************************
-
- _TEXT segment byte public 'CODE'
- _TEXT ends
-
- _DATA segment word public 'DATA'
- _DATA ends
-
- _BSS segment word public 'BSS'
- _BSS ends
-
- _BSSEND segment byte public 'STACK'
- _BSSEND ends
-
- _STACK segment stack 'STACK'
- dw 64 dup (?)
- _STACK ends
-
- DGROUP GROUP _DATA, _BSS, _BSSEND
-
- ;*************************************************************************
- ; Initialized Global Variables
- ;*************************************************************************
- _DATA segment word public 'DATA'
-
- ClenL dw 0 ; Length of converted file
- ClenH dw 0
-
- Ceor label byte ; Checksum via exclusive or
- CeorL dw 0
- CeorH dw 0
-
- Csum label dword ; Checksum via summation
- CsumL dw 0 ; Low word
- CsumH dw 0 ; High word
-
- Crot label dword ; Checksum via rotation
- CrotL dw 0 ; Low word
- CrotH dw 0 ; High word
-
- _DATA ends
-
- ;*************************************************************************
- ; Uninitialized Global Variables
- ;*************************************************************************
- _BSS segment word public 'BSS'
-
- fdout label word
- dw 1 dup (?) ; File handle for output
-
- numbuf label byte
- db 10 dup (?) ; Numbers go here for decoding
-
- inbuf label byte
- db bufsiz dup (?) ; file input buffer
- db 1 dup (?) ; the extra byte is for a sentinel
-
- outbuf label byte
- db 4*bufsiz dup (?); file output buffer
-
- _BSS ends
-
- ;*************************************************************************
- ; Macros
- ;*************************************************************************
-
- ;*************************************************************************
- ; makestr: Create a string with the specified label.
- ; The optional third argument receives the length of the string.
- ;*************************************************************************
- makestr macro l, s, c
- _DATA segment word public 'DATA'
- l db s
- ifnb <c>
- c equ $-l
- endif
- _DATA ends
- endm
-
- ;*************************************************************************
- ; die: Print a message and terminate the program.
- ;*************************************************************************
- die macro msg
- local l
- makestr l, <msg, '$'>
- mov dx, offset DGROUP:l
- jmp error
- endm
-
- ;*************************************************************************
- ; DOS: Call DOS with the function code passed as the argument.
- ;*************************************************************************
- DOS macro func
- mov ah, func
- int DOSvec
- endm
-
- ;*************************************************************************
- ; inchar: Read a character from the input file to al.
- ; If the input buffer is empty, fill it.
- ;*************************************************************************
- inchar macro
- local l
- lodsb
- or al, al
- jnz l
- call flushbuf ; this also fills the input buffer
- l:
- endm
-
-
- ;*************************************************************************
- ; Code
- ;*************************************************************************
- _TEXT segment byte public 'CODE'
- DGROUP group _DATA,_BSS
- assume cs:_TEXT, ds:DGROUP, es:DGROUP, ss:DGROUP
-
- ;*************************************************************************
- ; error: Print the string in DX to the console and die
- ;*************************************************************************
- error proc near
- DOS print ; print the string
- mov bx, fdout ; close the file, if it's open
- or bx, bx
- jz noclose
- DOS close
- noclose:
- mov al, 1 ; return an error
- DOS exit
- error endp
-
- ;*************************************************************************
- ; decode: Decode the buffer until it is empty.
- ; This is the inner loop, so it has been mangled for speed.
- ; The object of the game is to make as few jumps as possible
- ; along the most-frequently-travelled path through the code,
- ; and to combine termination tests.
- ;*************************************************************************
- decode proc near
-
- ; Preliminary setup:
- ; Throughout, cx = 85. We need to keep that number nearby.
- mov cx, 85
- jmp short dloop ; enter the main loop
-
-
- ;*************************************************************************
- ; aux: Auxiliary stuff that has been hoisted from the inner loop.
- ; Since the buffer empties very rarely (once every 16000
- ; times through the loop), we shuffle code around so that
- ; the most common code path doesn't involve any jumps.
- ;*************************************************************************
- aux1: call flushbuf
- jmp short auxret1
- aux2: call flushbuf
- jmp short auxret2
- aux3: call flushbuf
- jmp short auxret3
- retdc: ret
- ;*************************************************************************
- ; zout: See if we got a 'z'; otherwise, leave.
- ;*************************************************************************
- zout: cmp al, 'z'-'!'
- jnz retdc
-
- ;*************************************************************************
- ; zee: Code four consecutive zeros
- ;*************************************************************************
- zee: xor ax,ax
- stosw
- stosw ; this codes four zeros
- add CsumL, 4 ; Update the "Sum" checksum.
- adc CsumH, 0
-
- ; Updating the "Rotate" checksum is tricky.
- ; The clever way ( rol ax, 4; rol dx, 4; exchanging low nibbles)
- ; takes more clock cycles than the unrolled sequence below, even
- ; though the unrolled sequence drains the prefetch queue.
-
- mov dx, CrotH
- mov ax, CrotL
-
- sal ax,1 ; This sequence
- rcl dx,1 ; rotates the 32-bit
- adc ax,0 ; number by one bit.
-
- sal ax,1 ; So we do it four times.
- rcl dx,1 ;
- adc ax,0 ;
-
- sal ax,1 ; Third time
- rcl dx,1 ;
- adc ax,0 ;
-
- sal ax,1 ; Fourth time
- rcl dx,1 ;
- adc ax,0 ;
-
- mov CrotH, dx
- mov CrotL, ax
-
- ;*************************************************************************
- ; dloop: The main decoding loop.
- ;
- ; The numbers down the center are clock ticks on an 8086.
- ; I optimized for an 8086, so sue me.
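- ;
- ; A hand-worked trace (an illustration, not from the original source)
- ; of one quintet, "5sdq,", whose digits after subtracting "!" are
- ; 20, 82, 67, 80, 11:
- ;
- ;     after digit 1:     BX = 20*85               = 1700
- ;     after digit 2:  DX:BX = (1700+82)*85*85     = 12874950
- ;     after digit 3:  DX:BX = 12874950 + 67*85    = 12880645
- ;     after digit 4:  BP:BX = (12880645+80)*85    = 1094861625
- ;     after digit 5:  DX:BX = 1094861625 + 11     = 41424344h
- ;
- ; so the four bytes sent to byteout are 41h 42h 43h 44h, i.e. "ABCD".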
- ;*************************************************************************
- dloop:
-
- dloop1: lodsb ; Get the first character of a quintet
- or al, al
- jz aux1
-
- auxret1:
- sub al,'!'
- jb dloop1 ; ignore control characters
- cmp al,cl
- jae zout ; 'z' or invalid (unsigned: jge lets bytes >= 0A1h through)
- mul cl
- mov bx, ax ; BX = a0
-
- dloop2: lodsb ; Get the second character of a quintet
- or al, al
- jz aux2
- auxret2:
- sub al, '!'
- jb dloop2 ; ignore control characters
- cmp al, cl
- jae retdc ; unsigned: a signed jge lets bytes >= 0A1h through
-
- mov ah, ch ; zero out the top byte
- add ax, bx ; AX = ab
- mov dx, 85*85
- mul dx ; DX:AX = ab00
- mov bx, ax ; DX:BX = ab00
-
- dloop3: lodsb ; Get the third character of a quintet
- or al, al
- jz aux3
- auxret3:
- sub al, '!'
- jb dloop3 ; ignore control characters
- cmp al, cl
- jae retdc ; unsigned: a signed jge lets bytes >= 0A1h through
-
- mul cl ; AX=c0
- add bx, ax
- adc dx, 0 ; DX:BX = abc0
-
- dloop4: lodsb ; Get the fourth character of a quintet
- or al, al
- jz aux4
- auxret4:
- sub al, '!'
- jb dloop4 ; ignore control characters
- cmp al, cl
- jae retdc2 ; unsigned: a signed jge lets bytes >= 0A1h through
-
- mov ah, ch
- add bx, ax
- mov ax, dx
- adc ax, 0 ; AX:BX = abcd
-
- mul cx
- mov bp, ax
- mov ax, bx
- mul cx
- add bp, dx
- mov bx, ax ; BP:BX = abcd0
-
- dloop5: lodsb ; Get the last character of a quintet
- or al, al
- jz aux5
- auxret5:
- sub al, '!'
- jb dloop5 ; ignore control characters
- cmp al, cl
- jae retdc2 ; unsigned: a signed jge lets bytes >= 0A1h through
-
- mov ah, ch
- add bx, ax
- mov dx, bp
- adc dx, 0 ; DX:BX = abcde
-
- mov al, dh ; send out the bytes in [DX:BX]
- call byteout ; from top to bottom
- mov al, dl
- call byteout
- mov al, bh
- call byteout
- mov al, bl
- call byteout
-
- jmp dloop1 ; ready for more!
-
- retdc2: ret
- aux4: call flushbuf
- jmp short auxret4
- aux5: call flushbuf
- jmp short auxret5
- decode endp
-
-
- ;*************************************************************************
- ; byteout: Update checksums and output the character in AL. Preserves AL.
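- ;
- ; Per byte c, the three updates amount to (in 32-bit arithmetic):
- ;
- ;     Ceor = Ceor xor c
- ;     Csum = Csum + c + 1
- ;     Crot = (Crot rotated left one bit) + c
- ;
- ; For example (worked by hand, not taken from a real run): starting
- ; from zero and feeding in 41h, 42h, 43h, 44h, Crot goes
- ; 41h, 0c4h, 1cbh, 3dah. Each step doubles the old value (no bit
- ; wraps around yet) and adds the new byte.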
- ;*************************************************************************
-
- byteout proc near
- xor ah,ah ; Convert byte to word
-
- xor Ceor, al ; Update the "exclusive or" checksum.
-
- stc ; Update the "Sum" checksum.
- adc CsumL, ax ; Add the character, plus one
- adc CsumH, 0
-
- sal CrotL,1 ; Update the "Rotate" checksum.
- rcl CrotH,1 ; Rotate left, then add in the character.
- adc CrotL,ax ; Sneaky trick. Carry gets added for free.
- adc CrotH,0
-
- stosb ; Output the character. We never overrun
- ; the buffer.
- ret
- byteout endp
-
- ;*************************************************************************
- ; flushbuf: Write out the current output buffer.
- ;
- ; The test for buffer overflow is done only when the input
- ; buffer is empty. This allows us to remove the test for
- ; output buffer overflow from the inner loop.
- ;
- ; IMPORTANT: This function falls through to fillbuf.
- ;
- ;*************************************************************************
- flushbuf proc near
- cmp di, offset DGROUP:outbuf
- jz noflush
- push ax
- push bx
- push cx
- push dx
- mov bx, fdout ; write to the output file
- mov dx, offset DGROUP:outbuf ; buffer location
- mov cx, di
- sub cx, dx ; number of bytes to write
- mov di, dx
- DOS write
- pop dx
- pop cx
- pop bx
- pop ax
- jnc noflush
- die "Error writing output file"
-
- noflush: jmp short fillbuf
- flushbuf endp
-
- ;*************************************************************************
- ; fillbuf: Fill the input buffer with more goodies
- ; We do a block read on stdin, then put a sentinel at the end.
- ; Since we are I/O bound, speed is not an issue.
- ;*************************************************************************
- fillbuf proc near
- push ax
- push bx
- push cx
- push dx
-
- mov bx, stdin ; read from stdin
- mov cx, bufsiz ; number of characters to read
- mov dx, offset DGROUP:inbuf ; buffer location
- mov si, dx
- DOS read
- jc fillerr
- or ax, ax ; EOF
- jz fillerr
- mov bx, ax
- add bx, offset DGROUP:inbuf
- mov byte ptr [bx], 0 ; mark the end of the buffer
- pop dx
- pop cx
- pop bx
- pop ax
- lodsb
- ret
-
- fillerr:
- die "Unexpected end of input"
-
- fillbuf endp
-
- ;*************************************************************************
- ; convert: Convert the number in the input buffer in base BP into [DX:BX]
- ; Returns NZ if [DX:BX] is not equal to dword ptr [DI].
- ; If DI = 0, then returns Z always.
- ;*************************************************************************
- convert proc near
- push di
-
- xor bx, bx ; the number is collected in [DI:BX]
- xor di, di
- xor cx, cx
-
- convloop:
- inchar
- sub al, '0'
- jb convbye
- cmp al, 9
- jbe notletter
- sub al, 'a' - ('0' + 10)
- notletter:
- mov cl, al
- cmp cx, bp
- jae convbye ; a digit must be strictly less than the base
-
- mov ax, di
- mul bp
- mov di, ax ; di now contains itself, times the base
-
- mov ax, bx
- mul bp
- add di, dx
- mov bx, ax
- add bx, cx
- adc di, 0
-
- jmp convloop
-
- convbye:
- mov dx, di
- pop di
- or di, di
- jz retcv
- cmp [di], bx
- jnz retcv
- cmp [di+2], dx
- retcv: ret
- convert endp
-
- ;*************************************************************************
- ; prefix: Confirm that the next two characters are AH and a space.
- ;*************************************************************************
- prefix proc near
- inchar
- cmp ah, al
- jnz badtrailer
- inchar
- cmp al, ' '
- jnz badtrailer
- ret
- prefix endp
-
- ;*************************************************************************
- ; doprefix: Calls prefix (qv) with the argument in AH.
- ;*************************************************************************
- doprefix macro c
- mov ah, c
- call prefix
- endm
-
- ;*************************************************************************
- ; confirm: The prefix should be C, followed by a number in base BASE
- ; which should agree with the variable VAR. If it doesn't,
- ; then jump to ERR.
- ;*************************************************************************
-
- confirm macro c, base, var, err
- ifnb <c>
- doprefix c
- endif
- mov di, offset DGROUP:var&L
- mov bp, base
- call convert
- jnz err
- endm
-
- ;*************************************************************************
- ; main: The main program.
- ;*************************************************************************
-
- main proc near
-
- ;*************************************************************************
- ; Step 1a: Get the filename from the command line and create it for writing.
- ;*************************************************************************
- mov di, 081h ; command line lives here
- mov cx, 07fh
- mov al, ' '
- cld
- repe scasb ; look for a non-space
- dec di
-
- mov dx, di ; save filename start
-
- mov al, cr ; look for end of command line
- repne scasb
- mov byte ptr [di-1], 0 ; terminate the filename
- xor cx, cx ; file has no attributes
- DOS creat
-
- assume ds:DGROUP, es:DGROUP
- mov bx, DGROUP
- mov ds, bx
- mov es, bx
- jc outerr
-
- ;*************************************************************************
- ; Step 1b: Initialize I/O pointers and preread the input
- ;*************************************************************************
- mov fdout, ax
- call fillbuf
- dec si ; unread the character fillbuf reads
-
- ;*************************************************************************
- ; Step 2: Skip to the start line.
- ; Here, di is used to point into the start line for comparison.
- ;*************************************************************************
- makestr header, <"xbtoa Begin">, nheader
-
- bol: ; at the beginning of a new line
- mov di, offset DGROUP:header
- mov cx, nheader
-
- compare: ; compare current line against start
- inchar
- scasb
- jnz blip
- loop compare
- jmp short found
-
- outerr: die "Couldn't open output file"
-
- inblip:
- inchar
- blip: ; skip until you hit a nl
- cmp al, lf
- jnz inblip
- jmp short bol
-
- badtrailer:
- die "Invalid trailer"
-
- badchar:
- die "Invalid character in input"
-
- ;*************************************************************************
- ; Step 3a: Prepare to decode the file.
- ;*************************************************************************
- found:
- mov di, offset DGROUP:outbuf
-
- ;*************************************************************************
- ; Step 3b: Decode the file.
- ;*************************************************************************
- call decode
- cmp al, 'x'-'!'
- jnz badchar
-
- push di ; free up a register to diddle with
-
- ;*************************************************************************
- ; Step 4: Inspect the trailer
- ;
- ; The trailer looks like this (noting that the x has already been
- ; scanned)
- ;
- ; xbtoa End N Clen Clen E Ceor S Csum R Crot
- ;
- ; Where all values are in hex, except for the first Clen.
- ; The Clen gives the actual length of the file. (The encoding
- ; pads the file with nulls until the length is 0 mod 4.)
- ;*************************************************************************
- makestr trailer, <"btoa End ">, ntrailer
- mov di, offset DGROUP:trailer
- mov cx, ntrailer
- trail:
- inchar
- scasb
- jnz badtrailer
- loop trail
- jmp short parse
-
- badlen:
- die "File lengths disagree"
- badcksum:
- die "Checksum mismatch"
-
- parse:
- doprefix 'N'
- mov bp, 10
- xor di, di ; don't verify against anything
- call convert
- mov ClenL, bx ; save actual length for checksum
- mov ClenH, dx
-
- ;*************************************************************************
- ; Step 4': Adjust the file length and write out the last little bit
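- ;
- ; Example (an illustration, not from the original source): if the
- ; trailer says Clen = 6, two quintets were decoded into 8 bytes.
- ; BX, masked to its low two bits, steps through 2, 3, 0, so DI is
- ; backed up twice and the two padding nulls are dropped from the
- ; final write.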
- ;*************************************************************************
- pop di ; recover output file position
- adjustloop:
- and bx, 3
- jz noadjust
-
- dec di
- inc bx
- jmp adjustloop
-
- noadjust:
- mov dx, offset DGROUP:outbuf ; buffer location
- mov cx, di ; number of bytes to write
- sub cx, dx
- jcxz nowrite
- mov bx, fdout ; write to the output file
- DOS write
- jnc nowrite
- die "Error writing output file"
-
- ;*************************************************************************
- ; Step 4'': Inspect the checksum values.
- ;*************************************************************************
- nowrite:
- confirm <>, 16, Clen, badlen
- confirm 'E', 16, Ceor, badcksum
- confirm 'S', 16, Csum, badcksum
- confirm 'R', 16, Crot, badcksum
-
- mov bx, fdout
- DOS close
- mov al, 0
- DOS exit
-
- main endp
-
- ;*************************************************************************
- ; The end
- ;*************************************************************************
- _TEXT ends
-
- end main
-