- ************************************************************************
- * *
- * ASMGEN.COM - by J. Gersbach and J. Damke (Ver. 2.01) *
- * *
- * A program to generate cross-referenced assembly language code *
- * from any executable file. *
- * *
- * *
- * *
- * Uploaded to PCanada by Mark Magner November 23, 1983 *
- * *
- ************************************************************************
-
-
- * PREFACE *
-
-
- This program will generate 8086/87/88 assembly code text that is
- compatible with the IBM Personal Computer Macro Assembler from
- any executable diskette file up to 65,535 bytes. The output can
- be routed to the console or a diskette file. A reference list
- may be generated separately or embedded at the appropriate
- instruction counter address in the assembly code.
-
- Some manual touch up will be required before reassembly, but
- nearly all the typing is done for you by ASMGEN and anything
- questionable is marked with "??".
-
- A file of sequential instructions may be resident on the same
- diskette to indicate to ASMGEN which addresses contain code,
- bytes, words, or strings. This file may also include
- instructions to assume segment register values or toggle the
- output of assembly code text, generation of the reference
- table, 8087 mnemonics, or the inclusion of embedded reference
- information in the assembly file.
-
- DEBUG may be used to browse through the executable file to
- determine the starting locations of code and data to develop the
- sequential instruction file. It is important to accurately
- specify these locations for an accurate reference table and
- minimum touching up of the ASM output text.
-
- The number of references within the file determines the amount
- of memory required since a reference table is built in memory
- during the first pass. Disassembly is done from disk and only
- one file sector is in memory at any given time. Therefore memory
- size does not limit the size of the file to be disassembled. 48K
- bytes of memory will be enough for most programs but a few will
- need 64K or 128K. One diskette drive is sufficient but two are
- more convenient.
-
-
- * STARTING ASMGEN *
-
- There are two ways to work with ASMGEN: either by using the
- command menu or by calling ASMGEN with parameters. Following are
- the descriptions of both options.
-
- * USING THE ASMGEN MENU *
-
- The program is invoked by typing: ASMGEN
-
- You are then prompted for a file specification. Respond with the
- name of the executable file from which you wish to generate the
- assembly code. The executable file will normally have an
- extension of .EXE or .COM. ASMGEN will check this file spec for
- validity and then respond with a prompt that includes a summary
- of the command letters indicating that you may give it a
- command. The executable file contents are not checked for valid
- code and ASMGEN will try to disassemble text or compressed
- BASIC files and produce unintelligible assembly code.
-
- The commands are:
-
- X filespec This file spec replaces any previous executable
- file spec. The usual file extension is .COM
- or .EXE
-
- EXAMPLE: X DATE.COM
-
-
- A <filespec> The executable file is disassembled and the assem-
- bly code is routed to the specified file. The
- usual file extension is .ASM. If the filespec is
- omitted, the output will default to the console.
-
- EXAMPLE: A DATE.ASM
-
- R <filespec> The reference table is sent to the file specified.
- The usual file extension is .TBL. If the filespec
- is omitted, the output will default to the console.
-
- EXAMPLE: R DATE.TBL
-
- Q The program is terminated and control returned to
- DOS.
-
-
- Each time a command has been executed, ASMGEN waits with a one
- line prompt for the next command.
-
- X <filespec>, A <CON>, R <CON> or Q ?
-
- The default filespec for each command is shown in brackets.
- Enter the next command of your choice as described above.
-
-
- * USING ASMGEN WITH PARAMETER CALLS *
-
- Up to three file specifications may be included when ASMGEN is
- first called from DOS. The executable file's name is given
- first, followed by specifications for the assembly and reference
- table files.
-
- EXAMPLE: ASMGEN DATE.COM, DATE.ASM, DATE.TBL
-
- If a semicolon follows the last filespec, ASMGEN will exit to
- DOS when the command has been executed. If no semicolon is
- entered, ASMGEN will display the menu options described above
- and wait for further input after executing the command.
-
- EXAMPLE: ASMGEN DATE.COM, DATE.ASM;
-
- If the filespec for the .ASM file and/or .TBL file is omitted,
- ASMGEN will generate first the .ASM file, then a .TBL file using
- the filename of the first filespec.
-
- EXAMPLE: ASMGEN DATE.COM,,; creates DATE.ASM and DATE.TBL and
- exits to DOS.
-
- If only the reference table is desired, the dummy name NUL
- should be entered in place of an .ASM filespec.
-
- EXAMPLE: ASMGEN DATE.COM, NUL, DATE.TBL
-
- If only one filespec is given when the program is called, the
- reference table is built in memory and then the menu options are
- displayed for further commands.
-
- EXAMPLE: ASMGEN DATE.COM
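
The filespec rules above can be sketched in a few lines of Python. This is
only an illustration of the rules as described here, not ASMGEN's own parser:
the names AsmgenArgs and parse_asmgen_args are hypothetical, and the sketch
assumes that an empty field between commas means "use the default name" while
a field that is absent altogether means "no output of that kind".

```python
from typing import NamedTuple, Optional


class AsmgenArgs(NamedTuple):
    exe: str                 # executable file to disassemble
    asm: Optional[str]       # .ASM output, or None if not requested
    tbl: Optional[str]       # .TBL output, or None if not requested
    exit_to_dos: bool        # True when the list ends with a semicolon


def parse_asmgen_args(tail: str) -> AsmgenArgs:
    """Parse the text that follows 'ASMGEN' on the DOS command line."""
    tail = tail.strip()
    exit_to_dos = tail.endswith(";")
    if exit_to_dos:
        tail = tail[:-1]
    fields = [field.strip() for field in tail.split(",")]
    exe = fields[0]
    base = exe.rsplit(".", 1)[0]   # DATE.COM -> DATE

    def pick(index: int, ext: str) -> Optional[str]:
        if index >= len(fields):
            return None                    # field not given at all
        return fields[index] or base + ext  # empty field -> default name

    return AsmgenArgs(exe, pick(1, ".ASM"), pick(2, ".TBL"), exit_to_dos)


print(parse_asmgen_args("DATE.COM,,;"))
print(parse_asmgen_args("DATE.COM, NUL, DATE.TBL"))
```

Passing NUL through unchanged matches the text above: DOS treats NUL as a
dummy device, so the .ASM output is simply discarded.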
-
-
- * PROGRAM EXECUTION *
-
- The disassembly is done in two passes through the source file.
- On pass #1, the reference table is built in memory and the
- actual output is generated during pass #2. Once the reference
- table is established, it remains in memory until an X or Q
- command is issued, and subsequent A and R command executions
- skip pass #1. This saves a lot of time when the executable file
- is large.
-
- Three contiguous data areas are built dynamically in memory
- during pass #1. First is the compressed sequential instruction
- list. Second is a list of pointers for .EXE files that point to
- the locations of all relocatable variables in the program, also
- arranged in numerical order. These are established before
- reading any code. Third, the reference table is then built in a
- higher area of memory as pass #1 progresses.
-
- If all available memory in the program segment is filled before
- the first two data areas are completed, ASMGEN will abort to the
- command prompt. After the reference table is started, a shortage
- of memory will produce the message "Reference Table Incomplete
- Due to Insufficient Memory" and continue.
-
- Ctrl-Break may be used at any time to interrupt a command in
- progress.
-
-
- * READING THE ASSEMBLY CODE FILE (.ASM) *
-
- This file begins with a title taken from the executable file's
- name and date followed by the current date (in brackets).
-
- If not inhibited by the M switch in a .SEQ file (explained
- later), the macro library will appear next in the file.
-
- Next will be a .RADIX 16 pseudo-op which tells the macro
- assembler that all numbers are in hexadecimal form.
-
- Then comes a header that indicates a starting value for the code
- segment, stack segment, instruction pointer and the stack
- pointer. The stack pointer is usually set to FFFF for .COM files
- but may be somewhat less depending on available memory. These
- values are passed by the linker for .EXE files.
-
- The first ASSUME statement might come next. There is one
- generated for each segment that begins with code. All segment
- registers are designated according to the current set of
- ASSUMEs. They will sometimes be incorrect, so all ASSUME
- statements should be checked prior to re-assembly.
-
- The disassembled output follows, terminated by an END statement
- and the execution address. An ORG pseudo-op is included if
- required.
-
- The text is compatible with the IBM Macro Assembler and the
- format is the same except for RETurns. To avoid the need for
- PROCedure titles, special mnemonics are provided for all RET
- instructions. These are defined in the macro library at the
- beginning of the file. Only macros that are needed for the
- current file are produced. The optional embedded commands that
- make up the reference table enhance the readability of the file.
- For very large files, this is sometimes undesirable and a
- separate reference table is best.
-
- When invalid instructions are encountered in code areas, they
- are reproduced as byte values followed by "??". If a near jump
- is defined previously in the code, and it is within range of a
- short jump, a NOP instruction is inserted after the jump. The
- executable file created with this .ASM file and the Macro
- Assembler and Linker will then be the same length as the
- original file. This makes it less important to differentiate
- between labels and numeric constants since the label values and
- their offsets within the file will be the same. The fundamental
- problem of disassembly is in knowing if the original assembly
- code defined a number as a label which changes as a function of
- its position or as a number that always remains the same. If
- you make changes in the assembly code however, you must properly
- specify all values. You might as well remove all NOPs at the
- same time.
-
- Labels are five characters long and begin with "L". Segment
- labels begin with "S". The remaining characters are the current
- instruction counter in hex form, thus making each label unique
- and showing its location in the original file. The instruction
- counter is continuous throughout the assembly code without
- resetting at segment boundaries. The segment labels are then in
- byte as opposed to paragraph form. In those cases where a label
- value is modified by an ASSUME statement, the original value is
- included as a comment in the referencing instruction so that it
- may be easily changed back if it was not intended as a location.
-
- The word "Relocatable" is printed at the end of any line that
- contains an absolute paragraph value. These are values that DOS
- modifies after loading but before executing a program. They are
- used for loading segment registers that are sensitive to the
- program location in memory. Relocatable values are not modified
- by ASSUMEs. ASMGEN converts these numbers from paragraph to byte
- values by multiplying them by sixteen so that they will fit
- within the 16-bit instruction counter field. When the paragraph
- value is negative or exceeds 0FFFH, it is left unchanged and a
- warning (??) is issued on that line. When a program larger than
- 64K bytes is being disassembled, it should be divided into
- smaller files.
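
The paragraph-to-byte conversion described above amounts to the following
small Python sketch. The function name paragraph_to_byte_offset is
hypothetical, and the returned flag merely stands in for the "??" warning
that ASMGEN prints on the line.

```python
def paragraph_to_byte_offset(paragraph: int):
    """Return (value, questionable) for a relocatable paragraph value.

    A paragraph is 16 bytes, so the byte value is paragraph * 16. The
    result must fit the 16-bit instruction counter field, which is only
    possible for paragraph values from 0 through 0FFFH.
    """
    if paragraph < 0 or paragraph > 0x0FFF:
        return paragraph, True      # left unchanged, flagged with "??"
    return paragraph * 16, False


value, questionable = paragraph_to_byte_offset(0x0040)
print(f"{value:04X}", questionable)   # paragraph 0040 is byte offset 0400
```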
-
- All words are produced as labels, except when the "L" switch has
- been enacted in the .SEQ file (explained later). The label name
- indicates its numeric value and, if it does not occur on an
- instruction boundary, its position relative to the current
- instruction pointer is given by an EQU statement. Therefore the
- Macro Assembler will assume that it is a location,
- but it is easily changed to a constant since the value is given
- in the label name. The word OFFSET precedes a label whenever it
- is questionable whether it is a label or an immediate value. You
- must decide which of the labels should be constants and which of
- the constants should be labels, and change them accordingly.
- When changing labels to numbers, be sure to append an "H" if the
- number ends with a "D" or a "B" since the Macro Assembler will
- otherwise assume that it is decimal or binary.
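
As a rough illustration of the hand edit described above, this Python sketch
turns a generated label back into a numeric constant. The helper name
label_to_constant is hypothetical. It assumes the .RADIX 16 setting is still
in effect, and it also adds a leading zero when the value starts with a
letter, since the assembler would otherwise read the text as a symbol name.

```python
def label_to_constant(label: str) -> str:
    """Convert a label such as L01FD into a constant such as 01FDH."""
    if len(label) != 5 or label[0].upper() not in "LS":
        raise ValueError(f"not a generated label: {label!r}")
    digits = label[1:].upper()
    int(digits, 16)                  # raises ValueError if not hexadecimal
    if not digits[0].isdigit():
        digits = "0" + digits        # a number must begin with a digit
    if digits[-1] in "DB":
        digits += "H"                # otherwise read as decimal or binary
    return digits


for name in ("L0200", "L01FD", "LFFFB"):
    print(name, "->", label_to_constant(name))
```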
-
- Bytes are always treated as constants. An optional switch may be
- included in the .SEQ file (explained later) which enables
- numbers instead of labels if all references to the value are
- data segment and immediate operation types.
-
- An effective procedure to follow in attempting to understand the
- assembly code file is to look first for the message text area,
- the input commands, and the simpler subroutines. Then add label
- names to addresses in the .SEQ file (explained later) that
- remind you of their purpose. Add comments to the labels. If
- these names are well chosen, the larger routines eventually will
- become clear. The embedded references are produced as labels so
- they will retain their meanings as they are changed.
-
- It is also helpful to spend some time studying the structure of
- data areas. Vector tables, which are frequently used to control
- the program's flow, reveal the program's structure very quickly.
- If some routines do not have labels at the beginning, it is
- usually because the code or tables that reference them (or the
- segment register assumptions) are not properly defined in the
- .SEQ file.
-
-
- * READING THE REFERENCE TABLE (.TBL) *
-
- A referencee is defined as a number that is referenced somewhere
- in the program. It may be a program location or a numeric
- constant.
-
- A referencor is defined as the address in the program from
- which a reference is made to the referencee.
-
- Each entry is composed of a referencEE followed by a list of
- referencors. If more than one line is needed, additional lines
- are indented to the first referencor position. The referencEE is
- followed by an "S" if it includes references to the beginning of
- a segment. The referencor is followed by two letters, the first of
- which represents the segment register that is implied or
- prefixed in the referencing instruction. The second letter
- indicates the type of operation on the referencEE. When the
- reference entries are embedded in the assembly code, all values
- are preceded with the letter "L".
-
- ----------------------------------------------------------------------------
-  1st letter       |  2nd letter
-  SEG REGISTER     |  TYPE OF OPERATION
- ----------------------------------------------------------------------------
-  C  code          |  J  jump      M  modify - INC, ADD, etc.
-  S  stack         |  C  call      I  immediate - value or offset
-  D  data          |  R  read      T  test or compare
-  E  extra         |  W  write     ?  unknown or ESC instruction
-                   |  P  port
- ------------------|---------------------------------------------------------
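
When reading a long .TBL file it can help to expand the two-letter codes
mechanically. The Python sketch below is one way to do so. The dictionaries
copy the table above, while the function name describe_referencor is
hypothetical.

```python
SEGMENT = {"C": "code", "S": "stack", "D": "data", "E": "extra"}

OPERATION = {
    "J": "jump",
    "C": "call",
    "R": "read",
    "W": "write",
    "P": "port",
    "M": "modify",
    "I": "immediate",
    "T": "test or compare",
    "?": "unknown or ESC instruction",
}


def describe_referencor(code: str) -> str:
    """Expand a two-letter referencor code such as 'DR' or 'CJ'."""
    if len(code) != 2:
        raise ValueError(f"expected two letters, got {code!r}")
    segment, operation = code[0].upper(), code[1].upper()
    return f"{SEGMENT[segment]} segment, {OPERATION[operation]}"


print(describe_referencor("DR"))
print(describe_referencor("CJ"))
```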
-
-
-
- * WRITING/READING THE SEQUENTIAL INSTRUCTION FILE (.SEQ) *
-
- The sequential instruction file is a list of special
- instructions to ASMGEN which the user creates. The file takes
- the form of a list of hexadecimal addresses and single-letter
- instructions or generation switches. If used, the .SEQ file must
- be on the same diskette as the source file and have the same
- name as the source file with an extension of .SEQ. Each
- instruction in the file must be in one of the following formats:
-
- addr command
- or
- addr command ;comment
- or
- addr command label comment
- or
- addr command label comment ;comment
-
- "addr" represents the instruction pointer value. All addr values
- must be in numerical sequence in the file.
-
- "command" may be either a toggle switch or a generation
- instruction.
-
- "label" is optional and replaces the label generated for this
- address with this non-blank string.
-
- "comment" is optional and must be preceded by "label" unless the
- dummy label "." is used. Everything following "label" is treated
- as an address comment and will be printed in the ASM file behind
- the generated instruction. The address comment may be up to 255
- characters in length and should not contain a semi-colon.
-
- ";comment" is optional. Anything following a semi-colon in the
- .SEQ file instructions is considered as a comment in the .SEQ
- file only and is not added to the generated .ASM file.
-
- "label" and "comment" are not allowed when a generation switch
- is coded, but a ";comment" may be used to help clarify the .SEQ
- file.
-
- The .SEQ file is read into memory before the first pass starts.
- The addresses and commands will be compressed, but "label" and
- "comment" will be held in memory one to one. An effect of this
- is that memory space required for disassembly increases with
- each "label" and "comment" added to the .SEQ file.
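
The four line formats above can be captured by a small Python parser. This is
a sketch for checking a .SEQ file by hand, not a description of how ASMGEN
reads the file: the names SeqLine and parse_seq_line are hypothetical, and
continuation lines beginning with "&" are deliberately left out.

```python
from typing import NamedTuple, Optional


class SeqLine(NamedTuple):
    addr: int               # instruction pointer value
    command: str            # switch such as /E, or instruction such as C
    label: Optional[str]    # None when absent or when the dummy "." is used
    comment: str            # address comment copied to the .ASM file


def parse_seq_line(line: str) -> Optional[SeqLine]:
    """Parse one 'addr command [label [comment]] [;comment]' line."""
    body = line.split(";", 1)[0].strip()   # ";comment" stays in the .SEQ file
    if not body:
        return None
    if body.startswith("&"):
        raise ValueError("continuation lines are outside this sketch")
    parts = body.split(None, 3)
    if len(parts) < 2:
        raise ValueError(f"missing command: {line!r}")
    addr_text = parts[0].upper()
    if addr_text.endswith("H"):            # an "H" may follow the hex address
        addr_text = addr_text[:-1]
    label = parts[2].upper() if len(parts) > 2 else None
    if label == ".":
        label = None
    comment = parts[3] if len(parts) > 3 else ""
    return SeqLine(int(addr_text, 16), parts[1].upper(), label, comment)


print(parse_seq_line("1F45 W user_prt print routine ;note to self"))
```

The sketch folds labels to upper case because user-provided labels are
translated to upper case in the output.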
-
-
- * DESCRIPTION OF GENERATION SWITCHES *
-
- THE VARIOUS TOGGLE SWITCHES ARE SET TO ON BY DEFAULT. Switches
- may be toggled on and off at any point in the .SEQ
- file/disassembly.
-
- All option switches except /M and /H can be either toggled or
- directly set by the user. A suffix of "+" turns the switch ON,
- and a suffix of "-" turns the switch OFF. Switches encountered
- in the file that have neither of these suffixes are toggled to
- the opposite of their state at the time; ON switches are turned
- OFF and OFF switches are turned ON.
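
The toggle and direct-set rules can be modelled as below. This Python sketch
assumes that the toggle switches B, E, F, L, O, R and T all start ON, and
that /M and /H are one-way flags that are simply recorded once seen. The
names initial_state and apply_switch are hypothetical.

```python
TOGGLES = "BEFLORT"     # switches that may be toggled or set with + and -
ONE_SHOT = "MH"         # switches that are set once and never cleared


def initial_state() -> dict:
    state = {name: True for name in TOGGLES}          # ON by default
    state.update({name: False for name in ONE_SHOT})  # not yet seen
    return state


def apply_switch(state: dict, token: str) -> None:
    """Apply one switch token such as '/O', '/O+' or '/B-' to state."""
    if len(token) < 2 or token[0] != "/":
        raise ValueError(f"not a switch: {token!r}")
    name, suffix = token[1].upper(), token[2:]
    if name in ONE_SHOT:
        state[name] = True
    elif name in TOGGLES and suffix == "+":
        state[name] = True
    elif name in TOGGLES and suffix == "-":
        state[name] = False
    elif name in TOGGLES and suffix == "":
        state[name] = not state[name]      # plain switch toggles
    else:
        raise ValueError(f"unknown switch: {token!r}")


state = initial_state()
for token in ("/O", "/O+", "/B-", "/H"):
    apply_switch(state, token)
print(state)
```

Using the "+" and "-" suffixes makes a .SEQ file easier to reason about,
since the result no longer depends on how many times the switch was
toggled earlier in the file.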
-
- /B - generate byte references
-
- When ON, byte and word references are included in the reference
- table. When OFF, only word references are generated.
-
- /E - embedded references in ASM file
-
- When ON, reference table entries are inserted in the text just
- before the referencee's definition statement. When OFF, these
- entries are not included with the disassembled text. The entire
- reference table can be printed with the "R" command.
-
- /F - 8087 mnemonics
-
- When ON, ESC instructions are produced. When OFF, ESC
- instructions are assumed to be 8087 instructions and 8087
- mnemonics are produced.
-
- /H - append hex "H"
-
- When this switch appears at any point in the .SEQ file, an "H"
- is appended to all hex numbers. This does not, of course, apply
- to the labels which are hex values preceded by the letter "L".
- The .RADIX 16 pseudo-op is omitted which allows the assembler's
- radix to default to decimal. This switch defaults to NO H
- APPEND. Note that it will be set only once. It retains its
- value until the next .SEQ file is read.
-
- /L - generate label or number
-
- When ON, all word references are treated as labels. When OFF, a
- word reference is treated as a constant if all referencors are
- data immediate types.
-
- /M - suppress macro library
-
- When this switch appears at any point in the .SEQ file, no macro
- library is included in the text output. The DEFAULT IS THAT THE
- MACRO LIBRARY WILL BE INCLUDED. Note that this switch will be
- set only once. It retains its value until the next .SEQ file is
- read.
-
- /O - control ASM output
-
- When ON, ASMGEN will output the generated text. When OFF, output
- will be suppressed.
-
- /R - control TBL output
-
- When ON, ASMGEN will output the generated reference data. When
- OFF, the reference table is not printed.
-
- /T - control trace output
-
- When ON, up to 16 bytes of object code are included as comments
- in each line of the assembly code file. When OFF, object code
- is not included.
-
-
- * DESCRIPTION OF .SEQ FILE COMMANDS *
-
- A - assume
-
- The following lines contain ASSUMptions for segment register
- values. They become effective at the address specified by this
- instruction and may be modified anywhere in the disassembly.
- The required format for assumptions is:
-
- & 0400 DS
-
- The ampersand indicates a continuation of the A instruction.
-
- In this example, a data segment beginning at an instruction
- pointer value of 400 will be assumed until another A instruction
- changes it. CS, ES, and SS are also supported. The segment
- assumptions are used for effective address calculations only.
- The code segment assumption does not affect the instruction
- pointer value.
-
- B - bytes
-
- The bytes encountered in the source file are assumed to have
- meaning as single byte values.
-
- C - code
-
- The bytes encountered in the source file are assumed to be valid
- 8088 machine language instructions.
-
- D - generate data operand
-
- The operand of the instructions is changed to immediate data.
- Subsequent bytes are interpreted as "C" (code follows).
- --Appears to not work correctly if the /H switch is set--
-
- I - initial value for IP
-
- The hexadecimal value on this line overrides the instruction
- pointer value at the beginning of the file - not to be confused
- with the address at which execution begins. The default values
- are 0000 for EXE files and 0100H for COM and other files. The
- execution address following the END statement is omitted if this
- option is invoked.
-
- S - strings
-
- The bytes encountered in the source file are assumed to form
- text. Quoted text is produced for printable ASCII characters and
- byte values for others.
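
To illustrate the idea of quoted text mixed with byte values, here is a
Python sketch that renders a run of bytes as a DB statement. The exact layout
ASMGEN produces is not documented here, so the output format is an
assumption, as is the function name render_string. Byte values are written
with a leading zero and a trailing "H" so they cannot be misread as decimal
or binary.

```python
def render_string(data: bytes) -> str:
    """Render bytes as a DB line: quoted runs plus hex byte values."""
    items = []
    run = []
    for value in data:
        char = chr(value)
        if 0x20 <= value <= 0x7E and char != "'":
            run.append(char)                 # printable ASCII: keep in quotes
            continue
        if run:
            items.append("'" + "".join(run) + "'")
            run = []
        items.append(f"0{value:02X}H")       # anything else: byte value
    if run:
        items.append("'" + "".join(run) + "'")
    return "DB " + ",".join(items)


print(render_string(b"Hi\r\n$"))
```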
-
- # - defined length strings
-
- The first byte encountered in the source file contains the
- length of the character string which begins with the next
- encountered character. This length value may be overridden by a
- subsequent SEQ file instruction.
-
- $ - defined length strings
-
- The first byte encountered in the source file contains the
- length of the character string which begins with the next
- encountered character plus the length byte itself. This length
- value may be overridden by a subsequent SEQ file instruction.
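
The difference between the "#" and "$" forms is only whether the length byte
counts itself. A minimal Python sketch of that reading, using the
hypothetical name read_counted_string:

```python
def read_counted_string(data: bytes, offset: int, kind: str):
    """Return (text, next_offset) for a length-prefixed string.

    kind "#": the length byte counts only the characters that follow.
    kind "$": the length byte counts the characters plus itself.
    """
    length = data[offset]
    start = offset + 1
    if kind == "#":
        end = start + length
    elif kind == "$":
        end = offset + length
    else:
        raise ValueError(f"unknown string type: {kind!r}")
    return data[start:end], end


sample = b"\x03abc\x04xyz"
print(read_counted_string(sample, 0, "#"))
print(read_counted_string(sample, 4, "$"))
```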
-
- W - words
-
- Pairs of bytes encountered in the source file are assumed to
- have meaning as word values.
-
- X - repeating data structure
-
- A cyclic data structure is assumed to begin at the specified
- instruction pointer value. The structure definition may follow
- and is prefixed by an ampersand (&) to indicate the continuation
- of this instruction. If the definition does not follow, then the
- most recent definition is used. If no structure is yet defined,
- then an error message is displayed.
-
- The following elements may be used to define the structure:
-
- & NNNN S - The next NNNN bytes are defined as string characters
- & NNNN B - The next NNNN bytes are defined as byte values
- & NNNN W - The next NNNN bytes are defined as word values
- & XXNN $ - The next sequence of bytes is defined as NN fields.
- Each field consists of a length byte and a string
- of characters. The length of each field is
- contained in the first encountered byte. The high
- nibble (XX), if non-zero, is a bit mask of the
- length field within the byte. The length field is
- right-justified within the byte after the byte
- value is sent to the output file.
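
The XXNN element packs a bit mask and a field count into one hex value. The
Python sketch below shows one reading of it: XX is taken as the upper two hex
digits and NN as the lower two, and the masked bits are shifted right until
the lowest mask bit sits in bit 0. Both function names are hypothetical.

```python
def split_structure_word(word: int):
    """Split an XXNN value into (mask, field_count)."""
    return (word >> 8) & 0xFF, word & 0xFF


def length_from_masked_byte(value: int, mask: int) -> int:
    """Extract the right-justified length field from a length byte."""
    if mask == 0:
        return value                       # no mask: the whole byte is used
    shift = (mask & -mask).bit_length() - 1   # position of lowest mask bit
    return (value & mask) >> shift


mask, count = split_structure_word(0x0E02)
print(f"mask={mask:02X} fields={count}")
print(length_from_masked_byte(0xA6, mask))
```

With a mask of 0E the length lives in bits 3, 2 and 1 of the byte, so a
byte of A6 yields a length of 3.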
-
-
-
- * EXAMPLES OF .SEQ COMMANDS *
-
- This example .SEQ file shows all the possible instructions in
- the appropriate format.
-
- ;All switches are on at the beginning.
- 0 /T ;no object code as comments in output
- 0 /M ;no macro library in output
- 0 /H ;append "H" to all numbers
- 00H A ;assume the following segment values
- ;Note that the ampersand (&) indicates the extended ASSUME
- & 380 DS ;the data segment starts at 380 hex
- & 380 ES ;the extra segment starts at 380 hex
- 0200 I ;initialize the instruction pointer to 200
- 0200 /F ;introduce 8087 mnemonics (not ESC)
- 0200 /E ;no embedded references
- 0200 C ;code begins at 200
- 0203H W ;words are at 203
- 0207 C ;more code starting here
- 220 X ;complex data structure begins here
- & 3 W ;words
- & 1 B ;byte
- & 0E02 $ ;2 strings starting with the 2nd byte follow
- ;bits 3,2,1 of the first byte contain the length of the
- ;string including the length byte.
- ;the high nibble (0E) is the mask.
- ;see also # in summary below
- & 1 B ;byte
- ;the structure repeats until 351
- 351 B ;bytes
- 358 C ;more code
- 380 S ;strings - list of messages
- 421 W ;words
- 4FD /B ;no further byte references
- 502 /R ;garbage here - turn off reference generation
- 502 /O ;and output
- 600H /O+ ;valid code - turn output back on
- 600 /R
- 600 C
- 1A60 /O- ;output file about to fill diskette - turn output
- ;off but keep scanning for references. Another run
- ;will be needed to get the remaining code.
- 1B00 /D ;treat operand as immediate data
- 1DFD /B+ ;continue with byte references
- 1F45 W user_prt ;user provided labels will translate
- 2256 S $MSG ;to upper case
- ^Z ;Ctrl-Z, End-of-file marker is necessary
-
-
- Comments may be included if preceded by a semicolon.
-
- Alphabetic characters may be either upper or lower case.
-
- An "H" may follow the hex address.
-
-
-
- * SAMPLE SESSION *
-
- The external command CHKDSK.COM will serve as an example for
- this sample session because it is short. The .SEQ file is also
- short and easy to generate. Only these few instructions are
- needed.
-
-
- 0100 /T ;include object code as comments in .ASM file
- 0100 /E ;simpler output without references
- 04F7H S ;messages
- 04F7H /H ;append "H" to numeric values
-
- Using DEBUG, browse through CHKDSK.COM to see how this was
- arrived at. Usually, but not always, the best procedure is to
- assume code. If the code appears unintelligible, display it in
- hex/ASCII. If it is not text, assume bytes. Label positions in
- the first disassembly may indicate that some locations should be
- words. Next, generate the .ASM file by typing
-
- ASMGEN CHKDSK.COM <enter>
- A <enter>
-
- The assembly code can be viewed on the screen. Then type
-
- A CHKDSK.ASM <enter>
-
- to save the assembly source code to a file. Then,
-
- R CHKDSK.TBL <enter>
-
- to save the cross-reference table to disk.
-
- The Macro Assembler, Link.exe and Exe2bin could now be used to
- assemble CHKDSK.ASM, link it to .EXE and convert it to a .COM
- file. No modification should be necessary in this case.
-
- If working with code that is to be modified, the symbol types
- must be correctly specified as locations or as constants. If
- they are constants, place them outside of any segment. The label
- names may then be changed to make the code more readable.
-
-
- There are several disassembly issues and errors that must be corrected.
-
- 1.) Put the PAGE directive "PAGE 56,132" on a line near the top of
- the .ASM listing.
-
- 2.) With the /H switch set, the number 0FFFF is not always provided
- with an "H" suffix. It must be corrected by hand.
-
- 2a.) With the /H switch set, the immediate operand value generated
- with the D command will need to be patched with an H suffix.
-
- 3.) .SYS type device driver files start at 0000, and need an ASSUME
- directive. They also need a "0000 I" line to start disassembly
- at 0000H. Two pointers to executable code are at 0006 and 0008.
-
- 4.) Override operators BYTE PTR or WORD PTR are sometimes needed.
-
- 5.) Most Microsoft MASM versions simply refuse to assemble certain
- disassembled instructions (usually LES or LDS instructions with
- an 8-bit register specified). They actually assemble the code
- correctly, but don't like the construct and declare an assembly
- error. In such cases, try using IBM's MASM 1.00; it isn't
- as fussy about assembling such questionable code.
-
- 6.) If a message 'Hex value error in /SEQ file' is displayed upon
- attempting to disassemble a file, it may be necessary to add a
- Ctrl-Z, End-of-file marker, at the end of the .SEQ file.
-
-
-