- TABLE OF CONTENTS
-
- Description
- Syntax
- Labels
- Opcodes
- Operands
- Pseudo ops
- Error Messages
- Source Code
- Warranty/Guarantee
- Updates
- Op Code Table
-
- ASSEMBLE.COM
-
-
- DESCRIPTION
- This is a two pass assembler written in Turbo Pascal. It
- was written because I was a new owner of an IBM PCjr with little
- software and was not aware of any public domain assemblers that
- had reasonable performance. Additionally I had recently purchased
- Turbo Pascal from Borland International and wanted a project to
- help me learn Pascal.
-
- This is a simple assembler for the Intel 8088/8086 instruction
- set. It closely follows the syntax of the instruction set
- described in THE 8086 BOOK by Russell Rector and George Alexy. It
- is also patterned after CHASM.BAS version 1.9, written in BASIC by
- David Whitman of Whitman Software (there have been numerous
- changes to avoid copyright violations). It is not a macro
- assembler and therefore recognizes only a few pseudo op codes.
- A complete list of the op codes and pseudo ops recognized by
- ASSEMBLE.COM is given later in this documentation.
-
- The input to ASSEMBLE.COM is an ordinary DOS text file, which
- can be created with most text editors. The output is a .COM file
- plus a listing sent to the screen, the printer or a disk file.
- This assembler was designed to create executable programs and
- assembly code for use in BASIC or Turbo Pascal programs. It will
- not generate code that can be linked to other programs. See your
- BASIC or Turbo Pascal manual for how to include executable code
- in your programs. Since this assembler is intended for small
- projects, you will probably find that bypassing LINK and the
- conversion from .EXE to .COM files is a convenience. If you
- intend to write large software projects, I recommend you get a
- macro assembler such as IBM's or CHASM version 4.0 (also written
- in Turbo Pascal). Although ASSEMBLE.COM is intended for small
- projects, both the input file and the output file(s) are on disk,
- so the only limits on the size of your program are the disk
- space, the number of labels you use and your endurance. Labels
- and their memory addresses are stored in a data array on the
- first pass. This array is limited to 400 labels. My goal is to
- keep this program under 35K bytes so it can be run on small
- systems. If you have the memory, the program can easily be
- recompiled with a larger array. See the SOURCE CODE section for
- how to obtain the source code.
-
- SYNTAX
- Each line of source code begins with a label or a blank space,
- followed by the op code (an 8088 instruction or a pseudo op) and
- then the operand(s), if the op code requires any. The source code
- may be followed by a semicolon (;) and any comments for that
- line. A line may also consist only of a comment if it starts with
- a semicolon. A blank space must precede and follow the op code
- field. For readability I recommend one or more spaces between the
- label and the op code and between the op code and the operand. A
- comma must be used to separate operands. The requirement for a
- comma between operands differs from CHASM and should be noted
- when converting programs.
-
- Example
- 1stLabel mov ax,90H ;format example
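- The line format described above can be sketched in Python. This
- is a minimal illustration, not ASSEMBLE's Turbo Pascal code: the
- function name parse_line and the returned tuple are hypothetical.
- The sketch splits off the comment at the first semicolon outside
- single quotes and converts everything outside quotes to upper
- case (as described under LABELS). It then separates the label,
- op code and operand fields.

```python
import re

def parse_line(line: str):
    """Split one source line into (label, opcode, operands, comment).

    Hypothetical sketch of the ASSEMBLE line format, not the original code.
    """
    # 1. The comment starts at the first ';' that is not inside single quotes.
    code, comment = line, ""
    in_quote = False
    for i, ch in enumerate(line):
        if ch == "'":
            in_quote = not in_quote
        elif ch == ";" and not in_quote:
            code, comment = line[:i], line[i + 1:]
            break

    # 2. Upper-case only the text outside quotes (the even-numbered pieces).
    pieces = code.split("'")
    code = "'".join(p if k % 2 else p.upper() for k, p in enumerate(pieces))

    # 3. A label is present only if the line does not start with a blank.
    match = re.match(r"(\S*)\s*(\S*)\s*(.*)", code)
    label, opcode, operands = match.groups()
    return (label or None, opcode, operands.strip(), comment.strip())
```

- For the example line, parse_line returns ("1STLABEL", "MOV",
- "AX,90H", "format example").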
-
- LABELS
- Although labels may be longer, only the first 12 characters of
- a label are stored for future reference, so any labels whose
- first twelve characters are the same will cause a duplicate
- label error. Also, since the line parsing routine converts all
- source code not in single quotes to upper case before decoding
- the line, you cannot use upper and lower case to distinguish
- labels. For example, LongLabelxxx1 and longlabelxxx2 are both
- stored as LONGLABELXXX and would therefore cause a duplicate
- label error. If you are going to use numbers to distinguish
- labels, I suggest you put them at the beginning of a label, such
- as 2LongLabelXXX.
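- The label storage rules above can be sketched in Python. This is
- an illustration only: the class name SymbolTable is hypothetical,
- and a dictionary stands in for ASSEMBLE's fixed array. Each label
- is upper-cased and cut to 12 characters before it is stored or
- looked up, and the table refuses more than 400 entries.

```python
MAX_LABEL_CHARS = 12   # only the first 12 characters are significant
MAX_LABELS = 400       # size of the label array in version 1.01

class SymbolTable:
    """Hypothetical model of ASSEMBLE's label storage."""

    def __init__(self):
        self.entries = {}

    @staticmethod
    def key(name: str) -> str:
        # The line parser has already upper-cased the source; do it here too.
        return name.upper()[:MAX_LABEL_CHARS]

    def define(self, name: str, address: int) -> None:
        k = self.key(name)
        if k in self.entries:
            raise ValueError("Duplicate label: " + k)
        if len(self.entries) >= MAX_LABELS:
            raise OverflowError("label table full")
        self.entries[k] = address

    def lookup(self, name: str) -> int:
        return self.entries[self.key(name)]  # KeyError ~ 'Undefined Symbol'
```

- Defining LongLabelxxx1 and then longlabelxxx2 raises the
- duplicate label error, while 1LongLabel and 2LongLabel are kept
- apart because the distinguishing digit falls inside the first 12
- characters.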
-
- OP CODES
- All of the op codes specified in THE 8086 BOOK are supported;
- however, in order to resolve some ambiguous op codes the syntax
- was modified. The first ambiguous op code is JMP, which can be
- either an 8 or a 16 bit displacement jump. Eight bit displacement
- jumps are specified with JMPS (short jump). Jumps using mem/reg
- (indirect) addressing for their destination must specify Near or
- Far to indicate a jump within the current CS or an intersegment
- jump. These jumps are coded as JMPN or JMPF. The same logic is
- used for CALLN and CALLF with the mem/reg addressing mode. The
- other major area of ambiguity comes from op codes that do not
- specify a register as either the destination or the source. This
- assembler requires you to append a B or W to the op code to
- distinguish between bytes and words, e.g. MOVSB to move a string
- of bytes, or MOVW [BX],8 to load 8 into the word at the address
- pointed to by BX.
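- The byte/word rule can be illustrated with a small Python sketch.
- It assumes (hypothetically) that a register operand settles the
- size, and that only when no register is present does the
- assembler fall back to a B or W suffix on the op code. The names
- operand_size, BYTE_REGS and WORD_REGS are illustrative, not part
- of ASSEMBLE.

```python
BYTE_REGS = {"AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"}
WORD_REGS = {"AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI",
             "CS", "DS", "ES", "SS"}

def operand_size(operands, suffix=None):
    """Return 'B' or 'W' for an instruction (hypothetical helper).

    operands: operand strings such as ["[BX]", "8"].
    suffix:   'B' or 'W' if the op code was written as e.g. MOVW.
    """
    for op in operands:
        if op in BYTE_REGS:
            return "B"
        if op in WORD_REGS:
            return "W"
    if suffix in ("B", "W"):
        return suffix
    # Neither a register nor a suffix tells us the size.
    raise ValueError("Specify word or byte operand")
```

- With this rule, MOV AL,[BX] needs no suffix, but MOV [BX],8 can
- only be assembled as MOVB or MOVW.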
-
- Normally all data moves are assumed to be relative to the DS
- (data segment) register. This default can be overridden one
- instruction at a time by using the SEG op code on the line
- before the instruction to be overridden.
- Example
- SEG ES
- MOV AX,[BX]
- This moves a word into the accumulator from the address in the
- extra segment offset by the BX register. This is a little-used
- function, since ASSEMBLE assumes all of the segment registers
- are set to the same location, as they are at the start of a .COM
- program. System resources should be accessed with BIOS or DOS
- calls when possible, rather than by going directly to a hardware
- memory location outside your program.
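- On the 8088 a segment override is a one-byte prefix placed in
- front of the instruction it affects, so SEG ES presumably emits
- that prefix byte ahead of the MOV. The Python sketch below shows
- the prefix values defined for the 8086/8088. It also shows how
- the processor forms a 20-bit physical address from a segment and
- an offset. The helper names are hypothetical.

```python
# One-byte segment override prefixes defined by the 8086/8088.
SEGMENT_PREFIX = {"ES": 0x26, "CS": 0x2E, "SS": 0x36, "DS": 0x3E}

def with_override(segment: str, instruction: bytes) -> bytes:
    """Prefix an already-encoded instruction with a segment override."""
    return bytes([SEGMENT_PREFIX[segment]]) + instruction

def physical_address(segment_value: int, offset: int) -> int:
    """20-bit address = segment * 16 + offset (wraps at 1 MB)."""
    return ((segment_value << 4) + offset) & 0xFFFFF

# MOV AX,[BX] encodes as 8B 07; SEG ES in front of it adds 26H.
seg_es_mov = with_override("ES", bytes([0x8B, 0x07]))
```

- For example, with ES = 1234H and BX = 0010H, the word is read
- from physical address 12350H.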
-
- The op code table lists the available op codes and pseudo ops
- and the addressing modes associated with each. Please note that
- the mem/reg addressing mode includes several sub modes, such as
- base relative (using BX as an offset), stack relative (using BP
- as an offset) and indexed (using SI or DI). See THE 8086 BOOK for
- an explanation of each mode.
-
- OPERANDS
- The operands tell the assembler the destination and source of
- the data to be operated on. The 8088 uses a number of addressing
- modes to determine where that data is and where it should go. As
- the op code table shows, not every mode can be used with every
- op code. The addressing modes are:
-
- Accumulator - data is transferred to/from the accumulator.
- Displacement - the displacement value is added to the present IP.
- Immediate - data is assembled into the instruction.
- Memory/Reg - data is transferred to/from the address pointed to
- by [mem] or [reg].
- Register - data is transferred to/from the register.
-
- Labels can be used in the Immediate, Displacement and Memory
- addressing modes. This assembler follows the Intel convention of
- treating a label operand as a reference to the value in a memory
- location unless the label was defined by an EQU. If a value is
- added to the label, it is treated as a reference to the location
- address. To use an offset from the label to obtain the value in a
- memory location, the label and offset must be enclosed in
- brackets, just as a numeric address would be.
- Operand         Meaning
- Label           reference to the value in the memory location
- Offset Label    reference to the memory location (its address)
- Label+5         reference to the memory location at Label+5;
-                 the same as Offset Label+5
- $-Label         reference to a memory location or a numeric
-                 value, e.g. when defining a buffer length
- [Label+5]       reference to the value in memory location Label+5
- [1234]          reference to the value in memory location 1234
-
- Addition, subtraction, multiplication and division are supported
- by the parser. Labels are treated as numbers when used in a math
- expression.
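- The expression rules can be sketched as a small recursive-descent
- evaluator in Python. This is an assumption about how such a
- parser can work, not ASSEMBLE's own routine, and evaluate, the
- symbols dictionary and the token rules are hypothetical. Labels
- are looked up first. Otherwise the text is read as a number, with
- a trailing H for hex and a trailing B for binary. $ is the
- current location, and * and / are applied before + and -, left
- to right.

```python
import re

def evaluate(expr: str, symbols: dict, location: int) -> int:
    """Evaluate an operand expression such as '$-BUFFER' or 'TABLE+2*4'.

    Hypothetical sketch. `symbols` keys must be upper case. Characters
    other than words, '$' and + - * / are ignored by the tokenizer.
    """
    tokens = re.findall(r"[A-Z0-9_]+|\$|[-+*/]", expr.upper())
    pos = 0

    def next_token() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("incomplete expression: " + expr)
        pos += 1
        return tokens[pos - 1]

    def peek_is(ops) -> bool:
        return pos < len(tokens) and tokens[pos] in ops

    def factor() -> int:
        tok = next_token()
        if tok == "-":                 # unary minus, e.g. -5[BP+SI]
            return -factor()
        if tok == "$":                 # present memory location
            return location
        if tok in symbols:             # labels are treated as numbers
            return symbols[tok]
        if tok.endswith("H"):
            return int(tok[:-1], 16)   # ValueError ~ 'Undefined Symbol'
        if tok.endswith("B"):
            return int(tok[:-1], 2)
        return int(tok, 10)

    def term() -> int:
        value = factor()
        while peek_is(("*", "/")):
            op = next_token()
            rhs = factor()
            # '/' truncates toward zero, like Pascal's DIV
            value = value * rhs if op == "*" else int(value / rhs)
        return value

    def expression() -> int:
        value = term()
        while peek_is(("+", "-")):
            op = next_token()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unexpected token in: " + expr)
    return result
```

- For example, with BUFFER at 0120H and the location counter at
- 0130H, $-BUFFER evaluates to 10H, the length of the buffer.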
-
- ACCUMULATOR
- The accumulator is AX, a 16 bit register, or one of its halves:
- AL, the lower 8 bits of AX, or AH, the upper 8 bits.
-
- DISPLACEMENT
- A displacement value to be added to the instruction pointer (IP)
- is included as immediate data in the instruction. The assembler
- calculates the displacement from the location of the instruction
- and the location of the address in the operand. The address in
- the operand can be expressed as a number (binary, hex or decimal)
- but is most commonly expressed as a label.
- Example
- LABEL MOV AX,[BX]
- CMP AX,10H
- JLE LABEL
-
- In this example the assembler calculates a negative displacement
- so that the jump goes back to LABEL when the value in AX is less
- than or equal to 10H.
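- The displacement arithmetic can be checked with a short Python
- sketch. The 8088 adds the displacement to IP after the jump
- instruction has been fetched, so the displacement is measured
- from the address of the next instruction. The sketch assumes
- LABEL is at 0100H and uses the standard 8088 encodings: MOV
- AX,[BX] is 2 bytes (8B 07), CMP AX,10H is 3 bytes, and JLE
- (opcode 7EH) is 2 bytes. The function names are hypothetical.

```python
def short_jump(opcode: int, address: int, target: int) -> bytes:
    """Encode a 2-byte jump (JLE, JMPS, LOOP, ...) located at `address`."""
    disp = target - (address + 2)        # relative to the next instruction
    if not -128 <= disp <= 127:
        raise ValueError("Too far for short jump")
    return bytes([opcode, disp & 0xFF])  # two's-complement displacement

def near_jmp(address: int, target: int) -> bytes:
    """Encode a 3-byte JMP (opcode E9H) with a 16-bit displacement."""
    disp = (target - (address + 3)) & 0xFFFF
    return bytes([0xE9, disp & 0xFF, disp >> 8])  # low byte first

# LABEL = 0100H, MOV AX,[BX] = 2 bytes, CMP AX,10H = 3 bytes,
# so JLE LABEL starts at 0105H and the next instruction is at 0107H.
jle = short_jump(0x7E, 0x105, 0x100)     # displacement -7 = F9H
```

- A near JMP whose displacement would also fit in -128 to +127 is
- the case the long jump diagnostic points out. JMPS would encode
- the same jump in one byte less.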
-
- IMMEDIATE
- All immediate data is assembled into the instruction code. This
- data can be represented in two ways. First, immediate data can be
- written in binary, decimal or hexadecimal format, in a signed
- range of -32768 to 32767 (8000H to 7FFFH) or, if the sign bit is
- not used, 0 to 65535 (0000H to FFFFH). As in these examples, an
- 'H' is appended to a number to indicate hexadecimal. Binary
- numbers are expressed as a series of up to 16 ones and zeros
- followed by a B, e.g. 11010B represents 26. The other way of
- representing immediate data is with labels. The value of a label
- is the address at which the label was defined, or the value
- assigned to the label in an EQU pseudo op.
- Example
- Label equ 10
- Here db 10
- MOV BX,LABEL ;Load BX with the value of Label
- MOV BX,OFFSET(Here) ;Load BX with the address of Here
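- The number formats above can be summarized in a short Python
- sketch. parse_number is a hypothetical name. It accepts an
- optional leading minus sign, a trailing H for hexadecimal or B
- for binary (up to 16 digits), and otherwise reads the number as
- decimal. It checks the -32768 to 65535 range and returns the
- 16-bit value that would be assembled, so -1 and 65535 both
- become FFFFH.

```python
def parse_number(text: str) -> int:
    """Parse an ASSEMBLE-style numeric literal; return its 16-bit encoding."""
    s = text.strip().upper()
    negative = s.startswith("-")
    if negative:
        s = s[1:]
    if s.endswith("H"):                  # hexadecimal, e.g. 90H or FFH
        value = int(s[:-1], 16)
    elif s.endswith("B"):                # binary, e.g. 11010B
        digits = s[:-1]
        if len(digits) > 16:
            raise ValueError("binary literal longer than 16 bits: " + text)
        value = int(digits, 2)
    else:                                # decimal
        value = int(s, 10)
    if negative:
        value = -value
    if not -32768 <= value <= 65535:
        raise ValueError("number out of range: " + text)
    return value & 0xFFFF                # two's complement for negatives
```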
-
- MEMORY/REGISTER
- This addressing mode is also called indirect addressing. The
- operand points to a memory location that contains the data,
- rather than the instruction containing the data as in the
- immediate addressing mode. The operand can be a memory location
- expressed as a label, a decimal number or a hexadecimal number,
- or it can be a memory location pointed to by a register. The
- following indirect modes are allowed:
- MOV Reg,[BP]
- MOV [BX],Reg
- MOV [BX+SI],Reg ;address = BX + SI
- MOV Reg,[BX+DI] ;address = BX + DI
- MOV Reg,[BP+SI] ;address = BP + SI
- MOV [BP+DI],Reg ;address = BP + DI
- MOV [DI],Reg
- MOV Reg,[SI]
- MOV LABEL,Reg
- MOV [1234],Reg
- Any of the general purpose registers can be used in place of Reg
- in these examples. Immediate data may also be substituted for a
- source register; however, the op code must then be appended with
- W or B so the assembler knows whether you are pointing to a word
- or a byte address. In addition, when an indirect address using a
- register is chosen, a displacement may also be used.
- Example
- MOV Reg,10H[BP] ;Source address = BP+10H
- MOV Reg,-5[BP+SI] ;Source address = BP+SI-5
- DEMO EQU FFH
- MOV DEMO[BX+DI],Reg ;Destination address = BX+DI+255
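- Internally the 8088 encodes every memory/register operand in a
- "mod-reg-r/m" byte, optionally followed by an 8 or 16 bit
- displacement. The Python sketch below uses the standard 8086
- encoding tables to show what the addressing forms above become.
- The function name modrm and its arguments are hypothetical, and
- ASSEMBLE's own routine may be organized differently. [BP] with no
- displacement has no short form, because mod 00 with r/m 110 means
- a direct address such as [1234]. [BP] is therefore encoded as
- [BP] plus a zero 8-bit displacement.

```python
# r/m field for each base/index combination (8086 16-bit addressing).
RM_CODE = {
    ("BX", "SI"): 0b000, ("BX", "DI"): 0b001,
    ("BP", "SI"): 0b010, ("BP", "DI"): 0b011,
    ("SI",): 0b100, ("DI",): 0b101, ("BP",): 0b110, ("BX",): 0b111,
}
# reg field for the 16-bit general registers.
REG_CODE = {"AX": 0, "CX": 1, "DX": 2, "BX": 3,
            "SP": 4, "BP": 5, "SI": 6, "DI": 7}

def modrm(reg: str, base_index: tuple, disp: int = 0) -> bytes:
    """Encode the mod-reg-r/m byte plus any displacement bytes.

    base_index is e.g. ("BX", "SI"), ("BP",) or () for a direct address.
    """
    r = REG_CODE[reg] << 3
    if base_index == ():                      # direct address, e.g. [1234]
        d = disp & 0xFFFF
        return bytes([0b00_000_110 | r, d & 0xFF, d >> 8])
    rm = RM_CODE[base_index]
    if disp == 0 and base_index != ("BP",):   # mod 00: no displacement
        return bytes([0b00 << 6 | r | rm])
    if -128 <= disp <= 127:                   # mod 01: sign-extended disp8
        return bytes([0b01 << 6 | r | rm, disp & 0xFF])
    d = disp & 0xFFFF                         # mod 10: disp16, low byte first
    return bytes([0b10 << 6 | r | rm, d & 0xFF, d >> 8])
```

- Together with the opcode 8BH (MOV reg16,r/m16), MOV AX,[BX]
- assembles to 8B 07. MOV AX,DEMO[BX+DI] with DEMO = FFH needs a
- 16-bit displacement, because 255 does not fit in a signed byte.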
-
- REGISTER
- In this addressing mode the data is contained in, or is to be
- stored in, one of the 8088 registers. The registers are AX
- (AL+AH), BX (BL+BH), CX (CL+CH), DX (DL+DH), BP, DI, SI and the
- four segment registers CS, DS, ES and SS. Multiplication and
- division (MUL, IMUL, DIV, IDIV) always use the accumulator (AL or
- AX), and they also use the DX register when 32 bit numbers are
- involved. The BX and BP registers can be used as base pointers in
- the data and stack segments respectively. The CX register can be
- used as an automatic counter by some instructions, such as LOOP
- and the REP prefixes. As demonstrated in earlier examples, the SI
- and DI registers can be used as index registers. Any of the
- numerous assembly language books for the IBM PC or PCjr will give
- you an explanation of each of the processor registers and their
- uses.
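- The relationship between AX and its halves, and the DX:AX pair
- used by a 16-bit MUL, can be sketched in Python. The function
- names are hypothetical. Only the unsigned MUL case is shown.

```python
def split_ax(ax: int) -> tuple:
    """Return (AL, AH): AL is the low byte of AX, AH the high byte."""
    return ax & 0xFF, (ax >> 8) & 0xFF

def join_ax(al: int, ah: int) -> int:
    """Rebuild AX from its two 8-bit halves."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)

def mul16(ax: int, source: int) -> tuple:
    """Unsigned 16-bit MUL: the 32-bit product goes to DX (high) : AX (low)."""
    product = (ax & 0xFFFF) * (source & 0xFFFF)
    return (product >> 16) & 0xFFFF, product & 0xFFFF   # (DX, AX)
```

- For example, 1234H times 100H is 123400H, leaving DX = 0012H and
- AX = 3400H.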
-
- PSEUDO OPCODES
- Pseudo op codes are assembler directives that control the
- generation of the object code. The available pseudo ops are DB,
- DS, DW, ENDP, EQU, ORG and PROC.
- DB = define byte. Its operands are one or more bytes and/or
- strings ( DB 20H,'Demo' ). Strings are set off by single
- quotes. Numbers must be less than 256 and expressed in binary,
- decimal or hexadecimal.
- DS = define segment. It initializes a string of memory
- locations. The first operand defines the number of bytes to
- be initialized. If included, the second operand defines the
- value the memory is initialized to. The default value is
- zero. ( DS 20,FFH ;initialize 20 bytes to 255 )
- DW = define word. Its operand(s) must be numbers or labels.
- With DW the low order byte is stored first in memory, as this
- is the format used by the 8088 for integer storage
- (e.g. DW 1020H = DB 20H,10H).
- EQU = define the value of a label. All EQU label definitions
- must occur at the beginning of your program, or errors may
- occur in the assembly process. The most common error message
- received from defining a label late in the program is 'PHASE
- ERROR'. A phase error indicates that the assembler generated
- an address for a label on the second pass different from the
- one generated on the first pass.
- ORG = reset the location counter to a new origin. Since all
- .COM programs start at 100H, the default setting for
- ASSEMBLE.COM is 100H; however, you may need to start at 00H
- for a driver routine or a machine language routine for BASIC.
- PROC and ENDP are used together to define a program or procedure
- as Near or Far. This information is used to determine the type
- of return to be generated when a RET is encountered. If no
- procedure is defined, a Near procedure is assumed. The syntax
- is:
- PROC NEAR ;PROC must be followed by NEAR or FAR
- ....
- ....
- ENDP
- If PROC is used, an ENDP must be used.
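- The bytes produced by DB, DS and DW can be illustrated with a
- Python sketch. The helper names are hypothetical, and operands
- are assumed to be already evaluated to numbers or strings. The
- sketch follows the rules above, including the 'Data too long'
- check for byte values outside 0 to 255 and the low-byte-first
- order of DW.

```python
def db(*items) -> bytes:
    """DB: bytes and/or quoted strings, emitted in order."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")      # quoted text is kept as written
        elif 0 <= item <= 255:
            out.append(item)
        else:
            raise ValueError("Data too long")
    return bytes(out)

def ds(count: int, fill: int = 0) -> bytes:
    """DS: `count` bytes, all set to `fill` (default zero)."""
    return db(fill) * count

def dw(*words) -> bytes:
    """DW: each word stored low-order byte first."""
    out = bytearray()
    for w in words:
        w &= 0xFFFF
        out += bytes([w & 0xFF, w >> 8])
    return bytes(out)
```

- dw(0x1020) yields the same two bytes as db(0x20, 0x10), matching
- the DW 1020H example above.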
-
-
- ERROR MESSAGES
- All error messages and diagnostics are printed immediately
- before the line in which the error occurs. The total number of
- error and diagnostic messages is displayed at the end of the
- source code printout, immediately before the symbol table dump.
- I have tried to make the error messages as user friendly as
- possible. The most cryptic of them is the series you receive
- when there is a syntax error. This message consists of the op
- code and ASSEMBLE.COM's interpretation of the type of data in
- the operands. For example, the message
- *** Syntax Error: MOV (16 bit immediate or 8 bit immediate), (none)
- would appear immediately before a line of code containing the
- instruction MOV 45H. By reviewing the type of data and the
- allowable operands for each instruction you should be able to
- locate the error. In the instruction above both a destination
- and a source operand are required, and if immediate data is used
- it must be the source operand.
- 'Phase Error' is most commonly caused by referencing an
- equate before defining it. I strongly recommend you only use the
- EQU pseudo op at the beginning of your source code. This practice
- should prevent this error and will make your source code easier
- to read.
- 'Error: EQU without symbol' is received when you use the equate
- pseudo op without a label.
- 'Error: EQU with forward reference' is received if you
- attempt to use a forward reference when equating a label.
- 'Error: ENDP without PROC'. You must specify where the
- procedure begins.
- 'Error: Missing ENDP'. You must specify where the procedure
- ends if PROC is used.
- 'Error: Procedures nested too deeply'. Only 10 levels of
- nesting are allowed.
- 'Error: Duplicate label'. See the section on labels.
- 'Error: Data too long' indicates use of a byte operand where
- the data is out of the range of 0 to 255.
- 'Error: Too far for short jump' indicates a short jump whose
- displacement is outside the range of -128 to +127 bytes.
- 'Error: Undefined Symbol' plus the operand is displayed
- when no match is found in the symbol table. It is frequently
- caused by bad syntax in the operand.
- 'Error: Illegal or undefined argument for OFFSET' is similar
- to 'Undefined Symbol'.
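- One common way a phase error arises in a two-pass assembler is
- sketched below in Python. The model is hypothetical. It assumes
- that an instruction whose operand is unknown on the first pass
- gets a word-sized operand (3 bytes). On the second pass, once the
- EQU value is known, a value that fits in a byte gets a 2-byte
- form. Every later label then moves, and the second pass detects
- the mismatch. ASSEMBLE's actual sizing rules may differ.

```python
def instr_size(op, arg, symbols) -> int:
    """Size in bytes of one hypothetical source line."""
    if op == "EQU":
        return 0                     # generates no code
    if op == "NOP":
        return 1
    # op == "IMM": hypothetical instruction with a byte or word operand
    value = symbols.get(arg) if isinstance(arg, str) else arg
    if value is not None and 0 <= value <= 255:
        return 2                     # opcode + byte
    return 3                         # unknown yet, or needs a word

def run_pass(lines, symbols, first_pass):
    location = 0x100                 # default ORG for a .COM program
    errors = []
    for label, op, arg in lines:
        if label:
            value = arg if op == "EQU" else location
            if first_pass:
                symbols[label] = value
            elif symbols[label] != value:
                errors.append("Phase Error at " + label)
        location += instr_size(op, arg, symbols)
    return errors

def assemble(lines):
    symbols = {}
    run_pass(lines, symbols, first_pass=True)
    return run_pass(lines, symbols, first_pass=False)

# The same three lines, with the EQU after and before its first use.
late_equ = [(None, "IMM", "COUNT"), ("COUNT", "EQU", 10), ("HERE", "NOP", None)]
early_equ = [("COUNT", "EQU", 10), (None, "IMM", "COUNT"), ("HERE", "NOP", None)]
```

- In this model, assemble(late_equ) reports a phase error at HERE
- (0103H on pass one, 0102H on pass two). assemble(early_equ)
- reports none, which is why the EQU lines belong at the top of
- the program.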
-
- Two diagnostic messages may be given. The first follows a
- syntax error and is 'Specify word or byte operand' if the
- assembler could not determine which to use. The op code must be
- corrected by appending a B or W to it. The assembler can detect
- this case only occasionally, so you will not see this message
- very often. The other message is just a notice that you used a
- long jump where you could have used a short jump and saved a
- byte of object code.
-
- SOURCE CODE
- Turbo Pascal source code for this assembler is available for
- those who wish to customize it for their own needs (or those who
- would like to see what makes it tick). If you would like a copy
- of the source code, send a formatted disk and $10 to
- George Fulford
- RR 1 Box 163c
- Shellsburg Ia 52332
- Although I have no intention of entering the software market at
- this time, I do plan to make corrections to this program as bugs
- are found. If you obtain a copy of the source code from me, it
- will be the most up to date version.
-
- WARRANTY/GUARANTEE
- There is NONE.
- This assembler runs on my PCjr, and since I used only standard
- Turbo Pascal it should run on any PC DOS machine. If it does
- not, I probably won't be able to help you.
-
- I have spent quite a bit of time debugging, but I am sure there
- are still a few bugs lurking in the code. I will attempt to stamp
- out any that are reported.
-
- UPDATES
-
- Changes made in version 1.01
- 1) A more sophisticated number parsing routine was added. An
- offset may now be added to a label. Multiplication and
- subtraction now occur in the correct order.
- 2) $ can now be used to indicate the present memory location.
- 3) Due to memory management problems with the PCjr, labels are no
- longer stored in a linked list that uses all of the available
- memory. Labels are stored in a predefined array of 400. This
- should be sufficient for all small programs. The 'Stack Overflow
- Error' is no longer applicable.
- 4) Label storage has been expanded to twelve characters to
- improve the readability of labels.
- 5) A larger input buffer has been allocated to reduce disk access
- time, and other minor code changes were made to improve
- performance.
- 6) Bugs in the DIV, SHR, SHL and ORB instructions have been
- corrected. DIVB and DIVW have been added to allow 8 or 16 bit
- divides using indirect addressing.
- 7) A bug in the line parsing routine that stopped the parsing at
- a semicolon enclosed in quotes has been corrected.
- 8) An error in interpreting [BP] has been corrected.
-
- OP CODE TABLE
- Addressing modes supported:
-   A   = accumulator register (AX, AH, AL)
-   b/w = B or W must be appended to the op code for this mode
-   D   = displacement (8 or 16 bit as required by the instruction)
-   I   = immediate (byte for 8 bit registers, word for 16 bit
-         registers)
-   M/R = memory or register indirect addressing
-   N   = none
-   R   = register (BX, CX, DX, BP, SI, DI)
-   S   = segment register (CS, DS, ES, SS)
-   x   = addressing mode supported
-
- Op      Operand types
- dest.    N    | A    | A    | R    | R    | M/R  | R    | M/R  | I    | I    | D    | M/R
- source   N    | I    | M/R  | I    | N    | R    | M/R  | I    | I    | N    | N    | N
- AAA      x    |      |      |      |      |      |      |      |      |      |      |
- AAD      x    |      |      |      |      |      |      |      |      |      |      |
- AAM      x    |      |      |      |      |      |      |      |      |      |      |
- AAS      x    |      |      |      |      |      |      |      |      |      |      |
- ADC           | x    |      | x    |      | x    | x    | b/w  |      |      |      |
-
- AND           | x    |      | x    |      | x    | x    | b/w  |      |      |      |
- CALL          |      |      |      |      |      |      |      | x    |      | x    |
- CALLF         |      |      |      |      |      |      |      |      |      |      | x
- CALLN         |      |      |      |      |      |      |      |      |      |      | x
- CBW      x    |      |      |      |      |      |      |      |      |      |      |
-
- CLC      x    |      |      |      |      |      |      |      |      |      |      |
- CLD      x    |      |      |      |      |      |      |      |      |      |      |
- CLI      x    |      |      |      |      |      |      |      |      |      |      |
- CMC      x    |      |      |      |      |      |      |      |      |      |      |
- CMP           | x    |      | x    |      | x    | x    | b/w  |      |      |      |
-
- CMPS     b/w  |      |      |      |      |      |      |      |      |      |      |
- CWD      x    |      |      |      |      |      |      |      |      |      |      |
- DAA      x    |      |      |      |      |      |      |      |      |      |      |
- DAS      x    |      |      |      |      |      |      |      |      |      |      |
- DB            |      |      |      |      |      |      |      | x    | x    |      |
-
- DEC           |      |      |      | x    |      |      |      |      |      |      | b/w
- DIV           |      |      |      | x    |      |      |      |      |      |      | b/w
- DS            |      |      |      |      |      |      |      | x    | x    |      |
- DW            |      |      |      |      |      |      |      | x    | x    |      |
- ENDP     x    |      |      |      |      |      |      |      |      |      |      |
-
- EQU           |      |      |      |      |      |      |      |      | x    |      | memory
- HLT      x    |      |      |      |      |      |      |      |      |      |      |
- IDIV          |      | x    |      |      |      |      |      |      |      |      |
- IMUL          |      | x    |      |      |      |      |      |      |      |      |
- IN            | x    | note1|      |      |      |      |      |      |      |      |
-
- INC           |      |      |      | x    |      |      |      |      |      |      | b/w
- INT      x    |      |      |      |      |      |      |      |      | x    |      |
- INTO     x    |      |      |      |      |      |      |      |      |      |      |
- IRET     x    |      |      |      |      |      |      |      |      |      |      |
- JA            |      |      |      |      |      |      |      |      |      | x    |
-
- JAE           |      |      |      |      |      |      |      |      |      | x    |
- JB            |      |      |      |      |      |      |      |      |      | x    |
- JBE           |      |      |      |      |      |      |      |      |      | x    |
- JCXZ          |      |      |      |      |      |      |      |      |      | x    |
- JE            |      |      |      |      |      |      |      |      |      | x    |
-
- JG            |      |      |      |      |      |      |      |      |      | x    |
- JGE           |      |      |      |      |      |      |      |      |      | x    |
- JL            |      |      |      |      |      |      |      |      |      | x    |
- JLE           |      |      |      |      |      |      |      |      |      | x    |
- JMP           |      |      |      |      |      |      |      |      |      | x    |
-
- JMPF          |      |      |      |      |      |      |      |      |      | x    |
- JMPN          |      |      |      |      |      |      |      |      |      | x    |
- JMPS          |      |      |      |      |      |      |      |      |      | x    |
- JNE           |      |      |      |      |      |      |      |      |      | x    |
- JNO           |      |      |      |      |      |      |      |      |      | x    |
-
- JNP           |      |      |      |      |      |      |      |      |      | x    |
- JNS           |      |      |      |      |      |      |      |      |      | x    |
- JNZ           |      |      |      |      |      |      |      |      |      | x    |
- JO            |      |      |      |      |      |      |      |      |      | x    |
- JP            |      |      |      |      |      |      |      |      |      | x    |
-
- JPE           |      |      |      |      |      |      |      |      |      | x    |
- JPO           |      |      |      |      |      |      |      |      |      | x    |
- JS            |      |      |      |      |      |      |      |      |      | x    |
- JZ            |      |      |      |      |      |      |      |      |      | x    |
- LAHF     x    |      |      |      |      |      |      |      |      |      |      |
-
- LDS           |      |      |      |      |      | note2|      |      |      |      |
- LEA           |      |      |      |      |      | note2|      |      |      |      |
- LES           |      |      |      |      |      | note2|      |      |      |      |
- LOCK     x    |      |      |      |      |      |      |      |      |      |      |
- LODS     b/w  |      |      |      |      |      |      |      |      |      |      |
-
- LOOP          |      |      |      |      |      |      |      |      |      | x    |
- LOOPE         |      |      |      |      |      |      |      |      |      | x    |
- LOOPNE        |      |      |      |      |      |      |      |      |      | x    |
- LOOPNZ        |      |      |      |      |      |      |      |      |      | x    |
- LOOPZ         |      |      |      |      |      |      |      |      |      | x    |
-
- MOV      note3|      | x    |      |      | x    | x    | b/w  |      |      |      |
- MOVS     b/w  |      |      |      |      |      |      |      |      |      |      |
- MUL           |      | x    |      |      |      |      |      |      |      |      |
- NEG           |      |      |      | x    |      |      |      |      |      |      |
- NOP      x    |      |      |      |      |      |      |      |      |      |      |
-
- NOT           |      |      |      | x    |      |      |      |      |      |      | b/w
- OR            | x    |      | x    |      | x    | x    | b/w  |      |      |      |
- ORG           |      |      |      |      |      |      |      |      | x    |      |
- OUT           |      | note1|      |      |      |      |      |      |      |      |
- POP           |      |      |      | x    |      |      |      |      |      |      | x or seg
-
- POPF     x    |      |      |      |      |      |      |      |      |      |      |
- PROC     note4|      |      |      |      |      |      |      |      |      |      |
- PUSH          |      |      |      | x    |      |      |      |      |      |      | x or seg
- PUSHF    x    |      |      |      |      |      |      |      |      |      |      |
- RCL           |      |      |      | x    |      |      |      |      |      |      | b/w
-
- RCR           |      |      |      | x    |      |      |      |      |      |      | b/w
- REP      x    |      |      |      |      |      |      |      |      |      |      |
- REPE     x    |      |      |      |      |      |      |      |      |      |      |
- REPNE    x    |      |      |      |      |      |      |      |      |      |      |
- REPNZ    x    |      |      |      |      |      |      |      |      |      |      |
-
- REPZ     x    |      |      |      |      |      |      |      |      |      |      |
- RET      x    |      |      |      |      |      |      |      |      |      | x    |
- ROL           |      |      |      | x    |      |      |      |      |      |      | b/w
- ROR           |      |      |      | x    |      |      |      |      |      |      | b/w
- SAHF     x    |      |      |      |      |      |      |      |      |      |      |
-
- SAR           |      |      |      | x    |      |      |      |      |      |      | b/w
- SBB           | x    |      | x    |      | x    | x    | b/w  |      |      |      |
- SCAS     b/w  |      |      |      |      |      |      |      |      |      |      |
- SEG           |      |      |      |      |      |      |      |      | x    |      |
- SHL           |      |      |      | x    |      |      | b/w  |      |      |      |
-
- SHR           |      |      |      | x    |      |      | b/w  |      |      |      |
- STC      x    |      |      |      |      |      |      |      |      |      |      |
- STD      x    |      |      |      |      |      |      |      |      |      |      |
- STI      x    |      |      |      |      |      |      |      |      |      |      |
- STOS     b/w  |      |      |      |      |      |      |      |      |      |      |
-
- SUB           | x    |      | x    |      | x    | x    | b/w  |      |      |      |
- TEST          | x    |      | x    |      | x    | x    | b/w  |      |      |      |
- WAIT     x    |      |      |      |      |      |      |      |      |      |      |
- XCHG          |      | note5|      |      | x    | x    |      |      |      |      |
- XLAT     x    |      |      |      |      |      |      |      |      |      |      |
-
- XOR           | x    |      | x    |      | x    | x    | b/w  |      |      |      |
-
- note 1 IN and OUT support transfers between the accumulator (8 or
-        16 bit) and either a port number or the port addressed by DX.
- note 2 These instructions can use only a memory reference as the
-        source operand.
- note 3 MOV also supports mem<-accumulator, seg<-M/R and
-        M/R<-seg (or CS).
- note 4 Must specify a NEAR or FAR PROCedure.
- note 5 The accumulator can be exchanged with any of the registers
-        using the form XCHG AX,BX or XCHG BX.