Text File | 1985-11-18 | 32.2 KB | 793 lines

TABLE OF CONTENTS

   Description
   Syntax
      Labels
      Opcodes
      Operands
      Pseudo ops
   Error Messages
   Source code
   Disclaimer
   Revisions
   Op code table


ASSEMBLE.COM

DESCRIPTION

This is a two-pass assembler written in Turbo Pascal. I wrote it because I was a new owner of an IBM PCjr with little software and was not aware of any public domain assemblers that had reasonable performance. I had also recently purchased Turbo Pascal from Borland International and wanted a project to help me learn Pascal.

This is a simple assembler for the Intel 8088/8086 instruction set. It closely follows the syntax of the instruction set described in THE 8086 BOOK by Russell Rector and George Alexy. It is also patterned after CHASM.BAS version 1.9, written in BASIC by David Whitman of Whitman Software (there have been numerous changes to avoid copyright violations). It is not a macro assembler and therefore recognizes only a few pseudo op codes. A complete list of the op codes and pseudo ops recognized by ASSEMBLE.COM appears later in this documentation.

The input to ASSEMBLE.COM is an ordinary DOS text file, which can be created with most text editors. The output is a listing sent to the screen, printer or a disk file, and a .COM file.

This assembler was designed to create executable programs and assembly code for use in BASIC or Turbo Pascal programs. It will not generate code that can be linked to other programs. See your BASIC or Turbo Pascal manuals for how to include executable code in your programs. Since this assembler is intended for small projects, you will probably find that bypassing the link step and the conversion from .EXE to .COM files is a convenience.
If you intend to write large software projects, I recommend you get a macro assembler such as IBM's or CHASM version 4.0 (also written in Turbo Pascal). Although this assembler is intended for small projects, both the input file and the output file(s) are on disk, so the only limits on the size of your program are the disk space, the number of labels you use and your endurance.

Labels and their memory addresses are stored on the first pass in a data array. This array is limited to 400 labels. My goal is to keep this program under 35K bytes so it can be run on small systems. If you have the memory, the program can easily be recompiled with a larger array defined. See the SOURCE CODE section near the end for how to obtain the source code.


SYNTAX

Each line of source code begins with a label or a blank space, followed by the op code (8088 instruction or pseudo op) and then the operand(s), if the opcode requires any. The source code may be followed by a semicolon (;) and any comments for that line. A line may also consist only of a comment if it starts with a semicolon.

A blank space must precede and follow the op code field. For readability I recommend one or more spaces between the label and the op code and between the op code and the operand. A comma must be used to separate operands. This requirement differs from CHASM and should be noted when converting programs.

   Example
   1stLabel   mov ax,90H       ;format example

LABELS

Although labels may be longer, only the first 12 characters of a label are stored for future reference. Any labels whose first twelve characters are the same will therefore cause a duplicate label error. Also, since the line parsing routine converts all source code not in single quotes to upper case before decoding the line, you cannot use upper and lower case to distinguish labels. For example, LongLabelxxx1 and longlabelxxx2 are both stored as LONGLABELXXX and would therefore cause a duplicate label error.
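
The label rule above can be sketched in a few lines of Python. This is only an illustration of the documented behavior (convert to upper case, then keep the first 12 characters). The names stored_label and find_duplicates are made up for this example and are not taken from the assembler's Turbo Pascal source.

```python
MAX_LABEL_LEN = 12  # the manual states that only the first 12 characters are stored


def stored_label(label):
    """Return the form a label would be stored under: upper case, first 12 chars."""
    return label.upper()[:MAX_LABEL_LEN]


def find_duplicates(labels):
    """Return (earlier, later) pairs of labels that collide once stored."""
    seen = {}      # stored form -> first label that produced it
    clashes = []
    for name in labels:
        key = stored_label(name)
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes


if __name__ == "__main__":
    print(stored_label("LongLabelxxx1"))
    print(find_duplicates(["LongLabelxxx1", "longlabelxxx2", "2LongLabelXXX"]))
```

Run on the example labels, the first two collide while 2LongLabelXXX stays distinct, because its distinguishing digit falls inside the first 12 characters.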
If you are going to use numbers to distinguish labels, I suggest you put them at the beginning of the label, such as 2LongLabelXXX.

OP CODES

All of the op codes specified in THE 8086 BOOK are supported. However, the syntax was modified in order to resolve some ambiguous op codes.

The first ambiguous opcode is JMP, which can be either an 8 or a 16 bit displacement jump. Eight bit displacement jumps are selected by specifying JMPS, for short jump. Jumps using mem/reg (indirect) addressing for their destination must specify Near or Far to indicate a jump within the current CS or an intersegment jump. These jumps are coded as JMPN or JMPF. The same logic is used for CALLN and CALLF when using the mem/reg addressing mode.

The other major area of ambiguity comes from op codes that do not specify a register as either the destination or the source. This assembler requires you to append a B or W to the op code to distinguish between bytes and words, i.e. MOVSB to move a string of bytes, or MOVW [bx],8 to load 8 into the word at the address pointed to by BX.

Normally all data moves are assumed to be relative to the DS (data segment) register. This default can be overridden one instruction at a time by using the SEG op code on the line before the instruction to be overridden.

   Example
              SEG ES
              MOV AX,[BX]

This moves a word into the accumulator from the address in the extra segment offset by the BX register. This is a little used function, since ASSEMBLE assumes all of the segment registers are set to the same location, as is required at the start of a .COM program. Access to system resources should be done with BIOS or DOS calls when possible, rather than by going directly to a hardware memory location outside your program.

The opcode table lists the available opcodes and pseudo ops and the various addressing modes associated with each.
Please note that the mem/reg addressing mode includes several sub modes, such as base relative (using BX as an offset), stack relative (using BP as an offset) and indexed (using SI or DI). See THE 8086 BOOK for an explanation of each mode.

OPERANDS

The operands describe to the assembler the destination and source of the data to be operated on. The 8088 uses a number of addressing modes to determine where that data is and where it should go. As you will discover by looking at the op code table, not all modes can be used with every op code. The addressing modes are:

   Accumulator  - data is transferred to/from the accumulator.
   Displacement - the displacement value is added to the present IP.
   Immediate    - data is assembled into the instruction.
   Memory/Reg   - data is transferred to/from the address pointed to
                  by [mem] or [reg].
   Register     - data is transferred to/from the register.

Labels can be used in the Immediate, Displacement and Memory addressing modes. This assembler follows the Intel convention of treating a label operand as a reference to the value in a memory location unless it was defined by an EQU. If a value is added to that label, it is then treated as a reference to the location address. To use an offset from the label to obtain a value in a memory location, the label and offset must be enclosed in brackets, just as a numeric value would be.

   Operand         Meaning
   Label           reference the value in the memory location
   Offset Label    reference the memory location
   Label+5         reference the memory location at label+5.
                   This is the same as Offset Label+5.
   $-Label         reference to a memory location or a numeric value,
                   i.e. when defining a buffer length.
   [Label+5]       reference the value in memory location label+5
   [1234]          reference the value in memory location 1234

Addition, subtraction, multiplication and division are supported by the parser. Labels are treated as numbers when used in a math expression.
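
As a rough illustration of operand arithmetic, the following Python sketch evaluates an expression with labels looked up in a symbol table. It is not the assembler's parser. It assumes conventional precedence (multiplication and division before addition and subtraction), integer division, no parentheses, symbols taking priority over numeric literals, and the H and B number suffixes described under IMMEDIATE. The names evaluate and parse_number and the symbol dictionary are hypothetical.

```python
import re

# A token is a name/number (letters, digits, _ or $) or a single operator.
TOKEN = re.compile(r"\s*([A-Za-z0-9_$]+|[-+*/])")


def parse_number(text):
    """Decode a literal: trailing H = hex, trailing B = binary, else decimal.
    Returns None if the text is not a numeric literal."""
    t = text.upper()
    if t.endswith("H") and re.fullmatch(r"[0-9A-F]+", t[:-1]):
        return int(t[:-1], 16)
    if t.endswith("B") and re.fullmatch(r"[01]+", t[:-1]):
        return int(t[:-1], 2)
    if re.fullmatch(r"[0-9]+", t):
        return int(t)
    return None


def evaluate(expr, symbols, location=0x100):
    """Evaluate expr; symbols maps label -> value, location is the value of $."""
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != "".join(expr.split()):
        raise ValueError("bad character in expression: %r" % expr)
    table = {name.upper(): value for name, value in symbols.items()}
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ends unexpectedly")
        tok = tokens[pos]
        pos += 1
        if tok == "-":                 # unary minus
            return -atom()
        if tok == "$":                 # present memory location
            return location
        if tok.upper() in table:       # assumption: symbols win over literals
            return table[tok.upper()]
        value = parse_number(tok)
        if value is None:
            raise ValueError("undefined symbol: %s" % tok)
        return value

    def term():
        nonlocal pos
        value = atom()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = atom()
            value = value * rhs if op == "*" else value // rhs
        return value

    def expression():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unexpected token: %s" % tokens[pos])
    return result


if __name__ == "__main__":
    # Buffer length via $-Label: a buffer at 120H with $ currently at 140H.
    print(evaluate("$-Buffer", {"Buffer": 0x120}, location=0x140))
```

The $-Label case is the one the table above mentions for buffer lengths: with the label at 120H and the location counter at 140H, the expression yields a length of 32 bytes.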

ACCUMULATOR

The accumulators are AX, AL and AH, where AX is the 16 bit accumulator, AL is the lower 8 bits of AX and AH is the upper 8 bits.

DISPLACEMENT

A displacement value to be added to the instruction pointer (IP) is included as immediate data in the instruction. The assembler calculates the displacement from the location of the opcode and the location of the address in the operand. The address in the operand can be expressed as a number (binary, hex or decimal) but is most commonly expressed as a label.

   Example
   LABEL      MOV AX,[BX]
              CMP AX,10H
              JLE LABEL

In this example the assembler calculates a negative displacement to jump back to LABEL when the value in AX is less than or equal to 10H.

IMMEDIATE

All immediate data is assembled into the instruction code. This data can be represented in two ways. First, immediate data can be given in binary, decimal or hexadecimal format, in a signed range of -32768 to 32767 (8000H to 7FFFH) or, if the sign bit is not used, 0 to 65535 (0000H to FFFFH). As in these examples, an 'H' is appended to the number to indicate hexadecimal. Binary numbers are expressed as a series of up to 16 ones and zeros followed by a B, i.e. 11010B represents 26.

The other method of representing immediate data is with labels. The value of a label is the address at which the label was defined or the value assigned to the label in an EQU pseudo op.

   Example
   Label      equ 10
   Here       db 10
              MOV BX,Label          ;Load BX with the value of Label
              MOV BX,OFFSET(Here)   ;Load BX with the address of Here

MEMORY/REGISTER

This addressing mode is also called indirect addressing. The operand points to a memory location that contains the data, rather than the instruction containing the data as in the immediate addressing mode. The operand can be a memory location expressed as a label, a decimal number or a hexadecimal number, or it can be a memory location pointed to by a register.
The following indirect modes are allowed:

   MOV Reg,[BP]
   MOV [BX],Reg
   MOV [BX+SI],Reg     ;BX plus SI displacement equal location
   MOV Reg,[BX+DI]     ;BX plus DI      "         "      "
   MOV Reg,[BP+SI]     ;BP plus SI      "         "      "
   MOV [BP+DI],Reg     ;BP plus DI      "         "      "
   MOV [DI],Reg
   MOV Reg,[SI]
   MOV LABEL,Reg
   MOV [1234],Reg

Any of the general purpose registers can be used in place of Reg in these examples. Immediate data may also be substituted for a source register; the opcode must then be appended with W or B so the assembler knows whether you are pointing to a word or a byte address.

In addition to the above, when an indirect address using a register is chosen, a displacement may also be used.

   Example
              MOV Reg,10H[BP]       ;Source address equal BP+10H
              MOV Reg,-5[BP+SI]     ;Source address equal BP+SI-5
   DEMO       EQU FFH
              MOV DEMO[BX+DI],Reg   ;Destination address equal BX+DI+255

REGISTER

In this addressing mode the data is contained in, or is to be stored in, one of the 8088 registers. The registers are AX (AL+AH), BX (BL+BH), CX (CL+CH), DX (DL+DH), BP, DI, SI and the four segment registers CS, DS, ES and SS. All math operations use the accumulator (AX, AL or AH); MUL and DIV also use the DX register when 32 bit numbers are involved. The BX and BP registers can be used as base pointers in the Data and Stack segments respectively. The CX register can be used as an automatic counter by some instructions. As demonstrated in earlier examples, the SI and DI registers can be used as index registers. Any of the numerous assembly language books for the IBM PC or PCjr will give you an explanation of each of the processor registers and their uses.


PSEUDO OPCODES

Pseudo opcodes are assembler directives that control the generation of the object code. The available pseudo ops are DB, DS, DW, ENDP, EQU, ORG and PROC.

DB = define byte. Its operands are one or more bytes and/or a string ( DB 20H,'Demo' ). Strings are set off by single quotes. Numbers must be less than 256 and are expressed in binary, decimal or hexadecimal.

DS = define segment. It initializes a string of memory locations. The first operand defines the number of bytes to be initialized. The second operand, if included, defines the value the memory is to be initialized to. The default value is zero. ( DS 20,FFH ;initialize 20 bytes to 255 ).

DW = define word. Its operand(s) must be a number or a label. With DW the low order byte is stored first in memory, as this is the format used by the 8088 for integer storage (i.e. dw 1020H = db 20H,10H).

EQU = define the value of a label. All EQU label definitions must occur at the beginning of your program or errors may occur in the assembly process. The most common error message received from defining a label late in the program is 'PHASE ERROR'. A phase error indicates that the assembler generated an address for a label on the second pass that differs from the one generated on the first pass.

ORG = reset the location counter to a new origin. Since all .COM programs start at 100H, the default setting for ASSEMBLE.COM is 100H. However, you may need to start at 00H for a driver routine or a machine language routine for BASIC.

PROC and ENDP are used together to define a program or procedure as Near or Far. This information is used to determine the type of return to be generated when a RET is encountered. If no procedure is defined, a Near procedure is assumed. The syntax is:

              PROC NEAR     ;Proc must be followed by Near or Far
              ....
              ....
              ENDP

If PROC is used, an ENDP must be used.


ERROR MESSAGES

All error messages and diagnostics are printed immediately before the line in which the error occurs. The total number of error and diagnostic messages is displayed at the end of the source code printout, immediately before the symbol table dump.

I have made an attempt to make the error messages as user friendly as possible. The most cryptic of the error messages is the series you receive when there is a syntax error.
This message consists of the opcode and ASSEMBLE.COM's interpretation of the type of data included in the operands. For example, the message

   *** Syntax Error: MOV (16 bit immediate or 8 bit immediate), (none)

would appear immediately before a line of code containing the instruction MOV 45H. By reviewing the type of data and the allowable operands for each instruction you should be able to locate the error. In the instruction above, both a destination and a source operand are required, and if immediate data is used it must be the source operand.

'Phase Error' is most commonly caused by referencing an equate before defining it. I strongly recommend you only use the EQU pseudo op at the beginning of your source code. This practice should prevent this error and will make your source code easier to read.

'Error: EQU without symbol' is received when you use the equate pseudo op without a label.

'Error: EQU with forward reference' is received if you attempt to use a forward reference when equating a label.

'Error: ENDP without PROC'. You must specify where the procedure begins.

'Error: Missing ENDP'. You must specify where the procedure ends if PROC is used.

'Error: Procedures nested too deeply'. Only 10 levels of nesting are allowed.

'Error: Duplicate label'. See the section on labels.

'Error: Data too long' indicates use of a byte operand where the data is outside the range 0 to 255.

'Error: Too far for short jump' indicates a jump whose displacement is outside the range -128 to +127 bytes.

'Error: Undefined Symbol' plus the operand is displayed when no match is found in the symbol table. This is frequently caused by bad syntax in the operand.

'Error: Illegal or undefined argument for OFFSET' is similar to 'Undefined Symbol'.

Two diagnostic messages may be given. The first follows a syntax error and is 'Specify word or byte operand', given if the assembler could not determine which to use. The opcode must be corrected by appending a B or W to it. The assembler is not smart enough to give you this very often.
The other message is just a notice that you used a long jump where you could have used a short jump and saved a byte of object code.


SOURCE CODE

Turbo Pascal source code for this assembler is available for those who wish to customize it for their own needs (or those who would like to see what makes it tick). If you would like a copy of the source code, send a formatted disk and $10 to

   George Fulford
   RR 1 Box 163c
   Shellsburg Ia 52332

Although I have no intention of entering the software market at this time, I do plan to make corrections to this program as bugs are found. If you obtain a copy of the source code from me, it will be the most up to date version.


WARRANTY/GUARANTEE

There is NONE. This assembler runs on my PCjr, and since I used all standard Turbo Pascal it should run on any PC DOS machine. If it does not, I probably won't be able to help you. I have spent quite a bit of time debugging, but I am sure there are still a few bugs lurking in the code. I will attempt to stamp out any that are reported.


UPDATES

Changes made by version 1.01

1) A more sophisticated number parsing routine was added. An offset may now be added to a label. Multiplication and subtraction now occur in the correct order.
2) $ can now be used to indicate the present memory location.
3) Due to memory management problems with the PCjr, labels are no longer stored in a linked list that uses all of the available memory. Labels are stored in a predefined array of 400. This should be sufficient for all small programs. The 'Stack Overflow Error' is no longer applicable.
4) Label storage has been expanded to twelve characters to improve the readability of labels.
5) A larger input buffer has been allocated to reduce disk access time, and other minor changes were made in the code to improve performance.
6) Bugs in the DIV, SHR, SHL and ORB instructions have been corrected. DIVB and DIVW have been added to allow 8 or 16 bit divides using indirect addressing.
7) A bug in the line parsing routine that stopped the parsing at a semicolon enclosed in quotes has been corrected.
8) An error in interpreting [BP] has been corrected.


OP CODE TABLE

Addressing modes supported (b/w = must specify byte or word):

   A   = accumulator register (ax, ah, al)
   b/w = must add B or W to the opcode for this addressing mode
   D   = displacement (8 or 16 bit as required by the instruction)
   I   = immediate (byte for 8 bit registers, word for 16 bit registers)
   M/R = memory or register indirect addressing
   N   = none
   R   = register (bx, cx, dx, bp, si, di)
   S   = segment register (cs, ds, es, ss)

Op                               Operand types
dest.     N  |  A  |  A  |  R  |  R  | M/R |  R  | M/R |  I  |  I  |  D  | M/R
source    N  |  I  | M/R |  I  |  N  |  R  | M/R |  I  |  I  |  N  |  N  |  N

AAA       x  |     |     |     |     |     |     |     |     |     |     |
AAD       x  |     |     |     |     |     |     |     |     |     |     |
AAM       x  |     |     |     |     |     |     |     |     |     |     |
AAS       x  |     |     |     |     |     |     |     |     |     |     |
ADC          |  x  |     |  x  |     |  x  |  x  | b/w |     |     |     |
AND          |  x  |     |  x  |     |  x  |  x  | b/w |     |     |     |
CALL         |     |     |     |     |     |     |     |  x  |     |  x  |
CALLF        |     |     |     |     |     |     |     |     |     |     |  x
CALLN        |     |     |     |     |     |     |     |     |     |     |  x
CBW       x  |     |     |     |     |     |     |     |     |     |     |
CLC       x  |     |     |     |     |     |     |     |     |     |     |
CLD       x  |     |     |     |     |     |     |     |     |     |     |
CLI       x  |     |     |     |     |     |     |     |     |     |     |
CMC       x  |     |     |     |     |     |     |     |     |     |     |
CMP          |  x  |     |  x  |     |  x  |  x  | b/w |     |     |     |
CMPS     b/w |     |     |     |     |     |     |     |     |     |     |
CWD       x  |     |     |     |     |     |     |     |     |     |     |
DAA       x  |     |     |     |     |     |     |     |     |     |     |
DAS       x  |     |     |     |     |     |     |     |     |     |     |
DB           |     |     |     |     |     |     |     |  x  |  x  |     |
DEC          |     |     |     |  x  |     |     |     |     |     |     | b/w
DIV          |     |     |     |  x  |     |     |     |     |     |     | b/w
DS           |     |     |     |     |     |     |     |  x  |  x  |     |
DW           |     |     |     |     |     |     |     |  x  |  x  |     |
ENDP      x  |     |     |     |     |     |     |     |     |     |     |
EQU          |     |     |     |     |     |     |     |     |  x  |     | memory
HLT       x  |     |     |     |     |     |     |     |     |     |     |
IDIV         |     |  x  |     |     |     |     |     |     |     |     |
IMUL         |     |  x  |     |     |     |     |     |     |     |     |
IN           |  x  |note1|     |     |     |     |     |     |     |     |
INC          |     |     |     |  x  |     |     |     |     |     |     | b/w
INT       x  |     |     |     |     |     |     |     |     |  x  |     |
INTO      x  |     |     |     |     |     |     |     |     |     |     |
IRET      x  |     |     |     |     |     |     |     |     |     |     |
JA           |     |     |     |     |     |     |     |     |     |  x  |
JAE          |     |     |     |     |     |     |     |     |     |  x  |
JB           |     |     |     |     |     |     |     |     |     |  x  |
JBE          |     |     |     |     |     |     |     |     |     |  x  |
JCXZ         |     |     |     |     |     |     |     |     |     |  x  |
JE           |     |     |     |     |     |     |     |     |     |  x  |
JG           |     |     |     |     |     |     |     |     |     |  x  |
JGE          |     |     |     |     |     |     |     |     |     |  x  |
JL           |     |     |     |     |     |     |     |     |     |  x  |
JLE          |     |     |     |     |     |     |     |     |     |  x  |
JMP          |     |     |     |     |     |     |     |     |     |  x  |
JMPF         |     |     |     |     |     |     |     |     |     |  x  |
JMPN         |     |     |     |     |     |     |     |     |     |  x  |
JMPS         |     |     |     |     |     |     |     |     |     |  x  |
JNE          |     |     |     |     |     |     |     |     |     |  x  |
JNO          |     |     |     |     |     |     |     |     |     |  x  |
JNP          |     |     |     |     |     |     |     |     |     |  x  |
JNS          |     |     |     |     |     |     |     |     |     |  x  |
JNZ          |     |     |     |     |     |     |     |     |     |  x  |
JO           |     |     |     |     |     |     |     |     |     |  x  |
JP           |     |     |     |     |     |     |     |     |     |  x  |
JPE          |     |     |     |     |     |     |     |     |     |  x  |
JPO          |     |     |     |     |     |     |     |     |     |  x  |
JS           |     |     |     |     |     |     |     |     |     |  x  |
JZ           |     |     |     |     |     |     |     |     |     |  x  |
LAHF      x  |     |     |     |     |     |     |     |     |     |     |
LDS          |     |     |     |     |     |note2|     |     |     |     |
LEA          |     |     |     |     |     |note2|     |     |     |     |
LES          |     |     |     |     |     |note2|     |     |     |     |
LOCK      x  |     |     |     |     |     |     |     |     |     |     |
LODS     b/w |     |     |     |     |     |     |     |     |     |     |
LOOP         |     |     |     |     |     |     |     |     |     |  x  |
LOOPE        |     |     |     |     |     |     |     |     |     |  x  |
LOOPNE       |     |     |     |     |     |     |     |     |     |  x  |
LOOPNZ       |     |     |     |     |     |     |     |     |     |  x  |
LOOPZ        |     |     |     |     |     |     |     |     |     |  x  |
MOV     note3|     |  x  |     |     |  x  |  x  | b/w |     |     |     |
MOVS     b/w |     |     |     |     |     |     |     |     |     |     |
MUL          |     |  x  |     |     |     |     |     |     |     |     |
NEG          |     |     |     |  x  |     |     |     |     |     |     |
NOP       x  |     |     |     |     |     |     |     |     |     |     |
NOT          |     |     |     |  x  |     |     |     |     |     |     | b/w
OR           |  x  |     |  x  |     |  x  |  x  | b/w |     |     |     |
ORG          |     |     |     |     |     |     |     |     |  x  |     |
OUT          |     |note1|     |     |     |     |     |     |     |     |
POP          |     |     |     |  x  |     |     |     |     |     |     | x or seg
POPF      x  |     |     |     |     |     |     |     |     |     |     |
PROC    note4|     |     |     |     |     |     |     |     |     |     |
PUSH         |     |     |     |  x  |     |     |     |     |     |     | x or seg
PUSHF     x  |     |     |     |     |     |     |     |     |     |     |
RCL          |     |     |     |  x  |     |     |     |     |     |     | b/w
RCR          |     |     |     |  x  |     |     |     |     |     |     | b/w
REP       x  |     |     |     |     |     |     |     |     |     |     |
REPE      x  |     |     |     |     |     |     |     |     |     |     |
REPNE     x  |     |     |     |     |     |     |     |     |     |     |
REPNZ     x  |     |     |     |     |     |     |     |     |     |     |
REPZ      x  |     |     |     |     |     |     |     |     |     |     |
RET       x  |     |     |     |     |     |     |     |     |     |  x  |
ROL          |     |     |     |  x  |     |     |     |     |     |     | b/w
ROR          |     |     |     |  x  |     |     |     |     |     |     | b/w
SAHF      x  |     |     |     |     |     |     |     |     |     |     |
SAR          |     |     |     |  x  |     |     |     |     |     |     | b/w
SBB          |  x  |     |  x  |     |  x  |  x  | b/w |     |     |     |
SCAS     b/w |     |     |     |     |     |     |     |     |     |     |
SEG          |     |     |     |     |     |     |     |     |  x  |     |
SHL          |     |     |     |  x  |     |     | b/w |     |     |     |
SHR          |     |     |     |  x  |     |     | b/w |     |     |     |
STC       x  |     |     |     |     |     |     |     |     |     |     |
STD       x  |     |     |     |     |     |     |     |     |     |     |
STI       x  |     |     |     |     |     |     |     |     |     |     |
STOS     b/w |     |     |     |     |     |     |     |     |     |     |
SUB          |  x  |     |  x  |     |  x  |  x  | b/w |     |     |     |
TEST         |  x  |     |  x  |     |  x  |  x  | b/w |     |     |     |
WAIT      x  |     |     |     |     |     |     |     |     |     |     |
XCHG         |     |note5|     |     |  x  |  x  |     |     |     |     |
XLAT      x  |     |     |     |     |     |     |     |     |     |     |
XOR          |  x  |     |  x  |     |  x  |  x  | b/w |     |     |     |

note 1   IN/OUT supports DX<-accum (8 or 16) and port<-accum (8 or 16).
note 2   These instructions can use only a memory reference in the source operand.
note 3   MOV also supports mem<-accum, seg<-M/R and M/R<-seg (or CS).
note 4   Must specify near or far PROCedure.
note 5   The accumulator can be exchanged with any of the registers using the form XCHG AX,BX or XCHG BX.
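
As a closing illustration of the DISPLACEMENT section and the 'Too far for short jump' error, here is a minimal Python sketch of how a short jump displacement is computed on the 8086. It is not taken from the assembler's source. The function names are hypothetical, and the addresses in the example assume the program starts at 100H with MOV AX,[BX] assembling to 2 bytes and CMP AX,10H to 3 bytes.

```python
def short_displacement(instr_addr, target_addr, instr_len=2):
    """Signed displacement for a short (8 bit) jump.

    The processor adds the displacement to IP after the jump has been
    fetched, so it is measured from the address of the NEXT instruction
    (instr_addr + instr_len). Short jumps are 2 bytes long.
    """
    disp = target_addr - (instr_addr + instr_len)
    if not -128 <= disp <= 127:
        raise ValueError("Too far for short jump")
    return disp


def as_byte(disp):
    """Two's complement encoding of a signed 8 bit displacement."""
    return disp & 0xFF


if __name__ == "__main__":
    # Assumed layout: LABEL at 100H, CMP at 102H, JLE LABEL at 105H.
    disp = short_displacement(0x105, 0x100)
    print(disp, hex(as_byte(disp)))
```

Under those assumptions the backward jump in the DISPLACEMENT example needs a displacement of -7, stored as the byte F9H. A target 128 or more bytes past the end of the jump instruction would raise the error instead.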