home *** CD-ROM | disk | FTP | other *** search
- CHAPTER 4 ELEMENTS OF THE A86 LANGUAGE
-
- This chapter begins the description of the A86 language. It's a
- bit more tutorial in nature than the rest of the manual. I'll
- start by describing the elementary building blocks of the
- language.
-
-
- The A86 Language and the A86 Program
-
- First, let's establish what we mean when we say A86. On one
- hand, A86 is the name for my assembly language for the Intel 86
- family of (IBM-PC and compatible) computers. Statements written
- in this language are used to specify machine instructions for the
- 86 family and to allocate memory space for program data. On the
- other hand, A86 is the name for a program called an assembler,
- that translates these human readable statements into a machine
- readable form. The input to the assembler is a source file (or a
- list of source files) containing assembly language statements.
- The output of the assembler is a file containing binary program
- code that can either be run as a program on the PC, or combined
- with other modules (using a linker) to make a program.
-
-
- General Categories of A86 Elements
-
- The statements in an A86 source file can be classified in three
- general categories: instruction statements, data allocation
- statements, and assembler directives. An instruction statement
- uses an easily remembered name (a mnemonic) and possibly one or
- more operands to specify a machine instruction to be generated. A
- data allocation statement reserves, and optionally initializes,
- memory space for program data. An assembler directive is a
- statement that gives special instructions to the assembler.
- Directives are unlike the instruction and data allocation
- statements in that they do not specify the actual contents of
- memory. Examples of the three types of A86 statements are given
- below. These are provided to give you a general idea of what the
- different kinds of statements look like.
-
- Instruction Statements
-
- MOV AX,BX
- CALL SORT_PROCEDURE
- ADD AL,7
-
- Data Allocation Statements
-
- A_VARIABLE DW 0
- DB 'HELLO'
-
- Assembler Directives
-
- CODE SEGMENT
- ITEM_COUNT EQU 5
- 4-2
-
- The statements in an A86 source file are made up of keywords,
- identifiers, numbers, strings, special characters, and comments.
- A keyword is a symbol that has special meaning to the assembler,
- such as an instruction mnemonic (MOV, CALL) or some other
- reserved word in the assembly language (DB, SEGMENT, EQU).
- Identifiers are programmer-defined symbols, used to represent
- such things as variables, labels in the code, and numerical
- constants. Identifiers may contain letters, numbers, and the
- characters _, @, $, and ?, but must begin with a letter, _, or @.
- The identifier name is considered unique up to 127 characters,
- but it can be of any length (up to 255 characters). Examples of
- identifiers are: COUNT, L1, and A_BYTE.
-
- Numbers in A86 may be expressed as decimal, hexadecimal, octal,
- or binary. These must begin with a decimal digit and, except in
- the case of a decimal or hexadecimal number, must end with "x"
- followed by a letter identifying the base of the number. A
- number without an identifying base is hexadecimal if the first
- digit is 0; decimal if the first digit is 1 through 9. Examples
- of A86 numbers are: 123 (decimal), 0ABC (hexadecimal), 1776xQ
- (octal), and 10100110xB (binary).
-
- Strings are characters enclosed in single quotes. Examples of
- strings are: '1st string' and 'SIGN-ON MESSAGE, V1.0'. The
- single quote is one of many special characters used in the
- assembly language. Others, run together in a list, are: ! $ ? ;
- : = , [ ] . + - ( ) * / > ". The space and tab characters are
- also special characters, used as separators in the assembly
- language.
-
- For compatibility with other assemblers, I now also accept double
- quotes for strings.
-
- A comment is a sequence of characters used for program
- documentation only; it is ignored by the assembler. Comments
- begin with a semicolon (;) and run to the end of the line on
- which they are started. Examples of lines with comments are
- shown below:
-
- ; This entire line is a comment.
- MOV AX,BX ; This is a comment next to an instruction statement.
-
- Alternatively, for compatibility with other assemblers, I provide
- the COMMENT directive. The next non-blank character after
- COMMENT is a delimiter to a comment that can run across many
- lines; all text is ignored, until a second instance of the
- delimiter is seen. For example,
-
- COMMENT 'This comment
- runs across two lines'
- 4-3
-
- I don't like COMMENT, because I think it's very dangerous. If,
- for example, you have two COMMENTs in your program, and you
- forget to close the first one, the assembler will happily ignore
- all source code between the comments. If that source code does
- not happen to contain any labels referenced elsewhere, the error
- may not be detected until your program blows up. For multiline
- comments, I urge you to simply start each line with a semicolon.
-
- Statements in the A86 are line oriented, which means that
- statements may not be broken across line boundaries. A86 source
- lines may be entered in a free form fashion; that is, without
- regard to the column orientation of the symbols and special
- characters.
-
- PLEASE NOTE: Because an A86 line is free formatted, there is no
- need for you to put the operands to your instructions in a
- separate column. You organize things into columns when you want
- to visually scan down the column; and you practically never scan
- operands separate from their opcodes. The only reason that 99%
- of the assembly-language programs out there in the world have
- operands in a separate column is that some IBM assembler written
- back in 1953 required it. It makes no sense to have operands in
- a separate column, so STOP DOING IT!
-
-
- Operand Typing and Code Generation
-
- A86 is a strongly typed assembly language. What this means is
- that operands to instructions (registers, variables, labels,
- constants) have a type attribute associated with them which tells
- the assembler something about them. For example, the operand 4
- has type number, which tells the assembler that it is a numerical
- constant, rather than a register or an address in the code or
- data. The following discussion explains the types associated
- with instruction operands and how this type information is used
- to generate particular machine opcodes from general purpose
- instruction mnemonics.
-
- Registers
-
- The 8086 has 8 general purpose word (two-byte) registers:
- AX,BX,CX,DX,SI,DI,BP, and SP. The first four of those registers
- are subdivided into 8 general purpose one-byte registers
- AH,AL,BH,BL,CH,CL,DH, and DL. There are also 4 16-bit segment
- registers CS,DS,ES, and SS, used for addressing memory; and the
- implicit instruction-pointer register (referred to as IP,
- although "IP" is not part of the A86 assembly language).
-
- Variables
-
- A variable is a unit of program data with a symbolic name,
- residing at a specific location in 8086 memory. A variable is
- given a type at the time it is defined, which indicates the
- number of bytes associated with its symbol. Variables defined
- with a DB statement are given type BYTE (one byte), and those
- defined with the DW statement are given type WORD (two bytes).
- Examples:
- 4-4
-
- BYTE_VAR DB 0 ; A byte variable.
- WORD_VAR DW 0 ; A word variable.
-
- Labels
-
- A label is a symbol referring to a location in the program code.
- It is defined as an identifier, followed by a colon (:), used to
- represent the location of a particular instruction or data
- structure. Such a label may be on a line by itself or it may
- immediately precede an instruction statement (on the same line).
- In the following example, LABEL_1 and LABEL_2 are both labels for
- the MOV AL,BL instruction.
-
- LABEL_1:
- LABEL_2: MOV AL,BL
-
- In the A86 assembly language, labels have a type identical to
- that of constants. Thus, the instruction MOV BX,LABEL_2 is
- accepted, and the code to move the immediate constant address of
- LABEL2 into BX, is generated.
-
- IMPORTANT: you must understand the distinction between a label
- and a variable, because you may generate a different instruction
- than you intended if you confuse them. For example, if you
- declare X: DW ?, the colon following the X means that X is a
- label; the instruction MOV SI,X moves the immediate constant
- address of X into the SI register. On the other hand, if you
- declare X DW ?, with no colon, then X is a word variable; the
- same instruction MOV SI,X now does something different: it loads
- the run-time value of the memory word X into the SI register.
-
- Constants
-
- A constant is a numerical value computed from an assembly-time
- expression. For example, 123 and 3 + 2 - 1 both represent
- constants. A constant differs from an a variable in that it
- specifies a pure number, known by the assembler before the
- program is run, rather than a number fetched from memory when the
- program is running.
-
-
- Generating Opcodes from General Purpose Mnemonics
-
- My A86 assembly language is modeled after Intel's ASM86 language,
- which uses general purpose mnemonics to represent classes of
- machine instructions rather than having a different mnemonic for
- each opcode. For example, the MOV mnemonic is used for all of
- the following: move byte register to byte register, load word
- register from memory, load byte register with constant, move word
- register to memory, move immediate value to word register, move
- immediate value to memory, etc. This feature saves you from
- having to distinguish "move" from "load," "move constant" from
- "move memory," "move byte" from "move word," etc.
- 4-5
-
- Because the same general purpose mnemonic can apply to several
- different machine opcodes, A86 uses the type information
- associated with an instruction's operands in determining the
- particular opcode to produce. The type information associated
- with instruction operands is also used to discover programmer
- errors, such as attempting to move a word register to a byte
- register.
-
- The examples that follow illustrate the use of operand types in
- generating machine opcodes and discovering programmer errors. In
- each of the examples, the MOV instruction produces a different
- 8086 opcode, or an error. The symbols used in the examples are
- assumed to be defined as follows: BVAR is a byte variable, WVAR
- is a word variable, and LAB is a label. As you examine these MOV
- instructions, notice that, in each case, the operand on the right
- is considered to be the source and the operand on the left is the
- destination. This is a general rule that applies to all two-
- operand instruction statements.
-
- MOV AX,BX ; (8B) Move word register to word register.
- MOV AX,BL ; ERROR: Type conflict (word,byte).
- MOV CX,5 ; (B9) Move constant to word register.
- MOV BVAR,AL ; (A0) Move AL register to byte in memory.
- MOV AL,WVAR ; ERROR: Type conflict (byte,word).
- MOV LAB,5 ; ERROR: Can't use label/constant as dest. to MOV.
- MOV WVAR,SI ; (89) Move word register to word in memory.
- MOV BL,1024 ; ERROR: Constant is too large to fit in a byte.
-
-