- CHAPTER 9 DIRECTIVES IN A86
-
-
- Segments in A86
-
- The following discussion applies when A86 is assembling a .COM
- file. See the next chapter for the discussion of segmentation
- for .OBJ files.
-
- A86 views the 86 computer's memory space as having two parts: The
- first part is the program, whose contents are the object bytes
- generated by A86 during its assembly of the source. A86 calls
- this area the CODE SEGMENT. The second part is the data area,
- whose contents are generated by the program after it starts
- running. A86 calls this area the DATA SEGMENT.
-
- Please note well that the only difference between the CODE and
- DATA segments is whether the contents are generated by the
- program or the assembler. The names CODE and DATA suggest that
- program code is placed in the CODE segment, and data structures
- go in the DATA segment. This is mostly true, but there are
- exceptions. For example, there are many data structures whose
- contents are determined by the assembler: pointer tables, arrays
- of pre-defined constants, etc. These tables are assembled in the
- CODE segment.
-
- In general, you will want to begin your program with the
- directive DATA SEGMENT, followed by an ORG statement giving the
- address of the start of your data area. You then list all your
- program variables and uninitialized data structures, using the
- directives DB, DW, and STRUC. A86 will allocate space starting
- at the address given in the ORG statement, but it will not
- generate any object bytes in that space. After your data segment
- declarations, you provide a CODE SEGMENT directive. If the
- program starts at any location other than the standard 0100, you
- supply an ORG giving the address of the start of your program. You
- follow this with the program itself, together with any
- assembler-generated data structures. A short program
- illustrating this suggested usage follows:
-
- DATA SEGMENT
- ORG 08000
- ANSWER_BYTE DB ?
- CALL_COUNT DW ?
-
- CODE SEGMENT
- JMP MAIN
-
- TRAN_TABLE:
- DB 16,3,56,23,0,9,12,7
-
- MAIN:
- MOV BX,TRAN_TABLE
- XLATB
- MOV ANSWER_BYTE,AL
- INC CALL_COUNT
- RET
-
- A86 allows you to intersperse CODE SEGMENTs and DATA SEGMENTs
- throughout your program; but in general it is best to put all
- your DATA SEGMENT declarations at the top of your program, to
- avoid problems with forward referencing.
-
-
- CODE ENDS and DATA ENDS Statements
-
- For compatibility with Intel/IBM assemblers, A86 provides the
- CODE ENDS and DATA ENDS statements. The CODE ENDS statement is
- ignored; we assume that you have not nested a CODE segment inside
- a DATA segment. The DATA ENDS statement is equivalent to a CODE
- SEGMENT statement.
-
-
-
- The ORG Directive
-
- Syntax: ORG address
-
- ORG moves the output pointer (the location counter at which
- assembly is currently taking place within the current segment) to
- the value of the operand, which should be an absolute constant,
- or an expression evaluating to an absolute,
- non-forward-referenced constant.
-
- ORG is most often used in a DATA segment, to control the location
- of the data area within the segment. For example, in programs
- that fit entirely into 64K, you provide an ORG directive as the
- first line within your DATA segment at the top of your program.
- The location given by the ORG is some location that you are sure
- will be beyond the end of your program. If you are sure that
- your program will not go beyond 8K (02000 hex), your program can
- look like this:
-
- DATA SEGMENT
- ORG 02000 ; data goes here, beyond the end of the program
-
- (your data segment variable and buffer declarations go here)
-
- DATA ENDS
-
- (your program goes here)
-
- There is a special side effect to ORG when it is used in the CODE
- segment. If you begin your code segment with ORG 0, then A86
- knows that you are not assembling a .COM program; but are instead
- assembling a code segment to be used in some other context
- (examples: programming a ROM, or assembling a procedure for older
- versions of Turbo Pascal). The output file will start at 0, not
- 0100 as in a .COM file; and the default extension for the output
- file will be .BIN, not .COM.
-
- Other than in the above example, you should not in general issue
- an ORG within the CODE segment that would lower the value of the
- output pointer. This is because you thereby put yourself in
- danger of losing part of your assembled program. If you
- re-assemble over space you have already assembled, you will
- clobber the previously-assembled code. Also, be aware that the
- size of the output program file is determined by the value of the
- code segment output pointer when the program stops. If you ORG
- to a lower value at the end of your program, the output program
- file will be truncated to the lower-value address.
-
- Again, almost no program producing a .COM file will need any ORG
- directive in the code segment. There is an implied ORG 0100 at
- the start of the program. You just start coding instructions,
- and the assembler will put them in the right place.
-
-
- The EVEN Directive
-
- Syntax: EVEN
-
- The EVEN directive coerces the current output pointer to an even
- value. In a DATA SEGMENT or STRUC, it does so by adding 1 to the
- pointer if the pointer was odd; doing nothing if the pointer was
- already even. In a code segment, it outputs a NOP if the pointer
- was odd. EVEN is most often used in data segments, before a
- sequence of DW directives. The 16-bit machines of the 86 family
- fetch words more quickly when they are aligned onto even
- addresses; so the EVEN directive ensures that your program will
- have the faster access to those DW's that follow it. (This speed
- improvement will not be seen on machines with an 8-bit data bus,
- most notably the 8088 of the original IBM-PC.)
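-
- As a rough illustration of the data-segment case, the fragment
- below places one byte at 02000, which leaves the pointer odd, and
- then uses EVEN so that the words that follow start at 02002. The
- names FLAG_BYTE and WORD_TABLE, and the ORG address, are
- assumptions made up for this sketch:
-
```asm
DATA SEGMENT
        ORG 02000               ; assumed start of the data area
FLAG_BYTE   DB ?                ; occupies 02000; the pointer is now 02001 (odd)
        EVEN                    ; pointer was odd, so 1 is added: now 02002
WORD_TABLE  DW 8 DUP ?          ; eight words starting at the even address 02002
```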
-
-
- Data Allocation Using DB, DW, DD, DQ, and DT
-
- The 86 computer family supports the three fundamental data types
- BYTE, WORD, and DWORD. A byte is eight bits, a word is 16 bits
- (2 bytes), and a doubleword is 32 bits (4 bytes). In addition,
- the 87 floating point processor manipulates 8-byte quantities,
- which we call Q-words, and 10-byte quantities, which we call
- T-bytes. The A86 data allocation statement is used to specify
- the bytes, words, doublewords, Q-words, and T-bytes which your
- program will use as data. The syntax for the data allocation
- statement is as follows:
-
- (optional var-name) DB (list of values)
- (optional var-name) DW (list of values)
- (optional var-name) DD (list of values)
- (optional var-name) DQ (list of values)
- (optional var-name) DT (list of values)
-
- The variable name, if present, causes that name to be entered
- into the symbol table as a memory variable with type BYTE (for
- DB), WORD (for DW), DWORD (for DD), QWORD (for DQ), or TBYTE (for
- DT). The variable name should NOT have a colon after it, unless
- you wish the name to be a label (instructions referring to it
- will interpret the label as the constant pointer to the memory
- location, not its contents).
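-
- The difference between a variable name and a label can be
- sketched as follows. The names LIMIT, TABLE, and MAIN are
- hypothetical, and the fragment is only meant to show how the two
- kinds of name are interpreted by the instructions that use them:
-
```asm
CODE SEGMENT
        JMP MAIN                ; skip over the assembled data

LIMIT   DW 1000                 ; no colon: LIMIT is a WORD memory variable
TABLE:                          ; colon: TABLE is a label (a constant address)
        DW 1,2,3

MAIN:
        MOV AX,LIMIT            ; loads the word stored at LIMIT (1000)
        MOV BX,TABLE            ; loads the address of TABLE, not its contents
        RET
```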
-
- The DB statement is used to reserve bytes of storage; DW is used
- to reserve words. The list of values to the right of the DB or
- DW serves two purposes. It specifies how many bytes or words are
- allocated by the statement, as well as what their initial values
- should be. The list of values may contain a single value or more
- than one, separated by commas. The list can even be missing;
- meaning that we wish to define a byte or word variable at the
- same location as the next variable.
-
- If the data initialization is in the DATA segment, the values
- given are ignored, except as place markers to reserve the
- appropriate number of units of storage. The use of "?", which in
- .COM mode is a synonym for zero, is recommended in this context
- to emphasize the lack of actual memory initialization. When A86
- is assembling .OBJ files, the ?-initialization will cause a break
- in the segment (unless ? is embedded in a nested DUP containing
- non-? terms, in which case it is a synonym for zero).
-
- A special value which can be used in data initializations is the
- DUP construct, which allows the allocation and/or initialization
- of blocks of data. The expression n DUP x is equivalent to a
- list with x repeated n times. "x" can be either a single value,
- a list of values, or another DUP construct nested inside the
- first one. The nested DUP construct needs to be surrounded by
- parentheses. All other assemblers, and earlier versions of A86,
- require parentheses around all right operands to DUP, even simple
- ones; but this requirement has been removed for simple operands
- in the current A86.
-
- Here are some examples of data initialization statements, with
- and without DUP constructs:
-
- CODE SEGMENT
- DW 5 ; allocate one word, init. to 5
- DB 0,3,0 ; allocate three bytes, init. to 0,3,0
- DB 5 DUP 0 ; equivalent to DB 0,0,0,0,0
- DW 2 DUP (0,4 DUP 7) ; equivalent to DW 0,7,7,7,7,0,7,7,7,7
-
- DATA SEGMENT
- XX DW ? ; define a word variable XX
- YYLOW DB ; no init value: YYLOW is low byte of word var YY
- YY DW ?
- X_ARRAY DB 100 DUP ? ; X_ARRAY is a 100-byte array
- D_REAL DQ ? ; double precision floating variable
- EX_REAL DT ? ; extended precision floating variable
-
- A character string value may be used to initialize consecutive
- bytes in a DB statement. Each character will be represented by
- its ASCII code. The characters are stored in the order that they
- appear in the string, with the first character assigned to the
- lowest-addressed byte. In the DB statement that follows, five
- bytes are initialized with the ASCII representation of the
- characters in the string 'HELLO':
-
- DB 'HELLO'
-
- Note that except for string comparisons described in the previous
- chapter, the DB directive is the only place in your program that
- strings of length greater than 2 may occur. In all other
- contexts (including DW), a string is treated as the constant
- number representing the ASCII value of the string; for example,
- CMP AL,'@' is the instruction comparing the AL register with the
- ASCII value of the at-sign. Note further that 2-character string
- constants, like all multi-byte constants on the 8086, are stored
- with their bytes reversed (low-order byte first). Thus, while
- DB 'AB' will produce hex 41 followed by hex 42, the similar
- looking DW 'AB' reverses the bytes: hex 42 followed by hex 41.
-
- For compatibility, A86 now accepts double quotes, as well as
- single quotes, for strings in DB directives.
-
-
- The DD directive is used to initialize 32-bit doubleword pointers
- to locations in arbitrary segments of the 86's memory space.
- Values for such pointers are given by two numbers separated by a
- colon. The segment register value appears to the left of the
- colon; and the offset appears to the right of the colon. In
- keeping with the reversed-bytes nature of memory storage in the
- 86 family, the offset comes first in memory. For example, the
- statement
-
- DD 01234:05678
-
- appearing in a CODE segment will cause the hex bytes 78 56 34 12
- to be generated, which is a long pointer to segment 01234, offset
- 05678.
-
- DD, DQ, and DT can also be used to initialize large integers and
- floating point numbers. Examples:
-
- DD 500000 ; half million, too big for most 86 instructions
- DD 3.5 ; single precision floating point number
- DQ 3.5 ; the same number in a double precision format
- DT 3.5 ; the same number in an extended precision format
-
- The STRUC Directive
-
- The STRUC directive is used to define a template of data to be
- addressed by one of the 8086's base and/or index registers. The
- syntax of STRUC is as follows:
-
- (optional strucname) STRUC (optional effective address)
-
- The optional structure name given at the beginning of the line
- can appear in subsequent expressions in the program, with the
- operator TYPE applied to it, to yield the number of bytes in the
- structure template.
-
- The STRUC directive causes the assembler to enter a mode similar
- to DATA SEGMENT: assembly within the structure declares symbols
- (the elements of the structure), using a location counter that
- starts out at the address following STRUC. If no address is
- given, assembly starts at location 0. An option not available to
- the DATA SEGMENT is that the address can include one base
- register [BX] or [BP] and/or one index register [SI] or [DI]. The
- registers are part of the implicit declaration of all structure
- elements, with the offset value increasing by the number of bytes
- allocated in each structure line. For example:
-
- LINE STRUC [BP] ; the template starts at [BP]
- DB 80 DUP (?) ; these 80 bytes advance us to [BP+80]
- LSIZE DB ? ; this 1 byte advances us to [BP+81]
- LPROT DB ?
- ENDS
-
- The STRUC just given defines the variables LSIZE, equivalent to
- B[BP+80], and LPROT, equivalent to B[BP+81]. You can now issue
- instructions such as MOV AL,LSIZE; which automatically generates
- the correct indexing for you.
-
- The mode entered by STRUC is terminated by the ENDS directive,
- which returns the assembler to whatever segment (CODE or DATA) it
- was in before the STRUC, with the location counter restored to
- its value within that segment before the STRUC was declared.
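-
- As a sketch of how the template and the TYPE operator might be
- used together, the fragment below repeats the LINE structure from
- the example above and then reserves one such block on the stack.
- Using the stack for the block is an assumption of this sketch,
- not something the directive requires. TYPE LINE evaluates to 82
- here (80 + 1 + 1 bytes):
-
```asm
LINE STRUC [BP]                 ; same template as in the example above
        DB 80 DUP (?)
LSIZE   DB ?
LPROT   DB ?
ENDS

        SUB SP,TYPE LINE        ; reserve 82 bytes, the size of the template
        MOV BP,SP               ; BP now addresses the reserved block
        MOV LSIZE,0             ; byte store to [BP+80]
        MOV AL,LPROT            ; byte load from [BP+81]
        ADD SP,TYPE LINE        ; release the block again
```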
-
-
-
- Forward References
-
- A86 allows names for a variety of program elements to be forward
- referenced. This means that you may use a symbol in one
- statement and define it later with another statement. For
- example:
-
- JNZ TARGET
- .
- .
- TARGET:
- ADD AX,10
-
- In this example, a conditional jump is made to TARGET, a label
- farther down in the code. When JNZ TARGET is seen, TARGET is
- undefined, so this is a forward reference.
-
- Earlier versions of A86 were much more restricted in the kinds of
- forward references allowed. Most of the restrictions have now
- been eased, for convenience as well as compatibility with other
- assemblers. In particular, you may now make forward references
- to variable names. You just need to see to it that A86 has
- enough information about the type of the operand to generate the
- correct instruction. For example, MOV FOO,AL will cause A86 to
- correctly deduce that FOO is a byte variable. You can even code
- a subsequent MOV FOO,1 and A86 will remember that FOO was assumed
- to be a byte variable. But if you code MOV FOO,1 first, A86
- won't know whether to issue a byte or a word MOV instruction; and
- will thus issue an error message. You then specify the type by
- MOV FOO B,1.
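-
- The cases just described can be sketched in one fragment. The
- data declarations are deliberately placed after the code so that
- every use is a forward reference; BAR and the ORG address are
- assumptions added for illustration:
-
```asm
CODE SEGMENT
        MOV FOO,AL              ; AL is a byte register, so FOO is taken as a byte
        MOV FOO,1               ; accepted: A86 remembers the byte assumption
        MOV BAR W,1             ; nothing implies a size here, so W supplies it
        RET

DATA SEGMENT
        ORG 02000               ; assumed data area beyond the end of the program
FOO     DB ?
BAR     DW ?
```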
-
- In general, A86's compatibility with That Other assembler has
- improved dramatically for forward references. Now, for most
- programs, you need only sprinkle a very few B's and W's into your
- references. And you'll be rewarded: in many cases the word form
- is longer than the byte form, so that the other assembler winds
- up inserting a wasted NOP in your program. You'll wind up with
- tighter code by using A86!
-
-
- Forward References in Expressions
-
- A86 now allows you to add or subtract a constant number from a
- forward reference symbol; and to append indexing registers to a
- forward reference symbol. This covers a vast majority of
- expressions formerly disallowed. For the remaining, more
- complicated expressions, there is a trick you can use to work
- your way around almost any case where you might run into a
- forward reference restriction. The trick is to move the
- expression evaluation down in your program so that it no longer
- contains a forward reference; and forward reference the
- evaluation answer. For example, suppose you wish to advance the
- ES segment register to point immediately beyond your program. If
- PROG_SIZE is the number of bytes in your program, then you add
- (PROG_SIZE+15)/16 to the program's segment register value. This
- value is known at assembly time; but it isn't known until the end
- of the program. You do the following:
-
- MOV AX,CS ; fetch the program's segment value
- ADD AX,SEG_SIZE ; use a simple forward reference
- MOV ES,AX ; ES is now loaded as desired
-
- Then at the end of the program you evaluate the expression:
-
- PROG_SIZE EQU $
- SEG_SIZE EQU (PROG_SIZE+15)/16
-
- The EQU Directive
-
- Syntax: symbol-name EQU expression
- symbol-name EQU built-in-symbol
- symbol-name EQU INT n
-
- The expression field may specify an operand of any type that
- could appear as an operand to an instruction.
-
- As a simple example, suppose you are writing a program that
- manipulates a table containing 100 names and that you want to
- refer to the maximum number of names throughout the source file.
- You can, of course, use the number 100 to refer to this maximum
- each time, as in MOV CX,100, but this approach suffers from two
- weaknesses. First of all, 100 can mean a lot of things; in the
- absence of comments, it is not obvious that a particular use of
- 100 refers to the maximum number of names. Secondly, if you
- extend the table to allow 200 names, you will have to locate each
- 100 and change it to a 200. Suppose, instead, that you define a
- symbol to represent the maximum number of names with the
- following statement:
-
- MAX_NAMES EQU 100
-
- Now when you use the symbol MAX_NAMES instead of the number 100
- (for example, MOV CX,MAX_NAMES), it will be obvious that you are
- referring to the maximum number of names in the table. Also, if
- you decide to extend the table, you need only change the 100 in
- the EQU directive to a 200 and every reference to MAX_NAMES will
- reflect the change.
-
- You could also take advantage of A86's strong typing, by changing
- MAX_NAMES to a variable:
-
- MAX_NAMES DW ?
-
- or even an indexed quantity:
-
- MAX_NAMES EQU [BX+1]
-
- Because the A86 language is strongly typed, the instruction for
- loading MAX_NAMES into the CX register remains exactly the same
- in all cases: simply MOV CX,MAX_NAMES.
-
- Equates to Built-In Symbols
-
- A86 allows you to define synonyms for any of the assembler
- reserved symbols, by EQUating an alternate name of your choosing,
- to that symbol. For example, suppose you were coding a source
- module that is to be incorporated into several different
- programs. In some programs, a certain variable will exist in the
- code segment. In others, it will exist in the stack segment. You
- want to address the variable in the common source module, but you
- don't know which segment override to use. The solution is to
- declare a synonym, QS, for the segment register. QS will be
- defined by each program: the code-segment program will have a QS
- EQU CS at the top of it; the stack-segment program will have QS
- EQU SS. The source module can use QS as an override, just as if
- it were CS or SS. The code would be, for example, QS MOV
- AL,VARNAME.
-
-
- The NIL Prefix
-
- A86 provides a mnemonic, NIL, that generates no code. NIL can be
- used as a prefix to another instruction (in which case it has no
- effect on that instruction), or it can appear by itself on a
- line. NIL is provided to extend the example in the previous
- section, to cover the possibility of no overrides. If your
- source module goes into a program that fits into 64K, so that all
- the segment registers have the same value, then code QS EQU NIL
- at the top of that program.
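-
- Putting this section and the previous one together, the
- arrangement might look like the sketch below. Exactly one of the
- three equates would appear in any given program; the other two
- are shown as comments:
-
```asm
QS EQU SS                       ; this program keeps VARNAME in the stack segment
; QS EQU CS                     ; alternative: VARNAME lives in the code segment
; QS EQU NIL                    ; alternative: all segment registers are equal

        QS MOV AL,VARNAME       ; shared source line; the override follows QS
```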
-
-
- Interrupt Equates
-
- A86 allows you to equate your own name to an INT instruction with
- a specific interrupt number. For example, if you place TRAP EQU
- INT 3 at the top of your program, you can use the name TRAP as a
- synonym for INT 3 (the debugger trap on the 8086).
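-
- A minimal sketch of such an equate in use follows. The label
- CHECKPOINT is hypothetical:
-
```asm
TRAP EQU INT 3                  ; TRAP is now a synonym for INT 3

CHECKPOINT:                     ; hypothetical label
        TRAP                    ; assembles exactly as INT 3 would
        RET
```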
-
-
- Duplicate Definitions
-
- A86 contains the unique feature of duplicate definitions. We
- have already discussed local symbols, which can be redefined to
- different values without restriction. Local symbols are the only
- symbols that can be redefined. However, any symbol can be
- defined more than once, as long as the symbol is defined to be
- the same value and type in each definition.
-
- This feature has two uses. First, it eases modular program
- development. For example, if two independently-developed source
- files both use the symbol ESC to stand for the ASCII code for
- ESCAPE, they can both contain the declaration ESC EQU 01B, with
- no problems if they are combined into the same program.
-
- The second use for this feature is assertion checking. Your
- deliberate redeclaration of a symbol name is an assertion that
- the value of the symbol has not changed; and you want the
- assembler to issue you an error message if it has changed.
- Example: suppose you have declared a table of options in your
- DATA segment; and you have another table of initial values for
- those options in your CODE segment. If you come back months
- later and add an option to your tables, you want to be reminded
- to update both tables in the same way. You should declare your
- tables as follows:
-
- DATA SEGMENT
- OPTIONS:
- .
- .
- OPT_COUNT EQU $-OPTIONS ; OPT_COUNT is the size of the table
-
- CODE SEGMENT
- OPT_INITS:
- .
- .
- OPT_COUNT EQU $-OPT_INITS ; second OPT_COUNT had better be the same!
-
-
-
- The = Directive
-
- Syntax: symbol-name = expression
- symbol-name = built-in-symbol
- symbol-name = INT n
-
- The equals sign directive is provided for compatibility with That
- Other assembler. It is identical to the EQU directive, with one
- exception: if the first time a symbol appears in a program is in
- an = directive, that symbol will be taken as a local symbol. It
- can be redefined to other values, just like the generic local
- symbols (letter followed by digits) that A86 supports. (If you
- try to redefine an EQU symbol to a different value, you get an
- error message.) The = facility is most often used to define
- "assembler variables", that change value as the assembly
- progresses.
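-
- The contrast between = and EQU can be sketched as follows. The
- names CUR_STEP and MAX_ITEMS are made up for this illustration:
-
```asm
CUR_STEP = 2                    ; first appearance is in an = directive: local symbol
        DB CUR_STEP             ; assembles the byte 2
CUR_STEP = 4                    ; redefining to a different value is allowed
        DB CUR_STEP             ; assembles the byte 4

MAX_ITEMS EQU 10
MAX_ITEMS EQU 10                ; same value and type: an accepted duplicate
; MAX_ITEMS EQU 20              ; a different value here would draw an error
```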
-
-
- The PROC Directive
-
- Syntax: name PROC NEAR
- name PROC FAR
- name PROC
-
- PROC is a directive provided for compatibility with Intel/IBM
- assemblers. I don't like PROC; and I recommend that you do not
- use it, even if you are programming for those assemblers.
-
- The idea behind PROC is to give the assembler a mechanism whereby
- it can decide for you what kind of RET instruction you should be
- providing. If you specify NEAR in your PROC directive, then the
- assembler will generate a near (same segment) return when it sees
- RET. If you specify FAR in your PROC directive, the assembler
- will generate a far RETF return (which will cause both IP and CS
- to be popped from the stack). If you simply leave well enough
- alone, and never code a PROC in your program, then RET will mean
- near return throughout your program.
-
- The reason I don't like PROC is that it is yet another attempt
- by the assembler to do things "behind your back". This goes
- against the reason why you are programming in assembly language
- in the first place, which is to have complete control over the
- code generated by your source program. It leads to nothing but
- trouble and confusion.
-
- Another problem with PROC is its verbosity. It replaces a simple
- colon, given right after the label it defines. This creates a
- visual clutter in the program, that makes the program harder to
- read.
-
- A86 provides an explicit RETF mnemonic so that you don't need to
- use PROC to distinguish between near and far return instructions.
- You can use RET for a near return and RETF for a far return. Even
- if you are programming in that other assembler, and you need to
- code a far return, I recommend that you create a RETF macro (it
- would have the single line DB 0CBH), and stay away from PROCs
- entirely.
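-
- Written without PROC, a far routine and a near routine might look
- like the sketch below. The label names are hypothetical, and the
- INC AX instructions merely stand in for real procedure bodies:
-
```asm
FAR_ENTRY:                      ; reached by a far CALL from another segment
        INC AX
        RETF                    ; far return: pops both IP and CS

NEAR_HELPER:                    ; reached by a near CALL within this segment
        INC AX
        RET                     ; near return: pops IP only
```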
-
-
- The ENDP Directive
-
- Syntax: [name] ENDP
-
- The only action A86 takes when it sees an ENDP directive is to
- return the assembler to its (sane) default state, in which RET is
- a near return.
-
- NOTE that this means that A86 does not support nested PROCs, in
- which anything but the innermost PROC has the FAR attribute. I'm
- sorry if I am blunt, but anybody who would subject their program
- to that level of syntactic clutter has rocks in their head.
-
-
- The LABEL Directive
-
- Syntax: name LABEL NEAR
- name LABEL FAR
- name LABEL BYTE
- name LABEL WORD
-
- LABEL is another directive provided for compatibility with
- Intel/IBM assemblers. A86 provides less verbose ways of
- specifying all the above LABEL forms, except for LABEL FAR.
-
- LABEL defines "name" to have the type given, and a value equal to
- the current output pointer. Thus, LABEL NEAR is synonymous with
- a simple colon following the name; and LABEL BYTE and LABEL WORD
- are synonymous with DB and DW, respectively, with no operands.
-
- LABEL FAR does have a unique functionality, not found in other
- assemblers. It identifies "name" as a procedure that can be
- called from outside this program's code segment. Such procedures
- should have RETFs instead of RETs. Furthermore, I have provided
- the following feature, unique to A86: if you CALL the procedure
- from within your program, A86 will generate a PUSH CS instruction
- followed by a NEAR call to the procedure. Other assemblers will
- generate a FAR call, having the same functional effect; but the
- FAR call consumes more program space, and takes more time to
- execute.
-
- WARNING: you cannot use the above CALL feature as a forward
- reference; the LABEL FAR definition must precede any CALLs to it.
- This is unavoidable, since the assembler must assume that a CALL
- to an undefined symbol takes 3 program bytes. All assemblers
- will issue an error in this situation.
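-
- A sketch of the arrangement described above follows. The names
- SERVICE and MAIN are hypothetical. Note that the LABEL FAR
- definition comes before the CALL, as the warning requires:
-
```asm
CODE SEGMENT
        JMP MAIN                ; skip over the procedure

SERVICE LABEL FAR               ; callable from outside this code segment
        INC AX                  ; stand-in for the procedure body
        RETF                    ; callers supply both CS and IP

MAIN:
        CALL SERVICE            ; per the text: PUSH CS, then a near CALL
        RET
```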
-
-