- Third Party Assembler Interface and Linker
-
- Traditionally in Forth systems, a "Forth Assembler" has been
- included. Adding assembler components to a high-level language can
- produce dramatic improvements in performance and capability over
- high-level Forth alone. Unfortunately these assemblers are usually written in
- Forth, and have serious limitations. Often the syntax is markedly
- different from the expected syntax for the particular processor. It is
- usually difficult enough for most programmers to work in normal
- assembler syntax, without having to learn a new one.
-
- L.O.V.E. FORTH has been designed to use virtually any third
- party assembler, using standard assembler syntax. Whenever CODE ,
- ;CODE or ASM is encountered, Forth calls in the third party assembler
- to process the word, and links in the resulting object file, with a
- built-in linker. This means that not only can normal syntax be used in
- words created by the programmer, but that assembly language program
- sections from other sources can be included with little or no
- modification.
-
- The authors recommend the excellent assembler A86 by Isaacson,
- also available as shareware. The original L.O.V.E. FORTH RPN assembler
- is included with the system as source code, to be used if desired.
-
- Operation
-
- A small amount of set-up is required in order to configure the
- system. The authors have already included configuration files for
- A86, Microsoft's MASM and Borland's TASM (see Assembler Set-up below).
- For simple code words, like those supported by the old RPN assemblers,
- use is straightforward. For example, a word to make four copies of
- the top of stack:
-
- CODE DUP4 ; ( n -- n,n,n,n )
- pop ax
- push ax ; push some copies
- push ax
- push ax
- push ax
- next c;
-
- The operation NEXT above is a pre-defined macro.
-
- This facility has other powerful features, chiefly the use of
- declarations in the assembly code. Not only can machine code be
- assembled, but any other type of content, including threads, heads, and
- data. Words can be defined using PUBLIC and existing words can be
- referenced with EXTRN. These are all interpreted by the linker
- portion of this interface.
-
- Errors during assembly
-
- If the assembler fails to produce an object file, an error
- message is displayed, and compilation is aborted. The programmer must
- then examine the error or listing file mentioned in the error message
- in order to determine the problem. The file containing the code to
- assemble is usually called CODE-4TH.ASM, and the file with the errors
- is usually named CODE-4TH.ERR or CODE-4TH.LST.
-
- SEGMENT Declarations
-
- The linker supports several reserved segment and class names, for
- use in directing code into various segments. These are: 'CODE', 'THREADS',
- 'DATA', 'HEADS', and 'STACKS'. These reserved names can either be used
- as segment names (most common), or as class names. When used as segment
- names, any class name specified is ignored.
-
- The following segments are declared automatically for the
- programmer at the beginning of each assembly. The programmer need only
- switch between them (eg. HEADS SEGMENT is sufficient to switch to
- heads, without all the other parts of the declaration).
-
- code segment byte public 'CODE'
- code ends
- threads segment word public 'THREADS'
- threads ends
- data segment byte public 'DATA'
- data ends
- heads segment byte public 'HEADS'
- heads ends
- stacks segment byte public 'STACKS'
- stacks ends
-
- The code segment is the default, if no other is specified,
- allowing simple words to assemble with no declarations whatsoever.
- There is a statement CODE SEGMENT automatically inserted before the
- assembler statements, and the statements CODE ENDS and END after the
- end of the assembler word. The directive:
- ASSUME CS:CODE, DS:CODE, ES:CODE
- is also inserted, so no segment overrides will be inserted by the
- assembler, unless the programmer explicitly includes them.
-
- Origins
-
- When any segment is declared in an assembler, the origin is assumed
- to be 0. This is fine when all the code being dealt with is produced by
- the assembler and the programmer is in complete control. Here, however,
- the code must be loaded on top of an existing program, L.O.V.E. Forth.
- Therefore, the origins have been constructed to follow a slightly
- different pattern.
-
- When a reserved name is used for a segment name, the real segment
- origin is at 0000 in the L.O.V.E. Forth segment. The origin (if any) given
- by the programmer is incremented by HERE (or CS:HERE, TS:HERE, etc), prior
- to the code being loaded in. This ensures that there are no overwritten
- areas of memory. The alignment attribute is not meaningful for standard
- segments, since they already start on a byte, word, paragraph and page
- boundary.
-
- Should the programmer desire an origin of 0, in the segment
- being declared, a different name (unreserved) should be used. In this
- case, the linker looks to the class name for direction on where to load
- the code into memory. If the class name is not specified, the code is
- loaded into the CODE segment. The alignment type may be specified, if
- so desired. The combine type is ignored.
-
- SEGMENT Examples
-
- The most common declaration is:
- CODE SEGMENT
- which causes the code following it to be placed in the code
- segment. The origin coming in from the object file (normally
- 0 for the first code in that segment) is incremented by the
- dictionary pointer. Therefore the ORG is forced to be CS:HERE.
-
- Another more complex example is:
- MYTHREADS SEGMENT WORD PUBLIC 'THREADS'
- which causes the following code to be loaded into the thread
- segment. The origin is relative to the start of this declared
- segment.
-
- MYSEG SEGMENT
- Code/data in this segment has its own origin of 0.
- If grouped, however, it has an offset (<=64k) from the start of the
- group. It is placed in RAM in one of the standard
- segments (in this case the code segment).
-
- THREADS SEGMENT byte public 'code'
- The segment and class conflict - in this case the class is
- ignored.
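-
- The placement rules illustrated by these examples can be summarised in
- a short model. The following Python sketch is illustrative only and is
- not the linker's actual code. The function name place_segment, the
- case-insensitive comparison of names, and the rejection of unrecognised
- class names are assumptions; the text does not say how the linker
- handles those details.
-

```python
# Illustrative model of the segment placement rules described above.
# Not the linker's actual code: names and case handling are assumptions.

RESERVED = {"CODE", "THREADS", "DATA", "HEADS", "STACKS"}


def place_segment(seg_name, class_name=None):
    """Return (target, rebased) for one SEGMENT declaration.

    target  -- the L.O.V.E. Forth segment that receives the code
    rebased -- True when origins are incremented by HERE of that segment,
               False when the declared segment keeps its own origin of 0
    """
    name = seg_name.upper()
    if name in RESERVED:
        # Reserved segment name: any class name is ignored.
        return name, True
    if class_name is None:
        # Unreserved name without a class: loaded into the CODE segment.
        return "CODE", False
    target = class_name.strip("'").upper()
    if target not in RESERVED:
        # Behaviour for other class names is not documented (assumption).
        raise ValueError("unrecognised class name: " + class_name)
    # Unreserved name: the class name directs where the code is loaded.
    return target, False
```

- For instance, place_segment("THREADS", "'code'") reports the THREADS
- segment, matching the last example, where the conflicting class is
- ignored.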
-
- GROUP Declaration
-
- The programmer may declare any group that does not group different
- L.O.V.E. Forth segments together (these cannot be grouped because they
- are >=64k apart). A segment may be part of only one group.
-
- EXTRN declarations
-
- The address or value of existing Forth words may be referenced
- in the assembler code, using the EXTRN declaration. Since words in
- L.O.V.E. Forth have several parts, the address of each part may be
- obtained, by adding a special prefix to the name desired. The prefixes
- are sorted out by the linker.
-
- Prefix              Segment     Purpose
-                     Register
- CODE@ (no prefix)   CS          address of machine code
- THREADS@            DS          compilation address
- DATA@               ES          parameter field address
- HEADS@              n/a         name field address
- IMMEDIATE@          n/a         special - executes the following
-                                 word at link-time to obtain value
-
- For example:
-
- EXTRN CODE@COUNT:NEAR, DATA@TIB:BYTE, IMMEDIATE@HERE:ABS
-
- MOV BYTE PTR ES:DATA@TIB, 0DH ; install carriage return
- ADD AX,IMMEDIATE@HERE ; add HERE
- JMP CODE@COUNT ; exit via a forth word
-
- If the word appears without a prefix or if CODE@ is in front of
- the word, then the address of the related machine code is returned.
- This is the same as is returned with 'CODE . Similarly THREADS@
- returns the compilation address of the following word. The most useful
- prefix is perhaps DATA@ which returns the parameter field address, the
- address returned by a VARIABLE or other word created by CREATE. HEADS@
- returns the name field address. This is relative to the head segment,
- the actual value of which can be obtained from the label HSEG (see
- Frame Fixups below).
-
- The word IMMEDIATE@ can execute a word at link-time. This is
- typically a CONSTANT whose value is required, or a VARIABLE whose
- address is required in assembly code ( eg. IMMEDIATE@BL ). It can be
- any word that returns a single cell on the stack. If HERE or the other
- dictionary values are referenced, they return the values they had,
- prior to linking.
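-
- The way a prefixed name maps onto a part of a word can be modelled in
- a few lines. This Python sketch is only an illustration of the table
- above, not the linker's implementation; the function name
- split_external and the case-insensitive matching are assumptions.
-

```python
# Illustrative model of how a prefixed EXTRN name could be split.
# The function name and case-insensitive matching are assumptions.

PREFIXES = {
    "CODE@": "address of machine code",
    "THREADS@": "compilation address",
    "DATA@": "parameter field address",
    "HEADS@": "name field address",
    "IMMEDIATE@": "value obtained by executing the word at link-time",
}


def split_external(symbol):
    """Return (prefix, word, meaning) for an external symbol name."""
    upper = symbol.upper()
    for prefix, meaning in PREFIXES.items():
        # Assumption: at least one character must follow the prefix,
        # otherwise the whole symbol is taken as an unprefixed word.
        if upper.startswith(prefix) and len(symbol) > len(prefix):
            return prefix, symbol[len(prefix):], meaning
    # No prefix: treated the same as CODE@.
    return "CODE@", symbol, PREFIXES["CODE@"]
```

- With this model, DATA@TIB splits into the prefix DATA@ and the word
- TIB, while a plain COUNT is treated as CODE@COUNT.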
-
- If using MASM the programmer must pay particular attention to
- how the external references are declared. When using the reference as
- a memory pointer (eg. BYTE PTR ) the reference must be declared as
- :BYTE or :WORD (or other address declaration). A value used as an
- immediate type operand must be declared :ABS . If mis-declared, MASM
- ignores the addressing mode explicitly used in the instruction, in
- favour of what is implied in the EXTRN declaration. A reference can
- therefore not be used both as an immediate type operand and as a memory
- reference.
-
- If using A86, the programmer need not include the EXTRN
- directive, as any undefined symbols are automatically
- declared external. If the EXTRN directive is used, any type
- declaration (:NEAR, :WORD, :ABS, etc.) may be given; A86 handles all
- cases correctly.
-
- Forth Words with Illegal Characters
-
- When words contain characters that are illegal for the
- assembler, a prefix of %% may be used. This prefix is dealt with before
- assembly begins, and changes the name to one acceptable for the
- assembler. Illegal characters include: +-*/%^() and many more.
- The word prefixed by %% must however be terminated by a space, tab or
- end of line. For example:
- %%-TRAILING %%+! %%2DUP
-
- Complete example, a word which exits via */
-
- CODE 550_337_*/ ; ( scale n by this fraction to get m ( n -- m )
- extrn %%*/ :near ; reference to the word */
- mov ax,550
- push ax
- mov ax,337
- push ax
- jmp %%*/ c;
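-
- The termination rule explains the space before :near in the example
- above. The Python sketch below models only how the %%-prefixed names
- could be picked out of a source line, assuming the stated rule (a name
- ends at a space, tab or end of line). The function name escaped_names
- is hypothetical, and the replacement names the system actually
- generates are not modelled because they are not documented here.
-

```python
import re

# A %%-prefixed name runs up to the next space, tab or end of line.
ESCAPED = re.compile(r"%%([^ \t\r\n]+)")


def escaped_names(line):
    """Return the Forth names written with a %% prefix on one source line."""
    return ESCAPED.findall(line)
```

- Under this rule, writing %%*/:near without the space would make the
- name run on through :near, which is why the example separates them.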
-
- PUBLIC declarations
-
- Just as it is possible to reference Forth words from within
- assembler with EXTRN, it is also possible to create new words. This is
- done with the PUBLIC directive. This can be used to create
- multiple entry points in words, or simply to create address references
- available in high level code or other code definitions. The %% prefix
- described above, can be used to make names with assembler-illegal
- characters. Example:
-
- CODE QDROP ; ( q -- )
- POP AX ; yes, there are more efficient ways of coding
- POP AX ; this word
- DDROP: POP AX
- DROP: POP AX
- NEXT
-
- PUBLIC DDROP ; ( d -- )
- PUBLIC DROP ; ( n -- )
- c;
-
- As shown in the table below, PUBLIC declarations work
- differently, depending on which segment the label is declared in. Note
- that a reference to the data segment, effectively becomes a VARIABLE .
-
- code segment           A CODE word is created
- threads segment        The PUBLIC address is assumed to be
-                        the compilation address of a word
- other segment names    A CONSTANT is created with the value
-                        of the PUBLIC address
-
- A PUBLIC Caution about FORGET
-
- Words declared PUBLIC are CREATEd at link-time. Unfortunately
- most linkers do not provide PUBLIC declarations in any reasonable
- order. This means that a word declared later may refer to a word
- lower in memory. This conflicts with FORGET, which removes everything
- above the forgotten word. When using FORGET, be sure to forget all of
- the words PUBLICly CREATEd within one code word or ASM section.
-
- The Command ASM
-
- ASM is the best way to include a large body of assembly code
- in Forth. ASM simply begins a section of assembly language code.
- Unlike CODE , it does not CREATE a word; words that require access from
- high level Forth or other assembler words should be declared PUBLIC as
- described above. Many code words can thus be included in one section.
- Example:
-
- ASM
- code segment
- BIT: ; ( access a table of bits ( n -- bit )
- POP BX
- ADD BX,BX
- PUSH es: [BX+bittable]
- NEXT
- code ends
-
- data segment
- assume cs:data
- bittable: dw 1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192
- dw 16384,32768
- data ends
- PUBLIC BIT
- end c;
-
- Linking OBJect Files
-
- The linker is automatically started after assembling a code
- word with CODE ;CODE or ASM . It is also possible for the linker to
- operate on existing object files. The authors may also be delivering
- object file versions of utilities and upgrades in the future. The
- syntax for this command is LINK" followed by the path and file name
- of a Microsoft format OBJ file. For example:
- LINK" MATRIX.OBJ"
- links in the specified file.
-
- Assembler Set-up
-
- Three assemblers are currently supported directly: A86,
- Microsoft MASM (version 5 and version 6) and Borland TASM. In order to
- use one of these, its configuration file must be copied to the name
- ASSEMBLY.CFG. For example, to use A86 type:
- COPY LOVEA86.CFG ASSEMBLY.CFG
- For MASM, MASM 6 and TASM, the files are LOVEMASM.CFG, LOVEML6.CFG and
- LOVETASM.CFG respectively. MASM version 6 takes so much memory that
- the extended memory version must be used. This only works if you omit
- EMM386.
-
- If using another assembler, any of the above files can be
- modified according to what the assembler needs. Read the
- instructions in the CFG files (standard ASCII). The following
- information must be provided:
- command line
- input, output, listing, error files
- the macro definition for NEXT
- the segment declarations
-
- lines to precede the lines parsed from CODE or ;CODE
- lines to follow the lines from CODE or ;CODE
-
- When the assembly file is created, first the macro
- definition, then the segment declarations described above are inserted
- into the file, along with the name of the word being assembled (if
- applicable). If assembling the words CODE or ;CODE, the "lines to
- precede" from above are inserted, then the lines between
- CODE (;CODE) and C;. The file is terminated with the "lines to
- follow" from above. If the command ASM is used, the lines between ASM
- and C; are inserted following the segment declarations, and the file is
- terminated.
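-
- The order in which the assembly file is put together can be sketched
- as follows. This Python model is illustrative only: the function and
- parameter names are hypothetical, the real contents come from
- ASSEMBLY.CFG, and the insertion of the word's name is left out because
- its form is not described here.
-

```python
# Illustrative model of the order in which the assembly file is built.
# All names are hypothetical; the real contents come from ASSEMBLY.CFG.


def compose_asm_file(macro, segments, body, precede=None, follow=None,
                     asm_section=False):
    """Return the lines of the temporary assembly file, in order."""
    # The macro definition for NEXT comes first, then the segment
    # declarations.
    lines = list(macro) + list(segments)
    if asm_section:
        # ASM: the lines between ASM and C; follow the declarations.
        lines += list(body)
    else:
        # CODE / ;CODE: the body is wrapped in the configured lines.
        lines += list(precede or []) + list(body) + list(follow or [])
    return lines
```

- In the ASM case the "lines to precede" and "lines to follow" are not
- used, so the section itself supplies statements such as END.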
-
- Improving performance
-
- This method of assembly can be slow on any machine. The act of
- calling another program (the assembler) through DOS is time consuming,
- especially because of disk accesses. There are two ways to speed this up:
-
- 1. Use the ASM facility to group CODE words together. The
- words which would otherwise have been declared separately
- will all be declared at one time, using the PUBLIC
- declaration. The assembler is only invoked once per ASM
- section.
-
- 2. Create a small RAM disk to include the temporary files
- listed in ASSEMBLY.CFG (just change the drive and/or
- directory where these are stored). For most words a size of
- 30k should be more than enough. The assembler itself can
- also be copied to the RAM disk if it is big enough.
-
- Frame Fixups
-
- Frame fixups are not supported. This means that explicit references
- to segments are not allowed. Keep in mind that on entry to any code word
- the segment registers contain the usual segment values. In addition
- there are locations defined in the CS: (CODE segment) that contain the
- current addresses of the standard segments. (These are CONSTANTs).
-
- Address   contains segment value        also in register
- CSEG      CODE                          CS
- TSEG      THREADS                       DS
- VSEG      DATA                          ES
- SSEG      STACKS                        SS
- HSEG      HEADS                         n/a
- PSPSEG    DOS program segment prefix    n/a
-
- So access to these values is via the CS register, for example
- to load the VS value into DS:
-
- MOV DS, word ptr CS: IMMEDIATE@VSEG
-
- Why frame fixups are not supported
-
- In order to be used interactively, any frame numbers included
- in code would have to be resolved immediately on assembly. This is not
- a problem, the problems occur later. When an application is SAVEd and
- then re-executed at a later time, the location in memory where DOS
- loads the program is often different. Relocation is supported by
- DOS; the EXE file header can contain relocation items. However when
- the program is SAVEd, the segment memory images are concatenated and
- the result is saved in the EXE file. It is difficult to determine both
- where the fixup locations are, and where they are to point to, since on
- re-execution the image is expanded again. In addition before the image
- is to be saved, these references would have to be de-re-located. Not
- completely impossible, but difficult. Further difficulties ensue if
- the program is saved as a final APPLICATION, where the program is both
- saved and executed in its concatenated form.
-
- A version of L.O.V.E. Forth in preparation is able to perform
- frame fixups (the fixup information is stored as a field in each
- dictionary head). When saving an application with APPLICATION" these
- data are transferred to the .EXE header.