- A68k - a freely distributable assembler for the Amiga
-
- by Charlie Gibbs
-
- with special thanks to
- Brian R. Anderson and Jeff Lydiatt
-
- (Version 1.2 - July 11, 1988)
-
- Note: This program is NOT Public Domain. Permission is given
- to freely distribute this program provided no fee is charged, and this
- documentation file is included with the program.
-
- This assembler is based on Brian R. Anderson's 68000 cross-
- assembler published in Dr. Dobb's Journal, April through June 1986.
- I have converted it to produce AmigaDOS-format object modules, and
- have made many enhancements, such as macros and include files.
-
- My first step was to convert the original Modula-2 code into C.
- I did this for two reasons. First, I had access to a C compiler, but
- not a Modula-2 compiler. Second, I like C better anyway.
-
- The code generator (GetObjectCode and MergeModes)
- is essentially the same as in the original article, aside from its
- translation into C. I have almost completely rewritten the remainder
- of the code, however, in order to remove restrictions, add enhancements,
- and adapt it to the AmigaDOS environment. Since the only reference book
- available to me was the AmigaDOS Developer's Manual (Bantam, February
- 1986), the assembler and the remainder of this document work in terms
- of that book.
-
-
- RESTRICTIONS
-
- Let's get these out of the way first. There are a few things that I
- have not yet implemented, and some outright bugs that would take too long
- to correct for this version.
-
- o The verification file (-v) option is not supported. Diagnostic
- messages always appear on the console. They also appear in the
- listing file, however (see extensions below). You can produce
- an error file by redirecting console output to a file - the
- line number counter and final summary are displayed on stderr
- so you can still see what's happening.
-
- o The directory names in the include directory list (-i) must be separated
- by commas. The list may not be enclosed in quotes.
-
- o Labels assigned by EQUR and REG directives are case-sensitive.
-
- o The following directives are not supported, and will be flagged as
- invalid op-codes:
-
- RORG
- OFFSET
- NOPAGE
- LLEN
- PLEN
- NOOBJ
- FAIL
- FORMAT
- NOFORMAT
- MASK2
-
- I feel that NOPAGE, LLEN, and PLEN should not be defined within a
- source module. It doesn't make sense to me to have to change your
- program just because you want to print your listings on different
- paper. The command-line option "-p" (see below) can be used as a
- replacement for PLEN.
-
-
- EXTENSIONS
-
- Now for the good stuff:
-
- o Labels can be any length that will fit onto one source line
- (currently 127 bytes maximum). Since labels are stored on the
- heap, the number of labels that can be processed is limited only
- by available memory, which can be increased by using the "-w"
- option (see below).
-
- o Since section data and user macro definitions are stored on the
- same heap as the symbol table (see above), they too are limited
- only by available memory. (Actually, there is a hard-coded limit
- of 32767 sections, but I doubt anyone will run into that one.)
-
- o The only names a label cannot take are the register names - the
- assembler can distinguish between the same name used as a label,
- instruction name or directive, macro name, or section name.
-
- o Section and user macro names appear in the symbol table dump, and
- will also be cross-referenced. Their names can be the same as any
- label (see above); the assembler can sort them out.
-
- o Includes and macro calls can be nested indefinitely, limited only
- by available memory. The message "Secondary heap overflow -
- assembly terminated" will be displayed if memory is exhausted.
- You can increase the size of this heap using the -w parameter
- (see below). Recursive macros are supported; recursive includes
- will, of course, result in a loop that will be broken only when
- the heap overflows.
-
- o The EVEN directive forces alignment on a word (2-byte) boundary.
- It does the same thing as CNOP 0,2.
- (This one is left over from the original code.)
-
- o Branch (Bcc) instructions to a previously-defined label will be
- automatically converted to short form if possible. This feature is
- not available for forward branches, since in pass 1 the assembler
- doesn't yet know how far the branch must go.
-
- o Backward references to labels within the current CODE section
- will be converted to PC relative addressing with displacement
- if this mode is legal for the instruction.
-
- o If a MOVEM instruction only specifies one register, it is converted
- to the corresponding MOVE instruction. Instructions such as
- MOVEM D0-D0,label will not be converted, however.
-
- o ADD, SUB, and MOVE instructions will be converted to ADDQ, SUBQ,
- and MOVEQ respectively if possible. Instructions coded explicitly
- as (for example) ADDA or ADDI will not be converted.
-
- o ADD, CMP, SUB, and MOVE to an address register are converted to
- ADDA, CMPA, SUBA, and MOVEA respectively, except if an ADD, SUB,
- or MOVE instruction has already been converted to quick form.
-
- o ADD, AND, CMP, EOR, OR, and SUB of an immediate value are converted
- to ADDI, ANDI, CMPI, EORI, ORI, and SUBI respectively (unless the
- address register or quick conversion above has already been done).
-
- o If both operands of a CMP instruction are postincrement mode, the
- instruction is converted to CMPM.
-
- o Operands of the form 0(An) will be treated as (An).
-
- o The SECTION directive allows a third parameter. This can be
- specified as either CHIP or FAST (upper- or lower-case). If this
- parameter is present, the hunk will be written with the MEMF_CHIP
- or MEMF_FAST bit set. This allows you to produce "pre-ATOMized"
- object modules.
-
- o The synonyms DATA and BSS are accepted for SECTION directives
- starting data or BSS hunks. The CHIP and FAST options mentioned
- above can also be used, e.g. BSS name,CHIP.
-
- o The following synonyms have been implemented for compatibility
- with the Aztec assembler:
- CSEG is treated the same as CODE or SECTION name,CODE
- DSEG is treated the same as DATA or SECTION name,DATA
-
- o The ability to produce Motorola S-records is retained from the
- original code. The -s option causes the assembler to produce
- S-format instead of AmigaDOS format. Relocatable code cannot be
- produced in this format.
-
- o Error messages consist of three parts.
- The position of the offending line is given as a line number
- within the current module. If the line is within a macro expan-
- sion or INCLUDE file, the position of the macro call or INCLUDE
- statement in the outer module is given as well. This process
- is repeated until the outermost source module is reached.
- Next, the offending source line itself is listed.
- Finally, the errors for that line are displayed. A flag
- (^) is placed under the column where the error was detected.
-
- o Named local labels are supported. These work the same as the
- local labels supported by the Metacomco assembler (nnn$) but
- can be formed in the same manner as normal labels, except that
- they must be preceded by a backslash.
-
- o The following synonyms have been implemented for compatibility
- with the Assempro assembler:
- ENDIF is treated the same as ENDC
- = is treated the same as EQU
- | is treated the same as ! (logical OR)
-
- o Quotation marks (") can be used as string delimiters
- as well as apostrophes ('). Any given string must begin
- and end with the same delimiter. This allows such statements
- as the following:
- MOVEQ '"',D0
- DC.B "This is Charlie's assembler."
- Note that you can still include the delimiting character itself
- within a string if you double it, e.g.
- MOVEQ """",D0
- DC.B 'This is Charlie''s assembler.'
-
- o If any errors are found in the assembly, the object code file
- will be scratched, unless you specified the -k (keep) flag
- on the command line.
-
- o The symbol .A68K (note upper case) is automatically defined
- as a SET symbol having an absolute value of 1. This enables
- a source program to determine whether it is being assembled
- on this assembler.
-
- o A zeroth positional macro parameter (\0) is supported. It
- is replaced by the size suffix of the macro call (B, W, or L,
- defaulting to W). For instance, given the macro:
- moov MACRO
- move.\0 \1,\2
- ENDM
- the macro call
- moov.l d0,d1
- would be expanded as
- move.l d0,d1
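- The \0 substitution can be pictured with the following C sketch. It is
- not A68k's own code: the function name expand_macro_line is hypothetical,
- and it assumes that \1 through \9 are replaced by the call's arguments
- (an empty string if an argument is missing) and that the size suffix is
- copied exactly as written on the call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: expand \0..\9 in one line of macro text.
 * \0 becomes the size suffix of the call ('W' if the call had none);
 * \n (1..9) becomes the nth argument, or nothing if it was not supplied. */
static void expand_macro_line(const char *text, char size,
                              const char *const args[], int nargs,
                              char *out, size_t outsize)
{
    size_t len = 0;
    assert(out != NULL && outsize > 0);
    if (size == '\0')
        size = 'W';                        /* default size is W */
    for (const char *p = text; *p != '\0'; p++) {
        const char *sub;
        char one[2] = {0, 0};
        if (p[0] == '\\' && p[1] >= '0' && p[1] <= '9') {
            int n = p[1] - '0';
            if (n == 0) {
                one[0] = size;
                sub = one;
            } else {
                sub = (n <= nargs) ? args[n - 1] : "";
            }
            p++;                           /* skip the digit as well */
        } else {
            one[0] = *p;                   /* ordinary character */
            sub = one;
        }
        for (; *sub != '\0' && len + 1 < outsize; sub++)
            out[len++] = *sub;             /* copy, truncating if full */
    }
    out[len] = '\0';
}
```

- With the moov macro above, expanding "move.\0 \1,\2" with size 'l' and
- the arguments d0 and d1 yields "move.l d0,d1".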
-
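- The branch and quick-form optimizations listed above come down to simple
- range checks. The C sketch below uses the standard 68000 encoding limits
- (an 8-bit branch displacement measured from the branch address plus 2,
- ADDQ/SUBQ data of 1 to 8, MOVEQ data of -128 to 127). The function names
- are hypothetical, and the rule that only a long MOVE becomes MOVEQ is an
- assumption of the sketch, not something stated in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 68000 Bcc displacement is relative to the branch address + 2. */
static bool bcc_fits_short(int32_t branch_addr, int32_t target_addr)
{
    int32_t disp = target_addr - (branch_addr + 2);
    /* An 8-bit displacement of 0 means "16-bit displacement follows",
     * so 0 cannot be encoded in the short form. */
    return disp >= -128 && disp <= 127 && disp != 0;
}

/* ADDQ and SUBQ accept immediate data from 1 to 8. */
static bool fits_addq(int32_t value)
{
    return value >= 1 && value <= 8;
}

/* MOVEQ takes a sign-extended 8-bit value and a data register. */
static bool fits_moveq(int32_t value, bool dest_is_data_reg, char size)
{
    assert(size == 'B' || size == 'W' || size == 'L');
    /* Assumption: only MOVE.L is converted, because MOVEQ always
     * writes all 32 bits of the destination register. */
    return dest_is_data_reg && size == 'L' && value >= -128 && value <= 127;
}
```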
-
- HOW TO USE IT
-
- The command-line syntax to run the assembler is as follows:
-
- a68k <source file>
- [-d]
- [-e<equate file>]
- [-h<header file>]
- [-i<include dirlist>]
- [-k]
- [-l<listing file>]
- [-o<object file>]
- [-p<page depth>]
- [-q[<quiet interval>]]
- [-s]
- [-t]
- [-w[<primary-heap-size>][,secondary-heap-size]]
- [-x<listing file>]
- [-z[<debug-start-line>][,debug-end-line]]
-
- These options can be given in any order, and the source file name can
- appear before all switches, after them, or anywhere in the middle.
- Option values, if any, must immediately follow the keyword with
- no intervening spaces.
-
- If the -o keyword is omitted, the object file will be given a default
- name. It is created by replacing all characters after the last period in
- the source file name by "o". For example, if the source file name is
- "myprog.asm", the object file name defaults to "myprog.o". A source name
- of "my.new.prog.asm" produces a default object file name of "my.new.prog.o".
- If the source file name does not contain a period, ".o" is appended to it
- to produce the default object file name.
-
- The default value for the listing file name is arrived at in the same
- way as the object file name, except that ".lst" is appended instead of ".o".
- If you don't specify this parameter, no listing file will be produced.
- If you specify -x (see below), -l (with the default name) is assumed,
- although you can still use this parameter if you wish.
-
- The default value for the equate file name is arrived at in the same
- way as the object file name, except that ".equ" is appended instead of ".o".
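- The default-name rule above is simple string surgery. This C sketch
- (default_name is a hypothetical helper) follows the description
- literally, so it looks for the last period anywhere in the string,
- including a directory part; this document does not say how such names
- are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace everything after the last period of `source` with `ext`
 * ("o", "lst", "equ", or "s"); if there is no period, append "." and ext. */
static void default_name(const char *source, const char *ext,
                         char *out, size_t outsize)
{
    const char *dot = strrchr(source, '.');
    size_t stem = dot ? (size_t)(dot - source) : strlen(source);
    assert(out != NULL && outsize > 0);
    snprintf(out, outsize, "%.*s.%s", (int)stem, source, ext);
}
```

- Passing "lst" or "equ" instead of "o" gives the default listing and
- equate file names; "s" gives the S-record default described under -s.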
-
- The include directory list is a list of directory names separated by
- commas. No embedded blanks are allowed. For example, the specification
- -imylib,df1:another.lib
- will cause include files to be searched for first in the current directory,
- then in "mylib", then in "df1:another.lib".
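- The search order for include files can be sketched in C as follows.
- The helper include_search_order is hypothetical, and the empty string
- standing for the current directory is only a convention of this sketch;
- how each directory name is joined to the file name is not shown.

```c
#include <assert.h>
#include <string.h>

/* Split the text after -i (e.g. "mylib,df1:another.lib") into search
 * order.  The current directory, shown as "", always comes first.
 * `buf` is modified in place by strtok; returns the number of entries. */
static int include_search_order(char *buf, const char *dirs[], int maxdirs)
{
    int n = 0;
    assert(buf != NULL && maxdirs >= 1);
    dirs[n++] = "";                        /* current directory */
    for (char *tok = strtok(buf, ","); tok != NULL && n < maxdirs;
         tok = strtok(NULL, ","))
        dirs[n++] = tok;
    return n;
}
```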
-
- The -d keyword causes symbol table entries (hunk_symbol) to be written
- to the object module for the use of symbolic debuggers.
-
- The -k keyword causes the object file to be kept even if errors were
- found. Without -k, the object file is scratched if any errors occur.
-
- The -l keyword causes a listing file to be produced. If you want
- the listing file to include a symbol table dump and cross-reference,
- use the -x keyword instead (see below).
-
- The -p keyword causes the page depth to be set to the specified value.
- If omitted, a default of 60 lines (-p60) is assumed.
-
- The -q keyword changes the interval at which A68k displays the
- current line number (the default is every 10 lines, i.e. -q10). If
- you specify -q0 or -q without a value, no line numbers will be displayed.
- This will speed up assemblies slightly by reducing console I/O.
-
- The -s keyword, if specified, causes the object file to be written in
- Motorola S-record format. If omitted, AmigaDOS format will be produced.
- The default name for an S-record file has ".s" appended to the source name,
- rather than ".o"; this can still be overridden with the -o keyword, though.
-
- The -t keyword allows tabs in the source file to be passed through
- to the listing file, rather than being expanded. In addition, tabs will
- be generated in the listing file to skip from the object code to the
- source statement, etc. This can greatly reduce the size of the listing
- file, as well as making it quicker to produce. Do not use this option
- if you will be displaying or printing the listing file on a device
- that does not have tab stops at every 8th position.
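- The -t option saves space because one tab byte can replace up to eight
- spaces. A minimal C sketch of the column arithmetic, assuming 0-based
- columns and tab stops every 8 positions (the function names are
- hypothetical):

```c
#include <assert.h>

/* Column reached after a tab at column `col`, with stops every 8. */
static int next_tab_stop(int col)
{
    assert(col >= 0);
    return (col / 8 + 1) * 8;
}

/* Spaces written in place of that tab when -t is not used. */
static int spaces_for_tab(int col)
{
    return next_tab_stop(col) - col;
}
```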
-
- The -w keyword specifies the size of the heaps used. The primary heap
- stores the symbol table, user macro text, relocation information, and
- cross-reference information. The secondary heap stores information for
- nested macro calls and include files. The primary heap size defaults to
- 32768 bytes, which should be enough for all but the largest assemblies.
- The secondary heap size defaults to 1024 bytes, which should be enough
- unless you use very deeply nested macros and/or include files with long
- path names. You can specify either or both parameters. For example:
- -w40000 secondary heap size remains at 1024 bytes
- -w,2000 primary heap size remains at 32768 bytes
- -w40000,2000 increases the size of both heaps
- If you're really tight for memory, and are assembling small modules, you
- can use this keyword to shrink the heaps below their default sizes.
- At the end of an assembly, a message will be displayed giving the
- amount of heap space actually used, in the form of the -w command
- you would have to enter to allocate the minimum heap space.
- See below for a layout of the heaps.
-
- The -x keyword works the same as -l, except that a symbol table
- dump, including cross-reference information, will be added to the end
- of the listing file.
-
- The -z keyword is provided for debugging purposes. You can cause
- the assembler to list a range of source lines, complete with line number
- and current location counter value, during both passes. For example:
- -z lists all source lines
- -z100,200 lists lines 100 through 200
- -z100 lists all lines starting at 100
- -z,100 lists the first 100 lines
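- The -w and -z keywords share the same "[first][,second]" value syntax,
- where an omitted value keeps its default. The C sketch below shows one
- way to parse it; parse_pair and should_trace are hypothetical names, and
- the sketch does no validation of malformed numbers.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse "[a][,b]".  A value that is omitted keeps whatever default the
 * caller has already stored in *a or *b. */
static void parse_pair(const char *arg, long *a, long *b)
{
    char *end;
    assert(arg != NULL && a != NULL && b != NULL);
    if (*arg != ',' && *arg != '\0') {
        *a = strtol(arg, &end, 10);
        arg = end;
    }
    if (*arg == ',' && arg[1] != '\0')
        *b = strtol(arg + 1, NULL, 10);
}

/* -z: trace line `line` if it lies in [first, last]; by default the
 * range covers every line. */
static int should_trace(const char *zarg, long line)
{
    long first = 1, last = LONG_MAX;
    parse_pair(zarg, &first, &last);
    return line >= first && line <= last;
}
```

- For -w the caller would preload the defaults 32768 and 1024 before
- calling parse_pair, so "-w,2000" changes only the secondary heap size.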
-
-
- If you wish to override the default object and (optionally) listing
- file names, you can omit the -o and -l keywords. The assembler interprets
- the first three parameters without leading hyphens as the source, object,
- and listing file names respectively. Anything over three file names is an
- error, as is attempting to respecify a file name with the -o or -l keywords.
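- The rule for file names given without hyphens can be sketched in C as
- follows. The structure and function names are hypothetical; the sketch
- only models the two errors described above (a fourth name, and a name
- given both positionally and with -o or -l).

```c
#include <assert.h>
#include <stddef.h>

struct file_names {
    const char *source, *object, *listing;
    int positional;                 /* names without hyphens seen so far */
};

/* The 1st, 2nd, and 3rd positional names fill source, object, listing. */
static int add_positional(struct file_names *f, const char *name)
{
    const char **slot;
    assert(f != NULL && name != NULL);
    switch (f->positional) {
    case 0:  slot = &f->source;  break;
    case 1:  slot = &f->object;  break;
    case 2:  slot = &f->listing; break;
    default: return -1;             /* more than three file names */
    }
    if (*slot != NULL)
        return -1;                  /* already given with -o or -l */
    *slot = name;
    f->positional++;
    return 0;
}

/* -o or -l: also an error if that name has already been set. */
static int set_keyword(const char **slot, const char *name)
{
    if (*slot != NULL)
        return -1;
    *slot = name;
    return 0;
}
```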
-
-
- The primary heap is built from both ends. Symbol table entries
- (including labels) and macro text are stored during pass 1. Cross-reference
- data is stored during pass 2. Relocation information is also stored during
- pass 2, but is cleared at the end of each SECTION. Since it is no longer
- needed once dumped, the space is freed for re-use by the next section's
- relocation information. The expression parser also uses the primary heap
- to store its working stacks - this space is freed as soon as an expression
- has been evaluated.
- The fixed portion of each symbol table entry occupies 16 bytes. The
- labels and macro text occupy just enough space to hold their strings
- (including the end-of-string delimiter) - they are all pointed to by fixed
- symbol table entries. Relocation entries occupy 10 bytes each.
- Cross-reference entries are 12 bytes long - each holds four references to
- one symbol. The expression parser creates temporary entries for terms
- (10 bytes each) and operators (4 bytes each). Since terms are combined
- as soon as possible, the parser almost never needs to store the entire
- expression on the heap.
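- As a worked example of these sizes, a label named "loop" that is
- referenced five times needs 16 bytes for its fixed entry, 5 bytes for
- the name and its delimiter, and two 12-byte cross-reference entries,
- 45 bytes in all. The C sketch below (hypothetical helpers, ignoring any
- alignment padding and the parser's temporary entries, and assuming every
- reference is recorded) does the same arithmetic:

```c
#include <assert.h>
#include <string.h>

/* Fixed 16-byte entry + name + delimiter + one 12-byte cross-reference
 * entry per group of up to four references. */
static long symbol_bytes(const char *name, long refs)
{
    assert(name != NULL && refs >= 0);
    long xref_entries = (refs + 3) / 4;      /* round up */
    return 16 + (long)strlen(name) + 1 + 12 * xref_entries;
}

/* Relocation entries are 10 bytes each; only the largest section
 * matters, since the space is reused after every SECTION. */
static long reloc_bytes(long max_relocs_in_one_section)
{
    return 10 * max_relocs_in_one_section;
}
```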
- The diagram below illustrates the layout of the primary heap. High
- memory addresses are at the top of the diagram, while low addresses are
- at the bottom. The names on the left of the diagram are the names of the
- pointers to the various tables within the heap.
-
- Heap + maxheap -------------> ___________________________
- | |
- | Symbol table |
- struct SymTab *SymStart ---> |___________________________|
- | |
- | Symbol references |
- struct Ref *RefStart -------> |___________________________|
- | |
- | (unused space) |
- char *HeapLim --------------> |___________________________|
- | |
- | Relocation data |
- struct RelTab *RelStart ----> |___________________________|
- | |
- | Labels and macro text |
- char *Heap -----------------> |___________________________|
-
- Note that the pointers are to various types. This makes for
- lots of interesting casts. (Ain't C fun?) Since the relocation
- data is cleared at the end of each section, HeapLim will move up and
- down. The "high-water mark" is stored in char *HighHeap, which is
- used solely to produce the memory usage message at the end of the
- assembly. Note that a program may consist of a section containing
- many relocatable references, followed by a section with fewer
- relocatable references but lots of symbol references. In this case,
- RefStart might end up below HighHeap, and the final message would
- indicate that more heap space was used than was available. This is
- not an error - only if RefStart hits HeapLim will an error be reported.
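- The "built from both ends" layout, and the reason the final usage
- figure can exceed the heap size, can be modelled with a small two-ended
- allocator. This is a C sketch with hypothetical names, not A68k's code:
- low plays the part of HeapLim, high_water that of HighHeap, and high that
- of the lowest table at the top of the heap. As in the text, allocation
- fails only when the two ends actually meet.

```c
#include <assert.h>
#include <stddef.h>

struct two_end_heap {
    char *base, *lim;   /* whole block is [base, lim) */
    char *low;          /* next free byte at the bottom */
    char *high;         /* lowest byte in use at the top */
    char *high_water;   /* highest value `low` has reached */
};

static void heap_init(struct two_end_heap *h, char *mem, size_t size)
{
    h->base = h->low = h->high_water = mem;
    h->lim = h->high = mem + size;
}

static void *alloc_low(struct two_end_heap *h, size_t n)
{
    if ((size_t)(h->high - h->low) < n)
        return NULL;                    /* the two ends would collide */
    void *p = h->low;
    h->low += n;
    if (h->low > h->high_water)
        h->high_water = h->low;
    return p;
}

static void *alloc_high(struct two_end_heap *h, size_t n)
{
    if ((size_t)(h->high - h->low) < n)
        return NULL;
    h->high -= n;
    return h->high;
}

/* Free everything above `mark` at the bottom, as relocation data is
 * released at the end of each section. */
static void release_low(struct two_end_heap *h, char *mark)
{
    assert(mark >= h->base && mark <= h->low);
    h->low = mark;
}

/* Usage figure for the final message: can exceed the heap size when the
 * top region reused space the bottom region had released. */
static size_t peak_usage(const struct two_end_heap *h)
{
    return (size_t)(h->high_water - h->base) + (size_t)(h->lim - h->high);
}
```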
-
-
- The secondary heap is also built from both ends, but it grows and
- shrinks according to how many macros and include files are currently open.
- At all times there will be at least one entry on the heap, for the original
- source code file.
- The bottom of the heap holds the names of the source code file and
- any macro or include files that are currently open. The full path is
- given. A null string is stored for user macros. Macro arguments are
- stored as additional strings, one for each argument in the macro call line.
- All strings are stored in minimum space, similar to the labels and user
- macro text on the primary heap. File names are pointed to by the fixed
- table entries (see below) - macro arguments are accessed by stepping past
- the macro name to the desired argument, unless NARG would be exceeded.
- The fixed portion of the heap is built down from the top. Each entry
- occupies 16 bytes. Enough information is stored to return to the proper
- position in the outer file once the current macro or include file has been
- completely processed.
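- Stepping past the packed strings to reach a macro argument might look
- like this C sketch. The function name and the choice to return an empty
- string when the argument number exceeds NARG are assumptions; the text
- only says that arguments are reached by stepping past the macro name.

```c
#include <assert.h>
#include <string.h>

/* `entry` points at the (empty) name string of a user macro, followed
 * by one '\0'-terminated string per argument.  Returns argument n
 * (1-based), or "" if n exceeds narg (an assumption of this sketch). */
static const char *macro_arg(const char *entry, int narg, int n)
{
    assert(entry != NULL && n >= 1);
    if (n > narg)
        return "";
    const char *p = entry + strlen(entry) + 1;   /* skip the name */
    for (int i = 1; i < n; i++)
        p += strlen(p) + 1;                      /* skip argument i */
    return p;
}
```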
- The diagram below illustrates the layout of the secondary heap.
-
- Heap2 + maxheap2 -----------> ___________________________
- | |
- | Input file table |
- struct InFCtl *InF ---------> |___________________________|
- | |
- | Parser operator stack |
- struct OpStack *Ops --------> |___________________________|
- | |
- | (unused space) |
- struct TermStack *Term -----> |___________________________|
- | |
- | Parser term stack |
- char *NextFNS --------------> |___________________________|
- | |
- | Input file name stack |
- char *Heap2 ----------------> |___________________________|
-
- The "high-water mark" for NextFNS is stored in char *High2,
- and the "low-water mark" (to stretch a metaphor) for InF is stored
- in struct InFCtl *LowInF. Again, these figures are used only to
- determine the maximum heap usage.
-
-
- Please send me any bug reports, flames, etc. I can be reached
- on Mind Link (604/533-2312), at any Panorama (PAcific NORthwest AMiga
- Association) meeting, or via Jeff Lydiatt or Larry Phillips.
- (I don't have the time or money to live on Usenet or CompuServe, etc.)
-
- Charlie Gibbs
-
- (I can't give a mailing address right now because I'm moving.)
-