Text File | 1989-05-10 | 18.7 KB | 428 lines

A68k - a freely distributable assembler for the Amiga

by Charlie Gibbs

with special thanks to
Brian R. Anderson and Jeff Lydiatt

(Version 1.2 - July 11, 1988)


Note: This program is NOT Public Domain. Permission is given to
freely distribute this program provided no fee is charged, and this
documentation file is included with the program.

This assembler is based on Brian R. Anderson's 68000 cross-assembler
published in Dr. Dobb's Journal, April through June 1986. I have
converted it to produce AmigaDOS-format object modules, and have made
many enhancements, such as macros and include files.

My first step was to convert the original Modula-2 code into C. I did
this for two reasons. First, I had access to a C compiler, but not a
Modula-2 compiler. Second, I like C better anyway.

The executable code generator code (GetObjectCode and MergeModes) is
essentially the same as in the original article, aside from its
translation into C. I have almost completely rewritten the remainder
of the code, however, in order to remove restrictions, add
enhancements, and adapt it to the AmigaDOS environment. Since the
only reference book available to me was the AmigaDOS Developer's
Manual (Bantam, February 1986), the assembler and the remainder of
this document work in terms of that book.


RESTRICTIONS

Let's get these out of the way first. There are a few things that I
have not yet implemented, and some outright bugs that would take too
long to correct for this version.

  o The verification file (-v) option is not supported. Diagnostic
    messages always appear on the console. They also appear in the
    listing file, however (see extensions below). You can produce an
    error file by redirecting console output to a file - the line
    number counter and final summary are displayed on stderr so you
    can still see what's happening.

  o The file names in the include directory list (-i) must be
    separated by commas. The list may not be enclosed in quotes.

  o Labels assigned by EQUR and REG directives are case-sensitive.

  o The following directives are not supported, and will be flagged
    as invalid op-codes:

        RORG    OFFSET    NOPAGE    LLEN        PLEN
        NOOBJ   FAIL      FORMAT    NOFORMAT    MASK2

    I feel that NOPAGE, LLEN, and PLEN should not be defined within
    a source module. It doesn't make sense to me to have to change
    your program just because you want to print your listings on
    different paper. The command-line option "-p" (see below) can be
    used as a replacement for PLEN.


EXTENSIONS

Now for the good stuff:

  o Labels can be any length that will fit onto one source line
    (currently 127 bytes maximum). Since labels are stored on the
    heap, the number of labels that can be processed is limited only
    by available memory, which can be increased by using the "-w"
    option (see below).

  o Since section data and user macro definitions are stored on the
    same heap as the symbol table (see above), they too are limited
    only by available memory. (Actually, there is a hard-coded limit
    of 32767 sections, but I doubt anyone will run into that one.)

  o The only values a label cannot take are the register names - the
    assembler can distinguish between the same name used as a label,
    instruction name or directive, macro name, or section name.

  o Section and user macro names appear in the symbol table dump,
    and will also be cross-referenced. Their names can be the same
    as any label (see above); the assembler can sort them out.

  o Includes and macro calls can be nested indefinitely, limited
    only by available memory. The message "Secondary heap overflow -
    assembly terminated" will be displayed if memory is exhausted.
    You can increase the size of this heap using the -w parameter
    (see below). Recursive macros are supported; recursive includes
    will, of course, result in a loop that will be broken only when
    the heap overflows.

  o The EVEN directive forces alignment on a word (2-byte) boundary.
    It does the same thing as CNOP 0,2. (This one is left over from
    the original code.)

  o Branch (Bcc) instructions to a previously-defined label will be
    automatically converted to short form if possible. This feature
    is not available for forward branches, since in pass 1 the
    assembler doesn't yet know how far the branch must go.

  o Backward references to labels within the current CODE section
    will be converted to PC relative addressing with displacement if
    this mode is legal for the instruction.

  o If a MOVEM instruction only specifies one register, it is
    converted to the corresponding MOVE instruction. Instructions
    such as MOVEM D0-D0,label will not be converted, however.

  o ADD, SUB, and MOVE instructions will be converted to ADDQ, SUBQ,
    and MOVEQ respectively if possible. Instructions coded
    explicitly as (for example) ADDA or ADDI will not be converted.

  o ADD, CMP, SUB, and MOVE to an address register are converted to
    ADDA, CMPA, SUBA, and MOVEA respectively, except if an ADD, SUB,
    or MOVE instruction has already been converted to quick form.

  o ADD, AND, CMP, EOR, OR, and SUB of an immediate value are
    converted to ADDI, ANDI, CMPI, EORI, ORI, and SUBI respectively
    (unless the address register or quick conversion above has
    already been done).

  o If both operands of a CMP instruction are postincrement mode,
    the instruction is converted to CMPM.

  o Operands of the form 0(An) will be treated as (An).

  o The SECTION directive allows a third parameter. This can be
    specified as either CHIP or FAST (upper- or lower-case). If this
    parameter is present, the hunk will be written with the
    MEMF_CHIP or MEMF_FAST bit set. This allows you to produce
    "pre-ATOMized" object modules.

  o The synonyms DATA and BSS are accepted for SECTION directives
    starting data or BSS hunks. The CHIP and FAST options mentioned
    above can also be used, e.g. BSS name,CHIP.

  o The following synonyms have been implemented for compatibility
    with the Aztec assembler:

        CSEG  is treated the same as  CODE  or  SECTION name,CODE
        DSEG  is treated the same as  DATA  or  SECTION name,DATA

  o The ability to produce Motorola S-records is retained from the
    original code. The -s option causes the assembler to produce
    S-format instead of AmigaDOS format. Relocatable code cannot be
    produced in this format.

  o Error messages consist of three parts.

    The position of the offending line is given as a line number
    within the current module. If the line is within a macro
    expansion or INCLUDE file, the position of the macro call or
    INCLUDE statement in the outer module is given as well. This
    process is repeated until the outermost source module is
    reached.

    Next, the offending source line itself is listed.

    Finally, the errors for that line are displayed. A flag (^) is
    placed under the column where the error was detected.

  o Named local labels are supported. These work the same as the
    local labels supported by the Metacomco assembler (nnn$) but can
    be formed in the same manner as normal labels, except that they
    must be preceded by a backslash.

  o The following synonyms have been implemented for compatibility
    with the Assempro assembler:

        ENDIF  is treated the same as  ENDC
        =      is treated the same as  EQU
        |      is treated the same as  !  (logical OR)

  o Quotation marks (") can be used as string delimiters as well as
    apostrophes ('). Any given string must begin and end with the
    same delimiter. This allows such statements as the following:

        MOVEQ   '"',D0
        DC.B    "This is Charlie's assembler."

    Note that you can still define an apostrophe within a string
    delimited by apostrophes if you double it, e.g.

        MOVEQ   """",D0
        DC.B    'This is Charlie''s assembler.'

  o If any errors are found in the assembly, the object code file
    will be scratched, unless you specified the -k (keep) flag on
    the command line.

  o The symbol .A68K (note upper case) is automatically defined as a
    SET symbol having an absolute value of 1. This enables a source
    program to determine whether it is being assembled on this
    assembler.

  o A zeroth positional macro parameter (\0) is supported. It is
    replaced by the length of the macro call (B, W, or L, defaulting
    to W). For instance, given the macro:

        moov    MACRO
                move.\0 \1,\2
                ENDM

    the macro call

        moov.l  d0,d1

    would be expanded as

        move.l  d0,d1


HOW TO USE IT

The command-line syntax to run the assembler is as follows:

    a68k <source file>
         [-d]
         [-e<equate file>]
         [-h<header file>]
         [-i<include dirlist>]
         [-k]
         [-l<listing file>]
         [-o<object file>]
         [-p<page depth>]
         [-q[<quiet interval>]]
         [-s]
         [-t]
         [-w[<primary-heap-size>][,<secondary-heap-size>]]
         [-x<listing file>]
         [-z[<debug-start-line>][,<debug-end-line>]]

These options can be given in any order, and the source file name can
appear before all switches, after them, or anywhere in the middle.
Option values, if any, must immediately follow the keyword with no
intervening spaces.

If the -o keyword is omitted, the object file will be given a default
name. It is created by replacing all characters after the last period
in the source file name by "o". For example, if the source file name
is "myprog.asm", the object file name defaults to "myprog.o". A source
name of "my.new.prog.asm" produces a default object file name of
"my.new.prog.o". If the source file name does not contain a period,
".o" is appended to it to produce the default object file name.

The default value for the listing file name is arrived at in the same
way as the object file name, except that ".lst" is appended instead of
".o". If you don't specify this parameter, no listing file will be
produced. If you specify -x (see below), -l (with the default name) is
assumed, although you can still use this parameter if you wish.
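
The default-name rule described above can be sketched as follows. This is only an illustration in Python, not the assembler's own C code; the function name default_output_name and its extension parameter are hypothetical. It follows the rule exactly as documented and makes no attempt to treat periods in directory names specially.

```python
def default_output_name(source_name: str, extension: str = "o") -> str:
    """Derive a default output file name from the source file name.

    Hypothetical helper: it only mirrors the documented rule, where
    everything after the last period is replaced by the new extension,
    or "." plus the extension is appended when there is no period.
    """
    stem, dot, _old_extension = source_name.rpartition(".")
    if not dot:
        # No period anywhere in the name: append the extension.
        return source_name + "." + extension
    # Keep everything up to the last period, then add the new extension.
    return stem + "." + extension


if __name__ == "__main__":
    for name in ("myprog.asm", "my.new.prog.asm", "myprog"):
        print(name, "->", default_output_name(name),
              "and", default_output_name(name, "lst"))
```

Calling the helper with "lst" instead of the default "o" gives the listing file name derived in the same way.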

The default value for the equate file name is arrived at in the same
way as the object file name, except that ".equ" is appended instead of
".o".

The include directory list is a list of directory names separated by
commas. No embedded blanks are allowed. For example, the specification

    -imylib,df1:another.lib

will cause include files to be searched for first in the current
directory, then in "mylib", then in "df1:another.lib".

The -d keyword causes symbol table entries (hunk_symbol) to be written
to the object module for the use of symbolic debuggers.

The -k keyword causes the object file to be kept if any errors were
found. Otherwise, it will be scratched if any errors occurred.

The -l keyword causes a listing file to be produced. If you want the
listing file to include a symbol table dump and cross-reference, use
the -x keyword instead (see below).

The -p keyword causes the page depth to be set to the specified value.
If omitted, a default of 60 lines (-p60) is assumed.

The -q keyword changes the interval at which A68k displays the current
line number (the default is every 10 lines, i.e. -q10). If you specify
-q0 or -q without a value, no line numbers will be displayed. This
will speed up assemblies slightly by reducing console I/O.

The -s keyword, if specified, causes the object file to be written in
Motorola S-record format. If omitted, AmigaDOS format will be
produced. The default name for an S-record file has ".s" appended to
the source name, rather than ".o"; this can still be overridden with
the -o keyword, though.

The -t keyword allows tabs in the source file to be passed through to
the listing file, rather than being expanded. In addition, tabs will
be generated in the listing file to skip from the object code to the
source statement, etc. This can greatly reduce the size of the listing
file, as well as making it quicker to produce.
Do not use this option if you will be displaying or listing the list
file on a device which does not respond to a tab at every 8th
position.

The -w keyword specifies the size of the heaps used. The primary heap
stores the symbol table, user macro text, relocation information, and
cross-reference information. The secondary heap stores information
for nested macro calls and include files. The primary heap size
defaults to 32768 bytes, which should be enough for all but the
largest assemblies. The secondary heap size defaults to 1024 bytes,
which should be enough unless you use very deeply nested macros and/or
include files with long path names. You can specify either or both
parameters. For example:

    -w40000         secondary heap size remains at 1024 bytes
    -w,2000         primary heap size remains at 32768 bytes
    -w40000,2000    increases the size of both heaps

If you're really tight for memory, and are assembling small modules,
you can use this keyword to shrink the heaps below their default
sizes. At the end of an assembly, a message will be displayed giving
the amount of heap space actually used, in the form of the -w command
you would have to enter to allocate the minimum heap space. See below
for a layout of the heaps.

The -x keyword works the same as -l, except that a symbol table dump,
including cross-reference information, will be added to the end of the
listing file.

The -z keyword is provided for debugging purposes. You can cause the
assembler to list a range of source lines, complete with line number
and current location counter value, during both passes. For example:

    -z              lists all source lines
    -z100,200       lists lines 100 through 200
    -z100           lists all lines starting at 100
    -z,100          lists the first 100 lines

If you wish to override the default object and (optionally) listing
file names, you can omit the -o and -l keywords. The assembler
interprets the first three parameters without leading hyphens as the
source, object, and listing file names respectively.
Anything over three file names is an error, as is attempting to
respecify a file name with the -o or -l keywords.

The primary heap is built from both ends. Symbol table entries
(including labels) and macro text are stored during pass 1.
Cross-reference data is stored during pass 2. Relocation information
is also stored during pass 2, but is cleared at the end of each
SECTION. Since it is no longer needed once dumped, the space is freed
for re-use by the next section's relocation information. The
expression parser also uses the primary heap to store its working
stacks - this space is freed as soon as an expression has been
evaluated.

The fixed portion of each symbol table entry occupies 16 bytes. The
labels and macro text occupy just enough space to hold their strings
(including the end-of-string delimiter) - they are all pointed to by
fixed symbol table entries. Relocation entries occupy 10 bytes each.
Cross-reference entries are 12 bytes long - each holds four references
to one symbol. The expression parser creates temporary entries for
terms (10 bytes each) and operators (4 bytes each). Since terms are
combined as soon as possible, the parser almost never needs to store
the entire expression on the heap.

The diagram below illustrates the layout of the primary heap. High
memory addresses are at the top of the diagram, while low addresses
are at the bottom. The names on the left of the diagram are the names
of the pointers to the various tables within the heap.

Heap + maxheap ------------>  ___________________________
                             |                           |
                             |       Symbol table        |
struct SymTab *SymStart ---> |___________________________|
                             |                           |
                             |     Symbol references     |
struct Ref *RefStart ------> |___________________________|
                             |                           |
                             |      (unused space)       |
char *HeapLim -------------> |___________________________|
                             |                           |
                             |      Relocation data      |
struct RelTab *RelStart ---> |___________________________|
                             |                           |
                             |   Labels and macro text   |
char *Heap ----------------> |___________________________|

Note that the pointers are to various types. This makes for lots of
interesting casts. (Ain't C fun?)

Since the relocation data is cleared at the end of each section,
HeapLim will move up and down. The "high-water mark" is stored in
char *HighHeap, which is used solely to produce the memory usage
message at the end of the assembly. Note that a program may consist
of a section containing many relocatable references, followed by a
section with fewer relocatable references but lots of symbol
references. In this case, RefStart might end up below HighHeap, and
the final message would indicate that more heap space was used than
was available. This is not an error - only if RefStart hits HeapLim
will an error be reported.

The secondary heap is also built from both ends, but it grows and
shrinks according to how many macros and include files are currently
open. At all times there will be at least one entry on the heap, for
the original source code file.

The bottom of the heap holds the names of the source code file and any
macro or include files that are currently open. The full path is
given. A null string is stored for user macros. Macro arguments are
stored by additional strings, one for each argument in the macro call
line. All strings are stored in minimum space, similar to the labels
and user macro text on the primary heap.
File names are pointed to by the fixed table entries (see below) -
macro arguments are accessed by stepping past the macro name to the
desired argument, unless NARG would be exceeded.

The fixed portion of the heap is built down from the top. Each entry
occupies 16 bytes. Enough information is stored to return to the
proper position in the outer file once the current macro or include
file has been completely processed.

The diagram below illustrates the layout of the secondary heap.

Heap2 + maxheap2 ---------->  ___________________________
                             |                           |
                             |     Input file table      |
struct InFCtl *InF --------> |___________________________|
                             |                           |
                             |   Parser operator stack   |
struct OpStack *Ops -------> |___________________________|
                             |                           |
                             |      (unused space)       |
struct TermStack *Term ----> |___________________________|
                             |                           |
                             |     Parser term stack     |
char *NextFNS -------------> |___________________________|
                             |                           |
                             |   Input file name stack   |
char *Heap2 ---------------> |___________________________|

The "high-water mark" for NextFNS is stored in char *High2, and the
"low-water mark" (to stretch a metaphor) for InF is stored in
struct InFCtl *LowInF. Again, these figures are used only to determine
the maximum heap usage.


Please send me any bug reports, flames, etc. I can be reached on
Mind Link (604/533-2312), at any Panorama (PAcific NORthwest AMiga
Association) meeting, or via Jeff Lydiatt or Larry Phillips. (I don't
have the time or money to live on Usenet or CompuServe, etc.)

Charlie Gibbs
(I can't give a mailing address right now because I'm moving.)