-
- Web
-
- Greg Lee
- July 8, 1986
-
- General Description.
-
- Web is a preprocessor for assembly language programs. It converts
- assembly language programs which make use of certain structuring
- conventions into files which are acceptable to a 68000 assembler, in
- particular, the Metacomco assembler for the Amiga. Although inspired
- by Donald Knuth's system of the same name (see "Literate Programming"
- in The Computer Journal, v.27, no.2, 1984, pp.97-111), it is very much
- more primitive than Knuth's system, and the syntax of the language
- it interprets is different from that of the real WEB.
-
- Structuring.
-
- The structuring conventions made available by the use of web fall
- into the following categories:
-
- comments (numbered comment sections, etc.)
- procedures (can be given more readable names)
- named code sections (code defined apart from where it is used)
- defined symbols (for which text strings will be substituted)
- line separator (';' separates multiple statements on a line)
- statement grouping (using '{...}', '«...»', and '[...]')
- branch mnemonics (using '=' for 'bne', etc.)
- breaks (an easy way to branch to the end of a section)
- infix statements (express move/add/sub/lea/cmp with '='/'?' infix)
- length typing (associate byte/word/long with variables)
- data declarations (declare simple uninitialized data variables)
-
- Some details about these can be found below. The source for web makes use
- of these conventions, and it is the principal documentation for the
- use of web, both by way of example and by providing details of the
- implementation. The source is not put forward as an ideal model of
- style (you can do better).
-
-
- Usage.
-
- The source code that web is to process is assumed to be in
- a file whose name ends in '.w', as for example in 'web.w'. When you
- invoke web, give the name of the file on the command line, either with
- or without the '.w' suffix, as you choose. The output file which is
- to be assembled will have the same name, except ending in '.a' instead
- of '.w'. It will be created in the same directory as that of the '.w'
- file, and any pre-existing file of that name will be deleted.
-
- For example, the command for web to process itself is either 'web web'
- or 'web web.w', and the assembly file created is 'web.a'.
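
The naming rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not web's own code; the function name web_file_names is made up for the example.

```python
def web_file_names(arg):
    """Return (input_name, output_name) for a command-line argument.

    The argument may be given with or without the '.w' suffix; the
    output name is the same name ending in '.a', in the same directory.
    """
    stem = arg[:-2] if arg.endswith(".w") else arg
    return stem + ".w", stem + ".a"


print(web_file_names("web"))    # input and output names for 'web web'
print(web_file_names("web.w"))  # the same pair for 'web web.w'
```

Both forms of the command lead to the same pair of file names, which is why either may be used.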
-
- The largest file that web can process is fixed by the size of
- an internal buffer which is now set at 80,000 bytes. The source file
- can typically be twice this size, since comments and multiple spaces
- are removed before the text is placed in the buffer. The total number
- of procedures, named code sections, and definitions must not exceed
- the size of an internal table, which can now hold a maximum of 400
- entries. Having the source code, of course you can easily change
- these limits.
-
-
- Optionality.
-
- The use of every part of web's extended syntax is optional. A program
- which could have been assembled directly can be sent through a web
- preprocessing stage without occasioning any changes in the resulting object
- code. So, if there are some aspects of the syntax you don't like, you don't
- need to use the related conventions. However, there are some
- incompatibilities between the languages that the assembler and web accept.
- So not just any correct assembly language program can be processed by web
- without problems. Here is a list of the potential difficulties:
-
- capital letters -- the first non-blank character of a line must
- not be a capital, unless the intention is to name a procedure;
- also, the sequences '.B','.W','.L' are elided
- semi-colon -- this must not be used for comments, since web
- understands it as a line separator
- colon -- after a label terminated with a colon, web starts a new
- line, so one can't use a colon with a label before the assembler
- directives macro, reg, set, equ, equr, section, rorg
- data sections -- web just does not understand data sections, so
- they must be treated as procedures (never called, of course);
- at the end of a data section, one must give a 'section ...,code'
- directive, so the assembler will accept the 'rts' instruction
- that web will add at the end of the section
- angle brackets -- assembler macros that take string arguments may
- have these arguments enclosed in angle brackets; web does not
- know that such angles should protect the enclosed characters
- from being changed
- double quotes -- web understands that single quotes protect characters,
- but it does not understand double quotes; use single quotes
- around file names after the 'include' directive
- keywords -- when the words 'define', 'byte', 'word', 'long' occur
- first on a line, possibly preceded by spaces, they are interpreted
- in a special way (detailed below)
-
-
- Comments.
-
- A paragraph of commentary can be introduced by a number followed by a
- period, or the Amiga topaz section character '§' (Alt single quote), or the
- paragraph character '¶' (Alt '6'). The paragraph introducer is the first
- nonblank thing on the line. Such a paragraph is terminated with an empty
- line.
-
- A second way of giving comments is to place them in parentheses. The
- initial left parenthesis must be the very first thing on the line, but
- the closing right parenthesis can come anywhere following, except within
- another comment or enclosed in single quotes. Pairs of parentheses may
- occur within a parenthetical comment (so one can comment out portions of
- code that contain parenthetical comments).
-
- Thirdly, the bullet character '·' and everything on the remainder of
- its line is treated as a comment. This is a replacement for the assembler
- use of ';', which is no longer available with web.
-
- Lastly, '*' at the beginning of a line makes that line a comment. This
- is also an assembler convention.
-
- The assembler convention of placing comments after complete code
- instructions will also often work, but is not recommended, since it
- sometimes will interfere with web's syntactic analysis. It definitely
- will interfere after an infix statement (see below).
-
-
- Procedures.
-
- Subroutines can be given long names which may contain spaces. A line
- beginning with a capital letter is treated as such a name, and the lines
- following, up to the next procedure name or named code section or end
- of file, are part of the procedure. To call the procedure, just give its
- name preceded on a line by one or more blanks (possibly preceded by a
- left statement grouper or a colon-terminated label).
-
- Web generates an 'rts' instruction at the end of a procedure definition.
- At the end of the output file, it also generates an 'end' directive.
- Material at the beginning of the input file before any procedure name or
- named code section is treated as the body of an unnamed procedure,
- except that no 'rts' is added at the end.
-
- Procedures can be defined with parameters and with a list of registers
- to save. The list of parameters is given in parentheses following the
- name of the procedure, and the registers to save after that. The
- register list is in the form used by the 'movem' instruction. When
- the procedure is called with a parenthesized list of arguments, a
- 'move.l' instruction is generated for each name in the argument list,
- with the corresponding name in the parameter list as its destination.
- If one of the corresponding names should be missing, no move
- instruction is generated. Web does not check for redundant moves, or
- for moves to a destination that is also the source of a subsequent
- move.
-
-
- Named Code Sections.
-
- Named code sections are in-line procedures. A line beginning with
- two hyphens is treated as the name of a code section, and the section
- is defined as all following lines up to the next procedure name or named
- code section or end of file. To invoke the code, just give its name
- (including the hyphens) on a line preceded by one or more blanks
- (which may be preceded by a left brace or a colon-terminated label).
- Then the definition will be substituted for the name.
-
- Such code sections need not be defined before they are invoked, and
- the definitions may contain invocations of other named code sections,
- with no restriction on the level of nesting allowed. However, if there is
- recursion, the output file will exceed the capacity of most disk drives,
- since it will be infinitely long.
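
The in-line substitution described above, and the reason recursion never terminates, can be sketched in Python as follows. This is a rough model under stated assumptions, not web's implementation: the names expand and sections are hypothetical, parameters and register lists are ignored, and the check for recursion is an addition for the sake of the example (web itself makes no such check).

```python
def expand(name, sections, active=()):
    """Return the lines of code section `name` with invocations expanded.

    `sections` maps a section name (including its two hyphens) to the
    list of lines in its definition. An invocation is taken to be a line
    that starts with a blank and otherwise consists of a section name.
    """
    if name in active:
        # Without this guard the expansion would go on for ever.
        raise ValueError("recursive code section: " + name)
    out = []
    for line in sections[name]:
        invoked = line.strip()
        if line[:1].isspace() and invoked in sections:
            out.extend(expand(invoked, sections, active + (name,)))
        else:
            out.append(line)
    return out


sections = {
    "--clear flags": [" moveq #0,d0"],
    "--setup": [" --clear flags", " moveq #1,d1"],
}
print(expand("--setup", sections))
```

Since a definition is looked up only when it is invoked, the order in which sections are defined does not matter in this model either.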
-
- Parameters and a register save list work just as they do for procedures.
-
- Due to a mistake in the implementation, if a given code section is
- to be invoked more than once, it must not contain '{...}', '«...»', '[...]'
- (see below), since then the generated labels will not be unique.
-
-
- Defines.
-
- Symbols can be defined as standing for a string of text. For instance,
- if the input file has the line: define new_line #10
- then for all occurrences of 'new_line', in the output file the text
- '#10' would be substituted.
-
- A symbol is a span of lower case letters or '_' or the grave accent.
- In the definition, the substitution text starts with the first non-blank
- character after the defined symbol and continues to the end of the line.
- Symbols cannot be redefined, and the substitution text cannot contain
- material which web is responsible for processing, except that it can
- contain other defined symbols. As in the case of named code sections,
- there is no protection provided against recursive definitions.
-
- No symbol substitutions are made within labels which are followed by
- a colon or which stand alone on a line.
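
As a rough illustration of the substitution rule, here is a Python sketch. It assumes only what is stated above (a symbol is a span of lower case letters, '_' or the grave accent, and substitution text may contain other defined symbols). The name substitute is made up, and the sketch ignores the exception for labels and any protection given by single quotes.

```python
import re

# A symbol is a span of lower case letters, '_' or the grave accent.
SYMBOL = re.compile(r"[a-z_`]+")


def substitute(text, defines):
    """Replace each defined symbol in `text` by its substitution text."""
    def replace(match):
        word = match.group(0)
        if word in defines:
            # The substitution text may contain other defined symbols.
            # As in web, nothing here guards against recursive definitions.
            return substitute(defines[word], defines)
        return word
    return SYMBOL.sub(replace, text)


defines = {"new_line": "#10", "eol": "new_line"}
print(substitute("cmp.b new_line,d0", defines))
print(substitute("cmp.b eol,d0", defines))
```

Because a whole span is matched at once, a symbol such as 'line' would not be replaced inside 'new_line' in this sketch.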
-
-
- Line Separator.
-
- One can put multiple code instructions on one line by separating
- them with ';'s. However, the titles of procedures or named code
- sections, 'define's, and data declarations cannot come after ';'.
-
-
- Statement Grouping.
-
- As an alternative to making up your own labels for program branches,
- you can have web do this for you. The same unique label is substituted
- for both braces of each matching pair of left and right braces. For example,
-
- instead of       bne lab23
-                  move D0,D3
-          lab23:
-
- you can write    bne { move D0,D3
-                  }
-
- After the label for a left brace, a new line is begun in the output,
- and left indentation is removed before the labels substituted for either
- brace. In addition, '«' (Alt shift '+') is provided as a synonym for '{',
- and '»' (Alt ';') is a synonym for ';}', so instead of the above,
- you can also put:
- bne « move D0,D3 »
- A right brace could not be used in this last example, because the
- label substituted would be put on the same line as the move instruction.
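
The pairing of braces with labels can be pictured with the following Python sketch. It rests on assumptions: that both braces of a pair receive the same label (as the sample output under Data Declarations suggests), and that labels look like '.0001'. The function name label_braces is made up, and the starting of new lines and the labels for breaks are left out.

```python
def label_braces(tokens):
    """Replace '{' and '}' tokens by labels, one label per matching pair."""
    out = []
    stack = []      # labels of left braces not yet closed
    counter = 0
    for token in tokens:
        if token == "{":
            counter += 1
            label = ".%04d" % counter
            stack.append(label)
            out.append(label)
        elif token == "}":
            if not stack:
                raise ValueError("unbalanced right brace")
            out.append(stack.pop())
        else:
            out.append(token)
    if stack:
        raise ValueError("unbalanced left brace")
    return out


print(label_braces(["bne", "{", "move D0,D3", "}"]))
```

A stack is used so that nested pairs are matched from the inside out; the only error reported is unbalanced braces.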
-
- Yet another alternative for statement grouping is to use '[...]'. These
- work like braces, and can be used for notational variety. However, they
- generate a different set of labels, and they need not be properly nested
- with respect to braces. For example, one can make unstructured loops
- like this:
- { add.w d0,d1
- [ subq.l #1,d3
- bpl }
- moveq #30,d3
- subq.l #1,d4
- bpl ]
-
- Branch Mnemonics.
-
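- Used with braces, the ordinary branch mnemonics are unintuitive, since
- the branch is taken in order to skip the statements that follow.
- Consequently, the following alternative symbols are provided; each
- conditional symbol names the condition under which the branch is not taken: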
-
- use <=± for bgt
- use <= for bhi
- use >=± for blt
- use >= for bcs
- use ~= for beq
- use != for beq
- use -> for bra
- use <± for bge
- use < for bcc
- use >± for ble
- use > for bls
- use = for bne
- use - for bpl ('-' must be followed by a blank)
- use + for bmi
-
- Thus the above example could also have been written:
-
- = « move D0,D3 »
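
The table above can be written down as a Python dictionary, with a small function that replaces a leading symbol by its mnemonic. This is a sketch only: the name translate_branch is made up, and trying the longest symbols first is an assumption about how '<=±', '<=' and '<' might be told apart, not a description of web's code.

```python
BRANCH = {
    "<=±": "bgt", "<=": "bhi", ">=±": "blt", ">=": "bcs",
    "~=": "beq", "!=": "beq", "->": "bra", "<±": "bge",
    "<": "bcc", ">±": "ble", ">": "bls", "=": "bne",
    "-": "bpl", "+": "bmi",
}


def translate_branch(statement):
    """Replace a leading branch symbol by the assembler mnemonic."""
    text = statement.lstrip()
    # Try longer symbols first so that '<=±' is not mistaken for '<='.
    for symbol in sorted(BRANCH, key=len, reverse=True):
        if text.startswith(symbol):
            rest = text[len(symbol):]
            if symbol == "-" and not rest.startswith(" "):
                continue  # '-' must be followed by a blank
            return BRANCH[symbol] + rest
    return statement  # no branch symbol: leave the statement alone


print(translate_branch("= « move D0,D3 »"))
```

A statement such as '-(a0).l = -11' is left unchanged, since its '-' is not followed by a blank.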
-
-
- Breaks.
-
- As a further means to avoid writing labels, the symbols '¶' (Alt '6')
- and '§' (Alt single-quote) can be used as targets of branch instructions.
- '¶' refers to the position of the next following unmatched right brace,
- and '§' refers to the end of the current procedure or named code section.
- The "end" is just before the 'movem.l (sp)+,...' to restore registers
- if a register save list was given, and for a procedure, before the
- 'rts' instruction generated by web.
-
- Note that '¶' does not refer to ']', only '}'. Thus one can break out of
- deeply nested inner loops for which brackets were used to the end of
- an outer loop enclosed by braces.
-
- Infix Statements.
-
- Web allows a few C-style infix statements. However in most cases
- the intended length for the generated assembler mnemonic must be
- explicitly given as a suffix on the first operand. Here are examples
- for some byte-length operations:
-
- use d0.b = d2 for move.b d2,d0
- use d0.b += d2 for add.b d2,d0
- use d0.b -= d2 for sub.b d2,d0
- use d0.b ? d2 for cmp.b d2,d0
-
- Similarly, word and long word operations can be expressed using '.w'
- and '.l' suffixes. '*=', '/=', '&=', '|=', '<<=', '>>=' generate respectively
- instructions mulu, divu, and, or, lsl, lsr. An lea instruction is generated
- when the second operand begins with '&' or '^':
-
- use a0 = &buffer for lea buffer,a0
- use a0 = ^buffer for lea buffer,a0
-
- If the second operand begins with single-quote, '@', '%', or digits
- optionally preceded by '-' but not followed by '(', then '#' is
- prefixed to it. For instance:
-
- use d0.b = 'z' for move.b #'z',d0
- use (a0)+.w ? 312 for cmp.w #312,(a0)+
- use -(a0).l = -11 for move.l #-11,-(a0)
- use verf.b -= 1(a0) for sub.b 1(a0),verf
- use verf.b -= %10(a0) for sub.b #%10(a0),verf (sic!!)
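
The rules given so far for infix statements with an explicit length suffix can be sketched in Python. This is an illustration of the rules as documented, not web's code. The names translate and needs_hash are made up, the quick forms (addq, subq, moveq) are left out, and it is an assumption that the six operators from '*=' to '>>=' take the length suffix in the same way as the first four.

```python
import re

MNEMONICS = {
    "=": "move", "+=": "add", "-=": "sub", "?": "cmp",
    "*=": "mulu", "/=": "divu", "&=": "and", "|=": "or",
    "<<=": "lsl", ">>=": "lsr",
}


def needs_hash(operand):
    """True if '#' is to be put in front of the second operand."""
    if operand[:1] in ("'", "@", "%"):
        return True
    digits = re.match(r"-?\d+", operand)
    # Digits count only when they are not followed by '('.
    return bool(digits) and operand[digits.end():digits.end() + 1] != "("


def translate(statement):
    """Translate 'first op second' where first carries a length suffix."""
    first, op, second = statement.split(None, 2)
    if not second.startswith("'"):
        second = second.replace(" ", "")  # spaces in expressions are removed
    if op == "=" and second[:1] in ("&", "^"):
        return "lea %s,%s" % (second[1:], first)
    first, length = first.rsplit(".", 1)
    if needs_hash(second):
        second = "#" + second
    return "%s.%s %s,%s" % (MNEMONICS[op], length.lower(), second, first)


print(translate("(a0)+.w ? 312"))
print(translate("verf.b -= %10(a0)"))
```

The second call reproduces the behaviour marked '(sic!!)' above, because the test for a following '(' is applied only to operands that begin with digits.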
-
- For add's and sub's, if the second operand begins with a single digit from
- 1 to 8 (not followed by '('), addq and subq are used:
-
- use memvar.w += 7 for addq.w #7,memvar
- use memvar.w += 7+2 for addq.w #7+2,memvar (sic!!)
- use memvar.w += 9 for add.w #9,memvar
-
- Substitution for defined symbols is done before the conversion of
- infix statements, so in the above cases the operands could be in
- symbolic form without preventing web from finding the appropriate
- assembler instruction. But web does not analyze assembler 'equ'
- directives, so equ'd symbols should not be used for operands of
- infix statements.
-
- When the first operand of an infix statement is not given a
- length suffix, then provided certain conditions are satisfied, web
- will use a moveq instruction for an '=' statement and an addq.l
- or subq.l for '+=' or '-=' statements. Otherwise, the instruction
- mnemonic is just passed on to the assembler without a length suffix.
- For moveq, 'd' and a digit must precede the '=' sign (signifying, one hopes,
- that a data register is referred to) and a digit or '-' digit must follow
- (unless followed by '('). For addq.l and subq.l, the second operand
- must be a digit from 1 to 8.
-
- use d0 = 0 for moveq #0,d0
- use had0 = 0 for moveq #0,had0 (sic!!)
- use d0 = 128 for moveq #128,d0 (sic!!)
- use a0 += 4 for addq.l #4,a0
-
- If the second operand is an expression intended for the assembler
- to evaluate, spacing may be used freely, since the spaces will be
- removed by web.
-
- use d0 = 1 + 28 for moveq #1+28,d0
-
-
- Length Typing.
-
- The treatment of length suffixes in infix statements as discussed
- above has this drawback: length is made to appear to be a property
- of a variable (which is often reasonable), yet only when the variable
- appears at the left of an infix operator. As a partial corrective,
- there is an added facility for expressing variable references in a
- uniform way. If the length suffixes are given with capital letters,
- the information they convey will be made use of in translating infix
- statements, but deleted everywhere else. One might have a sequence
- such as:                        for:
-   d2.B = memvar.B                 move.b memvar,d2
-   d1.B = d2.B                     move.b d2,d1
-   d1.B ? verf.B                   cmp.b verf,d1
-   != « memvar.B += d1.B »         beq somelabel
-   ...                             add.b d1,memvar
-                                 somelabel:
-                                   ...
-
- In this example, no use will be made of the '.B's on the right hand
- sides; they are for appearance only. Now in conjunction with
- the define feature, length information can be made implicit, if one
- wishes. Accompanied by:
-
- define snail D1.B
- define clips D2.B
-
- the above could be written:
-
- clips = memvar.B
- snail = clips
- snail ? verf.B
- != « memvar.B += snail »
- ...
-
- One does not lose any flexibility in following such a policy of variable
- usage, since implicit lengths can always be explicitly overridden.
- While the above 'snail = clips' translates to 'move.b D2,D1', if
- for some reason a long operation were wanted, one could write, e.g.,
- 'snail.l = clips', which would translate to 'move.l D2,D1'.
-
-
- Data Declarations.
-
- Built on top of the preceding method of hiding variable lengths
- is a primitive provision for declaring memory variables. The keywords
- 'long', 'word', and 'byte' define the following symbol as having the
- named length, and a bss section is appended to the output file requesting
- the allocation of memory to hold the data. For the above example, one
- could add the declarations:
-
- byte memvar
- byte verf
-
- and revise the code to read:
-
- clips = memvar
- snail = clips
- snail ? verf
- != « memvar += snail »
- ...
-
- In the output file, this would produce:
-
- move.b .0001Q,D2
- move.b D2,D1
- cmp.b .0002Q,D1
- beq .0001
- add.b D1,.0001Q
- .0001
- .8001
- ...
- section webroom,bss
- ...
- .0001Q DS.B 1
- .0002Q DS.B 1
- ...
- end
-
- (The actual labels generated would depend on what code had preceded. The
- label '.8001' is there as the target of a potential branch to ¶ instruction,
- though none occurs in this instance.)
-
- Long words declared in this way will be aligned on long word
- boundaries, and words aligned on word boundaries.
-
-
- Converting old assembly files.
-
- A separate utility 'atow' will convert move, add, and sub instructions
- to web's '=' format.
-
-
- Bugs.
-
- There is practically no error checking, aside from checking for
- balanced braces in each section.
-
- See the above lines with '(sic!!)' at the end.
-
- The language interpreted by web does not have a regular syntax. The
- only criterion for correctness is whether the code will be translated
- into proper and appropriate assembly language by the simple text
- transformations discussed above. Consequently, in writing code for
- web one must often do the prospective translation mentally in order to
- decide what form statements should take.
-
-
-