Club Amiga de Montreal

Text File | 1986-11-20 | 17.9 KB | 450 lines

Web
Greg Lee
July 8, 1986

General Description.

Web is a preprocessor for assembly language programs. It converts assembly language programs that use certain structuring conventions into files acceptable to a 68000 assembler, in particular the Metacomco assembler for the Amiga. Although inspired by Donald Knuth's system of the same name (see "Literate Programming", The Computer Journal, v.27, no.2, 1984, pp.97-111), it is far more primitive than Knuth's system, and the syntax of the language it interprets differs from that of the real WEB.

Structuring.

The structuring conventions that web makes available fall into the following categories:

     comments (numbered comment sections, etc.)
     procedures (can be given more readable names)
     named code sections (code defined apart from where it is used)
     defined symbols (for which text strings will be substituted)
     line separator (';' separates multiple statements on a line)
     statement grouping (using '{...}', '«...»', and '[...]')
     branch mnemonics (using '=' for 'bne', etc.)
     breaks (an easy way to branch to the end of a section)
     infix statements (express move/add/sub/lea/cmp with '='/'?' infix)
     length typing (associate byte/word/long with variables)
     data declarations (declare simple uninitialized data variables)

Some details about these can be found below. The source for web itself makes use of these conventions, and it is the principal documentation for the use of web, both as an example and as a source of details about the implementation. The source is not put forward as an ideal model of style (you can do better).

Usage.

The source code that web is to process is assumed to be in a file whose name ends in '.w', as for example 'web.w'. When you invoke web, give the name of the file on the command line, with or without the '.w' suffix, as you choose. The output file, which is to be assembled, will have the same name, except ending in '.a' instead of '.w'.
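The naming rule can be sketched in a few lines of Python. The function name is hypothetical, and the sketch covers only how the two file names are derived, not web's actual file handling; any directory prefix in the argument is kept unchanged.

```python
def web_file_names(arg):
    """Return (input_name, output_name) for a web command-line argument.

    Hypothetical helper illustrating the naming rule only: the '.w'
    suffix is optional on the command line, and the output name replaces
    it with '.a'. Any directory prefix is kept as given.
    """
    source = arg if arg.endswith(".w") else arg + ".w"
    return source, source[: -len(".w")] + ".a"


print(web_file_names("web"))  # ('web.w', 'web.a')
```

Both 'web web' and 'web web.w' thus lead to the same pair of names.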
The output file is created in the same directory as the '.w' file, and any pre-existing file of that name is deleted. For example, the command for web to process itself is either 'web web' or 'web web.w', and the assembly file created is 'web.a'.

The largest file that web can process is fixed by the size of an internal buffer, currently set at 80,000 bytes. The source file can typically be about twice this size, since comments and multiple spaces are removed before the text is placed in the buffer. The total number of procedures, named code sections, and definitions must not exceed the size of an internal table, which currently holds a maximum of 400 entries. Since the source code is available, you can easily change these limits.

Optionality.

The use of every part of web's extended syntax is optional. A program that could have been assembled directly can be sent through a web preprocessing stage without any change in the resulting object code. So, if there are aspects of the syntax you don't like, you need not use the related conventions. However, there are some incompatibilities between the languages that the assembler and web accept, so not every correct assembly language program can be processed by web without problems.
Here is a list of the potential difficulties:

     capital letters -- the first non-blank character of a line must not be a capital letter unless the intention is to name a procedure; also, the sequences '.B', '.W', '.L' are elided

     semicolon -- this must not be used for comments, since web understands it as a line separator

     colon -- after a label terminated with a colon, web starts a new line, so one can't use a colon with a label before the assembler directives macro, reg, set, equ, equr, section, rorg

     data sections -- web does not understand data sections, so they must be treated as procedures (never called, of course); at the end of a data section, one must give a 'section ...,code' directive, so that the assembler will accept the 'rts' instruction that web adds at the end of the section

     angle brackets -- assembler macros that take string arguments may have these arguments enclosed in angle brackets; web does not know that such angles should protect the enclosed characters from being changed

     double quotes -- web understands that single quotes protect characters, but it does not understand double quotes; use single quotes around file names after the 'include' directive

     keywords -- when the words 'define', 'byte', 'word', 'long' occur first on a line, possibly preceded by spaces, they are interpreted in a special way (detailed below)

Comments.

A paragraph of commentary can be introduced by a number followed by a period, by the Amiga topaz section character '§' (Alt single quote), or by the paragraph character '¶' (Alt '6'). The paragraph introducer is the first non-blank thing on the line. Such a paragraph is terminated by an empty line.

A second way of giving comments is to place them in parentheses. The initial left parenthesis must be the very first thing on the line, but the closing right parenthesis can come anywhere after it, except within another comment or enclosed in single quotes.
Pairs of parentheses may occur within a parenthetical comment, so one can comment out portions of code that contain parenthetical comments.

Thirdly, the bullet character '·' and everything on the remainder of its line is treated as a comment. This replaces the assembler's use of ';', which is no longer available with web.

Lastly, '*' at the beginning of a line makes that line a comment. This is also an assembler convention.

The assembler convention of placing comments after complete code instructions will also often work, but it is not recommended, since it sometimes interferes with web's syntactic analysis. It definitely interferes after an infix statement (see below).

Procedures.

Subroutines can be given long names, which may contain spaces. A line beginning with a capital letter is treated as such a name, and the lines following, up to the next procedure name or named code section or the end of the file, are part of the procedure. To call the procedure, just give its name on a line, preceded by one or more blanks (which may in turn be preceded by a left statement grouper or a colon-terminated label).

Web generates an 'rts' instruction at the end of a procedure definition. At the end of the output file, it also generates an 'end' directive. Material at the beginning of the input file, before any procedure name or named code section, is treated as the body of an unnamed procedure, except that no 'rts' is added at the end.

Procedures can be defined with parameters and with a list of registers to save. The list of parameters is given in parentheses following the name of the procedure, and the list of registers to save after that, in the form used by the 'movem' instruction. When the procedure is called with a parenthesized list of arguments, a 'move.l' instruction is generated for each name in the argument list, with the corresponding name in the parameter list as its destination.
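As an illustration only (not web's own code), here is a minimal Python sketch of how the argument moves pair up, assuming the parameter and argument lists have already been split into names. The function name is hypothetical. Following the rule stated in the next paragraph, a missing name on either side produces no move.

```python
from itertools import zip_longest


def argument_moves(params, args):
    """Return the 'move.l' lines generated for a call with arguments.

    params: names from the procedure's parameter list, in order.
    args:   names from the call's argument list, in order.
    Arguments are paired with parameters by position. If either name of
    a pair is missing (a shorter list, or an empty slot ''), no move is
    generated for that position. Redundant moves are not detected.
    """
    moves = []
    for dest, src in zip_longest(params, args):
        if dest and src:  # None or '' means the name is missing
            moves.append(f"move.l {src},{dest}")
    return moves


print(argument_moves(["d0", "d1", "a0"], ["count", "", "buf"]))
# ['move.l count,d0', 'move.l buf,a0']
```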
If one of the corresponding names is missing, no move instruction is generated. Redundant moves, and moves to a destination that is also the source of a later move in the same series, are not checked for.

Named Code Sections.

Named code sections are in-line procedures. A line beginning with two hyphens is treated as the name of a code section, and the section is defined as all following lines up to the next procedure name or named code section or the end of the file. To invoke the code, just give its name (including the hyphens) on a line, preceded by one or more blanks (which may in turn be preceded by a left brace or a colon-terminated label). The definition is then substituted for the name.

Such code sections need not be defined before they are invoked, and their definitions may contain invocations of other named code sections, with no restriction on the level of nesting. However, if there is recursion, the output file will exceed the capacity of most disk drives, since it will be infinitely long. Parameters and a register save list work just as they do for procedures.

Due to a mistake in the implementation, if a given code section is to be invoked more than once, it must not contain '{...}', '«...»', or '[...]' (see below), since then the generated labels will not be unique.

Defines.

Symbols can be defined as standing for a string of text. For instance, if the input file has the line:

     define new_line #10

then the text '#10' is substituted for every occurrence of 'new_line' in the output file. A symbol is a span of lower case letters, '_', or the grave accent. In the definition, the substitution text starts with the first non-blank character after the defined symbol and continues to the end of the line. Symbols cannot be redefined, and the substitution text cannot contain material that web is responsible for processing, except that it can contain other defined symbols. As with named code sections, there is no protection against recursive definitions.
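A minimal Python sketch of define substitution, assuming a simple table of definitions: each maximal span of lower case letters, '_' or '`' is looked up, and substitution texts are expanded recursively because they may contain other defined symbols. The helper names are hypothetical, the depth limit is an addition of the sketch (web has none), and labels and quoted text are not treated specially.

```python
import re

# A symbol: a maximal span of lower case letters, '_' or the grave accent.
SYMBOL = re.compile(r"[a-z_`]+")


def parse_define(line):
    """Split 'define name text' into (name, text); hypothetical helper."""
    _, name, text = line.strip().split(None, 2)
    return name, text


def expand(text, table, depth=0):
    """Replace every defined symbol in text by its expanded definition.

    The depth limit is only a guard for this sketch; web itself gives no
    protection against recursive definitions.
    """
    if depth > 50:
        raise ValueError("recursive define")

    def replace(match):
        word = match.group(0)
        if word in table:
            return expand(table[word], table, depth + 1)
        return word

    return SYMBOL.sub(replace, text)


table = dict(parse_define(line) for line in [
    "define new_line #10",
    "define eol new_line",  # substitution text holding another symbol
])
print(expand("move.b eol,d0", table))  # move.b #10,d0
```

Note that under the stated rule the 'd' of 'd0' is itself a symbol span, so defining a symbol named 'd' would alter register names.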
No symbol substitutions are made within labels that are followed by a colon or that stand alone on a line.

Line Separator.

One can put multiple code instructions on one line by separating them with ';'. However, the titles of procedures or named code sections, 'define's, and data declarations cannot come after ';'.

Statement Grouping.

As an alternative to making up your own labels for program branches, you can have web do this for you. Each pair of matching left and right braces is given a unique label, which is substituted for both braces. For example, instead of

          bne lab23
          move D0,D3
     lab23:

you can write

          bne {
          move D0,D3
          }

After the label for a left brace, a new line is begun in the output, and left indentation is removed before the labels substituted for either brace. In addition, '«' (Alt shift '+') is provided as a synonym for '{', and '»' (Alt ';') is a synonym for ';}', so instead of the above, you can also write:

          bne « move D0,D3 »

A right brace could not be used in this last example, because the label substituted for it would be put on the same line as the move instruction.

Yet another alternative for statement grouping is to use '[...]'. These work like braces and can be used for notational variety. However, they generate a different set of labels, and they need not be properly nested with respect to braces. For example, one can make unstructured loops like this:

          {
          add.w d0,d1
          [
          subq.l #1,d3
          bpl }
          moveq #30,d3
          subq.l #1,d4
          bpl ]

Branch Mnemonics.

Used with braces, the ordinary branch mnemonics are unintuitive, because the branch around the grouped code is taken when that code is to be skipped. Consequently, the following alternative symbols are provided; except for '->', they express the condition under which the grouped code is executed, and '±' marks a signed comparison:

     use <=±  for bgt
     use <=   for bhi
     use >=±  for blt
     use >=   for bcs
     use ~=   for beq
     use !=   for beq
     use ->   for bra
     use <±   for bge
     use <    for bcc
     use >±   for ble
     use >    for bls
     use =    for bne
     use -    for bpl   ('-' must be followed by a blank)
     use +    for bmi

Thus the above example could also have been written:

          = « move D0,D3 »

Breaks.
As a further means of avoiding labels, the symbols '¶' (Alt '6') and '§' (Alt single quote) can be used as targets of branch instructions. '¶' refers to the position of the next following unmatched right brace, and '§' refers to the end of the current procedure or named code section. This "end" is just before the 'movem.l (sp)+,...' that restores registers if a register save list was given, and, for a procedure, before the 'rts' instruction generated by web.

Note that '¶' refers only to '}', not to ']'. Thus one can break out of deeply nested inner loops, for which brackets were used, to the end of an outer loop enclosed by braces.

Infix Statements.

Web allows a few C-style infix statements. However, in most cases the intended length for the generated assembler mnemonic must be given explicitly as a suffix on the first operand. Here are examples for some byte-length operations:

     use d0.b = d2            for move.b d2,d0
     use d0.b += d2           for add.b d2,d0
     use d0.b -= d2           for sub.b d2,d0
     use d0.b ? d2            for cmp.b d2,d0

Word and long word operations are expressed similarly, using '.w' and '.l' suffixes. '*=', '/=', '&=', '|=', '<<=', and '>>=' generate the instructions mulu, divu, and, or, lsl, and lsr, respectively.

An lea instruction is generated when the second operand begins with '&' or '^':

     use a0 = &buffer         for lea buffer,a0
     use a0 = ^buffer         for lea buffer,a0

If the second operand begins with a single quote, '@', or '%', or with digits (optionally preceded by '-') that are not followed by '(', then '#' is placed in front of it. For instance:

     use d0.b = 'z'           for move.b #'z',d0
     use (a0)+.w ? 312        for cmp.w #312,(a0)+
     use -(a0).l = -11        for move.l #-11,-(a0)
     use verf.b -= 1(a0)      for sub.b 1(a0),verf
     use verf.b -= %10(a0)    for sub.b #%10(a0),verf   (sic!!)

For additions and subtractions, if the second operand begins with a single digit from 1 to 8 (not followed by '('), addq and subq are used:

     use memvar.w += 7        for addq.w #7,memvar
     use memvar.w += 7+2      for addq.w #7+2,memvar    (sic!!)
     use memvar.w += 9        for add.w #9,memvar

Substitution for defined symbols is done before infix statements are converted, so in the above cases the operands could be in symbolic form without preventing web from finding the appropriate assembler instruction. But web does not analyze assembler 'equ' directives, so equ'd symbols should not be used as operands of infix statements.

When the first operand of an infix statement is not given a length suffix, then, provided certain conditions are satisfied, web uses a moveq instruction for an '=' statement and an addq.l or subq.l for '+=' or '-=' statements. Otherwise, the instruction mnemonic is passed on to the assembler without a length suffix. For moveq, 'd' and a digit must precede the '=' sign (signifying, one hopes, that a data register is meant), and a digit, or '-' and a digit, must follow it (unless followed by '('). For addq.l and subq.l, the second operand must be a digit from 1 to 8.

     use d0 = 0               for moveq #0,d0
     use had0 = 0             for moveq #0,had0         (sic!!)
     use d0 = 128             for moveq #128,d0         (sic!!)
     use a0 += 4              for addq.l #4,a0

If the second operand is an expression intended for the assembler to evaluate, spacing may be used freely, since web removes the spaces:

     use d0 = 1 + 28          for moveq #1+28,d0

Length Typing.

The treatment of length suffixes in infix statements described above has this drawback: length is made to appear to be a property of a variable (which is often reasonable), yet only when the variable appears to the left of an infix operator. As a partial corrective, there is an added facility for expressing variable references uniformly. If the length suffixes are given with capital letters, the information they convey is used in translating infix statements but deleted everywhere else. One might have a sequence such as:

          d2.B = memvar.B
          d1.B = d2.B
          d1.B ? verf.B
          != « memvar.B += d1.B »
          ...

for:

          move.b memvar,d2
          move.b d2,d1
          cmp.b verf,d1
          beq somelabel
          add.b d1,memvar
     somelabel:
          ...
In this example, the '.B's on the right-hand sides are not used; they are there for appearance only.

In conjunction with the define feature, length information can now be made implicit, if one wishes. Accompanied by:

          define snail D1.B
          define clips D2.B

the above could be written:

          clips = memvar.B
          snail = clips
          snail ? verf.B
          != « memvar.B += snail »
          ...

One does not lose any flexibility by following such a policy of variable usage, since implicit lengths can always be explicitly overridden. While the above 'snail = clips' translates to 'move.b D2,D1', if for some reason a long operation were wanted, one could write 'snail.l = clips', which would translate to 'move.l D2,D1'.

Data Declarations.

Built on top of the preceding method of hiding variable lengths is a primitive provision for declaring memory variables. The keywords 'long', 'word', and 'byte' declare the following symbol as having the named length, and a bss section is appended to the output file requesting the allocation of memory to hold the data. For the above example, one could add the declarations:

          byte memvar
          byte verf

and revise the code to read:

          clips = memvar
          snail = clips
          snail ? verf
          != « memvar += snail »
          ...

In the output file, this would produce:

          move.b .0001Q,D2
          move.b D2,D1
          cmp.b .0002Q,D1
          beq .0001
          add.b D1,.0001Q
     .0001
     .8001
          ...
          section webroom,bss
          ...
     .0001Q DS.B 1
     .0002Q DS.B 1
          ...
          end

(The actual labels generated depend on the code that precedes. The label '.8001' is there as the target of a potential branch to '¶', though none occurs in this instance.) Long words declared in this way are aligned on long word boundaries, and words on word boundaries.

Converting old assembly files.

A separate utility, 'atow', converts move, add, and sub instructions to web's '=' format.

Bugs.

There is practically no error checking, aside from checking for balanced braces in each section. See the lines marked '(sic!!)' above.
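The '(sic!!)' lines follow directly from applying the infix rules as plain text. The Python sketch below applies those rules, including capital-letter length typing, as simple pattern matches with no range or operand checking. It is an illustration only, not web's own code (web is written in web-structured 68000 assembly). Its names, the assumption that each operator is surrounded by blanks, and the uniform appending of the length suffix (which the text does not spell out for mulu, divu, and, or, lsl, lsr) are assumptions of the sketch.

```python
import re

# Assumed statement shape: first operand, blank(s), operator, blank(s), rest.
STATEMENT = re.compile(
    r"^\s*(\S+)\s+(<<=|>>=|\+=|-=|\*=|/=|&=|\|=|=|\?)\s+(.+?)\s*$")

MNEMONIC = {"=": "move", "+=": "add", "-=": "sub", "?": "cmp",
            "*=": "mulu", "/=": "divu", "&=": "and", "|=": "or",
            "<<=": "lsl", ">>=": "lsr"}


def leading_number(src):
    """True if src starts with digits (optionally after '-') not followed by '('."""
    m = re.match(r"-?\d+", src)
    return bool(m) and not src[m.end():].startswith("(")


def needs_hash(src):
    """The '#' rule: a quote, '@' or '%' first, or a leading number."""
    return src[:1] in ("'", "@", "%") or leading_number(src)


def translate(stmt):
    """Translate one infix statement by text rules alone (no range checks)."""
    m = STATEMENT.match(stmt)
    if not m:
        return stmt  # not an infix statement: passed through unchanged
    dest, op, src = m.groups()
    if not src.startswith("'"):
        src = re.sub(r"\s+", "", src)  # spaces in expressions are removed
    src = re.sub(r"\.[BWL]$", "", src)  # capital suffix: appearance only
    sized = re.fullmatch(r"(.+)\.([bwlBWL])", dest)
    if sized:
        dest, size = sized.group(1), "." + sized.group(2).lower()
    else:
        size = ""
    if op == "=" and src[:1] in ("&", "^"):
        return f"lea {src[1:]},{dest}"
    mnemonic = MNEMONIC[op]
    if not size:
        if op == "=" and re.search(r"d\d$", dest) and leading_number(src):
            mnemonic = "moveq"  # 'd' plus a digit before '=' is enough
        elif op in ("+=", "-=") and re.fullmatch(r"[1-8]", src):
            mnemonic += "q.l"  # addq.l / subq.l
    elif op in ("+=", "-=") and re.match(r"[1-8](?![\d(])", src):
        mnemonic += "q"  # only the leading digit is checked, so 7+2 passes
    if needs_hash(src):
        src = "#" + src
    return f"{mnemonic}{size} {src},{dest}"


for s in ["d0 = 128", "had0 = 0", "memvar.w += 7+2", "verf.b -= %10(a0)"]:
    print(f"{s:20} -> {translate(s)}")
```

Because nothing checks operand ranges, 'd0 = 128' becomes a moveq although moveq only accepts -128 to 127, and 'memvar.w += 7+2' becomes an addq with the value 9, outside addq's range of 1 to 8. At best, the assembler catches such mistakes.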
The language interpreted by web does not have a regular syntax. The only criterion for correctness is whether the code will be translated into proper and appropriate assembly language by the simple text transformations discussed above. Consequently, in writing code for web one must often do the prospective translation mentally in order to decide what form statements should take.