home *** CD-ROM | disk | FTP | other *** search
-
- UNIX Assembler Reference Manual
-
- Dennis M. Ritchie
-
- Bell Laboratories
- Murray Hill, New Jersey
-
-
-
- 0. Introduction
-
- This document describes the usage and input syntax of the UNIX PDP-11
- assembler as. The details of the PDP-11 are not described; consult the DEC
- documents PDP-11/20 Handbook and PDP-11/45 Handbook.
-
- The input syntax of the UNIX assembler is generally similar to that of the
- DEC assembler PAL-11R, although its internal workings and output format are
- unrelated. It may be useful to read the publication DEC-11-ASDB-D, which
- describes PAL-11R, although naturally one must use care in assuming that
- its rules apply to as.
-
- As is a rather ordinary two-pass assembler without macro capabilities. It
- produces an output file which contains relocation information and a
- complete symbol table; thus the output is acceptable to the UNIX link-
- editor ld, which may be used to combine the outputs of several assembler
- runs and to obtain object programs from libraries. The output format has
- been designed so that if a program contains no unresolved references to
- external symbols, it is executable without further processing.
-
- 1. Usage
-
- as is used as follows:
-
- as [ - ] file
-
- If the optional '-' argument is given, all undefined symbols in the current
- assembly will be made undefined external. See the .globl directive below.
-
- The other arguments name files which are concatenated and assembled. Thus
- programs may be written in several pieces and assembled together.
-
- The output of the assembler is placed on the file a.out in the current
- directory. If there were no unresolved external references, and no errors
- detected, a.out is made executable; otherwise, if it is produced at all, it
- is made non-executable.
-
- 2. Lexical conventions
-
- Assembler tokens include identifiers (alternatively, symbols or names),
- temporary symbols, constants, and operators.
-
- 2.1 Identifiers
-
- An identifier consists of a sequence of alphanumeric characters (including
- period '.' , underscore '_' , and tilde '~' as alphanumeric) of which the
- first may not be numeric. Only the first eight characters are significant.
- When a name begins with a tilde, the tilde is discarded and that occurrence
- of the identifier generates a unique entry in the symbol table which can
- match no other occurrence of the identifier. This feature is used by the C
- compiler to place names of local variables in the output symbol table
- without having to worry about making them unique.
-
- 2.2 Temporary symbols
-
- A temporary symbol consists of a digit followed by f or Temporary symbols
- are discussed fully in 5.1.
-
- 2.3 Constants
-
- An octal constant consists of a sequence of digits; 8 and The constant is
- truncated to 16 bits and interpreted in two's complement notation.
-
- A decimal constant consists of a sequence of digits terminated by a decimal
- point '.'. The magnitude of the constant should be representable in 15
- bits; i.e., be less than 32,768.
-
- A single-character constant consists of a single quote ''' followed by an
- ASCII character not a new-line. Certain dual character escape sequences
- are acceptable in place of the ASCII character to represent new-line and
- other non-graphics (see String statements, 5.5). The constant's value has
- the code for the given character in the least significant byte of the word
- and is null-padded on the left.
-
- A double-character constant consists of a double quote '"' followed by a
- pair of ASCII characters not including new-line. Certain dual-character
- escape sequences are acceptable in place of either of the ASCII characters
- to represent new-line and other non-graphics (see String statements, 5.5).
- The constant's value has the code for the first given character in the
- least significant byte and that for the second character in the most
- significant byte.
-
- 2.4 Operators
-
- There are several single- and double-character operators; see 6.
-
- 2.5 Blanks
-
- Blank and tab characters may be interspersed freely between to tokens , but
- may not be used within tokens (except character constants). A blank or tab
- is required to separate adjacent identifiers or constants not otherwise
- separated.
-
- 2.6 Comments
-
- The character '/' introduces a comment, which extends through the end of
- the line on which it appears. Comments are ignored by the assembler.
-
- 3. Segments
-
- Assembled code and data fall into three segments: the text seg-ment, the
- data segment, and the bss segment. The text segment is the one in which
- the assembler begins, and it is the one into which instructions are
- typically placed. The UNIX system will, if desired, enforce the purity of
- the text segment of programs by trapping write operations into it. Object
- programs produced by the assembler must be processed by the link-editor ld
- (using its '_' flag) if the text segment is to be write-protected. A
- single copy of the text segment is shared among all processes executing
- such a program.
-
- The data segment is available for placing data or instructions which will
- be modified during execution. Anything which may go in the text segment
- may be put into the data segment. In pro-grams with write-protected,
- sharable text segments, data segment contains the initialized but variable
- parts of a program. If the text segment is not pure, the data segment
- begins immediately after the text segment; if the text segment is pure, the
- data segment begins at the lowest 8K byte boundary after the text seg-ment.
-
- The bss segment may not contain any explicitly initialized code or data.
- The length of the bss segment (like that of text or data) is determined by
- the high-water mark of the location counter within it. The bss segment is
- actually an extension of the data segment and begins immediately after it.
- At the start of execution of a program, the bss segment is set to 0.
- Typically the bss segment is set up by statements exemplified by
-
- lab: . = .+10
-
- The advantage in using the bss segment for storage that starts off empty is
- that the initialization information need not be stored in the output file.
- See also Location counter and Assignment statements below.
-
- 4. The location counter
-
- One special symbol, '.', is the location counter. Its value at any time is
- the offset within the appropriate segment of the start of the statement in
- which it appears. The location counter may be assigned to, with the
- restriction that the current segment may not change; furthermore, the value
- of '.' may not de-crease. If the effect of the assignment is to increase
- the value of '.', the required number of null bytes are generated (but see
- Segments above).
-
- 5. Statements
-
- A source program is composed of a sequence of statements. Statements are
- separated either by new-lines or by semicolons. There are five kinds of
- statements: null statements, expression statements, assignment statements,
- string statements, and keyword statements.
-
- Any kind of statement may be preceded by one or more labels.
-
- 5.1 Labels
-
- There are two kinds of label: name labels and numeric labels. A name label
- consists of a name followed by a colon ':'. The effect of a name label is
- to assign the current value and type of the location counter '.' to the
- name. An error is indicated in pass 1 if the name is already defined; an
- error is indicated in pass 2 if the '.' value assigned changes the
- definition of the label.
-
- A numeric label consists of a digit 0 to 9 followed by a colon ':'. Such a
- label serves to define temporary symbols of the form Nb and Nf, where N is
- the digit of the label. As in the case of name labels, a numeric label
- assigns the current value and type of '.' to the temporary symbol.
- However, several numeric labels with the same digit may be used within the
- same assembly. References of the form Nf refer to the first numeric label
- N: forward from the reference; backward from the reference. This sort of
- temporary label was introduced by Knuth [The Art of Computer Programming,
- Vol I: Fundamental Algorithms]. Such labels tend to conserve both the
- symbol table space of the assembler and the inventive powers of the
- programmer.
-
- 5.2 Null statements
-
- A null statement is an empty statement (which may, however, have labels).
- A null statement is ignored by the assembler. Common examples of null
- statements are empty lines or lines containing only a label.
-
- 5.3 Expression statements
-
- An expression statement consists of an arithmetic expression not beginning
- with a keyword. The assembler computes its (16-bit) value and places it in
- the output stream, together with the appropriate relocation bits.
-
- 5.4 Assignment statements
-
- An assignment statement consists of an identifier, an equals sign '=', and
- an expression. The value and type of the expression are assigned to the
- identifier. It is not required that the type or value be the same in pass
- 2 as in pass 1, nor is it an error to redefine any symbol by assignment.
-
- Any external attribute of the expression is lost across an assignment.
- This means that it is not possible to declare a global symbol by assigning
- to it, and that it is impossible to define a symbol to be offset from a
- non-locally defined global symbol.
-
- As mentioned, it is permissible to assign to the location counter '.' . It
- is required, however, that the type of the expression assigned be of the
- same type as '.', and it is forbidden to decrease the value of '.' . In
- practice, the most common assignment to . has the form for some number n;
- this has the effect of generating n null bytes.
-
- 5.5 String statements
-
- A string statement generates a sequence of bytes containing ASCII
- characters. A string statement consists of a left string quote '<'
- followed by a sequence of ASCII characters not including newline, followed
- by a right string quote '>'. Any of the ASCII characters may be replaced
- by a two-character escape se-quence to represent certain non-graphic
- characters, as follows:
-
- \n NL (012)
- \t HT (011)
- \e EOT (004)
- \0 NUL (000)
- \r CR (015)
- \a ACK (006)
- \p PFX (033)
- \\ \
- \> >
-
- The last two are included so that the escape character and the right string
- quote may be represented. The same escape sequences may also be used
- within single- and double-character constants (see 2.3 above).
-
- 5.6 Keyword statements
-
- Keyword statements are numerically the most common type, since most machine
- instructions are of this sort. A keyword statement begins with one of the
- many predefined keywords of the assembler; the syntax of the remainder
- depends on the keyword. All the keywords are listed below with the syntax
- they require.
-
- 6. Expressions
-
- An expression is a sequence of symbols representing a value. Its
- constituents are identifiers, constants, temporary symbols, operators, and
- brackets. Each expression has a type.
-
- All operators in expressions are fundamentally binary in nature; if an
- operand is missing on the left, a 0 of absolute type is assumed.
- Arithmetic is two's complement and has 16 bits of precision. All
- operators have equal precedence, and expressions are evaluated strictly
- left to right except for the effect of brackets.
-
- 6.1 Expression operators
-
- The operators are:
-
- (blank) when there is no operator between operands, the effect is exactly
- the same as if a + had appeared.
-
- + addition
-
- - subtraction
-
- * multiplication
-
- \/ division (note that plain / starts a comment)
-
- & bitwise and
-
- | bitwise or
-
- >> logical right shift
-
- << logical left shift
-
- % modulo
-
- ! a!b is a or (not |); i.e., the or of the first operand and the
- one's complement of the second; most common use is as a unary.
-
- ^ result has the value of first operand and the type of the second;
- most often used to define new machine instructions with
- syntax identical to existing instructions.
-
- Expressions may be grouped by use of square brackets '['']'. (Round
- parentheses are reserved for address modes.)
-
- 6.2 Types
-
- The assembler deals with a number of types of expressions. Most types are
- attached to keywords and used to select the routine which treats that
- keyword. The types likely to be met explicitly are:
-
- undefined
- Upon first encounter, each symbol is undefined. It may be-
- come undefined if it is assigned an undefined expression.
- It is an error to attempt to assemble an undefined expres-
- sion in pass 2; in pass 1, it is not (except that certain
- keywords require operands which are not undefined).
-
- undefined external
- A symbol which is declared .globl but not defined in the
- current assembly is an undefined external. If such a sym-
- bol is declared, the link editor ld must be used to load
- the assembler's output with another routine that defines
- the undefined reference.
-
- absolute
- An absolute symbol is one defined ultimately from a con-
- stant. Its value is unaffected by any possible future ap-
- plications of the link-editor to the output file.
-
- text
- The value of a text symbol is measured with respect to the
- beginning of the text segment of the program. If the as-
- sembler output is link-edited, its text symbols may change
- in value since the program need not be the first in the
- link editor's output. Most text symbols are defined by ap-
- pearing as labels. At the start of an assembly, the value
- of '.' is text 0.
-
- data
- The value of a data symbol is measured with respect to the
- origin of the data segment of a program. Like text sym-
- bols, the value of a data symbol may change during a subse-
- quent link-editor run since previously loaded programs may
- have data segments. After the first .data statement, the
- value of '.' is data 0.
-
- bss
- The value of a bss symbol is measured from the beginning of
- the bss segment of a program. Like text and data symbols,
- the value of a bss symbol may change during a subsequent
- link-editor run, since previously loaded programs may have
- bss segments. After the first .bss statement, the value of
- '.' is bss 0.
-
- external absolute, text, data, or bss
- symbols declared .globl but defined within an assembly as
- absolute, text, data, or bss symbols may be used exactly as
- if they were not declared .globl; however, their value and
- type are available to the link editor so that the program
- may be loaded with others that reference these symbols.
-
- register
- The symbols
-
- r0 ... r5
- fr0 ... fr5
- sp
- pc
-
- are predefined as register symbols. Either they or symbols
- defined from them must be used to refer to the six
- general-purpose, six floating-point, and the 2 special-
- purpose machine registers. The behavior of the floating
- register names is identical to that of the corresponding
- general register names; the former are provided as a
- mnemonic aid.
-
- other types
- Each keyword known to the assembler has a type which is
- used to select the routine which processes the associated
- keyword statement. The behavior of such symbols when not
- used as keywords is the same as if they were absolute.
-
- 6.3 Type propagation in expressions
-
- When operands are combined by expression operators, the result has a type
- which depends on the types of the operands and on the operator. The rules
- involved are complex to state but were intended to be sensible and
- predictable. For purposes of expression evaluation the important types
- are:
-
- undefined
- absolute
- text
- data
- bss
- undefined external
- other
-
- The combination rules are then: If one of the operands is undefined, the
- result is undefined. If both operands are absolute, the result is
- absolute. If an absolute is combined with one of the other types mentioned
- above, or with a register expression, the result has the register or other
- type. As a consequence, one can refer to r3 as r0+3. If two operands of
- other type are combined, the result has the numerically larger type (not
- that this fact is very useful, since the values are not made public). An
- other type combined with an explicitly discussed type other than absolute
- acts like an absolute.
-
- Further rules applying to particular operators are:
-
- + If one operand is text-, data-, or bss-segment relocatable,
- or is an undefined external, the result has the postulated
- type and the other operand must be absolute.
-
- - If the first operand is a relocatable text-, data-, or
- bss-segment symbol, the second operand may be absolute (in
- which case the result has the type of the first operand);
- or the second operand may have the same type as the first
- (in which case the result is absolute). If the first
- operand is external undefined, the second must be abso-
- lute. All other combinations are illegal.
-
- ^ This operator follows no other rule than that the result
- has the value of the first operand and the type of the
- second.
-
- others It is illegal to apply these operators to any but abso-
- lute symbols.
-
- 7. Pseudo-operations
-
- The keywords listed below introduce statements which generate data in
- unusual forms or influence the later operations of the assembler. The
- metanotation:
-
- [ stuff ] ...
-
- means that 0 or more instances of the given stuff may appear. Also,
- boldface tokens are literals, italic words are substitutable.
-
- 7.1 .byte expression [ , expression ] ...
-
- The expressions in the comma-separated list are truncated to 8 bits and
- assembled in successive bytes. The expressions must be absolute. This
- statement and the string statement above are the only ones which assemble
- data one byte at at time.
-
- 7.2 .even
-
- If the location counter '.' is odd, it is advanced by one so the next
- statement will be assembled at a word boundary.
-
- 7.3 .if expression
-
- The expression must be absolute and defined in pass 1. If its value is
- nonzero, the .if is ignored; if zero, the statements between the .if and
- the matching .endif (below) are ignored. .if may be nested. The effect of
- .if cannot extend beyond the end of the input file in which it appears.
- (The statements are not totally ignored, in the following sense: .ifs and
- .endifs are scanned for, and moreover all names are entered in the symbol
- table. Thus names occurring only inside an .if will show up as undefined
- if the symbol table is listed.)
-
- 7.4 .endif
-
- This statement marks the end of a conditionally-assembled section of code.
- See .if above.
-
- 7.5 .globl name [ , name ] ...
-
- This statement makes the names external. If they are otherwise defined (by
- assignment or appearance as a label) they act within the assembly exactly
- as if the .globl statement were not given; however, the link editor ld may
- be used to combine this routine with other routines that refer these
- symbols.
-
- Conversely, if the given symbols are not defined within the current
- assembly, the link editor can combine the output of this assembly with that
- of others which define the symbols.
-
- As discussed in 1., it is possible to force the assembler to make all
- otherwise undefined symbols external.
-
- 7.6 .text
- 7.7 .data
- 7.8 .bss
-
- These three pseudo-operations cause the assembler to begin assembling into
- the text, data, or bss segment respectively. Assembly starts in the text
- segment. It is forbidden to assemble any code or data into the bss
- segment, but symbols may be defined and '.' moved about by assignment.
-
- 7.9 .comm name , expression
-
- Provided the name is not defined elsewhere, this statement is equivalent to
-
- .globl name
- name = expression ^ name
-
- That is, the type of name is undefined external, and its value is
- expression. In fact the name behaves in the current assembly just like an
- undefined external. However, the link-editor ld has been special-cased so
- that all external symbols which are not otherwise defined, and which have a
- non-zero value, are defined to lie in the bss segment, and enough space is
- left after the symbol to hold expression bytes. All symbols which become
- defined in this way are located before all the explicitly defined bss-
- segment locations.
-
- 8. Machine instructions
-
- Because of the rather complicated instruction and addressing structure of
- the PDP-11, the syntax of machine instruction state-ments is varied.
- Although the following sections give the syntax in detail, the 11/20 and
- 11/45 handbooks should be consulted on the semantics.
-
- 8.1 Sources and Destinations
-
- The syntax of general source and destination addresses is the same. Each
- must have one of the following forms, where reg is a register symbol, and
- expr is any sort of expression:
-
- syntax words mode
-
- reg 0 0+reg
- (reg)+ 0 2+reg
- _(reg) 0 4+reg
- expr(reg) 1 6+reg
- (reg) 0 1+reg
- *reg 0 1+reg
- *(reg)+ 0 3+reg
- *_(reg) 0 5+reg
- *(reg) 1 7+reg
- *expr(reg) 1 7+reg
- expr 1 67
- $expr 1 27
- *expr 1 77
- *$expr 1 37
-
- The words column gives the number of address words generated; the mode
- column gives the octal address-mode number. The syntax of the address
- forms is identical to that in DEC assemblers, except that '*' has been
- substituted for '@' and '$' for '#'; the UNIX typing conventions make '@'
- and '#' rather inconvenient.
-
- Notice that mode *reg is identical to (reg); that *(reg) generates an index
- word (namely, 0); and that addresses consisting of an unadorned expression
- are assembled as pc-relative references independent of the type of the
- expression. To force a non-relative reference, the form *$expr can be
- used, but notice that further indirection is impossible.
-
- 8.3 Simple machine instructions
-
- The following instructions are defined as absolute symbols:
-
- clc
- clv
- clz
- cln
- sec
- sev
- sez
- sen
-
- They therefore require no special syntax. The PDP-11 hardware
- allows more than one of the clear class, or alternatively
- more than one of the set class to be or-ed together; this may
- be expressed as follows:
-
- clc|clv
-
- 8.4 Branch
-
- The following instructions take an expression as operand. The expression
- must lie in the same segment as the reference, cannot be undefined-
- external, and its value cannot differ from the current location of '.' by
- more than 254 bytes:
-
- br blos
- bne bvc
- beq bvs
- bge bhis
- blt bec (= bcc)
- bgt bcc
- ble blo
- bpl bcs
- bmi bes (= bcs)
- bhi
-
- bes (branch on error set) and bec (branch on error clear) are intended to
- test the error bit returned by system calls (which is the c-bit).
-
- 8.5 Extended branch instructions
-
- The following symbols are followed by an expression representing an address
- in the same segment as '.'. If the target address is close enough, a
- branch-type instruction is generated; if the address is too far away, a jmp
- will be used.
-
- jbr jlos
- jne jvc
- jeq jvs
- jge jhis
- jlt jec
- jgt jcc
- jle jlo
- jpl jcs
- jmi jes
- jhi
-
- jbr turns into a plain jmp if its target is too remote; the oth-ers (whose
- names are contructed by replacing the b in the branch instruction's name by
- j) turn into the converse branch over a jmp to the target address.
-
- 8.6 Single operand instructions
-
- The following symbols are names of single-operand machine instructions.
- The form of address expected is discussed in 8.1 above.
-
- clr sbcb
- clrb ror
- com rorb
- comb rol
- inc rolb
- incb asr
- dec asrb
- decb asl
- neg aslb
- negb jmp
- adc swab
- adcb tst
- sbc tstb
-
- 8.7 Double operand instructions
-
- The following instructions take a general source and destination (8.1),
- separated by a comma, as operands.
-
- mov
- movb
- cmp
- cmpb
- bit
- bitb
- bic
- bicb
- bis
- bisb
- add
- sub
-
- 8.8 Miscellaneous instructions
-
- The following instructions have more specialized syntax. Here reg is a
- register name, src and dst a general source or destination (8.1), and expr
- is an expression:
-
- jsr reg,dst
- rts reg
- sys expr
- ash src,reg (or, als)
- ashc src,reg (or, alsc)
- mul src,reg (or, mpy)
- div src,reg (or, dvd)
- xor reg,dst
- sxt dst
- mark expr
- sob reg,expr
-
- sys is another name for the trap instruction. It is used to code system
- calls. Its operand is required to be expressible in 6 bits. The
- alternative forms for ash, ashc, mul, and div are provided to avoid
- conflict with EAE register names should they be needed.
-
- The expression in mark must be expressible in six bits, and the expression
- in sob must be in the same segment as '.', must not be external-undefined,
- must be less than '.', and must be within 510 bytes of '.'.
-
- 8.9 Floating-point unit instructions
-
- The following floating-point operations are defined, with syntax as
- indicated:
-
- cfcc
- setf
- setd
- seti
- setl
- clrf fdst
- negf fdst
- absf fdst
- tstf fsrc
- movf fsrc,freg (= ldf)
- movf freg,fdst (= stf)
- movif src,freg (= ldcif)
- movfi freg,dst (= stcfi)
- movof fsrc,freg (= ldcdf)
- movfo freg,fdst (= stcfd)
- movie src,freg (= ldexp)
- movei freg,dst (= stexp)
- addf fsrc,freg
- subf fsrc,freg
- mulf fsrc,freg
- divf fsrc,freg
- cmpf fsrc,freg
- modf fsrc,freg
- ldfps src
- stfps dst
- stst dst
-
- fsrc, fdst, and freg mean floating-point source, destination, and register
- respectively. Their syntax is identical to that for their non-floating
- counterparts, but note that only floating re-gisters 0_3 can be a freg.
-
- The names of several of the operations have been changed to bring out an
- analogy with certain fixed-point instructions. The only strange case is
- movf, which turns into either stf or ldf depending respectively on whether
- its first operand is or is not a register. Warning: ldf sets the floating
- condition codes, stf does not.
-
- 9. Other symbols
-
- 9.1 '..'
-
- The symbol '..' is the relocation counter. Just before each assembled word
- is placed in the output stream, the current value of this symbol is added
- to the word if the word refers to a text, data or bss segment location. If
- the output word is a pc-relative address word which refers to an absolute
- location, the value of '..' is subtracted.
-
- Thus the value of '..' can be taken to mean the starting core location of
- the program. In UNIX systems with relocation hardware, the initial value
- of '..' is 0.
-
- The value of '..' may be changed by assignment. Such a course of action is
- sometimes necessary, but the consequences should be carefully thought out.
- It is particularly ticklish to change '..' midway in an assembly or to do
- so in a program which will be treated by the loader, which has its own
- notions of '..'.
-
- 9.2 System calls
-
- The following absolute symbols may be used to code calls to the UNIX system
- (see the sys instruction above).
-
- break nice
- chdir open
- chmod read
- chown seek
- close setuid
- creat signal
- exec stat
- exit stime
- fork stty
- fstat tell
- getuid time
- gtty umount
- link unlink
- makdir wait
- mdate write
- mount
-
- Warning: the wait system call is not the same as the wait instruction,
- which is not defined in the assembler.
-
- 10. Diagnostics
-
- When an input file cannot be read, its name followed by a question mark is
- typed and assembly ceases. When syntactic or semantic errors occur, a
- single-character diagnostic is typed out together with the line number and
- the file name in which it occurred. Errors in pass 1 cause cancellation
- of pass 2. The possible errors are:
-
- ) parentheses error
- ] parentheses error
- > string not terminated properly
- * indirection (*) used illegally
- . illegal assignment to .
- A error in address
- B branch address is odd or too remote
- E error in expression
- F error in local (f or b) type symbol
- G garbage (unknown) character
- I end of file inside an .if
- M multiply defined symbol as label
- O word quantity assembled at odd address
- P phase error_ . different in pass 1 and 2
- R relocation error
- U undefined symbol
- X syntax error
-