Programmer's ROM - The Computer Language Library

home *** CD-ROM | disk | FTP | other *** search

/ Programmer's ROM - The Computer Language Library / programmersrom.iso / ada / lrm / chap02.doc < prev next >

Wrap

Text File | 1988-05-03 | 25.3 KB | 1,322 lines

The following document is a draft of the corresponding chapter of the version of the Ada Reference Manual produced in response to the Ansi Canvass. It is given a limited circulation to Ada implementers and to other groups contributing comments (according to the conventions defined in RRM.comments). This draft should not be referred to in any publication. ANSI-RM-02-v23 - Draft Chapter 2 Lexical Elements version 23 83-02-11 This revision has addressed all comments up to #5795 2. Lexical Elements The text of a program consists of the texts of one or more compilations. The text of a compilation is a sequence of lexical elements, each composed of characters; the rules of composition are given in this chapter. Pragmas, which provide certain information for the compiler, are also described in this chapter. References: character 2.1, compilation 10.1, lexical element 2.2, pragma 2.8 2.1 Character Set The only characters allowed in the text of a program are the graphic characters and format effectors. Each graphic character corresponds to a unique code of the ISO seven-bit coded character set (ISO standard 646), and is represented (visually) by a graphical symbol. Some graphic characters are represented by different graphical symbols in alternative national representations of the ISO character set. The description of the language definition in this standard reference manual uses the ASCII graphical symbols, the ANSI graphical representation of the ISO character set. graphic_character ::= basic_graphic_character | lower_case_letter | other_special_character basic_graphic_character ::= upper_case_letter | digit | special_character | space_character basic_character ::= basic_graphic_character | format_effector The basic character set is sufficient for writing any program. The characters included in each of the categories of basic graphic characters are defined as follows: (a) upper case letters A B C D E F G H I J K L M N O P Q R S T U V W X Y Z (b) digits 0 1 2 3 4 5 6 7 8 9 2 - 1 (c) special characters " # & ' ( ) * + , - . / : ; < = > _ | (d) the space character Format effectors are the ISO (and ASCII) characters called horizontal tabulation, vertical tabulation, carriage return, line feed, and form feed. 2 - 2 The characters included in each of the remaining categories of graphic characters are defined as follows: (e) lower case letters a b c d e f g h i j k l m n o p q r s t u v w x y z (f) other special characters ! $ % ? @ [ \ ] ^ ` { } Allowable replacements for the special characters vertical bar (|), sharp (#), and quotation (") are defined in section 2.10. Notes: The ISO character that corresponds to the sharp graphical symbol in the ASCII representation appears as a pound sterling symbol in the French, German, and United Kingdom standard national representations. In any case, the font design of graphical symbols (for example, whether they are in italic or bold typeface) is not part of the ISO standard. The meanings of the acronyms used in this section are as follows: ANSI stands for American National Standards Institute, ASCII stands for American Standard Code for Information Interchange, and ISO stands for International Organization for Standardization. The following names are used when referring to special characters and other special characters: symbol name symbol name " quotation > greater than # sharp _ underline & ampersand | vertical bar ' apostrophe ! exclamation mark ( left parenthesis $ dollar ) right parenthesis % percent * star, multiply ? question mark + plus @ commercial at , comma [ left square bracket - hyphen, minus \ back-slash . dot, point, period ] right square bracket / slash, divide ^ circumflex : colon ` grave accent ; semicolon { left brace < less than } right brace = equal ` tilde 2.2 Lexical Elements, Separators, and Delimiters The text of a program consists of the texts of one or more compilations. The text of each compilation is a sequence of separate lexical elements. 2 - 3 Each lexical element is either a delimiter, an identifier (which may be a reserved word), a numeric literal, a character literal, a string literal, or a comment. The effect of a program depends only on the particular sequences of lexical elements that form its compilations, excluding the comments, if any. 2 - 4 In some cases an explicit separator is required to separate adjacent lexical elements (namely, when without separation, interpretation as a single lexical element is possible). A separator is any of a space character, a format effector, or the end of a line. A space character is a separator except within a comment, a string literal, or a space character literal. Format effectors other than horizontal tabulation are always separators. Horizontal tabulation is a separator except within a comment. The end of a line is always a separator. The language does not define what causes the end of a line. However if, for a given implementation, the end of a line is signified by one or more characters, then these characters must be format effectors other than horizontal tabulation. In any case, a sequence of one or more format effectors other than horizontal tabulation must cause at least one end of line. One or more separators are allowed between any two adjacent lexical elements, before the first of each compilation, or after the last. At least one separator is required between an identifier or a numeric literal and an adjacent identifier or numeric literal. A delimiter is either one of the following special characters (in the basic character set) & ' ( ) * + , - . / : ; < = > | or one of the following compound delimiters each composed of two adjacent special characters => .. ** := /= >= <= << >> <> Each of the special characters listed for single character delimiters is a single delimiter except if this character is used as a character of a compound delimiter, or as a character of a comment, string literal, character literal, or numeric literal. The remaining forms of lexical element are described in other sections of this chapter. Notes: Each lexical element must fit on one line, since the end of a line is a separator. The quotation, sharp, and underline characters, likewise two adjacent hyphens, are not delimiters, but may form part of other lexical elements. The following names are used when referring to compound delimiters: delimiter name => arrow .. double dot ** double star, exponentiate := assignment (pronounced: "becomes") /= inequality (pronounced: "not equal") >= greater than or equal 2 - 5 <= less than or equal << left label bracket >> right label bracket <> box References: character literal 2.5, comment 2.7, compilation 10.1, format effector 2.1, identifier 2.3, numeric literal 2.4, reserved word 2.9, space character 2.1, special character 2.1, string literal 2.6 2 - 6 2.3 Identifiers Identifiers are used as names and also as reserved words. identifier ::= letter {[underline] letter_or_digit} letter_or_digit ::= letter | digit letter ::= upper_case_letter | lower_case_letter All characters of an identifier are significant, including any underline character inserted between a letter or digit and an adjacent letter or digit. Identifiers differing only in the use of corresponding upper and lower case letters are considered as the same. Examples: COUNT X get_symbol Ethelyn Marion SNOBOL_4 X1 PageCount STORE_NEXT_ITEM Note: No space is allowed within an identifier since a space is a separator. References: digit 2.1, lower case letter 2.1, name 4.1, reserved word 2.9, separator 2.2, space character 2.1, upper case letter 2.1 2.4 Numeric Literals There are two classes of numeric literals: real literals and integer literals. A real literal is a numeric literal that includes a point; an integer literal is a numeric literal without a point. Real literals are the literals of the type universal_real. Integer literals are the literals of the type universal_integer. numeric_literal ::= decimal_literal | based_literal References: literal 4.2, universal_integer type 3.5.4, universal_real type 3.5.6 2.4.1 Decimal Literals 2 - 7 A decimal literal is a numeric literal expressed in the conventional decimal notation (that is, the base is implicitly ten). decimal_literal ::= integer [.integer] [exponent] integer ::= digit {[underline] digit} exponent ::= E [+] integer | E - integer 2 - 8 An underline character inserted between adjacent digits of a decimal literal does not affect the value of this numeric literal. The letter E of the exponent, if any, can be written either in lower case or in upper case, with the same meaning. An exponent indicates the power of ten by which the value of the decimal literal without the exponent is to be multiplied to obtain the value of the decimal literal with the exponent. An exponent for an integer literal must not have a minus sign. Examples: 12 0 1E6 123_456 -- integer literals 12.0 0.0 0.456 3.14159_26 -- real literals 1.34E-12 1.0E+6 -- real literals with exponent Notes: Leading zeros are allowed. No space is allowed in a numeric literal, not even between constituents of the exponent, since a space is a separator. A zero exponent is allowed for an integer literal. References: digit 2.1, lower case letter 2.1, numeric literal 2.4, separator 2.2, space character 2.1, upper case letter 2.1 2.4.2 Based Literals A based literal is a numeric literal expressed in a form that specifies the base explicitly. The base must be at least two and at most sixteen. based_literal ::= base # based_integer [.based_integer] # [exponent] base ::= integer based_integer ::= extended_digit {[underline] extended_digit} extended_digit ::= digit | letter An underline character inserted between adjacent digits of a based literal does not affect the value of this numeric literal. The base and the exponent, if any, are in decimal notation. The only letters allowed as extended digits are the letters A through F for the digits ten through fifteen. A letter in a based literal (either an extended digit or the letter E of an exponent) can be written either in lower case or in upper case, with the same meaning. 2 - 9 The conventional meaning of based notation is assumed; in particular the value of each extended digit of a based literal must be less than the base. An exponent indicates the power of the base by which the value of the based literal without the exponent is to be multiplied to obtain the value of the based literal with the exponent. 2 - 10 Examples: 2#1111_1111# 16#FF# 016#0FF# -- integer literals of value 255 16#E#E1 2#1110_0000# -- integer literals of value 224 16#F.FF#E+2 2#1.1111_1111_111#E11 -- real literals of value 4095.0 References: digit 2.1, exponent 2.4.1, letter 2.3, lower case letter 2.1, numeric literal 2.4, upper case letter 2.1 2.5 Character Literals A character literal is formed by enclosing one of the 95 graphic characters (including the space) between two apostrophe characters. A character literal has a value that belongs to a character type. character_literal ::= 'graphic_character' Examples: 'A' '*' ''' ' ' References: character type 3.5.2, graphic character 2.1, literal 4.2, space character 2.1 2.6 String Literals A string literal is formed by a sequence of graphic characters (possibly none) enclosed between two quotation characters used as string brackets. string_literal ::= "{graphic_character}" A string literal has a value that is a sequence of character values corresponding to the graphic characters of the string literal apart from the quotation character itself. If a quotation character value is to be represented in the sequence of character values, then a pair of adjacent quotation characters must be written at the corresponding place within the string literal. (This means that a string literal that includes two adjacent quotation characters is never interpreted as two adjacent string literals.) The length of a string literal is the number of character values in the sequence represented. (Each doubled quotation character is counted as a single character.) Examples: 2 - 11 "Message of the day:" "" -- an empty string literal " " "A" """" -- three string literals of length 1 "Characters such as $, %, and } are allowed in string literals" 2 - 12 Note: A string literal must fit on one line since it is a lexical element (see 2.2). Longer sequences of graphic character values can be obtained by catenation of string literals. Similarly catenation of constants declared in the package ASCII can be used to obtain sequences of character values that include nongraphic character values (the so-called control characters). Examples of such uses of catenation are given below: "FIRST PART OF A SEQUENCE OF CHARACTERS " & "THAT CONTINUES ON THE NEXT LINE" "sequence that includes the" & ASCII.ACK & "control character" References: ascii predefined package C, catenation operation 4.5.3, character value 3.5.2, constant 3.2.1, declaration 3.1, end of a line 2.2, graphic character 2.1, lexical element 2.2 2.7 Comments A comment starts with two adjacent hyphens and extends up to the end of the line. A comment can appear on any line of a program. The presence or absence of comments has no influence on whether a program is legal or illegal. Furthermore, comments do not influence the effect of a program; their sole purpose is the enlightenment of the human reader. Examples: -- the last sentence above echoes the Algol 68 report end; -- processing of LINE is complete -- a long comment may be split onto -- two or more consecutive lines ---------------- the first two hyphens start the comment Note: Horizontal tabulation can be used in comments, after the double hyphen, and is equivalent to one or more spaces (see 2.2). References: end of a line 2.2, illegal 1.6, legal 1.6, space character 2.1 2.8 Pragmas 2 - 13 A pragma is used to convey information to the compiler. A pragma starts with the reserved word pragma followed by an identifier that is the name of the pragma. pragma ::= pragma identifier [(argument_association {, argument_association})]; argument_association ::= [argument_identifier =>] name | [argument_identifier =>] expression 2 - 14 Pragmas are only allowed at the following places in a program: - After a semicolon delimiter, but not within a formal part or discriminant part. - At any place where the syntax rules allow a construct defined by a syntactic category whose name ends with "declaration", "statement", "clause", or "alternative", or one of the syntactic categories variant and exception handler; but not in place of such a construct. Also at any place where a compilation unit would be allowed. Additional restrictions exist for the placement of specific pragmas. Some pragmas have arguments. Argument associations can be either positional or named as for parameter associations of subprogram calls (see 6.4). Named associations are, however, only possible if the argument identifiers are defined. A name given in an argument must be either a name visible at the place of the pragma or an identifier specific to the pragma. The pragmas defined by the language are described in Annex B: they must be supported by every implementation. In addition, an implementation may provide implementation-defined pragmas, which must then be described in Appendix F. An implementation is not allowed to define pragmas whose presence or absence influences the legality of the text outside such pragmas. Consequently, the legality of a program does not depend on the presence or absence of implementation-defined pragmas. A pragma that is not language-defined has no effect if its identifier is not recognized by the (current) implementation. Furthermore, a pragma (whether language-defined or implementation-defined) has no effect if its placement or its arguments do not correspond to what is allowed for the pragma. The region of text over which a pragma has an effect depends on the pragma. Examples: pragma LIST(OFF); pragma OPTIMIZE(TIME); pragma INLINE(SETMASK); pragma SUPPRESS(RANGE_CHECK, ON => INDEX); Note: It is recommended (but not required) that implementations issue warnings for pragmas that are not recognized and therefore ignored. References: compilation unit 10.1, delimiter 2.2, discriminant part 3.7.1, exception handler 11.2, expression 4.4, formal part 6.1, identifier 2.3, implementation-defined pragma F, language-defined pragma B, legal 1.6, name 4.1, reserved word 2.9, statement 5, static expression 4.9, variant 3.7.3, visibility 8.3 Categories ending with "declaration" comprise: basic declaration 3.1, component declaration 3.7, entry declaration 9.5, generic parameter declaration 12.1 2 - 15 Categories ending with "clause" comprise: alignment clause 13.4, component clause 13.4, context clause 10.1.1, representation clause 13.1, use clause 8.4, with clause 10.1.1 Categories ending with "alternative" comprise: accept alternative 9.7.1, case statement alternative 5.4, delay alternative 9.7.1, select alternative 9.7.1, selective wait alternative 9.7.1, terminate alternative 9.7.1 2 - 16 2.9 Reserved Words The identifiers listed below are called reserved words and are reserved for special significance in the language. For readability of this manual, the reserved words appear in lower case boldface. abort declare generic of select abs delay goto or separate accept delta others subtype access digits if out all do in task and is package terminate array pragma then at else private type elsif limited procedure end loop begin entry raise use body exception range exit mod record when rem while new renames with case for not return constant function null reverse xor A reserved word must not be used as a declared identifier. Notes: Reserved words differing only in the use of corresponding upper and lower case letters are considered as the same (see 2.3). In some attributes the identifier that appears after the apostrophe is identical to some reserved word. References: attribute 4.1.4, declaration 3.1, identifier 2.3, lower case letter 2.1, upper case letter 2.1 2.10 Allowable Replacements of Characters The following replacements are allowed for the vertical bar, sharp, and quotation basic characters: - A vertical bar character (|) can be replaced by an exclamation mark (!) where used as a delimiter. - The sharp characters (#) of a based literal can be replaced by colons (:) provided that the replacement is done for both occurrences. - The quotation characters (") used as string brackets at both ends of a string literal can be replaced by percent characters (%) provided that 2 - 17 the enclosed sequence of characters contains no quotation character, and provided that both string brackets are replaced. Any percent character within the sequence of characters must then be doubled and each such doubled percent character is interpreted as a single percent character value. 2 - 18 These replacements do not change the meaning of the program. Notes: It is recommended that use of the replacements for the vertical bar, sharp, and quotation characters be restricted to cases where the corresponding graphical symbols are not available. Note that the vertical bar appears as a broken bar on some equipment; replacement is not recommended in this case. The rules given for identifiers and numeric literals are such that lower case and upper case letters can be used indifferently; these lexical elements can thus be written using only characters of the basic character set. If a string literal of the predefined type STRING contains characters that are not in the basic character set, the same sequence of character values can be obtained by catenating string literals that contain only characters of the basic character set with suitable character constants declared in the predefined package ASCII. Thus the string literal "AB$CD" could be replaced by "AB" & ASCII.DOLLAR & "CD". Similarly, the string literal "ABcd" with lower case letters could be replaced by "AB" & ASCII.LC_C & ASCII.LC_D. References: ascii predefined package C, based literal 2.4.2, basic character 2.1, catenation operation 4.5.3, character value 3.5.2, delimiter 2.2, graphic character 2.1, graphical symbol 2.1, identifier 2.3, lexical element 2.2, lower case letter 2.1, numeric literal 2.4, string bracket 2.6, string literal 2.6, upper case letter 2.1 2 - 19