home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
-
-
-
-
-
-
-
-
-
- Version 6.0 of Icon*
-
-
- Ralph E. Griswold
- William H. Mitchell
- Janalee O'Bagy
-
-
-
- TR 86-10a
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- June 8, 1986
-
-
- Department of Computer Science
-
- The University of Arizona
-
- Tucson, Arizona 85721
-
-
-
-
- *This work was supported by the National Science Foundation under
- Grant DCR-8401831.
-
-
-
-
-
-
-
-
-
-
-
-
-
- Version 6.0 of Icon
-
-
-
-
- _1_.___B_a_c_k_g_r_o_u_n_d
-
- Different versions of the Icon programming language are iden-
- tified by numbers. The first, Version 1, was superceded by Ver-
- sion 2. Starting with Version 3, the implementation of Icon was
- completely changed. While Version 2 is still in use on some com-
- puters, Version 5 is by far the mostly widely used version of
- Icon. The features of Version 5 are described in the ``Icon
- book'' [1]. Since this book is the only complete and generally
- available description of a widely used version of Icon, Version 5
- is sometimes called ``standard Icon''.
-
- Since Icon is the byproduct of a research effort that is con-
- cerned with the development of novel programming language
- features, some extensions to the standard language are inevit-
- able. These extensions are incorporated as features of new
- releases of the implementation. These releases are identified by
- minor version numbers that are separated from the major version
- number by a decimal point. For example, Version 5.10 is the tenth
- minor revision of Version 5.
-
- Because of the relationship between language design and imple-
- mentation, major version numbers usually contain significant
- language changes that have accumulated over a number of minor
- revisions, while minor version numbers primarily reflect imple-
- mentation changes.
-
- Recently an implementation of Icon written almost entirely in
- C has been completed. This implementation is so different from
- previous ones that some designation is needed to distinguish it
- from other implementations. It therefore has been designated Ver-
- sion 6. Despite the change in major version number, the Version
- 6.0 language is nearly the same as the Version 5.10 language and
- the Version 5 book continues to serve as the primary reference
- for Version 6. Since a revision of a book is a major and time-
- consuming process, this report is provided to supplement the
- present Icon book. Together, the book and this report provide a
- description of Version 6.0.
-
- Most of the language extensions in Version 6.0 are upward-
- compatible with Version 5 and almost all programs that contain
- only standard Version 5 features work properly under Version 6.0.
- However, some of the more implementation-dependent aspects
- described in the Version 5 book are now obsolete and there are
- significant new language features.
-
-
-
-
- - 1 -
-
-
-
-
-
-
-
-
- _2_.___F_e_a_t_u_r_e_s__o_f__V_e_r_s_i_o_n__6_._0
-
- Version 6.0 extensions consist of a declaration to facilitate
- the use of Icon procedure libraries, a new set data type, new
- options for sorting tables, new syntax to support the use of co-
- expressions in programmer-defined control operations, the invoca-
- tion of functions and operators by their string names, and a few
- new functions.
-
- _2_._1___T_h_e__L_i_n_k__D_e_c_l_a_r_a_t_i_o_n
-
- The link declaration simplifies the inclusion of separately
- translated libraries of Icon procedures. If icont [2] is run with
- the -c option, source files are translated into intermediate
- _u_c_o_d_e files (with names ending in .u1 and .u2). For example,
-
- icont -c libe.icn
-
- produces the ucode files libe.u1 and libe.u2. The ucode files can
- be incorporated in another program with the new link declaration,
- which has the form
-
- link libe
-
- The argument of link is, in general, a list of identifiers or
- string literals that specify the names of files to be linked
- (without the .u1 or .u2). Thus, when running under UNIX*,
-
- link libe, "/usr/icon/ilib/collate"
-
- specifies the linking of libe in the current directory and col-
- late in /usr/icon/ilib. The syntax for paths may be different
- for other operating systems.
-
- The environment variable IPATH controls the location of files
- specified in link declaratinos. The value of IPATH should be a
- blank-separated string1 of the form _p_1 _p_2 ... _p_n where each _p_i
- names a directory. Each directory is searched in turn to locate
- files named in link declarations. The default value of IPATH is
- the current directory.
-
- _2_._2___S_e_t_s
-
- Sets are unordered collections of values and have many of the
- properties normally associated with sets in the mathematical
- sense. The function
-
- set(a)
-
- __________________________
- *UNIX is a trademark of AT&T Bell Laboratories.
- 1The separator is a colon for UNIX systems under Versions
- 5.9 and 5.10.
-
-
-
-
- - 2 -
-
-
-
-
-
-
-
-
- creates a set that contains the distinct elements of the list a.
- For example,
-
- set(["abc",3])
-
- creates a set with two members, abc and 3. Note that
-
- set([])
-
- creates an empty set. Sets, like other data aggregates in Icon,
- need not be homogeneous - a set may contain members of different
- types.
-
- Sets, like other Icon data aggregates, are represented by
- pointers to the actual data. Sets can be members of sets, as in
-
- s1 := set([1,2,3])
- s2 := set([s1,[]])
-
- in which s2 contains two members, one of which is a set of three
- members and the other of which is an empty list.
-
- Any specific value can occur only once in a set. For example,
-
- set([1,2,3,3,1])
-
- creates a set with the three members 1, 2, and 3. Set membership
- is determined the same way the equivalence of values is deter-
- mined in the operation
-
- x === y
-
- For example,
-
- set([[],[]])
-
- creates a set that contains two distinct empty lists.
-
- Several set operations are provided. The function
-
- member(s,x)
-
- succeeds and returns the value of x if x is a member of s, but
- fails otherwise. Note that
-
- member(s1,member(s2,x))
-
- succeeds if x is a member of both s1 and s2.
-
- The function
-
- insert(s,x)
-
- inserts x into the set s and returns the value of s. Note that
-
-
-
- - 3 -
-
-
-
-
-
-
-
-
- insert(s,x) is similar to put(a,x) in form. A set may contain (a
- pointer to) itself:
-
- insert(s,s)
-
- adds s as an member of itself.
-
- The function
-
- delete(s,x)
-
- deletes the member x from the set s and returns the value of s.
-
- The functions insert(s,x) and delete(s,x) always succeed,
- whether or not x is in s. This allows their use in loops in which
- failure may occur for other reasons. For example,
-
- s := set([])
- while insert(s,read())
-
- builds a set that consists of the (distinct) lines from the stan-
- dard input file.
-
- The operations
-
- s1 ++ s2
- s1 ** s2
- s1 -- s2
-
- create the union, intersection, and difference of s1 and s2,
- respectively. In each case, the result is a new set.
-
- The use of these operations on csets is unchanged. There is no
- automatic type conversion between csets and sets; the result of
- the operation depends on the types of the arguments. For example,
-
- 'aeiou' ++ 'abcde'
-
- produces the cset 'abcdeiou', while
-
- set([1,2,3]) ++ set([2,3,4])
-
- produces a set that contains 1, 2, 3, and 4. On the other hand,
-
- set([1,2,3]) ++ 4
-
- results in Run-time Error 119 (set expected).
-
- The functions and operations of Icon that apply to other data
- aggregates apply to sets as well. For example, if s is a set,
-
- *s
-
- is the size of s (the number of members in it). Similarly,
-
-
-
- - 4 -
-
-
-
-
-
-
-
-
- type(s)
-
- produces the string set. The string images of sets are in the
- same style as for other aggregates, with the size enclosed in
- parentheses. Therefore,
-
- s := set(["abc",3])
- write(image(s))
-
- writes set(2).
-
- The operation
-
- !s
-
- generates the members of s, but in no predictable order. Simi-
- larly,
-
- ?s
-
- produces a randomly selected member of s. These operations pro-
- duce values, not variables - it is not possible to assign a value
- to !s or ?s.
-
- The function
-
- copy(s)
-
- produces a new set, distinct from s, but which contains the same
- members as s. The copy is made in the same fashion as the copy of
- a list - the members themselves are not copied.
-
- The function
-
- sort(s)
-
- produces a list containing the members of s in sorted order.
- Sets occur after tables but before records in Icon's collating
- sequence.
-
- _E_x_a_m_p_l_e_s
-
- _W_o_r_d _C_o_u_n_t_i_n_g:
-
- The following program lists, in alphabetical order, all the
- different words that occur in standard input:
-
-
-
-
-
-
-
-
-
-
-
- - 5 -
-
-
-
-
-
-
-
-
- procedure main()
- letter := &lcase ++ &ucase
- words := set([])
- while text := read() do
- text ? while tab(upto(letter)) do
- insert(words,tab(many(letter)))
- every write(!sort(words))
- end
-
-
- _T_h_e _S_i_e_v_e _o_f _E_r_a_t_o_s_t_h_e_n_e_s:
-
- The follow program produces prime numbers, using the classical
- ``Sieve of Eratosthenes'':
-
- procedure main()
- local limit, s, i
- limit := 5000
- s := set([])
- every insert(s,1 to limit)
- every member(s,i := 2 to limit) do
- every delete(s,i + i to limit by i)
- primes := sort(s)
- write("There are ",*primes," primes in the first ",limit," integers.")
- write("The primes are:")
- every write(right(!primes,*limit + 1))
- end
-
-
- _2_._3___S_o_r_t_i_n_g__T_a_b_l_e_s
-
- Two new options are available for sorting tables. These
- options are specified by the values 3 and 4 as the second argu-
- ment of sort(t,i). Both of these options produce a single list
- in which the entry values and assigned values of table elements
- alternate. A value of 3 for i produces a list in which the entry
- values are in sorted order, and a value of 4 produces a list in
- which the assigned values are in sorted order. For example, if
- the table wcount contains elements whose entry values are words
- and whose corresponding assigned values are counts, the following
- code segment writes out a list of the words and their counts,
- with the words in alphabetical order:
-
- a := sort(wcount,3)
- every i := 1 to *a - 1 by 2 do
- write(a[i]," : ",a[i + 1])
-
-
- The main advantage of the new sorting options is that they
- only produce a single list, rather than a list of lists as pro-
- duced by the options 1 and 2. The amount of space needed for the
- single list is proportionally much less than for the list of
- lists.
-
-
-
-
- - 6 -
-
-
-
-
-
-
-
-
- _2_._4___P_r_o_g_r_a_m_m_e_r_-_D_e_f_i_n_e_d__C_o_n_t_r_o_l__O_p_e_r_a_t_i_o_n_s
-
- As described in [3], co-expressions can be used to provide
- programmer-defined control operations. Version 6.0 provides sup-
- port for this facility by means of an alternative syntax for pro-
- cedure invocation in which the arguments are passed in a list of
- co-expressions. This syntax uses braces in place of parentheses:
-
- p{_e_x_p_r_1, _e_x_p_r_2, ..., _e_x_p_r_n}
-
- is equivalent to
-
- p([create _e_x_p_r_1, create _e_x_p_r_2, ..., create _e_x_p_r_n])
-
- Note that
-
- p{}
-
- is equivalent to
-
- p([])
-
-
- _2_._5___I_n_v_o_c_a_t_i_o_n__B_y__S_t_r_i_n_g__N_a_m_e
-
- A string-valued expression that corresponds to the name of a
- procedure or operation can be used in place of the procedure or
- operation in an invocation expression. For example,
-
- "image"(x)
-
- produces the same call as
-
- image(x)
-
- and
-
- "-"(i,j)
-
- is equivalent to
-
- i - j
-
-
- In the case of operator symbols with unary and binary forms,
- the number of arguments determines the operation. Thus
-
- "-"(i)
-
- is equivalent to
-
- -i
-
- Since to-by is an operation, despite its reserved-word syntax, it
-
-
-
- - 7 -
-
-
-
-
-
-
-
-
- is included in this facility with the string name "..." . Thus
-
- "..."(1,10,2)
-
- is equivalent to
-
- 1 to 10 by 2
-
- Similarly, range specifications are represented by ":", so that
-
- ":"(s,i,j)
-
- is equivalent to
-
- s[i:j]
-
-
- Defaults are not provided for omitted or null-valued arguments
- in this facility. Consequently,
-
- "..."(1,10)
-
- results in a run-time error when it is evaluated.
-
- The subscripting operation is available with the string name
- "[]". Thus
-
- "[]"(&lcase,3)
-
- produces c.
-
- Arguments to operators invoked by string names are derefer-
- enced. Consequently, string invocation for assignment operations
- is ineffective and results in error termination.
-
- String names are available for the operations in Icon, but not
- for control structures. Thus
-
- "|"(_e_x_p_r_1,_e_x_p_r_2)
-
- is erroneous. Note that string scanning is a control structure.
-
- Field references, of the form
-
- _e_x_p_r . _f_i_e_l_d_n_a_m_e
-
- are not operations in the ordinary sense and are not available
- via string invocation. In addition, conjunction is not available
- via string invocation, since no operation is actually performed.
-
- String names for procedures are available through global iden-
- tifiers. Note that the names of functions, such as image, are
- global identifiers. Similarly, any procedure-valued global iden-
- tifier may be used as the string name of a procedure. Thus in
-
-
-
- - 8 -
-
-
-
-
-
-
-
-
- global q
-
- procedure main()
- q := p
- "q"("hi")
- end
-
- procedure p(s)
- write(s)
- end
-
- the procedure p is invoked via the global identifier q.
-
- _2_._6___N_e_w__F_u_n_c_t_i_o_n_s
-
- The function proc(x,i) converts x to a procedure, if possible.
- If x is procedure-valued, its value is returned unchanged. If the
- value of x is a string that corresponds to the name of a pro-
- cedure as described in the preceding section, the corresponding
- procedure value is returned. The value of i is used to distin-
- guish between unary and binary operators. For example,
- proc("*",2) produces the multiplication operator, while
- proc("*",1) produces the size operator. The default value for i
- is 1. If x cannot be converted to a procedure, proc(x,i) fails.
-
- The function seq(i,j) generates an infinite sequence of
- integers starting at i with increments of j. An omitted or null-
- valued argument defaults to 1. For example, seq() generates 1, 2,
- 3, ... .
-
- Storage is allocated automatically during the execution of an
- Icon program, and garbage collections are performed automatically
- to reclaim storage for subsequent reallocation. Garbage collec-
- tion normally only occurs when it is necessary. Garbage collec-
- tion can be forced by the function collect().
-
- _2_._7___M_i_n_o_r__L_a_n_g_u_a_g_e__C_h_a_n_g_e_s
-
- There are two minor language changes:
-
- o+ The keyword &options of Version 5.10 is not present in
- Version 6.0.
-
- o+ Version 6.0 reads the last line of a file, even if that
- line does not end with a newline; Version 5 discards
- such a line.
-
- _2_._8___V_e_r_s_i_o_n__C_h_e_c_k_i_n_g
-
- The Icon translator converts a source-language program to an
- intermediate form, called _u_c_o_d_e. The Icon linker converts one or
- more ucode files to a binary form called _i_c_o_d_e. The format of
- Version 6.0 ucode and icode files is different from that of ear-
- lier versions. To avoid the possibility of malfunction due to
-
-
-
- - 9 -
-
-
-
-
-
-
-
-
- incompatible ucode and icode formats, Version 6.0 checks both
- ucode and icode files and terminates processing with an error
- message if the versions are not correct.
-
-
- _3_.___O_b_s_o_l_e_t_e__a_n_d__C_h_a_n_g_e_d__F_e_a_t_u_r_e_s__o_f__V_e_r_s_i_o_n__5
-
- The original implementation of Version 5 supported both a com-
- piler (iconc) and an interpreter (icont). Version 6.0 supports
- only an interpreter. Interpretation is only slightly slower than
- the execution of compiled code and the interpreter is portable
- and gets into execution much more quickly than the compiler. How-
- ever, it is not possible to load C functions with the interpreter
- as it was with the compiler. A system for personalized inter-
- preters [4] is included with Version 6.0 for UNIX systems to make
- it comparatively easy to add new functions and otherwise modify
- the Icon run-time system.
-
- Some run-time environment variables have changed; see Appendix
- A. A number of error messages have been changed. Appendix B
- contains a list of run-time error messages.
-
-
-
- _4_.___K_n_o_w_n__B_u_g_s__i_n__V_e_r_s_i_o_n__6_._0
-
-
- o+ The translator does not detect arithmetic overflow in
- conversion of numeric literals. Very large numeric
- literals may have incorrect values.
-
- o+ Integer overflow on multiplication and exponentiation is
- not detected during execution. Such overflow may occur
- during type conversion.
-
- o+ Line numbers may be wrong in diagnostic messages related
- to lines with continued quoted literals.
-
- o+ In some cases, trace messages may show the return of sub-
- scripted values, such as &null[2], that would be errone-
- ous if they were dereferenced.
-
- o+ If a long file name for an Icon source-language program
- is truncated by the operating system, mysterious diagnos-
- tic messages may occur during linking.
-
- o+ Stack overflow is checked using a heuristic that may not
- always be effective.
-
- o+ If an expression such as
-
- x := create _e_x_p_r
-
- is used in a loop, and x is not a global variable,
- unreferenceable co-expressions are generated by each suc-
- cessive create operation. These co-expressions are not
-
-
-
- - 10 -
-
-
-
-
-
-
-
-
- garbage collected. This problem can be circumvented by
- making x a global variable or by assigning a value to x
- before the create operation, as in
-
- x := &null
- x := create _e_x_p_r
-
-
- o+ Stack overflow in a co-expression may not be detected and
- may cause mysterious program malfunction.
-
- o+ Co-expressions were designed to support coroutine pro-
- gramming as well as the more usual application for the
- controlled production of the results of a generator.
- However, the implementation is not sufficiently general
- to support coroutines. In Version 5, the use of co-
- expressions in a coroutine fashion could produce program
- malfunction. In Version 6.0, such uses are prevented and
- cause error termination. As a consequence, some uses of
- co-expressions that worked in Version 5 may not work in
- Version 6.0. In particular, the example in Section 13.4.2
- of the Icon book does not work in Version 6.0.
-
- o+ The garbage collector was designed for machines with com-
- paratively small address spaces and may not perform well
- for computers with very large address spaces. In partic-
- ular, if an attempt is made to create a very large data
- object that will not fit into memory (such as a million-
- element list), it takes an inordinately long time to
- determine that the object can not be allocated.
-
-
- _5_.___P_o_s_s_i_b_l_e__D_i_f_f_e_r_e_n_c_e_s__A_m_o_n_g__V_e_r_s_i_o_n__6_._0__I_m_p_l_e_m_e_n_t_a_t_i_o_n_s
-
- Version 6.0 of Icon is written almost entirely in C and most
- of its features are machine-independent. Appendix B of the Icon
- book lists differences that may occur because of different
- machine architectures. In addition to the differences listed
- there, the implementation of sets and tables uses a parameter
- that is different for 16- and 32-bit computers. This parameter
- determines how set members and table elements are physically
- stored as a result of hashing. The difference only is noticeable
- in the results produced by !x and ?x, where x is a set or a
- table.
-
- A few aspects of the implementation of Version 6.0 are
- specific to different computer architectures and operating sys-
- tems. Co-expressions require a context switch that is implemented
- in assembly language. If this context switch is not implemented,
- an attempt to activate a co-expression results in error termina-
- tion. Arithmetic overflow checking also generally requires
- assembly-language routines and may not be supported on some
- implementations of Version 6.0.
-
-
-
-
- - 11 -
-
-
-
-
-
-
-
-
- Some features of Icon, such as opening a pipe for i/o and the
- system function, are closely related to UNIX. These features
- should work correctly for Version 6.0 running on UNIX systems,
- but they may not be supported on other operating systems.
-
- _A_c_k_n_o_w_l_e_d_g_e_m_e_n_t_s
-
- In addition to the authors of this report, several persons
- contributed to the implementation of Version 6.0 of Icon. The
- implementation of sets and the new sorting options were done by
- Rob McConeghy. Other major contributors include Gregg Townsend,
- Chris Janton, and Kelvin Nilsen.
-
- _R_e_f_e_r_e_n_c_e_s
-
- 1. Griswold, Ralph E. and Madge T. Griswold. _T_h_e _I_c_o_n _P_r_o_g_r_a_m_-
- _m_i_n_g _L_a_n_g_u_a_g_e, Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
- 1983.
-
- 2. Griswold, Ralph E. _I_C_O_N_T(_1), manual page for _U_N_I_X
- _P_r_o_g_r_a_m_m_e_r'_s _M_a_n_u_a_l, Department of Computer Science, The Univer-
- sity of Arizona. May 1986.
-
- 3. Griswold, Ralph E. and Michael Novak. ``Programmer-Defined
- Control Operations'', _T_h_e _C_o_m_p_u_t_e_r _J_o_u_r_n_a_l, Vol. 26, No. 2 (May
- 1983).
-
- 4. Griswold, Ralph E. _P_e_r_s_o_n_a_l_i_z_e_d _I_n_t_e_r_p_r_e_t_e_r_s _f_o_r _V_e_r_s_i_o_n _6._0
- _o_f _I_c_o_n, Technical Report TR 86-12, Department of Computer Sci-
- ence, The University of Arizona. May 1985.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 12 -
-
-
-
-
-
-
-
-
- Appendix A - Environment Variables
-
-
-
-
- There are a number of environment variables that can be set to
- override the default values for sizes of Icon's storage regions
- and other run-time parameters. These environment variables are:
-
- TRACE The initial value of &trace. The default value is
- zero.
-
- NBUFS The number of input-output buffers. The default
- value is 10 for computers with large address
- spaces and 5 for computers with small address
- spaces.
-
- NOERRBUF No buffering of standard error output.
-
- ICONCORE Produce a core dump in the case of error termina-
- tion.
-
- MSTKSIZE The size in words of the main interpreter stack.
- The default value is 10000 for machines with large
- address spaces and 3000 for machines with small
- address spaces.
-
- STRSIZE The initial size in bytes of the allocated string
- region. The default value is 51200 for machines
- with large address spaces and 10240 for machines
- with small address spaces.
-
- HEAPSIZE The initial size in bytes of the allocated block
- region. The default value is 51200 for machines
- with large address spaces and 10240 for machines
- with small address spaces.
-
- STATSIZE The initial size in bytes of the static region in
- which co-expressions are allocated. The default
- value is 20480 for machines with large address
- spaces and 1024 for machines with small address
- spaces.
-
- STATINCR The increment for expanding the static region. The
- default increment is one-fourth the initial size
- of the static region.
-
- COEXPSIZE The size in words of co-expression blocks. The
- default value is 2000 for machines with large
- address spaces and 1000 for machines with small
- address spaces.
-
-
-
-
-
-
- - 13 -
-
-
-
-
-
-
-
-
- Appendix B - Run-Time Error Messages
-
-
-
-
-
- 101 integer expected
- 102 numeric expected
- 103 string expected
- 104 cset expected
- 105 file expected
- 106 procedure or integer expected
- 107 record expected
- 108 list expected
- 109 string or file expected
- 110 string or list expected
- 111 variable expected
- 112 invalid type to size operation
- 113 invalid type to random operation
- 114 invalid type to subscript operation
- 115 list or table expected
- 116 invalid type to element generator
- 117 missing main procedure
- 118 co-expression expected
- 119 set expected
-
- 201 division by zero
- 202 remaindering by zero
- 203 integer overflow
- 204 real overflow underflow or division by zero
- 205 value out of range
- 206 negative first operand to real exponentiation
- 207 invalid field name
- 208 second and third arguments to map of unequal length
- 209 invalid second argument to open
- 210 argument to system function too long
- 211 by clause equal to zero
- 212 attempt to read file not open for reading
- 213 attempt to write file not open for writing
- 214 recursive co-expression activation
-
- 301 interpreter stack overflow
- 302 C stack overflow
- 303 unable to expand memory region
- 304 memory region size changed
-
-
-
-
-
-
-
-
-
-
-
-
- - 14 -
-
-