- The CSTAR READ.ME file
-
- Read this documentation file first.
-
- June 25, 1991
-
-
- OVERVIEW
-
- The CSTAR language is an outgrowth of a language called PL/68K, which
- was described in the January 1986 issue of Dr. Dobb's Journal. CSTAR is
- a cross translator: it runs on the IBM PC and produces 68000 assembly
- code in Digital Research (DRI) format. The source code for the CSTAR
- compiler can be compiled using the Microsoft C compiler v5.1 or later or
- with Turbo C v1.5 or later. There are approximately four programmer-years
- of work on this disk.
-
- CSTAR produces locally optimal code in almost all circumstances. By that I
- mean that CSTAR produces code for arithmetic operations and flow of
- control constructs that is at least as good as would typically be produced by
- an expert assembly language programmer. As you will see when you look at
- the code generator code, producing this kind of optimal code requires a
- *lot* of attention to detail. CSTAR does no global optimizations or loop
- optimizations, nor does it eliminate common subexpressions, but rather
- provides the ability to specify those optimizations in source code if desired.
-
-
- PUBLIC DOMAIN SOFTWARE
-
- The CSTAR program was placed in the public domain on June 15, 1991,
- by its principal author and sole owner:
-
- Edward K. Ream
- 1617 Monroe Street
- Madison, WI 53711
- (608) 257-0802
-
- CSTAR may be used for any commercial or non-commercial purpose.
-
-
- DISCLAIMER OF WARRANTIES
-
- Edward K. Ream (Ream) specifically disclaims all warranties,
- expressed or implied, with respect to this computer software,
- including but not limited to implied warranties of merchantability
- and fitness for a particular purpose. In no event shall Ream be
- liable for any loss of profit or any commercial damage, including
- but not limited to special, incidental, consequential or other damages.
-
- CALL ME
-
- The CSTAR compiler is large and complicated. Please feel free to call with
- any questions or comments you may have. If you plan to do any real work
- with CSTAR I probably can save you a lot of time. I am also available to do
- commercial consulting and programming on this product.
-
- DOCUMENTATION
-
- The file ARITH.DOC contains a detailed discussion of the rules for
- generating code for arithmetic expressions. As you will see, this is a
- complex area, and one in which many compilers generate poor code.
- CSTAR does an outstanding job in this area.
-
- The file CHANGES.DOC discusses how the coding and performance of the
- CSTAR translator could be improved, perhaps by as much as a factor of 3.
- The changes described are extensive so this file will be of theoretical interest
- only for most people.
-
- The file CSTAR.DOC discusses the evolution of the original language,
- PL/68K, to the CSTAR language. As you will see, this change was not
- entirely satisfactory, which was one reason why neither PL/68K nor
- CSTAR was ever offered as a commercial product. Nonetheless, CSTAR
- would make a good basis for a full C compiler.
-
- The file DESIGN.DOC discusses various aspects of design of the CSTAR
- translator. (See CSTAR.DOC for a discussion of the CSTAR language.)
-
- The file PL68K.DOC contains a slightly revised version of the original Dr.
- Dobb's Journal article about PL/68K. Alas, it does not contain the original
- figures.
-
-
- FEATURES AND LIMITATIONS OF THE CSTAR COMPILER
-
- Insofar as CSTAR is simply C, the CSTAR compiler cannot produce better
- code than the best C compilers. In fact, however, the design of the CSTAR
- compiler was heavily influenced by the same philosophy behind the design
- of the PL/68K language. This philosophy produced several important
- benefits in the compiler:
-
- o You get as much control over the code generated by arithmetic
- expressions as you desire. Most of the time, the code generators in the
- CSTAR compiler produce locally optimal code. If they do not, or if global
- considerations require local code changes, you can usually improve the code
- simply by using different C operators. For example, you could replace
- array operators [] with dereference operators *. These tricks will work with
- most C compilers, but CSTAR also allows you to use PL/68K-style
- pseudo-functions for absolute control over the output generated.
-
- Changing arithmetic expressions produces only local changes to the output.
- The CSTAR compiler does no global optimization, so you can always
- pinpoint what source code statements produced what output statements.
- Output is flagged with the line number of the statements that produced the
- output.
-
- The quality of code generated by expressions largely determines the overall
- quality of a compiler. The CSTAR compiler typically produces good-to-
- excellent code for all C-language constructs but the CSTAR language also
- allows you to specify exact code to be produced when you need to. Except
- for constructs involving common sub-expressions and global optimizations
- (which you may improve by changing the source code), the quality of the
- code produced by the CSTAR compiler is as good as or better than the code
- produced by the Green Hills compiler.
-
- o The CSTAR compiler always produces the best possible code for boolean
- expressions and flow-of-control constructs such as if-then-else, do-while,
- etc. No unnecessary jumps to jumps are ever generated. Compare
- instructions are rearranged if required to eliminate non-essential jumps. The
- code produced by this part of the compiler is indistinguishable from the
- code that would be produced by a good assembly language programmer.
-
- The CSTAR compiler is fast, and could be made much faster. On
- a 10 MHz CP/M 68000 system with a RAM disk the CSTAR compiler
- compiles more than 2000 lines per minute. This is comparable to the speed
- at which the non-optimizing Unix cc compiler compiles code on a 68000
- system and about three times faster than the Green Hills compiler.
-
- The CSTAR compiler reads source files only once--no intermediate files are
- used and all data structures are kept in main memory for maximum speed.
- Thus, the size of the largest function that can be compiled is limited by the
- amount of main memory available. About 500K of main memory is
- required to compile the file containing the largest function in the CSTAR
- compiler itself.
-
- The CSTAR compiler contains the following extensions to standard C:
-
- o C language variables which have the same name as 68000 registers are
- treated as if they were register variables assigned to the corresponding
- register.
-
- o Functions which have the same name as 68000 instructions are treated as
- if the corresponding 68000 instruction were inserted in line.
-
- o The #enum preprocessor directive is an abbreviation for a sequence of
- #define's.
-
- The following features of ANSI C are not supported by the CSTAR
- compiler:
-
- o Blocks. All variables of a function must be declared as formal parameters
- or as local variables whose scope is the entire function.
-
- o Bit fields.
-
- o Complex initializers involving arrays of structs or unions.
-
- o Enum data type.
-
- o Functions may not return structs, unions or arrays, but only pointers to
- those objects.
-
- o Function prototyping.
-
- The data structures used by the CSTAR compiler were designed for maximum
- flexibility and speed. The CSTAR compiler builds in main memory a full
- parse tree and a complete representation of the code to be output, called the
- intermediate code list or ICL. The CSTAR compiler generates code (i.e.,
- adds nodes to the ICL), by making several passes over the parse tree,
- sometimes rearranging the tree in the process. The nodes of the parse tree
- mostly contain links to other nodes so that traversing the parse tree is very
- easy. After the ICL is built, a peephole optimization pass deletes or reorders
- nodes in the ICL. The ICL is doubly linked so that deleting or rearranging
- nodes of the ICL is also easy.
-
- With these data structures in place, adding a global optimizer or an internal
- assembler (so that the compiler would produce object code) would be
- straightforward. A global optimizer would simply rearrange the parse tree
- before the code generators were applied to it. Again, the format of the parse
- tree already lends itself to these operations. As with a stand-alone
- assembler, the internal assembler would simply make two passes over the
- ICL, one pass to calculate location counter values and the other pass to
- output object code. The internal assembler, however, would be much easier
- to write than a stand-alone assembler since the ICL is a much simpler format
- to process than standard assembly language.
-
-
- STATUS OF THE COMPILER
-
- At one time the CSTAR compiler could compile itself. However, that is no
- longer true--the code now contains both function prototypes and ANSI function
- definitions which the CSTAR compiler does not support. If you are thinking
- of adding these new language features, you should consider getting the Sherlock
- debugging package. The SPP tool, which is part of the Sherlock package, does
- parse full ANSI C.
-
- CSTAR compiles with both Turbo C and Microsoft C without errors and without
- significant warnings. Microsoft C gives no warnings and Turbo C gives the
- following warnings, all of which can be ignored:
-
- o Possibly invalid assignments.
-
- o Unreachable code: These are the result of code that tests the compiler
- environment. These are ok.
-
- o Parameter xxx is not used. These are the result of unfinished code,
- mostly unfinished optimizations.
-
- The ipt field of struct iblock is really a union, which is why casts are
- so often required. This code could be buggy.
-
- No bugs are known, but not a lot of serious testing has been done lately.
-
-
- FILES ON THE DISK
-
- The header file cstar.h contains extensive discussion about compile-time
- options to choose before compiling CSTAR.
-
- Main line and preprocessor: cstar.c, tok.c, dir.c, def.c.
-
- Parser: par.c, dcl.c, exp.c.
-
- Symbol tables: mst.c and st.c.
-
- Code generation: g1.c, g2.c, g3.c, reg.c, x2.c, t2.c.
-
- Output Routines: io.c, pr.c.
-
- Utilities: sys.c, mem.c, utl.c, str.c.
-
- Executable files: csdb.exe is the debugging version of cs.exe.
-
- Test files: *.tst
-
- Batch files: The file fr.sub searches all the source files of CSTAR for a
- pattern using a FIND program.
-
- Make and link files for Microsoft C: *.mmk and *.ml
-
- Make and link files for Turbo C: *.mak and *.lnk.
-
- Dummy version of Sherlock macros: sl.h (see next section)
-
-
- DEBUGGING NOTES
-
- csdb.exe is instrumented with the Sherlock tracing system, available from
- the C Users' Group. If you are going to modify CSTAR I *strongly*
- recommend you get a copy of Sherlock. The file sl.h contains dummy
- versions of the Sherlock macros, which will allow you to recompile CSTAR
- without having Sherlock.
-
- Almost any routine may be traced by simply enabling the Sherlock argument
- with the corresponding name. For instance, to trace is_reserved(), just do
- the following on the command line:
-
- csdb ++is_reserved
-
- However, the most useful tracing arguments do not correspond to the
- names of routines. These "extra" tracing arguments are as follows:
-
- ++v Enable tracing of numerous dumps.
-
- ++out_list Send output of compiler to both console and the output file.
-
- ++g_put Intersperse debugging output with compiler output.
-
- ++code Output intermediate data structures.
-
- ++dump Output debugging statistics at the end of the run.
-
- You should enable all of these switches any time you are faced with a
- serious bug. As you peruse the code, you may also find some other
- Sherlock trace points which would be helpful, but you should definitely be
- aware of the ones above.
-
- It is also possible to set internal compiler variables and switches using
- Sherlock arguments.
-
- ++no_out Disable out_list() function.
-
- ++nopeep Disable peephole optimization.
-
- ++init_1 Use 1 temp A-register, 1 temp D register for expressions.
-
- ...
-
- ++init_6 Use 6 temp A-regs, 6 temp D regs for expressions.
-
-
- Note: Because CSTAR does not support local variables defined in inner
- blocks, you must use the MARK I version of the Sherlock macros when
- compiling the CSTAR source using the CSTAR compiler itself. The normal
- MARK II version of the Sherlock macros can be used when compiling
- CSTAR with either Turbo C or Microsoft C or any other ANSI compiler.
-
-
- DIAGNOSTIC MESSAGES
-
- There are three levels of diagnostic messages. In order of severity, they are:
-
- Errors: an error is present which precludes creation of usable code. For
- example, outright syntax errors.
-
- Warnings: usable code will be produced, but there is something going on
- that probably needs fixing. For example, dubious pointer subtraction,
- constant in conditional context, etc. Many, if not most, C warnings have
- probably already become PL/68K errors.
-
- Helps: valid code (as far as the compiler is concerned) has been produced,
- but a question is raised as to something which the programmer may well
- choose to ignore, e.g. an operator which generates especially abstruse code
- on the target processor. Helps may be turned off as a command line option.
-