- Literate C++
-
- Marco S. Hyman
- uucp: ...!pacbell!dumbcat!marc
-
- Next time you're feeling bored ask a group of programmers to define good
- program documentation -- then duck before every volume of The Art of Computer
- Programming is thrown your way. The extremes are easy to identify. There is
- usually at least one in the group insisting that the only documentation ever
- needed is the source, the whole source, and nothing but the source. At the
- other end of the spectrum will be the programmer bent under weighty volumes of
- requirements analysis documents, system design documents, HIPO charts, data
- flow diagrams, data dictionaries, structure charts, and, of course, source
- listings.
- One of the reasons for such diverse opinions is that each programmer is
- likely to have a different documentation goal. Some programmers want to
- explain algorithms, others want to show data flow and state transitions. There
- are also those that do the bare minimum required by the organization they work
- for. This last group can't see any purpose in writing documentation, so
- perhaps we should state one:
- "The purpose of program documentation is to provide enough information for
- another programmer to understand and maintain the program."
- If this purpose seems a bit altruistic, replace ``another programmer''
- with ``you, after not looking at the program for a year.'' The purpose is just
- as strong and certainly hits closer to home.
- But what about C++ class documentation? One of the advantages of C++ and
- object-oriented programming is that it leads to code re-use. However, a
- programmer is not likely to re-use code when its function is a mystery. Forcing
- another programmer to look at your implementation to discover what your code
- does is not polite. It leads to your code being tossed in favor of code the
- other guy understands. Code is not re-usable until it's documented.
- "The purpose of class documentation is to provide enough information for
- another programmer to use the class and member functions of the class."
- Two levels of documentation are needed: the first level for users of a
- class and the second level for maintainers of the class. Class users need
- something like the pages in your C library reference manual. Class maintainers
- need to know the algorithms used and WHY the code is the way it is. Both the
- code and the class documentation will convey WHAT the class does.
-
- Literate Programming
- Don Knuth conceived the idea of programs as works of literature and created
- ``Literate Programming'' (see sidebar) as a method of explaining to programmers
- what the computer was to do. In Knuth's implementation a program is written in
- WEB, a language consisting of both TeX text and Pascal text. The combined text
- is processed by two programs, TANGLE and WEAVE, to produce both Pascal source
- code and TeX formatted documentation.
- There are many advantages to keeping the documentation and the code in the
- same file. A programmer is more likely to update both when both are on the
- screen at the same time, thus keeping the program and documentation in sync. It
- also becomes impossible to lose the documentation (without also losing the
- source). With the proper tools a pretty printed version of the source can be
- included in the documentation. Most important, the programmer is encouraged to
- think about both documentation and code. This usually has a side effect of
- improving both.
- To see if the literate programming paradigm can be stretched to include C++,
- I propose Literate C++ (lc++)*1. As shown in figure 1, an lc++ input file
- (file.lc) will be used to create both a C++ header file (file.h) and a C++
- source file (file.cc). The lc++ input file (file.lc) can also be processed to
- create a library manual page (file.3) and a class documentation file
- (file.doc).
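- The naming scheme in figure 1 can be sketched in a few lines of C++. This is
- a minimal illustration under stated assumptions. The names LcOutputs and
- outputNames are hypothetical. lcpp itself derives only the .h and .cc names,
- and the .3 and .doc names here simply follow figure 1.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the four files an lc++ tool set would derive from one input
// file, following figure 1 (file.lc -> file.h, file.cc, file.3, file.doc).
struct LcOutputs {
    std::string header, source, manPage, classDoc;
};

LcOutputs outputNames(const std::string& input) {
    const std::string::size_type slash = input.find_last_of('/');
    const std::string::size_type dot = input.find_last_of('.');
    std::string stem = input;
    // Only a dot in the final path component starts an extension,
    // so "./test.lc" and "dir.d/test" are both handled.
    if (dot != std::string::npos && (slash == std::string::npos || dot > slash))
        stem = input.substr(0, dot);
    return {stem + ".h", stem + ".cc", stem + ".3", stem + ".doc"};
}
```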
- An lc++ language has been designed and a header and source file extraction
- program, named lcpp, has been prototyped in awk.*2 They are both described
- below. My goal is to use the prototype program for a while, refining the lc++
- language as I use it to write the second-generation extraction tool in
- lc++ itself. The next step will be to determine a good format for both types of
- documentation and to create the documentation extraction tools.
-
- The lc++ Language
- The lc++ commands are listed in figure 2. Each command starts with an at sign
- (@) and currently must be the first token on a line, although this is likely to
- change.
- Each lc++ input (.lc) file creates one header file and one source file. If
- multiple headers are required, for example, then each must have its own lc++
- file. One of the purposes of the awk prototype is to determine if this
- limitation is reasonable.
- The .lc file contains three sections. The first section consists of all
- text and commands prior to the @specification command. Text in this section is
- ignored by both the code and documentation extraction tools. Commands usually
- found in this section are @title and @copyright.
- The second section of the file starts with @specification. Code and text
- in the specification section are used to create the header (.h) file and the
- library man page (.3) file. The code in this section defines classes and
- declares member functions.
- The final section starts with the @implementation command. This section is
- used to define the member functions declared in the specification section. The text
- in this section is free form and used to explain what is being done and why.
- Code and text in this section create the source (.cc) file and the
- documentation (.doc) file. @inline commands will cause code to be added to the
- end of the header file. A description of each command follows.
- @title: The @title command causes a title, perhaps including version
- information, to be written to both the .h and the .cc output files. Along with
- the title is a canned notice that explains that the output files should not be
- modified directly, but that changes should be applied to the input (.lc) file
- and lc++ run again to generate new output files.
- @copyright: The @copyright statement is written to both output files.
- Comment delimiters must be supplied. This is not done automatically as
- different authors prefer different commenting styles. The copyright section is
- also a good place to include a change log, such as that built by RCS or any
- other version control system you may be using.
- @code: The @code command enables output. The location of the output, .h
- file or .cc file, depends upon the current mode (specification or
- implementation). All lines following the @code line are written to the current
- file. Output continues until a command that disables output is encountered.
- An @code is not always required for output. The @copyright command above, for
- example, enables output by default.
- @text: The @text command disables .h or .cc output and signifies the
- beginning of documentation. The text will be written to the library manual
- page (.3) file when @text is seen in the @specification section. Text will be
- written to the documentation (.doc) file when seen in the implementation
- section. Alternating @text and @code commands are often seen as the author
- goes back and forth between coding and documenting.
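- The destinations described so far can be summarized as a lookup from the
- current section and the kind of line to the output files that receive it. The
- sketch below is a hypothetical C++ model, not part of lcpp. Treating code in
- the first section as @copyright output bound for both code files is an
- assumption drawn from the @copyright description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of where an lc++ line ends up.
enum class Section { Preamble, Specification, Implementation };
enum class Kind { Code, Text, Inline };   // Inline: lines after @inline

// Returns the output suffixes that receive a line; empty means "discarded".
std::vector<std::string> destinations(Section s, Kind k) {
    switch (s) {
    case Section::Preamble:
        // Plain preamble text is ignored; code here is @copyright output,
        // which goes to both generated code files.
        if (k == Kind::Text) return {};
        return {".h", ".cc"};
    case Section::Specification:
        if (k == Kind::Text) return {".3"};     // library manual page
        return {".h"};
    case Section::Implementation:
        if (k == Kind::Text) return {".doc"};   // class documentation
        if (k == Kind::Inline) return {".h"};   // inline bodies stay in the header
        return {".cc"};
    }
    return {};
}
```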
- @specification: The @specification command selects specification mode. In
- specification mode @code output is written to the .h file. Classes are defined
- in this mode and class members are declared. @code output is written to the .h
- file immediately. Other output, such as class definitions and member
- declarations, is not written until specification mode ends. Output is not
- enabled by the @specification command. The mode ends at the end of the lc++
- input (.lc) file or at an @implementation command.
- @class: The @class command starts the definition of a new class. Classes
- are always output in the order that the @class command is found in the lc++
- input (.lc) file. No output is done until the specification section of the .lc
- file is finished. If two classes refer to each other, put a forward
- declaration such as class x; in an @code block before the @class commands.
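- For instance, two classes that point at each other need a forward declaration
- ahead of both definitions. The class names below are invented for
- illustration. In an .lc file the class Node; line would go in an @code block
- in the specification section. lcpp writes that block to the header at once,
- ahead of the deferred class definitions.

```cpp
#include <cassert>

// Illustrative names only. "class Node;" is the forward declaration that
// would sit in an @code block before the @class commands.
class Node;

class Edge {                 // defined first, yet refers to Node
public:
    Node* from = nullptr;    // pointers to an incomplete type are allowed
    Node* to = nullptr;
};

class Node {
public:
    int id = 0;
    Edge* firstEdge = nullptr;
};
```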
- @base: The @base command declares the base classes that make up a class.
- The syntax of the command is @base @<classname> <base class description>. The
- @<classname> is optional. By default, an @base command adds a base class to
- the last class defined with the @class command. Because it may be easier to
- document related classes by bouncing between them, it is possible to add a base
- class to any previously defined class by using the @<classname> syntax.
- Example:
-
- @class Class1 // defines Class1
- @class Class2 // defines Class2
- @base virtual public Base2 // adds Base2 as a base class of Class2
- @base @Class1 public Base1 // adds Base1 as a base class to Class1
- @base private Base2p // adds Base2p as another base class
- // to Class2 (the last defined class).
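- The @base rule can be sketched in C++: an optional @<classname> selects an
- earlier class, and otherwise the most recently defined class is used.
- BaseTable and its members are hypothetical names. The sketch follows the
- behavior described here rather than the awk code in listing 1, and it rejects
- an @<classname> that was never defined.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of the @base rule.
struct BaseTable {
    std::vector<std::string> order;                         // classes in @class order
    std::map<std::string, std::vector<std::string>> bases;  // class -> base specifiers

    void defineClass(const std::string& name) {
        order.push_back(name);
        bases[name];                                        // start with no bases
    }

    // args is the rest of an @base line, e.g. "@Class1 public Base1".
    // Returns false if no class is available to receive the base.
    bool addBase(const std::string& args) {
        std::istringstream in(args);
        std::string first, target, spec;
        in >> first;
        if (!first.empty() && first[0] == '@') {
            target = first.substr(1);                       // explicit @<classname>
            std::getline(in >> std::ws, spec);
        } else {
            if (order.empty()) return false;
            target = order.back();                          // last class defined
            spec = args;
        }
        auto it = bases.find(target);
        if (it == bases.end()) return false;                // unknown class name
        it->second.push_back(spec);
        return true;
    }

    // First line of the class definition, e.g. "class D : public B1, private B2".
    std::string header(const std::string& name) const {
        std::string out = "class " + name;
        const std::vector<std::string>& list = bases.at(name);
        for (std::size_t i = 0; i < list.size(); ++i)
            out += (i == 0 ? " : " : ", ") + list[i];
        return out;
    }
};
```

- Given the three @base lines of the example, header("Class2") returns
- class Class2 : virtual public Base2, private Base2p. lcpp emits the same
- specifiers, but without a space before the colon.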
-
- @public, @protected, and @private: These three commands add members to the
- current class. As with the @base command, members can be added to a previously
- named class by adding an @<classname> after the @public, @protected, or @private
- command. The text on the command line after the command will be copied into
- the class definition. Proper C++ syntax must be followed.
- The text following the command should explain when to use the member and
- what the member does. Of course, this pertains to member functions much more
- so than data members. How member functions are implemented is *not*
- appropriate subject matter here. This is still part of the specification. The
- implementation could vary many ways and still meet the specification. This text
- is *not* added to the header file.
- @requires: The @requires command introduces text that describes caller
- requirements. That is, if the requirements are not met, then the called
- function is not required to work. For example, an @requires clause for a square
- root function might state that only non-negative numbers may be passed to it.
- This command always pertains to the last @public, @protected, or @private
- command found in the lc++ input (.lc) file.
- @effects: This command introduces a very brief description of what the
- member function does, that is, the effects of calling the member function.
- The description is used to generate class documentation. This command always
- pertains to the last @public, @protected, or @private command found in the lc++
- input (.lc) file.
- See Abstraction and Specification in Program Development by Liskov and
- Guttag on specifying procedures by use of a requires and an effects clause.
- They also use a modifies clause, which could be added to lc++.
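- As an illustration of the two clauses, here is a hypothetical square root
- function whose specification is written as requires and effects comments. In
- an .lc file the requires text would follow @requires, and the effects text
- would follow @effects, after the @public line that declares the function.
- Checking the requirement with assert is an extra choice made by this sketch,
- not something lc++ does.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical function, specified in the requires/effects style.
//
// requires: x >= 0
// effects:  returns the non-negative square root of x
double squareRoot(double x) {
    // Meeting the requires clause is the caller's obligation; this assert
    // is an optional check and is compiled out when NDEBUG is defined.
    assert(x >= 0.0);
    return std::sqrt(x);
}
```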
- @implementation: The @implementation command forces the classes defined so far
- to be written to the header (.h) file and switches @code output to the source
- (.cc) file. All output, except for @inline (see below), will now be written to
- the source file. Text written after the @implementation command should discuss
- implementation details: more of the *how* than the *why*.
- @member: The @member command starts the definition of a member function.
- All lines following the @member command will be copied to the source (.cc)
- file. The command will become significant once the documentation extraction
- programs are written; in the source extraction program it simply acts as an
- @code that directs output to the source file.
- @inline: The @inline command adds the lines following the command to the
- header (.h) file. The member function should have been declared as inline in
- the specification section of the file. This is not verified, however, and can
- lead to problems. For this reason future versions of the language will not use
- this keyword.
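- Taken together, the rules for @specification, @implementation, @code, @text,
- @member, and @inline form a small state machine. The following is a
- hypothetical C++ rendering of the corresponding rules in listing 1, limited to
- those six commands. Class tables, @title, @copyright, and documentation output
- are left out. Unlike lcpp, which checks the first field, it recognizes a
- command only at the very start of a line.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch of lcpp's code routing for six commands.
struct Router {
    std::string h, cc;                    // accumulated header and source text
    bool toH = false, toCC = false;       // which file code goes to
    bool enabled = false;                 // is code output on?

    void line(const std::string& l) {
        if (!l.empty() && l[0] == '@') {
            std::string cmd = l.substr(0, l.find(' '));
            enabled = false;              // any command turns output off first
            if (cmd == "@specification") { toH = true;  toCC = false; }
            else if (cmd == "@implementation") { toH = false; toCC = true; }
            else if (cmd == "@code") { enabled = true; }
            else if (cmd == "@member") { toH = false; toCC = true;  enabled = true; }
            else if (cmd == "@inline") { toH = true;  toCC = false; enabled = true; }
            // @text (and anything else) leaves output off
            return;
        }
        if (enabled && toH) h += l + "\n";
        if (enabled && toCC) cc += l + "\n";
    }

    void run(const std::string& input) {
        std::istringstream in(input);
        for (std::string l; std::getline(in, l); ) line(l);
    }
};
```

- Fed lines in the style of listing 2, the header text collects the
- specification @code lines and the @inline bodies. The source text collects
- the implementation @code and @member lines.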
-
- The AWK extraction program
- Listing 1 is lcpp, the awk program used to process the lc++ input file. It uses
- the features of new awk, as described in The AWK Programming Language by Aho,
- Kernighan, and Weinberger. The program is fairly simple and should be easy to
- understand.
- Two arrays, class and className, are used to associate a class name with a
- class number. The array class returns a class number when indexed by a name.
- The array className returns a name when indexed by a number.
- The only tricky bit of coding is the use of awk's associative arrays to
- force classes and member functions to be output in the same sequence they were
- input. The member array illustrates the technique. Whenever a new class
- is defined, three entries are added to the member array for the class using the
- class number classNum: member[classNum, "public"], member[classNum,
- "protected"], and member[classNum, "private"]. The three entries are
- counts initialized to 0. Each count is then used as an index into the array when a
- member definition occurs. A public member definition would be added at
- member[classNum, "public", member[classNum, "public"]]. Note that this entry
- uses three indexes and the third is the current count. The count is
- incremented after the entry is added. The functions doClass and doMembers use
- these embedded counts to control printing.
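- The same bookkeeping can be expressed with C++ containers, where a
- std::vector keeps insertion order without the embedded counts that awk needs.
- This is a sketch only. ClassTables and its members are hypothetical names,
- classes are numbered from 0 rather than 1 as in lcpp, and base classes and
- duplicate-name checks are omitted.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Hypothetical counterpart of lcpp's class, className and member arrays.
struct ClassTables {
    std::map<std::string, int> classNum;     // name -> number, like class[name]
    std::vector<std::string> className;      // number -> name, like className[num]
    // member[num][access] lists declarations in the order they were added
    std::vector<std::map<std::string, std::vector<std::string>>> member;

    int defineClass(const std::string& name) {
        int num = static_cast<int>(className.size());  // numbered from 0
        classNum[name] = num;
        className.push_back(name);
        member.emplace_back();
        return num;
    }

    void addMember(int num, const std::string& access, const std::string& decl) {
        member[num][access].push_back(decl);           // appending keeps input order
    }

    // Emits a class the way doClass/doMembers do: public, protected, then
    // private, skipping empty access sections.
    std::string emit(int num) {
        std::string out = "class " + className[num] + " {\n";
        for (const char* access : {"public", "protected", "private"}) {
            const std::vector<std::string>& list = member[num][access];
            if (list.empty()) continue;
            out += std::string(access) + ":\n";
            for (const std::string& decl : list) out += "    " + decl + "\n";
        }
        return out + "};\n";
    }
};
```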
-
- A Short Example
- Listing 2 contains a short example of literate C++. The code doesn't do
- anything except to illustrate some of the features of the language. Note how
- descriptive text can be placed anywhere in the file. When the file is processed
- by awk and lcpp, two output files are created. With the input file named test.lc,
- the output files are named test.h (listing 3) and test.cc (listing 4). The command
- line used to generate these files was
-
- awk -f lcpp test.lc
-
- but this may vary between operating systems and versions of awk.
- The definition of Literate C++ is not complete. Non-member functions are
- not handled and inline member functions must be declared as inline in too many
- places. Also, little thought has been given to how documentation should be
- typeset. Documentation requirements are sure to force changes to the
- definition. With use, this prototype will help show what other changes need to
- be made to the language.
- Will literate C++ work? Think of all the programs you've had to learn over
- the years. Now think of those that have been the easiest to understand.
- Weren't the ones easiest to understand accompanied by articles in Computer
- Language, or Dr. Dobb's, or Byte: code and text in a literate programming style?
-
- Marco S. Hyman is a principal engineer, designing and writing software for a
- company in San Francisco. C++ and object-oriented programming are hobbies he
- pursues at home. He can be reached via e-mail (UUCP) at
- ...!pacbell!dumbcat!marc.
-
- Bibliography
- Aho, A.V., B.W. Kernighan, and P.J. Weinberger, The AWK Programming
- Language, Addison-Wesley, Reading, Mass. (1988).
- Liskov, B., and J. Guttag, Abstraction and Specification in Program
- Development, MIT Press, Cambridge, Mass. (1986).
-
- *1 Note: By rights this should be called C++WEB or WEB++. I thought of
- lc++ first and like the name so haven't changed it.
- *2 Note: Lcpp is written in awk and requires new awk (nawk for old UNIX
- hands.) I believe the DOS ports of awk are new awk compatible.
-
-
- Sidebar: Literate Programming
- Literate Programming is the name given by Donald Knuth to a programming
- language and documentation system built around the idea that a program can be
- considered a work of literature. It is Knuth's belief that a ``practitioner of
- literate programming can be regarded as an essayist, whose main concern is with
- exposition and excellence of style.'' These main concerns emphasize the goal of
- a literate program: explaining to another programmer what the computer is to
- do.
- Knuth's literate programming is implemented in WEB, a language that
- combines the features of two other languages, TeX and PASCAL. WEB programs are
- descriptions of software systems. A WEB description is processed by two other
- programs, TANGLE and WEAVE, to produce a PASCAL source file and a TeX input
- file. When the TeX input file is processed by TeX the output is a ``pretty
- printed'' version of the program with supporting documentation.
- WEB files are composed of modules with each module consisting of three
- parts: TeX explanatory material, definitions (WEB adds simple macros to
- PASCAL), and PASCAL code.
- Each module is more or less self-contained and should not be so long that
- its structure is hidden in its length and complexity. Modules are often a few
- lines long and rarely longer than a page.
- Other versions of WEB or WEB-like languages are also in use. CWEB is
- similar to WEB but the output is TeX and C. (This is not to be confused with
- the WEB2C tool that converts original WEB to C code.) loom is a preprocessor
- written by Janet Incerpi and Robert Sedgewick and used in the preparation of
- Sedgewick's book Algorithms (Addison-Wesley, Reading, Mass., 1983).
- Communications of the ACM has an occasional column on literate
- programming moderated by Christopher J. Van Wyk of AT&T Bell Laboratories. See
- the July 1987, December 1987, December 1988, June 1989, and September 1989
- issues. The latest column described the language SPIDER, which is used to
- generate WEBs for other languages.
-
- For more information see also:
- Bentley, J., D. Knuth, and D. McIlroy, ``Programming Pearls: A Literate
- Program,'' Communications of the ACM, 29,6 (June 1986), 471-483.
- Knuth, D., ``Literate Programming,'' Computer Journal, 27,2 (1984), 97-111
- Knuth, D., The WEB System of Structured Documentation, Stanford Computer
- Science Report CS980 (September 1983).
-
- Figure 1
-
-                        lc++ input (file.lc)
-                                  |
-                +-----------------+------------------+
-                |                                    |
-         lcpp awk script                    some future program
-                |                                    |
-         +------+------+                    +--------+---------+
-         |             |                    |                  |
-    C++ header    C++ source            class use    class implementation
-     (file.h)      (file.cc)       (man page, file.3)     (file.doc)
-
-
- Figure 2
-
- @title Assign a title to the output files.
- @copyright Put copyright info in output files.
- @code Flag the following lines as code to be written to an
- output file.
- @text Flag the following lines as documentation text, which is
- not written to the .h or .cc file.
- @specification Start defining a specification.
- @class Define a new class
- @base Specify a base class for a previous class definition.
- @public Specify a public interface to a class
- @protected Specify a protected interface to a class
- @private Specify a private interface to a class
- @requires Specify member function requirements
- @effects Specify member function effects
- @implementation Start defining an implementation
- @inline Define an inline member function
- @member Define a member function
-
-
- Listing 1 (lcpp)
-
- # @(#) lcpp 12feb90 (msh)
- # function timestamp: outputs the file creation timestamp
- # this function may not work on non-unix systems
- function timestamp( file,    d ) {
-     "date" | getline d
-     close( "date" )    # so the next call runs date again
-     print "// @(#) " file " created " d > file
- }
-
- # function notice: outputs the title and do not revise
- # notice for the passed file.
- function notice( title, file ) {
-     print title > file
-     print "" > file
-     print "// This file generated from the input file " ARGV[1] > file
-     print "// DO NOT REVISE THIS FILE." > file
-     print "// To make revisions modify the original input file." > file
-     print "" > file
- }
-
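- # function members: keep track of members by class and type.
- # Entries are kept in the order defined.
- function members( type,    name ) {
-     $1 = ""
-     if ( $2 ~ /^@/ ) {
-         # explicit @<classname>; look it up without creating an entry
-         name = substr($2,2); $2 = ""
-         classNum = ( name in class ) ? class[name] : 0
-     } else {
-         classNum = classCount
-     }
-     if ( !classNum ) {
-         error( "no class for " type " member" )
-         return
-     }
-     member[classNum,type,member[classNum,type]] = $0
-     ++member[classNum,type]
- }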
-
- # function error: print line number, error message,
- # and increase error counter
- function error( msg ) {
-     print "Line " NR ": " msg
-     errors++
- }
-
- # function doMembers: output members of a given type for a given class
- function doMembers( num, type,    i ) {
-     if ( member[num,type] > 0 ) {
-         print type ":" > hOut
-         for ( i = 0; i < member[num,type]; ++i ) {
-             print " " member[num,type,i] > hOut
-         }
-     }
- }
-
- # function doClass: outputs a class specification from
- # the internal class tables
- function doClass( num,    i ) {
-     # output the class header
-     print "" > hOut
-     printf "class %s", className[num] > hOut
-     # Add any base classes. Output the opening brace.
-     for ( i = 0; i < base[num]; ++i ) {
-         printf "%s", base[num,i] > hOut
-     }
-     print " {" > hOut
-     # output the various members
-     doMembers( num, "public" )
-     doMembers( num, "protected" )
-     doMembers( num, "private" )
-     # terminate the class.
-     print "};" > hOut
- }
-
- # verify the correct number of arguments and build
- # the names of the output files
- BEGIN {
-     if ( ARGC != 2 ) {
-         print "usage: " ARGV[0] " -f lcpp file"
-         exit 1
-     }
-     # replace the extension after the last dot in the file name,
-     # ignoring dots in directory names such as "./test.lc"
-     if ( match(ARGV[1], /\.[^.\/]*$/) ) {
-         stem = substr(ARGV[1], 1, RSTART)
-     } else {
-         stem = ARGV[1] "."
-     }
-     hOut = stem "h"
-     ccOut = stem "cc"
-     timestamp(hOut); timestamp(ccOut)
- }
-
- # @<anything>: turn off output whenever an @command is found
- $1 ~ /^@.*/ { outEnabled = 0 }
-
- # @title: The title is written to both output files as a comment.
- # output remains off.
- $1 == "@title" {
-     $1 = "// title:"; notice( $0, hOut ); notice( $0, ccOut ); next
- }
-
- # @copyright: Output is turned on so the following copyright info
- # is written to both output files.
- $1 == "@copyright" { hOutEnabled = 1; ccOutEnabled = 1; outEnabled = 1; next }
-
- # @specification: Marker for the start of a specification.
- # direct output to the header file only, but keep output disabled
- $1 == "@specification" { hOutEnabled = 1; ccOutEnabled = 0; next }
-
- # @text: Disable output (actually done above, just eat the @text)
- $1 == "@text" { next }
-
- # @code: Enable output for the following lines.
- $1 == "@code" { outEnabled = 1; next }
-
- # @class: look for class definition. Verify the class name.
- # Start storing info in an array entry for the class.
- $1 == "@class" {
-     if ( NF != 2 ) {
-         error( "invalid class definition" )
-     } else if ( $2 in class ) {
-         error( "duplicate class name" )
-     } else {
-         ++classCount; classNum = classCount
-         class[$2] = classNum; className[classNum] = $2
-         base[classNum] = 0
-         member[classNum,"public"] = 0
-         member[classNum,"protected"] = 0
-         member[classNum,"private"] = 0
-     }
-     next
- }
-
- # @base: define a base for the named class. If no class is
- # named, use the last class defined. Add it to the base class
- # array for the appropriate class.
- $1 == "@base" {
-     if ( $2 ~ /^@/ ) {
-         # look the name up without creating an empty class entry
-         name = substr($2,2); $2 = ""
-         classNum = ( name in class ) ? class[name] : 0
-     } else {
-         classNum = classCount
-     }
-     if ( classNum ) {
-         $1 = base[classNum] == 0 ? ":" : ","
-         base[classNum,base[classNum]] = $0
-         ++base[classNum]
-     } else {
-         error( "no class for base definition" )
-     }
-     next
- }
-
- # keep track of public entries by class.
- $1 == "@public" { members( "public" ); next }
-
- # keep track of protected entries by class.
- $1 == "@protected" { members( "protected" ); next }
-
- # keep track of private entries by class.
- $1 == "@private" { members( "private" ); next }
-
- # process @requires. Ignore for now.
- $1 == "@requires" { next; }
-
- # process @effects. Ignore for now.
- $1 == "@effects" { next; }
-
- # entering the implementation section of the input. Set code output to go
- # to the cc file after dumping the classes. Output remains off.
- $1 == "@implementation" {
-     for ( classNum = 1; classNum <= classCount; classNum++ ) {
-         doClass( classNum )
-     }
-     classCount = 0
-     hOutEnabled = 0
-     ccOutEnabled = 1
-     print "#include \"" hOut "\"" > ccOut
-     next
- }
-
- # member function definition. Enable output to the c file.
- $1 == "@member" { hOutEnabled = 0; ccOutEnabled = 1; outEnabled = 1; next }
-
- # inline member function. Enable output to the h file.
- $1 == "@inline" { hOutEnabled = 1; ccOutEnabled = 0; outEnabled = 1; next }
-
- # check if an invalid @command was given and flag the line number
- $1 ~ /^@/ { error( "unknown command" ); next }
-
- # if output is enabled for the header file write this line out
- outEnabled == 1 && hOutEnabled == 1 { print $0 > hOut }
-
- # if output is enabled for the cc file write this line out
- outEnabled == 1 && ccOutEnabled == 1 { print $0 > ccOut }
-
- END {
-     # exit in BEGIN still runs END; after a usage error there is nothing to do
-     if ( hOut == "" ) exit 1
-     for ( classNum = 1; classNum <= classCount; classNum++ ) {
-         doClass( classNum )
-     }
-     close( hOut )
-     close( ccOut )
-     if ( errors ) {
-         print errors " error(s) found"
-         exit 1
-     } else {
-         print "generated " hOut " and " ccOut
-     }
- }
-
- Listing 2 (test.lc)
-
- @title Example Program
- This text does not go in either file.
- @copyright
- /*
- * This class doesn't do anything.
- */
- @text
- Note: Copyright output is to both files until
- the next @command
- @specification
- Code output is not enabled. If you wish something
- to be written to the header file you must turn on
- code generation by using an @code
- @code
-
- #include <stdio.h>
- @text
- stdio.h was included above as it is used by
- one of the inline functions.
- @class testClass
- This is where testClass is described.
- @private int dataMember;
- This is where dataMember is described.
- @public inline testClass();
- @requires
- The requirements, if any, of the testClass
- constructor.
- @effects
- The effects of calling the testClass constructor
- @text
- General text about the constructor.
- @public virtual ~testClass();
- @implementation
- Text describing implementation issues.
- @code
-
- // this will be part of the .cc file
-
- @text
- The next function is inline, so it will be added
- to the header file. This assumes that the function
- has been declared inline above.
- @inline
- testClass::testClass()
- {
- @text
- Text can be added even in the middle of a function.
- Just use @code to start outputting code again.
- @code
- printf( "testClass constructor\n" );
- }
- @text
- The next function is a member function.
- @member
- testClass::~testClass()
- {
- // do something here
- }
-
- Listing 3 (test.h)
-
- // @(#) test.h created Thu Mar 29 17:56:47 PST 1990
- // title: Example Program
-
- // This file generated from the input file test.lc
- // DO NOT REVISE THIS FILE.
- // To make revisions modify the original input file.
-
- /*
- * This class doesn't do anything.
- */
-
- #include <stdio.h>
-
- class testClass {
- public:
- inline testClass();
- virtual ~testClass();
- private:
- int dataMember;
- };
- testClass::testClass()
- {
- printf( "testClass constructor\n" );
- }
-
- Listing 4 (test.cc)
-
- // @(#) test.cc created Thu Mar 29 17:56:47 PST 1990
- // title: Example Program
-
- // This file generated from the input file test.lc
- // DO NOT REVISE THIS FILE.
- // To make revisions modify the original input file.
-
- /*
- * This class doesn't do anything.
- */
- #include "test.h"
-
- // this will be part of the .cc file
-
- testClass::~testClass()
- {
- // do something here
- }
-