Text File | 1996-02-07 | 2.9 MB | 88,156 lines |
-
-
- A Short Floating-Point Type in C++
-
-
- William Smith
-
-
- William Smith is the engineering manager at Montana Software, a software
- development company specializing in custom applications for MS-DOS and
- Windows. You may contact him by mail at P.O. Box 663, Bozeman, MT 59771-0663.
-
-
-
-
- Introduction
-
-
- Even though a typical microcomputer can have up to ten times the memory of one
- just a few years ago, there are still programming problems where memory is a
- limiting factor. I frequently bump into memory limitations in embedded and
- data acquisition applications. Numerous times I have had to work with a large
- quantity of floating-point numbers in a confining space. A common situation
- is the acquisition of large amounts of data through a 14-bit (or smaller)
- A-to-D (Analog to Digital) converter.
- Storing these numbers as 32-bit floats always seemed like overkill to me and a
- waste of space. This was especially annoying when I had to store tens of
- thousands of points in an array and would hit some kind of a memory limitation
- such as a segment boundary, physical memory limit, or even a file or disk size
- limit. The standard float type works, but it is a poor match for the
- problem to be solved. Matching the floating-point size to what an application
- needs can result in significant memory savings in data-intensive programs.
- I really only needed a 16-bit floating-point type instead of the native
- 32-bit float. At first, I played some games and stored the data as short int.
- But this forced me to convert the data to float to do anything useful with it.
- I wanted a short floating-point type. I even implemented one, albeit crudely,
- in C. C allowed me to do it, but the conversion process never was clean or
- transparent. With C++, I was finally able to do what I wanted. I was able to
- create a short floating-point type that I could use naturally in my
- applications. C++ can hide all the dirty work, such as conversions.
- The new type, which I call sfloat, even allowed me to control range and
- precision. Some situations called for a floating-point type that ranged
- between 0 and 10.0 and maximized the precision within that range. Other
- situations required a larger signed range but less precision. Being able to
- tailor the characteristics of the type to meet an application's needs was a
- practical feature I built into sfloat.
- I implemented the sfloat type in "Standard C++" (if there is such a beast).
- The code works with Microsoft C++ and Borland C++ under MS-DOS and MS-Windows.
- It has some dependencies on the size of the standard types float, unsigned
- short int, and long. It assumes that:
- a float is 32 bits
- an unsigned short int is 16 bits
- a long is 32 bits
- It also assumes that the float type is that defined by the IEEE standard for
- 32-bit floating-point values. Table 1 gives the IEEE details. As long as a
- compiler and operating system conform to these restrictions, the code for
- sfloat will probably work in other environments.
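- The layout assumptions listed above can be checked by the compiler instead of
- discovered at run time. The sketch below uses C++11 facilities (static_assert,
- <cstdint>, std::numeric_limits) that postdate this article, so it is an
- addition rather than part of the original code. It checks the fixed-width
- std::uint32_t instead of long, because long is 64 bits on many current 64-bit
- UNIX-like systems; code that relies on a 32-bit long would need its own check.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// Compile-time checks of the layout assumptions sfloat depends on.
static_assert(CHAR_BIT == 8, "8-bit bytes assumed");
static_assert(sizeof(float) == 4, "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559,
              "float must use the IEEE 754 (IEC 559) format");
// 23 stored mantissa bits plus the implied high-order bit.
static_assert(std::numeric_limits<float>::digits == 24,
              "expected a 24-bit float significand");
static_assert(sizeof(unsigned short) == 2, "unsigned short must be 16 bits");
// A 32-bit unsigned integer for bit manipulation of a float.
static_assert(sizeof(std::uint32_t) == sizeof(float),
              "need a 32-bit unsigned integer the size of a float");
```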
-
-
- Implementation
-
-
- Listing 1, sfloat.hpp, defines a C++ class called sfloat. The class has
- numerous private static members, one protected member and numerous public
- member functions. There are even some non-member functions prototyped in
- sfloat.hpp.
- The static data provides a workspace for conversion between sfloat and float.
- This static data is class specific. All instances, or objects, of class sfloat
- share the same static data. The protected member s is the only object instance
- data. This member is unique to each instance of sfloat. In fact, the sizeof
- operator will report the size of sfloat to be the size of this member, 2
- bytes.
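- The split between shared static data and per-object data is easy to see in a
- stripped-down class. The sketch below is hypothetical (tiny_sfloat is not the
- article's class): it declares two static members and one instance member, and
- sizeof counts only the instance member. On the usual compilers that is 2
- bytes, although the language would permit padding.

```cpp
// Hypothetical miniature of sfloat's storage layout.
class tiny_sfloat {
private:
    static unsigned short sfExpBits;   // one copy, shared by every object
    static unsigned short sfManBits;   // one copy, shared by every object
protected:
    unsigned short s;                  // the only per-object data
public:
    tiny_sfloat() : s(0) {}
    static unsigned short exponent_bits() { return sfExpBits; }
    static unsigned short mantissa_bits() { return sfManBits; }
};

// Static data members are defined once, outside the class, as in Listing 3.
unsigned short tiny_sfloat::sfExpBits = 8;
unsigned short tiny_sfloat::sfManBits = 7;
```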
-
-
- Constructors
-
-
- One of the most elemental functions for a C++ class is the constructor. A
- constructor has the same name as its class. Furthermore, you can
- overload the constructor to provide construction from (conversion from)
- different types. The sfloat class has three constructors.
- sfloat();
- sfloat(float f);
- sfloat(sfloat& sf);
- sfloat() defines the "default" construction of an sfloat object, such as on
- the stack. The compiler generates a default constructor automatically only
- for a class that declares no constructors at all; because sfloat declares
- others, it must define this one explicitly. sfloat(float f) converts a
- floating-point number to an sfloat to initialize the stored value.
- sfloat(sfloat& sf) initializes the new object by making a copy of another
- sfloat object. These three constructors provide the functionality needed to
- support the following declarations using sfloat.
- sfloat sf1;
- // uses sfloat();
- sfloat sf2 = 1.0f;
- // uses sfloat(float f);
- sfloat sf3 = sf2;
- // uses sfloat(sfloat& sf);
- These three types of construction and initialization cover the minimum
- required to use sfloat type naturally. The code for the constructor functions
- resides in Listing 2, sfloat.inl. sfloat() and sfloat(sfloat& sf) are very
- simple. On the other hand, sfloat(float f) has to do a bit of work. It has to
- convert a float to an unsigned short and assign it to the object instance data
- member s.
- The conversion process used in sfloat(float f) truncates the mantissa bits to
- a lower precision. It also lowers the range of the exponent by discarding
- higher-order bits. The conversion process utilizes some of the static data
- members of class sfloat as a work space and to hold intermediate values. The
- bitwise shift operators << and >> move the bits that will be kept from the
- float value into place before they are packed into an unsigned short.
- Since none of the constructor functions allocates memory on the heap (free
- store) using new, there is no need to define a destructor function. The
- destructor that C++ generates by default, which does nothing, is sufficient.
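- The truncating conversion described above can be sketched in modern C++ with
- fixed-width integers. This is an illustration under stated assumptions, not
- the author's code: the names ShortLayout and pack are hypothetical,
- std::memcpy replaces the union for reading the float's bits, and the
- edge-case policy is the sketch's own. It supports 2 to 8 exponent bits, keeps
- an all-zero exponent field for zero (with a rebiased exponent, a zero cannot
- simply be shifted into place), reserves the all-ones field as IEEE does,
- flushes values too small to represent to zero, and saturates values that are
- too large.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical description of a 16-bit layout: [sign?][exponent][mantissa].
struct ShortLayout {
    unsigned expBits;   // this sketch supports 2..8
    bool isSigned;
    unsigned manBits() const { return 16u - expBits - (isSigned ? 1u : 0u); }
    int bias() const { return (1 << (expBits - 1)) - 1; }
    int maxField() const { return (1 << expBits) - 2; }  // all-ones reserved
};

// Pack a float into 16 bits by re-biasing the exponent and truncating the
// mantissa (no rounding).
std::uint16_t pack(float f, const ShortLayout& L) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);          // portable access to the bits
    const unsigned M = L.manBits();
    const std::uint32_t sign = bits >> 31;
    if (sign && !L.isSigned) return 0;            // unsigned: clamp negatives to 0
    const std::uint32_t signField = L.isSigned ? (sign << 15) : 0u;

    const int fexp = static_cast<int>((bits >> 23) & 0xFFu);
    int field = fexp - 127 + L.bias();            // float bias -> short bias
    std::uint32_t man = (bits & 0x7FFFFFu) >> (23u - M);  // keep top M bits

    if (fexp == 0 || field < 1)                   // zero, denormal, underflow
        return static_cast<std::uint16_t>(signField);
    if (fexp == 0xFF || field > L.maxField()) {   // Inf, NaN, overflow: saturate
        field = L.maxField();
        man = (1u << M) - 1u;
    }
    return static_cast<std::uint16_t>(
        signField | (static_cast<std::uint32_t>(field) << M) | man);
}
```

- With eight exponent bits and a sign bit, pack returns exactly the high-order
- 16 bits of the float (for -1.5f, 0xBFC0), which is the special default case
- the article handles separately.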
-
-
- Conversion to float
-
-
- We also need a way to convert an sfloat object to a float. To use conventional
- notation, we need to define the operator function
- sfloat::operator float()
- Listing 2, sfloat.inl, contains the definition of this function. You will
- notice that its logic is just the reverse of sfloat::sfloat(float f). The
- shift operators once again move the bits of the sfloat into the proper
- locations in the 32 bits of a float. The extra bits are filled with zeros.
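- The reverse direction can be sketched the same way. The names ShortLayout and
- unpack are hypothetical and the conventions are assumptions of the sketch:
- exponent field 0 stands for zero, a signed layout keeps its sign in the top
- bit, and the discarded low mantissa bits come back as zeros.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical description of a 16-bit layout: [sign?][exponent][mantissa].
struct ShortLayout {
    unsigned expBits;   // 2..8 in this sketch
    bool isSigned;
    unsigned manBits() const { return 16u - expBits - (isSigned ? 1u : 0u); }
    int bias() const { return (1 << (expBits - 1)) - 1; }
};

// Widen each field back to its IEEE float position.
float unpack(std::uint16_t s, const ShortLayout& L) {
    const unsigned M = L.manBits();
    const std::uint32_t bits16 = s;
    const std::uint32_t man = bits16 & ((1u << M) - 1u);
    const std::uint32_t field = (bits16 >> M) & ((1u << L.expBits) - 1u);
    std::uint32_t bits = L.isSigned ? (bits16 >> 15) << 31 : 0u;   // sign
    if (field != 0) {                     // field 0 encodes (signed) zero
        const std::uint32_t fexp =
            field + 127u - static_cast<std::uint32_t>(L.bias());
        bits |= (fexp << 23) | (man << (23u - M));   // low bits stay zero
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

- For example, with five exponent bits and no sign bit, unpack(15u << 11,
- ShortLayout{5, false}) yields 1.0f: field 15 minus the bias of 15 leaves an
- exponent of zero, and the mantissa field is empty.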
-
-
-
- Overloaded Operators
-
-
- Operator overloading is one of the features of C++ that allow you to use newly
- defined types just like the built-in types. Operator overloading is
- not so much an object-oriented feature as a convenience. Table 2, an extract
- from Listing 1, lists the operator functions defined for sfloat. This list
- includes all the operators that one commonly uses on floating-point numbers.
- These operator functions allow you to use objects of the class sfloat just
- like you would a standard floating-point type.
- Operator overloading is a fairly straightforward feature of C++ and is covered
- well elsewhere. I recommend the "Stepping Up To C++" series of articles on
- "Operator Overloading" by Dan Saks (see CUJ January, March, May, and July
- 1992). I took a very simple approach to implementing these operators. I
- convert to float, use the predefined operations, then convert back to sfloat.
- For example, here is the code for the add-assignment operator:
- inline sfloat &sfloat::
- operator+=(sfloat sf)
- {
- float f = (float)*this;
- f += (float)sf;
- *this = (sfloat)f;
- return ( *this );
- } // operator+=
- This technique is not the most efficient (it has to do three type
- conversions), but it sure is simple. My needs for the sfloat type were
- data-size driven, not code-speed or code-size driven. Consequently I can live
- with the overhead of all those conversions. If you cannot, you could rewrite
- some of these routines to operate directly on the sfloat type.
- I would like to emphasize that you can get trapped into inefficiency with
- operator overloading. If you are not careful, your operator overloading can
- force unneeded object construction and destruction, especially for the
- operators +, -, *, and /. One trick to avoid this is to use the corresponding
- assignment operators (such as +=) with a reference return type to define the
- other math operators. This technique results in the interesting side effect
- that the operators +, -, *, and / are neither member nor friend functions.
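- The trick of building + on top of += can be shown in a self-contained form.
- The class sf16 is a hypothetical stand-in for sfloat that stores only the
- high 16 bits of a float (the default layout). Its operator float is marked
- explicit, a C++11 feature, so that a mixed expression such as a + 2.25f is
- not ambiguous between this operator+ and built-in float addition.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for sfloat: keeps the high 16 bits of a float.
class sf16 {
public:
    sf16(float f = 0.0f) {
        std::uint32_t b;
        std::memcpy(&b, &f, sizeof b);
        s_ = static_cast<std::uint16_t>(b >> 16);   // truncate low mantissa bits
    }
    explicit operator float() const {
        const std::uint32_t b = static_cast<std::uint32_t>(s_) << 16;
        float f;
        std::memcpy(&f, &b, sizeof f);
        return f;
    }
    // Member compound assignment: convert, add in float, convert back, and
    // return a reference so no extra object is built for the result.
    sf16& operator+=(sf16 rhs) {
        *this = sf16(static_cast<float>(*this) + static_cast<float>(rhs));
        return *this;
    }
private:
    std::uint16_t s_;
};

// Non-member, non-friend: lhs arrives as a copy, += updates that copy in
// place, and the copy is returned. Only the public += is used.
inline sf16 operator+(sf16 lhs, sf16 rhs) { return lhs += rhs; }
```

- Because lhs is taken by value, the one copy the operator needs is made at the
- call, and += then works on that copy rather than on a separate temporary.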
-
-
- Inlining
-
-
- In implementing the sfloat class, I chose to inline the overloaded operator
- functions and the conversion functions. Inlining a function means that its
- code gets inserted into your compiled program each time the function is
- called. This can cause your program to bloat in size unexpectedly. If you find
- this happening, I recommend you do not inline at least the two conversion
- functions sfloat::sfloat(float f) and sfloat::operator float(). Both are
- fairly long. But experiment first. To discontinue inlining for a function,
- remove the inline modifier from its function definition and move the
- definition from the file sfloat.inl (Listing 2) to sfloat.cpp (Listing 3).
- Some of the operator functions are very short. You may wonder why I did not
- include their definitions with the class definition in the file sfloat.hpp.
- Instead I grouped all the inline functions in the file sfloat.inl. This is
- not quite standard practice, but I have to agree with Walter Bright, one of
- the C++ compiler pioneers, that inline function bodies appearing in the class
- body clutter the class definition (C++ Report, October 1992).
- Including inline functions with the class definition also violates the
- separation of a class's member function implementations from the class
- definition. For maintenance purposes, it is a good technique to isolate the
- two. The class definition is the class interface and should change less than
- the member function implementation.
-
-
- Controlling Range and Precision
-
-
- The function sfloatrange, Listing 3 (sfloat.cpp), provides a way to adjust the
- range, signedness, and precision of the sfloat type:
- friend void sfloatrange(
- unsigned short sfNumExpBits,
- unsigned short sfSigned);
- The first parameter is the number of exponent bits. This can be any number
- from 1 to 8. The higher the number the larger the range of values that sfloat
- can represent. Eight bits is the same as for the standard float type. Table 3
- shows the maximum value that sfloat can represent for each of the possible
- numbers of exponent bits.
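- The maximum values in Table 3 follow a simple pattern: with e exponent bits
- the exponent bias is 2^(e-1) - 1, and each tabulated maximum equals
- 2^(bias + 1), which is 2^(2^(e-1)). The short function below (its name is
- hypothetical) reproduces the table's column under that reading; treat the
- results as approximate upper bounds, since the exact largest value also
- depends on the number of mantissa bits.

```cpp
#include <cmath>

// Approximate largest sfloat magnitude for a given number of exponent bits,
// following the pattern of Table 3: 2^(2^(e-1)).
double sfloat_max_approx(unsigned expBits) {
    return std::ldexp(1.0, 1 << (expBits - 1));   // 1.0 * 2^(2^(e-1))
}
```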
- The second parameter determines whether or not sfloat is a signed value. If
- the value is signed, sfloat reserves one of its bits as a sign bit. The number
- of mantissa bits is the remaining bits out of 16 not used by the exponent or
- the sign. That number can range from a minimum of seven to a maximum of 15:
- The minimum of 7 results from specifying eight exponent bits and designating
- sfloat as signed.
- The maximum of 15 results from specifying one exponent bit and designating
- sfloat as unsigned.
- Table 4 lists all the possible numbers of mantissa bits and the corresponding
- (minimum) number of significant decimal digits.
- I have encountered a requirement to have an unsigned floating-point
- representation that needs only four significant digits and a range of four
- orders of magnitude (0 to 10^4). A combination of 11 mantissa bits, five
- exponent bits and no sign bit worked fine.
- The defaults, if you do not call sfloatrange, are eight exponent bits, a sign
- bit, and seven mantissa bits. This yields the same range as the standard
- float, but with much less precision. These values make the conversions between
- sfloat and float particularly easy. You can just use a union of a float and
- two unsigned shorts. To convert from a float, just store in the float member
- of the union and extract the second unsigned short. To convert from an sfloat,
- you reverse the process. Notice that the conversion functions do this for the
- special default range situation.
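- For the default layout, the sfloat is simply the high-order half of the
- float's bit pattern (the same sign, exponent, and top seven mantissa bits
- later used by the bfloat16 format). The union shortcut relies on byte order:
- s[1] holds the high-order half only on a little-endian machine such as the
- x86. A sketch that shifts a 32-bit integer instead does not depend on byte
- order; the function names are hypothetical, and std::memcpy stands in for the
- union because reading a union member other than the one last written is not
- sanctioned by standard C++.

```cpp
#include <cstdint>
#include <cstring>

// Default layout: keep the high 16 bits (sign, 8-bit exponent, top 7
// mantissa bits). Shifts act on values, so byte order does not matter.
std::uint16_t float_to_short(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);   // truncate, no rounding
}

float short_to_float(std::uint16_t s) {
    const std::uint32_t bits = static_cast<std::uint32_t>(s) << 16;  // low half = 0
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```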
- There are limitations with range setting. The sfloat class uses static data to
- preserve the range information. This prevents you from tailoring the range
- individually for each instance (object) of the class. In other words, once you
- set the range, all sfloat objects have the same range. You could have each
- instance retain information about the number of exponent, mantissa, and sign
- bits, but this would require each object to store information about range and
- defeat the desire to save space. Use of static data also helps to speed up
- conversions.
- Use of static data to store the range information has repercussions in
- multitasking or multithreaded environments. Static data prevents the code for
- sfloat from being re-entrant. You cannot preserve different range information
- between tasks if the tasks are sharing the same code such as a Windows DLL
- (Dynamic Link Library).
- To keep sfloat small and make it re-entrant would require eliminating the size
- adjustability. This would force you to create a different class for each of
- the different range combinations used. Some real-time or multitasking
- situations may demand you eliminate the range adjustability.
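- One modern way to get a different class for each range combination without
- writing each by hand is a class template whose parameters are the layout.
- The sketch below uses C++11 constexpr and static_assert; the name
- short_float is hypothetical, and it shows only the layout, with the
- conversions omitted. They would read these compile-time constants where
- sfloat reads its static data members, so no shared mutable state is left to
- make the code non-re-entrant.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alternative to run-time sfloatrange(): the layout is fixed at
// compile time, so each combination is a distinct type.
template <unsigned ExpBits, bool Signed>
class short_float {
    static_assert(ExpBits >= 1 && ExpBits <= 8, "1 to 8 exponent bits");
public:
    static constexpr unsigned exp_bits = ExpBits;
    static constexpr unsigned sign_bits = Signed ? 1u : 0u;
    static constexpr unsigned man_bits = 16u - ExpBits - sign_bits;
    static constexpr int bias = (1 << (ExpBits - 1)) - 1;
private:
    std::uint16_t s = 0;   // still the only per-object data
};

using sfloat_default = short_float<8, true>;    // sfloat's defaults
using sfloat_0_to_1e4 = short_float<5, false>;  // 11 mantissa, 5 exponent, no sign

static_assert(sizeof(sfloat_default) == 2, "two bytes per value");
static_assert(!std::is_same<sfloat_default, sfloat_0_to_1e4>::value,
              "different layouts are different types");
```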
-
-
- Conclusions
-
-
- Some of the basic features of C++ make the solution to specific problems
- elegant and easy compared to C. I have presented a short floating-point type
- sfloat that utilizes operator overloading for notational convenience. You can
- easily integrate this new type into your C++ applications. The sfloat type is
- a 16-bit (two-byte) floating-point representation that you can use instead of
- the standard four-byte float.
- The sfloat type has appeal in applications that need only 16 bits for a
- floating-point type and require the storage of large amounts of data. If you
- have particular requirements on range, precision, and signedness, you can
- tailor this type to best match your needs. In this way, you can get as many as
- five significant decimal digits (only one less than the standard float) in the
- range 0.0 to 2.0. You can also trade precision for range to get the same range
- of magnitudes as the standard float but with only two significant digits
- (three if you give up the sign bit).
- Table 1 IEEE 32-bit float format
- Bits    Meaning
- --------------------------------------------------------------------
- 0-22    23-bit mantissa between 1.0 and 2.0 (high-order bit implied)
- 23-30   eight-bit exponent (excess-127 binary exponent)
- 31      sign bit
- Table 2 Overloaded Operators for sfloat
- Member function assignment operators
-
- --------------------------------------------------
- sfloat &operator+=( sfloat sf );
- sfloat &operator-=( sfloat sf );
- sfloat &operator*=( sfloat sf );
- sfloat &operator/=( sfloat sf );
-
- Member function unary operators
- --------------------------------------------------
- sfloat operator+();
- sfloat operator-();
- sfloat operator++();
- sfloat operator--();
- sfloat operator++( int );
- sfloat operator--( int );
-
- Friend function relational operators
- --------------------------------------------------
- friend int operator==( sfloat sf1, sfloat sf2 );
- friend int operator>=( sfloat sf1, sfloat sf2 );
- friend int operator<=( sfloat sf1, sfloat sf2 );
- friend int operator>( sfloat sf1, sfloat sf2 );
- friend int operator<( sfloat sf1, sfloat sf2 );
- friend int operator!=( sfloat sf1, sfloat sf2 );
-
- Non-member, non-friend function math operators
- --------------------------------------------------
- sfloat operator+( sfloat sf1, sfloat sf2 );
- sfloat operator-( sfloat sf1, sfloat sf2 );
- sfloat operator*( sfloat sf1, sfloat sf2 );
- sfloat operator/( sfloat sf1, sfloat sf2 );
- Table 3 Exponent size and range
- Exponent bits Maximum Value
- ----------------------------
- 1 2
- 2 4
- 3 16
- 4 256
- 5 65,536
- 6 4.29*10^9
- 7 1.84*10^19
- 8 3.40*10^38
- Table 4 Mantissa size and number of significant digits
- Mantissa bits Significant Digits
- ---------------------------------
- 7 2
- 8 3
- 9 3
- 10 3
- 11 4
- 12 4
- 13 4
- 14 4
- 15 5
-
- Listing 1 Definition of class sfloat
- #if !defined ( SFLOAT_DEFINED )
- #define SFLOAT_DEFINED
-
- union conv
-
- {
- float f;
- long l;
- unsigned short s[2];
- };
-
- class sfloat
- {
-
- private:
- // class data
- static unsigned long fManSignMask;
- static unsigned long fExpMask;
- static unsigned long fManMask;
- static unsigned short sfManSignMask;
- static unsigned short ManSign;
- static unsigned short Exp;
- static unsigned short Man;
- static unsigned short fBias;
- static unsigned short sfBias;
- static unsigned short fsfBias;
- static unsigned short Signed;
- static unsigned short fExpBits;
- static unsigned short sfExpBits;
- static unsigned short fManBits;
- static unsigned short sfManBits;
- static unsigned short fManSignBits;
- static unsigned short sfManSignBits;
- static unsigned short sfBits;
- static unsigned short fManShift1;
- static unsigned short fManShift2;
- static unsigned short sfManShift;
- static unsigned short fExpShift;
- static unsigned short sfExpShift;
- static unsigned short sfExpBitsMin;
- static unsigned short sfExpBitsMax;
- static union conv u;
-
- protected:
- // object instance data
- unsigned short s;
-
- public:
- // constructors
- sfloat();
- sfloat( float );
- sfloat( sfloat& sf );
-
- // conversion
- operator float();
-
- // member function assignment operators
- sfloat &operator+=( sfloat sf );
- sfloat &operator-=( sfloat sf );
- sfloat &operator*=( sfloat sf );
- sfloat &operator/=( sfloat sf );
-
- // member function unary operators
- sfloat operator+();
-
- sfloat operator-();
- sfloat operator++();
- sfloat operator--();
- sfloat operator++( int );
- sfloat operator--( int );
-
- // friend function relational operators
- friend int operator==( sfloat sf1, sfloat sf2 );
- friend int operator>=( sfloat sf1, sfloat sf2 );
- friend int operator<=( sfloat sf1, sfloat sf2 );
- friend int operator>( sfloat sf1, sfloat sf2 );
- friend int operator<( sfloat sf1, sfloat sf2 );
- friend int operator!=( sfloat sf1, sfloat sf2 );
-
- // utility function
- friend void sfloatrange(
- unsigned short sfNumExpBits,
- unsigned short sfSigned );
-
- }; // class sfloat
-
- // non-member function math operators
- sfloat operator+( sfloat sf1, sfloat sf2 );
- sfloat operator-( sfloat sf1, sfloat sf2 );
- sfloat operator*( sfloat sf1, sfloat sf2 );
- sfloat operator/( sfloat sf1, sfloat sf2 );
-
- #include <sfloat.inl>
-
- #endif
-
- // End of SFLOAT.HPP
-
-
- Listing 2 Constructor functions and overloaded operators
- inline sfloat::sfloat() {}
-
- inline sfloat::sfloat( float f )
- {
-
- // Init conversion union
- u.f = f;
-
- // Get the sign
- if ( Signed )
- {
- if ( !fsfBias )
- {
- s = u.s[1];
- return;
- }
- s = (unsigned short)
- (( u.l & fManSignMask ) >> sfBits );
- }
- else
- {
- s = 0;
- }
-
-
- // Get the exponent
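- Exp = (unsigned short)
- (( u.l & fExpMask ) >> fManBits ) -
- fsfBias;
-
- // Compress the exponent (the cast discards the high-order bits
- // explicitly, as 16-bit int arithmetic did implicitly)
- Exp = (unsigned short)( Exp << sfExpShift ) >> sfManSignBits;
-
- // Get the mantissa
- Man = (unsigned short)(( u.l & fManMask ) >>
- ( sfManShift ));
-
- // Combine with the sign bit already stored in s
- s |= Man | Exp;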
-
- } // sfloat( float )
-
- inline sfloat::sfloat( sfloat& sf ) : s( sf.s ) {}
-
- // Cast - conversion operators
- inline sfloat::operator float()
- {
-
- // Get the sign - Init conversion union
- if ( Signed )
-
- {
- if ( !fsfBias )
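- {
- u.s[0] = 0; // zero the discarded low mantissa bits
- u.s[1] = s;
- return ( u.f );
- }
- u.l = ( (unsigned long)
- (s & sfManSignMask )) << sfBits;
- }
- else
- {
- u.l = 0L;
- }
-
- // Get exponent (the casts drop bits shifted past bit 15,
- // as 16-bit int arithmetic did implicitly)
- u.l |= (unsigned long)(( ( (unsigned short)( s << sfManSignBits )) >>
- fExpShift ) + fsfBias ) << fManBits;
-
- // Get the mantissa
- u.l |= ( (unsigned long)(unsigned short)( s << fManShift1 )) <<
- fManShift2;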
-
- return ( u.f );
-
- } // operator(float)
-
- // Overloaded operators
-
- // Assignment operators
- inline sfloat &sfloat::operator+=( sfloat sf )
- {
- float f = (float)*this;
- f += (float)sf;
- *this = (sfloat)f;
-
- return ( *this );
- } // operator+=
-
- inline sfloat &sfloat::operator-=( sfloat sf )
- {
- float f = (float)*this;
- f -= (float)sf;
- *this = (sfloat)f;
- return ( *this );
- } // operator-=
-
- inline sfloat &sfloat::operator*=( sfloat sf )
- {
- float f = (float)*this;
- f *= (float)sf;
- *this = (sfloat)f;
- return ( *this );
- } // operator*=
-
- inline sfloat &sfloat::operator/=( sfloat sf )
- {
- float f = (float)*this;
- f /= (float)sf;
- *this = (sfloat)f;
- return ( *this );
- } // operator/=
-
- // increment operators
- inline sfloat sfloat::operator++()
- { return ( *this += 1.0f ); }
-
- inline sfloat sfloat::operator--()
- { return ( *this -= 1.0f ); }
-
- inline sfloat sfloat::operator++( int )
- {
- sfloat sf( *this );
- *this += 1.0f;
- return ( sf );
- } // operator ++
-
- inline sfloat sfloat::operator--(int )
- {
- sfloat sf( *this );
- *this -= 1.0f;
- return ( sf );
- } // operator --
-
- // sign change operators
- inline sfloat sfloat::operator+()
- { return ( *this ); }
-
- inline sfloat sfloat::operator-()
- {
- sfloat sf( 0.0f );
- return ( sf - *this );
- } // operator -
-
- // Relational operators. == and != compare the stored bits; the
- // ordering operators compare float values, because the raw bit
- // patterns of negative values (sign bit set) sort above positive ones.
- inline int operator==( sfloat sf1, sfloat sf2 )
- { return ( sf1.s == sf2.s ); }
-
- inline int operator<=( sfloat sf1, sfloat sf2 )
- { return ( (float)sf1 <= (float)sf2 ); }
-
- inline int operator>=( sfloat sf1, sfloat sf2 )
- { return ( (float)sf1 >= (float)sf2 ); }
-
- inline int operator<( sfloat sf1, sfloat sf2 )
- { return ( (float)sf1 < (float)sf2 ); }
-
- inline int operator>( sfloat sf1, sfloat sf2 )
- { return ( (float)sf1 > (float)sf2 ); }
-
- inline int operator!=( sfloat sf1, sfloat sf2 )
- { return ( !( sf1.s == sf2.s )); }
-
- // math operations
- inline sfloat operator+( sfloat sf1, sfloat sf2 )
- { return ( sf1 += sf2 ); }
-
- inline sfloat operator-( sfloat sf1, sfloat sf2 )
- { return ( sf1 -= sf2 ); }
-
- inline sfloat operator*( sfloat sf1, sfloat sf2 )
- { return ( sf1 *= sf2 ); }
-
- inline sfloat operator/( sfloat sf1, sfloat sf2 )
- { return ( sf1 /= sf2 ); }
-
- // End of SFLOAT.INL
-
-
- Listing 3 Definition of default values and function sfloatrange
- #include <sfloat.hpp>
-
- // Work space
- unsigned short sfloat::Exp;
- unsigned short sfloat::Man;
-
- // Bit masks to extract parts of ieee float
- unsigned long sfloat::fManSignMask = 0x80000000L;
- unsigned long sfloat::fExpMask = 0x7F800000L;
- unsigned long sfloat::fManMask = 0x007FFFFFL;
-
- // Bit mask to extract parts of short float
- unsigned short sfloat::sfManSignMask = 0x8000;
-
- // float exponent bias
- unsigned short sfloat::fBias = 127;
-
- // short float exponent bias
- unsigned short sfloat::sfBias = 127;
-
- // fBias - sfBias
- unsigned short sfloat::fsfBias= 0;
-
- // if signed flag
- unsigned short sfloat::Signed = 1;
-
-
- // number of float exponent bits
- unsigned short sfloat::fExpBits = 8;
-
- // number of short float exponent bits
- unsigned short sfloat::sfExpBits = 8;
-
- // number of float mantissa bits
- unsigned short sfloat::fManBits = 23;
-
- // number of short float mantissa bits
- unsigned short sfloat::sfManBits = 7;
-
- // number of float mantissa sign bits
- unsigned short sfloat::fManSignBits = 1;
-
- // number of short float mantissa sign bits
- unsigned short sfloat::sfManSignBits = 1;
-
- // number of short float bits
- unsigned short sfloat::sfBits = 16;
-
- // float mantissa shift
- unsigned short sfloat::fManShift1 = 9;
-
- // float mantissa shift
- unsigned short sfloat::fManShift2 = 7;
-
- // short float mantissa shift
- unsigned short sfloat::sfManShift = 16;
-
- // float exponent shift
- unsigned short sfloat::fExpShift = 8;
-
- // short float exponent shift
- unsigned short sfloat::sfExpShift = 8;
-
- // short float exponent minimum bits
- unsigned short sfloat::sfExpBitsMin = 1;
-
- // short float exponent maximum bits
- unsigned short sfloat::sfExpBitsMax = 8;
-
- // union for conversion
- union conv sfloat::u;
-
- void sfloatrange( unsigned short sfNumExpBits,
- unsigned short sfSigned )
- {
-
- // Set the number of short float exponent bits
- sfloat::sfExpBits = sfNumExpBits;
- if ( sfloat::sfExpBits > sfloat::sfExpBitsMax )
- {
- sfloat::sfExpBits = sfloat::sfExpBitsMax;
- }
- else if ( sfloat::sfExpBits <
- sfloat::sfExpBitsMin )
- {
- sfloat::sfExpBits = sfloat::sfExpBitsMin;
-
- }
-
- // Set the number of short float sign bits
- if ( sfSigned )
- {
- sfloat::Signed = sfloat::sfManSignBits = 1;
- }
- else
- {
- sfloat::Signed = sfloat::sfManSignBits = 0;
- }
-
- // Set the number of short float mantissa bits
- sfloat::sfManBits = sfloat::sfBits -
- sfloat::sfManSignBits - sfloat::sfExpBits;
-
- // Set the short float exponent bias value
- sfloat::sfBias = 0;
- sfloat::sfBias =
- ( 1 << ( sfloat::sfExpBits - 1 )) - 1;
- sfloat::fsfBias = sfloat::fBias - sfloat::sfBias;
-
- // Set the conversion shift values
- sfloat::fManShift1 = sfloat::sfManSignBits +
- sfloat::sfExpBits;
- sfloat::fManShift2 = sfloat::sfBits -
- sfloat::fExpBits - sfloat::fManSignBits;
- sfloat::sfManShift = sfloat::fManBits -
- sfloat::sfBits + sfloat::sfExpBits +
- sfloat::sfManSignBits;
- sfloat::fExpShift = sfloat::sfManBits +
- sfloat::sfManSignBits;
- sfloat::sfExpShift = sfloat::sfBits -
- sfloat::sfExpBits;
-
- } // sfloatrange
-
- // End SFLOAT.CPP
-
-
-
- The Annotated ANSI C Standard
-
-
- P.J. Plauger
-
-
- P.J. Plauger is senior editor of The C Users Journal. He is convenor of the
- ISO C standards committee, WG14, and active on the C++ committee, WG21. His
- latest books are The Standard C Library, and Programming on Purpose (three
- volumes), all published by Prentice-Hall. You can reach him at
- pjp@plauger.com.
-
-
- If you really take your portable C seriously, there's no substitute for a
- handy copy of the C Standard. It is, by definition, the last word on what's
- valid Standard C and what's not. Unfortunately, you have to shell out roughly
- $65 or more (last time I asked) to either ANSI in New York City or ISO in
- Geneva to get a copy. And what you get for your money is a couple hundred
- pages of dense legalese. (We who wrote the C Standard also provided a
- Rationale, but that got dropped somewhere along the road to approval.)
- When I wrote The Standard C Library (Prentice Hall, 1992), I had the devil's
- own time getting permission to reprint about half the C Standard. ANSI has
- been struggling for years to work out a reasonable policy for granting reprint
- rights to their programming language standards, which have been growing
- steadily and rapidly in commercial importance. They just plain didn't have
- their act together when I asked. ISO was a bit better off, but still ill
- prepared to deal with the more complex standards that we programmers care
- about. Eventually, I got my permission from ISO, but I'm still not sure how.
- Now it seems that Osborne McGraw-Hill has also managed to strike a reprint
- deal, this time with ANSI. They offer this complete and verbatim edition of
- the ANSI/ISO C Standard for a mere $39.95 list price. Not only that, they
- provide considerable running commentary as well. The latter takes the form of
- annotation supplied by Herbert Schildt, one of the more prolific authors in
- the realm of C and C++ "how 2" books.
- You run a slight risk in using a derived standard such as this one. I had to
- convert Dave Prosser's troff master into a form digestible by Ventura
- Publisher. In the process, I introduced a number of typographical errors,
- mostly trivial formatting botches. A cursory scan of Osborne's opus has so far
- turned up only one typo. Far worse, however, is the duplication of page 131 in
- place of page 132, which blows a nasty hole in the middle of fprintf. Those
- two botches make a prima facie case that the reproduction is not assuredly
- perfect. Still, they seem to have preserved the pagination of the official
- standard, and they have certainly done a tidier job than I did with my subset.
- I wouldn't stake my career on this book being right, but I'd probably trust it
- to settle most bar bets.
- Schildt's annotation is the real value added, and it's pretty good. Mostly I
- feel he draws attention to the right sorts of ancillary issues, and he has a
- fairly sensible perspective on the C Standard. You can usually count on him to
- tell you what the words mean and not just what they say.
- His annotation is somewhat compromised by the constraints of the presentation
- -- pages from the C Standard are on the left, his commentary is on the right.
- He's not above continuing a comment onto a subsequent page, but you can still
- see the brakes being applied rather often. Many of his comments and
- illustrations are terse or barely commented to stay roughly in sync with the
- standardese.
- In a few places, I find the annotation excessively terse. Schildt says little
- about the math functions, even though any number of them could profit from
- just a sentence or two. (frexp and ldexp are two that spring to mind.) True,
- many of these functions are ones that only a mathematician could love, but I
- still find rather laconic the assertion, "The descriptions of the hyperbolic
- functions are straightforward and need no further comment." Similarly, the
- functions in <locale.h> get only a cursory treatment, not that I can blame the
- guy for copping out here.
- More generally, Schildt endeavors to say at least something on each topic to
- aid understanding. I don't always agree with what he chose to say, but mostly
- I feel he says something helpful. Only occasionally did I catch him out. For
- example, his description of storage classes, a notoriously involuted topic,
- contains the usual errors of oversimplification. And his recitation of the
- history of register is slightly incorrect and incomplete. But as a rule, I
- think you will be more enlightened than misled by the annotations in this
- book.
- So if you've always wanted your own copy of the C Standard, here's your best
- chance. Not only do you get a reasonably accurate facsimile of the gospel at
- bargain rates, you get some helpful commentary in the bargain. That makes this
- book a better than average buy.
- Title: The Annotated ANSI C Standard
- Author: annotated by Herbert Schildt
- Publisher: Osborne McGraw-Hill, 1990
- Price: $39.95
- ISBN: 0-07-881952-0
- Pages: 600
-
-
- Porting Microsoft's Foundation Class Library to UNIX
-
-
- Scot Wingo and Louis Lu
-
-
- Scot is a principal software engineer at Bristol Technology Incorporated,
- where he leads the development of the Wind/U Windows-to-UNIX portability
- toolkit. Louis is also a software engineer at Bristol Technology, where he
- leads the development of Bristol's Xprinter product. Scot and Louis are both on
- the development team that added Microsoft Foundation Class Library support to
- the Wind/U toolkit, allowing MFC 2.0 applications to port to UNIX/Motif. Scot
- and Louis can be reached at (203) 438-6969 or via e-mail at scot@bristol.com and
- lu@bristol.com.
-
-
-
-
- Introduction
-
-
- Many programmers believe that by using C++ with its strong type checking they
- can achieve the multi-platform programmer's nirvana: 100% portable code. We
- tested this theory by porting a large 16-bit based C++ library, Microsoft
- Foundation Class library (MFC), to 32-bit UNIX workstations. We found that
- while using C++ definitely increases your program's portability, it still is
- not the portability silver bullet. This article highlights some common
- portability problems and shows examples of them in the context of MFC.
- Microsoft recently shipped the second release of MFC (version 2.0) with its Visual C++
- development environment. MFC provides the user with the best of both worlds: a
- set of basic data type classes and an application framework. The basic classes
- provide support for collections, exceptions, file I/O, strings, and run-time
- class information. MFC's application framework is built upon the Windows API
- and implements several advanced application features, such as:
- toolbars
- status bars
- Multiple Document Interface (MDI)
- Most Recently Used (MRU) file list -- a list of recently used files, maintained for you in a menu
- message mapping -- allows you to easily map messages to member functions
- splitter windows -- similar to those used in Excel and Word
- print preview and printing
- Microsoft provides MFC source code with the Visual C++ product as a reference.
- We used this code as the starting point for porting MFC to UNIX. By making
- MFC available on UNIX, we hope to improve the quality of applications and
- facilitate the creation of feature-rich applications on UNIX. Moreover, with
- MFC on both Windows and UNIX, multi-platform developers can support
- applications on both platforms with a single set of source code.
-
-
- Getting Started
-
-
- After copying the MFC source files to the UNIX workstation, we first had to
- convert the files from DOS format to UNIX format. (DOS text files end each line
- with a carriage return and a line feed; UNIX text files end each line with a
- line feed alone.) Most workstations have utilities, such as dos2unix, for
- performing the conversion. Next, we created a quick and dirty makefile that
- compiles all of the source files and stuffs them into a library.
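- The conversion itself is simple enough to sketch. The function below is a hypothetical
- illustration, not the actual dos2unix implementation. It drops each carriage return that
- immediately precedes a line feed and copies everything else unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts DOS (CR LF) line endings to UNIX (LF) ones.
// A carriage return that is not followed by a line feed is left untouched.
std::string to_unix_line_endings(const std::string& dos_text)
{
    std::string unix_text;
    unix_text.reserve(dos_text.size());
    for (std::string::size_type i = 0; i < dos_text.size(); ++i) {
        if (dos_text[i] == '\r' && i + 1 < dos_text.size() && dos_text[i + 1] == '\n')
            continue;                       // skip the CR of a CR LF pair
        unix_text += dos_text[i];
    }
    return unix_text;
}
```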
- Before firing up the compiler, we investigated which macros should be defined.
- The first macro, NO_VBX_CONTROLS, excludes the MFC support for Visual Basic
- (VBX) controls. Since there is no concept of Visual Basic controls or the
- underlying library on UNIX, we defined this macro. Microsoft was kind enough
- to also include a PORTABLE macro which, when defined, turns off sections of
- inline Intel x86 assembly code and turns on C++ equivalents. For the initial
- compilation, we also decided to turn on the _DEBUG macro, which turns off most
- inlining and turns on tons of useful assertions. After defining these macros,
- we kicked off the first compile.
- The first portability problems were due to subtle differences in the various
- compilers we used. Most UNIX C++ compilers are based on the AT&T cfront
- implementation. Each hardware vendor typically licenses the cfront compiler
- and adds the platform-specific components needed to support their hardware. We
- ported MFC to Sun SPARCstation and HP 9000/700 workstations. The two
- compilers used in this port were the SPARCworks C++ 3.0.1 and HP's C++ version
- 3.0.2. Both compilers are based on version 3.2 of cfront. Visual C++ uses
- Microsoft's C 8.0 compiler, which claims cfront compliance but is not based
- directly on that implementation.
- The biggest compiler difference showed up where the _DEBUG version of MFC tracks
- memory allocation by overloading operator new. The debug version of new is
- overloaded to take filename and line number information. Listing 1 shows the
- relevant code.
- When _DEBUG is defined, a new expression such as new CObject on line 30 of
- nested.C should be preprocessed in two steps:
- new ==> DEBUG_NEW
- DEBUG_NEW ==> new(__FILE__, __LINE__)
- producing:
- CObject *obj = new ( "nested.C" , 30 ) CObject;
- This macro expansion may appear recursive, but it is not. Microsoft C and Sun
- C++ 3.0 expand these macros correctly, but HP C++ 3.0 does not. The HP
- preprocessor fails with the following error message:
- nested.C: 30: Overflowed replacement buffer.
- The HP C++ preprocessor does not follow the macro expansion rules defined in
- The Annotated C++ Reference Manual [3], the base document for the ANSI C++
- standardization effort. Its section 16.3.3, "Rescanning and Further
- Replacement," states: "If the name of the macro being replaced is found during
- this scan or during subsequent rescanning, it is not replaced." Hopefully, with
- the publication of an ANSI C++ standard, differences like these will no longer
- be an issue.
- To fix the problem, we just disabled the debug version of new on HP
- workstations by adding:
- #ifndef HPUX
- #define new DEBUG_NEW
- #endif
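- The rescanning rule is easy to check with any conforming preprocessor. The sketch below
- reproduces the shape of the MFC macros but uses an ordinary function name, make, in place
- of the keyword new. make, DEBUG_MAKE, and the two tracking variables are hypothetical. A
- preprocessor that follows section 16.3.3 turns the bare token make into the call
- make(__FILE__, __LINE__). One that kept re-expanding the inner make would never terminate.

```cpp
#include <cassert>

// Hypothetical tracking state, standing in for a debug allocator's bookkeeping.
static const char* g_last_file = nullptr;
static int g_last_line = 0;

// 'make' plays the role of operator new; it records where it was called from.
int make(const char* file, int line)
{
    g_last_file = file;
    g_last_line = line;
    return line;
}

// Same shape as the MFC pair:
//   make       ==> DEBUG_MAKE
//   DEBUG_MAKE ==> make(__FILE__, __LINE__)
// While rescanning, the inner 'make' is the name of a macro already being
// replaced, so a conforming preprocessor leaves it alone: the result is a call.
#define DEBUG_MAKE make(__FILE__, __LINE__)
#define make DEBUG_MAKE
```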
-
-
- Integer Size Issues
-
-
- Since integers in the Windows environment are 16 bits wide, C programmers
- often make the mistake of assuming that other 16-bit data types
- are always the same size as an integer. This is not the case in 32-bit
- environments. See Listing 2 for an example of a 16/32-bit problem waiting to
- happen. Porting the code in Listing 2 to UNIX would cause problems if the
- value of nOne were ever greater than 65,535, because it would suddenly become
- too large to fit into wTwo (which is only 16 bits wide). The value would be
- truncated to its low 16 bits, so wTwo would wrap around and start back at 0.
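- The wraparound is easy to demonstrate with fixed-width types. This sketch assumes WORD is
- an unsigned 16-bit type, as Listing 2's typedef makes it, and uses std::int32_t for a
- 32-bit int. narrow_to_word is a hypothetical name. Conversion to an unsigned type is well
- defined in C++ and keeps the value modulo 65,536.

```cpp
#include <cassert>
#include <cstdint>

// WORD as in Listing 2: an unsigned 16-bit quantity.
typedef std::uint16_t WORD;

// Mirrors the cast in Listing 2. The result is nOne modulo 65,536.
WORD narrow_to_word(std::int32_t nOne)
{
    return static_cast<WORD>(nOne);
}
```

- Because the cast is explicit, the compiler accepts the narrowing without complaint. 65,535
- survives, 65,536 becomes 0, and 70,000 becomes 4,464.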
- C++'s stricter type checking makes code like this less likely to slip through,
- so 16/32-bit issues are not usually a major C++ problem. We did find one
- significant 16/32-bit portability problem in the MFC message mapping
- mechanism. To better understand the problem, let's look at how Microsoft has
- implemented message mapping in MFC.
- In Windows SDK programming, programs usually handle messages in a window
- procedure, or WinProc. MFC's Message Mapping provides a facility that allows
- you to map a Windows message to a C++ class method. This paradigm is a natural
- fit for object-oriented programming because it lets you think of each message
- handler as being responsible for handling the communication between your
- object and the application framework. Some frameworks use virtual functions
- for message handling, but this results in very large vtables and poor
- performance. Borland's Object Windows Library (OWL) uses a "dynamic dispatch
- table" which is implemented through a new C++ syntax. The drawback of this
- approach is that it requires extensions to the C++ language, and thus is not
- portable. MFC implements message mapping through a set of macros that create a
- message-mapping table inside each class. Here's an example of how to declare a
- simple message map:
- BEGIN_MESSAGE_MAP()
- ON_WM_LBUTTONDOWN()
- ON_WM_LBUTTONUP()
- ON_WM_MOUSEMOVE()
- ON_COMMAND(ID_FILE_PRINT, CView::OnFilePrint)
- ON_COMMAND(ID_FILE_PRINT_PREVIEW, CView::OnFilePrintPreview)
- END_MESSAGE_MAP()
- Each entry can use a default mapping such as ON_WM_LBUTTONDOWN, which assumes
- that you would like to map WM_LBUTTONDOWN to the OnLButtonDown member
- function. You can also specify the mapping with the more generic
- ON_MESSAGE(message, function) macro.
- Each entry in the table has the following four elements:
- UINT nMessage
- UINT nID
- UINT nSig
- AFX_PMSG pfn
- where nMessage is the message identifier (such as WM_PAINT or WM_MOUSEMOVE),
- nID is the identifier for the recipient of the message, nSig is the signature
- alias (more on this later), and pfn is a pointer to the method for handling
- the specified message.
- The beauty of this message-mapping scheme is that it is very fast (based on an
- integer lookup) and fairly portable. The portability problem comes from the
- way MFC must store the member function pointers in the table. To avoid
- complete chaos, each table entry uses the nSig field to record the return type
- and argument types of each message-handling method. For example, if you have a
- message handler defined as:
- void MessageHandler(WPARAM wParam, LPARAM lParam);
- the nSig value for this function would be AfxSig_vwl. All possible types of
- declarations are enumerated in an MFC header file. This scheme allows the
- message mapping to sneak around C++'s strong type-checking, while still
- providing a level of type checking. When a message comes in, MFC uses the nSig
- value to match the message fields to the fields of the function. The only
- problem with this scheme is that if a function is defined as:
- void MessageHandler(WPARAM wParam, CPoint cpoint);
- the nSig value is also AfxSig_vwl. Since the cpoint argument is treated like a
- long, the CPoint constructor will not be called, and any conversion other than
- a plain copy of the bits will be skipped.
- To fix this problem, we added some new values to the signature enumeration,
- such as AfxSig_vwp, which will ensure that the CPoint constructor is called
- and any conversions are made. The lesson to be learned here is that if you
- circumvent C++'s strong type-checking, you will pay a penalty in portability.
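- The following minimal sketch shows the table and the role of nSig. Everything in it is a
- hypothetical simplification, not MFC's actual message-map code. That includes MsgEntry,
- SigCode, Point, CMyWnd, the message numbers, and the packing of x and y into the low and
- high 16 bits of lParam.
- Lookup is a plain integer comparison on nMessage. nSig then tells the dispatcher which real
- prototype to cast pfn back to before calling it. A Sig_vwl-style entry hands the handler
- the raw 32-bit value. The Sig_vwp-style entry builds a Point first, so its constructor
- (here, sign extension of each coordinate) always runs.

```cpp
#include <cassert>

typedef unsigned int UINT;
typedef unsigned int WPARAM;      // assumption for this sketch
typedef unsigned long LPARAM;     // assumption: x in bits 0-15, y in bits 16-31

// A point class whose constructor does real work: it sign-extends
// each 16-bit coordinate packed into the LPARAM.
struct Point {
    int x, y;
    explicit Point(LPARAM lParam)
        : x(SignExtend16(lParam & 0xFFFF)), y(SignExtend16((lParam >> 16) & 0xFFFF)) {}
    static int SignExtend16(unsigned long w)
    {
        return w >= 0x8000 ? static_cast<int>(w) - 0x10000 : static_cast<int>(w);
    }
};

class CMyWnd;
typedef void (CMyWnd::*PMSG)();              // generic pointer type kept in the table

// Signature codes: void f(); void f(WPARAM, LPARAM); void f(WPARAM, Point)
enum SigCode { Sig_vv, Sig_vwl, Sig_vwp };

struct MsgEntry {
    UINT nMessage;   // message identifier
    UINT nID;        // recipient identifier (unused here)
    UINT nSig;       // which prototype pfn really has
    PMSG pfn;        // handler, stored under the generic type
};

class CMyWnd {
public:
    int paints = 0;
    LPARAM rawParam = 0;
    int clickX = 0, clickY = 0;
    void OnPaint() { ++paints; }
    void OnRaw(WPARAM, LPARAM lParam) { rawParam = lParam; }
    void OnClick(WPARAM, Point pt) { clickX = pt.x; clickY = pt.y; }
    bool Dispatch(UINT message, WPARAM wParam, LPARAM lParam);
};

const UINT MSG_PAINT = 1, MSG_RAW = 2, MSG_CLICK = 3;   // hypothetical numbers

static const MsgEntry messageMap[] = {
    { MSG_PAINT, 0, Sig_vv,  &CMyWnd::OnPaint },
    { MSG_RAW,   0, Sig_vwl, reinterpret_cast<PMSG>(&CMyWnd::OnRaw) },
    { MSG_CLICK, 0, Sig_vwp, reinterpret_cast<PMSG>(&CMyWnd::OnClick) },
};

bool CMyWnd::Dispatch(UINT message, WPARAM wParam, LPARAM lParam)
{
    typedef void (CMyWnd::*PMSG_wl)(WPARAM, LPARAM);
    typedef void (CMyWnd::*PMSG_wp)(WPARAM, Point);
    for (const MsgEntry& e : messageMap) {
        if (e.nMessage != message)
            continue;                                     // integer lookup
        switch (e.nSig) {                                 // cast back, then call
        case Sig_vv:
            (this->*e.pfn)();
            return true;
        case Sig_vwl:                                     // raw bits, no conversion
            (this->*reinterpret_cast<PMSG_wl>(e.pfn))(wParam, lParam);
            return true;
        case Sig_vwp:                                     // constructor always runs
            (this->*reinterpret_cast<PMSG_wp>(e.pfn))(wParam, Point(lParam));
            return true;
        }
    }
    return false;                                         // no entry found
}
```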
-
-
- Alignment and Byte Order
-
-
- Another common 16/32-bit problem is structure packing. On 16-bit systems,
- compilers pack structures based on 16-bit boundaries. On 32-bit systems, the
- compilers often use 32-bit boundaries (they waste a byte here and there to
- ensure that the elements of a structure are aligned properly). As a result,
- sizeof can return different values for the same structure in 16- and 32-bit
- environments. Structure packing can cause the most problems if you write whole
- structures to binary files and read them back. MFC does not write structures to file,
- but does not prevent the programmer from doing so. It is more portable to
- avoid writing structures to file and stick with the basic datatypes when
- writing binary files.
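- Here is a minimal sketch of the alternative, using a hypothetical two-field record. Instead
- of writing the structure in one call, padding and all, it writes each field separately with
- a fixed size. The put_le and write_record helpers are our own illustration. As a side
- effect they also fix the byte order (least significant byte first), which matters for the
- problem discussed next.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record. sizeof(Record) depends on the compiler's packing rules:
// typically 8 where 32-bit values are 4-byte aligned, 6 with 16-bit packing.
struct Record {
    std::int16_t id;
    std::int32_t value;
};

// Appends v least significant byte first, using exactly 'bytes' bytes.
void put_le(std::vector<unsigned char>& out, std::uint32_t v, int bytes)
{
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xFF));
}

// Writes each field on its own: always 6 bytes, whatever sizeof(Record) is.
void write_record(std::vector<unsigned char>& out, const Record& r)
{
    put_le(out, static_cast<std::uint16_t>(r.id), 2);
    put_le(out, static_cast<std::uint32_t>(r.value), 4);
}
```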
- The other common portability problem between Windows and UNIX is byte
- swapping. Some UNIX workstations, such as the Sun SPARCstation, have Big Endian
- (versus Intel's Little Endian) byte ordering. This means, among other things,
- that the programmer cannot make assumptions about the order of the bytes
- within the fields of a structure. C++ does not protect the programmer from
- these problems, and we encountered a significant number of byte-swapping
- problems in MFC. See Listing 3 for a byte-swapping problem in the constructor
- of the MFC class CPoint.
- This code makes the fatal mistake of assuming that data in the DWORD dwPoint
- will be ordered exactly the same as the tagPOINT structure. To fix the
- problem, we modified the CPoint constructor to use Microsoft's portable HIWORD
- and LOWORD macros (these live in windows.h) to take a DWORD apart properly.
- Here's the portable version of CPoint::CPoint(DWORD):
- CPoint::CPoint(DWORD dwPoint)
- {
- x = LOWORD(dwPoint);
- y = HIWORD(dwPoint);
- }
- The MFC CPoint and CSize classes contained substantial byte ordering problems
- that we discovered by reviewing the source and scanning for typecasts on the
- left side of expressions.
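- The portable constructor works because LOWORD and HIWORD operate on the value of the DWORD,
- not on its layout in memory. A self-contained sketch follows. LoWord and HiWord are
- hypothetical stand-ins written with shifts and masks so the example compiles without
- windows.h. The fixed-width typedefs are an assumption about the sizes of WORD and DWORD.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint16_t WORD;    // assumption: 16 bits
typedef std::uint32_t DWORD;   // assumption: 32 bits

// Value-based extraction: selects the same bits on Big and Little Endian hosts.
inline WORD LoWord(DWORD dw) { return static_cast<WORD>(dw & 0xFFFF); }
inline WORD HiWord(DWORD dw) { return static_cast<WORD>((dw >> 16) & 0xFFFF); }

struct tagPOINT { short x; short y; };

// Sketch of the portable constructor: no assumption about memory layout.
class CPoint : public tagPOINT {
public:
    explicit CPoint(DWORD dwPoint)
    {
        x = static_cast<short>(LoWord(dwPoint));
        y = static_cast<short>(HiWord(dwPoint));
    }
};
```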
- Most RISC-based systems can only read and write multibyte values at addresses
- that are a multiple of the value's size; a 4-byte DWORD, for example, must start
- on a 4-byte boundary. If a program breaks this rule, it dies with a bus error
- and dumps core. MFC object serialization was a source of unaligned write
- problems, as shown in Listing 4. This code assumes that a DWORD can be written
- through m_lpBufCur without regard to its alignment in memory. This code caused
- an immediate bus error on both of the target platforms.
- The safest way to avoid these problems is to use the memcpy function, which
- will handle memory alignment for you when necessary. Listing 5 shows the more
- portable version of Listing 4.
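- Reading works the same way. This is a minimal sketch with hypothetical put_dword and
- get_dword helpers, not CArchive's real operators. Because memcpy copies bytes, the buffer
- pointer may sit at any address. memcpy preserves the host's byte order, so this cures
- alignment faults but not byte-swapping problems.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef std::uint32_t DWORD;   // assumption: 32 bits

// Writes a DWORD at any address, aligned or not, by copying bytes.
unsigned char* put_dword(unsigned char* cur, DWORD dw)
{
    std::memcpy(cur, &dw, sizeof dw);
    return cur + sizeof dw;
}

// Reads a DWORD from any address; the reverse of put_dword.
const unsigned char* get_dword(const unsigned char* cur, DWORD& dw)
{
    std::memcpy(&dw, cur, sizeof dw);
    return cur + sizeof dw;
}
```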
-
-
- Operating System Differences
-
-
- The UNIX systems used here have a flat 32-bit memory scheme, versus DOS's
- segmented memory. MFC has some dependencies on the DOS segmented memory. A
- typical example is:
- #define _AFX_FP_OFF(thing) (*((UINT*)&(thing)))
- #define _AFX_FP_SEG(lp) (*((UINT*)&(lp)+1))
- These macros extract the offset and segment of a far pointer. Needless to say,
- they do not work under UNIX. We replaced each use of these macros with more
- portable code on a case-by-case basis.
- File system differences are another example of operating-system portability
- problems. The UNIX file system allows file names to be 250 characters long,
- and separates directory names with a / instead of a \ character. DOS file
- names are usually in the format:
- drive_letter:\path\filename.EXT
- where filename is limited to eight characters (plus a three-character extension). The MFC File I/O routines
- contained many problems in this area. So too did the code for serialization
- and MRU.
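- Here is a sketch of the kind of conversion involved, with hypothetical policy choices.
- dos_to_unix_path (our own name) replaces backslashes with slashes. It simply drops a
- leading drive letter, since UNIX has no drive letters. A real port would have to decide
- how drives map onto directories.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts a DOS path to UNIX form.
// Policy assumed here: a leading "X:" drive prefix is discarded.
std::string dos_to_unix_path(const std::string& dos_path)
{
    std::string path = dos_path;
    if (path.size() >= 2 && path[1] == ':')
        path.erase(0, 2);                  // drop "C:" and similar
    for (char& c : path)
        if (c == '\\')
            c = '/';                       // directory separator
    return path;
}
```

- The sketch leaves letter case alone. UNIX file names are case-sensitive, so REPORT.TXT and
- report.txt name different files there. That is one more decision such code has to make.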
- MFC uses serialization to provide object persistence in binary files. All
- object serialization is built on the basic types such as WORD, DWORD, float,
- int, etc. Since MFC defines the serialization for these low-level types
- already, developers are isolated from many of the portability problems
- associated with binary file I/O. In the future, we may rewrite the
- basic type serialization code so that it can read files written on either
- Little or Big Endian machines. To do this, we will always assume that data
- should be written in one byte ordering. If a machine doesn't use that byte
- ordering, the serialization code will automatically reorder data going
- into and out of binary files.
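- The plan in the previous paragraph can be sketched as follows, assuming little-endian as
- the fixed file order and 32-bit DWORDs. The helpers are hypothetical: host_is_little_endian,
- swap_bytes, to_file_order, from_file_order, and store_dword. A big-endian host swaps bytes;
- a little-endian host copies straight through. Hosts with more exotic byte orders are not
- handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Detects the host byte order at run time by inspecting the first byte of 1.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverses the four bytes of a 32-bit value.
std::uint32_t swap_bytes(std::uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// File order is little-endian. Converting either way is the same operation:
// swap on big-endian hosts, do nothing on little-endian ones.
std::uint32_t to_file_order(std::uint32_t host_value)
{
    return host_is_little_endian() ? host_value : swap_bytes(host_value);
}

std::uint32_t from_file_order(std::uint32_t file_value)
{
    return to_file_order(file_value);
}

// Stores a DWORD into a buffer in file order.
void store_dword(unsigned char* dst, std::uint32_t v)
{
    std::uint32_t f = to_file_order(v);
    std::memcpy(dst, &f, sizeof f);
}
```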
-
-
- It Works!
-
-
- After fixing the portability problems described above, we were able to get some MFC
- samples up and running on UNIX, as shown in Figure 1. The porting effort took
- two people approximately two months to examine all of the library and
- eliminate the portability problems. In total there were over 100 portability
- problems that had to be fixed.
- About six months after our port of the 16-bit MFC to UNIX, Microsoft released
- the Windows NT version of MFC. We dissected it to see what portability
- improvements Microsoft had made in their port from the 16-bit Windows
- environment to the 32-bit NT environment. The biggest improvements, as
- expected, were in the areas of 16/32-bit and memory model portability.
- Microsoft fixed all of the examples mentioned earlier in this article, with
- the exception of some byte swapping problems, because NT only runs on Little
- Endian processors. Porting this version of MFC to UNIX will take much less
- time and effort.
- Porting the 16-bit MFC to UNIX was a challenging exercise in finding and
- fixing portability problems. Fellow C++ programmers should take these
- experiences to heart and write code that avoids these portability pitfalls.
- With the multitude of platforms and operating environments available, you
- never know on which platform your code will be running.
-
-
- Bibliography
-
-
-
- [1] Microsoft Corp. Microsoft Visual C++ Class Library Reference
- [2] Microsoft Corp. Microsoft Visual C++ Class Library Users' Guide
- [3] Margaret A. Ellis and Bjarne Stroustrup, The Annotated C++ Reference Manual,
- Addison-Wesley, 1990.
- Figure 1 A sample application ported to UNIX
-
- Listing 1 MFC overloading of new for debugging
- #include <stdlib.h>
- // MFC style debug new defines.
-
- class CObject
- {
- public:
- #ifdef _DEBUG
- // for file name/line number tracking using DEBUG_NEW
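- void* operator new(size_t nSize,
- const char * lpszFileName, int nLine);
- #endif
- };
-
- #ifdef _DEBUG
- // Minimal definition so the listing links; MFC's version also
- // records lpszFileName and nLine for each allocation.
- void* CObject::operator new(size_t nSize, const char *, int)
- {
- return ::operator new(nSize);
- }
- #endif
-
- #ifdef _DEBUG
- // Memory tracking allocation
- #define DEBUG_NEW new(__FILE__, __LINE__)
- #else
- // NonDebug version that assumes everything is OK
- #define DEBUG_NEW new
- #endif
-
- #ifdef _DEBUG
- #define new DEBUG_NEW
- #endif
-
- int main()
- {
- CObject *obj = new CObject;
- delete obj;
- return 0;
- }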
- // End of File
-
-
- Listing 2 A 16/32-bit portability pitfall
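- typedef unsigned short WORD; // 16 bits on both Windows and UNIX
-
- WORD function(int nOne) // int: 16 bits under Windows, 32 bits on UNIX
- {
- WORD wTwo;
- // ...
- wTwo = (WORD)nOne; // silently truncated if nOne > 65,535
- // ...
- return wTwo;
- }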
- // End of File
-
-
- Listing 3 Illustrates byte-swapping problem
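- struct tagPOINT
- {
- short x;
- short y;
- };
-
-
- class CPoint : public tagPOINT {
- public:
- // ...
- CPoint(DWORD dwPoint);
- // ...
- };
-
- // Not portable: assumes the two shorts occupy memory exactly like the low
- // and high words of a DWORD, which holds only on Little Endian machines
- // (and only when a DWORD is exactly two shorts wide).
- CPoint::CPoint(DWORD dwPoint)
- {
- *(DWORD *)this = dwPoint;
- }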
- // End of File
-
-
- Listing 4 Code that may create memory alignment problems
- class CArchive {
- // ...
- BYTE * m_lpBufCur;
- BYTE * m_lpBufMax;
- // ...
- };
-
- AFX_INLINE CArchive& CArchive::operator<<(DWORD dw)
- {
- if (m_lpBufCur + sizeof(DWORD) > m_lpBufMax)
- Flush();
- // Unsafe on RISC machines: m_lpBufCur may not be DWORD-aligned.
- *(DWORD FAR*)m_lpBufCur = dw;
- m_lpBufCur += sizeof(DWORD);
- return *this;
- }
-
- // End of File
-
-
- Listing 5 Fixes potential memory alignment problems in Listing 4
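- #include <string.h> // declares memcpy
-
- AFX_INLINE CArchive& CArchive::operator<<(DWORD dw)
- {
- if (m_lpBufCur + sizeof(DWORD) > m_lpBufMax)
- Flush();
-
- // Copy byte by byte: m_lpBufCur need not be DWORD-aligned.
- memcpy(m_lpBufCur, &dw, sizeof(DWORD));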
-
- m_lpBufCur += sizeof(DWORD);
- return *this;
- }
-
- // End of File
-
-
-
-
-
-
-
- Handling Time-Consuming Windows Tasks
-
-
- Andy Yuen
-
-
- Andy Yuen holds master's and bachelor's degrees in Electrical Engineering from
- Carleton University in Ottawa, Canada. He has been working in the software
- industry for over 14 years. He is currently a senior software engineer with a
- company in Sydney, Australia, which specializes in automation and network
- management. Andy can be reached by e-mail at auawyl@grieg.sdi.oz.au.
-
-
-
-
- Introduction
-
-
- Microsoft Windows 3.X and OS/2 Presentation Manager (PM) programmers are
- probably all aware of the "1/10 second rule" -- an application should take no
- longer than 1/10 of a second to process a message. Any task which takes longer
- can be considered a "big job." And a big job degrades the system response.
- Sometimes, it even brings the system to a standstill while it is running. This
- happens because there is only one input message queue for all Windows or PM
- applications. Once you hold up this queue, other applications cannot run.
- OS/2, to some extent, is less affected by a big job than Windows due to its
- true multitasking capability; other non-PM tasks can still run. Also, OS/2 has
- a lot more built-in facilities to combat this problem. (For example, see the
- sidebar on OS/2's object windows.)
- The usual work-around under Windows is to have PeekMessage loops scattered all
- through a program. This article proposes an alternate solution. It describes
- a small C++ class, Cschlr, which provides the functions of a simple scheduler.
- It manages thread creation, deletion, and synchronization by using counting
- semaphores and timed functions. A Windows program can create one or more
- threads to handle time-consuming tasks and then pass control back to Windows
- quickly via the message loop. Thread interactions can be synchronized by using
- counting semaphores and timed functions. Although this C++ class has been
- designed from the ground up for Windows, it can actually be used unmodified
- for OS/2. I also discuss a number of the design/implementation issues, and
- illustrate its use with a complete example.
-
-
- Design
-
-
- The design criteria of Cschlr are based on the KISS (Keep-It-Simple-Stupid)
- principle. Consequently, Cschlr must be simple, small, and portable. It is
- intended to work in the following fashion:
- 1. Threads can be created before entering the main PeekMessage loop or can be
- created anywhere in an application.
- 2. Threads can terminate themselves anywhere within an application.
- 3. Parameters can be passed to threads during thread creation.
- 4. PeekMessage needs to be used only once, in WinMain.
- 5. Counting semaphores and timed functions provide synchronization among
- threads.
- The interface to the Cschlr class (C++ header file) can be found in Listing 1.
- Member functions can be found in Listing 2. The constructor and destructor are
- self-explanatory. I give below a brief description of the rest of the methods:
- int CreateThread(THDFN func, int stacksize, void *param);
- Used to create a thread, where func is the thread procedure, stacksize is the
- size of the stack for the thread and param is a pointer to user parameters to
- be passed to the thread. This method is modelled after the BeginThread
- function provided by C compilers for OS/2. The returned integer is the thread
- number.
- void Suicide();
- The counterpart of CreateThread. A thread can kill itself by either calling
- Suicide explicitly or by letting itself fall through the end of the thread
- function.
- Csemq* CreateSem(long lValue);
- Used to create a counting semaphore and to initialise its value to lValue. It
- returns the semaphore object on success or a null pointer on failure.
- void Signal(Csemq *sem, long lMaxCount = 256);
- Used to signal a semaphore. Sem is a semaphore object created by CreateSem.
- lMaxCount is the maximum semaphore count allowed. If the current semaphore
- count is greater than or equal to lMaxCount, its count will not be increased
- any further. The calling thread always gets back control after calling Signal.
- (Refer to the sidebar for a description of the operation of counting
- semaphores.)
- void Wait(Csemq *sem);
- Used to wait on a semaphore. If the semaphore count is smaller than 1, the
- calling thread will be blocked until signalled by another thread.
- void Preempt();
- Used by the calling thread to give up the CPU voluntarily.
- void Sleep(long lSeconds);
- Used by the calling thread to go to sleep for lSeconds. Another ready-to-run
- thread will be dispatched.
- void GetSemStates(Csemq *sem, long &lCount, int &fWait);
- Used to retrieve the current states of the semaphore object sem. On return,
- lCount contains the semaphore count and fWait is set to TRUE if there is at
- least one thread waiting on the semaphore and FALSE otherwise.
- void Run();
- Used to dispatch threads and to give them a slice of the CPU. This method is
- normally only invoked in the main PeekMessage loop.
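- The Signal and Wait rules above can be modelled without any threads at all. The class below
- is a hypothetical, single-threaded sketch; CountingSemaphore and its thread numbers are our
- own illustration, not Cschlr's Csemq. Wait reports whether the caller may continue or would
- block. Signal either wakes one waiting thread or raises the count, never past lMaxCount.
- For simplicity it uses a first-in, first-out queue; Cschlr's own queue, described below,
- is a bit mask.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of counting-semaphore rules: it records who would block
// and who would be woken, but performs no actual task switching.
class CountingSemaphore {
public:
    explicit CountingSemaphore(long lValue) : count_(lValue) {}

    // Returns true if 'thread' may continue, false if it must block.
    bool Wait(int thread)
    {
        if (count_ >= 1) { --count_; return true; }
        waiting_.push_back(thread);
        return false;
    }

    // Wakes one waiting thread and returns its number; if nobody waits,
    // raises the count (never beyond lMaxCount) and returns -1.
    int Signal(long lMaxCount = 256)
    {
        if (!waiting_.empty()) {
            int woken = waiting_.front();
            waiting_.pop_front();
            return woken;
        }
        if (count_ < lMaxCount)
            ++count_;
        return -1;
    }

    void GetSemStates(long& lCount, bool& fWait) const
    {
        lCount = count_;
        fWait = !waiting_.empty();
    }

private:
    long count_;
    std::deque<int> waiting_;   // FIFO here; Cschlr uses a bit mask instead
};
```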
-
-
- Implementation
-
-
- Cschlr implements the high-level thread and semaphore management services and
- provides the interface to the user. It maintains a ready-to-run queue for
- threads and keeps track of the identity of the running thread. The state
- transition diagram of Cschlr is shown in Figure 1. Preempt handles the task
- switch. It moves the running thread to the ready-to-run queue and schedules
- another thread from the ready-to-run queue for execution. Wait allows the
- running thread to wait in a semaphore queue while another ready-to-run thread
- is dispatched. Signal moves a thread waiting in a semaphore queue back to the
- ready-to-run queue. And Sleep and Wakeup provide time-delay services to all
- threads. Please note that Wakeup is used only internally by Cschlr.
- In order to implement the above, Cschlr calls on the services of two private
- classes: Csemq (see Listing 3 and Listing 4) and Cthread (Listing 5 and
- Listing 6). Consequently, it is easy to understand how Cschlr works once one
- knows the internals of Csemq and Cthread. Hence, I'll start with these and
- come back to Cschlr later.
- Csemq is a private class of Cschlr. This means that all of its methods and
- data are private to the containing class. Only Cschlr can access them, besides
- Csemq itself, because Cschlr has been declared as a friend of Csemq.
- Csemq provides semaphore services to Cschlr. There is a counter and a queue
- associated with each counting semaphore. Most C++ compilers come with class
- libraries which provide common objects like lists, stacks, queues, etc.
- However, if I had used any one of these class libraries, the code would no
- longer be portable among different compilers. Also, in examining the
- requirements, and sticking to the small-is-beautiful design philosophy, I
- decided to implement my own queue management routines because I don't need the
- full functionality of these class libraries anyway.
- I use a long integer to represent both a queue and a counter. If the MSB (Most
- Significant Bit) is a 0, the long integer is a counter, which means that the
- counter is at most 31 bits long. When the MSB is 1, the long integer is a
- queue. Each set bit (non-zero bit) in the long integer represents a thread.
- For example, a value of 0x80000003 (MSB set) contains two threads: bit 0 is thread #0,
- and bit 1 is thread #1. This implies that there are at most 31 threads possible.
- This should be more than enough for our purpose.
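- Here is a sketch of this encoding, assuming the long is 32 bits. The helpers IsQueue,
- CountOf, AddWaiter, IsWaiting, and WaiterCount are hypothetical names, not Csemq's methods.
- They use an unsigned 32-bit value so that setting and testing the top bit is well defined.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kQueueFlag = 0x80000000u;   // MSB set: the value is a queue

inline bool IsQueue(std::uint32_t v) { return (v & kQueueFlag) != 0; }

// Counter value (meaningful only when IsQueue(v) is false): up to 31 bits.
inline std::uint32_t CountOf(std::uint32_t v) { return v & ~kQueueFlag; }

// Adds thread 'thread' (0..30). Call only when the counter is zero,
// since a thread waits only when no count is available.
inline std::uint32_t AddWaiter(std::uint32_t v, int thread)
{
    return v | kQueueFlag | (std::uint32_t{1} << thread);
}

inline bool IsWaiting(std::uint32_t v, int thread)
{
    return IsQueue(v) && (v & (std::uint32_t{1} << thread)) != 0;
}

// Number of threads in the queue.
inline int WaiterCount(std::uint32_t v)
{
    int n = 0;
    for (std::uint32_t bits = v & ~kQueueFlag; bits != 0; bits &= bits - 1)
        ++n;                                    // clears the lowest set bit each pass
    return n;
}
```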
- If multiple threads are waiting on a semaphore, which thread should we resume
- when a signal has been received? I decided that a somewhat round-robin
- algorithm should be implemented. I said "somewhat" because it is not possible
- to implement a truly round-robin algorithm using the queue implementation
- described above -- such a queue does not keep the order in which threads are
- queued. (Someone might point out that this is a contradiction in terms, and
- that it is not really a queue if it does not maintain the order. Maybe I
- should have called it an unordered list. Anyway, I'll refer to it as a queue
- throughout this article.)
-
- In order to simulate a round-robin effect, each Csemq object has a private
- variable priority associated with it. Priority is set to CSC_NO_THREAD (whose
- value is 31) when a Csemq object is created. Method Dequeue returns a thread
- number for Cschlr to schedule its execution. It starts scanning the semaphore
- queue one bit to the left of the bit position recorded in priority. Since
- priority is set to 31, it starts scanning from bit 0 (wraparound) moving to
- bit 1, bit 2, etc.
- When a thread is found, its position is saved in priority. For example, if
- thread 4 (bit 4) is the first thread found waiting on the semaphore, Dequeue
- starts scanning from bit 5 the next time it is invoked. The reason that I go
- to such trouble to implement this scheduling algorithm is to prevent a thread
- from monopolizing the CPU. Take for example the following case: thread 0 and
- thread 1 are waiting on the same semaphore. Assume that thread 0 starts
- execution and later waits on the same semaphore. If I start scanning always
- from bit 0, thread 0 will always get awakened when the semaphore gets
- signalled. Thread 1 will never get a chance to run. By implementing the
- aforementioned "somewhat" round-robin scheduling algorithm, this situation
- is avoided.
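- The scanning rule can be sketched as follows. SemQueue and its members are hypothetical;
- the real code is in Listing 4. kNoThread plays the role of CSC_NO_THREAD. Dequeue examines
- bits 0 to 30, starting one position above priority and wrapping around, and remembers
- where it stopped. The test replays the scenario from the text: thread 0 is woken and waits
- again, and thread 1 still gets its turn.

```cpp
#include <cassert>
#include <cstdint>

const int kNoThread = 31;                       // role of CSC_NO_THREAD
const std::uint32_t kQueueFlag = 0x80000000u;   // MSB set: threads are waiting

// Hypothetical semaphore queue with the "somewhat round-robin" Dequeue rule.
struct SemQueue {
    std::uint32_t bits = 0;      // queue (MSB set) or a zero counter
    int priority = kNoThread;    // bit position of the last thread dequeued

    void Enqueue(int thread) { bits |= kQueueFlag | (std::uint32_t{1} << thread); }

    // Scans bits 0..30 starting one position above 'priority', wrapping around.
    // Returns the thread number found, or -1 if no thread is waiting.
    int Dequeue()
    {
        if ((bits & kQueueFlag) == 0)
            return -1;
        int start = (priority >= kNoThread) ? 0 : priority + 1;
        for (int i = 0; i < kNoThread; ++i) {
            int bit = (start + i) % kNoThread;      // 31 positions: 0..30
            std::uint32_t mask = std::uint32_t{1} << bit;
            if (bits & mask) {
                bits &= ~mask;
                if ((bits & ~kQueueFlag) == 0)
                    bits = 0;                       // queue empty: zero counter again
                priority = bit;
                return bit;
            }
        }
        return -1;
    }
};
```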
- Semaphore operations such as Signal and Wait should be atomic or indivisible.
- They cannot be interrupted in the middle of an operation. If the operation is
- not atomic, one might get into the situation where two threads try to perform
- an operation on the same semaphore concurrently. It may happen that while one
- operation is half-way through the operation, the operating system switches to
- another thread which then updates the semaphore count or queue, rendering the
- semaphore structure inconsistent. Under Windows, this would never happen
- because of the cooperative multitasking model used: a Windows procedure only
- gives up the CPU voluntarily. There is no chance of having two truly
- concurrent threads of execution under Windows. The PeekMessage loop is the
- mechanism to multiplex the CPU among a number of pseudo-concurrent threads in
- an application which uses Cschlr.
-
-
- Class Cthread
-
-
- Cthread is also a private class of Cschlr. It provides thread objects to
- Cschlr. In order to keep this class implementation portable, I avoided the use
- of inline assembly since the syntax is different among compilers. The most
- portable way to implement this is to use the Standard C library's setjmp and
- longjmp functions. These were originally designed for error recovery purposes
- and people are starting to question their usefulness under C++, which has
- exception handling. However, these two functions prove to be invaluable in
- implementing the Cthread class.
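- The core idea can be shown in a few lines; jump_back, resume_code, and the globals are
- hypothetical. setjmp records the execution context. A later longjmp from a called function
- returns control to that point a second time, and setjmp now yields a nonzero value. The
- sketch compares setjmp's result directly in the if condition and passes data through a
- global, because the C and C++ standards allow setjmp only in a few simple contexts.
- Cthread goes further by editing the saved CS:IP and SS:SP so that the first longjmp lands
- on a brand-new stack. That step depends on the compiler's jmp_buf layout and is
- deliberately not shown.

```cpp
#include <cassert>
#include <csetjmp>

static std::jmp_buf g_context;   // saved execution context
static int g_code = 0;           // value handed back across the jump

// Restores the context saved by setjmp; control reappears inside resume_code.
[[noreturn]] static void jump_back(int code)
{
    g_code = code;
    std::longjmp(g_context, 1);
}

// Saves the context, jumps away, and comes back through longjmp.
int resume_code(int code)
{
    if (setjmp(g_context) == 0) {   // direct call: setjmp returns 0
        jump_back(code);            // never returns normally
    }
    return g_code;                  // reached after longjmp (setjmp returned 1)
}
```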
- Unfortunately, the use of setjmp and longjmp does not make Cthread 100 per cent
- portable, due to the layout of the jmp_buf structure. jmp_buf is used to save
- the context of execution when setjmp is called. The execution context consists
- of various CPU registers. The only time that I need to know which registers go
- where is during the creation of a thread. One needs to set the CS:IP (Code
- Segment:Instruction Pointer) and the SS:SP (Stack Segment:Stack Pointer) to
- simulate a setjmp during thread creation.
- Cthread maps the jmp_buf to a data structure defined within CTHREAD.CPP. The
- typedef statement:
- typedef struct {
- long ip; //CS:IP
- short filler; //don't care
- long sp; //SS:SP
- } *mapptr;
- defines a pointer type for this purpose. It works for the Zortech C++ Version
- 3 compilers for Windows and OS/2, which I am using for the development of
- Cschlr. If you are using a different compiler, you need to change this type
- definition to reflect the difference in implementation of these functions.
- There are two ways to find out the structure of your compiler's jmp_buf. The
- first is to examine the sources for the C library if they are available. The
- second method, and probably the easier of the two, is to write a very simple C
- program which calls setjmp. Compile and link it with debugging information.
- Then use the source-code debugger to compare the register and jmp_buf values
- before and after executing setjmp to find out where the CS:IP and SS:SP
- registers get saved.
- The other thing to remember is that Cthread is intended to be compiled using
- the large memory model. The magic number WORDSTOREMOVE is defined as 3.
- Different values are needed for different memory models. The magic number is 3
- because the C library function setjmp has the form:
- int setjmp(jmp_buf env);
- A thread pushes env onto the stack, and then calls setjmp, thus saving the
- 32-bit return address on the stack. setjmp then saves the BP (Base Pointer)
- register on the stack (the standard C function prologue). When longjmp is
- called to restore the stack environment, it discards the saved BP and return
- address on the stack. For the large memory model, the return address occupies
- four bytes while BP occupies two bytes, for a total of three 16-bit words.
- Since this function relies on the standard C prologue to work properly, the
- compiler must be instructed to generate the standard C function prologue
- instead of the one for Windows. Consequently, you will notice that there are
- two different compile option macros defined in the Makefile (Listing 8) for
- the example program: one for compiling Cschlr, Csemq, and Cthread, the other
- for compiling the Windows main program.
- Cthread saves the context of a thread in the data structure THD. The member
- Context is the jmp_buf. TotalLen is the total length of the THD structure in
- bytes. This value is needed for the destructor of a thread object, which frees
- the allocated storage for the stack. OverFlowed is a flag, initialized to zero, that
- guards against stack overflow. Stack is the stack for the thread. Since the stack
- grows downward on an x86 processor, if overflow occurs OverFlowed will
- hopefully be overwritten with a nonzero value. Cthread refuses to switch to a
- thread with a nonzero OverFlowed flag when the Transfer method is invoked.
- The implementation of Cschlr is quite straightforward except for CreateThread,
- which passes a user parameter to the thread, and the implicit termination of a
- thread by falling through the thread function. Again, Cschlr is designed for
- the large memory model. CreateThread creates a new thread by:
- task[i] = new Cthread(func, stacksize, (int *) &retaddr, 4);
- The argument value 4 specifies the number of 16-bit words to copy to the new
- thread's stack. In order to force a thread to kill itself when falling through
- the thread function body, a return address is copied to the stack, together
- with the user parameter to be passed to the thread. This return address points
- to Kill, a static function within CTHREAD.CPP. See Figure 2 for the stack
- layout during thread creation. When the thread exits the thread function, it
- returns to Kill, which calls Suicide to terminate itself. The address of Kill
- and the user parameter pointer take up four words when the large memory model
- is used.
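- The index arithmetic behind a new thread's initial stack can be checked
- without a real x86 stack. This sketch mirrors the Cthread constructor in
- Listing 6, assuming (as described above) that three words, the far return
- address and the saved BP, are popped before the thread function begins. The
- names buildInitialStack and kWordsToRemove are hypothetical stand-ins for the
- constructor and WORDSTOREMOVE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Words popped before the thread body starts: the far return address
// (2 words) plus the saved BP (1 word).
const int kWordsToRemove = 3;

struct InitialStack {
    std::vector<int> words;  // the thread's stack; index 0 is the lowest address
    int spIndex;             // index the saved SP is set to point at
};

// Mirror of the index arithmetic in the Cthread constructor: copy `frame`
// to the top of a fresh stack, then point SP kWordsToRemove words below
// it so that popping those words leaves SP on frame[0].
InitialStack buildInitialStack(int stacksize, const std::vector<int>& frame) {
    const int framesize = static_cast<int>(frame.size());
    assert(stacksize > framesize + kWordsToRemove);

    InitialStack s{std::vector<int>(static_cast<std::size_t>(stacksize), 0), 0};
    for (int i = 0; i < framesize; i++)
        s.words[stacksize - framesize + i] = frame[i];
    s.spIndex = stacksize - framesize - kWordsToRemove;
    return s;
}
```

- For Cschlr the frame is four words: the offset and segment of Kill followed by
- the offset and segment of the user parameter. With a 16-word stack the sketch
- places Kill's offset at index 12 and the saved SP at index 9.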
- Note also that the Cschlr object reserves CSC_NO_THREAD + 3 (that is, 34)
- slots:
- Cthread *task[CSC_NO_THREAD + 3];
- although there can be at most 31 threads. Two of the extra slots are in use:
- task[MAIN] for the Windows main program and task[DUMMY] for the termination
- of threads. When Run, the method that multiplexes the CPU among various threads,
- is called within the PeekMessage loop, its context gets saved in the slot
- task[MAIN]. When a thread gives up the CPU voluntarily, control is passed back
- to MAIN so that it can continue with the PeekMessage loop. Consequently, a
- Wait or a Sleep must not be called within the PeekMessage loop. Otherwise it
- will bring back the big-job problem that we meant to avoid.
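- The division of labor between the message loop and Run can be modeled
- without Windows. In this sketch a hypothetical deque of strings stands in for
- the application's message queue, and each loop step either dispatches one
- pending message or, when the queue is empty, gives one slice to the
- scheduler. That is the shape of PeekMessageLoop in Listing 7.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Model of PeekMessageLoop (hypothetical names): arrivals[k] lists the
// messages that arrive just before step k. Each step dispatches one
// pending message if there is one; otherwise it gives the scheduler a
// slice, the job Cschlr::Run does. Returns a trace of what ran.
std::string pumpTrace(const std::vector<std::vector<std::string>>& arrivals) {
    std::deque<std::string> queue;  // stand-in for the message queue
    std::string trace;
    for (const auto& batch : arrivals) {
        for (const auto& msg : batch)
            queue.push_back(msg);
        if (!queue.empty()) {
            trace += "[" + queue.front() + "]";  // Translate/DispatchMessage
            queue.pop_front();
        } else {
            trace += "R";  // pTask->Run(): one thread runs until it yields
        }
    }
    return trace;
}
```

- Pending messages are always dispatched before Run gets another slice, so
- keeping each thread's slice short keeps the user interface responsive.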
- When Sleep is called, the thread is put to sleep in the dummy semaphore queue
- waitQ. All sleeping threads are kept in a linked list (the delay table) sorted
- by wake-up time. During each task switch, the list is scanned and the
- threads that have reached the end of their sleep are moved to the ready-to-run
- queue readyQ. If there is no ready-to-run thread in readyQ, control is
- returned to MAIN immediately.
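- The delay table can be exercised on its own. This sketch, with a hypothetical
- DelayList class and plain integers in place of time_t, follows the insertion
- rule in Cschlr::Sleep and the removal loop in Cschlr::WakeUp: entries are
- linked by thread number, sorted by wake-up time, and released from the head.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmpty = -1;  // plays the role of EMPTY in Listing 2

// Hypothetical model of Cschlr's delay table: an array-based linked list,
// indexed by thread number and kept sorted by wake-up time.
class DelayList {
public:
    explicit DelayList(int nThreads)
        : wake_(static_cast<std::size_t>(nThreads), 0L),
          next_(static_cast<std::size_t>(nThreads), kEmpty) {}

    // As in Cschlr::Sleep: walk past every entry due at or before `when`,
    // so a new entry goes after existing ones with the same time.
    void sleep(int thread, long when) {
        assert(thread >= 0 && thread < static_cast<int>(next_.size()));
        int prev = kEmpty;
        int cur = head_;
        while (cur != kEmpty && when >= wake_[cur]) {
            prev = cur;
            cur = next_[cur];
        }
        wake_[thread] = when;
        next_[thread] = cur;
        if (prev == kEmpty)
            head_ = thread;
        else
            next_[prev] = thread;
    }

    // As in Cschlr::WakeUp: unlink every entry whose time has come and
    // return them in wake-up order (Cschlr moves them onto readyQ).
    std::vector<int> wakeUp(long now) {
        std::vector<int> ready;
        while (head_ != kEmpty && now >= wake_[head_]) {
            int thread = head_;
            ready.push_back(thread);
            head_ = next_[thread];
            next_[thread] = kEmpty;
        }
        return ready;
    }

private:
    std::vector<long> wake_;  // wake-up time per thread
    std::vector<int> next_;   // next thread in the list, or kEmpty
    int head_ = kEmpty;       // earliest sleeper, or kEmpty
};

// Threads 2, 0 and 1 go to sleep until times 10, 5 and 10; return
// whoever is due at time `now`.
std::vector<int> exampleWakeOrder(long now) {
    DelayList d(3);
    d.sleep(2, 10);
    d.sleep(0, 5);
    d.sleep(1, 10);
    return d.wakeUp(now);
}
```

- Because a new sleeper walks past entries with an equal time, threads due at
- the same moment wake in the order they went to sleep: exampleWakeOrder(10)
- yields 0, 2, 1.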
- The slot task[DUMMY] is used for thread termination. It is required because a
- Cthread object always saves the context of the running thread during a task
- switch. task[DUMMY] is used for saving the context of the terminating thread
- temporarily. The context saved in it never needs to be restored because that
- thread no longer exists after committing suicide.
-
-
- An Example
-
-
- The best way to understand Cschlr is by example. Charles Petzold describes the
- Savage benchmark as an example of a "big job" [2]. I'll use the same example
- to contrast the Cschlr solution to the big job with the solutions described in
- his article. Instead of starting from scratch, I'll use Gpf Systems Inc.'s
- OS/2-based Gpf code generator. My version of Gpf is Version 1.3, which runs on
- OS/2 V1.3. It is a tool very similar to Microsoft's Visual C++. However, it
- generates Windows, OS/2 16-bit, and 32-bit C code.
- It was ahead of its time when I first got it in early 1992. I believe the
- latest version is Gpf 2.0. Instead of going into details as to how it works,
- let me just say that it generates the framework for a Windows application and
- creates the necessary resources. I am just using it as an event dispatcher,
- which means that whenever an event occurs -- the user clicks on a menu item,
- for example -- Gpf will call my function for handling that event.
- The example program is called BIGJOB.CPP. [Note: BIGJOB.CPP is not listed here
- because of its size. It will be on the monthly code disk. See BIGJOB.H,
- Listing 7, which implements the main logic of the example. -- mb] BIGJOB has
- only two menu items: Repetition and Action. Repetition allows a user to select
- the number of iterations to run the Savage benchmark and Action allows a user
- to either Start or Abort its execution. Running it for 10,000 iterations may
- take quite a while depending on the speed of your system. BIGJOB displays a
- progress report every five seconds and shows the total time taken for running
- the benchmark to completion. While the benchmark is running, a user can switch
- to any Windows application at will without appreciable delay. An application
- which is not designed to execute a big job will lock up the computer for the
- duration of the benchmark. This example clearly demonstrates the power of
- Cschlr.
- As I have said before, I am using Gpf as an event dispatcher. Whenever an
- event occurs, one of my functions gets called. The only thing I need to change
- in the Gpf generated source code is to replace the generated main message loop
- with my PeekMessage loop. In order to expedite the development, the body of my
- event handling functions (the file BIGJOB.H) is included in BIGJOB.CPP by an
- #include directive. People may frown at this method, but it is actually
- suggested in the Gpf manual [4].
- Another thing that needs to be done is to rename the generated file BIGJOB.C
- to BIGJOB.CPP in order to use C++ classes. Some may find Gpf's generated
- comments to be excessive. Others may find that, for such a small application,
- the generated code is more complex than necessary and many functions are
- included although they never get used. That overhead matters much less when
- you use Gpf to develop a large application. After all, there is always a price
- to pay for the convenience one gets by using Gpf.
- The events that I hook into include:
-
-
- WM_CREATE
-
-
- Gpf calls CreateThreads(VOID) to create two threads: Timer and Big. Timer is
- responsible for the periodic progress report when the benchmark is running. It
- invalidates the main window to force PaintWindow(pGpfParms) to repaint the
- screen. Note that Gpf usually passes a user-supplied event-handling function a
- pointer of type PGPFPARMS, which contains all window-related parameters, such
- as handles and messages.
- Big is the thread that actually carries out the benchmark. Notice that Big
- gives up the CPU voluntarily after each invocation of Savage by calling
- mtask.Preempt(). This allows the main PeekMessage loop to regain control so
- that other Windows applications may run. Big also examines fContinue before
- each iteration to see whether it should abort the benchmark, and when it stops
- it enables the Start item in the application menu again.
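- Big's structure, one small unit of work followed by a voluntary yield, is what
- keeps the system responsive. The sketch below captures that loop portably. The
- yield callback is a hypothetical stand-in for mtask.Preempt() plus the
- fContinue test. Unlike Listing 7, which evaluates Savage(1.0) on every pass,
- it feeds each result back in, as the commonly cited form of the Savage
- benchmark does.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// One evaluation of the Savage expression. For x > 0 it is mathematically
// x + 1; the benchmark is commonly used to gauge both the speed and the
// accumulated rounding error of the math library.
double savage(double x) {
    return std::tan(std::atan(std::exp(std::log(std::sqrt(x * x))))) + 1.0;
}

struct SavageResult {
    int completed;  // iterations actually run
    double value;   // running value; about 1 + completed
};

// Cooperative benchmark loop modeled on the Big thread. After every
// iteration it calls `yield`, which stands in for mtask.Preempt()
// followed by a fresh look at fContinue: returning false aborts the run.
SavageResult runSavage(int iterations, const std::function<bool()>& yield) {
    SavageResult r{0, 1.0};
    for (int i = 0; i < iterations; i++) {
        r.value = savage(r.value);  // feed each result back in
        r.completed++;
        if (!yield())
            break;
    }
    return r;
}
```

- Because every evaluation is followed by a call to yield, an abort request
- takes effect within one iteration; a longer unit of work between yields would
- make the menus feel correspondingly sluggish.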
-
-
- WM_PAINT
-
-
- Gpf calls PaintWindow(pGpfParms), which displays the appropriate message on
- the screen.
-
-
- WM_COMMAND
-
-
-
- Gpf calls SelectRep(pGpfParms), which records the selected number of
- repetitions and puts a check mark next to the selected item by calling the Gpf
- function GpfMenuTick.
- Gpf calls StartSalvage(pGpfParms) to disable the Start item to prevent further
- selection. The function then enables the Abort item, sets the boolean variable
- fContinue to TRUE, and signals semjob and semtimer so that the Big thread
- carries out the benchmark and the Timer thread starts its progress reports.
- AbortSalvage(pGpfParms) sets fContinue to FALSE, disables the Abort item, and
- enables the Start item by calling the Gpf function GpfMenuGray.
- Semaphores semtimer and semjob are used to synchronize the actions among the
- Window procedure and the threads Timer and Big.
- PeekMessageLoop is the function that replaces the normal GetMessage loop in
- WinMain.
-
-
- Cross-platform Development
-
-
- Cschlr works under both Windows and 16-bit OS/2 PM. Since I am using compilers
- from the same vendor, I don't even have to make any change to the mapptr type
- definition in Cthread because the jmp_buf structure remains unchanged for both
- Windows and OS/2 compilers. Cschlr works under OS/2 because OS/2's PM also has
- the same architecture as Windows: it uses a single message queue for all PM
- applications.
- Cschlr does not take advantage of the true multi-tasking (multi-threading)
- capabilities of OS/2. It dispatches threads by multiplexing the CPU with Run()
- in the main PeekMessage loop in Windows and the WinPeekMsg loop in OS/2. But
- it has one advantage that true OS/2 multi-tasking does not have. OS/2 threads
- cannot directly force an event to occur by calling APIs like WinInvalidateRect
- if the calling thread is not the message thread (the thread where the message
- queue is created). Cschlr allows you to do that because all threads it creates
- are actually part of the message thread.
- Although one can use Gpf to define the screen layout and use the same Gpf file
- to generate both Windows and OS/2 C source codes, the user-supplied event
- handling functions still need to be changed if they use GUI-specific calls,
- due to the differences between Windows and PM APIs. This raises the
- question: is there a tool which allows you to develop an application once and
- port it to other platforms without any source changes? The answer is both yes
- and no. Yes, there are some tools in the market which claim to do that. No,
- they may not do what you intend them to do.
- Most of these tools take the lowest-common-denominator approach, which limits
- you to a small subset of the features available from a particular GUI. Although
- they usually provide ways for you to handle GUI-specific features, once you've
- done that, the application is no longer portable. Some class libraries attempt
- to encapsulate GUIs in C++ and force you to learn a totally new set of
- functions and programming model to gain portability. Some of them have over
- 150 classes with thousands of methods. The learning curve is steep even if
- they can do what they claim to do.
- My friends and I did some extensive research on cross-platform tools some
- time ago. The verdict: package the non-GUI-related functions in the most portable
- way possible and use the best tool you can find to generate the particular GUI
- that you are interested in. If the non-GUI part -- the heart of your
- application -- is well packaged, it should be possible to interface the two
- components together. Of course, it is easier said than done. The best way to
- do this varies depending on the application. Since Cschlr works unmodified
- under both Windows and OS/2, I think it should qualify as a cross-platform
- tool.
-
-
- Conclusions
-
-
- I have presented a simple C++ class, Cschlr, to handle time-consuming tasks
- under Windows, and demonstrated its use by a complete example. It should be
- evident from the example that Cschlr helps in limiting the proliferation of
- PeekMessage loops that are quite commonplace in applications that are CPU
- bound. It reduces the total number of PeekMessage loops to exactly one and
- provides counting semaphores and timed functions for synchronisation.
- All these functions are provided without increasing the memory usage
- significantly, due to the small size of Cschlr. The use of Cschlr also
- improves the readability of source code by allowing the user to organize a
- Windows application by breaking it up into smaller and more manageable pieces
- in the form of threads. And a Windows application written using threads should
- make porting to OS/2 less painful.
- References
- 1. Dror, A., Lafore, R. OS/2 Presentation Manager Programming Primer.
- McGraw-Hill, 1990.
- 2. Petzold, C. Utilizing OS/2 Multithread Techniques in Presentation Manager
- Applications. Microsoft Systems Journal, March 1988.
- 3. Comer, D., Fossum, T. V. Operating System Design, Vol. 1: The Xinu Approach.
- Prentice-Hall, 1988.
- 4. Gpf Systems, Inc. Gpf. Gpf Systems, Inc., 1992.
- Counting Semaphores
- Semaphores are special constructs invented by E. W. Dijkstra for concurrent
- process synchronization. Only two operations are allowed on semaphores: wait
- and signal (originally called P and V operations respectively by Dijkstra).
- There is usually a count and a queue associated with each semaphore. The wait
- and signal operations must be done indivisibly, in the sense that only one
- process can operate on a semaphore at any one time. This can be achieved on a
- single-processor system like the PC by simply disabling interrupts during such
- operations. Wait and signal may be implemented as follows:
- wait(S): If the count of semaphore S is greater than zero, subtract 1 from it
- and continue, otherwise put the calling thread in semaphore S's queue and
- schedule the execution of another ready-to-run thread.
- signal(S): If no thread is waiting on semaphore S, add 1 to its count and
- continue, otherwise move the first thread waiting in semaphore S's queue to
- the ready-to-run queue.
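- The two definitions translate almost line for line into code. The sketch below
- is a single-processor model with hypothetical names: instead of performing a
- context switch, wait reports whether the caller must block, and signal reports
- which waiting thread (if any) should move to the ready-to-run queue. Cschlr's
- own Signal additionally caps the count at lMaxCount, which this model omits.

```cpp
#include <cassert>
#include <deque>

// Single-processor model of a counting semaphore (hypothetical names).
// Blocking is reported through return values instead of a context switch.
class CountingSemaphore {
public:
    explicit CountingSemaphore(long count) : count_(count) { assert(count >= 0); }

    // wait(S): take one unit if available; otherwise queue the caller.
    // Returns true if the caller may continue, false if it must block.
    bool wait(int thread) {
        if (count_ > 0) {
            --count_;
            return true;
        }
        waiting_.push_back(thread);
        return false;
    }

    // signal(S): with nobody waiting, add one unit and return -1;
    // otherwise remove the first waiter and return its id so the caller
    // can move it to the ready-to-run queue.
    int signal() {
        if (waiting_.empty()) {
            ++count_;
            return -1;
        }
        const int thread = waiting_.front();
        waiting_.pop_front();
        return thread;
    }

    long count() const { return count_; }

private:
    long count_;               // cookies in the jar
    std::deque<int> waiting_;  // threads blocked in wait, oldest first
};
```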
- The wait operation allows a process or thread to relinquish use of the CPU to
- other ready-to-run processes or threads when it is waiting on a certain event
- to occur. When such an event occurs, a signal operation can be used to inform
- the waiting process to proceed with its normal operation.
- A semaphore can be thought of as a jar holding cookies. The count associated
- with the semaphore corresponds to the number of cookies in the jar. Each wait
- operation is analogous to removing a cookie from the jar. When the jar is
- empty, one has to wait until someone tosses in some more cookies (signal
- operations).
- Semaphores serve many useful purposes. For example, they may be used to guard
- against concurrent access of shared variables among a number of concurrent
- threads, or to synchronize the execution of two threads. The following example
- demonstrates the latter:
- Thread #1            Thread #2
-
- {processing}         {processing}
-
- wait(checkpoint)     signal(checkpoint)
-
- {processing}         {processing}
- Here, checkpoint is a semaphore with a count of zero. The jar is empty. When
- thread #1 performs the wait operation, it blocks itself (since the jar is
- empty) until thread #2 signals it to resume (puts a cookie in the jar). And
- this concludes our short introduction to counting semaphores.
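- The checkpoint example can be replayed deterministically. This sketch runs two
- hypothetical cooperative threads round-robin, one step per turn, over a single
- counting semaphore that starts at zero. It models the interleaving only; it is
- not Cschlr's scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// One step of a hypothetical cooperative thread.
struct Step {
    enum Kind { Work, Wait, Signal } kind;
    std::string label;  // logged by Work steps; ignored otherwise
};

// Run the threads round-robin, one step per turn, sharing one counting
// semaphore that starts at `count`. A Wait that finds the count at zero
// parks the thread until a Signal releases it. Returns the Work labels
// in execution order; stops when no thread can make progress.
std::vector<std::string> runCooperative(
        const std::vector<std::vector<Step>>& threads, long count) {
    const std::size_t n = threads.size();
    std::vector<std::size_t> pc(n, 0);   // next step of each thread
    std::vector<bool> blocked(n, false);
    std::deque<std::size_t> waiting;     // FIFO of blocked threads
    std::vector<std::string> log;

    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t t = 0; t < n; t++) {
            if (blocked[t] || pc[t] >= threads[t].size())
                continue;
            const Step& s = threads[t][pc[t]++];
            progress = true;
            if (s.kind == Step::Work) {
                log.push_back(s.label);
            } else if (s.kind == Step::Wait) {
                if (count > 0) {
                    --count;
                } else {
                    blocked[t] = true;
                    waiting.push_back(t);
                }
            } else if (waiting.empty()) {  // Signal with nobody waiting
                ++count;
            } else {                       // Signal releases the first waiter
                blocked[waiting.front()] = false;
                waiting.pop_front();
            }
        }
    }
    return log;
}

// The sidebar's checkpoint example: `checkpoint` starts at zero.
std::vector<std::string> checkpointExample() {
    const std::vector<std::vector<Step>> threads = {
        {{Step::Work, "T1 before"}, {Step::Wait, ""}, {Step::Work, "T1 after"}},
        {{Step::Work, "T2 before"}, {Step::Signal, ""}, {Step::Work, "T2 after"}},
    };
    return runCooperative(threads, 0);
}
```

- Whichever thread runs first, thread #1's second burst of processing never
- starts before thread #2's first burst: if thread #2 signals before thread #1
- waits, the signal is simply stored as a count of one.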
- Object Windows
- All windows under OS/2's Presentation Manager are objects. The windows that
- people are familiar with all manifest themselves on the screen -- scroll bars,
- child windows, etc. Object windows, on the other hand, do not have this visual
- property. Since they do not display anything on the screen, they don't have to
- handle many of the events like WM_PAINT, WM_CHAR, etc. They are free to do
- whatever a programmer intends them to do, including carrying out a lengthy
- task and receiving messages at the same time. A programmer defines all
- messages going to an object window and all messages that an object window
- sends. An object window communicates with other windows by posting these
- messages.
- Figure 1 State transition diagram of the Cschlr scheduler
- Figure 2 Stack layout during thread creation
-
- Listing 1 Interface to Cschlr class
- #ifndef __CSCHLR
- #define __CSCHLR
-
- #include <time.h>
- #include "cthread.hpp"
- #include "csemq.hpp"
-
- struct schlr_table
- {
- time_t wakeuptime;
- int next;
-
- };
-
- class Cschlr
- {
- private:
- Csemq readyQ; //ready-to-run queue
- Csemq waitQ; //queue for delayed
- //threads to wait on
- Cthread *task[CSC_NO_THREAD + 3]; //thread objects
- schlr_table *table; //delay table
- //(linked list)
- int head; //head of linked list
- int nTask; //number of threads
- //created
- int nRunning; //running thread number
-
- void Switch(Csemq *sem);
- void WakeUp();
-
- public:
- Cschlr();
- ~Cschlr();
- int CreateThread(THDFN func, int stacksize, void *param);
- void Suicide();
- Csemq* CreateSem(long lValue);
- void DestroySem(Csemq *sem);
- void Signal(Csemq *sem, long lMaxCount = 256);
- void Wait(Csemq *sem);
- void Preempt();
- void Sleep(long lSeconds);
- void GetSemStates(Csemq *sem, long &lCount, int &fWait);
- void Run();
-
- };
-
- #endif
-
- // End of File
-
-
- Listing 2 Member function definitions for class Cschlr
- #include <stdlib.h>
- #include <stdio.h>
- #include "cschlr.hpp"
-
- const int MAIN = CSC_NO_THREAD; //main program context
- const int DUMMY = MAIN + 1; //temporary context
- const int EMPTY =-1; //no delayed thread marker
-
- /* The following static variable is used to facilitate the
- implementation of Kill. This limits the number of Cschlr
- objects per Windows application to one */
- static Cschlr *schlr;
-
- /* Used for the implicit termination of a thread when
- execution falls through the thread function body */
- static void Kill()
- {
- schlr->Suicide();
-
- }
-
- /* Used to check if it is time to wake up the sleeping threads. */
- void Cschlr::WakeUp()
- {
- time_t current;
- schlr_table *ptr;
-
- /* check time and dispatch thread */
- time(&current);
- while ((head != EMPTY) &&
- (current >= (ptr = (table + head))->wakeuptime))
- {
- readyQ.Enqueue(head);
- head = ptr->next;
- ptr->next = EMPTY;
- }
- }
-
- /* Used to perform a task switch. Control is passed
- back to the Windows PeekMessage loop */
- void Cschlr::Switch(Csemq *sem)
- {
- WakeUp();
- sem->Enqueue(nRunning);
- task[nRunning]->Transfer(*task[MAIN]);
- }
-
- /* Constructor to set up the task table */
- Cschlr::Cschlr()
- {
- nTask = 0;
- nRunning = CSC_NO_THREAD;
- head = EMPTY;
- table = new schlr_table[CSC_NO_THREAD];
- schlr = this;
-
- //no thread slots in use yet
- for (int i = 0; i < CSC_NO_THREAD + 3; i++)
- task[i] = NULL;
-
- //used to resume main program
- task[MAIN] = new Cthread(NULL, 0, NULL, 0);
-
- //used in Suicide to switch task
- task[DUMMY] = new Cthread(NULL, 0, NULL, 0);
- }
-
- /* Empty destructor */
- Cschlr::~Cschlr()
- {
- //do nothing
- }
-
- /* Used to create a thread and pass to it a user variable */
- int Cschlr::CreateThread(THDFN func, int stacksize, void *param)
- {
- THDFN retaddr; //these two variables must appear
- void *ptr; //together for CreateThread to work
- int i;
-
- int thread = CSC_NO_THREAD;
-
-
- if (nTask < CSC_NO_THREAD) //slots 0..CSC_NO_THREAD-1 hold threads
- {
- for (i = 0; i < CSC_NO_THREAD; i++)
- {
- if (task[i] == NULL)
- break;
- }
- retaddr = Kill; //set return address to point to Kill
- ptr = param; //set user parameter pointer
- task[i] = new Cthread(func, stacksize, (int *) &retaddr, 4);
- readyQ.Enqueue(i);
- thread = nTask++;
- }
-
- return (thread);
- }
-
- /* Used by a calling thread to commit suicide or self-terminate */
- void Cschlr::Suicide()
- {
- int current;
-
- current = nRunning;
- nRunning = readyQ.Dequeue();
- delete task[current];
- task[current] = NULL;
- nTask--;
- task[DUMMY]->Transfer(*task[nRunning]);
- }
-
- /* Used to create a semaphore object */
- Csemq* Cschlr::CreateSem(long lValue)
- {
- return (new Csemq(lValue));
- }
-
- /* Used to destroy a semaphore object */
- void Cschlr::DestroySem(Csemq* sem)
- {
- delete sem;
- }
-
- /* Used to signal a semaphore */
- void Cschlr::Signal(Csemq *sem, long lMaxCount)
- {
- if (sem->GetType() == CST_COUNT)
- {
- if (sem->GetCount() < lMaxCount)
- sem->UpdateCount(1);
- }
- else
- {
- readyQ.Enqueue(sem->Dequeue());
- }
- }
-
- /* Used to wait on a semaphore */
- void Cschlr::Wait(Csemq *sem)
- {
-
-
- if (sem->GetType() == CST_COUNT)
- {
- if (!sem->GetCount())
- {
- //move running thread to semaphore queue and
- //switch to a ready-to-run thread
- Switch(sem);
- }
- else
- //decrement semaphore count and continue execution
- sem->UpdateCount(-1);
- }
- else
- {
- //move running thread to semaphore queue and
- //switch to a ready-to-run thread
- Switch(sem);
- }
- }
-
- /* Used to give up the cpu voluntarily */
- void Cschlr::Preempt()
- {
- Switch(&readyQ);
- }
- /* Used to put the calling thread to sleep for the
- specified number of seconds */
- void Cschlr::Sleep(long lSeconds)
- {
- time_t current;
- schlr_table *ptr;
- schlr_table *thread;
- int prev;
- int next;
-
- current = time(0) + lSeconds; //init wakeup time
-
- if (head == EMPTY)
- { //no thread delayed
- head = nRunning;
- next = EMPTY;
- }
- else //scan delayed threads
- {
- prev = EMPTY;
- next = head;
- while ((next != EMPTY) && (current >=
- (ptr = (table + next))->wakeuptime))
- {
- prev = next;
- next = ptr->next;
- }
- if (prev == EMPTY)
- head = nRunning;
- else
- (table + prev)->next = nRunning;
- }
-
-
- ptr = (table + nRunning);
- ptr->wakeuptime = current;
- ptr->next = next;
-
- Switch(&waitQ);
-
- }
-
- /* Used to retrieve the status of a semaphore */
- void Cschlr::GetSemStates(Csemq *sem, long &lCount, int &fWait)
- {
- if (sem->GetType() == CST_COUNT)
- {
- lCount = sem->GetCount();
- fWait = 0;
- }
- else
- {
- lCount = 0;
- fWait = 1;
- }
- }
-
- /* Used by the main PeekMessage loop to multiplex the
- cpu among a number of threads */
- void Cschlr::Run()
- {
- if ((nRunning = readyQ.Dequeue()) != CSC_NO_THREAD)
- task[MAIN]->Transfer(*task[nRunning]);
- else
- WakeUp();
- }
-
- // End of File
-
-
- Listing 3 Header file for class Csemq
- #ifndef _CSEMQ
- #define _CSEMQ
-
- enum csemq_type {CST_COUNT, CST_QUEUE};
- const int CSC_NO_THREAD = 31;
- const long CSC_IDLE = 0x80000000L;
-
- class Csemq
- {
- private:
- long sem;
- int priority;
-
- Csemq();
- Csemq(long Value);
- int Dequeue();
- void Enqueue(int nThread);
- void UpdateCount(long lValue);
- csemq_type GetType();
- long GetCount();
-
- friend class Cschlr;
-
- };
-
- #endif
-
- // End of File
-
-
- Listing 4 Member function definitions for class Csemq
- #include <stdlib.h>
- #include "csemq.hpp"
-
- /* Constructor */
- Csemq::Csemq()
- {
- sem = 0L;
- priority = CSC_NO_THREAD;
- }
-
- /* Another Constructor */
- Csemq::Csemq(long lValue)
- {
- sem = lValue;
- priority = CSC_NO_THREAD;
- }
-
- /* Used to return a thread for execution based on a
- round-robin-like scheduling algorithm */
- int Csemq::Dequeue()
- {
- int nTask = CSC_NO_THREAD;
- long task;
-
- long tmp;
- int i;
-
- if (sem < 0)
- {
-
- //this implements a somewhat round-robin scheduling algorithm
- task = _lrotl(1, priority + 1);
- tmp = sem & ~CSC_IDLE;
-
- /* scan the semaphore queue structure bit-by-bit */
- for (i = 0; i < CSC_NO_THREAD + 1; i++)
- {
- if (task & tmp)
- break;
- else
- task = _lrotl(task, 1);
- }
-
- sem &= ~task;
- if (sem == CSC_IDLE)
- sem = 0;
-
- if (i <= CSC_NO_THREAD)
- nTask = long (i + priority + 1) % 32L;
-
- }
-
-
- priority = nTask;
-
- return (nTask);
- }
-
- /* Used to put a thread in the semaphore queue */
- void Csemq::Enqueue(int nThread)
- {
- if ((nThread <= CSC_NO_THREAD) && (sem <= 0))
- {
- sem |= (1L << nThread) | CSC_IDLE; //add thread to the queue bitmask
- }
- }
-
- /* Used to update the semaphore count by the specified
- amount */
- void Csemq::UpdateCount(long lValue)
- {
- long lTmp;
-
- if (sem >= 0)
- {
- lTmp = sem + lValue;
- if (lTmp >= 0)
- sem = lTmp;
- }
- }
-
- /* Used to return an indicator to the caller telling
- it if the semaphore structure is a counter or a queue */
- csemq_type Csemq::GetType()
- {
- csemq_type nType = CST_COUNT;
-
- if (sem < 0)
- nType = CST_QUEUE;
-
- return (nType);
- }
-
- /* Used to get the count of a semaphore */
- long Csemq::GetCount()
- {
- return (sem);
- }
-
- // End of File
-
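- Listing 4 packs each semaphore into one long: a non-negative value is a count,
- while a value with the sign bit (CSC_IDLE) set is a bitmask of waiting
- threads, which is presumably why CSC_NO_THREAD is 31. The portable sketch
- below re-expresses the queue half with std::uint32_t and replaces the
- compiler-specific _lrotl rotation with modular index arithmetic; the class
- BitQueue and its names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const int kNoThread = 31;                 // plays the role of CSC_NO_THREAD
const std::uint32_t kIdle = 0x80000000u;  // plays the role of CSC_IDLE

// Hypothetical portable model of how Csemq stores a queue: bit t set means
// thread t is waiting, and the top bit marks the value as a queue rather
// than a count.
class BitQueue {
public:
    void enqueue(int thread) {
        assert(thread >= 0 && thread < kNoThread);
        bits_ |= (std::uint32_t(1) << thread) | kIdle;
    }

    // Scan from the bit after the thread returned last time, wrapping
    // around, so waiting threads are served in a round-robin-like order.
    // Returns kNoThread when nobody is waiting.
    int dequeue() {
        int result = kNoThread;
        const std::uint32_t waiting = bits_ & ~kIdle;
        for (int i = 0; i < 32 && waiting != 0; i++) {
            const int t = (last_ + 1 + i) % 32;
            if (waiting & (std::uint32_t(1) << t)) {
                bits_ &= ~(std::uint32_t(1) << t);
                result = t;
                break;
            }
        }
        if (bits_ == kIdle)
            bits_ = 0;  // last waiter removed: back to a plain zero count
        last_ = result;
        return result;
    }

private:
    std::uint32_t bits_ = 0;
    int last_ = kNoThread;  // Csemq keeps this in its `priority` member
};
```

- Starting the scan just past the previous winner means a thread that queues up
- again immediately cannot starve the others: with threads 1 and 3 waiting,
- thread 1 is served first, and even if it re-enqueues at once, thread 3 is next.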
-
- Listing 5 Header file for class Cthread
- #ifndef __CTHREAD
- #define __CTHREAD
-
- #include <setjmp.h>
-
- const int THD_MAX_STACK = 0x4000; /* maximum stack size allowed */
-
- typedef void (* THDFN)(void);
-
-
- typedef struct thd_type {
- jmp_buf Context; /* context of thread */
- int TotalLen; /* total length of structure */
- int Overflowed; /* overflow guard */
- int Stack[THD_MAX_STACK]; /* stack of thread */
- } THD, *THDPTR;
-
- class Cthread
- {
- private:
-
- THDPTR threadbody;
-
- Cthread(THDFN func, int stacksize, int *frame, int framesize);
-
- void Transfer(Cthread& thread);
-
- ~Cthread(void);
-
- friend class Cschlr;
-
- };
-
- #endif
-
- // End of File
-
-
- Listing 6 Member function definitions for class Cthread
- #include <stdlib.h>
- #include <stdio.h>
- #include "cthread.hpp"
-
- /* This module is compiler-specific. It relies on the
- structure of jmp_buf and the C function prolog and
- epilog to work properly */
-
- /* The following definitions are for the Zortech V3.0
- C++Windows and OS/2 compilers */
-
- #define WORDSTOREMOVE 3 //used for stack adjustment
- typedef struct //jmp_buf map
- {
- long ip; //CS:IP
- short filler1;
- long sp; //SS:SP
- } *mapptr;
-
- /* Used to transfer to the specified thread. The context
- of the running thread is saved in the thread object's
- private data area */
- void Cthread::Transfer(Cthread &thread)
- {
-
- /* don't transfer control if the new thread's
- stack is corrupt */
- if ((thread.threadbody == NULL) ||
- (thread.threadbody->Overflowed)) return;
-
-
- /* save old thread's context and transfer control */
- if (!setjmp(this->threadbody->Context))
- longjmp(thread.threadbody->Context, 1);
-
- }
-
- /* Used to create a thread object and copy a number of
- parameters to the thread's stack */
- Cthread::Cthread(THDFN func, int stacksize,
- int *frame, int framesize)
- {
-
- int i;
- mapptr ptr;
-
- /* allocate stack */
- stacksize /= sizeof(int); //convert to our allocation unit
- stacksize = (stacksize > THD_MAX_STACK) ? THD_MAX_STACK: stacksize;
-
- threadbody = (THDPTR) new char[i = (sizeof(THD) - sizeof(int) *
- (THD_MAX_STACK - stacksize))];
- if (threadbody == NULL)
- {
- printf("Thread creation failed...\n");
- return; //nothing to initialize
- }
- threadbody->Overflowed = 0;
- threadbody->TotalLen = i;
-
- setjmp(threadbody->Context); //initialize jmp_buf structure
- ptr = (mapptr) &threadbody->Context;
- /* initialize stack with parameters if any */
- if (stacksize)
- {
- if ((frame != NULL) && (stacksize > framesize))
- {
- for (i = 0; i < framesize; i++)
- threadbody->Stack[stacksize - framesize + i] =
- *frame++;
- stacksize -= framesize;
- }
- //set stack pointer
-
- ptr ->sp = (long) &(threadbody->Stack[stacksize -
- WORDSTOREMOVE]);
- }
-
- /* set up the start of the thread body */
- ptr->ip = (long) func;
-
- }
-
- /* Destructor */
- Cthread::~Cthread(void)
- {
- delete [threadbody->TotalLen] (char *) threadbody;
- threadbody = NULL;
- }
-
-
- // End of File
-
-
- Listing 7 Implements main logic of example program BIGJOB.cpp
- /* define externals and prototypes */
- #include <math.h> //tan, atan, exp, log, sqrt
- #include "cschlr.hpp"
-
- VOID Timer(void);
- VOID Big(void);
- VOID PaintWindow( PGPFPARMS pGpfParms);
- VOID SelectRep( PGPFPARMS pGpfParms);
- VOID StartSalvage( PGPFPARMS pGpfParms);
- VOID AbortSalvage( PGPFPARMS pGpfParms);
- VOID CreateThreads(VOID);
- VOID PeekMessageLoop(MSG * pMsg, Cschlr *pTask);
- static double Savage(double x);
-
- enum Status {IDLE, RUNNING, DONE, ABORTED};
-
- SHORT idCommand = ID_REP10; //iteration command
- short nIterations; //no. of iterations
- Status status = IDLE; //current status
- SHORT rep[4] = {10, 100, 1000, 10000}; //iteration values
- time_t lStart; //start time
- time_t lTime; //command total time
- BOOL fContinue; //continue flag
-
- int thdtimer, thdbig; //thread objects
- Csemq *semjob, *semtimer; //semaphore objects
- const int STACKSIZE = 4096; //stack size
- Cschlr mtask; //scheduler object
-
- /* Used to execute the Salvage benchmark once */
- static double Savage(double x)
- {
- return tan(atan(exp(log(sqrt(x * x))))) + 1.0;
- }
-
- /* Used to display the main window to reflect the state of execution */
- VOID PaintWindow( PGPFPARMS pGpfParms)
- {
- static char *szFormat[] = {
- "Idling...",
- "Running... %ld seconds passed",
- "Completed in %ld seconds: %d repetitions",
- "Aborted after %ld seconds"
- };
-
- RECT rect;
- char szMsg[50];
-
- GetClientRect(pGpfParms->hwnd, &rect);
- wsprintf(szMsg, szFormat[status], lTime, nIterations);
- DrawText(pGpfParms->hdc, szMsg, -1, &rect,
- DT_SINGLELINE | DT_CENTER | DT_VCENTER);
-
- }
-
- /* Timer thread function to display the status report
-
- every 5 seconds */
- VOID Timer(VOID)
- {
- int i;
-
- while (1)
- {
- mtask.Wait(semtimer);
- i = 0;
- while (1)
- {
- mtask.Sleep(1L);
- i++;
- if (status == RUNNING)
- {
- if (i % 5 == 0)
- {
- InvalidateRect(hwndMainWindow, NULL, TRUE);
- lTime = time(0) - lStart;
- }
- }
- else
- break;
- }
- }
- }
-
- /* The thread to actually carry out the Salvage benchmark
- for a number of iterations */
- VOID Big(VOID)
- {
- double x;
- int i;
-
- while (1)
- {
- mtask.Wait(semjob);
- for (i = 0; i < nIterations; i++)
- {
- if (!fContinue)
- break;
- x = Savage(1.0);
- mtask.Preempt();
- }
- lTime = time(0) - lStart;
- if (fContinue)
- status = DONE;
- else
- status = ABORTED;
- GpfMenuGray(hwndMainWindow,ID_START,FALSE);
- GpfMenuGray(hwndMainWindow,ID_ABORT,TRUE);
- InvalidateRect(hwndMainWindow, NULL, TRUE);
- }
- }
-
- /* Used to check mark the iteration count selected by the user */
-
- VOID SelectRep( PGPFPARMS pGpfParms)
- {
-
-
- GpfMenuTick(hwndMainWindow,idCommand,FALSE);
- idCommand = pGpfParms->Command;
- GpfMenuTick(hwndMainWindow,pGpfParms->Command,TRUE);
- return;
-
- }
-
- /* Used to start the Salvage benchmark by signalling the thread Big */
- VOID StartSalvage( PGPFPARMS pGpfParms)
- {
-
- GpfMenuGray(hwndMainWindow, ID_ABORT,FALSE);
- GpfMenuGray(hwndMainWindow, ID_START,TRUE);
- nIterations = rep[idCommand - ID_REP10];
- fContinue = TRUE;
- time(&lStart);
- status = RUNNING;
- lTime = 0L;
- InvalidateRect(hwndMainWindow, NULL, TRUE);
-
- mtask.Signal(semjob);
- mtask.Signal(semtimer);
-
- return;
-
- }
-
- /* Used to abort the executing Salvage benchmark */
- VOID AbortSalvage( PGPFPARMS pGpfParms)
- {
- fContinue = FALSE;
- GpfMenuGray(hwndMainWindow, ID_START, FALSE);
- GpfMenuGray(hwndMainWindow,ID_ABORT,TRUE);
-
- return;
-
- }
-
- /* Used to create the Timer and Big threads */
- VOID CreateThreads(VOID)
- {
- /* create multitasking environment */
- thdtimer =mtask.CreateThread(Timer, STACKSIZE, NULL);
- thdbig =mtask.CreateThread(Big, STACKSIZE, NULL);
- semjob = mtask.CreateSem(0);
- semtimer = mtask.CreateSem(0);
- GpfMenuGray(hwndMainWindow, ID_ABORT,TRUE);
- GpfMenuTick(hwndMainWindow, ID_REP10,TRUE);
- }
-
- /* Main PeekMessage loop to replace the normal
- GetMessage loop */
- VOID PeekMessageLoop(MSG * pMsg, Cschlr *pTask)
- {
-
- BOOL fGo = TRUE;
-
- while (fGo)
-
- {
- if (pTask != NULL)
- {
- while (!PeekMessage(pMsg,NULL,NULL,NULL,PM_NOREMOVE))
- pTask->Run();
- }
-
- if (GetMessage(pMsg,NULL,NULL,NULL))
- {
- if ((!pMsg->hwnd) || (!IsDialogMessage
- (GetActiveWindow(),pMsg)))
- {
- TranslateMessage(pMsg);
- DispatchMessage(pMsg);
- }
-
- }
-
- else
- fGo = FALSE;
-
- }
- }
-
- // End of File
-
-
- Listing 8 Makefile for program BIGJOB
- # ************** bigjob Make File (.Mak) ***************
- #
- # Generated by GPF (Gui Programming Facility) V1.3 Level(01)
- #
- # Program Name : bigjob
- # DataBase Name : No
- # Date and Time : Tue May 25 20:50:32 1993
- # Program Level : 1.0
- # Copyright : Andy Yuen 1993
- #
- #
- # ********************************************************
-
- WinOptions = -2 -a1 -c -g -Ju -ml -W2 -x -Jm
- DefOptions = -c -a1 -Ju -mli
- CC = ztc
- AllObj = \
- bigjob.Obj csemq.obj cthread.obj cschlr.obj
-
- # note that Cschlr, Csemq and Cthread should not be compiled to
- # generate Windows function prologs and epilogs
-
- # the following inference rule is for Cschlr Csemq and Cthread
- .cpp.obj:
-
- $(CC) $(DefOptions) $<
-
- bigjob.exe: $(AllObj) bigjob.res
- link @bigjob.l
- rc bigjob.res bigjob.exe
-
-
- bigjob.obj: bigjob.cpp bigjob.h
- $(CC) $(WinOptions) bigjob.cpp
-
- bigjob.Res: bigjob.Rc
- Rc -r bigjob.Rc
-
- # End of File
-
-
-
-
-
- Standard C
-
-
- Technical Corrigendum 1
-
-
-
-
- P.J. Plauger
-
-
- P.J. Plauger is senior editor of The C Users Journal. He is convenor of the
- ISO C standards committee, WG14, and active on the C++ committee, WG21. His
- latest books are The Standard C Library, and Programming on Purpose (three
- volumes), all published by Prentice-Hall. You can reach him at
- pjp@plauger.com.
-
-
-
-
- Introduction
-
-
- No sooner had the ANSI C Standard hit the streets in 1989 than people started
- asking questions about it. Some questions were simple requests for
- enlightenment, or how to interpret an apparent ambiguity or oversight. Others
- were direct challenges to the correctness or completeness of the C Standard
- itself. In each case, ANSI rules required that committee X3J11 respond to the
- query as a Request For Interpretation, or RFI.
- X3J11 did its best to respond to four dozen such RFIs. Unfortunately, the
- rules of the game did not permit us to easily change the wording of the C
- Standard. We often had to excuse what we wrote instead of simply making it a
- bit clearer. Equally unfortunately, those dozens of RFIs and the committee
- responses were slow to see the light of day. The first batch was only recently
- approved by ANSI for publication.
- Meanwhile, ISO committee JTC1/SC22/WG14 assumed more and more responsibility
- for the C Standard. It didn't help that the ISO rules for interpreting and
- fixing language standards were, in their own way, as obscure and inappropriate
- as ANSI's. It was not until the August 1992 plenary meeting of SC22 that we
- laid down sensible procedures for responding to queries and challenges from
- the public.
- WG14, with lots of cooperation from X3J11, has worked hard for the past year
- or so to catch up. We've now processed nearly five dozen Defect Reports (or
- DRs), the ISO analog to RFIs. All of the responses to date have been gathered
- into a Record of Responses (or RR) that is now being balloted within SC22. By
- the time you read these words, the ballot should be closed. I'll be astonished
- if the ballot yields any serious opposition. Too many experts from too many
- countries have labored for too many years for any major objections to remain
- hidden.
- A companion document to the Record of Responses, called a Technical
- Corrigendum (or TC), summarizes all the changes that WG14 now recommends to
- the ISO C Standard. The RR is not normative but the TC is. Most of the changes
- are designed to clarify wording that can be misread. A few resolve ambiguities
- or patch holes that are hard to argue away. Just one or two definitely change
- the rules of C -- to make the language more like what X3J11 meant instead of
- what we ended up saying. None of the changes add significant new features to
- Standard C, or take any away.
- What I present here are the actual instructions for changing the ISO C
- Standard (ISO/IEC 9899:1990). Be warned that they are in draft form -- they
- can still change in response to comments from the balloting. I expect such
- changes to be small, however. I have extracted changes to the appendixes and
- put them at the end. Otherwise, the changes occur in no particular page order.
- That's the way we responded to the questions as they came in.
- If you have the ANSI version instead (ANSI X3.159:1989), you'll find the
- leading digit of the subclause number differs. Usually, the ISO number is
- three higher. You'll also find that page numbers tend to be off by one,
- mostly. But even if you lack a copy of the C Standard in any form, you should
- be able to make sense of what follows.
- I've added my own commentary in italics to explain the reason for each change.
- The actual TC reproduces the Defect Report that led to the change, but I lack
- the space to do the same here. The words in boldface are the meta-instructions
- from the TC describing where each change should occur.
-
-
- The Changes
-
-
- Some implementors wanted to avoid copying structures in a function returning a
- structure, even if that meant the return value might overlap a structure
- argument value. We wanted to clarify that this is not permissible:
- Add to subclause 6.6.6.4, page 80:
- The overlap restriction in subclause 6.3.16.1 does not apply to the case of
- function return.
- Example
- In:
- struct s {double i;} f(void);
-
- union {struct {int f1;
- struct s f2;} u1;
- struct {struct s f3;
- int f4;} u2;
- } g;
-
- struct s f(void)
- {
- return g.u1.f2;
- }
-
- /* ... */
-
- g.u2.f3 = f();
- the behavior is defined.
- We missed one or two places where the C grammar is ambiguous. Sometimes it's
- hard to tell from context whether a type definition is being used a different
- way in a nested scope. We generalized the guideline originally laid down by
- Dennis Ritchie:
- In subclause 6.5.4.3, page 68, change:
- In a parameter declaration, a single typedef name in parentheses is taken to
- be an abstract declarator that specifies a function with a single parameter,
- not as redundant parentheses around the identifier for a declarator.
-
- to:
- If, in a parameter declaration, an identifier can be treated as a typedef name
- or as a parameter name, it shall be taken as a typedef name.
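- As a minimal sketch of the rule (T, apply, and twice are hypothetical names),
- the declaration below reads int (T) as an unnamed parameter of type "function
- taking T returning int," which is then adjusted to a pointer to such a
- function, not as a parameter named T in redundant parentheses:
```c
#include <assert.h>

typedef int T;

/* T is visible as a typedef name, so "int (T)" is an abstract declarator:
   an unnamed parameter of type "function taking T returning int",
   adjusted to "pointer to function taking int returning int".
   It is NOT a parameter named T of type int. */
int apply(int (T), int);

/* The definition spells out the same, compatible parameter type. */
int apply(int (*fn)(int), int x)
{
    return fn(x);
}

static int twice(int n)
{
    return 2 * n;
}

/* Usage: apply forwards its second argument to the function it receives. */
void check_apply(void)
{
    assert(apply(twice, 21) == 42);
}
```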
- We got the words wrong regarding two declarations for the same name. We meant
- to have the later declaration assume the composite type of the two, even if
- the earlier declaration was in an outer scope. This is a substantive change to
- make Standard C behave as we intended:
- In subclause 6.1.2.6, page 25, change:
- For an identifier with external or internal linkage declared in the same scope
- as another declaration for that identifier, the type of the identifier becomes
- the composite type.
- to:
- For an identifier with internal or external linkage declared in a scope in
- which a prior declaration of that identifier is visible*, if the prior
- declaration specifies internal or external linkage, the type of the identifier
- at the latter declaration becomes the composite type. [*Footnote: As specified
- in 6.1.2.1, the latter declaration might hide the prior declaration.]
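- A small sketch of the corrected rule, assuming a compiler that implements it
- (table and table_length are hypothetical names). The block-scope declaration
- omits the bound, but because a file-scope declaration with external linkage is
- visible, table takes the composite type int[4] inside the block, so sizeof is
- valid there:
```c
#include <assert.h>
#include <stddef.h>

int table[4] = { 10, 20, 30, 40 };   /* external linkage, complete type */

size_t table_length(void)
{
    /* The file-scope declaration is visible and has external linkage, so
       this declaration takes the composite type int[4]. If it kept its
       own incomplete type int[], the sizeof below would be invalid. */
    extern int table[];
    return sizeof table / sizeof table[0];
}
```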
- Here is a similar error regarding the determination of storage class. We meant
- the rule to apply across any two scopes, not just file scope and another one,
- so we fixed it:
- In subclause 6.1.2.2, page 21, change:
- If the declaration of an identifier for an object or a function contains the
- storage-class specifier extern, the identifier has the same linkage as any
- visible declaration of the identifier with file scope. If there is no visible
- declaration with file scope, the identifier has external linkage.
- to:
- For an identifier declared with the storage-class specifier extern in a scope
- in which a prior declaration of that identifier is visible*, if the prior
- declaration specifies internal or external linkage, the linkage of the
- identifier at the latter declaration becomes the linkage specified at the
- prior declaration. If no prior declaration is visible, or if the prior
- declaration specifies no linkage, then the identifier has external linkage.
- [*Footnote: As specified in 6.1.2.1, the latter declaration might hide the
- prior declaration.]
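- A minimal sketch (hits and record_hit are hypothetical names): the extern
- declaration inside the function sees a prior declaration with internal
- linkage, so it refers to that same static object rather than introducing an
- external one:
```c
#include <assert.h>

static int hits = 0;   /* file scope, internal linkage */

int record_hit(void)
{
    /* A prior declaration with internal linkage is visible, so this
       extern declaration also has internal linkage: it names the static
       object above, not some "hits" defined in another file. */
    extern int hits;
    return ++hits;
}
```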
- We wanted to clarify how a tentative array definition with unknown size gets
- completed. Adding an example changes no normative wording, but provides a
- useful hint to the reader:
- Add to subclause 6.7.2, page 84:
- Example
- If at the end of the translation unit containing
- int i[];
- the array i still has incomplete type, the array is assumed to have one
- element. This element is initialized to zero on program startup.
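- A compilable version of the example, renaming i to a hypothetical slots so the
- file-scope name stands out. Compilers typically accept it with a warning that
- the array is assumed to have one element:
```c
#include <assert.h>

/* Tentative definition with incomplete array type. Nothing later in this
   translation unit completes it, so it behaves as int slots[1], which is
   zero-initialized at program startup. */
int slots[];

int first_slot(void)
{
    return slots[0];
}
```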
- We wanted to clarify that array arguments become pointer arguments rather
- early in the life of a function prototype. You can treat arrays as pointers
- both for determining type compatibility and for forming a composite type:
- In subclause 6.5.4.3, page 68, lines 23-25, change the two occurrences of:
- its type for these comparisons
- to:
- its type for compatibility comparisons, and for determining a composite type.
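- A minimal sketch (sum is a hypothetical function): the two declarations below
- are compatible because the array parameter is treated as a pointer both when
- the types are compared and when the composite type is formed:
```c
#include <assert.h>

/* Array form: the parameter is adjusted to const int *. */
int sum(const int v[], int n);

/* Pointer form: compatible with the declaration above. */
int sum(const int *v, int n)
{
    int total = 0;
    for (int k = 0; k < n; ++k)
        total += v[k];
    return total;
}
```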
- A similar confusion recurs on just when a structure type becomes complete. We
- clarified that completion occurs at the closing brace in the structure
- definition:
- In subclause 6.5.2.3, page 62, line 27, change:
- occurs prior to the declaration that defines the content
- to:
- occurs prior to the } following the struct-declaration-list that defines the
- content
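- A sketch of what the new wording permits and forbids, using a hypothetical
- linked-list node: inside the braces struct node is still incomplete, so only
- pointers to it can appear; after the closing brace it is complete:
```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;      /* fine: pointer to the still-incomplete type */
    /* struct node self;       invalid: incomplete until the closing brace */
};

/* Past the closing brace struct node is complete, so its members can be
   used and sizeof(struct node) is valid. */
size_t list_length(const struct node *p)
{
    size_t n = 0;
    for (; p != NULL; p = p->next)
        ++n;
    return n;
}
```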
- Yet another confusion recurs about when the size of an enumeration is known:
- Add to subclause 6.5.2.3, page 63:
- Example
- An enumeration type is compatible with some integral type. An implementation
- may delay the choice of which integral type until all enumeration constants
- have been seen. Thus in:
- enum f { c = sizeof(enum f)};
- the behavior is undefined since the size of the respective enumeration type is
- not known when sizeof is encountered.
- Some people read the description of fscanf as requiring a conversion failure
- on %n when the input is exhausted. That was not our intent:
- Add to subclause 7.9.6.2, page 138:
- Example
- In:
- #include <stdio.h>
-
- /* ... */
-
- int d1, d2, n1, n2, i;
-
- i = sscanf("123", "%d%n%n%d", &d1, &n1, &n2, &d2);
- the value 123 is assigned to d1 and the value 3 to n1. Because %n can never
- get an input failure the value of 3 is also assigned to n2. The value of d2 is
- not affected. The value 1 is assigned to i.
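- The example can be run as written; the wrapper function scan_example is a
- hypothetical convenience. Note that sscanf counts only input items assigned by
- conversions: %n stores a character count without adding to that total, and the
- final %d meets end of input, so the call returns 1:
```c
#include <assert.h>
#include <stdio.h>

/* Runs the scan from the example. d2 is preset to -1 so the caller can
   see that it is left untouched. Returns whatever sscanf returns. */
int scan_example(int *d1, int *n1, int *n2, int *d2)
{
    *d2 = -1;
    return sscanf("123", "%d%n%n%d", d1, n1, n2, d2);
}
```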
- We made clearer just what is meant by the implicit initialization of static
- objects to zero:
- In subclause 6.5.7, pages 71-72, change:
- If an object that has static storage duration is not initialized explicitly,
- it is initialized implicitly as if every member that has arithmetic type were
- assigned 0 and every member that has pointer type were assigned a null pointer
- constant.
- to:
- If an object that has static storage duration is not initialized explicitly,
- it is initialized implicitly according to these rules:
- -- if it has pointer type, it is initialized implicitly to a null pointer
- constant;
- -- if it has arithmetic type, it is initialized implicitly to zero;
- -- if it is an aggregate, every member is initialized (recursively) according
- to these rules;
- -- if it is a union, the first named member is initialized (recursively)
- according to these rules.
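- A sketch of these rules applied to a hypothetical structure with pointer,
- arithmetic, array, and union members, none of which is initialized explicitly:
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record mixing the kinds of members the rules cover. */
struct config {
    const char *name;
    double scale;
    int flags[3];
    union {
        long id;            /* first named member: the one initialized */
        char tag[8];
    } u;
};

static struct config defaults;   /* static storage, no initializer */

int defaults_are_zero(void)
{
    return defaults.name == NULL     /* pointer: null pointer */
        && defaults.scale == 0.0     /* arithmetic: zero */
        && defaults.flags[2] == 0    /* aggregate: each member, recursively */
        && defaults.u.id == 0;       /* union: first named member */
}
```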
- It was not completely clear that a newline always ends a preprocessing
- directive:
- Add to subclause 6.8, page 86, Description:
- A new-line character ends the preprocessing directive even if it occurs within
- what would otherwise be an invocation of a function-like macro.
- Some situations in Standard C are described as both constraint violations and
- undefined or implementation-defined behavior. We decided to clarify the
- precedence of errors:
-
- Add to subclause 5.1.1.3, page 6:
- If a construct violates a constraint and is also specified as having undefined
- or implementation-defined behavior the constraint takes precedence.
- Example
- An implementation shall issue a diagnostic for the translation unit:
- char i;
- int i;
- because in those cases where wording in this International Standard describes
- the behavior for a construct as being both a constraint error and resulting in
- undefined behavior, the constraint error shall be diagnosed.
- Some people felt it was not obvious enough that the members of a structure or
- union inherit its storage class:
- Add to subclause 6.5.1, page 58:
- A declaration of an aggregate or union with a storage-class specifier other
- than typedef implicitly causes all of its members to be given the
- storage-class specifier.
- We wanted to clarify that assignment to a narrower type does indeed
- effectively stuff the value through a knothole, scraping off high-order bits:
- Add to subclause 6.3.16.1, page 54:
- Example
- In the fragment:
- char c;
- int i;
- long l;
-
- l = ( c = i );
- the value of i is converted to the type of the assignment-expression c = i,
- that is, char type. The value of the expression enclosed in parentheses is
- converted to the type of the outer assignment-expression, that is, long type.
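- A sketch of the same fragment as a function (narrow_then_widen is
- hypothetical). It swaps char for unsigned char, an assumption made because
- converting an out-of-range int to a signed character type gives an
- implementation-defined result, while conversion to unsigned char reduces the
- value modulo UCHAR_MAX + 1:
```c
#include <assert.h>
#include <limits.h>

long narrow_then_widen(int i)
{
    unsigned char c;
    long l;

    /* int -> unsigned char (high-order bits dropped), then the value of
       the inner assignment, unsigned char, is converted to long. */
    l = (c = i);
    return l;
}
```
- With 8-bit characters, narrow_then_widen(0x1234) returns 0x34: the inner
- assignment scrapes off the high-order bits, and widening to long cannot
- restore them.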
- Some people were confused about the meaning of "ignored" when talking about
- unnamed structure or union members during initialization:
- In subclause 6.5.7, page 71, line 39, change:
- All unnamed structure or union members are ignored during initialization.
- to:
- Except where explicitly stated otherwise, for the purposes of this subclause
- unnamed members of objects of struct and union type do not participate in
- initialization. Unnamed members of struct objects have indeterminate value
- even after initialization. A union containing only unnamed members has
- indeterminate value even after initialization.
- In subclause 6.5.7, page 72, lines 4-5, change:
- The initial value of the object is that of the expression:
- to:
- The initial value of the object, including unnamed members, is that of the
- expression:
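- A sketch with a hypothetical bit-field layout: the unnamed 5-bit member does
- not participate in initialization, so the second initializer goes to b rather
- than to the padding:
```c
#include <assert.h>

struct packed {
    unsigned a : 3;
    unsigned   : 5;     /* unnamed: skipped by the initializer list */
    unsigned b : 3;
};

static const struct packed sample = { 1, 2 };   /* a = 1, b = 2 */

unsigned sample_a(void) { return sample.a; }
unsigned sample_b(void) { return sample.b; }
```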
- How macros get expanded is a source of confusion to many. We added yet another
- example to help clarify this difficult topic:
- Add to subclause 6.8.3.3, page 90:
- Example
- #define hash_hash # ## #
- #define mkstr(a) # a
- #define in_between(a) mkstr(a)
- #define join(c, d) in_between(c hash_hash d)
-
- char p[] = join(x, y);
- /* equivalent to char p[] = "x ## y"; */
- The expansion produces, at various stages:
- join(x, y)
-
- in_between(x hash_hash y)
-
- in_between(x ## y)
-
- mkstr(x ## y)
-
- "x ## y"
- In other words, expanding hash_hash produces a new token, consisting of two
- adjacent sharp-signs, but this new token is not the catenation operator.
- Here's a one-word change, to clarify that we are talking about identifiers in
- general and not some (unspecified) one in particular:
- In subclause 7.1.2, page 96, lines 34-35, change:
- However, if the identifier is declared or defined in more than one header,
- to:
- However, if an identifier is declared or defined in more than one header,
- The functions ftell and fgetpos can often fail. Only values returned by
- successful calls are permitted in certain contexts:
- In subclause 7.9.9.2, page 145, lines 39-40, change:
- a value returned by an earlier call to the ftell function
- to:
-
- a value returned by an earlier successful call to the ftell function
- In subclause 7.9.9.3, page 146, lines 10-11, change:
- a value obtained from an earlier call to the fgetpos function
- to:
- a value obtained from an earlier successful call to the fgetpos function
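- A sketch of the intended usage (reread_char and reread_demo are hypothetical):
- the position handed back to fseek comes only from an ftell call that
- succeeded, since a failed ftell returns -1L and stores an error code in errno
- instead of reporting a position:
```c
#include <assert.h>
#include <stdio.h>

/* Reads one character, seeks back to where it started, and reads it
   again. Returns that character, or EOF on any failure. */
int reread_char(FILE *fp)
{
    long mark = ftell(fp);
    if (mark == -1L)
        return EOF;                 /* ftell failed: no usable position */

    int first = getc(fp);
    if (fseek(fp, mark, SEEK_SET) != 0)
        return EOF;
    int again = getc(fp);
    return first == again ? first : EOF;
}

/* Usage: writes a temporary file and rereads its first character. */
int reread_demo(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return EOF;
    fputs("abc", fp);
    rewind(fp);
    int c = reread_char(fp);
    fclose(fp);
    return c;
}
```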
- We really didn't say clearly what is the type of a function call expression:
- In subclause 6.3.2.2, page 40, line 35, change:
- The value of the function call expression is specified in 6.6.6.4.
- to:
- If the expression that denotes the called function has type pointer to
- function returning an object type, that object type is the type of the result
- of the function call. The value of the function call is determined by the
- return statement that executes within the called function, as specified in
- 6.6.6.4. Otherwise, the function call has type void.
- We used two different terms for "iteration structures" and "control
- structures." This change eliminates the form we used only once:
- In subclause 5.2.4.1, page 13, lines 1-2, change:
- -- 15 nested levels of compound statements, iteration control structures, and
- selection control structures
- to:
- -- 15 nested levels of compound statements, iteration statements, and
- selection statements
- Some readers insisted on believing that an expression such as: x<3&&0>x must
- be parsed to include the token <3&&0>, and hence requires a diagnostic. It was
- easier to add a sentence to the C Standard than to continue to fight such
- perversity:
- Add to subclause 6.1, page 18:
- There is one exception to this rule: a header-name preprocessing token is only
- recognized within a #include preprocessing directive, and within such a
- directive, a sequence of characters that could be either a header-name or a
- string-literal is recognized as the former.
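- A trivial check that such an expression compiles and means what it appears to
- mean (the function name is hypothetical): outside a #include directive no
- header-name token is formed, so x<3&&0>x is (x < 3) && (0 > x):
```c
#include <assert.h>

int below_three_and_negative(int x)
{
    return x<3&&0>x;    /* tokens: x < 3 && 0 > x */
}
```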
- Here is a similar, but milder, form of the same perversity:
- Add to subclause 6.1.2, page 20:
- When preprocessing tokens are converted to tokens during translation phase 7,
- if a preprocessing token could be converted to either a keyword or an
- identifier, it is converted to a keyword.
- More cleanup of header-name parsing:
- In subclause 6.1.7, page 32, delete:
- Constraint
- Header name preprocessing tokens shall only appear within a #include
- preprocessing directive.
- Add to subclause 6.1.7, page 32:
- The header-name preprocessing token is recognized only within a #include
- preprocessing directive.
- The # flag with the o conversion specifier (%#o) in fprintf has some subtle
- implications. It is not the same as forcing a zero fill. Nor is it the same as
- forcing increased precision:
- In subclause 7.9.6.1, page 132, lines 37-38, change:
- For o conversion, it increases the precision to force the first digit of the
- result to be a zero.
- to:
- For o conversion, it increases the precision, if and only if necessary, to
- force the first digit of the result to be a zero.
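- A sketch of the difference (alt_octal is hypothetical; it uses snprintf, which
- arrived with C99 rather than the 1990 standard). For 8, the precision-3 result
- 010 already starts with a zero, so no digit is added; for 64, the digits 100 do
- not, so the precision grows to give 0100:
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats v with "%#o" (default precision) and "%#.3o" into the
   caller's buffers, each at least 16 bytes long. */
void alt_octal(unsigned v, char plain[16], char prec3[16])
{
    snprintf(plain, 16, "%#o", v);
    snprintf(prec3, 16, "%#.3o", v);
}
```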
- Similarly, the matching rules for fscanf seem to need no end of clarification:
- In subclause 7.9.6.2, page 135, change:
- An input item is defined as the longest matching sequence of input characters,
- unless that exceeds a specified field width, in which case it is the initial
- subsequence of that length in the sequence.
- to:
- An input item is defined as the longest sequence of input characters which
- does not exceed any specified field width and which is, or is a prefix of, a
- matching input sequence.
- In subclause 7.9.6.2, page 137, delete:
- If conversion terminates on a conflicting input character, the offending input
- character is left unread in the input stream.
- Add to subclause 7.9.6.2, page 137:
- fscanf pushes back at most one input character onto the input stream.*
- Therefore, some sequences that are acceptable to strtod, strtol, or strtoul
- are unacceptable to fscanf. [*Footnote: If conversion terminates on a
- conflicting input character, the offending input character is left unread in
- the input stream.]
- The following change started out in an entirely different arena. We wanted to
- clarify that an implementation can add extra identifier characters, such as $,
- provided that it issues a diagnostic when they're used. But we discovered an
- ambiguity in how such extra characters would parse in a macro definition. So
- we decided to resolve the ambiguity and make the extension more usable:
- Add to subclause 6.8, page 86, Constraints:
- If the first character of a replacement-list is not a member of the minimal
- basic source character set*, there shall be white-space separation between the
- identifier and the replacement-list. [*Footnote: "Minimal basic source
- character set" refers to the 90-odd basic source characters listed in
- subclause 5.2.1.]
- We thought it was clear enough that library macros should be written sensibly,
- but not everyone seemed to agree:
- Add to subclause 7.1.2, page 96:
- Any definition of a macro described in this clause shall expand to code that
- is fully protected by parentheses where necessary, so that it groups in an
- arbitrary expression as if it were a single identifier.
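- A sketch of why the requirement matters, using two hypothetical macros. Only
- the fully parenthesized one groups like a single identifier inside a larger
- expression (both still evaluate their argument twice):
```c
#include <assert.h>

/* Unprotected: the expansion regroups with the surrounding expression. */
#define SQUARE_UNSAFE(x) x * x

/* Protected: parentheses around each use of x and around the whole body. */
#define SQUARE(x) ((x) * (x))
```
- For example, SQUARE(1 + 2) yields 9 and 36 / SQUARE(3) yields 4, while the
- unprotected form gives 5 and 36.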
- Here's a small but potentially misleading gaffe in an example:
- Change subclause 7.12.2.3, page 172, line 16, from:
- if (mktime(&time_str) == -1)
- to:
- if (mktime(&time_str) == (time_t)-1)
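- A sketch of the corrected test in context (day_of_week is hypothetical). The
- cast matters because time_t may be any arithmetic type; if it were, say, an
- unsigned type narrower than int, the uncast comparison would promote the
- result to a nonnegative int that never equals -1:
```c
#include <assert.h>
#include <time.h>

/* Returns the day of the week (0 = Sunday) for a calendar date, or -1
   if mktime cannot represent that local time. */
int day_of_week(int year, int month, int mday)
{
    struct tm t = { 0 };
    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = mday;
    t.tm_hour = 12;         /* midday, well clear of DST transitions */
    t.tm_isdst = -1;        /* let mktime decide whether DST applies */
    if (mktime(&t) == (time_t)-1)
        return -1;
    return t.tm_wday;       /* filled in by mktime */
}
```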
- And a similar error in the index:
- In the index, page 217, change:
- static storage-class specifier, 3.1.2.2, 6.1.2.4, 6.5.1, 6.7
- to:
- static storage-class specifier, 6.1.2.2, 6.1.2.4, 6.5.1, 6.7
- When we listed the rules for aliasing (accessing the same object by lvalues
- with different types), we were overly restrictive in describing the kinds of
- qualified types that are valid:
-
- In subclause 6.3, page 38, lines 18-21, change:
- An object shall have its stored value accessed only by an lvalue expression
- that has one of the following types:36
- -- the declared type of the object,
- -- a qualified version of the declared type of the object,
- to:
- An object shall have its stored value accessed only by an lvalue expression
- that has one of the following types: 36
- -- a type compatible with the declared type of the object,
- -- a qualified version of a type compatible with the declared type of the
- object,
- Some of the functions declared in <string.h> take a length argument, which can
- be zero. We spelled out what happens when that argument is zero:
- Add to subclause 7.11.1, page 162:
- Where an argument declared as size_t n determines the length of the array for
- a function, n can have the value zero on a call to that function. Unless
- explicitly stated otherwise in the description of a particular function,
- pointer arguments on such a call must still have valid values, as described in
- subclause 7.1.7 Use of library functions. On such a call, a function that
- copies characters shall copy zero characters, while a function that compares
- two character sequences shall return zero.
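- A minimal sketch (zero_length_calls is hypothetical): with n equal to zero the
- pointers must still be valid, nothing is copied, and the comparisons report
- equality:
```c
#include <assert.h>
#include <string.h>

int zero_length_calls(void)
{
    char dst[4] = "xyz";
    const char src[4] = "abc";

    memcpy(dst, src, 0);                 /* copies zero characters */
    memmove(dst, src, 0);                /* copies zero characters */
    return strcmp(dst, "xyz") == 0       /* dst is unchanged */
        && memcmp(src, "xyz", 0) == 0    /* compares nothing: zero */
        && strncmp(src, "xyz", 0) == 0;
}
```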
- We made clear that the macros for signal numbers defined in <signal.h> must
- have distinct values:
- In subclause 7.7, page 120, lines 14-16, change:
- and the following, each of which expands to a positive integral constant
- expression that is the signal number corresponding to the specified condition:
- to:
- and the following, which expand to positive integral constant expressions with
- distinct values that are the signal numbers, each corresponding to the
- specified condition:
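- One practical consequence, in a hypothetical helper: because the six standard
- signal macros must be distinct, they can all serve as case labels in a single
- switch, where duplicate values would be a constraint violation:
```c
#include <assert.h>
#include <signal.h>
#include <string.h>

const char *signal_name(int sig)
{
    switch (sig) {
    case SIGABRT: return "SIGABRT";
    case SIGFPE:  return "SIGFPE";
    case SIGILL:  return "SIGILL";
    case SIGINT:  return "SIGINT";
    case SIGSEGV: return "SIGSEGV";
    case SIGTERM: return "SIGTERM";
    default:      return "unknown";
    }
}
```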
-
-
- Listing Undefined Behavior
-
-
- The UK delegation to WG14 wants a complete list of undefined behaviors in
- Appendix G.2. This is part of an ongoing effort to round out that list:
- Add to subclause G.2, page 204:
- -- A program contains no function called main.
- Add to subclause G.2, page 204:
- -- A storage-class specifier or type-qualifier modifies the keyword void as a
- function parameter-type-list.
- Add to subclause G.2, page 204:
- -- For an array of arrays, the permitted pointer arithmetic in subclause
- 6.3.6, page 47, lines 12-40 is to be understood by interpreting the use of the
- word "object" as denoting the specific object determined directly by the
- pointer's type and value, not other objects related to that one by contiguity.
- Therefore, if an expression exceeds these permissions, the behavior is
- undefined. For example, the following code has undefined behavior:
- int a[4][5];
-
- a[1][7] = 0; /* undefined */
- Some conforming implementations may choose to diagnose an "array bounds
- violation," while others may choose to interpret such attempted accesses
- successfully with the "obvious" extended semantics.
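- If the intent is the element that lies seven positions past a[1][0] in memory,
- a defined way to reach it is to index the row that actually contains it. The
- helper element_at and the ROWS and COLS macros are hypothetical:
```c
#include <assert.h>

#define ROWS 4
#define COLS 5

/* Returns a pointer to the element flat positions past a[0][0] in
   row-major order, indexing within a single row so that no pointer
   arithmetic runs past the end of one int[COLS] array.
   The caller must ensure 0 <= flat < ROWS * COLS. */
int *element_at(int a[ROWS][COLS], int flat)
{
    return &a[flat / COLS][flat % COLS];
}
```
- With COLS equal to 5, element_at(a, 1 * COLS + 7) yields &a[2][2].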
- Add to subclause G.2, page 204:
- -- If a fully expanded macro replacement list contains a function-like macro
- name as its last pre-processing token, it is unspecified whether this macro
- name may be subsequently replaced. If the behavior of the program depends upon
- this unspecified behavior, then the behavior is undefined.
- Example
- Given the definitions:
- #define f(a) a*g
- #define g(a) f(a)
- the invocation:
- f(2)(9)
- results in undefined behavior. Among the possible behaviors are the generation
- of the preprocessing tokens:
- 2*f(9)
- and
- 2*9*g
- Add to subclause G.2, page 204:
- -- A call to a library function exceeds an Environmental limit.
-
-
- Conclusion
-
-
- I believe these changes are reasonably minor. The C Standard is honored by
- dozens of vendors, required by hundreds of customers, and validated by several
- agencies around the world. Yet it has seen remarkably few challenges for all
- that. Put another way, the C Standard has held up pretty well these past five
- years.
- Still, it doesn't hurt to fix the obvious flaws. The fewer ambiguities in a
- document, the fewer misunderstandings result. And, of course, the fewer
- questions get directed to those of us who generate answers.
-
-
-
- Stepping Up To C++
-
-
- How Virtual Functions Work
-
-
-
-
- Dan Saks
-
-
- Dan Saks is the founder and principal of Saks & Associates, which offers
- consulting and training in C++ and C. He is secretary of the ANSI and ISO C++
- committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of
- the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach
- him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601,
- or electronically at dsaks@wittenberg.edu.
-
-
- Last month, I introduced virtual functions. (See "Stepping Up to C++: Virtual
- Functions," CUJ, December 1993). I outlined how you might use virtual
- functions to implement a device-independent file system, and showed in detail
- how to create a class of geometric shapes with polymorphic behavior. I
- continue this month by explaining the mechanics of how virtual functions work.
- But first, a brief recap of the key concepts.
- Public inheritance in C++ defines an Is-A relationship between a derived class
- and its base. That is, given the definition
- class D : public B { ... };
- D is publicly derived from B, and any D object is also a B object. A function
- that expects a pointer (or reference) to a B object as its formal parameter
- will accept a pointer (or reference, respectively) to a D object as well.
- More generally, converting a pointer to a D object into a pointer to a B
- object is a standard conversion that does not require a cast. For example, if
- d is an object of class D and B is a public base class of D you can write
- B *pb = &d;
- which converts &d (an expression of type D *) to B *. Binding a D object to a
- B & is also a standard conversion.
- When discussing the behavior of pointers to base and derived objects, it helps
- to distinguish the static type of an object from its dynamic type. The static
- type of an object is the type of the expression used to refer to that object.
- The dynamic type of an object is its "actual" type -- the object's type at the
- point where it was created.
- For example, using pb as defined and initialized above, *pb has static type B
- but dynamic type D. Or, given
- B &rb = d;
- rb has static type B but dynamic type D. The static type of *pb is always B,
- but its dynamic type may change during program execution. For example, if b is
- a B object, then
- pb = &b;
- changes the dynamic type of *pb to B.
- A derived class inherits all the members of its base class. A derived class
- cannot discard any members it inherits, but it can override the definition of
- an inherited member function with a new definition.
- In C++, non-static member functions are non-virtual by default. C++ resolves
- non-virtual member function calls by static binding. That is, if pb is
- declared as a B *, and B has a non-virtual member function f, then pb->f always
- calls B's f. Even if, at the time of the call, pb actually points to a D
- object (where D is derived from B and overrides f), calling pb->f still invokes
- B's f, not D's f.
- On the other hand, virtual member function calls bind dynamically. If pb is a
- B * and f is declared as a virtual member function in B, then calling pb->f
- calls f associated with the dynamic type of *pb. Thus, if pb actually points
- to a B object, then pb->f calls B's f. But, if pb actually points to a D
- object (where D is derived from B and overrides f), then pb->f calls D's f.
- A class with at least one virtual member function is called a polymorphic
- type, and objects of such type exhibit polymorphism. Polymorphism lets you
- define a single interface for a hierarchy of subtypes that exhibit logically
- similar, but physically different, behavior. Using polymorphism, you can pass
- pointers or references to objects of a derived class type to functions that
- know the object only by its base class type. Yet, the object retains its
- dynamic type so that member function calls applied to that object invoke the
- derived class behavior.
- Listing 1 shows the class definition for shape, a polymorphic class for
- geometric shapes that I described last month. Listing 2 shows the
- corresponding member function and static data member definitions. Class shape
- has three virtual functions, area, name and put, and two non-virtual
- functions, color and shape (a constructor). Listing 3 and Listing 4 show the
- complete definitions for classes circle and rectangle, respectively, both
- derived from shape. Each derived class defines its own constructor
- (constructors are not inherited), and also overrides each of its inherited
- virtual functions with appropriate definitions.
- Listing 5 contains a function that illustrates the power of polymorphism.
- Function largest locates the shape with the largest area from a collection of
- shapes. Since shape is a polymorphic type, calling sa[i]->area returns the
- area of a shape without the caller ever knowing exactly what kind of shape
- *sa[i] really is.
-
-
- vptrs and vtbls
-
-
- Both the ARM (Ellis and Stroustrup [1990]) and the emerging C++ standard take
- pains to describe the behavior of virtual functions, as well as the rest of
- the C++ language, without suggesting any particular implementation strategy.
- However, the ARM does describe implementation techniques in the commentary at
- the end of chapter 10 on derived classes. I find that relying on a model
- implementation simplifies the description of many details of the behavior of
- virtual functions. The following is one such model.
- Typical C++ implementations add a single pointer to each object of a
- polymorphic class. That pointer is called a vptr (pronounced "VEE-pointer").
- Whenever a constructor for a polymorphic class initializes an object, it sets
- that object's vptr to the address of a table of function pointers called a
- vtbl ("VEE-table"). Each entry in the vtbl is the address of a virtual
- function. All objects of a given class share the same vtbl; that vtbl contains
- exactly one entry for each virtual function in that class.
- For example, Figure 1 shows the layout for an object of class shape (as
- defined in Listing 1) along with its corresponding vtbl. Every shape object
- has the same two fields in the same order: a vptr and a _color data member. The
- vptr points to shape's vtbl, which contains the addresses of the virtual
- functions shape::area, shape::name and shape::put. The non-virtual functions
- shape::color and shape::shape (the constructor) don't use any space in the
- vtbl, nor in the object itself.
- Figure 2 and Figure 3 show the layouts for circle and rectangle objects,
- respectively (as defined in Listing 3 and Listing 4, respectively), along with
- their corresponding vtbls. Notice that the initial portions of both circle
- objects and rectangle objects are shape objects, so a pointer to a circle or a
- rectangle is a pointer to a shape, and the conversion from circle * or
- rectangle * to shape * requires no pointer arithmetic.
- The vtbls for both derived classes have their function pointers in the same
- order as the vtbl for the base class, although the pointer values differ. For
- example, the vtbl entry for the area function is always first in every class
- derived (directly or indirectly) from shape. The vtbl entry for name is always
- second, and the entry for put is always third.
- Whereas a non-virtual function call generates a call instruction that refers
- directly to the function's address as determined during translation (compiling
- and linking), a virtual function call generates additional code to locate the
- function's address in the vtbl.
- The ARM suggests viewing a vtbl as an array of function addresses, so that
- each call locates the called function by subscripting into the vtbl. For
- example, if ps is a pointer to a shape, then
- a = ps->area();
- translates into something like
- a = (*(ps->vtbl[0]))(ps);
- and
- ps->put(cout);
- translates into
- (*(ps->vtbl[2]))(ps, cout);
- An expression of the form ps->vtbl[n] is the nth entry in the vtbl of the
- object *ps, so (*(ps->vtbl[n])) is the nth virtual function itself. Actually,
- in C++ as in C, you need not explicitly dereference a pointer to a function in
- a call expression. Thus you can write
- (*(ps->vtbl[2]))(ps, cout);
- as simply
- (ps->vtbl[2])(ps, cout);
- Each virtual function may have a different signature (sequence of formal
- parameter types) and return type. So strictly speaking, you can't implement
- vtbls as arrays of pointers because an array requires all its elements to have
- the same type. For example, shape::area has type double (*)() and shape::put
- has type void (*)(ostream &).
-
- I prefer to model a vtbl as a struct in which all the members are pointers to
- functions. For instance, you can define the struct type for a vtbl for shapes
- as something like
- struct shape_virtual_table
- {
- double (*area)();
- const char *(*name)();
- ostream &(*put)(ostream &os);
- };
- and define the actual shape vtbl as something like
- shape_virtual_table shape_vtbl =
- {
- &shape::area,
- &shape::name,
- &shape::put
- };
- Similarly, you can define the circle vtbl as something like
- shape_virtual_table circle_vtbl =
- {
- &circle::area,
- &circle::name,
- &circle::put
- };
- (I say "something like" because this code won't actually compile. The code
- only demonstrates the general layout of the vtbls.) Using this translation
- model,
- a = ps->area();
- translates into
- a = (*ps->vtbl->area)(ps);
- or simply
- a = ps->vtbl->area(ps);
- and
- ps->put(cout);
- translates into
- (*ps->vtbl->put)(ps, cout);
- or just
- ps->vtbl->put(ps, cout);
- A virtual function call with n arguments translates into a call (through a
- vtbl entry) with n+1 arguments. The additional argument is always the address
- of the object to which the function applies; in the examples above, its value is
- always ps. The additional argument becomes the value of this inside the called
- function. Virtual functions cannot be static members, so they always have an
- implicit this argument.
- Bear in mind that I'm describing only a typical implementation strategy.
- (Colvin [1993] describes a similar implementation of virtual method tables
- generated with the aid of macros that support object-oriented programming in
- C.) C++ translators may implement virtual functions somewhat differently, but
- the effect must be the same. The vptr need not be at the beginning of each
- polymorphic object. But, for any class D derived from polymorphic class B, D's
- vptr must be at the same offset within D as B's vptr is within B. Similarly,
- the function pointers in the vtbl need not be in the same order as the virtual
- function declarations in the class. But, for any D derived from a polymorphic
- B, the initial portion of D's vtbl must have the same layout as B's vtbl, even
- if D's actual pointer values differ from B's because D has overridden some of
- the virtual functions it inherited.
- In short, a C++ translator must ensure that the base subobject of any derived
- object has the same layout as any other object of the same base type, and the
- base portion of a derived class vtbl must have the same layout as the base
- class vtbl. Hence, a translator need not see the declarations for any derived
- classes when translating a virtual call. Regardless of p's dynamic type, a
- virtual call such as p->f() always translates into code that constructs f's
- actual argument list, follows p's vptr to a vtbl, and transfers control to the
- function whose address is in the vtbl entry corresponding to f.
- All polymorphic objects with a given dynamic type can share the same physical
- vtbl. Some C++ implementations actually manage to eliminate duplicate vtbls.
- Others produce multiple copies of vtbls, either due to limitations of the
- development environment or to provide better system performance. Many
- implementations offer compiler and linker options to let you decide.
- This implementation model shows that virtual functions introduce a small cost
- in both space and time:
- Adding one or more virtual functions to a class that previously had none adds
- a vptr to each object of that class.
- Each polymorphic class adds at least one more vtbl to the program's data
- space.
- Every constructor for a polymorphic class must initialize a vptr.
- Every virtual function call must locate the address of the function by looking
- in a vtbl (requiring typically 2 to 4 additional machine instructions).
- In C++, member functions are non-virtual by default because C++ tries to
- adhere to the principle that "you don't pay for what you don't use." If you're
- willing to pay for a virtual function call, you must say so explicitly.
-
-
- Selective Overriding
-
-
- A derived class may override all, some, or none of the virtual functions in
- its base class. A derived class inherits the function definitions for all
- virtual functions it does not override. Listing 6 and Figure 4 together
- illustrate the effects of selectively overriding only some of the virtual
- functions inherited from a base class.
- Listing 6 shows a simple class hierarchy and Figure 4 shows the corresponding
- vtbls. Class B defines three virtual functions, f, g, and h. Class C derived
- from B overrides only function f, so the entries for g and h in C's vtbl
- continue to point to B's g and h. Class D derived from C overrides only
- function h, so the entries for f and g in D's vtbl are the same as in C's
- vtbl. Since neither C nor D overrides g, all three vtbls have the same value
- for g's entry, namely B's g.
- In Listing 6, pc has static type C *. But by the time program execution
- reaches the declaration
- B &rb = *pc;
- pc has dynamic type D *. Thus all the calls applied to rb use D's vtbl.
-
-
- Pure Virtual Functions
-
-
-
- Sometimes when you design a type hierarchy, you find that you don't want to
- create the hierarchy's base class object. Rather, the base class serves only
- as the specification for a common interface for objects of types derived from
- it.
- For example, in a hierarchy that implements a device-independent file system
- (such as the one I sketched in my last column), the base class file defines
- the properties common to all file types. The derived classes define specific
- file types, like disk_file or tape_file, that are the types of the actual file
- objects in the system. But there is no such thing as a file that is not a
- disk_file, or a tape_file, or some other device-specific file. The base class
- only specifies the common file interface.
- In my shape hierarchy, I never really wanted to create objects whose dynamic
- type is shape. Class shape is only supposed to define the common properties
- for shapes. I implemented class shape so that objects of that type appear to
- be points, not because I needed a shape that's a point, but because I hadn't
- yet presented a way to avoid defining the function bodies for virtual
- functions. In fact, with early C++ dialects, you had no choice but to define
- phony function bodies. Now you can simply declare the functions as pure
- virtual functions.
- You declare a function as a pure virtual function by adding the pure virtual
- specifier = 0 at the end of the function declaration. Listing 7 shows the
- definition for class shape with area, name, and put as pure virtual functions.
- A base class with at least one pure virtual function is called an abstract
- base class. You cannot declare objects of an abstract base class. However, you
- can declare pointers and references to an abstract base class:
- shape s; // error
- void f(shape *ps); // ok
- void g(shape &rs); // ok
- If a derived class does not override every pure virtual function in the base
- class with an "impure" virtual function, then the derived class is also an
- abstract base class. For example, class D in Listing 8 is an abstract base
- class because it fails to override pure virtual function g with a function
- definition.
- In the model implementation, the vtbl entry for a pure virtual function is a
- null pointer.
- References
- Colvin [1993]. Gregory Colvin. "Extending C for Object-Oriented Programming,"
- The C Users Journal, Vol. 11, No. 7, July 1993.
- Ellis and Stroustrup [1990]. Margaret A. Ellis and Bjarne Stroustrup. The
- Annotated C++ Reference Manual. Addison-Wesley, 1990.
- Figure 1 Layout of class shape
- Figure 2 Layout of class circle
- Figure 3 Layout of class rectangle
- Figure 4 Selectively overriding only some virtual functions
-
- Listing 1 A base class for shapes
- class shape
- {
- public:
- enum palette { BLUE, GREEN, RED };
- shape(palette c);
- virtual double area() const;
- virtual const char *name() const;
- virtual ostream &put(ostream &os) const;
- palette color() const;
- private:
- palette _color;
- static const char *color_image[RED - BLUE + 1];
- };
-
- inline ostream &operator<<(ostream &os, const shape &s)
- {
- return s.put(os);
- }
-
- // End of File
-
-
- Listing 2 Member function and static member data definitions for class shape
- shape::shape(palette c) : _color(c) { }
-
- shape::palette shape::color( ) const
- {
- return _color;
- }
-
- double shape::area() const
- {
- return 0;
- }
-
- const char *shape::name() const
- {
- return "point";
- }
-
- ostream &shape::put(ostream &os) const
- {
- return os << color_image[_color] << ' ' << name();
- }
-
- const char *shape::color_image[shape::RED - shape::BLUE + 1] =
- { "blue", "green", "red" };
-
- // End of File
-
-
- Listing 3 Class circle derived from shape
- class circle : public shape
- {
- public:
- circle(palette c, double r);
- double area() const;
- const char *name() const;
- ostream &put(ostream &os) const;
- private:
- double radius;
- };
-
- circle::circle(palette c, double r) : shape(c), radius(r) { }
-
- double circle::area() const
- {
- const double pi = 3.1415926;
- return pi * radius * radius;
- }
-
- const char *circle::name() const
- {
- return "circle";
- }
-
- ostream &circle::put(ostream &os) const
- {
- return shape::put(os) << " with radius = " << radius;
- }
-
- // End of File
-
-
- Listing 4 Class rectangle derived from shape
- class rectangle : public shape
- {
- public:
- rectangle(palette c, double h, double w);
- double area() const;
- const char *name() const;
- ostream &put(ostream &os) const;
- private:
- double height, width;
- };
-
- rectangle::rectangle(palette c, double h, double w)
- : shape(c), height(h), width(w) { }
-
- double rectangle::area() const
- {
- return height * width;
- }
-
- const char *rectangle::name() const
- {
- return "rectangle";
- }
-
- ostream &rectangle::put(ostream &os) const
- {
- return shape::put(os) << " with height = " << height
- << " and width = " << width;
- }
-
- // End of File
-
-
- Listing 5 A function that returns the shape with the largest area in a
- collection of shapes
- const shape *largest(const shape *sa[], size_t n)
- {
- const shape *s = 0;
- double m = 0;
- double a;
- for (size_t i = 0; i < n; ++i)
- if ((a = sa[i]->area()) > m)
- {
- m = a;
- s = sa[i];
- }
- return s;
- }
-
- // End of File
-
-
- Listing 6 Selective virtual overriding
- #include <iostream>
- using std::cout;
-
- class B
- {
- public:
- virtual void f();
- virtual void g();
- virtual void h();
- };
-
- class C : public B
- {
- public:
- void f(); // virtual
- };
-
- class D : public C
- {
- public:
- void h(); // virtual
- };
-
- void B::f() { cout << "B::f()\n"; }
-
- void B::g() { cout << "B::g()\n"; }
-
- void B::h() { cout << "B::h()\n"; }
-
- void C::f() { cout << "C::f()\n"; }
-
- void D::h() { cout << "D::h()\n"; }
-
- int main()
- {
- C c;
- D d;
-
- B *pb = &c; // ok, &c is a C * which is a B *
- pb->f(); // calls C::f()
- pb->g(); // calls B::g()
- pb->h(); // calls B::h()
-
- C *pc = &d; // ok, &d is a D * which is a C *
- pc->f(); // calls C::f()
- pc->g(); // calls B::g()
- pc->h(); // calls D::h()
-
- B &rb = *pc; // ok, *pc is a C which is a B
- rb.f(); // calls C::f()
- rb.g(); // calls B::g()
- rb.h(); // calls D::h()
-
- return 0;
- }
-
- // End of File
-
-
- Listing 7 An abstract base class for shapes
- class shape
- {
- public:
- enum palette { BLUE, GREEN, RED };
- shape(palette c);
- virtual double area() const = 0;
- virtual const char *name() const = 0;
- virtual ostream &put(ostream &os) const = 0;
- palette color() const;
- private:
- palette _color;
- static const char *color_image[RED - BLUE + 1];
- };
-
- inline ostream &operator<<(ostream &os, const shape &s)
- {
- return s.put(os);
- }
-
-
- // End of File
-
-
- Listing 8 Derived class that doesn't override all pure virtual functions is
- still abstract
- class B
- {
- public:
- virtual void f();
- virtual void g() = 0;
- };
-
- void B::f() { ... }
-
- class D : public B
- {
- public:
- void f(); // virtual
- // g is still pure virtual
- };
-
- void D::f() { ... }
-
- int main()
- {
- B b; // error, B is abstract class
- D d; // error, D is abstract class
- ...
- }
-
- // End of File
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Questions and Answers
-
-
- Lint for C++
-
-
-
-
- Kenneth Pugh
-
-
- Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++
- language courses for corporations. He is the author of C Language for
- Programmers and All On C, and was a member of the ANSI C committee. He also
- does custom C programming for communications, graphics, image databases, and
- hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707.
- You may fax questions for Ken to (919) 489-5239. Ken also receives email at
- kpugh@allen.com (Internet) and on Compuserve 70125,1142.
-
-
- A few months ago, a reader asked a question regarding the availability of Lint
- for C++. I replied that there was no product available at that time. In
- response to the column, Gimpel Software, makers of PC-lint, sent me a beta
- copy of their new PC-lint for C/C++. I've used their C product on a number of
- programs with great success in finding obscure bugs -- especially on programs
- I inherited from other people. Their new version provides the same types of
- error analysis for C++ programs as the older versions did for C programs.
- The new version of PC-lint adds a number of C++ specific error messages.
- PC-lint analyzes the relationship of class data and class member functions and
- provides warnings about some common errors in class design. No existing
- compiler that I know about flags these errors. For example, PC-lint can report
- when a destructor for a base class is not virtual or does not exist. In this
- situation, deleting derived class objects through a collection of base class
- pointers results in a call to the wrong destructor.
- Another common mistake that PC-lint catches is a class whose constructor calls
- new, but which declares no copy constructor or assignment operator. Normally a
- class that allocates memory requires the
- declaration of an explicit copy constructor and an explicit assignment
- operator. These member functions usually need to allocate memory, unless the
- class implementation uses some form of reference counting.
- The new version of PC-lint also adds a few more C warnings, such as the
- warning issued for a missing semicolon at the end of a structure definition.
- Heeding this warning would eliminate the relatively obscure compiler messages
- that result from code such as the following, in which the missing semicolon
- makes struct my_struct the return type of function:
- struct my_struct
- {
- int member;
- }
- function(int x)
- {
- ...
- }
- An important purpose of the original lint program was to catch parameter type
- mismatches. C++ has eliminated the need for this check by making function
- prototypes mandatory, but it has added the potential for a number of problems
- resulting from the misuse of classes. According to Gimpel Software, the list
- of possible C++ warnings that are analyzed will expand in the next version.
-
-
- A Discussion of Object Orientation
-
-
- I just spent a week teaching an advanced C course. Most of the prepared
- material covered syntax and construction of complex objects, but all of the
- students were more interested in the design of objects.
- Designing good classes is one of the most rewarding aspects of C++. Using bad
- classes can be one of the most frustrating. Therefore, it's useful to develop
- criteria for determining how well a class is designed.
- The subjective measurements of a class's quality include coupling,
- cohesiveness, sufficiency, primitiveness, and completeness. Coupling is the
- degree of interdependence between objects. Classes that are friends of other
- classes are highly coupled; classes that just depend on the existence of other
- objects are loosely coupled.
- In a highly cohesive class, the data and function members work together as a
- single abstraction.
- The last three criteria are sufficiency, primitiveness, and completeness. A class
- should have a sufficient interface, which is one that provides all required
- operations. The class needs to contain all primitive operations, which are
- those requiring access to the hidden implementation. Finally, a class is
- complete if it provides all possible operations that a user might want to
- perform on it or with it. A class that is primitive is easier to port to
- another system or to modify, as it has fewer member functions than a complete
- class. On the other hand, a complete class can be easier for users to employ
- in their code.
- An example is in order here. Suppose you created a file class called File. The
- most primitive interface for this class might look like:
- class File
- {
- public:
- File(const char *name, int mode);
- ~File();
- int read(char *buffer, int length);
- int write(const char *buffer, int length);
- };
- With this interface, the constructor might throw an exception if it could not
- open the file. Alternatively, read and write might return the appropriate
- error indication if the file could not be opened. The destructor will close
- the file if the constructor was able to open it. For simplicity's sake, I used
- an int for the constructor's mode parameter. You could use some type of
- enumerated parameter (e.g., enum FILE_ACCESS_TYPE {READ_ONLY, WRITE_ONLY, ....})
- instead for a clearer functional interface.
- The following primitive interface is similar:
- class File
- {
- public:
- File();
- ~File();
- int open(const char *name, int mode);
- int read(char *buffer, int length);
- int write(const char *buffer, int length);
- int close();
-
- };
- Notice, however, that the meaning of class File has changed. File no longer
- represents a particular file. Now, an object of class File can be reused in
- the same scope with a different file. The interface is still primitive. In
- particular, there is no seek function to go to a particular byte in the file.
- A user who needs to perform a seek can use read and throw away the
- intervening data. To get to a previous position in the file, close the file,
- reopen it, and perform another read. Since seek can be implemented as a
- combination of primitive operations, it is not a primitive operation.
- Since many operating systems provide a seek operation, it would be more
- efficient to include that as part of the interface. On systems that did not
- have seek, the implementation could fake it with discardable reads. So a
- slightly less primitive, but more efficient interface would look like:
- class File
- {
- public:
- File();
- ...
- long seek(long position, int direction);
- };
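- As a side note, you could replace the long parameter with a typedef, as the
- standard C fgetpos and fsetpos functions do with fpos_t (fseek itself takes a
- long offset). You could implement the direction parameter as an enumeration
- whose values correspond to fseek's SEEK_SET, SEEK_CUR, and SEEK_END.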
-
-
- Sufficiency
-
-
- A typical user might want to find out the current file position. The primitive
- interface does not provide an operation for determining file position. The
- user could determine it by keeping track of the reads, writes, and seeks.
- Since this facility is commonly needed, a sufficient interface could be
- written as follows:
- class File
- {
- public:
- File();
- ...
- long seek(long position, int direction);
- long tell();
- };
-
-
- Toward Completeness
-
-
- Users may occasionally reposition files to the first byte, using seek(0, 0).
- You might code a convenient macro for this operation as follows:
- #define rewind() seek(0,0)
- If rewind were a common operation, you might want to add the function to the
- interface. The function is not necessary, but may be generally useful. Adding
- a rewind function would make the interface more complete. The question is when
- to stop adding functions. Should you also include a
- search_for_a_byte_value(int byte_value)
- function? Or should you let the user write a private version? The more member
- functions a class contains, the more overwhelming it can be. The fewer
- functions a class contains, the more code the user may have to write.
-
-
- Objective Criteria for Class Design
-
-
- A few implicit objective criteria for adding functions exist. First, an object
- should perform the operations requested of it by the user. If the object
- cannot perform an operation, it should notify the user through an error
- return, an exception, or by some other means. Second, an object should do no
- harm. Using an object should not cause memory to be overwritten, or cause
- changes to other objects, unless those actions were truly intended. (Science
- fiction buffs may notice that these criteria are similar to Isaac Asimov's
- rules for robots.)
- The following example demonstrates the need for these rules. I spent a few
- days trying to use an object-oriented user interface generator for Microsoft
- Windows, DOS, and other systems. The documentation for the system assumes that
- you are new to C++, as it covers the language in some detail. On the other
- hand, the system uses pointers and pointers to tables of pointers to objects
- as part of the programmer interface.
- One of the objects the system supports represents a vertical list. One of the
- member functions for the vertical list object loads the object from persistent
- storage. The member function must open a corresponding file to get the data
- for the object. Unfortunately, the function failed to close that file. Not
- even the destructor for the object closed the file.
- In my application, this bug did not manifest itself until the
- fourth time through a particular series of operations. The program ran fine
- until it suddenly was unable to open a data file. I spent several hours trying
- to determine what hidden error on my part had caused such a problem. The
- answer: my application had exceeded the open file limit; the object's storage
- file had been repeatedly opened without ever being closed.
- The failure of this object to obey one of the two "objective" rules (to not
- cause harm) definitely caused a lot of human grief.
-
-
- Questions and Answers
-
-
-
-
- Pointer Types
-
-
- Q
- The C-program in Listing 1 generates an unexpected warning when compiled using
- version 2.3.3 of gcc (no options specified). The diagnostic is:
- test.c: In function 'main':
- test.c:12: warning: passing arg 1 of 'sub2' from incompatible pointer type
- The diagnostic is tied to my prototype on line 7 where I declare a single
- argument to be a pointer to an array of const int dimensioned DIM2. This
- should prevent any obvious assignments to the parameter a within sub2, which
- was my intention.
- I've run this by several people (including gcc tech support) and some agree
- with the compiler and others do not. I would appreciate your opinion.
- Michael G. Soyka
- Warren, RI
- A
- This is a great followup to a question from last month. In that issue, I
- discussed the datatypes of variables such as:
- int int_array_2d[1][10];
- The type of int_array_2d usually reduces to (int (*)[10]) and int_array
- usually reduces to (int *), so:
- pointer_to_int_array = int_array_2d;
- pointer_to_int = int_array;
- are compatible assignments. I use the phrase "usually reduces to" to emphasize
- that the type of an array is not the same as a pointer type. Let's add a const
- term to one set of these variables, as in the following:
- const int int_array_of_const[1];
- const int *pointer_to_int_const;
- The type of int_array_of_const usually reduces to (const int *), so the
- assignment of
- pointer_to_int_const = int_array_of_const;
- is proper. When using two-dimensional arrays and pointers, typing becomes
- somewhat more complex. For the declarations
- const int int_array_2d_of_const[1][10];
- const int (*pointer_to_array_of_const_int)[10];
- the type of int_array_2d_of_const usually reduces to (const int (*)[10]), so
- the assignment is compatible. Now, the reduced type of int_array_2d from the
- first example was (int (*)[10]). That reduced type is not the same as (const
- int (*)[10]), which is the type of pointer_to_array_of_const_int. So
- pointer_to_array_of_const_int = int_array_2d;
- yields a compiler warning, since these two expressions represent pointers to
- incompatible types. One expression represents a pointer to a const array, the
- other to a non-const array.
- A pointer to an int can be assigned to a pointer to a const int. This
- assignment simply adds "constness" to the object originally pointed at by the
- pointer to int. The C Standard does not include an example of this operation,
- but its effect seems to follow from the definition of const. The reverse
- operation is not allowed without a cast, as that would be taking away
- "constness."
- It seems appropriate to me to allow a programmer to add "constness" to an
- object at any level. I discussed this concept in a conversation with Chris
- Skelly, who has written for The C Users Journal and has written a book (soon
- to be published) on pointers. We agreed that it seems proper to be able to add
- "constness" to a data type with an assignment.
- However, according to P.J. Plauger, the C Standard does not require that a
- pointer to array of const T be assignment-compatible with a pointer to array
- of T. Some vendors may provide this latitude while others may not.
- I tried your program with the Borland C++ 3.1 and Microsoft C++ 7.0 compilers.
- Borland accepts it without complaint. Microsoft generates a message similar to
- gcc. I altered your original example as shown in Listing 2 to better
- illustrate the problem.
- This program yielded an error reading:
- error C2440: 'initializing' : cannot convert
- from 'int [1][1]' to 'const int (_near *)[1]'
- The preceding diagnostic is valid for the line
- const int (*pointer_2d_to_const_2)[DIM2] = array_2d;
- For reasons I explained before, the diagnostic may not be clearly worded,
- although it is valid. Note that the compiler had no problem with
- const int (*pointer_2d_to_const_1)[DIM2] = array_2d_of_const;
- which implies that the conversion from const int [1][1] to const int (_near
- *)[1] is acceptable.
- Interestingly enough, the diagnostic does not appear when the program is
- compiled as a C program with Microsoft. This discrepancy is probably due to
- the more extensive type-checking performed by C++ compilers.
- One way to solve your problem is shown in Listing 3. I use a typedef to
- eliminate one dimension from the other declarations. The use of typedef makes
- the array and pointers equivalent to the following declarations:
- int int_array[1];
- const int *pointer_to_const_int;
- Along these same lines, I ran into an "interesting" problem when I needed to
- convert some old C code from the Microsoft large model to C++ under the medium
- model. Nobody would do this if they wanted to maintain their sanity. However,
- the requirement existed, so I plunged ahead. I did the conversion in two steps
- -- first to large model C++, then to medium model C++.
- The first step went relatively smoothly. I had already determined the
- interface to the C++ objects and created test implementations of the objects.
- I linked the overall test program using the test implementation and the
- results came out correct. Then I began the second step. My addition of __far
- to every pointer seemed to work fine. The compiler warned me about the few
- places that I missed, except... And it was a big exception:
- One of the original function prototypes was defined as follows:
- function (char *array[])
- array was an array of pointers to character strings. Since I could use either
- pointer or array syntax for an argument, I would expect to be able to convert
- the prototype to:
- function (char __far * __far array[]);
- I used the first __far because the array was going to contain elements which
- were far pointers to char. The second __far was for the array itself. This
- declaration should be logically equivalent to
- function (char __far * __far * array);
- In fact, Microsoft's compiler accepts the former declaration form without a
- hint of a warning -- but the code compiles incorrectly. Garbage addresses are
- passed. Memory is overwritten. Windows crashes. Beeps go off. For the first time
- in a long period, I had to invoke a debugger. With the switch to the second
- form of the declaration all the problems went away.
- This address problem occurs only with multiply-dimensioned arrays; it's
- similar to your const problem. There appears to be a fundamental disagreement
- in how to bind __far to arrays versus how to bind it to pointers. Since __far
- is not in the C Standard, compiler vendors can do it any way they wish. I just
- wish they would give me a warning.
-
- Listing 1 Yields unexpected warning
- 1: #define DIM1 10
- 2: #define DIM2 10
- 3:
- 4: int a[DIM1][DIM2];
- 5:
- 6: void sub1 ( int (*a)[DIM2]);
- 7: void sub2 (const int (*a)[DIM2]);
- 8:
- 9: void main ()
-
- 10: {
- 11: sub1 (a);
- 12: sub2 (a);
- 13: }
-
- /* End of File */
-
-
- Listing 2 Potentially incompatible assignments
- #define DIM1 1
-
- #define DIM2 1
-
- int array_2d[DIM1][DIM2];
- const int array_2d_of_const[DIM1][DIM2] = {{1}};
-
- void function1 ()
- {
- int (*pointer_2d)[DIM2] = array_2d;
- const int (*pointer_2d_to_const_1)[DIM2] = array_2d_of_const;
- const int (*pointer_2d_to_const_2)[DIM2] = array_2d;
- }
-
- /* End of File */
-
-
- Listing 3 Using a typedef to eliminate incompatibilities
- #define DIM1 1
- #define DIM2 1
-
- typedef int INT_ARRAY_DIM2[DIM2];
-
- INT_ARRAY_DIM2 array_2d[DIM1];
- const INT_ARRAY_DIM2 array_2d_of_const[DIM1] = {{1}};
-
- void function1 ()
- {
- INT_ARRAY_DIM2 *pointer_2d = array_2d;
- const INT_ARRAY_DIM2 *pointer_2d_to_const_1 = array_2d_of_const;
- const INT_ARRAY_DIM2 *pointer_2d_to_const_2 = array_2d;
- }
-
- /* End of File */
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Code Capsules
-
-
- Bit Handling in C++, Part 2
-
-
-
-
- Chuck Allison
-
-
- Chuck Allison is a regular columnist with CUJ and a software architect for the
- Family History Department of the Church of Jesus Christ of Latter Day Saints
- Church Headquarters in Salt Lake City. He has a B.S. and M.S. in mathematics,
- has been programming since 1975, and has been teaching and developing in C
- since 1984. His current interest is object-oriented technology and education.
- He is a member of X3J16, the ANSI C++ Standards Committee. Chuck can be
- reached on the Internet at allison@decus.org, or at (801)240-4510.
-
-
-
-
- The bits Class Template
-
-
- The standard C++ library has two classes for bit manipulation: bitstring and
- bits. Last month I discussed the bitstring class, which defines objects that
- behave like an array of bits that expands or contracts according to your
- needs. This month's installment explains the bits class, an abstraction which
- extends C's bitwise semantics by allowing easy access to bits, by allowing an
- arbitrary (but fixed) number of bits in a bitset, and by adding important new
- functionality. Following last month's format, I present here an excerpt from
- the official proposal accepted by the standards committee, a working
- implementation, and examples of how to use the class.
- Class bits accommodates a fixed-length collection of bits. You can think of a
- bits object as an arbitrarily large unsigned integer. It is actually a class
- template, with the number of bits in the collection as the template parameter.
- (See the sidebar "Templates in a Nutshell.") It is highly suitable for
- interfacing with the host operating system, and is designed for efficiency.
- (It can be stack-based.) Here's a sample program run on a machine with 16-bit
- integers:
- // tbits.cpp:
- // Set some bits
- // and display the result -
- #include <iostream.h>
- #include <stddef.h>
- #include <limits.h>
- #include "bits.h"
-
- int main()
- {
-
- const size_t SIZE
- = CHAR_BIT * sizeof(int);
- bits<SIZE> flags;
- enum open_mode {in, out, ate,
- app, trunc,
- binary};
-
- flags.set(in);
- flags.set(binary);
- cout << "flags: "
- << flags <<" (0x"
- << hex
- << flags.to_ushort()
- << ")\n";
- cout << "binary? "
- << (flags.test(binary)
- ? "yes" : "no")
- << endl;
- return 0;
- }
-
- Output
- flags: 0000000000100001 (0x21)
- binary? yes
-
-
-
- Member Function Descriptions
-
-
- This section is a modified excerpt from the official proposal which describes
- the semantics of each member function. For a quick look at the class
- interface, see Listing 6. Since the library group of the joint C++ standards
- committee is still deciding how to integrate exceptions into the standard
- library, I just mention them briefly here. The names and uses of exceptions
- are subject to change. I use asserts in place of exceptions in the
- implementation (see Listing 7).
-
-
- 1.0 Constructors
-
-
-
-
- Synopsis
-
-
- bits()
- bits(unsigned long n)
- bits(const string& s)
- bits(const bits<N>& b)
-
-
- 1.1 Constructor bits()
-
-
-
-
- Description
-
-
- Initializes all bits to zero.
-
-
- 1.2 Constructor bits(unsigned long n)
-
-
-
-
- Description
-
-
- Initializes the object with the bits of n. If N > sizeof(unsigned long) *
- CHAR_BIT, sets the extra bits to zero.
-
-
- 1.3 Constructor bits(const string& s)
-
-
-
-
- Description
-
-
- Each character of s is interpreted as a bit (a string of 1s and 0s is
- expected). In typical integral fashion, treats the last (right-most) character
- of s as bit 0.
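- A minimal sketch of that rule on a single unsigned long, assuming the string
- fits in one word. parse_bits is a hypothetical helper, not part of the
- proposal, and it uses assert in place of invalid_argument, as Listing 7 does:

```cpp
#include <cassert>
#include <climits>
#include <string>

// Hypothetical helper: interpret a string of '0'/'1' characters as an
// unsigned integer, treating the last (right-most) character as bit 0.
unsigned long parse_bits(const std::string& s)
{
    // Assumption: the string is no wider than an unsigned long.
    assert(s.size() <= sizeof(unsigned long) * CHAR_BIT);
    unsigned long value = 0;
    for (char c : s)
    {
        assert(c == '0' || c == '1');          // stands in for invalid_argument
        value = (value << 1) | (c == '1');     // earlier characters move left
    }
    return value;
}
```

- For example, parse_bits("10110") yields 22, the same value Listing 8 gets
- from y(string("10110")).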
-
-
- Exceptions
-
-
- Throws invalid_argument if a character other than 1 or 0 is encountered.
-
-
-
- 1.4 Constructor bits(const bits<N>& b)
-
-
-
-
- Description
-
-
- Standard copy constructor.
-
-
- 2.0 Destructor
-
-
- No destructor required.
-
-
- 3.0 Other Member Functions
-
-
-
-
- 3.1 Function unsigned short to_ushort() const
-
-
-
-
- Description
-
-
- Converts the n least significant bits of *this (where n == sizeof(unsigned
- short) * CHAR_BIT) to an unsigned short. This is useful when the bits
- represent flags in a word passed to the operating system.
-
-
- Exceptions
-
-
- Throws overflow if N > n and any of the bits above position n-1 are set.
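- The overflow test amounts to asking whether any bit at position n or above
- is set. A sketch on a plain unsigned long; any_from and to_ushort_checked
- are hypothetical names, and assert again replaces the exception:

```cpp
#include <cassert>
#include <climits>

// True if any bit at position 'width' or higher is set in 'value'.
bool any_from(unsigned long value, unsigned width)
{
    if (width >= sizeof(unsigned long) * CHAR_BIT)
        return false;                 // nothing lies above the top bit
    return (value >> width) != 0;     // shift count is always in range here
}

// Hypothetical checked conversion mirroring the to_ushort() rule.
unsigned short to_ushort_checked(unsigned long value)
{
    const unsigned n = sizeof(unsigned short) * CHAR_BIT;
    assert(!any_from(value, n));      // stands in for throwing overflow
    return static_cast<unsigned short>(value);
}
```

- Unlike a plain cast, the checked version refuses to discard set bits
- silently.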
-
-
- 3.2 Function unsigned long to_ulong() const
-
-
-
-
- Description
-
-
- Converts the n least significant bits of *this (where n == sizeof(unsigned
- long) * CHAR_BIT) to an unsigned long.
-
-
- Exceptions
-
-
- Throws overflow if N > n and any of the bits above position n-1 are set.
-
-
- 3.3 Function string to_string() const
-
-
-
-
-
- Description
-
-
- Creates a string of 1s and 0s representing the contents of *this. As with
- unsigned integers, treats the last character as bit 0.
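- The ordering rule for the string form, sketched for the low n bits of an
- unsigned long (to_bit_string is a hypothetical name):

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: render the low n bits of value as '0'/'1'
// characters, most significant first, so the last character is bit 0.
// Positions beyond the width of unsigned long print as '0'.
std::string to_bit_string(unsigned long value, std::size_t n)
{
    const std::size_t width = sizeof(unsigned long) * CHAR_BIT;
    std::string s(n, '0');
    for (std::size_t i = 0; i < n && i < width; ++i)
        if ((value >> i) & 1ul)
            s[n - 1 - i] = '1';
    return s;
}
```

- With value 0x21 and n == 16 this produces 0000000000100001, the pattern in
- the tbits.cpp sample output.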
-
-
- Returns
-
-
- The temporary string of 1s and 0s.
-
-
- 3.4 Function bits<N>& operator=(const bits<N>& b)
-
-
-
-
- Description
-
-
- Standard assignment operator.
-
-
- Returns
-
-
- A reference to *this.
-
-
- 3.5 Function int operator==(const bits<N>& b) const
-
-
-
-
- Description
-
-
- Compares *this to b for equality. Two bitsets are equal if and only if their
- bit patterns are identical.
-
-
- Returns
-
-
- Non-zero if the bitsets are equal, zero otherwise.
-
-
- 3.6 Function int operator!=(const bits<N>& b) const
-
-
-
-
- Description
-
-
- Compares *this to b for inequality. Equivalent to !operator==().
-
-
-
- Returns
-
-
- Zero if the bitsets are equal, non-zero otherwise.
-
-
- 3.7 Functions set
-
-
-
-
- Synopsis
-
-
- bits<N>& set(size_t n, int val = 1)
- bits<N>& set()
-
-
- Description
-
-
- These functions set one or more bits. The function set(size_t, int) can reset
- a bit, depending on val.
-
-
- 3.7.1 Function bits<N>& set(size_t n, int val)
-
-
-
-
- Description
-
-
- Sets the nth bit if val is non-zero, otherwise resets the bit.
-
-
- Returns
-
-
- A reference to *this.
-
-
- Exceptions
-
-
- Throws out_of_range if n is not in [0,N-1].
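- On a single word, set(n, val) reduces to the two masks that Listing 7 calls
- mask1 and mask0. You OR with the first to set the bit and AND with the
- second to reset it. A sketch for one unsigned int, where set_bit is a
- hypothetical name and the range check is an assert:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

const std::size_t WORD_BITS = sizeof(unsigned) * CHAR_BIT;

// Set bit n of word if val is non-zero, otherwise reset it.
unsigned& set_bit(unsigned& word, std::size_t n, int val = 1)
{
    assert(n < WORD_BITS);              // stands in for out_of_range
    const unsigned mask1 = 1u << n;     // only bit n set
    const unsigned mask0 = ~mask1;      // every bit except n set
    if (val)
        word |= mask1;
    else
        word &= mask0;
    return word;
}
```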
-
-
- 3.7.2 Function bits<N>& set()
-
-
-
-
- Description
-
-
- Sets all bits.
-
-
-
- Returns
-
-
- A reference to *this.
-
-
- 3.8 Functions reset
-
-
-
-
- Synopsis
-
-
- bits<N>& reset(size_t n)
- bits<N>& reset()
-
-
- Description
-
-
- These functions reset one or more bits.
-
-
- 3.8.1 Function bits<N>& reset(size_t n)
-
-
-
-
- Description
-
-
- Resets the nth bit.
-
-
- Returns
-
-
- A reference to *this.
-
-
- Exceptions
-
-
- Throws out_of_range if n is not in [0, N-1].
-
-
- 3.8.2 Function bits<N>& reset()
-
-
-
-
- Description
-
-
- Resets all bits.
-
-
- Returns
-
-
-
- A reference to *this.
-
-
- 3.9 Functions toggle
-
-
-
-
- Synopsis
-
-
- bits<N>& toggle(size_t n)
- bits<N>& toggle()
-
-
- Description
-
-
- These functions toggle one or more bits.
-
-
- 3.9.1 Function bits<N>& toggle(size_t n)
-
-
-
-
- Description
-
-
- Toggles the nth bit.
-
-
- Returns
-
-
- A reference to *this.
-
-
- Exceptions
-
-
- Throws out_of_range if n is not in [0,N-1].
-
-
- 3.9.2 Function bits<N>& toggle()
-
-
-
-
- Description
-
-
- Toggles all bits.
-
-
- Returns
-
-
-
- A reference to *this.
-
-
- 3.10 Function bits operator~() const
-
-
-
-
- Description
-
-
- Toggles all the bits of a copy of *this.
-
-
- Returns
-
-
- A toggled copy of *this.
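- The storage is a whole number of blocks. When N is not a multiple of the
- block size, complementing every block also sets the unused high-order bits,
- which is why Listing 7 calls cleanup() after toggle(). A sketch of the same
- masking for an N-bit value held in one unsigned int (complement_n is a
- hypothetical name):

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Complement the low n bits of v and clear everything above them.
unsigned complement_n(unsigned v, std::size_t n)
{
    const std::size_t width = sizeof(unsigned) * CHAR_BIT;
    assert(n <= width);
    // Mask with the low n bits set. Shifting by the full width is
    // undefined, so n == width is handled separately.
    const unsigned mask = (n == width) ? ~0u : ((1u << n) - 1u);
    return ~v & mask;
}
```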
-
-
- 3.11 Function int test(size_t n) const
-
-
-
-
- Description
-
-
- Tests if bit n is set.
-
-
- Returns
-
-
- Non-zero if the bit is set, zero otherwise.
-
-
- Exceptions
-
-
- Throws out_of_range if n is not in [0,N-1].
-
-
- 3.12 Function int any() const
-
-
-
-
- Description
-
-
- Tests if any bits at all are set.
-
-
- Returns
-
-
- 0 if all bits are 0, non-zero otherwise.
-
-
-
- 3.13 Function int none() const
-
-
-
-
- Description
-
-
- Tests if no bits at all are set.
-
-
- Returns
-
-
- Non-zero if all bits are 0, 0 otherwise.
-
-
- 3.14 Function bits<N>& operator&=(const bits<N>& b)
-
-
-
-
- Description
-
-
- Performs a destructive bitwise-AND of b into *this.
-
-
- Returns
-
-
- A reference to *this.
-
-
- 3.15 Function bits<N>& operator|=(const bits<N>& b)
-
-
-
-
- Description
-
-
- Performs a destructive bitwise-OR of b into *this.
-
-
- Returns
-
-
- A reference to *this.
-
-
- 3.16 Function bits<N>& operator^=(const bits<N>& b)
-
-
-
-
- Description
-
-
-
- Performs a destructive bitwise exclusive-OR of b into *this.
-
-
- Returns
-
-
- A reference to *this.
-
-
- 3.17 Function bits<N>& operator>>=(size_t n)
-
-
-
-
- Description
-
-
- Shifts *this right destructively (i.e., in place) by n bit positions. Resets
- all bits if n >= N. To shift "right" by n means that bit 0 receives the value
- of bit n, bit 1 receives bit (n+1), etc.
-
-
- Returns
-
-
- A reference to *this.
-
-
- 3.18 Function bits<N>& operator<<=(size_t n)
-
-
-
-
- Description
-
-
- Shifts *this left destructively by n bit positions. Resets all bits if n >= N.
- To shift "left" by n means that bit n receives the value of bit 0, bit (n+1)
- receives bit 1, etc.
-
-
- Returns
-
-
- A reference to *this.
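- The explicit rule for large shift counts matters because shifting a built-in
- unsigned type by its full width or more is undefined in C and C++. The
- sketch below shows both shifts for an N-bit value held in a std::uint64_t.
- It assumes N <= 64, and shl_n and shr_n are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mask with the low N bits set (N <= 64).
std::uint64_t low_mask(std::size_t N)
{
    return N >= 64 ? ~std::uint64_t(0) : (std::uint64_t(1) << N) - 1;
}

// Right shift: bit i receives bit i+n; vacated high bits become 0.
std::uint64_t shr_n(std::uint64_t v, std::size_t n, std::size_t N)
{
    assert(N <= 64);
    return n >= N ? 0 : (v & low_mask(N)) >> n;
}

// Left shift: bit i+n receives bit i; bits pushed past N-1 are lost.
std::uint64_t shl_n(std::uint64_t v, std::size_t n, std::size_t N)
{
    assert(N <= 64);
    return n >= N ? 0 : (v << n) & low_mask(N);
}
```

- For example, shl_n(0x12345678, 6, 32) yields 0x8D159E00, and shifting that
- result right by 6 yields 0x02345678. These match the x <<= 6 and x >>= 6
- lines in Listing 8's sample run.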
-
-
- 3.19 Function bits<N> operator>>(size_t n) const
-
-
-
-
- Description
-
-
- A non-destructive version of operator>>=().
-
-
- Returns
-
-
-
- The results of the shift in a temporary bits object.
-
-
- 3.20 Function bits<N> operator<<(size_t n) const
-
-
-
-
- Description
-
-
- A non-destructive version of operator<<=().
-
-
- Returns
-
-
- The results of the shift in a temporary bits object.
-
-
- 3.21 Function size_t count( ) const
-
-
-
-
- Description
-
-
- Counts the number of bits that are set.
-
-
- Returns
-
-
- The number of 1 bits in *this.
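- Listing 7's count() tests each of the N positions in turn. The sketch below
- uses a common alternative, swapped in here for illustration rather than
- taken from the proposal. For a single block, it clears the lowest set bit on
- each pass, so the loop runs once per 1 bit. Summing it over the blocks would
- give the same total:

```cpp
#include <cassert>
#include <cstddef>

// Count the 1 bits in v by repeatedly clearing the lowest set bit.
std::size_t count_bits(unsigned long v)
{
    std::size_t sum = 0;
    while (v)
    {
        v &= v - 1;   // clears the lowest set bit
        ++sum;
    }
    return sum;
}
```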
-
-
- 3.22 Function size_t length( ) const
-
-
-
-
- Description
-
-
- Returns the fixed-size length of the object.
-
-
- Returns
-
-
- N.
-
-
- 4.0 Global Functions
-
-
-
-
-
- 4.1 Function ostream& operator<<(ostream& os, const bits<N>& b)
-
-
-
-
- Description
-
-
- Sends a sequence of N 1s and 0s corresponding to the bit pattern of b to
- the stream os.
-
-
- Returns
-
-
- A reference to the stream os.
-
-
- 4.2 Function istream& operator>>(istream& is, bits<N>& b)
-
-
-
-
- Description
-
-
- Reads a sequence of up to N 1s and 0s from the stream is, after skipping
- whitespace. The first non-bit character thereafter terminates the read and
- remains in the stream. The corresponding bit pattern is reproduced in b.
- Treats the last 1 or 0 read from the stream as bit 0 of b.
-
-
- Returns
-
-
- A reference to the stream is.
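- Listing 7's private read() applies this reading rule using a character
- buffer. The sketch below applies the same rule to standard C++ streams and
- returns the collected characters, which could then be handed to the string
- constructor. read_bit_chars is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: skip leading whitespace, then collect up to n
// '0'/'1' characters. The first non-bit character stays in the stream,
// and failbit is set if no bit characters were found at all.
std::string read_bit_chars(std::istream& is, std::size_t n)
{
    std::string bits;
    is >> std::ws;
    while (bits.size() < n)
    {
        int c = is.peek();          // look without consuming
        if (c != '0' && c != '1')
            break;                  // non-bit character or end of input
        bits += static_cast<char>(is.get());
    }
    if (bits.empty())
        is.setstate(std::ios::failbit);
    return bits;
}
```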
-
-
- 4.3 Function bits<N> operator&(const bits<N>& b1, const bits<N>& b2)
-
-
-
-
- Description
-
-
- Performs a bitwise-AND of b1 and b2.
-
-
- Returns
-
-
- The results of the bitwise-AND in a temporary bits object.
-
-
- 4.4 Function bits<N> operator|(const bits<N>& b1, const bits<N>& b2)
-
-
-
-
- Description
-
-
-
- Performs a bitwise-OR of b1 and b2.
-
-
- Returns
-
-
- The results of the bitwise-OR in a temporary bits object.
-
-
- 4.5 Function bits<N> operator^(const bits<N>& b1, const bits<N>& b2)
-
-
-
-
- Description
-
-
- Performs a bitwise exclusive-OR of b1 and b2.
-
-
- Returns
-
-
- The results of the exclusive-OR in a temporary bits object.
-
-
- Design Notes
-
-
- Having an expression instead of a type as a template parameter has the
- following effects:
- A bits object can be stack-based, since its size is known at compile time.
- This means less run-time overhead and therefore better performance than
- bitstring objects.
- Objects of different sizes (i.e., with a different number of bits) are
- different types, and therefore can't be combined in operations.
- Global functions taking bits arguments must be defined inline, as friends in
- the class definition. The current definition of the language does not allow
- them to be written as out-of-line function templates. The standards committee
- is working to fix this. See the sidebar, "Templates in a Nutshell," for more
- detail.
- Since a bits object is an extension of unsigned integers as far as bitwise
- operations are concerned, a collection of bits behaves like a number, in that
- bit 0 is the right-most bit. To be consistent with C bitwise operations, the
- statements
- bits<8> b;
- b = 5;
- cout << b << endl;
- give the result
- 00000101
- that is, 5 is converted to a bits<8> object (via the constructor
- bits(unsigned long)), and the bits of that object replace those of b.
- As you can see in Listing 7, I've taken some liberties in the implementation
- of this class by changing the type of the template parameter to a size_t. The
- reason for originally making the number of bits an unsigned long was to
- guarantee a minimum size across platforms (the ANSI C standard states that an
- unsigned long must be at least 32 bits wide). However, this would require that
- count and length return an unsigned long. As proof that this is unnatural, I
- offer the fact that I forgot to do so (the functions return a size_t because
- it "feels right") and no one on the committee noticed. (Actually, Bill Plauger
- finally noticed while he was editing the standard library documents, but that
- was four months after the bits class became official.) Furthermore, the
- corresponding functions in the string and bitstring classes also return a
- size_t. (They have no choice.) To be consistent with these classes, therefore,
- and because it just makes sense, I shall propose to the committee that we
- approve the obvious and make the template parameter a size_t.
- I'm also thinking of changing the name from bits to bitset. This would allow
- you to refer to an object as a "bitset" instead of always having to say a
- "bits object." (Who would ever call it a "bits"?) And one could argue that the
- function to_ushort( ) is superfluous, since it is equivalent to (unsigned
- short) to_ulong( ).
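- Standard C++ today provides a class of this kind under the name std::bitset.
- It uses flip() in place of toggle() and size() in place of length(). The
- sketch below runs the b = 5 example above against std::bitset, assuming a
- C++11 or later compiler. It is shown for comparison only, not as the
- proposal's interface:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// The b = 5 example: 5 converts to std::bitset<8> through its integer
// constructor, and bit 0 prints as the right-most character.
std::string five_as_bits()
{
    std::bitset<8> b;
    b = 5;
    return b.to_string();
}

// std::bitset reports a bad position by throwing std::out_of_range from
// set(pos), reset(pos), flip(pos) and test(pos); this reports whether it did.
bool test_throws(std::size_t pos)
{
    std::bitset<8> b;
    try
    {
        (void) b.test(pos);
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
    return false;
}
```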
-
-
- Implementation Notes
-
-
- The Code Capsule "Bit Handling in C" (CUJ, November, 1993) provides a thorough
- explanation of the internals of using an integral array to store and access
- individual bits, so I won't repeat it here. The techniques found therein serve
- both the bitstring and bits classes. The implementation of the string class is
- in last month's installment ("Bit Handling in C++, Part 1," CUJ, December
- 1993). Listing 8 has a test program that exercises most of the member
- functions of the bits class.
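- The addressing scheme in Listing 7 stores the most significant block first:
- word(pos) is nblks_ - 1 - pos/BLKSIZ, offset(pos) is pos % BLKSIZ, and the
- mask is 1 shifted left by the offset. The sketch below strips the class down
- to that layout and uses std::array. BitArray is a hypothetical name, not the
- proposal's class:

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical fixed-size bit array using the same addressing as Listing 7.
template<std::size_t N>
class BitArray
{
public:
    typedef unsigned int Block;
    static constexpr std::size_t BLKSIZ = CHAR_BIT * sizeof(Block);
    static constexpr std::size_t NBLKS = (N + BLKSIZ - 1) / BLKSIZ;

    BitArray() : blocks_() {}   // value-initialized: every block is zero

    void set(std::size_t pos)
    {
        assert(pos < N);
        blocks_[word(pos)] |= mask1(pos);
    }
    void reset(std::size_t pos)
    {
        assert(pos < N);
        blocks_[word(pos)] &= ~mask1(pos);
    }
    bool test(std::size_t pos) const
    {
        assert(pos < N);
        return (blocks_[word(pos)] & mask1(pos)) != 0;
    }
    // Raw block access, so callers can see where a bit was stored.
    Block block(std::size_t i) const { return blocks_[i]; }

private:
    // Block 0 holds the most significant bits; bit 0 lives in the last block.
    static std::size_t word(std::size_t pos) { return NBLKS - 1 - pos / BLKSIZ; }
    // Position of the bit within its block, turned into a one-bit mask.
    static Block mask1(std::size_t pos) { return Block(1) << (pos % BLKSIZ); }

    std::array<Block, NBLKS> blocks_;
};
```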
-
-
- Sets of Integers
-
-
- For those of you who miss some of the high-level features of Pascal and
- Modula-2, the bits class gives you sets of integers almost for free. Just
- define a bits object of a size appropriate for your application and do the
- following:
- For the set operation:      Do this:
- insert x into s              s.set(x)
- remove x from s              s.reset(x)
- x member of s?               s.test(x)
- complement of s              s.toggle() or ~s
- s1 + s2 (union)              s1 | s2
- s1 * s2 (intersection)       s1 & s2
- s1 - s2 (difference)         see below
- s1 <= s2 (subset)            see below
- s1 >= s2 (superset)          see below
- If this still seems too "low-level," it is a trivial matter to define a
- set-like interface to the bits class. Listing 9 defines a class template
- called Intset that has all the basic set operations along with an ostream
- inserter. The only operations that take any thought at all (but only very
- little) are set difference and subset. To remove from s1 the elements of s2,
- just reset in s1 the bits that are set in s2:
- s1.bitset &= ~s2.bitset;
- // see Intset<N>::operator-
- s1 is a subset of s2 if and only if s1 is nothing more nor less than the
- intersection of the two sets. The parentheses are required, because ==
- binds more tightly than &:
- s1 == (s1 & s2)
- // true iff s1 <= s2
- The test program in Listing 10 shows how to use the Intset class.
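- The two "see below" rows can be written as small generic helpers. The sketch
- below uses std::bitset so that it compiles on its own. difference and
- is_subset are hypothetical names; Listing 9 builds the same expressions into
- Intset's operators:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Set difference: the elements of s1 that are not in s2.
template<std::size_t N>
std::bitset<N> difference(const std::bitset<N>& s1, const std::bitset<N>& s2)
{
    return s1 & ~s2;
}

// s1 is a subset of s2 exactly when intersecting with s2 removes nothing.
// The parentheses matter: == binds more tightly than &.
template<std::size_t N>
bool is_subset(const std::bitset<N>& s1, const std::bitset<N>& s2)
{
    return s1 == (s1 & s2);
}
```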
-
-
- Conclusion
-
-
- The acceptance of these two bit handling classes by the C++ standards
- committee shows that the needs of the system programmer have not been
- forgotten. Much of the hullabaloo over object-oriented technology emphasizes
- inheritance hierarchies of polymorphic, high-level abstract data types. These
- concepts are best left to specialized libraries and applications while the
- standard rightly focuses on commonly needed, low-level abstractions. If you
- have any comments on these classes, please contact me at the email address in
- the by-line at the bottom of the first page of this article.
- Templates In A Nutshell
- A template is a parameterized layout for defining a function or class. The
- parameters are usually types, but class templates can also have value
- parameters. Template definitions allow you to specify the logic of a function
- or class once and then have the compiler create specific functions or classes
- for different types as you need them.
-
-
- Function Templates
-
-
- Consider the swap function
- void swap(int& x, int& y)
- {
- int temp = x;
- x = y;
- y = temp;
- }
- This works only for integer arguments. You need a different version of swap
- for each data type for which you want to swap elements. If you inspect the
- version for doubles:
- void swap(double& x, double& y)
- {
- double temp = x;
- x = y;
- y = temp;
- }
- you'll notice that the only change was to substitute double for int in the
- text. This suggests a macro solution, with the data type as a parameter:
- // genswap.h
- #define genswap(T) void swap(T& x, T& y) \
- { \
- T temp = x; \
- x = y; \
- y = temp; \
- }
- To generate different versions of swap, call genswap as needed:
- #include <iostream.h>
- #include "genswap.h"
-
- genswap(int) // Awkward syntax,
- genswap(double) // I'll admit.
-
- main()
- {
- int i = 1, j = 2;
- double x = 1.1, y = 2.2;
- swap(i,j);
-
- swap(x,y);
- cout << i << ',' << j << endl; // 2,1
- cout << x << ',' << y << endl; // 2.2,1.1
- return 0;
- }
- This has the advantage of allowing you to specify the function logic only
- once.
- A function template is much the same as the genswap macro, except that you
- don't have to explicitly generate the functions you need. After seeing the
- template definition
- template<class T>
- void swap(T& x, T& y)
- {
- T temp = x;
- x = y;
- y = temp;
- }
- the compiler automatically generates versions as needed when it finds a call
- to swap. (See Listing 1.) You can use any type, including built-ins, for the
- template argument.
-
-
- Class Templates
-
-
- You can also parameterize class definitions. A good candidate is a stack,
- since the logic of stack operations is the same no matter what type the stack
- elements are. Listing 2 and Listing 3 have the definition of an integer stack
- class. To templatize this class, precede the class definition with the line
- template<class T>
- as before, and change all occurrences of int that refer to the type of
- elements on the stack to T (see Listing 4). You instantiate a specific stack
- class like this:
- Stack<int> s1(5);
- The type of s1 is Stack<int> ("stack of int"). The token Stack cannot appear
- unqualified outside of the class template definition. See Listing 5 for an
- example of using Stack template classes. (Point of Terminology: A "class
- template" is the original template definition. A "template class" is a
- particular class instantiated from the class template, such as Stack<int>.)
- Note that there is no separate stack2.cpp file. My compilers (Borland 3.1 and
- WATCOM 9.5) require the entire class implementation to be visible during
- compilation, so everything is in an include file.
- A class template can also have value parameters. The bits class in this
- article is a good example:
- template<size_t N>
- class bits
- {
- //...
- };
- The value for N must be a constant expression when instantiated:
- bits<16> b1; // ok
- const size_t n = 32;
- bits<n> b2; // ok
- size_t m = 64;
- bits<m> b3; // nope - m not const
- Since the specific values of N in a program are known at compile time, the
- array inside a bits<N> object can be placed on the stack, thus avoiding the
- need for dynamic memory management.
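- The sketch below shows in stripped-down form why a value parameter lets the
- storage live on the stack: the array length is computed from N at compile
- time. FixedBits is a hypothetical name, and rounding up to whole bytes is an
- assumption made for illustration. Listing 7 rounds up to whole unsigned ints
- instead:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical FixedBits<N>: storage rounded up to whole bytes. Because N
// is a compile-time constant, so is the array length, and an object can
// live on the stack with no dynamic allocation.
template<std::size_t N>
struct FixedBits
{
    static constexpr std::size_t bytes = (N + CHAR_BIT - 1) / CHAR_BIT;
    unsigned char data[bytes];
};

// The size is known to the compiler, so it can even be checked statically.
static_assert(sizeof(FixedBits<CHAR_BIT>) == 1, "one byte holds CHAR_BIT bits");
```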
- A disadvantage of value parameters occurs with friend functions. Consider the
- friend function operator& defined in the bits class. To define this outside of
- the class definition itself, you would have to write:
- template<size_t N>
- bits<N> operator&(const bits<N>& b1, const bits<N>& b2)
- {
- //...
- }
- Since this is not a member function, the compiler recognizes it as a function
- template definition. Under the current definition of the language, global
- function templates can only have type arguments (e.g., class T), because a
- compiler uses overloading rules to resolve them. That's why I had to fully
- define all friends inside of the bits<N> class template definition. The joint
- C++ standards committee voted in November to allow out-of-line definitions of
- friend functions for class templates.
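- Compilers that implement the committee's later rules accept the out-of-line
- form. The sketch below contrasts the two styles on a deliberately tiny
- hypothetical Bits<N>, which has one unsigned long of storage and does no
- masking. It assumes a C++11 or later compiler:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, deliberately tiny Bits<N>: one unsigned long of storage
// and no masking of unused bits (assumes N <= bits in unsigned long).
template<std::size_t N>
struct Bits
{
    unsigned long value;
    Bits(unsigned long v = 0) : value(v) {}

    // Inline friend, as in Listing 7: each Bits<N> gets an ordinary,
    // non-template operator|, so an int operand can convert to Bits<N>.
    friend Bits operator|(const Bits& a, const Bits& b)
    {
        return Bits(a.value | b.value);
    }
};

// Out-of-line function template with a value parameter: N is deduced
// from the argument types, but deduction ignores conversions, so both
// operands must already be Bits<N> objects.
template<std::size_t N>
Bits<N> operator&(const Bits<N>& a, const Bits<N>& b)
{
    return Bits<N>(a.value & b.value);
}
```

- The difference shows up in expressions such as 85 & x in Listing 8. With
- the out-of-line template, 85 & x would not compile, because N cannot be
- deduced from an int. The inline friend accepts 85 | x.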
-
- Listing 1 A function template for swapping two objects of the same type
- // swap.cpp
- #include <iostream.h>
-
- template<class T>
- void swap(T& x, T& y)
- {
- T temp = x;
- x = y;
- y = temp;
- }
-
-
- main()
- {
- int a = 1, b = 2;
- double c = 1.1, d = 2.2;
- char *s = "hello", *t = "there";
-
- swap(a,b);
- cout << "a = " << a << ", b = " << b << '\n';
-
- swap(c,d);
- cout<< "c = " << c << ", d = " << d << '\n';
-
- swap(s,t);
- cout<< "s = " << s << ", t = " << t << '\n';
-
- return 0;
- }
-
- /* Output:
- a = 2, b = 1
- c = 2.2, d = 1.1
- s = there, t = hello
- */
-
- // End of File
-
-
- Listing 2 A class for a stack of integers
- // stack1.h: A C++ integer stack class
-
- #include <stddef.h>
-
- class Stack
- {
-
- size_t size;
- int *data;
- int ptr;
-
- public:
- Stack(size_t);
- ~Stack();
- void push(int);
- int pop();
- int empty() const;
- int full() const;
- };
-
- inline Stack::Stack(size_t siz)
- {
- data = new int[size = siz];
- ptr = 0;
- }
-
- inline Stack::~Stack()
- {
- delete [] data;
- }
-
- inline int Stack::empty() const
-
- {
- return ptr == 0;
- }
-
- inline int Stack::full() const
- {
- return ptr == size;
- }
-
- // End of File
-
-
- Listing 3 Out-of-line functions for the stack class
- // stack1.cpp
- #include "stack1.h"
-
- void Stack::push(int x)
- {
- if (ptr < size)
- data[ptr++] = x;
- }
-
- int Stack::pop()
- {
- if (ptr > 0)
- --ptr;
- return data[ptr];
- }
-
- // End of File
-
-
- Listing 4 A class template for homogeneous stacks
- // stack2.h: A C++ stack class template
-
- #include <stddef.h>
-
- template<class T>
- class Stack
- {
-
- size_t size;
- T *data;
- int ptr;
-
- public:
- Stack(size_t);
- ~Stack();
- void push(const T&);
- T pop();
- int empty() const;
- int full() const;
- };
-
- template<class T>
- inline Stack<T>::Stack(size_t siz)
- {
- data = new T[size = siz];
- ptr = 0;
-
- }
-
- template<class T>
- inline Stack<T>::~Stack()
- {
-
- delete [] data;
- }
-
- template<class T>
- void Stack<T>::push(const T& x)
- {
- if (ptr < size)
- data[ptr++] = x;
- }
-
- template<class T>
- T Stack<T>::pop()
- {
- if (ptr > 0)
- --ptr;
- return data[ptr];
- }
-
- template<class T>
- inline int Stack<T>::empty() const
- {
- return ptr == 0;
- }
-
- template<class T>
- inline int Stack<T>::full() const
- {
- return ptr == size;
- }
-
- // End of File
-
-
- Listing 5 Illustrates the stack template class
- // tstack2.cpp
- #include <iostream.h>
- #include "stack2.h"
-
- main()
- {
- Stack<int> s1(5), s2(5);
-
- // Push odds onto s1, evens onto s2:
- for (int i = 1; i < 10; i += 2)
- {
- s1.push(i);
- s2.push(i+1);
- }
-
- // Retrieve and print in LIFO order:
- cout << "Stack 1:\n";
- while (!s1.empty())
- cout << s1.pop() << endl;
-
-
- cout << "Stack 2:\n";
- while (!s2.empty())
- cout << s2.pop() << endl;
-
- return 0;
- }
-
- /* Output:
- Stack 1:
- 9
- 7
- 5
- 3
- 1
- Stack 2:
- 10
- 8
- 6
- 4
- 2
- */
-
- /* End of File */
-
-
- Listing 6 The bits class template interface
- template<unsigned long N>
- class bits
- {
- // Friends:
- // Global I/O functions
- friend ostream& operator<<(ostream&, const bits<N>&);
- friend istream& operator>>(istream&, bits<N>&);
-
- // Global bitwise operators
- friend bits<N> operator&(const bits<N>&, const bits<N>&);
- friend bits<N> operator|(const bits<N>&, const bits<N>&);
- friend bits<N> operator^(const bits<N>&, const bits<N>&);
-
- public:
- // Constructors
- bits();
- bits(unsigned long n);
- bits(const bits<N>& b);
- bits(const string& s);
-
- // Conversions
- unsigned short to_ushort() const;
- unsigned long to_ulong() const;
- string to_string() const;
-
- // Assignment
- bits<N>& operator=(const bits<N>& rhs);
-
- // Equality
- int operator==(const bits<N>& rhs) const;
- int operator!=(const bits<N>& rhs) const;
-
-
- // Basic bit operations
- bits<N>& set(size_t pos, int val = 1);
- bits<N>& set();
- bits<N>& reset(size_t pos);
- bits<N>& reset();
- bits<N>& toggle(size_t pos);
- bits<N>& toggle();
- bits<N> operator~() const;
- int test(size_t n) const;
- int any() const;
- int none() const;
-
- // Bit-wise operators
- bits<N>& operator&=(const bits<N>& rhs);
- bits<N>& operator|=(const bits<N>& rhs);
- bits<N>& operator^=(const bits<N>& rhs);
-
- // Shift operators
- bits<N>& operator<<=(size_t n);
- bits<N>& operator>>=(size_t n);
- bits<N> operator<<(size_t n) const;
- bits<N> operator>>(size_t n) const;
-
- size_t count() const;
- size_t length() const;
- };
-
- // End of File
-
-
- Listing 7 An implementation of the bits class template
- // bits.h
-
- #include <iostream.h>
- #include <stddef.h>
- #include <limits.h>
- #include <assert.h>
- #include <string.h> // memcpy, memcmp, memset
- #include "string.hpp"
-
- template<size_t N>
- class bits
- {
- // Global I/O functions
- friend ostream& operator<<(ostream& os, const bits<N>& rhs)
- {os << rhs.to_string(); return os;}
- friend istream& operator>>(istream& is, bits<N>& rhs)
- {rhs.read(is); return is;}
-
- // Global bitwise operators
- friend bits<N>operator&(const bits<N>& b1,const bits<N>& b2)
- {bits<N> r(b1); return r &= b2;}
- friend bits<N> operator|(const bits<N>& b1,const bits<N>& b2)
- {bits<N> r(b1); return r |= b2;}
- friend bits<N> operator^(const bits<N>& b1,const bits<N>& b2)
- {bits<N> r(b1); return r ^= b2;}
-
- public:
- // Constructors
- bits();
-
- bits(unsigned long n);
- bits(const bits<N>& b);
- bits(const string& s);
-
- // Conversions
- unsigned short to_ushort() const;
- unsigned long to_ulong() const;
- string to_string() const;
-
- // Assignment
- bits<N>& operator=(const bits<N>& rhs);
-
- // Equality
- int operator==(const bits<N>& rhs) const;
- int operator!=(const bits<N>& rhs) const;
-
- // Basic bit operations
- bits<N>& set(size_t pos, int val = 1);
- bits<N>& set();
- bits<N>& reset(size_t pos);
- bits<N>& reset();
- bits<N>& toggle(size_t pos);
- bits<N>& toggle();
- bits<N> operator~() const;
- int test(size_t n) const;
- int any() const;
- int none() const;
-
- // Bit-wise operators
- bits<N>& operator&=(const bits<N>& rhs);
- bits<N>& operator|=(const bits<N>& rhs);
- bits<N>& operator^=(const bits<N>& rhs);
-
- // Shift operators
- bits<N>& operator<<=(size_t n);
- bits<N>& operator>>=(size_t n);
- bits<N> operator<<(size_t n) const;
- bits<N> operator>>(size_t n) const;
-
- size_t count() const;
- size_t length() const;
-
- private:
- typedef unsigned int Block;
- enum {BLKSIZ = CHAR_BIT * sizeof (Block)};
- enum {nblks_ = (N+BLKSIZ-1) / BLKSIZ};
-
- Block bits_[nblks_];
-
- static size_t word(size_t pos)
- {return nblks_ - 1 - pos/BLKSIZ;}
- static size_t offset(size_t pos)
- {return pos % BLKSIZ;}
- static Block mask1(size_t pos)
- {return Block(1) << offset(pos);}
- static Block mask0(size_t pos)
- {return ~(Block(1) << offset(pos));}
-
- void cleanup();
-
- void set_(size_t pos);
- int set_(size_t pos, int val);
- void reset_(size_t pos);
- int test_(size_t pos) const;
- void from_string(const string& s);
- void read(istream& is);
- int any(size_t start_pos) const;
- unsigned long to(size_t) const;
- };
-
- template<size_t N>
- inline bits<N>::bits()
- {
- reset();
- }
-
- template<size_t N>
- bits<N>::bits(const string& s)
- {
- // Validate that s has only 0's and 1's
- size_t i;
- for (i = 0; i < s.length(); ++i)
- {
- char c = s.get_at(i);
- if (c != '0' && c != '1')
- break;
- }
- assert(i == s.length());
-
- from_string(s);
- }
-
- template<size_t N>
- inline bits<N>::bits(const bits<N>& b)
- {
- memcpy(bits_,b.bits_,nblks_*sizeof(bits_[0]));
- }
-
- template<size_t N>
- bits<N>::bits(unsigned long n)
- {
- // Don't drop any bits
- if (N < CHAR_BIT * sizeof(unsigned long))
- assert((n >> N) == 0);
-
- reset();
-
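- // Copy n into the low-order blocks (the last elements of bits_),
- // never writing past the blocks this object actually has
- size_t nblks = sizeof (unsigned long) / sizeof (Block);
- if (nblks > 1)
- for (size_t i = 0; i < nblks && i < nblks_; ++i)
- {
- bits_[nblks_ - 1 - i] = Block(n);
- n >>= BLKSIZ;
- }
- else
- bits_[nblks_ - 1] = Block(n);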
- }
-
- template<size_t N>
- unsigned short bits<N>::to_ushort() const
-
- {
- size_t limit = sizeof(unsigned short) * CHAR_BIT;
- assert(!(length() > limit && any(limit)));
- size_t nblks = sizeof(unsigned short) / sizeof(Block);
- return (unsigned short) to(nblks);
- }
- template<size_t N>
- unsigned long bits<N>::to_ulong() const
- {
- size_t limit= sizeof(unsigned long) * CHAR_BIT;
- assert(!(length() > limit && any(limit)));
- size_t nblks = sizeof(unsigned long) / sizeof(Block);
- return to(nblks);
- }
-
- template<size_t N>
- string bits<N>::to_string() const
- {
- char *s = new char[N+1];
- for (int i = 0; i < N;++i)
- s[i] = '0' + test_(N-1-i);
- s[N] = '\0';
- string s2(s);
- delete [] s;
- return s2;
- }
-
- template<size_t N>
- bits<N>& bits<N>::operator=(const bits<N>& b)
- {
- if (this != &b)
- memcpy(bits_,b.bits_, nblks_* sizeof(bits_[0]));
- return *this;
- }
-
- template<size_t N>
- inline int bits<N>::operator==(const bits<N>& b) const
- {
- return !memcmp(bits_,b.bits_,nblks_ * sizeof(bits_[0]));
- }
-
- template<size_t N>
- inline int bits<N>::operator!=(const bits<N>& b) const
- {
- return !operator==(b);
- }
-
- template<size_t N>
- inline bits<N>& bits<N>::set(size_t pos, int val)
- {
- assert(pos < N);
- (void) set_(pos,val);
- return *this;
- }
-
- template<size_t N>
- inline bits<N>& bits<N>::set()
- {
- memset(bits_,~0u,nblks_ * sizeof bits_[0]);
-
- cleanup();
- return *this;
- }
-
- template<size_t N>
- inline bits<N>& bits<N>::reset(size_t pos)
- {
- assert(pos < N);
- reset_(pos);
- return *this;
- }
-
- template<size_t N>
- inline bits<N>& bits<N>::reset()
- {
- memset(bits_,0,nblks_ * sizeof bits_[0]);
- return *this;
- }
-
- template<size_t N>
- inline bits<N>& bits<N>::toggle(size_t pos)
- {
- assert(pos < N);
- bits_[word(pos)] ^= mask1(pos);
- return *this;
- }
-
- template<size_t N>
- bits<N>& bits<N>::toggle()
- {
- size_t nw = nblks_;
- while (nw--)
- bits_[nw] = ~bits_[nw];
- cleanup();
- return *this;
- }
-
- template<size_t N>
- inline bits<N> bits<N>::operator~() const
- {
- bits<N> b(*this);
- b.toggle();
- return b;
- }
-
- template<size_t N>
- inline int bits<N>::test(size_t pos) const
- {
- assert(pos < N);
- return test_(pos);
- }
-
- template<size_t N>
- int bits<N>::any() const
- {
- for (int i = 0; i < nblks_; ++i)
- if (bits_[i])
- return 1;
- return 0;
-
- }
-
- template<size_t N>
- inline int bits<N>::none() const
- {
- return !any();
- }
-
- template<size_t N>
- bits<N>& bits<N>::operator&=(const bits<N>& rhs)
- {
- for (int i = 0; i < nblks_; ++i)
- bits_[i] &= rhs.bits_[i];
- return *this;
- }
-
- template<size_t N>
- bits<N>& bits<N>::operator|=(const bits<N>& rhs)
- {
- for (int i = 0; i < nblks_; ++i)
- bits_[i] |= rhs.bits_[i];
- return *this;
- }
-
- template<size_t N>
- bits<N>& bits<N>::operator^=(const bits<N>& rhs)
- {
- for (int i = 0; i < nblks_; ++i)
- bits_[i] ^= rhs.bits_[i];
- return *this;
- }
-
- template<size_t N>
- bits<N>& bits<N>::operator>>=(size_t n)
- {
- if (n > N)
- n = N;
- size_t i;
- // Move each bit down n positions
- for (i = 0; i < N-n; ++i)
- (void) set_(i,test_(i+n));
- // Clear the n vacated high-order bits
- for (i = N-n; i < N; ++i)
- reset_(i);
- return *this;
- }
-
- template<size_t N>
- bits<N>& bits<N>::operator<<=(size_t n)
- {
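- if (n > N)
- n = N;
- // Move each bit up n positions, from the top down. An unsigned
- // index counted this way avoids running past 0 when n == 0.
- for (size_t i = N; i-- > n; )
- (void) set_(i,test_(i-n));
- // Clear the n vacated low-order bits
- for (size_t j = 0; j < n; ++j)
- reset_(j);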
- return *this;
- }
-
- template<size_t N>
- inline bits<N> bits<N>::operator>>(size_t n) const
- {
-
- bits r(*this);
- return r >>= n;
- }
-
- template<size_t N>
- inline bits<N> bits<N>::operator<<(size_t n) const
- {
- bits r(*this);
- return r <<= n;
- }
-
- template<size_t N>
- size_t bits<N>::count() const
- {
- size_t sum = 0;
- for (int i = 0; i < N; ++i)
- if (test_(i))
- ++sum;
- return sum;
- }
-
- template<size_t N>
- inline size_t bits<N>::length() const
- {
- return N;
- }
-
- // Private functions
- template<size_t N>
- inline void bits<N>::set_(size_t pos)
- {
- bits_[word(pos)] |= mask1(pos);
- }
-
- template<size_t N>
- int bits<N>::set_(size_t pos, int val)
- {
- if (val)
- set_(pos);
- else
- reset_(pos);
- return !!val;
- }
-
- template<size_t N>
- inline void bits<N>::reset_(size_t pos)
- {
- bits_[word(pos)] &= mask0(pos);
- }
-
- template<size_t N>
- inline int bits<N>::test_(size_t pos) const
- {
- return !!(bits_[word(pos)] & mask1(pos));
- }
-
- template<size_t N>
- inline void bits<N>::cleanup()
- {
-
- // Make sure unused bits don't get set
- bits_[0] &= (~Block(0) >> (nblks_ * BLKSIZ - N));
- }
-
- template<size_t N>
- void bits<N>::from_string(const string& s)
- {
- // Assumes s contains only 0's and 1's
- size_t slen = s.length();
- reset();
- for (int i = slen-1; i >= 0; --i)
- if (s.get_at(i) == '1')set_(slen-i-1);
- }
-
- template<size_t N>
- void bits<N>::read(istream& is)
- {
- char *buf = new char[N+1]; // room for N bits plus '\0'
- char c;
- size_t i;
-
- is >> ws;
- for (i = 0; i < N; ++i)
- {
- if (!is.get(c))
- break; // end of input
- if (c == '0' || c == '1')
- buf[i] = c;
- else
- {
- is.putback(c); // leave the non-bit character in the stream
- break;
- }
- }
- buf[i] = '\0';
-
- if (i == 0)
- is.clear(ios::failbit);
- else
- from_string(string(buf));
- delete [] buf;
- }
-
- template<size_t N>
- int bits<N>::any(size_t start) const
- {
- // See if any bit past start (inclusive) is set
- for (int i = start; i < N; ++i)
- if (test_(i))
- return 1;
- return 0;
- }
-
- template<size_t N>
- unsigned long bits<N>::to(size_t nblks) const
- {
- // Never read more blocks than this object has
- if (nblks > nblks_)
- nblks = nblks_;
- if (nblks > 1)
- {
- // Collect low-order blocks into an unsigned long
- unsigned long n = bits_[nblks_ - nblks];
- while (--nblks)
- n = (n << BLKSIZ) | bits_[nblks_ - nblks];
- return n;
- }
- else
- return (unsigned long) bits_[nblks_ - 1];
-
- }
-
- /* End of File */
-
-
- Listing 8 Tests the bits class
- // tbits.cpp
- #include <iostream.h>
- #include <iomanip.h>
- #include <stddef.h>
- #include <limits.h>
- #include "bits.h"
-
- main()
- {
- const size_t SIZE = CHAR_BIT * sizeof(unsigned long);
- unsigned long n = 0x12345678;
- bits<SIZE> x(n), y(string("10110")), z(x);
-
- cout << "Initial x: "<< x << endl;
- cout << "Initial y: "<< y << endl;
- cout << "Initial z: "<< z << endl;
- cout << "Enter new z: ";
- cin >> z;
- cout << "New z: "<< z << endl;
- cout << "z == "<< z.to_ulong() << endl;
- cout << "y == " << y.to_ushort() << endl;
- cout << "x == " << x.to_ulong() << endl;
-
- cout << "x: "<< x <<" (" << x.count()
- <<" bits set)" << endl;
- cout << "x == 0x12345678L? "<< (x == 0x12345678L) << endl;
- cout << "x: "<< x << endl;
- cout << "x: "<< hex << setfill('0')
- << setw(sizeof(unsigned long)*2)
- << x.to_ulong() << dec << endl;
- cout << "x <<= 6 == " << (x <<= 6) << endl;
- cout << "x >>= 6 == " << (x >>= 6) << endl;
- cout << "85 == " << bits<SIZE>(85) << endl;
- cout << "x ^ 85 == " << (x ^ 85) << endl;
- cout << "x & 85 == " << (x & 85) << endl;
- cout << "85 & x == " << (85 & x) << endl;
- cout << "~x == " << (~x) <<" == "
- << (~x).to_ulong() << endl;
-
- y = 0x55555550L;
- cout << "y: " << y << " (" << y.count()
- << " bits set)" << endl;
- cout << "y[0]: " << hex << setfill('0')
-
- << setw(sizeof(unsigned long)*2)
- << y.to_ulong() << dec << endl;
- cout << "x & y == " << (x & y) << endl;
- cout << "x | y == " << (x | y) << endl;
- cout << "x ^ y == " << (x ^ y) << endl;
- cout << "x != y? "<< (x != y) << endl;
- return 0;
- }
-
- /* Sample Execution:
- Initial x: 00010010001101000101011001111000
- Initial y: 00000000000000000000000000010110
- Initial z: 00010010001101000101011001111000
- Enter new z: 101001000100001000001
- New z: 00000000000101001000100001000001
- z == 1345601
- y == 22
- x == 305419896
- x: 00010010001101000101011001111000 (13 bits set)
- x == 0x12345678L? 1
- x: 00010010001101000101011001111000
- x: 12345678
- x <<= 6 == 10001101000101011001111000000000
- x >>= 6 == 00000010001101000101011001111000
- 85 == 00000000000000000000000001010101
- x ^ 85 == 00000010001101000101011000101101
- x & 85 == 00000000000000000000000001010000
- 85 & x == 00000000000000000000000001010000
- ~x == 11111101110010111010100110000111 == 4257982855
- y: 01010101010101010101010101010000 (14 bits set)
- y[0]: 55555550
- x & y == 00000000000101000101010001010000
- x | y == 01010111011101010101011101111000
- x ^ y == 01010111011000010000001100101000
- x != y? 1
-
- // End of File
-
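- The sample run reports 13 bits set in 0x12345678 and 14 in 0x55555550. One
- common way to count the set bits of a single unsigned long is sketched below;
- it clears the lowest set bit on each pass, so it loops once per 1 bit. This is
- an illustration only, not necessarily how bits<N>::count() is written.

```cpp
// Count set bits by repeatedly clearing the lowest one (n & (n - 1)).
unsigned count_bits(unsigned long n)
{
    unsigned count = 0;
    while (n != 0) {
        n &= n - 1;   // turns off the rightmost 1 bit
        ++count;
    }
    return count;
}
```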
-
- Listing 9 Implementation of sets of integers
- #if !defined(INTSET_H)
- #define INTSET_H
-
- #include <iostream.h>
- #include <stddef.h>
- #include "bits.h"
-
- template<size_t N>
- class Intset
- {
- public:
-
- // NOTE: The following constructors shouldn't be
- // necessary. The compiler-generated ones should
- // suffice. For some reason, Borland 3.1 requires
- // these (WATCOM does not).
-
- // Constructors
- Intset();
-
- Intset(const Intset<N>& is);
-
- // Set operations
- Intset<N> operator-(const Intset<N>& is) const;
- Intset<N> operator+(const Intset<N>& is) const;
- Intset<N> operator*(const Intset<N>& is) const;
- Intset<N> operator~() const;
- int operator==(const Intset<N>& is) const;
- int operator!=(const Intset<N>& is) const;
- int operator<=(const Intset<N>& is) const;
- int operator>=(const Intset<N>& is) const;
-
- // Member operations
- int contains(size_t n) const;
- Intset<N>& insert(size_t n);
- Intset<N>& remove(size_t n);
-
- size_t count() const;
- friend ostream& operator<<(ostream& os, const Intset<N>& is)
- {is.print(os); return os;}
-
- private:
- bits<N> bitset;
-
- int subsetof(const Intset<N>& is) const;
- void print(ostream& os) const;
- };
-
- template<size_t N>
- Intset<N>::Intset()
- {
- bitset.reset();
- }
-
- template<size_t N>
- Intset<N>::Intset(const Intset<N>& is)
- {
- bitset = is.bitset;
- }
-
- template<size_t N>
- inline Intset<N> Intset<N>::operator-(const Intset<N> &is) const
- {
- Intset<N> r(*this);
- r.bitset &= ~is.bitset;
- return r;
- }
-
- template<size_t N>
- inline Intset<N> Intset<N>::operator+(const Intset<N> &is) const
- {
- Intset<N> r(*this);
- r.bitset |= is.bitset;
- return r;
- }
-
- template<size_t N>
- inline Intset<N> Intset<N>::operator*(const Intset<N> &is) const
- {
- Intset<N> r(*this);
- r.bitset &= is.bitset;
- return r;
- }
-
- template<size_t N>
- inline Intset<N> Intset<N>::operator~() const
- {
- Intset<N> r(*this);
- r.bitset.toggle();
- return r;
- }
-
- template<size_t N>
- inline int Intset<N>::operator==(const Intset<N> &is) const
- {
- return bitset == is.bitset;
- }
-
- template<size_t N>
- inline int Intset<N>::operator!=(const Intset<N> &is) const
- {
- return bitset != is.bitset;
- }
-
- template<size_t N>
- inline int Intset<N>::operator<=(const Intset<N> &is) const
- {
- return subsetof(is);
- }
-
- template<size_t N>
- inline int Intset<N>::operator>=(const Intset<N> &is) const
- {
- return is.subsetof(*this);
- }
-
- template<size_t N>
- inline int Intset<N>::contains(size_t n) const
- {
- return bitset.test(n);
- }
-
- template<size_t N>
- inline Intset<N>& Intset<N>::insert(size_t n)
- {
- bitset.set(n);
- return *this;
- }
-
- template<size_t N>
- inline Intset<N>& Intset<N>::remove(size_t n)
- {
- bitset.reset(n);
- return *this;
- }
-
- template<size_t N>
- inline size_t Intset<N>::count() const
- {
- return bitset.count();
- }
-
- template<size_t N>
- inline int Intset<N>::subsetof(const Intset<N>& is) const
- {
- bits<N> r(bitset);
- r &= is.bitset;
- return bitset == r;
- }
-
- template<size_t N>
- void Intset<N>::print(ostream& os) const
- {
- os << '{';
- int first_time = 1;
- for (size_t i = 0; i < N; ++i)
- if (bitset.test(i))
- {
- if (!first_time)
- os << ',';
- os << i;
- first_time = 0;
- }
- os<<'}';
- }
-
- #endif
-
- /* End of File */
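- Intset maps each set operation onto a bitwise one: difference is a & ~b,
- union is a | b, intersection is a & b, and a is a subset of b when
- (a & b) == a. For comparison, the same algebra can be sketched with the
- standard C++ std::bitset<N> template; the helper names below are
- hypothetical and are not part of Listing 9.

```cpp
#include <bitset>
#include <cstddef>

// Set difference: members of a that are not in b.
template <std::size_t N>
std::bitset<N> bitset_difference(const std::bitset<N>& a,
                                 const std::bitset<N>& b)
{
    return a & ~b;
}

// Subset test, as in Intset::subsetof(): a <= b when a & b == a.
template <std::size_t N>
bool bitset_is_subset(const std::bitset<N>& a, const std::bitset<N>& b)
{
    return (a & b) == a;
}
```

- With x = {0,...,9} (0x3FF) and y = {1,3,5,7,9} (0x2AA) held in a
- std::bitset<16>, bitset_difference(x, y) holds {0,2,4,6,8} and
- bitset_is_subset(y, x) is true, matching the Intset test output.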
-
-
- Listing 10 Tests the Intset class
- //tintset.cpp
- #include <iostream.h>
- #include "intset.h"
-
- int main()
- {
- Intset<16> x, y;
-
- for (int i = 0; i < 10;++i)
- {
- x.insert(i);
- if (i % 2)
- y.insert(i);
- }
-
- cout << "x == " << x << endl;
- cout << "y == " << y << endl;
- cout << "y <= x? " << (y <= x) << endl;
- cout << "y >= x? " << (y >= x) << endl;
- cout << "x - y == " << x - y << endl;
- cout << "x + y == " << x + y << endl;
- cout << "x * y == " << x * y << endl;
- cout << "~x == " << ~x << endl;
- cout << "x.contains(2)? " << x.contains(2) << endl;
-
- cout << "y.contains(2)? " << y.contains(2) << endl;
- cout << "x.count() == " << x.count() << endl;
- cout << "y.count() == " << y.count() << endl;
- return 0;
- }
-
- /*Output:
- x == {0,1,2,3,4,5,6,7,8,9}
- y == {1,3,5,7,9}
- y <= x? 1
- y >= x? 0
- x - y == {0,2,4,6,8}
- x + y == {0,1,2,3,4,5,6,7,8,9}
- x * y == {1,3,5,7,9}
- ~x == {10,11,12,13,14,15}
- x.contains(2)? 1
- y.contains(2)? 0
- x.count() == 10
- y.count() == 5
- */
-
- // End of File
-
-
-
-
- On the Networks
-
-
- Special Issue: USENET Network News Update
-
-
-
-
- Sydney S. Weinstein
-
-
- Sydney S. Weinstein, CDP, CCP is a consultant, columnist, lecturer, author,
- professor, and President of Datacomp Systems, Inc., a consulting and contract
- programming firm specializing in databases, data presentation and windowing,
- transaction processing, networking, testing and test suites, and device
- management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems,
- Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail
- on the Internet/USENET mailbox syd@DSI.COM (dsinc!syd for those who cannot do
- Internet addressing).
-
-
- Each January I like to provide an overview of networks, both to answer
- questions I am commonly asked throughout the year, and to benefit readers who
- are completely new to networks. This is my fourth anniversary of writing this
- column, and it's time to update the prior January columns. In prior years I
- have written about the following subjects: the internet, the Internet, USENET,
- Network News, obtaining a news feed, and obtaining sources from archive sites.
- This year, I go over pretty much the same subjects: some basic definitions and
- information about networks, a description of USENET Network News, and finally,
- instructions for obtaining software from the networks.
- First, a quick overview of my column. "On The Networks" covers articles posted
- to several of the source groups on USENET Network News: comp.sources.games,
- comp.sources.misc, comp.sources.reviewed, comp.sources.unix, comp.sources.x,
- and alt.sources. Each of these groups is like a section of a large electronic
- magazine called USENET Network News. I call USENET Network News a magazine,
- and not a bulletin board, partly because of the way it distributes its news.
- Unlike a bulletin board, where each reader accesses a central machine to read
- the messages, USENET delivers Network News on a subscription basis to each
- computer, and subscribers read the articles locally. In "On the Networks," I
- let you know about some of the new postings to USENET Network News.
-
-
- Some Definitions and Basic Information
-
-
- OK, I bandy about the terms USENET, Internet, and "the net," among others, in
- this column. It's time to update the definitions of these items.
-
-
- USENET, internets, and the Internet
-
-
- USENET, oftentimes referred to as "the net," is a loose collection of
- cooperating computers. In the past, all of USENET ran UNIX, but now with other
- computers and operating systems supporting UUCP or similar transfer protocols,
- USENET computers could be running anything from MS-DOS to VAX/VMS. A computer
- is considered to be on USENET if it communicates via electronic mail to other
- computers on USENET (I realize I've provided a slightly circular definition).
- USENET consists of Electronic Mail, file transfers, and Network News. Most of
- the programs you read about in this column are distributed via Network News.
- If your computer talks to USENET or another computer via some forwarding
- gateway using a protocol other than UUCP, you are on an internet, short for
- inter-network. Being "on an internet" just means that you are using some
- network other than USENET. Note that this internet is spelled with a lower
- case i, and includes the Internet and several other networks such as CSNET and
- BITNET. When you're on an internet, your actual connection to USENET is via a
- gateway computer that talks to both the network you use and USENET.
- The Internet (capital I) is the set of computer networks that are
- interconnected by the TCP/IP protocol and listed in the routing tables
- maintained by the InterNIC. This giant network of networks grew out of the
- Defense Department's ARPANET (Advanced Research Projects Agency Network).
- While USENET sites make phone calls to other computers, sending information in
- a store and forward fashion, the Internet is mostly a set of machines with
- permanent or on-demand connections that allow direct real-time communication
- between any two computers on the network. In addition, Internet lines usually
- run faster than the dial up lines used by UUCP. Most inter-city traffic and
- the vast majority of Network News is now transferred via the Internet.
- The Internet is undergoing rapid change, and any column that attempts to
- describe it is chasing a moving target. In fact, as of the last months of
- 1993, as this column is being written, the Internet is on the verge of a very
- major change: it is moving from the public to the private sector, with NSFNET
- becoming the National Information Superhighway. However, to give a quick
- description of its current structure, I can say that the Internet is a set of
- nationwide backbone links connecting areas of the country at 45Mb/s (million
- bits per second). Connected to those backbone links are many regional networks
- running at speeds between 1.544Mb/s and 45Mb/s. Connected to the regionals are
- individual networks such as the networks at Datacomp Systems, Inc. These
- networks connect to the regionals at speeds ranging from 9.6kb/s (on-demand
- dial-up links) up to 1.544Mb/s or 45Mb/s (leased-line connections).
- Whereas connection to USENET via UUCP usually provides only mail and news, the
- Internet runs the TCP/IP protocol and thus supports news (NNTP, Network News
- Transfer Protocol), mail (SMTP, Simple Mail Transfer Protocol), remote logins
- to any computer on the network on which you have an account (telnet), remote
- file transfer (FTP, file transfer protocol), and many real-time on-line search
- engines including Archie, Gopher, and World-Wide-Web. All of these services
- coexist and work in real time.
-
-
- Internet and USENET Addressing
-
-
- The Internet performs much of the bulk transfer work for USENET; problems
- often occur because Internet and USENET use two different addressing methods.
- Since a large amount of the software mentioned in this column comes from
- USENET or the Internet, it's worthwhile to understand how to format the two
- types of addresses. A UUCP or USENET address is made up of site names
- separated by ! characters, as in uunet!dsinc!syd. If a site wants to mention
- more than one "well-known site" to use as a route, it usually lists them
- inside { } characters, as in {uunet, decwrl}!dsinc!syd. In this case, you can use
- either uunet!dsinc!syd or decwrl!dsinc!syd. USENET addressing presents a
- problem -- to do it you must know the complete path from your site to the
- destination site. Some systems run programs to help with this routing, and
- USENET's UUCP Mapping Project publishes maps to automate this process.
- However, not all sites have registered to be listed in these maps.
- Registration is free, recommended, and accomplished by sending your entry to
- rutgers!uucpmap. The mapping project continuously updates the maps and
- distributes them via the USENET news group comp.mail.maps.
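- The { } notation is shorthand for several alternative paths that share the
- same tail. A minimal sketch of expanding it, assuming a single { } group of
- comma-separated site names (expand_routes is a hypothetical helper, not part
- of any UUCP package):

```cpp
#include <string>
#include <vector>

// Expand "{uunet, decwrl}!dsinc!syd" into one bang path per listed site.
// Handles a single { } group; an address without braces is returned as-is.
std::vector<std::string> expand_routes(const std::string& addr)
{
    std::vector<std::string> routes;
    const std::string::size_type open = addr.find('{');
    const std::string::size_type close =
        (open == std::string::npos) ? std::string::npos : addr.find('}', open);
    if (close == std::string::npos) {
        routes.push_back(addr);
        return routes;
    }

    const std::string prefix = addr.substr(0, open);
    const std::string suffix = addr.substr(close + 1);
    const std::string list = addr.substr(open + 1, close - open - 1);

    std::string::size_type start = 0;
    while (start <= list.size()) {
        std::string::size_type comma = list.find(',', start);
        if (comma == std::string::npos)
            comma = list.size();
        const std::string site = list.substr(start, comma - start);
        // Trim blanks around each site name; skip empty entries.
        const std::string::size_type b = site.find_first_not_of(' ');
        if (b != std::string::npos) {
            const std::string::size_type e = site.find_last_not_of(' ');
            routes.push_back(prefix + site.substr(b, e - b + 1) + suffix);
        }
        start = comma + 1;
    }
    return routes;
}
```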
- On the Internet, all sites have a unique "Fully Qualified Domain Name" which
- is administered by the NIC. My site's domain name is node.DSI.COM, where node
- is the individual computer at my site. Thus my full current address is
- syd@dsinc.DSI.COM, but our mailer, like the mailers at a lot of Internet
- sites, is smart, and knows how to forward the mail to me even if you send it
- to syd@DSI.COM. This feature allows me to move around within the DSI.COM
- domain without having to tell everyone a new address. The Internet does not
- require you to know the path to the site; you only need to know the domain
- name. The domain name is the complete address to that site.
- Now, a word of warning. Mixing both @ and ! in the same address leads to
- trouble. Not everyone follows the standard and processes the addresses
- correctly. Converting sitea!user@DSI.COM to a UUCP address would properly
- result in dsinc!sitea!user. Note that the @ has higher precedence than the !.
- Many sites get this standard wrong, which causes your mail to bounce (be
- returned to you as undeliverable). Some sites, ours included, allow UUCP mail
- to have addresses including domain names in the ! path, as in
- dsinc!host.domain.type!user. Where allowed, this form of addressing is usually
- more reliable than mixing the ! and @'s.
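- The @-before-! rule can be expressed in a few lines: split at the last @,
- route first to the host named after it, and hand the remaining bang path to
- that host. The sketch below assumes a hypothetical, locally maintained table
- mapping a domain to the UUCP neighbor that serves it (here DSI.COM to dsinc);
- when no entry exists it falls back to the domain-in-path form described above.

```cpp
#include <map>
#include <string>

// Rewrite a mixed address such as "sitea!user@DSI.COM" as a pure UUCP path.
// '@' takes precedence, so the domain after it becomes the first hop.
// Domains are compared exactly here, although real mailers ignore case.
std::string to_bang_path(const std::string& addr,
                         const std::map<std::string, std::string>& gateways)
{
    const std::string::size_type at = addr.rfind('@');
    if (at == std::string::npos)
        return addr;                      // already a bang path

    const std::string local = addr.substr(0, at);
    const std::string domain = addr.substr(at + 1);

    const auto it = gateways.find(domain);
    const std::string first_hop = (it != gateways.end()) ? it->second : domain;
    return first_hop + "!" + local;
}
```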
-
-
- Public Domain vs. Freely Distributed Software
-
-
- Lastly, what is Public Domain Software and what is Freely Distributable
- Software? Much of the software described in this column is Freely
- Distributable, in that you pay no licensing fee if you are acquiring it for
- personal use. Some distributors even allow business use of Freely
- Distributable software for no fee. While most software in this column is
- Freely Distributable, almost all of it is not in the Public Domain. If
- software is in the Public Domain, either the copyright has run out and was not
- renewed, or its authors have specifically renounced copyright protection and
- have placed the software in the Public Domain. For most software mentioned in
- this column the copyright to the software is held either by the author or by
- some group. They then give the user rights to use and distribute the software
- for no charge. This practice does not place the software in the public domain.
- You still cannot sell this software, nor pretend that you wrote it. Many of
- the licensing agreements restrict how the software can be used for business
- purposes.
- Freely Distributable software is also different from Shareware, in that
- Shareware developers expect the user to pay a fee if he or she intends to
- continue using the program. Freely distributable software developers do not.
-
-
- USENET Network News
-
-
- This column refers to items posted to the source news groups of USENET Network
- News. How do USENET and USENET Network News differ? USENET Network News is a
- subset of the computers on USENET and the internet that agree to exchange one
- or more of the categories of Network News. Currently there are about 8,000
- different news categories, called newsgroups. The newsgroups are broken down
- into several hierarchies. These categories include the traditional major
- hierarchies of news, comp, rec, sci, soc and talk; regional hierarchies such
- as na, usa, ba, pa, nj (and others); and specialized hierarchies such as
- bionet, biz, bit, UNIX-pc, u3b (and others). The major hierarchies are the
- most widely distributed, reaching over a million computers worldwide.
- Regional hierarchies distribute messages of interest only over a particular
- region, such as North America (na), the United States (usa), the San Francisco
- Bay Area (ba), the state of Pennsylvania (pa), or New Jersey (nj), just to
- name a few. The specialized hierarchies serve communities with special
- interests.
- Each hierarchy has its own set of rules, which are enforced by consensus.
- USENET itself has no governing body, just a set of guidelines that individual
- computer owners or administrators follow as they see fit. This scheme seems to
- work most of the time, as the net runs without too much chaos. There is even a
- hierarchy that runs without rules, called alt.
- You are considered to receive (or to pass) network news if your
- computer subscribes to one or more of the newsgroups in any of the
- hierarchies. Some sites receive only a handful of the groups, some receive
- most, and some receive all. However, we are talking about a large amount of
- information -- over 60 megabytes of new postings every day. This volume is
- growing at about 7% per month. At 60 megabytes per day, each site can keep
- only a small portion of the feed on line at a time, and at that, only a few
- days' worth.
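- Compounded monthly, 7% growth adds up quickly. A short sketch of the
- arithmetic, assuming the 7% rate holds steady for a full year (an
- extrapolation, not a figure from the column):

```cpp
// Daily volume after `months` of compound growth at `rate` per month.
double projected_volume(double start_mb, double rate, int months)
{
    double volume = start_mb;
    for (int m = 0; m < months; ++m)
        volume *= 1.0 + rate;   // one month of growth
    return volume;
}
```

- projected_volume(60.0, 0.07, 12) comes to roughly 135 megabytes per day,
- about 2.25 times the starting volume.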
- With so much information coming in each day, it would seem like a lot of work
- just maintaining it, or finding something worthwhile to read. It's not that
- bad. The software to run the news system controls itself almost automatically.
- The software includes facilities to send only those groups to which a
- recipient subscribes, as well as expiring old articles to recover space.
- However, to be a major site in the USENET Network News distribution system
- does require a large amount of disk space, and considerable modem time.
- Several of the groups are of interest to readers of this column; these are the
- source distribution groups. Generally, these groups congregate under the comp
- hierarchy in a collection called sources, thus the names comp.sources.unix and
- comp.sources.misc. Originally, comp.sources.unix released Freely Distributable
- sources that were designed to run on UNIX systems. Many of the sources posted
- there now also run on personal computers and other operating systems, but all
- can run on UNIX. Some of the other sources groups in the comp hierarchy
- currently are: amiga -- for software specific to Amiga systems, atari.st --
- for Atari-specific software, games -- restricted to game software (and game
- software is also restricted to this group), mac -- for Apple Macintoshes, misc --
- general software, not necessarily for UNIX systems, reviewed -- a
- peer-reviewed software source code group, sun -- software specifically for Sun
- Microsystems workstations, and x -- software for the X windowing system.
-
- Authors submit their sources to a moderator, who is the only person allowed to
- post to the group. The moderator bundles the sources for distribution, checks
- that they are complete, and posts them. The moderator also assigns Volume and
- Issue numbers to each of the postings. A submission might require several
- issues, because an issue is limited to 60K to 100K bytes. The moderator also
- posts periodic indices of the sources posted to his group. Unlike some groups,
- moderated groups post no discussions of software. Moderated groups post only
- software. This restriction gives moderated groups what is called a high
- "signal-to-noise ratio."
- Because of the high "signal to noise ratio" in these groups, some computer
- sites around the world save the sources for future access. These sites are
- called archive sites. Each archive site decides on its own what groups to
- archive and for how long to keep the archives. It's to these archive sites I
- refer you to obtain the sources mentioned in the column. Why to the archive
- sites? Because the individual members of USENET, unless they archive these
- groups, will have deleted the sources to make room for the newer postings,
- usually within a week of the original posting.
-
-
- 60M a Day, How Can This Work?
-
-
- A small bit of simple math applied to the Network News volumes yields some
- impressive numbers. If your computer exchanges network news with two neighbors
- (a small site), you are receiving 60MB a day for a full feed, and sending that
- 60MB each day to the second site. As a participating site in Network News
- broadcasts, you send any article you get from one site to all other sites you
- are connected to that have not already received the article. Now, moving those
- 120MB per day (60MB in, 60MB out) on a 9600 bps phone line (960 characters per
- second maximum speed) requires approximately 36.4 hours on the phone per day.
- That is not possible; you will fall behind very quickly. The solution is to
- send articles in compressed batches. Compression reduces the batch sizes by
- about 50-70%, cutting those 36.4 hours to 12-18 hours. Still a big phone bill.
- How do sites cut that down even further? Most big sites run special modems,
- such as V.32bis/V.42bis (at 19,200bps) or Telebit WorldBlazers (at 2,250cps),
- which cut transmission time down by another factor of two, to about 6-9 hours
- per day.
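- The connect-time figures come from dividing the bytes to move by the line
- speed. A minimal sketch of that arithmetic, assuming binary megabytes
- (1,048,576 bytes), which is the unit that reproduces the 36.4-hour figure:

```cpp
// One binary megabyte; this choice reproduces the column's 36.4 hours.
const double MEGABYTE = 1024.0 * 1024.0;

// Hours of connect time to move `bytes` over a line sustaining `cps`
// characters per second, when compression leaves `remaining` of the data
// (1.0 = uncompressed, 0.5 = half the original size).
double connect_hours(double bytes, double cps, double remaining = 1.0)
{
    return bytes * remaining / cps / 3600.0;
}
```

- connect_hours(120 * MEGABYTE, 960) is about 36.4 hours; with compression
- halving the data, connect_hours(120 * MEGABYTE, 960, 0.5) is about 18.2
- hours, the upper end of the range quoted above.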
- A major site might exchange news with 20 or more neighbors. How can they do
- that? Several ways -- one is via a whole bank of modems. Another way to send
- data to so many sites is to only exchange partial feeds of selections from the
- list of 8,000 newsgroups. And the last way is via the high speeds offered by
- the Internet.
-
-
- Getting Software
-
-
-
-
- Network News Software
-
-
- There are now two current Network News transport software suites: C News and
- INN.
- The current version of the traditional Network News transport software is
- named C News, not because it is written in C, but because it follows A News
- and B News as the third rewrite of the transport software. C News supports
- transfer of the news articles (the individual messages) between every member
- of the USENET network. C News works best for smaller sites that mostly have
- UUCP feeds, and that feed a limited set of neighbors.
- Larger sites, especially those with Internet connections, generally run
- InterNetNews (INN) from Rich Salz. INN is largely responsible for
- cutting down the time it takes for an article to propagate throughout the
- backbone networks. Whereas C News uses batching to distribute articles in
- bulk, INN uses an immediate transfer to its NNTP (TCP/IP based network news
- protocol) neighbors. Thus an article on the Internet now reaches most of the
- backbone and regional network sites in only one to five minutes. (Just three
- years ago this delay was close to a day).
-
-
- Getting Software Mentioned in This Column
-
-
- Since individual sites keep news articles online for a short period of time
- (usually less than two weeks), by the time a piece of software appears in this
- column, it will long since have expired and been deleted. Thus you must
- access a news archive site. Many sites around the country have agreed to
- archive specific news groups. These sites are listed in the comp.archives news
- group. Many of the archive sites also identify themselves in their USENET
- Mapping Project map entry. I have even listed some in this column. How you
- access the archives depends on where they are, and how that site has set up
- access. Most archives allow either FTP or UUCP access and a few even allow
- both.
- If a site supports FTP access, you need to be on the Internet to access it.
- FTP allows you to open up a direct connection to the FTP server on a remote
- system and transfer the files directly to your system. FTP will prompt for a
- user name and optionally a password. Most FTP archive sites allow you to enter
- a user name of "anonymous." If such a site then prompts for a password, any
- password will work, but convention and courtesy dictate that you use your name
- and site address for the password.
- If a site supports UUCP access, anyone with UUCP can access the archives. Most
- sites of this type publish a sample entry for the Systems file (L.sys) showing
- the system name, phone number of their modems, the connection speeds
- supported, and the login sequence. Using the uucp command you can poll the
- system directly and retrieve the software. Many sites post times-of-day
- restrictions on when you should access the modems. Courtesy dictates that you
- follow their requests, and some sites enforce the limit with programs. Be sure
- to call far enough before the end of the period to complete your transfer in
- time.
- A third transfer method, used for smaller files, is through access to an
- electronic-mail-based archive server. In this method, you send an electronic
- mail message to the archive server's mailbox name specifying the files you
- wish. The server will return the files to you via electronic mail. Remember
- that many sites limit the size of a single mail message, so don't ask for too
- much at once. Also remember that the archive server is a program, so phrase
- your request exactly as specified in the instructions for that archive server,
- and limit your message to exactly that request. Other comments in the message
- could confuse the program and make it fail to honor your request.
- Lastly, if your site is not connected to any network, some archive sites will
- copy the software onto media for you, if you send them a disk or tape along
- with return postage and a mailer. Other sites sell media with the software
- already copied onto it. This practice is especially useful for the largest
- distributions, such as the X windowing system, which spans multiple tapes.
- If you don't have Internet access, but subscribe to UUNET, UUNET will retrieve
- the files via FTP for you and make them available for UUCP access.
-
-
- What to Retrieve
-
-
- When I list a package from the newsgroups, I provide five pieces of
- information for each package: the Volume number, Issue number(s), archive
- name, the contributor's name, and the contributor's electronic mail address.
- The Volume and Issue are specifically named in the listing. The archive name
- is in italics, and the contributor's name is followed by his or her electronic
- mail address, enclosed in <>'s.
- To locate a package via WAIS or archie, use the archive name. The archive name
- is the short, one-word name in italics given with each listing. To find the
- file at an archive site, use the group name (from the section of the column
- you are reading -- I place all listings for each group together in the
- column), the volume number, and the archive name. Most archive sites store the
- postings as group/volume/archive-name. The issue numbers tell you how many
- parts the package was split into when posted. You can use issue numbers to be
- sure to get all of the parts.
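- Because most archive sites follow that group/volume/archive-name layout, the
- directory to look in can be built mechanically. A small sketch, assuming the
- volume directory is named "volume" followed by the number, as in the
- comp.sources.unix/volume22/elm2.3 example in this column; individual sites
- may differ:

```cpp
#include <string>

// Build the conventional archive path: group/volumeNN/archive-name.
std::string archive_path(const std::string& group, int volume,
                         const std::string& archive_name)
{
    return group + "/volume" + std::to_string(volume) + "/" + archive_name;
}
```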
- In addition, I report on patches to prior postings. These patches also include
- the volume numbers, issue(s) numbers, archive name, the contributor's name,
- and the contributor's electronic mail address. Patches are stored differently
- by different archive sites. Some sites store patches along with the original
- volume/archive name of the master posting. Some sites store them by the
- volume/archive name of the patch itself. The archive name listed is the same
- for both the patch and the original posting.
- Alt.sources, being unmoderated, does not have volume and issue numbers. So I
- report on the date in the "Date:" header of the posting and the number of
- parts in which it appeared. If the posting was assigned an archive name by the
- contributor, I also report on that archive name. Archive sites for alt.sources
- are harder to find, but they usually store things by the archive name.
-
-
- Where to Retrieve Listings
-
-
- The problem then, is finding out which sites archive which groups, and how to
- access these archives. I again refer to the articles by Jonathan I. Kamens of
- the Massachusetts Institute of Technology, posted to comp.sources.wanted and
- news.answers. These articles appear weekly and explain how to find sources.
- As a quick review, here are the steps:
- I. Figure out in what group, Volume, and Issue(s) the posting appeared. Also
- try to determine its archive name. If you know these items, it's usually easy
- to find an archive site that archives that group. Most archive sites keep
- their information in a hierarchy, ordered first on the group, then on the
- volume number, and last on the archive name. These specifications together
- usually make up a directory path, as in
- comp.sources.unix/volume22/elm2.3
- In that directory you will find all of the articles that made up the 2.3
- release of the Elm Mail User Agent that was posted in Volume 22 of the
- comp.sources.unix newsgroup. If you do not know the archive name, but do know
- the volume, each volume also has an Index file that you can retrieve and read
- to determine the archive name. UUNET is one common publicly accessible archive
- site for each of the moderated groups mentioned in this article.
- II. If you do not know which sites archive the groups, or even if any site is
- archiving a particular item (because they are not archiving the entire group),
- consult Archie. (See "On the Networks," CUJ August 1991, Vol. 9, No. 8).
- Archie is a mail response program that tries to keep track of sites reachable
- via FTP that have sources available for distribution. Even if you cannot
- access the archive site directly via FTP, it is worth knowing that the archive
- site exists because there are other ways of retrieving sources available only
- via FTP. Archie can help you find out if the archive site exists, and where.
- III. If you know the name of the program, but do not know what group it was
- posted in, try using Archie and search for the program based on the name.
- Since most sites store the archives by group and volume, the information
- returned will tell you what newsgroup and volume it was posted in. Then you
- can retrieve the item from any archive site for that newsgroup.
- IV. If you do not even know the name, but know you are looking for source code
- that performs some function, retrieve the indexes for each of the newsgroups
- and see if any of the entries (usually listed as the archive name and a short
- description of the function) look reasonable. If so, try those. Or, make a
- query to Archie based on some keywords from the function of the software, and
- perhaps it can find items that match.
-
-
- CD-ROM Archives
-
-
-
- CD-ROMs containing USENET-posted sources, as well as other sources, are also
- available. Two of the larger publishers are Walnut Creek CD-ROM and Prime Time
- Freeware.
- Walnut Creek CD-ROM, 1547 Palos Verdes Mall, Suite 260, Walnut Creek, CA (800)
- 786-9907 or (510) 947-5996 publishes several CD-ROMs each year. Published
- software includes the Simtel20 MS-DOS Archive, the X and GNU archives,
- MS-Windows sources, and other collections of sources and binaries. Disks run
- from $25 to $60 each (varying by title) plus shipping. In addition, Walnut
- Creek offers those hard to find CD-caddys at reasonable prices.
- Prime Time Freeware, 370 Altair Way, Suite 150,
- Sunnyvale, CA 94086, (408) 738-4832, <ptf@cfcl.com>, publishes, twice a year, a
- collection of Freely Distributable source code, including the complete USENET
- archives. Prime Time's disks run about $60 each set plus shipping. The latest
- issue, 1993, has over 3 Gb of source code spread over two disks. Prime Time
- also offers a standing subscription plan at a discount.
-
-
- Conclusion
-
-
- I hope this special edition of my column has given you a hint as to how to
- read my column and track down the sources. Note: I have been asked many times
- if I can make floppies or tapes containing the software mentioned in my
- column. I cannot spare the time to do this. I also have to work (and teach)
- for a living, and if I started doing this, I could easily spend all my time
- trying to fulfill the requests and never get any of my work done.
- However, what I have offered to do in the past, and am still willing to do, is
- provide a list of USENET sites in your area code. Send me a self-addressed,
- stamped envelope (my address is in the bio squib attached to this column).
- Those living in major metropolitan areas, please include two stamps on your
- letter. Note: I can offer this service only for US area codes. If you have net
- access but need a news neighbor, I will also reply to electronic mail asking
- for nearby news sites.
-
-
-
-
-
-
-
- CUG New Releases
-
-
- C Exploration Tools, GNU Indent, and Miscellaneous Goodies
-
-
-
-
- Victor R. Volkman
-
-
- Victor R. Volkman received a BS in Computer Science from Michigan
- Technological University. He has been a frequent contributor to The C Users
- Journal since 1987. He is currently employed as Senior Analyst at H.C.I.A. of
- Ann Arbor, Michigan. He can be reached by dial-in at the HAL 9000 BBS (313)
- 663-4173 or by Usenet mail to sysop@hal9k.com.
-
-
-
-
- Introduction
-
-
- This month's new offerings actually spread five new releases across three CUG
- volumes. Generally, a single CUG volume covers one specific library or
- programming tool. A CUG volume typically consists of one to ten
- individual diskettes. However, some releases are far too small to fill an
- entire diskette. These releases often have titles like
- "Miscellany #x," reflecting the eclectic nature of the submissions. For small
- releases, the CUG Library will cluster several of them on the same diskette
- (and hence the same volume). This practice both conserves resources and
- provides more value to the end users of the CUG Library.
-
-
- New Library Acquisitions
-
-
-
-
- C/C++ Exploration Tools v2.12: CUG #391
-
-
- C/C++ Exploration Tools, by Juergen Mueller (Kornwestheim, Germany), includes
- both his C Function Tree Generator (CFT) and the C Structure Tree Generator
- (CST). CFT and CST analyze the C/C++ source code of applications of any size,
- over multiple files. CFT and CST are useful for exploring new, unknown
- software and for supporting reuse, maintenance, and reengineering of
- software. By preprocessing, scanning, and analyzing the program source code,
- these programs generate the function call hierarchy (CFT) and the data
- structure/class (CST) relations. While both programs can handle C and C++
- code, CFT can also analyze assembly language code.
- An important feature of this product is its database generation, which allows
- you to recall code information without reprocessing the source. You can read
- this database again from CFT and CST to produce different outputs or to add
- new files to the database. The database format is dBASE compatible. Special
- recall programs called CFTN and CSTN perform fast searching for items in the
- database. These programs can be used within any environment, for example, from
- within editors like BRIEF, QEDIT, or MicroEMACS (DOS and Windows versions), to
- provide a full software project management system with access to all functions
- and data types with just a keystroke. This feature makes a comfortable
- "hypertext source code browser and locator" system out of the editor.
- The documentation is supplied in ASCII and includes 63 pages of reference
- material. The manual is written in plain English and should be accessible
- even to novice C programmers, although it discusses advanced
- techniques.
- The C Exploration Tools v2.12 (released 07/03/93) are immediately available as
- MS-DOS executables on CUG volume #391. The C Exploration Tools are shareware
- and require registration with the author if you decide to use them beyond the
- 30-day evaluation period. The registration price is $46 U.S. or 60 DM for a
- single copy. Generous site license discounts, with prices as low as $15, are
- available for corporate users and educational institutions. Registered users
- automatically receive Protected Mode versions of the tools optimized for the
- 80386 and the latest versions of everything. Source code for the C Exploration
- Tools is not available.
-
-
- GNU Indent v1.8: CUG #392
-
-
- GNU Indent, from Joseph Arceneaux (San Francisco, CA), becomes the newest
- installment of high-quality tools from the GNU project. The Indent program
- changes the appearance of a C program by inserting or deleting whitespace. The
- Indent program can make code easier to read. It can also convert C code from
- one writing style to another. Indent understands a substantial amount about
- the syntax of C, but it also attempts to cope with incomplete and malformed
- syntax. Indent can replace the original source .C file and retain a backup
- copy or else write its output to a new .C file.
- There are several common styles of C code, including the GNU style, the
- Kernighan & Ritchie style, and the original Berkeley style. You may select a
- style with a single "background" option, which specifies a set of values for
- all other options. However, explicitly specified options always override
- options implied by a background option. Thus, you can create hybrid styles or
- a new coding style uniquely your own by combining the many option settings.
- Option settings cover many things that programmers regularly spar about, such
- as:
- Placement of blank lines, braces, and comments
- Special handling for braces around if/then/else constructs
- Spacing around casts and sizeof
- Overall number of spaces per indentation level
- Alignment of parentheses on continuation lines
- All aspects of function declaration layout
- Each option can be specified in short form or long form. For example, the
- short form option "-ncdb" can also be entered as
- "--no-comment-delimiters-on-blank-lines."
- GNU Indent supports MS-DOS, OS/2, VAX VMS, and most versions of UNIX. For
- UNIX versions, Indent includes the popular GNU auto-configuration utility,
- which customizes the Makefile to meet the needs of your system. The CUG
- Library distribution includes source code only. GNU Indent v1.8 (released
- 06/16/93) is immediately available as CUG volume #392.
-
-
- LL, GIFSave, and Cordic++: CUG #393
-
-
- As you might have guessed from the introduction, this volume is something of a
- C potpourri. George Matas (University of Surrey, U.K.) presents LL, a
- generic doubly linked list library with examples. Sverre H. Huseby (Oslo,
- Norway) contributes GIFSave to save bitmaps in the popular GIF image file format.
- Last, Cordic++ by Timothy M. Farnum (Rochester, NY) builds on Michael
- Bertrand's C implementation of fast trigonometric functions. Altogether, these
- are three very useful and specialized tools for common C problems. The entire
- set fits on just one diskette. This diskette is immediately available as CUG
- volume #393.
-
-
-
- LL: CUG#393A
-
-
- LL is a doubly linked list handler library with more than four dozen operator
- functions. Variables of any type can be stored in an LL list. Individual
- elements of a list may be of different types. With LL, you can create lists of
- lists nested to any depth. You create an instance of a list using the ConsLL,
- ConsCopyLL, or ConsPtrLL function. It's best to call one of these functions
- at the point of a list variable's declaration. You must assign the result of
- one of the constructor functions to a given list instance before passing it to
- any other function in the LL library.
- ConsLL creates an empty list. ConsCopyLL(src) creates a new copy of an
- existing list. ConsPtrLL(src) creates a list of pointers to elements stored in
- list src. DestLL(list) destroys a list; that is, it deletes all elements and
- frees all memory allocated for list. DestLL should be called at the point where
- list goes out of scope.
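- The construct-at-declaration, destroy-at-scope-exit discipline described above
- can be sketched in a few lines of C. This is a minimal sketch, not LL's code:
- the names dl_list, dl_make, dl_append, and dl_destroy are hypothetical, and the
- sketch assumes a C99 compiler. It stores a byte copy of each element together
- with its size, which is one way to let elements of different types share a list.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; this is NOT the LL library's API. */
typedef struct dl_node {
    struct dl_node *prev, *next;
    size_t size;            /* size in bytes of the payload below */
    unsigned char data[];   /* C99 flexible array member: copy of the element */
} dl_node;

typedef struct {
    dl_node *head, *tail;
    size_t count;
} dl_list;

/* "Constructor": call where the list variable is declared. */
dl_list dl_make(void)
{
    dl_list l = { NULL, NULL, 0 };
    return l;
}

/* Append a copy of `size` bytes at `elem`; returns 0 on success, -1 if out of memory. */
int dl_append(dl_list *l, const void *elem, size_t size)
{
    dl_node *n = malloc(sizeof *n + size);
    if (n == NULL)
        return -1;
    memcpy(n->data, elem, size);
    n->size = size;
    n->next = NULL;
    n->prev = l->tail;          /* link back to the old tail */
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;            /* list was empty */
    l->tail = n;
    l->count++;
    return 0;
}

/* "Destructor": free every node; call where the list goes out of scope. */
void dl_destroy(dl_list *l)
{
    dl_node *n = l->head;
    while (n != NULL) {
        dl_node *next = n->next;
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
    l->count = 0;
}
```

- Elements are read back with memcpy from node->data, which sidesteps any
- alignment concerns about the flexible array member.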
- LL has been tested only on SUN Sparcstations and DEC Ultrix machines. In these
- environments, LL works with both the native "cc" compiler as well as GNU C
- ("gcc"). The CUG Library distribution includes source only.
-
-
- GIFSave: CUG#393B
-
-
- The GIFSAVE library enables you to save GIF images from your own
- graphics-producing C programs. GIFSAVE creates simple GIF files following the
- GIF87a standard. You can't create GIF files from interlaced images, and you
- should store only one image per file.
- GIFSAVE consists of four functions, all declared in GIFSAVE.H:
- GIF_Create creates new GIF files. GIF_Create takes parameters specifying the
- filename, screen size, number of colors, and color resolution.
- GIF_SetColor sets up the red, green, and blue color components. It should be
- called once for each possible color.
- GIF_CompressImage compresses an image. GIF_CompressImage accepts parameters
- describing the position and size of the image on screen, and a user-defined
- callback function that fetches the pixel values.
- GIF_Close terminates and closes a GIF file.
- You should call these functions in the listed order for each GIF file. You
- must close one file before you create a new one. To use these functions, you
- must create a callback function that will retrieve the pixel values for each
- point in the image.
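- The callback arrangement described above can be sketched generically. This is
- a hypothetical illustration, not GIFSAVE's declarations: the names
- pixel_fetch_fn, fetch_rect, and checkerboard are our own, and fetch_rect merely
- copies pixels into a buffer where a real compressor would encode them. It shows
- how the library, not the caller, drives the traversal, asking the callback for
- one color index at a time in non-interlaced (top-to-bottom) order.

```c
#include <stddef.h>

/* Hypothetical callback type: return the color index at (x, y).
   Not GIFSAVE's actual declaration. */
typedef int (*pixel_fetch_fn)(int x, int y);

/* Stand-in for a compressor: visit the rectangle left to right, top to
   bottom, and store each color index in out (width * height bytes). */
void fetch_rect(int left, int top, int width, int height,
                pixel_fetch_fn get_pixel, unsigned char *out)
{
    int x, y;
    size_t k = 0;

    for (y = top; y < top + height; ++y)
        for (x = left; x < left + width; ++x)
            out[k++] = (unsigned char)get_pixel(x, y);
}

/* Example callback: a two-color checkerboard of 8 x 8 pixel cells. */
int checkerboard(int x, int y)
{
    return ((x / 8) + (y / 8)) % 2;
}
```

- The callback design means the program can keep its image in any form it likes,
- a screen buffer, a bitmap in memory, or a function that computes pixels on the
- fly, without converting it to a layout the library dictates.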
- GIFSAVE includes a makefile for use with Borland C/C++. Huseby claims he has
- taken care to ensure that all byte-order-sensitive operations are handled in
- a platform-independent way. Therefore, the source code should work without
- modification on platforms other than MS-DOS.
- The Graphics Interchange Format (C) is the copyright property of CompuServe
- Incorporated. GIF is a Service Mark property of CompuServe Incorporated.
- Huseby has released GIFSAVE as Public Domain source code with no restrictions
- on its use.
-
-
- CORDIC++: CUG#393C
-
-
- The Coordinate Rotational Digital Computer (CORDIC) was an early device
- implementing fast integer sine and cosine calculations. By favoring integer
- operations over floating point, CORDIC demonstrated a classic computing
- tradeoff of speed vs. precision. Although the CORDIC algorithm was first
- documented by Jack E. Volder in 1959, most CUJ readers may remember Michael
- Bertrand's C implementation (see "The CORDIC Method for Faster sin and cos
- Calculations," CUJ, November 1992). Farnum presents his own reimplementation
- of Bertrand's code -- this time as a full C++ class.
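- For readers who have not seen the method, the core rotation loop fits in a few
- lines of C. This is a minimal sketch, not Bertrand's or Farnum's code: it
- assumes Q16.16 fixed-point values (1.0 stored as 65536), 16 iterations, and
- angles limited to plus or minus pi/2 radians; the names cordic_sincos and
- shift_down and the rounded table values are our own. Each step rotates the
- vector (x, y) by plus or minus atan(2^-i) using only shifts and adds, driving
- the residual angle z toward zero. Starting x at the CORDIC gain constant (about
- 0.6073) cancels the growth in vector length that the unnormalized rotations cause.

```c
#include <stdint.h>

/* Q16.16 fixed point: 1.0 is represented as 65536. */
#define CORDIC_ITERS 16

/* atan(2^-i) in Q16.16 radians for i = 0..15, precomputed and rounded. */
static const int32_t cordic_atan_tab[CORDIC_ITERS] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2
};

/* Product of cos(atan(2^-i)) for i = 0..15 (about 0.6072529) in Q16.16. */
#define CORDIC_GAIN 39797

/* Divide by 2^i, rounding toward zero. Avoids right-shifting negative
   signed values, which is implementation-defined in C. */
static int32_t shift_down(int32_t v, int i)
{
    return v >= 0 ? (v >> i) : -((-v) >> i);
}

/* Rotation-mode CORDIC. angle is in Q16.16 radians and must satisfy
   |angle| <= pi/2 (about 102944). Writes cos and sin in Q16.16. */
void cordic_sincos(int32_t angle, int32_t *cos_out, int32_t *sin_out)
{
    int32_t x = CORDIC_GAIN;  /* pre-scaled so the final length is 1.0 */
    int32_t y = 0;
    int32_t z = angle;        /* angle still left to rotate through */
    int i;

    for (i = 0; i < CORDIC_ITERS; ++i) {
        int32_t dx = shift_down(y, i);
        int32_t dy = shift_down(x, i);
        if (z >= 0) {         /* rotate counterclockwise by atan(2^-i) */
            x -= dx;
            y += dy;
            z -= cordic_atan_tab[i];
        } else {              /* rotate clockwise by atan(2^-i) */
            x += dx;
            y -= dy;
            z += cordic_atan_tab[i];
        }
    }
    *cos_out = x;
    *sin_out = y;
}
```

- Calling cordic_sincos(34315, &c, &s), where 34315 is pi/6 in Q16.16, should
- leave s near 32768 (0.5) and c near 56756 (0.866), off by at most a few dozen
- units in the last place because of truncation in the shifts.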
- According to Farnum, the most notable change in his C++ version is his
- encapsulation of variables which were global in the C version. As static
- member variables of a CORDIC class, these encapsulated variables are
- protected from accidental modification by routines unaware of them. Moving
- these variables inside a class structure makes it necessary to develop
- interface routines. Farnum decided to predefine one instance of the CORDIC
- class, cord, to access the member functions which compute the integer sine and
- cosine. You could create other instances of class cordic that would work as
- well as the predefined instance, but there is little advantage to this in the
- current implementation.
- Ambitious readers could build upon this implementation by providing the
- ability to instantiate the CORDIC class with different levels of accuracy. For
- example, a programmer could provide different levels of accuracy by using
- different bases for the CORDIC algorithm. Farnum decided against this approach
- because the complexity of such an implementation seemed to go against the
- straightforwardness which is the main advantage of the CORDIC algorithm.
-
-
-
-
-
-
-
- Editor's Forum
- I'm back to being optimistic again, though heaven knows why. I write these
- words shortly after returning from the November 1993 joint meeting of ANSI
- X3J16 and ISO SC22/WG21, the folks standardizing C++. It's not as if I got
- everything I wanted at that meeting -- far from it. But the meeting
- nevertheless represents an important watershed. The public will soon be able
- to get its hands on a reasonably complete, albeit informal, draft of the C++
- standard.
- Four years in the making, produced by a cast of hundreds, this
- mini-spectacular will be in its most complete form ever by about the end of
- January 1994. It will debut as a draft document circulated within ISO SC22,
- the parent committee for all international programming language standards.
- Whatever reviews it garners from its first public exposure will be unofficial
- -- sort of like trying out a Broadway play in Providence or Boston -- but the
- early feedback will be important nonetheless.
- The pressure to produce this informal draft document (it was actually promised
- by December 1993) has been salutary. Andy Koenig, the new project editor, has
- cheerfully divvied up the rather large backlog of unincorporated edits to the
- Working Draft. I won the questionable honor of updating my earlier draft of
- the library portion (see my Editor's Forum, CUJ October 1993) to incorporate
- the latest additions to the C++ library. All this frenzy comes to a head by
- mid January, when Andy and a few friends bring all the bits together in an
- editing bee held at AT&T Bell Labs.
- Pasting the entire draft together for the first time is an important
- milestone, but it is far from the last one. You can bet that changes will be
- adopted at the March 1994 meeting in San Diego CA, and at the July 1994
- meeting in Waterloo ON. The push is on to incorporate all the remaining
- proposed additions before the formal public review process commences. And that
- is supposed to happen for the draft produced after the Waterloo meeting. From
- that point on, you can expect tweaks of steadily diminishing inventiveness, in
- response to public commentary. The process will still take years to result in
- a formal standard.
- So why am I optimistic? Because the C Standard went out for informal public
- review in April 1985 and had an immediate stabilizing effect on the entire
- community. It also had a stabilizing effect on the committee itself. People
- felt they knew what Standard C was from that date onward, and they weren't
- keen to see major changes. (In fact, the formal adoption of the C Standard in
- 1989 was an anticlimax.) C++ is much bigger and fuzzier than C, at the
- comparable point of development, but I hope it enjoys similar stability soon.
- That's cause enough for optimism.
- P.J. Plauger
- pjp@plauger.com
-
-
-
-
-
-
-
- New Products
-
-
- Industry-Related News & Announcements
-
-
-
-
- StratosWare Releases MemCheck v3.0 Professional for DOS
-
-
- StratosWare Corporation has released MemCheck v3.0 Professional for DOS, an
- error detection and prevention tool for C/C++ programmers. MemCheck v3.0
- Professional detects memory overwrites, memory underwrites, memory leaks, heap
- corruption, stack overwrites, out-of-memory conditions, and other memory
- errors without requiring any source code changes.
- MemCheck v3.0 Professional for DOS checks for memory overwrites and memory
- leaks in all parts of the host application, including third-party vendor
- libraries, with or without source code. Other features include detection of
- stack and static variable ovewrites, and a mouse-driven MemCheck control
- panel. MemCheck v3.0 requires no debugging information and can be used in
- combination with debuggers such as CodeView, Turbo Debugger, and WVIDEO.
- MemCheck v3.0 uses proprietary storage technology which, according to the
- company, makes it run "five to fifty times faster than any other,
- similar-featured debugger."
- MemCheck v3.0 integrates with existing C/C++ code and pops up at run time to
- identify errors by exact file and line number in the source code, or by
- functions called back to main if the error is found in object code. MemCheck
- v3.0 can be linked in or compiled with no source code changes. MemCheck v3.0
- can be switched on or off at run time, and linked out via a "Production"
- library. MemCheck v3.0 Professional for DOS is available for the Microsoft,
- Borland, Intel, Watcom, and MetaWare compilers for $139. For more information
- contact StratosWare Corporation, 1756 Plymouth Rd., Suite 1500, Ann Arbor,
- MI 48105, (800) 933-3284 or (313) 996-2944; FAX: (313) 747-8519.
-
-
- Ready-to-Run Introduces C/C++ Development System for SCO UNIX and Xenix
-
-
- Ready-to-Run Software, Inc. has introduced a C/C++ development system for SCO
- UNIX and SCO Xenix systems. This extension to Ready-to-Run's LanguagePak#1
- lets users of SCO and compatible UNIX systems build software independent of
- proprietary development systems. LanguagePak#1 is based on the GNU compiler,
- so ANSI and POSIX-compliant compilers and libraries are available to
- developers. The LanguagePak#1 includes: a set of C libraries, a curses
- library, a Termcap library and associated header files, and GNU C and C++
- compilers. The package also includes the GNU C++ Class Library, and the GNU
- gdb debugger. LanguagePak#1 is available both as executable files and as
- source code.
- The LanguagePak#1 is one of Ready-to-Run's ReadyPak line of software products,
- which are built into finished, ready-to-run versions for specific UNIX
- environments, eliminating the need for a find/download/build/test/install
- process on the user's part. The LanguagePak#1 is $275. For more information
- contact Ready-to-Run, Rustic Trail, Groton, MA 01450, (508) 448-3959 or (800)
- 743-1723; FAX: (508) 448-2989; e-mail: info@rtr.com.
-
-
- AIB Upgrades SENTINEL
-
-
- AIB Software, Inc., formerly Virtual Technologies, has upgraded its SENTINEL
- debugging environment for C/C++ and X-Window developers. SENTINEL v2.0 has
- added a graphical user interface and will be integrated with Hewlett-Packard's
- SoftBench development environment. SoftBench is a C/C++ programming
- environment built on the framework for open, integrated CASE. SENTINEL is a
- library of routines that can be linked into UNIX C/C++ programs to help
- programmers locate and resolve hidden bugs in the use of dynamic memory.
- Providing run-time verification of pointer usage and dynamic memory
- allocation, SENTINEL traps memory errors, traces the stack, and reports the source
- file, function name, and line number of offending statements. SENTINEL also
- gives developers the same level of information concerning the allocation of
- memory and where the memory was freed or overwritten.
- SENTINEL v2.0 supports HP's SoftBench on HP and Sun platforms, and IBM's
- implementation of the system on the RS/6000. SENTINEL's GUI will be available
- on currently supported platforms.
- Users will have their choice of three licensing options for SENTINEL v2.0:
- host-based licensing, which will let any number of users on the host access
- SENTINEL; floating network licensing which will provide licensed access to any
- one user on a network at one time; and floating registered use licensing which
- will designate one person per site by name as the licensed user. SENTINEL v2.0
- ranges from $595 to $1895, and pricing is platform specific. For more
- information contact AIB Software, Inc., 46030 Manekin Plaza, Suite 160,
- Dulles, VA 20166, (800) 296-3000 or (703) 430-9247; FAX: (703) 450-4560;
- e-mail: info@aib.com.
-
-
- Liant Upgrades C++/Views
-
-
- Liant Software has upgraded its object-oriented development tool C++/Views.
- C++/Views v3.0 is a third-generation, object-oriented development tool that
- combines 100 ready-to-use classes with programmer productivity tools. This
- library includes interface, data, event, printer, and extended GUI classes.
- C++/Views uses the local GUI's toolkit whenever possible, allowing programmers
- to create native GUI applications. Features of C++/Views v3.0 include
- C++/Views Constructor, C++/Views Browser, and geometry management.
- C++/Views Constructor lets developers work visually with the C++/Views class
- library. C++/Views combines a visual interface builder with a browser,
- allowing users to switch between drawing and archiving portable resources and
- editing the code that calls these resources. The C++/Views Interface Builder is a
- WYSIWYG editing tool for designing and testing the behavior of portable
- resources (binary files of GUI objects such as bitmaps, dialogs, and menus).
- Portable resources are called from an application at run time. The same
- resource file can be called from Windows, Motif, Presentation Manager,
- Macintosh, or DOS applications. The C++/Views Constructor also lets users view
- and edit C++ code as they work within Constructor's object-oriented
- environment.
- The C++/Views Browser is a multiple document interface (MDI) application,
- which allows users to open and cut-and-paste among multiple C++ applications;
- view the class hierarchy; edit, inherit, add, and delete classes; and create
- and update header files, make files, and linker response files.
- C++/Views v3.0 also automatically adjusts the geometry of GUI objects so that
- the objects stay in their correct proportion and location when an application
- is moved to other platforms, or when the windows are resized. The API of
- C++/Views v3.0 includes a set of standard interface classes for designing MDI
- application windows. C++/Views v3.0 ranges in price from $149 for an upgrade
- to $2,999 for multi-platform suites. For more information contact Liant
- Software Corporation, 959 Concord St., Framingham, MA 01701, (508) 872-8700;
- FAX: (508) 626-2221.
-
-
- Imperial Software Adds Converter to X-Designer
-
-
- Imperial Software Technology has added a converter to X-Designer, its Motif
- GUI builder. The converter enables X-Designer to convert OPEN LOOK interface
- designs to Motif by reading the .g files used by Sun DevGuide as its savefile
- format and converting them to the equivalent .xd files used by X-Designer. The
- .xd files can then be loaded into X-Designer to display the Motif interface.
- The user can make final adjustments or polish the Motif interface using
- X-Designer before generating the interface code in C, C++ or UIL.
- The OPEN LOOK to Motif converter upgrade is available to existing X-Designer
- users on Sun under their maintenance agreement, and will ship to all new Sun
- X-Designer customers as part of the standard product. The converter runs on
- SunOS 4 and SPARC Solaris 2. For more information contact Imperial Software
- Technology, 95 London St., Reading, Berks RG1 4QA, United Kingdom,
- +44-734-587055; FAX: +44-734-589005. U.S. distributor for X-Designer is the
- V.I. Corporation, Northampton, MA, (800) 732-3200.
-
-
- Lucid Ships Energize v2.1
-
-
- Lucid, Inc. has begun shipping Energize v2.1, an upgrade to Energize, its
- integrated, incremental programming system for C/C++. Energize supports a set
- of tools for code construction, editing, compiling, debugging, building
- facilities, and a set of browsers for code understanding and reverse
- engineering.
- Features of Energize v2.1 include: C++ templates support; pre-compiled headers
- capability; source control system integration; syntax highlighting features;
- and Makefile integration. Other features include: menu-driven integration with
- source control systems, including SCCS, RCS, and CVS; support for the use of
- Makefiles with a shell utility; and syntax highlighting features that allow
- users to assign colors and fonts to syntactic constructs in source files.
- Three compiler options have been added for cfront compatibility.
- Lucid also announced the integration of Energize v2.1 with Pure Software's
- performance analysis software, Quantify. With Quantify, programmers can
- analyze the performance characteristics of their projects and identify
- bottlenecks in their code.
- Energize v2.1 is available for Sun SPARCstations and compatibles running SunOS
- v4.x (Solaris v1.0). Energize v2.1 is $16,250 for a workgroup of five or
- $29,500 for a workgroup of 10. Volume discounts and site licenses are also
- available. For more information contact Lucid, Inc., 707 Laurel St., Menlo
- Park, CA 94025, (415) 329-8400; FAX: (415) 329-8480.
-
-
- IDE Ships C++DE on IBM RS/6000
-
-
-
- Interactive Development Environments, Inc. (IDE) has begun shipping Software
- through Pictures C++ Development Environment (C++DE) designed for the IBM
- RS/6000 workstations and servers. C++DE is an integrated, multi-user,
- object-oriented software development tool that supports design components.
- The C++DE for the RS/6000 integrates two IDE graphical editors supporting
- architectural and detailed design, an interactive reuse browser, and code
- generator for C++ with IBM's C++ POWERbench programming environment. The
- POWERbench integration uses IBM's SDE WorkBench/6000 to facilitate
- communication between the tools. Interfaces to FrameMaker or Interleaf
- technical publishing systems are also available through IDE's shared
- repository.
- C++DE's graphical editors are driven by Object-Oriented Structured Design
- (OOSD) notation which specifies the drawing and connection rules. C++DE
- supports a C++ specific instantiation of the OOSD notation that provides an
- abstract representation of the language.
- The two graphical design editor components of the C++ Development Environment
- are available on the RS/6000 separately or as an integrated editor set. A
- C++DE five seat Success Package is $75,000 which includes both editors,
- training, consulting, technical support and one year of maintenance.
- Individual C++DE editor sets range from $4,000 to $13,000 per user. For more
- information contact Interactive Development Environments, Inc., 595 Market
- St., 10th Floor, San Francisco, CA 94105, (800) 888-4331; FAX: (415) 543-0145.
-
-
- XVT Announces XVT-DS/XVT-DS++ and PowerObjects
-
-
- XVT Software Inc. has announced XVT Development Solution for C (XVT-DS) and
- XVT Development Solution for C++ (XVT-DS++). These products repackage XVT's
- software to emphasize visual layout and prototyping of portable graphical user
- interfaces, for C and C++ cross-platform applications. New functions in the
- tools address issues such as supporting multiple platforms, screen sizes, and
- handling different languages. XVT-DS bundles XVT-Design v3.0 with the XVT
- Portability Toolkit v4.0. (XVT-DS++ is similar bundling for C++.)
- XVT-Design new features include: GUI Object Palette and Layout Toolbar, Bitmap
- Editor, and GUI Object Browser. XVT Portability Toolkit enhancements include:
- geometry management, portable bitmaps, internationalization, Help System, and
- custom control enhancements.
- XVT also announced PowerObjects, custom controls that a developer can
- incorporate into a user interface, complementing XVT's GUI development tools.
- PowerObjects include: a table object, a spreadsheet object, a toggle/picture
- button object, a tool bar object, and a status bar. PowerObjects for C was
- scheduled to begin shipment in October 1993, while XVT-PowerObjects for C++ is
- slated for 1994.
- XVT products are priced on a per-developer-seat basis, with no additional user
- licensing or royalties. XVT-DS and XVT-DS++ are $1,950 on personal computers
- and $6,300 on workstations. The PowerObjects library is $395 on personal
- computers and $495 on workstations. For more information contact XVT Software
- Inc., 4900 Pearl East Circle, Boulder, CO 80301, (303) 445-4223; FAX:
- (303) 443-0969.
-
-
- Intel and BSO/Tasking Release ANSI C Compiler Toolkit
-
-
- Intel Corporation and BSO/Tasking have released an ANSI C Compiler Toolkit
- supporting the Intel MCS-96 family of micro-controllers, including 87C196NT
- and 87C196NQ 20-bit extended derivatives. The agreement between Intel and
- BSO/Tasking gave BSO/Tasking a master source code license to Intel's MCS-96
- Software Development Tools. The MCS-96 ANSI C Compiler Toolkit, hosted on
- PC/DOS, also supports a previously released 16-bit version of the 196 family.
- The MCS-96 ANSI C Compiler toolkit consists of an ANSI C Compiler, Macro
- Assembler, Locating Linker, Librarian, an extended Floating Point Library, and
- other utilities operating in a PC/DOS environment. Intel's ApBUILDER and
- ProjectBUILDER programming and development products are also included in the
- toolkit.
- BSO/Tasking is offering a price discount to all existing Intel iC-96 users who
- purchase the MCS-96 ANSI C Compiler Toolkit through March 1994. For more
- information contact Boston Systems Office/Tasking, Norfolk Place, 333 Elm St.,
- Dedham, MA 02026, (617) 320-9400; FAX: (617) 320-9212.
-
-
- McCabe Announces McCabe Tool Set v4.0
-
-
- McCabe & Associates have announced McCabe Tool Set v4.0, a reverse engineering
- and testing software tool. McCabe Tool Set v4.0 includes a Data Tool,
- enhancements to CodeBreaker, and a link to desktop publishing packages.
- The Data Tool, which will be incorporated into the core Battlemap tool,
- supports two McCabe data metrics: Global and User-Specified Data Complexity.
- Global Data Complexity measures the cyclomatic complexity of a module's
- structure as it relates to global/parameter data. The User-Specified Data
- Complexity provides the same measures for a portion of the data. Data Tool
- also supports C/C++; shows data slices; highlights modules and creates classes
- of modules containing specified data; displays declaration and references of
- data; allows editing of source code containing data elements; generates
- reports based on the current user specification; and generates global and specified
- data complexity, flow graphs, test paths, and data slices for modules in the
- current search list.
- The CodeBreaker, a tool to find redundant and reusable code, has been paired
- with BattlePlan, McCabe's forward engineering tool. CodeBreaker compares
- module paths, as well as a number of metrics and properties such as names,
- interfaces, design boxes, SPEC notes, and code; generates source code
- templates to match design; identifies existing code that can be reused;
- searches for pre-existing implementation of a design; compares program
- implementation with design; identifies reusable code by comparing a design
- description of what a module should do against physical copies of source code;
- and finds likely candidates of pseudocode that match a set of implemented code
- to reestablish the traceability of the code with its design. CodeBreaker
- includes user-configurable properties, and builds pre-parsed "repositories"
- that can be configured and loaded into CodeBreaker.
- McCabe Tool Set v4.0 can convert tool output to a format that can be imported
- to desktop publishing packages such as Interleaf and FrameMaker. For more
- information contact McCabe & Associates, 5501 Twin Knolls Rd., Suite 111,
- Columbia, MD 21045, (800) 638-6316; FAX: (410) 995-1528.
-
-
- PostModern Releases NetClasses v2.0
-
-
- PostModern Computing Technologies Corporation has released NetClasses v2.0.
- NetClasses is a set of C++ class libraries for distributed object-oriented
- communications. By linking the appropriate NetClasses libraries, application
- programmers can transport objects over a network, set up fault-tolerant
- peer-to-peer TCP connections, and perform Remote Method Invocation (RMI). The
- programmer can use C++ class library abstraction of TCP, UDP, and file I/O
- streams to communicate objects in connection-oriented, connectionless, and
- persistent object application domains.
- Prasad Mokkapati, PostModern's VP of Engineering, describes the changes as
- follows: "In the new release, the agent and classes have been opened up to
- make more information available to the user. We're providing blocking on
- replies, asynchronous and synchronous RMI, and detection of deadlocks and
- stack overflows." Although NetClasses v2.0 drops internal reliance on NIH
- classes, external NIH data is still supported. NetClasses v2.0 includes TCP-
- and UDP-based object transport mechanisms. The TCP and UDP facilities are
- organized as C++ class libraries. For more information contact PostModern
- Computing Technologies, Inc., 1032 Elwell Ct., Suite #240, Palo Alto, CA 94303,
- (415) 967-6169; FAX: (415) 967-6212.
-
-
- Nu-Mega Upgrades BOUNDS-CHECKER for Windows
-
-
- Nu-Mega Technologies, Inc. has upgraded its debugging tool for Windows
- applications. BOUNDS-CHECKER for Windows (BOUNDS-CHECKER/W) v2.0 combines
- Event Logging and Viewing with its automatic bug detection. The most recent
- events, including API calls, window and dialog box messages, hooks, callbacks,
- and ToolHelp notifications, are saved in memory and can be viewed with
- BOUNDS-CHECKER's event log windows. For more analysis, events can be logged to
- a file and viewed with TVIEW, Nu-Mega's trace file viewer. TVIEW uses graphics
- and color to let programmers view the flow of their programs.
- Other features of BOUNDS-CHECKER/W v2.0 include validation of APIs, API return
- value checking, targeted program checking, and a variable and structure
- inspection window. The inspection windows let programmers view data items such
- as arrays, structures, and C++ classes. BOUNDS-CHECKER/W v2.0 also has an
- External Load option that lets programmers select which executable is to be
- checked. BOUNDS-CHECKER/W 2.0 is $249. Upgrades from v1.0 to v2.0 are $69. A
- corporate license program is also available. For more information contact
- Nu-Mega Technologies, Inc., P.O. Box 7780, Nashua, NH 03060, (603) 889-2386;
- FAX: (603) 889-1135.
-
-
- Microsoft Releases Visual C++ 32-bit Edition for Windows NT
-
-
- Microsoft Corporation has released the Microsoft Visual C++ development system
- 32-bit edition for Windows and Windows NT. This Windows NT-hosted development
- environment targets Win32 and Win32s applications. The new compiler is a
- retail release, and replaces the command-line compiler and tools shipped with
- the Win32 SDK. The new development environment will be distributed via CD-ROM.
- Visual C++ 32-bit Edition provides an integrated set of graphical tools for
- creating Windows applications. The Microsoft Foundation Class v2.0 provides
- encapsulated building blocks. AppWizard supports creation of a "skeleton"
- application that exploits the building blocks. ClassWizard supports
- connections of visual user-interface elements with application code. AppStudio
- supports creation, editing, and browsing of application resources. Visual
- Workbench provides an integrated editor, debugger, browser, and code profiler.
- Visual C++ for Windows NT provides background compilation and multitasking
- with other applications. The product was written from the ground up to be a
- 32-bit application, using OS features such as multithreading. The debugger now
- includes support for multiple threads, exception handling, and a memory window.
- A new analysis tool, SPY++, is included. SPY++ provides information on
- threads, processes, messages, and windows.
- Through an agreement with Chinon America, Microsoft has arranged promotional
- pricing for Visual C++ 32-bit Edition and the Chinon 535 CD-ROM. Microsoft
- describes the bundling agreement as providing up to $300 off the separate
- suggested retail pricing of both. This offer will be valid through February
- 28, 1994, or while quantities last.
- Microsoft also announced license agreements for its Microsoft Foundation Class
- Library C++ application framework v2.0 to other vendors, including competing
- C/C++ compiler vendors, through its Windows Partners Program. The first three
- companies to license the MFC library are Symantec Corporation, MetaWare
- Incorporated, and Blue Sky Software.
- For more information contact Microsoft Corporation, One Microsoft Way,
- Redmond, WA 98052, (206) 882-8080; FAX: (206) 936-7329; Telex: 160520.
-
-
- Rational Systems Upgrades DOS/4GW and Instant-C
-
-
-
- Rational Systems, Inc. has released DOS/4GW Professional, the DOS extender
- included with the WATCOM C/C++ and FORTRAN 77 programming languages, and
- Instant-C v5.4, a C development tool. DOS/4GW Professional adds the ability to
- bind the DOS extender to an application. According to the company, programs
- using the DOS/4GW virtual memory manager will load and run two to five times
- faster and DOS/4GW Professional is approximately 50Kb smaller in memory and on
- disk than DOS/4GW.
- New features in Instant-C v5.4 include enhanced debugging which, according to
- Rational's release, detects over 700 distinct syntax and run-time errors; a
- "locals" window; context-sensitive online help; and tool-access
- hotkeys and window controls. Instant-C v5.4 also improves object code support
- by displaying register and stack values, and assembly source for object code
- functions. DOS/4GW Professional is $298 and is royalty-free. Instant-C v5.4 is
- $2,995. For more information contact Rational Systems, Inc., 220 N. Main St.,
- Natick, MA 01760, (508) 653-6006; FAX: (508) 655-2753.
-
-
- Cygnus Support Announces Partnership with Advanced Micro Devices
-
-
- Cygnus Support has announced a partnership with Advanced Micro Devices, Corp.
- to provide a development solution for the Am29205 evaluation board. The
- complete GNU C and C++ software development toolkit from Cygnus Support is
- shipped with each Am29205 evaluation kit. The GNU software development toolkit
- includes the GNU C and C++ compilers, GNU debugger, assembler, linker, and a
- set of binary utilities. Complete documentation for the GNU tools, hosted on
- Sun SPARC and DOS and targeting the AMD 29K, is included with each evaluation
- board. GNU tools hosted on Sun SPARC are provided on CD-ROM. GNU tools hosted
- on DOS are provided on 3.5" floppy disks.
- The evaluation kit is available from AMD at $595. Support services for the
- evaluation kit are available from Cygnus Support. For more information contact
- Cygnus Support, 1937 Landings Dr., Mountain View, CA 94043, (415)903-1400;
- FAX: (415)905-0122.
-
-
- KL Group Releases XRT/3d v2.0
-
-
- KL Group Inc. has upgraded XRT/3d, a three-dimensional graph widget toolkit
- for the X Window System that can represent three-dimensional data in a variety of
- graph types including surfaces, bar charts, and contour graphs. Features of
- XRT/3d v2.0 include the addition of 3-D bar charts and histograms, interactive
- real-time rotation preview, direct labeling of the X, Y and Z axes, CGM
- output, and data formats that can handle irregular gridded data sets.
- XRT/3d uses the same object-oriented API as OSF/Motif. XRT/3d v2.0 is
- published on the XRT Product CD which contains all XRT widgets for nine UNIX
- architectures: DECStation, DEC Alpha/OSF, IBM, HP 300/400 and 700/800, SVR4,
- Sun, SCO, and SGI. XRT/3d v2.0 is $2,495, and there are no royalties or
- runtime fees. For more information contact KL Group Inc., 134 Adelaide St. E,
- Suite 204, Toronto, Ontario, Canada M5C 1K9, (416) 594-1026; FAX: (416)
- 594-1919.
-
-
- Subtle Software Announces Subtleware for C++/SQL
-
-
- Subtle Software, Inc. has announced Subtleware for C++/SQL, scheduled to begin
- shipping in the fourth quarter '93. C++/SQL is a bridge technology between the
- object-oriented C++ programming language and relational database
- management systems (RDBMS). C++/SQL is for developers using C++ and its
- object-oriented constructs for application modeling and development, but who
- also require the use of a SQL/RDBMS for information storage. C++/SQL automates
- the semantic mapping and the coding required to combine Object-Oriented/C++
- and Relational/SQL.
- C++/SQL supports many C++ compiler/preprocessors on a variety of platforms and
- local, remote, or cross-platform RDBMSs. Since C++/SQL generates source code,
- a user can customize the actual bridging code to their requirements. C++/SQL
- provides a process and framework focused on the task of mapping C++ objects to
- and from SQL tables and relations and lets the customer define their own C++
- development environment. For more information contact Subtle Software, Inc., 1
- Albion Rd., Billerica, MA 01821, (508) 663-5584.
-
-
- We Have Mail
- Code Disk Update
- The C Users Journal provides all code listings from articles on a monthly code
- disk, which may be purchased separately from the magazine or on a subscription
- basis. In addition, the code disk typically contains listings that are too
- long to be printed in the magazine. Unfortunately, several code listings
- referenced in the October 93 CUJ didn't make it to the code disk. These files
- are described as follows:
- splash.zip -- this file contains the entire splash class library described by
- Jim Morris in his article, "The SPLASH Class Library." Due to size
- constraints, this library was not listed in its entirety in the article, but
- was to be provided on the code disk.
- winroth.zip -- this file contains the exception handling macros described by
- Harald Winroth and Matti Rendahl in their article, "Exception Handling in C."
- 1110072C -- this file contains Listing 6 for the article "Random Event
- Simulation for C Programmers" by Martin Scolnick.
- We provide the missing files on the December 1993 code disk. However, if you
- received an October 1993 code disk missing these files, and would like a
- replacement, please call, write, or e-mail R&D Publications, Customer
- Relations, 1601 W. 23rd, Suite #200, Lawrence, KS 66046-2700, (913) 841-1631;
- e-mail: pam@rdpub.com.
- Also, CUJ code listings are available online, from a variety of sources. For a
- description of these sources, refer to the section entitled "CUJ Online Source
- Code," in the Table of Contents, page 6.
- Dear Bill:
- Lint for C++? A Great Idea!
- Actually, as the producers of PC-lint, we've been asked this question hundreds
- of times over the past few years. Our answer has always been, "We're working
- on it. It's coming."
- Well, it's almost here. We are nearing the end of our beta test cycle and on
- November 1, 1993, we expect to release PC-lint 6.00 for C/C++. FlexeLint for
- C/C++ will follow shortly thereafter.
- We appreciate Ken Pugh's recommendation of our PC-lint for C but challenge his
- assumption that the design of C++ all but makes lint-like checking obsolete.
- Over the years, a number of authors have offered dos and don'ts for good C++
- programming. See, for example, Effective C++ by Scott Meyers, C++ Programming
- Style by Tom Cargill, C++ Programming Guidelines by Tom Plum and Dan Saks, and
- "Check List for Class Authors" (The C++ Journal, Nov. 92) by Andrew Koenig. At
- the very least, wouldn't it be great to be able to do this kind of checking
- automatically?
- Perhaps a design feature of C++ was to make lint-like checking obsolete, but,
- as they say, the best laid plans of mice and men oft go astray. I am reminded
- of an "unsinkable" ship called The Titanic.
- Sincerely,
- Anneliese Gimpel
- Marketing Director
- Gimpel Software
- 3207 Hogarth Lane
- Collegeville, PA 19426
- (215) 584-4261
- FAX (215) 584-4266
- I am pleased to see the Gimpels moving in this direction. I agree that their
- excellent C tools also have a place in the C++ world.
- Dear Sir,
- Given all the interest in calendar matters, I would like to add some more on
- the subject. First and foremost, I would like to point out the wealth of
- algorithmic information in:
- Doggett, L.E.: "Calendars"
- in Seidelmann, P.K. (ed.):
- Explanatory Supplement
- to the Astronomical Almanac
- Mill Valley, University Science Books, 1992,
- ISBN 0-935702-68-7
- chapter 12 (pp. 575-608).
- One can find there many date and calendar conversion algorithms, from and to
- Julian, Gregorian, Islamic, Indian, and the basic JDN (Julian Day Number),
- with the interesting aspect that not a single algorithm requires the
- floating-point format. A wealth of references is also to be found (51, one of
- them dated 1583! In Latin, of course.) It is regrettable that the reference list
- doesn't seem to comply with ISO 690, and therefore some of the books will be
- hard to retrieve.
- It is also interesting to note that astronomers are moving their Julian
- calendar origin from January 1, -4712 into the future, to (the noon of)
- January 1, 2000 (o.c., section 1.253, p. 8), and calling that new system J2000.
- (All this is very much oversimplified here, of course.) That has several
- reasons, but the one that really interests me is avoiding, well, astronomical
- integers. Unfortunately it is connected with a rather arcane thing called
- Barycentric Dynamical Time I had better not know about. Maybe more practical
- will be the modified Julian date (MJD) given by:
- MJD = (Julian day number) - 2400000
- which works with UTC (l.c.) so that January 1, 2000 (noon), which is JD
- 2451545, becomes a quite manageable MJD 51545 (or J2000 day 0).
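- The integer-only approach described above can be sketched in C. The
- following uses the well-known Fliegel and Van Flandern formula to compute a
- Julian Day Number from a Gregorian date, then applies the letter's
- JD - 2400000 offset. The function names are hypothetical. Note that the
- usual astronomical definition of MJD subtracts 2400000.5 from the fractional
- Julian date; the integer offset here simply follows the letter's formula.

```c
/* Julian Day Number of a Gregorian calendar date, integer arithmetic only
 * (Fliegel and Van Flandern).  long is used because a 16-bit int cannot
 * hold values near 2.4 million.  The (m - 14) / 12 term relies on C
 * division truncating toward zero, which C99 guarantees. */
long jdn_from_gregorian(long y, long m, long d)
{
    long a = (m - 14) / 12;    /* -1 for January and February, else 0 */

    return (1461L * (y + 4800 + a)) / 4
         + (367L * (m - 2 - 12 * a)) / 12
         - (3L * ((y + 4900 + a) / 100)) / 4
         + d - 32075;
}

/* The letter's modified day count: MJD = JDN - 2400000. */
long mjd_from_jdn(long jdn)
{
    return jdn - 2400000L;
}
```

- For example, jdn_from_gregorian(2000, 1, 1) yields 2451545, and
- mjd_from_jdn() turns that into 51545, matching the figures above.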
- There also was a letter from Mr. Viscogliosi to PC Magazine (May 11, 1993, p.
- 401) including a code listing (DOWGJ.C) for Gregorian and Julian day of the
- week (on PC Mag Net too, I suppose). Unfortunately, that program should be named
- DOWGJ.CPP -- people are getting confused -- but it's worth a look.
- The LeapYear function used by many, and notably presented by Mr. David Burki in
- "Date Conversions," CUJ (1993) 11-2:29-34, works nicely as a macro such as:
- #define LeapYear(Y) (!((Y) % 4) && (((Y) % 100) || !((Y) % 400)))
- (I'm using TC 2.0.) There's a most interesting Easter Day algorithm presented
- in:
- Carmony, L.A., Holliday, R.L.:
- A First Course in Computer Science
- with Turbo Pascal
- New York: Freeman & Co., 1991, p. 204
- ISBN 0-7167-8216-2
- which is said to have a domain of Year: [1900; 2099] although the reason is
- unstated. After streamlining and translating it boils down to the following
- function:
- int Easter(int Year)
- {
- int Y, A, B, C, D, Month, Day;
-
- /* Return false on domain error */
- if ( (Year<1900) || (Year>2099) ) return 0;
-
- /* Compute Easter day */
- Y = Year - 1900;
-
- A = (7 * (Y % 19) +1) / 19;
- B = (11 * (Y % 19) +4 -A) % 29;
- C = ( Y + Y / 4 +31 -B) % 7;
- D = 25 -B -C;
-
- if (D>0) { Month = 4; Day = D; }
- else { Month = 3; Day = D +31; }
-
- /* Wrap and return result */
- return Month + Day*10;
- }
- the result being easily unwrapped by the calling function without need of a
- struct: Month = Result %10, Day = Result /10.
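- The 1900-2099 domain presumably comes from the Gregorian century
- corrections, which do not change between those years and which the short
- formula leaves out. For years outside that range, a different, widely
- published method, the so-called anonymous Gregorian algorithm (as given by
- Meeus), can be used instead. This sketch packs its result the same way
- (Month + Day*10); the function name is hypothetical.

```c
/* Easter Sunday for any Gregorian year, using the "anonymous Gregorian"
 * algorithm as given by Meeus.  The result is packed as in the letter:
 * Month + Day * 10, so Month = Result % 10 and Day = Result / 10. */
int easter_gregorian(int year)
{
    int a = year % 19;             /* year's place in the 19-year lunar cycle */
    int b = year / 100;            /* century */
    int c = year % 100;            /* year within the century */
    int d = b / 4, e = b % 4;      /* century leap-year terms */
    int f = (b + 8) / 25;
    int g = (b - f + 1) / 3;       /* lunar correction */
    int h = (19 * a + b - d - g + 15) % 30;   /* locates the paschal full moon */
    int i = c / 4, k = c % 4;
    int l = (32 + 2 * e + 2 * i - h - k) % 7; /* weekday adjustment to a Sunday */
    int m = (a + 11 * h + 22 * l) / 451;      /* rare late-date correction */
    int n = h + l - 7 * m + 114;

    return n / 31 + (n % 31 + 1) * 10;        /* month + day * 10 */
}
```

- For 1993 it returns 114 (April 11), agreeing with the formula above, and
- for 1818 it returns 223 (March 22), a year the shorter formula rejects.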
- From all that I've read in recent months, it appears that a group of sturdy
- civil calendar functions is something people need badly and keep on rewriting,
- a bit like the many attempts at augmenting mantissa size in C numerical
- formats. I have come across many algorithms of many kinds for calendar
- calculations and now possess a modest but confusing collection of those, but
- still try to improve on some of them. Along the way, I chose for myself some
- obviously questionable directions, such as not considering algorithms that
- require floating point, and sacrificing function domain for more basic data
- types such as ints by choosing a suitable calendar origin.
- From then on, some basic functions seem to be: day of the week, days between
- dates, new date given a date and an offset in days, and full date string in
- the country's language. (My compiler certainly has neither strftime nor locale.h.)
- There are a number of these algorithms, but which is fastest, smallest, most
- accurate, simplest, most portable, easiest to modify, etc.? We can't keep on
- re-inventing the wheel, even if it's our own wheel. Could it be that someone
- already did this? I don't want to spend a lifetime doing it, and bet most
- people don't, but I offer my cooperation -- sorry it's no big thing.
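- Once dates are mapped to a single integer day count, three of the basic
- functions listed above reduce to a little integer arithmetic. A minimal
- sketch, assuming the Fliegel and Van Flandern conversions and long (32-bit
- or wider) arithmetic; all names here are hypothetical, and it does not cover
- the language-dependent date string.

```c
/* Gregorian date -> Julian Day Number (Fliegel and Van Flandern). */
long jdn_from_date(long y, long m, long d)
{
    long a = (m - 14) / 12;    /* -1 for January and February, else 0 */

    return (1461L * (y + 4800 + a)) / 4
         + (367L * (m - 2 - 12 * a)) / 12
         - (3L * ((y + 4900 + a) / 100)) / 4
         + d - 32075;
}

/* Julian Day Number -> Gregorian date (inverse of the above). */
void date_from_jdn(long jdn, long *y, long *m, long *d)
{
    long l = jdn + 68569;
    long n = 4 * l / 146097;
    long i, j;

    l -= (146097 * n + 3) / 4;
    i = 4000 * (l + 1) / 1461001;
    l = l - 1461 * i / 4 + 31;
    j = 80 * l / 2447;
    *d = l - 2447 * j / 80;
    l = j / 11;
    *m = j + 2 - 12 * l;
    *y = 100 * (n - 49) + i + l;
}

/* Day of the week: 0 = Sunday, 1 = Monday, ..., 6 = Saturday. */
int day_of_week(long y, long m, long d)
{
    return (int)((jdn_from_date(y, m, d) + 1) % 7);
}

/* Days from the first date to the second (negative if the second is earlier). */
long days_between(long y1, long m1, long d1, long y2, long m2, long d2)
{
    return jdn_from_date(y2, m2, d2) - jdn_from_date(y1, m1, d1);
}

/* Move a date forward (or backward, if offset < 0) by offset days. */
void add_days(long *y, long *m, long *d, long offset)
{
    date_from_jdn(jdn_from_date(*y, *m, *d) + offset, y, m, d);
}
```

- Because every function goes through one day number, leap years are handled
- inside the two conversion functions, and crossing a month or year boundary
- needs no special cases.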
- Finally, just some tiny matters. I subscribed only recently to CUJ, but
- am finding myself asking things such as "Where was that function?" or "Where's
- that paper on..." and browsing through my very lean collection of CUJ issues
- (which I guard with my life, more or less). Given that I also subscribe to the
- disk listings, wouldn't it be useful to include on each one an index to the
- issue, possibly with keywords, searchable with grep or DOS FIND?
- Authors usually know how to do that. I know it's a bore, but it doesn't take
- too long either.
- The final point concerns the CUG Library. The most recent volumes have no
- documentation readily accessible. For instance, without having ordered CUJ
- back issues I would not have known that NEWMAT is a matrix/statistical package
- (something I need very badly), but it is unfortunately coded in C++ (quite an
- ice-cold shower). I know you have such niceties as e-mail, but I don't (X.25 is
- unaffordable), and it's not my fault, nor yours, obviously. It just happens
- that I'm an impoverished scientist.
- Finally, allow me to congratulate you and the CUJ team on such an accomplished
- journal. I am totally hooked on it and think I will subscribe to CUJ as long
- as I code in C, which I expect will be for quite a while.
- Best regards,
- Joao C. de Magalhaes
- R. Almeida Garrett 16 5E
- 2795 CARNAXIDE
- PORTUGAL
- And I thought I knew everything about calendar computations. I'm just
- returning from a visit to the CUJ intergalactic headquarters in Lawrence,
- Kansas. Interestingly enough, we identified two supplemental services that we
- felt should be beefed up. One is to make a machine-readable index to past
- articles in CUJ more widely available. The other is to better educate our readers
- about that little gold mine called the CUG Library. Your letter provides
- useful reinforcement at a critical time. -- pjp
- Dear CUJ,
- I am addressing this question to you because I know of no one else who can
- help me with it. I have a DOS version of the editor vi. It is
- called VIPC, although it comes up as PC/VI when it loads. This software, I
- believe, was developed by:
- Custom Software Systems
- P.O. Box 678
- Natick, MA 01760
- 617-653-2555
- 508-653-2555
- I have tried contacting them but have been told by the Better Business Bureau
- of Massachusetts that CSS is no longer in business or may be listed under a
- new name. The reason I am trying to contact them is because I am trying to get
- an updated version, as the version I have is about six years old. If they are
- out of business, I was wondering how I could get hold of the source code to
- this software. I realize that at one time the AZTEC C Compiler came with a
- version of vi but I prefer the CSS version, as it is more like the Unix
- version.
- These are the specifics of the program:
- name size date time
- VIPC.EXE 95595 06-04-87 8:57p
- The following is the text that is displayed as it loads up:
- PC/VI Version 2.01 (IBM-PC) -- 6/04/1987
- --Copyright (C) 1985-1987
- Custom Software Systems
- So if you know how I can get a new version of VIPC or the actual source code I
- would greatly appreciate it.
- Thanks,
- Manuel Lopez
- 6820 LBJ Frwy
- Dallas, TX 75240
- Anybody? -- pjp
- Dear C Users Journal,
- Just a note to Hutchinson Persons, Engineer, who so eloquently presented his
- anthropocentric agenda in his letter published in your July 1993 issue. He
- objected to Christopher Skelly's "errors of distinction" in his
- personification of some computer terms and characteristics.
- Mr. Persons, it warms me truly to see a man so intent on controlling his
- environment. Damn the mosses! Let the human reign! With people like you at the
- helm, the planet shall be truly lacquered. Your sense of irony seduces me.
- What better way to point out the "imprecise thinking" in Mr. Skelly's
- animative statements than to present your own? ("Can your p[n] make this
- claim?")
- Thank you, also, for pointing out "the importance of a human centered
- philosophy." After all, when you state that "the word 'sense' applies to a
- human ability", we human-centered philosophers, at least, are aware that dogs
- do not really smell, bats do not really hear, and a whale feels nothing when a
- half-ton fetus slips from its loins. Senses? Of course not, those are merely
- the automatonous snappings of substandard synapses.
- Finally, thank you for denying the evolution and creative use of language.
- After all, if p[n] cannot "live" below p, then neither, as you sarcastically
- point out, does fly-tying require "surgical ability." You back this up with
- your other sarcastic statement that you "compute for a living." We "precise
- thinkers," of course, know that you really mean "I write programs which enable
- computers to compute, for a living." I salute you!
- I hope that when we all move into sealed and sanitized geodesic domes, as a
- result of our using "computers, electronics, chemistry, machinery, and any
- other method (we) can...to enjoy...control...of (our) environment," that you
- will be my neighbor, so that we, as humans can stand together and never be
- "relegated to the status of the animal."
- Sincerely,
- Ed Hawco
- Writer, Technical
- 4854 rue Dagenais
- Montreal, Quebec H4C 1L7
- And I thought that I was hard on the guy. -- pjp
- Dear Mr. Plauger,
- It was interesting reading Anthony Naggs's letter in the March 93 issue of The C
- Users Journal. In this he was talking about trying to make the bubble sort
- more useful. I have often wondered why anyone would bother with bubble sort
- when, with the addition of a couple more lines of code, you can have a Shell
- sort which is considerably faster. I have found the following version of Shell
- sort to be very efficient and, as you can see, very easy to implement:
- void shell_sort(int list[], int listSize)
- {
-     int gap = listSize / 2, goforward, goback, temp;
-
-     while (gap > 0) {
-         /* insertion sort on elements gap apart */
-         for (goforward = gap; goforward < listSize; goforward++) {
-             goback = goforward;
-             while (list[goback - gap] > list[goback]) {
-                 temp = list[goback];
-                 list[goback] = list[goback - gap];
-                 list[goback - gap] = temp;
-                 if ((goback -= gap) < gap) break;
-             }
-         }
-         gap = gap * 3 / 5;  /* shrink the gap; the last pass uses gap 1 */
-     }
- }
- This beats the socks off bubble sort. For arrays of one million integers I've
- found it to take about twice as long as a quick sort. On the other hand, it does
- have the advantage of tight memory control.
- Regards,
- Gordon Lingard
- P.O. Box 1550
- Armidale NSW 2350
- Australia
- glingard@neumann.une.edu.au
- Your point is well taken. And for a rather small number of items, a bubble
- sort can be smaller and faster than either Shell sort or quick sort. -- pjp
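- For the quick-sort side of the comparison, the Standard C library's qsort
- can sort the same kind of array with no hand-written sorting code. The
- standard does not require qsort to use quicksort internally, so timings
- vary by library. A minimal sketch with the same interface as shell_sort;
- the function names are hypothetical.

```c
#include <stdlib.h>

/* Comparison function for qsort: negative, zero, or positive.
 * (a > b) - (a < b) avoids the overflow that a - b can cause. */
static int compare_ints(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;

    return (a > b) - (a < b);
}

/* Sort list[0..listSize-1] into ascending order with the library qsort. */
void library_sort(int list[], int listSize)
{
    qsort(list, (size_t)listSize, sizeof list[0], compare_ints);
}
```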
- Dear Dr. Plauger,
- I am a relatively recent subscriber to The C Users Journal. So far I am
- finding it quite informative and an inexpensive way to improve my C and C++
- skills. I have also purchased and read your book, The Standard C Library, which
- I also found interesting and useful.
- In the process of attempting to design a string class for an application a
- friend and I are working on, I studied several examples found in various books
- I had purchased on C++. A problem common to all of them was the lack of an
- elegant and simple way to handle failures in allocating memory. One implementation
- never verified the return value of new at all -- which ran against my training
- and experience. My initial solution was to use set_new_handler, but on
- investigating this avenue further, it didn't seem to be ideal. One text
- referred to set_new_handler as an interim solution and I didn't want to
- consciously code something that is obsolescent.
- Anyway, over the course of about a week of experimentation and study I finally
- hit upon the following solution: overload the global new operator so that it
- takes a function pointer argument:
- void *operator new(size_t size, void
- (*newException)());
- This function pointer would be used to point to the desired exception
- function. The code for the overloaded operator new is:
- void *operator new(size_t size, void (*newException)())
- {
-     // Requires <stdlib.h> (malloc, exit) and <iostream.h> (cerr);
-     // Boolean, TRUE, and FALSE are assumed to be defined elsewhere.
-     Boolean quit = FALSE;
-
-     // allocate memory
-     void *p = malloc(size);
-
-     // if error allocating memory
-     if (!p)
-     {
-         // if newException points to a routine
-         if (newException)
-         {
-             // call the exception handler
-             newException();
-
-             // attempt to allocate memory once more
-             // if newException returns
-             if (NULL == (p = malloc(size)))
-                 quit = TRUE;
-         }
-
-         // exit if no handler defined
-         else
-             quit = TRUE;
-     }
-
-     // if memory allocation failed
-     if (quit)
-     {
-         cerr << "\nInsufficient memory. Exited program...";
-         exit(EXIT_FAILURE);
-     }
-
-     // memory allocation succeeded, return pointer
-     return p;
- }
- My intent was to have a general purpose memory allocation error function that
- is called automatically. If the error routine returned, there would be one
- more attempt to allocate memory before exiting the program.
- I think I have accomplished that. In addition, some experiments with my own
- code have convinced me that this approach eliminates a large amount of code,
- which is to say, the object files are a lot smaller for modules that make
- extensive use of dynamic allocation. While I haven't run the overloaded new
- operator through a profiler, it seems reasonable to me that any loss of
- speed is insignificant. The standard global new operator is not
- shadowed and is readily available should it be desired.
- Later, it occurred to me that the same approach could be utilized with the
- Standard C function malloc. It would be relatively simple to define a function
- such as the following:
- void *mymalloc(size_t size,
- void (*exception)());
- with code similar to that found above. This would greatly simplify writing
- dynamic allocation routines since most applications will want to handle
- exceptions in only a few standard ways. Using this method seems both simple
- and elegant. Code would be easier to read, and executable files would be
- significantly smaller (at only a slight cost in run time).
- What do you think of this approach? Is it a good method or is there a
- complication I'm not considering?
- I can't help feeling a lot of people smarter than I have worked many years in
- these languages. This has to have been considered at one time or another and
- yet I have never run across it before.
- Finally, what is the future of set_new_handler? Is it really intended to be an
- interim solution or do the C++ committees intend to retain it?
- Sincerely,
- Randel Dale Astle
- The joint committee definitely plans to retain the function set_new_handler,
- with just a few refinements in its semantics. As you have observed,
- programmers do not always check whether a new expression succeeds. Thus, the
- joint committee has introduced an exception that is thrown by default when the
- expression fails. Reconciling this behavior with past practice involves a few
- subtleties that I'd rather not explore here.
- I like your approach of passing function pointers for exception handlers. Yes,
- I've seen it before, in one form or another, but it's not widely used. My
- guess is that most programmers don't want to have to specify the handler
- function on each call. -- pjp
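- The mymalloc() variant proposed in the letter might be sketched in
- Standard C as follows. This is one possible reading of "code similar to that
- found above," not the writer's own code. The handler takes no arguments, as
- in the letter, and a zero-byte request is rounded up to one byte because
- malloc(0) may legitimately return a null pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate size bytes.  On failure, call the caller-supplied handler
 * (if any), retry once, and exit if memory is still unavailable. */
void *mymalloc(size_t size, void (*exception)(void))
{
    void *p;

    if (size == 0)            /* avoid a legitimate NULL from malloc(0) */
        size = 1;
    p = malloc(size);
    if (p == NULL && exception != NULL) {
        exception();          /* handler may release memory, then return */
        p = malloc(size);     /* one more attempt */
    }
    if (p == NULL) {
        fputs("\nInsufficient memory. Exited program...\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

- A caller might write p = mymalloc(n, release_cache), where release_cache is
- a hypothetical routine that frees cached data and returns.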
- Dear Sir:
- I recently subscribed to the CUJ, and enjoy it very much. I especially enjoyed
- the articles on curve fitting, the alpha/beta filter, and recovering distorted
- wave forms. I would very much like to see some good, fully researched articles
- on the following topics:
- Fast Fourier Analysis
- Maximum Entropy Spectral Analysis
- Maybe one of the excellent "engineer" authors would do this for CUJ.
- I am not an engineer, but I need to use these methods in a project that I am
- working on. I have not been able to find anything concerning these topics in
- any of the programming magazines. Any help or comments will be appreciated.
- Thank you.
- Sincerely,
- Delbert Bourling
- 648 Maple Grove Rd.
- London, KY 40741
- P.S. Any published information on "Maximum Entropy Spectral Analysis" seems to
- be very, very scarce. I think some good algorithms and C/C++ code would be
- very useful to a wide range of CUJ readers.
- Your interest is noted. Potential submitters might note the same. -- pjp
- Howdy, howdy!
- Not only am I going to try out your journal but I am going to present you with
- a challenge to see how useful you can be to me.
- In mode 18 (native VGA) I draw X-Y axes on a screen, label them, make tick
- marks on the axes, and again label them. Then I draw a graph from data stored
- in an array. But now I want to send a duplicate of this to a printer and I
- don't really want to wait more than a minute for this to happen. Presently, I
- can redraw the image on a hidden video page and do a kind of screen dump by
- reading the entire screen pixel by pixel, row by row, and loading it into an
- array. Then I take a third-party printing utility (PGL Toolkit) to send the
- array to the printer. It is slow and the output is poor. If I had their source
- code, I would see if I could redesign it for my particular application and
- recompile it as a subroutine in my program. Also their printer drivers could
- stand some improvement. How do Microsoft and WordPerfect get their graphics
- images to a printer? I would like to know how and do it myself. If you have
- some alternatives or know something I don't, please send me some info.
- Hoping this finds you willing and able I am
- Jim Baugh
- 412 South Wakefield Drive
- Lafayette, LA 70503-4632
- Anyone want to rise to the challenge? -- pjp
- P.J. Plauger,
- Recently I was made aware of the existence of The C Users Journal. What do I
- have to do to subscribe? I would also be interested in contributing. Your
- monthly column is something I always enjoy.
-
- I have been a practicing programmer for about 30 years and your comment about
- "computer science" in the June 1993 Embedded Systems Programming struck a
- chord. I have never seen any justification for either "computer science" or
- "software engineering." My view has always been that these are just
- power-grabbing ploys by the two established disciplines of Science and
- Engineering, both of whom saw it as a threat to their cozy worlds and a chance
- to grab some of the kudos and dollars that would go along with annexing this
- new and useful but essentially unrelated activity.
- Programming has always seemed to me to be a skilled craft. It has also been my
- observation that some people can do it and some can't. The ones who can't just
- never seem to get it, and no amount of training, etc. will help. I learnt to
- program in the early 1960s without the benefit of formal teaching by anybody
- who knew anything at all, simply because there weren't many such people
- available. Over the years I have managed to correct most (but not all) of the
- bad habits acquired and the process has left me with a keen awareness that I
- should always be looking at how I do things and be ready to change when there
- is obviously a better way.
- Which brings me to the other thing you have mentioned that causes me some
- perplexity at the moment: C++. As a practitioner, I have embraced most of the
- other significant developments in programming that have occurred over the
- last three decades. In just about every case they solved an obvious problem or
- pointed out a better way to do things. I can't get excited too much about C++.
- Maybe I am just getting too old for this and should go find something else to
- do, but comparing C++ to the elegant simplicity of Pascal or Modula leaves me
- cold. The fact that the authors/creators in many instances have simply
- replaced one set of jargon with another doesn't help in sorting out what is
- substantive in what is being offered.
- I guess I will have to go with the flow and at least try one project with C++.
- Frank Campbell
- uunet!mti.com!campbell
- I think you're being a bit hard on computer science and software engineering.
- There are definitely both scientific and engineering principles that are
- highly relevant to the design and use of computers. I agree that programming
- is a craft, having learned it much as you have, but many a craft has been
- improved by the application of discipline and technology.
- As for C++, it is certainly not elegant in the same sense as Pascal, Modula,
- or even good old C. I find that C++ comes into its own with larger
- programs. If you have little occasion to work on large projects (where "large"
- is admittedly a relative term), you may find few compelling reasons to use
- C++. -- pjp
- Dear Mr. Plauger:
- I have a number of questions regarding file handling functions that I hope you
- can answer, or else direct me to a publication which discusses this area.
- The Microsoft C 6.0 library includes three groups of file-handling functions:
- fopen, open, and _dos_open. Other than the buffering offered by the fopen
- family, I do not know what their other advantages and disadvantages are. For
- example, is there any difference between using _dos_read to read file data to
- a far buffer, and using read compiled with the large memory model?
- A second question relates to DOS system buffers. When a C function writes data
- to the hard disk, does it initially write to a DOS buffer, and then to disk
- when the operating system decides? I presume the system buffers are the ones
- defined in the CONFIG.SYS file. I also presume the buffering provided by the
- fopen family uses different buffers. Is there any way to ensure that data is
- immediately written to the disk?
- Finally, how does SHARE work? I use it with the fopen and _dos_open functions
- because my programs are often used on networks and I need to implement record
- and file locking. I am confused about SH_DENYNO and SH_COMPAT. For example,
- Microsoft recommends using SH_COMPAT for DOS-based networks, but I have only
- been able to implement proper file sharing on Lantastic and Novell networks
- using SH_DENYNO -- SH_COMPAT gives a "sharing violation" whenever a second
- program attempts to open a file already open but not locked.
- I would be very thankful for any light you can shed on these problems. Thank
- you.
- Yours truly,
- R.W.J. Ford, M.D.,
- Department of Anaesthesia
- Shaughnessy Hospital
- 4500 Oak Street, Room A437A
- Vancouver, B.C. V6H 3N1
- I can only give you partial answers, since I tend to stick to fairly portable
- C. Function open and its brethren model the original UNIX style of I/O. When C
- started getting moved among machines, fopen and company were added. These
- guarantee a reasonable amount of buffering, and insulation from peculiarities
- of the underlying operating system, at the cost of still more machinery and
- fewer supporting operations. The DOS versions give you access to more
- DOS-specific features. In particular, you can choose whether or not to lose
- those carriage returns that DOS has and UNIX lacks when you read from a file.
- You can pretty well count on system buffers to be involved in practically all
- reads and writes. The buffering logic within DOS has to deal with blocking and
- unblocking sectors of various sizes, and with various limitations on how well
- DMA channels can address different parts of storage.
- All I know about SHARE is what you apparently know -- that it helps minimize
- collisions among processes reading and writing the same files in parallel. I
- know it is supposed to support more sophisticated kinds of sharing such as you
- describe, but I've never had occasion to program with SHARE in mind. -- pjp
- Dear Mr. Plauger,
- My apologies for writing to you at a rival publication, but this was the most
- obvious address to get you at. The purpose of the letter is to tell you how
- much I have enjoyed "Programming On Purpose" over the years, and how much I
- will miss it in years to come.
- I found your column to be one of the most thought-provoking I have ever read
- (and not just from a programming point of view), and I often brought its
- themes up in conversation at my place of work to get a debate going around the
- issues you raised. I cannot say I always agreed with you. I never, however,
- disagreed strongly enough to write a letter to the author!
- I sincerely hope the book Programming On Purpose: Essays on Software Design,
- as well as its companions, is released in South Africa. Just in case, would
- you mind sending me the ISBN numbers of the books at the above address. This
- will help me to get hold of them easily. I believe the publisher is
- Prentice-Hall.
- Once again, thank you for some provocative stuff.
- Yours Sincerely,
- John Bannister
- P.O. Box 32092
- Braamfontein, South Africa 2107
- Volume 1 (design) is ISBN 0-13-721374-3, volume 2 (people) is ISBN
- 0-13-328105-1, and volume 3 (technology) is ISBN 0-13-328113-2. All are indeed
- published by Prentice Hall and all originally appeared in Computer Language.
- -- pjp
- Dear Dr. Plauger,
- Thank you for printing the letter in which I asked Mr. Pugh why my int10h
- handler successfully controlled printf scrolling on monochrome, while EGA/VGA
- systems appeared to scroll without calling int10h at all (CUJ March '93,
- p.124).
- If you'll permit it, I'd like to express publicly my thanks to Brian Knoblauch
- (Toledo, OH), Ir. H. Hahn (Veldhoven, The Netherlands), Steve Ferrell (Duluth,
- MN), and Carl Smotricz (Hattersheim, Germany). They took the time to share
- their experience and knowledge about EGA/VGA scrolling with me. Mr. Knoblauch
- suggested that I investigate int42h. Mr. Ferrell recommended PC Interrupts
- (Ralf Brown and Jim Kyle, Addison-Wesley, 1991), and said int42h was used as a
- replacement for int10h on some EGA/VGA systems. Mr. Hahn gave details about
- intercepting int10h AH=13h (Display String) to control scrolling from string
- display functions. Mr. Smotricz had discovered that some systems call int10h
- AH=6 (Scroll Up) from int10h AH=9 (Write Char & Attr) and int10h AH=0Ah (Write Char);
- he suggested that I intercept those functions in addition to intercepting
- Scroll Up.
- Disassembling int42h on an AST Premium 386 showed that it was indeed an int10h
- workalike, as Mr. Ferrell and Mr. Knoblauch indicated. In particular, the
- Scroll Up function (AH=6) looked like that of int10h. But when I modified my
- program to intercept int42h AH=6 in addition to int10h AH=6, nothing changed:
- Scrolling was controlled as expected on the XT-clone monochrome system, but
- not on the AST VGA or a 386SX VGA with Award BIOS. Nevertheless, the
- suggestions provided by Mr. Ferrell and Mr. Knoblauch provided valuable
- experience which I will use in future projects.
- My experiments showed that intercepting String Display (AH=13h) or Write Char
- (AH=9 and 0Ah) of int10h would make screen updates too slow for my current
- application. Still, the detailed information on these points given by Mr.
- Smotricz and Mr. Hahn will be useful in other contexts. I'm glad they didn't
- let either national boundaries or the Atlantic Ocean stop them from giving me
- the benefit of their hard-won experience.
- Since there doesn't seem to be a good way to make my int10h method control
- printf scrolling on EGA and VGA, I had to look at alternatives. The solution I
- finally implemented was a modification of a method I had devised prior to the
- int10h method. It involves changing DOS Offset of Video Buffer (OVB) value
- stored at absolute address 0:44E to point to the first column of top row to be
- scrolled, and clearing 25 rows of video-buffer memory at that offset. This
- "OVB method" assumes that video page 0 is current, and that the last byte of
- page 0 is immediately followed by at least 3,840 bytes of unused memory (which
- is usually the first 24 rows of video page 1). The original version of my OVB
- method worked on a Hercules-clone monochrome system, and on a Paradise VGA
- Professional, but failed to control scrolling on an IGC 20 TIGA system, and on
- one other machine with an unknown video system. That's why I abandoned it in
- favor of the int10h method outlined in my earlier letter (CUJ March '93,
- p.124).
- The failing systems behaved as if they reset the OVB word at 0:44E to 0
- repeatedly. (Indeed, the BIOS listings in the IBM Hardware Technical Reference
- (April 1984) reveal that a call to int10h AH=0 (Set Mode) or 5 (Set Active Page)
- will cause the word at 0:44E to be reset to 0.) In the original version of my
- OVB method, I set the scroll-limiting OVB value only once, during program
- initialization. To make it work in spite of repeated resetting, I wrote a
- function to use instead of printf whenever limited scrolling was in effect.
- The replacement function (Listing 7) re-assigns the scroll-limiting value to
- the OVB word at 0:44E prior to each screen write.
- [The listings mentioned in the preceding and subsequent paragraphs are omitted
- because of their length. We include them on the monthly code disk. -- pjp]
- This solution controlled scrolling on all systems tested (Hercules, Paradise
- VGA Pro, and TIGA), but a problem remained: Lines written in the scrolling
- rows did not appear at all on the TIGA screen, yet displayed as expected on
- the other systems.
- To investigate the TIGA problem, I wrote a small test program so I could
- experiment with the effect of modifying three additional video parameters in
- various combinations. (The OVB word at 0:44E was changed for every test.) I
- found that modifying the Start Address Register didn't work on any of the
- three test machines. Modifying the Screen Length word at 0:44C was not
- effective on any of them, either.
- In the test program, changing the Number of Displayed Rows at 0:484 to
- correspond to the new value at 0:44E worked on all the test machines. For
- example, if row 0 through row lockrw were to remain locked on the screen while
- rows lockrw+1 through 24 were allowed to scroll up, putting the value
- 160*(lockrw+1) at 0:44E and 23-lockrw at 0:484 would work on all test
- machines.
- Unfortunately, when I tried this method in a production program, the TIGA
- machine showed display anomalies. At that point, I decided that it would not
- be economically feasible to continue trying to include TIGA systems in this
- version of the production program. I removed the code for changing the Number
- of Displayed Rows value at 0:484 from the production program; and it worked as
- expected on all test machines except the TIGA.
- To use the OVB method, you'll need code similar to that shown in LOCKVROW
- (Listing 2) to lock a specified video row on the screen while allowing those
- below it to scroll. You'll also need something like UNLOCKVR (Listing 3), so
- you can restore normal scrolling before returning control to the operating
- system, or before each section of code that needs a visible cursor.
- The cursor should be hidden (HIDECURS, Listing 5) during locked-row operation,
- because its position is relative to the changed offset at 0:44E, while the
- visible characters on the screen are always relative to offset 0. For example,
- if the cursor is visible on row 24 when 0:44E is 0, changing 0:44E to 2720 (to
- lock row 17 on the screen) will put the cursor on row 24 of the "new" screen
- that begins at offset 2720 of the video buffer. But that row isn't visible on
- the displayed screen that begins at offset 0 (it would be row 41, counting
- from there).
- Furthermore, experience shows that printf (and scanf) ignore the 0:44E offset.
- Their data is relative to offset 0 of the video buffer, regardless of 0:44E.
- In one experiment on a Hercules clone, I set 0:44E to 17*160=2720, then used a
- BIOS call to put the cursor on row 0. The cursor immediately appeared on row
- 17, but both printf and scanf operated as if the cursor were on row 0:
- characters appeared on row 0 as the cursor moved in step along the columns of
- row 17.
- For sections of code that need a visible cursor, as when requesting and
- obtaining keyboard input, UNLOCKVR (Listing 3) can be used to restore the OVB
- word at 0:44E, and NORMCURS (Listing 6) will restore the cursor. To return to
- locked-row operation, use HIDECURS (Listing 5), then use RELOCKVR (Listing 4).
- I hope others can benefit from our combined experience with controlling printf
- scrolling. Thanks again to you and to the other four people who so unselfishly
- helped me. It's encouraging to see free exchange of ideas continuing in spite
- of software patenters who, ignoring that all ideas are derived from others --
- which are based on the ideas of still others, ad infinitum -- conceitedly
- claim exclusive ownership of any they might use in software. Please continue
- your valuable contributions to the international community of programmers,
- students, and teachers.
- Sincerely,
- Sid Sanders
- 5 Seneca Avenue
- Geneseo, NY 14454-9508
- I add my thanks to all our readers who have repeatedly demonstrated a
- willingness to share their knowledge with others. -- pjp
- To the editor:
- Mr. Ralph Franke has called to my attention an error in the RPFT code
- described in "Curve Fitting With Extrapolation," C Users Journal, June 1993.
- The statement for reading the command-line parameter -DIG in line 47 of
- function commd reads only the second digit. Lines 46 and 47 are:
- if (!strncmp(&argv[i][0],"-DIG=",5))
- { k=sscanf(&argv[i][6],"%d",dig);
-
- but should be:
- if (!strncmp(&argv[i][0],"-DIG=",5))
- { k=sscanf(&argv[i][5],"%d",dig);
- Mr. Franke also pointed out that, for data which are an exact fit to a
- polynomial and which include a Y value of zero which is not the first or last
- datum, the interpolation routine may give incorrect results with no warning
- message. This is not a problem for inexact empirical data, but the revised
- routine shown in Listing 8 should be used for safety.
- This version can still fail for some abscissae with exact polynomials such as:
- y = 10x - 9x^2 + 8x^3 - 7x^4 + 6x^5
- for x = -3, -2, -1, ..., 12
- That function is fit properly by RPFT, but in a separate test using the
- interpolation routine by itself it failed at some points such as X = -1.5 with
- the message that a pole (zero denominator in the interpolating function) may
- exist at that point. X = -1.500001 runs OK.
- Lowell Smith
- 73377.501@compuserve
- We provide the corrected version of RPFT on the monthly code disk. -- pjp
- Dear CUJ,
- First, let me say that it would be nice if prospective letter writers were
- directed to an e-mail address. I didn't really want to dump on you, Mr.
- Plauger. :-) [Use cujed@rdpub.com -- mb]
- I enjoyed Chuck Allison's July article on C++ Streams. However, I wish somebody
- would provide an exhaustive and accurate list of the effects of all format
- flags for both input and output. Also, there is a lot of confusion out here
- regarding the duration of these flags and items set by manipulators such as
- width, precision, and fill.
- I have been through talks or books by Stevens, Stroustrup, Coplien, Saks,
- Semaphore's Shewchuk, Rowe, Swan, and Allison, and I still am not sure on a
- couple of points. Specific areas of confusion include:
- Q. Is the default for ios::basefield all bits off? Is this the same as
- ios::dec? For Borland, at least, the default for an istream is all bits off,
- and this is not the same as having ios::dec set. For instance, given the
- example in Listing 9, try typing "0x12" three times:
- 0x12 0x12 0x12
- 18 cin.good()=1
- 18 cin.good()=1
- 0 cin.good()=1
- 0 cin.good()=0
- 0 cin.good()=0
- The first time uses the default input setting, and converts to hex. The second
- is explicitly set to 0 (the default) and also converts to hex. The third input
- explicitly sets the decimal flag and stops reading at the "x" giving you a
- value of zero. The other inputs choke on the "x."
- If you think this is esoteric, have a user enter "010" in a field where you
- have not set the base explicitly to ios::dec! You guessed it, you get a
- conversion to octal for a value of 8.
- Q. How long are width, precision, and fill in effect for istreams? ostreams?
- Best Regards,
- Dave Rogers
- Frank Russell Company
- 76366.2171@compuserve.com
- The Library Working Group of the joint C++ standards committee is indeed
- working to clarify the issues you raise. Currently, they suffer from an excess
- of variety, as you point out. Much of the basic work in this area has already
- been done for the LWG by Jerry Schwarz, the original author of iostreams. I
- will be answering questions such as this in more detail in my column,
- "Standard C" (admittedly a slight misnomer here), and later in the book I am
- currently writing on the Standard C++ library.
- For now, I will simply say that ios::basefield should be initialized to
- ios::dec. Precision and fill stay in effect until you explicitly change them.
- Width gets set to zero by conversions that use the width. -- pjp
- Dear C User's Journal:
- I enjoyed the article on "Automated Unit Testing" by Roger Meadows in the
- August 1993 CUJ (pp. 53-58). I did notice a bug that the test routine did not
- catch in the strws function. If the input string ends in whitespace,
- processing continues to run through memory until it finds a \0 terminator that
- is not preceded by whitespace.
- I realize that the point of this article was to point out how to include a
- main routine for testing purposes but it should also be noted that this is
- just the type of error that is very difficult to track down when you get "bus
- error -- core dumped" much later due to other data being walked on. Especially
- when it is dependent on a specific data pattern that will likely be very
- intermittent and hard to reproduce.
- Anyone writing testing routines needs to look very carefully at potential
- errors in the routine being tested. In my opinion, any syntax like "*to++ =...
- ", should raise a flag that says "you'd better make sure you don't
- accidentally walk out of the bounds you intend." This means adding at least a
- suffix to the data. In this case, the suffix must contain multiple blanks or
- tabs to detect the problem.
- An example is shown in Listing 10.
- Consider the string "test\t". After copying the "test" characters, to and from
- both point to the tab. A blank is copied instead of the tab and both pointers
- are advanced to point to the \0 terminator. The inner while is not executed
- since we are pointing to the end of the string. We now come out and copy the
- next character since it is not a blank or tab. However, it happens to be the
- \0 terminator. The pointers are advanced and no longer point to the
- terminator. The outer while will continue to process until it finds a
- terminator not preceded by a blank or tab.
- The fix is to add a condition, as shown in Listing 11.
- And to include in the test code as shown in Listing 12.
- DISCLAIMER: I have not actually tested this code; these are just my thoughts
- while reading the article. You should check them before publishing any
- comments including this code.
- Ed Sarlls, III
- Western Geophysical Exploration Products
- Houston, TX, USA
- sarlls@wg2.waii.com
- Opinions expressed are not Western Geo's and may not even be mine.
- Roger Meadows replies:
- Thank you for your comments on the article and for finding the problem with
- the sample program. I followed the process presented in the article to fix the
- problem. First, I modified the test portion of the sample program so that it
- finds the bug you described. I used the test code modifications you suggested.
- The changes did cause some of the test cases to fail. However, it seemed that
- all of the test cases should have failed. I had to increase the length of the
- test suffix to get all of the test cases to fail. Then, I modified the
- application code, also using the modifications you suggested, to fix the
- problem. Rerunning the test code demonstrated that the fix worked and that it
- did not break anything.
- I think your suggested modification to the test code makes a good addition to
- the rules for writing automated test code.
- 8. Make sure application code does not write beyond the end of buffers.
- STRWS.C, a revised listing of the sample program with "/*new*/" at the end of
- new lines, is available on the monthly code disk.
- Dear Mr. Plauger,
- I have been reading CUJ now for several years. My work has benefited much from
- the feature articles and editorials in your magazine. Now I would like to
- query that immense knowledge base for a specific need. I am involved in the
- development of a large data acquisition system running the iRMX operating
- system. The system is distributed around an FDDI network that is implemented
- primarily with virtual circuit connections linking software modules. We need
- to synchronize the time on all of the machines as closely as possible. I was
- wondering (hoping) that either you or one of the readers knows of some
- iterative-feedback type algorithm for performing such a synchronization; sort
- of a software version of the four-wire power supply.
- Thank you,
- Frank Metayer
- Electric Boat
- Groton, CT 06340
- I don't, but I hope one of our readers does. -- pjp
- Dear Mr. Plauger,
- I was just sitting here reading the letters to the editor in Vol. 11 No. 7 of
- CUJ, waiting on my compiler to decide if I got it right. In reading the
- letters, I'm amazed at how critical they are! I hope you have a similar number
- of positive letters. Please don't be discouraged. I believe you provide an
- excellent service and a valuable resource. Keep up the good work!
-
- Tedd Gimber
- Letters tend to be more negative than positive, I think because anger is a
- more powerful motivator to action than mere joy. Mostly, I've learned to
- mentally compensate for that bias. But I still enjoy letters like yours
- whenever they come in. Thanks. -- pjp
- Chuck Allison:
- In your article in the August 1993 issue of The C Users Journal, you talk about the
- void * pointer. Well, is it true that void * == (int * or char * or float *)?
- Why or why not? Please explain because I would like to know why the following
- statement is invalid:
- struct PIZZA {
- int key;
- /* Other fields. */
- };
-
- void main(void)
- {
- PIZZA *myPizza
- = calloc(10, sizeof(PIZZA));
- /* ERROR message generated. */
-
- ...
-
- }
- Thanks in advance,
- -Con(rad)
- ps: I received an error message for the above statement for every compiler
- that I tried to run my code on.
- Chuck Allison replies:
- If you are using C, the problems with your program are:
- 1) You didn't include <stdlib.h> for calloc
- 2) You didn't qualify PIZZA with struct (or typedef it).
- The program shown in Listing 13 works fine.
- If you are using C++, you really shouldn't be using calloc. Listing 14 shows a
- C++ version that works.
- As far as your question about void * -- it is an animal unto itself. It is not
- an int * or any other *. Its purpose is to allow assignment to and from
- pointers to any type. It cannot be dereferenced, hence it is impossible to
- think of it as an int *. This has little to do with your program excerpt.
- Since calloc returns a void *, it can be assigned (in C) to a PIZZA * or any
- other pointer type. Correct problems 1) and 2) above, and you're home free.
- Hi,
- Just read your article in The C Users Journal. I'd like to propose something
- that would be of great benefit to control programmers like me. Embedded
- systems, and dedicated controllers often need the equivalent of floating-point
- fractional numbers, without the overhead of a floating-point package. I'd like
- to see a modifier as follows:
- fixed -- This modifier implies that the variable associated with it is
- inherently split in the middle with a binary point. All math associated with
- it takes this into consideration. For example:
- fixed int scale_factor; -- would have (on a 16-bit machine) an 8-bit integer
- portion and an 8-bit fractional portion, and thus could represent 0 to
- 255.996. By the same token, fixed unsigned char would have a 4-bit integer
- portion (0-15) and a 4-bit fraction (0 to 0.9375). A fixed unsigned long would
- of course have a 16-bit integer and a 16-bit fraction.
- Conversion rules:
- Fixed of one size to fixed of another: smaller to larger, e.g. 4.4 -> 8.8,
- would simply have the first 4 bits placed into bits 0-3 of the first byte, and
- the second 4 bits placed into bits 7-4 of the second byte. For larger to
- smaller, such as 8.8 -> 4.4, bits 0-3 of the first byte (in the 8.8) would be
- placed into bits 7-4 of the 4.4, and bits 7-4 of the second byte of the 8.8
- would go into bits 3-0 of the 4.4.
- Conversion to integers would simply drop the fractional portion. Conversion to
- floats would give the floating point equivalent. I have had to write these
- sorts of things many times, and it is always an aggravation.
- The second thing that I'd like to propose is a type called quad. A quad is to
- a long as double is to a float. If a long is 32 bits, then a quad is 64 bits.
- Quads could also be prefaced with the fixed modifier.
- Cheers,
- Woody Baker
- Postscript consultant/Flint knapper
- Austin, TX
- woody@knapper.cactus.org
- A superset of the fixed-point arithmetic you describe is in PL/I and Ada. I
- don't know how widely it actually gets used. I believe that the Numerical C
- Extensions Group (X3J11.1) has also explored extended-precision integers. -- pjp
- Dear Bill,
- Thank you for the opportunity to present my article on "Extending C for Object
- Oriented Programming" (though I fear it will mark me for life as the "Macro
- King"). I have since received several kind letters by email asking for source
- and reporting bugs. It's letters like these that make all the hassle worthwhile. I
- have also copied you on my mailing of the latest source in case you want to
- update the code disk for the article.
- Yours sincerely,
- Greg Colvin
- gregc@ihs.com
- P.J. Plauger:
- Salutations from the other side [of the world]. I've just finished reading
- your April editorial in CUJ, and wish to agree wholeheartedly. In the last 12
- months, it seems my professional world has been turned on both its ears with
- release after release of software, each claimed to be an improvement over the
- last. In general, this is true, but I can't help lying awake some nights
- feeling completely inadequate in my abilities to keep up with the pace of
- change. It makes me feel much better to see that the giants of this
- programmer's world (for such you are, even if you don't see yourself that way)
- also suffer some of the same feelings. While I have your ear, or eyes, is it
- possible to order back issues of CUJ by email? It is a long way from Melbourne,
- Australia, to basically anywhere, and it would be considerably more convenient
- for me to order them by email rather than snailmail. Having only recently
- subscribed to CUJ, I am still catching up (since we get magazines about three
- months late down here), and I appear to have missed the March 93 issue. If you
- have not already done so, do give Visual Basic a look. While not complete in
- itself, it can be a lot of fun to use, and when combined with C, it becomes a
- truly powerful environment. Enough of this. I thank you for your time in
- reading this.
- Stay sane.
- -Craig
- We are working on smoothing the process of ordering back issues electronically
- and otherwise. Meanwhile, I've forwarded your letter to R&D's customer service
- folk. -- pjp
- Dear PJP:
- I read your article "Developing the Standard C++ Library" in the October, 1993
- issue of The C Users Journal. I am very anxiously looking forward to your book
- on the Standard C++ library.
- As a member of the C++ user community I would like to add my voice to the
- outrage over the delay in getting some kind of documentation of the iostreams
- interface to the public. To my way of thinking, even greater fervor should be
- applied to releasing some minimal documentation. When the ANSI standard for C
- was developed, K&R had already been published for several years and users were
- comfortable with the stdio library interface. If any changes were made to
- stdio as a result of the ANSI standard, it was relatively easy to make code
- adjustments.
- This is not the case with C++, since as you point out there is very little
- documentation available on iostreams. Steve Teale's book is a step in the
- right direction and for that he deserves praise. The title of the book is
- somewhat misleading since the book does not pretend to be the definitive
- iostreams reference. As a further criticism, the book references ANSI working
- documents by Jerry Schwarz and Mike Vilot which are not available to the
- general public.
- To the members of the ANSI committee who are working hard to get iostreams
- standardized, I apologize. Those members who are prolonging the development of
- the standard with excessive concern for minutiae should be set straight. There
- is a desperate need now in the user community for iostreams documentation. The
- ANSI committee should set an immediate goal of informing the public as to what
- iostreams features and interfaces are likely to remain stable in the standard.
- I look forward to your upcoming columns on the C++ standard library as well as
- your book. I hope you, Steve Teale, or someone else will soon satisfy the need
- for an iostreams reference.
- Larry Johnson
-
- NCR, Lisle
- cuuxb!laj
- 708-810-6524
- (VP 473-6524)
- I can only share your concern. And believe me, I want to get my C++ library
- book out soon, too. -- pjp
-
- Listing 8 ratint.c
- Function RATINT for the RPFT Code
- /******************************************************
- * RATINT - Diagonal rational function interpolation in
- * the arrays xa[1..n] and ya[1..n].
- *****************************************************/
- #include <math.h>
- #include <stdio.h>
- #include <stdlib.h>
- void ratint(double xa[], double ya[], double *c,
- double *d, int n, double x, double *y)
- { int m,i,ns=1;
- double w,t,hh,h,dd;
- static double miny=1.e99;
-
- if (miny>1.e90) for (i=1; i<=n; ++i)
- if (ya[i]<miny) miny=ya[i];
- hh=fabs(x-xa[1]);
- for (i=1;i<=n;i++)
- { h=fabs(x-xa[i]);
- if (h == 0.0) {*y=ya[i]; return; }
- else if (h < hh) { ns=i; hh=h; }
- c[i]=ya[i]-miny; d[i]=ya[i]-miny+1.e-50;
- }
- *y=ya[ns--] - miny;
- for (m=1;m<n;m++)
- { for (i=1;i<=n-m;i++)
- { w=c[i+1]-d[i] ; h=xa[i+m]-x; t=(xa[i]-x)*d[i]/h;
- dd=t-c[i+1];
- if (fabs(t)>1.e15)
- fprintf(stderr,"Probable loss of accuracy in "
- "RATINT. fabs(t) > 1.e15 for X = %.8G\n",x);
- if (dd == 0.0)
- { fprintf(stderr,"Error in routine ratint. The "
- "function may have a pole at x=%.8G\n",x);
- exit(1);
- }
- dd=w/dd; d[i]=c[i+1]*dd; c[i]=t*dd;
- }
- *y += (2*ns < (n-m) ? c[ns+1] : d[ns--]);
- }
- *y += miny; return;
- }
-
- /* End of File */
-
-
- Listing 9 flagxmpl.cpp
- #include <iostream.h>
- #include <iomanip.h>
- int main(int, char**)
- {
- int ival;
-
- cin >> ival;
- cout << dec << ival << " cin.good()=" << cin.good() << '\n';
-
- cin.clear(); // reset any error
-
- cin >> setbase(0) >> ival;
- cout << dec << ival <<" cin.good()=" << cin.good() << '\n';
- cin.clear(); // reset any error
-
- cin >> dec >> ival;
- cout << dec << ival << " cin.good()=" << cin.good() << '\n';
- cin.clear(); // reset any error
-
- cin >> oct >> ival;
- cout << dec << ival << " cin.good()=" << cin.good() << '\n';
- cin.clear(); // reset any error
-
- cin >> hex >> ival;
- cout << dec << ival << " cin.good()=" << cin.good() << '\n';
- cin.clear(); // reset any error
-
- return 0;
- }
-
- // End of File
-
-
- Listing 10 copyprob.c
- /* pseudocode on */
- /* copy characters looking for tab or blank. */
- while (not end of string) {
- if (tab or blank) {
- copy blank and advance pointers
- while (not end of string)
- if (not tab or blank)
- break
- else
- advance from pointer
- }
- /* we are not pointing to a blank or tab */
- copy next character and advance pointers
- }
- terminate string
- /* pseudocode off */
-
- /* End of File */
-
-
- Listing 11 copyfix.c
- while (*from) {
- if (blank or tab) {
- /* stuff deleted. */
- }
- /* it's not a blank or tab now. */
- if (*from)
- /* We use terminator to end outer while so
- don't advance past it. */
- /* We force termination after end of outer while so
- don't copy terminator either. */
- *to++ = *from++;
- }
- *to = '\0';
-
- }
-
- /* End of File */
-
-
- Listing 12 sfxtest.c
- #define TEST_SUFFIX "sfx test"
-
- main()
- {
- char *sfx; /* to check suffix for violation. */
-
- /* stuff deleted. */
-
- do {
-
- strcpy(buf, testin[testcase]);
- sfx = buf + strlen(buf)+1;
- strcpy(sfx, TEST_SUFFIX);
-
- /* tested function call return value deleted. */
-
- if (strcmp(buf,testout[testcase]) != 0) {
- /* error messages deleted. */
- }
- if (strcmp(sfx, TEST_SUFFIX) != 0) {
- /* new error messages. */
- }
- } while (not end of test data);
- /* final report deleted. */
- }
-
- /* End of File */
-
-
- Listing 13 pizza.c
- /* pizza.c */
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
-
- struct PIZZA
- {
- int key;
- char stuff[10];
- };
-
- main()
- {
- struct PIZZA *myPizza = calloc(10,sizeof(struct PIZZA));
- myPizza->key = 0;
- strcpy(myPizza->stuff,"good food");
- printf("%d: %s\n",myPizza[0].key,myPizza[0].stuff);
- return 0;
- }
-
- /* End of File */
-
-
-
- Listing 14 pizza.cpp
- // pizza.cpp
- #include <iostream.h>
- #include <string.h>
-
- struct PIZZA
- {
- int key;
- char stuff[10];
- };
-
- int main()
- {
- PIZZA *myPizza = new PIZZA[10];
- myPizza->key = 0;
- strcpy(myPizza->stuff,"good food");
- cout << myPizza[0].key << ": "<< myPizza[0].stuff << endl;
- return 0;
- }
-
- // End of File
-
-
-
-
-
- A New Algorithm for Data Compression
-
-
- Philip Gage
-
-
- Phil Gage is a software engineer in Colorado Springs and has a BS degree in
- computer science from the University of Colorado. He has been a professional
- programmer since 1983 and has used C since 1986. Phil can be reached at (719)
- 593-1801 or via CompuServe as 70541,3645.
-
-
- Data compression is becoming increasingly important as a way to stretch disk
- space and speed up data transfers.
- This article describes a simple general-purpose data compression algorithm,
- called Byte Pair Encoding (BPE), which provides almost as much compression as
- the popular Lempel, Ziv, and Welch (LZW) method. (I mention the LZW method in
- particular because it delivers good overall performance and is widely used.)
- BPE's compression speed is somewhat slower than LZW's, but BPE's expansion is
- faster. The main advantage of BPE is the small, fast expansion routine, ideal
- for applications with limited memory. The accompanying C code provides an
- efficient implementation of the algorithm.
-
-
- Theory
-
-
- Many compression algorithms replace frequently occurring bit patterns with
- shorter representations. The simple approach I present is to replace common
- pairs of bytes by single bytes.
- The algorithm compresses data by finding the most frequently occurring pairs
- of adjacent bytes in the data and replacing all instances of the pair with a
- byte that was not in the original data. The algorithm repeats this process
- until no further compression is possible, either because there are no more
- frequently occurring pairs or there are no more unused bytes to represent
- pairs. The algorithm writes out the table of pair substitutions before the
- packed data.
- This algorithm is fundamentally multi-pass and requires that all the data be
- stored in memory. This requirement causes two potential problems: the
- algorithm cannot handle streams, and some files may be too large to fit in
- memory. Also, large binary files might contain no unused characters to
- represent pair substitutions.
- Buffering small blocks of data and compressing each block separately solves
- these problems. The algorithm reads and stores data until the buffer is full
- or only a minimum number of unused characters remain. The algorithm then
- compresses the buffer and outputs the compressed buffer along with its pair
- table. Using a different pair table for each data block also provides local
- adaptation to varying data and improves overall compression.
- Listing 1 and Listing 2 show pseudocode for the compression and expansion
- algorithms.
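- To make the loop in Listing 1 concrete, here is a deliberately naive C sketch
- of compressing one block. It is an illustration under stated assumptions, not
- the program in Listing 3. The names bpe_compress, PairTable, and min_count are
- hypothetical. Every pass recounts all pairs in a 65,536-entry array instead of
- maintaining a hash table, and the sketch does not write the pair table to a
- file. Like Listing 3, it takes replacement codes from the highest unused byte
- value downward.

```c
#include <stddef.h>
#include <string.h>

/* Pair table: left[c] == c marks c as a literal byte, as in Listing 3. */
typedef struct {
    unsigned char left[256], right[256];
} PairTable;

/* Naive one-block BPE (illustration only). Compresses buf[0..n) in place
   and returns the new length. Pairs seen fewer than min_count times are
   left alone. */
static size_t bpe_compress(unsigned char *buf, size_t n, PairTable *t,
                           unsigned min_count)
{
    static unsigned counts[256 * 256];
    int used[256] = {0};
    int code = 256;
    size_t i, r, w;

    for (i = 0; i < 256; i++) {
        t->left[i] = (unsigned char)i;
        t->right[i] = 0;
    }
    for (i = 0; i < n; i++)
        used[buf[i]] = 1;

    for (;;) {
        unsigned best = 0;
        int bl = 0, br = 0;

        /* Next unused byte value, searching downward like Listing 3. */
        for (code--; code >= 0 && used[code]; code--)
            ;
        if (code < 0 || n < 2)
            break;

        /* Count every adjacent pair and pick the most frequent one. */
        memset(counts, 0, sizeof counts);
        for (i = 0; i + 1 < n; i++)
            counts[buf[i] * 256 + buf[i + 1]]++;
        for (i = 0; i < 256 * 256; i++)
            if (counts[i] > best) {
                best = counts[i];
                bl = (int)(i / 256);
                br = (int)(i % 256);
            }
        if (best < min_count)
            break;

        /* Replace non-overlapping occurrences, scanning left to right. */
        for (r = 0, w = 0; r < n; w++) {
            if (r + 1 < n && buf[r] == bl && buf[r + 1] == br) {
                buf[w] = (unsigned char)code;
                r += 2;
            } else {
                buf[w] = buf[r++];
            }
        }
        n = w;
        t->left[code] = (unsigned char)bl;   /* record the substitution */
        t->right[code] = (unsigned char)br;
        used[code] = 1;
    }
    return n;
}
```

- With min_count set to 2, the string ABABCABCD compresses to the four bytes
- 255 254 254 'D'. This is the article's XYYD with X = 255 and Y = 254.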
-
-
- Implementation
-
-
- Listing 3 and Listing 4 provide complete C programs for compression and
- expansion of files. The code is not machine dependent and should work with any
- ANSI C compiler. For simplicity, error handling is minimal. You may want to
- add checks for hash table overflow, expand stack overflow and input/output
- errors. The expansion program is much simpler and faster than the compression
- program.
- The compression algorithm spends the most time finding the most frequent pair
- of adjacent characters in the data. The program maintains a hash table
- consisting of arrays left[], right[], and count[] to count pair frequencies.
- The hash table size HASHSIZE must be a power of two, and should not be too
- much smaller than the buffer size BLOCKSIZE or overflow may occur. Programmers
- can adjust the value of BLOCKSIZE for optimum performance, up to a maximum of
- 32767 bytes. The parameter THRESHOLD, which specifies the minimum occurrence
- count of pairs to be compressed, can also be adjusted.
- Once the algorithm finds the most frequently occurring pair, it must replace
- the pair throughout the data buffer with an unused character. The algorithm
- performs this replacement in place within a single buffer. As it replaces each
- pair, the algorithm updates the hash table's pair counts. This method of
- updating the hash table is faster than rebuilding the entire hash table after
- each pair substitution.
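- The incremental update works because replacing one occurrence of a pair
- changes only the two pairs that touch it. The pair (previous byte, left byte)
- becomes (previous byte, new code), and the pair (right byte, next byte) becomes
- (new code, next byte). The sketch below is a simplified version of that
- bookkeeping, written under my own assumptions. It uses a plain 65,536-entry
- count array rather than Listing 3's hash table, and count_pairs and
- replace_pair are hypothetical names. Because it keeps exact int counts instead
- of saturating 8-bit counts, its result can be checked against a full recount.

```c
#include <stddef.h>
#include <string.h>

/* counts[a * 256 + b] = number of adjacent (a, b) pairs in buf[0..n). */
static void count_pairs(const unsigned char *buf, size_t n, int *counts)
{
    size_t i;
    memset(counts, 0, 256 * 256 * sizeof *counts);
    for (i = 0; i + 1 < n; i++)
        counts[buf[i] * 256 + buf[i + 1]]++;
}

/* Replace each non-overlapping (l, r) pair with code in place, adjusting
   the neighbouring pair counts instead of recounting (as Listing 3 does).
   Returns the new length. */
static size_t replace_pair(unsigned char *buf, size_t n, int *counts,
                           unsigned char l, unsigned char r,
                           unsigned char code)
{
    size_t rd = 0, w = 0;

    while (rd < n) {
        if (rd + 1 < n && buf[rd] == l && buf[rd + 1] == r) {
            if (w > 0) {            /* (prev, l) becomes (prev, code) */
                counts[buf[w - 1] * 256 + l]--;
                counts[buf[w - 1] * 256 + code]++;
            }
            if (rd + 2 < n) {       /* (r, next) becomes (code, next) */
                counts[r * 256 + buf[rd + 2]]--;
                counts[code * 256 + buf[rd + 2]]++;
            }
            buf[w++] = code;
            rd += 2;
        } else {
            buf[w++] = buf[rd++];
        }
    }
    counts[l * 256 + r] = 0;        /* the replaced pair no longer occurs */
    return w;
}
```

- Listing 3 sets the replaced pair's count to 1 rather than 0, because a zero
- count marks an empty hash slot and would break the probe chain. A direct array
- has no such constraint.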
-
-
- Pair Table Compression
-
-
- After the program has compressed a buffer, the pair table contains entries of
- those pairs of bytes that were replaced by single bytes within the buffer.
- Figure 1 shows a sample pair table resulting from compression of a string of 9
- characters, with a hypothetical character set limited to 8 characters. The
- pair table does not store the replacement bytes; rather, a pair's position in
- the table indicates the value of the replacement byte. For example, in Figure
- 1, pair 'A':'B' is found in the pair table's 8th entry, which indicates that
- this particular pair was replaced by the character 'H'. Those entries in the
- pair table not containing a replaced pair are distinguished by a left code
- whose value is equal to its index (index == leftcode[index]). (Note: The
- compression algorithm uses the array rightcode[] for two different purposes.
- During the initial part of the compression process, function fileread uses the
- rightcode[] array to flag used vs. unused characters. After buffer
- compression, rightcode[] serves as half of the pair table.)
- The algorithm must write the pair substitution tables to the output along with
- the packed data. It would be simple just to write the character code and pair
- for each substitution. Unfortunately, this method would require three bytes
- per code and would waste space. Therefore, this program applies a form of
- encoding to also compress the pair table before it is written to the output.
- To compress the pair table, the program steps through the table from the first
- entry through the last, classifying each entry as representing a replaced
- pair (index != leftcode[index]) or as not representing a replaced pair (index
- == leftcode[index]). To encode a group of contiguous replaced pairs, the
- program emits a positive count byte followed by the pairs. To encode a group
- of contiguous table entries that don't represent replaced pairs, the program
- emits a negative count byte followed by one pair.
- In the encoded pair table a positive count byte indicates to the expansion
- program how many of the following pairs of bytes to read, while a negative
- byte causes the expansion program to skip a range of the character set and
- then read a single pair. This technique allows many pairs to be stored with
- only two bytes per code.
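- The C sketch below reproduces the table encoding that filewrite performs in
- Listing 3 and the matching decoding loop from Listing 4. It works on memory
- buffers so that the result is easy to inspect. The function names
- encode_table and decode_table are hypothetical. A count byte above 127 skips
- (count - 127) literal entries and is followed by one more entry. A count of
- 127 or less is followed by count + 1 entries. Each entry is one byte if it is
- a literal and two bytes if it is a pair.

```c
#include <stddef.h>

/* Encode a pair table (left[c] == c marks a literal) the way filewrite in
   Listing 3 does. out needs room for up to 768 bytes; returns bytes written. */
static size_t encode_table(const unsigned char left[256],
                           const unsigned char right[256],
                           unsigned char *out)
{
    size_t n = 0;
    int c = 0, i, len;

    while (c < 256) {
        if (c == left[c]) {                 /* run of literals: count > 127 */
            len = 1;
            c++;
            while (len < 127 && c < 256 && c == left[c]) {
                len++;
                c++;
            }
            out[n++] = (unsigned char)(len + 127);
            len = 0;
            if (c == 256)
                break;
        } else {                            /* run of pairs: count <= 127 */
            len = 0;
            c++;
            /* Extend over pairs, and over a lone literal followed by a pair. */
            while ((len < 127 && c < 256 && c != left[c]) ||
                   (len < 125 && c < 254 && c + 1 != left[c + 1])) {
                len++;
                c++;
            }
            out[n++] = (unsigned char)len;
            c -= len + 1;
        }
        for (i = 0; i <= len; i++, c++) {   /* write entries c .. c + len */
            out[n++] = left[c];
            if (c != left[c])
                out[n++] = right[c];
        }
    }
    return n;
}

/* Decode a table written by encode_table, as Listing 4 does; returns bytes read. */
static size_t decode_table(const unsigned char *in,
                           unsigned char left[256], unsigned char right[256])
{
    size_t n = 0;
    int c = 0, i, count;

    for (i = 0; i < 256; i++) {
        left[i] = (unsigned char)i;
        right[i] = 0;
    }
    count = in[n++];
    for (;;) {
        if (count > 127) {                  /* skip a range of literals */
            c += count - 127;
            count = 0;
        }
        if (c == 256)
            break;
        for (i = 0; i <= count; i++, c++) { /* read count + 1 entries */
            left[c] = in[n++];
            if (c != left[c])
                right[c] = in[n++];
        }
        if (c == 256)
            break;
        count = in[n++];
    }
    return n;
}
```

- Consider a table that holds only the two pairs from the ABABCABCD example
- (255 = AB, 254 = 255 C). It encodes to 8 bytes: 254 127 253 255 'C' 0 'A' 'B'.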
- To further increase pair table compression, I've modified the algorithm from
- the preceding description to avoid disrupting runs of pairs where possible.
- Specifically, the algorithm does not encode an isolated, single byte not
- representing a pair; instead, the algorithm writes the byte to output along
- with the pair data without an accompanying right code. The expansion algorithm
- knows that the byte does not represent pair data because the byte occurs at a
- position such that byte value == leftcode[byte value].
-
-
- Expansion
-
-
- As opposed to the compression algorithm, which makes multiple passes over the
- data, the expansion algorithm operates in a single pass. You can think of the
- expansion algorithm as a black box which obtains input bytes from one of two
- sources, the input file, or a stack (see Figure 2). Regardless of an input
- byte's source, the algorithm processes each byte according to the following
- rule: if the byte is a literal, the algorithm passes it to the output; if the
- byte represents a pair, the algorithm replaces it with a pair and pushes the
- pair onto the stack.
- Now, to complete the loop, the algorithm selects its input source according to
- the following rule: If the stack contains data, the algorithm obtains its next
- input byte from the stack. If the stack is empty, the algorithm obtains its
- next input byte from the input file.
- The effect of these rules is "local" expansion of byte pairs; that is, if a
- byte expands to a pair, and that pair contains one or more bytes in need of
- expansion, the algorithm will expand the newly created bytes before it reads
- any more from the input file.
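- Here is a minimal C sketch of these two rules that expands one block held in
- memory. It follows the structure of Listing 4 but is not that listing, and
- bpe_expand is a hypothetical name. The stack holds 512 entries. I assume the
- stack never needs more than about 257 bytes, because each pair code refers
- only to codes created before it. An assert guards that assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Expand src[0..n) into dst using a pair table in which left[c] == c marks
   a literal. Returns the number of bytes written; dst must be large enough. */
static size_t bpe_expand(const unsigned char *src, size_t n,
                         const unsigned char left[256],
                         const unsigned char right[256],
                         unsigned char *dst)
{
    unsigned char stack[512];
    size_t in = 0, out = 0, sp = 0;

    for (;;) {
        unsigned char c;
        if (sp > 0)            /* stack has data: take input from it first */
            c = stack[--sp];
        else if (in < n)       /* otherwise read from the compressed block */
            c = src[in++];
        else
            break;

        if (left[c] == c) {    /* literal: pass it to the output */
            dst[out++] = c;
        } else {               /* pair code: push right, then left */
            assert(sp + 2 <= sizeof stack);
            stack[sp++] = right[c];
            stack[sp++] = left[c];
        }
    }
    return out;
}
```

- With X = AB and Y = XC, the packed block XYYD expands back to ABABCABCD.
- Because the left byte is pushed last, it is popped first, which keeps the
- output in order.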
-
-
- Advantages of BPE
-
-
- One significant advantage of the BPE algorithm is that compression never
- increases the data size by more than a few header bytes per block. This
- guarantee makes BPE suitable for real-time applications where the type of
- data to be compressed may be unknown. If no compression can be performed, BPE
- passes the data through unchanged except for the addition of those header
- bytes. Some algorithms, including LZW, can greatly inflate the size of certain
- data sets, such as randomized data or pre-compressed files.
- LZW compression adapts linearly to frequently occurring patterns, building up
- strings one character at a time. The BPE algorithm adapts exponentially to
- patterns, since both bytes in a pair can represent previously defined pair
- codes. The previously defined pair codes can themselves contain nested codes
- and can expand into long strings. This difference between LZW and BPE provides
- better compression for BPE in some cases. For example, under BPE a run of 1024
- identical bytes in a row is reduced to a single byte after only ten pair
- substitutions. This nesting of pair codes is the real power of the algorithm.
- The following example illustrates this process:
- Original input data string: ABABCABCD
- Change pair AB to unused X: XXCXCD
- Change pair XC to unused Y: XYYD
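- The exponential growth is easy to check, because the length a code expands to
- is the sum of the lengths of its two halves. The small recursive helper below
- computes that length from a pair table in which left[c] == c marks a literal.
- The name expanded_length is hypothetical. Take ten codes, each defined as the
- previous code paired with itself. They expand to 2^10 = 1024 bytes, which
- matches the run example above.

```c
/* Number of bytes that byte c expands to under a BPE pair table in which
   left[c] == c marks a literal. Each pair code refers only to codes created
   before it, so the recursion terminates for tables built by BPE. */
static unsigned long expanded_length(unsigned char c,
                                     const unsigned char left[256],
                                     const unsigned char right[256])
{
    if (left[c] == c)
        return 1;
    return expanded_length(left[c], left, right) +
           expanded_length(right[c], left, right);
}
```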
-
- Finally, both BPE's compression and expansion algorithms require little memory
- for data arrays, 5 to 30K for compression and only 550 bytes for expansion.
- The expansion routine is so simple that, coded in assembler, it should require
- only about 2K of memory for all code and data.
-
-
- Results
-
-
- The BPE program delivers performance comparable to LZW, as shown in Table 1. I
- compressed and expanded what I consider to be a typical binary file, the
- Windows 3.1 program WIN386.EXE. I measured the timing on a 33MHz 486DX PC
- compatible using MS-DOS 5.0 and Borland C++ 3.0.
- I tested the BPE program against the LZW program, LZW15V.C, from The Data
- Compression Book by Mark Nelson, using 12-bit codes with a 5021-entry hash
- table and 14-bit codes with an 18041-entry hash table. The 12-bit version uses
- less memory for data but does not compress quite as well. I also tested
- several other LZW programs and obtained similar results.
- The Default BPE column shows the results of using the default parameters from
- Listing 3, which are tuned for good performance on all types of files,
- including binary and text. Although BPE packed this binary file slightly
- better than LZW, performance will vary on other files depending on the type of
- data.
- The Small BPE column shows the results of reducing the amount of memory
- available for the compression program data arrays. I changed BLOCKSIZE from
- 5000 to 800 and HASHSIZE from 4096 to 1024. These changes only slightly
- decreased the compression ratio on the binary file, but the smaller buffer
- size will not work as well on text files.
- The Fast BPE column shows the results of increasing compression speed by
- changing THRESHOLD from 3 to 10. This change caused the program to skip pairs
- with fewer than 10 occurrences. Since the program compresses most frequently
- occurring pairs first, skipping low-frequency pairs near the end of block
- processing has little effect on the amount of compression but can
- significantly improve speed. This change reduced the compression time from 55
- to 30 seconds.
-
-
- Enhancing BPE
-
-
- The BPE algorithm could be enhanced by block size optimization. The block size
- is critical to both the compression ratio and speed. Large blocks work better
- for text; small blocks work better for binary data.
-
-
- Conclusion
-
-
- It's surprising that the BPE algorithm works as well as it does, considering
- that it discards all information on previous data and does not use
- variable-sized bit codes, contrary to many modern compression techniques. The
- BPE compression algorithm is useful for applications requiring simple, fast
- expansion of compressed data, such as self-extracting programs, image display,
- communication links and embedded systems. Advantages include a small expansion
- routine, low memory usage, tunable performance, and good performance on
- worst-case data. The disadvantages of BPE are slow compression speed and a
- lower compression ratio than provided by some of the commonly used algorithms,
- such as LZW. Even with these disadvantages, BPE is a worthwhile technique to
- have at your disposal.
-
-
- Bibliography
-
-
- 1) J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data
- Compression," IEEE Transactions on Information Theory, May 1977.
- 2) T. Welch, "A Technique for High-Performance Data Compression," Computer,
- June 1984.
- 3) M. Nelson, The Data Compression Book, M&T Books, 1991.
- Figure 1 Illustration of compression process with hypothetical character set
- Figure 2 Illustration of expansion process
- Table 1 Comparison of LZW and BPE performance
-                               12-bit   14-bit   Default  Small    Fast
-                               LZW      LZW      BPE      BPE      BPE
- --------------------------------------------------------------------------
- Original file size (bytes)     544,789  544,789  544,789  544,789  544,789
- Compressed file size (bytes)   299,118  292,588  276,955  293,520  295,729
- Compression time (s)           28       28       55       41       30
- Expansion time (s)             27       25       20       21       19
- Compression data size (bytes)  25,100   90,200   17,800   4,400    17,800
- Expansion data size (bytes)    20,000   72,200   550      550      550
-
- Listing 1 Compression algorithm (pseudocode)
- While not end of file
- Read next block of data into buffer and
- enter all pairs in hash table with counts of their occurrence
- While compression possible
- Find most frequent byte pair
- Replace pair with an unused byte
- If substitution deletes a pair from buffer,
- decrease its count in the hash table
-
- If substitution adds a new pair to the buffer,
- increase its count in the hash table
- Add pair to pair table
- End while
- Write pair table and packed data
-
- End while
-
-
- Listing 2 Expansion algorithm (pseudocode)
- While not end of file
- Read pair table from input
- While more data in block
- If stack empty, read byte from input
- Else pop byte from stack
- If byte in table, push pair on stack
- Else write byte to output
- End while
-
- End while
-
-
- Listing 3 Compression program
- /* compress.c */
- /* Copyright 1994 by Philip Gage */
-
- #include <stdio.h>
-
- #define BLOCKSIZE 5000 /* Maximum block size */
- #define HASHSIZE 4096 /* Size of hash table */
- #define MAXCHARS 200 /* Char set per block */
- #define THRESHOLD 3 /* Minimum pair count */
-
- unsigned char buffer[BLOCKSIZE]; /* Data block */
- unsigned char leftcode[256]; /* Pair table */
- unsigned char rightcode[256]; /* Pair table */
- unsigned char left[HASHSIZE]; /* Hash table */
- unsigned char right[HASHSIZE]; /* Hash table */
- unsigned char count[HASHSIZE]; /* Pair count */
- int size; /* Size of current data block */
-
- /* Function prototypes */
- int lookup (unsigned char, unsigned char);
- int fileread (FILE *);
- void filewrite (FILE *);
- void compress (FILE *, FILE *);
-
- /* Return index of character pair in hash table */
- /* Deleted nodes have count of 1 for hashing */
- int lookup (unsigned char a, unsigned char b)
- {
- int index;
-
- /* Compute hash key from both characters */
- index= (a ^ (b << 5)) & (HASHSIZE-1);
-
- /* Search for pair or first empty slot */
- while ((left[index] != a || right[index] != b) &&
- count[index] != 0)
- index = (index + 1) & (HASHSIZE-1);
-
- /* Store pair in table */
- left[index] = a;
- right[index]= b;
- return index;
- }
-
- /* Read next block from input file into buffer */
- int fileread (FILE *input)
- {
- int c, index, used=0;
-
- /* Reset hash table and pair table */
- for (c = 0; c < HASHSIZE; c++)
- count[c] = 0;
- for (c = 0; c < 256; c++) {
- leftcode[c] = c;
- rightcode[c] = 0;
- }
- size= 0;
-
- /* Read data until full or few unused chars */
- while (size < BLOCKSIZE && used < MAXCHARS &&
- (c = getc(input)) != EOF) {
- if (size > 0) {
- index = lookup(buffer[size-1],c);
- if (count[index] < 255) ++count[index];
- }
- buffer[size++] = c;
-
- /* Use rightcode to flag data chars found */
- if (!rightcode[c]) {
- rightcode[c] = 1;
- used++;
- }
- }
- return c == EOF;
- }
-
- /* Write each pair table and data block to output */
- void filewrite (FILE *output)
- {
- int i, len, c = 0;
-
- /* For each character 0..255 */
- while (c < 256) {
-
- /* If not a pair code, count run of literals */
- if (c == leftcode[c]) {
- len = 1; c++;
- while (len<127 && c<256 && c==leftcode[c]) {
- len++; c++;
- }
- putc(len + 127,output); len = 0;
- if (c == 256) break;
- }
-
- /* Else count run of pair codes */
-
- else {
- len = 0; c++;
- while ((len<127 && c<256 && c!=leftcode[c]) ||
- (len<125 && c<254 && c+1!=leftcode[c+1])) {
- len++; c++;
- }
- putc(len,output);
- c -= len + 1;
- }
-
- /* Write range of pairs to output */
- for (i = 0; i <= len; i++) {
- putc(leftcode[c],output);
- if (c != leftcode[c])
- putc(rightcode[c],output);
- c++;
- }
- }
-
- /* Write size bytes and compressed data block */
- putc(size/256,output);
- putc(size%256,output);
- fwrite(buffer,size,1,output);
- }
-
- /* Compress from input file to output file */
- void compress (FILE *infile, FILE *outfile)
- {
- int leftch, rightch, code, oldsize;
- int index, r, w, best, done = 0;
-
- /* Compress each data block until end of file */
- while (!done) {
-
- done = fileread(infile);
- code = 256;
-
- /* Compress this block */
- for (;;) {
-
- /* Get next unused char for pair code */
- for (code--; code >= 0; code--)
- if (code==leftcode[code] && !rightcode[code])
- break;
-
- /* Must quit if no unused chars left */
- if (code < 0) break;
-
- /* Find most frequent pair of chars */
- for (best=2, index=0; index<HASHSIZE; index++)
- if (count[index] > best) {
- best = count[index];
- leftch = left[index];
- rightch = right[index];
- }
-
- /* Done if no more compression possible */
- if (best < THRESHOLD) break;
-
-
- /* Replace pairs in data, adjust pair counts */
- oldsize = size - 1;
- for (w = 0, r = 0; r < oldsize; r++) {
- if (buffer[r] == leftch &&
- buffer[r+1] == rightch) {
-
- if (r > 0) {
- index = lookup(buffer[w-1],leftch);
- if (count[index] > 1) --count[index];
- index = lookup(buffer[w-1],code);
- if (count[index] < 255) ++count[index];
- }
- if (r < oldsize - 1) {
- index = lookup(rightch,buffer[r+2]);
- if (count[index] > 1) --count[index];
- index = lookup(code,buffer[r+2]);
- if (count[index] < 255) ++count[index];
- }
- buffer[w++] = code;
- r++; size--;
- }
- else buffer[w++] = buffer[r];
- }
- buffer[w] = buffer[r];
-
- /* Add to pair substitution table */
- leftcode[code] = leftch;
- rightcode[code] = rightch;
-
- /* Delete pair from hash table */
- index = lookup(leftch,rightch);
- count[index] = 1;
- }
- filewrite(outfile);
- }
- }
-
- int main (int argc, char *argv[])
- {
- FILE *infile, *outfile;
-
- if (argc != 3)
- printf("Usage: compress infile outfile\n");
- else if ((infile=fopen(argv[1],"rb"))==NULL)
- printf("Error opening input %s\n",argv[1]);
- else if ((outfile=fopen(argv[2],"wb"))==NULL)
- printf("Error opening output %s\n",argv[2]);
- else {
- compress(infile,outfile);
- fclose(outfile);
- fclose(infile);
- return 0;
- }
- return 1;
- }
-
- /*End of File */
-
-
- Listing 4 Expansion program
- /* expand.c */
-
- /* Copyright 1994 by Philip Gage */
-
- #include <stdio.h>
-
- /* Decompress data from input to output */
- void expand (FILE *input, FILE *output)
- {
- unsigned char left[256], right[256], stack[30];
- short int c, count, i, size;
-
- /* Unpack each block until end of file */
- while ((count = getc(input)) != EOF) {
-
- /* Set left to itself as literal flag */
- for (i = 0; i < 256; i++)
- left[i] = i;
-
- /* Read pair table */
- for (c = 0;;) {
-
- /* Skip range of literal bytes */
- if (count > 127) {
- c += count - 127;
- count = 0;
- }
- if (c == 256) break;
-
- /* Read pairs, skip right if literal */
- for (i = 0; i <= count; i++, c++) {
- left[c] = getc(input);
- if (c != left[c])
- right[c] = getc(input);
- }
- if (c == 256) break;
- count = getc(input);
- }
-
- /* Calculate packed data block size */
- size = 256 * getc(input); /* high byte first; two statements fix */
- size += getc(input); /* the order in which the bytes are read */
-
- /* Unpack data block */
- for (i = 0;;) {
-
- /* Pop byte from stack or read byte */
- if (i)
- c = stack[--i];
- else {
- if (!size--) break;
- c = getc(input);
- }
-
- /* Output byte or push pair on stack */
- if (c == left[c])
- putc(c,output);
- else {
- stack[i++] = right[c];
- stack[i++] = left[c];
- }
- }
-
- }
- }
-
- int main (int argc, char *argv[])
- {
- FILE *infile, *outfile;
- if (argc != 3)
- printf("Usage: expand infile outfile\n");
- else if ((infile=fopen(argv[1],"rb"))==NULL)
- printf("Error opening input %s\n",argv[1]);
- else if ((outfile=fopen(argv[2],"wb"))==NULL)
- printf("Error opening output %s\n",argv[2]);
- else {
- expand(infile,outfile);
- fclose(outfile);
- fclose(infile);
- return 0;
- }
- return 1;
- }
- /* End of File */
-
-
-
-
-
- Two Fast Pattern-Matching Algorithms
-
-
- Erick Otto
-
-
- Erick Otto has a degree in computer science and has been working in the
- customer support group for the 5ESS telephone switch since 1985. He has been
- programming in C since 1984 and in the last four years has been involved in
- programming a Customer Complaints Database for AT&T switching products. His
- particular interests are String/Pattern searching and techniques to make C
- programs run as fast as possible. He is currently working as supervisor of
- AT&T-NS local field support in Indonesia. He may be reached at
- eotto@lfsin.idgtw.attibr.att.com or com!att!attibr!idgtw!lfsin!eotto.
-
-
-
-
- Introduction
-
-
- This article introduces two algorithms that are very fast at finding strings
- in large text files. I have implemented them to perform string searching in a
- text database of over 120 megabytes. I needed algorithms that were very fast
- and flexible, because the normal (and still widely used) character-by-character
- comparison method was too slow to be practical. The algorithms I discuss here
- are about ten times faster on average than the "normal" UNIX grep.
- The first algorithm is the faster of the two. It is based on a tuned Boyer and
- Moore algorithm [2] further developed by Hume and Sunday [3]. The second
- algorithm, called "shift-OR," is based on algorithms found by Baeza-Yates and
- Gonnet [1] and further developed by Wu and Manber. It provides more
- flexibility, including regular-expression searching without much loss of
- speed.
-
-
- The Boyer and Moore Algorithm
-
-
- The basics of the first algorithm were published by Boyer and Moore in 1977
- [2]. Their original goal was to make as few character comparisons as possible
- in order to speed up the search, since a low number of character comparisons
- usually means that the program is faster. Efficient programming and testing
- various constructions of the algorithm can raise the execution speed further.
- One of the differences from grep is that the algorithm makes use of
- information retrieved from the pattern itself. Knowing that a certain
- character does not occur in the pattern is useful information. One other
- difference from grep is that the characters are compared right to left.
- The character comparison starts at the last character of the pattern and the
- related position of that character in the text. (See Figure 1a.) In other
- words, if we are positioned in the text at the character x, and the pattern we
- are looking for is today, there is no possibility that we can have a match at
- this position since x does not occur in the pattern.
- In order to know which characters occur in the pattern and which do not, we
- build a table dist, which contains the distances of every character in the
- pattern calculated from the end. The characters that do not occur in the
- pattern are set to the length of the pattern. (See Figure 2.) This is another
- difference from grep. We have some overhead in pre-processing the pattern.
- Now we can look up the character x in the table and find that the distance is
- 5. This means we can advance our text pointer by 5. (See Figure 1b.) The
- pointer into the pattern is untouched. We start the comparison again with the
- new pointer into the text.
- Now consider another situation, where the text character is an a. We can
- advance only dist[a], or one character in the text. That way, the second
- character on the left of both the text and the pattern would be the character
- a. (See Figure 1c.)
- Using this algorithm speeds up the search considerably because not every
- character in the text is compared with the pattern. It will at first only look
- for a starting position in the text to start the full pattern comparison.
- Another nifty trick that is used at this point is to see if the character that
- occurs the least frequently in the text (the lfc, for short) is present where
- we expect it. If we assume that m is the lfc in the pattern memorandum, we can
- do a pre-check on the leftmost occurrence of the lfc in the text (see Figure
- 3, which also shows at which positions each part of the algorithm is
- applied). If the pre-test is satisfied, we perform a normal match, comparing
- every character in the pattern with every character in the text.
- Finally the last basic step is to go forward in the text after we tested for a
- possible match. After extensive research performed by Hume and Sunday [3], the
- shift used is based on the first leftward re-occurrence of the last character
- of the pattern. If there is no such re-occurrence we forward by the pattern
- length.
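- The C sketch below combines the distance table, the skip loop, the lfc
- pre-check, and the Hume-Sunday shift. I reconstructed it from the description
- above, and it is not Listing 1. The name tbm_search is hypothetical, and the
- sketch works on one memory buffer without sentinels. The caller passes lfc,
- the index of the pattern character assumed to be rarest in the text, instead
- of deriving it from a frequency table.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Tuned Boyer-Moore sketch. dist[c] is the distance of c's rightmost
   occurrence from the end of the pattern (0 for the last character), or m
   if c does not occur. Returns the offset of the first match, or -1. */
static long tbm_search(const unsigned char *text, size_t n,
                       const unsigned char *pat, size_t m, size_t lfc)
{
    size_t dist[UCHAR_MAX + 1], shift, s, i;

    if (m == 0)
        return 0;
    if (m > n || lfc >= m)
        return -1;
    for (i = 0; i <= UCHAR_MAX; i++)
        dist[i] = m;
    for (i = 0; i < m; i++)
        dist[pat[i]] = m - 1 - i;

    /* Shift after a failed attempt: to the previous occurrence of the
       pattern's last character, or the whole length if there is none. */
    shift = m;
    for (i = m - 1; i-- > 0; )
        if (pat[i] == pat[m - 1]) {
            shift = m - 1 - i;
            break;
        }

    s = m - 1;                       /* text index under the last pattern char */
    while (s < n) {
        size_t k;
        while ((k = dist[text[s]]) != 0) {   /* skip loop */
            s += k;
            if (s >= n)
                return -1;
        }
        /* Pre-check the rare character, then compare the rest. */
        if (text[s - (m - 1) + lfc] == pat[lfc] &&
            memcmp(text + s - (m - 1), pat, m - 1) == 0)
            return (long)(s - (m - 1));
        s += shift;
    }
    return -1;
}
```

- For example, a search for today in "is it today or tomorrow? today!" stops
- the skip loop only at positions holding a y and returns offset 6.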
- This algorithm is very fast, but depends on the length of the pattern and the
- frequency of the characters in the pattern. Listing 1 shows my implementation
- of the Boyer and Moore search algorithm. Listing 3 shows a small program that
- will generate a frequency distribution derived from a file.
-
-
- The Shift-OR Algorithm
-
-
- The other algorithm that I present is based on binary operations instead of
- character comparisons. This is faster than grep because mostly binary
- operations are performed.
- In this method we track the state of the search by using binary operations.
- The state variable will tell us how many characters have matched to the left
- of the current position in the text. To do this we need to maintain a table
- that contains information obtained from the pattern. The position of every
- character that occurs in the pattern gets a zero set in the table entry for
- that character. (See Figure 4.)
- The characters are counted from left to right and the bits are counted from
- right to left. Since the character e occurs twice in the pattern stereo, it
- gets two bits set to zero. Characters that do not occur in the pattern get all
- ones set in the table. The initial state gets set to all ones. To perform the
- comparison we first shift the state left one bit. (In C, the shift inserts a
- zero.) We then bitwise-OR in the table entry for the current text character. It
- can be envisioned as a window, opening (zero) when comparing a certain
- position.
- Suppose that the current character in the text is s. We first shift the state
- left one bit and bitwise-OR in T[s]. The result is the new state of the
- search, which has bit zero set to zero. This means that at the current
- position in the text there is one match of length one. (See Figure 5a.)
- If the word stereo is in the text and we are positioned in the text at the
- second e, the state will be 1...101111. This means that there is one match of
- length five left of the current position. (See Figure 5b.)
- The way to decide if we have found a match is to set another variable, stop,
- according to the length of the pattern (stereo has length 6): all ones except
- for the low five bits, in this case 1...100000. Now, if the state is less than
- stop, bit five of the state is zero and we have found a match
- (1...011111 < 1...100000).
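- Here is a minimal C sketch of shift-OR as described so far. The name
- shift_or_search is hypothetical. I assume an unsigned long is the "word", so
- the pattern can be no longer than the number of bits in an unsigned long.
- Here stop = ~0UL << (m - 1). The test state < stop is true exactly when bit
- m - 1 of the state is zero, because every table entry has ones in all bits
- from m upward.

```c
#include <limits.h>
#include <stddef.h>

/* Shift-OR search. Bit i of T[c] is 0 when pat[i] == c. After
   state = (state << 1) | T[c], bit i of state is 0 exactly when the last
   i + 1 text characters match the first i + 1 pattern characters.
   Returns the offset of the first match, or -1. */
static long shift_or_search(const unsigned char *text, size_t n,
                            const unsigned char *pat, size_t m)
{
    unsigned long T[UCHAR_MAX + 1], state = ~0UL, stop;
    size_t i;

    if (m == 0 || m > sizeof(unsigned long) * CHAR_BIT)
        return -1;
    for (i = 0; i <= UCHAR_MAX; i++)
        T[i] = ~0UL;                     /* characters not in the pattern */
    for (i = 0; i < m; i++)
        T[pat[i]] &= ~(1UL << i);        /* open the "window" at position i */

    stop = ~0UL << (m - 1);              /* for "stereo": 1...100000 */
    for (i = 0; i < n; i++) {
        state = (state << 1) | T[text[i]];
        if (state < stop)                /* bit m - 1 is 0: full match */
            return (long)(i + 1 - m);
    }
    return -1;
}
```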
- The flexibility of this algorithm comes from the fact that we can easily
- manipulate the character table T. If we set the second bit from the right to a
- zero for every entry, this means we don't care what the second character of
- the pattern is. This is the so-called wildcard search. And the nice thing
- about this is that it doesn't cost a penny more to perform.
- We can apply the same logic to ranges of characters. We could for example set
- the second bit in table T for the characters t, p, and f to a zero. This would
- match the words stereo, spereo, and sfereo in the text. You could even make
- use of the information in the state to report partial matches.
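- Wildcards and character ranges change only how the table T is built. The
- search loop stays the same. The sketch below assumes a small hypothetical
- pattern syntax, with '?' for any character and [tpf] for a set. This syntax is
- not necessarily the subset implemented in Listing 2. The names build_table and
- table_search are also hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Build the shift-OR table for a pattern in which '?' matches any byte
   (its bit is cleared in every entry) and "[tpf]" matches any listed byte
   (its bit is cleared only for those entries). Returns the number of
   pattern positions, or 0 if the pattern is empty, has an unterminated
   '[', or is longer than an unsigned long has bits. */
static size_t build_table(const char *p, unsigned long T[UCHAR_MAX + 1])
{
    size_t pos = 0, c;

    for (c = 0; c <= UCHAR_MAX; c++)
        T[c] = ~0UL;
    while (*p) {
        unsigned long bit;
        if (pos >= sizeof(unsigned long) * CHAR_BIT)
            return 0;
        bit = 1UL << pos;
        if (*p == '?') {                          /* wildcard position */
            for (c = 0; c <= UCHAR_MAX; c++)
                T[c] &= ~bit;
            p++;
        } else if (*p == '[') {                   /* character class */
            for (p++; *p && *p != ']'; p++)
                T[(unsigned char)*p] &= ~bit;
            if (*p != ']')
                return 0;
            p++;
        } else {                                  /* literal byte */
            T[(unsigned char)*p++] &= ~bit;
        }
        pos++;
    }
    return pos;
}

/* Same loop as plain shift-OR; m is build_table's result (m >= 1). */
static long table_search(const unsigned char *text, size_t n,
                         const unsigned long T[UCHAR_MAX + 1], size_t m)
{
    unsigned long state = ~0UL, stop = ~0UL << (m - 1);
    size_t i;

    for (i = 0; i < n; i++) {
        state = (state << 1) | T[text[i]];
        if (state < stop)
            return (long)(i + 1 - m);
    }
    return -1;
}
```

- The pattern s[tpf]ereo matches stereo, spereo, and sfereo at the same cost per
- text character as a plain pattern.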
- This algorithm is slower than the previous one but does not depend on the
- length of the pattern or the frequency of the characters. It can also be used
- in a much more flexible way. Listing 2 shows my implementation of the shift-OR
- search algorithm.
-
-
- Implementations
-
-
- The code presented hides a few tricks I would like to mention. I have put a
- lot of comments in the code to make clear what is happening. First the speed
- of a program like this also depends on the I/O mechanism that is used. In line
- 20 in the first program (Listing 1) I therefore use the system-defined buffer
- size. This will reduce the number of I/O accesses. Using buffers smaller than
- the system buffer size usually causes unnecessary I/O.
- Further, in line 74 you will see a technique used to keep the number of tests
- inside a loop as small as possible. It usually saves a lot of execution time
- setting sentinels at the end or beginning of a buffer when searching for a
- character, instead of testing inside your loop for reaching the beginning or
- the end of your buffer.
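- As a small illustration of the sentinel idea (not code from the listings), the
- hypothetical function find_char below stores the wanted character just past
- the end of the buffer. The scanning loop therefore needs only one test per
- character. The sketch assumes the caller's buffer has one spare byte after the
- data, and it restores that byte before returning.

```c
#include <stddef.h>

/* Returns the index of the first ch in buf[0..n), or n if there is none.
   buf[n] must be writable: it temporarily holds ch as a sentinel, so the
   loop needs no separate end-of-buffer test. */
static size_t find_char(unsigned char *buf, size_t n, unsigned char ch)
{
    unsigned char saved = buf[n];
    unsigned char *p = buf;

    buf[n] = ch;            /* sentinel: the loop must stop by buf[n] */
    while (*p != ch)
        p++;
    buf[n] = saved;         /* restore the byte we borrowed */
    return (size_t)(p - buf);
}
```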
- In line 80 we make sure that we end the text search on a line boundary, to
- prevent the match routine from getting half words. The build routine does the
- preparation work for the pattern search. It initializes the distance table,
- finds the lowest frequency character, and sets the shift. In lines 176 and 177
- again you see the sentinel technique, although a little different than before.
- We need to put as many characters at the end of the buffer as the text pointer
- can be forwarded. (See Figure 3.)
- The two other interesting points are lines 191-193 and line 234. At line 191
- a technique called loop unrolling is used. Loop unrolling avoids much of the
- branching otherwise needed for every iteration of the loop. On modern
- pipelined machines the unrolled code is usually faster, since fewer branches
- are executed and the slightly longer loop body still fits in the instruction
- cache. Note that when k is zero after line 191, lines 192-193 also leave k at
- zero (and the text pointer does not move), which is a prerequisite for
- unrolling this loop. On other machine architectures, unrolling fewer or more
- times can be beneficial; you can experiment with it.
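- The unrolled fast-forward loop can be tried in isolation with a sketch like
- the following. skip_to_last_char is a hypothetical helper. It assumes the
- caller's buffer has at least m spare bytes after the text for the sentinels,
- and it does not restore those bytes afterward (matchpat in Listing 1 saves
- and restores them).

```c
#include <string.h>

/* Sketch of the unrolled fast-forward loop of matchpat() in
 * Listing 1. dist[] is the distance table and lastc the last
 * pattern character (so dist[lastc] == 0). The buffer must
 * have at least m writable bytes after text[n]; they are
 * filled with lastc so that the loop needs no bounds test.
 * Returns the first candidate index at or after start that
 * holds lastc (no match can end at a skipped index), or n if
 * the scan ran into the sentinels. */
static size_t skip_to_last_char(unsigned char *text, size_t n,
                                size_t start,
                                const unsigned char dist[256],
                                unsigned char lastc, size_t m)
{
    unsigned char *s = text + start;
    unsigned int k;

    memset(text + n, lastc, m);      /* sentinels past the end */
    k = dist[*s];
    while (k) {
        /* Unrolled three times. Once k is 0, s += k leaves s
         * unchanged and k stays 0, so the extra steps are safe. */
        k = dist[*(s += k)];
        k = dist[*(s += k)];
        k = dist[*(s += k)];
    }
    return (s >= text + n) ? n : (size_t)(s - text);
}
```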
- Last, on line 234 we forward the text to the end of the line if we reported a
- match. Since we already printed the full line, why search it for more
- occurrences of the pattern? If you want to count the number of matches, this
- line should be removed!
- This program can also easily be adapted to do case-insensitive searches. The
- pattern has to be converted to lower case, and the table entries for the
- characters occurring in the pattern have to be set to their distances both
- for those characters and for their upper-case counterparts. In addition,
- lines 205 and 209 have to be changed so that references to the text are
- translated to lower case. This hardly slows the program down if it is done
- with a translation table, using something like:
- unsigned char TR[MAXSYM];
- int i;
-
- /* build the case TRanslation table */
- for(i=0; i < MAXSYM; i++)
- TR[i] = i;
- for(i='A'; i <= 'Z'; i++) /* assumes ASCII */
- TR[i] = i + 'a' - 'A';
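- Here is a sketch of the case-insensitive variant described above. It assumes
- ASCII and a pattern already converted to lower case. The names build_tr,
- build_dist_nocase, and match_nocase are hypothetical; they are not functions
- from Listing 1.

```c
#include <ctype.h>
#include <string.h>

#define MAXSYM 256

static unsigned char TR[MAXSYM];    /* case translation table */

/* Identity table, except A..Z map to a..z (ASCII assumed). */
static void build_tr(void)
{
    int i;
    for (i = 0; i < MAXSYM; i++)
        TR[i] = (unsigned char)i;
    for (i = 'A'; i <= 'Z'; i++)
        TR[i] = (unsigned char)(i + 'a' - 'A');
}

/* Distance table with entries for both cases of every
 * character in the lower-case pattern lowpat of length m. */
static void build_dist_nocase(unsigned char dist[MAXSYM],
                              const unsigned char *lowpat, int m)
{
    int i;
    memset(dist, m, MAXSYM);
    for (i = 0; i < m; i++) {
        unsigned char d = (unsigned char)(m - 1 - i);
        dist[lowpat[i]] = d;
        dist[toupper(lowpat[i])] = d;   /* upper-case counterpart */
    }
}

/* Compare m text bytes with the pattern through TR[]. */
static int match_nocase(const unsigned char *text,
                        const unsigned char *lowpat, int m)
{
    int i;
    for (i = 0; i < m; i++)
        if (TR[text[i]] != lowpat[i])
            return 0;
    return 1;
}
```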
- The shift-OR program (Listing 2) uses the same efficient programming
- techniques. Starting at line 13, the size of a "word" (unsigned int) is
- defined. This is needed since the number of bits in a word is also the limit
- for the maximum pattern length that can be used. Of course you can use two
- words (this will slow the algorithm down), but the function matchpat will have
- to be altered.
- The build function at line 80 does the preprocessing of the pattern. I have
- included a small subset of the normal regular-expression pattern matching. The
- build function can handle a wildcard (dot or .) and ranges such as [a-z]. I am
- sure more inventive ways can be found to implement this, but this one works
- fine. Line 242 shows one limitation of this implementation: the first
- character must be an ordinary character, not a regular-expression
- metacharacter. This actually speeds things up, since matchpat can scan
- quickly for that character before running the shift-OR steps. Finally, on
- line 250, you see the actual binary operations being performed.
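- The wildcard mask and the state < stopmask test are the least obvious parts
- of Listing 2. The reduced sketch below handles only literal characters and
- the . wildcard, and assumes the pattern has no more positions than unsigned
- int has bits. T2, build_dot, and count_dot are hypothetical names. Because
- every T2 entry keeps all bits from position m upward set, state is below
- stopmask exactly when bit m-1 is clear, which is the match condition.

```c
static unsigned int T2[256];
static unsigned int stopmask;

/* Build the table for a pattern of literals and '.', in the
 * manner of build() in Listing 2 (ranges left out). */
static void build_dot(const char *pat)
{
    unsigned int bit, mask = ~0u;
    const char *p;
    int c;

    for (c = 0; c < 256; c++)
        T2[c] = ~0u;
    stopmask = 0;
    for (bit = 1, p = pat; *p; p++, bit <<= 1) {
        if (*p == '.')
            mask &= ~bit;                  /* any char matches here */
        else
            T2[(unsigned char)*p] &= ~bit;
        stopmask |= bit;                   /* collects bits 0..m-1 */
    }
    for (c = 0; c < 256; c++)
        T2[c] &= mask;                     /* apply wildcard mask */
    stopmask = ~(stopmask >> 1);           /* bits m-1 and above set */
}

/* Count the positions at which a match ends. */
static int count_dot(const char *text)
{
    unsigned int state = ~0u;
    int n = 0;

    for (; *text; text++) {
        state = (state << 1) | T2[(unsigned char)*text];
        if (state < stopmask)              /* bit m-1 of state is 0 */
            n++;
    }
    return n;
}
```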
- References
- [1] R. Baeza-Yates and G. H. Gonnet. "A New Approach to Text Searching,"
- Communications of the ACM, Vol. 35, No. 10, Oct. 1992.
- [2] R. S. Boyer and J. S. Moore. "A Fast String Searching Algorithm,"
- Communications of the ACM, Vol. 20, No. 10, Oct. 1977.
- [3] A. Hume and D. Sunday. "Fast String Searching," Software: Practice and
- Experience, Fall 1991.
- Figure 1 Text searching with a distance table
- Figure 2 The distance table used in Figure 1
- ...
- dist[a] = 1
- ...
- dist[d] = 2
- ...
- dist[o] = 3
- ...
- dist[x] = 5
- dist[y] = 0
- dist[z] = 5
- ...
- Figure 3 Detailed illustration of Boyer and Moore text search, with fast
- forwarding, lfc comparison, and shifting
- Figure 4 A table for the Baeza-Yates and Gonnet algorithm
- Pattern = stereo
-
- T[a] = 1...111111
- ...
- T[e] = 1...101011
- T[o] = 1...011111
- T[r] = 1...110111
- T[s] = 1...111110
- T[t] = 1...111101
- ...
- Figure 5 The shift-OR process
- Figure 5a
- initial state = 1...111111
-
- state<<1 = 1...111110
- T[s] = 1...111110
- ------------- or
- new state 1...111110
- Figure 5b
-
- state<<1 = 1...101110
- T[e] = 1...101011
- ------------ or
- new state 1...101111
-
- Listing 1 The Boyer and Moore algorithm
- /************************************************
- * Fast String Search
- * Using a tuned Boyer and Moore algorithm.
- * This source code is written by Erick Otto. Algorithms from
- * Daniel Sunday and Andrew Hume are used in this code.
- * 1993 by Erick Otto.
- */
-
- #include <stdio.h>
- #include <stdlib.h> /* exit */
- #include <string.h> /* strcpy, strlen, memcpy, memmove, memset */
- #include <fcntl.h> /* open, O_RDONLY */
- #include <unistd.h> /* read, close (POSIX) */
- #include "freq.h" /* double freq[256], generated by Listing 3 */
-
- /* Size of text buffer, BUFSIZ */
- /* is optimum from stdio.h */
- #define TBUF (8*BUFSIZ)
-
- #define MAXPAT 256
-
- /* We use the following variables for the search */
- /* pat : The pattern */
- /* dist : A table which contains the offsets */
- /* for every char in the pattern from */
- /* the end of the pattern */
- /* lfc : The least frequent char in the patt */
- /* lfcoff : The offset of lfc from end of patt */
- /* shift : The leftward re-occurrence of the */
- /* last character in positions from the */
- /* end of the patt */
-
- int patlen;
- unsigned char pat[MAXPAT];
- unsigned char dist[MAXPAT];
- int lfc, lfcoff;
- int shift;
-
- main(argc, argv)
- int argc;
- char **argv;
- {
- char buf[TBUF+MAXPAT];
- char match[MAXPAT];
- register int fd, nread;
- register char *beg, *lastnl, *end;
- char filename[80];
-
- /* must be at least two arguments */
- if (argc < 2 ) {
- fprintf(stderr,"\n\tMatch string expected\n");
- exit(1);
- }
- if (argc < 3 ) {
- fprintf(stderr,"\n\tFilename expected\n");
- exit(1);
- }
-
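- /* reject names that would overflow the buffers; this */
- /* also keeps the pattern length below 256, so every */
- /* distance fits in an unsigned char */
- if (strlen(argv[1]) >= sizeof match
- || strlen(argv[2]) >= sizeof filename) {
- fprintf(stderr,"\n\tPattern or file name too long\n");
- exit(1);
- }
- strcpy(match, argv[1]);
- strcpy(filename, argv[2]);
- build(match, strlen(match));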
-
- if ((fd = open(filename,O_RDONLY)) == -1) {
- fprintf(stderr,
- "Can't open file %s\n", filename);
- perror(filename);
- exit(1);
- }
-
- *buf = '\n'; /* sentinel for printing purposes */
- beg = buf+1;
-
- while((nread=read(fd,beg,(&buf[TBUF]-beg)))>0){
- end = beg+nread;
- lastnl = end;
- while(*--lastnl != '\n') ;
-
- /* lastnl points to the newline */
- matchpat(buf+1, lastnl-buf);
-
- /* move the partial last line, which follows */
- /* lastnl, to the start of the buffer */
- memmove(buf+1, lastnl+1, end-lastnl-1);
- beg = buf + (end-lastnl);
- }
- close(fd);
- }
-
- build(match, m)
- unsigned char *match;
- register m;
- {
- register unsigned char *patend, *patptr;
- register unsigned char *d;
- register int i;
- register unsigned char *pos;
- unsigned char *lastchar;
- int lfcidx;
-
- if(m > MAXPAT) {
- printf("Length of the pattern too long!\n");
- exit(1);
- }
-
- /* initialize the pattern variables */
- patlen = m;
- memcpy(pat, match, patlen);
-
- /* build the dist table */
- d = dist;
-
- /* initialize all elements to the pattern length */
- for(i = 0; i < MAXPAT; i++)
- d[i] = patlen;
-
- /* set all characters in the pattern to their */
- /* relative distances from the end of the pattern */
-
-
- patptr = pat;
- patend = patptr+patlen-1;
-
- while(patptr < patend) {
- d[*patptr] = patend-patptr;
- patptr++;
- }
- d[*patptr] = 0;
-
- /* find the least frequent character */
- /* that occurs in the pattern */
-
- lfcidx = 0;
- for(i = 1; i < patlen; i++){
- if(freq[pat[i]] < freq[pat[lfcidx]])
- lfcidx = i;
- }
-
- /* set the lf char and offset for later use */
- lfc = pat[lfcidx];
- lfcoff = lfcidx - (patlen-1);
-
- /* Look backward and see if we can find another */
- /* occurrence of the same character as the last one */
- lastchar = patptr;
- for(pos = lastchar-1; pos >= pat; pos--)
- if (*pos == *lastchar) break;
-
- /* record the occurrence for later use */
- shift = lastchar - pos;
- }
-
- matchpat(text, n)
- unsigned char *text;
- int n;
- {
- register unsigned char *e, *s;
- register unsigned char *dp;
- register int k;
- register int lfco, lfcc;
- register unsigned char *p, *q;
- register unsigned char *ep;
- register char *sp;
- register char *nl;
- register int t1;
- char save[MAXPAT];
-
- dp = dist;
- t1 = patlen-1;
- sp = save;
- s = text+t1;
- e = text+n;
-
- memcpy(sp, e, patlen);
- memset(e, pat[t1], patlen);
- lfco = lfcoff;
- lfcc = lfc;
- ep = pat + t1;
-
-
- while(s < e){
-
- /* fast forward through the text until we find */
- /* the last char of the pattern (unrolled loop) */
-
- k = dp[*s];
- while(k){
- k = dp[*(s += k)];
- k = dp[*(s += k)];
- k = dp[*(s += k)];
- }
-
- /* If we hit the sentinel at the end of */
- /* the buffer we are done with the buffer */
- if(s >= e)
- break;
-
- /* Neat little trick, actually makes things */
- /* faster. Do a pre check on the lfc, */
- /* before starting the full comparison */
- if(s[lfco] != lfcc)
- goto mismatch;
-
- /* normal forward string matching */
- for(p = pat, q = s-t1; p < ep; ){
- if(*q++ != *p++)
- goto mismatch;
- }
-
- /** A match has been found **/
- /* at position q - patlen */
-
- nl=q;
-
- /* look back for the closest newline */
-
- while(*--nl != '\n')
- ;
- while(*++nl != '\n') putchar(*nl);
- putchar('\n');
-
- /* we reported a match by printing the whole
- * line, so any other matches in this line do
- * NOT have to be searched for: restart with the
- * pattern window just after the newline at nl
- */
- s = nl + patlen;
- continue;
-
- mismatch:
- /* use the shift that we calculated */
- s += shift;
- }
- memcpy(e, sp, patlen);
- return(0);
- }
-
- /* End of File */
-
-
-
- Listing 2 The Baeza-Yates and Gonnet algorithm
- /*********************************************
- * Fast String Search, using shift or algorithm
- * July 1993 by Erick Otto.
- * Algorithms from Baeza-Yates and Gonnet
- * are used in this code.
- **********************************************/
-
- #include <stdio.h>
- #include <stdlib.h> /* exit, malloc, free */
- #include <string.h> /* strcpy, strlen, memmove */
- #include <fcntl.h> /* open, O_RDONLY */
- #include <unistd.h> /* read, close (POSIX) */
- #include <ctype.h>
-
- #define WORD 32 /* # of bits in an Uint */
- #define B 1 /* # of bits to shift */
- #define MAXSYM 256 /* alphabet size */
-
- unsigned int stopmask;
- unsigned int T[MAXSYM];
-
- #define TBUF (8*BUFSIZ) /* X times system buffer */
-
- #define MAXPAT WORD
-
- main(argc, argv)
- int argc;
- char **argv;
- {
- char buf[TBUF+2];
- char match[256]; /* raw pattern; ranges can make it longer than MAXPAT */
- register int fd, nread;
- register char *beg, *lastnl, *end;
- char filename[100];
- char first;
- char build();
-
- /* must be at least two arguments */
- if (argc < 2 ) {
- fprintf(stderr,"\n\tMatch string expected\n");
- exit(1);
- }
- if (argc < 3 ) {
- fprintf(stderr,"\n\tFile expected\n");
- exit(1);
- }
- if (strlen(argv[1]) >= sizeof match
- || strlen(argv[2]) >= sizeof filename) {
- fprintf(stderr,"\n\tPattern or file name too long\n");
- exit(1);
- }
-
- strcpy(match, argv[1]);
- strcpy(filename, argv[2]);
- first = build(match);
-
- if ( (fd = open(filename,O_RDONLY)) == -1) {
- fprintf(stderr,
- "Can't open file %s\n",filename);
- perror(filename);
- exit(1);
- }
-
- *buf = '\n'; /* sentinel for printing purposes */
-
- beg = buf+1; /* start filling buffer at buf+1 */
-
- while((nread=read(fd,beg,(&buf[TBUF]-beg))) > 0){
- end = beg+nread;
- lastnl = end;
-
- while(*--lastnl != '\n')
- ;
-
- /* lastnl points to the newline */
- matchpat(buf+1, lastnl-buf,first);
-
- /* move the partial last line, which follows */
- /* lastnl, back to the beginning of the buffer */
- memmove(buf+1, lastnl+1, end-lastnl-1);
- beg = buf + (end-lastnl);
- }
- close(fd);
- }
-
- char
- build(pat)
- register char *pat;
- {
- unsigned int mask;
- unsigned int i,j; /* unsigned: i and j also serve as bit masks */
- char *C_pat;
- char *space;
- char startrange;
- char endrange;
- char first;
-
- first = *pat;
-
- /* pre process */
-
- if (first == '.' || first == '[') {
- fprintf(stderr,"%s %s\n",
- "Can not start with wildcard \'.\'",
- "or regular expression range");
- exit(1);
- }
-
- /* Initialize the table */
- for (i=0;i<MAXSYM;i++) {
- T[i] = ~0;
- }
-
- /* allow for . wildcard */
- space=malloc((strlen(pat)+1) * sizeof(char));
- if (space == NULL) {
- fprintf(stderr,"Malloc failed\n");
- exit(1);
- }
- C_pat = space;
- mask = ~0;
-
- /* scan the pattern for ranges and or wildcards */
-
- for(i=1;*pat;i <<= B) {
-
- switch(*pat) {
-
- case '[':
- /*start of a range*/
-
- pat++;
- while(*pat != ']' && *pat) {
-
- if (isalnum(*pat)) {
-
- /* if nxt char is a dash it's a range */
-
- if (*(pat+1) == '-') {
- startrange = *pat;
- pat++;
- pat++;
- endrange = *pat;
- for(j=startrange;j<=endrange;j++){
- T[j] &= ~i;
- }
- *C_pat = startrange;
- } else {
- /*normal character in range*/
- *C_pat = *pat;
- T[*pat] &= ~i;
- }
- } else if (*pat == '-') {
-
- /* a dash that does not follow an alphanumeric */
- /* start character (e.g. "[-a]") is invalid */
- fprintf(stderr,
- "Invalid Expression\n");
- exit(1);
- } else {
-
- fprintf(stderr,
- "Invalid Expression\n");
- exit(1);
- }
-
- pat++;
- } /* while not ']' */
- C_pat++;
- break;
-
- case '.':
- /* We need to set the char position to something */
- /* special; in this case it doesn't really matter */
- *C_pat++ = 'a';
- mask &=~i;
- break;
- default:
- *C_pat++ = *pat;
- break;
- } /* end of switch */
- pat++;
- } /* end of for loop */
-
-
- /* close the string and rewind the ptr */
- *C_pat= '\0';
- C_pat = space;
-
- if (strlen(C_pat) > MAXPAT) {
- fprintf(stderr,
- "Pattern too long max %d chars\n",WORD);
- exit(1);
- }
-
- /* apply wildcard mask to all elements */
- for (i=0;i<MAXSYM;i++) {
- T[i] =T[i] & mask;
- }
-
- /* Set masks for all chars that occur in the */
- /* pattern and calculate the match criteria */
-
- stopmask=0;
- for(j=1;*C_pat;j <<=B) {
- T[(unsigned char)*C_pat] &= ~j;
- stopmask |= j;
- C_pat++;
- }
-
- stopmask = ~(stopmask >>B);
- free(space);
- return(first);
- }
-
- matchpat(text,n,first)
- register char *text;
- int n;
- char first;
- {
-
- register unsigned int state,initial;
- int matches;
- int gotoeol = 0;
- register char *nl;
- register char *end;
- char *savepos;
- char savechar;
-
- end = text+n;
-
- /* save the last character which we use */
- /* as a sentinel for the while loop */
- savepos = end+1;
- savechar = *savepos;
- *savepos = first;
-
- /* search */
- matches = 0;
- initial = ~0;
-
- do {
-
- /* fast scan */
-
- while(*text != first) text++;
-
- /* stop at the sentinel or at the end of the text */
- if (text >= end) break;
-
- state=initial;
-
- do {
-
- state= (state << B) | T[(unsigned char)*text];
-
- if (state <stopmask) {
- /****** match **********/
-
- nl=text;
-
- /*look back for the closest newline*/
- while(*--nl != '\n')
- ;
-
- while(*++nl != '\n') putchar(*nl);
- putchar('\n');
-
- /* skip the rest of the line and restart */
- /* the automaton at the start of the next one */
- text=nl;
- state=initial;
- }
- text++;
- } while (state != initial && text < end);
- } while (text < end);
-
- /* reset the character saved */
- *savepos=savechar;
- return(0);
- }
-
- /* End of File */
-
-
- Listing 3 Generates frequency distribution of characters in a file
- /*******************************************
- * Generate a header file with frequency
- * distribution obtained from specified
- * file
- *
- * 1993 by Erick Otto
- ******************************************/
- #include <stdio.h>
- #include <stdlib.h> /* exit */
-
- main(argc,argv)
- int argc;
- char *argv[];
- {
-
- FILE *iptr,*optr;
- int dist[256];
- int c;
- int i;
- int total;
-
- for (i=0;i<256;i++) dist[i] =0;
-
- total=0;
-
- if (argc < 3) {
- fprintf(stderr,"Not enough arguments\n");
- fprintf(stderr,"%s infile outfile\n",argv[0]);
- exit(1);
- }
-
- if ((iptr = fopen(argv[1],"r")) == NULL) {
- fprintf(stderr,
- "Can not open file %s\n",argv[1]);
- exit(1);
- }
- if ((optr = fopen(argv[2],"w")) == NULL) {
- fprintf(stderr,
- "Can not open file %s\n",argv[2]);
- exit(1);
- }
-
- while ((c = getc(iptr)) != EOF) {
- total++;
- dist[c]++;
- }
-
- fprintf(optr,"double freq[256] = { /* From %s */\n",
- argv[1]);
-
- for (i=0;i<256;i++) {
-
- fprintf(optr,"%G, ",(double)dist[i]/(double)total);
-
- if (i !=0 && (i % 4) == 0) fprintf(optr,"\n");
- }
- fprintf(optr,"};\n");
- }
-
- /* End of File */
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Coding with Printable Characters
-
-
- Thad Smith
-
-
- Thad Smith III is a consultant developing software for embedded measurement,
- control, and communication applications. He is the current moderator of the
- Fidonet C Echo and chairman of the local ACM chapter. He can be reached at
- (303)449-3322 or thadsmith@acm.org.
-
-
-
-
- Introduction
-
-
- I recently started exploring efficient ways to represent a binary stream with
- printable ASCII characters. While high efficiency is not needed for most
- applications, I found it a good exercise in exploring coding techniques. There
- are 95 printable ASCII characters, including space, which can be used, but I
- chose to exclude spaces, since spaces at the beginning or end of a line might
- get lost (or added!). That left 94 characters.
- Ninety-four characters could easily represent six bits (64 values), but that
- seemed wasteful. What could I represent with two characters? That would give
- 94*94 = 8,836 combinations, which would nicely hold 13 bits (2^13 = 8,192), a
- good fit. Then I asked the opposite question: how large a character set was
- needed to represent 13 bits in two characters? That would be the square root
- of 8,192, which is just over 90.5. I chose the 91 consecutive ASCII characters
- from ! through {. Listing 1 shows how the 13-bit values were coded into two
- characters. Listing 2 shows how the characters were translated back to the
- original 13 bits.
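- A small round trip shows that the arithmetic works for every 13-bit value.
- put13 and get13 are hypothetical names. Unlike Listing 1, the sketch writes
- into a caller's buffer instead of to a file.

```c
#define BASE91     91       /* characters '!' through '{' */
#define FIRST_CODE '!'      /* lowest output character */

/* Encode a 13-bit value (0..8191) as two characters. Since
 * 8191 / 91 is 90 remainder 1, the first character never
 * goes beyond '!' + 90, which is '{'. */
static void put13(unsigned int n, char out[2])
{
    out[0] = (char)(FIRST_CODE + n / BASE91);
    out[1] = (char)(FIRST_CODE + n % BASE91);
}

/* Decode two characters produced by put13(). */
static unsigned int get13(const char in[2])
{
    return (unsigned int)(in[0] - FIRST_CODE) * BASE91
         + (unsigned int)(in[1] - FIRST_CODE);
}
```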
- The harder part was grouping the input into 13-bit groups when encoding, then
- distributing the 13 bits into eight-bit bytes on output when decoding. The
- efficiency for this conversion is 13/16, since 13 bits are coded into 16 bits
- (assuming eight-bit characters), not counting the overhead of block start,
- block stop, and newlines. Later, someone suggested the challenge of writing an
- efficient encoder program. I knew that the 13/16 conversion program was fairly
- efficient, but wondered if I could do better. I could.
- The most efficient implementation would use all of the combinations of the
- character set, which I restricted to the 94 graphic characters. The amount of
- information in one such character is log2(94) = ln(94)/ln(2) = 6.55458+ bits.
- I was using 8*13/16 = 6.5 bits per encoded character. Using the notation of
- input bits per output bits, the efficiency limit for 94 characters is
- log2(94)/8 = 0.8193236+.
- I decided to use continued fractions to help find a higher encoding ratio.
- Continued fractions are a tool which mathematically suggests an efficient
- rational approximation of any real value. Real values are expressed by a
- series of compounded fractions, as shown by the example in Figure 1. By
- truncating the series at any point, we get a rational approximation to the
- exact value, with more terms offering better approximations. In the case of
- pi, we get the series 3/1, 22/7, 333/106, and 355/113 as the first four
- approximations, alternately below and above the true value.
- The expansion of log2(94)/8, our ideal encoding ratio, is shown in Figure 2.
- It expands to the series 0, 1, 4/5, 5/6, 9/11, 59/72, 68/83, ... Because no
- encoding can carry more input bits per output bit than this limit, we must
- choose a ratio from the "low" series 0, 4/5, 9/11, 68/83. I chose the ratio
- 9/11 = .81818..., since it is very close to the theoretical value, yet has a
- chance of reasonable implementation.
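- The convergents can be computed with the standard recurrence. The sketch
- below uses a hypothetical function named convergents and plain double
- arithmetic, which is adequate for the few terms needed here but not for long
- expansions. Given the value 0.8193236 quoted above, it reproduces the series
- 0, 1, 4/5, 5/6, 9/11, 59/72, 68/83. Given pi, it reproduces 3/1, 22/7,
- 333/106, 355/113.

```c
/* Compute up to count convergents h[i]/k[i] of the simple
 * continued fraction of x (x >= 0), using the recurrences
 *   h[i] = a[i]*h[i-1] + h[i-2],  k[i] = a[i]*k[i-1] + k[i-2].
 * Returns the number of convergents produced. */
static int convergents(double x, int count, long h[], long k[])
{
    long hm2 = 0, hm1 = 1;      /* h[-2], h[-1] */
    long km2 = 1, km1 = 0;      /* k[-2], k[-1] */
    int i;

    for (i = 0; i < count; i++) {
        long a = (long)x;       /* integer part (x >= 0) */
        double frac = x - (double)a;

        h[i] = a * hm1 + hm2;
        k[i] = a * km1 + km2;
        hm2 = hm1; hm1 = h[i];
        km2 = km1; km1 = k[i];
        if (frac < 1e-12)       /* expansion has terminated */
            return i + 1;
        x = 1.0 / frac;         /* continue with the remainder */
    }
    return count;
}
```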
-
-
- Converting the Data
-
-
- With a 9/11 encoder, nine eight-bit bytes are converted to an eleven-character
- representation. While that could be done in a straightforward manner, using
- multi-precision arithmetic to work with a 72-bit number, it would require a
- fair amount of code and processing time.
- I chose a method for which most of the input is converted in 32-bit chunks.
- Figure 3 shows a diagram of the way that the bits are redistributed on the way
- from nine bytes to eleven characters. A 32-bit value (from four consecutive
- input bytes) is broken into two parts: one part, less than 94^4, is converted
- into four characters; the second part, which I call a prefix, is in the range
- 0 to 55 (2^32/94^4 = 55.01+). Doing this twice converts eight bytes to eight
- characters plus two values in the range 0 to 55.
- After the two four-byte blocks are converted into eight characters, there are
- three remaining values to encode: two prefixes, in the range 0 to 55, and the
- ninth byte, in the range 0 to 255. Combining them into a single number yields
- 56*56*256 = 802,816 values, which is less than 94^3 = 830,584, allowing them to
- be converted into three more characters, making eleven altogether. Whew!
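- To check the arithmetic end to end, here is a straightforward sketch of the
- 9-to-11 conversion. It uses plain 32-bit division by 94^4 rather than the
- faster divisor described in the next section. enc9 and dec11 are
- hypothetical names, and uint32_t is assumed to be available. Because the
- divisor differs, their output generally differs from BAZ911 even though the
- character layout is the same.

```c
#include <stdint.h>

#define B94   94u
#define B94_2 (B94 * B94)                   /* 8,836 */
#define B94_4 ((uint32_t)B94_2 * B94_2)     /* 78,074,896 */
#define PBASE 56u                           /* prefix base */
#define FIRST '!'                           /* lowest output char */

/* in[1..4] and in[5..8] form two 32-bit blocks, each split into
 * a prefix (0..55) and four base-94 digits; the two prefixes
 * and in[0] form one value below 802,816 (three characters). */
static void enc9(const unsigned char in[9], char out[11])
{
    uint32_t qb = 0, last;
    int i, j;

    for (i = 1; i < 9; i += 4) {
        uint32_t block = ((uint32_t)in[i] << 24) | ((uint32_t)in[i + 1] << 16)
                       | ((uint32_t)in[i + 2] << 8) | in[i + 3];
        uint32_t rem = block % B94_4;
        qb = qb * PBASE + block / B94_4;    /* append prefix */
        for (j = 3; j >= 0; j--) {          /* digits stored high first */
            out[i + 2 + j] = (char)(FIRST + rem % B94);
            rem /= B94;
        }
    }
    last = (qb << 8) | in[0];               /* at most 802,815 */
    out[0] = (char)(FIRST + last / B94_2);  /* 0..90 */
    out[1] = (char)(FIRST + last % B94_2 / B94);
    out[2] = (char)(FIRST + last % B94);
}

/* Reverse of enc9; assumes the input is valid. */
static void dec11(const char in[11], unsigned char out[9])
{
    uint32_t last, q[2];
    int i, j;

    last = (uint32_t)(in[0] - FIRST) * B94_2
         + (uint32_t)(in[1] - FIRST) * B94
         + (uint32_t)(in[2] - FIRST);
    out[0] = (unsigned char)(last & 0xff);
    q[0] = (last >> 8) / PBASE;             /* prefix of in[1..4] */
    q[1] = (last >> 8) % PBASE;             /* prefix of in[5..8] */
    for (i = 1; i < 9; i += 4) {
        uint32_t block = 0;
        for (j = 0; j < 4; j++)
            block = block * B94 + (uint32_t)(in[i + 2 + j] - FIRST);
        block += q[i / 4] * B94_4;          /* i / 4 is 0, then 1 */
        for (j = 3; j >= 0; j--) {
            out[i + j] = (unsigned char)(block & 0xff);
            block >>= 8;
        }
    }
}
```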
-
-
- Faster Math
-
-
- Initially, the 32-bit value was separated into the prefix value and
- four-character encoded value by dividing by 94^4 = 78,074,896, where the
- quotient was the prefix and the remainder was the value to be encoded into
- four characters.
- I added a refinement by changing this divisor slightly. Keeping the maximum
- prefix at 55, I could lower the maximum value needed to be encoded into the
- four characters. Doing this, I chose a divisor of 1,171 * 2^16 (76,742,656).
- Because this divisor is a multiple of 2^16, only the upper 16 bits of the
- block take part in the division, so the encoder's 32-bit division by 94^4,
- which is slow on a 16-bit computer, was replaced by a 16-bit division.
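- A small sketch shows the effect of the new divisor. split_block is a
- hypothetical helper, and uint32_t is assumed to be available. PBMULT is the
- same expression as in Listing 3 and evaluates to 1,171.

```c
#include <stdint.h>

#define PBASE 56u
/* Same expression as Listing 3; evaluates to 1,171. */
#define PBMULT (unsigned)(((0xffffffffUL / PBASE) >> 16) + 1)

/* Split a 32-bit block into a prefix q (0..55) and a
 * remainder below PBMULT * 65536 = 76,742,656, which is less
 * than 94^4 and so still fits in four base-94 characters.
 * Only the upper 16 bits of the block are divided. */
static unsigned split_block(uint32_t block, uint32_t *rem)
{
    unsigned q = (unsigned)(block >> 16) / PBMULT;
    *rem = block - ((uint32_t)q * PBMULT << 16);
    return q;
}
```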
- The code in Listing 3 shows the calculation to encode the nine input bytes
- into the eleven output characters. The calculations are done slightly out of
- order, producing the two blocks of four characters first, even though they are
- last in the output buffer (I wanted to keep the output in the same
- significance order as the input).
- Inside the for loop, the 32-bit variable block is first assigned to a value
- from four input bytes. It is then divided into the prefix value q and a
- remainder. The remainder is then further broken into two-character groups,
- which are in turn broken into base-94 digits which are translated to
- characters in the ASCII range ! through ~. The two values of q in the loop are
- combined into qb by a base-56 to binary conversion, resulting in a value 0 to
- 3,135.
- After the loop, qb is combined with the ninth byte (from the first position)
- via shift and OR, then that result is divided into three output characters.
- Since the maximum value of a block on the last calculation is 802,815, it can
- be represented by two base-94 characters plus one base-91 character. The
- short-range (base-91) character is placed first in the 11-character block.
-
-
- Encoding and Decoding
-
-
- Listing 7 contains the function decode_11_to_9, which shows the code to
- convert from the eleven-character format, dubbed BAZ911, to the nine original
- bytes. It reverses the process of the encoding procedure by converting the
- first three characters into the two prefix codes and the first byte, then
- converts the two blocks of 4 characters, plus associated prefix, into 4 bytes
- each.
- Routines ebaz_data and dbaz_data use the low-level block encode and decode
- routines to encode and decode a data stream of arbitrary length. They are
- called with successive blocks of data, of any size, which are combined and
- converted, with the output being passed off to the output function specified
- in the initialization call. This technique allows the stream encoding and
- decoding to be embedded into various applications.
- These functions also form the encoded output into fixed-length lines (if
- desired), provide/detect a flag for the end of data, and generate/check a CRC
- for data integrity.
- Once a conversion has started, some way is needed to designate the end of
- conversion. EBAZ.C uses an invalid leading character, }, as a flag to
- designate end of conversion. It is followed by a character whose encoded value
- is the number of following bytes, from 0 to 14. The actual number of encoded
- bytes following is two greater, since the 16-bit CRC is included.
- To make the decoding process easier, the minimum size of encoded data is
- eleven characters, allowing a full block of eleven characters of encoded data
- to be assembled before testing for end of data. The encoded output is made
- just long enough to carry the needed bytes; only data streams of four bytes or
- fewer need to be padded with unused information. By retaining at least five
- bytes for use of the next ebaz_data call, the final ensemble, starting with },
- will be at least eleven characters long, thereby simplifying the decoding
- process.
-
-
- The Code
-
-
- Listing 4 shows BAZ.H, the header for the encoding and decoding functions.
- Listing 5 contains BAZCOM.H, a private header for DBAZ.C and EBAZ.C. Listing 6
- shows EBAZ.C, which contains the stream-level and low-level block encoding.
- Listing 7 shows DBAZ.C, which contains the stream-level and low-level block
- decoding. Listing 8 and Listing 9 show CRC16.H and CRC16G.C, respectively,
- which implement CRC calculations. Finally, Listing 10 and Listing 11 show
- BAZ.C and ZAB.C, respectively, which demonstrate the use of EBAZ and DBAZ to
- encode and decode files. The programs should work in any Standard C
- environment using the ASCII character set and eight-bit characters.
- I donate the code to the public domain, so it can be incorporated into your
- software, including commercial software, without need for licensing.
- Some Printable Encoding Background
- There are several ways to encode binary data into printable form. Some of the
- first I encountered were programs designed to allow an assembler to run on one
- type of machine but produce code for another. Since the type of file storage
- and I/O devices were, in general, unknown, these programs produced the output
- in records, one per line, containing hexadecimal values for the object data.
- Usually, there were other characters, such as S for Motorola's format or : for
- Intel's format, to designate the start of a record, followed by the
- hexadecimal data, each byte being encoded as two ASCII hexadecimal digits.
- These formats became popular and are still in use today for cross assemblers
- and compilers, PROM programmers, emulators, and other tools for cross-platform
- development.
-
- The efficiency of these formats, using two characters to represent one byte,
- took a back seat to portability. My first such use of a hexadecimal object
- format was in punched cards, read by a system which didn't even use ASCII or
- EBCDIC!
- Later, ASCII became popular and utilities such as UUEncode translated six bits
- of data into a 64-character subset of ASCII. Three eight-bit bytes could be
- broken into four six-bit values, which became four characters by either adding
- the value to the starting ASCII code or by indexing into a lookup table.
- Special character sequences would normally designate the start and end of
- coding, and other information about the file. The output was normally broken
- into lines of 80 characters or less for ease of handling.
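- The six-bit scheme can be sketched as follows. enc3to4 and dec4to3 are
- hypothetical names. The base code of 32 (space) is the offset classic
- uuencode used, so the value 0 becomes a space, the very character the BAZ911
- design avoids. Many later uuencode implementations use a backquote instead.

```c
/* Split three bytes into four 6-bit values and add a base
 * code of 32 to each, as classic uuencode did. */
static void enc3to4(const unsigned char in[3], char out[4])
{
    unsigned long v = ((unsigned long)in[0] << 16)
                    | ((unsigned long)in[1] << 8) | in[2];
    int i;

    for (i = 3; i >= 0; i--) {
        out[i] = (char)(32 + (v & 0x3f));   /* low 6 bits */
        v >>= 6;
    }
}

/* Reassemble the 24 bits and split them back into bytes. */
static void dec4to3(const char in[4], unsigned char out[3])
{
    unsigned long v = 0;
    int i;

    for (i = 0; i < 4; i++)
        v = (v << 6) | (unsigned long)((in[i] - 32) & 0x3f);
    out[0] = (unsigned char)(v >> 16);
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)v;
}
```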
- Figure 1
- Figure 2
- Figure 3 Distribution of bits in 9-to-11 encoding process
-
- Listing 1 Coding 13-bit values into two characters
- #include <stdio.h>
-
- #define BASE 91 /* # possible output chars */
- #define FIRST_CODE '!' /* lowest output character */
- #define MAKE_PRINT(c) ((c)+FIRST_CODE)
-
- extern FILE *outf; /* output file, opened elsewhere */
-
- void put_2_ASCII (unsigned int n)
- {
- /* put_2_ASCII() converts the 13-bit argument to two
- ** characters and writes them to the output file.
- */
- unsigned int rem;
- rem = n % BASE;
- n = n / BASE;
- putc (MAKE_PRINT(n), outf);
- putc (MAKE_PRINT(rem), outf);
- }
-
- /* End of File */
-
-
- Listing 2 Translating two characters back into 13-bit values
- #define BASE 91 /* # possible output chars */
- #define FIRST_CODE '!' /* lowest output character */
- #define BASEVAL(c) ((c)-FIRST_CODE)
- ...
- int t = getcode(); /* get next valid char */
- int u = getcode(); /* get next valid char */
- return BASEVAL(t) * BASE + BASEVAL(u);
-
- /* End of File */
-
-
- Listing 3 Calculation to encode 9 input bytes into 11 output characters
- #define BASE 94 /* # possible output chars */
- #define FIRST_CODE '!' /* lowest output char */
- #define PBASE 56 /* prefix base */
- #define BASESQ (unsigned long)(BASE*BASE)
- #define MAKE_PRINT(c) (char)((c)+FIRST_CODE)
- #define CV2ASCII(p,v) (*(p)=MAKE_PRINT((v)/BASE), \
- *(p+1)=MAKE_PRINT((v)%BASE))
- #define PBMULT (unsigned)(((0xffffffffUL/PBASE)>>16)+1)
- ...
- /* Encode 9 bytes into 11 printable ASCII chars. */
- unsigned long block; /* conversion area */
- int i; /* input byte index */
- unsigned int qb = 0; /* prefixes */
- ldiv_t ld; /* quotient, remainder */
-
- for (i = 1; i < 9; i += 4) {
- unsigned q;
- block = ((unsigned long)
- (((unsigned)in[i+0]<<8) | in[i+1])<<16)+
- (((unsigned) in[i+2]<<8) | in[i+3]);
-
- q = (unsigned) (block >> 16) / PBMULT;
- block = block- ((unsigned long)(q*PBMULT) << 16);
- ld = ldiv ((long) block, (long) BASESQ);
- CV2ASCII(out+i+2, (unsigned) ld.quot);
- CV2ASCII(out+i+4, (unsigned) ld.rem);
- qb = qb * PBASE + q;
- }
-
- /* Now convert the remaining byte and prefixes
- * from previous block conversions */
- block = ((unsigned long) qb << 8) | in[0];
- ld =ldiv ((long) block, (long) BASESQ);
- out[0] = MAKE_PRINT((unsigned) ld.quot);
- CV2ASCII(out+1, (unsigned) ld.rem);
-
- /* End of File */
-
-
- Listing 4 External definitions of BAZ routines
- /* BAZ.H
- * Contributed to Public Domain 9/93
- * by Thad Smith, Boulder Co.
- */
-
- /* External interfaces */
- typedef enum { /* decoder return status: */
- DECR_OK, /* normal return */
- DECR_NO_ENDMARK, /* no end marker on data */
- DECR_INVALID_DATA, /* invalid input data */
- DECR_CRC_ERR, /* CRC error */
- DECR_END /* decoding complete */
- } decode_stat;
-
- /* Output function type */
- typedef int outf_t (const char *out, size_t len);
-
- int
- ebaz_init ( /* Initialize encoder */
- int p_width, /* width of output lines */
- outf_t * p_outfunc); /* function taking output */
-
- int
- ebaz_data ( /* Encode data block */
- const unsigned char *data, /* input data */
- size_t len); /* length of data or 0=end */
-
- int
- dbaz_init ( /* Initialize decoder */
- outf_t * p_outfunc); /* function taking output */
-
- decode_stat
- dbaz_data ( /* Decode data block */
- const unsigned char *data, /* input data */
- size_t len); /* length of data or 0=end */
-
- /** Return number of characters in internal buffer.
- * This can be used after DECR_END status is returned
- * to determine the number of unused characters given
- * to the decoder. */
- size_t dbaz_excess_chars (void);
-
- void
- encode_9_to_11 ( /* Basic block encoder */
- char out[11], const unsigned char in[9]);
- int /* return: 0=OK, 1= invalid input */
- decode_11_to_9 ( /* Basic block decoder */
- unsigned char out [9], const char in[11]);
-
- /* End of File */
-
-
- Listing 5 Internal definitions for BAZ routines
- /* BAZCOM.H
- * Contributed to Public Domain 9/93
- * by Thad Smith, Boulder Co.
- */
-
- /* Internal constants */
- #define BASE 94 /* # possible output chars */
- #define FIRST_CODE '!' /* lowest output char */
- #define END_FLAG '}' /* starts last data block */
- #define PBASE 56 /* prefix base */
- #define BINB_LEN 9 /* length of binary block */
- #define ENCB_LEN 11 /* length of encoded block */
- #define MAX_ENDBLK_DB 13 /* max # data bytes in
- * final ensemble */
- #define MAX_ENDBLK_LEN (MAX_ENDBLK_DB+2+2*2) /* max
- *length of final ensemble, starting with END_FLAG */
- #define CRC_INIT_VALUE (~0) /* CRC seed */
- /* prefix multiplier */
- #define PBMULT (unsigned)(((0xffffffffUL/PBASE)>>16)+1)
-
- /* End of File */
-
-
- Listing 6 Encode 9 bytes to 11 characters
- /* EBAZ.C
- * Contributed to Public Domain 9/93
- * by Thad Smith, Boulder Co.
- */
- #include <stdlib.h>
- #include <string.h>
- #include "baz.h"
- #include "bazcom.h"
- #include "crc16.h"
-
- static unsigned int crc;/* input CRC */
- static int width; /* # output chars/line */
- static int col; /* current column */
- static int ninb; /* # bytes in inbuf */
- static unsigned char inbuf[BINB_LEN * 2];
- static char outbuf[ENCB_LEN * 2];
- static int (*outfunc) (const char *out, size_t len);
-
- #define BASESQ (unsigned long)(BASE*BASE)
- #define MAKE_PRINT(c) (char)((c)+FIRST_CODE)
-
- #define CV2ASCII(p,v) ((p)[0]=MAKE_PRINT((v)/BASE), \
- (p)[1]=MAKE_PRINT((v)%BASE))
-
- static int putn (const char *out, int n);
-
- /* Initialize the BAZ911 encoder. */
- int ebaz_init (
- int p_width, /* width of output lines */
- outf_t *p_outfunc
- ){
- initcrctab ();
- crc = CRC_INIT_VALUE;
- ninb = 0;
- col = 0;
- outfunc = p_outfunc;
- width = p_width;
- return 0;
- }
-
- /* Encode the next block of data. */
- int ebaz_data (
- const unsigned char *data,
- size_t len /* length of data or 0=end */
- ){
- int s; /* output return status */
-
- if (len) {
- unsigned int cl; /* # bytes needed for block */
- while (len > MAX_ENDBLK_DB - ninb) {
- if (ninb) {
- if (ninb < BINB_LEN) {
- cl = BINB_LEN - ninb;
- memcpy (inbuf + ninb, data, cl);
- len -= cl;
- data += cl;
- ninb = BINB_LEN;
- }
- /* convert block in inbuf */
- crc = updcrc (crc, inbuf, BINB_LEN);
- encode_9_to_11 (outbuf, inbuf);
- if ((s = putn (outbuf, ENCB_LEN)) != 0)
- return s;
-
- /* Now move remainder in inbuf down */
- memmove (inbuf, inbuf + BINB_LEN,
- ninb -= BINB_LEN);
- } else {
- /* Encode full blocks from input buffer */
- for (; len > MAX_ENDBLK_DB;
- data += BINB_LEN, len -= BINB_LEN) {
-
- crc = updcrc (crc, data, BINB_LEN);
- encode_9_to_11 (outbuf, data);
- if ((s = putn (outbuf, ENCB_LEN)) !=0)
- return s;
- }
- }
- }
- /* Copy remainder of input to working buffer */
-
- memcpy (inbuf + ninb, data, len);
- ninb += len;
- return 0;
-
- } else { /* final block of data */
- /** Write endmarker with final byte count.
- * Insert CRC and fill out blocks with 0xff to
- * prevent change when output is truncated. */
- char endmark[2];
- endmark[0] = END_FLAG;
- endmark[1] = MAKE_PRINT(ninb);
- if ((s = putn (endmark, sizeof endmark)) != 0)
- return s;
-
- crc = updcrc (crc, inbuf, ninb);
- inbuf[ninb++] = (crc >> 8) & 0xff;
- inbuf[ninb++] = crc & 0xff;
- memset (inbuf + ninb, 0xff, BINB_LEN * 2 - ninb);
- encode_9_to_11 (outbuf, inbuf);
- if (ninb > BINB_LEN) {
- ninb += 2;
- encode_9_to_11 (outbuf + ENCB_LEN,
- inbuf + BINB_LEN);
- } else if (ninb < BINB_LEN-2) ninb = BINB_LEN-2;
- /* Truncate the last block to the number of
- * required characters for the length. */
- if ((s = putn (outbuf, ninb + 2)) != 0)
- return s;
- return outfunc ("\n", 1);
- }
- }
-
- /* Send output to the output function, adding newline
- * character every width encoded characters. */
- static int
- putn (const char *out, int n)
- {
- int s; /* output return status */
- while (width && col + n > width) {
- if (width > col) {
- if ((s = outfunc (out, width - col)) != 0)
- return s;
- }
- if ((s = outfunc ("\n", 1)) != 0)
- return s;
- n -= (width - col);
- out += (width - col);
- col = 0;
- }
- col +=n;
- return outfunc (out, n);
- }
-
- /* Encode 9 bytes into 11 printable ASCII chars. */
- void encode_9_to_11 (char out[11],
- const unsigned char in[9])
- {
- unsigned long block; /* conversion area */
- int i; /* input byte index */
-
- unsigned int qb = 0; /* prefixes */
- ldiv_t ld; /* quotient, remainder */
-
- for (i = 1; i < 9; i += 4) {
- unsigned q;
- block = ((unsigned long)
- (((unsigned)in[i+0]<<8) | in[i+1])<<16)+
- (((unsigned) in[i+2]<<8) | in[i+3]);
-
- q = (unsigned) (block >> 16) / PBMULT;
- block = block- ((unsigned long)(q*PBMULT) << 16);
- ld = ldiv ((long) block, (long) BASESQ);
- CV2ASCII(out+i+2, (unsigned) ld.quot);
- CV2ASCII(out+i+4, (unsigned) ld.rem);
- qb = qb * PBASE + q;
- }
-
- /* Now convert the remaining byte and prefixes
- * from previous block conversions */
- block = ((unsigned long) qb << 8) | in[0];
- ld =ldiv ((long) block, (long) BASESQ);
- out[0] = MAKE_PRINT((unsigned) ld.quot);
- CV2ASCII(out+1, (unsigned) ld.rem);
- }
-
- /* End of file */
-
-
- Listing 7 Decode 9 bytes from 11 characters
- /* DBAZ.C
- * Contributed to Public Domain 9/93
- * by Thad Smith, Boulder Co.
- */
- #include <stdlib.h>
- #include <string.h>
- #include "baz.h"
- #include "bazcom.h"
- #include "crc16.h"
-
- #define LAST_CODE (FIRST_CODE + BASE -1)
- #define BASESQ (unsigned long)(BASE*BASE)
- #define BASEVAL(c) (unsigned)((c)-FIRST_CODE)
- #define ISENCODE(c) ((c) >= FIRST_CODE && \
- (c) <= LAST_CODE)
-
- static unsigned int crc;/* input file CRC */
- static size_t n_unused; /* # unused input bytes */
- static int ninb; /* # bytes in inbuf */
- static int done; /* set if finished */
- static char inbuf[ENCB_LEN * 2 + 2];
- static unsigned char outbuf[BINB_LEN * 2];
- static int (*outfunc) (const char *out, size_t len);
-
- static decode_stat
- decode_endmark(const char *data, size_t len);
- /* Initialize the BAZ911 decoder. */
- int dbaz_init (outf_t *p_outfunc) {
- initcrctab ();
- crc = CRC_INIT_VALUE;
-
- done = 0;
- ninb = 0;
- n_unused = 0;
- outfunc = p_outfunc;
- return 0;
- }
-
- /* Decode block of data. */
- decode_stat
- dbaz_data (const unsigned char *data, size_t len) {
- int s; /* output return status */
-
- if (done) {
- n_unused +=len;
- return DECR_END;
- } else if (len == 0)
- return DECR_NO_ENDMARK;
-
- for (;;) {
- /* fill block input buffer */
- while (ninb < ENCB_LEN) {
- if (ISENCODE(*data))
- inbuf[ninb++]= (char) *data;
- data++;
- if (--len == 0)
- return DECR_OK;
- }
- if (*inbuf == END_FLAG) {
- return decode_endmark ((const char *) data,
- len);
-
- }
- if (decode_11_to_9 (outbuf, inbuf))
- return DECR_INVALID_DATA;
- ninb = 0;
- crc = updcrc (crc, outbuf, BINB_LEN);
- if ((s= outfunc ((char *)outbuf,BINB_LEN)) != 0)
- return (decode_stat) s;
- }
- }
-
- /* Return number of characters in internal buffer.
- * This can be used after DECR_END status is returned
- * to determine the number of unused characters given
- * to the decoder. */
- size_t dbaz_excess_chars (void) {
- return n_unused;
- }
-
- decode_stat decode_endmark (
- const char *data, /* data following inbuf */
- size_t len /* length of data */
- )
- {
- int s; /* output return status */
- int nreq; /* # input bytes required for
- * end ensemble */
- unsigned int rc; /* # remaining data bytes */
-
-
- if ((rc = BASEVAL(inbuf[1])) > MAX_ENDBLK_DB)
- return DECR_INVALID_DATA;
-
- memset (inbuf+ninb, FIRST_CODE, 2*ENCB_LEN+2-ninb);
- nreq = rc +4 +CRCLBY;
- if (nreq < ENCB_LEN)
- nreq = ENCB_LEN; /* min size is 1 block */
- if (rc > BINB_LEN - CRCLBY)
- nreq += 2; /* use part of 2nd blk */
- /* get remaining characters */
- while (ninb < nreq) {
- if (len == 0)
- return DECR_OK; /* need more input chars */
- if (ISENCODE(*data))
- inbuf[ninb++] = *data;
- data++;
- len--;
- }
- done = 1;
- if (decode_11_to_9 (outbuf, inbuf+2))
- return DECR_INVALID_DATA;
- if (rc > BINB_LEN - CRCLBY) {
- if (decode_11_to_9 (outbuf+BINB_LEN,
- inbuf+2+ENCB_LEN))
- return DECR_INVALID_DATA;
- }
- crc = updcrc (crc, outbuf, rc);
- if ((s = outfunc ((char *) outbuf, rc)) != 0)
- return (decode_stat) s;
- if (((crc >> 8) & 0xff) != outbuf[rc] ||
- ( crc & 0xff) != outbuf[rc+1] )
- return DECR_CRC_ERR;
- n_unused = len;
- return DECR_END;
- }
-
- /* Decode 11 printable ASCII characters into 9 bytes */
- int /* return: 0=OK, 1= invalid input */
- decode_11_to_9 (unsigned char out[], const char in[]) {
- unsigned long block;
- unsigned b2, q1, q2, i;
-
- block = BASEVAL(in[0]) * BASESQ +
- (BASEVAL(in[1]) * BASE + BASEVAL(in[2]));
- out[0] = (unsigned) block & 0xff;
- b2 = (unsigned) (block >> 8);
- q1 = b2 / PBASE;
- q2 = b2 - q1 * PBASE;
- if (q1 >= PBASE) return 1;
-
- for (i = 1; i < 9; i += 4) {
- block = (BASEVAL(in[i+2]) * BASE +
- BASEVAL(in[i+3])) * BASESQ +
- (BASEVAL(in[i+4]) * BASE + BASEVAL(in[i+5]));
-
- if (((unsigned)(block >> 16) >= PBMULT) ||
- q1 == PBASE-1 && block >
- (0xffffffffUL -(PBMULT*(PBASE-1UL) << 16))) {
- return 1;
-
- }
-
- block += (unsigned long) (q1 * PBMULT) << 16;
- out[i+0] = (unsigned char) (block >> 24);
- out[i+1] = (unsigned char) (block >> 16);
- out[i+2] = (unsigned char) (block >> 8);
- out[i+3] = (unsigned char) block;
- q1 = q2;
- }
- return 0;
- }
-
- /* End of File */
-
-
- Listing 8 Header for 16-bit CRC calculation
- /* CRC16.H
- *
- * Use updcrc() for a block of data,
- * UPDATE_CRC1() for a single byte.
- *
- * Adapted from CRC-16F.C, a public domain routine
- * in Bob Stout's Snippets file collection.
- * Adaptations donated to public domain.
- */
-
- #define CRCW 16 /* # bits in CRC */
- #define CRCLBY (CRCW/8) /* CRC byte length */
- #define CRCMASK ((1U<<CRCW)-1) /* mask for full CRC */
-
- extern unsigned short crctab[1 << 8];
-
- void initcrctab(void); /* Initialize CRC table */
- unsigned short updcrc(unsigned short icrc,
- const unsigned char *icp, unsigned int icnt);
-
- #define UPDATE_CRC1(c,crc) (((crc)<<8) ^ \
- crctab[(((crc)>>(CRCW-8)) ^ (c)) & 0xff])
-
- /* End of File */
-
-
- Listing 9 Calculate the CRC, a block of data at a time
- /** CRC16G.C
- *
- * Adapted from CRC-16F.C, a public domain routine
- * in Bob Stout's Snippets file collection.
- * Adaptations donated to public domain.
- *
- * Call initcrctab() to initialize table.
- */
- #include "crc16.h"
-
- unsigned short crctab[1 << 8];
- static int initialized = 0;
-
- #define P 0x1021 /* CRC polynomial */
-
- unsigned short
-
- updcrc(unsigned short crc, /* prev CRC */
- const unsigned char *cp, /* new data */
- unsigned int cnt) /* # bytes */
- {
- while (cnt--)
- crc = UPDATE_CRC1(*cp++, crc);
- return (crc & CRCMASK);
- }
-
- void initcrctab(void) {
- unsigned int b, v;
- int i;
- if (initialized)
- return;
-
- for (b = 0; b <= (1 << 8) - 1; ++b) {
- for (v = b << (CRCW - 8), i = 8; --i >= 0;)
- v = v & 0x8000 ? (v << 1) ^ P : v << 1;
- crctab[b] = v;
- }
- initialized = 1;
- }
-
- /* End of File */
-
-
- Listing 10 Encode binary file to BAZ911 file
- /* BAZ.C
- * Contributed to Public Domain 9/93
- * by Thad Smith, Boulder Co.
- */
- #include <stdio.h>
- #include <stdlib.h>
- #include "baz.h"
-
- #define MAX_COL 70 /* # output chars / line */
-
- FILE *inf, *outf;
- unsigned char buf[8192];
-
- /* Write data block to output file. */
- /* ret value: 0 = OK, -1 = error */
- int
- enc_out (const char *out, size_t len)
- {
- fwrite ((const void *) out, 1, len, outf);
- return (ferror (outf) ? -1 : 0);
- }
-
- int
- main (int argc, char *argv[])
- {
- unsigned int n; /* # bytes read */
- int s; /* return output status */
-
- if (argc != 3) {
- puts ("BAZ - Convert binary file to "
- "BAZ911-encoded file");
- puts ("Usage: BAZ infile outfile");
-
- puts (" infile = input file (any format)");
- puts (" outfile= output file in BAZ911 format");
- return EXIT_FAILURE;
- }
- if ((inf= fopen (argv[1], "rb")) == NULL) {
- fprintf (stderr, "Error opening input file: %s\n",
- argv[1]);
- return EXIT_FAILURE;
- }
- if ((outf = fopen (argv[2], "w")) == NULL) {
- fprintf (stderr, "Error opening output file: %s\n",
- argv[2]);
- return EXIT_FAILURE;
- }
- fprintf (outf, "BAZ911: %s\n", argv[1]);
- ebaz_init (MAX_COL, enc_out);
- do {
- n = fread (buf, 1, sizeof buf, inf);
- s = ebaz_data (buf, n);
- } while (n > 0 && s == 0);
- putc ('\n', outf);
- if (ferror (outf)) {
- fprintf (stderr, "Error writing output file\n");
- return EXIT_FAILURE;
- }
- if (ferror (inf)) {
- fprintf (stderr, "Error reading input file\n");
- return EXIT_FAILURE;
- }
- return EXIT_SUCCESS;
- }
-
- /* End of File */
-
-
- Listing 11 Decode BAZ911-encoded file
- /* ZAB.C
- * Contributed to Public Domain 9/93
- * by Thad Smith, Boulder Co.
- */
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
- #include "baz.h"
-
- FILE *inf, *outf; /* input, output files */
-
- char *error_msg[] = {
- "", "End of file before end of data\n",
- "Invalid input data\n", "CRC error\n",
- "Conversion complete\n"};
-
- unsigned char buf[8192];
-
- /* Write data block to output file. */
- /* ret value: 0 = OK, -1 = error */
- int dec_out (const char *out, size_t len) {
- fwrite ((const void *) out, 1, len, outf);
- return (ferror (outf) ? -1 : 0);
-
- }
-
- int main (int argc, char *argv[]) {
-
- char line[100]; /* line buffer for header
- * search */
- unsigned int n; /* number of bytes read */
- decode_stat s; /* return status from
- * decode/write */
-
- if (argc!= 3) {
- puts ("ZAB - Convert BAZ911-encoded file to "
- "binary file");
- puts ("Usage: ZAB infile outfile");
- puts (" infile= name of file in BAZ911 "
- "format");
- puts (" outfile= output binary file");
- return EXIT_FAILURE;
- }
- if ((inf = fopen (argv[1], "r")) == NULL) {
- fprintf (stderr, "Error opening input file: %s\n",
- argv[1]);
- return EXIT_FAILURE;
- }
- if ((outf = fopen (argv[2], "wb")) == NULL) {
- fprintf (stderr,"Error opening output file: %s\n",
- argv[2]);
- return EXIT_FAILURE;
- }
- dbaz_init (dec_out);
- do {
- if (!fgets (line, sizeof line, inf)) {
- fprintf (stderr, "BAZ911 starting flag not "
- "found in %s\n", argv[1]);
- return EXIT_FAILURE;
- }
- }while (strncmp (line, "BAZ911:", 7) != 0);
- do {
- n = fread (buf, 1, sizeof buf, inf);
- s = dbaz_data (buf, n);
- if (s != DECR_OK) {
- if (s == DECR_END)
- break;
- if (s == -1)
- fprintf (stderr, "Error writing file\n");
- else
- fputs (error_msg[s], stderr);
- return EXIT_FAILURE;
- }
- } while (n > 0);
- if (ferror (inf)) {
- fprintf (stderr, "Error reading input file\n");
- return EXIT_FAILURE;
- }
- return EXIT_SUCCESS;
- }
-
- /* End of File */
-
-
-
-
-
- Intuitive Access to Bit Arrays
-
-
- Siegfried Heintze
-
-
- Siegfried Heintze is a Software Engineer involved in research, software
- development and training. He consults in all areas of software engineering and
- specializes in object oriented techniques. He has developed and taught courses
- on object oriented design and analysis and object oriented programming.
-
-
-
-
- Introduction
-
-
- Our ability to express our ideas is often constrained by the syntax and the
- semantics of a programming language. The difference between what we want to
- write and what we are constrained to write is spoken of as the "semantic gap."
- The better a language supports abstraction, the narrower the semantic
- gap. Facilities to support abstraction include abstract data
- types and object-oriented programming.
- This article identifies some semantic gaps in the C programming language and
- discusses bridging the semantic gap using C++. I had always regretted the fact
- that the C programming language had no explicit support for values sometimes best
- represented as single bits, such as TRUE or FALSE. C programmers often use
- typedef unsigned char Boolean;
- as a substitute when space is not an issue. If, however, it is evident that
- packing our bits into an unsigned long is appropriate, then we have to use
- macros. So, when we wanted to say:
- BitArray x, b;
- x[5] = b[6];
- we were constrained to using some macros that might make the above statements
- look like this in C:
- typedef unsigned long BitArray;
- BitArray x, b;
- int t;
- t = bitx(b,6);
- /* extract bit position 6, length 1 */
- bitd(x,5,t);
- /* deposit into bit position 5, length 1 */
-
-
- Class BitArray
-
-
- As you can see, there is quite a difference between what we wanted to write
- (or could write if we were writing in a language such as Pascal) and what the
- compiler constrained us to write. By incorporating the features of C++ we can
- overload the subscript operator (operator[]) to do whatever we want,
- correct? Well, maybe. Even C++ places some constraints on what we do. Let's
- start with a naive (but functional) implementation of class BitArray.
- class BitArray {
- public:
- BitArray();
- int operator[](int);
- void set(int value, int pos);
- private:
- unsigned long _data;
- };
- With this class definition, we can now write:
- BitArray x, b;
- x.set(b[6],5);
- This is an improvement over the previous fragment in C because we can use
- operator[] to extract a bit. Unfortunately, we cannot use operator[] and
- operator= (the assignment operator) to deposit a value into an array of bits.
- As a substitute, we introduce the member function set to deposit new values
- into the array.
- If we want to abandon member function set and allow operator[] to be a
- modifiable lvalue (an expression that can receive a value by being placed on
- the left hand side of an assignment statement), we typically overload
- operator[] to return a reference to an element of the array. Then we let the
- operator= actually perform the transfer of the value. We typically see
- operator[] overloaded as follows:
- class IntArray {
- public:
- int& operator[](int sub)
- {...}
- ...
- };
- The logic here is that we use a single function called operator[] to both
- deposit and extract values from the container object. The & indicates that we
- return a reference to a value instead of a value. Returning a reference
- permits operator[] to appear on the left side of an assignment statement, as
- in:
- IntArray x, b; x[6] = b[5];
- There are two problems with this approach, however:
- 1) We cannot return a reference to a single bit.
-
- 2) The function for fetching an element out of the array is identical to the
- function for storing an element. (In both cases we merely return a reference
- to the element).
- These constraints are clearly not a problem when implementing an array of int
- because our function return type is a reference to an int. The problem, then,
- is how to access (fetch and deposit) the elements of an array of bits using
- operator[] and operator=.
- The focus of the remainder of this article will be to examine some important
- features of the C++ language and develop a technique to solve the problem of
- returning a reference to a bit. Later, we will note that this is not the only
- place where type constraints make the naive use of operator[] difficult, so we
- can apply this technique elsewhere.
-
-
- const Function Arguments
-
-
- The const type qualifier allows us to tell the compiler that we promise not to
- modify an object. The object might be a function value, a function argument,
- or the secret (or implied) argument of a member function. This is done with a
- const type qualifier that becomes part of the signature of the function.
- Consider the following fragment of code:
- void func(const BitArray & x)
- {
- if (x[0] == 1)
- ...
- }
- Here we are passing the argument by reference (which sometimes indicates that
- we plan to modify the argument). But then we declare the argument to be const,
- which indicates that the body of code is not permitted to do so. This may seem
- confusing. Why didn't we just pass the argument by value (the default passing
- mechanism in C and C++ that automatically makes a copy of the object being
- passed)? Often the answer is that we could have, but passing by reference is
- more efficient because it is not necessary to invoke the copy constructor and
- destructor for the parameter x.
- Sometimes, in copy constructors for example, we are not permitted to pass by
- value. The problem with the above fragment of code is that the compiler will
- reject x[0]: our operator[] is not a const member function, so as far as the
- compiler knows, calling it might modify the const object x. This is not
- actually true, of course; we are just examining it. Nevertheless, we can observe that we will
- sometimes be using operator[] to modify a BitArray object and other times we
- will need to explicitly state that operator[] will not modify the object. We
- can use the const feature to overload operator[]. We now modify the definition
- of class BitArray:
- class BitArray {
- public:
- int& operator[](int sub)
- {...}
- int operator[](int sub) const
- {...}
- ...
- };
- The const in the second function is part of the signature and consequently
- becomes a criterion for selecting the correct function among several
- overloaded functions of the same name. We have solved one problem: we can now
- use operator[] for read-only access inside functions that don't permit us to
- modify BitArray objects. However, we are not finished yet because we don't
- know what the value is that we are going to deposit! Recall that operator[]
- has only two arguments: the implied argument (which is the array of bits) and
- the subscript.
-
-
- Class BitField
-
-
- Given the code:
- BitArray x, b;
- x[6] = b[5];
- How does the operator[] function on the left know that we want to deposit a
- value of one and not of some other value, perhaps zero? The answer is that it
- cannot. Only the operator= function knows this. In the last code fragment,
- operator= receives two integer values. But the operator= function needs to
- know:
- 1) the address of the representation of the array object (here the address of
- an unsigned long)
- 2) the subscript
- 3) the length of the bit field (in this example it is assumed to be one)
- 4) the value to be deposited in the array of bits.
- The problem, then, is to communicate these four pieces of information to the
- operator= function.
- To communicate the first three pieces of information we introduce a new class
- whose declaration looks something like this:
- class BitField {
- public:
- BitField(unsigned long *data,
- unsigned int pos,
- unsigned width);
- void operator=(unsigned long rhs);
- private:
- unsigned long *_data;
- int _pos;
- int _width;
- };
- We also change BitArray to accommodate this new change:
- class BitArray {
- public:
- BitField operator[](int)
- {...}
- int operator[](int) const
- {...}
- ...
- };
- Now we have solved another problem. We have a function BitField::operator=
- which will be called by the compiler on our behalf when the following code is
- encountered:
- BitArray x, b;
- x[6] = b[5];
- BitField::operator= has access to all four of the necessary pieces of
- information to perform the deposit into the array of bits. When the compiler
- sees x[6], it notes that x is not const and consequently calls the non-const
- function BitArray::operator[](int), which returns a BitField object.
- The compiler then looks for an appropriate assignment function and notes that
- class BitField has such a member function. This is then used to perform the
- actual masking to deposit the new value into the unsigned long which holds the
- actual representation of the array of bits. We might consider ourselves
- finished at this point. However, there is still another potential problem. If
- we code in the tradition of many C programmers, we may want to use a feature
- whereby we are allowed to have multiple assignments within a single statement,
- like the following:
- BitArray x;
- x[3] = x[2] = x[1] = x[0] = 1;
- This presents a problem because we declared our assignment operator function
- to be of type void, which means it is really a procedure that has no return
- value. So after we assign the value one to x[0] we have no value to assign to
- x[1].
- How can we make our new class BitField compatible with other built-in types in
- C? The key to this problem is to consider something counterintuitive. We
- normally don't think of operator= as a function that returns a value. But this
- is exactly why we are permitted to write expressions such as a = b = c in C and
- C++. We now have a definition of BitField that looks like this:
- class BitField {
- public:
- BitField(unsigned long *data,
- unsigned int pos,
- unsigned int width);
- BitField& operator=
- (unsigned long rhs);
- private:
- unsigned long *_data;
- int _pos;
- int _width;
- };
- We could have declared the assignment operator as:
- BitField operator=(unsigned long)
- which would have returned a value of type BitField instead of a reference to a
- BitField. This would have caused constructors and destructors for the BitField
- class to execute. Since C++ allows us to return references as function values
- we added the & for efficiency at no cost. We still have not solved our problem,
- however: operator= only accepts values of type unsigned long. Recall the
- fragment of code causing the problem:
- BitArray x;
- x[3] = x[2] = x[1] = x[0] = 1;
- While we now have the rightmost operator= returning a reference for use by the
- next rightmost operator=, we need a value of type unsigned long when in fact
- we have a reference to an object of type BitField. There are a couple of ways
- to resolve this problem. We shall now examine one of them.
-
-
- Conversion Operators
-
-
- C++ has yet another feature called conversion operators, which convert from a
- user-defined type to a different type (often a built-in type). We may define
- such a function as:
- class BitField {
- public:
- operator unsigned long()
- {...}
- };
- Since this is a member function, we have a single implied argument and a
- function return value of type unsigned long. The compiler automatically
- inserts calls to this function to perform the conversions we need for
- statements containing multiple assignments. We now reexamine the example of
- multiple assignment operators one last time:
- BitArray x;
- x[3] = x[2] = x[1] = x[0] = 1;
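- Progressing from right to left: x[0] returns a BitField object, whose member
- function operator= deposits the value one and returns a reference to that
- BitField object. This is a problem because the next operator= (the one applied
- to x[1]) wants an unsigned long value, not a reference to a BitField object.
- The conflict is resolved by invoking the conversion operator, which converts
- the BitField object to an unsigned long value that is then used by the next
- operator=. (Strictly, standard C++ prefers the compiler-generated copy
- assignment, BitField::operator=(const BitField&), at this point, because it
- needs no user-defined conversion. For the chain to deposit every value,
- BitField should define its own copy assignment that deposits the referenced
- value.)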
- There is one scenario in which the conversion operator does not work as we
- would like. Consider some arbitrary constructor that accepts an integer, and a
- function that accepts a const argument of that type:
- struct X{
- X(unsigned int x)
- {cout << "Create X ="
- << x << endl; }
- };
- void f(const X& x)
- {cout << "Function f"; }
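- We might expect f(a[1]) to work for a BitArray a: BitField's conversion
- operator would turn a[1] into an unsigned long, and X's constructor would turn
- that into an X. It does work when the BitArray is declared const, because the
- const operator[] returns an unsigned long directly, so only X's constructor is
- needed. For a non-const BitArray, however, the compiler would have to chain two
- user-defined conversions (BitField to unsigned long, then unsigned long to X),
- and C++ applies at most one user-defined conversion implicitly. The
- workaround, shown below, is to explicitly declare a temporary constant
- BitArray and subscript that instead.
- int main(){
- BitArray a;
- f(a[1]); //error
- const BitArray temp = a;
- f(temp[1]); //OK
- }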
-
-
-
- Other Applications
-
-
- Stepping back a bit, we had a scenario where it was intuitive to use the
- subscript notation (operator[]) to access elements of a container object. The
- problem was that, unlike arrays of integers, the method for depositing into
- the object was quite different from extracting a value. Where else might this
- technique be useful?
- The most obvious extension is to accommodate array elements represented by
- multiple bits. We can specify the number of bits to be used for each element
- in the constructor for BitArray. BitField objects will then contain the
- address, bit position, and length (i.e., number of bits). This is demonstrated
- in Listing 1, showing the file bitarr.h. The first argument to the constructor
- gives the width in bits. Listing 2 shows the accompanying file bitarr.cpp.
- Interested readers will want to modify this code to accommodate arrays whose
- representation requires more than a single long word of bits. We have a
- similar situation for a video display composed of pixels, where separate
- functions such as plot and readPixel are used for writing and reading pixels,
- respectively. If we have previously programmed in Pascal, we might be tempted
- to pursue implementing a syntax to accommodate code like this:
- VideoDisplay v;
- v[5,6] = v[20,10];
- While this was possible in some older C++ compilers, it is not permitted in
- the current C++ language: operator[] must take exactly one argument, and
- inside the brackets 5,6 is parsed as a comma expression whose value is 6.
- This is unfortunate, but there is a reasonable substitute:
- VideoDisplay v;
- v(5,6) = v(20,10);
- We can use the same technique of overloading operator() twice, the second time
- with a const type qualifier, then defining a temporary class of objects (call
- it class Pixel for this example) for which the appropriate conversion
- operators and assignment operators are defined. Yet another application of
- this technique might be ISAM (indexed sequential access method) files to
- implement persistent objects. We may find it more intuitive to access indexed
- records with the subscript (brackets) notation than to call separate
- functions to read and write indexed records.
-
- Listing 1 Definition of classes BitField and BitArray
- //file: BitArr.h copyright 1994 by Siegfried Heintze
- #ifndef BITARR_H
- #define BITARR_H
- #include "Boolean.h"
- #include <assert.h>
- #include <iostream.h>
-
- // Class of temporary objects to reference bit fields
- class BitField{
- public:
- BitField(unsigned long *data, unsigned int pos,
- unsigned width):
- _data(data), _pos(pos), _width(width)
- {
- assert(_pos+_width<=32);
- }
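- operator unsigned long() const {
- unsigned long mask = 0xffffffff >> (32 - _width);
- unsigned long result = (*_data >> _pos) & mask;
- return result;
- }
-
- // Copy assignment: deposit the value the right-hand BitField
- // refers to. Without this, the compiler-generated copy assignment
- // (an exact match) would be chosen for x[3] = x[2] and would copy
- // the BitField itself instead of the bit value.
- BitField& operator=(const BitField& rhs){
- return *this = (unsigned long)rhs;
- }
-
- //Let the compiler supply these as necessary:
- // BitField(const BitField& src);
- // ~BitField();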
-
- // Assignment operator. This is the code that deposits
- // an integer value into the packed array of bits.
- BitField& operator=(unsigned long rhs){
- unsigned long mask = 0xffffffff >> (32 - _width);
- rhs &= mask;
- rhs <<= _pos;
- mask <<= _pos;
- *_data &= ~mask;
- *_data |= rhs;
- return *this;
- }
- private:
- unsigned long *_data;
- int _pos;
- int _width;
- };
-
-
- class BitArray{
- public:
- // Default Constructor.
- // Optional arguments:
- // width: number of bits in an array element
- // val: initial value of representation of bit array
- BitArray(unsigned int width=1, unsigned long val=0):
- _len(32), _data(val),_width(width){
- assert(_width <= _len);
- }
- //Let the compiler supply these as necessary:
- // BitArray(const BitArray& src);
- // BitArray& operator=(const BitArray& rhs);
- // ~BitArray();
-
- // Return the number of bits in an array element
- unsigned int width() const {
- return _width;
- }
-
- // Return the index of the first element in the array
- unsigned int first() const {
- return 0;
- }
-
- // Return the index of the last element in the array
- unsigned int last() const {
- return _len/_width-1;
- }
-
- // Return the number of bits in the array
- unsigned int length()const{
- return _len;
- }
-
- // Fetch an array element
- unsigned long operator[](unsigned int s) const {
- s*=_width;
- assert(s+_width<=_len);
- unsigned long mask = 0xffffffff >> (_len - _width);
- unsigned long result = _data;
- result >>= s;
- result &= mask;
- return result;
- }
-
- // Possibly deposit a value into an array element.
- // Defer actual deposit operation to BitField::operator=
- BitField operator[](unsigned int s) {
- return BitField(&_data, s*_width, _width);
- }
-
- private:
- unsigned long _data;
- // Actual representation of BitArray
- unsigned int _len;
- // number of bits in representation
- unsigned int _width;
- // number of bits in a single array element
-
- };
-
- #endif
-
- /* End of File */
-
-
- Listing 2 Accompanies file bitarr.h
- //file: BitArr.cpp copyright 1994 by Siegfried Heintze
- #include "BitArr.h"
-
- ostream& operator<<(ostream& os, const BitArray & bv){
- for(int ii = bv.first(); ii<=bv.last(); ii++){
- os << bv[ii] << " ";
- }
- return os;
- }
-
- BitArray reverse(const BitArray& src){
- BitArray result(src);
- for(int ii = src.first(); ii<= src.last(); ii++)
- result[src.last()-ii] = src[ii];
- return result;
- }
-
- struct X{
- X(unsigned long x){
- cout<<"create X, x = "<<hex<<x;
- }
- };
- void f(const X&){
- cout<<" Function f"<<endl;
- }
-
- int main(){
- cout << "begin"<<endl;
-
- BitArray x(1, 0x8aaa71b5ul);
- cout << reverse(x) << endl;
- x[3] = x[2] = x[1] = 1;
- cout << reverse(x) <<endl;
-
- const BitArray xx = x;
- f(xx[1]);
-
- BitArray y(4, 0x8aaa71b5ul);
- cout << reverse(y) << endl;
- y[7]=13;
- y[1]=15;
- cout << reverse(y) << endl;
-
- BitArray z(16, 0x8aaa71b5ul);
- cout << reverse(z) << endl;
- z[1]=15;
- cout << reverse(z) << endl;
-
- return 0; }
-
- // End of File
-
-
-
-
-
-
- A Self-Extracting Archive for MS-DOS
-
-
- P.J. LaBrocca
-
-
- Pat LaBrocca is the author of ReCalc(TM), a set of rational expression
- calculators that never give answers (well, almost never), and run identically
- on PCs, Macintoshes and Apples. He has a BS and MA in Chemistry and teaches
- computer science at Peter Rouget Middle School 88 in Brooklyn, NY. You can
- contact him at plabrocc@nycenet.edu.
-
-
-
-
- Introduction
-
-
- An archive is a file that contains several other files. A self-extracting
- archive (SEA) is a file that doesn't need the original archiving software to
- release its contents. In this article I present a pair of utilities for creating
- SEAs under MS-DOS. The utilities provide a somewhat barebones implementation
- in that they don't provide features common to other archives, such as
- compression, or preservation of subdirectories. Once you understand how these
- utilities work, you may want to extend them. I provide some references at the
- end of this article for those wishing to extend the utilities.
-
-
- The SEA Structure
-
-
- My self-extracting archive consists of three parts. The first part is the set
- of files that have been archived, which I refer to as the component files. The
- second part, the extraction module, is the code embedded in the archive that
- reads the component files from the archive and writes them to new output
- files. Finally, a system of headers embedded in the archive tells the
- extraction module the lengths and names of the component files. The extraction
- module uses the information in the prefix headers to recreate the files. To
- avoid confusing C's include header files and MS-DOS's program file headers, I
- refer to my headers as prefix headers, since they prefix each component file.
- Also, when I refer to the "archiver" I am speaking of the utility that builds
- the archive. When I refer to the "SEA" I am speaking of the archive itself.
- The disk layout of a SEA containing n component files is shown in Figure 1.
- A terminating suffix header follows the last component file. The suffix header
- indicates to the extraction module that there are no more component files.
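- To make Figure 1's layout concrete, the following sketch (not part of the article's listings) computes the offset of each prefix header in a SEA. HEADER_SIZE and layout_sea are hypothetical names. The value 24 assumes the structure in Listing 1 (char filename[20] followed by a four-byte long, no padding), as a typical 16-bit MS-DOS compiler would lay it out; the real value is whatever sizeof(HEADER) is for the compiler that builds both arch.exe and extr.exe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: stands in for sizeof(HEADER). 24 assumes char filename[20]
   followed by a 4-byte long with no padding, as a typical 16-bit MS-DOS
   compiler would lay out the structure in Listing 1. */
#define HEADER_SIZE 24L

/* Stores in offsets[i] the position of the i-th prefix header of a SEA whose
   extraction module is module_size bytes long and whose n component files
   have the sizes in sizes[]. Returns the offset of the suffix header. */
long layout_sea(long module_size, const long sizes[], size_t n, long offsets[])
{
    long pos = module_size;             /* first prefix header follows extr.exe */
    size_t i;

    assert(module_size >= 0);
    for (i = 0; i < n; ++i) {
        offsets[i] = pos;               /* prefix header for component file i */
        pos += HEADER_SIZE + sizes[i];  /* skip the header, then the file's bytes */
    }
    return pos;                         /* the suffix header goes here */
}
```

- With a 10,867-byte extraction module and component files of 100, 0, and 50 bytes, the prefix headers would start at offsets 10867, 10991, and 11015, and the suffix header at 11089.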
-
-
- Overview of SEA Process
-
-
- The archiver starts by copying the extraction module to the archive. Then for
- each component file, the archiver constructs a prefix header and writes the
- header and the associated file to the archive. When the archiver runs out of
- files to archive, it writes the suffix header.
- The SEA appears to MS-DOS as an .EXE file and executes like any other .EXE
- file. The SEA is actually an .EXE file with extra information appended to the
- end. The appended information does not interfere with the SEA's execution.
- (See the sidebar, "Piggybacking an .EXE File.")
- When you run the SEA, the extraction module reads the prefix headers and
- recreates each of the files in turn. This process continues until the SEA
- reads the suffix header, at which time the extraction is complete.
-
-
- The Archiver -- Detailed Operation
-
-
- The source code for the archiver is in arch.c (Listing 2). The extraction
- module and the archiver expect the same prefix headers, so I put the prefix
- header structure in a separate include file (see Listing 1). After verifying
- that the user has provided a list of files to archive, the archiver begins
- opening files. First, arch.exe tries to open the extraction module, extr.exe,
- to prepare to copy it to the archive. If arch.exe can't find the extraction
- module, arch.exe displays an error message and halts. The default name for the
- SEA to be created is out.exe. arch.exe tries to open the output file with this
- name, this time for writing, and issues a message if the open fails. arch.exe
- opens out.exe as a binary file to ensure that the library performs no
- unwanted conversions on the file. For example, under MS-DOS a file opened in
- text mode has each carriage-return/line-feed pair translated to a single
- newline on input (and the reverse on output), and a Ctrl-Z byte may be taken
- as end of file. Since an archive must be able to hold any kind of file,
- arch.exe opens every file in binary mode ("rb" or "wb"), thus guaranteeing the
- integrity of each byte.
- If all goes well, arch.exe copies the extraction module to out.exe and then
- closes extr.exe.
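- Copying in binary mode needs nothing more than a getc/putc loop like the one in Listing 2. The sketch below, under the hypothetical name copy_stream, adds a byte count and a check for write errors. It assumes both streams were opened in binary mode ("rb" and "wb", or a stream from tmpfile, which opens in binary update mode), so bytes such as carriage returns and Ctrl-Z pass through untouched.

```c
#include <assert.h>
#include <stdio.h>

/* Copies every byte from in to out. Both streams are assumed to be open in
   binary mode, so no newline or Ctrl-Z translation takes place.
   Returns the number of bytes copied, or -1 on a write error. */
long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;          /* disk full or other write failure */
        ++n;
    }
    return n;
}
```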
- Next, arch.exe attempts to open a file from the command line list. If arch.exe
- can't open the file, it displays a message indicating the name of the file,
- increments count, and forces a jump to the top of the loop. (In this case, the
- failure of a file to open may not indicate a problem. Wild-card expansion in
- the command line sometimes generates the name of a subdirectory, which, of
- course, can't be opened.)
- Next, arch.exe calls fseek and ftell to determine the size of the opened file.
- The call
- fseek( input, 0, SEEK_END );
- moves the file position indicator (an index into the file maintained by the
- FILE type) to one byte after the last byte in the file. A call to ftell
- returns the current file position. The combination
- fseek( input, 0, SEEK_END );
- header.filesize = ftell( input );
- stores the file's size, in bytes, in header.filesize. Another call to fseek,
- this time with argument SEEK_SET,
- fseek( input, 0, SEEK_SET );
- repositions input's file position indicator to the beginning of the file.
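- The size calculation can be wrapped in a small function. stream_size is a hypothetical name, and unlike arch.c the sketch checks the return values of fseek. Standard C does not require SEEK_END to be meaningful for binary streams, so this relies on the behavior of MS-DOS compilers (and most current systems), where it works as described above.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the size in bytes of a stream opened in binary mode and leaves the
   file position indicator at the beginning, or returns -1L on failure.
   Mirrors the fseek/ftell/fseek sequence used in arch.c. */
long stream_size(FILE *fp)
{
    long size;

    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fp);               /* one past the last byte == file size */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1L;
    return size;
}
```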
- Depending on how it is built, arch.exe may provide wildcard expansion of
- command line arguments. Many compilers provide an object module which can be
- linked into the program to provide this feature (see the sidebar "Wildcard
- Expansion"). Consequently, by the time arch.exe sees them, the command-line
- arguments may be file names with or without extensions, with partial paths or
- full paths. The prefix header structure expects at most a file name plus
- extension, so arch.exe has to reduce each command-line argument to that form.
- Some C compilers provide a function that does just that job. Unfortunately,
- no such function is part of Standard C. For example, Microsoft C provides
- _splitpath, which breaks a path into its component parts and stores them in
- strings. Zortech C++ 3.0 supplies the function filespecname, which returns a
- pointer to a string containing the file name plus extension. Instead of using
- a compiler-specific function, I created the function filename in Listing 2.
- filename is a stripped-down version of _splitpath that extracts the file name
- plus extension from a path. arch.exe passes filename the path and a character
- buffer. filename scans the path in reverse order, by decrementing a pointer,
- and stops at the last backslash or colon in the path (the first one it meets
- scanning backward) or at the start of the path. Everything after that point is
- the file name plus extension, which filename copies into the buffer. The call
- to filename completes the prefix header data structure.
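- For comparison, here is a sketch of the same idea under the hypothetical name base_name. Like filename, it scans backward from the end of the path and stops at a backslash or colon, but it stops just before the separator, so it needs no special case for the first character of the path. It assumes MS-DOS path syntax only; forward slashes are not treated as separators.

```c
#include <assert.h>
#include <string.h>

/* Copies the file name plus extension at the end of path into name and
   returns name. Scans backward from the end and stops just after the last
   backslash or colon, so "C:\\DOS\\EDIT.COM", "A:README.TXT" and "ARCH.C"
   yield "EDIT.COM", "README.TXT" and "ARCH.C". */
char *base_name(const char *path, char *name)
{
    const char *p;

    assert(path != NULL && name != NULL);
    p = path + strlen(path);                   /* one past the last character */
    while (p != path && p[-1] != '\\' && p[-1] != ':')
        --p;                                   /* back up over the name */
    strcpy(name, p);                           /* everything after the separator */
    return name;
}
```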
- arch.exe calls fwrite to write the prefix header to out.exe. After writing,
- fwrite leaves the file position indicator just beyond the prefix header, which
- is where the contents of the current file belong. After copying the current
- file to out.exe, arch.exe closes input in preparation for the next file and
- increments count.
- When there are no more files to be archived, arch.exe writes one final header,
- with header.filesize set to -1L, to the output file. This suffix header serves
- as the end-of-archive mark for the extraction module and completes the SEA.
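- The archiver's output side reduces to two operations: write a prefix header followed by the file's bytes, and finally write the suffix header. The sketch below uses the hypothetical helpers add_member and finish_archive, takes the file's contents from memory rather than from an open input file, and repeats the HEADER declaration from Listing 1 so that it stands alone. Unlike Listing 2, it clears the header with memset before filling it in, so the unused bytes of filename are zeros rather than leftover stack contents.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same layout as HEADER in sea.h (Listing 1). */
typedef struct header {
    char filename[20];
    long filesize;
} HEADER;

/* Hypothetical helper: writes one prefix header followed by size bytes of
   data. Returns 0 on success, -1 on a write failure. */
int add_member(FILE *out, const char *name, const char *data, long size)
{
    HEADER h;

    assert(strlen(name) < sizeof h.filename);
    memset(&h, 0, sizeof h);                 /* no stray bytes in the header */
    strcpy(h.filename, name);
    h.filesize = size;
    if (fwrite(&h, sizeof h, 1, out) != 1)
        return -1;
    if (size > 0 && fwrite(data, 1, (size_t)size, out) != (size_t)size)
        return -1;
    return 0;
}

/* Hypothetical helper: writes the suffix header, whose filesize is -1L. */
int finish_archive(FILE *out)
{
    HEADER h;

    memset(&h, 0, sizeof h);
    h.filesize = -1L;                        /* end-of-archive mark */
    return fwrite(&h, sizeof h, 1, out) == 1 ? 0 : -1;
}
```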
-
-
- Extraction Module -- Detailed Operation
-
-
- Listing 3 contains the source code for the extraction module, extr.exe.
- extr.exe reads in prefix headers, and uses the information thus gleaned to
- recreate files.
-
-
-
- The Magic Number
-
-
- The extractor must know where the first prefix header starts, which means the
- extractor must know its own length. To get the size information into the
- extraction module, I needed to know the size of extr.exe before I compiled it.
- So I declared a long int, MagicNumber, and initialized it with a dummy value.
- Then I compiled and linked extr.c the usual way. I ran MS-DOS's DIR command to
- obtain extr.exe's file size and used this value to initialize MagicNumber. I
- had to recompile, of course, since I had edited the source code, but the size
- of extr.exe doesn't change: replacing one long constant with another changes
- neither the amount of code nor the amount of data. Now MagicNumber tells the
- extraction module how big it is. (I use a batch file to automate keeping the value of MagicNumber
- synchronized with the size of extr.exe. See "Miscellaneous Implementation
- Notes" for some details.)
-
-
- Command Line Processing
-
-
- When the extraction module begins execution (just inside function main), it
- first checks for arguments on the command line. If the user types an unknown
- option at the command line, the SEA displays a usage message and exits. When
- argc equals 1, the default action, extraction, is performed. The only option
- extr.exe recognizes is -l(ist), which causes a list of archived files and
- their sizes to be sent to the standard output.
- Under MS-DOS 3.0 and later, argv[0] contains the full path name of the
- program being executed, so the function call fopen(argv[0], "rb") opens the
- file that is currently executing. The program can open its own .EXE file from
- disk because the executing image is only a copy, loaded into memory, of part
- of the disk file; the file itself remains available for reading. Using this
- technique to open the SEA allows you to rename out.exe to whatever you want.
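- The argument checks described above amount to a three-way decision. The sketch below expresses it as a function; sea_mode_from_args and the enum names are hypothetical, and extr.c makes the same decision inline with strcmp and its sw variable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for what the extraction module can do. */
enum sea_mode { SEA_EXTRACT, SEA_LIST, SEA_USAGE };

/* No arguments means extract, a single "-l" means list, and anything else
   (an unknown option or more than one argument) means show the usage
   message. */
enum sea_mode sea_mode_from_args(int argc, char **argv)
{
    assert(argc >= 1 && argv != NULL);
    if (argc == 1)
        return SEA_EXTRACT;
    if (argc == 2 && strcmp(argv[1], "-l") == 0)
        return SEA_LIST;
    return SEA_USAGE;
}
```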
-
-
- Navigating the File
-
-
- The program calls fseek with arguments SEEK_SET and MagicNumber to move the
- file position indicator just past the extraction module, to the beginning of
- the first prefix header. (Remember to adjust MagicNumber if you edit extr.c!)
- In the while loop, fread reads in a prefix header. If it's the suffix header,
- there are no more files to extract, so the program exits the loop and closes
- input. Otherwise, the program attempts to create a file in the current
- directory using the string from the prefix header, header.filename. If a file
- with the same name already exists, the program overwrites it. The messages
- displayed along the way indicate progress. When the program has copied
- header.filesize bytes to the new file, it closes the new file, increments
- count, and starts the next iteration.
- The procedure for listing the component files is the same, except instead of
- copying a file, the program skips the file by calling
- fseek( input, header.filesize, SEEK_CUR );
- which moves the file position indicator header.filesize bytes forward from its
- current position, to the beginning of the next prefix header.
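- The listing loop can be sketched as a function that walks the headers. count_members is a hypothetical name, and its start parameter plays the role of MagicNumber. The sketch checks fread's return value, so an archive truncated before its suffix header ends the loop instead of repeating forever, and it forces a terminating null into filename before printing it.

```c
#include <assert.h>
#include <stdio.h>

/* Same layout as HEADER in sea.h (Listing 1). */
typedef struct header {
    char filename[20];
    long filesize;
} HEADER;

/* Lists the component files of an archive whose first prefix header starts
   at offset start, skipping each file's contents with fseek(..., SEEK_CUR).
   Returns the number of component files, or -1 if the archive ends before
   the suffix header. */
long count_members(FILE *in, long start)
{
    HEADER h;
    long count = 0;

    assert(in != NULL);
    if (fseek(in, start, SEEK_SET) != 0)
        return -1;
    for (;;) {
        if (fread(&h, sizeof h, 1, in) != 1)
            return -1;                             /* truncated archive */
        if (h.filesize == -1L)
            return count;                          /* suffix header reached */
        h.filename[sizeof h.filename - 1] = '\0';  /* never print past the array */
        printf(" %-15s%9ld\n", h.filename, h.filesize);
        if (fseek(in, h.filesize, SEEK_CUR) != 0)  /* skip the file's bytes */
            return -1;
        ++count;
    }
}
```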
-
-
- Miscellaneous Implementation Notes
-
-
- The prefix header is a structure declared in Listing 1, sea.h. The first
- member, filename, holds the file's name in an array of characters, as a
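- C-style string. In this implementation the array needs to be only thirteen
- bytes long (an eight-character base name, a dot, a three-character extension,
- and the terminating null character), although Listing 1 declares twenty. If
- you decide to store more than a base name, a dot, and an extension, you must
- adjust the array's size accordingly. A long int, filesize, contains the
- file's length.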
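- The thirteen-byte figure follows from MS-DOS's 8.3 naming rule. The sketch below spells out that arithmetic and adds a compile-time check, using the C89 negative-array-size idiom, that the filename member can hold any 8.3 name. DOS_NAME_MAX and name_fits are hypothetical names, and the HEADER declaration is repeated from Listing 1.

```c
#include <assert.h>
#include <string.h>

/* The longest MS-DOS file name: an 8-character base name, a dot, a
   3-character extension, and the terminating null character. */
#define DOS_NAME_MAX (8 + 1 + 3 + 1)

/* Same layout as HEADER in sea.h (Listing 1). */
typedef struct header {
    char filename[20];
    long filesize;
} HEADER;

/* Compile-time check: if filename were too short for an 8.3 name, the
   array size would be -1 and the compiler would reject this typedef. */
typedef char filename_is_big_enough
    [sizeof(((HEADER *)0)->filename) >= DOS_NAME_MAX ? 1 : -1];

/* Returns nonzero if name, already stripped of any path, fits in the
   header's filename member with room for the terminating null. */
int name_fits(const char *name)
{
    assert(name != NULL);
    return strlen(name) < sizeof(((HEADER *)0)->filename);
}
```

- A check like name_fits could guard the strcpy calls in filename, which assume the buffer is large enough.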
- If the size of extr.exe changes, you must update MagicNumber and recompile the
- extraction module. I run a little batch file, REMAKE.BAT (Listing 5), from the
- makefile each time extr.exe gets rebuilt; it prints a message to the screen
- indicating whether MagicNumber equals the size of extr.exe. The batch file
- creates a temporary file composed of the directory listing for extr.exe
- followed by extr.c. An awk program (Listing 6), called from the batch file,
- digs out the file size from the directory line and the value used to
- initialize MagicNumber, compares them, and prints a one-line report to the
- screen. (To keep the awk program simple, I put a space between MagicNumber's
- initializer and the semicolon.) I use MKS Awk, but other versions should work,
- too.
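- For readers without awk, the same comparison can be sketched in C. magic_matches is a hypothetical name. It parses the initializer with sscanf, whose white-space directives match any amount of white space (including none), so this version would not need the space before the semicolon. Obtaining the size of extr.exe, for example with fseek and ftell, is left to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Extracts the initializer from a source line of the form
   "long MagicNumber = 10867 ;" and compares it with exe_size.
   Returns 1 if they agree, 0 if they differ, and -1 if the line
   does not declare MagicNumber that way. */
int magic_matches(const char *line, long exe_size)
{
    long value;

    assert(line != NULL);
    if (sscanf(line, " long MagicNumber = %ld", &value) != 1)
        return -1;
    return value == exe_size;
}
```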
- To use the archiver, copy arch.exe and extr.exe to a separate subdirectory on
- your system. arch.exe expects to find extr.exe in the same subdirectory. The
- files to be archived can exist in any subdirectory and on other drives.
- However, the SEA as currently implemented does not store subdirectory or drive
- information. Therefore, when you run the SEA, it will extract all files to the
- current directory. This can be a problem if the archive contains duplicate
- file names from different subdirectories. The extractor will overwrite files
- with duplicate names. If you compiled with Microsoft C and linked with
- setargv.obj as described in the sidebar "Wildcard Expansion," you can use the
- usual MS-DOS wildcards, ? and *. Other compilers may or may not offer wildcard
- expansion as an option. The archiver produces a SEA named out.exe in the
- current directory. You can rename it to anything you want.
- To add compression to the archiver see "A Simple Data-Compression Technique"
- by Ed Ross in the October 1992 issue of The C Users Journal. He describes a
- method of run length encoding. The source code is available on the CUJ code
- disk, or you can download it from one of the online sources listed at the end
- of the table of contents.
- For an extensive introduction to methods of data compression in C, see The
- Data Compression Book by Mark Nelson, from M & T Books. Nelson presents
- explanations and detailed working versions of popular varieties of data
- compression. The final chapter contains a complete compression/archiving
- package, CARMAN.
-
-
- Conclusion
-
-
- The SEA and archiver I have described are very simple, but useful. Because of
- the SEA's simplicity, programmers should find it easy to modify for their own
- use. The SEA's straightforward structure also makes it useful as a learning
- tool.
- Wildcard Expansion
- MS-DOS does not provide wildcard expansion of command-line arguments to
- programs. Therefore, some compilers provide for wildcard expansion by linking
- extra code into the functions that process command line arguments. For
- example, Microsoft C provides such code in a file named setargv.obj.
- To provide expansion of wildcards from the command line, you must link
- setargv.obj with arch.obj. Before main is called, the startup code calls a
- routine named _setargv to build argc and argv from the command line. The
- default _setargv does not expand wildcards. By linking with setargv.obj, you
- replace the usual _setargv with one that does. setargv.obj is located in the
- MSC LIB subdirectory. To link in setargv.obj from the command line, use
- something like
- cl arch.c setargv /link /NOE
- You must use the /NOE link option, otherwise you get a "symbol multiply
- defined" error. /NOE instructs the linker not to search extended dictionaries,
- which are lists of symbol locations that speed up searching for references. A
- makefile for Microsoft's NMAKE.EXE utility is provided in Listing 4.
- Borland C++ includes a similar file called wildargs.obj. Zortech's version is
- called _MAINx.OBJ, where the x refers to the model.
- Piggybacking an .EXE File
- My SEA files use the .EXE extension to tell MS-DOS that they are executable
- files. However, they are not normal .EXE files; they've been extended. MS-DOS
- still executes them without any difficulty. An .EXE file has three parts: a
- program file header, relocation tables and a relocatable image of the program.
- The relocation table contains the information MS-DOS uses to adjust references
- to memory locations in the executable image. The program file header contains,
- among other things, information about how much RAM the executable image needs
- to run. Note that the file size returned by DIR and the amount of memory
- needed to run the program are not directly related. Nor does the header record
- how much room the whole file takes up on disk; its size fields describe only
- the load module. This is the trick I used to build an archive file. Adding
- bytes to the end of an .EXE file does not change the information in the
- program file header, so when you run a SEA, only the extraction module (i.e.,
- the original extr.exe) gets loaded.
- You can investigate this further using a utility called exehdr.exe that
- Microsoft includes with C. It allows you to examine and change a program file
- header. (Versions of C before 6.0 included a slightly different version called
- exemod.exe.) If you run exehdr.exe on extr.exe, the extraction module, and
- then on out.exe, a SEA that contains the extraction module, all of the numbers
- displayed are the same, except for the file size, which is obtained from the
- disk, rather than the program file header.
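- If you want to see which header fields are involved, the sketch below computes the length of the load module from the first six bytes of a program file header. It assumes the standard MZ layout: the two-byte signature "MZ", then a little-endian word giving the number of bytes used in the last 512-byte page (zero meaning a full page), then a word giving the number of 512-byte pages. exe_image_length is a hypothetical name; this is not code from the article.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the load module described by an MS-DOS program
   file header, or -1L if hdr does not start with an MZ header. */
long exe_image_length(const unsigned char *hdr, size_t len)
{
    unsigned last, pages;

    assert(hdr != NULL);
    if (len < 6 || hdr[0] != 'M' || hdr[1] != 'Z')
        return -1L;                     /* not an MZ program file header */
    last  = hdr[2] | (hdr[3] << 8);     /* bytes used in the last page */
    pages = hdr[4] | (hdr[5] << 8);     /* 512-byte pages, last one included */
    if (pages == 0)
        return -1L;
    return (long)(pages - 1) * 512L + (long)(last != 0 ? last : 512u);
}
```

- For a 10,867-byte extr.exe the header would hold 22 pages with 115 bytes used in the last one. Appending component files leaves both numbers unchanged, which is why out.exe loads exactly like extr.exe.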
- Figure 1 SEA structure
-
- Listing 1 Prefix header structure
- /* SEA.H */
- /* Copyright 1993 by P.J. LaBrocca
- All rights reserved.
- */
-
- #ifndef SEA_H
- #define SEA_H
-
- typedef struct header {
-
- char filename[20];
- long filesize;
- } HEADER;
-
- #endif
-
- /* End of File */
-
-
- Listing 2 Archiver
- /* ARCH.C */
- /* Copyright 1993 by P.J. LaBrocca
- All rights reserved.
- */
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
-
- #include "sea.h"
-
- /* Forms a filename plus extension,
- if any, from the full or partial path
- pointed to by path and stores it in
- buffer pointed to by name.
- */
- char *filename( char *path, char *name )
- {
- char *p;
-
- /* Start at last character */
- p = path + strlen( path ) - 1;
- while( p != path ) {
- if( *p == '\\' || *p == ':' )
- break;
- --p;
- }
- if( p == path && *p != '\\')
- strcpy( name, path );
- else
- strcpy( name, ++p );
- return name;
- }
-
- void main( int argc, char **argv )
- {
- FILE *input, *output;
- int c;
- int count = 1;
- HEADER header;
- long i;
- /* char extension[5]; for _splitpath() */
-
- if( argc == 1 ) {
- printf("Usage: arch file [file ...]\n");
- printf(" Wild cards * and ?.\n");
- exit( 0 );
- }
-
-
- /* open extractor module */
- if((input = fopen("extr.exe", "rb")) == NULL) {
- printf( "error opening extr.exe\n" );
- exit( 0 );
- } /* if( ( archive = */
-
- /* open the final archive file */
- if((output = fopen("out.exe", "wb")) == NULL) {
- printf( "error opening output\n" );
- exit( 0 );
- } /* if( ( output = */
-
- /* copy extractor to final output file */
- while( ( c = getc( input ) ) != EOF ) {
- putc( c, output );
- } /* while( ( c = */
-
- fclose( input );
-
- while( count < argc ) {
- if((input = fopen( argv[count], "rb"))==NULL) {
- printf( "Can't open %s\n", argv[count] );
- ++count;
- continue;
- } /* if( ( input = */
- printf( "Adding %-15s", argv[count] );
- fseek( input, 0, SEEK_END );
- header.filesize = ftell( input );
- fseek( input, 0, SEEK_SET );
-
- #if 0
-
- _splitpath( argv[count], NULL, NULL,
- header.filename, extension );
- strcat( header.filename, extension );
- #endif
- filename( argv[count], header.filename );
-
- fwrite( &header, sizeof( HEADER ), 1, output );
-
- for( i = 0; i < header.filesize; ++i ) {
- putc( getc( input ), output );
- } /* for( i = 0; i <*/
- fclose( input );
- printf("Done!\n");
- ++count;
- } /* while( ) */
- header.filesize = -1L;
- fwrite( &header, sizeof( HEADER ), 1, output );
- fclose( output );
- } /* main */
-
- /* End of File */
-
-
- Listing 3 Extraction module
- /* EXTR.C */
- /* Copyright 1993 by P.J. LaBrocca All rights reserved. */
- /* The Extraction Module */
-
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
-
- #include "sea.h"
-
- void bailout( void )
- {
- printf( "\nUsage: archiveName [ -l ]\n" );
- printf( " archiveName Extract files.\n" );
- printf( " archiveName -l List files.\n" );
- exit( 0 );
- }
-
- void main( int argc, char **argv )
- {
- FILE *input, *output;
- /* size of extraction module */
- long MagicNumber = 10867 ; /* sic */
- int count = 1;
- HEADER header;
- long i;
- int sw = 0; /* 0 = extract (default), 'l' = list */
-
- /* verify user input */
- if( argc > 2 ) {
- bailout();
- }
- if( argc == 1 )
- ;
- else if( strcmp( argv[1], "-l" ) == 0 )
- sw = 'l';
- else
- bailout();
-
- /* open self-extracting archive */
- if( ( input = fopen( argv[0], "rb" ) ) == NULL ) {
- printf( "error opening %s\n", argv[0] );
- exit( 0 );
- } /* if( ( input = */
-
- /* skip extraction module */
- fseek( input, MagicNumber, SEEK_SET );
-
- switch( sw ) {
- default: /* extract contents of archive */
- while( 1 ) {
- if(fread(&header, sizeof(HEADER), 1, input) != 1
- || header.filesize == -1L)
- break; /* suffix header or truncated archive */
- if((output=fopen(header.filename, "wb"))==NULL){
- printf("error opening %s\n",header.filename);
- exit(0);
- } /* if(( output = */
-
- printf("Creating %-55s", header.filename);
- for( i = 0; i < header.filesize; ++i ) {
- putc( getc( input ), output );
-
- } /* for( i = 0; i < */
- printf("Done!\n");
- fclose( output );
- ++count;
- } /* while( ) */
- break;
- case 'l': /* list contents of archive */
- while( 1 ) {
- if(fread(&header, sizeof(HEADER), 1, input) != 1
- || header.filesize == -1L)
- break; /* suffix header or truncated archive */
- printf(" %-15s%9ld\n", header.filename,
- header.filesize);
- /* Skip file contents. */
- fseek( input, header.filesize, SEEK_CUR );
- ++count;
- } /* while( ) */
- break;
- } /* switch */
-
- fclose( input );
- } /* main */
- /* End of File */
-
-
- Listing 4 Makefile for arch.exe and extr.exe
- # makefile for ARCH.EXE and EXTR.EXE
- # Copyright 1993 by P.J. LaBrocca
- # All rights reserved.
- #
-
- #CC =cl /c /AL /Od /Zi
- CC =cl /c /AL
-
- #CV = /CO
- CV =
-
- .c.obj:
- $(CC) $*.c
-
- all : arch.exe extr.exe
-
- arch.exe :arch.obj
- link arch+setargv,,, /NOE /NOI $(CV) ;
-
- arch.obj : arch.c sea.h
-
- # remake.bat requires awk
- extr.exe : extr.obj
- link extr, , , /NOI $(CV) ;
- # remake.bat
-
- extr.obj : extr.c sea.h
-
- backup: backup1 backup2
-
- backup1: arch.c makefile extr.c archart.doc sea.h
- cp -m $? a:\archart
- touch backup1
-
-
- backup2: remake.bat remake.awk
- cp -m $? a:\archart
- touch backup2
-
- hcopy: hcopy1
-
- hcopy1: arch.c makefile extr.c sea.h archart.txt
- pr -W -e4 -n' '3 $? > prn
- touch hcopy1
-
- # End of File
-
-
- Listing 5 A batch file to check MagicNumber
- rem REMAKE.BAT
- dir extr.exe > %tmp%\extr.tmp
- cat extr.c >> %tmp%\extr.tmp
- awk -f remake.awk %tmp%\extr.tmp
- del %tmp%\extr.tmp
-
-
- Listing 6 An awk file to check the size of MagicNumber
- # REMAKE.AWK
-
- $1 == "EXTR" && $2 == "EXE" {m1 = $3}
- $1 == "long" && $2 == "MagicNumber" {m2 = $4}
-
- END {
- if( m1 != m2 ) {
- printf("\n\tInitialize MagicNumber to %d", m1)
- print " ; and recompile EXTR.C!\n"
- }
- if( m1 == m2 )
- print "\n\tMagicNumber is ok.\n"
- }
-
- # End of File
-
-
-
-
- C++ Memory Management
-
-
- P.J. Plauger
-
-
- P.J. Plauger is senior editor of The C Users Journal. He is convenor of the
- ISO C standards committee, WG14, and active on the C++ committee, WG21. His
- latest books are The Standard C Library, and Programming on Purpose (three
- volumes), all published by Prentice-Hall. You can reach him at
- pjp@plauger.com.
-
-
- The authors begin by admitting that memory management on the 80X86 family of
- microprocessors is byzantine. I'd call that an understatement. I once
- estimated that hundreds of programmer years were expended simply because
- someone at IBM decided in the early 1960s that System 360 could do quite
- nicely with 12-bit displacements in essentially all of its machine
- instructions. At the time, I never thought that a future decision would ever
- be as costly. IBM, Intel, and others (acting separately and in concert) have
- proved me wrong with the IBM PC memory layout and access machinery.
- I won't bore you with a recitation of all those definitions. You know:
- conventional memory, upper memory, high memory, expanded memory, extended
- memory, and so forth. Chances are you care intimately about this stuff and
- know far more than I. Or you neither know nor care how all that PC software
- manages to find enough storage to get the job done. But there is also a real
- chance that you fall somewhere in the middle -- you'd love to write large
- programs that run under MS-DOS, but blanch at the thought of mastering all
- that memory-management nonsense.
- If you fall in that third category, or even the first, then this is a book you
- should take a serious look at. It is written in the tradition of the best
- articles in The C Users Journal. You will find oodles of code that does all
- the various bits of magic needed to do battle with EMS, XMS, and even swapping
- memory to disk. The presentation starts with the (reasonably) simple and
- progresses in stages to a comprehensive solution that hides most of the uglies
- behind a clean interface. Along the way, the writing is generally clear and to
- the point.
- You might even buy the book just for the code disk in the back. The authors
- assert that it works with both Borland C++ v3.1 and Microsoft C++ v7.0. I
- didn't try it out, mostly because I cling desperately to my ignorance of PC
- memory management. But I have to say that the code listings and the
- presentation certainly look credible. After reading thousands of manuscripts,
- I like to think that I've developed an eye for this sort of thing.
- Dorfman and Neuberger give you working C++ code to interface to EMS and XMS.
- That's laudable in its own right. But it also amounts to rather thin icing
- over a lumpy underlying cake. Most important, they end with a virtual memory
- manager (VMM) that manages whatever kind of memory it can get its hands on.
- And that can include what they call "fast" memory -- the conventional or upper
- memory that reads and writes as fast as the rest of the stuff in your program.
- Their VMM interface is simpler than any of its building blocks. Moreover, the
- underlying implementation is willing to adapt over a broad range of
- possibilities.
- I can only carp about a couple of small items, after a fairly cursory review
- of this book. One is that the authors tend to be a bit cute in their writing.
- They should think twice about quoting words and phrases. Almost every use
- should be replaced by a font shift, a less clichéd phrase, or straight text
- spoken without seeming apology. But that is a minor blemish on otherwise clear
- writing.
- The other carp is the choice of C++ as a presentation language. I found very
- few places where they arguably took advantage of the greater power of C++ over
- C. They might have produced a more usable product in C. On the other hand, they
- could have made better use of the information hiding made possible by C++. But
- that too is a minor issue, given the near universal packaging of C and C++
- compilers these days, particularly under MS-DOS.
- In summary, this book looks to be a valuable asset if you want to learn more
- about the various ways to stretch memory under MS-DOS. It is equally useful if
- you don't want to know any more than absolutely necessary, but still need to
- get the job done. That makes it a double winner, in my book.
- Title: C++ Memory Management
- Author: Len Dorfman and Marc J. Neuberger
- Publisher: Windcrest/McGraw-Hill, New York, 1993
- Price: $32.95
- ISBN: 0-8306-4288-9
- Pages: 293
-
-
-
-
- Standard C
-
-
- The Header <exception>
-
-
-
-
- P. J. Plauger
-
-
- P.J. Plauger is senior editor of The C Users Journal. He is convenor of the
- ISO C standards committee, WG14, and active on the C++ committee, WG21. His
- latest books are The Standard C Library, and Programming on Purpose (three
- volumes), all published by Prentice-Hall. You can reach him at
- pjp@plauger.com.
-
-
-
-
- Introduction
-
-
- This is the fourth installment in a series on the draft standard being
- developed for the C++ library. (See "Standard C: Developing the Standard C++
- Library," CUJ, October 1993, "Standard C: C++ Library Ground Rules," CUJ,
- November 1993, and "Standard C: The C Library in C++," CUJ, December 1993.) I
- am slowly working my way through the entire library, much as I did with the
- Standard C library in years past. The major difference is that this journey
- travels far more uncharted territory. So I endeavor to provide consistent
- landmarks along the way. For example, here, once again, is the overall
- structure of the draft C++ library standard:
- (0) introduction, the ground rules for implementing and using the Standard C++
- library
- (1) the Standard C library, as amended to meet the special requirements of a
- C++ environment
- (2) language support, those functions called implicitly by expressions or
- statements you write in a C++ program
- (3) iostreams, the extensive collection of classes and functions that provide
- strongly typed I/O
- (4) support classes, classes like string and complex that pop up in some form
- in every library shipped with a C++ compiler
- Thus far I have described (0) introduction and (1) the Standard C library. I
- have to confess, however, that I'm aiming at a moving target. Both those items
- have changed in an important regard, thanks to the introduction of namespaces
- into C++. Namespaces went into the language at the March 1993 meeting in
- Munich. The library adopted a style for using namespaces at the July 1993
- meeting in San Jose. But I won't confuse you by rehashing those topics just to
- make them current. That would be an endless battle, these days. You can expect
- changes in C++ every four months for the next year or two. We can only hope
- that the changes will grow ever smaller as the draft C++ standard converges to
- final form.
- More important, namespaces are so new that no commercial compiler I know of
- supports them yet. Nor are many likely to do so in the near future. And even
- when they're more widely available, you can bet that backward compatibility
- will remain an issue. I expect that namespaces will remain under the hood for
- some time to come. In the medium to long term, they are most likely to be of
- interest only to the more sophisticated C++ programmers.
- So I will begin, as I intended, with a new topic this month, (2) language
- support. It contains no small amount of news in the area of exceptions, which
- have been implemented by several C++ compilers. And it deals with issues of
- much wider interest, such as the global operator new and operator delete, plus
- a few newer odds and ends. For this installment, I focus only on the library
- support for exception handling.
- This section of the draft C++ library standard is called language support
- because it, more than any other part of the C++ library, is on rather intimate
- terms with the language proper. A compiler will generate code that uses the
- classes declared in this section, or that implicitly calls functions defined
- here. In the case of exception classes, you will also find quite a few fellow
- travelers, not directly connected to language support. But the tie is there,
- nevertheless.
-
-
- Exceptions
-
-
- Exceptions represent a significant departure in C++ from past programming
- practice in C. Much of what you write in C++ translates one for one to very
- similar C code. The rest may get longer winded, and a bit harder to read, but
- it's still conventional C. Exception handling, however, changes the underlying
- model of how functions get called, automatic storage gets allocated and freed,
- and control gets passed back to the calling functions.
- A compiler can generate reasonably portable C code to handle exceptions, but
- that code can have serious performance problems -- even for programs that
- don't use exceptions. The very possibility that an exception can occur in a
- called function changes how you generate code for the caller. Alternatively, a
- compiler can generate code directly that can't quite be expressed in C -- and
- face a different set of problems. It may be hard to mix such C++ code with
- that generated from C or another programming language. Perhaps you can see now
- why C++ vendors have generally been slow to add this important new feature to
- the language.
- What makes exception handling important is that it stylizes a common operation
- expressible in C only in a rather dirty fashion. You can think of exception
- handling, in fact, as a disciplined use of the notorious functions setjmp and
- longjmp, declared in <setjmp.h>. (Strictly speaking, setjmp is a macro, but
- let's not pursue that distraction for now.)
- In a C program, you call setjmp at a point to which you expect to "roll back."
- The function memorizes enough context to later reestablish the roll-back
- point, then returns the value zero. A later call to longjmp can occur anywhere
- within the same function or a function called from that function, however deep
- in the call stack. By unspecified magic, the call stack gets rolled back and
- control returns once again from setjmp. The only difference is, this time you
- can tell from the nonzero return value that a longjmp call initiated the
- return.
- That all adds up to a clever bit of machinery, used to pull off all sorts of
- error recovery logic over the past couple of decades. The only trouble is,
- it's too clever by half. Many implementations have trouble determining how to
- roll back all the automatic storage properly. The C Standard is obligingly
- vague on the subject, making life easier on the implementors at the expense of
- being harder on those wishing to write portable and robust code. Nobody
- pretends that <setjmp.h> is an elegant piece of design.
- In C++, matters are much worse. That language prides itself on cradle-to-grave
- control of objects, particularly nontrivial ones. You are assured that every
- object gets constructed exactly once, before anybody can peek at its stored
- values. And you are promised with equal fervor that every object gets
- destroyed exactly once. Thus, you can allocate and free auxiliary storage for
- an object with a discipline that ensures no files are left open, and no memory
- gets lost, in the hurly-burly of execution.
- longjmp sabotages the best efforts of C++ compilers to maintain this
- discipline. In rolling back the call stack, the older C function cheerfully
- bypasses all those implicit calls to destructors strewn about by the C++
- compiler. Promises get broken, files remain open, storage on the heap gets
- lost. The draft C++ standard leaves <setjmp.h> in the library for upward
- compatibility. But it discourages the use of these heavy-handed functions in
- the neighborhood of "real" C++ code with nontrivial destructors.
- Enter exceptions. In modern C++, you don't report a nasty error by calling
- longjmp to roll back to a point established by setjmp. Instead, you evaluate a
- throw expression to roll back to a catch clause. The throw expression names an
- object whose type matches that expected by the catch clause. You can even
- examine the object to get a hint about what caused the exception. It's kind of
- like calling a function with a single argument, only you're not always sure
- where the function actually resides. And the function is further up the call
- stack instead of one level further down.
- Most important of all, none of those destructors get skipped in the process of
- rolling back the call stack. If that sounds like a nightmare in bookkeeping to
- you, you're absolutely right. Somehow, the executing code must at all times
- have a clear notion of what destructors are pending before control can pass
- out of a given block or a given function. It must also deal with exceptions
- thrown in constructors and destructors, and exceptions thrown while processing
- earlier exceptions. Kids, don't try this at home.
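- The following sketch shows the difference in a small case. It uses today's
- std::runtime_error from <stdexcept> in place of the library classes discussed
- below. The Resource type and the function names are hypothetical. The static
- counter confirms that both automatic objects are destroyed before the catch
- clause gets control.

```cpp
#include <cassert>
#include <stdexcept>

// Counts destructor calls so we can check that none are skipped.
static int destroyed = 0;

struct Resource {
    ~Resource() { ++destroyed; }   // stands in for closing a file, etc.
};

static void inner()
{
    Resource r2;                   // destroyed during unwinding
    throw std::runtime_error("bad input record");
}

static void outer()
{
    Resource r1;                   // also destroyed during unwinding
    inner();
}

// Returns how many destructors ran before the handler got control.
int destructors_run_before_catch()
{
    destroyed = 0;
    try {
        outer();
    } catch (const std::runtime_error&) {
        return destroyed;          // both r1 and r2 are gone by now
    }
    return -1;                     // not reached
}
```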
-
-
- The header <exception>
-
-
- So this fancier machinery is now in the draft C++ standard. All that remains
- is to decide what to do with it. You can get a few hints from other
- programming languages. Ada, to name just one, has had exceptions for over a
- decade. Their very presence changes how you design certain interfaces and how
- you structure programs that must respond to nasty errors. The one thing we
- know for sure is that you must develop a style for using exceptions that fits
- the language as a whole, then use it consistently.
- That has serious implications for the Standard C++ library. Traditionally, of
- course, the library has thrown or caught no exceptions. (There weren't any
- such critters to throw!) But it's a poor advertisement for this new feature if
- the library itself makes no use of exceptions. Put more strongly, the Standard
- C++ library has a moral obligation to set a good example. Many programmers
- will use only the exceptions defined in the library. Others will model their
- own on what they see used by the library. Thus, the library is duty bound to
- set a good example for the children.
- Most decisions about the Standard C++ library are made within the Library
- Working Group (or LWG) of X3J16/WG21, the joint ANSI/ISO committee developing
- the draft C++ standard. Early on, the LWG committed to using exceptions as
- part of the error reporting machinery of the library. Not everyone is happy
- with this decision. Some object because they don't
- want to incur the inevitable overheads of exception handling in every program
- that touches the library -- and that's essentially every program you write in
- C++. Others object because of the putative difficulties of validating a
- program that throws exceptions. Some projects require that the software
- vendors assert that exceptions can never be thrown. So the decision to use
- exceptions in the library was not lightly made.
- Only recently has the LWG agreed on an overall structure. What I present here
- was approved by the joint committee as recently as November 1993. But aside
- from a few name changes and other small tweaks, it is likely to survive
- reasonably unchanged.
- All the relevant declarations and class definitions for exception handling can
- be had by including the header <exception>. (Note the absence of a trailing
- .h, the hallmark of new C++ headers.) Within this header you can find the
- definition for class xmsg, the mother of all exceptions thrown by the library.
- (Yes, the name is horrid -- it's very likely to be changed to exception in the
- near future.) Listing 1 shows at least one way that this class can be spelled
- out.
- The basic idea is that each exception has three null-terminated message
- strings associated with it:
- what -- telling what caused the exception
- where -- telling where it occurred
- why -- telling why it occurred
-
- Some exceptions may use only the first one or two messages, leaving the later
- pointers null.
- The next important notion is that an exception should have private copies (on
- the heap, presumably) of all these message strings. A typical exception
- constructor allocates storage on the heap, copies the strings, and sets the
- flag alloced. That way, the destructor knows to free the storage once the
- exception has been processed.
- But then why the flag if this is the preferred mode of operation? Well, one
- important exception derived from this base class is xalloc. It is thrown by
- operator new when it fails to allocate storage. (More on this in a later
- installment.) The last thing you want to do is try to copy strings onto the
- heap when you have to report that there's no more room on the heap! Hence the
- special protected constructor, which lets you suppress copying of the strings.
- Of course, anyone using this constructor had better supply strings with a
- sufficiently long lifetime, or trouble ensues. That's why this form is
- discouraged except where absolutely necessary.
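- The following is a hypothetical sketch of that storage discipline, not the
- draft library's actual code. The class name msg_sketch and its accessor
- owns_strings are invented for illustration. The sketch also assumes that a
- nonzero copyfl means "copy the strings." The copy constructor matters: a throw
- expression initializes the exception object by copying its operand (unless
- the copy is elided), and each copy must own its own strings if the original
- did.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical sketch of an exception holding private heap copies
// of its what/where/why strings, plus a flag saying whether it owns them.
class msg_sketch {
public:
    // Usual form: copy whatever strings are supplied onto the heap.
    explicit msg_sketch(const char *what_arg = 0, const char *where_arg = 0,
                        const char *why_arg = 0)
        : msg_sketch(what_arg, where_arg, why_arg, 1) {}

    // A copy owns fresh copies of the strings if the original did.
    msg_sketch(const msg_sketch& other)
        : msg_sketch(other.what_, other.where_, other.why_, other.alloced) {}

    msg_sketch& operator=(const msg_sketch&) = delete;  // keep the sketch short

    virtual ~msg_sketch() { release(); }

    const char *what() const { return what_; }
    const char *where() const { return where_; }
    const char *why() const { return why_; }
    bool owns_strings() const { return alloced; }

protected:
    // Low-level form: with copyfl == 0 the caller's pointers are kept as-is,
    // so they must outlive the exception (string literals, say). This suits
    // reporting that the heap itself is exhausted.
    msg_sketch(const char *what_arg, const char *where_arg,
               const char *why_arg, int copyfl)
        : what_(0), where_(0), why_(0), alloced(copyfl != 0)
    {
        if (!alloced) {
            what_ = what_arg; where_ = where_arg; why_ = why_arg;
            return;
        }
        try {
            what_ = dup(what_arg);
            where_ = dup(where_arg);
            why_ = dup(why_arg);
        } catch (...) {
            release();          // free any strings already copied
            throw;
        }
    }

private:
    static const char *dup(const char *s)
    {
        if (s == 0)
            return 0;           // unused messages stay null
        char *p = new char[std::strlen(s) + 1];
        std::strcpy(p, s);
        return p;
    }
    void release()
    {
        if (alloced) {          // only free what we allocated
            delete[] what_;
            delete[] where_;
            delete[] why_;
        }
    }

    const char *what_, *where_, *why_;
    bool alloced;
};
```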
-
-
- Throwing an Exception
-
-
- You'd think then that all you have to do to throw an exception is write
- something like:
- throw xmsg("bad input record");
- You can certainly do so, but that is not the preferred method. Instead, for
- any exception object ex, you're encouraged to call ex.raise(). That function
- does three things:
- First it calls (*handler)(*this) to invoke the raise handler. The default
- behavior is to do nothing, but you can hijack any thrown exception by
- providing your own handler with a call to xmsg::set_raise_handler.
- Then it calls the virtual member function do_raise. That permits you to hijack
- thrown exceptions only for some class derived from xmsg.
- Finally it evaluates the expression throw *this.
- The first escape hatch is for embedded systems and those projects I indicated
- above that abhor all exceptions. You can reboot, or longjmp to some recovery
- point (and to heck with the skipped destructors).
- The second is best illustrated by the derived class reraise, shown in Listing
- 2. It overrides the definition of do_raise in a special way. The override
- evaluates the expression throw, which "rethrows" an exception currently being
- processed. It turns out that iostreams has an occasional need to pass an
- exception up the line to another handler. I invented reraise as a way to do
- this, but it looks to be generally useful.
- The third is simply to throw the exception, which is what exception classes
- were invented to do in the first place. By routing all library exceptions
- through this machinery, however, the class meets the needs of several
- constituencies.
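- Here is a hypothetical sketch of the three-step raise protocol, using
- std::string for brevity. The class names raisable, raisable_range, and
- rethrower are invented stand-ins for xmsg, a derived exception, and reraise.
- One C++ detail is worth knowing: throw *this in a base-class member function
- throws a copy of the base type, sliced from any derived object. The sketch
- assumes a derived class avoids that by having its own do_raise throw *this.
- That is one plausible use of the virtual hook, though the article does not
- say so.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for xmsg's raise() machinery.
class raisable {
public:
    typedef void (*raise_handler)(raisable&);

    explicit raisable(const std::string& what_arg = "") : what_(what_arg) {}
    virtual ~raisable() {}

    const std::string& what() const { return what_; }

    // Installs a new global hook and returns the previous one.
    static raise_handler set_raise_handler(raise_handler h)
    {
        raise_handler old = handler;
        handler = h;
        return old;
    }

    void raise()
    {
        if (handler != 0)
            (*handler)(*this);  // 1: global hook (log, reboot, longjmp...)
        do_raise();             // 2: per-class hook
        throw *this;            // 3: throws a copy of static type raisable
    }

protected:
    virtual void do_raise() {}  // default: do nothing

private:
    std::string what_;
    static raise_handler handler;
};

raisable::raise_handler raisable::handler = 0;  // null: no hook installed

// Assumption: a derived class throws itself from do_raise so that
// handlers see its own type rather than a sliced raisable.
class raisable_range : public raisable {
public:
    explicit raisable_range(const std::string& w = "") : raisable(w) {}
protected:
    void do_raise() override { throw *this; }
};

// Like reraise: rethrows the exception currently being handled, so
// raise() never reaches its own throw *this. Calling it when no exception
// is being handled would call std::terminate.
class rethrower : public raisable {
protected:
    void do_raise() override { throw; }
};
```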
-
-
- Exception Hierarchy
-
-
- There's still more to library exceptions. Figure 1 shows a whole hierarchy of
- classes derived from xmsg. Some are defined in other headers, but most are to
- be found in <exception>. The basic partitioning is into two groups:
- logic errors, derived from class xlogic, which report errors that you can, in
- principle, detect and avoid when writing the program
- runtime errors, derived from class xruntime, which report errors that you
- detect only when you run the program
- The former category is for those "can't happen" events that are often too hard
- to really prevent, at least until after some thorough debugging. The latter is
- for surprises that happen during program execution, such as running out of
- heap or encountering bad input from a file.
- Listing 3 shows the class xlogic and Listing 4 shows the class xruntime. Note
- the slight asymmetry. The latter class has an extra low-level constructor, as
- I described earlier, which is supposed to be used only by xalloc. Two more
- classes are derived from these to report mathematical errors. Listing 5 shows
- the class xdomain and Listing 6 shows the class xrange. A domain error occurs
- when you call a mathematical function with arguments for which its behavior is
- not defined (such as taking the real square root of -5). A range error occurs
- when the result of a mathematical function is defined in principle but not
- representable in practice (such as raising e to the 10,000th power).
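- The draft names did not survive unchanged. Standard C++ spells these classes
- std::logic_error, std::runtime_error, std::domain_error, and std::range_error,
- all declared in <stdexcept>. The following sketch uses those names to report
- the two examples above through hypothetical wrapper functions checked_sqrt and
- checked_exp.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Domain error: no real square root exists for a negative argument.
// std::domain_error derives from std::logic_error.
double checked_sqrt(double x)
{
    if (x < 0)
        throw std::domain_error("checked_sqrt: negative argument");
    return std::sqrt(x);
}

// Range error: e to a huge power is defined but not representable.
// std::range_error derives from std::runtime_error.
double checked_exp(double x)
{
    double r = std::exp(x);
    if (std::isinf(r))          // overflowed to infinity (HUGE_VAL)
        throw std::range_error("checked_exp: result overflows double");
    return r;
}
```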
- Finally, Listing 7 shows the class badcast. It is the one exception thrown
- implicitly by code that the compiler generates. C++ now includes dynamic
- casts, which yield a null pointer if a pointer conversion is not permissible
- at runtime. When the cast is to a reference type, the executable code throws a
- badcast exception instead, since a reference can never designate a
- nonexistent object.
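- In standard C++, the exception is named std::bad_cast and is declared in
- <typeinfo>. The following sketch contrasts the two forms of dynamic_cast,
- using hypothetical Shape, Circle, and Square classes.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape { virtual ~Shape() {} };   // polymorphic base
struct Circle : Shape {};
struct Square : Shape {};

// Pointer form: a failed dynamic_cast yields a null pointer.
bool is_circle(Shape *s)
{
    return dynamic_cast<Circle *>(s) != 0;
}

// Reference form: there is no "null reference", so a failed cast
// throws std::bad_cast (today's name for badcast).
bool cast_ref_throws(Shape& s)
{
    try {
        Circle& c = dynamic_cast<Circle&>(s);
        (void)c;                        // cast succeeded; value unused
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}
```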
-
-
- Terminate and Unexpected Handlers
-
-
- Exception processing code can also call two additional functions:
- terminate, when exception handling must be abandoned for any of several
- reasons
- unexpected, when a function throws an exception that is not listed in its
- (optional) exception specification.
- A terminate condition occurs:
- when the exception handling mechanism cannot find a handler for a thrown
- exception
- when the exception handling mechanism finds the execution stack corrupted
- when a destructor called during execution stack unwinding caused by an
- exception tries to transfer control to a calling function by throwing an
- exception
- The default behavior of terminate is to call abort, while the default behavior
- of unexpected is to call terminate. As usual in C++, however, you can provide
- your own flavors of these functions. A call to set_terminate lets you specify
- a pointer to a new function that must still somehow terminate the program. A
- call to set_unexpected lets you specify a pointer to a new function that can
- itself throw (or rethrow) an exception or terminate program execution.
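- The following sketch installs a replacement terminate handler through the
- standard <exception> interface. The names my_terminate and
- install_terminate_handler are hypothetical. The handler logs a message and
- then calls abort, since a terminate handler must not return. The sketch
- leaves unexpected alone: standard C++ later deprecated set_unexpected along
- with dynamic exception specifications, and C++17 removed them.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>

// A replacement terminate handler. Like the default, it must not return:
// it logs a message, then ends the program with std::abort.
[[noreturn]] void my_terminate()
{
    std::fputs("fatal: exception handling abandoned\n", stderr);
    std::abort();
}

// Installs my_terminate and returns the handler it replaced,
// so the caller can restore it later.
std::terminate_handler install_terminate_handler()
{
    return std::set_terminate(my_terminate);
}
```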
-
-
- Conclusion
-
-
- As you can see, the facilities provided by <exception> give you considerable
- latitude in reporting and handling exceptions. The C++ library uses this
- machinery exclusively, so you can control what the library does with
- exceptions. You can even prevent the library from actually evaluating any
- throw expressions.
- Given our limited experience to date with using exceptions in C++, I'm fairly
- confident that this is mostly a good design. Time, of course, will tell us
- better how well we've done.
- Figure 1 Exception class hierarchy
- xmsg
-   xlogic
-     xdomain
-     badcast
-     invalid_argument (<bits>, etc.)
-     length_error (<string>, etc.)
-     out_of_range (<string>, etc.)
-   xruntime
-     xrange
-     xalloc (<new>)
-   failure (<ios>)
-
- Listing 1 Class xmsg
- class xmsg {
- public:
- typedef void(*raise_handler)(xmsg&);
- private:
- const char *what_str, *where_str, *why_str; // exposition only
- int alloced; // exposition only
- static raise_handler handler; // exposition only
- protected:
- virtual void do_raise();
- xmsg(const char *what_arg, const char *where_arg,
- const char *why_arg, int copyfl);
- public:
- xmsg(const char *what_arg = 0, const char *where_arg = 0,
- const char *why_arg = 0);
- virtual ~xmsg();
- void raise();
- const char *what() const;
- const char *where() const;
- const char *why() const;
- static raise_handler set_raise_handler(raise_handler handler_arg);
- };
-
- // End of File
-
-
- Listing 2 Class reraise
- class reraise : public xmsg {
- protected:
- virtual void do_raise();
- public:
- reraise();
- virtual ~reraise();
- };
-
- // End of File
-
-
- Listing 3 Class xlogic
- class xlogic : public xmsg {
- protected:
- virtual void do_raise();
- public:
- xlogic(const char *what_arg = 0, const char* where_arg = 0,
- const char *why_arg = 0);
- virtual ~xlogic();
- };
-
- //End of File
-
-
- Listing 4 Class xruntime
- class xruntime : public xmsg {
- protected:
- virtual void do_raise();
- xruntime(const char *what_arg, const char *where_arg,
- const char *why_arg, int copyfl);
-
- public:
- xruntime(const char *what_arg = 0, const char *where_arg = 0,
- const char *why_arg = 0);
- virtual ~xruntime();
- };
-
- //End of File
-
-
- Listing 5 Class xdomain
- class xdomain : public xlogic {
- protected:
- virtual void do_raise();
- public:
- xdomain(const char *what_arg = 0, const char *where_arg = 0,
- const char *why_arg = 0);
- virtual ~xdomain();
- };
-
- // End of File
-
-
- Listing 6 Class xrange
- class xrange : public xruntime {
- protected:
- virtual void do_raise();
- public:
- xrange(const char *what_arg = 0, const char *where_arg = 0,
- const char *why_arg = 0);
- virtual ~xrange();
- };
-
- // End of File
-
-
- Listing 7 Class badcast
- class badcast : public xlogic {
- protected:
- virtual void do_raise();
- public:
- badcast();
- virtual ~badcast();
- };
-
- // End of File
-
-
-
-
- Questions & Answers
-
-
- A Tricky (But Important) Type Distinction
-
-
-
-
- Kenneth Pugh
-
-
- Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++
- language courses for corporations. He is the author of C Language for
- Programmers and All On C, and was a member of the ANSI C committee. He also
- does custom C programming for communications, graphics, image databases, and
- hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707.
- You may fax questions for Ken to (919) 489-5239. Ken also receives email at
- kpugh@allen.com (Internet) and on Compuserve 70125,1142.
-
-
-
-
- Heap or Stack
-
-
- Regarding your Q?/A! column "Heap or Stack -- Which Should You Use" in the
- October 1993 CUJ: On page 122, col. 1, bottom, you state "The type of
- char_matrix[4] is pointer to char...". Actually, it is "array of 10 char." You
- make this distinction clear elsewhere in the article. Like all such
- array/pointer differences, this one really only becomes visible if you do
- sizeof(char_matrix[4]), which reports 10, not sizeof(char*). I enjoy your
- articles -- keep up the good work. This is a mere quibble, but I thought you'd
- like to get things exactly right.
- Craig Berry
- Thanks for the sharp-eyed correction. For readers who have lost their October
- issue, the declaration was:
- #define NUMBER_OF_CHARS 10
- #define NUMBER_OF_STRINGS 5
- char char_matrix[NUMBER_OF_STRINGS]
- [NUMBER_OF_CHARS];
- char_matrix[4] is an array of 10 chars, as Mr. Berry notes. In most
- expressions, though, it decays to a pointer to its first element, of type
- char *, so you can code the following without warning:
- char *p_char = char_matrix[4];
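- A small C++ check makes the distinction concrete. The helper functions
- row_size and decayed_size are hypothetical; the declarations match the ones
- quoted above.

```cpp
#include <cassert>
#include <cstddef>

#define NUMBER_OF_CHARS 10
#define NUMBER_OF_STRINGS 5

char char_matrix[NUMBER_OF_STRINGS][NUMBER_OF_CHARS];

// sizeof sees the array type itself: 10 chars.
std::size_t row_size() { return sizeof(char_matrix[4]); }

// Used as an initializer, the row decays to a pointer to its first char.
std::size_t decayed_size()
{
    char *p_char = char_matrix[4];
    return sizeof(p_char);
}
```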
-
-
- Char Pointers
-
-
- I really enjoyed your article in the October issue of CUJ. The examples that
- you presented are very practical examples of allocating memory for arrays.
- Most books on C fail to provide such practical examples and assume that
- everyone simply defines strings inside code hardwire style such as:
- char char_array[4] = "123";
- I wish I had had a copy of your article four years ago when I had to figure
- out how to allocate strings through experimentation. Keep up the good work!
- Russell Thrasher
- Austin, TX
- Thanks for your feedback. C programming books usually define strings this way
- for the sake of simplicity, but you are right to object to literals such as
- "123" appearing in the source code. As a general rule, it's best to avoid scattering
- literals, strings, or numeric constant definitions throughout the body of a
- program. Usually, these constants need only appear in #defines at the
- beginning of the source file where they're easy to find. In some cases you can
- eliminate constants from the source file entirely by reading the constants
- into the program from a separate file at run time.
- Keeping string constant definitions out of a program's source files allows you
- to alter the user interface appearance without changing the executable
- program. This capability is especially important in the international market,
- where you may need to change the (human) language of a user interface. The
- example in the October issue demonstrated one way of eliminating constant
- definitions from the source code entirely, but there are lots of other
- methods.
- For example, the language features of UNIX (NLS), such as the message catalog,
- provide a standard method for storing strings. You can use the resource files
- for the Macintosh and for Microsoft Windows to store strings in a program. You
- can configure the .Xdefaults file for X-Window (Motif and Open-Look) to store
- all strings for an application. Finally, you can find commercial products that
- will extract strings from a code file and place them into files similar to
- that referenced in the October example.
-
-
- A Class for String Storage
-
-
- Following along the lines of the previous letter, I will show how C++ classes
- can eliminate most char * pointers and the problems they create, as well as cut
- down on portability problems between various implementations of strings.
- To start, I define a String class as:
- class String
- {
- public:
- String(const char * string);
- operator const char *();
- ...
- private:
- char * data;
- ...
-
- };
- At this point, I don't specify much of an implementation, as that should not
- be of concern to the user. I include the cast operator to const char * only
- for back-fitting to functions that require char * pointers. In a more abstract
- interface, all functions in the program would take only String or String &
- (reference to String) parameters.
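- For concreteness, here is one possible implementation. It is a minimal
- sketch that assumes String owns a heap copy of its characters. The default
- argument, the copy operations, and the const on the conversion operator are
- assumptions, not part of the interface above.

```cpp
#include <cassert>
#include <cstring>

// Minimal sketch: String owns a null-terminated heap copy of its text.
class String {
public:
    String(const char *string = "")
        : data(new char[std::strlen(string) + 1])
    {
        std::strcpy(data, string);
    }
    String(const String& other) : String(other.data) {}  // deep copy
    String& operator=(String other)                      // copy-and-swap
    {
        char *tmp = data;
        data = other.data;
        other.data = tmp;       // old buffer is freed with 'other'
        return *this;
    }
    ~String() { delete[] data; }

    // Back-fitting to functions that still take const char *.
    operator const char *() const { return data; }

private:
    char *data;
};
```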
- A Collection_of_strings class could have the following interface:
- class Collection_of_strings
- {
- public:
- Collection_of_strings(const char * identifier);
- ~Collection_of_strings();
- String get_string_by_index(unsigned int index);
- ...
- private:
- String * strings;
- unsigned int number_of_strings;
- };
- The user of this class might code a header file as follows:
- #define MY_STRINGS_IDENTIFIER "my_string.dat"
- // These are the names for each string which represents
- // its purpose
- #define ERROR_STRING 0
- #define TITLE_STRING 1
- ...
- And the user's program might look like this:
- int main()
- {
- Collection_of_strings strings(MY_STRINGS_IDENTIFIER);
- String a_string;
- ...
- a_string = strings.get_string_by_index(ERROR_STRING);
- ...
- printf("%s", (const char *) a_string);
- // or whatever else you need to do with it
- ...
- }
- Note that this program does not need to concern itself with how the strings
- are stored in memory or on disk. The only information this program requires is
- the identifier of the set of strings and names or identifiers for all the
- strings.
- Let's look at possible implementations of Collection_of_strings. The
- MY_STRINGS_IDENTIFIER might either identify an actual data file or a subsection
- of a standard file. The constructor would open the appropriate file. The
- constructor could read all the strings into a memory array and close the file.
- get_string_by_index( unsigned int index ) would use the index to locate the
- appropriate String and return it.
- Alternatively, the constructor could perform an open and the destructor a
- close. get_string_by_index() would use the index to read the corresponding
- String off the disk. Depending on the system and the number of strings read,
- the I/O delay might not be noticeable. With a disk cache, the delay might only
- be apparent the first time a String was read.
- If you are using NLS, Microsoft Windows, or the Macintosh, the implementation
- of this class could load the string via calls to the appropriate functions.
- You would need to ensure that the values of the string identifiers matched on
- all these systems, but that is a bookkeeping problem, not a coding problem.
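- The first approach, which reads every string into memory up front, might look
- like the following sketch. It assumes one string per line, in index order.
- It uses std::string and std::vector in place of String and String *, and the
- class name is hypothetical. It reads from any std::istream (a std::ifstream
- opened on the identifier, for instance), which makes it easy to test.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Sketch of the "read everything up front" implementation.
class Collection_of_strings_sketch {
public:
    explicit Collection_of_strings_sketch(std::istream& in)
    {
        std::string line;
        while (std::getline(in, line))  // one string per line
            strings.push_back(line);
    }

    // Returns an empty string for an out-of-range index.
    std::string get_string_by_index(unsigned int index) const
    {
        return index < strings.size() ? strings[index] : std::string();
    }

    unsigned int number_of_strings() const
    {
        return static_cast<unsigned int>(strings.size());
    }

private:
    std::vector<std::string> strings;
};
```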
- Let's make a few changes and add a few features to this class:
- class Collection_of_strings
- {
- public:
- enum Retrieval_type {FAST_BUT_NEEDS_LOTS_OF_MEMORY,
- SLOW_BUT_LITTLE_MEMORY};
- enum Error_code {OK, IDENTIFIER_NOT_AVAILABLE,
- LANGUAGE_NOT_AVAILABLE};
- Collection_of_strings();
- Error_code load(const char * identifier,
- Retrieval_type retrieval_type,
- Language language);
- ~Collection_of_strings();
- String get_string_by_index(unsigned int index);
- ...
- private:
- String * strings;
- unsigned int number_of_strings;
- };
- This class has a language selection capability, as well as the ability to
- utilize alternative retrieval methods.
- When you design a set of classes, you face a tradeoff between individual class
- complexity and how many classes you must design. Sometimes using another
- parameter in the constructor decreases the number of classes and simplifies
- the class interface, although it may make the implementation slightly more
- complex.
- Instead of adding parameters to the constructor, I created a separate load
- function. A constructor can only report problems via exceptions or by setting
- a flag that can be tested later; having a separate load function allows an
- error code to be returned. Since the likelihood is high that a requested
- language won't be available, a return code is a better choice than an
- exception as a reporting method.
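- Here is a hypothetical sketch of the load-and-report approach. An in-memory
- map stands in for the files or resources a real implementation would consult,
- and only the identifier and language arguments are modeled. Because load is an
- ordinary member function, it can return the Error_code directly.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum Language { ENGLISH, SPANISH, FRENCH };

// Hypothetical stand-in for string files or resources:
// (identifier, language) -> that language's strings.
typedef std::map<std::pair<std::string, Language>,
                 std::vector<std::string> > Catalog;

class Loader_sketch {
public:
    enum Error_code { OK, IDENTIFIER_NOT_AVAILABLE, LANGUAGE_NOT_AVAILABLE };

    explicit Loader_sketch(const Catalog& c) : catalog(c) {}

    Error_code load(const std::string& identifier, Language language)
    {
        Catalog::const_iterator it =
            catalog.find(std::make_pair(identifier, language));
        if (it != catalog.end()) {
            strings = it->second;
            return OK;
        }
        // Identifier known in some other language?
        for (it = catalog.begin(); it != catalog.end(); ++it)
            if (it->first.first == identifier)
                return LANGUAGE_NOT_AVAILABLE;
        return IDENTIFIER_NOT_AVAILABLE;
    }

    const std::vector<std::string>& loaded() const { return strings; }

private:
    const Catalog& catalog;     // must outlive the loader
    std::vector<std::string> strings;
};
```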
-
- The program source should contain a general language header file, say
- "language.h" that contains:
- enum Language {ENGLISH, SPANISH, FRENCH, ....};
- The program may also require a file that contains strings with standard or
- alternative spellings of languages for user interface programs. For example:
- struct Language_equivalent
- {
- enum Language code;
- const char * string;
- };
-
- Language_equivalent language_equivalents[] =
- {
- {ENGLISH, "English"},
- {ENGLISH, "American English"},
- {GERMAN, "German"},
- {GERMAN, "Deutsch"},
- ...
- };
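- A lookup over such a table might be sketched as follows. The enum below
- includes GERMAN because the table uses it. That is an assumption, since the
- article elides the rest of the list. find_language is a hypothetical helper.
- The member is declared const char * because the table entries point at
- string literals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum Language { ENGLISH, SPANISH, FRENCH, GERMAN };  // assumed subset

struct Language_equivalent
{
    Language code;
    const char *string;
};

// Only the entries quoted in the article; the rest are elided there.
const Language_equivalent language_equivalents[] =
{
    {ENGLISH, "English"},
    {ENGLISH, "American English"},
    {GERMAN, "German"},
    {GERMAN, "Deutsch"},
};

// Maps a user-supplied language name to its code; false if unknown.
bool find_language(const char *name, Language *code)
{
    const std::size_t n =
        sizeof language_equivalents / sizeof language_equivalents[0];
    for (std::size_t i = 0; i < n; ++i)
        if (std::strcmp(language_equivalents[i].string, name) == 0) {
            *code = language_equivalents[i].code;
            return true;
        }
    return false;
}
```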
- With this Collection_of_strings class, the calling program might appear as
- follows. (This example does not handle every possible error problem, but it
- demonstrates the general approach.):
- int main()
- {
- Collection_of_strings strings;
- Collection_of_strings::Error_code error_code;
- error_code = strings.load(MY_STRINGS_IDENTIFIER,
- Collection_of_strings::FAST_BUT_NEEDS_LOTS_OF_MEMORY, FRENCH);
- switch (error_code)
- {
- case Collection_of_strings::OK:
- break;
- case Collection_of_strings::IDENTIFIER_NOT_AVAILABLE:
- cerr << "String data not available "
- << MY_STRINGS_IDENTIFIER;
- exit(1);
- break;
- case Collection_of_strings::LANGUAGE_NOT_AVAILABLE:
- cerr << "Language not available; "
- << "English selected";
- error_code = strings.load(MY_STRINGS_IDENTIFIER,
- Collection_of_strings::FAST_BUT_NEEDS_LOTS_OF_MEMORY,
- ENGLISH);
- break;
- }
- ...
- String a_string;
- ...
- a_string = strings.get_string_by_index(ERROR_STRING);
- ...
- cout << a_string;
- // or whatever else you need to do with it
- ...
- }
- You could implement this class with NLS fairly easily. Implementing it with
- Microsoft Windows or the Macintosh would be a bit more difficult, since the
- resources, although logically separate from the code, are bound to the
- executable file. With X-Window, you could load an .Xdefaults file based on the
- particular human language being handled by the user interface.
- A particular implementation might not be able to support both Retrieval_types
- efficiently. You could add Error_codes to report that the desired type was
- unavailable and an alternative type was employed. You might even try out
- intermediate Retrieval_types, which could attempt to trade off memory space
- for speed. However, that might unduly complicate the implementation.
- Q
- I have just read your comments on lint for C++ in C Users Journal with the
- greatest interest as I was about to contact Gimpel with the same question that
- Sue Lindsey posed to you.
- I have found lint (with strong type checking) to be the single most valuable
- tool and source of education in using C, and rely on it heavily. However, I've
- been following the C++ and object-oriented programming literature and with a
- current software project exceeding 10,000 lines of source code (for the first
- time for me) I can readily appreciate some of the advantages C++ potentially
- offers.
- I don't have the luxury of much "dabbling time," and have been nervous about
- making a difficult move which would also deprive me of my trusted lint! If
- your statement that C++ implicitly takes care of most of the problem areas in
- C that lint covers is really true, that would be one of the most powerful
- reasons I have heard for moving from C to C++. I have never seen an equivalent
- view expressed before, and I just want to hear you say it again!
- Could I suggest that you expand on this a bit in a future column, perhaps even
- seeking some input from Gimpel? I'm thinking of the number of times lint has
- saved me from what would have been a hard bug to find by a "conceivably
- un-initialized" warning in a rarely-taken program branch (for example). I
- don't see anything in C++ that would help there; perhaps I'm wrong!
- Finally, this gives me a welcome opportunity to thank you for your excellent
- columns in CUJ -- I always read them first and, as a fairly isolated
- practitioner, find most of them informative and entertaining.
- Rob Sherlock
-
- I described Gimpel's PC-lint for C++ in last month's column. If you were
- hesitant about moving to C++ due to the lack of a lint for the language, you
- need wait no longer.
- As I noted in my answer to the original lint question, C++ compilers (as well
- as many C compilers) have become more lint-like in their warning messages.
- However, compilers still vary in recognizing potential errors. lint reports
- conditions that are not detected by all compilers. For example, some compilers
- may not recognize
- if (a_variable=0)
- another_variable=5;
- as a potential error, but lint will.
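- The following sketch shows why lint complains. Because = assigns, the
- condition tests the assigned value, zero, so the branch can never run. The
- function names are hypothetical. The extra parentheses around the assignment
- keep this deliberate demonstration from triggering the corresponding GCC and
- Clang warning, which is exactly the check lint performs.

```cpp
#include <cassert>

// The pitfall: the condition is the value assigned (0), so the
// assignment to another_variable never happens.
int with_assignment(int a_variable)
{
    int another_variable = 0;
    if ((a_variable = 0))
        another_variable = 5;
    return another_variable;    // always 0
}

// What was almost certainly intended: a comparison.
int with_comparison(int a_variable)
{
    int another_variable = 0;
    if (a_variable == 0)
        another_variable = 5;
    return another_variable;
}
```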
-
-
-
-
- Stepping Up To C++
-
-
- Overloading and Overriding
-
-
-
-
- Dan Saks
-
-
- Dan Saks is the founder and principal of Saks & Associates, which offers
- consulting and training in C++ and C. He is secretary of the ANSI and ISO C++
- committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of
- the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach
- him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601,
- or electronically at dsaks@wittenberg.edu.
-
-
- In October, my wife Nancy suggested that our family should go to Disney World
- for vacation in early December. I was reluctant because I thought I had too
- much work to do. But that's been my excuse for weaseling out of a lot of things
- for the past few years. OK, I let her talk me into it. But I made her promise
- to help me get caught up. And she did. And I almost did.
- The C++ standards committee met in November and passed almost twice as many
- motions as ever before at a single meeting. (I'll tell you about them in the
- not-too-distant future.) So I had to spend an additional two or three days
- that I hadn't planned on writing these longer-than-ever minutes. And all my
- other work slipped.
- So here I am now, sitting in my room at Disney's Caribbean Beach Resort in
- Orlando, overloaded with work, while Nancy, Ben and Jeremy are over riding on
- the Big Thunder Mountain Railroad at the Magic Kingdom. I've got to hurry up
- and write this so I can join them.
- Many C++ programmers, even moderately experienced ones, confuse the terms
- overloading and overriding. For beginners, overriding is often a new term. Even
- before they write their first line of C++, most programmers hear that C++
- offers function and operator overloading. (It's one of the features that lures
- them to C++ in the first place.) New C++ programmers don't encounter the term
- overriding until they start using virtual functions. Then they confuse the
- words simply because they sound alike. Beginning C++ programmers usually err
- by speaking of overloading when they mean overriding, because they don't yet
- understand the subtle differences.
- Experienced programmers continue to confuse the terms because overloading and
- overriding have similar properties. And of course, the words continue to sound
- alike. It doesn't help when an overly exuberant lecturer uses one word when he
- or she means the other and no one in the audience catches the gaffe. I know
- that happened to me in some of my earliest presentations on C++. I don't
- believe I've made that mistake recently, but I can't promise I'll never do it
- again.
- OK, so what's the difference? In short, overloading means declaring the same
- symbol in the same scope to represent two or more different entities.
- Overriding means redefining a name in a derived class to hide a name inherited
- from a base class.
- In C++, you can overload both function identifiers and operators. For example,
- void put(FILE *f, char c);
- void put(FILE *f, int i);
- void put(FILE *f, const char *s);
- overloads the identifier put by declaring three different functions with that
- name. The predefined operator + is inherently overloaded because it already
- applies to a variety of operand types. The declaration
- complex operator+(complex z1, complex z2);
- overloads operator + to handle complex numbers as well. I described many of
- the C++ overloading rules in detail in "Function Overloading," CUJ, November,
- 1991, and in a four-part series on "Operator Overloading" that appeared in
- every other CUJ from January through July, 1992.
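- A compilable version of those declarations might look like the following
- sketch. It substitutes std::ostream for FILE * so the output is easy to check,
- and the minimal complex struct is an assumption. Overload resolution picks the
- put whose parameter types match the arguments best, here exactly.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Three functions share the name put; the argument types select one.
void put(std::ostream& f, char c)        { f << "char:" << c; }
void put(std::ostream& f, int i)         { f << "int:" << i; }
void put(std::ostream& f, const char *s) { f << "string:" << s; }

// A minimal complex type; operator+ is overloaded to add two of them.
struct complex {
    double re, im;
};

complex operator+(complex z1, complex z2)
{
    complex sum = { z1.re + z2.re, z1.im + z2.im };
    return sum;
}
```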
- Overriding applies specifically to functions in class hierarchies. For
- example, given
- class B
- {
- public:
- virtual void f();
- virtual int g(int i);
- };
- the definition
- class D :
- public B
- {
- public:
- virtual void f();
- };
- derives class D from B and overrides B's f with D's own version of f. D does
- not override g, so D's g is exactly as inherited from B.
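- The following sketch makes the effect visible. It assumes f returns int rather
- than void, so the caller can tell which version ran, and it adds a virtual
- destructor. The helpers call_f and call_g are hypothetical.

```cpp
#include <cassert>

class B
{
public:
    virtual ~B() {}
    virtual int f() { return 1; }        // overridden in D
    virtual int g(int i) { return i; }   // inherited unchanged by D
};

class D : public B
{
public:
    int f() override { return 2; }       // D's own version of f
};

// Calls through a B& are bound at run time to the object's actual type.
int call_f(B& b) { return b.f(); }
int call_g(B& b) { return b.g(7); }
```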
- Overloading and overriding interact in some complex and very subtle ways. By
- itself, each feature is a powerful programming tool (maybe too powerful).
- Together, these features produce much richer capabilities than most programs
- need.
- Overloading and overriding combine to produce some very puzzling diagnostic
- messages, and even more baffling run-time errors. That's the nature of C++.
- C++ provides sufficient capability for experienced programmers to write code
- that is more maintainable, yet no less efficient, than it would be in C. But
- in doing so, it gives naive programmers even more opportunities to get into
- trouble (as if C doesn't already offer enough).
- Some of the things I'm about to show you are pretty intricate. They are
- probably more complex than anything you're likely to want to do in real C++
- programs. My advice, as always, is to keep things as simple as you can. Then
- why would I show you things you probably don't want to do? Because you're
- likely to do them by accident. Understanding these examples should help you
- recognize and correct your mistakes. And once in a great while, you might even
- find a reason to do these things intentionally.
- The following discussion of course assumes you're familiar with vtbls (virtual
- tables) and vptrs (pointers to virtual tables) as an implementation model for
- virtual functions in C++. I introduced them last month in "How Virtual
- Functions Work" (CUJ, January, 1994).
-
-
- Virtual and Non-virtual Overriding
-
-
- A class can contain both virtual and non-virtual functions. For a given class,
- the translator creates a vtbl entry for each virtual function in the class,
- but none for the non-virtual functions.
- A derived class can override any of its inherited functions, be they virtual
- or not. When you override a function that's virtual in the base class, it
- automatically becomes virtual in the derived class. You can't turn off the
- dynamic binding when you override a virtual function. That is, you cannot
- override a virtual function with a non-virtual function. On the other hand,
- you can override a non-virtual function with either a virtual or a non-virtual
- function. When you override a function that's non-virtual in the base class,
- the overriding function is also non-virtual, unless you explicitly declare it virtual.
- When a C++ translator first encounters the definition for a class D derived
- from some base class B, it creates an image for D's vtbl by copying B's vtbl.
- (As always, I am describing a conceptual model for how the translation works.
- Any particular implementation may do it differently.) When it parses a
- declaration in D that overrides a virtual function f, the translator simply
- overwrites the entry for f in D's vtbl to point to D::f instead of B::f. Thus,
- overriding a function that's virtual in B doesn't increase the size of D's
- vtbl.
- The translator resolves all non-virtual function calls during translation, so
- it need not store any non-virtual function addresses in vtbls. Thus,
- overriding a non-virtual function with another non-virtual function has no
- effect on the vtbls at all. But, overriding a function that's non-virtual in B
- with a virtual function in D increases the size of D's vtbl, adding a new
- entry to D's vtbl that has no corresponding entry in B's vtbl.
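- The copy-then-overwrite model can be imitated in ordinary C++. The sketch
- below illustrates only the conceptual model just described, not the layout
- any real compiler uses: a vtbl is represented as a std::vector of function
- pointers, and the free functions B_f, B_h, C_f and C_j are hypothetical
- stand-ins for the corresponding members of classes B and C in Listing 1.
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for member functions; each reports its own name.
std::string B_f() { return "B::f"; }
std::string B_h() { return "B::h"; }
std::string C_f() { return "C::f"; }
std::string C_j() { return "C::j"; }

// A vtbl modeled as a table of function pointers (conceptual only).
using Slot = std::string (*)();
using Vtbl = std::vector<Slot>;

// B declares f and h virtual; g and j are non-virtual, so they get no slots.
Vtbl make_B_vtbl() { return {B_f, B_h}; }

// C's table starts as a copy of B's. Overriding virtual f reuses f's slot;
// declaring j virtual when it was non-virtual in B appends a new slot.
Vtbl make_C_vtbl() {
    Vtbl v = make_B_vtbl();  // copy B's vtbl
    v[0] = C_f;              // C::f replaces B::f in place
    v.push_back(C_j);        // new entry with no counterpart in B's vtbl
    return v;
}
```
- Under this model, make_B_vtbl() yields two entries and make_C_vtbl() yields
- three. The entry for h still holds B_h because C does not override h, and the
- entry for j exists only in C's table.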
- Listing 1 shows a simple inheritance hierarchy that mixes virtual and
- non-virtual overriding. The base class, B, has four member functions, but only
- two are virtual. Thus, B's vtbl has only two entries, for functions f and h,
- as shown in Figure 1.
- Class C in Listing 1 derives from B and overrides three of the four functions
- it inherits. During translation, C's vtbl starts out as a copy of B's. C::f is
- virtual because it overrides virtual B::f, and the compiler replaces the first
- entry in C's vtbl (corresponding to f) with the address of C::f. C::g
- overrides non-virtual B::g. Since g's declaration in C doesn't include the
- virtual specifier, C::g is also non-virtual. C doesn't override the h it
- inherited from B, so the second entry in C's vtbl (corresponding to h)
- continues to point to B::h.
-
- C::j overrides non-virtual B::j, but C declares j as virtual. Therefore the
- compiler adds a new entry at the end of C's vtbl corresponding to j, and fills
- it in with the address of C::j. The resulting vtbl for class C also appears in
- Figure 1.
- Class D in Listing 1 derives in turn from C, and overrides functions h and j.
- Both h and j are virtual in C, so they are also virtual in D. The translator
- replaces the entries for h and j in D's vtbl with the addresses of D::h and
- D::j, respectively. D's vtbl entry for f continues to point to C::f, as it did
- in C's vtbl. See Figure 1 for D's completed vtbl.
- Listing 2 contains a test program that illustrates the behavior of the
- inheritance hierarchy defined in Listing 1. The statement
- B *pb = &c;
- assigns the address of c to pb, so that *pb has static type B but dynamic type
- C. Thus, a non-virtual member function call applied to *pb selects a member
- function from class B, but a virtual function call applied to *pb selects from
- class C. You can resolve the non-virtual function calls merely by looking at
- B's declaration in Listing 1. You resolve the virtual function calls by
- looking at C's vtbl in Figure 1.
- pb->g() and pb->j() call B::g and B::j, respectively, because both functions
- are non-virtual in B. pb->g() is straightforward because C's vtbl doesn't even
- have an entry for g. pb->j() can be confusing because C's vtbl has an entry
- for j. But the compiler always determines whether a call is virtual or
- non-virtual based on the static type of the object. In this case, *pb has
- static type B and j is non-virtual in B, so pb->j() ignores the vtbl and
- simply calls B::j.
- pb->f() calls C::f because f is virtual in B, *pb is a C object, and C
- overrides f. Even though h is virtual in B, pb->h() still calls B::h. The call
- goes through C's vtbl, but winds up at B::h anyway because C does not override
- h.
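- The pb->j() case is easy to verify with a compiler. In the following sketch,
- which is a reduced, modern-C++ version of Listing 1 rather than the listing
- itself, B, C and D keep only a j member that returns its own name (an
- assumption made so that each call's result can be checked instead of printed).
```cpp
#include <cassert>
#include <string>

struct B {
    virtual ~B() {}
    std::string j() const { return "B::j"; }          // non-virtual in B
};

struct C : B {
    virtual std::string j() const { return "C::j"; }  // virtual from C downward
};

struct D : C {
    std::string j() const { return "D::j"; }          // implicitly virtual
};

// Static type B: j is non-virtual there, so the vtbl is never consulted.
std::string call_through_B(const B &b) { return b.j(); }

// Static type C: j is virtual there, so the dynamic type decides.
std::string call_through_C(const C &c) { return c.j(); }
```
- For a D object d, call_through_B(d) returns "B::j" while call_through_C(d)
- returns "D::j": the same object, but a different static type at the call site.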
- I won't go over every call in Listing 2, but I will call your attention to the
- calls, such as pc->B::f(), that explicitly qualify the function name with the
- name of a base class. Again, *pc has static type C and f is virtual in C.
- Without a qualified name, the call pc->f() behaves like a normal virtual
- function call, selecting the function's address from the vtbl for D, because D
- is the dynamic type of *pc. Looking in D's vtbl in Figure 1 you can see that
- the entry for f points to C::f, so that's what gets called.
- On the other hand, using an explicit base class name qualifier on the function
- name turns off the virtual call mechanism and uses static binding. That is,
- even though pc->f() is a virtual call, pc->B::f() is not. The call ignores the
- dynamic type of *pc and invariably calls B::f. This rule exists so that a
- virtual function in a derived class can call the function it overrides in a
- base class without getting stuck in an infinite recursion. My article on
- "Virtual Functions" (CUJ, December, 1993) explains this behavior in greater
- detail, including a fairly practical example that relies on it.
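- A common use of this rule is an overriding function that extends the base
- version by calling it with an explicit qualifier. In the sketch below (the
- class and member names are hypothetical), writing describe() instead of
- Base::describe() inside Derived::describe would be a virtual call that
- selects Derived::describe again and recurses forever.
```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() {}
    virtual std::string describe() const { return "Base"; }
};

class Derived : public Base {
public:
    // Implicitly virtual, because it overrides virtual Base::describe.
    std::string describe() const {
        // The qualified name turns off the virtual call mechanism,
        // so this runs Base's version rather than recursing.
        return Base::describe() + "+Derived";
    }
};

// An ordinary virtual call: dispatches on the dynamic type of b.
std::string describe_via_base(const Base &b) { return b.describe(); }
```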
-
-
- Overriding Overloaded Functions
-
-
- You can overload virtual functions. That is, you can declare more than one
- virtual function with the same name in the same class, as in class stream
- shown in Listing 3. As with any set of overloaded functions, each function
- signature (the sequence of types in a formal parameter list) in an overloaded
- set of virtuals must be sufficiently distinct for the compiler to tell them
- apart. The vtbl for the class contains a distinct entry for each virtual
- function name and signature. The vtbl for class stream appears in Figure 2. No
- surprises so far.
- Deriving from a base class with overloaded virtual functions behaves pretty
- much as you'd expect, as long as you override all of the overloaded functions,
- or none of them. But, if you derive from a base class with overloaded virtual
- functions and override some, but not all, of those overloaded virtual
- functions, the results may surprise you.
- All the functions in a given set of overloaded functions must be declared in
- the same scope. Another declaration for a function with the same name in an
- inner scope doesn't add to the overloaded set; it starts a new set and
- completely hides all of the overloaded functions of the outer scope while in
- the inner scope. The inner scope can access the overloaded functions in the
- outer scope only by explicitly using a :: (the scope resolution operator).
- A class defines a new scope. The members of a class are in the scope of that
- class. Thus, a single member function declaration in a class hides all the
- overloaded functions with the same name declared in any enclosing scope, as
- shown in Listing 4.
- Listing 4 contains the definition for a class File, with a member function put
- that writes a null-terminated string to a file. File::put uses one of the
- overloaded put functions declared at file scope to actually put the string.
- Unfortunately, none of those put functions at file scope are in scope inside
- the body of File::put. Therefore, you must precede the call with :: to force
- the compiler to look for a function at file scope, as shown in the body of
- File::put in Listing 4. Otherwise, the C++ compiler thinks the call to put(s,
- f) inside File::put is a (recursive) call to File::put, but with the wrong
- number of arguments.
- Similar behavior occurs if you derive class File from a base class that
- contains several functions named put. The declaration for File::put hides all
- the overloaded put functions in the base class while in the scope of class
- File. A call to an inherited put function inside a File member function must
- attach the base class name and a :: before the function name.
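- Here is a minimal sketch of that situation. It assumes a hypothetical base
- class Writer that supplies the overloaded put functions and collects its
- output in a string rather than a FILE, so the effect is easy to check.
```cpp
#include <cassert>
#include <string>

// Hypothetical base class with an overloaded set of put functions.
class Writer {
public:
    std::string out;  // collected output, standing in for a file
    void put(char c) { out += c; }
    void put(const char *s) { out += s; }
};

// File declares its own put, which hides BOTH Writer::put functions
// throughout the scope of class File.
class File : public Writer {
public:
    void put(const char *s, int times) {
        for (int i = 0; i < times; ++i) {
            // A plain put(s) would not compile here: the only put in
            // File's scope is this two-argument one.
            Writer::put(s);
        }
        Writer::put('\n');
    }
};
```
- The same rule applies outside the class: for a File object f, f.put('x')
- does not compile, while f.Writer::put('x') does.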
- Now let's see what happens when a derived class overrides some, but not all,
- of the overloaded virtuals in its base class. Listing 5 shows a base class B
- with three virtual functions f(int), f(long) and f(char *). B's vtbl appears
- in Figure 3.
- Class C derived from B overrides only f(int). Therefore, only f(int) is
- visible in the scope of C; however, C's vtbl still has three entries: one for
- each virtual function in its base class. C's vtbl appears in Figure 3. It has
- the same layout and values as B's vtbl, except the entry for f(int) points to
- C::f(int) instead of B::f(int).
- A derived class never has fewer virtual functions (i.e., a smaller vtbl) than
- its base class. Some inherited virtual functions may be invisible in the
- derived class scope, but their addresses must still be in the vtbl. Remember,
- an object of a derived class is an object of its base class. A derived object
- has everything that its base object has, and maybe more. This applies to vtbls
- as well as the objects themselves. The vtbl for a derived class must have at
- least as many entries as the vtbl for its base class. Consider the
- consequences if this were not so.
- The virtual call mechanism relies on the assumption that a derived object is a
- base object. When a compiler encounters a virtual function call applied to an
- object, it simply translates the call into code that follows a vptr to a vtbl
- and selects the address of the appropriate function. The vtbl for the object
- involved in the call must have at least as many entries as the vtbl for the
- base class, or else the call might reach beyond the end of the object's vtbl
- and grab something that isn't a function address.
- Class D in Listing 5 derives from C and overrides only f(long). Again, its
- vtbl has three entries, as shown in Figure 3. The values in D's vtbl are the
- same as in C's, except for the value corresponding to f(long).
- Listing 6 is a test program that demonstrates the behavior of the function
- call bindings for the hierarchy in Listing 5. The first call contains no
- surprises. *pb has static type B but dynamic type C. An expression that occurs
- to the right of pb-> is in the scope of B. In the scope of B, the compiler can
- choose from three different functions named f. f(1) exactly matches f(int).
- f(int) is virtual in B, so the compiler generates a virtual call to f(int). At
- the time the program executes, pb actually points to a C object, and C
- overrides f(int), so pb->f(1) calls C::f(int). The next two calls behave
- similarly, except that C does not override f(long) and f(char *).
- You may find the call d.f(1) surprising. It appears that f(1) matches f(int)
- exactly, so at first it seems that it should call D::f(int). But D doesn't
- override f(int), so shouldn't the expression actually call C::f(int)? Well,
- that's not what happens either. The expression to the right of d. is in the
- scope of D, where only f(long) is visible. Therefore the compiler converts the
- argument 1 to 1L and calls D::f(long).
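- These bindings can be checked mechanically. This sketch restates the
- hierarchy of Listing 5 in modern C++ under two assumptions: each f returns its
- own name instead of printing it, and the pointer overload takes const char *
- because standard C++ no longer lets a string literal convert to plain char *.
```cpp
#include <cassert>
#include <string>

struct B {
    virtual ~B() {}
    virtual std::string f(int)          { return "B::f(int)"; }
    virtual std::string f(long)         { return "B::f(long)"; }
    virtual std::string f(const char *) { return "B::f(const char *)"; }
};

// Overrides only f(int); B's other two f's are hidden in C's scope.
struct C : B {
    std::string f(int) { return "C::f(int)"; }
};

// Overrides only f(long); every inherited f is hidden in D's scope.
struct D : C {
    std::string f(long) { return "D::f(long)"; }
};
```
- With these definitions, d.f(1) yields "D::f(long)" and d.C::f(1L) yields
- "C::f(int)", while a B& bound to the same object still sees all three
- overloads and dispatches each one through the vtbl.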
- You can study most of these calls on your own. I'll draw your attention to a
- couple of interesting cases.
- The call d.C::f(1L) uses explicit qualification to access an inherited
- function that's otherwise hidden. It looks like it should call an f(long), but
- it doesn't. The expression to the right of the qualifier C:: is in the scope
- of C, where only f(int) is visible. The compiler converts 1L to 1 (an int) and
- calls C::f(int). The explicit qualifier turns off the virtual call mechanism.
- The call pc->f("hello") is an error. The expression to the right of pc-> is in
- the scope of C, where only f(int) is visible. "hello" has type char *, and
- there's no standard conversion from char * to int.
-
-
- Behaving Responsibly
-
-
- As I mentioned earlier, I'm not suggesting that you'd ever want to write code
- like this. Quite to the contrary, I'm trying to shine a little light into a
- dark corner, and show you how you can inadvertently write some pretty confounding
- stuff.
- I think overloaded virtual functions are a useful feature. Common classes like
- the istream and ostream classes in iostream.h use this feature well. When you
- derive from a class with a set of overloaded virtual functions, you should
- override all or none of the functions in that set. In fact, this is a good
- guideline even if the overloaded functions are non-virtual.
- Many compilers actually warn you when you violate this guideline. For example,
- when you compile Listing 5 and Listing 6 together, you may get a warning to
- the effect that the declaration of f(int) in C hides the declarations of
- f(long) and f(char *) inherited from B.
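- Following the guideline costs little even when a derived class needs to
- change only one overload: it can override the others as well and forward them
- to the base class with an explicit qualifier. The names Base and Derived and
- the returned strings in this sketch are hypothetical.
```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() {}
    virtual std::string f(int)  { return "Base::f(int)"; }
    virtual std::string f(long) { return "Base::f(long)"; }
};

// Overrides the whole overloaded set, so nothing from Base is hidden.
struct Derived : Base {
    std::string f(int) { return "Derived::f(int)"; }  // the one real change
    // Forwarding override: the qualified call is statically bound,
    // so it runs Base::f(long) without recursing.
    std::string f(long n) { return Base::f(n); }
};
```
- Because both overloads are visible in Derived's scope, d.f(1L) now selects
- f(long) instead of quietly converting 1L to int.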
- As with any guideline there are exceptions, but in this case they are rare.
- Remember that function and operator overloading are there to help you write more
- intuitive code. If overloading makes it less so, then back off.
- So, Dan Saks, you've just finished your CUJ article. What are you going to do
- next?
- Figure 1 The vtbls for the classes in Listing 1
- Figure 2 vtbl for class stream in Listing 3
- Figure 3 The vtbls for the classes in Listing 5
-
- Listing 1 A class hierarchy that mixes virtual and non-virtual overriding
- #include <iostream.h>
-
- class B
- {
- public:
- virtual void f(); // virtual
- void g(); // non-virtual
- virtual void h(); // virtual
- void j(); //non-virtual
- };
- void B::f() { cout << "B::f()\n"; }
-
- void B::g() { cout << "B::g()\n"; }
-
-
- void B::h() { cout << "B::h()\n"; }
-
- void B::j() { cout << "B::j()\n"; }
-
- class C : public B
- {
- public:
- void f(); // virtual
- void g(); // non-virtual
- virtual void j(); // virtual
- };
-
- void C::f() { cout << "C::f()\n"; }
-
- void C::g() { cout << "C::g()\n"; }
-
- void C::j() { cout << "C::j()\n"; }
-
- class D: public C
- {
- public:
- void h(); // virtual
- void j(); // virtual
- };
-
- void D::h() { cout << "D::h()\n"; }
-
- void D::j() { cout << "D::j()\n"; }
-
- // End of File
-
-
- Listing 2 A test program for the hierarchy in Listing 1
- int main()
- {
- C c;
- D d;
-
- B* pb = &c; // ok, &c is a C * which is a B *
- pb->f(); // calls C::f()
- pb->g(); // calls B::g()
- pb->h(); // calls B::h()
- pb->j(); // calls B::j()
-
- C *pc = &d; // ok, &d is a D * which is a C *
- pc->f(); // calls C::f()
- pc->B::f(); // calls B::f()
- pc->g(); // calls C::g()
- pc->h(); // calls D::h()
- pc->C::h(); // calls B::h()
- pc->j(); // calls D::j()
- pc->C::j(); // calls C::j()
-
- B &rb= *pc; // ok, *pc is a C which is a B
- rb.f(); // calls C::f()
- rb.B::f(); // calls B::f()
- rb.g(); // calls B::g()
- rb.h(); // calls D::h()
- rb.j(); // calls B::j()
-
-
- return 0;
- }
-
- // End of File
-
-
- Listing 3 A class with overloaded virtual functions
- class stream
- {
- // ...
- public:
- virtual stream &get(char &c);
- virtual stream &get(double &d);
- virtual stream &get(char *s);
- virtual stream &put(char c);
- virtual stream &put(double d);
- virtual stream &put(const char *s);
- // ...
- };
-
- // End of File
-
-
- Listing 4 A declaration in an inner scope hides all functions with the same
- name in an outer scope
- #include <stdio.h>
-
- // Two overloaded put functions at file scope
- void put(char c, FILE *stream) { fputc(c, stream); }
- void put(const char *s, FILE *stream) { fputs(s, stream); }
-
- class File
- {
- FILE *f;
- public:
- File(FILE *ff) : f(ff) { }
- void put(const char *s);
- };
-
- void File::put(const char *s)
- {
- ::put(s, f); // needs :: to
- // access outer scope
- }
-
- int main()
- {
- File f(stdout);
- f.put("hello, world\n");
- return 0;
- }
-
- // End of File
-
-
- Listing 5 A class hierarchy that overrides some but not all overloaded
- virtual functions
- #include <iostream.h>
-
- class B
- {
-
- public:
- virtual void f(int i);
- virtual void f(long L);
- virtual void f(char *p);
- };
-
- void B::f(int i)
- {
- cout << "B::f(int i = " << i << ")\n";
- }
-
- void B::f(long L)
- {
- cout << "B::f(long L = " << L << ")\n";
- }
-
- void B::f(char *p)
- {
- cout << "B::f(char *p = \"" << p << "\")\n";
- }
-
- class C : public B
- {
- public:
- void f(int i); // virtual
- };
-
- void C::f(int i)
- {
- cout << "C::f(int i = "<< i<<")\n";
- }
-
- class D : public C
- {
- public:
- void f(long L); // virtual
- };
-
- void D::f(long L)
- {
- cout << "D::f(long L = "<< L <<")\n";
- }
-
- // End of file
-
-
- Listing 6 A test program for the hierarchy in Listing 5.
- int main()
- {
- C c;
- D d;
-
- B *pb = &c; // ok, &c is a C * which is a B*
- pb->f(1); // calls C::f(int)
- pb->f(2L); // calls B::f(long)
- pb->f("hello"); // calls B::f(char *)
-
- d.f(1); // calls D::f(long)
- // C::f(int) is hidden
- // but 1 converts to long
- d.C::f(1); // calls C::f(int)
- d.B::f(1); // calls B::f(int)
- d.f(1L); // calls D::f(long)
- d.C::f(1L); // calls C::f(int)
- // B::f(long) is hidden
- // but 1L converts to int
- d.B::f(1L); // calls B::f(long)
-
- C *pc = &d; // ok, &d is a D * which is a C*
- pc->f(1); // calls C::f(int)
- pc->f(1L); // calls C::f(int)
- // B::f(long) is hidden
- // but 1L converts to int
- pc->B::f(1L); // calls B::f(long)
- pc->f("hello"); // error! f(char *) is hidden
- pc->B::f("hello"); // calls B::f(char *)
-
- B &rb = *pc; // ok, *pc is a C which is a B
- rb.f(1); // calls C::f(int)
- rb.f(2L); // calls D::f(long)
- rb.B::f(2L); // calls B::f(long)
- rb.f("hello"); // calls B::f(char *)
-
- return 0;
- }
-
- // End of File
-
-
-
-
-
- On the Networks
-
-
- Concurrent Development
-
-
-
-
- Sydney S. Weinstein
-
-
- Sydney S. Weinstein, CDP, CCP is a consultant, columnist, lecturer, author,
- professor, and President of Myxa Corporation, a consulting and contract
- programming firm specializing in databases, data presentation and windowing,
- transaction processing, networking, testing and test suites, and device
- management for UNIX and MS-DOS. He can be contacted care of Myxa Corporation,
- Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail
- on the Internet/USENET mailbox syd@myxa.COM (dsinc!syd for those who cannot do
- Internet addressing).
-
-
- For years, UNIX developers have used SCCS and RCS to control source code and
- deal with tracking changes. But users of these two programs always have to
- deal with a potential problem. If more than one developer is working on a
- project, there is always the chance that their efforts will collide. This
- collision occurs when the two developers both try to update the same file at
- the same time.
- RCS and SCCS solve this problem by allowing one of the developers to lock a
- file for editing. The person who locks the file is then the only one who can
- make changes to it. This scheme works well, but leads to another problem:
- contention. Almost everyone wants to change the popular files, and with the
- one-lock-at-a-time method, the other developers must wait until the
- first developer is done with it. Those who miss the chance to grab a file may
- have to wait a while as its current holders develop their changes, integrate
- and test them, make them permanent, and release the lock.
- Peter Miller <pmiller@bmr.gov.au> provides a possible solution, aegis-2.1,
- posted as Volume 27, Issues 36-54 of comp.sources.unix. Quoting Peter:
- "The aegis program is a CASE tool with a difference. In the spirit of the UNIX
- Operating System, the aegis program is a small component designed to work with
- other programs."
- aegis is a project change supervisor. It provides a framework within which a
- team of developers may work on many changes to a program independently. aegis
- coordinates the integration of these changes back into the master source of
- the program, with as little disruption as possible. Resolution of contention
- for source files, a major headache in any project with more than one
- developer, is one of aegis's major functions.
- aegis uses a software development model consisting of a project master source
- (or baseline) of a project, and a team of developers creating changes to be
- made to this baseline. When a change is complete, aegis integrates it with the
- baseline, to create a new baseline. aegis requires each change to be atomic
- and self-contained, and allows no change to cause the baseline to cease
- working. "Working" for a baseline is defined as passing its own tests. The
- tests are considered part of the baseline.
- To ensure that changes can't make the baseline stop working, aegis mandates
- that changes be accompanied by at least one test, and that all such tests
- complete successfully. These steadily accumulated tests form an ever
- increasing regression test suite for all later changes. aegis also mandates a
- review stage for each change to the baseline.
- One benefit in using aegis is that there are only O(n) interactions between
- developers and the baseline. Contrast this number with the number of
- interactions for a master source that is being edited directly by the
- developers -- there are O(n!) interactions between developers, which makes
- adding "just one more" developer a potential disaster. Another benefit of
- aegis is that the project baseline always works, so a version is available at
- any time for demonstrations, or for those "pre-release snapshots" developers
- are always forced to provide.
- aegis is often compared to CVS, the RCS-based tool for handling concurrent
- development. In effect, aegis adds baseline validity checks to CVS, since
- aegis requires a change to pass validity checks before it is added to the
- baseline.
-
-
- More for UNIX
-
-
- Continuing with the highlights this month from comp.sources.unix: Gordon Ross
- <gwr@mc.com> extended bootp-2.2.B, posted as Volume 27, Issues 63 and 64.
- bootp is a BOOTP server for booting systems over the network. New in this version is
- support for clients that need the server to honor the format of the options
- area, and support for an extended option area. In addition the source now
- works with SVR4 systems. A patch was issued in Volume 27, Issue 76.
- In addition, Gordon submitted a test program for exercising BOOTP servers.
- This program appears as Volume 27, Issue 65 and can be used to debug problems
- encountered when developing or using BOOTP servers.
- Uwe Doering <fas@geminix.inberlin.de> contributed an important re-release of
- FAS as fas-2.11.0, in Volume 27, Issues 67-74. The re-release contains an
- important change which affects users of FAS on SCO UNIX. This change fixes a
- bug in FAS which caused reliability problems (crashes) with SCO UNIX. Other
- changes in this new version include performance improvements, bug fixes, more
- options on hardware flow control, including DSR as well as CTS lines, and
- enhanced support for 57,600 and 115,200 bps modes.
- Much discussion has occurred on the net on how to verify that a file is the
- original file and is unmodified, especially with the often "lossy" connections
- used to transfer files around the network. Checksum programs can help verify a
- file's originality; checksum algorithms developed over the years include MD5,
- Snefru-8, and the traditional CRC-32. Daniel J.
- Bernstein <djb@silverton.berkeley.edu> has combined all of these into a single
- package, fingerprint, which he submitted for Volume 27, Issues 79 and 80.
- fingerprint provides a common set of programs and C-callable libraries for
- producing a universal fingerprint of a file.
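- fingerprint's own interfaces are not reproduced here. As a rough illustration
- of how a checksum exposes modification, the following sketch computes the
- traditional CRC-32 (the reflected form of the IEEE polynomial, 0xEDB88320) one
- bit at a time; the function name crc32 is an assumption, not fingerprint's API.
```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32: compact but slow. Table-driven versions are faster.
std::uint32_t crc32(const std::string &data) {
    std::uint32_t crc = 0xFFFFFFFFu;            // standard initial value
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift out the low bit; if it was 1, fold in the polynomial.
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;                   // standard final inversion
}
```
- The usual check value is crc32("123456789") == 0xCBF43926, and changing any
- single byte of the input changes the result.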
- If you find Make limited in what it can do, and would prefer a full language
- for makefiles complete with conditionals, loops, and all those other features
- you expect in a full language, consider jam from Christopher Seiwald
- <seiwald@vix.com>. jam is a make-like program that adds full language
- features, plus a centralized database for project-wide rules, as well as
- several other extensions. jam was posted as Volume 27, Issues 81-85.
- Wayne Davison <davison@borland.com> contributed mthreads, a news database
- manager that processes one or more newsgroups into a tree-structured list of
- articles related by their References and Subject lines. For each group you
- enable, mthreads produces a .thread file that trn (Threaded Read News) can
- read to get a quick summary of the articles in the group and how they are
- related. mthreads takes up about three to five percent of your news-spool if
- you enable all groups. (Any site which is not running INN must run mthreads to
- use trn-3.3.) mthreads is posted as Volume 27, Issues 90-93.
- Wayne followed this posting with a new release of Threaded RN, trn-3.3, as
- Volume 27, Issues 94-105. trn is a newsreader that uses an article's
- references to display discussions in a natural reply-ordered sequence called
- threads. New in 3.3 is support for a default subscription list for new users,
- better handling of redirected and disabled groups, support for MIME, and
- various bug fixes.
- The collection of useful libraries grew once again with the contribution of
- clc by Panos Tsirigotis <panos@anchor.cs.colorado.edu> posted as Volume 27,
- Issues 106-126. The C Libraries Collection includes the following:
- dict. Support for various types of data structures, including doubly-linked
- lists, hash tables and binary search trees.
- fsma. Support for quick memory allocation/deallocation of fixed-size objects.
- misc. A collection of generic functions including functions for management of
- environment variables, a tree walk function to replace ftw(3), and functions
- to get the basename/dirname of a pathname.
- pq. An implementation of priority queues using heaps.
- pset. Support for pointer sets, implemented as dynamic pointer arrays.
- sio. Support for fast stream I/O, optionally using memory mapping for input if
- the operating system supports it.
- str. String handling support, with four types of functions: string matching
- functions (offering the Boyer-Moore, Knuth-Morris-Pratt, Rabin-Karp, and
- Shift-OR algorithms), string printing functions, string parsing functions, and
- string utility functions.
- timer. Support for multiple timers by multiplexing the timers provided by the
- operating system.
- xlog. This library provides logging objects which can be connected either to
- syslog or to a file. Objects connected to files may be customized to not
- exceed a certain file size.
- David I. Bell <dbell@canb.auug.org.au> contributed a new release of
- calc-2.9.0, posted as Volume 27, Issues 127-146. calc is an arbitrary
- precision C-like programmable calculator with many built-in functions. calc's
- basic data types are integers, fractions, complex numbers, strings, matrices,
- associations, lists, files, and user-definable "objects." You can use calc
- interactively to evaluate expressions line by line, or you can write
- complicated programs in its C-like language. New in this version are ANSI C
- support, initialization of new objects and variables to zero instead of null,
- addition of static variable support, and many bug fixes.
-
-
- Plot your MISC
-
-
- The major update in comp.sources.misc is version 3.5 of gnuplot from Alexander
- Woo <woo@playfair.stanford.edu>. Posted as Volume 40, Issues 13-45, gnuplot
- is a command-line driven interactive function plotting utility for UNIX,
- MSDOS, and VMS platforms. gnuplot was originally intended as a graphical
- program to allow scientists and students to visualize mathematical functions
- and data. Additions to this version of the software allow plots of
- three-dimensional functions and data files. gnuplot supports many different
- types of terminals, plotters, and printers and is easily extensible to include
- new devices. gnuplot handles both 2-D and 3-D coordinate spaces and objects.
- This release marks the end of Alexander's tenure as the coordinator of the
- gnuplot effort. He has passed the hat on to Alexander Lehmann, who has
- volunteered to coordinate the next release. Send all new contributions to
- bug-gnuplot@dartmouth.edu.
- Angus Duggan <ajcd@dcs.ed.ac.uk> has submitted an update to psutils posted as
- Volume 39, Issues 93-96. This package is a set of utilities for manipulating
- PostScript documents. This version supports page selection and rearrangement,
- including arrangement into signatures for booklet printing, and page merging
- for n-up printing. In addition, this version includes several shell and perl
- scripts for converting many PostScript output formats into the standard paging
- conventions so psutils can process them. A patch appeared in Volume 41, Issue
- 29 to add several new features and regularize several options across the
- utilities.
- Michael Peppler <mpeppler@itf.ch> re-released Sybperl at patch level 8 as
- Volume 39, Issues 101-103. Sybperl is a set of extensions to perl that adds direct
- access to Sybase databases via DB-Library. Additions in this level
- include a user settable variable to control null return on queries and to
- control how binary data is formatted. The new patch also adds several internal
- calls to avoid packaging problems, and access to the bulk copy calls in
- DB-Library. Patch 9 was posted in Volume 40, Issue 5; it adds the ability to set the
- application name for the Sybase process and fixes some bugs.
- Ted A. Campbell <tcamp@delphi.com> has contributed bwbasic, Bywater BASIC
- Interpreter/Shell, version 2.10, as Volume 40, Issues 52-54. bwbasic
- implements a large superset of the ANSI Standard for Minimal BASIC
- (X3.60-1978) and a significant subset of the ANSI Standard for Full BASIC
- (X3.113-1987) in C. bwbasic also offers shell programming facilities as an
- extension of BASIC. New features in this version are compatibility with K&R C
- compilers, implementation of ANSI-BASIC-style structured programming,
- enhancements to the interactive environment, bug fixes, and portability
- enhancements.
- Mike Gleason <mgleason@cse.unl.edu> contributed a new release of ncftp as
- Volume 40, Issues 76-81. ncftp is an alternative user interface for the ftp
- utility; it offers a much friendlier and more convenient interface than
- standard ftp programs. New features in 1.6 include support for the term
- package used by Linux, support for SCO Xenix, AIX and Dynix/PTX, incorporation
- of all the fixes from patches, and then some.
- Adam Costello <amc@wuecl.wustl.edu> contributed an improved paragraph
- formatter, par, as Volume 38, Issues 114-116. par is a filter which reformats
- each paragraph as it copies its input to its output. par generates each output
- paragraph from the corresponding input paragraph according to the following
- rules: par removes an optional prefix and/or suffix from each input line,
- divides the remainder of the line into words (delimited by spaces), joins the
- words into lines to make an eye-pleasing paragraph, then reattaches the
- prefixes and suffixes. If a line contains suffixes, par inserts spaces before
- them so that they all end in the same column. par provides many other options
- to control paragraph formatting. par's main benefit over fmt is its support
- for prefix and suffix characters, and its allowance of leading indentation.
- Panos Tsirigotis <panos@cs.colorado.edu> contributed pst, a program which
- extracts the text of a PostScript file and which tries to make that text
- appear as it does in the PostScript document. pst is posted as Volume 40,
- Issues 172-177. pst tries to avoid splitting words and to maintain paragraph
- boundaries. To achieve these goals, pst first tries to identify how the
- PostScript file was produced, so that it can use an appropriate text
- extraction algorithm. The user may also specify which of the available
- algorithms to use to extract text.
-
- On the patch front, remind received some additions via Patches 9 (Volume 39,
- Issues 115-118), 10 (Volume 40, Issues 48-50), and 11 (Volume 40, Issues
- 167-171). New features in patch 9 include display of moon phases (including
- little moon symbols in the PostScript calendar), ANSI color terminal support,
- flexibility for Sunday or Monday as start of week, better control of
- PostScript calendar printouts, plus the usual bug fixes. Patch 10 added OS/2
- support, more sorting options, MS-DOS and OS/2 test suites, and more
- PostScript tuning. Patch 11 adds release notes for UNIX and OS/2 for pop-up
- alarms, improvements for OS/2 and some optimizations, plus bug fixes.
- Raphael Manfredi <ram@acri.fr> issued a few more patches to dist-3.0 to handle
- more updates to the module library. Patches 12-13 (Volume 40, Issues 46-47)
- updated dist to use a newer version of itself, and added many changes to the
- modules based on feedback from developers using the dist package. Patch 14
- (Volume 40, Issue 128) fixes some bugs and adds some consistency checking.
- The C++ libraries for parsing UNIX-like command line options and arguments
- from Brad Appleton <brad@amber.ssd.csd.harris.com> received patches. Patch 2,
- for options, was posted as Volume 40, Issue 157. Patch 2 added hidden options
- and fixed some portability problems. Patch 3 for cmdline documented secret
- arguments to cmdparse, and fixed some bugs.
-
-
- Heavy Rolodexes
-
-
- Though they're not too portable, and a bit heavy, computers make good
- rolodexes. Gregg Hanna <gregor@kafka.saic.com> has contributed mrolo to
- comp.sources.x as Volume 21, Issues 8-10. mrolo is a set of card file database
- programs designed to be simple yet robust, and was born out of frustration
- with xrolo's and xrolodex's many menus. The main program is mrolo, the Motif
- based card file program; the contribution also includes crolo, the
- curses-based equivalent. Some of mrolo's features include:
- From the main screen you see a list of all cards and may scroll through them
- viewing/editing cards as desired. You may jump to a section of the card file
- by clicking on a lettered tab (A-Z) on the edge of the screen (implemented in
- mrolo) or by typing the upper-case letter (implemented in crolo).
- You can search cards quickly by simply entering text in a text field on the
- main screen. (You can easily do a regular expression search, if you want, just
- start the text with a slash.)
- There are no explicit save/load operations; the current display reflects the
- disk file's contents and mrolo verifies and writes changes at the time of the
- change.
- New features in version 1.3 include regular expression matching, addition of
- crolo (curses version) for dial-up use, an "as of" feature for tracking card
- dating, filter constraints, and the usual bug fixes. Two patches appeared:
- Patch 1 in Volume 21, Issue 43, and Patch 2 in Issue 44. Both are bug fix
- patches.
- George Ferguson <ferguson@cs.rochester.edu> also published two
- patches to xarchie. Patch 2 (Volume 21, Issue 1) is labeled internally patch
- 8, and brings xarchie to 2.0.8. This patch fixes a minor bug in the X routines
- and adds additional server information. Patch 3 (Volume 21, Issue 7) takes
- xarchie to 2.0.9, adds more server info and fixes a misunderstanding in the
- FTP cwd command.
- Robert Andrew Ryan <rr2b+@andrew.cmu.edu> contributed sxpc, a program to
- compress the X protocol stream, as Volume 21, Issue 12. sxpc provides up to
- 75% compression of the X protocol stream. sxpc's intended use is to improve
- the performance of X applications over a slow internet connection (e.g. slip,
- cslip, or term). sxpc assumes there is a Unix operating system at both ends of
- the link.
- Brian V. Smith <envbvs@epb12.lbl.gov> contributed patch 2 to xfig as Volume
- 21, Issues 21-36. xfig is an X program to draw and manipulate objects. New
- features include changes in minimum movements, display of lengths, not just
- resize/move on highlight, automatic use of scalable fonts if available, on
- screen rotation of text, double click support on more menu items, keyboard
- accelerators, and many bug fixes.
- Yearning for the "good old days" when outputting simple graphs didn't require
- so much window and event management code, Antoon Demaree <demaree@imec.be>
- contributed xvig, X Window Virtual Graphics as Volume 21, Issues 48-57. xvig
- is a simple graphics library that is I/O driven and supports opening windows,
- drawing shapes and text, and dealing with cursors. xvig produces simple graphs
- without scroll bars, pop-up menus, and fancy text features.
-
-
- Previews from alt.sources
-
-
- As usual, there's plenty in alt.sources so here are just a couple of
- highlights of what's to come in the mainstream groups.
- In a blast from the far past, James Hightower <jamesh@netcom.com> posted
- focal, a FOCAL interpreter. He didn't write it, and is interested in who did,
- but he posted it. It's from 1981, and provides a version of the FOCAL language
- as used to run on the old PDP-8 computers. focal was posted on October 16,
- 1993 in one part. Jonathan A. Chandross <jac@pilot.njin.net> saw it, tried
- unsuccessfully to compile it, and re-posted a corrected version. Chandross also
- added a makefile, documentation, and an example FOCAL program. He posted his
- corrected version on October 22, 1993, also in one part.
- Thomas Driemeyer <thomas@bitrot.inberlin.de> posted a schedule planner based
- on X/Motif in 11 parts on November 21, 1993. plan displays a month calendar
- similar to xcal, but each day's box is large enough to show appointments in
- small print. By pressing on a day box, you can list and edit the appointments
- for that day. plan also supports warn and alarm times and several methods of
- listing appointments.
-
-
-
-
- Code Capsules
-
-
- Variable-Length Argument Lists
-
-
-
-
- Chuck Allison
-
-
- Chuck Allison is a regular columnist with CUJ and a software architect for the
- Family History Department of the Church of Jesus Christ of Latter Day Saints
- Church Headquarters in Salt Lake City. He has a B.S. and M.S. in mathematics,
- has been programming since 1975, and has been teaching and developing in C
- since 1984. His current interest is object-oriented technology and education.
- He is a member of X3J16, the ANSI C++ Standards Committee. Chuck can be
- reached on the Internet at allison@decus.org, or at (801)240-4510.
-
-
- If printf is not the most widely used function in the standard C library, it
- is certainly the most flexible. printf, and its companions sprintf and
- fprintf, are the workhorses of formatted output. Their ability to process a
- variable number of arguments makes these functions very useful indeed. The
- format string in the statement
- printf("%d, %s\n",n,s);
- tells printf that it must extract an int and then a pointer to char from the
- argument space (usually the program stack). But how does printf get at the
- optional arguments? Well, I can't show you the inner workings of printf, but I
- can show you the mechanism C provides to handle variable-length argument
- lists. In this article I will show how you can write functions of your own
- that accept a variable number of arguments, and why you might want to do so.
-
-
- The Ellipsis Prototype Specification
-
-
- printf's function prototype in your compiler's stdio.h should look something
- like
- int printf(const char *, ...);
- The ellipsis tells the compiler to allow zero or more arguments of any type to
- follow the first argument in a call to printf. The format string communicates
- the number and type of the caller's optional arguments to the printf function.
- For printf to behave correctly, the arguments that follow the format string
- must match the types of the corresponding conversion specifications in the format
- string. If the argument list contains fewer arguments than the format string
- expects, the results are undefined. If the argument list contains more
- arguments than expected, printf ignores them. The bottom line is that when you
- use the ellipsis in a function prototype you are telling the compiler not to
- type-check your optional arguments because you think you know what you're
- doing -- so be sure that you do.
-
-
- Variable-Length Argument Lists from Scratch
-
-
- It's not too difficult to write your own functions that will accept a variable
- number of arguments. The program in Listing 1 extracts the maximum of a list
- of integers. It uses a fixed integer argument to communicate the number of
- elements in the list. The program assumes that the list begins immediately
- after the integer n in memory, that is, at address &n + 1.
- The program in Listing 2 extends this technique to lists of mixed types. Since
- the pointer p must visit arguments of different size, I have defined it to be
- a pointer to char. To extract an object of a certain type from the argument
- space, I just apply the appropriate cast and dereference, as in
- s = *(char **) p;
- and I use pointer arithmetic to skip over the extracted object:
- p += sizeof s;
- The following macros, which automate this process, appear in Listing 3:
- #define first_arg(x,p) \
- p = (char *) &x + sizeof(x)
- #define next_arg(p,T,x) \
- x = *(T*)p; p += sizeof(T)
- first_arg initializes p with the address of the first argument that follows
- the fixed argument x. next_arg assigns an object of type T to x and then skips
- past it.
-
-
- The va_list Mechanism
-
-
- The programs in Listing 1 through Listing 3 are okay for illustration
- purposes, but they aren't portable; they only work on platforms that store
- arguments linearly in order of increasing address and that leave no holes
- between arguments. Fortunately, Standard C defines a portable method for
- processing variable-length argument lists, with the macros defined in stdarg.h
- (see Listing 4). The stdarg header defines a new type, va_list ("variable
- argument list"), which refers to the list of optional, trailing arguments in a
- function call. The statement
- va_start(args,npairs);
- initializes args so that it refers to the first optional argument, the one
- following the fixed argument npairs. To extract an object from the list and
- "advance" to the next, call
- va_arg, specifying the desired type:
- n = va_arg(args,int);
- To be completely portable, you must close a va_list with the va_end macro
- (although on my compiler it is just a no-op). Listing 5 shows how to code a
- portable version of the maxn function from Listing 1.
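One detail every va_arg call depends on is worth spelling out. Because the compiler does not check optional arguments, it applies the default argument promotions to them: char and short arrive as int, and float arrives as double. The type named in va_arg must therefore be the promoted type. The average function below is a hypothetical illustration, not one of the article's listings.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical example: the average of count floating-point arguments.
   Callers may pass float values, but the default argument promotions
   turn them into double, so va_arg must ask for double, never float. */
double average(size_t count, ...)
{
    va_list args;
    double sum = 0.0;
    size_t i;

    if (count == 0)
        return 0.0;

    va_start(args, count);
    for (i = 0; i < count; ++i)
        sum += va_arg(args, double);   /* a float argument is now a double */
    va_end(args);

    return sum / count;
}
```

A call such as average(2, 1.5f, 2.5f) returns 2.0. Passing integers, as in average(2, 1, 3), would be wrong: they arrive as int, and reading an int with va_arg(args, double) is undefined behavior.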
- As you can see, to use variable-length argument lists you must provide two
- things:
- 1) At least one fixed argument (always the last before the ellipsis) to
- initialize the va_list, and
- 2) Some mechanism that communicates the number and/or type of arguments to the
- function.
- The following function prototype is both useless and syntactically invalid:
- void f(...); /* Location of args unknown */
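Point 2) can be met the way printf meets it, with a fixed argument that spells out the type of every optional argument. The sketch below is hypothetical and not from the article. Each character of codes stands for one argument: 'd' for an int and 's' for a char *. An unrecognized code stops processing, because the types of any later arguments can no longer be known.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical example: each character of codes names the type of one
   optional argument ('d' = int, 's' = char *). The values are written
   to out, separated by ", ", as long as they fit in outsize bytes. */
void format_codes(char *out, size_t outsize, const char *codes, ...)
{
    va_list args;
    char item[64];
    const char *c;

    out[0] = '\0';
    va_start(args, codes);
    for (c = codes; *c != '\0'; ++c)
    {
        if (*c == 'd')
            snprintf(item, sizeof item, "%d", va_arg(args, int));
        else if (*c == 's')
            snprintf(item, sizeof item, "%s", va_arg(args, char *));
        else
            break;  /* unknown code: the remaining types are unknown too */

        /* Room for ", ", the item, and the null byte? */
        if (strlen(out) + 2 + strlen(item) < outsize)
        {
            if (out[0] != '\0')
                strcat(out, ", ");
            strcat(out, item);
        }
    }
    va_end(args);
}
```

A call such as format_codes(buf, sizeof buf, "dsd", 1, "two", 3) leaves "1, two, 3" in buf. As with printf, nothing stops a caller from passing arguments that disagree with the codes, and the compiler cannot catch the mismatch.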
-
- There are a number of ways to satisfy point 2). For example, the program in
- Listing 6 concatenates a variable number of strings into its fixed string
- argument. The program processes one string after another until it finds a null
- pointer in the va_list. A call such as
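- concat(s,(char *) NULL);
- initializes s to the empty string. (The cast matters: NULL may be defined as a
- plain 0, and an int passed through the ellipsis is not converted to a pointer.)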
- As another example, consider a certain screen interface library that supports
- data entry tables. The library has a function table_put_row that allows you to
- fill rows with initial data. For example, if you have defined the columns of a
- table to represent name, occupation and salary fields, you can populate the
- table like this:
- table_put_row(tp,0,"Sandra",
- "Executive","57000");
- table_put_row(tp,1,"James",
- "Mechanic","45000");
- table_put_row(tp,2,"Kimberly",
- "Musician","66000");
- /* etc. */
- where tp is a pointer to a table structure and the second argument is the row
- number. As you can see in Listing 7, table_put_row doesn't need a parameter
- specifying the number of field arguments since it can infer that number
- directly from the Table structure.
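Listing 7 depends on a Column type and a column_put function from a column.h header that is not shown. The self-contained sketch below swaps those for a hypothetical fixed-size grid of string pointers, with MAX_ROWS and MAX_COLS as assumed limits, so the idea can be tried directly. As in the article, the column count stored in the Table, not an explicit argument, tells table_put_row how many strings to take from the va_list.

```c
#include <stdarg.h>
#include <stddef.h>

#define MAX_ROWS 10   /* hypothetical limits for this sketch */
#define MAX_COLS 5

/* Hypothetical stand-in for the article's Table: each cell simply
   points at a caller-supplied string instead of using column.h */
typedef struct Table
{
    const char *cells[MAX_ROWS][MAX_COLS];
    size_t num_columns;
} Table;

/* Fill one row. The number of optional string arguments is taken from
   tp->num_columns, so the caller must pass exactly that many. */
void table_put_row(Table *tp, int row, ...)
{
    va_list strings;
    size_t i;

    if (tp == NULL || row < 0 || row >= MAX_ROWS)
        return;

    va_start(strings, row);
    for (i = 0; i < tp->num_columns && i < MAX_COLS; ++i)
        tp->cells[row][i] = va_arg(strings, char *);  /* read the type passed */
    va_end(strings);
}
```

With num_columns set to 3, table_put_row(&t, 0, "Sandra", "Executive", "57000") stores the three strings in row 0. A caller who passes too few strings gets undefined behavior, which is the price of the relaxed checking.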
-
-
- va_lists as Arguments
-
-
- Listing 8 presents a useful function, fatal, which prints a formatted message
- to stderr and exits a program gracefully. You call it as you would printf,
- with a format string and a list of parameters, such as
- fatal("Error %d on device %d\n",
- err,dev);
- What you would like to do is just pass the format string and print arguments
- to some function that implements the printf machinery. The C library function
- vfprintf makes this very easy. All you have to do is initialize a va_list with
- the print arguments and pass it as the third argument. As you would expect,
- the C library includes the companion functions vprintf and vsprintf as well.
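Since fatal ends the program, it is awkward to experiment with. The same pattern, a variadic function that starts a va_list and hands it to one of the v-functions, appears below as a hypothetical format_message helper. It uses vsnprintf, a bounded relative of vsprintf that entered the library in C99 (after this article was written). vsnprintf never writes more than size bytes into buf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: format like printf into buf (at most size bytes,
   including the terminating null) by forwarding the optional arguments
   to vsnprintf as a va_list. Returns vsnprintf's result, the length the
   complete message would have had. */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int len;

    va_start(args, fmt);
    len = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return len;
}
```

format_message(buf, sizeof buf, "Error %d on device %d\n", err, dev) fills buf with the same text fatal would print. Because vsnprintf returns the full length, a return value of size or more signals that the message was truncated.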
-
-
- Why the Fuss?
-
-
- Programmers coming from almost any other language may wonder why we need all of
- this machinery. For example, the FORTRAN programmer is quite accustomed to
- writing print statements that give no explicit information as to number or
- type of arguments in its argument list:
- * Output two numbers:
- PRINT *, x, y
- In a FORTRAN statement, to find the maximum of a list of numbers, only the
- numbers appear:
- PRINT *, MAX(1,3,2)
- The reason programmers can do this in FORTRAN is that statements such as PRINT
- and MAX are part of the language (FORTRAN calls them intrinsic functions). The
- compiler knows their requirements and therefore can supply the appropriate
- information. In C, on the other hand, there is no input, output, or any other
- functionality built into the language except for what the operators provide.
- The C philosophy is to keep the language small and to supply needed
- functionality with libraries. Since the only communication between libraries
- and the compiler is the function call mechanism, you must give a function all
- the information it needs when you call it.
-
-
- An Application
-
-
- In financial and other numerical applications you often want to express
- integers, such as monetary amounts, as groups of numbers separated by commas:
- $11,235,852
- One approach is to convert the number to a string with sprintf and then
- traverse the string backwards, copying it to another string and inserting
- commas as needed. Another approach, which I present here, solves the more
- general problem of creating strings backwards.
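For comparison, the first approach mentioned above can be sketched as follows. The number is formatted with sprintf, and its digits are then copied from right to left into the output string, with a comma inserted after each complete group of three. The function name commas_sprintf is hypothetical. The buffer sizes assume an unsigned long of at most 64 bits, which means at most 20 decimal digits.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical alternative to Listing 11: format amount with sprintf,
   then copy its digits right to left into out, inserting a comma after
   every third digit. Assumes unsigned long has at most 64 bits, so out
   needs 27 bytes: 20 digits, 6 commas, and a null byte. */
char *commas_sprintf(unsigned long amount, char *out)
{
    char digits[21];            /* up to 20 digits plus a null byte */
    int ndigits, i, j;

    sprintf(digits, "%lu", amount);
    ndigits = (int) strlen(digits);

    j = ndigits + (ndigits - 1) / 3;    /* final length, commas included */
    out[j--] = '\0';

    for (i = ndigits - 1; i >= 0; --i)
    {
        out[j--] = digits[i];
        /* after each complete group of three, add a comma if digits remain */
        if ((ndigits - i) % 3 == 0 && i > 0)
            out[j--] = ',';
    }
    return out;
}
```

Unlike Listing 11, this version hard-codes both the base (10) and the group size (3). Making those adjustable is one thing the prepend-based design provides.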
- The program in Listing 9 calls a function prepend to build a string backwards.
- You pass prepend three arguments: the output buffer, an offset which points to
- the first character in the populated portion of the string, and the string to
- prepend. After it's finished, prepend returns the new offset.
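- The following shows the state of s[] (WIDTH is 11) after each call to prepend:
- after prepend(s,11,"three"): offset = 6, s+offset is "three"
- after prepend(s,6,"two"): offset = 3, s+offset is "twothree"
- after prepend(s,3,"one"): offset = 0, s+offset is "onetwothree"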
- Listing 10 presents the implementation of prepend along with a function
- preprintf, which allows you to prepend strings with formatting. preprintf uses
- vsprintf to create the formatted string, and then calls prepend to tack it
- onto the front of the existing string.
- I can now implement a function commas, in terms of prepend and preprintf (see
- Listing 11). As I extract each digit in turn, moving right to left by the
- usual remainder and quotient calculations, I push that digit onto a static
- character buffer, inserting commas where necessary. commas returns a pointer
- to the beginning of the completed string, which may or may not coincide with
- the beginning of the buffer. Note that the numeric base and the grouping size
- are parameterized.
-
-
- Conclusion
-
-
- In this article I have illustrated the hows and whys of variable-length
- argument lists. The stdarg macros are a lot like a parachute -- you don't need
- one very often, but when you do, usually nothing else will suffice. Since
- these macros involve a relaxation of C's argument type-checking mechanism, be
- sure to use them with care.
-
- Listing 1 Finds the largest of n integers
- /* max1.c */
- #include <stdio.h>
-
- int maxn(size_t n,...)
- {
- int x;
-
- int *p = (int *) (&n + 1);
- int m = *p;
-
- while (--n)
- {
- x = *++p;
- if (x > m)
- m = x;
- }
- return m;
- }
-
- int main(void)
- {
- printf ("max = %d\n",maxn(3,1,3,2));
- return 0;
- }
-
- /* Output:
- max = 3
- */
- /* End of File */
-
-
- Listing 2 Extracts integers/string argument pairs
- /* vargs1.c */
- #include <stdio.h>
-
- void int_string_pairs(size_t npairs,...)
- {
- int n;
- char *s;
- char *p = (char *) &npairs + sizeof npairs;
-
- while (npairs--)
- {
- n = *(int *) p;
- p += sizeof n;
- s = *(char **) p;
- p += sizeof s;
- printf("%d, %s\n",n,s);
- }
- }
- int main(void)
- {
- int_string_pairs(3,1,"one",2,"two",3,"three");
- return 0;
- }
-
- /* Output:
- 1, one
- 2, two
- 3, three
- */
- /* End of File */
-
-
- Listing 3 Encapsulates the parameter extraction logic
- /* vargs2.c */
-
- #include <stdio.h>
-
- #define first_arg(x,p) \
- p = (char *) &x + sizeof(x)
- #define next_arg(p,T,x) \
- x = *(T*)p; p += sizeof(T)
-
- void int_string_pairs(size_t npairs,...)
- {
- int n;
- char *s, *p;
-
- first_arg(npairs,p);
- while (npairs--)
- {
-
- next_arg(p,int,n);
- next_arg(p,char *,s);
- printf("%d, %s\n",n,s);
- }
- }
-
- int main(void)
- {
- int_string_pairs(3,1,"one",2,"two",3,"three");
- return 0;
- }
-
- /* End of File */
-
-
- Listing 4 Uses the macros in stdarg.h to process a variable-length argument
- list
-
- /* vargs3.c */
- #include <stdio.h>
- #include <stdarg.h>
-
- void int_string_pairs(size_t npairs,...)
- {
- int n;
- char *s;
- va_list args;
-
- va_start(args,npairs);
- while (npairs--)
- {
- n = va_arg(args,int);
- s = va_arg(args,char *);
- printf("%d, %s\n",n,s);
- }
- va_end(args);
- }
-
- int main(void)
- {
- int_string_pairs(3,1,"one",2,"two",3,"three");
- return 0;
- }
-
-
- /* End of File */
-
-
- Listing 5 Implements maxn via the stdarg macros
- /* max2.c */
- #include <stdio.h>
- #include <stdarg.h>
-
- int maxn(size_t count, ...)
- {
- int n, big;
- va_list numbers;
-
- va_start(numbers,count);
-
- big = va_arg(numbers,int); /* first of the count numbers */
- while (--count) /* examine the remaining count-1 numbers */
- {
- n = va_arg(numbers,int);
- if (n > big)
- big = n;
- }
-
- va_end(numbers);
- return big;
- }
-
- int main(void)
- {
- printf("max = %d\n",maxn(3,1,3,2));
- return 0;
- }
-
- /* End of File */
-
-
- Listing 6 Concatenates a variable number of strings
- /* concat.c */
- #include <stdarg.h>
- #include <stdio.h>
- #include <string.h>
-
- char * concat(char *s,...)
- {
- va_list strings;
- char *p;
-
- /* Copy first string */
- va_start(strings,s);
- if ((p = va_arg(strings,char *)) == NULL)
- {
- *s = '\0';
- va_end(strings);
- return s;
- }
- else
- strcpy(s,p);
-
- /* Append others until the null-pointer sentinel */
- while ((p = va_arg(strings,char *)) != NULL)
- strcat(s,p);
- va_end(strings);
- return s;
- }
-
- int main(void)
- {
- char buf[BUFSIZ];
- concat(buf,"Sweet","Talker","Betty","Crocker",(char *) NULL);
- printf("\"%s\"\n",buf);
- return 0;
- }
-
- /* Output:
- "SweetTalkerBettyCrocker"
- */
- /* End of File */
-
-
- Listing 7 Uses a va_list to populate tables
- #include <stddef.h>
- #include <stdarg.h>
- #include "column.h"
-
- typedef struct Table
- {
- Column *columns;
- size_t num_columns;
- /* other details omitted */
- } Table;
-
- void table_put_row(Table *tp, int row, ...)
- {
- if (tp)
- {
- size_t i;
- va_list strings;
-
- /* Load each column element from va_list */
- va_start(strings,row);
- for (i = 0; i < tp->num_columns; ++i)
- column_put(tp->columns[i],row, va_arg(strings,char *));
- va_end(strings);
- }
- }
-
- /* End of File */
-
-
- Listing 8 Builds a variable format string by passing a va_list to vfprintf
- /* fatal.c: Exit program with an error message */
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <stdarg.h>
- #include <string.h>
-
- void fatal(char *fmt, ...)
- {
- va_list args;
-
-
- if (strlen(fmt) > 0)
- {
- va_start(args,fmt);
- vfprintf(stderr,fmt,args);
- va_end(args);
- }
- exit(1);
- }
-
- /* End of File */
-
-
- Listing 9 Illustrates the use of prepend
- #include <stdio.h>
- #include <assert.h>
-
- #define WIDTH 11
-
- extern int prepend(char *, unsigned, char *);
-
- int main(void)
- {
- char s[WIDTH+1];
- int offset = WIDTH;
-
- s[offset] = '\0';
- offset = prepend(s,offset,"three");
- assert(offset >= 0);
- puts(s+offset);
-
- offset = prepend(s,offset,"two");
- assert(offset >= 0);
- puts(s+offset);
-
- offset = prepend(s,offset,"one");
- assert(offset >= 0);
- puts(s+offset);
- return 0;
- }
-
- /* Output:
- three
- twothree
- onetwothree
- */
- /* End of File */
-
-
- Listing 10 Functions to build strings backwards
- /* preprint.c: Functions to prepend strings */
-
- #include <stdio.h>
- #include <string.h>
- #include <stdarg.h>
- #include <stdlib.h>
-
- int prepend(char *buf, unsigned offset, char *new_str)
- {
-
- int new_len = strlen(new_str);
- int new_start = (int) offset - new_len; /* signed, so it can go negative */
- /* Push a string onto the front of another */
- if (new_start >= 0)
- memcpy(buf+new_start,new_str,new_len);
-
- /* Return new start position (negative if underflowed) */
- return new_start;
- }
-
- int preprintf(char *buf, unsigned offset, char *format, ...)
- {
- int pos = offset;
- char *temp = malloc(BUFSIZ);
-
- /* Format, then push */
- if (temp)
- {
- va_list args;
-
- va_start(args,format);
- vsprintf(temp,format,args);
- pos = prepend(buf,offset,temp);
- va_end(args);
- free(temp);
- }
- return pos;
- }
-
- /* End of File */
-
-
- Listing 11 Uses prepend and preprintf to format numbers with comma separators
- /* commas.c: Converts a number into a string with commas */
-
- #include <stdio.h>
-
- #define BASE 10
- #define GROUP 3
-
- /* Need space to hold the digits of an unsigned long,
- * intervening commas and a null byte. It depends on
- * BASE and GROUP above (but logarithmically, not
- * as a constant, so we must define it manually here)
- */
- #define MAXTEXT 14 /* For BASE = 10 and a 32-bit unsigned long */
-
- int prepend(char *, unsigned, char *);
- int preprintf(char *, unsigned, char *, ...);
-
- char *commas(unsigned long amount)
- {
- short offset = MAXTEXT-1, /* where the string "starts" */
- place; /* the power of BASE for */
- /* current digit */
- static char text[MAXTEXT];
-
- text[offset] = '\0';
-
-
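- /* Push digits right-to-left with commas; the place == 0 test */
- /* guarantees at least one digit, so 0 yields "0" */
- for (place = 0; place == 0 || amount > 0; ++place)
- {
- if (place % GROUP == 0 && place > 0)
- offset = prepend(text,offset,",");
- offset = preprintf(text,offset,"%lx",amount % BASE); /* %lx: unsigned long */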
- amount /= BASE;
- }
- return (offset >= 0) ? text + offset : NULL;
- }
-
- int main(void)
- {
- puts(commas(1));
- puts(commas(12));
- puts(commas(123));
- puts(commas(1234));
- puts(commas(12345));
- puts(commas(123456));
- puts(commas(1234567));
- puts(commas(12345678));
- puts(commas(123456789));
- puts(commas(1234567890));
- return 0;
- }
-
- /* Output:
- 1
- 12
- 123
- 1,234
- 12,345
- 123,456
- 1,234,567
- 12,345,678
- 123,456,789
- 1,234,567,890
- */
-
- /* End of File */
-
-
-
-
-
- CUG New Releases
-
-
- C++SIM, NNUTILS, and a Small-but-Good Combo
-
-
-
-
- Victor R. Volkman
-
-
- Victor R. Volkman received a BS in Computer Science from Michigan
- Technological University. He has been a frequent contributor to The C Users
- Journal since 1987. He is currently employed as Senior Analyst at H.C.I.A. of
- Ann Arbor, Michigan. He can be reached by dial-in at the HAL 9000 BBS (313)
- 663-4173 or by Usenet mail to sysop@hal9k.com.
-
-
-
-
- Introduction
-
-
- Every month I get dozens of e-mail inquiries about how to obtain archive
- distributions listed in these monthly columns. My only reply is that people should
- use the C Users Order Form in the center portion of the magazine. Using the
- Order Form is simpler and easier than chasing after archives from online
- services.
- Of course, the CUG does not claim to be the exclusive distributor of anything
- in its library. This means that these archives are frequently available from
- many online sources, including FTP servers, Compuserve, and dial-up BBSes.
- These online sources often have older versions or sometimes just patches. What
- CUG guarantees is that each archive is completely intact, and fully authorized
- for distribution.
-
-
- New Library Acquisitions
-
-
- C++SIM (CUG #394): Simula-style class libraries for discrete event process
- based simulation.
- Input-Edit, SORTLIST AVL, and Typing Tutor (CUG #395): multi-platform
- user-input line editor, AVL balanced binary tree, and typing instructor for
- Curses.
- NNUTILS (CUG #396): neural net source library & tutorial
-
-
- C++SIM Discrete Simulations: CUG 394
-
-
- C++SIM is a newly released package from M.C. Little, at the Department of
- Computing Science in the University of Newcastle upon Tyne (England), and D.L.
- McCue at Xerox Corp. (Webster, NY). The C++SIM discrete event process based
- simulation package provides Simula-style class libraries. The same
- distribution also includes the SIMSET linked list manipulation facilities.
- (According to MacLennan [1], Simula was the first computer language to
- incorporate the ideas of "class" and "object" constructs back in 1967.) C++SIM
- currently claims usability only on UNIX workstations, such as Sun SPARCs.
- C++SIM version 1.0 (released 06/15/92) is now available as CUG volume #394.
- C++SIM uses inheritance throughout the design to an even greater extent than
- already provided by Simula. C++SIM's use of inheritance allows you to add new
- functionality without affecting the overall system structure. Thus, C++SIM
- provides for a more flexible and expandable simulation package.
- C++SIM provides the following classes: Process, ProcessList, ProcessIterator,
- ProcessCons, Random, Element & Head, thread, lwp_thread, and gnu_thread.
- C++SIM includes a 20-page paper entitled "Construction and Use of a Simulation
- Package in C++." This paper is available in PostScript format only. The paper
- describes the class hierarchy itself as well as how to further refine the
- simulation package.
- The simulation package requires a threads package and currently works only
- with the Sun lightweight process library or the included GNU thread package.
- The thread library is the only system-specific code, so porting the remainder
- of the code to other UNIX workstations should be easy. C++SIM compiles with
- Cfront 2.1, Cfront 3.0.1, and GNU g++ 2.3.3.
- The C++SIM license grants permission to use, copy, modify, and distribute the
- program for evaluation, teaching and/or research purposes only and without
- fee. The University of Newcastle upon Tyne copyright and permission notice
- must appear on all copies and supporting documentation, and similar conditions
- are imposed on any individual or organization to whom the program is
- distributed.
-
-
- Input-Edit, SORTLIST AVL, and Typing Tutor: CUG #395
-
-
- This volume combines three relatively small but powerful archives on a single
- diskette. Chris Thewalt (University of California at Berkeley, Civil
- Engineering) presents his interactive line editor library. Walter Karas (Cary,
- NC) contributes his implementation of the classic binary search tree with AVL
- balancing. Last, Christopher Sawtell (Linwood, Christchurch, New Zealand)
- releases his Typing Tutor for use with Curses. All three are immediately
- available as CUG volume #395.
-
-
- Input-Edit: CUG #395A
-
-
- Input-Edit, also known as getline, greatly increases the functionality of
- programs which read input a line at a time. With Input-Edit, interactive
- programs that read input line by line can now provide line editing and a
- history buffer to the end user that runs the program. As far as the programmer
- is concerned, the program only asks for the next line of input. However, until
- the user presses the RETURN key he can use Emacs-style line editing commands
- and can traverse the history buffer of lines previously typed.
- Other packages, such as GNU's readline, have greater capability but are also
- substantially larger. Input-Edit is small (1200 source lines) and quite
- portable because it uses neither stdio nor any termcap features. For example,
- Input-Edit only uses \b to backspace and \007 to ring the bell on errors.
- Since Input-Edit cannot edit multiple lines, it scrolls long lines left and
- right on the same line.
- Input-Edit is written in K&R C and can run on any UNIX system (BSD, SYSV or
- POSIX), AIX, and XENIX, as well as non-UNIX systems such as MS-DOS with MSC,
- Borland Turbo C, or djgpp, OS/2 with gcc (EMX), and DEC VAX-11 VMS. Porting
- Input-Edit to new systems consists mainly of altering the package's character
- read function to read a character when it is typed without echo.
-
-
- Sortlist AVL: CUG #395B
-
-
-
- SORTLIST implements a "sorted list" data structure library in ANSI C. This
- library is appropriate whenever all elements of the sorted list have the
- following characteristics:
- 1. All elements are of a single fixed size.
- 2. Each element is associated with a unique key value.
- 3. The set of key values has a well-defined "less than, greater than"
- relation.
- Symbol tables and dictionary applications are excellent candidates for the
- sorted list data structure. This implementation of a sorted list data
- structure employs an AVL tree. AVL trees were invented by Adelson-Velskii and
- Landis in 1962. Specifically, Karas draws on algorithms presented by Horowitz
- and Sahni in Fundamentals of Data Structures (Computer Science Press). The
- add, find, and delete operations on an AVL tree have worst-case O(log n) time
- complexity. SORTLIST version 1.1 (released 8/25/93) is now available on CUG
- volume #395.
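The logarithmic bound follows from the AVL balance rule: at every node, the heights of the two subtrees differ by at most one, so the tree's height stays proportional to log n. The sketch below shows AVL insertion with integer keys. It follows the textbook algorithm, not Karas's SORTLIST source. The Node type and the avl_insert, avl_find, and avl_free names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical node: an integer key plus the height of its subtree */
typedef struct Node
{
    int key;
    int height;                 /* a leaf has height 1 */
    struct Node *left, *right;
} Node;

static int height(const Node *n) { return n ? n->height : 0; }

static void update(Node *n)
{
    int hl = height(n->left), hr = height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

static Node *rotate_right(Node *y)   /* left child x becomes the root */
{
    Node *x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

static Node *rotate_left(Node *x)    /* right child y becomes the root */
{
    Node *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Insert key (duplicates are ignored); returns the new subtree root */
Node *avl_insert(Node *n, int key)
{
    int balance;

    if (n == NULL)
    {
        n = malloc(sizeof *n);
        if (n != NULL)
        {
            n->key = key;
            n->height = 1;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < n->key)
        n->left = avl_insert(n->left, key);
    else if (key > n->key)
        n->right = avl_insert(n->right, key);
    else
        return n;

    update(n);
    balance = height(n->left) - height(n->right);
    if (balance > 1)                   /* left subtree two levels taller */
    {
        if (key > n->left->key)        /* left-right case: rotate child first */
            n->left = rotate_left(n->left);
        return rotate_right(n);
    }
    if (balance < -1)                  /* right subtree two levels taller */
    {
        if (key < n->right->key)       /* right-left case */
            n->right = rotate_right(n->right);
        return rotate_left(n);
    }
    return n;
}

/* Ordinary binary-search-tree lookup, O(height) = O(log n) */
int avl_find(const Node *n, int key)
{
    while (n != NULL && key != n->key)
        n = key < n->key ? n->left : n->right;
    return n != NULL;
}

void avl_free(Node *n)
{
    if (n != NULL)
    {
        avl_free(n->left);
        avl_free(n->right);
        free(n);
    }
}
```

Inserting the keys 1 through 1000 in ascending order would build a 1000-level chain in an unbalanced binary search tree. Here the balance rule keeps the height below roughly 1.44 log2 n, which is at most 14 levels for 1000 keys.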
-
-
- Typing Tutor: CUG #395C
-
-
- The Typing Tutor for use with Curses is a marvel of compactness. Since it
- builds on the substantial functionality of the UNIX Curses library, the Typing
- Tutor consists of just 250 source lines. Typing Tutor's learning scenario is
- simple, yet easily customizable to fit any lesson plan. Typing Tutor's screen
- displays two windows. The top window contains the "lesson" which is the text
- you will be typing from. The bottom window contains the results of your
- (presumably) touch typing. Every time a character in the bottom window fails
- to match the original in the top window, Typing Tutor flags it by changing the
- screen attribute to flashing. Although Sawtell does not specify compatibility
- for Typing Tutor, he expects it to run on any UNIX system with a Curses
- package available.
-
-
- NNUTILS Neural Network: CUG #396
-
-
- NNUTILS, by Gregory Stevens (Rochester, NY), is a public domain package to
- help you to start programming neural networks in C. NNUTILS is a tutorial with
- source code as your textbook. Stevens' intensely documented source code
- contains everything you need to implement several kinds of net architectures.
- NNUTILS gives you a series of simple implementations to let you see how they
- work step by step.
- Each NNUTILS subdirectory contains a different example application with six
- standard C source files and a main program. The source is written in
- ANSI-compliant C and developed primarily under Borland C++ 3.1. Accordingly,
- the CUG distribution includes DOS executables and project files for each
- implementation. Because the code is ANSI compliant, all of the examples work
- with GNU C under UNIX. Building executables with GNU C is simple enough
- without makefiles, so none are included.
- Briefly, here's a summary of problem sets included with NNUTILS:
- NNTEST1: A network with one input node and one output node.
- NNTEST2: A network using the "logistic" activation function (as opposed to a
- linear function).
- NNXOR: A network implementation of the famous Exclusive-Or problem.
- NNSIM1: A generic feed-forward network with back propagation.
- NNWHERE: A 5x5 retina is presented with shapes it must classify correctly.
- NNWHAT: A continuation of above where shapes can assume any position.
- NNEVOLVE: A feed-forward, back-propagation supervised net.
- NNSIM2: Simulates a competitive learning network with a Hebbian learning
- algorithm.
- NNRECUR1: A recurrent net with back propagation, and a kind of supervised
- learning called Self-Correlating, where the input patterns serve as their own
- output patterns.
- NNUTILS version 1.01 (released 08/02/93) is immediately available as CUG
- #396.
- References
- [1] MacLennan, Bruce J., Principles of Programming Languages: Design,
- Evaluation, and Implementation. New York: Holt, Rinehart and Winston, 1983.
-
-
- Editor's Forum
- The C community is waking up once again. For years, C has been the stable
- alternative, the language of choice where performance, precision, and
- portability count. Want thrills and excitement? Go try your hand at C++.
- That's where most of the clever new ideas get tried out these days. C++ has
- reveled in an adolescent vigor because it could depend on C to provide a
- stable springboard.
- Now the shoe is on the other foot, at least to some extent. Work on revising
- the C Standard is commencing two years earlier than required by ISO rules. At
- their December meeting in Kona, Hawaii, ISO committee WG14 and ANSI X3J11
- unanimously agreed on an outline for bringing Standard C more up to date.
- Even more important, the joint committee agreed on a number of guiding
- principles for introducing changes to the language. (You'll be reading more
- about those principles in these pages in the months and years to come.)
- Closing the gap with C++ ranked very near the top of that list of principles.
- A strength of C++, sometimes forgotten, is that it remains largely upward
- compatible with C. You can always fall back on the older language and still
- write the "interesting" parts in C++. A weakness of C++, sometimes
- underestimated, is that it thus feels unconstrained. The sentiment among those
- standardizing C++ is that it should handle all coding challenges from the
- world of object-oriented programming, however much they complexify the
- language.
- We self-appointed custodians of Standard C appreciate the real contributions
- made by C++ to the programmer's toolchest. At the same time, we fear losing
- "the spirit of C" in our zeal to keep up with the times. Our safety valve is
- to apply the C++ trick in reverse. We're eager to add classes to C, but
- probably not templates -- polymorphism, but maybe not multiple inheritance.
- Sure, there are times when you need those more advanced features. Maybe those
- are the times when you should mix in a little C++ code. The rest of the time,
- let's keep C efficient and conceptually simple.
- Bjarne Stroustrup, the developer of C++, has privately expressed concern to me
- that this new C standardization activity could lead to dueling standards. He
- rightly fears a civil war between the two most popular programming languages.
- I certainly don't want to see anything of the sort come to pass. So long as we
- focus on keeping a large common subset between the two languages, I don't
- think that it will. For years, the watchword of the C++ standards effort has
- been, "as close as possible to C, but no closer." Now the watchword of the C
- standards effort should be, "closer to C++ than you thought possible, but not
- too close."
- P.J. Plauger
- pjp@plauger.com
-
-
- New Products
-
-
- Industry-Related News & Announcements
-
-
-
-
- Liant Ships C++/Views v3.0 Visual Programming Tool
-
-
- Liant Software has begun shipping their C++/Views v3.0 Visual Programming
- Tool. C++/Views v3.0 is an object-oriented development tool that provides
- support for creating and porting GUI applications among Microsoft Windows,
- OS/2 Presentation Manager, OSF/Motif, Apple Macintosh, and DOS character mode
- environments. C++/Views v3.0 combines 100 ready-to-use classes with programmer
- productivity tools. C++/Views v3.0 includes interface, data, event, printer,
- and extended GUI classes. With C++/Views, programmers can create native GUI
- applications because C++/Views uses each platform's own GUI toolkit.
- C++/Views v3.0 includes a visual development tool, called C++/Views
- Constructor, which lets developers work visually with the C++/Views class
- library. Constructor unites a visual interface builder with an enhanced
- C++/Views Browser, letting users switch from drawing and archiving their
- portable resources to editing the code that calls those resources.
- The C++/Views Interface Builder is a WYSIWYG editing tool for designing and
- testing the behavior of portable resources such as binary files of bitmaps,
- dialogs, and menus, or other GUI objects. Portable resources are called from
- an application at run-time. The same resource file can be called from a
- Windows, Motif, Presentation Manager, Macintosh, or DOS application, and will
- have a "look and feel" consistent with the host environment under which it
- runs.
- Other features of C++/Views v3.0 include geometry management and C++/Views
- Browser v3.0. The Browser is an MDI application which lets users cut-and-paste
- among multiple C++ applications. C++/Views v3.0 ranges from $499 to $1,999
- depending on platform. There are no royalties or run-time fees. Upgrades range
- from $149 to $279 depending on platform. For more information contact Liant
- Software Corporation, 959 Concord St., Framingham, MA 01701, (508) 872-8700;
- FAX: (508) 626-2221.
-
-
- Borland Announces ObjectWindows for Novell's AppWare
-
-
- Borland International Inc. has announced that it will combine its
- ObjectWindows Library (OWL) with Novell's AppWare Foundation technology to
- provide developers with C++-based, cross-platform development libraries.
- Borland's ObjectWindows is a set of C++ class libraries that provides
- pre-made objects for developing Windows applications. The Novell AppWare
- Foundation is a set of C libraries
- that provide developers cross-platform functionality across multiple operating
- systems, graphical interfaces and network services. ObjectWindows for AppWare
- will let developers of applications for Microsoft Windows move their
- applications to Apple's Macintosh, IBM's OS/2, and UNIX platforms.
- ObjectWindows for AppWare is based on ObjectWindows v2.0. ObjectWindows 2.0 is
- a Microsoft Windows-specific implementation of the ObjectWindows application
- framework. ObjectWindows for AppWare replaces the Microsoft Windows-specific
- implementation with the AppWare Foundation cross-platform API. The result lets
- the application run on Microsoft Windows, Macintosh, UnixWare, SunOS, and
- HP-UX.
- An Early Experience Program is planned for the first quarter of 1994, and both
- companies plan to begin distributing ObjectWindows for AppWare in the summer
- of 1994. For more information contact Borland International Inc., 100 Borland
- Way, P.O. Box 660001, Scotts Valley, CA 95067, (408) 431-5172.
-
-
- The Haley Enterprise Ships Rete++
-
-
- The Haley Enterprise has begun shipping Rete++, a new product that integrates
- rule-based programming with C++. Rete++ supports rule-based expert system
- development by providing rules that match against C++ objects using the Rete
- Algorithm.
- Rete++ provides both forward and backward chaining, and includes a
- comprehensive class hierarchy for convenient development of rule-based C++
- applications. Rete++ generates C++ class taxonomies for use in C++
- applications, creating C++ header files with class definitions. Rete++ also
- generates a class for each type of object that an application references in
- any rule. C++ applications can use the Rete++-generated classes directly or
- can further subclass them as needed.
- With Rete++, an application can assert facts either via the rule-based
- deffacts statement or directly from C/C++. C++ operator overloading allows
- elements to be expressed using standard C++ syntax, eliminates coding of
- complex, type-specific conditional logic, and reduces the need for
- operation-specific function names. Rather than invent a strongly typed
- C++-like syntax for rules, Rete++ retains the traditional
- parenthesis-delimited OPS5/ART/CLIPS/Eclipse syntax.
- Rete++ supports Microsoft Windows, Microsoft Windows NT, SUN Solaris,
- NeXTStep, Apple's Macintosh, and HP-UX. For more information contact The Haley
- Enterprise, Inc., 413 Orchard St., Sewickley, PA 15143,
- (412) 967-1100; FAX: (412) 741-6457.
-
-
- Applied Microsystems Announces X-Windows GUI for Real-Time Debugger
-
-
- Applied Microsystems has announced debugging support for Microtec Research
- Inc.'s XRAY MasterWorks. The debugger, MWX-ICE (Multi Windows XRAY for In
- Circuit Emulation), is based on MRI's multi-windows XRAY debugger and includes
- in-circuit emulator additions. MWX-ICE can be used as a stand-alone product or
- connected to the XRAY MasterWorks environment.
- MWX-ICE debugger gives the user control over execution of C, C++ and assembly
- language programs. Separate windows for execution trace history, registers,
- and stacks let the user browse and make modifications. MWX-ICE lets the user
- display and change variables, including structures and unions. User-defined
- macros can automatically be called when a breakpoint is encountered. Control
- of the emulator's overlay capability, breakpoint, and event system is
- provided. Supported load formats include IEEE 695, COFF, and A.OUT. Other
- features of MWX-ICE include: a Motif-based user interface which combines
- debugger controls with function notebooks; configurable control panel buttons
- that activate various debugging commands; and an on-line, context-sensitive
- hypertext help system.
- MWX-ICE supports Motorola's 68000, 68EC/HC000, 68302, 68020/030/EC030,
- 68330/340, 68331/332, and 68360/EN360 processor families and Intel's
- 80960CA/CF processors. MWX-ICE is $3,000. For more information contact Applied
- Microsystems Corporation, 5020 148th Ave. N.E., P.O. Box 97002, Redmond, WA
- 98073, (206) 882-2000 or (800) 426-3925; FAX: (206) 883-3049; Telex: 185196.
-
-
- Optimite Systems Ships PC_Opt
-
-
- Optimite Systems has begun shipping PC_Opt, a post-compile optimizer. PC_Opt
- analyzes across object modules and run-time libraries and performs
- optimizations on object code. The optimizations performed by PC_Opt are in
- addition to traditional compiler-based optimizations. These additional
- optimizations are made possible by analyzing the complete object code, as
- opposed to isolated modules in compiler-based optimization.
- Features of PC_Opt include: converting far call and return instructions to
- near call and return instructions, converting stack-clearing code from the C
- to the Pascal calling convention, and passing parameters in registers. PC_Opt
- also deletes unreachable code. PC_Opt requires no source code modification or
- setup; users simply give it the same set of object module files they would
- normally specify to the linker. PC_Opt produces ready-to-link output in the
- standard OMF object module format.
- PC_Opt supports object modules and run-time libraries generated by Borland C++
- v3.1 (C source only), and other standard OMF format object modules. PC_Opt is
- $15. For more information contact Optimite Systems, 1000 Singleton Blvd.,
- Dallas, TX 75212, (214) 745-1301; FAX: (214) 747-3614.
-
-
- Gimpel Releases PC-lint for C/C++
-
-
- Gimpel Software has released PC-lint for C/C++. PC-lint for C/C++ will analyze
- a mixed suite of C/C++ programs and report on bugs, glitches, and
- inconsistencies. PC-lint provides a number of C++-specific checks that include
- reminders to virtualize inherited destructors, and to create custom assignment
- operators and copy constructors for classes. Other checks include new and
- delete imbalances, name hiding, and unusual but legal constructs. In addition,
- PC-lint for C/C++ provides the same checks for C as PC-lint for C. These
- checks include strong type checking, a control-flow based analysis of variable
- initialization, loss of precision, strange uses of Booleans, unaccessed
- variables, unusual macros, and unused program components.
- PC-lint for C/C++ is compatible with Borland C/C++ and Microsoft C/C++ and
- their respective Application Frameworks. PC-lint for C/C++ is based on the
- Annotated C++ Reference Manual (ARM), and is tracking the ANSI/ISO X3J16
- standardization process, including templates and exceptions. PC-lint for C/C++
- provides both a DOS-OS/2 bound executable and a 386 DOS-extended executable.
- The cost to license PC-lint for C/C++ is $239. For more information contact
- Gimpel Software, 3207 Hogarth Lane, Collegeville, PA 19426, (215) 584-4261.
-
-
-
- Spectrum Systems Releases ObjectBase
-
-
- Spectrum Systems, Inc. has released ObjectBase, an object-oriented C++ class
- library for relational databases. When using ObjectBase, tables or groups of
- tables may be accessed as objects. Data redundancy is hidden from the user.
- According to the company, the programming task of interfacing to the DBMS is
- simplified, without taking away the programmer's ability to access the
- database directly through traditional routes, e.g., SQL.
- Features of ObjectBase include: transparent shared connections and shared
- connection groups; transparent multiple connections to multiple servers; data
- retrieval via SQL and stored procedures; blocking, non-blocking, and
- asynchronous data retrieval; program independence from details of column
- definition; transactions; and customizable error handling. ObjectBase can
- create an error handler that will attempt to re-establish lost connections
- between client and server, without intervention by the user. ObjectBase can
- also automatically generate both the class description (the header file) and
- the class implementation (the source code) for tables from the schema
- description.
- ObjectBase supports Sybase SQL Server and Microsoft SQL Server. ObjectBase
- developer/run-time licenses range from $899 to $2,699. For more information
- contact Spectrum Systems, Inc., Woodfield Corporate Center, 425 N. Martingale
- Rd, Suite 800, Schaumburg, IL 60173, (708) 330-3797 or (708) 706-3800; FAX:
- (708) 706-3700.
-
-
- Rogue Wave Upgrades Tools.h++
-
-
- Rogue Wave Software, Inc. has upgraded their C++ class library, Tools.h++.
- Tools.h++ v6.0 includes "Internationalized" support which provides the ability
- to work with world-wide, local character sets. With internationalization,
- programmers can write a single application that can be shipped to many
- countries. This application, when executed, will be able to process times,
- dates, strings, and currency in the native format.
- Features of Tools.h++ v6.0 include: multi-byte and wide character strings;
- classes that parse and format times, dates, and currency in multiple locales;
- and classes that support multiple time zones, daylight saving time rules, and
- localized I/O streams and messages. Other features of Tools.h++ v6.0 include
- support for exceptions as specified by the emerging draft of the ANSI C++
- standard, support for multithreaded programs, and an RWCString class.
- Tools.h++ v6.0 is available on Windows NT, Macintosh, DOS, Windows 3.x, OS/2,
- and UNIX. Tools.h++ v6.0 ranges from $299 to $395 depending on platform.
- Upgrades are available to current users of the Tools.h++ library at $99 for
- DOS, and $135 for Windows 3.x, OS/2, and UNIX. In addition to Tools.h++ v6.0,
- Rogue Wave will be providing ToolsPro.h++. Rogue Wave ToolsPro.h++ contains
- the implementation of the Tools.h++ library, plus test suites for the
- Tools.h++ library. For more information contact Rogue Wave Software, Inc.,
- P.O. Box 2328, Corvallis, OR 97339, (503) 754-3010; FAX: (503) 757-6650.
-
-
- Computer Innovations Releases C++ Language v3.0 for UnixWare
-
-
- Computer Innovations, Inc. has released C++ Language v3.0 for UnixWare. The
- compiler implements USL's industry-standard C++ language, which combines
- super-C functionality with object-oriented programming. In addition to
- supporting templates as part of the compiler, C++ Language v3.0 for UnixWare
- produces object code that is optimized for the Intel 486/Pentium platform.
- Standard UNIX utilities are used to install C++ Language v3.0 for UnixWare,
- and all necessary header files are built specifically for the UnixWare system.
- C++ Language v3.0 for UnixWare is $349. For more information contact Computer
- Innovations Inc., 1129 Broad St., Shrewsbury, NJ 07702, (908) 542-5920; FAX:
- (908) 542-6121.
-
-
- Alsys Announces Object CM
-
-
- The Alsys CASE Division has announced Object CM, an object-oriented
- configuration management system. Object CM integrates an object base which is
- compliant with Portable Common Tools Environment (PCTE).
- Features of Object CM include a PCTE-based object-oriented repository, a
- graphical user interface, an object browser, a system administration facility,
- and a variety of productivity tools. Integral to Object CM is a single control
- facility, which provides for the creation and tracking of trouble reports and
- modification requests, and their correct association with the software
- objects.
- Object CM can be extended to support process control over activities such as
- software design, coding, documentation, project management, and software
- reuse. Object CM interfaces with third-party CASE and documentation tools, and
- overlays a control structure which manages their use throughout the project.
- Object CM is the core module in the FreedomWorks family of CASE integration
- products. Object CM is licensed for $2,500 to $3,500 per seat, depending on
- quantity. For more information contact Alsys CASE Division, 10251 Vista
- Sorrento Pkwy, Suite 300, San Diego, CA 92121, (619) 457-2700; FAX: (619)
- 452-2117.
-
-
- DC Micro Development Ships Crusher! Data Compression Toolkit
-
-
- DC Micro Development has begun shipping the Crusher! Data Compression Toolkit.
- Crusher! is designed for use with C and is compatible with both DOS and UNIX
- systems. Crusher! provides portable source code, and produces multi-file
- archives compatible with both DOS and UNIX. Crusher! supports Borland C/C++
- v3.1 and Microsoft/C++ v7.1 compilers for DOS, and ANSI or K&R C for UNIX
- System V.
- Features of Crusher! include: layered design; buffer allocation; compression
- of ASCII and binary data; 32-bit CRC file integrity checking; multiple-file
- archive support; UNIX-style wildcard support for DOS applications; full
- support for subdirectories; user-definable callback functions; ARQ source
- code; and portability to other architectures. According to the company,
- "Typical compression ratios (when using Crusher!) are 50%, while many
- database, spreadsheet, and ASCII files compress to 20% or less of their
- original size."
- Crusher! Data Compression Toolkit is $239.90 including source code. An
- evaluation version with full documentation is available for $35. For more
- information contact DC Micro Development, 3554 Creekwood Dr. #7, Lexington, KY
- 40502, (606) 268-1559; FAX: (606) 266-0726; BBS: (606) 268-1251.
-
-
- FSI Announces HyperC Compiler
-
-
- Fortunel Systems, Inc. (FSI) has announced a compiler for a new, portable
- data-parallel language, HyperC. HyperC was built as an extension to C that
- allows programmers to express the parallelism of an application, and run the
- application on (massively) parallel systems. HyperC was designed with the
- goals of efficient expression and compilation of parallel programs across
- different architectures. Constructs in the language emphasize: increasing
- local computation, minimizing communication requirements, and overlapping
- communications with computations.
- Data parallelism is achieved by applying the same operation across all the
- elements of a data set. HyperC provides "collections" as data sets for data
- parallel operations. Given overloading of operations and functions, a compact
- syntax can express data parallel computations. As an extension to C, HyperC
- supports incremental parallelization of existing C programs.
- HyperC is available for workstations including Sun, SGI, HP, and DEC. A PVM
- version is being developed and is planned for release by the end of March,
- 1994. For more information contact Fortunel Systems, Inc., 1135 Kildaire Farm
- Rd., Ste. 311-5, Cary, NC 27511, (919) 319-1624; FAX: (919) 319-1749; e-mail:
- fortunel@vnet.net.
-
-
- Greenleaf Introduces ArchiveLib
-
-
- Greenleaf Software, Inc. has introduced ArchiveLib, a Windows compatible data
- compression and archive library for C/C++ programmers. ArchiveLib is an
- object-oriented data-compression run-time library, with equivalent C functions
- for C developers. Using a two-stage process, consisting of two different
- data-compression methods, ArchiveLib compresses ASCII or binary data into an
- archive for storage.
- The 100 ArchiveLib functions are language independent, and under Windows
- ArchiveLib is also available as a language-independent DLL. With
- ArchiveLib, programmers can compress and archive buffers of data within their
- applications without having to store them as a file. Compressed data can be
- retrieved into either a disk file or a memory buffer. Users can also create
- their own data objects. The archiving features found in ArchiveLib can serve
- several functions: transmitting or storing files via a medium that does not
- support a file system; moving files from one operating system to another;
- distributing software to customers; and storing internal program data.
- ArchiveLib is $279 and includes full source code (excluding the proprietary
- data compression algorithm) and a technical support package. For more
- information contact Greenleaf Software, Inc., 16479 Dallas Pkwy. Ste 570,
- Dallas, TX 75248, (214) 248-2561; FAX: (214) 248-7830.
-
-
- Century Computing Releases TAE Plus v5.3
-
-
-
- Century Computing has released TAE Plus v5.3. TAE Plus v5.3 is a portable
- software development environment that supplies both development tools for
- creating GUIs and management tools for controlling an application's user
- interface at run-time.
- Features of TAE Plus v5.3 include: the TAE Plus Workbench which lets the user
- lay out an interface interactively; Code Generator and Code Merge which
- generate code in C (ANSI or K&R), C++, or Ada; Dynamic Data Objects which
- let the designer create GUIs that use dials, gauges, pictures, maps, switches,
- icons, or animation to communicate with users; prototyping which lets the
- developer generate prototype interfaces, and iteratively test and refine them;
- and automated scripting which provides the capability for automated testing,
- on-line demos, tutorials, and usability testing.
- TAE Plus v5.3 supports Sun OS (both Motif v1.1.4 and v1.2). TAE Plus v5.3
- licensing agreements are based upon the number of TAE developers in a
- workgroup. Fees range from $2,250 to $11,200, with discounts for workgroups of
- more than 15 developers. For more information contact Century Computing Inc.,
- 1014 West Street, Laurel, MD 20707, (301) 953-3330 or (880) 823-3228.
-
-
- Odyssey Ships ISYS Developer's Toolkit
-
-
- Odyssey Development, Inc. has begun shipping the ISYS Developer's Toolkit, a
- search engine for developers and OEMs. With the toolkit, users can integrate
- the ISYS text retrieval engine with applications such as CD-ROM authoring,
- electronic publishing, document preparation, and image management.
- ISYS Developer's Toolkit provides access to text information residing in word
- processor and other files. Users key in a word or phrase, and ISYS finds the
- word or phrase in the available files. ISYS can read 28 word processor
- formats, as well as some spreadsheet and database file formats. Documents to
- be searched remain in their native formats. The Developer's Toolkit also lets
- OEMs develop External Access Modules, which let them extend the functionality
- of ISYS with their own specific data access interfaces. ISYS can then access
- and index "foreign" data sources such as text stored in relational databases.
- The ISYS engine is available for Microsoft Windows and DOS. The engine is
- callable from many languages. Sample code is provided for C, Pascal, and
- Visual Basic. For more information contact Odyssey Development, Inc., 650 S.
- Cherry St., Suite 220, Denver, CO 80222, (303) 394-0091; FAX: (303) 394-0096.
-
-
- ParaSoft Announces Insight v1.1 and Inuse
-
-
- ParaSoft Corporation has announced Insight v1.1 and Inuse, components of
- ParaSoft's TQS (Total Quality Software) package. Insight v1.1 is a run-time
- debugger which supports finding the bugs in software after compiling or
- relinking it. Features of Insight v1.1 include: checking for uninitialized
- memory accesses at compile and run-time; incremental checking without
- recompilation; linkable interfaces which let the user extend and customize
- error checking; stack tracing in error reports; error suppression; support for
- the animation of dynamically allocated memory blocks; real-time animation of
- an application's dynamic memory allocation; "Total Coverage Analysis," which
- lets the user see how the code has been tested; and support for X Window System
- v5.0 and Motif v1.2.
- A modular component of Insight v1.1 is Inuse, a graphical utility which
- provides real-time animation of the dynamic memory allocation requests in an
- application. Inuse provides feedback on algorithms which "leak" memory and can
- help the user allocate memory.
- Insight v1.1 supports Sun, IBM, DEC, HP, and SGI platforms. Insight is $995.
- For more information contact ParaSoft Corporation, 2500 E. Foothill Blvd.,
- Pasadena, CA 91107, (818) 792-9941; FAX: (818) 792-0819; e-mail:
- insight@parasoft.com.
-
-
- Pure Software Releases Purify for HP PA-RISC and Sun Solaris 2.x
-
-
- Pure Software Inc. has released Purify for HP PA-RISC workstations and Sun
- SPARC workstations running Solaris 2.x. Purify detects run-time errors in C
- and C++ UNIX applications. Purify uses Pure Software's Object Code Insertion
- (OCI) technology. OCI examines the object code and inserts checking
- instructions around every memory function to monitor usage at run-time and
- report illegal memory accesses and leaks. OCI lets Purify analyze the
- application including shared and third-party libraries. Purify is also
- integrated with HP's SoftBench development environment.
- Purify for HP PA-RISC and Sun SPARC running Solaris 2.x is $1,298 per license.
- For more information contact Pure Software Inc., 1309 S. Mary Ave., Sunnyvale,
- CA 94087, (408) 720-1600; FAX: (408) 720-9200; e-mail: info@pure.com.
-
-
- MKS Announces MKS Toolkit v4.2 for DOS
-
-
- Mortice Kern Systems Inc. has announced MKS Toolkit v4.2 for DOS. MKS Toolkit
- v4.2 for DOS includes USENET news support for sending and receiving electronic
- news, including nr, a visually oriented mail and news reader. MKS Toolkit also
- includes two Windows applications, Vi for Windows with functional scroll bars,
- font selection, mouse support, and sizable windows; and Visual Diff for
- Windows with several options for viewing file differences. Other features of
- MKS Toolkit v4.2 for DOS include: ASPI-compatible SCSI tape drive support in
- which the pax backup utility conforms to IEEE POSIX.2 standard and supports
- the standard tar and cpio formats; 32-bit utilities; and a program launcher
- for Windows.
- MKS Toolkit v4.2 for DOS is $299. The upgrade is $99. For more information
- contact Mortice Kern Systems Inc., 35 King Street N., Waterloo, Ontario,
- Canada N2J 2W9, (519) 884-2251; FAX: (519) 884-8861.
-
-
- Blue Sky Upgrades RoboHELP
-
-
- Blue Sky Software Corporation has announced its support for Word v6.0 for
- Windows with its upgraded version of RoboHELP. RoboHELP v2.6 is a help
- authoring system for Windows and Windows NT. Features of RoboHELP v2.6
- include: automatically converting existing text into a Help system or a Help
- system into user documentation; creating topics and jumps at any time by
- adding a graphical tool palette to Word v6.0; and creating hotspot graphics by
- placing a bitmap in a Help window. Other features of RoboHELP v2.6 include:
- access to the Windows Help Engine; VBX controls and custom controls; Error
- Wizard; and a Simulation Mode which lets the user test changes without
- recompiling. RoboHELP v2.6 includes the Help compilers, and runs the Help
- compilation under Windows.
- RoboHELP v2.6 is $499. Registered users of RoboHELP v1.0 can upgrade for $229.
- For more information contact Blue Sky Software Corporation, 7486 La Jolla
- Blvd., Suite 3, La Jolla, CA 92037, (619) 459-6365; FAX: (619) 459-6366.
-
-
- Computer Mindware Introduces Visual FEDIT v2.7
-
-
- Computer Mindware Corporation has introduced Visual Formatted EDIT (Visual
- FEDIT v2.7). Visual FEDIT v2.7 is a Custom Control Dynamic Link Library that
- supports many data types including: strings, numeric, date, time, Boolean, and
- unformatted multi-line text. Visual FEDIT v2.7 supports VBX format for
- integration into Visual C++ v1.0 or Visual Basic v2.0/3.0 applications, and
- includes a Smalltalk class library, a wrapper for Smalltalk/V.
- Features of Visual FEDIT v2.7 include support for class encapsulation for C++,
- data binding for VB, and read-only fields. Visual FEDIT v2.7 also supports
- required and optional fields, null and default values, auto-validation,
- run-time dynamic reformatting, named fields for links to database fields,
- user's extension flags, and dialog and field level support. Visual FEDIT v2.7
- is $159. Source code is available. For more information contact Computer
- Mindware Corporation, 36 Trinity Place, E. Hanover, NJ 07936, (201) 884-1123.
-
-
-
- We Have Mail
- Dear Dr. Plauger:
- I am writing in response to P.J. LaBrocca's recent article "Dynamic
- Two-Dimensional Arrays" (November 1993 issue of The C Users Journal). On page
- 77 of that article, LaBrocca mentions that his dyn2darray routine suffers from
- limited portability because of memory-alignment problems. I have found a
- method of implementing LaBrocca's two-dimensional array allocation without his
- memory-alignment problem. The method I describe below keeps a crucial
- advantage of dyn2darray: it still requires only one call to calloc. The key
- idea is to determine which memory addresses are suitably aligned for storing
- the objects. The allocated memory is used to store three different data types:
- pointers to void, unsigned integers, and objects of unknown type. This might
- necessitate leaving gaps between objects of different types to achieve proper
- memory alignment.
- I start with the following question: How do we know at which addresses we can
- safely store the objects of the 2-D array?
- Let obj_size = sizeof(object type).
- Let p be a character pointer to dynamically allocated storage.
- C allows us to store a one-dimensional "array" of these objects starting at
- address p. From this it follows that the addresses
- p + k * obj_size (k = 0,1,2,...)
- are properly aligned for storing these objects (as long as we don't go beyond
- the allocated region of memory). For example, suppose the object has type
- double and that sizeof (double) = 8 on some machine. Then
- p, p+8, p+16, p+24, ...
- are valid addresses at which to store a double.
- This example shows how to allocate a 2-D array of doubles so that the pointers
- and the objects are stored in the same allocated memory region while
- preserving proper memory-alignment. For the moment I will not worry about the
- added complication of storing the number of rows and columns. Suppose that we
- want to allocate a 2-D array of doubles given the following conditions:
- sizeof (double *) = 2, rows = 7,
- columns = 4, sizeof (double) = 8
- Storage for the pointers uses up 14 characters of memory. The first available
- spot after the pointers at which we can store the doubles is at p+16. This
- means that a gap of two characters must be left between the pointers and the
- doubles to ensure proper alignment of the doubles. In general, we would store
- the objects at the address
- p + Dyn2dRndUp(SpaceForPtrs, obj_size)
- where
- SpaceForPtrs = rows * sizeof (void *)
- and
- Dyn2dRndUp(i,j)
- is a macro that rounds i up to the nearest multiple of j (e.g., Dyn2dRndUp(14, 8) = 16).
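- The letter describes the rounding macros without listing them. The sketch
- below assumes straightforward integer definitions (not necessarily those on
- the code disk) and models the hypothetical machine by passing the sizes in as
- parameters; objects_offset is an illustrative name. A 14-byte block of
- pointers followed by 8-byte doubles gives an object offset of 16.

```c
#include <stddef.h>

/* Assumed definitions: round i up / down to the nearest multiple of j
   (j > 0, i >= 0).  The article describes but does not list these. */
#define Dyn2dRndUp(i, j)   ((((i) + (j) - 1) / (j)) * (j))
#define Dyn2dRndDown(i, j) (((i) / (j)) * (j))

/* Hypothetical helper: offset from p at which the objects may start when
   only the row pointers precede them (rows and columns not yet stored). */
size_t objects_offset(size_t rows, size_t ptr_size, size_t obj_size)
{
    size_t space_for_ptrs = rows * ptr_size;
    return Dyn2dRndUp(space_for_ptrs, obj_size);
}
```

- With 2-byte pointers, objects_offset(7, 2, 8) is 16, leaving the two-byte gap
- described above; with four pointers the block is already a multiple of 8 and
- no gap is needed.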
- Of course, we still have to consider how to store the number of rows and
- columns between the pointers and objects. The first address at which we could
- store these values is given by:
- p + SpaceBeforeRowsAndCols
- where
- SpaceBeforeRowsAndCols =
- Dyn2dRndUp(SpaceForPtrs,
- sizeof (unsigned))
- The amount of space used by the pointers and the two unsigned values (number
- of rows and number of columns) is:
- SpaceForPtrsRowsAndCols =
- SpaceBeforeRowsAndCols
- + 2 * sizeof(unsigned)
- Finally we can begin storing the objects at the address
- p + SpaceBeforeObjects
- where
- SpaceBeforeObjects =
- Dyn2dRndUp(SpaceForPtrsRowsAndCols,
- obj_size)
- If there is a substantial gap between the end of the pointers and the
- beginning of the objects there may be several locations at which we could
- store these two unsigned values. To make sure that we can recover these two
- values (given only the 2-D array) we must store them as close to the objects
- as possible. An example will clarify the matter. Suppose:
- rows = 3, cols = 2,
- sizeof (void *) = 2,
- sizeof (unsigned) = 4,
- obj_size = 22
- These sizes were chosen to illustrate the most general case. The pointers use
- up the first six characters of memory. The first address available to store
- the number of rows would be p+8. The objects can be stored starting at p+22.
- The first two columns in Figure 1 show that we have two choices as to where to
- store the number of rows and columns.
- In this example, SpaceBeforeObjects, which equals ((char **) p)[0] - p, is 22.
- If you redo this example with rows = 5, you also get 22 (see the third column
- of Figure 1). Therefore, storing the number of rows and columns as close to
- the pointers as possible makes it impossible to recover them later. The first
- and third columns of Figure 1 make it clear that the location of these two
- values cannot be determined from the amount of space before the objects.
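- The ambiguity can be checked mechanically. This sketch evaluates the three
- formulas above with the sizes passed in as parameters (compute_layout and
- struct dyn2d_layout are illustrative names, and the Dyn2dRndUp body is
- assumed). With 2-byte pointers, 4-byte unsigneds, and 22-byte objects, the
- row and column values would start at offset 8 for three rows and at offset 12
- for five, yet SpaceBeforeObjects is 22 in both cases.

```c
#include <stddef.h>

/* Assumed definition: round i up to the nearest multiple of j (j > 0). */
#define Dyn2dRndUp(i, j) ((((i) + (j) - 1) / (j)) * (j))

/* Offsets from p for the layout described in the letter. */
struct dyn2d_layout {
    size_t space_for_ptrs;          /* rows * sizeof (void *)          */
    size_t space_before_rows_cols;  /* first unsigned-aligned offset   */
    size_t space_before_objects;    /* first obj_size-aligned offset   */
};

/* Sizes are parameters so the hypothetical machine can be modeled. */
struct dyn2d_layout compute_layout(size_t rows, size_t ptr_size,
                                   size_t uns_size, size_t obj_size)
{
    struct dyn2d_layout l;
    l.space_for_ptrs = rows * ptr_size;
    l.space_before_rows_cols = Dyn2dRndUp(l.space_for_ptrs, uns_size);
    l.space_before_objects =
        Dyn2dRndUp(l.space_before_rows_cols + 2 * uns_size, obj_size);
    return l;
}
```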
- Storing these two values as close to the objects as possible solves this
- problem. We simply round down the space before the objects to the nearest
- multiple of sizeof (unsigned). In the example above (columns 2 and 3) the
- space before the objects (22) is rounded down to the nearest multiple of
- sizeof (unsigned) (4) to get 20. The values for the number of rows and columns
- are to be stored immediately before this offset (p+20).
- In general, the expression
- p + Dyn2dRndDown(SpaceBeforeObjects,
- sizeof (unsigned))
- - 2 * sizeof (unsigned)
- where Dyn2dRndDown(i,j) is a macro which rounds i down to the nearest multiple
- of j, gives us a pointer to the beginning of the row and column data. In the
- above example we would get
- p + Dyn2dRndDown(22,4) - 2 * 4 = p + 12
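- Putting the pieces together gives a single-calloc allocator. This is a
- minimal sketch under stated assumptions, not the dyn2darr.c on the code disk:
- the names dyn2d_alloc, dyn2d_rows, dyn2d_cols, and dyn2d_free and the macro
- bodies are hypothetical, the host's real sizeof values replace the example
- sizes, and there is no overflow checking. Rows are stored as pointers to void,
- so the caller converts a row to the element type before indexing it.

```c
#include <stdlib.h>

/* Assumed macro bodies: round i up / down to a multiple of j (j > 0). */
#define Dyn2dRndUp(i, j)   ((((i) + (j) - 1) / (j)) * (j))
#define Dyn2dRndDown(i, j) (((i) / (j)) * (j))

/* Offset of the stored (rows, cols) pair, computed from SpaceBeforeObjects
   alone: round down to a multiple of sizeof(unsigned), step back two. */
static size_t dyn2d_dims_offset(size_t space_before_objects)
{
    return Dyn2dRndDown(space_before_objects, sizeof(unsigned))
           - 2 * sizeof(unsigned);
}

/* Allocate a rows x cols array of obj_size-byte objects with one calloc.
   Layout: row pointers | gap | rows, cols | gap | objects. */
void **dyn2d_alloc(unsigned rows, unsigned cols, size_t obj_size)
{
    size_t space_for_ptrs, before_dims, before_objs, total, r;
    unsigned *dims;
    void **row;
    char *p;

    if (rows == 0 || cols == 0 || obj_size == 0)
        return NULL;   /* a[0] must exist to recover the dimensions */

    space_for_ptrs = rows * sizeof(void *);
    before_dims    = Dyn2dRndUp(space_for_ptrs, sizeof(unsigned));
    before_objs    = Dyn2dRndUp(before_dims + 2 * sizeof(unsigned), obj_size);
    total          = before_objs + (size_t)rows * cols * obj_size;

    p = calloc(total, 1);   /* suitably aligned for any fundamental type */
    if (p == NULL)
        return NULL;

    row = (void **)p;
    for (r = 0; r < rows; ++r)
        row[r] = p + before_objs + r * cols * obj_size;

    /* Store the dimensions as close to the objects as alignment allows. */
    dims = (unsigned *)(p + dyn2d_dims_offset(before_objs));
    dims[0] = rows;
    dims[1] = cols;
    return row;
}

/* Recover the (rows, cols) pair given only the array. */
static const unsigned *dyn2d_dims(void **a)
{
    size_t before_objs = (size_t)((char *)a[0] - (char *)a);
    return (const unsigned *)((char *)a + dyn2d_dims_offset(before_objs));
}

unsigned dyn2d_rows(void **a) { return dyn2d_dims(a)[0]; }
unsigned dyn2d_cols(void **a) { return dyn2d_dims(a)[1]; }
void dyn2d_free(void **a) { free(a); }
```

- For example, ((double *) a[i])[j] = 1.0 stores into row i, column j of an
- array obtained from dyn2d_alloc(rows, cols, sizeof (double)). The recovery in
- dyn2d_dims depends only on the distance from the block to the first object,
- which is why the dimensions sit just below the rounded-down offset.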
- I provide new versions of dyn2darr.c and dyn2darr.h [available on the monthly
- code disk -- pjp].
- Finally, I should mention that one small portability problem still remains.
- Both my version of the code and LaBrocca's make the assumption that all
- pointers have the same size and representation. My understanding is that this
- is much less of a portability issue than the memory-alignment problem. (I
- have seen several C books present "portable" code that makes this assumption.)
- P.S. I hope this proves useful to your readers. I do not have my own email
- address, but I can be reached at the address shown below, or at
- jessica@engin.umich.edu.
- Steve Coffman
- C-TAD Systems, Inc.
- Boardwalk Office Center
-
- 3025 Boardwalk Drive
- Ann Arbor, Michigan 48108
- (313)-665-3287
- Whew! I think you illustrate neatly why LaBrocca saw fit to sidestep the
- storage alignment issue. You can also sidestep the problem of different sizes
- of data pointers by storing only pointers to void. Still, beyond a certain
- point, the investment in potential portability starts getting hard to justify.
- -- pjp
- Hi Bill,
- I've just finished reading the November CUJ -- very entertaining as always --
- and I couldn't help noticing the inside back page ad:
- Sequiter Software Inc. says, "As with C, ANSI C++ is an international standard
- across all hardware platforms. This means you can port CodeBase++ applications
- between DOS, Windows, NT, OS/2, Unix, and Macintosh -- today."
- Sigh! The BSI jumped up and down on a few advertisers over stuff like this in
- the days before validated C compilers were available. Perhaps someone should
- have a word with the folks at Sequiter?
- See you in San Jose?
- Regards,
- Sean Corfield
- Development Group Manager
- Programming Research, England
- Sean.Corfield@prl0.co.uk (44) 372-462130
- Yeah. People aren't supposed to claim conformance to a standard until it's
- approved. In the case of C++, it's particularly daring to refer to a putative
- "international standard." -- pjp
- Greetings,
- Enough of the language standards and extensions stuff already. Compare and
- contrast:
- Applications: Sequiter Software's CodeBase 5.0 vs Kedwell's DataBoss
- Libraries: Greenleaf's SoftC Database Lib vs Software Science's Topaz
- GO BROWNS!
- Sincerely,
- Noah Hester
- nbh@cis.csuohio.edu
- Your wishes are noted, except for the part about the Browns. -- pjp
- Dear PJP:
- In a letter published in the November 93 CUJ, you mention the problem of using
- sizeof in preprocessor statements, something most ANSI compilers don't allow.
- Mr. Plauger offered a solution:
- static char junk[sizeof(structname) != 132 ? 0 : 1];
- but also offered the caveat that it wastes a byte of storage. I've been using
- a similar solution for several years that doesn't waste any storage:
- typedef struct {
-     char x[sizeof(structname) == 132]; /* ensure sizeof(structname) is exactly 132 bytes */
- } _size_check_structname_;
- Because the statement is a typedef, no storage is allocated. The == operator
- is guaranteed to yield a 0 or 1 result (on an ANSI compiler). Even on the few
- compilers I've encountered that have an extension allowing a zero-sized array
- as the last element of a structure, an error message is generated because the
- structure as a whole cannot have zero size. The error message you get from
- this construct varies between compilers, but it rarely indicates what the real
- problem is, so comments in the code are essential.
- (The fact that the typedef name is _size_check_something_ helps. Using a
- similar standard naming convention throughout a project is probably a very
- good idea.)
- Other checks are possible using this method. For example, in one project I had
- special 16-byte-at-a-time block-zero and block-move routines for performance.
- To use them safely on structures, I included the check:
- typedef struct {
-     char x[(sizeof(memnode) & 0x0F) == 0]; /* ensure sizeof(memnode) is a multiple of 16 bytes */
- } _size_check_memnode_;
- I've encountered several C programmers who hated such compile-time assertions
- in source (or header) files; perhaps they never make mistakes, or they enjoy
- long debugging sessions. While it is rare to make a mistake which causes one
- of these assertions to fail, the hours saved when it happens are worth the
- minutes it takes to code them.
- Ian Lepore
- Moderator, BIX c.language conference
- ianl@bix.com
- I like your solution better than mine. Thanks for telling us about it. -- pjp
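- Two later developments are worth adding. Some compilers (GCC, for one)
- accept zero-length arrays and zero-sized structures as extensions, so a check
- whose failing case yields a zero-sized array can compile silently unless
- strict diagnostics are enabled; making the failing case a negative array size
- is an error everywhere. C11 then standardized _Static_assert for exactly this
- job. A sketch, with a hypothetical struct memnode and SIZE_CHECK macro:

```c
/* Hypothetical structure whose size the block routines depend on. */
struct memnode {
    unsigned char bytes[32];
};

/* Negative array size when cond is false: rejected even by compilers that
   allow zero-length arrays.  A typedef allocates no storage.  The generated
   name has no leading underscore, since such file-scope names are reserved
   for the implementation. */
#define SIZE_CHECK(name, cond) typedef char size_check_##name[(cond) ? 1 : -1]

SIZE_CHECK(memnode, (sizeof(struct memnode) & 0x0F) == 0);

/* C11 and later: the standard form of the same compile-time assertion. */
_Static_assert(sizeof(struct memnode) % 16 == 0,
               "struct memnode must be a multiple of 16 bytes");
```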
- Dear Sir
- For a long time I have been an avid reader of the articles that you write in
- Dr. Dobb's, The C Users Journal, etc. I enjoy every piece, especially your
- style of writing.
- I just want to say thanks for your great writing, and for being such a great
- researcher in the computer area (and the like).
- Your Friend,
- Leo Medellin
-                           0
-                         _.>/)
- leo.medellin@asb.com * (_) \(_)....
- leo.medellin%bbs@quake.sylmar.ca.us
- ak467@FreeNet.HSC.Colorado.EDU
- Thanks. And I like your bicyclist. -- pjp
- Mr. Plauger
- Thanks for continuing to produce an interesting magazine. I have been a
- subscriber for about 5 years. Some points/comments:
-
- 1. Articles such as "Code Capsules" by Chuck Allison are useful -- we must all
- remember that new, young C programmers join the ranks and have missed all the
- useful information contained in the early issues of the Journal.
- 2. Linux -- a Unix System V clone which runs on a PC, costs virtually nothing
- and includes Emacs, LaTeX, and X together with the GNU C and C++ compilers --
- is becoming very popular. Is there any possibility of some coverage for Linux
- in your magazine?
- 3. Dr. Dobb's Journal has produced a CD-ROM containing all articles from
- January 1988 to June 1993, together with text search facilities. Is there any
- chance that R&D will produce such a product? I feel that many subscribers
- would find it of interest.
- Yours sincerely,
- David Richards
- 184 Turf Pit Lane
- Moorside
- Oldham
- OL4 2ND
- ENGLAND
- (1) I like Chuck's writing too. Glad you appreciate the function it serves in
- this magazine. (2) I'll happily entertain proposals for articles on Linux. (3)
- We've been exploring numerous ways to make the material from CUJ more
- available to our readers, but I don't have an answer for you yet on this
- topic. -- pjp
- Dear Mr. Plauger,
- I really enjoy reading The C Users Journal. There is one glaring omission in
- the C++ I/O streams library: a reset manipulator to set the stream back to
- default mode. Any function (except for main) has no knowledge of what flags,
- fill character, and precision are set for any streams it receives from the
- caller. To make sure there are no surprises, it has to explicitly set all the
- flags, the fill character and the precision to the values it needs. Having a
- reinit manipulator would make this much easier. Now let's take it one step
- further: what happens when the function returns control back to the caller.
- The mode of the stream may have changed, and the caller has to re-set
- everything. What a mess! Of course the called procedure should undo any
- changes it has made, so it has to save the mode on entry and restore it on
- exit, adding a couple of lines to every function. It seems obvious that we
- need save and restore manipulators to do the job. Of course you can implement
- the reinit, save, and restore manipulators, but this is such a universal need
- that I don't understand why they're not part of the standard library. Right
- now everyone who uses I/O manipulators has to reinvent the wheel on their own.
- Incidentally, C Standard I/O has a clear advantage here because it's modeless.
- Sincerely,
- Hans Salvisberg
- Salvisberg Software & Consulting
- Bellevuestr. 18
- CH-3095 Berne
- SWITZERLAND
- The nearest thing to what you want in the current C++ library draft is
- ios::copyfmt, which lets you copy just the formatting information between one
- ios object and another. -- pjp
- Dear PJP:
- I wish to comment on the article "A Revision Control System for MS-DOS",
- published in the July 1993 issue of The C Users Journal. There are two errors
- that will cause people a lot of grief. The function print_warning listed on
- page 48 declares the variable string as a character pointer, but doesn't
- assign it a value. It is then used in a call to fgets as the buffer location.
- This will lead to the data fgets reads being written to who knows where, and
- may cause serious problems. It caused my system to reboot. The same type of
- error exists in the function rev_number, listed on page 50.
- Another concern I have about the code presented in the article is the lack of
- checks for unexpected end-of-file conditions. The first thing I put under RCS
- control, after fixing the above mentioned fault, was the RCS source code. I
- believe I then used checkout to get a copy of a file, and my system hung. The
- reason the system hung was that the editor I used to create the source files
- did not require that the file end with a newline character, so the RCS file
- did not end with a line containing the delimiter, but with a line containing
- the } character followed by the delimiter. Since there were no checks for EOF
- on the input file, the system kept calling fgets to get the next line, and the
- check for the delimiter always failed.
- I also worry about the lack of checks for write failures. It appears that
- there could be serious problems if writes are attempted and the disk is
- already full, though I must admit, I have not seen this problem.
- J.P. Schoonover
- (708) 979-7907
- It is always interesting to see what you have to do to code prepared for
- presentation when you start using it seriously. Or code tested on one system
- when you move it to another. -- pjp
- Dear PJP:
- Over the years I've gotten much useful code and advice from CUJ. However,
- lately the quality of the published code has decreased significantly.
- As an example, consider the last two articles on exceptions CUJ has published.
- While I do not wish to single out these authors, neither of these packages
- compiled without significant modification on any popular workstation or PC
- operating system, nor worked as advertised once compiled. In addition, neither
- package (on the code disk or as available from the Internet) included any
- installation instructions. It seems obvious the articles were accepted on the
- basis of perceived interest and not the portability or functionality of either
- package.
- I think all but the most basic of packages offered should include installation
- instructions and dependencies. Both exception packages have substantial
- requirements for non-standard development packages (such as a specific version
- of gmake). Perhaps a "tools and rules" sidebar containing instructions for
- building and using the code would solve this problem.
- A more significant problem is the poor performance and portability of the
- code. While I would not expect production or GNU quality code from CUJ, the
- functionality advertised should be present and hopefully relatively bug free.
- I was especially disappointed with the most recent package because of the
- attractiveness of an exception mechanism portable between C and C++. After
- much work by myself and the author I was able to compile this package but then
- discovered substantial run-time problems. Test cases were not present and the
- samples provided with the package would not even compile -- due to undefined
- symbols rather than any obscure portability issue. I commend the author for
- all the help he gave me but why did CUJ publish a package seemingly without
- looking at the source or attempting to build it? Good examples of previous
- high quality CUJ articles include the socket library and generic object
- packages. Both were simple enough to compile on any operating system offering
- TCP/IP services or a C++ compiler and are robust enough to have become a
- standard part of my programming toolbox. Both of the articles I've singled out
- offered functionality of great usefulness but neither delivered on their
- promises nor did the articles contribute substantially to the understanding
- you could gain from a quick reading of any number of C++ or EIFFEL books.
- If CUJ is to be a pragmatic magazine for professional programmers and not a
- fluff publication or academic journal a la Communications, its offerings
- should set the standard for well-executed, portable code. Professionals as
- well as beginners could benefit from the example such a publication would set.
- C. Justin Seiferth
- Phillips Laboratory
- (505) 846-0561 (V)
- (505) 846-0473 (F)
- seiferth@lyra.plk.af.mil
- We do indeed make some effort to pick articles that have code which is both
- useful and reasonably correct. Sadly, we (I) don't always guess right. And we
- lack the resources to compile and test all the submitted code, or even verify
- that they are easy to install and run cursory examples. I wish it were not so.
- On the other hand, your experience with one particular author is not unique.
- Often, our readers tell us that authors make extraordinary efforts to assist
- potential users of their code. I am pleased that our contributors are so
- willing to follow through on their submissions.
- Both this and the preceding letter underscore the essential problem of using
- other people's code. There is a tremendous variation in robustness,
- portability, and ease of use. I'm not casting aspersions on the talents of our
- contributors when I say this -- what is a good design decision for one person
- may be an incredibly poor decision for someone else. We can only hope that
- most of the articles we run are useful to many of our readers much of the
- time. We'll keep trying. -- pjp
- Dear Mr. Plauger,
- In the August 1993 C Users Journal article "Automated Unit Testing," Mr.
- Meadows lists several guidelines he recommends. The first is "Include all test
- code inside a main program, that is, inside a #ifdef TESTMAIN block."
- Well, that approach just does not work that well. Having developed and
- maintained several large products over a number of years, I have found it
- better to have truly independent test stubs. Aside from not having lint
- complain about multiple mains, having one test program that exercises all the
- functions in a library is much more useful and compact. In addition, golden
- output of each test function is easier to manage if it is maintained in files
- along with the library.
- A library (or application) code area is then composed of Source Code, Make
- files, a regression test script, and golden unit test case input and output
- files. This is all maintained in the RCS pool along with the source. Although
- we find it easier to keep the unit tests with the source code, if disk space
- is a problem they can be maintained separately.
- This does violate another guideline of Mr. Meadows. "Do not make the test
- program dependent on external files." Well, sorry. But any sufficiently large
- system will have some external files. Libraries will rely on other libraries.
- And applications will have large external data files. Having some simpler
- versions of the data files for unit tests is not a bad tradeoff.
- We find, as a general rule of thumb, that our library unit test cases are at
- least as large as the code itself. When one adds in the test stubs, and golden
- output files, it does build up quickly. Some Application Unit Tests can grow
- even larger, say 2-5x.
- However, the payback is when any developer can go into a library or
- application, run make test, and in a few minutes see if their changes have
- affected any previous results. Given that 20% of a 150,000 line software
- product may be changed during a given release, this pays off very quickly in
- not introducing unwanted bugs. The costs of this approach are disk space, and
- the discipline to maintain the tests as part of the source and development
- process.
- On a final note, one of our newer tools has been the Purify software from Pure
- Research. Even in evaluation, the product was able to find several memory
- leaks and other problems which had gone undiscovered for years. I personally
- recommend this group of products for any serious software team.
- Sincerely,
- Richard Vireday
- Sr. Software Engineer, Intel
- rvireday@pldote.intel.com
- (916)351-6105
- I've found the approach described by Meadows very useful for smaller projects,
- and the approach you describe better for larger ones, for the reasons you
- describe. -- pjp
- Dear Mr. Plauger,
- While I can't claim your longevity in data processing, I have been in the
- industry since 1976. As you have pointed out, there's little in the world of
- data processing that hasn't been seen before. In particular, there have always
- been people who believe it is possible to construct a perpetual motion
- machine for software support. Once you prime the pump with an initial license
- fee, the machine keeps producing answers, bug fixes, and enhancements with no
- further input. This belief is reinforced by the examples of Word Perfect
- Corporation and Microsoft, who seem to keep providing support just because
- they think it's the right thing to do.
- Free support is really a variation on the infamous Ponzi scheme; you give me
- $1,000 and I will pay you $250 interest every month forever. Or, to rephrase,
- you buy my $125 competitive software upgrade, and I will pay the distributor
- his cut and provide you with $45 per hour support forever. It's become the
- case for PC software, including the development tools advertised throughout
- your magazine, that selling computer software in an extended market requires
- the vendor to either lie or become a software missionary.
- This leaves potential customers only two ethical and legal purchase
- alternatives: only buy products from vendors who charge enough to cover
- support, or accept spotty support provided for free. Anyone who steals
- software should never become a parent, or should have a high tolerance for
- hypocrisy when their child is found cheating or shoplifting.
-
- Sincerely,
- James P. Hoffman
- 416 West Kerr St.
- Salisbury, NC 28144
- While I wouldn't use your emotion-charged phraseology, I agree with much of
- what you say. I ran a software company for a decade and found myself
- entertaining a different scheme for pricing code and maintenance almost every
- year. Charge too much and your competitors steal the market. Charge too little
- and you go broke getting rich. I'm glad I don't run a software company today.
- -- pjp
- Dear Mr. Plauger:
- I want to express my thanks for the three-part series CUJ ran on pointers by
- Chuck Allison. These are the kind of articles that are so helpful to me.
- Incidentally, they exemplify what is missing from most books on C that Mr.
- Musielski complained about in his letter in the October issue. But to Chuck
- Allison's articles: I was raised on assembly language programming and did
- nothing else for the first ten years of my programming experience.
- Consequently, I was well aware of the advantages of indirect addressing, but
- it has been amazing to me how little this helped me understand C's
- pointer syntax.
- I have a library of 38 books on C. Yet so often when I run into a problem I
- must wade through more than half of them before I discover the key. It is a
- constant annoyance that most books on C never proceed beyond the
- simplest example. I will give you a trivial example: look at most books aimed
- at beginners in C. How many show that curly brackets are necessary with if,
- for, do, while when more than one statement follows? Trivial, maybe, but not
- to a beginner. How many warn that scanf is worthless for dealing with user
- responses that don't meet the programmed format requirements? How many show
- useful alternatives for interactive user responses?
- My first exposure to a higher-level language was BASIC. I have exactly three
- books explaining the language and never needed more. Mike Musielski has a
- valid grievance and it is just a little unsettling to me that I have had to
- collect so many books on C despite my admiration for C and appreciation of its
- features and power.
- On the subject of Numerical Extensions to C, about which you wrote in the
- September issue: My interest in C mainly centers on electronic engineering
- programming and I watch with great interest the deliberations of the NCEG
- group. One point on which I have seen no information is whether they are
- looking at non-integer exponentiation. BASIC allows statements such as 2^1.6
- which save a lot of bother working with logarithms. The only text I have found
- dealing with engineering programming is Numerical Recipes in C. Unfortunately,
- the authors worked assiduously at a FORTRAN translation and mostly ignored the
- powerful features of C because no counterpart existed in FORTRAN. The result
- is that oft-times the sources are not easy to read. I often fall victim to
- their "unit" approach to arrays where they simulate FORTRAN's elimination of
- the zero element of an array.
- Sincerely,
- Forrest Gehrke
- 75 Crestview Rd.
- Mountain Lakes, NJ 07046
- There are so many books on C simply because there is a huge market. Everybody
- wants to write the next Kernighan & Ritchie (still the best selling technical
- book ever), and nobody wants to leave an entire market to some other potential
- K&R.
- As for your question about exponentiation, it seems to me that the current pow
- function does what you want. -- pjp
- Mr. Plauger,
- With much amusement, I read your article "An Embedded C++ Library" in the
- October 1993 issue of Embedded Systems Programming. In the past, standards and
- embedded systems were always separate subjects. They did not benefit from each
- other. Now that you are an official EPSILON [Embedded Programming Society,
- International, and the Loyal Order of Nonentities -- pjp] author I am glad to
- see that you want to become part of the solution rather than part of the
- problem. By way of your first paragraph in the Embedded Wish List section of
- your article, I see that you understand that the major problems are
- non-technical. Welcome and glad to have you on board.
- I have some experience trying to get the ANSI C standards committee to provide
- language support for embedded systems. For the most part, the ANSI C committee
- was a very homogeneous group of compiler writers whose expertise in embedded
- systems was unhindered by their ignorance of the subject. I tried to make some
- of your points at one of their meetings. They literally laughed at me. The
- chair of the subcommittee addressing these issues openly equated embedded
- systems with toasters. As a group, they were arrogant, rude, and lacked the
- experience to understand the technical issues except from the standpoint of
- compiler design. For the most part, they seemed to prefer it that way.
- It reminds me of the story of the salesman and the engineer on safari in
- Africa. The first morning, the salesman took the engineer lion hunting.
- Shortly after they left the hut, the salesman and the engineer came running
- back with a lion snapping at their heels. As they reached the hut the salesman
- opened the door and stepped aside as the engineer and then the lion ran into
- the hut. Slamming the door shut the salesman bragged, "I caught him, now you
- skin him."
- Instead of dealing with embedded systems issues, the ANSI C committee slammed
- the door and was done with the problem. It is the embedded systems programmers
- who are constantly reminded that they are still living with a lion. I hope the
- C++ salesman knows more about catching lions.
- Years ago, at your request, I sent you copy of my public comments on ANSI C by
- fax. Let me paraphrase a few key recommendations.
- Freestanding libraries must be a superset not a subset of the hosted library
- requirements because embedded systems requirements are more strict.
- Functions that require operating systems support must have a defined interface
- to the operating system. For example, in the freestanding library, printf
- should call putc. The library should include printf and the user should write
- putc. The library should also provide a putc that acts as a data sink, in case
- the user written putc is not available. The standard should document both
- functions as well as the relationship between them. In some cases, the
- standard will need to define new freestanding user written functions. For
- portability all library functions must be present in the freestanding
- environment without exception.
- The standard needs to define which freestanding library functions you may call
- recursively and which you may not call recursively.
- Now we get to deal with the C++ standard. I hope history will not repeat
- itself. I wish you lots of luck, fortitude and the tenacity to get the job
- done.
- Yes, I know your article appeared in another magazine. The other magazine is
- deficient. It lacks a Mail column. I suppose it has something to do with
- committing space to advertisers or readers depending upon where your
- priorities lie. I like The C Users Journal. It has lots of mail with many
- differing and even critical viewpoints. Since you write for both magazines and
- because this is a topic with very significant impact on readers of both
- magazines, please forgive me the sin of mentioning the name of another
- publication. This is just my way of trying to get broader support for making a
- better language.
- Sincerely,
- Russell Hansberry
- 171 Whitney Road
- Quilcene, WA 98376-9629
- Telephone: 206 765-4465
- Fax: 206 765-4430
- Compuserve ID: 70314,1506
- I'm really sorry you left that C Standards meeting feeling the way you did. I
- can assure you that Jim Brodie as Chair would not tolerate such open
- expressions of disdain as you perceived. (He is personally incapable of being
- rude to another person, in my experience.) There was expertise on embedded
- systems within X3J11 and we had discussed any number of such issues in earlier
- meetings. My article in DDJ expressed my dismay that we chose to provide so
- little explicit support for embedded programming in Standard C, but it was not
- intended as a criticism of the committee's decisions.
- I agree that the level of standardization you describe would be helpful in
- many circles. From experience, however, I know how much work it takes to flesh
- out what you propose. And I know the burden it would impose on most
- implementors to conform. So I expect that, while some things will be better
- attended to in the C++ standard for embedded programming, the final result
- will still fall short of your guidelines. -- pjp
- Dear PJP,
- Thanks for the informative article [on what? can someone guess what this
- refers to and insert its name and publication date? - pjp]. Incidentally, the
- October issue of Byte carries two excellent articles on similar products,
- which are more expensive, yet seem to do the same thing: CodeCenter,
- ObjectCenter, and others (page 28) and the heavily-advertised and consequently
- high-priced BoundsChecker (page 159). (I suggest that CUJ acquire the rights to
- those articles and carry them in December, for the benefit of all readers.)
- Isn't it a shame (for the C language compiler manufacturers) that such
- products are necessary? I understand that in the early days of C, products
- like the manifold LINTs were necessary. There was memory either for a compiler
- or a LINT-type consistency check. But nowadays, our compiler manufacturers go
- after a fashionable C++ compiler, allowing their C-compiler to gather dust.
- Can't someone produce a decent C compiler which catches those memory
- overwrites etc. in a "debug" or "verbose" mode? And then give us decent longs
- which can be used for business applications also (the "pennies in long doubles
- problem")? You mentioned in your October editorial that people are demanding a
- revision of the C standard. You seemed surprised, after having jumped onto the
- C++ wagon in a hurry. You made me curious. Can you tell us more?
- Ref: C++ vs. C
- I have learnt that mediocre standards are better than brilliant ideas, which
- turn the users into guinea-pigs for testing one revision after another. See
- the PC, which in its days was not the most brilliant, yet... See C nowadays.
- It is alright for an academician to buy a C++ compiler. But what if you
- produce a "ton" of software in C++, knowing that you will adapt your code by
- July 1994, when the new standard arrives hopefully. The guinea-pigs are
- necessary to fund the development effort, but not everyone can afford... I
- think the field is maturing. We don't believe anymore that the latest is the
- best. In a different field, take the software developers (WordPerfect e.g.)
- who deliberately shun Windows, as their own graphics routines under DOS are
- superior! (Going by BYTE October 1993.)
- Suggestion: Devote a topical issue of CUJ to the C++ vs. C debate, and discuss
- -- for the benefit of your readers -- the merits of the one over the other.
- The basics. Not the style you adopt when you talk to colleagues on the C++
- committee. You get my idea. I am particularly interested in the performance
- payoff for the interpretative elements in C++ ("housekeeping" known from good
- olde BASIC back in ..., with the deletion of runtime objects). Everyone knows
- by now that programs consist of data and algorithms. C hinges everything on
- algorithms, so that data are ubiquitous unless you deliberately implement some
- "information hiding." C++ sees data only (or mainly) and subordinates
- algorithms to data. It is the other extreme, certainly more appropriate for
- big projects, but hardly suitable for the small application and hardly
- suitable for top-down design. Once you know what you want it is alright to
- pull the objects together, which you have collected in a bottom-up approach.
- The shift in paradigm is total...
- Organizing code is very much like organizing companies. (I have worked in that
- field called management consultancy for some 10 years!) In the small company
- the boss looks after everything. No need for information hiding. Then come the
- specialized departments. In C++ these are the objects. C++ uses structures to
- bundle data and algorithms. What if C was (made) a little more aware of
- subdirectories? (Yes the subdirectories of the operating system.) That is also
- a way of organizing data. It can be done, even with the current
- implementations of C, which know about an INCLUDE directory and the
- source-code directory, at best. The root directory is the boss. It just does a
- few strategic calls to the subdirectories visible from the root directory (I
- am talking development time). The visibility of data is restricted to a
- subdirectory. Between subdirectories there are no side-effects: all data is
- passed as arguments or returned as return values. I have been organizing my
- programs like that for some time. With enormous benefits. I plug and play,
- even without C++ constructors and destructors. The stack is a fabulous
- automatic constructor/destructor. And there is no need for housekeeping. We
- are talking about two languages, which unfortunately are "marketed" as
- successors of each other! What confuses everyone is that in C++ you can
- program (almost) just as if it were good old C!
- You get my message? Give us a decent C, with llongs 64-bits wide (for business
- applications, accounting, and the like), with decent compilers that can check
- for memory overwrites in a verbose mode, with some evolution (not revolution)
- on information hiding, and then move to C++, Smalltalk, when the job is done.
- We are judging a half-finished product against another half-finished product.
- That obscures everybody's judgement. And in turbid waters there is good
- fishing (marketing). Remember: "With our modern marketing techniques we would
- easily have pushed Beethoven's symphony production into the double-digit
- bracket".
- Sincerely yours,
- L. Engbert
- Engbert UB
- Taunusstrasse 8
- 61389 Schmitten
- Germany
- (06084) 2367
- FAX +49-6084-2458
- As a rip-roaring pragmatist, I applaud that the world has both C and C++
- compilers in it. And that more often than not, the two more or less play
- together. Right now, I view C as the safe and solid bet for delivering serious
- applications with serious robustness and performance requirements. I view C++
- as the cauldron in which ambitious new ideas are being stewed. There's a place
- in my tool chest for both. That seems to be the case for many of our readers
- as well. -- pjp
- Dear Mr. Plauger:
- In reading your July 1993 issue, I noticed a letter to the editor from Kevin
- Nickerson asking about distribution of the ANSI C (ISO) Standard. While not in
- machine-readable form, The Annotated ANSI C Standard, recently published by
- Osborne/McGraw-Hill, contains the ANSI C Standard on left hand pages with
- annotations by C programming author Herbert Schildt on right hand pages. This
- book is now available in bookstores or by calling 1-800-227-0900 at a price of
- $39.95.
- We published the book because we felt that there are thousands of people like
- Mr. Nickerson who would like to have the Standard, but don't have their own
- copy. As a special offer to your readers, Osborne/McGraw-Hill will give a ten
- per cent discount off the cover price to those who call our 800 number and say
- they saw it in The C Users Journal.
- Sincerely,
- J. M. Pepper
- Editor-in-Chief
-
- Osborne/McGraw-Hill
- 2600 Tenth Street
- Berkeley, CA 94710
- 510-548-2805
- FAX 510-549-6693
- Here's your chance to get a good price on a useful document. -- pjp
- To the Editor:
- The code shown in Listing 1, from Chuck Allison's "Code Capsules: Pointers,
- Part 3" in the October 1993 issue, was compiled with Borland C++. It does not
- work with the input
- sortargs *.c
- as alleged, but does sort command line arguments if all are entered at the
- command line. It does not work with redirection or using more.
- Any Comments?
- Yours Sincerely,
- James R Lane
- 13 Waratah St.
- Walkerville 3959
- Victoria
- Australia
- Chuck Allison is obviously speaking UNIX shell language here, not DOS command
- lines. The UNIX shell expands wildcards such as *.c and invokes the command
- with the argument list spelled out completely. DOS passes on the wildcard as a
- single argument and leaves it to each command to expand the wildcard as it
- sees fit (if at all). Redirection has no effect on the interpretation of
- command-line arguments. -- pjp
- Dear P.J. Plauger:
- I have a couple of important comments about "Dynamic Two-Dimensional Arrays,"
- by P.J. LaBrocca in the November '93 issue: VERY! USEFUL!
- His dynamic 2-D arrays slipped painlessly into an application that desperately
- needed them. Exactly the kind of information I read your magazine for.
- Jeffrey Siegel
- Tokyo, Japan
- Thanks. Glad we could be of help.-- pjp
- Dear Mr. Plauger,
- This is a response to the letter from Lawrence H. Hardy in CUJ, December 1993.
- A readily available source of documentation of the PCX file format is Flights
- of Fantasy, by Christopher Lampton, Waite Group Press, 1993. This book is both
- fun and informative. It contains C++ source for a flight simulator. It does a
- good job of explaining C++ classes, VGA graphics, and animation.
- Steve Robison
- Thanks. -- pjp
- Gentlemen:
- I am writing in response to a couple of letters in the December 1993 issue of
- The C Users Journal about the Hayes AT Command set. An excellent reference is
- the Technical Reference for Hayes Modem Users, Hayes Microcomputer Products,
- 1992. This publication is available free to Hayes modem purchasers when
- requested within 90 days of purchase of a Hayes modem. Others can purchase
- this publication from Hayes for $25.00. They accept prepayment by check and
- will take Visa and MasterCard or they will send C.O.D.
- Hayes Microcomputer Products, Inc.
- Attention: Customer Service
- P.O. Box 105203
- Atlanta, GA 30348-9904
- (404) 441-1617
- Sincerely,
- James E. Truesdale
- Systems Analyst
- jbm electronics
- 4645 LaGuardia
- St. Louis, MO 63134-9906
- Thanks. -- pjp
-
- Listing 1 Sorts command line arguments with qsort
- /* sortargs.c
- * from C users Journal p84 Oct 1993
- * Chuck Allison Listing 1 sortargs.c
- * sorts command line arguments with qsort
- *
- */
-
- #include <stdio.h>
- #include <string.h>
- #include <stdlib.h>
-
- int comp(const void *,const void *);
-
- int main(int argc, char *argv[])
- {
-
- qsort(argv+1,argc-1,sizeof(argv[0]),comp);
- while(--argc)
- puts (*++argv);
- return 0;
- }
-
- int comp(const void *p1,const void *p2)
- {
- /* each array element is a char *, so p1 and p2 point to char * */
- const char *ps1 = *(char * const *)p1;
- const char *ps2 = *(char * const *)p2;
-
- return strcmp(ps1,ps2);
- }
-
- /* End of File */
-
-
- Symbolic Access To Embedded Controllers
-
-
- Odd A. S. Olsen and Petter H. Heyerdahl
-
-
- Odd A.S. Olsen and Petter H. Heyerdahl received their Master of Engineering
- degrees in Engineering Cybernetics at the Norwegian Institute of Technology,
- where Mr. Olsen later received a Ph.D. He is currently an independent
- consultant, developing electronics and programs for embedded controllers. Mr.
- Heyerdahl is associate professor at the Agricultural University of Norway,
- Department of Agricultural Engineering. Odd may be contacted at Jutulveien 11,
- N-0852 Oslo, Norway or by the Internet: odd.olsen@itf.nlh.no.
-
-
-
-
- Introduction
-
-
- A PC often functions as the user interface, debugging tool, and general
- supervisor of an embedded controller. With this arrangement it is important
- for the programs that transfer parameters between the PC and the controller to
- maintain agreement on the names and meanings of all variables. If the PC sets
- parameter 63 to the value of 123, for instance, the PC and the controller must
- agree on what parameter 63 is and what the value 123 represents. This
- consistency is especially at risk during development because new parameters
- are sometimes introduced on-the-fly and often end up in different locations on
- the two computers. The processors may not even agree on a format for the
- values. (An Intel PC stores multibyte numbers in little-endian format, while a
- Motorola controller typically expects them in big-endian format.)
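- To make the byte-order problem concrete, the following C sketch writes a
- 16-bit value in an explicit byte order. The function names are our own, not
- from the authors' code. Because shifts and masks act on the value rather than
- on its memory layout, the output is the same on either host. The ASCII message
- format described later avoids the question altogether, since the digits "123"
- read the same on both machines.

```c
/* Hypothetical helpers: store a 16-bit value in a fixed byte order.
   Shifts and masks operate on the value, not on its memory layout,
   so the result does not depend on the host's native endianness. */
void PutBigEndian16(unsigned char *buf, unsigned int v)
{
    buf[0] = (unsigned char)((v >> 8) & 0xFFu);  /* most significant byte first */
    buf[1] = (unsigned char)(v & 0xFFu);
}

void PutLittleEndian16(unsigned char *buf, unsigned int v)
{
    buf[0] = (unsigned char)(v & 0xFFu);         /* least significant byte first */
    buf[1] = (unsigned char)((v >> 8) & 0xFFu);
}

/* Read back a big-endian 16-bit value, again independent of the host. */
unsigned int GetBigEndian16(const unsigned char *buf)
{
    return ((unsigned int)buf[0] << 8) | buf[1];
}
```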
- A symbol table is an efficient solution to the consistency problem because it
- allows the data structures of the PC and controller to develop separately, yet
- links the two systems with a simple communication protocol. By this method,
- which resembles the message format found in the general-purpose interface bus
- (GPIB), values are referenced by their symbolic name, e.g. measured_pH,
- heater_power etc. We used this principle to control a group of bioreactors.
- Each reactor is controlled by an embedded controller which communicates with a
- supervising PC for operator instructions and data logging.
-
-
- How Things are Connected
-
-
- Figure 1 illustrates the information flow in the system. In our implementation
- the PC is always the initiator of communication and the embedded controller
- only responds to received messages. The most important entries in the symbol
- tables are the parameter names and their values. (In this context a parameter
- is more general than a constant. It can also represent measured data, debug
- information, etc.) Parameter values move from the PC symbol table to the
- controller symbol table through messages sent over the RS-232 line. The PC
- symbol table can be altered either by accessing the user interface or by
- causing the PC to read a refreshed version of the parameter file. The new data
- is then sent to the symbol table in the embedded controller. The controller
- symbol table can thus be changed from the PC, or through the measurement and
- control algorithms of the controller itself. Parameters for the control
- algorithms are stored in the EEPROM, enabling the controller to initialize its
- symbol table values at start-up and maintain its control process by itself
- should the PC fail.
-
-
- The Symbol Tables
-
-
- Both the PC program and the embedded controller program are centered on their
- respective symbol tables. The structure of the PC symbol table is:
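- typedef struct {
- char *name; /* parameter name, e.g. "measured_pH" */
- int ival; /* value of an integer parameter */
- double fval; /* value of a floating-point parameter */
- void *(*scalefunc)(void *); /* optional scaling function, or NULL */
- char eeProm; /* TO_EE: send to the controller and its EEPROM */
- char file; /* set if the value belongs to the parameter file */
- } symTabEntry;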
- Each of the process variables maintained by the controller (temperature,
- oxygen level, pH, etc.) is represented as a record in the symbol table. Every
- record in the table contains a unique name (e.g. measured_pH), an integer or
- double floating-point value, and a pointer to an optional scaling function.
- The table also contains two flags: eeProm and file. If eeProm is set to TO_EE,
- the record will be transferred to the controller upon start-up, and will be
- stored in the controller EEPROM if its value is different from the one already
- residing there. The file flag indicates whether the current record contains
- data associated with the system's parameter file. This parameter file contains
- parameter names (of process variables and other system variables) and their
- values. If the system finds a particular parameter name in the file during
- startup, the system copies the parameter's value into the record, and sets a
- bit in the file flag. Likewise, if the user later wants to save the current
- parameter values back to the file, the system will save only those parameters
- whose file flags are set. While the system is operating, a record holds a
- parameter's value in either ival (for integer parameters) or fval (for
- floating-point parameters).
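- The start-up and save behavior of the file flag might look like the following
- minimal sketch. It assumes a parameter file made of "name value" lines and uses
- a simplified, hypothetical entry type: ParamEntry, whose isFloat field is our
- own stand-in for however the real program decides between ival and fval. The
- scaling function and eeProm flag are omitted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified PC-side entry. */
typedef struct {
    const char *name;
    int ival;
    double fval;
    char isFloat;   /* our assumption: selects ival or fval */
    char file;      /* set when the value was read from the parameter file */
} ParamEntry;

/* Must be kept in alphabetical (strcmp) order for bsearch(). */
static ParamEntry params[] = {
    { "delay",        5, 0.0, 0, 0 },
    { "heater_power", 0, 0.0, 1, 0 },
    { "temp_setp",   18, 0.0, 0, 0 },
};
#define N_PARAMS (sizeof params / sizeof params[0])

static int CompareParam(const void *a, const void *b)
{
    return strcmp(((const ParamEntry *)a)->name,
                  ((const ParamEntry *)b)->name);
}

/* Apply one "name value" line from the parameter file.
   Returns 0 on success, -1 for a malformed line or an unknown name. */
int ApplyParamLine(const char *line)
{
    char name[32];
    double value;
    ParamEntry key, *p;

    if (sscanf(line, "%31s %lf", name, &value) != 2)
        return -1;
    key.name = name;
    p = bsearch(&key, params, N_PARAMS, sizeof params[0], CompareParam);
    if (p == NULL)
        return -1;
    if (p->isFloat)
        p->fval = value;
    else
        p->ival = (int)value;
    p->file = 1;                /* this record belongs in the file */
    return 0;
}

/* Write back only records whose file flag is set; returns the count. */
int SaveParams(FILE *out)
{
    size_t i;
    int n = 0;
    for (i = 0; i < N_PARAMS; i++) {
        if (!params[i].file)
            continue;
        if (params[i].isFloat)
            fprintf(out, "%s %g\n", params[i].name, params[i].fval);
        else
            fprintf(out, "%s %d\n", params[i].name, params[i].ival);
        n++;
    }
    return n;
}
```

- Parameters that never appear in the file keep their compile-time values and
- are not written back, which is exactly what the file flag is for.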
- The user interface is also oriented to the symbol table. The user can change
- parameters by choosing a parameter window. Each parameter entry field knows
- the pointer to its entry in the symbol table. It can thus read and write the
- value and call the scaling function. When the user changes a parameter, the
- new value is immediately scaled from floating-point to integer and transmitted
- to the controller. Because values are referred to by name rather than by
- pointer, the set-up of the parameter windows is simplified.
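- As an illustration of a scaling function, suppose (our assumption) that the
- controller keeps pH in hundredths of a unit so that it can remain integer-only.
- A hypothetical scaling pair could then look like this. It uses a plain double
- signature rather than the table's void * signature, to keep the sketch short.

```c
/* Hypothetical scaling for a pH parameter stored on the integer-only
   controller in hundredths of a pH unit. The factor is an assumption. */
#define PH_SCALE 100.0

/* PC -> controller: round to the nearest integer step. */
int ScalePhToController(double ph)
{
    double scaled = ph * PH_SCALE;
    /* round half away from zero without needing <math.h> */
    return (int)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Controller -> PC: convert back for display. */
double ScalePhFromController(int raw)
{
    return raw / PH_SCALE;
}
```

- With this factor a user entry of 7.25 is sent as 725, and a reply of 725 is
- displayed as 7.25. Rounding instead of truncating keeps an entry such as 6.999
- from becoming 699.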
- The symbol tables are kept in alphabetical order, which allows the processor to
- use a binary search when looking for a name. (A binary search is faster than a
- linear search, but if the tables were large, I would consider using hash
- tables.) Now and then the programmer will insert a new entry at the wrong
- position in a table. On start-up, both the PC and controller programs
- therefore go through their tables and check for alphabetical order. If
- unordered elements are found, the program issues an error message.
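- The start-up order check takes only a few lines. The following sketch (the
- function and type names are ours) reports every neighboring pair that is not
- in strictly ascending strcmp order. It also catches duplicate names, which
- would make a binary search ambiguous.

```c
#include <stdio.h>
#include <string.h>

/* Minimal entry type for the sketch; only the name matters here. */
typedef struct {
    const char *name;
    int ival;
} Entry;

/* Report each adjacent pair that is out of order or duplicated.
   Returns the number of problems found (0 means bsearch is safe). */
int CheckTableOrder(const Entry *tab, size_t n)
{
    size_t i;
    int errors = 0;
    for (i = 1; i < n; i++) {
        if (strcmp(tab[i - 1].name, tab[i].name) >= 0) {
            fprintf(stderr, "symbol table out of order: '%s' before '%s'\n",
                    tab[i - 1].name, tab[i].name);
            errors++;
        }
    }
    return errors;
}
```

- The order checked must be the one the search's comparison function uses. With
- strcmp that is ASCII order, in which all uppercase letters sort before
- lowercase ones.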
- The structure of the controller's symbol table is:
- typedef struct {
- char *name;
- int ival;
- char *(*func)(char *);
- int eeOffset;
- } symTabEntry;
- name is the name of the parameter and ival its value. func points either to
- a function that can be invoked by a message or to a hook for individual
- processing of received parameters. eeOffset is the EEPROM address where the
- parameter is stored. (Parameters that are not saved in EEPROM have an offset
- value of -1.)
- The controller's program runs periodically, e.g. every five seconds. The
- program first reads all measurements and stores the values in the symbol
- table. The control algorithms then take their values and parameters from the
- symbol table and calculate responses. The results are stored into the table
- and output to actuators. Both measured values and actuator states are thus
- available to the PC.
- To see how the symbol table is used, suppose the control algorithm is
- pHerror = pHsetpoint - pHmeasured. This can be programmed as:
- FindSymbol ("pHerror")->ival=
- FindSymbol ("pHsetpoint")->ival
- - FindSymbol ("pHmeasured")->ival;
- FindSymbol returns a pointer to the symbol table entry with the given name.
- However, less typing and a faster program are achieved by initializing
- pointers to the symbol table entries at start-up:
- int *ppHerror;
- ppHerror= &FindSymbol ("pHerror")->ival;
- etc. and coding the algorithm as
- *ppHerror= *ppHsetpoint - *ppHmeasured;
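- Put together, the start-up lookup and the cached-pointer control step might
- look like the following self-contained sketch. FindSym, InitControlPointers,
- and ControlStep are hypothetical names, and the setpoint of 700 assumes pH is
- stored in hundredths.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;
    int ival;
} SymEntry;

/* Alphabetical (strcmp) order is required by bsearch(). */
static SymEntry table[] = {
    { "pHerror",      0 },
    { "pHmeasured",   0 },
    { "pHsetpoint", 700 },   /* hypothetical: pH 7.00 in hundredths */
};

static int CmpSym(const void *a, const void *b)
{
    return strcmp(((const SymEntry *)a)->name, ((const SymEntry *)b)->name);
}

SymEntry *FindSym(const char *name)
{
    SymEntry key;
    key.name = name;
    return bsearch(&key, table, sizeof table / sizeof table[0],
                   sizeof table[0], CmpSym);
}

/* Resolved once at start-up, so the periodic loop never searches. */
static int *ppHerror, *ppHmeasured, *ppHsetpoint;

/* Returns 0 if every name was found, -1 otherwise. */
int InitControlPointers(void)
{
    SymEntry *e = FindSym("pHerror");
    SymEntry *m = FindSym("pHmeasured");
    SymEntry *s = FindSym("pHsetpoint");
    if (e == NULL || m == NULL || s == NULL)
        return -1;
    ppHerror = &e->ival;
    ppHmeasured = &m->ival;
    ppHsetpoint = &s->ival;
    return 0;
}

/* One control period: store the measurement, then compute the error. */
void ControlStep(int measured)
{
    *ppHmeasured = measured;
    *ppHerror = *ppHsetpoint - *ppHmeasured;
}
```

- Checking the result of every lookup at start-up also turns a misspelled name
- into an immediate error instead of a null pointer dereference later.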
- At start-up the symbol tables are initialized according to the following
- sequence. First the tables are initialized with the values specified at
- compile time. The controller then reads values from the EEPROM into its table.
- The PC reads a parameter file, which may change some of the values in its
- table. The PC program will then transfer those values having flag TO_EE set to
- the controller EEPROM and symbol table.
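- The EEPROM side of this start-up sequence can be sketched as follows, with a
- byte array standing in for the EEPROM. The storage layout (two bytes per
- integer, low byte first) and all function names are our assumptions. Writing
- only when the stored value differs is worth doing on the controller as well,
- because EEPROM cells tolerate a limited number of write cycles.

```c
#include <stddef.h>

/* A byte array stands in for the EEPROM. Layout is an assumption:
   two bytes per integer, low byte first. */
#define EE_SIZE 64
static unsigned char eeprom[EE_SIZE];
static int eeWrites;            /* counts physical writes, for illustration */

typedef struct {
    const char *name;
    int ival;
    int eeOffset;               /* -1 if not saved in EEPROM */
} EeEntry;

static EeEntry symTab[] = {     /* compile-time defaults */
    { "reset",   0, -1 },
    { "status", 35, -1 },
    { "temp",   16, 20 },
};
#define N_SYMS (sizeof symTab / sizeof symTab[0])

static int EeRead16(int off)
{
    unsigned int v = eeprom[off] | ((unsigned int)eeprom[off + 1] << 8);
    return v >= 0x8000u ? (int)v - 0x10000 : (int)v;   /* sign-extend */
}

/* Write only when the stored bytes differ, sparing EEPROM write cycles. */
static void EeWrite16(int off, int value)
{
    unsigned int u = (unsigned int)value;
    unsigned char lo = (unsigned char)(u & 0xFFu);
    unsigned char hi = (unsigned char)((u >> 8) & 0xFFu);
    if (eeprom[off] != lo || eeprom[off + 1] != hi) {
        eeprom[off] = lo;
        eeprom[off + 1] = hi;
        eeWrites++;
    }
}

/* Start-up: replace the compile-time defaults with the EEPROM copies. */
void LoadTableFromEeprom(void)
{
    size_t i;
    for (i = 0; i < N_SYMS; i++)
        if (symTab[i].eeOffset >= 0)
            symTab[i].ival = EeRead16(symTab[i].eeOffset);
}

/* ToEeprom-style update. Returns -1 (compare NOT_EEPROM) when the
   entry has no EEPROM slot, 0 otherwise. */
int StoreToEeprom(EeEntry *e, int value)
{
    if (e->eeOffset < 0)
        return -1;
    e->ival = value;
    EeWrite16(e->eeOffset, value);
    return 0;
}
```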
-
-
-
- Message Format
-
-
- The messages between the PC and controller are ASCII strings sent over an
- RS-232 connection. Before transmission the message assembler adds a checksum.
- The receiver checks and removes the checksum from the message before passing
- the message to the message interpreter.
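- The article does not say which checksum the authors used. As one simple
- possibility, the sketch below appends the sum of the message bytes modulo 256
- as two hexadecimal digits, which keeps the message pure ASCII. The receiver
- recomputes the sum and strips the digits before handing the message on. The
- function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical checksum: sum of the message bytes modulo 256. */
static unsigned int Checksum(const char *s, size_t len)
{
    unsigned int sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum += (unsigned char)s[i];
    return sum & 0xFFu;
}

/* Sender: append the checksum as two uppercase hex digits.
   buf must have room for two more characters. */
void AddChecksum(char *buf)
{
    size_t len = strlen(buf);
    sprintf(buf + len, "%02X", Checksum(buf, len));
}

/* Receiver: verify and strip the checksum in place.
   Returns 0 if it matched, -1 otherwise. */
int CheckAndStripChecksum(char *buf)
{
    size_t len = strlen(buf);
    unsigned int received;
    if (len < 2 || sscanf(buf + len - 2, "%2X", &received) != 1)
        return -1;
    if (received != Checksum(buf, len - 2))
        return -1;
    buf[len - 2] = '\0';
    return 0;
}
```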
- Each message consists of one or more submessages. The submessages and their
- elements can be separated by spaces and have the following format:
- <name><operator>[<value>]
- The name field is the symbolic name of a parameter or a command. The operator
- is a single character as given in Table 1. For messages with an associated
- value, the value field is the numerical value of the parameter. The name must
- begin with a letter (a-z, A-Z), followed by letters, digits, or underscores
- (a-z, A-Z, 0-9, _). In our implementation the value field is always an
- integer because all processing in the controller is integer based.
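- The name rule translates directly into a small validation function. This
- sketch is ours. It assumes the C locale, where isalpha accepts exactly a-z and
- A-Z, and it does not enforce the length limit that the listing's token buffer
- imposes.

```c
#include <ctype.h>

/* Returns 1 if s is a legal name: a letter followed by any mix of
   letters, digits, and underscores. Returns 0 otherwise. */
int IsValidName(const char *s)
{
    if (!isalpha((unsigned char)*s))
        return 0;
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;
    return 1;
}
```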
- To add other value types the programmer need only define new operators and
- modify the structures and programs accordingly. A floating-point type is
- useful in some systems. A string type can also be of use, for example to
- transfer strings directly to an operator display on the controller. In a
- multi-processor embedded system, the string type can be used to transfer
- messages verbatim to subprocessors.
- The following message transfers temp_setp=20 to EEPROM, sends delay=10 to the
- symbol table, requests the present value of temp, and executes the function
- pwr_sav:
- temp_setp>20 delay=10 temp? pwr_sav!
- The response might be
- temp:22 pwr_sav#16
- The error code 16 (UNDEF_SYMB) indicates that pwr_sav was not found in the symbol table of
- the controller. (Such error messages are hopefully only encountered during
- program development.) In this case the message indicates a discrepancy between
- the PC and controller programs.
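- On the PC side, a message assembler can build such a message one submessage at
- a time. The helper below is a hypothetical sketch: it appends
- <name><operator>[<value>] with a separating space, omits the value for the ?
- and ! operators, and refuses to overflow the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Append one submessage to msg (which must already be a valid string
   shorter than size). Returns 0 on success, -1 if it would not fit. */
int AppendSubmessage(char *msg, size_t size, const char *name,
                     char op, int value)
{
    size_t len = strlen(msg);
    const char *sep = (len > 0) ? " " : "";
    int n;

    if (op == '?' || op == '!')             /* operators without a value */
        n = snprintf(msg + len, size - len, "%s%s%c", sep, name, op);
    else
        n = snprintf(msg + len, size - len, "%s%s%c%d", sep, name, op, value);
    if (n < 0 || (size_t)n >= size - len) {
        msg[len] = '\0';                    /* undo the partial submessage */
        return -1;
    }
    return 0;
}
```

- For example, appending ("temp_setp", '>', 20), ("temp", '?', 0), and
- ("pwr_sav", '!', 0) to an empty buffer yields temp_setp>20 temp? pwr_sav!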
-
-
- Message Interpretation
-
-
- The interpreter parses the received messages into a task list which is handed
- over to an execution function. The structure of the list records is:
- typedef struct {
- int command;
- int type;
- char *name;
- int value; } toDoList;
- command is a value specifying the command, type the type of the value (in this
- case always INTEGER), name a pointer to the name of the parameter, and value
- the numerical value of the parameter.
- In the controller the interpreter and reply assembler are integrated. In the
- PC, however, parameters are sent through a message assembler and replies
- decoded in a separate interpreter. The receiver is interrupt driven and
- executes the interpreter after a full message has arrived. The interpreter
- places the received values into the symbol table and transfers error messages
- to the user interface. The PC interpreter resembles the controller
- interpreter, which I am about to describe.
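- The division of labor between the receive interrupt and the interpreter might
- look like the sketch below. It assumes (our choice) that a message ends with a
- carriage return or line feed. RxByte would be called from the serial interrupt
- for each received character; the main loop would call TakeMessage and pass any
- complete message to the interpreter. Both names are hypothetical.

```c
#include <string.h>

#define RX_SIZE 80

static char rxBuf[RX_SIZE];
static volatile size_t rxLen;
static volatile int messageReady;

/* Interrupt side: only buffer bytes and flag a complete message. */
void RxByte(char c)
{
    if (messageReady)
        return;                     /* previous message not taken yet: drop */
    if (c == '\r' || c == '\n') {
        if (rxLen > 0) {
            rxBuf[rxLen] = '\0';
            messageReady = 1;
        }
        return;
    }
    if (rxLen < RX_SIZE - 1)
        rxBuf[rxLen++] = c;         /* overlong messages are truncated */
}

/* Main-loop side: copy out a complete message and re-arm the receiver.
   Returns 1 if a message was taken, 0 if none was ready. */
int TakeMessage(char *out, size_t size)
{
    if (!messageReady)
        return 0;
    strncpy(out, rxBuf, size - 1);
    out[size - 1] = '\0';
    rxLen = 0;
    messageReady = 0;
    return 1;
}
```

- Keeping the interrupt routine down to buffering keeps it short. While a
- finished message is waiting, this sketch simply drops incoming characters; a
- real system might use two buffers instead.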
-
-
- A Sample Application
-
-
- This slightly modified version of the controller's interpreter and message
- assembler will illustrate message handling and symbol table operations. (This
- version runs on a PC.) Listing 1 presents the definitions and function
- prototypes; Listing 2 shows the main program and message interpreter.
- The main program reads a message from the keyboard. It parses the message,
- then prints and executes the task list toDo. The task list is a list of
- structures with one entry for each subcommand. The Parse function assumes the
- message is formatted as defined in Table 1. Since the grammar is quite simple,
- there is no need for recursive descent or other strategies used in compilers.
- There are several tests that check adherence to the format, and issue error
- messages if discrepancies are found. (This is most valuable during program
- development.) Finding an error stops the message interpretation. Inserting an
- EMPTY command terminates the list.
- GetNextToken reads the next token in the message string. If the first
- character encountered is a letter, the token is a name. If it is a digit or a
- sign (+ or -), the token is a value. Any other character is taken to be an
- operator. The function then copies the token into the token string of the
- parser and sets the type parameter according to the kind of token. The size of
- the token string (MAX_TOKEN_LEN) must be larger than the longest token that
- can occur. For an operator, the type parameter is set to the operator's defined constant.
- Parse calls StoreToken to save the encountered names in a string pool. This
- pool is recycled for each message by setting pPool=stringPool. The pool size
- (STRINGPOOL) must be large enough to store all the names that can occur in one
- message. The address of each name is stored in the name element of the
- structure.
- The task list toDo can now be processed by calling DoCommands. This function
- is essentially a loop that executes the commands through a switch statement.
- However, it first initializes a pointer to the beginning of txBuffer, where
- the response message is stored. This buffer might be local to the more
- hardware-dependent parts of the program and not globally accessible, so a
- function call (TxBuffer) is needed to obtain the pointer. All functions called
- by DoCommands maintain the buffer pointer by accepting it as an argument and
- returning the possibly changed value.
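- The buffer-pointer convention can be shown in isolation. In this sketch
- (AppendStr, AppendInt, and ReplyValue are our own names), each function takes
- the current end of the transmit buffer, appends text, and returns the new end,
- which always points at the terminating null character, so calls chain
- naturally. Like the listing, it assumes the buffer is large enough.

```c
#include <stdio.h>
#include <string.h>

/* Append s and return a pointer to the new terminating '\0'. */
char *AppendStr(char *pTx, const char *s)
{
    size_t n = strlen(s);
    memcpy(pTx, s, n + 1);      /* copy including the terminator */
    return pTx + n;
}

/* Append a decimal integer; sprintf returns the characters written. */
char *AppendInt(char *pTx, int value)
{
    return pTx + sprintf(pTx, "%d", value);
}

/* A Request-style reply "name:value", built by chaining the helpers. */
char *ReplyValue(char *pTx, const char *name, int value)
{
    pTx = AppendStr(pTx, name);
    pTx = AppendStr(pTx, ":");
    return AppendInt(pTx, value);
}
```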
- SayError writes an error message to the transmit buffer. It receives a pointer
- to the task list, where it finds the name relating to the error, and the error
- number.
- The functions Assign, Request, ToEeprom, and RunCommand each begin by getting
- a name from the task list and looking it up in the symbol table. If the name
- isn't found, they write an error message to the transmit buffer.
- The Assign function inserts the value in the symbol table. The Request
- function reads the value from the symbol table and generates a response
- submessage. ToEeprom puts the value into the symbol table and checks if the
- parameter has a place in the EEPROM. If eeOffset is greater than or equal to
- zero, it writes the value. If not, it issues an error message. RunCommand
- calls the function pointed to in the symbol structure. If the pointer is
- NULL, RunCommand generates an error message.
- Assign, Request and ToEeprom all check for a function hook before returning.
- If they find one they call it. Assign, Request, and ToEeprom also maintain the
- transmitter buffer pointer so they can write to the buffer.
- The program was compiled with Borland Turbo C++ version 3.0. The library
- functions it uses are all defined by ANSI C and available under UNIX, so other
- compilers should not present any problems.
-
-
- Final Thoughts
-
-
- This symbolic data transfer scheme enabled us to implement a complete control
- system from scratch in about one programmer-month. When we started we had no
- clear picture of what the final result would be, so parameters were constantly
- added and removed as we tried different control strategies. By using the
- described method, we were able to respond quickly to these changes. This
- arrangement also made it easier to add auxiliary parameters for debugging.
- There are of course faster ways to transfer data, but the kinds of processes
- maintained by embedded controllers are often slow enough that a minor increase
- in response time isn't significant.
- Figure 1 Information flow through the system
- Table 1 Message formats and operators
- <name> = <value> Assign value to parameter name
- <name> ? Request the value of parameter name
- <name> > <value> Transfer value to the EEPROM
- <name> : <value> Current value of parameter name
- <name> ! Execute name-command
- <name> # <value> Error code value associated with name
-
- Listing 1 Message interpreter definitions and function prototypes
- /* parsdef.h */
-
- /* No copyrights claimed */
- #define ERROR 1
- #define EMPTY 2
- #define ASSIGN 3
- #define REQUEST 4
- #define TO_EEPROM 5
- #define IS 6
- #define COMMAND 7
- #define INTEGER 8
- #define NAME 10
- #define NAME_ERROR 11
- #define OP_ERROR 12
- #define VAL_ERROR 13
- #define END_ERROR 14
- #define POOL_ERROR 15
- #define UNDEF_SYMB 16
- #define NOT_EEPROM 17
- #define UNDEF_FUNC 18
- #define MAX_TOKEN_LEN 10
- #define MAX_STATEMENTS 20
- #define STRINGPOOL 80
- #define MESSAGELENGTH 80
-
- typedef struct {
- int command;
- int type;
- char *name;
- int value;
- } toDoList;
-
- typedef struct {
- char *name;
- int ival;
- char *(*func)(char *);
- int eeOffset; /* -1 if not saved in eeprom */
- } symTabEntry;
-
- char *Assign(toDoList *pL, char *pTx);
- void DoCommands(toDoList *pL);
- symTabEntry *FindSymbol(char *name);
- char *GetNextToken(char *begin, char *token, int *type);
- void Parse(char *message, toDoList *pL);
- void PrintToDo(toDoList *toDo);
- char *Request(toDoList *pL, char *pTx);
- char *Reset(char *ptr);
- char *RunCommand(toDoList *pL, char *pTx);
- char *SayError(toDoList *pL, int err, char *cp);
- int StoreToken(char **strp, char *token);
- int SymCmp(symTabEntry *a, symTabEntry *b);
- char *ToEeprom(toDoList *pL, char *pTx);
- char *TxBuffer(void);
- void WriteEeprom(symTabEntry *pSym, int type);
-
- /* End of File */
-
-
- Listing 2 The message interpreter and assembler for the embedded controller,
- implemented on a PC
- /* pars.c */
- /* no copyrights claimed */
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <ctype.h>
- #include <string.h>
- #include "parsdef.h"
-
- static toDoList toDo[MAX_STATEMENTS];
- static char stringPool[STRINGPOOL];
- static char *pPool, *poolEnd;
- static char message[MESSAGELENGTH];
- static char *messList[]= {
- "EMPTY","error","empty","assign","request",
- "to_eeprom","is","command","integer","float",
- "name","name_error","op_error","val_error",
- "end_error","pool_error"
- };
-
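- int main(void)
- {
- while(1) {
- pPool= stringPool;
- poolEnd= stringPool+STRINGPOOL-1;
- printf("\nmessage: ");
- /* fgets replaces the unsafe gets; strip the trailing newline */
- if(fgets(message, sizeof message, stdin)==NULL) break;
- message[strcspn(message, "\n")]= '\0';
- if(message[0]=='.') break;
- Parse(message, toDo);
- PrintToDo(toDo);
- DoCommands(toDo);
- }
- return 0;
- }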
-
- /* first, the parsing */
-
- void Parse(char *message, toDoList *pToDo)
- {
- char *pC, token[MAX_TOKEN_LEN+1];
- int type, i;
- pC= message;
- /* Stop two entries short of the array end: each pass marks the next
- entry EMPTY, and an overflow needs room for END_ERROR plus EMPTY. */
- for(i=0; i<MAX_STATEMENTS-2; i++) {
- (pToDo+1)->command= EMPTY; /* mark end */
- while(*pC==' ') pC++; /* tolerate trailing spaces */
- if(*pC=='\0'){pToDo->command= EMPTY; return;}
- /* get name */
- pC= GetNextToken(pC, token, &type);
- if(StoreToken(&pToDo->name, token) != 0)
- { pToDo->command=POOL_ERROR; return; }
- if((type==ERROR)||(type!=NAME))
- { pToDo->command= NAME_ERROR; return;}
- if(*pC=='\0') {pToDo->command=EMPTY; return;}
- /* get operator */
- pC= GetNextToken(pC, token, &type);
- if(type==ERROR)
- {pToDo->command=OP_ERROR; return;}
- pToDo->command= type;
- /* get value */
- if((type==ASSIGN)||(type==TO_EEPROM)||
- (type==IS)) {
- if(*pC=='\0')
- {pToDo->command=VAL_ERROR; return;}
- pC= GetNextToken(pC, token, &type);
-
- pToDo->type= type;
- if(type==INTEGER)
- pToDo->value= atoi(token);
- else {pToDo->command= VAL_ERROR; return;}
- }
- else pToDo->type= INTEGER;
- if(pToDo->command== EMPTY) return;
- pToDo++;
- }
- while(*pC==' ') pC++;
- if(*pC=='\0') {pToDo->command= EMPTY; return;}
- pToDo->command= END_ERROR; /* too many submessages */
- pToDo->name= ""; /* SayError reads the name */
- (pToDo+1)->command= EMPTY;
- return;
- }
-
- char *GetNextToken(char *begin, char *token, int *type)
- {
- char *pC, *pT;
- int i;
- pC= begin;
- pT= token;
- *token='\0';
- *type= EMPTY;
- if(*pC == '\0') {return pC;}
- /* remove leading spaces */
- while(*pC==' ') pC++;
- if(*pC=='\0') return pC;
- if(isalpha((unsigned char)*pC)) { /* name */
- *type= NAME;
- *pT++= *pC++;
- for(i=1; i<MAX_TOKEN_LEN; i++) {
- if(isalpha((unsigned char)*pC)||isdigit((unsigned char)*pC)||
- (*pC=='_')) *pT++= *pC++;
- else {
- *pT= '\0';
- return pC;
- }
- }
- *pT= '\0'; /* terminate even an overlong name */
- *type= ERROR;
- return pC;
- }
- else if(isdigit((unsigned char)*pC)||(*pC=='+')||
- (*pC=='-')) { /* number */
- *pT++= *pC++;
- for(i=1; i<MAX_TOKEN_LEN; i++) {
- if(isdigit((unsigned char)*pC)) *pT++= *pC++;
- else {
- *pT= '\0';
- *type= INTEGER;
- return pC;
- }
- }
- *pT= '\0';
- *type= ERROR;
- return pC;
- }
- else if(*pC== '=') *type= ASSIGN;
- else if(*pC== '?') *type= REQUEST;
- else if(*pC== '>') *type= TO_EEPROM;
- else if(*pC== ':') *type= IS;
- else if(*pC== '!') *type= COMMAND;
- else {*type= ERROR; return ++pC;}
-
- return ++pC;
- }
-
- int StoreToken(char **ppName, char *token)
- {
- int length;
- *ppName= pPool;
- length= strlen(token)+1;
- if((pPool+length)>=poolEnd) return -1;
- strcpy(pPool, token);
- pPool+= length;
- return 0;
- }
-
- /* and now, the action */
-
- static symTabEntry symTab[]= {
-
- {"reset" , 0 , Reset, -1},
- {"status" , 35 , NULL , -1},
- {"temp" , 16 , NULL , 20}
- };
-
- void PrintToDo(toDoList *toDo)
- {
- while(toDo->command != EMPTY) {
- printf("\n %s %s '%s'",
- messList[toDo->command],
- messList[toDo->type], toDo->name);
- if(toDo->type==INTEGER)
- printf(" %d", toDo->value);
- toDo++;
- }
- }
-
- int SymCmp(symTabEntry *a, symTabEntry *b)
- {
- return strcmp(a->name, b->name);
- }
-
- symTabEntry *FindSymbol(char *pName)
- {
- symTabEntry dummy, *p;
- dummy.name= pName;
- p= (symTabEntry*) bsearch(&dummy, symTab,
- sizeof(symTab)/sizeof(symTabEntry),
- sizeof(symTabEntry),
- (int(*)(const void*, const void*))SymCmp);
- return p;
- }
-
- char *Reset(char *pC)
- {
- printf("\nExecuting 'Reset'");
- return pC;
- }
-
- void DoCommands(toDoList *pToDo)
- {
-
- char *txBuffer, *pTx;
- txBuffer= pTx= TxBuffer();
- *pTx= '\0';
- while(pToDo->command != EMPTY) {
- switch(pToDo->command) {
- case ERROR:
- case NAME_ERROR:
- case OP_ERROR:
- case VAL_ERROR:
- case END_ERROR:
- case POOL_ERROR:
- pTx= SayError(pToDo, pToDo->command, pTx);
- break;
- case ASSIGN: pTx= Assign(pToDo, pTx);
- break;
- case REQUEST: pTx= Request(pToDo, pTx);
- break;
- case TO_EEPROM: pTx= ToEeprom(pToDo, pTx);
- break;
- case COMMAND: pTx= RunCommand(pToDo, pTx);
- break;
- default: break;
- }
- ++pToDo;
- }
- printf("\ntxBuffer: '%s'\n", txBuffer);
- }
-
- char *SayError(toDoList *pToDo, int err, char *pC)
- {
- char str[MAX_TOKEN_LEN+1], *pS;
- pS= pToDo->name;
- while(*pC++=*pS++);
- pC--;
- *pC++='#';
- sprintf(str,"%-d", err);
- pS= str;
- while(*pC++= *pS++);
- return --pC;
- }
-
- char *Assign(toDoList *pToDo, char *pTx)
- {
- symTabEntry *pSym;
- pSym= FindSymbol(pToDo->name);
- if(pSym==NULL) {
- pTx= SayError(pToDo, UNDEF_SYMB, pTx);
- return pTx;
- }
- if(pToDo->type==INTEGER) pSym->ival= pToDo->value;
- if(pSym->func!=NULL) pTx= (*pSym->func)(pTx);
- return pTx;
- }
-
- char *Request(toDoList *pToDo, char *pTx)
- {
-
- symTabEntry *pSym;
- char *pS, str[12]; /* room for any 32-bit int */
-
- pSym= FindSymbol(pToDo->name);
- if(pSym==NULL) {
- pTx= SayError(pToDo, UNDEF_SYMB, pTx);
- return pTx;
- }
- pS= pToDo->name;
- while(*pTx++=*pS++);
- pTx--;
- *pTx++= ':'; /* reply operator, as in Table 1 */
- str[0]= '\0';
- if(pToDo->type==INTEGER)
- sprintf(str,"%-d", pSym->ival);
- pS= str;
- while(*pTx++=*pS++);
- pTx--; /* back up to the terminating '\0' */
- if(pSym->func!=NULL) pTx= (*pSym->func)(pTx);
- return pTx;
- }
-
- char *ToEeprom(toDoList *pToDo, char *pTx)
- {
- symTabEntry *pSym;
- pSym= FindSymbol(pToDo->name);
- if(pSym==NULL) {
- pTx= SayError(pToDo, UNDEF_SYMB, pTx);
- return pTx;
- }
- if(pSym->eeOffset<0) {
- pTx= SayError(pToDo, NOT_EEPROM, pTx);
- return pTx;
- }
- if(pToDo->type==INTEGER) {
- pSym->ival= pToDo->value;
- WriteEeprom(pSym, INTEGER);
- }
- if(pSym->func!=NULL) pTx= (*pSym->func)(pTx);
- return pTx;
- }
-
- char *RunCommand(toDoList *pToDo, char *pTx)
- {
- symTabEntry *pSym;
- pSym= FindSymbol(pToDo->name);
- if(pSym==NULL) {
- pTx= SayError(pToDo, UNDEF_SYMB, pTx);
- return pTx;
- }
- if(pSym->func!=NULL) pTx= (*pSym->func)(pTx);
- else {
- pTx=SayError(pToDo, UNDEF_FUNC, pTx);
- return pTx;
- }
- return pTx;
- }
-
- char *TxBuffer(void)
- {
- static char buffer[200];
- return buffer;
- }
-
-
- void WriteEeprom(symTabEntry *pSym, int type)
- {
- if(type==INTEGER) printf(
- "\nTo EE, type: %d off: %d val: %d\n",
- type, pSym->eeOffset, pSym->ival);
- return;
- }
-
- /* End of File */
-
-
-
- ROMLDR, an Embedded System Program Locator
-
-
- Charles B. Allison
-
-
- Charles Allison has been working with microprocessor hardware and firmware in
- embedded systems since 1976. He has a Bachelor of Science degree in physics
- and a Master of Business Administration degree. Charles has a microprocessor
- consulting business, Allison Technical Services, where he has been developing
- embedded control and monitoring products for clients since 1984. Charles can
- be reached through CompuServe 71005,1502, his BBS/FAX line at (713)-777-4746
- or his company, ATS, 8343 Carvel, Houston, TX 77036.
-
-
- MS-DOS software development tools such as C compilers and debuggers have
- become marvelously sophisticated and useful. They can also provide a
- cost-effective means for developing embedded systems. This article
- introduces one aspect of using a high-performance DOS-based C compiler for
- embedded systems: relocating code and data segments. I provide a program,
- ROMLDR, which converts a program from the MS-DOS EXE format to a located
- binary file.
- ROMLDR's principal purpose is to adapt DOS-based C compiler output for use in
- EPROM-based embedded systems. Other uses include BIOS extensions, EPROM-based
- MS-DOS applications, and relocation of MS-DOS programs in PC memory, such as
- above the 640K MS-DOS memory limit. ROMLDR can also be used with EXE files
- generated by languages other than C.
- ROMLDR was written using Borland C 3.1. It should be possible to build ROMLDR
- with either Borland Turbo C or Microsoft C, and to use either compiler for the
- embedded programs themselves. I have tested ROMLDR only with the Borland C 3.1
- compiler.
-
-
- Embedding DOS-Compiled Programs
-
-
- To use DOS-compiled programs in EPROM-based embedded systems, you must provide
- several additional components, as well as resolve some unique programming
- issues. A number of the required components, such as the startup code, depend
- on the compiler, target hardware, and application. (See the sidebar "Coding
- for Embedded Applications" for a brief discussion on embedded code
- requirements.)
- Once you have attended to all these details, you can program, compile, and
- link the various program files into an MS-DOS EXE file. However, you can't
- just burn your EXE file into EPROM and go. You must explicitly perform a step
- that MS-DOS performs implicitly when it loads an EXE file.
- MS-DOS modifies programs with the EXE extension when it loads them into memory.
- This modification, often referred to as a fix up (also known as the locate
- function), consists of modifying the program's segment values for the actual
- memory address where it is to run. By performing fix ups, DOS can load an
- executable almost anywhere in real-mode memory. (DOS can load small programs
- with the COM extension anywhere and run them without modification.)
- DOS performs fix ups by adding the file's load-address segment value to all
- segment addresses stored in the code and data areas of the program.
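- The fix-up step itself is small. The following C sketch is illustrative
- rather than ROMLDR's code: it assumes the load image is already in a byte
- array, that each relocation pointer has been decoded into an offset and a
- segment relative to the start of the image, and that stored words are
- little-endian as on the 80x86. The names reloc_entry and apply_fixups are
- hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded relocation pointer: seg:off relative to image start. */
typedef struct {
    uint16_t off;
    uint16_t seg;
} reloc_entry;

/* Add load_seg to every 16-bit segment word a relocation pointer targets,
   as the DOS loader does. Returns the number of fix-ups applied, or -1 if
   an entry points outside the image (earlier entries are already applied). */
int apply_fixups(uint8_t *image, size_t image_len,
                 const reloc_entry *relocs, size_t nrelocs,
                 uint16_t load_seg)
{
    size_t i;
    for (i = 0; i < nrelocs; i++) {
        /* segment:offset -> linear byte offset into the image */
        uint32_t pos = (uint32_t)relocs[i].seg * 16u + relocs[i].off;
        uint16_t word;
        if ((size_t)pos + 2 > image_len)
            return -1;
        word = (uint16_t)(image[pos] | (image[pos + 1] << 8)); /* low byte first */
        word = (uint16_t)(word + load_seg);  /* wraps modulo 64K like the CPU */
        image[pos] = (uint8_t)(word & 0xFF);
        image[pos + 1] = (uint8_t)(word >> 8);
    }
    return (int)nrelocs;
}
```

- Calling apply_fixups with the paragraph at which the image was loaded
- mimics the DOS loader; ROMLDR instead adds different bases for EPROM and RAM
- segments.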
- Unlike DOS programs, most embedded systems programs reside in and execute from
- EPROM. Embedded system locators must perform fix ups prior to placing the
- program in EPROM and must take into account that variables will be located in
- RAM, though their initial values are still in EPROM at startup.
-
-
- Locating for Embedded Systems
-
-
- Before I present my program in detail, I want to outline the major parts of
- the location process. I will first describe the structure of a DOS EXE file,
- and how that structure is reflected in my code. Next, I will describe the
- location process.
-
-
- EXE File Format
-
-
- The EXE file consists of several sections, including a header, program code,
- data initialization values, and optional program debug information. Refer to
- structure EXE_HDR at the beginning of Listing 1 for the layout and definitions
- of the various parameters. The header section consists of several parameters
- which define the size of the file and the size of the header. Following these
- parameters is a section of fix-up far pointers.
- Each of these pointers targets a location in the program code containing a
- segment address value that must be modified with the correct load address.
- These pointers are stored in standard 80x86 segment:offset format relative to
- the beginning of the code section. The pointers' segment values correspond to
- the segment table in the MAP file, where each segment's value is the top four
- hexadecimal digits of the beginning address listed for it. (The linker
- generates the MAP file.)
- There are num_reloc fix-up pointers in the header section. These pointers
- begin at the offset off_reloc from the beginning of the header. (Note that
- fix-up pointers may not be sorted by address as they occur in the EXE file.)
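- Each relocation entry is four bytes: an offset word followed by a segment
- word, both stored low byte first. A minimal decoding sketch, assuming the
- whole header is already in memory (as it is in ROMLDR's header array); the
- reloc_ptr type and the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t off;   /* offset part of the fix-up pointer */
    uint16_t seg;   /* segment part, relative to the start of the code section */
} reloc_ptr;

/* Read a little-endian 16-bit word from a byte buffer. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode num_reloc entries starting at byte off_reloc of the header.
   Returns 0 on success, -1 if the table would run past hdr_len. */
int decode_relocs(const uint8_t *hdr, size_t hdr_len,
                  uint16_t off_reloc, uint16_t num_reloc, reloc_ptr *out)
{
    size_t i;
    if ((size_t)off_reloc + 4u * num_reloc > hdr_len)
        return -1;
    for (i = 0; i < num_reloc; i++) {
        const uint8_t *e = hdr + off_reloc + 4 * i;
        out[i].off = get_le16(e);      /* first word: offset */
        out[i].seg = get_le16(e + 2);  /* second word: segment */
    }
    return 0;
}
```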
- Following the header is the program's code section. This section consists of
- one or more separate segments, the number and type of which depend on the
- program and its memory model. Following the code is the initialized data
- section. Then comes the uninitialized data section, and finally the stack.
-
-
- The Location Process
-
-
- Once it has loaded a program into memory, the MS-DOS loader adds the code
- section's segment address to the segment value stored in each location
- requiring a fix up. The loader finds these locations by dereferencing each
- fix-up pointer in the header. MS-DOS then sets the CPU's stack registers to
- disp_stack_seg:sp and calls the program at address rel_cs_seg:ip, after
- adding the load segment to both segment values. (Note: For the sake of
- illustration, I use disp_stack_seg, sp, rel_cs_seg, and ip to represent
- values stored at specific offsets within the EXE header. By referring to
- fields of the same name in my struct, EXE_HDR, you can see where these values
- are stored in the header.) MS-DOS also provides some environment and header
- information to the loaded program through register contents.
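- The register setup is plain arithmetic. In this sketch, load_seg is an
- assumed parameter standing for the segment at which the load image (not the
- PSP) begins; the entry_regs structure and entry_registers function are
- hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint16_t cs, ip, ss, sp;
} entry_regs;

/* Registers the loader sets up from the header fields, given the segment
   where the load image was placed. Segment sums wrap modulo 64K. */
entry_regs entry_registers(uint16_t rel_cs_seg, uint16_t ip,
                           uint16_t disp_stack_seg, uint16_t sp,
                           uint16_t load_seg)
{
    entry_regs r;
    r.cs = (uint16_t)(load_seg + rel_cs_seg);
    r.ip = ip;
    r.ss = (uint16_t)(load_seg + disp_stack_seg);
    r.sp = sp;
    return r;
}
```

- ROMLDR's printout of the startup address, romsadr + rel_cs_seg : ip, is the
- same CS:IP computation with the EPROM segment in place of load_seg. For DEMO
- (Listing 3, entry point 0000:0000) and an EPROM segment of 0xF000, that gives
- F000:0000.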
- Most, but not quite all, of the information necessary to generate rommable
- absolute binary files already exists in the EXE file. The rest of the
- information must come from the segment data in the compiler's MAP file and
- from configuration information provided by the user in a loader configuration
- file.
-
-
- Program Description
-
-
- ROMLDR uses the linked EXE file and its MAP file to create a binary file that
- can be programmed into EPROMs.
- main begins by allocating a far buffer to contain program segments. This
- buffer should be 0x10000 bytes in length to ensure that a code segment of any
- size can be processed. (While testing from within the Borland IDE, I had to
- reduce the buffer's size significantly due to memory constraints.) A simple
- error routine, term_error, generates error messages for a variety of potential
- problems, and provides for program termination.
- After allocating a buffer, ROMLDR reads the configuration file (CFG) specified
- on the command line. This file contains the names of the MAP, EXE, and BIN
- files, the EPROM and RAM load segment addresses in hexadecimal, and the class
- name of the first RAM segment. Table 1 shows the CFG file format. The
- parameters occupy three lines, and parameters on the same line are separated
- by spaces. On each line, ROMLDR ignores any characters occurring after the
- required parameters.
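- For illustration, here is one way to parse the three CFG lines with sscanf,
- using field widths so that an overlong name cannot overrun its buffer. The
- romldr_cfg structure and parse_cfg function are hypothetical; the buffer sizes
- mirror ROMLDR's mfile, efile, bfile, and ram_class arrays.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three CFG lines of Table 1. */
typedef struct {
    char mapfile[14], exefile[14], binfile[14];  /* 8.3 names plus NUL */
    unsigned rom_seg, ram_seg;                   /* 4-digit hex segments */
    char ram_class[15];                          /* first RAM class name */
} romldr_cfg;

/* Parse the CFG text held in three strings. Text after the required
   fields on a line is ignored. Returns 0 on success, or the 1-based
   number of the line that failed to parse. */
int parse_cfg(const char *line1, const char *line2, const char *line3,
              romldr_cfg *cfg)
{
    memset(cfg, 0, sizeof *cfg);
    if (sscanf(line1, "%13s %13s %13s",
               cfg->mapfile, cfg->exefile, cfg->binfile) != 3)
        return 1;
    if (sscanf(line2, "%4x %4x", &cfg->rom_seg, &cfg->ram_seg) != 2)
        return 2;
    if (sscanf(line3, "%14s", cfg->ram_class) != 1)
        return 3;
    return 0;
}
```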
-
-
- Reading the MAP File
-
-
- ROMLDR executes a while loop to read and process lines of text from the MAP
- file. (The MAP file used with ROMLDR should be the short version, which
- contains only the segment table.) ROMLDR extracts memory allocation class
- names, plus their starting and ending addresses, and stores them in an array
- of structures called maptable. ROMLDR performs a simple length check using
- the variable class_loc, the column of the class name, to determine if the
- current line contains segment information. ROMLDR expects the line to be in a
- fixed column format, with the class name occurring at offset class_loc.
- ROMLDR converts address values to long integers, and compares class names
- with ram_class, a configuration variable used to define the beginning of RAM.
- The segment value of the first matching class is stored in ramdata. ROMLDR
- currently will process a maximum of 120 segments.
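- ROMLDR locates fields by fixed column, which ties it to one linker's MAP
- layout. As a swapped-in alternative (not the author's method), the sketch
- below scans the whitespace-separated fields of a segment-table line and
- rejects anything that does not begin with a hexadecimal address ending in H.
- The map_seg structure and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    long start, stop;            /* byte addresses from the MAP file */
    char name[32];
    char class_name[16];
} map_seg;

/* Parse a segment-table line such as " 00EF0H 00FB9H 000CAH _DATA DATA".
   Returns 1 for a segment line, 0 for anything else (headings, blanks). */
int parse_map_line(const char *line, map_seg *seg)
{
    unsigned long start, stop, length;   /* length is read only to validate */
    if (sscanf(line, " %lxH %lxH %lxH %31s %15s",
               &start, &stop, &length, seg->name, seg->class_name) != 5)
        return 0;
    seg->start = (long)start;
    seg->stop = (long)stop;
    return 1;
}

/* Nonzero if the segment belongs to the named class, the test ROMLDR
   applies with ram_class to find the first RAM segment. */
int in_class(const map_seg *seg, const char *class_name)
{
    return strcmp(seg->class_name, class_name) == 0;
}
```

- The paragraph number ROMLDR keeps in ramdata is the start address shifted
- right four bits, so a segment starting at 00EF0H has segment value 0x00EF.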
-
-
-
- Processing the Header
-
-
- Once ROMLDR has acquired the MAP file, it reads the EXE header portion via
- function gethdr. This function first reads in 32 bytes of the header to
- determine the header's size and then loads the rest of the header. This
- function stores header information in an array named header which can contain
- a maximum of 10,000 bytes.
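- After reading the header, main computes fsize, the number of bytes in the
- load image, from three header fields. A sketch of that arithmetic follows.
- One assumption differs from Listing 1: a last-page count of zero is treated
- as a full 512-byte page, which is how the EXE format defines it; ROMLDR's
- formula does not special-case it. The function name is hypothetical.

```c
#include <stdint.h>

/* Bytes in the load image (everything after the header):
   512 * (pages - 1) + bytes on last page - 16 * header paragraphs. */
long load_image_size(uint16_t lst_sec_lng, uint16_t file_size, uint16_t hdr_siz)
{
    /* A count of zero means the final 512-byte page is completely used. */
    long last = (lst_sec_lng == 0) ? 512L : (long)lst_sec_lng;
    return 512L * (file_size - 1) + last - 16L * hdr_siz;
}
```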
- ROMLDR then sorts the fix-up pointers into ascending address order. Function
- sort_table uses qsort to sort the pointers. Function cmp_ptr, supplied as an
- argument to qsort, compares values for qsort by converting pointers from
- segment:offset to long integer form.
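- A comparator in the spirit of cmp_ptr is sketched below with a hypothetical
- reloc_ptr structure. The segment is widened to long before the shift: with
- 16-bit unsigned ints, shifting the segment left by four bits would silently
- drop its top four bits.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint16_t off;   /* stored first in the EXE relocation table */
    uint16_t seg;
} reloc_ptr;

/* Linear address of a segment:offset pair (segment * 16 + offset). */
static long linear(const reloc_ptr *p)
{
    return ((long)p->seg << 4) + p->off;
}

/* qsort comparator: ascending linear address. */
int cmp_reloc(const void *a, const void *b)
{
    long va = linear((const reloc_ptr *)a);
    long vb = linear((const reloc_ptr *)b);
    return (va > vb) - (va < vb);
}
```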
- After sorting the pointers, ROMLDR performs fix ups on each segment, using a
- for loop to iterate through all segments. Function read_segm reads each
- segment from the EXE file and returns the segment size. If the segment length
- is non-zero, ROMLDR calls function fix_segm to run through any fix ups needed
- for the segment and then calls write_segm to output the processed segment to
- the BIN file. (A logical enhancement to ROMLDR would be to output the segments
- in a standard hex format, such as Intel hex format.)
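- Intel hex output could look roughly like the following sketch. It emits
- 16-byte data records (type 00) and the end-of-file record (type 01); each
- record's checksum is the two's complement of the low byte of the sum of its
- count, address, type, and data bytes. It assumes the image fits in 64K, so no
- extended address records are written. hex_record and write_hex are
- hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Format one Intel hex record into out (needs 11 + 2*len + 1 bytes).
   type 0x00 = data, 0x01 = end of file. */
void hex_record(char *out, unsigned addr, unsigned type,
                const unsigned char *data, unsigned len)
{
    unsigned sum = len + ((addr >> 8) & 0xFF) + (addr & 0xFF) + type;
    unsigned i;
    out += sprintf(out, ":%02X%04X%02X", len, addr & 0xFFFF, type);
    for (i = 0; i < len; i++) {
        out += sprintf(out, "%02X", data[i]);
        sum += data[i];
    }
    /* Checksum: two's complement of the low byte of the sum. */
    sprintf(out, "%02X", (0x100 - (sum & 0xFF)) & 0xFF);
}

/* Write a block as 16-byte data records plus an end-of-file record.
   Anything beyond the first 64K is not written by this sketch. */
void write_hex(FILE *fp, const unsigned char *buf, unsigned long size)
{
    char line[48];
    unsigned long pos;
    for (pos = 0; pos < size && pos < 0x10000UL; pos += 16) {
        unsigned n = (size - pos < 16) ? (unsigned)(size - pos) : 16;
        hex_record(line, (unsigned)pos, 0x00, buf + pos, n);
        fprintf(fp, "%s\n", line);
    }
    hex_record(line, 0, 0x01, NULL, 0);
    fprintf(fp, "%s\n", line);
}
```

- The end-of-file record always comes out as :00000001FF.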
-
-
- How ROMLDR Handles EPROM and RAM
-
-
- EPROM segments require different modifications than RAM segments. ROMLDR
- treats all segments at or below the class named ENDCODE as EPROM and treats
- those above ENDCODE as RAM. The location process currently terminates on
- reaching the last segment, the STACK class. (The BIN file, however, needs only
- to contain code and data segments up to the last initialized data value.)
- The ENDCODE class name is special for another reason. ROM-based systems
- typically must transfer initialized data from EPROM to RAM. Therefore, ROMLDR
- will modify all references to data segments to refer to the RAM locations and
- not to the initial values located in the EPROM. (The location for the initial
- values must be used in the startup code so that they can be transferred to
- RAM.) Segment ENDCODE is used for this purpose. I give the ENDCODE segment a
- nonzero length of less than 16 bytes, locate it on a paragraph boundary, and
- ensure that the beginning data segment is also aligned by paragraph. As a
- result, the beginning ROM segment for initialized data becomes ENDCODE+1.
- Since ENDCODE's address is less than the beginning of the RAM segment, it will
- refer to the ROM address just below the initialized data values.
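- Using the addresses in Listing 3, the arithmetic works out as follows:
- ENDCODE starts at 00ED0H (segment 0x00ED), so ENDCODE+1 is 0x00EE, exactly
- where IDATA_BEG begins (00EE0H). After relocation to an EPROM segment of
- 0xF000, the startup code finds the initial values at segment 0xF0EE. A
- hypothetical helper that captures the rule:

```c
#include <stdint.h>

/* ROM segment holding the initial values of the data segments, found the
   way Listing 4 does it: relocated ENDCODE segment + 1. Valid only when
   ENDCODE is paragraph-aligned, shorter than 16 bytes, and the first data
   segment starts on the next paragraph boundary. */
uint16_t init_data_rom_seg(uint32_t endcode_addr, uint16_t romsadr)
{
    uint16_t endcode_seg = (uint16_t)(endcode_addr >> 4); /* MAP address -> paragraph */
    return (uint16_t)(romsadr + endcode_seg + 1);
}
```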
- ROMLDR modifies EPROM data by adding the configuration file's EPROM segment
- address, stored in romsadr, to the code's existing segment value. (A more
- sophisticated version of the program could offer the option of several
- user-defined addresses and the names of the classes that would reside in
- each.)
- ROMLDR modifies RAM segments by first subtracting out the value of the first
- RAM segment and then adding the configuration file's RAM segment location
- value. This method allows the RAM locations to begin at the
- configuration-defined starting value. The subtraction is not necessary for
- code segments, since they begin at segment zero.
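- The rule applied by fix_segm fits in a single function. This sketch uses
- values from Listing 3 with ROMLDR's default romsadr of 0xF000 and ramsadr of
- 0x40: the first FAR_DATA segment starts at 00EE0H, so ramdata is 0x00EE. The
- function name is hypothetical.

```c
#include <stdint.h>

/* Relocate one stored segment value as fix_segm() does: values at or above
   ramdata (first RAM-class segment, as a paragraph number) are rebased to
   ramsadr; all others are ROM and get the EPROM base romsadr added. */
uint16_t relocate_seg(uint16_t value, uint16_t ramdata,
                      uint16_t romsadr, uint16_t ramsadr)
{
    if (value >= ramdata)
        return (uint16_t)(value - ramdata + ramsadr);
    return (uint16_t)(value + romsadr);
}
```

- So references to _TEXT (segment 0x0000) become 0xF000, ENDCODE (0x00ED)
- becomes 0xF0ED, and _DATA (0x00EF) becomes 0x0041 in RAM.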
-
-
- Example Code
-
-
- Listing 2 is DEMO.C, a typical "Hello World" program with some added items to
- provide examples of values for several segment classes. Listing 3, DEMO.MAP,
- is the map file generated for DEMO.C using the example startup code in Listing
- 4 instead of the standard Borland startup code. Note that the _INIT_ segment
- contains some values which are addresses of library initialization routines
- that should be called in order of priority. The example startup code does not
- yet include this section or the interrupt vector initialization section.
- Examples of these can be found in your compiler's startup code. The sidebar
- "Startup Code for Embedded Systems" provides a discussion of startup code
- requirements.
-
-
- ROMLDR Versus Commercial Products
-
-
- ROMLDR has several shortcomings when compared to commercial locator packages.
- When you use ROMLDR you must provide startup code for your application;
- commercial products usually provide the basic startup code required as well as
- code solutions for a variety of problems which must be overcome in various
- embedded configurations. ROMLDR does not provide debugging support, but
- commercial products usually provide some capabilities for the debugging of
- application programs. Finally, vendors of commercial locator packages often
- provide technical support; when you use a non-commercial package such as
- ROMLDR you must solve all problems on your own.
-
-
- Conclusions
-
-
- While ROMLDR is intended primarily as a learning tool and an introduction to
- embedded systems, it can prove useful for some low-end applications. ROMLDR
- should be able to handle straightforward applications where there is one EPROM
- and one RAM memory space. It can easily be modified for more complex
- configurations, especially those which have specific fixed requirements.
- Embedded systems often monitor and control equipment other than normal
- computer peripherals. Embedded systems programmers must be extra cautious,
- since bugs in their programs can place property and lives at risk. In this
- situation, there is no substitute for understanding both the application and
- the tools. Understanding how ROMLDR works may give you insight into how more
- complex systems operate.
- Coding for Embedded Applications
- Writing embedded systems applications requires special efforts on the part of
- the programmer, because of how these applications differ from non-embedded
- applications. First, an embedded program that crashes can cause damage or
- injury, while a non-embedded program (e.g. a word processor) may cause only a
- certain amount of user frustration. Embedded systems must handle the
- conceivable problems in stride and effectively recover from the inconceivable
- without help from users.
- Second, there are several special aspects to embedded code. Many embedded
- systems run standalone programs without the support of complete operating
- systems such as MS-DOS, so these programs must take control of relevant
- interrupt vectors (including error conditions such as divide by 0 and the
- Non-Maskable Interrupt). For these standalone programs, both the hardware and
- the program may require setup code. Programs compiled to run under MS-DOS
- contain startup and termination code to accept control from and return control
- to the operating system. Standard library functions that don't access MS-DOS
- under normal conditions may contain error-handling code that does access
- MS-DOS or that would terminate the program in an unacceptable fashion within
- an embedded system. In addition, some programs may attempt to access non-disk
- MS-DOS functions, such as those used to modify interrupt vectors. If you port
- such programs to an embedded system, you must provide code to perform
- equivalent functions within the embedded system operating environment. An
- embedded program must start (or restart) in a known state. Since the program
- cannot obtain state information from a command line, it must find initial
- state data in EPROM.
- Startup Code for Embedded Systems
- Startup code consists of code to set up and to terminate an application
- program. Startup code is typically written in assembly language and the source
- code is sometimes provided with the C compiler. Replacement libraries often
- include replacement startup code that can be used when the normal C compiler
- library functions are not used.
- Startup code for EPROM-based systems must provide additional functionality
- which depends on the specifics of the system.
- Embedded system startup code usually performs the following steps:
- 1. Establish order of segment classes.
- 2. Set up the program stack.
- 3. Transfer initialized values to RAM from EPROM.
- 4. Zero uninitialized RAM values.
- 5. Set up error interrupt vectors.
- 6. Call necessary library initialization routines.
- 7. Call the application.
- The startup code must also provide the following components:
- 8. Error shut-down code to terminate or restart.
- 9. Exit shut-down code to terminate or restart.
- 10. Any error and miscellaneous routines.
- 11. Initialization of application before calling main function.
- 12. A reset vector for standalone applications.
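- Steps 3 and 4 are the part of Listing 4 that depends on ROMLDR's ENDCODE
- convention. As a C sketch, they can be modeled on a byte array standing in
- for real-mode memory; this model and the init_ram_model name are
- hypothetical, and real startup code must do this in assembly before any C
- code runs. Listing 4 counts in words (paragraphs shifted left three bits);
- the sketch counts in bytes (paragraphs shifted left four bits).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* mem models the real-mode address space; all segment arguments are
   paragraph numbers, so byte address = segment << 4. */
void init_ram_model(uint8_t *mem,
                    uint16_t rom_init_seg,   /* relocated ENDCODE + 1 */
                    uint16_t idata_beg, uint16_t idata_end,
                    uint16_t udata_beg, uint16_t udata_end)
{
    /* Step 3: copy initial values from EPROM to RAM. */
    memcpy(mem + ((uint32_t)idata_beg << 4),
           mem + ((uint32_t)rom_init_seg << 4),
           (size_t)(idata_end - idata_beg) << 4);
    /* Step 4: zero uninitialized data (in Listing 4 this includes the stack). */
    memset(mem + ((uint32_t)udata_beg << 4), 0,
           (size_t)(udata_end - udata_beg) << 4);
}
```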
- There are two approaches to creating a custom startup module. You can start
- with the original library or compiler code and modify it to handle the added
- embedded requirements, or you can start with a minimal embedded system startup
- module and extend it as required.
- Custom startup code may not provide all features described in a compiler's
- documentation. Environment, argv and argc, and even common option variables
- may not be implemented. For example, Borland C allows you to set stack size by
- initializing a far variable, _stklen. This variable is created by the Borland
- startup code and may be replaced in custom startup code by a simple stack-size
- definition.
- Startup code defines segments and segment classes and their order. While
- compilers often allow these names to be changed for modules, the compiled
- libraries still expect certain module names to exist. Warnings abound
- against changing the basic segment order, so use extreme care if that
- appears to be necessary. The addition of segments and classes is much less
- critical. As shown in Listing 4, you can even add them virtually to make the
- MAP more descriptive.
-
- Listing 4 is a simple example of startup code. I provide it as a template
- rather than a complete example since some portions are compiler dependent. You
- may still need to add several code sections for an application. Refer to your
- compiler's startup code module for specific details.
- Table 1 CFG file format
- Mapfile Exefile Binfile // filenames with no path
- ROM_Address RAM_address // 4 digit hex addresses [0...9,a...f]
- Ram_Class // name for first RAM segment
-
- Listing 1 The EXE locator program
- /* ROMLDR.C EXE locator program
- written by: Charles B. Allison
- last change: 11-3-93 */
-
- #include <sys\stat.h>
- #include <io.h>
- #include <dos.h>
- #include <stdio.h>
- #include <stdlib.h>
- #include <alloc.h>
- #include <conio.h>
- #include <ctype.h>
- #include <fcntl.h>
- #include <string.h>
- #define BC 1
- /* BC is Borland c, else assume Microsoft */
- typedef struct {
- unsigned sig; /* signature = 4d5ah */
- unsigned lst_sec_lng;
- /*length of last sector in file modulo 512*/
- unsigned file_size;
- /* size of file in 512 byte pages includes hdr*/
- unsigned num_reloc;
- /* number of relocation items */
- unsigned hdr_siz;
- /* # of 16 byte paragraphs in header */
- unsigned min_ld_para;
- /* min # of paragraphs above load file */
- unsigned max_ld_para;
- /* max # of paragraphs requested by file */
- unsigned disp_stack_seg;
- /* rel displacement of stack segment */
- unsigned sp;
- /* contents of stack ptr on entry to prog */
- unsigned chksm; /* check sum for file */
- unsigned ip; /* beginning instruction ptr */
- unsigned rel_cs_seg; /* relative cs segment */
- unsigned off_reloc;
- /* offset to 1st relocation item typ. 1e */
- unsigned over_lay; /* overlay number */
- unsigned rsrved; /* ?? reserved ?? */
- /* relocation item format is seg:off location
- relative to the beginning of the code section */
- } EXE_HDR;
- struct MP_TBL {
- long addrs; /* segment beg. address */
- long haddrs; /* segment high addrs*/
- char class[12]; /* class of object */
- } maptable[120];
-
- void sort_table(void);
- void gethdr(char *bf);
-
- long hexcvt(char *num);
- size_t read_block(char far *segbuf, size_t segsz);
- long read_segm(int i);
- size_t write_block(char far *segbuf,size_t segsz);
- long write_segm(long segsize);
- int fix_segm(int i);
- int config(char *cfgfile);
- void term_error(int numerr);
-
- unsigned (*ch_ptr)[2];/* pointer to translation table
- [0] = offset,[1] = segment*/
- char mfile[14] = "rom.map"; /* dummy file names */
- char bfile[14] = "rom.bin";
- char efile[14] = "rom.exe";
- /*columns for starting and ending addresses in MAP */
- #define LCOL 1
- #define HCOL 8
- #ifdef BC
- #define MAPCOL 41
-
- /*bc map file class column*/
- #else
- #define MAPCOL 45
- /*ms map file class column */
- #endif
- int class_loc = MAPCOL;
- FILE *mapfile,*exefile,*binfile;
- #define BUF_SIZE 60
- char mapstring[BUF_SIZE];
- long fsize; /*number of bytes in exe file */
- int nsegs; /* number of segments in map */
- unsigned romsadr = 0xf000,ramsadr=0x40;
- char header[10000];
- EXE_HDR *filhdr = (EXE_HDR *)header;
- char far *seg_buffer;
- int next_fix = 0;
- char ram_class[15] = "FAR_DATA";
- unsigned ramdata; /* beginning ram segment*/
- /* ************ main ************* */
- int main(int argc,char *argv[])
- {
- int r_class_flag=1,i;
- long tmp,ssize;
-
- if((seg_buffer = (char far *) farmalloc(0x10000L))
- == NULL) term_error(0);
- if(argc == 1) term_error(-1); /*any cfg file name? */
- config(argv[1]);
- i=0;
- while (fgets(mapstring,BUF_SIZE,mapfile))
- {
- /* process map file from mapstring input */
- /* maptable[1] - n contains the segments -
- class STACK should be last one */
- /* ends with i having n+1 segments */
- if( (int)strlen(mapstring) > class_loc+1)
- { /* get rid of \n at end of string */
- mapstring[(int)strlen(mapstring)-1]='\0';
- if((tmp = hexcvt(&mapstring[LCOL])) >= 0)
-
- {
- maptable[i].addrs = tmp;
- maptable[i].haddrs=hexcvt(&mapstring[HCOL]);
- strcpy(maptable[i].class,
- &mapstring[class_loc]);
- if(r_class_flag)
- if(strcmp(&mapstring[class_loc],ram_class)==0)
- { /*set it to first class occurance*/
- ramdata = (maptable[i].addrs) >>4;
- r_class_flag = 0;
- }
- printf("\n Segment %4.4lx Class %s",
- maptable[i].addrs/16,maptable[i].class);
- i++;
- }/* end - if hexcvt */
- } /* end if strlen */
- if(i>=119) break; /* error too many segments */
- } /* end while */
- if(feof(mapfile))
- printf("\nend of file\n");
- else
- printf("\nerror reading map file\n");
- nsegs = i-1; /* number of segments [1 to nsegs] */
- gethdr(header); /* read in the exe header info */
- /* size of object section of file */
- fsize = (long) ((512L * (filhdr->file_size-1)) +
- filhdr->lst_sec_lng - 16L * filhdr->hdr_siz);
- printf("\nStartup Address %4.4x:%4.4x\n",romsadr+
- filhdr->rel_cs_seg,filhdr->ip);
- printf("Rom Size %lx\n",fsize);
- /* process the exe file header - sort fix ups */
- sort_table();
- /* read in exe file by segment
- do fixups from map table and
-
- write it to output file */
- for(i=0;i < nsegs;i++)
- {
- if((ssize = read_segm(i)) > 0L )
- { /* ignore 0 length segments */
- fix_segm(i);
- write_segm(ssize );
- }
- }
- /* done - end the program */
- farfree(seg_buffer);
- fcloseall();
- return 0;
- }
- /* ****************** */
- /* qsort routine for far pointers */
- int cmp_ptr(const void *a, const void *b)
- {
- long vala,valb;
- vala=((long)((unsigned *)a)[0])+
- (((long)((unsigned *)a)[1])<<4);
- valb=((long)((unsigned *)b)[0])+
- (((long)((unsigned *)b)[1])<<4);
-
- vala -= valb;
- if(vala < 0) return -1;
- if(vala > 0) return 1;
- return 0;
- }
-
- /*------------ sort_table ----------------*/
- /* sort header table */
- void sort_table(void)
- {
- qsort((void *)&header[filhdr->off_reloc],
- filhdr->num_reloc,4,cmp_ptr);
- }
- /* -------------- read_block ---------------- */
- size_t read_block(char far *segbuf,size_t segsz)
- {
- if(fread(segbuf,1,segsz,exefile) != segsz)
- term_error(-7);
- return segsz;
- }
- /* -------------- read_segm.------------------ */
- long read_segm(int i)
- {
- long segsize;
- segsize = maptable[i].haddrs - maptable[i].addrs;
- if(!segsize) return 0;
- segsize += maptable[i+1].addrs-(maptable[i].haddrs);
- if(segsize <= 0x8000)
- { read_block(seg_buffer,(size_t)segsize);
- } else {
- read_block(seg_buffer,0x8000);
- read_block((&seg_buffer[0x8000]),
- (size_t)(segsize-0x8000));
- }
- return segsize;
- }
- /* ------------------ write_block -------------------- */
- size_t write_block(char far *segbuf,size_t segsz)
- {
- if(fwrite(segbuf,1,segsz,binfile) != segsz)
- term_error(-8);
- return segsz;
- }
- /* ---------------- write_segm ------------------ */
- long write_segm(long segsize)
- {
- if(!segsize) return 0;
- if(segsize <= 0x8000)
- { write_block(seg_buffer,(size_t)segsize);
- } else {
- write_block(seg_buffer,0x8000);
- write_block((&seg_buffer[0x8000]),
- (size_t)(segsize-0x8000));
- }
- return segsize;
- }
- /* ------------ fix_segm -------------- */
- int fix_segm(int i)
- {
-
- unsigned tmp,cseg,fixup;
- unsigned far * fixptr;
- cseg = (unsigned)(maptable[i].addrs/16L);
- while(next_fix < filhdr->num_reloc)
- {
- if(ch_ptr[next_fix][1] > cseg) break;
- tmp = ch_ptr[next_fix][0]; /*offset into buffer*/
- fixptr = (unsigned far *) &seg_buffer[tmp];
- fixup = *fixptr;
- /* modify segment fixup according to type */
- if(fixup >= ramdata)
- {/* modify for ram */
- fixup -= ramdata;
- fixup += ramsadr;
- }
- else
- {/* handle as rom */
- fixup += romsadr;
- }
- *fixptr = fixup;
- next_fix++;
- } /*end while */
- return 0 ;
- }
- /* ------------ hexcvt -------------- */
- /* do hex digits to unsigned long */
- long hexcvt(char *num)
- {
- char *term;
- long value;
- value = strtoul(num,&term,16);
- return value;
- }
- /* -------------- gethdr ---------------- */
- void gethdr(char *buf )
- {
- int i,j=32; /*index counter */
- if(fread(&buf[0],1,32,exefile) < 32)
- term_error(-6);
- /* have filhdr contents so get size of full header */
- if(fread(&buf[j],16,filhdr->hdr_siz-2,exefile)
- < filhdr->hdr_siz-2) term_error(-6);
- ch_ptr = (unsigned (*)[2])
- (&buf[filhdr->off_reloc]);
- /* get address of relocation table - ch_ptr [n][m]
- m - 0 offset, 1 - seg, n relocation # */
- }
- /*---------------------- config.------------------------*/
- /* get configuration data */
- int config(char *cfgfile)
- {
- FILE *cfg;
- char buf[80];
- if((cfg = fopen(cfgfile,"r"))==NULL) term_error(-1);
- if(fgets(buf,80,cfg) == NULL) term_error(-2);
- if(sscanf(buf,"%13s %13s %13s",mfile,
- efile, bfile) != 3) term_error(-2);
- /* Now try to open input file 1 */
- if((mapfile=fopen(mfile,"r"))==NULL) term_error(-3);
-
- if((exefile=fopen(efile,"rb"))==NULL) term_error(-4);
- if((binfile=fopen(bfile,"wb"))==NULL) term_error(-5);
- if(fgets(buf,80,cfg) == NULL) term_error(-7);
- if(sscanf(buf,"%4x %4x",&romsadr,
- &ramsadr) != 2) term_error(-7);
- if(fgets(buf,80,cfg) == NULL) term_error(-7);
- if(sscanf(buf,"%14s",ram_class) != 1) term_error(-7);
- return 0;
- }
- /* error handler */
- char *errlist[10] = {
- "Memory Allocation Error",
- "No configuration file, USAGE:romldr cfgfile.cfg",
- "Configuration file error - File names", //-2
- "Map File open error", //-3
- "Exe File open error", //-4
- "Bin File open error", //-5
- "Error reading header", //-6
- "Error reading exe file", //-7
- "Error writing bin file", //-8
- " "
- };
- void term_error(int errnum)
- {
- errnum = abs(errnum);
- if(errnum >= 9) exit(-1);
- printf("%s\n",errlist[errnum]);
- farfree(seg_buffer);
- exit(errnum);
- }
-
- /* End of File */
-
-
- Listing 2 Sample program that produces segment classes
- DEMO.C
- /* Test Program for Rom Loader */
- /* Large model with some far data*/
- #include <conio.h>
- #include <dos.h>
-
- char msg[15] = "Hello World\n\r";
- int locint;
- int directvideo=1; /*BC direct to hw*/
- int far test;
- int far test2=0x55aa;
- int far * tptr = &test;
-
- void main(void)
- {
- int far * ptr2 = &test;
- test = 0x1111;
- locint = test+1;
- cputs(msg);
- }
- /* End of File */
-
-
- Listing 3 Map file for sample program
-
- DEMO.MAP
- Start Stop Length Name Class
-
- 00000H 00000H 00000H ROMSTART_BEG CODE
- 00000H 00E8FH 00E90H _TEXT CODE
- 00E90H 00EC9H 0003AH DEMO_TEXT CODE
- 00ED0H 00ED5H 00006H ENDCODE ENDCODE
- 00EE0H 00EE0H 00000H IDATA_BEG IDATA_BEG
- 00EE0H 00EE0H 00000H _FARDATA FAR_DATA
- 00EE0H 00EE3H 00004H DEMO5_DATA FAR_DATA
- 00EF0H 00FB9H 000CAH _DATA DATA
- 00FBAH 00FBBH 00002H _CVTSEG DATA
- 00FBCH 00FC1H 00006H _SCNSEG DATA
- 00FC2H 00FC2H 00000H CONST CONST
- 00FC2H 00FC7H 00006H _INIT_ INITDATA
- 00FC8H 00FC8H 00000H _INITEND_ INITDATA
- 00FC8H 00FC8H 00000H _EXIT_ EXITDATA
- 00FC8H 00FC8H 00000H _EXITEND_ EXITDATA
- 00FD0H 00FD0H 00000H IDATA_END IDATA_END
- 00FD0H 00FD0H 00000H UDATA_BEG UDATA_BEG
- 00FD0H 00FD1H 00002H _BSS BSS
- 00FD2H 00FD2H 00000H _BSSEND BSSEND
- 00FE0H 02FDFH 02000H STACK STACK
- 02FE0H 02FE0H 00000H UDATA_END UDATA_END
-
- Program entry point at 0000:0000
-
-
- Listing 4 An example of startup code
- NAME ROMSTART_TEXT
- ;
- ; example startup code for
- ; embedded systems use
- ;
- ; segment classes
- ; High addrs (rom)
- ; 'BOOT'
- ; ""
- ; ""
- ; 'CODE' _TEXT segment
- ;
- ; ram
- ;
- ; 'STACK' UDATA_END segment
- ; 'BSS' IDATA_END and UDATA_BEG segments
- ; 'CONST'
- ; 'DATA' IDATA_BEG segment
- ;
- ; Rename object file output to c0x.obj
- ; where x = S,C,M,L, or H
- ; and locate in project subdirectory
- ; Memory model selection set to 1 all others 0
- ;
- SMALLM EQU 0
- COMPACTM EQU 0
- MEDIUMM EQU 0
- LARGEM EQU 1
- HUGEM EQU 0
-
-
- STACK_SIZE EQU 1000H ;set desired stack size
- _acrtused equ 1 ;satisfy external reference
-
- PUBLIC _acrtused
-
- DGROUP GROUP IDATA_BEG, _DATA, CONST, IDATA_END,\
- UDATA_BEG, _BSS, STACK, UDATA_END
-
- ; this segment marks beginning of rom code
- ROMSTART_BEG SEGMENT BYTE 'CODE'
- ROMSTART_BEG ENDS
-
- IF SMALLM OR COMPACTM
- _TEXT SEGMENT BYTE PUBLIC 'CODE'
- EXTRN _main:NEAR ;main program
- ASSUME CS:_TEXT
-
- ENDIF
- IF MEDIUMM OR LARGEM OR HUGEM
- EXTRN _main:FAR ;main c program
- _TEXT SEGMENT BYTE PUBLIC 'CODE'
- ASSUME CS:_TEXT
- ENDIF
-
- ASSUME DS:DGROUP, SS:DGROUP
- PUBLIC start
-
- start PROC NEAR
- cli
- cld
- ;*****************************************
- ;
- ; do hardware initialization and ram check
- ;
- ;*****************************************
-
- PUBLIC init_ram
-
- init_ram:
- ;************
- ; transfer initialized data from rom to ram
- ;
- MOV BX, SEG IDATA_BEG
- MOV AX, SEG IDATA_END ;data to init.
- sub ax,bx
- mov cl,3
- shl ax,cl
- mov cx,ax
- jcxz no_init_data
-
- ;address of frame # in rom
- mov ax, seg ENDCODE ;needed for ROMLDR
- inc ax ;ram init values begin at
- mov ds,ax ;segment ENDCODE + 1
- mov si,0
- ;address of frame # in ram
- mov ax, seg IDATA_BEG
- mov es,ax
- mov di,0
-
- ; initialize data and const segments
- rep movsw ;word transfer
- no_init_data:
- mov bx, seg UDATA_BEG ;clear bss data
- mov ax, seg UDATA_END
- sub ax, bx
- mov cl,3
- shl ax,cl
- mov cx,ax
- jcxz no_zerodata
- mov es,bx
- mov di,0
- mov ax, 0
- rep stosw
- PUBLIC no_zerodata
- no_zerodata:
- mov ax,DGROUP ; set up stack
- mov ds,ax
- mov ss,ax
- mov sp,OFFSET DGROUP:STACK_TOP
-
- ;********************************************
- ; add code to:
- ; 1) initialize INITDATA Functions (see c0.asm)
- ; 2) capture interrupt vectors 0-4
- ;********************************************
- sti ;enable interrupts
- call _main ;enter main program
- ;********************************************
- ; add code to:
- ; 1) shut down EXITDATA Functions
- ; 2) handle shutdown errors
- ; 3) prepare to restart
- ;********************************************
- jmp start
- start ENDP
- _abort PROC DIST
- PUBLIC _abort
- ;handle error abort
- jmp start
- _abort ENDP
- IF SMALLM OR COMPACTM
- _TEXT ENDS
- ENDIF
- IF MEDIUMM OR LARGEM OR HUGEM
- _TEXT ENDS
- ENDIF
-
- ; segment marks end of rom code
- ;make it non zero, length < 16 bytes
- ENDCODE SEGMENT PARA PUBLIC 'ENDCODE'
- db "ENDROM" ;seg ENDCODE+1 = begin ram
- ENDCODE ENDS
- ;**********************
-
- ;beginning of dgroup and initialized data in ram
-
- IDATA_BEG SEGMENT PARA PUBLIC 'IDATA_BEG'
- IDATA_BEG ENDS
-
-
- _FARDATA SEGMENT PARA PUBLIC 'FAR_DATA'
- _FARDATA ENDS
-
- _DATA SEGMENT PARA PUBLIC 'DATA'
- _DATA ENDS
-
- _CVTSEG SEGMENT WORD PUBLIC 'DATA'
- PUBLIC _RealCvtVector
- _RealCvtVector label word
- _CVTSEG ENDS
- _SCNSEG SEGMENT WORD PUBLIC 'DATA'
- _SCNSEG ENDS
- CONST SEGMENT WORD PUBLIC 'CONST'
- CONST ENDS
-
- _INIT_ SEGMENT WORD PUBLIC 'INITDATA'
- _INIT_ ENDS
- _INITEND_ SEGMENT WORD PUBLIC 'INITDATA'
- _INITEND_ ENDS
-
- _EXIT_ SEGMENT WORD PUBLIC 'EXITDATA'
- _EXIT_ ENDS
- _EXITEND_ SEGMENT WORD PUBLIC 'EXITDATA'
- _EXITEND_ ENDS
-
- IDATA_END SEGMENT PARA PUBLIC 'IDATA_END'
- IDATA_END ENDS
-
- ; end of initialized data
-
- UDATA_BEG SEGMENT WORD PUBLIC 'UDATA_BEG'
- UDATA_BEG ENDS
-
- _BSS SEGMENT WORD PUBLIC 'BSS'
- _BSS ENDS
- _BSSEND SEGMENT BYTE PUBLIC 'BSSEND'
- _BSSEND ENDS
-
- STACK SEGMENT PARA STACK 'STACK'
- DW STACK_SIZE DUP (?)
-
- STACK_TOP LABEL WORD
- STACK ENDS
-
- ; end of uninitialized data in ram
- UDATA_END SEGMENT WORD PUBLIC 'UDATA_END'
- UDATA_END ENDS
- ; bootstrap address for powerup reset
- ; far jump to program beginning
- ; may have to be manually placed in EPROM
- ;BOOTSTRAP SEGMENT AT 0FFFFH
- ; JMP FAR PTR start
- ;
- ;BOOTSTRAP ENDS
- end start
-
- ; End of File
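- The startup sequence above copies the initialized-data image from ROM (which
- begins at segment ENDCODE + 1) into RAM, then zeroes the uninitialized region.
- Segment values count 16-byte paragraphs, so shifting the difference left by 3
- gives the number of 16-bit words for REP MOVSW and REP STOSW. The sketch below
- models that logic in C++ with plain arrays standing in for memory; the function
- names are hypothetical, and it is an illustration, not a replacement for the
- assembly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Equivalent of: sub ax,bx / mov cl,3 / shl ax,cl / mov cx,ax
// A paragraph is 16 bytes = 8 words; the result is truncated to 16 bits,
// as it would be in the AX/CX registers.
std::uint16_t paragraphs_to_words(std::uint16_t seg_begin, std::uint16_t seg_end)
{
    std::uint16_t paragraphs = static_cast<std::uint16_t>(seg_end - seg_begin);
    return static_cast<std::uint16_t>(paragraphs << 3);
}

// Copy the initialized data (rep movsw), then clear the BSS (rep stosw).
// A zero count simply skips a loop, like the jcxz tests in the listing.
void startup_copy(const std::vector<std::uint16_t>& rom_image,
                  std::vector<std::uint16_t>& idata, std::size_t idata_words,
                  std::vector<std::uint16_t>& udata, std::size_t udata_words)
{
    for (std::size_t i = 0; i < idata_words; ++i)
        idata[i] = rom_image[i];
    for (std::size_t i = 0; i < udata_words; ++i)
        udata[i] = 0;
}
```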
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- A Fuzzy-Logic Torque Servo
-
-
- Jack J. McCauley
-
-
- Since receiving a BSEE from UC Berkeley, Jack J. McCauley has been working as
- a Software Engineering Consultant. His specialty is real-time systems with an
- emphasis on servo controls and signal processing. He can be reached at (510)
- 531-1581.
-
-
-
-
- Background
-
-
- The developer of a control system desires a well-behaved system that is stable
- throughout the operating spectrum. In the process of designing and debugging
- the controller, operating regions that exhibit oscillatory or unstable
- operation are avoided. Nonlinearities are typically modeled using a series of
- discrete equations developed for the system.
- A PID (Proportional-Integral-Derivative) controller is an example of a
- closed-loop system. Implemented discretely in incremental form, it computes at
- each sample interval z:
- T(z) = T(z-1) + Kk * (E(z) - E(z-1))
- + Ki * E(z) + Kd * (E(z) - 2*E(z-1) + E(z-2))
- In this linear example, z indexes the controller's discrete time samples, and
- Kk, Ki, and Kd are constants. A nonlinear system designer might attempt to
- modify Kk, Ki, and Kd as a function of the input E(z). This, of course, would
- require Kk(z), Ki(z), and Kd(z) (and possibly Kk(z-1, z-2, ..., z-n), etc.).
- The designer would compute stability and phase margin for a continuous system
- and convert the results to the discrete domain using a discrete transform such
- as the z-transform.
- After the analysis, the designer would code T(z) in assembler or C and port it
- to a target. This article describes how to implement a simple fuzzy-logic
- servo controller in C. Because C is portable, the controller can run either in
- a micro-controller-based environment or on a PC.
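- For comparison with the fuzzy approach, here is a minimal sketch of the standard
- incremental (velocity) form of a discrete PID controller, whose derivative term
- uses the second difference E(z) - 2E(z-1) + E(z-2). The class name and the
- clamp to an 8-bit output range are illustrative assumptions, not part of the
- original design.

```cpp
#include <algorithm>

// Incremental PID: each step adds a correction to the previous output T(z-1).
class IncrementalPid {
public:
    IncrementalPid(double kk, double ki, double kd) : kk_(kk), ki_(ki), kd_(kd) {}

    // Feed the current error E(z); returns the new output T(z).
    double step(double e)
    {
        double t = t_prev_
                 + kk_ * (e - e1_)                 // proportional term
                 + ki_ * e                         // integral term
                 + kd_ * (e - 2.0 * e1_ + e2_);    // derivative term
        t = std::max(0.0, std::min(255.0, t));     // saturate to 8-bit range
        e2_ = e1_;                                 // shift the error history
        e1_ = e;
        t_prev_ = t;
        return t;
    }

private:
    double kk_, ki_, kd_;
    double t_prev_ = 0.0, e1_ = 0.0, e2_ = 0.0;    // T(z-1), E(z-1), E(z-2)
};
```

- Note that even this short loop needs an explicit clamp, which is exactly the
- kind of special-case code the next section discusses.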
-
-
- Escaping Brittleness
-
-
- Brittleness is the bane of the control-system designer. A brittle system is
- one that "breaks" easily. For example, in coding T(z), suppose we define the
- output T(z) as an eight-bit unsigned char to match the eight-bit radix of our
- A/D converter. If some of the intermediate terms of T(z) require nine or ten
- bits, the result is truncated when T(z) is cast to an unsigned char, producing
- incorrect output. We must therefore check for overflow after every operation
- in T(z) and apply corrective logic, such as clamping.
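- A minimal illustration of the truncation problem: adding two eight-bit terms
- and casting the sum back to an unsigned char silently wraps, while computing in
- a wider type and clamping gives the saturated value. The function names are
- hypothetical.

```cpp
#include <cstdint>

// Naive: the int sum is cast back to 8 bits, so 200 + 100 wraps to 44.
std::uint8_t add_wrapping(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a + b);
}

// Safer: operands are promoted to int (no overflow possible for two bytes),
// and the result is clamped to the 0..255 range before narrowing.
std::uint8_t add_saturating(std::uint8_t a, std::uint8_t b)
{
    int sum = a + b;
    return static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
}
```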
- Besides the implementation issues, PID and PI suffer from some serious
- real-world problems. In controls, non-linearities are the rule. Factors such
- as friction and temperature affect the behavior of systems. What appears to be
- stable under one set of conditions yields poor performance under another.
- There are techniques for dealing with such changes. One possibility is to
- modify the PI gain parameters with a sliding-mode scheme. Experience with this
- approach has shown that it works very well as long as the inputs are properly
- bounded. Current thinking, however, indicates that the time invested in
- developing this approach is not repaid in performance. What most people do
- about non-linearities is add a great deal of testing and branching code to
- handle the special cases. In most PID and PI closed-loop systems, the majority
- of the code ends up dedicated to dealing with these anomalies.
-
-
- Fuzzy Logic
-
-
- Fuzzy logic holds promise as a means of handling control-system
- non-linearities. It provides a unique way of looking at control problems. The
- control problem is described as a rule base, with a rule input matrix m(T) and
- a corresponding output function for that rule. When few rules are to be
- evaluated, simple linguistic rules can be substituted instead.
- One might imagine a fuzzy set m(T) that describes the outside temperature as
- "very cold," "cold," "warm," and "hot." The four members of m(T) are
- linguistic labels that give a human description of the region of the
- temperature scale that each member covers. The membership function is
- typically a trapezoid that represents a degree of membership. The degree of
- membership is a real number between 0.0 and 1.0, with a degree of 1.0 meaning
- full membership and a degree of 0.0 indicating no membership. (See Figure 1.)
- Suppose the outside temperature is -50 F. Then according to m(T) the outside
- temperature lies in the domain of "very cold." If the temperature is 12 F,
- the temperature might not be so much "very cold" as it is just "cold."
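- A trapezoidal membership function can be sketched as follows. The breakpoints
- for the four temperature members are invented for illustration only (the
- article's Figure 1 values are not reproduced here); the shoulder members simply
- extend to +/-200 F.

```cpp
// Trapezoid with feet at a and d and a flat top from b to c (a <= b <= c <= d).
struct Trapezoid { double a, b, c, d; };

// Degree of membership of x, from 0.0 (no membership) to 1.0 (full membership).
double membership(const Trapezoid& t, double x)
{
    if (x <= t.a || x >= t.d) return 0.0;          // outside the trapezoid
    if (x >= t.b && x <= t.c) return 1.0;          // on the flat top
    if (x < t.b) return (x - t.a) / (t.b - t.a);   // rising edge
    return (t.d - x) / (t.d - t.c);                // falling edge
}

// Hypothetical breakpoints in degrees F.
const Trapezoid kVeryCold{-200.0, -200.0, -20.0, 10.0};
const Trapezoid kCold{-20.0, 0.0, 30.0, 45.0};
const Trapezoid kWarm{30.0, 45.0, 70.0, 85.0};
const Trapezoid kHot{70.0, 85.0, 200.0, 200.0};
```

- With these values, -50 F is fully "very cold," 12 F is fully "cold," and
- -10 F belongs partly to both (about 0.67 "very cold" and 0.5 "cold"), which
- is the kind of overlap described next.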
- Membership functions almost always overlap in the domain of m(T). Fuzzy logic
- provides a means of dealing with overlapping domains and also domains of
- overlapping output sets. The math allows us to weigh the cumulative effect of
- all rules to generate a crisp output. A crisp output is also called a
- de-fuzzified output.
- The typical fuzzy plant (controller) consists of one or more input fuzzy sets,
- a rule base, and an output fuzzy set. The input sets fuzzify the crisp inputs,
- the rule base combines them, and the output set and the de-fuzzifier function
- convert the result back into a crisp output.
-
-
- An Application
-
-
- This two-input controller is a torque regulator on a DC motor. The controller
- topology could easily be extended to any two-input, single-output fuzzy
- controller.
- We desire to regulate the torque applied to the shaft of a permanent-magnet DC
- brush motor. The torque Tmotor applied to the shaft is a linear function of
- armature current Imotor and (to a lesser extent which will be ignored here)
- motor temperature. Armature current is controlled by a Pulse Width Modulated
- (PWM) amplifier, which varies the duty cycle DA from 0 to 100 percent as a
- linear function of input voltage.
- If a positive voltage is applied to the motor terminals, a positive torque
- will be applied to the motor shaft. If a negative voltage is applied, a
- negative torque will be applied to the motor shaft. The job of the servo is to
- regulate the torque on the motor shaft under varying load conditions. The
- servo applies an adjustment to DA based on the commanded torque input,
- Tcmd(z), and the feedback torque input:
- Tarm = Imotor(z) * Kmotor(z)
- sensed on the motor shaft. (See Figure 2.)
- The eight-bit PWM applies a positive voltage to the terminals if the PWM value
- is between 128 and 255. A negative voltage is applied if the value is between
- 0 and 126. An eight-bit A/D converter and pre-amp sense the feedback torque,
- so the resolution of the feedback torque is set by the eight-bit A/D
- converter. An eight-bit D/A (PWM) on the output keeps the input and output
- conversion resolutions consistent. (See Figure 3.)
-
-
- Fuzzy Controller for Torque
-
-
- The fuzzy controller is implemented in C on an inexpensive high-integration
- micro-controller. A simulation program executes on a PC-AT, to assist
- debugging of the fuzzy servo prior to porting it to the micro-controller.
- In this system the fuzzy plant is a two-input, single-output controller. There
- are three fuzzy sets: the two input sets form the two dimensions of the rule
- base, and the output set supplies the action to be performed for each rule,
- that is, for each combination of input membership functions.
- The first input fuzzy variable, Terror, is the error between the setpoint
- (command torque) at time t and the feedback torque:
- Terror(z) = Tcmd(z) - Tarm(z)
-
- The second input, dTerror/dt, is the "rate of change of error," or how quickly
- the feedback torque at time t is changing with respect to the previous sample
- at time t-1:
- dTerr(z)/dt = Tarm(z) - Tarm(z-1)
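- The two controller inputs can be produced from successive samples with a small
- tracker like the one below, which follows the two equations as written. The
- class is hypothetical, and treating the first sample's derivative as 0 is an
- assumption.

```cpp
// Computes Terror(z) = Tcmd(z) - Tarm(z) and dTerr(z)/dt = Tarm(z) - Tarm(z-1).
class TorqueInputTracker {
public:
    void update(int t_cmd, int t_arm)
    {
        error_ = t_cmd - t_arm;
        derr_dt_ = have_prev_ ? t_arm - prev_ : 0;  // no history on first sample
        prev_ = t_arm;                              // remember Tarm(z) for next time
        have_prev_ = true;
    }
    int error() const { return error_; }
    int derr_dt() const { return derr_dt_; }

private:
    int error_ = 0, derr_dt_ = 0, prev_ = 0;
    bool have_prev_ = false;
};
```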
- The output function m(DA) determines the action to be performed upon
- evaluation of the rules. For example, if the instantaneous feedback torque is
- at the setpoint:
- Tcmd(z) ~ Tarm(z)
- but the feedback torque derivative is non-zero:
- Tarm(z) > Tarm(z-1)
- we might wish to apply a slight adjustment to DA so that Terror(z) and
- dTerr(z)/dt remain zero. The amount of adjustment applied to DA is determined
- by the inputs and also the output control fuzzy set m(DA). Which membership
- function of m(DA) to apply is determined by the input membership function
- space and the operator-inferred description of the control problem in the
- rule-base table. The idea is to maintain zero error and zero error derivative:
- dTerr(z)/dt = Terror(z) = 0
- The number of inputs and the dimension of each determine the rule-base
- matrix. For example, we have two input fuzzy sets, m(Terror) and m(dTerr/dt),
- with five membership functions each. This yields a 5x5, or 25-rule,
- controller. The action performed when each input rule is evaluated is the
- output membership function of m(DA) assigned to that rule. For
- example, rule (3, 3) is:
- IF Terror == M
- AND dTerr/dt == -M
- THEN DA == Z
- and rule (3, 2) is:
- IF Terror == M
- AND dTerr/dt == Z
- THEN DA == -M
- A literal evaluation of rule (3, 3) would read, "If the feedback torque is less
- than the setpoint, and the feedback torque is approaching the setpoint at a
- medium rate, then apply zero bias to the PWM." The key to fuzzy logic control
- is that the degrees of membership incorporate both the amount of bias to apply
- and the degree of truth of each rule. (See Figure 4.)
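- The rule base in Listing 1 is a 5x5 table indexed as rule[row].table[column],
- with the error derivative selecting the row and the torque error selecting the
- column. The sketch below mirrors that table with labels instead of pointers;
- reading "rule (3, 2)" as (Terror column, dTerr row) counted from 0 is an
- interpretation that happens to match both rules quoted above.

```cpp
// Output membership labels (NM and NL are the negative members -M and -L).
enum class Out { NL, NM, Z, M, L };

// Rows: dTerr/dt member  0 = L, 1 = M, 2 = Z, 3 = -M, 4 = -L
// Columns: Terror member 0 = -L, 1 = -M, 2 = Z, 3 = M, 4 = L
const Out kRules[5][5] = {
    {Out::Z, Out::NM, Out::NM, Out::NM, Out::NL},
    {Out::M, Out::Z,  Out::NM, Out::NM, Out::NL},
    {Out::L, Out::M,  Out::Z,  Out::NM, Out::NL},
    {Out::L, Out::M,  Out::M,  Out::Z,  Out::NM},
    {Out::L, Out::M,  Out::M,  Out::M,  Out::Z },
};

// Look up the consequent for a (Terror column, dTerr row) pair.
Out rule_output(int terror_col, int dterr_row)
{
    return kRules[dterr_row][terror_col];
}
```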
-
-
- When a Rule Fires
-
-
- A rule fires when a nonzero result is returned upon evaluation of the rule.
- The degree to which the rule is true is incorporated in the membership
- functions. If an input value lies within a membership function's trapezoid,
- the input is valid for that member. A rule fires when both inputs to the rule
- are nonzero. A rule is evaluated using fuzzy operators that are similar in
- concept to Boolean operators (the listing implements them with C's ternary
- ?: operator). The operators logically follow the AND, OR, and NOT constructs.
- Rules and rule bases can be built using the operators exactly like those in
- standard Boolean logic, but because membership functions deal with degrees of
- truth we must have a mechanism for evaluating the operators:
- AND == MINIMUM(Input x, Input y)
- OR == MAXIMUM(Input x, Input y)
- NOT == 1.0 - Input x
- The AND operator returns the minimum of its inputs, and the OR operator
- returns the maximum. The NOT operator returns the complement of its input x,
- which is roughly the analog of ~x in C.
- In this application the AND operator is used exclusively to operate on the
- rule table inputs, and the normalized 0.0 -> 1.0 values are actually
- normalized eight bit values from 0 to 255.
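- With degrees scaled to 0..255, the three operators reduce to the one-liners
- below (a sketch equivalent to the FUZ_AND, FUZ_OR, and FUZ_NOT macros in
- Listing 1). On this scale, NOT is exactly the low byte of ~x, which is why the
- comparison to ~x holds.

```cpp
#include <algorithm>

constexpr int kFuzzyOne = 255;  // plays the role of 1.0

int fuzzy_and(int x, int y) { return std::min(x, y); }  // MINIMUM
int fuzzy_or(int x, int y)  { return std::max(x, y); }  // MAXIMUM
int fuzzy_not(int x)        { return kFuzzyOne - x; }   // 1.0 - x
```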
- As an example, let dTerr(z)/dt = -11 and Terror(z) = 90.
- In evaluating the fired rules (3, 3) and (3, 2) shown above, we first
- determine whether the inputs lie within the domains of m(Terror) and
- m(dTerror/dt) by examining their membership functions. If the rule fires, we
- take the alpha cut for each of the input membership functions M and Z. The
- alpha cut graphically slices the top off the output controlling membership
- function. The AND operator returns the lesser of the two alpha cuts for the
- inputs m(dTerror/dt, L) and m(Terror, M) for use in the slicing operation.
- Because the membership functions overlap in domain, more than one rule will
- typically fire. When this occurs, fuzzy logic provides a mathematical method
- for dealing with multiple rule firings. The output of each rule is an output
- membership function that has been alpha-cut, and it is then graphically ORed
- with the accumulated output membership functions of the preceding rules. The
- resultant resembles a "shadow" of each fired rule, with the accumulated output
- overlapping each preceding fired rule. Mathematically, the accumulation is the
- fuzzy OR operator (MAXIMUM) applied to the current rule that fires and the
- accumulated fired rules. (See Figure 5.)
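- The alpha-cut-and-shadow step for a single rule can be sketched as below: the
- rule's strength is the AND (minimum) of its two input degrees, the output
- membership function is clipped at that level, and the clipped shape is ORed
- (maximum) into the running result. This follows the logic of fuzzify() in the
- listing, with SKIP fixed at 1; the type and function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t kSize = 256;       // one entry per table index
using Fuzzy = std::array<int, kSize>;    // degrees scaled to 0..255

// Returns true if the rule fired (nonzero alpha cut).
bool fire_rule(int deg_terror, int deg_dterr, const Fuzzy& out_mf, Fuzzy& result)
{
    int alpha = std::min(deg_terror, deg_dterr);   // fuzzy AND of the inputs
    if (alpha == 0)
        return false;                              // rule did not fire
    for (std::size_t k = 0; k < kSize; ++k)        // clip, then shadow with MAX
        result[k] = std::max(result[k], std::min(alpha, out_mf[k]));
    return true;
}
```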
-
-
- De-fuzzification
-
-
- The accumulated result of all firing rules occupies a two-dimensional space
- that is a fuzzy set with one membership function. The resultant must be
- converted to a real-world crisp result that takes all applied rules into
- account. To do this, a spatial average is taken that computes the center of
- area (COA) of the function. COA is one of several methods for computing the
- average; it is used in this application because it executes quickly.
- COA = (25*18 + 25*31 + 128*100 + 128*127 + 128*140)
- / (25 + 25 + 128 + 128 + 128) = 48201 / 434 = 111 (integer division)
- So the DA value written to the PWM/D/A is 111.
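- The COA calculation above is a weighted average of sample positions, each
- weighted by its accumulated membership degree. A minimal sketch using the same
- integer arithmetic as defuzzify_COA() in the listing (the function name is
- hypothetical); with the five weights and positions shown, it evaluates to 111.

```cpp
#include <cstddef>

// Center of area over n discrete samples: positions x[i], weights w[i].
long center_of_area(const long* x, const long* w, std::size_t n)
{
    long num = 0, den = 0;
    for (std::size_t i = 0; i < n; ++i) {
        num += w[i] * x[i];   // moment of each sample
        den += w[i];          // total area
    }
    return den ? num / den : 0;  // integer division; 0 if nothing fired
}
```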
-
-
- C Language Implementation
-
-
- The membership functions are implemented with table lookup. For example,
- m(dTerror/dt) (in oz-in) is defined as:
- -L WHEN dTerr(z)/dt <= -35
- -M WHEN -50 <= dTerr(z)/dt <= 0
- Z WHEN -10 <= dTerr(z)/dt <= 7
- M WHEN 0 <= dTerr(z)/dt <= 50
- L WHEN dTerr(z)/dt >= 10
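- Table lookup means each membership function is precomputed once into a
- 256-entry array, so evaluating a degree at run time is a single index. A sketch
- of filling such a table from trapezoid breakpoints given in table-index units
- follows; the function name is hypothetical, and the article itself downloads
- its tables rather than computing them.

```cpp
#include <array>

// Build a 0..255-scaled trapezoid table with feet a, d and top b..c
// (0 <= a <= b <= c <= d <= 255, all in table-index units).
std::array<int, 256> make_trapezoid_table(int a, int b, int c, int d)
{
    std::array<int, 256> t{};
    for (int i = 0; i < 256; ++i) {
        if (i <= a || i >= d)      t[i] = 0;                       // outside
        else if (i >= b && i <= c) t[i] = 255;                     // full
        else if (i < b)            t[i] = 255 * (i - a) / (b - a); // rising
        else                       t[i] = 255 * (d - i) / (d - c); // falling
    }
    return t;
}
```

- At run time the degree of membership is then just t[index].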
- All 25 rules are evaluated, and the rules that fire produce an output
- according to the COA de-fuzzifier function. The output array stores the result
- of each rule firing along with the accumulated result of the preceding rules.
- Conflicts are resolved using the fuzzy OR (MAXIMUM) operator. The resultant
- outputs contribute to a weighted sum that yields the COA output. This strategy
- allows quick computation of a crisp value using eight-bit multiplies.
- For a micro-controller application, it was determined by experimentation that
- a loop repetition rate of at least 1 kHz would be required for satisfactory
- performance. Command setpoints were entered through a serial port, and an
- oscilloscope measured the response of the loop. Figure 6 shows the measured
- step response, which is quite adequate.
-
-
- Conclusion
-
-
-
- Fuzzy logic control seems to hold promise as an alternative to standard PID
- control. Rigid modeling is not required because the servo tuning is done via
- the fuzzy set membership functions and the rules. The rules incorporate
- Boolean statements and operator-inferred control rather than strict
- theoretical modeling. The membership functions allow tuning of the control
- system by altering the space that they occupy, and the placements of the
- output membership functions provide the control action to be performed by the
- rule. The rules use Boolean-like operators that infer a control action
- depending upon the degree of truth of the rule firing. A rule fires if the
- fuzzified inputs return a non-zero degree of truth, referred to as the alpha cut
- for each input.
- Fuzzy logic is not as brittle as standard PID control because of the natural
- language approach to the control problem. Second-order effects such as
- temperature can be incorporated directly in the rule by logically appending
- another fuzzy set for temperature onto the rule. Suppose that we know that an
- increase in temperature causes the torque to change much more rapidly. By
- creating a rule base that incorporates temperature, we compensate for the
- change, as in:
- IF Terror == M
- AND dTerr/dt == -M
- AND Temperature == L
- THEN DA == M
- In PID control, second-order variables such as temperature are not so easily
- incorporated into the model.
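- Extending a rule with a third antecedent changes only its firing strength: the
- fuzzy AND of three inputs is the minimum of all three degrees. A sketch for the
- temperature rule above, with a hypothetical function name and degrees scaled
- to 0..255:

```cpp
#include <algorithm>

// Strength of: IF Terror == M AND dTerr/dt == -M AND Temperature == L
// The rule fires only if every antecedent has a nonzero degree.
int rule_strength(int deg_terror, int deg_dterr, int deg_temp)
{
    return std::min({deg_terror, deg_dterr, deg_temp});
}
```

- The resulting strength is used exactly like a two-input alpha cut, so the
- rest of the fuzzify and COA code does not change.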
- The fuzzy controller is also less prone to radix overflow and underflow
- problems. In this application, the controller was implemented using eight-bit
- LUTs. All mathematical operations were cast to 32-bit long, thus preventing
- overflow. Because of the averaging nature of COA, it is inherently safe from
- overflow.
- Figure 1 Fuzzy set for temperature
- Figure 2 Feedback configuration of a typical torque servo
- Figure 3 Fuzzy plant for torque servo
- Figure 4 Fuzzifier rules and membership functions for 8-bit system
- Figure 5 Example of Fuzzify -> Defuzzify when two rules fire
- Figure 6 An illustration of de-fuzzification
-
- Listing 1 C declaration code for rule table and membership functions
- /**************************************************************
- File: fuzzy.h
- Date: 4/3/93
- Author: Jack J. McCauley
- Header file for fuzzy.c
- **************************************************************/
-
- /* membership function size */
- #define TORQUE_MEMBERS 5
- #define DER_MEMBERS 5
-
- /* loop frequency */
- #define LOOP_HZ 1
-
- /* integrator constant z-1*/
- #define SKIP 1
-
- /* normalized value */
- #define NORMAL 255
-
- /* Current ma */
- #define MAX_TORQUE 1000
- #define MIN_TORQUE -1000
-
- /* derivative */
- #define MAX_DERI (LOOP_HZ*MAX_TORQUE)
- #define MIN_DERI (LOOP_HZ*MIN_TORQUE)
-
- /* pwm 1/10 % */
- #define MAX_PWM 255
- #define MIN_PWM 0
-
- /* Max and Min */
- #define MIN_ERROR 0
- #define MAX_ERROR 255
-
- /* size of all arrays */
- #define ARRAY_SIZE 256
-
- /* represent a normalized 0 -> 255 = 0.0 -> 1.0 */
- #define FUZZY_RADIX NORMAL
-
-
- /* fuzzy max (ternary) operator */
- #define FUZ_MAX( x, y ) (((x) > (y)) ? (x) : (y))
-
- /* fuzzy min (ternary) operator */
- #define FUZ_MIN( x, y ) (((x) < (y)) ? (x) : (y))
-
- /* fuzzy AND operator */
- #define FUZ_AND( x, y ) FUZ_MIN( x, y )
-
- /* fuzzy OR operator */
- #define FUZ_OR( x, y ) FUZ_MAX( x, y )
-
- /* fuzzy complement operator */
- #define FUZ_NOT( x ) ( FUZZY_RADIX - (x) )
-
- /* create the membership function space */
- /* first is the torque error fuzzy set */
- struct s_torq_members {
- int neg_large[ARRAY_SIZE];
- int neg_med[ARRAY_SIZE];
- int zero[ARRAY_SIZE];
- int pos_med[ARRAY_SIZE];
- int pos_large[ARRAY_SIZE];
- float slope;
- float intercept;
- } torq_members;
-
- /* next is the rate of change fuzzy set torque error */
- struct s_deri_members {
- int pos_large[ARRAY_SIZE];
- int pos_med[ARRAY_SIZE];
- int zero[ARRAY_SIZE];
- int neg_med[ARRAY_SIZE];
- int neg_large[ARRAY_SIZE];
- float slope;
- float intercept;
- } deri_members;
-
- /* last is the PWM fuzzy set ouput */
- struct s_pwm_members {
- int neg_large[ARRAY_SIZE];
- int neg_med[ARRAY_SIZE];
- int zero[ARRAY_SIZE];
- int pos_med[ARRAY_SIZE];
- int pos_large[ARRAY_SIZE];
- float slope;
- float intercept;
- } pwm_members;
-
- /* define output rule table members */
- #define L pwm_members.pos_large
- #define M pwm_members.pos_med
- #define Z pwm_members.zero
- #define NM pwm_members.neg_med
- #define NL pwm_members.neg_large
-
- /* finally the fuzzy sets */
- /* rate of change of torque error */
- int *dTerr_dt[] = {
-
- deri_members.pos_large ,
- deri_members.pos_med ,
- deri_members.zero ,
- deri_members.neg_med ,
- deri_members.neg_large ,
- };
- /* torque (OZ-in) */
- int *Terror[] = {
- torq_members.neg_large ,
- torq_members.neg_med ,
- torq_members.zero ,
- torq_members.pos_med ,
- torq_members.pos_large ,
- };
-
- /* create the rule table and allocate space */
- struct s_rule {
- int *table[TORQUE_MEMBERS];
- } rule[DER_MEMBERS] =
- /* rows: dTerr/dt (L, M, Z, -M, -L); columns: Terror (-L, -M, Z, M, L) */
- {{Z, NM, NM, NM, NL },
- {M, Z, NM, NM, NL },
- { L, M, Z, NM, NL },
- { L, M, M, Z, NM },
- { L, M, M, M, Z }};
-
- The fuzzy membership functions for all input and output fuzzy sets are stored
- in lookup tables. Fuzzy variables, with roughly three digits of precision, are
- normalized to integers between 0 and 255, corresponding both to an 8-bit
- converter and to a normalized fuzzy value of 0.0 to 1.0. Lookup tables
- increase the loop bandwidth and allow the servo to run with few floating-point
- calculations.
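- Before a table can be indexed, each physical input must be mapped onto 0..255.
- The listing does this with the straight line through (MIN, 0) and (MAX, 255)
- from calc_slope() and calc_intercept(). A sketch of the same mapping, with a
- hypothetical function name, that clamps before converting so an out-of-range
- input can never index outside the table:

```cpp
// Map value in [lo, hi] to a table index in [0, size - 1], truncating
// like the listing's casts and clamping out-of-range inputs.
int to_table_index(double value, double lo, double hi, int size)
{
    double scaled = (value - lo) * (size - 1) / (hi - lo);
    if (scaled < 0.0)
        return 0;
    if (scaled > size - 1)
        return size - 1;
    return static_cast<int>(scaled);
}
```

- For the torque range of -1000 to 1000 used in Listing 1, -1000 maps to 0,
- 0 maps to 127, and 1000 maps to 255.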
-
- /****************************************************
- File: fuzzy.c
- Date: 4/3/93
- Author: Jack J. McCauley
- fuzzy torque controller for motor
- ****************************************************/
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <conio.h>
- #include "proto.h"
- #include "fuzzy.h" /* Listing 1 */
-
- /****************************************************
- Routine: calc_slope
- Date: 4/3/93
- Author: Jack J. McCauley
- calcs line slope of two points in a plane
- *****************************************************/
- float calc_slope( float x1, float x2, float y1, float y2 ) {
-
- float slope;
-
- if ( x1 == x2 )
- slope = 10000000.0;
- else if( y1 == y2 )
- slope = 0.0;
-
- else {
- slope = (y1 - y2)/(x1 - x2);
- if( slope > 100000000.0 )
- slope = 100000000.0;
- }
- return( slope );
- }
- /*****************************************************
- Routine: calc_intercept
- Date: 4/3/93
- Author: Jack J. McCauley
- calcs line intercept of two points in a plane
- *****************************************************/
- float calc_intercept( float x1, float x2, float y1, float y2 ) {
-
- float intercept;
-
- if ( x1 == x2 )
- intercept = 100000000.0;
- else {
- intercept = (y2*x1 - y1*x2)/(x1 - x2);
- if( intercept > 100000000.0 )
- intercept = 100000000.0;
- }
- return( intercept );
- }
- /* end calc_intercept */
-
- /**********************************************************
- Routine: down_load()
- Date: 4/3/93
- Author: Jack J. McCauley
- initialize fuzzy membership function tables from serial port DRIVER not shown
- ***********************************************************/
- void down_load( int *membership_function, int len )
- {
- int k;
- char buff[32];
-
- /* short, tight loop */
- for( k=0; k< len; k++ ) {
- /* get ASCII string; stdin stands in for the serial driver (not shown) */
- if( fgets( buff, sizeof buff, stdin ) == NULL )
- break;
- *membership_function++ = atoi( buff );
- }
- }
- /* end down_load() */
- /********************************************************
- Routine: fuzzy_init
- Date: 4/3/93
- Author: Jack J. McCauley
- initialize fuzzy membership function tables
- *********************************************************/
- void fuzzy_init( void )
- {
- int k;
- long slope;
- long intercept;
- int val;
-
-
- /*
- There are several ways to generate these tables:
-
- 1) The easiest is to simply use ROM space and store them permanently in
- memory. fuzzy.h would need to be modified to reflect the static ROM
- declarations for each fuzzy set, and the values would be initialized by
- attaching initializers directly to the data arrays.
-
- 2) Download the fuzzy sets through the serial port, as I did when tuning
- this servo; that code is included here (of course, the ROM version of the
- system needs option 1).
-
- 3) Store the slope-intercept form of each membership function and
- calculate the line slopes and intercepts
- */
-
- /*
- Membership functions are downloaded through the serial port from a file in
- ASCII format, one set member at a time. In this system I used a spreadsheet
- and a graphical interface to draw the membership functions with a mouse.
- This aided substantially in tuning the servo. For the ROM version of the
- system, I wrote a small program to append the ROM static membership tables to
- fuzzy.h.
- */
- down_load( deri_members.neg_large, ARRAY_SIZE);
- down_load( torq_members.neg_large, ARRAY_SIZE);
- down_load( pwm_members.neg_large, ARRAY_SIZE);
- down_load( deri_members.neg_med, ARRAY_SIZE);
- down_load( torq_members.neg_med, ARRAY_SIZE);
- down_load( pwm_members.neg_med, ARRAY_SIZE);
- down_load( deri_members.zero, ARRAY_SIZE);
- down_load( torq_members.zero, ARRAY_SIZE);
- down_load( pwm_members.zero, ARRAY_SIZE);
- down_load( deri_members.pos_med, ARRAY_SIZE);
- down_load( torq_members.pos_med, ARRAY_SIZE);
- down_load( pwm_members.pos_med, ARRAY_SIZE);
- down_load( deri_members.pos_large, ARRAY_SIZE);
- down_load( torq_members.pos_large, ARRAY_SIZE);
- down_load( pwm_members.pos_large, ARRAY_SIZE);
-
- /* slope intercept line calculation */
-
- /* these line equations are used for scaling the
- crisp values from the above arrays */
- deri_members.slope = calc_slope(MIN_DERI, MAX_DERI, 0, ARRAY_SIZE-1 );
- deri_members.intercept = calc_intercept(MIN_DERI, MAX_DERI, 0, ARRAY_SIZE-1);
-
- torq_members.slope = calc_slope( MIN_TORQUE, MAX_TORQUE, 0, ARRAY_SIZE-1 );
- torq_members.intercept = calc_intercept(MIN_TORQUE, MAX_TORQUE, 0,
- ARRAY_SIZE-1);
-
- pwm_members.slope = calc_slope(0, ARRAY_SIZE-1, MIN_PWM, MAX_PWM);
- pwm_members.intercept = calc_intercept(0, ARRAY_SIZE-1, MIN_PWM, MAX_PWM);
- /*
- for( k=0; k<ARRAY_SIZE; k++ )
- printf("%d %4d %4d %4d %4d %4d\n", k, deri_members.pos_large[k],
- deri_members.pos_med[k], deri_members.zero[k], deri_members.neg_med[k],
- deri_members.neg_large[k]);
- */
- }
- /* end FUZZINIT */
-
-
- /***********************************************
- Routine: defuzzify_COA
- Date: 4/3/93
- Author: Jack J. McCauley
- defuzzify using COA
- **********************************************/
- float defuzzify_COA( int *output ) {
-
- static long k;
- static long numerator, denominator, val;
-
- numerator = 0;
- denominator = 0;
-
- for ( k=0; k<ARRAY_SIZE; k+=SKIP ) {
-
- val = (long)*output;
- output+=SKIP;
- /* look for non-zero values and include in our COA calc */
- if( val ) {
- denominator += val;
- /* ROM based later */
- numerator += val *(long)k;
- }
- }
- /* divide if non-zero x/D */
- if( denominator ) {
- /* return crisp value */
- /* Could be converted from float if desired */
- numerator = (long)((float)(numerator/denominator) * pwm_members.slope +
- pwm_members.intercept);
- return( numerator );
- } else
- return( 0 );
-
- }
- /* end COA calc */
-
- /******************************************************
- Routine: fuzzify
- Date: 4/3/93
- Author: Jack J. McCauley
- fuzzifier for our servo
- ******************************************************/
- int fuzzify( int derr_dt, int err,
- int *p_Terror, int *p_dTerror,
- int *p_out, int *resultant ) {
-
- static int k, val, alpha_cut, temp, *moe;
-
- /* COA method */
- /*
- Get the alpha cut for error and derivative. Use the AND (min) operator to cut
- the output; a non-zero fuzzy MIN indicates that a rule has fired. Write the cut
- to the result array by "shadowing" the existing contents using the MAX operator.
- */
-
- /* normalize the output derivative */
-
- moe = resultant;
-
- /* Find out if the rule fired. If the rule fired then get the alpha cut */
- if( (alpha_cut = FUZ_AND( p_dTerror[derr_dt], p_Terror[err] )) != 0 ) {
- for ( k=0; k<ARRAY_SIZE; k+=SKIP ) {
- /*
- An interesting effect will be noticed if SKIP is set greater than 1.
- In this system, setting SKIP to, say, 2 or 3 will in most circumstances
- yield the same defuzzified crisp output, provided the slopes of the
- membership functions are not too steep. The other benefit is that
- execution speed increases greatly.
- */
- /* create shadow */
- val = *p_out;
- /* don't get bit by shortcuts */
- val = FUZ_MIN( alpha_cut, val );
- temp = *resultant;
- * resultant = FUZ_MAX( temp, val );
- resultant+=SKIP;
- p_out+=SKIP;
- }
- /* rule fired */
- return( 1 );
- } else
- /* rule didn't fire */
- return( 0 );
- }
- /* end FUZZIFICATION calc */
-
- /********************************************************
- Routine: servo_torque
- Date: 4/3/93
- Author: Jack J. McCauley
- fuzzy servos to torque set point
- ********************************************************/
- /*
- Routine would run as an ISR attched to a timer interrupt passed values are
- read
- from and A/D convertor and command torque (serial port etc..)
- */
- long servo_torque( int error, int derr_dt )
- {
- /* statics for speed */
- static int row, column;
-
- /* pointers to tables */
- static int *p_out, *p_dTerr_dt, *p_Terror;
-
- /* COA accumulator array */
- static int resultant[ARRAY_SIZE];
-
- /* zero fill the array */
- for( row = 0; row<ARRAY_SIZE; row++ )
- resultant[row] = 0;
-
- /* normalize torque error to the lookup-table index range */
- error = (long)((float)error * torq_members.slope + torq_members.intercept);
-
- /* normalize error derivative*/
-
- derr_dt = (long)((float)derr_dt*deri_members.slope + deri_members.intercept);
-
- /* MAX and MIN error */
- if( error > MAX_ERROR )
- error = MAX_ERROR;
- else if( error < MIN_ERROR )
- error = MIN_ERROR;
-
- /* MAX and MIN derivative (now a table index, like error) */
- if( derr_dt > MAX_ERROR )
- derr_dt = MAX_ERROR;
- else if( derr_dt < MIN_ERROR )
- derr_dt = MIN_ERROR;
-
- /* traverse the rule table and evaluate the rules */
- for( row = 0; row < DER_MEMBERS; row++ ) {
-
- /* get pointers to tables from data structures */
- p_dTerr_dt = dTerr_dt[row];
- for( column = 0; column < TORQUE_MEMBERS; column++ ) {
-
- /* get pointers to tables from data structures */
- /* get the output membership function for that rule evaluation */
- p_out = rule[row].table[column];
- p_Terror = Terror[column];
- /* fuzzify the rule */
- fuzzify( derr_dt, error, p_Terror, p_dTerr_dt, p_out, resultant );
- }
- }
-
- /* defuzzify using COA */
- return( defuzzify_COA( resultant ) );
-
- /* return the fired rules */
- }
-
- /*END */
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Weight Reduction Techniques in C++
-
-
- Randy Kamradt
-
-
- Randy Kamradt has been programming in C/C++ for the past seven years. He is
- currently working for TEAM Software, a fashionably lean consulting company
- developing an integrated database package for the Ventura County
- Superintendent of Schools, in Ventura, California.
-
-
-
-
- Introduction
-
-
- I have been working on a large-scale database downsizing project for the
- Ventura County School District since the beginning of 1993. Recently we turned
- over to the users the first application, a teaching-credential tracking
- database. Instead of the ticker tape parade we had expected, all they did was
- comment on how slow it seemed to run. Even though we used all the latest buzz
- words -- such as "object-oriented" and "client-server" -- we had to admit it
- was very sluggish. So we began an intense optimization effort, putting our
- application on a crash course of exercise and diet, to get it down to size and
- up to speed. This article deals with the potential problems facing
- object-oriented programming in C++ and some of the solutions we used.
- One of the major advantages to object-oriented programming is that it can hide
- complexity. Very complicated sub-systems can be wrapped by classes. These
- classes provide well defined interactions, making their objects relatively
- immune to environmental factors. The downside to this is that the application
- programmer may not be aware of all the ramifications of using an object.
- Objects may encapsulate the acquisition of system resources, or they may
- create sub-objects. Just declaring a local variable can be a costly procedure.
- One problem that faced us early on was from a class of the Commonbase Database
- class library, DBTable. Just creating a variable of type DBTable had the side
- effect of opening a communication channel (socket) to the database server. We
- were allocating DBTable objects from free store and in rare cases they never
- got deleted. Since our networking software had only 20 sockets available, the
- application would run fine for a while and then run out of sockets. This is
- actually an old problem, familiar to C programmers, but with a new twist: more
- than memory might be allocated by creating a user-defined type. Since this
- caused the program to die it was fixed early on, but other similar problems
- existed. We had to automate the creation of each DBTable, so as not to create
- it until it was needed, and to delete it as soon as possible. It could not be
- left up to the application programmer.
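- One way to take that decision away from the application programmer is to wrap
- the scarce resource so that it is acquired on first use and released in the
- wrapper's destructor. The sketch below uses hypothetical classes (it is not the
- Commonbase DBTable interface) and a counter standing in for the pool of 20
- sockets.

```cpp
#include <memory>

// Stand-in for a scarce resource; open_count tracks live instances.
class Socket {
public:
    Socket() { ++open_count; }
    ~Socket() { --open_count; }
    Socket(const Socket&) = delete;             // not copyable
    Socket& operator=(const Socket&) = delete;
    static int open_count;
};
int Socket::open_count = 0;

// Opens its socket only when a query needs it, and always closes it
// when the table object is destroyed, even if the caller forgets.
class LazyTable {
public:
    void query() { channel(); /* ... send the query over the channel ... */ }
    bool is_open() const { return sock_ != nullptr; }

private:
    Socket& channel()
    {
        if (!sock_)
            sock_.reset(new Socket);            // acquire on demand
        return *sock_;
    }
    std::unique_ptr<Socket> sock_;              // released automatically
};
```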
- Another side of complexity hiding is that an object may be a front for many
- sub-objects. One of our objects, BMOJoin (Business Model Object Join),
- contained a list of DBTables. A join is database jargon for a relational
- combination of tables. Since a DBTable holds one socket, a BMOJoin can hold
- many. To add to this problem, another object, BMOIterator, has a pointer to a
- BMOJoin. As I will explain later, the BMOIterator was the key to controlling
- all these resources, and minimizing their allocation.
- Another class, ACond, represents a logical condition. It is implemented as an
- expression tree, so even a simple assignment of one ACond to another could be
- costly. Again this is actually an old problem, the only difference being that
- in C such copying can be done only by calling a function, while in C++ the
- function can be invoked by an assignment operator. C programmers know that
- calling a function can take time and resources, but using built-in operators
- always takes (relatively small) constant time and never takes system
- resources. These assumptions go out the window in C++.
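- To make the point concrete, here is a minimal sketch of an expression-tree
- class whose copy assignment performs a deep copy. Node, Clone, and Cond are
- hypothetical (ACond's real layout is not shown in the article). The static
- counter simply records how many nodes a single "=" allocates.
```cpp
#include <cassert>
#include <string>

// Hypothetical expression-tree node for a condition such as (a AND b) OR c.
struct Node {
    std::string text;
    Node *left;
    Node *right;
    static int allocations;  // counts every node ever created

    Node(const std::string &t, Node *l = nullptr, Node *r = nullptr)
        : text(t), left(l), right(r) { ++allocations; }
    ~Node() { delete left; delete right; }
    Node(const Node &) = delete;
    Node &operator=(const Node &) = delete;
};
int Node::allocations = 0;

// Duplicates a subtree: one allocation per node.
Node *Clone(const Node *n) {
    if (n == nullptr) return nullptr;
    return new Node(n->text, Clone(n->left), Clone(n->right));
}

// Hypothetical condition class in the spirit of ACond.
class Cond {
public:
    explicit Cond(Node *r) : root(r) {}
    Cond(const Cond &other) : root(Clone(other.root)) {}
    // Reads like a cheap built-in assignment, but copies the whole tree.
    Cond &operator=(const Cond &other) {
        if (this != &other) {
            Node *copy = Clone(other.root);
            delete root;
            root = copy;
        }
        return *this;
    }
    ~Cond() { delete root; }
private:
    Node *root;
};
```
- Assigning a five-node condition performs five separate free-store
- allocations, although the source line looks like a single assignment.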
- Free-store usage is another problem. Just opening a window and tying it to a
- database in our application takes thousands of memory allocations, some of
- them just for a few bytes. Memory allocation is the lifeblood of C++
- programming because objects should be made autonomous, and cannot rely on
- outside sources such as local or global variables for their internal memory
- space.
- A string class, for instance, typically contains a pointer to memory
- containing the string (see Figure 1). That memory must come from free store if
- the class is to be considered general purpose. If the memory came from outside
- the class, the string could not delete it when it was done, and the user would
- have to delete it if necessary. The issue of pointer ownership is important in
- making a safe application, and helps alleviate some of the common C pointer
- troubles such as deleting memory twice, or forgetting to delete it at all. It
- also means that memory allocation and copying may be done more often than
- absolutely necessary.
- Another problem introduced with object-oriented programming is
- over-generalization. With base classes one can provide the common subset of
- functions available from a set of derived classes. If the programmer uses the
- base class, the full set of functions may not be available. An example of this
- is the use of a list object. A list object might be represented either by a
- linked list or by an array. If the programmer uses a list and needs to get to
- the n-th object, indexing may not be an option. The programmer will have to
- iterate through the list, keeping a count. If the programmer uses an array
- object, indexing is available.
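- The cost difference can be sketched as follows, assuming a hypothetical base
- class Sequence that offers only sequential traversal. Through the base
- interface, reaching the n-th element takes n+1 steps. The derived class that
- exposes indexing reaches it in one.
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical general-purpose interface: sequential traversal only.
class Sequence {
public:
    virtual ~Sequence() {}
    virtual void Reset() = 0;
    virtual bool Next(int &out) = 0;  // false when exhausted
};

// Array-backed implementation: it could index directly, but callers
// holding only a Sequence& cannot use that ability.
class ArraySequence : public Sequence {
public:
    explicit ArraySequence(const std::vector<int> &v) : items(v), pos(0) {}
    void Reset() override { pos = 0; }
    bool Next(int &out) override {
        if (pos >= items.size()) return false;
        out = items[pos++];
        return true;
    }
    int operator[](std::size_t i) const { return items[i]; }  // one step
private:
    std::vector<int> items;
    std::size_t pos;
};

// Through the base class the only option is to iterate, keeping a count.
// 'steps' reports how many elements were visited.
bool NthViaBase(Sequence &s, std::size_t n, int &out, std::size_t &steps) {
    s.Reset();
    steps = 0;
    int value;
    while (s.Next(value)) {
        ++steps;
        if (steps == n + 1) { out = value; return true; }
    }
    return false;
}
```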
- Another example of this is the use of a complex number class. The operation
- (2 + 0i) * (6 + 0i) can be calculated much more cheaply as 2 * 6. Because the
- more general form is used, the optimization for a special case is missed. The
- ability to generalize is a major plus for C++, but can limit optimization.
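- Written out with the usual product rule, the general form costs four
- multiplications and two additions (counting the subtraction). The special
- case needs one multiplication:
```latex
\begin{aligned}
(a+bi)(c+di) &= (ac - bd) + (ad + bc)\,i \\
(2+0i)(6+0i) &= (2\cdot 6 - 0\cdot 0) + (2\cdot 0 + 0\cdot 6)\,i = 12 + 0i
\end{aligned}
```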
-
-
- Code Bloat
-
-
- Another issue I deal with here is code bloat. I used to think a program was
- big if I had to leave small model (greater than 64 kilobytes of code or data).
- Our current application is about 1.5 megabytes and growing (from about 30,000
- lines of code and two libraries). Part of this growth stems from the tendency
- to make objects all-purpose (or worse, multi-purpose). Poorly designed objects
- often give liberal access to their internals, or multiple methods of access. A
- well designed object should do one thing, one way, and do it well. The public
- interface should be minimal in order to guide the programmer to the expected
- usage, and to keep things simple.
- Making matters worse, the linkers I currently use (Borland's tlink and HP-UX
- ld) always link in all virtual functions even if they are never used. This is
- because all virtual functions are referenced in a class's virtual table,
- which every constructor of the class references. I have heard that new linkers are
- addressing this problem, but without thorough evaluation of all the code, they
- will have trouble determining whether any one virtual function will assuredly
- be called (or assuredly not be called).
- Another culprit in code bloat is the inline function. In the C++ style of
- object-oriented programming, small functions are very prevalent, and inline
- functions are necessary for a well-oiled program. The factor that should
- decide whether a function is inline is the ratio of time spent calling the
- function to the time spent inside the function. If the time spent inside the
- function is much greater than the time spent calling the function, making it
- inline won't help speed much and will simply contribute to excessive code
- size. This implies that to use inlines effectively one should be aware of the
- inner workings of the compiler, and all of the side effects caused by some C++
- functions.
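- As a rough illustration of that ratio, the hypothetical Buffer class below
- keeps a one-line accessor inline, where call overhead would dominate. It
- defines a loop-bound checksum out of line, where inlining would mostly add
- code size. Which functions actually qualify in a real program is best
- confirmed with a profiler.
```cpp
#include <cassert>
#include <cstddef>

class Buffer {
public:
    Buffer() : length(0) {}
    // Good inline candidate: the body is a single load, so the cost of
    // the call itself would dominate.
    std::size_t Length() const { return length; }
    void Append(char c) { if (length < sizeof(data)) data[length++] = c; }
    // Poor inline candidate: the loop dominates, so inlining would save
    // little time but copy the body into every call site.
    unsigned long Checksum() const;
private:
    char data[256];
    std::size_t length;
};

// Defined out of line on purpose.
unsigned long Buffer::Checksum() const {
    unsigned long sum = 0;
    for (std::size_t i = 0; i < length; ++i)
        sum = sum * 31 + static_cast<unsigned char>(data[i]);
    return sum;
}
```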
- One of the best ways to optimize is to use a profiler. I used the Borland
- Windows profiler extensively while optimizing. In spite of its tendency to
- crash, or reboot my computer occasionally, it was a tremendous help not only
- in optimizing, but in finding bugs and memory leaks. The Borland profiler is
- interactive, much like a visual debugger. It allows you to stop the program at
- any point to examine statistics collected, to turn profiling on and off during
- a single run, and to selectively profile portions of the code. It provides
- time spent on a single line of code, and the number of times a single line is
- executed. This can be very handy for finding hidden bugs (see Figure 2).
- Profiling event-driven programs can be a challenge since the main loop of the
- program may be inaccessible. The method I used was to profile each of the main
- classes of the program individually. Since we keep all of the code for a class
- in one module, that meant compiling that module with the debugging option on.
- In the profiler I could then set a profile area on every line in the module.
- After running the program and doing a few transactions that would exercise the
- class I was profiling, the statistics window would show me where the most time
- was spent. I used this technique for finding candidates for inlining and other
- optimizations. By profiling the main classes first, I was able to tell which
- of the sub-classes were in need of optimizing, focusing my effort where it was
- needed.
- There are some limitations on inlining virtual functions. If a virtual
- function is called via a pointer or reference to an object, the actual
- function that gets called depends on the dynamic type of that object. The
- compiler generally cannot determine the correct function at compile time, and
- therefore it cannot inline it. Virtual functions that are called via an actual object,
- or ones that are explicitly called with the :: operator, can be inlined (see
- Figure 3). In at least one case we changed a virtual function to a non-virtual
- function in order to take advantage of inlining. This step must be taken with
- caution, however, possibly adjusting for any change in functionality.
- The decision to inline doesn't need to be all or nothing for a function. There
- might be a situation where a function can be split apart to facilitate
- inlining. Figure 4 shows that the member function GetWidgetPointer gets called
- 10,000 times, but only has to create a Widget 10 times. By splitting this
- function in two, the part that gets executed 10,000 times can be inlined, and
- the main part of the code can be isolated in a private member function.
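- A self-contained version of this split might look like the following. It is
- a sketch, not the article's code. Widget is a placeholder whose constructor
- counts instances. The sketch uses new (std::nothrow) so that the NULL check
- stays meaningful in standard C++, where a plain new throws instead of
- returning NULL. It also discards a Widget that reports an error instead of
- keeping it.
```cpp
#include <cassert>
#include <new>

// Placeholder Widget: construction is the expensive, rare part.
struct Widget {
    static int created;
    Widget() { ++created; }
    bool isError() const { return false; }
};
int Widget::created = 0;

class WidgetHome {
public:
    WidgetHome() : widgetPtr(nullptr) {}
    ~WidgetHome() { delete widgetPtr; }
    WidgetHome(const WidgetHome &) = delete;
    WidgetHome &operator=(const WidgetHome &) = delete;

    // Inline fast path: one test and a return, executed on every call.
    Widget *GetWidgetPointer() {
        return widgetPtr ? widgetPtr : PrivateGetWidgetPointer();
    }
private:
    // Out-of-line slow path: runs only while no Widget exists yet.
    Widget *PrivateGetWidgetPointer();
    Widget *widgetPtr;
};

Widget *WidgetHome::PrivateGetWidgetPointer() {
    widgetPtr = new (std::nothrow) Widget;
    if (widgetPtr == nullptr || widgetPtr->isError()) {
        delete widgetPtr;  // discard a widget that failed to initialize
        widgetPtr = nullptr;
    }
    return widgetPtr;
}
```
- After the first call, each of the remaining calls costs only the inlined
- pointer test.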
- Some special precautions should be taken when inlining constructors and
- destructors. The compiler may add code that you didn't call explicitly. For
- example with an inline constructor, the compiler will also inline for you the
- setting of the virtual table pointer, calls to any base class constructors,
- and calls to constructors of any data members that have constructors. Borland
- C++ also adds code to check whether the constructor is being called on behalf
- of a new expression and, if so, calls a memory allocation routine. Destructors
- get the same kind of hidden code.
- One more precaution for inlines: if an inline function contains a function
- call, that function may also be inline, and it may in turn call another inline
- function, and so on. Make sure you know what you put inline. Just as an aside,
- some current compilers forget to call the destructors for local variables in
- inline functions. This matters if the local variables hold memory or system
- resources, because it can cause a leak. For now I wouldn't create local
- variables or pass user-defined objects by value in an inline function.
-
-
- Pointer Tricks
-
-
- One of the best ways to address rampant free-store allocation is by counting
- pointers. This is a simple but effective technique that can be used on any
- class to speed up assignments and passing by value. It is best targeted at
- utility classes that are used in many different places and cannot be isolated.
- A good example is a string class. I pointed out above that a class always
- allocating its own memory can lead to excessive memory allocation and copying.
- But if the string class contains a pointer to an intermediate data structure
- with a count, assignment becomes a simple matter of decrementing and
- incrementing a counter (see Figure 5).
- By adding a little smarts to the string class we can reduce the amount of work
- done, but we don't have to change the external appearance of the class. It is
- important to maintain the external appearance if you need to change the class,
- but don't want to make changes to all modules that use the class.
- One problem still remains. If you assign string1 to string2, then modify
- string1, string2 will also get modified. This problem is solved by using
- "copy-on-write" semantics. Copy-on-write is a strategy where any member
- function of the string class that modifies the string will first call a
- function that splits off a private copy of the internal implementation (see
- Figure 6).
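- Reference counting and copy-on-write can be combined in a compact,
- self-contained sketch along the lines described above. It illustrates the
- technique and is not the article's String class. The names Imp, Release,
- and UseCount are hypothetical, and standard strcpy replaces the custom
- strdup helper.
```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class String {
public:
    String(const char *s) : imp(new Imp(s)) {}
    String(const String &other) : imp(other.imp) { ++imp->count; }
    String &operator=(const String &other) {
        ++other.imp->count;  // increment first: safe for self-assignment
        Release();
        imp = other.imp;
        return *this;
    }
    ~String() { Release(); }

    // Non-const indexing may modify the text, so split off a private copy.
    char &operator[](std::size_t i) {
        if (imp->count > 1) Split();
        return imp->ptr[i];
    }
    char operator[](std::size_t i) const { return imp->ptr[i]; }

    const char *c_str() const { return imp->ptr; }
    unsigned UseCount() const { return imp->count; }  // for illustration

private:
    struct Imp {
        char *ptr;
        unsigned count;
        explicit Imp(const char *s)
            : ptr(std::strcpy(new char[std::strlen(s) + 1], s)), count(1) {}
        ~Imp() { delete[] ptr; }
    };

    void Release() { if (--imp->count == 0) delete imp; }
    void Split() {
        --imp->count;             // stop sharing the old representation
        imp = new Imp(imp->ptr);  // private copy of the text
    }

    Imp *imp;
};
```
- Copies share one buffer until a writer appears. Only then does the writer pay
- for the copy.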
- In our application we use both pointer counting and pointer counting with copy
- on write. The class BMOIterator mentioned above uses pointer counting as a
- smart pointer. Besides constructors, assignment operators, and a destructor,
- the only member function is the overloaded operator->. An object used with
- this operator appears as simply a pointer (see Figure 7). In order to work
- intuitively as a pointer, we did not implement copy on write. Also, since a
- BMOIterator holds system resources, we judged it best that no copying should
- take place unless explicitly asked for.
- In another class, AttributeList, which is basically a list of integers, we did
- implement copy on write. Since these AttributeLists are not often modified,
- adding copy on write doesn't add significantly to run time. However after
- running the AttributeList code through the profiler, we discovered that there
- were many thousands of empty lists being created, and much time being spent
- allocating arrays of zero length. To alleviate this situation, we established
- a special case where a zero-length list was represented by a null
- implementation pointer (see Figure 8). Copying and assignment are slightly
- slower, since they must now check for the null pointer, but the default
- constructor is simplified. Only after using the profiler did we become aware
- of this special case optimization.
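- The empty-list special case can be sketched in the same style. The class
- below is a hypothetical, self-contained variant of Figure 8 with an
- allocation counter added so the effect is visible. A thousand
- default-constructed lists allocate nothing, while copies of a non-empty list
- share one implementation.
```cpp
#include <cassert>
#include <cstring>

class AttributeList {
public:
    static int allocations;  // counts implementation objects created

    AttributeList() : ptr(nullptr) {}  // empty list: no allocation at all
    AttributeList(const int *data, unsigned size)
        : ptr(size ? new Imp(data, size) : nullptr) {}
    AttributeList(const AttributeList &a) : ptr(a.ptr) {
        if (ptr) ++ptr->count;  // copying must now check for null
    }
    AttributeList &operator=(const AttributeList &a) {
        if (a.ptr) ++a.ptr->count;  // increment first, as in Figure 5
        Release();
        ptr = a.ptr;
        return *this;
    }
    ~AttributeList() { Release(); }

    unsigned Size() const { return ptr ? ptr->size : 0; }
    int operator[](unsigned i) const { return ptr->list[i]; }

private:
    struct Imp {
        int *list;
        unsigned size;
        unsigned count;
        Imp(const int *data, unsigned n)
            : list(new int[n]), size(n), count(1) {
            std::memcpy(list, data, sizeof(int) * n);
            ++allocations;
        }
        ~Imp() { delete[] list; }
    };
    void Release() {
        if (ptr && --ptr->count == 0) delete ptr;
    }
    Imp *ptr;
};
int AttributeList::allocations = 0;
```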
-
-
- Overloading new and delete
-
-
- Another strategy for addressing memory allocation bottlenecks is to overload
- the new and delete operators for a class. Using a special purpose memory
- allocation algorithm rather than the general purpose algorithm used by malloc
- and free, you can squeeze out additional speed. By using a different heap for
- different allocation sizes you can reduce the amount of time needed to search
- for a chunk of free memory, and reduce fragmentation as well.
- An example of this is presented in Listings 1 through 3, which implement a
- memory-pool class and a linked list class that uses the memory pool. The
- memory-pool class is a heap for a specific size allocation. It is used by
- associating one instance of the memory-pool class with any class (via a static
- data member) and overloading that class's new and delete operators to use the
- pool. Since the class-specific operator new is used only for objects of that
- class, the requested size is always the same.
- The heap works by allocating room for 32 objects at a time and maintaining a
- bit map of used slots. The pool has two lists, one for chunks that have at
- least one free slot, and the other for chunks that are completely used up.
- Therefore the allocation logic has to search only one chunk for a free spot. (I have to
- thank my brother Mark for the bit searching method, without which this method
- was actually slower than the built-in malloc.) This special version of new
- could also be rewritten in assembly language for even greater speed.
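- The bit search in Listing 2 narrows down the position of the first free slot
- by halves before finishing with a 16-entry table. The function below
- restates that search on its own, as a sketch. It uses std::uint32_t to make
- the 32-slot assumption explicit. Returning 32 for a full map is this sketch's
- own convention. FindFreeSlot, MarkUsed, and MarkFree are hypothetical names.
```cpp
#include <cassert>
#include <cstdint>

// Returns the index (0..31) of the lowest clear bit in a 32-slot
// occupancy map, or 32 if every slot is taken.
int FindFreeSlot(std::uint32_t bits) {
    // lowestZero[n] = position of the lowest 0 bit in nibble n (4 if none).
    static const int lowestZero[16] = {
        0, 1, 0, 2, 0, 1, 0, 3,
        0, 1, 0, 2, 0, 1, 0, 4
    };
    int shift = 0;
    // If the low half of the current window is full, the first free slot
    // must be in the high half, so move there.
    if ((bits & 0xFFFFu) == 0xFFFFu) { shift += 16; bits >>= 16; }
    if ((bits & 0xFFu) == 0xFFu)     { shift += 8;  bits >>= 8;  }
    if ((bits & 0xFu) == 0xFu)       { shift += 4;  bits >>= 4;  }
    return shift + lowestZero[bits & 0xFu];
}

// Set or clear one slot's bit; note the |= when marking a slot used.
inline void MarkUsed(std::uint32_t &bits, int slot) {
    bits |= (std::uint32_t(1) << slot);
}
inline void MarkFree(std::uint32_t &bits, int slot) {
    bits &= ~(std::uint32_t(1) << slot);
}
```
- Three comparisons and one table lookup replace a loop that could otherwise
- test up to 32 bits one at a time.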
- For classes that hold limited resources (such as our BMOJoin which controls
- one or more network sockets), a good strategy is to delay creation until the
- last possible moment. We used code similar to that in Figure 4b inside the
- BMOIterator, calling GetJoinPointer rather than accessing the BMOJoin pointer
- itself. In this way we can create BMOIterators as we put up a window, but the
- BMOJoin isn't created until a search is performed. We added a member function
- called Disconnect to the BMOIterator, and call it when a window is put into
- the background to delete the BMOJoin pointer and free resources for the
- foreground window (see Figure 9).
-
-
-
- Conclusion
-
-
- Finally, running the profiler through a particularly sluggish area, we
- discovered that a lot of time was being spent padding strings with spaces. The
- problem here was that operator+= for the string was not as efficient as we had
- hoped. By going to a lower level, and accessing C-style strings directly, we
- managed to speed up the padding process considerably.
- I should mention that a good string class would have had a padding function,
- or at least the ability to create a blank string of count bytes. This
- low-level access then wouldn't be necessary. Since the string class was part
- of a commercial library we didn't want to modify or add to it. It's good to
- know that all the old C tricks are available even if just used at the lowest
- levels. Credit has to be given here to the profiler for finding this
- bottleneck.
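- For comparison, the standard C++ library's std::string (not the commercial
- string class used in the article) already provides the operation wished for
- here. append(count, ch) adds count copies of a character in a single call,
- and the constructor std::string(count, ' ') creates a blank string of count
- characters. A padding helper then reduces to one line. PadWithSpaces on
- std::string is a hypothetical counterpart to the function in Figure 10.
```cpp
#include <cassert>
#include <string>

// Pads str with count spaces in a single append call, rather than
// one += per space as in Figure 10a.
void PadWithSpaces(std::string &str, std::string::size_type count) {
    str.append(count, ' ');
}
```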
- Many of the optimizations described above are fairly simple because of the
- separation of interface from implementation. The idea is to be able to grease
- up a class without having to worry about side effects that might be possible
- in a less restrictive interface. Other optimizations are possible by dipping
- into C++'s C heritage as a low-level language.
- Of course there is no substitute for a good design effort up front. A
- temptation in design is to make a lot of "friendly" classes, which access each
- other's internals. But what you wind up with is "spaghetti objects." By
- maintaining integrity between classes, you can twiddle with the internal bits
- to your heart's content without having to worry about unforeseen side effects.
- Figure 1 Good objects must do their own memory management
- class BadString {
- public:
- // Using the pointer directly:
- BadString(char *s) { ptr = s; }
- // Where did the pointer come from?
- ~BadString() { delete[] ptr; }
- private:
- char *ptr;
- };
-
- char *strdup(const char *str)
- {
- return strcpy(new
- char[strlen(str)+1],str);
- }
-
- class GoodString {
- public:
- // Don't use the pointer, copy it:
- GoodString(const char *s)
- { ptr= strdup(s); }
- // Now you can safely delete it:
- ~GoodString()
- { delete[] ptr; }
- private:
- char *ptr;
- };
-
- // End of File
- Figure 2 Using the profiler to find hidden problems
- // Why isn't the while loop
- // ever executed?
-
- time count
- 0.0012 50 int flag = 0;
- 0.0031 50 while(flag = 0)
- {
- 0.0000 0 if(doTilTrue())
- 0.0000 0 flag = 1;
- }
-
- // End of File
- Figure 3 Inlining virtual functions
- class Base {
- public:
- virtual void print()
- { printf("Base"); }
- };
-
-
- class Derived1 : public Base {
- public:
- virtual void print()
- {
- // Base::print() can be inlined
- Base::print();
- printf("Derived1");
- }
- };
-
- class Derived2 : public Base {
- public:
- virtual void print()
- {
- // Base::print() can be inlined
- Base::print();
- printf("Derived2");
- }
- };
-
- void function(Base *bp)
- {
- // Doesn't know whether
- // to call Derived1::print()
- // or Derived2::print(),
- // won't inline:
- bp->print();
-
- // Forced to call Base::print(),
- // can inline:
- bp->Base::print();
- Derived1 d1o;
- // Forced to call
- // Derived1::print(), can
- // inline:
- d1o.print();
- }
-
- // End of File
- Figure 4 Efficient inlining
- Figure 4a. Splitting functions to facilitate inlining.
-
- 5.1274 10000 Widget *WidgetHome::GetWidgetPointer()
- {
- 1.0333 10000 if(widgetPtr == NULL)
- {
- 0.0012 10 widgetPtr = new Widget;
- 0.0013 10 if(widgetPtr == NULL ||
- widgetPtr->isError())
- 0.0000 0 return NULL;
- }
- 1.0237 10000 return widgetPtr;
- 3.5429 10000 }
-
- Figure 4b. Inlining only the part called most often.
-
- // the inline doesn't show up on the profile count:
-
- inline Widget *WidgetHome::GetWidgetPointer()
-
- {
- return widgetPtr ?
- widgetPtr:
- PrivateGetWidgetPointer();
- }
-
- 0.0020 10 Widget *WidgetHome::PrivateGetWidgetPointer()
- {
- 0.0059 10 widgetPtr = new Widget;
- 0.0035 10 if(widgetPtr == NULL ||
- widgetPtr->isError())
- 0.0000 0 return NULL;
- 0.0008 10 return widgetPtr;
- 0.0001 10 }
-
- // End of File
- Figure 5 Counting pointers
- class String {
- private:
- struct StringImp {
- char *ptr;
- unsigned count;
- StringImp(const char *str)
- // strdup as defined in Figure 1 (allocates with new[])
- : ptr(strdup(str)),
- count(1)
- {
- ;
- }
- ~StringImp()
- {
- delete[] ptr;
- }
- } *imp;
- public:
- String(const char *str)
- {
- imp = new StringImp(str);
- }
- String(const String &str)
- {
- imp = str.imp;
- imp->count++;
- }
- // assignment is a little tricky
- String &
- operator=(const String &str)
- {
- // increment first in case
- // of assignment to self
- str.imp->count++;
- // be sure to clean up the old imp!
- if(--imp->count == 0)
- delete imp;
- imp = str.imp;
- return *this;
- }
- ~String()
- {
- if(--imp->count == 0)
- delete imp;
- }
- };
- // End of File
- Figure 6 Implementing copy on write
- class String {
-
- // the contents of the string
- // class from Figure 5.
-
- ...
-
- // this indexing operator could
- // modify the string
- char &operator[](int i)
- {
- if(imp->count > 1)
- Split();
- return imp->ptr[i];
- }
- // This indexing operator won't
- // modify, don't split.
- char operator[](int i) const
- {
- return imp->ptr[i];
- }
- private:
- void Split()
- {
- // Create private copy:
- imp->count--;
- imp = new
- StringImp(imp->ptr);
- }
- };
- // End of File
- Figure 7 Implementing smart pointers
- class BMOIteratorImp;
-
- class BMOIterator {
- public:
- BMOIterator(const char *dbname);
- BMOIterator(const BMOIterator &);
- BMOIterator &
- operator=(const BMOIterator &);
- ~BMOIterator();
- BMOIteratorImp *operator->()
- {
- return imp;
- }
- private:
- BMOIteratorImp *imp;
- };
-
- void function()
- {
- BMOIterator it("DBNAME");
- // AddCol is a member of
- // BMOIteratorImp
-
- it->AddCol("table.col1");
- }
- // End of File
- Figure 8 Using a special case for an empty list
- class AttributeList {
- private:
- struct AttributeListImp {
- int *list;
- unsigned count;
- AttributeListImp (int *
- data, unsigned size)
- {
- list = new int[size];
- memcpy(list,data,
- sizeof(int)*size);
- count = 1;
- }
- } *ptr;
- public:
- AttributeList()
- {
- // Empty list special case:
- ptr = NULL;
- }
- AttributeList(int *data,
- unsigned size)
- {
- ptr = new
- AttributeListImp(data,
- size);
- }
- AttributeList(const
- AttributeList &a)
- {
- // Now we have to check
- // for null:
- ptr = a.ptr;
- if(ptr) ptr->count++;
- }
-
- ...
-
- };
-
- // End of File
- Figure 9 Delaying creation of an object by restricting access
- class BMOJoin;
-
- class BMOIteratorImp {
- public:
- BMOIteratorImp()
- {
- itsJoinPtr = NULL;
- }
- ~BMOIteratorImp()
- {
- delete itsJoinPtr;
- }
- int Search(char *str)
-
- {
- return
- GetJoinPointer()->Search(str);
- }
- void Disconnect()
- {
- delete itsJoinPtr;
- itsJoinPtr = NULL;
- }
- private:
- // this is actually split apart
- // (see Figure 4.)
- BMOJoin *GetJoinPointer()
- {
- if(itsJoinPtr)
- return itsJoinPtr;
- itsJoinPtr = new
- BMOJoin(itsStoredParameters);
- if(itsJoinPtr == NULL)
- // for now throw is just an inline
- // for exit()
- throw(ErrNoMem);
- return itsJoinPtr;
- }
- };
-
- // End of File
- Figure 10 Going low-level to improve efficiency
- Figure 10a. Results of an inefficient += operator.
-
- 1.0235 500 void PadWithSpaces(CString &str, int count)
- {
- 5.0346 20000 while(count--)
- 30.0397 19000 str += " ";
- 1.4354 500 }
-
- Figure 10b. Improved performance with low-level C-strings.
-
- 1.0235 500 void PadWithSpaces(CString &str, int count)
- {
- 2.4505 500 char *tmp = new char[count+1];
- 3.2930 500 memset(tmp,' ',count);
- 0.9438 500 tmp[count] = '\0';
- 5.3049 500 str += tmp;
- 1.9430 500 delete[] tmp;
- 1.4353 500 }
-
- // End of File
-
- Listing 1 Defines class MemoryPool
- #ifndef MEMPOOL_H
- #define MEMPOOL_H
-
- #include <stddef.h>
-
- // Assumes a 32-bit unsigned long: one bit per object,
- // 32 objects per chunk.
- const int CharSize = 8;
- const int PoolSize =
- sizeof(unsigned long)*CharSize;
-
-
- class MemoryPoolLink {
- private:
- friend class MemoryPool;
- MemoryPoolLink(size_t _size,
- MemoryPoolLink *_next);
- ~MemoryPoolLink();
- void *malloc(size_t size);
- void free(void *, size_t size);
- unsigned long bits;
- MemoryPoolLink *next;
- char *data;
- };
-
- class MemoryPool {
- MemoryPoolLink freeHead;
- MemoryPoolLink usedHead;
- size_t size;
- public:
- MemoryPool(size_t size);
- void *add();
- void *malloc();
- void free(void *);
- };
-
- #endif
-
-
- Listing 2 Memory pool member functions
- #include "mempool.h"
-
- MemoryPoolLink::
- MemoryPoolLink(
- size_t size,
- MemoryPoolLink *_next)
- : bits(0l),
- data(new
- char[size*PoolSize]),
- next(_next)
- {
- ;
- }
-
- MemoryPoolLink::
- ~MemoryPoolLink()
- {
- delete[] data;
- }
- void *MemoryPoolLink::malloc(
- size_t size)
- {
- static char lookup[] = {
- 0, 1, 0, 2,
- 0, 1, 0, 3,
- 0, 1, 0, 2,
- 0, 1, 0, 4,
- };
- int shift = 0;
- unsigned long b = bits;
- if((b&0xFFFF) == 0xFFFF)
-
- {
- shift = 16;
- b >>= 16;
- }
- if((b&0xFF) == 0xFF)
- {
- shift += 8;
- b >>= 8;
- }
- if((b&0xF) == 0xF)
- {
- shift += 4;
- b >>= 4;
- }
- shift += lookup[b&0xF];
- bits |= (1L << shift);
- return data +
- (shift * size);
- }
-
- void MemoryPoolLink::free(
- void *ptr,
- size_t size)
- {
- bits &=
- ~(1l <<
- ((char *)ptr-data)/size);
- }
-
- MemoryPool::MemoryPool(
- size_t _size)
- : size(_size),
- freeHead(0,NULL),
- usedHead(0,NULL)
- {
- }
-
- void *MemoryPool::add()
- {
- MemoryPoolLink *temp = new
- MemoryPoolLink(size,
- freeHead.next);
- if(temp == NULL)
- return NULL;
- freeHead.next = temp;
- return
- freeHead.next->
- malloc(size);
- }
-
- void *MemoryPool::malloc()
- {
- if(freeHead.next)
- {
- void *ret =
- freeHead.next->
- malloc(size);
- if(freeHead.next->bits ==
- 0xFFFFFFFFL)
-
- {
- MemoryPoolLink *temp =
- freeHead.next;
- freeHead.next =
- temp->next;
- temp->next =
- usedHead.next;
- usedHead.next = temp;
- }
- return ret;
- }
- return add();
- }
-
- void MemoryPool::free(
- void *ptr)
- {
- MemoryPoolLink *temp =
- freeHead.next;
- MemoryPoolLink *prev =
- &freeHead;
- while(temp)
- {
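- // offset of ptr within this chunk's data block
- ptrdiff_t diff =
- (char *)ptr -
- temp->data;
- if(diff >= 0 &&
- diff < (ptrdiff_t)(size*PoolSize))
- {
- temp->free(ptr, size);
- // release the chunk once it is empty
- if(temp->bits == 0L)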
- {
- prev->next =
- temp->next;
- delete temp;
- }
- return;
- }
- prev = temp;
- temp = temp->next;
- }
- temp = usedHead.next;
- prev = &usedHead;
- while(temp)
- {
- ptrdiff_t diff =
- (char *)ptr -
- temp->data;
- if(diff >= 0 &&
- diff < (ptrdiff_t)(size*PoolSize))
- {
- temp->free(ptr, size);
- prev->next =
- temp->next;
- temp->next =
- freeHead.next;
- freeHead.next =
- temp;
- return;
-
- }
- prev = temp;
- temp = temp->next;
- }
- }
-
- // End of File
-
-
- Listing 3 A linked list for the memory pool
- #include "mempool.h"
- #include <fstream>
- #include <iostream>
- using namespace std;
- #include <string.h>
-
- class Link {
- char *data;
- Link *next;
- static MemoryPool pool;
- public:
- friend class List;
- friend class Iterator;
- Link(const char *_data,
- Link *_next)
- {
- next =_next;
- data = strcpy(new
- char[strlen(_data)+1],
- _data);
- }
- ~Link()
- {
- delete[] data;
- }
- void *
- operator new(size_t)
- {
- return pool.malloc();
- }
- void
- operator delete(void *ptr)
- {
- pool.free(ptr);
- }
- };
-
- MemoryPool
- Link::pool(sizeof(Link));
-
- class List {
- Link head;
- public:
- friend class Iterator;
- List ();
- ~List();
- void add(char *data);
- };
-
- class Iterator {
- Link *link;
-
- public:
- Iterator(List &_list)
- {
- link = _list.head.next;
- }
- char *Next()
- {
- // stop at the end instead of dereferencing NULL
- if(link == NULL)
- return NULL;
- char *ret = link->data;
- link = link->next;
- return ret;
- }
- };
-
- List::List()
- : head("",NULL)
- {
- ;
- }
-
- List::~List()
- {
- while(head.next)
- {
- Link *temp =
- head.next->next;
- delete head.next;
- head.next = temp;
- }
- }
-
- void List::add(char *data)
- {
- head.next = new
- Link(data,head.next);
- }
-
- int main(int argc, char **argv)
- {
- if(argc < 2)
- return 1;
- static char buff[1024];
- List list;
- ifstream in(argv[1]);
- while(in.getline(buff,
- sizeof(buff)))
- list.add(buff);
- Iterator it(list);
- char *ptr;
- while((ptr = it.Next()) !=
- NULL)
- cout << ptr << endl;
- }
-
- // End of File
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Enhancing the UNIX Korn Shell Using Predictor Techniques
-
-
- Philip Thomas and Shmuel Rotenstreich
-
-
- Philip K. Thomas received the B.S. degree in Physics from the California
- Polytechnic State University and is currently in the process of defending his
- Ph.D. dissertation in Computer Science at the George Washington University.
- Philip also serves as Director of System Architecture at PRC Corporation in
- McLean, Virginia, and can be reached at philip@rsi.prc.com.
-
-
- Shmuel Rotenstreich received the B.S. degree in Computer Science from the Tel
- Aviv University and the Ph.D. degree also in Computer Science from the
- University of California at San Diego. Shmuel is currently Professor of
- Engineering and Applied Sciences at the George Washington University and can
- be reached at shmuel@sparko.gwu.edu.
-
-
-
-
- Introduction
-
-
- On most computer systems, users interact with the system through
- command-language interpreters called shells. These shells accept user input
- and interpret it as commands for the system. There are three major UNIX
- shells: Bourne, C and Korn (kshell). Other shells include command.com for
- MS-DOS and for OS/2.
- The Korn Shell, which we consider here, offers sophisticated management of
- past commands (history). We have enhanced this functionality to include a
- learning automaton that predicts the next command. This command prediction
- often allows the user to avoid entering the command sequence in its entirety.
- In most of the computing environments we considered, this enhanced shell
- predicted the next command with a high degree of accuracy.
- This article describes our shell enhancements and details some of the methods
- we used to implement them. Although the Korn Shell was our target shell, we
- have applied these same enhancements to the C Shell. We believe this article
- will show that our enhancements are applicable to other shells as well.
-
-
- Attaining Command Cycle Consciousness
-
-
- In our research we have found that most users exhibit cyclic behavior when
- interacting with an operating system. For example, to create and execute a
- program, users may issue the following sequence of commands:
- 1. Command: Invoke the editor -- create the source file.
- 2. Command: Create executable -- compile & link the source file.
- 3. Command: Execute the file created in (2) -- examine the results.
- The user typically repeats this sequence until he or she has completed and
- thoroughly debugged the program.
- If a shell can recognize such cyclical behavior (we say it has cycle
- consciousness) it can predict the user's next command at any point in the
- sequence. We incorporate cycle consciousness into the shell by integrating it
- with the Predictor module outlined in the next section.
-
-
- The Predictor Module
-
-
- To incorporate cycle-based knowledge into various operating system facilities,
- we have developed an abstract data type module called a Predictor (Listing 1).
- The Predictor is a learning automaton representative of the
- learning-from-analogy class of learning strategies. In learning-from-analogy
- strategies, the learner reaches a conclusion about the current case by
- considering previously processed cases bearing strong similarities to the
- current case. For the predictor module, the previously processed cases are the
- sequences of commands already encountered from the command stream.
- The Predictor module incorporates an abstract component as well as several
- concrete components. The abstract component is the prediction algorithm, which
- we describe shortly. The concrete components consist of three major classes:
- 1. Control Class: The two instances of this class supply entry points for
- initializing and terminating the Predictor Module.
- 2. Input Class: This class obtains the next command from external modules.
- 3. Predict Class: This class makes predictions for the Predictor module.
- The predictor can make two types of predictions:
- 1) The next command
- 2) When a particular command will next occur
- By creating a predictor class, we encapsulate all cycle-dependent code and
- data, making it easier to modify. This method of encapsulation also permits
- the coexistence of multiple predictors. (Environments where multiple command
- streams exist require multiple predictors.) We describe the predictor class in
- detail later in this article.
-
-
- The Algorithm
-
-
- For this discussion we say that a command stream C is a sequence of N commands
- composed of M unique commands. These commands arrive at the predictor one at a
- time via the input interface. These commands determine the current state of
- the cycle (command stream).
- The predictor module maintains a weighted adjacency matrix A. A is an M x M
- matrix which represents a command stream C (this command stream dynamically
- increases in length as each subsequent command arrives), where the (i,j)th
- element A[i,j] is 0 if command j never follows command i, or some positive
- integer indicating the number of times command j follows command i. (Some
- readers may recognize this implementation as part of a semantic network.)
- To predict the next command, the algorithm locates the row of the adjacency
- matrix A that corresponds to the latest command n. It then selects the command
- $i^{*}$ with the largest count in that row, provided that count exceeds 1:
-
- $$ i^{*} = \arg\max_{1 \le i \le M} A[n,i], \qquad \text{provided } A[n,i^{*}] > 1 $$
-
- If such a command exists, the algorithm returns command $i^{*}$ as the best
- prediction of the next command to occur. If no such command exists, the
- algorithm returns a value indicating that the predictor module cannot predict
- the next command given its current state.
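- The prediction rule can be sketched as a small C++ class. This sketch is
- not the authors' Predictor module and rests on these assumptions. Commands
- are keyed by name in a sparse std::map instead of being numbered 1 to M. The
- class name CommandPredictor is hypothetical. Ties are broken in favor of the
- alphabetically first command.
```cpp
#include <cassert>
#include <map>
#include <string>

class CommandPredictor {
public:
    // Registers the next command in the stream: A[last][cmd] += 1.
    void Put(const std::string &cmd) {
        if (!last.empty())
            ++A[last][cmd];
        last = cmd;
    }

    // Predicts the successor of the latest command n.
    bool Predict(std::string &out) const { return PredictAfter(last, out); }

    // Most frequent successor of n, if its count exceeds 1 (the text's
    // threshold).  Returns false when no command qualifies.
    bool PredictAfter(const std::string &n, std::string &out) const {
        auto row = A.find(n);
        if (row == A.end()) return false;
        int best = 1;  // a candidate must beat A[n,i] = 1
        bool found = false;
        for (const auto &entry : row->second) {
            if (entry.second > best) {
                best = entry.second;
                out = entry.first;
                found = true;
            }
        }
        return found;
    }

private:
    std::map<std::string, std::map<std::string, int>> A;  // sparse matrix
    std::string last;                                      // latest command n
};
```
- After the edit, compile, run cycle has been seen three times, "./a.out" has
- been followed by "vi prog.c" twice, so that command becomes the prediction.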
- The algorithm performs the following steps to predict when a particular
- command will next occur:
- 1. Save the current state of the predictor module.
- 2. Set count=0.
- 3. Call the predict-the-next-command interface repeatedly, treating each
- predicted command as the latest command for the following call and
- incrementing count on each call, until the returned command i is
- a) equal to the specified command,
- b) a value indicating that the predictor cannot make a prediction, or
- c) a value previously returned by the predict-the-next-command interface
- (which implies that the command stream will cycle without an occurrence of
- the specified command).
- 4. Restore the original state of the predictor module.
- 5. Return count if step 3 terminated because of condition 3.a; otherwise,
- return a value indicating that the predictor cannot predict when the
- specified command will next occur.
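- The steps above can be sketched as a pair of C++ functions. The sketch
- assumes the commands have already been numbered and that A is held as a
- vector of rows. The names predict_next and when_next are hypothetical.
- Because the latest command is passed by value, saving and restoring the
- predictor state (steps 1 and 4) happens automatically. A set records the
- predictions already made so that condition 3.c can be detected.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Next-command rule on row n of A: index of the largest positive entry
// (earliest index wins ties), or -1 if the row has no positive entry.
int predict_next(const std::vector<std::vector<int>>& a, int n) {
    int best = -1, max = 0;
    for (int i = 0; i < static_cast<int>(a[n].size()); ++i)
        if (a[n][i] > max) {
            max = a[n][i];
            best = i;
        }
    return best;
}

// Hypothetical sketch of the "when will command 'target' next occur" steps.
// Returns the number of predictions made until 'target' is predicted, or 0
// if the predictor gets stuck (3.b) or starts to cycle (3.c) first.
int when_next(const std::vector<std::vector<int>>& a, int latest, int target) {
    int count = 0;                            // step 2
    std::set<int> seen;                       // predictions returned so far
    for (;;) {                                // step 3
        int next = predict_next(a, latest);
        if (next < 0)
            return 0;                         // 3.b: cannot predict
        ++count;
        if (next == target)
            return count;                     // 3.a: found
        if (!seen.insert(next).second)
            return 0;                         // 3.c: already predicted, we cycled
        latest = next;                        // treat the prediction as the latest command
    }
}
```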
-
-
- The Predictor Class
-
-
- The predictor class contains three publicly accessible functions to perform
- input and output:
- PutNextCommand -- This function registers commands occurring on the system. We
- modified the main control loop of the kshell to call this function whenever
- the kshell assembled a command and registered it to provide "history"
- functionality. This function returns Failure to signal an internal fatal
- error and Success otherwise.
- GetNextCommand -- This function passes the next predicted command to the
- system, if it can be predicted. We augmented the kshell source to recognize
- when the user types ESC-p (in emacs mode) at the prompt, and to call
- GetNextCommand. As a result, the kshell produces either a beep (to indicate
- the inability to predict the next command) or a command string adjacent to the
- prompt, which the user may execute by typing a carriage return.
- WhenNextCommand -- As its name implies, this function predicts when a
- particular command will occur. We do not use this function in the kshell. We
- have included it in the predictor module because it is invaluable in other
- systems that require this kind of prediction. (e.g. in a file caching system
- -- when a file needs "flushing," information about when a file will next be
- used is advantageous.) This function returns zero to indicate that the command
- stream will cycle before it encounters the specified command. This function
- returns a value of less than zero to signal an internal error.
- All three functions work on the adjacency matrix described previously:
- PutNextCommand updates it, and the other two read it.
-
-
- Results
-
-
- To show how well the predictor works, we have derived the following examples
- from a command stream in a programming environment. Figure 1 shows the first
- example. The first column, with the prefix o:, shows the actual command
- stream. The second column, with the prefix p:, shows the predicted command
- stream; an asterisk marks a correct prediction. The last two columns show the
- current percentage of successful predictions and the maximum percentage
- reached so far, respectively. The second example (Figure 2) differs from the
- first by only one command: we removed the command ls from the stream. This
- command represents an "exception" to the cyclical nature of the command
- stream, so example 2 shows the effect of filtering out exceptions.
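- The last two columns can be recomputed from the first two. The C++ sketch
- below assumes that the current percentage is the number of correct
- predictions divided by the number of commands seen so far, truncated to an
- integer. It also assumes that the maximum is the running maximum of that
- value. Fed the first eleven rows of Figure 2, it produces 0, 0, 0, 0, 20, 33,
- 42, 37, 33, 30, 27 for the current column and 42 as the maximum from the
- seventh row on, as the figure shows. (The Figure 1 percentages appear to use
- a denominator one larger, so the assumption may not hold exactly there.) Row
- and success_rates are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of a results figure: the actual command (o:) and the prediction
// made before it arrived (p:, empty when no prediction was possible).
struct Row {
    std::string actual;
    std::string predicted;
};

// Hypothetical sketch: computes the "current" and "maximum" percentage
// columns, assuming current = 100 * hits / rows so far, truncated to an int.
void success_rates(const std::vector<Row>& rows,
                   std::vector<int>& current, std::vector<int>& maximum) {
    int hits = 0, best = 0;
    current.clear();
    maximum.clear();
    for (std::size_t n = 0; n < rows.size(); ++n) {
        if (!rows[n].predicted.empty() && rows[n].predicted == rows[n].actual)
            ++hits;                                   // an asterisked row
        int pct = 100 * hits / static_cast<int>(n + 1); // integer division truncates
        if (pct > best)
            best = pct;
        current.push_back(pct);
        maximum.push_back(best);
    }
}
```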
-
-
- Open Issues
-
-
- To keep this article concise, we have not fully developed several items. The
- following are a few points that we believe require further clarification.
-
-
- How are command parameters handled?
-
-
- Commands can be classified as either tightly coupled or loosely coupled with
- their parameters. The degree of coupling is context dependent. For instance,
- when the predictor is used in a file caching subsystem, predicting the
- sequence of parameters (e.g. file names) is more important than predicting the
- sequence of the commands themselves. We say the commands are loosely coupled.
- In the case of tightly coupled commands, we treat parameters and their
- commands as one predictor unit, that is, the predictor considers the same
- command with different parameters as distinct and dissimilar commands. (The
- predictor outlined in this article works this way.) In the loosely coupled
- scenario, we either parse the parameters out and use them as distinct
- predictor units or, as is usually more beneficial, still treat each
- command-parameter set as a single predictor unit.
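- The two treatments differ only in how a command line is turned into
- predictor units. Here is a minimal C++ sketch. The function names are
- hypothetical, and the whitespace split is naive: it ignores shell quoting.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Tightly coupled: the whole line is one predictor unit, so "vi timer.c"
// and "vi tstin" are distinct, dissimilar commands.
std::vector<std::string> tightly_coupled_units(const std::string& line) {
    return {line};
}

// Parsed (loosely coupled): the command name and each parameter become
// separate predictor units. Splits on whitespace only.
std::vector<std::string> parsed_units(const std::string& line) {
    std::vector<std::string> units;
    std::istringstream in(line);
    std::string word;
    while (in >> word)
        units.push_back(word);
    return units;
}
```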
-
-
- Are command cycles common?
-
-
- Yes. When we remove a lot of the "noise" from command streams, we find that
- most of the commands are members of command cycles. The "noise," we have
- discovered, comes in many forms. Many command streams are sprinkled with
- commands we isolate as "exceptions," such as ls, dir, and who. Users tend to
- issue commands like these with enough randomness that it is easy to filter
- them out of otherwise perfectly healthy command streams. Another form of noise
- is the partial cycle, best illustrated using the example session. In the
- edit-compile-execute cycle mentioned above, invoking the compiler sometimes
- results in errors that require the user to revisit the editor without
- completing the cycle. Partial cycles are much harder to deal with than
- exceptions, and consequently our success rate with them tends to be lower.
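- Exceptions can be filtered out before commands ever reach the predictor. In
- this C++ sketch, the exception list is an assumption built from the three
- commands named above. A command is classified by its first word only.
- is_exception and filter_exceptions are hypothetical names. Applied to a
- stream containing ls -l timer, the filter removes that command, just as
- Figure 2 does.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical exception test: true if the command's first word is on a
// fixed list (an assumption; a real filter might learn this list).
bool is_exception(const std::string& line) {
    static const std::set<std::string> exceptions = {"ls", "dir", "who"};
    std::istringstream in(line);
    std::string name;
    in >> name;                               // first word, or "" for a blank line
    return exceptions.count(name) != 0;
}

// Returns the stream with exception commands removed, order preserved.
std::vector<std::string> filter_exceptions(const std::vector<std::string>& stream) {
    std::vector<std::string> kept;
    for (const std::string& line : stream)
        if (!is_exception(line))
            kept.push_back(line);
    return kept;
}
```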
-
-
- GUI Environments
-
-
- We have also begun investigations into incorporating cycle consciousness into
- graphical user interface (GUI) environments. The essential problem here is
- defining a command primitive. In a command-oriented interface a command
- primitive is well defined, but in a modern GUI a command primitive is more
- loosely defined and may incorporate several user actions, such as mouse moves
- and button clicks. User activity is hard to partition into distinct commands.
-
-
- Conclusion
-
-
- As with any article of this nature, we had to summarize a large body of
- theoretical and experimental work. The algorithm presented here is a
- first-degree learning automaton. We have added several filters and general
- enhancements to this automaton, resulting in several different predictors
- that, depending on the situation, are selected and invoked at run time (late
- binding through virtual functions). Our work applied the predictor to several
- operating-system elements, including memory management, scheduling, and disk
- allocation. Overall, we think our assertions for incorporating cycle
- consciousness have been well founded; we have had favorable results in the
- areas we have studied.
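- Run-time selection of this kind can be pictured with an abstract base class.
- This C++ sketch is hypothetical throughout. The article does not give its
- class hierarchy, and the RepeatLast and ExceptionFilter variants are
- deliberately trivial stand-ins for real predictors.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Every predictor variant offers the same operations; callers hold only a
// base-class pointer, so the variant used is chosen at run time.
class Predictor {
public:
    virtual ~Predictor() = default;
    virtual void put(const std::string& command) = 0;
    virtual bool get(std::string& next) const = 0;
};

// Trivial variant: predicts that the last command will be repeated.
class RepeatLast : public Predictor {
public:
    void put(const std::string& command) override { last_ = command; }
    bool get(std::string& next) const override {
        if (last_.empty())
            return false;
        next = last_;
        return true;
    }
private:
    std::string last_;
};

// Decorating variant: drops "ls" commands, passes everything else on.
class ExceptionFilter : public Predictor {
public:
    explicit ExceptionFilter(std::unique_ptr<Predictor> inner) : inner_(std::move(inner)) {}
    void put(const std::string& command) override {
        if (command == "ls" || command.compare(0, 3, "ls ") == 0)
            return;                               // crude exception test, illustration only
        inner_->put(command);
    }
    bool get(std::string& next) const override { return inner_->get(next); }
private:
    std::unique_ptr<Predictor> inner_;
};

// Late binding: the caller never names the concrete class it ends up using.
std::unique_ptr<Predictor> choose_predictor(bool filter_exceptions) {
    std::unique_ptr<Predictor> p(new RepeatLast);
    if (filter_exceptions)
        return std::unique_ptr<Predictor>(new ExceptionFilter(std::move(p)));
    return p;
}
```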
- References:
- Peterson, James L.; Silberschatz, Abraham, Operating System Concepts,
- Addison-Wesley Publishing Co, 1985.
- Tanimoto, Steven L., The Elements of Artificial Intelligence, Computer Science
- Press, 1990.
- Thomas, Philip K., Rotenstreich, Shmuel, "Command Cycles," Ph. D.
- Dissertation, George Washington University, 1993.
- Anatomy of a Command Shell
- In most systems, user programs and system programs are executed by the
- command-language interpreter. In more sophisticated systems such as UNIX,
- there are no major distinctions between this interpreter and any other program
- -- so users can easily create their own shells.
- The shell actually executes a command by performing a fork, after which the
- child process executes an execve (load and execute) of the command. The parent
- process (the shell) does a wait and suspends its own execution until the child
- process finishes executing and performs an exit. In multi-tasking systems,
- such as UNIX, both the shell and the command it is processing can execute
- concurrently. Users can initiate concurrent execution by typing a command
- followed by an ampersand. The shell interprets this symbol as an indication
- not to perform the wait; instead the shell continues with the next step in its
- command input by prompting the user for the next command.
- This capability implies the existence of a fairly sophisticated interprocess
- communication system. As a bare minimum, a child process needs a method to
- signal its termination to the parent process (the shell), which is not
- necessarily polling for this signal.
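- The fork/exec/wait sequence can be sketched in C++ with the POSIX calls.
- This is a minimal illustration, not code from PD KornShell. run_command and
- reap_background are hypothetical names. The child uses execvp (the member of
- the exec family that searches PATH) rather than calling execve directly.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs one command: fork, exec in the child, and (unless the command ended
// with '&') wait in the parent. Returns the child's exit status, or -1 on
// error. For a background command it returns 0 at once without waiting.
int run_command(char *const argv[], bool background) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                       // fork failed
    if (pid == 0) {                      // child: replace ourselves with the command
        execvp(argv[0], argv);
        _exit(127);                      // reached only if exec failed
    }
    if (background)
        return 0;                        // '&': prompt again without waiting
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)    // parent suspends until the child exits
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Collects any finished background children without blocking. A shell would
// typically call this when SIGCHLD reports that a child has terminated.
int reap_background() {
    int reaped = 0;
    while (waitpid(-1, nullptr, WNOHANG) > 0)
        ++reaped;
    return reaped;
}
```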
-
- Patching the Korn Shell
- The kshell source we started with was PD KornShell, written by Eric Gisin
- <egisin@math.UWaterloo.EDU>. PD KornShell installs on 4.2+ BSD, System V,
- and POSIX-compatible systems. It assumes you have Standard C (ANSI) and
- POSIX header files and functions. The PD KornShell source is available from
- several Internet locations, including:
- ftp.uu.net (192.48.96.9)
- /usenet/comp.sources.amiga/volume91/shells and
- softu1.ncu.edu.tw (140.115.19.11)
- /pub5/tarz
- We built the kshell on an HP-UX machine using HP's C++ compiler. On the
- HP-UX system, which is a hybrid UNIX system (System V and BSD), only a
- negligible amount of source modification was needed to get the shell up and
- running. Since the source was in "public-domain UNIX-type" C, we decided to
- leave the original parts in C and incorporate our enhancements in a C++ class.
- To facilitate the intermingling of C and C++, we instructed the C++ compiler to
- suppress name mangling (see "Using C/C++ with Clipper" by Mark W. Schumann in
- the December 1993 issue of CUJ for a thorough treatment of name mangling in
- C++).
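- The article does not show how name mangling was suppressed. The standard C++
- mechanism is an extern "C" linkage specification, sketched below with
- hypothetical names (History, hist_put, hist_last). The small History class
- stands in for the predictor so that the example compiles on its own. C code
- would call these functions through ordinary prototypes such as
- int hist_put(const char *command);

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Stand-in for a C++ class that C code cannot use directly.
class History {
public:
    void put(const char *command) { last_ = command; }
    const std::string& last() const { return last_; }
private:
    std::string last_;
};

static History history;   // single instance shared with the C side

// extern "C" gives these functions unmangled names that a C compiler's
// object files can link against.
extern "C" int hist_put(const char *command) {
    if (command == nullptr)
        return -1;
    history.put(command);
    return 0;
}

// Copies the most recent command into buf (size bytes); 0 on success,
// -1 if buf is null or too small.
extern "C" int hist_last(char *buf, std::size_t size) {
    const std::string& s = history.last();
    if (buf == nullptr || s.size() + 1 > size)
        return -1;
    std::memcpy(buf, s.c_str(), s.size() + 1);   // includes the '\0'
    return 0;
}
```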
- Figure 1 A sample session with predicted and actual command streams
- Example 1:
- ==========
- Original Command Stream   Predicted Command Stream   Current  Max
- -----------------------   ------------------------   -------  ---
- o:cc -o timer timer.c p: 00% 00%
- o:timer p: 00% 00%
- o:ls -l timer p: 00% 00%
- (the ls row above is removed in Example 2)
- o:vi timer.c p: 00% 00%
- o:cc -o timer timer.c p:cc -o timer timer.c *16% 16%
- o:timer p:timer *28% 28%
- o:vi timer.c p:ls -l timer 25% 28%
- o:vi tstin p:cc -o timer timer.c 22% 28%
- o:cc -o timer timer.c p: 20% 28%
- o:vi tstin p:timer 18% 28%
- o:vi timer.c p:cc -o timer timer.c 16% 28%
- o:cc -o timer timer.c p:cc -o timer timer.c *23% 28%
- o:timer p:timer *28% 28%
- o:vi timer.c p:vi timer.c *33% 33%
- o:cc -o timer timer.c p:cc -o timer timer.c *37% 37%
- o:timer p:timer *41% 41%
- o:vi timer.c p:vi timer.c *44% 44%
- o:cc -o timer timer.c p:cc -o timer timer.c *47% 47%
- o:timer p:timer *50% 50%
- o:vi timer.c p:vi timer.c *52% 52%
- o:cc -o timer timer.c p:cc -o timer timer.c *54% 54%
- o:timer p:timer *56% 56%
- o:vi timer.c p:vi timer.c *58% 58%
- o:cc -o timer timer.c p:cc -o timer timer.c *60% 60%
- o:timer p:timer *61% 61%
- o:vi timer.c p:vi timer.c *62% 62%
- o:cc -o timer timer.c p:cc -o timer timer.c *64% 64%
- o:timer p:timer *65% 65%
- o:vi timer.c p:vi timer.c *66% 66%
- o:cc -o timer timer.c p:cc -o timer timer.c *67% 67%
- o:timer p:timer *68% 68%
- o:vi timer.c p:vi timer.c *69% 69%
- o:cc -o timer timer.c p:cc -o timer timer.c *70% 70%
- o:timer p:timer *71% 71%
- o:vi timer.c p:vi timer.c *72% 72%
- o:cc -o timer timer.c p:cc -o timer timer.c *72% 72%
- o:timer p:timer *73% 73%
- o:vi timer.c p:vi timer.c *74% 74%
- o:cc -o timer timer.c p:cc -o timer timer.c *75% 75%
- o:timer p:timer *75% 75%
- o:vi timer.c p:vi timer.c *76% 76%
- o:cc -o timer timer.c p:cc -o timer timer.c *76% 76%
- o:timer p:timer *77% 77%
- o:vi timer.c p:vi timer.c *77% 77%
- o:cc -o timer timer.c p:cc -o timer timer.c *78% 78%
- o:timer p:timer *78% 78%
- o:vi timer.c p:vi timer.c *79% 79%
- o:timer p:cc -o timer timer.c 77% 79%
- o:vi timer.c p:vi timer.c *78% 79%
- o:timer p:cc -o timer timer.c 76% 79%
- o:vi timer.c p:vi timer.c *76% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *77% 79%
- o:timer p:timer *77% 79%
- o:vi timer.c p:vi timer.c *78% 79%
- o:timer p:cc -o timer timer.c 76% 79%
- o:mail p:vi timer.c 75% 79%
- o:remsh rigel p: 74% 79%
- o:vi timer.c p: 72% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *73% 79%
- o:timer p:timer *73% 79%
- o:vi timer.c p:vi timer.c *74% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *74% 79%
- o:vi timer.c p:timer 73% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *73% 79%
- o:more $inc/setjmp.h p:timer 72% 79%
- o:vi timer.c p: 71% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *72% 79%
- o:vi timer.c p:timer 71% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *71% 79%
- o:vi timer.c p:timer 70% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *70% 79%
- o:timer p:timer *71% 79%
- o:vi timer.c p:vi timer.c *71% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *72% 79%
- o:timer p:timer *72% 79%
- o:vi timer.c p:vi timer.c *72% 79%
- o:cc -o timer timer.c p:cc -o timer timer.c *73% 79%
- o:timer p:timer *73% 79%
- Figure 2 The sample session with an exception removed
- Example 2: (Without ls)
- =======================
- Original Command Stream   Predicted Command Stream   Current  Max
- -----------------------   ------------------------   -------  ---
- o:vi timer.c p: 00% 00%
- o:cc -o timer timer.c p: 00% 00%
- o:timer p: 00% 00%
- o:vi timer.c p: 00% 00%
- o:cc -o timer timer.c p:cc -o timer timer.c *20% 20%
- o:timer p:timer *33% 33%
- o:vi timer.c p:vi timer.c *42% 42%
- o:vi tstin p:cc -o timer timer.c 37% 42%
- o:cc -o timer timer.c p: 33% 42%
- o:vi tstin p:timer 30% 42%
- o:vi timer.c p:cc -o timer timer.c 27% 42%
- o:cc -o timer timer.c p:cc -o timer timer.c *33% 42%
- o:timer p:timer *38% 42%
- o:vi timer.c p:vi timer.c *42% 42%
- o:cc -o timer timer.c p:cc -o timer timer.c *46% 46%
- o:timer p:timer *50% 50%
- o:vi timer.c p:vi timer.c *52% 52%
- o:cc -o timer timer.c p:cc -o timer timer.c *55% 55%
- o:timer p:timer *57% 57%
- o:vi timer.c p:vi timer.c *60% 60%
- o:cc -o timer timer.c p:cc -o timer timer.c *61% 61%
- o:timer p:timer *63% 63%
- o:vi timer.c p:vi timer.c *65% 65%
- o:cc -o timer timer.c p:cc -o timer timer.c *66% 66%
- o:timer p:timer *68% 68%
- o:vi timer.c p:vi timer.c *69% 69%
- o:cc -o timer timer.c p:cc -o timer timer.c *70% 70%
- o:timer p:timer *71% 71%
- o:vi timer.c p:vi timer.c *72% 72%
- o:cc -o timer timer.c p:cc -o timer timer.c *73% 73%
- o:timer p:timer *74% 74%
- o:vi timer.c p:vi timer.c *75% 75%
- o:cc -o timer timer.c p:cc -o timer timer.c *75% 75%
- o:timer p:timer *76% 76%
- o:vi timer.c p:vi timer.c *77% 77%
- o:cc -o timer timer.c p:cc -o timer timer.c *77% 77%
- o:timer p:timer *78% 78%
- o:vi timer.c p:vi timer.c *78% 78%
- o:cc -o timer timer.c p:cc -o timer timer.c *79% 79%
- o:timer p:timer *80% 80%
- o:vi timer.c p:vi timer.c *80% 80%
- o:cc -o timer timer.c p:cc -o timer timer.c *80% 80%
- o:timer p:timer *81% 81%
- o:vi timer.c p:vi timer.c *81% 81%
- o:cc -o timer timer.c p:cc -o timer timer.c *82% 82%
- o:timer p:timer *82% 82%
- o:vi timer.c p:vi timer.c *82% 82%
- o:timer p:cc -o timer timer.c 81% 82%
- o:vi timer.c p:vi timer.c *81% 82%
- o:timer p:cc -o timer timer.c 80% 82%
- o:vi timer.c p:vi timer.c *80% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *80% 82%
- o:timer p:timer *81% 82%
- o:vi timer.c p:vi timer.c *81% 82%
- o:timer p:cc -o timer timer.c 80% 82%
- o:mail p:vi timer.c 78% 82%
- o:remsh rigel p: 77% 82%
- o:vi timer.c p: 75% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *76% 82%
- o:timer p:timer *76% 82%
- o:vi timer.c p:vi timer.c *77% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *77% 82%
- o:vi timer.c p:timer 76% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *76% 82%
- o:more $inc/setjmp.h p:timer 75% 82%
- o:vi timer.c p: 74% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *74% 82%
- o:vi timer.c p:timer 73% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *73% 82%
- o:vi timer.c p:timer 72% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *73% 82%
- o:timer p:timer *73% 82%
- o:vi timer.c p:vi timer.c *73% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *74% 82%
- o:timer p:timer *74% 82%
- o:vi timer.c p:vi timer.c *75% 82%
- o:cc -o timer timer.c p:cc -o timer timer.c *75% 82%
- o:timer p:timer *75% 82%
-
- Listing 1 Definition of the Predictor
- #include <stdlib.h>    //malloc, calloc, realloc, free
- #include <string.h>    //strcpy, strcmp
-
- enum ReturnValue
- {
-     Failure,
-     Success
- };
-
- const int DEF_CARDINAL=500;        //number of commands a_matrix can hold
- const int DEF_DELTA_CARDINAL=100;  //increment of cardinal_a_matrix
- const int MAX_SIZE_COMMAND=50;     //maximum size of command
-
- class predictor
- {
-     int num_commands;              //current number of unique commands
-     int cardinal_a_matrix;         //max number of commands a_matrix can hold
-     int *a_matrix;                 //pointer to the adjacency matrix
-     char *commands;                //list of unique commands
-     int last_ordinal;              //index of last command
-
-     ReturnValue size(int cardinality);  //malloc or realloc to create
-                                         // approp. a_matrix & commands
-     int is_unique(char *command);       //checks commands[] for command
-                                         // <0 =>No,else=>index of command
-     ReturnValue insert(char *command);  //add command to commands
-     ReturnValue update(int ordinal);    //increment a_matrix entry
-
- public:
-     predictor()
-     {
-         num_commands=0;
-         cardinal_a_matrix=DEF_CARDINAL;
-         a_matrix=NULL;             //so size() may safely realloc
-         commands=NULL;
-         size(cardinal_a_matrix);
-         last_ordinal=-1;
-     }
-     predictor(int cardinality)     //overload constructor
-     {
-         num_commands=0;
-         cardinal_a_matrix=cardinality;
-         a_matrix=NULL;
-         commands=NULL;
-         size(cardinal_a_matrix);
-         last_ordinal=-1;
-     }
-     ~predictor()                   //destructor
-     {
-         cardinal_a_matrix=0;
-         free(a_matrix);
-         free(commands);
-     }
-
-     ReturnValue PutNextCommand(char *);
-     ReturnValue GetNextCommand(char *);
-     int WhenNextCommand(char *);
- };
-
-
- ReturnValue predictor::PutNextCommand(char *command)
- {
-     int ordinal;
-
-     if(num_commands+1 >= cardinal_a_matrix)   //no space ?
-     {
-         cardinal_a_matrix+=DEF_DELTA_CARDINAL;
-
-         if(size(cardinal_a_matrix)==Failure)  //can not allocate space
-             return(Failure);
-     }
-
-     if((ordinal=is_unique(command))<0)        //command encountered before?
-     {
-         if(insert(command) == Failure)        //no - add it
-             return(Failure);
-         ordinal=num_commands;                 //new ordinal number of command
-     }
-
-     return(update(ordinal));                  //return appropriate value
- }
-
-
- ReturnValue predictor::GetNextCommand(char *command)
- {
-     int ordinal=0,max=0,i;
-     int *last_command;
-
-     if(last_ordinal<0)                   //no command registered yet
-     {
-         command[0]='\0';
-         return(Failure);
-     }
-
-     //note: assumes a_matrix is stored in row major fashion; pointer
-     //arithmetic on an int * already scales by sizeof(int)
-     last_command=a_matrix+last_ordinal*cardinal_a_matrix; //row of last command
-
-     for(i=0; i<num_commands; ++i)        //do MAX() function
-     {
-         if(last_command[i]>max)
-         {
-             max=last_command[i];
-             ordinal=i;
-         }
-     }
-
-     if(max>0)
-     {
-         strcpy(command,commands+ordinal*MAX_SIZE_COMMAND);
-         return(Success);
-     }
-     else
-     {
-         command[0]='\0';                 //no prediction possible
-         return(Failure);
-     }
- }
-
- int predictor::WhenNextCommand(char *command)
- {
-     int count=0;                         //number of predictions made
-     int old_last_ordinal=last_ordinal;   //step 1: save state of predictor
-     char pcommand[MAX_SIZE_COMMAND];     //predicted command
-     char *any_cycle;                     //marks commands already predicted
-     int i;
-
-     //one zeroed flag per unique command (standard C++ has no
-     //variable-length arrays, so allocate the flags dynamically)
-     any_cycle=(char *)calloc(num_commands>0 ? num_commands : 1, 1);
-     if(any_cycle==NULL)
-         return(-1);                      //internal error
-
-     for(;;)
-     {
-         if(GetNextCommand(pcommand)==Failure) //cannot predict (3.b)
-         {
-             count=0;
-             break;
-         }
-         ++count;
-         if(!strcmp(pcommand,command))    //found the command (3.a)
-             break;
-
-         i=is_unique(pcommand);           //index of predicted command
-         if(i<0 || any_cycle[i])          //predicted before: we have cycled (3.c)
-         {
-             count=0;
-             break;
-         }
-         any_cycle[i]=1;
-         last_ordinal=i;                  //treat prediction as latest command
-     }
-
-     free(any_cycle);
-     last_ordinal=old_last_ordinal;       //step 4: restore state
-     return(count);
- }
-
- /* End of File */
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- C Elements of Style
-
-
- Dwayne Phillips
-
-
- Dwayne Phillips works as a computer and electronics engineer with the U.S.
- Department of Defense. He has a PhD in Electrical and Computer Engineering
- from Louisiana State University. His interests include computer vision,
- artificial intelligence, software engineering, and programming languages.
-
-
- Steve Oualline's C Elements of Style is, as its name suggests, a style manual
- for C programming. Oualline outlines a method for writing clear and
- decipherable C code. He emphasizes a simple and straightforward style. In his
- words, this book is for programmers "who want their programs to be easily read
- and maintained by others."
-
-
- Audience
-
-
- You do not need to be a C wizard to understand this book. In fact, wizards who
- write statements like:
- *destination++ = *source++;
- may not like this book. Oualline advocates the more accessible:
- *destination = *source; destination++; source++;
- If you write programs that must be corrected or augmented by others, this book
- will be of interest to you. It is especially appropriate for relatively new
- programmers who are struggling to learn good C programming habits.
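- Neither statement is a complete loop by itself. As an illustration, here are
- both styles inside a string-copy routine; the two functions produce identical
- results. The function names are hypothetical, not from the book.

```cpp
#include <cassert>
#include <cstring>

// Both functions assume 'destination' has room for all of 'source',
// including its terminating '\0'.

// Terse form: copy a character and advance both pointers in one expression.
void copy_terse(char *destination, const char *source) {
    while ((*destination++ = *source++) != '\0')
        ;
}

// The form Oualline advocates: one action per statement.
void copy_plain(char *destination, const char *source) {
    for (;;) {
        *destination = *source;
        if (*destination == '\0')
            break;
        destination++;
        source++;
    }
}
```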
-
-
- Contents
-
-
- C Elements of Style has nine chapters, a compact style manual, two appendices,
- and an index. The chapters cover the topics expected in this type of book.
- They include (1) style and program organization, (2) file basics, comments,
- and program headings, (3) variable names, (4) statement formatting, (5)
- statement details, (6) the preprocessor, (7) C++ style, (8) directory
- organization and makefile style, and (9) user-friendly programming.
- For much of the book, Oualline gives examples of good and bad code, and
- summarizes with style rules. His rules are simple and easy to apply. Some of
- the rules apply to the process of writing code. For example, "Comment your
- code as you write it." (This rule, he says, saves time in the long run.) Other
- rules address the code itself: "Constant names are all upper case" and "Follow
- every variable declaration with a comment that defines it." Some rules just
- make a programmer's life easier: "Assume that *, /, and % come before + and -.
- Put parentheses around everything else."
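- Here is a short C++ example of why the second half of that rule matters. The
- == operator binds more tightly than &, so a flag test written without
- parentheses silently tests something else. MASK and the flag_clear_*
- functions are hypothetical names; compilers usually warn about the first
- form.

```cpp
#include <cassert>

const unsigned MASK = 0x4;

// Wrong: parsed as flags & (MASK == 0), which is always flags & 0.
bool flag_clear_wrong(unsigned flags) {
    return flags & MASK == 0;
}

// Right: parentheses say what was meant.
bool flag_clear_right(unsigned flags) {
    return (flags & MASK) == 0;
}
```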
- The style rules are aimed at making programs readable and reliable. Good
- variable and subroutine names, which help make the code explain itself, and
- good comments, which help reveal the organization of programs, enhance
- readability. Oualline strives to increase reliability by limiting the code to
- safe subsets of the C language. (Not allowing shortcuts such as
- *destination++ = *source++ reduces the risk of problematic side effects.)
- Oualline describes this emphasis on readable and reliable code as "defensive
- programming" or "doing a lot of thinking so I don't have to do a lot of
- thinking."
- The chapter on makefile style is outstanding and to my knowledge unique.
- Makefiles are essential to C programming and are often the hardest part of a
- project to understand. Other books describe how to use makefiles, but this is
- the only book I have seen that tells how to organize and format them. Oualline
- formally defines many makefile items that have become pseudo-standards over
- the years. This material may seem trivial to an experienced UNIX C programmer,
- but many C programmers today have never worked on a UNIX system (believe it or
- not).
- The chapter on user-friendly programming is a bit out of place, but useful. It
- discusses how to write programs that users will actually use. (It includes the
- Law of Least Astonishment: The program should act in a way that least
- astonishes the user.)
- The style manual is 35 pages of rules without all the discussion. For those
- short on time, this book-within-a-book provides a concise summary of the text.
- The first appendix shows three complete code examples (two in C, one in C++)
- that employ the style rules given in the body of the book. These examples show
- Oualline's recommendations in practice. The second appendix lists all the
- rules given in the chapters. (Take these seven pages and pin them on the wall
- next to the coffee machine.)
-
-
- Other Style Manuals
-
-
- For years, the only programming style manual available was Kernighan and
- Plauger's thin little classic The Elements of Programming Style [1].
- Fortunately, several style manuals have appeared in the recent past -- each
- written with a different point of view.
- Plum's C Programming Guidelines [2] and C++ Programming Guidelines [3]
- (reviewed in The C Users Journal, January 1993) are comprehensive references
- on programming standards and style. They are written in a compact,
- subroutine-like style that is appropriate for reference books. These books are
- technical and not for the novice.
- Steve McConnell's Code Complete [4] is a comprehensive handbook for
- programmers. As such, it includes plenty of good advice on programming style,
- but its style sections are scattered throughout the 800-plus pages.
- While Oualline favors code that anyone can understand, Ranade and Nash's The
- Elements of C Programming Style [5] (reviewed in The C Users Journal, July
- 1993) encourages programmers to master and exploit the notation of the C
- language. (Ranade and Nash do recommend backing away from terse C style if
- the audience includes programmers who have a limited knowledge of C.)
-
-
- Conclusion
-
-
- Oualline's book is the closest of these to Kernighan and Plauger's, and it
- serves as a worthy update to that earlier classic. It is a short yet complete
- treatment of C style.
- C Elements of Style is a deserving addition to the desktop -- not just the
- bookshelf.
- References
- [1] The Elements of Programming Style, second edition, Brian W. Kernighan,
- P.J. Plauger, McGraw Hill, New York, New York, 1978, ISBN 0-07-034207-5.
- [2] C Programming Guidelines, second edition, Thomas Plum, Plum Hall Inc.,
- ISBN 0-911537-07-4.
- [3] C++ Programming Guidelines, Thomas Plum and Dan Saks, Plum Hall Inc., ISBN
- 0-911537-10-4.
- [4] Code Complete: A Practical Handbook of Software Construction, Steve
- McConnell, Microsoft Press, One Microsoft Way, Redmond, Wash. 98052-6399, ISBN
- 1-55615-484-4.
- [5] The Elements of C Programming Style, Jay Ranade and Alan Nash, McGraw
- Hill, New York, New York, 1993, ISBN 007-051278-7.
- Title: C Elements of Style: The Programmer's Style Manual for Elegant C and C++
- Programs
- Author: Steve Oualline
- Publisher: M&T Books
- 411 Borel Ave, Suite 100
- San Mateo, CA 94402
- 1992
- Price: $21.95
- Pages: 265
- ISBN: 1-55851-291-8
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Standard C
-
-
- C++ Language Support Library
-
-
-
-
- P.J. Plauger
-
-
- P.J. Plauger is senior editor of The C Users Journal. He is convenor of the
- ISO C standards committee, WG14, and active on the C++ committee, WG21. His
- latest books are The Standard C Library, and Programming on Purpose (three
- volumes), all published by Prentice-Hall. You can reach him at
- pjp@plauger.com.
-
-
-
-
- Introduction
-
-
- I conclude my discussion of the language support portion of the library
- specified by the draft C++ standard. (See "Standard C: The Header
- <exception>," CUJ, February 1994, "The C Library in C++," CUJ, December 1993,
- "C++ Library Ground Rules," CUJ, November 1993, and "Developing the Standard
- C++ Library," CUJ, October 1993.)
- "Language support" consists of those functions that can be called implicitly
- by C++ code, even when the code apparently contains no function calls. It also
- consists of the types required to declare and use those functions, as well as
- a few other related functions and types not directly needed to support the C++
- language proper.
- Standard C has few such creatures (if any). You can argue that several type
- definitions are part of language support. The types ptrdiff_t, size_t, and
- wchar_t are defined in various headers so you can declare objects that have
- the same types as certain expressions. They are a way to convey otherwise
- unknowable (or hard to learn) information about these types from the
- translator to the program.
- You can also argue that function exit is part of language support. The
- execution of any C program effectively occurs by evaluating the expression
- exit(main(argc, argv)). Saying such a thing simplifies descriptions -- it ties
- together the effects of calling exit and returning from main, for example. But
- it doesn't have a dramatic effect on how you actually write C programs. You
- cannot, for example, provide your own version of exit and expect it to be
- called when main returns. (And many implementations don't really call exit
- when main returns, but some other underlying function instead.)
- C++, on the other hand, offers numerous opportunities along these lines. You
- can trot out all sorts of functions that get control, either directly or
- indirectly, when one of the language support functions gets called. For
- example, in last month's discussion of exceptions I identified three functions
- that let you register handlers, or functions that get control under some
- circumstances:
- set_terminate, to specify a handler for calls to terminate()
- set_unexpected, to specify a handler for calls to unexpected()
- xmsg::set_raise_handler, to specify a handler for calls to xmsg::raise()
- You can also derive a class from xmsg and override the virtual xmsg::do_raise
- to get control when certain exceptions get reported by executing code.
- Thus, the draft C++ standard is a bit harder to write than the C Standard. In
- C, the implementation provides all library functions and you the programmer
- cannot displace them. The C Standard only has to describe a single interface
- between implementation and program. In C++, however, the program can displace
- functions otherwise supplied by the library. The draft C++ standard must spell
- out the environment promised to such a displacing function. And it must spell
- out what is expected of the displacing function so the program doesn't get
- surprised.
- A handler for terminate(), for example, is not supposed to return to its
- caller. If you provide one that prints a message and returns, you can cause
- the library severe problems. The draft C++ standard says so. So when you read
- the descriptions that follow, remember that the "treaty" between programmer
- and implementor can be multifaceted. The extra complexity of the draft C++
- standard is one of the prices we pay for extra flexibility in this area.
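- In the C++ standard as finally adopted, the terminate part of this
- arrangement survives as std::set_terminate; xmsg and its raise handler do
- not. Here is a minimal sketch of a well-behaved handler, with the
- hypothetical names my_terminate and install_terminate_handler.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>

// A terminate handler may report the problem, but it must not return to
// its caller, so it ends by calling std::abort.
[[noreturn]] void my_terminate() {
    std::fputs("fatal: unrecoverable error, aborting\n", stderr);
    std::abort();
}

void install_terminate_handler() {
    std::set_terminate(my_terminate);
}
```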
-
-
- Storage Allocation
-
-
- Exceptions can be thought of as a way to structure the use of setjmp and
- longjmp. Similarly, the addition of new and delete to C++ essentially
- structures the use of malloc and free. By writing:
- Thing *p = new Thing;
- you are assured that the object of type Thing is properly constructed after it
- is successfully allocated and before it can be accessed through p. Similarly,
- the expression statement:
- delete p;
- ensures that the object is destroyed before its storage is deallocated.
- You don't have to include any headers before writing expressions like these --
- new and delete are indeed built right into the language. But you can also play
- a variety of games with storage allocation if you choose. To do so, you begin
- by including the header <new>. Listing 1 shows a representative version of
- this header. I omit the extra superstructure required by namespaces, because
- it is distracting and still in a state of flux.
- The simplest game you can play is to gain control when space for the heap is
- exhausted. The function set_new_handler lets you register a handler for this
- condition. In principle, the draft C++ standard says you can "make more
- storage available for allocation and then return," but it fails to describe a
- portable way to do so. Calling free to liberate storage may help, but there is
- no requirement that storage be actually allocated by calling malloc. Deleting
- one or more allocated objects may also help, but even that is not guaranteed.
- More likely, you will want to throw an exception or terminate execution at
- this point.
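- Here is a sketch of the "make more storage available and return" strategy,
- written against the C++ standard as adopted (where the exception is
- std::bad_alloc rather than the draft's xalloc). It assumes, as the text
- warns is not guaranteed, that memory released with free becomes available
- to operator new. out_of_memory and install_reserve are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

static void *reserve = nullptr;   // emergency block set aside at startup

// First call: release the reserve and return, so operator new retries.
// Later calls: nothing left to release, so report failure by throwing.
void out_of_memory() {
    if (reserve != nullptr) {
        std::free(reserve);      // assumes operator new can reuse this memory
        reserve = nullptr;
        return;
    }
    throw std::bad_alloc();
}

void install_reserve(std::size_t bytes) {
    reserve = std::malloc(bytes);
    std::set_new_handler(out_of_memory);
}
```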
-
-
- xalloc Exceptions
-
-
- The default "new handler" does, in fact, throw an exception now. As I
- described last month, all library exceptions are derived from the base class
- xmsg. Moreover, all exceptions are thrown by calling ex.raise(), for some
- object ex of class xmsg. Unless you seize control of the process in one of the
- ways I described last month, the eventual outcome is that a failed allocation
- will throw an exception, which will in turn terminate execution of the
- program.
- This is a significant change from universal past practice, which has been to
- quietly yield a null pointer as a result of the new expression. The Library
- Working Group of X3J16/WG21, the joint ANSI/ISO standards committee for C++,
- anguished quite a bit before recommending this change. The joint committee
- anguished a bit more in turn. But eventually, the predominant wisdom was that
- the Standard C++ library had bloody well better use the full language in this
- case, not just the bits that were available when new and delete were first
- added to C++.
- A persuasive argument is that very few programs truly check all new
- expressions for null pointers. Those that don't may well stumble about when
- the heap is exhausted -- they're almost certainly better off dying a clean
- death. Those that do check all such expressions often simply abort -- the path
- to abnormal termination is now just slightly different. It is only those few
- sophisticated programs that try to do something nontrivial when heap is
- exhausted that need a bit of rewriting. Most of the joint committee felt this
- was a necessary price to pay to introduce exceptions at this critical
- juncture.
- Even so, some sympathy remains for being able to revert to the old behavior.
- For a variety of reasons, the Library Working Group has not spelled out a
- portable way to do so. But the group has identified what it thinks should be a
- common extension. Calling set_new_handler with a null pointer argument is
- otherwise undefined behavior. It seems natural to use this nonportable call as
- a way for implementations to know that they should revert to the older
- behavior.
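- A portable way to get the old behavior back for individual allocations is to catch the exception yourself. The sketch below does that with a hypothetical helper, new_or_null. The C++ standard as finally adopted also provides new (std::nothrow) T, which yields a null pointer instead of throwing, and the second function shows that form.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: the old "null pointer on failure" behavior,
// recovered by catching the exception that a failed allocation throws.
template <class T>
T *new_or_null()
{
    try {
        return new T();
    } catch (const std::bad_alloc &) {
        return nullptr;
    }
}

// The nothrow form of operator new does the same job directly.
double *make_double_or_null(double v)
{
    return new (std::nothrow) double(v);
}
```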
-
-
- Replacing operator new(size_t)
-
-
- If you want more certain control over the business of allocating storage, your
- best bet is to provide your own versions of operator new(size_t) and/or
- operator delete(void *). These functions have a peculiar dispensation -- the
- library provides a version of each, but you can "knock out" those versions by
- defining your own. (Only the array versions of these two operators, described
- below, also enjoy this special status within the Standard C++ library.)
- Before I go into details, please note an important distinction here. When you
- write:
-
- Thing *p = new Thing;
- the new Thing part is called a "new expression." It calls operator new(size_t)
- to allocate storage, but it also does other things, such as constructing the
- newly allocated object. All that operator new(size_t) has to worry about is
- providing the number of requested bytes, suitably aligned, or dealing with
- heap exhaustion. Listing 2 shows one way to write this function.
- Similarly, when you write:
- delete p;
- the delete p part is called a "delete expression." It calls operator
- delete(void *) to free storage, but it first destroys the object (only if the
- pointer is not null, of course). All that operator delete(void *) has to worry
- about is freeing storage for the object. Listing 3 shows one way to write this
- function.
- So one thing you might do is replace operator delete(void *) with a function
- that doesn't really free the storage. That could be handy while you're
- debugging a program, provided of course that you have enough heap to run your
- test cases.
- Or you might replace both operator new(size_t) and operator delete(void *)
- with versions that are simpler, or faster, or more sophisticated than the
- library versions. It is important to replace both, because the latter function
- in the library only knows how to free storage for objects allocated by the
- former.
- In either case, you probably don't have to bother with set_new_handler. You
- are at liberty to do whatever you want when you run out of heap. No need to
- call the new handler, which you can't easily do portably anyway.
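- Here is a minimal sketch of a replacement pair that keeps a hypothetical count of live blocks, which is a handy debugging aid. It follows the rules of the modern standard rather than those of the draft. A failed allocation consults the current new handler, and if no handler is installed it throws std::bad_alloc. In current C++ the library's array and sized forms forward to these two functions by default, so replacing just this pair is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical debugging counter: blocks obtained but not yet freed.
static long live_blocks = 0;

void *operator new(std::size_t size)
{
    if (size == 0)
        size = 1;                            // must still return a unique pointer
    for (;;) {
        if (void *p = std::malloc(size)) {
            ++live_blocks;
            return p;
        }
        std::new_handler handler = std::get_new_handler();
        if (handler == nullptr)
            throw std::bad_alloc();          // no handler: report failure
        handler();                           // may free storage, throw, or terminate
    }
}

void operator delete(void *p) noexcept
{
    if (p != nullptr) {
        --live_blocks;
        std::free(p);
    }
}
```

- Any new expression in the program, such as new Thing, now goes through these two functions.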
-
-
- Placement Syntax
-
-
- Yet another latitude granted by the C++ language is to provide an arbitrary
- set of additional arguments in a new expression, as in:
- Thing *get_special(T1 stuff,
- T2 more_stuff)
- {
- return (new (stuff, more_stuff) Thing);
- }
- This form implicitly calls the function:
- void *operator new(size_t, T1, T2);
- which you are obliged to supply. I leave it to your imagination what extra
- parameters might be useful when you're allocating some of your more
- sophisticated objects.
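- To make that concrete, here is a sketch of a placement operator new with one extra parameter. The parameter is a reference to a hypothetical Arena, which is a fixed buffer carved up by bumping an offset. The implementation calls the matching placement operator delete only if a constructor throws partway through new (arena) Thing. Arena and Thing are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical bump allocator: hands out pieces of a fixed buffer.
struct Arena {
    alignas(std::max_align_t) unsigned char buf[1024];
    std::size_t used = 0;
};

// Called by new (arena) T: the extra argument follows the size_t.
void *operator new(std::size_t size, Arena &a)
{
    const std::size_t align = alignof(std::max_align_t);
    std::size_t start = (a.used + align - 1) / align * align;  // round up
    if (start + size > sizeof a.buf)
        throw std::bad_alloc();
    a.used = start + size;
    return a.buf + start;
}

// Called only if a constructor throws during new (arena) T.
void operator delete(void *, Arena &) noexcept {}

struct Thing {
    int x;
    double y;
};
```

- The arena owns the storage, so you destroy such objects by calling the destructor explicitly. Never apply a delete expression to them.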
- It doesn't take too much imagination, however, to see a very common need.
- Sometimes you know exactly where you want a C++ object to be constructed --
- you have reason to believe that the storage area X is large enough and
- suitably aligned to hold an object of type Thing. Moreover, you're confident
- that no object has been constructed there already for which a destructor will
- later be called. (Whew!)
- To deal with this twilight zone between C and C++ programming, you can write:
- Thing *p = new ((void *)&X) Thing;
- This, naturally enough, calls the function:
- void *operator new(size_t, void *);
- which can simply return its second argument, as shown in Listing 4. The
- Standard C++ library provides this one version of a placement operator new.
- (Don't forget to include the header <new> to be sure it is properly declared.)
- Any fancier placement variants are up to you to provide.
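- The following sketch shows that twilight zone in practice, using the placement form declared in <new>. The hypothetical storage X is declared with alignas, so it really is suitably aligned for a Thing. The code builds the object there and later destroys it by calling its destructor explicitly. A delete expression would be wrong here, because the storage never came from operator new.

```cpp
#include <cassert>
#include <new>

struct Thing {
    static int live;              // Things constructed but not yet destroyed
    int value;
    explicit Thing(int v) : value(v) { ++live; }
    ~Thing() { --live; }
};

int Thing::live = 0;

// Hypothetical storage area X: large enough and suitably aligned for a Thing.
alignas(Thing) unsigned char X[sizeof(Thing)];

Thing *build_in_place(int v)
{
    return new (static_cast<void *>(X)) Thing(v);   // calls operator new(size_t, void *)
}

void destroy_in_place(Thing *p)
{
    p->~Thing();   // run the destructor; no delete, since operator new never supplied X
}
```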
-
-
- Member operator new
-
-
- Yet another way exists for controlling how objects get allocated. For any
- class, you can overload all the variants of operator new and/or operator
- delete that I've mentioned so far. Perhaps you want to write your own versions
- of:
- void *Thing::operator new(size_t);
- void Thing::operator delete(void *);
- that do a really fast job of allocating and freeing objects of class Thing.
- It can, for example, maintain a list of previously freed objects and hand them
- back quickly for future allocation requests. Unless you really get tricky, you
- can even ignore the size_t first argument to all variants of operator new,
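- since you know how big a Thing is likely to be. (How do you get tricky? Well,
- you can define operator new in a base class and fail to override it in a
- derived class whose objects are bigger. Member operator new is implicitly
- static, so it can't be virtual, but the derived class inherits it all the same.
- Thinking about things like that gives me a headache.)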
- So you see that you can exercise pretty fine control over how all objects, or
- even individual objects, get allocated.
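- Here is one way such a class-specific allocator might look, as a sketch. A hypothetical Thing keeps a free list of blocks it has already used. It does not ignore the size_t argument outright. Instead it hands any request that isn't exactly sizeof(Thing) to the global operator new, which covers the derived-class trickery mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Thing {
public:
    int data[4] = {0, 0, 0, 0};

    // Implicitly static. Reuse a freed block when the size matches.
    void *operator new(std::size_t size)
    {
        if (size != sizeof(Thing) || free_list == nullptr)
            return ::operator new(size);
        Node *n = free_list;
        free_list = n->next;
        return n;
    }

    // Keep the block for reuse instead of returning it to the heap.
    void operator delete(void *p, std::size_t size) noexcept
    {
        if (p == nullptr)
            return;
        if (size != sizeof(Thing)) {   // e.g. a bigger derived class
            ::operator delete(p);
            return;
        }
        free_list = ::new (p) Node{free_list};
    }

private:
    struct Node { Node *next; };
    static Node *free_list;
};

Thing::Node *Thing::free_list = nullptr;
static_assert(sizeof(Thing) >= sizeof(void *), "a Thing must be able to hold a list link");
```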
-
-
- Allocating Arrays
-
-
- But that leads to one last residual problem, regarding the allocation and
- freeing of arrays. You can, for example, write:
- Thing *p = new Thing[N];
- to allocate an array of N elements each of type Thing. Each of the elements is
- constructed in order, starting with the first (element zero). In this case,
- you must write the expression statement:
- delete[] p;
- to delete the array, not just a simple:
- delete p;
- as before. Why? Because the "array new expression" above has to somehow
- memorize how many elements N it has allocated. The "array delete expression"
- must later locate this memorized information and use it to destroy the
- appropriate number of elements and free the appropriate amount of storage.
- Yes, some existing implementations
- of C++ let you be cavalier about deleting arrays the wrong way, but don't
- count on that license in a portable program.
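- The following sketch makes the bookkeeping visible. A hypothetical Thing logs each construction and destruction. The array new expression constructs the elements from first to last. Then delete[] destroys all of them, in reverse order, before it frees the storage.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class that records its constructions (+id) and destructions (-id).
struct Thing {
    static std::vector<int> events;
    static int next_id;
    int id;
    Thing() : id(++next_id) { events.push_back(id); }
    ~Thing() { events.push_back(-id); }
};

std::vector<int> Thing::events;
int Thing::next_id = 0;

void make_and_free(int n)
{
    Thing *p = new Thing[n];   // constructs p[0], p[1], ..., p[n - 1] in order
    delete[] p;                // destroys all n elements, last first, then frees
}
```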
- This requirement presents another problem. What happens if you've provided a
- member operator new(size_t) for class Thing, as above? It cannot, in general,
- know whether it's being asked to allocate storage for a single element or a
- whole array. (Remember the potential trickery I mentioned above.) So what C++
- has done in the past is to ignore any such member functions and call the
- global operator new(size_t) for all array allocations. This has been a less
- than satisfactory solution.
- The joint committee has plugged this control gap by permitting you to define
- functions such as the members operator new[](size_t) and operator delete[](void *).
- Defining these functions gives you control over the allocation and freeing
- of arrays of class objects as well as the class objects themselves. You can't
- necessarily tell how many array elements are being allocated, by the way. An
- array new expression can ask for extra storage for its own bookkeeping, so
- you'd better honor the size_t argument blindly. But at least you can maintain
- private storage pools now for array objects.
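- A minimal sketch of such a pair. A hypothetical Thing records the byte count it is asked for. The class has a nontrivial destructor, so an implementation may well ask for more than N * sizeof(Thing) to hold its element count. The only safe assumption is "at least" that much.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Thing {
    static std::size_t last_request;   // bytes asked for by the latest array new
    int value = 0;
    ~Thing() {}                        // nontrivial: delete[] must know the count

    static void *operator new[](std::size_t size)
    {
        last_request = size;           // honor size blindly; it may include overhead
        return ::operator new[](size); // a private pool could go here instead
    }

    static void operator delete[](void *p) noexcept
    {
        ::operator delete[](p);
    }
};

std::size_t Thing::last_request = 0;
```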
- For completeness, the draft C++ standard also includes global versions of:
- void *operator new[](size_t);
- void operator delete[](void *);
- The library versions of these functions just turn around and call the
- non-array library versions, so I won't show you the code for them. And you can
- indeed knock these functions out with your own definitions, but I'm not sure
- why you'd bother. Doubtless, someone more clever or perverse than I can make a
- case for any feature added to C++.
-
-
- Type Information
-
-
-
- There is one last aspect to the language support library. It is rather small
- compared to exceptions (all of last month's installment) or storage management
- (most of this month's). I tack it on here for completeness.
- Another relatively recent significant addition to the draft C++ standard is
- "run-time type identification" (or RTTI, for short). Basically, it adds the
- operator typeid for obtaining various bits of information on the type of an
- object (or expression). The operator yields an object of class typeinfo,
- defined in the header <typeinfo>. Listing 5 shows one way to write this
- header.
- The exception badtypeid is reported in those cases where the type cannot be
- determined statically at translation time. If, in the process of chasing down
- the actual object, the program encounters a null pointer, you can guess what
- happens.
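- The guess is easy enough to check. In the standard as finally adopted, the exception is spelled std::bad_typeid and the class is std::type_info. The sketch below applies typeid to a null pointer of polymorphic type and catches the result. The class Entry is hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct Entry {                 // polymorphic, so typeid must inspect the object
    virtual ~Entry() {}
};

// Returns true if typeid reports a null pointer by throwing bad_typeid.
bool typeid_of_null_throws(Entry *p)
{
    try {
        (void)typeid(*p);      // chase down the actual object... if there is one
    } catch (const std::bad_typeid &) {
        return true;
    }
    return false;
}
```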
- (If you're put off by all these names made from words run together, you're not
- alone. There's a good chance that the joint committee will approve a new
- naming convention that involves a more liberal use of underscores to separate
- component words in names. So don't be surprised if many of these compound
- names change in the coming months.)
- What can you do with an object of class typeinfo? Well, you can obtain some
- sort of name for the type, for one thing. typeinfo::name() yields a
- null-terminated multibyte string (or NTMBS, in the jargon of the draft C++
- standard) that presumably says something meaningful about the type. There are
- no standard names defined, so far, not even for the builtin types.
- You can also compare two objects of class typeinfo for equality or inequality.
- Within any given program, you can expect two such objects to compare equal
- only if they derive from two expressions of the same type. Don't expect to be
- able to remember these critters in files, however, and check for type equality
- across programs. Even running the same program twice doesn't promise to yield
- the same representation of a typeinfo object for the same type each time. (I
- have indicated that the type information can be represented as an int, but
- that is just illustrative, not a requirement.)
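- Here is a short sketch of the comparison in use, with the modern spellings std::type_info and <typeinfo>. The hypothetical Entry class is polymorphic. For that reason, typeid applied to an Entry reference reports the type of the complete object, so a Check seen through an Entry& still compares equal to typeid(Check). The name strings are implementation-defined, so the code compares type_info objects rather than names.

```cpp
#include <cassert>
#include <typeinfo>

struct Entry { virtual ~Entry() {} };
struct Check : Entry {};
struct Deposit : Entry {};

// typeid of a polymorphic lvalue yields its dynamic type.
bool is_check(const Entry &e)
{
    return typeid(e) == typeid(Check);
}
```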
- Finally, you can impose an ordering on all the types within a program.
- typeinfo::before(const typeinfo&) returns nonzero for an object that
- represents a type earlier in the pecking order than the argument object. Once
- again, however, no promises are made about the rules for determining this
- order, or whether they're even the same each time you run the program.
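- In today's library, the usual way to exploit that ordering is std::type_index from <typeindex>. It is a thin wrapper whose operator< is defined in terms of before, so types can serve as keys in a std::map. The sketch assumes only that some ordering exists, not any particular ordering. The labels map and the helper before_is_strict are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// type_index orders types by type_info::before, so types can be map keys.
std::map<std::type_index, std::string> labels = {
    {std::type_index(typeid(int)), "int"},
    {std::type_index(typeid(double)), "double"},
};

// Expected of a collation order: no type precedes itself, and of two
// distinct types one precedes the other.
bool before_is_strict(const std::type_info &a, const std::type_info &b)
{
    return !a.before(a) && (a.before(b) != b.before(a));
}
```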
- I'm sure far more can be said about the uses of RTTI, but I'm not the one to
- say it at this point in my career. Even if I were, this is not the place to
- say it. For now, you know what the standard C++ library has to know about
- RTTI.
-
- Listing 1 The header <new>
- #ifndef _NEW_
- #define _NEW_
- #include <exception>
- // class xalloc
- class xalloc : public xruntime {
- protected:
- // virtual void do_raise();
- public:
- xalloc(const char * = 0, const char * = 0);
- virtual ~xalloc();
- };
- // function and object declarations
- fvoid_t *set_new_handler(fvoid_t *);
- void operator delete(void *);
- void operator delete[](void *);
- void *operator new(size_t);
- void *operator new[](size_t);
- void *operator new(size_t, void *);
- extern fvoid_t (*_New_hand);
- #endif
-
-
- Listing 2 The function operator new(size_t)
- // operator new(size_t) REPLACEABLE function
- #include <stdlib.h>
- #include <new>
-
- void *operator new(size_t size)
- { // try to allocate size bytes
- void *p;
- while ((p = malloc(size)) == 0 && _New_hand != 0)
- (*_New_hand)();
- return (p);
- }
-
- // End of File
-
-
- Listing 3 The function operator delete(void *)
- // operator delete(void *) REPLACEABLE function
- #include <stdlib.h>
- #include <new>
-
- void operator delete(void *p)
- { // free an allocated object
- free(p);
- }
-
-
- // End of File
-
-
- Listing 4 The function operator new(size_t, void *)
- // operator new(size_t, void *)
- #include <new>
-
- void *operator new(size_t, void *p)
- { // allocate in place
- return (p);
- }
-
- // End of File
-
-
- Listing 5 The header <typeinfo>
- #ifndef _TYPEINFO_
- #define _TYPEINFO_
- // class badtypeid
- class badtypeid : public xlogic {
- protected:
- // virtual void do_raise();
- public:
- badtypeid();
- virtual ~badtypeid();
- };
- // class typeinfo
- class typeinfo {
- const char *_Name;
- const int _Desc; // implementation dependent
- typeinfo(const typeinfo&);
- typeinfo& operator=(const typeinfo&);
- public:
- virtual ~typeinfo();
- int operator==(const typeinfo&) const;
- int operator!=(const typeinfo& _Rop) const
- {return (!(*this == _Rop)); }
- int before(const typeinfo&);
- const char *name() const {return (_Name); }
- };
- #endif
-
- // End of File
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Questions & Answers
-
-
- Run-Time Type Checking in C++
-
-
-
-
- Kenneth Pugh
-
-
- Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++
- language courses for corporations. He is the author of All On C, C for COBOL
- Programmers, and UNIX for MS-DOS Users, and was a member of the ANSI C
- committee. He also does custom C/C++ programming and provides
- SystemArchitectonics services. His address is 4201 University Dr., Suite 102,
- Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also
- receives email at kpugh@allen.com (Internet) and on Compuserve 70125,1142.
-
-
-
-
- Run-Time Typing
-
-
- I have taught myself C++ and am comfortable with most of its concepts. There
- is one aspect that I am unsure about. Bjarne Stroustrup, in his book The C++
- Programming Language, states: "using run-time type inquiries ... destroys all
- modularity in a program and negates the aims of object-oriented programming."
- I attempt to adhere to this suggestion, but there is a situation in which I
- don't know how to apply the rule: reading and writing objects to data files.
- As an example, let's suppose we are writing a simple checkbook balancing
- program. The base object is an Entry, which corresponds to something you would
- enter in your checkbook. Derived from base class Entry are three classes which
- can be instantiated: Check (a written bank draft), Deposit (a bank window
- deposit), and Withdrawal (a window withdrawal). Keeping track of the type of
- objects in memory is easy, since when one of the three derived classes is
- created, it is done through a constructor which builds in type information.
- There are many ways to write the checkbook entries to a data file, so let's
- assume we just write a character-based representation to the file. The
- question is how can we write the data so that we can read the information back
- into memory? The only way I can think of is to include type information, such
- as an integer coded for each class type. This method violates Mr. Stroustrup's
- rule, though. Any thoughts you have on this matter would be greatly
- appreciated.
- Paul Waldo
- Forest, VA
- A
- One reason Bjarne was against run-time type checking is that it lets
- programmers avoid derivation. For example, suppose you have a type_of
- function that can identify the Entry type of a pointer. To post an entry, you
- might code something that looks like Listing 1. If you need to add an
- additional type of Entry, then you have to add another case to the switch
- statement.
- However, there's a cleaner way to handle this problem. Suppose you give Entry
- a pure virtual post function. Each class derived from Entry now supplies its
- own post function. The post_entry function could look like Listing 2. When
- adding a new derived class, you do not need to change the post_entry function.
- You just provide a post function for the new class.
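- A compilable sketch of that approach follows. The class and member names follow the listings. The amount type (long cents), the constructors, and the Deposit and Withdrawal bodies are assumptions. To add, say, a service-charge entry, you would write one new class and leave both Account and entry_post unchanged.

```cpp
#include <cassert>

// Each kind of Entry knows how to post itself; Account needs no switch.
class Entry {
public:
    explicit Entry(long cents) : amount(cents) {}
    virtual ~Entry() {}
    virtual long post(long balance) const = 0;   // returns the new balance
protected:
    long amount;                                  // in cents (assumed)
};

class Check : public Entry {
public:
    explicit Check(long cents) : Entry(cents) {}
    long post(long balance) const override { return balance - amount; }
};

class Deposit : public Entry {
public:
    explicit Deposit(long cents) : Entry(cents) {}
    long post(long balance) const override { return balance + amount; }
};

class Withdrawal : public Entry {
public:
    explicit Withdrawal(long cents) : Entry(cents) {}
    long post(long balance) const override { return balance - amount; }
};

class Account {
public:
    void entry_post(const Entry *pentry) { balance = pentry->post(balance); }
    long balance = 0;
};
```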
- As you have suggested, virtual functions only work while objects are in
- memory. Each object of a class with virtual functions contains a pointer to a
- table of function pointers (the vtable, as it is sometimes referred to). All
- objects of a particular class contain a pointer to the same table. When the
- program calls a virtual member function it uses the pointer to the
- corresponding function in the vtable. In a sense, this vtable pointer uniquely
- identifies the class type of an object. In fact, some compilers have
- non-standard extensions that can use this pointer to provide a form of
- run-time type identification. Other vendors provide alternative methods for
- run-time identification. Microsoft has a CRuntimeClass object associated with
- each class that is used with the IsKindOf function. With this function, you
- can determine if an object belongs to a particular class or if it is derived
- from a class. (The class must be derived from the Microsoft CObject class to
- work with IsKindOf.) To use the IsKindOf function, you include a
- DECLARE_DYNAMIC macro in the class definition and an IMPLEMENT_DYNAMIC macro
- in the class implementation. CRuntimeClass is the data type used to store
- class information. You can obtain a pointer to an object of this type with:
- CRuntimeClass * pclass = RUNTIME_CLASS(Your_class);
- Typically you do not use the information in CRuntimeClass directly, but
- instead pass it to IsKindOf. For example, as shown in the following code
- fragment, you might want to cast a base class pointer to a pointer to a real
- object. To be logically correct, you need to be sure the object pointed to
- belongs to a particular class.
- CObject * pyour_object = new Your_class;
- ...
- if ( pyour_object->IsKindOf( RUNTIME_CLASS(Your_class) ) )
- {
- // Cast it
- Your_class * p_this_object =
- (Your_class *) pyour_object;
- }
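- Standard C++ offers a portable counterpart through RTTI. A dynamic_cast performs the IsKindOf test and the cast in one step. It yields a null pointer when the object is not a Your_class or something derived from one. In this minimal sketch, the hypothetical classes Base and Other stand in for CObject and an unrelated class.

```cpp
#include <cassert>

struct Base { virtual ~Base() {} };          // must be polymorphic for dynamic_cast
struct Your_class : Base { int x = 7; };
struct Derived_class : Your_class {};
struct Other : Base {};

// Returns the object as a Your_class, or a null pointer if it isn't one.
Your_class *as_your_class(Base *p)
{
    return dynamic_cast<Your_class *>(p);
}
```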
- You can use this type information to create objects of a given class. The
- CRuntimeClass class provides a member function CreateObject for dynamically
- creating objects. The use of CreateObject is demonstrated as follows:
- CRuntimeClass * p_your_class = RUNTIME_CLASS(Your_class);
- // Create it
- CObject * pyour_object =
- p_your_class->CreateObject();
- // Cast it to the class
- Your_class * p_this_object =
- (Your_class *) pyour_object;
- In your question, you have run across one area of programming that requires
- some form of type identification: permanent or persistent storage. You cannot
- use the address of a vtable as the identification means for an object. When
- the contents of an object are read back into memory, the chances are that the
- vtable will be in a different memory location. You will need to store some
- form of type identification with the object. This identification could either
- be an integer value or the string name of the class. If your program stores
- objects in a known sequence and retrieves them by the same known sequence,
- then you do not need any type identification stored with the object. For
- example, if you always store Checks starting at the first position in the
- file, followed by Deposits and then Withdrawals, you can determine the type of
- an object by its position in the file. In your example, you cannot assume this
- kind of ordering exists, so you must store a class identifier. There are lots
- of ways to store this identifier, depending on how your classes are organized.
- Let's assume you have some unique identifier for each account entry type, such
- as:
- enum Entry_type {Entry_check, Entry_deposit,
- Entry_withdrawal};
- One of these values would be stored away with each object. For example, if
- your compiler provides run-time type identification with type_of, you might
- code:
- void Entry::save()
- {
- if ( this->type_of() == Check )
- { /* Store Entry_check */ }
- else if ( this->type_of() == Deposit )
- { /* Store Entry_deposit */ }
- else if ( this->type_of() == Withdrawal )
- { /* Store Entry_withdrawal */ }
- ...
- // Store remaining contents
- // in an account
- }
- Before I get too much mail regarding this "hidden switch statement," let me
- explain that this function is complementary to the retrieval function, which I
- will show shortly. You could use a virtual function for save in each of the
- derived classes. The function would store the appropriate Entry_type value,
- call a save function in the Entry class to store the data members in that
- class, and then save its own data members. If you were using the Microsoft
- compiler, this type checking code might look like the following fragment.
- However, Microsoft provides other functions that make this code unnecessary,
- as we shall see shortly.
- void Entry::save()
- {
- CRuntimeClass * pcheck_class
- = RUNTIME_CLASS(Check);
- CRuntimeClass * pdeposit_class
- = RUNTIME_CLASS(Deposit);
- CRuntimeClass * pwithdrawal_class
- = RUNTIME_CLASS(Withdrawal);
- if ( this->IsKindOf(pcheck_class) )
- { /* Store Entry_check */ }
- else if ( this->IsKindOf(pdeposit_class) )
- { /* Store Entry_deposit */ }
- else if ( this->IsKindOf(pwithdrawal_class) )
- { /* Store Entry_withdrawal */ }
- ...
- // Store remaining contents in an account
- }
- The basic dilemma comes in retrieving the objects. You cannot use a retrieve
- function for each derived class, as you do not know the type of entry for the
- next object stored in the file. Thus you can only retrieve an entry as a base
- class object. For example, the caller might use something that looks like:
- Entry *pentry;
- Account account;
- ...
- account.retrieve_next(&pentry);
- The retrieve_next function could look like the following:
- int Account::retrieve_next(Entry **pentry_in)
- {
- // Get rid of old object (Entry has a virtual destructor)
- Entry *pentry = *pentry_in;
- if (pentry != NULL)
- delete pentry;
- pentry = NULL;
- // Read the record off disk, including its stored type
- Entry_type type;
- ...
- // Then check the type of the record read
- if (type == Entry_check)
- pentry = new Check;
- else if (type == Entry_deposit)
- pentry = new Deposit;
- else if (type == Entry_withdrawal)
- pentry = new Withdrawal;
- // Move associated data into the new object
- ...
- // Then return the pointer
- *pentry_in = pentry;
- return (pentry != NULL); // assumed: nonzero means an entry was retrieved
- }
- The caller passes this function a pointer to a pointer to an Entry. The
- function deletes the Entry that *pentry_in points to. The Entry class
- should provide a virtual destructor, so that deleting through an Entry pointer
- also destroys the derived-class part of the object, including any additional
- data members. Based on the Entry_type value stored in the file, the function
- allocates an object of the appropriate class. The function moves
- the necessary data from the file into the new object and returns the value of
- the pointer. If necessary, each derived class can include a function that
- reads any additional data members from the file.
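- Here is a sketch of the whole round trip in standard C++. It makes some assumptions that are not in the text. Each record is written as text on one line. The identifier comes from a virtual type() function rather than a type_of inquiry. And retrieve_next returns a std::unique_ptr instead of filling in a pointer to a pointer. The stored Entry_type value alone decides which class to create.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>

enum Entry_type { Entry_check, Entry_deposit, Entry_withdrawal };

class Entry {
public:
    explicit Entry(long cents) : amount(cents) {}
    virtual ~Entry() {}
    virtual Entry_type type() const = 0;

    // Store the class identifier first, then the data members.
    void save(std::ostream &os) const { os << type() << ' ' << amount << '\n'; }
    void load(std::istream &is) { is >> amount; }

    long amount;
};

class Check : public Entry {
public:
    explicit Check(long cents = 0) : Entry(cents) {}
    Entry_type type() const override { return Entry_check; }
};

class Deposit : public Entry {
public:
    explicit Deposit(long cents = 0) : Entry(cents) {}
    Entry_type type() const override { return Entry_deposit; }
};

class Withdrawal : public Entry {
public:
    explicit Withdrawal(long cents = 0) : Entry(cents) {}
    Entry_type type() const override { return Entry_withdrawal; }
};

// Read one record: the stored identifier picks the class to create.
std::unique_ptr<Entry> retrieve_next(std::istream &is)
{
    int tag;
    if (!(is >> tag))
        return nullptr;                       // end of file (or bad data)
    std::unique_ptr<Entry> p;
    switch (tag) {
    case Entry_check:      p.reset(new Check);      break;
    case Entry_deposit:    p.reset(new Deposit);    break;
    case Entry_withdrawal: p.reset(new Withdrawal); break;
    default:               return nullptr;          // unknown identifier
    }
    p->load(is);
    return p;
}
```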
- The Microsoft archive retrieval function (CArchive::operator>>) does not
- require this if-else structure. The storage function (CArchive::operator<<)
- stores a run-time class identifier (the name of the class) in the file. When
- the retrieval function reads the file, it dynamically creates an object of the
- stored class and loads the information from the file into that object.
- Microsoft's multi-pronged approach to run-time typing eliminates worries about
- the details of persistent object storage and retrieval. All you have to do is
- to include the necessary macros in your object header file and your object
- implementation file.
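- You can get a similar effect in standard C++ without a framework. Store the class name, and keep a table that maps each name to a function that creates an object of that class. This is only a sketch of the idea, not a description of how Microsoft implements it. The names registry, create_by_name, and class_name are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Entry {
    virtual ~Entry() {}
    virtual std::string class_name() const = 0;   // what gets written to the file
};
struct Check : Entry { std::string class_name() const override { return "Check"; } };
struct Deposit : Entry { std::string class_name() const override { return "Deposit"; } };

using Factory = std::function<std::unique_ptr<Entry>()>;

// Maps a stored class name to a function that creates an object of that class.
std::map<std::string, Factory> &registry()
{
    static std::map<std::string, Factory> r = {
        {"Check",   [] { return std::unique_ptr<Entry>(new Check); }},
        {"Deposit", [] { return std::unique_ptr<Entry>(new Deposit); }},
    };
    return r;
}

std::unique_ptr<Entry> create_by_name(const std::string &name)
{
    auto it = registry().find(name);
    if (it == registry().end())
        return nullptr;                            // unknown class name
    return it->second();
}
```

- A real program would write class_name() to the file ahead of each object's data. On retrieval, it would pass the name it reads back to create_by_name, so the code needs no if-else chain.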
-
-
- C++ Problems
-
-
- I am an agricultural economist at the U.S. Department of Agriculture and am
- trying to learn C and C++. I have bought over ten books on C, and am reading
- them and keying in the exercises to practice the concepts. One of the books I
- have bought is Learning C++, by Neil Graham. I began keying in one of the
- exercises (in C++) and the program gave me a bunch of iostream declaration
- errors when I attempted to compile the source code using Borland Turbo C++,
- 1991 (Listing 3). I saw your column in The C Users Journal and thought that
- you might be willing to suggest what is wrong, and how I might fix it. Also,
- do you have any recommendations for textbooks and/or diskette tutorials that
- may be useful in learning C and C++?
- Kenneth W. Erickson
- Washington, D.C.
- A
-
- Reading several books and trying their examples is an excellent way to learn a
- new language. You can get a good feel for alternative ways of approaching
- problems. However, when you use only books for learning, you often run into
- problems for which a book has no answer. Many times I've been in the same
- quandary when using vendor-supplied manuals. In some cases the solution to my
- problem was simple but the manual just didn't deal with the problem.
- In your case, what appears to be an unexplainable error is caused by a simple
- glitch. You named the program with a ".c" extension. The compiler took this to
- mean that your program was to be compiled as a C program. Inside the
- iostream.h file are several C++ syntactic constructs (such as class). The
- compiler reported error messages for these constructs since they do not exist
- in C. If you had named the program with a .cpp extension, the compiler would
- have cleanly compiled the program. You might have been confused if you
- compiled other C++ programs without errors and therefore you thought there was
- something wrong with this program. Many compilers (such as Borland's) have a
- switch that compiles .c files as C++ programs. If this option was set in your
- configuration file for previous programs, then other C++ programs with .c
- extensions would have compiled properly. If you switched around directories or
- reinstalled the compiler, the switch may have been reset.
- There are numerous books out on C. Many are based on your knowledge of other
- languages. My book C Language for Programmers (QED) gives comparisons of C
- constructs with COBOL, PASCAL, PL/1, BASIC, and FORTRAN. My new book C
- Language for COBOL Programmers (QED) presents very detailed comparisons
- between COBOL language statements and C. All on C (Harper Collins) explains C
- without comparisons to other languages, but with numerous examples. In the C++
- arena, I suggest C++ Programming and Fundamental Concepts by Anderson and
- Heinze (Prentice-Hall). I use that book as a supplementary book for my C++
- courses.
-
-
- User interface
-
-
- Q
- I am new to the C language and am having a problem that you might help me
- with. I am writing a football card data base, and need help with the user
- interface. What I am trying to do is let the user of my program enter required
- information without pressing the enter key. I also want the user to be able to
- move from one data entry field to another using the arrow and home keys of the
- key pad. If the user presses the arrow keys the highlighted field would move
- up or down as required, but if a letter key or a number key was pressed, the
- letter or number would be concatenated to the appropriate char string
- variable. The concatenating part I can handle. It's getting the input that's
- giving me the problem. I've tried many different approaches to make this work,
- but still no luck. If you could help me with this problem I would be very
- grateful. I'm a subscriber to the C Users Journal, and will be checking the
- Questions & Answers section to see if you can solve my problem. I am using
- Borland C++ version 3.0 (DOS).
- Randy Jones
- Ruther Glen, VA
- A
- As you have discovered, data-field entry functions are not included in the
- Standard C libraries. Numerous shareware and commercially-available libraries
- will perform the operations you list. For a listing of available shareware
- consult The C Users' Group Public Domain Catalog [available for free upon
- request from R&D Publications, 1601 W. 23rd St., Lawrence, KS 66046. Ph:
- (913)-841-1631. Fax: (913)-841-2624. e-mail: cujsub@rdpub.com]. Perusing The C
- Users Journal itself will reveal a number of the major vendors of display
- packages. Since for the past year I have been programming almost exclusively
- in Microsoft Windows using Visual C++, I haven't kept up with all the
- different features of the commercial packages. Many of these packages have
- integrated screen designers/code generators. With such a system, you can
- lay out your screens with a mouse or cursor keys, rather than by writing
- individual function calls for each field. In case you can't find what you want
- in either the catalog or the ads, my book All on C shows a sample
- implementation of a package of display functions providing most of the
- features you requested.
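- The core of such a package is a routine that reads one keystroke without waiting for Enter and tells ordinary characters apart from cursor keys. Under DOS, Borland's getch (declared in <conio.h>) reads a key without echo. An extended key such as an arrow arrives as a 0 byte followed by a scan code, and some libraries use 0xE0 as the prefix instead. The sketch below assumes those conventions and the usual PC scan codes: 72 for up, 80 for down, and 71 for Home. It takes the byte source as a parameter so you can try it without a DOS console. In a real program you would pass a function that calls getch.

```cpp
#include <cassert>
#include <functional>

enum Key_kind { Key_char, Key_up, Key_down, Key_home, Key_other };

struct Keystroke {
    Key_kind kind;
    int ch;          // the character, for Key_char only
};

// Decode one keystroke from a getch-like source of bytes.
// Assumed DOS convention: extended keys arrive as 0 (or 0xE0) plus a scan code.
Keystroke read_key(const std::function<int()> &next_byte)
{
    int c = next_byte();
    if (c != 0 && c != 0xE0)
        return Keystroke{Key_char, c};
    switch (next_byte()) {                 // scan code of the extended key
    case 72: return Keystroke{Key_up, 0};
    case 80: return Keystroke{Key_down, 0};
    case 71: return Keystroke{Key_home, 0};
    default: return Keystroke{Key_other, 0};
    }
}
```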
-
- Listing 1 One way to do run-time type checking
- void Account::entry_post(Entry * pentry)
- {
- switch(type_of(pentry))
- {
- case Deposit_entry:
- balance += pentry->amount;
- break;
- case Check_entry:
- balance -= pentry->amount;
- break;
- case Withdrawal_entry:
- balance -= pentry->amount;
- break;
- }
- }
- /* End of File */
-
-
- Listing 2 Virtual functions simplify run-time type checking
- void Account:: entry_post(Entry * pentry)
- {
- balance = pentry->post(balance);
- }
- /* End of File */
-
-
- Listing 3 Code that one compiler didn't like
- /* GRM-P45.C -- Graham, Learning C++, p. 45 */
- // File rabbits.ccp
- // Program for inverse Fibonacci rabbit problem
-
- #include <iostream.h>
-
- main()
- {
- // Get input from user
-
- long current; // number of pairs this month
- long fertile; // number of fertile pairs
- long needed; // number of pairs needed
- ...
- }
-
- Error messages:
-
-
- Compiling C:\GRM-P45.C:
- Error E:\TC\INCLUDE\IOSTREAM.H 38: Declaration syntax error
- Error E:\TC\INCLUDE\IOSTREAM.H 39: Declaration syntax error
- Error E:\TC\INCLUDE\IOSTREAM.H 42: Declaration syntax error
- ...
- // End of File
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Stepping Up To C++
-
-
- The Return Types of Virtual Functions
-
-
-
-
- Dan Saks
-
-
- Dan Saks is the president of Saks & Associates, which offers consulting and
- training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan
- is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall
- Validation Suite for C++ (both with Thomas Plum). You can reach him at 393
- Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601, or
- electronically at dsaks@wittenberg.edu.
-
-
- WG21+X3J16, the joint ISO ANSI C++ technical committee, is now in its fifth
- year of work on a standard definition for the C++ programming language and its
- accompanying library. Over the years, the committee has added more than a
- dozen new features to the language. In "Recent Language Extensions to C++"
- (CUJ, June 1993), I described several of them:
- templates
- exception handling
- new keywords and digraphs to support European translation environments
- wchar_t as a keyword
- function and operator overloading for enumeration types
- operator new[] and operator delete[] for array allocation
- In "Nested Classes" (CUJ, July 1993), I described one other:
- forward declaration of nested classes
- This month, I'll explain another extension:
- relaxed restrictions on the return types for virtual functions
- This feature enhances the language's support for object-oriented programming.
- In particular, it extends the set of valid conversions within a type hierarchy
- that you can write without casting. I assume you are familiar with virtual
- functions in C++, which I described in my last three columns (see "Virtual
- Functions," "How Virtual Functions Work," and "Overloading and Overriding" in
- CUJ, December, 1993 through February, 1994).
-
-
- What the ARM Says
-
-
- According to The Annotated C++ Reference Manual (ARM) (Ellis and
- Stroustrup [1990]), a member function declared in a derived class D overrides
- a virtual function in its base class B only if that function in D has the same
- name and signature (sequence of parameter types) as the function it overrides.
- Clearly, if the name of a function, say f, declared in D differs from the name
- of every function declared in B, then f doesn't override anything. f's
- declaration adds a new name to the scope of D. If D declares a function with
- the same name, again say f, as one or more functions inherited from B, but D's
- f has a signature that differs from the signatures of every f function in B,
- then again D's f doesn't override anything. In this case, D's f hides all of
- B's f functions while in the scope of D. This is not an error, but as I
- explained last month, the resulting behavior might surprise you. Consequently,
- many C++ compilers issue a warning when this sort of hiding occurs.
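- Here is a small sketch of the hiding case. It assumes a C++11 compiler and uses hypothetical classes B and D, not classes from the listings. D::f takes a double, so it hides B::f(int) instead of overriding it. A call through a B * still reaches B::f, while a call on a D object finds only D::f.

```cpp
#include <string>

// Hypothetical classes illustrating hiding versus overriding.
class B {
public:
    virtual ~B() {}
    virtual std::string f(int) { return "B::f(int)"; }
};

class D : public B {
public:
    // Different signature: hides B::f within the scope of D; does not override it.
    std::string f(double) { return "D::f(double)"; }
};

// Through a B *, only B's f functions are candidates, and B::f(int) is not overridden.
std::string call_through_base(B *bp) { return bp->f(1); }

// On a D, lookup stops at D::f(double); the int argument converts to double.
std::string call_on_derived(D &d) { return d.f(1); }
```

- GCC, for one, reports this situation when given its -Woverloaded-virtual option.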
- What about differing return types? According to Section 10.2 of the ARM: "It
- is an error for a derived class function to differ from a base class virtual
- function in the return type only." For example, in
- class B
- {
- public:
- virtual int vf(int);
- ...
- };
-
- class D : public B
- {
- ...
- void vf(int); // error
- ...
- };
- the declaration of D::vf is an error because it has the same name and
- signature as a function declared in B, but B::vf and D::vf have different
- return types.
- Consider the consequences of allowing different return types in this
- situation. A function g that accepts a formal parameter of type B * or B & can
- apply vf to that B object, as in
- void g(B *bp)
- {
- ...
- if (bp->vf(0) > 1)
- ...
- }
- When compiling this function, the compiler considers only the definition for
- class B. None of the classes derived from B need to be declared when compiling
- g. Based on the static type of B::vf, the call bp->vf(0) in g should be just
- fine; it returns an int, as the if statement in g apparently expects.
- Since D is publicly derived from B, you can pass a D * to g, as in
- D d;
- ...
-
- g(&d);
- But now if D::vf has a void return, how can the call bp->vf(0) possibly return
- an int? It can't, which is why the ARM insists that an overriding virtual
- function must have the same return type as the function it overrides.
-
-
- Cloning Objects
-
-
- Some members of the standards committee suggested that this rule is more
- restrictive than it needs to be. There are, in fact, legitimate circumstances
- where the return types of the overridden and overriding functions need not be
- absolutely identical. Clone functions are one such circumstance.
- Some applications need to be able to clone an object, that is, create an
- object that's an exact copy of another object. Typically, you implement a
- class X with a cloning function as something like
- class X
- {
- public:
- X *clone() const
- { return new X(*this); }
- ...
- };
- Inside X::clone, *this is an expression of type X that designates the object
- being cloned. The expression new X(*this) allocates a new X object and
- initializes it using X's copy constructor (the X constructor that takes a single
- argument of type const X &). clone is a const member function because it does not
- alter *this, and should thus apply to const as well as non-const objects.
- new X returns an X *. Although you could write a clone function that returns
- an X or an X &, I suggest returning X * to emphasize that clone returns a
- dynamically-allocated object that should be deleted eventually. In general,
- for any class X, the return type of X::clone should be X *. For example, in a
- library of geometric shapes, circle::clone should return a circle * and
- rectangle::clone should return a rectangle *.
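- The idiom fits in a few lines. This minimal sketch uses a hypothetical class X that holds a single int; nothing in it comes from the article's listings. The caller owns the returned pointer and must delete it.

```cpp
// Hypothetical class X with a clone function that returns X *.
class X {
public:
    explicit X(int v) : value(v) {}
    // Copy-constructs a new X from *this; the caller must eventually delete it.
    X *clone() const { return new X(*this); }
    int get() const { return value; }
private:
    int value;
};

// Clones a const X and checks that the copy is a distinct object with the same value.
bool clone_is_independent_copy() {
    const X original(42);
    X *copy = original.clone();   // allowed on a const object: clone is a const member
    bool ok = copy != &original && copy->get() == 42;
    delete copy;                  // clone returned dynamically allocated memory
    return ok;
}
```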
- When used in a class that's the root of a polymorphic hierarchy, clone
- functions should be virtual, and often pure virtual. This lets you clone an
- object without knowing its exact type. For example, Listing 1 shows a
- polymorphic hierarchy of shapes similar to the one I presented three months
- ago (see "Virtual Functions", CUJ, December 1993). Classes circle, rectangle,
- and triangle are all derived from base class shape. shape declares a pure
- virtual clone function as:
- virtual shape *clone() const = 0;
- Each of the derived classes overrides the clone function with an impure
- definition. For instance, class circle declares
- virtual shape *clone() const;
- in the class definition, and later defines
- shape *circle::clone() const
- {
- return new circle(*this);
- }
- Not long ago I said that, in general, a clone function for a class X should
- have a return type of X *. But here the return type of circle::clone is shape
- *, not circle *. According to the ARM, circle::clone must return a shape * because
- that is the return type of the function it overrides. Nonetheless, this clone
- function is still quite useful.
- circle::clone returns the result of new circle, which is indeed a circle *.
- Since circle is publicly derived from shape, a circle is a shape, so the
- circle * in the return expression converts safely and quietly to the return
- type shape *. A similar conversion occurs in the clone functions for rectangle
- and triangle shown in Listing 1. An application can clone an arbitrary shape
- with code such as:
- shape *s;
- ...
- shape *cs = s->clone();
- which leaves cs pointing to an object that has the same dynamic type and value
- as *s.
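- The sketch below trims the Listing 1 hierarchy down to what cloning needs and follows the ARM rule that every clone returns shape *. It assumes std::string names. It also adds a virtual destructor to shape, which Listing 1 lacks, so that a clone can be deleted through a shape *. The helper clone_name is hypothetical. It clones whatever it is handed and reports the clone's dynamic type through the virtual name function.

```cpp
#include <string>

// Cut-down shape hierarchy, ARM style: every clone returns shape *.
class shape {
public:
    virtual ~shape() {}                 // added so clones can be deleted via shape *
    virtual shape *clone() const = 0;
    virtual std::string name() const = 0;
};

class circle : public shape {
public:
    explicit circle(double r) : radius(r) {}
    shape *clone() const { return new circle(*this); }  // circle * converts to shape *
    std::string name() const { return "circle"; }
private:
    double radius;
};

class rectangle : public shape {
public:
    rectangle(double h, double w) : height(h), width(w) {}
    shape *clone() const { return new rectangle(*this); }
    std::string name() const { return "rectangle"; }
private:
    double height, width;
};

// Clones an arbitrary shape without knowing its dynamic type.
std::string clone_name(const shape *s) {
    shape *cs = s->clone();
    std::string n = cs->name();   // virtual call: reports the clone's dynamic type
    delete cs;
    return n;
}
```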
- Listing 2 shows a more elaborate example dealing with collections of shapes
- implemented as arrays of pointers. The clone_all function replicates an entire
- collection of shapes. First, it allocates a new array to hold the pointers to
- the shape clones. Then, for each shape in the original collection, it clones
- that shape (using its virtual clone function) and places the pointer to the
- copy into the new array. As is always the case with polymorphic objects, it
- doesn't matter to clone_all how many different shapes there are; each shape
- knows how to clone itself.
-
-
- Unnecessary Downcasting?
-
-
- The ARM's requirement that the return type of an overriding function must be
- the same as the return type of the function it overrides apparently doesn't
- pose any problems when dealing with pointers (or references) to objects at the
- root of the hierarchy, as in Listing 2. However, it often necessitates using
- casts when dealing with pointers to objects of other types in the hierarchy.
- For example, you cannot write
- rectangle *r;
- ...
- rectangle *cr = r->clone();
- because r->clone() returns a shape *. Even though a rectangle is a shape, a
- shape is not necessarily a rectangle. Therefore, you must add a cast, as in
- rectangle *cr = (rectangle *)r->clone();
- Casts are dangerous things. They tell the compiler to stop complaining and let
- you do what you want to do, or at least, what you think you want to do. But
- compilers are right more often than we care to admit. Casts indicate that you
- are doing something that is generally unsafe. Thus, you really should avoid
- casts in your C++ programs, probably even more so than in C programs. When
- casts are rare, the few casts you really do need will stand out and draw more
- scrutiny, which they deserve. (For more about avoiding casts, see Plum and
- Saks [1991].)
- Of course, you don't really need a cast in the previous example, because you
- don't really need to call clone to replicate a shape that you know is a
- rectangle. Rather, you can simply clone *r with
- rectangle *cr = new rectangle(*r);
- But consider what happens if rectangle and triangle (and any other polygons)
- are derived from an abstract base class polygon derived from shape, rather
- than directly from shape, as outlined in Listing 3. (An abstract base class is
- a class with at least one pure virtual function. An abstract base class can be
- a derived class.)
- To satisfy the ARM, all the clone functions in Listing 3 return a shape *.
- (The declaration of polygon's clone function is inside a comment because you
- don't really need it. A function declared pure virtual in a base class remains
- pure virtual in the derived class unless overridden with an impure
- declaration.) When you clone a polygon, you get a pointer of type shape *,
- even though you know that pointer specifically addresses a polygon. Thus, you
- cannot clone a polygon and copy the result to a polygon * without a cast. That
- is, you cannot omit the cast in
- polygon *p;
- ...
- polygon *cp = (polygon *)p->clone();
- All of the casts in the previous examples cast pointers to base class objects
- into pointers to derived class objects. These casts are commonly called
- "downcasts" because most people draw class hierarchies with the base classes
- above their derived classes. Remember, when a class D is publicly derived from
- B, a D is a B, so you can safely convert a D * to a B * without a cast. But a
- B * is not necessarily a D *, so you can't convert a B * downward to a D *
- without a cast. Thus, like all other casts, downcasts are generally unsafe
- unless you're absolutely, positively sure that your B * actually points to a
- D.
- But the downcast in
- polygon *cp = (polygon *)p->clone();
- is actually quite safe because we do know that p->clone() returns a pointer to a polygon.
- In fact, there's a whole family of similarly safe downcasts that occur
- commonly in object-oriented systems. The problem is that, on the surface, the
- safe casts look just like the unsafe ones. The cure is to augment the rules
- for virtual overriding so that you can write the safe conversions without
- casts.
- For example, you should be able to declare each virtual clone function in a
- hierarchy so that it returns a pointer whose static type is the same as its
- dynamic type. That is, circle::clone should return a circle * and
- rectangle::clone should return a rectangle *, even though shape::clone returns
- a shape *. Then you can clone any shape, or anything derived from shape,
- without a cast. For instance, given
- shape *s;
-
- circle *c;
- you can write
- shape *cs = s->clone();
- to clone a shape, and
- circle *cc = c->clone();
- to clone a circle.
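- The sketch below shows the cast-free version. It assumes a compiler that implements the relaxed rule described in the next section; any C++98 or later compiler does. The get_radius member and the two helper functions are hypothetical additions for illustration.

```cpp
// Shape hierarchy using the relaxed (covariant) return-type rule.
class shape {
public:
    virtual ~shape() {}
    virtual shape *clone() const = 0;
    virtual double area() const = 0;
};

class circle : public shape {
public:
    explicit circle(double r) : radius(r) {}
    circle *clone() const { return new circle(*this); }   // overrides shape::clone
    double area() const { return 3.1415926 * radius * radius; }
    double get_radius() const { return radius; }
private:
    double radius;
};

// Through a circle *, clone yields a circle *: no cast, and circle members are reachable.
double cloned_radius(const circle *c) {
    circle *cc = c->clone();
    double r = cc->get_radius();
    delete cc;
    return r;
}

// Through a shape *, the same override runs and the result arrives as a shape *.
double cloned_area(const shape *s) {
    shape *cs = s->clone();
    double a = cs->area();
    delete cs;
    return a;
}
```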
- In a sense, declaring circle::clone to return a circle * doesn't introduce any
- new conversions. It merely shifts the exact point where the conversions occur,
- or eliminates the conversions altogether. For example, when you write
- circle::clone as
- shape *circle::clone() const
- {
- return new circle(*this);
- }
- a conversion from circle * to shape * occurs as part of the return statement.
- Then there's no conversion at all in a calling expression like
- shape *s;
- ...
- shape *cs = s->clone();
- In contrast, when you write circle::clone as
- circle *circle::clone() const
- {
- return new circle(*this);
- }
- no conversion occurs inside the function, but an implicit conversion from
- circle * (or rectangle * or triangle *) to shape * occurs in the calling
- expression. The net effect is the same.
-
-
- The New, Relaxed Rules
-
-
- The C++ standards committee agreed that the ARM's requirement on the return
- type of virtual functions is a bit too restrictive. Thus, the current draft of
- the Working Paper (the standard-to-be) relaxes the original rule. The new rule
- as it appears in the Working Paper is jargon-rich and seems to change with
- each new draft, so I'll spare you the exact words. Here's more-or-less what it
- says:
- For all classes B and D defined as
- class B
- {
- ...
- virtual BT f();
- ...
- };
-
- class D : public B
- {
- ...
- DT f();
- ...
- };
- types BT and DT must be identical, or they must satisfy either of the
- following conditions:
- 1. BT is BB * and DT is DD * where DD is derived from BB.
- 2. BT is BB & and DT is DD & where DD is derived from BB.
- In either case (1) or (2),
- 3. class D must have the access rights to convert a DD * (or DD &) to a BB * (or
- BB &, respectively).
- In most common applications, BB is a public base class of DD, so D can perform
- the conversions. But, for example, if BB is a private base class of DD then
- the conversions are not valid, and BT and DT will not satisfy condition (3).
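- Condition (3) can be checked directly with type traits. The sketch below assumes C++11, and all class names in it are hypothetical. std::is_convertible confirms that a DD * converts implicitly to a BB * only when BB is a public base. std::is_base_of, by contrast, reports BB as a base class in both cases.

```cpp
#include <type_traits>

// Hypothetical return-type classes: BB is a public base of DD_pub, a private base of DD_priv.
struct BB { virtual ~BB() {} };
struct DD_pub : public BB {};
struct DD_priv : private BB {};

// A public base allows the implicit DD * to BB * conversion that condition (3) requires...
static_assert(std::is_convertible<DD_pub *, BB *>::value, "public base: convertible");
// ...a private base does not, even though BB is still a base class of DD_priv.
static_assert(!std::is_convertible<DD_priv *, BB *>::value, "private base: not convertible");
static_assert(std::is_base_of<BB, DD_priv>::value, "BB is a base class of DD_priv");

struct B {
    virtual ~B() {}
    virtual BB *f() { return nullptr; }
};

struct D : B {
    DD_pub *f() { return nullptr; }   // OK: covariant return through a public base
    // DD_priv *f();                  // would be rejected: DD_priv * cannot convert to BB *
};
```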
- The above rules apply even if D is derived indirectly from B. Or, BB might be
- B and DD might be D. The latter, in fact, is the case with clone functions.
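- Here is a sketch of the indirect case. It assumes the polygon layer of Listing 3, rewritten with covariant returns; the sides member and the cloned_sides helper are hypothetical. polygon redeclares clone as pure but returning polygon *. rectangle, two levels below shape, overrides both declarations at once with a function returning rectangle *.

```cpp
// Hypothetical polygon layer: rectangle derives from shape indirectly, through polygon.
class shape {
public:
    virtual ~shape() {}
    virtual shape *clone() const = 0;
};

class polygon : public shape {
public:
    polygon *clone() const = 0;      // still pure, but now returns polygon *
    virtual int sides() const = 0;
};

class rectangle : public polygon {
public:
    rectangle *clone() const { return new rectangle(*this); }  // overrides both
    int sides() const { return 4; }
};

// Clones a polygon without the (polygon *) cast that the ARM rule forced.
int cloned_sides(const polygon *p) {
    polygon *cp = p->clone();
    int n = cp->sides();
    delete cp;
    return n;
}
```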
- Listing 4 shows the shape hierarchy of Listing 1 rewritten using the new
- relaxed rules for the return type of virtual functions. For completeness, I've
- included all the member function bodies so you can use them to build and
- execute the test code in Listing 2.
- Although the committee adopted these relaxed rules in March, 1992, I believe
- most vendors have yet to release a C++ compiler that supports them. As of
- early 1994, only two of the six PC-based compilers I own (Borland 4.0 and
- Watcom 9.5) can compile Listing 4 without error.
-
-
- cv-qualifiers in Return Types
-
-
- According to the current (September 1993) Working Paper, the cv-qualifiers
- (const and volatile) in the return types of the overriding and overridden
- functions need not be identical. My understanding is that the overriding
- function's return type cannot have any cv-qualifiers that are not also in the
- overridden function's return type. Listing 5 shows some examples.
- Class B in Listing 5 declares virtual function f with a return type const BB
- *, but class D overrides it with a function that returns DD * (where DD is
- publicly derived from BB). Hence, if bp is a B * that actually points to a D,
- then
- const BB *bbp = bp->f();
- invokes D::f applied to *bp. D::f returns a pointer to a non-const DD object,
- which the expression bp->f() quietly converts to const BB *.
-
- Pointer conversions that add cv-qualifiers are always safe, but conversions
- that strip off cv-qualifiers are not. Thus, given
- char *cp;
- const char *ccp;
- then
- ccp = cp;
- is safe, but
- cp = ccp;
- is not. Similarly, as you convert derived types to their base types, adding
- cv-qualifiers to pointer types should not make the conversions any less safe.
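- The same safety rules can be written as compile-time checks. This sketch assumes C++11's <type_traits>. The names BB, DD, and read_through_const are hypothetical. Each static_assert records whether the implicit conversion is allowed.

```cpp
#include <type_traits>

// Adding const to the pointed-to type is implicit and safe...
static_assert(std::is_convertible<char *, const char *>::value, "adds const: allowed");
// ...but stripping it requires a cast.
static_assert(!std::is_convertible<const char *, char *>::value, "drops const: rejected");

struct BB { virtual ~BB() {} };
struct DD : BB {};

// Derived-to-base and adding cv-qualifiers combine in one implicit conversion.
static_assert(std::is_convertible<DD *, const BB *>::value, "DD * to const BB *: allowed");
static_assert(!std::is_convertible<const DD *, BB *>::value, "const DD * to BB *: rejected");

// Reads a character through a const view of a non-const pointer.
char read_through_const(char *cp) {
    const char *ccp = cp;   // fine: adds const
    // cp = ccp;            // error: would discard const
    return *ccp;
}
```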
- Listing 5 also shows that derived class D has a virtual function g returning a
- const DD & that overrides a function returning a const volatile BB &. This is
- also valid. However, if you omit const from the return type of B::g, then
- D::g is erroneous.
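- Listing 5 can be fleshed out into a compilable sketch. The empty class bodies, the dd data member, and the decision to make B's functions pure are assumptions made only so that the example stands alone. A modern C++ compiler should accept both overrides.

```cpp
// Hypothetical return-type classes; DD is publicly derived from BB.
struct BB { virtual ~BB() {} };
struct DD : BB {};

class B {
public:
    virtual ~B() {}
    virtual const BB *f() = 0;
    virtual const volatile BB &g() = 0;
};

class D : public B {
public:
    DD *f() { return &dd; }        // overrides B::f: drops const, which is allowed
    const DD &g() { return dd; }   // overrides B::g: keeps const, drops volatile
private:
    DD dd;
};
// If B::g returned volatile BB & instead, D::g's const would be a qualifier
// the base's return type lacks, and the override would be ill-formed.
```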
- None of the compilers I own support this feature yet.
-
-
- More to Come
-
-
- Over the past two years, the standards committee has added several other new
- features to C++:
- run-time type identification
- mutable members for const objects
- namespaces
- a predefined boolean type
- new syntax for casts
- I will explain them all in upcoming columns.
-
-
- Meeting Dates, Etc.
-
-
- WG21+X3J16 will meet three times in 1994:
- March 6-11 in San Diego, CA USA
- July 10-15 in Waterloo, Ontario Canada
- November 6-11 in Valley Forge, PA USA
- If all goes as scheduled, the draft standard should be available for public
- review and comment shortly after the July meeting.
- If you would like to participate in the standards process as a member of
- X3J16, contact the vice-chair:
- Josée Lajoie
- IBM Canada Laboratory
- 844 Don Mills Rd.
- North York, Ontario M3C 1V7 Canada
- (416)448-2734
- josee@vnet.ibm.com
- References
- Ellis and Stroustrup [1990]. Margaret A. Ellis and Bjarne Stroustrup. The
- Annotated C++ Reference Manual. Addison-Wesley.
- Plum and Saks [1991]. Thomas Plum and Dan Saks. C++ Programming Guidelines.
- Plum Hall.
-
- Listing 1 A class hierarchy with virtual cloning functions that have identical
- return types
- //
- // base class 'shape'
- //
- class shape
- {
- public:
- enum palette { BLUE, GREEN, RED };
- shape(palette c);
- virtual double area() const = 0;
- virtual shape *clone() const = 0;
- palette color() const;
- virtual const char *name() const = 0;
- virtual ostream &put(ostream &os) const;
- private:
- palette _color;
-
- static const char *color_image[RED - BLUE + 1];
- };
-
- ...
-
- //
- // class 'circle' derived from 'shape'
- //
- class circle: public shape
- {
- public:
- circle(palette c, double r);
- double area() const;
- shape *clone() const;
- const char *name() const;
- ostream &put(ostream &os) const;
- private:
- double radius;
- };
-
- shape *circle::clone() const
- {
- return new circle(*this);
- }
-
- ...
-
- //
- // class 'rectangle' derived from 'shape'
- //
- class rectangle : public shape
- {
- public:
- rectangle(palette c, double h, double w);
- double area() const;
- shape *clone() const;
- const char *name() const;
- ostream &put(ostream &os) const;
- private:
- double height, width;
- };
-
- shape *rectangle::clone() const
- {
- return new rectangle(*this);
- }
-
- ...
-
- //
- // class 'triangle' derived from 'shape'
- //
- class triangle : public shape
- {
- public:
- triangle(palette c, double s1, double s2, double a);
- double area() const;
- shape *clone() const;
- const char *name() const;
-
- ostream &put(ostream &os) const;
- private:
- double side1, side2, angle;
- };
-
- shape *triangle::clone() const
- {
- return new triangle(*this);
- }
-
- ...
-
- // End of File
-
-
- Listing 2 A crude application of the shape hierarchy in Listing 1
- //
- // 'clone_all' clones shape array 'sa1' with 'n'
- // elements
- //
- shape **clone_all(shape *sa1[], size_t n)
- {
- shape **sa2 = new shape *[n];
- for (size_t i = 0; i < n; ++i)
- sa2[i] = sa1[i]->clone();
- return sa2;
- }
-
- //
- // 'largest' returns the shape with the largest
- // area from shape array 'sa' with 'n' elements
- //
- shape *largest(shape *sa[], size_t n)
- {
- shape *s = 0;
- double m = 0;
- for (size_t i = 0; i < n; ++i)
- if (sa[i]->area() > m)
- {
- m = sa[i]->area();
- s = sa[i];
- }
- return s;
- }
-
- int main()
- {
- const int N = 4;
- shape *s[N];
- shape *ls;
- s[0] = new circle(shape::RED, 2);
- s[1] = new triangle(shape::BLUE, 5, 6, asin(0.8));
- s[2] = new rectangle(shape::RED, 3, 4);
- s[3] = new circle(shape::GREEN, 3);
- cout << "The shapes are:\n";
- for (int i = 0; i < N; ++i)
- cout << i << ")\t" << *s[i] << '\n';
- cout << '\n';
- shape **cs = clone_all(s, N);
-
- cout << "The cloned shapes are:\n";
- for (int i = 0; i < N; ++i)
- cout << i << ")\t" << *cs[i] << '\n';
- cout << '\n';
- ls = largest(cs, N);
- cout << "The shape with the largest area is a...\n\t";
- cout << *ls << ".\n";
- cout << "Its area is "<< ls->area() << ".\n";
- return 0;
- }
-
- // End of File
-
-
- Listing 3 The shape hierarchy with rectangle and triangle derived from
- abstract base class polygon
- class shape
- {
- public:
- ...
- virtual shape *clone() const = 0;
- ...
- };
-
- ...
-
- class circle : public shape
- {
- public:
- shape *clone() const;
- ...
- };
-
- shape *circle::clone() const
- {
- return new circle(*this);
- }
-
- ...
-
- class polygon : public shape
- {
- public:
- // shape *clone() const = 0; // still pure
- ...
- };
-
- ...
-
- class rectangle : public polygon
- {
- public:
- shape *clone() const;
- ...
- };
-
- shape *rectangle::clone() const
- {
- return new rectangle(*this);
- }
-
-
- ...
-
- class triangle : public polygon
- {
- public:
- shape *clone() const;
- ...
- };
-
- shape *triangle::clone() const
- {
- return new triangle(*this);
- }
-
- ...
-
- // End of File
-
-
- Listing 4 The shape class hierarchy with virtual cloning functions employing
- the relaxed return type rules
- #include <iostream.h>
- #include <math.h>
- #include <stddef.h>
-
- //
- // base class 'shape'
- //
- class shape
- {
- public:
- enum palette { BLUE, GREEN, RED };
- shape(palette c);
- virtual double area() const = 0;
- virtual shape *clone() const = 0;
- palette color() const;
- virtual const char *name() const = 0;
- virtual ostream &put(ostream &os) const;
- private:
- palette _color;
- static const char *color_image[RED - BLUE + 1];
- };
-
- const char *shape::color_image[shape::RED - shape::BLUE + 1] =
- { "blue", "green", "red" };
-
- shape::shape(palette c) : _color(c) { }
-
- ostream &shape::put(ostream &os) const
- {
- return os << color_image[color()] << ' ' << name();
- }
-
- shape::palette shape::color() const
- {
- return _color;
- }
-
- //
-
- // class 'circle' derived from 'shape'
- //
- const double pi = 3.1415926;
-
- class circle : public shape
- {
- public:
- circle(palette c, double r);
- double area() const;
- circle *clone() const;
- const char *name() const;
- ostream &put(ostream &os) const;
- private:
- double radius;
- };
-
- circle::circle(palette c, double r) : shape(c), radius(r) { }
-
- double circle::area() const
- {
- return pi * radius * radius;
- }
-
- const char *circle::name() const
- {
- return "circle";
- }
-
- ostream &circle::put(ostream &os) const
- {
- return shape::put(os) << " with radius = " << radius;
- }
-
- circle *circle::clone() const
- {
- return new circle(*this);
- }
-
- //
- // class 'rectangle' derived from 'shape'
- //
- class rectangle: public shape
- {
- public:
- rectangle(palette c, double h, double w);
- double area() const;
- rectangle *clone() const;
- const char *name() const;
- ostream &put(ostream &os) const;
- private:
- double height, width;
- };
-
- rectangle::rectangle(palette c, double h, double w)
- : shape(c), height(h), width(w) { }
-
- double rectangle::area() const
- {
- return height * width;
-
- }
-
- const char *rectangle::name() const
- {
- return "rectangle";
- }
-
- ostream &rectangle::put(ostream &os) const
- {
- return shape::put(os) << " with height = " << height
- << " and width = " << width;
- }
-
- rectangle *rectangle::clone() const
- {
- return new rectangle(*this);
- }
-
- //
- // class 'triangle' derived from 'shape'
- //
- class triangle : public shape
- {
- public:
- triangle(palette c, double s1, double s2, double a);
- double area() const;
- triangle *clone() const;
- const char *name() const;
- ostream &put(ostream &os) const;
- private:
- double side1, side2, angle;
- };
-
- triangle::triangle(palette c, double s1, double s2, double a)
- : shape(c), side1(s1), side2(s2), angle(a) { }
-
- double triangle::area() const
- {
- return side1 * sin(angle) * side2 / 2;
- }
-
- const char *triangle::name() const
- {
- return "triangle";
- }
-
- ostream &triangle::put(ostream &os) const
- {
- return shape::put(os) << " with one side = " << side1
- << ", another side = " << side2
- << " and angle = " << angle;
- }
-
- triangle *triangle::clone() const
- {
- return new triangle(*this);
- }
-
- ostream &operator<<(ostream &os, const shape &s)
-
- {
- return s.put(os);
- }
-
- // End of File
-
-
- Listing 5 A derived class whose virtual functions have fewer cv-qualifiers
- than the functions they override
- class B
- {
- ...
- virtual const BB *f();
- virtual const volatile BB &g();
- ...
- };
-
- class D: public B
- {
- ...
- DD *f();
- const DD &g();
- ...
- };
-
- // End of File
-
-
-
-
- Code Capsules
-
-
- The Preprocessor
-
-
-
-
- Chuck Allison
-
-
- Chuck Allison is a regular columnist with CUJ and a software architect for the
- Family History Department of the Church of Jesus Christ of Latter Day Saints
- Church Headquarters in Salt Lake City. He has a B.S. and M.S. in mathematics,
- has been programming since 1975, and has been teaching and developing in C
- since 1984. His current interest is object-oriented technology and education.
- He is a member of X3J16, the ANSI C++ Standards Committee. Chuck can be
- reached on the Internet at allison@decus.org, or at (801)240-4510.
-
-
- To use C effectively you really have to master two languages: the C language
- proper and the preprocessor. Before a compiler begins the usual chores of
- syntax checking and instruction translation, it submits your program to a
- preliminary phase called preprocessing, which alters the very text of the
- program according to your instructions. The altered text that the compiler
- sees is called a translation unit. In particular, the preprocessor performs
- the following three functions for you:
- 1) header/source file inclusion
- 2) macro expansion
- 3) conditional compilation
- In this article I will illustrate these features of the preprocessor.
-
-
- The Include Directive
-
-
- One of the first source lines any C programmer sees or composes is this:
- #include <stdio.h>
- Take a moment right now and jot down everything you know about this statement.
- Let's see how you did. stdio.h is of course a standard library header, so
- called because such include directives usually appear near the beginning of a
- source file so that their definitions will be in force throughout the rest of
- the compilation. We commonly think of it as a header file, but there is no
- requirement that the definitions and declarations pertaining to standard input
- and output reside in a file. The C Standard only requires that these
- definitions and declarations replace the include directive in the text of the
- program before translation. They could reside in tables internal to the
- preprocessor. Most compiler implementations do supply header files for the
- standard library, however. MS-DOS compilers install header files in a suitable
- subdirectory. Here is a sampling:
- \BC4\INCLUDE /* Borland C++ 4.0 */
- \MSVC\INCLUDE /* Microsoft Visual C++ */
- \WATCOM\H /* Watcom C/C++ */
- On UNIX systems you will find header files in /usr/include.
- Since an implementation is not even obliged to supply headers in the form of
- physical files, it's no surprise that those implementations providing files
- don't always give them the same name as the include directive. After all, how
- could a compiler supply a file named stdio.h on a platform whose file system
- didn't allow periods in a file name? On MS-DOS systems there can be no file
- that exactly matches the C++ header <strstream.h>, because the file system
- only allows up to eight characters before the period.
- Most MS-DOS implementations map header names into file names by truncating the
- base part (the portion before the period) to eight characters, and the
- extension to three (so the definitions for <strstream.h> reside in the file
- STRSTREA.H). A standard-conforming implementation must supply a unique mapping to
- the local file system for user-defined header names consisting of one or more
- letters, a period, and a single letter, although it may treat only the first six
- characters before the period as significant.
- Conforming compilers also support include directives with string-like
- arguments, as in:
- #include "mydefs.h"
- The string must represent a name recognized by the local file system. The file
- must be a valid C/C++ source file and, like the standard headers, usually
- contains function prototypes, macro definitions, and other declarations. An
- implementation must specify the mechanism it uses to locate the requested
- source file. On platforms with hierarchical file systems, the compiler usually
- searches the current directory first. If that fails, it then searches the
- subdirectory reserved for the standard headers. Because header names in include
- directives are special preprocessing tokens and not strings, any backslashes in a header
- name are not interpreted as escape characters. In the following directives, a
- double backslash is not needed.
- #include <sys\stat.h> /* \, not \\ */
- #include "\project\include\mydefs.h"
- Included files may themselves contain other include directives; a conforming
- implementation must support nesting at least eight levels deep. Since some
- definitions (like typedefs) must only appear
- once during a compilation, you must guard against the possibility of a file
- being included more than once. The customary technique defines a symbol
- associated with the file. Exclude the text of the file from the compilation if
- the symbol has already been seen by the compiler, as in the following:
- /* mydefs.h */
- #ifndef MYDEFS_H
- #define MYDEFS_H
-
- <declarations/definitions go here>
-
- #endif
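- To see the guard work without a second file, the sketch below writes out the text of a hypothetical mydefs.h twice, exactly as two include directives would splice it in. Without the guard, the second definition of struct point would be a redefinition error.

```cpp
// ---- first copy of the hypothetical mydefs.h ----
#ifndef MYDEFS_H
#define MYDEFS_H
struct point { int x, y; };   // a struct may be defined only once per translation unit
#endif

// ---- second copy: MYDEFS_H is already defined, so this body is skipped ----
#ifndef MYDEFS_H
#define MYDEFS_H
struct point { int x, y; };   // never reaches the compiler, so no redefinition error
#endif

// Uses the single surviving definition of point.
int point_sum(point p) { return p.x + p.y; }
```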
-
-
- Macros
-
-
- As you can see, there's more to the #include directive than meets the eye. C
- provides eleven other preprocessor directives you can use to alter your source
- text in meaningful ways (see Table 1). (All begin with the '#' character,
- which must be the first non-space character on its source line.) In this
- section I elaborate on one of the other directives, the #define directive, to
- introduce a very useful construct called a macro.
- The #define directive creates macro definitions. A macro is a name for a
- sequence of zero or more preprocessing tokens. (Valid preprocessing tokens
- include C language tokens such as identifiers, strings, numbers, and
- operators, as well as any single non-white-space character that fits no other category.) For example, the line
- #define MAXLINES 500
- associates the text 500 with the symbol MAXLINES. The preprocessor keeps a
- table of all symbols created by the #define directive, along with the
- corresponding replacement text. Whenever the preprocessor encounters the token
- MAXLINES outside of a quoted string or comment, it replaces MAXLINES with the
- token 500. In later phases of compilation it appears as if you actually typed
- 500 instead of MAXLINES. It is important to remember that this operation
- consists of mere text replacement. No semantic analysis occurs during
- preprocessing.
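- Here is a brief sketch of MAXLINES in use; the lines table and the add_line function are hypothetical. Every occurrence of MAXLINES becomes the token 500 before the compiler proper sees the code.

```cpp
#define MAXLINES 500

// After preprocessing, the compiler sees "static const char *lines[500];".
static const char *lines[MAXLINES];
static int nlines = 0;

// Stores a line if there is room; returns 1 on success, 0 when the table is full.
int add_line(const char *text) {
    if (nlines >= MAXLINES)   // becomes "if (nlines >= 500)"
        return 0;
    lines[nlines++] = text;
    return 1;
}
```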
- A macro without parameters, such as MAXLINES, is sometimes called an
- object-like macro because it defines a program constant that looks like an
- object. Because object-like macros are often constants, it is customary to
- type them in upper case as a hint to the reader. You can also define
- function-like macros with zero or more parameters, as in the following code
- fragment:
-
- #define beep() putc('\a', stderr)
- #define abs(x) ((x) >= 0 ? (x) : (-(x)))
- #define max(x,y) (((x) > (y)) ? (x) : (y))
- In a macro definition there must be no white space between the macro name and
- the left parenthesis that opens the parameter list. The expression
- abs(-4)
- expands to
- ((-4) >= 0 ? (-4) : (-(-4)))
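- The abs and max macros can be exercised directly. This sketch renames them ABS and MAX. That rename is an assumption made only to avoid colliding with the standard library's abs and max. The helper functions are hypothetical.

```cpp
// The article's abs and max macros, renamed ABS and MAX for this sketch.
#define ABS(x) ((x) >= 0 ? (x) : (-(x)))
#define MAX(x, y) (((x) > (y)) ? (x) : (y))

// ABS(-4) expands to ((-4) >= 0 ? (-4) : (-(-4))), which evaluates to 4.
int abs_of_minus_four() { return ABS(-4); }

// Expansion is purely textual, so the same macro serves int and double arguments.
double larger(double a, double b) { return MAX(a, b); }
```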
- You should always parenthesize macro parameters (such as x) in the replacement
- text. This practice prevents surprises from unexpected precedence in complex
- expressions. For example, if you had used the following naive mathematical
- definition for absolute value:
- x >= 0 ? x : -x
- then the expression abs(a - 1) would expand to
- a - 1 >= 0 ? a - 1 : -a - 1
- which is incorrect when a - 1 < 0 (it should be -(a - 1)).
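- The precedence problem is easy to reproduce. In this sketch NAIVE_ABS, a hypothetical name, uses the unparenthesized body x >= 0 ? x : -x. ABS parenthesizes every use of its parameter.

```cpp
#define NAIVE_ABS(x) x >= 0 ? x : -x
#define ABS(x) ((x) >= 0 ? (x) : (-(x)))

// NAIVE_ABS(a - 1) expands to  a - 1 >= 0 ? a - 1 : -a - 1
int naive_abs_of_a_minus_1(int a) { return NAIVE_ABS(a - 1); }

// ABS(a - 1) expands to  ((a - 1) >= 0 ? (a - 1) : (-(a - 1)))
int abs_of_a_minus_1(int a) { return ABS(a - 1); }
```

- With a equal to 0, the naive version yields -1 where the correct answer is 1; with a equal to 5, both yield 4.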
- Even if you put parentheses around all arguments, you should usually
- parenthesize the entire replacement expression as well to avoid surprises with
- respect to the surrounding text. To see this, define abs() without enclosing
- parens, as in:
- #define abs(x) (x) >= 0 ? (x) : -(x)
- Then abs(a) - 1 expands to
- (a) >= 0 ? (a) : -(a) - 1
- which is incorrect for non-negative values of a, because the subtraction binds
- only to the third operand.
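- The missing outer parentheses show up in a short test. NO_OUTER_ABS is a hypothetical name for the version that parenthesizes its parameter but not its whole body.

```cpp
#define NO_OUTER_ABS(x) (x) >= 0 ? (x) : -(x)
#define ABS(x) ((x) >= 0 ? (x) : (-(x)))

// NO_OUTER_ABS(a) - 1 expands to  (a) >= 0 ? (a) : -(a) - 1
int no_outer_abs_minus_1(int a) { return NO_OUTER_ABS(a) - 1; }

// ABS(a) - 1 expands to  ((a) >= 0 ? (a) : (-(a))) - 1
int abs_minus_1(int a) { return ABS(a) - 1; }
```

- For a equal to 5 the unguarded version returns 5 instead of 4. Negative arguments happen to come out right, which is what makes this bug easy to miss.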
- It is also dangerous to use expressions with side effects as macro arguments.
- For example, the macro call abs(i++) expands to
- ((i++) >= 0 ? (i++) : (-(i++)))
- No matter what the value of i happens to be, it gets incremented twice, not
- once, which probably isn't what you had in mind.
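- The double increment can be observed directly. This sketch assumes the fully parenthesized ABS macro; the two helper functions are hypothetical. The conditional operator has a sequence point after its first operand, so the behavior is well defined. It is simply not what the caller expected.

```cpp
#define ABS(x) ((x) >= 0 ? (x) : (-(x)))

// ABS(i++) expands to ((i++) >= 0 ? (i++) : (-(i++))): the condition increments i,
// and whichever branch is selected increments it again.
int increments_after_abs(int start) {
    int i = start;
    int r = ABS(i++);
    (void)r;              // only the side effect matters here
    return i - start;     // number of times i was incremented
}

// The value is surprising too: the selected branch sees the already-incremented i.
int abs_of_post_increment(int start) {
    int i = start;
    return ABS(i++);
}
```

- Starting from 3, the result is 4 rather than 3; starting from -3, it is 2 rather than 3.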
-
-
- Pre-defined Macros
-
-
- Conforming implementations supply the five built-in object-like macros shown
- in Table 2. The last three macros remain constant during the compilation of a
- source file. Any other pre-defined macros that a compiler provides must begin
- with a leading underscore followed by either an uppercase letter or another
- underscore.
- You may not redefine any of these five macros with the #define directive, nor
- remove them with the #undef directive. Most compilers support multiple modes,
- some of which are not standard-conforming. (To guarantee that the sample
- program in Listing 1 will run correctly under Borland C, for example, you need
- to run in "ANSI mode" via the "-A" command-line option.)
- Conforming compilers also provide a function-like macro, assert, which you can
- use to put diagnostics in programs. If its argument evaluates to zero, assert
- prints the argument along with source file name and line number (using
- __FILE__ and __LINE__) to the standard error device and aborts the program (see
- Listing 2). For more information on using the assert macro, see the Code
- Capsule "File Processing, Part 2" in the June 1993 issue of CUJ.
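- The following sketch assumes C++11 (for std::to_string) and uses four of the predefined macros along with assert. It sticks to __FILE__, __LINE__, __DATE__, and __TIME__ because, in C++, whether __STDC__ is defined is implementation-defined. The helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// __LINE__ is an integer constant; __FILE__, __DATE__ and __TIME__ are string literals.
std::string where_am_i() {
    return std::string(__FILE__) + ":" + std::to_string(__LINE__)
         + " compiled " + __DATE__ + " " + __TIME__;
}

// __DATE__ has the form "Mmm dd yyyy" (11 characters); __TIME__ has "hh:mm:ss" (8).
bool date_time_shapes_ok() {
    return std::strlen(__DATE__) == 11 && std::strlen(__TIME__) == 8;
}

// If den is 0, assert writes the expression text, file name and line number to
// stderr and calls abort(). Defining NDEBUG before <cassert> disables every assert.
int checked_divide(int num, int den) {
    assert(den != 0);
    return num / den;
}
```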
- A compiler is allowed to provide macro versions for any functions in the
- standard library (getc and putc usually come as macros for efficiency). With
- the exception of a handful of required function-like macros (assert, setjmp,
- va_arg, va_end, and va_start), an implementation must also supply true
- function versions for all functions in the standard library. A macro version
- of a library function in effect hides its prototype from the compiler, so its
- arguments are not type-checked during translation. To force the true function
- to be called, remove the macro definition with the #undef directive, as in
- #undef getc
- Alternatively, you can surround the function name in parentheses when you call
- it, as in:
- c = (getc)(stdin);
- There's no danger of this expression matching the macro definition since a
- left parenthesis does not immediately follow the function name.
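- The three ways of reaching getc can be compared in a short sketch. The helper
- names first_two_chars and third_char are hypothetical, and whether getc is a
- macro at all depends on your library; the point is only that each spelling
- below reads one character from the same stream.

```c
#include <stdio.h>

/* Read two characters: the first through getc as the library spells it
   (possibly a macro), the second through the true function, because the
   parentheses keep the preprocessor from seeing a macro call. */
int first_two_chars(FILE *fp, int out[2])
{
    out[0] = getc(fp);
    out[1] = (getc)(fp);
    return out[0] != EOF && out[1] != EOF;
}

/* After #undef, every later use of getc in this file calls the function. */
#undef getc

int third_char(FILE *fp)
{
    return getc(fp);
}
```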
-
-
- Conditional Compilation
-
-
- You can selectively include or exclude segments of code with conditional
- directives. For example, you can embed the following excerpt in your code to
- accommodate different syntaxes of the delete operator in earlier versions of
- C++:
- #if VERSION < 3
- delete [strlen(p) + 1] p;
- #else
- delete [] p;
- #endif
- Your compiler probably supplies a macro similar to VERSION (Borland C++
- defines __BCPLUSPLUS__, Microsoft C/C++ defines _MSC_VER). The expression in an
- #if directive must be an integer constant expression, and it obeys the usual
- C rule: nonzero means true, zero means false. You cannot use casts or the
- sizeof operator in such expressions.
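- As a sketch of how such tests evaluate, the fragment below uses a hypothetical
- VERSION macro. It also shows a rule worth knowing: inside #if, an identifier
- that is not a defined macro is replaced by 0 rather than reported as an error.

```c
/* VERSION is a hypothetical stand-in for a compiler-supplied macro such as
   __BCPLUSPLUS__ or _MSC_VER. */
#define VERSION 2

/* Select the delete syntax by version, as in the excerpt above. */
const char *delete_syntax(void)
{
#if VERSION < 3
    return "delete [n] p";      /* older C++ */
#else
    return "delete [] p";
#endif
}

/* NO_SUCH_MACRO is never defined, so #if treats it as 0. */
int undefined_is_zero(void)
{
#if NO_SUCH_MACRO == 0
    return 1;
#else
    return 0;
#endif
}

/* #if arithmetic is integer arithmetic: (2*10 + 5) / 2 is 12, not 12.5. */
int integer_division(void)
{
#if (VERSION * 10 + 5) / 2 == 12
    return 1;
#else
    return 0;
#endif
}
```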
- C++ implementations also pre-define the macro __cplusplus, which you can use
- to customize your code for mixed C/C++ environments. For example, if you want
- to link with existing C code in a C++ environment, you need to use the extern
- "C" linkage specification (which of course is not valid in a C environment).
- The following excerpt will do the right thing in either environment:
- #ifdef __cplusplus
- extern "C"
- {
- #endif
-
- <put C declarations here>
-
- #ifdef __cplusplus
- }
- #endif
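- Put together, a header written this way might look like the sketch below;
- count_char is a hypothetical function. Compiled as C, the __cplusplus lines
- vanish; compiled as C++, the declaration gets C linkage, so both languages
- agree on the name the linker sees.

```c
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical C function: count occurrences of c in s. */
size_t count_char(const char *s, char c);

#ifdef __cplusplus
}
#endif

/* The definition lives in a .c file and so has C linkage either way. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}
```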
- The #if directive is handy when you want to comment out long passages of code.
- You can't just wrap such sections in a single, enclosing comment because there
- are likely to be comments in the code itself (right?), causing the outer
- comment to end prematurely. It is better to enclose the code in question in
- the body of an #if directive that always evaluates to zero:
- #if 0
- <put code to be ignored here>
- #endif
-
-
-
- Preprocessor Operators
-
-
- Sometimes you just want to know if a macro is defined, without using its
- value. For example, if you only support two compilers, you might have
- something like the following in your code:
- #if defined _MSC_VER
- <put Microsoft-specific statements here>
- #elif defined __BCPLUSPLUS__
- <put Borland-specific statements here>
- #else
- #error Compiler not supported.
- #endif
- defined is one of three preprocessor operators (see Table 3). The defined
- operator evaluates to 1 if its argument is present in the symbol table (that
- is, if the macro was either defined by a previous #define directive or
- provided by the compiler as a built-in macro), and to 0 otherwise. The #error
- directive prints its argument as a diagnostic and halts the translator.
- It isn't necessary to assign a value to a macro. For example, to insert debug
- trace code into your program, you can do the following:
- #if defined DEBUG
- fprintf(stderr,"x = %d\n",x);
- #endif
- To define the DEBUG macro, just insert the following directive before the
- first use of the macro:
- #define DEBUG
- The following equivalences are recognized by the preprocessor:
- #if defined X <==> #ifdef X
- #if !defined X <==> #ifndef X
- Using the defined operator is more flexible than the equivalent directives on
- the right because you can combine multiple tests as a single expression, as
- in:
- #if defined __cplusplus && !defined DEBUG
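- A small sketch of the combined form follows. DEBUG is defined without a
- value, while RELEASE and VERBOSE are hypothetical macros left undefined; only
- whether each name is defined matters.

```c
#define DEBUG          /* defined, with an empty value */

/* One #if can combine several tests. */
int tracing_on(void)
{
#if defined DEBUG && !defined RELEASE
    return 1;
#else
    return 0;
#endif
}

/* #ifndef can test only a single name. */
int verbose_on(void)
{
#ifndef VERBOSE          /* same as: #if !defined(VERBOSE) */
    return 0;
#else
    return 1;
#endif
}
```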
- The operator #, the "stringizing" operator, effectively encloses a macro
- argument in quotes. As the program in Listing 3 illustrates, stringizing can
- be useful for debugging. The trace() macro encloses its arguments in quotes so
- they become part of a printf format string. For example, the expression
- trace(i,d) becomes
- printf("i" " = %" "d" "\n",i);
- and, after the compiler concatenates adjacent string literals, it sees this:
- printf("i = %d\n",i);
- There is no way to build quoted strings like this without the stringizing
- operator, because the preprocessor does not replace macro parameters that
- appear inside quoted strings.
- The token-pasting operator, ##, concatenates two tokens together to form a
- single token. The call trace2(1) in Listing 4 is translated into
- trace(x1,d)
- Any space surrounding these two operators is ignored.
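- Because printf output is hard to check, here is a hedged variant of Listings
- 3 and 4 that formats into a buffer with snprintf (C99) instead. The names
- trace_buf, trace2_buf, and format_x2 are hypothetical. Stringizing supplies
- the variable's name as text, and token pasting builds the variable's name
- itself.

```c
#include <stdio.h>

/* #x turns the argument into a string literal; #format does the same for
   the conversion letter. Adjacent literals then merge into one format. */
#define trace_buf(buf, n, x, format) \
    snprintf(buf, n, #x " = %" #format, x)

/* x ## i pastes the letter x and the argument into one identifier. */
#define trace2_buf(buf, n, i) trace_buf(buf, n, x ## i, d)

int format_x2(char *buf, size_t n)
{
    int x1 = 1, x2 = 2;
    (void) x1;                      /* only x2 is traced */
    return trace2_buf(buf, n, 2);   /* snprintf(buf, n, "x2" " = %" "d", x2) */
}
```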
-
-
- Implementing assert()
-
-
- Implementing assert reveals an important fact about using macros. Since the
- action of assert depends on the result of a test, you might first try an if
- statement, as in:
- #define assert(cond) \
- if (!(cond)) __assert(#cond,__FILE__,__LINE__)
- where the function __assert prints the message and halts the program. This
- implementation causes a problem, however, when assert finds itself within an
- if statement, as in:
- if (x > 0)
-     assert(x != y);
- else
-     /* whatever */
- because the preceding code expands into
- if (x > 0)
-     if (!(x != y)) __assert("x != y","file.c",7);
- else
-     /* whatever */
- The indentation that results from expanding assert in place is misleading,
- because the else actually pairs with the second (inner) if. Rewriting the
- expanded code to represent the actual flow of control produces:
- if (x > 0)
-     if (!(x != y))
-         __assert("x != y","file.c",7);
-     else /* OOPS! New control flow! */
-         /* whatever */
- The usual fix for nested if problems such as this is to use braces, as in:
- #define assert(cond) \
-     {if (!(cond)) __assert(#cond,__FILE__,__LINE__);}
- but this code expands into
- if (x > 0)
-     {if (!(x != y)) __assert("x != y","file.c",7);};
- else
-     /* whatever */
- and the combination }; in the second line is the problem: the closing brace
- completes the outer if, and the semicolon after it is a separate null
- statement, leaving a dangling else, which is a syntax error. A correct way to
- define assert is shown in Listing 5. (This version also honors NDEBUG: when
- NDEBUG is defined, assert expands to an expression that does nothing.)
- Listing 6 shows the implementation of the support function __assert(). In
- general, when a macro must make a choice, it is good practice to write it as
- an expression and not as a statement.
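- To see the expression form at work, the sketch below uses hypothetical names
- (my_assert, my_assert_fail) so it does not collide with <assert.h>. Its
- failure handler only reports and counts instead of calling abort(), an
- assumption made so the if/else behavior can be exercised.

```c
#include <stdio.h>

static int failures = 0;

/* Stand-in for __assert: report the failure and count it.
   A real assert would call abort() here. */
static void my_assert_fail(const char *cond, const char *file, long line)
{
    fprintf(stderr, "Assertion failed: %s, file %s, line %ld\n",
            cond, file, line);
    ++failures;
}

/* A single expression, so it behaves like a function call
   and composes safely with if/else. */
#define my_assert(cond) \
    ((cond) ? (void) 0 : my_assert_fail(#cond, __FILE__, __LINE__))

int check_pair(int x, int y)
{
    if (x > 0)
        my_assert(x != y);   /* one expression statement */
    else
        return -1;           /* still pairs with the outer if */
    return failures;
}
```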
-
-
- Macro Magic
-
-
- It's important to understand precisely what steps the preprocessor follows to
- expand macros; otherwise you can be in for some mysterious surprises. For
- example, if you insert the following line near the beginning of Listing 4:
- #define x1 SURPRISE!
- then trace2(1) expands into
- trace(x ## 1,d)
- which in turn becomes
- trace(x1,d)
- But the preprocessor doesn't stop there. It rescans the line to see if any
- other macros need expanding. The final state of the program text seen by the
- compiler is shown in Listing 7.
- To further illustrate, consider the text in Listing 8. Listing 8 is not a
- complete program, by the way, but is for preprocessing only -- don't try to
- compile it all the way. (If you have Borland C, use the CPP command.) The
- output from the preprocessor appears in Listing 9. The str() macro just puts
- quotes around its argument. It might appear that xstr() is redundant, but
- there is an important difference between xstr() and str(). The output of the
- statement str(VERSION) is of course
- "VERSION"
- but xstr(VERSION) expands to
- str(2)
- because arguments not connected with a # or ## are fully expanded before they
- replace their respective parameters. The preprocessor then rescans the
- statement, providing "2". So in effect, xstr() is a version of str() that
- expands its argument before quoting it.
- The same relationship exists between glue() and xglue(). The statement
- glue(VERSION,3) concatenates its arguments into the token VERSION3, but
- xglue(VERSION,3) first expands VERSION, producing
- glue(2,3)
- which in turn rescans into the token 23.
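- The str/xstr and glue/xglue pairs can also be checked by compiling them
- rather than only preprocessing, as in this sketch built from the definitions
- in Listing 8. The wrapper functions and the variable VERSION3 are additions
- for illustration.

```c
#define str(s)      #s
#define xstr(s)     str(s)
#define glue(a,b)   a##b
#define xglue(a,b)  glue(a,b)
#define VERSION 2

/* str quotes its argument as written; xstr expands it first. */
const char *unexpanded(void) { return str(VERSION); }   /* "VERSION" */
const char *expanded(void)   { return xstr(VERSION); }  /* "2" */

/* glue pastes VERSION and 3 unexpanded, giving the identifier VERSION3. */
static const int VERSION3 = 33;
int pasted_identifier(void) { return glue(VERSION, 3); }

/* xglue expands VERSION to 2 first, so the pasted token is the number 23. */
int pasted_number(void) { return xglue(VERSION, 3); }
```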
- The next two statements are a little trickier:
- glue(VERS,ION)
- == VERS ## ION
- == VERSION
- == 2
- and
- xglue(VERS,ION)
- == glue(VERS,ATILE)
- == VERS ## ATILE
- == VERSATILE
- Of course, if VERSATILE were a defined macro it would be further expanded.
- The last four statements in Listing 8 expand as follows:
- ID(VERSION)
- == "This is version "xstr(2)
- == "This is version "str(2)
- == "This is version ""2"
-
- INCFILE(VERSION)
- == xstr(glue(version,2)) ".h"
- == xstr(version2) ".h"
- == "version2" ".h"
-
- str(INCFILE(VERSION))
- == #INCFILE(VERSION)
- == "INCFILE(VERSION)"
-
- xstr(INCFILE(VERSION))
- == str("version2" ".h")
- == #"version2" ".h"
- == "\"version2\" \".h\""
-
- Because the result must be a valid string literal, the # operator inserts a
- backslash before each double quote in the argument, and before each backslash
- that appears inside a string literal or character constant.
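- A quick compiled check of this rule follows. The str macro is the one from
- Listing 8; the two arrays are illustrative. Stringizing also collapses each
- run of white space between tokens to a single space.

```c
#define str(s) #s

/* The argument is the 5-character literal "a\n" (quote, a, backslash,
   n, quote). Stringizing escapes its quotes and its backslash, giving
   "\"a\\n\"", whose value is those same 5 characters. */
static const char quoted[] = str("a\n");

/* White space between tokens collapses to one space: "x + y". */
static const char spaced[] = str(x   +    y);
```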
- The macro replacement facilities of the preprocessor clearly offer you an
- incredible amount of flexibility (too much, some would say). There are two
- limitations to keep in mind:
- 1) If at any time the preprocessor encounters the current macro in its own
- replacement text, no matter how deeply nested in the process, the preprocessor
- does not expand it but leaves it as-is (otherwise the process would never
- terminate!). For example, given the definitions
- #define F(f) f(args)
- #define args a,b
- F(g) expands to g(a,b), but what does F(F) expand to? (Answer: F(a,b)).
- 2) If a fully-expanded statement resembles a preprocessor directive, (e.g., if
- expansion results in an #include directive), the directive is not invoked, but
- is left verbatim in the program text. (Thank goodness!).
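- The F(F) puzzle can be compiled, as in this sketch. Because the inner F is
- left unexpanded, the result F(a,b) calls an ordinary function named F, defined
- before the macro so that its own definition is not rewritten. The functions
- F, g, demo_g, and demo_F are hypothetical.

```c
/* Ordinary functions, defined before the macros below exist. */
static int F(int a, int b) { return 10 * a + b; }
static int g(int a, int b) { return a - b; }

#define F(f) f(args)
#define args a,b

/* F(g) -> g(args) -> g(a,b) */
int demo_g(int a, int b) { return F(g); }

/* F(F) -> F(args) -> F(a,b): the inner F is not expanded again,
   so this is a call to the function F. */
int demo_F(int a, int b) { return F(F); }
```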
-
-
- Character Sets and Trigraphs
-
-
- The character set you use to compose your program doesn't have to be the same
- as the one in which the program executes. These two character sets often
- differ in non-English applications. A C translator only understands the source
- character set -- English alphanumerics, the graphics characters used for
- operators and punctuators (there are 29 of them), and a few control characters
- (newline, horizontal tab, vertical tab, and form-feed). Any other characters
- presented to the translator may appear only in quoted strings, character
- constants, header names or comments. The execution character set is the set of
- characters that the program uses in its literals, and to input and output
- data. This set is implementation-defined, but must at least contain characters
- representing alert ('\a'), backspace ('\b'), carriage return ('\r'), newline
- ('\n'), form feed ('\f'), vertical tab ('\v'), and horizontal tab ('\t').
- Many non-U.S. environments use different graphics for some of the elements of
- the source character set, making it impossible to write readable C programs.
- To overcome this obstacle, standard C defines a number of trigraphs, which are
- triplets of characters from the Invariant Code Set (ISO 646-1983) found in
- virtually every environment in the world. Each trigraph corresponds to a
- character in the source character set which is not in ISO 646 (see Table 4).
- For example, whenever the preprocessor encounters the token ??= anywhere in
- your source text (even in strings), it replaces this token with the '#'
- character code from the source character set. The program in Listing 11 shows
- how to write the "Hello, world!" program from Listing 10 using trigraphs.
- (Borland users: you have a separate executable, trigraph.exe, for processing
- trigraphs.)
- In an effort to enable more readable programs world-wide, the C++ draft
- standard defines a set of digraphs and new keywords for non-ASCII developers
- (see Table 5). Listing 12 shows what "Hello, world" looks like using these new
- tokens. Perhaps you will agree that the symmetric look of the bracketing
- operators is easier on the eye.
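- C gained similar conveniences in 1995: Amendment 1 to the C standard added
- the digraphs <: :> <% %> %: %:%: and the header <iso646.h>, which defines and,
- or, not, bitand, bitor, compl, and the other spellings as macros. A sketch
- with hypothetical helper names:

```c
#include <iso646.h>   /* and, or, not, bitand, bitor, compl, not_eq, ... */

/* The greeting program's test, with digraphs for the brackets and
   iso646 spellings for && and !=. */
int has_name(int argc, char *argv<::>)
<%
    return argc > 1 and argv<:1:> not_eq 0;
%>

/* (a & b) | ~a, spelled with the <iso646.h> macros. */
unsigned mix_bits(unsigned a, unsigned b)
{
    return (a bitand b) bitor compl a;
}
```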
-
-
- Phases Of Translation
-
-
- The C standard defines eight distinct phases of translation. An implementation
- doesn't make eight separate passes through the code, of course, but the result
- of translation must behave as if it had. The eight phases are:
- 1. Physical source characters are mapped into the source character set. This
- includes trigraph replacement and things like mapping a carriage return/line
- feed to a single newline character in MS-DOS.
- 2. All lines that end in a backslash are merged with their continuation line,
- and the backslash is deleted.
- 3. The source is parsed into preprocessing tokens and comments are replaced
- with a single space character. The C++ digraphs are recognized as tokens.
- 4. Preprocessing directives are invoked and macros are expanded. Steps 1
- through 4 are repeated for any included files.
- 5. Escape sequences in character constants and string literals that represent
- characters in the execution set are converted (e.g., '\a' would be converted
- to a byte value of 7 in an ASCII environment).
- 6. Adjacent string literals are concatenated.
- 7. Traditional compilation occurs: syntactic and semantic analysis, and
- translation to assembly or machine code.
- 8. Linking occurs: external references are resolved and a program image is
- made ready for execution.
- The preprocessor performs steps 1 through 4.
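- Two of these phases are easy to observe in a small sketch (the names are
- illustrative): line splicing in phase 2 happens before tokens exist, so it can
- even join the halves of an identifier, and string concatenation in phase 6
- comes after macro expansion, so literals produced by macros still merge.

```c
/* Phase 2 deletes the backslash-newline before phase 3 forms tokens,
   so the two halves below become the single identifier "counter". */
static int coun\
ter = 7;

int spliced_value(void) { return counter; }

/* Phase 6 runs after phase 4, so the macro's literal still merges. */
#define PREFIX "ab"
static const char joined[] = PREFIX "cd";     /* same as "abcd" */
```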
-
-
- C++ And The Preprocessor
-
-
- C++ preprocessing formally differs from that of C only in the tokens it
- recognizes. A C++ preprocessor must recognize the tokens in Table 5 as well as
- .*, ->*, and ::. It must also recognize //-style comments and replace them
- with a single space. Though C++'s preprocessor isn't much different from C's,
- you may want to use it a lot differently. For example, as far as I can tell,
- there is no good reason to define object-like macros anymore. You should use
- const variable definitions instead. The statement
- const int MAXLINES = 500;
- has a couple of advantages over
- #define MAXLINES 500
- Since the compiler knows the semantics of the object, you get stronger
- compile-time type checking. You can also reference const objects like any
- other with a symbolic debugger. Global const objects have internal linkage
- unless you explicitly declare them extern, so you can safely replace all your
- object-like macros with const definitions.
- Function-like macros are almost unnecessary in C++. You can replace most
- function-like macros with inline functions. For example, replace the max macro
- as shown previously with
- inline int max(int x, int y)
- {
- return x >= y ? x : y;
- }
- Note that you don't have to worry about parenthesizing to avoid precedence
- surprises, because this code defines a real function, with scope and type
- checking. You also don't have to worry about side effects like you do with
- macros, such as in the call
- max(x++,y++)
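- The difference is easy to measure. In this sketch (written in C99, which also
- has inline; MAX_MACRO, max_inline, and count_increments are illustrative
- names), the macro evaluates the winning argument's increment twice, while the
- function evaluates each argument exactly once.

```c
#define MAX_MACRO(x, y) ((x) >= (y) ? (x) : (y))

static inline int max_inline(int x, int y)
{
    return x >= y ? x : y;
}

/* Report how many times a was incremented by each form. */
void count_increments(int *macro_incs, int *inline_incs)
{
    int a = 5, b = 1, r;

    r = MAX_MACRO(a++, b++);   /* a++ runs in the test and again as the result */
    *macro_incs = a - 5;       /* 2 */
    (void) r;

    a = 5; b = 1;
    r = max_inline(a++, b++);  /* each argument is evaluated once */
    *inline_incs = a - 5;      /* 1 */
    (void) r;
}
```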
- The macro version may seem superior to the inline function because it accepts
- arguments of any type. No problem. Define max as a template, as in the
- following code; now the inline function will accept arguments of any type that
- supports the > operator:
- template<class T>
- inline const T& max(const T& x, const T& y)
- {
- return x > y ? x : y;
- }
- Do keep in mind, however, that inline is only a hint to the compiler. Not all
- functions are amenable to inlining, especially those with loops and
- complicated control structures. Your compiler may tell you that it can't
- inline a function. Still, in many cases it is better to define a function
- out-of-line than to define it as a macro and lose the type safety that a real
- function affords.
- There is still room in C++ for function-like macros that use the stringizing
- or token-pasting operators. The program in Listing 13 uses stringizing and an
- inline function to test the new string class available with Borland C++ 4.0.
-
-
- Conclusion
-
-
-
- The preprocessor doesn't know C or C++. It is a language all its own. Many
- library vendors have used the preprocessor intelligently to simplify the
- installation and use of their products. I encourage you to use it, but to use
- it prudently. It has some dark corners, which I've purposely avoided. It is
- good practice, especially with C++, to do as much as you can in the
- programming language, and use the preprocessor only when you need to.
- Table 1 Preprocessor directives
- #include Includes text of header or source file.
-
- #define Enters a symbol into the symbol table for the
- current compilation unit, with an optional
- value.
- #undef Removes a symbol from the symbol table.
-
- #if Control flow directives for conditional
- #elif compilation.
- #else
- #endif
-
- #ifdef Symbol table query directives.
- #ifndef (Also used for conditional compilation).
-
- #line Renumbers the current source line. Utilities
- like code generators use this to synchronize
- generated lines with original source lines in
- error messages.
-
- #pragma Compiler-dependent actions.
-
- #error Prints a diagnostic message and halts translation.
- Table 2 Pre-defined macros
- Macro Value
- ---------------------------------------------
- __LINE__ The number of the current source
- line (equal to one more than the
- number of newline characters read
- so far).
-
- __FILE__ The name of the source file.
-
- __DATE__ The date of translation, in the
- form "Mmm dd yyyy".
-
- __TIME__ The time of translation, in the
- form "hh:mm:ss".
-
- __STDC__ 1, if the compilation is in
- "standard" mode.
- Table 3 Preprocessor operators
- Operator Usage
- # Stringizing
- ## Token pasting
- defined Symbol table query
- Table 4 Trigraph sequences
- Trigraph C Source Character
- ??= #
- ??( [
- ??/ \
- ??) ]
- ??' ^
- ??< {
- ??! |
- ??> }
- ??- ~
- Table 5 New C++ digraphs and identifiers
-
- Token Translation
- <% {
- %> }
- <: [
- :> ]
- %: #
- %:%: ##
- bitand &
- and &&
- bitor |
- or ||
- xor ^
- compl ~
- and_eq &=
- or_eq |=
- xor_eq ^=
- not !
- not_eq !=
-
- Listing 1 Prints the pre-defined macros
- /* sysmac.c: Print system macros */
-
- #include <stdio.h>
-
- main()
- {
- printf("__DATE__ == %s\n",__DATE__);
- printf("__FILE__ == %s\n",__FILE__);
- printf("__LINE__ == %d\n",__LINE__);
- printf("__TIME__ == %s\n",__TIME__);
- printf("__STDC__ == %d\n",__STDC__);
- return 0;
- }
-
- /* Output:
- __DATE__ == Dec 18 1993
- __FILE__ == sysmac.c
- __LINE__ == 9
- __TIME__ == 19:05:06
- __STDC__ == 1
- */
- /* End of File */
-
-
- Listing 2 Illustrates an assertion failure
- /* fail.c */
- #include <stdio.h>
- #include <assert.h>
-
- main()
- {
- int i = 0;
-
- assert(i > 0);
- return 0;
- }
-
- /* Sample Execution:
- C:>fail
- Assertion failed: i > 0, file fail.c, line 9
- Abnormal program termination
- */
- /* End of File */
-
-
- Listing 3 Illustrates the stringizing operator
- /* trace.c: Illustrate a trace *
- * macro for debugging */
-
- #include <stdio.h>
-
- #define trace(x,format) \
- printf(#x " = %" #format "\n",x)
-
- main()
- {
- int i = 1;
- float x = 2.0;
- char *s = "three";
-
- trace(i,d);
- trace(x,f);
- trace(s,s);
- return 0;
- }
-
- /* Output:
- i = 1
- x = 2.000000
- s = three
- */
- /* End of File */
-
-
- Listing 4 Illustrates the token-pasting operator
- /* trace2.c: Illustrate a trace *
- * macro for debugging */
-
- #include <stdio.h>
-
- #define trace(x,format) \
- printf(#x " = %"#format "\n",x)
- #define trace2(i) trace(x## i,d)
-
- main()
- {
- int x1 = 1, x2 = 2, x3 = 3;
- trace2(1);
- trace2(2);
- trace2(3);
- return 0;
- }
-
- /* Output:
- x1 = 1
- x2 = 2
- x3 = 3
- */
-
- /* End of File */
-
-
- Listing 5 A simple implementation of the assert macro
- /* assert.h */
-
- extern void __assert(char *, char *, long);
-
- #undef assert
- #ifdef NDEBUG
- #define assert(cond) ((void) 0)
- #else
- #define assert(cond) \
-     ((cond) \
-     ? (void) 0 \
-     : __assert(#cond,__FILE__,__LINE__))
- #endif
- /* End of File */
-
-
- Listing 6 The __assert support function
- /* xassert.c */
- #include <stdio.h>
- #include <stdlib.h>
-
- void __assert(char *cond, char *fname, long lineno)
- {
- fprintf(stderr,
- "Assertion failed: %s, file %s, line %ld\n",
- cond,fname,lineno);
- abort();
- }
-
- /* End of File */
-
-
- Listing 7 Preprocessed source with a surprise
- main()
- {
- int SURPRISE! = 1, x2 = 2, x3 = 3;
- printf("x1" " = %" "d" "\n",SURPRISE!);
- printf("x2" " = %" "d" "\n",x2);
- printf("x3" " = %" "d" "\n",x3);
- return 0;
- }
- /* End of File */
-
-
- Listing 8 Illustrates macro rescanning
- /* preproc.c: Test # and ## preprocessing operators
- *
- * NOTE: DO NOT COMPILE! Preprocess only!
- */
-
- /* Handy stringizing macros */
- #define str(s) #s
- #define xstr(s) str(s)
-
-
- /* Handy token-pasting macros */
- #define glue(a,b) a##b
- #define xglue(a,b) glue(a,b)
-
- /* Some definitions */
- #define ID(x) "This is version " xstr(x)
- #define INCFILE(x) xstr(glue(version,x)) ".h"
- #define VERSION 2
- #define ION ATILE
-
- /* Expand some macros */
- str(VERSION)
- xstr(VERSION)
- glue(VERSION,3)
- xglue(VERSION,3)
- glue(VERS,ION)
- xglue(VERS,ION)
-
- /* Expand some more */
- ID(VERSION)
- INCFILE(VERSION)
- str(INCFILE(VERSION))
- xstr(INCFILE(VERSION))
- /* End of File */
-
-
- Listing 9 Preprocessed results from Listing 8
- "VERSION"
- "2"
- VERSION3
- 23
- 2
- VERSATILE
-
- "This is version ""2"
- "version2" ".h"
- "INCFILE(VERSION)"
- "\"version2\" \".h\""
-
-
- Listing 10 A "Hello, world!" program
- /* hello.c: Greet either the user or the world */
- #include <stdio.h>
-
- main(int argc, char *argv[])
- {
- if (argc > 1 && argv[1] != NULL)
- printf("Hello, %s!\n",argv[1]);
- else
- printf("Hello, world!\n");
- return 0;
- }
-
- /* End of File */
-
-
- Listing 11 "Hello, World!" using trigraphs
- /* thello.c: Greeting program using trigraphs */
- #include <stdio.h>
-
-
- main(int argc, char *argv??(??))
- ??<
- if (argc > 1 && argv??(1??) != NULL)
- printf("Hello, %s!??/n",argv??(1??));
- else
- printf("Hello, world!??/n");
- return 0;
- ??>
-
- /* End of File */
-
-
- Listing 12 "Hello, World!" with the new C++ digraphs and tokens
- /* dhello.c: Greeting program using C++ digraphs */
- #include <stdio.h>
-
- main(int argc, char *argv<::>)
- <%
- if (argc > 1 and argv<:1:> != NULL)
- printf("Hello, %s!??/n",argv<:1:>);
- else
- printf("Hello, world!??/n");
- return 0;
- %>
-
- /* End of File */
-
-
- Listing 13 Uses macros and an inline function to test the C++ string class
- // tstr.cpp: Test the C++ string class
-
- #include <iostream.h>
- #include <stddef.h>
- #include <cstring.h>
-
- // Handy display macros
- #define result(exp) \
- cout << #exp ": \"" << (exp) << '\"' << endl
- #define test(obj,exp) \
- exp, print(#obj ", after "#exp ":\n",obj)
-
- // Print a string in quotes
- inline void print(const char *p, const string& s)
- {
- cout << p << '"' << s << '"' << endl;
- }
-
- main()
- {
- string s1("Now is the time for all worthy carbon units"),
- s2 = "to come to the aid of their sector.",
- s3 = '\n',
- s4(s1);
- size_t len = s1.length();
- // Test some operators
- result(s1 == s4);
- result(s1 < s4);
- result(s1 + s3 + s2);
-
- test(s1,s1 += s3 + s2);
- result(s1 == s4);
- test(s1,s1.resize(len));
- result(s1 == s4);
- cout << endl;
-
- // Search and replace
- size_t pos = s1.find("all");
- if (pos != NPOS)
- test(s1,s1.replace(pos,3,"some"));
- pos = s1.find("worthy");
- if (pos != NPOS)
- {
- result(s1.substr(pos,5));
- test(s1,s1.insert(pos,"un"));
- }
- cout << endl;
-
- // More searching
- result(s1.find_first_of("aeiou"));
- result(s1.find_first_not_of("aeiou"));
- result(s1.find_last_of("aeiou"));
- result(s1.find_last_not_of("aeiou"));
- cout << endl;
-
- // Subscripting
- pos = s2.find_first_of('d');
- test(s2,s2[pos] = 'l');
- return 0;
- }
-
- /* Output:
- s1 == s4: "1"
- s1 < s4: "0"
- s1 + s3 + s2: "Now is the time for all worthy carbon units
- to come to the aid of their sector."
- s1, after s1 += s3 + s2:
- "Now is the time for all worthy carbon units
- to come to the aid of their sector."
- s1 == s4: "0"
- s1, after s1.resize(len):
- "Now is the time for all worthy carbon units"
- s1 == s4: "1"
-
- s1, after s1.replace(pos,3,"some"):
- "Now is the time for some worthy carbon units"
- s1.substr(pos,5): "worth"
- s1, after s1.insert(pos,"un"):
- "Now is the time for some unworthy carbon units"
-
- s1.find_first_of("aeiou"): "1"
- s1.find_first_not_of("aeiou"): "0"
- s1.find_last_of("aeiou"): "43"
- s1.find_last_not_of("aeiou"): "45"
-
- s2, after s2[pos] = 'l':
- "to come to the ail of their sector."
- */
-
-
- // End of File
-
- CUG New Releases
-
-
- IOCCC, ASXXXX, MINED, TDE Update, and a Bug Fix
-
-
-
-
- Victor R. Volkman
-
-
- Victor R. Volkman received a BS in Computer Science from Michigan
- Technological University. He has been a frequent contributor to The C Users
- Journal since 1987. He is currently employed as Senior Analyst at H.C.I.A. of
- Ann Arbor, Michigan. He can be reached by dial-in at the HAL 9000 BBS (313)
- 663-4173 or by Usenet mail to sysop@hal9k.com.
-
-
-
-
- CUG Library Volume IV
-
-
- CUG Library proudly announces volume IV of its directory of user-supported C
- source code. This latest effort thoroughly catalogs CUG Library volume numbers
- #300 through #349. Volume IV includes both exhaustive reviews of selected
- volumes plus capsule summaries of all volumes. In all, this amounts to more
- than 250 cross-referenced and indexed pages of information. Volume IV can be
- yours for $10 or order the set of volumes I through IV for just $28 total. As
- always, see the order blank in the center portion of this issue.
-
-
- Bug Fix for GNUPlot, CUG #334
-
-
- Arild Olsen <d_olsen_o@kari.uio.no> writes:
- "I just received the disk, and the HPLJII-driver does not function, as stated
- by R.T. Stevens in his review (CUJ, June 93). Maybe this is due to an
- incompatibility between HP LaserJet II and III; I have a LJ III. When sending
- a bitmap to the printer, the program specifies TIFF-format. This is not
- correct since the bitmap is plain.
- To solve this, edit the HPLJIItext function in the HPLJII driver. The string
- "\033*b2m%dW" should be changed to "\033*b0m%dW".
-
-
- New Acquisitions and Updates
-
-
- This month we present three additions to the CUG Library, as well as an update
- to a recently featured volume.
- International Obfuscated C Code Contest (CUG #397)
- ASxxxx Cross Assembler - Part 3 (CUG #398): MC 68HC08 CPU support
- MINED Editor: (CUG #399)
- Thomson-Davis Editor (CUG #386 update): multi-window text/binary file editor
- -- new version 3.2A
-
-
- International Obfuscated C Code Contest 1984-1993: CUG #397
-
-
- Landon Noll (Sunnyvale, CA) submits a decade of source code from the
- International Obfuscated C Code Contest (IOCCC). This contest has long been a
- favorite of many CUJ readers. The entire IOCCC archive from 1984-1993 is now
- available as a two-diskette set from the CUG Library. Obfuscation implies
- purposefully obscuring and confusing a situation. Why obfuscate C code? The
- IOCCC officially states its objectives as follows:
- To show the importance of programming style, in an ironic way.
- To stress C compilers by feeding them unusual code.
- To illustrate some of the subtleties of the C language.
- To provide a safe forum for poor C code.
- Recently, Bob van der Poel reviewed Don Libes' book entitled Obfuscated C and
- Other Mysteries (see CUJ, October 1993, pp. 131-132). The diskette for this
- book includes IOCCC entries from 1984-1991. Libes has produced special reports
- about the IOCCC several times in CUJ. Please see the following back issues for
- more detail:
- Libes, Don. "Don't Put This on Your Resume," CUJ, May 1991, p. 89.
- Libes, Don. "The Far Side of C," CUJ, May 1990, p. 125.
- Libes, Don. "The International Obfuscated C Code Contest," CUJ, July 1989, p.
- 93.
- The CUG Library volume #397 contains the full IOCCC archive including two
- additional years not included in Libes' book.
- In addition to dozens and dozens of obfuscated C programs, the archive
- includes complete rules and guidelines so you can submit your own entries into
- next year's contest. Some of the obfuscated programs are quite useful,
- including scaled-down versions of make, grep, and various editors.
-
-
- ASxxxx Cross Assembler - Part 3: CUG 398
-
-
-
- Cross assemblers continue to play an important role in the CUG Library. A
- cross assembler reads assembly language source code for a non-native CPU and
- writes object code that can be linked and downloaded to the target machine for
- execution. Embedded systems developers are the most frequent users of cross
- assemblers. This month, Alan R. Baldwin (Kent State University, Ohio) adds
- his third cross assembler to the CUG Library's repertoire. ASxxxx Part 3
- provides a complete Motorola 68HC08 development system. ASxxxx Part 3 version
- 1.50 (released 8/9/93) is immediately available as CUG volume #398.
- The CUG distribution of ASxxxx Part 3 includes MS-DOS executables for the
- ASxxxx Cross Assembler and Linker. However, if you want to recompile the Cross
- Assembler or Linker, you'll also need ASxxxx Part 1 (CUG #292). ASxxxx Part 2
- contains cross assembler source files for the 6816 CPU. The ASxxxx family of
- cross assemblers can be built on DEC machines running DECUS C in the TSX+
- environment or PDOS C v5.4b under RT-11. ASxxxx has been built with Borland
- C++ v3.1 under MS-DOS and includes a project (.PRJ) file. Although only these
- implementations have been specifically tested, Baldwin claims many other K&R C
- compilers may work as well.
- ASxxxx Part 3 includes a comprehensive 80-page manual covering functionality
- provided by all three existing ASxxxx cross assemblers and linkers. The
- documentation lays out the exact specifications of syntax for symbols, labels,
- assembler directives, and expressions in detail. The manual includes
- appendices with instruction set highlights and supported syntax for Motorola
- 6800, 6801, 6804, 6805, 68HC08, 6809, 6811, 6816, Intel 8080 and 8085, and
- Zilog Z80 and HD64180 CPUs.
- The ASxxxx assembler falls short of full macro implementation, but does
- include a host of important features such as: if/then/else, #include files,
- radix support from binary to hexadecimal, and a full complement of C-language
- operators for expressions. The ASxxxx linker goes beyond conventional loaders
- by resolving intermodule symbols, combining code into segments, relocating
- absolute symbols and base addresses, and producing either Intel HEX or
- Motorola S19 output files.
-
-
- MINED Editor: CUG #399
-
-
- MINED, by Thomas Wolff (Freie Universität Berlin, Institut für Informatik,
- Germany), is a modeless full-screen text editor. MINED was originally written
- for MINIX and now works with most UNIX platforms as well as MS-DOS, and DEC
- VAX-11/VMS. MINED works best at editing small files (50K or less) and can edit
- many files simultaneously. Unlike other editors which have separate command
- modes and input modes, MINED uses a modeless design for ease of use. It also
- includes powerful regular expression operations for both searching and
- replacing text. MINED Version 3 (released 08/04/93) is immediately available
- as CUG volume #399.
-
-
- Thomson-Davis Editor Update: CUG #386
-
-
- The Thomson-Davis Editor, as provided by Frank Davis (Tifton, GA), is a
- multi-file/multi-window binary and text file editor written for IBM PCs and
- close compatibles. Thomson-Davis Editor (TDE) works well with batch files,
- binary files, text files, and various computer language source code files. TDE
- can handle any size of file and any number of windows that fit in conventional
- DOS memory.
- Davis reports the following enhancements since TDE was last released to the
- CUG Library:
- Pop-up pull-down command menu = <CTRL>+\
- More Language support, thanks to Byrial Jensen, <byrial@daimi.aau.dk>
- TDE ported to Linux (POSIX, SVR4, BSD4.3+?, FIPS 151-1, etc.)
- A bug (a blunder, actually) got fixed in the 3.1 config utility.
- Linux FAQs and HOWTOs
- New regular expression meta characters: < = Empty string at beginning of word;
- > = Empty string at end of word
-
-
- Non-English Language Support
-
-
- Byrial Jensen contributed several functions to TDE that are useful with
- non-English languages. Using these functions, DOS filenames can contain
- extended ASCII characters. As a result, the dirlist function in TDE (which
- sorts filenames according to the sort order array) can be customized to your
- favorite alphabet. Byrial also contributed two new macro functions that look
- at the Caps Lock key: IfCapsLock and IfNotCapLock. Other changes supporting
- non-English usage are as follows: predefined regular expression macros may be
- redefined; all editor prompts have been gathered into prompts.h; response
- letters have been gathered into letters.h; and the window letters can be
- changed to follow a non-English alphabet.
-
-
- Improved Regular Expression Handling
-
-
- Davis writes: "I use the regular expression search much more often than I
- first anticipated. A couple of features missing in the original implementation
- are the beginning-of-word and end-of-word metacharacters. These metacharacters
- really come in handy for culling prefixes and suffixes from the search. Here's
- our new regular expression table: [Please refer to Table 1]"
- TDE version 3.2a (Released 11/15/93) immediately replaces version 3.0 and is
- available as CUG Library volume #386.
- Table 1 Regular expression operator precedence (based on the table in Dr.
- Aho's book)
- c = char   x = string   r,s = regular expressions
- -------------------------------------------------------
- Operator   Meaning                        Example
- c          any non-operator character     Felis
- \c         c literally and C escapes      catus\.
- \:c        predefined macro:              \:c*(ieei)
-              \:a - alphanumeric
-              \:b - white space
-              \:c - alphabetic
-              \:d - decimal
-              \:h - hex
-              \:l - lower alpha
-              \:u - upper alpha
- .          any character but newline      c.t
- <          beginning of word              <cat
- >          end of word                    <cat>
- ^          beginning of line              ^cat>
- $          end of line                    cat$
- [x]        any character in x             [a-z0-9]
- [^x]       any character not in x         [^AEIOU]
- r*         zero or more r's               ca*t
- r+         one or more r's                ca[b-t]+
- r?         zero or one r                  c.?t
- rs         r followed by s                ^$
- r|s        either r or s                  kitty|cat
- (r)        r                              (c)?(a+)t
-
-
- CUG Product Focus
-
-
- C++ SIM
-
-
-
-
- Victor R. Volkman
-
-
- Victor R. Volkman received a BS in Computer Science from Michigan
- Technological University. He has been a frequent contributor to The C Users
- Journal since 1987. He is currently employed as Senior Analyst at H.C.I.A. of
- Ann Arbor, Michigan. He can be reached by dial-in at the HAL 9000 BBS (313)
- 663-4173 or by Usenet mail to sysop@hal95.com.
-
-
-
-
- Introduction
-
-
- This month's product focus is derived from documentation written by M.C.
- Little and D.L. McCue. Little and McCue have provided this documentation,
- which describes their C++ SIM simulation class library, expressly for reprint
- in The C Users Journal.
- C++ SIM is a class library which provides discrete process-based simulation
- similar to that provided by SIMULA [Birtwhistle 73][Dahl 70] and has been used
- in the work presented in [McCue 92]. Based on the facilities provided in
- SIMULA, C++ SIM provides active objects (instances of C++ classes) as the
- units of simulation using the type-inheritance facilities of C++ to convey the
- notion of "activity."
- C++ SIM is designed to be used with a user-supplied threads package. C++ SIM's
- authors use Sun Microsystems' lightweight process (thread) package; however,
- they have added this package to the simulation class hierarchy through an
- abstract class definition so that other lightweight process packages can be
- used instead with very little modification. Users of this framework can
- replace existing classes as long as the replacements conform [Black 86] to the
- original class definition.
- This article describes the C++ SIM class hierarchy, and shows how it can be
- used to further refine the simulation package [1].
-
-
- The Class Hierarchy
-
-
- Figure 2 illustrates the main class hierarchy within the simulation package.
- The base class is Thread, which provides the minimum functionality required of
- a threads library. Two classes, GNU_Thread and LWP_Thread, derive from the
- Thread class to support the threads packages that were available to C++ SIM's
- authors at the time of this writing: the GNU threads library and Sun's own
- lightweight process package, respectively. Class Thread_Type provides a
- (relatively) transparent way to change from one thread implementation to
- another. Class Process provides all operations required by the simulator to
- control execution of all processes in the simulation. These classes are
- described in the following sections.
-
-
- The Threads Base Class
-
-
- In keeping with the C++ programming model, classes obtain the thread
- characteristic, necessary to convey the notion of "activity" within the
- simulation environment, by inheriting an appropriate base class (in simulation
- terms they become processes). In C++ SIM, all classes that provide the
- abstraction of threads must be derived from the Thread base class. This base
- class forces the derived class to provide a minimum set of operations required
- for the management of threads. (The base class defines these operations as
- pure virtual functions, and C++ requires that a deriving class define such
- functions before an instance of the class can be created.) These operations
- are shown in the Thread class as follows:
- class Thread
- {
- public:
- virtual void Suspend() = 0; // pure virtual function
- virtual void Resume() = 0;
-
- virtual void Body() = 0;
- virtual long Current_Thread() = 0;
-
- virtual long Identity();
- static Thread* Self();
- };
- When defined in a derived class, the Suspend and Resume methods implement the
- thread package's specific ways of suspending and resuming execution of a thread.
- Body represents the controlling code for each object, i.e., the scope within
- which the controlling thread will execute.
- Current_Thread must be defined by the derived class, since it returns the
- identity of the currently executing thread, which is specific to the thread
- package used.
- The base class itself implements the operations Identity and Self because some
- threads packages do not provide similar functionality. Identity returns the
- unique identity of the thread associated with a given object, and Self returns
- the currently executing thread. Because Self is a static member function,
- programs can invoke it without creating an instance of the Thread class, i.e.,
- programs may call Thread::Self().
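- The sketch below shows one way a base class could implement Identity and Self
- on top of the pure virtual Current_Thread. It is an illustration built on
- assumptions, not C++ SIM's code. The registry of live objects, the
- counter-based identities, and the single-threaded ToyThread class are all
- hypothetical. Self asks any registered object which identity is currently
- running and returns the object that owns it. Because Self is static, it is
- called as Thread::Self().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified Thread base class modeled on the article's declaration.
// The registry and the counter-based Identity are illustrative assumptions.
class Thread {
public:
    Thread(const Thread&) = delete;
    Thread& operator=(const Thread&) = delete;
    virtual ~Thread() {
        auto& reg = Registry();
        reg.erase(std::remove(reg.begin(), reg.end(), this), reg.end());
    }

    virtual void Suspend() = 0;        // package-specific
    virtual void Resume() = 0;         // package-specific
    virtual void Body() = 0;           // the object's controlling code
    virtual long Current_Thread() = 0; // identity of the running thread

    // Implemented once in the base class: a unique identity per object.
    virtual long Identity() { return id_; }

    // Static, so it can be called as Thread::Self() without an object.
    static Thread* Self() {
        auto& reg = Registry();
        if (reg.empty()) return nullptr;
        long running = reg.front()->Current_Thread(); // any object can report it
        for (Thread* t : reg)
            if (t->Identity() == running) return t;
        return nullptr;                // running thread is not a registered object
    }

protected:
    Thread() : id_(NextId()++) { Registry().push_back(this); }

private:
    long id_;
    static long& NextId() { static long next = 1; return next; }
    static std::vector<Thread*>& Registry() {
        static std::vector<Thread*> reg;
        return reg;
    }
};

// Hypothetical single-threaded stand-in: "running" is just a recorded identity.
class ToyThread : public Thread {
public:
    void Suspend() override { if (Running() == Identity()) Running() = 0; }
    void Resume() override {
        Running() = Identity();
        Body();
        assert(Self() == this);        // the base class can now find this object
    }
    void Body() override { ++runs; }
    long Current_Thread() override { return Running(); }
    int runs = 0;

private:
    static long& Running() { static long id = 0; return id; }
};
```

- A real package would return its own thread identifier from Current_Thread.
- ToyThread only records which object last called Resume.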
-
-
- The Class LWP_Thread
-
-
- User classes which require separate threads of control using the Sun thread
- package can be derived from the LWP_Thread class shown as follows:
-
- class LWP_Thread : public Thread
- {
- public:
- virtual void Suspend();
- virtual void Resume();
- virtual void Body() = 0;
-
- virtual long Current_Thread();
-
- thread_t Thread_ID();
- static void Initialize();
-
- protected:
- static const int MaxPriority;
- LWP_Thread(int priority = MaxPriority);
- };
- The MaxPriority constant represents the maximum priority at which a thread may
- execute (by default all threads derived from this class execute at this
- priority). Class LWP_Thread defines all of the pure virtual functions declared
- in Thread except Body, which must be defined by the deriving class.
- Initialize initializes the threads package prior to use. (Obviously the
- operations performed within this method are thread package specific.)
- Thread_ID returns more detailed (package-specific) information about the
- associated thread.
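- For readers without Sun's LWP library, the following sketch suggests how a
- class in the role of LWP_Thread might provide Suspend and Resume on top of
- standard C++ threads. It is an assumption, not part of C++ SIM. The StdThread
- and Gate classes are hypothetical. Control passes strictly back and forth, so
- only one of the two threads runs at a time. Suspend may only be called by the
- object's own thread from inside Body, whereas the article's code also
- suspends other threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// One-shot signal: each Open() releases exactly one Wait().
class Gate {
public:
    void Open() {
        std::lock_guard<std::mutex> lock(m_);
        open_ = true;
        cv_.notify_one();  // notify while holding the lock (a conservative choice)
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return open_; });
        open_ = false;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool open_ = false;
};

// Hypothetical stand-in for LWP_Thread built on std::thread. Control alternates
// strictly between the caller of Resume() and the object's own thread.
class StdThread {
public:
    StdThread() = default;
    StdThread(const StdThread&) = delete;
    StdThread& operator=(const StdThread&) = delete;
    virtual ~StdThread() {
        // Assumes Body() has returned; otherwise join() would block forever.
        if (worker_.joinable()) worker_.join();
    }

    // Start or continue Body(); returns once Body() suspends or finishes.
    void Resume() {
        if (finished_) return;
        if (!worker_.joinable())
            worker_ = std::thread([this] {
                run_.Wait();
                Body();
                finished_ = true;
                yielded_.Open();
            });
        run_.Open();
        yielded_.Wait();
    }
    bool Finished() const { return finished_; }

protected:
    // Called only from inside Body(): hand control back and wait to be resumed.
    void Suspend() {
        assert(std::this_thread::get_id() == worker_.get_id());
        yielded_.Open();
        run_.Wait();
    }
    virtual void Body() = 0;

private:
    Gate run_;      // opened by Resume(), waited on by the worker
    Gate yielded_;  // opened by the worker, waited on by Resume()
    bool finished_ = false;
    std::thread worker_;
};
```

- Resume blocks the caller until Body calls Suspend or returns. The object must
- therefore let Body finish before it is destroyed, or the destructor's join
- would wait forever.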
-
-
- The Process Class
-
-
- Applications could derive from the Thread base class to provide active objects
- in C++ outside of the simulation package. However, to become a process in the
- simulation environment, a class must be derived from the Process base class.
- This class is shown as follows:
- class Process : public LWP_Thread
- {
- public:
- virtual ~Process ();
-
- static double CurrentTime ();
-
- void ActivateBefore (Process&);
- void ActivateAfter (Process&);
- void ActivateAt (double AtTime = CurrentTime());
- void ActivateDelay (double AtTime = CurrentTime());
- void Activate();
-
- void ReActivateBefore (Process&);
- void ReActivateAfter (Process&);
- void ReActivateAt (double AtTime = CurrentTime());
- void ReActivateDelay (double AtTime = CurrentTime());
- void ReActivate ();
-
- void Cancel ();
- double evtime ();
- void set_evtime (double);
-
- boolean idle ();
- boolean terminated ();
-
- virtual void Body () = 0;
-
- protected:
- Process ();
-
- void Hold (double t);
- void Passivate ();
-
- };
- idle returns either TRUE or FALSE depending upon whether the process is
- currently in the simulation queue.
- terminated returns either TRUE or FALSE depending upon whether the process is
- terminated.
- evtime returns the simulation time at which a process is due to be
- reactivated; set_evtime enables a program to change this time.
- The Hold method removes the active process from the head of the event queue
- and schedules it to become active a specified number of time units later.
- Passivate removes the currently active process from the event queue
- altogether. If the process is to execute again, another process must reactivate it.
- Cancel removes the process from the simulation queue or simply suspends it
- indefinitely if it is currently not in the queue.
- At any point in time, a process can be in one (and only one) of the following
- states:
- active: the process is at the head of the queue maintained by the scheduler
- (to be described shortly) and its actions are being executed.
- suspended: the process is in the queue maintained by the scheduler, scheduled
- to become active at a specified time in the future.
- passive: the process is not in the scheduler's queue. Unless another process
- brings it back into the queue, it will not execute any further.
- terminated: the process is not in the scheduler's queue and has no further
- actions to execute.
- There are five ways to activate a process, and similarly five ways to
- reactivate a waiting process:
- before another process (ActivateBefore and ReActivateBefore);
- after another process (ActivateAfter and ReActivateAfter);
- at a specified (simulated) time (ActivateAt and ReActivateAt);
- after a specified (simulated) delay (ActivateDelay and ReActivateDelay);
- activate now (at the current simulated time) (Activate and ReActivate).
- (Note that if a process is already scheduled, reactivation will simply
- re-schedule the process.)
- The CurrentTime method returns the current simulation time; programs
- typically call CurrentTime to control actions relative to a given time period.
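- A simple time-ordered event list can illustrate the ordering rules behind the
- activation variants. The sketch assumes SIMULA-like rules. ActivateAt places
- a process after any others already due at that time. Activate places it at
- the very front. ActivateBefore and ActivateAfter give it the same time as the
- other process. EventList and its members are hypothetical names, and
- processes are identified by strings for brevity. The reactivation variants,
- which would first remove the existing entry, are omitted.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <utility>

// Hypothetical event list kept in order of increasing simulation time.
class EventList {
public:
    using Entry = std::pair<double, std::string>;  // (time, process)

    double Now() const { return now_; }
    bool Empty() const { return events_.empty(); }

    // Activate: run at the current time, ahead of everything else.
    void Activate(const std::string& p) { events_.push_front(Entry{now_, p}); }

    // ActivateAt: at time t, after any process already due at t.
    void ActivateAt(const std::string& p, double t) {
        auto it = events_.begin();
        while (it != events_.end() && it->first <= t) ++it;
        events_.insert(it, Entry{t, p});
    }

    // ActivateDelay: d time units after the current time.
    void ActivateDelay(const std::string& p, double d) { ActivateAt(p, now_ + d); }

    // ActivateBefore/ActivateAfter: at q's time, adjacent to q.
    // Return false if q is not scheduled.
    bool ActivateBefore(const std::string& p, const std::string& q) { return Beside(p, q, false); }
    bool ActivateAfter(const std::string& p, const std::string& q) { return Beside(p, q, true); }

    // Scheduler step: remove the head, advance the clock, return the process to run.
    std::string RunNext() {
        assert(!events_.empty());
        Entry head = events_.front();
        events_.pop_front();
        now_ = head.first;
        return head.second;
    }

private:
    bool Beside(const std::string& p, const std::string& q, bool after) {
        for (auto it = events_.begin(); it != events_.end(); ++it) {
            if (it->second == q) {
                events_.insert(after ? std::next(it) : it, Entry{it->first, p});
                return true;
            }
        }
        return false;
    }

    std::list<Entry> events_;
    double now_ = 0.0;
};
```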
-
-
- The Simulation Scheduler
-
-
- As in SIMULA, simulation processes (entities) execute at their assigned
- simulation time, which is typically determined by an appropriate distribution
- function. Only one process executes at any instant of real time, but many
- processes may execute at any instant of simulation time. Programs place
- currently inactive processes in a simulation queue (the event queue), which is
- arranged in order of increasing simulation time.
- To coordinate the execution of these processes, the scheduler manages the
- simulation queue as follows: when no process is currently active, the
- scheduler selects a process to run from the head of the queue and (re-)
- activates it. When no processes are left to execute, i.e., the queue is empty,
- the simulation ends.
- The simulation queue is organized as a tree to improve the efficiency of the
- scheduling algorithm. All nodes (processes) at the same level of the tree are
- assigned to the same simulation time, as shown in Figure 1.
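- The scheduling loop and the time-ordered tree can be sketched with std::map,
- which is itself a balanced search tree, keyed by simulation time. Each node
- holds the processes due at that time, in the order they were scheduled. This
- is a simplified illustration, not C++ SIM's scheduler. MiniScheduler is a
- hypothetical name, processes are plain callbacks rather than threads, and a
- callback that reschedules itself stands in for Hold. Exact floating-point
- times are assumed as keys.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <utility>

// Hypothetical callback-based scheduler: one map node per distinct event time.
class MiniScheduler {
public:
    using Action = std::function<void()>;

    double CurrentTime() const { return now_; }

    // Schedule an action t time units from now.
    void After(double t, Action a) {
        assert(t >= 0.0);
        queue_[now_ + t].push_back(std::move(a));
    }

    // Repeatedly run the head of the queue; the simulation ends when it is empty.
    void Run() {
        while (!queue_.empty()) {
            auto node = queue_.begin();            // earliest simulation time
            now_ = node->first;
            Action a = std::move(node->second.front());
            node->second.pop_front();
            if (node->second.empty()) queue_.erase(node);
            a();                                   // may schedule further actions
        }
    }

private:
    std::map<double, std::deque<Action>> queue_;
    double now_ = 0.0;
};
```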
- Because the scheduler manages processes in the simulation environment it
- cannot itself be a simulation process. Like the main thread to be described
- later, the scheduler is a priority thread within the environment and as such
- must be controlled in a slightly different manner than the other simulation
- entities. The structure of the scheduler is extremely simple and is shown as
- follows:
- class Scheduler : public LWP_Thread
- {
- public:
- Scheduler ();
- ~Scheduler ();
- void Body ();
- double CurrentTime ();
- };
-
- Every simulation application must start one scheduler before the simulation
- can begin. The example to be described near the end of this article
- illustrates use of the scheduler.
-
-
- Priority Threads
-
-
- C++ SIM executes two "priority" threads which cannot be derived from the
- Process base class and therefore must be activated and deactivated separately.
- These threads are as follows:
- the simulation scheduler: this thread must be activated via the Resume method
- of the thread base class from which it is derived (e.g., LWP_Thread);
- the thread associated with main. A program must suspend this thread to allow
- other threads to run since this thread has the highest priority in the system.
- Calling the Initialize method of the thread class in use (for example,
- LWP_Thread::Initialize) within the main body of the simulation code adds this
- thread to the thread queue maintained by class Thread. The thread's presence in
- the queue allows the Suspend method to act on it when the program requires it
- to become inactive (using the Thread::Self()->Suspend() operation).
-
-
- Distribution Functions
-
-
- Simulations often require distribution functions of various events (e.g., the
- rate of arrivals of jobs at a processor, or the time between failures for a
- node). C++ SIM provides a set of classes which give access to various useful
- distribution functions, including the following: RandomStream, UniformStream,
- Draw, ExponentialStream, ErlangStream, HyperExponentialStream, and
- NormalStream. By creating instances of these classes the simulation processes
- can gain access to the appropriate distribution function. Figure 3 shows the
- class hierarchy of the distribution functions.
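- Such a hierarchy can be sketched as an abstract stream with a virtual
- operator() and derived classes for particular distributions. This sketch is
- for illustration only and rests on several assumptions. The base class Stream
- is hypothetical. The uniform source is the standard std::mt19937_64 engine
- rather than C++ SIM's own generator. The ExponentialStream and UniformStream
- below are simplified stand-ins. The exponential draw uses the usual
- inverse-transform formula -mean * ln(1 - U) for a uniform deviate U.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical base: every distribution is a function object returning a draw.
class Stream {
public:
    explicit Stream(unsigned long long seed = 12345ULL) : gen_(seed) {}
    virtual ~Stream() = default;
    virtual double operator()() = 0;

protected:
    double Uniform() { return unit_(gen_); }  // uniform on [0, 1)

private:
    std::mt19937_64 gen_;
    std::uniform_real_distribution<double> unit_{0.0, 1.0};
};

// Uniform on [lo, hi), e.g. for repair or operative times.
class UniformStream : public Stream {
public:
    UniformStream(double lo, double hi) : lo_(lo), hi_(hi) { assert(lo < hi); }
    double operator()() override { return lo_ + (hi_ - lo_) * Uniform(); }

private:
    double lo_, hi_;
};

// Exponential with the given mean, by inverse transform.
class ExponentialStream : public Stream {
public:
    explicit ExponentialStream(double mean) : mean_(mean) { assert(mean > 0.0); }
    double operator()() override { return -mean_ * std::log(1.0 - Uniform()); }

private:
    double mean_;
};
```

- A simulation process holds a pointer to the stream it needs. It draws a value
- with (*stream)(), as the Arrivals example later in this article does.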
-
-
- RandomStream and NormalStream
-
-
- Classes RandomStream and NormalStream illustrate how the distribution
- functions are derived and show how further functions could be built.
- RandomStream (from which all other distribution functions are derived) is
- shown as follows:
-
- class RandomStream
- {
- public:
- RandomStream (long MGSeed = 772531L,
- long LCGSeed = 1878892440L);
- virtual double operator() () = 0;
- double Error ();
-
- protected:
- double Uniform ();
-
- private:
- double MGen ();
- double series[128];
- long MSeed, LSeed;
- };
- The Error method returns a chi-square error measure on the uniform
- distribution function. The Uniform method generates random numbers: it
- uses a linear congruential generator based on the algorithm from [Knuth
- Vol2], and shuffles the results of the linear generator with the
- multiplicative generator [2], as suggested by [Knuth Vol2] [3], to obtain a
- sufficiently uniform random distribution.
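- The shuffling scheme (MacLaren and Marsaglia's method, footnote [3]) can be
- sketched as follows. The multiplicative generator follows footnote [2],
- Y[i+1] = Y[i] * 5^5 mod 2^26 with an odd seed, and fills a 128-entry table.
- A linear congruential generator then picks which entry to return, and that
- entry is refilled. Several details are assumptions rather than facts taken
- from C++ SIM: which generator fills the table, the LCG constants (1103515245
- and 12345, modulo 2^31), and the ShuffledUniform name. The seeds are the
- defaults from RandomStream's constructor.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a MacLaren-Marsaglia shuffled uniform generator (assumed details).
class ShuffledUniform {
public:
    explicit ShuffledUniform(std::uint32_t mseed = 772531u,
                             std::uint32_t lseed = 1878892440u)
        : mseed_(mseed | 1u), lseed_(lseed) {   // multiplicative seed must be odd
        for (double& s : series_) s = MGen();
    }

    // Uniform value in (0, 1).
    double operator()() {
        lseed_ = (1103515245u * lseed_ + 12345u) & 0x7FFFFFFFu;  // LCG mod 2^31
        int k = static_cast<int>(lseed_ >> 24);                   // top 7 bits: 0..127
        double result = series_[k];
        series_[k] = MGen();                                      // refill the used slot
        return result;
    }

    // One step of Y <- Y * 5^5 mod 2^26.
    static std::uint32_t Next(std::uint32_t y) {
        return static_cast<std::uint32_t>((std::uint64_t{y} * 3125u) & ((1u << 26) - 1u));
    }

private:
    double MGen() {
        mseed_ = Next(mseed_);
        assert((mseed_ & 1u) == 1u);     // odd * odd stays odd, so never zero
        return mseed_ / 67108864.0;      // divide by 2^26
    }

    double series_[128];
    std::uint32_t mseed_, lseed_;
};
```

- The period matches footnote [2]. Because 5^5 = 3125 leaves remainder 5 when
- divided by 8, its powers modulo 2^26 repeat only every 2^24 steps.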
- Class NormalStream is defined as follows:
- class NormalStream : public RandomStream
- {
- public:
- NormalStream (double Mean, double StandardDeviation);
- virtual double operator() ();
-
- private:
- double Mean, StandardDeviation;
- double z;
- };
- The operator() uses the polar method in [Knuth Vol2] [4] to implement the
- NormalStream by making use of the Uniform method of RandomStream.
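- Here is a sketch of the polar method. It assumes a standard std::mt19937_64
- source of uniforms in place of RandomStream::Uniform, and PolarNormal is a
- hypothetical name. The method picks a point (v1, v2) uniformly in the square
- [-1, 1] x [-1, 1] and rejects it unless s = v1^2 + v2^2 lies in (0, 1). Then
- v1 * sqrt(-2 ln s / s) and v2 * sqrt(-2 ln s / s) are two independent
- standard normal deviates. Each accepted point therefore yields two values,
- which is presumably why NormalStream keeps the extra member z. The caching
- below is written under that assumption.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Polar-method normal generator (sketch; the uniform source is an assumption).
class PolarNormal {
public:
    PolarNormal(double mean, double sd, unsigned long long seed = 12345ULL)
        : mean_(mean), sd_(sd), gen_(seed) { assert(sd >= 0.0); }

    double operator()() {
        if (have_z_) {                        // second value from the previous pair
            have_z_ = false;
            return mean_ + sd_ * z_;
        }
        double v1, v2, s;
        do {
            v1 = 2.0 * unit_(gen_) - 1.0;     // uniform on [-1, 1)
            v2 = 2.0 * unit_(gen_) - 1.0;
            s = v1 * v1 + v2 * v2;
        } while (s >= 1.0 || s == 0.0);       // keep points inside the unit circle
        double f = std::sqrt(-2.0 * std::log(s) / s);
        z_ = v2 * f;                          // cache one deviate for the next call
        have_z_ = true;
        return mean_ + sd_ * v1 * f;
    }

private:
    double mean_, sd_;
    std::mt19937_64 gen_;
    std::uniform_real_distribution<double> unit_{0.0, 1.0};
    double z_ = 0.0;
    bool have_z_ = false;
};
```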
-
-
- SIMSET
-
-
- C++ SIM also provides entity and set manipulation facilities similar to those
- provided by the SIMSET classes of SIMULA. These facilities break down into two
- classes:
- Link: the Link class provides elements of a doubly linked list;
- Head: the Head class maintains doubly linked lists of Link elements.
- Class Link is defined as follows:
- class Link
- {
- public:
- virtual ~Link ();
-
- Link* Suc () const;
- Link* Pred () const;
-
- Link* Out ();
- void InTo (Head*);
-
- void Precede (Link*);
- void Precede (Head*);
- void Follow (Link*);
- void Follow (Head*);
-
- protected:
- Link ();
- };
- Because it makes no sense to create instances of Link objects, the constructor
- for Link is protected -- programs must derive a class from Link to benefit
- from its functionality.
- Suc and Pred return the successor and predecessor of this list element
- respectively. These functions return 0 if no such element exists.
-
- Out removes this object from the linked list to which it currently belongs
- (if any). InTo makes this object the last element in a linked list if the list
- exists; if the list doesn't exist, InTo behaves like Out and removes the object
- from any linked list to which it may belong.
- Precede also places an object in a linked list. If Precede is passed another
- Link element, say L, then if L is a member of a linked list, this object is
- placed into the same linked list and immediately preceding L. If L isn't a
- member of a linked list, the result is the same as for Out. If Precede is
- passed a Head object it produces the same result as InTo.
- Follow acts similarly to Precede, except that L.Follow(L1) inserts L
- immediately after L1, and L.Follow(H) places L as the first element in H,
- where H is a Head object.
- Note that as in SIMULA, Link elements can only belong to one linked list at a
- time.
- Class Head is defined as follows:
- class Head
- {
- public:
- Head ();
- virtual ~Head ();
-
- Link* First () const;
- Link* Last () const;
-
- long Cardinal () const;
- boolean Empty () const;
-
- void Clear ();
- };
- First and Last return pointers to the first and last Link objects in the
- list respectively. If the list is empty then these functions return 0.
- Cardinal returns the number of Link objects in the list, and Empty returns
- TRUE if the list is empty, FALSE otherwise. Clear removes all Link objects
- from the list.
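- The list semantics described above can be sketched as an intrusive doubly
- linked list, in which each Link records its neighbors and the Head that owns
- it. This is an illustrative implementation written from the description, not
- C++ SIM's source. Standard bool replaces the library's boolean type. Two
- behaviors are assumptions: L.Precede(L) and L.Follow(L) do nothing, and a
- Link leaves its list when it is destroyed.

```cpp
#include <cassert>

class Head;

// Element of a doubly linked list in the style of SIMULA's SIMSET.
class Link {
public:
    Link(const Link&) = delete;
    Link& operator=(const Link&) = delete;
    virtual ~Link() { Out(); }          // assumption: a destroyed element leaves its list

    Link* Suc() const { return next_; } // nullptr if there is no successor
    Link* Pred() const { return prev_; }

    Link* Out();                        // leave the current list, if any
    void InTo(Head* h);                 // become the last element of h
    void Precede(Link* l);              // go immediately before l
    void Precede(Head* h) { InTo(h); }
    void Follow(Link* l);               // go immediately after l
    void Follow(Head* h);               // become the first element of h

protected:
    Link() = default;                   // only derived classes can be created

private:
    void LinkBetween(Head* h, Link* p, Link* n);
    Link* prev_ = nullptr;
    Link* next_ = nullptr;
    Head* owner_ = nullptr;             // list this element belongs to
};

class Head {
public:
    Head() = default;
    Head(const Head&) = delete;
    Head& operator=(const Head&) = delete;
    virtual ~Head() { Clear(); }

    Link* First() const { return first_; }  // nullptr if the list is empty
    Link* Last() const { return last_; }
    long Cardinal() const { return count_; }
    bool Empty() const { return count_ == 0; }
    void Clear() { while (first_) first_->Out(); }

private:
    friend class Link;
    Link* first_ = nullptr;
    Link* last_ = nullptr;
    long count_ = 0;
};

inline Link* Link::Out() {
    if (owner_) {
        (prev_ ? prev_->next_ : owner_->first_) = next_;
        (next_ ? next_->prev_ : owner_->last_) = prev_;
        --owner_->count_;
        prev_ = next_ = nullptr;
        owner_ = nullptr;
    }
    return this;
}

// Insert this element between p and n (either may be null) in list h.
inline void Link::LinkBetween(Head* h, Link* p, Link* n) {
    assert(h != nullptr);
    prev_ = p;
    next_ = n;
    owner_ = h;
    (p ? p->next_ : h->first_) = this;
    (n ? n->prev_ : h->last_) = this;
    ++h->count_;
}

inline void Link::InTo(Head* h) {
    Out();
    if (h) LinkBetween(h, h->last_, nullptr);
}

inline void Link::Precede(Link* l) {
    if (l == this) return;              // assumption: no-op
    Out();
    if (l && l->owner_) LinkBetween(l->owner_, l->prev_, l);
}

inline void Link::Follow(Link* l) {
    if (l == this) return;              // assumption: no-op
    Out();
    if (l && l->owner_) LinkBetween(l->owner_, l, l->next_);
}

inline void Link::Follow(Head* h) {
    Out();
    if (h) LinkBetween(h, nullptr, h->first_);
}
```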
-
-
- Example: Job Service Simulation
-
-
- This example is taken from [Mitrani 82] and simulates a process scheduler for
- a machine which attempts to execute as many processes (jobs) as possible. The
- machine can only process one job at a time and queues job requests until it
- can deal with them. The machine is prone to failures, so a started job can be
- interrupted by a failure and delayed until the machine has been repaired
- (reactivated), at which point the job must restart execution from the
- beginning (i.e., it is placed at the head of the job queue). The main
- processes within this example are:
- Arrivals: this process controls the rate at which jobs arrive at the service
- (Machine).
- Breaks: this process controls the availability of the Machine by "killing" it
- and restarting it at intervals drawn from a Uniform distribution.
- Job: this class represents the jobs that the Machine must process.
- Machine: this is the machine on which the service resides. Machine obtains
- Jobs from the job queue for the service and then attempts to execute them. The
- machine can fail and so the response time for Jobs is not guaranteed to be the
- same every time the job is performed.
-
-
- Arrivals
-
-
- The Arrivals class definition is relatively simple since none of the other
- processes invoke operations on it. Arrivals is defined as follows:
- class Arrivals : public Process
- {
- public:
- Arrivals (double);
- ~Arrivals ();
-
- void Body ();
-
- private:
- ExponentialStream* InterArrivalTime;
- };
- The constructor initializes the stream from which the rate of Job arrivals is
- drawn and the destructor simply cleans up before the object is destroyed:
- Arrivals::Arrivals (double mean)
- {
- InterArrivalTime = new ExponentialStream(mean);
- }
- Arrivals::~Arrivals () { delete InterArrivalTime; }
- The main body of Arrivals (shown below) simply waits for an amount of time
- dictated by the rate of arrivals stream and then creates another Job. This
- procedure is repeated until the simulation ends.
- void Arrivals::Body ()
- {
- for (;;) // infinite loop
- {
-
- Hold((*InterArrivalTime) ());
- Job* work = new Job();
- }
- }
-
-
- Job
-
-
- Unlike Arrivals, which is an active entity within the simulation, the Job
- class does not need to be a separate process, since it is simply enqueued when
- it is created and dequeued by the Machine when it can be executed. All a given
- Job must do is calculate how long it took to be "processed":
- class Job
- {
- public:
- Job ();
- ~Job ();
-
- private:
- double ArrivalTime;
- double ResponseTime;
- };
- Because no operations are invoked on instances of the Job class, its
- constructor and destructor perform all of its work:
- Job::Job ()
- {
- boolean empty;
-
- ResponseTime = 0;
- ArrivalTime = sc->CurrentTime();
- empty = JobQ.IsEmpty();
- JobQ.Enqueue(this); // place this Job on to the queue
- TotalJobs++;
-
- if (empty && !M->Processing() && M->IsOperational())
- M->Activate(); // Machine idle as no Jobs in queue
- // and not broken
- }
-
- Job::~Job ()
- {
- ResponseTime = sc->CurrentTime() - ArrivalTime;
- TotalResponseTime += ResponseTime;
- }
-
-
- Queue
-
-
- The program places jobs which are not being serviced in a job queue. As with
- the Job class, an instance of Queue is not required to be active, and as such
- Queue is not derived from the Process class.
- Queue is defined as follows:
- class Queue
- {
- public:
- Queue ();
- ~Queue ();
-
- boolean IsEmpty ();
- // returns TRUE if no Jobs in queue
- long QueueSize ();
- // returns number of Jobs in queue
- Job* Dequeue ();
-
- // returns head of queue
- void Enqueue (Job*);
- // places Job at tail of queue
- };
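- The article does not show Queue's implementation. One minimal possibility
- rests on a few assumptions. It stores Job pointers in a std::deque and leaves
- Job incomplete, since the queue only stores pointers. It uses standard bool
- in place of boolean. It returns a null pointer when Dequeue is called on an
- empty queue.

```cpp
#include <cassert>
#include <deque>

class Job;  // only pointers are stored, so the full definition is not needed

// Hypothetical FIFO job queue; the queue does not own the Jobs.
class Queue {
public:
    bool IsEmpty() const { return jobs_.empty(); }
    long QueueSize() const { return static_cast<long>(jobs_.size()); }

    // Returns the head of the queue, or nullptr if it is empty.
    Job* Dequeue() {
        if (jobs_.empty()) return nullptr;
        Job* j = jobs_.front();
        jobs_.pop_front();
        return j;
    }

    // Places a Job at the tail of the queue.
    void Enqueue(Job* j) {
        assert(j != nullptr);
        jobs_.push_back(j);
    }

private:
    std::deque<Job*> jobs_;
};
```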
-
-
- Machine
-
-
- The Machine process obtains Jobs from the queue and processes them. Since
- Machine is prone to failures Jobs can take extended periods of time to
- complete. Other simulation processes invoke various operations on the machine
- (for example to determine whether or not it has failed):
- class Machine : public Process
- {
- public:
- Machine (double);
- ~Machine ();
-
- void Body ();
-
- void Broken ();
- void Fixed ();
- boolean IsOperational ();
- boolean Processing ();
- double ServiceTime ();
-
- private:
- ExponentialStream* STime;
- boolean operational;
- boolean working;
- };
- As with the Breaks and Arrivals processes, Machine's constructor and
- destructor initialize and delete the stream that dictates the time required to
- process a Job.
- Processing returns the current status of the machine, i.e., whether or not it
- is executing a job:
- boolean Machine::Processing () { return working; }
- Broken and Fixed de-activate (crash) and re-activate the machine respectively:
- void Machine::Broken () { operational = false; }
- void Machine::Fixed () { operational = true; }
- IsOperational indicates whether or not the machine is currently active (i.e.,
- whether it has "crashed"):
- boolean Machine::IsOperational () { return operational; }
- ServiceTime returns the time required to service a given job based on the
- relevant distribution function initialized within the constructor:
- double Machine::ServiceTime () { return (*STime)(); }
- The main body of the Machine gets a Job from the job queue (if one is
- available) and attempts to process it before looping again:
- void Machine::Body ()
- {
- for(;;)
- {
- working = true;
-
- while (!JobQ.IsEmpty())
- // continue as long as Jobs are available
- {
- Hold(ServiceTime());
- Job* J = JobQ.Dequeue();
-
- ProcessedJobs++;
- // keep track of number of completed Jobs
- delete J; // remove finished Job
- }
-
- working = false;
- // no Jobs in queue so become idle
-
- Cancel();
- }
- }
-
-
- Breaks
-
-
- The Breaks class defines a process which simply waits for a specific period of
- time before "killing" the Machine process. This process then waits again
- before re-activating the machine. The Breaks class definition is relatively
- simple:
- class Breaks : public Process
- {
- public:
- Breaks ();
- ~Breaks ();
-
- void Body ();
-
- private:
- UniformStream* RepairTime;
- UniformStream* OperativeTime;
- boolean interrupted_service;
- };
- The constructor and destructor simply initialize and delete the streams used
- by the Breaks process respectively.
- The main body of the Breaks process activates and deactivates the Machine
- process. The Machine fails and recovers according to the OperativeTime and
- RepairTime distribution functions respectively. The body is defined as
- follows:
- extern Machine* M; // This is the machine used to
- // service requests
- extern Queue JobQ; // This is the queue from which
- // Jobs are drawn
-
- void Breaks::Body ()
- {
- for(;;)
- {
- Hold((*OperativeTime)());
- M->Broken();
- // de-activate the Machine
- M->Cancel();
- // remove Machine from Scheduler queue
-
- if (!JobQ.IsEmpty())
- interrupted_service = true;
-
- Hold((*RepairTime)());
- M->Fixed(); // re-activate the Machine
- if (interrupted_service)
- {
- interrupted_service = false;
- M->ActivateAt(M->ServiceTime() +
- CurrentTime());
- }
- else
- M->ActivateAt();
- }
- }
-
-
- MachineShop
-
-
-
- The MachineShop class is the core of the simulation; it starts up all of the
- main processes involved, and when the simulation ends it prints out the
- results.
- class MachineShop : public Process
- {
- public:
- MachineShop ();
- ~MachineShop ();
-
- void Body ();
- void Await ();
- };
- The Body method starts up the other processes, such as the Machine, and then
- waits until the number of processed Jobs is at least 100,000:
- void MachineShop::Body ()
- {
- sc= new Scheduler(); // create the simulation
- // queue scheduler
- Arrivals* A = new Arrivals(10);
- M = new Machine(8);
- Job* J = new Job;
- Breaks* B = new Breaks;
-
- // activate the relevant simulation processes
-
- B->Activate();
- A->Activate();
- sc->Resume(); // start up the scheduler
- // - it is not a process
-
- while (ProcessedJobs < 100000)
- Hold(10000);
-
- cout << "Total number of jobs processed: "
- << TotalJobs << endl;
- cout << "Total response time: " << TotalResponseTime << endl;
- cout << "Average response: " <<
- (TotalResponseTime/ProcessedJobs) << endl;
- cout << "Average number of jobs present: "
- << (JobsInQueue/CheckFreq) << endl;
-
- // end simulation by suspending processes
-
- sc->Suspend();
- A->Suspend();
- B->Suspend();
- }
- It isn't necessary to activate the Machine process explicitly, because the
- Breaks process or a newly created Job will do this.
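- The numbers this simulation prints can be roughly sanity-checked by ignoring
- breakdowns. Assume that Machine(8) means an exponential service time with
- mean 8; this is an assumption, because Machine's constructor is not shown.
- The shop is then an M/M/1 queue with arrival rate 1/10 and service rate 1/8.
- Its mean response time is 1/(1/8 - 1/10) = 40 time units. The sketch below
- estimates that figure with Lindley's recursion, a much simpler technique than
- C++ SIM's process-based model. MeanResponseTime is a hypothetical helper.
- With breakdowns enabled, the simulated average response would be expected
- to come out higher.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Mean response time (queueing delay + service) of a single-server FIFO queue
// with exponential inter-arrival and service times, estimated with Lindley's
// recursion W[n+1] = max(0, W[n] + S[n] - A[n+1]). Machine failures are ignored.
double MeanResponseTime(double mean_interarrival, double mean_service,
                        long jobs, unsigned long long seed = 12345ULL) {
    assert(jobs > 0);
    std::mt19937_64 gen(seed);
    std::exponential_distribution<double> interarrival(1.0 / mean_interarrival);
    std::exponential_distribution<double> service(1.0 / mean_service);

    double wait = 0.0;   // queueing delay of the current job
    double total = 0.0;  // sum of response times
    for (long n = 0; n < jobs; ++n) {
        double s = service(gen);
        total += wait + s;
        wait = std::max(0.0, wait + s - interarrival(gen));  // next job's delay
    }
    return total / jobs;
}
```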
- The Await method suspends the thread associated with main, thus allowing the
- other simulation threads to execute:
- void MachineShop::Await()
- {
- Resume();
- Thread::Self()->Suspend();
- }
-
-
- Main
-
-
- The main part of the simulation code initializes the various thread-specific
- variables used (e.g., the maximum priority of a thread), creates the main body
- of the simulation code (in this case MachineShop) and then suspends the thread
- associated with main:
- int main ()
- {
-
- LWP_Thread::Initialize();
-
- MachineShop m;
- m.Await(); // Suspend main's thread
- // (NOTE: this MUST be done
- // by all applications).
- }
-
-
- Conclusions
-
-
- The authors of C++ SIM have endeavoured to provide a simulation package which
- provides similar functionality to that of SIMULA, since SIMULA has fulfilled
- the needs of users over many years. From their experiences of using SIMULA,
- both as a general programming language and as a simulation tool, they believe
- they have been successful. As a result of using C++ they also believe that
- they have produced a simulation package having several advantages over SIMULA,
- for example:
- performance -- C++ compilers typically generate code that is several times
- more efficient than similar SIMULA code, and as a result, simulations execute
- correspondingly faster;
- C++ provides more extensive object-oriented features than SIMULA, allowing,
- for example, class instance variables to be either publicly or only privately
- available. In SIMULA, everything is public, affecting the way code is written
- and creating extra problems during debugging.
- C++ SIM incorporates inheritance throughout its design to an even greater
- extent than is already provided in SIMULA. For example, C++ SIM's I/O
- facilities, random number generators, and probability distribution functions
- are entirely object-oriented, relying on inheritance to specialize their
- behavior. Hence, users can add new functionality (e.g., new random number
- generators) with little effect on the overall system structure.
-
-
- Acknowledgements
-
-
- The authors would like to thank Professor Isi Mitrani for the help he has
- given them in the development of this simulation package and the time he has
- devoted to listening to their thoughts and problems. They would also like to
- thank Ron Kerr for his help with the SIMULA language, and Dr. Graham
- Partington for his comments on drafts of this article. The work reported here
- has been supported by SERC/MOD Grant GR/H81078 and Esprit Broadcast (Basic
- Research Project Number 6360).
- References
- [Birtwhistle 73] G. M. Birtwhistle, O-J. Dahl, B. Myhrhaug, K. Nygaard, Simula
- Begin, Academic Press, 1973.
- [Black 86] A. Black et al, "Object Structure in the Emerald System",
- Proceedings of the ACM Conference on Object-Oriented Programming Systems,
- Languages, and Applications, October 1986.
- [Dahl 70] O-J. Dahl, B. Myhrhaug, K. Nygaard, "SIMULA Common Base Language,"
- Technical Report S-22, Norwegian Computing Centre, 1970.
- [Knuth Vol2] D. E. Knuth, The Art of Computer Programming, Vol. 2:
- Seminumerical Algorithms, Addison-Wesley: p. 117.
- [McCue 92] D. L. McCue and M. C. Little, "Computing Replica Placement in
- Distributed Systems," Proceedings of the 2nd Workshop on the Management of
- Replicated Data, November 1992: pp. 58-61.
- [Mitrani 82] I. Mitrani, Simulation Techniques for Discrete Event Systems,
- Cambridge University Press, Cambridge, 1982: p. 22.
- [Sedgewick 83] R. Sedgewick, Algorithms, Addison-Wesley, Reading MA, 1983: pp.
- 36-38.
- [Stroustrup 86] B. Stroustrup, The C++ Programming Language, Addison Wesley:
- 1986.
- Footnotes
- [1] The software described in this article is available via anonymous ftp
- from arjuna.ncl.ac.uk.
- [2] The authors would like to thank Professor I. Mitrani for his help in
- developing the multiplicative generator used in the simulation. It is based on
- the following algorithm: Y[i+1] = Y[i] * 5^5 mod 2^26, where the period is
- 2^24, and the initial seed must be odd.
- [3] As suggested by Maclaren and Marsaglia.
- [4] Due to Box, Muller, and Marsaglia.
- Figure 1 Head of simulation queue
- Figure 2 Simulation class hierarchy
- Figure 3 Class hierarchy for distribution functions
-
-
- Editor's Forum
- Well, I made it past my 50th birthday. Being a 50-year-old computer programmer
- is definitely better than being 20 -- you get more toys to play with and more
- autonomy in choosing what to do with them. But I confess that it's not as nice
- as being 30, or even 40 -- I miss the energy that came with those relatively
- youthful milestones. My main consolation is that I still have some energy
- left, and I'm much wiser about how I spend it now than in years gone by.
- I couldn't help but notice at 30 that many of my 40-year-old colleagues
- weren't as active in the trenches as I was. Ten years later, I found out why.
- With maturity in one's profession comes increasing demands to serve as
- caretaker rather than front-line contributor. Someone has to write all those
- proposals, job descriptions, requirements documents, requisitions, etc. They
- eat time like candy and sure don't resemble programming. But only the most
- dedicated techies can resist the siren lure of responsibility. The rest of us
- get suckered into acting like grownups.
- I couldn't help but notice at 40 that many of my 50-year-old colleagues
- weren't even fighting interesting battles, or so it appeared to me at the
- time. They seemed to attend interminable meetings and talk about policy
- matters and other ephemeral abstractions. I mean, how important can it be to
- draft international standards, or procedures for software quality control, for
- heaven's sake? Now, after a decade or more of doing that sort of stuff, I've
- come to see the merit in it. I've learned how to play the guru, or the
- statesman, or the doddering old fool, as the need arises. I can even sit
- through a two-hour meeting without squirming (excessively).
- I still write the odd bit of code, or the odd requirements spec, and it's more
- fun than ever. I spent the half-week before my birthday in New Jersey, beating
- on the draft C++ standard with Andy Koenig, Tom Plum, Bjarne Stroustrup, and
- others -- and I have to admit it was mostly enjoyable. The computer business
- has never been more exciting than it is today. I have much to be grateful for.
- When I stumbled into this field at the age of 19, I never dreamed it would
- consume my entire adult career. Or that it would bring me so many rewards. I
- can only hope that most of you who read this magazine can enjoy a comparable
- passion. May your candle burn equally bright.
- So how does it feel to be half a century old? My favorite quote on that topic
- is from Lowell Thomas, who was asked on his birthday how it felt to be 80. He
- said, "It's not bad, considering the alternative."
- P.J. Plauger
- pjp@plauger.com
-
-
- New Products
-
-
- Industry-Related News & Announcements
-
-
-
-
- Blue Sky Releases Code Generator for OWL v2.0
-
-
- Blue Sky Software Corporation has released WindowsMAKER Professional v5, a
- prototyper and C/C++ code generator for Windows and Windows NT, which lets
- Borland C++ v4.0 users generate OWL v2.0 code. WindowsMAKER v5 integrates into
- the Borland IDE and offers 50 features in addition to those offered by
- AppExpert.
- Features of WindowsMAKER v5 include: Visual Basic-like code editing, toolbar,
- and context-sensitive SmartMenus; a Visual Screen Designer, which includes
- drag-and-drop editing and a configurable tool palette; and a Preview Mode
- which lets users preview and test run an application before compiling.
- WindowsMAKER v5 also includes Switch-It Code Generation Modules (SIMs), which
- let users switch to any language or major C++ library during the development
- of an application. Users can choose to automatically generate ANSI C, MFC and
- OWL from one design. Using WindowsMAKER v5, users can start a project and
- decide the type of code to use later and target Windows, Win32s, and Windows
- NT.
- Other features of WindowsMAKER v5 include: toolbar support; application
- templates; code generation for online help; special effect support for fonts,
- colors, 3D buttons, patterns, and 256-color bitmaps; style and property
- setting support for controls; MDI application support; and Extended
- Functionality Modules (EFMs).
- WindowsMAKER Professional v5 includes the ANSI C SIM and is $995. Registered
- users of WindowsMAKER Professional v4 can upgrade for $269. For more
- information contact Blue Sky Software Corporation, 7486 La Jolla Blvd., Suite
- 3, La Jolla, CA 92037, (800) 677-4946 or (619) 459-6365; FAX: (619) 459-6366.
-
-
- National Information Systems Releases ACCENT STP
-
-
- National Information Systems, Inc. has released ACCENT STP (Sun Transition
- Pack) v2.0, a source code translator for Sun developers who plan to migrate
- their OPEN LOOK applications to Motif and the Common Desktop Environment.
- ACCENT STP supports Motif v1.2 and X11R5 on SunOS v4.1.x and Solaris v2.x.
- According to NIS, ACCENT STP will translate 80% to 100% of the C/C++
- application source code produced by XView, OLIT, or Devguide GIL files,
- including header files, where there are equivalent resources offered in Motif.
- The translated output will be recognizable Motif C or C++ source code.
- Features of ACCENT STP v2.0 include support for "drag and drop," TTY widgets,
- and internationalization support. Other features of ACCENT STP include:
- compliance with the Motif standard, compatibility with Solaris v2.4/COSE CDE,
- and application portability to multiple hardware platforms.
- ACCENT STP consists of four optional modules. The Devguide Conversion, XView
- Conversion, and OLIT Conversion modules are $4,995 each. The WindowMaker GUI
- Editor is $1,495. Motif consulting and training are also available. ACCENT
- ToolKit, a Motif library and OPEN LOOK ToolKit which helps support OPEN LOOK
- users after the source code is translated, is available for $2,495 when
- purchased with ACCENT STP. For more information contact National Information
- Systems, Inc., 4040 Moorpark Ave., Suite 200, San Jose, CA 95117, (800)
- 441-5758; FAX: (408) 246-3127; e-mail: info@nis.com.
-
-
- Select Software Upgrades C++ Designer
-
-
- Select Software Tools has upgraded C++ Designer, an MS-Windows-based
- object-oriented analysis and design tool, which includes Microsoft's MFC2 Library and
- the Borland Owl Library. Based on Rumbaugh's Object Modeling Technique (OMT),
- C++ Designer lets the user structure systems graphically, adding classes and
- relationships using the GUI. The most recent release allows users to select
- from classes available in the Microsoft library or from their own class
- library. Header files for C++ can be generated from the design and linked into
- the Visual C++ environment.
- Features of C++ Designer include: Windows CUA compliance; MDI interface; a
- data dictionary browsing capability; access control via user-ID; export
- facility to clipboard or to Windows metafile; project administration including
- versioning, backup, restore, and off-line; and multi-page printing. C++
- Designer also supports the object-oriented and C++ features including: single
- and multiple-class inheritance; class properties such as attributes and
- operations; associations including multiplicity, roles, qualifiers, and
- specialized forms such as aggregations; defining the access, virtual and
- static nature of attributes and operations, and the access and virtual nature
- of base classes; and storage of additional descriptive and constraint
- information for classes, attributes, operations, and associations.
- C++ Designer can be integrated with Borland's C++ and Turbo C++, and
- Microsoft's Visual C++. It can also be configured to integrate with other
- Windows IDEs. Code frames can also be generated from C++ Designer. Code is
- compatible with most C++ compilers including Borland, Microsoft, and Clarion
- TopSpeed.
- C++ Designer is $295 per user, including free technical support via a toll
- free number. Upgrades are sold separately. Users who have purchased C++
- Designer within the past three months will receive the MFC2 library at no
- cost. For more information contact Select Software Tools, Ltd., 1526
- Brookhollow Dr., #84, Santa Ana, CA 92705, (714) 957-6633; FAX: (714)
- 957-6219.
-
-
- Inmark Releases zApp Interface Pack
-
-
- Inmark Development Corporation has released the zApp Interface Pack, an add-on
- product to its C++ class libraries. zApp is a portable C++ Application
- Framework that gives application developers cross-platform portability through
- object-oriented C++ classes. zApp v2.0 is shipping on Windows, Windows NT, DOS
- Text, DOS Graphics, and OS/2.
- The zApp Interface Pack augments zApp by providing a set of additional classes
- that add graphical elements to applications. Classes in the zApp Interface
- Pack (ZIP) provide applications with toolbars and status lines, 3D custom
- controls, bitmap buttons, and a table object.
- Quoting Howard Love, President and CEO of Inmark, "Developers can now
- incorporate the most advanced features of modern applications with just a few
- lines of code. Toolbars, spreadsheet-like tables, and other high-level
- controls, which would normally require weeks or months of engineering time,
- are now available to developers with just a few lines of code."
- Demonstration software of zApp is available on the Inmark BBS. For more
- information contact Inmark Development Corporation, 2065 Landings Dr.,
- Mountain View, CA 94043, (415) 691-9000; FAX: (415) 691-9099; BBS: (415)
- 691-9990, CompuServe GO INMARK
-
-
- StratosWare Releases MemCheck for ANSI and K&R Platforms
-
-
- StratosWare Corporation has released an ANSI-compliant source code version of
- MemCheck, its error-detection tool for C/C++. MemCheck can be used for
- development projects built with ANSI or K&R compliant C/C++ compilers.
- MemCheck for ANSI and K&R can detect memory overwrites and underwrites, memory
- leaks, heap corruption, and out-of-memory conditions, and requires no source
- code changes for most projects. MemCheck integrates with existing C/C++ code
- and works at run time to identify errors by source file and line number. The
- error messages may be directed to the screen, written to log files, or sent in
- network or e-mail messages.
- MemCheck for ANSI and K&R includes source code and requires no debugging
- information and no special compilation options. Developers may ship
- applications with MemCheck linked in, royalty-free, allowing detection of
- errors at beta or customer sites. According to StratosWare, applications with
- MemCheck linked in run unchanged and at full speed. MemCheck may be switched
- on or off at run time, linked out via a "Production" library, or compiled
- completely out with no source code changes.
- MemCheck for ANSI and K&R is $199. StratosWare offers free technical support
- via fax, CompuServe, Internet, and a toll free number. For more information
- contact StratosWare Corporation, 1756 Plymouth Rd., Suite 1500, Ann Arbor,
- MI 48105, (313) 996-2944; FAX: (313) 747-8519.
-
-
- Quadralay Ships UDT for C/C++
-
-
- Quadralay Corporation has begun shipping UDT for C/C++ v1.2, an open UNIX
- development environment. UDT for C/C++ works with existing tools and code
- base, and acts as an incremental front end to existing compilers and other
- development tools. Features of UDT for C/C++ include: source code browsing,
- editing, project management, compiling, and prototyping for C++ classes and
- libraries.
- UDT for C/C++ v1.2 supports SPARC SunOS v4.1.x, SPARC Solaris v2.x, Intel SCO
- Open Desktop, Intel Solaris v2.x, HP 9000/700, and the RS/6000. UDT for C/C++
- v1.2 single and multiple licenses are $595 and $795 per seat respectively. A
- free, thirty-day evaluation with online documentation is available via
- anonymous ftp from ftp.quadralay.com (192.195.32.1) or by request. For more
- information contact Quadralay Corporation, 8920 Business Park Dr., Austin, TX
- 78759, (512) 346-9199; FAX: (512) 794-9997; e-mail: info@quadralay.com.
-
-
-
- Sector Seven Releases MakeMaster v2.6
-
-
- Sector Seven has released MakeMaster v2.6. MakeMaster (formerly DEPGEN) reads
- C source code and include files to create makefiles. Features of MakeMaster
- v2.6 include: support for C++, an optimized file search algorithm, flat
- dependency lists, relative directory references, and custom compiler command
- lines.
- MakeMaster v2.6 supports multiple disk and directory projects, cross-compilers
- and linkers, libraries, and DLLs. MakeMaster v2.6 runs under DOS v3.3+, and
- supports ANSI C and C++, Borland's Turbo Make, and Microsoft's NMake.
- MakeMaster v2.6 is $49.95. For more information contact Sector Seven, P.O. Box
- 11391, Burke, VA 22009, (703) 866-9477.
-
-
- Stewart, Dufour and Gossage Distributes Prominare
-
-
- Stewart, Dufour and Gossage Ltd. have begun distributing Prominare Inc.'s
- Prominare in North America. Prominare combines a graphically oriented GUI
- design and code generation tool with an integrated development environment and
- a programming editor. The GUI development part of Prominare supports PM
- controls including those for MMPM/2 and Pen-PM. Prominare supports C/C++
- compilers and CommonView. The GUI tool also supports direct creation of
- resource files as well as programming code.
- According to the company, the integrated development environment of
- Prominare eliminates most of the common mistakes encountered in trying to
- build OS/2 applications. Prominare's graphical interface supports
- compiler-independent specification of common options required for compiling
- and linking programs. The IDE works with the programming editor to manage
- compiler- and linker-detected errors.
- Prominare's programming editor is PM-based but performs, according to the
- company, like a character-based editor. The editor is integrated with the IDE
- and the GUI design tool and provides links to on-line help for the IBM
- Toolkits. The single-user and network versions of Prominare are $895 each.
- Additional workstation modules are $795. There are no run-time fees or
- royalties. For more information contact Stewart, Dufour & Gossage Ltd.,
- 210-1730 Courtwood Cr., Ottawa, Ontario, K2C 2B5, Canada, (613) 225-2121; FAX:
- (613) 225-2624; BBS: (613) 225-2968.
-
-
- DDC International A/S Introduces 1st OBJECT EXEC
-
-
- DDC International A/S has introduced 1st OBJECT EXEC, a C++-based real-time
- operating system aimed at industrial use. DDC-I regards C++ as a highly
- suitable candidate for safely solving control problems, and expects its
- run-time system to find uses from electronic measuring systems to life and
- safety-critical applications. According to DDC-I, their C++ run-time system
- can facilitate rapid development of complex systems and advanced
- communications with the switches, sensors, and motors used in new designs.
- For more information contact DDC International A/S, Gl. Lundtoftevej 1B,
- DK-2800 Lyngby, Denmark, +45 45 87 11 44; FAX: +45 45 87 22 17; Telex: 37704
- ddci dk.
-
-
- Omega Systems Ships VERSIONS
-
-
- Omega Systems has begun shipping VERSIONS, a version control system for
- Windows. VERSIONS provides version control support for programmers by managing
- multiple project resources including source code, documentation, or other
- files. VERSIONS supports varied file formats, including binary files, and
- doesn't limit the number or types of files which can be maintained. VERSIONS
- is network-compatible and supports multiple developers working on the same
- project, allowing storage of both temporary and permanent versions of a file.
- Eschewing the approach of storing incremental changes to files, VERSIONS
- stores previous versions of a file in a "master project," a proprietary format
- that labels, maintains, and tracks files for access to both the most current
- version and any past version. VERSIONS also tracks changes to a project either
- on a server or local workstation through the master project. Users check files
- into and out of the master project as needed and VERSIONS automates the
- process of determining which files need to be checked in or out. VERSIONS is
- $99. One copy of VERSIONS is required for each workstation in a networked
- installation. For more information contact Omega Systems, 5405 Alton Parkway,
- Suite 5A494, Irvine, CA 92714, (800) 458-5467 or (714) 253-6700; FAX: (714)
- 253-6712.
-
-
- EMS Professional Shareware Upgrades C/C++ Utility Library
-
-
- EMS Professional Shareware has upgraded its C/C++ Utility Library on CD-ROM.
- The library has over 1000 public domain and shareware products for programmers
- using C/C++, Microsoft C, and Turbo C. The products are compressed on 46 1.44-MB
- diskettes or one CD-ROM. All products in the library, and 150 commercial
- products, are described in an indexed database which accompanies the library.
- When a programmer needs to locate a particular type of C/C++ product, he or
- she can find it by vendor, name, type, release date, or free text search
- across descriptions. The C/C++ Utility Library contains a variety of files,
- including: Array, Binary Tree, Communication, Compression, Database, Debugger,
- Editor, Graphics, Linked List, Memory Management, MS Windows, MS Windows NT,
- Paradox Engine, Program Generator, Reference, Spreadsheet, String, TSR/ISR,
- Virus, and other types.
- The C/C++ Utility library is $59.50 on CD-ROM or $149 for the diskette
- versions. A subset of the C/C++ library containing 406 files for C++ only is
- available on 20 diskettes for $59.50. All products come with a 30-day,
- money-back guarantee. For more information contact EMS Professional Shareware,
- 4505 Buckhurst Ct., Olney, MD 20832, (301) 924-3594; FAX: (301) 963-2708;
- e-mail: eengelmann@worldbank.org.
-
-
- Network Dynamics Upgrades Internationalization Toolkit
-
-
- Network Dynamics has released the Internationalization Toolkit v3.0. The
- toolkit is used by programmers to simplify and expedite the
- internationalization/localization of software. Formerly known as "The String
- Externalization Tools," the toolkit has been expanded to include support for
- Windows and Presentation Manager string resource files, string length
- tracking, string file appending, an extraction "undo" utility, string
- replacement, handling of static initialization, and a graphical user
- interface.
- Version 2.0 features like string extraction, multi-byte Kanji support,
- software lifecycle support, string tagging, string caching, and string
- re-insertion have been expanded in version 3.0, with the intent of providing
- greater flexibility and ease of use. The user's manual has also been expanded.
- A "white paper" on the subject of software internationalization is available
- as an option. Sample programs are included to illustrate DOS code page
- swapping and keyboard manipulation.
- The Internationalization Toolkit is available in source code form for C/C++.
- Plans call for release of Turbo Pascal and Quick Basic versions in 1994. The
- Internationalization Toolkit supports DOS, OS/2, and UNIX platforms. The
- Internationalization Toolkit is $249.95 for a royalty-free source code license
- for up to 10 programmers. For more information contact Network Dynamics, (804)
- 220-8771; FAX: (804) 220-5741.
-
-
- Cadre Technologies Introduces Ensemble Viewer
-
-
- Cadre Technologies Inc. has released Ensemble Viewer, an interactive 2-D and
- 3-D graphical tool for visualizing C programs.
- Ensemble Viewer supports browsing of program flow, data structure, and
- physical file structures. Ensemble Viewer provides interactive views of key
- program aspects by displaying program information and test results that are
- stored in the Ensemble database. Ensemble Viewer lets the user interact with
- the software design, code, and files by viewing a graphical representation of
- the actual program structures. This interaction lets a user understand the
- program or see the impact of program changes without having to read through
- the source code. Ensemble Viewer is available for Sun SPARCstations, and the
- company planned a release on HP 9000 and IBM RS/6000 during the first
- quarter of 1994. Prices for Ensemble Viewer start at $2,400 depending on
- configuration. For more information contact Cadre Technologies Inc., 222
- Richmond St., Providence, RI 02903, (401) 351-5950; FAX: (401) 351-7380.
-
-
- TerraLogics Upgrades Mapping Software Toolkit
-
-
- TerraLogics, Inc. has upgraded TerraView, its geographic mapping software
- development kit. TerraView v4.0 lets developers embed geographic maps
- directly into the source code of Windows applications developed with
- Microsoft's Visual C++ and C/C++ v7, Borland's C++ v3.1, Gupta's SQL Windows,
- and Powersoft's PowerBuilder.
- Features of TerraView v4.0 include: raster data support, which lets developers
- superimpose TerraView data maps over aerial photographs or satellite images;
- style sheets, which are used to customize map displays; dynamic renditioning,
- which supports style changes for a single use of a map; and faster updating of
- mobile symbols for real-time tracking and display applications. TerraView v4.0
- lets map users access data from many database managers, including Gupta
- SQLBase, dBASE, and Oracle, as well as Defense Mapping Agency's Digital Chart
- of the World and the Census Bureau's Topologically Integrated Geographic
- Encoding and Referencing (TIGER) data.
- TerraView v4.0 is available for MS-DOS and Windows, Apple Macintosh, and UNIX
- systems from SUN, HP, and IBM. For more information contact TerraLogics, Inc.,
- 600 Suffolk St., Lowell, MA 01854, (508) 656-9900; FAX: (508) 656-9999.
-
-
-
- PractiSys Releases STORC GOLD v2.0
-
-
- PractiSys has released v2.0 of STORC GOLD, its Windows form conversion tool.
- STORC GOLD v2.0 lets Microsoft Visual C++ developers directly import Visual
- Basic .FRM files into C++ projects, reusing both form design and VBX controls.
- STORC GOLD v2.0 is $45.93 per copy per single user. Developers can download a
- fully-functional evaluation copy from CompuServe (GO WINSDK or GO MSBASIC), or
- can obtain a copy from PractiSys for $3. For more information contact
- PractiSys, 4767 Via Bensa, Agoura, CA 91301, (818) 706-8877.
-
-
- Computer Applications Specialists Introduces ProMet
-
-
- Computer Applications Specialists has introduced ProMet, a C/C++ metrics
- utility. ProMet measures program size, number of comments and comment density,
- McCabe's and other complexity measures, level of nesting, number of program
- jumps, and number of literals. These measures provide quantitative data for
- estimation of effort and conformance to programming style guidelines.
- In addition to calculation of metrics for each function and each module,
- ProMet provides program summary data and a "Top Ten" report that highlights
- the ten functions that have the highest metric scores. The Top Ten functions
- are presented both in a file list and as a graph to highlight the code
- sections that have the highest likelihood of having errors. The Top Ten list
- lets testing and review resources focus on code areas that are the most
- complex and may suggest the need for additional oversight.
- ProMet is $99 and comes with a manual that provides a background on software
- metrics and their use. For more information contact Computer Applications
- Specialists, 9948 Hibert St., Suite 103, San Diego, CA 92131, (619) 695-2600;
- FAX: (619) 695-0794.
-
-
- Digital Information Systems Corporation Ports PVCS to Alpha AXP
-
-
- Digital Information Systems Corporation and Digital Equipment Corporation have
- ported Intersolv's PVCS Version Manager v5 and PVCS Configuration Builder v5
- products to Digital's Alpha AXP. Under the terms of the agreement, DISC is
- porting, distributing, and providing technical support for PVCS products on
- operating systems not supported by Intersolv. DISC has completed ports to
- OpenVMS, VAX/VMS, OSF/1, and several UNIX-based platforms.
- Features of PVCS Version Manager v5 include: SQL support facilities, expanded
- file support, user-defined delta generation, and internationalization
- capabilities. Features of PVCS Configuration Builder v5 include: run-time
- diagnostics, build techniques, directives, and build script compilation
- capabilities.
- The price for PVCS Version Manager v5 starts at $599 for a 1-4 user license
- and Configuration Builder starts at $399. For more information contact Digital
- Information Systems Corporation, 11070 White Rock Rd., Rancho Cordova, CA
- 95670, (800) 366-3472 or (916) 635-7300; FAX: (916) 635-6549.
-
-
- Odyssey Development Releases ISYS Search Engine
-
-
- Odyssey Development, Inc., has released the ISYS Developers' Toolkit, which
- includes Odyssey's ISYS text retrieval software. Developers and OEMs can
- integrate the ISYS text retrieval engine into applications for CD-ROM
- authoring, electronic publishing, or document and image management. ISYS can
- access textual information stored in word processor and other files. ISYS can
- read 28 word processor formats, as well as some spreadsheet and database
- files. Documents to be searched remain in their native formats. The
- Developers' Toolkit also lets OEMs develop External Access Modules. ISYS can
- then access and index "foreign" data sources, such as text stored in a
- relational database.
- The ISYS engine is available for Microsoft Windows and MS-DOS. The ISYS engine
- can be called from many major languages. Sample code is provided for C,
- Pascal, and Visual Basic. For more information contact Odyssey Development,
- Inc., 650 S. Cherry St., #220, Denver, CO 80222, (303) 394-0091; FAX: (303)
- 394-0096.
-
-
- SLR Systems Upgrades OPTLINK
-
-
- SLR Systems Inc. has upgraded OPTLINK for Windows. OPTLINK v5.0 for Windows
- lets developers generate compressed executables (.EXE) and Dynamic Link
- Libraries (.DLL). According to the company, OPTLINK v5.0 for Windows can
- reduce file size by 50 percent. Compressed EXEs and DLLs generated by OPTLINK
- become self-loading and decompress automatically as segments are demanded.
- OPTLINK v5.0 supports Borland, Microsoft, and Symantec C++ compilers.
- Debugging support is included for Borland's Turbo Debugger and Microsoft's
- CodeView. As a DOS-hosted linker, OPTLINK is distributed in two forms, real
- mode and protected mode (DPMI).
- OPTLINK v5.0 for Windows is $350. Registered owners of v4.0 will receive the
- v5.0 update free, while other previous owners can purchase the update for
- $165. For more information contact SLR Systems, Inc., 1622 N. Main St.,
- Butler, PA 16001, (412) 282-0864; FAX: (412) 282-7965; BBS: (412) 282-2799.
-
-
- Segue Releases QA Partner for NT and OS/2
-
-
- Segue Software has released its QA Partner cross-platform GUI test tool for
- IBM's OS/2 and Microsoft NT. Using QA Partner, developers can test
- applications on either of these platforms, and the test scripts will be
- portable across the other platforms QA Partner supports, including Windows,
- Macintosh, VMS, and Motif.
- QA Partner for both IBM OS/2 and Microsoft NT is $1,495. For more information
- contact Segue Software, Inc., 1320 Centre St., Newton Centre, MA 02159, (617)
- 969-3771; FAX: (617) 969-4326.
-
-
- Intel Ships Plug and Play Kits
-
-
- Intel Corporation has begun shipping three Plug and Play Development kits. The
- kits are designed to help developers make PCs easier to configure. The
- Plug and Play Kits contain the software to perform the automatic configuration
- of new Plug and Play cards and Plug and Play-ready systems.
- The first kit, the "Plug and Play Kit for MS-DOS and Windows," includes a DOS
- driver, interface libraries, a configuration utility, a VHDL description for
- an ASIC implementation, and reference manuals. The second kit, the "Plug and
- Play BIOS Enhancements Kit," contains BIOS software that detects and
- configures PCI and Plug and Play ISA add-in cards. Software is available in
- source code form. The third kit, "The Plug and Play ISA Hardware Demo Kit,"
- contains a functional audio Plug and Play ISA demo card and Windows v3.1
- Virtual Device Drivers. The kit also includes diagnostics, board schematic
- files, and speakers. Bundled with this kit is the "Plug and Play Kit for
- MS-DOS and Windows."
- The Plug and Play ISA Hardware Demo Kit is $895. For more information contact
- Intel Literature Packet #F8PO1, P.O. Box 7641, Mt. Prospect, IL 60056, (800)
- 548-4725.
-
-
- We Have Mail
- Dear Mr. Plauger:
- There are a number of aspects of the current state of the art of software
- engineering that I find less than satisfactory. They are
- 1) Bug-infested tools
- 2) The dearth of productive engineering tools
- 3) The programming languages themselves.
- First of all, I have used Lattice C, Turbo C, and Microsoft and Borland C/C++
- compilers. Our group chose the latter two for their third-party and Windows
- support. First I used Lattice C 3.1. It didn't do floats but doubles seemed to
- work OK. Next I used Microsoft v5.0. It optimized code out of existence.
- Because of that we switched to Turbo C/C++. It refused to do assignments of
- floating point (float or double) literals in the range between 0 and 1!
- This problem occurred only when linking together 20-odd modules. In a
- three-module test program all worked fine. Due to project deadline pressure I
- was never able to whack away at the 20-odd modules to the point where the
- problem went away in order to determine its cause. I worked around it. This
- problem was also present in Borland C/C++ 3.0, but does not occur in version
- 3.1 with identical source code. I have seen two instances where Borland C/C++
- 3.1 handled the following snippet, in which the function returns the value 1,
- by not taking the if:
-     if ( func() )          // returns 1
-         do_something();    // never executed
- Changing this to the following worked, however:
-     status = func();
-     if ( status )
-         do_something();
- My first gripe about the state of the art is that compiler vendors seem to
- concentrate on features like IDEs, 32-bit versions (64-bit next, no doubt),
- Windows support, etc., while basic functionality is not solid. Floating point
- seems especially difficult for these people. (You'd think that after ten years
- of writing C compilers...) I really don't know if Borland C/C++ v3.1 has
- floating point problems; I just haven't seen any problems so far. Now if
- Borland and Microsoft are the least-buggy products on the market, I'd hate to
- see the others. I would love to be able to trust the compiler, but that seems
- to be wishful thinking. At least we have debuggers to go roto-rooting in the
- generated assembly code.
- More and more vendors are charging for phone support. Now I get to pay them
- for reporting bugs in their products. Borland actually told me to call their
- 900 number to discuss a bug! No thanks! Along with the bug situation, I wish
- compiler error and warning messages were more accurate. In many cases I have
- to ask myself, "Now what does it really mean?" I use PC-LINT religiously. To
- the compiler vendors' credit, error and warning messages have been getting
- better in recent years.
- Now let's add C++ on top of this whole mess. I enjoyed your article,
- "Programming Language Guessing Games" in Dr. Dobb's (October 1993). I agree of
- course with your suggestion to "pick a subset of the [C++] language that
- minimizes surprises, learn it well, and don't stray from it." The problem is
- that the compiler vendor chooses to support everything that looks like it will
- make it through the ANSI committee. If they can't get some of these simple
- things solid (like the function return value example above) how can I trust
- them to do some of the complicated things you have described for C++? Even if
- I do restrict myself to a subset of the language, it could be that the other
- features induce bugs in the parts that I am using. The bottom line is that the
- majority of our time is spent on maintenance. Compiler bugs cost us.
- Secondly, the support tools available do not help me with software engineering (as
- opposed to hacking which I define as, "code it first without thought of
- design, document it later") as I would like. Diagrams are a neat idea, but
- then we have to translate the diagram into code by hand and translation is
- where errors creep in. Management might go for the time spent on diagramming
- if automatic code generators were available. Code generators for Windows
- screens and other user interfaces are generally available, but I haven't seen
- any for other parts of a program. A framework generator would be nice.
- Hardware engineers have schematic capture and printed-circuit-board layout
- tools including auto routers, but as software engineers, we do analogous
- things by hand. Also, I've yet to see a decent cross-reference generator after
- evaluating several. The ones with usable output are way too slow. I use GNU
- CTAGS, but wish it did more. I know of no general purpose, highly configurable
- cross-reference generator. We have written our own programs and editor macros
- to insert comment blocks for functions and modules and for extracting those
- into a Word document, but so much more could be done. I am beginning to write
- add-on tools for Codewright, a very extensible Windows programmer's editor.
- Thirdly, despite some ugly features of C, I like it a lot (more than Pascal,
- Modula-3, Eiffel, etc.). For example, C overloads the static and void
- keywords, allows functions to return pointers to automatic variables, and
- various other gotchas mentioned in Andrew Koenig's C Traps and Pitfalls.
- (Being able to execute an array of opcodes is a feature?) But for the sadist,
- C++ is a real treat. Now we have not two but three meanings of static and not
- two but four meanings of void!
- C and C++ violate many of the accepted guidelines of language design. Many
- things can be done several different ways, much of it behind the scenes. I
- shudder whenever I read one of your articles concerning C++, your latest in
- The C Users Journal on the Standard C++ library included (P.J. Plauger,
- "Standard C: Developing the Standard C++ Library," CUJ, October 1993). I know
- that I cannot avoid learning C++, but hopefully it will be replaced soon with
- something simpler. We need new paradigms and metaphors to handle advanced
- concepts. Languages will have to become more complicated, but they should be
- as simple as possible. I like the approach taken by Meyer in Eiffel of
- including an extensive set of libraries. The less I have to write myself, the
- more productive I am. Hopefully, libraries supplied with compilers are well
- thought-out and bug-free. I wish Borland C/C++ came with the library support
- that Eiffel or NextStep Objective-C does. A collection of third-party classes
- or libraries does not have the consistency that a single-sourced library does.
- I have only touched the surface of some current problems in the software
- industry. What I would like to know is:
- 1) Which in your experience are the least-buggy C/C++ compilers?
- 2) Do you know of a diagrammer w/code generator? What other engineering tools
- have you found to be helpful?
- 3) Do you know of any cross-reference generator that outputs a list of
- identifiers including the module name, line number, and function (if any) they
- were found in, and that is fast?
- 4) What do you think of Modula-3, Eiffel, and IBM's Object REXX? Thank you in
- advance for your help. Thanks for your excellent articles. Please keep them
- coming.
- BRUCEDICKEY@VAX.MICRON.COM
- Bruce Dickey
- Micron Semiconductor, Inc.
- 2805 E. Columbia Rd., MS 892
- Boise, ID 83706
- Whew! You raise a whole slew of issues. I agree with you almost across the
- board, except that I am a bit more sympathetic to the plight of all those
- compiler vendors out there scrabbling for market share in a turbulent
- industry. I guess that comes from selling compilers myself for ten years.
- Growing complexity seems always to offset gains in our ability to manage
- software development and reduce shipped bugs.
- I don't know how to answer any of your questions at all well, but I invite
- other readers to chip in. My opinion of Modula-3 and Eiffel, for what it's
- worth, is fairly simple. Both offer interesting combinations of features, but
- neither has quite pulled off the synergy of C or C++. I don't know enough
- about REXX to comment. -- pjp
- Dear Editor,
- My compiler issues an error diagnostic on the call to g in the code fragment
- shown in Listing 1, but no error on the call to f. The error claims that there
- is a pointer mismatch in the argument. I am using Digital's DEC C compiler
- under OpenVMS on their new Alpha processor. When I contacted DEC's customer
- service, they explained that the ANSI standard allows conversion from a
- pointer to a qualified type to a pointer to a non-qualified type, but does not
- allow 'ignoring' the qualifier for a pointer to a pointer to a type. Their
- argument seemed somewhat niggling to me, so I wanted to appeal to a higher
- authority. How do you think a compiler should handle this code (shown in
- Listing 1)? Sincerely,
- Dick Hile
- T and B Computing, Inc.
- 24 Frank Lloyd Wright Dr.
- P.O. Box 302 - Lobby A
- Ann Arbor, Michigan 48106-0302
- DEC is correct. The distinction may appear niggling to you, but we meant to do
- it that way when we wrote the C Standard. -- pjp
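- A short sketch shows why the Standard draws the line where DEC says it does. If
- char ** converted silently to const char **, a function could store the address
- of a const object through the converted pointer, and the caller could then
- modify that object through its plain char * with no cast anywhere. The names
- below are illustrative, not taken from Listing 1, and the forbidden sequence
- appears only inside a comment.

```c
#include <string.h>

/* A const object that must never be written. */
static const char message[] = "read-only";

/* Stores the address of a const object through its argument.
   This is legal, because *cpp has type const char *. */
void point_at_message(const char **cpp)
{
    *cpp = message;
}

/* If char ** converted silently to const char **, this would compile:
 *
 *     char *p;
 *     point_at_message(&p);   (char ** -> const char **: diagnosed)
 *     p[0] = 'R';             (writes into a const object)
 *
 * and the const qualifier on 'message' would be defeated with no cast
 * in sight. The remedy is to declare the pointer const char * from the
 * start, so that no conversion is needed. */
size_t length_via_pointer(void)
{
    const char *p = NULL;
    point_at_message(&p);   /* const char ** -> const char **: fine */
    return strlen(p);
}
```

- For Listing 1, the corresponding fix is to declare xp as const char * (and xpp
- as const char **). A cast on the call to g would silence the diagnostic, but it
- would also reopen the hole.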
- Dear Mr. Plauger,
- I would like to add my two cents' worth to Mr. Mike Musielski's comments (CUJ,
- October 1993) on the nature of C as a language. ("Is C a language?" was his
- opening question.) The short answer to that question is, no, C is not a
- language. Neither is any programming language. The programming-as-writing
- analogy is interesting, but like most analogies it is partly true and partly
- dangerous.
- A programming language is really a notation system: defined narrowly,
- conceived as a formal means of expression, and limited in its range of
- expressiveness, at least when compared with natural languages. No programming
- language is used as an end in itself (except of course by participants in
- Obfuscated C contests), whereas a natural language is used that way all the
- time by artists whose only goal it is to explore the language's poetic
- possibilities. This is the main reason why programs, no matter how short,
- cannot be compared to short stories or other forms of fiction.
- Nevertheless, there is a basis for the analogy. Like natural languages,
- programming languages do evolve and adapt themselves to new situations and
- conditions. And as they do, new vocabulary creeps in, functions begin to
- overlap, and things begin to get -- well, messy. Even standards cannot clear
- up all ambiguities: there will always be more than one way to skin a cat. You
- can say "pass away" or "kick the bucket," just as you can use strtol or atol.
- Both elements in each pair approximate, but do not exactly map, the same idea.
- How are you to recognize when it is appropriate to use one and not the other?
- This is what Mr. Musielski wants to know.
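- The strtol/atol pair illustrates the letter's point well. Both turn a string into
- a long, but only strtol reports where conversion stopped and sets errno to
- ERANGE on overflow. By contrast, atol has undefined behavior if the result cannot
- be represented. Here is a minimal sketch of a checked conversion, using a
- hypothetical wrapper name:

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 1 and stores the value if 's' is entirely a decimal long
   (at least one digit, no trailing characters, no overflow);
   returns 0 otherwise and leaves *out unchanged. */
int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}
```

- By contrast, atol("12x") quietly returns 12, where parse_long rejects the string.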
- There are some useful texts out there. I suggest Mr. Musielski look at (again,
- if not for the first time) The C Programming Language, by Kernighan and
- Ritchie. Another book I have found useful on occasion is C Lab Notes, by
- Flanders and Holmes. This book has a nice feature in that each chapter has at
- its head a list of the tasks that are to be explored inside ("forward and
- reverse scrolling," "sending a message to another node," etc.). And, not
- least, The C Users Journal is itself very instructive in matters of language
- use. None of these texts is consciously "literary," but all take the implicit
- stance that a programming language is an idiom to be mastered. Mastering the
- language (your "material") is the goal of literary writing as well.
- I think what might be very beneficial to Mr. Musielski (and others, myself
- included) would be a kind of reverse dictionary: a list of common tasks along
- with the range of functions, procedures, tools, etc. that could be used to
- fulfill each task's requirement. Such a book could not possibly be exhaustive
- and the organizational problems would be daunting, but a good text like this
- might be useful enough to be a starting point. Of course, as writers learn
- from writing, programmers learn best from programming. There is still no
- effective substitute for trial and error--many trials, many errors.
- Sincerely,
- Michael Nichols
- 1725 York Avenue
- New York, New York 10128
- Thanks for your insightful comments. -- pjp
- Editor:
- I found Robert Watson's "DMA Controller Programming, in C" (CUJ, November
- 1993) interesting, particularly of course the protected-mode details, but
- found his conclusion, "the technique is actually very easy to use" laughable,
- considering what he had just painstakingly documented. "Virtually impossible"
- is more like it. I agree with his further conclusion: that DMA is increasingly
- obsolete, since fast CPUs deal with dedicated card memory blocks more
- expeditiously (and of course the implication that such memory is relatively
- cheap in modern times -- DMA was originally a cost as well as speed feature).
- I also think along the way he erroneously minimized the difficulties involved
- in getting hold of a DMA buffer in real mode ("in real mode, generating a DMA
- buffer is relatively easy"). I hope I'm missing something, but in my source
- where I tried to accomplish this there are three or four failed attempts
- commented-out, and I'm not at all happy with my current effort, shown in
- Listing 2.
- If anybody knows how to do this rationally, stop me before I code again! The
- crude strategy here is to keep getting buffers, and if they're no good --
- won't do DMA because they cross a physical boundary -- retain the offending
- part of the buffer and keep trying; and then when we get a good one, free all
- the chunks accumulated along the way. This approach relies on totally
- unreliable assumptions about memory allocation, and really comes down to a
- "keep trying until you give up" approach.
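- The arithmetic behind Listing 2 is simple. A real-mode physical address is
- segment * 16 + offset. A buffer is unusable for the PC's 8-bit DMA channels
- if its first and last bytes fall in different 64K pages. The portable sketch
- below performs both calculations. The function names are hypothetical, and plain
- integers stand in for Turbo C's FP_SEG and FP_OFF.

```c
/* Physical address of a real-mode segment:offset pair. */
unsigned long phys_addr(unsigned int seg, unsigned int off)
{
    return ((unsigned long)seg << 4) + (unsigned long)off;
}

/* Returns 1 if the 'size'-byte buffer starting at physical address
   'phys' crosses a 64K boundary, 0 if it lies within one 64K page. */
int crosses_64k(unsigned long phys, unsigned long size)
{
    unsigned long offset_in_page = phys & 0xFFFFUL;
    return offset_in_page + size > 0x10000UL;
}
```

- This version accepts a buffer whose last byte is the final byte of a page. The
- >= test in Listing 2 rejects that case as well, which is merely conservative.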
- Operating systems in 80x86 land get to own low memory, where it is relatively
- easy to set up a good DMA buffer. Programs, on the other hand, can be loaded
- at any address, and can't make any such arrangements except, as far as I can
- figure out, by using awful code like the example.
- And incidentally, who owns the VDS (Virtual DMA Services) interface mentioned
- in the article? I have a vague idea how one goes about getting in touch with
- DPMI; is VDS one of these things that comes with DOS extenders and/or Windows
- or what?
- j.g. owen
- Software engineer
- 31 Darby Drive
- South Huntington, NY 11746
- cis 71121,625
- To the Editor:
- Concerning the letter from Lawrence H. Hardy in the 11.12 issue of CUJ, page
- 136, asking for information about PCX graphics:
- And I thought my memory was going... Actually, this might instead be seen as
- an illustration of the need for a CUJ index -- depending on one's perspective
- of course. At any rate, whilst fumbling through a stack of old Journals
- seeking edification on a particular coding problem, I came across Vol. 9, No.
- 8 (August, 1991). Emblazoned across the cover was "PCX GRAPHICS DOCUMENTED!"
- And sure enough, there on page 89 was a decent discussion of the topic by one
- Ian Ashdown, complete with bibliography.
- I would also like to join Ian Somerton (p. 127) in expressing concern, if not
- alarm, at the exponential increase in C++ coverage of late. Perhaps someone on
- your staff misread it as C**?
- Sincerely,
- Scott Swanson
- Box 75
- Pendroy MT 59467
- I am notorious within R&D for my instant amnesia, once an issue goes out the
- door. You're right that an index to CUJ is indispensable, and the folks back
- in Kansas have anticipated your wish. As for C++ coverage, that happens to be
- an area of intense interest in our community at the moment. We haven't
- forgotten about C by any means, but the mix of what we publish is strongly
- influenced by the mix of what's proposed to us in the way of articles. -- pjp
-
- Listing 1 Creates "pointer mismatch" error message
- char x;
- char *xp = &x;
- char **xpp = &xp;
- void f(const char *cp)
- {}
- void g(const char **cpp)
- {}
- void a(void)
- {
- /* Call to 'g' gets pointer mismatch error */
- f(xp);
- g(xpp);
- }
-
- /* End of File */
-
-
- Listing 2 Attempts to allocate a real-mode DMA buffer
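- #include <stdlib.h> /* free(), NULL */
- #include <dos.h> /* Turbo C: FP_SEG(), FP_OFF() */
-
- #define BYTE char
- typedef unsigned int WORD;
- typedef unsigned long BIG;
-
- void *mustalloc(WORD size); /* author's malloc wrapper, not shown */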
-
- #define JUNKCHUNK (128)
- /* how we will search for physical
- boundaries. (Must be >= sizeof(JUNK))
- */
-
- /* produce a big physical address. This
- is "large"-model code; "FP_SEG" etc. are
- Turbo C macros which extract the two
- parts of an 80x86-style segment-offset
- pointer. */
-
- #define PHYSICAL(x) ((BIG)((((BIG)(FP_SEG(x)))<<4)+((BIG)FP_OFF(x))))
-
- typedef struct JK {
- struct JK *next;
- } JUNK;
-
- _f void *mustallocdma(WORD size) {
- BYTE *a;
- JUNK root, *next, *p;
- BIG leftover;
- WORD junksize;
-
- root.next=NULL;
- next=&root;
-
- while(1) {
- a=mustalloc(size);
- /* mustalloc==malloc,
- but aborts if failure;
- the function exits this
- way or via the break below
- at success.*/
- if (
- (leftover=(PHYSICAL(a) & 0xffff)) +
- ((BIG) size)
- >= 0x10000) {
- /* it's no good. */
- if (
- (junksize = 0x10000-leftover)
- < JUNKCHUNK)
- junksize=JUNKCHUNK;
- /* avoid endless
- teeny-tiny manipulations. */
-
- free(a);
- next->next=mustalloc(junksize);
- next=next->next;
- next->next=NULL; /* terminate the debris list */
- }
- else
- break; /* success! */
- }
- /* free debris. */
- next=root.next;
- while(next) {
- p=next;
- next=next->next;
- free(p);
- }
- return a;
- }
-
- /* End of File */
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Expanding a Conversation Processor for Time
-
-
- Russell Suereth
-
-
- Russell Suereth has been consulting for over 12 years in the New York City and
- Boston areas. He started designing and coding systems on IBM mainframes and
- now also builds PC software systems. You can write to Russell at 84 Old
- Denville Rd, Boonton, NJ 07005. Russell Suereth is now completing a book, to
- be published this summer by R&D Technical Books, entitled Processing Human
- Conversations. He has extended his basic processor to deal with more complex
- issues such as idioms, generating questions, and identifying general themes.
-
-
- This article expands the language processor presented in "A Natural Language
- Processor," CUJ, April 1993, to include time. That processor could accept
- input sentences from a user, and within certain limited contexts, accurately
- interpret their meaning. The processor can now recognize time words and
- phrases to process more kinds of input sentences. I've included additional
- processes for tense and number in the article. These additional processes help
- clarify the meanings of ambiguous sentences, and generate an error response
- when tense or number don't agree.
- The code presented here is the part of the language processor that deals with
- time. The complete processor is not shown here, but it is available on the
- code disk.
-
-
- Defining Time Words and Phrases
-
-
- Time phrases are sequences of words that clarify when an event occurs. Without
- a time phrase, the only way to tell whether an event occurs in the past,
- present, or future is to examine the tense of the sentence in which the event
- occurs.
- Time phrases identify events by specific time, elapsed time, or habitual
- actions. Specific time is indicated by the day, month, year, or clock time.
- The sentence "Jim runs at one o'clock" identifies the specific time Jim runs.
- Elapsed time is indicated by a duration. The sentence "Jim runs for one hour"
- identifies the length of time Jim runs. Habitual actions are indicated by
- words such as "each" and "every." The sentence "Jim runs every day" identifies
- the action Jim often performs.
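- In the simplest cases, the three kinds of time phrase can be told apart by their
- first word. The sketch below is hypothetical: the processor itself works from
- underlying structures and the time-word table, not from this function. Following
- the examples above, it assumes that "for" signals elapsed time and that "each" or
- "every" signals a habitual action. Anything else is treated as a specific time.

```c
#include <string.h>

enum time_kind { SPECIFIC_TIME, ELAPSED_TIME, HABITUAL_ACTION };

/* Classify a time phrase such as "at one o'clock", "for one hour",
   or "every day" by its first word only. */
enum time_kind classify_time_phrase(const char *phrase)
{
    size_t n = strcspn(phrase, " ");   /* length of the first word */

    if (n == 3 && strncmp(phrase, "for", 3) == 0)
        return ELAPSED_TIME;
    if ((n == 4 && strncmp(phrase, "each", 4) == 0) ||
        (n == 5 && strncmp(phrase, "every", 5) == 0))
        return HABITUAL_ACTION;
    return SPECIFIC_TIME;
}
```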
- A time word is a word that clarifies when an event occurs. The word may stand
- on its own, or be part of a time phrase. Words such as "Tuesday," "midnight,"
- and "morning" are time words, but the words "at," "in," and "last" are also
- time words when used in a time phrase. The processor identifies time words by
- matching the input sentence to an underlying structure.
-
-
- Identifying Time Words and Phrases
-
-
- Listing 1 contains the code for this article. The main routine calls the
- check_underlying and check_time routines to identify time words in the input
- sentence. The check_underlying routine matches the input sentence to the
- underlying structures. If the match is successful, then the sentence is
- processed. Time words are words that match an underlying structure and have a
- TIME word type. The processor recognizes adjacent time words as a time phrase
- and assigns the value TIMEPHRASE to these words. The check_time routine copies
- the time phrase into the times array. The times array contains a time phrase
- entry for each input sentence.
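- The copying step can be sketched as follows, assuming parallel arrays of words
- and phrase tags like word_array and phrases in Listing 1. The function name and
- its parameters are hypothetical, and check_time on the code disk may differ. A
- destination of 31 characters would match one row of the times array declared in
- Listing 1.

```c
#include <string.h>

/* Join every word tagged "TIMEPHRASE" into dest (capacity cap bytes),
   separated by single spaces. Copying stops at the first word that
   would not fit, so dest is always a valid, terminated string. */
void collect_time_phrase(const char *words[], const char *tags[],
                         int nwords, char *dest, size_t cap)
{
    int i;
    size_t len = 0;

    if (cap == 0)
        return;
    dest[0] = '\0';
    for (i = 0; i < nwords; i++) {
        size_t wlen, sep;

        if (strcmp(tags[i], "TIMEPHRASE") != 0)
            continue;
        wlen = strlen(words[i]);
        sep = (len > 0) ? 1 : 0;
        if (len + sep + wlen + 1 > cap)
            break;                              /* no room left */
        if (sep)
            dest[len++] = ' ';
        memcpy(dest + len, words[i], wlen + 1); /* includes the '\0' */
        len += wlen;
    }
}
```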
-
-
- Deriving Time Meaning
-
-
- The processor derives time meaning from specific time words in the input
- sentence. Some time words and their meanings are shown in Table 1. When one of
- these words occurs in a sentence and indicates time, that word's time meaning
- is assigned to the sentence.
- The main routine calls derive_time_meaning, which looks at the input sentence's
- time phrase. If a time phrase word matches a coded time word, then that word's
- time meaning is assigned to the time_meaning array. The time_meaning array
- contains a time meaning entry for each input sentence.
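- Table 1 maps naturally onto a lookup table. This sketch reuses the time-meaning
- codes defined at the top of Listing 1, and the struct and function names are
- hypothetical. The two-word entry "up to" is stored as a single string, so a caller
- would have to try adjacent word pairs as well as single words.

```c
#include <string.h>

/* Time-meaning codes, as defined in Listing 1. */
#define POINT_OF_TIME   100
#define DAY             101
#define PERIOD_OF_TIME  102
#define DURATION        103
#define BEFORE          104
#define AFTER           105
#define SINCE           106
#define BETWEEN         107
#define UNTIL           108
#define LAST            111
#define NEXT            112
#define EACH            113
#define EVERY           114
#define ALL             115

struct time_word { const char *word; unsigned char meaning; };

/* Table 1 expressed as data. */
static const struct time_word time_words[] = {
    {"at", POINT_OF_TIME}, {"on", DAY}, {"in", PERIOD_OF_TIME},
    {"for", DURATION}, {"before", BEFORE}, {"by", BEFORE},
    {"after", AFTER}, {"since", SINCE}, {"between", BETWEEN},
    {"until", UNTIL}, {"up to", UNTIL}, {"last", LAST},
    {"next", NEXT}, {"each", EACH}, {"every", EVERY}, {"all", ALL}
};

/* Return the time meaning for 'word', or 0 if it is not a time word. */
unsigned char lookup_time_meaning(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof time_words / sizeof time_words[0]; i++)
        if (strcmp(time_words[i].word, word) == 0)
            return time_words[i].meaning;
    return 0;
}
```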
-
-
- Interpreting Auxiliary Meaning
-
-
- Many sentences contain auxiliary phrases which qualify the certainty of a
- particular event's occurrence. Example phrases are "could be" and "must have
- been." The auxiliary phrase helps the processor understand the sentence. On
- the other hand, the sentence can't be fully understood if the auxiliary
- meaning is unclear. Unclear auxiliary meaning occurs when the sentence has no
- auxiliary, when the auxiliary has more than one meaning, or when the auxiliary
- meaning is ambiguous. If the auxiliary meaning is unclear, the processor can
- use the auxiliary and sentence tense to determine a clear meaning. Table 2
- shows unclear auxiliary and tense combinations with their clear meaning.
-
-
- Handling No Auxiliary
-
-
- The auxiliary meaning is unclear when the sentence has no auxiliary. Sentences
- with no auxiliary have an implied auxiliary meaning. The processor uses the
- sentence tense to identify the implied meaning. For example, the sentence "Jim
- ran in the race" has no auxiliary and is past tense. That sentence is similar
- to another sentence with past tense "Jim had run in the race." The auxiliary
- "had" means a particular point of time. When an input sentence has no
- auxiliary and is past tense, the processor assigns the particular point of
- time meaning to the sentence. In another example, the sentence "Jim runs in
- the race" has no auxiliary and is present tense. That sentence is similar to
- another sentence with present tense "Jim is running in the race." The
- auxiliary "is," used in the present tense, means limited duration. When an
- input sentence has no auxiliary and is present tense, the processor assigns
- the limited duration meaning to the sentence.
- The main routine calls derive_aux_meaning to derive a clear auxiliary meaning
- from an unclear meaning. The derive_aux_meaning routine looks at the
- auxiliaries entry for the sentence. If the auxiliary's string length is zero,
- then the sentence has no auxiliary and the routine looks at the tenses entry.
- If the tenses value is PAST, then PARTICULAR_POINT_OF_TIME is assigned to the
- auxiliary meaning. If tenses is PRESENT, then LIMITED_DURATION is assigned to
- the auxiliary meaning. If tenses is FUTURE, then the processor assigns
- FIXED_PLAN to the auxiliary meaning.
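- The no-auxiliary rule is a three-way branch on tense. In this sketch, the
- function name is hypothetical, and the constant values are placeholders for
- the ones natural.h defines (the header is on the code disk and not shown here).

```c
/* Placeholder codes; natural.h on the code disk defines the real ones. */
enum { PAST = 1, PRESENT, FUTURE };
enum { NO_MEANING = 0, PARTICULAR_POINT_OF_TIME = 200,
       LIMITED_DURATION, FIXED_PLAN };

/* Implied auxiliary meaning for a sentence with no auxiliary.
   Returns NO_MEANING if the sentence does have an auxiliary
   or the tense is unrecognized. */
int implied_aux_meaning(const char *auxiliary, int tense)
{
    if (auxiliary[0] != '\0')
        return NO_MEANING;          /* has an auxiliary: rule not used */
    switch (tense) {
    case PAST:    return PARTICULAR_POINT_OF_TIME;
    case PRESENT: return LIMITED_DURATION;
    case FUTURE:  return FIXED_PLAN;
    default:      return NO_MEANING;
    }
}
```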
-
-
- Handling Ambiguous Auxiliaries
-
-
- An ambiguous auxiliary is also potentially confusing to the processor. An
- example is the auxiliary "could be" in the sentence "Jim could be running in
- the race." In the present tense, "could be" has two meanings. That sentence in
- the present tense means "Jim is able to run in the race," or "Jim is permitted
- to run in the race." The auxiliary meaning remains ambiguous because "could
- be" has two meanings in the present tense. In the future tense, "could be" has
- one meaning. That sentence in the future tense means "It is possible that Jim
- will run in the race." The auxiliary meaning is clarified because "could be"
- has only one meaning in the future tense.
- The derive_aux_meaning routine looks for specific ambiguous auxiliaries in the
- sentence's auxiliaries entry. If a specific ambiguous auxiliary is found, then
- the routine assigns a clear meaning to the sentence's auxiliary meaning. The
- routine assigns a clear meaning based on the value in the sentence's tenses
- entry.
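- One way to organize these checks is to encode Table 2 as rows of auxiliary,
- tense, and meaning, with a wildcard tense for the "any" entries. In this sketch,
- a combination missing from the table, such as "could be" in the present tense,
- returns AMBIGUOUS, which is the cue to ask the user. PROBABILITY, CHARACTERISTIC,
- and LOGICAL_NECESSITY match Listing 1. All other values and all names are
- placeholders.

```c
#include <string.h>

/* Tense codes and most meaning codes are placeholders. */
enum { ANY_TENSE = 0, PAST, PRESENT, FUTURE };
#define AMBIGUOUS                 0
#define PARTICULAR_POINT_OF_TIME  200
#define LIMITED_DURATION          201
#define FIXED_PLAN                202
#define POSSIBILITY               203
#define OBLIGATION                204
#define PROBABILITY               220   /* from Listing 1 */
#define CHARACTERISTIC            221   /* from Listing 1 */
#define LOGICAL_NECESSITY         222   /* from Listing 1 */

struct aux_rule { const char *aux; int tense; int meaning; };

/* Table 2; "" means the sentence has no auxiliary. */
static const struct aux_rule aux_rules[] = {
    {"", PAST, PARTICULAR_POINT_OF_TIME},
    {"", PRESENT, LIMITED_DURATION},
    {"is", PRESENT, LIMITED_DURATION},
    {"is", FUTURE, FIXED_PLAN},
    {"could", FUTURE, POSSIBILITY},
    {"could be", FUTURE, POSSIBILITY},
    {"may be", PAST, POSSIBILITY},
    {"may be", PRESENT, POSSIBILITY},
    {"may have", PAST, POSSIBILITY},
    {"may have", PRESENT, POSSIBILITY},
    {"may have been", PAST, POSSIBILITY},
    {"may have been", PRESENT, POSSIBILITY},
    {"would", PAST, CHARACTERISTIC},
    {"would have", PAST, CHARACTERISTIC},
    {"would have been", PAST, CHARACTERISTIC},
    {"would be", ANY_TENSE, PROBABILITY},
    {"must have been", ANY_TENSE, LOGICAL_NECESSITY},
    {"must", ANY_TENSE, OBLIGATION},
    {"must be", ANY_TENSE, OBLIGATION}
};

/* Clear meaning for an auxiliary/tense pair, or AMBIGUOUS. */
int clear_aux_meaning(const char *aux, int tense)
{
    size_t i;

    for (i = 0; i < sizeof aux_rules / sizeof aux_rules[0]; i++)
        if (strcmp(aux_rules[i].aux, aux) == 0 &&
            (aux_rules[i].tense == ANY_TENSE ||
             aux_rules[i].tense == tense))
            return aux_rules[i].meaning;
    return AMBIGUOUS;
}
```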
-
-
- Asking for Specific Meaning
-
-
- The processor can clarify ambiguous auxiliary meaning when certain tenses are
- used. But when the auxiliary meaning remains ambiguous, then the processor
- generates a response that asks for more specific meaning. Figure 1 shows a
- processor session with ambiguous auxiliary meaning in the input sentence.
- The make_response routine calls ask_meaning to generate a response that asks
- about ambiguous auxiliary meaning. The ask_meaning routine looks for specific
- auxiliaries in the sentence's auxiliaries entry. If a specific auxiliary is
- found, then the routine generates a response. The routine also looks at the
- sentence's tenses and numbers entries. These entries help identify the
- appropriate auxiliaries that can be used to make the response grammatical.
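- For the "can" case in Figure 1, the response is a template filled with words
- from the sentence, and the number of the subject chooses between "is" and "are".
- The sketch below uses a hypothetical function name. It calls the C99 snprintf
- to bound the buffer, whereas the listing itself uses strcpy and strcat.

```c
#include <stdio.h>

/* Build the clarifying question for an ambiguous "can", as in
   Figure 1. A nonzero 'plural' selects "are" instead of "is".
   Returns the length the full question needs (snprintf's result). */
int ask_can_meaning(char *buf, size_t size, const char *subject,
                    const char *verb, int plural)
{
    const char *be = plural ? "are" : "is";

    return snprintf(buf, size,
                    "Do you mean %s %s able to %s, or %s %s permitted to %s",
                    subject, be, verb, subject, be, verb);
}
```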
-
-
- Error Handling
-
-
- Time phrases can refer to an event in the past, present, or future. This time
- reference must match the past, present, or future tense of the sentence. If
- the time and tense conflict, then the sentence doesn't make sense and is
- ungrammatical. An example is the sentence "Jim had run next week." The time
- and tense don't match because "had run" refers to the past, while "next week"
- refers to the future.
- The make_response routine calls check_agreement to identify agreement errors
- in the input sentence. The check_agreement routine looks at the time_meaning
- and tenses entries for the input sentence. If the time_meaning entry is LAST,
- and the tenses entry is PRESENT or FUTURE, then agreement_error is called to
- generate an agreement error response. If the time_meaning entry is NEXT, and
- the tenses entry is PAST, then agreement_error is called for the agreement
- error response. The check_agreement routine also calls agreement_error when
- either the sentence number or the sentence tense is in error.
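- The time-and-tense part of check_agreement reduces to two comparisons. This
- sketch uses the LAST, NEXT, and TIME_MEANING_ERROR codes from Listing 1 with
- placeholder tense values. It returns the error type at the point where the real
- routine would call agreement_error. The function name is hypothetical.

```c
/* From Listing 1. */
#define TIME_MEANING_ERROR 211
#define LAST 111
#define NEXT 112

/* Placeholder tense codes; natural.h defines the real ones. */
enum { PAST = 1, PRESENT, FUTURE };

/* Return TIME_MEANING_ERROR if the time phrase and the tense
   disagree, or 0 if they are compatible. */
int time_tense_conflict(int time_meaning, int tense)
{
    if (time_meaning == LAST && (tense == PRESENT || tense == FUTURE))
        return TIME_MEANING_ERROR;   /* "Jim will run last week" */
    if (time_meaning == NEXT && tense == PAST)
        return TIME_MEANING_ERROR;   /* "Jim had run next week" */
    return 0;
}
```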
- The processor generates an error response for three kinds of grammar conflicts
- in the input sentence. A grammar conflict between time meaning and auxiliary
- meaning is shown in the sentence "Jim will run last week." A grammar conflict
- between auxiliary tense and verb tense is shown in the sentence "Jim will ran
- in the race." A grammar conflict among subject, auxiliary, and verb number is
- shown in the sentence "Jim are running in the race." The explanation response
- gives a detailed reason why the input sentence is ungrammatical. Figure 2
- shows the explanation responses for the above ungrammatical input sentences.
- The check_agreement routine calls agreement_error to generate the explanation
- response. The agreement_error routine is passed a value in the error_type
- parameter that identifies the kind of grammar conflict. If the error_type is
- TIME_MEANING_ERROR, then the time and auxiliary meaning conflict and the
- routine generates an appropriate explanation response. If the error_type value
- is TENSES_ERROR, then the auxiliary and verb tense conflict and an explanation
- response is generated. If the error_type value is NUMBER_ERROR, then the
- subject, auxiliary, and verb number conflict and an explanation response is
- generated.
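- The explanation responses in Figure 2 are templates selected by error_type.
- The struct that bundles the needed words and the function name in the following
- sketch are hypothetical. The error codes are the ones defined in Listing 1.

```c
#include <stdio.h>

/* Error codes from Listing 1. */
#define TENSES_ERROR       210
#define TIME_MEANING_ERROR 211
#define NUMBER_ERROR       212

/* Hypothetical bundle of the words an explanation needs. */
struct sentence_words {
    const char *subject;       /* "Jim" */
    const char *aux;           /* "will" or "are" */
    const char *verb_base;     /* "run" */
    const char *time_phrase;   /* "last week" */
    const char *aux_time;      /* "future" */
    const char *phrase_time;   /* "past" */
};

/* Write the Figure 2 style explanation for error_type into buf.
   Returns the length the full message needs (snprintf's result). */
int explain_agreement_error(char *buf, size_t size, int error_type,
                            const struct sentence_words *w)
{
    switch (error_type) {
    case TIME_MEANING_ERROR:
        return snprintf(buf, size,
            "I don't understand, '%s' means %s time, but '%s' means %s time",
            w->aux, w->aux_time, w->time_phrase, w->phrase_time);
    case TENSES_ERROR:
        return snprintf(buf, size,
            "I don't understand, '%s' and that form of '%s' aren't used together",
            w->aux, w->verb_base);
    case NUMBER_ERROR:
        return snprintf(buf, size,
            "I don't understand, '%s', '%s', and that form of '%s' don't agree in number",
            w->aux, w->subject, w->verb_base);
    default:
        return snprintf(buf, size, "I don't understand");
    }
}
```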
-
-
- Responses Based on Meaning
-
-
- The processor generates a response based on auxiliary meaning when the
- sentence is grammatical. In the original processor the generated response was
- a simple "OK." This expanded processor creates an interesting response based
- on the auxiliary meaning. Given the input sentence "Jim is running in the
- race," the processor assigns the limited duration meaning to the sentence. The
- processor then generates the response "When will Jim stop running" based on
- that auxiliary meaning. Table 3 shows auxiliary meanings and associated
- example responses.
- The make_response routine calls aux_meaning_response to generate a response
- based on auxiliary meaning. The aux_meaning_response routine looks for
- specific auxiliary meanings in the input sentence. If a specific auxiliary
- meaning is found, then a response for that meaning is generated with words
- from the input sentence. The routine also checks the sentence tense, number,
- and subject type so appropriate words can be used to create a grammatical
- response.
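- A few rows of Table 3 can be sketched as templates selected by meaning and
- filled with the subject and two forms of the verb. The function name and all
- constant values except CHARACTERISTIC (from Listing 1) are placeholders. Any
- meaning not handled here falls back to the original processor's "OK".

```c
#include <stdio.h>

/* Placeholder meaning codes; CHARACTERISTIC matches Listing 1. */
#define PARTICULAR_POINT_OF_TIME 200
#define LIMITED_DURATION         201
#define FIXED_PLAN               202
#define CHARACTERISTIC           221

/* Fill buf with a Table 3 style response. 'verb' is the base form
   ("run"); 'verb_ing' is the -ing form ("running"). Returns the
   length the full response needs (snprintf's result). */
int meaning_response(char *buf, size_t size, int meaning,
                     const char *subject, const char *verb,
                     const char *verb_ing)
{
    switch (meaning) {
    case LIMITED_DURATION:
        return snprintf(buf, size, "When will %s stop %s", subject, verb_ing);
    case PARTICULAR_POINT_OF_TIME:
        return snprintf(buf, size, "When did %s %s", subject, verb);
    case CHARACTERISTIC:
        return snprintf(buf, size, "How often did %s %s", subject, verb);
    case FIXED_PLAN:
        return snprintf(buf, size, "Will %s be ready to %s", subject, verb);
    default:
        return snprintf(buf, size, "OK");
    }
}
```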
-
-
- Summary
-
-
- Several processes are required to process time. The processes described in
- this article identify time phrases and words in the sentence; derive time
- meaning from the sentence; derive a clear auxiliary meaning from an unclear
- meaning; generate a response that asks about unclear meaning; determine time
- and sentence tense agreement; generate a response that explains grammar
- conflicts; and generate a response based on meaning.
- Further expansions to the processor could compare one time to another; include
- time when matching input sentences; generate interesting responses based on
- time meaning; and allow ungrammatical sentences to be fully processed.
- Processing time helps the processor identify grammatical sentences and derive
- more meaning from them. As a result, the processor can generate responses
- that, at times, almost appear human.
-
-
- Bibliography
-
-
- Liles, Bruce L. 1971. An Introductory Transformational Grammar. Englewood
- Cliffs: Prentice-Hall.
- Quirk, Randolph, and Sidney Greenbaum. 1973. A Concise Grammar of Contemporary
- English. San Diego: Harcourt Brace Jovanovich.
- Suereth, Russell. 1993. "A Natural Language Processor." The C Users Journal,
- April.
- Suereth, Russell. 1993. "Natural Language Expansions for Tense and Number."
- The C Users Journal, June.
- Figure 1 A processor session when the input sentence has ambiguous auxiliary
- meaning
- Sentence: Jim can run in the race
- Response: Do you mean
- Jim is able to run, or
- Jim is permitted to run
-
- Sentence: Jim could have run in the race
- Response: Do you mean
- Jim was able to run, or
- Jim was permitted to run
-
- Sentence: Jim may run in the race
- Response: Do you mean
- Jim will be permitted to run, or
- Jim will possibly run
-
- Sentence: Jim will run in the race
- Response: Do you mean
- Jim is willing to run, or
- Jim is intending to run, or
- You insist Jim will run, or
- You predict Jim will run
- Figure 2 A processor session with ungrammatical input sentences
- Sentence: Jim will run last week
- Response: I don't understand, 'will' means future
- time, but 'last week' means past time
-
- Sentence: Jim will ran in the race
- Response: I don't understand, 'will' and that form
- of 'run' aren't used together
-
- Sentence: Jim are running in the race
- Response: I don't understand, 'are', 'Jim', and that
- form of 'run' don't agree in number
- Table 1 Time words and meaning
- Time Word   Time Meaning
- -------------------------
- at          Point of Time
- on          Day
- in          Period of Time
- for         Duration
- before      Before
- by          Before
- after       After
- since       Since
- between     Between
- until       Until
- up to       Until
- last        Last
- next        Next
- each        Each
- every       Every
- all         All
- Table 2 The auxiliary and tense combinations below initially have an unclear
- auxiliary meaning, but the processor can determine a clear meaning through
- sentence analysis.
- Auxiliary         Tense     Clear Meaning
- ------------------------------------------------------
- no auxiliary      Past      Particular Point of Time
-                   Present   Limited Duration
- is                Present   Limited Duration
-                   Future    Fixed Plan
- could             Future    Possibility
- could be          Future    Possibility
- may be            Past      Possibility
-                   Present   Possibility
- may have          Past      Possibility
-                   Present   Possibility
- may have been     Past      Possibility
-                   Present   Possibility
- would             Past      Characteristic
- would have        Past      Characteristic
- would have been   Past      Characteristic
- would be          any       Probability
- must have been    any       Logical Necessity
- must              any       Obligation
- must be           any       Obligation
- Table 3 Example responses for auxiliary meaning in the input sentence
- Auxiliary Meaning          Example Response
- in Input Sentence
- --------------------------------------------------------------------
- Limited Duration           When will Jim stop running
- Particular Point of Time   When did Jim run
- Up to Present              Will Jim continue running
- Not Completed              When did Jim stop running
- Possibility                Is this highly possible
- Probability                Will this be highly possible
- Obligation                 What happened when Jim didn't run
- Characteristic             How often did Jim run
- Logical Necessity          Why was it necessary that Jim was running
- Fixed Plan                 Will Jim be ready to run
-
- Listing 1 The expansion code for time
- /* Copyright (c) 1993 Russell Suereth */
-
- #include "natural.h"
- #define PROBABILITY 220
- #define CHARACTERISTIC 221
- #define LOGICAL_NECESSITY 222
- #define TENSES_ERROR 210
- #define TIME_MEANING_ERROR 211
- #define NUMBER_ERROR 212
- #define POINT_OF_TIME 100
- #define DAY 101
- #define PERIOD_OF_TIME 102
- #define DURATION 103
- #define BEFORE 104
- #define AFTER 105
- #define SINCE 106
- #define BETWEEN 107
- #define UNTIL 108
- #define LAST 111
- #define NEXT 112
- #define EACH 113
- #define EVERY 114
- #define ALL 115
- int ask_meaning(void);
- void check_time(void);
- void derive_aux_meaning(void);
- void derive_time_meaning(void);
- int check_agreement(void);
- void agreement_error(int);
- void aux_meaning_response(void);
- int check_underlying3(void);
- char times[20][31];
- unsigned char time_meaning[20];
-
- /******************************************************/
- /* Determine if the input sentence contains a known, */
- /* underlying structure. If it does, then assign the */
- /* correct types and phrases for the words. */
- /******************************************************/
- int check_underlying()
- {
- int i = 0;
- /* Structure PRON-AUX-VERB-PREP-DET-NOUN */
- if ( (check_type("PRON", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("VERB", i+2) == 0) &&
- (check_type("PREP", i+3) == 0) &&
- (check_type("DET", i+4) == 0) &&
- (check_type("NOUN", i+5) == 0) ) {
- strcpy(prime_types[i], "PRON");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "VERB");
- strcpy(prime_types[i+3], "PREP");
- strcpy(prime_types[i+4], "DET");
- strcpy(prime_types[i+5], "NOUN");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "PREPPHRASE");
- strcpy(phrases[i+4], "PREPPHRASE");
- strcpy(phrases[i+5], "PREPPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-VERB-PREP-DET-NOUN */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("VERB", i+2) == 0) &&
- (check_type("PREP", i+3) == 0) &&
- (check_type("DET", i+4) == 0) &&
- (check_type("NOUN", i+5) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "VERB");
- strcpy(prime_types[i+3], "PREP");
- strcpy(prime_types[i+4], "DET");
- strcpy(prime_types[i+5], "NOUN");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "PREPPHRASE");
- strcpy(phrases[i+4], "PREPPHRASE");
- strcpy(phrases[i+5], "PREPPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-AUX-AUX-VERB-PREP-TIME-TIME*/
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("AUX", i+2) == 0) &&
- (check_type("AUX", i+3) == 0) &&
- (check_type("VERB", i+4) == 0) &&
- (check_type("PREP", i+5) == 0) &&
- (check_type("TIME", i+6) == 0) &&
- (check_type("TIME", i+7) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "AUX");
- strcpy(prime_types[i+3], "AUX");
- strcpy(prime_types[i+4], "VERB");
- strcpy(prime_types[i+5], "TIME");
- strcpy(prime_types[i+6], "TIME");
- strcpy(prime_types[i+7], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "VERBPHRASE");
- strcpy(phrases[i+4], "VERBPHRASE");
- strcpy(phrases[i+5], "TIMEPHRASE");
- strcpy(phrases[i+6], "TIMEPHRASE");
- strcpy(phrases[i+7], "TIMEPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+2]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+3]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-AUX-AUX-VERB-PREP-TIME */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("AUX", i+2) == 0) &&
- (check_type("AUX", i+3) == 0) &&
- (check_type("VERB", i+4) == 0) &&
- (check_type("PREP", i+5) == 0) &&
- (check_type("TIME", i+6) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "AUX");
- strcpy(prime_types[i+3], "AUX");
- strcpy(prime_types[i+4], "VERB");
- strcpy(prime_types[i+5], "TIME");
- strcpy(prime_types[i+6], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "VERBPHRASE");
- strcpy(phrases[i+4], "VERBPHRASE");
- strcpy(phrases[i+5], "TIMEPHRASE");
- strcpy(phrases[i+6], "TIMEPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+2]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+3]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-AUX-AUX-VERB-TIME-TIME */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("AUX", i+2) == 0) &&
- (check_type("AUX", i+3) == 0) &&
- (check_type("VERB", i+4) == 0) &&
- (check_type("TIME", i+5) == 0) &&
- (check_type("TIME", i+6) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "AUX");
- strcpy(prime_types[i+3], "AUX");
- strcpy(prime_types[i+4], "VERB");
- strcpy(prime_types[i+5], "TIME");
- strcpy(prime_types[i+6], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "VERBPHRASE");
- strcpy(phrases[i+4], "VERBPHRASE");
- strcpy(phrases[i+5], "TIMEPHRASE");
- strcpy(phrases[i+6], "TIMEPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+2]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+3]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-AUX-VERB-PREP-TIME-TIME */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("AUX", i+2) == 0) &&
- (check_type("VERB", i+3) == 0) &&
- (check_type("PREP", i+4) == 0) &&
- (check_type("TIME", i+5) == 0) &&
- (check_type("TIME", i+6) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "AUX");
- strcpy(prime_types[i+3], "VERB");
- strcpy(prime_types[i+4], "TIME");
- strcpy(prime_types[i+5], "TIME");
- strcpy(prime_types[i+6], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "VERBPHRASE");
- strcpy(phrases[i+4], "TIMEPHRASE");
- strcpy(phrases[i+5], "TIMEPHRASE");
- strcpy(phrases[i+6], "TIMEPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+2]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-AUX-VERB-PREP-TIME */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("AUX", i+2) == 0) &&
- (check_type("VERB", i+3) == 0) &&
- (check_type("PREP", i+4) == 0) &&
- (check_type("TIME", i+5) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "AUX");
- strcpy(prime_types[i+3], "VERB");
- strcpy(prime_types[i+4], "TIME");
- strcpy(prime_types[i+5], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "VERBPHRASE");
- strcpy(phrases[i+4], "TIMEPHRASE");
- strcpy(phrases[i+5], "TIMEPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+2]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-AUX-VERB-TIME-TIME */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("AUX", i+2) == 0) &&
- (check_type("VERB", i+3) == 0) &&
- (check_type("TIME", i+4) == 0) &&
- (check_type("TIME", i+5) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "AUX");
- strcpy(prime_types[i+3], "VERB");
- strcpy(prime_types[i+4], "TIME");
- strcpy(prime_types[i+5], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "VERBPHRASE");
- strcpy(phrases[i+4], "TIMEPHRASE");
- strcpy(phrases[i+5], "TIMEPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- strcat(auxiliaries[sentence], " ");
- strcat(auxiliaries[sentence], word_array[i+2]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-VERB-PREP-TIME-TIME */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("VERB", i+2) == 0) &&
- (check_type("PREP", i+3) == 0) &&
- (check_type("TIME", i+4) == 0) &&
- (check_type("TIME", i+5) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "VERB");
- strcpy(prime_types[i+3], "TIME");
- strcpy(prime_types[i+4], "TIME");
- strcpy(prime_types[i+5], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "TIMEPHRASE");
- strcpy(phrases[i+4], "TIMEPHRASE");
- strcpy(phrases[i+5], "TIMEPHRASE");
-
- strcpy(auxiliaries[sentence], word_array[i+1]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-VERB-PREP-TIME */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("VERB", i+2) == 0) &&
- (check_type("PREP", i+3) == 0) &&
- (check_type("TIME", i+4) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "VERB");
- strcpy(prime_types[i+3], "TIME");
- strcpy(prime_types[i+4], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "TIMEPHRASE");
- strcpy(phrases[i+4], "TIMEPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- get_aux();
- return(0);
- }
-
- /* Structure NAME-AUX-VERB-TIME-TIME */
- if ( (check_type("NAME", i) == 0) &&
- (check_type("AUX", i+1) == 0) &&
- (check_type("VERB", i+2) == 0) &&
- (check_type("TIME", i+3) == 0) &&
- (check_type("TIME", i+4) == 0) ) {
- strcpy(prime_types[i], "NAME");
- strcpy(prime_types[i+1], "AUX");
- strcpy(prime_types[i+2], "VERB");
- strcpy(prime_types[i+3], "TIME");
- strcpy(prime_types[i+4], "TIME");
- strcpy(phrases[i], "NOUNPHRASE");
- strcpy(phrases[i+1], "VERBPHRASE");
- strcpy(phrases[i+2], "VERBPHRASE");
- strcpy(phrases[i+3], "TIMEPHRASE");
- strcpy(phrases[i+4], "TIMEPHRASE");
- strcpy(auxiliaries[sentence], word_array[i+1]);
- get_aux();
- return(0);
- }
- return(1);
- }
-
- /*****************************************************/
- /* If the phrase is a "TIMEPHRASE", then all the */
- /* words in the phrase refer to a time. Concatenate */
- /* these words to the times array. */
- /*****************************************************/
- void check_time()
- {
- int i;
- for (i=0; i<word_ct; i++) {
- if (strcmp(phrases[i], "TIMEPHRASE") == 0) {
- if (strlen(times[sentence]) > 0)
- strcat(times[sentence], " ");
- strcat(times[sentence], word_array[i]);
- }
- }
- return;
- }
-
- /*****************************************************/
- /* Generate a response with information from a */
- /* matching, previous sentence. */
- /*****************************************************/
- void make_response()
- {
- int i;
-
- if (check_agreement() != 0) return;
-
- if (strcmpi(word_array[0], "where") != 0) {
- if (ask_meaning() != 0) aux_meaning_response();
- return;
- }
-
- /***************************************************/
- /* Match subject, action, tense, and meaning. */
- /***************************************************/
- for (i=sentence-1; i>=0; i--) {
- if ((strcmpi(subjects[i],subjects[sentence])==0) &&
- (strcmpi(actions[i], actions[sentence]) ==0) &&
- (strlen(places[i]) > 0) &&
- (tenses[i] == tenses[sentence]) &&
- (strpbrk(aux_meaning[i],aux_meaning[sentence])
- != NULL)) {
- make_answer(i);
- return;
- }
- }
- /***************************************************/
- /* Match subject, action, and tense. */
- /***************************************************/
- for (i=sentence-1; i>=0; i--) {
- if ((strcmpi(subjects[i],subjects[sentence])==0) &&
- (strcmpi(actions[i], actions[sentence]) ==0) &&
- (strlen(places[i]) > 0) &&
- (tenses[i] == tenses[sentence])) {
- make_answer(i);
- return;
- }
- }
-
- /***************************************************/
- /* Match subject, action, and meaning. */
- /***************************************************/
- for (i=sentence-1; i>=0; i--) {
- if ((strcmpi(subjects[i],subjects[sentence])==0) &&
- (strcmpi(actions[i], actions[sentence]) ==0) &&
- (strlen(places[i]) > 0) &&
- (strpbrk(aux_meaning[i],aux_meaning[sentence])
- != NULL)) {
- strcpy(response, "I'm not sure, but ");
- make_answer(i);
- return;
- }
- }
- /***************************************************/
- /* Match subject and action. */
- /***************************************************/
- for (i=sentence-1; i>=0; i--) {
- if ((strcmpi(subjects[i],subjects[sentence])==0) &&
- (strcmpi(actions[i], actions[sentence]) ==0) &&
- (strlen(places[i]) > 0)) {
- strcpy(response, "I'm not sure, but ");
- make_answer(i);
- return;
- }
- }
- strcpy(response, "I don't know");
- return;
- }
-
- /*****************************************************/
- /* Obtain a clear auxiliary meaning from an */
- /* ambiguous meaning. */
- /*****************************************************/
-
- void derive_aux_meaning()
- {
- if (strlen(auxiliaries[sentence]) == 0) {
- if (tenses[sentence] == PAST)
- aux_meaning[sentence][0] =
- PARTICULAR_POINT_OF_TIME;
- if (tenses[sentence] == PRESENT)
- aux_meaning[sentence][0]= LIMITED_DURATION;
- if (tenses[sentence] == FUTURE)
- aux_meaning[sentence][0]= FIXED_PLAN;
- }
-
- if (strcmpi(auxiliaries[sentence], "is") == 0) {
- if (tenses[sentence] == PRESENT) {
- memset(aux_meaning[sentence], '\0', 5);
- aux_meaning[sentence][0]= LIMITED_DURATION;
- }
- if (tenses[sentence] == FUTURE) {
- memset(aux_meaning[sentence], '\0', 5);
- aux_meaning[sentence][0]= FIXED_PLAN;
- }
- }
-
- if ( ((strcmpi(auxiliaries[sentence],
- "could") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "could be") == 0)) &&
- (tenses[sentence] == FUTURE)) {
- memset(aux_meaning[sentence], '\0', 5);
- aux_meaning[sentence][0]= POSSIBILITY;
- }
-
- if ( ((strcmpi(auxiliaries[sentence],
- "may be") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "may have") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "may have been") == 0)) &&
- ((tenses[sentence] == PAST) ||
- (tenses[sentence] == PRESENT)) ) {
- memset(aux_meaning[sentence], '\0', 5);
- aux_meaning[sentence][0] = POSSIBILITY;
- }
-
- if ( ((strcmpi(auxiliaries[sentence],
- "would") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "would have") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "would have been") == 0)) &&
- (tenses[sentence] == PAST)) {
- memset(aux_meaning[sentence], '\0', 5);
- aux_meaning[sentence][0] = CHARACTERISTIC;
- }
-
- if (strcmpi(auxiliaries[sentence],
- "would be") == 0) {
- memset(aux_meaning[sentence], '\0', 5);
- aux_meaning[sentence][0] = PROBABILITY;
- }
-
- if (strcmpi(auxiliaries[sentence],
- "must have been") == 0) {
- memset(aux_meaning[sentence], '\0', 5);
- aux_meaning[sentence][0]= LOGICAL_NECESSITY;
- }
-
- if ((strcmpi(auxiliaries[sentence], "must") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "must be") == 0)) {
- memset(aux_meaning[sentence], '\0', 5);
- aux_meaning[sentence][0] = OBLIGATION;
- }
-
- return;
- }
-
- /*****************************************************/
- /* Obtain the meaning of time words in the sentence. */
- /*****************************************************/
- void derive_time_meaning()
- {
- int i;
- for (i=0; i<word_ct; i++) {
- if (strcmp(phrases[i],"TIMEPHRASE") == 0) {
- if (strcmpi(word_array[i], "at") == 0)
- time_meaning[sentence] = POINT_OF_TIME;
- if (strcmpi(word_array[i], "on") == 0)
- time_meaning[sentence] = DAY;
- if (strcmpi(word_array[i], "in") == 0)
- time_meaning[sentence] = PERIOD_OF_TIME;
- if (strcmpi(word_array[i], "for") == 0)
- time_meaning[sentence] = DURATION;
- if (strcmpi(word_array[i], "before") == 0)
- time_meaning[sentence] = BEFORE;
- if (strcmpi(word_array[i], "after") == 0)
- time_meaning[sentence] = AFTER;
- if (strcmpi(word_array[i], "since") == 0)
- time_meaning[sentence] = SINCE;
- if (strcmpi(word_array[i], "until") == 0)
- time_meaning[sentence] = UNTIL;
- if (strcmpi(word_array[i], "between") == 0)
- time_meaning[sentence] = BETWEEN;
- if (strcmpi(word_array[i], "by") == 0)
- time_meaning[sentence] = BEFORE;
- if (strcmpi(word_array[i], "up to") == 0)
- time_meaning[sentence] = UNTIL;
- if (strcmpi(word_array[i], "last") == 0)
- time_meaning[sentence] = LAST;
- if (strcmpi(word_array[i], "next") == 0)
- time_meaning[sentence] = NEXT;
- if (strcmpi(word_array[i], "each") == 0)
- time_meaning[sentence] = EACH;
- if (strcmpi(word_array[i], "every") == 0)
- time_meaning[sentence] = EVERY;
- if (strcmpi(word_array[i], "all") == 0)
- time_meaning[sentence] = ALL;
- break;
- }
- }
- return;
- }
-
- /****************************************************/
- /* Create a response that asks for more specific */
- /* meaning. */
- /****************************************************/
- int ask_meaning()
- {
- if ((strcmpi(auxiliaries[sentence],
- "can") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "can be") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "can have") == 0)) {
- strcpy(response, "Do you mean\n\t\t");
- subjects[sentence][0] =
- (char) toupper(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
-
- if (tenses[sentence] == PRESENT) {
- if (numbers[sentence] == SINGULAR)
- strcat(response, " is");
- else strcat(response, " are");
- }
-
- if (tenses[sentence] == FUTURE)
- strcat(response, " will be");
-
- strcat(response, " able to ");
- strcat(response, actions[sentence]);
-
- strcat(response, " or,\n\t\t");
- strcat(response, subjects[sentence]);
-
- if (tenses[sentence] == PRESENT) {
- if (numbers[sentence] == SINGULAR)
- strcat(response, " is");
- else strcat(response, " are");
- }
-
- if (tenses[sentence] == FUTURE)
- strcat(response, " will be");
-
- strcat(response, " permitted to ");
- strcat(response, actions[sentence]);
- return(0);
- }
-
- if ( ((strcmpi(auxiliaries[sentence],
- "could") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "could be") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "could have") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "could have been") == 0)) &&
- ((tenses[sentence] == PAST) ||
- (tenses[sentence] == PRESENT)) ) {
-
- strcpy(response, "Do you mean\n\t\t");
- subjects[sentence][0] =
- (char) toupper(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
-
- if (tenses[sentence] == PAST) {
- if (numbers[sentence] == SINGULAR)
- strcat(response, " was");
- else strcat(response, " were");
- }
-
- if (tenses[sentence] == PRESENT) {
- if (numbers[sentence] == SINGULAR)
- strcat(response, " is");
- else strcat(response, " are");
- }
-
- strcat(response, " able to ");
- strcat(response, actions[sentence]);
- strcat(response, " or,\n\t\t");
- strcat(response, subjects[sentence]);
-
- if (tenses[sentence] == PAST) {
- if (numbers[sentence] == SINGULAR)
- strcat(response, " was");
- else strcat(response, " were");
- }
-
- if (tenses[sentence] == PRESENT) {
- if (numbers[sentence] == SINGULAR)
- strcat(response, " is");
- else strcat(response, " are");
- }
-
- strcat(response, " permitted to ");
- strcat(response, actions[sentence]);
- return(0);
- }
-
- if ( ((strcmpi(auxiliaries[sentence],
- "may") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "may be") == 0)) &&
- (tenses[sentence] == FUTURE) ) {
-
- strcpy(response, "Do you mean\n\t\t");
- subjects[sentence][0] =
- (char) toupper(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
- strcat(response, " will be permitted to ");
- strcat(response, actions[sentence]);
- strcat(response, " or,\n\t\t");
- strcat(response, subjects[sentence]);
- strcat(response, " will possibly ");
- strcat(response, actions[sentence]);
- return(0);
- }
-
- if ((strcmpi(auxiliaries[sentence], "will") == 0) ||
- (strcmpi(auxiliaries[sentence],
- "will be") == 0)) {
-
- strcpy(response, "Do you mean\n\t\t");
- subjects[sentence][0] =
- (char) toupper(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
-
- if (numbers[sentence] == SINGULAR)
- strcat(response, " is");
- else strcat(response, " are");
-
- strcat(response, " willing to ");
- strcat(response, actions[sentence]);
- strcat(response, " or,\n\t\t");
- strcat(response, subjects[sentence]);
-
- if (numbers[sentence] == SINGULAR)
- strcat(response, " is");
- else strcat(response, " are");
-
- strcat(response, " intending to ");
- strcat(response, actions[sentence]);
- strcat(response, " or,\n\t\t");
-
- strcat(response, "You insist ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] =
- (char) tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
- strcat(response, " will ");
-
- strcat(response, actions[sentence]);
- strcat(response, " or,\n\t\t");
-
- strcat(response, "You predict ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] =
- (char) tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
- strcat(response, " will ");
- strcat(response, actions[sentence]);
- return(0);
- }
- return(1);
- }
-
- /****************************************************/
- /* Create a response based on the meaning of the */
- /* auxiliary in the sentence. */
- /****************************************************/
- void aux_meaning_response()
- {
- switch (aux_meaning[sentence][0]) {
- case LIMITED_DURATION:
- strcpy(response, "When will ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] = (char)
- tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
- strcat(response, " stop ");
- get_verb(tenses[sentence],
- numbers[sentence], 'I');
- break;
-
- case PARTICULAR_POINT_OF_TIME:
- strcpy(response, "When did ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] = (char)
- tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
- strcat(response, " ");
- strcat(response, actions[sentence]);
- break;
-
- case UP_TO_PRESENT:
- strcpy(response, "Will ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] = (char)
- tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
- strcat(response, " continue ");
- get_verb(tenses[sentence],
- numbers[sentence], 'I');
- break;
-
- case NOT_COMPLETED:
- strcpy(response, "When did ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] = (char)
- tolower(subjects[sentence][0]);
-
- strcat(response, subjects[sentence]);
- strcat(response, " stop ");
- get_verb(tenses[sentence],
- numbers[sentence], 'I');
- break;
-
- case POSSIBILITY:
- if (tenses[sentence] == PAST)
- strcpy(response, "Was this ");
- if (tenses[sentence] == PRESENT)
- strcpy(response, "Is this ");
- if (tenses[sentence] == FUTURE)
- strcpy(response, "Will this be ");
- strcat(response, "highly possible");
- break;
-
- case PROBABILITY:
- strcpy(response,
- "Will this be highly probable");
- break;
-
- case OBLIGATION:
- strcpy(response, "What ");
-
- if (tenses[sentence] == PAST)
- strcat(response, "happened when ");
- if (tenses[sentence] == FUTURE)
- strcat(response, "will happen if ");
-
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] = (char)
- tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
-
- if (tenses[sentence] == PAST)
- strcat(response, " didn't ");
- if (tenses[sentence] == FUTURE)
- strcat(response, " doesn't ");
-
- strcat(response, actions[sentence]);
- break;
-
- case CHARACTERISTIC:
- strcpy(response, "How often did ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] = (char)
- tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
- strcat(response, " ");
- strcat(response, actions[sentence]);
- break;
-
- case LOGICAL_NECESSITY:
- strcpy(response,
- "Why was it necessary that ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] = (char)
- tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
-
- if (numbers[sentence] == SINGULAR)
- strcat(response, " was ");
- else strcat(response, " were ");
-
- get_verb(tenses[sentence],
- numbers[sentence], 'I');
- break;
-
- case FIXED_PLAN:
- strcpy(response, "Will ");
- if (subjects_type[sentence] == PRONOUN)
- subjects[sentence][0] = (char)
- tolower(subjects[sentence][0]);
- strcat(response, subjects[sentence]);
- strcat(response, " be ready to ");
- strcat(response, actions[sentence]);
- break;
-
- default: /* all others */
- strcpy(response, "OK");
- break;
- }
- }
-
- /****************************************************/
- /* Determine if the input sentence words conflict. */
- /****************************************************/
- int check_agreement()
- {
- if (time_meaning[sentence] == LAST) {
- if ( (tenses[sentence] == PRESENT) ||
- (tenses[sentence] == FUTURE) ) {
- agreement_error(TIME_MEANING_ERROR);
- return(1);
- }
- }
-
- if (time_meaning[sentence] == NEXT) {
- if (tenses[sentence] == PAST) {
- agreement_error(TIME_MEANING_ERROR);
- return(1);
- }
- }
-
- if (numbers[sentence] == UNKNOWN) {
- agreement_error(NUMBER_ERROR);
- return(1);
- }
-
- if (tenses[sentence] == UNKNOWN) {
- agreement_error(TENSES_ERROR);
- return(1);
- }
-
- return(0);
- }
-
- /****************************************************/
- /* Generate a response that explains the time, */
- /* number, or tense agreement error. */
- /****************************************************/
- void agreement_error(int error_type)
- {
- int i;
- if (error_type == TIME_MEANING_ERROR) {
- strcpy(response, "I don't understand, '");
- strcat(response, auxiliaries[sentence]);
- strcat(response, "' means ");
- for (i=0; (aux_tense[i] != ' ') &&
- (aux_tense[i]); i++) {
- if (i > 0) strcat(response, " or ");
- if (aux_tense[i] == PAST)
- strcat(response, "past");
- if (aux_tense[i] == PRESENT)
- strcat(response, "present");
- if (aux_tense[i] == FUTURE)
- strcat(response, "future");
- }
-
- strcat(response, " time, but '");
- strcat(response, times[sentence]);
- strcat(response, "' means ");
- if (time_meaning[sentence] == LAST)
- strcat(response, "past");
- if (time_meaning[sentence] == NEXT)
- strcat(response, "future");
- strcat(response, " time ");
- }
-
- if (error_type == NUMBER_ERROR) {
- strcpy(response, "I don't understand, '");
- strcat(response, auxiliaries[sentence]);
- strcat(response, "', '");
- strcat(response, subjects[sentence]);
- strcat(response, "', and that form of '");
- strcat(response, actions[sentence]);
- strcat(response, "' don't agree in number");
- }
-
- if (error_type == TENSES_ERROR) {
- strcpy(response, "I don't understand, '");
- strcat(response, auxiliaries[sentence]);
- strcat(response, "' and that form of '");
- strcat(response, actions[sentence]);
- strcat(response, "' aren't used together");
- }
-
- return;
- }
-
- /* End of file */
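The long chain of if statements in the listing, one per sentence structure, could also be expressed as data. The sketch below is a hypothetical table-driven alternative, not the author's code. It covers only the structures visible in this part of the listing, in the same order. It assumes each word has already been reduced to a single type string, whereas the listing's check_type() may accept several types per word. It fills in only the phrase labels, using the listing's mapping: NAME becomes NOUNPHRASE, AUX and VERB become VERBPHRASE, and PREP and TIME become TIMEPHRASE.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATTERN 8

/* Hypothetical table of sentence structures, in the order the listing
   tests them, so a longer structure is tried before its prefix. */
static const char *patterns[][MAX_PATTERN + 1] = {
    {"NAME", "AUX", "AUX", "AUX", "VERB", "PREP", "TIME", "TIME", NULL},
    {"NAME", "AUX", "AUX", "AUX", "VERB", "PREP", "TIME", NULL},
    {"NAME", "AUX", "AUX", "AUX", "VERB", "TIME", "TIME", NULL},
    {"NAME", "AUX", "AUX", "VERB", "PREP", "TIME", "TIME", NULL},
    {"NAME", "AUX", "AUX", "VERB", "PREP", "TIME", NULL},
    {"NAME", "AUX", "AUX", "VERB", "TIME", "TIME", NULL},
    {"NAME", "AUX", "VERB", "PREP", "TIME", "TIME", NULL},
    {"NAME", "AUX", "VERB", "PREP", "TIME", NULL},
    {"NAME", "AUX", "VERB", "TIME", "TIME", NULL},
};
#define N_PATTERNS (sizeof patterns / sizeof patterns[0])

/* Phrase label the listing assigns to each word type. */
static const char *phrase_for(const char *type)
{
    if (strcmp(type, "NAME") == 0)
        return "NOUNPHRASE";
    if (strcmp(type, "AUX") == 0 || strcmp(type, "VERB") == 0)
        return "VERBPHRASE";
    return "TIMEPHRASE";                 /* PREP and TIME */
}

/* Try each pattern against types[i..n-1].  On the first match, fill in
   phrases[] for the matched words and return the pattern's index;
   return -1 if no pattern matches. */
static int match_structure(const char *types[], int n, int i,
                           const char *phrases[])
{
    size_t p;
    assert(i >= 0 && i <= n);
    for (p = 0; p < N_PATTERNS; p++) {
        int k;
        for (k = 0; patterns[p][k] != NULL; k++)
            if (i + k >= n || strcmp(types[i + k], patterns[p][k]) != 0)
                break;
        if (patterns[p][k] == NULL) {    /* every word matched */
            for (k = 0; patterns[p][k] != NULL; k++)
                phrases[i + k] = phrase_for(patterns[p][k]);
            return (int)p;
        }
    }
    return -1;
}
```

With this design, supporting a new sentence structure means adding one table row instead of another 20 to 30 lines of strcpy() calls. The listing's other per-structure work, such as collecting the auxiliaries, would still need its own handling.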
-
-
-
-
- Creating Spin Controls for Windows
-
-
- Keith E. Bugg
-
-
- Keith Bugg is a principal at Tristar Systems, Inc., which provides software
- consultants and technical training for Windows and DEC platforms. Direct
- queries to CompuServe ID 72203,3612.
-
-
- Users of MS-Windows programs often need a convenient way to input a value that
- falls within a certain range. For example, users of a process control system
- may need to set the temperature of a vessel within a range of 100 to 200 degrees F.
- Because of their simplicity and intuitive feel, spin controls often provide
- the best user interface in such applications; a spin control is one that
- allows the user to change an input value by clicking on an up or down arrow
- (or other appropriate icon). Clicking the up arrow increases the value;
- clicking the down arrow decreases it. As the arrows are clicked, the new value
- is displayed to the user via an edit box control and/or a graphical image
- (e.g., the temperature could be displayed as a thermometer).
- In this article, I present a simple implementation of spin controls in C++. A
- sample program accompanies the text and should serve as a jumping-off point
- for future exploration. The sample program's spin controls allow the user to
- specify a date within a given range. The program demonstrates one of the
- benefits of spin controls: they can prevent the user from entering invalid
- inputs. In this example, spin controls keep the user from entering an invalid
- date (including leap years).
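Each click changes the value by one step, so enforcing the range reduces to clamping at the limits. Below is a minimal sketch in C using the temperature example (100 to 200 degrees F). The struct spin_value type, the spin_up/spin_down names, and the step field are hypothetical illustrations, not part of the article's listing.

```c
#include <assert.h>

/* Hypothetical state for one spin control: current value, inclusive
   range, and the amount one click changes it by. */
struct spin_value {
    int value;
    int lo, hi;
    int step;
};

/* Up arrow: add one step, but never go past hi. */
void spin_up(struct spin_value *s)
{
    assert(s->lo <= s->hi && s->step > 0);
    if (s->hi - s->value < s->step)
        s->value = s->hi;            /* clamp at the top of the range */
    else
        s->value += s->step;
}

/* Down arrow: subtract one step, but never go below lo. */
void spin_down(struct spin_value *s)
{
    assert(s->lo <= s->hi && s->step > 0);
    if (s->value - s->lo < s->step)
        s->value = s->lo;            /* clamp at the bottom of the range */
    else
        s->value -= s->step;
}
```

For example, with `struct spin_value t = {195, 100, 200, 10};`, one call to `spin_up(&t)` leaves `t.value` at 200, and further up-clicks keep it there. Because the user can only click, an out-of-range value can never reach the rest of the program.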
-
-
- Creating Spin Controls
-
-
- From a Windows programming perspective, a spin control consists of three
- fundamental parts: a read-only text box for displaying the current value, an
- up arrow control for increasing the value, and a down arrow control for
- decreasing the value. These controls can be placed on the screen in either a
- vertical or horizontal orientation. Figure 1 illustrates some typical spin
- controls. (The reason the text box is marked read-only is to prevent the user
- from clicking in the box and directly entering a value.) The controls are
- implemented as bit maps. Most Windows development kits supply a plethora of
- bit maps, among which are the familiar up/down arrows used here. By making a
- connection between an object (the edit box) and an event (a mouse click within
- the outlines of a bit map) you create a spin control.
-
-
- Making the Connection
-
-
- In essence, spin control software is simply a matter of managing rectangles on
- the screen. The connections between the edit box and its controls are
- established by the following steps:
- 1. Load the control bit maps into memory using LoadBitmap; note their sizes.
- 2. Create a memory device context (DC) via CreateCompatibleDC.
- 3. Select the bit maps into the memory DC via SelectObject.
- 4. Copy the bit maps to the output DC using BitBlt; note their positions.
- 5. Trap the left mouse click event; test whether the cursor was inside a bit map.
- 6. Process accordingly; update the edit box as necessary.
- Save the bit map sizes and positions and map these into RECT objects; the API
- function PtInRect can then indicate if the mouse scores a "hit."
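Steps 4 through 6 come down to two tasks: record where each bit map was drawn, then test each click against that rectangle. The sketch below is portable C. It uses its own rect_t and point_t stand-ins (hypothetical names, not Windows types) so it compiles without <windows.h>. pt_in_rect() follows the documented PtInRect() rule: a point on the left or top edge is inside, and a point on the right or bottom edge is outside. In a real program, the width and height would come from the BITMAP structure's bmWidth and bmHeight fields, filled in by GetObject() on the loaded bit map.

```c
#include <assert.h>

/* Stand-ins for the Windows RECT and POINT types (hypothetical). */
typedef struct { int left, top, right, bottom; } rect_t;
typedef struct { int x, y; } point_t;

/* Rectangle covered by a bit map of the given size drawn at (x, y). */
rect_t bitmap_rect(int x, int y, int width, int height)
{
    rect_t r;
    assert(width >= 0 && height >= 0);
    r.left = x;
    r.top = y;
    r.right = x + width;
    r.bottom = y + height;
    return r;
}

/* Same rule as PtInRect: left/top edges inside, right/bottom outside. */
int pt_in_rect(const rect_t *r, point_t p)
{
    return p.x >= r->left && p.x < r->right &&
           p.y >= r->top && p.y < r->bottom;
}

/* Which arrow did the click hit?  +1 = up, -1 = down, 0 = neither. */
int spin_hit(const rect_t *up, const rect_t *down, point_t click)
{
    if (pt_in_rect(up, click))
        return +1;
    if (pt_in_rect(down, click))
        return -1;
    return 0;
}
```

Suppose two 16 by 10 pixel arrows are stacked at (100, 20) and (100, 30); these sizes are assumptions for illustration. A click at (105, 30) then counts as the down arrow only, because the shared edge belongs to the lower rectangle.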
-
-
- The Sample Program
-
-
- The sample Windows program demonstrates the preceding actions in more detail.
- This program builds a main window with a menu bar containing one option. When
- selected, the option invokes a dialog box which is associated with the date
- processor (see Figure 2). The function of most interest is DateDialog; the
- other stuff is mostly the grunt work to build the window, menu, etc. When the
- dialog box is initialized, the program creates edit boxes as child windows and
- maps them to the dialog box; these edit boxes show the current day and year,
- respectively. The program creates a combobox for the month and populates it
- with the names of the twelve months. (The months are not sorted alphabetically
- because the combobox index doubles as a zero-based month number: January
- is 0, February is 1, and so on.) Next, the program loads the bit maps for the up and
- down arrows and saves each bit map's width and height in variables. The
- program then creates a memory device context by calling CreateCompatibleDC.
- The date processor is initialized with a default date of 15-Jan-1993. Finally,
- the program defines the bit maps as rectangles and maps them to the screen.
- (These rectangles have been hard-coded for simplicity's sake.)
- The Windows messages of major interest here are WM_PAINT, WM_LBUTTONDOWN, and
- WM_RBUTTONDOWN. WM_PAINT causes the current value of the date to be displayed,
- while the button messages effect changes in the date values. The program
- responds to WM_PAINT by selecting the bit map objects (both the up and down
- arrows, in sequence) into the memory DC created earlier; these objects are
- then copied to the screen's DC via the BitBlt function.
- The message handlers for the mouse buttons are much more involved, because they
- have more to do. First, the handlers record the position of the mouse cursor at the
- time of the click. The program calls the Windows API function PtInRect to
- determine if the cursor is in one of the rectangles occupied by a bit map. If
- the cursor lies within a bit map region, the current date value is incremented
- or decremented. Range checking is also enforced here; note the usual nasty
- code that checks for leap years.
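The date arithmetic behind these handlers is small enough to isolate. The sketch below is portable C and assumes zero-based months, as in the month combobox. The function names are hypothetical, and the listing keeps a similar max_days table. Wrapping the day from the last day of the month back to 1 is a design choice made for this sketch, not necessarily what the listing does. The leap-year test is the one from Kernighan and Ritchie cited in the bibliography.

```c
#include <assert.h>

/* Days per month in a non-leap year; index 0 is January. */
static const int max_days[12] = {31, 28, 31, 30, 31, 30,
                                 31, 31, 30, 31, 30, 31};

/* Leap-year rule from K&R, 2nd edition. */
int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days in a zero-based month of the given year. */
int days_in_month(int month, int year)
{
    assert(month >= 0 && month < 12);
    return max_days[month] + (month == 1 && is_leap(year));
}

/* Up arrow on the day field: wrap from the last day back to 1. */
int day_up(int day, int month, int year)
{
    return day >= days_in_month(month, year) ? 1 : day + 1;
}

/* Down arrow on the day field: wrap from 1 to the last day. */
int day_down(int day, int month, int year)
{
    return day <= 1 ? days_in_month(month, year) : day - 1;
}

/* After the month or year changes, pull the day back into range,
   e.g. 31 March -> 30 April, or 29 February 1992 -> 28 February 1993. */
int clamp_day(int day, int month, int year)
{
    int last = days_in_month(month, year);
    return day > last ? last : day;
}
```

Note that clamp_day() must run whenever the month or year spinner changes. Otherwise a date that was valid a moment ago, such as 29 February in a leap year, can silently become invalid.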
- The right button down message handler requires even more code, since it
- supports year "slewing." Slewing refers to the ability to update the value
- continuously by holding the mouse button down while in one of the bit maps.
- The slewing works as follows: once the program knows the cursor is in one of
- the year control arrows, it starts another loop. This new loop calls
- PeekMessage to see what's in the pipeline. This loop immediately gets the
- mouse position -- if the user has moved out of the rectangle, the year stops
- slewing (even if the button is still down). Also, if the program finds a
- WM_RBUTTONUP message, the loop breaks and the year stops slewing.
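You can test the slewing logic away from Windows by replaying a recorded sequence of cursor samples. In this hypothetical sketch, struct tick stands in for one pass of the PeekMessage() loop: the cursor position read at that moment, plus whether a WM_RBUTTONUP was found. In the real handler those values would come from PeekMessage() and GetCursorPos() (converted with ScreenToClient()). The year limits are parameters, and the 1900 to 2099 range used in the usage note is an assumption, not a value from the listing.

```c
#include <assert.h>

/* Hypothetical stand-in for the arrow's client-area rectangle. */
struct box { int left, top, right, bottom; };

/* One pass through the PeekMessage() loop: the cursor position read
   at that moment, and whether a WM_RBUTTONUP message was found. */
struct tick { int x, y; int button_up; };

/* PtInRect-style test: left/top edges inside, right/bottom outside. */
static int inside(const struct box *b, int x, int y)
{
    return x >= b->left && x < b->right && y >= b->top && y < b->bottom;
}

/* Change the year by delta once per tick while the button is held over
   the arrow.  Slewing stops on button-up, when the cursor leaves the
   arrow, or when the next value would leave [min_year, max_year]. */
int slew_year(int year, int delta, int min_year, int max_year,
              const struct box *arrow, const struct tick *ticks, int n)
{
    int i;
    assert(min_year <= max_year);
    for (i = 0; i < n; i++) {
        if (ticks[i].button_up)
            break;                               /* WM_RBUTTONUP seen */
        if (!inside(arrow, ticks[i].x, ticks[i].y))
            break;                               /* cursor left arrow */
        if (year + delta < min_year || year + delta > max_year)
            break;                               /* range limit reached */
        year += delta;
    }
    return year;
}
```

For example, three ticks inside the up arrow followed by a button-up take 1993 to 1996. The same three ticks starting at 2098 stop at 2099, because the range check fires before the button is released.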
- The rest of the program is concerned with housekeeping chores and exiting.
- Most noteworthy in this regard are the DeleteDC and DeleteObject functions.
- These functions free up the memory locked for loading the bit maps and the
- memory DC.
- I built this program with Borland's Turbo C++ v 3.1, but it should port easily
- to most Windows-SDK-type platforms. To build this program, you will need to
- create your own MAKE file and include SPIN.C, SPIN.RC, and SPIN.DEF. These
- correspond to Listing 1, Listing 2 and Listing 3, respectively.
-
-
- Ups and Downs of Spin Controls
-
-
- If you are a Windows programmer, spin controls can save you a lot of work while
- increasing software reliability. This reliability results from a simple
- property of spin controls -- values are selected, not entered. This property
- allows your program to easily enforce ranges, data typing, defaulting, and
- other amenities.
- Spin controls can also have a downside. For starters, the target platform
- must have a mouse. Writing a keyboard interface is clearly possible, but
- messy. Also, the value being updated should span a relatively narrow range. It
- would be poor design to force the user to click through 1000 values, for
- example. A slider control would be more appropriate for values with a wide
- range.
-
-
- Final Remarks
-
-
- For certain Windows programs, spin controls can provide a very user-friendly
- control that pays extra dividends by way of increasing quality and
- reliability. Likely candidates are process control systems, multi-media
- applications (e.g., volume controls), and data base front ends. The
- construction of spin controls by the technique described in this article lends
- itself quite readily to object orientation, or at least to a DLL.
- Spin controls are widely available from Microsoft and other vendors in the
- guise of Visual Basic Custom Controls (VBXs). I've found these to be reliable
- and cost-effective; if you do a lot of development work, you may be better off
- purchasing a VBX. One caveat: VBXs will not be supported under Windows NT (in
- their current form, at least), so pick your battles carefully. You might want
- to hedge your bet and encapsulate the concepts of the sample program into a
- C++ class for spin controls.
-
-
-
- Bibliography
-
-
- The algorithm for the leap year calculations is from The C Programming
- Language by Kernighan & Ritchie, 2nd edition.
- Figure 1 Typical spin controls
- Figure 2 Spin controls in a dialog box
-
- Listing 1 A demo program that implements spin controls
- // **************************************************
- // * NAME = SPIN.C *
- // * DEV. PLATFORM= Turbo C++,v3.1 for Windows *
- // * MEMORY MODEL = SMALL *
- // * Demo. program for creating/using spin controls.*
- // * Keith E. Bugg, TRISTAR SYSTEMS, Inc *
- // **************************************************
- #define OEMRESOURCE // MUST define BEFORE <windows.h>
- #define DAY_CTRL 100// Ctrl IDs for day editbox ctrl.
- #define MONTH_CTRL 200 // ID for month combobox ctrl.
- #define YEAR_CTRL 300 // ID for year editbox ctrl.
- #include <windows.h>// must have Windows header file
- #include <stdio.h> // get basic prototypes
- #include <stdlib.h> // for 'itoa()', etc.
-
- /* declare prototypes here */
- long FAR PASCAL _export SpinProc(HWND, unsigned,
- WORD, LONG);
- BOOL FAR PASCAL DateDialog(HWND ,WORD,WORD ,LONG );
- //
- // GLOBAL VARIABLES HERE
- //
- HDC hDC; // handle to output Display Context
- int day,month,year;
- char *months[]= {"January","February","March","April",
- "May","June","July","August","September","October",
- "November" ,"December"};
- int max_days[]= {31,28,31,30,31,30,31,31,30,31,30,31};
- char dayval[3],yearval[5];
- // **************************************************
- // end var. decl., etc. Now do message handler for
- // the main window of the spin control demo program
- // **************************************************
- //
- long FAR PASCAL _export SpinProc(HWND hWnd,
- unsigned message,WORD wParam, LONG lParam)
- {
- PAINTSTRUCT ps; // Paint Struct for BeginPaint call
- int i; // general purpose variable
- HINSTANCE hInstance;
- FARPROC lpfnDlgProc;
- // --------- end local variables ---------------
- //
- switch (message) // process messages here
- {
-
- case WM_CLOSE: // Exit via system menu
- MessageBeep(0); // Warning beep
- i= MessageBox(hWnd,
- "Are you sure you want to Exit?",
- "EXIT",MB_OKCANCEL | MB_ICONEXCLAMATION);
-
- if(i == IDOK) // really wants to exit
- { //queue up a QUIT msg
- PostMessage(hWnd,WM_QUIT,0,0);
- return 0L;
- }
- break;
-
- case WM_COMMAND: // check for system message
- switch(wParam)
- {
- case SC_MINIMIZE: // on minimize
- ShowWindow(hWnd,SW_SHOWMINIMIZED);
- break;
-
- case SC_MAXIMIZE: // on maximize
- MessageBeep(0); //stub
- ShowWindow(hWnd,SW_SHOWMAXIMIZED);
- break;
-
- case SC_RESTORE: // on restore
- ShowWindow(hWnd,SW_SHOW);
- break;
- //
- // here if user clicks the sub-menu
- // option "Run Demo" from drop-down
- case 100: // Run Demo menu option
- hInstance = GetWindowWord(hWnd,
- GWW_HINSTANCE);
- lpfnDlgProc= MakeProcInstance(
- DateDialog,hInstance);
- DialogBox(hInstance,"DIALOG_1",
- hWnd,lpfnDlgProc);
- break;
-
- default:
- break;
- }
- break;
-
- case WM_QUIT: // QUIT & DESTROY messages
- case WM_DESTROY:
- return (0L);
-
- default: // message is of no interest
- return (DefWindowProc(hWnd, message,
- wParam, lParam));
- }
- return (NULL);
- } /* end SpinProc() */
-
- #pragma argsused // TC++ compiler directive
- //
- // next comes WinMain, which builds demo window
- //
- // ****************************************************
- int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE
- hPrevInstance,LPSTR lpCmdLine, int nCmdShow)
- {
- MSG msg; // message
-
- WNDCLASS wc;
- static HWND hWnd;
- //
- //------------------------------------------------
- //
-
- if (!hPrevInstance) // Other instances running?
- {
- // Fill in window class structure with
- // parameters that describe the main window.
- //
- wc.style = CS_HREDRAW | CS_VREDRAW;
- wc.lpfnWndProc=(long(FAR PASCAL*)())SpinProc;
- wc.cbClsExtra = 0; // No per-class extra data.
- wc.cbWndExtra = 0; // No per-window extra data.
- wc.hInstance = hInstance;
- wc.hIcon = LoadIcon(hInstance, "spin");
- wc.hCursor = LoadCursor(NULL, IDC_ARROW);
- wc.hbrBackground = CreateSolidBrush(
- RGB(100,255,255)); // cyan background
- wc.lpszMenuName = "MAIN_MENU";
- wc.lpszClassName = "SPIN";
-
- // Register the window class and return
- // success/failure code.
-
- if(!RegisterClass(&wc))
- return 0;
-
- // Create main window for application instance.
-
- hWnd = CreateWindow(
- "SPIN", // See RegisterClass() call.
- "Spin Controls",// Text for title bar.
- WS_MAXIMIZE | WS_SYSMENU | WS_MINIMIZEBOX |
- WS_MAXIMIZEBOX | WS_THICKFRAME |
- WS_OVERLAPPEDWINDOW, // Window styles.
- 0, // horizontal position.
- 0, // vertical position.
- 600, // Default width.
- 400, // Default height.
- NULL, // no parent.
- NULL, // No menu bar (later)
- hInstance, // inst. owns window.
- NULL // Pointer not needed.
- );
- //
- // If CreateWindow failed, return "failure"
- if (!hWnd)
- {
- MessageBeep(0);
- MessageBox(hWnd,"Could not create window!",
- "ERROR",MB_OK);
- return 0;
- }
-
- // Make window visible; update its client area
-
- nCmdShow= SW_SHOWNORMAL; // Show normal size
-
- ShowWindow(hWnd,nCmdShow ); // Show the window
- UpdateWindow(hWnd); // Send WM_PAINT
-
- }
- // Acquire and dispatch messages until a
- // WM_QUIT message is received.
-
- while (GetMessage(&msg, // message structure
- NULL, // window handle rec. msg
- NULL, // lowest msg. to examine
- NULL)) // highest msg. to examine
- {
- TranslateMessage(&msg); // Translates msgs.
- DispatchMessage(&msg); // Dispatches msgs.
- }
- return (msg.wParam); // Returns the value
- // from PostQuitMessage
- } // end WinMain()
-
- BOOL FAR PASCAL DateDialog(HWND hDlg,WORD wMessage,
- WORD wParam, LONG lParam)
- {
- static HWND hDay,hCombo,hYear;
- HANDLE hInstance; // app. instance
- //
- // decl. rectangles for bit maps
- //
- static RECT day_up,day_down,year_up, year_down;
- static HDC hMemDC; // handle to memory DC
- static BITMAP bm; // for loading bit maps
- static HBITMAP hUp,hDown; // bit map handles
- MSG msg; // message struct. for slewing
- int k, mon_max; // mon_max is max # days in month
- static int bmWidth, bmHeight; // bit map size
- POINT pt; // for finding mouse click in rects.
- //
- switch(wMessage)
- {
- // initialize dialog box here
- case WM_INITDIALOG:
- hInstance = GetWindowWord(hDlg,
- GWW_HINSTANCE);
- //
- // create Day edit box as child window
- //
- hDay = CreateWindow("EDIT",NULL,ES_LEFT |
- ES_READONLY | WS_CHILD | WS_BORDER |
- WS_VISIBLE | WS_TABSTOP,40,55,50,
- 25,hDlg,DAY_CTRL,hInstance,NULL);
- //
- // create Month combobox as child window
- //
- hCombo = CreateWindow("COMBOBOX",NULL,
- WS_CHILD | WS_VSCROLL | WS_TABSTOP |
- CBS_DROPDOWN | CBS_HASSTRINGS,
- 175, 55, 120, 150,hDlg,MONTH_CTRL,
- hInstance,NULL);
- //
- // create Year edit box as child window
-
- //
- hYear = CreateWindow("EDIT",NULL,ES_LEFT |
- ES_READONLY | WS_CHILD | WS_VISIBLE |
- WS_BORDER | WS_TABSTOP,
- 310,55,70,25,hDlg,YEAR_CTRL,
- hInstance,NULL);
- ShowWindow(hYear,SW_SHOWNORMAL);
- //
- // init. combobox with names of months
- //
- for(k=0; k < 12; k++)
- SendMessage(hCombo,CB_ADDSTRING,0,
- (DWORD)(LPSTR)months[k]);
-
- ShowWindow(hCombo,SW_SHOWNORMAL);
- //
- // Load the Microsoft-supplied arrows here
- //
- hUp = LoadBitmap(NULL,OBM_UPARROW);
- hDown = LoadBitmap(NULL,OBM_DNARROW);
- //
- // Get the height & width of the up/down
- // arrows supplied by Microsoft. Use
- // this to set up rectangles upon which
- // the arrows will be "pasted". When
- // user clicks mouse inside one of these
- // rectangles, we change the day/year
- //
- GetObject(hUp, sizeof(BITMAP), &bm);
- // get size of up/down arrow (same)
- bmWidth = bm.bmWidth;
- bmHeight = bm.bmHeight;
- //
- // memory handle to DC; !** IMPORTANT **!
- //
- hMemDC = CreateCompatibleDC(hDC);
- //
- // init. the "Day" and "Year" edit boxes
- // set to date 15-Jan-1993
- //
- day= 15; // set day to 15
- SetDlgItemText(hDlg,DAY_CTRL,"15");
- //
- // do month here
- //
- month= 0; // January
- SendMessage(hCombo,CB_SELECTSTRING,0,
- (DWORD)(LPSTR)months[month]);
- //
- //
- year= 1993; // set year to 1993
-
- SetDlgItemText(hDlg,YEAR_CTRL,"1993");
- //
- // define rect. for the "Day" up arrow
- // ***********************************
- day_up.left=105;
- day_up.top= 56;
- day_up.right= 105+bmWidth;
-
- day_up.bottom= day_up.top+bmHeight;
- //
- // now do down arrow for "Day"
- //
- day_down.left=105;
- day_down.top= 74;
- day_down.right= 105+bmWidth;
- day_down.bottom= day_down.top+bmHeight;
- //
- // define rect. for the "Year" up arrow
- //
- year_up.left= 395;
- year_up.top= 56;
- year_up.right= 395+bmWidth;
- year_up.bottom= year_up.top+bmHeight;
- //
- // do down arrow for "Year"
- //
- year_down.left= 395;
- year_down.top= 74;
- year_down.right= 395+bmWidth;
- year_down.bottom= year_down.top+bmHeight;
-
- return (TRUE);
- //
- // do paint message here
- //
- case WM_PAINT:
- //
- // get handle to display context
- //
- hDC= GetDC(hDlg);
- //
- // now "paste" the up/down arrow bitmap
- // into position. Note: the coordinates
- // are hard-coded, not as parameters
- //
- SelectObject(hMemDC,hUp);
- BitBlt(hDC,105,56,bmWidth,bmHeight,hMemDC,
- 0,0,SRCCOPY); // day up
- BitBlt(hDC,395,56,bmWidth,bmHeight,hMemDC,
- 0,0,SRCCOPY); // year up
- //
- SelectObject(hMemDC,hDown);
- BitBlt(hDC,105,74,bmWidth,bmHeight,hMemDC,
- 0,0,SRCCOPY); // day down
- BitBlt(hDC,395,74,bmWidth,bmHeight,hMemDC,
- 0,0,SRCCOPY); // year down
-
- ReleaseDC(hDlg,hDC); // release DC
- break;
- //
- // end paint message
- //
- case WM_CLOSE:
- //
- // *** Release memory back to Windows ***
- //
- DeleteDC(hMemDC);
-
- DeleteObject(hUp);
- DeleteObject(hDown);
- EndDialog(hDlg,0); // close dialog box
- return (TRUE);
- //
- // user clicked left button here
- //
- case WM_LBUTTONDOWN:
- pt.x = LOWORD(lParam);
- pt.y = HIWORD(lParam);
- //
- // Was click in day up bit map?
- //
- if(PtInRect(&day_up,pt))
- {
- ++day; // user wants to increase day
- //
- // get month, then max days for that
- // month. Check for leap year, too
- //
- month = (WORD) SendMessage(hCombo,
- CB_GETCURSEL,0,0L);
- mon_max= max_days[month];
- //
- // check for leap year here
- //
- if(month == 1) // 0=Jan, 1= Feb, etc
- {
- // February has 29 days in a leap year
- if((year % 4 == 0 && year % 100
- != 0) || year % 400 == 0)
- mon_max= 29; // lp yr.
- }
- if(day > mon_max)
- day = 1;
- itoa(day,dayval,10);
- // Update edit box with new day
- SetDlgItemText(hDlg,DAY_CTRL,dayval);
- }
- //
- // see if click in day down arrow
- //
- if(PtInRect(&day_down,pt))
- {
- --day; // user wants to decrease day
- if(day < 1)
- day = 1;
- itoa(day,dayval,10);
- // Update edit box with new day
- SetDlgItemText(hDlg,DAY_CTRL,dayval);
- }
- //
- // see if click in year up arrow
- //
- if(PtInRect(&year_up,pt))
- {
- ++year; // incr. year & check range
- if(year > 2200)
- year = 2200;
-
- itoa(year,yearval,10);
- // Update edit box with new year
- SetDlgItemText(hDlg,YEAR_CTRL,yearval);
- }
- //
- // see if click in year down arrow
- //
- if(PtInRect(&year_down,pt))
- {
- --year; // decr. year & check range
- if(year < 1800)
- year = 1800;
- itoa(year,yearval,10);
- // Update edit box with new year
- SetDlgItemText(hDlg,YEAR_CTRL,yearval);
- }
- break;
-
- case WM_RBUTTONDOWN: // right button slewing
- pt.x = LOWORD(lParam);
- pt.y = HIWORD(lParam);
- //
- // Is cursor in year up rectangle?
- //
- if(PtInRect(&year_up,pt))
- {
- while(1)
- {
- if(PeekMessage(&msg,NULL,0,0,
- PM_REMOVE))
- {
- pt = msg.pt;
- ScreenToClient(hDlg,&pt);
- if(!PtInRect(&year_up,pt))
- { // user moved out of box
- break;
- }
- if(msg.message== WM_RBUTTONUP)
- break;
- else
- {
- TranslateMessage(&msg);
- DispatchMessage(&msg);
- }
- }
- ++year; // increment year
- if(year > 2200) // check range
- year = 1993;// default to 1993
- itoa(year,yearval,10);
- // Update edit box with new year
- SetDlgItemText(hDlg,YEAR_CTRL,
- yearval);
-
- } // end while loop for slewing
-
- } // end if slewed year up
- //
- // now check for slewing year down
- //
-
- if(PtInRect(&year_down,pt))
- {
- while(1)
- {
- if(PeekMessage(&msg,NULL,0,0,
- PM_REMOVE))
- {
- pt = msg.pt;
- ScreenToClient(hDlg,&pt);
- if(!PtInRect(&year_down,pt))
- { // user moved out of box
- break;
- }
- if(msg.message== WM_RBUTTONUP)
- break;
- else
- {
- TranslateMessage(&msg);
- DispatchMessage(&msg);
- }
- }
- --year; // decrement year
- if(year < 1800) // check range
- year = 1800;
- itoa(year,yearval,10);
- // Update edit box with new year
- SetDlgItemText(hDlg,YEAR_CTRL,
- yearval);
- } // end while loop for slewing down
- } // end if year down was slewed
-
- break;
- } // end switch on messages
- return (FALSE);
- } // end DateDialog()
- //
- // end spin.c
- //
- // End of File
-
-
- Listing 2 Resource file for spin control program
- // Name = SPIN.RC
- // Resource file for spin control demo program.
- // **************************************************
- //
- // build dialog box for date processor w/spin control
- //
- DIALOG_1 DIALOG 18, 18, 231, 127
- STYLE DS_MODALFRAME | WS_POPUP | WS_CAPTION |
- WS_SYSMENU
- CAPTION "DATE PROCESSOR"
- BEGIN
-
- LTEXT "Day", -1, 20, 15, 16, 8, WS_CHILD |
- WS_VISIBLE | WS_GROUP
- LTEXT "Year", -1, 170, 15, 16, 8, WS_CHILD |
- WS_VISIBLE | WS_GROUP
- LTEXT "Month", -1, 96, 15, 25, 8, WS_CHILD |
- WS_VISIBLE | WS_GROUP
- END
- //
- // build main menu bar resource
- //
- MAIN_MENU MENU
- BEGIN
- POPUP "&Date"
- BEGIN
- MENUITEM "&Run Demo", 100
- END
-
- END
- //
- // end Listing 2, SPIN.RC
- //
- // End of File
-
-
- Listing 3 spin.def: module definition file for spin control program
- NAME SPIN
- DESCRIPTION 'Demonstrates how to create & use spin controls'
- EXETYPE WINDOWS
- CODE PRELOAD MOVEABLE
- DATA PRELOAD MOVEABLE MULTIPLE
- HEAPSIZE 1024
- STACKSIZE 5120
-
-
-
- Scrolling List Dialog for Scientific Programming
-
-
- Steve Welstead
-
-
- This article is not available in electronic form.
-
-
-
- An Alternative to Large Switch Statements
-
-
- Matt Weisfeld
-
-
- Matt Weisfeld currently works for the Allen-Bradley Company in Cleveland, Ohio,
- developing software for UNIX, VMS, DOS, Windows, and other platforms. The
- author has recently published a book entitled Developing C Language Portable
- System Call Libraries, which is available from the WILEY-QED group of John
- Wiley & Sons. Matt can be reached via CompuServe [71620,2171] or Internet
- [maw@ferrari.da.hh.ab.com].
-
-
- After many years of writing structured C code, I find it difficult to adjust
- to many aspects of Windows programming. The enormous size of many Windows
- program switch statements makes me particularly uncomfortable. Given the
- event-driven nature of Windows, I can understand why these statements grow so
- large, but my instincts tell me to avoid creating such constructs as a general
- practice. Just imagine a menubar/pulldown menu structure with 50 potential
- choices generating 50 possible messages (the Borland C++ for Windows 3.1
- compiler environment has over 60). If I create such a menu via the traditional
- approach, my resulting switch statement will contain 50 case statements. That
- makes for pretty hard-to-read code. In this article I demonstrate how to
- replace these gigantic switch statements with something more manageable.
-
-
- An Alternative
-
-
- As an alternative to the switch statement, I use a function table with a
- look-up routine. This routine uses the parameter (formerly the switch
- statement's control variable) as a search value. When the routine finds a key
- in the table matching the search value, it executes the function associated
- with the key.
- I define the type of each entry in the internal function table as follows:
- typedef struct {
- WPARAM message;
- int (*funcptr)(HWND);
- } INFUNCS;
- The first field in the structure, defined as type WPARAM, corresponds to the
- wParam argument that the window procedure receives with each message. The second
- field is a function pointer to a routine. In this example, the parameter in
- the internal function call, a window handle (HWND), indicates where to display
- a messagebox. (The actual type and number of arguments passed in a real
- application will be different, depending on the application.)
- As an illustration, consider a Windows application, called internal, which
- creates a single window (I use the BORLAND C++ Compiler v3.1; however, this
- technique is not compiler dependent). This window has a simple menu structure
- providing two choices on the menubar, FILE and EDIT. Selecting either of these
- options produces a pulldown menu that contains the actions available to the
- user (FILE has eight, EDIT has six). Figure 1 shows a sample window with the
- EDIT pulldown menu activated. Choosing one of these pulldown items, using
- either the mouse or the keyboard, sends a WM_COMMAND message to the program
- message loop. After recognizing WM_COMMAND, the code checks the wParam
- argument for the actual message type, which represents the menu choice.
- After encountering a WM_COMMAND message, a simple table search attempts to
- match the wParam value to its counterpart in the table. If found, the code
- calls the corresponding internal function. If not found, the code generates a
- messagebox displaying an internal error. This error should theoretically not
- occur unless a legitimate wParam value is not in the table. (In this case, you
- can simply add the missing value to the table.)
- The code for all the internal functions resides in funcs.c (see Listing 7).
- The routine for p_file_open looks like this (the others are almost identical):
- int p_file_open(HWND hwnd)
- {
- MessageBox (hwnd, "file_open",
- "COMMAND SELECTED", MB_OK);
- return(0);
- }
- The initialization of the function table, called infuncs, is specified in the
- file funcs.h (see Listing 5). An abbreviated example of the table definition
- looks like this:
- INFUNCS infuncs[] = {
- {WM_FILE_NEW, p_file_new},
- {WM_FILE_OPEN, p_file_open},
-
- ...
- {WM_EDIT_CLEAR, p_edit_clear},
- {NULL, NULL},
- };
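- Because the dispatch itself has nothing to do with Windows, it can be tried
- outside a Windows program. The following sketch is a portable, hypothetical
- stand-in for the author's code, with three substitutions. A void pointer
- (HANDLER_ARG) replaces the HWND argument. The CMD_ names replace the
- WM_FILE_ and WM_EDIT_ IDs from internal.h. The handlers record which one ran
- instead of displaying a message box. The loop stops at a sentinel entry whose
- function pointer is NULL. Unlike testing the key for NULL, this lets 0 remain
- a legal command ID.

```c
#include <stddef.h>

/* Hypothetical command IDs standing in for the values in internal.h. */
enum { CMD_FILE_NEW = 100, CMD_FILE_OPEN = 101, CMD_EDIT_CLEAR = 113 };

/* A generic pointer stands in for the HWND handed to each handler. */
typedef void *HANDLER_ARG;

typedef struct {
    unsigned message;              /* key: the menu command ID */
    int (*funcptr)(HANDLER_ARG);   /* handler to call for that ID */
} INFUNCS;

static int last_called = 0;        /* which handler ran most recently */

static int p_file_new(HANDLER_ARG arg)
{ (void)arg; last_called = CMD_FILE_NEW; return 0; }
static int p_file_open(HANDLER_ARG arg)
{ (void)arg; last_called = CMD_FILE_OPEN; return 0; }
static int p_edit_clear(HANDLER_ARG arg)
{ (void)arg; last_called = CMD_EDIT_CLEAR; return 0; }

/* Adding a command means adding one row here and one handler above. */
static const INFUNCS infuncs[] = {
    {CMD_FILE_NEW,   p_file_new},
    {CMD_FILE_OPEN,  p_file_open},
    {CMD_EDIT_CLEAR, p_edit_clear},
    {0, NULL}                      /* sentinel marks the end of the table */
};

/* Linear search for message; calls the matching handler and returns its
   result, or returns -1 when the message is not in the table. */
int dispatch(unsigned message, HANDLER_ARG arg)
{
    size_t i;
    for (i = 0; infuncs[i].funcptr != NULL; i++) {
        if (infuncs[i].message == message)
            return infuncs[i].funcptr(arg);
    }
    return -1;                     /* the article shows "INTERNAL ERROR" here */
}
```

- A call such as dispatch(CMD_FILE_OPEN, NULL) runs p_file_open and returns 0.
- An unknown ID such as 999 returns -1.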
-
-
- Code Comparison
-
-
- The complete code for the main procedure resides in the file internal.c (see
- Listing 6). The code to perform the table search, and thus replace the switch
- statement, is as follows:
- case WM_COMMAND:
- {
- for (i=0; infuncs[i].message !=
- NULL; i++) {
- if (infuncs[i].message ==
- wParam) {
- status = (*infuncs[i].funcptr)(hwnd);
- break;
- }
- }
-
-
- if (infuncs[i].message == NULL) {
- MessageBox (hwnd, "Bad Message",
- "INTERNAL ERROR", MB_OK);
- break;
- }
- }
- The algorithm performs a sequential search that terminates when a message
- matches an entry in the table (by a break statement) or when the search
- reaches the NULL values (meaning the message was not in the table, an error as
- described earlier). Compare this code to the switch statement in Listing 3.
- Even though the switch statement contains only 14 case statements, the
- function table already makes the message-handling code more compact.
-
-
- Performance Considerations
-
-
- It is true that this technique introduces overhead in the form of table
- searches and function calls. I was initially concerned that this approach
- would degrade performance beyond the limits of acceptability. However,
- discussion on the BORLAND forum of Compuserve provided a consensus that there
- should be no performance problems relative to switch statements. I don't
- expect performance to suffer compared to switch statements because compilers
- often implement a large switch statement as a table look-up anyway, and
- function calls require very little overhead. However, there is no easy way of
- predicting how this technique will impact a specific application.
- You can also speed up table searches by using smarter algorithms (see the
- sidebar, "Choosing a Table Search Algorithm"). My sample application just uses
- a linear search -- one of the slowest when it comes to large tables.
-
-
- Conclusion
-
-
- Despite the added overhead, the compelling reason to abandon the switch
- statement is to enhance readability. When using switch statements, more
- messages require more case statements. When using internal functions, the
- addition of a message requires no change to the message processing code --
- only a new entry in the table and a new internal function. The utility gained
- by using internal functions greatly increases as the number of messages
- increases.
- References:
- Data Structures, Algorithms and Program Style using C, James F. Korsh and
- Leonard J. Garrett. PWS-Kent Publishing Company. Boston, 1988. p 358.
- Introduction to Data Structures with Pascal, Thomas Naps and Bhagat Singh.
- West Publishing Company. St. Paul, 1986. p 317.
- Peter Norton's Windows 3.1 Power Programming Techniques, 2nd ed., Peter Norton
- and Paul Yao. Bantam Books, 1992.
- Programming Windows, 2nd ed. Charles Petzold. Microsoft Press, 1990.
- Choosing a Table Search Algorithm
- Choosing the proper search algorithm can greatly impact the efficiency of a
- program. This is especially true as a table grows very large. For example,
- consider an array with n elements (size n, assume elements start at 1). Each
- element of the array is a structure with the first field designated as the
- key. The search algorithm will compare a search value to each key in the table
- until it finds a match or the table is exhausted. I describe two types of
- search techniques here: the linear search and the binary search. (Hash tables
- are also an alternative for certain searching applications; however, they are
- mainly used for tables that are highly volatile or built at run time. The
- table in this article is fairly static and is built at compile time.)
-
-
- Linear Search
-
-
- Linear searches are the simpler of the two and normally considered the least
- efficient. The search progresses in a linear fashion (i.e. table position 1,
- position 2, ... position n). Processing time increases linearly with n. As a
- measure of efficiency, I present figures in terms of number of accesses
- required:
- Best Case : 1 -- a search for the first element in the table
- Worst Case : n -- a search for the last element in the table
- Average Case : n/2 -- on average half the elements are examined
- A simple technique to eliminate a comparison step can increase the efficiency
- of a linear search by 20 to 50 percent. An unimproved linear search requires
- a comparison to limit the loop (e.g., i <= n) at each iteration. The improved
- algorithm eliminates these comparisons by first placing a copy of the search
- value at location n+1. The algorithm can now walk through the array without
- checking whether it has exceeded the table size, since it is certain to stop
- its search at location n+1, if not before. If the search progressed as far as
- location n+1, then the search value was not originally in the table.
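- The sentinel technique described above can be sketched in C as follows.
- C arrays start at 0, so the sentinel goes at index n rather than at position
- n+1. The caller must therefore leave room for that extra slot. The function
- name sentinel_search is hypothetical.

```c
#include <stddef.h>

/* Sentinel linear search over keys[0..n-1]. keys[n] must exist and is
   overwritten with the search value. Returns the index of key, or -1
   if key is not among the first n elements. */
int sentinel_search(int keys[], size_t n, int key)
{
    size_t i = 0;
    keys[n] = key;                 /* plant the sentinel */
    while (keys[i] != key)         /* no i < n test: the sentinel stops us */
        i++;
    return (i < n) ? (int)i : -1;  /* stopped at the sentinel: not found */
}
```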
- It is also possible to order the array so that the most frequently accessed
- values are at the top of the table. However, this technique requires the
- program to keep statistics as the table is continuously searched, and to
- perform periodic re-ordering of the table.
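- A minimal sketch of that bookkeeping, assuming a hypothetical ENTRY type
- that pairs each key with a hit counter. Every successful look-up increments
- the counter. An occasional call to the standard qsort() function then moves
- the most frequently found keys to the front of the table. How often to
- reorder is left to the application.

```c
#include <stdlib.h>

/* Hypothetical table entry with an access counter. */
typedef struct {
    int key;
    unsigned long hits;            /* how often this key has been found */
} ENTRY;

/* Linear search that counts successful look-ups. */
int counted_search(ENTRY table[], size_t n, int key)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (table[i].key == key) {
            table[i].hits++;
            return (int)i;
        }
    }
    return -1;
}

/* qsort comparator: entries with higher hit counts sort first. */
static int by_hits_desc(const void *a, const void *b)
{
    const ENTRY *x = a;
    const ENTRY *y = b;
    if (x->hits > y->hits) return -1;
    if (x->hits < y->hits) return 1;
    return 0;
}

/* Periodic re-ordering step: move frequently used keys to the front. */
void reorder_by_frequency(ENTRY table[], size_t n)
{
    qsort(table, n, sizeof table[0], by_hits_desc);
}
```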
-
-
- Binary Search
-
-
- Binary searches are in many cases more efficient than linear searches.
- However, they have one major drawback: the table must be pre-sorted. Binary
- searches use the bisection method to search the table, much like a human looks
- through an alphabetized index. For example, if the table is of size 10, the
- first element searched is at the mid-point (1+10)/2 = 5. If element 5 is not a
- match, a simple comparison tells us whether the key is above or below element
- 5 (since the table is sorted). At this point half the table is eliminated from
- further consideration. If the value is greater than the key in element 5, then
- the search is confined to locations 6 to 10. The new mid-point is (6+10)/2 =
- 8. This bisection process continues until either a match is made or the last
- element is searched. For a listing of the actual algorithm, consult the
- references provided. In this example, searching the entire table takes at most
- four iterations (compared to ten for the linear search). The access figures
- provided here assume that the bisection splits the remaining table into two
- parts that differ in size by at most one. The figures for binary searches are
- as follows:
- Best Case : 1 -- the first midpoint examined holds the key
- Worst Case : floor(log2 n) + 1, roughly log2 n -- the key is found only at
- the last midpoint examined
- Average Case : approximately (log2 n) - 1 -- if each element is searched for
- with equal frequency
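- The bisection described above might look like this in C. The sketch uses
- 0-based indices and a closed range [lo, hi]. For a ten-element table it
- examines index 4 first, then index 7 when the key lies above it. These are
- elements 5 and 8 in the 1-based numbering of the example. It also counts the
- elements it examines so the access figures can be checked. The function name
- and the probe counter are illustrative additions.

```c
/* Binary search over keys[0..n-1], which must be sorted ascending.
   Returns the index of key, or -1 if absent. *probes receives the
   number of elements examined. */
int binary_search(const int keys[], int n, int key, int *probes)
{
    int lo = 0, hi = n - 1;            /* closed range [lo, hi] */
    *probes = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  /* (lo+hi)/2 without overflow */
        ++*probes;
        if (keys[mid] == key)
            return mid;
        if (keys[mid] < key)
            lo = mid + 1;              /* key can only be in the upper part */
        else
            hi = mid - 1;              /* key can only be in the lower part */
    }
    return -1;                         /* range is empty: not found */
}
```

- With the keys 1, 3, 5, ..., 19, searching for 15 examines two elements. No
- search in that table examines more than four.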
-
-
- Comparing Linear and Binary Search
-
-
- For large tables, the binary search requires much less time than the linear
- search in the worst case. Consider a table of 50,000 items. The worst-case
- scenario for a binary search requires no more than 16 accesses whereas the
- worst scenario for the linear search takes 50,000 accesses! In situations
- where the majority of searches are likely to fail, the efficiency of the
- binary search is very compelling. However, each step of the binary search
- involves more complex logic (and thus takes more time) than a step of the
- linear search. Also, as already mentioned, the requirement for a sorted table
- is a drawback of the binary search. In this case, efficiency is impacted by
- the sorting method employed. Thus, for certain applications with small tables,
- the linear search may well be the faster of the two.
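- The worst-case access counts quoted above can be computed directly. This
- small sketch uses hypothetical function names. The binary-search count is
- floor(log2 n) + 1, obtained by counting how many times n can be halved
- before it reaches zero.

```c
/* Worst case for a linear search: every one of the n entries. */
unsigned long linear_worst(unsigned long n)
{
    return n;
}

/* Worst case for a binary search: floor(log2 n) + 1 probes, i.e. the
   number of halvings needed to reduce n to zero. */
unsigned binary_worst(unsigned long n)
{
    unsigned probes = 0;
    while (n > 0) {
        probes++;
        n /= 2;
    }
    return probes;
}
```

- binary_worst(10) is 4 and binary_worst(50000) is 16, which matches the
- figures in the text. linear_worst(50000) is 50,000.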
- The volatility of the table also influences the choice of search technique. If
- the table is constantly being updated, then the sorts required by the binary
- search will occur with greater frequency, decreasing its efficiency relative
- to the linear search. However, if the table is static, then the sort is
- performed only once, thus increasing the attractiveness of the binary search.
- In the context of the article, the size of the table is relatively small, so
- it might be best to use a linear search. Even in larger Windows applications,
- the number of case statements in a single switch structure will most certainly
- be less than 1000. However, the internal function table is very static and
- thus eliminates the primary objection to using a binary search.
-
- Figure 1 A sample window with an activated pulldown menu
-
- Listing 1 Definitions for internal application
- /************************************************************
- File : internal.h
- Author : Matt Weisfeld
- ************************************************************/
- #define WM_FILE_NEW 100
- #define WM_FILE_OPEN 101
- #define WM_FILE_SAVE 102
- #define WM_FILE_SAVE_AS 103
- #define WM_FILE_SAVE_ALL 104
- #define WM_FILE_PRINT 105
- #define WM_FILE_PRINTER_SETUP 106
- #define WM_FILE_EXIT 107
- #define WM_EDIT_UNDO 108
- #define WM_EDIT_REDO 109
- #define WM_EDIT_CUT 110
- #define WM_EDIT_COPY 111
- #define WM_EDIT_PASTE 112
- #define WM_EDIT_CLEAR 113
-
- /* End of File */
-
-
- Listing 2 Resource file for internal
- /************************************************************
- File : internal.rc
- Author : Matt Weisfeld
- ************************************************************/
- #include "internal.h"
-
- 1 MENU
- {
- POPUP "&File"
- {
- MENUITEM "&New", WM_FILE_NEW
- MENUITEM "&Open...", WM_FILE_OPEN
- MENUITEM "&Save", WM_FILE_SAVE
- MENUITEM "Save As...", WM_FILE_SAVE_AS
- MENUITEM "Save &All...", WM_FILE_SAVE_ALL
- MENUITEM SEPARATOR
- MENUITEM "&Print", WM_FILE_PRINT
- MENUITEM "P&rinter setup...", WM_FILE_PRINTER_SETUP
- MENUITEM SEPARATOR
- MENUITEM "E&xit", WM_FILE_EXIT
- }
- POPUP "&Edit"
- {
- MENUITEM "&Undo", WM_EDIT_UNDO
- MENUITEM "&Redo", WM_EDIT_REDO
- MENUITEM SEPARATOR
- MENUITEM "Cu&t", WM_EDIT_CUT
- MENUITEM "&Copy", WM_EDIT_COPY
- MENUITEM "&Paste", WM_EDIT_PASTE
- MENUITEM "Cl&ear", WM_EDIT_CLEAR
- }
- }
-
-
- /* End of File */
-
-
- Listing 3 Example of an unwieldy switch statement
- /**************************************************************
- Switch statement example
- **************************************************************/
- case WM_COMMAND:
- {
- switch (wParam) {
- case WM_FILE_NEW:
- MessageBox (hwnd, "file_new", "COMMAND SELECTED", MB_OK);
- break;
- case WM_FILE_OPEN:
- MessageBox (hwnd, "file_open", "COMMAND SELECTED", MB_OK);
- break;
- case WM_FILE_SAVE:
- MessageBox (hwnd, "file_save", "COMMAND SELECTED", MB_OK);
- break;
- case WM_FILE_SAVE_AS:
- MessageBox (hwnd, "file_save_as", "COMMAND SELECTED", MB_OK);
- break;
- case WM_FILE_SAVE_ALL:
- MessageBox (hwnd, "file_save_all", "COMMAND SELECTED", MB_OK);
- break;
- case WM_FILE_PRINT:
- MessageBox (hwnd, "file_print", "COMMAND SELECTED", MB_OK);
- break;
- case WM_FILE_PRINTER_SETUP:
- MessageBox (hwnd, "file_printer_setup", "COMMAND SELECTED", MB_OK);
- break;
- case WM_FILE_EXIT:
- MessageBox (hwnd, "file_exit", "COMMAND SELECTED", MB_OK);
- break;
- case WM_EDIT_UNDO:
- MessageBox (hwnd, "edit_undo", "COMMAND SELECTED", MB_OK);
- break;
- case WM_EDIT_REDO:
- MessageBox (hwnd, "edit_redo", "COMMAND SELECTED", MB_OK);
- break;
- case WM_EDIT_CUT:
- MessageBox (hwnd, "edit_cut", "COMMAND SELECTED", MB_OK);
- break;
- case WM_EDIT_COPY:
- MessageBox (hwnd, "edit_copy", "COMMAND SELECTED", MB_OK);
- break;
- case WM_EDIT_PASTE:
- MessageBox (hwnd, "edit_paste", "COMMAND SELECTED", MB_OK);
- break;
- case WM_EDIT_CLEAR:
- MessageBox (hwnd, "edit_clear", "COMMAND SELECTED", MB_OK);
- break;
- }
- }
-
- /* End of File */
-
-
- Listing 4 Prototypes for internal functions example
-
- /************************************************************
-
- File : proto.h
-
- Author : Matt Weisfeld
-
- ************************************************************/
- typedef struct {
- WPARAM message;
- int (*funcptr)(HWND);
- } INFUNCS;
-
- LRESULT CALLBACK InternalWndProc (HWND, UINT, WPARAM, LPARAM);
-
- int p_file_new(HWND);
- int p_file_open(HWND);
- int p_file_save(HWND);
- int p_file_save_as(HWND);
- int p_file_save_all(HWND);
- int p_file_print(HWND);
- int p_file_printer_setup(HWND);
- int p_file_exit(HWND);
- int p_edit_undo(HWND);
- int p_edit_redo(HWND);
- int p_edit_cut(HWND);
- int p_edit_copy(HWND);
- int p_edit_paste(HWND);
- int p_edit_clear(HWND);
-
- /* End of File */
-
-
- Listing 5 Map table to internal function
- /************************************************************
-
- File : funcs.h
-
- Author : Matt Weisfeld
-
- ***********************************************************/
- INFUNCS infuncs[] = {
- {WM_FILE_NEW, p_file_new},
- {WM_FILE_OPEN, p_file_open},
- {WM_FILE_SAVE, p_file_save},
- {WM_FILE_SAVE_AS, p_file_save_as},
- {WM_FILE_SAVE_ALL, p_file_save_all},
- {WM_FILE_PRINT, p_file_print},
- {WM_FILE_PRINTER_SETUP, p_file_printer_setup},
- {WM_FILE_EXIT, p_file_exit},
- {WM_EDIT_UNDO, p_edit_undo},
- {WM_EDIT_REDO, p_edit_redo},
- {WM_EDIT_CUT, p_edit_cut},
- {WM_EDIT_COPY, p_edit_copy},
- {WM_EDIT_PASTE, p_edit_paste},
- {WM_EDIT_CLEAR, p_edit_clear},
- {NULL,NULL},
- };
-
- /* End of File */
-
-
-
- Listing 6 Demonstrates use of internal functions instead of a switch
- statement.
- /************************************************************
- File : internal.c
- Author : Matt Weisfeld
- ************************************************************/
-
- #define STRICT
- #include <Windows.H>
- #include "proto.h"
- #include "internal.h"
- #include "funcs.h"
-
- /* global variables */
-
- char achWndClass[] = "Internal:MAIN";
- char achAppName[] = "Menu using Internal Functions";
-
- /* Main Function */
-
- int PASCAL WinMain (HINSTANCE hInstance,
- HINSTANCE hPrevInstance, LPSTR lpszCmdLine,
- int cmdShow) {
- HWND hwnd;
- MSG msg;
- WNDCLASS wndclass;
-
- if (!hPrevInstance)
- {
- wndclass.lpszClassName = achWndClass;
- wndclass.hInstance = hInstance;
- wndclass.lpfnWndProc = InternalWndProc;
- wndclass.hCursor = NULL;
- wndclass.hIcon = NULL;
- wndclass.lpszMenuName = "#1";
- wndclass.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
- wndclass.style = NULL;
- wndclass.cbClsExtra = 0;
- wndclass.cbWndExtra = 0;
-
- RegisterClass( &wndclass);
- }
-
- hwnd = CreateWindowEx(0L, achWndClass, achAppName,
- WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, 0, CW_USEDEFAULT,
- 0, NULL, NULL, hInstance, NULL);
-
- ShowWindow (hwnd, cmdShow);
-
- while (GetMessage(&msg, 0, 0, 0))
- {
- TranslateMessage(&msg);
- DispatchMessage(&msg);
- }
- return 0;
- }
-
- /* Process messages */
-
-
- LRESULT CALLBACK InternalWndProc (HWND hwnd, UINT mMsg,
- WPARAM wParam, LPARAM lParam)
- {
- int i, status;
-
- switch (mMsg)
- {
- case WM_COMMAND:
- {
- /* Loop through internal function table */
- for (i=0; infuncs[i].message != NULL; i++) {
- // If message found, execute function
- if (infuncs[i].message == wParam) {
- status = (*infuncs[i].funcptr)(hwnd);
- break;
- }
- }
-
- /* If message does not exist, report an internal error */
- if (infuncs[i].message == NULL) {
- MessageBox (hwnd, "Bad Message",
- "INTERNAL ERROR", MB_OK);
- break;
- }
-
- /* If file_exit is called, exit application */
- if (wParam == WM_FILE_EXIT)
- {
- SendMessage (hwnd, WM_SYSCOMMAND, SC_CLOSE, 0L);
- }
- }
- break; /* WM_COMMAND */
-
- case WM_DESTROY:
- PostQuitMessage(0);
- break;
-
- default:
- return(DefWindowProc(hwnd,mMsg,wParam,lParam));
- break;
- }
- return 0L;
- }
-
- /* End of File */
-
-
- Listing 7 Internal functions to replace switch statement.
- /************************************************************
- File : funcs.c
- Author : Matt Weisfeld
- ************************************************************/
- #include <Windows.h>
- #include "proto.h"
-
- int p_file_new(HWND hwnd)
- {
- MessageBox (hwnd, "file_new", "COMMAND SELECTED", MB_OK);
-
- return(0);
- }
- int p_file_open(HWND hwnd)
- {
- MessageBox (hwnd, "file_open", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_file_save(HWND hwnd)
- {
- MessageBox (hwnd, "file_save", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_file_save_as(HWND hwnd)
- {
- MessageBox (hwnd, "file_save_as", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_file_save_all(HWND hwnd)
- {
- MessageBox (hwnd, "file_save_all", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_file_print(HWND hwnd)
- {
- MessageBox (hwnd, "file_print", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_file_printer_setup(HWND hwnd)
- {
- MessageBox (hwnd, "file_printer_setup", "COMMAND SELECTED",
- MB_OK);
- return(0);
- }
- int p_file_exit(HWND hwnd)
- {
- MessageBox (hwnd, "file_exit", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_edit_undo(HWND hwnd)
- {
- MessageBox (hwnd, "edit_undo", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_edit_redo(HWND hwnd)
- {
- MessageBox (hwnd, "edit_redo", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_edit_cut(HWND hwnd)
- {
- MessageBox (hwnd, "edit_cut", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_edit_copy(HWND hwnd)
- {
- MessageBox (hwnd, "edit_copy", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_edit_paste(HWND hwnd)
- {
- MessageBox (hwnd, "edit_paste", "COMMAND SELECTED", MB_OK);
- return(0);
- }
- int p_edit_clear(HWND hwnd)
- {
- MessageBox (hwnd, "edit_clear", "COMMAND SELECTED", MB_OK);
- return(0);
- }
-
- /* End of File */
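- The table-driven dispatch in internal.c does not depend on Windows. The portable sketch below repeats the same pattern, a sentinel-terminated table of (identifier, function pointer) pairs searched by a loop, so it can be compiled and tested anywhere. The command IDs, the handler names, and the dispatch function are hypothetical stand-ins for the WM_* menu IDs, the p_* functions, and the WM_COMMAND loop in the listings.

```c
#include <stddef.h>

/* Hypothetical command IDs standing in for WM_FILE_NEW and friends. */
enum { CMD_NEW = 101, CMD_OPEN = 102, CMD_EXIT = 103 };

/* Every handler has the same signature, like the p_* functions. */
typedef int (*cmd_handler)(void *ctx);

struct cmd_entry {
    unsigned id;        /* command identifier; 0 ends the table */
    cmd_handler fn;     /* function to call for that identifier */
};

static int last_cmd;    /* records which handler ran (demonstration only) */

static int do_new(void *ctx)  { (void)ctx; last_cmd = CMD_NEW;  return 0; }
static int do_open(void *ctx) { (void)ctx; last_cmd = CMD_OPEN; return 0; }
static int do_exit(void *ctx) { (void)ctx; last_cmd = CMD_EXIT; return 0; }

static const struct cmd_entry commands[] = {
    { CMD_NEW,  do_new  },
    { CMD_OPEN, do_open },
    { CMD_EXIT, do_exit },
    { 0, NULL }         /* sentinel row, like {NULL,NULL} in infuncs */
};

/* Find id in the table and run its handler.  Returns the handler's
   result, or -1 if id is not in the table. */
int dispatch(unsigned id, void *ctx)
{
    size_t i;

    for (i = 0; commands[i].id != 0; i++)
        if (commands[i].id == id)
            return commands[i].fn(ctx);
    return -1;
}
```

- Returning -1 for an unknown ID plays the role of the "Bad Message" box in InternalWndProc; adding a command only requires a new handler and a new table row, with no change to the dispatch loop.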
-
-
-
-
- Approximate String Matching
-
-
- Thomas Phillips
-
-
- Thomas Phillips, CCP, is a member of the ACM, IEEE:CS, ACL, SIAM, and AMTA.
- His interests include linguistics, computational linguistics, parallel
- processing, and cognitive science. He can be reached via e-mail at
- 72713.2100@compuserve.com.
-
-
-
-
- Introduction
-
-
- People aren't perfect. Unfortunately, computers often demand perfection,
- sometimes unjustly. One example is when the user must type in a word or a
- string of words to tell a program what to do. Suppose you have a database
- filled with names and addresses; you can write a program that allows a user to
- search by name, but if you're not careful the program will be too selective.
- Let's say the last name is Xavier, but the user thinks the name is spelled
- Zavier. If your program uses a simple strcmp, it won't find the name. What you
- need is a function to perform approximate matches. That's what freq_match does
- (see Listing 1) -- it finds the closest match based on frequency
- distributions.
-
-
- Matching via Frequency Distributions
-
-
- The frequency distribution of a word shows how many times each letter occurs.
- For example, the frequency distribution of the word "LETTER" would be: L-1,
- E-2, T-2, R-1. If you find the frequency distributions of two different words,
- you can compare them for similarity in spelling.
- The function freq_match uses the array freq_count[] to compare the
- distributions of two words. Each element in freq_count[] holds the
- distribution for its respective character. For example, the ASCII value for an
- 'a' is 97 (or 61H). If a word being analyzed contained three a's (like
- aardvark), then freq_count[97] would equal 3. (freq_count[0] is never actually
- used: the loops stop at a string's terminating null character, so it is never counted.)
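- A minimal sketch of building such a distribution follows. The function name freq_distribution and the caller-supplied 256-element array are assumptions made for illustration; Listing 1 keeps its array static inside freq_match instead. The cast to unsigned char keeps characters above 127 from producing negative subscripts on compilers where char is signed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Listing 1): fill counts[0..255]
   with the frequency distribution of word. */
void freq_distribution(const char *word, int counts[256])
{
    size_t i;

    for (i = 0; i < 256; i++)
        counts[i] = 0;                      /* clear every slot */
    for (i = 0; word[i] != '\0'; i++)
        counts[(unsigned char)word[i]]++;   /* '\0' itself is never counted */
}
```

- For "LETTER" this leaves counts['L'] and counts['R'] at 1, counts['E'] and counts['T'] at 2, and every other element at 0.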
-
-
- Comparing Two Distributions
-
-
- Once freq_count[] is loaded with the distribution for one word (what I call
- the mask word), it's simple to calculate a number that tells how closely the
- mask word matches another word (the test word). I call this number the
- divergence. To start, let freq_count[] hold the frequency distribution of the
- mask word. Next, subtract from it the frequency distribution of the test word.
- Take the words "aardvark" and "apple," for example. The mask word is
- "aardvark," so
- freq_count['a'] is 3,
- freq_count['r'] is 2,
- freq_count['d'] is 1,
- freq_count['v'] is 1,
- freq_count['k'] is 1,
- and all others are 0.
- After the first letter of "apple" is processed, freq_count['a'] equals 2. This
- is because my algorithm subtracts while it processes the test word. After all
- the letters of the test word have been processed, freq_count[] will be as
- follows:
- freq_count['a'] is 2,
- freq_count['r'] is 2,
- freq_count['d'] is 1,
- freq_count['v'] is 1,
- freq_count['k'] is 1,
- freq_count['p'] is -2,
- freq_count['l'] is -1,
- and freq_count['e'] is -1.
- Finally, the algorithm adds up the absolute values of each of freq_count's
- elements. The elements not listed (such as freq_count['q']) are assumed to
- contain zeros. The total, or divergence, is 11. This huge divergence indicates
- that "aardvark" and "apple" are quite dissimilar. However, had I compared
- "aardvark" with its misspelling, "ardvark," I would have arrived at a
- divergence of 1, indicating extreme similarity.
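- The same arithmetic can be checked with a small stand-alone function. This divergence function is a hypothetical restatement of the calculation above, not Listing 1 itself: it builds the mask word's distribution in a local array, subtracts the test word's, and sums the absolute values, so it keeps no state between calls.

```c
#include <stdlib.h>

/* Hypothetical stand-alone divergence: sum over all byte values of
   |count in mask - count in test|. */
int divergence(const char *mask, const char *test)
{
    int counts[256] = {0};
    int i, total = 0;

    for (i = 0; mask[i] != '\0'; i++)
        counts[(unsigned char)mask[i]]++;   /* add the mask word */
    for (i = 0; test[i] != '\0'; i++)
        counts[(unsigned char)test[i]]--;   /* subtract the test word */
    for (i = 0; i < 256; i++)
        total += abs(counts[i]);            /* sum the absolute values */
    return total;
}
```

- Called as divergence("aardvark", "apple") it returns 11, and divergence("aardvark", "ardvark") returns 1, matching the figures worked out by hand above.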
-
-
- Using freq_match
-
-
- Using freq_match requires two steps. First, the program must initialize the
- distribution array by calling freq_match with a valid mask word as the first
- argument and an empty string as the test word. An example would be:
- freq_match("aardvark", "");
- The second parameter must be an empty string -- freq_match uses this empty
- string as a switch enabling initialization. Since freq_match stores the mask
- word's frequency distribution in a static array (freq_count[]), it only needs
- to load each unique mask word once. In subsequent calls to freq_match, the
- first parameter is ignored as long as the second parameter is not an empty
- string. The second parameter should contain a valid test word. For example,
- once "aardvark" has been loaded, you can test its divergence from "apple" by
- using this statement:
- divergence = freq_match("", "apple");
- The return value will be 11.
- To find the best match, call freq_match once for each word in your list of
- test words, keeping track of the minimum divergence and its corresponding test
- word. After you've gone through the entire list (or stopped early because you
- found a divergence of zero), accept the test word with the minimum divergence
- as the most similar word.
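- The search loop just described might look like the following sketch. best_match is a hypothetical name, and to keep the block self-contained it carries its own copy of a stand-alone divergence function rather than calling the static-array freq_match of Listing 1.

```c
#include <stdlib.h>

/* Stand-alone divergence, repeated so this block compiles on its own. */
static int divergence(const char *mask, const char *test)
{
    int counts[256] = {0};
    int i, total = 0;

    for (i = 0; mask[i] != '\0'; i++)
        counts[(unsigned char)mask[i]]++;
    for (i = 0; test[i] != '\0'; i++)
        counts[(unsigned char)test[i]]--;
    for (i = 0; i < 256; i++)
        total += abs(counts[i]);
    return total;
}

/* Hypothetical driver: return the index of the word in list[0..n-1]
   with the smallest divergence from mask, stopping early on a
   divergence of zero.  Returns -1 for an empty list. */
int best_match(const char *mask, const char *list[], int n)
{
    int i, d, best = -1, best_div = 0;

    for (i = 0; i < n; i++) {
        d = divergence(mask, list[i]);
        if (best < 0 || d < best_div) {
            best = i;
            best_div = d;
            if (d == 0)
                break;          /* nothing can beat a divergence of zero */
        }
    }
    return best;
}
```

- With the list {"smith", "xavier", "jones"} and the mask "zavier", the divergences are 9, 2, and 9, so best_match returns 1, the index of "xavier".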
-
-
-
- Anagrams and Other Problems
-
-
- Frequency distributions work well for most approximate matches; unfortunately,
- they can't discriminate between a word and one of its anagrams. For example,
- "TAB" is an anagram for "BAT." The frequency distribution for either of these
- words is: T-1, A-1, B-1. One possible solution to this problem is to test the
- first or last letters of the mask and test words. If one or both match, then
- reduce the divergence by some constant number. If they don't match, increase
- the divergence. For example:
- divergence += (mask_word[0] == test_word[0]) ?
- ((divergence > 0) ? -1 : 0) : 1;
- This strategy will slow down the matching process, though. In addition, it can
- introduce some of its own problems (like matching "BIG" with "BOG").
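- One way to package the first-letter adjustment is as a separate step applied to a divergence that has already been computed. The function adjust_first_letter below is a hypothetical sketch of the same rule as the expression above: a matching first letter lowers a positive divergence by one, and a mismatch raises it by one.

```c
/* Hypothetical helper: apply the first-letter adjustment to a
   divergence d that was already computed for mask and test. */
int adjust_first_letter(const char *mask, const char *test, int d)
{
    if (mask[0] == test[0])
        return (d > 0) ? d - 1 : d;     /* never drops below zero */
    return d + 1;                       /* first letters differ */
}
```

- "TAB" and "BAT" have a raw divergence of 0; because their first letters differ, the adjusted value is 1. "BIG" and "BOG" have a raw divergence of 2, which drops to 1 because both start with B, which is exactly the kind of false match mentioned above.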
- Should you throw away strcmp and use freq_match for all of your string
- matching needs? Obviously not, but there are places where freq_match() is very
- useful. freq_match works best when a mask word must be found in a relatively
- small list of test words (a few hundred or less). Typically, you would use
- this function in applications where human errors are also frequent -- and that
- covers a lot of ground.
-
- Listing 1 Definition of function freq_match and test program
- #include <stdio.h>
- #include <stdlib.h> /* abs() */
- #include <string.h> /* strchr() */
-
- int freq_match(const char *mask, const char *test)
- {
- static int freq_count[256]; /* the frequency */
- /* distribution */
- int divergence;
- int i;
-
- /* freq_match("maskword", ""); */
- if (test[0] == '\0') {
- /* initialize the distribution array */
- for (i=0; i<256; i++)
- freq_count[i] = 0;
-
- /* compute the distribution; the cast keeps bytes */
- /* above 127 from producing negative subscripts */
- for (i=0; mask[i] != '\0'; i++)
- freq_count[(unsigned char)mask[i]] += 1;
-
- /* return a zero for initialization */
- return 0;
- } /* if */
-
- /* freq_match("don't care", "testword"); */
- else {
- /* subtract the freq. dist. of the test word */
- for (i=0; test[i]!='\0'; i++)
- freq_count[(unsigned char)test[i]] -= 1;
-
- /* compute the divergence */
- for (divergence=0, i=0; i<256; i++)
- divergence += abs(freq_count[i]);
-
- /* this code is to reset the freq. dist. */
- /* back to the settings for the mask word */
- for (i=0; test[i]!='\0'; i++)
- freq_count[(unsigned char)test[i]] += 1;
-
- return divergence;
- } /* else */
- } /* freq_match() */
-
- /* flush the prompt, then read one line into buf */
- /* and strip the trailing newline (safe replacement for gets) */
- static void read_line(char *buf, int size)
- {
- char *nl;
-
- fflush(stdout);
- if (fgets(buf, size, stdin) == NULL)
- buf[0] = '\0';
- else if ((nl = strchr(buf, '\n')) != NULL)
- *nl = '\0';
- }
-
- int main(void)
- {
- char mask[80], test[80];
-
- printf("Mask:");
- read_line(mask, sizeof mask);
-
- printf("Test:");
- read_line(test, sizeof test);
-
- freq_match(mask, "");
- printf("The divergence is %d.\n",
- freq_match("", test));
- return 0;
- } /* main() */
-
- /* End of File */
-
-
-
-
- Record-Oriented Data Compression
-
-
- John W. Ross
-
-
- John W. Ross is a computational scientist in the University of Toronto's High
- Performance and Research Computing group. His interests include scientific
- programming on highly parallel and vector supercomputers. John can be reached
- at yjohn@utirc.utoronto.ca.
-
-
-
-
- Introduction
-
-
- Data compression programs are very useful, but many of them have an annoying
- shortcoming: they work on a file in toto, compressing the whole file in one
- gulp. This shortcoming becomes especially significant when you are building a
- database program or file manager. You can use a data compressor to compress
- the resulting file, but once you do you no longer know where the record
- boundaries are. You have to uncompress the file to extract or insert records,
- so you lose some of the advantages of compression. Many similar
- applications can't take advantage of conventional data compressors for this
- reason.
- We need a data compressor/decompressor that operates on the individual records
- of a file. Such a program would allow you to compress a record, hand it to the
- data base manager for storage, then later retrieve it and uncompress it. In
- this article I present a program that does just that.
-
-
- Implementation
-
-
- I've chosen Huffman encoding as the algorithm to compress/decompress data
- records (see the sidebar, "Two Popular Compression Algorithms," for a detailed
- discussion of my choice).
- Since Huffman encoding relies on character frequencies, it works with
- practically any size data record. Of course, it is necessary to perform an
- initial pass over the entire data set (or at least, a representative data set)
- to determine the character frequencies. From that point on, as long as a data
- record has the same character frequency characteristics as this data set, the
- program will achieve good compression ratios.
- I have implemented my compression scheme as three distinct entities. The first
- is a stand-alone program that reads a file and generates a Huffman encoding
- tree, which it then writes out to a file. The next is a function which
- compresses a record using the encoding tree. The function takes a pointer to a
- record to be compressed and writes the compressed record into a buffer
- supplied by the caller.
- The third entity is a function which decompresses a record. I have also
- written a simple file management program to illustrate the application of
- these functions. I consider each of the units in turn.
-
-
- Building the Encoding Tree
-
-
- Before compressing data records a program must build a Huffman encoding tree
- from a sample of text similar to that which will be compressed. The encoding
- tree will be based on the character frequencies found in the file. Program
- bldtree (Listing 2) performs this operation.
- hufftree.h (Listing 1) is an include file that defines the encoding tree
- structure. By defining it in terms of short integers I am able to keep the
- external storage for the tree to a minimum.
- bldtree takes a single command-line argument, the name of the file to be
- analyzed. The first thing bldtree does is determine a frequency count of the
- characters in the file. Note that all characters are assigned a minimum count
- of 1, so that the compression routine will be able to generate a code for any
- character, even if it didn't appear in the file. The remainder of the program
- builds the encoding tree from the table of character frequencies. bldtree
- creates a file called htree.dat and writes the encoding tree to the file.
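- The core of the tree-building step can be sketched in isolation. The function below is a hypothetical, simplified version of bldtree's main loop: it repeatedly joins the two parentless nodes with the smallest counts and then reports each symbol's code length, which is its depth in the tree. Unlike bldtree, it uses -1 rather than 0 to mean "no node", so character code 0 takes part as well.

```c
#define MAX_SYM 256   /* assumption: at most 256 symbols */

/* Hypothetical sketch: given nsym counts, set depth[s] to the Huffman
   code length (in bits) of symbol s.  Requires 1 <= nsym <= MAX_SYM. */
void huffman_code_lengths(const long count_in[], int nsym, int depth[])
{
    long count[2 * MAX_SYM];
    int parent[2 * MAX_SYM];
    int nnodes = nsym, i, h1, h2;

    for (i = 0; i < nsym; i++) {
        count[i] = count_in[i];
        parent[i] = -1;
    }
    for (;;) {
        h1 = h2 = -1;               /* smallest and second-smallest */
        for (i = 0; i < nnodes; i++) {
            if (parent[i] != -1)
                continue;           /* already joined into the tree */
            if (h1 == -1 || count[i] < count[h1]) {
                h2 = h1;
                h1 = i;
            } else if (h2 == -1 || count[i] < count[h2]) {
                h2 = i;
            }
        }
        if (h2 == -1)               /* only the root is left */
            break;
        count[nnodes] = count[h1] + count[h2];  /* new internal node */
        parent[nnodes] = -1;
        parent[h1] = parent[h2] = nnodes;
        nnodes++;
    }
    for (i = 0; i < nsym; i++) {    /* depth = hops from leaf to root */
        int d = 0, n = i;
        while (parent[n] != -1) {
            n = parent[n];
            d++;
        }
        depth[i] = d;
    }
}
```

- For counts of 5, 2, 1, and 1 the code lengths come out as 1, 2, 3, and 3 bits, so nine characters with that distribution need 5*1 + 2*2 + 1*3 + 1*3 = 15 bits instead of 9*8 = 72 bits as plain bytes.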
-
-
- The Compression Function
-
-
- The function compress and its ancillary functions, encode and emit, are shown
- in Listing 3. compress takes the length of the uncompressed data record, a
- pointer to the buffer holding it, a pointer to an output length counter (which
- the caller sets to 0), and a pointer to an output buffer. It writes the
- compressed data record into the output buffer, advancing the output length as
- it goes.
- Note that the first thing compress does is put the length of the input data
- record into the first byte of the output record. compress stores this length
- in the first byte so the decompression function will know how many bytes it
- has to decode from the compressed record. Using one byte to store the length
- restricts the input record to at most 255 bytes, the largest value an unsigned
- byte can hold. If you wanted to use larger records you could store the length as a
- short integer using the first two bytes of the output record. (This assumes
- that the input records are variable length. If they are fixed length, and the
- application knows that length then it is unnecessary to store the length in
- the record.)
- For each character in the input buffer, compress calls function encode, which
- in turn uses the global struct ht, the encoding tree. The application program
- is responsible for defining storage for ht, and either reading it from an
- external file or constructing it. The final call to emit in compress pads out
- the final byte of the compressed record if necessary.
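- The length byte and the padding can be seen in a small bit-writer sketch. struct bitwriter, bw_put, and bw_flush are hypothetical names; the packing order (most significant bit first) and the zero padding follow emit in Listing 3, but this version keeps its state in a structure instead of static variables and stores each byte as soon as it fills.

```c
#include <stddef.h>

/* Hypothetical bit writer modeled on emit() in Listing 3. */
struct bitwriter {
    unsigned char *buf;   /* output buffer supplied by the caller */
    size_t len;           /* bytes written so far */
    unsigned acc;         /* bits collected for the current byte */
    int nbits;            /* number of bits held in acc (0..7) */
};

/* Append one bit, most significant bit of each byte first. */
void bw_put(struct bitwriter *w, int bit)
{
    w->acc = (w->acc << 1) | (unsigned)(bit & 1);
    if (++w->nbits == 8) {              /* byte full: store it */
        w->buf[w->len++] = (unsigned char)w->acc;
        w->acc = 0;
        w->nbits = 0;
    }
}

/* Pad the last partial byte with 0 bits so it can be stored. */
void bw_flush(struct bitwriter *w)
{
    while (w->nbits != 0)
        bw_put(w, 0);
}
```

- Writing a length byte of 3 followed by the bits 1, 0, 1 and a flush produces the two bytes 0x03 and 0xA0: the three data bits occupy the top of the second byte and the remaining five bits are padding.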
-
-
- Decompression
-
-
- To invert the compression process, an application program passes a compressed
- record to the function decode (Listing 4), which expands the record into a
- caller-supplied buffer in its original form and stores its length through a
- pointer argument.
- (It should be obvious, but in case it isn't I want to stress that the encoding
- tree used to decompress the data records must be the same one that was used to
- compress them.)
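- Decoding walks the tree from the root, taking one branch per input bit until it reaches a leaf, and stops after the number of characters recorded in the length byte, so the padding bits are never interpreted. The sketch below assumes a hypothetical node layout (struct node with zero/one child indexes and -1 marking a leaf) that is simpler than the htree structure used by the listings.

```c
#include <stddef.h>

/* Hypothetical node layout: a leaf has zero == -1 and holds sym;
   an internal node has children for bit 0 (zero) and bit 1 (one). */
struct node { int zero, one; unsigned char sym; };

/* Decode one record whose first byte is the original length.
   Writes the characters to out and returns how many were written. */
size_t decode_record(const struct node *tree, int root,
                     const unsigned char *in, unsigned char *out)
{
    size_t n = in[0], nout = 0, nin = 1;
    int bitpos = 8;                 /* forces a fresh byte on first use */
    unsigned byte = 0;

    while (nout < n) {
        int h = root;
        while (tree[h].zero != -1) {        /* descend until a leaf */
            if (bitpos == 8) {
                byte = in[nin++];
                bitpos = 0;
            }
            h = (byte & 0x80) ? tree[h].one : tree[h].zero;
            byte <<= 1;                     /* next bit moves into 0x80 */
            bitpos++;
        }
        out[nout++] = tree[h].sym;
    }
    return nout;
}
```

- With the three-leaf tree that gives a = 0, b = 10, and c = 11, the record {4, 0x58} decodes to "abca": 0x58 is 01011000, whose first six bits spell 0 10 11 0 and whose last two bits are padding that the length byte tells the decoder to ignore.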
-
-
- An Example
-
-
- The program called filer (Listing 5) shows how the compression routines may be
- applied. filer takes two command-line arguments: the name of an input file to
- be compressed and the name of an output file that will contain the compressed
- records. filer is designed to work with a plain text file. filer treats each
- line in the file, delimited by a newline (\n), as a record to be compressed.
- filer also writes an index file as it processes the text file so that
- individual compressed records may be retrieved later.
- filer assumes that a file called htree.dat containing the Huffman encoding
- tree exists.
- filer keeps track of the number of bytes in the input and output records and
- prints out the average compression ratio.
- unfiler (Listing 6) illustrates the application of decompression. unfiler can
- read any record from the compressed file by number, but it's set up in this
- example to read all of the records in sequence.
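- The layout of the index file can be read directly from filer and unfiler: a long record count, a long 0, and then one long per record giving the offset just past that record in the compressed file. The sketch below assumes the index has already been read into an array of longs; record_extent is a hypothetical helper that performs the same offset arithmetic unfiler does with fseek.

```c
/* Hypothetical helper.  idx[0] = record count, idx[1] = 0, and
   idx[2 + i] = offset just past compressed record i, so record i
   occupies bytes idx[1 + i] up to (not including) idx[2 + i]. */
int record_extent(const long *idx, long i, long *start, long *len)
{
    if (i < 0 || i >= idx[0])
        return -1;                      /* no such record */
    *start = idx[1 + i];                /* end of the previous record */
    *len = idx[2 + i] - idx[1 + i];
    return 0;
}
```

- For an index holding {3, 0, 5, 9, 20}, record 1 starts at byte 5 and is 4 bytes long, and record 2 starts at byte 9 and is 11 bytes long.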
- To use these programs with a text file called misc.txt, you would type the
- following commands:
- bldtree misc.txt
-
- filer misc.txt misc.cmp
- unfiler misc.cmp misc.unc
- misc.cmp is the file containing the compressed records and misc.unc contains
- the decompressed versions of the records. misc.txt and misc.unc should be
- identical. These programs will compile and run under MS-DOS and most versions
- of UNIX.
- To work with a file that does not consist of individual lines of text, it is
- simple to modify filer to read blocks of text or data rather than lines. Just
- replace the program lines that read:
- while (fgets(bufin, MAXBUF-1, in) != NULL)
- { inlen = strlen(bufin);
- ncharin += inlen;
- with
- while ((inlen = fread(bufin, 1, 255, in)) > 0)
- { ncharin += inlen;
- This modification will yield slightly better compression ratios. You don't
- have to modify unfiler to recover the output file.
-
-
- Simple Encryption
-
-
- This compression process also performs encryption as a side effect. The
- encryption key is the particular Huffman encoding tree in use when the data
- was compressed. Someone looking to decrypt your data would have to determine
- that it had been compressed using Huffman encoding and then would have to
- determine what sort of character frequency distribution had been used. While
- this kind of challenge probably wouldn't stop a professional code breaker, it
- might dissuade the casual snooper.
-
-
- Summary
-
-
- In this article I have shown how to adapt a common file compression algorithm,
- Huffman encoding, so that it may be applied to the individual data records in
- a file. This technique would be useful in database or file management
- applications in which parts of the file have to be stored or retrieved
- independently. The compression process also serves as a means of
- encrypting the data.
- Two Popular Compression Algorithms
- Two general purpose data compression algorithms are currently in widespread
- use. These are the Lempel-Ziv-Welch (LZW) algorithm (Welch, Terry A., "A
- Technique for High Performance Data Compression," IEEE Computer, June, 1984.)
- and the Huffman encoding algorithm. Both are lossless compression methods, but
- rely on completely different compression mechanisms.
- The LZW algorithm generally yields better results and does not require
- external tables. This algorithm replaces strings of characters with single
- codes which create a sort of expanded alphabet. LZW adds new strings to a
- string table, which is created dynamically as the input data stream is
- processed. As output LZW produces the codes representing strings in the input
- stream.
- In the decompression phase the LZW algorithm reads the string codes, from
- which it can dynamically reconstruct the string table and output the original
- strings of characters.
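- A deliberately simple LZW encoder shows the string table growing as input is processed. This is a sketch for illustration only, not Welch's published implementation: the table stores each entry as a (prefix code, next byte) pair, codes 0 through 255 stand for single bytes, and lookups are linear searches, where practical encoders use hashing and variable-width output codes.

```c
#include <stddef.h>

#define LZW_MAX 4096   /* assumed table size (12-bit codes) */

/* Simplified LZW encoder: writes output codes to codes[] and returns
   how many were produced.  codes[] must hold at least n entries. */
size_t lzw_encode(const unsigned char *in, size_t n, int *codes)
{
    static int prefix[LZW_MAX];          /* entry j = prefix[j] + suffix[j] */
    static unsigned char suffix[LZW_MAX];
    int next = 256, w, j;
    size_t i, ncodes = 0;

    if (n == 0)
        return 0;
    w = in[0];                           /* current string, as a code */
    for (i = 1; i < n; i++) {
        int found = -1;
        for (j = 256; j < next; j++)     /* is w + in[i] in the table? */
            if (prefix[j] == w && suffix[j] == in[i]) {
                found = j;
                break;
            }
        if (found != -1) {
            w = found;                   /* extend the current string */
        } else {
            codes[ncodes++] = w;         /* emit the code for w */
            if (next < LZW_MAX) {        /* add w + in[i] to the table */
                prefix[next] = w;
                suffix[next] = in[i];
                next++;
            }
            w = in[i];                   /* restart from this byte */
        }
    }
    codes[ncodes++] = w;                 /* emit the final string */
    return ncodes;
}
```

- Encoding "ABABABA" yields the four codes 65, 66, 256, 258: the repeated "AB" and "ABA" are replaced by table entries. A short record with no repeated strings gains nothing this way, which is the point made below about record-oriented compression.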
- The Huffman encoding algorithm is based on the observation that certain
- characters, especially in text, occur more frequently than others. Instead of
- allocating an 8-bit ASCII code to each character the Huffman algorithm uses
- variable-length codes for each character. Frequently occurring characters are
- represented by codes with fewer bits while rarely occurring characters are
- represented by codes with more bits.
- Static Huffman encoding requires a table of probabilities of occurrence of
- each character before compression. Programs may derive this table from a
- pre-scan of the actual data if it is available, or from statistical
- observations of similar data -- this works well for English text, for example.
- In any event, the program constructs an encoding tree from the probability
- information. This tree must be available in both the compression and
- decompression phases. Though the output consists of variable-length bit
- strings for each character, no end-of-character markers are required -- the
- algorithm "knows" when it has reached the end of a code in the decompression
- phase.
-
-
- Choosing an Algorithm for Compression
-
-
- As mentioned above, the LZW algorithm generally produces better compression
- ratios but it is not suitable for record-oriented compression. The records
- involved may be quite short and may contain no repeated strings. Since the
- records may be processed independently (for example, inserting one record at
- random into a data base) LZW can't take advantage of the fact that the data
- base or file as a whole may have much repeated information in it. Since the
- LZW algorithm works by replacing repeated strings with shorter codes it would
- often provide unsatisfactory compression in this application.
- Huffman encoding will work with individual records, but it's not without
- drawbacks. First, Huffman encoding requires an encoding tree be available
- during both compression and decompression phases. If you use an encoding tree
- that was constructed from data with a particular character frequency
- distribution to encode data that has a different character frequency
- distribution, your results will not be satisfactory. Second, the algorithm
- doesn't work well for data containing characters that occur with a roughly
- uniform frequency. Binary data is particularly bad -- files such as a program
- executable don't compress -- they will probably increase in size!
- A third problem stems from the algorithm's conversion of input data as a
- string of bytes to output data as a string of bits. Since most computers can
- only address data units down to the byte level, this output string must be
- padded out to an integral number of bytes. Unfortunately, in the decompression
- phase the Huffman algorithm can't tell whether these extra few bits represent
- more characters or whether they are just padding. It is therefore necessary to
- inform the decompression program what the original record length was so it
- will know when to stop.
- Having said all that, the algorithm generally works well for text data. For
- English text you can expect data to be compressed to about 60% of its original
- size. The algorithm also works well in situations where your data consists of
- numbers represented by ASCII digits -- in this case you can get compressions
- closer to 40% of the original size.
-
- Listing 1 hufftree.h: defines Huffman encoding tree structure
- struct htree
- { short parent;
- short right;
- short left;
- };
-
- #ifdef MAIN
- struct htree ht[512];
- short root;
- #else
- extern struct htree ht[];
- extern short root;
- #endif
-
- void compress(int inlen, char *bufin, int *outlen, char
- *bufout);
- void encode(short h, short child, int *outlen, char *bufout);
- void emit(int bit, int *outlen, char *bufout);
- void decode(char *bufin, int *outlen, char *bufout);
-
-
- /* End of File */
-
-
- Listing 2 A program that reads a file and generates a Huffman encoding tree
- based on the character frequencies in that file.
- /* ---------------------- bldtree - main --------------------
- */
-
- #include <stdio.h>
- #include <stdlib.h>
- #define MAIN
- #include "hufftree.h"
-
- long htcnt[512];
-
- int main(int argc, char *argv[])
- {
- FILE *fin, *fout;
- int c;
- int i;
- int ntree = 256;
- short h1, h2;
-
- if (argc < 2)
- { fprintf(stderr,"usage: %s infile \n",argv[0]);
- exit(0);
- }
-
- if ((fin = fopen(argv[1],"rb")) == NULL)
- { fprintf(stderr,"Unable to open %s for input\n",argv[1]);
- exit(1);
- }
-
- if ((fout = fopen("htree.dat","wb")) == NULL)
- { fprintf(stderr,"Unable to open htree.dat for output\n");
- exit(1);
- }
-
- /* initialize character counts so all characters recognized
- */
- for (i=0; i<256; i++)
- htcnt[i] = 1;
-
- /* count character occurrence frequencies */
- while ((c = fgetc(fin)) != EOF)
- { htcnt[c]++;
- }
-
- /* build Huffman tree */
- while(1)
- { h1 = 0;
- h2 = 0;
- for (i=0; i<ntree; i++)
- { if (i != h1)
- { if (htcnt[i] > 0 && ht[i].parent == 0)
- { if (h1 == 0 || htcnt[i] < htcnt[h1])
- { if (h2 == 0 || htcnt[h1] < htcnt[h2])
- h2 = h1;
- h1 = i;
- }
- else if (h2 == 0 || htcnt[i] < htcnt[h2])
- h2 = i;
- }
- }
- }
- if (h2 == 0)
- { root = h1;
- break;
- }
- ht[h1].parent = ntree;
- ht[h2].parent = ntree;
- htcnt[ntree] = htcnt[h1] + htcnt[h2];
- ht[ntree].right = h1;
- ht[ntree].left = h2;
- ntree++;
- }
-
- /* write out tree */
- fwrite(&root, sizeof(root), 1, fout);
- fwrite(ht, sizeof(ht), 1, fout);
- fclose(fin);
- fclose(fout);
- return 0;
- }
-
- /* End of File */
-
-
- Listing 3 functions that compress a data record using Huffman encoding.
- #include "hufftree.h"
-
- /*
- ------------------------ compress ----------------------------
- */
-
- void compress(int inlen, char *bufin, int *outlen, char
- *bufout)
- {
- int c, i;
-
- bufout[(*outlen)++] = inlen;
- for (i=0; i<inlen; i++)
- { c = bufin[i];
- encode((c & 255), 0, outlen, bufout);
- }
- emit(-1, outlen, bufout);
-
- }
-
-
- /*
- ------------------------- encode -----------------------------
- */
-
- void encode(short h, short child, int *outlen, char *bufout)
- {
- if (ht[h].parent != 0)
- encode(ht[h].parent, h, outlen, bufout);
- if (child)
- { if (child == ht[h].right)
-
- emit(0, outlen, bufout);
- else if (child == ht[h].left)
- emit(1, outlen, bufout);
- }
- }
-
-
- static char byt;
- static int cnt;
-
- /*
- ------------------------- emit -------------------------------
- */
-
- void emit(int bit, int *outlen, char *bufout)
- {
- if (bit == -1)
- { while (cnt != 8)
- { byt = byt << 1;
- cnt++;
- }
- bufout[(*outlen)++] = byt;
- byt = 0;
- cnt = 0;
- return;
- }
- if (cnt == 8)
- { bufout[(*outlen)++] = byt;
- byt = 0;
- cnt = 0;
- }
- byt = (byt << 1) | bit;
- cnt++;
- }
-
- /* End of File */
-
-
- Listing 4 A function to decompress a data record compressed with Huffman
- encoding.
- #include <stdio.h>
- #include "hufftree.h"
-
- /*
- -------------------------- decode -----------------------------
- */
-
- void decode(char *bufin, int *outlen, char *bufout)
- { short h;
- int obit;
- int nin = 0, nout = 0;
- int byt, cnt = 8;
- unsigned char size;
-
- size = (unsigned char)bufin[nin++];
- while (nout< size)
- { h = root;
- while (ht[h].right != 0) /* leaves have no children */
- { if (cnt == 8)
- { byt = (unsigned char)bufin[nin];
- nin++;
- cnt = 0;
- }
- obit = byt & 0x80;
- byt <<= 1;
- cnt++;
- if (obit)
- h = ht[h].left;
- else
- h = ht[h].right;
- }
- bufout[nout++] = h;
- }
- *outlen = nout;
- }
-
- /* End of File */
-
-
- Listing 5 A program to illustrate the use of the data compression functions.
- #include <stdio.h>
- #include <string.h>
- #include <stdlib.h>
- #define MAIN
- #include "hufftree.h"
-
- #define MAXBUF 1024
-
- /*
- ---------------------- filer - main -------------------------
- */
-
- int main(int argc, char *argv[])
- {
- FILE *ftree, *indx, *in, *out;
- long nrecs, end, zero = 0;
- int inlen, outlen;
- char bufin[MAXBUF];
- char bufout[MAXBUF];
- long ncharin = 0, ncharout = 0;
-
- if (argc != 3)
- { fprintf(stderr,"Usage: %s input output\n", argv[0]);
- exit(0);
- }
-
- if ((in = fopen(argv[1], "rb")) == NULL)
- { fprintf(stderr,"Unable to open input file %s\n", argv[1]);
- exit(1);
- }
-
- if ((out = fopen(argv[2], "wb")) == NULL)
- { fprintf(stderr,"Unable to open output file %s\n", argv[2]);
- exit(1);
- }
-
- /* read in Huffman tree */
- if ((ftree = fopen("htree.dat","rb")) == NULL)
- { fprintf(stderr,"Unable to open htree.dat\n");
- exit(1);
-
- }
- else
- { fread(&root, sizeof(root), 1, ftree);
- fread(ht, sizeof(ht), 1, ftree);
- fclose(ftree);
- }
-
- /* create index file */
- if ((indx = fopen("index", "wb")) == NULL)
- { fprintf(stderr,"Unable to open index file index\n");
- exit(1);
- }
- else
- { fwrite(&zero, sizeof(zero), 1, indx);
- fwrite(&zero, sizeof(zero), 1, indx);
- nrecs = 0;
- end = 0;
- }
-
- while (fgets(bufin, MAXBUF-1, in) != NULL)
- { inlen = strlen(bufin);
- ncharin += inlen;
- outlen = 0;
- compress(inlen, bufin, &outlen, (char *)bufout);
- ncharout += outlen;
- nrecs++;
- end += outlen;
- fwrite(&end, sizeof(end), 1, indx);
- fwrite(bufout, sizeof(char), outlen, out);
- }
- rewind(indx);
- fwrite(&nrecs, sizeof(nrecs), 1, indx);
-
- printf("avg chars/record in: %.2f out:%.2f\n",
- (float)ncharin/nrecs, (float)ncharout/nrecs);
- printf("compression ratio: %.3f\n",
- (float)ncharout/(float)ncharin);
- }
-
- /* End of File */
-
-
- Listing 6 A program to illustrate the use of the data decompression function.
- #include <stdio.h>
- #include <stdlib.h>
- #define MAIN
- #include "hufftree.h"
-
- #define MAXBUF 1024
-
- /*
- ---------------------- unfiler - main -------------------------
- */
-
- int main(int argc, char *argv[])
- {
- FILE *ftree, *indx, *in, *out;
- char bufin[MAXBUF];
- char bufout[MAXBUF];
- long ofst, nrecs, start, end;
-
- int i, len, outlen;
-
- if (argc != 3)
- { fprintf(stderr,"Usage: %s input output\n", argv[0]);
- exit(0);
- }
-
- if ((in = fopen(argv[1], "rb")) == NULL)
- { fprintf(stderr,"Unable to open input file %s\n", argv[1]);
- exit(1);
- }
-
- if ((out = fopen(argv[2], "wb")) == NULL)
- { fprintf(stderr,"Unable to open output file %s\n", argv[2]);
- exit(1);
- }
-
- if ((ftree = fopen("htree.dat","rb")) == NULL)
- { fprintf(stderr,"Unable to open htree.dat\n");
- exit(1);
- }
- else
- { fread(&root, sizeof(root), 1, ftree);
- fread(ht, sizeof(ht), 1, ftree);
- fclose(ftree);
- }
-
- if ((indx = fopen("index", "rb")) == NULL)
- { fprintf(stderr,"Unable to open index file\n");
- exit(1);
- }
- else
- { fread(&nrecs, sizeof(nrecs), 1, indx);
- }
-
- for (i=0; i<nrecs; i++)
- {
- ofst = sizeof(nrecs) + sizeof(start) * i;
- fseek(indx, ofst, SEEK_SET);
- fread(&start, sizeof(start), 1, indx);
- fread(&end, sizeof(end), 1, indx);
- fseek(in, sizeof(char) * (start), SEEK_SET);
- len = end - start;
- fread(bufin, sizeof(char), len, in);
- decode(bufin, &outlen, bufout);
- bufout[outlen] = '\0';
- fprintf(out,"%s",bufout);
- }
-
- }
-
- /* End of File */
-
-
-
-
- Code Complete
-
-
- Tommy Usher
-
-
- Tommy Usher is president of RENT-A-HACK, a low-cost computer consulting firm.
- He also does freelance programming. He can be reached at hacker3000@aol.com.
-
-
- Code Complete by Steve McConnell provides a good overview of a wide range of
- programming topics. Though the book is described on the cover as "A Practical
- Guide to Software Construction," in fact it ventures well beyond the practical
- and into what McConnell might call the "philosophy" of good programming. Code
- Complete doesn't provide all of the answers, but it is a good place to start.
-
-
- Audience
-
-
- The author lists three groups who he feels should read this book: experienced
- programmers who want a comprehensive guide to software construction;
- self-taught programmers who are looking to learn more effective programming
- practices; and students making the transition from an academic environment to
- a professional one.
-
-
- Content
-
-
- The book is divided into eight sections:
- 1. Laying the Foundation
- 2. Design
- 3. Data
- 4. Control
- 5. Constant Considerations
- 6. Quality Improvement
- 7. Final Steps
- 8. Software Craftsmanship
- Each of these sections contains several chapters on more specific topics.
- McConnell uses special symbols throughout the book to alert the reader to key
- points, hard data, suggestions for further reading, and what he calls coding
- horrors. (This last item provides examples of how not to program.) Many of the
- chapters include a checklist designed to serve as a quick reference for the
- chapter's contents.
- The first section, "Laying the Foundation," deals with the concept of software
- construction and provides a philosophical basis for the material which
- follows. It begins with a definition of software construction, followed by a
- discussion of the various metaphors that programmers use to understand
- programming. (Does one "build" a program or "grow" it?) The author then
- discusses the prerequisites to coding.
- The next section deals with the design of software. Here McConnell introduces
- concepts such as routines, modules, and Program Design Language (PDL). The
- section includes a discussion of design methodologies, with special emphasis
- on structured and object-oriented design. It concludes with a description of
- "round-trip design," which seeks to combine the ideas of the other design
- methodologies and apply the best-suited method to each problem.
- The "Data" section puts the concept of data structures in practical terms. The
- emphasis is not on the structures themselves but on how to use them
- efficiently. Subjects covered include general issues, variable names, and data
- types.
- The "Control" section begins with straight-line code and moves through the
- various types of control structures, including conditionals, loops, and
- unusual control structures such as goto, return, and recursion.
- The chapter on unusual control structures is one of the best in the book and
- possibly one of the most controversial. The author provides one of the best
- discussions of the goto issue I have read. He is balanced but cautious in his
- treatment of goto. His conclusion is that in nine out of ten cases, gotos can
- be easily eliminated. Of the remaining one in ten, nine out of those ten can
- be eliminated with reasonable effort. That leaves one in 100 gotos that are
- actually necessary. This may annoy those who feel gotos must never be used,
- but he makes a reasoned and logical argument for his position. Further, he
- provides reasonable guidelines for the proper use of gotos.
- Another area where he provides useful information is when and how to use
- recursion. An interesting section shows that the classic textbook examples
- illustrating recursion -- calculating factorials and Fibonacci numbers -- are
- both confusing and misleading. The section concludes with a discussion of
- general control issues, and a closer look at structured programming.
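- The review doesn't reproduce McConnell's examples, but the usual argument
- against the textbook Fibonacci function is easy to demonstrate. The C++ sketch
- below (hypothetical function names, not taken from the book) counts the calls
- made by the naive recursive version and compares it with a plain loop.

```cpp
#include <cstdint>

// Naive recursive Fibonacci, the classic textbook example. 'calls' records
// how many times the function was entered, to expose the repeated work.
std::uint64_t fib_recursive(int n, long& calls) {
    ++calls;
    if (n < 2)
        return static_cast<std::uint64_t>(n);   // fib(0) = 0, fib(1) = 1
    return fib_recursive(n - 1, calls) + fib_recursive(n - 2, calls);
}

// The same function as a simple loop: one pass, no repeated subproblems.
std::uint64_t fib_iterative(int n) {
    if (n == 0)
        return 0;
    std::uint64_t prev = 0, curr = 1;           // fib(0), fib(1)
    for (int k = 2; k <= n; ++k) {
        std::uint64_t next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

- Computing fib(20) recursively takes 21,891 calls, while the loop runs 19 times;
- the recursive version's call count grows exponentially with n.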
- The "Constant Considerations" section covers issues that programmers face
- every day, like layout, style, documentation, tools, and management. Here
- again, McConnell tackles areas of controversy (that some programmers approach
- with religious fervor) with balance and moderation. His ideas on program
- layout and style will not please everyone -- for example, I prefer a different
- style of indentation. Still, they deserve consideration and provide
- interesting ideas. On the subject of comments, he tells an amusing story about
- a group of philosophers debating their various philosophies of commenting.
- Other chapters deal with programming tools, the relationship between size and
- software construction, and management. Management is examined from both the
- manager's and the programmer's points of view.
- The next section is called "Quality Improvement." In this section the author
- first gives an overview of software quality, then addresses some specific
- subjects. Reviewing, unit testing, and debugging are examined in this section.
- Each subject is treated in depth and McConnell suggests techniques and tools
- that can help ensure the quality of a program.
- Some additional topics related to quality improvement are discussed in the
- next section, "Final Steps." Here the emphasis is on system integration, code
- tuning, and software evolution. In a world of ever-larger
- software packages, these are important chapters. One interesting and
- potentially controversial concept introduced in the chapter on integration is
- evolutionary delivery. This scheme calls for the developer to deliver the
- software at successive levels of completion. The initial delivery is the core
- of the ultimate product. Subsequent deliveries add capabilities, improve
- interfaces and performance, and ultimately lead to a fully functional product.
- These techniques seem to me more suited to development for in-house use but
- the author is advocating them for commercial releases. His discussion of
- evolutionary delivery almost seems a defense of releasing an unfinished
- product with the intention of ensuring future sales. (During the development
- of a spreadsheet, for example, file operations are not added until the third
- delivery.) This chapter seems to provide a solid basis for the truism that one
- should never purchase any software with a version number ending in zero.
- The last section of the book is on "Software Craftsmanship." The first chapter
- in this section deals with what may seem a strange topic: personal character.
- McConnell argues that character is a legitimate topic of discussion and
- proceeds to describe the characteristics of good programmers. The next chapter
- deals with several different themes related to software craftsmanship:
- complexity, processes, iteration, and religion. By religion he of course
- refers to the various issues, such as use of goto, layout style, and program
- structure, that programmers become emotional about. The book ends with a real
- gem of a chapter called "Where to Go for More Information." Here he gives
- suggestions, both specific and general, for various books that should be in
- the library of the software professional. He also suggests periodicals,
- professional organizations, and sources for books on programming.
-
-
- Commentary
-
-
- Whatever your background, "Code Complete" has something to offer you. Even the
- most experienced C programmer will find ideas for improvement. The author
- writes in an entertaining style and makes good use of code segments to
- illustrate his points.
- Not everyone will be pleased with everything McConnell says. Since he deals
- with subjects that many programmers are very emotional about, some readers
- will no doubt object to his suggestions. Still, this book will benefit any
- programmer who approaches it with an open mind. Even when I disagreed with the
- author's approach, I could see his point. I was pleasantly surprised by his
- willingness to challenge conventional wisdom.
- The book is very well organized. Each chapter is cross-referenced to areas of
- similar interest, and references to other works are in the margins rather than
- at the end of the chapters. (It might be interesting to convert this book to an
- online form. The extensive cross-references almost seem to have been created
- with this in mind.)
- Code Complete should be required reading for anyone who plans to begin or
- continue a career in software development. It won't teach you how to be a C
- programmer, but it could help make you a more productive one.
- Title: Code Complete
- Author: Steve McConnell
- Publisher: Microsoft Press
- Price: $35.00
- ISBN: 1-55615-484-4
- Pages: 880
-
-
- Standard C
-
-
- Introduction to Iostreams
-
-
-
-
- P.J. Plauger
-
-
- P.J. Plauger is senior editor of The C Users Journal. He is convenor of the
- ISO C standards committee, WG14, and active on the C++ committee, WG21. His
- latest books are The Standard C Library, and Programming on Purpose (three
- volumes), all published by Prentice-Hall. You can reach him at
- pjp@plauger.com.
-
-
- The largest single component of the Standard C++ library is the package called
- "iostreams." It consists of a whole slew of classes that work together to make
- input and output look simple. The commonest application is probably the
- classic:
- #include <iostream>
- .....
- cout << "Hello, world." << endl;
- which writes the message Hello, world. followed by a newline character to the
- standard output stream (same as stdout).
- This simple expression is the tip of a rather large iceberg. Let's look a bit
- below the surface.
- As you might guess, the header <iostream> declares all the necessary classes.
- (Existing implementations of C++ may append a .h, or a .hpp, to the header
- name.) The class we care about for this example is ostream, each of whose
- objects controls a stream into which you can insert characters. That stream
- can be an output file, a text string that grows dynamically in memory, or lots
- of other things.
- In this particular case, cout is a static object of class ostream whose name
- has external linkage. It conspires rather intimately with stdout, the familiar
- Standard C library object of type pointer to FILE, to help you write formatted
- text to the standard output stream. You can freely intermix expressions such
- as the one above with more traditional calls such as to putchar or printf and
- get the result you'd expect.
- (Translation: you don't have to worry about two different buffering mechanisms
- saving up characters and writing reordered blocks of characters to the
- standard output stream. Again, existing implementations don't always make this
- promise, at least not unless you perform some magic incantation at run time.
- And then you can often expect a degradation of performance. The draft C++
- Standard hopes to encourage better synchronization of C and C++ I/O, if only
- to the standard streams.)
- The header <iostream> also declares two other objects of class ostream:
- cerr, which controls writes of unbuffered text to the standard error stream,
- in cooperation with stderr, and
- clog, which also cooperates with stderr but which won't necessarily force text
- to be written out at the end of each insertion operation.
-
-
- Inserters
-
-
- Now let's take a closer look at the first part of the example expression:
- cout << "Hello, world." .....
- The class ostream overloads the left-shift operator repeatedly with member
- functions to provide a rich set of inserters. These each encode a right-hand
- operand by various rules (akin to the conversion specifications for printf),
- then insert the encoded text in the stream controlled by the ostream object.
- In this particular case, the inserter called is:
- ostream& operator<<(const char* s);
- This function knows to write the null-terminated string pointed to by s to the
- output stream controlled by cout. Put another way, the function inserts each
- of the characters from the string into the controlled stream up to but not
- including the terminating null.
- More complicated inserters turn binary integer and floating-point values into
- sequences of characters that human beings can read. For example, the inserter:
- ostream& operator<<(int n);
- lets you write expressions like:
- cout << i;
- that inserts, say, the sequence -123 in the controlled stream to represent the
- value -123 stored in the integer object i.
- C programmers have been doing the same sort of thing for decades by writing:
- printf("%s %d\n", s, i);
- So what's the big deal? Well, the inserter approach offers a few advantages:
- The translator picks the appropriate inserter to match the right-hand operand
- type. No need to make sure that conversion specifiers in a format string line
- up with the proper arguments. There is that much less chance that the value
- will be interpreted incorrectly.
- The notation is often convenient. You can string inserters out left to right,
- as in the original example above, to perform a series of insertions one after
- the other.
- The notation can be augmented in various clever ways, as with the endl in the
- original example.
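- To make the comparison concrete, here is a small sketch in present-day
- Standard C++, which spells these names std::ostringstream and so on (details
- differ slightly from the draft discussed here). The function names are
- hypothetical. An ostringstream is an ostream whose controlled stream is an
- in-memory string, which makes the output easy to inspect.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// The inserter is chosen by the static type of each right-hand operand, so
// one "<<" handles a C string, an int, and a double with no format string.
std::string with_inserters(const char* s, int i, double d) {
    std::ostringstream os;
    os << s << ' ' << i << ' ' << d;   // default double format matches %g
    return os.str();
}

// The printf-family equivalent: each conversion specifier must be matched
// to its argument's type by hand.
std::string with_printf(const char* s, int i, double d) {
    char buf[128];
    std::snprintf(buf, sizeof buf, "%s %d %g", s, i, d);
    return buf;
}
```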
-
-
- Manipulators
-
-
- I won't elaborate much on that last point in this column. All you need to know
- for now is that inserting endl in an output stream has the effect of inserting
- a newline character ('\n') then flushing output to the controlled stream. It
- is but one of many interesting manipulators you can use with the inserter
- notation.
- Other manipulators make up for one of the shortcomings of the inserter
- notation. Remember that with printf you can write some pretty fancy conversion
- specifications, such as:
- printf("%+10d\n", i );
- to force a plus sign on positive output and pad the generated text to (at
- least) ten characters. But operator<< takes only two operands. There is no
- place to smuggle in that extra formatting information.
- The solution is to squirrel away the extra information in the cout object
- before performing the insertion. You can do this by calling member functions,
- as in:
-
- cout.setf(ios::showpos), cout.width(10);
- cout << i;
- Or you can use still more magic manipulators to achieve the same effect, as
- in:
- cout << showpos << setw(10) << i;
- As you can see, it's possible to have manipulators that take arguments. But
- that involves even more chicanery -- a topic for much later in this series.
- (In case you're wondering, the effect of showpos endures for subsequent
- insertions, in either of the above forms. I won't show how to turn it off
- here. But the field width evaporates after the first inserter that makes use
- of it.)
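- A sketch of the two equivalent spellings, again in present-day C++ with std::
- names and hypothetical function names. The first two functions each build the
- text that printf("%+10d", n) would produce; the third shows showpos persisting
- while the field width resets after a single insertion.

```cpp
#include <iomanip>   // std::setw
#include <ios>
#include <sstream>
#include <string>

// Formatting state stored with member functions before the insertion.
std::string via_members(int n) {
    std::ostringstream os;
    os.setf(std::ios::showpos);   // force '+' on non-negative values
    os.width(10);                 // minimum width for the NEXT insertion only
    os << n;
    return os.str();
}

// The same state set with manipulators inside the insertion expression.
std::string via_manipulators(int n) {
    std::ostringstream os;
    os << std::showpos << std::setw(10) << n;
    return os.str();
}

// showpos endures; the width applies to 'a' and is then reset to zero.
std::string width_evaporates(int a, int b) {
    std::ostringstream os;
    os << std::showpos << std::setw(5) << a << b;
    return os.str();
}
```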
-
-
- Extractors
-
-
- You can probably guess what's coming next. The header <iostream> also supports
- reading and decoding text from various streams, including input files. It
- declares the object cin, which helps you extract characters from the standard
- input stream, in cooperation with stdin. As you might further guess, this
- object is of class istream, the obvious companion to ostream.
- Thus, you can write code involving extractors, such as:
- int n;
- cin >> n;
- to read a sequence of characters and decode them by the usual rules for
- encoded integer input. The "usual" rules are much as for the function scanf:
- Skip any leading whitespace.
- Gobble one or more characters that look like a valid encoded integer and
- convert them to int representation.
- If no such characters are found, or if the result can't be properly
- represented, report a failure.
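- Those rules can be seen in a short sketch (present-day C++, hypothetical
- function name) that extracts an int from a string rather than from cin. A
- failed extraction sets the stream's failbit, which the test !(is >> n) detects.

```cpp
#include <sstream>
#include <string>

// Extracts one int from 'text'. On success stores it in 'out' and returns
// true; on failure (no digits, or a value that doesn't fit) leaves 'out'
// untouched and returns false.
bool read_int(const std::string& text, int& out) {
    std::istringstream is(text);
    int n;
    if (!(is >> n))      // skips leading whitespace, then decodes the integer
        return false;    // failbit is set: report the failure
    out = n;
    return true;
}
```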
- One small difference exists between inserters and extractors. You can insert
- the value of an expression into a stream. (This is usually called an "r-value"
- by old-line C programmers.) But you extract from a stream a value into an
- object. (Those same C programmers would call this an "l-value," but the times
- and the terms they are a-changing.) A corresponding difference appears in C --
- you call printf with arbitrary expressions for value arguments and you call
- scanf with pointer arguments to designate the objects to store into.
- In C++, you declare extractors with reference arguments, as in:
- istream& operator>>(int& n);
- That ampersand lets you write a bald n, but still ensures that a real live
- l-value gets bound to the corresponding parameter within the function. No
- worry about null pointers or other pointer type mismatches.
- You can also play tricks with extractors, by the way, much like that endl
- shorthand I showed earlier. If all you want to do, for example, is consume any
- pending white space from the standard input stream, you can write:
- cin >> ws;
- and the job is done. Similarly, you can communicate various bits of formatting
- information through other manipulators, much as with output streams. Once
- again, I won't begin to explain the magic behind that bald ws manipulator.
- Just note for now that such tricks are possible.
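- As one small illustration (present-day C++; the function name is
- hypothetical), std::ws can be applied to an input stream before a call to
- std::getline, which would otherwise return any leading blanks as part of the
- line.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns the text of 'text' starting at its first non-whitespace character,
// up to the end of that line.
std::string rest_of_line(const std::string& text) {
    std::istringstream is(text);
    std::string line;
    std::getline(is >> std::ws, line);   // ws consumes pending whitespace only
    return line;
}
```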
-
-
- A Little History
-
-
- These iostreams have several clear advantages over the formatted I/O functions
- of the Standard C library. Little wonder that every implementation of C++ has
- for years offered some version of iostreams, however much the implementation
- may vary in its support for other common library classes. Jerry Schwarz, now
- at Lucid Technology, gets credit for developing the earliest version of
- iostreams, for helping it become so widespread, and for seeing the package
- through several major revisions. He is also responsible for drafting the
- specification of iostreams in the draft standard C++ library.
- Unfortunately, little has been written on the detailed architecture of
- iostreams. About the only commercially available guide is a book by Steve
- Teale [1], which deals with a slightly dated version of the package. Since the
- draft C++ standard progresses even beyond the current field version, Teale's
- book offers only limited guidance. Still, it's better than what you typically
- get from the vendor of a C++ translator.
- For many class libraries this lack of information would not be a problem. But
- Schwarz designed iostreams to be extensible in several important ways:
- You can overload operator<< to define additional inserters, or operator>> to
- define additional extractors for classes you define.
- You can define a host of manipulators that work with objects of class ostream,
- class istream, or both.
- You can derive new classes from class streambuf, then override several of its
- virtual member functions, to control sources and sinks of characters of your
- own devising.
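- The first two extension points fit in a few lines. The sketch below uses
- present-day C++ and a hypothetical Point class and arrow manipulator; it is
- one reasonable way to write such operators, not the only one. The extractor
- follows the library's convention of reporting a format error by setting
- failbit.

```cpp
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>

// A hypothetical class, used only to show the extension points.
struct Point {
    int x, y;
};

// Inserter: writes a Point as "(x,y)".
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ',' << p.y << ')';
}

// Extractor: reads "(x,y)", whitespace allowed between tokens. On malformed
// input it sets failbit and leaves p unchanged.
std::istream& operator>>(std::istream& is, Point& p) {
    char lp, comma, rp;
    int x, y;
    if (is >> lp >> x >> comma >> y >> rp) {
        if (lp == '(' && comma == ',' && rp == ')') {
            p.x = x;
            p.y = y;
        } else {
            is.setstate(std::ios::failbit);
        }
    }
    return is;
}

// A manipulator without arguments is just a function that takes and returns
// the stream; ostream has an inserter that calls such functions.
std::ostream& arrow(std::ostream& os) {
    return os << " -> ";
}
```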
- Such power is not without its complexity. And complexity can be mastered
- safely only with careful guidance. To date, programmers have relied on access
- to bits and pieces of library source code to get that guidance. Where such
- code is not available, or where it varies among implementations, adequate
- guidance has been lacking. Standardizing iostreams is thus a major step toward
- helping the package realize its full potential.
-
-
- Base Class ios
-
-
- Let's begin with some basic architecture. Classes ostream and istream have
- several requirements in common:
- Both must control a stream through the agency of some object of class
- streambuf.
- Both must maintain some notion of the state of the controlled stream,
- including a history of any errors that have occurred and how to report future
- errors.
- Both must memorize a host of formatting options, as I described earlier.
- Both must define a number of common nested types for describing the member
- functions and objects needed to effectuate the above requirements.
- To provide all these services, both classes derive from the virtual public
- base class ios. Listing 1 shows how class ios is declared in the draft C++
- Standard. Two kinds of members are commented out:
- Those labeled exposition only are merely indicative of what kind of
- information the class must maintain. Don't expect an actual implementation to
- have exactly such a member with exactly such a name. (The italicized names
- aren't even reserved.)
- Those labeled optional can be present in an implementation for backward
- compatibility with some widespread existing practice. A conforming
- implementation may, however, choose to omit such members.
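- The state-keeping part of class ios is what most programs touch directly. This
- sketch (present-day C++, where these members live in std::ios_base and
- std::basic_ios; function names are hypothetical) shows a failed extraction
- setting failbit, clear() restoring the stream, and exceptions() asking the
- stream to throw instead. The catch clause names std::exception, the base of
- std::ios_base::failure, which sidesteps differences among library versions.

```cpp
#include <exception>
#include <ios>
#include <sstream>
#include <string>

// A failed extraction sets failbit; clear() resets the state so the
// stream can be used again.
bool recover_after_bad_input(int& value) {
    std::istringstream is("oops 42");
    int n = 0;
    is >> n;                                // "oops" is not an int: failbit
    bool failed = is.fail() && !is.bad();   // a format error, not a stream error
    is.clear();                             // back to the good state
    std::string skipped;
    is >> skipped >> n;                     // discard "oops", then read 42
    value = n;
    return failed && !is.fail();
}

// exceptions() asks the stream to throw when any of the given bits is set.
bool throws_on_failure() {
    std::istringstream is("not-a-number");
    is.exceptions(std::ios::failbit);
    int n = 0;
    try {
        is >> n;
    } catch (const std::exception&) {       // ios_base::failure derives from this
        return true;
    }
    return false;
}
```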
- Note that class ios is a virtual base for both ostream and istream. That is
- more a matter of compatibility with past practice than of necessity. Some
- implementations control a stream that can be both read and written by
- declaring an object of class iostream, defined something like:
- class iostream
- : public istream, public ostream {
- ..... };
- Were the base ios not virtual, this class would end up with two such
- subobjects, not just one. That would lead to all sorts of confusion in trying
- to control the bidirectional stream. Making the base virtual does add a bit of
- complexity here and there, particularly with initialization, but it permits
- the traditional definition of class iostream for those who want to keep using
- it.
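- The effect of the virtual base can be shown with stand-in classes. In this
- sketch StateBase, In, Out, and InOut are hypothetical names playing the roles
- of ios, istream, ostream, and iostream; only the inheritance pattern matters.

```cpp
// Stand-ins reduced to the one point at issue.
struct StateBase {                          // plays the role of ios
    int state = 0;
};
struct In : virtual public StateBase {};    // like istream
struct Out : virtual public StateBase {};   // like ostream
struct InOut : public In, public Out {};    // like the traditional iostream

// Because the base is virtual, InOut holds ONE StateBase subobject, so a
// change made through the In side is seen through the Out side. Without
// "virtual", In and Out would each have their own copy and this returns 0.
int shared_state_demo() {
    InOut io;
    In& as_in = io;
    Out& as_out = io;
    as_in.state = 7;
    return as_out.state;
}
```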
- This class is not a part of the draft C++ standard, however, because it is no
- longer necessary. The preferred way to control a bidirectional stream is with
- two separate objects, one of class ostream and one of class istream. Both
- point to the same streambuf object, which is the only agent who really has to
- know that the stream can be both read and written. (That is part of the reason
- why an object of class ios contains a pointer to a separate streambuf object,
- instead of the object itself.)
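- In present-day C++, that arrangement looks like the following sketch (the
- function name is hypothetical). A std::stringbuf opened for both input and
- output serves as the shared buffer, and two stream objects point at it.

```cpp
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>

// Writes two ints through an ostream and reads them back through an istream.
// Only the shared stringbuf knows the stream is bidirectional.
bool round_trip(int a, int b, int& ra, int& rb) {
    std::stringbuf buf(std::ios::in | std::ios::out);
    std::ostream out(&buf);   // inserts into buf
    std::istream in(&buf);    // extracts from the same buf
    out << a << ' ' << b;
    in >> ra >> rb;
    return !in.fail();
}
```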
-
-
-
- In Times to Come
-
-
- Class ios takes a lot of declaring, as you can see from Listing 1. For all
- that, it is neither a big nor a very complex class. Still, all that declaring
- demands a comparable amount of explaining. And class ios does lie at the heart
- of the entire iostreams class hierarchy. I plan to devote the next installment
- of this column to a careful study of this class, and the subtler implications
- of its semantics. Only after such a detailed introduction do classes ostream
- and istream begin to make sense.
- Class streambuf is equally fundamental to iostreams. You can get quite a lot
- of use out of iostreams without ever declaring a streambuf object in anger.
- But if you want to know how the whole works hangs together, or if you want to
- extend iostreams in nontrivial ways, you must know how this class behaves in
- detail.
- Manipulators are yet another topic. Many are simple and easy to explain once
- you know the basics of the classes ios, ostream, and istream. But all those
- manipulators with arguments derive from one of several "interesting" template
- classes. (In many existing implementations, they are built atop even more
- interesting macros.)
- Two library classes show some of the power of class streambuf. Class
- strstreambuf provides capabilities very akin to the Standard C library
- function sprintf. You can use inserters and extractors to manipulate in-memory
- text strings. Class stringstreambuf is similar, except that it eases
- conversion between such in-memory strings and objects of the standard library
- class string. (I'll discuss class string in detail much later.)
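- In the standard as finally adopted, strstreambuf was later deprecated, and the
- string-based buffer is called std::stringbuf, used through std::ostringstream
- and std::istringstream. The sketch below (hypothetical function names)
- compares sprintf-style formatting with the same text built by inserters, then
- parses numbers back out of a string.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// sprintf-style formatting into a fixed-size char buffer...
std::string format_with_snprintf(double v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%8.3f", v);
    return buf;
}

// ...and the same text built with inserters into a growing string.
std::string format_with_stream(double v) {
    std::ostringstream os;
    os << std::fixed << std::setprecision(3) << std::setw(8) << v;
    return os.str();   // copies the accumulated text out as a std::string
}

// The other direction: extract numbers from a std::string until none remain.
double sum_of(const std::string& text) {
    std::istringstream is(text);
    double total = 0, x;
    while (is >> x)
        total += x;
    return total;
}
```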
- Finally, we return to the objects that manipulate external files. Class
- filebuf lets you open files by name, much as with fopen, then manipulate them
- as iostreams. And the objects cin, cout, cerr, and clog have their own tale as
- well. It turns out that initializing these creatures, as required by the draft
- C++ Standard, is no mean feat.
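- For comparison with fopen, here is a small sketch using the file streams built
- on filebuf (present-day C++; the function name and the file name in the usage
- are hypothetical). Destroying the ofstream at the end of its block flushes and
- closes the file, much as fclose would.

```cpp
#include <fstream>
#include <string>

// Writes 'lines_to_write' numbered lines to 'path', then reads the file back
// and counts its lines. Returns false if either open fails.
bool write_then_count_lines(const std::string& path, int lines_to_write,
                            int& lines_read) {
    {
        std::ofstream out(path);          // like fopen(path, "w")
        if (!out)
            return false;
        for (int k = 1; k <= lines_to_write; ++k)
            out << "line " << k << '\n';
    }                                      // out's destructor closes the file
    std::ifstream in(path);                // like fopen(path, "r")
    if (!in)
        return false;
    lines_read = 0;
    std::string line;
    while (std::getline(in, line))
        ++lines_read;
    return true;
}
```

- For example, write_then_count_lines("iostreams_demo.txt", 3, n) leaves 3 in n.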
- The description of the iostreams facilities occupies about half the library
- portion of the draft C++ Standard. It's going to take quite some time to
- review it in adequate detail. Bear with me.
- References
- [1] Steve Teale, C++ IOStreams Handbook, Addison-Wesley, 1993.
-
- Listing 1 Class ios
- class ios {
- public:
- class failure : public xmsg {
- public:
- failure(const char* where_val = 0, const char*
- why_val = 0);
- virtual ~failure();
- protected:
- // virtual void do_raise(); inherited
- };
- typedef T1 fmtflags;
- static const fmtflags dec;
- static const fmtflags fixed;
- static const fmtflags hex;
- static const fmtflags internal;
- static const fmtflags left;
- static const fmtflags oct;
- static const fmtflags right;
- static const fmtflags scientific;
- static const fmtflags showbase;
- static const fmtflags showpoint;
- static const fmtflags showpos;
- static const fmtflags skipws;
- static const fmtflags unitbuf;
- static const fmtflags uppercase;
- static const fmtflags adjustfield;
- static const fmtflags basefield;
- static const fmtflags floatfield;
- typedef T2 iostate;
- static const iostate badbit;
- static const iostate eofbit;
- static const iostate failbit;
- static const iostate goodbit;
- typedef T3 openmode;
- static const openmode app;
- static const openmode ate;
- static const openmode binary;
- static const openmode in;
- static const openmode out;
- static const openmode trunc;
- typedef T4 seekdir;
- static const seekdir beg;
- static const seekdir cur;
- static const seekdir end;
- // typedef T5 io_state; optional
- // typedef T6 open_mode; optional
-
- // typedef T7 seek_dir; optional
- class Init {
- public:
- Init();
- ~Init();
- private:
- // static int init_cnt; exposition only
- };
- ios(streambuf* sb_arg);
- virtual ~ios();
- operator void*() const;
- int operator!() const;
- ios& copyfmt(const ios& rhs);
- ostream* tie() const;
- ostream* tie(ostream* tiestr_arg);
- streambuf* rdbuf() const;
- streambuf* rdbuf(streambuf* sb_arg);
- iostate rdstate() const;
- void clear(iostate state_arg = 0);
- // void clear(io_state state_arg = 0); optional
- void setstate(iostate state_arg);
- // void setstate(io_state state_arg); optional
- int good() const;
- int eof() const;
- int fail() const;
- int bad() const;
- iostate exceptions() const;
- void exceptions(iostate except_arg);
- // void exceptions(io_state except_arg); optional
- fmtflags flags() const;
- fmtflags flags(fmtflags fmtfl_arg);
- fmtflags setf(fmtflags fmtfl_arg);
- fmtflags setf(fmtflags fmtfl_arg, fmtflags mask);
- void unsetf(fmtflags mask);
- int fill() const;
- int fill(int ch);
- int precision() const;
- int precision(int prec_arg);
- int width() const;
- int width(int wide_arg);
- static int xalloc();
- long& iword(int index_arg);
- void*& pword(int index_arg);
- protected:
- ios();
- void init(streambuf* sb_arg);
- private:
- // streambuf* sb; exposition only
- // ostream* tiestr; exposition only
- // iostate state; exposition only
- // iostate except; exposition only
- // fmtflags fmtfl; exposition only
- // int prec; exposition only
- // int wide; exposition only
- // char fillch; exposition only
- // static int index; exposition only
- // int* iarray; exposition only
- // void** parray; exposition only
- };
-
-
- // End of File
-
-
- Code Capsules
-
-
- Visibility in C
-
-
-
-
- Chuck Allison
-
-
- Chuck Allison is a regular columnist with CUJ and a software architect for the
- Family History Department of the Church of Jesus Christ of Latter Day Saints
- Church Headquarters in Salt Lake City. He has a B.S. and M.S. in mathematics,
- has been programming since 1975, and has been teaching and developing in C
- since 1984. His current interest is object-oriented technology and education.
- He is a member of X3J16, the ANSI C++ Standards Committee. Chuck can be
- reached on the Internet at allison@decus.org, or at (801)240-4510.
-
-
-
-
- What's in a Name?
-
-
- Every token in a C program that begins with a letter or an underscore, other
- than a keyword or macro, names one of the following entities:
- data object
- function
- type definition (typedef)
- tag for a structure, union, or enumeration
- member of a structure or union
- enumeration constant
- label
- These entities are active at certain times and places in your program,
- depending on how and where their declarations appear. Whether or not you are
- aware of it, when you declare an identifier you determine its scope, lifetime,
- linkage, and namespace. In this article I will illustrate these inter-related
- concepts as Standard C defines them.
-
-
- Scope
-
-
- The scope of an identifier is the region of your program's text where you can
- use that identifier (in other words, where it is visible). There are four
- types of scope in Standard C:
- 1) Block -- a region within a pair of matching {}-braces that begins where the
- declarator first appears and ends with the brace that closes that block
- 2) Function Prototype -- the region in a function prototype from where the
- identifier occurs to where the last closing parenthesis appears
- 3) Function -- the entire body of a function
- 4) File -- the region in a source file from where an identifier first appears
- outside of any block to the end of the source file
- Formal parameters have the same scope as if they were declared in the
- outermost block of their function. Identifier names are optional in function
- prototypes, and serve only for documentation if they appear. Block scope and
- function prototype scope together are sometimes referred to as local scope.
- In Listing 1, the optional identifier val serves as documentation only, and is
- not visible outside the prototype for the function f. f's formal parameter i
- is visible immediately after the opening brace that defines f. Since each
- block introduces a new scope, the i initialized by j in the innermost block
- temporarily hides the i in the outer block. The value of j is available to
- initialize i in the inner block because j was declared first. If j had been
- declared like this:
- {
- int j = i; /* outer i */
- then j would have received the value 1. There is no conflict because the inner
- i isn't visible until the next statement.
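- Listing 1 isn't reproduced here, but the same behavior fits in a few lines.
- This sketch uses a hypothetical function name and compiles as either C or C++.

```cpp
/* Returns the value of the inner i; *outer_after receives the outer i
   as seen after the inner block has closed. */
int hide_demo(int* outer_after)
{
    int i = 1;                 /* outer i */
    int inner_value;
    {
        int j = i;             /* inner i not declared yet: reads outer i (1) */
        int i = j + 10;        /* inner i hides outer i until the closing brace */
        inner_value = i;       /* 11 */
    }
    *outer_after = i;          /* outer i is visible again: still 1 */
    return inner_value;
}
```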
- An identifier is visible as soon as its declarator is complete. Inside a block,
- the following declaration is therefore legal, but almost certainly a mistake:
- int i = i;
- Since int i is sufficient to declare an integer named i, the i on the right is
- the same as on the left, so i is initialized from its own indeterminate value;
- in effect it is left uninitialized.
- Only labels have function scope. There's a difference between function scope
- and the scope of a function's outermost block. Labels which have function
- scope are visible throughout the function, even before they are "declared,"
- but identifiers with block scope are visible only after their point of
- declaration and can be hidden by other declarations in nested blocks.
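- A small sketch of function scope for labels (hypothetical function name; valid
- C and C++): the goto names the label found before the label itself appears.

```cpp
/* Returns the index of the first negative element of a[0..n-1], or -1. */
int first_negative_index(const int* a, int n)
{
    int k;
    for (k = 0; k < n; ++k)
        if (a[k] < 0)
            goto found;        /* 'found' is used before its label appears */
    return -1;
found:
    return k;
}
```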
- An identifier declared outside of any block or function parameter list has
- file scope. Such identifiers are sometimes referred to as global, and are
- visible from their point of declaration until the end of the translation unit,
- unless they are hidden by an identifier with the same name having block scope.
- The program in Listing 2 illustrates function and file scope. Since the
- identifier i with file scope (the one initialized to 13) is visible only after
- its declaration, it would be an error to try to use it in the main program.
- The global i is not available anywhere in f1 since f1 has a parameter named i.
- The innermost block of f1 in turn hides that parameter with its own i (the
- types don't have to be the same). Since f2 declares no identifiers named i, it
- has access to the global i. The declarations of f1 and f2 in main inject those
- names into the body of main only. It would be an error, for example, to call
- f2 from f1.
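- Listing 2 isn't reproduced here either; this sketch (hypothetical function
- names, valid C and C++) shows the same three hiding relationships.

```cpp
int i = 13;                /* file scope: visible from here to end of file */

int f1_like(int i)         /* the parameter i hides the file-scope i */
{
    {
        double i = 2.5;    /* inner block hides the parameter; types may differ */
        (void)i;
    }
    return i;              /* the parameter is visible again */
}

int f2_like(void)          /* declares no i, so it sees the file-scope i */
{
    return i;
}
```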
-
-
- Minimal Scoping
-
-
- Thoughtful placement of declarations can greatly enhance the readability of a
- program. Most programmers still seem to follow the convention, required by
- languages such as FORTRAN and COBOL, of placing all declarations together in a
- single section of the source code. This gives you the advantage of always
- knowing where to look for variable definitions. But when a program gets large,
- you spend a good deal of time flipping back and forth between the declaration
- of an identifier and its point of use. Those of you who program in Microsoft
- Windows know that Hungarian notation, a convention of encoding the type of an
- identifier into its name, evolved as a means of compensating for the distance
- between a name's declaration and its use. At the risk of inciting a flurry of
- letters to the editor, I would like to suggest instead a simple technique
- known to C++ programmers which, when coupled with modular design, renders
- strange-looking Hungarian names unnecessary in most cases. I call it minimal
- scoping. Simply put, it means to declare an identifier as close as possible to
- its first use.
- For example, what do you infer from the following program segment?
- void f(void)
-
- {
- int i, j;
- ...
- }
- Even though you may only use i and j in a small portion of f, the declaration
- says that they are available everywhere within the function. Therefore, i and
- j have more scope than they deserve. If, for example, i and j only apply under
- certain conditions, you can easily limit their scope by declaring them within
- an appropriate inner block:
- void f(void)
- {
- ...
- if (<some condition>)
- {
- int i, j;
- /* only use i and j here */
- }
- ...
- }
- Another advantage to this practice is that i and j aren't even allocated if
- the condition is false -- they don't exist outside their block. C++ encourages
- minimal scoping by allowing declarations to appear anywhere a statement can,
- as in:
- for (int i = 0; i < n; ++i)
- ...
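- In Standard C++, the index i is visible from its point of declaration to the end
- of the for statement; translators that follow the older ARM rule extend its
- scope to the end of the enclosing block.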
- Minimal scoping aids readability because you don't even see identifiers until
- you need to -- when they add meaning to the program.
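- Here is minimal scoping applied to a small function (hypothetical name). The
- for-loop declaration requires C++ or a C compiler that supports C99;
- everything else is plain C.

```cpp
/* Sums the squares of the positive elements of a[0..n-1]. Each name is
   declared in the narrowest block that needs it. */
long sum_positive_squares(const int* a, int n)
{
    long total = 0;
    for (int k = 0; k < n; ++k) {            /* k exists only for the loop */
        if (a[k] > 0) {
            long sq = (long)a[k] * a[k];     /* sq exists only in this branch */
            total += sq;
        }
    }
    return total;
}
```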
-
-
- Lifetime
-
-
- The lifetime, or storage duration, of an object is the period from the time it
- is created to the time it is destroyed. Objects that have static duration are
- created and initialized once, prior to program startup, and are destroyed when
- the program terminates normally. Objects with file scope, as well as objects
- declared with the static specifier at block scope, have static duration.
- Listing 3 shows an example of the latter. The variable n is initialized once
- at program startup and retains its last-assigned value throughout the program.
- (Its scope, however, is just the body of the function count.)
- Function parameters and objects declared within a block without the extern or
- static specifier have automatic duration. Such objects are created anew every
- time execution enters their block. Whenever execution enters a block
- normally (that is, not by a goto that jumps into it), any initialization
- you specified is also performed. When execution falls through or jumps
- past the end of a block, or returns from a function, all automatic variables
- in that scope are destroyed.
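The difference between the two durations shows up as soon as a function is called more than once. The two functions below are hypothetical illustrations, not part of the article's listings: the automatic n in bump_auto is created and initialized on every call, while the static n in bump_static is initialized once and keeps its value between calls, just as in Listing 3.

```c
/* Automatic duration: n is created and initialized on every call. */
int bump_auto(void)
{
    int n = 0;
    return ++n;          /* always returns 1 */
}

/* Static duration: n is initialized once and retains its last value. */
int bump_static(void)
{
    static int n = 0;
    return ++n;          /* returns 1, 2, 3, ... on successive calls */
}
```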
- The program in Listing 4 illustrates both static and automatic duration with
- the familiar factorial function. The token n!, pronounced "n-factorial"
- (without yelling), denotes the product of all positive integers up to and
- including n. For example,
- 3! = 3 x 2 x 1
- 4! = 4 x 3 x 2 x 1
- etc.
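The product definition translates directly into a loop. This iterative sketch (fac_iter is a hypothetical name) assumes n is small enough that n! fits in a long; with a 32-bit long that means n <= 12:

```c
/* n! computed directly as the product 2 x 3 x ... x n.
   Assumes n! fits in a long (n <= 12 for a 32-bit long). */
long fac_iter(long n)
{
    long result = 1;                 /* 0! and 1! are both 1 */

    for (long k = 2; k <= n; ++k)    /* multiply in each factor */
        result *= k;
    return result;
}
```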
- Most math textbooks give the following equivalent recursive definition
- instead:
- $$n! = \begin{cases} 1 & \text{if } n \le 1 \\ n \cdot (n-1)! & \text{if } n > 1 \end{cases}$$
- You can render this definition concisely in C with the following recursive
- function:
- long fac(long n)
- {
- return (n <= 1) ? 1 : n * fac(n-1);
- }
- When n is greater than 1, fac calls itself recursively, with an argument equal
- to one less than it started with. This action temporarily suspends the current
- scope and creates a new one, with its own copy of n. This process continues
- until the most deeply nested copy of n is equal to 1. This scope terminates
- and returns 1 to the scope that called it, and so on up to the original
- invocation. For example, consider the execution of the expression fac(3):
- fac(3): return (3 <= 1) ? 1 : 3 * fac(2);
- This calls fac(2):
- fac(2): return (2 <= 1) ? 1 : 2 * fac(1);
- which in turn calls fac(1):
- fac(1): return (1 <= 1) ? 1 : 1 * fac(0);
- which returns 1:
- fac(1): return 1;
- fac(2) now resumes and returns the following to fac(3):
- fac(2): return 2 * 1;
- fac(3) then resumes and computes:
- fac(3): return 3 * 2;
- which returns the value 6 to the original caller.
- The program in Listing 4 traces this recursive computation by wrapping the
- factorial formula with statements to print the value coming into the function
- and the computed value going out. This program keeps track of how deep the
- recursion has nested with the static variable depth. Since depth has static
- duration, it is allocated and initialized once prior to program startup and
- retains its value across function calls (including recursive ones). Only
- automatic variables, like n, are replicated with each recursive call. The auto
- keyword is purely documentary, since all variables with block scope are
- automatic by default.
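To see that one static variable really is shared by every level of the recursion, here is a hypothetical variant of fac from Listing 4 that records the deepest nesting reached in a file-scope variable instead of printing a trace. The names fac_depth and max_depth are illustrative only:

```c
int max_depth = 0;             /* deepest nesting seen so far */

long fac_depth(long n)
{
    static int depth = 0;      /* static: one copy shared by all calls */
    long result;               /* automatic: each call has its own copy */

    if (++depth > max_depth)   /* entering a new level of recursion */
        max_depth = depth;
    result = (n <= 1) ? 1 : n * fac_depth(n - 1);
    --depth;                   /* leaving this level */
    return result;
}
```

After fac_depth(4) returns, max_depth is 4 and depth is back to 0, because each level increments depth on the way in and decrements it on the way out.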
-
-
- Linkage
-
-
- According to the rules of linkage, two same-named identifiers can refer to the
- same object, even if they occupy different translation units. There are three
- types of linkage in C:
- 1) External linkage -- names across translation units in a program
- 2) Internal linkage -- names throughout a single translation unit
- 3) No linkage -- certain objects are unique, hence, they have no linkage
- Both functions and global objects declared without the static keyword have
- external linkage. There must be only one definition of each such object, but
- there may be many declarations that refer to that definition. For example, if
- the following declaration occurs at file scope in a file:
-
- /* file1.c */
- int x;
- then it can be used in another file at any scope where the following occurs:
- /* file2.c */
- extern int x;
- The extern specifier in essence says, "find an object named x defined at file
- scope." The extern specifier is not required to link to a function with
- external linkage:
- /* file1.c */
- int f(void)
- {
- return 1;
- }
-
- /* file2.c*/
- int f(void); /* extern specifier assumed for functions
- - links to f in file1.c */
- Although the C standard doesn't explicitly define it, you can think of objects
- with external linkage as constituting a new scope: that of objects visible
- across translation units. On the street this is known as program scope.
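A real program would put the definition and the extern declaration in separate files, as above. Purely to keep this sketch self-contained, it compresses both into one translation unit; the names total and add_to_total are hypothetical. The block-scope extern declaration allocates nothing: it refers to the object defined at file scope.

```c
int total = 0;              /* definition at file scope: external linkage */

void add_to_total(int k)
{
    extern int total;       /* declaration only: refers to the total above */
    total += k;
}
```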
- Functions and global objects declared with the static specifier have internal
- linkage. Identifiers with internal linkage are visible only within their
- translation unit. This use of the keyword static has little to do with the
- static storage duration specifier discussed previously. It's a good idea to
- commit the following pseudo-formulas to memory:
- static + block scope == static storage duration
- static + file scope == internal linkage
- The first use of static alters lifetime, the second linkage. If you think this
- is confusing, C++ muddies the waters further by introducing a third use of the
- term (static class members). I'll spare you that one until next month.
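Both uses of static can appear side by side in one file. In this hypothetical sketch, next_id is visible only within its translation unit but lives for the whole program, while calls is visible only inside count_calls yet still keeps its value between calls:

```c
static int next_id = 100;   /* static + file scope: internal linkage */

int new_id(void)            /* no static: external linkage */
{
    return next_id++;
}

int count_calls(void)
{
    static int calls = 0;   /* static + block scope: static storage duration */
    return ++calls;
}
```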
- Certain program entities, which are always unique, are said to have no
- linkage. These entities include objects having block scope but no extern
- specifier, function parameters, and anything other than function or object,
- such as a label, tag name, member name, typedef name, or enumeration constant.
- The source files in Listing 5 and Listing 6 comprise a single executable
- program that illustrates the different types of linkage. Listing 5 is similar
- to Listing 2 except that functions f1 and f2 are private to the source file,
- and a function from the source file in Listing 6 is added to the executable
- program. The integer i at file scope in Listing 5 has external linkage because
- it does not carry the static specifier. A variable of the same name in another
- file can refer to it if declared with the extern specifier, as Listing 6 does.
- (It is an error to have two definitions of the same object with external
- linkage, e.g., two i's modified by neither static nor extern.)
- The functions f1 and f2 have internal linkage because they use the static
- specifier. The float object named i in f1 has no linkage because it is
- declared at block scope without the extern specifier. The integer j in Listing
- 6 has internal linkage because of the static specifier, and the function f3
- has external linkage because of the absence of the static specifier.
- The following three lines from Listing 5 require particular explanation:
- extern void f1(int); /* Internal Linkage */
- extern void f2(void); /* Internal Linkage */
- extern void f3(void); /* External Linkage */
- Since the extern specifier means "link with something at file scope," the
- declarations f1 and f2 in main link respectively with the functions of the
- same name in the same file, which happen to have internal linkage. It is
- important that you declare f1 and f2 static at file scope before the extern
- references in main, or else the compiler will assume that they have external
- linkage (like it does for f3), which conflicts with the actual function
- definitions later in the file.
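The same rule applies in a smaller setting. In this hypothetical sketch, the static declaration of twice at file scope comes first, so the extern declaration inside quadruple picks up its internal linkage, and the static definition that follows agrees with both:

```c
static int twice(int);      /* file-scope declaration: internal linkage */

int quadruple(int x)
{
    extern int twice(int);  /* inherits the internal linkage declared above */
    return twice(twice(x));
}

static int twice(int x)     /* the definition, also with internal linkage */
{
    return 2 * x;
}
```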
-
-
- Namespaces
-
-
- An identifier can play various roles in a C program. For example, in the
- following excerpt, pair is both a function name and a structure tag:
- struct pair {int x; int y;};
-
- void pair(struct pair p)
- {
- printf("(%d,%d)\n",p.x,p.y);
- }
- The compiler keeps separate lists of identifiers used in different roles, so
- there is no danger of ambiguity. These lists are called namespaces. There are
- four different types of namespaces in standard C:
- 1) labels
- 2) tags for structures, unions, and enumerations
- 3) members of structures and unions
- 4) ordinary identifiers (i.e., all others: data objects, functions, types, and
- enumeration constants).
- Each function keeps its own list of labels. Each translation unit and each
- block keeps its own set of namespaces for tags and for ordinary identifiers,
- which is what allows an inner scope to hide like-named entities at outer
- scopes. Each structure or union type keeps its own list of member names.
- Enumeration constants belong to the space of ordinary identifiers, since you
- use them like objects in a program.
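A small sketch, using hypothetical names and meant for illustration only, shows one identifier living in all four name spaces at once without any conflict. It is not a style to imitate:

```c
struct point { int point; };    /* tag "point" and member "point" */

int name_spaces(void)
{
    struct point point = { 7 }; /* ordinary identifier "point" */

    goto point;                 /* label "point" */
point:
    return point.point;         /* object named point, member named point */
}
```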
- The program in Listing 7 uses the following namespaces (arbitrary names are
- mine):
- tag-global -- struct-union-enum tags defined at file scope. A redefinition of
- struct foo at block scope would hide the one from an enclosing scope.
- global-foo-member -- the member names of the struct foo defined at file scope.
- Any redefinitions of struct foo at block scope would essentially be new types,
- and would keep their own list of member names.
- ordinary-global -- any data objects, functions, typedefs or enumeration
- constants defined at file scope (in this case, a struct foo object, which is
- hidden by a like-named object in main).
- ordinary-main-1 -- ordinary identifiers defined in the outermost block of main
- (a struct foo object which hides the global one).
- ordinary-main-2 -- ordinary identifiers defined in the inner block of main
- (struct foo x, and the integer foo, which hides the struct foo of the same
- name in the enclosing scope).
- label-main -- labels defined in main (foo:)
- The comments in the source file indicate the namespace that claims each use of
- the identifier foo. I hope you find this program a little (nay, a lot)
- confusing. My monotonous overuse of a single identifier was for illustrative
- purposes only. You should only reuse names for good reason -- and I can't
- think of any at the moment, without discussing C++ overloading. If you see
- such code in "real life," treat it like a sensitive government document:
- DESTROY BEFORE READING!
-
-
- Summary
-
-
-
- There's a lot to a name in a C program. Each identifier has a scope (where it
- is visible), a lifetime (when it is active), a linkage (whether remote uses of
- the same name refer to the same entity), and a namespace (its role among
- identifiers). If this article hasn't made sense to you (but I hope it has),
- guess what: it gets worse! I can't think of a single concept I've discussed
- here that C++ doesn't affect. Now you know the subject of next month's
- article.
-
- Listing 1 Illustrates local scope
- /* scope1.c */
-
- #include <stdio.h>
-
- void f(int val);
-
- int main(void)
- {
- f(1);
- return 0;
- }
-
- void f(int i)
- {
- printf("i == %d\n",i);
- {
- int j = 10;
- int i = j;
- printf("i == %d\n",i);
- }
- }
-
- /* Output:
- i == 1
- i == 10
- */
-
- /* End of File */
-
-
- Listing 2 Illustrates function and file scope
- /* scope2.c */
-
- #include <stdio.h>
-
- int main(void)
- {
- void f1(int i);
- void f2(void);
-
- f1(23);
- f2();
- return 0;
- }
-
- int i = 13;
-
- void f1(int i)
- {
- for (;;)
- {
- float i = 33.0;
-
- printf("%f\n",i);
- goto exit;
- }
-
-
- exit:
- printf("%d\n",i);
- }
-
- void f2(void)
- {
- printf("%d\n",i);
- }
-
- /* Output:
- 33.000000
- 23
- 13
- */
-
- /* End of File */
-
-
- Listing 3 Illustrates static storage duration
- /* lifetime.c */
-
- #include <stdio.h>
-
- int main(void)
- {
- int count(void);
- int i;
-
- for (i = 0; i < 5; ++i)
- printf("%d\n",count( ));
- return 0;
- }
-
- int count(void)
- {
- static int n = 0;
-
- return ++n;
- }
-
- /* Output:
- 1
- 2
- 3
- 4
- 5
- */
-
- /* End of File */
-
-
- Listing 4 Illustrates recursion and storage duration
- /* recurse.c */
-
- #include <stdio.h>
-
- int main(void)
- {
-
- long n;
- long fac(long);
-
- fputs("Enter a small integer: ",stderr);
- scanf("%ld%*c",&n);
- printf("\n%ld! = %ld\n",n,fac(n));
- return 0;
- }
-
- long fac(long n)
- {
- static int depth = 0;
- auto long result;
- void print_current(int,long);
-
- print_current(++depth,n);
- result = (n <= 1) ? 1 : n * fac(n-1);
- print_current(depth--,result);
-
- return result;
- }
-
- void print_current(int depth, long n)
- {
- int i;
-
- /* Indent to show depth */
- for (i = 0; i < depth; ++i)
- fputs(" ",stdout);
-
- printf("%ld\n",n);
- }
-
- /* Output:
- Enter a small integer: 3
- 3
- 2
- 1
- 1
- 2
- 6
-
- 3! = 6
- */
-
- /* End of File */
-
-
- Listing 5 Links with linkage2.c
- /* linkage1.c */
-
- #include <stdio.h>
-
- static void f1(int);
- static void f2(void);
-
- int main(void)
- {
- extern void f1(int); /* Internal Linkage */
-
- extern void f2(void); /* Internal Linkage */
- extern void f3(void); /* External Linkage */
-
- f1(23);
- f2();
- f3();
- return 0;
- }
-
- int i = 13; /* External Linkage */
-
- static void f1(int i) /* Internal Linkage */
- {
- for (;;)
- {
- float i = 33.0; /* No Linkage */
-
- printf("%f\n",i );
- goto exit;
- }
-
- exit: /* No linkage */
- printf("%d\n", i );
- }
-
- static void f2(void) /* Internal Linkage */
- {
- printf("%d\n",i );
- }
-
- /* Output:
- 33.000000
- 23
- 13
- 16
- */
-
- /* End of File */
-
-
- Listing 6 Links with linkage1.c
- /* linkage2.c */
-
- #include <stdio.h>
-
- extern int i; /* External Linkage */
-
- static int j = 3; /* Internal Linkage */
-
- void f3(void) /* External Linkage */
- {
- printf("%d\n",i+j);
- }
-
- /* End of File */
-
-
- Listing 7 Illustrates namespaces
- /* namspace.c */
-
-
- #include <stdio.h>
-
- struct foo /* tag-global */
- {
- int foo; /* global-foo-member */
- };
-
- struct foo foo; /* ordinary-global (not used) */
-
- int main(void)
- {
- struct foo foo; /* tag-global, ordinary-main-1 */
-
- goto foo; /* label-main */
-
- foo: /* label-main */
-
- foo.foo = 1; /* ordinar