C/C++ Users Journal 1990

Text File | 1996-02-07 | 3.3 MB | 96,735 lines

We Have Mail

Dear Mr. Ward: I am not much of a letter writer, but after reading the July 89 issue of the C Users Journal I felt I could save some of your readers a lot of time tracking down a problem with the Microsoft C, version 5.10 memory allocation routines. Enclosed is a listing and the output from the program. This may help Steven Isaacson who is having memory allocation problems using Vitamin C. I found this problem after a week of tracking down a memory leak problem in a very large application. My final solution was to write my own malloc()/free() routines that call DOS directly. This will let the DOS allocator do what it is supposed to do. No time penalty was noticed in our application. Note if you do write your own malloc()/free() routines, call them something else! MSC uses these routines internally and makes assumptions about what data is located outside the allocated area. I always use a malloc()/free() shell to test for things like memory leaks and the free of a non-allocated block. It also will give you an easy way to install a global 'out of memory' error handler.

The code supplied by Leonard Zerman on finding the amount of free space in a system is simplistic and very limited. A better routine would build a linked list of elements and then the variable vptrarray could be made a single pointer to the head of the list. The entire routine becomes dynamic, much more robust, and there is no danger of overflowing a statically allocated array. See the supplied code for an example. The linked list implementation has the side effect that it will work on a virtual memory system. Why you would want to do this is beyond me, but it could be considered a very time consuming way to find out what swapmax is set to on a UNIX system. If you have any questions, please contact me. My phone number is (408) 988-3818. My fax number is (408) 748-1424.
Sincerely yours, Jim Schimandle Primary Syncretics 473 Sapena Court, Unit #6 Santa Clara, CA 95054 Thanks for the information. We've included your code in Listing 1. -- rlw Dear Mr. Ward: I'm new to programming and need to extract information from old mainframe files. Each file has its own annoying attributes. Some files are reports for printing on 132 column paper with headers on each page along with errors in tabulation and decimal point alignment. I'd like to know enough about grep, awk, sed, and tr so I'm not reinventing the wheel with my C programs for file manipulation. Where can I find an understandable and brief overview of these UNIX tools? (I know nothing about regular expressions, scanning, and syntactic analysis.) Sincerely, Orion C. Whitaker, M.D. 400 Brookline Ave., #22F Boston, MA 02215 I suggest The UNIX Programming Environment by Kernighan and Pike. This is a tidy little book that does more to explain how the tools work and work together than any other book I've seen. While it's insightful, it's also a good teaching text. You should also consider The Awk Programming Language by Aho, Kernighan and Weinberger (the A. W. K. in awk). If our readers know of other texts that do a good job of explaining how to use the UNIX language-oriented tools, I'd like to hear from you. -- rlw Thank you for your letter/brochure. First, I have some questions. I studied BASIC last Semester at Comm. College, and would now like to learn C. My major problem is MY computer. I have a Commodore 64 with 256K RAM expansion, and plan to use Abacus Software's Super C Compiler 64. I am a retiree with little prospect of buying a new computer. 1. Do you offer much in this format, or am I butting my head against a wall? 2. Would it be practical for me to attend a class where they are using, probably, IBM compatibles, and do my homework on my system? Would work developed on my system operate on "IBM"s? The disks are not compatible, but could my work be 'retyped' into the "IBM"? 
I have Standard C by Plauger & Brodie, and Transactor Magazine has articles which look like they will be useful when I learn more. Les Maynard P.O. Box 915 Palmer, AK 99645 Unfortunately, we can't write Commodore disks. However, it's my understanding that if you have the right Commodore drive you can get a program that will let you read MS-DOS disks directly. Whether you can do your C homework on your Commodore depends on several things: 1) Is your instructor willing to accept Commodore output. If you have to run your work on an MS-DOS host to make it acceptable, it probably won't work. 2) What subjects and exercises will the class focus upon? If writing direct to the IBM video display is one of the exercises, it probably isn't reasonable for you to try to work along on the Commodore. If, on the other hand, the class will confine itself to general, portable language features and concepts, you will have less trouble. 3) How adept are you at researching your own system? At some point (probably several points), a classroom illustration isn't going to work on your machine. It really isn't fair to expect the instructor to research the problem for you. Can you find your own way? 4) Is your Commodore implementation complete enough to support the scope of the class? Will you be asked to write programs that exceed the memory space? Will you need doubles? Will the exercises require elaborate pre-processor capabilities? At the very least you should have a serious talk with the instructor before you enroll. Whether work you develop will run on an IBM depends entirely upon the code. If you confine yourself to generic file processing and discipline yourself to avoid or at least properly hide any Commodore peculiarities, then your code should run in the IBM environment. (You might find some helpful ideas in Rainer Gerhard's story in this issue.) Please note these are major ifs even for very experienced C programmers. 
-- rlw

Dear CUG, I am writing to warn you and other users of the problems I have found with LEX part 1 and 2 on disk number 172 and 173. The program generates code which crashes the system when run. The problem is in llstin(). If _tabp is NULL, it assigns it to the return of lexswitch(). lexswitch() returns a pointer to the previous table, which is NULL when first called. The result is _tabp being set to NULL forever. Since this table contains pointers to functions, the program jumps off to an unknown address. The source code that was provided will NOT generate this code, indicating that the exe file was not built from this source! So, I rebuilt it and, in testing, found the new exe produced different tables than the release program did.

There are various solutions to this problem. One is to set _tabp to the location of the table in the .lxi. Another is to edit the generated source file each time, removing the assignment to _tabp in llstin(). Or you could change lexswitch() to return the new value. I don't like the last one because all the documentation states the return value is a pointer to the previous table. Since I am using the -s option, I edit the file, as there is another problem with that option. The problem with the -s option may only exist with Microsoft C. llstin() is declared as a void at the beginning. The function itself is NOT. The compiler produces a diagnostic error. With the incorrect source, the only way around this is to edit the file. (A REAL PAIN if you are using a make file to build the final program.)

I also have a copy of "Bison". It has worked very well with one exception. I found I had to include stdlib.h in simple.prs in order to get rid of several warning messages under certain conditions. One might include it inside the .y file, instead. By placing it inside simple.prs I don't have to remember to put it inside the .y. In general, I've found bison to be GREAT. Keep up the good work, and good luck.
Sincerely, Frank Veenstra 24797 Metric Dr. Moreno Valley, CA 92387

Yes, the .exe and source files are out of phase. We'll test your fix and remaster the volume with the fix. When we have a new master we'll announce an upgrade in the New Releases column. Thanks for the help. -- rlw

Mr. Robert Ward: In the May, 1989 issue of the C Users Journal, Timothy Prince presented a rather eloquent and detailed article entitled "Efficient Matrix Coding in C". However, I would like to bring to your attention (excuse me if someone already has) an error in that article. Mr. Prince asserts the following to be true:

    a[i][j] = *( &a[0][0] + i * I + j )

when given the declaration:

    float a[I][J] ;

C stores array elements in row-major order, not in column-major order as suggested above. The valid condition is as follows:

    a[i][j] = *( &a[0][0] + i * J + j )

for the given declaration. All the elements of row a[0][.] are located at a lower address than the first element of row a[1][.], which is stored right after the array element a[0][J-1]. Consequently, to access a[i][j], it is necessary to skip i rows, where each row contains J elements, plus the j elements before the desired element.

I would also like to take this opportunity to commend you and your staff on producing a Journal that is technically superior to all the other superficial computer magazines that I have read. That May issue was my first copy of C Users Journal and it certainly will not be my last. Sincerely yours, Girish T. Hagan 27401 Via Olmo Mission Viejo, CA 92691

Ah yes, the hazard of too much FORTRAN and Pascal. Thanks for correcting our slip -- and thanks for the kind words. -- rlw

Dear Robert, I have been a member of the C Users' Group for quite a long time now, around the seven to eight years mark. Over this period I have kept all of your newsletters and your present The C Users Journal publications. I have watched the evolution of the Journal with great interest.
During your 'early days' I often reread some of the newsletters when I needed some information on a particular piece of code, or on a bug which another member had discovered. But time seems to compress as you get older. These days I rarely have the time to re-read articles, unless it is important that I do so. WHY is he telling me this...do I hear you ask? Well, I hope I have set the scene properly because I assume you have many more readers than just Phil Cogar who have difficulty in finding enough time to squeeze in their preferred reading. Professionals in any line of work tend to be busy people.

Which brings me to the August issue of the Journal and, specifically, the article by Denis Schrader on the FOR_C Translator. Not that I am at all interested in FORTRAN_to_C translators but I always read the Journal from cover to cover and I hope my comments will assist in raising the standard of the Journal even further. With respect to Denis Schrader, who I hope does not take offence that I have selected his article to point out what I believe is wrong with some of the User Reports, I would like to direct your attention to this article with the plea that you consider setting certain standards for authors to write to for future User Reports.

So, and without wishing to offend Denis, let me start by asking you to instruct your authors to make their reviews complete (or as complete as they can in the circumstance) as they stand. Don't presume the reader either has access to, or the inclination to look up, an earlier review. Of course rules are meant to be broken so you might give a reference to something written within the previous several months, but I suggest two years is a bit too long. I refer here specifically to the words-- "...which I reviewed in the August 1987 issue...However, comments in this review will point out improvements which have occurred since the release of earlier versions of the product."

Point 2, back up specific comments with specific information.
For example if you say-- "The translator will pay for itself quickly in saved programmer hours." then you should also say how much it costs, both the List Price and, if you know it, the street price. Point 3, if we are talking about a specific product then either cut out or cut down on the generalisations. An example of this is the comment (statement?)-- "The translator translates almost 100 percent of ANSI Standard FORTRAN as well...extensions." If the reader is reading the User Report because he or she wants to be better informed about the product (and isn't that the purpose of the User Report?) then, in this case, unless we are told- -Whether this (the non-translation of the FORTRAN code) is a transient thing. In other words do you have to check each piece of translated code for small errors (perhaps for large errors...I don't know and the Review doesn't say) which might translate to bugs in C; or -Whether this is systematic and the FOR_C translator only fails to properly translate certain pieces of FORTRAN code properly into C. In such a case does the translator 'flag' the offending pieces of code so they can be corrected using the recommended, known conversion; or -If your translated C code compiles without the compiler complaining to you, does this mean the code is a 1-for-1 translation of the FORTRAN routine, or not; and so on. It seems to me that a generalised comment of the type mentioned above does little (nothing?) to better inform the reader about the merits or otherwise of the product. Point 4, comparisons are odious (or so we are told) but they seem to abound in product reviews. My point is that partial comparisons tend more to mislead the reader than to inform him/her. In other words we are talking here about a product which translates FORTRAN code into C code. We are not told WHY it is desirable to do this if you already have good, de-bugged FORTRAN routines you wish to incorporate into C programmes. 
Please correct me if I have got it wrong but, as I understand the situation, the Microsoft family of microcomputer languages allows you to generate files compiled in Basic, C, Pascal, FORTRAN and assembler, any and all of which can be linked into a run-time file as required. I am (most certainly) NOT an apologist for Microsoft but I do suggest a reviewer has not properly informed the reader as to the merits or otherwise of the product without at least canvassing other alternatives. If Microsoft, for example, have a family of languages which can do the job in another way (you'll notice I didn't say 'a better way' because I don't have a clue which is the better approach, the Review didn't tell me) then the Reviewer should at least mention this. In other words alert the reader to other possible alternatives, at least. The preferred option would be to make a comparison between the competing products and compare features, strengths and weaknesses.

So there it is. In summary my four points are--

Point 1- Make the review as complete as possible in the space allowed. Don't ask the reader to look up other references. We aren't dealing with a scientific paper, just a product review.

Point 2- Give specific (factual) backup to specific comments. It's not that we don't trust reviewers to be objective, but we are discussing opinions here, and my opinion may well differ from the Reviewer's if I am given the opportunity to see what his/her opinion was based on.

Point 3- Leave out generalisations, at least if we are discussing one, specific product. Generalisations are OK if we are discussing a 'family' of products. Who was it said-- 'All generalisations are false.' Or perhaps I got that wrong?

Point 4- If you believe comparisons (with products from other sources) make the review stronger, then by all means put in the comparisons...but at least try to cover the best alternatives to the product being reviewed. Anything less and you are misleading your readers.
I know it has been tedious, but that's all I wish to say on the subject for the moment. Perhaps you will find something here to put before future Product Reviewers, when they submit their articles. My hope is that I have sparked a debate which will lead to an even higher standard for what is already a fine publication. Yours sincerely, Phil E. Cogar P.O. Box 364, Narrabeen, N.S.W. Australia 2101 I find myself in complete agreement with your four points. I'm sorry the FOR_C article didn't measure up. Generally I'd just as soon do without "reviews". That's why we've used the label "User Reports". I don't really care if someone gave the product four stars -- I want to know what it's like to use the product. Will it require some changes in my work habits? Does it seem to fit a certain design style better than others? Are certain unobvious tricks necessary to certain goals. If someone has spent enough time with a product to be qualified to evaluate it for other experienced programmers, then that person has also learned several things that aren't in the manuals. Why should I have to relearn those items if I decide to buy the product? The writer should give me the full vicarious benefit of his experience. Here are some of my guidelines for anyone interested in writing a product related story. Don't try to sell the product or your philosophy of how products should be designed, tested, marketed, packaged ... whatever. Instead, tell us what it does and doesn't do. Keep the opinions to a minimum. If you give intelligent, experienced readers access to the facts that produced your opinion, they'll reach a similar (or at least reasonable) opinion on their own. Don't be cute. I don't care how entertaining you think your struggle to remove the shrink wrap was, I don't want to waste time reading about it. Don't guess. If you aren't certain about a particular issue, either find out or don't mention it. Don't just list features. That's the role of vendor literature. 
Do share all you learned in working with the product. If you include information inappropriate to my audience, I can edit it out. I can't edit in information. I'm acutely aware that we very seldom get product-related copy that fully measures up to these guidelines. We're always working on getting better copy. -- rlw

To The C Users Group: I am disheartened at the lack of truly advanced pioneering books in C programming, particularly those of a scientific nature. Numerical Recipes in C and Numerical Software Tools in C are the only two that I have heard of, and they are primarily algorithmic books without instruction. Everyone seems to be publishing the same linked lists, the same databases, and the same TODO lists, just as in assembly language books one gets the same RAM disks, disk caches and clocks. That is not just book publishers either. Journals and magazines are doing the same thing. I cannot believe that the programming community lacks such expertise. When will publishers realize that enough is enough, and start producing books and articles of a truly advanced nature, like the one you had on the Fast Walsh Transform? It is also time for a complete numerical methods book written for C programming in a common compiler (MSC, TC) with full descriptions as one would receive in a course in numerical methods at a University. Sincerely, Jerry Rice 504 Eastland St. El Paso, TX 79907

Maybe some qualified author (with a willing publisher) will hear your plea. Why do publishers publish the same material over and over? Perhaps because it sells. One of our earlier issues (with several stories covering the fundamentals of device drivers) remains one of our most popular back issues. Perhaps device drivers are old-hat to you, but to many they remain a mystery.
Most of our readers are expert programmers, they just aren't all expert in the same areas.--rlw Using Header Files To Enhance Portability Rainer Gerhards Rainer Gerhards specializes in systems programming and has a strong interest in C. He has written some large-scale control systems and many small utilities in C. He owns his own small software company in addition to managing the computing center of a mid-sized company. He may be contacted at Petronellastrasse 6, 5112 Baesweiler, West Germany. C is known for its efficient code, rich set of features and portability. While portability is not built in, you can avoid possible portability problems by anticipating them. Let's look at a few problem areas, suggest some solutions, and examine one method in detail. One important portability issue is the C dialect that your compiler implements. Although there have always been C language standards, until recently they have been too imprecise to preclude varying interpretations. Early, less powerful machines also forced compiler writers to limit features, contributing additional variant dialects. Thus, some compilers can't understand valid C-coding if it contains unsupported features. Bit fields are a good example. A number of modern compilers still don't support bit fields. Of course, you could avoid using bit fields, but what if you write for one compiler which doesn't support structure and union assignment and for several others which do? You might avoid these constructs too, but would you prefer to learn while porting a 50,000 line program which makes extensive use of structure assignment, that the environment to which you're porting doesn't support structure assignment? The challenge is to know which features to avoid. Now nearly all commercially-used compilers support C in its entirety. But these compilers offer extra features, especially in the preprocessor area. 
Though you may simply avoid these features, you may not know which features are non-standard, especially if you are new to C or if you work in just one environment. Some compiler vendors don't flag such features. Even an experienced C programmer determined to avoid the problems outlined above by using only standardized constructs still faces the difficulty of deciding which "standard" to use: the original Kernighan and Ritchie (K&R) standard defined in The C Programming Language, or the forthcoming ANSI standard. The ANSI standard resolves many portability problems not addressed by K&R and provides a good base for the future. The ANSI standard is mostly upwardly compatible with K&R; most K&R programs can be moved to ANSI compilers without any problems. But in order to move code in the opposite direction successfully (from ANSI to K&R), compilers require special preprocessor tricks I'll describe later. The standard library poses similar problems. Compiler writers have restricted and extended the library rather than the language. Some compilers don't even have a standard library; many libraries include numerous extensions. MS-DOS compilers in particular tend to offer extensions covering graphics, interrupts, and operating system interfaces. Porting code which uses one compiler's extensions to a different compiler can be very difficult. Operating system differences, because they are the hardest to hide, are among the hardest subjects to address. Moreover, operating systems differ greatly -- some do multi-tasking, some are multi-user, and some are single tasking systems. The file-naming conventions are anything but standardized. These problems are minor compared to the variations in file organization. For example, while most operating systems consider text files to have variable length records (if any), some use fixed-length records (if any). Records may be delimited by \n, \n\r or record-length fields. Some OSs use special blocking mechanisms, others don't. 
Fortunately most standard libraries can hide these differences, but only by distinguishing between text and binary mode, introducing subtle, non-standard features. In addition to processing files the operating system should have some kind of interaction with the user, which leads to additional problems if you use special system features like asynchronous communication or sophisticated display manipulation.

Hardware differences can cause programs that compile and link without error and run well in one environment to crash in another. Often these problems are caused by different word lengths. It's hard for a UNIX programmer working with the portable C compiler (PCC) on 68xxx to learn that the same PCC on 80x86-based machines uses 16 instead of 32 bits for integers. A program that uses integers to index some two million database records on a 68xxx machine may require a major rewrite before it can access more than 32,767 records on the 80x86 machine.

Hardware differences can also affect the portability of pointer casts. Many programmers assume that pointers can simply be cast from one type to another -- a reasonable assumption on most byte machines. However, word machines' (like the Unisys 1100) pointers to word-aligned items differ significantly from pointers to non-aligned items. This is true for some so-called byte machines too. Still other problems arise when you port code from machines with a segmented address space to one with a linear address space.

The last problem is machine resources. Many programmers assume that if their code is portable and standardized, their program will run on all machines supporting a standard C compiler. While this is basically true, some programs require so much memory or processing time that they simply can't be run on some smaller machines.

Designing For Portability

In spite of these problems, it is possible to write C programs that can be compiled and executed in different environments.
To be portable, a program must be designed and coded in a fashion that hides environmental differences. C's own design hides many environmental differences. The standard library is a successful attempt to hide some very environment-specific information -- such as the way in which file system calls (and some others) are done on the target operating system. Without the standard library, every programmer would have to write the interface coding himself. Even worse, he would have to rewrite it again and again for each new environment.

You can hide other large environment differences by creating your own "standard libraries" for other tasks: extract the non-portable operations to a separate source module, define a general interface for this module and build a different implementation for every environment you want to work with. Many of the high quality portable support library products available do this for you. Such a library provides "instant" portability, lower cost, and more functionality than an equivalent product written by a single programmer.

While system-specific libraries are appropriate for horrible, non-portable tasks like dealing with the user console, using a standardized function call for smaller tasks which require only slightly different coding in limited areas of the source code might not make sense. For example, it would not make sense to define a one-line function to set a signal handler under one environment only, especially if the function is called from inside a tight loop where the calling overhead could cause performance problems. The C preprocessor is the obvious tool for these smaller coding differences: just use conditional compilation to enable the code which sets the signal handler in the one environment where it's needed. You don't have to define a large number of functions, and there is no unnecessary calling overhead.

The preprocessor can also help solve problems that arise simply because different names are used for the same thing.
For example, nearly every MS-DOS compiler uses its own names for the machine-level i/o (port) functions (for example inp and outp versus inportb and outportb). Fortunately these functions have the same calling conventions. In this situation, rather than wrap every function call in conditional compilation, use conditional compilation once to define a macro that in turn calls the function with the right name. Everywhere else, the code uses the macro to call the function.

Macro and constant definitions can also completely hide slight differences in standard library parameters. For example, when working under two different operating systems where the standard libraries have different open modes for text and binary files, you could use the following call to open a binary file for writing:

    fp = fopen("file", OPM_WB)

Under UNIX, OPM_WB would be defined as "w" and the call would expand to

    fp = fopen("file", "w")

Under MS-DOS (Microsoft C) OPM_WB would be defined as "wb" and the call would expand to

    fp = fopen("file", "wb")

Sometimes a simple define can also hide significant hardware differences. Different data type sizes can be hidden by defining your own data types with a guaranteed minimum and maximum precision. For example, type int32 (integer containing at least 32 bits) would be mapped to int for 68xxx machines and to long for 80x86 machines. If int32 has been used in every spot requiring a 32-bit integer, nothing but the definition needs to be changed to adjust for the alternate name.

(Please note that a data type redefinition can be done either with the preprocessor or a compiler typedef. While the former is potentially more portable, so far I have not seen a compiler which does not implement typedef. Thus I prefer using typedef because sophisticated compilers can do better error checking with it. However, if you want to be absolutely sure that your data type redefinition will be accepted by all old compilers, you must use preprocessor defines.)
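The three techniques just described (a renaming macro, an open-mode constant, and a sized integer type) can be sketched together as follows. This is only an illustration under stated assumptions, not the contents of the author's environ.h: the macro names PORT_IN and PORT_OUT and the function save_values are invented here, MSDOS is assumed to be a 0/1 switch set by hand, and int32 is chosen by testing INT_MAX from limits.h rather than by naming the machine as the article does.

```c
#include <limits.h>
#include <stdio.h>

/* Assumed hand-set switch: 1 = MS-DOS conventions, 0 = UNIX conventions. */
#ifndef MSDOS
#define MSDOS 0
#endif

/* Open mode for writing a binary file, as in the OPM_WB example. */
#if MSDOS
#define OPM_WB "wb"
#else
#define OPM_WB "w"
#endif

/* int32: an integer type with at least 32 bits. */
#if INT_MAX >= 2147483647
typedef int int32;
#else
typedef long int32;
#endif

/* One-time mapping of the port i/o names (PORT_IN/PORT_OUT are
   hypothetical names); only meaningful for MS-DOS compilers. */
#if MSDOS
#ifdef __TURBOC__
#include <dos.h>
#define PORT_IN(port)         inportb(port)
#define PORT_OUT(port, value) outportb((port), (value))
#else
#include <conio.h>
#define PORT_IN(port)         inp(port)
#define PORT_OUT(port, value) outp((port), (value))
#endif
#endif

/* Write count values to a binary file; returns 0 on success, -1 on error.
   The body contains no conditional compilation at all. */
int save_values(const char *name, const int32 *values, size_t count)
{
    FILE *fp = fopen(name, OPM_WB);

    if (fp == NULL)
        return -1;
    if (fwrite(values, sizeof values[0], count, fp) != count) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

The point of the arrangement is that save_values, and every other function written against these names, stays identical across environments; only the block of definitions at the top changes.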
By now it is obvious that the preprocessor can help make programs more portable. What would make more sense than to combine all these preprocessor-based aids? This can be done in a single header file. For nearly two years I have been using such a file, working mainly with four different MS-DOS compilers and the UNIX PCC. The idea developed because of minor standard-library differences between MS-DOS compilers, but it soon became clear that the header file could help when porting to UNIX, too. The still incomplete result will be described below.

environ.h

All necessary preprocessor statements and typedefs are included in one single file named environ.h (Listing 1). It should be the very first file included. Before including environ.h, you should define which other standard include files you need. This is done by defining some preprocessor constants which correspond to standard include file functionality. You read right, functionality -- not names. For example, if you select the define INCL_ASSERT, not only will the file assert.h be included but the necessary (for MS-DOS/MSC) file process.h also. If you compile under UNIX, only assert.h is included. Defining these constants in terms of functionality hides the include file name differences -- an important feature that saves you many conditional directives in the source modules. Microsoft uses a similar system for their OS/2 header files in MSC 5.1.

When completely defined for your environment, environ.h should #include all include files needed by your application. If you find it necessary to explicitly include other files, you should extend the definitions in environ.h. They are still incomplete (see lines 274 - 401).

environ.h begins by preventing the accidental inclusion of a header file more than once. Multiple inclusion may cause damage to some preprocessor defines. At best, it will cause additional overhead, and at worst, program errors may occur.
To prevent these problems environ.h checks preprocessor constant ENVIRON_H. If this constant is defined, environ.h assumes that it has been previously included and takes no further steps (via the #ifndef ENVIRON_H in line 26). If ENVIRON_H is not defined, then this is the first inclusion of environ.h and processing takes place. First ENVIRON_H is defined, ensuring that no second inclusion will be possible. Next, based on which compiler and operating system are active, environ.h defines the target environment.

Information about the environment is acquired in a relatively straightforward way (lines 29 - 165). Operating-system specific constants that may be defined automatically by the compiler are purged -- they will be replaced with your own. The #undef of the default definitions is not actually necessary, but it will prevent possible warning messages from appearing when redefining the compiler default constants.

The #undefs are followed by defines which select the target OS. Only one may be active at one time. Note the definition to 0 or 1. You could also define only one OS constant and use #ifdef instead of #if CONSTANT == 1, but this has the disadvantage that K&R compilers have no "#if defined(CONSTANT)". Without this command it is hard to build complex preprocessor-ifs using #ifdef and #ifndef because you can't use Boolean operators. If you define the constants to 0 and 1, you can build normal conditional expressions. This is an advantage if you consider that you must often ask questions like

    #if MSDOS && USE_BIOS

Following the OS definition there are some auxiliary definitions used only under specific OS to identify the target machine. Currently these apply only to certain generic MS-DOS machines with incompatible hardware or BIOS, requiring actual MS-DOS calls (as opposed to BIOS calls or direct hardware manipulation). The only common example is the early Wang PCs, for which there is a separate definition.
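The inclusion guard and the 0/1 style of OS selection can be illustrated with a much reduced sketch. It is not the listing itself: the values chosen here (UNIX active, USE_BIOS off) are arbitrary, and console_method is a hypothetical function added only to show how the constants combine in an ordinary #if expression. Nested #if/#else is used instead of #elif on the assumption that some older preprocessors lack #elif.

```c
/* ---- what a header in the style of environ.h might contain ---- */
#ifndef ENVIRON_H
#define ENVIRON_H

/* Purge any compiler-supplied defaults so that our own definitions
   cannot trigger redefinition warnings. */
#undef MSDOS
#undef UNIX

/* Select the target OS: exactly one of these should be 1. */
#define MSDOS 0
#define UNIX  1

/* Auxiliary MS-DOS setting: 1 if BIOS calls may be used. */
#define USE_BIOS 0

#endif /* ENVIRON_H */

/* ---- a source module that relies on those constants ---- */

/* Report which console interface this build would use.
   (Hypothetical function, for illustration only.) */
const char *console_method(void)
{
#if MSDOS && USE_BIOS
    return "BIOS calls";
#else
#if MSDOS
    return "MS-DOS calls";
#else
    return "standard I/O";
#endif
#endif
}
```

Because every constant always has a value, the test `MSDOS && USE_BIOS` is well formed in every configuration, which is the advantage the 0/1 convention has over bare #ifdef tests.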
The operating system definitions are followed by the compiler definitions. A specific compiler selection is only necessary if more than one is available under one OS. In my case this is only needed for MS-DOS. But as you can see in environ.h, there is only a definition for MSC. All other compilers I use identify themselves by an automatic constant definition upon startup (e.g., __TURBOC__ for Borland's Turbo C). Note that the MSC constant is overridden if one of the other predefined constants is detected or an OS other than MS-DOS is active (lines 88 - 106). This feature simplifies proper configuration of the header file.

Separate constants for each compiler allow conditional compilation for small compiler differences. To avoid code like "#if MSC || DLC || LC || __TURBOC__ ...." we introduce some language set selection constants (lines 70 - 76). Each define corresponds to one language feature. If the constant is set to true (1), that language feature can be used; otherwise it cannot. All other decisions are based on these feature selection constants and are much more readable. Now the example given above takes the more intelligible form #if USE_VOID.

To avoid modifying all language selection constants each time you change compilers, environ.h includes an automatic language set selection, which redefines the language set constants based on the compiler and OS definitions. While auto selection is currently only functional in the MS-DOS environment, it can easily be expanded to work under different operating systems (lines 129 - 164).

To complete the environment definition, environ.h defines the constant ANSI_C to 0 or 1 according to the compiler's C standard (K&R or ANSI) (lines 119 - 127). This constant is currently set based on the state of a language feature selection (like USE_VOID), but could become more important in the future.

The example header file still lacks one feature, a definition check. All definitions are accepted as entered.
If, for example, the programmer defines two or more operating systems to 1, the behavior of environ.h is undefined but clearly erroneous. This could be avoided by checking the entered definitions to see if two or more are true and aborting compilation if so:

    #if MSDOS && UNIX
    "Error: Both MSDOS and UNIX selected"
    #endif

This code tests for the error condition and generates a compile-time error if it detects one. The error message generated by the compiler points at the real error message in the source module. Examples can be found in CUG library volume 227 (compatible graphics) in file graphics.h. This file contains extensive definition checking.

So far environ.h has supplied definitions that allow conditional compilation in the source units but no automatic porting aids. The balance of the file addresses this second need.

Different compiler data types and modifiers can be hidden largely by preprocessor defines. For example, if the compiler doesn't support the void keyword, just define void to nothing, and the void keyword will disappear. Since you didn't use void originally when writing for that compiler, this disappearance will cause no problems. Your code can now be used with compilers that support void without any additional work. That is the key feature of modifier definition: you can hide all data type and modifier differences by simply defining the data type in question to nothing (as in lines 167 - 195 in environ.h).

Here's another example: if a compiler doesn't support the volatile modifier, it normally doesn't do the strange optimizations that force you to use volatile (or they can be turned off), so there is no problem in purging all volatile modifiers in your source. This kind of type redefinition allows you to use the types on machines supporting them without losing backward compatibility. If an older compiler doesn't support these type modifiers, their extra value is gone but your program still runs without problems.
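Both ideas, the definition check and the purging of unsupported modifiers, can be sketched together. The configuration values and the checksum() function are assumptions made for this example and do not come from Listing 1. The sketch uses the #error directive, which requires an ANSI preprocessor; on older preprocessors the stray string literal shown in the text serves the same purpose.

```c
/* Hypothetical configuration for illustration only. */
#define MSDOS    0
#define UNIX     1
#define USE_FAR  0   /* assumption: this compiler has no far keyword */
#define USE_NEAR 0   /* assumption: this compiler has no near keyword */

/* Definition check: the constants are 0 or 1, so their sum counts
 * how many systems were selected. */
#if (MSDOS + UNIX) != 1
#error "exactly one target operating system must be selected"
#endif

/* Modifier purge: unsupported keywords expand to nothing. */
#if !USE_FAR
#define far
#endif
#if !USE_NEAR
#define near
#endif

/* Written once with the modifier; compiles with or without it. */
unsigned checksum(const unsigned char far *buf, unsigned len)
{
    unsigned sum = 0;
    unsigned i;

    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

Summing the constants catches both the "none selected" and the "more than one selected" mistakes with a single test, which a pairwise && check would need several lines to cover.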
Most data types and modifiers can be treated in this manner. (In some cases you may instead redefine the type to something different -- e.g., define void to int instead of purging it.) However, some types and modifiers, like enum, can't simply be redefined to nothing or to some other value. If you try to redefine these types, your program won't compile due to the syntax differences between defining a "normal" data item and an enum one. Defining an enum is a process nearly identical to defining a structure or union. Special definitions are required. You can't hide them by one general define. You still can use enum on supporting and non-supporting compilers, but you must define all your enum types using conditional compilation. If the compiler supports enum, you can use it without difficulty. If not, you define an int type and use the preprocessor to define the enum tags:

    #if USE_ENUM
    typedef enum { A, B } enumtype;
    #else
    typedef int enumtype;
    #define A 0
    #define B 1
    #endif

This clearly entails more programming work but allows the use of extended error checking features of compilers that support enum.

You can define your own data types to hide hardware differences, especially machine word length differences. They ("personal types") have a guaranteed minimum and maximum precision and are mapped to the actual hardware data type. By relying on these "personal types," you can write programs that work on different machines in an expected manner, and you can take memory requirements into account because there is a guaranteed MAXIMUM precision. This problem wasn't critical to me, so the example header file contains only very limited support (lines 258 - 261). Please note that typedefs are used instead of preprocessor defines.

The next problem area is that of standard library function names and calling conventions. For example, calling exit() in C will commonly terminate your program gracefully. Under the Starsys OS, exit() is an OS call something like abort().
The real exit() function is called dx_exit(). This causes problems for all but a few programs and would normally require text modifications. But that's exactly what the preprocessor can do for you: if you're running under Starsys, just define a macro named exit which takes one parameter (the return value). It will expand to a call to dx_exit() with that given parameter (lines 234 - 236). A similar technique hides the variations among library functions with different names but identical calling parameters and functionality. Example macro definitions can be found a few lines above the exit() macro.

File open modes are addressed in lines 241 - 253. Please note that not all open modes are supported, but the definitions can be easily expanded.

Function Prototyping

Unfortunately, ANSI function prototyping is not supported in every environment. Rather than sacrificing the extended error checking features that prototyping offers by not using it at all, you can use prototyping when the compiler supports it and turn it off when it does not.

Turning off function prototyping is a little harder than turning off an unknown modifier. First you must build two classes of function prototypes, external and internal, corresponding to external and static functions. The external prototype macros appear in lines 197 - 211. A declaration written as extern func PROTT((int)) expands to extern func() for a K&R compiler and to extern func(int) for ANSI compilers. Please note the extra parentheses around int in the PROTT invocation. These parentheses become part of the macro argument and are re-expanded. After expansion, they are the function parentheses of extern func(int). These parentheses are especially important if you want to prototype a function with more than one argument. If there were no inner parentheses, the macro would have two arguments, which would force you to write one prototyping macro for every number of function arguments you will ever use.
Given these inner parentheses, the whole prototype is one macro argument and only one prototyping macro will satisfy all needs.

Normally you write a function header only once for each internal function. It is more difficult to hide these prototypes: the ANSI style is to write argument types and names in the function header (e.g., static func(int a)), while the K&R style is to write the argument names only (static func(a)). Fortunately ANSI compilers accept function headers written in K&R style, but usually don't build prototypes for such headers. One solution is to write the prototype first and then the actual function header:

    STATICPT(func, (int));
    static func()

In this case the function prototype defines the function first as extern to prototype it (just as is done in application header files). While this has worked well with all ANSI compilers I know of, I'm not certain that it is guaranteed to be legal under the ANSI standard. At first glance you may wonder why the prototype does not have the form static func PROTT((int)), and in fact I am not sure if these constructs are legal. Most compilers allow a function to be declared extern and later redefined as static. However, the MSC compiler doesn't accept this construct and generates error messages (at least QC does; CL accepts it with warnings). Instead, MSC allows both the function prototype and the actual function header to be declared static -- the approach used in environ.h. If MSC is active, the prototype attribute is redefined to static. To do this the macro must have control over the whole prototype line, not just part of it. So a new construct has been created. The macro has two parameters: the function name and the prototype. It expands to the correct modifier followed by the function name and (if selected) the function prototype.
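The two prototyping macros can be sketched as below. This is an adaptation under stated assumptions, not a copy of lines 197 - 211: it always uses the static form of STATICPT (the listing chooses extern for compilers other than MSC), it passes the return type as part of the first macro argument because current compilers no longer accept an implicit int return type, and the functions add2(), twice() and quadruple() are hypothetical names invented for the example.

```c
#define USE_PROTT 1   /* assumption: the compiler understands prototypes */

#if USE_PROTT
#define PROTT(x) x                               /* keep the parameter list */
#define STATICPT(func, prott) static func prott
#else
#define PROTT(x) ()                              /* K&R: empty parentheses */
#define STATICPT(func, prott) static func ()
#endif

/* The inner parentheses make "(int a, int b)" one single macro argument,
 * so one macro serves functions with any number of parameters. */
extern int add2 PROTT((int a, int b));

/* Internal function: the macro controls the whole declaration line. */
STATICPT(int twice, (int n));

int add2(int a, int b)
{
    return a + b;
}

static int twice(int n)
{
    return 2 * n;
}

int quadruple(int n)
{
    return twice(twice(n));
}
```

With USE_PROTT set to 1 the two declarations become full prototypes, so a call such as add2(1) is rejected at compile time. With a K&R compiler the function headers themselves would also have to be written in the old style, as the text describes.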
This may be a somewhat unusual macro construct, but remember that the C preprocessor is mainly a text substitution tool and not part of the actual compilation process. This allows the preprocessor to make some very strange modifications to the C source code, including constructs like the static function prototyping which cannot be done by any C statement. Building such unusual constructs can give very simple solutions to otherwise intractable problems. The STATICPT() macro can be found between lines 197 and 211.

Conclusions

As you can see, the environmental header file environ.h can aid in writing portable programs, especially in the problem areas of data type, modifier and name differences. In addition, some machine specifics can be hidden and some newer constructs mapped to work with older compilers.

On the other hand, the header file can't hide some differences (e.g., different mechanisms for interacting with the user console). Such differences require special coding that normally should be contained in external modules. But the header file can help you write these modules too by precisely defining the target environment. Precise functional definitions are the basis for selecting the right code sequences in the low-level driver modules (assuming that coding for more than one environment can be contained in one source unit). The definitions will aid you in activating slightly different source lines which you may have in your program.

Thus, a larger porting system is built using three modules. First, the environment header file describes the environment and hides all differences possible using the preprocessor and typedefs (mainly text substitutions). Second, libraries of standardized functions handle larger problem areas that actually require different coding. Third, conditional compilation within the source modules hides very small differences where the text-substitution capabilities of the preprocessor are insufficient and a special function call makes no sense.
This last option should be limited to cases where it is absolutely necessary, because conditional compilation is not really portable programming, but rather a way of carrying code for all known environments. If you switch to a new environment, you must not only write new code but also hunt for the problem areas in the source file. To avoid these problems I recommend flagging these lines with special comments (e.g., /*PORT*/).

Related code can be found in the CUG library holdings. Volume CUG227 contains a compatible graphics system which makes extensive use of the preprocessor's text substitution capabilities. Volume CUG265, the cpio starter kit, contains a header file similar to the one discussed here. It also contains programs using it.

Listing 1

  1: /*
  2:  *  e n v i r o n . h
  3:  *  -----------------
  4:  *  This module contains environment specific information.
  5:  *  It's used to make the programs more portable.
  6:  *
  7:  *  @(#)Copyright (C) by Rainer Gerhards. All rights reserved.
  8:  *
  9:  *  Include-file selection defines are:
 10:  *
 11:  *  Define          Class
 12:  *  ---------------------------------------------------------
 13:  *  INCL_ASSERT     assert macro and needed functions
 14:  *  INCL_CONIO      low-level console i/o
 15:  *  INCL_CONVERT    conversion and classification functions
 16:  *  INCL_CTYPE      ctype.h
 17:  *  INCL_CURSES     curses.h
 18:  *  INCL_LLIO       low-level i/o
 19:  *  INCL_MEMORY     memory allocation/deallocation functions
 20:  *  INCL_MSDOS      MS-DOS support
 21:  *  INCL_PROCESS    process control
 22:  *  INCL_STDIO      stdio.h
 23:  *  INCL_STDLIB     standard library functions
 24:  *  INCL_STRING     string handling functions
 25:  */
 26: #ifndef ENVIRON_H
 27: #define ENVIRON_H
 28:
 29: #undef MSDOS
 30: #undef OS2
 31: #undef UNIX
 32: #undef STARSYS
 33:
 34: /*
 35:  * configurable parameters.
 36:  * modify the following parameters according to the target environment.
 37:  */
 38:
 39: /*
 40:  * define target operating system
 41:  */
 42: #define MSDOS    0
 43: #define UNIX     0
 44: #define OS2      1
 45: #define STARSYS  0
 46:
 47: /*
 48:  * define target machine
 49:  *
 50:  * This is auxiliary data only needed for some operating
 51:  * systems. Currently only needed if MS-DOS is active.
 52:  */
 53: #define IBM_PC   1   /* IBM PC, XT, AT & compatibles */
 54: #define WANG_PC  0   /* Wang PC, APC ... */
 55:
 56: /*
 57:  * define target compiler (if necessary)
 58:  */
 59: #undef MSC
 60: #define MSC      1   /* Microsoft C */
 61:
 62: #define AUTO_SEL 1
 63: /*
 64:  * The above #define allows an automatic language set selection. It is
 65:  * only functional if the used compiler identifies itself via a #define.
 66:  *
 67:  * Note: If AUTO_SEL is set, the parameters below are meaningless!
 68:  */
 69:
 70: #define USE_FAR    0   /* use far keyword */
 71: #define USE_NEAR   0   /* use near keyword */
 72: #define USE_VOID   1   /* use void keyword */
 73: #define USE_VOLA   0   /* use volatile keyword */
 74: #define USE_CONST  0   /* use const keyword */
 75: #define USE_PROTT  0   /* use function prototypes */
 76: #define USE_INTR   0   /* use interrupt keyword */
 77:
 78: /* +--------------------------------------------------------+
 79:  * End Of Configurable Parameters
 80:  * +--------------------------------------------------------+
 81:  * Please do not make any changes below this point!
 82:  */
 83:
 84: #ifdef SYMDEB
 85: #  define SYMDEB 0
 86: #endif
 87:
 88: /*
 89:  * Check target_compiler. Note that the MSC switch is overridden if
 90:  * either __TURBOC__ or DLC are defined.
 91:  */
 92: #ifdef __TURBOC__
 93: #  undef MSC
 94: #endif
 95: #ifdef DLC
 96: #  undef MSC
 97: #endif
 98: #if STARSYS
 99: #  undef MSC
100: #endif
101:
102: #if !(MSDOS || OS2)
103: #  undef MSC
104: #  undef AUTO_SEL
105: #  define AUTO_SEL 0
106: #endif
107:
108: #if OS2
109: #  undef MSC
110: #  define MSC 1
111: #  undef AUTO_SEL
112: #  define AUTO_SEL 1
113: #endif
114:
115: /*
116:  * Compiler ANSI-compatible?
117:  * (First we assume it's not!)
118:  */
119: #define ANSI_C 0
120: #ifdef MSC
121: #  undef ANSI_C
122: #  define ANSI_C 1
123: #endif
124: #ifdef TURBO_C
125: #  undef ANSI_C
126: #  define ANSI_C 1
127: #endif
128:
129: #if AUTO_SEL
130: #  undef USE_FAR
131: #  undef USE_NEAR
132: #  undef USE_VOID
133: #  undef USE_VOLA
134: #  undef USE_CONST
135: #  undef USE_PROTT
136: #  undef USE_INTR
137: #  ifdef __TURBOC__
138: #    define USE_FAR    1
139: #    define USE_NEAR   1
140: #    define USE_VOID   1
141: #    define USE_VOLA   1
142: #    define USE_CONST  1
143: #    define USE_PROTT  1
144: #    define USE_INTR   1
145: #  endif
146: #  ifdef DLC
147: #    define USE_FAR    1
148: #    define USE_NEAR   1
149: #    define USE_VOID   1
150: #    define USE_VOLA   1
151: #    define USE_CONST  1
152: #    define USE_PROTT  1
153: #    define USE_INTR   0
154: #  endif
155: #  ifdef MSC
156: #    define USE_FAR    1
157: #    define USE_NEAR   1
158: #    define USE_VOID   1
159: #    define USE_VOLA   1
160: #    define USE_CONST  1
161: #    define USE_PROTT  1
162: #    define USE_INTR   1
163: #  endif
164: #endif
165:
166:
167: #if !USE_FAR
168: #define far
169: #endif
170:
171: #if !USE_NEAR
172: #define near
173: #endif
174:
175: #if !USE_VOID
176: #define void
177: #endif
178:
179: #if !USE_VOLA
180: #define volatile
181: #endif
182:
183: #if !USE_CONST
184: #define const
185: #endif
186:
187: #if USE_INTR
188: #  ifdef MSC
189: #    define INTERRUPT interrupt far
190: #  else
191: #    define INTERRUPT interrupt
192: #  endif
193: #else
194: #  define INTERRUPT
195: #endif
196:
197: #if USE_PROTT
198: #  define PROTT(x) x
199: #  ifdef MSC
200: #    define STATICPT(func, prott) static func prott
201: #  else
202: #    define STATICPT(func, prott) extern func prott
203: #  endif
204: #else
205: #  define PROTT(x) ()
206: #  ifdef MSC
207: #    define STATICPT(func, prott) static func ()
208: #  else
209: #    define STATICPT(func, prott) extern func ()
210: #  endif
211: #endif
212:
213: #ifdef MSC
214: #  define inportb(port) inp(port)
215: #  define outportb(port, val) outp(port, val)
216: #endif
217:
218: #ifdef __TURBOC__
219: #  define REGPKT struct REGS
220: #else
221: #  define REGPKT union REGS
222: #endif
223:
224: #ifdef DLC
225: #  define defined(x)
226: #  define inportb inp
227: #  define outportb outp
228: #endif
229:
230: #if !SYMDEB /* symbolic debugging support */
231: #  define STATICATT static
232: #endif
233:
234: #if STARSYS
235: #  define exit(x) dx_exit(x)
236: #endif
237:
238: /*
239:  * Define open modes according to selected operating system/compiler.
240:  */
241: #if MSDOS || OS2
242: #  define OPM_WB "wb"
243: #  define OPM_WT "wt"
244: #  define OPM_RB "rb"
245: #  define OPM_RT "rt"
246: #endif
247:
248: #if UNIX
249: #  define OPM_WB "w"
250: #  define OPM_WT "w"
251: #  define OPM_RB "r"
252: #  define OPM_RT "r"
253: #endif
254:
255: #define TRUE 1
256: #define FALSE 0
257:
258: typedef unsigned char uchar;
259: typedef int bool;
260: typedef unsigned short ushort;
261: typedef unsigned long ulong;
262:
263: #define tonumber(x) ((x) - '0')
264: #define FOREVERL() for(;;)
265:
266: /*
267:  * Select #include-files depending on target compiler and OS.
268:  *
269:  * Phases:
270:  * 1. Define all include selection constants to true or false.
271:  * 2. Select actual include files and include them.
272:  * 3. #Undef all include selection constants.
273:  */
274: #ifndef INCL_STDIO
275: #  define INCL_STDIO 0
276: #else
277: #  undef INCL_STDIO
278: #  define INCL_STDIO 1
279: #endif
280: #ifndef INCL_CURSES
281: #  define INCL_CURSES 0
282: #else
283: #  undef INCL_CURSES
284: #  define INCL_CURSES 1
285: #endif
286: #ifndef INCL_CTYPE
287: #  define INCL_CTYPE 0
288: #else
289: #  undef INCL_CTYPE
290: #  define INCL_CTYPE 1
291: #endif
292: #ifndef INCL_ASSERT
293: #  define INCL_ASSERT 0
294: #else
295: #  undef INCL_ASSERT
296: #  define INCL_ASSERT 1
297: #endif
298: #ifndef INCL_LLIO
299: #  define INCL_LLIO 0
300: #else
301: #  undef INCL_LLIO
302: #  define INCL_LLIO 1
303: #endif
304: #ifndef INCL_PROCESS
305: #  define INCL_PROCESS 0
306: #else
307: #  undef INCL_PROCESS
308: #  define INCL_PROCESS 1
309: #endif
310: #ifndef INCL_MEMORY
311: #  define INCL_MEMORY 0
312: #else
313: #  undef INCL_MEMORY
314: #  define INCL_MEMORY 1
315: #endif
316: #ifndef INCL_STRING
317: #  define INCL_STRING 0
318: #else
319: #  undef INCL_STRING
320: #  define INCL_STRING 1
321: #endif
322: #ifndef INCL_STDLIB
323: #  define INCL_STDLIB 0
324: #else
325: #  undef INCL_STDLIB
326: #  define INCL_STDLIB 1
327: #endif
328: #ifndef INCL_CONVERT
329: #  define INCL_CONVERT 0
330: #else
331: #  undef INCL_CONVERT
332: #  define INCL_CONVERT 1
333: #endif
334: #ifndef INCL_MSDOS
335: #  define INCL_MSDOS 0
336: #else
337: #  undef INCL_MSDOS
338: #  define INCL_MSDOS 1
339: #endif
340: #ifndef INCL_CONIO
341: #  define INCL_CONIO 0
342: #else
343: #  undef INCL_CONIO
344: #  define INCL_CONIO 1
345: #endif
346:
347: #if INCL_STDIO && !(INCL_CURSES && UNIX)
348: #  include <stdio.h>
349: #endif
350: #if INCL_CURSES && UNIX
351: #  include <curses.h>
352: #endif
353: #if INCL_CTYPE || INCL_CONVERT
354: #  include <ctype.h>
355: #endif
356: #if INCL_ASSERT
357: #  include <assert.h>
358: #  ifdef MSC
359: #    undef INCL_PROCESS
360: #    define INCL_PROCESS 1
361: #  endif
362: #  ifdef __TURBOC__
363: #    undef INCL_PROCESS
364: #    define INCL_PROCESS 1
365: #  endif
366: #endif
367: #if INCL_LLIO
368: #  ifdef MSC
369: #    include <fcntl.h>
370: #    include <io.h>
371: #  endif
372: #endif
373: #if INCL_PROCESS
374: #  ifdef MSC
375: #    include <process.h>
376: #  endif
377: #endif
378: #if INCL_MEMORY
379: #  include <malloc.h>
380: #endif
381: #if INCL_STRING
382: #  if ANSI_C
383: #    include <string.h>
384: #  endif
385: #endif
386: #if INCL_STDLIB || INCL_CONVERT
387: #  if ANSI_C
388: #    include <stdlib.h>
389: #  endif
390: #endif
391: #if INCL_CONIO
392: #  ifdef __TURBOC__
393: #    include <conio.h>
394: #  endif
395: #  ifdef MSC
396: #    include <conio.h>
397: #  endif
398: #endif
399: #if MSDOS && INCL_MSDOS
400: #  include <dos.h>
401: #endif
402:
403:
404: /*
405:  * Purge utility #defines.
406:  */
407: #undef INCL_STDIO
408:
409: #endif

Writing Standard Headers: The String Functions

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers training and consulting in C and C++. He is a member of X3J11, the ANSI C committee. He has an M.S.E. in computer science from the University of Pennsylvania. You can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or call (513) 324-3601.

In a recent letter to The C Users Journal, Phil Cogar of N.S.W. Australia complained that much of the C source code appearing in this and other programming journals contains references to headers such as <stdlib.h> that are not published along with the code. He observed that if your compiler provides these headers, then typing in the code and getting it to run is usually easy; without them, it may be impossible. He has a legitimate complaint, but as editor Robert Ward points out in his response, it's often impractical to publish the headers with the code. (See The C Users Journal, October 1989, p. 138.)

To get the programs to run, you can write your own standard headers to go with your existing compiler and library.
Although writing an entire Standard C library from scratch is a big chore, you can fill many of the gaps in an existing library by yourself in only a few days.

The Standard Headers

The fifteen headers specified by the Standard are summarized in Table 1. Most of them declare a set of related library functions, along with any macros and types needed to call them. A few headers don't contain any functions; they simply define useful macros and types that have nowhere else to go. Some macros and types appear in more than one header, but each function is declared only once.

Most compilers supply additional headers. For example, UNIX compilers add headers such as <direct.h>, <fcntl.h> and <process.h>. Many MS-DOS compilers supply some of the UNIX headers, along with others such as <bios.h>, <conio.h> and <dos.h>. None of these headers is covered by the C Standard. Some UNIX headers have been formalized by the IEEE 1003.1 POSIX Portable Operating System Standard, but many aren't covered by any non-proprietary standard. A C program using library headers other than those listed in Table 1 will not be portable to all Standard C implementations.

A program accesses the contents of a standard header by referencing the header in an include directive, such as

    #include <stdio.h>

Headers are often referred to as "include files" because they are almost always implemented as source files with the same names. Other implementations are permitted, and so the Standard is careful not to refer to them as files. Nevertheless, "headers" and "include files" are generally understood to mean the same thing.

Determining What You Already Have

Before starting to fix your standard headers, you should look to see what you already have. Headers are usually easy to locate. For example, on UNIX systems the headers for cc are usually in /usr/include (see the subheading FILES on the manual page(s) for cc(1) in your UNIX manual).
The default setup for Turbo C on MS-DOS places the headers in \turboc\include. Most MS-DOS compilers do something similar. The headers for DECUS C on my PDP-11 are in the same subdirectory as my compiler executables, which is a subdirectory with the logical name C:.

You should not be surprised to find that you already have several of the standard headers. The standard library is not pure invention; it's the result of an effort to "codify common existing practice." You will almost certainly find a version of <stdio.h> -- the only standard header used by Kernighan and Ritchie in the first edition of The C Programming Language. <ctype.h> is also extremely common.

Beyond that, it's hard to say just how many headers you're likely to find. For example, the DECUS C compiler has only four of the standard headers: <ctype.h>, <setjmp.h>, <stdio.h>, and <time.h>. The UNIX 4.2 BSD compiler (cc) has these four, plus <assert.h>, <errno.h>, <math.h>, and <signal.h>. It also has <varargs.h>, which is very similar to <stdarg.h>. Turbo C 2.0, Microsoft C 5.1 and Zortech C 1.07 (all for MS-DOS) have every header except <locale.h>, but very few of the headers among all three compilers are exactly as they should be.

Where To Put New Headers

Before you start creating and modifying headers, you should think about where to put them. You can throw caution to the wind and put the new headers in the same directory as your existing ones (assuming you have the access rights), but then you run a serious risk that some of your old code won't work with the new headers. I recommend creating a directory for your new headers and reconfiguring your compiler environment to search this new directory before it searches the old one. Remove the new headers from the search if you have to.
Compiler environments vary so much that I can't explain how to do this for everyone, but I will show you what I've done on a few different systems:

On UNIX 4.2 BSD: I put the new headers in a subdirectory usr/include within my home directory (/u/dsaks). I wrote a shell script called cc that simply contains

    /bin/cc -I/u/dsaks/usr/include $*

This script invokes the UNIX C compiler (in /bin) with the -I option. -I tells the compiler to search for include files in the named directory before searching in the standard places. The $* passes all the arguments of the cc script through to the C compiler. I put this script in /u/dsaks/usr/bin, and added this directory name to my shell path variable. I made the script executable by using

    chmod +x cc

This cc command compiles with the new headers. If I need to omit them, I simply rename the command with

    mv cc cc.new

so the cc command reverts to the one in /bin (without -I).

On MS-DOS 3.0 and higher: I put the original headers for Microsoft C and Quick C in \ms\include, and my new headers in \ms\usr\include. Both compilers support the -I option, so you can create a cc.bat command file like the UNIX shell script. Microsoft, however, gives you an easier alternative. The Microsoft compilers use the INCLUDE environment variable to define the search path for include files. I use two different command files to configure the compiler environment. My msnew.bat uses

    set INCLUDE=c:\ms\usr\include;c:\ms\include

to put the new headers in the search path, while msold.bat uses

    set INCLUDE=c:\ms\include

to take them out.

Other MS-DOS compilers require slightly different approaches. Zortech's command line compiler, ZTC, uses the INCLUDE environment variable just like Microsoft C, but their integrated environment, ZED, gets its search path from a configuration file maintained by a utility called ZCONFIG. Borland's Turbo C lets you specify the search path in a file called TURBOC.CFG. Consult your compiler user's guide for details.
On RT-11 V5.0 and higher: The DECUS C compiler has a built-in preprocessor that's virtually useless. Fortunately, the compiler is distributed with MP, a decent preprocessor from the UNIX User's Group. My compilation command files disable the built-in preprocessor (with the /M compiler switch) and use MP instead. MP has a preset search path for include files. First it looks in the directory with the logical name LB:, then it looks in C:, and finally it looks in SY:. I put the original headers in a directory assigned to C: and the new headers in another directory assigned to LB:. I can remove the new headers from the search by deassigning LB:.

<string.h>

I'll begin with <string.h> because it's often missing and yet is easy to create. Once you have it, you'll use it frequently.

<string.h> (see Table 2) declares the string handling functions in the library. It also declares one macro, NULL, and one type, size_t, that are needed to use these functions.

There is no universal way to define NULL -- you tailor the definition to your machine's architecture. The easiest way to obtain a definition for NULL is to steal one from <stdio.h>. If you can't find a definition there or in some other header, then you should probably use

    #define NULL ((void *)0)

if your compiler supports the void * type, or

    #define NULL ((char *)0)

if it doesn't. If you know that your pointers have the same size as type int, you can use simply

    #define NULL 0

If the pointers on your machine have the same size as type long int, you can use

    #define NULL 0L

I prefer to use the casts to determine the size of NULL. However, I suspect you'll find that one of the latter two forms is already used in your existing headers. Whichever form you choose, use it consistently.

Most MS-DOS C compilers provide pointers in two different sizes, near and far.
The headers in these compilers use conditional compilation to select the appropriate definition for NULL, something like

    #ifdef _NEAR_POINTERS
    #define NULL 0
    #else
    #define NULL 0L
    #endif

If your <string.h> needs a definition like this, you should find it in one of your existing headers. (For more insight into the possible definitions for NULL, see "Doctor C's Pointers: The 'NULL' Macro and Null Pointers" by Rex Jaeschke in The C Users Journal, Sept/Oct, 1988.)

NULL is defined in several standard headers. The headers may be included in any order, and a given header may be included more than once, so you must ensure that the repeated definitions for NULL don't conflict with each other. Most implementations permit "benign" macro redefinitions (repeated definitions formed by identical sequences of tokens) as specified in the Standard. In this case, make all the definitions the same. If your preprocessor doesn't allow any redefinitions, you will have to put a "protective wrapper" around each one, as in

    #ifndef NULL
    #define NULL ((void *)0)
    #endif

size_t is the type of the result of the sizeof operator. The Standard says that it should be an unsigned integral type, so use either

    typedef unsigned size_t;

or

    typedef unsigned long size_t;

You can select the appropriate definition using the program in Listing 1. In many C implementations, sizeof yields a signed int value. You should still define size_t as unsigned, so that operations on objects of that type have the proper unsigned behavior. You can always use size_t to cast the possibly negative result of sizeof to its 'true' unsigned value, as in

    if ((size_t)sizeof(something_big) > 0)

For more about size_t and sizeof, see "Doctor C's Pointers: Exploring the Subtle Side of the 'sizeof' Operator" by Rex Jaeschke in The C Users Journal, Feb., 1988 or see Rex's book, listed in References.

As with NULL, size_t appears in several standard headers.
The Standard and many implementations do not allow typedef redefinitions (even "benign" ones) in the same scope, so you may need a protective wrapper around each definition. For example #ifndef _SIZE_T_DEFINED typedef unsigned size_t; #define _SIZE_T_DEFINED #endif You don't have to use the name _SIZE_T_DEFINED. Any identifier beginning with an underscore followed by an upper-case letter or another underscore will do. The Standard reserves these names for the implementation of the compiler (of which the headers are part). Since benign macro redefinitions are usually allowed, you may be tempted to define size_t as #define size_t unsigned in order to eliminate the protective wrapper. I have seen this done in some "ANSI-conforming" compilers. Although you will probably never notice the difference, the macro definition is wrong because it changes the scope of size_t. Use the typedef. And now for the functions. Most older C compilers don't support prototypes, so you might have to delete or "comment out" the parameter lists. Some functions return void *. If your compiler won't accept that type, use char *. You will find that your library contains some, but not all, of the string functions. Sometimes you will find a standard C function under an archaic name. Many recent books on C have an appendix that details the functions in the standard library. (See references at the end of the article.) You should compare the functions in the standard library with the functions in your compiler's library to find as many matches as you can. For example, some implementations use index instead of strchr. In this case, you could declare strchr as char *index(); #define strchr(s, c) index(s, c) but there is a hazard. If you forget that strchr is really index, and write another function called index, you will inadvertently redefine strchr. (This is an excellent way to test your debugging skills.) 
This macro definition should only be used as an interim fix until you add a compiled version of the missing function to the run-time library.

What about functions that are completely missing? Should you still put their declarations in <string.h>? The answer is a definite maybe. Suppose that memchr is missing from your library. memchr returns a void *, but if you leave the declaration out of <string.h>, the compiler will assume it returns an int. When you compile

    char *p, s[10];
    p = memchr(s, 'x', 10);

you may get a spurious warning about an illegal pointer assignment, but compilation will continue. You won't know what's really happening until the linker reports that memchr is undefined. Under these circumstances, you should declare memchr in the header to eliminate the unnecessary warnings.

If you use a Lint-like program checker that can detect undeclared functions (or if your compiler has such an option), then don't declare functions that are missing from the library. When you reference a missing function, you will still get a meaningful error message, but won't have to wait for the linker to tell you what you already know.

Listing 2 shows the <string.h> that I use on UNIX 4.2 BSD. It includes some interim macro definitions for missing functions. The #ifndef ... #endif wrapper around the entire header prevents repeated compilation of the declarations if the header is included more than once. The wrapper isn't needed for protection since you can redeclare functions (provided all declarations in the same scope are the same), and everything else in the header is either benign or protected. I added the wrapper to simplify debugging. While debugging macros, I sometimes look at the preprocessor output to verify the expansions. Eliminating redundant headers from preprocessor output makes it easier to read. The comment at the header's beginning is not in the wrapper so it still appears wherever the header is included, even if the rest of the header does not.
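To make the idea of adding "a compiled version of the missing function" concrete, here is a minimal sketch of what such a function might look like for memchr. It assumes a compiler that already accepts prototypes, void *, and <stddef.h>; the name my_memchr is hypothetical and is used only so the example cannot collide with a library that already supplies memchr.

```c
#include <stddef.h>     /* size_t, NULL */

/*
 * my_memchr - hypothetical stand-in for a missing memchr.
 * Scans the first n bytes of s and returns a pointer to the first
 * byte equal to (unsigned char)c, or NULL if no such byte exists.
 */
void *my_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;

    while (n-- > 0) {
        if (*p == (unsigned char)c)
            return (void *)p;   /* cast away const, as memchr does */
        ++p;
    }
    return NULL;
}
```

On a compiler without void *, the same body should work with char * substituted throughout, matching the declaration style used for the missing functions in Listing 2.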
One final word of caution. In Listing 2, strlen is declared to return a size_t, even though strlen is actually defined in the library to return an int. On machines where a signed int to unsigned int conversion performs no transformation of the data (as on twos-complement machines), strlen returning a size_t is perfectly safe. On other machines, you should leave the declaration as

    int strlen();

so that the compiler can recognize that

    size_t n;
    n = strlen(s);

involves a signed to unsigned conversion and generate the proper code. You should also cast the result of strlen to size_t whenever strlen is used in an expression with other ints, such as

    if ((size_t)strlen(s) > 0)

This is the same technique used with sizeof when it returns an int.

Conclusion

In this article I've tried to show why it's impossible to just publish a single portable version of the standard headers. The headers provide a portable definition of the Standard C environment, but they do it in a non-portable way. Rather than writing the missing string functions in the library, I suggest you write the remaining standard headers. Doing so solves more portability problems and gives you the definitions you need to compile new library functions as you write them. In <string.h>, you've already seen many of the design problems, so most of the remaining work is simply determining what goes into the other headers.

References

Darnell, Peter and Margolis, Philip, Software Engineering in C (1988, Springer-Verlag).
Gardner, James, From C to C: An Introduction to ANSI Standard C (1989, Harcourt Brace Jovanovich).
Jaeschke, Rex, Portability and the C Language (1989, Hayden Books).
Plauger, P.J. and Brodie, Jim, Standard C (1989, Microsoft Press).
Ritchie, Dennis and Kernighan, Brian, The C Programming Language, 2nd ed. (1988, Prentice-Hall).
Table 1 Standard Headers

    assert.h - program diagnostics
    ctype.h  - character testing and case mapping
    errno.h  - error reporting
    float.h  - floating type characteristics
    limits.h - integral type sizes
    locale.h - local customs
    math.h   - mathematics
    setjmp.h - non-local jumps
    signal.h - signal handling
    stdarg.h - variable-length argument lists
    stddef.h - common definitions
    stdio.h  - input and output
    stdlib.h - general utilities
    string.h - string handling
    time.h   - date and time utilities

Table 2 Summary of <string.h>

Macros: NULL

Types: size_t

Function Prototypes:

    void *memchr(const void *, int, size_t);
    int memcmp(const void *, const void *, size_t);
    void *memcpy(void *, const void *, size_t);
    void *memmove(void *, const void *, size_t);
    void *memset(void *, int, size_t);
    char *strcat(char *, const char *);
    char *strchr(const char *, int);
    int strcoll(const char *, const char *);
    int strcmp(const char *, const char *);
    char *strcpy(char *, const char *);
    size_t strcspn(const char *, const char *);
    char *strerror(int);
    size_t strlen(const char *);
    char *strncat(char *, const char *, size_t);
    int strncmp(const char *, const char *, size_t);
    char *strncpy(char *, const char *, size_t);
    char *strpbrk(const char *, const char *);
    char *strrchr(const char *, int);
    size_t strspn(const char *, const char *);
    char *strstr(const char *, const char *);
    char *strtok(char *, const char *);

Listing 1

    /*
     * write the definition for size_t
     */
    #include <stdio.h>

    main()
    {
        /* the leading blank in " long" keeps it apart from "unsigned" */
        printf("typedef unsigned%s size_t;\n",
            sizeof(sizeof(int)) == sizeof(int) ? "" : " long");
    }

Listing 2

    /*
     * string.h - string handling (for cc on UNIX 4.2 BSD)
     */
    #ifndef _STRING_H_INCLUDED

    #define NULL ((char *)0)

    #ifndef _SIZE_T_DEFINED
    typedef unsigned size_t;
    #define _SIZE_T_DEFINED
    #endif

    char *strcat();
    int strcmp();
    char *strcpy();
    size_t strlen();
    char *strncat();
    int strncmp();
    char *strncpy();

    /*
     * interim macro definitions for functions
     */
    char *index();
    #define strchr(s, c) index(s, c)

    extern int sys_nerr;
    extern char *sys_errlist[];
    #define strerror(e) \
        ((e) < sys_nerr ? sys_errlist[e] : "?no message?")

    char *rindex();
    #define strrchr(s, c) rindex(s, c)

    /*
     * missing functions
     */
    char *memchr();
    int memcmp();
    char *memcpy();
    char *memmove();
    char *memset();
    int strcoll();
    size_t strcspn();
    char *strpbrk();
    size_t strspn();
    char *strstr();
    char *strtok();
    size_t strxfrm();

    #define _STRING_H_INCLUDED
    #endif

UNIX 'termcap' Facility Improves Portability By Hiding Terminal Dependencies

Ronald Florence

Ronald Florence is a novelist, sheep farmer, occasional computer consultant, and UNIX addict. He can be reached at ron@mlfarm or ... {hsi,rayssd}!mlfarm!ron.

For programmers accustomed to writing for single-user systems, UNIX (and Xenix) holds some quick surprises. All those carefully optimized, hand-coded screens, the lightning-fast displays that write to the screen buffer, even "well-behaved" routines that rely on BIOS calls, are suddenly useless. Terminal displays, including the console, are treated as teletype devices under UNIX. To perform even the simplest screen display function, such as clearing the screen, the program must send the proper screen control sequence. In effect, all screen displays are comparable to using the ANSI.SYS driver under MS-DOS.

If the UNIX system had only a single terminal or if only one type of terminal were used on the system, it would be easy enough to hand-code the proper screen control sequences.
Indeed, even if several different terminals are used on a system, the screen control sequences can be hand coded. For example, the function in Listing 1 could be used to clear screens. For a closed system where most of the output is teletype format, with only simple screen display commands, your programs may not need much more. But what if the system is not closed? What if there are outside logins using a variety of terminals? And what if you want to write screen displays that utilize a wide range of terminal capabilities, including automargins and optimized cursor motion, and make sure those displays are scaled to the size of the terminal display? And what if some of the terminals using the system require padding at certain speeds or have other quirks that make them unsuitable or tricky to use with fancy screen display programs? It is possible to keep adding options to code like Listing 1, but by the tenth terminal type, the code starts to look like linguini. The alternative is to use the termcap and terminfo databases of screen display parameters and control sequences which are provided with most UNIX systems. Termcap, which was developed at Berkeley, uses an ASCII database; the terminfo database is compiled. A curses library of screen display and terminal input functions is supplied with both systems. Terminfo is theoretically faster; it supports many terminal capabilities which are normally not encoded into the termcap database, and the curses library supplied with terminfo has many capabilities which are not supported under termcap curses. The termcap database is substantially easier to modify, and there are ways to incorporate many of the capabilities of the terminfo curses into programs running on termcap systems. This article will discuss only termcap, which is used by Xenix and by most BSD systems. 
The UNIX documentation describes the termcap routines as "low level" and the curses routines as "higher level," in much the way that troff/nroff is a low level formatting package, and the formatting macro packages (MM or MS) are high level. Actually, the analogy is not really appropriate. Curses is a screen optimization package with some convenient windowing functions. Termcap is a straightforward package of functions to access the database of screen and keyboard control sequences.

The termcap database is normally in the file /etc/termcap. Comments in the file are prefaced with a # character. All lines which do not begin with the # are considered part of the database. Each entry in the database represents a different terminal. The entry begins with alternate names of the terminal, separated by | characters. Usually the first name listed for the terminal is a special two-character abbreviation, used by some older programs. The second name is used by most utilities, such as the editor vi. The last name listed is the full name of the terminal, and is the only name which can have blanks inserted for readability. Thus:

    d1|vt100|vt-100|pt100|pt-100|dec vt100:

are the names of a DEC vt-100. If you add terminal descriptions to the termcap database, make sure that every name in your addition is unique.

The capabilities of the terminal are listed after the name, separated from one another by colons. Newlines in the entry must be escaped with a backslash. The capabilities are strings, booleans, or integers. Most are mnemonic. Boolean capabilities are true if named. Strings follow an equals sign (=). Integers follow a #. There are no spaces or tabs within capabilities or between them, and an entry carried to a second line must repeat the :.
Thus:

    MT|myterm|My Special Terminal:\
        :bs:am:cl=\E[J:ho=\E[H:li#24:

indicates that myterm can backspace (bs), has automatic margins (am), that there are 24 lines displayed on the screen (li), and gives the sequences that should be sent to clear the screen (cl) and home the cursor (ho).

Several special sequences are used to encode the strings: \E is the escape character (0x1b); ^X is "Control-X" or any other control key; \n, \r, \t, \b, and \f are newline, carriage return, tab, backspace, and form feed; \^ is ^, and \\ is \. All non-printing characters may be represented as octal escapes; the :, which is used to separate capabilities in each entry, must be entered as \072 if used in a string. Null characters can be entered as \200 because the routines that process termcap entries strip the high bits of the output late, so that \200 comes out \000.

Padding can be encoded into the strings by prefacing the string with an integer, representing milliseconds of delay. An integer and a * indicate that the delay is proportional to the number of lines involved in the execution of the command. When the * is used, the delay can be stated in tenths of a millisecond, so that 3.5* before the string for ce (clear to end of line) would mean that the command requires 3.5 milliseconds of padding for each line that is to be cleared.

Terminals which are identical to another entry with few exceptions can make use of the tc string and the @ negator.

    NT|newterm|My alternate terminal:li#25:bs@:tc=vt100:

describes a terminal with 25 lines, no backspace capability, but otherwise identical to a vt100. One caution in using entries with tc encoding: programs with a fixed stack (such as Xenix 286) may crash when reading tc encoded entries. The cure is to make the stack larger with the -F option on the compile command line.

The cursor addressing string (cm) is coded with printf-like escapes. These are described in detail in the termcap(M) entry in the UNIX documentation.
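The entry syntax described above (fields separated by colons, booleans named bare, integers after a #, strings after an =) is simple enough to take apart by hand. The following is only an illustrative sketch, not the termcap library's own parser: classify_cap() and list_caps() are hypothetical names, and the code assumes the entry has already been joined into a single line with its backslash-newline continuations removed.

```c
#include <stdio.h>
#include <string.h>

enum cap_kind { CAP_BOOL, CAP_NUM, CAP_STR };

/*
 * Classify one capability field such as "am", "li#24" or "cl=\E[J".
 * The first '#' or '=' found decides the kind; *value is set to the
 * text following it (or to the end of the field for a boolean).
 */
enum cap_kind classify_cap(const char *field, size_t len, const char **value)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (field[i] == '#') { *value = field + i + 1; return CAP_NUM; }
        if (field[i] == '=') { *value = field + i + 1; return CAP_STR; }
    }
    *value = field + len;
    return CAP_BOOL;
}

/*
 * Walk a one-line entry, skip the name list before the first colon,
 * print each capability with its kind, and return how many were seen.
 * Empty fields (as in "::") are ignored.
 */
int list_caps(const char *entry)
{
    static const char *const kind_name[] = { "boolean", "number", "string" };
    const char *p = strchr(entry, ':');     /* end of the name list */
    int count = 0;

    while (p != NULL) {
        const char *start = p + 1;
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len > 0) {
            const char *val;
            enum cap_kind k = classify_cap(start, len, &val);

            printf("%-7s %.*s\n", kind_name[k], (int)len, start);
            count++;
        }
        p = end;
    }
    return count;
}
```

Splitting on every colon is safe here only because, as noted above, a colon inside a string must be written as \072. A real reader would also have to decode the \E, ^X, and octal escapes and follow tc references, none of which this sketch attempts.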
In addition to the regular termcap capabilities, which begin with lower case letters, some UNIX systems utilize extensions. Xenix uses a variety of upper case termcap entries to indicate special PC keys: PU for Page Up, EN for End, GS for Start-Character-Graphics-Mode, and pseudo-mnemonics for eight-bit PC graphics drawing characters. GNU Emacs uses upper- case capabilities to describe terminal command sequences which are not generally used in termcap, such as AL and DL for adding and deleting multiple lines. Programs which use these extended termcap capabilities may not be portable to other UNIX systems. The termcap library provides functions to retrieve the encoded information from the database. The termcap routines first search the environment for a TERMCAP variable. If it is found, does not begin with a slash, and the terminal type matches the environment string TERM, the TERMCAP string is read. If it begins with a slash, it is read as the pathname of the termcap database (instead of the default /etc/termcap). Using the environment variable instead of searching the database will speed up the development of new termcap entries. If your system has a tset command which supports separate TERM and TERMCAP environment entries, it will also speed the startup of programs which use termcap. One obvious use for the termcap database is in displaying formatted text to the screen. Although there are wordprocessing programs available to run under UNIX and/or Xenix, much text processing in UNIX systems is done by using an editor (vi or emacs) to prepare the text with nroff/troff formatting codes, usually with one of the macro packages such as MM. The formatted file is then piped to a printer or type-setter, or to a screen display for proofing. 
Although it is possible to prepare nroff terminal driving tables to encode the screen control sequences needed for such formatting features as bold type, italics or underlining, a different table would have to be encoded and compiled for each terminal, and the user would have to indicate the terminal type on the nroff command line: nroff -cm -Tmyterm myfile Also, the nroff terminal driving table format was created when daisy-wheel printers were the cutting edge of desktop hardcopy capabilities, and the coding is sometimes awkward to adapt to the capabilities of a terminal display. For simple text formatting, it is easier to parse the default nroff output, which uses backspaces and overstrikes to generate underlined or bold characters, and use termcap to look up the appropriate standout (bold) and underline sequences. The program in Listing 2 (Bold.c), uses termcap library functions to look up the terminal screen control sequences for so and se (standout start and standout end), us and ue (underline start and underline end), and sg, which is an integer coded quantity indicating how many spaces the attribute change to standout mode requires. For terminals with multiple fonts, the switchover to italic font could be encoded in us, so that underlined text would be displayed in italics. A bold screen attribute could be encoded in so and se, so that bold text would be displayed in bold font, instead of in reverse video. Alternately, new termcap entries could be created to hold the screen control sequences for bold or italic fonts. The termcap access functions are simple and straightforward. To parse the database, you need to allocate a buffer of 1024 characters (tbuf in Listing 2), to hold the entire termcap entry as it is retrieved by tgetent(). This buffer must be retained through all calls to the three functions which parse capabilities: tgetstr(), tgetflag(), and tgetnum(). 
Another buffer (sbuf in Listing 2) should be allocated for the strings which will be retrieved by tgetstr(). This should be a static buffer. The tgetstr() function is passed the address of a pointer to this buffer. As string capabilities are read, they are placed in the buffer, and the pointer is advanced. Using a static buffer saves the overhead of allocating space for each string as it is retrieved.

The termcap library also provides a function tputs(), which correctly sends screen control sequences to the display, including any needed padding. tputs() requires a pointer to a user-supplied function which can display a single character. The function prch() (Listing 2) invokes the macro putchar().

Although it is not used here, the termcap library includes one other function, tgoto(), which uses the cm (cursor movement) string to go to a desired column and line. Because tgoto() will output tabs, programs which make tgoto() calls should turn off the TAB3 bit when setting the line protocol.

The function putout() in Listing 2 is not really necessary. It is used here to check for insertions of ^G (0x7) in the text files. ^G was chosen because it passes through nroff transparently. It is used to trigger expanded font in files sent to the printer. In Bold.c, it triggers the insertion of a space between characters to simulate expanded font.

Termcap can also be used to retrieve the sequences sent by non-ASCII keys, like the arrow or function keys. Although the termcap curses library does not use the arrow or function keys, the keys can be added to programs which use curses for screen control by making a second set of termcap calls (curses makes its own calls to termcap), and then reading for the arrow or function key sequences in a getkey() routine (see Listing 3, keys.c).
Reading arrow keys for terminals which use a single character code for each arrow (such as ^H, ^J, ^K, ^L) is simple, but many terminals, such as the PC console, send escape-prefaced strings (ESC[A, ESC[B, etc.) when the arrow keys or other non-ASCII keys are pressed. Some systems may balk at reading strings with a simple read() system call. It is worth fiddling with the VMIN and VTIME values in structure termio if you cannot read key sequences with the code in getkey(). The values in function fixquit() in Listing 3 are a good start. The alternative is to put the strings together out of characters read one at a time. This may be the most reliable technique for an editor or other program that reads repeated sequences of fast input characters that might be misinterpreted, such as an ESC followed by a [ and an alphabetic character, which an ANSI terminal might interpret as a screen control sequence. The trick if you are reading a character at a time is to distinguish between a lone ESC (0x1b) and an ESC sent as the first character of an escape sequence. One technique is to set a timeout alarm. If you get the characters that would constitute a key string before the timeout, return the key string, otherwise return an ESC followed by individual characters. The whole procedure takes tinkering, and fast typists can foul it up. Hence, using a read() call is simpler. One problem that can arise with the arrow key is that ^\, the UNIX "quit" character, is used as an arrow key on some terminals. Even if the "quit" signal is disabled, the keys will still be intercepted. The easiest fix to the problem is to change the "quit" key to an impossible value. The function fixquit() does this. The global variable ttytype is set by the curses termcap routines, which in this program are called before lookupkeys(). The ttytype could be set by a call to getenv(), as in the code for Listing 1. 
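The character-at-a-time alternative mentioned above hinges on one question after each byte arrives: do the bytes collected so far form a complete key string, the beginning of one, or neither? A minimal sketch of that test follows. The names match_keys, KM_NONE, KM_PREFIX, and KM_FULL are hypothetical and do not appear in Listing 3; the table of key strings is assumed to have been filled in beforehand, for example by tgetstr(), and may contain null pointers for keys the terminal lacks.

```c
#include <string.h>

enum key_match { KM_NONE, KM_PREFIX, KM_FULL };

/*
 * Compare the n bytes collected so far against a table of key strings.
 *   KM_FULL   - buf is exactly keys[*which]
 *   KM_PREFIX - buf is the start of at least one longer key string
 *   KM_NONE   - buf cannot become any key string
 * Null entries in the table are skipped.
 */
enum key_match match_keys(const char *buf, size_t n,
                          const char *const keys[], size_t nkeys,
                          size_t *which)
{
    enum key_match result = KM_NONE;
    size_t i;

    for (i = 0; i < nkeys; i++) {
        size_t klen;

        if (keys[i] == NULL)
            continue;
        klen = strlen(keys[i]);
        if (klen == n && memcmp(buf, keys[i], n) == 0) {
            *which = i;
            return KM_FULL;
        }
        if (klen > n && memcmp(buf, keys[i], n) == 0)
            result = KM_PREFIX;
    }
    return result;
}
```

A reader built on this would keep collecting bytes while the answer is KM_PREFIX and the timeout has not expired. A lone ESC then shows up as a KM_PREFIX that times out, at which point the collected bytes are handed back as ordinary characters.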
The header file in Listing 4 (keys.h) defines integer equivalents for the arrow and function keys; these defines can be used in switch statements. (The values given are those used in the terminfo header files.)

What termcap cannot do is to optimize screen output by cutting down the overhead of repeated cursor movement sequences. The output routines in the curses library do a fair job and are simple to use. The code for life.c in Listing 5 uses these routines along with the arrow key routines from keys.c, and while the speed of output cannot compare with an optimized routine writing directly to screen memory, it is quick enough on a console or a terminal running at 19,200 baud.

Highly optimized screen output which requires even more efficiency could mean a journey into the treacherous code of screen display routines which calculate the cost of each move. One such package is the display routines in the Gosling Emacs code, which quite properly carries a dire warning to those who would venture into the tangles of the code.

Listing 1

    #include <stdio.h>

    cls ()
    {
        char *getenv(), *term, *cl;

        term = getenv ("TERM");
        if (term == NULL)       /* TERM not set: use the default below */
            term = "";
        if (!strcmp(term, "ansi"))
            cl = "\033[2J\033[H";
        else if (!strcmp(term, "wy50"))
            cl = "\033*";
        /* add other terminals ... */
        /* if all else fails, try a form feed */
        else
            cl = "\f";
        fputs (cl, stderr);
    }

Listing 2

    /*
     * Bold.c - filters nroff output for terminal display
     *          displays bold in SO, underlines, expanded font
     * copyright 1987 Ronald Florence
     */

    #include <stdio.h>

    #define UL      01
    #define BOLD    02
    #define ULSTOP  04

    #define Bold()      tputs(so, 1, prch), att |= BOLD
    #define Stopbold()  tputs(se, 1, prch), att &= ~BOLD
    #define Uline()     tputs(us, 1, prch), att |= UL
    #define Stopuline() tputs(ue, 1, prch), att &= ~(UL|ULSTOP)

    prch(c)
    register char c;
    {
        putchar(c);
    }

    char *so, *se, *us, *ue;

    main()
    {
        static char sbuf[256];
        char tbuf[1024], *fill = sbuf, *tgetstr(), *getenv();
        register a, c;
        int i, att = 0;

        if (tgetent(tbuf, getenv("TERM")) == 1 && tgetnum("sg") < 1)
        {
            so = tgetstr("so", &fill);
            se = tgetstr("se", &fill);
            us = tgetstr("us", &fill);
            ue = tgetstr("ue", &fill);
        }
        a = getchar();
        while ((c = getchar()) != EOF)
        {
            if (a == '_')
            {
                if (c == '_' && (att & UL) == 0)
                    Uline();
                else if (c == '\b')     /* nroff italics */
                {
                    if ((a = getchar()) == EOF)
                        a = 0;
                    c = getchar();
                    if ((att & UL) == 0)
                        Uline();
                }
                if (c != '_' && (att & UL))     /* c is the last underline */
                    att |= ULSTOP;
            }
            else if (c == '\b')
            {
                if ((c = getchar()) != a)
                {
                    /* Not a bold: print the character and pass the \b to be printed.
                     */
                    putout(a);
                    a = '\b';
                }
                else
                {
                    if ((att & BOLD) == 0)
                        Bold();
                    for (i = 0; i < 5; i++)
                        if ((c = getchar()) != a && c != '\b')
                            break;
                }
            }
            else if (att & BOLD)
                Stopbold();
            putout(a);
            if (att & ULSTOP)
                Stopuline();
            a = c;
        }
    }

    putout(c)
    register char c;
    {
        static int expanded;

        if (c == 07)    /* ^G signals expanded font */
        {
            expanded++;
            return(0);
        }
        putchar(c);
        if (expanded)
        {
            if (c == '\n')
                expanded = 0;
            else
                putchar(' ');
        }
    }

Listing 3

    /*
     * keys.c - gets arrow and function keys from termcap,
     *          returns terminfo codes
     *          changes quit key for use as arrow
     *
     * define NO_SYSV for versions of curses that do not look up
     * arrow & function keys from termcap
     *
     * copyright 1988 Ronald Florence
     * changed VMIN & VTIME for wy99 @ 9600 ron@mlfarm (7/11/88)
     */

    #include <curses.h>
    #ifndef KEY_DOWN
    #include "keys.h"
    #endif

    #define NKEYS 16

    char
    #ifdef NO_SYSV
        *tcap_ids[] = {
            "kd", "ku", "kl", "kr", "kh", "kb", "k0", "k1",
            "k2", "k3", "k4", "k5", "k6", "k7", "k8", "k9", 0
        },
    #endif
        *fkeys[NKEYS];

    lookupkeys()
    {
    #ifdef NO_SYSV
        static char sbuf[256];
        char **key, tbuf[1024], *fill = sbuf, *tgetstr();
        int i = 0;

        tgetent(tbuf, ttytype);
        for (key = tcap_ids; *key; ++key)
            fkeys[i++] = tgetstr(*key, &fill);
    #else
        fkeys[0] = KD;   fkeys[1] = KU;   fkeys[2] = KL;   fkeys[3] = KR;
        fkeys[4] = KH;   fkeys[5] = KB;   fkeys[6] = K0;   fkeys[7] = K1;
        fkeys[8] = K2;   fkeys[9] = K3;   fkeys[10] = K4;  fkeys[11] = K5;
        fkeys[12] = K6;  fkeys[13] = K7;  fkeys[14] = K8;  fkeys[15] = K9;
    #endif
        fixquit();
    }

    getkey()
    {
        char cmd[7];    /* six characters plus the terminating null */
        register k;

        k = read(0, cmd, 6);
        if (k < 0)      /* read error: treat as no input */
            k = 0;
        cmd[k] = '\0';
        for (k = 0; k < NKEYS; k++)
            if (fkeys[k] && strcmp(cmd, fkeys[k]) == 0)  /* skip missing keys */
                return (k + KEY_DOWN);
        return ((int) *cmd);
    }

    fixquit()
    {
        struct termio new;

        ioctl(0, TCGETA, &new);
        new.c_cc[VQUIT] = 0xff;     /* in case QUIT is an arrow */
        new.c_cc[VTIME] = 1;        /* minimum timeout */
        new.c_cc[VMIN] = 3;         /* three characters satisfy */
        ioctl(0, TCSETA, &new);
    }

Listing 4

    /*
     * keys.h
     * copyright 1988 Ronald Florence
     *
     * use with curses programs that need extended keyboard
     * (if tcap.h does not include the defines)
     */

    #define KEY_DOWN        0402
    #define KEY_UP          0403
    #define KEY_LEFT        0404
    #define KEY_RIGHT       0405
    #define KEY_HOME        0406
    #define KEY_BACKSPACE   0407
    #define KEY_F0          0410
    #define KEY_F(n)        (KEY_F0 + (n))

Listing 5

    /*
        life.c
        copyright 1985, 1988 Ronald Florence

        compile: cc -O -s life.c keys.c -lcurses -ltermcap -o life
    */

    #include <curses.h>
    #include <signal.h>
    #ifndef KEY_DOWN
    #include "keys.h"
    #endif

    #define ESC     0x1b
    #define life    '@'
    #define crowd   (life + 4)
    #define lonely  (life + 2)
    #define birth   (' ' + 3)
    /* step a down or up by one, wrapping within 0..d */
    #define minwrap(a,d)    a = (a) - 1 < 0 ? (d) : (a) - 1
    #define maxwrap(a,d)    a = (a) + 1 > (d) ? 0 : (a) + 1
    #define wrap(a,z)   if (a < 0) (a) += z; \
                        else if (a > z) (a) = 1; \
                        else if (a == z) (a) = 0
    #define MAXX    (COLS-1)
    #define MAXY    (LINES-3)
    #define boredom 5

    typedef struct node {
        int y, x;
        struct node *prev, *next;
    } LIFE;

    struct {
        int y, x;
    } pos[8] = {
        { 1,-1}, { 1, 0}, { 1, 1}, { 0, 1},
        {-1, 1}, {-1, 0}, {-1,-1}, { 0,-1}
    };

    LIFE *head, *tail;

    extern char *malloc();

    char *rules[] = {
        " ",
        "The Rules of Life:",
        " ",
        " 1. A cell with more than three neighbors",
        " dies of overcrowding.",
        " 2. A cell with less than two neighbors",
        " dies of loneliness.",
        " 3. A cell is born in an empty space",
        " with exactly three neighbors.",
        " ",
        0
    },
    *rules2[] = {
        "Use the arrow keys or the vi cursor keys",
        "(H = left, J = down, K = up, L = right)",
        "to move the cursor around the screen.",
        "The spacebar creates and destroys life.",
        "<Esc> starts the cycle of reproduction.",
        "<Del> ends life.",
        " ",
        "Press any key to play The Game of Life.",
        0
    };

    main(ac, av)
    int ac;
    char **av;
    {
        int i = 0, k, die();

        initscr();
        crmode();
        noecho();
        signal(SIGINT, die);
        lookupkeys();
        head = (LIFE *)malloc(sizeof(LIFE));  /* lest we have an unanchored pointer */
        tail = (LIFE *)malloc(sizeof(LIFE));
        head->next = tail;
        tail->prev = head;
        if (ac > 1)
            readfn(*++av);
        else
        {
            erase();
            if (COLS > 40)
                for ( ; rules[i]; i++)
                    mvaddstr(i+1, 0, rules[i]);
            for (k = 0; rules2[k]; k++)
                mvaddstr(i+k+1, 0, rules2[k]);
            refresh();
            while (!getch())
                ;
            setup();
        }
        nonl();
        while (TRUE)
        {
            display();
            mark_life();
            update();
        }
    }

    die()
    {
        signal(SIGINT, SIG_IGN);
        move(LINES-1, 0);
        refresh();
        endwin();
        exit(0);
    }

    kill_life(ly, lx)
    register int ly, lx;
    {
        register LIFE *lp;

        for (lp = head->next; lp != tail; lp = lp->next)
            if (lp->y == ly && lp->x == lx)
            {
                lp->prev->next = lp->next;
                lp->next->prev = lp->prev;
                free(lp);
                break;
            }
    }

    display()
    {
        int pop = 0;
        static int gen, oldpop, boring;
        char c;
        register LIFE *lp;

        erase();
        for (lp = head->next; lp != tail; lp = lp->next)
        {
            mvaddch(lp->y, lp->x, life);
            pop++;
        }
        if (pop == oldpop)
            boring++;
        else
        {
            oldpop = pop;
            boring = 0;
        }
        move(MAXY+1, 0);
        if (!pop)
        {
            printw("Life ends after %d generations.", gen);
            die();
        }
        printw("generation - %-4d", ++gen);
        printw(" population - %-4d", pop);
        refresh();
        if (boring == boredom)
        {
            mvprintw(MAXY, 0, "Population stable. Abort? ");
            refresh();
            while (!(c = getch()))
                ;
            if (toupper(c) == 'Y')
                die();
        }
    }

    mark_life()
    {
        register k, ty, tx;
        register LIFE *lp;

        /* stop at the tail sentinel; its next pointer is never set */
        for (lp = head->next; lp != tail; lp = lp->next)
            for (k = 0; k < 8; k++)
            {
                ty = lp->y + pos[k].y;
                wrap(ty, MAXY);
                tx = lp->x + pos[k].x;
                wrap(tx, MAXX);
                stdscr->_y[ty][tx]++;
            }
    }

    update()
    {
        register int i, j, c;

        for (i = 0; i <= MAXY; i++)
            for (j = 0; j <= MAXX; j++)
            {
                c = stdscr->_y[i][j];
                if (c >= crowd || (c >= life && c < lonely))
                    kill_life(i, j);
                else if (c == birth)
                    newlife(i, j);
            }
    }

    setup()
    {
        int x, y, c, start = 0;

        erase();
        y = MAXY/2;
        x = MAXX/2;
        while (!start)
        {
            move(y, x);
            refresh();
            switch (c = getkey())
            {
            case 'h' : case 'H' : case ('H' - '@'):
            case KEY_LEFT: case KEY_BACKSPACE:
                minwrap(x, MAXX);
                break;
            case 'j' : case 'J' : case ('J' - '@'):
            case KEY_DOWN:
                maxwrap(y, MAXY);
                break;
            case 'k' : case 'K' : case ('K' - '@'):
            case KEY_UP:
                minwrap(y, MAXY);
                break;
            case 'l' : case 'L' : case ('L' - '@'):
            case KEY_RIGHT:
                maxwrap(x, MAXX);
                break;
            case ' ' :
                if (inch() == life)
                {
                    addch(' ');
                    kill_life(y, x);
                }
                else
                {
                    addch(life);
                    newlife(y, x);
                }
                break;
            case 'q' : case 'Q' : case ESC :
                ++start;
                break;
            }
        }
    }

    newlife(ny, nx)
    int ny, nx;
    {
        LIFE *new;

        new = (LIFE *)malloc(sizeof(LIFE));
        new->y = ny;
        new->x = nx;
        new->next = head->next;
        new->prev = head;
        head->next->prev = new;
        head->next = new;
    }

    readfn(f)
    char *f;
    {
        FILE *fl;
        int y, x;

        if ((fl = fopen(f, "r")) == NULL)
            errx("usage: life [file (line/col pts)]\n", NULL);
        while (fscanf(fl, "%d%d", &y, &x) != EOF)
        {
            if (y < 0 || y > MAXY || x < 0 || x > MAXX)
                errx("life: invalid data point in %s\n", f);
            mvaddch(y, x, life);
            newlife(y, x);
        }
        fclose(fl);
    }

    errx(m,d)
    char *m, *d;
    {
        fprintf(stderr, m, d);
        endwin();
        exit(0);
    }

Fitting Curves To Data

Michael Brannigan

Michael Brannigan is President of Information and Graphic System, IGS, 15 Normandy Court, Atlanta, GA 30324 (404) 231-9582. IGS is involved in consulting and writing software in computer graphics, computational mathematics, and data base design.
He is presently writing a book on computer graphics algorithms. He is also the author of The C Math Library EMCL, part of which are the routines set out here.

Fitting curves to data ranks as one of the most fundamental needs in engineering, science, and business. Curve fitting is known as regression in statistical applications and nearly every statistical package, business graphics package, math library, and even spreadsheet software can produce some kind of curve from given data. Unfortunately the process and underlying computational mathematics are not sufficiently understood even by the software firms producing the programs. It is not difficult, for example, to input data for a linear regression routine to a well known statistical package (which I shall not name) used on micros and mainframes for which the output is incorrect.

Constructing a functional approximation to data (the formal act known as curve fitting) involves three steps: choosing a suitable curve, analyzing the statistical error in the data, and setting up and solving the required equations. Choosing a suitable curve is a mixture of artistic sensibility and a knowledge of the data and where it comes from. Analyzing statistical error can be something of a guessing game and requires some thought. Setting up and solving the equations is computationally the most interesting. It is here that many programs fail because they use computationally unstable methods, but more of that later.

The number of methods for data fitting is legion and we suggest some in this article. However, we give only one method in full and consider only 2-D data. Anyone interested in other specific data fitting techniques may contact the author.

Problem

Given data points (x_i, y_i), i = 1,...,n, we suppose there exists a relation

    y_i = F(x_i) + e_i,    i = 1,...,n

where F(x) is an unknown underlying function and e_i represents the unknown errors in the measurements y_i. The problem is to find a good approximation f(x) to F(x).
We thus need a function f to work with and some idea, however minimal, of the errors.

How To Choose f

f will have variable parameters whose correct values (the values that solve the approximation problem) are found by solving a system of equations, each data point defining one equation. We call the function f linear or non-linear if it is linear or non-linear in its parameters.

Consider some of the general principles involved in choosing a suitable function f. We must have more data points than parameters; otherwise f will fit the data exactly and we will not model the errors. Unless absolutely necessary, don't use a non-linear f; solving systems of non-linear equations uniquely is, except for special cases, nearly impossible. In most cases polynomials are not a good choice; they are wiggly curves and nearly always wiggle in the wrong places. The best option in most cases is to use piecewise polynomials.

The example we give is a piecewise cubic polynomial such that the first derivatives are continuous everywhere. (You can, of course, use cubic splines if you want second derivatives to be continuous, but in most cases the example set out here is superior for a general purpose curve fitting routine. If you want the full cubic spline, please use the B-spline formulation and no other; otherwise you get unstable systems of equations resulting in incorrect solutions. Using the B-spline formulation for spline approximation, you need only change the routine coeff_cubic() in the program given in this article. The system of equations is solved by the same routines.)

Once f has been chosen and applied to each data point, we obtain a system of linear equations to solve, where the number of equations will be greater than the number of unknowns. Such a system is called an overdetermined system. In general it has no exact solution, which is exactly what we want: an exact fit would reproduce the errors along with the data.
However, overdetermined systems have an infinite number of inexact (approximate) solutions; we will seek an approximation that minimizes some particular error measure. (Mathematicians call these error measures "norms". Thus the problem of curve fitting becomes an optimization problem.) Of the infinitely many possible norms, three should be considered for any curve fitting package: the L1-norm, the L2-norm (least squares norm), and the minimax (Chebyshev) norm. (These norms are defined later in this article.)

Fortunately good algorithms exist for solving overdetermined systems of linear equations in all three norms. For the L1-norm and the minimax norm, you use a variation of the simplex method of linear programming; for the L2-norm you use a QR decomposition of the matrix in preference to the computationally unstable method of solving the normal equations. (We cannot give all the program code here as space is limited, but for more guidance the reader can contact the author.)

Of many possible combinations the following solution is a good general-purpose option.

Solution

We have data points $(x_i, y_i)$, $i = 1,\dots,n$. Let each $x_i$ belong to some interval $[a,b]$. Specify k points $X_j$, $j = 1,\dots,k$, on the x-axis; we call these points knots. The knots are such that

$$a = X_1 < X_2 < \dots < X_k = b$$

We can now define our function as follows: for each x in the interval $[X_j, X_{j+1}]$ define the cubic polynomial

$$y = \frac{\left(d^3 - 3d\,p(x)^2 + 2p(x)^3\right)Y_j + d\,p(x)\,q(x)^2\,Y_j' + \left(3d - 2p(x)\right)p(x)^2\,Y_{j+1} - d\,p(x)^2\,q(x)\,Y_{j+1}'}{d^3}$$

where

$$d = X_{j+1} - X_j, \qquad p(x) = x - X_j, \qquad q(x) = X_{j+1} - x$$

Thus y is a cubic polynomial with the linear parameters $Y_j$, $Y_{j+1}$, $Y_j'$, $Y_{j+1}'$, which are the values and first derivatives at the knots $X_j$ and $X_{j+1}$ respectively. For each data point we obtain one linear equation, so we can set up n linear equations in the 2k unknowns $Y_1, Y_1', \dots, Y_k, Y_k'$. In matrix form this can be written as

$$AY = b$$

where A is a block diagonal matrix, Y is the vector of unknowns, and b is the vector of y values.
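The four weights in the cubic above are all that is needed to build one row of the matrix A or to evaluate the fitted curve. The following is a minimal sketch in ANSI C, not taken from the article's listings: the names hermite_weights() and hermite_eval() are hypothetical, and the code assumes the caller already knows which knot interval contains x.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the article's listings: compute the four
 * weights that multiply Yj, Yj', Yj+1 and Yj+1' for a point x lying in the
 * knot interval [xj, xj1].
 */
void hermite_weights(double x, double xj, double xj1, double w[4])
{
    double d = xj1 - xj;          /* interval length d   */
    double p = x - xj;            /* p(x) = x - Xj       */
    double q = xj1 - x;           /* q(x) = Xj+1 - x     */
    double d3 = d * d * d;

    assert(d > 0.0);              /* knots must be strictly increasing */
    w[0] = (d3 - 3.0 * d * p * p + 2.0 * p * p * p) / d3;
    w[1] = d * p * q * q / d3;
    w[2] = (3.0 * d - 2.0 * p) * p * p / d3;
    w[3] = -d * p * p * q / d3;
}

/* Evaluate the cubic on [xj, xj1] from end values (yj, yj1) and end slopes (dyj, dyj1). */
double hermite_eval(double x, double xj, double xj1,
                    double yj, double dyj, double yj1, double dyj1)
{
    double w[4];

    hermite_weights(x, xj, xj1, w);
    return w[0] * yj + w[1] * dyj + w[2] * yj1 + w[3] * dyj1;
}
```

At x = xj the weights reduce to (1, 0, 0, 0) and at x = xj1 to (0, 0, 1, 0), so the curve passes through the knot values. Because the piece is a general cubic determined by its end values and end slopes, it reproduces any cubic exactly when given that cubic's values and derivatives at the two knots.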
Because A is block-diagonal, for very large data sets optimal use should be made of the structured sparsity.

With the same knots we could also define cubic B-splines and then fit a cubic spline to the data. We would again arrive at an overdetermined system of linear equations with a matrix of coefficients having block-diagonal structure. In fact the equations we have set out above form a cubic spline with each knot $X_j$ a double knot.

Choosing A Norm

For each possible solution Y we have errors $s_i$, $i = 1,\dots,n$, such that

$$AY - b = s$$

where s is the vector of $s_i$ values. The L1-norm is defined to be

$$\sum_i |s_i|$$

The L2-norm or least squares norm is

$$\Bigl(\sum_i s_i^2\Bigr)^{1/2}$$

And the minimax or Chebyshev norm is

$$\max\{\,|s_i| : i = 1,\dots,n\,\}$$

We solve the overdetermined system of equations by finding that vector Y which minimizes one of these norms. The choice of norm depends on the unknown errors $e_i$, and we hope that the choice of norm will give errors $s_i$ that will mirror these unknown errors.

The general rule is: choose the L1-norm if the $e_i$ are scattered (belong to a long tailed distribution); choose the L2-norm if the $e_i$ are normally distributed; choose the minimax norm if the $e_i$ are very small or belong to a uniform distribution. Research has indicated that data sets have errors nearer to the L1-norm than the L2-norm. (Errors in data are never normally distributed, neither as they are nor in the limit. The assumption of normally distributed errors is common in most packages; the user should question this assumption very carefully.) So when you don't know how the errors are distributed, use the L1-norm.

The minimax norm is rarely used for fitting curves to experimental data. However, always use the minimax norm if you want to fit a function to another function, for example fitting a Fourier series to a complicated function where you know the values exactly.

Whichever norm you choose, the computer solution of the equations is not straightforward.
You must choose an algorithm that is computationally stable. (A computationally unstable algorithm is one that is mathematically correct but, when fed into a computer, produces wrong answers. Examples are solving linear equations without pivoting, or solving quadratic equations from the well-known formula. So get some professional help in choosing the algorithm.)

Program

After you have spent some time analyzing your particular data fitting problem, decided upon a suitable function to approximate the data, and also decided upon the norm to use for the errors in the data, you must program the result. Unless your application requires special functions, the approximating function set out above is a good general purpose function.

The programming for this function or any other has the same form. The system of equations is set up with one equation for each data point, and then the system is solved with the required norm. For the function described here the programming is just as straightforward.

The main routine is Hermite(), named after the mathematician who defined these piecewise polynomials. The routine first gives the user the choice (by setting the variable flag) of either setting the k knots lambda[] on input or using the routine app_knots() to compute the knots. In most cases the user will not use the routine just once, but will compute a first approximation and then alter the position of the knots for a second approximation. For a first approximation set flag to true and use app_knots() to compute the knots automatically. Then look at the result and choose new knots. A more sophisticated method automatically chooses the number of knots k and their position.

Once the knots are defined the routine allocates space for the matrix A of size n x 2k. After making sure all elements of the matrix are zero, the routine calls coeff_cubic() to set up the coefficients of the matrix.

Now the program solves the overdetermined system in the appropriate norm.
The variable norm is set by the user to indicate which norm to use. (We do not give here the three routines that solve the overdetermined system of equations as they require lots of space, but the reader can find the algorithms in most computational mathematics textbooks.) The routine L1_approx() uses the L1-norm, the routine CH_lsq() uses the least squares norm, and the routine Cheby() uses the minimax norm. With the solution from the appropriate routine, the function now fits the data.

Some words on the other routines. First, the routine app_knots() will compute k knots lambda[j] so that in each interval (lambda[j], lambda[j+1]) there are approximately the same number of x values. This is a good starting point for our Hermite approximation and for any spline approximation that needs knots.

The routine coeff_cubic() is merely a direct translation of the formulae. This routine uses interval(), which finds the knot interval to which each x value belongs. coeff_cubic() also uses the macro SUB() to move around the matrix (this is my preferred method for dealing with matrices passed as parameters).

Finally there is the routine app_cubic(). This routine uses the results from Hermite() to compute the piecewise polynomial for any value of x. Thus app_cubic() completes the curve fitting problem.

Example

An example (using data from actual measurements of the horizontal angular displacement of an elbow flexion against time) will show how the pieces fit together. There are 142 measured points and these measurements are quite accurate (the experimenters knew the kind of instruments they were using--see the paper by Pezzack, et al). In this instance a close fit to the data points is required. In all the figures the dark circles are the knots and the crosses are the data points.

The L1-norm solution is in Figure 1. Figure 2 shows the result when the L2-norm is used. Figure 3 shows the result when the minimax norm is used.
As would be expected with such "clean" data, the answers are all quite good, the best being Figure 1.

To illustrate the behavior in the presence of noise, add some significant errors to the same data points. Using the same curve approximation method, Figure 4, Figure 5, and Figure 6 show the results when using the L1-norm, L2-norm, and minimax norm respectively. As theory suggests, the L1-norm gives definitely superior results.

This example is a straightforward application of the method set out here -- well, nearly! You may be asking the six thousand dollar question, "How do I choose the knots?" The answer is not straightforward and contemporary research has different answers. As you can see from the figures, the number and position of the knots changes for each example. The goal is to choose the number of knots and their position so as to give the best fit possible for the norm chosen--easy to say but not easy to compute.

All the knots in each figure have been chosen according to an information theoretic criterion, plus a little experience on the placement of knots. The idea behind this method is to attempt to extract the maximum amount of information from the data points until only error remains. To do this we need a computable value for the amount of information contained in the errors si; we suggest using the Akaike Information Criterion. The routine changes the number of knots and their position until there is no more information in the errors.

For those readers who wish to go further into this problem, see the papers by Brannigan for a full mathematical treatment of this method, the information theoretic criterion, and an extension to multivariate data.

Bibliography

Pezzack, J.C. et al. "An Assessment of Derivatives Determining Techniques Used for Motion Analysis," J. Biomechanics, 10 (1977).

Brannigan, M. "An Adaptive Piecewise Polynomial Curve Fitting Procedure for Data Analysis," Commun. Statist. A10(18), (1981).

Brannigan, M. "Multivariate Data Modelling by Metric Approximants," Comp. Stats. & Data Analysis 2, (1985).

Figure 1, Figure 2, Figure 3: fits to the measured data (L1-norm, L2-norm, minimax norm).
Figure 4, Figure 5, Figure 6: fits to the same data with added errors (L1-norm, L2-norm, minimax norm).

Listing 1 Coeff_Cubic

void coeff_cubic (a,p,q,x,y,lambda,k)
/*
 * Set up the equations for the Hermite cubic approximation.
 * a is the p-by-q matrix (one row per data point), stored by rows.
 */
double *a,*x,*y,*lambda;
int p,q,k;
{
    double d,alpha,beta,d3,alpha3;
    int i,j,col;

    for (i=0; i<p; i++)
    {
        j = interval (lambda,x[i],k);
        col = SUB(i,2*(j-1),q);
        d = lambda[j] - lambda[j-1];
        alpha = x[i]-lambda[j-1];
        beta = d-alpha;
        d3 = d*d*d;
        alpha3 = alpha*alpha*alpha;
        *(a+col) = (d3-3.0*d*alpha*alpha+2.0*alpha3)/d3;
        *(a+col+1) = d*alpha*beta*beta/d3;
        *(a+col+2) = (3.0*d-2.0*alpha)*alpha*alpha/d3;
        *(a+col+3) = -d*alpha*alpha*beta/d3;
    }
}

int interval (x,v,n)
/*
 * Given a value v find the interval j such that v is in the interval
 * x[j-1] to x[j], where x is an increasing set of n values.
 */
double x[],v;
int n;
{
    int j = 0, found = 0;

    if (v == x[0]) return(1);
    while (!found && ++j<n)
        found = (v<=x[j] && v>x[j-1]) ? 1:0;
    return(j);
}

double app_cubic (x,j,lambda,res)
/*
 * Given the result res[] from the routine Hermite() find the value
 * of y for the given x value. j is the knot interval containing x,
 * as returned by interval().
 */
double x,*lambda,*res;
int j;
{
    double d,alpha,beta,d3,alpha3,sum,val[4];
    int i, col;

    col = 2*(j-1);
    d = lambda[j] - lambda[j-1];
    alpha = x-lambda[j-1];
    beta = d-alpha;
    d3 = d*d*d;
    alpha3 = alpha*alpha*alpha;
    val[0] = (d3-3.0*d*alpha*alpha+2.0*alpha3)/d3;
    val[1] = d*alpha*beta*beta/d3;
    val[2] = (3.0*d-2.0*alpha)*alpha*alpha/d3;
    val[3] = -d*alpha*alpha*beta/d3;
    for (sum=0.0,i=0; i<4; i++)
        sum += val[i]*res[col+i];
    return (sum);
}

Listing 2 Hermite

#include <stdio.h>
#include <stdlib.h>

#define SUB(i,j,k) ((i)*(k)+(j))

/* The three solvers are not listed in this article. */
double L1_approx(), CH_lsq(), Cheby();
void app_knots(), coeff_cubic();

double Hermite (x,y,n,norm,lambda,k,flag,res,err)
/*
 * Given n data points (x[],y[]) find the Hermite cubic approximation
 * to this data using the k knots lambda[]. If flag = true then find the
 * knots from the routine app_knots() otherwise lambda[] is set by the
 * user. The 2k result is returned in res[] and the error at each point
 * is returned in err[]. The overdetermined system of equations is
 * solved with respect to the value of norm, uses L1-norm if norm = 1,
 * uses the L2-norm if norm = 2, and uses the minimax norm if norm = 3.
 * The return value z is the size of the resultant norm.
 */
double *x,*y,*lambda,*res,*err;
int n,norm,k,flag;
{
    double *a,z = 0.0;
    int i,m,m2;

    /*
     * Find whether the knots are to be computed.
     */
    if (flag) app_knots (x,n,lambda,k);
    /*
     * Now form the system of equations one equation per data point.
     * The matrix has n rows and m = 2k columns.
     */
    m = 2*k;
    m2 = n*m;
    /*
     * Allocate space for the matrix.
     */
    a = (double*)calloc(m2,sizeof(double));
    if (a==0) printf ("\n NO DYNAMIC SPACE AVAILABLE");
    else
    {
        for (i=0; i<m2; i++) *(a+i) = 0.0;
        coeff_cubic (a,n,m,x,y,lambda,k);       /* Set up the matrix. */
        switch (norm)
        {
        case 1:
            z = L1_approx(a,n,m,m,y,res,err);   /* L1-norm solution */
            break;
        case 2:
            z = CH_lsq (a,n,m,m,y,res,err);     /* L2-norm solution */
            break;
        default:
            z = Cheby (a,n,m,m,y,res,err);      /* Minimax norm solution */
            break;
        }
        free (a);
    }
    return (z);
}

void app_knots (x,n,lambda,k)
/*
 * Given n x[] values compute k knots lambda[] such that the
 * distribution of points in each interval is nearly the same.
 */
double *x,*lambda;
int n,k;
{
    int i,j,s,t;

    lambda[0] = x[0];
    lambda[k-1] = x[n-1];
    if (k>2)
    {
        i = n/(k-1);
        j = (n-(i*(k-3)))/2;
        lambda[1] = x[j];
        if (k>3)
        {
            s = j;
            for (t=2; t<k-1; t++)
            {
                s += i;
                lambda[t] = x[s];
            }
        }
    }
}

A Simple Application Environment

Mark A. Johnson

Mark Johnson has been designing software for a major R&D corporation since 1976. He received a BSCS from the University of Pittsburgh and his MSCS from Carnegie-Mellon. His current computer interests include languages, programming for children, business applications, and computer-generated music. Mark is continuing to develop other DCUWCU applications.

Having used a mouse in user interfaces since 1981, I believe it to be the most convenient way to inform a computer program what you want it to do.
I wanted to use a mouse in a number of PC programs and so looked into a few application environments. Microsoft Windows and Digital Research's GEM disappointed me, due to the complexity that had to be mastered. The resource construction sets, complex window management, and other overheads needed to write a simple application led me to write my own simple application environment based on Turbo C graphics routines and a public domain mouse interface.

My goal was to build an easy-to-use environment that provides a mouse-driven cursor, stacked pop-up menus, and forms that contain editable fields and a variety of selectable buttons. The environment would keep track of what the user was doing, inform the application as needed, and clean up after itself. An additional goal was to facilitate porting the environment to other machines that have a mouse, bitmap graphics, console I/O, and a simple timer. I have the same DCUWCU environment on my PC compatible and Atari ST, allowing me to easily move applications between systems.

Operation

A typical application begins with a blank screen (or suitable greeting) showing an arrow-shaped cursor controlled by the mouse. Pressing the right mouse button displays a set of stacked pop-up menus. While holding the right mouse button down, the user selects an item (or another menu) from the front-most menu and then releases the button. If a menu item was selected, then the application acts on that selection. If another menu was selected, it is brought to the front of the menu stack, ready for another round of menu item selection.

Pressing the left mouse button or a keyboard key usually causes an application-specific action, often resulting in a form appearing on the screen that the user must fill out. When processing the form, all mouse and keyboard events are handled by the environment. Keyboard input is directed to the current editable field, denoted by the special input cursor. A TAB moves the input cursor to the next editable field.
An ESC (cancel) or ENTER (accept) terminates form processing, returning data and control back to the application.

Some forms may contain small text labels, called form buttons, that are selected (or de-selected) by moving the cursor over them and pressing the left mouse button. There are three types of buttons: plain, radio, and exit. A plain button is a simple on/off switch. A radio button is a one-of-many switch, much like the buttons on a car radio. An exit button is like the plain button, but causes form processing to end.

The application environment works equally well when no mouse is present by using the cursor keys to simulate mouse motion and the function keys F1 and F2 to simulate the left and right mouse buttons. Pressing F2 once simulates pressing the right mouse button; another press of F2 simulates its release. A single press of the F1 button simulates the left mouse button.

Application Interface

The interface between application and environment was made as simple as possible: strings are used to define forms and menus, pointers to variables are used to store values collected by forms, and calls to functions inform the application of user events, such as menu selection or mouse button clicks. The application environment follows (and is named after) what is called "The Hollywood Principle" (or "don't call us, we'll call you"). An application developer supplies four critical routines, called when the application environment detects various user interface events.

start(argc, argv, envp)
int argc; char **argv, **envp;

This is the initialization routine called immediately after initializing the graphics interface but before the environment is completely started. It is passed the same arguments normally passed to a C main() routine. The start() routine usually initializes the application and creates the menu stack using repeated calls to add_menu().

menu(m, i)

The menu() routine is called whenever a menu selection is made.
The application environment supports a stack of pop-up menus. Any number of menus can be supported, although only two or three are usually active at any one time in order to minimize interface complexity (see menu_state() below). The m argument identifies which menu was selected. When the menu is first declared (see add_menu()), the application provides a value that identifies the menu. This same identifier value is passed back to the application when a menu is selected. The i argument specifies which menu item was selected, a value of 1 meaning the first item, etc.

button(b, x, y)

The button() routine is called when a mouse button is pressed. The right mouse button is reserved for menu manipulation; all others are passed to the application. The b argument is the button number (usually 1) and the x and y arguments are the mouse coordinates when the button was pressed.

keyboard(key)

The keyboard() routine is called whenever a console key is struck. The character typed by the user is contained in the single argument.

timer(t)

The (optional) timer() routine is called whenever an application-requested timer expires. When the timer is requested, a value identifying the timer is passed to the application environment. The same identifier value is passed back to the application in the t argument when the timer expires.

Environment Interface

The application environment provides some basic routines that an application can call for control and service.

finish()

The finish() routine is called whenever the application is done and the program must exit.

add_menu(m, mdef)
char *mdef;

The add_menu() routine adds a menu to the current set of pop-up menus maintained by the environment. An application typically initializes all its menus from the start() routine. The m argument is remembered by the environment and passed back to the application when a menu selection is made. The mdef argument is a string that defines the menu.
For example, add_menu(1, "Main:About|Help|Quit") defines a menu identified as menu 1, titled Main, and with three items: About, Help, and Quit.

menu_state(m, on)

The menu_state() routine allows the application to activate or de-activate a particular menu. The m argument refers to the menu defined with a previous add_menu() call. The on argument should be set to 1 to activate or 0 to deactivate the menu.

menu_item(m, i, str)
char *str;

The menu_item() routine is used to change the name of a particular menu item. For example, suppose a drawing program can turn a grid on and off. The application might have a menu item called Grid when no grid is shown and change it to No Grid using menu_item() when the grid is shown.

mouse_state(on)

The mouse_state() routine will activate or deactivate the mouse-driven cursor. The on variable should be set to 1 to show the mouse and 0 to hide it.

mouse_shape(type, bits, hotx, hoty)
char *bits;

The application can control the cursor's shape with mouse_shape(). There are two built-in forms: arrow and cross. Type values of 0 and 1 specify arrow and cross, respectively. A type value of 3 allows the user to specify a custom designed mouse cursor. The bits argument is a pointer to an eight-byte character array containing the mouse bitmap (8 x 8 bits). The hotx and hoty arguments indicate which bit in the bitmap is considered the hotspot of the cursor. For example, the cross form has a hotspot of hotx=hoty=3, which is the center of the 8 x 8 bitmap.

add_timer(t, wait)
long wait;

Many applications, especially games, require some sense of the passage of time. Using add_timer(), the application can arrange for timer() to be called after some time has elapsed. The application's timer() routine can do such things as blank the screen if no activity has taken place for many minutes or move sprites around the screen after a few tenths of a second. The t argument identifies a particular timer and is passed back to the application when the timer expires.
The wait argument specifies the needed delay in milliseconds (e.g. wait=1000L is a delay of one second).

form(def, argptr1, ...)
char *def;

form() displays a form on the screen, collects data from the user, and deposits it in the variables pointed to by the argptr parameters. This routine is somewhat similar to scanf(). The form definition string defines a number of fields. For example,

    " Name: %15s Number: %5d %[malefemale] %[over 55] %{ok}"

This form definition would result in the following being displayed in the middle of the screen surrounded by a rectangle.

    Name:_______________ Number:_____ [malefemale] [over 55] {ok}

Most of the text in the definition string is used as titles. A | signifies the beginning of a new line in the form. A data field begins with a % and is associated with a particular variable. There are five types of data fields. (For the examples that follow, assume the following declarations occur before the call to form(): char c, buf[11]; int x;) (See Table 1.)

If a character pointer fdef points to the form definition string shown above, the following code fragment uses form() correctly.

    char name[16], m_or_f = 0, over_55 = 1, ok = 0;
    int number;

    if (form(fdef, name, &number, &m_or_f, &over_55, &ok)) {
        /* do stuff with name, number, etc. */
    }

After filling out the form, if the user selects the {ok} button or hits ENTER, form() returns a non-zero value. The data values collected by the form and stored in name, number, m_or_f, and over_55 are processed further by the if statement. If the user strikes the ESC key while filling out the form, form() returns zero and the processing doesn't take place.

An Example

Listing 1 is a simple drawing program that illustrates how to build a DCUWCU application. The code for this example and the complete source for the DCUWCU have been added to the CUG Library. See the New Releases column by Kenji Hino for more details.
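The "don't call us, we'll call you" arrangement described under Application Interface can be illustrated without the real environment. The following ANSI C sketch is only a stand-in: the callbacks menu(), button(), keyboard(), and timer() use the names from the article, but the event record, the dispatch() routine, and the global variables are hypothetical and are not part of DCUWCU, whose actual driver is not shown in this article.

```c
#include <assert.h>
#include <stddef.h>

/* State recorded by this toy application (hypothetical). */
int last_menu, last_item, last_key, clicks;

/* Application callbacks, named as in the article. */
void menu(int m, int i)          { last_menu = m; last_item = i; }
void button(int b, int x, int y) { (void)x; (void)y; if (b == 1) clicks++; }
void keyboard(int key)           { last_key = key; }
void timer(int t)                { (void)t; /* no timers in this sketch */ }

/* Hypothetical event record standing in for what the environment detects. */
enum ev_kind { EV_MENU, EV_BUTTON, EV_KEY, EV_TIMER };
struct event { enum ev_kind kind; int a, b, c; };

/* Hypothetical driver loop: the environment calls the application, never the reverse. */
void dispatch(const struct event *ev, size_t n)
{
    size_t k;

    assert(ev != NULL || n == 0);
    for (k = 0; k < n; k++) {
        switch (ev[k].kind) {
        case EV_MENU:   menu(ev[k].a, ev[k].b);            break;
        case EV_BUTTON: button(ev[k].a, ev[k].b, ev[k].c); break;
        case EV_KEY:    keyboard(ev[k].a);                 break;
        case EV_TIMER:  timer(ev[k].a);                    break;
        }
    }
}
```

Feeding dispatch() a menu event for item 3 of menu 1, a left-button click, and a 'q' keystroke leaves last_menu = 1, last_item = 3, clicks = 1, and last_key = 'q'. The application code contains no event loop of its own, which is the point of the design: all polling of the mouse, keyboard, and clock lives in one place in the environment.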
Conclusion

I have used the "Hollywood Principle" design model for a number of projects and have found it to shorten development time and result in a robust application. The mouse is an effective user interface device and, when coupled with pop-up windows and forms, provides clean, uncluttered operation. I would like to acknowledge the designers of the many mouse-based user interfaces I have used in the past, such as the Apple Macintosh, Microsoft Windows, DR/Atari GEM, but most significantly the Xerox Mesa Development System, for the inspiration to build this simple application environment.

Table 1

Field    Example          Argument   Values
-------------------------------------------
text     %10s             &buf[0]    string
number   %5d              &x         integer
button   %[abc]           &c         0 or 1
radio    %[abc:def:ghi]   &c         0, 1, ...
exit     %{ok}            &c         0 or 1

Listing 1

/*
 * this is a very simple drawing program that
 * illustrates how to build a DCUWCU application
 *
 * Copyright 1989 Mark A. Johnson
 */
#include <stdio.h>
#include <string.h>
#include <bios.h>
#include <graphics.h>

#define M_POINTER 0     /* mouse shapes */
#define M_CROSS 1
#define ON 1
#define OFF 0
#define MAX_OBJECT 100
#define ESC 27
#define BOX 'b'         /* object types we support */
#define ELLIPSE 'e'
#define LINE 'l'
#define TEXT 't'
#define M_MAIN 1        /* handles for the menus */
#define M_FILE 2
#define M_OBJ 3
#define M_ACT 4
#define A_COPY 1        /* action requests for button() */
#define A_MOVE 2
#define A_EDIT 3
#define min(a,b) ((a) < (b) ? (a) : (b))
#define max(a,b) ((a) > (b) ? (a) : (b))

typedef struct {
    int type, l, t, r, b;
    char select, *data;
} Object;

Object objects[MAX_OBJECT];     /* the table of objects defined so far */
int last_object;                /* the end of the object table */

int map[] = {                   /* maps a M_OBJ menu item to an object */
    0, BOX, ELLIPSE, LINE, TEXT
};

char *about =                   /* form used on the M_MAIN About item */
    " Draw This! byMark A. Johnson %{continue}";
char *help =                    /* help message for wrong keyboard input */
    "quit refresh : box line ellipse text : delete copy move edit";
char filename[20];              /* save the filename we're working with */
char text[100];                 /* extra buffer for text i/o */
int actn_obj = 0;               /* flag for button(), some action req */
int make_obj = 0;               /* flag for button(), need to create */
int slct_cnt = 0;               /* count of selected objects */
int first;                      /* helps make_object collect points */
int grid = 0;                   /* grid displayed, snap coords */
extern int Maxx, Maxy, MaxColor;

/* start routine, called by the application driver, gets things going */
start(argc, argv)
char **argv;
{
    add_menu(M_MAIN, "Main:About|Quit|Refresh|Grid");
    add_menu(M_FILE, "File:Read|Write|Save|Print");
    add_menu(M_OBJ, "Objects:Box|Ellipse|Line|Text");
    add_menu(M_ACT, "Actions:Delete|Copy|Move|Edit");
    menu_state(M_ACT, 0);
    if (argc > 1) {
        strcpy(filename, argv[1]);
        read_objects();
    }
}

/* no timers in this application, but DCUWCU needs an entry anyway */
timer() {}

/* button routine called every time button 1 is depressed */
button(b, x, y)
{
    if (make_obj) {             /* need points to make an object */
        make_object(x, y);
    } else if (actn_obj) {      /* got a point for a copy or move */
        action_object(x, y);
    } else {                    /* do a selection */
        select_object(in_object(x, y));
    }
    check_menu();
}

/* menu routine called every time a menu item is selected */
menu(m, i)
{
    char junk = 0, on;

    switch (m) {
    case M_MAIN:                /* main menu */
        switch (i) {
        case 1: form(about, &junk); break;
        case 2: quit(); break;
        case 3: refresh(); break;
        case 4: do_grid(); break;
        }
        break;
    case M_FILE:                /* file menu */
        if (i < 3 && !get_name()) break;
        switch (i) {
        case 1: read_objects(); break;
        case 2:
        case 3: write_objects(); break;
        case 4: print(); break;
        }
        break;
    case M_OBJ:                 /* objects */
        start_make(map[i]);
        break;
    case M_ACT:                 /* actions */
        switch (i) {
        case 1: kill_object(); break;
        case 2: start_actn(A_COPY); break;
        case 3: start_actn(A_MOVE); break;
        case 4: start_edit(); break;
        }
        break;
    }
}

/* routine called every time a key is struck */
keyboard(c)
{
    switch (c) {
    case 'p': print(); break;
    case 'g': do_grid(); break;
    case 'r': refresh(); break;
    case 'q': quit(); break;
    case 'b': start_make(BOX); break;
    case 't': start_make(TEXT); break;
    case 'l': start_make(LINE); break;
    case 'm': start_actn(A_MOVE); break;
    case 'c': start_actn(A_COPY); break;
    case 'd': kill_object(); break;
    case 'e':
        if (slct_cnt) start_edit();
        else start_make(ELLIPSE);
        break;
    default: msg(help);
    }
}

/* time to go, see if they really want to */
quit()
{
    char yes = 0, no = 0;
    char *f_exit = "Are you sure? %{yes} %{no}";

    if (form(f_exit, &yes, &no) && no == 0)
        finish();
}

/*
 * miscellaneous support routines
 */

/* reset the current grid size */
do_grid()
{
    char gridval, ok = 0, nok = 0, x;

    switch (grid) {
    case 8: gridval = 1; break;
    case 16: gridval = 2; break;
    default: gridval = 0; break;
    }
    x = form("Change Grid Size %[none:8:16]%{ok} %{cancel}",
        &gridval, &ok, &nok);
    if (x == 0 || nok) return;
    grid = gridval * 8;
    refresh();
}

/* print the current screen somewhere, Epson-compatible graphics mode */
print()
{
    static char grhd[] = { ESC, 'L', 0, 0 };        /* 960 bit graphics */
    static char grlf[] = { ESC, 'J', 24, '\r' };    /* line feed */
    static char prbuf[960];
    int x, y, i, b, n, any, pixel, max;

    max = min(Maxx, 960);
    grhd[2] = max;
    grhd[3] = max >> 8;
    mouse_state(OFF);
    b = 0x80; any = 0;
    for (y = 0; y < Maxy; y++) {
        for (x = 0; x < max; x++) {
            /* each printer byte collects eight screen rows, one bit per row */
            if (getpixel(x, y)) { any = 1; prbuf[x] |= b; }
        }
        b >>= 1;
        if (b == 0) {           /* out it goes */
            if (any) { prn(grhd, 4); prn(prbuf, max); }
            prn(grlf, 4);
            b = 0x80; any = 0;
            for (x = 0; x < max; x++) prbuf[x] = 0;
        }
    }
    mouse_state(ON);
}

/* print the n bytes out the printer port */
prn(s, n)
char *s;
{
    while (n--)
        biosprint(0, *s++, 0);
}

/* select or de-select an object */
select_object(obj)
{
    int i;
    Object *o;

    if (obj == -1) {            /* de-select all */
        for (i = 0; i < last_object; i++) {
            o = &objects[i];
            if (o->select) {
                o->select = 0;
                highlight(o, 0);
            }
        }
        slct_cnt = 0;
    } else {
        o = &objects[obj];
        o->select = !o->select;
        highlight(o, o->select);
        slct_cnt += o->select ? 1 : -1;
    }
}

/* get a filename from the user, return 0 if abort */
get_name()
{
    return form("Path: %20s", filename);
}

/* based on current select state, set the top-most menu */
check_menu()
{
    menu_state(M_ACT, slct_cnt > 0);
    menu_state(M_OBJ, slct_cnt <= 0);
}

/* start to make an object by collecting points */
start_make(type)
{
    char *s;

    switch (make_obj = type) {
    case BOX: s = "box: top left corner..."; break;
    case ELLIPSE: s = "ellipse: top left corner..."; break;
    case LINE: s = "line: one end..."; break;
    case TEXT: s = "text: starting..."; break;
    }
    msg(s);
    mouse_shape(M_CROSS);
    first = 1;
}

/* if enough points have been collected, make the object */
make_object(x, y)
{
    static int fx, fy;

    if (grid) snap(&x, &y);
    switch (make_obj) {
    case TEXT:
        *text = 0;
        form("text: %20s", text);
        add_object(TEXT, x, y, x + strlen(text)*8, y+8, text);
        make_obj = 0;
        mouse_shape(M_POINTER);
        msg("");
        break;
    default:
        if (first) {
            fx = x; fy = y; first = 0;
            line(x-3, y, x+3, y);
            line(x, y-3, x, y+3);
            if (make_obj == LINE) msg("other end...");
            else msg("bottom right corner...");
        } else {
            add_object(make_obj, fx, fy, x, y, 0L);
            msg("");
            make_obj = 0;
            mouse_shape(M_POINTER);
        }
    }
}

/* snap the coordinates to the nearest grid point */
snap(xp, yp)
int *xp, *yp;
{
    int g2 = grid/2, g4 = grid/4, x = *xp, y = *yp;

    x = ((x + g2) / grid) * grid;
    y = ((y + g4) / g2) * g2;
    msg("x %d->%d y %d->%d", *xp, x, *yp, y);
    *xp = x; *yp = y;
}

/* move, copy, or edit a figure */
action_object(x, y)
{
    int i, dx, dy;
    Object *o;

    if (grid) snap(&x, &y);
    /* find reference point and compute distance moved */
    dx = dy = (actn_obj == A_EDIT ?
        0 : 10000);
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            if (actn_obj == A_EDIT) {
                dx = max(o->r, dx);
                dy = max(o->b, dy);
            } else {
                dx = min(o->l, dx);
                dy = min(o->t, dy);
            }
        }
    }
    dx = x - dx;
    dy = y - dy;
    /* do it to all selected items, de-selecting as you go */
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            o->select = 0;
            highlight(o, 0);
            switch (actn_obj) {
            case A_COPY:
                highlight(o, 0);
                add_object(o->type, o->l + dx, o->t + dy,
                    o->r + dx, o->b + dy, o->data);
                break;
            case A_MOVE:
                draw_object(o, 0);
                o->l += dx;
                o->t += dy;
                o->r += dx;
                o->b += dy;
                draw_object(o, 1);
                break;
            case A_EDIT:
                draw_object(o, 0);
                set_coords(o, o->l, o->t, o->r + dx, o->b + dy);
                draw_object(o, 1);
                break;
            }
        }
    }
    /* deselect all and reset the mouse */
    actn_obj = 0;
    slct_cnt = 0;
    mouse_shape(M_POINTER);
    msg("");
    check_menu();
}

/* read objects from a file */
read_objects()
{
    char type;    /* %c stores a single char, so this must not be an int */
    int t, l, r, b;
    FILE *f = fopen(filename, "r");

    if (f != NULL) {
        last_object = 0;
        while (fgets(text, 100, f)) {
            sscanf(text, "%c %d %d %d %d '%[^']\n",
                &type, &l, &t, &r, &b, text);
            add_object(type, l, t, r, b, text);
        }
        fclose(f);
        msg("%d objects loaded", last_object);
    } else
        msg("can't open '%s'", filename);
}

/* write objects to a file */
write_objects()
{
    int i;
    Object *o;
    FILE *f;

    if (*filename == 0 && !get_name())
        return;
    if ((f = fopen(filename, "w")) != NULL) {
        for (i = 0; i < last_object; i++) {
            o = &objects[i];
            fprintf(f, "%c %d %d %d %d '%s'\n",
                o->type, o->l, o->t, o->r, o->b,
                o->type == TEXT ?
                o->data : "");
        }
        fclose(f);
    } else
        msg("can't write '%s'", filename);
}

/* save the given string in malloc'ed memory */
char *
strsave(s)
char *s;
{
    char *malloc();
    char *r = malloc(strlen(s)+1);

    if (r)
        strcpy(r, s);
    else
        msg("out of memory!!!");
    return r;
}

/* re-display all the objects on the screen */
refresh()
{
    int i, x, y, gy;
    Object *o;

    clearviewport();
    setcolor(MaxColor);
    if (grid) {
        gy = grid/2;
        for (x = grid; x < Maxx; x += grid)
            for (y = gy; y < Maxy; y += gy)
                putpixel(x, y, 1);
    }
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        draw_object(o, 1);
        if (o->select)
            highlight(o, 1);
    }
}

/* (de)highlight the current selected item */
highlight(o, color)
Object *o;
{
    setcolor(color);
    rectangle(o->l-2, o->t-2, o->l+2, o->t+2);
    rectangle(o->r-2, o->b-2, o->r+2, o->b+2);
}

/* give the user some feedback */
msg(fmt, a, b, c, d)
char *fmt;
{
    static int lastback = 0;

    setfillstyle(EMPTY_FILL, 0);
    bar(0, 0, lastback, 8);
    sprintf(text, fmt, a, b, c, d);
    setcolor(MaxColor);
    outtextxy(0, 0, text);
    lastback = strlen(text) * 8;
}

/*
 * object handling
 */

/* see if x, y are in an object, begin looking at start + 1 */
in_object(x, y)
{
    static int last = 0;
    int l, r, t, b;
    Object *o;
    int i = last+1, n = last_object;

    while (n--) {
        if (i >= last_object)
            i = 0;
        o = &objects[i];
        l = min(o->l, o->r);
        r = max(o->l, o->r);
        t = min(o->t, o->b);
        b = max(o->t, o->b);
        if (x >= l && x <= r && y >= t && y <= b)
            return (last = i);
        i++;
    }
    return (last = -1);
}

/* add an object to the object table */
add_object(type, l, t, r, b, data)
char *data;
{
    Object *o = &objects[last_object++];
    char *s;

    o->type = type;
    set_coords(o, l, t, r, b);
    o->select = 0;
    if (type == TEXT)
        o->data = strsave(data);
    draw_object(o, 1);
}

/* set the coordinates properly */
set_coords(o, l, t, r, b)
Object *o;
{
    if (o->type == LINE) {    /* no fixup on these */
        o->l = l;
        o->t = t;
        o->r = r;
        o->b = b;
    } else {
        o->l = min(l, r);
        o->t = min(t, b);
        o->r = max(l, r);
        o->b = max(t, b);
    }
}

/*
draw an object on the screen */
draw_object(o, color)
Object *o;
{
    int x, y, xrad, yrad;

    setcolor(color);
    switch (o->type) {
    case TEXT:
        x = strlen(o->data) * 8;
        setfillstyle(EMPTY_FILL, 0);
        bar(o->l, o->t, o->l + x, o->t + 8);
        outtextxy(o->l, o->t, o->data);
        break;
    case BOX:
        rectangle(o->l, o->t, o->r, o->b);
        break;
    case LINE:
        line(o->l, o->t, o->r, o->b);
        break;
    case ELLIPSE:
        x = o->l + (o->r - o->l)/2;
        y = o->t + (o->b - o->t)/2;
        xrad = o->r - x;
        yrad = o->b - y;
        ellipse(x, y, 0, 360, xrad, yrad);
        break;
    }
}

/* delete an object */
kill_object()
{
    int i, j;
    Object *o;

    for (i = j = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            highlight(o, 0);
            draw_object(o, 0);
            o->select = 0;
        } else {
            if (i > j)
                objects[j++] = objects[i];
            else
                j++;
        }
    }
    last_object = j;
    slct_cnt = 0;
    check_menu();
}

/* start an edit on the selected objects */
start_edit()
{
    int i;
    Object *o;

    /* edit the text objects now */
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->type == TEXT && o->select) {
            o->select = 0;
            highlight(o, 0);
            draw_object(o, 0);
            strcpy(text, o->data);
            if (form("edit: %20s", text)) {
                free(o->data);
                o->data = strsave(text);
                o->r = o->l + strlen(text)*8;
            }
            draw_object(o, 1);
            slct_cnt--;
        }
    }
    if (slct_cnt > 0) {    /* must be other stuff */
        start_actn(A_EDIT);
    }
    check_menu();
}

/* initiate an action on selected objects */
start_actn(actn)
{
    switch (actn) {
    case A_COPY: msg("copy to..."); break;
    case A_MOVE: msg("move to..."); break;
    case A_EDIT: msg("editing..."); break;
    }
    actn_obj = actn;
    mouse_shape(M_CROSS);
}

Spiffier Windows For Turbo C

Tony Servies

Tony Servies is a programmer/analyst with World Computer Systems in Oak Ridge, Tennessee. Presently he is working on a project to develop computer-based training programs for the U.S. Navy. His computer interests include utilities and C programming. You may contact him at Route 1, Box 143, Greenback, TN 37742.
Want to spice up your user interface with flashy windows with only a minimal amount of coding and time? With a few lines of code and Borland's Turbo C, it's possible.

Turbo C Window Interface

Two functions can be used to create text windows in Turbo C: gettext() and puttext(). gettext() copies a rectangular screen image into a buffer, and puttext() copies a buffer back to the screen. The programmer supplies only the window coordinates and a character string pointer (or character array, if you will); the function does the rest.

These remarkable routines do some rudimentary screen I/O quickly and cleanly. One drawback, though, is the inherent lag between the time you write to the window area and the moment the text is displayed. The user 'sees' any text writing that you perform. Most applications today require a window flashed on the screen intact, such as in a pull-down menu.

The code in Listing 1 allows writing to a window before it is displayed on the screen. Then when the window is flashed on the screen, it is complete. You call puttext_write with the x,y coordinates, window size, the character string to display, the attribute for the string, and the pointer to the window buffer. The x,y coordinates start from the upper left corner with the location 0,0. The size of the window is given with the number of columns (width) and the number of rows (height). The string to display is simply a character string stored in standard C format; a '\0' character terminates the string. The buffer is a pointer to an area of characters that denotes the window area. The string attribute is the usual color attribute byte described in almost every reference manual on PCs.

How It Works

puttext_write first checks that you are not positioning the data beyond the physical bounds of the window. Of course, this routine will wrap any text past the end of a line onto the following line (unless it is the last line in the window).
The routine then gets the pointer address for the last character and attribute pair in the window area, called maxbuffer in the subroutine. The offset for the proper x,y location is added to the buffer so that it points to the correct character.

While the buffer location is less than the maxbuffer pointer and the character in the string is not the end of string terminator ('\0'), the while loop updates the character and attribute of the buffer. The loop terminates only when the buffer overflows (buffer >= maxbuffer) or at the end of string (*string == '\0'). Now, just put the window on the screen and you're ready to go.

I've included a quick and dirty sample program illustrating the flashy windows routine (Listing 2). Note how easy it is to create a window. Just use a character array of XSIZE*YSIZE*2 bytes. (You multiply the area by two because each displayed character is followed by a byte of attribute information: color, blink, etc.) The program then clears the window and sets all of the attributes. In this example I set all attributes to magenta characters on cyan background.

Then the routine that does the actual call to puttext_write() loops through ten times. After the page is full, I put the window on the screen with the puttext() command and wait for half a second. The routine loops through nine more times until it completes the for loop. I then restore the original screen with the puttext() call for the original screen area (oldbuffer).

This routine should enable you to enhance your pull-down, pop-up, and user-entry screens. Feel free to modify the code to account for border areas, highlighted text, etc.
Listing 1

puttext_write (x, y, xsize, ysize, string, attr, buffer)
int x, y, xsize, ysize;
char *string, attr, *buffer;
{
    char *maxbuffer;

    if (x >= xsize || y >= ysize)    /* Range Errors */
        return;

    maxbuffer = buffer+(xsize*ysize*2)-1;
    /* maxbuffer points to the attribute of the last character */

    buffer += (((y*xsize)+x)*2);
    /* buffer points to the first character to write */

    /* While buffer is not overrun and there are characters left
     * to print.
     */
    while ((buffer < maxbuffer) && (*string != '\0')) {
        *buffer++ = *string++;    /* Do character */
        *buffer++ = attr;         /* Do attribute */
    }
}

Listing 2

#include <stdio.h>
#include <conio.h>
#include <dos.h>    /* delay() */

#define XSIZE 50
#define YSIZE 15

char newbuffer[XSIZE * YSIZE * 2];    /* Allow for Attributes */
char oldbuffer[XSIZE * YSIZE * 2];

main()
{
    int i, j;
    char key_string[15];

    /* Get the existing screen area and store it in oldbuffer.
     * Subtract 1 from size, since the 1st position is 0.
     */
    gettext (5,5,5+XSIZE-1,5+YSIZE-1,oldbuffer);

    /* Clear the new window area (newbuffer) */
    for (i = 0; i < YSIZE; i++) {
        for (j = 0; j < XSIZE*2; j+=2) {
            newbuffer[i*XSIZE*2+j] = ' ';         /* Blank Space */
            newbuffer[i*XSIZE*2+j+1] = '\x35';    /* Attribute: magenta on cyan */
        }
    }

    /* Loop through 10 times */
    for (j = 0; j < 10; j++) {
        /* Print YSIZE lines */
        for (i = 0; i < YSIZE; i++) {
            sprintf(key_string,"Value %.3d",i+(j*(int)YSIZE));
            puttext_write(1,i,XSIZE,YSIZE,key_string,
                '\x35',newbuffer);
        }
        /* Show it on the screen */
        puttext(5,5,5+XSIZE-1,5+YSIZE-1,newbuffer);
        delay(500);
    }

    /* Restore the original screen */
    puttext(5,5,5+XSIZE-1,5+YSIZE-1,oldbuffer);
}

The C Programmer's Reference: A Bibliography Of Periodicals

Harold Ogg

This article is not available in electronic form.

Standard C Formatted Input

P.J. Plauger

P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standard committee.

This is the fourth in a series of columns on input and output under Standard C.
(See "Evolution of the C I/O Model," CUJ August '89, "Streams," CUJ October '89, and "Formatted Output," CUJ November '89.) The topic this month is how to perform formatted input. You can think of it as a natural, but not essential, companion to formatted output. As I emphasized last month, you really must perform output somewhere in every program that you write. If the output is to be directly digestible by human beings, as is often the case, then you want the program to produce readable text. The formatted output functions help you produce readable text that reflects the values of encoded data in your program. On the other hand, not all programs read input. Those that do can read data directly, using an assortment of standard library functions, and interpret it as they see fit. Converting small integers and text strings for internal consumption are both five-finger exercises that most C programmers perform easily. It is only when you must convert floating point values, or recognize a complex mix of data fields, that standard scanning functions begin to look attractive. Even then the choice is not always clear. The usability of a program depends heavily on how tolerant it is of variations in user input. You as a programmer may not agree with the conventions enforced by the standard formatted input functions. You may not like the way they handle errors. In short, you are much more likely to want to roll your own input scanner. Obtaining formatted input is not simply the inverse of producing formatted output. With output, you know what you want the program to generate next and it does it. With input, however, you are more at the mercy of the person producing the input text. Your program must scan the input text for recognizable patterns, then parse it into separate fields. Only then can it determine what to do next. Not only that, the input text may contain no recognizable pattern. You must then decide how to respond to such an "error."
Do you print a nasty message and prompt for fresh input? Do you make an educated guess and bull ahead? Or do you abort the program? Various canned input scanners have tried all of these strategies. No one of them is appropriate for all cases. It is no surprise, therefore, that the history of the formatted input functions in C is far more checkered than for the formatted output functions. Most implementations of C have long agreed on the basic properties of printf and its buddies. (A notable exception is the I/O library I originally wrote for the Whitesmiths C compiler. It nicely regularized the names of functions and format conversion specifications, but at a serious cost in compatibility. Eventually, we had to abandon our special dialect of I/O.) By contrast, scanf and its ilk have changed steadily over the years and have proliferated dialects. Committee X3J11 spent considerable time sorting out the proper behavior of formatted input. Once we agreed on which input conversions to include in Standard C, we had to agree on exactly what they did. Implementations varied on the valid formats for numeric fields. They were all over the map on how to respond to invalid input. They seldom clarified how scanf interacts with ungetc and other I/O functions. All these decisions had to be made in an atmosphere of general dissatisfaction. A vocal minority wanted major changes in the formatted input functions. An almost silent majority didn't want to be bothered with details about functions they considered useless at best, dangerous at worst. Given all these handicaps, I think X3J11 did rather a good job of clarifying the formatted input functions and making them useful. After that introduction, I will rashly assume that you still care about the formatted input functions. The rest of this column discusses the scan functions, so called because they all have scan as part of their names. These are the functions that scan input text and convert text fields to encoded data. 
All are declared in the standard header <stdio.h>. To use the scan functions, you must know how to call them, how to specify conversion formats, and what conversions they will perform for you.

Calling Scan Functions

The Standard C library provides three different scan functions, declared as follows:

int fscanf(FILE *stream, const char *format, ...);
int scanf(const char *format, ...);
int sscanf(const char *src, const char *format, ...);

The function fscanf obtains characters from the stream stream. The function scanf obtains characters from the stream stdin. Both stop scanning input early if an attempt to obtain a character sets the end-of-file or error indicator for the stream. The function sscanf obtains characters from the null-terminated string beginning at src. It stops scanning input early if it encounters the terminating null character for the string.

Note that all of the functions accept a varying number of arguments, just like the print functions. And just like the print functions, you had better declare any scan functions before you use them by including <stdio.h>. Otherwise, some implementations may go crazy when you call your undeclared scan function.

All the functions accept a read-only format argument, which is a pointer to a null-terminated string. The format tells the function what additional arguments to expect, if any, and how to convert input fields to values to be stored. (A typical argument is a pointer to a data object that receives the converted value.) It also specifies any literal text or whitespace you want to match between converted fields. If scan formats sound remarkably like print formats, the resemblance is quite intentional. But there are also important differences. I will revisit formats in considerable detail later in this column.

All the functions return a count of the number of text fields converted to values that are stored.
If any of the functions stops scanning early for one of the reasons cited above, however, it returns the value of the macro EOF (defined in the standard header <stdio.h>). Since EOF must have a negative value, you can easily distinguish it from any valid count, including zero. Note, however, that you can't tell how many values were stored before an early stop. If you need to locate a stopping point more precisely, break your scan call into multiple calls. A scan function can also stop scanning because it obtains a character that it is unprepared to deal with. In this case, the function returns the cumulative count of values converted and stored. You can determine the largest possible return value for any given call by counting all the conversions you specify in the format. The actual return value will be between zero and this maximum value, inclusive. When either fscanf or scanf obtains such an unexpected character, it pushes it back to the input stream. (It also pushes back the first character beyond a valid field when it has to peek ahead to determine the end of the field.) How it does so is similar to calling the function ungetc. There is a very important difference, however. You cannot portably push back two characters to a stream with successive calls to ungetc (and no other intervening operations on the stream). You can portably follow an arbitrary call to a scan function with a call to ungetc for the same stream. What this means effectively is that the one-character pushback limit imposed on ungetc is not compromised by calls to the scan functions. Either the implementation guarantees two or more characters of pushback to a stream or it provides separate machinery for the scan functions. Note that the scan functions push back at most one character. Say, for example, that you try to convert the field 123EASY as a floating point value. The field is, of course, invalid. 
Even the subfield 123E is invalid, since the conversion requires at least one exponent digit. What will happen is, the subfield 123E is consumed and the conversion fails. No value is stored and the scan function returns. The next character to read from the stream is A. This behavior matters most for floating point fields, which have the most ornate syntax. Other conversions can usually digest all the characters in the longest subfield that looks valid. As a final point, the Standard C library does not provide any of the functions vfscanf, vscanf, or vsscanf. These are obvious analogs to the print functions vfprintf, vprintf, and vsprintf which I described last month. X3J11 simply felt that there was not enough call for such scan functions to require them of all implementations. Writing Formats Last month, I described the print formats as a mini programming language. The same is, of course, true of the scan formats. I also commented earlier that print and scan formats look remarkably alike. This should serve as both a comfort and a warning to you. The comfort is that the print and scan functions are designed to work together. What you write to a text file with one program should be readable as a text file by another. Any values you represent in text by calling a print function should be reclaimable by calling a scan function. (At least they should be to good accuracy, over a reasonable range of values.) You would even like the print and scan formats to resemble each other closely. Doug McIlroy, at AT&T Bell Laboratories, makes a stronger statement. He feels that any good formatted I/O package should let you write identical formats for print and scan function calls. A formatting language that is not symmetric, he feels, is deficient. I believe that Standard C comes close to achieving this goal. It is at least possible for you to write symmetric formats (those that read back what you wrote out). 
Be warned, however, that developing symmetry can take a bit of extra thought. And here lies the danger. The fact remains that the print and scan format languages are different. Sometimes the apparent similarity is only superficial. You can write text with a print function call that does not scan as you might expect with a scan function call using the same format. Be particularly wary when you print text using conversions with no intervening whitespace. Be somewhat wary when you print adjacent whitespace in two successive print calls. The scan functions tend to run together fields that you think of as separate.

The basic operation of the scan functions is, indeed, the same as for the print functions. Call a scan function and it scans the format string once from beginning to end. As it recognizes each component of the format string, it performs various operations. Most of these operations consume characters sequentially from a stream (fscanf or scanf) or from a string stored in memory (sscanf). Many of these operations generate values that the scan function stores in various data objects that you specify with pointer arguments. Any such arguments must appear in the varying length argument list, in the order in which the format string calls for them. For example,

sscanf("thx 1138", "%s%2o%d", a, &b, &c);

stores the string "thx" in the char array a, the value 9 (octal eleven) in the unsigned int data object b, and the value 38 in the int data object c. (The array name a already converts to a pointer to its first element, so it takes no & operator.) It is up to you to ensure that the type of each actual argument pointer matches the type expected by the scan function. (The pointer must, of course, also point to a data object of the expected type.) Standard C has no way to check the types of additional arguments in a varying length argument list.

Not every part of a format string calls for the conversion of a field and the consumption of an additional argument. In fact, only certain conversion specifications gobble arguments.
Every conversion specification begins with the % escape character and matches one of the patterns described below. The scan functions treat everything else either as whitespace or as literal text. Whitespace in a scan format, by the way, is whatever the standard library function isspace (declared in <ctype.h>) says it is. That can change if you call the function setlocale (declared in <locale.h>) before you call the scan function. Your program begins execution in the "C" locale, where whitespace is what you have learned to know and love. A sequence of one or more whitespace characters in a scan format is treated as a single entity. It consumes an arbitrarily long sequence of whitespace characters from the input. (Again, whitespace is whatever the current locale says it is.) The whitespace in the format need not resemble the whitespace in the input in any way. The input can contain no whitespace. Whitespace in the format simply guarantees that the next input character (if any) is not a whitespace character. Any character in the format that is not whitespace and not part of a conversion specification calls for a literal match. The next input character must match the format character. Otherwise, the scan function returns with the current count of converted values stored. A format that ends with a literal match can produce ambiguous results. You cannot determine from the return value whether the trailing match failed. Similarly, you cannot determine whether a literal match failed or a conversion that follows it. For these reasons, literal matches have only limited use in scan formats. For completeness, I should point out that a literal match can be any string of multibyte characters. Each sequence of literal text must begin and end in the initial shift state, if your target environment uses a state-dependent encoding for multibyte characters. I suspect, however, that you will have little need to match Kanji characters with scan formats in the next few years.
Conversion Specifications A scan conversion specification differs from a print conversion specification in fundamental ways. You cannot write any of the print conversion flags and you cannot write a precision (following a decimal point). On the other hand, scan conversions have an assignment-suppression flag and a conversion specification called a scan set. Following the % you write three components. All but the last component is optional. In order: You write an optional asterisk (*) to specify that the converted value is not to be stored. You write an optional field width to specify the maximum number of input characters to match when determining the conversion field. The field width is an unsigned decimal integer. Many conversions skip any leading whitespace, which is not counted as part of the field width. You write a conversion specifier to determine the type of any argument, how to determine its conversion field, and how to convert the value to store. You write a scan set conversion specifier between brackets ([]). All others consist of one or two character sequences from a predefined list of about three dozen valid sequences. The two-character sequences begin with an h, l, or L, to indicate alternate argument types. I describe scan sets and list all valid sequences in Table 1. Don't write anything else in a scan format if you want your code to be portable. The goal of each formatted input conversion is to determine the sequence of input characters that constitutes the field to convert. The scan function then converts the field, if possible, and stores the converted value in the data object designated by the next pointer argument. (If assignment is suppressed, no function argument is consumed.) Unless otherwise specified below, each conversion first skips arbitrary whitespace in the input. Skipping is just the same as for whitespace in the scan format. 
The conversion then matches a pattern against succeeding characters in the input to determine the conversion field. You can specify a field width to limit the size of the field. Otherwise, the field extends to the last character in the input that matches the pattern. The scan functions convert numeric fields by calling one of the standard library functions strtod, strtol, or strtoul (declared in <stdlib.h>). A numeric conversion field matches the longest pattern acceptable to the function it calls. Scan Sets A scan set behaves much like the s conversion specifier. It stores up to w characters (default is the rest of the input) in the array of char pointed at by ptr. It always stores a null character after any input. It does not, however, skip leading whitespace. It also lets you specify what characters to consider as part of the field. You can specify all the characters to match, as in: "%[0123456789abcdefABCDEF]" which matches an arbitrary sequence of hexadecimal digits. Or you can specify all the characters that do not match, as in: "%[^0123456789]" which matches any characters other than digits. If you want to include the right bracket (]) in the set of characters you specify, write it immediately after the opening [ (or [^). You cannot include the null character in the set of characters you specify. Some implementations may let you specify a range of characters by using a minus sign (-). The list of hexadecimal digits, for example, can be written as: "%[0-9abcdefABCDEF]" or even, in some cases, as: "%[0-9a-fA-F]" Please note, however, that such usage is not universal. Avoid it in a program that you wish to keep maximally portable. Table 1 Conversion Specifiers In the descriptions that follow, I summarize the match pattern and conversion rules for each valid conversion specifier. w stands for the field width you specify, or the indicated default value if you specify no field width. 
ptr stands for the next argument to consume in the varying length argument list:

c -- stores w characters (default is 1) in the array of char whose first element is pointed at by ptr. It does not skip leading whitespace.
d -- converts the integer input field by calling strtol with a base of 10, then stores the result in the int pointed at by ptr.
hd -- converts the integer input field by calling strtol with a base of 10, then stores the result in the short pointed at by ptr.
ld -- converts the integer input field by calling strtol with a base of 10, then stores the result in the long pointed at by ptr.
e -- converts the floating point input field by calling strtod, then stores the result in the float pointed at by ptr.
le -- converts the floating point input field by calling strtod, then stores the result in the double pointed at by ptr.
Le -- converts the floating point input field by calling strtod, then stores the result in the long double pointed at by ptr.
E -- is the same as e.
lE -- is the same as le.
LE -- is the same as Le.
f -- is the same as e.
lf -- is the same as le.
Lf -- is the same as Le.
g -- is the same as e.
lg -- is the same as le.
Lg -- is the same as Le.
G -- is the same as e.
lG -- is the same as le.
LG -- is the same as Le.
i -- converts the integer input field by calling strtol with a base of zero, then stores the result in the int pointed at by ptr. (A base of zero lets you write input that begins with 0, 0x, or 0X to specify an actual numeric base other than 10.)
hi -- converts the integer input field by calling strtol with a base of zero, then stores the result in the short pointed at by ptr.
li -- converts the integer input field by calling strtol with a base of zero, then stores the result in the long pointed at by ptr.
n -- converts no input, but stores the cumulative number of matched input characters in the int pointed at by ptr. It does not skip leading whitespace.
hn -- converts no input, but stores the cumulative number of matched input characters in the short pointed at by ptr. It does not skip leading whitespace.
ln -- converts no input, but stores the cumulative number of matched input characters in the long pointed at by ptr. It does not skip leading whitespace.
o -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned int pointed at by ptr.
ho -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned short pointed at by ptr.
lo -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned long pointed at by ptr.
p -- converts the pointer input field, then stores the result in the void * pointed at by ptr. Each implementation defines its pointer input field to be consistent with pointers written by the print function.
s -- stores up to w non-whitespace characters (default is the rest of the input) in the array of char pointed at by ptr. It first skips leading whitespace, and it always stores a null character after any input.
u -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned int pointed at by ptr.
hu -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned short pointed at by ptr.
lu -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned long pointed at by ptr.
x -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned int pointed at by ptr.
hx -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned short pointed at by ptr.
lx -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned long pointed at by ptr.
X -- is the same as x. hX -- is the same as hx. lX -- is the same as lx. % -- converts no input, but matches a percent character. (%) Doctor C's Pointers (R) The Memory Management Library Rex Jaeschke Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex. The C run-time library has long had a family of routines that enable a programmer to allocate and free memory at run-time, at his pleasure. This capability is a powerful one and was adopted (and somewhat expanded) in ANSI C. Oftentimes you define an array of elements (necessarily of fixed size) only to find that, in most cases, you don't use all the elements or that, in some cases, you need just a few more. What you need is the ability to have variable sized arrays. However, according to the definition of C, the dimension of an array in a definition must be a compile-time integer constant. That is, the C language does not support such constructs. (Note that the Numerical C Extensions Group, of which I am the convener, is investigating the possibility of adding such a construct.) However, this idea can be implemented using the memory allocation routines in the standard library. The beauty of these allocation routines is twofold: the programmer determines just when space is allocated and exactly how long it is kept, and, if the program is written correctly, you can change the manner in which the space is allocated and freed, transparently. Let's discuss the second point further. ANSI C defines the term storage duration by saying "An object has a storage duration that determines its lifetime. There are two storage durations: static and automatic." 
I prefer to also add a third duration, dynamic. An object having dynamic storage duration is one allocated by the programmer using the library. (For the purposes of this discussion, the address space from which dynamic objects are allocated will be referred to as the heap. This term is widely used for this purpose but is not used in the ANSI C Standard.) Consider the following example: #include <stdlib.h> void f() { char c1[100]; static char c2[100]; char *c3; c3 = malloc(100); c1[10] = 'a'; c2[10] = 'a'; c3[10] = 'a'; } Ignoring the possibility of malloc() failing to allocate memory, c1, c2, and c3 can be used to designate the automatic, static, and dynamic arrays, respectively. Since the notation for referencing all three arrays is identical, the executable code can be ignorant of the object's storage duration. You can change from automatic to dynamic, from dynamic to static, etc., with no real impact on the code, if you design it appropriately to begin with. The allocation functions somehow magically change the address space of our program at run-time. The way in which this is done is specific to an implementation and may vary widely. In any case, an understanding of such details is unnecessary to use the allocation functions effectively. All you need know is that if they succeed, the requested space is allocated contiguously and you are given a base address. The Parent Header In the not too distant past, there were only four or five "standard" headers. Apart from those, there was a wide variation as to which functions were provided and in which header (if any) they were declared. ANSI C requires the allocation functions to be declared in the header stdlib.h. Many implementations currently declare them in malloc.h as well as, or instead of, stdlib.h. I have also seen quite a lot of old code that contained explicit declarations for these functions, presumably because no header in their implementation contained them. 
As a result of ANSI C, the declarations of these functions have changed with regard to both return and argument types. ANSI C adopted the concept of a void pointer from C++. This solved two important issues: first, it provided a bridge for porting code across byte and word (and other) architectures where different pointer types may actually have different physical representations; second, it provided a way to represent a generic pointer, one that simply contains the address of some object of unknown type.

Since the allocation routines are not given any information about the type of object a programmer wishes to store in the allocated space, the pointers used and returned by these functions were prime candidates for type void *. A consequence of this is that the returned value no longer need be explicitly cast. For example, in the following case:

    int *pi;

    pi = (int *)malloc(10 * sizeof(int));
    pi = malloc(10 * sizeof(int));

the assignments are equivalent since a void pointer is assignment-compatible with all other pointer types. (Historically, it was common to see such casts even though they generally were not needed. That is, strict pointer assignment-compatibility checking was not enforced as is now required by ANSI.)

If some of your code explicitly declares the allocation functions as having return types of char *, without such casts you will get errors when compiling in strict ANSI mode if the target of the assignment has type other than char *. The best solution to this is to remove the explicit declaration and include stdlib.h instead.

With ANSI's adoption of function prototypes from C++, stdlib.h now describes the allocation routines' argument type information as well. Again, all pointer types here have type void * but this is of no consequence since any "real" pointer type is compatible with void * and, as such, objects of such type can be passed in.

ANSI C has invented the type size_t, the type of a sizeof expression.
This type is typedefed in numerous standard headers including stdlib.h and is used in various library function prototypes (including the allocation functions) for the type of sizes and counts. Since sizes and counts can never be negative, size_t is an unsigned integer type. However, the underlying type of size_t is implementation-defined and may be unsigned int or unsigned long. Historically, descriptions of the allocation functions stated that sizes and counts had type unsigned int. The Allocation Functions calloc #include <stdlib.h> void *calloc(size_t nmemb, size_t size); calloc() allocates contiguous space for nmemb objects, each of whose size is size. The space allocated is initialized to all-bits-zero. Note that this is not guaranteed to be the same representation as floating-point zero or the null pointer constant NULL. free #include <stdlib.h> void free(void *ptr); free() causes the space (previously allocated by calloc(), malloc(), or realloc()) pointed to by ptr to be freed. If ptr is NULL, free does nothing. Otherwise, if ptr is not a value previously returned by one of these three allocation functions, the behavior is undefined. The value of a pointer that refers to space that has been freed is indeterminate, and such pointers should not be dereferenced. Note that free() has no way to communicate an error if one is detected. On some systems, most noticeably MS-DOS, freed space may not actually be given back to the operating system. (It likely will, however, be available for future allocations within that program.) It might only be really released when the program terminates. One consequence of this is that if you try to execute another program from within a running program that has freed up memory using free(), there still might not be sufficient physical memory available to start the new program. malloc #include <stdlib.h> void *malloc(size_t size); malloc() allocates contiguous space for size bytes. The space allocated has no guaranteed initial value. 
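As a minimal sketch of how calloc(), malloc(), and free() fit together, the fragment below builds the kind of run-time-sized array discussed earlier. The helper names make_counters() and make_samples() are hypothetical, invented for this illustration; only the three allocation functions come from the library. The second helper sets each double explicitly because, as noted above, all-bits-zero is not guaranteed to represent floating-point zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n ints, all zero.
   calloc() clears the space to all-bits-zero, which in practice
   is the value 0 for integer types. Returns NULL on failure. */
int *make_counters(size_t n)
{
    assert(n > 0);              /* sidestep the zero-size corner case */
    return calloc(n, sizeof(int));
}

/* Hypothetical helper: allocate n doubles and set each one to 0.0
   explicitly, since malloc() leaves the contents indeterminate and
   all-bits-zero need not be floating-point zero anyway. */
double *make_samples(size_t n)
{
    double *p;
    size_t i;

    assert(n > 0);
    p = malloc(n * sizeof *p);
    if (p != NULL) {
        for (i = 0; i < n; i++)
            p[i] = 0.0;
    }
    return p;                   /* caller must check for NULL */
}
```

A caller checks each result against NULL, uses the arrays with ordinary subscripts, and hands each base address back to free() exactly once when done. Passing NULL to free() is harmless.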
realloc #include <stdlib.h> void *realloc(void *ptr, size_t size); realloc() changes the size of the space pointed to by ptr to have size size. If ptr is NULL, realloc() behaves like malloc(). Otherwise, if ptr is not a value previously returned by calloc(), malloc(), or realloc(), the behavior is undefined. The same is true if ptr points to space that has been freed. size is absolute, not relative. If size is larger than the size of the existing space, new uninitialized contiguous space is allocated at the end; the previous contents of the space are preserved. If size is smaller, the excess space is freed; however, the contents of the retained space are preserved. If realloc() cannot allocate the requested space, the contents of the space pointed to by ptr remain intact. If ptr is non-NULL and size is 0, realloc() acts like free(). Whenever the size of space is changed by realloc(), the new space may begin at an address different from the one given it, even when realloc() is truncating. Therefore, if you use realloc() in this manner, you must beware of pointers that point into this possibly-moved space. For example, if you build a linked list there and use realloc() to allocate more (or less) space for the chain, it is possible that the space will be "moved," in which case the pointers now point to where successive links used to be, not where they are now. You should always use realloc() as follows: ptr1 = realloc(ptr, new_size); if (ptr1 != NULL) { ptr = ptr1; ... } This way, you never care whether the object has been relocated since you always update ptr each call, to point to the (possibly new) location. General Comments The way in which a heap is physically organized can vary widely. On some systems, the stack and the heap (and possibly even the static data area) share the same address space. On others, each may have its own address space. Some MS-DOS implementations provide both near and far heaps. 
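To make the realloc() idiom above concrete, here is a small sketch of a growable array of int. The type int_vec and the function int_vec_push() are hypothetical names chosen for this example, and the doubling policy is just one reasonable choice. The point to notice is that the result of realloc() goes into a temporary first, so a failure leaves the original space and its address intact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of int. Start it as { NULL, 0, 0 }. */
struct int_vec {
    int *data;      /* base address; may change whenever we grow */
    size_t used;    /* elements currently in use */
    size_t cap;     /* elements currently allocated */
};

/* Append value. Returns 1 on success, 0 if no memory was available.
   On failure the vector is left exactly as it was. */
int int_vec_push(struct int_vec *v, int value)
{
    assert(v != NULL);
    if (v->used == v->cap) {
        size_t new_cap = (v->cap == 0) ? 4 : v->cap * 2;
        /* realloc(NULL, n) behaves like malloc(n), so the first
           call needs no special case. */
        int *tmp = realloc(v->data, new_cap * sizeof *tmp);

        if (tmp == NULL)
            return 0;           /* v->data is still valid */
        v->data = tmp;          /* space may have moved */
        v->cap = new_cap;
    }
    v->data[v->used++] = value;
    return 1;
}
```

Because the block may move on any growth step, code using such a structure should refer to elements by index (v.data[i]) rather than hold on to pointers into the array across calls.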
Historically, many C implementations have permitted the allocation of zero bytes to be successful. That is, a non-NULL pointer is returned. Since ANSI C does not permit zero-sized objects to be defined, this practice was hotly debated during X3J11 deliberations. As a compromise, if you attempt to allocate zero bytes, it is implementation-defined whether a null pointer or a unique pointer is returned. We are told that if an allocation attempt fails, NULL is returned. The common approach I've seen to this is to display some error message and call exit(). However, most applications I have seen could ill-afford to actually do this since it would leave either disk files and/or shared memory data areas compromised. For example, if you cannot get more dynamic space, you may have quite some work to undo your current situation before you can gracefully terminate or continue. On the other hand, failure to allocate more memory when doing an in-memory sort can simply be handled by writing the sorted tree to disk, freeing the memory, and starting on the next set of strings. In such cases, the failure to allocate memory is not fatal. In cases where it is, you must consider the ramifications of receiving a NULL return at design time, not during maintenance when the first failure occurs. When heap allocation fails it might well be useful to find out how much you can get. Unfortunately, ANSI C does not provide this capability. Several implementations (including Microsoft's) do provide some help in this area. Either they can tell you how much is available now in one allocation or, how many allocations you can make of a given size. (The two need not add up to the same number of bytes since each time you request bytes, extra bytes may also be fetched to help manage the space allocated.) Similarly, ANSI C provides no help in debugging heap-related problems by "walking the heap links" and the like. Again, it's up to the quality of the implementation. 
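Since the standard library offers neither a recovery hook nor heap diagnostics, one portable option is a thin shell around malloc() and free(). The sketch below is only an illustration: the names xmalloc(), xfree(), set_oom_handler(), and live_block_count() are hypothetical, not part of ANSI C or of any particular vendor's library. The handler gives the application one central place to release memory (for example, by writing a partially sorted tree to disk) and ask for a retry; the counter gives a crude leak check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Handler returns nonzero if it freed something and the
   allocation should be retried, zero to give up. */
typedef int (*oom_handler)(size_t wanted);

static oom_handler on_oom = NULL;
static long live_blocks = 0;    /* allocated but not yet freed */

void set_oom_handler(oom_handler h) { on_oom = h; }
long live_block_count(void) { return live_blocks; }

void *xmalloc(size_t size)
{
    void *p;

    assert(size > 0);   /* zero-size results are implementation-defined */
    while ((p = malloc(size)) == NULL) {
        if (on_oom == NULL || !on_oom(size))
            return NULL;        /* caller must still check */
    }
    live_blocks++;
    return p;
}

void xfree(void *p)
{
    if (p != NULL) {
        free(p);
        live_blocks--;
    }
}
```

If live_block_count() is nonzero at a point where the program believes it has released everything, some block was allocated and never freed. The wrapper deliberately uses its own names rather than replacing malloc() and free(), because the library itself may call those routines.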
On some systems (VAX/VMS, for example) the cost of allocating memory dynamically can be somewhat expensive. As such, a caching approach may be taken. That is, when you free memory the larger of the freed block and the cache currently held, will be kept. The idea is that if you alternately allocate and free, each new allocation will have some chance of getting memory from the freed cache. ANSI C guarantees that any non-NULL address returned by the allocation functions will be aligned appropriately so it can be dereferenced via any pointer type. On systems that require object alignment, this means that space is allocated in multiples of some cluster value (such as machine words, for example.) On such systems, more memory may actually be allocated than you requested. If your program contains a bug and copies (slightly) beyond the end of allocated memory, the bytes overwritten may be those extra ones and no error occurs. However, if you change the request to a few extra bytes, the bug may manifest itself. The most common example I see is as follows: char name[30]; getname(name); pc = malloc(strlen(name)); strcpy(pc, name); Here, strcpy() adds a null character to the destination but no space was allocated for it. If the length was odd and malloc() allocates an even number anyway, the problem will not be observed. However, with even length names it may well appear. It is considered good style to explicitly free allocated memory when you are done with it. Presumably, if you don't, this is done when your program terminates (although this is not so stated by ANSI C.) Note that if you "forget" where your allocated space resides (by overwriting the pointer value returned by malloc(), for example), there is no way of getting that address back. One relatively easy way of having this happen is to use: ptr = realloc(ptr, new_size); If realloc() fails, you have lost the address of the original area. An alternate memory allocation system also exists in many systems. 
It usually involves using sbrk(). The two schemes are incompatible and must not be used in the same program. ANSI C does not include this alternate scheme.

Transparent Heap Usage

It is possible that your program calls the allocation routines even if you don't call them yourself. For example, some library routines might need dynamic space to efficiently handle variable-sized amounts of local information.

Many systems have a fixed limit on the number of open files they support. However, others do not. They can achieve this by building a linked list of FILE objects using the allocation routines. They may even include stdin, stdout, and stderr in this list, in which case the program startup code may contain calls to malloc(), etc. Compile and link an empty main() program and look at the linker map to see if these library functions are called at startup.

Multi-Dimensional Arrays

Occasionally, it may be necessary to allocate a multi-dimensional array on the heap. This can be done just as easily as for single-dimensioned arrays once you master the required pointer declaration. For example,

    double (*pd)[10];

    pd = malloc(50 * sizeof(double));
    pd[3][2] = 1.234;

By declaring pd to be a pointer to an array of 10 doubles, pd can be subscripted to two levels. Here the 50 doubles form five rows of 10. pd[3] designates the fourth row of 10 elements and pd[3][2] designates the third column in that row. (If you are confused about the difference between a pointer to double and a pointer to an array of double, you will have to wait for a future column.)

Implementer's Notebook

Life With Static Buffers

Don Libes

This article is not available in electronic form.

Applying C++

Designing And Implementing A Text Editor Using OOP, Part 1

Tsvi Bar-David

Tsvi Bar-David is president of Deerworks and currently a faculty member in the Software Engineering Department at Monmouth College. He received his PhD in mathematics from the University of California at Berkeley.
Previously, he was employed at Bell Labs in the development and delivery of UNIX, C++, and Object-Oriented courses. In my July 1989 column on training for object-oriented programming, I presented a simple framework for object-oriented design. Today we embark upon a journey -- likely to last several columns -- in which we apply the design framework to the problem of constructing a simple text editor. Along the way we will develop some types which are not only useful in building the editor, but also as tools in general, and so can serve as members of a general-purpose object library. Most languages, including C++, require that the solution to a problem be represented as a main program. This we will do. Yet, our goal is not to design and build programs, but rather to identify and construct useful object types, out of which we can construct an infinite number of programs. In a sense akin to mathematics, we are constructing a solution not to just one problem, but rather to a family of related problems -- for example, the problem of editing text. It is precisely this approach to problem solving, I believe, that permits an object-oriented design (design as a noun, the result of the design process) to be easily modified, enhanced and re-used. The brevity of the main programs that we build reflects this approach; typically these programs instantiate an object or two and then invoke a couple of member functions. Bertrand Meyer [2] takes and supports very much the same position in the eiffel language. Indeed, eiffel has no main program; one simply selects a first object to which to send a message. The action associated with that message goes ahead and creates other objects and sends messages to them ad infinitum. Design Framework You should refer to the July 1989 column for details of the design framework. In sum, the framework manages a process that maps a requirements document to an implementable design document. 
To quote the earlier column: "The heart of object-oriented design is the identification of the types in the program and the relationships between them. To identify a type is to specify its behavior (public interface). To identify relationships means bringing to light the relationships (inheritance and parametric types) in the behavior of the types. One can then implement the behavior in many ways." Here is the pseudo-code for the design process. initial decomposition(on requirements document); while( stopping condition has not been met ) { abstraction; type relationships; type decomposition; } return design specification; In order to begin the design process, we need a behavioral description of the object we want to build, namely the text editor. Describing The Editor The ced editor allows the user to create new text files or edit existing ones. The editor views the file as just a sequence of characters (thus the 'c' in ced) with no other structure, such as a sequence of lines. Since newline ('\n') is just an ordinary character, we can easily recover the traditional line structure of a file by using ordinary edit operations. In addition, the editor maintains the notion of current point in the file. The point is regarded as being between two characters. The notion of current point is pretty close to the concept of current offset in UNIX files. At this point, we have to make a requirements decision about the user interface to the editor. For the sake of simplicity, assume that the editor has a traditional command line interface like edlin on MS-DOS systems or ed on UNIX systems (the input command stream looks like a sequence of lines). Each line consists of an optional integer prefix followed by a character. The table below associates commands with the characters that invoke them. N.B. Bracketed arguments are optional [n] g -- Move point to just before nth character (zero-based). Default value for n is 0. 
[n] p -- Print n characters starting at the first character after point, followed by a newline. n defaults to 1. Increment point by n. i -- Insert an arbitrary number of characters before point. Terminate insertion with '.' on a line by itself. [n] d -- Delete n characters starting at the first character after point. n defaults to 1. [n] y -- Paste whatever was last deleted n times just before point. n defaults to 1. w [file] -- Write out the internal representation of the file (the buffer) to the named file. The primary default for file is the filename command line argument to ced. If ced was invoked without a filename, it selects the last file written to. q -- Exit the editor. ? -- Print out useful information, like filename, point and size of file. Normally the editor scans standard input for commands. However, for flexibility, the editor should be able to get its command stream from a file or possibly some other source, like a string or a window. When the editor is invoked with an argument at the command line interface, such as prompt> ced filename the editor opens an existing file for editing or creates an empty file of that name. In either case, point is located just before the first character in the file. If the editor is called without an argument prompt> ced it manages an editing session. The user decides how to explicitly write the contents to a named file. A typical edit session might look like: 36g i hello there . g 50p w q Initial Decomposition Our task now is to identify the high-level types from the requirements, out of which we will construct the editor. Certainly File is one of these types and is used in two ways: as the file to be created or modified, and as the command stream (typically standard input from the terminal). In our description of the editor write command, we briefly mentioned the internal representation of the file under edit, traditionally known as the buffer. Is the Buffer type synonymous with the File type? 
We can answer this question more easily once we have described (the abstraction step) the public interface of both File and Buffer; namely, if the public interfaces (really, the manual pages) of two types are the same, then the types are one and the same. At the risk of getting ahead of ourselves, let's try to answer this question right now. Assume that a File object essentially has the semantics of a standard I/O FILE object (as supported by the standard run-time library of the ANSI C compiler [1]). Files and Buffers may very well share the offset or point concept. On the other hand, whatever a Buffer is, it must support the editor commands listed in the requirement section, particularly insertion and deletion. Yet, there are no native insertion and deletion operators on Files. The operation that puts a character into a file (putc( int, FILE *)) can be considered as inserting only when appending the file, not if the file offset is anywhere in the middle of the file (it will overwrite the character at the offset). This is not the behavior we are looking for. We conclude then that a Buffer is not a File, and so we must design and implement the Buffer abstraction. Now we may be able to implement Buffer in terms of File (as some implementations of the full-screen editor will do), but that is merely (yes, merely!) a matter of implementation and is not to be confused with the behavior or semantics of the Buffer. Is the editor itself a type? Even though giving the Editor a type may seem unnatural at first, we will reap the benefits already mentioned. Our design policy is clear, albeit extreme -- everything in the application is an object of one type or another. So what is the behavior of an editor object? An editor object interprets the command stream and performs actions both upon a buffer and the user interface, which for now is just standard output. 
That is, the editor coordinates three objects: the input (command) stream, a buffer, and an output stream (a view of the buffer). For simplicity of design, assume that an editor object manages precisely one buffer, which corresponds to at most one file. I say "at most" and not "precisely one" since the edit program ced can be invoked with no arguments. In such a case, the program presumably contains an editor object which manages a buffer, which currently does not correspond to any file. Later on, we will build an edit program based upon the Editor type which manages multiple buffers and files, something in the spirit of emacs. Now that we have identified the object types Editor, Buffer, and File we must now perform the design process on each of the types. We'll start with File since it is the most familiar of the types in our working list. But why even bother representing File as a class when all C++ compilers already support the standard I/O FILE structure? There are several reasons: Consistency. We want objects of all types in our application -- other than built-in types of the language -- to be represented by classes. This provides developers and maintainers a uniform feel of object orientation. The message expression object.memberfunction() will be the sole means of communicating with an object. Using a standard I/O function like putc( 'a', fp) directly on a FILE pointer (fp) would violate this desideratum. Insulation. We can regard our File type as an application-specific type layered on top of the environments's existing I/O support. This helps to make the editor more portable. When you port the editor to a new operating system, only the implementation of File need change. Other code that uses File doesn't change one iota. But we can do better. Since every C++ compiler's run-time support library contains FILE, we can just implement/layer File on top of FILE. 
Furthermore, there won't be much of a run-time penalty for this layering, if we declare all of the member functions of File to be inline! For File's public interface you need the five classic operations of a minimal interface. open -- connect the program to the named file or create it. close -- sever the connection between the program and the file. iseof -- returns true if at end-of-file, otherwise false. get -- get a character and advance the file offset. put -- put a character and advance the file offset. We can get rid of the explicit open and close member functions elegantly by using a constructor and destructor respectively. The advantage of this approach is that an instantiated File object is guaranteed to be initialized properly. Furthermore, mapping close to the destructor guarantees that when the File object dies (goes out of scope) in the program, the associated file in the file system is automatically closed, without the client programmer having to explicitly close it. The public interface of File as a C++ class is class File { public: File( char *name = "", char *mode = "r"); ~File(); Truth iseof(); int get(); void put( int c); private: // data members }; The constructor takes two arguments, and both are provided with defaults. Here are the intended semantics. The declaration File f; invokes the default constructor File( "", "r" ), which connects the object f to standard input for reading. File f( filename); invokes File( filename, "r") and so opens filename for reading. File f( filename, mode); opens filename with some mode (with the same semantics as fopen()). So, for example, File f( "foo", "w" ); opens the file foo for writing. Before we wax too lyrical about the joys of using constructors in place of an open function, we must face a design problem. Just after the constructor runs, how do we know that the file is really open? 
If the open failed for any reason (the file doesn't exist, we don't have the correct permissions, etc.), it would be nonsensical to invoke any member function against the object. One solution is to forget the constructor approach and just endow File with an explicit open function with the following form

    typedef int Truth;    // boolean type

    Truth File::open( char *filename, char *mode);

The open function could report success or failure of the operation in a manner similar to the C/C++ library functions fopen() and open() -- by returning a boolean value (the value is regarded as boolean by convention).

However, for those who want to stick with the constructor approach, here is another solution to the problem. Endow File with a member function

    // returns TRUE if open succeeded in constructor
    Truth File::isok();

whose sole purpose is to report on the status of the open performed in the constructor.

Perhaps a separate isok() function is unnecessary; iseof() can report on open. But assigning this function to iseof() is bad design for two reasons. First, checking for end-of-file is conceptually a completely separate matter from checking to see if the open succeeded. And secondly, how are we to interpret the return value of iseof() on a newly created file for writing? To play it safe, we will have two predicate functions.

The File type was originally developed to support a lexical scanner object. To make implementing the scanner easier, we included the following additional member functions in File's public interface:

    class File {
    public:
        ...
        void unget(int c);
        int peek();
        ...
    };

unget() pushes the character c back onto an input stream. c is the next character get() gets. peek() returns the value of the next character without removing it from the input stream.

As we have alluded, we can piggy-back or layer the implementation of File on top of standard I/O FILE. One easy implementation is found in Listing 1.
The only thing difficult about this layered implementation was figuring out that we needed a state data member for recording the status of the open. All the member functions, with the exception of the constructor, are one-liners.

Wrap-up

In this column we have begun applying an object-oriented design framework to the problem of constructing a text editor. Starting from a description of the editor's behavior, we have identified three types of objects: Editor, Buffer, and File. We discussed how File might be used by other types and let that guide us in identifying its public interface. We then wrote a portable implementation of File layered on top of the standard I/O FILE abstraction.

In the next column, we will continue on our journey, focusing our attention on the Buffer abstraction. In the course of designing Buffer, we will become acquainted with two useful parametric container types, Sloop[T] and Yacht[T], that will make the implementation of Buffer rather simple.

Bibliography

[1] Brian Kernighan and Dennis Ritchie, The C Programming Language, second edition, 1988, Prentice Hall.

[2] Bertrand Meyer, Object-Oriented Software Construction, 1988, Prentice Hall. (Addresses object-oriented design, including parametric types.)

Listing 1

    #include <stdio.h>

    typedef int Truth;    // boolean type

    class File {
    public:
        File( char *name = "", char *mode = "r")
        {
            if( *name )
                fp = fopen( name, mode);
            else if( *mode == 'r' )
                fp = stdin;     // no name, reading: standard input
            else
                fp = stdout;    // no name, writing: standard output
            state = (fp != NULL);   // did the open succeed?
        }
        ~File() { if( fp) fclose( fp); }
        Truth isok() { return state; }
        Truth iseof() { return feof( fp); }
        int get() { return getc( fp); }
        void unget(int c) { (void)ungetc( c, fp); }
        int peek() { int c = get(); unget(c); return c; }
        void put( int c) { putc( c, fp); }
    private:
        FILE *fp;
        int state;
    };

Questions & Answers

Readability, Portability, And Coding Style

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee.
He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707. Q I would appreciate your comments on the following questions and problems: 1. Type char: signed or unsigned? Most compilers consider chars as signed by default. We, European users, make extensive use of ASCII codes above 127 and the signed chars default does not seem to be the best choice. Which mode, in your opinion, is "better"? Why are constant chars considered as ints? The following: char c = 'é'; if (c == 'é') will work only if default char is unsigned. Otherwise, a cast to (char) is necessary to get the program to work, yet the constant é is clearly a char, not an int. 2. Good use or abuse of #defines and typedefs? What does one think of the current practice of #defineing or typedefing native C types, like char into BYTE, unsigned char into BYTE or UBYTE, char * into TEXT, int into COUNT, int into BOOL, etc. Is there really a reason for this (except (sometimes!) for portability, of course)? There is no such things (as far as I know) in the standard library header files! Moreover, when strictly prototyped programs are compiled the result is generally a long list of type mismatch errors (often pointer mismatches between (char *) and (unsigned char *)). 3. New C programming style What do you think of the 'new' (?) C style programming, à la PASCAL, with (long) identifiers mixing lowercase and uppercase and banishing the underscore? Thanks for your opinion and sincerely yours, Hubert Toullec Angers, France A In the ANSI C committee meetings, there was considerable discussion as to whether a particular feature of the language should be made right or whether backward compatibility should be preserved, to avoid "breaking" existing programs that used documented features of the language. 
If George Burns (in "Oh God") remade the world from scratch, he "would make the avocados with smaller seeds"; judging from the committee's discussion of this topic, remaking C is much more complex. Several features were left unchanged for the sake of backward compatibility, including the priority of the operators (even though some of the bitwise operators could be used more comfortably if the priorities were modified). Similarly, the type of plain chars was specifically left unchanged and thus remains unspecified (i.e. not specifically typed as signed or unsigned).

I agree with you that unsigned chars are more useful. I sometimes use the char type to hold small integer values, but they are usually non-negative integers.

Values of type char have been converted to int in expressions since the early days of the language. That eliminates having separate rules for character arithmetic. Character constants should be treated the same way (signed or unsigned) as character variables.

Note that standard ASCII includes only seven-bit characters, so none of its values have the high-order bit set. The C language does not specify that programs must run if you include non-ASCII characters. (Actually it specifies exactly which source characters are acceptable, but that basically is the ASCII set.) With your example,

    char c = 'é';
    if (c == 'é')

you have used a character that is not specified as being standard. The compiler is not even obliged to compile the code. If you used the octal or hexadecimal escape sequence to represent the character, then the compiler would treat it as a regular character constant.

I compiled the program in Listing 1 with Quick-C and ran it, with one unexpected result. The results were:

    Unequal -118 138
    (char) Equal -118 -118
    Hex Equal -118 -118
    Hex (char) Equal -118 -118

Notice that the compiler treated both the char variable and the hexadecimal character constant as signed. However, it treated the non-standard character constant as an ordinary non-negative integer value (138).
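The mismatch in the first result line can be sidestepped by comparing through unsigned char. The following is a minimal sketch, not code from the column: the helper names plain_char_is_signed and char_has_code are hypothetical, and the sketch assumes 8-bit chars.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type on this compiler. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Compare a stored char with a character code in the range 0..255.
   Converting through unsigned char removes the dependence on whether
   plain char is signed. (Hypothetical helper, for illustration only.) */
int char_has_code(char c, int code)
{
    return (unsigned char)c == code;
}
```

With a helper like this, the test from the letter could be written as char_has_code(c, 0x8A), which gives the same answer whether the compiler treats plain char as signed or unsigned.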
Some compilers provide a compile-time switch that controls whether plain char variables are treated as signed or unsigned. You might try using one that has such a switch.

On your next question, I am strongly in favor of using typedefs to define logical data types. Using typedefs is preferable to using #defines for consistency's sake, as there are many types which cannot be described in terms of a #define. Declaring variables with typedefs captures a significant amount of information for the maintenance programmer. Unfortunately the C standard, in my opinion, does not go far enough in checking the use of typedefs. My favorite illustration is:

    typedef double SPEED;
    typedef double TIME;
    typedef double DISTANCE;

    SPEED compute_speed(time, distance)
    TIME time;
    DISTANCE distance;
    {
        SPEED speed;
        if (time != 0.0)
            speed = distance / time;
        else
            speed = 0.0;
        return speed;
    }

and in another program:

    SPEED car_speed;
    TIME car_time;
    DISTANCE car_distance;

    car_speed = compute_speed(car_time, car_distance);
    car_speed = compute_speed(car_distance, car_time);

Under the ANSI standard, both of these function calls are accepted, because all three typedefs are simply other names for double, but logically the second one is erroneous. Some super lint or the compiler itself may one day use the typedef information for error checking.

I agree that there is a problem with the type checking performed when comparing or assigning unsigned char pointers and regular char pointers. This problem is most irritating when it forces you to write the declaration

    unsigned char *string = "ABC";

with a cast, as:

    unsigned char *string = (unsigned char *) "ABC";

The ANSI committee debated whether it would be okay to not require such a cast in an initialization statement, but decided that consistency in typing was more important.

Of course, I strongly urge using full names for the type names, e.g. BOOLEAN instead of BOOL, etc.

On your final question, I am in favor of readable and meaningful variable and function names.
Some people may have heard of studies that conclude otherwise, but ALongVariableName appears less readable to me than a_long_variable_name. The latter appears closer to what you would expect to read in normal text.

How much you should use abbreviations in naming is an open issue. The more abbreviations you use, the more you will have to remember and the more the maintenance programmer will have to infer and comprehend when reading the program. For example, XMT for transmit and TX for transaction may be common, but does CMP stand for compare or compute?

Q I am developing a simulation program for study of our company's manufacturing plant using C language compilers on an IBM PC/AT machine. I shall be thankful to you for sending information on various software tools in the C language for incorporating graphics in the program.

P.K. Gupta
Gujarat, India

A The only package with which I personally have extensive experience is Essential Graphics by South Mountain Software, Inc., 76 So. Orange Avenue, South Orange, NJ 07079 (201) 762-6965 ($299 list, $230 street). You can distribute products built with Essential Graphics royalty-free, and you can use direct coordinates (your x,y values specify an exact pixel location) or world coordinates (your x,y values are transformed into a pixel location), the latter at some price in speed.

The names in this package are somewhat unintelligible, since the developers tried to keep every name within eight characters. For example: grbx draws a box, grwx draws an x at a point, hsrect draws a rectangle with a hatch style and a label. As I mentioned above, I would prefer something like graph_box, graph_write_x, and hatch_rectangle_with_label.

Essential Graphics also supports loading and saving PC Paintbrush .PCX files.

There are several other packages on the market, including Halo Graphics and Advantage Graphics. Perhaps some of our readers may have comments on these or other packages.
Reader Responses:

Commodore 128

In the May 1989 issue of The C Users Journal, I took note of the questions by Mr. David Ockrassa regarding printing special characters such as the braces, vertical bar, and tilde on the Commodore 128. Before I started programming the Amiga in C, I dealt with the same problem.

The problem is twofold. Because these characters are not in the standard font set of the Commodore 128, the C language packages for that machine generally include an editor that redefines several character bitmaps to stand in for the missing ones. These are saved with the file as non-ASCII bytes. The problem occurs when the file is printed, because the codes used for the redefined characters may not map to the same characters in the font set of the printer being used.

The solution is to write a small printer utility in C. The accompanying code (Listing 2) accomplishes this task, and is available on most commercial bulletin boards. I wrote several printer drivers of this type for the Commodore 128 for use with different printers; they have a few more features than the included code, such as pagination and filename/date headers.

John D. Clark
St. Louis, MO

MS Dynamic Data Exchange:

This letter is in response to Ken Libert's request for material concerning MS Dynamic Data Exchange. If you contact Microsoft's product support services and ask for Windows Software Development Kit support, you can request their Application Notes concerning Dynamic Data Exchange. With this publication you get a disk complete with examples and source. The DDEAPP example allows you to initiate a session with Excel and actually exchange cell data in multiple formats.
Tim Kuntz
University of Pittsburgh

Listing 1

    #include <stdio.h>

    main()
    {
        char c = 'é';
        char c1 = '\x8A';

        /* Plain constant: its type is int, so the value may not be sign-extended. */
        if (c == 'é')
            printf("\n Equal %d %d", c, 'é');
        else
            printf("\n Unequal %d %d", c, 'é');

        /* Same constant, cast to char before the comparison. */
        if (c == (char) 'é')
            printf("\n (char) Equal %d %d", c, (char) 'é');
        else
            printf("\n (char) Unequal %d %d", c, 'é');

        /* Hexadecimal escape for the same character code. */
        if (c == '\x8A')
            printf("\n Hex Equal %d %d", c, '\x8A');
        else
            printf("\n Hex Unequal %d %d", c, '\x8A');

        /* Hexadecimal escape, cast to char. */
        if (c == (char) '\x8A')
            printf("\n Hex (char) Equal %d %d", c, (char) '\x8A');
        else
            printf("\n Hex (char) Unequal %d %d", c, '\x8A');
    }

Listing 2

    /* Printer driver for Gemini 10x */
    #include <stdio.h>

    main(argc, argv)
    int argc;
    char *argv[];
    {
        unsigned int count;
        FILE infile, outfile;   /* file numbers in this Commodore 128 C dialect */
        int c;                  /* int, so that EOF can be told apart from data */

        /* Commodore-style open: file 5, device 4 (printer), secondary address 7 */
        outfile = 5;
        open(outfile, 4, 7, " ");
        for(count = 0; count < argc; count++)
        {
            infile = fopen(argv[count], "r");
            while((c = getc(infile)) != EOF)
            {
                /* Map the editor's redefined characters to their ASCII codes. */
                switch(c)
                {
                case '{':  c = 123; break;
                case '}':  c = 125; break;
                case '\\': c = 92;  break;
                case '~':  c = 126; break;
                case '|':  c = 124; break;
                case '_':  c = 95;  break;
                default:
                    if(islower(c))
                        c += 32;
                    else
                        c -= 128;
                }
                putc(c, outfile);
            }
            close(infile);
        }
        close(outfile);
    }

New Releases

Prolog And 'Curses' Added To Library

Kenji Hino

New Releases

CUG297 -- Small Prolog

Henri de Feraudy (France) has submitted a public domain Prolog interpreter. His Small Prolog follows a Cambridge syntax (LISP-like syntax) that has advantages for meta-programming and small code. Small Prolog includes most of the standard built-in predicates, based on Clocksin and Mellish's descriptions in Programming in Prolog, and it can be extended by creating more user-defined built-ins. The disk includes C source files, make files, documentation, and many Prolog example files that demonstrate Prolog features for C programmers who may be unfamiliar with Prolog. The source code is very portable and will compile under Turbo C v1.5 and Mark Williams Let's C v4 on PC clones, Mark Williams C v3.0 and Megamax Laser C on the Atari ST, and the Sun C compiler on the Sun-3.

CUG298 -- PC Curses

Jeffrey S.
Dean has contributed PC Curses, v0.8. This shareware release of PC curses is a C window functions library designed to provide compatibility with the UNIX curses package. By fully utilizing the PC features, this package is coded much more simply than the UNIX version. For example, there is no need for cursor motion and screen output optimization on the PC. Currently, there are two major versions of the curses database under UNIX; one is termcap, the other terminfo. PC curses derives primarily from the former version, with some features of the latter version. Moreover, additional routines (not in the original curses package) are provided for the PC user. The distribution disk includes a couple of demo programs, Small and Large model libraries for the Microsoft C v5.0 and Turbo C v1.5 compilers, and documentation that describes all the functions in the library. The source code can be obtained by paying a $20 fee directly to the author.

Updates

CUG220 -- Window BOSS

Phillip A. Mongelluzzo (CT) from Star Guidance Consulting has submitted Revision 07.01.89 of The Window Boss. This release provides additional data entry routines along with support for user-defined physical sizes (i.e. 43 and 50 line EGA/VGA screen sizes).

CUG198 -- MicroEmacs Source

William Bader has extensively updated a text editor, MicroEmacs v3.9. His update includes not only bug fixes of the old version, but also additional commands, portability improvements, and performance enhancements. The new features of MicroEmacs are built-in emulation of the DEC EDT editor, support for VT100/VT200 keypads, function keys and scrolling regions, better VMS support such as a filter-buffer command and preservation of record format attributes, extra commands such as insert a C format octal escape sequence and scroll the screen horizontally, a callable interface to Emacs (you can call Emacs as a function), VMS subshell routines, support for ANSI color, BINARY mode for MS-DOS, pull-down menus, and more.
The enhancements include a faster search routine, faster lookup for normal keys and FNC macro, faster display routine. Bader has tested the new version of MicroEmacs using the following compilers and operating systems: VAX11c under VMS4.1 on VAX-11/750, Microsoft C 5.0 under MS-DOS 3.20, Turbo C 1.5 under MS-DOS 3.20, CI86 2.30J under MS-DOS 3.20, Microsoft C under XENIX 386, cc under SunOS 3.5 on Sun 3/360C, cc under SunOS 4.0 on Sun 386i, cc under BSD 2.9 on PDP-11/70. In order to create an executable code for your environment, you need to turn on/off the switches of Machine/OS definitions, Compiler definitions, Terminal Output definitions, and Configuration options in the header file, estruct.h. The distribution setting is to compile under MS-DOS using Turbo C. On The Networks How To Get Net Software Sydney S. Weinstein Sydney S. Weinstein, CDP, CCP is a consultant, columnist, author and President of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320. First, an introduction, and a thank you. I am the new "Contributing Editor" of the "On The Networks" column. I have written before for The C Users Journal so, hopefully I won't be a total stranger to you. And, as David Fiedler said in the last CUJ, I am the Elm coordinator. (Elm, itself, is a large piece of freely distributable software.) I can be reached at syd@DSI.COM, for those with Internet access, or at {bpa, vu-vlsi}!dsinc!syd for those without Internet access. I don't plan any change in the scope or content of this column. I will attempt to report on the latest freely distributable software available on Usenet and the Internet. 
Also as David did, I am willing to forward a list of neighboring sites for access, provided you send me a self-addressed, stamped envelope. If you have net access but need a news neighbor, I will reply to electronic mail asking for nearby news sites.

To David Fiedler, a well-earned thank-you for his two-year tenure in this spot. Many megabytes of useful software were highlighted here. His tireless attempts to find neighbors for those sites that requested them are also gratefully appreciated. It was with his help that our site found its first news neighbor several years ago. However, I highly doubt I can keep up with his run of puns.

Some Definitions

For the past two years, the terms Usenet, Internet, internet, and "the net" have been bandied about in this column. I would like to add a new one: "freely distributable software." Some definitions are in order.

Usenet, often referred to as "the net", is a loose collection of cooperating computers. In the past, all of Usenet ran UNIX, but now with other computers and operating systems supporting UUCP, hosts could be running anything from MS-DOS to VAX/VMS. All that is required to be considered a computer on Usenet is that you communicate with another computer via UUCP (UNIX-to-UNIX Copy). Usenet consists of electronic mail, file transfers, and network news. It is via network news that most of the programs you read about in this column are distributed.

If your computer talks to Usenet or to another computer via some protocol other than UUCP, you are considered to be on an internet (lower case "i"), short for inter-network. This just means that you are using some network other than the UUCP-based Usenet. This generic internet includes "the Internet" and several other networks such as CSNET and BITNET. The actual connection to Usenet is via a gateway computer that talks to both the network you use and Usenet.
The Internet (capital "I") is the computer network loosely managed by the Network Information Center at SRI. The Internet is a collection of networks that grew out of the Defense Department's ARPANET (Advanced Research Projects Agency Network). Usenet sites make phone calls to other computers; the Internet is mostly machines connected with dedicated leased lines. These lines usually run faster than the dial-up lines used by UUCP.

The Internet has many sub-networks associated with it, including NSFNET, the National Science Foundation Network. These newer networks run at much higher speeds and currently also pick up a lot of the long distance traffic for Usenet's network news. In my area, the local NSFNET-related network is called PREPnet. It has a backbone consisting of 1.544Mb/s (million bits per second) data links, and each site has either a 1.544Mb/s or a 56kb/s (thousand bits per second) hookup to the network. The main NSFNET backbone is now all 1.544Mb/s data links and is quickly upgrading to 45Mb/s data links as they become available.

Whereas only mail and news are usually available over Usenet via UUCP, the Internet runs the TCP/IP protocols and supports news (NNTP, Network News Transfer Protocol), mail (SMTP, Simple Mail Transfer Protocol), remote logins to any computer on the network provided you have an account there (telnet), remote file transfer (FTP, File Transfer Protocol), and many other services. All of these services coexist and work in real time.

The problem with the Internet providing much of the bulk transfer for Usenet is that the two use different addressing methods. Since a large amount of the software mentioned in this column comes from Usenet or the Internet, you'll need to understand how to format the two types of addresses.

A UUCP or Usenet address is made up of site names separated by exclamation points, as in bpa!dsinc!syd.
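To make the structure of such an address concrete, here is a minimal sketch in C that picks a bang path apart. The function names bang_path_user and bang_path_hops are hypothetical and are not part of any UUCP software; the sketch only assumes that the address contains site names and a final user name separated by '!'.

```c
#include <string.h>

/* Return the user part of a bang path: the text after the last '!'.
   If the address contains no '!', the whole string is the user. */
const char *bang_path_user(const char *addr)
{
    const char *last = strrchr(addr, '!');
    return last ? last + 1 : addr;
}

/* Count the sites named in the path, i.e. the number of '!' separators. */
int bang_path_hops(const char *addr)
{
    int hops = 0;
    for (; *addr != '\0'; ++addr)
        if (*addr == '!')
            ++hops;
    return hops;
}
```

For bpa!dsinc!syd, bang_path_user yields syd and bang_path_hops yields 2: the mail passes through bpa and then dsinc before reaching the user.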
If a site wants to mention more than one "well-known site" to use as a route, it usually lists them in curly braces as in {bpa, vu-vlsi}!dsinc!syd (meaning you can use either bpa!dsinc!syd or vu-vlsi!dsinc!syd). Such addresses assume that you know the complete path from your site to one of the named "well-known sites". Some systems run programs to help with this routing, and Usenet's UUCP Mapping Project publishes maps to automate this process. However, not all sites have registered to be listed in these maps. Registration is free and accomplished by sending your entry to rutgers!uucpmap. The maps are continuously updated and distributed via the comp.mail.maps news group. On the Internet, all sites have a unique "Fully Qualified Domain Name" which is administered by the NIC. My site's domain name is node.DSI.COM, where node is the individual computer at my site. Thus, my full current address is syd@dsinc.DSI.COM, but our mailer, like the mailers at a lot of Internet sites, is smart and knows how to forward the mail to me even if you send it to syd@DSI.COM. This allows me to move around within the DSI.COM domain without having to tell everyone a new address. The Internet does not require users to know the complete path to the site; it is sufficient to know the domain name. Now a word of warning. Mixing both @ and ! in the same address leads to trouble. Not everyone follows the standard and processes the addresses correctly. Converting sitea!user@DSI.COM to a UUCP address would produce dsinc!sitea!user. Note that the @ has higher precedence than the !. Many sites get this wrong, causing your mail to bounce (be returned to you as undeliverable). Some sites, ours included, allow UUCP mail to have addresses including domain names in the ! path, as in dsinc!host.domain.type!user. Where allowed, this convention is usually more reliable than mixing the ! and @s. Lastly, what is Public Domain Software and what is Freely Distributable Software? 
Much of the software described in this column is free in that no licensing fee is required for personal users. In some cases even commercial users aren't required to pay a licensing fee. However, almost all of the software mentioned in this column is not in the Public Domain. For software to be in the Public Domain, either the copyright must expire (and not be renewed) or the authors must specifically renounce copyright protection.

The copyright to most software mentioned in this column is reserved by the author or some group. Though the copyright is reserved, the holders have given the user the right to use and distribute the software without fee. This does not place the software in the public domain. You still cannot sell this software nor pretend that you wrote it. Many of the licensing agreements restrict how the software can be used for business purposes.

Freely Distributable Software is also different from Shareware. Shareware expects (but doesn't require) the user to pay a fee if they intend to continue using the program. Freely distributable software does not.

Now, how do you get the software mentioned in this column? Much of it is distributed in Usenet's network news, especially in the comp.sources.unix or the comp.sources.misc news groups. Game software is in the comp.sources.games group. There are also groups for Amigas, Atari STs, Macs, Suns, and computers running the X Window System. The Usenet news groups are distributed via a store-and-forward broadcast from Usenet neighbor to Usenet neighbor via either UUCP or NNTP. However, news articles are kept online at a particular site for only a short period of time, usually less than two weeks. By the time a piece of software appears in this column, it will long since have expired and been deleted. Thus, it is necessary to access a news archive site.

Many sites around the country have agreed to archive specific news groups.
These sites are listed in the comp.archives news group. Many sites are also identified as archive sites in their Usenet Mapping Project map entry. Some have even been listed in this column. These sites allow access to their archives to retrieve the sources. How one accesses the archives depends on where they are and how that site has set up access. Most archives are available for either FTP or UUCP access and a few even allow both. If a site supports FTP access, you need to be on the Internet to access them. FTP allows for opening up a direct connection to the FTP server on their system and transferring the files directly to your system. FTP will prompt for a user name and optionally a password. Most FTP archive sites allow a user name of anonymous. If it then prompts for a password, any password will work, but convention and courtesy dictate that you use your name and site address for the password. If a site supports UUCP access, anyone with UUCP can access the archives. Most sites of this type publish a sample entry for the Systems (L.sys) file showing the system name, phone number of their modems, the connection speeds supported, and the login sequence. Using the uucp command, one can poll the system directly and retrieve the software. Many sites post hour restrictions on when you should access the modems. Courtesy dictates that you follow their requests, and some sites enforce the limit with programs. Be sure to call far enough before the end of the period to complete your transfer in time. A third method, used for smaller files, allows access to an electronic mail-based archive server. With these sites, you send an electronic mail message to the archive server's mailbox name specifying the files you wish. The files are then returned to you via electronic mail. Remember that many sites have a limit on the size of a single mail message, so don't ask for too much at once. 
Also remember that the archive server is a program, so phrase your request exactly as specified in the instructions for that archive server, and limit your message to exactly that request. Other comments in the message could confuse the program and it might not honor your request. Lastly, for those sites not connected to any network, some sites will copy the software onto your media if you send them a disk or tape along with return postage and a mailer. Other sites sell media with the software already copied onto it. This is especially useful for the largest distributions, such as the X windowing system, which runs multiple tapes. For those sites without Internet access but who do subscribe to UUNET, UUNET will retrieve the files via FTP for you and make them available for UUCP access. And to come... Starting in February, back to more new software from Usenet's source newsgroups and news from the Internet and public access sites. If you have an archive of UUCP-accessible software and would like even more accesses to it, drop me a note via electronic mail and I'll try to get it into an upcoming column. Until then, a slight paraphrase of David's tag line: see you on the nets! PC-METRIC -- A Measuring Tool For Software Larry Versaw Larry Versaw is a systems engineer at Electronic Data Systems' Corporate Communications Division. His 1984 masters thesis was entitled Measuring the Size, Structure, and Complexity of Software. He may be contacted care of 5400 Legacy Drive, Plano, TX 75024. Have you ever wanted to compare the complexity of two programs or to tell how long it took to develop them? Have you ever needed a precise measure of programmer productivity? No one yet can produce truly reliable answers to these problems, but researchers in the 1970s invented many software metrics and have since conducted hundreds of experiments to see what information could be derived by analyzing program source code. 
Some metrics purport to measure software complexity; others gauge program size or calculate how well structured a program is. The researchers developed many static code analyzers for use in their software metrics experiments, but few such tools were commercially marketed. PC-METRIC, developed by SET Laboratories, Inc., is one of the few stand-alone software metrics programs, if not the only one, sold today.

To evaluate PC-METRIC, I tried it out on 80 source files containing 25,000 lines of working C code. That exercise proved PC-METRIC to be a reliable product, an efficient program measurement tool that would be indispensable to anyone wishing to use software metrics in his work.

The Product

For this article I evaluated v1.1, then v2.3 of PC-METRIC. In addition to the C language versions which I examined, SET Laboratories has produced metrics programs for Ada, Assembler, COBOL, FORTRAN, Modula-2, and Pascal. Some languages are supported on systems other than MS-DOS.

PC-METRIC specializes in static code analysis; that is, it reports certain quantifiable attributes of program source code without executing it. These attributes include number of source lines, number of executable statements, and a dozen other quantities which are derived by counting certain program elements. Software metrics experiments have usually shown correlations between these kinds of metrics and actual, observed software management factors, such as programmer skill, number of remaining bugs, and actual programming effort.

PC-METRIC is based on the work of several of the pioneers in software metrics, notably Tom McCabe and Maurice Halstead. McCabe [McCabe 1976] proposed a measure of program control flow complexity based on a program's directed control flow graph. This metric, called cyclomatic complexity, may be calculated as one plus the number of branches (if statements, loops, alternatives in case statements) in a program.
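The "one plus the number of branches" rule can be approximated with a very small keyword counter. The following is a rough sketch under stated assumptions, not PC-METRIC's counting strategy: the function name rough_cyclomatic is hypothetical, only the keywords if, for, while, and case are counted, and no attempt is made to skip comments, string literals, or preprocessor lines.

```c
#include <ctype.h>
#include <string.h>

/* Nonzero if the word of the given length is one of the branch keywords. */
static int is_branch_keyword(const char *word, size_t len)
{
    static const char *const keywords[] = { "if", "for", "while", "case" };
    size_t i;

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; ++i)
        if (strlen(keywords[i]) == len && strncmp(word, keywords[i], len) == 0)
            return 1;
    return 0;
}

/* Approximate cyclomatic complexity of a piece of C source text:
   one, plus one for every branch keyword found as a whole word. */
int rough_cyclomatic(const char *src)
{
    int complexity = 1;

    while (*src != '\0') {
        if (isalpha((unsigned char)*src) || *src == '_') {
            const char *start = src;
            /* Consume one identifier or keyword. */
            while (isalnum((unsigned char)*src) || *src == '_')
                ++src;
            if (is_branch_keyword(start, (size_t)(src - start)))
                ++complexity;
        } else {
            ++src;
        }
    }
    return complexity;
}
```

A function body containing one if and one while, for example, scores 3. Because a do ... while loop contains the keyword while only once, it is counted as a single branch.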
PC-METRIC reports two variants of cyclomatic complexity for each function it analyzes. McCabe's metric is widely accepted and intuitively satisfying as a complexity measure because it represents the amount of program logic that must be understood and retained to understand an algorithm. One of the most imaginative and ingenious models of software, including software size, was developed by the late Professor Halstead [Halstead 1977]. Halstead's system, labeled software physics, is ultimately based on counts of operators and operands in program source code. Several of PC-METRIC's metrics, including length, estimated length, purity ratio, volume, the effort metric, estimated time to develop, and estimated errors, are implementations of Halstead's software physics formulae. Some have seriously questioned the theoretical basis underlying Halstead's model, and Halstead's attempt to bring theory from the realm of psychology to bear on software development has been widely discounted [Coulter 1981, Perlis 1981]. On the other hand, some rather impressive correlations have been observed between certain of Halstead's metrics and such management factors as code quality, programming time, and debugging effort [Gordon 1979, Curtis 1979, Funami 1976, Paige 1980]. If you are experienced with software metrics, you may find some of your favorite metrics missing from PC-METRIC's repertoire. However, PC-METRIC supports more measures of program size and complexity than are actually needed. Most size and complexity metrics are highly correlated with each other, so that beyond the first two or three, additional size and complexity metrics are redundant. In a study which analyzed great quantities of source code written by diverse programmers in C, Ada, PL/I and Pascal, no statistically significant differences were found among the reliability of different size metrics [Versaw 1984]. They all measure the same attribute, after all. 
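As a reference for the Halstead quantities named above, the definitions commonly given in the software science literature are sketched below. All of them are computed from the same four counts, which is one reason the size metrics track one another so closely. The symbols are the conventional ones, not taken from the PC-METRIC manual, and PC-METRIC's own counting rules and constants may differ. Here $n_1$ and $n_2$ are the numbers of distinct operators and operands, and $N_1$ and $N_2$ are the total numbers of operator and operand occurrences.

```latex
\begin{align*}
n       &= n_1 + n_2                         && \text{vocabulary} \\
N       &= N_1 + N_2                         && \text{length} \\
\hat{N} &= n_1 \log_2 n_1 + n_2 \log_2 n_2   && \text{estimated length} \\
PR      &= \hat{N} / N                       && \text{purity ratio} \\
V       &= N \log_2 n                        && \text{volume} \\
D       &= \frac{n_1}{2} \cdot \frac{N_2}{n_2} && \text{difficulty} \\
E       &= D \cdot V                         && \text{effort} \\
T       &= E / 18                            && \text{estimated time to develop, in seconds} \\
B       &= V / 3000                          && \text{estimated errors}
\end{align*}
```

Some texts give the error estimate as $E^{2/3}/3000$ instead of $V/3000$; which variant a given tool reports is worth checking in its documentation.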
Variations in programming style notwithstanding, it is my belief that lines of code remains as good a measure of program size as any other measure we have today, and is almost as good a measure of complexity as any other. Research continues on the subject, but on a smaller scale than ten years ago. Installing and using PC-METRIC is simplicity itself. A user must learn only one command, CMET, which runs interactively or in batch mode. PC-METRIC is configurable to different dialects of C, by modifying a table of key words and symbols, which is stored in an ASCII text file. As PC-METRIC analyzes source code, it produces two reports. The complexity analysis report lists metrics values calculated for each function, and the combined values for the entire module being considered. In the new version of PC-METRIC, SET has remedied the worst problem with version 1.1, which was its inability to analyze units of source code larger than one file. The second report file, called the exceptions report, highlights all measured values which lie outside of predetermined, user-defined limits. Both the analysis report and exceptions report are output as ASCII files. In the current version of PC-METRIC, these reports are suitable for printing without any manual editing or reformatting. The current version of PC-METRIC provides a CONVERT utility which can convert the report data into a comma-delimited text file suitable for uploading into many spreadsheet or database packages. This is an especially valuable addition to the PC-METRIC package. Program attributes which cannot be measured by simply counting certain operators and identifiers are beyond the scope of PC-METRIC, unfortunately. These would include attributes such as the degree of information hiding, module coupling, function binding, and efficiency. If we could only measure these attributes objectively and automatically, it would greatly enhance the practice of software engineering. 
Where PC-METRIC does excel is in calculating reliably the most common size and complexity metrics with a minimum of fuss at a reasonable speed (4000 lines per minute on a 10 MHz AT type computer).

System Requirements

PC-METRIC requires far less memory and disk space than any C compiler would, so hardware requirements do not limit the use of PC-METRIC.

The Audience

PC-METRIC is intended primarily for two kinds of users. The first is software developers who would use a statistical analysis of their code as a help in identifying overly complex modules or functions. The PC-METRIC manual correctly identifies programmer feedback as an important application of PC-METRIC. The second kind of person who needs PC-METRIC is the manager or software project leader who would use software metrics as a tool to monitor programmer compliance with local standards of function size, module complexity, or other quantifiable program aspects.

Documentation

All bases are covered in PC-METRIC's three-part manual. Part 1 provides a well-written tutorial on the field of software metrics, concentrating on the specific metrics obtainable with PC-METRIC. It even includes a brief annotated bibliography of software metrics literature. Users with little prior exposure to the field of software metrics should be sure to read this part.

Part 2 describes how to install, configure, and use PC-METRIC. It also is well organized and gives the right number of examples. PC-METRIC's counting strategy is documented toward the end of this section.

Part 3, "Applying PC-METRIC", instructs users on what to do with all those numbers PC-METRIC generates. It first documents the indispensable new CONVERT utility mentioned above. Then it explains ways to interpret the results: how to properly use software measures as a feedback tool or resource estimation tool, in practice.

Support

SET Labs offers technical support by telephone for their products and will answer general questions on software metrics.
SET offers site licensing as well as individual licenses. If you have a particular machine or language for which you would like a version of PC-METRIC, SET Labs will usually do a port for the price of a single site license. Conclusions PC-METRIC is an indispensable tool, and perhaps the only tool in its class, for analyzing program size and complexity by those software metrics it provides. By cleaning up the reports and by providing the CONVERT utility, the new version of PC-METRIC has enhanced users' ability to analyze and apply program metrics. PC-METRIC applies state of the art methods for objectively measuring two basic attributes of program source code: size and complexity. The usefulness of these measures is variable, but not because of any deficiency in PC-METRIC itself. PC-METRIC, and counting programs in general, find their surest application in measuring adherence to a specific coding standard. I recommend PC-METRIC for programmers and managers as a tool for monitoring adherence to their coding standard which could, and probably should, include some complexity metrics. I recommend it also as a tool for identifying overly complex modules that need extra testing or rewriting. The list price is $199. You can contact SET Labs for more information at P.O. Box 86327, Portland, OR 97283 (503) 289-4758. References Coulter, Neal S., Applications of Psychology in Software Science, Proceedings of IEEE COMPSAC 81, (1981), 50-51. Curtis, Bill; Sheppard, Sylvia; and Milliman, Phil, Third Time Charm: Stronger Prediction of Programmer Performance by Software Complexity Metrics, Proceedings of Fourth International Conference on Software Engineering, (1979), 356-360. Funami, Y., and Halstead, M.H., A Software Physics Analysis of Akiyama's Debugging Data, Proceedings of the Symposium on Computer Software Models, (1976), 133-138. Gordon, Ronald, A Quantitative Justification for a Measure of Program Clarity, IEEE Transactions on Software Engineering, IV (March 1979), 121-128. 
Halstead, Maurice, Elements of Software Science, New York, Elsevier, 1977. McCabe, T.J., A Complexity Measure, IEEE Transactions on Software Engineering, II (December 1976), 308-320. Paige, M., A Metric for Software Test Planning, Proceedings of IEEE COMPSAC 80, (1980), 499-504. Perlis, Alan J.; Sayward, Frederick G.; and Shaw, Mary, editors, Software Metrics: An Analysis and Evaluation, Cambridge, Massachusetts, MIT Press, 1981. Versaw, Larry, A Tool for Measuring the Size, Structure and Complexity of Software, thesis, Denton, Texas, North Texas State University, 1984. GRAD Graphics Library Ron Burk and Helen Custer Ron Burk has a BSEE from the University of Kansas and has been a programmer for the past 10 years. He is currently president of Burk Labs, a small software consulting firm. Helen Custer holds degrees in Computer Science, English, and Psychology from the University of Kansas and is currently a Senior Software Technical Writer for a Fortune 500 company. She has coauthored books on C, GW-BASIC, and Z-BASIC. Both may be contacted at Burk Labs, P.O. Box 3082; Redmond, WA 98073-3082. The GRAD Graphics Library, written by Conrad Kwok, is a shareware package for drawing simple graphics images, including circles, lines, ellipses, arcs, and rectangles. It can also fill regions, display characters, and dump screen graphics to your printer. The 50-odd graphics functions are carefully documented in a 100-page user's manual. The functions are written for PC/XT/AT clones using Microsoft C v4.0. The package is written in Microsoft C and 8088 assembly language. GRAD can also be compiled with Turbo C, and directions for doing so are included with the disks; a few minor changes are required. GRAD supports both CGA (640 x 200) and HGA (720 x 348) graphics cards, but unfortunately, it only supports one device at a time. You link with the library that corresponds to the device you want to use; there is no auto-detect of the graphics card adaptor. 
The routines are modularized in such a way that it may be possible to make them work with other graphics devices by changing one or two files of source code. However, a graphics device is not absolutely necessary, as GRAD allows you to define up to nine virtual graphics screens at run time. GRAD also supports several printers, including the Epson FX-80, the Okidata ML192, and compatibles, or laser printers using the JLASER card. You can also configure other printers to work with GRAD. The GRAD user's manual and assorted documentation files thoroughly document the functions that are available in GRAD. The writing is friendly and, in addition to GRAD, the files document a number of concepts relating to graphics libraries in general. They describe how fonts are viewed by a graphics package, how to use a graphics coordinate system, and what a virtual graphics screen is, among other things. Example code is provided for functions that are difficult to describe. Pixels Vs. Lines The graphics screen on most personal computers is pixel-oriented; it is made up of dots that you can turn on or off. A pen plotter, on the other hand, is line-oriented; everything it draws is made up of line segments. The GRAD library is oriented towards pixel graphics. For example, it supports the ability to "grab" a rectangular portion of the screen and transfer it to another part of the screen. That sort of operation could not be implemented with a pen plotter. You could, however, use GRAD as a PC device driver for a more general set of line-oriented library functions. GRAD supplies almost all of the primitives you would need for such a project. Also, most printers support pixel graphics, so they can serve as hard-copy devices for programs that use pixel graphics. If your printer is similar to the Epson FX-80 or the Okidata ML192, you can adapt the software to work with your printer. An appendix at the back of the manual documents that process. 
In the general case, however, you may have to buy the source code from the author to make GRAD work with your printer. Standard Transformations In some kinds of graphics, you find yourself drawing the same basic symbol in slightly different ways (different proportions, different locations on the screen, and so on). Three types of transformations of graphics are commonly supported by high-level graphics libraries: Translation--Moving a graphic element (a square, for example) to a new location. Scaling--Making a graphic element appear shorter and fatter, or taller and thinner. Rotation--Turning a graphic element around an axis. GRAD supports graphics translation by allowing you to change the value for the upper left corner, or "origin", of your frame. For example, if you wish a graphic element to appear multiple times in your final drawing, you can create a subroutine that draws the element, then call that subroutine multiple times. Between calls to the subroutine, you simply change the origin for the element. GRAD does not support graphics scaling or rotation. If you want to draw the same symbol with different heights or widths, you must implement the scaling with your own code. Likewise, if you want to rotate a graphic image so that it appears sideways or upside-down, you'll have to write your own code to do this. One reason you might want to do a transformation such as scaling is to solve the problem of aspect ratio. Aspect ratio is the ratio of a pixel's height to its width. GRAD assumes that each pixel is square, the same height as width. However, on a typical CGA monitor, each pixel is rectangular instead of square, that is, its aspect ratio is not 1:1. The aspect ratio problem becomes very clear when you ask GRAD to draw a circle on a CGA monitor. It draws a true circle, but because the pixels are not square, the result on the screen is a "stretched" circle (an ellipse). 
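To put numbers on the stretch, here is a minimal sketch in C. It assumes a 4:3 monitor whose picture fills the whole tube, and the names pixel_aspect() and round_y_radius() are hypothetical helpers, not GRAD functions. The first computes the pixel aspect ratio (height to width) from the screen resolution; the second gives the vertical radius, in pixels, that makes an ellipse of a given horizontal radius look round. Only integer arithmetic is used.

```c
#include <assert.h>

/* Pixel aspect ratio, height : width, kept as an integer fraction. */
struct aspect {
    long height;
    long width;
};

/*
 * Hypothetical helper (not part of GRAD): derive the pixel aspect ratio
 * from the resolution (xres by yres) and the shape of the visible picture
 * (disp_w : disp_h, assumed here to be 4 : 3 for a typical monitor).
 * One pixel is disp_w/xres wide and disp_h/yres high, so
 * height/width = (disp_h * xres) / (disp_w * yres).
 */
struct aspect pixel_aspect(int xres, int yres, int disp_w, int disp_h)
{
    struct aspect a;

    assert(xres > 0 && yres > 0 && disp_w > 0 && disp_h > 0);
    a.height = (long)disp_h * xres;
    a.width  = (long)disp_w * yres;
    return a;
}

/*
 * Vertical radius, in pixels, of an ellipse that looks like a circle
 * whose horizontal radius is x_radius pixels. Rounds to nearest.
 */
int round_y_radius(const struct aspect *a, int x_radius)
{
    assert(a->height > 0 && a->width > 0 && x_radius >= 0);
    return (int)((x_radius * a->width + a->height / 2) / a->height);
}
```

Under these assumptions, a 640 x 200 CGA screen has pixels 2.4 times as tall as they are wide (1920:800), so a shape 120 pixels wide on each side of its center needs only 50 rows above and below the center to look round.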
In a line-based graphics library, this problem can be solved by applying the appropriate scaling transformation just before translating the line into pixels. In GRAD, however, there isn't much you can do except take the problem into account in your code and draw a rectangle to get a square, an ellipse to get a circle, and so on. Virtual Screens A virtual screen is just like the real screen in every way--you just can't see it. Suppose you want your graphics program to have the ability to undo the last drawing request the user made. One way to accomplish the visual part of this task is to use a virtual screen. For each user request that is not an undo request, you first perform the previous request on the virtual screen, then perform the new request on the real screen. If the request is an undo, you could simply copy the virtual screen to the real screen. GRAD provides virtual screens which it calls "frames". A frame is a rectangular memory area where a graphic image is stored. If the memory area corresponds to video memory, then the graphic is visible on the screen. If the memory area is regular memory, the frame is a virtual graphics screen. A graphic image created in this area can only be seen by dumping it to the printer or by copying it to the video memory. Frames are especially useful for windowing operations, as described in the following section. Drawing Attributes GRAD allows you to specify line styles and writing modes. Normally, when you draw a line across the screen, you get a solid line. A line style, however, allows you to specify that all lines are dotted lines, or dashed lines, or almost any pattern of dots and dashes you like. Another drawing attribute that GRAD lets you specify is the writing mode. On a pen plotter, a line is a line--you can never erase an existing line. On a graphics screen, however, there are several interesting possibilities. Usually, you want the screen to look like it would on a pen plotter. 
This is called OR mode, since it is accomplished by bitwise ORing the pixels to be drawn with the screen pixels' existing value. GRAD also supports an XOR mode and an AND mode. The XOR mode can be used to "erase" lines, because if you draw a line in OR mode and then redraw the line in XOR mode, the line disappears. This isn't perfect, however; if there is a second line on the screen that intersects the first one, it will have a "hole" in it, because the pixel where the two lines intersected is turned off. You can also use XOR mode to achieve a kind of reverse-video effect, by turning on a block of pixels, switching to XOR mode, then drawing on the block. Drawing lines in AND mode doesn't make much sense, because the only pixels that will get turned on are those that were already on. In other words, it will look as though nothing got drawn. AND mode is useful for Bit-Block Transfers, however. Bit-Block Transfers, or bitblts (pronounced "bitblits"), are at the heart of windowing systems that operate in graphics mode. For example, moving a window from one place to another is a bitblt operation; so is removing a window (copying a block of background pattern to it). GRAD provides basic bitblt operations that allow you to transfer blocks between virtual screens and to and from files. GRAD's bitblt operations obey the current writing mode, so you can combine the block transfers with the bit-wise modes to do things like erase a window or cause a window to appear in reverse-video. Clipping Clipping is the ability to restrict graphics output to a specific (usually rectangular) region of the screen. For example, if you are using an inch-high strip along the bottom of the screen to display status information about your program, you want to ensure that no other part of your output strays into that area. If your graphics library supports clipping, you can define a clipping rectangle. 
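As a rough illustration of what a clipping rectangle does, the following C sketch tests points and trims horizontal runs of pixels against a rectangle with inclusive edges. The type and function names are hypothetical and are not taken from GRAD; a real library applies the same kind of test inside its own drawing primitives.

```c
#include <assert.h>

/* Clipping rectangle with inclusive edges, in pixel coordinates. */
struct clip_rect {
    int left, top, right, bottom;
};

/* Return nonzero if the pixel (x, y) lies inside the clipping rectangle. */
int clip_contains(const struct clip_rect *c, int x, int y)
{
    return x >= c->left && x <= c->right &&
           y >= c->top  && y <= c->bottom;
}

/*
 * Clip the horizontal run of pixels *x1..*x2 on row y.
 * Returns 0 if nothing is visible; otherwise returns 1 and narrows
 * *x1 and *x2 to the visible part of the run.
 */
int clip_hline(const struct clip_rect *c, int y, int *x1, int *x2)
{
    assert(*x1 <= *x2);
    if (y < c->top || y > c->bottom || *x2 < c->left || *x1 > c->right)
        return 0;                 /* entirely outside the rectangle */
    if (*x1 < c->left)
        *x1 = c->left;            /* trim the left end */
    if (*x2 > c->right)
        *x2 = c->right;           /* trim the right end */
    return 1;
}
```

For the status-strip example, a 640 x 200 screen might reserve rows 180 to 199 for status and use the rectangle {0, 0, 639, 179} for drawing; a run on row 185 is then rejected outright, while a run from x = -10 to x = 700 on row 50 is trimmed to 0..639.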
Your program can then continue to draw anywhere it likes, but only that portion of the drawing that lies within the clipping region appears. GRAD allows you to specify a single, rectangular clipping region called a "window". There is no on/off function to disable or enable the defined clipping region. Instead, GRAD supplies a ResetWin() function that redefines the clipping region to be the entire virtual screen (which effectively turns clipping off). Drawing Text Whether you are drawing business charts or flowcharts, you inevitably need to display text along with your graphics. There are two general ways to draw text in graphics mode on a pixel-oriented device. Bitmapped fonts are the kind you normally see in text mode on a PC screen. As the name implies, they are defined in terms of a set of bits that are on or off, each bit corresponding to a pixel of the overall character. Bitmapped fonts are easy to define and fast to display, but difficult to scale up and down in size, and difficult to clip except on character boundaries. Stroke fonts, on the other hand, are stored as line segments and, therefore, can usually be scaled up and down in size, stretched in any direction (to form slanted text, for example), and even rotated to arbitrary angles. GRAD has no stroke fonts but supports bitmapped fonts. These fonts can be stored on files and loaded into memory dynamically, as needed. This is useful when you want to use many fonts, but don't want to consume a lot of memory. You can get the effect of rotated fonts (you can get one of four, 90-degree rotations) by using a specially rotated font file. There are 18 font files on the GRAD disk. Although most of these are variations on a couple of fonts, they provide good examples of what you can do. You can also make bitmapped fonts that have variable width. This looks more professional, especially with larger fonts. GRAD also supplies a graphics input function that reads from the keyboard. 
This is very handy when you need to query the user while you are drawing graphics, since you will want the keyboard input to be echoed on the screen with graphic text. Remember that just calling gets() probably won't produce the desired result when the screen is in graphics mode. Picture Segments If you are writing a program that allows the user to manipulate the graphics drawn on the screen, you may want to provide a way for them to control units of the picture more complicated than individual pixels or lines. For example, the user of an architectural program may want to move an entire wall (including windows and doors) as a unit. Picture segments support this type of operation. A picture segment is just a sequence of drawing commands that you can store, retrieve, and use to draw the same object in a variety of places on the screen. GRAD does not provide picture segments as such, but it defines a Draw() function which is a step in that direction. Draw() takes three arguments: a C string containing drawing commands and two integer arguments that can be used to parameterize the commands in the string. The key feature of the graphics commands that you store in the string is that they are relative to the current drawing coordinate. For example, here is a command that draws a rectangle at the current location. Draw("RT10 DN5 LF10 UP5", 0, 0); It always draws the same size rectangle; however, it could be parameterized like this: Draw("RT%OX DN%OY, LF%OX, UP%OY", 3, 10); In this case, the arguments to Draw() alter the symbol that is specified in the command string. Notice that you could build up command strings, save them to files, and bring them back later -- just as you would use a symbol library. GRAD just supplies the basic command string ability, though. You would have to design your own functions to manage a symbol library. Graphics Environment If you were going to implement a symbol library, you would want the drawing of symbols to be modular. 
The symbol might draw in a different line style, graphics mode, or font, or use a different clipping window than the calling routine. A modular library would ensure that each symbol routine resets all these attributes back to their original values after the symbol is drawn. Fortunately, there is an easier solution. GRAD groups attributes like the current origin, clip region, line style, font, and so on, into a bundle which it calls an environment. The modular symbol or graphics routine can simply save the current environment before it begins drawing, and restore it after all its graphics operations are complete. Utility Programs The GRAD disk contains several utility programs, which Conrad Kwok wrote as sample programs for the library. The first program, Interp, is an interpreter for GRAD library functions. You can place a series of GRAD function calls in a file, then give that file name as an argument to Interp. Interp interprets the graphics commands and draws the resulting graphic on the screen. This is a fast way to experiment with the library, since you don't have to recompile anything to make changes to what you're drawing. The input to the interpreter mimics the analogous C functions. Whenever a particular function returns a value, you can simply write: var1 = function(val1, val2, ...) The variable that you name (in this case, var1) is created and initialized by the value returned by the function. Similarly, for functions that return values through pointers, you can type something like this: function (&var1, &var2, ...) The variable names available for use are hard-coded in the program, but the source to the interpreter is supplied, so you could easily extend it. Listing 1 shows a sample input file which draws an ellipse around the text "The C User's Journal". MPrint (Merge Print) is a variation of Interp. MPrint allows you to specify a file containing lines of text that are merged with the graphics drawn by the interpreter. 
In other words, you can print graphics in graphics mode on the printer and print the text portions in text mode, which is much faster than printing text in graphics mode. Distribution and Licensing The GRAD graphics library is a good, basic, integer graphics system. It contains a complete set of primitives which could be used as a base for a more sophisticated graphics package, a floating-point package, for example. The main disadvantage of the library is that it is not written for multiple graphics adaptors. You must compile the library for a specific adaptor. Conrad Kwok, GRAD's author, is distributing this graphics package as shareware. If you find his program useful, he requests that you send a contribution of $20 to him. If you send $20 or more, you will receive updates to the library. If you send a contribution of $60 or more, you will get the source for the latest version of GRAD, as well as a programmer reference manual which documents the internal data structures and algorithms used in the library. The source is copyrighted. The licensing terms for GRAD are as follows: You may freely copy and distribute the GRAD library and related programs provided the documentation and sample programs are not modified in any way. However, you may write additions to the library and distribute those along with the original library. You may not charge a fee for distributing the library or your enhancements to it. However, you may charge a small fee for the cost of the disk, shipping, and handling. Your program must be in the public domain and must contain a message indicating that it contains code from GRAD, written by Conrad Kwok. If your program does not meet the above requirements, you must get written permission from Conrad Kwok before distributing it. Publisher's Forum To show our appreciation for your readership and to commemorate The C Users Journal's second anniversary, we've bound a combination calendar and reference card into this issue. P.J. 
Plauger prepared the reference card. It summarizes calling conventions for Standard C library functions. Susan, our staff artist, prepared the calendar. We hope you find at least one side useful. This issue begins our third year of publishing The C Users Journal. It also marks our first issue on a monthly publication cycle. Two years ago, when we first combined The C Journal and The C Users Group Newsletter, the Journal was 72 pages and went to 6800 subscribers. This issue of 144 pages will be distributed to over 23,000 subscribers; another 5400 copies will go to newsstand distributors. The magazine and related activities now employ 16 persons -- up from about ten a year ago. To accommodate this extra staff, we've just moved into larger quarters, about two blocks from our old office. (We're all moved in, but we're not yet 100 percent functional. There are still little things missing -- like my terminal, and Donna's doorknob, and Kenji's return air vent, and ...) We think all these signs are cause to celebrate. (Well, maybe all but the moving ... that's pretty traumatic.) Since it's your interest in C that has stimulated this activity, we wanted to share the celebration with you. Unfortunately, it's difficult to coordinate a celebration involving over 23,000 persons scattered around the globe. We considered mailing you each a party favor with instructions about when to toot your whistle, but the reference card seemed more practical. If nothing else, we're always practical. (Personally, I'm celebrating by trying to catch up on some lost sleep.) We hope you like the card. We offer it with our heartfelt gratitude; thanks for reading the magazine, thanks for writing for the magazine, and thanks for advertising in the magazine. We'll be doing our best to earn your continued participation. Sincerely yours, Robert Ward Editor/Publisher New Products Industry-Related News & Announcements Oasys Offers Green Hills C++ Oasys, Inc. 
has introduced the Green Hills C++ compiler, which supports cross and native mode development. Green Hills C++ is integrated with the Oasys 680x0 and 88000 Cross Tool Kits, enabling embedded systems developers to take advantage of object-oriented techniques. Green Hills C++ supports Kernighan and Ritchie C and complies with the ANSI C standard. Green Hills C++ provides object-oriented programming features such as data abstraction, strong type checking, and overloading of function names and operators. New C++ features include classes with scope, and overloading new and arrow operators. Green Hills C++ also includes compiler optimizing techniques such as inlining, loop unrolling and register caching. Green Hills C++ compiler is available from Oasys on the Sun-3. Oasys claims that the compiler will be ported to other UNIX workstations and minicomputers soon. Oasys supports Designer C++, the C++ translator developed by Glockenspiel, Ltd. Oasys will provide current customers with the ability to upgrade to the Green Hills C++ compiler. For more information contact Oasys at 230 Second Ave., Waltham, MA 02154 (617) 890-7889; FAX (617) 890-4644. Library Brings UNIX Functions To Hercules Card Users Certified Scientific Software has announced a subroutine package that allows programmers using most PC-based UNIX systems to take full advantage of Hercules-type monochrome graphics adapters. The package includes the standard UNIX plot(3) subroutines plus many enhancements, such as patterned fills of circles, rectangles and user-defined shapes; two fonts -- 8x8 pixel and 8x16 pixel -- for labels; clipping windows; five pixel write-modes, including bit-set, bit-clear and exclusive-or; and routines to support double buffering using the Hercules adapter's two graphics pages, making animation effects possible. The subroutines use only integer code, so they will run efficiently whether or not floating-point hardware is installed. A 10-page manual and demonstration C code are included. 
The package is currently available for Interactive Systems 386/ix; AT&T System V/386; Microport System V/AT; XENIX 286 v2.2/2.3 and 386 v2.3; and VENIX v2.3/2.4. A single-user license is priced at $99, plus $2 shipping and handling. The subroutines may be licensed for incorporation in programs for resale by special arrangement. For more information or a review copy, contact Certified Scientific Software, P.O. Box 802168, Chicago, IL 60680 (312) 326-6098. Send e-mail to: UUCP: {seismo,harpo,ihnp4,linus,allegra}!harvard!certif!herc INTERNET: certif!herc@harvard.harvard.edu Screen Manager Professional Updated To Version 1.5B Logical Alternatives, Inc. has released version 1.5B of the Screen Manager Professional for C programmers. The S.M.P. is a tool box of over 150 pre-written functions for complex windowing, menu generation and interactive context sensitive help features. To maximize performance and minimize memory overhead, the windowing functions are written in assembly language. The smallest possible program size using the S.M.P. functions is approximately 7K. The menu system, on the other hand, is written in C, providing flexibility and allowing the programmer to customize the function. Other features include: keyboard filtering for data entry systems, OS and compiler independence, full video support, background processing, reconfigurable memory allocation, and a 300-page ring bound manual. This product also includes an event driven mouse support system, which makes S.M.P. comparable to a text-based Microsoft Windows programming interface. Full technical support is available including a new bulletin board for professional programmers: The LAB (814) 234-1881. The introductory price for S.M.P. v1.5B is $250 (with source code, $350). Screen Manager Professional supports Microsoft C, Borland's Turbo C, Watcom C, Lattice C, and Zortech C++. For more information contact Donald McCandless, Marketing Director, Logical Alternatives, Inc., Calder Square, P.O. 
Box 10674, State College, PA 16805 (814) 234-8088, BBS: (814) 234-1881, FAX: (814) 234-6864. TE Version 3.0 Announced Sub Systems, Inc. has released TE Developer's Kit v3.0. The new version includes a TES small window editor routine. An application program can utilize TES without programming changes to the routine. The application program passes a set of parameters which specifies the window coordinates, maximum file size and an input buffer or an input file. The output is either a buffer or a file. The TES routine supports screen scrolling functions, word-wrapping, and block commands. It requires 60K of memory and supports Microsoft and Borland C compilers. The package includes the complete source code. This version of TE Developer's Kit retains TE text editor source code and library routines from the earlier version. The package lists for $125. For more information contact Sub Systems, 159 Main St. #8C, Stoneham, MA 02180 (800) 447-6819 or (617) 438-8901. Powerline Updates Source Utilities Powerline Software, Inc. has released new versions of their programming utilities Source Print v4.0 and Tree Diagrammer v3.0. Powerline has added graphics drivers to support over 400 printers. The new features include support for many printers (including laser printers) and support for C, Pascal, and dBASE from a variety of language development companies. Both Source Print (a source code formatting utility) and Tree Diagrammer (an "organizational chart" diagrammer) are software tools for all PC programmers coding in C, C++, dBASE, Pascal, BASIC, FORTRAN, and Modula-2. For more information contact Powerline Software Inc. at their new address: 826 Douglass Street, San Francisco, CA 94114 (415) 346-8325. Emulator Mimes Xenix Console Hansco Information Technologies, Inc. has released its new terminal emulator system, HIT/Ansi. HIT/Ansi is a memory-resident program for MS-DOS compatible computers that emulates the Xenix color console. 
The program may be called up while running any MS-DOS application with a hot key so that the computer functions as a terminal to a host Xenix machine. When the hot key is pressed again, the computer returns to MS-DOS and to whatever program was running. Using less than 48K of RAM, HIT/Ansi supports color (CGA, EGA, and VGA) or monochrome systems, 12 function keys and local printers in the foreground or background through the parallel port. A descriptive brochure and demonstration diskette for the product are available upon request. For more information contact Hansco Information Technologies, Inc., 185 West Ave., Ste. 304, Ludlow, MA 01056 (800) 548-9754 or (413) 547-8991. Saber And TI Join Efforts Saber Software, Inc., developer of Saber-C has announced a joint software development agreement with Texas Instruments, Inc. Engineering teams from both companies are using Saber-C for cooperatively developing new software technology that will be used in software products TI and Saber plan to introduce in the future. Texas Instruments will also use Saber-C widely for its own internal development projects. Saber-C runs on UNIX, Sun Microsystems Sun-3, Sun-4, Sun 386i and SPARCstation workstations. Saber-C is also available for DEC's VAXstation, and Ultrix. For more information, contact Saber Software, Inc., 185 Alewife Brook Parkway, Cambridge, MA 02138 (617) 876-7636; FAX (617) 547-9011. Watcom Ships v7.0 For 386 Hosts Watcom is now shipping the Watcom C v7.0/386 optimizing compiler and run-time library for the Intel 80386 architecture. Already available for the 16-bit MS-DOS environment with the 80X86 processors, Watcom C v7.0 is now available for the 32-bit 80386 processor. Watcom C v7.0/386 ports MS-DOS applications to 32-bit native mode, enabling full 386 performance without 640K limitations. Watcom C v7.0/386 generates code for 32-bit protect mode and can access large data areas without source modification or special compiler options. 
Watcom C v7.0 possesses 386-specific instructions, sophisticated addressing modes and 32-bit linear addresses. Porting to the 386 architecture involves recompiling existing programs and linking with the 386 library to enable addressing of up to 4 gigabytes of memory. Applications compiled with Watcom C v7.0/386 operate with MS-DOS extenders which enable use of 80386 protect mode. Both the 80386 software tools from Phar Lap Software and OS/386 from A.I. Architects support use of Watcom C v7.0/386 32-bit protect-mode with MS-DOS. Watcom C v7.0/386 includes the compiler run-time library, a "compile and link" utility, Touch utilities, an object file disassembler, a patch utility, and the Watcom C Preprocessor. The list price for Watcom C v7.0/386 is $895. For more information, contact Watcom at 415 Phillip Street, Waterloo, Ontario, Canada, N2L 3X2 (519) 886-3700, FAX (519) 747-4971, or call the Watcom C order and inquiry line toll free: (800) 265-4555. Sterling Castle Offers Logic Gem In Single Language Versions Sterling Castle is shipping a "single language edition" of Logic Gem v1.5, its logic processor and code generator. This edition includes one of BASIC, FORTRAN, Pascal, dBase and C, plus English for documenting procedures, writing pseudocode, and building rule bases for expert systems. The products are identical except that one programming language choice appears in the language menu instead of five. LogicGem includes an editor, interpreter and compiler and runs on PC, XT, AT, PS/2 or compatibles. LG requires 640K of RAM, PC/MS-DOS 2.0 or greater and can be used with a color or monochrome monitor. LG's "Programmer's Edition" complete with documentation has a suggested retail of $99. The single language edition, sold only directly from Sterling Castle, is $49.95 with complete documentation and on 3.5" or 5.25" disks. The full purchase price of the single language edition is applicable against a later purchase of the multi-language programmer's edition. 
There is a 90-day money-back guarantee, free technical support and 24 hour bulletin board service. Upgrades to v1.5 are free to registered users. Contact Sterling Castle, 702 Washington St., Ste. 174, Marina Del Rey, CA 90292. Inside CA (213) 306-3020 or (800) 323-6406; Outside CA (800) 722-7853; FAX (213) 821-8122. CI Adds Profiler To QNX Computer Innovations has added a new utility which provides statistical profiling of a program to the Computer Innovations C86 C Compiler for QNX. The profiler points out the parts of the program that use the most CPU time, reporting them in terms of source file constructs that the programmer can easily relate to: by module, function, or line number. The profiler is currently included with the C86 C Compiler package, and is available for downloading (by registered C86 users) from the Computer Innovations Bulletin Board Update System. For more information contact Computer Innovations, Inc., 980 Shrewsbury Ave., Tinton Falls, NJ 07724 (201) 542-5920. Spell Checker Works With C Geller Software Laboratories, Inc. has introduced SpellCode, a spell checker. SpellCode works with C, Pascal, BASIC, databases and Lotus spreadsheets as well as dBase and all work-alike interpreters and compilers. SpellCode includes a comprehensive English dictionary and a special dictionary of common computer terms. The user can also create as many custom dictionaries as needed. It is available from Geller Software Laboratories, Inc., 35 Stephen St., Montclair, NJ 07042 for a special introductory price -- $49.95. For more information call (201) 746-7402. MetaWare Available On SystemV/386 MetaWare's High C compiler will be offered on the Santa Cruz Operation (SCO) and AT&T UNIX System V/386 operating system. The High C compiler features over a dozen different global optimizations, including global allocation of values to registers, removal of invariant expressions from loops, live/dead analysis, dead code elimination, and constant and copy propagation. 
MetaWare's High C compiler also features a code generator that makes use of 386/387 instruction sets including support of in-line transcendentals and floating-point long doubles (80 bits). The code generator also features in-line intrinsic functions; in certain cases, the compiler replaces a call to the C library with the actual in-line instructions, resulting in code that is smaller and performs fewer operations.

The High C compiler provides ANSI compatibility, cross-language calling, accurate and helpful diagnostics, and maximum configurability. Developers can select from a wide variety of compiler features through the use of toggles and programs.

MetaWare supports the complete Intel 80x86 microprocessor family including the 8086, 80186, 80286, 80386, and 80486, and the Intel i860; Advanced Micro Devices' Am29K; Sun Microsystems' Sun386i, Sun-3, and Sun-4 workstations; Motorola's 680x0 family of processors; IBM's PS/2, RT, and 370; and DEC's VAX. Operating system support includes UNIX 4.x BSD, UNIX System V.x, SunOS, IBM's AIX, DEC's Ultrix, MS/PC-DOS, OS/2, DRI's FlexOS, AIA's OS/286 & 386, Phar Lap's 386DOS-Extender, DEC's VMS, and others. Most platforms are supported with native and cross compilers.

For more information contact MetaWare Incorporated, 2161 Delaware Avenue, Santa Cruz, CA 95060-5706 (408) 429-6382; FAX (408) 429-9273.

FairCom Announces Update For c-tree File Handler

FairCom has announced c-tree File Handler/Server v4.3, which provides functions to store, update and retrieve fixed or variable length data in random or sequential order. c-tree comes with source code and employs portable client/server architecture.

The new version has a high speed sorted key load routine enabling virtually linear time index creation regardless of the number of index entries. Another function returns the key value at an approximate given percentile of the ordered key value list. The new version also estimates the number of entries between two key values.
c-tree v4.3 has new make files and scripts for OS/2, Watcom, MPW v3.0 and Commando tool support for all of MPW. There is server support for LightSpeed C on the Mac and server/client support for Turbo C. Reuse of depleted nodes in single-user and c-tree Server modes of operation is possible. Version 4.3C lists at $395 (plus shipping and handling). To order contact FairCom Corp, 4006 W. Broadway, Columbia, MO 65203, (800) 234-8180 FAX (314) 445-9698. Coromandel Releases C-Trieve For MS-Windows Environment Coromandel has announced the release of its C-Trieve-ISAM file manager for MS-Windows. C-Trieve/Windows, now shipping, is based on the X/Open standard. It also runs under MS-DOS, XENIX, UNIX and DESQview. C-Trieve can be used by both C and C++ programmers. C-Trieve/Windows is a library of routines that allows the programmer to build custom data management applications. C-Trieve/Windows is based on a Client-server model. A single server can support multiple clients and maintain application integrity using locking and transactions. C-Trieve/Windows is based on C-Trieve which is the native file manager of Coromandel's RDBMS, C-SQL. The current offering includes dBase and Btrieve. C-Trieve users can upgrade to C-SQL and continue to use their files; no need exists to translate or modify the data for SQL access. For more information contact Coromandel Industries, Inc., 108-27, 64th Road, Forest Hills, NY 11375 (718) 997-0699; FAX (718) 997-0793. Eigenware Tech Offers CSL Buyer's Guide Eigenware Technologies now has available a 45 page buyer's guide for the C Scientific Programming Library. This guide provides a description of the CSL product and several other related products and services. These other products include compilers, editors, technical monograph, and TeX typesetting software used for CSL documentation. Detailed ordering and international shipping information is also supplied in the buyer's guide. 
The guide is available for $5 from Eigenware Technologies, 13090 La Vista Drive, Saratoga, CA 95070. For more information call (408) 867-1184. QuickGeometry Receives Upgrade Building Block Software has released QuickGeometry Library v1.01, a collection of math subroutines for developing CAD/CAM, parametric design, NC programming, post processing, finite element analysis or other similar programs. The major enhancements are the addition of support for Turbo C, and internal changes that simplify interfacing to graphics libraries. The QuickGeometry Library provides CAD/CAM programmers with routines for standard geometric operations required for CAD/CAM software development. In addition, the QuickGeometry Library provides routines that read and write DXF files, and that manage lists. Selling for $199, the product includes source code, object code for MS-DOS, extensive documentation, working example programs, one hour of telephone support and a 30-day money-back guarantee. For more information contact Building Block Software, PO Box 1373, Somerville, MA 02144 (617) 628-5217. We Have Mail [Editor's Note: Yes, we omitted the listing from last month's letters column. It appears as a separate article in this issue, Dealing With Memory Allocation Problems. -- rlw] Dear Sir, It has been many years since I sent a letter to a periodical, some 16 or 17 years to be precise. I have some 22 years of programming background, ranging from systems programming to applications and telecommunications. As the original designer and author of SHADOW (IBM mainframe telecommunications system), and co-designer of MANAGE-IMS I feel I can speak with some experience. I mention my background not to attempt to impress, but to add some weight to my words about the latest fad in the C world. C++. When C was first inflicted on us I welcomed it and disliked it, however, two facts stand out. First, K&R are undoubtedly very bright people with much insight. 
Secondly, ANSI cleaned up the loose ends and now C is a serious commercial language. C is now one of the four that the IBM SAA endorses. I have written in C since 1982, using MVS, UNIX and the micro versions. Many years ago we in the mainframe world discovered the benefits of control blocks, pointers and vector tables. In fact the control block structure of any dynamic operating system is, no ifs, no ors, no buts about it, an object oriented programmed system. This "new" concept of O.O.P. (object oriented programming) is what worries me. First, it is not new. We have used object oriented systems for all the 22 years I have been in the industry. I have a fear that OOP will become OOPS. I feel that as far as C goes, C++ is violating the cardinal rule "IF IT ISN'T FIXED, DON'T BREAK IT"! I have studied OOP systems, the new window systems are OOP, and on the whole well done. They exist without the ?benefit? of C++. As it stands, C supports objects very well. I have an example in the C language forum of Compuserve, complete in and of itself for anyone who cares to study it. In short, C++ is a farce. C++ I feel was implemented by some well intentioned people who have no serious commercial programming expertise, and certainly no IBM mainframe internals experience. C++ is a random collection of items, a mixed bag of minor changes, and the OOP extension. The minor additions attack the heart of structured programming (for example allowing data to be defined anywhere code may exist). They had some good ideas, existing for a quarter of a century in the mainframe world, such as defaults. Yet the defaults are positional as opposed to pure keyword! When keyword parameters are introduced into functions and macros then a whole new world is opened up. C++ felt it was better to stick to methods flying in the face of good mainframe experience and thus limit its abilities.
The data reference, the change to casting, the inline functions are questionable at best, and ignore the potential increase in power of the processor and the optimisation ability of future compilers. Programmers are made to get involved with optimisation, not the machine. Overloaded functions I admit are a benefit. They are the base of the C++ object implementation. I ask myself if that benefit isn't perhaps the only benefit of C++. The object oriented side of C++ does nothing, except inheritance, that any C compiler today cannot do. And if serious preprocessors were defined with global symbols then inheritance can also be implemented. What I am saying is that rather than C++, let us have a full preprocessor with typical mainframe abilities, and skip the rest. C was designed to be bare bones, enhanced (very successfully) with functions. The quantum jump should be a preprocessor and proper macro and language preprocessor such as the IBM assembler macro facility. The next quantum leap is not the poorly thought out ideas of C++. In creating an object based system, much thought has to go into the structure, and this is true whether C++ is inheritance and scope, easily controlled in other ways if C is used, employing run time inheritance and binding. I am getting suspicious that perhaps AT&T felt it was losing control of its brilliant child, "C" and needed to show that perhaps they were still in the lead. I suspect that since OOP was becoming more the rage that they jumped on the bandwagon. They used that to reestablish their leadership. The C++ authors wanted to become the next generation of venerated programmers, to be the next K & R. I am sorry, but as Senator Bentsen put it, "they are no K&R". OOP was not invented by AT&T, it is a long established method for handling interrupt and interrupt driven systems. The resurgence of OOP came about with among other things the need to handle the dynamic world of dynamic objects such as in windowing systems and the like.
OOP is a good discipline where applicable. It has many uses in the distributed processing world of the future. I hope that the readers will take a closer look at C++ and study some OOP systems implemented in C and realise that C++ is a farce, a joke being perpetrated on the data processing world. I am all for positive change, this isn't it. I am recommending to my company that C++ not be implemented. I note that there will be no ANSI C++, they have seen the light. I thank you for your patience, Simon Wheaton-Smith 2902 N. Manor Dr. West Phoenix, AZ 85014 You're welcome to my patience, but not to any support for your position. I wonder if K&R had any IBM mainframe internals experience? If not, perhaps we should make them rescind C? -- rlw Dear CUJ, Please allow me to introduce myself. My name is Chris Proctor. I'm an IBM mid-range systems contractor. I felt compelled to write you a letter to tell you why I would not be renewing my C Users Group subscription. I am relatively new to C programming and I was hoping that your magazine would provide me with helpful hints and programming tips that would help me become a better C programmer. Unfortunately, in most issues I found nothing that was beneficial to me. Please believe me when I tell you that I am not "knocking" your magazine at all. I'm sure that if I was more knowledgeable in C, your magazine would be very interesting. But, quite frankly I don't understand half of the articles in each issue. What I would like to see is an article or section of each issue dedicated to the basics of C, or at least programming tips that the layman can understand. I can't believe that I am the only one that has not renewed my subscription because the articles are "over my head". Perhaps, something like I have mentioned may even increase subscriptions just from people glancing through the C Users Journal on the magazine rack. 
I realize that you have to appeal to the masses and not the exceptions and if that's the case, I'll probably subscribe to the magazine when I feel that it would be of some use to me. You have an excellent magazine. Keep up the good work. Sincerely yours, Chris Proctor 21352 Avenida Ambiente El Toro, CA 92630 I too would like to see some quantum of good tutorial material in every issue, in addition to the more demanding copy. Unfortunately, we don't get very many well-written tutorial submissions. If my readership includes some willing but uninspired authors, here's your chance. Send us a concise but thorough tutorial on some aspect of C. We need more such submissions than we are currently receiving. -- rlw Dear Howard, I was pleased that my article, "The C Programmer's Reference: A Bibliography of Periodicals," appeared in print in your January, 1990 issue. However, I was dismayed to learn that I had inadvertently omitted a couple of worthy entries. These annotations, with the appropriate citations, are as follows: C Gazette (quarterly, $6.50/issue, $21.00/year) C Gazette, 1341 Ocean Avenue #257, Santa Monica, CA 90401. A "code-intensive" quarterly which thrives on printing lots of C code (and some C++). Specializes in MS-DOS and OS/2, but no UNIX. An in-depth publication aimed at intermediate and advanced C programmers. Few advertisements and few reviews. For programmers who are serious about their C code. Journal of C Language Translation ($235.00/year) Journal of C Language Translation, 2051 Swan's Neck Way, Reston, VA 22091. An academic quarterly which just recently commenced publication. Aimed at compiler writers and programmers who must implement the ANSI standard in language products. Covers extensions to the standard, such as implementation of numerical representation, etc. No advertisements and few reviews. An important resource for programmers in this narrow niche. 
I had compiled the original bibliography some time ago, and from the holdings of a corporate library. I assumed that the library's holdings were relatively complete, and I overlooked the two periodicals above. I hope that this letter will fill the gap. I regret it if anyone was offended, and I trust that this information will further assist readers of The C Users Journal in their language research. Sincerely, Harold C. Ogg Chicago State University The Paul and Emily Douglas Library Ninety-Fifth Street at King Drive Chicago, Illinois 60628-1598 (For those wondering, Howard is our editorial coordinator. I should let him respond to this letter, but he's buried somewhere under some manuscripts and pasteups.) I appreciate the information. In addition to his column for CUJ, Rex Jaeschke also writes a C column for DEC Professional -- not a "C magazine", but at least another C resource. If you regularly refer to a C-related information source we failed to include, please write and we'll mention it here in a future issue. -- rlw Dear Mr. Ward; I'm glad the C Users Journal is starting to publish articles on the Macintosh, its development environment, and its operating system. Keep 'em coming! Nice article by Allan Brown [Bruton] in the October '89 edition. True, the Macintosh toolbox does add some additional complexity, but once one becomes accustomed to it -- and it may take quite a bit of time becoming fluent in "toolboxese" -- one can be assured, though, that there is less likelihood of code obsolescence and greater possibilities for code portability among the various Macintosh hardware platforms and operating systems by following the development guidelines and using the toolbox calls for performing window manipulations. Anyway, I tried executing the code presented on page 99 (Listing 1), and the code as written does not draw a set of nested rectangles as promised at the beginning of the article. 
When one executes the code specified in Listing 1, nested triangles are drawn on the screen. To obtain nested rectangles the variable yb will have to be initialized to read yb = 25; rather than yb = 300; as printed in the article. That's the only change necessary for having the Macintosh draw nested rectangles.

Thanks again for printing an article of interest to programmers who program the Macintosh in C.

Yours truly,
Clifford J. Campo
123 Fennerton Road
Paoli, PA 19310

Gee, you mean rectangles have four sides? Maybe I should spend more time watching Sesame Street with my son. Thanks for the correction, and thanks for noticing our Macintosh coverage. We've really worked hard to get those stories. -- rlw

Dear Robert:

I'd like to offer several comments to your "Publisher's Forum" in the August 1989 issue.

I like the new glossier paper; I think it makes the pages easier to turn because there's less friction between them. Goodness knows, we readers don't want too much friction. (Truly, I do like it better.)

I can't tell you what a relief it is to read that you're refusing to get involved in C puns. At least in your articles. Your advertisers more than make up for it. (Of course, it's not just CUJ advertisers...) Too bad X3J11 didn't outlaw C puns as part of the ANSI standard.

Regarding swimsuits, etc.: I agree that would be out of place in CUJ. There's plenty available elsewhere. However, your comment, "Wouldn't you rather explore lex than sex?" leaves me concerned. Have you somehow arrived at the assumption that real programmers are so obsessed with digital high tech that they will forego sex? Of course not. How do you think we burn off all of that Jolt and pizza? Not at a keyboard surely!
Speaking of sex and assumptions, and here I am finally being serious, there's a big one or two in your comment, "We've even considered running pictures of all the staff (especially the women since most of them are single and most of you are male).", namely that all male CUJ readers are straight. I assure you, it ain't so! About 10% of most any population is gay and lesbian, and while I haven't seen any polls to confirm that this is true of programmers, I have no reason to feel I should believe otherwise. So, if you were to do swimsuits, it would only be fair to include your female and male staff. Fair to your straight women readers too, don't forget them!

CUJ is great, please keep it up (speaking of standards and high ones at that)!

Sincerely
Bill Lee
5132 106A Street
Edmonton Alberta, CA T6H 2W7

What can I say? -- rlw

To The C_Users Group,

Concerning Numerical Software Tools in C. It is a fine book for those starting to program in C. For any book in your Advanced topic area, I, as well as all others, assume that Advanced means just that -- Advanced! An advanced book would be like Numerical Recipes in C by Press et al. from Cambridge University Press. You truly need to re-analyze what is considered advanced considering that more and more books actually treating advanced topics are coming out. In the past, few knew anything about C. Since it is now the #1 language of choice, advanced isn't the advanced of yesterday.

The book which I'm sending back should be considered elementary to intermediate. The fact that it was published in 1987 does not mean that it is advanced. Further, four routines of the most elementary type do not in my view constitute "Tools". Tools to me are a compendium of primitives that one may use in developing one's own applications. This book falls way short of that. Again further, the price is outrageous for what one receives.

Jerry Rice, PHD.
504 Eastland St.
El Paso, Texas, 79907

In all truth, I haven't read this book.
In fact there are more than a few books among the 100 or so that we carry that I haven't read. Except when I have personal knowledge of the book's contents, we rely upon publishers' descriptions when categorizing the book. -- rlw

User Interface Language Eases Prototyping

Vincent Guarna and James Krause

This article is not available in electronic form.

Using 'Screen Machine'

Rick Knoblaugh

Rick Knoblaugh is a Systems Engineer specializing in systems programming for PCs. He is the coauthor of Screen Machine, a screen design/prototyping/code generation utility. He may be reached at 15014 River Park Dr., Houston TX 77070.

Prototypes and code generators can significantly reduce development costs. In this article I'll discuss a recent consulting project and show how the "Screen Machine" -- a prototyping tool which I am making available to other programmers as shareware -- assisted in prototyping, generating C code for the user interface, and documenting the system.

The Application

My project was a student grade tracking application for a high school. The software allows student names and grades to be scanned into a PC clone using an optical mark reader, a scanning device which reads forms which have been marked with a pencil. Student names and grades can also be manually entered or edited.

The product enables teachers to maintain their grade books on a PC. Grade tracking and printing tasks, such as letters to parents, are all handled in a menu-driven environment. Thus, the application required menus, data entry screens and help screens.

I began by planning the major components of the software, such as the scanner communications and the decoding of the scanned data. Next I needed to develop a user interface from which all program functions could be selected. For this phase the user interface prototyping software was invaluable.
Benefits Of Prototyping

In the past, programmers who developed interactive programs painstakingly designed the appearance of screen displays on paper and then wrote the code for these user screens. Today, many developers are using some type of screen prototyping software.

Most prototyping tools permit screen design using a powerful screen editor. Screen editors make it much easier to manipulate blocks of data, to center screen data, and to experiment with color and other aspects of screen appearance.

In addition to a screen editor, prototyping packages usually include some control facility that allows branches to various screens to depend on user input. This allows the developer to create the "look and feel" of a user interface before any code is written. Prototyping also lets the user become more involved in the design of the user interface. More importantly, it allows the programmer to be more creative and to develop an interface that makes sense.

Some prototyping tools also provide code generation for the screen displays. Once the screen design is finalized, the program automatically generates the associated source code.

Screen Machine

Screen Machine runs under MS-DOS and consists of a screen editor/code generator, a mini-language for prototyping the flow of application screens, and a TSR screen capture program which allows any text mode screen to be imported into the screen editor. Screen Machine can generate source code for screens in your choice of C, BASIC, Turbo Pascal, 8086 assembler, and dBase.

Screen Machine is limited to handling display portions of screens only; it does not handle data input. The prototyping module permits the input of single keystrokes, allowing screens to be displayed when the operator selects a menu option or presses a specific key.

Designing Screens With SCREEN

I experimented with the appearance of the grade tracking application screens using Screen Machine's screen editor and code generator, SCREEN.EXE.
As with most applications, I started with the main menu (Figure 1). The SCREEN box drawing feature makes it easy to put borders around menus and other screens. Text can be centered on a given line of the screen or within the graphics character borders of a drawn box. You can even shift the entire screen left or right to aid in centering screen data and attributes. Other screen editor features include: inserting and deleting lines, copying and moving blocks, selection of color, reverse video, undo of last editing function, key stroke macros, and online help. I saved my designed application screens in Screen Machine screen data files. (Screens can be saved with or without attributes). If no color or reverse video is needed, the screens can be saved as ASCII text files. Prototyping The Interface Once the data files for all application screens are complete, the programmer develops an executable simulation of the application interface using the Screen Machine's mini-prototyping language module, SHOW.COM. The completed simulation will display the main menu, accept keystrokes, and based on these keystrokes, select other application screens for similar processing. The SHOW mini-language consists of display/keystroke input statements, case statements, and goto and gosub statements. The heart of these is the display/keystroke input statement, whose syntax is: Filespec [basekey max] [/Tn] [/An] [/Xn] Filespec names the screen data file to be displayed. (e.g. I saved my main menu screen data file in C:\GRADE\MAINMENU.SRN.) The basekey is optional and represents the lowest-valued key accepted as input from the user when the screen is displayed. The basekey is one of these: A specific key, enclosed in quotation marks (e.g. "1"). A decimal scan code value (unquoted) (e.g. 59 for the <F1> key). An unquoted asterisk (*), which is taken to mean "any key". The max cannot be specified unless basekey is specified; it is the highest-valued key accepted as input. 
If input from a given screen falls neatly within a range of keystrokes (e.g. if on my main menu only "1" to "9" were used, and not <Alt><H>), specifying basekey and max eliminates all unwanted keystrokes. The T switch specifies a time value in seconds -- useful for creating timed "slide shows". SHOW will display the screen data file and then wait n seconds (0-255) before displaying the next screen. The A switch displays a screen data file in a certain attribute. This is generally only used if you have not saved attributes in your screen data files. The X switch is the key on which a "getout" is performed. "n" is specified in the same manner as basekey and max, i.e. either a quoted character or an unquoted scan code. A "getout" is accepted as a valid key press and performs any pending return or else returns to the operating system. Case statements allow branches to other portions of a SHOW command file to depend upon keystrokes input via the display/keystroke input statement. The syntax for the case statement is: case [key] [range] [S: G: R:] [label name] If a keystroke matches key or falls within range, control is transferred to label name. If S: is present, the transfer is executed as a gosub, meaning the address of the next display/keystroke input statement is put onto the "stack" and control is transferred to the label. G: does a goto transfer to the label. R: returns to the label (similar to BASIC). The syntax for labels is the same as in MS-DOS batch files (i.e. a ":" followed by a label name). The grade tracking SHOW command file appears in Listing 1. The top of the command file displays the main menu which is stored in the Screen Machine screen data file, MAINMENU.SRN. The asterisk after the file name instructs the SHOW program to wait after displaying the main menu until some key is pressed. The /X indicates that if a 9 is pressed, the SHOW command file should terminate and return to MS-DOS. 
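The key handling described so far can be modeled in a few lines of portable C. The sketch below is only an illustration of the rules as the article states them, not code from Screen Machine: the names show_stmt, show_case, classify_key and match_case, and the ANY_KEY and NO_KEY markers, are all hypothetical. It also assumes that the getout key is tested before the basekey/max range and that the first matching case statement wins; the article does not say how SHOW orders these checks.

```c
#include <stddef.h>

/* Hypothetical model of a SHOW display/keystroke input statement.
   None of these names come from Screen Machine itself. */
#define ANY_KEY (-1)   /* stands in for the unquoted asterisk */
#define NO_KEY  (-2)   /* the optional /X switch was omitted */

struct show_stmt {
    int basekey;  /* lowest key accepted, or ANY_KEY */
    int max;      /* highest key accepted (ignored for ANY_KEY) */
    int getout;   /* key given with /X, or NO_KEY */
};

enum key_result { KEY_REJECTED, KEY_ACCEPTED, KEY_GETOUT };

/* Decide what to do with one keystroke. The getout key is checked
   first (an assumption), since it counts as a valid key press. */
enum key_result classify_key(const struct show_stmt *s, int key)
{
    if (s->getout != NO_KEY && key == s->getout)
        return KEY_GETOUT;
    if (s->basekey == ANY_KEY)
        return KEY_ACCEPTED;
    if (key >= s->basekey && key <= s->max)
        return KEY_ACCEPTED;
    return KEY_REJECTED;
}

/* Hypothetical model of a case statement: a key or key range,
   a transfer type, and a label name. */
struct show_case {
    int low, high;       /* low == ANY_KEY matches every key */
    char transfer;       /* 'S' gosub, 'G' goto, 'R' return */
    const char *label;   /* target label; may be NULL for R: */
};

/* Return the first case that matches key, or NULL if none does.
   First-match-wins is an assumption of this sketch. */
const struct show_case *match_case(const struct show_case *cases,
                                   size_t count, int key)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (cases[i].low == ANY_KEY ||
            (key >= cases[i].low && key <= cases[i].high))
            return &cases[i];
    }
    return NULL;
}
```

With a statement set up as { '1', '6', '6' }, which mirrors a menu that accepts only 1 to 6 and uses 6 as its getout key, classify_key reports '3' as accepted, '6' as a getout, and '9' as rejected. A trailing case entry whose low field is ANY_KEY plays the role of case * in a command file.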
The case statements perform gosubs to other labels in the command file. For example, if the user presses a 6, SHOW will gosub to the label otherprint where the print options menu is displayed and processed. The strange looking NUL screen data file name followed by the case * G:top is necessary because the limited SHOW command set only allows unconditional branching to be initiated in case statements. Case statements can only be performed after a screen data file has been displayed by a display/keystroke input statement. The reserved screen data file NUL only satisfies the case statement by simulating a screen display and a key stroke entry. The asterisk indicates that if any key is pressed, a goto should be performed to the label top. After the appropriate gosub is processed from the main menu, control transfers back to the top of the command file. Generating The Source Code A SCREEN program configuration option allows you to select the language to be generated. When SCREEN generates C code, it declares a structure _scrn and defines a global array of structures of type _scrn (Listing 2). Notice that the array of structures is named with screen_ followed by the name of the screen data file to prevent naming conflicts. After including these statements in your program, you can either write a routine to display the arrays of structures, or include the routine supplied with Screen Machine in your program as in Listing 3. The routine uses the BIOS software interrupt 10h function 9 to display the arrays of structures. Function 9 writes a character and attribute at the current cursor position. The Microsoft C library function _settextposition is used to position the cursor. The function disp_screen is called passing the name of the array of structures to be displayed and a flag indicating whether the screen should be cleared prior to displaying the data. disp_screen clears the screen using the background color defined in the variable color_back_grnd. 
This should be set to the desired background color.

Data Entry

Because Screen Machine handles only display portions of screens, I used my own general-purpose data entry routines for those portions of the application where data entry was required.

Screen Capture

A third Screen Machine module, CAPTURE.COM, is a TSR program that allows text mode screen displays to be captured and stored on disk. This utility makes it easy to include application screens, complete with sample data, in the user's manual. CAPTURE takes over the shift print screen function (interrupt 5). When the program is invoked and becomes resident, command line options specify the file name under which screen data files are to be stored and whether or not attributes should be included in the screen data files. If attributes are not desired, screen data is stored in ASCII text files.

Captured screens can also be used as input into the screen editor/code generator. This means that any text mode screen can be translated into source code which will display that screen in any or all of the five supported programming languages. This capability can be used when translating an application from one language to another or if you want to generate source code for screens created with a prototyping tool that doesn't support source code generation.

Conclusion

Screen Machine does several things satisfactorily. Its lack of support for input fields may preclude your using it for some applications. Certainly, if you need really detailed simulations of your programs such as sound effects and emulation of disk I/O, you should use a more full-featured commercial prototyping program. Also, if you require a graphics interface then Screen Machine will not help you.

Figure 1

Grade Book Main Menu

1) Scan Grades
2) Edit/View Grades
3) Print Grade Book
4) Scan Names
5) Print Rosters
6) Other Print Functions
7) Set Teacher Information
8) Drop Lowest Grade
9) Exit

For help, press <Alt><H>.
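The att field in the generated structures is the text-mode attribute byte that the display routine in Listing 3 loads into BL for BIOS function 9. As a minimal sketch, the byte can be split into the standard PC foreground, background and blink fields. The helper names decode_attr and encode_attr are hypothetical and are not part of Screen Machine.

```c
/* Fields of an IBM PC text-mode attribute byte. */
struct text_attr {
    unsigned foreground;  /* bits 0-3: color 0-15 */
    unsigned background;  /* bits 4-6: color 0-7 */
    unsigned blink;       /* bit 7: blink in the default video setup */
};

/* Split an attribute byte into its three fields. */
struct text_attr decode_attr(unsigned char att)
{
    struct text_attr t;

    t.foreground = att & 0x0Fu;
    t.background = (att >> 4) & 0x07u;
    t.blink = (att >> 7) & 0x01u;
    return t;
}

/* Pack three fields back into one attribute byte;
   out-of-range bits are masked off. */
unsigned char encode_attr(unsigned foreground, unsigned background,
                          unsigned blink)
{
    return (unsigned char)(((blink & 0x01u) << 7) |
                           ((background & 0x07u) << 4) |
                           (foreground & 0x0Fu));
}
```

Listing 3 stores 31 in every row of the main menu. decode_attr(31) gives foreground 15 (bright white) on background 1 (blue), which is the same blue that color_back_grnd selects when the screen is cleared.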
Listing 1

/*SHOW command file for grade tracking program.
/*-----------------------------------------------
:top
mainmenu.srn * /x"9"        /*display main menu, accept any key,
                            /*exit to dos if "9"
case "1" s:scangrades       /*gosub to display the appropriate screens
case "2" s:editgrades
case "3" s:printgrades
case "4" s:scannames
case "5" s:printrosters
case "6" s:otherprint
case "7" s:setteacher
case "8" s:droplow
case 35 s:mainhelp          /*<alt><h> help

/*When all gosubs return, branch back to top. You can only branch
/*as part of a case statement and you can only have a case statement
/*after display/keystroke input statement. Thus, the special NUL
/*screen name can be used to branch anytime.
nul                         /*special reserved display/keystroke input statement
case * g:top                /*branch back to top of command file
/*-----------------------------------------------
:scangrades
scangrad.srn *              /*display scan grades screen and wait for a key
case * r:                   /*return
/*-----------------------------------------------
:editgrades
editgrad.srn * /x1          /*display edit/view grades screen and wait for
                            /*a key, return to caller if esc (scan
                            /*code 1) is pressed
case 35 s:edithelp          /*if <alt><h> (scan code 35) is pressed, go
                            /*display the edit/view grade help
nul
case * g:editgrades         /*go back to edit/view grade screen
/*-----------------------------------------------
:printgrades                /*display print grades screen
prtgrade.srn *
case * r:
/*-----------------------------------------------
:scannames                  /*display scan names screen
scanname.srn *
case * r:
/*-----------------------------------------------
:printrosters               /*display print rosters screen
prtrost.srn *
case * r:
/*-----------------------------------------------
:otherprint                 /*display other print options menu
prtmenu.srn "1" "6" /x"6"   /*accept only 1-6, return to caller if 6
case "1" s:report1          /*branch to report 1 screen
case "2" s:report2          /*branch to report 2 screen
case "3" s:report3          /*branch to report 3 screen
case "4" s:report4          /*branch to report 4 screen
case "5" s:report5          /*branch to report 5 screen
nul
case * g:otherprint
/*-----------------------------------------------
:report1
report1.srn *
case * r:
/*-----------------------------------------------
:report2
report2.srn *
case * r:
/*-----------------------------------------------
:report3
report3.srn *
case * r:
/*-----------------------------------------------
:report4
report4.srn *
case * r:
/*-----------------------------------------------
:report5
report5.srn *
case * r:
/*-----------------------------------------------
:setteacher
setteach.srn *              /*display set teacher information screen
case * r:
/*-----------------------------------------------
:droplow
droplow.srn *               /*display drop lowest grade screen
case * r:
/*-----------------------------------------------
:edithelp
edithelp.srn *              /*display edit/view help screen and return to
case * r:                   /*caller when any key is pressed
/*-----------------------------------------------
:mainhelp
mmhelp.srn *                /*display main menu help screen and return to
case * r:                   /*caller when any key is pressed

Listing 2

struct _scrn {
    char *chrs;    /*pointer to screen text*/
    char cw;       /*column where text appears*/
    char rw;       /*row where text appears*/
    char att;      /*attribute in which text appears*/
};

struct _scrn screen_mainmenu[]={
[Figure: initializer data for screen_mainmenu; the same array appears in full in Listing 3]

Listing 3

/*various include files*/
#include <stdio.h>
#include <graph.h>
#include <bios.h>
#include <dos.h>

#define FALSE 0
#define TRUE 1
#define VIDEO 0x10              /*software interrupt 0x10 */
#define WRITE_ATTR_CHAR 9       /*function 9 */

struct _scrn {
    char *chrs;
    char cw;
    char rw;
    char att;
};

/*the prototype follows the structure declaration so that
  struct _scrn is already known at file scope*/
void disp_screen(struct _scrn *, unsigned short);

struct _scrn screen_mainmenu[]={
    {"èëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëë£",21,6,31},
    {" ",21,7,31},
    {" Grade Book Main Menu ",21,8,31},
    {" ",21,9,31},
    {" 1) Scan Grades ",21,10,31},
    {" 2) Edit/View Grades ",21,11,31},
    {" 3) Print Grade Book ",21,12,31},
    {" 4) Scan Names ",21,13,31},
    {" 5) Print Rosters ",21,14,31},
    {" 6) Other Print Functions ",21,15,31},
    {" 7) Set Teacher Information ",21,16,31},
    {" 8) Drop Lowest Grade ",21,17,31},
    {" 9) Exit ",21,18,31},
    {" ",21,19,31},
    {" For help, press <Alt><H>. ",21,20,31},
    {"àëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëëx",21,21,31},
    {"\0",0,0,0}                /*empty string marks the end of the array*/
};

int main(void)
{
    /*clear screen and then display the screen
      defined by screen_mainmenu*/
    disp_screen(screen_mainmenu, TRUE);
    return 0;
}

long color_back_grnd = 1;       /*all screens will use a blue background*/

/*-----------------------------------------------------------
disp_screen - Use ptr passed to array of structures
              containing text, col, row, and attribute.
              Use BIOS int 10h function 9 to display the data.
              If cls_flag is TRUE, clear the screen before
              displaying the data. When clearing the screen,
              use the attribute defined in the variable
              color_back_grnd
------------------------------------------------------------*/
void disp_screen(struct _scrn *p, unsigned short cls_flag)
{
    char wcol;
    char *wsptr;
    union REGS inregs, outregs;

    if (cls_flag)
    {
        _setbkcolor(color_back_grnd);
        _clearscreen(_GCLEARSCREEN);
    }

    inregs.h.ah = WRITE_ATTR_CHAR;      /*print char and attribute*/
    inregs.x.cx = 1;                    /*print 1 char*/

    while ( *(p->chrs) )
    {
        wsptr = p->chrs;                /*get ptr to string*/
        wcol = p->cw;
        inregs.h.bh = 0;                /*video page 0*/
        inregs.h.bl = p->att;           /*attribute to use */

        while ((inregs.h.al = *wsptr++) != '\0')    /*char to print*/
        {
            /*position the cursor*/
            _settextposition( (short) p->rw, (short) wcol++);
            int86 ( VIDEO, &inregs, &outregs );     /*print with BIOS*/
        }
        p++;
    }
}

Prototyping Experiences

Brett Martensen

Brett Martensen is a Senior Systems Consultant with SRI Strategic Resources, Inc. He specializes in tools and techniques, including CASE, prototyping and JAD, to develop database applications. Areas such as entity relationship data modeling are his forte. He has an M.Sc. in Computer Science (1976) from Queen's University (Kingston, Ontario).
When developing a prototype, one is faced with reaching a maximum level of functionality across the maximum scope of the application, but within a minimum time frame. Two productivity tools help reach these conflicting goals: the CASE (Computer Aided Software Engineering) tool, which is the specification engine, and a DBMS (DataBase Management System), which is the application engine.

I recently participated with a team to develop a prototype system for Canada Post Corporation. This prototype had to be robust enough to be used across the country, at a number of different user sites during a three-month trial. Thus, it had to be more functionally complete than would normally be expected of a prototype.

Background

A prototype is a miniature system which approximates the final system but provides only a subset of the application's scope and functionality. As such, a prototype comes with all the benefits associated with modeling. A model is easier and certainly less expensive to change than a real system. Prototyping permits developers to elicit, model and then capture user requirements for a system.

Like the buildings on a movie set, a prototype must look real even though it is only a facade. On a movie set, certain buildings have rooms, some of which are furnished. Similarly some features in a prototype are fully implemented, while others remain as images only. There are four levels of functionality used when describing a prototype:

Level    Functionality
one      Screens only
two      Screens with field entry and edit, some controllable flow.
three    Level two plus Create, Retrieve, Update and Delete of data and Menu linking screens together.
four     Level three plus Integrity checking, a correctly structured database and some application specific algorithms working.

Most prototypes end up as a mixture of these levels applied to different parts of the application.
The prototype developer must be able to respecify the data model and quickly regenerate the database structure because a prototype is not a static model; it goes through a number of iterations. A typical prototype development iteration consists of analyzing the requirements (read documentation, conduct interviews); specifying (data model, functional specifications); designing (display layout, reports); developing (program, fill in tables of data); demonstrating and using; and finally, reviewing the design with users using Joint Application Design (JAD) techniques. The JAD is a meeting in which a consensus can be reached amongst the user population as to the system requirements. The results of the JAD provide the requirements for the start of the next iteration.

Theoretically, this cycle should be repeated three times during a prototyping project. Ideally, each iteration of the prototype progresses closer to the specification for the final system. A rule of thumb is that sixty percent of the remaining requirements are captured in each cycle, so forty percent of what was missing stays missing. After the second cycle, the system should be 84 percent correct (1 - 0.4^2 = 0.84) and after the third cycle, 93+ percent correct (1 - 0.4^3 = 0.936), rather like golf where each shot gets you closer to the hole. At some point, however, diminishing returns make further iterations pointless.

The Canada Post Prototype

The Canada Post Corporation prototype was developed using two tools: E-R Designer from Chen and Associates, a CASE tool for data modeling, to specify the data model; and ZIM from Sterling Software, Zanthe Systems Division, a powerful 4GL DBMS, to develop the prototype.

ZIM is a natural choice for prototyping projects. It directly implements an Entity-Relationship data model which it keeps in an Active and Integrated Data Dictionary. An entity-relationship database structure allows transferring the conceptual data model produced in the specification stage directly into the database data dictionary.
An Active data dictionary is important because the programs can access all the metadata. An Integrated data dictionary stores display specification information as well as the database structure. An advantage of choosing E-R Designer is that the data model can be exported and used to create the ZIM database description without rekeyboarding.

ZIM also runs code in either interpreted or compiled mode. Since prototype performance is not a consideration, the interpreted mode is used. This mode has a macro substitution capability which allows names of entities, relationships, displays, and ZIM routines to be substituted in the programs and resolved at run time. The data dictionary stores all these names, as well as segments of ZIM code for macro substitution. Thus, the data dictionary becomes the repository of the prototype specifications, data, displays and programs.

Other tools such as WordPerfect and internally developed ZIM utilities increased productivity further. We used WordPerfect's line draw facility to draw boxes and rapidly design the displays. Then below each display, the position of all the fields and their prompts were specified. A short ZIM program analyzed this information and created the display form specifications in the ZIM data dictionary.

One of ZIM's most useful prototyping features is the way in which display forms relate to entities in the database. If a field on a display form is given the same name as a field in an entity, then the ZIM command

CHANGE Form FROM EntitySet

fills in the fields of the current form from the fields of the same name in the current record of the entity set. Similarly,

ADD EntitySet FROM Form

creates a new record in the entity set from the data entered on the form. This functional relationship between display form fields and entity set fields is reinforced by maintaining the metadata in the data dictionary for each entity set field. (See Table 1.)
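The name-matching idea behind CHANGE Form FROM EntitySet can be illustrated outside ZIM. The following is a minimal C sketch, not ZIM's implementation: it assumes that both a form and a record can be modelled as arrays of named string fields, and the names `struct field`, `VALUE_LEN` and `fill_by_name` are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VALUE_LEN 32   /* assumed maximum field width, for illustration */

/* One named field. A display form and an entity record are both
   modelled as arrays of these (an assumption of this sketch). */
struct field {
    const char *name;
    char value[VALUE_LEN];
};

/* Copy values from src into every dst field that has the same name.
   Fields of dst with no same-named partner are left untouched.
   Returns the number of dst fields that were filled. */
size_t fill_by_name(struct field *dst, size_t ndst,
                    const struct field *src, size_t nsrc)
{
    size_t i, j, filled = 0;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < ndst; i++) {
        for (j = 0; j < nsrc; j++) {
            if (strcmp(dst[i].name, src[j].name) == 0) {
                strncpy(dst[i].value, src[j].value, VALUE_LEN - 1);
                dst[i].value[VALUE_LEN - 1] = '\0';
                filled++;
                break;
            }
        }
    }
    return filled;
}
```

Calling `fill_by_name(form, nform, record, nrecord)` plays the role of filling a form from a record; swapping the two arguments gives the opposite direction, loosely analogous to ADD EntitySet FROM Form.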
The other ZIM database information normally needed (length, decimals, indexed and required) is also stored in the data dictionary. Other useful attributes, such as default value, data mask and validation rules, exist for the fields in the display form.

We developed a number of generic, table-driven modules for the prototype's functional side. The concept of table-driven software is extremely powerful: by simply modifying the data in the tables, the developer can rapidly change the way in which a module functions, what it operates on and how it appears to the user. For example, we developed a menu program which was totally table driven. By changing the tables, the routines executed when the menu items were chosen, the menu structure and menu functions were modifiable.

With the table-driven approach, code can be reused. For example, a single modify routine can be applied to any entity set, and an enhancement made to a routine is universally available in the prototype system. Thus, the only nonreusable code is the application-specific algorithms, which are all linked in via the program name attribute attached to each field.

For the Canada Post Corporation prototype, generic table-driven routines were developed for

Menu
Level 1 and 2 screens (slide show)
Entityset Lister, which provided access to the functions of:
    Sort list
    Pick record
    Print list
    Find record
    Add record
    Modify record
    Delete record
    Help

This list doesn't include "Reporting" since programming a general-purpose reporting module is difficult to perform using the table-driven approach. The Print list routine provided simple reports. More complex reports which needed to function were hand-coded in ZIM. Reports that did not need to function were presented as Level one screens.

There exist both a spreadsheet and business graphics package which take data from a ZIM database and allow for its manipulation, analysis and presentation.
Given that the Canada Post Corporation application involved large quantities of statistical information, both these packages were linked into the prototype to assist the user in performing ad hoc inquiries and analysis of the data. These two packages were especially useful for the design and creation of new and existing reports to elicit user feedback. They added substantial functionality to the prototype with very little effort.

The modules of the prototype were linked together in the calling sequence as shown in Figure 1. This prototype environment can be easily extended as new generic routines are developed. A table-driven ad hoc inquiry function could be built and linked in via the menu.

Since the purpose of a prototype is to develop a working model with frequent feedback from the user population, it is appropriate to add a feature which captures users' ideas while they are fresh in their minds. Since "help" was available throughout the prototype, a suggestion box feature was added to the "help" module which allowed for on-line and in-context idea capturing. These ideas were collected, printed and analyzed during the JAD sessions. User feedback was very general on the first iteration: "We also need to be able to store information about our services," for example. In subsequent iterations, suggestions became more specific: "Use the word 'item' rather than 'product'."

The final goods delivered from a prototyping project consist of the working prototype and a large quantity of documentation. The documentation covers the working prototype and user requirements that were not implemented in the prototype, such as complicated application-specific algorithms or feedback from the final JAD.

Conclusion

The combined use of CASE tools and 4GLs allows for greater productivity in prototype development. Using generic table-driven modules results in less software development. As a result, the workload shifts to the specification and analysis tasks.
As in so many situations where technology is applied to the automation of the simpler tasks, fewer people are required, but the ones remaining need more expertise. The skills of business and functional analysis, data modeling, and design become more important than programming.

Reference

Application Prototyping: A Requirements Definition Strategy for the 80's, Bernard H. Boar, Wiley-Interscience Publication, John Wiley & Sons, New York, 1984.

Figure 1

Name            The unique name given to this data element.
Type            Whether alpha-numeric, integer, character, etc.
Prompt          The full username to appear on displays.
Column Header   An abbreviated name to appear at the top of any list.
Program Name    The name of the ZIM routine to be executed whenever the value of the field is changed.
Helptext        A user-readable explanation of the data element, its possible values and purpose, if necessary.

Figure 2

The UI2 Code Generator

Paul Combellick

Paul Combellick has a BS in Petroleum Engineering from the University of Alaska, Fairbanks. He is a contract programmer specializing in dBASE and C local area network (LAN) database applications and can be reached at (602) 280-2569 or via Compuserve at 70671,3054.

As a Network DBMS applications developer, I recently undertook my first major C project. After having scoffed at UI2 for several months, I decided to give it a try as a fast prototyping tool. The resulting productivity gains exceeded my most optimistic expectations. I was able to produce about fifty percent of the 20,000 lines of C code in this application using UI2.

I used UI2 with Vermont Views Screen library and Btrieve Record Manager to build a Network DBMS application. As I was new to all three tools, I spent a third of project time learning the new tools, with the remaining weeks actually producing generated and hand-written code.
By the time I completed this single DBMS application, I had produced a set of templates and template libraries, using UI2 terminology, that would allow me to produce the next Network DBMS application in a few weeks, rather than in months if the code were entirely written by hand.

Description Of UI2

UI Programmer Version Two, The Developer's Release, by Wallsoft is a programmable code generator targeted toward dBASE programmers, but is flexible enough to be used for many languages in the MS/PC-DOS environment including C. UI2 contains four major components.

Screen Editor. The user can interactively draw screens and define screen entities including menus, background text, boxes, variables, fields.

Templates. Code generation language files define how the code for a particular screen will be created.

Template libraries. A library contains groups of functions written in UI2's generation language that are called by the templates during the generation of the target source code file.

Code Generator. An interpreter that executes the template language to generate the target source code for a particular screen.

UI2 is shipped with a set of templates and template function libraries for dBASE programmers. The C programmer will have to create his own templates before any non-trivial C code, other than simple menus, can be generated.

Case Study

For this application the client specified C for portability to OS/2 and UNIX. The target system is a Novell Network which supports the Btrieve database server. Btrieve also has versions for OS/2 and Intel-based PCs running UNIX. I chose Vermont Views screen library for its portability to UNIX and OS/2.

This networked DBMS application contains several types of screen input/output: menu-only screens, data entry-only screens, combined menu and data entry screens, and reports. I created three templates, one for each of the first three screen types.
These templates are actually source code files, written in the code generation language, that are executed by the code generator's interpreter. The templates describe how the code generator should handle a particular screen object to create the target source code language.

I painted the screens by drawing boxes, menus and their actions, background text, and defining data entry fields. Through the screen editor the user may specify menu attributes such as hot keys and the name of a function to call when the menu is selected. The editor also allows specification of field attributes such as type, width, picture and provision for such user-designed features as begin-field and end-field trigger and validation functions. The programmer designs the template language functions to take advantage of the entities and their attributes defined in the screen editor.

UI2 has an interactive mode as well as a command line mode that allows UI2's code generator to be accessed by make. A make response file can include the dependencies for generating target source code files from screen definitions, as well as compiling and linking the entire system. I can now modify a template file and make will call UI2 to regenerate all the affected source code modules.

To modify a screen definition file -- either by adding text or a data element, or changing one of the many field, form, or menu attributes -- I don't edit the C source file; instead, I modify the screen definition file using the UI2 screen editor. I design screens so that I never modify the UI2 generated C code files except through the UI2 screen editor.

UI2 is not limited to any particular coding style or third-party library and is adaptable to many different compilers. UI2 was used on this project to generate 40 screens and about 10,000 lines of C code. The code generation language syntax is very dBASE-like and the learning period was brief.
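To give a feel for what template expansion involves, here is a small C sketch that replaces placeholders of the form {name} with values taken from a table, in the way {menuname} in the template fragment of Listing 1 becomes CUG in Listing 2. It is only an illustration under stated assumptions, not UI2's generator: the names `struct binding` and `expand_template` are hypothetical, and the sketch handles only simple substitution, not the `<<...>>` function calls of the template language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A placeholder name and the text that replaces it (hypothetical type). */
struct binding {
    const char *name;
    const char *value;
};

/* Expand {name} placeholders in tmpl into out, which holds cap bytes.
   Placeholders with no binding are copied through unchanged.
   Returns 0 on success, -1 if out is too small. */
int expand_template(const char *tmpl, const struct binding *b, size_t nb,
                    char *out, size_t cap)
{
    size_t used = 0;

    assert(tmpl != NULL && out != NULL && cap > 0);
    while (*tmpl != '\0') {
        const char *text = tmpl;   /* default: copy one literal char */
        size_t len = 1;
        size_t advance = 1;
        const char *end_brace = (*tmpl == '{') ? strchr(tmpl, '}') : NULL;

        if (end_brace != NULL) {
            size_t namelen = (size_t)(end_brace - tmpl - 1);
            size_t i;

            for (i = 0; i < nb; i++) {
                if (strlen(b[i].name) == namelen &&
                    strncmp(b[i].name, tmpl + 1, namelen) == 0) {
                    text = b[i].value;
                    len = strlen(b[i].value);
                    advance = namelen + 2;   /* skip "{name}" */
                    break;
                }
            }
        }
        if (used + len >= cap)
            return -1;             /* no room left for text plus NUL */
        memcpy(out + used, text, len);
        used += len;
        tmpl += advance;
    }
    out[used] = '\0';
    return 0;
}
```

With bindings for menuname and formbox.row, the string "{menuname}_dfmp = fm_def( {formbox.row}," expands to "CUG_dfmp = fm_def( 0,". A real generator also needs loops over screen objects, which is what the template library functions provide.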
UI2 Strengths

This type of code generator performs very well on repetitive tasks such as building screens. I was able to build all the screens -- both menu and data entry -- entirely with UI2. After learning Vermont Views, Btrieve, UI2 and building templates, I will probably be able to reproduce 40 new screens for a new application in a few days.

More importantly, the generated C code will be free of syntax errors and errant pointers. This bug-free code is at least as important as the productivity in creating the code. Once the templates are debugged, future screens will be virtually free of syntax errors.

On the very first project, I used UI2 to boost productivity significantly, despite a learning period to become familiar with a new tool.

Limitations

In light of the fact that UI2 was designed with the dBASE programmer in mind, it lacks a couple of features for the C programmer. The most obvious feature missing is a full-featured dictionary that supports C data types, including structures and scoping concepts, and general data file schema beyond the dBASE file support.

However, I was able to work around most of the data dictionary limitations by creating a hidden box in each screen. The box, made up mostly of #includes and external declarations, contained code for the generator to insert literally into the generated C source code.

Conclusion

I am quite satisfied with UI2. I have created templates for my non-programmer partner to fast prototype systems for prospective clients in order to illustrate what a proposed system may look like.

I believe that UI2 will boost my coding and debugging productivity by factors of five to ten in the area of screen generation and maintenance. On future projects I expect to realize tremendous productivity gains now that I am familiar with this tool and have created a set of templates and template libraries to create code that utilizes Vermont Views Screen Library.
Listing 1

The fragment of the template in Listing 1 expands to produce the C code in Listing 2.

/*********************** define the form ***************************/
<<menuform = 'form' ** set UI var to 'form' for vvdispc.tlb >>

/* define a form */
{menuname}_dfmp = fm_def( {formbox.row}, {formbox.col}, {formbox.height}, {formbox.width}, LNORMAL, BDR_NULLP );

/* define boxes around form items ****/
<<define_all_form_boxes()>>

/*********** define background text */
<<display_text()>>

sfm_help( "*DATA HELP" , {menuname}_dfmp ); /* define form help keyword */
<<define_form_options()>>

/******* define form data fields *********/
<<define_all_form_fields()>>

Listing 2

/*********************** define the form ***************************/

/* define a form */
CUG_dfmp = fm_def( 0, 0, 21,80, LNORMAL, BDR_NULLP );

/* define boxes around form items ****/
bg_boxdef( 0,0,21,80,LNORMAL,BDR_SPACEP,CUG_dfmp);
bg_boxdef( 5,14,11,52,LNORMAL,BDR_DLNP,CUG_dfmp);

/*********** define background text */
bg_txtdef( 1, 28, "C USER'S GROUP UI2 DEMO", LNORMAL, CUG_dfmp);
bg_txtdef( 2, 28, " ", LNORMAL, CUG_dfmp);
bg_boxdef( 5,14,11,52,LNORMAL,BDR_DLNP, CUG_dfmp);
bg_txtdef( 7, 19, "Name : [ ]", LNORMAL, CUG_dfmp);
bg_txtdef( 8, 19, "Address : [ ]", LNORMAL, CUG_dfmp);
bg_txtdef( 9, 19, "City : [ ]", LNORMAL, CUG_dfmp);
bg_txtdef( 10, 19, "State : [ ] Zip : [ - ]", LNORMAL, CUG_dfmp);
bg_txtdef( 12, 19, "Phone : [ ]", LNORMAL, CUG_dfmp);
bg_txtdef( 13, 19, "Fax : [ ]", LNORMAL, CUG_dfmp);

sfm_help( "*DATA HELP" , CUG_dfmp ); /* define form help keyword */

/******* define form data fields *********/
CUG_fld1 = fld_def( 7,33, NULLP , FADJACENT , "!!!!!!!!!!!!!!!!!!!!!!!!!", F_STRING ,
    (PTR) name, CUG_dfmp );
CUG_fld2 = fld_def( 8,33, NULLP , FADJACENT , "XXXXXXXXXXXXXXXXXXXXXXXXX" , F_STRING ,
    (PTR) address, CUG_dfmp );
CUG_fld3 = fld_def( 9,33, NULLP , FADJACENT , "XXXXXXXXXXXXXXXXXXXXXXXXX", F_STRING ,
    (PTR) city, CUG_dfmp );
CUG_fld4 = fld_def( 10,33, NULLP , FADJACENT , "!!", F_STRING ,
    (PTR) state, CUG_dfmp );
CUG_fld5 = fld_def( 10,48, NULLP , FADJACENT , "UUUUU-UUUU", F_STRING ,
    (PTR) zip, CUG_dfmp );
CUG_fld6 = fld_def( 12,33, NULLP , FADJACENT , "(UUU)UUU-UUUU", F_STRING ,
    (PTR) phone, CUG_dfmp );
CUG_fld7 = fld_def( 13,33, NULLP , FADJACENT , "(UUU)UUU-UUUU", F_STRING ,
    (PTR) fax, CUG_dfmp );

MEL: A Metalanguage Processor

George Crews

George M. Crews received his bachelor's in General Engineering from the University of Nevada at Las Vegas, and his master's in Engineering Science from the University of Tennessee at Knoxville. He is a "generalist" with over 15 years' experience in mechanical and software engineering design and analysis. He may be contacted at 109 Ashland Lane, Oak Ridge, TN 37830 (615) 481-0414.

As a mechanical engineer, my experience with analysis programs falls in the areas of structural stress, fluid dynamics, heat conduction, and thermal/hydraulic system simulation. Such programs present the technical software developer with a number of unique problems, not least of which is providing a user-friendly interface. Though program users tend to be computer literate, input data can often be voluminous and tedious to prepare; the typical user may make many runs with only slight modifications as design optimization is often accomplished by repeated analysis. Both input and output must be stored and presented in a manner that allows independent verification and validation. Finally, the information output from one program may be required as input by another.

Another big headache is that modern (i.e., graphical) user interfaces tend to be hardware or system-software specific. A good universal interface would free the developer from the nuances of different machines and operating systems, while at the same time representing a standard that machine-specific routines can work with.

MEL is my solution for making such technical programs more user-friendly and modularized.
MEL (for MEtaLanguage data processor) is a set of input/output utilities that provides a standard interface between the program and the user. It can translate input data written in "pseudo-English" (Example 1), making the data available to the program as variables (Example 2). It can also translate program variables (Example 3) into pseudo-English (Example 4). Effort was made to provide data objects that could be easily incorporated into almost any engineering analysis program (Example 5).

The pseudo-English look of MEL means that I/O will be more readable and comprehensible to the user (or checker). Secondly, MEL is object oriented in that it provides a structured and encapsulated I/O interface. Thus, development time will be reduced and future changes can be made to the program more easily. Thirdly, MEL's grammar is simple and unambiguous, with both input and output formats identical so that output from one program may serve directly as input to another. Finally, MEL can read and write data directly to a file so that a permanent record of a run and its results is available.

Description

In MEL the smallest unit of pseudo-English I/O is called a "descriptor." Its purpose is to describe something, either data or a command, to a program. The general format for descriptors is much like function calls in a typical programming language. An I/O unit consists of a descriptor name (somewhat like a function name), followed by a parameter list, followed by an end-of-unit symbol (the semi-colon). For example, consider the following MEL descriptor, which could be used as part of the input to a piping network analysis program:

pipe, length = 100 (ft), diameter = 6 (in);

This is a pipe descriptor whose parameters are length and diameter. The values assigned to these parameters would be 100 and 6, and in units of feet and inches, respectively.
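The descriptor grammar just described (a name, a comma-separated list of "parameter = value (units)" items, and a closing semi-colon) is simple enough to parse by hand. The sketch below is an independent illustration in C, not MEL's source code: the type and function names (`struct descriptor`, `struct param`, `parse_descriptor`) are hypothetical, and it assumes numeric values with named parameters only, leaving out comments, abbreviations, strings, arrays and default parameter order.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define TOK_LEN    32   /* assumed maximum token length */
#define MAX_PARAMS 6    /* assumed maximum parameters per descriptor */

struct param {
    char   name[TOK_LEN];
    double value;
    char   units[TOK_LEN];   /* empty string if no units were given */
};

struct descriptor {
    char         name[TOK_LEN];
    int          nparams;
    struct param params[MAX_PARAMS];
};

static const char *skip_ws(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* Copy an identifier (letters, digits, '_') into dst.
   Returns a pointer just past it, or NULL if none was found. */
static const char *read_ident(const char *s, char *dst)
{
    size_t n = 0;

    s = skip_ws(s);
    while ((isalnum((unsigned char)*s) || *s == '_') && n < TOK_LEN - 1)
        dst[n++] = *s++;
    dst[n] = '\0';
    return n > 0 ? s : NULL;
}

/* Parse "name, p1 = v1 (u1), p2 = v2;" into d.
   Returns 0 on success, -1 on a syntax error. */
int parse_descriptor(const char *s, struct descriptor *d)
{
    assert(s != NULL && d != NULL);
    d->nparams = 0;
    if ((s = read_ident(s, d->name)) == NULL)
        return -1;
    s = skip_ws(s);
    while (*s == ',') {
        struct param *p;
        char *end;

        if (d->nparams == MAX_PARAMS)
            return -1;
        p = &d->params[d->nparams];
        if ((s = read_ident(s + 1, p->name)) == NULL)
            return -1;
        s = skip_ws(s);
        if (*s != '=')
            return -1;
        p->value = strtod(s + 1, &end);
        if (end == s + 1)
            return -1;                 /* no number after '=' */
        s = skip_ws(end);
        p->units[0] = '\0';
        if (*s == '(') {               /* optional "(units)" */
            const char *rparen = strchr(s, ')');
            size_t n;

            if (rparen == NULL || (n = (size_t)(rparen - s - 1)) >= TOK_LEN)
                return -1;
            memcpy(p->units, s + 1, n);
            p->units[n] = '\0';
            s = skip_ws(rparen + 1);
        }
        d->nparams++;
    }
    return *s == ';' ? 0 : -1;         /* must end with the end-of-unit symbol */
}
```

Applied to the pipe descriptor above, this fills in the name "pipe" and two parameters, length (100, "ft") and diameter (6, "in"). Keeping the units as text next to the value leaves conversion to internal units as a separate step.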
Although the tokens (names and parameters) making up descriptors are customized by the developer for each individual application program, the above grammar remains the same for all programs using MEL. (See Example 1 and Example 4.)

MEL's format was chosen for its simplicity, while allowing for as much flexibility as possible without introducing ambiguity. In MEL, tokens may be abbreviated as long as they remain uniquely identifiable. MEL assumes a default parameter order if parameter names are missing. Comments may be included by enclosing them in double quotes; parameter values may be labeled as "unknown," etc. These format choices are designed to make programs incorporating MEL as convenient to the user as possible.

Incorporating MEL

In order to incorporate MEL into one of your own programs, you must customize the mel.h header file to be included in your application source code file. First create a "dictionary" for both input and output that defines the proper spelling, number, and types (integer, array, etc.) of data associated with each descriptor and parameter. (Note that by simply changing spellings in the dictionary you could go from pseudo-English to "pseudo-French" or some other "pseudo-language.")

The task of defining dictionaries has been made as painless as possible by providing complete instructions and an example program on the MEL diskette available through the CUG library. (The diskette contains MEL source code, header file, documentation and instructions, an example program, and a conversion factor routine. Since a listing of all MEL routines would run over 50 pages, a complete listing has not been included with this article.)

You will need to prepare documentation for the user, defining the dictionaries and explaining what the tokens mean.

To obtain data from a descriptor, you must first read it and then extract the data (see Example 2). An example of outputting data is shown in Example 3.
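The rule that tokens may be abbreviated as long as they remain uniquely identifiable amounts to a prefix search over the dictionary. The following C sketch shows one way such a lookup could work. It is not taken from the MEL source: the function name `lookup_token` and its return codes are hypothetical, and the choice that an exact spelling always wins over longer words sharing the same prefix is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the dictionary word named by tok, where tok may be
   the full word or a prefix that matches exactly one entry.
   Returns -1 if nothing matches and -2 if the abbreviation is ambiguous. */
int lookup_token(const char *tok, const char *const dict[], size_t n)
{
    size_t i, len, matches = 0;
    int found = -1;

    assert(tok != NULL && dict != NULL);
    len = strlen(tok);
    if (len == 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (strncmp(dict[i], tok, len) != 0)
            continue;                  /* tok is not a prefix of this word */
        if (dict[i][len] == '\0')
            return (int)i;             /* exact spelling: accept at once */
        if (matches++ == 0)
            found = (int)i;            /* remember the first prefix match */
    }
    if (matches == 0)
        return -1;
    return matches == 1 ? found : -2;
}
```

For a dictionary holding pipe, pump, pressure and node, "pi" and "pr" each select one word, while "p" alone is rejected as ambiguous. Translating the dictionary into another "pseudo-language" changes only the spellings in the table, not the lookup.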
Allowing the user to input data with different units requires conversion to internal units (ASTM, 1982). Included on the MEL diskette is a routine that can convert more than 150 different units. Additional units and conversion factors can easily be added to the source code.

How MEL Was Developed

An early decision was to write MEL in C. Fortran is the traditional language for scientific programs; however, engineers like myself are beginning to realize that there is more to technical software development than simply correctly coding a complex algorithm. ANSI C has a number of significant non-numerical advantages over Fortran (Kempf, 1987). C allows more flexible structured programming and data encapsulation techniques to be applied (also see Jeffery, 1989). C has more operators and program control constructs than Fortran. C allows indirection (pointers) where Fortran does not. C more easily interfaces to existing system software since much of this software is itself written in C. Also, C is a popular language for unconventional computer architectures such as parallel processors (Lusk, 1987) and neural networks.

Let me also mention some of C's shortcomings, which are related to its relative naivete for scientific purposes. Dynamic array dimensioning in C is convoluted (Press, 1988). C does not have the numerical library that Fortran does. And finally, C does not allow operator overloading for data structures (complex numbers for example) nor does it have an exponentiation operator. However, I do not think these deficiencies are difficult to overcome.

Partly as an experiment to form my own opinion about OOP, the design of MEL incorporates the object-oriented paradigm. I chose to make use of C's preprocessor to restrict the visibility of public type, function, and data declarations to just those objects that the application program may need at a certain place (see Example 5).
(The private type, function, and variable data needed by the MEL routines themselves are not shown in the example and are hidden from your program by other defined/undefined manifest constants.) For another approach refer to the article by Jeffery.

Summary And Future Enhancement

Software engineering is rapidly evolving and everyone seems to have his or her own ideas about what makes a good user-interface. I believe MEL is a practical answer to the spectrum of interface problems confronting the developer and user of complex technical programs.

Some may criticize MEL for its verbosity (as compared to Fortran's fixed field format), the time a user must spend learning to use MEL (versus a more interactive interface), and the somewhat clumsy way objects must be (or at least, were) encoded in C. These points are legitimate and are inherent in MEL's design. No design can be all things to all people.

The next steps in MEL's evolution might be incorporating it into a language sensitive editor, a graphical output post-processor, and perhaps later, into an expert system shell specialized for the type of analysis being performed.

Bibliography

George M. Crews, "HAPN--A Hydraulic Analysis of Piping Networks Program," Masters Thesis in Engineering Science, University of Tennessee, Knoxville, 1989. A portion of this thesis describes MEL and how it was developed and used for a specific analysis program.

David Jeffery, "Object-Oriented Programming in ANSI C," Computer Language Magazine, February, 1989. This article discusses the object-oriented paradigm and a way to implement it in C.

James Kempf, Numerical Software Tools in C, Prentice-Hall, Inc., 1987. This book contains an introduction to both numerical programming and C. The emphasis of the text is on creating small routines that can be used as building blocks for larger programs. Possible shortcomings are its lack of data hiding and that it treats doubly dimensioned arrays statically rather than dynamically.
Ewing Lusk, Overbeek, et al., Portable Programs for Parallel Processors, Holt, Rinehart and Winston, Inc., 1987. This book describes a set of C tools for use on a broad range of parallel machines.

William H. Press, Flannery, et al., Numerical Recipes in C, Cambridge University Press, 1988. Based on an earlier Fortran edition, this is a great cookbook giving a wide range of oven-tested recipes for the numerical gourmet. It shows the correct way to handle multidimensioned arrays (dynamically). A complaint sometimes heard is that a few of the algorithms are getting obsolete due to rapid advances in numerical techniques being made.

ASTM E 380-82 Standard for Metric Practice, American Society for Testing and Materials, 1982. This standard contains many useful conversion factors between English and metric units.

Listing 1

Example 1. An Example of MEL Input for a Hydraulic Analysis Program. (Note that tokens will be unique to each application.)

    title, 'Example Problem Illustrating MEL';
    fluid, "water" density = 62.4 (lbm/ft3), viscosity = 1 (cp);
    node, 1, pressure = 8.67 (psi); "20 ft of water"
    branch, 100, from_node = 1, to_node = 2;
    pipe, length = 100 (ft), id = 6 (in), material = steel;
    end_of_branch;
    node, 2, pressure = 6.5 (psi); "15 ft of water"
    next;

Listing 2

Example 2. Example of Obtaining Data From a MEL Descriptor:

Descriptor:

    pipe, length = 100 (ft), diameter = 6 (in);

Code fragment:

    double pipe_length, diameter;
    union meli_param_data data;     /* see Example 5. */
    char units[MAX_STRING_LEN+1];
    int array_len;
    int unknown_flag;

    meli();                         /* reads descriptor */
    meli_data("length", &data, units, &array_len, &unknown_flag);
                                    /* gets pipe length */
    pipe_length = data.real;        /* will equal 100 */
    meli_data("diameter", &data, units, &array_len, &unknown_flag);
                                    /* gets pipe diameter */
    diameter = data.real;           /* will equal 6 */
    /* note that units, array_len, and unknown_flag are not
       considered (used). */

Listing 3

Example 3.
Example of Outputting a MEL descriptor:

Code Fragment:

    double pipe_length = 100, diameter = 6;
    union melo_param_data data;     /* see Example 5. */
    char length_units[] = "ft";
    char diameter_units[] = "in";
    int array_len = 0;
    int unknown_flag = 0;

    melo_init("pipe");              /* initialize */
    /* get data ready to output: */
    data.real = pipe_length;
    melo_data("length", &data, length_units, array_len, unknown_flag);
    data.real = diameter;
    melo_data("diameter", &data, diameter_units, array_len, unknown_flag);
    melo(1);                        /* translates data into string;
                                       1 = verbose (see Example 5) */

Descriptor:

    pipe, length = 100 (ft), diameter = 6 (in);

Listing 4

Example 4. An Example of Output Generated by a Hydraulic Analysis Program using MEL. (From the input data given in Example 1.)

    program, name = 'HAPN - Hydraulic Analysis of Piping Networks',
        problem_title = 'Example Problem Illustrating MEL';
    message, text = 'Date: Thu Jul 13 09:02:11 1989';
    message, text = 'Input filename: input';
    equations, node = 0, loop = 0, iterations = 7;
    branch, number = 100, type = 'independent_branch',
        flow_rate = 436238 (lbm/h), flow_change = -6.20476e-007 (%),
        flow_dp = 2.17 (psi), elevation_dp = 0 (psi);
    component, branch_number = 100, component_number = 0, type = 'pipe',
        resistance = 4.95228 (Pa*s2/kg2),
        change_resistance = -1.24095e-008 (%),
        pressure_drop = 2.17 (psi);
    node, number = 1, pressure = 8.67 (psi);
    node, number = 2, pressure = 6.5 (psi);
    next;

Listing 5

Example 5. Public Interface Between MEL and Any Application Program Using It. (Excerpted from mel.h header file.)

    /* if using MEL for input (#define MEL_INPUT), then must define the
       MEL input data object: */
    #ifdef MEL_INPUT
    /* firstly, define input constants (all must be CUSTOMIZED for
       specific application program): */
    #define MELI_MAX_DESCRIP_STR_LEN 256
    /* maximum number of characters in any input descriptor string. */
    #define MELI_MAX_PARAMS 6
    /* maximum number of parameters for any descriptor (min num = 1).
    */
    #define MELI_MAX_PARAM_STR_LEN 80
    #define MELI_MAX_PARAM_ARRAY_STR_LEN 1
    /* largest allowable parameter string lengths (min size = 1) */
    #define MELI_MAX_PARAM_INT_ARRAY_LEN 1
    #define MELI_MAX_PARAM_REAL_ARRAY_LEN 1
    #define MELI_MAX_PARAM_STR_ARRAY_LEN 1
    /* maximum number of elements in parameter data arrays (min = 1). */
    #define MELI_UNITS_STR_LEN 80
    /* maximum length of units associated with any param (min = 1) */

    /* secondly, define input data structures: */
    union meli_param_data {
        int integer;                /* also holds boolean type */
        double real;
        char string[MELI_MAX_PARAM_STR_LEN+1];
        int integer_array[MELI_MAX_PARAM_INT_ARRAY_LEN];
        double real_array[MELI_MAX_PARAM_REAL_ARRAY_LEN];
        char string_array[MELI_MAX_PARAM_STR_ARRAY_LEN]
                         [MELI_MAX_PARAM_ARRAY_STR_LEN+1];
    };
    /* this is used for input parameter data. it may either be an
       integer, real, string, array of integers, array of reals, or an
       array of strings. (to save space a union was used.) */

    /* thirdly, define input variables: */
    char meli_descriptor_string[MELI_MAX_DESCRIP_STR_LEN+1];
    /* global storage for the input descriptor string. */

    /* lastly, define input functions (typically they return 0 if no
       error encountered, else some nonzero error code): */
    int meli_file(FILE *meli_file_handle);
    /* read a descriptor string from the input stream and call meli().
       also, put copy of string read into meli_descriptor_string. */
    int meli(void);
    /* translate meli_descriptor_string and put information into a
       private data structure (meli_datum). */
    char *meli_descrip_type(void);
    /* return pointer to name of type of descriptor read by meli(). */
    int meli_num_params(void);
    /* return number of parameters read by meli(). */
    int meli_param(int param_num, char *param,
                   union meli_param_data *data, char *units,
                   int *array_len, int *unknown_flag);
    /* fill argument list with param_num'th parameter read by meli().
       (start with param_num = 0.)
    */
    int meli_data(char *param, union meli_param_data *data, char *units,
                  int *array_len, int *unknown_flag);
    /* see if *param was input. if it was, then fill argument list with
       data from meli_datum. */
    #endif /* MEL_INPUT */

    /* if using MEL for output, must define the MEL output data object: */
    #ifdef MEL_OUTPUT
    /* firstly, define output constants (all must be CUSTOMIZED): */
    #define MELO_MAX_DESCRIP_STR_LEN 256
    /* how many characters can be in an output descriptor string? */
    #define MELO_MAX_PARAMS 6
    /* maximum number of parameters for any descriptor. */
    #define MELO_MAX_PARAM_STR_LEN 80
    #define MELO_MAX_PARAM_ARRAY_STR_LEN 1
    /* largest allowable parameter string length. */
    #define MELO_MAX_PARAM_INT_ARRAY_LEN 1
    #define MELO_MAX_PARAM_REAL_ARRAY_LEN 1
    #define MELO_MAX_PARAM_STR_ARRAY_LEN 1
    /* maximum number of elements in array of parameter data. */
    #define MELO_UNITS_STR_LEN 80
    /* maximum string length of any units associated with a param. */

    /* secondly, define output data structures: */
    union melo_param_data {
        int integer;
        double real;
        char string[MELO_MAX_PARAM_STR_LEN+1];
        int integer_array[MELO_MAX_PARAM_INT_ARRAY_LEN];
        double real_array[MELO_MAX_PARAM_REAL_ARRAY_LEN];
        char string_array[MELO_MAX_PARAM_STR_ARRAY_LEN]
                         [MELO_MAX_PARAM_ARRAY_STR_LEN+1];
    };
    /* this is for output parameter data. it may either be an integer,
       real, string, array of integers, array of reals, or an array of
       strings. (to save space a union was used.) */

    /* thirdly, define output variables: */
    char melo_descriptor_string[MELO_MAX_DESCRIP_STR_LEN+1];
    /* global storage for the output descriptor string. */

    /* lastly, define output functions (typically return 0 if no
       error): */
    int melo_init(char *descrip_type);
    /* initialize private data structure (melo_datum) to accept
       parameter data from following functions. output descriptor type
       will be descrip_type. returns 0 if no errors were encountered.
    */
    int melo_data(char *param, union melo_param_data *data, char *units,
                  int array_len, int unknown_flag);
    /* put data for parameter *param into the proper place in
       melo_datum. returns zero if no errors were encountered. */
    void melo(int melo_verbose_flag);
    /* takes the information in melo_datum and translates it into
       melo_descriptor_string. user must set melo_verbose_flag = 1 to
       make output as readable as possible, set it equal to zero to make
       output as terse as possible (and still remain in MEL format). */
    int melo_file(FILE *melo_file_handle, int melo_verbose_flag);
    /* take the information in melo_datum, translate it into
       melo_descriptor_string, and output it to file. */
    #endif /* MEL_OUTPUT */

    /* now define data objects common to both input and output: */
    /* if an error occurs, MEL will try and tell you what happened. so
       define required error handling information: */
    #define MEL_MAX_ERR_MSG_LEN 80
    struct mel_errors {
        enum {                      /* which error occurred? */
            mel_no_err,
            mel_read_err,
            mel_write_err,
            mel_end_of_file_err,
            mel_end_of_data_err,
            mel_syntax_err,
            mel_unknown_descrip_name_err,
            mel_unknown_param_name_err,
            mel_missing_param_name_err,
            mel_param_data_err,
            mel_missing_paren_err,
            mel_too_many_param_err,
            mel_missing_bracket_err
        } type;
        int start_line;             /* on which lines did err occur? */
        int end_line;               /* (meaningful for input only.) */
        char msg[MEL_MAX_ERR_MSG_LEN+1];
                                    /* additional info describing err */
    } mel_err;                      /* (not same as messages below).
    */
    #define MEL_MAX_NUM_ERR_MESSAGES 13
    #ifdef MEL_INIT
    /* the following describes each type of enumerated error: */
    char mel_err_msg[MEL_MAX_NUM_ERR_MESSAGES][MEL_MAX_ERR_MSG_LEN+1] = {
        "No errors encountered",
        "Can't read file",
        "Can't write file",
        "Unexpected end of file encountered",
        "End of input data encountered",
        "Descriptor/parameter syntax error",
        "Unknown descriptor name",
        "Unknown parameter name",
        "A (or another) parameter name was expected but is "
        "missing",
        "Unable to read parameter value(s) for this "
        "descriptor",
        "Missing right parenthesis while reading units",
        "Too many (or duplicate) parameters given for this "
        "descriptor",
        "Missing brackets around array data"
    };
    #else
    extern char mel_err_msg[MEL_MAX_NUM_ERR_MESSAGES]
                           [MEL_MAX_ERR_MSG_LEN+1];
    #endif /* MEL_INIT */

Object-Oriented Programming As A Programming Style

Eric White

Eric White is a software engineer at Advanced Programming Institute, Ltd. He is working on a character-based version of XVT. XVT is a common programming interface with implementations for various window systems, including Macintosh, Microsoft Windows, Presentation Manager, OSF/Motif, and character based on UNIX and MS-DOS. He can be reached at API at (303) 443-4223.

Object-oriented programming is a programming style that can be used in many languages, including C and C++. Some programmers think that C++ gives them the ability to do object-oriented programming. This isn't accurate -- C programmers can already do object-oriented programming. I will demonstrate by showing two identically structured object-oriented programs, one in C and the other in C++.

Even though one can do object-oriented programming in C, C++ offers several advantages: C++ supplies syntactic support for object-oriented programming and C++ provides type checking where not possible in C.

I am assuming the reader has already read one of the numerous magazine articles that introduce object-oriented programming.
A good article is "Mapping Object Oriented Concepts Into C++ Language Facilities", CUJ July '89 by Tsvi BarDavid. If you already know C, an example of object-oriented programming in C can clarify exactly what goes on in object-oriented programming. Once you understand the C example, the identical example in C++ can make learning C++ easier. You can even imagine how the code generated by a C++ translator looks.

The Example

I'll develop the comparison using a graphical application that could be the beginnings of a drawing program such as MacDraw. This example is constructed with four classes of objects: graph_obj, circle, square, and double_circle. Three instructions can be given to any one of these objects:

    init, which takes as arguments the initial position and size of the object. init initializes the object, then draws it.
    move, which erases the object by drawing it in black, modifies the position, then draws it in white. move takes a change in the y and x coordinates as arguments.
    draw, used by init and move. draw takes a color as an argument.

The Listings

Listing 1 is the pseudo-code for the example. The code in Listing 2 (obj.h) and Listing 3 (obj.c) facilitates object-oriented programming in C, allowing the creation of classes, methods, objects, and implementing inheritance. Listing 6 (drawc.c) and Listing 7 (drawcxx.cxx) are two examples of object-oriented code in C and C++ respectively. They perform identically.

In the pseudo-code, you can see:

    We derive classes circle and square from class graph_obj.
    We derive class double_circle from class circle.
    All classes inherit the method move from class graph_obj. If method move needs to be invoked for an object of class circle, then method move of class graph_obj is actually the function called. We are able to reuse the move method for every class in this example.
    Class double_circle inherits the method init from class circle.
    Class double_circle overrides the method draw from class circle.
If method draw needs to be invoked for an object of class double_circle, then double_circle's own draw is called; the method is not inherited from the super-class.

For portability, I isolate the graphics functions in a utility module. Listing 4 (utility.h) is the interface to the utility module. Listing 5 (utility.c) contains fatal() and the graphical functions. The utility module is compiled and linked with either the C or C++ code. The isolation also makes it easier to compare the two object-oriented examples.

Object-Oriented Programming In C

This system implements classes, methods, objects, inheritance, and messages. The entire module that facilitates object-oriented programming is less than 90 lines of code. I'll start with a simple data abstraction mechanism, then develop it into a system that supports classes, inheritance, and messages.

The most natural means of creating an object and associating methods with it is to put pointers to the methods (pointers to functions) in a structure along with the data. A structure for an instance of a circle might look like this:

    struct {
        int y;
        int x;
        int radius;
        void (*init)();
        void (*draw)();
        void (*move)();
    } circle;

This implements an object that knows how to initialize itself, draw itself, and move itself. The implementation could vary for different types (such as a double circle). However, we might get tired of setting up the methods every time we create a new instance of a circle. A solution is to design another structure (called a class) that contains the pointers to the functions, and place only a pointer to the class in each object. With this technique we may create a class once, then create several objects and have them point to that class.

To make the class structure more generic, we define an array of pointers to functions, and by convention, define the methods as an index into this array.
The code now looks like

    /* defines for methods */
    #define INIT 0
    #define DRAW 1
    #define MOVE 2

    struct class {
        int nbr_methods;
        void (**method)();
    };
    typedef struct class CLASS;

    struct {
        CLASS *class;
        int y;
        int x;
        int radius;
    } circle;

When creating a class, we need to initialize the array of pointers to functions after allocating memory for it. If the method is implemented in the class itself, then the pointer is set to the function address. If the method is inherited from the super-class, then the pointer is loaded from the super-class.

To make an object more generic, we'll take the definition of the data out of the object and replace it with a pointer to the data. Space for the data is allocated when the object is created and freed when the object is no longer needed. Listing 2 contains the final definitions of structures for class and object.

Classes

To define a class:

    Define a structure to hold the information about the class. (Listing 6, lines 15-18)
    Write the methods (the functions associated with the class). An example is the DRAW method for class circle. (Listing 6, lines 69-81)
    Declare a structure of type class. (Listing 6, line 143)
    Call new_class(), which loads the pointers to the inherited methods from the super-class. It also saves the size of memory needed for each object in the class. (Listing 6, line 160)
    Call reg_method() to register each method that we want to implement in the class being created. Registering a method means storing a pointer to a function in the array of pointers to functions. reg_method() shouldn't be called for methods inherited from the super-class. (Listing 6, lines 161-162)

Methods

A method is a function written specifically to go with the class. In this example, methods don't return a value. All methods should be aware that obj->data is a pointer to the data allocated on the heap. For a particular class, this data is of an assumed structure type.
By casting obj->data to a pointer to a structure, the method can access the object data correctly. All methods receive the argument arg_ptr, which can be used with the macro va_arg() if there are arguments to the method. See your documentation on stdarg.h. Objects The structure that holds what we need to know about an object is: typedef struct { void *data; CLASS *class; } OBJECT; To create and use an object: Declare a structure of type OBJECT. (Listing 6, line 148) Call the function new_object(), which registers a class with the object and allocates memory for the object. (Listing 6, line 174) Send messages to the object. With the graphical objects in the example, the first message that we want to send is the INIT message. (Listing 6, line 175). After that, we can send MOVE or DRAW messages. (Listing 6, line 186) When done with the object, we call free_object (), which frees the allocated memory. (Listing 6, line 191) Inheritance Inheritance of methods is demonstrated here. circle inherits MOVE from class graph_obj. double_circle inherits INIT and MOVE from its super-classes. I implement inheritance of data structures by having a sub-class allocate more memory than the super-class. The sub-class data consists of the parent-class data followed by the data specific to the subclass. Messages There is a distinction between a message and a method. A message gets sent to an object, and then something decides which method to invoke. Invoking a method means that the function that is part of the class is called. In C++, the translator decides which method to invoke. In the system implemented in C, the function message() (Listing 3) decides, based on the class of the object. Summary Of OOP In C One disadvantage of doing object-oriented programming in C is that there is no function prototyping. We have no idea what the arguments to a method are when we declare the pointers to functions in the class structure. 
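To make this concrete, here is a minimal sketch that is not taken from the listings. The names POINT, point_move, and send are hypothetical stand-ins for the article's OBJECT, a MOVE method, and message(), and the sketch assumes nothing beyond <stdarg.h>. The method expects two int values in the va_list, but nothing in the declaration of send tells the compiler so.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for the article's OBJECT and its data. */
typedef struct { int y; int x; } POINT;

/* Every method has the same shape: the object plus an argument list. */
typedef void (*METHOD)(POINT *p, va_list args);

/* A method that expects two int arguments: dy, then dx. */
void point_move(POINT *p, va_list args)
{
    p->y += va_arg(args, int);
    p->x += va_arg(args, int);
}

/* Dispatcher in the style of message(): it forwards whatever the
   caller passed. The compiler cannot check the variable arguments
   against what the method will read. */
void send(POINT *p, METHOD m, ...)
{
    va_list args;
    assert(p != NULL && m != NULL);
    va_start(args, m);
    m(p, args);
    va_end(args);
}
```

Calling send(&p, point_move, 1, 2) on a POINT whose y is 10 and x is 20 leaves y equal to 11 and x equal to 22. A call such as send(&p, point_move, 1.5) would compile just as quietly, but the method would then read arguments of the wrong type, which is undefined behavior.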
Programmers are responsible for sending the correct parameters to a message. Another disadvantage is that when writing methods, the programmer must access the data in the object correctly. The pointer to the data in the object structure must be cast as a pointer to the correct structure type.

Object-Oriented Programming In C++

The C++ example also demonstrates classes, methods, objects, inheritance, and messages. I'll explain a small subset of the syntax of C++, only what is essential to do object-oriented programming. There are many features of C++ that have nothing to do with object-oriented programming, and the object-oriented programming part of C++ is elaborate, with useful but nonessential features. The subset is:

    Definition of a class, with and without a super-class.
    Definition of a method.
    Declaration of an object.
    Sending a message to an object.

Classes

The three essential pieces of a class are:

    The data structure of the class.
    The super-class if there is one.
    The methods.

The definition of a class in C++ looks like:

    class graph_obj {
    public:
        int y;
        int x;
        void init(int y, int x);
        void move(int y, int x);
        virtual void draw(int color){};
    };

y and x are the data that will be contained in an object of class graph_obj. To define methods, you put the function prototype for the methods in the definition of the class. The class graph_obj doesn't have a super-class. When defining a class where there is a super-class, you follow the name of the class by a colon (:), the keyword public, and the name of the super-class. For example:

    class circle : public graph_obj {
    public:
        int radius;
        void init(int y, int x, int radius);
        void draw(int color);
    };

Members of a class may be private or public. For simplicity's sake, all members of all classes in this example are public. I'm not attempting to do data-hiding in this example. Hiding data is a separate (and important) issue, but is beyond the scope of this article.
The keyword public before the name of the super-class means that all the public members of the super-class are public members of the sub-class. Methods The definition of a method looks similar to that of a function. To define the name of the function, you follow the class name with the scope resolution operator :: and the name of the method. For example, the draw method for class circle would look like this: void circle::draw(int color) { /* code to draw a circle */ ... } Here is an important note about coding a method. A hidden argument to every method is the object. When a method gets invoked for a particular object, by definition you get access to that object. You can access the members of that object just by using the names of the members. Methods are invoked much as functions are called in C. Sometimes, when writing code for a method, we want to force a method to be invoked for a super-class, and the class for which we are writing the method has a method of the same name as the one in the super-class that we want to invoke. In this case, we can use the scope resolution operator (::) and force the method to be invoked for the super-class. For the init method for class circle, to invoke the init method for class graph_obj, we specify the name of the class, followed by the scope resolution operator, followed by the name of the method. Sometimes the method to invoke at run time can't be determined because a particular section of code could be operating on many types of objects. In C++, code such as this must be operating on objects of a certain class, or of a sub-class of that class. If you declare a method of a class highest in the class hierarchy virtual, C++ will wait until run time to make the decision of which method to invoke, and will invoke the correct method for the object being operated on. To do this, C++ puts something in the object that indicates which class it is. Resolution of the method to invoke at run time is called late binding. 
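The same idea can be sketched in C, in the spirit of the class structure shown earlier. This is an illustrative sketch only: SHAPE, CLASS_TABLE, send_draw, and the two draw functions are hypothetical names, the draw functions merely return how many circles they would draw, and the sketch says nothing about how any particular C++ translator lays out its objects. Each object carries a pointer to a per-class table of functions, and the caller reaches the function through that pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the article's obj.h. */
enum { DRAW = 0, NBR_METHODS = 1 };

struct shape;
typedef int (*METHOD)(const struct shape *s);

/* One table of methods per class. */
typedef struct { METHOD method[NBR_METHODS]; } CLASS_TABLE;

typedef struct shape {
    const CLASS_TABLE *cls;  /* the "something" that says which class */
    int radius;
} SHAPE;

/* Stand-ins for real drawing code: report how many circles are drawn. */
int circle_draw(const SHAPE *s)        { (void)s; return 1; }
int double_circle_draw(const SHAPE *s) { (void)s; return 2; }

const CLASS_TABLE circle_class        = { { circle_draw } };
const CLASS_TABLE double_circle_class = { { double_circle_draw } };

/* The caller does not know the concrete class. The table lookup,
   performed at run time, selects the function to call. */
int send_draw(const SHAPE *s)
{
    assert(s != NULL && s->cls != NULL && s->cls->method[DRAW] != NULL);
    return s->cls->method[DRAW](s);
}
```

send_draw never names circle_draw or double_circle_draw. Which one runs depends on the object it is handed, and that decision is made while the program is running. That run-time choice is late binding.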
This is useful when you send messages to pointers to objects, where the pointer could point to one of several classes of objects. It's also useful in a method that serves a class and its subclasses. draw is virtual because the method move (which uses the method draw) in class graph_obj also serves classes circle, double_circle and square. In C++, each class can have two special methods: the constructor and destructor. Essentially the constructor is called automatically when an object comes into scope, and the destructor is called when an object goes out of scope. For example, if you declare an automatic object at the start of a function, the constructor is called at the time of declaration, and the destructor is called before the function returns to its calling function. Constructors and destructors are not essential to object-oriented programming. In other systems, programmers make a method specifically for initializing an object when they need one, then send that message to the object after creating it. In the C++ example that accompanies this article, I don't use the built-in constructors and destructors. In both the C and C++ examples, I have a method that initializes the values of the graphical object. I call this method INIT. In the C example, I use a function that allocates memory for the object before use and frees the memory after use. These functions aren't defined as part of a class and should not be confused with methods. Objects An object declaration looks like a declaration of something for which there is a typedef in C. A declaration of an object of class circle looks like: circle c1; In the graphics example, immediately after declaring a graphical object the init message is sent to the new object. This gives the object its starting position and size, and draws it on the screen. Listing 7, line 99 shows initialization of a circle at position (40, 40), with a radius of 20. 
After sending the init message, we can send a move message to the object, causing it to move on the screen (Listing 7, lines 103-105). In the C example, we use a pointer in an object to point to the data specific to that instance of the object. new_object() allocates that data on the heap, and the function free_object() frees it. In contrast, the C++ translator actually creates a structure that contains the data. In our example, this structure is an automatic structure. Space for it gets deallocated when main() returns. We don't need to free any data on the heap as we needed to do in the C example.

Inheritance

Just as in the C example, the C++ example demonstrates inheritance of methods. double_circle inherits init and move from class circle.

Messages

Sending a message in C looks like:

    message(&c1, MOVE, 1, 1);

Sending a message in C++ looks like:

    c1.move(1, 1);

We specify the same essential elements in both cases. They are:

    The object (c1)
    The message (MOVE or move)
    The number of pixels to move in the y and x direction.

Summary Of OOP In C++

Data hiding and modularity are important issues in C++ as in other languages. I am not addressing these issues and have put the entire program in one source file. I want to focus on the object-oriented aspect and keep it simple.

Often in C++, when a message is sent to an object of a known type, the compiler resolves the particular method to invoke at compilation time. This is called early binding. In contrast, the function message() in the C scheme presented here resolves the issue of which method to invoke at run time. This is called late binding. Because the C methodology always does late binding, a little more code must always be executed at run time. The C code may be a bit slower than the code generated by the C++ translator. However, when using virtual functions, I believe that the speed of sending a message in C is comparable to C++.

C++ inherits many of the characteristics of C.
In C++, you have the ability to corrupt memory in the same ways that you can corrupt memory in C. This causes temporal and referential non-localization of bugs. C++ offers the same beneficial characteristics of C such as speed, compactness, and the possibility of portability. Portability The C code is quite portable and runs on: Microsoft C v5.1 Microsoft Quick C v2.0 Zortech C compiler The C++ code runs on: Zortech C++ compiler Glockenspiel C++ translator using the Microsoft C compiler v5.1. The graphics code works on CGA, EGA, Hercules and VGA. The utility module can use either the graphics library that accompanies Microsoft C v5.0 or the graphics library that comes with the Zortech C++ compiler. If you are using the Microsoft graphics library and Hercules graphics, before you can run these programs you need to run MSHERC.COM. The Zortech graphics library has its origin at the lower-left corner. Microsoft has its origin at the upper-left corner. Also, because pixels are not square, neither the Zortech nor the Microsoft libraries create perfectly round circles. Because this article is focusing on object-oriented techniques and not on graphical techniques, I didn't address any of these problems. Exercises A few valuable exercises might be: Make a new class such as a diamond. Make a new method such as expand or contract that will change the size of an object. Adapt this system to another graphical system. Acknowledgements I thank Marc Rochkind and Tom Cargill, who taught me much of what I know about object-oriented programming. Listing 1 Class Graphical Object Graphical Object is an abstract class. There will never be any instances of this class. Classes Circle and Square are subclasses of this class. 
Graphical Object data:
    y position
    x position

Graphical Object methods:
    INITIALIZE
        Arguments:
            Starting y position
            Starting x position
    DRAW
        Only implemented by subclasses
    MOVE
        Arguments:
            Increment in the y direction
            Increment in the x direction
        Send the draw black message to the object (erase the object).
        Modify the x and y position of the object per the arguments passed to the MOVE method.
        Send the draw white message to the object.

Class Circle

Circle is a subclass of class Graphical Object.

Circle data (in addition to Graphical Object data):
    radius of the circle

Circle methods:
    INITIALIZE
        Arguments:
            Starting y position
            Starting x position
            Radius
        Send the INITIALIZE message to class Graphical Object
        Store the size in the Circle data.
        Send the DRAW message to the Circle.
    DRAW
        Argument:
            Color of the circle to be drawn.
        Draw the circle on the screen.
    MOVE
        Inherited from the class Graphical Object.

Class Square

Square is a subclass of class Graphical Object.

Square data:
    the length of a side of the square

Square methods:
    INITIALIZE
        Arguments:
            Starting y position
            Starting x position
            Size
        Send the INITIALIZE message to class Graphical Object
        Store the size in the Square data.
        Send the DRAW message to the Square.
    DRAW
        Argument:
            Color of the square to be drawn.
        Draw the square on the screen.
    MOVE
        Inherited from the class Graphical Object.

Class Double_circle

Class Double_circle is a subclass of class Circle.

Double_circle data:
    Same as for a Circle.

Double_circle methods:
    INITIALIZE
        Inherited from class Circle.
    DRAW
        Argument:
            Color of the Double_circle to be drawn.
        Draw a circle on the screen.
        Draw a slightly smaller concentric circle on the screen.
    MOVE
        Inherited from class Circle.

Listing 2

    /* obj.h - Interface to module for object oriented
       programming in C.
    */

    struct class {
        int size;                   /* size of data */
        int nbr_methods;
        void (**method)();
    };

    typedef struct class CLASS;

    typedef struct {
        void *data;
        CLASS *class;
    } OBJECT;

    void new_class(CLASS *class, CLASS *super_class,
                   int nbr_methods, int size);
    void reg_method(CLASS *class, int mth, void (*fcn)());
    void new_object(OBJECT *obj, CLASS *class);
    void message(OBJECT *obj, int msg, ...);
    void free_object(OBJECT *obj);
    void free_class(CLASS *class);

Listing 3

    #include <stdlib.h>
    #include <stdarg.h>
    #include <stdio.h>
    #include "utility.h"
    #include "obj.h"

    void new_class(CLASS *class, CLASS *super_class,
                   int nbr_methods, int size)
    {
        int x;
        class->nbr_methods = nbr_methods;
        class->size = size;
        class->method =
            (void (**)())malloc
            ((unsigned)(nbr_methods * sizeof (void (*)())));
        if (class->method == NULL)  /* out of memory */
            fatal("malloc failed");
        /* start with no methods, then copy the inherited ones */
        for (x = 0; x < nbr_methods; ++x)
            class->method[x] = NULL;
        if (super_class != NULL)
            for (x = 0; x < super_class->nbr_methods &&
                        x < class->nbr_methods; ++x)
                class->method[x] = super_class->method[x];
    }

    void free_class(CLASS *class)
    {
        free(class->method);
    }

    /* register a method with a class */
    void reg_method(CLASS *class, int mth, void (*fcn)())
    {
        if (mth < 0 || mth >= class->nbr_methods)
            fatal("attempting to register an invalid method");
        class->method[mth] = fcn;
    }

    /* initialize an object */
    void new_object(OBJECT *obj, CLASS *class)
    {
        void *v;
        obj->class = class;
        v = malloc((unsigned)class->size);
        if (v == NULL)
            fatal("malloc failed");
        obj->data = (void *)((unsigned char *)v);
    }

    /* send a message to an object */
    void message(OBJECT *obj, int msg, ...)
    {
        va_list arg_ptr;
        va_start(arg_ptr, msg);
        if (obj->class->method[msg] == NULL)
            fatal("no method for this class");
        (*obj->class->method[msg])(obj, arg_ptr);
        va_end(arg_ptr);
    }

    /* free the data allocated for an object */
    void free_object(OBJECT *obj)
    {
        free(obj->data);
    }

Listing 4

    /* interface to utility module */

    extern int g_white;
    extern int g_black;

    void fatal(char *s);
    void g_init(void);
    void cleanup(void);
    void g_circle(int y, int x, int radius, int color);
    void g_square(int y, int x, int size, int color);

Listing 5

    #include <stdlib.h>
    #include <stdarg.h>
    #include <stdio.h>
    #include "utility.h"
    #ifdef __ZTC__
    #include <fg.h>
    #else
    #include <graph.h>
    #endif

    int g_white;
    int g_black;

    void fatal(char *s)
    {
        printf("FATAL ERROR: %s\n", s);
        exit(1);
    }

    /* write a formatted trace message to the file "tf" */
    void trace(char *fmt, ...)
    {
        static FILE *outfp = NULL;
        va_list arg_ptr;
        va_start(arg_ptr, fmt);
        if (outfp == NULL) {
            remove("tf");           /* delete any old trace file */
            if ((outfp = fopen("tf", "w")) == NULL)
                fatal("fopen failed\n");
            setbuf(outfp, NULL);
        }
        vfprintf(outfp, fmt, arg_ptr);
        va_end(arg_ptr);
    }

    /* utility function to put screen in graphics mode */
    void g_init(void)
    {
    #ifdef __ZTC__
        fg_init_all();
        g_white = FG_WHITE;
        g_black = FG_BLACK;
    #else
        struct videoconfig this_screen;
        _getvideoconfig(&this_screen);
        switch (this_screen.adapter)
        {
        case _CGA:
        case _OCGA:
            _setvideomode(_HRESBW);
            break;
        case _EGA:
        case _OEGA:
            _setvideomode(_ERESCOLOR);
            break;
        case _VGA:
        case _OVGA:
        case _MCGA:
            _setvideomode(_VRES2COLOR);
            break;
        case _HGC:
            _setvideomode(_HERCMONO);
            break;
        default:
            printf("This program requires a CGA, EGA, MCGA, ");
            printf("VGA, or Hercules card\n");
            exit(0);
        }
    g_white = _getcolor();
    g_black = 0;
#endif
}

/* utility function - wait for a key so we can see
   graphics, set video mode back to character mode */
void cleanup(void)
{
    int ch;
    ch = getchar();
#ifdef __ZTC__
    fg_term();
#else
    _setvideomode(_DEFAULTMODE);
#endif
    /*lint -esym(550,ch) */
}
/*lint +esym(550,ch) */

void g_circle(int y, int x, int radius, int color)
{
#ifdef __ZTC__
    fg_drawarc((fg_color_t)color, FG_MODE_SET, ~0, x, y,
        radius, 0, 3600, fg_displaybox);
#else
    _setcolor(color);
    _ellipse(_GBORDER, x - radius, y - radius, x + radius,
        y + radius);
#endif
}

void g_square(int y, int x, int size, int color)
{
#ifdef __ZTC__
    int hs;
    fg_box_t box;
    hs = size / 2;
    box[FG_X1] = x - hs;
    box[FG_Y1] = y - hs;
    box[FG_X2] = x + hs;
    box[FG_Y2] = y + hs;
    fg_drawbox((fg_color_t)color, FG_MODE_SET, ~0,
        FG_LINE_SOLID, box, fg_displaybox);
#else
    int hs;
    hs = size / 2;
    _setcolor(color);
    _rectangle(_GBORDER, x - hs, y - hs, x + hs, y + hs);
#endif
}

Listing 6

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include "utility.h"
#include "obj.h"

/* methods for graphical_object, circle, double_circle, square */
#define INIT 0
#define DRAW 1
#define MOVE 2

/********************************************************/
/* CLASS GRAPHICAL OBJECT */

struct graph_obj_s {
    int y;
    int x;
};

typedef struct graph_obj_s GRAPH_OBJ_T;
#define GRAPH_OBJ_SIZE sizeof(GRAPH_OBJ_T)
#define GRAPH_OBJ_OFFSET 0

/* graph_obj_init(object, y_position, x_position); */
void graph_obj_init(OBJECT *obj, va_list arg_ptr)
{
    GRAPH_OBJ_T *g;
    g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
        GRAPH_OBJ_OFFSET);
    g->y = va_arg(arg_ptr, int);
    g->x = va_arg(arg_ptr, int);
}

/* graph_obj_move(object, distance_y, distance_x); */
void graph_obj_move(OBJECT *obj, va_list arg_ptr)
{
    GRAPH_OBJ_T *g;
    g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
        GRAPH_OBJ_OFFSET);
    message(obj, DRAW, g_black);
    g->y += va_arg(arg_ptr, int);
    g->x += va_arg(arg_ptr, int);
    message(obj, DRAW, g_white);
}

/********************************************************/
/* CLASS CIRCLE */

struct circle_s {
    int radius;
};

typedef struct circle_s CIRCLE_T;
#define CIRCLE_SIZE sizeof(CIRCLE_T) + GRAPH_OBJ_SIZE
#define CIRCLE_OFFSET sizeof(GRAPH_OBJ_T)

/* circle_init(object, y_position, x_position, radius); */
void circle_init(OBJECT *obj, va_list arg_ptr)
{
    CIRCLE_T *c;
    graph_obj_init(obj, arg_ptr);
    (void)va_arg(arg_ptr, int);
    (void)va_arg(arg_ptr, int);
    c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
    c->radius = va_arg(arg_ptr, int);
    message(obj, DRAW, g_white);
}

/* circle_draw(object, color); */
void circle_draw(OBJECT *obj, va_list arg_ptr)
{
    int color;
    CIRCLE_T *c;
    GRAPH_OBJ_T *g;
    c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
    g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
        GRAPH_OBJ_OFFSET);
    color = va_arg(arg_ptr, int);
    /* g_circle(g->y, g->x, c->radius, va_arg(arg_ptr, int)); */
    g_circle(g->y, g->x, c->radius, color);
}

/********************************************************/
/* CLASS SQUARE (very similar to CIRCLE) */

struct square_s {
    int size;
};

typedef struct square_s SQUARE_T;
#define SQUARE_SIZE sizeof(SQUARE_T) + GRAPH_OBJ_SIZE
#define SQUARE_OFFSET sizeof(GRAPH_OBJ_T)

/* square_init(object, y_position, x_position, size); */
void square_init(OBJECT *obj, va_list arg_ptr)
{
    SQUARE_T *s;
    graph_obj_init(obj, arg_ptr);
    (void)va_arg(arg_ptr, int);
    (void)va_arg(arg_ptr, int);
    s = (SQUARE_T *)((unsigned char *)obj->data + SQUARE_OFFSET);
    s->size = va_arg(arg_ptr, int);
    message(obj, DRAW, g_white);
}

/* square_draw(object, color); */
void square_draw(OBJECT *obj, va_list arg_ptr)
{
    SQUARE_T *s;
    GRAPH_OBJ_T *g;
    s = (SQUARE_T *)((unsigned char *)obj->data + SQUARE_OFFSET);
    g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
        GRAPH_OBJ_OFFSET);
    g_square(g->y, g->x, s->size, va_arg(arg_ptr, int));
}

/********************************************************/
/* CLASS DOUBLE CIRCLE (sub-class of CIRCLE) */

#define DOUBLE_CIRCLE_SIZE CIRCLE_SIZE

/* double_circle_draw(object, color); */
void double_circle_draw(OBJECT *obj, va_list arg_ptr)
{
    int color;
    CIRCLE_T *c;
    GRAPH_OBJ_T *g;
    c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
    g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
        GRAPH_OBJ_OFFSET);
    color = va_arg(arg_ptr, int);
    g_circle(g->y, g->x, c->radius, color);
    g_circle(g->y, g->x, c->radius - 2, color);
}

/********************************************************/

int main(int argc, char **argv);
int main(int argc, char **argv)
{
    int x;

    CLASS graph_obj;
    CLASS circle;
    CLASS square;
    CLASS double_circle;

    OBJECT c1;
    OBJECT s1;
    OBJECT dc1;

    g_init();

    /* make class graphical object */
    new_class(&graph_obj, NULL, 3, GRAPH_OBJ_SIZE);
    reg_method(&graph_obj, INIT, graph_obj_init);
    reg_method(&graph_obj, MOVE, graph_obj_move);

    /* make class circle */
    new_class(&circle, &graph_obj, 3, CIRCLE_SIZE);
    reg_method(&circle, INIT, circle_init);
    reg_method(&circle, DRAW, circle_draw);

    /* make class square */
    new_class(&square, &graph_obj, 3, SQUARE_SIZE);
    reg_method(&square, INIT, square_init);
    reg_method(&square, DRAW, square_draw);

    /* make class double_circle */
    new_class(&double_circle, &circle, 3, DOUBLE_CIRCLE_SIZE);
    reg_method(&double_circle, DRAW, double_circle_draw);

    /* make a circle object */
    new_object(&c1, &circle);
    message(&c1, INIT, 40, 40, 20);

    /* make a square object */
    new_object(&s1, &square);
    message(&s1, INIT, 40, 100, 20);

    /* make a double circle object */
    new_object(&dc1, &double_circle);
    message(&dc1, INIT, 40, 160, 20);

    for (x = 0; x < 100; ++x) {
        message(&c1, MOVE, 1, 1);
        message(&s1, MOVE, 1, 0);
        message(&dc1, MOVE, 0, -1);
    }

    free_object(&c1);
    free_object(&s1);
    free_object(&dc1);

    free_class(&graph_obj);
    free_class(&circle);
    free_class(&square);
    free_class(&double_circle);

    cleanup();

    return (0);
}

Listing 7

#include <stdio.h>
#include <stdarg.h>
#include "utility.h"

/*********************************************************/
/* CLASS GRAPHICAL OBJECT */

class graph_obj {
public:
    int y;
    int x;
    void init(int y, int x);
    void move(int y, int x);
    virtual void draw(int color){};
};

void graph_obj::init(int y2, int x2)
{
    y = y2;
    x = x2;
}

void graph_obj::move(int y_delta, int x_delta)
{
    draw(g_black);
    x += x_delta;
    y += y_delta;
    draw(g_white);
}

/*********************************************************/
/* CLASS CIRCLE */

class circle: public graph_obj {
public:
    int radius;
    void init(int y, int x, int radius);
    void draw(int color);
};

void circle::init(int y2, int x2, int radius2)
{
    graph_obj::init(y2, x2);
    radius = radius2;
    draw(g_white);
}

void circle::draw(int color)
{
    g_circle(y, x, radius, color);
}

/*********************************************************/
/* CLASS SQUARE */

class square: public graph_obj {
public:
    int size;
    void init(int y, int x, int size);
    void draw(int color);
};

void square::init(int y2, int x2, int size2)
{
    graph_obj::init(y2, x2);
    size = size2;
    draw(g_white);
}

void square::draw(int color)
{
    g_square(y, x, size, color);
}

/*********************************************************/
/* CLASS DOUBLE_CIRCLE */

class double_circle: public circle {
public:
    void draw(int color);
};

void double_circle::draw(int color)
{
    g_circle(y, x, radius, color);
    g_circle(y, x, radius - 2, color);
}

/********************************************************/

int main(void);
int main(void)
{
    int x;
    circle c1;
    square s1;
    double_circle dc1;
    g_init();
    c1.init(40, 40, 20);
    s1.init(40, 100, 20);
    dc1.init(40, 160, 20);
    for (x = 0; x < 100; ++x) {
        c1.move(1, 1);
        s1.move(1, 0);
        dc1.move(0, -1);
    }
    cleanup();
    return (0);
}

Tools For MS-DOS Directory Navigation

Leor Zolman

Leor Zolman wrote "BDS C", the first C compiler designed exclusively for personal computers. Since then he has designed and taught programming workshops and has also been involved in personal growth workshops as both participant and staff member. He STILL doesn't hold any degrees. His latest incarnation is as a CUJ staff member.

As an MS-DOS user with a large amount of hard disk space to manage, I frequently find myself cd-ing all over the system in pursuit of source files and data. COMMAND.COM, the standard MS-DOS command processor, offers a repertoire of navigation options that is bare-bones and full of idiosyncrasies. For instance, to change directly to an arbitrary drive and directory, the user must enter the drive selector and path specification as two separate commands.
Switching from the root directory of drive C: to the \work directory on drive D: requires the command sequence:

    C:\>d:              (select D:)
    D:\>cd work         (change to the desired directory)
    D:\WORK>...

(All examples assume the PROMPT environment variable is set to $p$g so that COMMAND.COM will display the current path as part of the system prompt.) If the user attempts to select a different drive and a path with one command, apparently nothing happens:

    C:\>cd d:\work
    C:\>...

Actually, the system has made the specified path active on the specified drive, but it has not made that drive current. The system maintains a current working directory for each logical drive. If the user were then to select that other drive, i.e.,

    C:\>d:
    D:\WORK>...

then the selected path would show up as the current directory.

Another "missing" feature in the standard command environment is a simple directory-name aliasing mechanism, so that one can switch quickly to commonly-used directories even if the path name happens to be lengthy. MS-DOS does provide a simplistic facility (the subst command) to relate an arbitrary path to a new drive designator, but subst isn't really adequate: the alias name is limited to a single letter and there is no facility for viewing all active assignments. I would prefer to have the ability to assign arbitrary mnemonics to arbitrary paths, and to have those mnemonics be recognized when specified in cd commands. I would also like some clean mechanism for instantly switching to the previous directory -- even if I've forgotten what it was.

The Answer

To address these needs, I wrote an extended cd command that supports combined drive and path specifications and a companion command that returns the user to the previous directory (taking the directory specification from information recorded in an environment variable by the extended cd command).
The cd replacement stores the old full path name in an environment variable before switching to the new path, and the companion command reads this variable and returns to the original directory upon request. Since the extended cd must modify its parent's environment, it uses the functions for modifying the master environment which appeared in the July 1989 issue of CUJ.

CDE (for CD Extended) works like MS-DOS's cd command, except for the following special cases:

When both a drive designator and a path name are specified, the specified drive is selected immediately, together with the path.

When the argument is the name of an existing MS-DOS environment variable, the value of that variable is used as the path to switch to.

In support of the "return to previous directory" feature, I decided to implement a "directory stack" mechanism. This stack is maintained via environment variables, and the user may select the naming convention for those variables by customizing the #define statements in UTIL.H. (See Listing 1.)

One master environment variable (I call it CHAINS) specifies the maximum size of the directory stack. When CDE is first invoked, it checks whether CHAINS has already been defined in the environment; if so, its current value is used. If not, CDE initializes CHAINS to a default value (also specified by a definition in the header file). Thus, the user has the option of setting the value of CHAINS explicitly (using the standard built-in command set) or allowing CDE to handle the initialization of CHAINS automatically. (See Listing 2.)

A "stack" of size CHAINS is represented by a set of environment variables that share a common base name (I use CHAIN) with position numbers appended. Thus, with CHAINS=3, after several CDEs the environment variables CHAIN1, CHAIN2 and CHAIN3 would be created to store the pertinent path names in the environment.
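The naming scheme just described can be sketched in a few lines of portable C. This is only an illustration under stated assumptions, not the code from the listings: chain_var_name() and chain_count() are hypothetical helper names, and the rule that an unparsable CHAINS value falls back to 1 is my own assumption rather than something the article specifies.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHAIN_BASE     "CHAIN"  /* base name, as in Listing 1 */
#define DEFAULT_CHAINS "1"      /* default stack size, as in Listing 1 */

/* Build the name of stack slot n (1 = top of stack) in buf.
 * Returns buf, or NULL if the name would not fit in bufsize bytes.
 * (Hypothetical helper, not part of the article's code.) */
char *chain_var_name(int n, char *buf, size_t bufsize)
{
    char digits[16];

    sprintf(digits, "%d", n);
    if (strlen(CHAIN_BASE) + strlen(digits) + 1 > bufsize)
        return NULL;
    strcpy(buf, CHAIN_BASE);
    strcat(buf, digits);
    return buf;
}

/* Work out the stack size from the text of the CHAINS variable.
 * Pass NULL when CHAINS is not defined; the default is then used.
 * Assumption: anything that does not parse as a positive number
 * is treated as a stack size of 1. */
int chain_count(const char *chains_text)
{
    int n = atoi(chains_text != NULL ? chains_text : DEFAULT_CHAINS);

    return n > 0 ? n : 1;
}
```

With CHAINS=3, calling chain_var_name() for n = 1, 2 and 3 yields the names CHAIN1, CHAIN2 and CHAIN3 mentioned above.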
Every time CDE is used to change directories, it "pushes" the old current working directory onto the stack by reassigning all the relevant environment variables. CHAIN1 is always the top of the stack; CHAINn (where n = CHAINS) is the base. Since no disk activity is involved, this process is quite fast.

The RET command (Listing 3) "returns" to the previous directory (the one named by CHAIN1, if it is defined), then "pops" the stack by reassigning all the active environment variables in reverse order. As long as CHAINS is greater than 1, the directory stack behaves as described above and successive uses of RET unravel the stack. When CHAINS is set to 1, RET treats this as a special case: after returning to the directory specified by CHAIN1, it resets CHAIN1 to the name of the directory that was current at the time of the RET call. Thus, repeated uses of RET with CHAINS equal to 1 "toggle" between two directories. Depending on the way your system is organized, this toggling mechanism may be more useful to you than the directory stack mechanism.

Icing

The directory aliasing feature is activated by simply setting an environment variable to the full path desired, then using that environment variable name as a parameter to CDE. For example,

    C:\>set WORK=d:\project\subproj\new\testing
    C:\>cde work
    D:\PROJECT\SUBPROJ\NEW\TESTING>...

As a special case, for convenience, giving the CDE command without any arguments causes CDE to look for a special environment variable (I call it HOME) and switch to the directory it specifies. If you spend much of your time headquartered at one particular directory, this is an easy way to go back to it from anywhere in the system, regardless of the state of the directory stack. The directory that is current when this special form of CDE is given will, as usual, be recorded in the environment in case you want to use RET from the HOME directory.
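The push, pop and toggle behavior described above can be modeled in memory without touching the environment at all. The sketch below is a simplified model, not the code from the listings: DirStack, stack_init(), stack_push() and stack_ret() are hypothetical names, an empty string stands for an undefined CHAINn variable, and clearing the vacated base slot after a pop is an assumption of this sketch.

```c
#include <string.h>

#define MAX_SLOTS 8     /* hypothetical upper bound for this sketch */
#define SLOT_LEN  100   /* room for one directory name */

typedef struct {
    int  size;                        /* plays the role of CHAINS */
    char slot[MAX_SLOTS][SLOT_LEN];   /* slot[0] models CHAIN1; "" = undefined */
} DirStack;

void stack_init(DirStack *s, int size)
{
    memset(s, 0, sizeof *s);
    if (size < 1)
        size = 1;
    if (size > MAX_SLOTS)
        size = MAX_SLOTS;
    s->size = size;
}

/* What CDE does: every entry moves one place toward the base (the
 * oldest one falls off), then old_dir becomes the new top. */
void stack_push(DirStack *s, const char *old_dir)
{
    int i;

    for (i = s->size - 1; i > 0; i--)
        strcpy(s->slot[i], s->slot[i - 1]);
    strncpy(s->slot[0], old_dir, SLOT_LEN - 1);
    s->slot[0][SLOT_LEN - 1] = '\0';
}

/* What RET does: copy the directory to return to into target (which
 * must hold SLOT_LEN bytes) and update the stack. cur_dir is the
 * directory that is current when RET is called. Returns 1 on
 * success, 0 if there is no previous directory. */
int stack_ret(DirStack *s, const char *cur_dir, char *target)
{
    int i;

    if (s->slot[0][0] == '\0')
        return 0;
    strcpy(target, s->slot[0]);
    if (s->size == 1) {               /* special case: toggle */
        strncpy(s->slot[0], cur_dir, SLOT_LEN - 1);
        s->slot[0][SLOT_LEN - 1] = '\0';
        return 1;
    }
    for (i = 0; i < s->size - 1; i++) /* pop: move entries toward the top */
        strcpy(s->slot[i], s->slot[i + 1]);
    s->slot[s->size - 1][0] = '\0';   /* assumption: clear the vacated base */
    return 1;
}
```

With a size of 1, two successive calls to stack_ret() hand back the two directories alternately, which is the toggle. With a larger size, each call hands back the next older entry until none remain.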
When setting environment variables in general, be careful not to type spaces between the end of the variable name and the = sign. DOS would keep the space as part of the variable name, and CDE would then fail to find the variable. The CDE program handles spaces after the = sign (and before the text) with no problem, but it's probably safer to be consistent and use no spaces whatsoever.

Implementation

Both CDE.C and RET.C have two phases of operation: the first phase performs the required drive/directory selection, and the second phase updates the related environment variables. If the first phase fails, the programs exit immediately; there's no need to update environment variables if the current directory wasn't changed.

To obtain the name of the target directory in phase one, RET simply accesses the CHAIN1 environment variable. If the variable does not exist, then CDE has not recorded a previous directory and an appropriate message is displayed. If CHAIN1 exists, it specifies the target path.

CDE.C gets its target path name from the command line. If the name happens to be the name of an active environment variable, then the value of that variable is used as the target path.

The directory selection process itself is identical for both commands and takes two steps: the selection of the logical drive and the selection of the desired directory. The drive is selected first; if that fails, we quit and no harm has been done. Once the new drive has been selected, the new path is selected. If that fails, we have to go back and reinstate the original drive. If it succeeds, we're done with phase one.

Phase two for RET.C is relatively straightforward. If CHAINS is equal to 1, then the CHAIN1 environment variable is set to the original current directory name (before phase one) in order to support the toggling feature. For other values of CHAINS, the directory stack is "popped" by looping to reassign each CHAINn variable to the value of its next higher counterpart.
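The two-step selection described above (drive first, then path, with the original drive reinstated if the path change fails) can be sketched in portable C. This is a minimal sketch and not the code from the listings: DirOps, select_dir() and the fake_* functions are hypothetical names, and the fakes merely stand in for the real drive and directory calls so that the control flow can be tried on any system.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-ins for the platform calls.
 * set_drive and change_path return 0 on success. */
typedef struct {
    int (*get_drive)(void);              /* current drive, 0 = A: */
    int (*set_drive)(int drive_no);
    int (*change_path)(const char *path);
} DirOps;

/* Returns 0 on success, 1 if the drive could not be selected,
 * 2 if the path could not be selected (the drive is restored). */
int select_dir(const DirOps *ops, const char *spec)
{
    int old_drive = ops->get_drive();

    while (*spec != '\0' && isspace((unsigned char)*spec))
        spec++;                          /* skip leading whitespace */

    if (spec[0] != '\0' && spec[1] == ':') {   /* drive designator given */
        if (ops->set_drive(tolower((unsigned char)spec[0]) - 'a') != 0)
            return 1;                    /* nothing changed yet */
        spec += 2;
    }
    if (*spec != '\0' && ops->change_path(spec) != 0) {
        ops->set_drive(old_drive);       /* undo the drive change */
        return 2;
    }
    return 0;
}

/* In-memory fakes for experimenting: drives A: to D: exist and the
 * path \missing does not. (Purely illustrative.) */
int fake_drive = 2;                      /* start on C: */

int fake_get_drive(void) { return fake_drive; }

int fake_set_drive(int drive_no)
{
    if (drive_no < 0 || drive_no > 3)
        return -1;
    fake_drive = drive_no;
    return 0;
}

int fake_change_path(const char *path)
{
    return strcmp(path, "\\missing") == 0 ? -1 : 0;
}
```

Passing the three operations in a structure is only a convenience of the sketch; it lets the rollback logic be exercised without a DOS machine.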
CDE's phase two begins by making sure the CHAINS environment variable, used to specify the stack size, is present and initialized. If it exists, its value is assigned to the program variable chaincnt. If CHAINS does not exist, it is initialized to the default value (specified by the symbolic string constant DEFAULT_CHAINS). Finally the directory stack is "pushed" by copying each CHAINn variable (for n = 1 to CHAINS-1) to its next higher counterpart, working from the base of the stack toward the top so that no value is overwritten before it has been copied. CHAIN1 is a special case; it is assigned the name of the directory that was current before phase one.

Configuration

The following symbolic constants may be changed to suit your own preferences:

    CHAINS_VAR       The master directory chain size control variable
    CHAIN_BASE       The "base" name of directory stack variables
    DEFAULT_CHAINS   The default value for CHAINS_VAR (in quotes)
    HOME_NAME        The name of the env. variable for home directory

The CDE.EXE and RET.EXE commands should be placed in a directory that is somewhere in your system search path. (I use c:\bin for all my personal utilities.)

System-Dependent Functions

The two areas of high compiler-dependency in this application, direct console I/O and DOS logical drive selection, have been isolated in a separate utility library named UTIL.C (Listing 4). The only support function required by the functions in UTIL.C is the bdos function supplied with most popular compiler libraries. If you need to write the bdos function yourself, the prototype is shown at the top of the UTIL.C source file. It takes an interrupt (int 21h) function number, a DX register value, and an AL register value as parameters (although the AL parameter is not needed for this application). The bdos function can easily be written in terms of any of the more general operating system interface functions (int86(), intdos(), etc.) you may have available.
To keep the commands' .EXE files as small as possible, all messages are displayed on the console using direct console I/O calls (through bdos facilities) so that the file I/O support need not be dragged into the link. The UTIL.C functions cputs() and putch() are similar to their namesakes in the Microsoft library and are provided here for the benefit of users of compiler packages that do not include these functions. The setdrive() function I provide is cleaner than Microsoft's _dos_setdrive(): it takes a zero-based drive number and returns non-zero on error. The library functions chdir() and getcwd() are used by the commands and should be available in your compiler's standard library.

When compiled with optimization, both CDE.EXE and RET.EXE weigh in at just over 6K, so their load-and-run time is negligible.

Caveats

The following line in your CONFIG.SYS file will ensure plenty of environment space for the CHAIN variables:

    shell = c:\command.com /p /e:1500

Due to an as-yet unexplained MS-DOS anomaly, specifying too small a value for the environment (xxxx in /e:xxxx) may cause the system to hang up after CDE or RET completes execution. The message I've gotten says something about COMMAND.COM being "invalid". While this has never been destructive, it has required a re-boot of the system. The only way I've found (so far) to avoid this problem is to allocate plenty of extra environment space. If anyone has a more "bulletproof" solution, please let us know here at CUJ.

I highly recommend one modification to the Master Environment package as listed in the 7/89 CUJ: the environment variable name should be converted to upper case in both the m_getenv() and m_delenv() functions. As written, only the m_putenv() function converts the name to upper case, and this causes failure when either m_getenv() or m_delenv() is called with a lower-case variable name.
To make this change, alter the lines reading:

    n = name;

to:

    n = strupr(name);

There is one such line near the beginning of both the m_getenv() and m_delenv() functions.

Linking

The commands to compile and link CDE.C and RET.C using Microsoft C are shown at the top of the source file listings. I arbitrarily named the master environment package ENVLIB.OBJ, so including envlib on the compiler command line links in the object module.

Summary

The CDE and RET commands provide a clean, quick and convenient mechanism for alleviating some of MS-DOS's command processor limitations. Although there are plenty of full-blown command processor replacements, shells and special-purpose TSRs out there (even for free) that offer alternative ways to "get around" your DOS system, few (if any) of these can offer 100% compatibility with all other packages and TSRs, zero bytes of system RAM overhead (unless you count the few extra bytes of environment space required), and virtually instantaneous gratification. And you even get the source code!

Listing 1

/*
 * UTIL.H: Includes and definitions for the CDE/RET
 *         Directory Navigation utilities.
 */

#define MAX_DIRNAME_SIZE   100      /* longest conceivable directory name size  */
#define MAX_EVARNAME_SIZE  20       /* max length of env. var. names created    */
#define DEFAULT_CHAINS     "1"      /* initial default dir. stack size          */
#define CHAINS_VAR         "CHAINS" /* name of env. var. controlling stack size */
#define CHAIN_BASE         "CHAIN"  /* base name of env. vars holding dir names */
#define HOME_NAME          "HOME"   /* Name of 'home dir' environment variable  */

/*
 * Prototypes for utility functions in UTIL.C:
 */

void error(char *msg);
int cputs(char *txt);
int putch(char c);
int setdrive(int drive_no);
int getdrive();
void change_dir(char *newpath);

/*
 * Prototypes for Master Environment Control routines
 * (functions from CUJ 7/89)
 */

char *m_getenv(char *name);
int m_putenv(char *name, char *text);
int m_delenv(char *name);

Listing 2

/*
 * CDE.C: Extended "cd" command for MS-DOS.
 * Written by Leor Zolman, 9/20/89
 *
 * Features:
 *  1) Allows changing to another drive and directory in one step
 *  2) Supports directory aliasing through environment variables
 *  3) With no arguments, optionally switches to 'home' directory
 *     (if the HOME environment variable is currently defined)
 *  4) Manages a "previous directory" stack through environment
 *     variables. The number of entries in the stack is dynamically
 *     configurable through a special controlling environment variable.
 *  5) For special case of stack size = 1, toggling back and forth
 *     between two directories is supported
 *
 * Usage:
 *      cde [d:] [path]      (changes to given drive/directory)
 *      cde <env-var-name>   (indirect dir change on environment variable)
 *      cde                  (changes to HOME directory, if defined, or
 *                            prints current working directory otherwise)
 *
 * Compile/Link:
 *      cl /Ox cde.c util.c envlib   (where ENVLIB.OBJ is Master Env. Pkg.)
 *
 * Uses the Master Environment library from CUJ 7/89.
 *
 */

#include <stdio.h>
#include <dos.h>
#include <string.h>
#include <stdlib.h>
#include "util.h"

int main(int argc, char **argv)
{
    char *pathp;
    char cwdbuf[MAX_DIRNAME_SIZE];      /* buffer for current dir name */
    int chaincnt;                       /* size of dir stack */
    char chaincnt_txt[10], *chaincntp;
    char chnevar1[MAX_EVARNAME_SIZE],   /* env var names built here */
         chnevar2[MAX_EVARNAME_SIZE];
    char chndname_save[MAX_DIRNAME_SIZE], *chndname;
    char itoabuf[10];                   /* used by itoa() function */
    int i;

    /* Get current dir. name and current drive: */
    getcwd(cwdbuf, MAX_DIRNAME_SIZE);

    if (argc == 1)                      /* if no args given, */
    {
        if ((pathp = m_getenv(HOME_NAME)) != NULL)  /* if HOME directory defined, */
        {
            change_dir(pathp);              /* then try to change to it. */
            strcpy(chnevar1, CHAIN_BASE);   /* set top-stack env var */
            strcat(chnevar1, "1");
            if (m_putenv(chnevar1, cwdbuf)) /* to old dir */
                error("Error setting environment variable");
            return 0;
        }
        else
        {                               /* just print current working dir */
            cputs(cwdbuf);
            putch('\n');
            return 0;
        }
    }

    if (argc != 2)
        error("Usage: cde [d:][newpath] or <environment-var-name>\n");

    pathp = argv[1];                    /* change_dir() skips leading whitespace */
    if ((chndname = m_getenv(pathp)) != NULL)   /* if env-var-name given, */
        pathp = chndname;               /* use its value as new path */

    change_dir(pathp);

    /* Read or initialize master chain length variable: */
    if ((chaincntp = m_getenv(CHAINS_VAR)) == NULL)
        if (m_putenv(CHAINS_VAR,
                strcpy(chaincntp = chaincnt_txt, DEFAULT_CHAINS)))
            error("Error creating environment variable");

    /* Update the environment directory chain: */
    chaincnt = atoi(chaincntp);
    for (i = chaincnt; i > 0; i--)
    {                   /* construct name of previous dirname variable: */
        if (i != 1)
        {
            strcpy(chnevar2, CHAIN_BASE);
            strcat(chnevar2, itoa(i-1, itoabuf, 10));
        }

        if ((chndname = ((i != 1) ? m_getenv(chnevar2) : cwdbuf)) != NULL)
        {                               /* copy value of prev. to current */
            strcpy(chndname_save, chndname);    /* m_putenv() bashes it */
            strcpy(chnevar1, CHAIN_BASE);
            strcat(chnevar1, itoa(i, itoabuf, 10));
            if (m_putenv(chnevar1, chndname_save))
                error("Error setting environment variable");
        }
    }
    return 0;
}

Listing 3

/*
 * RET.C: Return to previous working directory
 * Written by Leor Zolman, 9/89
 *
 * (companion to CDE.C)
 * Uses the Master Environment package from CUJ 7/89
 *
 * Usage:
 *      ret     (returns to previous directory)
 *
 * Compile/Link:
 *      cl /Ox ret.c util.c envlib   (ENVLIB.OBJ is Master Environment pkg)
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <dos.h>
#include "util.h"

int main(int argc, char **argv)
{
    char *pathp;
    char cwdbuf[MAX_DIRNAME_SIZE];
    int chaincnt;
    char chnevar1[MAX_EVARNAME_SIZE],   /* env var names built here */
         chnevar2[MAX_EVARNAME_SIZE];
    char chndname_save[MAX_DIRNAME_SIZE], *chndname;
    char itoabuf[10];                   /* used by itoa() function */
    int i;

    /* Get current dir. name and current drive: */
    getcwd(cwdbuf, MAX_DIRNAME_SIZE);

    if (argc != 1)
        error("Usage: ret (returns to last dir cde'd from)");

    if ((pathp = m_getenv(CHAINS_VAR)) == NULL)
        error("cde hasn't been run yet");
    else
        chaincnt = atoi(pathp);

    /* See if CDE has created any entries: */
    strcpy(chnevar1, CHAIN_BASE);
    strcat(chnevar1, "1");
    if (!(pathp = m_getenv(chnevar1)))  /* if so, pathp points to last dir */
        error("No previous directory"); /* else no previous dir */

    change_dir(pathp);                  /* change to previous directory */

    /* Update the environment directory chain: */
    if (chaincnt == 1)                  /* special case: record old dir */
    {
        if (m_putenv(chnevar1, cwdbuf))
            error("Error setting environment variable");
        return 0;
    }

    for (i = 1; ; i++)
    {                                   /* get name of current dirname variable */
        strcpy(chnevar1, CHAIN_BASE);
        strcat(chnevar1, itoa(i, itoabuf, 10));
        strcpy(chnevar2, CHAIN_BASE);
        strcat(chnevar2, itoa(i + 1, itoabuf, 10));
        if (!(chndname = m_getenv(chnevar2)))
            break;                      /* found end of saved chain */
        /* copy value of next higher to current */
        strcpy(chndname_save, chndname);    /* m_putenv() bashes it */
        strcpy(chnevar1, CHAIN_BASE);
        strcat(chnevar1, itoa(i, itoabuf, 10));
        if (m_putenv(chnevar1, chndname_save))
            error("Error setting environment variable");
    }
    return 0;
}

Listing 4

/*
 * UTIL.C: Utility functions for CDE/RET package
 *
 * These functions rely on the "bdos" library function
 * from your compiler's library. Prototype:
 *
 *      int bdos(int dosfn, unsigned dosdx, unsigned dosal);
 */

#include <stdio.h>
#include <stdlib.h>
#include <dos.h>
#include <ctype.h>
#include "util.h"

/*
 * Print error msg and abort:
 */

void error(char *msg)
{
    cputs("cde: ");
    cputs(msg);
    putch('\n');
    exit(-1);
}

/*
 * Change to specified drive/path, terminate program on error:
 */

void change_dir(char *new_path)
{
    int old_drive;

    old_drive = getdrive();

    while (*new_path && isspace(*new_path))     /* skip whitespace */
        new_path++;

    if (*new_path && new_path[1] == ':')    /* if drive designator */
    {                                       /* given, then set drive */
        if (setdrive(tolower(*new_path) - 'a'))
            error("Can't select given drive\n");
        new_path += 2;
    }

    if (*new_path && chdir(new_path))   /* If path given, set new path. */
    {
        setdrive(old_drive);            /* If error, restore drive */
        error("Can't change to given path");
    }
}

/*
 * DOS functions, written in terms of the "bdos" function:
 */

int cputs(char *txt)            /* display msg, console I/O only */
{
    char c;

    while ((c = *txt++) != '\0')
    {
        if (c == '\n')
            putch('\r');
        putch(c);
    }
    return 0;
}

int putch(char c)               /* display a char on console */
{
    return bdos(2, c, 0);
}

int setdrive(int drive_no)      /* set logical drive. Return */
{                               /* non-zero on error. */
    int after;

    bdos(0x0E, drive_no, 0);
    after = bdos(0x19, 0, 0);
    if ((after & 0xff) == drive_no) /* low 8 bits are new drive no. */
        return 0;                   /* set correctly */
    else
        return -1;                  /* error. */
}

int getdrive()                  /* return current logical drive */
{
    return bdos(0x19, 0, 0);
}

Dealing With Memory Allocation Problems

Dear Mr. Ward:

I am not much of a letter writer, but after reading the July 89 issue of The C Users Journal I felt I could save some of your readers a lot of time tracking down a problem with the Microsoft C, version 5.10 memory allocation routines. Enclosed is a listing and the output from the program. This may help Steven Isaacson who is having memory allocation problems using Vitamin C.

I found this problem after a week of tracking down a memory leak problem in a very large application. My final solution was to write my own malloc()/free() routines that call DOS directly. This will let the DOS allocator do what it is supposed to do. No time penalty was noticed in our application.

Note if you do write your own malloc()/free() routines, call them something else! MSC uses these routines internally and makes assumptions about what data is located outside the allocated area. I always use a malloc()/free() shell to test for things like memory leaks and the free of a non-allocated block. It also will give you an easy way to install a global 'out of memory' error handler.

The code supplied by Leonard Zerman on finding the amount of free space in a system is simplistic and very limited. A better routine would build a linked list of elements and then the variable vptrarray could be made a single pointer to the head of the list. The entire routine becomes dynamic, much more robust, and there is no danger of overflowing a statically allocated array. See the supplied code for an example.

The linked list implementation has the side effect that it will work on a virtual memory system. Why you would want to do this is beyond me, but it could be considered a very time consuming way to find out what swapmax is set to on a UNIX system.

If you have any questions, please contact me. My phone number is (408) 988-3818. My fax number is (408) 748-1424.
Sincerely yours,

Jim Schimandle
Primary Syncretics
473 Sapena Court, Unit #6
Santa Clara, CA 95054

Editor's Note: If you couldn't find "Listing 1" in last month's "We Have Mail", you needn't fear the onset of any perceptual disorder -- there was no Listing 1. Usually publishers blame this kind of problem on someone else -- the printer, the typesetter, the proofreader, the paste-up artist. Unfortunately this publisher doesn't have any convenient scapegoats; I pasted up the letters section (something I often do), and failed to include the listing. Anyway, here is the original letter and the promised listing. This time it will be right -- my staff is doing it. --rlw

Listing 1

/*----------------------------------------------------------------------
++
membug.c        Demonstrate MSC malloc() large size problem

Description

    membug.c demonstrates a problem that occurs when Microsoft C,
    version 5.10 is used to allocate and free large blocks of memory.
    If this program is compiled and run, you will find that the
    first list will have significantly more memory allocated to it.
    The second list will only have 1 to 2 elements allocated to it,
    depending on your memory layout.

    The basic problem is that MSC never deallocates a DOS allocated
    memory block, even if the memory call is about to fail. Thus,
    the first list causes the MSC runtime to allocate memory in 48K
    blocks. When the first list is freed, the 48K blocks remain.
    Then, when the second list is allocated, there are only 2 blocks
    that DOS can carve the 60K blocks from: the default memory
    segment and the last DOS memory block. The default memory
    segment is 64K, so we should always get an allocation from it.
    The last memory block can be expanded by DOS to fit the 60K
    request if your memory layout will allow it.

    Note that if you reverse the order of memory requests, both will
    return the same number of memory blocks because the 48K requests
    will fit in the 60K blocks.

Compilation

Compilation is under Microsoft C, version 5.1 using the command:

    cl /W3 /AL membug.c

Execution

Execution of the program should use the command line:

    membug > membug.out

+-
$Log$
--
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dos.h>

/* Local definitions */
/* ----------------- */
#define FIRST_ALLOC_SIZE   48000
#define SECOND_ALLOC_SIZE  60000

/* Memory allocation list structure */
/* -------------------------------- */
typedef struct mb                 /* Memory list node             */
    {                             /* ---------------------------- */
    struct mb * mb_next ;         /* Pointer to next block        */
    char        mb_data ;         /* Start of data area           */
                                  /* Actual data area size is     */
                                  /* determined by runtime        */
                                  /* malloc() argument            */
    } MEM_BLOCK ;

/* Pointer conversion macros */
/* ------------------------- */
#define FARPTR_SEG(a)         ((int) (((unsigned long) (a)) >> 16))
#define FARPTR_OFF(a)         ((int) ((long) (a)))
#define MAKE_FARPTR(seg,off)  ((void far *) ((((long) (seg)) << 16) + (off)))

/* Function prototypes */
/* ------------------- */
void main() ;
void DOS_Mem_Display(char *) ;

/*--------------------------------------------------------------------
+
main    Entry point for MSC dynamic memory test

Usage

void main()

Parameters

None

Description

main() is the entry point for the Microsoft C dynamic memory test. The function allocates a list of FIRST_ALLOC_SIZE elements, frees the first list, allocates a second list of SECOND_ALLOC_SIZE, and frees the second list. The statistics printed out are the total bytes allocated by each allocation and a dump of the DOS memory list after each allocation/free.

Notes

None
-
*/

void main()
{
    MEM_BLOCK * list ;
    MEM_BLOCK * p ;
    long        first_size ;
    long        second_size ;

    /* Allocate list using first allocation size */
    /* ----------------------------------------- */
    list = NULL ;
    first_size = 0 ;
    while ((p = (MEM_BLOCK *) malloc(FIRST_ALLOC_SIZE)) != NULL)
        {
        p->mb_next = list ;
        list = p ;
        first_size += FIRST_ALLOC_SIZE ;
        }

    /* Print first allocation results */
    /* ------------------------------ */
    printf("***** First allocation - %ld *****\n\n", first_size) ;
    DOS_Mem_Display("After first allocation\n") ;

    /* Free first list */
    /* --------------- */
    while (list != NULL)
        {
        p = list ;
        list = list->mb_next ;
        free(p) ;
        }
    DOS_Mem_Display("After first free\n") ;

    /* Allocate list using second allocation size */
    /* ------------------------------------------ */
    list = NULL ;
    second_size = 0 ;
    while ((p = (MEM_BLOCK *) malloc(SECOND_ALLOC_SIZE)) != NULL)
        {
        p->mb_next = list ;
        list = p ;
        second_size += SECOND_ALLOC_SIZE ;
        }

    /* Print second allocation results */
    /* ------------------------------- */
    printf("***** Second allocation - %ld *****\n\n", second_size) ;
    DOS_Mem_Display("After second allocation\n") ;

    /* Free second list */
    /* ---------------- */
    while (list != NULL)
        {
        p = list ;
        list = list->mb_next ;
        free(p) ;
        }
    DOS_Mem_Display("After second free\n") ;
}

/*--------------------------------------------------------------*/
DOS_Mem_Display

        psp_seg = *(p+1) + ((*(p+2)) << 8) ;
        blk_paras = *(p+3) + ((*(p+4)) << 8) ;
        size = ((long) blk_paras) << 4 ;
        if (psp_seg == 0)
            {
            prg = (unsigned char far *) "(free)" ;
            total += size ;
            }
        else
            {
            ip = (unsigned int far *) MAKE_FARPTR(psp_seg, 0x2c) ;
            prg = MAKE_FARPTR(*ip, 0) ;
            while (*prg != '\0')
                {
                prg += strlen((char *) prg) + 1 ;
                }
            prg += 3 ;
            }
        sprintf(str, "%5d %9ld %p", idx++, size, p) ;
        printf("%s\t%s\n", str, prg) ;
        if (*p == 'Z')
            {
            break ;
            }
        p = MAKE_FARPTR(FARPTR_SEG(p) + blk_paras + 1, 0) ;
        }
    sprintf(str, "Total Free: %ld", total) ;
    printf("%s\n\n", str) ;
}
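
The letter above recommends routing all allocation through a malloc()/free() shell that catches leaks, catches the free of a non-allocated block, and gives one place to hang an out-of-memory handler. The following is only a minimal sketch of that idea in portable C, not Mr. Schimandle's code: the names mem_alloc, mem_free, mem_outstanding, mem_bad_frees, out_of_memory and the fixed-size table MAX_TRACKED are all assumptions made for illustration, and a real shell would more likely use a linked list, as the letter suggests for the free-space routine.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 256             /* hypothetical table size */

static void *tracked[MAX_TRACKED];  /* blocks currently outstanding */
static int   bad_frees = 0;         /* frees of blocks we never handed out */

/* Hypothetical global out-of-memory handler: report and give up. */
static void out_of_memory(size_t size)
{
    fprintf(stderr, "out of memory requesting %lu bytes\n",
            (unsigned long) size);
    exit(EXIT_FAILURE);
}

/* Allocate through the shell and remember the block. */
void *mem_alloc(size_t size)
{
    void *p = malloc(size);
    int   i;

    if (p == NULL)
        out_of_memory(size);
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == NULL) {
            tracked[i] = p;
            return p;
        }
    }
    fprintf(stderr, "allocation tracking table is full\n");
    exit(EXIT_FAILURE);
}

/* Free through the shell; refuse blocks the shell did not allocate. */
void mem_free(void *p)
{
    int i;

    if (p == NULL)
        return;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == p) {
            tracked[i] = NULL;
            free(p);
            return;
        }
    }
    bad_frees++;                    /* never passed on to free() */
    fprintf(stderr, "free of non-allocated block %p\n", p);
}

/* Number of blocks allocated but not yet freed (a leak check). */
int mem_outstanding(void)
{
    int i, n = 0;

    for (i = 0; i < MAX_TRACKED; i++)
        if (tracked[i] != NULL)
            n++;
    return n;
}

/* Number of attempts to free a block the shell never allocated. */
int mem_bad_frees(void)
{
    return bad_frees;
}
```

Calling mem_outstanding() just before the program exits gives a crude leak count; anything other than zero means some block was never freed. Note that the shell uses its own names, in keeping with the letter's advice not to replace malloc() and free() themselves.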
/*--------------------------------------------------------------------*/ Standard C Quiet Changes, Part I P.J. Plauger P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standard committee. A language standards committee can commit a variety of sins. It can eliminate existing features, so that existing programs that use them generate diagnostics with new translators. It can add lots of new features, so that existing programs trip over them and generate diagnostics. It can even redefine existing features, so that existing programs apparently misuse them and generate diagnostics. All of these are nasty things to do. A committee that indulges in such sins had better be prepared to justify its actions. Discarded features must be arguably dangerous, or at least not worth the clutter they cause by remaining in the language. Added features must fill a real need and not add to the clutter. Changed features require the most justification of all, since they cause the greatest disturbance. So long as changes cause diagnostics, however, you can live with them. Even if you have to convert half a million lines of existing C code, you know how to proceed. Stuff your code through the new translator and see where it gripes. For very common gripes, you can often contrive a global edit that will mechanically fix up the code. For the rest, you at least have your attention forcibly directed to the areas where you must manually intervene. The worst sin of all for a language standards committee is to make a change that does not cause a diagnostic. You have a working program with your existing C translator. You upgrade to a standard C compiler and your program quietly recompiles. The only problem is, it behaves differently. That is a project manager's worst nightmare. Even if you generally like the new behavior, you have a serious problem on your hands. 
That half a million lines of existing C code may change its behavior in only a handful of places. You cannot rashly assume that the new behavior is acceptable every place. (Probably it is not.) You need to locate every place and check the implications of the change. Committee X3J11 dubbed such alterations "quiet changes." We blanched every time we faced the prospect of introducing one. We did our best to avoid them. Nevertheless, we occasionally found compelling reasons to adopt quiet changes along with various other subtle but noisy changes. So we made sure that we documented every quiet change we made in the Rationale that accompanies the Standard. I discussed the most ambitious of these quiet changes last year. (See "Standard C Promotes Types According to Value Preserving Rules," CUJ August '88.) The rules for mixing signed and unsigned integer operands in an expression were, in the past, both subtle and varied. The Committee discussed the different approaches at length before choosing a particular set of "promotion" rules. I did my best to present all the arguments and to justify the choice we eventually made. This column and the next endeavor to summarize all of the quiet changes made in Standard C. They may not affect you because there have been numerous dialects of C in past practice. (That's a principal reason for making a language standard, to eliminate dialects.) We labeled something a quiet change if any significant dialect of C quietly changed meaning. The change may not affect your favorite dialect. Nevertheless, you should be aware of any possibility of a quiet change in C code. Who knows, you may already have a lurking problem in code moved from a different implementation of C. In the explanations that follow, I have copied the description of each quiet change almost verbatim from the Rationale for Standard C. They appear in the same order as in the Rationale, which reflects the order of topics presented in the Standard.

The Quiet Changes

"Programs with character sequences such as ??! in string constants, character constants, or header names will now produce different results."

For example,

    printf ("You said what??!\n");

quietly becomes

    printf ("You said what|\n");

This is the result of introducing trigraphs. The committee felt a compelling need to provide a way to represent certain characters unavailable in EBCDIC or the invariant subset of ISO 646. (The characters are [\]^{|}~#.) The alternate forms had to be representable using just the common subset of characters. They also had to be usable within character constants, string literals, and header names. Since existing programs can conceivably contain an arbitrary sequence of characters in these places, we had no way to satisfy these basic requirements without introducing the possibility of a quiet change. We settled on trigraphs, or three-character sequences, as a compromise. Digraphs might be easier to type, but were more likely to change the meaning of older programs. (C uses all of the characters in the subset, so even code outside quotes and headers is endangered.) Each trigraph begins with two question marks, to minimize the chance of a quiet change. It ends with a character from the subset that is designed more or less to suggest the replacement character. Nobody pretends that ??< is a highly readable alternative to {. But then nobody prevents you from filtering your C code before you send it to a printer. (You might, for example, overstrike a left parenthesis and a minus sign to print a left brace instead of printing the actual trigraph.) Trigraphs serve the limited purpose of providing a minimal interchange standard for shipping C between countries. (Even the Danes, who are adamant that trigraphs are insufficient, have offered no alternative to their use within quotes and header names.)
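
To see exactly which character sequences are at risk, it may help to model the replacement a Standard C translator performs before it does anything else with the source text. The sketch below is only an illustration written for this purpose; the function names trigraph_char and replace_trigraphs are assumptions, not part of any library. It applies the nine trigraph mappings to an ordinary string at run time.

```c
#include <stddef.h>
#include <string.h>

/* Given the third character of a candidate trigraph, return the
   character it stands for, or '\0' if it does not complete one. */
static char trigraph_char(char c)
{
    static const char from[] = "=(/)'<!>-";
    static const char to[]   = "#[\\]^{|}~";
    const char *p;

    if (c == '\0')
        return '\0';
    p = strchr(from, c);
    return (p != NULL) ? to[p - from] : '\0';
}

/* Copy src to dst, replacing every trigraph by its single character.
   dst must have room for strlen(src) + 1 characters. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        char r;

        if (src[0] == '?' && src[1] == '?' &&
            (r = trigraph_char(src[2])) != '\0') {
            *dst++ = r;          /* three characters collapse to one */
            src += 3;
        } else {
            *dst++ = *src++;     /* everything else is copied as is */
        }
    }
    *dst = '\0';
}
```

Running the text of an old string literal through replace_trigraphs and comparing the result with the original is one crude way to find literals that will quietly change. A pair of question marks followed by any other character is left alone, which is why only a handful of sequences are affected.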
"A program that depends upon internal identifiers matching only in the first (say) eight characters may change to one with distinct objects for each variant spelling of the identifier." For example, int get_stuff_DEF; f() { extern int get_stuff_REF; return (get_stuff_REF); } A clever programmer may expect that all the names beginning with get_stuff refer to the same data object. That is no longer true. There was widespread support for longer names in C. The eight-character significance limit inherited from Ritchie's original implementation is certainly inadequate. Worse, implementations differed on the treatment of "insignificant" characters in a name. (Is an implementation obliged to ignore the extra characters when comparing names? Or is it merely permitted to ignore them?) Further confusing the issue was the distinct, and more severe, limit on external names imposed by old-fashioned linkers. The committee decided on a three-tiered limitation on names. First, any name can be as long as a logical line. An implementation can choose to inspect all characters when comparing names. Second, an implementation must inspect at least the first 31 characters. It can choose to look at no more than 31 characters. Finally, an implementation may require that external names differ in the first six characters, and ignore case distinctions. These rules were adopted despite a few notorious cases cited of existing programs that would quietly change. It seems that some implementations ignore characters after the first eight. Some programmers have made a practice of intentionally punning by writing distinct names that are intended to compare equal. I don't recall the rationale for this practice and I don't care. The practice is sufficiently barbaric that it garners little sympathy, even if it can be the victim of a quiet change. 
"A program relying on file scope rules may be valid under block scope rules but behave differently -- for instance, if d_struct were defined as type float rather than struct data in the following example:" typedef struct data d_struct { /* ... */ }; first() { extern d_struct func(); /* ... */ } second() { d_struct n = func(): } (This example from the Rationale is not wonderful. I even had to fix a small bug in reproducing it here.) At issue here is the clash between C as a block scoped language and C as a "flat" language with separately compiled modules. The former requires that names be forgotten at the end of the scope in which they are defined. The latter requires that external names be remembered and matched up across separate compilations. Past implementations differ widely on the treatment of extern declarations within function bodies. Do such declarations percolate out, a block at a time, to file level so they can be matched up with any other file-level declarations for the same name? Or does each such declaration form a worm-hole out to the linker, with the worm-hole forgotten at the end of the block? Or does something even more bizarre occur? The example above can give different results with different interpretations. In the first case, the declaration of func percolates out from the first function. It is then visible within the second function, so the assignment makes sense. In the second case, the declaration of func goes out of scope at the end of the first function. The second function must assume that func is implicitly declared as an external function returning int. In this case, you get a diagnostic. But change the type definition to float, as the Rationale suggests, and you get a quiet (but erroneous) conversion across the assignment. Like the previous issue on identifier lengths, here is a case where a quiet change is essentially unavoidable. Existing dialects differ too much for the standard to contain a common subset of behavior. 
What the committee chose, in fact, was the second behavior. C is a block structured language with holes blown in it. A translator can diagnose conflicting external declarations within a translation unit. It can also elect not to do so, since this is a case of "undefined behavior." A linker can diagnose conflicts between separate compilations. It can also elect not to do so. In practice, most compilers and few linkers will choose to diagnose such conflicts.

"Unsuffixed integer constants may have different types. In K&R, unsuffixed decimal constants greater than INT_MAX, and unsuffixed octal or hexadecimal constants greater than UINT_MAX, are of type long."

For example, on an implementation where type int occupies 16 bits,

    f(32768);           /* argument now 16 bits */
    i = 0xFFFFF / -10;  /* divide now unsigned */

This is part of the fallout of choosing value-preserving rules for promoting types in expressions (discussed later). The committee felt obliged to tidy up the typing rules for integer constants, to maintain a consistent philosophy toward preserving the expected value of a sub-expression. Ritchie's original rules required that 32768 have type long on an implementation where type int occupies 16 bits. That led to occasional surprises, particularly when writing arguments on function calls. (There were no function prototypes in those days to fix up or diagnose improper argument types.) With value-preserving promotion rules, however, you get the expected result more often by making 32768 type unsigned int. And such a choice is more consistent with the basic philosophy of choosing the "cheapest" type that preserves the value of an expression. Similarly, octal and hexadecimal integer constants are expected to be unsigned. It is silly for one to lose its unsignedness just because its value is too large to be represented as type int. Consistency requires that 0x10000 (on an implementation where type int occupies 16 bits) have type unsigned long instead of long.
In both cases, you can contrive programs that quietly change meaning with the change of typing rules for integer constants. The committee felt, however, that such programs were already at risk in being moved among existing dialects, which supported a variety of promotion rules. "A constant of the form '\078' is valid, but now has different meaning. It now denotes a character constant whose value is the (implementation-defined) combination of the values of the two characters '\07' and '8'. In some implementations the old meaning is the character whose code is 078 == 64." This is a consequence of now disallowing the digits 8 and 9 in octal escape sequences. Even the earliest C compilers have tolerated the practice, and more than a few programs have taken advantage of this tolerance. Nevertheless, the committee felt it was sufficiently barbarous that it had to be dropped. (The committee did not revoke the even more barbarous license to write 111l in place of 111L.) "A constant of the form '\a' or '\x' now may have different meaning. The old meaning, if any, was implementation defined." For example, char letter = 'a'; if (letter == '\a') /* no longer same as 'a' */ The backslash is no longer ignored in front of an arbitrary letter. Worse, Standard C now gives special meaning to \a. The committee felt obliged to add to the list of character escape sequences. The sequence \a stands for the "alert" character. In ASCII, it is the BEL code that rings the bell on old Teletype terminals and makes some sort of electronic beep on modern ones. The sequence \x signals the start of a hexadecimal escape sequence of arbitrary length. Neither of these escape sequences was officially defined in the past. There was the general promise that a backslash before a character with no magic meaning simply stood for that character. (I had, in fact, written a number of strings that used \x as a place holder to be filled in. That was my tough luck.) 
Nevertheless, the addition could cause a quiet change. "A string of the form "\078" is valid, but now has different meaning." See above for the same issue with character constants. The only difference is that the string literal gets longer. Character constants pack all the character codes into a single int value, in an implementation-defined manner. "A string of the form "\a" or "\x" now has different meaning." See above for the same issue with character constants. "It is neither required nor forbidden that identical string literals be represented by a single copy of the string in memory; a program depending upon either scheme may behave differently." For example, char *s = "abc"; ..... if (s != &"abc"[0]) printf("s has changed\n"); The printed message is correct only if both instances of "abc" become the same data object. This is not guaranteed in Standard C. Here is another case where existing dialects of C were in conflict. Some dialects guarantee that identical string literals are represented by a single copy within a translation unit. Others guarantee that each string literal occupies distinct storage. The committee chose to leave the choice up to the implementation. It is "unspecified," so the implementation need not document the choice or even be consistent in how it chooses. (Another example of unspecified behavior is the order in which a program evaluates multiple arguments on a function call.) Naturally, any program that depends on some particular behavior is likely to be disappointed by some conforming implementation. "Expressions of the form x=-3 change meaning with the loss of the old-style assignment operators." For example, i =-3; /* now stores -3 */ It has been many years since UNIX C reversed the assigning operators. Where now you write -= you once wrote =- as in the example above. Programmers who are stingy or haphazard with spacing around operators got burned often enough that Ritchie switched C to match the Algol 68 convention. 
Nevertheless, a number of commercial C compilers retained the old forms for backward compatibility with early C code. Disallowing the old forms can, of course, lead to all sorts of nasty puns. Those who didn't bite the bullet back in the seventies must do so now. Intermission That's about half of the quiet changes documented in the Rationale for the C standard. Tune in next month for the rest of the story. Doctor C's Pointers (R) Header Design And Management Rex Jaeschke Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex. All too often, programs just "happen." There is little if any serious design done, and programmers "design on the fly", using an approach I call stepwise refinement. That is, you code a bit and test it then iteratively refine it till it's somewhere close to what you think you want. And after you have hard-coded the same macro definitions and function declarations in ten different places you think perhaps it would be a good idea to create a header instead. However, this either doesn't get done or it's done at the local level to solve just the particular problem in the code you are currently working on. For the most part, I find people program defensively. Designing and managing headers is an integral part of a C project design. It must be done before any code is written to ensure that the design is consistent, can be managed easily, and that a high degree of quality assurance can result. The lack of properly designed headers is a likely recipe for added development, debugging, and maintenance time, as well as significantly reduced reliability. 
There are many aspects to designing headers. In this article I will look at those I've recognized. However, before I begin, a definition of the term header is in order. I think you all know what a header is, but for the purposes of this discussion, I will consider a header to be a collection of declarations that can be shared across multiple source files via the #include preprocessing directive. And while a header is typically represented as a source code file on disk, it need not exist as such. For example, a header might actually be built into the compiler (at least the standard ones like math.h could be) or it could be compiled into some binary form that the preprocessor can more easily or efficiently handle. The specific representation details are left to the implementer's choice and will not be further discussed here. As such, I prefer to use the term header rather than header file or include file since the last two names imply a file representation. Whatever term you use, be consistent.

Header Categories

There are four categories into which headers can be classified: standard, system, library, and application. A standard header is one of the 15 defined by ANSI C, such as stdio.h, math.h, and string.h. ANSI requires you to include standard headers using the notation #include <header.h>. Do so even if #include "header.h" appears to work for them. A standard header is stored in some special place such that it can be accessed from all places in which a source file can be compiled. A system header is one supplied by the compiler vendor that can be used to interface to and/or exploit the host hardware and/or operating system. Examples on MS-DOS systems include bios.h and dos.h; on VAX/VMS, headers rms.h, rab.h, and fab.h are used to access the RMS file system; and on UNIX, the special set sys/*.h is provided. An implementer can provide as many system headers as he needs. VAX C, for example, comes with about 200.
Since system headers are useful to all applications, they are typically stored in the same place as standard headers. A library header is one provided with a third-party library such as a windows, graphics, or statistical package. Again, a product may include many headers and you may use a number of different libraries in the same application. Library headers are also universally shareable and will likely reside with standard and system headers. An application header is one you design for a particular application and as such, it should be located in a place separate from headers in the other three categories. It is possible, however, that over the course of designing an application, you build a header that is useful beyond the life of the current system. This header, then, should really be treated as a miscellaneous library header. If each programmer on the project develops his own private miscellaneous headers, naming conflicts can easily arise, so you must ensure that private headers are not used. During testing stages of a project, it can be very tempting to provide a quick (and often dirty) fix to a given problem by simply changing a header and recompiling the offending source module. However, this can cause other nasty side-effects later on when the system as a whole is rebuilt. Also, you must never, never, ever even think of changing a standard, system, or library header; these are sacred. For example, you might discover you need macros called TRUE and FALSE in several modules and since stdio.h is included in all of them, why not simply add definitions for these macros to that header? After all, it can't hurt any existing uses of these headers, can it? Apart from reflecting bad style, these changes are lost when you next (re)install the compiler. One solution to this is to make all headers, including application headers that have been moved to production, read-only.
That way, if you should ever try to change or overwrite them you are reminded of the seriousness of such an action.

Header Names

ANSI C requires the standard header names to be written in lower case. Do so even if your file system is case insensitive (as is the case with MS-DOS and VAX/VMS). In fact, ANSI does not require that filenames of the form header.h be supported by your file system. The compiler must accept #include <stdio.h>, but is allowed to map the period or any other part of that header name to other characters. The convention of naming headers with a .h suffix is exactly that, a convention, and need not be followed by user-written headers. Certainly, it's a useful default convention if you have no good reason to do otherwise. If you wish to port code, keep in mind that the length of significance, case distinction, and format of filename (assuming a header is a file), are all implementation-defined. It is generally considered bad style to specify device and/or directory information in a header name. Considering that almost all compilers provide compile-time options and/or environment variables to specify an include search path, I see no reason to unduly reduce your flexibility options.

Header Contents

Just what should go in a header and how big should headers be? It is relatively easy to answer the "what." If something cannot be shared, it does not belong in a header. For the record, candidates for inclusion in a header are: macros, typedefs, templates for structures, unions, and enumerations, function prototypes, extern data declarations, and preprocessing directives. Placing anything else in a header needs careful scrutiny. In particular, including executable code that is not inside a macro definition is very bad style. My rule of thumb is to put all related stuff together in one header. However, if that makes for a very large header and the contents can easily be broken into logical subsets, then I prefer each subset be in its own header.
It's useful to give such headers names with the same prefix so you can easily determine they are related. The only difference here is whether the preprocessor has to process one big header instead of just those parts it needs. Don't get too hung up on worrying how much work the preprocessor has to do unnecessarily since that's what CPU cycles are for. In fact, in the extreme case where you put each declaration in its own header, the preprocessor won't need to do any extra work, except for opening and closing all those headers. It's quite likely that, while most things will fit neatly into related groups each in a header, some miscellaneous bits will be left over. About the only way to handle these reasonably is a miscellaneous header. ANSI C has one of these, called stddef.h. Whatever organization you choose, everything that can be shared should be shared. That is, you should make sure that all macros, function prototypes, etc., are part of some header and not hard-coded in source files directly. Each header should be self-contained. If one header refers to something in another header, the first should directly include the second. Forcing the programmer to know and remember the order in which related headers need be included is burdensome and unnecessary.

Protecting Header Contents

It is very likely that in some source modules you will include the same header multiple times, once directly and one or more times indirectly via other headers. Since everything in a header is supposed to be shareable, there should be no problem in processing the same header multiple times except the extra work of preprocessing. Right? Well, that's not quite true. Specifically, if the same typedef or structure, union, or enumeration template definition is seen more than once, the compiler produces an error, so they must be somehow protected.
The best way to achieve this is to place a conditional compilation protective wrapper around the whole header as follows:

    /* header local.h */
    #ifndef LOCAL_H
    #define LOCAL_H
    ...
    #endif

I prefer to use a macro spelled in upper case the same as the header, along with a suffix of _H. This naming convention is easy to understand and is very unlikely to be used for other macros elsewhere in the set of headers. A name like LOCAL could easily be used as a different macro elsewhere, leading to confusion. Since the standard headers can also be included multiple times and some of them contain typedefs and structure templates, these too must be protected. Check those provided with your compiler to see if they indeed are protected. The only difference between your wrapper and that used by the standard headers is that you must not begin your private macro name with an underscore while they must, since that's the implementer's namespace. It is preferable to have each thing defined in one, and only one, header. However, for various reasons it may be desirable to duplicate something in multiple headers. The problem here is to make sure that all of those headers containing duplicates can be included at the same time. For example, consider the case of having a typedef for count in two headers as in Listing 1. You should also check your standard headers for this kind of protection since size_t, the type of the result of the sizeof operator, is required to be typedefed in five of them. Note that ANSI C places strict rules on whether a standard header can include another standard header. Most identifiers defined in a standard header are only "reserved" if their parent header is included. For example, if you don't include one of the six standard headers that define NULL, you are perfectly safe in defining your own identifier NULL even though it would be bad style.
So, if assert.h includes stdio.h, all the names in stdio.h would become defined as well, even though they are not defined in assert.h. And while assert.h could contain #undefs to remove these, there is no way for it to remove any typedefs or template definitions. Many mainstream compilers claiming ANSI conformance or claiming to be tracking the ANSI standard break this rule. As such, they are not ANSI-conforming. Check your standard headers for this.

Conditional Inclusion

There are a number of ways to conditionally include headers as necessary. Perhaps the best is to conditionally compile a subset of #include directives inside a header, based on the existence or value of a macro defined using a compiler option. That is, the compilation path is specified outside all source modules. This way, you can trigger any possible conditional compilation path from as few as one macro. You also have the ANSI invention of #include macro where macro must expand to a header name of the form <...> or "...". You also can use the stringize and token pasting preprocessor operators # and ## respectively, to construct a macro that is to expand to a header name. I have also found that it is a good idea to remove as many preprocessing directives as possible from source modules into headers. In particular, I find conditional compilation directives in source code to be most distracting, especially when there are more than two compilation paths. The aim is to isolate such dependencies into headers so you can forget about them and get on with the business of implementing or maintaining the application. An example of this strategy follows:

    #if TARGET == 1
    fp = fopen("DBAO:[direct]master.date", "r");
    #else
    fp = fopen("A:\\direct\\master.date", "r");
    #endif

This can be implemented in a much clearer way by abstracting the filename into a header as in Listing 2.

Planning For Debugging And Maintenance

People who don't design programs are unlikely to plan for debugging and maintenance.
They probably don't even write a shopping list for that matter. Unfortunately, there are lots of these people programming, many of them in C. It is very naive and probably irresponsible to believe that with a non-trivial program, debugging will be a mere formality and that you will always be around to maintain the code. Over the years I have found it a useful idea to include a header called something like debug.h into every source file I write when working on a non-trivial project. If the header is empty, that's fine. However, it makes it very easy to add or change that header's contents and recompile all or part of the system for testing. Since you have one header included everywhere, it is trivially easy to make powerful changes and to experiment. And the cost of having this flexibility is practically nothing, if you cater for it at the beginning. Concatenating Headers There are always people who try to stretch a language's capabilities to the extreme. For example, they place part of a source file in one header and the rest in another and include them both to form a valid source module. Cute, but very bad style. Let's look at just what can and cannot be split across multiple source modules, and therefore across multiple headers. A source module must contain complete tokens. That is, a source token cannot be split across two files. Specifically, the notation of backslash/new-line continuation cannot be used in the last line of a source file. Likewise, a comment cannot span two files. With string literal concatenation now supported by ANSI, you could have a string in one file concatenated with a string in another, but that would require the strings to be outside a macro definition and I have already said that's very bad style. You could also split a structure template definition across multiple files, but I see no benefit. 
One thing not immediately obvious in ANSI C is that each matching set of #if/#endif and corresponding #elif and #else directives must be contained within the same source file. That is, the #if and matching #endif directives must be in the same source file.

Conclusion

I have addressed many issues here, most of which have arisen from my own experiences. I am sure there are others that could be added. For the most part, I find header design to be simply a matter of common sense once you know and understand the tools the language and preprocessor provide. But then again, I find that to be pretty much the solution to a vast number of problems. It's sad that common sense is not all that common.

Listing 1

    /* h1.h */
    #ifndef H1_H
    #define H1_H
    ...
    #ifndef COUNT_T
    #define COUNT_T
    typedef unsigned int count;
    #endif
    ...
    #endif

    /* h2.h */
    #ifndef H2_H
    #define H2_H
    ...
    #ifndef COUNT_T
    #define COUNT_T
    typedef unsigned int count;
    #endif
    ...
    #endif

    #include "h1.h"    /* count defined */
    #include "h2.h"    /* count not redefined */

Listing 2

    /* files.h */
    #if TARGET == 1
    #define MASTER_FILE "DBAO:[direct]master.date"
    #else
    #define MASTER_FILE "A:\\direct\\master.date"
    #endif

    /* source.c */
    #include "files.h"
    ...
    fp = fopen(MASTER_FILE, "r");

On The Networks

Games And Tongues

Sydney S. Weinstein

Sydney S. Weinstein, CDP, CCP is a consultant, columnist, author and President of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites and device management for UNIX and MSDOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/Usenet mailbox syd@DSI.COM (dsinc!syd for those that cannot do Internet addressing).

RPN Fans - Here's One For You

Before I took over David Fiedler's column, he mentioned in his last installment the ultimate on-screen calculator for UNIX systems.
Here now is a simpler one, usable on any system that has a curses package or emulation library. It emulates the HP-16C and can popup on both UNIX and MS-DOS. Support for floating point, hexadecimal, decimal, octal and binary modes is provided. The calculator, written by Emmet Gray of the US Army, has ten registers and supports computer-oriented functions. It was posted to comp.sources.misc and is available from the archive sites that support that group, including uunet. New Games New versions of several games were distributed recently in comp.sources.games. These include version 4 of Conquer, a middle earth multi-player game for UNIX systems. Source to the game itself, as well as the patches, is available from the archive sites for comp.sources.games, including uunet. Conquer v4 patches are volume 8, issues 1 - 4. Nethack has also had a major update in comp.sources.games volume 8, issues 6-12. New screens and enhancements were added to this display-oriented dungeons and dragons game. Galactic Bloodshed, an empire-like war game has also been upgraded this month in comp.sources.games, volume 8, issues 26 - 30. This upgrade gives several new versions to keep those UNIX systems busy. A new game has also appeared, a two-handed card game similar to Bridge and Spades (especially two-handed Spades). It's a trick-taking game with a trump suit determined by bidding. Cards are drawn from the deck, each player taking a turn drawing one card from the top of the deck. If you desire to keep that card, it becomes part of your hand and the next card is discarded without being seen, otherwise you discard it and take the next card. This yields two thirteen-card hands. Bidding is based on the number of tricks you think you can take, with the last winner naming the trump. Lastly, the hand is played out. Scoring is simple; if the bid is made, you score ten times the bid plus the number of overtricks. If you go down and don't make the bid, you score negative ten times the bid. 
The winner is the first player to 250 points. The author, Scott Turner from UCLA, has asked for help in improving the bidding process. He has provided a program with a very interesting set of bidding options coded as rule based, neural networks, and then a cheating bidder that reads both hands. However, he is not happy with the outcome and is asking for help. The program gives ample statistics for tuning a bidding algorithm and those of you up to a challenge just might want to take him up on his offer for help. Back To Work Several serious works also appeared recently on the networks. For those diehard fans of vi type editors comp.sources.misc recently distributed "stevie" (ST Editor for VI Enthusiasts), a public domain clone of UNIX's vi editor. This version was developed for the Atari ST, but has since been ported to UNIX, OS/2, DOS and Minix-ST. Unsupported ports also included in the release include Minix-PC, Amiga, and some Data General systems. Thus, stevie appears to be extremely portable. Makefiles are included for all the systems. Stevie's main drawback, for some environments, is that it keeps the file being edited in memory, limiting the size of the file to be edited for systems with smaller addressing spaces or without virtual memory. It was originally written by Tim Thompson, but this latest version was posted by Tony Andrews at onecom!wldrdg!tony. He also will mail diskettes to those who send him a formatted disk along with a self-addressed, stamped disk mailer for returning the disk. He can write Atari ST (SS or DS) or MS-DOS (360K or 1.2M) formats. His address is Tony Andrews, 5902E Gunbarrel Avenue, Boulder, CO 80301. Now that Berkeley has released much of its BSD 4.3-tahoe release to the public, sections of it are being ported to UNIX System V and Xenix. Comsat, the BSD mail notification daemon, was recently posted to comp.sources.misc. Comsat sends messages to users when mail is delivered for them. 
It uses a daemon approach, and thus does not need to wait for the current command to complete or the user to type a carriage return to the shell. Also included in this port are changes to smail v2.5 necessary for it to notify comsat when mail is delivered. Users control whether or not they get notification using the biff command, which is also included. Since UNIX System V usually doesn't support the Berkeley socket interface, this port uses named pipes, so the notification is limited to the local machine. Those with the socket interface can use the BSD version of the program. Thanks to David MacKenzie for his porting effort.

Foreign Tongues?

In volume 8, issues 65-87, comp.sources.misc has distributed a major effort that will strike people as either a godsend or totally useless. If you need to print foreign languages with their extended character set support, the "cz text to PostScript system" is for you. It is a table-driven system that can be used to convert any "context-free octet-based character set into PostScript." This means that every character in the character set is represented by one or more eight-bit bytes and that only the bytes of that character determine what it prints, not other bytes in the file. This excludes locking shift sequences.

Even if you don't need the foreign language support, the posting had an addendum called libhoward that includes several C functions to convert numeric literals to internal representations and perform string manipulation, all with error recovery. It's all documented and worth looking at, even just to see how he did it, courtesy of Howard Gayle of Ericsson Telecom AB in Sweden (howard@dahlbeck.ericsson.se).

Yea! It's Back, Maybe?

After a long absence from USENET with no postings, comp.sources.unix distributed the first program of Volume 20. It is a contribution from Barry Books at IBM, who has released an include file tester into the public domain. This tester checks include files for POSIX 1003.1 and ANSI compliance.
It reports missing items, additional items allowed by the standard, and additional items not allowed by the standard. References to the standards documents are also included in the report. This could prove to be a really useful tool for portability. Unfortunately, after this promising posting, comp.sources.unix has been quiet again. Hopefully, Rich Salz, the moderator, will find time to resume the postings shortly.

Upcoming Releases

Perl, Larry Wall's Practical Extraction and Report Language, is going through its beta period on a new version via alt.sources. Version 3 has lots of new features, and next time I will give an in-depth review of this new release from one of the net's most respected authors of "Off the Wall Software". Less, a more replacement (a display pager), is also in beta test with its newest release.

alt.sources is wonderful for hints of what is to come. Many authors are using it for beta test distributions. Another major package is also in its latest beta round; the Extended Portable Bitmap Toolkit appeared recently in alt.sources. This set of tools is used to convert images from one bitmap format to another. It supports many formats; again, a more detailed report will follow next time.

If you have a pending release you would like covered in this column, drop me a line. My electronic address is syd@DSI.COM and I look forward to hearing from you.

Questions & Answers

malloc, Porting, And Stack Overflow

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee. He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You can fax me your questions at (919) 493-4390. While you hear the answering message, hit the * button on your telephone.
Or you can send me e-mail at kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP).

Q I was having problems using malloc on a UNIX machine. After allocating some memory with malloc(), I wrote past the end of the allocated memory. The next time I called malloc(), it hung up. I ran the same program on an IBM-PC and it worked fine. What gives?

Jim Campbell
Durham, NC

A Writing beyond (or before) the memory space that is allocated with malloc and related functions can cause some serious problems. These functions allocate a block of memory from the heap (memory space not used for code, data, and stack). They return the address of the memory block. The memory remains allocated until you call free(), passing it the address of the block. This deallocates the block and returns it to the heap. When the program exits, any allocations for which you have not called free() are released.

These functions look like:

    #include <stdlib.h>

    void *malloc(size_requested)
    size_t size_requested;    /* Number of bytes */

    void free(pointer)
    void *pointer;            /* Address of memory to free */

You request an amount of memory in bytes. The function returns to you an address which points to the first byte of the allocated memory. You can use this memory for any purpose. However, you should not write in the memory preceding or following the allocated block.

The operating system and/or the compiler usually use a few bytes of memory adjacent to the allocated block. These bytes, sometimes called the "block header", may come before or after the block. The header keeps such information as the size of the block allocated, and usually some pointers, including one to the next block (i.e., a linked list). If the information in this block header is destroyed, the system cannot allocate a new block or deallocate an old block. Basically, the block looks something like the diagram in Figure 1.

Let's assume that the information is kept after the block, as it appears in the case of your UNIX machine.
You probably did something like:

    char *pc;
    char *pc1;
    pc = malloc(100);
    ...
    *(pc + 100) = 0;     /* one byte past the end of the block */
    ...
    pc1 = malloc(200);

and overwrote the first byte in the block header. When you attempted the next allocation, malloc() hung up because you had destroyed the block header for the previous block.

On a PC, the block header typically appears before the allocated memory. In that case, your program ran okay, as you were simply writing into unallocated memory, which contains no information.

Depending on the order in which you perform allocations and illegal accesses, you could still have problems. For example, let's assume that you performed both allocations first, and then an illegal access:

    char *pc;
    char *pc1;
    pc = malloc(100);
    pc1 = malloc(200);
    *(pc + 100) = 0;

Assuming that you do not attempt to allocate blocks later on in the program, this will execute as if no error occurred until the program attempts to exit. When the operating system tries to free the allocated memory, it will become confused due to the erroneous block header information. You will get a dreaded "Memory allocation error -- system halted" message.

With some compilers, malloc() does not call the operating system routine if the request can be satisfied from its own unallocated buffer. In this case, you may not see this allocation error, since the exit operations will simply free the whole buffer at once and not the individual pieces.

Q I am using an array of pointers; each pointer points to a structure; and each structure contains several strings of various lengths.
My array of pointers is declared something like this:

    struct {
        char firstname[MAX_FIRSTNAME+1];
        char lastname[MAX_LASTNAME+1];
        char homephone[MAX_HOMEPHONE+1];
        char workphone[MAX_WORKPHONE+1];
        char areacode[MAX_AREACODE+1];
        char street[MAX_STREET+1];
        char city[MAX_CITY+1];
        char state[MAX_STATE+1];
        char comments[MAX_COMMENTS+1];
    } *record[MAX_RECORDS];

It follows that I could display each element of the structure that represents the current record as follows:

    show_record()
    {
        printf("%s\n", record[current_record]->firstname);
        printf("%s\n", record[current_record]->lastname);
        printf("%s\n", record[current_record]->homephone);
        printf("%s\n", record[current_record]->workphone);
        printf("%s\n", record[current_record]->areacode);
        printf("%s\n", record[current_record]->street);
        printf("%s\n", record[current_record]->city);
        printf("%s\n", record[current_record]->state);
        printf("%s\n", record[current_record]->comments);
    }

However, it seems that much of the code is unnecessarily duplicated. It would be more efficient if I could create a loop and access a different element of the structure each time through the loop. My show_record() function would then look something like this:

    show_record()
    {
        int i;
        for (i = 0; i < NUM_OF_FIELDS; i++) {
            printf("%s\n", record[current_record]->??? );
        }
    }

Where ??? is the part I can't figure out. I could think of ways to do it in assembly language by providing additional data types and accessing them in the loop. Since the elements of a structure are usually word aligned, it's hard to even be sure how many bytes are between each element of the structure. Again, any information you could provide would be greatly appreciated.

Jonathan Wood
Irvine, CA

A Accessing individual members of a structure in a loop is a commonly needed operation. There are several ways that you can do this. Let me change your structure template slightly and add a tag-type.
I normally avoid declaring variables when declaring a structure template, so that the template can be reused in another program without bringing those variables along. A clean structure template is a handy thing to have around because it makes declaring variables of the same structure a breeze.

    struct s_record {
        char firstname[MAX_FIRSTNAME + 1];
        ...
    };
    struct s_record *record[MAX_RECORDS];

You could use a static structure variable, whose members have constant addresses, and set up an array of pointers to those members. show_record() might then look like:

    static struct s_record print_record;
    #define NUMBER_FIELDS 9
    char *record_field_address[NUMBER_FIELDS] = {
        print_record.firstname,
        print_record.lastname,
        ...    /* Remainder of the fields */
    };

    show_record()
    {
        int i;
        /* Copy in the record to be printed */
        print_record = *record[current_record];
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", record_field_address[i]);
        }
    }

One feature in the new ANSI standard, the offsetof() macro, can help you out here. Its syntax is:

    #include <stddef.h>
    offsetof(type, member-name)

The type is a structure type and the member-name is a member in the structure. Instead of keeping the address of individual members in an array, you simply keep the offsets from the start of a structure. For example,

    #define NUMBER_FIELDS 9
    size_t record_offsets[NUMBER_FIELDS] = {
        offsetof(struct s_record, firstname),
        offsetof(struct s_record, lastname),
        ...    /* Remainder of the fields */
    };

Now show_record could look something like:

    show_record()
    {
        int i;
        char *pc;
        /* record[current_record] is already the address of the structure */
        pc = (char *) record[current_record];
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

Note that the conversion of the address to a char pointer is necessary. If you simply printed out record[current_record] + record_offsets[i], you would get the address of something which is record_offsets[i] * sizeof(struct s_record) bytes after the beginning of the record.
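The offset technique can be tried out with a small self-contained sketch. The three-field record, the field sizes, and the helper name record_field() are assumptions made for brevity; they are not part of the reader's program.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>    /* printf */

/* Field sizes are illustrative assumptions, not the reader's values. */
#define MAX_FIRSTNAME 15
#define MAX_LASTNAME  20
#define MAX_CITY      20

struct s_record {
    char firstname[MAX_FIRSTNAME + 1];
    char lastname[MAX_LASTNAME + 1];
    char city[MAX_CITY + 1];
};

#define NUMBER_FIELDS 3

/* Byte offset of each string member from the start of the structure. */
static const size_t record_offsets[NUMBER_FIELDS] = {
    offsetof(struct s_record, firstname),
    offsetof(struct s_record, lastname),
    offsetof(struct s_record, city)
};

/* Hypothetical helper: address of field number i inside *precord.
   Converting to a character pointer makes the addition count in bytes. */
const char *record_field(const struct s_record *precord, int i)
{
    const char *pc = (const char *) precord;
    return pc + record_offsets[i];
}

/* Print every field of one record, one per line. */
void show_record(const struct s_record *precord)
{
    int i;
    for (i = 0; i < NUMBER_FIELDS; i++)
        printf("%s\n", record_field(precord, i));
}
```

With the array of pointers from the question, a call such as show_record(record[current_record]) would print the current record. Adding a field then means adding one member and one table entry.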
I would suggest that you change the calling sequence of show_record so that it expects a record (or an address of a record). This way, you can print out records that are not part of the array (such as a record that might be used for input purposes).

    show_record(record)
    /* Prints out a record */
    struct s_record record;
    {
        int i;
        char *pc;
        pc = (char *) &record;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

or

    show_record(precord)
    /* Prints out a record, whose address is passed */
    struct s_record *precord;
    {
        int i;
        char *pc;
        pc = (char *) precord;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

You might want to be even more organized and create another structure that contains not only the offsets, but also the names of the members, so that you can use the same names everywhere you print the record.

    struct s_field {
        char name[MAX_FIELD_NAME + 1];
        size_t offset;
    };
    #define NUMBER_FIELDS 9
    struct s_field fields[NUMBER_FIELDS] = {
        {"First name", offsetof(struct s_record, firstname)},
        {"Last name", offsetof(struct s_record, lastname)},
        ...    /* Remainder of the fields */
    };

With this you might have a function like:

    show_record_with_field_names(precord)
    /* Prints out a record, whose address is passed */
    struct s_record *precord;
    {
        int i;
        char *pc;
        pc = (char *) precord;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%-20.20s: %s\n", fields[i].name, pc + fields[i].offset);
        }
    }

You should note that elements of a structure are not necessarily word aligned. On a PC, they can be byte aligned or word aligned. I prefer packed (i.e., byte-aligned) structures in order to save space, but there is a slight speed advantage in using non-packed structures.

Note that the sizeof operator and the offsetof() macro take into account any padding bytes (unused bytes due to alignment). In fact, it is the potential presence of padding bytes that led the ANSI committee to rule out the equality comparison of structures.
For example:

    func()
    {
        static struct s_record record_1;
        struct s_record record_2;
        if (record_1 == record_2)    /* not permitted in ANSI C */
            ...
    }

The padding bytes in record_1 will be set to 0, since it is a static variable. The padding bytes in record_2 will be garbage, since record_2 is an automatic variable. You could use the fields array shown above to create a structure comparison function, if you required it.

Q How do you make a binary data file that is portable between the Mac and the IBM PC?

Richard Walton
Wellesley, MA

A Porting data files between any two systems presents a problem in that the representation of the numbers varies from computer to computer. A common way of avoiding this problem is to output the data to a text file using fprintf() and to read the data on the other machine using fscanf(). For example, on one machine you would have:

    struct s_record {
        int one_number;
        double another_number;
    };

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;
        /* fprintf() takes %f for a double; only fscanf() needs %lf */
        ret = fprintf(data_file, "%d %f\n", record.one_number,
            record.another_number);
        return ret;
    }

On the other machine, you would use:

    read_record_from_file(data_file, precord)
    FILE *data_file;
    struct s_record *precord;
    {
        int ret;
        ret = fscanf(data_file, "%d %lf", &(precord->one_number),
            &(precord->another_number));
        return ret;
    }

If you do not wish to have the overhead of the conversions done by fprintf() and fscanf(), then you will need to write some specific code.
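One possible shape for such code, sketched here under the assumption that both programs agree the file stores each integer as two bytes with the least significant byte first, is to build and read the bytes with shifts. The same source then works whatever the byte order of the machine running it. The helper names put_int16(), get_int16(), write_int16() and read_int16() are hypothetical, and the double member is deliberately left out.

```c
#include <stdio.h>    /* FILE, fwrite, fread */

/* Store the low 16 bits of value in buf, least significant byte first. */
void put_int16(unsigned char *buf, int value)
{
    unsigned int u = (unsigned int) value & 0xFFFFu;
    buf[0] = (unsigned char) (u & 0xFFu);
    buf[1] = (unsigned char) ((u >> 8) & 0xFFu);
}

/* Rebuild a signed 16-bit value from two bytes, least significant first. */
int get_int16(const unsigned char *buf)
{
    unsigned int u = (unsigned int) buf[0] | ((unsigned int) buf[1] << 8);
    /* Values with the top bit set represent negative numbers. */
    return (u & 0x8000u) ? (int) ((long) u - 0x10000L) : (int) u;
}

/* Write one integer to the file; returns 1 on success, 0 on failure. */
int write_int16(FILE *data_file, int value)
{
    unsigned char buf[2];
    put_int16(buf, value);
    return fwrite(buf, 1, 2, data_file) == 2;
}

/* Read one integer from the file; returns 1 on success, 0 on failure. */
int read_int16(FILE *data_file, int *pvalue)
{
    unsigned char buf[2];
    if (fread(buf, 1, 2, data_file) != 2)
        return 0;
    *pvalue = get_int16(buf);
    return 1;
}
```

Because neither side depends on its own byte order, no machine-specific conversion routine is needed for integers. The file should be opened in binary mode ("wb" and "rb") so that no newline translation disturbs the bytes.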
For example, suppose on an IBM you have written out the records as:

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;
        ret = fwrite(&record, sizeof(struct s_record), 1, data_file);
        return ret;
    }

On the other machine, you will have to rearrange the bit patterns manually:

    /* Size of record as written by the IBM program:
       2-byte int followed by 8-byte double */
    #define SIZE_BUFFER 10

    read_record_from_file(data_file, precord)
    FILE *data_file;
    struct s_record *precord;
    {
        int ret;
        char buffer[SIZE_BUFFER];
        ret = fread(buffer, SIZE_BUFFER, 1, data_file);
        /* Now you need to convert each value individually */
        convert_ibm_int_to_mac_int(&buffer[0],
            (char *) &(precord->one_number));
        convert_ibm_double_to_mac_double(&buffer[2],
            (char *) &(precord->another_number));
        return ret;
    }

Now each of the individual members must be dealt with separately. The double conversion is a bit of a bear. As they say in the teaching business, it is reserved as an exercise for the student. The integer conversion might look like:

    convert_ibm_int_to_mac_int(pibm_number, pmac_number)
    char *pibm_number;
    char *pmac_number;
    {
        /* Reverse the byte order of a two-byte integer */
        *(pmac_number) = *(pibm_number + 1);
        *(pmac_number + 1) = *(pibm_number);
    }

Note that I have simply shown a return value for each of these file functions. You probably want to be more clever and test the functions so that the return value is consistent among all the functions. For example, the first function might look like:

    #define BAD_IO 1
    #define GOOD_IO 0

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;
        int io_ret;
        ret = fprintf(data_file, "%d %f\n", record.one_number,
            record.another_number);
        if (ret < 1)
            io_ret = BAD_IO;
        else
            io_ret = GOOD_IO;
        return io_ret;
    }

Q I am in the process of implementing hotkey-controlled real-time data acquisition for some laboratory experiments. This is being achieved by taking control of the keyboard interrupt number 0x09. My compiler is Microsoft C v5.1.
The experimental apparatus has three distinct modes of operation: A, B, and C, which are to begin upon the striking of their respective keys from the keyboard. Assume that task A, defined by its function, fA(), is currently executing and that the user now strikes the key to commence task B, similarly defined by its function, fB(), so that fA() stops and fB() starts. My question is this: Can you continually interrupt function i and start function j and expect to escape a stack overflow? How does one handle suspending a function at an arbitrary time with no a priori intention of returning to it (which would free the stack space used by the function). I would imagine that you could do this a few times, but what about suspending A and starting B (or C) an arbitrary number of times? Perhaps setjmp() and longjmp() are the solution. Another serious problem that concerns me is that my method does not seem to admit a way to signal end-of-interrupt to the keyboard handler (or to whatever is listening). Because the directives to begin execution of function A, B, or C are embedded in the new 0x09 interrupt handler, the handler could potentially never finish executing during the experiment. Is there a better implementation which can achieve what I need and still use hotkeys? Mark S. Petrovic Stillwater, OK A You are right in your concern over stack overflow. If you keep calling an interrupt function without clearing up the stack (i.e., with an IRET instruction), you will eventually run out of stack space. An interrupt function that might cause overflow could look like the following, where keyboard_input() is a function that gets the actual keystroke. 
    control_function()
    /* This will only be called on a keyboard interrupt */
    {
        int c;
        /* Get the key that was hit */
        c = keyboard_input();
        switch (c) {
        case 'A':
            function_a();
            break;
        case 'B':
            function_b();
            break;
        case 'C':
            function_c();
            break;
        default:
            function_default();
            break;
        }
        /***** This function never returns *****/
    }

    function_a()
    {
        /* Code to perform function A */
    }

    function_b()
    {
        /* Code to perform function B */
    }

    function_c()
    {
        /* Code to perform function C */
    }

    function_default()
    {
        /* Code to perform default function */
    }

Every time you invoke the interrupt, another set of flags and return addresses is pushed onto the stack.

setjmp/longjmp provide an appropriate mechanism for implementing the sort of structure you desire. These two functions allow you to set a place marker in your code (setjmp) and then jump directly back to it from another routine (longjmp). Without setjmp/longjmp, to report an error that occurred several levels deep in a program you would have to return an error value at every level as you exit the nested calls. With setjmp/longjmp you can instead simply jump back to a central error handler and give it the error value. The function calls are:

    #include <setjmp.h>

    int setjmp(environment)
    jmp_buf environment;    /* Will hold the place information */

and

    void longjmp(environment, return_value)
    jmp_buf environment;    /* Place information from setjmp */
    int return_value;       /* To be returned to setjmp */

setjmp() returns 0 when it is called directly to set the marker. The calling function can test this and skip any error handling. When longjmp() is called, the next C instruction to be executed is the equivalent of a return from setjmp(). This returns execution to the place marked by the call to setjmp(). One of the parameters to longjmp() is a non-zero value which becomes setjmp()'s return value. longjmp() cleans up the stack from any nested function calls.

The parameter passed to setjmp() is of type jmp_buf.
This variable holds information regarding the current position of the stack. You can call setjmp() in many different places and pass it different variables of type jmp_buf. The jmp_buf variable passed to longjmp() determines to which of the setjmp() calls it will return.

The code below gives an indication of how your problem might be programmed. You would connect this up to the keyboard interrupt.

    #include <setjmp.h>
    #define TRUE 1
    #define FALSE 0

    control_function()
    /* This will only be called on a keyboard interrupt */
    {
        int c;                         /* Character input */
        int ret;                       /* Return value from setjmp() */
        static jmp_buf environment;    /* For the setjmp; static so that it
                                          survives from one interrupt to the next */
        static int init = FALSE;       /* First time through flag */

        if (init) {
            /* Stop previous execution */
            longjmp(environment, 1);
        }
        ret = setjmp(environment);
        if (ret == 0) {
            /* This is the return from the initial setup */
            init = TRUE;
        }
        else {
            /* This is the return from the longjmp */
            ;
        }
        /* Get the key that was hit */
        c = keyboard_input();
        switch (c) {
        case 'A':
            function_a();
            break;
        case 'B':
            function_b();
            break;
        case 'C':
            function_c();
            break;
        default:
            function_default();
            break;
        }
        /******* THIS FUNCTION NEVER RETURNS */
    }

Alternatively, you could avoid using an interrupt by coding each function to periodically check for something in the keyboard buffer. This approach does kludge up your lower level functions. However, if the lower level functions have sections of code that should not be interrupted, then this less elegant method may be preferable.

Two Microsoft (and some other compiler) functions (not ANSI standard) support this alternate approach. The kbhit() function returns non-zero if there is a key in the buffer. The getch() function returns the character in the buffer, without waiting for a carriage return.
    #include <conio.h>    /* kbhit() and getch() in Microsoft C */
    #define TRUE 1
    #define FALSE 0

    main()
    {
        int c;
        while (1) {
            /* Get the next key */
            c = getch();
            switch (c) {
            case 'A':
                function_a();
                break;
            case 'B':
                function_b();
                break;
            case 'C':
                function_c();
                break;
            default:
                function_default();
                break;
            }
        }    /* End while loop */
    }

    function_a()
    {
        /* Code to perform function A */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

    function_b()
    {
        /* Code to perform function B */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

    function_c()
    {
        /* Code to perform function C */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

    function_default()
    {
        /* Code to perform default function */
    }

Figure 1

New Releases

A New Year's Wish List

Kenji Hino

Kenji Hino is a member of The C Users' Group technical staff. He holds a B.S.C.S. from McPherson College and an undergraduate degree in metallurgy from a Japanese university. He is currently working toward an M.S.C.S. at the University of Kansas.

New Releases

CUG299 -- MEL and BP

This volume contains two programs, MEL -- Universal Metalanguage Data Processor submitted by George Crews (TN), and BP -- Back Propagation for neural networks by Ronald Michaels (TN).

MEL provides an I/O interface between a program and the user. It can take input data written in "pseudo-English" and translate it into program variables. It can also translate a program's variables into pseudo-English. (See the article on page 33 in this issue.) MEL was originally designed for use with engineering analysis programs. It was written in ANSI C and was developed using Microsoft C v5.1. The disk includes MEL source code, a test example program, sample input and output files, documentation, and the article and listings from this issue. Since MEL provides only a processor engine, you need to define your own input and output data format rule (called a dictionary) for your application program in mel.h.

BP is a simple implementation of the back propagation algorithm as an example of a neural network.
The implementation is based upon the article in Nature, "Learning representations by back-propagating errors" by Rumelhart, Hinton and Williams. BP employs an adaptive algorithm that converges as a result of learning. BP was developed on an AT clone with a math coprocessor using Zortech C v1.07. The disk also includes the Hercules graphics version of BP.

CUG300 -- MAT_LIB

Our first volume in the 300's is a shareware package, MAT_LIB -- Matrix Library submitted by John J. Hughes III (TN). MAT_LIB includes approximately 50 C language functions and macros which input and output tabular data maintained in ASCII text files. While the tabular data is in RAM, it is stored in dynamically-allocated token or floating-point arrays on the heap. Functions are provided to examine an ASCII text file to determine the number of rows, columns, and token size of the tabular data in the file. Other C macros dimension either a floating-point or string token array large enough to hold the ASCII data. Once in memory, floating-point array matrix operations can be performed on the data. Token array data can be converted to and from float or integer values. Floating-point arrays which have been modified by calculation can be merged into token arrays for output or they can be output to a text file directly. The output text files can in turn be used as the input for later application programs found in MAT_LIB text file formats. The disk includes a users manual, test programs, example programs, and small and medium model libraries for Turbo C. The library source can be obtained for $20 from the author (John Hughes III, 928 Brantley Dr., Knoxville, TN 37923).

CUG301 -- BGI Applications

This volume contains graphics applications that use the Borland Graphics Interface (BGI), submitted by three authors: Mark A. Johnson (CO), Henry M. Pollock (MA), and John Muczynski (MI). All programs were compiled with Turbo C and use BGI files. The disk includes C source code, executable code and BGI files.

Mark A.
Johnson has created DCUWCU -- a simple application environment that provides a mouse-driven cursor, stacked pop-up menus, and forms that contain editable fields and a variety of selectable buttons. The sample program DRAW allows you to draw lines, circles, and text on the screen using a mouse. A stacked pop-up menu can be invoked anywhere on the screen (Figure 1). DRAW uses public domain Microsoft mouse routines written by Andrew Markley (CUJ Sept/Oct 1988). An article describing DCUWCU appeared in the Jan '89 issue of CUJ (p. 67). Henry M. Pollock has submitted a demonstration program combining trig functions and graphics functions in Turbo C. By selecting an option from the menu, the program displays circleoids, asteroids, spirals, cycloids (Figure 2), etc. My review of the JJB library in the October 1989 issue prompted John Muczynski to create a graphics pull-down menu system with deeply nested menus. The separate include code allows you to change key assignments and create macros. The new configuration may be saved and restored. He also has submitted an example program, "Conway's game of life," using the pull-down menu. Updates CUG295 -- blkio Library The blkio library released in the November issue has been updated. Version 1.1 includes minor bug fixes and modifications. Retrospective CUG started collecting and maintaining public domain source code (originally just BDS C source code) nine years ago. The library started with just ten standard CP/M 8-inch disks. Currently, the total number of volumes (one volume includes one to three 360K MS-DOS disks) has surpassed 300. The past nine years have brought remarkable changes in C compiler technology and in the microcomputer marketplace. Figure 3 shows the change in formats requested by our members. Over the past three years, CP/M has become virtually extinct and MS-DOS has come to dominate. More interesting, however, is the diversity of operating systems used in recent years. 
Macintosh, UNIX/Xenix, Atari and Amiga have appeared more than ever -- indicating that more and more programmers who use non-MS-DOS operating systems are interested in C and are seeking portable C source code. I think this trend is strong evidence that C is a portable language. Table 1 shows the 20 most popular disks in the last three years. The most-ordered CUG disk is MicroEmacs v3.9 (CUG#197 and CUG#198). MicroEmacs faithfully implements most of the features of Richard Stallman's Emacs editor. Daniel Lawrence claims copyright privileges for this version which has also been updated and enhanced many times by our staff and members. The secret of MicroEmacs' popularity seems to be its portablity (it runs on more than ten different operating systems), rich set of features, and its configurability -- a built-in macro language lets MicroEmacs be tailored to virtually any task. The next two most popular disks are UNIX tools used in compiler development. CUG#172, #173 and #290 are LEX, a lexical analyzer that extracts tokens from an input character stream. CUG#285 is a YACC compatible parser and code generator. As you'll notice from the Top 20 list, our library contains a wide variety of application programs and development tools, including cross-assemblers, windows, graphics, an AI application, communications, and a math package, among others. One of the more recent trends in the library is the emergence of shareware. Even though you must pay some minimal fee for the source code of a shareware program, the quality of some volumes is very competitive with more expensive commercial products. Another trend is the submission of more serious and specialized applications. For example, the 3-D medical imaging software on CUG#293-294. Wish List Even with all this diversity, there are many frequently requested packages. A Simple Text Editor Many people have asked for a simple text editor that can be embedded in their application. 
The editor needn't be as fancy and powerful as MicroEmacs, but should offer these features:

    Be callable (as a function) from the application program
    Function in both full-screen and windowed applications
    Can retrieve and save a file
    Can browse a file (page up/down)
    Be modeless
    Support block manipulations (block copy, move, or delete)
    Can be compiled with small model under MS-DOS
    Can read up to 30K ASCII text
    Search or replacement is optional
    Go to the specified line number is optional

An ANSI C Compiler

This is a real challenge. We hope to address this need by distributing the GNU C compiler (and C++ compiler) from The Free Software Foundation.

.PCX Or .DBF File Function Libraries

A .PCX file is an image file produced by ZSoft's PC Paintbrush. It is a common graphics file format for the PC and is also used by most scanners, fax programs, and desktop publishing programs. A .DBF file is a data file used by Ashton-Tate's dBase programs. We need function libraries that manipulate these standard format files.

Spread Sheet

As with the editor, we need a simple spread sheet that can be embedded in larger applications.

Pascal To C Translator

This would be useful for Pascal programmers trying to port their programs. Michael Yokoyama (HI) has forwarded such a program to us, but we have been unable to contact the author, Per Bergsten of Sweden, to get permission to release the program. Please let us know if you can contact Per Bergsten or know of an independent version of this code.

C To Pascal

Useful for Pascal programmers who want to port an application program written in C.

Cross C Compiler

Thanks to Will Colley, we have a variety of cross assemblers. However, our only cross C compiler is CUG204, a 68000 C compiler by Matthew Brandt, which runs under MS-DOS and generates 68000 object code. We need more variety in this area (like a cross C compiler that runs on the Macintosh and generates 8086 code).

Download Fonts In A Laser Printer

All sorts of applications could make better use of laser printer capabilities if they could download special fonts. We'd like a library of functions that can read Bitstream, Ventura Bitmap and other popular font files and download them to an HP compatible.

Sideways Text

Not a configuration utility that uses a printer's landscape mode, but a utility that exploits a printer's graphic mode to print 90° rotated text. Why not?

Database Management

We would like a simple and useful relational database manager -- in C.

If you've seen C source code for programs such as those listed here, or can implement them, please let us know. In addition, we are interested in obtaining C++ and C source code for Macintosh. Moreover, I believe you have your own wish list. Please let me know about it for a future column.

P.S. Henri de Feraudy of France, the author of Small Prolog in CUG#297, is sending us a PC version of Little Smalltalk. It will be a new release in a future issue.

Figure 1

Figure 2

Figure 3

Table 1

Year 1987
     1.  173  LEX Part 1 (lexical analyzer)
     2.  172  LEX Part 2
     3.  198  MicroEmacs v3.9 Source (text editor)
     4.  197  MicroEmacs v3.9 Executable & Documentation
     5.  175  (Replaced with CUG285)
     6.  174  (Replaced with CUG285)
     7.  201  MS-DOS System Support (ANSI driver, TRS, etc.)
     8.  204  68000 C Compiler (cross compiler for MSDOS)
     9.  236  Highly Portable Utilities (Unix-like tools)
    10.  200  Small C Interpreter
    11.  220  Window BOSS (window library)
    12.  227  Portable Graphics
    13.  164  Windows
    14.  218  Dictionary Part I
    15.  217  Spell & Dictionary Part II (spell checker)
    16.  155  B-TREES, FFT, etc. (balanced binary tree, fast fourier transform)
    17.  228  Miscellany IX (window, ISAM routines, etc.)
    18.  165  Programs from Reliable Data Structures (from Plum Hall)
    19.  216  Zmodem & Saveram (communication)
    20.  226  ART-CEE (rule-based inference engine)

Year 1988
     1.  197  MicroEmacs v3.9 Exec. & Doc. (Text Editor)
     2.  198  MicroEmacs v3.9 Source
     3.  259  Console I/O & Withers Tools (window functions)
     4.  255  EGA Graphics Library
     5.  172  LEX Part 1 (Lexical analyzer)
     6.  173  LEX Part 2
     7.  260  Zmodem, CU, tty Library (communication)
     8.  236  Highly Portable Utilities (UNIX-like tools)
     9.  151  Ed Ream's Screen Editor for IBM PC
    10.  263  C_wndw Toolkit (windows)
    11.  248  Micro Spell (spell checker)
    12.  241  Inference Engine & Rule Based Compiler
    13.  242  Still More Cross Assemblers
    14.  155  B-TREES, FFT, etc. (balanced binary tree, fast fourier transform)
    15.  227  Portable Graphics
    16.  247  Miracl (multi-precision integer and rational arithmetic C library)
    17.  246  Cycles, Mandelbrot
    18.  232  Little Smalltalk - Unbundled Part 2
    19.  231  Little Smalltalk - Unbundled Part 1
    20.  265  cpio Installation Kit (archive utility)

Year 1989 (Until October)
     1.  197  MicroEmacs v3.9 Exec. & Doc.
     2.  198  MicroEmacs v3.9 Source
     3.  285  Bison for MS-DOS (YACC like parser)
     4.  290  FLEX (fast lexical analyzer)
     5.  263  C_wndw Toolkit
     6.  283  FAFNIR (general-purpose, table-driven forms engine)
     7.  277  HP Plotter Library (graphics)
     8.  173  LEX Part 2
     9.  172  LEX Part 1
    10.  284  Portable 8080 Emulator
    11.  260  Zmodem, CU, tty Library
    12.  236  Highly Portable Utilities
    13.  276  Z80 and 6804 Cross Assembler
    14.  155  B-TREES, FFT, etc.
    15.  241  Inference Engine & Rule Based Compiler
    16.  242  Still More Cross Assemblers
    17.  273  Turbo C Utilities
    18.  261  68K Cross Assembler for MSDOS
    19.  220  Window BOSS (window library)
    20.  292  ASxxxx C Cross Assemblers

C Programmer's Toolbox/PC

Kenji Hino

Kenji Hino is a member of The C Users' Group technical staff. He holds a B.S.C.S. from McPherson College and an undergraduate degree in metallurgy from a Japanese university. He is currently working toward an M.S.C.S. at the University of Kansas.

Unlike UNIX, MS-DOS has no standard utility programs to support C programmers in program development or maintenance. In the past, C programmers have developed their own tools from scratch or ported tools from other operating systems to MS-DOS.
UNIX tools have been ported most, simply because they are the "right" tools to improve programmer productivity. This report looks at a collection of UNIX-like tools, C Programmer's Toolbox/PC revision 2.0 by MMC AD Systems. Component The Toolbox/PC consists of Volumes I and II, which are available separately or together. I recommend getting both. Each volume includes two IBM 360K disks and costs $99.95; both volumes together go for $175. The manual (in a binder) describes both volumes, regardless of whether you purchase Volume I, II, or both. The C Programmer's Toolbox is available from MMC AD Systems, Box 360845 Milpitas, CA 95035, phone (408) 263-0781. Although the Toolbox/PC runs on PC/MS-DOS, MMC AD Systems also distributes versions of the Toolbox for the Macintosh MPW and the Sun UNIX system. The installation of the Toolbox on either a floppy disk or hard disk system is straightforward; just copy all files from the distribution to your disk. If you install the Toolbox on hard disk systems, be sure that the path is set correctly. The Tools The Toolbox includes 21 tools (see Table 1). All the tools are command-line driven. The corresponding UNIX tools are also listed in the same table. The tools help analyze the structure, format and execution of programs, manipulate and/or modify program input/output data, or verify program input/output data (see Figure 1). Covering all 21 tools in a report this size is impractical and undesirable. Thus, I will focus on the analytical tools, CFlow, PMon and CritPath. These tools are mainly used to understand a program's structure and to analyze the performance of your application program for the enhancement. CFlow Whether developing or maintaining a program, as the program becomes larger, you tend to lose sight of the overall program structure. Discerning the inter-relationships between modules becomes harder as the program grows. Even worse, you may have to study code written by somebody else. 
CFlow is a tool for studying code. It scans one or more C source files to generate reports that describe the hierarchy of both defined and invoked functions (external or library functions). Figure 2 shows a program flow tree, one of the reports produced by CFlow. (The analyzed source code is shown in Listing 1 and is adapted from a program in the CUG PD Library. The original author is Richard Threlkeld.) The line indentation indicates the level of function invocations. If the same function is referenced more than once, the line number of the last reference is attached to the beginning of the line. An asterisk (*) indicates whether the function is an external or run-time library function. Within the parentheses following a function name is the source filename and a starting line number of the function definition. In order to obtain the desired result, you must specify the dash/slash options appropriately. For example, function names at each level of a CFlow tree are displayed in alphabetical order by default. If you want function names displayed as they are encountered, use the -e option. In addition, when using multiple input files, the -f option is useful to display the location of each function. In this version 2.0, many improvements were made over the previous version. CFlow now reports a function pointer (such as (*a) ()) or function address (such as f(); a = f;). It also has a virtual memory system that handles programs of unlimited size (true for some of the other tools, too). The biggest improvement is that CFlow now automatically preprocesses your source code. That is, it recognizes #if directives to read and process the appropriate portions of your code. This, however, creates one problem. If a function is a macro, it is expanded and replaced with some system-level function, surprising you with some unfamiliar function name in the report, such as _filbuf() instead of getc(). 
This can be solved by turning off the preprocessor with the p switch, thereby sacrificing all the preprocessor benefits. Along with the CFlow Tree, CFlow generates a Master Define Function List (a list of caller and callee), an Undefined Function List (a list of external or library functions) and Function Called by List (a list of callee and caller) when you specify the proper dash/slash options. Using CFlow, the programmer can easily and quickly understand how a program is structured and which module is invoked by which module. To understand visually, you can draw the structure diagram as in Figure 3, based on the Program Flow Tree. In Figure 3 for example, if a portion of the code in crc_update() is modified, you know from the reports which other functions will be affected (in this case, crc() and crc_finish()). PMon And CritPath The execution profiler PMon is a tool which analyzes a program. It determines how much execution time is spent on each symbol (functions or BIOS/DOS calls) or program area. During program execution, PMon resides in memory with the target program, intercepts the program at regular intervals and examines the CS: IP register of the target program to determine which section of code is currently being executed. PMon tracks this information for each intercept and, using the information from the .MAP file (symbol entries), generates a set of reports. I tested PMon using the CRCK (Cyclic Redundancy ChecK) program CRC15.EXE. The program listing of CRC15 is in Listing 1; it must be compiled and linked to generate a .MAP file. The .MAP file is then processed by MapVar and placed into PMon with the target executable program. Figure 4 shows two reports resulting from monitoring CRC15. The first report is the program execution summary, which gives the complete synopsis of the program's execution. Descriptions for certain summary headings are: Total execution clicks. 
The total number of clock ticks recorded in the program initiation, execution, and termination.
Total monitored hits. The actual number of clock ticks recorded during program execution.
Total symbol entries. The total number of symbols (function names) used in the program.
Number of symbols hit. The number of symbols detected in the execution.
Total symbol hits. The total number of times PMon found the program executing as opposed to BIOS, DOS, or other resident programs.
Time in program. The total time spent in the program vs. BIOS/DOS functions and other activities (Time below/above).
Time in BIOS/DOS. The total time spent in BIOS/DOS functions.

According to the program execution summary, CRC15 processed 1 file within 6 seconds. CRC15 contains 115 symbol entries, but PMon found only four of them executing, even though it sampled CRC15 a total of 92 times. CRC15 made 113 DOS system calls using 12 different DOS calls. Of the 92 samples, PMon found the program executing for 4.76 seconds (79.3%) and BIOS/DOS for 1.24 seconds (20.7%).

The second report, the Symbol Execution Summary, shows where a monitored program is executing within itself, excluding DOS calls.

Abs Addr -- the starting address (segment:offset) of a symbol.
Hits -- the total number of times PMon found a symbol executing.
Loc% -- the percentage of activity of a symbol when compared with the total execution excluding DOS calls.
Tot% -- the percentage of activity of a symbol when compared with the total execution including DOS calls.
Entry Name -- the symbol name.

In this example, PMon detected that function crc_update(), whose starting address is 0:011e, was found executing 50 times and took 68.5% of total execution time excluding DOS calls and 54.3% of total execution time including DOS calls.
In addition, PMon generates a BIOS Interrupt Summary, a DOS Function Call Execution Summary Report and a DOS Function Call Execution Detail Report showing the statistics of BIOS/DOS operations performed during program execution, such as character input/output, file input/output, etc. Although these reports provide a good amount of information about software performance, further analysis can be done with the CritPath command. CritPath determines the critical path of a program by analyzing the reports generated by the CFlow and PMon commands. A program's critical path is the sequence of functions called from main() that consumes more execution time than any other sequence. Figure 5 shows a Critical Path Report generated by CritPath. The report provides the primary information necessary to improve a program's performance. It shows a list of the 20 functions that used the most execution time (Top 20 Functions in Actual Time) and a list of the 20 functions that, by themselves and through other functions that they called, used the most execution time (Top 20 Functions in Cumulative Time). Finally, the report provides a list of the functions that comprise the critical path of the program. In this example, the critical path is the sequence of functions crc() and crc_update(). CritPath also generates both a Function Summary Report that evaluates the performance of all functions and system calls in the program and a Weighted Hierarchical Program Flow Tree. Using the statistics produced by PMon and CritPath, programmers can spot places where performance could be improved. However, these tools only identify weak spots in the program; they don't suggest how to improve the performance. Such information might be obtained from books such as Supercharging C With Assembly Language by Harry Chesley and Mitchell Waite (The Waite Group).
Conclusion Overall, compared to UNIX tools, the Toolbox tools have more options and provide more detailed information, helping the programmer to take more control over program output. On the other hand, he or she must read the manual very carefully and specify the appropriate options that will generate the desired result. Furthermore, the input source code for some tools should be not only syntactically correct but done in good programming style, even if the program compiled fine. Otherwise, the output information might come out confusing. For example, the inappropriate choice of options and poor programming style (such as Listing 1) cause CFlow to report an identifier, crc as a function address, not as a variable (crc is used for a function name and variable name. This can be detected by CXref.). CFlow also doesn't distinguish between function invocation and function declaration inside a function. For beginners, the Toolbox can be a good starting point for using tools to improve productivity since the commands are very uniform and the manual is well written. In the manual, each tool is uniformly explained using sample results. In particular, observations and suggestions about the reports generated are honest and good advice for users. For advanced programmers, the combination of CFlow, PMon and CritPath can give them clues for fine tuning or improving software performance either after the program has been developed or when it is about to be updated. CFlow, CPrint, CXref and CLint can be used to study existing programs and will greatly reduce maintenance cost. 

Figure 1

Figure 2

    *** Program Flow Tree ***
    -------------------------
     1:      main(CRC15.c:4)
     2:          crc(CRC15.C:29)
     3:              crc_clear(CRC15.C:58)
     4:              crc_finish(CRC15.C:80)
     5:                  crc_update(CRC15.C:63)
     6:    5         crc_update()
     7:              exit(*)
     8:              fclose(*)
     9:              fopen(*)
    10:              fprintf(*)
    11:              printf(*)
    12:              _filbuf(*)
    13:    7     exit(*)
    14:   11     printf(*)

Figure 3

Figure 4

    *** Program Execution Summary ***

    Program executed:                crc15.exe
    Delay/Run period (clicks):       0/0
    Start date/time:                 October 19, 1989 19:45:12
    Stop date/time:                  October 19, 1989 19:45:18
    Elapsed execution time:          0: 0: 0: 6    6 seconds
    Total execution clicks:          95
    Approximate clicks/second:       15.8
    Approx sample period (ms):       63.2
    Total monitored hits:            92
    Total symbol entries:            115
    Number of symbols hit:           4
    % of total symbols hit:          3.5
    Total symbol hits:               73
    Avg hits/hit symbol:             18.3
    Number of monitored interrupts:  2
    Number of interrupts used:       2
    % of total monitored:            100.0
    Total BIOS interrupt calls:      141
    Avg # interrupts/hit:            7.4
    Total BIOS interrupt hits:       19
    Avg # hits/interrupt:            0.1
    Number of DOS calls used:        12
    Total DOS program calls:         113
    Time in program (secs):          4.76    % of total: 79.3
    Time in BIOS/DOS (secs):         1.24    % of total: 20.7
    Time below program (secs):       0.00    % of total: 0.0
    Time above program (secs):       0.00    % of total: 0.0
    Total KNOWN time used (secs):    6.00    % of total: 100.0
    Total UNKNOWN time used (secs):  0.00    % of total: 0.0

    *** Symbol Execution Summary ***

    Abs Addr     Hits      Loc %  Tot %  Entry Name
    ---------  --------    -----  -----  ----------
           7a        12     16.4   13.0  _crc
          11e        50     68.5   54.3  _crc_update
          3e4         1      1.4    1.1  __chkstk
         1edc        10     13.7   10.9  __aNlshr

    --- HINT --- HINT --- HINT --- HINT --- HINT --- HINT --- HINT --- HINT ---
    Concentrate on the following functions to improve your program's performance:
        _crc         ( 13.0)
        _crc_update  ( 54.3)
        __aNlshr     ( 10.9)

Figure 5

    *** Critical Path Report ***
    ----------------------------

    Top 20 Functions in Actual Time
    -------------------------------
    Rank  Seconds  % Total  Function Name
    ----  -------  -------  -------------
      1.      3.3    54.3%  crc_update()
      2.      1.0    17.4%  __SysCall_3fH()
      3.      0.8    13.0%  crc()
      4.      0.7    10.9%  _aNlshr()
      5.      0.1     1.1%  _chkstk()
      6.      0.1     1.1%  __SysCall_3dH()
      7.      0.1     1.1%  __SysCall_40H()
      8.      0.1     1.1%  __SysCall_43H()
      9.      0.0     0.0%  crc_clear()
     10.      0.0     0.0%  crc_finish()
     11.      0.0     0.0%  exit()
     12.      0.0     0.0%  fclose()
     13.      0.0     0.0%  fopen()
     14.      0.0     0.0%  fprintf()
     15.      0.0     0.0%  main()
     16.      0.0     0.0%  printf()
     17.      0.0     0.0%  _filbuf()

    Top 20 Functions in Cumulative Time
    -----------------------------------
    Rank  Seconds  % Total  Function Name
    ----  -------  -------  -------------
      1.      6.0   100.0%  crc()
      2.      6.0   100.0%  main()
      3.      2.7    44.6%  crc_finish()
      4.      2.7    44.6%  crc_update()
      5.      0.8    14.1%  __SysCall_3fH()
      6.      0.5     8.7%  _aNlshr()
      7.      0.0     0.0%  crc_clear()
      8.      0.0     0.0%  exit()
      9.      0.0     0.0%  fclose()
     10.      0.0     0.0%  fopen()
     11.      0.0     0.0%  fprintf()
     12.      0.0     0.0%  printf()
     13.      0.0     0.0%  _chkstk()
     14.      0.0     0.0%  _filbuf()
     15.      0.0     0.0%  __SysCall_3dH()
     16.      0.0     0.0%  __SysCall_40H()
     17.      0.0     0.0%  __SysCall_43H()

    The Critical Path
    -----------------
     Act   Rank    Cum   Rank
     (%)           (%)
    ----   ----   -----  ----
     0.0    15    100.0    2   main()
    13.0     3    100.0    1   crc()
     0.0    10     44.6    3   crc_finish()
    54.3     1     44.6    4   crc_update()

    Critical path hits = 62           Total hits = 92
    Critical path time = 4.0 secs     Total time = 6.0 secs
    % of total = 67.4

Table 1

    Toolbox Volumes I & II   UNIX tools    Description
    ==================================================================
    Cat                      cat, cp       Concatenate Data
    CharCnt                  wc            Count Characters,Lines...
    CFlow                    cflow         Trace C Program Flow
    CLint                    lint          C Semantic Checker
    CPrint                   cb, indent    C Source Code Beautifier/Reformatter
    CritPath                               Critical Path Analyzer
    CXref                    xref          C Cross Reference
    Detab                    expand        Remove Tabs
    Entab                    unexpand      Restore Tabs
    ExecTime                 time          Time Program Execution
    FileComp                 comp          Compare Files
    FileDiff                 diff          Difference Files
    FileDump                 od            Dump File
    FileList                               List and Find Files
    Fill                                   Expand Text Template
    MapVar                                 Extract Load Map Variables
    PMon                     prof, gprof   Program Performance Monitor
    STrip                                  Extract Text
    Tail                     tail          Copy End of File
    TabTran                  sed           Translate Tabs
    TransLit                 tr            Transliterate Characters

Listing 1

    #include <stdio.h>

    main(argc,argv)
    int argc;
    char **argv;
    {
        int i;
        void crc();

        if (argc <= 1) {
            printf("USAGE:crc15 filename [filename...]\n");
            exit(1);
        }
        for(i=1; i < argc; i++) {
            printf ("\n%-s ",argv[i]);
            crc(argv[i]);
        }
        exit(0);
    } /* main */

    /* CRC
     * Cyclic Redundancy Check
     *
     */
    void crc(argv)
    char *argv;
    {
        FILE *fd;
        int crc;
        int c;
        char crc_char;
        int crc_clear(),crc_update(),crc_finish();

        fd = fopen(argv,"rb");
        if(!fd) {
            fprintf(stderr,"Can't open %s !\n",argv);
            exit(1);
        }
        crc = crc_clear();
        while((c = getc(fd)) != EOF) {
            crc_char = c;
            crc = crc_update(crc,crc_char);
        }
        crc = crc_finish(crc);
        printf("%04x",crc);
        fclose(fd);
    } /* crc */

    int crc_clear()
    {
        return(0);
    }

    int crc_update(crc,crc_char)
    int crc;
    char crc_char;
    {
        long x;
        int i;

        x = ((long)crc << 8) + crc_char;
        for(i = 0;i < 8;i++) {
            x = x << 1;
            if(x & 0x01000000)
                x = x ^ 0x01A09700;
        }
        return(((x & 0x00ffff00) >> 8));
    }

    int crc_finish(crc)
    int crc;
    {
        return(crc_update(crc_update(crc,'\0'),'\0'));
    }

Publisher's Forum

I've been reading documentation. It's no fun. Here's some advice from an experienced "how to" writer, who's also an experienced programmer, about how documentation should be structured to be useful. Include an extended procedural tutorial. This section is for the user who doesn't have enough prior experience with similar products to guess what to do next.
Don't mix tips about advanced tricks into this section, or cautions about product limitations and quirks. If you do, the user won't be able to find those important tidbits later without re-reading the entire section. In every "how-to" piece, focus is everything: give the procedural outline and just the procedural outline. Include a goal-oriented "Tips & Techniques" section. I don't care what fruity name you give your product, there will be certain non-obvious tricks that make it more productive. Organize these by goals -- e.g. Printing Fields From A Join, Timestamping A File, Converting File Formats. This section should be rife with cross-references and redundancy. Each goal's discussion should at least cross-reference related material that appears elsewhere, and include all the other "extraneous" information you were tempted to toss in as asides in the procedural section. Short, well-targeted examples belong here. Even if your product is "truly unique", the goals should be stated in terms of commonly recognized paradigms so that my experience with similar projects can speed my adaptation to your product. Include a thorough technical specification. No, technical specifications don't help the beginner, but they are invaluable to an experienced user. Cross-reference the specs. Include hardware requirements, interface specifications, data structure templates, file specifications, and command-line syntax for subordinate modules (even for those modules that are normally invoked by some "integrated environment driver" -- don't presume to know better than the programmer what he needs to know to get the job done). Explain the design goals and philosophy. Virtually every product started in a specific environment with a specific, limited application in mind. Yes, marketing will want to promote the product as everything for everyone, but make room somewhere in the document for the truth. 
Sharing the design philosophy helps the programmer understand where the product fits and reduces the early frustration level. If I'm trying to use your tool in a development project, and I know the design goals that produced the tool, I stand a better chance of designing a project that can be built with the tool. Invest in a superb index. So what if the answer to my question is in the manual. How many times can I afford to read a 900-page primer to find the two lines that are critical? The answer is a very small integer; I'm going to be calling customer support. Get your ego and marketing's time-table out of the way and hire a professional to prepare a SUPERB index. Every dollar spent on an index will be returned ten-fold in reduced customer support costs. Explain the installation process for standard environments, and then explain what configuration options are available and how they interact. Give me this information even if you do bundle a whizbang installation utility. I've probably been at this long enough to have my own ideas about where to put my working files. In short, keep your reader in mind. Design your documentation to meet the user's needs over his entire life as a user: a detailed step-by-step to orient the beginner; well-packaged goal-organized information to support the exploration and growth of the intermediate user; and comprehensive, frank, and well-indexed reference material for the experienced and technically advanced user. I mean it. Robert Ward Editor/Publisher New Products Industry-Related News And Announcements UNIX Alternative Announced For The Apple Macintosh Technical Systems Consultants, Inc. has released a UNIX compatible, real-time operating system for the Apple Macintosh family. The system, UniFLEX, supports multi-tasking and multi-users and comes complete with all development tools, a C Compiler, TCP/IP Networking support and X Window System v11.3 software. 
A version has also been released for Force Computer's CPU-37 single-board VMEbus computer with integrated Ethernet hardware. For the Apple Macintosh family, the price for a single system development license is $595. The price includes 90 days of phone support. For the Force CPU-37, the single system licensing price is $1000 for UniFLEX/RT or $1800 for UniFLEX/RN with networking. Contact Technical Systems Consultants, Inc., 111 Providence Road, Chapel Hill, NC 27514 (919) 493-1451; FAX (919) 490-2903.

Stepstone Updates Objective C

The Stepstone Corporation has released its Objective-C Compiler v4.0 running under MS-DOS and Microsoft's OS/2. Objective-C is a C-based hybrid object-oriented language and is ANSI C compatible. Objective-C v4.0 requires a PC/AT or PS/2 class machine running MS-DOS and Microsoft C v5.0. The compiler, packaged with a basic data structures library (ICpak101) and built-in extended memory support, is $249. Stepstone has also released its object-oriented user interface toolkit, ICpak101, for workstations running the X Window System v11. Product information is available from the Stepstone Corporation at (203) 426-1875, (800) 289-6253 or by mail to The Stepstone Corporation, 75 Glen Road, Sandy Hook, CT 06482.

Lattice's New 6.0 Release Features ANSI Compliance

Lattice, Inc. is shipping v6.0 of its C compiler for MS-DOS & OS/2. The release features major enhancements to the compiler, a global optimizer, new programming utilities, and a number of new library functions. Both the compiler and libraries are now ANSI compatible. Version 6.0 contains a new global optimizer, automatic register variable support, in-line function support, optimized libraries, and upgrades to the compiler. The Lattice C Compiler v6.0 allows program modules compiled under different memory models to be linked into a single program. Lattice v6.0 now includes LASM, a full-featured macro assembler with support for 386 systems.
LASM is compatible with MASM, and its output is compatible with CodePRobe so assembly language programs can also be debugged at source level. Utilities now bundled with the compiler are an overlay linker, a MAKE facility, BIND Utility, and several UNIX-like tools including EXTRACT and BUILD, DIFF, GREP, SPLAT, TOUCH, and WC. Programmer's tools in the compiler package include the CodePRobe source level debugger, an integrated editor, object module disassembler, object module librarian, and an automatic installation program. In addition to the OS/2 API and special graphics libraries in the previous version, Lattice adds its Curses screen management library, communications library, the dBC III library of database functions, and a protected mode OS/2 library. The new list price of $250 includes unlimited free technical support through Lattice's telephone hotline, bulletin board, MIX network, or written correspondence. Lattice provides an unconditional, 30-day money-back guarantee with each product. For further information, contact: Lattice, Inc., 2500 South Highland Avenue, Lombard, IL 60148 (312) 916-1600; FAX: (312) 916-1190. Greenleaf CommLib, V3.0 Released Greenleaf Software has released a new version of its communications library, CommLib. Greenleaf CommLib v3.0 includes Kermit, XModem, XModem 1K, and YModem batch file transfer protocols. It fully supports automatic RTS-CTS hardware flow control, Hayes modem control functions, and XON/XOFF software flow control. CommLib automatically filters up to three codes from the receive stream, stores status along with data in a "WideTrack Receive" mode, and programmatically ignores or reacts to modem status at the interupt service level. The Greenleaf CommLib supports the PC, XT, AT, PS/2, and compatible machines using COM1 and COM2 ports, and COM3..COM8 on a PS/2. It also supports up to 35 ports when using multiport boards. 
It can serve several families of multi-port boards, including Digiboard, Stargate, Arnet, Contec, Quatec, and Quadram. CommLib v3.0 is $299. For additional information and a free Demo disk, contact Greenleaf Software, Inc.; 16479 Dallas Parkway, Suite 570; Dallas, TX 75218; (800) 523-9830; FAX (214) 248-7830. XVT Now Runs On MS-DOS, OS/2 And UNIX New character-based versions of GSS' XVT Extensible Virtual Toolkit are available for MS-DOS, OS/2 and UNIX programmers. XVT allows programmers to support character displays with applications that feature windowing, pull-down menus, dialog boxes, scroll-bars and other graphical user-interface features. The same application source code can support the Windows, PM and Mac GUIs. Versions for Windows, PM, Macintosh and UNIX list for $595. XVT carries no run-time redistribution royalties. The company is located at 9590 SW Gemini Drive, Beaverton, Oregon 97005 (503) 641-2200; FAX: (503) 643-8642. Helios Enhances Proteus System Helios Software has released a new version of its prototype/demo system, Proteus. Proteus v4.5 enables software developers to build functional prototypes, marketing demos, tutorials and other interactive presentations. Version 4.5 offers an integrated environment to build character-based demos and bitmapped demos in any of 23 graphics formats. Both full-screen and overlay images can be displayed, using 26 different video effects. Designers can create screens with the built-in Screen Painter or configure Proteus to execute any external paint program. Captured screens can also be incorporated into demos. Proteus is $199 for a three-disk set, with examples in source code. There is a 30-day money-back guarantee, no royalties for distribution and no sign-on screen. Required hardware configurations depend on the graphics mode used, ranging from monochrome text to super-VGA. The Helios order number is (800) 634-9986, or contact them at P.O. Box 22869, Seattle, WA 98112 (206) 324-7208. 
High C V1.6 Includes 486 Support MetaWare Inc. has released its ANSI-conformant High C compiler v1.6 for 386/DOS on the 80386 and the 80486 in protected mode. Protected mode on the 386 and 486 is supported in conjunction with MS-DOS extenders. Specific support for the 486 is provided under toggle control. MetaWare has also released High C v1.6 for OS/2 and real-mode MS-DOS. Version 1.6 features expanded libraries, new documentation, two editors, a disk cache utility, a B-tree library, and a graphics library for the 80386/486 in protected mode. Users also get MetaWare's new make facility, and DOS Helper, which is a set of UNIX-style utilities for the MS-DOS operating system. This upgrade comes with the GFX/386 Graphics library, produced in conjunction with C Source. GFX for the 80386 is a user-transparent port of the C Source GFX graphics package. The graphics package provides specific floating-point graphics functions; MetaWare is providing additional libraries that support the 80387 and Weitek Abacus. High C also includes the EC editor from C Source, HyperDisk disk cache from HyperWare, and source code for the MicroEMACS editor. In addition, v1.6 will be bundled with two products from Sterling Castle: BlackStar/386 "C" Function Library and BPTPlus in C. These products provide data retrieval capabilities and over 300 additional library functions. Sterling Castle's BlackStar/386 C libraries and the GFX/386 Graphics library are available only through MetaWare. Please refer inquiries to MetaWare Incorporated, 2161 Delaware Avenue, Santa Cruz, CA 95060-5706 (408) 429-6382; FAX (408) 429-9273. Prototyping Tools Combined Genesis Data Systems has consolidated its line of prototyping and presentation products, formerly sold as RADs and RPS, into a single system named "ProtoFinish." ProtoFinish is a versatile system for creating prototypes, demos, tutorials and other presentations. 
It includes a screen design module for building ASCII-based screens, a memory-resident utility for capturing text or CGA graphics screens, a music module for adding sound, a flexible 4th-generation language for accurately simulating the look and feel of a program, and a royalty-free run-time utility for distribution. Libraries of assembly language routines, primarily for incorporating screens in C, PASCAL, BASIC, and Clipper code, are included for the programmer. Contact Genesis Data Systems, 8415 Washington Place NE, Albuquerque, NM 87113 (505) 821-9425; FAX (505) 821-9695. LISP Objects Sapiens Software has released a beta test version of its Common LISP Object System (CLOS) implementation. CLOS supports generic functions and methods (rather than message passing), and multiple inheritance of object slots. Star Sapphire CLOS is embedded in the Star Sapphire LISP v3.1 run-time, written in C, which eliminates CLOS loading time. Star Sapphire LISP runs on any PC compatible with 640Kb and a hard disk; extended memory can be used if installed. The product is $99.95 from Sapiens Software Corporation, P.O. Box 3365, Santa Cruz CA 95063 (408) 458-1990. Faircom Offers 'Special Edition' The Faircom Corporation has released a new application development toolbox, which includes the d-tree development environment, file management system and report generation system. Faircom is introducing this product with a $695 "Special Edition" package and a 30-day, no-risk trial offer. For more information, contact Faircom at (800) 234-8180, 4006 West Broadway, Columbia, MO 65203; FAX (315) 445-9698. Oakland Updates Screen Tools Oakland Group, Inc. has released v3.1 of the Look & Feel screen designer and the C-scape interface management system. Look & Feel lets you prototype and simulate screens, and automatically turn screens into C source code that will run across MS-DOS, OS/2, UNIX, and VMS. 
The new version of C-scape allows for total portability, has fewer levels of indirection, and creates smaller executables. MS-DOS and OS/2 versions of C-scape with Look & Feel cost $399, including source code. Look & Feel costs $149; C-scape $299. UNIX versions begin at $999. Look & Feel source code costs $900. For more information, contact Oakland Group, Inc., 675 Massachusetts Avenue, Cambridge, MA 02139 (800) 233-3733 or (617) 491-7311. New Linker Pocket Soft, Inc., has released .RTLink/Plus, an advanced overlay linker which supports debugging of programs with multiple and nested overlays with Microsoft's CodeView debugger. .RTLink/Plus also provides a unique link-time Profiler, which gives a detailed performance analysis in timing intervals which are user-adjustable to thousandths of a second. Pocket Soft is an authorized licensee of Microsoft CodeView information. .RTLink/Plus has a list price of $495 and is available through most common distribution/reseller channels and direct from Pocket Soft, Inc., 7676 Hillmont, Suite 195, Houston, TX 77040 (713) 460-5600. Tool Writes Dialog Box Source Code The Software Organization, Inc. has released DialogCoder, a programming tool that eliminates as much as 95 percent of the coding normally associated with windows dialog box programming. DialogCoder automatically generates C source code from dialog templates to manage all controls in the dialog; it uses graphical metaphors to express the relationships between dialog controls and actions, which eliminates most of the conventional dialog control programming. It also allows users to interactively specify the state of each dialog control during initialization and command processing. DialogCoder requires a 286-or 386-based machine with Windows 2.X. A Microsoft-compatible mouse is optional. DialogCoder is $349. To order, contact the Software Organization, Inc. at (800) 696-2012. 
Trio Releases C-Index/PC Trio Systems has started shipping a new $195 C database library, C-Index/PC. The new product, based on their C-Index/Plus package, allows C programmers to incorporate database features into their applications running under Microsoft Windows, OS/2, and MS-DOS. The C-Index/PC database library supports single-user and multi-user LAN applications with full file management facilities. Complete source code is supplied with C-Index/PC and can be adapted for use with any PC compiler and operating system running on an Intel microcomputer. Product features include: precompiled libraries for Microsoft C and Turbo C, B+ tree indexing, variable-length records, direct and sequential access and multiple record formats per file. There are no application royalties. For more information, call (818) 798-5567. New Debug Tool Traces Memory References TUITS Inc. has introduced Dr. MD, a run-time memory tracking utility that finds memory overwrite bugs before an application crashes. Dr. MD catches memory overwrites when they happen. It also catches 'free()s' on invalid pointers, and dangling pointers. Dr. MD will not allow you to overwrite allocated or automatic variables. When Dr. MD finds a problem it reports the source file and the line number where the problem was found as well as where the space was allocated. No heap walking is needed. Dr. MD comes as source and you compile it with your compiler to fit your environment. The vendor claims it should work with any ANSI standard compiler, and has worked successfully in MS-DOS and UNIX System V environments. Dr. MD supports all the string library functions as well as memset, memcpy, and limited support of sprintf. Dr. MD sells for $59.95, and includes source code, manual, and some hints on memory management. For more information contact TUITS Inc., 411 N. Shields, Fort Collins, CO 80521, or call at (303) 224-9070. 
AtLast Offers Overlay Tools AtLast Software has released two new products: Overlay Architect, which automates the process of overlay construction, and Overlay Optimizer, which analyzes the performance of the program's overlay structure, then determines how to rebuild the overlays for the best performance in a given amount of space. AtLast Software will also custom build an overlay structure for developers who do not want to build their own. Overlay Architect sells for $369; Overlay Optimizer for $269. They can be purchased together for $569. Quantity discounts are available. Custom built structures are priced individually. MicroWay 486 Compilers For C, Pascal & FORTRAN MicroWay has released its 80486-targeted series of compilers, NDP C-486, NDP Fortran-486, and NDP Pascal-486. Each of the NDP-486 compilers includes a "scheduler/code generator" that aligns code and data on paragraph boundaries, detects and minimizes prefetch buffer starving, uses new code sequences that run faster on the 80486 than the 80386, and incorporates a new strategy for driving the Weitek 4167 high speed coprocessor. They also provide a library of 70 device-independent graphics, keyboard, and sound routines. NDP C-486, NDP Fortran-486, and NDP Pascal-486 generate globally optimized, 32-bit native code that runs in protected mode under UNIX 386 System V v3.0, SCO XENIX 386 v2.3, and Phar Lap extended DOS. The compilers support the 486's built-in FPU and the Weitek 4167 numeric coprocessor. NDP C-486 is a two-dialect compiler that passes 100 percent of the Plum Hall validation suite for UNIX System V C and 95 percent of the tests for the new ANSI C standard. It includes an inline assembly language interface that simplifies the writing of embedded code by allowing the programmer to specify register values and generate interrupts. The MS-DOS, UNIX, and XENIX versions of NDP C-486, NDP Fortran-486, and NDP Pascal-486 retail at $1195 each. The C++ preprocessor lists at $495. 
All of the compilers include one year of free updates. Users should contact MicroWay's Technical Support Staff at (508) 746-7341 for more information. DOS Extender Supports Turbo C Eclipse Computer Solutions, Inc.'s OS/286 MS-DOS extender now supports Borland's Turbo C v2.0 and will soon support Turbo Pascal as well. The MS-DOS extender products of Eclipse Computer Solutions, Inc. (formerly A.I. Architects) exploit the protected mode operation of the 80286 and 80386 processors and make it possible to create, with conventional development tools, applications that are not restricted by normal MS-DOS memory limits. Contact Eclipse Computer Solutions, Inc., One Intercontinental Way, Peabody, MA 01960 (508) 535-7510; FAX: (508) 535-7512. T & T Enhances Data Junction Tools & Techniques has released Data Junction v3.01. The new version adds formats, an improved user interface, an expanded EZ-Convert mode, 300 percent plus speed improvements, a built-in case translation, and new conversion filters. MS-DOS licenses are $99 for Data Junction: Standard, $199 for Data Junction: Professional, and $299 for Data Junction: Advanced. UNIX/Xenix and LAN licenses start at $495. Data Junction is written in C, and distribution/OEM licenses are also available. For more information, contact Micheal Hoskins at Tools & Techniques Inc., 1620 West 12th Street, Austin, TX 78703 (800) 444-1945, or (512) 482-0824. LALR Adds Scanner Generator To Version 3.2 LALR Research has released LALR v3.2 which features the following improvements over v3.0. A lexical scanner generator is included which provides a 10 percent increase in syntax checking speed over the previous hand-written scanner. An option has been added to generate 0-40 percent smaller parsers. Multiple parsers can exist in an application program. Parsers can read input files of unlimited size. The input grammar format for the new version is fully compatible with previous versions. 
LALR v3.2 is $249 and comes with a 60-day, money-back guarantee. Upgrades from LALR v3.0 are $150. Shipping is $6. For more information, contact LALR Research at PO Box 4722, Chico CA 95927 (916) 345-0916. Solbourne Updates OS/MP Solbourne Computer, Inc., has shipped the latest version of its multiprocessing operating system, OS/MP v4.0A, which is based on the SunOS v4.0.1, licensed from Sun Microsystems Inc. OS/MP v4.0A introduces a set of system administration tools to handle user account maintenance, group account maintenance, network group account maintenance, network account maintenance, NFS client maintenance, NFS server configuration and modem installation. OS/MP v4.0 also includes two new X Window tools. Smail is a user-friendly interface to the standard UNIX mail environment. Sproperty displays the property of any visible X Window. Contact Solbourne Computer, Inc. at 1900 Pike Road, Longmont, CO 80501 (303) 722-3400; FAX: (303) 772-3646.

Belief Maintenance Using The Dempster-Shafer Theory Of Evidence

Dwayne Phillips

The author works as a computer and electronics engineer with the U.S. Department of Defense and is a doctoral candidate in Electrical and Computer Engineering at Louisiana State University. His interests include computer vision, artificial intelligence, software engineering, and programming languages. He first used the Dempster-Shafer theory of evidence in 1984 and uses it extensively in his PhD research into computer vision.

An expert system makes a decision given an amount of evidence. Usually it must choose between several competing answers or hypotheses. The human expert keeps these answers in his mind while he thinks over the problem. He gathers evidence and shifts his thoughts from one answer to another. After gathering evidence, he chooses the most favorable answer. We all do this in our daily decisions, but we don't think about the process, and we certainly don't keep track of specific numbers in our head. 
An expert system needs a sub-system to pool evidence and reach decisions: a belief maintenance system. The belief maintenance system keeps track of the hypotheses and the degree of belief attributed to each hypothesis. When the expert system finishes gathering evidence, the belief maintenance system chooses the answer. In some expert systems a belief maintenance system is not necessary, because some expert systems make decisions based on a single, clear cut piece of evidence. For instance, suppose an expert system has the task of rolling up the windows in your car. The evidence is whether or not it is raining. The system would check the atmosphere and ask, "Is it raining?" If the answer were yes, it would roll up the windows. In other expert systems a belief maintenance system is essential. Suppose the expert system had to decide at 9.00 AM whether or not to roll up the windows at 3.00 PM. Now the question is tougher. Evidence would include the daily weather forecast, the wind speed and direction, the relative humidity, weather records from past years, forecasts from the Farmer's Almanac, satellite photographs, and other relevant sources. The expert system would pool all the evidence and arrive at an answer. Consider the nature of evidence. Some evidence is not reliable (the weatherman is wrong sometimes and right sometimes). Some evidence is uncertain (an intermittent atmospheric reading). Some is incomplete (the wind speed by itself does not tell us much). Some evidence is contradictory (the weatherman's forecast and the atmospheric conditions). Finally, some evidence is incorrect (a broken atmospheric sensor or a wrong weather forecast). The belief maintenance system must deal with these factors, taking the evidence, assigning a measure of belief to each hypothesis, and changing this belief as new evidence becomes available. The resulting decision must be the same regardless of the order in which the system gathers the evidence. 
The method of belief maintenance that most of us know is classical probability. The basic properties of this system are [Beyer]:

A) P(∅) = 0 (null set)
B) P(Q) = 1 (entire sample set)
C) P(A) = 1 - P(A')
D) P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
E) P(A ∩ B) = P(A) * P(B), if A and B are independent

Another belief maintenance system came from the MYCIN project (a pioneering medical expert system developed in the early seventies by Edward Shortliffe at Stanford). MYCIN used a system of certainty factors to keep track of hypotheses. Shortliffe later dropped the certainty factor system for the Dempster-Shafer theory of evidence. The Dempster-Shafer (D-S) theory of evidence was created by Glenn Shafer [Shafer, 1976] at Princeton. He built on earlier work performed by Arthur Dempster. The theory is a broad treatment of probabilities, and includes classical probability and Shortliffe's certainty factors as subsets. In the D-S theory of evidence, the set of all hypotheses that describes a situation is the frame of discernment. The letter Q denotes the frame of discernment. The hypotheses in Q must be mutually exclusive and exhaustive, meaning that they must cover all the possibilities and that the individual hypotheses cannot overlap. The D-S theory mirrors human reasoning by narrowing its focus gradually as more evidence becomes available. Two properties of the D-S theory permit this process: the ability to assign belief to ignorance, and the ability to assign belief to subsets of hypotheses. An example provides the easiest way to understand these properties and how they differ from classical probability. Suppose we want to decide which of three persons in an office -- Adam, Bob, and Carol -- will come in early to turn on the lights and make coffee. In the D-S theory the set Q = {Adam or Bob or Carol}. The sets {Adam}, {Bob}, and {Carol} are the mutually exclusive and exhaustive hypotheses. They are singletons. 
In the frame of discernment there are 2^|Q| = 2^3 = 8 possible interpretations (Figure 1). Figure 1 contains two special sets, {∅} and {Adam, Bob, Carol}. The first is the null set, which cannot hold any value. As later examples will show, any belief that falls on the null set is used to normalize the other beliefs. The second special set is {Adam or Bob or Carol}, represented by Q. Assigning belief to Q does not help distinguish anything. Therefore, Q represents ignorance. Representing ignorance is a key concept. Humans often give weight to the hypothesis "I don't know", which is not possible in classical probability. Assigning belief to "I don't know" allows us to delay a decision until more evidence becomes available. This mirrors the human tendency to procrastinate. Suppose that given a piece of evidence, we make the assertion shown in Figure 2. The D-S theory calls an assertion a basic probability assignment. The m in Figure 2 represents the measure of belief. The assertion of Figure 2 says that we believe Adam is the best choice with a weight of 0.6. We'll give the other 0.4 of belief to Q or "I don't know," thus allowing us to delay deciding on Adam. We cannot make this type of assertion in classical probability. The classical system's property of complements given earlier forces us to give 0.4 to Adam' (the complement of Adam) if we give 0.6 to Adam. In this case Adam' = {Bob or Carol}. Notice the difference between Q = {Adam or Bob or Carol} and Adam' = {Bob or Carol}. Adam' gives more belief to Bob and Carol than we want. Q allows us to express a true "no comment" on the situation. Assigning belief to subsets in the D-S theory allows us to assign belief to a general concept instead of being too specific. Suppose in our example that the local police advise us that we should not have women coming to work early by themselves. We would make an assertion like that shown in Figure 3. This assertion gives a weight of 0.7 to the subset {Adam or Bob} and a weight of 0.3 to ignorance or no comment. 
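The assertions of Figures 2 and 3 can be written down concretely. The following is a minimal sketch, assuming the three-bit subset encoding that the article introduces later in Figure 7 (Adam = 100, Bob = 010, Carol = 001). The names ADAM, BOB, CAROL, THETA, NSETS, make_assertion, and is_valid_assertion are illustrative and do not appear in the author's listing.

```c
#include <assert.h>

#define ADAM   4                      /* bits 100 */
#define BOB    2                      /* bits 010 */
#define CAROL  1                      /* bits 001 */
#define THETA  (ADAM | BOB | CAROL)   /* bits 111: "I don't know" (Q in the text) */
#define NSETS  8                      /* 2^3 subsets, indexed by bit pattern */

/* Build an assertion that gives `belief` to one proper subset and
   the remainder to THETA (ignorance). All other slots stay at zero. */
void make_assertion(float m[NSETS], int subset, float belief)
{
    int i;

    assert(subset > 0 && subset < THETA);
    assert(belief >= 0.0f && belief <= 1.0f);
    for (i = 0; i < NSETS; i++)
        m[i] = 0.0f;
    m[subset] = belief;
    m[THETA]  = 1.0f - belief;
}

/* An assertion is valid when nothing sits on the null set (slot 0),
   no belief is negative, and the beliefs sum to one. */
int is_valid_assertion(const float m[NSETS])
{
    float sum = 0.0f;
    float diff;
    int i;

    if (m[0] != 0.0f)
        return 0;
    for (i = 0; i < NSETS; i++) {
        if (m[i] < 0.0f)
            return 0;
        sum += m[i];
    }
    diff = sum - 1.0f;
    if (diff < 0.0f)
        diff = -diff;
    return diff < 1e-5f;
}
```

Calling make_assertion(m, ADAM, 0.6f) reproduces Figure 2, and make_assertion(m, ADAM | BOB, 0.7f) reproduces Figure 3. In the Figure 2 vector the slot for BOB | CAROL (Adam') stays at zero, which is exactly what the complement rule of classical probability would not allow.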
Classical probability does not permit a subset assertion. Recall that property D requires P(Adam or Bob) = P(Adam) + P(Bob). That property would force us to assign specific beliefs to Adam and to Bob individually. We do not want to be that specific. We want to procrastinate and think it over some more. Also, property C would make us assign the 0.3 to the complement of {Adam or Bob}, which is {Carol}. We do not want to assign 0.3 to {Carol}. Assigning belief directly to {Carol} would contradict the evidence the police gave us. The D-S theory employs Dempster's rule of combination to combine two assertions. The mathematical formulas may be found in the references. They confuse the best of us, but they are simple when illustrated. Figure 4 shows how the two assertions combine. The table in Figure 4 is an intersection tableau, which lists one assertion across the top and one down the side. Inside the tableau are the intersections of the sets in the rows and columns, with the products of the corresponding beliefs assigned to the intersections. The measures of belief inside the table sum to the final values given below the table. Notice how combination narrows the decision process. The single set {Adam} now has the highest belief. The subset {Adam or Bob} comes in second, with no comment last. Now suppose that we require the first person in the office in the morning to bring up the computer system. Carol is an expert at this, so we make the assertion shown in Figure 5. This attributes most of the belief to {Carol}. This new requirement or piece of evidence contradicts the previous evidence given by the police. That is the nature of evidence. Dempster's rule of combination allows us to combine the contradictory evidence and draw a logical conclusion. Figure 6 shows the combination of the result of Figure 4 and the assertion of Figure 5. Inside the intersection tableau is the null set. 
There is no intersection between the set {Carol} and the set {Adam} and there is also no intersection between the set {Carol} and the set {Adam, Bob}. The null set cannot hold any value. Therefore, it normalizes the beliefs of the other subsets. The belief of each remaining subset is divided by one minus the belief in the null set, so that the beliefs of all the subsets again sum to one. The bottom of Figure 6 shows this extra step. As a result, Carol is now the choice for coming in early in the morning. If she is unable to do so, then Adam is the logical replacement. If Adam is unavailable, then Bob comes in early.

Implementation

The preceding examples show that no complex mathematics is involved in combining two assertions. Dempster's rule of combination uses simple addition, subtraction, multiplication, and division. The only tricky part is the intersections of the sets in the tableau. There are several ways to solve the intersection question. Since there are three singletons and 2^3 = 8 total interpretations, we'll represent the hypotheses with three bits as in Figure 7. Listing 2 shows the C function that combines two assertions. The inputs are two belief vectors, each holding an assertion. The belief vector is a one-dimensional array of floats. In our examples, the LENGTH_OF_BELIEF_VECTOR is eight because we have three singletons and 2^3 = 8. The belief vector has a space, or slot, for each hypothesis, ordered as in Figure 7. The belief vector is awkward to initialize since we would like Adam in slot one, not slot four, and Carol in slot three, not slot one. Nevertheless, a uniform belief vector allows a very simple subroutine to combine the assertions. The first for loop initializes the sum_vector, the belief vector which holds the sums of the values found inside the intersection tableau. sum_vector holds the sums for later when normalization occurs. The for loop over a goes through the belief vectors, finds the intersections, and calculates the products. 
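The loop structure just described, together with the normalization step, can be sketched compactly. This is a minimal sketch in ANSI C, not the author's listing: it assumes the Figure 7 bit encoding, uses double instead of float, and the names combine and NSETS are illustrative. Unlike the listing, it also clears slot 0 of the result and reports total conflict instead of dividing by zero.

```c
#include <assert.h>

#define NSETS 8   /* 2^3 subsets, indexed by the Figure 7 bit patterns */

/* Combine assertion m2 into m1 using Dempster's rule of combination.
   Returns 0 on success, or -1 when the two assertions conflict
   completely (all belief lands on the null set); m1 is then unchanged. */
int combine(double m1[NSETS], const double m2[NSETS])
{
    double sum[NSETS] = {0.0};   /* sums from the intersection tableau */
    double denominator;
    int a, i;

    assert(m1 != 0 && m2 != 0);

    for (a = 0; a < NSETS; a++) {
        if (m2[a] <= 0.0)
            continue;            /* skip empty columns of the tableau */
        for (i = 0; i < NSETS; i++) {
            if (m1[i] > 0.0)
                sum[i & a] += m1[i] * m2[a];   /* i & a is the set intersection */
        }
    }

    /* Belief that fell on the null set (slot 0) is removed by normalizing. */
    denominator = 1.0 - sum[0];
    if (denominator < 1e-12)
        return -1;

    m1[0] = 0.0;
    for (i = 1; i < NSETS; i++)
        m1[i] = sum[i] / denominator;
    return 0;
}
```

Run on the article's example, this sketch combines Figure 2 (m{Adam} = 0.6, m{Q} = 0.4) with Figure 3 (m{Adam, Bob} = 0.7, m{Q} = 0.3) to give {Adam} = 0.60, {Adam, Bob} = 0.28, and Q = 0.12. Combining that result with Figure 5 (m{Carol} = 0.9, m{Q} = 0.1) puts 0.792 on the null set, and after dividing by 1 - 0.792 = 0.208 the beliefs are approximately {Carol} = 0.519, {Adam} = 0.288, {Adam, Bob} = 0.135, and Q = 0.058, which matches the ranking of Carol, then Adam, then Bob given in the text.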
The two if > 0.0 statements reduce processing time by eliminating unnecessary multiplication by zero. The function uses the C bitwise AND operator & to find the intersection of sets. Without the bitwise AND, the function would be much longer and much more complex. The last for loop performs the normalization. The values in sum_vector are divided by one minus the value assigned to the null set. The answer is stored in vector1. The combine_using_dempsters_rule function is the meat of the program, which is written in Turbo C v1.5. I used this compiler because it had a few functions that made the user interface more pleasant. Except for those functions, there is nothing in the program that is machine, compiler, or operating system specific. One important note about implementing Dempster's rule of combination: the number of calculations depends on 2^|Q|, where |Q| is the number of single hypotheses. In our example, three single hypotheses produced eight subsets. By contrast, 200 single hypotheses would produce 2^200 subsets, 2^200 slots in the belief vector, and 2^200 floating point calculations. This gets out of hand rather quickly. Several of the references [Gordon, Shortliffe 1985] [Shafer 1985] [Shafer 1987] deal exclusively with this topic. The discussion and proposed solutions are beyond the scope of this article.

Conclusion

The Dempster-Shafer theory of evidence is one method that an expert system may use to keep score on competing hypotheses while it gathers evidence and draws a logical conclusion. It is more general and capable than the classical probability with which most of us are familiar. It is easy to implement and executes quickly as long as the number of hypotheses is manageable. I suggest you try it on your next expert system or AI-related project.

References

Beyer, William H., CRC Standard Mathematical Tables, 26th edition, CRC Press, 1983, pp. 503-559.
Gordon, Jean, Edward H. Shortliffe, "The Dempster-Shafer Theory of Evidence," pp. 272-292 of Shortliffe, Edward H., Bruce G. 
Buchanan, eds., Rule Based Expert Systems, Addison Wesley Publishing Company, 1984.
Gordon, Jean, Edward H. Shortliffe, "A Method for Managing Evidential Reasoning in a Hierarchical Hypothesis Space," Artificial Intelligence, Vol. 26, No. 3, July 1985, pp. 323-357.
Shortliffe, Edward H., Bruce G. Buchanan, eds., Rule Based Expert Systems, Addison Wesley Publishing Company, 1984.
Shafer, Glenn, A Mathematical Theory of Evidence, Princeton University Press, 1976.
Shafer, Glenn, "Hierarchical Evidence," The Second Conference on Artificial Intelligence Applications, IEEE Press, December 1985, pp. 16-21.
Shafer, Glenn, Roger Logan, "Implementing Dempster's Rule for Hierarchical Evidence," Artificial Intelligence, Vol. 33, No. 3, November 1987, pp. 271-298.

Figure 1 Frame of Discernment for the Case of Adam, Bob, and Carol

{Adam, Bob, Carol}
{Adam, Bob} {Adam, Carol} {Bob, Carol}
{Adam} {Bob} {Carol}
{∅}

Figure 2 An Assertion Showing the Use of Ignorance

m{Adam} = 0.6
m{Q} = 0.4

Figure 3 An Assertion Showing Belief Assigned to a Subset

m{Adam, Bob} = 0.7
m{Q} = 0.3

Figure 4 Combining Two Assertions Using Dempster's Rule of Combination

Figure 5 A New Assertion

m{Carol} = 0.9
m{Q} = 0.1

Figure 6 Combining Result of Figure 4 with Figure 5

Figure 7 Using Three Bits to Represent the Hypotheses

bits   hypothesis
000    {∅}
001    {Carol}
010    {Bob}
011    {Bob, Carol}
100    {Adam}
101    {Adam, Carol}
110    {Adam, Bob}
111    {Adam, Bob, Carol} or {Q}

Listing 1

/*******************************************************************
*       file d:\tc\cujds.c
*
*       Functions: This file contains
*           main
*           display_belief_vector
*           clear_belief_vector
*           enter_belief_vector
*           combine_using_dempsters_rule
*
*       Purpose:
*           This program demonstrates how to implement Dempster's
*           rule of combination.
*
*       NOTE: This is written for Borland's Turbo C
*             Version 1.5. This allows us to use some
*             nice user interface functions. The actual
*             combination code is compiler independent.
*
******************************************************************/

extern unsigned int _stklen = 40000;

#include "d:\tc\include\stdio.h"
#include "d:\tc\include\io.h"
#include "d:\tc\include\fcntl.h"
#include "d:\tc\include\dos.h"
#include "d:\tc\include\math.h"
#include "d:\tc\include\graphics.h"
#include "d:\tc\include\conio.h"
#include "d:\tc\include\sys\stat.h"

#define LENGTH_OF_BELIEF_VECTOR 8

main()
{
   char  response[80];
   int   choice, i, j, not_finished;
   short place;
   float a[LENGTH_OF_BELIEF_VECTOR],
         belief,
         v[LENGTH_OF_BELIEF_VECTOR];

   textbackground(1);
   textcolor(7);
   clrscr();

   not_finished = 1;
   while(not_finished){

      clrscr();
      printf("\n> You may now either:");
      printf("\n   1. Start the process");
      printf("\n   2. Enter more assertions");
      printf("\n   3. Exit program");
      printf("\n   _\b");
      get_integer(&choice);

      switch (choice){

         case 1:   /* start over with two fresh assertions */
            clear_belief_vector(v);
            clear_belief_vector(a);
            clrscr();
            enter_belief_vector(v, 1);
            clrscr();
            enter_belief_vector(a, 1);
            clrscr();
            printf("\n> Initial Belief Vector\n");
            display_belief_vector(v);
            printf("\n> Second Belief Vector\n");
            display_belief_vector(a);
            combine_using_dempsters_rule(v, a);
            printf("\n> Resultant Belief Vector\n");
            display_belief_vector(v);
            break;

         case 2:   /* combine one more assertion with the current result */
            clrscr();
            clear_belief_vector(a);
            enter_belief_vector(a, 1);
            clrscr();
            printf("\n> Initial Belief Vector\n");
            display_belief_vector(v);
            printf("\n> Second Belief Vector\n");
            display_belief_vector(a);
            combine_using_dempsters_rule(v, a);
            printf("\n> Resultant Belief Vector\n");
            display_belief_vector(v);
            break;

         case 3:
            not_finished = 0;
            break;

      }  /* ends switch choice */
   }  /* ends while not_finished */
}  /* ends main */

clear_belief_vector(v)
   float v[];
{
   int i;
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      v[i] = 0.0;
}  /* ends clear_belief_vector */

display_belief_vector(v)
   float v[];
{
   int  i, j;
   char response[80];

   j=1;
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
      if((j%5) == 0){
         printf("\n");
         j++;
      }
      if(v[i] > 0.0001){
         printf(" [%3d]=%6f",i, v[i]);
         j++;
      }
   }
   printf("\n   Hit RETURN to continue");
   read_string(response);
}  /* ends display_belief_vector */

enter_belief_vector(v, line)
   float v[];
   int   line;
{
   int   i, not_finished, y;
   float value;

   y = line;
   printf("\n> ENTER BELIEF VECTOR");
   printf("\n> Enter the place (RETURN) and value (RETURN)");
   printf("\n> (Enter -1 for place when you're finished)");

   not_finished = 1;
   while(not_finished){
      printf("\n [__]=___");
      y = wherey();
      gotoxy(5, y);
      get_integer(&i);
      gotoxy(10, y);
      get_float(&value);
      if(i != -1){
         v[i] = value;
      }  /* ends if i != -1 */
      else
         not_finished = 0;
   }  /* ends while not_finished */
}  /* ends enter_belief_vector */

/***************************************************************
*
*       This is the function that implements Dempster's rule
*       of combination.
*       vector1 holds the original beliefs and will hold the
*       result of the combination.
*
***************************************************************/

combine_using_dempsters_rule(vector1, vector2)
   float vector1[LENGTH_OF_BELIEF_VECTOR],
         vector2[LENGTH_OF_BELIEF_VECTOR];
{
   float denominator,
         sum_vector[LENGTH_OF_BELIEF_VECTOR];
   int   a, i, place;

      /* set the sums to zero */
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      sum_vector[i] = 0.0;

      /* Now go through the intersection tableau.       */
      /* Look for the intersection of non-zero beliefs  */
      /* and save their products.
   */
   for(a=1; a<LENGTH_OF_BELIEF_VECTOR; a++){
      if(vector2[a] > 0.0){
         for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
            place = i & a;
            if(vector1[i] > 0.0)
               sum_vector[place] = (vector1[i] * vector2[a]) + sum_vector[place];
         }  /* ends loop over i */
      }  /* ends if vector2[a] > 0.0 */
   }  /* ends loop over a */

   denominator = 1.0 - sum_vector[0];
   for(i=1; i<LENGTH_OF_BELIEF_VECTOR; i++)
      vector1[i] = sum_vector[i]/denominator;
}  /* ends combine_using_dempsters_rule */

/* The following functions are I-O */

read_string(string)
   char *string;
{
   int eof, letter, no_error;

   eof      = -1;
   no_error = 0;
   while((letter = getchar()) != '\n' && letter != eof)
      *string++ = letter;
   *string = '\0';
   return((letter == eof) ? eof : no_error);
}  /* ends read_string */

clear_buffer(string)
   char string[];
{
   int i;
   for(i=0; i<80; i++)
      string[i] = ' ';
}

long_clear_buffer(string)
   char string[];
{
   int i;
   for(i=0; i<300; i++)
      string[i] = ' ';
}

#define is_digit(x)   ((x >= '0' && x <= '9') ? 1 : 0)
#define is_blank(x)   ((x == ' ') ? 1 : 0)
#define to_decimal(x) (x - '0')
#define NO_ERROR 0
#define IO_ERROR -1
#define NULL2 '\0'

get_integer(n)
   int *n;
{
   char string[80];
   read_string(string);
   int_convert(string, n);
}

int_convert(ascii_val, result)
   char *ascii_val;
   int  *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;   /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /*
    * convert the ASCII representation to the actual
    * decimal value by subtracting '0' from each character.
    *
    * for example, the ASCII '9' is equivalent to 57 in decimal.
    * by subtracting '0' (or 48 in decimal) we get the desired
    * value.
    *
    * if we have already converted '9' to 9 and the next character
    * is '3', we must first multiply 9 by 10 and then convert '3'
    * to decimal and add it to the previous total yielding 93.
    *
    */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else
         return (IO_ERROR);

   *result = *result * sign;
   return (NO_ERROR);
}

get_short(n)
   short *n;
{
   char string[80];
   read_string(string);
   short_convert(string, n);   /* a short needs the short converter */
}

short_convert(ascii_val, result)
   char  *ascii_val;
   short *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;   /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /*
    * convert the ASCII representation to the actual
    * decimal value by subtracting '0' from each character.
    *
    * for example, the ASCII '9' is equivalent to 57 in decimal.
    * by subtracting '0' (or 48 in decimal) we get the desired
    * value.
    *
    * if we have already converted '9' to 9 and the next character
    * is '3', we must first multiply 9 by 10 and then convert '3'
    * to decimal and add it to the previous total yielding 93.
    *
    */
   while (*ascii_val){
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else
         return (IO_ERROR);
   }  /* ends while ascii_val */

   /* apply the sign once, after all digits are accumulated; */
   /* negating inside the loop corrupts multi-digit values   */
   *result = *result * sign;
   return (NO_ERROR);
}

get_long(n)
   long *n;
{
   char string[80];
   read_string(string);
   long_convert(string, n);
}

long_convert(ascii_val, result)
   char *ascii_val;
   long *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;   /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /*
    * convert the ASCII representation to the actual
    * decimal value by subtracting '0' from each character.
    *
    * for example, the ASCII '9' is equivalent to 57 in decimal.
    * by subtracting '0' (or 48 in decimal) we get the desired
    * value.
    *
    * if we have already converted '9' to 9 and the next character
    * is '3', we must first multiply 9 by 10 and then convert '3'
    * to decimal and add it to the previous total yielding 93.
    *
    */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else
         return (IO_ERROR);

   *result = *result * sign;
   return (NO_ERROR);
}

get_float(f)
   float *f;
{
   char string[80];
   read_string(string);
   float_convert(string, f);
}

float_convert(ascii_val, result)
   char  *ascii_val;
   float *result;
{
   int    count;      /* # of digits to the right of the decimal point. */
   int    sign = 1;   /* -1 if negative */
   double pow10();    /* Turbo C function */
   float  power();    /* function returning a value raised to the power specified. */

   *result = 0.0;     /* value desired by the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;   /* get the next letter */

   /* check for a sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /*
    * first convert the numbers on the left of the decimal point.
    *
    * if the number is 33.141592 this loop will convert 33
    *
    * convert ASCII representation to the actual decimal
    * value by subtracting '0' from each character.
    *
    * for example, the ASCII '9' is equivalent to 57 in decimal.
    * by subtracting '0' (or 48 in decimal) we get the desired
    * value.
    *
    * if we have already converted '9' to 9 and the next character
    * is '3', we must first multiply 9 by 10 and then convert '3'
    * to decimal and add it to the previous total yielding 93.
    *
    */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else if (*ascii_val == '.')   /* start the fractional part */
         break;
      else
         return (IO_ERROR);

   /*
    * find number to the right of the decimal point.
    *
    * if the number is 33.141592 this portion will return 141592.
    *
    * by converting a character and then dividing it by 10
    * raised to the number of digits to the right of the
    * decimal place the digits are placed in the correct locations.
    *
    * 4 / power(10, 2) ==> 0.04
    *
    */
   if (*ascii_val != NULL2) {
      ascii_val++;   /* past decimal point */
      for (count = 1; *ascii_val != NULL2; count++, ascii_val++)
         /*************************************************
         *
         *  The following change was made 16 June 1987.
         *  For some reason the power function below
         *  was not working. Borland's Turbo C pow10
         *  was substituted.
         *
         *************************************************/
         if (is_digit(*ascii_val)){
            *result = *result + to_decimal(*ascii_val)/((float)(pow10(count)));
            /***********
            *result = *result + to_decimal(*ascii_val)/power(10.0,count);
            ************/
         }
         else
            return (IO_ERROR);
   }

   *result = *result * sign;   /* positive or negative value */
   return (NO_ERROR);
}

float power(value, n)
   float value;
   int   n;
{
   int   count;
   float result;

   if(n < 0)
      return(-1.0);
   result = 1;
   for(count=1; count<=n; count++){
      result = result * value;
   }
   return(result);
}

Listing 2 C Code to Implement Dempster's Rule of Combination

/*
 * This is the function that implements Dempster's rule
 * of combination.
 * vector1 & vector2 are belief vectors. vector1 will
 * hold the result of the combination.
 */

#define LENGTH_OF_BELIEF_VECTOR 8

combine_using_dempsters_rule(vector1, vector2)
   float vector1[LENGTH_OF_BELIEF_VECTOR],
         vector2[LENGTH_OF_BELIEF_VECTOR];
{
   float denominator,
         sum_vector[LENGTH_OF_BELIEF_VECTOR];
   int   a, i, place;

   /* set the sums to zero */
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      sum_vector[i] = 0.0;

   /* Now go through the intersection tableau.
    * Look for the intersection of non-zero beliefs
    * and save their products.
    */
   for(a=1; a<LENGTH_OF_BELIEF_VECTOR; a++){
      if(vector2[a] > 0.0){
         for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
            place = i & a;
            if(vector1[i] > 0.0)
               sum_vector[place] = (vector1[i] * vector2[a]) + sum_vector[place];
         }  /* ends loop over i */
      }  /* ends if vector2[a] > 0.0 */
   }  /* ends loop over a */

   denominator = 1.0 - sum_vector[0];
   for(i=1; i<LENGTH_OF_BELIEF_VECTOR; i++)
      vector1[i] = sum_vector[i]/denominator;
}  /* ends combine_using_dempsters_rule */

An Introduction To Speech Recognition

B.J. Gleason

B.J. Gleason is an Assistant Professor at Upsala College in New Jersey, and holds a master's degree in computer science. He is currently working on an Ed.D. in computer education at Nova University. Contact Mr. Gleason care of the Computer Science Department, Upsala College, East Orange, NJ 07019.

"Open the pod bay door, Hal."
"I'm sorry, Dave, I'm afraid I can't do that."
-- 2001, A Space Odyssey

"Shields Open."
-- Batman

One of the friendliest user interfaces should be voice. At some point in the future, we will be able to talk to our computer system and have it understand us. In the two examples above, the computer understood the spoken word. In Hal's case, he could also lip-read (an article on this is currently in progress). With the Batmobile, the speech recognition is more realistic, even if the car's automatic pilot is not.

In the above paragraph, I use the word "should" to describe voice as a friendly interface. It should be, but at this time, it isn't. Not for lack of trying, however. Speech recognition (SR) is not yet in the home although it has been around for about 20 years. SR is still unreliable and not easy to use. Yet.

Speech recognition is a wide field that must be broken down if it is to be understood. There are two portions to an SR system. The first is the recognition portion. This is almost easy. The person says something and the recognizer returns the words that were spoken.
The second part, the understanding of what was said, is much harder. The second portion falls into the area of Natural Language Processing. In the Batman example, the understanding portion is easy. In the 2001 example, it is harder. For example, Hal must not only know how to open the door, but why the door is to be opened. Hal understands this, and realizes that he must not open the door for Dave.

In this article, I will describe the first portion, the recognition procedure. It is beyond the present scope to describe Natural Language Processing. We will take a look at the current techniques used in SR and I will provide you with everything you need in order to experiment with SR on your own.

Intro To Speech Recognition

Most SR systems belong to one of two fundamental classes: speaker dependent or speaker independent. With speaker dependent systems, the user must first "train" the system to recognize his or her voice. During training the system displays a word, the user pronounces it, and the system saves the resulting voice pattern. Once a sample of all the required words is stored, the user may begin using the system. When the user speaks, the system will take the unknown voice pattern and compare it to the patterns stored in memory. The word associated with the closest matching pattern is returned as being the word the system believes the user spoke. This is the most common technique. Speaker dependent SR units are capable of 91% to 99% accuracy.

While the training is time consuming, this technique is economical. You can pick up one of these units for less than 100 dollars. If you are electronically inclined, you can build one for less than 50 dollars.

A speaker independent system requires no training. It should recognize words as soon as it is turned on. Speaker independence is much harder to accomplish. The voice pattern is broken down and analyzed for certain features -- features chosen to distinguish among the words to be recognized.
Speaker independent systems are very attractive, and will become more so in the future. At this time, however, most speaker independent units are expensive, achieve accuracies of only 85% to 95%, and often recognize fewer words as well. They need more processing power to do the analysis, require dedicated processors, and start in the 500 dollar range.

Radio Shack is now selling a speaker independent isolated word recognizer chip for about 10 dollars. Build an amplifier circuit for it and it will recognize nine different words. This particular chip is used in a number of voice activated toys.

Specifications

How many words do we need to recognize? The ultimate dream, of course, is the voice typewriter, which would have to be speaker independent and recognize 50,000 or so words. But many applications can get away with much less. A simple voice calculator, for example, would need to recognize only 10 digits, four operations, a decimal point, and an equal sign -- only 16 words. Several companies use voice inventory systems. These require the digits and a few commands, again around 16 or so words. Many applications need only a few words.

A small vocabulary has important advantages. Larger vocabularies require more memory and more processing time to find a match. With larger vocabularies, accuracy will typically decrease. Given the current state of the art, it would be best to limit the vocabulary to the smallest possible number of words.

Speech habits also affect recognition systems. Most humans talk in what is known as continuous speech. Most SR units depend upon isolated speech. Research has shown that it is very difficult to separate the words in continuous speech. The pauses between words in continuous speech are very short -- sometimes even shorter than the natural pauses that occur within words. The emphasis and accents common to continuous speech create additional problems.
There is typically a marked difference between the pronunciation of a sentence and that of a collection of isolated words.

Dictating to an isolated word recognizer takes some getting used to. You must pause, typically for one tenth of a second or more, between words. Until you get used to it, it can be frustrating.

The Hardware

To talk to the computer, we need a device to translate voice waves into binary values. Figure 1 shows a block diagram of a typical SR unit. The microphone is fed into a preamp circuit. The output of the preamp is fed into two bandpass filters. The lowband filter has a range of 150 Hz to 900 Hz. The highband filter has a range of 1 kHz to about 4 kHz. The output of each bandpass filter is a sine wave whose frequency falls within the range of that filter. Using these bandpass filters we can isolate the high and low frequencies of the utterance. More sophisticated SR units might have more bandpass filters to include midranges as well.

The output of the bandpass filters goes into a zero crossing detector (ZCD) and a rectifier/averager (R/A) circuit. The ZCD is a comparator that tests its input against a reference voltage. When the signal is above the reference voltage, the output is one; when it is below the reference voltage, the output is zero. The timing of the ZCD output will approximate the input frequency.

The R/A circuit converts the average RMS AC signal to its DC equivalent. This signal is then inverted and fed into a comparator set to slightly less than the reference voltage. With no speech, the normal output would be zero. When speech enters the system, the output of the comparator will go to one, indicating that voice input is present. These four outputs are fed into a parallel port and read by the computer.

Speaker Independent Systems

Now that we can get the speech into the computer system, we need to process it. There are several different techniques for speaker independent speech.
The technique I will describe is phonetic analysis. The sounds that we produce to form words can be broken down into several categories:

   Pure Voice Vowels (V):    a, e, i, o, u, uh, aa, ee, er, uu, ar, aw
   Nasal (N):                m, n, ng
   Voice Fricative (VF):     z, zh, v, dh
   Unvoiced Fricative (UF):  s, sh, f, th
   Plosive (P):              b, d, g, p, t, k, h
   Glide (G):                r, w, l, y

Our speaker independent board and program would first take the speech utterance and break it down into these categories. For example, vowel sounds are continuous and generate low frequency energy, whereas fricatives are continuous but generate high frequency energy. The software must look for the characteristics of each group in the input speech. It would then generate a sequence of these phonetic groupings and look up the word in a dictionary:

   0  VF-V-G-V      YES    G-V-UF
   1  G-V-N         NO     N-V
   2  P-V           START  UF-P-V-P
   3  UF-G-V        STOP   UF-P-V-P
   4  UF-V-G
   5  UF-V-VF
   6  UF-V-P-UF
   7  UF-V-VF-N
   8  V-P
   9  N-V-N

This small table starts to illustrate some of the problems with this technique. As we add words, correctly grouping the phonemes becomes a parsing nightmare. Notice that the phonetic makeup of START and STOP is the same! How can we tell the difference? We can't. Not easily anyway. Larger vocabularies will generate even more "collisions". In a 20,000 word dictionary, there are only about 2000 unique phonetic combinations. 10,000 of the words are five phonetic symbols or fewer.

Another technique, template averaging, reads in several copies of a word and then finds the major features of the word. This technique is a mix between speaker dependent and independent. In the training phase (similar to speaker dependent training), five people save the same set of words. The program finds the major features and uses these collective patterns as the model. The system is then speaker independent and will recognize almost any voice pronouncing the word.

Speaker Dependent Systems

Speaker dependent systems are reasonably straightforward. They all work on the same principle.
In the training phase, the sounds are stored in a reference template. In the speech phase, the unknown sound is compared against the known sounds for the closest match.

I have prepared a program and data files that will allow you to experiment with an isolated word, speaker dependent system. The program appears in Listing 1. The data files are available on the code disk for this issue.

How It Works

Whether we are training or analyzing speech, the same process goes on when a person talks into the microphone.

To capture sound and reproduce any waveform, we must sample it at no less than twice the highest frequency (the Nyquist rate). For speech we must capture and store 8000 four-byte samples per second. At this rate, we would need 32K of storage for a single word lasting one second.

Most SR units don't really sample at the Nyquist rate, since they don't reproduce the sound. SR units commonly capture only 100 four-byte samples per second. Assuming a maximum of 1.5 seconds of speech per utterance, each word will require no more than 600 bytes of storage.

The comparator circuit converts the output of the bandpass filters into square waves at a relative frequency. The software counts the number of square waves in both the low and the high bandpass filters during each 0.01 second interval, producing a one-byte approximation to the frequency in each band. These numbers, along with the values of high and low energy (four values all together), are stored in a "raw speech buffer". Up to 150 samples can be stored, enough for about 1.5 seconds of speech. The beginning of speech is indicated by at least 0.1 seconds of sound, and the end of speech by at least 0.1 seconds of silence.

The "raw" bytes produced by this process are still not very useful, as they may still be "distorted" by "time warping." If you say the word ONE twice and each time say it slightly slower, each repetition will appear to be a longer word.
To compensate for this variance, we time-normalize the word so that it can be matched against faster or slower pronunciations. Once the end of speech is detected, the buffer is broken down into 16 evenly spaced points, based on the total length of the word. At each of the 16 points, a value is taken from the high and low energy, as well as the two zero crossing detectors. The final result is a 64-byte template of the utterance. Time normalization forces all the templates to be the same size, making them easier to compare against one another.

A template constructed during the training phase would be stored in an array, indexed by the number of the word. A template constructed while analyzing the speech is instead compared to all the known templates in the array.

During the recognition phase, the speech input is reduced to a template which is compared against the stored templates. As the unknown template is compared against each stored template, a difference value -- the minimum difference -- is computed. During the matching process, each minimum difference is also compared against a rejection limit; if the program can't find a match with a minimum difference at least as small as the rejection limit, it will respond "unknown word" instead of making a wild guess. Thus, the rejection limit sets the degree of "confidence" required to announce a match. Higher rejection limits will result in more erroneous matches; lower limits will result in more "unknown word" responses.

Calculating all the minimum differences is time consuming. We can avoid some of these calculations by abandoning the calculation as soon as the partial result exceeds the rejection limit. We can get even better results by "remembering" the best minimum difference calculated so far and abandoning the calculation as soon as the partial result exceeds this limit.
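The early-abandonment idea can be sketched in a few lines of C. This is only an illustration, assuming templates of 16 samples by 4 bands as described above; the names template_distance and best_match are made up for the sketch and do not appear in Listing 1.

```c
#include <stdlib.h>   /* abs */

#define NUMSAMPLES 16
#define NUMBANDS    4

/* Sum of absolute differences between two templates.  The running
   total is abandoned as soon as it exceeds `limit`; in that case the
   value returned is only known to be greater than `limit`. */
int template_distance(int a[NUMSAMPLES][NUMBANDS],
                      int b[NUMSAMPLES][NUMBANDS],
                      int limit)
{
    int i, j, total = 0;

    for (i = 0; i < NUMSAMPLES; i++)
        for (j = 0; j < NUMBANDS; j++) {
            total += abs(a[i][j] - b[i][j]);
            if (total > limit)
                return total;          /* cannot win: stop early */
        }
    return total;
}

/* Index of the closest template whose difference does not exceed
   rejection_limit, or -1 for "unknown word".  `limit` starts at the
   rejection limit and shrinks to the best difference seen so far. */
int best_match(int unknown[NUMSAMPLES][NUMBANDS],
               int known[][NUMSAMPLES][NUMBANDS],
               int count, int rejection_limit)
{
    int p, d, best = -1, limit = rejection_limit;

    for (p = 0; p < count; p++) {
        d = template_distance(unknown, known[p], limit);
        if (d <= limit) {
            best = p;
            limit = d;
        }
    }
    return best;
}
```

With a tight limit, most comparisons stop after only a few of the 64 values have been examined, which is where the time saving comes from.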
Both optimizations can be folded into one test if, during the search, the "rejection limit" is adjusted dynamically so that it is either the programmed rejection limit or the best minimum difference seen during the current matching attempt, whichever is smaller.

We can enhance the rejection process by using the delta difference technique. As we calculate the differences of all the words to the unknown sample, we may get two words that are very close to each other but both under the rejection limit. The delta difference technique requires that the correct word "beat" the other words by a certain amount. If the delta difference is set to 10, and one template difference is 124 and the other is 129, both candidates will be rejected. In general, the greater the delta difference between the two best choices, the better.

Once a speaker dependent system is trained, it is tested. If a particular word generates a large number of errors, the user may re-train on that particular word.

Operation Considerations

To get the best results with any voice recognition circuit, remember a few simple guidelines: speak slowly and clearly; try to repeat the words as consistently as possible; operate it in a quiet environment; be careful when choosing your word lists, avoiding words that sound alike; and hold the microphone close to your lips.

The smaller the list of words, the greater the accuracy of the system. For example, for a game of Hi/Lo, only four words are needed: higher, lower, yes, and no. If you clear the other templates, recognition of these four words should approach 100 percent. One can also increase recognition by storing two templates (derived from different training sessions) for each word in the vocabulary. This will give you two chances to match the word.

The Program

The main program, called SPEECH.C, is the simplest version of the speech recognition algorithm. This program will work quickly with reasonable accuracy. The code was written in Turbo C v2.0 and has been written for clarity rather than speed or compactness.
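In keeping with that simplicity, Listing 1 leaves the rejection limit and delta difference tests for the reader to add. A minimal sketch of such a check, assuming an array holding one difference value per word; the function name accept_match and its parameters are hypothetical and not part of SPEECH.C:

```c
/* Hypothetical helper: find the word with the smallest difference,
   then apply the rejection-limit and delta-difference tests.
   Returns the word index, or -1 for "unknown word". */
int accept_match(const int diff[], int count,
                 int rejection_limit, int delta_limit)
{
    int p, best = -1, second = -1;

    for (p = 0; p < count; p++) {
        if (best < 0 || diff[p] < diff[best]) {
            second = best;             /* old winner becomes runner-up */
            best = p;
        } else if (second < 0 || diff[p] < diff[second]) {
            second = p;
        }
    }
    if (best < 0 || diff[best] > rejection_limit)
        return -1;                     /* nothing is close enough */
    if (second >= 0 && diff[second] - diff[best] < delta_limit)
        return -1;                     /* winner's margin is too small */
    return best;
}
```

With the figures used earlier (differences of 124 and 129 and a delta difference of 10), the margin is only 5, so the function reports an unknown word. Whether a margin exactly equal to the delta limit should pass is a design choice; this sketch accepts it.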
The program is menu-driven. It has options to load and train a set of voice (data) files. With the "Train" option you may vary the voice files individually. The "Debug" option (normally on) will display the waveforms of the words and indicate the elements being extracted for the templates. During the "Perform" choice, the debug option will also display the minimum difference table along with the delta difference. With the debug option off, the system will just display which word is recognized.

Data Files

The disk includes data files for the digits zero through nine and a telephone number (with area code). These files are taken from the raw speech buffer. They are included so that you can experiment with voice recognition algorithms without having the hardware.

The function getvoice() on my system will wait for a word to be spoken and place the raw speech into an array. On your system, it will open and read the data file and place the raw speech into an array. If the debugging feature is turned on, a character-based plot of the waveforms will be generated on the screen in a vertical fashion as they are read in. If this drawing is too crude for your tastes, you can import these files into a spreadsheet program, such as Lotus, and plot them.

You can use the digit files to train the SR system, and then use it to recognize the phone number.

Alternate Techniques

I include two sets of digit data, each set captured during a separate speech session. You can use this extra data to experiment with techniques that may increase the accuracy of the system:

Duplicate Entries -- Train the system with both sets, so you have twenty templates. During recognition, if the template matched is greater than nine, subtract 10 to get the "real" number.

Template Averaging -- Take two samples of each digit, average them together and use the result as the template.

Input Averaging -- Average each point on each band with the points directly ahead of and behind it.
This will help to smooth the band and eliminate noise that can creep into the system.

Amplitude Normalization -- On some systems, the volume of the sound can greatly affect the signals coming from the speech board. You can eliminate most of this effect with normalization. Calculate the average of an entire band, compare it to a standard amplitude (for example, 20) to obtain a scale factor, then multiply each element of that band by this factor. This will have a sliding effect on the band. Alternatively, you can just subtract the average from each element.

Interpolation -- When extracting the 16 samples from the raw speech buffer, I calculated the precise position, but used the nearest element. For a more exact representation of the waveform, a new sample should be interpolated from the adjacent two samples.

Number of Samples -- I have used 16 normalized samples. You can vary this number up or down and test the results. Larger values save more information but increase the processing time. Smaller values increase processing speed, but make for closer delta differences.

Speech Recognition With Natural Language

Many databases now have "natural" language interfaces so that you can ask questions such as "What is the highest mountain in New Jersey?" While many of these interfaces seem natural, most of them have a hidden structure behind them.

For example, in the Batmobile after the word "Shields", the system will probably accept "Open", "Close" and nothing. If only the word "Shields" is spoken, the shields will close by default. The SR unit would have the words Shields, Open, Close, Stop, and a wide (we assume) list of other words.

To help cut down on processing time, we can eliminate illogical words as we parse the sentence. "Shields Stop" would be illogical. When we identify the word "Shields", we compare the next utterance to "Open" and "Close" only. If no word is spoken, or the word is rejected, we close the shields.
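The "Shields" rule above can be sketched as follows. Everything here is illustrative: the word numbers, the function names, and the demo_distance stub, which merely stands in for a real comparison between the utterance and one stored template.

```c
/* Hypothetical word numbers for the Batmobile example. */
enum { W_SHIELDS, W_OPEN, W_CLOSE, W_STOP, W_COUNT };

/* Stand-in for the real template comparison: the difference between
   the current utterance and the stored template for `word`. */
int demo_diff[W_COUNT];
int demo_distance(int word) { return demo_diff[word]; }

typedef int (*distance_fn)(int word);

/* Compare the utterance against the listed candidates only.  Returns
   the closest candidate within the rejection limit, or -1. */
int match_among(const int candidates[], int n,
                distance_fn distance, int rejection_limit)
{
    int k, d, best = -1, best_d = rejection_limit;

    for (k = 0; k < n; k++) {
        d = distance(candidates[k]);
        if (d <= best_d) {
            best = candidates[k];
            best_d = d;
        }
    }
    return best;
}

/* After "Shields", only "Open" and "Close" make sense.  Silence or a
   rejected word closes the shields by default. */
int shields_command(distance_fn distance, int rejection_limit)
{
    static const int followers[] = { W_OPEN, W_CLOSE };
    int word = match_among(followers, 2, distance, rejection_limit);

    return (word < 0) ? W_CLOSE : word;
}
```

Because "Stop" and the rest of the vocabulary are never compared, the work per utterance depends on the number of logical followers rather than on the size of the whole word list.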
Natural language processing also helps with the "To, Too and Two" problem of similar sounding words. If we said "I want to go too", the NL processor would be able to sort out the correct usage based on the context in which each word was said.

Acknowledgements

The author would like to thank Larry Eckelkamp and Nuala Murphy for their help with this article.

Bibliography

Ainsworth, W. A., Mechanisms of Speech Recognition, Elmsford, NY: Pergamon Press, 1976.

Carter, John P., Electronically Hearing: Computer Speech Recognition, Indianapolis, IN: Howard W. Sams & Co., Inc., 1984.

Staugaard, Andrew C., Jr., Robotics and AI, Englewood Cliffs, NJ: Prentice-Hall, Inc., 1987.

Figure 1 Circuit Block Diagram

Listing 1

/* Speech Recognizer */
/* bj gleason, Upsala College, Computer Science Department */
/* East Orange, nj 07019 (201)-998-1037 */

#include <stdio.h>
#include <stdlib.h>   /* abs, atoi */
#include <ctype.h>    /* toupper */
#include <conio.h>

#define RAWBUFFERSIZE 150   /* 100 samples/second 1.5 seconds */
#define VOCABSIZE     10    /* 10 digits, 0 - 9 */
#define NUMSAMPLES    16    /* Number of samples to extract */
#define NUMBANDS      4     /* High, Low Freq, High, Low Energy */
#define BIGNUM        32767 /* large number for min diff. calc */

int rawspeech[RAWBUFFERSIZE][NUMBANDS];         /* to hold raw speech */
int index[VOCABSIZE];                           /* indicate if digit trained */
int template[VOCABSIZE][NUMSAMPLES][NUMBANDS];  /* known templates */
int unknown[NUMSAMPLES][NUMBANDS];              /* unknown voice template */
int min_diff[VOCABSIZE];                        /* min diff. each digit */
int sam_size;                                   /* current sample size */
int debug;                                      /* show debugging info */

/* getspeech will read in a file from disk. The length in samples */
/* will be returned. The rawspeech buffer will be modified.
*/
int getspeech()
{
   FILE *fptr;
   int  i,j;
   char fname[80];

   if (debug) printf("\nReading in Speech");
   printf("\nEnter name of file?");
   gets(fname);
   if ((fptr=fopen(fname,"rt"))==NULL) {
      printf("\nCan't find file %s",fname);
      return(0);
   }
   else {
      /* i < RAWBUFFERSIZE keeps the read inside the buffer */
      for(i=0;i < RAWBUFFERSIZE;i++)
         for(j=0;j < NUMBANDS;j++) {
            if ((fscanf(fptr,"%i",&rawspeech[i][j]))==EOF) {
               fclose(fptr);
               return(i);
            }
         }
      /* buffer filled before end of file was reached */
      fclose(fptr);
      return(RAWBUFFERSIZE);
   }
}

int plot_it()
{
   int i,j,x,y;

   printf("\n\n");
   for (i=0;i < sam_size;i++) {
      for (j=NUMBANDS-1; j >= 0; j--) {
         x=rawspeech[i][j]+(j*2);
         gotoxy(x,wherey());
         putchar(j+48);
      }
      printf("\n");
   }
}

/* the closest match routine compares the unknown template with */
/* known templates. It builds a minimum difference list that is */
/* the difference between unknown and each known. We then scan  */
/* list to find the closest match.                              */
int closest_match()
{
   int p,i,j;
   int low, next_low,digit,next_digit;

   if (debug) printf("\nFinding Closest Match");
   for (p = 0; p < VOCABSIZE; p++)
      if (index[p] != 0) {
         min_diff[p] = 0;
         for(i = 0; i < NUMSAMPLES; i++)
            for(j = 0; j < NUMBANDS; j++)
               /* for each digit, find the absolute difference */
               /* between known and unknown templates */
               min_diff[p] = min_diff[p] + abs(unknown[i][j] - template[p][i][j]);
      }
      else {
         min_diff[p]=BIGNUM;   /* put in a big number if digit not */
      }                        /* trained. */

   /* min_diff now has the difference for each template. Search */
   /* to find the smallest difference. This will be our digit.  */
   /* Find the next lowest match to calculate the delta diff.
   */
   digit = -1;
   next_digit = -1;
   low = BIGNUM;
   next_low = BIGNUM;
   if (debug) printf("\nTP# Diff Low Digit");
   for (p = 0; p < VOCABSIZE; p++) {
      if(min_diff[p] < low) {
         next_low = low;
         next_digit = digit;
         digit = p;
         low = min_diff[p];
      }
      else if (min_diff[p] < next_low) {
         /* not the best, but closer than the runner-up so far */
         next_low = min_diff[p];
         next_digit = p;
      }
      if (debug) printf("\n%3i %5i %5i %2i",p,min_diff[p],low,digit);
   }
   if (debug == 1) {
      printf("\nMinimum Difference was %i, Digit is %i",low, digit);
      printf("\nNext Closest Diff was %i, Digit is %i",next_low,next_digit);
      printf("\nWith the delta difference of %i",next_low-low);
   }
   /* it would be right here where you would add the code    */
   /* to set a rejection limit or a delta difference limit.  */
   /* If the digit is rejected, send back error, such as -1. */
   return(digit);
}

/* Extract template will extract a template from the raw */
/* speech buffer. This is to reduce the size of the      */
/* template and to eliminate time warping.               */
/* the rate is kept in floating point to prevent truncation */
/* errors. */
int extract_template()
{
   int i,j,p;
   float rate,x;

   if (debug) printf("\nExtracting Template");
   rate = (float) sam_size / NUMSAMPLES;
   p = 0;
   if (debug) {
      printf("\nExtracting %i elements from Raw Speech", NUMSAMPLES);
      printf("\nTake every %f element", rate);
      printf("\n\n UN RS");
   }
   /* p is bounded as well, so rounding in x cannot overrun unknown[] */
   for (x = 0; x < sam_size && p < NUMSAMPLES; x = x + rate) {
      for (j = 0; j < NUMBANDS; j++)
         unknown[p][j] = rawspeech[(int)x][j];
      if (debug) printf("\n%3i %3i",p,(int)x);
      p++;
   }
}

/* During the training phase, this will take the extracted template */
/* and store it in the array of known templates. */
int store_template(int position)
{
   int i,j;

   if (debug) printf("\nStoring template at position %i",position);
   for (i = 0; i < NUMSAMPLES ; i++)
      for (j = 0; j < NUMBANDS; j++) {
         template[position][i][j] = unknown[i][j];
      }
}

/* Perform - Get the speech, extract an unknown template, compare */
/* against the rest, and print the resulting digit.
*/
int perform()
{
   int digit;

   sam_size = getspeech();
   if (debug) plot_it();
   if (debug) printf("\nSize of Sample = %i",sam_size);
   extract_template();   /* break raw buffer up and */
                         /* place into unknown template */
   digit = closest_match();
   printf("\nDigit spoken was %i",digit);
}

/* Training - Get the speech, extract an unknown template, */
/* find from the user what digit it was, then store it in  */
/* the known template array. */
int train()
{
   char ans[10];
   int digit;

   sam_size = getspeech();
   if (debug) plot_it();
   if (debug) printf("\nSize of Sample = %i",sam_size);
   printf("\nEnter the digit spoken ?");
   gets(ans);
   digit = atoi(ans);
   index[digit] = 1;        /* indicate this digit is trained */
   extract_template();      /* break raw buffer up and */
                            /* place into unknown template */
   store_template(digit);   /* store the template */
}

/* Eztrain - This is to quickly load in files a0 - a9 */
int eztrain(char fname[80], int digit)
{
   FILE *fptr;
   int i,j;

   if ((fptr=fopen(fname,"rt"))!=NULL) {
      sam_size = 0;
      printf("\nReading file %s", fname);
      /* i < RAWBUFFERSIZE keeps the read inside the buffer */
      for(i=0;i < RAWBUFFERSIZE;i++)
         for(j=0;j < NUMBANDS;j++)
            if ((fscanf(fptr,"%i",&rawspeech[i][j]))!=EOF)
               sam_size = i;
      fclose(fptr);
      if (debug) plot_it();
      if (debug) printf("\nSize of Sample = %i",sam_size);
      index[digit] = 1;
      extract_template();
      store_template(digit);
   }
}

main()
{
   int i;
   char ans[80];
   char choice;

   /* clear the training index... nothing has been entered */
   for (i=0; i<VOCABSIZE; i++)
      index[i] = 0;
   printf("\nWelcome to Speech Recognition Demo, Version 1.0\n");
   debug = 1;   /* display debugging information */
   do {
      printf("\n\nTrain, Perform, Load A or B, Debug ");
      if (debug) printf("Off"); else printf("On");
      printf(", or Quit? \
(T/P/A/B/D/Q)"); gets(ans); choice = toupper(ans[0]); if (choice == 'A') { eztrain["a0",0); eztrain["a1",1); eztrain["a2",2); eztrain["a3",3); eztrain["a4",4); eztrain["a5",5); eztrain("a6",6); eztrain("a7",7); eztrain("a8",8); eztrain("a9",9); } if (choice == 'B') { eztrain("b0",0); eztrain("b1",1); eztrain("b2",2); eztrain("b3",3); eztrain("b4",4); eztrain("b5",5); eztrain("b6",6); eztrain("b7",7); eztrain("b8",8); eztrain("b9",9); } if (choice == 'D') { debug = !debug; printf("\n Debugging Trace "); if (debug) printf("On"); else printf("Off"); } if (choice == 'T') train(); if (choice == 'P') perform(); } while(choice != 'Q'); printf("\n\nAll done."); } The World Of Command Line Options Scott Maley Scott Maley is a member of the technical staff at The Analytic Science Corporation (TASC). He has more than fifteen years experience in areas of software engineering ranging from Space Shuttle flight simulation to Cobol maintenance. Visual, iconic or graphic interfaces reduce complexity for the user by relegating details to another level. Complexity is neither created or destroyed -- it only changes its appearance or location and distribution. Thus, beneath the surface of many visual interfaces the various graphical tools exchange a great deal of information, often by means of command line options, where the programmer, rather than the user, must deal with the complexity. This article describes a command-line option-handling package that pushes some of the command line complexity down a level, reducing the amount of complexity with which the programmer must deal. This article is not intended to help you decide when to use command line options. If you have decided to use them, it may help you decide how. I have assumed that you know how arguments (which include options) are passed to a C program. Even if that is not completely clear, you must at least understand that a pointer is a way to refer to an object, and not the object itself. 
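As a refresher on that last point, here is a minimal sketch of how a program sees its arguments. The helper names count_switches and swap_args are hypothetical and are not part of the cmd_opts package; they only illustrate that argv is an array of pointers, so arguments can be examined or rearranged without touching the characters they point to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the arguments that look like option
 * switches (a '-' followed by at least one more character).
 * argv[0] is the program name and is skipped. */
int count_switches(int argc, char *argv[])
{
    int i, n = 0;

    assert(argc >= 0 && argv != NULL);
    for (i = 1; i < argc; i++)
        if (argv[i][0] == '-' && argv[i][1] != '\0')
            n++;
    return n;
}

/* Hypothetical helper: exchanging two arguments moves only the
 * pointers; the strings themselves stay where they were. */
void swap_args(char *argv[], int i, int j)
{
    char *tmp = argv[i];

    argv[i] = argv[j];
    argv[j] = tmp;
}
```

Rearranging pointers in this way, rather than copying strings, is what lets an option handler sort a command line in place.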
Overview

Options are used in two fundamentally different ways, distinguished by their position (in)dependence. An option may indicate that something is desired for all arguments (position independent), or for all following arguments (position dependent). No option can be both, because we can't distinguish which is intended unless context is expanded to more than one option at once.

The function cmd_opts presented here assumes that it must deal with only position independent options. It associates arguments with each recognized option (basically a sorting process) and leaves all unrecognized options, including position dependent options, undisturbed. Thus, the cmd_opts function handles position independent options, and the programmer continues to handle position dependent options.

cmd_opts accepts the same number of arguments as getopt (provided with UNIX System V). The first and second arguments are identical to those passed to getopt. The third encapsulates expanded information in a nil-terminated array of options structures, each option isolated in a separate structure.

cmd_opts works backwards through the command line, grouping arguments for options as it goes. By working backwards, the pointer to each option's array of arguments ends up right where we want it, pointing to the first associated argument; cmd_opts leaves any options it doesn't understand where it found them (relatively speaking) and returns an error count to warn us. See Listing 3.

The Options Structure

Listing 1 presents the elements of the structure options. The first element, options.s, is the character that will be used on the command line for this option (e.g. s in -s). The second element, options.arg_flg, indicates the minimum number of instances of this option we expect on the command line. The third element, poptv, controls how arguments are associated with options. If poptv is set to NULL, no arguments can be associated with the option. Otherwise cmd_opts will point poptv to the first such argument.
Note that the ADDRESS (&) of the array of pointers (e.g. barg) must be placed in options.poptv, so that it may be set to point to the first of any arguments for the option.

The arg_flg for each option is used to return the number of valid instances of the option that were encountered. It may also be directly used as a flag, since a nonzero value is considered TRUE in C.

The second (arg_flg) and third (poptv) elements of the options structure interact to determine how an option is handled by cmd_opts. Listing 2 presents examples of how they interact, where:

a    optional, no associated arguments
b    optional, arguments expected
c    required, no associated arguments (for completeness)
d    required, arguments expected

You can gain a better understanding of the interactions by experimenting with various combinations (compile and link tcmdopts.c with cmd_opts.c -- both include cmd_opts.h).

Conclusions

cmd_opts is much easier to use than the widely used getopt package provided with UNIX System V (source code freely available). While getopt accepts a list of option switch characters and has a means of specifying which require arguments, it has a number of shortcomings. After all the work it does to isolate option switches and associated arguments, getopt requires you to perform similar work. Once getopt returns with a switch character you must determine which it was, then associate any arguments. Worse, getopt passes some of the information via globals (e.g. optarg).

cmd_opts handles the burden of associating arguments with options for those which are position independent. Yet, it leaves any unrecognized options where they were, so that we may handle position dependent options. Thus, we may typically dispense (at least for position independent options) with the switch statement that is often used to associate arguments with options when using the getopt function.
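For comparison, the conventional getopt loop referred to above looks roughly like the following sketch. It assumes the POSIX getopt declared in <unistd.h>, which may differ in detail from the System V source of the time; the settings structure, the function name parse_with_getopt, and the option letters a and b are all hypothetical choices made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>              /* getopt, optarg, optind, opterr */

/* Hypothetical container for what the loop collects. */
struct settings {
    int a_count;                 /* how many times -a appeared */
    const char *b_value;         /* value of the last -b, or NULL */
    int errors;                  /* switches getopt rejected */
};

/* Hypothetical wrapper: the caller's switch statement decides what
 * each returned letter means and fetches its value from the global
 * optarg. */
void parse_with_getopt(int argc, char *argv[], struct settings *out)
{
    int c;

    assert(out != NULL);
    out->a_count = 0;
    out->b_value = NULL;
    out->errors = 0;
    optind = 1;                  /* start scanning after argv[0] */
    opterr = 0;                  /* keep getopt from printing messages */
    while ((c = getopt(argc, argv, "ab:")) != -1) {
        switch (c) {
        case 'a': out->a_count++;        break;
        case 'b': out->b_value = optarg; break;
        default:  out->errors++;         break;  /* '?' from getopt */
        }
    }
}
```

The switch statement and the reliance on optarg and optind are the per-program work and the globals that the conclusions above refer to.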
No globals are used by cmd_opts, so possible side-effects have been minimized, and the package is more usable in shared code libraries. Finally, encapsulating the information associated with each option in a structure makes cmd_opts easier to understand and use than getopt.

Listing 1

/* cmd_opts.h, c\include
 * structure definition for command line options
 */
struct options {
    char s;          /* valid switch letter */
    int arg_flg;     /* flag to indicate an argument is required */
    char ***poptv;   /* pointer to option's value vector */
} ;

Listing 2

/* tcmdopts.c, c\lib\test
 * Test cmd_options routine
 */
#include <stdio.h>
#include "cmd_opts.h"

main(argc, argv)
int argc;
char *argv[];
{
    int cmd_errs, i;
    static char **barg, **darg;
    static struct options sw[] = {'a',0,NULL,
                                  'b',0,&barg,
                                  'c',1,NULL,   /* generally useless */
                                  'd',1,&darg,
                                   0, 0,NULL};

    cmd_errs= cmd_options( & argc, argv, sw);
    if (sw[0].arg_flg > 0)
        printf("%d -a\n",sw[0].arg_flg);
    for (i= 0; i < sw[1].arg_flg; i++)
        printf("-b %s\n", barg[i]);
    if (sw[2].arg_flg > 0)
        printf("%d -c\n",sw[2].arg_flg);
    for (i= 0; i < sw[3].arg_flg; i++)
        printf("-d %s\n", darg[i]);
    puts("Unclaimed:");
    for (i= 1; i < argc; i++)    /* argv[0] is still the command */
        printf(" %s",argv[i]);
    puts("\n");
    if (cmd_errs != 0) {
        printf("\7\ntcmdopts [-a] [-b<value>] -c -d<value> ...\n");
        printf("\n%d Command line options invalid\n", -cmd_errs);
        exit(1);
    }
}

Listing 3

/* cmd_opts.c, c\lib\src, (c) 1989 Scott D. Maley
   May be freely used, as long as copyright notice is preserved

   cmd_options(argc, argv, option)
   int *argc;               -- pointer to command line arg count
   char *argv[];            -- pointer to array of pointers to
                               command line arguments
   struct options option[]; -- structure array defining valid options

   This is a function to process command line options (or switches).
   The full set of command line arguments is passed to the routine
   via argc and argv.
   Every option switch encountered that is a valid match for a switch
   specified in the option array is counted, removed from argv, and the
   pointer to its associated value (if any) is moved to the optv array.
   A count of switches which are not valid matches of any option is
   returned, and those switches are left in argv.

   A switch's value may be contiguous with it, or be separated from it
   by white-space (e.g. -svalue, -s value). White-space is commonly
   blanks and tabs, but may also include commas in some C
   implementations. This routine doesn't care. The C runtime
   initialization routine which runs before main() is entered parses
   the command line into tokens (which the elements of argv point to),
   based on its definition of white-space.

   The structure "options" is used to define what this routine will
   parse:

   struct options {
       char s;         -- The option (switch) letter
       int arg_flg;    -- indicates if an arg is required
       char ***poptv;  -- pointer to option value vector
                       -- NULL, if none expected
   } ;

   The third argument to this routine, option, is an array of the
   options structures. The end of this array is signaled with s == 0.

   This routine returns:
     0 - if all switches encountered were valid options.
    -n - Negative of the count of invalid (e.g. no value followed the
         switch when one was expected, or a value was contiguous with
         the switch, but none was expected) switches encountered.
         n also includes a count of switches that were expected, but
         not encountered in argv.
   It also sets arg_flg to indicate how many of each switch were
   encountered.
   Sample use:
   ----------
   #include <stdio.h>
   #include "cmd_opts.h"

   main(argc, argv)
   int argc;
   char *argv [];
   {
       char s, **farg, **marg;
       static struct options sw[] =
           {'a', 0, NULL,   -- optional, no value
            'f', 0, &farg,  -- optional, w/ value
            'm', 1, &marg,  -- required, w/ value
             0, 0, NULL};

       if (cmd_options( & argc, argv, sw) < 0) {
           --- error, handle it here ---
       }
       ---
       --- continue with rest of program
       ---
   }

 *-- History:
 * 30 Jan 89 SDM (TASC) No need to calloc optv, we
 *           can work entirely within argv (plus
 *           a temp pointer).
 * 27 Jan 89 SDM (TASC) Handle multiple instances
 *           of a switch.
 *           Retain everything not
 *           specified in opts in argv, and set
 *           argc accordingly.
 * 20 Jan 89 S.D. Maley (TASC) Initial implementation.
 *-- End History
 */
#include <stdio.h>
#include "cmd_opts.h"

#define EOS '\0'
#define MoveOptFromArg(optv,argv,i,argc) \
    {char *temp;\
     temp= argv[i];\
     RemoveArg(argv,i,argc);\
     (optv)--;\
     optv[0]= temp;\
    }
#define RemoveArg(argv,i,argc) \
    {int j;\
     (argc)--;\
     for(j=i;j<argc;j++) argv[j]=argv[j+1];\
    }
#define SWFLG '-'
#define SwChr *(argv[i]+1)
#define SwMatch (*argv[i] == SWFLG && opts[j].s == SwChr)
#define SwValContig (*(argv[i]+2) != EOS)
#define SwValNext (i+1 < *argc && *argv[i+1] != SWFLG)

cmd_options(argc, argv, opts)
int *argc;
char *argv[];
struct options opts[];
{
    int i,j, njth, stat;
    char **optv;              /* equivalent to: *optv[] */

    optv = argv + *argc;      /* work from back to front */
    /*-- Transfer options from argv to optv
     * -- and check against expectations */
    stat = 0;
    for (j = 0; opts[j].s != 0; j++) {
        njth = 0;
        for (i = *argc - 1; i > 0; i--) {
            /* back to front, we build optv */
            if (SwMatch) {
                if (opts[j].poptv == NULL) {
                    /* no arg value desired */
                    if (SwValContig)
                        continue;     /* next i */
                    else
                        RemoveArg(argv,i,*argc);
                }
                else {
                    /* A value is desired */
                    if (SwValContig) {
                        argv[i] += 2; /* past "-'opt_char'" */
                        MoveOptFromArg(optv, argv,i,*argc);
                    }
                    else if (SwValNext) {
                        /*-- pick up value from next arg */
                        RemoveArg(argv,i,*argc);
                        MoveOptFromArg(optv,
                                       argv,i,*argc);
                    }
                    else
                        continue;     /* next i */
                }
                njth++;               /* only count valid switches */
            } /* if SwMatch */
        } /* for i */
        if (opts[j].poptv != NULL)
            *opts[j].poptv= optv;     /* point to option value vector */
        if (opts[j].arg_flg > njth )
            stat -= opts[j].arg_flg - njth ;  /* not enough */
        opts[j].arg_flg = njth;
    } /* for j */
    for (i= 1; i < *argc; i++)
        if (*argv[i] == SWFLG)
            stat--;                   /* a switch we couldn't handle */
    return(stat);
}

Multitasking With Lightweight Threads

Gregory Colvin

Trained in cognitive psychology, Dr. Colvin first learned to program in 1972, in BASIC on a PDP-8. He later had the distinction of being the first Cornell University graduate student to purchase an Apple II with student loan money, and has been happily hacking microcomputers ever since. He has been programming professionally in C since 1983. He welcomes comments and queries at 680 Hartford, Boulder, CO 80303 (303) 499-7254.

Often, and against my better judgment, I contract to create applications under operating systems which do not support multitasking. Lightweight threads can sometimes be used to circumvent this limitation; both Microsoft's Windows and Apple's Multifinder use lightweight threads to retrofit multitasking facilities to single tasking operating systems.

This article presents the ANSI C source for a multitasking kernel based on lightweight threads. I have tried to make this kernel as small, fast, simple, and portable as possible. I have successfully used the predecessor of this kernel to implement a real-time graphics display system and to provide background query processing for a database application.

Threads

A thread of computer execution consists of at least three elements: a memory segment containing executable machine instructions, an instruction pointer register which indicates the next instruction to execute, and a data segment for variable allocation.
Most computers also support function calls by providing a stack memory segment; a frame pointer register, which points to the current stack frame; and a stack pointer register, which points to the next available space for a stack frame. A stack frame contains the arguments and local variables for a function, and the instruction pointer and frame pointer of the function that called it (see Figure 1).

A function call typically creates a new stack frame by pushing the function arguments and the current instruction and frame pointer registers on the stack, moving the current stack pointer to the frame pointer register, decreasing the stack pointer enough to leave room for local variables, and moving the address of the called function to the instruction pointer. Thus nested function calls result in a linked list of stack frames on the stack segment, which is traversed as functions return.

Context Switching

A multitasking operating system can execute several threads "simultaneously" on one machine. Since most machines can only execute a single instruction at a time, only one thread is really executing at any one time, but multiple threads appear to run simultaneously because the O.S. performs context switches between threads at frequent intervals. At each context switch, the O.S. saves the contents of the machine registers for the current thread and restores the state of the registers for another thread. Each thread has its own code, data, and stack segments, so that threads cannot ordinarily interfere with one another.

C, unlike Ada or Modula-2, is single threaded, so that an executing C program has one instruction pointer and one set of memory segments. However, the ANSI Standard C library does provide a pair of functions, setjmp() and longjmp(), for saving and restoring the contents of the machine registers. I have used (some might say abused) these functions to implement multiple threads within a single C program.
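As a minimal sketch of the save-and-restore behavior this pair of functions provides, the fragment below jumps out of several nested calls in one step. It is not part of the thread kernel; the names recover, last_code, descend and run_guarded are hypothetical, and the error code travels in an ordinary static variable so that setjmp() appears only in the controlling expression of an if.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recover;  /* register state saved by setjmp() */
static int last_code;    /* code handed over by the failing function */

/* Hypothetical worker: recurses depth levels down, then "fails" by
 * jumping straight back to the setjmp() in run_guarded(). */
static void descend(int depth, int code)
{
    if (depth <= 0) {
        last_code = code;
        longjmp(recover, 1);      /* does not return */
    }
    descend(depth - 1, code);
}

/* Returns 0 when nothing went wrong, otherwise the failure code. */
int run_guarded(int depth, int code)
{
    assert(depth >= 0);
    if (setjmp(recover) != 0)     /* nonzero: we arrived via longjmp() */
        return last_code;
    if (code != 0)
        descend(depth, code);     /* leaves through longjmp() */
    return 0;                     /* normal completion */
}
```

None of the intermediate calls to descend() ever returns; the saved registers put execution back in run_guarded() directly, which is the property the thread kernel relies on.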
The setjmp() and longjmp() calls are unusual, in that the longjmp() call, when successful, never returns to the function that calls it, whereas the setjmp() call can return to the same function any number of times. The first call to setjmp(jmp_buf) saves the current state of the machine registers in the jmp_buf structure and returns 0. A subsequent call to longjmp(jmp_buf,int) restores the saved registers, causing the nonzero int specified in the calling argument to be returned by setjmp(jmp_buf) to the function that called it.

Usually longjmp() is used to abort from errors in deeply nested functions without actually returning from all the functions. An error handler is installed with a C statement like:

if (error=setjmp(buf)) e(error); else f();

The call to setjmp() returns false, so that function f() is called. Errors in f() or any functions called within f() can then be handled by calling longjmp(buf,error), which causes setjmp() to return error, so that e(error) is called. To use these functions to support switching among multiple threads requires several jmp_buf structures -- at least one for each thread of execution.

Implementation

To implement lightweight threads, the single thread of execution of a normal C program must be divided into independent tasks. Since a C program begins its life with a single stack segment, this segment must be divided into pieces, one for each thread. The ThInit(int n,int size) function does this by first calling setjmp() to mark an entry point for a thread, then calling itself until enough room for a thread (size bytes) has been used on the stack. It repeats this recursive process until the desired number of threads (n) has been created. The saved machine registers for each thread are kept in Threads, a global array of thread structures, one structure for each thread.
Having thus divided up the stack, new threads can be created by ThNew(void (*root) (int)), which simply saves the address of the root() function for the new thread in Root, then does a longjmp() to restore the registers set by ThInit(). The reactivated ThInit() calls (*Root) (ThCurr), and the new thread is underway. ThNew() returns the ID of the new thread, which is a non-zero index into the Threads table.

The threads created in this way are lightweight in two senses. First, they share the same code and data segments, and are thus not protected from each other by the operating system's memory management. Second, they are treated as one process by the operating system, and thus are not automatically switched. Thus, the ThJump(int ID) function is needed so that a thread can cause a context switch to another thread, specified by ID. ThJump() calls setjmp() to save the state of the current thread, then calls longjmp() to restore the saved state of the destination thread. The ThJump() function does not return until another thread jumps back to it, in which case it returns the thread ID of the jumper.

Figure 2 shows a picture of the stack and the global Threads array while running two threads. Thread 1 is the initial thread (that is, the thread that called ThInit()), and Thread 2 is the currently running thread, which has made several function calls since it was jumped to by Thread 1.

Communication And Deadlock

Since lightweight threads share the same data segment they can communicate easily through global variables and shared memory buffers. On this basis, you may implement semaphores, pipes, messages, or any other communication discipline. Whatever communication method you choose, you must "beware the Jabberwock" of deadlock.
For instance, a simple message passing scheme (not a very efficient one) can be implemented by placing the address of a message into an array of pointers, one for each thread, initialized to zero:

Message = (char **) calloc(N_Threads, sizeof(char *));

and then jumping to the message destination:

void msg_send(char *message,int destination)
{
  int id = ThId();
  Message[id] = message;
  do ThJump(destination); while (Message[id]);
}

The ThId() macro is used to get the ID of the current thread. The sending thread waits in a loop until the message is received. The destination thread can then receive a message with:

char *msg_recv()
{
  int id;
  char *m;
  do id = ThJump(ThNext()); while (!Message[id]);
  m = Message[id];
  Message[id] = 0;
  return m;
}

If no thread ever sends a message then the receiving thread will never leave the loop, a condition called starvation. This may not be a problem, since if no message is sent there may be nothing for the waiting thread to do. However, consider the following code, which waits for a message from a particular thread:

char *msg_wait(int godot)
{
  int id;
  char *m;
  while (!Message[godot]) ThJump(godot);
  m = Message[godot];
  Message[godot] = 0;
  return m;
}

If Thread 1 is in the loop, waiting for a message from Thread 2, and Thread 2 is also in the loop, waiting for a message from Thread 1, then neither thread will ever get out of the loop, and no other threads will get to run. This is a deadlock.

Preventing Deadlock

In general, deadlock can occur whenever threads must block to wait for exclusive use of a particular resource (memory buffer, screen, keyboard, disk controller, printer, etc.) that is in use by another thread. In the examples above I have implemented blocking by busy waiting in a loop. It would be better to add a flag to the thread structure, so that blocking could be handled by the kernel. It would then be possible to centralize deadlock control within the kernel.
The easiest way to prevent deadlock is simply not to share resources. For instance, one thread might handle all file I/O, another all printer output. Another easy solution is to arrange resources and threads in a hierarchy, so that there is never a conflict between threads. For instance, one thread might buffer keyboard input, while a second thread reads the keyboard buffer, performs computations, and buffers window output, and a third thread reads the window buffers and displays them on the screen.

Other cases are harder, and the solutions tend to be task specific. For example, database programs can prevent deadlock, and ensure data integrity, with the concept of a transaction. A thread needing resources, such as write access to a set of data records, sets out to acquire them, one by one. If all the needed resources are acquired, the transaction succeeds and releases its resources for the next transaction. If any resource cannot be acquired, the transaction aborts, releasing all its resources. The thread then waits for a while, and tries again. Thus, no thread is ever holding a resource while waiting for another resource.

For other examples of preventing deadlock, be sure to check the relevant literature for the tasks you are implementing. A good general discussion, with C source code and further references, can be found in Andrew Tanenbaum's Operating Systems: Design and Implementation (Prentice Hall, 1987). If you fail to ensure that your application is deadlock free you can look forward to mysterious system hangs and other evidence of Murphy's Law.

Caveats

I have tested the code presented here with Microsoft C 5.0 on my 386 clone and with MPW C 3.0 on my Macintosh SE. On my clone the kernel compiles to under one K of code, and executes over 80,000 jumps per second. (Ed: The code is available from the CUG Library; see New Releases.) Be sure to design and test very carefully if you implement a similar kernel.
Although the setjmp() and longjmp() functions are portable, this implementation depends on non-portable details of stack implementation. Be especially careful not to overrun the stack areas set up for each thread. I have provided a simple ThProbe() macro that exits with a message if an overrun is detected, but I have succeeded in crashing threads with printf() between calls to ThProbe(). For a more powerful approach to ensuring stack integrity, see the article "A Stack Checking Function" by Eric White in The C Users Journal (Volume 7, Number 3, April 1989). If you exercise proper care, you will find the concept of lightweight threads to be a useful addition to your tool kit.

Figure 1

Figure 2

Listing 1

/*********** THREAD.C COPYRIGHT 1989 GREGORY COLVIN ************
This program may be distributed free with this copyright notice.
***************************************************************/
#include <stdlib.h>                 /* calloc, free, exit */
#include "thread.h"

thread *Threads;                    /* table of threads */
int ThCurr=1;                       /* current executing thread */
static int N_Threads;               /* number of threads in table */
static int Free=2;                  /* first free thread */
static int Next=1;                  /* next runnable thread */
static char *Stack;                 /* bottom of stack for init */
static void (*Root)(void);          /* for temporary use in exec */

thread *ThInit(int n,int size)      /* create n size byte threads */
{
  int i;
  if (!N_Threads) {                 /* if just entered */
    if (n < 2) return 0;            /* error, n too small */
    Threads=                        /* create table */
      (thread *)calloc(n,sizeof(thread));
    if (!Threads) return Threads;   /* error, bad calloc */
    Threads--;                      /* will index from 1 */
    N_Threads= n, n= 1;
    if (setjmp(Threads[1].exit))    /* set exit point */
      exit(0);                      /* exit init thread */
  } else if (!Stack) {              /* start new thread */
    Stack= (char *)&size;           /* at top of stack */
    if (setjmp(Threads[n].exec)) {  /* set entry point */
      if (!setjmp(Threads[ThCurr].exit))/* set exit point */
        (*Root)();                  /* call root function */
      Threads[ThCurr].free= Free;   /* come here on exit */
      Free= ThCurr;                 /* put on free list
                                    */
      Next= Threads[Free].next;     /* take off run list */
      for (i=1; i <= N_Threads; i++) { /* clean up table */
        if (Threads[i].parent == Free) /* if abandoned child */
          Threads[i].parent = 1;    /* adopt by init */
        if (Threads[i].next == Free) /* patch run list */
          Threads[i].next= Threads[Free].next;
      }
      ThCurr= Threads[Free].parent; /* will jump to parent */
      Threads[Free].parent= 0;      /* Free is parentless */
      longjmp(Threads[ThCurr].jump,Free);
    }
  }
  if (Stack - (char *)&size < size) /* if not enough stack */
    ThInit(n,size);                 /* push more stack */
  else {                            /* done with a thread */
    Threads[n].stack =(char *)&size;/* save top of stack */
    Stack= 0 ;                      /* start new thread */
    if (n < N_Threads) {            /* if not done */
      Threads[n].free= n + 1;       /* link to free list */
      ThInit(++n,size);             /* push more stack */
    } else
      Threads[n].free = 0;          /* at end of free list */
  }
  return Threads;                   /* done: return table */
}

void ThFree()                       /* free the thread table */
{
  free (Threads+1);                 /* goodbye */
  Threads= 0, N_Threads= 0;         /* can init again */
}

int ThNew(void (*root)(void))       /* fork and exec new thread */
{
  int parent, fork;
  ThProbe();                        /* stack probe */
  fork= Free;                       /* fork to free thread*/
  if (fork == 0) return -1;         /* error, none free */
  Free= Threads[fork].free;         /* take off free list */
  parent= ThCurr;                   /* current is parent */
  Threads[fork].parent= parent;     /* set parent */
  ThCurr= fork;                     /* will run fork next */
  if (!Threads[Next].next)          /* link to run list */
    Threads[ThCurr].next= Next;     /* make circular list */
  else
    Threads[ThCurr].next= Threads[Next].next;
  Threads[Next].next= ThCurr;
  Next= ThCurr;                     /* next on run list */
  if (setjmp(Threads[parent].jump)) /* put parent to sleep */
    return fork;                    /* parent is awake */
  Root= root;                       /* who to call */
  longjmp(Threads[ThCurr].exec,fork); /* call root from init */
}

void ThExit(void)                   /* exit to parent */
{
  ThProbe();                        /* stack probe */
  longjmp(Threads[ThCurr].exit,ThCurr);
}

int ThJump(int id)                  /* jump to another thread */
{
  int jumper, caller;
  ThProbe();                        /* stack probe */
  if
(id == 0 )                          /* if no destination */
    id= Threads[ThCurr].next;       /* next on run list */
  if (id < 1 || id > N_Threads || Threads[id].parent < 0)
    return -1;                      /* error, bad id */
  caller= ThCurr;                   /* where we came from */
  if (id == caller) return ThCurr;  /* nowhere to go */
  ThCurr= id;                       /* where we are going */
  if (jumper=setjmp(Threads[caller].jump))
    return jumper;                  /* return who jumped */
  longjmp(Threads[id].jump,caller); /* jump to ThCurr */
}

static void test()
{
  ThProbe();
  printf("test: called from thread %d\n",ThId());
  ThJump(0);
  printf("test: falling off thread %d\n",ThId());
}

main()
{
  int i;
  ThInit(3,2048);
  for (i=1; i <= 9; i++) {
    printf("main: loop %d\n",i);
    printf("main: created new thread %d\n",ThNew(test));
    printf("main: created new thread %d\n",ThNew(test));
    printf("main: exited from thread %d\n",ThJump(0));
    printf("main: exited from thread %d\n",ThJump(0));
  }
}

Listing 2

/*********** THREAD.H COPYRIGHT 1989 GREGORY COLVIN ************
This program may be distributed free with this copyright notice.
***************************************************************/
#ifndef THREAD
#define THREAD
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

typedef struct {
  jmp_buf exec;                     /* state of thread for exec */
  jmp_buf jump;                     /* state of thread for jump */
  jmp_buf exit;                     /* state of thread for exit */
  int parent;                       /* id of parent thread */
  int nchildren;                    /* number of children */
  int free;                         /* id of next free thread */
  int next;                         /* id of next thread to run */
  char *stack;                      /* top of stack for thread */
} thread;

extern thread *Threads;             /* table of threads */
extern int ThCurr;                  /* current thread */

#define ThProbe() { char p; assert(Threads[ThCurr].stack < &p);}
#define ThId() ThCurr

thread *ThInit(int n,int size);     /* create n size byte threads */
void ThFree(void);                  /* free the thread table */
int ThNew(void (*root)(void));      /* fork and exec new thread */
int ThJump(int id);                 /* jump to another thread */
void ThExit(void);                  /* exit to parent */
#endif

Writing Your Own Standard Headers: <stdlib.h>, <stddef.h>, <stdarg.h> And <limits.h>

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers training and consulting in C and C++. He is a member of X3J11, the ANSI C committee. He has an M.S.E. in computer science from the University of Pennsylvania. You can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or call (513) 324-3601.

In "Writing Your Own Standard Headers: The String Functions" (The C Users Journal, Jan. 1990), I presented some basic rules for creating standard headers, and then I showed you how to apply those rules to create <string.h>. This article shows how to write five other headers you most likely need. But first, here is a non-standard header that simplifies writing the standard ones and eliminates some irritating portability problems.

<quirks.h>

The standard headers frequently use void and void * types.
void indicates that a function returns no value, as in

void exit(int);

or that a function accepts no arguments, as in

int rand(void);

void * is the "generic data pointer" type used in declarations like

void *malloc(size_t);
void free(void *);

Many old compilers don't recognize void as a keyword. For these compilers 'void' functions are written without a return type in the function declaration (it defaults to int), and char * is used instead of void * for generic pointers. You can express your intent more clearly if you define

typedef int void;
typedef char *void_star;

These let you write declarations like

void_star malloc();
void free();

which look more like Standard C. If your compiler generates code so that functions return ints the same way they return chars, then you can safely define

typedef char void;

and write declarations like

void *malloc();
void free();

which look even more like Standard C.

Some compilers, like cc on UNIX 4.2 BSD, implement void as a keyword, but don't allow void * as a type. On these systems, you need only define void_star.

After putting your definitions for void or void_star in a header called <quirks.h>, you should include it at the beginning of every standard header. These types will then almost appear to be built-in. You will need to include <quirks.h> explicitly only in source files that use none of the standard headers.

<quirks.h> can smooth out other differences in dialects. For example, if your compiler doesn't implement the const and volatile keywords, you can add

#define const
#define volatile

Listing 1 shows a version of <quirks.h> for DECUS C. The protective wrapper prevents repeated definitions of void.

<stdlib.h>

Like <string.h>, <stdlib.h> was introduced by the ANSI standard. It declares the general utility functions in the standard library, summarized in Table 1.

EXIT_SUCCESS and EXIT_FAILURE are codes used with the exit function to indicate a program's success or failure to the host environment.
They expand to integral expressions that need not be constants. (An integral type is any of the signed or unsigned forms of char, short int, int or long int, or any enumerated type.) On MS-DOS and UNIX, the codes are usually defined by

#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1

Some systems, such as RT-11, define multiple levels of failure, such as warning, error, severe error, etc., one of which you must pick for EXIT_FAILURE. You can define additional codes like EXIT_WARNING, but they will clearly be non-portable.

MB_CUR_MAX expands to a positive integer expression whose value is the maximum number of bytes in a multibyte character as determined by the current locale. This is meaningful only if you already have multibyte character support, in which case MB_CUR_MAX is already in your header. I just set it to 1.

RAND_MAX is the maximum value that can be returned by the rand function. It must be integral and constant. The return type of rand is int, so RAND_MAX is typically the value of the largest positive signed integer. The Standard stipulates that RAND_MAX must be at least 32767, but if your rand operates over a smaller range, use the smaller value until you rewrite the function.

div_t and ldiv_t are structure types returned by the div and ldiv functions, respectively. You can define them as

typedef struct {int quot, rem;} div_t;
typedef struct {long quot, rem;} ldiv_t;

where quot and rem may be in either order.

wchar_t is the wide character type, an integral type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales. Like MB_CUR_MAX, this symbol relates to multibyte and wide character support. If you don't have it, use

typedef char wchar_t;

Listing 2 shows my <stdlib.h> for UNIX 4.2 BSD. Notice that the definition of NULL uses void_star from <quirks.h>. A protective wrapper surrounds wchar_t because it also appears in headers other than <stdlib.h>.
No wrapper protects div_t and ldiv_t because they appear only in <stdlib.h> and the protective wrapper around the entire header prevents them from being redefined.

The abs and labs macros are just interim implementations, because the ANSI standard requires that, unless explicitly exempted, all library functions must be implemented as functions (so they are addressable). A function declared in a header may also be implemented as a macro, provided that the macro is "safe" (i.e., it expands to code that evaluates each of its arguments only once), but abs and labs aren't safe. When abs is a macro, abs(*p++) will evaluate *p++ twice, producing both unpredictable results and unwanted side effects.

<stddef.h>

<stddef.h> contains some commonly used definitions, three of which -- NULL, size_t and wchar_t -- are also in <stdlib.h>. <stddef.h> also introduces a new type, ptrdiff_t, and a new macro, offsetof. My DECUS C implementation appears as Listing 3.

ptrdiff_t is the type of the result of pointer subtraction. It is a signed integral type, either int or long int. It doesn't need a protective wrapper because it isn't defined anywhere else.

The macro call offsetof(t, m) returns the offset (in bytes) of member m within structure type t. offsetof expands to a constant expression of type size_t. m cannot be a bit-field. The rationale for the ANSI standard suggests three possible definitions for offsetof:

    #define offsetof(t, m) \
        ((size_t)&(((t *)0)->m))

or

    #define offsetof(t, m) \
        ((size_t)(char *)&(((t *)0)->m))

or

    #define offsetof(t, m) \
        ((size_t)((char *)&(((t *)X)->m) - (char *)&(X)))

where X is some predeclared address. None of these definitions is guaranteed to be portable, but so far, the first one has worked on every system I've tried.

<stdarg.h>

This header defines a type, va_list, and three macros, va_start, va_arg, and va_end, which access the arguments to a function with a variable length argument list (like printf or scanf).
<stdarg.h> is very similar to <varargs.h> found with many UNIX C compilers.

Listing 4 shows a simple function, concat, that uses <stdarg.h>. The function heading is a prototype whose parameter list ends with an ellipsis (, ...), indicating that the length of the list is variable. va_list is the type of a data object that tracks the current position in the argument list. va_start initializes ap so that the first call to va_arg returns the value of the first argument in the list's variable part. Subsequent calls to va_arg return the values of the succeeding arguments. You must supply the argument's type to each call to va_arg since arguments in a variable length list may be of different types. (Bear in mind that the type of an argument in a variable length parameter list will be promoted so that it will not be an integer type smaller than int, nor will it be float.) va_end does any cleanup that might be needed.

The implementation of <stdarg.h> depends on the compiler's parameter-passing conventions. Most compilers pass arguments by pushing them onto the run-time stack. The rationale for the ANSI standard states that <stdarg.h> was designed to accommodate newer machines that may pass arguments in machine registers. Having no experience with C compilers for these machines, I will stick to the more common stack-oriented methods.

Most MS-DOS compilers push arguments so that the first argument has the lowest address. Figure 1 shows the argument list format for a call to

    printf("%d %f %d\n", i, x, n)

using a typical MS-DOS C compiler (where i and n are 16-bit ints and x is a 64-bit double). SP represents the value of the stack pointer. The figure shows the state of the stack just before jumping to printf.

Listing 5 presents an implementation of <stdarg.h> for most MS-DOS C compilers. va_start(ap, p) initializes ap to

    (va_list)(&(p) + 1)

which is the address of the first parameter in the list's variable part (the parameter after p).
Some implementations write this expression as

    (va_list)&(p) + sizeof(p)

which is equivalent as long as va_list is char *. va_start should expand to a void expression, but many compilers erroneously omit the void cast.

va_arg(ap, t) returns the value of the current argument addressed by ap (cast to type t), and advances ap to point to the next argument. On many compilers, you can implement va_arg as

    #define va_arg(ap, t) (*((t *)(ap))++)

This auto-increment expression may be a little easier to understand than the one in Listing 5, but it relies on an extension to the C Standard. The standard states that a cast expression, such as

    (t *)(ap)

is not an lvalue, so it cannot be the operand of ++. The version of va_arg in Listing 5 increments ap before applying the cast, then subscripts backwards to obtain the argument originally referenced by ap. It's more obscure, but stays within the standard.

If your compiler lets you use the auto-increment expression, is there any reason not to? Yes. Consider Microsoft C v5.1. By default, the compiler lets you use various language extensions. You can implement va_arg as an auto-increment expression, but compiling your code with the /Za option (disable language extensions) produces a warning from the compiler. Microsoft implements va_arg as in Listing 5 so that it will work with every compiler option.

On the other hand, Zortech C v1.07 uses the auto-increment version of va_arg. However, if you compile code using va_arg with the -A option (enforce ANSI compatibility), you don't get a warning. This means the compiler can't warn you about using this language extension in your code.

In most implementations va_end does nothing, but the standard states that it should expand to a void expression.
If your compiler complains that ((void)0) is a useless expression, you can try using

    #define va_end(ap) ((void)((ap) = 0))

If generating unnecessary code bothers you, you can

    #define va_end(ap)

which works fine when va_end is called in a separate statement (as in Listing 4), but produces a syntax error when va_end is embedded in nasty (but legal) expressions like

    va_end(ap), n = 1;

If your compiler pushes the arguments so that the first one is at the highest address, then you should use an implementation of <stdarg.h> like the one in Listing 6. It differs from Listing 5 in two ways: va_start initializes ap to point to (instead of beyond) the last fixed argument, and va_arg uses a pre-decrement (instead of a post-increment) to step to the next argument.

<limits.h>

This header contains macros that define limits for the sizes and ranges of integral types. Table 2 lists the macro names and their meanings. The standard specifies a minimum magnitude (absolute value) for each limit. The version of <limits.h> in Listing 7 uses these minimums. An implementation may (may, because the standard permits it!) increase the magnitude of the limits, but any program that relies on extended limits will not be portable to all implementations.

For example, SHRT_MIN and SHRT_MAX define the range of values for type short int. The standard requires the range to be at least -32767 to +32767 (decimal) -- the set of values that can be represented using 16-bit ones-complement or sign-magnitude arithmetic. On a twos-complement machine, you can increase the magnitude of SHRT_MIN to -32768, but any program that stores -32768 in a short int might not work on other architectures.

The standard allows the range of int to be as small as the range of short int. Hence, the minimum magnitudes for INT_MIN and INT_MAX are the same as for SHRT_MIN and SHRT_MAX, respectively. At the opposite extreme, INT_MIN and INT_MAX could be as large as LONG_MIN and LONG_MAX, respectively.
I recommend that you write your <limits.h> to use the actual ranges supported by your compiler. This lets you take full advantage of your architecture when efficiency is more important than portability. When portability is important, you must remember to avoid depending on the larger limits.

CHAR_MIN and CHAR_MAX define the range of values for "plain" char. A compiler can choose to represent plain char as either signed char or unsigned char. If your compiler treats plain char as signed, then use

    #define CHAR_MAX SCHAR_MAX
    #define CHAR_MIN SCHAR_MIN

Otherwise, use

    #define CHAR_MAX UCHAR_MAX
    #define CHAR_MIN 0

Some compilers let you select the representation of "plain" char. For example, Microsoft C v5.1 normally treats char as signed, but the /J option changes it to unsigned. This option also defines the macro _CHAR_UNSIGNED to allow conditional compilation (as in Listing 7) to determine the appropriate settings for CHAR_MIN and CHAR_MAX.

Borland's Turbo C v2.0 provides a switch for selecting the representation of "plain" char, but doesn't define a macro like _CHAR_UNSIGNED. In place of

    #ifndef _CHAR_UNSIGNED

it uses

    #if (((int)((char)0x80)) < 0)

According to the standard, #if expressions cannot use type casts or the sizeof operator. Therefore, this technique can be used only on a compiler that supports this language extension. It also means the compiler won't warn you about using this feature even when you ask it to disable language extensions.

The standard states that every macro, except CHAR_BIT and MB_LEN_MAX, is defined as an expression that has "the same type as would an expression that is an object of the corresponding type converted according to the integral promotions." For example, INT_MAX is defined as an expression of type int, and UINT_MAX is an expression of type unsigned int.
On the other hand, the character range limits (such as UCHAR_MAX) are defined as int expressions, rather than as (signed or unsigned) char expressions, because character types are promoted to int when used in an expression.

Notice that the unsigned limits are defined as unsigned constants. For example, UINT_MAX is defined by

    #define UINT_MAX 65535u

in Listing 7. The u suffix on the constant makes it unsigned. Without the u, a decimal constant is either a signed int or a signed long int, depending on the compiler. For example, DECUS C treats 65535 as (-1), but Microsoft C treats it as 65535L (a long int).

If your compiler doesn't support the u suffix, you can try to write unsigned int constants in octal or hex. For instance, some compilers with 16-bit ints treat 0100000 through 0177777 and 0x8000 through 0xFFFF as unsigned int constants. If that doesn't work, you can try

    #define UINT_MAX ((unsigned)65535)

which might introduce another problem. Limits like UINT_MAX are supposed to be usable in #if expressions; however, this definition uses a cast, which (according to the standard) isn't usable. Even if your preprocessor won't accept casts in #if expressions, you might still find this definition useful in other contexts.

A similar problem occurs when you try to set INT_MIN to -32768 on some twos-complement machines (such as a PC) using 16-bit ints. In Microsoft C, 32768 is greater than INT_MAX, so it's a long int. Therefore, the definition

    #define INT_MIN (-32768)

is wrong because it makes INT_MIN a long int. On the other hand,

    #define INT_MIN (-32767-1)

only uses constants of type int, and so correctly defines INT_MIN as an int.

What's Been Gained?

I have shown how to write five standard headers: <string.h>, <stdlib.h>, <stddef.h>, <stdarg.h>, and <limits.h>. I have also presented <quirks.h>, which fakes a few new keywords that are missing from older compilers. With just these few headers, it's much easier to port Standard C code to older compilers.
Figure 1

Table 1 Summary of <stdlib.h>

Macros:
    EXIT_FAILURE, EXIT_SUCCESS, MB_CUR_MAX, RAND_MAX, NULL

Types:
    div_t, ldiv_t, size_t, wchar_t

Function Prototypes:
    void abort(void);
    int abs(int);
    int atexit(void (*)(void));
    double atof(const char *);
    int atoi(const char *);
    long atol(const char *);
    void *bsearch(const void *, const void *, size_t, size_t,
        int (*)(const void *, const void *));
    void *calloc(size_t, size_t);
    div_t div(int, int);
    void exit(int);
    void free(void *);
    char *getenv(const char *);
    long labs(long);
    ldiv_t ldiv(long, long);
    void *malloc(size_t);
    int mblen(const char *, size_t);
    int wctomb(char *, wchar_t);
    int mbtowc(wchar_t *, const char *, size_t);
    void qsort(void *, size_t, size_t,
        int (*)(const void *, const void *));
    int rand(void);
    void *realloc(void *, size_t);
    void srand(unsigned);
    double strtod(const char *, char **);
    long strtol(const char *, char **, int);
    unsigned long strtoul(const char *, char **, int);
    int system(const char *);

Table 2 Macros defined by <limits.h>

    CHAR_BIT    number of bits in the smallest object that isn't a bit field (a byte)
    SCHAR_MIN   minimum value for an object of type signed char
    SCHAR_MAX   maximum value for an object of type signed char
    UCHAR_MAX   maximum value for an object of type unsigned char
    CHAR_MIN    minimum value for an object of type (plain) char
    CHAR_MAX    maximum value for an object of type (plain) char
    MB_LEN_MAX  maximum number of bytes in a multibyte character, for any supported locale
    SHRT_MIN    minimum value for an object of type short int
    SHRT_MAX    maximum value for an object of type short int
    USHRT_MAX   maximum value for an object of type unsigned short int
    INT_MIN     minimum value for an object of type int
    INT_MAX     maximum value for an object of type int
    UINT_MAX    maximum value for an object of type unsigned int
    LONG_MIN    minimum value for an object of type long int
    LONG_MAX    maximum value for an object of type long int
    ULONG_MAX   maximum value for an object of type unsigned long int

Listing 1

    /*
     * quirks.h - eliminate quirks (for DECUS C)
     */
    #ifndef _QUIRKS_H_INCLUDED

    #define const
    #define signed
    #define volatile

    typedef char void;

    #define _QUIRKS_H_INCLUDED
    #endif

Listing 2

    /*
     * stdlib.h - general utilities (for UNIX 4.2 BSD)
     */
    #ifndef _STDLIB_H_INCLUDED

    #include <quirks.h>

    #define EXIT_SUCCESS 0
    #define EXIT_FAILURE 1
    #define MB_CUR_MAX 1
    #define NULL ((void_star)0)
    #define RAND_MAX 2147483647

    typedef struct {int quot, rem;} div_t;
    typedef struct {long quot, rem;} ldiv_t;

    #ifndef _SIZE_T_DEFINED
    typedef unsigned size_t;
    #define _SIZE_T_DEFINED
    #endif

    #ifndef _WCHAR_T_DEFINED
    typedef char wchar_t;
    #define _WCHAR_T_DEFINED
    #endif

    void abort();
    double atof();
    int atoi();
    long atol();
    void_star calloc();
    void exit();
    void free();
    char *getenv();
    void_star malloc();
    void qsort();
    int rand();
    void_star realloc();
    void srand();
    int system();

    /*
     * interim macro definitions for functions
     */
    #define abs(j) ((j) >= 0 ? (j) : -(j))
    #define labs(j) abs((long)(j))

    /*
     * missing functions
     */
    int atexit();
    void_star bsearch();
    div_t div();
    ldiv_t ldiv();
    int mblen();
    int wctomb();
    int mbtowc();
    double strtod();
    long strtol();
    unsigned long strtoul();

    #define _STDLIB_H_INCLUDED
    #endif

Listing 3

    /*
     * stddef.h - common definitions (for DECUS C)
     */
    #ifndef _STDDEF_H_INCLUDED

    #include <quirks.h>

    #define NULL ((void *)0)
    #define offsetof(t, m) ((size_t)&(((t *)NULL)->m))

    typedef int ptrdiff_t;

    #ifndef _SIZE_T_DEFINED
    typedef unsigned size_t;
    #define _SIZE_T_DEFINED
    #endif

    #ifndef _WCHAR_T_DEFINED
    typedef char wchar_t;
    #define _WCHAR_T_DEFINED
    #endif

    #define _STDDEF_H_INCLUDED
    #endif

Listing 4

    #include <stdio.h>
    #include <stdarg.h>
    #include <string.h>

    /*
     * Concatenate copies of a variable number of strings into
     * s1. The list of strings must be terminated by NULL.
     * concat returns s1.
     */
    char *concat(char *s1, ...)
    {
        char *s = s1;
        const char *t;
        va_list ap;

        va_start(ap, s1);
        while ((t = va_arg(ap, const char *)) != NULL)
        {
            strcpy(s, t);
            s += strlen(s);
        }
        va_end(ap);
        return s1;
    }

    int main(void)
    {
        char s[100];

        puts(concat(s, "This ", "is ", "great!", NULL));
        return 0;
    }

Listing 5

    /*
     * stdarg.h - variable-length argument processing (for stack-
     * oriented argument passing with the 1st argument at the
     * lowest address)
     */
    #ifndef _STDARG_H_INCLUDED

    #include <quirks.h>

    typedef char *va_list;

    #define va_start(ap, p) ((void)((ap) = (va_list)(&(p) + 1)))
    #define va_arg(ap, t) (((t *)((ap) += sizeof(t)))[-1])
    #define va_end(ap) ((void)0)

    #define _STDARG_H_INCLUDED
    #endif

Listing 6

    /*
     * stdarg.h - variable-length argument processing (for stack-
     * oriented argument passing with the 1st argument at the
     * highest address)
     */
    #ifndef _STDARG_H_INCLUDED

    #include <quirks.h>

    typedef char *va_list;

    #define va_start(ap, p) ((void)((ap) = (va_list)&(p)))
    #define va_arg(ap, t) (*(t *)((ap) -= sizeof(t)))
    #define va_end(ap) ((void)0)

    #define _STDARG_H_INCLUDED
    #endif

Listing 7

    /*
     * limits.h - sizes of integral types (using minimum
     * magnitudes)
     */
    #ifndef _LIMITS_H_INCLUDED

    #include <quirks.h>

    #define CHAR_BIT 8
    #define SCHAR_MIN (-127)
    #define SCHAR_MAX 127
    #define UCHAR_MAX 255

    #ifndef _CHAR_UNSIGNED
    #define CHAR_MAX SCHAR_MAX
    #define CHAR_MIN SCHAR_MIN
    #else
    #define CHAR_MAX UCHAR_MAX
    #define CHAR_MIN 0
    #endif

    #define MB_LEN_MAX 1
    #define SHRT_MIN (-32767)
    #define SHRT_MAX 32767
    #define USHRT_MAX 65535u
    #define INT_MIN (-32767)
    #define INT_MAX 32767
    #define UINT_MAX 65535u
    #define LONG_MIN (-2147483647)
    #define LONG_MAX 2147483647
    #define ULONG_MAX 4294967295u

    #define _LIMITS_H_INCLUDED
    #endif

Linked Lists In C++

Bob Jarvis

Bob Jarvis is a Senior Capacity Planning Analyst for American Greetings Corporation who has been programming in C for four years, and C++ for a year and a half. His current interests include computer performance analysis and object-oriented programming.
He can be reached at American Greetings Corp., 10500 American Road, Cleveland, OH 44144. C++ offers a number of useful enhancements to C, including the ability to easily derive new specialized classes from previously-defined classes. This class derivation technique is particularly useful when working with so-called "container" classes (i.e., linked lists, stacks, B-trees, etc.). Programmers should be able to derive specialized containers for each type of object used in a manner which ensures safe and efficient storage and use. This article presents an implementation of a double linked list class for C++, and discusses some of the problems encountered during the design and implementation of this class. What's In A List? A linked list is a data structure in which each individual data element is stored as a single node in the list and each node contains pointers, or "links", to other elements in the list. In a double linked list there are two pointers -- one to the previous element in the list and another to the next element. (In a simpler form, the single linked list, each node has a single pointer to the next element in the list. The single linked list simplifies list maintenance and reduces the storage requirements slightly, but eliminates the ability to traverse the list "backwards"). Unlike an array, physical storage order is irrelevant in a linked list; with pointers the previous and next elements may be located at a memory address before or after the current element. Three positions in the list are particularly important: the "head" (the first element in the list), the "tail" (the last element in the list), and the "current" element (the element currently being used). Note that these three positions may not necessarily differ from one another -- the current element may be the same as the head or tail elements. In fact all three logical positions will point to the same physical element if there is only one element in the list. 
Implementation Considerations Figure 1 shows the interrelationships of nodes within a hypothetical double linked list. We can see immediately that there are two distinct object types -- the linked list itself, and the nodes within the list. (You could also argue that there is a third object, namely the data stored within the list). Although my first cut at implementing LIST also implemented the list nodes as a true class, the present version implements the nodes as a simple structure. This was done primarily for programming convenience (I found it easier to derive subclasses this way). As implemented here, a LIST copies the data to be stored in the nodes and keeps a pointer to the copy. Copying eliminates any requirement that data stored in a LIST be static, but requires that LIST subclasses be able to properly allocate, copy, and delete instances of the data stored in the nodes, which in turn means that LISTs should be "aware" of the type of data that they're storing. This need for awareness creates a problem. As implemented, the base LIST class stores a string of bytes, but has no knowledge of what is actually being stored. If the object being stored in the list has pointers to dynamically-allocated memory, those pointers will be copied, creating two objects which point to the same memory area. When either one of those objects is destroyed, the memory pointed to by the one object is freed, leaving the remaining object with a pointer to a memory area which is no longer valid. The solution to this problem uses virtual functions to create, copy, and delete copies of the data stored in the LIST. The default functions provided in the base LIST class should work properly for all simple objects which do not contain pointers to dynamically-allocated storage. 
If the data objects being held in a class derived from LIST do contain pointers to dynamic storage the virtual functions (the create_data(), copy_data(), and delete_data() member functions) will have to be rewritten to correctly invoke the constructors and destructors as appropriate. (Note that having an operator=() function defined as part of the data object class greatly simplifies the task of rewriting the copy_data() function). In Listing 5 the copy_data() function demonstrates how casts can be used to simplify the process of rewriting the virtual functions. Usage Using the LIST class is fairly straightforward. An instance of the class is declared, then elements are added to it, retrieved, etc. In Listing 1 a LIST named ilist is declared and filled with a sequence of integers. These integers are then retrieved and printed. Deriving Sub-Classes While using the generic LIST class to hold a series of integers works quite well, adding elements to the list is a bit awkward. Two parameters must be supplied -- the address of the integer and the size of the element being added (in this case an integer). While a #define could be used to "neaten" things up, a LIST which specifically handled integers would be better. Creating such a LIST is not difficult to do since all of the member functions can be replaced by inline calls to the member functions in the original LIST class (Listing 4). In this case we create a class named INTLIST, derived from LIST. The INTLIST class does not automatically inherit the public members of LIST (it is declared as INTLIST : LIST rather than INTLIST : public LIST). This declaration limits users to the interface defined for INTLIST; otherwise, users could add other non-integer items by using the generic functions defined for LISTs. A new data item (curr_size) was added to the INTLIST class. 
curr_size holds the size of the current item in the list, and is equivalent to sizeof(int) except in the case where an action could not be satisfied (such as getting the next element after the last element in the list), in which case curr_size is set to zero.

The test code using the LIST class can now be modified to use an INTLIST as in Listing 2.

Summary

The addition of container classes and the ability to derive specialized classes from them makes using C++ faster and more reliable than C. Programmers can concentrate on developing code to solve problems rather than writing and rewriting repetitive functionality such as linked lists.

Figure 1

Listing 1

    #include <stream.hpp>
    #include "list.hpp"

    main()
    {
        LIST ilist;
        int i, *iptr;
        unsigned int size;    // get_head() and get_next() take unsigned int &

        for(i = 0 ; i < 10 ; ++i)
        {
            cout << "Adding " << i << "\n";
            ilist.add_tail(&i, sizeof(i));
        }

        cout << "\n";

        iptr = (int *)ilist.get_head(size);
        while(iptr != NULL)
        {
            cout << "Retrieved " << *iptr << "\n";
            iptr = (int *)ilist.get_next(size);
        }
    }

Listing 2

    #include <stream.hpp>
    #include "intlist.hpp"

    main()
    {
        INTLIST ilist;
        int i;

        for(i = 0 ; i < 10 ; ++i)
        {
            cout << "Adding " << i << "\n";
            ilist.add_tail(i);
        }

        cout << "\n";

        i = ilist.get_head();
        while(ilist.get_curr_size() != 0)
        {
            cout << "Retrieved " << i << "\n";
            i = ilist.get_next();
        }
    }

Listing 3

    #include <stream.hpp>
    #include <stddef.h>
    #include <stdlib.h>    // for exit()
    #include <string.h>
    #include "list.hpp"

    LIST::LIST()    // constructor
    {
        head = curr = tail = NULL;
    }

    LIST::~LIST()    // destructor
    {
        struct listelem *work;

        while(head != NULL)
        {
            delete_data(head);
            work = head->next;
            delete head;
            head = work;
        }
    }

    void *LIST::get_head(unsigned int &sz)
    {
        curr = head;
        return get_curr(sz);
    }

    void *LIST::get_curr(unsigned int &sz)
    {
        if(curr == NULL)
        {
            sz = 0;
            return NULL;
        }

        sz = curr->size;
        return curr->data;
    }

    void *LIST::get_tail(unsigned int &sz)
    {
        curr = tail;
        return get_curr(sz);
    }

    void *LIST::get_prev(unsigned int &sz)
    {
        // the curr != NULL test protects calls made on an empty list
        if(curr != NULL && curr->prev != NULL)
        {
            curr = curr->prev;
            return get_curr(sz);
        }
        else
        {
            sz = 0;
            return NULL;
        }
    }

    void *LIST::get_next(unsigned int &sz)
    {
        // the curr != NULL test protects calls made on an empty list
        if(curr != NULL && curr->next != NULL)
        {
            curr = curr->next;
            return get_curr(sz);
        }
        else
        {
            sz = 0;
            return NULL;
        }
    }

    void LIST::add_before(void *vptr, unsigned int sz)
    {
        struct listelem *lptr;

        lptr = new struct listelem;
        if(lptr == NULL)
            exit(99);    // ugly - should be fixed later

        lptr->size = sz;
        create_data(lptr);
        copy_data(lptr,vptr);

        // rearrange pointers
        if(curr != NULL)
        {
            lptr->prev = curr->prev;
            lptr->next = curr;
            if(lptr->prev != NULL)
                lptr->prev->next = lptr;
            else
                head = lptr;
            curr->prev = lptr;
        }
        else    // curr == NULL - must be first element in list
        {
            lptr->prev = lptr->next = NULL;
            head = curr = tail = lptr;
        }
    }

    void LIST::add_after(void *vptr, unsigned int sz)
    {
        struct listelem *lptr;

        lptr = new struct listelem;
        if(lptr == NULL)
            exit(99);    // ugly - should be fixed later

        lptr->size = sz;
        create_data(lptr);
        copy_data(lptr,vptr);

        // rearrange pointers
        if(curr != NULL)
        {
            lptr->prev = curr;
            lptr->next = curr->next;
            curr->next = lptr;
            if(lptr->next != NULL)
                lptr->next->prev = lptr;
            else
                tail = lptr;
        }
        else    // curr == NULL - must be first element in list
        {
            lptr->prev = lptr->next = NULL;
            head = curr = tail = lptr;
        }
    }

    void LIST::add_head(void *vptr, unsigned int sz)
    {
        struct listelem *lptr;

        lptr = new struct listelem;
        if(lptr == NULL)
            exit(99);    // ugly - should be fixed later

        lptr->size = sz;
        create_data(lptr);
        copy_data(lptr,vptr);

        if(head != NULL)
        {
            lptr->prev = NULL;
            lptr->next = head;
            head->prev = lptr;
            head = lptr;
        }
        else
        {
            lptr->prev = lptr->next = NULL;
            head = curr = tail = lptr;
        }
    }

    void LIST::add_tail(void *vptr, unsigned int sz)
    {
        struct listelem *lptr;

        lptr = new struct listelem;
        if(lptr == NULL)
            exit(99);    // ugly - should be fixed later

        lptr->size = sz;
        create_data(lptr);
        copy_data(lptr,vptr);

        if(tail != NULL)
        {
            lptr->next = NULL;
            lptr->prev = tail;
            tail->next = lptr;
            tail = lptr;
        }
        else
        {
            lptr->prev = lptr->next = NULL;
            head = curr = tail = lptr;
        }
    }

    void LIST::delete_curr()
    {
        struct listelem *lptr;

        if(curr == NULL)
            return;

        lptr = curr;

        if(curr->prev != NULL)
            curr->prev->next = curr->next;
        else
            head = curr->next;

        if(curr->next != NULL)
            curr->next->prev = curr->prev;
        else
            tail = curr->prev;

        if(curr->prev != NULL)
            curr = curr->prev;
        else if(curr->next != NULL)
            curr = curr->next;
        else if(head == NULL && tail == NULL)
            curr = NULL;    // list is now empty
        else
        {
            cerr << "LIST::delete_curr() : deletion sequence error\n";
            exit(99);
        }

        delete_data(lptr);
        delete lptr;
    }

    // The following three functions should be replaced in all
    // derived classes. create_data() should allocate and
    // initialize space for a new instance of the class being
    // stored in the list (implying that you should derive a
    // new class for each type of object). delete_data() should
    // handle the destruction of class instances stored in a list.
    // copy_data() must take care of copying a complete class
    // instance from one place to another - it must properly
    // handle the situation where a class being stored in a list
    // has dynamically-allocated storage. As written these
    // functions should properly handle all simple types (that
    // is, types which have no dynamic storage).

    void LIST::create_data(struct listelem *lptr)
    {
        lptr->data = new char[lptr->size];
        if(lptr->data == NULL)
            exit(99);
    }

    void LIST::delete_data(struct listelem *lptr)
    {
        // data was allocated as an array of char in create_data()
        if(lptr->data != NULL)
            delete [] (char *)lptr->data;
    }

    void LIST::copy_data(struct listelem *lptr, void *from)
    {
        memcpy(lptr->data,from,lptr->size);
    }

Listing 4

    // LIST.HPP - linked list interface

    #ifndef _LIST_HPP
    #define _LIST_HPP

    class LIST
    {
        struct listelem
        {
            struct listelem *prev, *next;
            void *data;
            unsigned int size;
        };

        struct listelem *head, *curr, *tail;

    public:
        LIST();
        ~LIST();
        void *get_head(unsigned int &sz);
        void *get_curr(unsigned int &sz);
        void *get_tail(unsigned int &sz);
        void *get_prev(unsigned int &sz);
        void *get_next(unsigned int &sz);
        void add_before(void *vptr, unsigned int sz);
        void add_after(void *vptr, unsigned int sz);
        void add_head(void *vptr, unsigned int sz);
        void add_tail(void *vptr, unsigned int sz);
        void delete_curr(void);
        virtual void delete_data(struct listelem *lptr);
        virtual void create_data(struct listelem *lptr);
        virtual void copy_data(struct listelem *lptr, void *from);
    };

    #endif // ifndef _LIST_HPP

Listing 5

    // INTLIST.HPP - list of integers - derived from LIST

    #ifndef _INTLIST_HPP
    #define _INTLIST_HPP

    #include <stddef.h>
    #include "list.hpp"

    class INTLIST : LIST    // a LIST of integers...
    {
        unsigned int curr_size;

        // Return the int at p, or 0 when LIST reports no element
        // (p == NULL). Callers tell the cases apart with get_curr_size().
        int value_of(void *p) {return p != NULL ? *((int*)p) : 0;}

    public:
        INTLIST(void) {curr_size = 0;}
        int get_head(void) {return value_of(LIST::get_head(curr_size));}
        int get_curr(void) {return value_of(LIST::get_curr(curr_size));}
        int get_tail(void) {return value_of(LIST::get_tail(curr_size));}
        int get_prev(void) {return value_of(LIST::get_prev(curr_size));}
        int get_next(void) {return value_of(LIST::get_next(curr_size));}
        void add_before(int i) {LIST::add_before(&i, sizeof(int));}
        void add_after(int i) {LIST::add_after(&i, sizeof(int));}
        void add_head(int i) {LIST::add_head(&i, sizeof(int));}
        void add_tail(int i) {LIST::add_tail(&i, sizeof(int));}
        void delete_curr(void) {LIST::delete_curr();}
        unsigned int get_curr_size(void) {return curr_size;}
        void copy_data(struct listelem *lptr, void *from)
            {*((int*)lptr->data) = *((int*)from);}
    };

    #endif // _INTLIST_HPP

Standard C

Quiet Changes, Part II

P.J. Plauger

P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee.

Last month, I began the process of describing all of the quiet changes in Standard C. (See "Quiet Changes, Part I," CUJ February '90.) A quiet change is a change in the meaning of Standard C versus some earlier (and presumably popular) dialect of C. It is a change that converts an acceptable program with one behavior to an acceptable program with different behavior. You get no diagnostic to warn you that your program may have to be altered to keep its old behavior.

Needless to say, quiet changes are not nice. Committee X3J11 did its best to minimize them. It considered each such change very carefully. And it documented all the quiet changes it felt compelled to make in the rationale that accompanies the standard.

My goal with this column and the previous one is to show you all the quiet changes. With each one, I give an example of code that might be affected.
I also endeavor to explain how the committee was led to introduce the change. Last issue, I covered about half of the quiet changes. Here are the rest.

More Quiet Changes

"A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This is considered the most serious semantic change made by the committee to a widespread current practice."

For example,

    unsigned char uc = digit;
    if (uc - '0' > 9)
        printf("not a digit\n");

The message is no longer printed for all cases where digit is out of range. Some older implementations always performed an unsigned compare and hence cleverly got the test right for all cases.

The fact that I devoted a whole column to this issue should tell you that it has a lot of ramifications. (See "Standard C Promotes Types According to Value Preserving Rules," CUJ August '88.) Here I will simply summarize the issue. It required the committee to choose between two divergent classes of dialects. Neither could be dropped without causing quiet changes in programs written for the other dialect.

The divergence occurred when C acquired the additional unsigned types besides unsigned int. These new types required additional rules for promoting integer types, such as when you subtract an unsigned char and an int.

The decision inside Bell Labs was to preserve unsignedness. That is, if either operand is an unsigned type, both operands are promoted to the "cheapest" computational unsigned type that is at least as wide as the wider operand. (The computational types are the signed and unsigned versions of int and long.) So unsigned char minus int yields a result of type unsigned int.

What I and some other implementors chose to do was different. We chose to promote both operands to the cheapest computational type that would represent all the values of each of the two operand types.
So unsigned char minus int yields a result of type int, so long as unsigned char has a narrower representation than int. Unlike the "unsigned preserving" promotion rules, the result type is different for different target machines. On the other hand, the result more often has a type that correctly reports a negative value as a negative signed result, not some huge unsigned result. That is why the second set of promotion rules has been dubbed "value preserving." After much heated debate, the value preserving faction won. The most convincing argument was that value preserving promotions produced surprising results less often. The most convincing counter argument was that UNIX, and lots of other important code, was written using unsigned preserving promotions. Groups had successfully ported UNIX, however, using a compiler with value preserving promotions. That quelled enough fears for the committee to reach consensus. My belief is that this was a tempest in a teapot. (Naturally, it didn't feel that way to those of us in the teapot at the time.) It's hard to contrive realistic examples that quietly change along with these promotion rules. I gave the most compelling one I could think of above, in the spirit of fair play. If you're still worried, however, go back and read the full diatribe in my original column on the subject. "Expressions with float operands may now be computed at lower precision. The Base Document specified that all floating point operations be done in double." For example, float x, y; x = x - y; The subtraction can now be performed to float precision, which could retain less significance than in the past. C has traditionally performed all floating point arithmetic in double. This minimized type mismatches for floating point arguments back before there were function prototypes. It also happened to model the behavior of the PDP-11 floating point hardware used in the first implementation of C. 
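To make the float quiet change concrete, here is a small sketch (not part of the original column) that evaluates the same expression once with every step rounded to float and once with the operands widened to double, as the older rule required. The names diff_in_float and diff_in_double are hypothetical, and the sketch assumes IEEE 754 single and double formats, which Standard C does not require. The volatile stores are there only to force rounding to float on hardware that computes in wider registers.

```c
/* Hypothetical helper: (x + y) - x with each step rounded to float. */
float diff_in_float(float x, float y)
{
    volatile float sum = x + y;     /* the store forces rounding to float */
    volatile float diff = sum - x;
    return diff;
}

/* Hypothetical helper: the same expression with both operands widened
   to double first, as the "everything in double" rule would do. */
double diff_in_double(float x, float y)
{
    double sum = (double)x + (double)y;
    return sum - (double)x;
}
```

Under those assumptions, calling both helpers with x equal to 16777216.0f (2 to the 24th power) and y equal to 1.0f gives 0 from diff_in_float and 1 from diff_in_double, because the sum 16777217 has no exact float representation.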
Unfortunately, that traditional behavior has been one of the principal reasons why more FORTRAN programs have not been converted to C. The performance penalty can be substantial. What you gain in retained precision is often not worth the cost, particularly to programmers skilled in juggling precisions. The committee had little trouble changing the promotion rules to match FORTRAN more closely. We felt that few programs depend critically on the higher intermediate precision promised in the past when adding operands of type float. "A program that uses #if expressions to determine properties of the execution environment may now get different answers." #if -1U/2+1 == 1U<<31 /* int is 32-bits, 2's-complement */ The comment is not necessarily true for the target environment. Some C programs determined properties of the execution environment by testing how the C preprocessor performed integer arithmetic. The assumption was that preprocessor arithmetic was the same as for the target environment. That happens not to be true for most cross compilers. It is also not true for many compilers designed to support numerous target environments. The committee decided that target-independent preprocessors were a Good Thing. To be sure, compile time arithmetic must retain at least the same precision and range as arithmetic on the target. But it need not slavishly match all of the foibles of the target. Instead of writing clever (and unreadable) expressions in #if directives, programmers are now urged to test the values of appropriate macros. The standard header <limits.h> defines macros that tell you all sorts of interesting things about the representation of integers on the target. The standard header <float.h> defines macros that tell you more than you're likely ever to want to know about the representation of floating point types. 
The above example should be changed to

    #include <limits.h>
    #if INT_MAX == 2147483647 && \
        INT_MIN == -2147483648
    /* int is 32-bits, 2's-complement */

An implementation can still promise to model the target arithmetic in the preprocessor. In that case, the clever programs need not change. If you want to port them to another implementation, however, you'd probably have to change the #if expressions anyway.

"The empty declaration struct x; is no longer innocuous."

For example,

    f() {
        struct x; /* special meaning */
        struct y {
            struct x *px;
            .....
        };
        struct x {
            struct y *py;
            .....
        };

The first declaration now assures that the two structures point at each other, regardless of any outer context.

Just as block structure does not mix well with external linkage, it collides at times with forward references as well. C lets you declare a structure tag before you declare the contents of the structure. (It is an incomplete type.) You need to make such a forward reference when you declare two structures each of which contains a pointer to the other.

Unfortunately, C had no way to shield a forward reference from a structure tag definition visible in an outer block. Should you wish to plunk down a patch of code containing forward references to tags in an arbitrary code environment, you ran the risk of having the patch misbehave in some contexts. This defeats much of the purpose of block structuring to protect name spaces.

This is an esoteric problem. You've probably never encountered it and you probably never will. Nevertheless, the committee decided to solve it by giving special meaning to an otherwise empty declaration such as struct x;. You can contrive a program that breaks because this esoteric problem has been fixed. You will have trouble convincing me that it is a program worth writing.

"Code which relies on a bottom-up parse of aggregate initializers with partially elided braces will not yield the expected initialized object."
You can, of course, initialize structures containing structures, or arrays of arrays, or arrays of structures. For any complex initializer involving aggregates, you write braces around the stuff for each aggregate to set it off. Unfortunately, C has a long tradition of letting you omit all but the outermost set of braces. For a compiler writer, this is a nightmare. When implementors on the committee started comparing nightmares, matters got even worse. It seems that people had settled on at least two different ways to parse aggregate initializers with partially elided braces. In terms of parsing theory, the two general camps can be characterized as "top-down" and "bottom-up." I will not bore you with detailed examples to illustrate the subtle differences. What you need to know is that the committee eventually endorsed the top-down approach to parsing initializers. You also need to know that omitting braces is a great way to confuse yourself and future maintainers. Never mind that Standard C translators are all supposed to guess the same way. If an initializer is sufficiently complex, you are asking for trouble if you omit any of the internal braces. Any program that suffers from this quiet change already faced portability problems in the past. "Type long expressions and constants in switch statements are no longer truncated to int." For example, on an implementation where type int occupies 16 bits, long lo = 0x20001; switch (lo) { case 0x0001: /* no longer matches */ The switch comparisons are now performed with long arithmetic, so the first case does not match a truncated value. The committee entertained proposals for all sorts of improvements to switch statements. We decided against permitting floating point or pointer expressions to control switch statements. On the other hand, we found little justification for continuing to rule out the other integer computational types besides int. 
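The switch example can be tried on a host where int is wider than 16 bits by imitating the old truncation explicitly. The sketch below is an illustration only: both function names are hypothetical, and the mask with 0xFFFFL merely stands in for what an older translator with a 16-bit int would have done to the controlling expression.

```c
/* Hypothetical helper: Standard C behavior, the comparison is done
   in long arithmetic, so the full value is compared to each case. */
int matches_case_one(long lo)
{
    switch (lo) {
    case 0x0001:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: imitates an older translator that truncated
   the controlling expression to a 16-bit int before comparing. */
int matches_case_one_truncated(long lo)
{
    switch (lo & 0xFFFFL) {
    case 0x0001:
        return 1;
    default:
        return 0;
    }
}
```

Passing 0x20001L to matches_case_one selects the default branch, while the truncating variant selects case 0x0001, which is exactly the difference the quiet change describes.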
A switch statement whose control expression is of type long will now do all its comparisons against case values with long arithmetic. Conceivably, an existing program contains a switch expression of type long. Conceivably, the program depends upon the value being altered when it is converted to type int for the comparisons. If so, then the program will quietly change its behavior. The likelihood of such an occurrence is reasonably remote. "Functions that depend on char or short parameter types being widened to int, or float to double, may behave differently." For example, f(x, y) char x; { if (y == 0) x = 500; In Standard C, x remains type char. Hence, the stored value would probably be truncated. Many past implementations would promote x to type int. One school of thought in the past was that a parameter of type char was silly. Everyone knew that the argument value was actually passed as type int, so why not just rewrite the type of the parameter to match the passed value? The alternate school was that the programmer's wishes should be obeyed. However the argument value was passed, it should be stored in a data object of the declared type. You get fewer surprises that way. Besides, for most integer parameters, you can make the conversion "free" just by picking the right piece of the argument value as the parameter data object. The second school of thought prevailed in the end. That causes trouble for programs written for translators that rewrite parameter types. (Again, the trouble was already there for people who tried to move such programs between implementations.) Look for places where you declare a parameter as other than the "widened" type (the type actually passed). If the function stores values too large for the declared type, you will have a quiet change. If the function takes the address of that parameter, you may well have a quiet change. Surprisingly, however, the change often has no effect. 
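A sketch of the two behaviors may help when hunting for this change. It is not from the column: the function names are hypothetical, prototypes replace the old-style definitions so the code compiles on current translators, and unsigned char is used instead of char so that the out-of-range store has a well-defined result. The sketch also assumes an 8-bit char.

```c
/* Hypothetical function: the parameter keeps its declared narrow type,
   so storing 500 reduces the value modulo 256 (with an 8-bit char). */
unsigned int store_in_narrow_param(unsigned char x, int y)
{
    if (y == 0)
        x = (unsigned char)500;   /* cast only silences modern warnings */
    return x;
}

/* Hypothetical function: what a translator that rewrote the parameter
   to the widened type would effectively have compiled. */
unsigned int store_in_widened_param(int x, int y)
{
    if (y == 0)
        x = 500;
    return (unsigned int)x;
}
```

With y equal to zero, the first function returns 244 and the second returns 500. With any other y, both return the argument unchanged, which is one reason the change so often has no visible effect.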
"A macro that relies on formal parameter substitution within a string literal will produce different results."

For example,

    #define pr(msg) printf("msg\n")

Standard C will not alter the string when it expands the macro. Some earlier implementations would do so.

The folks at Berkeley decided that macros should be able to generate tailored string literals. Consequently, they adopted the convention that a macro parameter name within a string literal should be replaced by the actual parameter. (This happens only for string literals written as part of the expansion text of a macro, never outside macro expansions.)

There was general support for such a mechanism among members of the committee. Some of us, however, objected to this particular approach. (I was the loudest of the objectors.) It would mean greatly complicating the lexical description of string literals. They were already complicated enough with escape sequences and string concatenation. Adding the concept of embedded identifiers was appalling.

Instead, the committee adopted a new convention. Any parameter name you precede with a # gets turned into a string literal. With string concatenation, you can paste the "stringized" parameter into a larger string. The capability is the same; just the machinery differs. The example above must be changed to

    #define pr(msg) printf(#msg "\n")

Nevertheless, any program that depends upon this practice will suffer a quiet change.

"A program which relies on size-0 allocation requests returning a non-null pointer will behave differently."

For example,

    scanf("%u", &size);
    p = malloc(size);

If size is zero, the behavior is now undefined. The value stored in p may be a null pointer, or the program may abort, for example.

An unfortunate schism developed within the committee over the proper behavior of malloc(0). Should this produce a non-null pointer to an object of zero size? Or should it be considered invalid so that an implementation must diagnose it (perhaps at run time)?
It is a religious issue, touching on people's basic beliefs as to what constitutes elegant behavior. Like most religious issues, many bystanders quickly tire of arguments on either side of the matter. That made it all the more difficult for the committee to achieve an informed resolution. The net result was that both sides lost. The "compromise" was to label malloc(0) undefined behavior. This means that programmers can't depend on its working right. They also can't depend on its being diagnosed. A program that allocates objects of varying size may now suffer a quiet change. Where once it could handle the occasional zero-size object without special code, now it can fail. Conclusion If you look back over this set of quiet changes, you will find much cause for hope. The hope is that nearly all of the changes need not remain quiet. A good translator can have extra checks to look for these cases and emit warning messages. Only the last, concerning zero-size objects, must be augmented by run-time checking. I have not yet heard of an implementation that offers to check for quiet changes. One that does will be a valuable migration tool. (You don't want the checks turned on all the time, only for old C code that you are upgrading.) Vendors please take note. Dr. C's Pointers(R) Void Pointers, Jump Tables, And Friends Rex Jaeschke Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex. This column started out as an example of using a void pointer to point at an object having one of a set of possible types. 
The intent was to also record the type to which it currently pointed and to discuss an efficient way of accessing the underlying object at a later time. I did achieve these goals but along the way, digressed into a number of other interesting areas as you shall see. Void Pointers ANSI C adopted the notion of a generic pointer from C++. A generic pointer is declared as void *pv and may point to any object or function. C does not require pointers to different types to have the same representation, and on word and segmented architectures there are often two or more different pointer representations. Since a generic pointer must be able to store an address with the smallest resolution and a char is the smallest addressable object in C, a void pointer must be at least as big as a char pointer. (In fact, ANSI C requires they have exactly the same representation.) A void pointer may contain any arbitrary address and at different times, could point to a char, a double, or a structure of some type, for example. A void pointer does not record any information about the object (or function) to which it currently points -- it is the programmer's responsibility to keep track of this (just like the current contents of a union). Since the compiler knows nothing about the object (or function) to which a void pointer points, such a pointer has several restrictions: You cannot dereference a void pointer, and you cannot do arithmetic on it -- both operations require knowledge of the underlying object type. The Linked List Problem In some applications, it is useful to have a linked list where each link describes an object whose type may be different from that of other objects described by other links. An example might be a linked list of device control blocks in an operating system. The format of control blocks for different devices will likely vary. If each link in a list has a different format, how can the linked list be declared? 
This would require the forward and backward pointers (assuming a doubly-linked list) in a link to be of type void * (so they could point to any object type) and it would also require a flag in the link to indicate the type of the forward and backward objects. This can be cumbersome particularly if a link points to more than two places. (You may have multiple linked lists linked to each other, for example.) The approach I have taken is somewhat similar but it avoids the flag field for forward and backward pointers. Essentially, each link contains a void pointer that points to some underlying object and the type of that object is stored in a field in the link. When you point to an object of type A, the flag is updated to record that. And a special flag value is used to indicate when the pointer does not point to an object. The Switch Solution There are several ways to implement the code. One uses the switch construct, as in Listing 1. The five macros TYPE* represent the five possible states of the flag field objtype. Their values must be distinct but otherwise are immaterial, as is the fact that objtype is unsigned -- it could just as easily have been signed. To simplify the example, I have allocated space for only one link and have ignored the possibility of malloc returning NULL. Each link contains a pointer (pfwd) to the next link in the list, a pointer (pbwd) to the previous link, a pointer (pobject) to some generic object, and a flag (objtype) indicating the type of the object to which pobject points. The link is initialized to point to a double object and the forward and backward pointers are set to NULL to indicate the end of the list. You might well ask, "Why not use calloc to allocate the space so it's initialized and the pointers need not be set to NULL in the program?" Certainly, that approach will succeed on some systems. However, it is not maximally portable. 
calloc initializes the allocated space to all-bits-zero and while that's the representation of integer zero it need not represent floating-point zero, or in this case, the null pointer constant. ANSI C does not require that NULL be represented internally as all-bits-zero. It simply requires that the null pointer be a value that is not the address of an object or function, and that comparing and assigning pointers with integer zero actually works. Once the list has been constructed, it becomes necessary to process the objects to which each link points. The problem here is how to do this efficiently? In this example, the switch construct is used and the correct answer is produced -- the link does indeed point to a double object. But is this approach efficient? The real question comes down to "How is a switch implemented?" ANSI C does not specify this; it simply requires that in the absence of a break statement (or similar) that you drop through from one case to the next. And if each case is mutually exclusive, the ordering of the cases (including default) can be arbitrary. A lot of programmers believe (not necessarily for any good reason) that the order in which they specify the case labels, is important. This may or may not be true depending on your implementation and the set of case label values. For example, if the set of labels is dense (as in this case), the compiler might generate a jump table of addresses. (It might even be able to take advantage of a hardware case instruction such as exists on the VAX.) Certainly, this would make for efficient code. If, on the other hand, the set of label values is sparse the compiler may generate a series of nested if/else constructs and it may do them in the order in which the cases are specified, the reverse order, or possibly in some other order. (As an exercise, if your compiler can produce a machine-code listing, do so for each of the three solutions shown in this article. 
Compare the code generated for the switch, if/else, and jump table approaches.) The bottom line is that you are never guaranteed (by C) that the first case is tested for before the second (and the third, etc.) so specifying the most common case label value first need not be the most efficient approach. And if nested if/elses are used, the number of tests made to resolve the branch will be proportional to the number of cases defined. The casts are needed since you cannot dereference a void pointer directly.

The if/else Solution

Whereas the switch construct provides no guaranteed order of case value matching, the if/else construct does. For example:

    if (pnode->objtype == TYPECHAR)
        printf("char: %c\n", *(char *)pnode->pobject);
    else if (pnode->objtype == TYPEINT)
        printf("int: %d\n", *(int *)pnode->pobject);
    else if (pnode->objtype == TYPELONG)
        printf("long: %ld\n", *(long *)pnode->pobject);
    else if (pnode->objtype == TYPEDOUBLE)
        printf("double: %f\n", *(double *)pnode->pobject);
    else if (pnode->objtype == TYPENONE)
        printf("none:\n");

Now we have complete control of the order in which the tests are done. However, this ordering is fixed and favors those values near the front of the set of tests. It also cannot take advantage of any jump table generation the compiler might be able to do (unless the compiler has a very, very clever optimizer).

The Jump Table Solution

It so happens this problem can be solved and in a manner that involves no priority of testing. That is, the underlying object can be processed efficiently without regard to its type. (More correctly, I should say the code to do the processing is dispatched without favoritism.) Of course, there are always trade-offs and in this case, the code to process each object type must be in a function. That is, we must call a function to do the work whereas with the switch and if/else approaches, the work could be done inline (Listing 2). The key to the solution is the object funtable.
It's an array of five objects each of which is a pointer to a function that has no return value and has one argument, of type pointer to void. The array is initialized with the addresses of the five object type processing functions pro*. An array of function pointers is often referred to as a jump table.

It is absolutely critical that the order of the initializer expressions for funtable exactly match the values assigned in the TYPE* macros since we will use these macros to index into the funtable array. That is, the macro corresponding to pronone (TYPENONE) must have a value of zero since that is the first subscript value.

The expression

    (*funtable[pnode->objtype])(pnode->pobject);

actually dispatches the type processing code. Following the operator precedence table, funtable is first subscripted using the type flag giving the address of the appropriate function. Then that function is called with the generic address of the underlying object being passed as the only argument.

Regardless of the number of possible values for pobject, you only need this one statement to call the processing function -- all type processing functions take equally long to dispatch since they all require one lookup in funtable. To change the number of types, you simply need to define the new processing functions and add them to the table initializer list. The concept of controlling the order in which types are tested for no longer exists since using the flag as a subscript you intuitively know the function to be used each time.

The messy looking cast expressions are still present in each processing function. Why couldn't proint (for example) be defined as

    void proint(int *parg)
    {
        printf("int: %d\n", *parg);
    }

instead of

    void proint(void *parg)
    {
        printf("int: %d\n", *(int *)parg);
    }

Again, this may work on some systems but, according to ANSI C, the behavior is undefined. Specifically, in main a void pointer is passed yet in proint an int pointer is expected.
As stated earlier, these two pointer types are not required to have the same size and representation. (On a word machine such as a Cray supercomputer such mismatching will likely result in the wrong answer for all characters except the first in a given memory word.) The function prochar is a special case since it could be defined to expect a char pointer. And since char pointers and void pointers are required to have the same representation, this would work. However, in both cases (proint and prochar) the formal argument list in the definition would not match the prototypes for these functions. And if you change the prototypes to match, the table initializer will be erroneous. By definition, every function pointer must point to a function having the same argument list as well as return type. You could bypass the strict checking rules by leaving the argument information out of the table declaration but this still won't help you. In the absence of a prototype in the table declaration, the actual void pointer argument will be passed as is, giving rise to the mismatch problem with the formal argument as discussed earlier. In short, the functions must all have the exact same argument list thus requiring the explicit cast before dereferencing. Even pronone must have an argument despite the fact it is never used. Just what is the cost of a cast anyway? None at all on systems where all pointers are created equal. (This is typically the case on byte architectures having a linear address space.) On word and segmented architectures, most pointer conversions are also nonevents except where either the cast operand or the cast type is a char (or possibly short int) pointer. So don't be too concerned about the cast generating code. It was implied earlier that requiring each type's processing code to be a function might be inefficient since we have added the overhead of calling a function. Depending on this cost, it may or may not be significant. 
Also, an increasing number of compilers are adding the ability to automatically inline functions in each place they are called. (VAX C recently added this in V3 and C++ supports it explicitly using the inline keyword.) Enumerations Versus Macros In all three solutions, macros were used to come up with a set of unique integer values. The same result can be achieved using an enumerated type as follows: enum {TYPENONE, TYPECHAR, TYPEINT, TYPELONG, TYPEDOUBLE}; Not only do we get a set of unique int values, they also start at zero (as required by the jump table approach). And we are relieved from having to assign the numbers explicitly. Regarding the spelling of the enumerations constant identifiers; should they be in upper- or lower-case? If you follow the rule "All upper-case for macros and all lower- or mixed case for other identifiers" then they should be all lower-case. When I see an identifier written in upper-case I immediately understand that identifier might expand into an arbitrarily complex expression and I should take care how it's used. Since an enumeration constant "expands" to a simple integer constant the connotations of spelling it in upper-case are unwarranted. In the final analysis though, I don't think your choice will have significant stylistic ramifications. One final thing about the enum declaration; it has no tag and as such, no objects can later be declared to have that type. And although tagless structure and union declarations declared like this are useless, this is not so for enumerations. The scope of the enumeration constants declared inside the braces goes beyond the use of objects of that enumerated type. These constants have type int and can be used even though no enumerated objects of that type are actually declared. From my experience, enums are mostly used in just this manner. 
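Putting the enumeration and the jump table together might look like the sketch below. It goes a little beyond the column and should be read as one possible arrangement, not as the author's code: the extra enumeration constant TYPECOUNT, the compile-time size check, and the bounds-checking wrapper dispatch are all hypothetical additions. The check relies on the fact that an array type with a negative size does not compile.

```c
#include <stdio.h>

/* Enumeration constants start at zero, matching the table subscripts.
   TYPECOUNT is a hypothetical extra member that counts the real types. */
enum { TYPENONE, TYPECHAR, TYPEINT, TYPELONG, TYPEDOUBLE, TYPECOUNT };

static void pronone(void *parg)   { (void)parg; printf("none:\n"); }
static void prochar(void *parg)   { printf("char: %c\n", *(char *)parg); }
static void proint(void *parg)    { printf("int: %d\n", *(int *)parg); }
static void prolong(void *parg)   { printf("long: %ld\n", *(long *)parg); }
static void prodouble(void *parg) { printf("double: %f\n", *(double *)parg); }

/* The initializer order must match the enumeration order above. */
static void (*const funtable[])(void *) = {
    pronone, prochar, proint, prolong, prodouble
};

/* Compile-time check: the array size becomes -1, and compilation fails,
   if the table and the enumeration ever get out of step. */
typedef char funtable_size_check[
    (sizeof funtable / sizeof funtable[0] == TYPECOUNT) ? 1 : -1];

/* Hypothetical wrapper: returns 0 after dispatching, or -1 if the flag
   is not a valid subscript into funtable. */
int dispatch(unsigned int objtype, void *pobject)
{
    if (objtype >= (unsigned int)TYPECOUNT)
        return -1;
    (*funtable[objtype])(pobject);
    return 0;
}
```

A call such as dispatch(pnode->objtype, pnode->pobject) would then replace the bare table lookup, at the cost of one extra comparison per dispatch. Whether that guard is worth having depends on how much you trust the code that sets the flag.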
Listing 1

#include <stdio.h>
#include <stdlib.h>

/* structure type flag values */

#define TYPENONE   0 /* Not pointing at an object */
#define TYPECHAR   1 /* char */
#define TYPEINT    2 /* int */
#define TYPELONG   3 /* long */
#define TYPEDOUBLE 4 /* double */

struct node {
    struct node *pfwd;      /* forward ptr */
    struct node *pbwd;      /* backward ptr */
    void *pobject;          /* ptr to object */
    unsigned int objtype;   /* indicate object type */
};

main()
{
    char c = 'A';
    int i = 10;
    long int l = 123456;
    double d = 123.45;
    struct node *pnode;

    pnode = malloc(sizeof(struct node));

    /* let's point to a double */

    pnode->pobject = &d;
    pnode->objtype = TYPEDOUBLE;
    pnode->pfwd = NULL;
    pnode->pbwd = NULL;

    /* at a later point, let's process the object to which we point */

    switch (pnode->objtype) {
    case TYPECHAR:
        printf("char: %c\n", *(char *)pnode->pobject);
        break;
    case TYPEINT:
        printf("int: %d\n", *(int *)pnode->pobject);
        break;
    case TYPELONG:
        printf("long: %ld\n", *(long *)pnode->pobject);
        break;
    case TYPEDOUBLE:
        printf("double: %f\n", *(double *)pnode->pobject);
        break;
    case TYPENONE:
        printf("none:\n");
        break;
    }
}

The output generated by this program is:

double: 123.450000

Listing 2

#include <stdio.h>
#include <stdlib.h>

void prochar(void *parg);
void proint(void *parg);
void prolong(void *parg);
void prodouble(void *parg);
void pronone(void *parg);

/* structure type flag values */

#define TYPENONE   0 /* Not pointing at an object */
#define TYPECHAR   1 /* char */
#define TYPEINT    2 /* int */
#define TYPELONG   3 /* long */
#define TYPEDOUBLE 4 /* double */

struct node {
    struct node *pfwd;      /* forward ptr */
    struct node *pbwd;      /* backward ptr */
    void *pobject;          /* ptr to object */
    unsigned int objtype;   /* indicate object type */
};

main()
{
    char c = 'A';
    int i = 10;
    long int l = 123456;
    double d = 123.45;
    static void (*funtable[])(void *parg) = {
        pronone,
        prochar,
        proint,
        prolong,
        prodouble
    };
    struct node *pnode;

    pnode = malloc(sizeof(struct node));

    /* let's point to a double */

    pnode->pobject = &d;
    pnode->objtype = TYPEDOUBLE;
    pnode->pfwd = NULL;
    pnode->pbwd = NULL;

    /* at a later point, let's process the object to which we point */

    (*funtable[pnode->objtype])(pnode->pobject);
}

/* processing functions */

void prochar(void *parg)
{
    printf("char: %c\n", *(char *)parg);
}

void proint(void *parg)
{
    printf("int: %d\n", *(int *)parg);
}

void prolong(void *parg)
{
    printf("long: %ld\n", *(long *)parg);
}

void prodouble(void *parg)
{
    printf("double: %f\n", *(double *)parg);
}

void pronone(void *parg)
{
    printf("none:\n");
}

Questions & Answers

More On Keyboard Routines, A Preprocessor Puzzler

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee. He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 493-4390. When you hear the answering message, press the * button on your telephone. Ken also receives email at kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP).

Q

I am writing a program using Microsoft C v5.1 that connects to a mainframe session to execute an on-line mainframe application. I used the High Level Language Application Programming Interface (HLLAPI) to do the terminal emulation. One of the options that I am trying to incorporate into this program is the ability to toggle a debug switch on or off that will print certain information to an audit file. My intention is to use a function key for this purpose.

When I send a string of keystrokes to the mainframe session using HLLAPI, these characters are placed in the keyboard buffer.
If I press a function key, this is also placed in the keyboard buffer and sent with the other characters to the mainframe session, thus messing up the mainframe command in addition to losing the function key before the program finally gets around to testing to see if the function key has been pressed. In IBM Compiler BASIC, the ON KEY (x) command would handle this situation, presumably by doing an interrupt whenever the appropriate function key is pressed. I have not been able to find a similar prewritten function in C.

Therefore, I have been attempting to learn how to code an interrupt, but have not been able to find a really good book that explains them well. From the bits and pieces I have been able to find, I think that I need to change interrupt 9 or 15 or 16, but can't figure out which one or how. What I need is an example of how to code this interrupt and a good book on the subject (one that uses C code for examples, not assembler, if possible).

Mike Drew
Woodridge, IL

A

You could write a TSR that intercepts the keyboard interrupt, tests the key, and either sets a debug flag or passes it on. The Blaise toolkit provides an easy way of creating a TSR, although there is a bit more overhead than what you require for your one operation. That's the simple approach, with no assembler required.

Microsoft provides an "interrupt" modifier for a function type which allows it to be a handler for an interrupt. It receives as parameters the values of the registers when the interrupt was called. To handle an interrupt you:

1. create a function to perform the desired service with the interrupt type modifier.

2. call _dos_getvect() to get the current interrupt handler's address.

3. call _dos_setvect() to set the interrupt location to your function's address.

4. call _chain_intr() to chain to the previous interrupt's address.

For your keyboard routine, you want to connect to the key I/O interrupt 0x16.
The characters typed on the keyboard are handled by the keyboard interrupt routines (0x9) and are placed in a circular buffer. When the key I/O routine is called by a user program (such as by kbhit() or getch()), it checks the keyboard buffer for the next character to return. Using far pointers, your function can look at this circular buffer and eliminate your hot key, if it is in there. The offset of the head of the buffer is at 0040:001A, the tail offset is at 0040:001C, and the offset of the end is at 0040:0082. The offsets are relative to segment 0040. Note that these addresses are only good for IBM-PC compatibles. The IBM Technical Reference Manual gives the full information, but unfortunately it is documented in comments to assembly code.

Q

I am currently using Turbo C v2.0 with MS-DOS v2.2 on my Zenith PC. However, I also own an older KayPro portable equipped with a Z-80 processor and running CP/M 2.2. At the present time finances preclude replacing the KayPro with an MS-DOS battery powered laptop. As a result, I am searching for a CP/M Z-80 C compiler for the KayPro that would allow me to develop C code on the PC and then recompile and run it on the KayPro. In addition I am not willing to pay a large price for the CP/M compiler because of its memory and speed limits. If you have information regarding CP/M Z-80 C compilers, I would appreciate your response.

Ronald L. Nave
Horsham, PA

A

I used to use the Manx Aztec Compiler to perform exactly what you are talking about. In addition, I also ported the C code from the IBM-PC to an Apple II. Although you could use the latest version of the compiler on the PC, it might be better to use an old version that matches the version available for CP/M. (None of the recent ANSI features are available in the CP/M version.) If you use the older version on both machines, you will not be tempted to use some of the more modern features, such as structure assignment.
This version may be available from Manx or on the used compiler market.

Q

We are using Microsoft C v5.0. We would like to define a manifest constant to be used both as a number and as part of a message string. For example:

    #define Max 10
    extern short Value;
    extern DoMsg(char *Msg);
    if (Value > Max)
        DoMsg(<what goes here?>);

One way to do this is to itoa() the constant and assemble the message (at runtime), (or printf could be made to do this), but we thought we could get the preprocessor to do this sort of thing. All our attempts -- pasting, stringizing, etc. -- have been foiled. For example, use of the stringizer, "#", suppresses macro expansion, so the constant comes through unsubstituted. If we set up a pasting macro, there's no way (that we could find) to paste quotes onto something. The point of all this is to be able to have just one manifest constant to change if the value changes, and have that apply to both value uses and string uses within the module. Help!

Josh Cohen
Stuart Downing
Dexter, MI 48130

A

You got me on this one. I had assumed that the following would have worked, as you tried, till I read the ANSI standard again.

    #define MAX 10
    #define STRING_WITH_VALUE(X) "Error message " #X
    DoMsg(STRING_WITH_VALUE(MAX))

The # operator is a new ANSI preprocessor operator that only works inside a #define that takes parameters (a function-like macro). It is a "stringizing" operator. It forms a string literal (a set of characters in quotes) from the token that follows. However, the token is not processed for replacement. Thus the output of the preprocessor looks like:

    DoMsg("Error message " "MAX")

and not

    DoMsg("Error message " "10")

In the first case, the implicit concatenation of string literals in ANSI C yields the following:

    DoMsg("Error message MAX")

If the output of the preprocessor had been the second case, then you would have gotten exactly what you wanted. Replacement of #define names is not supposed to occur within a string literal.
The operation of the # operator appears to be consistent with that philosophy. Perhaps a reader can solve this problem for us. A solution would certainly be useful. For example, I have been meaning to rewrite quite a bit of code to look something like:

    #define WFIRST_NAME 20
    #define WLAST_NAME 30
    struct s_record
        {
        char first_name[WFIRST_NAME];
        char last_name[WLAST_NAME];
        ...
        };
    struct s_record record;
    ...
    #define quote(x) #x
    printf("\n Record is %" quote(WFIRST_NAME) "s %" quote(WLAST_NAME) "s",
        record.first_name, record.last_name);

which would yield (if it worked):

    printf("\n Record is %20s %30s", record.first_name, record.last_name);

One alternative is:

    #define MAX_BUFFER 200          /* Or whatever */
    char DoMsgBuffer[MAX_BUFFER];   /* Local or Global buffer */
    #define DoMsgInt(string, integer) \
        { \
        sprintf(DoMsgBuffer, "%s %d", string, integer); \
        DoMsg(DoMsgBuffer); \
        }

Now instead of calling DoMsg directly, you call:

    DoMsgInt("Error message", MAX);

This at least eliminates having to change two constants. An uglier alternative uses a separate quote constant with lots of comments about changing both.

    #define MAX 10      /* If you change this, change S_MAX to match - or else */
    #define S_MAX "10"
    DoMsg("Error Message" S_MAX);

Q

This month's C Users Journal (Nov. 89) contains an article by P.J. Plauger describing the print facilities in Standard C. His review of these commands brings back a gripe I have long held about C, which I wish to raise. I was brought up as a Fortran programmer, and was weaned on its FORMAT statements. Say what you like, Fortran allows a form which is much missed by those of us who have to format complicated output, and that is the "repeat" format construct. Thus in Fortran, a format like 10I3 would replace a lot of writing in Standard C. Even more useful are statements like 4(3I2,2X,4F6.4), which formats 28 variables in 15 characters, and even adds blank spaces where desired.
I am well aware that no two languages are equivalent, but it would seem to me that somebody by now has worked out a scheme to allow multiple format statements of similar types to be written with less difficulty than Standard C allows. What do the gurus say to this?

David Tal
Haifa, Israel

A

I am an ancient Fortran programmer also, or should I say, one who used Fortran in relative antiquity. One of the least desirable aspects of the FORMAT statements was having to count the number of characters (e.g. 8HVALUE IS) for character output. Because this was the syntax for characters, the leading digits on a format specifier (I, X, F) could be used as a repetition count.

Where the language does not feature a particular item you are interested in, you can always come up with your own version. With the printf format specifier, everything that is not preceded by a % is output as a literal character. You might come up with a scheme such as "%r%Z", where r is a repetition count (an integer) for the next format specifier. You might even use "%r(x)", where the x represents a string that will be repeated r times. All you need to do is write a routine something like:

    char *repeat_format(format)
    char *format;

which you would use in a printf such as:

    printf(repeat_format("%5%3d %2(%d ABC)"),a,b,c,d,e,f,g);

and which would print the equivalent of:

    printf("%3d%3d%3d%3d%3d %d ABC%d ABC",a,b,c,d,e,f,g);

The routine might be easier if you always required the parentheses. Otherwise it will have to determine where the end of a simple format specifier is. The function would need an internal buffer, whose address it could return.

Reader Requests

I have a question which concerns especially the IBM PC but maybe you know a solution for my problem. In writing a special data transfer program for an IBM PS and some telecommunications equipment I encountered the following task: Is there any way to determine whether there is a mouse connected to a specific serial COM port?
Of course I know how to detect the presence of the mouse driver (Int 33h, function 0). But this doesn't tell me which port is used! Thank you, and I hope to receive a reply soon. Michael Wiedmann West Germany That is an interesting problem. The mouse driver that I use does not seem to check for presence of a mouse on the COM port. It assumes that one is there and displays a random position. Anyone have an answer? (KP) I am a new subscriber to The C Users Journal. I found the article "Pointer Arithmetic at Memory Segment Boundaries" by D. and N. Saks (Vol.7 #7 pp. 27) very informative. I'm interested in writing applications in C that use expanded memory to store variables and data. I'm also interested in finding information regarding writing applications in C that can run in expanded memory. Can you direct me to sources of such information? Thank you for your time and continued success with The C Users Journal. Phil Pistone Chicago, IL The Intel/Lotus specifications give the full details on the interface (KP). Anyone have suggestions as to a readable version? (Ed. Note: You might find what you're after in two articles, both in old Microsoft Systems Journals: "Expanded Memory: Writing Programs That Break the 640K Barrier", M. Hansen, Krueger, B., and Stuecklen, N., March 1987, p. 21; and "Extended Memory Specification 2.x: Taking Advantage of the 80286 Protected Mode", Chip Anderson, July 1989, p. 17.--rlw) Reader Responses: Here is some information for Jeff Saraiva's questions concerning graphics and the MSC graphics library that was published in your November issue. Sample Programs: I have not seen any extensive programs that use the MSC graphics library. Two good books that I have used are: Advanced Graphics in C and Graphics Programming in C. The authors develop their own graphics library, but the concepts are the same and it should be an easy matter of replacing calls to these libraries to the MSC graphics library routines. 
UNIX Tests: You pretty much named the worthwhile ones.

TIFF Information: The May 1988 issue of Dr. Dobb's Journal has an excellent article about TIFF starting on page 26. The article has an address to which you can write for the source code and documentation to a TIFF library that reads and writes TIFF files. The code and documentation is free. There are two libraries; one for the MAC and one for the PC. I have used both of these and they work very well. The documentation is very complete.

Ray-tracing Algorithms: Sorry, can't help you on this one.

Joseph K. Vossen
Duluth, GA

Printscreen: In your "Q?/A!" column in The C Users Journal, v7n8, you suggested that Roger Glocke kill the "printscreen" key by replacing the vector for INT 5. That is the hard way, and (as you point out) can lead to trouble. The easy way is to set PrtScr's data byte (at address 0050:0000) to "busy" so the ROM BIOS interrupt handler will ignore the key stroke. The attached listing (Listing 1) illustrates the technique with a small stand-alone utility to set the byte or restore it to "not busy." To use the technique embedded in another program: Initialize PrtScr as a static far pointer as shown, assign -1 to the data byte (*PrtScr) at the beginning of main(), and assign 0 to it before exit (it is safest to use a local function called by atexit() for this). Turbo C has the MK_FP macro (in dos.h), which could have been used to initialize the far pointer in line 12. However, Mr. Glocke said he uses Microsoft C, which does not provide this macro. The construction shown should work with Microsoft C.

Murray L. Lesser
Yorktown Heights, NY

I had mixed emotions about seeing my letter in your column of the November issue of The C Users Journal that arrived just today. I was pleased because I learned something from your very lucid discussion of the problem raised in my letter. I was a little embarrassed because within a week after I wrote the letter the problem was solved.
A discussion with Fred Crigger of Watcom Products, Inc., revealed that the solution was to write some string functions that would accept far pointers to data items and return near pointers to data items, and vice versa. I wrote the necessary functions, virtually all the string functions in the standard library, and made the appropriate changes in the program. As of this moment everything is working very well and I am rather proud of my efforts. Incidentally, I have switched completely to Watcom C v7.0 and am delighted with it. It does compile slowly but yields .exe files that are consistently two-thirds the size of those produced by Microsoft. And, the people at Watcom have been very generous with their support. I do appreciate both your going to the trouble to answer my letter and your column -- I am always better informed after reading it.

Fred C. McDaniel
Richardson, Texas

Listing 1 KILLIT.C

/*
 * KILLIT.COM - A small utility to kill or re-enable the PrtScr function
 *
 * Usage: Call with no argument to disable PrtScr function
 *        Call with any argument to enable PrtScr function
 *
 * Written by M.L. Lesser, 10/27/89
 * Compiled with Turbo C (TCC) v 2.00, switches -mt -lt -o to link as COM file
 */
#include <conio.h>

char far *PrtScr = (void far *)((unsigned long)0x50 << 16);

void main(argc)
{
    if (argc == 1)      /* No arguments on the command line */
    {
        *PrtScr = 1;    /* Set PrtScr "busy" signal */
        cprintf("Print screen function has been disabled");
    }
    else
    {
        *PrtScr = 0;    /* Set PrtScr "available" signal */
        cprintf("Print screen function has been enabled");
    }
}

Applying C++

Building A Text Editor, Part 2: Buffers, Sloops, And Yachts

Tsvi Bar-David

This article is not available in electronic form.

Implementer's Notebook

Life With Static Buffers, Part 2

Don Libes

This article is not available in electronic form.

The HALO Graphics Library

Victor Volkman

Victor R. Volkman received a BS in Computer Science from Michigan Technological University in 1986. Mr.
Volkman is a frequent contributor to The C Users Journal and the C Gazette. He is currently employed as Software Engineer at Cimage Corporation of Ann Arbor, MI. He can be reached at the HAL 9000 BBS, (313) 663-4173, 1200/2400/9600 baud. The HALO Graphics Library by Media Cybernetics, Inc. supports device-independent graphics programming with more than 200 functions. HALO provides device drivers for dozens of vector and bitmap graphics boards, dot matrix and laser printers, page scanners and video digitizers, graphics tablets, mice, and plotters. Some of the more popular graphics boards supported include CGA, EGA, VGA, Extended-VGA, MCGA, PGA, Hercules, and AT&T Targa. HALO for DOS lists at $395. HALO for OS/2 lists at $695 and is source-code compatible with the DOS version. The BARGRAPH application in Listing 1 demonstrates the style and ease with which HALO can be integrated with C programs. System Requirements HALO makes only modest system hardware requirements. It will run on an IBM XT, AT, 3270 PC, AT&T 6300 or other true compatible computer with a base memory of 256k RAM. The computer must have at least one supported graphics device. Additionally, HALO requires MS-DOS v2.1 or later. For software development, you must have any one of the supported languages: Microsoft MASM v5.0, BASICA, QuickBASIC v4.0, Turbo BASIC v1.0, Lattice C v3.0+, Microsoft C v3.0+, Turbo C, Microsoft FORTRAN, Ryan-McFarland FORTRAN, Gold Hills Lisp, Microsoft Pascal, or Turbo Pascal v4.0. Development for the BARGRAPH application was completed on a 12.5 Mhz AT-compatible with 640K RAM and an Everex EV-640 graphics card (CGA and Hercules compatible). The BARGRAPH program was compiled with Microsoft C 5.1 and linked with the small model HALO library. HALO is a graphics kernel system structured like a layercake (see Fig. 1). Each layer may only talk to the layers directly above and below it. 
On the top layer is your source application program as written in any supported language (C, BASIC, Pascal, etc.). The application program layer contains many function call references to HALO. Since each language has its own parameter passing mechanism, a language binding layer is needed. The language binding presents the function arguments to the graphics kernel in a standard fashion. The operations of the graphics functions themselves are split between the graphics kernel and device driver layers. The graphics kernel is linked into your application program. It performs the device independent functions such as polygon drawing, text manipulation, and viewport management. The components in the device driver layer speak directly to the hardware. Typical graphic device driver functions are vector drawing and bitmap panning. For maximum flexibility the device drivers are loaded dynamically.

HALO supports devices with four types of color palette management: devices which have a fixed set of colors that cannot be changed; devices which allow you to switch between several predefined palettes; devices which support more colors than can be displayed at one time (e.g. IBM EGA); and devices which support a programmable palette. (In the third case, colors are changed by specifying both index and bitmask for the palette.)

HALO supports devices in modes up to 16 bits per pixel (65,536 colors). In general, the bits of the same magnitude (i.e. power of 2) of each pixel in the display are referred to collectively as a bit plane. The graphics bitmap is defined as the sum of all the bit planes. The actual physical mapping of pixels in memory varies enormously between various graphics cards and their display modes. Fortunately, the HALO device drivers sufficiently hide this information so the programmer need never be concerned with such low-level details.
The HALO package supports three different types of coordinate systems: device coordinates, world coordinates, and normalized device coordinates. Which system you choose depends entirely on your requirements for device-independence. HALO provides functions to convert between any of the coordinate systems.

Dealing with aspect ratios is an important part of the graphic environment. Aspect ratio is used to convert from the perfect mathematical coordinate plane to the real-world imperfect graphics device. Specifically, the aspect ratio is the ratio of a pixel's width to its height. For example, the IBM EGA displays 640 x 350 pixels on a display 9.6" wide and 7.2" high. Each pixel is

    9.6 inches / 640 pixels = 0.015 inches / pixel (width)
    7.2 inches / 350 pixels = 0.0205 inches / pixel (height)

so the aspect ratio is

    0.015 / 0.0205 = 0.73

HALO automatically corrects circles, ellipses, and arcs for aspect ratio. If the correction were not applied, then a circle with a 100 pixel radius would appear to be 100 x 0.015 = 1.5 inches wide and 100 x 0.0205 = 2.05 inches tall. HALO always corrects in the vertical component so that it would actually produce a circle 100 x 0.015 = 1.5 inches wide and 100 x 0.73 x 0.0205 = 1.5 inches tall. However, it is strictly up to the programmer to include aspect ratio in his own calculations for boxes, lines, and other objects.

Graphics Objects

HALO offers the programmer all the necessary tools for drawing a variety of graphics objects including filled polygons and spline curves. All of the drawing operations make use of the graphics cursor. The graphics cursor is an invisible reference point on the display. The graphics cursor may be set at an absolute location or moved relative to its current position. The graphics cursor is used as the first point from which all line and polyline functions are drawn. It is also the center point for circle, arc, pie wedge, and ellipse drawing functions. Lastly, the graphics cursor is the starting point for fill functions.
Since lines are the most frequently used graphics objects, they require the most flexibility. HALO will draw a line from the current graphics cursor to a relative position (line relative) or to an absolute position (line absolute). You may specify both the width in pixels and the style of the line. HALO offers three basic line styles and seven user-defined line styles. HALO has built-in functions for creating filled circle, pie wedge, box, and polygon shapes. Objects may be filled in the current color as solid or with a hatch style. Objects may be filled as they are drawn or filled later with a flood-fill function. HALO offers five basic hatching styles and five user-defined hatching styles. In addition to geometric objects, HALO supports three types of graphics text: dot text, fast text, and stroke text. Dot text is a general purpose bitmapped font. HALO includes six dot text fonts, whose height and width may be scaled in integer multiples. Dot text may be drawn in any of the four compass directions. HALO maintains a special cursor called a text cursor for dot and stroke text. Fast text is a special purpose bitmapped font whose data is taken from the graphics board's own ROMs. Additional font files are thus not used for fast text. Fast text may only be drawn at integer row and column text positions. Additionally, fast text may only be drawn from left to right. Stroke text is HALO's most sophisticated text display. Stroke text is not defined as a bitmapped font but rather as a series of brush strokes or vectors. Since stroke text is displayed as vectors, it uses all the current line settings. Stroke text may be sized and rotated to any angle desired. When using stroke text drawn at an angle, the programmer must consider the aspect ratio of the display. The BARGRAPH program uses only the stroke text to achieve the highest quality image. Fig. 6 summarizes the tradeoffs between the various HALO text display schemes. 
Advanced Features HALO has a variety of features essential to the development of advanced graphics applications including area moves, rubberband functions, and the "Virtual Rasterizer Interface". Area moves involve copying from one part of the bitmap to another. The movefrom() and moveto() functions allow a rectangle of the display to be cut and pasted respectively. The moveto() function allows the buffer to be pasted in one of several modes including XOR, AND, OR, and complement. Rubberband functions, like area moves, are designed to make interactive graphics programs easier to write. A rubberband object is one that can be stretched and dragged across the graphics screen without disturbing it. For example, you could write a simple function which polled the mouse to interactively position the endpoint of a vector. Each successive call of a rubberband function deletes (XORs) the previously displayed object and simultaneously writes it at a new position. HALO supports rubberband lines, boxes, and circles. The "Virtual Rasterizer Interface" (VRI) allows you to create a virtual graphics display of any horizontal, vertical, and color resolution desired. VRI will use any combination of MS-DOS base memory, EMS memory, and disk space to store the image. The most common use of VRI is to assemble an image for a laser printer page. For example, an "A" size drawing (8.5" x 11") at 300 dpi is effectively a 2550 x 3300 pixel image, requiring just under one megabyte of storage. Once the VRI device is initialized, it accepts the same HALO calls as any other raster device. A VRI can be configured for up to a 16383 x 16383 resolution image or 32 megabytes, whichever is smaller. BARGRAPH - A Small Application For HALO The BARGRAPH demonstration application produces high-quality charts simply and efficiently. BARGRAPH takes a language-driven approach to specify the parameters of a chart. 
The PC-DOS usage of this program is "BARGRAPH datafile" where datafile is a plain ASCII file containing command strings. Each command string specifies a single detail of the chart such as the scale or legend. BARGRAPH input files include the HALO specific configuration data as well as the actual graph data. A typical BARGRAPH data file is shown in Fig. 7.

The operation of the BARGRAPH program is roughly divided into two phases. In the first phase, the datafile is parsed a line at a time and stored into the cmd_data[] static structure. The function process_graphics_cmd_line() is called once per input line. This function determines the command keyword and parses its arguments into the appropriate slot of cmd_data[]. The DATA, COLORS, MODE, and SCALE commands call parse_delimited_number_list() to store data in the numeric half of the udata union. Similarly, the LEGEND, FONT, TITLES, and DEVICE commands call parse_delimited_string_list() to place data in the string array half of the udata union. The COMMENT and END commands serve only documentation purposes and are thus ignored. The complete BARGRAPH syntax is diagrammed in Fig. 8.

The second phase uses data supplied in the static structures to set up the HALO environment and plot the graph on the screen. The HALO environment is established in two steps. First, the function setup_halo_globals() both inquires about the capabilities of, and sets the parameters for, the graphics device. A global structure called halo, devised expressly for this program, tracks HALO environment values throughout the program. The setdev() and initgraphics() functions must be the first two HALO calls in an application program. These load the device driver from disk and set the hardware graphics mode respectively. The remainder of the HALO calls in setup_halo_globals() set the degree mode, world coordinate rectangle, line width, line style, drawing color, and the stroke text font and color (see Fig. 9).

The function setup_graph_globals() supervises the second step of initialization. A global structure graph separates the BARGRAPH program data from the HALO data. The graph structure holds data in a form which will simplify calculations later. If the user does not supply SCALE Y-Axis upper and lower bounds, BARGRAPH will use the min and max data points as the scale range.

The bar graph is drawn by draw_bar_graph(). It begins by calling draw_axes(), which produces the axes in three steps. First, the legend string, horizontal X-Axis, and vertical Y-Axis are drawn at predefined coordinates. Secondly, tick marks and their labels are drawn along the Y-Axis. (The draw_axes() function makes a total of ten ticks above the Y-Axis.) Finally, the title for each bar is drawn below the X-Axis at a 45 degree angle -- the angle keeps titles from overwriting each other. Since each stroke text character is a different size, the inqstsize() function must be called to determine the actual space required for each title string.

Once the axes are complete the bars are placed on the screen. If the graphics device is monochrome or the user has not specified any bar colors then a sequence of four hatch styles will be used. This ensures that default graphs are displayed similarly on monochrome and color graphics devices. The equations for determining the bar size are shown in Fig. 11.

Improving BARGRAPH

Some simple enhancements which might greatly increase the utility of BARGRAPH include the following:

(1) Read the HALO-dependent commands (DEVICE, PRINTER, MODE, etc.) from a default configuration file (e.g. BARGRAPH.CFG) so they need not be repeated in each data file.
(2) Add aspect-ratio calculations to standardize the look of the graph.
(3) Add line graph and pie-slice graph types to the program. Create a new command called CHART to specify the graph type.
(4) Allow the user to capture the graph display and save it with a file format which can be read by desktop publishing programs.
Conclusion

The HALO Graphics Library by Media Cybernetics is a highly useful programming tool for developing your own graphically oriented programs. The versatility, efficiency, and functionality of HALO are easily demonstrated by BARGRAPH. The BARGRAPH applications program as presented required less than two dozen different functions out of the 200 offered in HALO. The executable file amounts to just under 100K plus about 12K for device drivers, a fairly modest memory requirement. HALO's most important contribution to BARGRAPH is the ability to operate with any combination of the dozens of screen and printer drivers that HALO offers.

Rasters, Pixels, Vectors, Palettes -- Elements Of The Graphic Environment

Graphics objects may be constructed from pixels or vectors. A pixel (or Picture Element) is the smallest resolvable discrete point on a graphics device. Graphics devices addressable only by pixels are known as raster devices. The resolution of a raster-type graphics card or mode is expressed in pixels. For example, the minimum resolution of the IBM EGA card is 640 columns x 350 rows of pixels. In the special case of a monochrome display, a pixel directly corresponds to a single bit in display memory. Color displays require more than one bit per pixel to describe the color of the pixel. For example, the IBM EGA card uses four bits per pixel to produce a total of 2^4 = 16 colors.

In contrast, vectors are line segments defined by a starting point, direction, and length. Although every raster device can display vectors, vector devices do not have bitmaps and cannot display pixels, as such. For example, a pen plotter typically has no knowledge of vectors it has already drawn. Certain hybrid graphics devices, such as the Control Systems Artist, accept both raster and vector data.

Every graphics device, raster and vector, has a finite set of discrete displayable colors called the palette.
On color devices, each pixel is displayed in the color corresponding to its index in the palette. For example, the IBM EGA has a palette of 16 colors out of 64 available. Fig. 2 shows a portion of an example IBM EGA palette: a pixel with index of 15 would be bright white (all bits set) whereas a pixel with index of 3 would be dull red (only 1 red bit set). The most flexible graphics devices support a programmable palette. Programmable palette devices allow you to specify integer values for the amount of Red, Green, and Blue (RGB) components of each color. For example, the Number Nine Revolution in 832 x 624 resolution has a palette of 16 colors. Each index of the palette has 256 possible values for each RGB color component. Coordinate Systems The HALO package supports three different types of coordinate systems: device coordinates, world coordinates, and normalized device coordinates. Which system you choose depends entirely on your requirements for device-independence. A summary of the coordinate systems is presented in Fig. 10. HALO provides functions to convert between any of the coordinate systems. The device coordinate system maps each logical coordinate directly to its physical coordinate or pixel. In the device coordinate system, the upper-left corner of the screen is at (0,0) and the lower-right corner is at the maximum coordinate. For example, on the Hercules Monographics card with a resolution of 720 x 350 the upper-left corner is (0,0) and the lower-right hand corner is (719,349) (see Figure 3). In HALO, device coordinates have the advantage that they can be expressed in integers rather than floats. Since device coordinates are dependent on the resolution of the output device you use, they are a poor choice for writing portable applications. The world coordinate system allows you to specify your own resolution independently of the hardware. 
This coordinate translation means that even though the Hercules and IBM CGA cards have different heights and widths, your program can operate exactly the same for both of them. When enabled, HALO will translate from world coordinates to device coordinates automatically. For example, if you were to define the world coordinates from (-100.0,-100.0) to (100.0,100.0) then a reference to (0.0,0.0) would map to the center of the display. World coordinates assume a Cartesian orientation. In HALO, world coordinates are expressed as floats rather than integers. The BARGRAPH program uses a world coordinate system from (0.0,0.0) to (1.0,1.0).

Normalized Device Coordinates (NDCs) are another way of mapping from logical coordinates to physical coordinates. NDCs are like device coordinates because the upper-left corner is always the origin of the screen (see Figure 4B). NDCs differ from device coordinates in that the location of the lower-right corner of the screen is always the same regardless of the actual output device being used. The only difference between NDCs and world coordinates is that the upper-left and lower-right corners are fixed at (0.0,0.0) and (1.0,1.0) respectively in the NDC system.

NDCs are used in the HALO function setviewport() to allow viewports (i.e. windows) to be nested in a device-independent way. A viewport is a region of the display into which graphics are mapped. By default, the viewport includes the entire screen from (0.0,0.0) to (1.0,1.0) in NDCs. After setting a viewport, all graphics calls in world coordinates will map into the new viewport. Only one viewport can be in effect at any time. For example, to put a viewport in the upper-right hand quadrant of the screen you would specify (0.5,0.5) and (1.0,1.0). Figure 5 shows a bargraph mapped into the upper-right quadrant specified.
Figure 1

Figure 2 Example Palette for IBM EGA

Figure 3 Example Device Coordinates

Figure 4a Example World Coordinates

Figure 4b Normalized Device Coordinates

Figure 5

Figure 6 HALO '88 Graphics Text Summary

    Text Type     Display Quality   Drawing Speed   Display Flexibility
    Stroke text   High              Slow            High
    Dot text      Medium            Medium          Medium
    Fast text     Low               Fast            Low

Figure 7 Example BARGRAPH data file

    COMMENT this is a test of the bargraph application
    DEVICE HALOHERC.DEV
    MODE 0
    PRINTER HALOEPSN.PRN
    ATTRIBUTES -1,-1,0,0,0,0,0,0,-1,0,0,-1,1,-1,-1,-1,0
    FONT HALO104.FNT
    COLORS 1,2,3
    LEGEND 1989 Projected Sales
    TITLES Jan,Feb,Mar,Apr,May,Jun,
    TITLES Jul,Aug,Sep,Oct,Nov,Dec
    SCALE 0.0,200.0
    DATA 10.0,42.0,130.0,80.0,54.3,140.0
    DATA 180.0,135.0,300.0,69.0,94.7,101.0
    END

Figure 8 Complete BARGRAPH Command Syntax

    COMMAND                 MEANING                                  DEFAULT
    -------                 -------                                  -------
    COMMENT                 Documentation only                       N/A
    DEVICE s1               Name of HALO screen device               HALOIBMG.DEV
    PRINTER s1              Name of HALO printer device              none
    FONT s1                 Name of HALO stroke font to use          HALO104.FNT
    LEGEND s1               Legend is centered over top of graph     none
    TITLES s1,s2...sn       Titles are displayed underneath BARs     none
    MODE v1                 Graphics mode (device dependent)         0
    ATTRIBUTES v1,v2...vn   Printer attributes (device dependent)    none
    SCALE v1,v2             Set extent of Y-Axis from v1...v2        autoscaling
    DATA v1,v2...vn         Input n data values (may be repeated)    none
    COLORS c1,c2...cn       Color pattern to use for bars            monochrome hatch
    END                     Signifies end of a graph                 N/A

Figure 9 Initialization of HALO '88 in setup_halo_globals()

    setdev(halo.device);                            /* Initialize the graphics device */
    setdegree(&halo.degree_mode);                   /* Use degrees, not radians */
    setworld(&halo.x1,&halo.y1,&halo.x2,&halo.y2);  /* World rectangle */
    setlnwidth(&halo.lnwidth);                      /* Line width is 1 pixel */
    setlnstyle(&halo.lnstyle);                      /* Line style is solid */
    setcolor(&halo.maxcolor);                       /* Max screen color is usually white */
    setfont(halo.font);                             /* Load font from disk file */
    setstclr(&halo.maxcolor,&halo.maxcolor);        /* Set stroke text color */
Figure 10

    Coordinate System     Upper-left Corner   Lower-right Corner   Hardware Independence
    Norm. Device Coord    0.0,0.0             1.0,1.0              Yes
    World Coordinates     User-defined        User-defined         Yes
    Device Coordinates    0,0                 Hardw. depend.       No

Figure 11

A Small Prolog Interpreter

Lindsey Spratt

Lindsey Spratt is a graduate student in computer science at the University of Kansas, concentrating in artificial intelligence. He received a B.S. in mathematics from MIT. He worked developing the Multics operating system for seven years, and then developed CASE tools and researched program understanding (for specification recovery). A logic programming devotee, most of his work in recent years has been done in Prolog.

Introduction

Small Prolog 1.32 by Henri de Feraudy is a minimal-featured public domain implementation of a Prolog interpreter which uses the Cambridge (Lisp-like) syntax (CUG 297). The source is provided, as well as makefiles for MS-DOS, Sun and Atari. An executable file named SPROLOG.EXE is provided for the MS-DOS environment. The distribution also includes 11 files of Prolog examples and documentation. I ran Small Prolog under MS-DOS emulation -- SoftPC on a Macintosh IIx. It ran without any problems and was easy to use.

The Question Of Syntax

There is no official standard for Prolog. However, as is true for some other languages without official standards (e.g., Common LISP), there is a de facto standard known as Edinburgh syntax Prolog (a variant developed at the University of Edinburgh, Scotland). Most Prolog texts use the Edinburgh syntax or some close variant. Nearly all commercial implementations of Prolog are Edinburgh syntax. Cambridge syntax represents everything as parenthesis-delimited lists, giving it a very LISP-like appearance. The only commercial implementation which used Cambridge syntax has completely converted to Edinburgh syntax (for a while, this implementation supported both syntaxes).
In Edinburgh syntax, a predicate to relate an element to a list containing that element is:

    member(Element, [Element | ListTail]).
    member(Element, [IgnoredListHead | ListTail]) :-
        member(Element, ListTail).

In Cambridge syntax, this same predicate is:

    ((member Element (Element | ListTail)))
    ((member Element (IgnoredListHead | ListTail))
        (member Element ListTail))

In the Small Prolog documentation, Feraudy lists "improve the syntax" as one of the projects you might undertake.

There are 11 files of example Small Prolog programs. Considered as a whole, they provide a nice tutorial introduction to Prolog.

According to the documentation, Feraudy set out to meet these design goals:

    A minimal usable implementation.
    Maximum portability.
    Educational code.
    Extensibility.
    A small object code.
    Embeddability.
    Facilitate meta-programming.

Small Prolog meets most of these goals fairly well. The implementation is usable for executing small Prolog programs. It is minimally usable in that the programs it supports should be less than a hundred clauses, use only modest amounts of recursion, and do only simple arithmetic (if any). The code is portable, extensible, small, and embeddable, and it supports meta-programming. The support of meta-programming is important to provide a feel for how one programs in (real) Prolog. Unfortunately, the source is very lightly commented, a lamentable condition regardless of purpose, but particularly unfortunate when the code is intended to be studied.

Small Prolog provides complete support of the logic programming paradigm. I was particularly pleased to see that Small Prolog supports untyped data and can handle incomplete data structures. Incomplete data structures are extremely useful in logic programming.

Small Prolog has some of the common Prolog extensions, but lacks others. Small Prolog is unusual in requiring all arithmetic to be either integer or real -- no mixed type arithmetic is supported.
Further, the programmer must choose the correct arithmetic procedures based on the type of the arguments.

The Small Prolog debugging environment is incomplete. The common facilities (found, for instance, in Quintus Prolog, C-Prolog, and LPA Prolog) allow the user several choices when stepping through the execution of a program. These choices are generally called abort, retry, fail, exit, skip, leap, and continue. Small Prolog appears to provide only abort and continue. Most full commercial implementations, unlike Small Prolog, also allow the programmer to set spy points on selected procedures. When tracing, the debugger starts stepping when it encounters a spy point. The leap command directs the debugger to leap to the next encounter of a spy point. The skip command directs the debugger to skip to the exit of the current procedure call (ignoring spy points encountered on the way). Small Prolog also does not include a portray procedure, which allows the user to define how terms are printed during a trace. The portray procedure is useful since the data in the arguments of the goals is commonly large and complexly structured.

Performance

To test Small Prolog's capabilities, I used it to solve the classic N Queens problem: Given a square board of N by N cells, find a distribution of (chess) queens on the board such that no two queens attack each other. Two queens attack each other if they are on the same row, the same column, or the same diagonal.

The MS-DOS version of Small Prolog can solve up to 9 queens, which is a relatively small number. The limit is due to the extensive recursion in the program. This test convinced me that Small Prolog is not useful for any kind of application development (it's too slow and its memory limitations are too severe). These limitations make it unlikely that Small Prolog could be successfully embedded in any non-trivial application.
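For readers more at home in C than in Prolog, the attack rule and the depth-first search with backtracking that it drives can be sketched as follows. This is not the article's test program (which is written in Small Prolog) and says nothing about Small Prolog's limits; the function names queens_attack, place_from, and solve_queens are hypothetical, and the board is indexed from 0 as an assumption of convenience.

```c
#include <assert.h>
#include <stdlib.h>

/* Two queens at (x1,y1) and (x2,y2) attack each other if they share
   a column, a row, or a diagonal. */
int queens_attack(int x1, int y1, int x2, int y2)
{
    return x1 == x2 || y1 == y2 || abs(x1 - x2) == abs(y1 - y2);
}

/* Place queens in columns col..n-1, one per column.
   ys[0..col-1] already hold the rows chosen for the earlier columns.
   Returns 1 if a complete placement was found, 0 otherwise. */
int place_from(int n, int col, int ys[])
{
    int row, prev, ok;

    if (col == n)
        return 1;                      /* every column has a queen */

    for (row = 0; row < n; row++) {
        ok = 1;
        for (prev = 0; prev < col && ok; prev++)
            if (queens_attack(prev, ys[prev], col, row))
                ok = 0;
        if (ok) {
            ys[col] = row;
            if (place_from(n, col + 1, ys))
                return 1;
            /* otherwise fall through and try the next row (backtrack) */
        }
    }
    return 0;                          /* no row works in this column */
}

/* Find one solution for an n by n board; ys must have room for n ints. */
int solve_queens(int n, int ys[])
{
    assert(n >= 1);
    return place_from(n, 0, ys);
}
```

Calling solve_queens(4, ys) fills ys with the rows 1, 3, 0, 2 for columns 0 through 3, while solve_queens(3, ys) returns 0 because no placement exists on a 3 by 3 board. The recursion depth here is only N, one frame per column.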
The Source Code

Because the code is sparsely commented, readers should have another source that describes how a Prolog interpreter works. De Feraudy mentions the books on which he based his implementation, and I recommend that you use these when studying his source. The files most critical to your understanding are PRLUSH.C and PRUNIFY.C. PRLUSH.C contains the algorithm for how procedures are executed. PRUNIFY.C contains the algorithm for matching terms (unification). The execution of procedures is a depth-first search, with backtracking. Procedures are selected by pattern-matching between the arguments of the call and the parameters of the procedure definition. The pattern-matching is done via unification.

Conclusion

Small Prolog is particularly valuable for aiding in the study of how Prolog is implemented. Because the Cambridge syntax is not used in the available teaching materials, I don't recommend it as a learning environment for the serious Prolog student, though if you are just curious about how Prolog works, you may find Small Prolog useful. This implementation is not suitable for supporting the development of applications written in Prolog. Still, this is the only Prolog product which provides the source code for the interpreter. Now, if only someone would provide the source code for a Prolog compiler.

A Crash Course In Cambridge Syntax Prolog

A Prolog program is a set of facts and rules. One executes it by posing a query to the Prolog system, which it then tries to prove using the facts and rules in the program. If there are variables involved, they are generally bound to values in the course of building a proof. The output from a query is the set of bindings of values to the variables in the query.

A query is true if there is some fact which matches it, or if there is a rule which has a head which matches the query and which has a true body. The body of a rule is true if all of the goals in it are true.
A goal is true if there is a matching fact, or if there is a rule which has a head which matches the goal and which has a true body.

To simplify the terminology somewhat, instead of using the terms fact and rule, I speak of both of these as clauses. A clause has a head and a possibly empty body. If the body is empty, the clause is a fact. If the body is not empty, then the clause is a rule. A Prolog program consists of an ordered set of clauses.

A clause is a list. The first element of the list is the head of the clause, and the rest of the elements of the clause list are goals and are known collectively as the body of the clause. A goal is a list with an atom as its first element and any term for its other elements (a goal is a special kind of list, used in a particular way). The first element is the functor of the goal; the number of elements following the functor is the arity of the goal. All of the clauses having heads with the same functor and arity are collectively known as a procedure. I frequently refer to a procedure by its functor and arity separated by a slash, /.

An atom is an extended alphanumeric (including underscore, _) token with an initial lowercase alphabetic character (e.g. foo). A variable is an extended alphanumeric token with an initial uppercase alphabetic character or an initial underscore, _ (e.g. Foo). The head of a clause has the same syntax as a goal.

There are three syntaxes for a list. In all three cases a list begins with a left parenthesis, (, and ends with a right parenthesis, ). An empty list has nothing between the parentheses (e.g. ()). A simple list has a series of tokens separated by whitespace inside the parentheses (e.g. (a b c)). A cons list (to borrow some terminology from Lisp) has a series of one or more tokens separated by whitespace starting at the left parenthesis, followed by a vertical bar, |, followed by a term, followed by the right parenthesis (e.g. (a b | Foo)).
Generally this last syntax is used when the term following the vertical bar is either an unbound variable or a list. Examples are:

    Term                 Type
    a                    atom
    atom                 atom
    a_thing              atom
    A                    variable
    Atom                 variable
    X                    variable
    _vat                 variable
    A_Variable           variable
    ()                   list (empty)
    (foo)                list
    (foo bar baz)        list
    (foo bar | X)        list (with tail of X)
    (foo bar | (baz))    list (with tail of (baz))

In the examples of lists, (foo bar | (baz)) has the exact same meaning as (foo bar baz).

In the member/2 procedure:

    ((member Element (Element | ListTail)))
    ((member Element (IgnoredListHead | ListTail))
        (member Element ListTail))

there are two clauses:

    ((member Element (Element | ListTail)))

and

    ((member Element (IgnoredListHead | ListTail))
        (member Element ListTail))

The first clause has an empty body (i.e. there are no goals in its body). Its head is (notice that one layer of parentheses has been stripped away):

    (member Element (Element | ListTail))

This head is a list of three elements: member, Element, and (Element | ListTail). The first element is an atom, the second element is a variable, and the third element is a cons list. The first element of the cons list is the variable Element. The tail of the cons list is the variable ListTail.

The second clause is similar in structure to the first, with the addition of a goal in its body. The functor and arity of the goal are the same as the functor and arity of the clause head, thus this is a recursive procedure.

The interpreter prompts the user for a query with ?-. In the example uses of member/2 below, the ?- is provided by the system.

    ?- (member a (a b))
    Yes
    ?- (member b (a b))
    Yes
    ?- ((member X (a b)) (display X) (nl) (fail))
    a
    b
    No

In the last example display/1 is a procedure to display its argument, nl/0 is a procedure to print a newline, and fail/0 is a procedure which always fails. Failure in Prolog forces the preceding (successful) goal to try to find another solution.
If it succeeds in finding another solution, then the rest of the goals (starting with the one which failed) are re-executed. If it fails, then its preceding goal is retried.

Meta-programming

Meta-programming refers to writing procedures which use procedures as data. For example, an if procedure can be written via meta-programming:

    ((if Test ThenGoals ElseGoals)
        (Test)
        (cut)
        (ThenGoals))
    ((if _ _ ElseGoals)
        (ElseGoals))

This procedure can be used as follows:

    (if (iless X 3)
        (writes "Less than 3")
        (writes "Greater or equal to 3"))

The data-like procedures in the if procedure are the three arguments, Test, ThenGoals, and ElseGoals. Each of these arguments is called in the bodies of the two if clauses. In the example use of if, Test is (iless X 3), ThenGoals is (writes "Less than 3"), and ElseGoals is (writes "Greater or equal to 3").

The Test Program

This is the N queens program used to test the interpreter's performance. The program is invoked by:

    (queens N Solution)

where N is the number of queens (and the size of the board) and Solution is the resulting positions of the queens.

    /* (queens +N -Positions)
       queens/2 is the main procedure for solving the N-queens problem.
       N is input, the number of queens for which a solution is desired.
       Positions is output, it is the list of the positions of the N
       queens such that they don't attack each other. */
    ((queens N Positions)
        (template N Positions)
        (solution N Positions))

    /* (solution +N +Positions)
       N is input, the number of queens for which a solution is desired.
       Positions is partially instantiated as input and fully
       instantiated as output, it is the list of the positions of the N
       queens such that they don't attack each other. On input, each
       position has only its X value instantiated. On output, the Y
       value is also instantiated. */
    ((solution N ()))
    ((solution N (Pos | Others))
        (solution N Others)
        (pos_y Pos Y)
        (between 1 Y N)
        (noattack Pos Others))

    /* (noattack +NewPosition +EstablishedPositions)
       NewPosition is input, a new queen position to check against the
       list of established queen positions.
       EstablishedPositions is input, a list of established queen
       positions to check against the new queen position. It is known
       that there are no attacks among the EstablishedPositions. */
    ((noattack _ ()))
    ((noattack NewPos (CheckPos | Others))
        (pos NewPos NewX NewY)
        (pos CheckPos CheckX CheckY)
        (not (eq NewY CheckY))
        (iminus CheckY NewY DiffY)
        (iminus CheckX NewX DiffX1)
        (not (eq DiffY DiffX1))
        (iminus NewX CheckX DiffX2)
        (not (eq DiffY DiffX2))
        (noattack NewPos Others))

    /* (member ?X ?List)
       X is input or output, it is a term in the List.
       List is input or output, it is a list which contains the term X. */
    ((member X (X | L)))
    ((member X (_ | L))
        (member X L))

    /* (template +N -Positions)
       N is input, it is the number of queen positions.
       Positions is output, it is a list of partially instantiated queen
       positions. The X value is instantiated and the Y value is unbound.
       This procedure is used to create the position template which is
       used by solution/2. */
    ((template 0 ()))
    ((template N (Position | OtherPositions))
        (iless 0 N)
        (pos_x Position N)
        (iminus N 1 NextN)
        (template NextN OtherPositions))

    /* The following 3 procedures are for accessing the pos data
       structure, which is used to describe the positions of the queens. */
    ((pos (position X Y) X Y))
    ((pos_x (position X _) X))
    ((pos_y (position _ Y) Y))

    /* (between +Low -Middle +High)
       Low is input, it is the low value in an integer-valued interval.
       Middle is output, it is an integer value between the Low and High
       values.
       High is input, it is the high value in an integer-valued interval.
       between/3 can be used to generate and test integers between Low
       and High, until an integer is found which is satisfactory. */
    ((between L L _))
    ((between L M H)
        (iless L H)
        (iplus L 1 NextL)
        (between NextL M H))

Understanding C

Harold C. Ogg

This article is not available in electronic form.

Publisher's Forum

This issue debuts a redesigned CUJ. Don't panic, we've only changed some of the artwork -- the editorial focus and content remain the way you like them. We have, however, reorganized the mast and table of contents. We've standardized the treatment of columns and made the artwork for each of uniform size. Our staff artist Susan Buchanan has designed little icons for each department and refined the illustrations with each column. In short, in keeping with our general philosophy, we've made a lot of incremental changes, but the end effect should be a more attractive, accessible, and manufacturable product.

We had meant to spring this redesign fully-grown with this issue. Unfortunately Susan and Howard -- half our editorial staff -- have been ill for a major portion of this cycle. As a consequence, there remains some little tweaking to finish the project.

Periodic redesigns are one of the unavoidable "passages" for a magazine. While the eager j-school graduates will contend fervently that a "fresh" and successful design will increase sales and attract readers, jaded old editors like myself learn to relate to redesigns much as experienced husbands relate to home redecoration. Sure, it's nice to change your environment once in a while, but it's certainly not nirvana -- and there's certainly nothing to be gained by placing the sofa in the kitchen.

So, don't expect to find anything that will replace your Van Gogh collection. Even so, I think Susan's icons are excellent work -- and the table of contents (the collective work of Ann, Susan and Howard) is a vast improvement. We hope you find the new design cleaner, easier to use, and just as informative as ever.
Sincerely yours, Robert Ward New Products Industry-Related News & Announcements Interactive Bundles LPI C LPI and Interactive Systems Corporation have signed an agreement in which Interactive will be bundling LPI's ANSI-C development environment with the 386/ix Software Development System v2.2. LPI has also signed an agreement with Sequoia Systems, Inc., in which LPI will port its COBOL, NEW C, and CodeWatch language products to the Sequoia Series 300, a UNIX-based fault-tolerant system based on Motorola's 68030 processor. Sequoia will have marketing and distribution rights to LPI's COBOL, NEW C and CodeWatch products. Sage Acquires Plink86 Rights Sage Software, Inc. has acquired the exclusive worldwide marketing and source code development rights to Plink86+ from Phoenix Technologies, Inc. of Norwood, MA. Plink86+ operates on PC/XT/AT, PS/2 or compatibles, running MS-DOS v3.0 or higher with 256K of system memory. Plink86+ retails for $495. For more information, contact Sage Software at (800) 547-4000, (503) 645-1150 or FAX (503) 645-4576. Z-World Releases Z80 Compiler Z-World has released FLASH C, a C programming tool for the Z80/HD64180/Z180 microprocessors which includes a compiler and source level debugger. FLASH C enables the programmer to edit, compile, debug, and run in one integrated environment. A 200 line C program will compile in approximately eight seconds. For more information, contact Z-World at 1340 Covell Blvd., #101, Davis, CA 95616 (916) 753-3722; FAX (916) 753-5141. New SCO UNIX Supports 486/25 The Santa Cruz Operation, Inc. has released the SCO UNIX System V/386 v3.2 Operating System; a multiprocessing extension, SCO MPX; and a graphical user interface, JSB MultiView DeskTop. SCO UNIX System V/386 v3.2 and Open Desktop will support the new IBM PS/2 486/25 Power Platform for the Model 70-A21. SCO has also demonstrated new SCO UNIX System technology that enables applications to run on the new Intel i860-based IBM PS/2 Wizard Adapter.
Multi-user, multi-tasking SCO UNIX System V/386 v3.2 is an AT&T-licensed implementation of UNIX System V/386 v3.2 for 386- and 486-based computers. It complies with POSIX and X/Open standards and is designed to meet the U.S. government's Department of Defense C2-level security requirements. It runs both XENIX and UNIX System-based applications, and will also run MS-DOS applications when combined with SCO VP/ix. Open Desktop is SCO's graphical operating system. Based on SCO UNIX System V, it includes a relational database, networking, X windows with OSF/Motif, and MS-DOS compatibility. SCO MPX is a multiprocessing extension to the SCO UNIX System V/386 v3.2 Operating System and to SCO's new Open Desktop. SCO MPX is based on multiprocessing software technology developed by Corollary, Inc. Through a joint development agreement between Corollary and SCO this technology was adapted to become the standard multiprocessing extension for SCO Operating Systems. In addition to OEM designs based on the Corollary 386/smp and 486/smp, SCO MPX supports the Apricot MC486, the Compaq Systempro, the Mitac Series 500, and the Zenith Z1000. SCO MPX will install on any supported computer running SCO UNIX System V/386 v3.2 or SCO's Open Desktop, and will support one additional CPU per package. In addition to the SCO UNIX System V/386 Operating System, as many as 15 SCO MPX packages can be utilized on a single machine, thereby supporting 16 total CPUs. Modifications are installed into the SCO UNIX System kernel to support symmetrical, closely coupled multiprocessing. Each CPU runs simultaneously from a single SCO UNIX System, processing the system and user tasks in priority order. This automatically balances the load across all CPUs. On supported hardware, there is nearly a linear increase in overall system CPU throughput with each added CPU, maximizing the total available computing resources of the system. 
JSB MultiView DeskTop is a graphical user interface for MS-DOS users who want to share data and files with SCO XENIX and SCO UNIX Systems on a network. JSB MultiView DeskTop enables users to connect any 286- or 386-based PC running Microsoft Windows to an SCO XENIX or SCO UNIX System-based host via a direct RS232 connection or a local area network. The JSB MultiView DeskTop enables users to choose from local MS-DOS applications or remote SCO XENIX or SCO UNIX System applications. When selected, these applications appear in concurrently running windows on the PC. Users can transfer files and "copy and paste" data among the discrete MS-DOS, XENIX, and/or UNIX System applications. When used in a "mixed" environment, JSB MultiView DeskTop lets users protect their investment in third-party MS-DOS software and training while taking advantage of all the features of the multi-user UNIX System, including electronic mail and shared programs such as databases. SCO MPX will be available through all SCO distribution channels in the first quarter of 1990, and will list for $895. JSB MultiView DeskTop is $149 for a one-user license, $495 for a five-user license, and $795 for a ten-user license. For more information, contact The Santa Cruz Operation, Inc., 400 Encinal St., PO Box 1900, Santa Cruz, CA 95061 (408) 425-7222. TSR Library For Turbo, Microsoft C Microsystems Software Inc. has released CodeRunneR, an optimized library for creating Terminate-and-Stay-Resident (TSR) programs with full MS-DOS access, using Borland's Turbo C and Microsoft C. With CodeRunneR, all program initialization code and data is eliminated when the program goes resident. Contact Microsystems Software Inc., 600 Worcester Road, Framingham MA 01701 (508) 626-8511; FAX (508) 626-8515. Clarion Offers LEM Maker Clarion Software Corp. released the Clarion LEM Maker, a collection of tools for creating LEMs from object modules written in Borland International's Turbo C. 
Priced at $199 retail, the Clarion LEM Maker includes a Clarion program that creates an assembler language interface between Clarion and Turbo C code, an extensive library of C functions and two sample LEMs. Clarion has also released the ZIP Code Language Extension Module (LEM), which retails for $199 and offers library and data files for creating software applications that retrieve, check, and manipulate ZIP codes and other information referenced by ZIP codes. For more information contact Clarion Software Corp., 150 E. Sample Road, Pompano Beach, FL 33064 (305) 785-4555; FAX (305) 946-1650. Netwise Offers University Grants Netwise, Inc. has formed the Netwise University Grant Program and the Netwise University Discount Program. The grant program entitles qualified researchers to receive free Netwise software development products. The discount program allows all university applicants to receive a 75 percent discount on the price of Netwise products. For more information contact Wayne Moore at Netwise, (303) 442-8280. New B-Tran Now Available Software Translations Inc. (STI) has released v7.5 of its B-Tran Basic to C translator, which translates QuickBASIC v4.5 to C source code. B-Tran v7.5 is priced from $499 for the Microsoft C compiler under MS-DOS. For more information contact Software Translations, Inc., The Carriage House, 28 Green Street, Newburyport, MA 01950 (508) 462-5523; FAX (508) 462-9198. dAnalyst Code Manipulates dBase Files dAnalyst for C is the latest addition to the dAnalyst product line from Buzzwords International. The product includes an application generator which allows users to create C applications, such as pop-up and pull-down menus and AT SAY-GETs, with the look and feel of dBase IV. The generated code calls a set of high-speed video libraries compatible with MS-DOS, Xenix, UNIX, Desqview, the Apex ADL Library and the Lattice DBC Library. dAnalyst's Report Writer and Source Code Generator generates C to process reports on dBase data files.
The libraries that work with the application generator also work with the report writer. The report writer is also fully relational with dAnalyst's Screen Painter. The Windowing Editor allows users to split their screen, then cut-and-paste code from one side to the other. dAnalyst can convert any single-user dBase application to multi-user, without any recoding. The dAnalyst Series supports dBase III Plus, dBase IV, FoxBase+, Nantucket's Clipper, QuickSilver and C, as well as Xenix and UNIX (under 8086, 68000 or '386 platforms). For more information, contact Buzzwords International at 2879 Hopper, Cape Girardeau, MO 63701 (314) 334-6317. Utility Structures FORTRAN Code Cobalt Blue has released FOR_STRUCT v1.1, a structuring utility that transforms spaghetti FORTRAN-IV and FORTRAN-77 into fully structured code, with or without VAX and FORTRAN-8X extensions. FOR_STRUCT is available for MS-DOS, Xenix/UNIX/386, Sun-3 and Sun-4. The Sun-3 and Sun-4 versions are priced at $1850, the Xenix/UNIX/386 version at $1450, and the MS-DOS version at $825; all include two months of free technical support and upgrades. Contact Cobalt Blue, 2940 Union Ave., Suite C, San Jose, CA 95124 (408) 723-0474. Source Level Debugger Works With Aztec, Z180 ICE Softaid has released v2.0 of its Source Level Debugger (SLD), which supports Aztec C, and an in-circuit emulator for the Zilog Z180 microprocessor, the Z180 IceAlyzer. Softaid's SLD is a source debugger for the firm's line of in-circuit emulators. Running on a PC, it gives the user a multi-window emulator interface. The entire state of the user's target system is shown, including the registers, disassembled code, stack, memory, and I/O. The debugger shows the user's program, whether written in C or assembly language, in its own source window. All facets of Aztec C are supported. The debugger will display any C variable using the type (integer, character, float, etc.) defined in the program.
Variables local to a function are all automatically shown in the "Watchpoint" window. The Source Level Debugger is compatible with all of Softaid's emulators. Both 8- and 16-bit versions of the compiler are completely supported. The Z180 IceAlyzer costs $3090 and the Source Level Debugger is $795. Both are available from stock. For more information contact Softaid, Inc., 8930 Route 108, Columbia, MD 21045 (301) 964-8455 or (800) 433-8812; FAX (301) 596-1852. C++ Class Library Does Matrix Ops Rogue Wave has released a set of object-oriented numerical tools that extend all of the standard C arithmetic operators to include vectors and matrices. Extended versions of the standard C math functions (e.g. cos and abs) are also included. Many new functions for statistics and numerical modeling applications have also been provided. A complete complex number class is included. Fast Fourier Transform server classes allow you to take the FFT or inverse FFT of any length series (real or complex). Using the inheritance property of C++, new classes can be created from the Rogue Wave classes to do specialized tasks. The classes compile under a variety of C++ compilers under both MS-DOS and UNIX. A complete 120-page User's Guide and Reference Manual is provided and full source code is included for $150. For more information, contact Rogue Wave, PO Box 85341, Seattle, WA 98145-1341 (206) 523-5831. UNIX Version 4 Now Available UNIX International, Inc. and AT&T's UNIX Software Operation have released UNIX System V, v4. The new release unifies the UNIX System installed base, providing upward compatibility for more than 80 percent of current UNIX System installations. Version 4's primary advantages are compatibility, portability of software from platform to platform, interoperability of software between heterogeneous systems and scalability from PCs to mainframes. Contact UNIX International, Waterview Corporate Centre, 20 Waterview Boulevard, Parsippany, NJ 07054 (201) 263-8400.
HCR Releases C++ For SCO UNIX HCR/C++ for SCO UNIX includes a C++ compiler that is compatible with AT&T's v2.0 of C++, and a window-based source level debugger, dbXtra, based on dbx from Berkeley v4.3 BSD. HCR/C++ provides type safe linkages, default memberwise initialization, and the ability of each class to define its own operators. It will run on most 386-based system platforms. HCR's dbXtra adds the ability to operate through windows, permitting users to review their output and source code easily, even on standard terminals. In HCR/C++, dbXtra is linked to C++, allowing direct debugging of C++ and window access to the translated C source code. Because all C++ code is translated into C before execution, programmers can apply dbXtra to examine either C or C++ code during debugging. HCR/C++ is $995. Each user of HCR/C++ v1 will also have the option to upgrade to v2 for $99 delivered. For more information, contact HCR Corp. at (416) 922-1937, 130 Bloor Street West, 10th Floor, Toronto, Ontario M5S 1N5. JAM Supports VAX Rdb/VMS JYACC's JAM and JAM/DBi front end tools are now available to VAX/VMS users. VMS versions of these tools can now be used to design and develop applications using Rdb as their database. Applications developed with JAM are portable across 50 hardware platforms and 10 operating systems. The JAM and JAM/DBi development kit is $990 for PCs running MS-DOS. For more information contact JYACC, 116 John Street, New York, NY 10038 (212) 267-7722; FAX (212) 608-6753. Enhanced MIRACL Library Released As Commercial Product After a previous existence as Shareware, a more polished v3.0 of the MIRACL library is now available as a commercial product. This package allows C or C++ programmers to use multiprecision integer and fractional data-types in their programs. All routines in the basic MIRACL library are written in standard portable C, and the source code is included. A full C++ Interface is provided.
The MIRACL library has been successfully implemented on the IBM PC, the Apple Macintosh, Acorn Archimedes and Digital VAX machines, using a variety of compilers. MIRACL is only available on PC compatible diskettes, but can be uploaded from there to any computer which supports a full C compiler. The PC version needs MS-DOS v2.1 or higher, with a minimum of 256K of memory and a 360K floppy disk drive. Introductory Price is IR£50-00 Irish pounds (£45-00 Sterling) for the PC version on two 5¼ inch diskettes, with full documentation. For more information, contact Shamus Software Ltd., 94 Shangan Road, Ballymun, Dublin 9, Ireland Tel: 425430. Oregon C + + New On VAX Oregon Software, Inc., has released Oregon C + + for VAX/VMS. Major features include a souree-level debugger, the NIH OOPS class library, and support of shared libraries and VAX C calling sequence. Oregon C + + can call any DEC language as well as Oregon Software's C, Pascal-2 and Modula-2. Oregon C + + for VAX/VMS runs on VMS v5.0 and later. The Oregon C + + compiler also includes an ANSI C and K&R C compiler at no additional charge. The release will include complete compatibility with v2.0 of AT&T's cfront and a task library. License fees in the US range from $2000 to $34,000 depending on machine, cluster or network configuration. Contact Oregon Software, Inc. at 6915 S.W. Macadam Ave., Suite 200, Portland, OR 97219 (503) 245-2202; FAX (503) 245-8449. Blaise Updates C Tools Plus Function Library Blaise Computing Inc. has released C Tools Plus v6.0, a library product for Microsoft C. 
The Library includes virtual, stackable menus and windows with full mouse support and optional "drop shadows"; multiple virtual pop-up help screens; a miniature multi-line editor for gathering user responses; a single function call which can move, resize, and promote a window or menu on top of all others; the ability to update covered windows automatically when they are written to; support for EGA, VGA, and MCGA text modes including 30-, 43-, and 50-line modes; and support for the enhanced (101/102 key) keyboard. C Tools Plus requires Microsoft C v5.0 or later or QuickC, v2.0 or later. The mouse functions require a Microsoft-compatible mouse and its driver software. C Tools Plus v6.0 is priced at $149. Blaise Computing is located at 2560 Ninth Street, Suite 316, Berkeley, CA 94710 (415) 540-5441; FAX (415) 540-1938. SilverWare Offers Async Library SilverWare, Inc. has released the SilverComm C Async Library, an asynchronous communication library for C programmers. The library comes with over 125 communication and 40 advanced functions, and includes free source code, comprehensive demo and the Norton Guide Database. Documentation provides an example for each function and includes data sheets for 8250, 16450 and 16550 UARTS. This royalty-free library links directly to your application and supports all popular C compilers. SilverComm C Async Library is $249. Contact SilverWare, Inc. at 3010 LBJ Freeway, Suite 740, Dallas, TX 75234 (214) 247-0131; FAX (214) 406-9999. Design/OA Now Works With X Windows Meta Software Corporation has released a UNIX X-Window system version of Design/OA, a custom CASE application available under X. Design/OA is a graphics application development tool designed to build system modeling tools. It can be used to develop graphically-based CASE, CAD/CAE, object-oriented programming, or simulation applications. It can also be used to create graphical front-ends to a database, code generator, telecommunications network, or data processing system. 
According to Meta Software's Chief Technical Officer Alan Epstein, the X Window version is "X source code compatible and should run on all compatible UNIX workstations with little or no modification. We plan to release optimized versions for other UNIX workstations including IBM, DEC, and HP/Apollo over the next several months." Design/OA supports multiple windowing, and offers the developer full control of the application's "look and feel" through customization of the graphical interface, menus, dialogs, icons, commands and reports. It retails for $15,000, and is distributed directly through Meta Software. Both the Apple Macintosh and IBM PC versions sell for $7,500; volume discounts are available on request, and special prices are offered to educational institutions. Contact Meta Software at 150 CambridgePark Drive, Cambridge, MA 02140 (617) 576-6920. New Asynch Manager Adds File Transfer Protocols Blaise Computing Inc. has released C Asynch Manager v3.0 and Asynch Plus v5.0, upgrades to its communications toolkits for C and Pascal programmers. These new versions add features in two main areas: modem control and file transfer. The new modem control routines let programs talk to multiple modems, supporting the features and peculiarities of each simultaneously. The new file transfer capabilities include 1K packets, CRC error checking, true Y-Modem (multi-file transfers with file name and size and XMODEM preserved), auto switching to incoming packet size and error detection method. The file transfer routines have been designed to support background file transfers, and multiple files may be sent or received simultaneously over multiple ports. C Asynch Manager v3.0 requires Microsoft C v4.0, v5.0, or v5.1 or Turbo C v1.5, or v2.0. Asynch Plus v5.0 requires Turbo Pascal v4.0, v5.0, or v5.5 or QuickPascal. The price of each package is $189. For more information contact Blaise Computing, 2560 Ninth Street, Suite 316, Berkeley, CA 94710 (415) 540-5441; FAX (415) 540-1938. 
QNX Gets Window System Quantum Software Systems Ltd. has released QNX Windows, a graphical user interface environment for its QNX operating system. QNX Windows' dialog manager handles all the basic interactions that take place between the user and the system. As an integral component of the operating system itself, QNX Windows is server-based. QNX Windows can execute in parallel on the same node in the QNX LAN or on remote nodes. For more information, contact Quantum Software Systems, Ltd., 175 Terrence Matthews Crescent, Kanata, Ont., Canada K2M 1W8 (613) 591-0931; FAX (613) 591-3579. BSO Offers 8051 Tool Set BSO Inc. has released a tool set for the 8051, which is available to run on a number of different host platforms. Prices start at $1700 for compiler and assembler packages. For more information contact BSO at 411 Waverley Oaks Road, Waltham, MA 02154-8414 (800) 458-8276 or (617) 894-7800. Quad Offers SQL Tools Quadbase Systems, Inc. has released Quadbase-SQL, a relational database management system, and dQuery v3.0, an interactive query management system. Quadbase-SQL is $795, with no royalties. dQuery v3.0 is $195. Both Quadbase-SQL and dQuery v3.0 require MS-DOS v3.1 (or above), 640K of RAM and hard disk. They run on any MS-DOS v3.1 compatible LAN system. For more information contact Quadbase Systems, Inc., 790 Lucerne Dr., Suite 51, Sunnyvale, CA 94086 (408) 738-6989. Building Block Releases New QuickGeometry Building Block Software has released QuickGeometry Library v1.02, a collection of math subroutines for developing programs for CAD/CAM, parametric design, NC programming, post processing, finite element analysis and GIS. QuickGeometry Library v2 improves the documentation and adds 3D display. It is $199 and includes C source code, object code for MS-DOS, documentation, working example programs, one hour of telephone support, and a 30-day money-back guarantee. 
For more information contact Building Block Software, PO Box 1373, Somerville, MA 02144 (617) 628-5217. Aspirin Includes Code Generator And Utilities Arrowhead Software has released Aspirin, a C development toolkit with libraries to support forms, windows, time and date manipulation, text manipulations, database management (ISAM), and menus. The package also includes a code generator, which provides facilities for adding, deleting, and modifying fields, text, lines and boxes. It also provides complete control over placement, attributes, and the extended character set. The Programmer's Utility package contains programs to aid in the development and enhancement of programs with features such as text search and replace, C function finder, and print utilities. The special ClipCode disk, sent quarterly, provides source code for extending and enhancing the basic library. Aspirin is available from Arrowhead Software for the introductory price of $250. All source code for the Aspirin libraries is included. For more information, contact Arrowhead Software, 7500 W. Mississippi Suite 201, Lakewood, CO 80226 (303) 922-1300. Zortech Implements C++ Version 2.0 Zortech Inc. has announced its C++ v2.0 Developer's Edition for MS-DOS. The C++ Developer's Edition, fully compatible with the AT&T 2.0 specification, supports multiple inheritance and type safe linkage and has built-in support for EMS. The Developer's Edition includes a C++ source level debugger, source code for the runtime library, and Zortech's C++ tools v2.0. Each of these components may be purchased separately as well. Zortech C++ v2.0 is compatible with Microsoft Windows and its portability to other C environments (including Microsoft C) has been enhanced. Version 2.0 features a set of graphics classes and a TSR library that can make many applications resident through a simple function call. Zortech's C++ Developer's Edition sells for $450. The compiler itself can be purchased separately for $199. 
The other components of the Developer's Edition, including the new debugger, the runtime library source code, and version 2.0 of C++ tools are available at $149 each. Updates to existing users start at $40. Zortech has also released its OS/2 compiler upgrade, priced at $149. For more information contact Zortech Inc., 1165 Massachusetts Ave., Arlington, MA 02174 (617) 646-6703. MMC AD Offers Toolboxes MMC AD has released C Programmer's Toolbox/PC v2.0 for PC compatibles, C Programmer's Toolbox for Apple Macintosh Programmer's Workshop (MPW), and two stand-alone programming tools for the Apple Macintosh, McCPrint v2.0 and McCLint v2.0. The C Programmer's Toolbox/PC v2.0 is a set of 21 tools in two volumes. The Toolbox works with any PC compatible system that has MS-DOS v2.1 or later. A hard disk is highly recommended. The Toolbox is compatible with Microsoft C, Quick C, Turbo C and other PC C compilers. Volumes I and II retail for $99.95 each or $175 for both. For registered toolbox owners, the new release is $30 per volume or $50 for both. The C Programmer's Toolbox/MPW is a set of 20 tools that work with MPW v3.0 or later. MPW C is not required. The Toolbox works with any C source code that is being developed by or supported through MPW C v2.x or v3.x, Lightspeed C/Think C, Aztec C, all PC C compilers, engineering workstation and UNIX C compilers. The Toolbox retails for $295. McCPrint v2.0 is a C source code beautification/reformatting system that includes a source code formatting system, a multiple window editor and source code highlighting system. McCPrint works with a Macintosh System v4.2 or later (System v6.x is recommended) and at least 512 KB of memory. A hard disk is recommended, but not required. McCPrint runs as a stand-alone application and fully supports MultiFinder foreground and background processing. McCPrint is compatible with all C compilers. McCPrint is $59.95. An update is available for existing McCPrint customers for $25. 
McCLint v2.0 is a C source code semantic checking system. McCLint works with a Macintosh system v4.2 or later (system v6.x is recommended) and at least 1 MB of memory. A hard disk and additional memory is highly recommended. McCLint runs as a stand-alone application and fully supports MultiFinder foreground and background processing. McCLint is $99.95. An update is available for existing McCLint customers for $25. For more information contact MMC AD Systems, Box 360845, Milpitas, CA 95035 (408) 263-0781. Lattice Ships Free Amiga Updates Lattice, Inc. has released v5.04 of its Lattice AmigaDOS C Compiler. Version 5.04 of the Lattice C Compiler for AmigaDOS includes more than 50 enhancements to the compiler, libraries, codePRobe debugger, and utilities. A READ.ME file on the update disk describes the changes and installation procedure. All registered users of the compiler received the upgrade automatically. For more information contact Lattice, Inc. at 2500 South Highland Ave., Lombard, IL 60148 (800) 444-4309. MIPS RISC Gets API Standard AT&T and MIPS have signed an agreement to build a UNIX System V v4.0 Application Binary Interface for the MIPS RISC microprocessor. An Application Binary Interface (ABI) specification tells software vendors how to write applications that will run in binary form -- like today's PC applications -- on machines from any number of vendors that use the same microprocessor architecture. Under the agreement, the MIPS ABI will serve as the basis for trademarked UNIX System V software for the MIPS architecture. AT&T will make the MIPS ABI specifications available to the industry. AT&T also said that the ABI specifications for the MIPS architecture will be compatible with other AT&T ABI specifications. For more information contact MIPS Computer Systems, Inc., 928 Arques Ave., Sunnyvale, CA 94086 (408) 991-7736 or AT&T, 60 Columbia Turnpike, Morristown, NJ 07960 (201) 829-7212. JYACC Offers Jterm Emulator Package JYACC, Inc. 
has released Jterm, a terminal emulator package. Jterm is equipped with a file transfer utility which offers ASCII, Xmodem, Kermit and Kermit Server protocols, and JYACC's own protocol, Jtran. Jtran incorporates a data compression system that reduces file transfer transmission time by fifty percent. The Jterm emulation provides application users and developers extensive screen control and allows them to take full advantage of PC capabilities such as color display, function keys and the PC graphics character set. Jterm provides the standard features of a terminal emulation package (direct and modem dialing, file transfer with errorchecking, initialization and script files, etc.) and DEC VT100, VT220 and TTY emulation modes. For more information contact JYACC at 116 John St., New York, NY 10038 (212) 267-7722; FAX (212) 608-6753. Marietta Updates c_ndx And c_wndw Marietta Systems has released the c_ndx relational database library and the c_wndw v2 library for Borland Turbo C, Lattice C, Microsoft C 5.x and Quick C. c_wndw v2 includes the c_ndx relational database library for dBase files, improvements in facility and performance, increased flexibility, and expanded manuals. The c_wndw and c_ndx libraries are available in object library form for Borland Turbo C and Microsoft C compilers at a license fee of $95. Shipping and handling is $4.50 for US and Canada, and $11 for overseas customers. Source license agreements are available at $195 plus S&H. For more information contact Marietta Systems, Inc., PO Box 71506, Marietta, GA 30007 (404) 565-1560. New Releases CUG302 3-D Transforms Written by Gus O'Donnell (CA) and submitted by Michael Yokoyama (HI), 3-D Transforms is a library of functions used to create, manipulate and display objects in three dimensions. 
The functions allow the programmer to create representations of solid objects bound by polygons, to rotate, translate, scale the objects in three dimensions, and to display the objects in color with a given light source. The disk includes a brief description of each function in the library, complete C source code, function libraries for Turbo C, and a demonstration program which displays a cube, a tetrahedron, and octahedron in three dimensions with each figure rotated about a different axis (Figure 1). The program requires a Turbo C graphics library and BGI files. Turbo C v1.5 or later is recommended. CUG303 MC68K Disassembler Written by John M. Collins (England) and submitted by Steven M. Ward (MA), MC68K Disassembler runs on Motorola 68000 ports of UNIX System III and V. The disassembled output can be assembled to generate the same object module as the input. When disassembling stripped executable files, object modules and libraries may be scanned, modules in the main input identified and the appropriate names automatically inserted into the output. Also, an option is available to convert most non-global names into local symbols, reducing the number of symbols in the generated assembler file. The disassembler copes reasonably with modules merged with the -r option to ld, generating a warning message as to the number of modules involved. The disk includes a users guide and complete C source code. Although the program is MC68000 specific, it is easily adaptable to run in most any operating system environment as a cross development tool. CUG304 ROFF5 Ernest E. Bergmann (PA) has completed a major rewrite of his ROFF4 (CUG128 and CUG145). The ROFF5, v2.00 technical text formatter has evolved from ROFF4 to become somewhat more like UNIX's nroff and troff. ROFF5 now supports conditional macros, page traps, roman numerals and line numbering. It is intended for preparation of manuscripts on any dot matrix printer and can handle equations and special symbols. 
Different output devices are supported with device-specific ASCII files that inform ROFF5 of the special controls for that device. Fractional line spacing for superscripts and subscripts is supported even for printers that cannot reverse scroll. The "built-in" commands follow the naming conventions of nroff and troff where appropriate; however, in contrast to the UNIX formatters, ROFF5 supports register and macro names of arbitrary length. The disk includes a complete set of C source code, well-written documentation, and a number of test and demo files. The program was written using Turbo C v2.0 for MS-DOS. CUG305 HGA Mandelbrot Explorer and Card Games Dan Schechter has submitted a Hercules monochrome Mandelbrot program, as well as the card games, poker and blackjack. Unlike most Mandelbrot programs, which require you to specify "color-value" information in advance, his programs, EMANDEL and EJULIA (Figure 2) save all calculation data, allowing you to tweak the picture by specifying color-value information afterwards. POKER is five-card draw poker. The computer plays four hands independently (the computer's four "players" do not consult with each other) and you play one hand. BLACKJACK is not quite real casino blackjack. It is just you against the dealer. "Doubling down" is not supported. The screen display of both card games is neatly organized using the Hercules graphics. This disk includes C source code as well as executables for MS-DOS. All the programs are compiled using the Aztec C compiler. CUG306 Thread and Synapsys Gregory Colvin (CO) has contributed Thread and Synapsys. Thread is a multitasking kernel based on lightweight threads. (See his story elsewhere in this issue.) He uses the ANSI Standard C library functions, setjmp() and longjmp() to implement multiple threads within a single C program. He has tested the code with Microsoft C v5.0 on an IBM-AT, with MPW C v3.0 on a Macintosh SE. 
On his AT machine, the kernel compiles to under 1K of code and executes over 80,000 jumps per second. Synapsys is a neural network simulation program which implements a very fast backpropagation network by representing synapse layers as word arrays and implementing all operations with integer arithmetic. The disk includes C source code, benchmark and testing code for both programs. Updates CUG252, 253 C Tutor Coronado Enterprises (12501 Coronado Ave NE, Albuquerque, NM 87122) has released C Tutor v2.4. This new C Tutor has been modified to include many of the proposed ANSI standard changes. CUG252 includes documentation and CUG253 includes source code. CUG257 and CUG258, C Tutor for Turbo C are not included in the revision. We will retain them for a few months as a resource for programmers working with older versions of Turbo C. Eventually convergence to the ANSI standard should allow us to retire these volumes. CUG263 c_wndw and c_ndx This v2.02 release from Marietta Systems includes the "c_ndx" library that supports relational database access to dBase files and B-tree indexes. This shareware package includes a manual, sample programs, and small model library for Turbo C and Quick C. The source code is available from the author (2917 Ashebrooke Dr, Marietta, GA 30068). CUG265 cpio Installation Kit Good news for AT&T 3B1 users. In the past, 3B1 users have been unable to read CUG disks even though our physical disk format (48 tpi, 8 sectors/track, 512 bytes/sector) matches theirs. There seems to be some incompatibility between their UNIX on 3B1 (it is okay on 3B2) and our SCO XENIX/386. T.W Kalebaugh (KS) has created a loader and dump utility for AT&T 3B1 (UNIX PC, 7300 and Convergent Technologies S-50). The updated disk includes his new subroutines and makefiles. CUG278 CXL Library Mike Smedley has updated his shareware C function Library, CXL to v5.1. 
The update includes new features such as a context-sensitive help system, extensive mouse support, shadowed windows, multiple-field data entry forms, enhanced menuing functions, extended keyboard support, and file encryption. This disk includes a manual, demo programs, small model library for Microsoft C & Quick C, Turbo C and Zortech C/C++. The source code is available from the author (P.O. Box 33603, San Antonio, TX 78265). Additionally, Kamran Bayegan has contributed a "Screen and Form Designer" program which designs screens and forms that are completely compatible with this library. CUG297 Small Prolog Henri de Feraudy (France) has updated his original Small Prolog. The updated disk includes some minor bug fixes, a speed improvement involving prunify.c and prhash.c, a better handling of type predicates such as integer, and three new examples. A review of the earlier version appears elsewhere in this issue. Figure 1 Figure 2 We Have Mail Dear Mr. Ward, I am delighted to hear that you will be publishing every month now. While it may be possible to get too much of a good thing, one C Users Journal per month is still a long way from too much (perhaps your staff has a different view). There are two things in the November issue that cause me to write. First is Jay Martin Anderson's generally excellent overview of the IEEE-488 interface bus. Normally, using "HP-IB" and either "GPIB" or "IEEE-488" as synonyms causes no great confusion. However, I think that Prof. Anderson's article may be an exception to that rule. Let me provide the precise meaning of each term, and then indicate why I believe that the equivocation of these terms has presented a problem in his piece. "IEEE-488" is the generic way of identifying the standards document, "IEEE Standard Digital Interface for Programmable Instrumentation," produced by the IEEE. The designation "GPIB" (General Purpose Interface Bus) is the generic term for the interface bus. 
"HP-IB" (Hewlett-Packard Interface Bus) is a term used by Hewlett Packard to designate both the IEEE-488 electronic standard and HP's software protocol for using the interface. The most serious error in the article that comes from equating HP-IB with IEEE-488 is the assertion that "any instrument which claims adherence to the IEEE-488 standard must be able to respond to a serial poll." (page 29). In fact, the IEEE-488 standard specifies many allowable subsets of the full interface. Included among these are nine "Allowable Subsets to T Interface Function" (table 11 of the IEEE-488 standard.) "T" is the basic talker function. (There is a precisely analogous TE -- extended talker -- with the same nine subdivisions of that function). Of those allowable functions, five do not support serial polling. However, one is the degenerate case of no talker capability, TO. Obviously, any instrument that cannot talk cannot answer a serial poll. So, after acknowledging that it is legal for an IEEE device to not support talking at all, the more accurate picture is that half of the talker options that "adher[e] to the IEEE-488 standard" do not support serial polling. While it is true that HP-IB uses serial polling, there is no such requirement from the IEEE. I do not want to exaggerate the significance of Prof. Anderson having elided the HP-IB and the IEEE-488 specifications into one. In general, his article is an excellent introduction to the HP-IB. However, it is worth recognizing that it is not an introduction to the GPIB as such. The second thing in the November issue that I wanted to respond to was Jeff Saraiva's request in "Q?A!" for programming examples using the Microsoft C v5.1 compiler's graphics library. I am sending a copy of my FTGRAPH, which is a tool kit of FFT functions. It uses the MSC 5.1 graphics library for its screen output (MSC_GRPH.C is the source file). 
While it is not a particularly extensive graphics application, it does illustrate determining the graphics adapter at runtime, and scaling the output to the actual adapter's resolution. It also shows how to use both text and graphics with the graphics library. You may include it in the C Users Group library if you think that it is suitable. You should note, however, that the front-end (FTGRAPH.C) is quite ragged. The library routines have evolved over the last few years to meet my employer's needs, but the front-end was written on my own time as a way of providing a tool kit to accompany an article on FTs that I wrote for Intelligent Instruments and Computers. I think the library is a reasonably polished, professional product. The front-end is a good example of what you get for nothing. I include a reprint of the article, in case Mr. Saraiva is not familiar with the FT and what it can be used for. I hope it is not too trivial a graphics application to be useful to him. Sincerely, Tom Clune Eye Research Institute 20 Staniford St. Boston, MA 02114 Thanks for the HPIB/IEEE 488 clarification. We have passed your graphics library on to Kenji for evaluation. We'll also pass a copy immediately to Saraiva. I think "How to determine adapter type at runtime" would be a good article by itself. Any authors? --rlw Dear Robert, I just finished reading the Jan 1990 issue of CUJ (again, an excellent job, guys). I want to respond to one of your reader's (Dr. Whitaker of Boston, MA) requests for texts on "grep", "awk", "sed" and "tr" as well as to one of the articles which I found to be most interesting. First the texts. 1. AT&T UNIX Programmer's Manual Volume 4 titled Document Preparation, edited by Steven V Earhart, a CBS College Publishing by Holt, Rinehart and Winston (HRW), ISBN 0-03-011207-9 2. AT&T UNIX Programmer's Manual Volume 5 titled Languages and Support Tools, edited by Steven V. 
Earhart, a CBS College Publishing by Holt, Rinehart and Winston (HRW), ISBN 0-03-011204-4 These two texts are probably the most complete descriptions of the utilities in question and describe everything you ever wanted (and never wanted) to know about them, complete with examples and option descriptions. The next text I would recommend is UNIX Utilities by R. S. Tare published by McGraw-Hill, ISBN 0-07-062884-X This book is a programmer's reference and makes some assumptions about how much the reader knows about programming in general. This book would probably not be a good teaching guide but it's a great reference. Lastly, I would recommend the following Bell Laboratories, technical memoranda. 1. SED -- A Non-interactive Text Editor, by Lee E. McMahon, dated August 15, 1978. 2. AWK -- A Pattern Scanning and Processing Language, by Alfred Aho, Brian Kernighan and Peter Weinberger, dated September 1, 1978. I realize that these two documents might be more difficult to get ahold of, but they are excellent user guides and no more than 10 pages. Next I would like to present an "addendum" to a very well written article entitled UNIX 'termcap' Facility Improves Portability by Ronald Florence. I realize that the article was about the 'termcap' facility but since he did mention the 'terminfo' facility, I wanted to present some additional information about it, to you. If after reading this letter you think that a more "in-depth" article or tutorial about it may be of interest to your readers, I would be more than happy to contribute. First let me say that none of this information applies to any UNIX versions prior to UNIX System v2 but I would strongly recommend upgrading to, at least, UNIX System v3 as soon as possible. The added security measures and bug fixes are well worth it! Well, back to the article. Mr. Florence stated in his article that "The termcap database is substantially easier to modify..." than the terminfo database. I must disagree with this statement. 
On most (if not all) UNIX systems that use 'termcap', the database can only be modified by the system administrator (or super user) and rightly so. If you, yourself, are not the super user, experimental (trial and error) modifications to terminal descriptions are impractical to say the least. With 'terminfo', the user is free to experiment with a terminal description that only he or she will use (at least until it's fully tested). In order to write a 'terminfo' terminal description, you will need at least the following: section 4 of the UNIX programmer's manual (TERMINFO(4)) and the technical reference for the particular terminal you wish to build a description for. Only with this information is it possible to write a terminal description. In order to use the "new" terminal description, it must be compiled using TIC(1M), the terminfo compiler. The procedure is simple: once the terminal description file is complete, just type tic filename. This creates subdirectories (one for each unique terminal name in the first line, i.e., at386 makes directory "a", AT386 makes "A" and so on), places the compiled file in the subdirectory under the terminal name, and makes any appropriate links in the other subdirectories. In order to make use of this file, the user must define and export the environment variable TERMINFO equal to the directory under which the subdirectories were created. The user must also define and export the TERM variable equal to the appropriate terminal name. For example, in the user's ".profile" file have the following:

    # (assuming "termdefs" contains the description file)
    TERMINFO=${HOME}/termdefs
    TERM=at386
    export TERM TERMINFO

Programs using "curses" and "terminfo" routines will check for the TERMINFO variable to be set first, before checking the standard terminal description database. Much of this information can be found in the various sections of the UNIX manuals and there are also several books (and memoranda) on the subject. 
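The directory layout described above (one subdirectory per first letter of the terminal name, under the directory named by TERMINFO) can be sketched in C. This is only an illustration of the naming scheme, not code from curses or terminfo: the function names terminfo_entry_path and terminfo_entry_path_from_env are hypothetical, and the sketch assumes a C library that provides snprintf.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of curses or terminfo): build the path
 * "<dir>/<first letter of term>/<term>", which is where the letter above
 * says tic places a compiled description (at386 goes under "a",
 * AT386 under "A").
 * Returns 0 on success, -1 on bad arguments or if buf is too small.
 */
int terminfo_entry_path(const char *dir, const char *term,
                        char *buf, size_t buflen)
{
    int n;

    if (dir == NULL || term == NULL || term[0] == '\0' || buf == NULL)
        return -1;

    n = snprintf(buf, buflen, "%s/%c/%s", dir, term[0], term);
    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* output error or truncated path */
    return 0;
}

/*
 * Hypothetical wrapper: take the directory from the TERMINFO environment
 * variable and the name from TERM. Returns -1 if either is unset; what a
 * real curses library falls back to in that case is not modeled here.
 */
int terminfo_entry_path_from_env(char *buf, size_t buflen)
{
    const char *dir = getenv("TERMINFO");
    const char *term = getenv("TERM");

    if (dir == NULL || term == NULL)
        return -1;
    return terminfo_entry_path(dir, term, buf, buflen);
}
```

With TERMINFO set to a "termdefs" directory and TERM set to at386, as in the ".profile" example, the second function would yield a path ending in termdefs/a/at386.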
For the convenience of your readers, I have enclosed a couple of sample listings for 'terminfo' descriptions. Listing 1 is the 'terminfo' description supplied by most UNIX System V/386 vendors for the 80386 based IBM PC/AT console. It should be noted that the description supplied by most UNIX SysV/386 vendors is INCORRECT! The "xt" boolean (destructive tabs) should be removed as it will cause problems with programs like GNU Emacs and others. Listing 2 is the same 'terminfo' description using the long C variable names listed in <term.h>. This is a much clearer example of the terminal description information. I hope you and/or your readers will find this information of some use. If you have any questions or wish to contact me, you can do so at the address or via e-mail at uunet!rwbix!cci. Sincerely, Bob Barrett Principal Consultant at CCI 528 North Riverside Dr. Neptune, NJ 07753 I have always found termcap and curses to be the most difficult-to-learn parts of UNIX, mostly because the documentation is so scattered and patchy. In addition to your references, Kochan and Wood's book "Topics In C Programming" includes several little tidbits (like when to use clearok()) that I haven't found elsewhere, and Rochkind's "Advanced C Programming for Displays" includes good advice about using termcap directly and some interesting performance comparisons between new and old termcap and curses.--rlw Dear Mr. Ward, I've started an interactive curses based program that calculates topological chemical indexes as suggested by an article in the Scientific American Magazine (Sept. 1986, p.43). A graphics editor for drawing organic structures using commands loosely named after those in emacs is called. The editor also uses a small library that includes standard subunits like benzene rings, steroids, etc. Structures can be named and saved. The drawing can be modified, renamed, and recalculated. Only the randic index calculator has been finished. 
I have considered converting the program into a filter that would pipe the index numbers to a statistical program to check for correlation with various chemical or biological activities. I know that various systems are used to translate standard chemical names into codes that are machine readable but I don't know which one is the de facto standard. My machine is a UNIX PC, PC 3700 (System v3.5 software) but I've avoided menus, windows, and the mouse in favor of portability. I wrote the program in standard K&R C. I doubt that there is enough interest to add this package to the CUG standard distribution. Since there are so many design considerations, I would like to contact some chemists interested in this theoretical tool so that I could implement it to be useful for their academic use. Sincerely, Phil Karn, SR 230 Division Ave. Lutherville, MD 21093

Dear Mr. Ward: I am a recent subscriber to The C Users Journal and let me start by saying that I think you have a great publication. Here are some topics which I would enjoy reading about in future issues. Since I am an MS-DOS user, most of these topics are oriented towards that environment. Video and printer drivers. I would like for my programs to take advantage of the hardware capabilities of different printers and video cards. Both Microsoft Windows and Borland's BGI provide a method for doing this, but I would prefer to use my own code. What I would really like to see discussed is how to write device drivers which can be selected during the execution of a program. OCR. What are the current methods used to perform optical character recognition? I realize that this is too big a topic for extensive coverage, but an introductory tutorial would be very nice. Speech Synthesis. What can be done to add speech to programs? I realize that this is usually done with special hardware, but am curious as to what can be done with just a standard PC.
It seems that the commercial game programs keep getting better and better sound using a standard computer. Timing. How can I write programs which are independent of the clock speed of the machine being run on. For some events, such as animation, the real time clock does not give enough precision to control the timing. The commercial games seem to have solved this problem as well. Lynn Akers, Jr. Akersoft, Inc. 5600 Roswell Rd. Ste. 200B Atlanta, GA 30342 Talk about timing! Surely you'll notice the speech recognition article in this issue. Phyllis Lang wrote a story about "Improving Timing Resolution" which appeared in our May 1989 issue. I'm sure that story would address your timing needs. We have sold out of that issue, but can still supply a photo copy (for a small fee). Just call and ask for Phyllis Lang's story from Vol. 7, Issue 5. --rlw Dear Ward Folks; Where do you buy your drugs? $28 for your magazine! Not a chance, if it costs so much to produce it why did you go to coated pages and a color cover? Leave me the old style, charge me less and maybe we can work something out. Best of luck (Ha!) Tom Brusehaver 1505 Ensign Dr. #C Normal, IL 61761 P.S. No one asked me if I wanted the format to change. I would have said NO! In fact, the coated paper we are now using is less expensive than the offset stock we used to use. The change in price was designed to cover the additional issues. If $24 for eight was reasonable, I fail to understand why $28 for twelve isn't. Frankly I don't think the price requires much defense. For $28 we deliver roughly 1500 pages of technical coverage. Even discounting non-editorial space you still get over 1000 pages of technical material. Have you priced any 300 page technical books lately? Have you bought a large pizza recently? Listing 1 /* Lines ending with a '\' character are broken for readability. In practice, this should all be on ONE line. 
*/
AT386|at386|386AT|386at|at/386 console,
    am, bw, eo, xon, xt,
    colors#8, cols#80, lines#25, ncv#3, pairs#64,
    acsc=''a1fxgqh0jYk?lZm@nEooppqDrrsstCu4vAwBx3yyzz{{}}~~,
    bel=^G, blink=\E[5m, bold=\E[1m, clear=\E[2J\E[H,
    cr=\r, cub=\E[%p1%dD, cub1=\E[D, cud=\E[%p1%dB,
    cud1=\E[B, cuf=\E[%p1%dC, cuf1=\E[C,
    cup=\E[%i%p1%02d;%p2%02dH, cuu=\E[%p1%dA, cuu1=\E[A,
    dch=\E[%p1%dP, dch1=\E[P, dl=\E[%p1%dM, dl1=\E[1M,
    ed=\E[J, el=\E[K, flash=^G, home=\E[H, ht=\t,
    ich=\E[%p1%d@, ich1=\E[1@, il=\E[%p1%dL, il1=\E[1L,
    ind=\E[S, indn=\E[%p1%dS, invis=\E[9m, is2=\E[0;10;39m,
    kbs=\b, kcbt=^], kclr=\E[2J, kcub1=\E[D, kcud1=\E[B,
    kcuf1=\E[C, kcuu1=\E[A, kdch1=\E[P, kend=\E[Y,
    kf1=\EOP, kf10=\EOY, kf11=\EOZ, kf12=\EOA, kf2=\EOQ,
    kf3=\EOR, kf4=\EOS, kf5=\EOT, kf6=\EOU, kf7=\EOV,
    kf8=\EOW, kf9=\EOX, khome=\E[H, kich1=\E[@, knp=\E[U,
    kpp=\E[V, krmir=\EO, op=\E[0m, rev=\E[7m, rin=\E[S,
    rmacs=\E[10m, rmso=\E[m, rmul=\E[m,
    setb=\E[%?%p1%{0}%=%t40m%e%p1%{1}%=%t44m%e%p1%{2}%=%t42m%e%p1 \
    %{3}%=%t46m%e%p1%{4}%=%t41m%e%p1%{5}%=%t45m%e%p1%{6}%=%t43m%e%p1 \
    %{7}%=%t47m%;,
    setf=\E[%?%p1%{0}%=%t30m%e%p1%{1}%=%t34m%e%p1%{2}%=%t32m%e%p1 \
    %{3}%=%t36m%e%p1%{4}%=%t31m%e%p1%{5}%=%t35m%e%p1%{6}%=%t33m%e%p1 \
    %{6}%=%t33m%e%p1%{7}%=%t37m%;,
    sgr=\E[10m\E[0%?%p1%p3%|%t;7%;%?%p2%t;4%;%?%p4%t;5%;%?%p6%t; \
    1%;%?%p9%t;12%;%?%p7%t;9%;m,
    sgr0=\E[0;10m, smacs=\E[12m, smso=\E[7m, smul=\E[4m,

Listing 2

/* Note: Lines ending with a '\' character are broken for readability. In practice, this should all be on ONE line. */
Terminal type at386
AT386|at386|386AT|386at|at/386 console
flags
    auto_left_margin, auto_right_margin, dest_tabs_magic_smso,
    erase_overstrike, xon_xoff,
numbers
    columns = 80, lines = 25, max_colors = 8, max_pairs = 64,
    no_color_video = 3,
strings
    acs_chars = '''a1fxgqh0jYk?lZm@nEooppqDrrsstCu4vAwBx3yyzz{{}}~~',
    bell = '^G',
    carriage_return = '\r',
    clear_screen = '\E[2J\E[H',
    clr_eol = '\E[K',
    clr_eos = '\E[J',
    cursor_address = '\E[%i%p1%02d;%p2%02dH',
    cursor_down = '\E[B',
    cursor_home = '\E[H',
    cursor_left = '\E[D',
    cursor_right = '\E[C',
    cursor_up = '\E[A',
    delete_character = '\E[P',
    delete_line = '\E[1M',
    enter_alt_charset_mode = '\E[12m',
    enter_blink_mode = '\E[5m',
    enter_bold_mode = '\E[1m',
    enter_reverse_mode = '\E[7m',
    enter_secure_mode = '\E[9m',
    enter_standout_mode = '\E[7m',
    enter_underline_mode = '\E[4m',
    exit_alt_charset_mode = '\E[10m',
    exit_attribute_mode = '\E[0;10m',
    exit_standout_mode = '\E[m',
    exit_underline_mode = '\E[m',
    flash_screen = '^G',
    init_2string = '\E[0;10;39m',
    insert_character = '\E[1@',
    insert_line = '\E[1L',
    key_backspace = '\b',
    key_btab = '^]',
    key_clear = '\E[2J',
    key_dc = '\E[P',
    key_down = '\E[B',
    key_eic = '\EO',
    key_end = '\E[Y',
    key_f1 = '\EOP',
    key_f10 = '\EOY',
    key_f11 = '\EOZ',
    key_f12 = '\EOA',
    key_f2 = '\EOQ',
    key_f3 = '\EOR',
    key_f4 = '\EOS',
    key_f5 = '\EOT',
    key_f6 = '\EOU',
    key_f7 = '\EOV',
    key_f8 = '\EOW',
    key_f9 = '\EOX',
    key_home = '\E[H',
    key_ic = '\E[@',
    key_left = '\E[D',
    key_npage = '\E[U',
    key_ppage = '\E[V',
    key_right = '\E[C',
    key_up = '\E[A',
    orig_pair = '\E[0m',
    parm_dch = '\E[%p1%dP',
    parm_delete_line = '\E[%p1%dM',
    parm_down_cursor = '\E[%p1%dB',
    parm_ich = '\E[%p1%d@',
    parm_index = '\E[%p1%dS',
    parm_insert_line = '\E[%p1%dL',
    parm_left_cursor = '\E[%p1%dD',
    parm_right_cursor = '\E[%p1%dC',
    parm_rindex = '\E[S',
    parm_up_cursor = '\E[%p1%dA',
    scroll_forward = '\E[S',
    set_attributes = '\E[10m\E[0%?%p1%p3%|%t;7%;%?%p2%t;4%;%?%p4%t;5%; \
    %?%p6%t;1%;%?%p9%t;12%;%?%p7%t;9%;m',
    set_background = '\E[%?%p1%{0}%=%t40m%e%p1%{1}%=%t44m%e%p1 \
    %{2}%=%t42m%e%p1%{3}%=%t46m%e%p1%{4}%=%t41m%e%p1%{5}%=%t45m%e%p1 \
    %{6}%=%t43m%e%p1%{7}%=%t47m%;',
    set_foreground = '\E[%?%p1%{0}%=%t30m%e%p1%{1}%=%t34m%e%p1 \
    %{2}%=%t32m%e%p1%{3}%=%t36m%e%p1%{4}%=%t31m%e%p1%{5}%=%t35m%e%p1 \
    %{6}%=%t33m%e%p1%{6}%=%t33m%e%p1%{7}%=%t37m%;',
    tab = '\t',
end of strings

Discrete Event Simulation In C For Real-Time Systems

Steve Halladay and Steve Johnson

This article is not available in electronic form.

External Tools For Debugging C

Bob Whitten

Bob Whitten is a senior software engineer for X O Technologies, Inc., Valencia, CA, a manufacturer of turbine flow meters, transmitters for flow meters, and flow totalizers and controllers. A programmer for 10 years, Bob has been involved in many embedded systems, especially "system" code. He can be reached at (805) 257-5542.

Though my favorite debugging environment is Turbo Debug, most of my projects are not MS-DOS-based, running instead on a microcontroller tied directly and intimately to the surrounding hardware. Usually, as soon as the prototype hardware is available, an effort is made to get something running on it, to see that the hardware works and to give us and management a good feeling that the project is going well. After all, if the hardware works, then the project is half-done, right? (This is where the software team lets out a loud groan...) When that first something is running, does it work? Does it do everything expected? More importantly, can you test the incremental software builds as they are produced? Sometimes the hardware doesn't work, or not as specified. Or the software team interpreted the spec one way, and the hardware team went the other way. (Of course, the argument goes that this should have all come out during the technical walk-throughs, but since everybody thought they understood it, nobody mentioned it.)
Usually, an LSI interface chip is involved and the documentation on it seemed clear, but later it turns out not to work the way everyone thought. Other times the software doesn't work, usually because the software team isn't talking together enough (or in the case of a one-person job, the software engineer isn't talking to himself enough). Embedded programs can be tricky to write and even trickier to debug. I've been writing (and debugging) programs of this sort for a while now, and in this article I'd like to share what I've learned of how to use "external" tools in debugging. Three main tools -- an oscilloscope (and its kid brother, the logic probe), a logic analyzer, and an in-circuit emulator -- provide very different levels of help, and each has its place.

Using An Oscilloscope

An oscilloscope displays, or "traces", electrical signals from one or more input channels on a cathode-ray tube, showing how these signals change during a given time interval. 'Scopes have lots of knobs and switches, so they are a practical tool if you already know how to use them or have a good working relationship with someone who does. Attaching 'scope probes to hardware, especially prototype hardware, can threaten your job security, so I usually try to find someone else to do it. The oscilloscope can be useful mostly because it has a "trigger" circuit, which can be set to initiate a trace either when a signal goes high or when it goes low. The trigger can fire once or repeatedly. Repeatedly is the normal setting since the image traced on the screen fades quickly; a signal that is repeated often will appear brighter. Sometimes the challenge of using a 'scope is making the program cycle on a regular enough basis to get a readable trace. The 'scope's screen is calibrated in centimeters, with voltage measurements on the y-axis, and time on the x-axis. You can select both the voltage range and the timing range.
For digital circuits, the voltage range should be set to conveniently display zero to five volts. Since the information displayed on the 'scope is limited to a couple of channels (two bits), it seems almost useless. It's amazing what a simple tool can do in the hands of a skilled person, however, and the 'scope is no exception. I've been fortunate to work with people who seem to make the 'scope sing a ballad. For example, if you're programming a microcontroller, sometimes it's enough to know whether or not the code reached a certain point. Since there is usually some output bit somewhere that is not used or does not cause any problems if set (like a Light-Emitting Diode), the code can include "milestones", where these outputs are set to indicate that the code got there.

    for (sum = 0, i = 0; i < 2048; i++)
        sum += *(PROM + i);
    if (sum != 0)
        while (1)
            ;                       /* hang forever */
    outbyte(LED_PORT, 0x01);        /* turn on the OK light */

Now, arguably, the 'scope isn't needed here since the light will either go on or not. But what if the light goes on, but gets reset so rapidly that it never appears to light? What if the LED is inserted backwards, so it doesn't light? Just putting the 'scope probe on the output pin and looking for a change will begin to diagnose the problem. In addition to simple "does it get there" debugging, the 'scope is a great way to perform timing measurements. For example, a task that should complete within 30 ms could set an I/O port at its beginning and clear the I/O bit at completion. This will generate a pulse that can be traced on the 'scope. The time to execute the task and the time between executions can then be easily read off the 'scope, based on the graduations on the CRT face. Using two channels, the turn-around time for communications message processing can be easily measured by attaching the transmit line to one channel and the receive line to the other channel.
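The set-at-start, clear-at-completion pulse used for timing a task might look like the sketch below. It assumes a spare output bit; TIMING_PORT, TIMING_BIT, timed_call() and sample_task() are hypothetical names, and outbyte() is replaced by a stand-in that records the value written so the fragment compiles on a host machine.

```c
#define TIMING_PORT 0x10    /* hypothetical address of a spare output port */
#define TIMING_BIT  0x02    /* hypothetical unused bit on that port */

static unsigned char port_shadow;   /* last value written to the port */
static int port_writes;             /* number of writes, for checking */

/* Stand-in for the target's port-write routine. */
static void outbyte(int port, unsigned char value)
{
    (void)port;
    port_shadow = value;
    port_writes++;
}

/* Hold the timing bit high for exactly the duration of the task, so the
 * pulse width seen on the 'scope is the task's execution time. */
void timed_call(void (*task)(void))
{
    outbyte(TIMING_PORT, (unsigned char)(port_shadow | TIMING_BIT));
    task();
    outbyte(TIMING_PORT, (unsigned char)(port_shadow & ~TIMING_BIT));
}

/* Sample task: notes whether the bit was high while it ran. */
int bit_was_high;
void sample_task(void)
{
    bit_was_high = (port_shadow & TIMING_BIT) != 0;
}
```

On real hardware the two port writes add a few instructions of overhead to the measured interval, which is usually negligible against a task measured in milliseconds.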
You can even decode the message from the 'scope trace if you know the communications protocol well enough. (This is lots of fun with NRZI standards like HDLC.) A good 'scope is considered a minimum requirement in most shops where hardware is being designed, but a 'scope can be overkill for other tasks. If you just need to do some "did the signal go high" testing, a logic probe might be adequate. The logic probe senses digital logic levels and has an LED for a signal high, another for signal low, another for a "pulsing" signal (slowed to human speeds), and yet another, labeled "memory", to show that a signal went high and then low (a single pulse). These are cheap (less than $50), simple, small, and don't have a lot of knobs. Using A Logic Analyzer A logic analyzer (LA) is like a collection of logic probes, in that it looks at logic levels at many locations, either high or low, but unlike a logic probe, an LA also allows those levels to be "clocked" into memory, usually based on the microcontroller clock signal. In most applications the LA must have at least as many input lines as there are address lines on the processor -- more is better. The state of the lines is remembered on the basis of the clock input, which can be set to clock on either the high-going or the low-going edge. A careful study of the handbook for the particular processor is often needed to set clocks and clock edges correctly, and sometimes just experimenting till it "works" is the only way. Since microcontrollers may go through a million or so instructions in a second, just saving every instruction in the LA's limited memory is not feasible. To focus on an area of interest, the LA has its own kind of trigger mechanism, which can be as simple as waiting for some or all of the input lines to match user-set values, e.g., a given address. The analyzer may be set to start collecting frames into memory after the trigger is hit, or it may collect frames until the trigger, known as a "pre-trigger". 
In pre-trigger mode, if you set the "trigger" to the PANIC code, the LA will capture the addresses of the instructions executed immediately before the PANIC. Most LAs provide additional, very complex trigger schemes, to allow the user to catch a bug that occurs only in unusual circumstances. While using an LA is a definite improvement over a 'scope, it has its own challenges. For starters, the LA doesn't understand C. It reports what it sees in machine language (i.e., ones and zeros, converted to hexadecimal), unless you've paid extra for a "personality module" that can display these codes in assembly language. Thus, unless you were born with sixteen fingers, you'd better have a hex calculator close at hand. To use an LA to debug your C code, make your compiler produce listings with intermixed assembly language, and learn enough assembly language to understand what the compiler produced. Be sure you've turned off all optimizations -- otherwise you'll find your lines of code moved around or folded together. If you use a linker, as you usually must, you will have to add the link map offsets to the addresses in each module's listing to produce the addresses seen by the LA. Sometimes, you can force the linker to align modules on 256 (100 hex) byte boundaries, making the hex arithmetic easier to figure in your head. Logic analyzers are not ICEs (in-circuit emulators); an LA can only "see" the electrical signals on the microcontroller's bus. The LA can't "see" the activity of important circuits (communications, Analog-Digital conversion, timers, DMA), located within the microcontroller. Also, the LA doesn't allow you to stop and examine things and then continue. You can mitigate this limitation somewhat, at least during debug, by having your program copy important internal state information to an external memory location (causing the internal register data to appear on the external data bus). 
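Copying internal state to an external location can be as small as the sketch below. On a target, the debug window would be a fixed address in external memory; here an ordinary array stands in so the fragment compiles anywhere. The names debug_window and debug_mirror() are hypothetical.

```c
/* On the target this would be a fixed external address chosen from the
 * memory map (any such address is an assumption); on a host we use an
 * ordinary array instead. */
static volatile unsigned char debug_window[4];

/* Write a tag byte and a 16-bit internal value to external memory.
 * The volatile qualifier keeps the compiler from discarding stores that
 * nothing in the program ever reads back; each store becomes a bus
 * write cycle that the analyzer can capture or trigger on. */
void debug_mirror(unsigned char tag, unsigned int value)
{
    debug_window[0] = tag;
    debug_window[1] = (unsigned char)((value >> 8) & 0xFF);  /* high byte */
    debug_window[2] = (unsigned char)(value & 0xFF);         /* low byte */
}
```

Triggering the analyzer on a write to the tag location then brings the mirrored value into the captured frames.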
The Art Of Debugging With A Logic Analyzer As I remarked earlier, LAs typically have complex triggering mechanisms. Usually, simply triggering on a given address is sufficient, but when the really tough, once-every-hour bug comes along, the fancy triggering capability is invaluable. This is because the bug happens long before it is detected. If the trigger can be set to the place where the error is detected (for example, the hardware is set to a "fail-safe" state), sometimes there is enough "pre-trigger" memory to find out why the code got to this place. When that is not enough, the trickery has to start. Some analyzers will allow selective collection into memory, effectively expanding the memory by excluding un-interesting sections of code. Or, if there are only two paths that can bring the code to this one point, you can configure the trigger on an "OR" case: "trigger if either of these addresses is seen." Sometimes a certain section of code will execute correctly three times and fail consistently the fourth time. The trigger can sometimes be set to trigger on the "Nth" occurrence of an address. As an aside, the LA can help find bugs that a traditional debugger like Turbo Debug cannot, because the LA is non-invasive. The timing of the code and the contents of memory are not affected by the logic analyzer -- both are changed when a debugging program is loaded. Though it's usually easier to follow unoptimized code, in some cases you may be forced to debug the optimized version. When a bug is reported from the field, it may not manifest itself the same way unless the exact code from the field is used, loaded at exactly the same address. Multi-level triggering is required when the suspicious code works most of the time. For example, a trigger might be set to trigger if the following sequence occurs: State 1 is reached, then State 2 is reached; if State 3 occurs before State 4, then trigger, otherwise, start over looking for State 1. 
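The four-state sequence just described can be modeled as a small state machine. This is only a sketch of the logic, not any analyzer's actual configuration language; the enum names are hypothetical, and it assumes that events which match none of the expected states are simply ignored.

```c
enum event { EV_S1, EV_S2, EV_S3, EV_S4, EV_OTHER };
enum level { WAIT_S1, WAIT_S2, WAIT_S3_OR_S4, TRIGGERED };

/* Advance the trigger by one observed event and return the new level. */
enum level trigger_step(enum level now, enum event ev)
{
    switch (now) {
    case WAIT_S1:
        return ev == EV_S1 ? WAIT_S2 : WAIT_S1;
    case WAIT_S2:
        return ev == EV_S2 ? WAIT_S3_OR_S4 : WAIT_S2;
    case WAIT_S3_OR_S4:
        if (ev == EV_S3)
            return TRIGGERED;       /* State 3 came before State 4 */
        if (ev == EV_S4)
            return WAIT_S1;         /* start over looking for State 1 */
        return WAIT_S3_OR_S4;
    default:
        return TRIGGERED;           /* once triggered, stay triggered */
    }
}

/* Feed a sequence of events; return the index of the event that fires
 * the trigger, or -1 if the trigger never fires. */
int find_trigger(const enum event *evs, int n)
{
    enum level lv = WAIT_S1;
    int i;

    for (i = 0; i < n; i++) {
        lv = trigger_step(lv, evs[i]);
        if (lv == TRIGGERED)
            return i;
    }
    return -1;
}
```

A run in which State 4 always precedes State 3 never fires; the first pass in which State 3 arrives first does.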
This retriggering feature finds bugs of the type where the execution thread wanders off into code that is run commonly, but is not correct in a certain context. An LA can also trigger on data accesses. It can be triggered on either a read or write, and even on the data at a certain address being accessed. This can help in those maddening situations where a data structure is getting "bashed" somewhere, but you haven't the foggiest where in the inch-thick listing that might be. The fancy triggering can come in quite handy in these cases. Let's say that the structure is legitimately changed in only one piece of code. The retriggering mechanism works well here. The set-up would be something like this (this set-up is based on the Nicolet analyzer that I'm most familiar with):

    S1 -- a write request to the given data address
    S2 -- the start of the code that is allowed to change this address
    S3 -- the end of the code allowed to change this

    1. Collect frames until S1 occurs, then done. If S2 occurs first, go to step 2.
    2. Wait for an S3 to occur, then go to step 1.

The analyzer can be set to trigger on "sequence done" or "memory full". "Sequence done" would be a good choice here (the pre-trigger memory will have the addresses of code leading up to the fault). This sequence should only be "done" if some other code writes to the address watched by S1. If that address is written to during initialization, a step before these may be in order:

    0. Wait for the address of the end of the initialization code, S4.

When a problem seems impossible to trigger on, I always get out the instruction book for the analyzer again, and hope to find something I missed the first dozen times through. I also do a "reality-check" if I've hooked the analyzer up and strange results appear. A reality-check is just a trace that triggers on the "trace memory full" condition after a restart. This trace should show the addresses and data from the first few instructions, and gives confidence that the analyzer is clocking correctly.
It may also show how much switch bounce is in the reset button, by starting over and over again several times. The most important part about using an analyzer is that setting "good" trigger sequences is an art that will be acquired over time.

Using An ICE

An In-Circuit Emulator (ICE) is different from a logic analyzer in that it replaces the microcontroller and allows the user a high degree of control over the execution of the processor. Because of this control, it is much more like the "Turbo Debug" environment. The user can single-step through the program, set breakpoints (much like in a regular debug program), and set watchpoints (executing until a variable is changed) that are checked in real time (not in slow-motion like the debug programs). At a break, you can examine the internal registers and the memory locations and the I/O ports. Many ICEs also include a trace option that allows the emulator to do the functions of an LA. This includes collecting a trace of where the execution has been, and fancy multi-level triggering. The ICE also supplies a few digital input lines for the user to connect as he pleases, to monitor the prototype hardware. The ICE manufacturer also may have made a deal with the C compiler companies to allow source level debugging of the code, including single-line stepping, and setting a break based on a line number, and examination of variables by name. This makes the C programmer even more at home, and reduces the learning time significantly. Since the ICE allows the execution to be stopped, the hardware's "watch-dog" timer must be disabled, if one exists. Also, disable code checksum tests during testing, since the ICE can change the contents of the program. Also, bear in mind that the C source single-step mode is line-oriented, so keep each line simple. An ICE is also good for patching "dumb mistakes" on the fly. For example, what if you wrote: "if (a == b)" but you meant "if (a > b)"?
Making that one small change could mean 20 minutes work if you have to go back to your desk, edit, recompile, relink, reload, etc. With the ICE, you can "patch" the code and continue. Conclusion The tools described in this article can be useful in various circumstances. A 'scope is sometimes my first line of attack because it does certain tasks better than the others (like measuring timing). An emulator with integrated logic analyzer seems like the most powerful tool, but sometimes a logic analyzer has more triggering levels, or more input lines, or something that is required for a particular problem. Also, while a logic analyzer isn't tied to any one microcontroller, an emulator generally is (though you can purchase "personality modules" for other processors). Be flexible. Yet don't tell your boss that you can do it all with just a 'scope, either. I believe that software schedules get off track worst during the debugging phase. Nobody wants to plan to make mistakes. Don't forget to allow time to learn any new tools or methods that you'll have to learn. In all of debugging, try to become "wholistic". Accept information through whatever means it comes, not just by staring blankly into the screen on the 'scope, logic analyzer, or emulator. If your product has LEDs, make sure they blink in the ways you expect them to. Listen to the clicks and clacks of external hardware, or to the change in tone of the power supply when the load changes. If you feel heat radiating from the hardware and you don't think it should, check it out -- just do so carefully; I've burned my fingers more than once removing PROMs that I installed backwards. If you smell smoke, make sure it's not your hardware. The choice of what tools to use can be very difficult. While the ICE seems the best choice, it can also be the most expensive, since you may need to buy a different one for the next project. 
Maybe the project is so simple that the code can be checked out on a PC, with minimal testing on actual hardware. Good debugging tools will never make up for bad programming, and many projects were completed without any fancy tools. The best tool to use is the best tool available for the task at hand. But who hasn't used a screwdriver handle to tap something into place, or a table knife to remove a screw? The craftsman can take what tools he has, and make them do his bidding.

Forked Interrupt Systems

Marc L. Allen

Marc L. Allen is a senior design engineer with Hamilton Test Systems, Inc., a subsidiary of United Technologies Corporation, where he designs point-of-sale systems and equipment. He has a B.S. in computer engineering from the University of Arizona. He may be contacted at Hamilton Test Systems, Inc., 2202 N. Forbes Blvd., Tucson, AZ 85745.

I recently designed a system controller for a PC-based point of sale credit card authorization system. This controller is capable of handling up to four subordinate terminals and several miscellaneous communications and storage devices. This application handles interrupts generated by keystrokes from subordinate terminals, communication activity, disk I/O, and an internal timer. These interrupts must be processed as quickly as possible while guaranteeing that every interrupt is processed. This system generates enough interrupt activity that I couldn't run with interrupts disabled for fear of missing one, and in certain cases would need to service an incoming interrupt before I had finished dealing with a previous interrupt from the same device. To address these needs I settled on a forked interrupt system running in protected mode and developed using Intel's IC-286 compiler under MS-DOS. A forked interrupt system utilizes a fork queue to serialize interrupts while minimizing the amount of time they are disabled. To do this, device drivers are broken up into two parts. The first handles the immediacy of the interrupt.
Since interrupts are disabled during this portion of the driver, it should perform only the minimum work required. This normally includes acknowledging the device, clearing the interrupting condition, and (for input interrupts) reading the input data. Finally, this interrupt-disabled portion of the driver places the interrupt in the fork queue to be completed by the second, interrupt-enabled, portion. The second part of the driver is activated by the fork queue task and performs the remaining interrupt processing. For communication devices, this portion might store incoming data or extract and send outgoing data, perform checksum or CRC calculations, and handle hardware handshaking details. For a timer interrupt, the interrupt-enabled portion would handle the effect of the timer event on the system. Listing 1 contains the two portions of a clock driver which uses this technique. The clock interrupts occur at some system-configurable interval and are used for task time-slicing and the handling of timer events on the tasks waiting for them. The interrupt-disabled portion of the clock driver, timer_int() (Listing 1), is one of the simplest interrupt-disabled portions in the system. The timer interrupt is cleared and then a utility routine is called to place the second half of the driver (the interrupt-enabled portion, alarm()) in the fork queue. fork_driver() effectively ends the interrupt-disabled portion of the clock driver by transferring control to the fork queue task. When this transfer occurs, the driver is suspended and is not resumed until another clock interrupt occurs. At this point, the driver completes the call to fork_driver() and continues to the top of the external while loop to handle the current interrupt. The meat of the driver is contained in alarm() (Listing 1). 
This interrupt-enabled code first informs the system that a significant event (a timeslice event) has occurred, increments a system tick counter, and processes any expired timers on the timer tick list. With interrupts enabled, other interrupts may occur and be placed in the fork queue while this driver is in operation. In fact, since the clock driver by design has no commonality between its two portions, a second clock interrupt can be placed on the fork queue while the present one is being handled. Naturally, if a driver can't keep up with its own device, it's eventually going to have some serious problems. But with a conservative queue size, the driver could get behind its device during a sudden burst of activity and still catch up during the following idle period. This can easily happen if many devices interrupt at the same time. Remember that each interrupt will suspend the current driver until the new interrupt can be placed into the fork queue. The call to fork_driver() in Listing 1 is not strictly correct. fork_driver() actually takes an additional long (four-byte) argument, allowing the interrupt-disabled portion to pass any necessary data to the interrupt-enabled portion. Although the choice of a long argument was appropriate for my system, any size is acceptable. This argument is passed to the interrupt-enabled portion as its first parameter. In practice, this parameter may be a character received over a communications line, some kind of device identifier, or a device status. Those who like to play games with parameters can use the long argument to pass two integer or character values or even a structure containing four characters. This is not ANSI standard and certainly is not portable C; however, it does make certain operations much simpler. As the clock driver has no need for any data, dummy is used as a place holder.
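Passing two 16-bit values through the single long argument can also be done with shifts and masks, which avoids the layout assumptions of a union or pointer cast. The sketch below is that alternative, not the technique used in the article's system; the function names are hypothetical.

```c
/* Pack two 16-bit values into the low 32 bits of a long. Shifts are
 * done on unsigned long so the result does not depend on byte order.
 * (Where long is exactly 32 bits, converting a value with the top bit
 * set back to long is implementation-defined, though it round-trips on
 * common two's-complement machines.) */
long pack_pair(unsigned int hi, unsigned int lo)
{
    unsigned long u = ((unsigned long)(hi & 0xFFFFu) << 16)
                    | (unsigned long)(lo & 0xFFFFu);
    return (long)u;
}

/* Recover the upper 16-bit value. */
unsigned int pair_hi(long packed)
{
    return (unsigned int)(((unsigned long)packed >> 16) & 0xFFFFu);
}

/* Recover the lower 16-bit value. */
unsigned int pair_lo(long packed)
{
    return (unsigned int)((unsigned long)packed & 0xFFFFu);
}
```

The interrupt-disabled portion would call pack_pair() with, say, a device identifier and a status word, and the interrupt-enabled portion would split them again with pair_hi() and pair_lo().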
The final parameter passed to the routine is a pointer to the driver's acting 80286 Task State Segment (TSS), a structure which contains all the driver specific information required by the system. I use the term "acting" because this TSS is not the original TSS for that driver. The original is reserved for the driver's interrupt-disabled portion. Otherwise, the original TSS might be active when the next device interrupt occurs, forcing a general protection fault while trying to activate a busy task. Listing 2 shows how fork_driver() operates. Notice that if the system was running a normal task, the fork queue task would start up to handle the latest fork entry. If the fork queue task was already running, this entry will be taken care of in due course, and if the system was executing a system service call, that call would be allowed to finish. The system scheduler will start the fork queue task at the completion of the system service. My system also contains a fork_continue() routine. It allows a driver to place an entry in the fork queue but returns control to the driver. fork_continue() is only used if the driver has more than one routine to fork. The last fork operation a driver performs should be through fork_driver(). The physical queue entry contains elements to store the address of the driver's original TSS, the address of the interrupt-enabled routine, the long parameter, and a link to the next element in the queue. I store the address of the original TSS so that the fork queue task can set up an environment identical to that of the driver before activating its interrupt-enabled portion. This allows a driver to switch to the activating task's Local Descriptor Table (LDT) and maintain the LDT association through the interrupt. The initial LDT switch would be performed when a system task initiates an I/O to the driver.
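A queue entry with the four elements listed above, together with a first-in, first-out drain loop, might be sketched as follows. This is not the code from the article's listings: the structure, field, and function names are hypothetical, the TSS is treated as an opaque pointer, and no task switching or interrupt masking is modeled.

```c
#include <stddef.h>

/* Hypothetical queue entry holding the four elements described. */
struct fork_entry {
    void *orig_tss;                          /* driver's original TSS */
    void (*routine)(long param, void *tss);  /* interrupt-enabled portion */
    long param;                              /* four-byte argument */
    struct fork_entry *next;                 /* link to the next entry */
};

static struct fork_entry *fork_head, *fork_tail;

/* Append an entry. On a real system this step would run with
 * interrupts disabled. */
void fork_enqueue(struct fork_entry *e)
{
    e->next = NULL;
    if (fork_tail != NULL)
        fork_tail->next = e;
    else
        fork_head = e;
    fork_tail = e;
}

/* Run every queued entry in arrival order, including entries added
 * while draining. Returns the number of entries executed. */
int fork_drain(void)
{
    int count = 0;

    while (fork_head != NULL) {
        struct fork_entry *e = fork_head;

        fork_head = e->next;            /* unlink before calling out */
        if (fork_head == NULL)
            fork_tail = NULL;
        e->routine(e->param, e->orig_tss);
        count++;
    }
    return count;
}

/* Sample routine that records the order in which parameters arrive. */
long seen[4];
int nseen;
void record(long param, void *tss)
{
    (void)tss;
    if (nseen < 4)
        seen[nseen++] = param;
}
```

Unlinking each entry before calling its routine means a routine may safely queue further work, which is then picked up by the same drain pass.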
Note that while the clock driver does not support direct I/O from a system task, it is activated by a number of system service calls regarding timeslicing and system timers. The interrupt-enabled portion does change LDTs to gain access to different tasks' parameter blocks, which may be in local data areas.

Once started, the fork queue task (fork_execute(), Listing 3) will execute all queue entries, including those added during queue execution. For each entry in the queue, the fork queue task creates an exact duplicate of the entry's original TSS with the following exceptions:

    The entry is set to run with interrupts enabled.
    The stack is switched to a special fork queue stack to avoid any interactions between the two portions of the driver.
    The current execution address is set to point to the fork_start() routine.

fork_start() (Listing 4) is used to front-end the entry's execution. It provides a stack environment for the entry to return to. No special routines need to be called by the entry routine to exit the queue.

After building a copy of the TSS, the fork queue task performs a task switch to the new copy. The new task starts running at fork_start() and calls the entry routine, passing the four-byte parameter and the address of the TSS copy. When the entry routine returns, fork_start() task switches back to the fork queue task, which continues with the next queue entry.

Although this implementation of a fork queue works well for my application, it has some limitations. While the fork queue increases the number of interrupts that can be handled during a burst of activity, the extra overhead also increases the interrupt latency (the time from when an interrupt occurs until its processing is completed). Additionally, entry routines are not allowed to utilize system services in the normal fashion.
To perform system services, I needed to place hooks allowing the entry routines to directly call internal functions that normal tasks can access only through the system service calls.

Future Directions

Presently, to run a routine at a very high priority I must have the calling task raise its priority, call the routine, and then lower its priority on return. Placing such routines on the fork queue would be much simpler. Because such a task would be a normal task routine, as opposed to a driver routine, it should have access to system services.

You could add system services capability to the fork queue by creating a real task for the fork queue. Presently, the fork queue task is an internal system task without all the information needed to handle system services. It isn't included in the system task table and isn't scheduled in the normal manner. Even if a fork queue entry could use system services, certain ones should be avoided or even ignored. Any service that requires the queue to block or wait would defeat the purpose of the fork queue.

Another possible extension to the forked interrupt system is a prioritized fork queue. Some devices may be considered more important than others. For instance, an imminent power failure interrupt should take higher precedence than a clock interrupt.

Conclusion

The forked interrupt system has shown itself to be a good way to serialize interrupts. Drivers are easier to write since reentrancy is not required. The fork queue allows these non-reentrant drivers to operate in an environment where interrupts are mostly enabled, allowing a faster burst rate of interrupts to be handled in a timely fashion.

Listing 1

/*
    timer_int() -- Timer interrupt routine

    This routine handles the incoming timer interrupt. The
    interrupt is acknowledged and cleared. Then the alarm()
    routine is forked to handle the rest of the timer stuff.
*/

void timer_int()
{
    while (1) {
        /* Clear timer interrupt here. */
        outp(0x20, 0x20);

        /* Fork the processing routine.
        */
        fork_driver(alarm);
    }
}

/*
    alarm() -- Timer Alarm routine

    This routine is the fork routine of the system timer device.
    It will alert any tasks that have expiring system timers
    this tick.
*/

void alarm(dummy, tcb)
unsigned long dummy;
TSS *tcb;
{
    /* Time slice -- Significant event */
    significant_event = 1;

    /* One more tick... */
    ++timer_interrupts;

    /* For each item on the queue (which is in least to most time
       to wait order), see if the top of the queue is ready to
       alert. Alerting consists of setting the target task's event
       flags and delivering any required Asynchronous Traps (ASTs). */
    while (timer_waiting) {
        /* Change to proper LDT of next task. */
        set_ldt(timer_waiting_ldt, tcb);

        /* Check to see if timer has expired for this task. */
        if ((long) (timer_interrupts -
                ((P_TIMER *) timer_waiting -> pblock) -> wakeup) >= 0) {
            /* Timer has expired. Set appropriate event. */
            set_event(timer_waiting -> my_handle.t,
                      timer_waiting -> ef_cluster,
                      timer_waiting -> ef_mask);

            /* Deliver AST as required. */
            deliver_ast(timer_waiting -> my_handle.t,
                        timer_waiting -> ast_addr);

            /* Remove from timer list and continue. */
            timer_waiting_ldt = timer_waiting -> link_ldt;
            timer_waiting = timer_waiting -> link;
        }
        else
            /* Timer hasn't expired. Don't check any more since
               the tasks are sorted in ascending order. */
            break;
    }
}

Listing 2

/*
    fork_driver(routine, param)
    void (*routine)();
    unsigned long param;

    This routine places the passed routine onto the fork queue.
    The driver is allowed to pass up to four bytes to the target
    routine. The driver's action is considered complete, and it
    will not be reentered until another interrupt occurs.
*/

void fork_driver(routine, param)
void (*routine)();
unsigned long param;
{
    FORK_PARAM *current_fork;
    DSS *current_dcb;
    unsigned char current_name[NAME_SIZE + 1];

    /* Get address of driver's TSS. */
    current_dcb = get_tss(NULL);

    /* Make sure we are allowed to fork this device again.
    */
    if (current_dcb -> current_fork_count >= current_dcb -> max_fork_count
            && current_dcb -> max_fork_count) {
        /* Can't fork another entry on this device.
           Throw away interrupt. (Serious Problem Here!) */
        if (in_executive || fork_in_process)
            resume_last();
        else
            resume(scheduler_task);
        return;
    }

    /* Get next entry. */
    if (!(current_fork = fork_free)) {
        /* Out of fork space.
           Throw away interrupt. (Serious Problem Here!) */
        if (in_executive || fork_in_process)
            resume_last();
        else
            resume(scheduler_task);
        return;
    }

    /* Get the next free fork queue entry link */
    fork_free = fork_free -> link;

    /* Fill the entry with the address of the driver's TSS, the
       interrupt-enabled routine address, the passed parameter,
       and clear the link. */
    current_fork -> tcb = current_dcb;
    current_fork -> routine = routine;
    current_fork -> param1 = param;
    current_fork -> link = NULL;

    /* Add one to the count of fork entries for this device. */
    ++current_dcb -> current_fork_count;

    /* Link it onto the end of the fork queue */
    if (fork_queue)
        fork_queue_tail = fork_queue_tail -> link = current_fork;
    else
        fork_queue = fork_queue_tail = current_fork;

    /* If in a system service call, or currently executing the
       fork, resume what we were last doing. Otherwise, start up
       the fork queue. */
    if (in_executive || fork_in_process)
        resume_last();
    else
        resume(fork_queue_task);
}

Listing 3

/*
    fork_execute() -- Execute the fork queue

    This task executes all elements on the fork queue. While the
    queue is executing, other elements can be placed onto the
    queue. The queue operates by setting up a special TSS to be a
    duplicate of the queued driver's TSS, except for the current
    CS:IP and the stack. This allows all forked routines to execute
    using the exact same environment they would have if the task
    (or driver) had called the routine directly.

    Actually, one environmental difference may occur. All routines
    executing on the fork will be run with interrupts enabled.
    In addition, the forked routine will be called in such a manner
    that all it needs to do is issue a return to exit the routine.
    The queue will handle the rest. Any return value issued by the
    forked routine will be ignored.
*/

void fork_execute()
{
    DWORD fork_addr;
    OFFSET fork_sp;
    SELECTOR fork_ss;
    FORK_PARAM *owner;
    unsigned char current_name[NAME_SIZE + 1];

    current_name[NAME_SIZE] = '\0';

    /* Initialize some constants, such as the base stack,
       and the queue startup routine to use. */
    fork_ss = fdummy_tcb.ss;
    fork_sp = fdummy_tcb.sp;
    fork_addr.whole = (unsigned long) fork_start;

    /* Loop to continue task at each invocation */
    while (1) {
        /* Tell the world that we are running the queue */
        fork_in_process = 1;

        /* As long as we have something to do.... */
        while (fork_queue) {
            /* Get the next element */
            owner = fork_queue;
            fork_queue = fork_queue -> link;

            /* Set up the information for fork_start() */
            current_routine = owner -> routine;
            current_param = owner -> param1;

            /* Set up the TSS to give the target routine access
               to the owning task's LDT. Also, set up for
               interrupts enabled. */
            movemem(owner -> tcb, &fdummy_tcb, 44);
            fdummy_tcb.cs = fork_addr.high;
            fdummy_tcb.ip = fork_addr.low;
            fdummy_tcb.ss = fork_ss;
            fdummy_tcb.sp = fork_sp;
            fdummy_tcb.flag_word = F_IE;

            /* Execute task (Task switch) */
            fdummy_task();

            /* One less fork entry for this driver. */
            ((DSS *) owner -> tcb) -> current_fork_count--;

            /* Place used entry back on free list */
            owner -> link = fork_free;
            fork_free = owner;
            fork_count--;
        }

        /* Clear fork flag and reschedule */
        fork_in_process = 0;
        resume_cl(scheduler_task);
    }
}

Listing 4

/*
    fork_start() -- Start the forked routine

    This routine is used to call the forked routine. It passes the
    four bytes to the target routine and then resumes the previous
    task, which should always be the fork queue.
*/

void fork_start()
{
    /* Call the routine, passing the long parameter and the
       address of the working TSS.
    */
    (*current_routine)(current_param, &fdummy_tcb);

    /* Task switch back to the fork queue task. */
    resume_last();
}

Building A Better Boolean With C++

Ron Burk

Ron Burk has a B.S.E.E. from the University of Kansas and has been a programmer for the past 10 years. He is currently president of Burk Labs, a small software consulting firm.

C++ continues to evolve as a set of incremental changes and additions to C. As a C programmer, you can begin to learn and use C++ in exactly the same way by making incremental changes to your programming style and to the language features that you use. This article provides a simplified introduction to C++ data types by altering a C data type a step at a time to construct a better, C++ data type.

C++ offers many features that can be used when constructing new data types, but this article will focus on just two: data hiding and user-defined conversions. These two language features give you the ability to define C++ data types that have many of the same privileges as built-in data types such as int or float.

What Do You Want From A Data Type?

Different languages take different approaches to data types. On one end of the spectrum are typeless languages; for example, a BASIC interpreter may have a single string data type and automatically convert between numeric and string formats when required. On the other end of the spectrum are languages with strict type checking; Pascal, for instance, detects and disallows any attempt to use a variable that is not of the correct data type.

C falls somewhere between these two extremes. It contains multiple data types (and allows the user to create more), but it also provides automatic conversions between certain data types. C++ is stricter about type checking than C. For example, the following code fragment is legal C, but not legal C++:

    main()
    {
        void f();
        f(45);
    }

In C, you don't have to specify the number and type of arguments that a function requires; in C++ you must.
Even a simple data type such as Boolean can be implemented in a number of different ways. The best implementation depends upon your circumstances. For example, if you are making a data type for a project that three programmers will work on for four months, you will probably favor an implementation that is easy to use and simple to construct. On the other hand, if you are making a data type for a five-year project that will involve 100 programmers, you may be willing to give up some ease of use in exchange for stricter type checking so the compiler can catch more programmer mistakes.

It is in the larger coding projects, involving more than one person, that C++ holds a clear advantage over C. Why invest time carefully designing a Boolean data type if you never write programs more than a few pages long? As the type-constructing tools provided by C++ are contrasted with those of C, you will see how you can use C++ to tailor a type definition to your own unique needs.

Designing a Boolean Type

Although C does not contain a built-in Boolean type, the concept arises twice in the language. First, the result of a relational expression is defined in C to be of type int and equal to either one or zero. Second, C control statements consider any non-zero expression to be true.

Therefore, it could be desirable for a Boolean type to have analogous properties. First, a Boolean variable should always be equal to either zero or one. Second, you should be able to assign any integer to a Boolean variable and the result should be one if the integer was non-zero; the goal is to construct a Boolean type which is compatible with the C type int. The following code, for example, should perform correctly:

    bool func()
    {
        bool more;
        while (more = fread(/*args*/)) {
            if (/*some condition*/)
                return more;
        }
    }

The standard library function fread() reads data items from a file and returns the number of items successfully read.
In this example, the caller is only interested in whether the number of items read successfully is zero (end-of-file) or non-zero. The code above can only work if the type bool is compatible with the normal built-in scalar types. Otherwise, assigning an int to a bool would require gyrations like this:

    while (more = (bool)fread(/*args*/)) {
    /* or worse: */
    while (more = (fread(/*args*/) != 0)) {

Of course, if you accidentally assigned an int to a bool in some program, the compiler would not complain. Choosing to make a Boolean type that is compatible with scalars exchanges some compiler error detection for ease of use.

typedef Does Not Create Types

Here is a possible implementation of a Boolean type in C:

    /* bool.h */
    typedef int bool;

One problem with this implementation is that typedef does not create a new, distinct type; it only creates a synonym for an existing type. In a big project, you might have more than one data type that is an int-compatible scalar, but they may have no relationship to the bool data type. Unfortunately, if they are created with typedefs as shown above, they will all be synonyms for int and the compiler will let you mix and match them without complaining.

Although typedef does not create a new data type, struct does. Here is an alternative implementation of bool in C:

    /* bool.h */
    typedef struct {int val;} bool;

This introduces a new type called bool which is incompatible with other data types. The compiler will object if you try to assign a variable of a different data type to a variable of type bool. Unfortunately, the compiler will also object if you try to assign an integer to your bool variable:

    bool a = 1;     /* error */
    bool b;
    b.val = 1;      /* ok, but painful to type */

User Conversions

Although a C struct creates a new data type, it does not have any facility for telling the compiler what other data types should be compatible with the new one.
In this case, you want to be able to specify that an int can be converted into a bool, and a bool can be converted into an int. You could augment your C Boolean type with type conversion functions. The result might look like this:

    /* bool.h */
    typedef struct {int val;} bool;
    bool booli(int i);
    int ibool(bool b);

The definition of the conversion functions would be placed in another file:

    /* bool.c */
    #include <bool.h>
    bool booli(int i)
    {
        bool ret;
        ret.val = i;
        return ret;
    }
    int ibool(bool b) {return b.val;}

Finally, you have a new data type, bool, which can be converted to and from integers. This C implementation has several problems, however. First, this bool implementation is clumsy to use; the syntax is less than elegant. A typical use of this implementation might be:

    bool b;
    b = booli(x > 5);
    printf("b = %d\n", ibool(b));

Second, there is a problem with the structure member val. Each variable of type bool should always be either zero or one. So long as the programmer uses the conversion functions, all is well. Unfortunately, there is nothing to prevent the data type user from making mistakes such as this one:

    int l = 45;
    bool b;
    /* set b to 1 */
    b.val = l;

In this case, the Boolean has been set equal to 45 because the letter "l" looks like the numeral "1".

Finally, the conversion functions create extra time and space overhead in the generated code. A function call is relatively cheap in C, but it should not be necessary for such simple conversions as these.

Converting To C++

You can repair these deficiencies by taking advantage of C++ language features.
The first change is to remove the typedef:

    /* bool.h */
    struct bool {int val;};
    bool booli(int i);
    int ibool(bool b);

In C++, unlike C, a struct declaration causes the structure tag name to become a new data type; in other words, the following code compiles in C++ but it doesn't in C:

    #include <bool.h>
    /* legal C & C++ */
    struct bool b;
    /* legal C++, not C */
    bool b;

C++ also has a keyword, called private, that you can use to prevent the data type user from accidentally setting val to something other than zero or one. Consider the following header file:

    /* bool.h */
    struct bool {
    private:
        int val;
    };
    bool booli(int i);
    int ibool(bool b);

The private keyword tells the compiler that the structure members that follow it cannot be accessed from outside this data type. Thus, the following code becomes illegal:

    #include <bool.h>
    ...
    bool a;
    a.val = 5;      /* error: val is private */

Of course, we've painted ourselves into a corner now: booli() and ibool() won't compile because they need to access val. Private structure members aren't much use without a syntax for allowing certain functions to access them. The easiest way to do that is with the friend keyword, like this:

    /* bool.h */
    struct bool {
        friend bool booli(int i);
        friend int ibool(bool b);
    private:
        int val;
    };

This makes booli and ibool friend functions of the structure bool. Now the conversion functions will compile, since the compiler knows they are allowed to access the private members of bool. Since you control the functions that can modify a bool, you can guarantee that it only takes on the values zero and one.

You can make the connection between the conversion functions and the data type more explicit by making the conversion functions member functions of the structure. A member function of a structure has the same privileges as a friend function, but it has a different invocation syntax and can only be used to operate on the data type with which it was declared.
Changing the two access functions into member functions results in the following header file:

    /* bool.h */
    struct bool {
        void booli(int i);
        int ibool();
    private:
        int val;
    };

Two things have changed: the friend keyword is removed, and bool is no longer passed to, or returned by, the member functions. Here is an example of legal invocations of the two functions:

    #include "bool.h"
    bool b;
    b.booli(45);
    int i = b.ibool();

As you can see, the syntax for calling member functions in a structure is analogous to the syntax for selecting data members in a structure. A member function is implicitly passed a pointer to the variable it was invoked with; that is why ibool() no longer needs an explicit bool argument and booli() no longer needs to explicitly return a bool value.

The definition of the two member functions must also change if they are not friend functions. The corresponding changes in bool.c are:

    /* bool.c */
    #include <bool.h>
    void bool::booli(int i) { val = i; }
    int bool::ibool() { return val; }

The first change results from the fact that the complete name of a C++ member function is typename::function, which distinguishes it from any non-member functions. This syntax allows you to use the same member function name in different data types without any conflict.

The second change is that the functions can refer to the data members of the structure (in this case, val) as though they were local variables. This is possible because member functions cannot be invoked without some variable of the correct data type.

Of course, the user of data type bool still has to type things like b.ibool(), just to reference a Boolean; however, you now have hidden the implementation of the data type. If you ever discover a bool that has been set to something other than zero or one, you know the bug must be in the member functions. If you want to change val to be a char instead of an int, you can be confident that only the member functions will be affected by the change.
Data Type Initialization

Actually, there is a problem in the protection for bool variables; if the user doesn't initialize a bool variable, then it may contain garbage instead of a one or a zero. You can eliminate this loophole by telling the compiler that each variable of data type bool must be initialized when it is declared. This is done by declaring a special member function called a constructor.

A constructor looks like an ordinary member function except that it has the same name as the data type and it has no return type. In this case, you can simply change the name booli() to bool() as follows:

    /* bool.h */
    struct bool {
        bool(int i);
        int ibool();
    private:
        int val;
    };

Creating a constructor for a data type has three main implications. First, the compiler will no longer allow a variable of that data type to be declared without being initialized. Second, if the constructor takes a single argument as this one does, the compiler will use the constructor whenever it sees a cast from the argument data type into the data type the constructor is a member of. Finally, if the constructor takes a single argument, it defines an implicit conversion from that argument's data type into the data type the constructor is a member of. These implications deserve further explanation.

Defining a constructor guarantees you won't have any uninitialized variables of a particular data type. In other words, with the newest header file, the following code won't compile:

    #include "bool.h"
    bool b;     /* error */

The compiler will complain that the variable b must be initialized. The constructor also introduces a new syntax for initializing the variables of its data type:

    #include "bool.h"
    bool b(9);

In the statement shown, the function bool::bool() gets called to convert the integer 9 into a Boolean 1.

Defining a constructor with a single argument also enables the cast operator for that data type.
In this case, the constructor will be called whenever you cast an integer into a Boolean as in the following example:

    #include "bool.h"
    bool b(1);
    int i = 38;
    ...
    b = (bool) i;

In addition to the C-style type cast, C++ allows a function-style type cast. For example, in C++ you can say:

    int i;
    long l;
    l = (long)i;    /* legal C & C++ */
    l = long(i);    /* legal C++ not C */

The function-style syntax is simply easier to read. This expands the number of ways you can initialize a bool to three:

    #include "bool.h"
    bool b(9);
    bool c = (bool)9;
    bool d = bool(9);

In all three cases, the function bool::bool() gets called to perform the conversion.

Finally, a constructor with one argument defines an implicit conversion. Just as you can assign an int to a long because of C's built-in implicit conversion, you can now assign an int to a bool, because an implicit conversion has been defined for it. In other words, you don't have to use the cast operator and the following code fragment is legal C++:

    #include "bool.h"
    bool b(1);
    int i;
    b = i;      /* OK */

Now you have a Boolean that can be assigned an integer value which gets converted by the function you've defined.

There is no way to define a constructor function that takes a single argument without getting all three of these effects. This is a fact to ponder when you define a constructor for a data type. You can't tell the compiler to require the user to use an explicit cast to convert an integer into a Boolean. You can't tell the compiler to allow a Boolean to be initialized with an integer, but not assigned an integer. Sometimes, this forces you to avoid using a constructor and return to the explicit function notation used previously.

If you want all of your Booleans to be initialized, but don't want to have to type all those initializers, you can take advantage of a C++ feature called default arguments.
Here is how it works:

    /* bool.h */
    struct bool {
        bool(int i=0);
        int ibool();
    private:
        int val;
    };

Now, whenever the compiler would normally call bool::bool() but does not have an argument for it, it will use a value of zero. Thus, both of the following statements cause bool::bool() to be invoked with an argument of zero:

    #include "bool.h"
    bool b;     /* initialize to zero */
    bool b = bool();

Overloading The Cast Operator

Telling the compiler how to implicitly convert ints into bools is only half the job of making the two data types compatible. The remaining task is to replace bool::ibool() with a conversion function that tells the compiler how to implicitly convert bools into ints.

If int were a user-defined type instead of a built-in type, you could define within it a constructor that takes a single bool argument and that would do the trick. Since that is not possible, C++ gives you another way to define implicit conversions: you can redefine how the cast operator operates on your data type. (In fact, you can redefine how almost any operator operates on your data type.)

Once again, the type conversion requires a member function, but this time it is an operator member function. This revised header replaces ibool() with a type conversion function:

    /* bool.h */
    struct bool {
        bool(int i);
        operator int();
    private:
        int val;
    };

The revised specification of bool tells the compiler you have defined a function that it can use whenever casting a bool into an int. The revised implementation looks like this:

    /* bool.c */
    #include <bool.h>
    bool::bool(int i) { val = (i != 0); }
    bool::operator int() { return val; }

Having a function named bool::operator int() is a little disconcerting, but the code is otherwise the same. Just as with constructor functions, the conversion function can be used either implicitly or explicitly.
The following code, for example, is legal:

    #include "bool.h"
    bool b=1;
    int i;
    /*...*/
    i = (int)b;
    i = int(b);
    i = b;

In all three assignments, the function bool::operator int() is called to transform the bool into an int.

There is no way to define a type conversion that must be used explicitly. Just as with constructors, when implicit conversions are undesirable, you must resort to something like the explicit member function call defined previously using bool::ibool().

The bool data type is nearly finished now. The original example of the desired usage of the data type is completely legal now:

    bool func()
    {
        bool more;
        while (more = fread(/*args*/)) {
            if (/*some condition*/)
                return more;
        }
    }

You can freely mix bools and ints while still controlling initializations of, and assignments to, bools and still hiding the actual representation of a Boolean so that you can change it to short or char without impacting the data type user's code.

Efficiency

The final deficiency to remove from the bool definition is the function call overhead. In defining complicated data types in which the conversion functions contain a lot of code, the function call overhead would not be significant. In this example, however, the function call overhead is probably larger than the entire body of the conversion functions.

You can remove that overhead quite easily by placing the function definition right in the data type specification. Doing that and adding some useful constants results in the following header file:

    /* bool.h */
    #define FALSE 0
    #define TRUE 1
    struct bool {
        bool(int i) {val = (i!=0);}
        operator int() {return val;}
    private:
        int val;
    };

This style of function definition makes the functions inline, which means the compiler will try to generate inline code for them each time they are used, rather than call them as functions. There is actually an inline keyword in C++ which you can place in front of any function you would like to be inline.
The simpler the function is, the more likely the compiler can generate inline code for it. When the compiler cannot inline the code, it simply generates a normal function call. Functions that are defined inside data type specifications are automatically inline functions, so there was no need for the keyword in the above specification.

Now the bool data type is as efficient as anything you could implement in C to do the same thing. Each variable of type bool is guaranteed to be either zero or one at all times and you are free to change how the Boolean is stored and how it is converted to and from integers without affecting any other code.

Unfortunately, our current specification still looks a bit like a C programmer coded it; here is how a more fluent C++ programmer might write it:

    // bool.h
    class bool {
        int val;
    public:
        bool(int i) {val = (i!=0);}
        operator int() {return val;}
    };
    const bool FALSE=0, TRUE=1;

For single-line comments, it's often easier to use the C++ comment convention //. The keyword class is just like the keyword struct except that all of its members are private by default, whereas in a struct all the members are public by default. C++ has both a private keyword and a public keyword and they complement each other. Finally, the #define constants are replaced with constants that have the type bool rather than type int. Now the specification looks more like "real" C++.

Making Incompatible Types

As mentioned, a large project might contain more than one data type that is compatible with ints. For example, a game simulation might contain a data type called playercount (a non-negative count of the number of players within a particular grid). It could be desirable for a data type like this to be compatible with ints just as bool is, but would code like this also be legal?

    #include "bool.h"
    #include "player.h"
    playercount p;
    bool b;
    p = b;

The answer, thankfully, is "no".
Even if the compiler has been told how to implicitly convert a playercount into an int, and how to implicitly convert an int into a bool, the rule is that the compiler never uses more than one user-defined implicit conversion on a single value. Without this rule, the number of possibly unwanted implicit conversions would mushroom as you added each data type with a user-defined conversion. If you really want a second user-defined conversion to be applied, you can code it explicitly:

    #include "bool.h"
    #include "player.h"
    playercount p;
    bool b;
    p = playercount(b);

Conclusion

C++ makes it easy to construct data types that isolate implementation details. Additionally, user conversions give you a degree of control over how your data types interact with other data types.

Like many features of C++, it is possible to get into trouble defining new data types. In particular, the danger of user-defined conversions is that the compiler will perform silent conversions you did not intend. This is a very real problem when you construct multiple data types for a large project, each with its own conversion functions. However, these potential problems are outweighed by the advantages of being able to design a type system that suits the individual needs of your project.

A Practical C File I/O Tutorial: A Mini-Database Program

Leor Zolman wrote "BDS C", the first C compiler designed exclusively for personal computers. Since then he has designed and taught programming workshops and has also been involved in personal growth workshops as both participant and staff member. He still doesn't hold any degrees. His latest incarnation is as a CUJ staff member.

Series Introduction

If you're a recent convert to C from any other high-level language and you've tried to write programs that do any serious file input/output, then chances are you've experienced more than a little bit of frustration.
The C standard library, in keeping with the general philosophy of the C language, provides tools powerful enough for doing anything you want, provided you know how to correctly combine those tools. In the case of file I/O, about the only operations supported in a "trivial" manner are reading and writing bytes, and reading and writing single lines of text. For reading and writing any other flavor of data structure to or from the disk, a certain level of "C sophistication" is required. Often, the task quickly moves beyond "How do I read and write this data?" toward the more general problem, "What is the most appropriate way to represent this data in order to facilitate efficient means of reading and writing it to disk?" In this series of tutorial articles, I will develop from scratch a complete special-purpose "mini-database" system in order to illustrate the process of designing file-based C applications. The resulting system will be functional but intentionally inadequate for any particular task. This first installment will consist of an operational description of the database, broken into the following areas: data structures, functional description, and user interface (the menu system). The second installment will present the database record editing and management mechanism. Later installments will present several different approaches to storing the data on disk and will discuss the relative merits of each approach. The first version will store all data to disk as user-readable text and will use statically-allocated arrays in memory. The second version will store all data to disk in binary format for rapid transfer. I'll also develop two memory allocation systems for the binary version: static array allocation (same as for the textual disk format) and dynamic array allocation (to optimize the use of system memory).
Mini-Database Data Record Structure

This will not be a "general-purpose" database system, but rather a program built to handle only one specific record format: a personnel record as in Table 1. The definition of the structure tag for this record, named record, is shown in the header file (Listing 1, lines 30-37). The system will be able to handle only one active database at a time. We'll use dynamic memory allocation to obtain storage for the data records, so that data memory is allocated only when necessary. For the first version of the system, the list of data record pointers will be kept in a statically-allocated (i.e., fixed-length) array. The definition of this array is shown on line 49 of the header file. The name of the array is recs, and its type is "array (of MAX_RECS elements) of pointers to structures of type record". The programmer must explicitly size a fixed-length array. In my code the size is MAX_RECS. Thus, the total amount of fixed memory needed for storing the records of the database is MAX_RECS times the size of a single record pointer. (In later versions, I'll even show how to dynamically allocate the storage for the recs array itself. To facilitate this modification the symbol RECS is introduced (Listing 1, line 50) as an alias for recs.)

Lines 42-46 (of Listing 1) illustrate a necessary complication when writing multiple-source-file programs: global data must be defined in one module and one module only. If the data is to be known in any other modules of the program, it must be declared in those other modules. Definitions actually reserve storage for the specified data, while declarations only serve to inform the compiler about the nature of data defined elsewhere. This simplistic rule of thumb will usually differentiate between definitions and declarations appearing in header files: If the extern modifier is used, you're probably looking at a declaration; otherwise, you're probably looking at a definition.
To conform with the ANSI Standard, each data item should be defined only once among all the source modules of a program. At first it might seem one need only insert an extern keyword in front of all but one declaration, making it the definition. Unfortunately, this is not easily done. Typical multiple-module programs use lots of shared data; do we really want to maintain separate lists of declarations in separate modules, some having the extern keyword and some not? Of course not; we'd rather have all the data included within a single .h file. But if the declarations/definitions must be written differently in separate files, can we really use a single header file? Yes. Lines 42-46 illustrate a symbolic constant to control whether the extern keyword is generated for the critical declarations. If MAIN_MODULE exists, then we are compiling the main module of the program and the symbolic constant EXTERN is defined to nothing (so the items in lines 49, 52 and 53 are defined). Otherwise EXTERN is defined to extern and the lines are treated as declarations of external data. To force definitions to be created as the main module is compiled, we #define MAIN_MODULE (see Listing 2, line 24) before the inclusion of the header file. The other modules of the program do not contain such a definition. (Note: The difference between definitions and declarations has been rendered fuzzy by variations among C compilers over the years. Microsoft, perhaps to eliminate the need for exactly the sort of mechanism just presented, decided to make its linker allow multiple definitions of the same piece of data among source files of a program (although multiple initializations were still flagged as errors.) While this does simplify development in some cases, it renders C programs relying on this "feature" non-portable. Turbo C 2.0, under which this database program was developed, makes you "do it right", even if doing so requires a little bit more thought.) 
Other Global Data The system maintains a minimal amount of global data to describe the currently open database's state. The variable n_recs tells how many records are currently held in memory. The variable max_recs contains the maximum number of records that can be represented. For the fixed-length array version, max_recs is simply assigned the value of the symbolic constant MAX_RECS (Listing 2, line 72). A Simple Menu System A simple line-oriented menu system serves as the user interface. The menu function do_menu is shown in MDBUTIL.C (Listing 3, lines 21-39.) The menu consists of a list of pointers to structures of type menu_item (Listing 1, lines 55-58), where each menu_item consists of an integer action code and a string description of the action. do_menu simply numbers and lists out each description (up to but not including the first entry with action_code of 0), asks the user to pick one of the choices, and returns the action_code value associated with the selected item. Note that the action_code values need not correspond to the choice numbers displayed by the function. The Main Menu Options The database operations are divided into two menus. The first menu (Listing 2, lines 37-48) contains the options for controlling database selection, disk I/O and program termination. The second menu, within the MDBEDIT.C module (shown in a future article) controls all the options associated with editing the data records of the currently active database. The main menu controls the top-level system functions. A variable, db_active, tells whether a database is currently open, and thus whether certain operations are appropriate. For example, we don't want to allow the user to open a new database if another is currently open. The main menu options are as follows: CREATE: Initialize a new database. 
Ask the user for a name for the database (this will also be the name of the file used to store the database on disk) and check to make sure another file does not already exist by that name. If the name is OK, then initialize max_recs, n_recs and db_active.
EDIT: Call the edit_db() function to edit the records of the database.
OPEN: Load a previously stored database from disk (via the read_db() function), then go immediately into editing that database by calling edit_db(). read_db() allocates the appropriate amount of memory for the database records, assigns the pointers to elements in the RECS array, and returns the number of records loaded. We announce the number of records before calling edit_db().
BAKUP: This menu entry is included to encourage backup facilities in your applications. The backup function, backup_db(), is just a dummy.
CLOSE: Terminate operations on the current database, write it to disk and free up all associated storage.
SAVE: Write the database to disk, preventing loss of work "so far" in case of a system crash. Do not close the database.
ABANDON: Close the database without saving it to disk; free up all storage.
QUIT: Exit the program.

Utility Functions

Listing 3 shows the source module MDBUTIL.C, containing utility functions used throughout the program. In addition to the do_menu() function (already described), this module includes error(), alloc_rec() and free_up(). The error() function is a general-purpose fatal exit. It prints a message and exits. The alloc_rec() function is not used by any of the code in this month's listing, but is basic to the operation of the program. alloc_rec() is called to obtain memory from the system to store a single record of database data. The malloc() function is called to actually obtain the block of storage. alloc_rec() returns either NULL, signaling that the system has no more storage to spare, or a valid memory pointer obtained from malloc().
The free_up() function returns all storage (obtained through calls to alloc_rec()) back to the system. In this system storage is always freed up for the entire database at one time (when the current database file is closed or abandoned). Freeing that storage is simply a matter of walking through all the records of the database and calling the free() function for each pointer.

Next month: Editing the database records.

Table 1

Name     Type       Value
active   char       1 if record is active, 0 if deleted
last     char[25]   Last name
first    char[15]   First name
id       long       ID number
age      int        Age
gender   char       'M' or 'F'
salary   float      Annual salary

Listing 1

 1: /*
 2:  * MDB.H (Static-Array-Only Version)
 3:  *
 4:  * Program: Mini-Database
 5:  * Written by: Leor Zolman
 6:  * Module: Program Header File
 7:  */
 8:
 9: #define TRUE 1
10: #define FALSE 0
11:
12: /*
13:  * Prototypes:
14:  */
15:
16: int do_menu(struct menu_item *mnu, char *title);
17: void write_db(char *filename);
18: int read_db(char *filename);
19: void edit_db();
20: void fix_db();
21: void backup_db();
22: void error(char *msg);
23: struct record *alloc_rec(void);
24: void free_up();
25:
26: /*
27:  * Data Definitions:
28:  */
29:
30: struct record {                /* Database record definition */
31:     char active;               /* TRUE if Active, else FALSE */
32:     char last[25], first[15];  /* Name */
33:     long id;                   /* ID Number */
34:     int age;                   /* Age */
35:     char gender;               /* M or F */
36:     float salary;              /* Annual Salary */
37: };
38:
39: #define MAX_RECS 1000          /* Maximum number of records */
40:
41:
42: #ifdef MAIN_MODULE             /* Make sure data is only */
43: #define EXTERN                 /* DEFINED in the main module, */
44: #else                          /* and declared as EXTERNAL in */
45: #define EXTERN extern          /* the other modules. */
46: #endif
47:
48:
49: EXTERN struct record *recs[MAX_RECS];  /* Array of ptrs to */
50: #define RECS recs                      /* structs of type record */
51:
52: EXTERN int n_recs;             /* # of records in current db */
53: EXTERN int max_recs;           /* Max # of recs allowed */
54:
55: struct menu_item {             /* Menu definition record */
56:     int action_code;           /* Menu item code */
57:     char *descrip;             /* Menu item text */
58: };
59:

Listing 2

  1: /*
  2:  * MDBMAIN.C (Static Array Only Version)
  3:  *
  4:  * Program: Mini-Database
  5:  * Written by: Leor Zolman
  6:  * Module: Main Program Module
  7:  *
  8:  * Program Description:
  9:  *    This system is an "introductory showcase" of
 10:  *    C programming techniques for File I/O-related
 11:  *    applications. Areas of focus include:
 12:  *       Static and Dynamic Array Allocation
 13:  *       Text-based and Binary-based Disk Data Storage
 14:  *       Elementary user-interface and error-handling
 15:  *
 16:  * Compile & Link (Turbo C):
 17:  *    tcc mdbmain.c mdbedit.c mdbutil.c
 18:  *        {mdbftxt.c or mdbfbin.c}
 19:  */
 20:
 21: #include <stdio.h>
 22: #include <stdlib.h>
 23:
 24: #define MAIN_MODULE 1      /* force data definitions */
 25: #include "mdb.h"
 26:
 27:
 28: #define CREATE 1           /* Main menu action codes */
 29: #define OPEN 2
 30: #define EDIT 3
 31: #define SAVE 4
 32: #define BAKUP 5
 33: #define CLOSE 6
 34: #define ABANDON 7
 35: #define QUIT 8
 36:
 37: static struct menu_item main_menu[] =
 38: {
 39:     {CREATE, "Create New Database"},
 40:     {OPEN, "Select Existing Database to Work With"},
 41:     {EDIT, "Edit Database Records"},
 42:     {SAVE, "Write Database to Disk"},
 43:     {BAKUP, "Backup Database to Floppies"},
 44:     {CLOSE, "Close the Database"},
 45:     {ABANDON, "Abandon Changes to the Current Database"},
 46:     {QUIT, "Quit"},
 47:     {0}                    /* End of list (action_code 0) */
 48: };
 49:
 50:
 51: main(int argc, char **argv)
 52: {
 53:     char db_name[150];
 54:     int db_active = FALSE;     /* No Database open */
 55:     FILE *fp;
 56:
 57:     while (1)
 58:     {
 59:         switch(do_menu(main_menu, "Main Menu"))
 60:         {
 61:         case CREATE:
 62:             if (db_active)
 63:                 goto still_open;
 64:             printf("Name for new Database? ");
 65:             gets(db_name);
 66:             if ((fp = fopen(db_name,"r")) != NULL)
 67:             {
 68:                 printf("That filename already exists.\n");
 69:                 fclose(fp);
 70:                 break;
 71:             }
 72:             max_recs = MAX_RECS;
 73:             db_active = TRUE;
 74:             n_recs = 0;
 75:             printf("Entering EDIT mode:\n");
 76:             /* After creating, fall through to EDIT */
 77:
 78:         case EDIT:
 79:             if (!db_active)
 80:                 goto inactive;
 81:             edit_db(db_name);      /* Edit recs in memory */
 82:             break;
 83:
 84:         case OPEN:
 85:             if (db_active)
 86:             {
 87: still_open:     printf("Current Database still open.\n");
 88:                 break;
 89:             }
 90:             printf("Database Name? ");
 91:             gets(db_name);
 92:             if ((n_recs = read_db(db_name)) != 0)
 93:             {
 94:                 printf("\nLoaded %d Record(s).\n",
 95:                        n_recs);
 96:                 db_active = TRUE;
 97:             }
 98:
 99:             edit_db(db_name);
100:             break;
101:
102:         case BAKUP:
103:             if (!db_active)
104:                 goto inactive;
105:             backup_db();           /* Perform backup */
106:             break;
107:
108:         case CLOSE:
109:             if (!db_active)
110:                 goto inactive;
111:             write_db(db_name);     /* write to disk */
112:             free_up();
113:             db_active = FALSE;
114:             break;
115:
116:         case SAVE:
117:             if (!db_active)
118:                 goto inactive;
119:             write_db(db_name);     /* write to disk */
120:             break;
121:
122:         case ABANDON:
123:             if (!db_active)
124:             {
125: inactive:       printf("Please select a Database!\n");
126:                 break;
127:             }
128:             free_up();
129:             db_active = FALSE;
130:             break;
131:
132:         case QUIT:
133:             if (db_active)
134:             {
135:                 write_db(db_name); /* write to disk */
136:                 free_up();
137:             }
138:             exit(0);
139:         }
140:     }
141: }
142:
143: /*
144:  * Function: backup_db
145:  * Purpose: Backup current Database to floppies
146:  * Parameters: None
147:  * Return Value: None
148:  */
149:
150: void backup_db()           /* Backup module */
151: {}

Listing 3

 1: /*
 2:  * MDBUTIL.C
 3:  *
 4:  * Program: Mini-Database
 5:  * Written by: Leor Zolman
 6:  * Module: Utility functions
 7:  */
 8:
 9: #include <stdio.h>
10: #include <stdlib.h>
11: #include "mdb.h"
12:
13:
14: /*
15:  * Function: do_menu
16:  * Purpose: Simple line-oriented menu handler
17:  * Parameters: Menu item table, menu title
18:  * Return Value: action_code of the selected item
19:  */
20:
21: int do_menu(struct menu_item *mnu, char *title)
22: {
23:     int i, j;
24:     char buf[150];
25:
26:     printf("\n%s -- Options:\n", title);
27:     for (i = 0; mnu[i].action_code != 0; i++)
28:         printf("%2d) %s\n", i+1, mnu[i].descrip);
29:
30:     while (1)
31:     {
32:         printf("\nYour choice? ");
33:         j = atoi(gets(buf));
34:         if (j >= 1 && j <= i)
35:             break;
36:         printf("Please select from options 1-%d: ", i);
37:     }
38:
39:     return mnu[j - 1].action_code;
40: }
41:
42:
43: /*
44:  * Function: error
45:  * Purpose: Report error and terminate program
46:  * Parameters: Message to display
47:  * Return Value: None
48:  */
49:
50: void error(char *msg)
51: {
52:     printf("Fatal Condition: %s\n", msg);
53:     exit(-1);
54: }
55:
56:
57: /*
58:  * Function: alloc_rec
59:  * Purpose: Allocate memory for a Database record,
60:  *          checking for an allocation error
61:  * Parameters: None
62:  * Return Value: Pointer to memory, or NULL on error
63:  */
64:
65: struct record *alloc_rec(void)
66: {
67:     struct record *temp;
68:
69:     if ((temp = malloc(sizeof(struct record))) == NULL)
70:         return NULL;
71:     else
72:         return temp;
73: }
74:
75:
76: /*
77:  * Function: free_up
78:  * Purpose: De_allocate all records in current Database
79:  * Parameters: None
80:  * Return Value: None
81:  */
82:
83: void free_up()
84: {
85:     int i;
86:
87:     for (i = 0; i < n_recs; i++)
88:         free(RECS[i]);
89: }
90:
91:

Using Files As Semaphores

Lyle Frost

Lyle Frost is the owner of Citadel, a consulting and software development firm. He can be contacted at 241 E. Eleventh St., Brookville, IN 47012 or on the Citadel BBS at (317) 647-2403.

Multitasking operating systems allow a single application to execute as a group of concurrent processes. These processes must usually share access to common resources, such as data files or shared memory. In the case of a multi-user database system, for example, each user would access the same set of data files through separate processes.
Concurrent access to a shared resource requires some synchronization to prevent the processes from interfering with one another. The semaphore is one of the primary constructs for achieving this synchronization. Implementing semaphores is in general a difficult task. However, files may serve as a very simple implementation. Mutual Exclusion A program segment that accesses a shared resource is called a critical section. While in a critical section, a process must prevent other processes from entering critical sections requiring the same resource. This type of synchronization is called mutual exclusion. For example, consider two concurrent processes A and B which simultaneously modify the same record. Two distinct accesses are needed to modify a record; the original record must first be read from its storage area, then after being modified, must be written back. Without mutual exclusion, the timing of the individual accesses by process A relative to those by process B is unpredictable. If, for instance, the write by process A occurred during the interval between the read and write by process B, the modification made by process A would be lost (Figure 1a). Only if the two critical sections happened not to overlap would the correct result be obtained (Figure 1c). Mutual exclusion ensures that conflicting critical sections do not overlap. The basic principle of mutual exclusion is not complicated: before entering a critical section, a process must somehow allocate the required resource for its exclusive use. If the process fails to obtain the resource because it is currently allocated by another process, execution of the critical section must be postponed until the resource can be acquired. The critical section may be executed only after successfully allocating the resource, which must be freed at the conclusion of the critical section. Figure 2 shows two processes attempting to modify the same record simultaneously, but using mutual exclusion. 
Since an allocated resource impedes the execution of other processes needing the same resource, critical sections should execute as quickly as possible, and contain only the code necessary to complete the operations on the resource. For instance, a critical section should not contain code to read user input (unless, of course, the resource is a keyboard).

Semaphores

Semaphores are used to enforce mutually exclusive access to shared resources. A semaphore is simply a flag that indicates to other processes that a specific resource has been allocated. A process lowers a semaphore to indicate that a resource is in use, then raises the semaphore when it has finished with the resource. Successfully lowering a semaphore and entering the critical section is also referred to as "passing the semaphore". The terminology derives from the device used on railroads to show when a section of track is occupied. The railroad semaphore is historically a mechanical arm that lowered a flag when the section of track it marked was in use, then raised it when the track became free. The following pseudocode outlines the semaphore lower and raise operations. Traditionally, a value of 0 is used for a lowered semaphore and a value of 1 for a raised semaphore.

    semlower(semaphore s)
    {
        if s == 1               /* if semaphore is raised, */
        then s = 0              /* lower semaphore */
        else s not available
    }

    semraise(semaphore s)
    {
        s = 1                   /* raise semaphore */
    }

At first glance, these two routines may seem trivial to implement -- and they would be, except that semlower itself has a critical section; it must first perform a test operation to check if s is raised, followed by a set operation to lower s. Suppose two processes simultaneously attempted to lower the same semaphore. If the timing was such that the test and set operations of the two processes interleaved, each process would believe it had lowered the semaphore. The semaphore is not in itself a solution to the root problem of implementing mutual exclusion.
Introducing the semaphore merely confines the general concurrency problem to a single critical section. Once mutual exclusion has been effected for the critical section in semlower, semaphores can then provide mutual exclusion for all other critical sections. There are two fundamental requirements for a semaphore implementation: The semaphore must be visible to all processes which manipulate it. Mutual exclusion must be effected for the critical section in semlower. Assuming that a shared memory facility is available, the first requirement can be met without great difficulty. The second, however, is a problem. While mutual exclusion algorithms have been devised, they are all quite complex. However, by using files as semaphores, the complex algorithms, as well as the need for shared memory capabilities, can be avoided. Files clearly fulfill the visibility requirement, but how they provide mutual exclusion is not so obvious. Access to the file system is controlled by the operating system and requires special functions referred to as system calls. In UNIX, for example, open, close, and unlink are file-related system calls. Though invoked exactly like a regular function, a system call causes code within the operating system to be executed. (Note that while the stdio library functions which access the file system are regular functions, they must be written using system calls -- fopen would call open, remove would call unlink, etc.) When called to delete a file, the operating system must first check that the file exists, then delete it. This is a test and set sequence and as such, requires mutual exclusion. Since the operating system has complete control over process scheduling, it can easily force a test and set to execute consecutively. Because the operating system ensures this mutual exclusion, files can be used as semaphores. Deleting and creating a file correspond to lowering and raising a semaphore, respectively. 
Listing 1 and Listing 2 show the salient parts of the source code for an implementation of semaphores using files. Listing 1 (semaphor.h) should be included by programs using these routines. Semaphores are grouped so that all semaphore files needed by an application may be created as a "set" in a dedicated directory. The semaphore set control structure semset_t defined in semaphor.h contains the two values defining a semaphore set: the name of the directory containing the semaphore files and the number of semaphores in the set. A semaphore set is opened using the semopen function.

    semset_t *semopen(char *semdir, int flags, int semc);

semdir is the name of the directory containing the semaphore files. flags values are constructed by bitwise OR-ing command and permission flags. If the SEM_CREAT flag is set, the set will be created if it does not exist. If SEM_EXCL is also set, semopen will fail (returning -1) if the semaphore set already exists. The operating system dependent permission flags determine access to the semaphore set and are usually defined in <sys/stat.h> as macros of the form S_I*. If the semaphore set must be created, it will include semc semaphores; otherwise, semc is ignored. The macro semcount yields the number of semaphores in an open set. semopen returns a semaphore set pointer which is used by other semaphore functions. An individual semaphore can be lowered using the semlower function.

    int semlower(semset_t *ssp, int semno);

ssp is a semaphore set pointer obtained from a previous call to semopen, and semno is the number of the semaphore in that set to be lowered. If the semaphore is already lowered, semlower fails and sets errno to EAGAIN. The following code fragment illustrates the use of semlower.

    for (n = 0; n < MAXTRIES; n++) {
        if (semlower(ssp, semno) == -1) {
            if (errno == EAGAIN) {  /* semaphore already lowered */
                continue;
            } else {                /* error */
                break;
            }
        } else {                    /* semaphore successfully lowered */
            break;
        }
    }

A semaphore is raised using the semraise function.
int semraise(semset_t *ssp, int semno); ssp and semno are the same as for semlower. The source code for semlower and semraise is shown in Listing 2. After finishing with a semaphore set, it should be closed using semclose. int semclose(semset_t *ssp); All semaphores lowered by a process should be raised before calling semclose. Finally, int semremove(char *semdir); removes all the semaphore files from directory semdir then removes the directory. Shared Locking By definition, a semaphore allocates a resource for exclusive use by a single process. But in many applications there are two types of critical sections: those sections requiring exclusive access, and those which may share access with each other, but not with critical sections of the first type. Introducing the second type of critical section creates what is usually referred to as the Readers and Writers Problem. A critical section where data will be written to a file usually requires exclusive access to the file, but critical sections which will only read data may share access with each other. (In database terminology exclusive locks are also called write locks, and shared locks are also called read locks.) Even though semaphores allocate only for exclusive access, they can also be used to implement read and write locking. Each resource lock requires two semaphores and a shared integer variable. The first (write) semaphore prevents any other process from write locking the resource. The second (read) semaphore locks not the resource, but the shared variable. The shared variable is the read count. It contains the number of processes which have the resource read locked. 
If wsem is the write semaphore, rsem is the read semaphore, and rc is the read count, then the algorithm to read lock a resource is:

    semlower(rsem)          /* allocate read count */
    if rc == 0              /* if no other readers, */
        semlower(wsem)      /* allocate resource */
    rc++                    /* increment read count */
    semraise(rsem)          /* free read count */

The shared variable containing the number of readers is first allocated. The first reader lowers the write semaphore to prevent the resource from being write locked. Incrementing the read count informs the next reader that the write semaphore is already lowered. The algorithm for releasing a read lock is shown below.

    semlower(rsem)          /* allocate read count */
    rc--                    /* decrement read count */
    if rc == 0              /* if no other readers, */
        semraise(wsem)      /* free resource */
    semraise(rsem)          /* free read count */

When the last reader leaves its critical section, the write semaphore is raised to allow the resource to be write locked. A write lock is acquired and released by lowering and raising the write semaphore, respectively. If any processes have the resource read locked or write locked, the write semaphore will already be lowered and the attempted write lock will fail.

Listing 3 and Listing 4 show a portion of the implementation of r/w locking using files. The read counts require the same visibility as the semaphores, and so files are also used for read counts. The routines developed above are used to manipulate the semaphores (two for each r/w semaphore). r/w semaphores are also grouped into sets; the read count files (one for each r/w semaphore) and the directory containing the semaphores (twice the number of r/w semaphores) would be isolated within a single directory. The header rwsem.h, Listing 3, defines the r/w semaphore set control structure rwsset_t. rwsdir contains the files used by the semset_t, and rwsc is the number of r/w semaphores. ssp points to a semaphore set used for the write and read semaphores.
lockheld points to an array of lock types containing the type of lock held by the calling process for each r/w semaphore. (The lock type must be remembered because the procedure to remove a read lock is different from that for removing a write lock.) The r/w semaphore functions

    rwsset_t *rwsopen(char *rwsdir, int flags, int rwsc);
    int rwscount(rwsset_t *rwsp);
    int rwsclose(rwsset_t *rwsp);
    int rwsremove(char *rwsdir);

are exactly analogous to their semaphore counterparts. The single function rwslock controls locking, in place of semlower and semraise.

    int rwslock(rwsset_t *rwsp, int rwsno, int ltype);

The first two parameters are the same as for semlower and semraise. The last specifies the type of lock.

    RWS_UNLCK    unlock
    RWS_RDLCK    read (shared) lock
    RWS_WRLCK    write (exclusive) lock

If ltype is RWS_RDLCK and the indicated r/w semaphore is already in a write lock state, or if ltype is RWS_WRLCK and the indicated r/w semaphore is already in a read lock state, rwslock will fail (return -1) and set errno to EAGAIN. As for semlower, a busy wait loop would be used for rwslock when ltype is RWS_RDLCK or RWS_WRLCK. The source for rwslock is shown in Listing 4.

Conclusion

The control of concurrency has been a prominent topic for many years; E. W. Dijkstra's paper introducing semaphores was first published in 1965. But in spite of its relatively long history, it may be new to many microcomputer programmers who are suddenly acquiring multitasking capabilities for the first time. Understanding the implications of concurrency and the techniques for concurrency control will be necessary for the programmer to fully utilize the new multitasking systems now available for microcomputers, particularly those that are also multi-user.

References

Calingaert, P. Operating System Elements. Englewood Cliffs, NJ: Prentice-Hall, 1982.
Deitel, H. An Introduction to Operating Systems. Reading, MA: Addison-Wesley, 1984.
Figure 1 Concurrent Access Without Mutual Exclusion

Figure 2 Concurrent Access With Mutual Exclusion

Listing 1

    /* semaphor.h */
    #ifndef SEMAPHOR_H      /* prevent multiple includes */
    #define SEMAPHOR_H

    #include <limits.h>
    #ifndef PATH_MAX
    #define PATH_MAX (256)  /* max # of characters in a path name */
    #endif

    /* constants */
    #define SEMOPEN_MAX (60)    /* max # semaphore sets open at once */

    /* type definitions */
    typedef struct {                /* semaphore set control structure */
        char semdir[PATH_MAX + 1];  /* semaphore directory path name */
        int semc;                   /* semaphore count */
    } semset_t;

    /* function declarations */
    int semclose(semset_t *ssp);
    #define semcount(ssp) ((ssp)->semc)
    int semlower(semset_t *ssp, int semno);
    semset_t * semopen(char *semdir, int flags, int semc);
    int semraise(semset_t *ssp, int semno);
    int semremove(char *semdir);

    /* semopen command flags */
    #define SEM_CREAT (01000)   /* create and open */
    #define SEM_EXCL (02000)    /* exclusive open */

    /* error codes */
    #define SEMEOS (0)                  /* start of error code domain */
    #define SEMEMFILE (SEMEOS - 1)      /* too many semaphore sets open */
    #define SEMPANIC (SEMEOS - 2)       /* internal semaphore error */

    #endif  /* #ifndef SEMAPHOR_H */

Listing 2

    /* semaphore.c */
    /* Supported operating systems: UNIX, MS-DOS */
    #define UNIX (1)
    #define MSDOS (2)
    #define HOST UNIX

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #if HOST == UNIX
    #define PATHDLM ('/')       /* path name delimiter */
    #include <fcntl.h>          /* open() macro definitions */
    #include "syscalkr.h"       /* system call declarations */
    #include <sys/stat.h>       /* file mode macros */
    #elif HOST == MSDOS
    #define PATHDLM ('\\')      /* path name delimiter */
    #include <fcntl.h>          /* open() macro definitions */
    #include <io.h>             /* close(), open() declarations */
    #include <sys/types.h>
    #include <sys/stat.h>       /* file mode macros */
    #endif
    #include "semaphor.h"

    /* semaphore set table definition */
    static semset_t sst[SEMOPEN_MAX];

    /* semlower: lower semaphore */
    int semlower(semset_t *ssp, int semno)
    {
        char path[PATH_MAX + 1];

        /* remove semaphore file */
        sprintf(path, "%s%cs%d", ssp->semdir, (int)PATHDLM, semno);
        if (unlink(path) == -1) {
            if (errno == ENOENT) errno = EAGAIN;
            return -1;
        }

        errno = 0;
        return 0;
    }

    /* semraise: raise semaphore */
    int semraise(semset_t *ssp, int semno)
    {
        char path[PATH_MAX + 1];
        int fd = 0;

        /* create semaphore file */
        sprintf(path, "%s%cs%d", ssp->semdir, (int)PATHDLM, semno);
    #if HOST == UNIX
        fd = open(path, O_WRONLY | O_CREAT, 0);
    #elif HOST == MSDOS
        fd = open(path, O_WRONLY | O_CREAT, S_IREAD | S_IWRITE);
    #endif
        if (fd == -1) {
            return -1;
        }
        if (close(fd) == -1) {
            return -1;
        }

        errno = 0;
        return 0;
    }

Listing 3

    /* rwsem.h */
    #ifndef RWSEM_H         /* prevent multiple includes */
    #define RWSEM_H

    #include <limits.h>
    #include "semaphor.h"

    /* constants */
    #define RWSOPEN_MAX SEMOPEN_MAX     /* max # rwsem sets open at once */

    /* type definitions */
    typedef struct {                /* rwsem set control structure */
        char rwsdir[PATH_MAX + 1];  /* directory */
        int rwsc;                   /* r/w semaphore count */
        semset_t *ssp;              /* semaphore set */
        short *lockheld;            /* locks held by calling process */
    } rwsset_t;

    /* function declarations */
    int rwsclose(rwsset_t *rwsp);
    #define rwscount(rwsp) ((rwsp)->rwsc)
    int rwslock(rwsset_t *rwsp, int rwsno, int ltype);
    rwsset_t * rwsopen(char *rwsdir, int flags, int rwsc);
    int rwsremove(char *rwsdir);

    /* rwsopen command flags */
    #define RWS_CREAT (01000)   /* create and open */
    #define RWS_EXCL (02000)    /* exclusive open */

    /* lock types */
    #define RWS_UNLCK (0)       /* unlock */
    #define RWS_RDLCK (1)       /* read lock */
    #define RWS_WRLCK (2)       /* write lock */

    /* error codes */
    #define RWSEOS (-20)                /* start of error code domain */
    #define RWSEMFILE (RWSEOS - 1)      /* too many rwsem sets open */
    #define RWSPANIC (RWSEOS - 2)       /* internal rwsem error */

    #endif  /* #ifndef RWSEM_H */

Listing 4

    /* rwslock.c */
    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #define PATHDLM ('/')       /* path name delimiter */
    #include "rwsem.h"

    /* function declarations */
    int getcnt(char *file, int *cntp);
    int putcnt(char *file, int cnt);

    /* read/write semaphore set table definition */
    static rwsset_t rwst[RWSOPEN_MAX];

    /* rwslock: read/write semaphore lock */
    int rwslock(rwsset_t *rwsp, int rwsno, int ltype)
    {
        int wsem = 0;               /* write semaphore */
        int rsem = 0;               /* read semaphore */
        int rc = 0;                 /* readcount */
        char rcpath[PATH_MAX + 1];  /* readcount file path name */

        /* identify write and read semaphores and read-count file */
        wsem = rwsno * 2;
        rsem = wsem + 1;
        sprintf(rcpath, "%s%cr%d", rwsp->rwsdir, (int)PATHDLM, rwsno);

        switch (ltype) {
        case RWS_UNLCK: /* unlock */
            switch (rwsp->lockheld[rwsno]) {
            case RWS_UNLCK: /* unlock */
                break;  /* case RWS_UNLCK: */
            case RWS_RDLCK: /* read lock */
                if (semlower(rwsp->ssp, rsem) == -1) {  /* allocate readcount */
                    return -1;
                }
                getcnt(rcpath, &rc);    /* get readcount */
                rc--;                   /* decrement readcount */
                if (rc == 0) {          /* if no other readers, */
                    if (semraise(rwsp->ssp, wsem) == -1) {  /* free resource */
                        semraise(rwsp->ssp, rsem);
                        return -1;
                    }
                }
                putcnt(rcpath, rc);     /* store new readcount */
                if (semraise(rwsp->ssp, rsem) == -1) {  /* free readcount */
                    return -1;
                }
                break;  /* case RWS_RDLCK: */
            case RWS_WRLCK: /* write lock */
                if (semraise(rwsp->ssp, wsem) == -1) {
                    return -1;
                }
                break;  /* case RWS_WRLCK: */
            default:
                errno = RWSPANIC;
                return -1;
                break;  /* default: */
            };
            break;  /* case RWS_UNLCK: */
        case RWS_RDLCK: /* read lock */
            if (rwsp->lockheld[rwsno] == RWS_RDLCK) {
                errno = 0;
                return 0;
            }
            if (semlower(rwsp->ssp, rsem) == -1) {  /* allocate readcount */
                return -1;
            }
            getcnt(rcpath, &rc);    /* get readcount */
            rc++;                   /* increment readcount */
            if (rc == 1) {          /* if no other readers, */
                if (semlower(rwsp->ssp, wsem) == -1) {  /* allocate resource */
                    semraise(rwsp->ssp, rsem);
                    return -1;
                }
            }
            putcnt(rcpath, rc);     /* store new readcount */
            if (semraise(rwsp->ssp, rsem) == -1) {  /* free readcount */
                if (rc == 1) semraise(rwsp->ssp, wsem);
                return -1;
            }
            break;  /* case RWS_RDLCK: */
        case RWS_WRLCK: /* write
lock */ if (semlower(rwsp->ssp, wsem) == -1) { /* allocate resource */ return -1; } break; /* case RWS_WRLCK: */ default: errno = EINVAL; return -1; break; /* default: */ }; /* save type of lock held */ rwsp->lockheld[rwsno] = ltype; errno = 0; return 0; } Fast Memory Allocation Scheme Steve Weller Steven Weller is president of Windsor Systems, specializing in OS-9, system-level and real-time software, and computer graphics. An electronics engineer from England, Steve has been in software for nine years and has particular interest in parallel computer applications, modern computer languages, operating systems, and the management of technology. He may be contacted at 2407 Lime Kiln Lane, Louisville, KY 40222 (502) 425-9560. In applications requiring the dynamic allocation of a large number of small objects, the overhead associated with general-purpose allocation schemes can be large: between 20 and 200 percent of the actual stored data. To minimize this problem I use a layered allocation system in which standard system calls allocate relatively large blocks of memory to a simpler memory management subsystem. All of the smaller objects belonging to a single data structure (e.g., a tree or linked list) are then "borrowed" (using a low overhead allocation scheme) from one (or a set) of these layer blocks. Unlike generalized allocation routines, the "borrowing" system doesn't attach allocation information to any of the borrowed objects, potentially reducing memory overhead. Moreover, because the entire data structure is freed as a unit, I avoid the overhead of attempting to coalesce adjacent, freed objects (except for the underlying large blocks). Why Not malloc()? malloc() and free() are the most commonly used standard C function calls for memory allocation and deallocation. They are general and easy to use, but inefficient for small amounts of memory, both in terms of storage overhead and speed. 
malloc() collects the requested amount of memory, allocates it, and returns a pointer to the allocated area. On my machine malloc() adds an overhead of eight bytes to every piece of memory allocated. malloc() is also not particularly fast since it must manipulate the links it maintains between allocated blocks each time memory is allocated. free() deallocates the memory block whose address is passed by undoing malloc()'s links and adding the block to the list of free blocks, merging adjacent blocks if possible. Using free() to deallocate a large number of small blocks is very inefficient.

Borrowing Memory

Memory borrowing, as I call it, allows the user to obtain memory in small pieces, but return it all in one go. A call to iniz_borrow() sets up the system:

if ((id=iniz_borrow(2000))==0)
    error("Can't get memory\n");

iniz_borrow() accepts a number which represents the size of each block to be allocated from the system when memory is required, in this case 2000 bytes. The routine either allocates one block and returns a memory ID, or returns zero to indicate that an error has occurred. All subsequent allocations and deallocations use this unique memory ID. Any number of memory IDs may be created, each with its own block size, but all the memory associated with one ID must be returned at the same time.

Normally each memory ID is associated with a separate large data structure. Each time memory is needed for a small object within one of these structures, borrow() is called:

if ((new=borrow(id,size))==0)
    error("Can't get memory\n");

borrow() allocates memory from the block defined by the ID and returns a pointer to it (here assigned to new), or zero on error. Additional blocks are acquired from the system if necessary.

Two functions free "borrowed" memory; both return all memory allocated with one ID. return_borrow() returns all but the first block to the system, leaving the memory ID valid and reusable.
deiniz_borrow() returns all memory to the system, making the memory ID invalid and unusable. The Borrow Functions Listing 1 contains the header information and the initialization routine. iniz_borrow() allocates a block of memory and places the allocation information in the MemBlock structure at the start of the block. The routine returns the block's address as the memory ID. As more blocks of memory are required, they will be linked to the first. The allocate() routine shown in Listing 1 can be any allocation routine you have, probably sbrk() or malloc(). Your routine must, however, return a zero on failure. Listing 2 shows the borrow() routine itself. The requested amount of memory is rounded up to an even number of bytes, keeping the allocated memory addresses on even byte boundaries. This restriction can be dropped if not required, or changed to need=(need+3)&~3 or need=(need+7)&~7 to ensure that even word or even long word alignment is maintained. Next the amount of memory requested is compared with the amount remaining in the current block. If the remaining memory is insufficient, another block is allocated and linked to the current block. borrow() updates the MemBlock structure in the first allocated block to identify the newly allocated block as the current block, and adjusts the offset to allow for a pointer at the start of the new block. It is not necessary for any block other than the first to contain the whole MemBlock structure. The offset mb_offs identifies the amount of memory that has been allocated in the current block. To satisfy a memory request, the address of the allocated memory is computed (by adding the current block pointer mb_pres to the offset), the offset is incremented (by the amount of memory allocated), and the original memory address returned. Listing 3 shows the deallocation routines. deiniz_borrow() returns all the blocks allocated by the system by running down the allocated list, calling deallocate() as it goes. 
deallocate() can be any deallocation routine complementary to the allocation routine used in iniz_borrow(). It must, as before, return a zero on failure. return_borrow() is similar to deiniz_borrow() except that it does not return the first block, and hence keeps the memory ID valid. The MemBlock structure at the start of the first block is reset to show an empty first block -- the same state that it was left in by iniz_borrow(). Block Size Using a large block size results in fewer allocations and deallocations from system memory, and hence greater speed, but at the expense of greater memory overhead. If the block size is only a few times greater than the memory being allocated by borrow(), then large amounts at the end of each block will remain unused. Conclusion This simple memory allocation system takes advantage of the way that many applications allocate and deallocate memory. It can be tailored to different data structures by grouping memory allocation for each type of structure under separate memory IDs, each with a different block size. The simple allocation mechanism produces a fast and efficient system. 
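The borrowing scheme described above can be illustrated with a minimal sketch in ANSI C. This is not the author's code (his listings follow); it is a simplified restatement that assumes malloc() and free() stand in for allocate() and deallocate(), assumes eight-byte alignment is sufficient, and keeps the bookkeeping in a separate structure rather than at the start of the first block. All names (Arena, arena_init, arena_borrow, arena_reset, arena_destroy) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names; malloc()/free() play the role of allocate()/deallocate(). */
#define ALIGN 8   /* assumed alignment for every borrowed object */
#define ROUND_UP(n) (((n) + (size_t)(ALIGN - 1)) & ~(size_t)(ALIGN - 1))

typedef struct Block {
    struct Block *next;     /* next block in the chain, or NULL */
} Block;

typedef struct Arena {
    Block  *first;          /* first block; kept until arena_destroy() */
    Block  *present;        /* block currently being carved up */
    size_t  size;           /* size of every block, in bytes */
    size_t  offs;           /* bytes already used in the present block */
} Arena;

/* Set up an arena with one block. Returns 1 on success, 0 on failure. */
int arena_init(Arena *a, size_t block_size)
{
    assert(a != NULL);
    if (block_size <= ROUND_UP(sizeof(Block)))
        return 0;                           /* no room for any data */
    a->first = malloc(block_size);
    if (a->first == NULL)
        return 0;
    a->first->next = NULL;
    a->present = a->first;
    a->size = block_size;
    a->offs = ROUND_UP(sizeof(Block));      /* start past the link */
    return 1;
}

/* Borrow 'need' bytes. Returns NULL if the request cannot be met. */
void *arena_borrow(Arena *a, size_t need)
{
    void *p;

    assert(a != NULL && a->present != NULL);
    if (need > a->size)
        return NULL;
    need = ROUND_UP(need);
    if (need > a->size - ROUND_UP(sizeof(Block)))
        return NULL;                        /* can never fit in one block */
    if (a->offs + need > a->size) {         /* present block is exhausted */
        Block *b = malloc(a->size);
        if (b == NULL)
            return NULL;
        b->next = NULL;
        a->present->next = b;               /* link new block to the chain */
        a->present = b;
        a->offs = ROUND_UP(sizeof(Block));
    }
    p = (char *)a->present + a->offs;
    a->offs += need;
    return p;
}

/* Give back everything but the first block (compare return_borrow()). */
void arena_reset(Arena *a)
{
    Block *b = a->first->next;
    while (b != NULL) {
        Block *n = b->next;
        free(b);
        b = n;
    }
    a->first->next = NULL;
    a->present = a->first;
    a->offs = ROUND_UP(sizeof(Block));
}

/* Give back every block (compare deiniz_borrow()). */
void arena_destroy(Arena *a)
{
    Block *b = a->first;
    while (b != NULL) {
        Block *n = b->next;
        free(b);
        b = n;
    }
    a->first = a->present = NULL;
}
```

A caller would typically create one arena per data structure, call arena_borrow() for each node, and call arena_destroy() once when the whole structure is discarded. The block-size trade-off shows up directly in this sketch: with 64-byte blocks, a 40-byte request arriving when only 32 bytes remain forces a new block and leaves those 32 bytes unused.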
Listing 1

/* Header for memory blocks */
typedef struct MEMBLOCK {
    struct MEMBLOCK *mb_next,   /* Pointer to next block */
                    *mb_pres;   /* Present block */
    int mb_size,                /* Size of blocks */
        mb_offs;                /* Present offset in block */
} MemBlock;

unsigned int iniz_borrow(), deiniz_borrow(), return_borrow();
char *borrow();

/* ------------------------------------------------------- */
/* Initialise memory */
/* Returns the memory ID or zero on error */
unsigned int iniz_borrow(block)
register int block;             /* Allocation block size */
{
    register MemBlock *p;       /* Pointer to block */

    /* Get first block */
    if((int)(p=(MemBlock *)allocate(block))==0)
        return(0);
    p->mb_next=NULL;                /* No next block */
    p->mb_pres=p;                   /* This is the present block */
    p->mb_size=block;               /* Record the block size */
    p->mb_offs=sizeof(MemBlock);    /* Start past this info */
    return((unsigned int)p);
}

Listing 2

/* Borrow Memory */
/* Returns a pointer to the allocated memory, or NULL */
char *borrow(id,need)
register MemBlock *id;          /* Pointer to first block */
register int need;              /* Requested memory size */
{
    register MemBlock *p=id->mb_pres;   /* Present block pointer */
    register int oldoffs;               /* Old offset */

    /* Round need up to word multiple */
    need+=need&1;

    /* Deal with more memory required */
    if(id->mb_offs+need>id->mb_size) {  /* Too large to fit ? */
        register MemBlock *q;
        /* Get another */
        if((q=(MemBlock *)allocate(id->mb_size))==0)
            return(NULL);
        p->mb_next=q;                   /* Link to new block */
        q->mb_next=NULL;                /* Mark end of list */
        id->mb_pres=q;                  /* New block is present one */
        id->mb_offs=sizeof(MemBlock *); /* Reset offset */
        p=q;                            /* Present block */
    }
    oldoffs=id->mb_offs;                /* Record present offset */
    id->mb_offs+=need;                  /* Move offset */
    return((char *)((int)p+oldoffs));   /* Return address of memory */
}

Listing 3

/* Return all memory allocated to this ID */
/* NULL is returned on error */
unsigned int deiniz_borrow(id)
register MemBlock *id;
{
    register MemBlock *nextone=id,  /* Pointer to next block */
                      *thisone;     /* Pointer to pres block */

    while(thisone=nextone) {        /* While blocks to return */
        nextone=thisone->mb_next;   /* Point to next block */
        if(deallocate(thisone)==0)  /* Return this one */
            return(NULL);
    }
    return(id);                     /* Return non-zero */
}

/* --------------------------------------------------------- */
/* Return all memory but the first block */
/* NULL is returned on error */
unsigned int return_borrow(id)
register MemBlock *id;
{
    register MemBlock *nextone,     /* Pointer to next block */
                      *thisone;     /* Pointer to pres block */

    /* Return all but first */
    if(nextone=id->mb_next)             /* If anything to return */
        while(thisone=nextone) {        /* While blocks to return */
            nextone=thisone->mb_next;   /* Point to next block */
            if(deallocate(thisone)==0)  /* Return this one */
                return(NULL);
        }

    /* Reset information in the first block */
    id->mb_next=NULL;               /* No next block */
    id->mb_pres=id;                 /* This is the present one */
    id->mb_offs=sizeof(MemBlock);   /* Reset offset */
    return(id);                     /* Return non-zero */
}

A Survey Of CUG C Compilers

Victor Volkman

Victor R. Volkman received a BS in computer science from Michigan Technological University in 1986. Mr. Volkman is a frequent contributor to The C Users Journal and the C Gazette. He is currently employed as Software Engineer at Cimage Corporation of Ann Arbor, MI.
He can be reached at the HAL 9000 BBS, (313) 663-4173, 1200/2400/9600 baud.

Compiler construction is alternately the most rewarding and most frustrating area of software development. The C Users' Group offers public domain C compilers with source code for both those who study and those who use compilers. These packages have been independently developed by programmers who were often the first to implement the C language on their target machines. Some of these compilers share the ability to compile their own source to build new versions of themselves. All of them share their authors' vision of taking the C language to new frontiers.

A Small History Of The Small C Compiler

Since Ron Cain's introduction of the Small C compiler into the public domain nearly a decade ago, its implementations have spread like wildfire to nearly every popular microprocessor. The C Users' Group is fortunate to be able to offer public domain compilers which have been ported to the Z-80, 8080, 6800, 6809, 8086, and 68000 (see Figure 1).

Ron Cain's Small C Compiler v1.0, which debuted in the May 1980 issue of Dr. Dobb's Journal, was originally a very small subset of the C language. Small C has been a self-compiler since its first implementation. This means that performance improvements in code generation and parsing can be immediately incorporated back into the compiler itself. Small C is a one-pass compiler which generates assembly language from a C input file.

The subset of data types which the original Small C recognized consisted only of characters, integers, and one-dimensional arrays of either type. Additionally, the only control statements were while and if. Small C was also restricted to the bitwise logical operators (&, |), since the boolean operators (&&, ||) were not supported.

In 1982, James E. Hendrix assumed trusteeship of Small C. Hendrix published numerous upgrades through Dr. Dobb's Journal culminating in the release of Small C v2.1 for CP/M in 1984.
New features added along the way include code optimization, data initializing, conditional compiling, extern storage, for, while, switch/case, and goto statements, and a plethora of operators. To complete the system, James E. Hendrix and Ernest Payne developed a CP/M compatible version of the UNIX C standard I/O library. The internal design of Small C v2.1 was the subject of Hendrix's The Small C Handbook. The first published 8086 PC-DOS implementation of Small C v2.1 appeared in 1985. Along the way, code optimization techniques were refined even more. The present incarnation from Hendrix, Small C v2.2, is available for 8086 PC-DOS only. Small C v2.2 was released simultaneously with Hendrix's definitive reference work A Small C Compiler: Language, Usage, Theory, and Design in 1988. CUG C Compilers Based On Small C Many of the C compilers available from CUG are based on some derivative of the Cain or Hendrix implementation of Small C. The exceptions to this rule are the 68000 C Compiler (disk #204) which has no lineage with Small C and the DECUS C Preprocessor (disk #243) which is not a full compiler. Some of the CUG C compilers based on Cain's Small C v1.1, include many of the enhancements published in Dr. Dobb's Journal over the years. This puts them approximately at the level of Hendrix Small C v2.0 discussed earlier. These enhanced Small C compilers are available as disk CUG104 Z-80/8080 (CP/M 80), CUG163 8086 (PC-DOS), and CUG221 6809 (FLEX OS). An attribute which most of the CUG C compilers share is a noticeable lack of external documentation. All disks have less than a dozen pages of documentation with the exception of Small C w/Floats (CUG156) which includes 30 pages. Fortunately, their common heritage means their implementations remain similar to the well-documented Cain and Hendrix designs. Specifically, the Doctor Dobb's Journal issues from 1980 to 1982 (see bibliography) are the best source for Small C versions before 2.0. 
Alternatively, Hendrix's Small C Handbook (now out of print) details these early versions. You might need to check your local university library for these publications. Unfortunately, Hendrix's latest book, A Small C Compiler, will be less relevant to older versions due to recent internal code redesigns.

The CUG C compilers based on Small C, regardless of version, also share certain limitations of language features. In particular, the struct, union, long, float, and double data types are not supported. The exception to this rule is of course Small C w/Floats (CUG156), which includes a 48-bit non-standard float. Additionally, arrays are limited to one dimension and pointer arrays are specifically prohibited. These compilers also assume that ints and pointers are equivalent. This means the size of code and data pointers must also be the same. Small C-based compilers allow neither nested include files nor parameterized macro substitutions (as used in stdio.h). Also, the full set of C operators is often not present.

In general, the run-time libraries contain a good assortment of standard I/O, string, and keyboard-polling functions. Higher-level functions such as sprintf() are not always present. The libraries have very primitive linear memory allocation with alloc() and free(). Blocks of allocated memory must be freed in reverse order of allocation.

The overall ratings were based on my perception of the documentation, completeness, and usability of the implementation.

CUG104: Small C For Z-80/8080 (CP/M 80)

This implementation of Small C for the Z-80/8080 was done by Mike Bernson of Ann Arbor, MI. This Small C is not self-compiling and requires a special assembler and linker which are included only in CP/M 80 executable form. The compiler was developed with BDS C v1.41. Mike Bernson has made several improvements to RC Small C v1.1 including most of the features of JH Small C v2.1 except goto/label and the ternary operator.
The Standard C I/O library is included in both assembly language and object code format. Only three pages of documentation are provided, consisting of two pages of grammar and a one-page listing of file contents.

CUG132: Small C For 6809 (Radio Shack Color Computer w/OS9)

Small C for the 6809 (Color Computer) was implemented by A.J. Griggs. This version is close to RC Small C v1.0 since it lacks switch/case, for, and goto/label statements among other things. This Small C is not self-compiling and requires BDS C v1.41 or later to compile. This package requires a 6809 assembler and linker which are not included.

Small C for 6809 is designed as a cross-compiler which produces 6809 code while running under an 8080/Z-80 environment. After compilation, you would use the supplied serial-port driver to download the object code in Motorola S HEX format to the target 6809 machine.

This C compiler cannot be self-compiled because it has hardware dependencies on the byte order of 16-bit words. Specifically, the 6809 has the low and high bytes stored in the reverse order of 8080 machines. The compiler assumes a certain order in some cases and thus cannot compile itself.

This disk includes a serial driver, graphics library, and sample graphics game. The graphics library supports real-time animation in the player-missile arcade style. Graphics objects are managed in a list which stores their screen position and x/y velocity. During animation, the routines automatically flag collision of objects on the screen. The management of graphic objects is similar to the use of sprites on Commodore C64 and C128 machines.

Also on this diskette are a total of eight pages of documentation, six on the 6809 port and two on use of the graphics library.

CUG146: Small C For 6800 (FLEX OS)

This implementation of Small C for 6800 (FLEX OS) was completed by Serge Stepanoff of Livermore, CA.
This version is close to RC Small C v1.0 since it lacks switch/case, for, and goto/label statements among other things. An additional restriction is that identifiers are limited to six significant characters. This Small C is not self-compiling and requires BDS C v1.41 or later to compile. This package does not include a complete Standard C I/O library. A nonstandard printf() is used which requires that the number of arguments be passed as the last parameter.

Small C for 6800 (FLEX OS) does not compile to assembly or machine language, but rather to a pseudo-code. A small pseudo-code interpreter, less than 2K, actually executes the user's pseudo-code. To run this pseudo-code in a different environment requires only the rewrite of the interpreter and the run-time library for the target machine. However, the source code for the interpreter is not included on the distribution diskette.

The diskette contains 11 pages of documentation; the first five pages are devoted to how to use the compiler and the remainder to the run-time library.

CUG156: Small C w/Floats (CP/M)

Small C w/Floats (CP/M) was implemented by James R. Van Zandt of Nashua, NH. This package was originally available as disk #224 from the Sig/M-Amateur Computer Group of Iselin, New Jersey. This version is close to RC Small C v1.0 since it lacks switch/case, for, and goto/label statements. Additionally, the following operators are not supported: logical or (||), logical and (&&), logical not (!), bitwise-not (~), and the assignment operators (+=, -=, et al.).

This disk includes the executable compiler and is self-compiling. The compiler reads C source and produces Z-80 assembly language. The two major speed enhancements relative to Ron Cain's original compiler are a hash coded symbol table and 1K disk buffers. Additionally, the compiler will resolve symbols uniquely up to the first 16 characters. This disk also includes the ZMAC macro assembler and ZLINK linker in executable form only.
Small C w/Floats supports the following usage of floating point:

double d;       48-bit floating point
double *d;      pointer to double
double d();     function returning double
double d[5];    array of doubles

Storage classes, structures, multidimensional arrays, unions, and more complex types like double **d are not included. The layout of doubles does not conform to the IEEE standard. These routines will execute only on a Z-80. They use the alternate registers and some of the undocumented instructions of that processor. Small C w/Floats includes a full complement of transcendental functions for type double (Listing 1).

If the "profile and trace" (-P) option of the compiler is used, each call to err() results in a walkback trace of function calls. In addition, an execution profile is displayed on the console at program termination (call to exit()). The profile consists of a list of the functions and the number of times (up to 999999) each was called. This is sometimes useful for debugging (to spot functions that are never called), but is most valuable for program execution time optimization.

With 30 pages of documentation, Small C w/Floats is the best documented of any compiler available from CUG. The documentation covers compiler usage and internals, floating point routines, the Standard C I/O library, the ZMAC macro assembler, and the ZLINK linker.

CUG163: Small C For 8086 (PC-DOS)

This implementation of Small C for 8086 (PC-DOS) was completed by Daniel R. Hicks of Rochester, MN. Small C for 8086 (PC-DOS) is distributed on two diskettes: the first contains the run-time library source and the second contains the compiler source and executable. This package was originally available as disk #152 from the Personal Computer Club of Toronto, Canada. This is a self-compiler, but does require your own assembler and linker. This port of Small C is based on JH Small C v2.0, so it does support switch/case, for, and goto/label statements.
Hicks's standard C I/O library provides very good compatibility with its UNIX counterpart. Hicks's implementation imposes the following additional restrictions: lower-case and upper-case symbols are synonymous, local declarations within a block and goto statements may not be used together, and the sizeof() operator is not supported.

Parameters are pushed in order of occurrence: the first parameter in a list is the first one pushed and therefore the deepest one in the stack. This is opposite the order of many C compilers, and it prevents some C library functions (such as printf) from being able to determine the parameter count by examining just the first or second parameter. For this reason, the compiler, prior to a CALL, loads register DL with the parameter count, thus allowing functions such as printf to be implemented.

Included on the diskette are nine pages of detailed documentation on the capabilities and limitations of the compiler.

CUG170: Miscellany V (Caprock C, version N for IBM-PC)

Caprock Small C for 8086 (PC-DOS) was implemented by Caprock Systems, Inc. of Arlington, TX. This disk was originally available as disk #315 from PC Software Interest Group (PC-SIG) of Sunnyvale, CA. This compiler is supplied in source form only; an executable version is not included. Additionally, the standard C I/O library is missing from this distribution. This version is close to RC Small C v1.0 since it lacks switch/case, for, and goto/label statements.

When compiled under Microsoft C 5.1, the compiler source produced four errors and 53 warnings. All of these problems were the result of the assumption that integers are interchangeable with pointers. No documentation is included with this compiler.

True to its name, the Miscellany V disk offers over 20 files of C functions. Some of the other offerings on this disk include Life and Towers of Hanoi games, a binary to Intel HEX format converter, and several keyboard utilities.
CUG204: 68000 C Compiler (UNIX System V) The 68000 C Compiler (PC-DOS) was completed by Matthew Brandt of Norcross, GA. This compiler is intended as an instructive tool for personal use. Any use for profit without the written consent of the author is prohibited. As stated earlier, this is the only C compiler offered by CUG which is not derived from RC or JH Small C. This is an optimizing C compiler which generates assembly language for the Motorola 68000 processor. This system also requires a 68000 assembler and linker which the user must supply. It has successfully compiled itself on UNIX System V running on a Motorola VME-10. Since this code was written for a machine with long integers it may exhibit some irregularity when dealing with long integers on the IBM-PC. This compiler vies with Small C w/Floats (CUG #156) for the best implementation of C. Although the 68000 C Compiler does not support floats, it does have features not found in any other CUG C compiler: longs, structures, unions, complex types (e.g. char **argv), enumerated types, and functions which return pointers to structures. The disk includes one page of documentation outlining the limitations of the compiler. Brandt offers the following warning: "The author makes no guarantees. This is not meant as a serious development tool although it could, with little work, be made into one." The preprocessor does not support parameterized macro substitutions, only #include and #define macros are supported. Brandt advises that function arguments declared as char may not work properly and should be changed to int. When the compiler encounters a syntax error, an error number is printed but no descriptive text is provided. Lastly, the size of functions is slightly limited due to the fact that the entire function is parsed before any code is generated. The compiler can be compiled by Microsoft C v3.0 or higher. MSC will issue many warnings but they can be ignored. 
The file MAKE.BAT may be used to rebuild the compiler. CUG221: 6809 C Compiler (FLEX OS) This implementation of Small C for 6809 (FLEX OS) was completed by Dieter H. Flunkert. The author has made several improvements to RC Small C v1.1 plus most of the features of JH Small C v2.1 except goto/label. Small C for 6809 (FLEX OS) has all other C control statements including switch/case, do/while, and for. Additionally, all C operators are supported including the elusive comma (,), ternary (?), and assignment operators (+=, -=, et. al). However, like most other Small C implementations, the data types for float, double, long, structures, and unions are not present. An executable version of the compiler is not provided on the diskette. This system requires the TSC relocatable assembler, library generator and linking loader which the user must supply. The standard C I/O library is included in both C source and assembly language formats. The compiler has seven pages of documentation detailing the grammar and preprocessor commands. When compiled under Microsoft C v5.1, it was revealed that many of the #include directives did not have quoted filenames (e.g. #include stdio.h). Once again, many warnings appeared from the use of integers as pointers. Proper compilation required adding #define VMS to every module. CUG243: DECUS C Preprocessor (PC-DOS) The DECUS C Preprocessor (CPP) was originally implemented by Martin Minnow. CPP was subsequently ported to PC-DOS by Ted Lemon and Jym Dryer. CPP reads a C source file, expands macros and include files, and writes an input file for the C compiler. If no file arguments are given, it reads from stdin and writes to stdout. If one filename is given, it will be the input file. If a second filename is given, it will be the output file. The full command line format is: cpp [-options] [infile [outfile]] The DECUS C Preprocessor has been updated to meet the specifications of the Draft ANSI C Standard. 
However, this C preprocessor is not designed to handle floating point expressions. An experimental floating point source file is provided for those who wish to experiment with it.

The following options are supported. Options may be given in either case.

-Idirectory
    Add this directory to the list of directories searched for #include "..." and #include <...> commands. Note that there is no space between the -I and the directory string. More than one -I command is permitted. On non-UNIX systems, directory is forced to upper case.

-D name=value
    Define the name as if the programmer wrote #define name value at the start of the first file. If =value is not given, a value of 1 will be used. On non-UNIX systems, all alphabetic text will be forced to upper case.

-U name
    Undefine the name as if #undef name were given. On non-UNIX systems, name will be forced to upper case.

-X number
    Enable debugging code. If no value is given, a value of 1 will be used. (For maintenance of CPP only.)

The preprocessor will look for an environment variable INCLUDE if include files cannot be found in the -I directories. Unfortunately, only a single search directory can be specified in the INCLUDE path (e.g. SET INCLUDE=\MSC\INCLUDE;\MY\SRC will fail).

CPP has been successfully built with Lattice C v2.00 and Microsoft C v3.00. The distribution disk contains four pages of documentation detailing how to prepare CPP under several different memory models.

Bibliography

Cain, Ron. "A Small C Compiler for the 8080s." Dr. Dobb's Journal, April-May 1980, pp. 5-19.

Cain, Ron. "A Runtime Library for the Small C Compiler." Dr. Dobb's Journal, September 1980, pp. 4-15.

Hendrix, J. E. "Small-C Expression Analyzer." Dr. Dobb's Journal, December 1981, pp. 40-43.

Hendrix, J. E. "Small-C Compiler, v.2." Dr. Dobb's Journal, December 1982, pp. 15-63, and January 1983, pp. 48-64.

Hendrix, J. E. and Payne, L. E. "A New Library for Small-C." Dr. Dobb's Journal, May 1984, pp. 50-81, and June 1984, pp. 56-69.

Hendrix, J. E. "Small-C Update." Dr. Dobb's Journal, August 1985, pp. 84-91.

Hendrix, J. E. The Small C Handbook. Redwood City, CA: M&T Publishing Inc., 1984.

Hendrix, J. E. A Small-C Compiler: Language, Usage, Theory, and Design. Redwood City, CA: M&T Publishing Inc., 1988.

Volkman, Victor R. "Revised Handbook Details Small C Innards." The C Users Journal, February 1989, pp. 9-10.

Ward, Robert and Donna, Ed. The C Users' Group Library. McPherson, KS: R&D Publications, Inc., 1986.

Figure 1 Summary of CUG C Compilers

CUG      Target      Target Operating   Implementation Based   Date of Last   Overall
Disk #   CPU         System             on Port From           Revision       Rating
--------------------------------------------------------------------------------------
104      Z-80/8080   CP/M 80 v2.2       RC Small C v1.1        06/28/1981     ***
132      6809        OS-9               RC Small C v1.1        10/18/1983     **
146      6800        FLEX v2.1          RC Small C v1.1        09/09/1982     **
156      Z-80        CP/M               RC Small C v1.2        08/02/1984     ****
163      8086        PC-DOS 1.1         JH Small C v2.0        01/14/1984     ***
170      8086        PC-DOS 1.0         RC Small C v1.0        06/01/1982     *
204      68000       Unix V             N/A                    01/01/1986     ****
221      6809        FLEX               RC Small C v1.0        11/15/1986     ***
243      8086        PC-DOS 2.0         DECUS                  12/01/1985     N/A

The overall ratings were based on my perception of the documentation, completeness, and usability of the implementation.
Listing 1 atan(), /* arc tangent */ sin(), /* sine */ atan2(), /* atan2(a,b) = arctan of a/b */ sinh(), /* hyperbolic sine */ cos(), /* cosine */ sqrt(), /* square root */ cosh(), /* hyperbolic cosine */ tan(), /* tangent */ exp(), /* exponential */ tanh(); /* hyperbolic tangent */ log(), /* natural logarithm */ pow(), /* pow(x,y) = x**y */ log10(), /* log base 10 */ float(x); double x; /* integer to floating point conversion */ fmod(x,y); double x,y; /* mod(x,y) / if 0 < y then 0 <= mod(x,y) < y and x = n*y + mod(x,y) for some integer n */ fabs(x); double x; /* absolute value */ floor(x); double x; /* largest integer not greater than */ ceil(x); double x; /* smallest integer not less than */ rand(); /* random number in range 0...1 */ Standard C Wha Gang Agley P.J. Plauger P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee. Nothing is perfect. A document produced by a committee is certainly no exception. It is hardly surprising, therefore, that people have found much to criticize in the ANSI standard for C. Most of the imperfections can be chalked up to political compromise. Some are existing practices that are too deeply entrenched to change, no matter how strong the current consensus against them. A few are simply things that the standards committee arguably got wrong and didn't fix. A few more are important additions that somehow never garnered enough concerted support to make it in. Preprocessing, for example, was in the worst shape of any part of the C language. The committee did rather a good job of tidying up several messes in this area. Just defining the preprocessing phases more precisely was a major contribution. Still, there were a few botches and omissions. I have been one of the strongest defenders of the ANSI C standard produced by committee X3J11. 
As an active participant, I saw the need for compromise and the need to retain backward compatibility even when it hurt. I also know intimately how much work went into producing the standard. If a few areas couldn't get cleaned up in time, so be it. The ANSI C standard is still one of the best language standards I have ever encountered.

Nevertheless, I am not blind to the shortcomings of the document we produced. We missed a number of opportunities to make the language better in small ways. We committed the sin of inconsistency more times than I care to admit. We left out all sorts of clever improvements to the C language.

I have my own list of gripes about the C standard. I figured that it was time for a change of pace in these pages. After a couple of years of explaining and defending the C standard, I plan to take a few potshots at it.

What follows is a weakly ordered collection of observations. Each describes some way in which I feel the standard could have been better. For now, I confine my remarks to the language proper. I plan to devote considerable attention to the Standard C library in the months to come.

What Didn't Get Cleaned Up

We missed several opportunities to tidy up the language proper. Here are a few of them.

Historical usage prevented us from making floating literals type float by default. It makes more sense to add a suffix to get type double. Sadly, you must add an F to get the former, since C has traditionally considered floating literals to have type double.

Similarly, the committee had to back off from making string literals type array of const char. Too many existing programs have code such as

    char *p = "abc";

which would require a cast to avoid a diagnostic. So string literals have the curious property of being semantically const (for a portable program) without having the type that goes with the semantics.

The French standards committee, AFNOR, wanted to put the null pointer constant NULL into the language. So did a few other people.
It has the same slippery semantics that nil enjoys in Pascal, but without the same full language support. As a consequence, different implementations must define it as a macro in different ways. That invites its misuse, which in turn makes it harder to write portable programs.

Several people proposed various schemes for making enumerations more strongly typed. Most were too scary to adopt. The rest failed to garner enough support even for extended debate. What we ended up with is somewhat better than using preprocessor macros to name constants, but not much. Each enumeration you write becomes a synonym for one of the integer types that promotes to type int. (An implementation can tailor the storage it uses to represent an enumeration.) As far as type checking goes, however, an enumeration constant or data object simply has an integer type. You can mix apples and oranges.

We talked more about making bitfields better, but in the end we didn't do much. What you want, at the very least, is the ability to declare the size of "storage unit" that you are carving up into bitfields. You want eight different base types, the signed and unsigned flavors of char, short, int, and long. The standard provides only three base types, plain int, signed int, and unsigned int. The plain flavor has special meaning in this context (and only in this context). It lets the implementation define whether the component bitfields have values that are signed or unsigned. That wart was added to be nice to existing implementations, not to make bitfields any more usable.

We talked at great length about value preserving versus unsigned preserving arithmetic. (It is more fair to say that we fought tooth and nail.) Nevertheless, none of us tried to fix a closely related problem, the surprises that abound when you mix signed int and unsigned int operands. C traditionally calls for the signed operand to be converted to unsigned, which is the type of the result.
To get a sensible value in many cases, however, you should convert both to a slightly larger signed type. We shuddered to think what changing this rule might do to existing programs, so we left the problem alone. I wish we could have fixed it. When I wrote my first C compiler many years ago, the first thing I found myself hating was the unrestricted goto statement. You can write a goto that transfers control into a block from somewhere outside. You can even jump to the statements controlled by if, else , while, and other flow-of-control keywords. What that does to code optimization is beyond belief. Either you despair of doing many optimizations or you write a much larger translator. We discussed restricting goto statements on several occasions. What prompted us to leave them alone was the protests of an important constituency. More and more people write applications that generate C code to be compiled, as a sort of universal assembly language. A number of existing applications depend on the ability to write ugly goto statements that no human being need ever see. Were we to tidy up the semantics of control flow, we would require serious restructuring of these applications. With no little sadness, we left the goto alone. There was one area that even our extensive cleaning could not rescue completely. It was simply too dirty. I refer to the whole business of declaring and naming external variables. The problem is that C must work with many existing assemblers and linkers built to ancient specifications. That severely limits the length of external names. The committee had no serious problem increasing internal names to 31 significant characters. But we balked at requiring more than the worst-case six characters (and single case of letters) required by the stupidest of existing linkers. Despite heated debate, the majority did not want to add to the difficulty of linking C with other languages. 
Another aspect of this problem affects how you write multiple declarations for the same external variable. C programmers need reliable methods for ensuring that each variable has a definition, and that none has a multiple definition. Linkers vary all over the map in the kind of machinery they provide. As a consequence, C developed several dialects in this area. I believe the committee did an admirable job of embracing all these dialects and accommodating the varied linker technologies. It's too bad, however, that we couldn't just throw it all away and do it over properly. What Went In Wrong In some cases, what we added to the language proper wasn't exactly right. We botched things a bit when we introduced preprocessing numbers. These are tokens that subsume all valid numeric C tokens. We defined them to clarify what intermediate forms can occur during preprocessing while you endeavor to paste together valid numeric C tokens. The only problem is, 0X12E+3 now looks like a single preprocessing number (which becomes an invalid numeric C token). In the past, most translators knew to parse it as a hexadecimal literal, a plus operator, and a decimal literal. We must now learn to be wary of hexadecimal literals that end in E. The include directive had to compromise between two rather different implementation styles. One approach is to parse just enough of each C source line during preprocessing to decide what to do with the rest. In this case, angle brackets and double quotes parse as special delimiters within the include directive. The other approach is to parse every line into preprocessing tokens, then decide what to do. That makes it very exciting to parse directives such as #include </*.h> If you see that you are building an include directive soon enough, you know to ignore anything funny before the closing angle bracket. If you first tokenize and then look, you may decide that the /* signals the start of a comment. 
The committee endeavored to describe preprocessing in such a way that either approach is acceptable. Sadly, the words were reworked several times by editors with conflicting views. I can't honestly report that the pre-tokenizers were well treated in the end. You can still pre-tokenize each line when parsing C, but you have to indulge in a few heroic measures to rescue include directives.

Another example also has to do with how you write declarations, but you can't blame any problems on existing linkers. The difficulties are purely internal to C. I refer to the outrageous overloading of the storage class keywords. What you mean by static or extern (or by writing no storage class at all) can have three different meanings, depending upon where you write the declaration. And if another declaration for the same name is in scope, each of these meanings can change again. C has always been messy in this regard, but the committee made it even messier with one or two arbitrary decisions.

I have tried to tabulate the semantics of storage class keywords several different ways. (See, for example, "What's in a Name?" CUJ February 1988, and Standard C by P.J. Plauger and Jim Brodie, Microsoft Press, Redmond WA, 1989.) None of the presentations have a compelling logic, because the underlying machinery is not entirely logical. It could have been made much cleaner.

Another thing we got wrong was allowing the sizeof operator to accept an rvalue operand. I suspect most people who voted for the extension assumed you could make useful tests with it. For instance, you might think that sizeof (x+y) would tell you whether two floats are added in double precision on a particular implementation. Not so. The type of the expression is float even if the intermediate representation happens to be double.

The extension was worse than useless, however, because it caused trouble. People started asking all sorts of embarrassing questions about the types of various rvalues.
And the committee started deciding answers in all sorts of different ways. We now have the situation that sizeof 'a' can be larger than sizeof L'a' even though sizeof (char) is less than sizeof (wchar_t). Yuk.

There is only one other thing in the C language proper that I think we got really wrong -- the semantics of pointers to constant data objects. What I wanted was a fairly serious promise. The data object pointed to by any pointer to const type should be truly constant, at least for a while. ("A while" should be from the time execution enters the function containing a reference using the pointer until the function returns.)

What this restriction provides is much of the semantics you need to safely parallelize C code automatically. What it evidently costs you is additional subtle compatibility problems with C++. At least that was the strongest argument I heard against the stronger semantics.

So we settled for a fairly wimpy position. All that a pointer to const assures you is that you can't alter the value stored in a data object by using that particular pointer. You can't optimize much, however, because some other agency might be changing the stored value.

I backed the addition of the notorious noalias type qualifier in large part because of the differences over pointers to const. I identified five or six desirable sets of semantics for accessing data objects. Three type qualifiers give you eight possibilities. When noalias got shot down, we had to settle for only four. They weren't the four I wanted.

What Didn't Get In

Lots of things didn't get into the language proper. Here are a few whose loss I lament.

Our failure to solve the non-ASCII character set problem still haunts us at the international level. We need alternate spellings of the operators and punctuators that use the more esoteric ASCII characters, since these are often recycled in ISO 646 or even absent in EBCDIC. Trigraphs such as ??< just don't cut it for readability.
Sadly, the committee could never agree on a particular set of more readable operators.

All sorts of clever additions were suggested to make macros more powerful. Most I cheerfully helped beat down, but two failed suggestions I miss. One is for some form of conditional macro, such as

    #define ptc(f,c) eq(f,stdin,putchar(c),putc(f,c))

If the first two arguments to eq match (after expansion) then the third is retained, otherwise the fourth. With recursion, you can write wondrous macro definitions.

The other thing I miss is some way to create character literals. You can now create a string literal from argument X by writing #X within a macro definition. It would be nice if you could create a character literal by some similar mechanism. Since the next obvious operator ## is already defined, however, that suggests a rather odious ### which few people could swallow. Dave Prosser suggested a rather nice notation, but not until well after the committee (and several implementations) got settled with the current one.

A typeof operator would also help make more powerful macros. It would let you declare temporary data objects having the same type as one of the arguments to a macro. You could then write a generic "swap" macro, as in:

    #define swap(x, y)\
        { typeof (x) t;\
        t = (x);\
        (x) = (y);\
        (y) = t; }

Of course, swap can only take the place of a statement. It cannot yield a value. That's what you need to write a safe macro for, say, the maximum value of two arguments. Otherwise, it is hard to avoid evaluating an argument expression twice, side effects and all. To get temporaries inside a subexpression, you need some way to delimit a local scope. Several schemes were proposed, none were adopted.

A similar but somewhat different need is the ability to construct a structure on the fly. More than one existing implementation lets you write something like

    (struct complex){cos(th), sin(th)}

within an expression.
C is certainly a more attractive language, at least to some constituencies, with such expressive capabilities.

The last thing I really miss is some form of repetition counts within data initializers. The Whitesmiths C compiler let you write things like:

    char pattern[1000] = { [100] '.', [800] 'X', [100] '.'};

which is much easier to type, and maintain, than spelling out all the data.

Beyond this point, my wish list dribbles off with items I find less important. Many of my customers loved the case ranges we added to Whitesmiths' C. Unnamed unions within structures can eliminate the need for dummy member names. Arbitrary rvalues in initializers for auto arrays and structures can have their uses. All of these features I can take or leave, however.

I would like to have seen arrays become first class objects in Standard C. Array assignment and functions returning arrays have always been expressible, despite what many people think. The advent of function prototypes gave us a way to pass functions as arguments. Nevertheless, the confusion surrounding arrays as lvalues in C is so widespread that even I must acknowledge the dangers. I remain a minority of one in this area, I fear, in being willing to face those dangers and fix array handling in Standard C.

Conclusion

Having said all this, I now feel moved to make a few disclaimers. First, I acknowledge that everyone has a list of grievances about the current C standard. I don't presume that my list is more important or (much) more wisely considered than all others. It just happens to be my list, and this is my soapbox.

Second, I do not feel ill used that my list of grievances is so long. I got plenty of opportunity to mouth off during the committee meetings. (Many witnesses can attest that I got more than my share of opportunities.) I felt well heard and was pleased to see any number of issues go the way I hoped.

Last and most important, I don't even want most of these grievances satisfied.
(I argued against fixing many of them when they were debated.) I respect the need to satisfy diverse constituencies. If I got my way on many of these issues, I would feel duty bound to accept the strong desires of others in similar areas. I far prefer a compromise language with widespread support to one that meets my needs but alienates many others.

Even if I were the sole arbiter, I still would not make many of the changes I outlined here. Why? Because the language would be too different from the C we know and love. And it would get that much bigger for a questionable increase in value.

Standard C is essentially twice as big as the C described by Kernighan and Ritchie. Admittedly, complexity is hard to quantify, but I arrive at that number through three telling metrics. The size of the Whitesmiths C compiler doubled in lines of source by the time we achieved full compliance with Standard C. It also doubled in bytes of executable code. And the size of the reference manual that went with it doubled in pages.

I believe Standard C is still intellectually manageable, but is beginning to strain the bounds of a "small" language. Think how big the language would have gotten had committee X3J11 tried to please everyone. Or even just me.

Standard Finalized

The ANSI C standard has been adopted! The ANSI Board of Standards Review (BSR) voted unanimous approval at their December meeting of the draft developed by committee X3J11 and approved by X3. BSR was meticulous in informing the complainant who had delayed progress of the standard for the past year. He was given a generous period of time to file a further protest with BSR. The time period expired, however, with no protests filed.

The official designation of the new C standard is ANSI X3.159-1989. It came in just under the wire, but it did earn a 198X designation.

ISO Update

The C standard commenced its six-month balloting period as a "draft international standard" (DIS) in December 1989.
That is normally the final approval process before SC22 sends the draft on for mechanical review and adoption by ISO. It is widely understood, however, that both the United Kingdom and Denmark are determined to make changes in the C standard at the ISO level.

A meeting of the ISO C committee WG14 will be held in London in late May or early June 1990 to commence work on two "normative addenda." These were approved by the parent committee, SC22, at a recent meeting. One addendum is an attempt by the British to make the language of the standard more precise. The other is expected to add machinery for writing C source files more readably in European character sets.

Once these normative addenda are developed and approved by WG14, they must follow the same approval path through ISO as the standard developed by X3J11. It remains to be seen whether the DIS will be held up pending approval of the addenda. It also remains to be seen how much support exists within ISO for amending the ANSI C standard.

Dr. C's Pointers(R)
Error Handling In C
Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex.

Handling errors in programs is easy. You just don't make any! Well, it's not quite that simple since every now and then your programs must deal with input provided by a human, and humans make mistakes. (Who was it that said "Computing would be real fun if it wasn't for users."?)

Certainly, it is possible to validate data before attempting an operation but it's also common to assume that the routine receives valid data, and design the routine to recover if faulty data causes a process to fail.
That is, don't pay the price of validation every time, only when invalid data is detected. However, this approach can break down, particularly if it is impossible, difficult, or expensive to recover from certain errors. And the earlier you trap bad data, the more information you will have about its origins and what to do next.

Approaches To Error Handling

Unlike other mainstream languages, most of the things that can fail in a C program are library functions. Since C has no I/O statements, there are no equivalents to END= and ERR= in FORTRAN's READ and WRITE statements. There is also no equivalent to BASIC's ON ERROR GOTO.

About the only kind of errors that can be generated in the C language itself are things like arithmetic over- and underflow, memory access violations (either by attempting to dereference a pointer not pointing to an object or function or by a pointer cast to an unaligned type), and stack overflow. All of these are design issues and will not be discussed here.

Since much of the "real" work in C is done via functions, any error information must be communicated between the function detecting the error and that function's caller. This is typically done either by returning an error indicator value or by initializing an error variable passed in by address, or by a combination of both. For example:

    status1 = f(arg);
    if (status1 != 0)
        /* handle error */

Here, the function returns zero on success and a specific error value on failure. In the next case:

    status2 = g(arg, &errorcode);
    if (status2 == ERROR)
        /* handle error */

the function reserves one return value only to indicate an error. The variable errorcode (passed in by address) contains the actual reason if ERROR is returned.

Unfortunately, none of C's standard library functions uses either of these. (Well certainly not the second approach anyway.
You could argue that malloc and friends use the first approach since the only "real" reason they fail is that not enough memory is available, regardless of what they were attempting to do.) C has its own approach; inter-function error communication is done via a global variable, an approach that most structured programmers are strongly warned against for a number of very good reasons. However, that's the way it is so I won't philosophize about it here.

errno To The Rescue

Of course, the global keeper of the error number is our dear friend errno. Historically, errno has been a global int in every program we've written whether we have used it or not. It's really been like a reserved word in the namespace of external identifiers. And since one of ANSI's jobs is to consolidate existing practice, errno survived the ANSI C standardization process pretty much intact.

To help get you into the spirit of things, here's an example of using errno (Listing 1).

It is the programmer's responsibility to clear errno (a zero value means "no error") each time before calling a function that may set it. No library function is required to clear errno explicitly. You must also test errno or store its value for later testing, immediately after the library function in question returns. If you do not, any other library routine (or user-written routine for that matter) might overwrite errno in the meantime. That is, just because a library function is not documented as setting errno, doesn't mean that it doesn't use it for a scratch variable. Messy, but that's the case.

In Listing 1, the first occurrence of errno = 0 is unnecessary since at program startup errno is supposed to be cleared.

ANSI C And errno

The proposed ANSI C Standard pins down a number of things regarding errno. The header errno.h was invented as a home for the definition of errno itself and various macros of the form E* that relate to reporting error conditions.
errno is allowed to be either a global int or macro that expands to a modifiable lvalue having type int. That is, it could be a macro that expands to something like *__errno().

Only two error value macros are defined by ANSI C: EDOM for domain errors and ERANGE for range errors. However, an implementer is permitted to provide their own E* value macros in this header.

The library functions that are documented as setting errno are: acos, asin, cosh, exp, fgetpos, fsetpos, ftell, ldexp, log, log10, perror, pow, signal, sinh, strtod, strtol, and strtoul. Note that fopen (and most other I/O functions) are not included. As such, you cannot portably recover from a file open failure (which is not surprising since there can be many system-specific reasons for such an error).

The library functions perror and strerror can be used to produce formatted messages corresponding to errno's value. However, the commonly implemented table of messages, sys_errlist, and its associated machinery are not part of Standard C.

An Error Handling Envelope

Rather than explicitly clear and test errno all the time, it is much more elegant to have an error handling interface inserted between your code and that in the library. Unfortunately, the standard library uses two different ways to return an error: a negative int value or a NULL pointer value. You may have to have two interfaces, one to handle each. Calling an extra function for each math library operation, for example, is an added cost but so too is including the explicit error checking in each place. It's the old speed versus code size tradeoff.

Listing 2 uses the setjmp/longjmp library mechanism to implement recovery from attempts to take the square root of a negative number.

One problem here is the need to explicitly pass the setjmp context into mysqrt -- it doesn't really look like a call to sqrt. You could hide this behind a macro:

    #define sqrt(d) sqrt((d), context)

but you would still need to define context yourself.
Since ANSI C permits a macro to expand to its own name without recursive death, all existing calls to sqrt could be redirected in this manner with intermediate error checking being added at the cost of recompilation in the presence of this macro. Perhaps a cleaner approach is to make context a global so it never need be passed in.

A word of caution about redefining sqrt though. ANSI C effectively reserves the names of all standard library functions and if you invent something of your own with the same name, the behavior is undefined. However, for a given implementation the macro approach may work.

The matherr Concept

Many systems provide a cleaner way to trap (and also recover from) certain kinds of library errors. The idea originated with UNIX systems but has been widely emulated. It involves a function called matherr.

Each library routine that can detect certain errors calls another library routine, matherr. Now this default version of matherr may do nothing or it may simply write an error message to stderr. By writing your own version of matherr and linking to it instead of the library version, you can take control when one of the trappable errors occurs.

Listing 3 shows a primitive version of matherr. In reality you would probably try to recover from the error.

When Listing 3 is linked with the first example above, the following output is produced.

    #1 OK
    Function sqrt failed with error type DOMAIN
    #2 OK

The reason the second call to sqrt does not show errno set is that matherr returned a non-zero value, indicating that the normal reporting of the error condition should be bypassed (presumably because the error has been "fixed" in the user-written matherr).

With matherr you can bypass or follow the default error handling rules and to a certain extent you can recover from errors and substitute a value that should be returned by the math function instead.
The exception structure has several other members too and the type member values are usually macros or enumeration constants defined in math.h along with the structure template. Check your library manual for more details. Note that matherr is not included in ANSI C. Numerical C Extensions Group This group (abbreviated as NCEG) was formed by me early in 1989. Its purpose is to publish a technical report on directions for adding extensions to Standard C, to support such things as complex arithmetic, IEEE floating-point, vector and parallel operations, and variable dimensioned arrays. The IEEE floating-point standards deal with a number of interesting things (such as +/-infinity and not-a-number (NaN)) that need to be supported (and taken advantage of) in modern C compilers. According to leading IEEE numerical C implementers, errno gets in their way. Likewise for vendors of C compilers doing parallel operations. As such, errno might well have to be ignored in some implementations, simply for the sake of functionality and/or performance. As I write this (mid-December 1989), the ANSI C Standards Committee X3J11 is receiving a letter ballot asking members to admit NCEG as a full working group (tentatively called X3J11.1) within ANSI C. The results of this ballot were 22 for and one against, and will be forwarded to SPARC for their consideration. Contact me for further information on NCEG. 
Listing 1

    #include <stdio.h>
    #include <errno.h>
    #include <math.h>

    main()
    {
        double d;

        errno = 0;    /* clear before each call that may set errno */
        d = sqrt(10);
        if (errno == EDOM)
            printf("#1 domain error\n");
        else
            printf("#1 OK\n");

        errno = 0;
        d = sqrt(-10);
        if (errno == EDOM)
            printf("#2 domain error\n");
        else
            printf("#2 OK\n");
    }

    #1 OK
    #2 domain error

Listing 2

    #include <stdio.h>
    #include <setjmp.h>

    main()
    {
        double value, result;
        jmp_buf context;
        double mysqrt(double value, jmp_buf context);

        while (1) {
            /* longjmp from mysqrt resumes here with a non-zero value */
            if (setjmp(context) != 0)
                printf("Value is out of the domain for sqrt\n");

            printf("Enter fp value: ");
            scanf("%lf", &value);
            if (value == -1.0)
                return;

            result = mysqrt(value, context);
            printf("sqrt(%f) = %f\n", value, result);
        }
    }

    #include <errno.h>
    #include <math.h>

    double mysqrt(double value, jmp_buf context)
    {
        double d;

        errno = 0;
        d = sqrt(value);
        if (errno == EDOM)
            longjmp(context, 1);
        else
            return (d);
    }

    Enter fp value: 1.234
    sqrt(1.234000) = 1.110856
    Enter fp value: 12345
    sqrt(12345.000000) = 111.108056
    Enter fp value: -0.000000
    sqrt(-0.000000) = -0.000000
    Enter fp value: -0.0000001
    Value is out of the domain for sqrt
    Enter fp value: -5
    Value is out of the domain for sqrt

Listing 3

    #include <stdio.h>
    #include <math.h>    /* get struct */

    int matherr(struct exception *pe)
    {
        int retval = 1;    /* assume we'll recover */

        printf("Function %s failed with error type ", pe->name);

        if (pe->type == DOMAIN)
            printf("DOMAIN\n");
        else if (pe->type == SING)
            printf("SING\n");
        else if (pe->type == OVERFLOW)
            printf("OVERFLOW\n");
        else if (pe->type == UNDERFLOW)
            printf("UNDERFLOW\n");
        else if (pe->type == TLOSS)
            printf("TLOSS\n");
        else {
            printf("UNKNOWN\n");
            retval = 0;    /* can't handle here */
        }

        return retval;
    }

Questions & Answers
More On Passing Arrays And Precedence Rules
Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee.
He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 493-4390. When you hear the answering message, press the * button on your telephone. Ken also receives email at kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP). Announcing The Great Name/Obscure Code Contest Based on a reader's response later on in this column, it appears reasonable to launch a new contest. Send examples of the worst names or abbreviations that you have seen in other people's programs (or even your own). Include both the name and a description of what it is supposed to represent. The best (or worst) examples will be published here, with credit for your submission. The name of the programmer who actually wrote the code in which the name is used will not be mentioned without his/her express permission. Q When we met at the Triangle C Users' Group meeting, you invited questions at any difficulty level on 'C'. So, here is one: I try to pass a char array to a function. In the function, I use sizeof to get the array's allocated size. It doesn't work. The function is returning the sizeof passed_array as if it were a pointer, two bytes long. It must be a pointer with the location of the beginning of real_array in it. Right? I asked our instructor about this, and he said the code in Listing 1 would work. At least, that's how I understood what he said. It produces the results shown below it when compiled with Power C from MIX or Instant C 3.0 from Rational Systems, and fails with these compiler messages under Turbo-C 1.5: (marker between char and passed_array[]; in getarray()'s formal parameter list) " Error 10:Type mismatch in redeclaration of 'getarray' " (then, at the end of function getarray(), it complains:) " Warning 13:Parameter 'passed_array' is never used in function " How can I get an array's allocated size within a function it has been passed to? 
What is bothering Turbo-C, and why don't the other compilers complain similarly? If an array's name is really a pointer to the array's beginning, why doesn't sizeof(real_array) also return a 2 when called in main()? (Not that I WANT it to ... :-) Glenn Jordan RTP, NC

A The declaration of an array in a function as a local variable actually sets aside storage for the array. In this sense the sizeof(real_array) is 20, because that is how much storage is set aside for it. The name of an array (whether declared as a local variable or as a static/external) represents the array's address when passed to a function. When you declare a parameter to be an array (e.g. passed_array), you are not really declaring an array at all. You are really declaring that the parameter is a pointer. You pass an address in the call (i.e. real_array), and the function receives that address in passed_array. The sizeof passed_array is the size of a pointer (two or four bytes, depending on the memory model). Alternatively, you could have declared it as char *passed_array;. For parameter declarations, both char *passed_array; and char passed_array[]; are equivalent. The compiler interprets both as meaning that the parameter is a pointer. Your instructor may not have mentioned that you can reference an individual element through a pointer by using either:

passed_array[i]
or
*(passed_array + i)

The compiler treats both expressions identically. Instead of using sizeof, you could pass both a pointer to the array and its size in either bytes or in elements. Usually the element count is more useful than the byte count:

function(real_array, 20);
.....
function(passed_array, size)
char passed_array[];
int size;
{
    for (i = 0; i < size; i++) {
    ......

You can avoid passing the size by designating a unique value for the end of the valid elements in the array, just as strings (character arrays) are terminated with the NUL (zero or all bits off) character.
Remember, the terminator must be some unique value that will never appear as a valid value for the type you are manipulating.

Q My instructor says that (*ch)++ evaluates as increment by one type-size-length the value found in address ch. No quarrel there. Being curious, I asked him how the expression would evaluate without the parentheses. He said that since ++ and * are unary operators, and that * had higher precedence than ++, the following was true: *ch++ is evaluated identically to just ch++ without the * and that in both cases, you would just increment the address ch by 1. I disagreed (never disagree with your instructor). I said that if * had higher precedence than ++, like he insisted, that *ch++ should be exactly the same as (*ch)++. He said no, since ++ is a unary operator, it could only see the ch, not *ch, even if * had already operated on ch. I told him I thought that was really crazy, and he got upset... Now, actually, as you experienced programmers know, * and ++ have the exact same precedence, and are evaluated right-to-left when there is an associativity question, as above. So, he was right, the parentheses are required to make (*ch)++ increase the value held in address ch. But the stuff about ++ acting only on the adjacent operand must be wrong, right ???? I mean, wouldn't: ++*ch do exactly the same as (*ch)++ ? He claimed that in the case, ++*ch, the ++ would not know what to do with the operand *, while I claimed that *ch would already be evaluated to the single-value contents of address ch when ++ attacked. He responded that he had been programming in C for years, and knew what he was talking about. Comments? Perhaps I am the one who is misunderstanding... Glenn Jordan RTP, NC

A You are correct in your interpretation. By associativity and precedence rules: *ch++ equals *(ch++) Both use the address contained in ch as a pointer and then increment the contents of ch using pointer arithmetic.
++*ch equals ++(*ch) These forms use the address in ch as a pointer and increment the contents at that address. *++ch equals *(++ch) These forms increment the contents of ch using pointer arithmetic and then use that new value as a pointer. In order to post-increment the contents of a target location, you need to use explicit parentheses to overcome the precedence, yielding: (*ch)++ This combination uses the address in ch as a pointer and increments the contents at that address. For example, if we assume that doubles are eight bytes long, then incrementing a pointer to a double increases that pointer by eight. The comments in Listing 2 detail the behavior of various pointer/increment combinations. Reader Responses: Character Constants This letter is in response to your discussion of character constants on pages 113 and 114 of The C Users Journal, January 1990. I believe that your discussion is flawed and that the Microsoft C and Quick C implementations do not comply with the draft standard. You say that the character è (where e is replaced by the accented e, code 138 decimal) is not part of ASCII and so the compiler could do with it what it wants. I refer you to the following items in the December 7, 1988 draft C standard: Section 2.2.1, Page 11, Line 12: Both the basic source and the basic execution character sets shall have at least [emphasis added by RHG] the following members ... Section 3.1.3.4, Page 29, Line 16: c-char: any member of the source character set except the single quote ', backslash \ , or new-line character escape-sequence Section 3.1.3.4, Page 30, Line 33: If an integer constant contains a singlecharacter or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int. 
Given that Microsoft's C and Quick C compilers accept character 138 without any diagnostic, I think it is safe to assume that they consider the character to be part of the source character set. Therefore, the literal is indeed a legal character constant and so should be treated like (int) (char) è which has the value -118 if characters are treated as signed. Hence, your example demonstrates a bug in the compilers, not an implementation dependency. I have copied this note to the postmaster at Microsoft in the hope that he will forward it to the appropriate person in the C compilers development group for comment. You may also use this letter in a future column if you see fit. Richard H. Gumpertz Leawood, Kansas A You (Mr. Gumpertz) are correct, I believe. The sample program in the article (pg. 114) shows a bug that has actually been in MS C since well before the ANSI standard (I duplicated it all the way back to C v4.0). We do appreciate you bringing this to our attention. This bug will be fixed in our upcoming version of MS C. Thanks. Dave Weil Group Development Mgr., System Languages Microsoft Corp. Thank you. I stand corrected on this technicality. You are correct if a character representation is accepted in a character constant, then it should act according to the rules for characters. Non-ASCII characters are not in the ANSI standard source character set that must be supported by a conforming compiler. I strongly urge against using non-ASCII characters as character values. You can always use a #define in their place. Not only do you avoid the inherent non-portability of such a program, you also avoid word processing problems. For example, I was porting a program somebody had written with a word processor that accepted non-ASCII characters. My word processor does not accept them. It uses the high order bit as an internal designation of the end of a word. It read the program, but the non-ASCII characters appeared as the ASCII value with the high-order bit off. 
If you do use characters with the high-order bit on, then you could declare the variables that use them as unsigned chars preventing sign extension when the char is expanded to an integer. --KP Naming Conventions And Indentation Everyone knows the "CMP" stands for corrugated metal pipe, not compare or compute. --Marcus Russell, West Berlin, NJ This comment refers to a previous response I had given to a question regarding naming conventions. I suggested that one should adopt some standard abbreviations, if one did not spell out names in full. Comments from other people have suggested that there is a widespread distinction between the vowel droppers and the first few letter users. "compute" could be abbreviated as "cmpt" or "comp", depending on your preference. In my earlier days I used to use "cmp" as a shortening of "cmpt". This always caused conflict when "compare" got shortened to "cmpr" and then to "cmp" also. I find it interesting reading listings in this and other magazines. I believe that a program should be almost as readable as a book. Using fully spelled out variable names contributes as much as any other factor to easier understanding of a program. This leads me on to another topic of readability -- the great brace debate. Brace alignment of compound statements seems to be a topic that provokes a variety of opinions. I think that, like taste in art, each person's view is different and sufficient justification can be developed to support any particular stand. A recent article in the C Gazette had some words to say about indentation styles. I recommend the magazine for those who like reading C code in order to learn about it. There are a lot of source listings in that magazine. There are many possibilities for brace alignment. If braces are placed on lines by themselves, then either or both can be aligned with the enclosed statements or one tab stop to the left of the statements. 
Alternatively, the opening brace may be on the same line as the controlling statement. The closing brace might be on the same line as the last enclosed statement. This yields a number of possibilities. In Chapter 14 of All on C, I listed four common formats. Here are those with a few more. I've left off several variations which appear rather ugly and of no use.

Braces on separate lines and aligned with enclosed statements.

if (x)
    {
    ...
    }

Braces on separate lines and aligned with controlling statement.

if (x)
{
    ...
}

Opening brace on same line as controlling statement, closing brace aligned with enclosed statements [my preference -- rlw].

if (x) {
    ...
    }

Opening brace on same line as controlling statement, closing brace aligned with controlling statement (Kernighan and Ritchie style).

if (x) {
    ...
}

Opening brace on same line as controlling statement and closing brace on same line as last enclosed statement.

if (x) {
    ...}

Luckily there are "pretty-print" programs that you can use to alter the style of the indentations, for programs you have written or that you have received and are trying to alter or maintain. However it's usually wise to adopt one style and use it faithfully. I originally adopted the style: if (x) { ... } The initial choice was arbitrary. Later I reviewed my usage and found a few compelling reasons to switch to:

if (x)
    {
    ...
    }

This appearance is more consistent with the use of indentation for non-compound statements. Those look like:

if (x)
    statement

It also makes it easy to match up braces. The other styles which have unaligned braces make it more difficult to do this. Those of you who submit code for this column will find that I have reformatted the listing for the sake of consistency within the column.
--KP

Listing 1

#include <stdio.h>

main()
{
    char real_array[20];

    printf("Realarray can hold %d chars.", sizeof(real_array));
    getarray(real_array);
}

void getarray(passed_array)
char passed_array[];
{
    printf("\nPassedarray can hold %d chars.", sizeof(passed_array));
}

---------------------------------------------------------------
Results : (under Power C and Instant C)
Realarray can hold 20 chars.
Passedarray can hold 2 chars.
---------------------------------------------------------------

Listing 2

double d[5] = {1., 2., 3., 4., 5.};  /* Assume this starts at address 100 */
double *ch;
double e;

ch = d;          /* 100 placed into ch */
*(ch++) = 5.;    /* 5. placed in d[0]
                    ch incremented to 108. */
++(*ch);         /* Contents of d[1] (at address 108)
                    incremented by 1, to 3. */
*(++ch) = 7.;    /* ch incremented to 116
                    7. placed in d[2] (at address 116) */
(*ch)++;         /* The 7. at d[2] is incremented to 8. */
e = ++(*ch);     /* The 8. at d[2] is incremented to 9.
                    9. is placed in e */
e = (*ch)++;     /* The 9. at d[2] is placed in e
                    d[2] is incremented to 10. */

How To Do It... In C

Practical Schedulers For Real-Time Applications

Robert Ward

Robert Ward is president of R&D Publications and author of Debugging C, an introduction to scientific debugging. He has done consulting work in software engineering and data communications and holds an M.S.C.S. from the University of Kansas.

What Is Real Time?

Real-time is not a synonym for "real-fast". Contrary to popular opinion, making everything "real fast" won't necessarily make a real-time program work correctly. A much better synonym is "on time" since, in a real-time program, certain events must happen at a specific time. Making input and output events happen "on time" is pretty straightforward if you have only one I/O path to worry about. But real-time programs, especially embedded real-time systems, are often also multi-tasking programs. Most real-world, real-time programs are expected to simulate several pieces of simultaneously operating hardware.
When analyzing a project that is both multi-tasking and real-time, the designer must recognize that some tasks are less urgent than others. For each real-time task, "on time" may have a different meaning, depending upon the time constraints associated with that particular task. A continuously running built-in self-test, for example, usually runs without any time constraints, even if it is testing a real-time system. Some real-time events may need to happen at a specific wall-time; others at a specific interval from some external event. The real-time program must properly balance these varying needs at every instant, under every imaginable combination of input conditions. This article will show how an appropriate general purpose scheduler can significantly reduce the design complexity in such programs and also significantly increase your confidence in the feasibility of the design even before you write any significant amount of code. What Is A Scheduler? A scheduler is simply code that decides which task to perform next. Thus a scheduler can be as simple as the loop in Listing 1. This "slop-cyclic" scheduler cycles repeatedly through each task (cyclic) at a rate that may vary depending upon the time required to execute each task (slop). A "rate-cyclic" scheduler can be almost as simple, as shown in Listing 2. The rate cyclic scheduler cycles through all the tasks at a constant rate of once per clock tick. If you are accustomed to writing real-time systems as one large loop with input polling and capture code sprinkled throughout the system, it may seem pretentious to describe Listing 1 as a "scheduler." After all, one could argue, the execution sequence is, like the polling loop, just a hard-wired loop -- the so-called scheduler just adds calling overhead. Even this trivial scheduler, though, offers several important advantages. First, all the scheduling information is in one place. 
If you must alter the code (and the timing relationships between the pieces), you know just where to look to make the necessary scheduling adjustments. For example, if after writing the project, you found that task3() and task4() didn't execute as rapidly as expected, causing task1() to "miss events", you might solve the problem by making the change in Listing 3. Now task1() gets scheduled more than once during the cycle. You can even drop one task into the "middle" of another by breaking one task into several subparts (see task2a() and task2b() in Listing 4). With the scheduling code isolated in a single module, the designer also reserves the option to completely change the scheduling mechanism. For example, instead of splitting task2 into two parts (as in Listing 4), you might obtain the same effect by installing a more sophisticated pre-emptive scheduler like the one we'll develop later in this article. A distinctly separate scheduler can also make development and testing much easier. You might plan to use a simple loop like Listing 1 to perform initial testing and characterization of your task code and then install a more sophisticated scheduler for final test and production. Priority & Pre-emption In the trivial schedulers of Listing 1 - Listing 4, all tasks are of equal importance and the order of task execution is statically determined by how the code is written. This egalitarian approach forces the programmer to adjust for differences among the tasks by adjusting the code, for example by making multiple calls to task1() and by splitting task2() into two parts. If the scheduler were more competent, we wouldn't need to make these coding compromises. What we need is a scheduler that can recognize that some tasks (task1() for example) are more important than others (have higher priority) and that sometimes long tasks like task2() may need to be interrupted (be pre-empted) so that some shorter, more important task can run "on-time". 
The scheduler should not only recognize these differences, it should be able to dynamically adjust the execution order to accommodate them. Prioritized scheduling can be accomplished by augmenting the simple slop cyclic scheduler so that it uses a different control structure driven by several "ready" lists. Listing 5 presents the basic structure for a very simple environment where each task is assigned to a different priority level and each ready list consists of a separate flag in the structure ready. The ready flags are set by an interrupt service routine that captures related input, or by some other task (for example, task2a() would set task2b()'s flag, thereby "scheduling" task2b()). Listing 5 will schedule events dynamically based on their readiness, but it still lets each task run to completion. The next level of scheduler sophistication, pre-emption, complicates matters considerably, but is still easy to implement once the calling conventions are understood. The pre-emptive scheduler pre-supposes an environment where virtually all events are serviced by interrupts. These interrupts create natural "break points" at which other tasks can be pre-empted. (Actually, you can get the same effect by sprinkling special calls, gotos and stack manipulations throughout each task, but you won't want to maintain the result.) Each interrupt service routine ends with a call to the scheduler. The scheduler then examines all higher priority ready lists to see if some more urgent task needs to run. If not, the scheduler simply returns, allowing the interrupted task to continue. If there is a more urgent task waiting, the scheduler calls it (see Listing 6). This scheduler treats LEVEL1 as the highest level of priority. IDLE tests greater than all other levels. Listing 6 assumes several conventions and data structures not explicitly shown. The functions push() and pop() manipulate a stack of current priority levels. 
If you are willing to pass the current priority level as a parameter to every task, and to accept responsibility for always calling the scheduler with the current level's priority as a parameter, you can stack this information implicitly as function parameters. The function getnext() searches a linked ready list for actions more urgent than the action just interrupted. If a more urgent task is found, getnext() copies its descriptor from the ready list to the next structure. The task is then invoked via a pointer to function. The assignments marked /*lock*/ in Listing 6 must be executed with interrupts disabled to avoid potential synchronization errors. Before using this code, you must at least bracket these lines with code to disable and enable interrupts. The scheduler must always be called with interrupts enabled -- otherwise it will provide only one level of pre-emption and may cause some interrupts to be missed. This implementation is very stack-intensive. Each interrupt can potentially generate three stack frames for each interrupt (interrupt, interrupt's call to scheduler, scheduler's call to task). In an environment where many interrupts can arrive simultaneously, the stack may expand very rapidly. Some Design Advice A real-time design should begin with a careful analysis of the possible events and the relationship among them. The goal is to decouple (with respect to time) as many actions as possible. Decoupling will often greatly increase your ability to service critical events, by allowing the great bulk of the processing to occur in the time between critical events. This analysis should identify the time-critical events (those that really must be done NOW), and prioritize the other events according to their relative criticality. Generally, actions subject to similar constraints should be made members of a priority class and broken into execution units that are small relative to the time-tolerance of the next, more urgent class. 
Early in the design analysis, you should compute the probable CPU utilization. If the tasks assigned to the system will consume more than 70 percent of the CPU throughput, you should probably consider the design impractical and either find faster algorithms for certain modules, add additional hardware, or run on a faster CPU. In truly asynchronous environments, a processor utilization of greater than 70 percent greatly increases the likelihood that one of your ready queues will grow to an unmanageable length. You can make an exception to the 70 percent rule if you can prove that your waiting lists will never grow beyond some small fixed length. The structure of the program will mirror the classification of events. Critical events will be serviced by interrupt handlers, non-critical events will be processed according to their priority by a general-purpose scheduler, and queues will handle communication among the pieces. Since these systems are almost always concurrent, it is imperative that the programmer be comfortable with the issues of deadlock avoidance and shared resource management. With careful analysis of events and adequate throughput, a simple cyclic scheduler is often adequate. In some applications where some actions consume very large amounts of time compared to the time-tolerance of higher-priority tasks, it may be necessary to implement a pre-emptive scheduler. A Case Study Suppose you are to build a real-time system with four major functions: Process Control. This function consists of monitoring a sensor on a production line and adjusting a control output to keep the process within acceptable limits. The sensor is to be sampled every 100 ms (±1 ms) and necessary output corrections must be made within 100 ms. Statistical quality control methods are to be used to decide if the sample input represents an unacceptable deviation. Test programs have shown these calculations to require 7 ms. 
A programmable internal timer is to be used to control the sampling interval. The timer can generate interrupts. Manual Override. The system should accept human specifications for the control output from a keyboard. This keyboard debounces inputs, but once the character is validated, it must be read within 70 µs. The keyboard's "character ready" status line is connected to the CPU's interrupt line. Time-of-day Clock. An hour and minute time-of-day display. Presumably it will take its timing from the internal timer. The display is mechanical, each digit on a "flip board", driven by a stepper motor. The motor must be pulsed through 60 steps to change a digit. Each step takes 10 ms, to complete. Pulses are directed to the appropriate digit position by a multiplexor, thus the position must be selected and then the pulses sent to change a digit. Synchronous Communications Support. The system functions as a "repeater" in a communications network. Supporting hardware captures data a block at a time and requests your system to perform a crc-16 on the data. If the data is correct, your system must initiate a write 75 ms (± 100 µs) after the block was marked received. A failure to meet this requirement will result in the subordinate hardware missing a polling cycle and loss of the block. The subordinate hardware's "block ready" shares the interrupt line with the keyboard. The block ready signal remains set until reset by the CPU. Table 1 summarizes these specifications and adds estimates for each task's execution time. These execution time estimates can be based on expected code size for each task, on prior experience with similar problems, on padded measurements of execution speeds for certain critical inner loops, or on measurements taken from "prototype" implementations for each task (perhaps written in a high-level language). 
Since adding a scheduler to the design makes each task a piece of stand-alone code, time spent coding each task for these measurements isn't wasted. Most of your characterization code can be used in the finished design. Critical Operations Capturing a keystroke and (because it shares an interrupt with the keyboard) capturing a block ready indication are the only critical tasks in the system. These will be processed by an interrupt handler with interrupts turned off throughout the service. Priorities Level 1. Capturing a data sample, capturing a clock tick and initiating a block write are "nearly" critical. It also makes sense for all three to be handled in the same interrupt service routine. Since they have lower priority than the critical events, interrupts will be enabled during as much of the service routine as possible. Thus the data sampling interrupt routine could be interrupted during its execution. We'll assume that the first 15 µs of this process can't be interrupted. Note that this priority level isn't recognized in the scheduler because it is fully processed in the interrupt handler -- I just wanted to show that even interrupt handlers differ in their urgency. Level 2. The block checksum and sample analysis will be grouped at the next level of priority. The checksum has been broken into several short parts so that it can't "block out" the input analysis for more than a few milliseconds. Each part will schedule its successor after it completes. This ensures the sample analysis will be able to "sneak" in between two parts (the sample analysis is scheduled by an interrupt routine, possibly while the checksum is executing). Level 3. All clock control and keystroke parsing will be performed as level 3 (or background) activities. Notice that even though none of these events consumes more than 10 ms, if three such events were allowed to be interspersed with level 2 events, the block check would miss its "output deadline." Level 4.
This level is reserved for pure "waiting" activities, such as waiting for the clock stepping motor to time-out. Listing 7 presents the structure of the entire application in a C-like pseudocode. This code would use the scheduler of Listing 6. Throughput Requirements Processor utilization is computed by combining the frequency estimates and time consumed estimates from Table 1. Table 2 shows that this design falls well within the 70 percent rule, and should probably be feasible. Total utilization isn't the only prerequisite to feasibility, however. The design must also meet the response time restrictions. Response time performance is evaluated by calculating the worst-case response time for each event. Worst case analysis should always include the possibility that the program has just responded to some interrupt or that multiple copies of the analyzed interrupt arrive at the closest interval possible. Table 3 analyzes the design's latency when performing a process control cycle. Additional Ideas When a variety of events happen at non-harmonic intervals, consider implementing a timer scheduling queue. Events can specify the timing of other events by putting a timer programming request in a special queue. If your system has multiple interrupting events and no vectored interrupts, restrict the interrupt handler to just capturing the interrupt and queueing it. The highest priority task then examines the information in this queue and schedules other work. To make certain two tasks of equal priority get fair scheduling, partition them into pieces (as with the checksum above) and let each piece upon completion schedule its successor. This scheme will allow the shorter tasks to be scheduled. This trick can often eliminate the need for a pre-emptive scheduler. Conclusion An appropriate scheduler can greatly simplify real-time designs by allowing the individual task modules to remain ignorant of their interaction with other real-time tasks.
A distinct scheduler also simplifies debugging and performance analysis. If you aren't comfortable with the concurrency issues implicit in handling the ready queues and other shared resources in the dynamic schedulers, you can still use the static versions and preserve the option of incorporating a more complex scheduler when the project eventually demands it.

Little schedulers like those developed here are usually all the real-time support a controller needs. They offer distinct advantages over a commercial real-time kernel: the scheduler is smaller, simpler to understand, comes complete with source code, and is much less expensive.

Table 1

Events                 Trigger             time consumed   latitude     freq
capture sample         timer interrupt     70 us           +-500 us     10/s
analyze sample         input available     7 ms            -0,+92 ms    10/s
output correction      analysis complete   35 us           n/a          10/s
capture keystroke      G.P. interrupt      35 us           -0,+35 us    5/s
parse input            keystroke stored    1 ms            500 ms       1/s
clock tick             timer interrupt     15 us           +-500 us     10/s
minute change          clock tick          150 us          ?            1/60s
digit change           minute change       50 us           ?            4/60s
digit step             digit change        10 ms           ?            320/60s
capture block ready    G.P. interrupt      35 us           -0,+35 us    7/s
check 1st q            capture block       13 ms           +-3 ms       7/s
check 2nd q            1st q checked       13 ms           +-3 ms       7/s
check 3rd q            2nd q checked       13 ms           +-3 ms       7/s
check 4th q            3rd q checked       13 ms           +-3 ms       7/s
initiate block write   4th q checked       35 us           +-100 us     7/s

Table 2

Events                 factors             time used       capacity used
capture sample         70 us * 10/s        700 us/s        .0007
analyze sample         7 ms * 10/s         70 ms/s         .07
output correction      35 us * 10/s        350 us/s        .000350
capture keystroke      35 us * 5/s         175 us/s        .000175
parse input                                1 ms/s          .001
clock tick             15 us * 10/s        150 us/s        .000150
minute change          150 us * 1/60s      150 us/60s      .000003
digit change           50 us * 4/60s       200 us/60s      .000004
digit step             10 ms * 320/60s     53.3 ms/s       .053300
capture block ready    35 us * 7/s         245 us/s        .000245
check 1st q            13 ms * 7/s         91 ms/s         .091000
check 2nd q            13 ms * 7/s         91 ms/s         .091000
check 3rd q            13 ms * 7/s         91 ms/s         .091000
check 4th q            13 ms * 7/s         91 ms/s         .091000
initiate block write
Total Utilization                                          .489927

Table 3

Start checksum              0
Receive GP interrupt        35 us
Receive timer interrupt     15 us
Receive second GP intr.     35 us
complete timer intr         55 us
complete chksum part        13 ms
Perform analysis            7 ms
----------------------------
Total                       20.14 ms

Listing 1

    while (FOREVER) {
        task1();
        task2();
        task3();
        task4();
    }

Listing 2

    while (FOREVER) {
        /* sleep until awakened by clock interrupt */
        sleep(clock_tick);
        task1();
        task2();
        task3();
        task4();
    }

Listing 3

    while (FOREVER) {
        task1();
        task2();
        task1();
        task3();
        task1();
        task4();
    }

Listing 4

    while (FOREVER) {
        task1();
        task2a();
        task1();
        task2b();
        task1();
        task3();
        task1();
        task4();
    }

Listing 5

    while (FOREVER) {
        if (ready.task1) task1();
        else if (ready.task2a) task2a();
        else if (ready.task2b) task2b();
        else if (ready.task4) task4();
        else if (ready.task3) task3();
    }

Listing 6

This scheduler treats LEVEL1 as the highest level of priority. IDLE tests greater than all other levels.

    struct actions {
        int priority;
        void (*action)();
        char * arg;
        struct actions *nxtptr;
    } ready [MAX_WAIT], next;

    main ()
    {
        ...  /* initiate interrupt handlers */
        while (TRUE){
            clevel = IDLE;
            do_loop();
        }
    }

    void do_loop()
    {
    scheduler:
        if (clevel == LEVEL2) return;
        else if ((clevel > LEVEL2) && (getnext(LEVEL2) != EMPTY)){
            push (clevel);
            clevel = LEVEL2;        /* assignment; should be locked */
            (*next.action)(next.arg);
            clevel=pop();           /*lock*/
            goto scheduler;
        }
        else if ((clevel > LEVEL3) && (getnext(LEVEL3) != EMPTY)){
            push (clevel);
            clevel = LEVEL3;        /* assignment; lock */
            (*next.action)(next.arg);
            clevel=pop();           /*lock*/
            goto scheduler;
        }
        else return;
    }

Listing 7

Pseudo code for interrupt handler:

    On keyboard interrupt do {
        save context
        if keyboard has input, save with work request to level3 queue
        if block is ready {
            reset indicator
            add check_1st_blk to level2 queue
        }
        restore context
        enable interrupts
        return from interrupt
    }

    On timer interrupt do {
        save context
        if write ok flag set {   /*done with interrupts off to avoid clash */
            initiate write
            clear flag
        }
        enable interrupts
        capture sample, save with work request to level2 queue
        step minute counter, on overflow {
            reset counter
            put minute change work request in level3 queue
        }
        return from interrupt
    }

Pseudo code for tasks:

    Analyze sample {
        perform statistical analysis
        if out of bounds {
            compute correction
            output correction
        }
        return
    }

    Check Block Pt 1 {
        compute partial checksum
        save result with pt2 work request in level2 queue
        return
    }

    Check Block Pt 2 {
        continue checksum
        save result with pt3 work request in level2 queue
        return
    }

    Check Block pt3 {
        continue checksum
        save result with pt4 work request in level2 queue
        return
    }

    Check Block pt4 {
        complete checksum
        if ok, set write ok flag
        return
    }

    Parse input {
        save input parameter in command line buffer.
        If input keystroke is a terminal symbol {
            parse buffer;
            output manual correction;
            clear buffer;
        }
        return;
    }

    Minute change {
        increment minutes-ones
        add work request for digit change to minutes-ones to level4 queue
        add work request for digit step level3 queue
        on overflow {
            add work request for digit change to minutes-tens to level 4 queue
            add work request for digit step level3 queue
        }
        on tens overflow {
            add work request for digit change to hours-ones to level 4 queue
            add work request for digit step level3 queue
        }
        on hours ones overflow {
            add work request for digit change to hours-tens to level 4 queue
            add work request for digit step level3 queue
        }
        on hours-twelve overflow {
            add work request for digit change to hours-ones to level 4 queue
            for (i=0; i<8; i++)
                add work request for digit step to level3 queue
            add work request for digit change to hours-tens to level 4 queue
            add work request for digit step to level 3 queue
        }
    }

    digit change {
        set multiplexor to select requested digit
    }

    digit step {
        for (i=1; i<60; i++) {
            add work request for one_pulse to level4 queue
        }
    }

    one_pulse {
        pulse stepping motor
        busy-wait for 10 ms
        return
    }

On The Networks

A Perl Of Great Price

Sydney S. Weinstein

Sydney S. Weinstein, CDP, CCP is a consultant, columnist, author and president of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/Usenet mailbox syd@DSI.COM (dsinc!syd for those that cannot do Internet addressing).

Administrivia

As I have said in prior columns, I am willing to forward a list of Usenet sites near you for access to Usenet, Netnews and E-mail. However, I can only provide this service to those who send a self-addressed stamped envelope.
Also, please include your area code in the request. An area code gives me a greater chance of finding a site that might be a local call for you. Note, however, I do not contact these sites for permission. All I am doing is extracting the names and contact information from the Usenet mapping information and sending you that printout. It is up to you to contact the sites listed in the maps. Remember, they are doing you a favor if they let you connect.

Pearl Of The Month: Perl

One of the most respected freely distributed software authors on the net is Larry Wall of JPL-NASA. He has written many software tools including the popular netnews reader RN, the source language patching program Patch, and a software configuration and distribution support toolset Dist. His latest large effort has been Perl -- Practical Extraction and Report Language, or Pathologically Eclectic Rubbish Lister. Perl was first released as version 2. This review is of his new release, version 3. To quote the manual page:

"Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of C, sed, awk, sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC PLUS.)"

That paragraph was true for version 2, but is an understatement for version 3. Perl can now handle binary files, network sockets, and even dbm database files with ease. Perl runs counter to the typical UNIX tool philosophy of "do one item in a tool and hook many tools together with shell scripts and pipes."
Perl allows you to combine all the sections together in one efficient script. Perl's two great claims to fame, in my opinion, are its ability to provide the right set of features for writing useful tools for systems, and its interpretive nature, which allows for easy debugging and development.

Installation

Perl is not a small program, so if snarfed off the network it comes in many parts. As of this writing in mid-December (now you know, these columns have about a four month lead time), six patches have been issued to Perl, making the current version Release 3.0 Patchlevel 6. After unpacking all the parts and applying the six patch files (using Larry's patch utility, of course), the instructions say to run a shell script called Configure. It's worth obtaining Perl just to see how this is done. Configure, a giant shell script written by the Perl program metaconfig from Larry's Dist package, analyzes your system and determines what features of Perl your system can support, where things are located on your system, and a great deal of additional information to make Perl install correctly on your system automatically. If only other packages used this method (Note, Elm does).

After Configure is run, a special version of the make script automatically figures the dependencies for each C file as you have configured them and adapts the Makefile. Then the system is compiled three times: once for normal Perl, once for a version called taintperl, and lastly for a setuidperl version. The tainted version prevents any command line argument, environment variable, or input, or any result of operations on these values, from being used in subshells, system calls, or for modifying files or directories. This is used for setuid scripts.

Now, another thing I wish more authors provided: Perl has a rather complete regression test suite to validate Perl's configuration and compilation.
This test suite may not catch all problems, but it goes a long way towards providing confidence that a package as large as this one was configured properly and compiled without compiler induced errors. Perl runs the regression test automatically after the system has been built, and performs over 850 separate tests.

Features

As a combination of the shell, C, sed and awk, Perl has a syntax close to C, with most of its operators, plus the ability to process variables and run subprocesses like the shell, perform pattern matching and substitution like sed, and generate reports with features similar to awk. A couple of the more interesting features include:

Associative arrays: In addition to scalar variables (single numbers or text strings) and normal arrays (vectors of numbers or strings), Perl also provides an array concept called an associative array. This array is a mapping of tuples. Thus the array index, called a key, is itself just a number or text string. The difference between this and an array where the index is an enumerated type is that the index is dynamic and includes any values desired at run time. Thus you could say

    $balls{'red'} = 7;
    $balls{'green'} = 34;
    $balls{'blue'} = 12;
    while (($color, $number) = each %balls) {
        print "I have $number $color balls\n";
    }

Variables are preceded by a $ and the { } array indices are for associative arrays. Standard arrays use [] for their indices. The % prefix indicates the entire associative array. Thus this program initializes the associative array and then loops using the each function to return each tuple of key and value. These tuples are assigned to the scalar variables color and number and then used in the print statement. An easier way to initialize the array would be to use the list construct of Perl:

    %balls = ( 'red', 7, 'green', 34, 'blue', 12);

Open function: Perl's open call can also open pipes to or from other processes.
Thus Perl can start other processes and either read their results (very useful for letting Perl figure out the SQL to run and then running SQL to obtain the data) or pass its own output to another program (such as the print spooler). Of course, Perl can read, write, and append to files with the open function.

Formats: Perl supports a BASIC-like format option for output files in addition to the print and printf constructs. The Perl program in Listing 1 converts UNIX System V type df (disk free) listings into the BSD type of report.

Listing 1 shows several of the features of Perl, as well as demonstrating the format capabilities. The first three lines make sure that Perl is running the program, allowing a plain "executable" file to automatically be a Perl script. If the system supports the #! notation, then the kernel will spawn Perl to handle this file automatically, and not the shell. Otherwise the second line causes the shell to execute Perl on the script. When Perl does see the script, it treats the first line as a comment, and the second and third lines as a valid Perl statement. Since the variable running_via_sh is not non-zero (it isn't even defined yet), the eval statement is not executed and Perl just continues on in the script.

@ARGV is the array (@ is the symbol for an entire regular array) of the arguments passed to the Perl script, not counting the name of the script. Thus the join line makes a text string of all the arguments separated by spaces. This string is used in the open call to the df process, causing it to output only the requested file systems (if arguments are given), or all the file systems (if no arguments are given). Note that if the df fails, the shell-style || construct is honored to allow the error message to be output when the open fails.

The formats could appear anywhere in the script. The default top of page format for the STDOUT file is called top, but that association can easily be changed.
In this case the top format is used to provide column headers. Note that a format continues across lines until a line with just a period is encountered, thus outputting multiple lines. Both top of page and file formats may also include variable substitutions. The STDOUT format "writes" to the standard output file. Variable substitution fields come in three types, <, |, and >, for left justify, center, and right justify respectively. All variable substitution formats start with a @ character and take as many spaces as are desired. Each line with @s in it is immediately followed by a line listing the variables to print on that line. It is not necessary to space variables as I did, but I think the spacing improves readability.

The while statement loops over the lines read from the Df file. The special symbols <> mean read a line from the file. The if block uses regular expression matching on the line just read and only performs the then clause of the if when the line contains the text string total blocks. In the else clause the s commands are string substitutions, again based on regular expression matching. These commands work on the input line by default; however, the =~ operator is used a couple of lines later to specify that the substitution should be performed on the $name variable instead of the input line. The write function is used to output a line using the format specified earlier.

Finally, the last if block uses the special variable syntax $#name, which references the subscript of the highest element. Since this Perl script origins arrays at zero (the default), a less than zero check tells whether any arguments were passed to the script. As a result of this test, the total line is only printed if no arguments were passed and the df is for all file systems.

Perl scripts are also easy to debug, in part because of the debugger embedded within Perl. Adding a -d argument to the invocation line tells Perl to run the script in debug mode.
Debug mode supports breakpoints, single stepping, program browsing and "immediate mode" execution of any valid Perl statement. Thus the contents of variables can be examined or changed at any time.

I didn't even describe directory processing, BSD socket access, subroutines or much on regular expression processing. Perl does come with a complete reference manual, although a tutorial manual is not provided. Perl, of course, is most useful on UNIX (or Xenix) systems. However, restricted portions of Perl have been compiled on VMS and on MS-DOS. Perl has gotten so popular there is now a Usenet news group comp.lang.perl.

But Perl is not a small program, and its load size causes a sizeable overhead at startup. Of course, for longer scripts this delay is not a problem, but the overhead is enough to keep Perl from replacing the shell for all scripts.

There's More

comp.sources.unix was active for a short while and has again gone quiet. During its active time, Rich Salz, the moderator of comp.sources.unix, did provide some unusual postings.

From Harold Walters at Oklahoma State University came a set of 109 functions called xxalloc providing dynamic array manipulation in one, two and three dimensions. xxalloc includes routines for allocating, initializing, printing, renumbering and freeing both arrays of structures and arrays of simple types. An "edge-vector" approach is used for two- and three-dimensional arrays to allow for development of reusable subroutine libraries without regard to some "maximum" dimension. The package includes installation instructions, a test program to exercise most of the package, and a manual page. It has been tested on System V, BSD and MS-DOS machines and is available as Volume 20, Issue 28.

Chin Huang has written a program to automatically generate C function prototypes and variable declarations from C language source code. It differs from other similar programs in that it doesn't parse the function body.
This package needs FLEX, which is also available from the archive sites. Cproto is Volume 20, Issue 29.

For those still using curses instead of bit-mapped screens, John Lupien at AT&T has published a curses-based digital clock for VT100 and compatibles. It's a small, simple program and is Volume 20, Issue 45.

Plum-Hall has placed into the public domain a simple set of benchmarks intended to give programmers timing information about common C operations. They were designed to be short enough to type while browsing at trade shows, and are protected from overly aggressive compiler optimizations. The plumbenchmarks are Volume 20, Issue 47.

David Curry at NASA Ames Research Center has posted Index, a program to allow you to maintain multiple databases of textual information, each with a different format. For each database Index allows insertion, deletion, edits on existing entries, searches using full regular expressions, restricted searches, pattern matching and arbitrary formatting. Index is Volume 20, Issues 56 and 57.

Richard O'Rourke of Microplex Systems, Ltd. provided a pegboard program which keeps track of who is in and out of the office, and when they are due back. The program is designed for Xenix, but should work on other flavors of UNIX and is in Volume 20, Issue 76.

For those running Xenix or UNIX V3.2.1, Eric Raymond has provided an editor/minilanguage to rebind the keyboard on the console, in Volume 20, Issues 81 and 83. It is useful for Emacs users and for changing the virtual terminal selector keys.

One of the stranger programs in comp.sources.unix in that last spurt of postings was the "Reactive Keyboard". Mark James of the University of Calgary has augmented a general-purpose command line editor with predictive text generation. The program interfaces with a standard shell, allows simple editing of input lines, and will predict input lines based on previous input.
It's weird to type an edit followed by a compile and have the command processor provide the file name for the compile, and then after you edit the file again, have it predict another compile. The Reactive Keyboard is Volume 20, Issues 29-32, but it requires BSD-style ptys to work properly.

Chip Salzenberg of AT-Engineering has posted his latest version of Deliver, a program which delivers electronic mail once it has arrived at a given machine. Deliver extends inflexible E-mail delivery systems to allow complete control over mail delivery through the use of delivery files. Delivery files are shell scripts which are executed during message delivery. These scripts control which people or programs get each E-mail message. Look for Volume 20, Issues 23-27.

Pcomm, v1.2, is a UNIX telecommunications program made to look like Datastorm Technologies ProComm for MS-DOS. New in v1.2 are BSD support, auto-login scripts (using shell scripts), embedded external file transfer programs, and faster operation via I/O buffering. Emmet Gray from the US Army submitted this as Volume 20, Issues 67-75.

For those stuck with the old troff, and wanting to deal with printers other than the Wang C/A/T phototypesetter, Chris Lewis of Elegant Communications, Inc. has provided psroff. It converts the output of standard troff to PostScript, di-troff format, and makes a partial attempt at the HP-LJ family of printers. Several patches are also available to further enhance this package, which is Volume 20, Issues 33-38.

And Still More: comp.sources.misc

Although Rich Salz has been intermittent with postings, Brandon Allbery, the moderator of comp.sources.misc, has been providing plenty to write about.

In Volume 8, Issue 99, Paul Blackburn of the Open Software Foundation provided a script for keeping track of changes to files. It is useful to system administrators who need to detect unwanted or unexpected changes to files.
A UNIX make work-alike, Make v1.5, was posted as Volume 8, Issues 104-106 by Greg Yachuk of Informix Software. This make is very close to the make provided on Sun systems and runs under MS-DOS or UNIX. New features include the -k, -S and -q options, support for the $(MAKE) macro, and several bug fixes.

A 16-bit MS-DOS compress is also available. Most versions for MS-DOS cannot handle 16-bit compression tables (the default on most UNIX systems). This version can, and is based on the Compress 4.0 UNIX sources. It requires about 400K to run. Posted as Volume 9, Issue 5 by Doug Graham.

Steve Tynor has his head in the clouds to provide us with FPLAN, a flight planning program intended for use in general aviation. It reads a file consisting of departure and destination airports, navigation aids, intermediate checkpoints, fuel consumption rates and winds aloft and produces a flight plan with wind corrected heading, fuel consumption for each leg, and VOR fixes for each checkpoint (Volume 9, Issues 11-16).

Last is popi, a program to perform interactive digital image transformations. Based on the program described in the book Beyond Photography--The Digital Darkroom by Gerard J. Holzmann, this implementation by Rich Burridge consists of an interactive previewer and a digital matrix transformation system. Popi can perform transformations on arbitrary images in grey scale. A sample image is included to show how to invert the grey scale (make it a negative), frame it (crop), and solarize it (fancy signal processing). Popi includes a PostScript printing facility and modules to allow it to work for Amiga, Apollo, Atari, IBM PC, Kermit, MGR, NeWS, SunView, X11 and XView systems. The nine parts are Volume 9, Issues 47-55.

Listing 1

    #!/usr/local/bin/perl
    eval "exec /usr/local/bin/perl -S $0 $*"
        if $running_via_sh;

    # A cute Berkeley style df formatter for those running USG df
    # Do what you want with it; it's yours.
    # R. Craig Peterson, N8INO

    $fs=join(' ',@ARGV);
    open(Df, "df -t $fs |") || die "Can't run df.";

    format top =
    Filesystem kbytes used avail capacity iused ifree %iused Device
    .

    format STDOUT =
    @<<<<<<<<<<<<<< @>>>>>> @>>>>>> @>>>>> @>% @>>>>> @>>>>> @>% @<<<<<<<<<<<<<<
    $fs $kbytes $used $avail $capacity $iused $inodes $piused $name
    .

    while (<Df>) {
        if (/total blocks/) {
            ($d,$tblocks,$d,$d,$tinodes,$d) = split(' ');
            $tinodes *= 8;
            $kbytes = $tblocks / 2;
            $used = ($tblocks - $blocks) / 2;
            $avail = $blocks / 2;
            $capacity = int(100 - ($blocks / $tblocks * 100));
            $iused = $tinodes - $inodes;
            $piused = int($iused / $tinodes * 100);
            write;
            $tot_kbytes += $kbytes;
            $tot_used += $used;
            $tot_avail += $avail;
            $tot_iused += $iused;
            $tot_inodes += $inodes;
            $tot_tinodes += $tinodes;
        }
        else {
            s/$\s*/ \(/;
            s/\s*$/\)/;
            ($fs,$name,$blocks,$d,$inodes,$d) = split;
            $name =~ s![(): \t]|/dev/dsk/!!g;
        }
    }

    if ($#ARGV < 0) {
        $kbytes = $tot_kbytes;
        $used = $tot_used;
        $avail = $tot_avail;
        $capacity = int(100 - ($avail / $kbytes * 100));
        $iused = $tot_iused;
        $inodes = $tot_inodes;
        $tinodes = $tot_tinodes;
        $piused = int($iused / $tinodes * 100);
        $fs = 'Totals:';
        $name= '';
        write;
    }

Code Base 4

Darren Forcier

The author is a consulting dBase and Clipper programmer. He holds a bachelor's degree in computer science from Central New England College of Technology in Worcester, Mass. He can be contacted at 253 Main St., Cherry Valley, MA 01611 (508) 892-3351.

Code Base 4 from Sequiter Software, Inc., is a set of C routines which allow you to access dBase III+ and IV files and perform screen I/O similar to dBase III+. Code Base 4 includes support for dBase .NDX index files, Clipper .NTX files, memo fields, and networked applications. The new dBase IV .MDX (multiple index) files are not yet supported. Code Base 4 also adds some new levels of functionality with special routines for windows, menus, and memory handling.
Code Base 4 supports dBase/Clipper functionality with 12 categories of routines:

    conversion routines
    database routines
    expression evaluation
    field routines
    get routines
    memory handling routines
    index file routines
    memo routines
    menuing routines
    utility routines
    windowing routines
    extended routines

The conversion routines convert data from one format to another. For example, c4dt_dbf() converts Julian dates from the internal double representation to a string formatted CCYYMMDD (century, year, month and day).

The database routines are the main meat of the Code Base 4 package. They allow C programmers to access and store information in the well-established dBase .DBF file format and include C function equivalents of many dBase commands. Due to the more stringent nature of compiled C programming as opposed to the dBase interpretive environment, some differences exist. Table 1 compares dBase commands to their Code Base 4 function counterparts.

Expression And Indexes

With the dBase INDEX ON command, one may use any valid dBase expression to interactively build an index. For example, to create an index for an accounts receivable file based on invoices sorted by invoice date, you could use the following command:

    INDEX ON STR(YEAR(invc_date))+;
             STR(MONTH(invc_date))+STR(DAY(invc_date));
             TO INVOICE

The STR(YEAR(invc_date.... portion is an "index expression". dBase evaluates this expression from left to right at runtime, and builds an .NDX file based on the expression.

Code Base 4 uses an internal expression parser similar to dBase's. Several of the expression functions are available to advanced programmers who might wish to build an SQL-like front end to their application.

The field routines access individual fields and information pertaining to the fields (e.g., their name, length, and data type).

The get routines form the basis for data entry. The get routines arrange field data entry blocks and control field-to-field navigation within that block.
As with dBase, you can control the get field in various ways, through pictures and validation clauses. Unlike dBase, Code Base 4 also allows you to access the individual attributes of each get field, allowing each Code Base 4 field to have its own color, brightness, or other attribute setting.

Programmers with a strong memory management background can use the Code Base 4 memory handling routines to optimize their application's memory usage. Routines are included to allocate, reallocate, and free memory from the internal Code Base 4 structures.

Index files are used to access data in sorted order. With Code Base 4, either dBase .NDX files or Clipper .NTX files can be used. Like dBase and Clipper, Code Base 4 automatically updates all open index files when a record is written. However, with Code Base 4 you can use lower-level routines to access and update the index files directly.

The memo routines access dBase variable length memo fields (pointers to a separate text file containing free format text). Unfortunately, Code Base 4 doesn't provide an editor for memo fields. A small Wordstar-compatible editor window function should have been included in the package, with definable screen coordinates. Nantucket's Clipper provides a MEMOEDIT() function which takes screen coordinates as its parameters, and allows you to edit memo fields in a word-wrapping window.

The menuing routines include pulldown, popup, vertical, horizontal, and Lotus 1-2-3 style menus.

The utility routines provide some low-level functions for parsing and validating file names, sorting arrays, and locking and unlocking regions of a shared file.

The windowing routines display and manipulate regions of the I/O screen. The windowing routines form the basis for the menuing routines.
The extended routines support some of dBase III and dBase IV's extended functions, including record filtering (a subset view of a file based on some selection criteria), dBase relations (conditions which synchronize the movement of record pointers in separate files), an edit function (interactively change records), a record listing function, and an insertion function (adds blank records at any point).

Code Base 4 comes with complete source code and conditional compilation switches for Clipper .NTX compatibility, no screen management, OS/2 support, Turbo C compatibility and Microsoft Windows support. A batch file is supplied for performing customized compilations of the library.

dBase programs ported to Code Base 4 should be faster than the interpreted dBase original and smaller than a version compiled by Clipper. The Clipper compiler from Nantucket Software speeds up program execution greatly, but due to library overhead, Clipper applications have a minimum size of 150K. Complete applications are typically about 300-350K. This makes it difficult to run a Clipper program with many TSRs loaded or in a multitasking environment such as DoubleDOS. With Code Base 4, the linker links in only what it needs from the library, so the application is much smaller.

Code Base 4 assumes the user has a lot of C and dBase experience and a good command of C pointers, memory allocation, and structures. Beginning C programmers may quickly find themselves lost. The package makes heavy use of linked lists to model database, field, window, and menu structures. Code Base 4 is primarily a tool for the programmer who has mastered C and dBase and wants to transcend the limits of dBase and compiled dBase (Clipper, Quicksilver).

The wire-bound, 200-page Code Base 4 manual, while comprehensive, has no tutorial or examples section. Each chapter covers a functional area of the Code Base 4 library, and each library function is documented for that section.
Some function descriptions have small source code examples; others do not. Several sample programs are supplied on diskette. The samples are well written, and illustrate how to build menus, windows, and database applications. I liked the manual's organization but often found the function descriptions a little thin. A front-end tutorial section with plenty of examples would help this manual greatly. Technical support is available by phone.

Overall, Code Base 4 does the job for which it is intended: it brings a dBase-like programming environment to C, and allows you to share data between dBase and C programs. It does have some limitations. Using an external editor to access dBase memo fields is a little awkward.

I have worked with several other dBase C libraries. APEX ADL from Apex Software offers similar support for dBase file and index creation and access, but doesn't include any menuing, windowing, or get functions.

I would recommend Code Base 4 to any programming shop that needs to transcend the limits of dBase III+ and Clipper. Code Base 4 can address speed, memory requirements, or specialized dBase compatible applications (such as a memory resident TSR). For straightforward applications development, dBase and Clipper are still pretty powerful programming environments.

The MS-DOS version is listed at $295 and the UNIX version at $495. For more information contact Sequiter Software Inc., P.O. Box 5659, Station L, Edmonton, Alberta T6C 4G1 (403) 439-8171; FAX (403) 433-7460.
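To give a feel for what a date conversion like the c4dt_dbf() routine described earlier involves, here is a minimal C sketch that turns a day number into a CCYYMMDD string. It is not Code Base 4 source. The function name jdn_to_ccyymmdd is hypothetical, and the sketch assumes the internal value is an astronomical Julian day number held in a double; whether Code Base 4 uses exactly that representation is an assumption that the package's own source would have to confirm. The arithmetic is the well-known Fliegel and Van Flandern integer method for the Gregorian calendar.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, NOT part of Code Base 4.
 * Assumption: jdn is a Julian day number (Gregorian calendar).
 * Writes an 8-character CCYYMMDD string plus terminator into out.
 */
void jdn_to_ccyymmdd(double jdn, char out[9])
{
    long l, n, i, j, day, month, year;

    assert(out != NULL);

    /* Fliegel / Van Flandern integer algorithm; all divisions truncate. */
    l = (long)jdn + 68569L;
    n = 4L * l / 146097L;
    l = l - (146097L * n + 3L) / 4L;
    i = 4000L * (l + 1L) / 1461001L;
    l = l - 1461L * i / 4L + 31L;
    j = 80L * l / 2447L;
    day = l - 2447L * j / 80L;
    l = j / 11L;
    month = j + 2L - 12L * l;
    year = 100L * (n - 49L) + i + l;

    /* snprintf guards the 9-byte buffer against out-of-range years. */
    snprintf(out, 9, "%04ld%02ld%02ld", year, month, day);
}
```

With a nine-byte buffer buf, calling jdn_to_ccyymmdd(2451545.0, buf) leaves "20000101" in buf. A string in this form sorts in date order under a plain character comparison, which is also why the index expression shown earlier concatenates year, month and day in that order.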
Table 1  Dbase Commands & Their Codebase4 Equivalents

Dbase                Codebase4
Command/Function     Function        Description
--------------------------------------------------------------
go bottom            d4bottom()      Go to last record
go top               d4top()         Go to first record
use                  d4close()       Close individual .DBF file
close all            d4close_all()   Close all open database files
create               d4create()      Create a new .DBF File
delete               d4delete()      Mark a record for deletion
deleted()            d4deleted()     Return TRUE if record deleted
go <record#>         d4go()          Go to a specific record #
rlock()/flock()      d4lock()        Lock a portion or all of file
pack                 d4pack()        Remove deleted records
recall               d4recall()      Undelete marked records
reccount()           d4reccount()    # of records in file
recno()              d4recno()       Current record number
seek "key-val"       d4seek()        Look up record in index key
select               d4select()      Make database area current
skip <#records>      d4skip()        Move record pointer #records
unlock               d4unlock()      Unlock File/Record
use <Filename>       d4use()         Open a database file
replace              d4write()       Update database record
append blank         d4write(0)      Append a blank record
zap                  d4zap()         Delete all records & pack

Listing 1  test.c

    #include <stdio.h>
    #include <d4base.h>

    #define SAFETY_ON  1  /* d4create will return -1 if database already exists */
    #define SAFETY_OFF 0  /* d4create will overwrite database if it exists */

    /* Declare Field Structure for database */
    static FIELD FIELDS[] =
    {
        /* Field Name,  Type, Width, Dec, Offset */
        {"FIRST_NAME",  'C',  25,    0,   0 },  /* Char 25 */
        {"LAST_NAME",   'C',  25,    0,   0 },  /* Char 25 */
        {"COMPANY",     'C',  30,    0,   0 },  /* Char 30 */
        {"TELEPHONE",   'C',  12,    0,   0 },  /* Char 12 */
        {"LAST_SALE",   'N',  12,    2,   0 },  /* Numeric */
        {"LAST_DATE",   'D',   8,    0,   0 },  /* Date    */
        {"GOOD_CUST",   'L',   1,    0,   0 }   /* Logical */
    } ;

    int create_name( void );  /* Prototype for CUSTOMER.DBF creation function */

    int main( void )
    {
        int rc;  /* Return Code */

        rc = create_name();
        printf("\n rc return code was %d",rc);
        return 0;
    }

    /* Create CUSTOMER.DBF with the seven fields declared above */
    int create_name( void )
    {
        int rc;  /* Return Code */

        rc = d4create("CUSTOMER.DBF",7,FIELDS,SAFETY_OFF);
        return rc;
    }

Advanced C: Tips And Techniques

Randy Hohl

Randy Hohl is a consultant with Interactive Systems Corporation. He has worked as an industry programmer for six years and has bachelor's degrees in computer engineering and psychology. He can be reached at (708) 505-9100 or randy@i88.isc.com.

Advanced C: Tips and Techniques is the third installment in a series of four books on C Programming in the Hayden Books C Library. The first two books present the fundamentals of C. Advanced C emphasizes portability, compiler code-generation, and execution speed. The book is authored by the team of Paul and Gail Anderson. The authors discuss some of the more difficult components of C using a variety of clever programming techniques. The book is organized into six chapters and five appendices. Chapter one serves as a C refresher; the remaining chapters each cover a set of related topics. Every chapter concludes with a number of programming exercises. Each appendix details the features of a particular C compiler. Chapter one introduces some portable techniques for swapping variable contents, ASCII-to-integer and decimal-to-hex conversions, and bit-level operations. 
The techniques recur throughout the book and are presented as program examples employing an advanced usage of unsigned variables, unions, casts and macros. This chapter sets the theme for the authors' low-level approach to writing efficient and portable C; some of the examples are simply assembly-language tricks converted to C. Chapter two provides a comprehensive explanation of the C runtime environment. The compiler's distribution of program statements and variables into the text area, data area, the stack, and the heap is examined. This chapter answers questions like: "Where does a literal string live?" and "Is it faster to initialize static or automatic variables?" Examples are provided which demonstrate the compiler-mapping of program variables by printing the hex address of variables at runtime. I bought the book for this chapter and was thoroughly pleased. A complete source-code solution to the fragmentation of heap memory resulting from numerous runtime allocations is developed in chapter six. You may not find a need for arrays of two or three dimensions very often, but if you do, chapter three, the longest in the book, is probably the most complete treatment of the subject available. Portions of this chapter read like formula derivations and proofs in a math text. This method is used to derive equivalent pointer expressions for multi-dimensional array references. The derivations are used to demonstrate the storage map equations used by a C compiler to evaluate array references. Other compiler formulae are also presented. I found this chapter to be rather complicated, but still valuable. The authors show two methods to increase the speed of array referencing. Method one uses compact pointer expressions, such as *ptr++, rather than sequential pointer offsets, such as array[offset]; offset++. The argument here is that a compact pointer expression maps directly to a single assembler instruction. 
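As a rough illustration of method one (not taken from the book), the sketch below sums an array twice: once with a separately maintained offset and once with the compact *ptr++ form. The function names sum_indexed and sum_pointer are hypothetical. Whether the pointer form is actually faster depends on the compiler and machine; many current optimizing compilers may generate the same code for both, so any difference should be measured rather than assumed.

```c
#include <stddef.h>

/* Indexed form: the offset is kept and incremented separately
   from the array access. */
long sum_indexed(const int *array, size_t n)
{
    long total = 0;
    size_t offset = 0;

    while (offset < n) {
        total += array[offset];
        offset++;
    }
    return total;
}

/* Compact pointer form: *ptr++ reads the element and advances
   the pointer in a single expression. */
long sum_pointer(const int *array, size_t n)
{
    long total = 0;
    const int *ptr = array;
    const int *end = array + n;   /* one past the last element */

    while (ptr < end)
        total += *ptr++;
    return total;
}
```

Both functions return the same result for any array; only the way the elements are stepped through differs.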
Method two uses pointer array declarations rather than multi-dimensional arrays. The methods are incorporated into a suite of ten benchmark programs which the authors have timed on 286, 386 and 68020 machines. The results show an average performance improvement of 27 percent using the optimized methods across the three architectures. Ever been faced with a C declaration like char *(*(*buf[20]) ())[10]? Complex declarations are succinctly deciphered in chapter four. The authors use a presentation mode developed in the preceding chapter, namely the repeated application of a rule, in this case the "Right-Left" rule. The Right-Left rule is simple and works nicely for both creating and reading complex declarations. This chapter also explains how to use varargs.h to create portable routines that accept a variable number of arguments. Debugging techniques in C are the topic of chapter five. The authors develop six categories of custom debugging tools. Each category is built on existing utilities, including the C preprocessor, the assert() library routine, and the UNIX signal() system call. Some of the tools have the advantage of providing a variable degree of error-checking without the need to re-compile the original source, a big advantage in large systems. Where significant compile- or run-time overhead is required to implement a custom technique, the authors are quite frank about the tradeoffs involved. The value of some of these techniques took a while to sink in. The appendices of the book are a well-written description of the AT&T C compiler and four different compilers targeted for Intel processors. All the options and all the memory models provided by each compiler are described. Chapter two's discussion of the C runtime environment is prerequisite knowledge for understanding the memory models. New constructs from the proposed ANSI C Standard are discussed for the compilers that support them. 
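To make the Right-Left rule from chapter four concrete, here is a small sketch (my own, not from the book) that reads the declaration quoted above outward from the name: buf is an array of 20 pointers to functions returning a pointer to an array of 10 pointers to char. The typedef names name_row and row_func, the function get_names, and the helper second_name are hypothetical, and the sketch writes (void) where the original declaration has an empty parameter list.

```c
/* The declaration from the text, with (void) in place of (). */
char *(*(*buf[20])(void))[10];

/* The same type built up in steps, reading outward from the name. */
typedef char *name_row[10];           /* array of 10 pointers to char   */
typedef name_row *(*row_func)(void);  /* pointer to a function returning
                                         a pointer to such an array     */
row_func buf_by_steps[20];            /* array of 20 of those pointers  */

/* Hypothetical function whose type matches one element of buf. */
static name_row *get_names(void)
{
    static name_row names = { "alpha", "beta" };
    return &names;
}

/* Store the function in both arrays and fetch a string through one. */
const char *second_name(void)
{
    buf[0] = get_names;
    buf_by_steps[0] = buf[0];   /* accepted because the types match */
    return (*buf_by_steps[0]())[1];
}
```

The assignment between the two arrays compiles without a cast, which is a quick way to check that a step-by-step reading of a declaration agrees with the original.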
The book includes an order form for a set of two floppies, one containing the C source code for all the program examples, the other containing solutions to all the exercises. The price is $39.95. I examined the contents of both disks and was impressed. The examples disk contains over 150 .c files; each file contains a stand-alone C program drawn from the book. For book examples which are only code fragments, the fragment is expanded into a complete program. Nearly all examples make judicious use of printf statements to illustrate the subject matter. The solutions disk contains over 70 .c files, each an individual program. The solutions are complete and contain ample explanatory notes as C comments. Two of the chapter three solutions are timed on the 286. Conclusions I have mainly bravos for this book. Each topic is covered from both the compile-time and runtime perspectives. The authors incorporate portability and efficiency throughout, using the proper features of C. The presentation is well-paced and properly organized. All major points are illustrated with appropriate-length program examples. Some program examples are reused and enhanced in the light of the current topic. Advanced usage of some features of C, such as macro definitions with arguments, typedef variable types and compact conditional operator expressions, is shown implicitly in the program examples. Proven techniques, such as compact pointer expressions, are used in subsequent programs. The use of customized standard header files is especially creative. Each chapter can be digested as a single entity, so the reader can take the chapters in any order. The content of the exercise questions follows logically from each chapter's points; for instance, some of the chapter three exercises ask you to extend program examples to arrays of four dimensions. The book's preface lists the eight combinations of five machines and seven C compilers in which all program examples were compiled and executed. 
The execution results of program examples in specific combinations are a regular part of the narrative. I was able to successfully compile select program examples and exercise solutions on an Amdahl 580 under UTS and an AT&T PC 6300 under MS-DOS. Chapter two's discussion of the runtime environment is valuable for experienced programmers who are learning C; it may aid in preventing dangling pointer bugs. The remainder of the book is primarily useful for experienced C programmers, particularly those who are beyond "make it work" and are into the "make it clean" and "make it fast" stages. A working knowledge of the routines in a standard C library and their interfaces is also required. Proficiency in binary arithmetic is needed to work through the bit-level operations. I would also recommend a familiarity with UNIX shell commands and the UNIX System V system call, signal(). (The authors neglect to identify the signal() in the book as the System V, rather than Berkeley, version.) I would have preferred some header file examples from an operating system library other than UNIX. Despite this bent toward the UNIX/XENIX family, the tips and techniques presented in the book are beneficial across operating systems. Advanced C: Tips and Techniques Paul and Gail Anderson Howard W. Sams & Co. 1988 $24.95, 446 pages. Publisher's Forum In this issue we again present -- as part of our continuing public service program -- the winners of the International Obfuscated C Code contest (see Don Libes' Column). As always, the winning entries are perversely clever and entertaining -- and frightfully effective demonstrations that "clever and entertaining" don't build understandability. Lest there be any misunderstanding, we do not encourage these obscure programming practices. We publicize the results only as a public service -- to give this dangerous form of expression a safe outlet. "Please, please, boys and girls, don't attempt this trick at home. 
Remember these are highly trained professionals." (We hope.) I suspect an eastern mystic would take satisfaction from the pre-processor's critical role in many of the winning entries. When used simply and directly, the pre-processor contributes as much to understandability and maintainability as any language component. But when purposely exploited -- when the pre-processor's latent powers are stretched to achieve non-obvious ends -- understandability and maintainability are seriously damaged. "Too much of a good thing" and all that. We'll continue our public service campaign in the next issue by announcing the winners of our Bad C Pun Contest. Like the obfuscated code contest, the bad pun contest is intended as a "safety valve". Through this contest we were able to capture and destroy several hundred distressingly bad C puns, hopefully removing them from circulation, at least for a while. An independent judge selected the most nearly humorous groaners for prizes. We saved only the winners, and will share those with you in the next issue. But, please don't expect much -- remember this was a "bad" pun contest, and bad they were. Sincerely yours, Robert Ward Editor/Publisher New Products Industry-Related News & Announcements Expert Analyzes Code Quality Conley Computing is shipping Codecheck, a rule based expert system that checks C and C++ source code for maintainability, portability and compliance with in-house style guidelines. Codecheck evaluates the portability of C to various dialects, including ANSI, K & R, Harbison and Steele and C++. Codecheck has been designed to target code for compatibility between PC-DOS, OS/2, Macintosh, UNIX and VMS. Codecheck also provides a statistical analysis of code complexity and style. Versions are available for MS-DOS, Macintosh, OS/2, AIX, PC/IX, and QNX. Priced from $495. Contact Conley Computing Corp., 7033 S.W. Macadam Ave., Portland, OR 97219. (503) 244-5253; FAX (503) 244-8375. 
ProtoView Development Tool Produces Windows Code ProtoView Development Corporation has released an application development tool for C programmers working in the Microsoft Windows environment. The ProtoView Screen Management Facility includes a WYSIWYG screen painter and code generator that produces source code, header, resource, definition and make files for a complete, executable version of each screen. ProtoView contains a dynamic link library of over 150 C language functions and a second dynamic link library that contains nine types of editing controls. With a single function call, table-driven, pop-up windows can be incorporated into an application, including calculator-style data input, date, money, string, real and security field objects. All fields can be edited. Fields can be mandatory, protected, alphabetic, numeric, uppercase, range checked, choice checked and even looked up in a table. ProtoView works with any database or communications package that can be linked under Windows. The package includes a 350-page manual, source code for all field controls, and the source for building the dynamic link library of controls. ProtoView applications are SAA/CUA compatible and carry no runtime or licensing fees. ProtoView requires the Microsoft Windows SDK, Microsoft Windows 286 or 386 and Microsoft C v5.0 or later. Price $595. Demo versions for $15. Contact ProtoView Development Co., 162 Kingdom Ave., New York, NY 10312 (718) 948-5195. Panel Plus License Revised Roundhill Computer Systems Limited have announced a new licensing policy for their PANEL Plus II screen manager and screen library package. The new license will include source code for the screen design editor and code generators. Pricing for multi-user systems will be based upon the number of programmers instead of the system architecture. Single user MS-DOS versions remain $495. Contact Steve Hersee (USA) at (708) 690-3737, FAX (708) 665-9841 or Tim Frost (UK) at 0672-84535; FAX 0672-84525. 
C++ Class Library Supports Matrices The M++ Matrix Class Library from Ansys Software Co., Inc., allows C++ users to declare dynamic matrices or arrays. M++ matrix objects support the direct manipulation of arrays, matrices or other groups of data, much as do symbolic matrix languages like Matlab, GAUSS or APL, but retain all the portability and speed advantages of C++. M++ implements generalized submatrices, allowing a programmer to write and manipulate general subsets of a given matrix or array, a feature particularly useful to those developing systems utilizing time-series data. Submatrices allow the programmer to view and manipulate data in alternate ways without physically reordering, restoring or rereading the data from disk. The class library handles and isolates memory allocation and provides compile-time selectable bounds checking, aiding in the debugging of complex programs. The M++ library provides int, float, double, complex, submatrix, index and decomposition matrix classes which can be extended, modified or limited via inheritance. Also, full C++ source code is available to support direct customization. The M++ Matrix Class Library is available for MS-DOS, OS/2 and UNIX C++ compilers conforming to C++ versions 1.2 or 2.0. Prices start at $195. Contact Ansys Software Co., Inc., 16950 151st Ave. SE, Renton, WA 98058 (800) 366-1573 or (206) 228-3170. CASE Tool Adapted To Nets Syscorp International has released MicroSTEP 1.4, a network version of its STEP CASE tool. MicroSTEP produces networkable applications for Novell, IBM token ring and Netbios networks. MicroSTEP's mouse-driven, graphic specification environment includes integrated design tools to build data flow diagrams, specify data structures, lay out screens, format reports, and describe an application's computations and logic. The $6000 system (training included) generates C. Contact Syscorp International, 9420 Research Blvd., Suite 200, Austin, TX 78759 (800) 727-7837. 
LALR Updates Parser Generator LALR Research has released LALR v4.0, featuring: extended BNF with regular expressions, operator precedence, and tree building notation; derivation tracing of grammar conflicts; automatic symbol-table and abstract-syntax-tree construction; smaller parser tables and faster parsing times; and fast generated scanners. Contact LALR Research, 1892 Burnt Mill Rd., Tustin, CA 92680. (716) 832-2274. Expert Packaged As C Function Hy-Phen-Ex, a hyphenation expert packaged as a C function, is now available from GeoMaker Software. Hy-Phen-Ex applies over 4800 rules to (American) English text to identify places where a word may correctly be divided. Hy-Phen-Ex will even rank alternatives if a word has more than one acceptable dividing point. The price is $89. Contact GeoMaker Software, P.O. Box 273124, Concord, CA 94527 (415) 680-1964. Borland Opens Paradox Engine Borland International has opened the Paradox Architecture to C programmers with a new C library product, the Paradox Engine, that enables programmers to build applications that create and access Paradox data. The Paradox Engine API includes more than 70 functions (for single and multi-user environments) to: create, read and write Paradox tables, records and fields; support multi-user concurrency control; access tables sequentially or via indexes; and handle security tasks. The $495 package is expected to ship during first quarter and will be available for $195 during a 90 day introductory period. Contact Borland at 1800 Green Hills Rd., Scotts Valley, CA 95066. Case Tool Supports Serial Terminals CASET Corporation has released a new version of its Software Engineering Toolkit (SET) that supports multiple, overlapping windows, buttons and dynamic menus for color or monochrome serial devices. This release supports DEC VT and Tektronix type terminals on Apollo, Digital, Hewlett-Packard, Silicon Graphics, Sony and SUN CPUs, running Aegis, Ultrix, UNIX, or VMS. 
SET supports prototyping, development and management of the user interface and dialog portion of an application. Version 3.6 supports graphical interactive user interface development, including complete window (stack, pop, move, resize, copy, scroll, and delete) and dialog management, interface layout, and code generation facilities on serial terminals. Form support includes scrolling lists, buttons, toggles, and input type and range validation. User interactions are managed through a context sensitive command interface which can include a command line, pop-up, pull-out, or static menus, buttons, forms, prompts, input validation, hierarchical help text, and a configurable keyboard. An optional 2D/3D graphics system integrates into the windowing system with zoom and pan functions managed by SET. All color, graphics, and text features supported by the terminal can be accessed via SET. SET generates C or Fortran code. Price $925. Contact CASET Corporation, 33751 Connemara Drive, P.O. Box 939, San Juan Capistrano, CA 92693 (714) 496-8670; FAX (714) 661-5463. Abraxas Releases More Toolkits Abraxas Software is now shipping toolkits that allow COBOL, ADA, and FORTRAN to be embedded into applications. The toolkits run in conjunction with Abraxas Software's existing PCYACC and MACYACC products. Contact Abraxas Software, 7033 S.W. Macadam Ave., Portland, OR 97219 (503) 244-5253; FAX (503) 244-8375. Peritus Releases C++ Compiler Peritus International has released an ANSI C and C++ compiler for 386/486 UNIX systems. The compiler offers switch-selectable support of K&R and ANSI dialects as well as C++. Peritus C++ is implemented as a "true" compiler, rather than as a pre-processor pass. Available code optimizations include: global register allocation, constant propagation and folding, backward code motion with loop invariant removal, induction variable elimination, redundant store and dead code removal, and constant elevation. 
Peritus has recently licensed its compiler technology to Amdahl Corporation for use with Amdahl's UTS operating system. Peritus technology has previously been selected by Apple, Control Data Corporation and Concurrent Computer Corporation for use in proprietary compilers. The Peritus C++ Compiler is currently available for 386/486 systems under SVR3 UNIX and SunOS 4.0 UNIX for $1000. Contact Peritus International, 10201 Torre Ave., Suite 295, Cupertino, CA 95014 (408) 725-0882. Avocet Integrates Embedded Tools Avocet Systems, Inc., has announced AvCase, an integrated development environment for embedded systems. AvCase includes an editor, C compiler, assembler, linker, and simulator/source level debugger. AvCase is scheduled to ship in February 1990. It runs on a PC-clone and requires no special hardware. The first release will target the Intel 8051 family. Later releases will target 68HC11, 6801, Z80 and 68000 products. Price for the entire package is $1895. Modules are also available separately. Contact Avocet Systems, Inc., 120 Union St., P.O. Box 490, Rockport, ME 04856 (800) 448-8500; FAX (207) 236-6713. Computer Innovations Upgrades QNX Compiler Computer Innovations, Inc., has released a major upgrade of their compiler for the QNX operating system. C86 v3.10 for QNX now includes dynamic linked libraries, a source-level execution profiler, a quick make utility, an intelligent diff file comparator, and a strip executable field minimizer. This release also improves compiler efficiency, source level debugging facilities, documentation, and the libraries. Dynamic linked libraries allow users to build programs that are smaller, require less memory and load faster. Users can build their own dynamic libraries as well as use the libraries supplied with C86. The source level profiler tracks source level constructs, including files, modules, functions and lines. Contact Computer Innovations Sales, 980 Shrewsbury Ave., Tinton Falls, NJ 07724 (201) 542-5920. 
Microsoft Ships OS/2 v2 SDK Version 2.0 of the Microsoft OS/2 Software Development Kit (SDK) with Presentation Manager is now being shipped. Though developed as a joint IBM and Microsoft product, the pre-release is available through Microsoft. The $2,600 kit may be ordered directly from Microsoft by calling (800) 227-4679. CSL Ported To SCO UNIX CSL, a scientific programming library, is now available for SCO UNIX System V/386. CSL includes linkable modules for linear algebra, eigensystems, matrix computations, time series, smoothing, filtering and prediction, statistics, regression, linear and integer programming, optimization, differential equations, interpolation and curve fitting, and solutions for nonlinear equations. Licensing options include single end-users, multi-users, professional developers and site. Prices start at $295. Contact Eigenware Technologies, 13090 La Vista Drive, Saratoga CA 95070 (408) 867-1184. Planned Lattice Compiler Packages Will Include Dos Extender Lattice, Inc., plans to release its new 80286 and 80386 C Development Systems for MS-DOS and OS/2 on March 1 and April 1 respectively. According to Dave Schmitt, Lattice president, "These new C compiler packages will feature a complete programming environment, including a DOS extender, compiler, assembler, debugger, editor, global optimizer, programming utilities, and nearly 800 library functions." The Lattice 80286 C Development System runs under MS-DOS, Extended DOS, or OS/2 to create a single executable program which can run under MS-DOS, Extended DOS, or OS/2. The package includes a royalty-free DOS Extender which developers may include with their software at no charge. Lattice's DOS Extender allows users to run programs of up to nearly 16 megabytes. An included configuration optimizer performance tunes the DOS Extender. Lattice's 80386 C Development System takes advantage of the 80386 and 80486's 32-bit processing both for the compiler and generated programs. 
The system runs under Lattice's 80386 Extended DOS, PharLap's Extended DOS, or OS/2 v2 and generates programs which run under Extended DOS or OS/2 v2. The compiler is upwardly compatible so it will accept source code written for MS-DOS or OS/2 v1. These new compilers introduce Lattice's "Extended Family Mode." In this mode, Schmitt explains, "a single program can run under either Extended DOS or OS/2. Software developers only need to maintain one program even though their customers use the program under several different operating systems." The 80286 C Development System is priced at $495; the 80386 version at $900. Contact Lattice, Inc., 2500 South Highland Ave., Lombard, IL 60148 (708) 916-1600; FAX (708) 916-1190. Library Processes 'Live Video' Victor, a new C library from Catenary Systems, supports image processing applications. The package operates on gray scale images from any source. Victor includes image processing functions like sharpening filters, outline, linearization, and matrix convolution. Among the video digitizer support functions are functions to display 'live video' on a VGA adapter at rates varying from two to 15 frames per second. Victor also includes several resize functions, enabling applications to resize images directly to a VGA, to the digitizer display, or to an image buffer. Prices begin at $195. Contact Catenary Systems, 470 Belleview, St. Louis, MO 63119 (314) 962-7833. Debugger Works With Archimedes A new version of Softaid, Inc.'s source level debugger now supports the Archimedes v 3.0 C compiler. The debugger also supports Softaid's line of in-circuit emulators. Price $795. Contact Softaid, Inc., 8930 Route 108, Columbia, MD 2104 (800) 433-48812. Tool Directs Methodology Silico-Magnetic Intelligence has introduced Better-C, a coding methodology manager and program generator. According to the vendor, Better-C was developed with the assistance and cooperation of C, structured programming, and AI experts. 
The Better-C methodology incorporates complexity management, natural language naming, top-down design, and object-orientation. Objects in Better-C are created by an "open" and referenced via a handle, much as are C files. Objects may be arbitrarily complex structures, such as trees, lists, databases, or windows. Better-C is compatible with all major compilers and runs on a PC-clone under MS-DOS v2.0 or better. Price $98. Contact Silico-Magnetic Intelligence, 24 Jean Lane, Chestnut Ridge, NY 10952 (914) 426-2610. Mi-Shell Sports Own Debugger OPENetwork is set to release Mi-Shell, a configurable MS-DOS shell, on April 15, 1990. Mi-Shell's "point and shoot" interface is accompanied by a FORTH-like script language (complete with debugger) which allows users to define the display and actions to be executed when a key is pressed in a certain environment. Price $89. Contact OPENetwork, 215 Berkeley Place, Brooklyn, NY 11217 (718) 638-2240. JDYX Releases UNIX Graphics JDYX Enterprises is now shipping v3.0 of their 80386 UNIX Graphics Library, a source code library supporting EGA/VGA/SVGA graphics on 80386 UNIX systems including Interactive 386/ix, AT&T System V/386, and Xenix 386 v2.3. The JDyx library supports twelve video modes -- through 800 x 600 x 16 and 360 x 480 x 256 on all VGA cards and through 640 x 400 x 256 on cards with the Paradise chip-set. The routines support concurrent graphics applications on different virtual terminals. These routines do not use the BIOS or Xenix CGI interface, but directly access the video card. Primitives such as point, line, solid, bitblt, ellipse and clipping are supported, and all sixteen color routines have 12 different ALU operations. A bus mouse software cursor is also implemented. The library is designed so that one binary can run on different adapters as well as in different video modes. Source licenses are $199, binary licenses $99. Contact JDyx Enterprises, 907 Tuxworth Circle, Decatur, GA 30033 (404) 320-7624. 
Oregon C++ Now On Tower NCR Corporation, Europe Group, and Oregon Software, Inc., have signed an agreement to port Oregon C++ to the NCR Tower 32 Series. NCR will refer customers for the $3000 package to Oregon and its distributors. Contact Oregon Software, 6915 S.W. Macadam Ave., Suite 200, Portland, OR 97219 (503) 245-2202; FAX (503) 245-8449. Solution Systems Releases Brief 3.0 Solution Systems has released Brief v3.0, which includes a C-like macro language and a translator to convert macros from the original LISP-like syntax. Registered owners of Brief v2.1 can update for $70 plus shipping. Contact Solution Systems, 541 Main Street, Suite 410, So. Weymouth, MA 02190 (800) 821-2492. New Releases CUG307 ADU & COMX (Device Driver) Submitted by Alex Cameron (Australia), ADU is a disk utility program designed to work with both the IBM PC standard and non-PC disk formats. By choosing an option from the main menu, you can analyze the disk format, then read and write the contents of the disk, sector by sector. The menu is also user-configurable so that the disk parameters can be adapted to almost any conceivable disk format. The initial alien disk parameters are derived by scanning the disk and building up a disk_base table, which may then be modified by the user. The disk includes C source code and well-written documentation revealing the low-level detail of the PC's disk drive configuration, not available anywhere else. The program is compiled under Turbo C v2.0 or v1.5. No assembly is required. Submitted by Hugh Daschbach (CA), COMX, an MS-DOS communication port device driver, is an answer to a question posed by Jose Alfonso Corominas (Question & Answers, CUJ November 1989, page 52). COMX provides buffered I/O to a serial port with optional XON/XOFF flow control through standard read/write requests or interrupt 0x14. The program uses mixed memory models. 
COMX.C is compiled under the small model with explicitly declared far pointers, and a front end program forces the linkage editor to produce a tiny model executable. This program is specifically written for Microsoft C (v5.0 or later) and some assembly code comes with the C source code. CUG308 MSU, REMZ & LIST Dinghuei Ho (WA) has submitted MSU, an educational simulation of simple computer organization and operation. MSU can simulate a computer that has a 4K word memory space (each word is 32 bits), a CPU that includes four segment origin registers (code segment, input segment, output segment, and workspace segment), instruction register, program status register, a card reader and line printer for input/output, and a clock. Using merely 10 basic instructions, you can operate this computer and derive output. The program runs under VMS on the DEC VAX 8820, but you can port it to other environments by modifying the code. Bob Briggs (CA) has submitted REMZ, the classic Parks-McClellan-Remez FIR filter design program based on the FORTRAN version appearing in Theory and Application of Digital Signal Processing by Rabiner & Gold (Prentice Hall). The program compiles under Turbo C or Quick C. Michael Kelly (MA) has submitted LIST, an object-oriented implementation of a linked list using C. LIST is able to imitate C++ notation (address_list.sort()) by defining a general structure whose fields are pointers to functions, each corresponding to one of the operations of an object. CUG309 6809 C Compiler for MS-DOS Brian Brown (New Zealand) has ported CUG221 6809 C for FLEX to MS-DOS. Modifications allow the program to run with ASxxxx assembler (CUG292), as well as with Motorola AS9 assembler. The program also generates ROMmable code. 
The disk includes a complete set of C source code, well-written documentation, and a run-time library such as routines for controlling the ACIA serial port, functions for character handling and data conversion between character strings and integers, routines for controlling a Hercules card, routines for a magnetic card reader, memory manipulation routines, PC serial card functions, and string handling functions. We Have Mail Dear Mr. Ward, I felt that some of the points raised by Phil Cogar in the letter published in the Jan. '90 issue deserved a response, although I don't know whether such a response fits within the subject range of your magazine. Evidently, there is a market for language translation tools, at least three of which are advertised in this issue at relatively reasonable prices. I hope to get an opportunity to try some of them myself. Rex Jaeschke has pointed out that C may not be the most cost-effective language for development or maintenance of software which fits the design of other languages, such as Fortran. Assuming that one has decided to use software developed in another language as part of a system written in C, several courses of action are available. These might include translating once and discarding the original, modifying the original to the extent necessary to maintain satisfactory parallel versions in two languages, or building a system in more than one source language. Any combination of these methods might be valid, and either of the first two would benefit from translation tools. Considering the difficulties of translation and the undesirable practices found in most freely available code, no translator can pretend to be able to generate bug-free code. It is usually easier to compile the code in the original language and verify operation with a few test cases, and maybe even clean it up and retest it before translating. 
One of the problems with a multi-language system is that the interfaces between languages are not always satisfactory, never covered by any nonproprietary standard, and unlikely to be subject to any of the usual safety checks, such as lint. This often leads to errors, like my forgetting that I was writing about a matrix set up by C rather than one set up by Fortran. A compromise which often works well is to choose one language as the primary one, with only low-level functions with simple calling interfaces written in the other language. This may be no more than a minor extension of the way the language system is actually written, as in the case of a Fortran runtime library much of which is written in C. On many modern systems, the C compiler has received the most attention of the various languages, and it may generate more efficient code. In particular, the amount of code required to set up loops seems to be consistently less in C, and operations such as those required in searches and sorts are unlikely to be optimized in early versions of compilers for other languages. For some examples which do not particularly favor C, we may look at some old Fortran coded problems. On the Multiflow Trace computer, 22 of the 24 Livermore loops run up to 10% faster in C than in Fortran, with the other two running much faster in Fortran only through the use of compiler directives (pragma) or in-line compilation of math functions. The Sun 4.0 C compiler is able to compile linear searches through floating point arrays with code which runs 40% faster than under their optional Fortran. For those who wish to know what axes I am trying to grind, I am on the verge of embarking on a project to support the SLATEC mathematical library in C and Fortran in a way which should suit the needs of those who need source code at a fair price to run on a variety of platforms. If we don't get approval from the owners of the rights, we will be looking for alternatives. 
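The matrix confusion mentioned above usually comes down to storage order: C stores a two-dimensional array row by row, while Fortran stores it column by column. The sketch below is only an illustration under that assumption; the function names c_offset, fortran_offset, and to_column_major are hypothetical, and the indices are zero-based on both sides for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a C-style (row-major) rows x cols array. */
size_t c_offset(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return i * cols + j;
}

/* Offset of the same element in a Fortran-style (column-major) array. */
size_t fortran_offset(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return j * rows + i;
}

/* Copy a row-major matrix into column-major order, so that a Fortran
 * routine receiving f_mat sees the same logical matrix. */
void to_column_major(const double *c_mat, double *f_mat,
                     size_t rows, size_t cols)
{
    size_t i, j;

    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            f_mat[fortran_offset(i, j, rows, cols)] =
                c_mat[c_offset(i, j, rows, cols)];
}
```

Passing a C matrix to a Fortran routine without such a copy amounts to handing it the transpose, which is sometimes acceptable if the caller swaps the dimensions accordingly.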
I am working also on a series of hands-on learning seminars which will likely be presented in 6 hour increments, beginning with application performance tuning for pipelined architectures in C and Fortran, and UNIX familiarization for Fortran programmers. All in addition to my job in aerodynamics design and computation. Tim Prince 39 Harbor Hill Grosse Pte Fms, MI 48236 Just for your peace of mind, you are not alone. Researchers in the Advanced Computational Resources Lab (I think I have the name almost right) at Argonne National Labs are also interested in persuading scientists to develop numerical applications in C -- in part because the most advanced parallel hosts are first programmable in C. You might find their book Portable Programs For Parallel Processors interesting (Boyle, Butler, et al.). --rlw Dear Mr. Ward, I have my problems with recommending The Awk Programming Language by Aho, Kernighan and Weinberger. It is an excellent reference to Awk, but is confusing when one is working with the older versions (older than System V v3.1). It brought tears of frustration until I happened to do a tail /usr/bin/awk | od -c and came up with "(Berkeley) 9/16/83.". Clearly, an early version. There is a very good simpler explanation of awk in the chapter "The Awk Power Play" in UNIX Papers for System Developers and Power Users by the Waite Group, Howard W. Sams & Co. Learning regular expressions at the awk level is best as the regular expressions of "sed, grep, egrep, and fgrep" are subsets of this. One cannot be sure exactly what Dr. Whitaker is trying to do, but I have found that awk is ideal for extracting ASCII information from tables and that little language is all that he might need. UNIX tools are in a sense all "little languages" and this can explain their lack of coverage in the literature. I doubt if an author could convince his publisher that it would pay to cover these. 
So one must find information wherever he can: in appendices, mixed in with other coverage, and "between the lines." I have found that Howard W. Sams and The Waite Group are excellent in their coverage of "UNIX-oriented tools". Advanced UNIX -- A Programmer's Guide has an appendix on UNIX tools; other recommended titles, UNIX System V Bible, Tricks of the UNIX Masters, and The UNIX Shell Programming Language, give innumerable examples. Personally, I enjoy using UNIX tools in developing applications and turn to the C language to do whatever I cannot accomplish otherwise. In other words, I use other people's ideas before reinventing the wheel. However, if somebody can tell me how to document such a mixture, I would appreciate it. Yours sincerely, Alan E. Ternstrom 5321 Perkins Rd. #122 Oxnard, CA 93033 The UNIX tools are undeniably neat; I just developed a full-function mailing list package for my wife in about 30 lines of shell script (with about six four- or five-line awk scripts). Unfortunately, documentation is only half the problem. We're finding that since programmers must master six or seven fairly complex and independent syntaxes, developing skilled maintenance programmers for mixed tool applications is especially difficult -- the increased "bump" at the beginning of the learning curve can easily frustrate a newcomer. Anyone have some suggestions? --rlw Dear Editor: I would like to ask you to look at the opening two paragraphs of the Doctor C's Pointers in the February issue. I like the first paragraph. It matches my experience: Try a few things to see if the concept will work and then, because of lack of time, build the rest of the program in a stepwise fashion. It prepares me for what I would consider an excellent article: How to pull hard coded definitions out of a program into headers. The second paragraph is in total opposition: "Headers must be done before any code is written." 
In fact, the article is rather good in its information while having almost nothing to do with either paragraph. It certainly does not assist people working per paragraph one, while not being as rigid as paragraph two. I think the editor needed to do some editing here. The article "Tools For MS-DOS Directory Navigation" by Leor Zolman contains at least one serious error and a couple of misapprehensions. The serious error is "there is no facility for viewing all active assignments" with SUBST. This is simply wrong. Typing SUBST with nothing after it instantly lists all current substitutions. I use it constantly. The primary misapprehension is that it is desirable to change default drive when CD is used. About half the time, I would then have to change back to my original default. I get the feeling that Zolman works in an environment where CD supports only one directory at a time (Apple ProDOS does this, maybe UNIX does also), not the one per drive that MS-DOS provides. While his choices are interesting, I am not sure a C program is needed, since most of what he does can be done with a batch file in far fewer lines. I can do his previous in the rather stable environment I work in most of the time by using SUBST to simply create a new drive. I typically have 8 or 10 "drives" specified on my system (118 directories.) Mike Firth 1019 Martinique Dallas, TX 75223 Leor responds: Yes, typing "subst" by itself does indeed display all active assignments. The DOS manual for my system didn't happen to mention that little feature. I still dislike "subst" for several reasons, however: 1. After selecting a virtual drive defined via "subst", there are two different notations for specifying the full pathname of any file on the virtual drive (one using the "real" drive, one using the virtual drive) but no way to access those portions of the file tree that reside "above" the base of the virtual drive without reverting to "real" drive notation. 
This leaves "subst" looking adequate to specify data paths for applications programs, but not too intuitive for general-purpose file system navigation. 2. After a virtual drive has been assigned with "subst", any redefinition of that drive is prohibited by DOS. This makes the implementation of a generalized "return to last directory" mechanism using "subst" seem impossible (at least if you want it to be able to work more than once.) 3. Finally, to be able to use "subst" at all, CONFIG.SYS must be changed and the system re-booted. Regarding my alleged "misapprehensions", please realize "cde" and "ret" are meant to work in conjunction with the built-in "cd", not necessarily as absolute replacements for it... "cd" may still be used anytime to change any drive's default directory without disturbing the operation of cde/ret. The philosophy behind cde/ret is simply to reduce the number of keystrokes needed to move between directories, and to make DOS behave a little bit more like UNIX; whether or not that is better, of course, boils down to a matter of personal preference. Dear Mr. Ward: I would like to advise CUJ readers of a potential problem that may occur when using Turbo C's integrated environment. I encountered this problem when I combined completed modules of the system I worked on into one library. I used the TLIB utility, and the resulting file was named CHELIBS.LIB. But after I replaced numerous C-source and OBJect file names in the PRJ file with CHELIBS.LIB, I encountered a surprising reaction: my Turbo C integrated environment (v2.0, tc.exe dated 8-29-88) failed to link the executable file due to numerous unresolved external references (_setargv, _setenvp, and _exit, among others). After careful study, I learned that this error occurred due to a bug in Turbo C. Turbo C does not correctly handle LIB file names listed in the project file. 
My library's name began with "CH" (like CH.LIB, Borland International's library of huge memory models), and this file was interpreted by Turbo C as its own library. Hence, CS.LIB (I used a small memory model) was not linked at all. When I renamed my library to LIBCHES.LIB, all was well. My research has shown me that users' libraries can't use titles beginning with any of Borland's library names (i.e., CS..., CM..., CC..., CL..., or CH...). I hope my report will be useful to Turbo C users, and I am happy to make a contribution to CUJ, however small it may be. Sincerely, Alexander Vladimirovich Pavlov Poste restante, Central Telegraph Office Moscow K-9 103009 USSR Thanks for the information. It's neat to get letters from the USSR. --rlw Dear CUJ, Simon Wheaton-Smith's letter (February 1990 CUJ) deserves patience and a response. While his tone is fanatical, even fanatics have been known to have insights. I have found the C source code he placed on Compuserve (GO CLMFORUM, in the OOP Alley library as OBJECT.C). In this example, he demonstrates that it doesn't require extensions to C or a new language to provide encapsulation, dynamic allocation or to place panels and buttons on the screen. I hope we're ready to agree that these characteristics are not unique to object-oriented programming. There are additional features of C++ which Simon admits are absent from C but which Simon suggests are best provided by a more robust preprocessor. These features are overloaded functions and inheritance. One could argue that, in fact, C++ is just such a preprocessor. Or, one could argue that Simon is suggesting a new language, since a preprocessor can be considered a language. We might call this language "C with overloaded functions and inheritance" instead of C++ (formerly known as "C with objects"). I would argue against preprocessors in general. If Simon has developed in C for MVS environments, then he is probably familiar with the C compiler IBM has been marketing. 
I believe this product is still a preprocessor, producing assembler code. C's limited I/O features have not been extended to match IBM's access methods and so, more often than not, the assembler code is doctored to produce the desired application. The original C code ends up being discarded and you're developing an assembler application. As you would expect, there's not much serious C development on MVS environments. (This situation should eventually change now that IBM has endorsed C as one of the four SAA languages, as Simon points out.) This is the danger with preprocessors. Just as with C macros, only the simplest of functions belong in preprocessor macros. I am not particularly a fan of C++. I agree with Simon, that C++ is "a random collection of items". I've been pleased with its cautious reception. But C++ does have a certain amount of promise -- given C++ and a robust class library, we should be able to quickly produce terse programs. This seems to be what we want from a development environment, and the direction that modern languages should take -- rapid development in a brief and clear style that produces efficient code. The language should be suitable for group development. We want mechanisms which encourage (perhaps enforce) reusable code design -- it's difficult to say that software can currently be characterized in "generations". It remains to be seen if C++ can deliver on these promises. A note on efficiency -- Simon would lead us to believe that the programmer is responsible for optimization. I, on the other hand, believe this to be a cooperative effort between the programmer and his compiler. When we have a language which permits a programmer to briefly and clearly describe what needs to be done and a compiler which determines the efficient way to get it done, we'll have a development environment we can stay with for a while. Russ Klanke 6840 Oswego Place NE #306 Seattle, WA 98115 Thanks for writing such a reasonable response. 
Personally I don't agree that C++ is a random collection of items. I was fortunate enough to sit in on a two-day C++ seminar by Bjarne Stroustrup a couple of years ago. I was impressed with his justification and rationale for the features included in C++. I think he's done a remarkable job of adapting language features invented in a "protect the programmer from himself" environment so that they fit reasonably in C's "you'd better know what you're doing" world. -- rlw Dear Mr. Ward: When I was a child it was rather common to hear from one of your playmates(?) the taunt "I know something you don't know." I am surprised to find it continued in The C Users Journal in Mr. Brannigan's article on "Fitting Curves to Data". Mr. Brannigan states: "It is not difficult, for example, to input data for a linear regression routine to a well known statistical package (which I shall not name) used on micros and mainframes for which the output is incorrect." Either name the offender or do not make the accusation. In the context I am familiar with professionally, if you disagree with something that has been claimed, you state your disagreement and give supporting evidence for that disagreement but you do not make the type of accusation Mr. Brannigan has. In my opinion Mr. Brannigan has by such behavior damaged only his credibility. Sincerely, Morton F. Kaplon 1047 Johnston Dr. Bethlehem, PA 18017 When I edited that story, I almost deleted the parenthetical remark to which you refer, just because it wasn't really necessary. Perhaps I should have. Had I read it as you did, I certainly would have deleted it. As a small publisher, I brought other assumptions to the table. I figured Brannigan was just trying to spare me the wrath of some major software vendor. Had he named the vendor, I would have been forced to verify the error before publication or run the risk of being without defense against a potential libel suit. 
Since the error itself wasn't critical to the story, I would probably have deleted the comment instead of investing resources in doing "quality control" for a product of minimal interest to my readers. Which is better, having readers be placed on notice (albeit vague notice), or saying nothing at all? --rlw Re: Passing and Returning Objects in C++ In his article in the August, 1989, issue, Bruce Eckel gives an interesting description on how values are passed to C++ functions. Unfortunately, he makes a few errors which detract from his presentation. First, he incorrectly describes passing arguments to a function by name. He states that this is when a pointer is passed to a function. In fact, passing an argument by name uses Algol's copy rule: the entire text of an argument is reevaluated each time the name appears in the called function. This somewhat resembles how arguments to C macros are handled. C does not have call-by-name or call-by-reference. In C, only values are passed. When a pointer is passed, it is passed by value. The programmer must account for the fact that a pointer value is needed when the function is called and that the value of the argument received is a pointer. This means that an expression that is not an lvalue cannot be used as an argument for a function expecting a pointer value. C++ did extend C to add call-by-reference (but not call-by-name). While this is most often implemented by passing a pointer to the function, it is not the same. The programmer need not know whether a function is called by reference or value: the call is the same and the argument need not be an lvalue. He makes another error in stating that structure assignment is limited. The example he uses A=B=C; works in ANSI C both for simple data types (int, float, etc.) and for structures. He also describes a method of returning a structure which can only be used when the called function is not recursive. 
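The points above about C passing only values, and about chained structure assignment, can be sketched in a few lines of ANSI C. This is an illustration only, not code from the article under discussion; struct point and the functions retarget, set_through, and copy_chain are hypothetical names.

```c
struct point {
    int x;
    int y;
};

/* The pointer itself is passed by value: reassigning p changes only
 * this function's local copy, never the caller's pointer. */
void retarget(int *p, int *other)
{
    p = other;
    (void)p;    /* silence "set but not used" warnings */
}

/* The copied pointer still designates the caller's object, so storing
 * through it is visible to the caller. */
void set_through(int *p, int value)
{
    *p = value;
}

/* Chained assignment works for structures as it does for int or float. */
struct point copy_chain(struct point c)
{
    struct point a, b;

    a = b = c;
    return a;
}
```

A caller that wants retarget to change its pointer would have to pass the address of that pointer (an int **), which is the bookkeeping that C++ references hide.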
When the function is recursive, space for the returned value of the structure is usually al