Chapter 4: The Tactics of Development


A Developer-Friendly Operating System

[FIXME: Insert pithy quote here]
Unix has a long-established reputation as a good environment to develop under. It's well equipped with tools written by programmers for programmers; these automate away many of the grubby little tasks that would otherwise distract you from concentrating on the most important (and most enjoyable!) aspect of development -- your design.

While all the tools you'll need are there and individually well documented, they're not knit together by an integrated development environment (IDE). Finding and assembling them into a kit that suits your needs has traditionally taken a considerable effort.

If you're used to a good IDE -- the kind of GUI-driven combination of editor, configuration-manager, compiler, and debugger now common on Macintosh and Windows systems -- the Unix approach may seem casual, murky, and primitive. But there's actually method in it.

IDEs make a lot of sense for single-language programming in a tool-poor environment. If what you're doing is confined to grinding out C or C++ code by hand and the yard, they're quite appropriate. Under Unix, however, your languages and implementation options are a lot more varied. It's common to use multiple code generators, custom configurators, and many other standard and custom tools.

IDEs do exist under Unix (there are several good open-source ones, including emulations of the major Macintosh and Windows IDEs). But it's difficult to control an open-ended variety of programming tools with them, and they're not much used. Unix encourages a more flexible style, one less exclusively centered on the edit/compile/debug loop.

In this chapter we'll introduce you to the tactics of development under Unix -- building code, managing code configurations, profiling, debugging, and automating away a lot of the drudgery associated with these tasks so you can concentrate on the fun parts. As usual, we'll focus more on the architectural picture than the how-to details. When you want how-to details, most of the tools in this chapter are well described in [PGS].

Unix programmers traditionally learn how to use these tools by osmosis from other programmers, and by exploration over a period of years. If you're a novice, pay careful attention; we're going to try to jump you over a big section of the Unix learning curve by showing you what is possible right at the outset. If you are an experienced Unix programmer in a hurry, you can skip this chapter -- but maybe you shouldn't. There might just be some bit of useful lore here that even you don't know, and our discussion of the size of Emacs below ties right back into some fundamental principles of the Unix approach.

Choosing an Editor

The first and most basic tool of development is a text editor suitable for writing and modifying programs.

There are literally dozens of text editors available under Unix; writing one seems to be one of the standard finger exercises for budding open-source hackers. Most of these are ephemera, not suitable for extended use by anyone other than their authors. A few are emulations of non-Unix editors, useful as transition aids for programmers used to other OSs. You can browse through a wide variety at Metalab or any other major open-source archive.

For serious editing work, there are two editors that together completely dominate the Unix programming scene. Each is available in a couple of minor variant implementations, but has a standard version you can rely on finding on any modern Unix system. These two editors are vi and emacs.

These two editors express sharply contrasting design philosophies, but both are extremely popular and command great loyalty from identifiable core user populations. Surveys of Unix programmers consistently indicate about a 50/50 split between them, with all other editors barely registering.

Beware: choice of editor, like choice of language, is a personal issue which arouses great zeal in fans of particular editors and editor variants. Arguing which is `best' is pointless and leads to flame wars. You have been warned!

We won't go into the blow-by-blow details of their commands here (we'll give you references that will do that). Instead we'll survey their capabilities with a view to helping you choose the best fit for your style.

vi: the Lightweight Approach

The vi editor was the first screen-oriented editor built for Unix; its name is an abbreviation for `visual editor' and is pronounced /vee eye/ (not /vie/ and definitely not /siks/!).

The vi editor is a small, fast, lightweight program. Its commands are generally single keystrokes, and it is particularly well suited to use by touch-typists.

Stock vi doesn't have mouse support, editing menus, macros, or assignable key bindings. Its partisans consider the lack of these features a feature; they like an editor with a simple, constant interface that they can program into their fingertips and forget about consciously. On this view, one of vi's most important virtues is that you can start editing immediately on a new Unix system without having to carry along your customizations or worrying that the default command bindings will be dangerously different from what you're used to.

One characteristic of vi that beginners tend to find frustrating is a consequence of its terse single-keystroke commands: it has a moded interface -- you are either in command mode or in text-insertion mode. In text-insertion mode, the keys you type become text, and most commands (other than the ESC that exits the mode, and perhaps the arrow keys) don't operate; in command mode, anything you type is interpreted as commands and will do odd (and probably destructive) things to your content.

The original vi was the version shipped with 4.2BSD Unix in the early 1980s; it is now obsolete. Its replacement is `new vi' (nvi) which shipped with 4.4BSD and is found on modern 4.4BSD variants such as BSD/OS, FreeBSD and NetBSD systems. There are several variants with extended features, notably vim, vile, elvis, and xvi; of these vim is probably the most popular and is found on many Linux systems. All the variants are pretty similar and share 85% or so of their command set unchanged from the original vi.

Ports of vi are available for DOS, OS/2, and Macintosh System 7.

Most introductory Unix books include a chapter describing basic vi usage. A vi FAQ is available at URL http://www.macom.co.il/vi; you can find many other copies with a WWW keyword search for page titles including "vi" and "FAQ".

Emacs: the Heavyweight Approach

Emacs stands for `EDiting MAcroS' (pronounce it /ee'maks/). It is undoubtedly the most powerful programming editor in existence -- but also possibly the largest and slowest.

Emacs is a big, feature-laden, heavyweight program. While on modern hardware you won't see noticeable delays in its response to basic commands, it's expensive to start up. What it gives you in exchange is ultimate flexibility and customizability. As we observed in Chapter 3's section on Emacs Lisp, Emacs has an entire programming language inside it that can be used to write arbitrarily powerful editor functions.

The keystroke commands used in Netscape and Internet Explorer text windows (in forms and the mailer) are copied from the stock Emacs bindings for basic text editing. Unlike vi, Emacs doesn't have modes; instead, commands are normally control characters or prefixed with an ESC. However, in Emacs it is possible to bind just about any key sequence to any command, and commands may be stock or customized Lisp programs.

This power comes at a price in complexity. To use a customized Emacs you have to carry around the Lisp files that define your personal Emacs preferences. And learning how to customize Emacs is an entire art in itself. Emacs is correspondingly harder to learn than vi.

However, investing the time to learn can yield rich rewards in productivity. We'll see later in this chapter how Emacs can be used in combination with other development tools to give capabilities comparable to (and in many ways surpassing) those of conventional IDEs.

The standard Emacs, universally available on modern Unixes, is GNU Emacs; this is what generally runs if you type `emacs' to a Unix shell prompt. GNU Emacs sources and documentation are available at the Free Software Foundation archive site, ftp://gnu.org/pub/gnu.

The only major variant is called XEmacs; it has a better X interface but otherwise quite similar capabilities (XEmacs has a home page at URL http://www.xemacs.org). Emacs (and Emacs Lisp) is universally available under modern Unixes. It has been ported to MS-DOS (where it works poorly) and to Windows 95 and NT (where it is said to work reasonably well).

Emacs includes its own interactive tutorial and very complete on-line documentation; you'll find instructions on how to invoke both on the default Emacs startup screen. A good introduction on paper is [LGE].

The Benefits of Knowing Both

People (like your humble author) who regularly use both vi and Emacs tend to use them for different things, and find it valuable to know both.

In general, vi is best for small jobs -- quick replies to mail, simple tweaks to system configuration, and the like. It is especially useful when you're using a new system (or a remote one over a network) and don't have your Emacs customization files handy.

Emacs comes into its own for extended editing sessions in which you have to handle complex tasks, modify multiple files, and use results from other programs during the session. For programmers using X on their console (which is typical on modern Unixes), it's normal to start up Emacs shortly after login time in a large window and leave it running forever, possibly visiting dozens of files and even running programs in multiple Emacs subwindows.

Fanatic partisans of vi castigate Emacs for being bloated, slow, and too complicated for normal human minds to comprehend. Fanatic partisans of Emacs dismiss vi as a toy with a rigid and primitive design, unsuitable for serious editing. Neither side is entirely right or wrong. An intelligent developer will learn to match the right tool to the job.

Is Emacs an argument against the Unix philosophy?

One of the standard arguments against Emacs is that it is a huge intricate program, light-years removed from the lucid simplicity of design that the founders of Unix advocated (and in fact Emacs did not originate under Unix, but in a very different culture that flourished at the renowned MIT Artificial Intelligence Lab in the 1970s).

This argument can be turned around; perhaps Emacs demonstrates that there is a class of applications for which the prescriptions of the Unix philosophy are inadequate. This argument is worth examining, because it goes to the heart of some fundamental design dilemmas in software engineering. When should we give in to the temptation to write big programs?

The contrast with vi tells us less than one might wish; vi is drastically smaller than Emacs but is by no means a simple program itself. The truly Unix-minimalist way of editing would be vi's ancestor ed(1), a line-oriented editor still used in scripts. It is theoretically complete as a way of bashing text files around, but has an interface so austere that nobody but Ken Thompson himself claims to have used it routinely since about 1985 (and Ken is widely suspected to be joking).

Clearly something about editors tends to push them in the direction of increasing complexity. In the case of vi, that something is not hard to identify; it's the desire for convenience. While ed(1) may be theoretically adequate, very few people (other than perhaps Ken) would forgo screen-oriented editing to make a statement about software bloat.

Emacs has a more complicated agenda. Its designers wanted to build a truly programmable editor that could have task-related intelligence customized into it for hundreds of different specialized editing jobs. It's just not possible to do that and stay small.

And this points us at the Unix answer: write a big program only when it is clear by demonstration that nothing else will do -- that is, when attempts to partition the problem have been made and failed. This maxim implies an astringent skepticism about large programs, and a strategy for avoiding them: look for the small-program solution first. If a single small program won't do the job, try building a toolkit of cooperating small programs to attack it. Only if both approaches fail are you free (in the Unix tradition) to build a large program without feeling you have failed the design challenge.

Let's grant that there are good reasons for Emacs to be large. The appropriate Unix-philosophy question about Emacs (and about vi, for that matter) is then: is it larger than it needs to be to do its job?

This is a book about Unix, not about Emacs, so (having made our philosophical point) we won't try to settle that question here. In Chapter 7, however, we will examine Emacs's design again from an angle that may illuminate this question -- as a case study in the use of embedded scripting languages.

Make: Automating your Development Recipes

Program sources by themselves don't make an application. The way you put them together and package them for distribution matters, too. Unix provides a tool for semi-automating these processes: make(1). Make(1) is covered in most introductory Unix books. For a really thorough reference, you can consult [MPM]. If you're using GNU make(1) (the most advanced make, and the one normally shipped with Linux), the treatment in [PGS] may be better in some respects. Most Unixes that carry GNU make will also support GNU Emacs; if yours does, you will probably find a complete make manual on-line through Emacs's `info' documentation system.

Ports of GNU make to DOS and Windows are available from the FSF.

Basic Theory of make(1)

If you're developing in C or C++, an important part of the recipe for building your application will be the collection of compilation and linkage commands needed to get from your sources to working binaries. Entering these commands is a lot of tedious detail work, and most modern development environments include a way to put them in command files or databases that can automatically be re-executed to build your application.

Unix's make(1) program, the original of all these facilities, was designed specifically to help C programmers manage these recipes. It lets you write down the dependencies between files in a project in one or more `makefiles'. Each makefile consists of a series of productions; each one tells make that some given target file depends on some set of source files, and says what to do if any of the sources are newer than the target. You don't actually have to write down all dependencies, as the make program can deduce a lot of the obvious ones from file names and extensions.

For example, you might put in a makefile that the binary myprog depends on three object files myprog.o, helper.o, and stuff.o. If you have source files myprog.c, helper.c, and stuff.c, make(1) will know without being told that each .o file depends on the corresponding .c file, and supply its own standard recipe for how to build a .o file from a .c file.

When you run `make' in a project directory, the make program looks at all productions and timestamps and does the minimum amount of work necessary to make sure derived files are up to date.
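
Here is a minimal sketch of the makefile described above (the file names are the illustrative ones from this paragraph, not from any real project; note that in a real makefile each recipe line must begin with a tab character):

    # Link the three object files into the myprog binary.
    myprog: myprog.o helper.o stuff.o
            cc -o myprog myprog.o helper.o stuff.o

The three .o files need no productions of their own; make's built-in suffix rules know how to produce each one from the corresponding .c file.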

You can read a good example of a moderately complex makefile in the sources for fetchmail (see the list of major case studies in Chapter 1 for more about this program). In the subsections below we'll refer to it again.

Make in non-C/C++ Development

Make(1) is not useful just for C/C++ recipes, however. Scripting languages like those we described in Chapter 3 may not require conventional compilation and link steps, but there are often other kinds of dependencies that make(1) can help you with.

Suppose, for example, that you actually generate part of your code from a specification file, using one of the techniques we will discuss in Chapter 9. You can use make(1) to tie the spec file and the generated source together. This will ensure that whenever you change the spec and remake, the generated code will automatically be rebuilt.
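
A production of this kind might look something like the following sketch, in which `gentable' stands in for whatever code generator your project actually uses (both file names and the generator are hypothetical):

    # Regenerate table.c whenever the specification changes.
    table.c: table.spec
            gentable <table.spec >table.c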

It's quite common to use makefile productions to express recipes for making documentation as well as code. You'll often see this approach used to automatically generate PostScript or other derived documentation from masters written in some markup language like HTML or one of the Unix document-macro languages we'll survey in Chapter 12. In fact, this sort of use is so common that it's worth illustrating with a case study.

Case study: Make for document-file translation

In the fetchmail makefile, for example, you'll see three productions that relate files named FAQ, FEATURES, and NOTES to HTML sources fetchmail-FAQ.html, fetchmail-features.html, and design-notes.html.

The HTML files are meant to be accessible on the fetchmail web page, but all the HTML markup makes them uncomfortable to look at unless you're using a browser. So the FAQ, FEATURES and NOTES are flat-text files meant to be flipped through quickly with an editor or pager program by someone reading the fetchmail sources themselves (or, perhaps, distributed to FTP sites that don't support WWW access).

The flat-text forms can be made from their HTML masters by using the common freeware program lynx(1). Lynx is a WWW browser for text-only displays, but invoked with the -dump option it functions pretty well as an HTML-to-ASCII formatter.
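
Such a production is a one-liner; the one for FAQ looks essentially like this:

    FAQ: fetchmail-FAQ.html
            lynx -dump fetchmail-FAQ.html >FAQ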

With the productions in place, the developer can edit the HTML masters without having to remember to manually rebuild the flat-text forms afterwards, secure in the knowledge that FAQ, FEATURES, and NOTES will be properly rebuilt whenever they are needed.

Utility Productions

Some of the most heavily used productions in typical Makefiles don't express file dependencies at all. They're ways to bundle up little procedures that a developer wants to mechanize, like making a distribution package or removing all object files in order to do a build from scratch.

There is a well-developed set of conventions about what utility productions should be present and how they should be named. Following these will make your Makefile much easier to understand and use.

all
Your `all' production should make every executable of your project. Usually the `all' production doesn't have an explicit rule; instead it refers to all of your project's top-level targets (and, not accidentally, documents what those are). Conventionally this should be the first production in your makefile, so it will be the one executed when the developer types `make' with no argument.

clean
Remove all files (such as binary executables and object files) that are normally created when you `make all'. Don't remove any derived files that came with the distribution, however.

dist
Make a source archive (usually with the tar(1) program) that can be shipped as a unit and used to rebuild the program on another machine. This target should do the equivalent of depending on `all' so that a `make dist' automatically rebuilds the whole project before making the distribution archive -- this is a good way to avoid last-minute embarrassments!

distclean
Throw away everything but what you would include if you were bundling up the source with `make dist'. This may be the same as `make clean' but should be included as a production of its own anyway, to document what's going on. When it's different, it usually differs by throwing away local configuration files that aren't part of the normal `make all' build sequence (such as those generated by autoconf(1); we'll talk about autoconf(1) in Chapter 10 on portability).

realclean
Throw away everything you can rebuild using the makefile. This may be the same as `make distclean', but should be included as a production of its own anyway, to document what's going on. When it's different, it usually differs by throwing away files that are derived but (for whatever reason) shipped with the project sources anyway.

install
Install the project's executables and documentation in system directories so they will be accessible to general users (this typically requires root privileges). Initialize or update any databases or libraries that the executables require in order to function.

uninstall
Remove files installed in system directories by `make install' (this typically requires root privileges). The presence of an uninstall feature implies a kind of humility that experienced Unix hands look for as a sign of thoughtful design.

Working examples of all these are available for inspection in the fetchmail makefile. By studying all of them together you will see a pattern emerge, and (not incidentally) learn much about the fetchmail package's structure. One of the benefits of using these standard productions is that they form an implicit roadmap of their project.
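
If you don't have the fetchmail sources handy, here is a skeletal sketch of what a few of these productions might look like (all file and package names are illustrative, not fetchmail's):

    all: myprog

    clean:
            rm -f myprog *.o

    dist: all
            tar -czf myprog-1.0.tar.gz Makefile *.c *.h README

    install: all
            cp myprog /usr/local/bin/myprog

    uninstall:
            rm -f /usr/local/bin/myprog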

But you need not limit yourself to these utility productions. Once you master make(1), you'll find yourself more and more often using the makefile machinery to automate little tasks that depend on your project file state. Your makefile is a convenient central place to put these; using it makes them readily available for inspection and avoids cluttering up your workspace with trivial little scripts.

Generating Makefiles

One of the subtle advantages of Unix make over the dependency databases built into many IDEs is that makefiles are simple text files -- files that can be generated by programs.

In the mid-1980s it was fairly common for large Unix program distributions to include elaborate custom shell scripts that would probe their environment and use the information they gathered to construct custom makefiles. These custom configurators reached absurd sizes (the author once wrote one that grew to 3000 lines of shell).

The community eventually said ``Enough!'' and various people set out to write tools that would automate away part or all of the process of maintaining makefiles. There are two issues these tools generally tried to address:

One is portability. Makefile generators are commonly built to run on many different hardware platforms and Unix variants. They generally try to deduce things about the local system (including everything from machine word size up to which tools, languages, service libraries, and even document formatters it has available). They then try to use those deductions to write makefiles that exploit the local system's facilities and compensate for its quirks.

The other is rule automation. It's possible to deduce a great deal about the dependencies of a collection of C sources by analyzing the sources themselves (especially by looking at what include files they use and share). Many makefile generators do this in order to mechanically generate make dependencies.

Each different makefile generator tackles these objectives in a slightly different way. There have probably been a dozen or more generators attempted, but most proved inadequate or too difficult to drive or both, and only a few are still in live use. We'll survey the major ones here. All are available as open-source software on the Internet.

makedepend

There have been several small tools that tackled the rule automation part of the problem exclusively. This one, distributed along with the X window system from MIT, is the fastest and most useful and comes preinstalled under all modern Unixes, including all Linuxes.

Makedepend simply takes a collection of C sources and generates dependencies for the corresponding .o files from their #include directives. These can be appended directly to a makefile, and in fact makedepend is defined to do exactly that.
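
Typically you give your makefile a `depend' production that invokes it; makedepend then rewrites the makefile in place, appending the generated dependencies below a "# DO NOT DELETE" marker line. A minimal sketch:

    depend:
            makedepend -- $(CFLAGS) -- *.c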

Makedepend is useless for anything but C projects. It doesn't try to solve more than one piece of the makefile-generation problem. But what it does it does quite well.

Makedepend is sufficiently documented by its manual page. If you type `man makedepend' at a terminal window on any X console you will quickly learn what you need to know about invoking it.

imake

Imake was written in an attempt to mechanize makefile generation for the X window system (it uses makedepend as one of its components). It tackles both the rule-automation and portability problems.

The imake system effectively replaces conventional makefiles with Imakefiles. These are written in a more compact and powerful notation which is (effectively) compiled into makefiles. The compilation uses a rules file which is system-specific and includes a lot of information about the local environment.

Imake is well suited to X's particular portability and configuration challenges and universally used in projects that are part of the X distribution. However, it has not achieved much popularity outside the X developer community. It's hard to learn, hard to use, hard to extend, and produces generated makefiles of mind-numbing size and complexity.

Imake's programs will be available on any Unix that supports X, including Linux. There has been one heroic effort, [PDB], to make the mysteries of imake comprehensible to non-X-programming mortals. It is worth reading if you are going to do X programming.

autoconf

Autoconf was written by people who had seen and rejected the imake approach. It generates per-project configure shellscripts that are like the old-fashioned custom script configurators. These configure scripts can generate makefiles (among other things).

Autoconf is focused on portability and does no built-in rule automation at all. Although it is probably as complex as imake, it is much more flexible and easier to extend. Rather than relying on a per-system database of rules, it generates configure shell code that goes out and searches for things.

Each configure shellscript is built from a per-project template that you have to write, called configure.in. Once generated, though, the configure script will be self-contained and can configure your project on systems that don't carry autoconf(1) itself.

The autoconf approach to makefile generation is like imake's in that you start by writing a makefile template for your project. But autoconf's Makefile.in files are basically just makefiles with placeholders in them for simple text substitution; there's no second notation to learn. If you want rule automation, you must take explicit steps to call makedepend(1) or some similar tool -- or use automake(1).
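
A fragment of a hypothetical Makefile.in might look like this; the @-bracketed names are placeholders that the generated configure script replaces with values it discovers on the local system:

    CC = @CC@
    CFLAGS = @CFLAGS@

    myprog: myprog.o helper.o
            $(CC) $(CFLAGS) -o myprog myprog.o helper.o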

Autoconf is documented by an on-line manual in the FSF's info format. The source scripts of autoconf are available from the FSF archive site, but are also preinstalled on many Unix and Linux versions. You should be able to browse this manual through your Emacs's help system.

Despite its lack of direct support for rule automation, and despite its generally ad-hoc approach, in late 2000 autoconf is clearly the most popular of the makefile generators, and has been for some years. It has eclipsed imake and driven at least one major competitor (metaconfig) out of use.

We'll have more to say about autoconf, from a slightly different angle, in Chapter 10 on portability.

automake

Automake is an attempt to add imake-like rule automation as a layer on top of autoconf(1). You write Makefile.am templates in a broadly imake-like notation; automake(1) compiles them to Makefile.in files, which autoconf's configure scripts then operate on.
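
A Makefile.am can be startlingly terse; automake expands a declaration like the following hypothetical fragment into a full Makefile.in, complete with the standard utility productions described earlier in this chapter:

    bin_PROGRAMS = myprog
    myprog_SOURCES = myprog.c helper.c stuff.c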

Automake is still relatively new technology in 2000. It is used in several FSF projects but has not yet been widely adopted elsewhere, though its general approach looks promising.

Complete on-line documentation is shipped with automake, which can be downloaded from the FSF archive site.

Version-Control Systems

Why Version Control?

Code evolves. As a project moves from first-cut prototype to deliverable, it goes through multiple cycles in which you explore new ground, debug, and then stabilize what you've accomplished. And this evolution doesn't stop when you first deliver for production. Most projects will need to be maintained and enhanced past the 1.0 stage, and will be released multiple times.

Code evolution raises several practical problems that can be major sources of friction and drudgery -- thus a serious drain on productivity. Every moment spent on these problems is a moment not spent on getting the design and function of your project right.

Perhaps the most important is reversion. If you make a change, and discover it's not viable, how can you revert to a code version that is known good? If reversion is difficult or unreliable, it's hard to risk making changes at all (you could tank the whole project, or make many hours of painful work for yourself).

Almost as important is change tracking. You know your code has changed; do you know why? It's easy to forget the reasons for changes and step on them later. If you have collaborators on a project, how do you know what they have changed while you weren't looking?

Another is bug tracing. It's quite common to get new bug reports for a particular version after the code has mutated away from it considerably. Sometimes you can recognize immediately that the bug has already been stomped, but often you can't. Suppose it doesn't reproduce under the new version. How do you get back the state of the code for the old version in order to reproduce and understand it?

To address these problems, you need procedures for keeping a history of your project, and annotating it with comments that explain the history. If your project has more than one developer, you also need mechanisms for making sure developers don't step on each others' versions.

Version Control By Hand

The most primitive (but still very common) method is all hand-hacking. One snapshots the project periodically by manually copying everything in it to a backup. One includes history comments in source files. One makes verbal or email arrangements with other developers to keep their hands off certain files while you hack them.

The hidden costs of this hand-hacking method are high, especially when (as frequently happens) it breaks down. The procedures take time and concentration; they're prone to error, and tend to get slipped under pressure or when the project is in trouble -- that is, exactly when they are most needed.

Automated Version Control

To avoid these problems, you can use a version-control system (VCS), a suite of programs that automates away most of the drudgery involved in keeping an annotated history of your project and avoiding modification conflicts.

Most VCSs share the same basic logic. To use one, you start by registering a collection of source files -- that is, telling your VCS to create archive files describing their change histories. Thereafter, when you want to edit one of these files, you have to check out the file -- assert an exclusive lock on it. When you're done, you check in the file, adding your changes to the archive, releasing the lock, and entering a change comment explaining what you did.
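
Using RCS commands for illustration, a typical cycle on a single file might look something like this (the file name and log message are, of course, made up):

    ci -u myprog.c                    # register the file; keep a read-only copy
    co -l myprog.c                    # check it out and lock it for editing
    ...edit myprog.c...
    ci -u -m"Fix off-by-one error" myprog.c   # check the changes back in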

The history of the project is not necessarily linear. All VCSs in common use actually allow you to maintain a tree of variant versions (for ports to different machines, say) with tools for merging branches back into the main "trunk" version.

Most of the rest of what a VCS does is convenience, labeling, and reporting features surrounding these basic operations -- tools which allow you to view differences between versions, or to group a given set of versions of files as a named release which can be examined or reverted to at any time without losing later changes.

VCSs have their problems. The biggest one is that using a VCS involves extra steps every time you want to edit a file, steps which developers in a hurry tend to want to skip if they have to be done by hand. Near the end of this chapter we'll discuss a way to solve this one.

Another problem is that there are some kinds of natural operations that tend to confuse VCSs. Renaming files is a notorious trouble spot; it's not easy to automatically ensure that a file's version history will be carried along with it when it is renamed.

Despite these difficulties, VCSs are a huge boon to productivity and code quality in many ways, even for small single-developer projects. They automate away many procedures that are just tedious work. They help a lot in recovering from mistakes. Perhaps most importantly, they free programmers to experiment by guaranteeing that reverting to a known-good state will always be easy.

(VCSs, by the way, are not merely good for program code; the manuscript of this book was maintained as a collection of files under RCS while it was being written.)

Unix Tools for Version Control

Historically, three VCSs have been of major significance in the Unix world, and we'll survey them here. For an extended introduction and tutorial, consult [ARS].

SCCS

The first was SCCS, the original Source Code Control System developed by Bell Labs around 1980 and featured in System III Unix. SCCS seems to have been the first serious attempt at a unified source-code management system; concepts that it pioneered are still found at some level in all later ones, including commercial Unix and Windows products such as ClearCase.

SCCS itself is, however, now obsolete. It was proprietary Bell Labs software; superior open-source alternatives have since been developed, and most of the Unix world has converted to those. SCCS is still in use to manage old projects at some commercial vendors, but can no longer be recommended for new projects.

No complete open-source implementation of SCCS exists. A clone called CSSC is in development under the sponsorship of the FSF.

RCS

The ``superior open-source alternatives'' began with RCS (Revision Control System), born at Purdue University a few years after SCCS and originally distributed with 4.3BSD Unix. It is logically similar to SCCS but has a cleaner command interface, and good facilities for grouping together entire project releases under symbolic names.

RCS is currently the most widely used version-control system in the Unix world. Most other Unix version-control systems use it as a back end or underlayer. It is well suited for single-developer or small-group projects hosted at a single development shop.

The RCS sources are maintained and distributed by the FSF. Free ports are available for Microsoft operating systems and VAX VMS.

CVS

CVS (Concurrent Version System) is technically just a front end to RCS developed in the early 1990s, but the model of version control it uses is different enough that it qualifies as a new design.

Unlike RCS and SCCS, CVS doesn't exclusively lock files when they're checked out. Instead, it tries to reconcile non-conflicting changes mechanically when they're checked back in, and requests human help on conflicts. The design works because patch conflicts are much less common than one might intuitively think.

CVS's interface is significantly more complex than RCS's, and it needs a lot more disk space. These make it a poor choice for small projects. On the other hand, CVS is well suited to large multi-developer efforts distributed across several development sites connected by the Internet. CVS tools on a client machine can easily be told to direct their operations to a repository located on a different host.

The open-source community makes heavy use of CVS for projects such as GNOME and Mozilla. Typically, such CVS repositories allow anyone to check out sources remotely. Anyone can, therefore, make a local copy of a project, modify it, and mail change patches to the project maintainers. Actual write access to the repository is more limited and has to be explicitly granted by the project maintainers. A developer who has such access can perform a `commit' operation from his modified local copy, which will cause the local changes to get made directly to the remote repository.
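
A typical anonymous-CVS session might look something like this (the repository location is purely illustrative):

    cvs -d :pserver:anonymous@cvs.example.org:/cvsroot login
    cvs -d :pserver:anonymous@cvs.example.org:/cvsroot checkout myproject
    cd myproject
    ...edit some files...
    cvs diff -u >mychanges.patch      # a patch to mail to the maintainers
    cvs commit -m "Fix FAQ typos"     # works only if you have write access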

You can see an example of a well-run CVS repository, accessible over the Internet, at cvs.gnome.org. This site illustrates the use of CVS-aware browsing tools such as Bonsai, which are very useful in helping a large and decentralized group of developers coordinate their work.

The social machinery and philosophy accompanying the use of CVS is as important as the details of the tools. The assumption is that projects will be open and decentralized, with code subject to peer review and inspection even by developers who are not officially members of the project group.

The CVS sources are maintained and distributed by the FSF.

[FIXME: Add sections on Aegis and/or Bitkeeper.]

Run-time debugging

Anyone who has been programming longer than a week knows that getting the syntax of your programming language right is the easy part of debugging. The hard part comes after that, when you need to understand why your syntactically correct program doesn't behave as you expect.

The Unix tradition encourages developers to anticipate this problem by designing for transparency -- that is, designing programs in such a way that their internal data flows are readily monitored with the naked eye and simple tools, and readily mentally modeled. This is why many Unix programs have `verbose' (-v) options. The fetchmail program, for example, has one that causes all of its POP/IMAP and SMTP transactions to be dumped to standard output as they happen; frequently, transcripts of such sessions have been sufficient to characterize bugs or server incompatibilities exactly, even in the absence of a test load with which to reproduce the problem.

Design for transparency is valuable both at the low level of easing the runtime-debugging task after the fact and at the high level of preventing bugs before the fact. A program designed so that its internal data flows are readily comprehensible is more likely to be one that does not fail due to uncontrolled complexity and bad interactions.

Design for transparency is not, however, sufficient in itself. When debugging a program at runtime, it's extremely useful to be able to examine its state, set breakpoints, and execute pieces of it down to the single-statement level in a controlled way. Unix has a long tradition of hosting programs to help you with this. Linux (and most other open-source Unixes) features a powerful one called gdb (yet another FSF project) that supports C and C++ debugging.
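
A gdb session on a C program might begin something like this (the program and variable names are illustrative):

    cc -g -o myprog myprog.c     # compile with debugging symbols
    gdb myprog                   # start the debugger on the binary
    (gdb) break main             # stop at entry to main()
    (gdb) run
    (gdb) next                   # execute the next statement, stepping over calls
    (gdb) print count            # examine a variable's current value
    (gdb) continue               # resume normal execution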

Perl, Python, Java, and Emacs Lisp all support standard packages or programs (included with their base distributions) which allow you to set breakpoints, control execution, and do general runtime-debugger things. Tcl/Tk, designed as a small language for small projects, has no such facility (though it does have a trace facility that can be used to watch variables at runtime).

Remember the Unix philosophy. Spend your time on quality, not the low-level details, and automate away everything you can -- including the detail work of run-time debugging.

Profiling

As a general rule, 90% of your program's execution time will be spent in 10% of its code. Profilers are tools that help you identify the 10% of ``hot spots'' that constrain the speed of your program. This is a good thing to know when you are trying to make it faster.

But in the Unix tradition, profilers have a far more important function. They enable you not to optimize the other 90%! This is good, and not just because it saves you work. The really valuable effect is that not optimizing that 90% holds down global complexity and reduces bugs.

You may recall that we quoted Donald Knuth observing ``Premature optimization is the root of all evil'' in Chapter 1, and that Rob Pike and Ken Thompson had a few pungent observations on the topic as well. These were the voices of experience. Do good design. Think about what's right first. Tune for efficiency later.

Profilers help you do this. If you get in the good habit of using them, you can get rid of the bad habit of premature optimization. They don't just change the way you work; they change how you think.

Profilers for compiled languages rely on instrumenting object code, so they are even more platform-dependent than compilers. On the other hand, a compiled-language profiler doesn't care about the source language of the programs it instruments. Under Unix, the single profiler gprof(1) handles C, C++, and all other compiled languages.
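
Using gprof(1) follows the instrument/run/report pattern: compile with profiling enabled, run the program under a typical load, and then format the statistics it leaves behind (the program name here is illustrative):

    cc -pg -o myprog myprog.c    # instrument the binary for profiling
    ./myprog                     # run it; gmon.out is written on exit
    gprof myprog gmon.out        # print per-function timing and call-graph report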

Perl, Python, and Emacs Lisp have their own profilers included in their basic distributions; these are portable across all platforms on which the host languages themselves run. Java and Tcl/Tk have no profiling support as yet (though in Java's case that is likely to have been fixed by the time you read this).

Emacs as the Universal Front End

One of the things the Emacs editor is very good at is front-ending for other development tools. In fact, nearly every tool we've discussed in this chapter can be driven from within an Emacs editor session through front ends that give them greater utility than they would have running standalone. Read and learn -- not just about Emacs, but about the subtle art of creating synergy between programs.

To illustrate this, we'll walk you through the use of these tools with Emacs in a typical build/test/debug cycle. For details on them, see Emacs's own on-line help system; the purpose of this section is to give you an overview and help motivate you to learn more.

Emacs and make(1)

Make, for example, can be started with the Emacs command ESC-x compile followed by an Enter. This command will run make(1) in the current directory, capturing the output in an Emacs buffer.

This by itself wouldn't be very useful. But Emacs's compilation mode knows about the error-message format (featuring a source file and line number) emitted by Unix C compilers and many other tools.

If anything run by make issues error messages, the command Ctl-X ` will try to parse them and take you to each error location in turn, popping open a window on the appropriate file and taking the cursor to the error line.

This makes it extremely easy to step through an entire build, fixing any syntax errors that have crept in since the last compile.

Emacs and run-time debugging

For catching runtime errors, Emacs offers similar integration with your symbolic debugger -- that is, you can use an Emacs mode to set breakpoints in your programs and examine their runtime state. You run the debugger by sending it commands through an Emacs window. Whenever the debugger stops on a breakpoint, the message it ships back about the source location is parsed and used to pop up a window on the source around the breakpoint.

Emacs's Grand Unified Debugger mode supports all the major C debuggers: gdb(1), sdb, dbx, and xdb. It also supports Perl symbolic debugging via the perldb module, and the standard debuggers for both Java and Python. Facilities built into Emacs Lisp itself support `electric debugging' of Emacs Lisp code.

At time of writing (early 2001) there is not yet support for Tcl debugging from within Emacs. The design of Tcl is such that it seems unlikely to be added.

Emacs and version control

Once you've corrected your program's syntax and fixed its runtime bugs, you may want to save the changes into a version-controlled archive. You have been using version control to avoid embarrassing accidents, haven't you? ... No? ... You haven't?

If you've only tried running version-control tools from the shell, it's hard to blame you for sloughing off this important step. Who wants to have to remember to run checkout/checkin commands around every edit operation?

Fortunately, Emacs offers help here too. Code built into Emacs implements a simple-to-use front end for SCCS, RCS, or CVS. The single command Ctl-x v v tries to deduce the next logical version-control operation to do on the file you are visiting. These operations include registering a file, checking it out and locking it, and checking it back in (accepting a change comment in a pop-up buffer).

Emacs also helps you view the change history of version-controlled files, and helps you revert out changes you don't want. It makes it easy to apply version-control operations to whole sets of files, or to whole project directory trees. In general, it does a pretty good job of making version-control operations painless.

The implications of this are larger than you might guess before you've gotten used to it. You'll find, once you get used to fast and easy version control, that it's extremely liberating. Because you know you can always revert to a known-good state, you'll find you feel more free to develop in a more fluid and exploratory way, trying lots of changes out to see their effects.

Emacs and Profiling

Surprise...this is perhaps the only phase of the development cycle in which Emacs front-ending does not offer substantial help. Profiling is an intrinsically batchy operation -- instrument your program, run it, view the statistics, speed-tune the code with an editor, repeat. There isn't much room for Emacs leverage in the profiling-specific parts of this cycle.

Nevertheless, there's a good tutorial reason for us to think about Emacs and profiling. If you found yourself analyzing a lot of profiling reports, it might pay you to write a mode in which a mouse click or keystroke on a profile report line visited the source of the relevant function. This actually would be fairly easy to do using the Emacs `tags' code. In fact, by the time you read this, some other reader may already have written such a mode and contributed it to the public Emacs code base.

The real point here is again a philosophical one. Don't drudge -- drudging wastes your time and productivity! If you find yourself spending a lot of time on the low-level mechanical parts of development, step back. Apply the Unix philosophy. Use your toolkit to automate or semi-automate the task.

Then (if possible) give back something in return for all the nifty free tools you've inherited, by posting your solution as open-source software to the Internet. Help liberate your fellow programmers from drudgery, too.

Like An IDE, Only Better...

Earlier on we asserted that Emacs can give you capabilities resembling those of a conventional integrated development environment, only better. By now you should have enough facts in hand to see how that can be true. You can run entire development projects from inside Emacs, driving the low-level mechanics with a few keystrokes and saving yourself the mental effort and disruption of constantly switching contexts.

The Emacs-enabled development style trades away some capabilities of advanced IDEs, like graphical views of program structure. But those are frills. What Emacs gives you in return is flexibility and control. You're not limited by the imagination of the IDE designer -- you can tweak, customize, and add task-related intelligence using Emacs Lisp.

Finally, you're not limited to accepting what one small group of IDE developers sees fit to support. By keeping an eye on the open-source community you can leverage the work of thousands of your peers, Emacs-using developers facing challenges much like yours. This is much more effective -- and much more fun.

References

PGS
Loukides, Mike & Oram, Andy; Programming with GNU Software, O'Reilly & Associates 1996, ISBN 1-56592-112-7.
LGE
Cameron, Debra & Rosenblatt, Bill & Raymond, Eric; Learning GNU Emacs (2nd Edition), O'Reilly & Associates 1996, ISBN 1-56592-152-6.
MPM
Oram, Andrew & Talbott, Steve; Managing Projects with Make, O'Reilly & Associates 1991, ISBN 0-937175-90-0.
PDB
DuBois, Paul; Software Portability with Imake, O'Reilly & Associates 1993, ISBN 1-56592-055-4.
ARS
Bolinger, Don & Bronson, Tan; Applying RCS and SCCS, O'Reilly & Associates 1995, ISBN 1-56592-117-8.

Eric S. Raymond <esr@snark.thyrsus.com>