Chapter 2: On Not Reinventing the Wheel

"When the superior man refrains from acting, his force is felt for a thousand miles."

-- Tao Te Ching (as popularly mistranslated)


Laziness Is An Economic Virtue

Reluctance to do unnecessary work is a great virtue in programmers. If the Chinese sage Lao-Tze were alive today and still teaching the way of the Tao, he would probably be mistranslated as: when the superior programmer refrains from coding, his force is felt for a thousand miles. In fact, recent translators have suggested that the Chinese term "wu-wei" that has traditionally been rendered as "inaction" or "refraining from action" should probably be read as "least action" or "most efficient action" or "action in accordance with natural law", which is an even better description of good engineering practice!

Thinking time is precious and very expensive relative to all the other overheads that go into software development; accordingly, it should be spent solving new problems rather than rehashing old ones for which known solutions already exist. This attitude gives the best return both in the ``soft'' terms of developing human capital and in the ``hard'' terms of economic return on development investment.

The most effective way to avoid reinventing the wheel is to borrow someone else's design and implementation of it. In other words, to reuse code.

The virtuousness of code reuse is one of the great apple-pie-and-motherhood verities of software development. When you're stuck with reusing proprietary object-code libraries, however, it can seem pretty unattractive. They're often poorly documented, buggy, inflexible, and not quite what you want.

Accordingly, programmers outside the Unix world often get into the habit of coding all their service functions from scratch. Or, if they try to re-use, they too often find they have to spend more effort probing a library's behavior and then coding around its kinks and bugs than they would have if they had coded from scratch.

These frustrations are far less likely to bite when the code you are attempting to reuse is available in source. Decently commented source code is its own documentation. Bugs in source code can be fixed. Source can be instrumented and compiled for debugging to make probing its behavior in obscure cases easier. And if you need to change its behavior, you can do that.

In the early days of Unix, components of the operating system, its libraries, and its associated utilities were passed around in source code; this openness was a vital part of the Unix culture. When this tradition was disrupted in the 1980s, Unix lost both its initial momentum and contact with its own roots. A decade later, the rise of the GNU toolkit and Linux prompted a rediscovery of the value of open-source code.

Today, open-source code is again one of the most powerful tools in any Unix programmer's kit. Accordingly, though the explicit concept of "open source" and the most widely used open-source licenses are decades younger than Unix itself, it's important to understand both in order to understand the dynamism of today's Unix culture.

In this chapter, we'll survey various issues associated with re-using open-source code: evaluation, documentation, and licensing. In Chapter 12 we'll discuss the open-source development model more generally.

The Best Things In Life Are Free

Literally terabytes of Unix sources for systems and applications software, service libraries, GUI toolkits and hardware drivers are available for the taking on the Internet. You can have most built and running in minutes with standard tools.

Programmers from outside the Unix world are often prone to think open-source (or `free') software is necessarily inferior to the commercial kind, that it's shoddily made and unreliable and will cause one more headaches than it saves. They miss an important point -- in general, open-source software is written by people who care about it, need it, use it themselves, and are putting their individual reputations among their peers on the line by publishing it. They are therefore more strongly motivated to do a good job than wage slaves toiling Dilbert-like in the cubicles of proprietary software houses.

Furthermore, the open-source user community (those peers) is not shy about nailing bugs, and its standards are high. Authors who put out substandard work experience a lot of social pressure to fix their code or withdraw it, and can get a lot of skilled help fixing it if they choose. As a result, mature open-source packages are generally of high quality and often functionally superior to any proprietary equivalent. They may lack polish and have documentation that assumes much, but the vital parts will usually work quite well.

If you are a programmer from outside the Unix world, you may find this claim difficult to believe (and not without reason; it is an unfortunate fact that other OSs, when they have source-sharing cultures at all, tend to have weak and amateurish ones that only rarely produce high-quality work). If so, consider this: on modern Unixes, the C compiler itself is almost invariably open-source. The Free Software Foundation's GNU C Compiler (GCC) is so powerful, so well documented, and so reliable that there is effectively no commercial Unix compiler market left, and it has become normal for Unix vendors to port GCC to their platforms rather than do in-house compiler development.

The way to evaluate an open-source package is to read its documentation and skim some of its code. If what you see appears to be competently written and documented with care, be encouraged. If there also is evidence that the package has been around for a while and incorporated substantial user feedback, you may safely bet that it is quite reliable.

A good gauge of maturity and the volume of user feedback is the number of people besides the original author mentioned in the README and project news or history files in the source distribution. Credits to lots of people for sending in fixes and patches are signs both of a significant user base keeping the authors on their toes, and of a conscientious maintainer who is responsive to feedback and will take corrections.
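The gauge described above can be roughly automated. What follows is a minimal sketch, not a definitive tool: it assumes contributors are credited with e-mail addresses, treats each distinct address as one person, and looks only in a hypothetical list of conventional file names (CREDIT_FILES). Both function names are invented for illustration.

```python
import re
from pathlib import Path

# Crude heuristic (an assumption): one distinct e-mail address = one credited person.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# File names often used for credits and history; this list is an assumption.
CREDIT_FILES = ("README", "NEWS", "HISTORY", "ChangeLog", "AUTHORS", "THANKS")


def credited_addresses(text, author_address=None):
    """Return the distinct e-mail addresses found in text, minus the author's."""
    found = {address.lower() for address in EMAIL_RE.findall(text)}
    if author_address:
        found.discard(author_address.lower())
    return found


def count_contributors(source_dir, author_address=None):
    """Count distinct non-author addresses across the credit files in source_dir."""
    people = set()
    for name in CREDIT_FILES:
        path = Path(source_dir) / name
        if path.is_file():
            text = path.read_text(errors="replace")
            people |= credited_addresses(text, author_address)
    return len(people)
```

A high count is only a hint. Packages that credit people by name alone will be undercounted, so the number is best read alongside the files themselves.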

It's also a good omen when the software has its own web page, on-line FAQ (Frequently-Asked Questions) list, and an associated mailing list or USENET newsgroup. These are all signs that a live and substantial community of interest has grown up around the software. Packages that are duds just don't get this kind of continuing investment, because they can't reward it.

Here are some examples of what web pages associated with high-quality open-source software look like:

GIMP
http://www.xcf.berkeley.edu/~gimp/gimp.html
GNOME
http://www.gnome.org
KDE
http://www.kde.org
fetchmail
http://www.tuxedo.org/~esr/fetchmail
Python
http://www.python.org

Where Should I Look?

To begin to grasp something of the amazing wealth of free resources out there, surf to the Linux archives at Metalab, URL http://metalab.unc.edu. It is one of the largest in the world, and has a better interface to the World Wide Web than most (the program that creates its Web look and feel will, in fact, be one of our case studies in the discussion of Perl in Chapter 3). It's also the home site of the Linux Documentation Project, which maintains many documents that are excellent resources for Unix users and developers.

Recently, the SourceForge archive at http://www.sourceforge.net has become extremely important. In the future it may take over Metalab's crown as the most important open-source archive in the world.

These archives are general-purpose and contain code in many languages, but most of their content is C or C++. There are also sites specialized around some of the interpreted languages we'll look at in Chapter 3.

The CPAN archive is the central repository for useful free code in Perl. It is easily reached from the Perl home page at URL http://www.perl.com/perl.

The Python Software Activity makes an archive of Python software and documentation available at the Python Home Page, URL http://www.dstc.edu.au/www.python.org.

Many Java applets and pointers to other sites featuring free Java software are made available at URL http://java.sun.com/applets/.

One of the most valuable ways you can invest your time as a Unix developer is to spend time wandering around these sites learning what is available for you to reuse. The coding time you save may be your own!

What Are The Issues In Using Open-Source Software?

There are three major issues in using open-source software: quality, documentation, and licensing terms. We've seen above that if you exercise a little judgement in picking through your alternatives, you will generally find one or more of quite respectable quality.

Documentation is often a more serious issue. Many high-quality open-source packages are less useful than they technically ought to be due to poor documentation. Unix tradition encourages a rather hieratic style of documentation, one which (while it may technically capture all of a package's features) assumes that the reader is intimately familiar with the application domain and is reading very carefully.

Thus, for example, manual pages for Unix utilities often consist of a terse one-sentence summary of the utility's function, followed by a bewildering list of minutely-described command-line options, followed by a cursory description of its theory of operation, followed by references to related utilities. This sort of thing makes an excellent reference but a very daunting introduction.

The best advice we can give is: pay careful attention. What you need to know will probably be there, but you're likely to have to read the entire document carefully and think about each sentence in context before achieving enlightenment.

The most serious issue in reusing open-source software (especially in any kind of commercial product) is understanding what obligations, if any, the package's license puts upon you. In the next two sections we'll discuss this issue in detail.

Standard Open Source Licenses

Here are the standard open-source license terms you are likely to encounter. The specific codes listed here are those recommended by the Metalab archive maintainers, and are found in the LSM or ``Linux Software Map'' files of synoptic information associated with almost all Metalab (formerly Sunsite) packages.

PD
Placed in public domain

MIT
MIT X Consortium license (like BSD's but with no advertising requirement)

BSD
Berkeley Regents copyright (used on BSD code)

Artistic License
Same terms as Perl Artistic License

GPL
GNU General Public License

GPL 2.0
GNU General Public License, version 2.0

GPL+LGPL
GNU GPL and Library (or `Lesser') GPL

OSD
Copyrighted, freely redistributable, may require modified versions to be distributed as base plus patches.
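If you process many packages, the codes above are easy to keep in a small lookup table. The sketch below copies the codes and descriptions from the list; the names LICENSE_CODES and describe_license are hypothetical, and matching codes without regard to case or extra spaces is an assumption of this example, not a rule of the archive.

```python
# License codes and descriptions, copied from the list above.
LICENSE_CODES = {
    "PD": "Placed in public domain",
    "MIT": "MIT X Consortium license (like BSD's but with no advertising requirement)",
    "BSD": "Berkeley Regents copyright (used on BSD code)",
    "Artistic License": "Same terms as Perl Artistic License",
    "GPL": "GNU General Public License",
    "GPL 2.0": "GNU General Public License, version 2.0",
    "GPL+LGPL": "GNU GPL and Library (or `Lesser') GPL",
    "OSD": ("Copyrighted, freely redistributable, may require modified "
            "versions to be distributed as base plus patches."),
}


def describe_license(code):
    """Return the description for a license code, or None if it is not standard."""
    # Collapse runs of whitespace and compare case-insensitively (an assumption).
    wanted = " ".join(code.split()).lower()
    for known, description in LICENSE_CODES.items():
        if known.lower() == wanted:
            return description
    return None
```

A result of None is the interesting case: it flags a package whose terms you will have to read for yourself.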

All of the standard licenses conform to a meta-license called the ``Open Source Definition'' which is widely accepted in the open-source community as an articulation of the social contract among open-source developers. A pointer to it is included below.

The Logic of Licenses: How To Pick One

The choice of license terms involves decisions about what restrictions, if any, the author wants to put on what people do with the software. The following discussion is written from the author's point of view, but you can easily translate it to understand your restrictions if you want to use software that is under such a license.

Public Domain

If you want to make no restrictions at all, you should put your software in the public domain. An appropriate way to do this would be to include something like the following text at the head of each file:

Placed in public domain by J. Random Hacker, 2000.  Share and enjoy!

If you do this, you are surrendering your copyright. Anyone can do anything they like with any part of the text. It doesn't get any freer than this. Very little open-source software is actually placed in the public domain. On Metalab (the largest single archive of open-source in the Linux world) in mid-1997, only 3% of over 2600 software packages and documents announced PD status.
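Putting a notice at the head of every file is tedious by hand. Here is a minimal sketch of how it might be scripted, assuming files whose comments start with "# " (the comment_prefix parameter is a hypothetical knob for other languages). The function name add_notice is invented for this example. It leaves any "#!" interpreter line first, and does nothing if the notice is already present.

```python
NOTICE = "Placed in public domain by J. Random Hacker, 2000.  Share and enjoy!"


def add_notice(source, notice=NOTICE, comment_prefix="# "):
    """Return source with the notice as a comment at its head.

    A leading "#!" interpreter line, if any, stays first. If the notice is
    already among the first two lines, the source is returned unchanged.
    """
    line = comment_prefix + notice
    lines = source.splitlines(keepends=True)
    if any(existing.rstrip("\r\n") == line for existing in lines[:2]):
        return source
    insert_at = 1 if lines and lines[0].startswith("#!") else 0
    if insert_at == 1 and not lines[0].endswith("\n"):
        lines[0] += "\n"  # keep the notice off the interpreter line
    lines.insert(insert_at, line + "\n")
    return "".join(lines)
```

Working on strings rather than files keeps the sketch easy to test; a wrapper that reads and rewrites each file in a source tree is left to the reader.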

Copyright Status and Licenses

Anything that is not public domain has a copyright, possibly more than one. Under the Berne Convention (to which the U.S. has adhered since 1989), the copyright does not have to be explicit. That is, the authors of a work hold copyright even if there is no copyright notice.

Who counts as an author can be very complicated, especially for software that has been worked on by many hands. This is why licenses are important. By setting out the terms under which material can be used, they grant rights to the users that protect them from arbitrary actions by the copyright holders.

In proprietary software, the license terms are designed to protect the copyright. They're a way of granting a few rights to users while reserving as much legal territory as possible for the owner (the copyright holder). The copyright holder is very important, and the license logic so restrictive that the exact technicalities of the license terms are usually unimportant.

In open-source software, the situation is usually the exact opposite; the copyright exists to protect the license. The only rights the copyright holder always keeps are to enforce the license. Otherwise, only a few rights are reserved and most choices pass to the user. In particular, the copyright holder cannot change the terms on a copy you already have. Therefore, in open-source software the copyright holder is almost irrelevant -- but the license terms are very important.

Normally the copyright holder of a project is the current project leader or sponsoring organization. Transfer of the project to a new leader is often signaled by changing the copyright holder. However, this is not a hard and fast rule; many open-source projects have multiple copyright holders, and there is no instance on record of this leading to legal problems. Some projects choose to assign copyright to the Free Software Foundation, on the theory that it has an interest in defending open source and lawyers available to do it.

What Qualifies as Open Source

For licensing purposes, we can distinguish several different kinds of rights that a license may convey: rights to copy and redistribute, rights to use, rights to modify for personal use, and rights to redistribute modified copies. A license may restrict or attach conditions to any of these rights.

The Open Source Definition is the result of a great deal of thought about what makes software ``open source'' or (in older terminology) ``free''. Its constraints on licensing are spelled out in full later in this chapter.

The guidelines prohibit restrictions on redistribution of modified binaries; this meets the needs of software distributors, who need to be able to ship working code without encumbrance. It allows authors to require that modified sources be redistributed as pristine sources plus patches, thus establishing the author's intentions and an ``audit trail'' of any changes by others.

The OSD is the legal definition of the `OSI Certified Open Source' certification mark, and as good a definition of ``free software'' as anyone has ever come up with. All of the standard licenses (MIT, BSD, Artistic, and GPL/LGPL) meet it (though some, like GPL, have other restrictions which you should understand before choosing them).

Note that licenses which allow noncommercial use only do not qualify as open-source licenses, even if they are based on ``GPL'' or some other standard license. They discriminate against particular occupations, persons, and groups. They make life prohibitively complicated for CD-ROM distributors and others trying to spread open-source software commercially.

Why You Should Use A Standard License

The widely-known OSD-conformant licenses have well-established interpretive traditions. Developers (and, to the extent they care, users) know what they imply, and have a reasonable take on the risks and tradeoffs they involve. Therefore, use one of the standard licenses carried on the OSI site if at all possible.

If you must write your own license, be sure to have it certified by OSI. This will avoid a lot of argument and overhead. Unless you've been through it, you have no idea how nasty a licensing flamewar can get; people become passionate because the licenses are regarded as almost-sacred covenants touching the core values of the open-source community.

Furthermore, the presence of an established interpretive tradition may prove important if your license is ever tested in court. At time of writing (early 2001) there is no case law either supporting or invalidating any open-source license. However, it is a legal doctrine (at least in the U.S., and probably in other common-law countries such as England and the rest of the British Commonwealth) that courts are supposed to interpret licenses and contracts according to the expectations and practices of the community in which they originated. There is thus good reason to hope that open-source community practice will be determinative when the court system finally has to cope.

Varieties of Open-Source Licensing

MIT or X Consortium license

The loosest kind of free-software license is one that grants unrestricted rights to copy, use, modify, and redistribute modified copies as long as a copy of the copyright and license terms is retained in all modified versions.

You can find a template for the standard X Consortium license at http://www.opensource.org/licenses/mit-license.html.

Most "shareware" licenses have terms like this as well. They may request a donation, but they don't make it a condition of use.

BSD Classic License

The next most restrictive kind of license grants unrestricted rights to copy, use, modify, and redistribute modified copies as long as a copy of the copyright and license terms is retained in all modified versions, and an acknowledgement is made in advertising or documentation associated with the package.

The original BSD license is the best-known license of this kind. Among parts of the free-software culture that trace their lineages back to BSD Unix, this license is used even on a lot of free software that was written thousands of miles from Berkeley.

It is also not uncommon to find minor variants of the BSD license that change the copyright holder and omit the advertising requirement (making it effectively equivalent to the MIT license). Note that in mid-1999 the Office of Technology Transfer of the University of California rescinded the advertising clause in the BSD license. So the license on the BSD software has been relaxed in exactly this way.

You can find a BSD license template at http://www.opensource.org/licenses/bsd-license.html.

Artistic License

The next most restrictive kind of license grants unrestricted rights to copy, use, and locally modify. It allows redistribution of modified binaries, but restricts redistribution of modified sources in ways intended to protect the interests of the authors and the free-software community.

The `Artistic License', devised for Perl and widely used in the Perl developer community, is of this kind. It requires modified files to contain "prominent notice" that they have been altered. It also requires people who redistribute changes to make them freely available and make efforts to propagate them back to the free-software community.

You can find a copy of the Artistic License at http://www.opensource.org/licenses/artistic-license.html.

General Public License

The GNU General Public License (and its derivative, the Library or ``Lesser'' GPL) is the single most widely used free-software license. Like the Artistic License, it allows redistribution of modified sources provided the modified files bear "prominent notice".

The GPL also requires that interactive programs licensed under GPL include a startup banner referring to the GPL. It also requires that any program containing parts that are under GPL be wholly GPLed. (The exact circumstances that trigger this requirement are not perfectly clear to everybody.)

These extra requirements actually make the GPL more restrictive than any of the other commonly-used licenses. (Larry Wall developed the Artistic License to avoid them while serving many of the same objectives.)

You can find a pointer to the GPL, and instructions about how to apply it, at http://www.gnu.org/copyleft.html.

Mozilla Public License

The Mozilla Public License is designed to support software which is open source, but may be linked with closed-source modules or extensions. It requires that the distributed software ("Covered Code") remain open, but permits add-ons called through a defined API to remain closed.

You can find a template for the MPL at http://www.opensource.org/licenses/MPL-1.0.html.

The Open Source Definition

(Version 1.7)

Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria:

1. Free Redistribution

The license may not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license may not require a royalty or other fee for such sale.

2. Source Code

The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost -- preferably, downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.

3. Derived Works

The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.

4. Integrity of The Author's Source Code.

The license may restrict source-code from being distributed in modified form only if the license allows the distribution of "patch files" with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.

5. No Discrimination Against Persons or Groups.

The license must not discriminate against any person or group of persons.

6. No Discrimination Against Fields of Endeavor.

The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

7. Distribution of License.

The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.

8. License Must Not Be Specific to a Product.

The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.

9. License Must Not Contaminate Other Software.

The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.

Rationale for the Open Source Definition

The intent of the Open Source Definition is to write down a concrete set of criteria that we believe capture the essence of what the software development community wants ``Open Source'' to mean -- criteria that ensure that software distributed under an open-source license will be available for independent peer review and continuous evolutionary improvement and selection, reaching levels of reliability and power no closed product can attain.

For the evolutionary process to work, we have to counter short-term incentives for people to stop contributing to the software gene pool. This means the license terms must prevent people from locking up software where very few people can see or modify it.

When software developers distribute their software under OSI approved software licenses, they can apply the "OSI Certified" mark to that software. This certification mark informs users of that software that the license complies with the intent of the Open Source Definition.

1. Free Redistribution

By constraining the license to require free redistribution, we eliminate the temptation to throw away many long-term gains in order to make a few short-term sales dollars. If we didn't do this, there would be lots of pressure for cooperators to defect.

2. Source Code

We require access to un-obfuscated source code because you can't evolve programs without modifying them. Since our purpose is to make evolution easy, we require that modification be made easy.

3. Derived Works

The mere ability to read source isn't enough to support independent peer review and rapid evolutionary selection. For rapid evolution to happen, people need to be able to experiment with and redistribute modifications.

4. Integrity of The Author's Source Code

Encouraging lots of improvement is a good thing, but users have a right to know who is responsible for the software they are using. Authors and maintainers have a reciprocal right to know what they're being asked to support and to protect their reputations.

Accordingly, an open-source license must guarantee that source be readily available, but may require that it be distributed as pristine base sources plus patches. In this way, "unofficial" changes can be made available but readily distinguished from the base source.
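To make "pristine base sources plus patches" concrete, here is a small sketch that produces a patch in unified-diff format using Python's standard difflib module. The function name make_patch and the file name hello.c are hypothetical; the point is only that the original text stays untouched while the change travels separately.

```python
import difflib


def make_patch(pristine, modified, name="hello.c"):
    """Return a unified diff that turns the pristine text into the modified text."""
    return "".join(
        difflib.unified_diff(
            pristine.splitlines(keepends=True),
            modified.splitlines(keepends=True),
            fromfile=name + ".orig",  # the pristine base source
            tofile=name,              # the "unofficial" modified version
        )
    )


pristine = 'puts("hello");\n'
modified = 'puts("hello, world");\n'
patch = make_patch(pristine, modified)
```

Anyone holding the pristine source and this patch can rebuild the modified version, and can also see at a glance exactly what was changed and what was not.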

5. No Discrimination Against Persons or Groups.

In order to get the maximum benefit from the process, the maximum diversity of persons and groups should be equally eligible to contribute to open sources. Therefore we forbid any open-source license from locking anybody out of the process.

6. No Discrimination Against Fields of Endeavor.

The major intention of this clause is to prohibit license traps that prevent open source from being used commercially. We want commercial users to join our community, not feel excluded from it.

7. Distribution of License.

This clause is intended to forbid closing up software by indirect means such as requiring a non-disclosure agreement.

8. License Must Not Be Specific to a Product.

This clause forecloses yet another class of license traps.

9. License Must Not Contaminate Other Software.

People who want to use or redistribute open-source software have the right to make their own choices about their own software.

When You Need A Lawyer

This section is directed to commercial developers considering incorporating software that falls under one of these standard licenses into closed-source products.

Having gone through all this legal verbiage, the expected thing for us to do at this point is to utter a somber disclaimer to the effect that we are not lawyers, and that if you have any doubts about the legality of something you want to do with free software, you should immediately consult a lawyer.

With all due respect to the legal profession, this would be fearful nonsense. The language of these licenses is as clear as legalese gets (they were written to be clear) and should not be at all hard to understand if you read it carefully. The lawyers and courts are actually more confused than you are. The law of software rights is murky, and case law on free-software licenses is (as of fall 2000) nonexistent; no one has ever been sued under them.

This means a lawyer is unlikely to have any better insight than a careful lay reader. But lawyers are professionally paranoid about anything they don't understand. So if you ask one, he's almost certainly going to tell you that you shouldn't go anywhere near open-source software, despite the fact that he probably doesn't understand the technical aspects or the author's intentions anywhere near as well as you do.

Finally, the people who put their work under open-source licenses are generally not mega-corporations attended by schools of lawyers looking for blood in the water. They're individuals or volunteer groups who mainly want to give their software away. Your odds of getting hauled into court on an innocent technical violation are probably lower than your chances of being struck by lightning in the next week.

This isn't to say you should treat these licenses as jokes. That would be disrespectful of the creativity and sweat that went into the software, and you wouldn't enjoy being the first litigation target of an enraged author no matter how the suit came out. But if you make a visible good-faith effort to meet the author's intentions, you should be fine.

Under these licenses, the only kind of open-source use you should really worry about is actual incorporation of the free-software code into a proprietary product (as opposed, say, to merely using open-source development tools to make your product). If you're prepared to include proper license acknowledgements and pointers to the source code you're using in your product documentation, even direct incorporation should be safe for any license looser than the GPL.
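The rule of thumb in the paragraph above is simple enough to write down as code. This is only a sketch of that rule as stated here, not a legal test. The ordering of licenses follows this chapter's own ranking from loosest to most restrictive, and the names License and incorporation_is_low_risk are invented for the example.

```python
from enum import IntEnum


class License(IntEnum):
    # Ordered from loosest to most restrictive, following this chapter's ranking.
    PD = 0
    MIT = 1
    BSD = 2
    ARTISTIC = 3
    GPL = 4


def incorporation_is_low_risk(license, incorporates_code, acknowledges_license):
    """Apply the chapter's rule of thumb to one open-source package.

    incorporates_code: True if the package's code goes into the proprietary
        product itself, False if it is merely used as a development tool.
    acknowledges_license: True if the product documentation carries the
        license acknowledgements and pointers to the source.
    """
    if not incorporates_code:
        return True  # merely using open-source tools is not the worrying case
    return license < License.GPL and acknowledges_license
```

For example, incorporation_is_low_risk(License.BSD, True, True) is true, while any direct incorporation of GPLed code comes back false and sends you on to the discussion of clause 2(b).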

The GPL is the only sticky case. And it's clause 2(b), requiring that any derivative work of a GPLed program itself be GPLed, that causes the controversy. (Clause 3(b) requiring licensors to make source available on physical media on demand used to cause some, but the Internet explosion has made publishing source code archives a la 3(a) so cheap that nobody worries about the source-publication requirement any more.)

Nobody is quite certain what the ``contains or is derived from'' language in clause 2(b) means, nor what kinds of use are protected by the ``mere aggregation'' language a few paragraphs later.

Some people think the 2(b) language is deliberately designed to infect every part of any commercial program that uses even a snippet of GPLed code; such people refer to it as the GPV, or ``General Public Virus''. Others think the ``mere aggregation'' language covers everything short of mixing GPL and non-GPL code in the same compilation or linkage unit.

This uncertainty has caused enough agitation in the open-source community that the FSF had to develop the special, slightly more relaxed ``Library GPL'' (which they have since renamed the ``Lesser GPL'') to reassure people they could continue to use the runtime libraries that come with FSF's GNU C compiler.

You'll have to choose your own interpretation of clause 2(b); most lawyers will not understand the issues involved, and there is no case law. As a matter of empirical fact, the FSF has (up to February 2001, at least) never sued anyone under the GPL since it was founded in 1985. And, as another empirical fact, Netscape includes the source and object of a GPLed program with the commercial distribution of its Netscape Navigator browser.

Open-Source Software in the Rest Of This Book

In the rest of this book, we will often be referring the reader to open-source development tools. We want to reinforce here a point we implicitly made earlier -- we are going to recommend these tools not because they are freely available but because they are the best available. Many have outcompeted proprietary alternatives; in some cases, they are so good that they have left no market niche for closed-source developers to enter.

There are many reasons Unix open-source software tends to be of high quality. We've touched on them here, and will discuss the phenomenon in more detail in Chapter 12 on the open development model. For now we'll observe that all the forces that tend to make free software better have the most impact on development tools -- programs that are in daily use by a large and able population of developers.

As a general rule, you will find that any kind of development tool or application that is in constant use by open-source developers is done better in open source than in any of its commercial alternatives. In particular, you can use the tools we recommend in this book with confidence; they have been through the fire.


Eric S. Raymond <esr@snark.thyrsus.com>