"When the superior man refrains from acting, his force is felt for a thousand miles."-- Tao Te Ching (as popularly mistranslated)
Thinking time is precious and very expensive relative to all the other overheads that go into software development; accordingly, it should be spent solving new problems rather than rehashing old ones for which known solutions already exist. This attitude gives the best return both in the ``soft'' terms of developing human capital and in the ``hard'' terms of economic return on development investment.
The most effective way to avoid reinventing the wheel is to borrow someone else's design and implementation of it. In other words, to reuse code.
The virtuousness of code reuse is one of the great apple-pie-and-motherhood verities of software development. When you're stuck with reusing proprietary object-code libraries, however, it can seem pretty unattractive. They're often poorly documented, buggy, inflexible, and not quite what you want.
Accordingly, programmers outside the Unix world often get into the habit of coding all their service functions from scratch. Or, if they try to reuse, they too often find they have to spend more effort probing a library's behavior and then coding around its kinks and bugs than they would have if they had coded from scratch.
These frustrations are far less likely to bite when the code you are attempting to reuse is available in source. Decently commented source code is its own documentation. Bugs in source code can be fixed. Source can be instrumented and compiled for debugging to make probing its behavior in obscure cases easier. And if you need to change its behavior, you can do that.
In the early days of Unix, components of the operating system, its libraries, and its associated utilities were passed around in source code; this openness was a vital part of the Unix culture. When this tradition was disrupted in the 1980s, Unix lost both its initial momentum and contact with its own roots. A decade later, the rise of the GNU toolkit and Linux prompted a rediscovery of the value of open-source code.
Today, open-source code is again one of the most powerful tools in any Unix programmer's kit. Accordingly, though the explicit concept of "open source" and the most widely used open-source licenses are decades younger than Unix itself, it's important to understand both in order to understand the dynamism of today's Unix culture.
In this chapter, we'll survey various issues associated with reusing open-source code: evaluation, documentation, and licensing. In Chapter 12 we'll discuss the open-source development model more generally.
Programmers from outside the Unix world are often prone to think open-source (or `free') software is necessarily inferior to the commercial kind, that it's shoddily made and unreliable and will cause one more headaches than it saves. They miss an important point -- in general, open-source software is written by people who care about it, need it, use it themselves, and are putting their individual reputations among their peers on the line by publishing it. They are therefore more strongly motivated to do a good job than wage slaves toiling Dilbert-like in the cubicles of proprietary software houses.
Furthermore, the open-source user community (those peers) is not shy about nailing bugs, and its standards are high. Authors who put out substandard work experience a lot of social pressure to fix their code or withdraw it, and can get a lot of skilled help fixing it if they choose. As a result, mature open-source packages are generally of high quality and often functionally superior to any proprietary equivalent. They may lack polish and have documentation that assumes much, but the vital parts will usually work quite well.
If you are a programmer from outside the Unix world, you may find this claim difficult to believe (and not without reason; it is an unfortunate fact that other OSs, when they have source-sharing cultures at all, tend to have weak and amateurish ones that only rarely produce high-quality work). If so, consider this: on modern Unixes, the C compiler itself is almost invariably open-source. The Free Software Foundation's GNU C Compiler (GCC) is so powerful, so well documented, and so reliable that there is effectively no commercial Unix compiler market left, and it has become normal for Unix vendors to port GCC to their platforms rather than do in-house compiler development.
The way to evaluate an open-source package is to read its documentation and skim some of its code. If what you see appears to be competently written and documented with care, be encouraged. If there also is evidence that the package has been around for a while and incorporated substantial user feedback, you may safely bet that it is quite reliable.
A good gauge of maturity and the volume of user feedback is the number of people besides the original author mentioned in the README and project news or history files in the source distribution. Credits to lots of people for sending in fixes and patches are signs both of a significant user base keeping the authors on their toes, and of a conscientious maintainer who is responsive to feedback and will take corrections.
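A rough version of this gauge can be automated. The sketch below is only a heuristic of our own devising: it assumes that credited contributors appear with e-mail addresses in the README and news files, which many projects do not guarantee, and the names `credited_addresses` and `contributor_count` and the sample texts are hypothetical.

```python
import re

# Heuristic pattern for e-mail addresses; an assumption, not a full RFC parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def credited_addresses(text):
    """Return the set of distinct e-mail addresses in text, lowercased."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


def contributor_count(texts, author_address=None):
    """Count distinct addresses across several files, excluding the author's."""
    seen = set()
    for text in texts:
        seen |= credited_addresses(text)
    if author_address is not None:
        seen.discard(author_address.lower())
    return len(seen)


# Hypothetical README and NEWS contents, for illustration only.
sample_readme = (
    "Maintainer: J. Random Hacker <jrh@example.org>\n"
    "Thanks to Alice Example <alice@example.com> for the build fixes.\n"
)
sample_news = (
    "1.2: patch from bob@example.net fixes a crash.\n"
    "1.1: second fix from Alice Example <Alice@Example.com>.\n"
)

print(contributor_count([sample_readme, sample_news],
                        author_address="jrh@example.org"))
```

A count like this is no substitute for actually reading the credits, but it is a quick first filter when comparing several candidate packages.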
It's also a good omen when the software has its own web page, on-line FAQ (Frequently-Asked Questions) list, and an associated mailing list or USENET newsgroup. These are all signs that a live and substantial community of interest has grown up around the software. Packages that are duds just don't get this kind of continuing investment, because they can't reward it.
Here are some examples of what web pages associated with high-quality open-source software look like:
Recently, the SourceForge archive at http://www.sourceforge.net has become extremely important. In the future it may take over Metalab's crown as the most important open-source archive in the world.
These archives are general-purpose and contain code in many languages, but most of their content is C or C++. There are also sites specialized around some of the interpreted languages we'll look at in Chapter 3.
The CPAN archive is the central repository for useful free code in Perl. It is easily reached from the Perl home page at URL http://www.perl.com/perl.
The Python Software Activity makes an archive of Python software and documentation available at the Python Home Page, URL http://www.dstc.edu.au/www.python.org.
Many Java applets and pointers to other sites featuring free Java software are made available at URL http://java.sun.com/applets/.
One of the most valuable ways you can invest your time as a Unix developer is to spend time wandering around these sites learning what is available for you to reuse. The coding time you save may be your own!
Documentation is often a more serious issue. Many high-quality open-source packages are less useful than they technically ought to be due to poor documentation. Unix tradition encourages a rather hieratic style of documentation, one which (while it may technically capture all of a package's features) assumes that the reader is intimately familiar with the application domain and reading very carefully.
Thus, for example, manual pages for Unix utilities often consist of a terse one-sentence summary of the utility's function, followed by a bewildering list of minutely-described command-line options, followed by a cursory description of its theory of operation, followed by references to related utilities. This sort of thing makes an excellent reference but a very daunting introduction.
The best advice we can give is: pay careful attention. What you need to know will probably be there, but you're likely to have to read the entire document carefully and think about each sentence in context before achieving enlightenment.
The most serious issue in reusing open-source software (especially in any kind of commercial product) is understanding what obligations, if any, the package's license puts upon you. In the next two sections we'll discuss this issue in detail.
Placed in public domain by J. Random Hacker, 2000. Share and enjoy!
If you do this, you are surrendering your copyright. Anyone can do anything they like with any part of the text. It doesn't get any freer than this. Very little open-source software is actually placed in the public domain. On Metalab (the largest single archive of open-source in the Linux world) in mid-1997, only 3% of over 2600 software packages and documents announced PD status.
Who counts as an author can be very complicated, especially for software that has been worked on by many hands. This is why licenses are important. By setting out the terms under which material can be used, they grant rights to the users that protect them from arbitrary actions by the copyright holders.
In proprietary software, the license terms are designed to protect the copyright. They're a way of granting a few rights to users while reserving as much legal territory as possible for the owner (the copyright holder). The copyright holder is very important, and the license logic so restrictive that the exact technicalities of the license terms are usually unimportant.
In open-source software, the situation is usually the exact opposite; the copyright exists to protect the license. The only rights the copyright holder always keeps are to enforce the license. Otherwise, only a few rights are reserved and most choices pass to the user. In particular, the copyright holder cannot change the terms on a copy you already have. Therefore, in open-source software the copyright holder is almost irrelevant -- but the license terms are very important.
Normally the copyright holder of a project is the current project leader or sponsoring organization. Transfer of the project to a new leader is often signaled by changing the copyright holder. However, this is not a hard and fast rule; many open-source projects have multiple copyright holders, and there is no instance on record of this leading to legal problems. Some projects choose to assign copyright to the Free Software Foundation, on the theory that it has an interest in defending open source and lawyers available to do it.
The Open Source Definition is the result of a great deal of thought about what makes software ``open source'' or (in older terminology) ``free''. Its constraints on licensing require that:
The OSD is the legal definition of the `OSI Certified Open Source' certification mark, and as good a definition of ``free software'' as anyone has ever come up with. All of the standard licenses (MIT, BSD, Artistic, and GPL/LGPL) meet it (though some, like GPL, have other restrictions which you should understand before choosing them).
Note that licenses which allow noncommercial use only do not qualify as open-source licenses, even if they are based on ``GPL'' or some other standard license. They discriminate against particular occupations, persons, and groups. They make life prohibitively complicated for CD-ROM distributors and others trying to spread open-source software commercially.
If you must write your own license, be sure to have it certified by OSI. This will avoid a lot of argument and overhead. Unless you've been through it, you have no idea how nasty a licensing flamewar can get; people become passionate because the licenses are regarded as almost-sacred covenants touching the core values of the open-source community.
Furthermore, the presence of an established interpretive tradition may prove important if your license is ever tested in court. At time of writing (early 2001) there is no case law either supporting or invalidating any open-source license. However, it is a legal doctrine (at least in the U.S., and probably in other common-law countries such as England and the rest of the British Commonwealth) that courts are supposed to interpret licenses and contracts according to the expectations and practices of the community in which they originated. There is thus good reason to hope that open-source community practice will be determinative when the court system finally has to cope.
You can find a template for the standard X Consortium license at http://www.opensource.org/licenses/mit-license.html.
Most "shareware" licenses have terms like this as well. They may request a donation, but they don't make it a condition of use.
The original BSD license is the best-known license of this kind. Among parts of the free-software culture that trace their lineages back to BSD Unix, this license is used even on a lot of free software that was written thousands of miles from Berkeley.
It is also not uncommon to find minor variants of the BSD license that change the copyright holder and omit the advertising requirement (making it effectively equivalent to the MIT license). Note that in mid-1999 the Office of Technology Transfer of the University of California rescinded the advertising clause in the BSD license. So the license on the BSD software has been relaxed in exactly this way.
You can find a BSD license template at http://www.opensource.org/licenses/bsd-license.html.
The `Artistic License', devised for Perl and widely used in the Perl developer community, is of this kind. It requires modified files to contain "prominent notice" that they have been altered. It also requires people who redistribute changes to make them freely available and make efforts to propagate them back to the free-software community.
You can find a copy of the Artistic License at http://www.opensource.org/licenses/artistic-license.html.
The GPL also requires that interactive programs licensed under GPL include a startup banner referring to the GPL. It also requires that any program containing parts that are under GPL be wholly GPLed. (The exact circumstances that trigger this requirement are not perfectly clear to everybody.)
These extra requirements actually make the GPL more restrictive than any of the other commonly-used licenses. (Larry Wall developed the Artistic License to avoid them while serving many of the same objectives.)
You can find a pointer to the GPL, and instructions about how to apply it, at http://www.gnu.org/copyleft.html.
You can find a template for the MPL at http://www.opensource.org/licenses/MPL-1.0.html.
Open source doesn't just mean access to the source code. The
distribution terms of open-source software must comply with the
following criteria:
1. Free Redistribution
The license may not restrict any party from selling or giving away the
software as a component of an aggregate software distribution containing
programs from several different sources. The license may not require a
royalty or other fee for such sale.
2. Source Code
The program must include source code, and must allow distribution in
source code as well as compiled form. Where some form of a product is
not distributed with source code, there must be a well-publicized
means of obtaining the source code for no more than a reasonable
reproduction cost -- preferably, downloading via the Internet without
charge. The source code must be the preferred form in which a
programmer would modify the program. Deliberately obfuscated source
code is not allowed. Intermediate forms such as the output of a
preprocessor or translator are not allowed.
3. Derived Works
The license must allow modifications and derived works, and must allow
them to be distributed under the same terms as the license of the original
software.
4. Integrity of The Author's Source Code.
The license may restrict source-code from being distributed in modified
form only if the license allows the distribution of "patch files" with
the source code for the purpose of modifying the program at build time.
The license must explicitly permit distribution of software built from
modified source code. The license may require derived works to carry a
different name or version number from the original software.
5. No Discrimination Against Persons or Groups.
The license must not discriminate against any person or group of persons.
6. No Discrimination Against Fields of Endeavor.
The license must not restrict anyone from making use of the program in
a specific field of endeavor. For example, it may not restrict the program
from being used in a business, or from being used for genetic research.
7. Distribution of License.
The rights attached to the program must apply to all to whom the program
is redistributed without the need for execution of an additional license
by those parties.
8. License Must Not Be Specific to a Product.
The rights attached to the program must not depend on the program's being
part of a particular software distribution. If the program is extracted
from that distribution and used or distributed within the terms of the
program's license, all parties to whom the program is redistributed should
have the same rights as those that are granted in conjunction with the
original software distribution.
9. License Must Not Contaminate Other Software.
The license must not place restrictions on other software that is distributed
along with the licensed software. For example, the license must not insist
that all other programs distributed on the same medium must be open-source
software.
For the evolutionary process to work, we have to counter short-term incentives for people to stop contributing to the software gene pool. This means the license terms must prevent people from locking up software where very few people can see or modify it.
When software developers distribute their software under OSI approved software licenses, they can apply the "OSI Certified" mark to that software. This certification mark informs users of that software that the license complies with the intent of the Open Source Definition.
Accordingly, an open-source license must guarantee that source be readily available, but may require that it be distributed as pristine base sources plus patches. In this way, "unofficial" changes can be made available but readily distinguished from the base source.
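To make the "pristine base sources plus patches" arrangement concrete, here is a minimal sketch that produces a patch in unified-diff format with Python's standard `difflib` module. The file name `hello.c`, its contents, and the helper `make_patch` are hypothetical; real projects would normally generate such patches with diff(1) and apply them with patch(1).

```python
import difflib

# Pristine base source as shipped by the original author (hypothetical).
pristine = [
    "int main(void)\n",
    "{\n",
    "    return 0;\n",
    "}\n",
]

# The same file after an "unofficial" local change.
modified = [
    "int main(void)\n",
    "{\n",
    '    puts("hello");\n',
    "    return 0;\n",
    "}\n",
]


def make_patch(original_lines, changed_lines, name):
    """Return a unified diff that turns original_lines into changed_lines."""
    diff = difflib.unified_diff(
        original_lines,
        changed_lines,
        fromfile=name + ".orig",
        tofile=name,
    )
    return "".join(diff)


patch_text = make_patch(pristine, modified, "hello.c")
print(patch_text)
```

Because the patch travels separately from the base source, a reader can always tell which lines came from the original author and which came from someone else.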
Having gone through all this legal verbiage, the expected thing for us to do at this point is to utter a somber disclaimer to the effect that we are not lawyers, and that if you have any doubts about the legality of something you want to do with free software, you should immediately consult a lawyer.
With all due respect to the legal profession, this would be fearful nonsense. The language of these licenses is as clear as legalese gets (they were written to be clear) and should not be at all hard to understand if you read it carefully. The lawyers and courts are actually more confused than you are. The law of software rights is murky, and case law on free-software licenses is (as of fall 2000) nonexistent; no one has ever been sued under them.
This means a lawyer is unlikely to have any better insight than a careful lay reader. But lawyers are professionally paranoid about anything they don't understand. So if you ask one, he's almost certainly going to tell you that you shouldn't go anywhere near open-source software, despite the fact that he probably doesn't understand the technical aspects or the author's intentions anywhere near as well as you do.
Finally, the people who put their work under open-source licenses are generally not mega-corporations attended by schools of lawyers looking for blood in the water. They're individuals or volunteer groups who mainly want to give their software away. Your odds of getting hauled into court on an innocent technical violation are probably lower than your chances of being struck by lightning in the next week.
This isn't to say you should treat these licenses as jokes. That would be disrespectful of the creativity and sweat that went into the software, and you wouldn't enjoy being the first litigation target of an enraged author no matter how the suit came out. But if you make a visible good-faith effort to meet the author's intentions, you should be fine.
Under these licenses, the only kind of open-source use you should really worry about is actual incorporation of the free-software code into a proprietary product (as opposed, say, to merely using open-source development tools to make your product). If you're prepared to include proper license acknowledgements and pointers to the source code you're using in your product documentation, even direct incorporation should be safe for any license looser than the GPL.
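As an illustration of what such acknowledgements might look like, the following sketch builds a plain-text notice for inclusion in product documentation. The component names, license labels, URLs, and the `acknowledgements` helper are all hypothetical, and the exact wording a given license asks for should be taken from the license itself rather than from this example.

```python
def acknowledgements(components):
    """Format (name, license, source URL) triples as a plain-text notice."""
    lines = ["This product includes the following open-source software:", ""]
    for name, license_name, source_url in sorted(components):
        lines.append(f"  {name} ({license_name} license)")
        lines.append(f"    source: {source_url}")
    return "\n".join(lines)


# Hypothetical components bundled with a product.
components = [
    ("zfoo", "MIT", "http://example.org/zfoo/"),
    ("libbar", "BSD", "http://example.org/libbar/"),
]

notice = acknowledgements(components)
print(notice)
```

Keeping the component list as data makes it easy to regenerate the notice whenever a dependency is added or dropped.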
The GPL is the only sticky case. And it's clause 2(b), requiring that any derivative work of a GPLed program itself be GPLed, that causes the controversy. (Clause 3(b) requiring licensors to make source available on physical media on demand used to cause some, but the Internet explosion has made publishing source code archives a la 3(a) so cheap that nobody worries about the source-publication requirement any more.)
Nobody is quite certain what the ``contains or is derived from'' language in clause 2(b) means, nor what kinds of use are protected by the ``mere aggregation'' language a few paragraphs later.
Some people think the 2(b) language is deliberately designed to infect every part of any commercial program that uses even a snippet of GPLed code; such people refer to it as the GPV, or ``General Public Virus''. Others think the ``mere aggregation'' language covers everything short of mixing GPL and non-GPL code in the same compilation or linkage unit.
This uncertainty has caused enough agitation in the open-source community that the FSF had to develop the special, slightly more relaxed ``Library GPL'' (which they have since renamed the ``Lesser GPL'') to reassure people they could continue to use the runtime libraries that come with FSF's GNU C compiler.
You'll have to choose your own interpretation of clause 2(b); most lawyers will not understand the issues involved, and there is no case law. As a matter of empirical fact, the FSF has (up to February 2001, at least) never sued anyone under the GPL since the organization was founded in 1985. And, as another empirical fact, Netscape includes the source and object of a GPLed program with the commercial distribution of its Netscape Navigator browser.
There are many reasons Unix open-source software tends to be of high quality. We've touched on them here, and will discuss the phenomenon in more detail in Chapter 12 on the open development model. For now we'll observe that all the forces that tend to make free software better have the most impact on development tools -- programs that are in daily use by a large and able population of developers.
As a general rule, you will find that any kind of development tool or application that is in constant use by open-source developers is done better in open source than in any of its commercial alternatives. In particular, you can use the tools we recommend in this book with confidence; they have been through the fire.