-
- ══════════════════════════════════════
-
- 2. SOURCE CODE GUIDELINES
-
- ══════════════════════════════════════
-
-
- The objective of this unit is to set a framework in
- which you are invited to use, respond to, and improve upon, the
- techniques and software in the MIR series.
-
-
- ═════════════════════════════════════════════
- 2.1 Needs of the information searcher
- ═════════════════════════════════════════════
-
- Good software is user- or market-driven. (We've said
- that before!) At this point let's take a closer look at the needs
- of a potential user, putting ourselves in the place of a person who
- wants to retrieve information quickly and easily from large masses
- of data. What are the things that matter? The following items are
- mentioned again and again.
-
-
- The value of time: The time factor shows up in two
- ways... learning and system response.
-
- If it takes a course or five days of seminars to learn
- how to use a program, most potential users are frightened away.
- Buyer resistance sets in if the salesperson doesn't invite you to
- take over early in a demonstration. People don't even want to take
- the time to read a manual; the majority rarely do!
-
- System response has to do with how much time elapses
- until the program reacts to a new instruction from the user. If a
- computer program replaces a manual method that took far longer,
- people tolerate delays... at first. But over a few months of use,
- they grow impatient with slow response. For long term comfort and
- acceptance, three quarters of a second seems to be the threshold of
- tolerance. You might think in terms of a "three-quarters of a
- second, 95% of the time" rule for computer programming. In other
- words, make sure that the program, at least 19 times out of 20,
- shows some response to new instructions from the user within three-
- quarters of a second. That standard is not as difficult to achieve
- as it might appear.
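-
- As a rough sketch, the rule can be checked against a set of
- measured timings. The function name and the use of milliseconds
- below are assumptions made for illustration; they are not part of
- the MIR software.
-

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when at least 95% of the measured
 * response times (in milliseconds) are at or under 750 ms, that is,
 * the "three-quarters of a second, 95% of the time" rule; else 0.
 * An empty sample set is treated as not meeting the rule. */
int meets_response_rule(const long *times_ms, size_t count)
{
    size_t i;
    size_t fast = 0;        /* samples answered within 750 ms */

    if (count == 0)
        return 0;
    for (i = 0; i < count; i++)
    {
        if (times_ms[i] <= 750L)
            fast++;
    }
    /* fast / count >= 95 / 100, kept in integer arithmetic */
    return fast * 100 >= count * 95;
}
```

-
- With 20 samples, one slow response still passes (19 out of 20);
- two slow responses do not.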
-
-
-
- Simplicity, simplicity, simplicity: How do we feel
- about a computer program that requires combinations of two or three
- keys, such as ALT-Shift-H to get help? Or a program where we
- search in vain for any logic or consistency in the instruction set?
- So long as we have "user friendly" software like that, we need no
- enemies.
-
- Simplicity reduces learning time. Simplicity rescues
- the user from becoming hostage to the manual or to instruction
- summary cards (especially after not having used the program for a
- few weeks). Uniform simplicity across many programs is even
- better, because the user becomes free to move easily from one
- program to another. To the computer firms that promote intuitive,
- uniform simplicity, a vote of thanks!
-
- Achieving simplicity for the user places extra demands
- on the designer and programmer. But the trade-off between those
- demands and the costs of turning hundreds or thousands of potential
- users into reluctant technicians is no contest at all. Programmers
- and designers, go the second mile and make it easy for the user.
- The marketplace in the long run will reward you for it.
-
- The argument against simplicity is that programs will
- lack power that more sophisticated users demand. The answer to
- that is to bury the extra power in segments of the program or
- behind optional instructions or menus so that they do not intrude
- on the person who does not want or is not ready for the extra
- features. And when the extra features are turned on, still keep it
- simple!
-
-
- Control: Who is in control... the program or the
- person running it? The program is in control:
-
- » if it is possible to get locked into a situation from
- which there is no escape except through reading large
- sections of the manual (or rebooting);
-
- » if the person has to tab through several fields to get
- to the next place where data is to be changed or added;
-
- » if the user is forced to traverse a maze of menus
- long after becoming familiar with the program;
-
- » if it takes more than two or three keystrokes or
- mouse-clicks to quit the program from anywhere
- whatsoever within the program.
-
- These are examples only. People like the feeling that
- they themselves are running things, not some faceless programmer or
- designer.
-
-
- Freedom from a ticking clock: One feature of
- centralized processing is calculated to devastate the human
- psyche... the message when logging out that one has used several
- hundred dollars worth of computer time. It rarely hits the
- worker's wallet directly, but the message is still received...
- guilt, guilt, guilt. It takes the fun out of computer use.
-
- This need argues for more efficient programs. In the
- case of centralized processing, the task is finished faster and the
- computer resources and any communication costs are lower. Truly
- efficient programs open the way to distributed processing, where
- the user can leave a personal computer running for hours at a
- marginal cost of only a few cents for electricity. Communication
- costs are nil. Guilt over the ticking clock is gone.
-
-
- Freedom from obscure error messages: Capital
- punishment is outside the scope of tutorials on indexing and
- retrieval. But isn't there a feeling of moral satisfaction in
- considering the ultimate deterrent for programmers who wish upon us
- in mid-program gems like: "CANNOT ALLOC MORE MEMORY"? Somewhat
- worse are the error messages that seem to be in plain language but
- whose content turns out to have nothing to do with the actual
- problem.
-
-
- Freedom from the curse of codes: What's wrong with
- plain language?
-
-
- Language of choice: There are two ways for programmers
- to accommodate users' preference for programs operating in their
- native language. For programs that are not interactive, it is
- sufficient if source code is available so that error messages and
- the program description message can be translated and the program
- recompiled. A well written interactive program has all its screen
- text, prompts, instructions and help messages in a separate file.
- This text file may be translated into other languages. Careful
- attention must be paid to spacing so that messages fit on the
- screen in the space assigned by the program. These files are then
- named according to a convention that is easily recognized by the
- program. Example: ENGLISH.LNG, ESPANOLA.LNG, FRANCAIS.LNG, etc.
- If only one file is present, the program automatically appears in
- that language only. If multiple such files are in the same
- computer directory, the user is offered a choice of languages
- immediately when the program starts up.
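-
- The start-up decision described above might be sketched as
- follows. Scanning the directory is left to the caller because it
- differs from one system to another. The function names are
- hypothetical, and the "no file found" case is an assumption, since
- the text does not cover it.
-

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/* Returns 1 if the file name ends in ".LNG" (upper or lower case)
 * and has at least one character before the extension. */
static int is_language_file(const char *name)
{
    static const char ext[] = ".LNG";
    size_t len = strlen(name);
    size_t i;

    if (len <= 4)
        return 0;
    for (i = 0; i < 4; i++)
    {
        if (toupper((unsigned char) name[len - 4 + i]) != ext[i])
            return 0;
    }
    return 1;
}

/* Hypothetical start-up check.  Given the file names found in the
 * program directory, stores the index of the first language file in
 * *first (-1 if none) and returns how many language files were seen:
 *   0   none found (what to do then is up to the caller)
 *   1   use names[*first] without asking
 *   2+  offer the user a choice of languages */
int count_language_files(const char *names[], int n, int *first)
{
    int i;
    int found = 0;

    *first = -1;
    for (i = 0; i < n; i++)
    {
        if (is_language_file(names[i]))
        {
            if (found == 0)
                *first = i;
            found++;
        }
    }
    return found;
}
```

-
- A return value of 1 means the program simply loads that file; a
- value of 2 or more means it shows a language menu first.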
-
-
- Context-sensitive help: This feature has become quite
- common. The program user may touch a single key (often the F1
- function key on personal computers) to get helpful instructions.
- These help messages are context-sensitive if their content depends
- on what options are open at that point in the program.
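-
- One simple way to arrange this is a table of messages indexed by
- the user's current place in the program. The contexts and the
- wording below are invented for illustration only.
-

```c
/* Hypothetical program states; a real program would define its own. */
enum help_context
{
    CTX_MAIN_MENU,
    CTX_SEARCH_ENTRY,
    CTX_RESULT_LIST,
    CTX_COUNT               /* number of contexts, not a context */
};

static const char *help_text[CTX_COUNT] =
{
    "Choose an item with the arrow keys, then press Enter.",
    "Type the words to look for, then press Enter.",
    "Press Enter to view the highlighted item."
};

/* Returns the help message for the place the user is in right now.
 * Intended to be called when the help key is pressed. */
const char *help_for(int where)
{
    if (where < 0 || where >= CTX_COUNT)
        return "No help is available here.  Press ESC to go back.";
    return help_text[where];
}
```

-
- The program keeps track of its current context and passes it to
- help_for() whenever the help key is touched.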
-
-
- More bang per computer dollar: Personal computers have
- revolutionized the economics of information. Where access is
- needed to up-to-the-minute data (such as seats available on an
- airline flight), processing has to be centralized on large
- machines. That's expensive... the central mainframe computers, the
- communications costs, and the ability to transact data changes that
- become accessible to other people immediately. Personal computers
- come into their own where data can be downloaded or distributed,
- and used at the individual's pleasure. You don't need a million
- dollar computer to search using an index; a personal computer
- costing less than a thousand dollars is capable of nearly instant
- response. Even the complex task of creating an index for a large
- quantity of data requires only a moderate amount of computer
- "horsepower." Personal computers can carry out quite sophisticated
- chores. For example, statistical analysis of survey questionnaires
- is in some cases a simple extension of the use of high speed
- indexes.
-
-
- ═════════════════════════════
- 2.2 Design background
- ═════════════════════════════
-
- Every computer program represents a series of design
- decisions. Some background may help you understand MIR technology
- and software more readily.
-
- Squeezing each bit... the conservationist start: I
- first programmed in 1964 on an IBM 1440 which had 4,096 bytes of
- available RAM. That's for the program and the data. One quickly
- develops a mindset under these conditions... make every bit within
- every byte count for as much as possible. During the 1970s the
- prevailing attitude was to pour hardware resources lavishly on any
- computer problem. I didn't get on the bandwagon. Effectiveness
- and efficiency still mattered. For example, when developing an
- early fourth generation language, I took a kind of perverse pride
- in squeezing data types into minimum byte counts... dates in 16
- bits, postal and zip codes coexisting in 27 bits, area codes and
- telephone numbers in 31 bits (achieved by swapping the first two
- digits in the area code; it works because the second digit of an
- area code is zero or one).
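-
- To make the arithmetic concrete, here is a sketch of two such
- packings. The date layout (7 bits of year since 1900, 4 bits of
- month, 5 bits of day) is an assumption and may not be the layout
- used in the original language. The telephone packing follows the
- digit swap described above; all function names are hypothetical.
-

```c
/* Date in 16 bits: yyyyyyym mmmddddd.  Covers 1900 through 2027. */
unsigned int pack_date(int year, int month, int day)
{
    return ((unsigned int) (year - 1900) << 9)
         | ((unsigned int) month << 5)
         |  (unsigned int) day;
}

void unpack_date(unsigned int packed, int *year, int *month, int *day)
{
    *year  = (int) (packed >> 9) + 1900;
    *month = (int) ((packed >> 5) & 0x0F);
    *day   = (int) (packed & 0x1F);
}

/* Telephone number in 31 bits.  The first two digits of the area code
 * are swapped, so the leading digit becomes the old second digit
 * (0 or 1).  The ten-digit value is then at most 1,999,999,999,
 * which is below 2^31 = 2,147,483,648. */
unsigned long pack_phone(int area, long local)
{
    int a = area / 100;
    int b = (area / 10) % 10;
    int c = area % 10;
    int swapped = b * 100 + a * 10 + c;

    return (unsigned long) swapped * 10000000UL + (unsigned long) local;
}

void unpack_phone(unsigned long packed, int *area, long *local)
{
    int swapped = (int) (packed / 10000000UL);
    int b = swapped / 100;
    int a = (swapped / 10) % 10;
    int c = swapped % 10;

    *area  = a * 100 + b * 10 + c;
    *local = (long) (packed % 10000000UL);
}
```

-
- Without the swap, an area code such as 919 followed by a seven
- digit number would exceed 2^31 and need a 34th bit's worth of range.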
-
- The gigabyte years: Since early in the 1980s the
- majority of my computer work has been in connection with databases
- in the 60 million to 3 billion byte range. The CD-ROM world
- seemed an invitation to be extravagant; space is plentiful. But
- efficiency has an interesting payoff. It affects retrieval
- timings. Why go out to the disc multiple times or why take the
- time to fill huge buffers on each access? Working with compressed
- indexes reduces mechanical head movement, by far the greatest time
- factor. And compressed indexes can be used for remarkably faster
- Boolean operations. More on that later.
-
- UNIX influence: I created the database system that
- later became known as FindIT under the Primos operating system,
- then shifted in 1985 to UNIX. Predecessor versions of many MIR
- software routines were written in the UNIX environment. One moves
- in a world of byte streams, pipes, and bit manipulation. DOS by
- contrast is oriented toward printable data; witness the fact that
- binary data requires special declaration in DOS and cannot be fed
- through pipes. (Stdin and stdout in DOS insert a carriage return
- in the data whenever DOS encounters a binary byte which happens to
- be a linefeed; behavior when a binary CTL-Z is found is even more
- unpleasant.) The MIR programs are DOS versions, but the UNIX-style
- thinking will show through.
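-
- The "special declaration" is the binary flag in the mode string
- when a file is opened. A minimal sketch, with a hypothetical
- function name:
-

```c
#include <stdio.h>

/* Counts the bytes in a file, reading it in binary mode.  The "b" in
 * the mode string asks DOS to hand over every byte untranslated,
 * including linefeeds (0x0A) and CTL-Z (0x1A).  On UNIX the "b" has
 * no effect.  Returns -1 if the file cannot be opened. */
long count_bytes_binary(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

-
- Stdin and stdout are already open when a program starts, so this
- flag cannot be supplied for them in the same way.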
-
- C with a FORtran accent: It's common to learn a
- variety of languages when involved with computer programming over
- an extended period. My early set included Autocoder, Assembler,
- Basic, Cobol, and various forms of machine language. FORtran was
- my language of choice from 1968 to 1985. That long in one language
- gives one an accent when moving on into another language. C
- language was the logical choice for portability and for efficient
- control over byte streams. FORtran doesn't promote modularity as
- much as C; it's more linear in its thought forms. A C purist might
- be shocked at inelegancies and "FORtranisms" in my C code. Fair
- enough, but remember that it works! (As Billy Sunday is reputed to
- have answered a critic: "I like the way I'm doing it better than
- the way you are not doing it.")
-
-
-
- ≡≡≡≡->> QUESTION:
- If you are expert in C++ or in any approach to object
- oriented software, you are strongly encouraged to
- provide alternative coding to any programs offered in
- MIR. Object oriented software is gaining ground
- rapidly. C++ language is nearing critical mass in
- terms of acceptance and numbers of people competent in
- its use.
- <<-≡≡≡≡
-
-
- ════════════════════════════
- 2.3 Design decisions
- ════════════════════════════
-
- Here then are design decisions that are built into MIR
- software. You are not bound by them. But they affect you to the
- extent you may use this material as a starting point.
-
- Language: C language is currently the language of
- choice for widest portability, at least in North America, and
- probably the world. It offers somewhat less power than Assembler
- and less clarity than Basic. C doesn't get the preferential
- treatment given to Pascal in the Macintosh environment.
- Nonetheless C is likely to serve the needs of the widest spectrum
- of potential users of computerized indexing and retrieval.
-
- Hardware: The executable versions are compiled for use
- on IBM-compatible personal computers. This starting point provides
- the widest access, since there are more PCs around than all others
- put together.
-
- Operating system and compiler: Again, access by the
- widest number of potential users favors DOS. I am using
- Microsoft's DOS Version 5.0 in combination with the Microsoft C 6.0
- Programmer's Workbench compiler. Switches are set for ANSI C (to
- eliminate code unique to Microsoft C) and 8086 runtime operability.
-
- Avoiding code that blows up: Some practices, while
- common in C language, lead to messy situations when porting to
- other environments. For example, the "varargs.h" files in Sun and
- DOS differ; the use of "varargs.h" leads to inconsistencies when
- moving from one computer to the other. Therefore variable argument
- routines are not used. (One price... warning and error routines
- are less elegant.) Dynamic memory allocation also has been
- dropped. (Sorry about that.) And as discussed earlier, the
- vagaries of DOS reduce dramatically the ability to pipe binary byte
- streams. Stdin and stdout have been used only when there is
- reasonable certainty that printable files only are involved. To
- anyone working in UNIX, feel free to change back; then you can use
- a series of pipes and avoid the successive creation and deletion of
- work files.
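-
- To show what these two restrictions look like in practice, here is
- a sketch of a warning routine that takes a fixed number of
- arguments and works in a fixed-size buffer. The names and the
- buffer size are assumptions; the actual MIR routines may differ.
-

```c
#include <stdio.h>
#include <string.h>

#define MSG_MAX 128     /* hypothetical limit; raise it for longer text */

/* Builds "program: message: detail" in a caller-supplied fixed buffer
 * without <varargs.h> or malloc().  Text that does not fit is cut off.
 * Returns the length of the stored string. */
size_t build_warning(char out[MSG_MAX], const char *program,
                     const char *message, const char *detail)
{
    out[0] = '\0';
    strncat(out, program, MSG_MAX - 1 - strlen(out));
    strncat(out, ": ",    MSG_MAX - 1 - strlen(out));
    strncat(out, message, MSG_MAX - 1 - strlen(out));
    strncat(out, ": ",    MSG_MAX - 1 - strlen(out));
    strncat(out, detail,  MSG_MAX - 1 - strlen(out));
    return strlen(out);
}

/* Fixed-argument stand-in for a printf-style warning routine. */
void warn(const char *program, const char *message, const char *detail)
{
    char line[MSG_MAX];

    build_warning(line, program, message, detail);
    fprintf(stderr, "%s\n", line);
}
```

-
- The loss of elegance is visible: every caller must supply exactly
- three strings, and numbers have to be converted to text beforehand.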
-
-
-
- ═══════════════════════
- 2.4 Conventions
- ═══════════════════════
-
- Humans use programs: Any MIR program responds with an
- explanation, up to one screen in length, detailing what the program
- is intended to do and what arguments it expects. If you wish to
- see this quick overview for any program, input the program name
- followed by a space and either /U or a question mark. Example:
-
- A_BYTES ?
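-
- A check of this kind could be written along the following lines.
- This is a sketch only; the function name is hypothetical, and
- accepting a lower case /u is an assumption.
-

```c
#include <string.h>

/* Returns 1 when the program was started with a single argument that
 * asks for the overview screen: "?" or "/U" ("/u" also accepted). */
int wants_usage(int argc, char *argv[])
{
    if (argc != 2)
        return 0;
    return strcmp(argv[1], "?") == 0
        || strcmp(argv[1], "/U") == 0
        || strcmp(argv[1], "/u") == 0;
}
```

-
- A program's main() would call wants_usage(argc, argv) before doing
- anything else and, if the answer is 1, print its description and
- exit.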
-
- Normal courtesy to the end user of a program suggests
- that we avoid inhuman messages, either obscure or misleading. By
- the same token, no program is to have an inescapable situation,
- that is, one that gives no direction to the user on how to undo a path
- or leave a program. Depending on handlers installed, CTL-C may or
- may not act as an escape. The most reliable approach is to write
- code so that the program always responds to the escape key (ESC).
-
- Humans read programs: Brian Kernighan and Dennis
- Ritchie are undoubtedly very fine fellows. Whatever induced them
- to inflict on humanity their weird convention in placing brace
- brackets? Programs should be for people in every way... including
- readability for programmers. In MIR C code, matching brace
- brackets are always vertically aligned and their content indented
- so that one can see at a glance their range.
-
- Comments, reasonable variable names, descriptive
- declarations, full descriptions at the top of source code as to
- function, input and output all add to the usefulness of programs.
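-
- A short routine written to these conventions might look like the
- following. It is an illustration only, not taken from MIR source
- code, and the exact indentation used in MIR may differ.
-

```c
/*
 *  count_char: count occurrences of one byte value in a buffer
 *
 *  input:   buf      bytes to examine
 *           length   how many bytes of buf are valid
 *           target   byte value to look for
 *  output:  returns the number of bytes equal to target
 */
long count_char(const unsigned char *buf, long length,
                unsigned char target)
{
    long position;          /* index of the byte being examined */
    long matches = 0;       /* running total of bytes equal to target */

    for (position = 0; position < length; position++)
    {
        if (buf[position] == target)
        {
            matches++;
        }
    }
    return matches;
}
```

-
- Each closing brace sits directly below its opening brace, so the
- range of the loop and of the test can be seen at a glance.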
-
-
- ══════════════════════════════
- 2.5 Use it, improve it
- ══════════════════════════════
-
- Let's turn to what you might do with this material.
-
- It's here for you to use. No royalties need be paid to
- anyone. Identify your needs in the area of indexing and retrieval,
- then select from what is offered in successive MIR tutorial
- releases to match your needs.
-
- As a first time user, you have something of value... a
- fresh perspective. Are there parts of the tutorials that you find
- confusing? Which programs warrant more explanation? May parts be
- safely omitted? Please share your thoughts! An ASCII text file
- "RESPONSE" is included on the diskettes with this tutorial. Make
- a copy of "RESPONSE", edit in your comments, and send it by FAX,
- electronic mail, or regular mail to an address listed at the bottom
- of the RESPONSE template. (Addresses also appear near the top of
- each source code listing.)
-
- You may have noticed on the copyright page and response
- template that Marpex has not invited telephone calls. This is not
- a discourtesy, but a protection. FAX and electronic mail permit
- time shifting; the voice phone does not. I recommend to you a
- little book entitled "Peopleware: Productive Projects and Teams"
- (Tom DeMarco and Timothy Lister... New York: Dorset House
- Publishing, 1987). Look especially at Chapter 8: "You Never Get
- Anything Done Around Here Between 9 and 5", and Chapter 11: "The
- Telephone".
-
- Note that the tutorials and the source code are made
- available to you under two different sets of rules. The tutorials
- are shareware; they may be freely copied, but not changed. Your
- suggestions concerning the text should be sent to the author.
- Under "copyleft" rules, the source code may be changed and
- redistributed widely. As a courtesy, please share your source code
- changes and new programs with us. We will add all the material
- that seems relevant and helpful to later releases. All software
- that you provide must, of course, come under the copyleft rules in
- order for us to distribute it. (And please, please, please, send
- us only source code for which you have rights to share.)
-
- If you work on different equipment and/or under a
- different operating system, you are invited to port the code to
- that environment. In the final CD-ROM version we will set up
- subdirectories as needed for alternate operating systems.
- Incidentally, the port to Sun Microsystems UNIX is very simple:
- take the binary flags out of file opening sequences, and increase
- the #define statements if you want larger buffers.
-
- Please send source code in machine readable form...
- either by electronic mail or on a floppy diskette by regular mail.
- To get maximum attention for your efforts, please observe the
- conventions for software outlined above.
-
- ≡≡≡≡->> QUESTION:
- Are there ways we could improve this process, and still
- stick to the schedule of releases and the copyleft
- ground rules? Your thoughts, please.
- <<-≡≡≡≡
-
-
- This is the shape of cooperative development. Join in,
- share, have fun.