-
-
- ═══════════════════════════════════════════════
-
- 4. HOW THE MIR PROJECT WORKS FOR YOU
-
- ═══════════════════════════════════════════════
-
-
- ═══════════════════════════
- 4.1 "Free" software
- ═══════════════════════════
-
- In the MIR project we are using the "copyleft" strategy
- of the Free Software Foundation. The Foundation's GNU General
- Public License is included as Topic Five; it applies to all
- software created as part of the MIR project. This software has
- been created specifically for this purpose by Marpex Inc. since
- March 1991.
-
- The Free Software Foundation
-
- "is dedicated to eliminating restrictions on
- copying, redistribution, understanding, and
- modification of computer programs. [They] do this
- by promoting the development and use of free
- software in all areas of computer use... 'Free'
- pertains to freedom, not to price... You have two
- specific freedoms once you have the software:
- first, the freedom to copy the program and give it
- away to your friends and co-workers; and second,
- the freedom to change the program as you wish, by
- having full access to source code. Furthermore,
- you can study the source and learn how such
- programs are written. You may then be able to
- port it, improve it, and share your changes with
- others."
-
- What is copyleft?
-
- "The simplest way to make a program free is to put
- it in the public domain, uncopyrighted. But this
- allows anyone to copyright and restrict its use
- against the author's wishes, thus denying others
- the right to access and freely redistribute it.
- This completely perverts the original intent.
-
- "To prevent this, we copyright our software in a
- novel manner. Typical software companies use
- copyrights to take away your freedoms. We use the
- copyleft to preserve them. It is a legal
- instrument that requires those who pass on the
- program to include the rights to further
- redistribute it, and to see and change the code;
- the code and rights become legally inseparable."
-
- Quotes in the above three paragraphs are from page 3 of
- the January 1992 "GNU's Bulletin", the semi-annual newsletter of the
- Free Software Foundation, 675 Mass Avenue, Cambridge, MA 02139
- USA.
-
- The argument for this strategy is set out nicely in an
- article "Programs to the People" in the February/March 1991 issue
- of the M.I.T. Technology Review. With permission of the author,
- Simson L. Garfinkel, the text of the article is included in a
- separate file on the CD-ROM release. The file is named "TOPEOPLE".
-
-
- ════════════════════════════════════════════
- 4.2 Interactive shareware publishing
- ════════════════════════════════════════════
-
- The Mass Indexing and Retrieval (MIR) project is
- releasing five sets of shareware tutorials. Shareware has three
- advantages for the user:
-
- » easier access through broad exposure on electronic
- bulletin boards and copying for friends;
-
- » opportunity to review tutorials prior to making a
- commitment... minimum risk and no surprises;
-
- » much lower prices since normal marketing costs are
- bypassed.
-
- Seed funding for the MIR project was provided by the
- Canadian government with the understanding that the underlying
- indexing and retrieval techniques developed in the project shall be
- made broadly available under copyleft rules. Personnel from two
- companies are carrying out the project.
-
- Innotech Inc. of North York, Ontario (416 492-3838)
- aims toward excellence in CD-ROM publishing services. It is
- developing interfaces and applications based on MIR technology.
- Innotech offers consulting services as well as service bureau
- processing in CD-ROM publishing.
-
- Marpex Inc. is a firm founded in 1976 by the author of
- the tutorials and the related software. Marpex developed the
- techniques and pilot programs for the pioneering FindIT CD-ROM
- system, and more recently collaborated in the design of the Discis
- Knowledge Research CD-ROM books. Marpex provides consulting in
- records management, and seminars related to the techniques in the
- MIR tutorials.
-
- MIR tutorials are designed to be an exercise in
- co-operative development. We hope to engage you, the readers and
- users, in the project. We know that co-operative development will
- lead to improved end results; many minds are better than one. Text
- and software are modified according to your input... clarifications,
- improved methods, more powerful source code, etc. Each tutorial
- will evolve to reflect significant improvements, with your name
- attached to the improvements you provide.
-
- After the interactive phase is over, Marpex hopes to
- compile a reference text based on the tutorials. This will be
- accompanied by a CD-ROM containing all software and support files.
- Since ISO 9660 CD-ROMs are operating system independent, your
- ported versions of programs can be included.
-
- Why not release everything at once? Reasons for
- progressive releases are:
-
- » Scope of the project: Look at the tables of contents.
- There is simply too much for one tutor to complete in
- a single step. Extensive new research is continuing to
- be carried out, particularly in concept recognition.
- Apart from standardized functions, we are not carrying
- forward source code used in any proprietary system.
- Much of this work in the past has been on UNIX
- workstations; now we are achieving levels of efficiency
- that can make preparation of large databases feasible
- on a personal computer.
-
- » Market readiness: Until the introduction and Tutorial
- ONE have been on the market for a few months, we do not
- know if our target groups are sufficiently interested.
- We want to know that our work is meeting a genuine need
- and that co-operative development under shareware and
- "copyleft" rules is viable.
-
- » Financing: The Canadian government provided seed
- funding, that is, enough to get the project off to a
- good start. We are using the same approach as the Free
- Software Foundation to provide the money required to
- carry the project forward. Their major financing is
- through distribution of tapes containing their work -
- at roughly $200 for each of several tapes. We aim to
- carry forward the MIR project through distribution and
- shareware registrations. People are free to make
- copies of all materials. We trust that buyers will
- honor the shareware provisions for the tutorials.
-
-
- ═════════════════════════════════════════
- 4.3 Engine-independent techniques
- ═════════════════════════════════════════
-
- The ISO 9660 CD-ROM standard and Microsoft's MS-DOS
- extensions opened the way to accessing the files on any conforming
- CD-ROM. But having access to files is not the same as being able
- to search conveniently. Because indexing systems and interfaces
- are proprietary, the user has been faced with the nightmare of
- having to learn a new retrieval method every time a CD-ROM title is
- purchased from a new vendor. The plea goes up: "Why can't I use
- the same program I've already learned?"
-
- Why not, indeed?
-
- Two ideas have emerged in the literature. One is full
- "interoperability"... the ability for a person to select her/his
- own preferred retrieval interface software and use it to search
- within any CD-ROM title on any CD-ROM drive under any operating
- system. That's far off yet. The second idea, a subset of the
- first, is now before a Standards Committee (SCAD) of the
- International Organization for Standardization (ISO) and may show up in
- commercial products in 1993. That is the possibility of separating
- the software into a client interface and an underlying server which
- fetches data from the CD-ROM. The server module resides in RAM and
- communicates with the client interface through standardized ASCII
- strings. The intention is that the server is specific to the data
- and the indexes in place; the client interface is the user's
- preference of any retrieval software conforming to the standard.
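-
- The split between client interface and server can be
- pictured with a minimal sketch in C. Everything in it is an
- assumption made for illustration: the function names, the toy
- records, and above all the message formats ("FIND <term>",
- "HITS <n>", "ERROR <reason>") are invented here and are not taken
- from any of the proposed standards, whose message syntax is still
- unknown.
-
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy data standing in for indexed records on a CD-ROM (invented). */
static const char *RECORDS[] = {
    "copyleft keeps code and rights together",
    "shareware tutorials on indexing",
    "indexing and retrieval on a personal computer"
};
#define RECORD_COUNT (sizeof RECORDS / sizeof RECORDS[0])

/*
 * Hypothetical server entry point. The request and reply formats
 * ("FIND <term>", "HITS <n>", "ERROR <reason>") are invented for this
 * sketch. Writes a NUL-terminated ASCII reply into `reply`.
 */
void server_handle(const char *request, char *reply, size_t reply_size)
{
    const char *term;
    size_t i;
    int hits = 0;

    assert(request != NULL && reply != NULL && reply_size > 0);

    /* Only one request type is understood: "FIND " plus a search term. */
    if (strncmp(request, "FIND ", 5) != 0 || request[5] == '\0') {
        snprintf(reply, reply_size, "ERROR unknown request");
        return;
    }
    term = request + 5;

    /* Count the records that contain the term as a substring. */
    for (i = 0; i < RECORD_COUNT; i++) {
        if (strstr(RECORDS[i], term) != NULL)
            hits++;
    }
    snprintf(reply, reply_size, "HITS %d", hits);
}

/*
 * Hypothetical client call: build an ASCII request, hand it to the
 * server, and parse the ASCII reply. Returns -1 if the reply is not
 * a "HITS <n>" message.
 */
int client_count_hits(const char *term)
{
    char request[64];
    char reply[64];
    int hits = -1;

    assert(term != NULL);
    snprintf(request, sizeof request, "FIND %s", term);
    server_handle(request, reply, sizeof reply);
    if (sscanf(reply, "HITS %d", &hits) != 1)
        return -1;
    return hits;
}
```
-
- The client knows nothing about how the records are stored,
- and the server knows nothing about screens or keystrokes. Any
- other interface that sends the same strings could be substituted
- without touching the server.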
-
- These engine-independent techniques do away with the
- high cost and inconvenience of re-education. There are perhaps
- five contending proposed standards. The Information Handling
- Committee of the Intelligence Community Staff in Washington, D.C.
- has commissioned the CD-ROM Read-Only Data Exchange Standard
- (CD-RDx). The aircraft industry appears seriously committed to
- Structured Full-Text Query Language (SFQL), an extension of the
- ISO-approved SQL. Other contenders are Z39.50 (a library system
- networking protocol), SilverPlatter's DXS, and DFL, an earlier
- outgrowth of Structured Query Language. Unknowns at this point
- include the data structures supported (whether columnar relational
- databases and subsets thereof, or whether more generalized forms),
- and the actual syntax of messages that pass between the interface
- and server modules.
-
- We believe that cooperative development through the MIR
- project can contribute to this process. If software is freely
- available under copyleft rules, it can be adapted very readily as
- standards evolve. No one has to hold back until the Standards
- Committee makes its one-year or three-year or five-year report.
-
- We also believe that it is unnecessary to limit the
- discussion to CD-ROM. The basic problem (frustration at being
- forced to learn new interfaces) is independent of the medium on
- which the data are stored. MIR technology may be applied to data
- held on hard disk, floppy diskettes, Write Once Read Many (WORM),
- Bernoulli, rewritable laser optical disks, laser cards or whatever
- other media can retain data as byte streams.
-
-
- ═════════════════════════════════
- 4.4 The software provided
- ═════════════════════════════════
-
- Scope: The source code for data analysis and
- preparation, search term selection, and to some extent automated
- indexing requires little interaction with a user. The programs in
- TUTORIALS ONE through THREE are therefore considered complete.
-
- TUTORIAL FOUR presents an engine (a "data server
- module") which may be used with interfaces compatible with engine-
- independent techniques. The number of different interfaces that
- might be written is infinite. Interface source code can be (and is
- likely to be) handled in traditional proprietary ways, simply
- because of the great variability in features that end users desire.
- You or your firm may write a "client module" interface and keep it
- proprietary, provided the data server module is kept separate and
- under copyleft rules. If you care to write a client module under
- copyleft rules, and if it works well, we will be glad to pass it
- along.
-
- The software provided with TUTORIAL FIVE might be
- classed as "discussion starters". We carry the discussion a fair
- distance, but look to readers to pursue their specific interests.
- In an ideal world, that pursuit would take the form of a public
- exchange of ideas under copyleft rules. As Captain Jean-Luc Picard
- would say, "Make it so!"
-
- Naming conventions are applied to many of the programs.
- DOS constrains source code names to eight characters plus a ".C"
- extension. Where a six-letter name is workable, a single letter
- followed by an underscore precedes the name and has one of the
- following meanings:
-
- A_*.C analyze, report
- B_*.C build indexes
- C_*.C compress / integerize data
- E_*.C expand content of a file
- F_*.C filter out parts of a file
- I_*.C invert token matrix
- J_*.C join words into useful phrases
- M_*.C merge files
- P_*.C preprocess particular layouts
- Q_*.C quality assurance
- R_*.C rotate content within a line
- S_*.C server module for retrieval
- T_*.C transliterate language to ASCII
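-
- The convention is regular enough to check by program. The
- following is a minimal sketch, not part of the MIR software: the
- helper name mir_prefix_meaning and the file names used to try it
- out (such as "B_SAMPLE.C") are hypothetical. Only the letters and
- meanings come from the table above.
-
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Prefix letters and their meanings, copied from the table above. */
struct mir_prefix {
    char letter;
    const char *meaning;
};

static const struct mir_prefix MIR_PREFIXES[] = {
    { 'A', "analyze, report" },
    { 'B', "build indexes" },
    { 'C', "compress / integerize data" },
    { 'E', "expand content of a file" },
    { 'F', "filter out parts of a file" },
    { 'I', "invert token matrix" },
    { 'J', "join words into useful phrases" },
    { 'M', "merge files" },
    { 'P', "preprocess particular layouts" },
    { 'Q', "quality assurance" },
    { 'R', "rotate content within a line" },
    { 'S', "server module for retrieval" },
    { 'T', "transliterate language to ASCII" }
};

/*
 * Hypothetical helper: return the meaning of the prefix in a source
 * file name, or NULL when the name does not start with one of the
 * listed letters followed by an underscore. Letter case is ignored.
 */
const char *mir_prefix_meaning(const char *filename)
{
    size_t i;
    char letter;

    assert(filename != NULL);

    /* A prefixed name needs at least a letter and an underscore. */
    if (strlen(filename) < 2 || filename[1] != '_')
        return NULL;

    letter = (char)toupper((unsigned char)filename[0]);
    for (i = 0; i < sizeof MIR_PREFIXES / sizeof MIR_PREFIXES[0]; i++) {
        if (MIR_PREFIXES[i].letter == letter)
            return MIR_PREFIXES[i].meaning;
    }
    return NULL;
}
```
-
- Called with the made-up name "B_SAMPLE.C", the helper
- returns "build indexes"; a name without the letter-and-underscore
- prefix, or with a letter not in the table, returns NULL.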
-
- Support files include 05LICENS, OVERVIEW, COPYRIGH,
- FRONTISn, NEWSREL and ORDER. On the diskette version, the install
- program gives you a choice of whether to extract the files in
- WordPerfect 5.1, ASCII, or a generic form suitable for other word
- processors which can handle ASCII files. 05LICENS is the Free
- Software Foundation's GNU General Public License which governs
- permissions for software supplied with the tutorials. CD-ROM
- release(s) contain extra worked examples, and articles such as
- TOPEOPLE.
-
- We recommend you place executable copies of all
- programs in one area on your hard disk. That way, you can create
- easy access to the programs with only one small addition to your
- DOS path (something of the form "C:\BIN;" added to the PATH line
- in your AUTOEXEC.BAT file).