- Xref: sparky alt.wais:775 comp.infosystems.wais:845
- Newsgroups: alt.wais,comp.infosystems.wais
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!darwin.sura.net!welchgate.welch.jhu.edu!francois
- From: francois@welchgate.welch.jhu.edu (Francois Schiettecatte)
- Subject: Generic multi-type wais server
- Message-ID: <1993Jan25.150933.16947@welchgate.welch.jhu.edu>
- Organization: Welch Medical Library
- Date: Mon, 25 Jan 1993 15:09:33 GMT
- Lines: 74
-
-
-
- Generic multi-type wais server
- ------------------------------
-
- I have made a number of modifications to waisindex and waisserver to
- allow support for multi-type documents. By multi-type documents, I
- mean that a document can be represented in multiple formats such as
- text, gif, jpeg, etc. There is really no limit.
-
- To use it, you specify a command line argument to the waisindex
- program which tells it that this is a multi-type database and lists
- the types in which the documents are available, for example:
-
- /big/wais/wais/bin/waisindex -nopos -nopairs -d /big/wais-db/jfif \
- -M "TEXT,JFIF,JFIF-TBNL" -t filename /big/wais-db/jfif/*.TEXT
-
- The -M "TEXT,JFIF,JFIF-TBNL" argument tells waisindex that the
- documents are available in TEXT, JFIF and JFIF-TBNL (thumbnail) format.
- Note that the standard -t argument is still used to tell the indexer
- what format the text file is in.
-
- Waisindex is smart enough to check that the various file types are
- indeed available to be retrieved, so that the user cannot be presented
- with a choice of formats that do not exist for a document.
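-
- As an illustration only, the availability check described above could
- be sketched in Python as follows. This is not the code in waisindex
- (which is written in C); the function name available_types and its
- arguments are hypothetical, and it assumes each type is stored next to
- the indexed file under the same base name.

```python
import os


def available_types(text_path, declared_types):
    """Return the declared types that actually exist on disk.

    text_path      -- path of the indexed file, e.g. "/db/scarab.TEXT"
    declared_types -- the value given to -M, e.g. "TEXT,JFIF,JFIF-TBNL"

    Hypothetical helper: assumes every representation lives beside the
    indexed file as <base>.<TYPE>.
    """
    # Strip the extension of the indexed file to get the shared base name.
    base, _ext = os.path.splitext(text_path)
    found = []
    for doc_type in declared_types.split(","):
        doc_type = doc_type.strip()
        # Only offer a type if its file is really there.
        if doc_type and os.path.isfile(base + "." + doc_type):
            found.append(doc_type)
    return found
```

- A document with only a .TEXT and a .JFIF file would then be offered in
- those two types, and the missing thumbnail would be left out.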
-
- Also, the changes to the server maintain compatibility with the current
- database format, so you won't have to reindex your current databases.
-
- You can also compress the text file using unix compress.
-
- But here is the catch: all the documents and the various representations
- of each document must be stored as separate files, with an extension
- that matches the document type, for example:
-
- -rw-r--r-- 1 francois 59689 Oct 9 08:42 scarab.JFIF
- -rw-r--r-- 1 francois 6233 Jan 23 14:17 scarab.JFIF-TBNL
- -rw-rw-r-- 1 francois 425 Jan 21 17:08 scarab.TEXT
-
- The file scarab.TEXT contains the TEXT document, scarab.JFIF contains
- the JFIF document and scarab.JFIF-TBNL contains the JFIF thumbnail
- document.
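-
- To make the naming convention concrete, here is a small Python sketch
- that groups a directory listing by document. It is only an
- illustration; the function name group_by_document is made up, and it
- assumes the type is everything after the last dot in the file name.

```python
from collections import defaultdict


def group_by_document(filenames):
    """Map each base name to the list of types found for it.

    Hypothetical helper: treats the text after the last "." as the
    document type, so "scarab.JFIF-TBNL" has type "JFIF-TBNL".
    """
    groups = defaultdict(list)
    for name in filenames:
        base, dot, doc_type = name.rpartition(".")
        # Skip names with no extension or no base (e.g. ".hidden").
        if dot and base:
            groups[base].append(doc_type)
    return dict(groups)


listing = ["scarab.JFIF", "scarab.JFIF-TBNL", "scarab.TEXT"]
print(group_by_document(listing))
# prints {'scarab': ['JFIF', 'JFIF-TBNL', 'TEXT']}
```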
-
- This is a pain, but a small price to pay, I guess, for this feature.
-
- You may also be interested to know that the current releases of xwais
- and HyperWais support multi-type documents. There may well be others,
- but I have not really looked into this.
-
- Anyway, here is my question. I am happy to release this into the public
- domain. The changes are built on top of the BIO 5.1 server, along with
- a bunch of other changes (like bug fixes, support for .Z files, support
- for external files in indexing, stemming, etc). I could try to produce
- a patch file (but I don't really want to). What I can do is create
- two tar files, one with all the files that I have changed (only 5 of
- them) and another with the entire distribution. Would this be ok?
- Also, I have added a comment '/* francois */' every place I have made
- changes, and '/* multitype extensions */' where there are changes which
- support multi-type files. If I don't hear from people one way or the
- other, I will put up the two tar files.
-
-
- francois
-
- Francois Schiettecatte
- Software Engineer
- Advanced Technology Group
- Welch Medical Library
- Johns Hopkins University
- Internet: francois@library.welch.jhu.edu
- Phone : (410) 955-7581
-
-
-