-
-
- ════════════════════════════════════════════════
-
- 1. INTRODUCTION TO MIR TUTORIAL ONE
-
- ════════════════════════════════════════════════
-
-
-
- ════════════════════════════
- 1.1 Project overview
- ════════════════════════════
-
- The Mass Indexing and Retrieval (MIR) project deals
- with the technical side of enabling people to find information
- within large quantities of data. Output from the project takes the
- form of five sets of printed tutorials, plus related software and
- source code under these headings:
-
- ONE Database Analysis
-
- TWO Secrets of Data Preparation
-
- THREE Keys to Automated Indexing
-
- FOUR Search Engines and Information Retrieval
-
- FIVE Related Topics and Applications
-
- The tutorials are addressed to Directors of Information
- Services, custom software providers, information publishers,
- government information distributors, educators, trainers, and
- programmers. The software is distributed under "copyleft" rules of
- the Free Software Foundation. Improvements are invited and will be
- shared in a final volume and in an accompanying CD-ROM.
-
- You may wish to print the five introductory topics
- together with Tutorial ONE and include them in a three ring binder.
- For best formatting, use the WordPerfect 5.1 version of the files
- provided on diskettes. Printed copies are also available from
- Marpex Inc. for a nominal cost; see the files ORDRINFO and
- ORDRFORM.
-
-
-
-
-
-
- ═════════════════════════════════
- 1.2 Tutorial ONE overview
- ═════════════════════════════════
-
- The purpose of MIR Tutorial ONE is to enable you to
- analyze computerized data from an indexing perspective.
-
- The first topic, source code guidelines, explains the
- perspectives that have been built into the software that is
- provided with the tutorials. People who wish to improve on the
- technology are shown how to share their insights and C language
- source code.
-
- Methods of data gathering affect the cost, the quality
- and the complexity of the task of indexing. An index adds value to
- data, so we pay attention to some marketing considerations.
-
- Data analysis has to do with recognizing various forms
- in which data is accumulated, and detecting the inconsistencies
- (common in large sets of data) that make indexing more challenging.
- Data format offers possibilities and imposes limitations that will
- face searchers who wish to extract information. How might the data
- be structured in a way that better suits the needs of searchers?
- The reader is provided with a variety of software tools for this
- critical data analysis function.
-
- The ability to identify patterns in byte sequences
- quickly is critical to keeping indexing costs low. We examine a
- series of software tools for this purpose.
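-
- As a rough illustration of what pattern detection in byte
- sequences can start from, the sketch below tallies how often each
- byte value occurs in a buffer. It is not one of the MIR tools; the
- names count_bytes and print_counts are hypothetical and chosen only
- for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the MIR software.
 * Count how often each of the 256 possible byte values occurs in buf.
 * counts must point to an array of 256 entries; it is zeroed first. */
void count_bytes(const unsigned char *buf, size_t len,
                 unsigned long counts[256])
{
    size_t i;

    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (i = 0; i < len; i++)
        counts[buf[i]]++;
}

/* Print one line (hex byte value, then count) for every byte value
 * that occurs at least once. */
void print_counts(const unsigned long counts[256], FILE *out)
{
    int b;

    for (b = 0; b < 256; b++)
        if (counts[b] > 0)
            fprintf(out, "%02X  %lu\n", (unsigned)b, counts[b]);
}
```

- A table like this can hint at the kind of data in hand: for
- example, a file with no bytes above hex 7F may well be plain ASCII,
- while many distinct control bytes suggest binary content.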
-
- Worked examples of the analysis stage are provided.
- These topics are at a "nuts and bolts" level: use such and such
- a program, here is the input, here is the output, and here is what
- the results mean. The sequence runs from simplest to most complex:
- simple ASCII text, ASCII with markup, fielded text, fixed length
- records, the addition of packed numbers, then various forms of
- binary data.
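-
- To give a feel for one step in that sequence, here is a minimal
- sketch of decoding a packed number. It assumes "packed" means
- packed decimal in the common mainframe layout (two decimal digits
- per byte, with the sign in the low half of the last byte); the
- tutorial text above does not specify the layout, and the function
- name unpack_decimal is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the MIR software.
 * Assumed layout: two decimal digits per byte, high nibble first;
 * the low nibble of the LAST byte is the sign (hex B or D means
 * negative, any other value from A to F means positive).
 * Returns 0 on success, -1 if the field is empty or malformed.
 * No overflow check: fields too long for a long are not handled. */
int unpack_decimal(const unsigned char *field, size_t len, long *value)
{
    long v = 0;
    size_t i;
    unsigned hi, lo;

    if (len == 0)
        return -1;
    for (i = 0; i < len; i++) {
        hi = field[i] >> 4;
        lo = field[i] & 0x0F;
        if (hi > 9)
            return -1;              /* high nibble is always a digit */
        v = v * 10 + (long)hi;
        if (i + 1 < len) {
            if (lo > 9)
                return -1;          /* digit expected before last byte */
            v = v * 10 + (long)lo;
        } else {
            if (lo < 0x0A)
                return -1;          /* sign nibble expected here */
            if (lo == 0x0B || lo == 0x0D)
                v = -v;
        }
    }
    *value = v;
    return 0;
}
```

- Under these assumptions the three bytes hex 12 34 5C decode to
- 12345, and the two bytes hex 01 2D decode to -12.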
-
- Data deblocking is explained at this stage since it may
- be required in order to finish analysis of the data.
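-
- A minimal sketch of the simplest deblocking case follows. It
- assumes fixed length records stored end to end in a block with no
- separators, and copies each record out followed by a newline.
- Other blocking schemes exist and are not covered here; the name
- deblock_fixed and its parameters are hypothetical, not taken from
- the MIR source code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the MIR software.
 * Copy each complete rec_len-byte record from block into out,
 * appending '\n' after every record. Copying stops at the first
 * incomplete record or when out has no room for another record
 * plus its newline. No terminating '\0' is written.
 * Returns the number of records copied. */
size_t deblock_fixed(const unsigned char *block, size_t block_len,
                     size_t rec_len, char *out, size_t out_cap)
{
    size_t in_off = 0, out_off = 0, n = 0;

    if (rec_len == 0)
        return 0;
    while (block_len - in_off >= rec_len &&
           out_cap - out_off >= rec_len + 1) {
        memcpy(out + out_off, block + in_off, rec_len);
        out[out_off + rec_len] = '\n';
        in_off += rec_len;
        out_off += rec_len + 1;
        n++;
    }
    return n;
}
```

- With a record length of 4, the ten bytes AAAABBBBCC yield two
- lines, AAAA and BBBB; the trailing CC is an incomplete record and
- is left for the caller to deal with.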
-
- At the end of Tutorial ONE, the participant has had
- detailed exposure to the techniques of data analysis, and is able
- to use a selection of analysis tools (source code provided) to
- recognize and interpret a wide range of data types.
-