-
-
- MARC FORMAT RECORDS
- ===================
-
- Prepared by Doug Lowry
- June 13, 1986
-
-
- OBJECTIVE:
- =========
-
- To examine the structure of MARC format records with a
- view to writing a preprocessor which will create standard
- format records from MARC records.
-
-
- LIMITATIONS:
- ===========
-
- What follows is not an exhaustive study. Harvey Martens
- has done a preliminary analysis. I have carried it a few
- steps further, aided by discussion with Michael xxxxxx of
- the National Research Council (613 xxx-xxxx) on June 12.
-
-
- BASIC RECORD STRUCTURE:
- ======================
-
- MARC records occur in blocks. Each block is preceded by a
- 4 byte value; the first two bytes are the high and low order
- bytes respectively of the length of the block. The next two
- bytes are each null. For example, octal values
-
- 026 270 000 000
-
- indicate a block length of 5816 bytes (22 x 256 + 184).
- It is not yet clear whether this count includes or
- excludes the four bytes for the block length.
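-
- The decoding above can be sketched in a few lines of
- Python. This is only an illustration: the function name
- decode_length_prefix is an assumption of this sketch, and
- the check that the last two bytes are null simply follows
- the layout described here.

```python
def decode_length_prefix(prefix: bytes) -> int:
    """Decode a 4 byte length prefix: high byte, low byte, two nulls."""
    if len(prefix) != 4:
        raise ValueError("length prefix must be exactly 4 bytes")
    if prefix[2:] != b"\x00\x00":
        raise ValueError("bytes 3 and 4 of the prefix should be null")
    # High order byte first, then low order byte.
    return prefix[0] * 256 + prefix[1]


# The octal example from the text: 026 270 000 000
example = bytes([0o026, 0o270, 0o000, 0o000])
print(decode_length_prefix(example))  # 5816
```

- The same layout is used for the 4 byte record length
- indicator, so the same function decodes both prefixes.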
-
- An individual MARC record consists of these components:
-
- 1. A 4 byte record length indicator
- 2. A 24 byte leader
- 3. A record directory or entries map
- 4. Control fields and variable fields
- 5. A group separator character
-
-
- RECORD LENGTH INDICATOR:
- =======================
-
- Each MARC record is preceded by 4 bytes: the high order
- byte and the low order byte respectively of the record
- length in bytes, then two null bytes. For example, octal
- values
-
- 001 366 000 000
-
- indicate a record length of 502 bytes. This count
- includes the four byte indicator.
-
-
- RECORD LEADER:
- =============
-
- 24 bytes as follows:
-
- 1...5 ASCII record length in bytes, EXCLUDING the four byte
- record length indicator above.
-
- 6 Record status letter (N= new, C= correction,
- D= deletion, ...)
-
- 7 Type (codes not currently known)
-
- 8 Bibliographic category (A= analytic, M= monograph,
- S= serial, ...)
-
- 11 Indicator count (uncertain... not immediately relevant)
-
- 13...17 Seems to be an ASCII count of the number of bytes in
- the following directory entries map.
-
- 18...24 Uncertain
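-
- A minimal sketch of reading the known leader positions,
- assuming the 1-based byte positions listed above. The
- function name parse_leader and the dictionary keys are
- assumptions of this sketch. The sample leader is invented
- for illustration, except that 00498 matches the 502 byte
- example above less its four byte indicator.

```python
def parse_leader(leader: bytes) -> dict:
    """Pick out the leader positions identified so far (1-based in the text)."""
    if len(leader) != 24:
        raise ValueError("a leader is exactly 24 bytes")
    text = leader.decode("ascii")
    return {
        "record_length": int(text[0:5]),      # bytes 1...5, excludes the 4 byte indicator
        "status": text[5],                    # byte 6: N, C, D, ...
        "type": text[6],                      # byte 7: codes not currently known
        "category": text[7],                  # byte 8: A, M, S, ...
        "indicator_count": text[10],          # byte 11: meaning uncertain
        "directory_length": int(text[12:17]), # bytes 13...17: tentative reading
    }


# Invented sample leader, built piece by piece so it is exactly 24 bytes.
sample = b"00498" + b"N" + b"A" + b"M" + b"  " + b"2" + b" " + b"00157" + b" " * 7
leader = parse_leader(sample)
print(leader["record_length"] + 4)  # 502, the value the 4 byte indicator would carry
```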
-
-
- RECORD DIRECTORY:
- ================
-
- A record directory consists of a series of ASCII numeric
- values:
-
- 3 byte field number
- 4 byte inclusive length of field in bytes
- 5 byte offset in bytes from beginning of field data
-
- The field numbers in the examples examined so far appear in
- numeric order within the directory. A field number may occur
- more than once. The data offsets, however, appear in near
- random order (possibly the order in which fields were added).
- Note the offsets in the following real example:
-
- Field Length Offset
- 008 0039 00000
- 009 0032 00284
- 022 0025 00134
- 035 0030 00104
- 088 0007 00229
- 089 0036 00236
- 090 0012 00272
- 100 0014 00159
- 245 0041 00063
- 260 0008 00039
- 260 0008 00047
- 300 0008 00055
- 410 0056 00173
- RS
-
- A close examination of the directory shows that it is
- arithmetically coherent. For example, at offset 00000
- above, there is something 39 bytes long. Sure enough,
- the next offset in ascending order is 00039. There are
- 8 bytes in that field, and the next offset is 00047, etc.
-
- A directory is terminated by an "RS" or record separator
- byte (octal 036).
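-
- The directory layout and the coherence check can be
- sketched as follows. The entries are the real example
- shown above; the function names parse_directory and
- is_coherent are assumptions of this sketch, as is the
- idea that every directory is gap-free in the same way.

```python
RS = b"\x1e"  # record separator, octal 036


def parse_directory(data: bytes) -> list:
    """Split a directory into (field number, length, offset) tuples."""
    body = data[:data.index(RS)].decode("ascii")
    if len(body) % 12 != 0:
        raise ValueError("directory is not a whole number of 12 byte entries")
    entries = []
    for i in range(0, len(body), 12):
        entry = body[i:i + 12]
        # 3 byte field number, 4 byte length, 5 byte offset
        entries.append((entry[0:3], int(entry[3:7]), int(entry[7:12])))
    return entries


def is_coherent(entries: list) -> bool:
    """True if, taken in offset order, each field starts where the last one ended."""
    expected = 0
    for _, length, offset in sorted(entries, key=lambda e: e[2]):
        if offset != expected:
            return False
        expected = offset + length
    return True


# The directory from the example above, rebuilt as raw bytes.
rows = [("008", 39, 0), ("009", 32, 284), ("022", 25, 134), ("035", 30, 104),
        ("088", 7, 229), ("089", 36, 236), ("090", 12, 272), ("100", 14, 159),
        ("245", 41, 63), ("260", 8, 39), ("260", 8, 47), ("300", 8, 55),
        ("410", 56, 173)]
raw = "".join(f"{tag}{length:04d}{offset:05d}"
              for tag, length, offset in rows).encode("ascii") + RS

entries = parse_directory(raw)
print(is_coherent(entries))                     # True
print(max(off + ln for _, ln, off in entries))  # 316 bytes of field data
```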
-
-
- FIELD CONTENTS:
- ==============
-
- Fields are of two types...
-
- control fields, numbered 001...009
-
- variable fields, numbered 010...999
-
- Control fields are fixed format and specialized. For
- now, control fields other than 009 can be ignored.
- Field 008 is usually present, but its contents are
- duplicated in 009. We will treat 009 as if it were a
- variable field, except that non-printing characters
- should be replaced by white space.
-
- Variable fields are essentially free text. Sub fields
- may exist within a field. For now, we can indicate new
- sub fields by replacing the separators by a newline
- symbol in the compressed text.
-
- All fields (except the first) and all sub fields begin
- with a "US" (unit separator, octal 037) followed by a
- single lower case character. The significance of the
- different characters is not yet known. We do know that
- it is safe to replace the "US" byte and the byte that
- follows it with white space for preprocessing, and with
- a newline when creating compressed text.
-
- All fields end with an "RS" (record separator, octal 036).
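-
- A rough sketch of the two treatments of the "US" byte and
- the character that follows it. The function names and the
- sample field are invented for illustration. Dropping the
- trailing "RS" and trimming leading white space are choices
- made for this sketch, not something the format requires.

```python
import re

RS = b"\x1e"  # record separator, octal 036
# A "US" byte (octal 037) together with the single byte that follows it.
SUBFIELD_MARK = re.compile(rb"\x1f.", re.DOTALL)


def _replace_marks(field: bytes, replacement: bytes) -> str:
    # Drop the terminating RS, then swap every US + code pair.
    body = field[:-1] if field.endswith(RS) else field
    text = SUBFIELD_MARK.sub(replacement, body)
    return text.decode("ascii", errors="replace").strip()


def for_preprocessing(field: bytes) -> str:
    """Collapse each sub field marker to white space."""
    return _replace_marks(field, b" ")


def for_compressed_text(field: bytes) -> str:
    """Start each sub field on a new line."""
    return _replace_marks(field, b"\n")


# Invented sample field with two sub fields, "a" and "b".
sample = b"\x1faA title\x1fbA subtitle\x1e"
print(for_preprocessing(sample))    # A title A subtitle
print(for_compressed_text(sample))  # two lines: "A title" then "A subtitle"
```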
-
-
- END OF RECORD:
- =============
-
- Records are terminated by a single "GS" byte (group
- separator, octal 035).
-
-
- ADDITIONAL NOTES:
- ================
-
- Consistent order within records would be reasonably assured
- if we extract data in field number order per the directory
- rather than by actual occurrence within the record. For
- preprocessing, this means that at least one full record
- must be held in RAM before it can be processed.
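-
- A sketch of extraction in field number order, assuming the
- directory has already been split into (field number, length,
- offset) tuples and the whole field data area is in memory.
- The function name and the tiny sample record are invented
- for illustration.

```python
def fields_in_number_order(entries: list, field_data: bytes) -> list:
    """Return (field number, raw bytes) pairs in field number order."""
    # sorted() is stable, so repeated field numbers keep their directory order.
    ordered = sorted(entries, key=lambda e: e[0])
    result = []
    for tag, length, offset in ordered:
        chunk = field_data[offset:offset + length]
        if len(chunk) != length:
            raise ValueError(f"field {tag} runs past the end of the data")
        result.append((tag, chunk))
    return result


# Invented sample: field 260 is stored first, field 245 second.
data = b"Ottawa\x1e" + b"A title\x1e"
directory = [("245", 8, 7), ("260", 7, 0)]
for tag, chunk in fields_in_number_order(directory, data):
    print(tag, chunk)
```

- Because every offset is resolved against the full data
- area, the fields come out in directory order regardless
- of where they sit in the record.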
-
- The list of variable field names may be extracted from
- "Composite MARC Format", a tabular listing which has been
- ordered. Individual organizations may utilize unnamed
- fields; until they provide the names, a name such as
- "Field 089" could be used.