home *** CD-ROM | disk | FTP | other *** search
-
-
-
- LOCATEDB(5L) LOCATEDB(5L)
-
-
- NNAAMMEE
- locatedb - front-compressed file name database
-
- DDEESSCCRRIIPPTTIIOONN
- This manual page documents the format of file name
- databases for the GNU version of llooccaattee. The file name
- databases contain lists of files that were in particular
- directory trees when the databases were last updated.
-
- There can be multiple databases. Users can select which
- databases llooccaattee searches using an environment variable or
- command line option; see llooccaattee(1L). The system adminis-
- trator can choose the file name of the default database,
- the frequency with which the databases are updated, and
- the directories for which they contain entries. Normally,
- file name databases are updated by running the uuppddaatteeddbb
- program periodically, typically nightly; see uuppddaatteeddbb(1L).
-
- uuppddaatteeddbb runs a program called ffrrccooddee to compress the list
- of file names using front-compression, which reduces the
- database size by a factor of 4 to 5. Front-compression
- (also known as incremental encoding) works as follows.
-
- The database entries are a sorted list (case-
- insensitively, for users' convenience). Since the list is
- sorted, each entry is likely to share a prefix (initial
- string) with the previous entry. Each database entry
- begins with an offset-differential count byte, which is
- the additional number of characters of prefix of the pre-
- ceding entry to use beyond the number that the preceding
- entry is using of its predecessor. (The counts can be
- negative.) Following the count is a null-terminated ASCII
- remainder -- the part of the name that follows the shared
- prefix.
-
- If the offset-differential count is larger than can be
- stored in a byte (+/-127), the byte has the value 0x80 and
- the count follows in a 2-byte word, with the high byte
- first (network byte order).
-
- Every database begins with a dummy entry for a file called
- `LOCATE02', which llooccaattee checks for to ensure that the
- database file has the correct format; it ignores the entry
- in doing the search.
-
- Databases can not be concatenated together, even if the
- first (dummy) entry is trimmed from all but the first
- database. This is because the offset-differential count
- in the first entry of the second and following databases
- will be wrong.
-
- There is also an old database format, used by Unix llooccaattee
- and ffiinndd programs and earlier releases of the GNU ones.
- uuppddaatteeddbb runs programs called bbiiggrraamm and ccooddee to produce
-
-
-
- 1
-
-
-
-
-
- LOCATEDB(5L) LOCATEDB(5L)
-
-
- old-format databases. The old format differs from the
- above description in the following ways. Instead of each
- entry starting with an offset-differential count byte and
- ending with a null, byte values from 0 through 28 indicate
- offset-differential counts from -14 through 14. The byte
- value indicating that a long offset-differential count
- follows is 0x1e (30), not 0x80. The long counts are
- stored in host byte order, which is not necessarily net-
- work byte order, and host integer word size, which is usu-
- ally 4 bytes. They also represent a count 14 less than
- their value. The database lines have no termination byte;
- the start of the next line is indicated by its first byte
- having a value <= 30.
-
- In addition, instead of starting with a dummy entry, the
- old database format starts with a 256 byte table contain-
- ing the 128 most common bigrams in the file list. A
- bigram is a pair of adjacent bytes. Bytes in the database
- that have the high bit set are indexes (with the high bit
- cleared) into the bigram table. The bigram and offset-
- differential count coding makes these databases 20-25%
- smaller than the new format, but makes them not 8-bit
- clean. Any byte in a file name that is in the ranges used
- for the special codes is replaced in the database by a
- question mark, which not coincidentally is the shell wild-
- card to match a single character.
-
- EEXXAAMMPPLLEE
- Input to ffrrccooddee:
- /usr/src
- /usr/src/cmd/aardvark.c
- /usr/src/cmd/armadillo.c
- /usr/tmp/zoo
-
- Length of the longest prefix of the preceding entry to share:
- 0 /usr/src
- 8 /cmd/aardvark.c
- 14 rmadillo.c
- 5 tmp/zoo
-
- Output from ffrrccooddee, with trailing nulls changed to new-
- lines and count bytes made printable:
- 0 LOCATE02
- 0 /usr/src
- 8 /cmd/aardvark.c
- 6 rmadillo.c
- -9 tmp/zoo
-
- (6 = 14 - 8, and -9 = 5 - 14)
-
- SSEEEE AALLSSOO
- ffiinndd(1L), llooccaattee(1L), llooccaatteeddbb(5L), xxaarrggss(1L) FFiinnddiinngg
- FFiilleess (on-line in Info, or printed)
-
-
-
-
- 2
-
-
-