home *** CD-ROM | disk | FTP | other *** search
- (* TBTree16 Copyright (c) 1988,1989 Dean H. Farwell II *)
-
- unit Btree;
-
- (*****************************************************************************)
- (* *)
- (* B T R E E R O U T I N E S *)
- (* *)
- (*****************************************************************************)
-
- (* These routines are responsible for handling the indexes. The scheme used
- is Knuth's variation of B-Trees discussed on page 64 of "Database Systems
- Volume I Fourth Edition" by C.J. Date. There are two different types of
- nodes in the tree. The upper level(s) are index nodes and fill the role of
- the normal B-Tree. The lowest level nodes are sequence nodes and represent
- a dense index to the file. The entries in the sequence node are sorted and
- thus provide an alternate means to access information in the file. This
- allows rapid sorted retrievals without the need to keep the data file
- sorted.
-
- The internal workings of an index is not really important, but you need to
- be aware of what an index contains and what operations can be performed
- using an index. An index is primarily used to do retrievals using the
- value of one or more fields of a data record. An index contains values
- which are the same as values of a field in a data file. Each value in an
- index contains a corresponding logical record number which shows which
- logical record corresponds to that value. In many instances, a field will
- not be unique. This means that more than one record may have the same
- value for a given field. The index will handle these properly. However,
- it would not make sense to have multiple values for the same field for a
- single logical record number. This would imply that a data record field
- could have two or more values at the same time, which it obviously cannot.
- This would occur only if you were updating a value for the given record
- number and failed to delete the old entry from the index. The
- documentation contains an example of an update which will keep this from
- happening. It is also important to realize that each index can handle only
- one data type. The valid data types are defined by the ValueType type. The
- data type of the values in the index is specified at the time that the
- index is created and is never changed.
-
- The first step in using an index is to create one. There is a routine
- (CreateIndexFile) especially for this purpose. It will create the root and
- first sequence nodes as well as the internal bitmap. This bitmap is used
- to keep track of which index nodes (physical records) are currently in use
- and which are available. These bitmaps are maintained interally as part of
- the index file itself. This alleviates the requirement to maintain
- separate bitmap files and greatly simplifies things. When creating a file,
- you must supply the file name, the data type and the size of the entries.
- The path and the extension for the file name is optional. For example, the
- path can be used to ensure that the files will always be in one place even
- though the program is run from other places, etc. The extension can help
- in your own bookkeeping. For example, all data files could use an
- extension of '.dat' and all index files could use '.idx' etc. This is
- completely up to you and your application.
-
- Once you have created an index, you will want to put value/logical record
- number pairs into the index and retrieve them back out. One routine
- (InsertValueInBTree) is supplied to insert a value and corresponding
- logical record number. Likewise, there is one routine supplied to delete a
- value/logical record number pair. There is no routine for performing an
- update. Instead, you will perform an update by doing a delete followed by
- an insert.
-
- Retrieving the logical records number(s) corresponding to values is the
- primary purpose of the indexes and is discussed now. There are actually
- two distinct ways to do this. The more powerful method is to use the
- routines available in BTREE2.INC. Each of these routines will perform an
- index search and will result in the creation of a logical record list. A
- logical record list is nothing more than a list of logical record numbers.
- These lists are fully explained in the LRECLIST.PAS file and are key to the
- power provided by TBTREE. You will want to fully understand how to create
- and access these lists.
-
- A second way to access the index is to use an internal (to the index)
- cursor. This cursor should not be confused with the cursor which is kept
- as part of the logical record lists discussed above. These index cursors
- are discussed indepth in BTREE3.INC.
-
- To understand either method of retrieval, you need to have the proper
- framework for understanding the index. Even though the indexes are
- maintained as BTREEs, you should view them as a sequential file of
- values/logical record number pairs. This sequential file is ordered
- by value in ascending order. Think of the pairs laid out horizontally with
- the lower values on the left. Moving forward through the index means that
- larger values are accessed. The first entry is the lowest value, etc.
- Hopefully, you get the picture. *)
-
-
- (*\*)
- (* Version Information
-
- Version 1.1 - Corrected error in InsertValueInBTree routine. I
- was not checking for duplicate entries. This has
- been resolved.
-
- - Added GetRangeFromBTree routine
-
- - Added GetSubstringFromBTree routine
-
- - Divided the source code into 3 source files - BTREE.PAS
- BTREE1.INC
- BTREE2.INC
- Version 1.2 - No Changes
-
- Version 1.3 - Added NumberOfBTreeLevels routine
-
- Version 1.4 - HUGE CHANGE - I no longer keep the data file name as part of
- the index file parameter record. It was not used, and I
- foresee no real use for it in the future. I thought that I
- would need it for my TBASE product, but I won't. This will
- mean a change to anyone who has applications that they have
- written using TBTREE version 1.3 or earlier. Any calls to
- the old CreateIndex routine will need to be changed.
- dataFName is no longer a parameter for that routine.
- Hopefully, this will not create a great inconvenience. I
- just was tired of carrying around the extra baggage.
-
- - HUGE CHANGE - The bitmap files are a thing of the past. The
- bitmaps are now maintained as part of the index files
- themselves. This has caused a complete restructuring of the
- index file parameter record and the index file themselves.
- This means that any index files (BTREEs) created using
- !!!!! - >>> versions prior to 1.4 will need to be rebuilt. This is
- IMPORTANT discussed in the documentation under the version information.
- !!!!! - >>> I apologize for the inconvenience. However, the sooner I make
- these kinds of changes, the better. This will simplify your
- use of the routines and cut in half the number of files to be
- dealt with (since each index and data file previously had a
- bitmap).
-
- - Corrected an error related to maximum value size (size of an
- entry) for an index. The documentation states that the
- maximum size for an entry in an index is 494 bytes. However,
- an error limited this to 255 bytes. This is because only one
- byte was allocated in the parameter record for the value
- size. This has been corrected. However, in practical terms,
- you want to keep the index entries small. This change
- required a change to the parameter record definition.
-
- - Added a second way to retrieve information from the B-Tree
- indexes. Rather than always creating a logical record list
- when retrievals are required, you can now work with the index
- more directly. The BTREE3.INC contains many routines which
- manipulate an internal cursor which can be used to access
- logical record numbers in the index. The newly added
- BTREE3.INC file has more details.
-
- - Added GetSubstringAtPositionFromBTree routine. This routine
- allows more flexibility when doing partial string matches on
- an index. It is exceptionally handy if you have an index on
- a string field which really is made of several sub-fields.
- Something like phone number is a good example. You might be
- interested in only phone numbers which have a seven as the
- second digit, or whatever.
-
- - Made several changes internally to increase performance. One
- primary change was to use a Binary Search technique to find
- entries in nodes at all levels. Also, VAR parameters are
- used internally when a parameter is large. For instance,
- when a page is passed to a routine as a parameter, it is
- passed as a VAR parameter. Therefore, the entire 512 bytes
- do not have to be placed on the stack. This will also reduce
- the stack requirements.
-
- - Changed name of the CreateIndex routine to CreateIndexFile to
- emphasize the change to the parameters for the routine.
- Also, it clashed with a name which I want to use with TBASE.
-
- - Added DeleteIndexFile routine. This routine should always be
- used when deleting an index file.
-
- Version 1.5 - Changed code internally to use Inc and Dec where practical
-
- - Changed code internally to use newly added FastMove unit
-
- - Added UsingCursorAndGEValueGetLr routine
-
- - Changed the way records are added as the data file grows on
- inserts. As the file becomes larger, the number of records
- added at one time increases. This will speed up the insert
- process.
-
- - Added FindLrNumInBTree routine. It may prove useful for
- debugging or ensuring that an index is still in good shape.
-
- - Fixed error in GetEqualsFromBTree routine which would cause
- random failures on queries where the Condition was EQ
-
- - Fixed error in GetRangeFromBTree routine which caused
- problems when the upper range and lower range were the same
-
- Version 1.6 - Fixed an error which caused the wrong record(s) to be deleted
- when using the DeleteValueFromBTree routine. Error only
- occurred under a rare set of circumstances, but will not
- occur at all now.
-
- - I undid the third change of version 1.4 above. In essence,
- MAXVALSIZE is now equal to 245. This represents the largest
- number of bytes for an index entry. The reasons for this are
- somewhat complex, but each index node must be capable of
- holding at least two entries. There are 498 bytes available
- (after 14 are used for overhead). Any entry consists of the
- value plus a 4 byte record number. (245 + 4) * 2 = 498.
- This is well past the practical limit anyway. Keep the
- entries as small as possible. Anything over about 30 bytes
- is not very desirable. Remember, in a btree the number of
- levels is related to the size of the entries as well as the
- number of entries. Large entries will cause the tree to be
- extremely deep. This will greatly reduce the performance.
- To check the number of levels, use the NumberOfBTreeLevels
- routine provided. *)
-
- (*\*)
- (*////////////////////////// I N T E R F A C E //////////////////////////////*)
-
- interface
-
- uses
- Compare,
- FastMove,
- FileDecs,
- Files,
- Math,
- Numbers,
- LRecList,
- Page,
- Strings;
-
- const
- MAXVALSIZE = 245; (* max value size in index *)
-
- type
- VSizeType = 1 .. MAXVALSIZE; (* size range for index entries *)
-
-
- (*\*)
- (* This routine will create an index file with the file name as specified
- by iFName. The valSize parameter specifies the size of the index
- entries. The easiest way to determine this is to use the SizeOf
- function. The valType parameter specifies the type for the index
- entries. The types supported are those enumerated by the ValueType
- enumerated type.
-
- note - Extremely important - WARNING - for STRINGVALUE indexes only - the
- valSize must be 1 greater than the number of characters of the longest
- string. This will allow 1 byte for the string length to be stored.
- for example - if 'abc' is the longest string then valSize = 4. *)
-
- procedure CreateIndexFile(iFName : FnString;
- valSize : VSizeType;
- valType : ValueType);
-
-
- (* This routine will delete an index file. *)
-
- procedure DeleteIndexFile(iFName : FnString);
-
-
- (* This routine will insert a value and its associated logical record number
- into the given index file. This routine will guard against duplicate
- entries. An index should have no more than one occurence of any
- lrNum,paramValue pair (no two entries match on paramValue and lrNum). This
- routine assures this by calling DeleteValueFromBTree prior to performing
- the insert. This will get rid of a previous occurence if it exists. *)
-
- procedure InsertValueInBTree(iFName : FnString;
- lrNum : LRNumber;
- var paramValue);
-
-
- (* This routine will delete a value and its associated logical record number
- from a given index file. If the logical record number passed in is zero
- then all records with a matching paramValue will be deleted. Otherwise,
- only the entry with the matching paramValue and the matching logical record
- number will be deleted. *)
-
- procedure DeleteValueFromBTree(iFName : FnString;
- lrNum : LrNumber;
- var paramValue);
-
- (*\*)
- (* This routine will start at the root node and return the number of levels
- that exist in a BTree. The index file name is the only required input. *)
-
- function NumberOfBTreeLevels(iFName : FnString) : Byte;
-
-
- (* This routine will search an index and determine whether the given logical
- record number is in the index. If it is, TRUE is returned in found and the
- value associated with the logical record number is returned in paramValue.
- If it is not found, found will be returned as FALSE and paramValue will
- remain unchanged. This is primarily used for debugging or determining if
- an index has somehow been damaged. *)
-
- procedure FindLrNumInBTree(iFName : FnString;
- lrNum : LrNumber;
- var paramValue;
- var found : Boolean);
-
-
- (* This routine locates all values within the index which meet a condition
- (cond) and returns a list of logical record numbers corresponding to these
- values.
-
- note - In the case of cond = EX (exists) all entries in the index will be
- returned. paramValue is not really used in this case so anything (a dummy)
- can be passed in as the parameter corresponding to paramValue. *)
-
- procedure GetValuesFromBTree(iFName : FnString;
- var paramValue;
- cond : Condition;
- var lrLst : LrList);
-
-
- (* This routine locates all values within the index within a given range and
- returns a list of logical record numbers corresponding to values within
- that range. The range is determined by paramValue1 and paramValue2.
- paramValue1 must be less than paramValue2. cond1 and cond2 are used to
- determine if the range is inclusive or exclusive. cond1 must be either
- GE or GT and cond2 must be either LE or LT for this to work. If any of
- the above conditions is not true then an empty list will be returned. *)
-
- procedure GetRangeFromBTree(iFName : FnString;
- var paramValue1;
- cond1 : Condition;
- var paramValue2;
- cond2 : Condition;
- var lrLst : LrList);
-
- (*\*)
- (* This routine locates partial string matches in an index of type
- STRINGVALUE. A partial string match occurs when the string passed in as
- paramValue is contained within a string entry in the index. You must
- specify where in the string the match must occur. You accomplish this by
- using cond. cond can be CO which stands for contains. In this case, a
- match occurs if the string passed in as paramValue is anywhere in the
- string in the index. Another option for cond is ST which stands for
- starts. A match will occur in this instance if paramValue is located at
- the start of the string in the index. The last option for cond is EN which
- stands for ends. A match here occurs if paramValue matches the last
- characters in the index string. Be aware that if paramValue and the string
- being checked are the same length and they match exactly, then a match
- would occur if any of the three options were selected (not earth shattering
- but true nonetheless).
-
- Otherwise, this works like any of the other retrieval routines. A list of
- logical record numbers will be returned.
-
- This is exceptionally useful for many applications. For example, a field
- might be a 7 digit code stored as a string (STRINGVALUE). The last two
- digits might mean something in particular (part category, state, or
- whatever). Using this you can look for all the matches on only the last
- two digits.
-
- One reality note here!! - For cond <> ST I have to look at every entry in
- the index for matches. Why this is true should be reasonably obvious.
- Anyway, for cond = EN or cond = CO it will be somewhat slower than for cond
- = ST. How much slower depends on the size of the index. It boils down to
- the difference between a O(LOG(n)) algorithm and a O(n) algorithm, the
- former being much faster for n = a large number. If any of the above is
- confusing ignore it and experiment. It is still better to use this routine
- than to grovel through all of the data records yourself!! *)
-
- procedure GetSubstringFromBTree(iFName : FnString;
- var paramValue; (* must be a string var *)
- cond : StringCondition;
- var lrLst : LrList);
-
- (*\*)
- (* This routine is exactly like GetSubstringFromBTree except that matches only
- occur if the substring passed in as paramValue is located at the exact
- location specified by the position parameter. Also, the cond parameter is
- omitted since it only makes sense to look for a partial string match at a
- particular position if cond is CO. In other works, you will get a logical
- record list with entries for all index values which contain paramValue at
- the character position specified by position. *)
-
- procedure GetSubstringAtPositionFromBTree(iFName : FnString;
- var paramValue; (* must be a string
- var *)
- position : StringLengthRange;
- var lrLst : LrList);
-
-
- (* This routine will set the tree cursor to the front of the index. In
- other words, it will point to the first entry in the index. Remember, the
- index is ordered by the value of each entry. It will also return the
- logical record associated with the first entry in the index. It will
- return 0 only if there is no first entry (the index is empty). This
- routine should be called if you want to start at the beginning of an index
- and want to retrieve logical record numbers in order of entry. *)
-
- function UsingCursorGetFirstLr(iFName : FnString) : LrNumber;
-
-
- (* This routine will set the tree cursor to the end of the index. In other
- words, it will point to the last entry in the index. Remember, the index
- is ordered by the value of each entry. It will also return the logical
- record associated with the last entry in the index. It will return 0 only
- if there is no first entry (the index is empty). This routine should be
- called if you want to start at the end of an index and want to retrieve
- logical record numbers in order of entry. *)
-
- function UsingCursorGetLastLr(iFName : FnString) : LrNumber;
-
-
- (* This routine will set the tree cursor to the location in the index where
- the first occurence of the desired value (paramValue) is located. It will
- also return the logical record associated with this entry. It will return 0
- if there is no entry associated with this value. This routine should be
- called if you want to start at a certain location (at a certain value)
- within the index and want to retrieve logical record numbers in forward or
- reverse order. *)
-
- function UsingCursorAndValueGetLr(iFName : FnString;
- var paramValue) : LrNumber;
-
- (*\*)
- (* This routine is the same as UsingCursorAndValueGetLr except that this
- routine will set the tree cursor to the location of the first value in the
- index which is greater than or equal to paramValue. It will also return
- the logical record associated with this entry. It will return 0 if there
- is no entry which is greater than or equal to this value. *)
-
- function UsingCursorAndGEValueGetLr(iFName : FnString;
- var paramValue) : LrNumber;
-
- (* This routine will move the cursor to the right one entry and return the
- value associated with this entry. It will return 0 if the cursor was not
- valid (not pointing to an entry) or if there is no next entry (you are at
- end of index). This routine should be called if you want to move the
- cursor to the next larger entry from the present cursor position and
- retrieve the associated logical record number. This routine should not
- normally be used until the cursor has been positioned using one of the
- three previous positioning routines. *)
-
- function UsingCursorGetNextLr(iFName : FnString) : LrNumber;
-
-
- (* This routine will move the cursor to the left one entry and return the
- value associated with this entry. It will return 0 if the cursor was not
- valid (not pointing to an entry) or if there is no previous entry (you are
- at beginning of the index). This routine should be called if you want to
- move the cursor to the next smaller entry from the present cursor position
- and retrieve the associated logical record number. This routine should not
- normally be used until the cursor has been positioned using one of the
- three previous positioning routines. *)
-
- function UsingCursorGetPrevLr(iFName : FnString) : LrNumber;
-
-
- (* This routine will move the cursor to the right. It will move the cursor
- to the next entry in which the value is not equal to the current entry and
- return the associated logical record number. In other words, it will skip
- the cursor over all matching values. It will return 0 if the cursor was
- not valid (not pointing to an entry) or if there is no next entry (you are
- at beginning of the index). This routine should be used if you only want
- to process the first entry of a given value etc. This routine should not
- normally be used until the cursor has been positioned using one of the
- three previous positioning routines. *)
-
- function UsingCursorSkipAndGetNextLr(iFName : FnString) : LrNumber;
-
- (*\*)
- (* This routine will move the cursor to the left. It will move the cursor to
- the previous entry in which the value is not equal to the current entry and
- return the associated logical record number. In other words, it will skip
- the cursor over all matching values. It will return 0 if the cursor was
- not valid (not pointing to an entry) or if there is no previous entry (you
- are at beginning of the index). This routine should be used if you only
- want to process the first entry of a given value etc. This routine should
- not normally be used until the cursor has been positioned using one of the
- three previous positioning routines. *)
-
- function UsingCursorSkipAndGetPrevLr(iFName : FnString) : LrNumber;
-
-
- (* This routine will not move the cursor. It will return the logical record
- number asociated with the current cursor position. It will return 0 only
- if the current cursor position is not valid. *)
-
- function UsingCursorGetCurrLr(iFName : FnString) : LrNumber;
-
-
- (* This routine will set the cursor to invalid. This is never required,
- but can be used once the cursor use is completed and the cursor won't be
- used until it is repositioned using one of the three positioning
- routines. Using this routine will slightly speed up inserts and deletes.
- This is because, on an insert or delete, the cursor position must be
- kept correct if the cursor is valid. This requires a small amount of
- extra processing. This processing is extraneous if you don't care about
- the cursor position. *)
-
- procedure UsingCursorMakeCursorInvalid(iFName : FnString);
-
-
-
- (*!*)
- (*\*)
- (*///////////////////// I M P L E M E N T A T I O N /////////////////////////*)
-
- implementation
-
- {$I btree1.inc} (* most of the implementation part of the btree unit *)
-
- {$I btree2.inc} (* the rest of the implementation of the btree unit *)
-
- {$I btree3.inc} (* the newly added cursor routines *)
-
- begin
- mustMoveCursor := FALSE;
- end. (* end of BTree unit *)