Power-Programmierung

Pascal/Delphi Source File | 1989-07-13 | 27.1 KB | 513 lines

(* TBTree16             Copyright (c)  1988,1989       Dean H. Farwell II    *)

unit Btree;

(*****************************************************************************)
(*                                                                           *)
(*                     B T R E E   R O U T I N E S                           *)
(*                                                                           *)
(*****************************************************************************)

(*  These routines are responsible for handling the indexes. The scheme used
    is Knuth's variation of B-Trees discussed on page 64 of "Database Systems
    Volume I Fourth Edition" by C.J. Date. There are two different types of
    nodes in the tree. The upper level(s) are index nodes and fill the role
    of the normal B-Tree. The lowest level nodes are sequence nodes and
    represent a dense index to the file. The entries in the sequence node
    are sorted and thus provide an alternate means to access information in
    the file. This allows rapid sorted retrievals without the need to keep
    the data file sorted.

    The internal workings of an index are not really important, but you need
    to be aware of what an index contains and what operations can be
    performed using an index. An index is primarily used to do retrievals
    using the value of one or more fields of a data record. An index
    contains values which are the same as values of a field in a data file.
    Each value in an index has a corresponding logical record number which
    shows which logical record corresponds to that value.

    In many instances, a field will not be unique. This means that more than
    one record may have the same value for a given field. The index will
    handle these properly. However, it would not make sense to have multiple
    values for the same field for a single logical record number. This would
    imply that a data record field could have two or more values at the same
    time, which it obviously cannot. This would occur only if you were
    updating a value for the given record number and failed to delete the
    old entry from the index. The documentation contains an example of an
    update which will keep this from happening.
    It is also important to realize that each index can handle only one data
    type. The valid data types are defined by the ValueType type. The data
    type of the values in the index is specified at the time that the index
    is created and is never changed.

    The first step in using an index is to create one. There is a routine
    (CreateIndexFile) especially for this purpose. It will create the root
    and first sequence nodes as well as the internal bitmap. This bitmap is
    used to keep track of which index nodes (physical records) are currently
    in use and which are available. These bitmaps are maintained internally
    as part of the index file itself. This removes the need to maintain
    separate bitmap files and greatly simplifies things.

    When creating a file, you must supply the file name, the data type and
    the size of the entries. The path and the extension for the file name
    are optional. For example, the path can be used to ensure that the files
    will always be in one place even though the program is run from other
    places. The extension can help in your own bookkeeping. For example,
    all data files could use an extension of '.dat' and all index files
    could use '.idx'. This is completely up to you and your application.

    Once you have created an index, you will want to put value/logical
    record number pairs into the index and retrieve them back out. One
    routine (InsertValueInBTree) is supplied to insert a value and
    corresponding logical record number. Likewise, there is one routine
    (DeleteValueFromBTree) supplied to delete a value/logical record number
    pair. There is no routine for performing an update. Instead, you will
    perform an update by doing a delete followed by an insert.

    Retrieving the logical record number(s) corresponding to values is the
    primary purpose of the indexes and is discussed now. There are actually
    two distinct ways to do this. The more powerful method is to use the
    routines available in BTREE2.INC.
    Each of these routines will perform an index search and will result in
    the creation of a logical record list. A logical record list is nothing
    more than a list of logical record numbers. These lists are fully
    explained in the LRECLIST.PAS file and are key to the power provided by
    TBTREE. You will want to fully understand how to create and access
    these lists.

    A second way to access the index is to use an internal (to the index)
    cursor. This cursor should not be confused with the cursor which is
    kept as part of the logical record lists discussed above. These index
    cursors are discussed in depth in BTREE3.INC.

    To understand either method of retrieval, you need to have the proper
    framework for understanding the index. Even though the indexes are
    maintained as BTREEs, you should view them as a sequential file of
    value/logical record number pairs. This sequential file is ordered by
    value in ascending order. Think of the pairs laid out horizontally with
    the lower values on the left. Moving forward through the index means
    that larger values are accessed. The first entry is the lowest value,
    etc. Hopefully, you get the picture.                                    *)

(*\*)
(* Version Information

   Version 1.1 - Corrected error in InsertValueInBTree routine. I was not
                 checking for duplicate entries. This has been resolved.

               - Added GetRangeFromBTree routine

               - Added GetSubstringFromBTree routine

               - Divided the source code into 3 source files -
                                  BTREE.PAS
                                  BTREE1.INC
                                  BTREE2.INC

   Version 1.2 - No Changes

   Version 1.3 - Added NumberOfBTreeLevels routine

   Version 1.4 - HUGE CHANGE - I no longer keep the data file name as part
                 of the index file parameter record. It was not used, and
                 I foresee no real use for it in the future. I thought
                 that I would need it for my TBASE product, but I won't.
                 This will mean a change to anyone who has applications
                 that they have written using TBTREE version 1.3 or
                 earlier. Any calls to the old CreateIndex routine will
                 need to be changed. dataFName is no longer a parameter
                 for that routine.
                 Hopefully, this will not create a great inconvenience. I
                 just was tired of carrying around the extra baggage.

               - HUGE CHANGE - The bitmap files are a thing of the past.
                 The bitmaps are now maintained as part of the index files
                 themselves. This has caused a complete restructuring of
                 the index file parameter record and the index files
                 themselves.

                 !!!!! IMPORTANT !!!!!
                 This means that any index files (BTREEs) created using
                 versions prior to 1.4 will need to be rebuilt. This is
                 discussed in the documentation under the version
                 information.

                 I apologize for the inconvenience. However, the sooner I
                 make these kinds of changes, the better. This will
                 simplify your use of the routines and cut in half the
                 number of files to be dealt with (since each index and
                 data file previously had a bitmap).

               - Corrected an error related to maximum value size (size of
                 an entry) for an index. The documentation states that the
                 maximum size for an entry in an index is 494 bytes.
                 However, an error limited this to 255 bytes. This is
                 because only one byte was allocated in the parameter
                 record for the value size. This has been corrected.
                 However, in practical terms, you want to keep the index
                 entries small. This change required a change to the
                 parameter record definition.

               - Added a second way to retrieve information from the B-Tree
                 indexes. Rather than always creating a logical record
                 list when retrievals are required, you can now work with
                 the index more directly. The BTREE3.INC file contains
                 many routines which manipulate an internal cursor which
                 can be used to access logical record numbers in the index.
                 The newly added BTREE3.INC file has more details.

               - Added GetSubstringAtPositionFromBTree routine. This
                 routine allows more flexibility when doing partial string
                 matches on an index. It is exceptionally handy if you
                 have an index on a string field which really is made of
                 several sub-fields. Something like phone number is a good
                 example.
                 You might be interested in only phone numbers which have a
                 seven as the second digit, or whatever.

               - Made several changes internally to increase performance.
                 One primary change was to use a Binary Search technique to
                 find entries in nodes at all levels. Also, VAR parameters
                 are used internally when a parameter is large. For
                 instance, when a page is passed to a routine as a
                 parameter, it is passed as a VAR parameter. Therefore,
                 the entire 512 bytes do not have to be placed on the
                 stack. This will also reduce the stack requirements.

               - Changed name of the CreateIndex routine to CreateIndexFile
                 to emphasize the change to the parameters for the routine.
                 Also, it clashed with a name which I want to use with
                 TBASE.

               - Added DeleteIndexFile routine. This routine should always
                 be used when deleting an index file.

   Version 1.5 - Changed code internally to use Inc and Dec where practical

               - Changed code internally to use newly added FastMove unit

               - Added UsingCursorAndGEValueGetLr routine

               - Changed the way records are added as the data file grows
                 on inserts. As the file becomes larger, the number of
                 records added at one time increases. This will speed up
                 the insert process.

               - Added FindLrNumInBTree routine. It may prove useful for
                 debugging or ensuring that an index is still in good
                 shape.

               - Fixed error in GetEqualsFromBTree routine which would
                 cause random failures on queries where the Condition was
                 EQ

               - Fixed error in GetRangeFromBTree routine which caused
                 problems when the upper range and lower range were the
                 same

   Version 1.6 - Fixed an error which caused the wrong record(s) to be
                 deleted when using the DeleteValueFromBTree routine.
                 Error only occurred under a rare set of circumstances, but
                 will not occur at all now.

               - I undid the third change of version 1.4 above. In
                 essence, MAXVALSIZE is now equal to 245. This represents
                 the largest number of bytes for an index entry.
                 The reasons for this are somewhat complex, but each index
                 node must be capable of holding at least two entries.
                 There are 498 bytes available (after 14 are used for
                 overhead). Any entry consists of the value plus a 4 byte
                 record number. (245 + 4) * 2 = 498. This is well past
                 the practical limit anyway. Keep the entries as small as
                 possible. Anything over about 30 bytes is not very
                 desirable. Remember, in a btree the number of levels is
                 related to the size of the entries as well as the number
                 of entries. Large entries will cause the tree to be
                 extremely deep. This will greatly reduce the performance.
                 To check the number of levels, use the NumberOfBTreeLevels
                 routine provided.                                           *)

(*\*)
(*////////////////////////// I N T E R F A C E //////////////////////////////*)

interface

uses
    Compare,
    FastMove,
    FileDecs,
    Files,
    Math,
    Numbers,
    LRecList,
    Page,
    Strings;

const
    MAXVALSIZE = 245;                            (* max value size in index *)

type
    VSizeType = 1 .. MAXVALSIZE;          (* size range for index entries *)

(*\*)
(* This routine will create an index file with the file name as specified
   by iFName. The valSize parameter specifies the size of the index
   entries. The easiest way to determine this is to use the SizeOf
   function. The valType parameter specifies the type for the index
   entries. The types supported are those enumerated by the ValueType
   enumerated type.

   note - Extremely important - WARNING - for STRINGVALUE indexes only - the
   valSize must be 1 greater than the number of characters of the longest
   string. This will allow 1 byte for the string length to be stored.
   for example - if 'abc' is the longest string then valSize = 4.           *)

procedure CreateIndexFile(iFName : FnString;
                          valSize : VSizeType;
                          valType : ValueType);


(* This routine will delete an index file.                                   *)

procedure DeleteIndexFile(iFName : FnString);


(* This routine will insert a value and its associated logical record number
   into the given index file. This routine will guard against duplicate
   entries.
   An index should have no more than one occurrence of any lrNum,paramValue
   pair (no two entries match on paramValue and lrNum). This routine
   assures this by calling DeleteValueFromBTree prior to performing the
   insert. This will get rid of a previous occurrence if it exists.         *)

procedure InsertValueInBTree(iFName : FnString;
                             lrNum : LRNumber;
                             var paramValue);


(* This routine will delete a value and its associated logical record number
   from a given index file. If the logical record number passed in is zero
   then all records with a matching paramValue will be deleted. Otherwise,
   only the entry with the matching paramValue and the matching logical
   record number will be deleted.                                            *)

procedure DeleteValueFromBTree(iFName : FnString;
                               lrNum : LrNumber;
                               var paramValue);

(*\*)
(* This routine will start at the root node and return the number of levels
   that exist in a BTree. The index file name is the only required input.   *)

function NumberOfBTreeLevels(iFName : FnString) : Byte;


(* This routine will search an index and determine whether the given logical
   record number is in the index. If it is, TRUE is returned in found and
   the value associated with the logical record number is returned in
   paramValue. If it is not found, found will be returned as FALSE and
   paramValue will remain unchanged. This is primarily used for debugging
   or determining if an index has somehow been damaged.                      *)

procedure FindLrNumInBTree(iFName : FnString;
                           lrNum : LrNumber;
                           var paramValue;
                           var found : Boolean);


(* This routine locates all values within the index which meet a condition
   (cond) and returns a list of logical record numbers corresponding to
   these values.

   note - In the case of cond = EX (exists) all entries in the index will be
   returned. paramValue is not really used in this case so anything (a
   dummy) can be passed in as the parameter corresponding to paramValue.
                                                                             *)

procedure GetValuesFromBTree(iFName : FnString;
                             var paramValue;
                             cond : Condition;
                             var lrLst : LrList);


(* This routine locates all values within the index within a given range and
   returns a list of logical record numbers corresponding to values within
   that range. The range is determined by paramValue1 and paramValue2.
   paramValue1 must be less than paramValue2. cond1 and cond2 are used to
   determine if the range is inclusive or exclusive. cond1 must be either
   GE or GT and cond2 must be either LE or LT for this to work. If any of
   the above conditions is not true then an empty list will be returned.    *)

procedure GetRangeFromBTree(iFName : FnString;
                            var paramValue1;
                            cond1 : Condition;
                            var paramValue2;
                            cond2 : Condition;
                            var lrLst : LrList);

(*\*)
(* This routine locates partial string matches in an index of type
   STRINGVALUE. A partial string match occurs when the string passed in as
   paramValue is contained within a string entry in the index. You must
   specify where in the string the match must occur. You accomplish this by
   using cond.

   cond can be CO which stands for contains. In this case, a match occurs
   if the string passed in as paramValue is anywhere in the string in the
   index. Another option for cond is ST which stands for starts. A match
   will occur in this instance if paramValue is located at the start of the
   string in the index. The last option for cond is EN which stands for
   ends. A match here occurs if paramValue matches the last characters in
   the index string. Be aware that if paramValue and the string being
   checked are the same length and they match exactly, then a match would
   occur if any of the three options were selected (not earth shattering but
   true nonetheless).

   Otherwise, this works like any of the other retrieval routines. A list
   of logical record numbers will be returned. This is exceptionally useful
   for many applications. For example, a field might be a 7 digit code
   stored as a string (STRINGVALUE).
   The last two digits might mean something in particular (part category,
   state, or whatever). Using this you can look for all the matches on only
   the last two digits.

   One reality note here!! - For cond <> ST I have to look at every entry in
   the index for matches. Why this is true should be reasonably obvious.
   Anyway, for cond = EN or cond = CO it will be somewhat slower than for
   cond = ST. How much slower depends on the size of the index. It boils
   down to the difference between an O(log n) algorithm and an O(n)
   algorithm, the former being much faster for large n. If any of the above
   is confusing ignore it and experiment. It is still better to use this
   routine than to grovel through all of the data records yourself!!        *)

procedure GetSubstringFromBTree(iFName : FnString;
                                var paramValue;   (* must be a string var *)
                                cond : StringCondition;
                                var lrLst : LrList);

(*\*)
(* This routine is exactly like GetSubstringFromBTree except that matches
   only occur if the substring passed in as paramValue is located at the
   exact location specified by the position parameter. Also, the cond
   parameter is omitted since it only makes sense to look for a partial
   string match at a particular position if cond is CO. In other words, you
   will get a logical record list with entries for all index values which
   contain paramValue at the character position specified by position.      *)

procedure GetSubstringAtPositionFromBTree(iFName : FnString;
                                var paramValue;   (* must be a string var *)
                                position : StringLengthRange;
                                var lrLst : LrList);


(* This routine will set the tree cursor to the front of the index. In
   other words, it will point to the first entry in the index. Remember,
   the index is ordered by the value of each entry. It will also return the
   logical record associated with the first entry in the index. It will
   return 0 only if there is no first entry (the index is empty).
   This routine should be called if you want to start at the beginning of an
   index and want to retrieve logical record numbers in order of entry.     *)

function UsingCursorGetFirstLr(iFName : FnString) : LrNumber;


(* This routine will set the tree cursor to the end of the index. In other
   words, it will point to the last entry in the index. Remember, the index
   is ordered by the value of each entry. It will also return the logical
   record associated with the last entry in the index. It will return 0
   only if there is no last entry (the index is empty). This routine should
   be called if you want to start at the end of an index and want to
   retrieve logical record numbers in reverse order.                         *)

function UsingCursorGetLastLr(iFName : FnString) : LrNumber;


(* This routine will set the tree cursor to the location in the index where
   the first occurrence of the desired value (paramValue) is located. It
   will also return the logical record associated with this entry. It will
   return 0 if there is no entry associated with this value. This routine
   should be called if you want to start at a certain location (at a certain
   value) within the index and want to retrieve logical record numbers in
   forward or reverse order.                                                 *)

function UsingCursorAndValueGetLr(iFName : FnString;
                                  var paramValue) : LrNumber;

(*\*)
(* This routine is the same as UsingCursorAndValueGetLr except that this
   routine will set the tree cursor to the location of the first value in
   the index which is greater than or equal to paramValue. It will also
   return the logical record associated with this entry. It will return 0
   if there is no entry which is greater than or equal to this value.       *)

function UsingCursorAndGEValueGetLr(iFName : FnString;
                                    var paramValue) : LrNumber;


(* This routine will move the cursor to the right one entry and return the
   logical record number associated with this entry. It will return 0 if
   the cursor was not valid (not pointing to an entry) or if there is no
   next entry (you are at end of index).
   This routine should be called if you want to move the cursor to the next
   larger entry from the present cursor position and retrieve the associated
   logical record number. This routine should not normally be used until
   the cursor has been positioned using one of the positioning routines
   above.                                                                    *)

function UsingCursorGetNextLr(iFName : FnString) : LrNumber;


(* This routine will move the cursor to the left one entry and return the
   logical record number associated with this entry. It will return 0 if
   the cursor was not valid (not pointing to an entry) or if there is no
   previous entry (you are at beginning of the index). This routine should
   be called if you want to move the cursor to the next smaller entry from
   the present cursor position and retrieve the associated logical record
   number. This routine should not normally be used until the cursor has
   been positioned using one of the positioning routines above.              *)

function UsingCursorGetPrevLr(iFName : FnString) : LrNumber;


(* This routine will move the cursor to the right. It will move the cursor
   to the next entry in which the value is not equal to the current entry
   and return the associated logical record number. In other words, it will
   skip the cursor over all matching values. It will return 0 if the cursor
   was not valid (not pointing to an entry) or if there is no next entry
   (you are at end of the index). This routine should be used if you only
   want to process the first entry of a given value etc. This routine
   should not normally be used until the cursor has been positioned using
   one of the positioning routines above.                                    *)

function UsingCursorSkipAndGetNextLr(iFName : FnString) : LrNumber;

(*\*)
(* This routine will move the cursor to the left. It will move the cursor
   to the previous entry in which the value is not equal to the current
   entry and return the associated logical record number. In other words,
   it will skip the cursor over all matching values.
   It will return 0 if the cursor was not valid (not pointing to an entry)
   or if there is no previous entry (you are at beginning of the index).
   This routine should be used if you only want to process the first entry
   of a given value etc. This routine should not normally be used until the
   cursor has been positioned using one of the positioning routines above.  *)

function UsingCursorSkipAndGetPrevLr(iFName : FnString) : LrNumber;


(* This routine will not move the cursor. It will return the logical record
   number associated with the current cursor position. It will return 0
   only if the current cursor position is not valid.                         *)

function UsingCursorGetCurrLr(iFName : FnString) : LrNumber;


(* This routine will set the cursor to invalid. This is never required, but
   can be used once the cursor use is completed and the cursor won't be used
   until it is repositioned using one of the positioning routines. Using
   this routine will slightly speed up inserts and deletes. This is
   because, on an insert or delete, the cursor position must be kept correct
   if the cursor is valid. This requires a small amount of extra
   processing. This processing is extraneous if you don't care about the
   cursor position.                                                          *)

procedure UsingCursorMakeCursorInvalid(iFName : FnString);

(*!*)
(*\*)
(*///////////////////// I M P L E M E N T A T I O N /////////////////////////*)

implementation

{$I btree1.inc}         (* most of the implementation part of the btree unit *)
{$I btree2.inc}         (* the rest of the implementation of the btree unit  *)
{$I btree3.inc}         (* the newly added cursor routines                   *)

begin
mustMoveCursor := FALSE;
end.                                                    (* end of BTree unit *)
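
(* The comments in this unit ask you to view an index as a sequential file
   of value/logical record number pairs in ascending value order, walked
   with a cursor that returns 0 when there is nothing to return, and updated
   by a delete followed by an insert. The stand-alone program below is a
   minimal sketch of that mental model only. It is NOT TBTREE code and says
   nothing about how the B-Tree is implemented: the names IndexModel,
   InsertPair, DeletePair, GetFirstLr, GetNextLr, SkipAndGetNextLr and
   PrintInOrder, the Integer value type, the LongInt record number and the
   fixed-size array are all assumptions made for illustration. Unlike the
   real routines, the model simply invalidates the cursor on every insert,
   on every delete, and when a move runs off the end.                        *)

```pascal
program IndexModel;
{$B-}  (* short-circuit boolean evaluation; the loops below rely on it *)

const
  MAXENTRIES = 16;                 (* arbitrary capacity for this sketch *)

type
  LrNumber = LongInt;              (* assumed representation; 0 = none *)
  Entry = record
    value : Integer;
    lrNum : LrNumber;
  end;

var
  entries : array[1 .. MAXENTRIES] of Entry;
  count   : Integer;
  cursor  : Integer;               (* 0 means the cursor is not valid *)
  lr      : LrNumber;

(* Insert a pair, keeping entries ordered by value, then by lrNum. *)
procedure InsertPair(v : Integer; newLr : LrNumber);
var
  i : Integer;
begin
  if count >= MAXENTRIES then Exit;
  i := count;
  while (i >= 1) and ((entries[i].value > v) or
        ((entries[i].value = v) and (entries[i].lrNum > newLr))) do
  begin
    entries[i + 1] := entries[i];
    Dec(i);
  end;
  entries[i + 1].value := v;
  entries[i + 1].lrNum := newLr;
  Inc(count);
  cursor := 0;
end;

(* Delete the matching pair; oldLr = 0 deletes every entry equal to v. *)
procedure DeletePair(v : Integer; oldLr : LrNumber);
var
  i, kept : Integer;
begin
  kept := 0;
  for i := 1 to count do
    if not ((entries[i].value = v) and
            ((oldLr = 0) or (entries[i].lrNum = oldLr))) then
    begin
      Inc(kept);
      entries[kept] := entries[i];
    end;
  count := kept;
  cursor := 0;
end;

function GetFirstLr : LrNumber;
begin
  GetFirstLr := 0;
  cursor := 0;
  if count > 0 then
  begin
    cursor := 1;
    GetFirstLr := entries[1].lrNum;
  end;
end;

function GetNextLr : LrNumber;
begin
  GetNextLr := 0;
  if (cursor = 0) or (cursor >= count) then
    cursor := 0
  else
  begin
    Inc(cursor);
    GetNextLr := entries[cursor].lrNum;
  end;
end;

(* Move right past every entry whose value equals the current one. *)
function SkipAndGetNextLr : LrNumber;
var
  v : Integer;
begin
  SkipAndGetNextLr := 0;
  if cursor = 0 then Exit;
  v := entries[cursor].value;
  while (cursor <= count) and (entries[cursor].value = v) do
    Inc(cursor);
  if cursor > count then
    cursor := 0
  else
    SkipAndGetNextLr := entries[cursor].lrNum;
end;

procedure PrintInOrder;
var
  cur : LrNumber;
begin
  cur := GetFirstLr;
  while cur <> 0 do
  begin
    Write(cur, ' ');
    cur := GetNextLr;
  end;
  WriteLn;
end;

begin
  count := 0;
  cursor := 0;
  InsertPair(30, 3);
  InsertPair(10, 1);
  InsertPair(20, 2);
  InsertPair(20, 5);
  PrintInOrder;                    (* record numbers by ascending value *)

  lr := GetFirstLr;                (* at value 10 *)
  lr := SkipAndGetNextLr;          (* first entry with value 20 *)
  lr := SkipAndGetNextLr;          (* skips the other 20, lands on 30 *)
  WriteLn(lr);

  DeletePair(20, 2);               (* update record 2: delete old value *)
  InsertPair(40, 2);               (* ... then insert the new value *)
  PrintInOrder;
end.
```

(* Run as written, the model prints the record numbers 1 2 5 3, then 3,
   then 1 5 3 2. Record 2 moves to the end of the sequence after its value
   changes from 20 to 40, which is the effect the delete-then-insert update
   described above is meant to have on a real index.                         *)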