Programmer 7500

home *** CD-ROM | disk | FTP | other *** search

/ Programmer 7500 / MAX_PROGRAMMERS.iso / PASCAL / TBTREE.ZIP / TBTREE13.DOC < prev next >

Wrap

Text File | 1988-07-31 | 76.0 KB | 1,453 lines

TBTREE13 Copyright (c) 1988 Dean H. Farwell II TBTREE13 ( Turbo BTree version 1.3 ) Yet another implementation of BTrees for Turbo Pascal 4.0 Copyright (c) 1988 Dean H. Farwell II All source code and documentation are considered to be shareware and are copyrighted. The following rules of engagement apply to all documentation and code which make up TBTREE13: All users can and are encouraged to distribute this software freely. This includes the uploading of this software onto applicable Bulletin Boards etc. The entire TBTREE13 product must be distributed as a complete product. Pieces can not be distributed individually. The documentation must be distributed with the source code. No copyright statements may be modified. All users can and are encouraged to use these utilities to develop and/or enhance their private, shareware, or commercial software applications. All users are encouraged to suggest enhancements, make suggestions and generally lead to the development of a better product to the benefit of all. Users may change the code and documentation as desired for their own use but may not then distribute those changes. Users may distribute these routines as part of another SHAREWARE package. For instance, a user could develop a DBMS using this as a basis and can freely distribute the source code provided that: 1. None of this documentation or code is changed 2. ALL code and documentation is distributed (not just parts) 3. The user is not selling or distributing this for profit 4. The user be registered. The restriction that the code and documentation not be changed is to ensure that I can maintain some control over version releases. I do 1 TBTREE13 Copyright (c) 1988 Dean H. Farwell II not want several versions running around that I had nothing to do with. The bottom line is that you can not sell this or distribute it for profit (except for a small fee not to exceed $10 to cover cost of mailing, duplication and handling). A user developing a commercial application can not distribute the SOURCE code or documentation but a user can use these routines to compile a commercial application. Users are encouraged to register their copy of this software. The registration fee is a very nominal $15. The registration is not going to make me rich but will let me know if anybody out there is at all interested in this. I intend to support this product and add many capabilities. I am looking for registered owners to provide feedback which will help me prioritize enhancements. If few people register then I will assume that there is not wide spread interest in this product. Anyone using this software to develop a commercial application must register and state their intentions to use it in a commercial application. I will not charge royalties (except for the registration fee). I just want to know what people are doing with it. Also, if someone is distributing it as part of a shareware product, the same restrictions apply. Most of all I want any feedback that you have the time and desire to submit. Suggestions, complaints, etc are welcome. I especially want feedback on errors. I do not know of any bugs (or I'd fix them) but I probably missed an item or two. Send me notice immediately so that I can distribute a fix if required. Since you have the source code, do your best to isolate the problem. I'll try to take it from there. To reach me use Compuserve ID - 73240,3335 or drop me a line at: 1834 Windsor Downs Ct Montgomery AL 36117 My phone number is: 205-279-9552 For hopefully obvious reasons, I will not accept collect calls. Otherwise, feel free to call. As a final motivation to register, all registered users will receive one upgrade free (via mail - no postage due or anything!) when it becomes available. I will mail these well before I put the version out on CompuServe or anywhere else. The registered users will, in effect, become beta testers to make sure that all is well before I release to the world. After the first upgrade, the user can request to receive the next upgrade for a nominal $5 to cover postage , etc. This fee will ensure that you always have the latest version and is probably less than the cost to download off of a bulletin board. I will do my best to not release trivial changes but only significant upgrades. Liability: 2 TBTREE13 Copyright (c) 1988 Dean H. Farwell II This software is not warranted. There are no warranties expressed or implied as to the quality, correctness, or applicability of this software. The author is not responsible for any loss or damages due to the use of any software or documentation distributed as TBTREE13. Product Description TBTREE13 is version 1.1 of a shareware btree and database product for use with Turbo Pascal 4.0. It is intended to give a user the capabilities to develop pascal applications for the creation and manipulation of databases. This product is extremely powerful and FAST!!! However, it is not the easiest thing to get a grasp on. If you understand the concept of btrees or at least indexing then this will not be complicated to use. I guarantee that it is the best product of its type for the money and maybe the best period. If you have any feedback on how I can take the mystery out of using this please let me know. I am presently working on a second product which will be called TBASE and will use these routines as the database engine but will have a much easier to understand interface for code developers. It will use SQL like calls instead of those used in this product. All registered users of TBTREE13 will be shipped a beta version of TBASE as soon as one becomes available and will get the first whack at using it. This in itself is worth the $15. Although this product provides many of the same general database capabilities as Borland's Turbo Pascal Database Toolbox it is not intended to be compatible. There are several reasons for this. First, I do not want to create any impression of any type of copyright infringement. Secondly, I needed to implement things differently than they were implemented in Borland's product to be able to do the enhancements I am planning. Thirdly, I wanted to implement things that Borland does not. For instance, I support data types in my indexes. Other differences will be more apparent as you read through this documentation. Lastly, I did not have a copy of Borland's product when I was doing this. This really started out as more of an academic exercise than something that I was going to release for others. This turned out to be a more extensive job than I thought (big surprise there, huh?). I figured that I might as well make it available for others. This product has been greatly enhanced since version 1.0 was released. As it presently stands, it has the following features: 1. The ability to create data files and indexes for those files. 2. Any number of indexes can be used for a single data file. 3. Any number of data files can be used. 4. All data files and index files use a virtual memory scheme in which the most recently used pages are kept in memory using the heap. This is different than many other implementations which use a buffer or stack only for index pages. The buffer is shared by all files. In this way separate space does not need to be set aside for different files. Space utilization is maximized and performance is enhanced. 3 TBTREE13 Copyright (c) 1988 Dean H. Farwell II 5. All management of pages in memory is automatically controlled although the user can exercise limited control if desired. 6. Routines are provided to dump the page buffer to the printer to facilitate viewing pages if a user is curious or masochistic. 7. The size of the buffer is determined by the user and can be changed dynamically as the program runs. For example a 128K buffer could be used initially. If a user had a large heap requirement the buffer size could be reduced. The buffer size could later be increased if desired. This can all be done while the application is running. 8. Problems related to keeping track of numbers of open files are alleviated. The user initially specifies the maximum number of files allowed to be open at once and never has to address it again. Routines are provided to manage the opening and closing of files dynamically. This increases performance because files do not have to be continually opened and closed unnecessarily. 9. All data types supported by Turbo Pascal are explicitly supported. In other words, values do not have to be converted to strings. This is a great improvement over previous systems (MY UNBIASED OPINION). For example an index can be set up for type Byte, another for type ShortInt, another of type String and even for type Real (but wait you say .. Reals in an index?). See next note! 10. Indexes can be searched for equality, inequality, greater than, greater than or equal to, less than, less than or equal to, and existence. In other words you can ask for a list of all records for which some field of type real is LE (less than or equal to) XXX where XXX is a var of type real and XXX := 3.141592 etc. An index can also searched on a range of values. For instance an index of type byte can be checked for a range >= 10 and < 20 etc. Indexes containing strings can be searched for partial string matches. Partial string matches can be done three ways: find a string which is at the beginning of a string, anywhere in the string, or at the end of a string. 11. As physical records are deleted space is reused automatically on future inserts. Files should never have to be reorganized to recover space (unless a large number of deletes were performed without any inserts and even in this case there is not really a good reason to reorganize unless your disk is filling up). 12. Included with the package is a very crude but effective source code printing routine which can be used to dump the code in a more usable format. Specifically, it handles page breaks and prints a header of useful information. 13. The user does not need to perform any calculations or make any estimates prior to data file or index creation. 14. Code was designed using an Object Oriented Design approach to a large extent (Turbo 4.0 lends itself to this approach much more than Turbo Pascal 3.0 did). This is one reason that there are so many units. Much of the code is reusable for other applications. 15. Added to the product is a sorting capability. Sorts can be accomplished using any number of fields in any order of precedence as 4 TBTREE13 Copyright (c) 1988 Dean H. Farwell II determined by you. A Quicksort algorithm is used to give high performance. Sort fields can be any type supported (Byte, Real, Integer, String etc). 16. Added the following set operations: Union, Intersection and Difference. These allow easy queries on multiple fields. For example, retrieving all personnel who have a last name starting with 'N' and who's job is 'MANAGER', etc is greatly simplified. This enhancement is a definite plus and not available in many other products. List Of Files The following files are provided as part of the TBTREE13 product: Units - Source Code For All Units BTREE.PAS - This unit contains the interface and implementation for the btrees themselves. Although the implementation is fairly complex, only a few routines are visible to the user and give the user all the functionality required. The actual working of the btree unit itself is not critical to the user. The user is provided with routines to add an entry to an index, delete one entry from an index, delete all entries with a matching key value from an index and also retrieve record numbers which match some selection criterion. The ability to create an index is also provided. To delete an index just delete the file using a routine provided in FILES.PAS. The BTREE unit is different than any other unit in the product in that it consists of not one but three source files. See entry for BTREE1.INC and BTREE2.INC. BTREE1.INC and BTREE2.INC - These are additional source files needed for the BTREE unit. I split up the source code for the unit because the source code ended up greater than 64K bytes. The two additional files are include files and will automatically be compiled when BTREE.PAS is compiled. COMPARE.PAS - This unit contains no inherent object although a couple of types are declared and put in the interface. It is primarily made of one routine which will give the capability of comparing two values of the same type using one routine regardless of that type. The address of the two values and the type of the values are passed in at run time. This routine alleviates the need for the user to develop a routine which can be used for comparisons. All Turbo Pascal 4.0 scaler types are handled. Strings are the only structured (composite) type handled. Several other routines were added to facilitate partial string searches. FILEBUFF.PAS - This unit contains the interface and implementation for a files open list. This list will contain information on all files opened (using this unit). A user can specify how many files can be on the list at once (how many the unit can allow to be open). The unit takes it from there. From then on all requests to use a file must be preceded by a call to the unit. The unit will determine if 5 TBTREE13 Copyright (c) 1988 Dean H. Farwell II the file is open and will open it if it is not. The routine is exceptionally useful for alleviating the need to open and close files for every operation. Continually opening and closing files causes unacceptable performance problems. The file types handled are only Untyped files and Text files. Typed files are not currently handled. This is due to the fact that strong type checking precludes the use of typed files (or at least I couldn't break the code and figure out how to make it work. If anyone can figure out how to make it work, it would make this even more useful). FILEDECS.PAS - This unit is nothing more than a collection of declarations required to support several of the other units. There is no code in the implementation section. FILES.PAS - This unit contains the interface and implementation for the manipulation of files. This fulfills the role of file manager (mid level management of files as opposed to the low level management found in FILEBUFF.PAS). It is the place where the different file types required in TBTREE13 are created, deleted, and names are adjusted (extensions are attached). Additionally, there are several routines to set records to used and to check the status of records and to determine what the last record in use is. This is the only place where Bitmap files are manipulated (created, referenced, updated and deleted). HEX.PAS - This unit contains the interface and implementation for dealing with a hex array which is a 2 byte array used for storing a hex character (such as 7F). A couple of routines are provided for converting bytes and bits to hex array representations. LOGICAL.PAS - This unit contains the interface and implementation for dealing with the conversion of logical data records to physical records. In TBTREE13 all physical records (records stored on disk) are the same size (512 bytes). This is how all files can share the same page buffer (See PAGE.PAS). However, the logical records (data records in the data files) can be any size from one byte up to 65520 (although they will not be that large in practice). This unit takes the users data records stored in a variable and puts them in the number of physical records required. It also takes the appropriate data from physical record(s) and puts it in the variable of the users choice. A great deal of flexibility is available here. One caution however! TBTREE13 uses FIXED LENGTH LOGICAL RECORDS. All data records (within one data file) must be of the same size (obviously each data file can have a different logical record size). If variable length data is required then the size of the logical record is the largest number of bytes required as determined by the user. A good example of this is a record containing a string declared as String[20]. The logical record size is 21 bytes (one byte for the length). 21 bytes should be stored and retrieved even though a string such as 'howdy' is stored (howdy uses 6 bytes). Most products of this nature work in the same way. 6 TBTREE13 Copyright (c) 1988 Dean H. Farwell II LRECLIST.PAS - This unit contains the interface and implemetation to handle logical record lists. A logical record list is a concept which is different than you will find in most other implementations of btrees. When a query is done on an index, most btrees set a cursor to the node where the first matching value is found (equality). This is very efficient but does not lead to the flexibility of my system. In TBTREE13 I build a list of logical record numbers corresponding to the logical records which meet the criteria of the query. I can handle all boolean operations such as equality, inequality, greater than, greater than or equal to, etc. Therefore, I could get a list of all logical records which meet some criteria and deal with that list independently of the btree. Once I have the list I can compare it to other lists and do intersections, unions, joins, etc. This gives the user some of the power of a real database management system. A user can relate a file to itself or to other files. This system is the basis of my future work to bring true relational capabilities to programs written in Turbo Pascal. It gives the capability to do set at a time processing. Unfortunately, nothing is free. A query on a database of 10000 logical records on existence will give a list of 10000 logical records (40000 bytes required to store the list). Lists like this take time to build. A fact of life is that it takes time to process a lot of data in a large database. The user is cautioned to structure queries in such a way that lists are not huge. Queries on equality are not normally a problem (unless the index provides little selectivity). Queries on inequality will be faster if a range is used. No matter what type of query is used, the page buffer alleviates a great deal of performance problems. MATH.PAS - This unit contains the interface and implementation to manipulate bytes at the bit level. Only three routines are visible as that is all that I required in TBTREE13. This unit has been redone in assembly language to enhance performance. See the source listing for further details. MYPRINT.PAS - This routine is a collection of printer codes and routines to print the codes. This is sufficient for my purposes and is included only because I needed a couple of routines for TP4PRINT.PAS and PAGE.PAS. This unit is not very elegant and will need to be modified if it does not support your printer. NUMBERS.PAS - This unit contains no code in the implementation. It is simply a repository for often used types which are not part of the original Turbo Pascal 4.0 product. PAGE.PAS - This unit contains the interface and implementation to manipulate the page buffer. The page buffer is stored internally (heap) in an area set aside to hold physical records. This buffer greatly enhances performance of the overall product. To function the buffer can be set to as small as one page (512 bytes). However, 7 TBTREE13 Copyright (c) 1988 Dean H. Farwell II performance is going to be dismal. The amount of buffer space is dependent on many factors and the size needs to be played with to see what works. One rule stands though .. bigger is better! One interesting note - routines are provided to change the size (both increase and decrease) dynamically as often as desired. There are other routines available as well. The user has the option to force a page to be written to disk immediately when it is changed (stored in the buffer) as opposed to waiting until it gets swapped out or all pages are written to disk. Immediately writing to disk ensures that nothing gets lost on a system or program crash but it has severe performance implications. Use of this is not normally required, but the option is there and usable. SETOPS.PAS - This unit contains routines that support set operations. They facilitate queries using multiple fields from one data file. This greatly reduces the workload on the programmer. SORT.PAS - This unit contains three procedures for sorting. This unit was was created in version 1.2 and adds greatly to the overall product. It uses a quicksort algorithm to provide good performance. Just as importantly, it supports all data types supported by TBTREE13 and is extremely easy to use. STRINGS.PAS - This unit contains routines to manipulate strings. Only two routines are provided. There is great room for expansion of this unit if required. TIME.PAS - This unit contains the implementation and interface to manipulate time arrays. A time array stores a four byte time value retrieved directly from RAM. It is probably not the best way to do this particular function, but it works and is fast. It is essentially only used in conjunction with least recently used routines (in PAGE.PAS and FILEBUFF.PAS). If anyone experiences compatibility problems let me know. Programs - Source Code For All Programs TP4PRINT.PAS - This program is provided merely to permit my source code to be printed out in a better format than provided by the DOS Print command. It looks for page control codes imbedded in my source (control codes are comments to the compiler) and controls paging. It puts a header which includes file name, page and print time. It provides top and bottom margins as well. Give it a try by printing out one of the source code files to see if you want to use it or another source code printer. Syntax is : TP4PRINT filename where filename includes path if source is not in the current directory BUILDTPU.PAS - When this program is compiled, all units associated with TBTREE13 will also be compiled. This simply 8 TBTREE13 Copyright (c) 1988 Dean H. Farwell II provides a convenient way to build all the units at once without compiling them individually. EXAM1.PAS - This example program demonstrates how to create a data file and a couple of indexes. It then demonstrates how to insert data into the data file and the indexes. Several other routines are demonstrated as well. Important to note is that this routine has been changed from version 1.2 and prior. It now uses StoreNewLogicalRecord to store the record as opposed to using StoreALocicalRecord. Look at the source and this may make sense. EXAM2.PAS - This program shows how to perform retrievals. The files created in EXAM1.PAS are used. EXAM3.PAS - This program shows how to do deletes and updates. The files created in EXAM1.PAS are used. EXAM4.PAS - This program shows how to use routines in FILEBUFF.PAS with text files. EXAM5.PAS - This program shows how to use the retrieval on range routine. The files created in EXAM1.PAS are used. EXAM6.PAS - This program shows how to use the retrieval on partial string match routine. The files created in EXAM1.PAS are used. EXAM7.PAS - This program shows how to build/rebuild indexes from an existing data file. The files created in EXAM1.PAS are used. EXAM8.PAS - This program shows use of routine to delete entries from a logical records list. The files created in EXAM1.PAS are used. EXAM9.PAS - This program shows an example of sorting using the SORT unit. The files created in EXAM1.PAS are used. EXAM10.PAS - This program shows an example of using one of the set operations found in the SETOPS unit. The files created in EXAM1.PAS are used. Instructions For Use of TBTREE13 The following is a short set of instructions for performing the more common database type operations using TBree11. Uses Statements - All of the units were compiled with the appropriate uses statement included. Therefore, the user does not need to be concerned about order when specifying units in the uses statements in the user's programs. As you can see in my examples, I put the units in alphabetical order. The user only needs to put the units which have either type statements or routine calls or both which must be visible within the user's program. My example programs demonstrate which ones are probably required. One method which actually works is to not include any initially and let the compiler find the syntax errors. When an 9 TBTREE13 Copyright (c) 1988 Dean H. Farwell II error is found because a type or routine is not found then see which unit it's in and put that unit in the uses statement. It's not a great way to do it, but it works. Setting Initial Parameters - Several parameters should be set by the user although they have defaults if a value is not set by the user. All of the parameters are only accessible by calling routines supplied to check and set values. The variables themselves are not visible to the user. The following parameters can be set by the user: Open Files Allowed - Contained in FILEBUFF.PAS. Determines number of open files allowed. See FILEBUFF.PAS for details. To set the parameter to a value of 5 do the following: SetMaxOpenFiles(5); Maximum Buffer Pages Allowed - Contained in PAGE.PAS. Determines the maximum number of 512 byte pages to keep in memory at one time. Set this to something that makes sense for your machine (memory size available). Since this is set at run time, you can set this after you determine how much memory is available. To set this parameter to 50 pages (25K bytes) use the following: SetMaxBufferPages(50); One note is that the buffer is initialized to 128K bytes which may be too large for your application. You should reset the default to whatever you want. Immediate Disk Write - This parameter if set to TRUE will force a write of a page to disk immediately after it is stored in the page buffer. Otherwise, the page will only be written when it gets swapped out or is explicitly written by the user (all pages for a file or all pages). To set this value to TRUE use: SetImmediateDiskWrite(TRUE); Creating Data Files - One of the first things we would want to do is create data files to hold our data. This is extremely simple. We only need to make one call for each data file. The following statements would create a data file: var xxx : FnString; (* holds name of data file *) begin xxx := 'MYFILE'; (* name of file WITHOUT extension *) Createfile(xxx,DATA); (* create the file and append the extension ('.dat') for a DATA file *) end; Notice that the xxx is not a file type but an 80 character string 10 TBTREE13 Copyright (c) 1988 Dean H. Farwell II type which holds the name of the file. The file name assigned does not initially have an extension. The extension is appended automatically when CreateFile is called. Notice that the type of file created is a DATA file. This is absolutely the only type of file that the user should ever create using the CreateFile routine. INDEX files are created a different way. BITMAP files and LLIST files are only created by routines internally. These two are also deleted internally. Do not fool with these last two file types explicitly. Creating Index Files - The next step is to create one or more index files for each data file. You will not normally have a data file with no indexes (although for small files, you may want to). To create an index file the user must know the type and size of the key (item to be stored in the index). The following statements will create an index file for the data file created above. The index will hold Byte values (values from 0 - 255). var yyy : FnString; begin yyy := 'Index1'; (* notice no extension *) CreateIndex(yyy,SizeOf(Byte),xxx,BYTEVALUE); Notice that the data file name is also passed in. This is stored in the header of the index file. I do not presently use it for anything, but I think I'm going to need it later. I did not want to have to change the parameters needed in later releases. Also notice use of the SizeOf routine to pass in the size of the item to be stored in the index. This is usually more accurate than passing in a constant value. This will especially help avoid errors when using strings. Remember, strings use one extra byte for the length. Inserting Data - Once files and indexes exist we can insert data. Inserting data consists of two separate steps. The first step is to insert the data into the data file. The second step is to insert the applicable data into each index which is associated with that data file. To insert data we first need to have the data record stored in memory somewhere. The most logical place to have the data record is in a variable which is of type record of some kind. For example, the following is an example of a variable declaration which would work: type MyRecord = record field1 : Byte; (* for this example this is the only field indexed (Index1) although you could index all fields if desired. *) field2 : Word; field3 : String[10]; end; var thisRec : MyRecord; 11 TBTREE13 Copyright (c) 1988 Dean H. Farwell II lrNum : LrNumber; We could use the following sequence to insert a record in the data file and the index associated with this data file: thisRec.field1 := 20; thisRec.field2 := 60000; thisRec.field3 := 'Hi Mom'; lrNum := StoreNewLogicalRecord(xxx,thisRec,SizeOf(thisRec)); InsertValueInBTree(yyy,lrNum,thisRec.field1); Notice that StoreNewLogicalRecord was used because it will find an unused logical record number and assign it to this new record. This logical record number is the one inserted (along with the associated value) into the BTree. This is a change form versions prior to version 1.3. This simplifies things and makes stroring much more straightforward. Retrieving Data - Retrieving data consists of making a query to an index. The query is a procedure call which will build and return a list of logical records which match the selection criteria. The query conditions are defined in NUMBERS.PAS and are of type Condition. Once the list is retrieved, the user can step through the list starting at either end and going in either direction. The logical record numbers are stored in the index in such a way that their corresponding key values (values in the index) are in ascending order. A logical record list created using an index will reflect this order. If there are multiple occurences of the same key then the last one added is in the front. As previously noted, the list can be traversed front to back (ascending) or back to front (descending). A cursor is kept internally in the list to keep track of where the user is in the list. The user can not get to the cursor but can set it to the front or back and move it forward or backward (by retrieving values from the list). Normally, to traverse the list, set the cursor to the front of the list and then set up a loop which will traverse the list and terminate when a logical record number of 0 is returned. The list is not disturbed as you move through it. It will exist until you destroy it or until the program ends. If the program ends before the list is destroyed explicitly by the user then there may be a file on disk with an extension of '.lrl'. If you terminate and find any of this type of file laying around then you did not destroy it prior to ending the program. You should fix this as leaving a list hanging around just takes up space and does nothing productive. There is a limit of 99 lists which can coexist. This is such an exceptionally large number that if you ever exceed it there is an error somewhere. Therefore, I did not leave provisions to terminate gracefully if this is exceeded. An example of a call to retrieve data follows: var key : Byte; lrLst : LrList; lrNum : LRNumber; 12 TBTREE13 Copyright (c) 1988 Dean H. Farwell II begin key := 20; GetValuesFromBTree(yyy,key,LE,lrLst); (* lrLst is the newly created list *) lrNum := GetFirstLr(lrLst); (* set cursor to front *) while lrNum <> 0 do (* traversal loop *) begin GetALogicalRecord(xxx,lrNum,thisRec,SizeOf(thisRec)); (* we can not do whatever with the record *) . . . lrNum := GetNextLr(lrLst); end; (* this will continue until end of list *) Versions 1.1 and later have the added capability to retrieve records for a range of values and retrieve records on a partial string match. An example of each follows: var key1, key2 : Byte; lrLst : LrList; lrNum : LRNumber; begin key1 := 10; key2 := 75; GetRangeFromBTree(yyy,key1,GE,key2,LE,lrLst); (* build list of logical record numbers which are between 10 and 75 *) (* list is in lrLst *) lrNum := GetFirstLr(lrLst); while lrNum <> 0 do begin GetALogicalRecord(xxx,lrNum,thisRec,SizeOf(thisRec)); (* we can not do whatever with the record *) . . . lrNum := GetNextLr(lrLst); end; (* this will continue until end of list *) ------------------------------ var keyString : String[10]; lrLst : LrList; lrNum : LRNumber; begin keyString := 'C'; (* assume zzz is index holding values of type STRINGVALUE *) GetSubstringFromBTree(zzz,keyString,ST,lrLst); (* build list of logical record which begin with 13 TBTREE13 Copyright (c) 1988 Dean H. Farwell II 'C' *) (* list is in lrLst *) lrNum := GetFirstLr(lrLst); while lrNum <> 0 do begin GetALogicalRecord(xxx,lrNum,thisRec,SizeOf(thisRec)); (* we can not do whatever with the record *) . . . lrNum := GetNextLr(lrLst); end; (* this will continue until end of list *) Deleting Data - Deleting data is the inverse of inserting data. There are two steps involved. The first step is to retrieve the record. This is only required if we do not know the values for all fields which have a corresponding index. Once we have the record we can delete the entries from the index(es). The second step is to delete the logical record from the data file. For the following example assume that the same definitions used above apply but that field2 also has an index with the file name residing in a variable zzz. Also assume that we know that we want to delete logical record number 24: GetALogicalrecord(xxx,24,thisRec,SizeOf(thisRec)); DeleteValueFromBTree(yyy,24,thisRec.field1); DeleteValueFromBTree(zzz,24,thisRec.field2); DeleteRecord(xxx,24); (* delete value from data record *) Updating Data - There are no separate commands to update data. Updating is again a two step process. The first step is to fetch the appropriate data record(s). Once this is accomplished, the appropriate fields can be updated and the data file can be stored. A third step may be required. If any indexed fields (fields which have an index associated with them) are changed then the old value will have to be deleted and then the new value will have to be inserted into the index. Rather than show an example here, I refer you to EXAMPLE3.PAS. I noticed that in that example, I store the updated data record after updating the index rather than before. The order of these two operations is not important. Deleting Files - The operation of deleting a file is not going to be done very often but will be necessary from time to time. It is very important that the routine provided be used as opposed to going to DOS and doing a delete. The provided routine does some necessary cleanup work. Specifically, it gets rid of the appropriate BITMAP. The following is an example: DeleteFile(yyy); Notice that this routine is used by the user to delete both index and data files. Deleting a data file will not automatically delete the associated index files. 14 TBTREE13 Copyright (c) 1988 Dean H. Farwell II Sorting - Sorting can be done on any number of fields as you desire. Sorting requires two steps. The first is to build a sort field list, a list containing information on each of the fields to sort on. The second step is to use this sort field list to sort a logical records list. After sorting is completed, the user should destroy the sort field list using the provided routine unless the list will be required later. An example follows: type MyRecord = record field1 : Byte; (* for this example this is the only field indexed (Index1) although you could index all fields if desired. *) field2 : Word; field3 : String[10]; end; var lrLst : LrList; sPtr : SortFieldList; begin GetValidLogicalRecords(xxx,lrLst); (* list of all records *) sPtr := NIL; (* EXTREMELY CRITICAL !!! - you must pass in a pointer variable whose value is NIL. If you forget this little step all kinds of bad things will probably happen!!! *) AddFieldToSortList(sPtr,4,STRINGVALUE); AddFiledToSortList(sPtr,2,WORDVALUE); SortList(lrLst,xxx,sPtr,SizeOf(MyRecord)); DestroySortList(sPtr); (* retrieve the heap space *) end; Notice that the second parameter is the BYTE position of the sort field within the data record. It is not the field number. This is extremely important!! See SORT.PAS for details. Building/Rebuilding Indexes - Once in awhile you may manage to wipe out an index or in some other way cause it to be out of whack with your data file. Or, you may want to add an index to a field which previously did not have an index. This can easily be accomplished using a routine added in version 1.1. Rather than repeating the example here, please refer to EXAMPLE7.PAS which will show how to build a list of existing records and rebuild an index from that list. The added routine will also allow you to build a list of all logical records associated with a data file without using an index. Viewing Status Information - I have included several routines for checking on the values of certain parameters and for allowing the user to delve into the depths if desired. The first routine to discuss is one which allows you to find out how many entries are in a logical record list once it has been created. To accomplish 15 TBTREE13 Copyright (c) 1988 Dean H. Farwell II this the following call is used: cnt := GetCountLr(lrLst); (* lrLst is your variable where the list resides *) A second valuable utility is a routine which will return the number of files which are presently open using the FILEBUFF unit. In other words this returns the number of files which are on the files open list. This is not necessarily the number of files which DOS thinks is open. Only files opened using the create file or open file routines provided in FILEBUFF.PAS will be included in this number. Not included will be other files opened by the user and the files such as standard input and output etc. An example of a call is below: cnt := GetNumberOpenFiles; Several routines are provided in FILES.PAS which have not been discussed yet. The first is RecordUsed. This function will check to see if a given record is used. It is probably only useful to a user for checking to see if a logical record is in use (data record). Using it will not cause any problems, but see FILES.PAS to be sure that you understand what this routine is telling you. The other routine of interest in FILES.PAS is FixFileName. This is very important for appending the correct file extension to the file name. Remember, when a file is created, the correct extension is appended to the name automatically. If a program does not create a file but uses one that already exists then the file name residing in the users variable (of type FnString) must have the correct extension. Instead of leaving that to you I have included FixFileName. When you start up a program, call FixFileName for each data file and index file you will use. This will ensure that there will not be problems with the wrong extension. An example is below: var xxx : FnString; begin xxx := 'MyFile'; FixFileName(xxx,DATA); The largest number of utilities lies in PAGE.PAS. The first routines of interest are the ones for writing part or all of the buffer to disk. The demand paging used in the PAGE.PAS unit is critical to the performance of TBTREE13. Since a large number of pages are in the memory buffer, the need to constantly go out to disk is reduced. Disk seeks (head movement) is what slows down most database operations. The secret is to do everything you can to alleviate the need to go out to disk. Memory is many times faster than external storage. However, if changes are made to pages in the buffer we want to ensure that they do not get lost. We need to write them to disk at some point. Normally, a page will stay in memory until another page needs the space. The unit uses a least recently used algorithm which in nothing earth shattering (but it works). When a page needs to be swapped out, a check is made to see if it is dirty (changes have been made since it was read in or last written to disk). If it is clean no 16 TBTREE13 Copyright (c) 1988 Dean H. Farwell II problem. If it is dirty it is written to disk prior to allowing the page to be reused. The user can force pages to be written out at any time. One method is to set a parameter which forces a write to disk anytime a page is written to in memory. Essentially, a page will never be dirty this way. To accomplish this the SetImmediateDiskWrite routine is used. The user can set this parameter to TRUE or FALSE. FALSE is the default, but the user should probably make an explicit call anyway on initialization so there is no doubt or confusion. If FALSE is set, the user may still want to periodically write pages to disk. The user can write all pages for a given file or all pages in the buffer. For Example: WriteEntireBufferToDisk; or WriteBufferToDisk(xxx); The user should always write all pages to disk prior to termination. Otherwise, changes made to pages still in memory will be lost. A handy way to do this is to call it from an exit procedure. See EXAMPLE2.PAS for an example. A second set of routines are provided to get rid of (release) pages in the buffer. The pages should be written to disk prior to release. These routines can be dangerous. I need them internally for operations such as changing the size of the buffer dynamically and also for deleting a file. They are visible and free to use, but understand what you're doing first. If you're looking for a convenient way to wreck havoc with your data these two are perfect candidates. The only reason I can forsee why you would want to attempt to use these is to release some pages for files you won't need for a while to leave room for an operation which will need a lot of pages. Before releasing write the affected pages to disk!! There is not really a need to micro manage the resources like this and they can lead to problems if misused. Accidentally, losing a node (page) in the middle of an index will leave it in an unrepairable state and will ruin your whole day. A third set of routines are much friendlier and I can stay off my soap box. These routines manipulate and check the size of the buffer. All values are in terms of pages. A page is 512 bytes. To digress slightly, 512 worked well because a smaller number caused inefficiency in the least recently used page algorithm and also meant that index keys had to be smaller. A larger number means that less pages can be in memory at once. Do not try to change this constant as a lot of things are affected. Just changing the constant will not work properly. This is due to the fact that a constant declaration can not be declared using a simple expression. For example: const xxx = 10; yyy = xxx/2; (* illegal *) Therefore, other constants that depend on xxx have to be set 17 TBTREE13 Copyright (c) 1988 Dean H. Farwell II explicitly. In the above, yyy would have to be set to 5. Anyway, back to the matters at hand. To set the size of the buffer the following call should be made: SetMaxBufferPages(50); (* 25K bytes *) If a user does not set the page limit explicitly, the value will default to one page. Everything still works, but doing this will pretty well define slow. A minimum of 10 pages is required to have this work well at all. One hundred or so pages is preferred. A great deal depends on your application as well! A user can check this value at any time by doing the following: cnt := CheckMaxBufferPages; The user can check the number actually in use by doing the following: cnt := CheckPagesInUse; One routine not mentioned earlier is CheckImmediateDiskWrite which returns TRUE if dirty pages will be immediately written to disk, FALSE otherwise. One interesting routine is PrintPagebuffer. This routine is one that I developed for my own debugging purposes and is included to allow you to see what the page buffer looks like. One warning though, if the page buffer is large, the output will be quite lengthy. The last routine is more for curiosity's sake and for tuning the size of the buffer. A call to PrintBufferStats will print the statistics for buffer use. The number of pages in use, the maximum number of pages allowed, the number of requests for a page (fetches), the number of fetches in which the page was in the buffer (hits) and the ratio of hits to fetches will be output. A hit ratio of much below 95% could signal a requirement to increase the buffer size. You really need to keep this number in perspective though. If you are walzing through a file in which none of the records are in the buffer, the hit ratio will be poor. Other operations will have higher hit ratios. At least it's there to use as desired. A recently added routine contained in the BTREE unit is NumberOfBTreeLevels which returns the number of levels currently in an index. If that number gets above 10 you are doing something rather strange and may run into stack problems. The cause of such an occurence is that you have a huge number of index entries (many thousands) or your index entries are large. The only entries over 4 bytes are strings. These types of indexes should use strings as small as possible. Don't use first name, last name, and middle initial as all one indexed filed unless there is a really good reason to (doubtful). Any string index entry over about 20 bytes is suspect. Remember what an index is for .. to store something to allow quick retrievals. The index is not designed to be the data file. A deep index will simply not be as fast as a shallow index. The smaller the index entry size, the shallower the index. 18 TBTREE13 Copyright (c) 1988 Dean H. Farwell II Other Routines - I discussed above the routines which are directly applicable to a user who wants to use the indexing and data manipulation capabilities of TBTREE13. There are some lower level general routines such as those found in MATH.PAS and STRINGS.PAS which are used internally. Since they are fairly general in nature, they may have applicability for your other applications. I won't go into details on their inner workings here, but you have the source code, so you can look in the units to see if there is anything else useful for whatever you happen to be doing. Current Product Limitations The following limitations apply to TBTREE13. Only limitations inherent to TBTREE13 are discussed. DOS limits and practical limits like disk drive capacity are not discussed. These limitations may change (become less restrictive) in future additions. Number of Data Files - No Limit Number of Index Files Per Data File - No Limit Size of Data File - MAXLONGINT (2,147,483,647) Logical or Physical Records (whichever comes first) or disk space (the real limit!!) Size of Index File - MAXLONGINT Physical Records. The real restriction is going to be stack size. I use a recursive routine to traverse the BTrees. Recursion is very efficient for this but has the problem that each level keeps a couple of thousand bytes on the stack. It is very important to use a large stack size if indexes are going to be deep. Most indexes will not be more than 4 or 5 levels deep unless a HUGE number of records are involved or the keys in the indexes are large. Large is relative, but you need to try to keep entries below 20 bytes or so. This really only affects strings presently. I would like to know if anyone is running into problems with this as I can handle this iteratively rather than recursively which will make the problem disappear at the expense of some added coding complexity. I recently did a test to see to see how critical this is. I did a test using an index entry size of 100 bytes (100 byte strings). I entered over 27000 records before running out of disk space (took over one day on my archaic 4.77 MHz computer!). I had somewhere in the area of 15 levels which is more than 19 TBTREE13 Copyright (c) 1988 Dean H. Farwell II you should reasonably have. I also used a 32K stack (64K being the maximum allowed). Anyway, I couldn't blow it up so I don't think you will either! Size of Page Buffer - Limited by heap space. I have not used this with any type of heap extender or other program which uses other memory (extended, expanded) for the heap. I would enjoy seeing feedback if anyone tries it. If I need to do something to facilitate use of a heap extender let me know and I'll try to accomodate. Number of Files Open At Once - Actual number set by user, but user does not have to worry about keeping track of files used by TBTREE13 since it opens and closes files as required. Size of Key (Index Entry) - 494 bytes. The present restriction is really 256 bytes since the only composite data type allowed in the indexes is the string type. Realistically, you do not want to come close to this number of bytes per entry or your index will be much deeper than desired. Size of Data Record - 65520 bytes since this is the limit for structured (composite) types in Turbo Pascal 4.0. Max File Name Length - 80 characters including drive designator, path, file name and extension. Logical List Limit - Presently, 99 lists can be exist at one time. This is probably about 95 more than ever required. Number Of Sort Fields - Unlimited Future Enhancements / Directions One of the advantages of using the shareware concept to distribute this software is that I can be very flexible and can tailor this product to the needs of users. I am soliciting as much feedback as possible. I need to know if there is interest in a product such as this and also need to know what is really important to users. I hope to be able to accomplish the following in the future: I may change the BTREE.PAS unit to include a binary search routine internally to speed the finding of key matches. Presently I use a linear search within each node (index page). This will increase performance somewhat. It will be totally transparent to users. So far, it is faster than everyone imagined, so I have gotten no negative feedback concering performance. 20 TBTREE13 Copyright (c) 1988 Dean H. Farwell II I may change the delete routines in the BTREE.PAS unit to force combining of nodes when each is half full or less. Presently, a node is not deleted until the last entry is deleted. This will add a little bit of performance on queries and will reduce the size of the indexes if deletes are done often, but will otherwise be transparent. I may change the BTREE.PAS unit so that I do not use recursion to walk the tree. As mentioned earlier, I have done a test to see if anyone will have problems, and I really don't foresee any. One area which may prove slow if a data file becomes very large is that of finding an unused record. Since I want to reuse holes in the file prior to expanding the file I do a linear search to find holes. I walk through the bitmap looking for a logical record marked unused. I do the same for index files looking for unused physical records. On a large file this takes a little time. I may add the capability to make reuse optional. In other rwords the user can turn it on and off. Feedback in this area is appreciated. So far, I have not received any negative feedback on this. The biggest change that I will probably implement in the future is a unit which traps errors and sets an error flag. Version Release Notes Version 1.1 1. Added capability to do a search of an index on a range of values. 2. Added capability to do partial string match searches (strings only). Added one routine to the BTREE unit and several routines to the COMPARE unit and one type declaration to the NUMBERS unit to facilitate this change. 3. Corrected bug in BTREE unit. Documentation stated that duplicate entries for a given logical record number and field value were prevented from being entered. This was not previously the case although that is now true. 4. Added a routine (LOGICAL unit) to build a list of all current records for a file (all valid ie not deleted records). This is essential for rebuilding indexes or accessing data without an index. In actuality, a user could have built this routine, but it was not intuitively obvious. 5. Added the capability to delete an entry from an existing logical record list. 6. Divided BTREE.PAS into BTREE.PAS and two include files : BTREE1.INC and BTREE2.INC. 7. Added several more examples. Version 1.2 21 TBTREE13 Copyright (c) 1988 Dean H. Farwell II 1. Added Sorting capabilities. 2. Added several routines to LRECLIST.PAS to facilitate sorting. I also made a couple of internal cosmetic changes to LRECLIST.PAS. 3. Added one example. 4. Removed CalculateRecordSize and several unused data types from LOGICAL.PAS. These were for future projects and now are now targeted for TBASE. Version 1.3 1. Added SETOPS.PAS which provides Intersection, Union and Difference. 2. Added one example. 3. MATH.PAS redone in assemble language. See listing for details and credit to the author. 4. Added one routine (FindLrInList) to LRECLIST.PAS. 5. In LOGICAL.PAS changed MAXDATASIZE to 65520 since this is the maximum size for a structured type in Turbo Pascal 4.0. 6. TP4PRINT.PAS now prints the file creation date as part of the header. 7. Added NumberOfBTreeLevels routine to the BTREE unit. 8. Added StoreNewLogicalRecord routine to LOGICAL.PAS. This makes it easier to store new records. You no longer need to find the next available logical record number using the FirstUnUsedRecord routine located in FILES.PAS. StoreNewLogicalRecord does this for you and returns the associated logical record number. Turbo Pascal is a registered trademard of Borland International CompuServe is a registered trademark of CompuServe Incorporated 22