home *** CD-ROM | disk | FTP | other *** search
- #######################
- #### Start of File ####
- #######################
- -----------------------------------------------------------------
- File Name: vbdspec.txt
- Contents: VBD Specs Sheet
- -----------------------------------------------------------------
-
- Variable Block Database Version 1031, Revision A
-
- TOPICS:
- -------
- Overview
- VBD File Format
- VBD File Specs
- Platforms
-
- OVERVIEW:
- ---------
- The Variable Block Database is a file format used to store any
- type of variable-length binary data in re-sizeable blocks. The
- VBD file format was originally created to store persistent
- objects in disk files. A persistent object refers to an object
- that saves its state between program invocations. The VBD file
- format is not limited to storing persistent objects. Any type of
- data can be stored and manipulated in a VBD file. VBD files have
- this capability because the methods by which any data type is
- stored or retrieved must be defined within the application.
-
- VBD FILE FORMAT:
- ----------------
- Every VBD file contains a file header, an optional static storage
- area, and a dynamic storage area where variable blocks of data
- are stored. The file header is composed of six fields and is a
- total of 28 bytes in length:
-
- (1) Free space field - stores a pointer to the first block of
- de-allocated heap space (4 bytes)
-
- (2) End of file field - stores a pointer to the address of the
- last byte in the file (4 bytes)
-
- (3) Start of heap field - stores a pointer to the start of the
- dynamic storage area (4 bytes)
-
- (4) Highest block field - stores a pointer to the highest
- allocated block (4 bytes)
-
- (5) Signature field - signature used to identify the file type (8
- bytes)
-
- (6) Version field - used to identify the VBD file version
- number(4 bytes)
-
- This header is used to store information needed by the allocation
- functions, a signature, and a version number. The static data
- area is used to store fixed data that cannot be altered by any of
- the dynamic allocation routines. The size of the static data must
- by specified by the application that creates the VBD file. If no
- static area is specified then the dynamic data area will start
- directly after the VBD file header.
-
- The first 28 bytes of the VBD file (file addresses 0 - 28) will
- always contain the VBD header. If a static area is requested by
- the application, the static area will occupy the bytes requested.
- For example: if an application request 20 bytes to be reserved
- for a static data area when the file is created, then bytes 29
- through 48 (file addresses 29 - 48) will be reserved. The dynamic
- data area will start directly after that and occupy the rest of
- the file. A pointer to the start of the dynamic storage area can
- be calculated by adding the size of the VBD file header to the
- size of the static data area.
-
- The dynamic data area always starts out empty and grows when
- variable data blocks are allocated. The "free space", "end of
- file", and "start of heap" pointers stored in the VBD file header
- are used to maintain the dynamic data area. The "start of heap"
- pointer stores the address where the dynamic data area starts
- (which is where the static area ends.) The "end of file" pointer
- marks the end of the file and points to the location where the
- file can be extended during block allocation.
-
- Revision 'A' adds a 32-bit CRC checksum routine used to detect
- any bit errors that occur during data storage. The CRC is based
- on the Ethernet polynomial of 0x4C11DB7. A checksum is calculated
- when data is written to the VBD file, this includes the block
- header and the block data. The calculated checksum is then
- compared to data actually stored on disk. If the calculated
- checksum does not match the actual checksum, a bit error has
- occurred during data storage. All bit errors must be handled by
- the application since the type of data being stored is not known.
-
- Revision 'A' reserves four bytes at the end of each block that
- can be used by an application to store a 32-bit checksum with
- each block. Each time a new block is allocated space is reserved
- for the number of bytes requested plus four additional bytes. The
- application is responsible for reading and writing the block
- checksum since the type of data being stored is not known. The
- use of a block checksum is optional and does not have to be used
- to detect bit errors because of the built-in CRC checksum routine
- used each time any data is stored. Some applications require the
- use of block checksums to maintain the integrity of the file from
- one program invocation to the next.
-
- When blocks are de-allocated, the block is marked deleted and
- left in the file. The size of the file is determined by the
- number of blocks allocated and will remain the same no matter how
- many blocks are deleted. The number of deleted blocks is
- maintained in a non-contiguous list. Each block in the list
- points to the next block in the list starting at a specified
- address. The "free space" pointer in the VBD file header is used
- to store the file address of the block where the free space list
- starts.
-
- The "highest block" pointer in the VBD file header is used to
- store the address of the highest allocated variable block. It is
- used by the application when it needs to index data starting at
- the end of the file.
-
- The signature and version fields in the VBD file header must be
- set when a new file is created. The signature field is used to
- determine if the file is of the correct type. If the "VBDFILE"
- signature is not found then it is assumed that this is not a
- valid VBD file. The eighth byte is used for all revision changes.
- Version 1031 sets the revision letter to 'A' and performs
- compatibility checks to ensure backward compatibility with
- previous releases. Revision 'A' adds a 32-bit CRC checksum
- routine and reserves space at the end of each block for an
- application to write a 32-bit check-word.
-
- The version field is to represent this file version number. A
- version number is used to indicate that changes have been made to
- the application used to create VBD files. Version numbers can be
- used inside of an application to conditionally perform certain
- operations based on its value. This will ensure backward
- compatibility with previous versions.
-
- Variable Block (VB) headers manage all the variable data blocks
- created inside the dynamic data area. Block headers are used to
- mark the start of a variable data block. Every time a block is
- allocated a block header is written to the file. Allocation works
- by writing the block header and then reserving a specified number
- of bytes for the data that will be stored in the block. Revision
- 'A' reserves four bytes at the end of the block to allow the
- application to store a block checksum. After the space is
- allocated, the application is responsible for writing the data to
- the file starting at the address after the block header. Each
- block header contains four fields and is a total of 16 bytes in
- length:
-
- (1) Check word field - "0x0000FEFE" check word used for file
- integrity checks (4 bytes)
-
- (2) Length field - block length including the object, block
- header, and CRC (4 bytes)
-
- (3) Status field - stores the status of dynamic data stored in
- the block (4 byte)
-
- (4) Next deleted block field - stores a pointer to the next
- deleted block (4 bytes)
-
- The VB header "check word" field represents a 32-bit check word
- used for file integrity checks and is used to maintain
- synchronization within the VBD creator and the application.
- Revision 'A' uses all 32 bits of the block check-word. In
- previous releases the upper 16 bits of the block check-word was
- reserved for a 16-bit CRC. Revision 'A' implements a 32-bit CRC
- routine and reserves four bytes at the end of each block for a
- 32-bit checksum. If the check word value is changed, the file
- will no longer be compatible with previous releases.
-
- The "length" field stores the length of the object plus the size
- of the block header and the size of the block's CRC checksum
- value. This field is used to index the file block by block. By
- reading this value the application will always know where the
- next block is in sequence. The check-word is used to ensure that
- the next block in sequence is a valid variable block. The length
- of the object can be calculated by subtracting the size of the
- block header and the size of the CRC checksum from the "length"
- field.
-
- The "status" field stores the status of dynamic data stored in
- this block. Only one byte of the status field is used. The
- remaining three bytes of the status field is reserved for future
- use. The status of a variable data block can be determined by one
- of three byte values: 'N' for normal (ASCII 78), 'D' for deleted
- (ASCII 68), or 'R' for removed (ASCII 82.) A block marked 'N' for
- normal indicates that the block is in use and cannot be reclaimed
- by any of the allocation routines. A block marked 'D' for deleted
- means that the data in the block is still valid, but the block
- can be overwritten if needed. A block marked 'R' for removed
- means that the data in the block has been removed and the block
- can be overwritten if needed. Marking deleted blocks 'N' for
- normal can easily restore them. Once a block is removed it can
- never be restored.
-
- The "next deleted block" pointer in the VB header stores a
- pointer to the next deleted or removed variable block, only if
- this block has been deleted or removed. The total number of
- deleted blocks is maintained in a non-contiguous list. Each
- deleted or removed block in the list points to the next deleted
- or removed block in the list starting at the head of the list.
- The "free space" pointer in the VBD file header is used to store
- the file address where the head of the free space list is
- located. When a variable block is deleted or removed the next
- deleted block field is set to the VBD header's free space value
- to become the head of the free space list and the VBD header's
- "free space" pointer is set to the address of this deleted or
- removed block.
-
- In order to prevent the VBD files from becoming extremely
- fragmented due to numerous deletions, blocks marked deleted or
- removed will be reused by the allocation routine. When a new
- variable block is allocated and the "free space" field is not
- empty, the allocation routine will walk through the free space
- list looking for a deleted or removed block of the size to be
- allocated.
-
- One of two methods can be used to prevent fragmentation, the
- best-fit method or the first-fit method. The best-fit method
- works by scanning the entire free space list until the best
- location to reuse a block is found. Best-fit values are
- calculated based on a percentage of the number of bytes
- requested. Any blocks greater then 2.5 times bigger then the
- number of bytes requested will not be reused. This ensures that
- smaller blocks will not use small portions of very large blocks
- and all new blocks will use the maximum amount of space in any
- block that is reused. However, the best-fit method is very costly
- in terms of speed when the free space list is very large.
-
- The first-fit method works by searching the free space list until
- the first block of the appropriate size is found. The first block
- large enough to hold two block headers plus the number of bytes
- requested with at least one byte left over will be reused.
- Several splits can occur if the blocks vary greatly in size. When
- a block is split, the unused portion of the block is assigned a
- new block header (marked removed) and placed back on the free
- space list. A small block can cause a very large block to be
- divided several times leaving gaps of smaller and smaller blocks.
- The reclaim method used is application dependent. If all the
- blocks are more or less the same size the first-fit method is
- more efficient terms of speed. If all the blocks vary greatly in
- size the best-fit method is more efficient in preventing
- fragmentation.
-
- VBD FILE SPECIFICATIONS:
- ------------------------
- VBD version 1031, revision 'A' requires that all file addresses
- be represented by 32-bit signed integer values. This means that
- the file will be allowed to grow to a maximum size of 2.1 GB. The
- VBD file header must occupy 28 bytes of disk space starting at
- file address zero. The header must be comprised of four 32-bit
- signed integers and an eight-byte character array followed by
- another 32-bit signed integer. A VBD file's static data area will
- only occupy as many bytes as requested. Every variable data block
- will be stored directly after the static data and be marked with
- a block header. The block header must occupy 16 bytes of disk
- space and be comprised of four 32-bit unsigned integers. The data
- following any variable block header can have as many bytes as
- requested, up to 2.1 GB. Four bytes will be reserved at the end
- of each block to store an optional 32-bit checksum with each
- block.
-
- PLATFORM INDEPENDENCE:
- ----------------------
- File addresses are multi-byte numbers used to point to specific
- locations in a disk file. The byte order in which multi-byte
- numbers are stored in a computer's memory is specific to each
- microprocessor. For example, the Intel x86 family of
- microprocessors store the lowest-order byte first. Hewlett
- Packard's PA-RISC family of microprocessors store the
- highest-order byte first. A file address stored in a disk file
- will represent different values if the file is created on one
- system and read on the other system.
-
- In order to gain platform independent VBD files must use their
- own representation of 32-bit signed integers for file addresses.
- By manipulating the file addresses in memory before they are
- written to disk, it is possible to represent 32-bit signed
- integers independently of the operating system or hardware
- platform used. This will overcome any byte ordering problems
- encountered when writing file addresses to a common database file
- accessed by several different types of machines.
-
- The same scheme used to overcome the byte ordering problem
- encountered with 32-bit signed integers must also be applied to
- 32-bit unsigned integers and 16-bit integers if any of these data
- types are stored in the file. Also a similar method must be
- implemented for single and double precision floating point values
- if any of these types are used.
-
- String values are represented in memory exactly as they appear.
- Since none of the bytes in a string are reordered in memory, they
- can be written directly from memory to disk regardless of the
- platform used to create them.
- -----------------------------------------------------------------
- #####################
- #### End of File ####
- #####################
-
-
-
-