- Numerical data storage (MathVision - Seven Seas)
-
- MTRX FORM, for matrix data storage 19-July-1990
-
- Submitted by: Doug Houck
- Seven Seas Software
- (address, etc)
-
- INTRODUCTION:
-
- Numerical data, as it comes from the real world, is an ill-mannered beast.
- Often much is assumed about the data, such as the number of dimensions,
- formatting, compression, limits, and sizes. As such, data is not portable.
- The MTRX FORM will both store the data, and completely describe its
- format, such that programs no longer need to guess the parameters of
- a data file. Only one program is needed to read ASCII files and
- output MTRX IFF files.
-
- A matrix, by our definition, is composed of three types of things.
- Firstly, the atomic data, such as an integer, or floating point number.
- Secondly, arrays, which are simply lists of things which are all the same.
- Thirdly, structures, which are lists of things which are different.
- Both arrays and structures may be composed of things besides atomic data -
- they may contain other structures and arrays as well. This concept
- of nesting structures may be repeated to any desired depth.
-
- For example, a list of data pairs could be encoded as an array of structures,
- where each structure contains two numbers. A two-dimensional array is
- simply an array of arrays.
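
The two examples above can be modelled as nested definitions. The sketch below is only an illustration: the class names (Atom, Array, Struct) and the helper total_bits are hypothetical and are not part of the MTRX FORM. It ignores packing and alignment, which is safe here because every atom is a whole number of bytes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Atom:
    """Atomic data: one number of a given bit size."""
    name: str
    bits: int


@dataclass
class Array:
    """A counted list of identical things."""
    count: int
    element: "Node"


@dataclass
class Struct:
    """A list of things which may differ from each other."""
    fields: List["Node"]


Node = Union[Atom, Array, Struct]


def total_bits(node: Node) -> int:
    """Total size of the data described by a definition (no packing)."""
    if isinstance(node, Atom):
        return node.bits
    if isinstance(node, Array):
        return node.count * total_bits(node.element)
    return sum(total_bits(field) for field in node.fields)


# A list of 100 data pairs: an array of structures holding two numbers each.
pairs = Array(100, Struct([Atom("Word", 16), Atom("Word", 16)]))

# A two-dimensional 4 x 3 array: simply an array of arrays.
grid = Array(4, Array(3, Atom("UByte", 8)))

print(total_bits(pairs) // 8, "bytes for the pairs")
print(total_bits(grid) // 8, "bytes for the grid")
```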
-
- Since space conservation is often desirable, there is provision for
- representing each number with fewer bits, and compressing the bits together.
-
-
- CHUNKS
-
- The MTRX FORM is composed of the definition of the structure, followed
- by the BODY which contains the data which is defined. Usually, there
- is only one set of data, but a smarter IFF reader could use the definition
- as a PROPerty, with identically formatted data sets (BODYs) in a LIST.
-
- FORM MTRX
- definition (ARRY | STRU | DTYP)
- BODY
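
As a rough sketch of the layout above, the following writes the smallest possible FORM MTRX: a DTYP definition followed by a one-byte BODY. The chunk framing (4-byte ID, 32-bit big-endian length, pad byte after odd-length data) is the generic IFF convention. The contents of the dtyp itself are an assumption: a 16-bit size followed by one byte each for SubClass and Class, with class and subclass codes of 0 chosen arbitrarily. The helper names chunk and form_mtrx are hypothetical.

```python
import struct


def chunk(ckid: bytes, data: bytes) -> bytes:
    """Frame data as an IFF chunk: ID, big-endian length, data, optional pad."""
    if len(ckid) != 4:
        raise ValueError("chunk ID must be exactly 4 bytes")
    pad = b"\x00" if len(data) % 2 else b""  # chunks start on even offsets
    return ckid + struct.pack(">I", len(data)) + data + pad


def form_mtrx(definition: bytes, body: bytes) -> bytes:
    """Wrap an already-framed definition chunk and raw BODY data in a FORM."""
    return chunk(b"FORM", b"MTRX" + definition + chunk(b"BODY", body))


# ASSUMPTION: dtyp = 16-bit size, 8-bit subclass, 8-bit class; codes are made up.
dtyp = struct.pack(">HBB", 8, 0, 0)
form = form_mtrx(chunk(b"DTYP", dtyp), bytes([42]))

print(form[:4], form[8:12], len(form))
```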
-
- ARRY: The array chunk defines a counted list of similar items.
- The first (required) chunk in an ARRY is ELEM, which gives the number
- of elements in the array. Optionally, limits (LOWR and UPPR) may be
- given, which could be used for scaling when the data is sampled.
- Last comes the definition of an element of the array, which may be a
- nested definition like everything else.
-
- ARRY ::= "ARRY" #{ ELEM [LOWR] [UPPR] [PACK] ARRY|STRU|DTYP }
-
- STRU: The structure chunk defines a counted list of dissimilar things.
- The first (required) chunk in a STRU is FLDS, which gives the number
- of fields in the structure. Lastly are definitions of each field
- in the structure. Again, each field may have a nested definition like
- everything else.
-
- STRU ::= "STRU" #{ FLDS ([PACK] ARRY|STRU|DTYP)* }
-
- VALU: The value contains a datatype, and then a constant of that type.
- The datatype contains the size of the constant, so this chunk has variable
- size. VALU is used in the ARRY chunk to give the scaling limits of the array.
-
- BODY: This is the actual data we went to so much effort to describe.
- It is stored in "row-first" format, that is, items at the bottom of the
- nested description are stored next to each other. In most cases, it
- should be sufficient to simply block-read the whole chunk from disk,
- unless the reader needs to adjust byte-ordering or store in a more
- time-efficient format in memory. Data is assumed to be byte-aligned.
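
To illustrate the "row-first" ordering, here is a sketch that locates one item in the BODY of an array of arrays. It assumes byte-aligned elements of a fixed size and big-endian byte order (the native order on the 68000). The function name element_offset and the sample data are hypothetical.

```python
import struct


def element_offset(counts, index, elem_bytes):
    """Byte offset of one item in a nested array stored innermost-first.

    counts -- ELEM counts from the outermost ARRY to the innermost
    index  -- one index per nesting level, in the same order
    """
    if len(counts) != len(index):
        raise ValueError("need exactly one index per nesting level")
    offset = 0
    for count, i in zip(counts, index):
        if not 0 <= i < count:
            raise IndexError("index out of range")
        offset = offset * count + i  # the innermost index varies fastest
    return offset * elem_bytes


# A 2 x 3 array of 16-bit unsigned words (ASSUMED big-endian).
body = struct.pack(">6H", 10, 11, 12, 20, 21, 22)

off = element_offset((2, 3), (1, 2), 2)
(value,) = struct.unpack_from(">H", body, off)
print(off, value)
```

A reader that block-reads the whole BODY can use the same arithmetic to index into the buffer without copying it.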
-
- PACK: The PACK chunk is necessary when the bit length of the data is
- not a multiple of 8, that is, not byte-aligned, and the user wishes
- to conserve space by packing data items together. PACK is simply a
- number - the number of items to bit-pack before aligning on a byte.
- A PACK is in effect for the remainder of its nested scope, or until
- overridden by a new specification. A STRU or ARRY is assumed to have
- a PACK of 1 by default - it is not affected by PACKs in definitions above.
- A PACK of 0 means to byte-align before processing the next definition.
- The PACK specifier should be normalized. For example, when packing a large
- array of 3-bit numbers, PACK should be 8, since 3*8 = 24 bits fills exactly
- 3 bytes. Here 8 is the smallest PACK number which aligns on a byte naturally.
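
The normalization rule and the packing itself can be sketched as follows. The bit order inside a packed group is not stated in this document, so the sketch assumes the first item occupies the most significant bits. It also handles unsigned values only. All function names are hypothetical.

```python
from math import gcd


def normalized_pack(bits: int) -> int:
    """Smallest item count whose total bit length is a whole number of bytes."""
    return 8 // gcd(bits, 8)


def pack_group(values, bits):
    """Bit-pack unsigned values, then pad with zero bits up to a byte boundary.

    ASSUMPTION: the first value lands in the most significant bits.
    """
    acc = 0
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError("value does not fit in the given bit size")
        acc = (acc << bits) | v
    total = len(values) * bits
    pad = -total % 8
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack_group(data, bits, count):
    """Inverse of pack_group for one byte-aligned group."""
    acc = int.from_bytes(data, "big")
    total = len(data) * 8
    mask = (1 << bits) - 1
    return [(acc >> (total - bits * (i + 1))) & mask for i in range(count)]


items = [1, 2, 3, 4, 5, 6, 7, 0]
packed = pack_group(items, 3)
print(normalized_pack(3), packed.hex(), unpack_group(packed, 3, 8))
```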
-
- DTYP: The DataType is the most interesting chunk, as it attempts to define
- every conceivable type of numeric data with 32 bits. The 32 bits are broken
- down into three fields, 1) the size in bits, 2) the Class, and 3) SubClass.
- The Class makes the broadest distinction, separating integers from floating
- point numbers from Binary Coded Decimal, and so on. Within each class is a
- SubClass, which gives the specific encoding used. Finally, the Size tells
- how much room the data occupies. The basic division of datatypes is
- given in the tree structure below.
-
- Class            SubClass        Size  Final Specific Type
- =====            ========        ====  ===================
-   |
- Binary Unsigned - 0 ------------  8    UByte
-   |                              16    UWord
-   |                              32    ULong
-   |
- Binary Signed --- 0 ------------  8    Byte
-   |                              16    Word
-   |                              32    Long
-   |
- Real ----------- Ieee38 -------- 32    Ieee Single Precision
-   |                |
-   |              Ieee308 ------- 64    Double Precision
-   |                |             32    Truncated Double Precision
-   |                |
-   |              FFP ----------- 32    Motorola Fast Floating Point
-   |
- Text ----------- Text0 --------- ??    Null-terminated text
-   |                |
-   |              CText --------- ??    Number of characters in first byte
-   |                |
-   |              FText --------- ??    Fixed length, space padded
-   |
- BCD ------------ Nibble -------- ??
-                    |
-                  Character ----- ??
-
- A design goal was to create a classification system which other people
- can easily plug into. Many data types are simply size variations on
- existing data types. For example, a 4-bit integer can be specified by
- giving the size as four bits in the Signed Binary class. Be aware that
- not all MTRX readers may support your new type, but there will not be
- any type clashes or ambiguities by following these rules. If you have
- a truly unique Class or SubClass, you will need to register it with
- Commodore to prevent clashes.
-
- A second design goal was to create a format which is easily decoded
- by software. By aligning fields on bytes, you have the option of redefining
- the datatype as a structure, so as to avoid shifting when accessing the
- fields. Since the numbers are sequentially assigned, they are suitable
- as array indices, and may be optimized in a C switch statement.
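
A sketch of the byte-aligned decoding described above. The exact field widths and code numbers are not listed in this document, so everything concrete here is an assumption: a 16-bit size followed by an 8-bit SubClass and an 8-bit Class (the order used in the grammar), and class codes numbered from 0 in the order of the tree. The names DType, CLASS_NAMES and decode_dtyp are hypothetical.

```python
import struct
from collections import namedtuple

# "klass" is used because "class" is a reserved word in Python.
DType = namedtuple("DType", "size subclass klass")

# ASSUMPTION: class codes start at 0 and follow the order of the tree.
CLASS_NAMES = ["Binary Unsigned", "Binary Signed", "Real", "Text", "BCD"]


def decode_dtyp(raw: bytes) -> DType:
    """Split a 4-byte dtyp into its fields without any bit shifting.

    ASSUMPTION: 16-bit size, then 8-bit subclass, then 8-bit class.
    """
    if len(raw) != 4:
        raise ValueError("a dtyp is exactly 32 bits")
    return DType(*struct.unpack(">HBB", raw))


dt = decode_dtyp(bytes([0x00, 0x10, 0x00, 0x01]))

# Sequentially assigned codes can index a table directly.
print(dt.size, CLASS_NAMES[dt.klass])
```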
-
- A third design goal was allowing for naive and sophisticated readers.
- In checking for a certain datatype, a naive reader can simply compare
- the whole datatype with a small set of known types, which assumes that
- each different Size defines a unique datatype. Sophisticated readers
- will consider the Class, SubClass and Size separately, so as to support
- arbitrary size integers, and truncated Floating Point numbers, for example.
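
The difference between the two kinds of reader might look like this. As before, the field layout (16-bit size, 8-bit SubClass, 8-bit Class) and the code value for the Signed Binary class are assumptions made for illustration only, and all names are hypothetical.

```python
import struct

CLASS_SIGNED = 1  # ASSUMPTION: hypothetical code for the Signed Binary class


def make_dtyp(size: int, subclass: int, klass: int) -> int:
    """Build a 32-bit dtyp value under the assumed field layout."""
    return struct.unpack(">I", struct.pack(">HBB", size, subclass, klass))[0]


# Naive reader: each whole 32-bit value is treated as one opaque type code.
KNOWN_TYPES = {
    make_dtyp(8, 0, CLASS_SIGNED): "Byte",
    make_dtyp(16, 0, CLASS_SIGNED): "Word",
    make_dtyp(32, 0, CLASS_SIGNED): "Long",
}


def naive_lookup(dtyp: int):
    """Return a type name, or None if the exact value is not in the table."""
    return KNOWN_TYPES.get(dtyp)


def sophisticated_lookup(dtyp: int):
    """Look at Class, SubClass and Size separately to accept any integer size."""
    size, subclass, klass = struct.unpack(">HBB", struct.pack(">I", dtyp))
    if klass == CLASS_SIGNED and subclass == 0 and size > 0:
        return f"signed integer, {size} bits"
    return None


four_bit = make_dtyp(4, 0, CLASS_SIGNED)
print(naive_lookup(four_bit), "/", sophisticated_lookup(four_bit))
```

The naive reader rejects the 4-bit integer because it is not in its table, while the sophisticated reader accepts it.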
-
- *
- * MTRX ::= "FORM" #{ "MTRX" ARRY|STRU|DTYP BODY } Matrix
- * ARRY ::= "ARRY" #{ ELEM [LOWR] [UPPR] [PACK] ARRY|STRU|DTYP } Array
- * STRU ::= "STRU" #{ FLDS ([PACK] ARRY|STRU|DTYP)* } Structure
- * ELEM ::= "ELEM" #{ elements } Array elements
- * LOWR ::= "LOWR" { VALU } Minimum limit
- * UPPR ::= "UPPR" { VALU } Maximum limit
- * VALU ::= #{ dtyp value } Value (in union)
- * dtyp ::= { size, subclass, class } Data Type (scalar)
- * DTYP ::= "DTYP" #{ dtyp }
- * FLDS ::= "FLDS" #{ number of fields } Number of Fields
* PACK ::= "PACK" #{ units packed before byte alignment } Packing
- * BODY ::= "BODY" #{ inner-first binary dump } Data
- *
- * [] means optional
- * # means the size of the unit following
- * * means one or more of
- *
-
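
The grammar above lends itself to a small recursive reader. The sketch below walks a definition chunk and returns it as nested tuples. It assumes ELEM holds a big-endian unsigned count of any width, skips LOWR, UPPR, PACK and FLDS, and leaves the dtyp undecoded as a hex string. The function names and the sample dtyp bytes are hypothetical.

```python
import struct

DEFINITIONS = (b"ARRY", b"STRU", b"DTYP")


def chunk(ckid, data):
    """Frame data as an IFF chunk (used here only to build the sample)."""
    pad = b"\x00" if len(data) % 2 else b""
    return ckid + struct.pack(">I", len(data)) + data + pad


def read_chunks(buf):
    """Yield (id, data) for each chunk laid end to end in buf."""
    pos = 0
    while pos + 8 <= len(buf):
        ckid = buf[pos:pos + 4]
        (size,) = struct.unpack(">I", buf[pos + 4:pos + 8])
        yield ckid, buf[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # skip the pad byte after odd sizes


def parse_definition(ckid, data):
    """Turn an ARRY, STRU or DTYP chunk into nested tuples."""
    if ckid == b"DTYP":
        return ("DTYP", data.hex())
    if ckid == b"ARRY":
        count, element = None, None
        for sub_id, sub in read_chunks(data):
            if sub_id == b"ELEM":
                count = int.from_bytes(sub, "big")  # ASSUMED encoding
            elif sub_id in DEFINITIONS:
                element = parse_definition(sub_id, sub)
        return ("ARRY", count, element)
    if ckid == b"STRU":
        fields = [parse_definition(i, d)
                  for i, d in read_chunks(data) if i in DEFINITIONS]
        return ("STRU", fields)
    raise ValueError("not a definition chunk")


# Sample: an array of 3 structures, each holding two numbers.
word = chunk(b"DTYP", bytes([0x00, 0x10, 0x00, 0x00]))  # hypothetical dtyp
long_ = chunk(b"DTYP", bytes([0x00, 0x20, 0x00, 0x00]))  # hypothetical dtyp
stru = chunk(b"STRU", chunk(b"FLDS", struct.pack(">I", 2)) + word + long_)
arry = chunk(b"ARRY", chunk(b"ELEM", struct.pack(">I", 3)) + stru)

(top_id, top_data), = read_chunks(arry)
tree = parse_definition(top_id, top_data)
print(tree)
```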