-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- -----------------------------
-
-
-
- INSIDE TURBO PASCAL 5.5 UNITS
-
-
-
- -----------------------------
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- by
-
- William L. Peavy
-
- -----------------
-
- Revised: August 11, 1990
-
-
-
-
-
-
- ABSTRACT
-
- This document provides a revised report on research into
- the structure and content of Unit (.TPU) files produced by
- Turbo Pascal (version 5.5) from Borland International. No
- assurances are possible regarding when (if ever) further
- updates will be available, so the material is released to the
- Turbo Pascal user community in its admittedly incomplete
- state, since very little of consequence really remains to be
- done.
-
-
-
- COMMENTS
-
- Comments and feedback are welcome -- especially new
- contributions. I can be reached via the following services:
-
- CompuServe (70042,2310)
-
- HalPC Telecom-1 (William;Peavy)
-
- HalPC Telecom-2 (Wm;Peavy)
-
-
-
- Table Of Contents
-
-
-
- Introduction ................................................ 3
-
- 1. Gross File Structure ..................................... 3
- 1.1 User Units .......................................... 4
-
- 2. Locators ................................................. 5
- 2.1 Local Links ......................................... 5
- 2.2 Global Links ........................................ 5
- 2.3 Table Offsets ....................................... 5
-
- 3. Unit Header .............................................. 6
- 3.1 Description ......................................... 6
- 3.2 File Size ........................................... 9
-
- 4. Symbol Dictionaries ...................................... 9
- 4.1 Organization ........................................ 9
- 4.2 Interface Dictionary ............................... 10
- 4.3 DEBUG Dictionary ................................... 10
-
- 4.4 Dictionary Elements ................................ 10
- 4.4.1 Hash Tables .................................. 10
- 4.4.1.1 Size ................................... 11
- 4.4.1.2 Scope .................................. 12
- 4.4.1.3 Special Cases .......................... 12
-
- 4.4.2 Dictionary Headers ........................... 13
-
- 4.4.3 Dictionary Stubs ............................. 13
- 4.4.3.1 Label Declaratives ("O") ............... 13
- 4.4.3.2 Un-Typed Constants ("P") ............... 14
- 4.4.3.3 Named Types ("Q") ...................... 14
- 4.4.3.4 Variables, Fields, Typed Cons ("R") .... 15
- 4.4.3.5 Subprograms & Methods ("S") ............ 16
- 4.4.3.6 Turbo Std Procedures ("T") ............. 17
- 4.4.3.7 Turbo Std Functions ("U") .............. 17
- 4.4.3.8 Turbo Std "NEW" Routine ("V") .......... 17
- 4.4.3.9 Turbo Std Port Arrays ("W") ............ 17
- 4.4.3.10 Turbo Std External Variables ("X") .... 17
- 4.4.3.11 Units ("Y") ........................... 18
-
- 4.4.4 Type Descriptors ............................. 19
- 4.4.4.1 Scope .................................. 19
- 4.4.4.2 Prefix Part ............................ 20
-
- 4.4.4.3 Suffix Parts ........................... 21
- 4.4.4.3.1 Un-Typed ......................... 21
- 4.4.4.3.2 Structured Types ................. 22
- 4.4.4.3.2.1 ARRAY Types ................ 22
- 4.4.4.3.2.2 RECORD Types ............... 22
- 4.4.4.3.2.3 OBJECT Types ............... 23
- 4.4.4.3.2.4 FILE (non-TEXT) Types ...... 23
- 4.4.4.3.2.5 TEXT File Types ............ 23
- 4.4.4.3.2.6 SET Types .................. 24
- 4.4.4.3.2.7 POINTER Types .............. 24
- 4.4.4.3.2.8 STRING Types ............... 24
-
- 4.4.4.3.3 Floating-Point Types ............. 24
-
- 4.4.4.3.4 Ordinal Types .................... 24
- 4.4.4.3.4.1 "Integers" ................. 25
- 4.4.4.3.4.2 BOOLEANs ................... 25
- 4.4.4.3.4.3 CHARs ...................... 25
- 4.4.4.3.4.4 ENUMERATions ............... 26
-
- 4.4.4.3.5 SUBPROGRAM Types ................. 26
-
- 5. Maps and Lists .......................................... 27
- 5.1 PROC Map ........................................... 27
- 5.2 CSeg Map ........................................... 28
- 5.3 Typed CONST DSeg Map ............................... 28
- 5.4 Global VAR DSeg Map ................................ 29
- 5.5 Donor Unit List .................................... 29
- 5.6 Source File List ................................... 30
- 5.7 DEBUG Trace Table .................................. 31
-
- 6. Code, Data, Relocation Info ............................. 32
- 6.1 Object CSegs ....................................... 32
- 6.2 CONST DSegs ........................................ 32
- 6.3 Relocation Data Table .............................. 33
-
- 7. Supplied Program ........................................ 34
- 7.1 TPUNEW ............................................. 35
- 7.2 TPURPT1 ............................................ 35
- 7.3 TPUAMS1 ............................................ 35
- 7.4 TPUUNA1 ............................................ 35
- 7.5 Modifications ...................................... 36
-
- 7.6 Notes on Program Logic ............................. 36
- 7.6.1 Formatting the Dictionary .................... 37
- 7.6.2 The Disassembler ............................. 38
-
- 8. Unit Libraries .......................................... 41
- 8.1 Library Structure .................................. 41
- 8.2 The TPUMOVER Utility ............................... 41
-
- 9. Application Notes ....................................... 41
-
- 10. Acknowledgements ....................................... 42
-
- 11. References ............................................. 43
-
-
-
- Inside TURBO Pascal 5.5 Units
- ----------------------------------------------------------------------
-
- INTRODUCTION
-
-
- This document is the outcome of an inquiry conducted into the
- structure and content of Borland Turbo Pascal (Version 5.5) Unit
- files. The original purpose of the inquiry was to provide a body of
- theory enabling Cross-Reference programs to resolve references to
- symbols defined in .TPU files where qualification was not explicitly
- provided. As is so often the case, one thing led to another and the
- scope of the inquiry was expanded dramatically. While this document
- should not be regarded as definitive, the author feels that the entire
- Turbo Pascal User community might gain from the information extracted
- from these files at the cost of so much time and effort.
-
- The material contained herein represents the findings and
- interpretations of the author. A great deal of guess-work was
- required and no assurances are given as to the accuracy of either the
- findings of fact or the inferences contained herein which are the sole
- work-product of the author. In particular, the author had access only
- to materials or information that any normal Borland customer has
- access to. Further, no Borland source code was available, as the
- Library Routine source is not licensed to the author. In short, there
- was nothing irregular about how these findings were achieved.
-
- The material contained herein is placed in the public domain free of
- copyright for use of the general public at its own risk. The author
- assumes no liability for any damages arising from the use of this
- material by others. If you make use of this information and you get
- burned, TOUGH! The author accepts no obligation to correct any such
- errors as may exist in the supplied programs or in the findings of
- fact or opinion contained herein. On the other hand, this is not a
- "complete" work in that a great many questions remain open, especially
- as regards fine details. (The author is not a practitioner of Intel
- 80xxx Assembly Language and several open questions might best be
- addressed by persons competent in this area.) The author welcomes the
- input of interested readers who might be able to "flesh-out" some of
- these open questions with "hard" answers.
-
-
- 1. GROSS FILE STRUCTURE
-
-
- A Turbo Pascal Unit file (Version 5.5 only) consists of an array of
- bytes whose length is an exact multiple of sixteen (16). "Signature"
- information allows the compiler to verify that the .TPU file was
- compiled with the correct compiler version and to verify that the file
- is of the correct size. The fine structure of the file will be
- addressed in later sections at ever increasing levels of detail.
-
-
-
-
-
-
-
-
- ----------------------------------------------------------------------
- Rev: August 11, 1990 Page 3
-
- Graphically, the file may be regarded as having the following general
- layout:
-
- +-------------------+
- | Unit Header | Main Index to Unit File
- +-------------------+
- | Dictionaries: |
- | a) Interface |
- | b) Debugger * | For Local Symbol Access
- +-------------------+
- | PROC Map |
- +-------------------+
- | CSeg Map * | May be Empty
- +-------------------+
- | CONST DSeg Map * | May be Empty
- +-------------------+
- | VAR DSeg Map * | May be Empty
- +-------------------+
- | Donor Units * | May be Empty
- +-------------------+
- | Source Files |
- +-------------------+
- | Trace Table * | May be Empty
- +-------------------+
- | CODE Segment(s) * | May be Empty
- +-------------------+
- | DATA Segment(s) * | May be Empty
- +-------------------+
- | RELO Data * | May be Empty
- +-------------------+
-
-
- 1.1 USER UNITS
-
-
- Units prepared by the compiler available to ordinary users have a very
- straight-forward appearance and content. There may even be a little
- "wasted" space that might be removed if the compiler were just a
- little cleverer. The SYSTEM.TPU file is quite another thing however.
-
- The SYSTEM.TPU file (found in TURBO.TPL) is extraordinary in that
- great pains seem to have been taken to compact it. Further, it
- contains a great many types of entries that just don't seem to be
- achievable by ordinary users and I suspect that much (if not all) of
- it was "hand-coded" in Assembler Language.
-
- In the following sections, the details of these optimizations will be
- explained in the context of the structural element then under
- discussion.
-
-
-
- 2. LOCATORS
-
-
- The data in these files has need of structure and organization to
- support efficient access by the various programs such as the compiler,
- the linker and the debugger. This organization is built on a solid
- foundation of locators employed in the unit's data structures.
-
-
-
- 2.1 LOCAL LINKS
-
-
- Local Links (LL's) are items of type WORD (2 bytes) which contain an
- offset which is relative to the origin of the unit file itself. This
- implies that a unit must be somewhat less than 64K bytes in size. If
- the .TPU file is loaded into the heap, then LL's can be used to locate
- any byte in the segment beginning with the load point of the file.
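-
- As an illustration (not part of the original findings), following an LL
- might be sketched as below. The sketch assumes the whole .TPU file has
- been read into a bytes object and that words are stored low byte first,
- as is usual on Intel processors. The helper name read_ll and the toy
- image are hypothetical.

```python
import struct

def read_ll(unit: bytes, offset: int) -> int:
    """Return the 16-bit Local Link stored at `offset` in the unit image."""
    (ll,) = struct.unpack_from("<H", unit, offset)
    return ll

# Toy image: the word at offset 2 holds the LL 0x0006, which in turn
# locates the byte 0xAA at offset 6 from the origin of the unit.
image = bytes([0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0xAA, 0x00])
target = read_ll(image, 2)
located_byte = image[target]
```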
-
-
-
- 2.2 GLOBAL LINKS
-
-
- Global Links (LG's) are used to locate type descriptors which may
- reside in other Units (i.e., units external to the present unit).
- LG's are structured items consisting of two (2) words. The first of
- these is an LL that is relative to the origin of the (possibly)
- external unit. The second word is an LL which locates the stub of the
- unit entry in the current unit dictionary for the (possibly) external
- unit. This dictionary entry provides the name of the unit that
- contains the item the LG points to.
-
- This provides a handy mechanism for locating type descriptors which
- are defined in other separately compiled units.
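-
- A minimal sketch of reading an LG, under the same assumptions as for
- LL's (unit image held in a bytes object, low byte first). The names
- GlobalLink and read_lg are hypothetical, and the sample bytes are made
- up for illustration.

```python
import struct
from typing import NamedTuple

class GlobalLink(NamedTuple):
    target_ll: int     # offset of the item within the (possibly external) unit
    unit_stub_ll: int  # offset, in the current unit, of the stub of the unit entry

def read_lg(unit: bytes, offset: int) -> GlobalLink:
    """Read the two consecutive words that make up a Global Link."""
    first, second = struct.unpack_from("<HH", unit, offset)
    return GlobalLink(first, second)

# Four sample bytes: the words 0x1234 and 0x0078, low byte first.
sample = bytes.fromhex("34127800")
lg = read_lg(sample, 0)
```

- Resolving the link fully would still require reading the unit name from
- the dictionary entry located by the second word, and then loading that
- unit.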
-
-
-
- 2.3 TABLE OFFSETS
-
-
- Finally, various data-structures within a .TPU file are organized as
- arrays of fixed-length records or as lists of variable-length records.
- Efficient access to such records is achieved by means of offsets
- rather than subscripts (an addressing technique denied Pascal). These
- offsets are relative to the origin of the array or list being
- referenced rather than the origin of the unit.
-
-
-
-
- 3. UNIT HEADER
-
-
- The Unit Header comprises the first 64 bytes of the .TPU file. It
- contains LL's that effectively locate all other sections of the .TPU
- file plus statistics that enable a little cross-checking to be
- performed. Some parts of the Unit Header appear to be reserved for
- future use since no unit examined by this author has ever contained
- non-zero data in these apparently reserved fields.
-
-
-
- 3.1 DESCRIPTION
-
-
- The Unit Header provides a high-level locator table whereby each major
- structure in the unit file can be addressed. The following provides a
- Pascal-like explanation of the layout of the header followed by
- further narrative discussion of the contents of the individual fields
- in the Unit Header.
-
- Type HdrAry = Array[0..3] of Char; LL = Word;
-
- UnitHeader = Record
-
- FilHd : HdrAry; { +00 : = 'TPU6' }
- Fillr : HdrAry; { +04 : = $00000000 }
- UDirE : LL; { +08 : to Dictionary Head-This Unit }
- UGHsh : LL; { +0A : to Interface Hash Header }
- UHPrc : LL; { +0C : to PROC Map }
- UHCsg : LL; { +0E : to CSeg Map }
- UHDsT : LL; { +10 : to DSeg Map-Typed CONST's }
- UHDsV : LL; { +12 : to DSeg Map-GLOBAL Variables }
- URULt : LL; { +14 : to Donor Unit List }
- USRCF : LL; { +16 : to Source file List }
- UDBTS : LL; { +18 : to Debug Trace Step Controls }
- UndNC : LL; { +1A : to end non-code part of Unit }
- ULCod : Word; { +1C : Size of Code }
- ULTCon: Word; { +1E : Size of Typed Constant Data }
- ULPtch: Word; { +20 : Size of Relo Patch List }
- Unknx : Word; { +22 : Number of Virtual Objects??? }
- ULVars: Word; { +24 : Size of GLOBAL VAR Data }
- UHash2: LL; { +26 : to Debug Hash Header }
- UOvrly: Word; { +28 : Number of Procs to Overlay?? }
- UVTPad: Array[0..10]
- of Word; { +2A : Reserved for Future Expansion? }
-
- End; { UnitHeader }
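-
- For readers working outside Turbo Pascal, the same layout can be
- sketched in Python. This is an illustration only: it assumes the file
- has been read into a bytes object and that words are stored low byte
- first. The names HEADER_FORMAT, HEADER_SIZE and parse_header are
- hypothetical, and the sample header is synthetic.

```python
import struct
from typing import NamedTuple

# "<" = little-endian with no padding: two 4-byte arrays, 17 words,
# then the 11 reserved words kept as 22 raw bytes.
HEADER_FORMAT = "<4s4s17H22s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 64 bytes

class UnitHeader(NamedTuple):
    FilHd: bytes
    Fillr: bytes
    UDirE: int
    UGHsh: int
    UHPrc: int
    UHCsg: int
    UHDsT: int
    UHDsV: int
    URULt: int
    USRCF: int
    UDBTS: int
    UndNC: int
    ULCod: int
    ULTCon: int
    ULPtch: int
    Unknx: int
    ULVars: int
    UHash2: int
    UOvrly: int
    UVTPad: bytes

def parse_header(unit: bytes) -> UnitHeader:
    """Unpack the first 64 bytes of a unit image into named fields."""
    if len(unit) < HEADER_SIZE:
        raise ValueError("image is shorter than a Unit Header")
    header = UnitHeader(*struct.unpack_from(HEADER_FORMAT, unit, 0))
    if header.FilHd != b"TPU6":
        raise ValueError("not a Turbo Pascal 5.5 unit")
    return header

# Synthetic header for illustration only; the word values are made up.
sample = struct.pack(HEADER_FORMAT, b"TPU6", bytes(4), *range(100, 117), bytes(22))
header = parse_header(sample)
```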
-
- FilHd contains the characters "TPU6" in that order. This is
- clear evidence that this unit was compiled by Turbo Pascal
- Version 5.5.
-
- Fillr is apparently reserved and contains binary zeros.
-
- UDirE contains an LL (WORD) which points to the Dictionary
- Header in which the name of this unit is found.
-
- UGHsh contains an LL (WORD) which points to a Hash table that is
- the root of the Interface Dictionary tree.
-
- UHPrc contains an LL (WORD) which points to the PROC Map for
- this unit. The PROC Map contains an entry for each
- Procedure or Function declared in the unit (except for
- INLINE types), plus an entry for the Unit Initialization
- section. The length of the PROC Map (in bytes) is
- determined by subtracting this LL (at 000C) from the LL at
- offset 000E.
-
- UHCsg contains an LL (WORD) which points to the CSeg (CODE
- Segment) Map for this unit. The CSeg Map contains an
- entry for each CODE Segment produced by the compiler plus
- an entry for each of the CODE Segments included via the
- {$L filename.OBJ} compiler directive. The length of this
- Map (in bytes) is obtained by subtracting this LL (at
- 000E) from the word at 0010. The result may be zero in
- which case the CSeg Map is empty.
-
- UHDsT contains an LL (WORD) which points to the DSeg (DATA
- Segment) Map that maps the initializing data for Typed
- CONST items plus templates for VMT's (Virtual Method
- Tables) that are associated with OBJECTS which employ
- Virtual Methods. The length of this Map (in bytes) is
- obtained by subtracting this LL (at 0010) from the word at
- 0012. The result may be zero in which case this DSeg Map
- is empty.
-
- UHDsV contains an LL (WORD) which points to the DSeg (DATA
- Segment) Map that contains the specifications for DSeg
- storage required by VARiables whose scope is GLOBAL. The
- length of this Map (in bytes) is obtained by subtracting
- this LL (at 0012) from the word at 0014. The result may
- be zero in which case this DSeg Map is empty.
-
- URULt contains an LL (WORD) which points to a table of units
- which contribute either CODE or DATA Segments to the .EXE
- file for a program using this Unit. This is called the
- "Donor Unit Table". The length of this table (in bytes)
- is obtained by subtracting this LL (at 0014) from the word
- at 0016. The result may be zero in which case this table
- is empty.
-
- USRCF contains an LL (WORD) which points to a list of "source"
- files. These are the files whose CODE or DATA Segments
- are included in this Unit by the compiler. Examples are
- the Pascal Source for the Unit itself, plus the .OBJ files
- included via the {$L filename.OBJ} compiler directive.
- The length of this table (in bytes) is obtained by
- subtracting this LL (at 0016) from the word at 0018. The
- result may be zero in which case this table is empty.
-
- UDBTS contains an LL (WORD) which points to a Trace Table used
- by the DEBUGGER for "stepping" through a Function or
- Procedure contained in this Unit. The length of this
- table (in bytes) is obtained by subtracting this LL (at
- 0018) from the word at 001A. The result may be zero in
- which case this table is empty.
-
- UndNC contains an LL (WORD) which points to the first free byte
- which follows the Trace Table (if any). It serves as a
- delimiter for determining the size of the Trace Table.
- This LL (when rounded up to the next integral multiple of
- 16) serves to locate the start of the code/data segments.
-
- ULCod is a WORD that contains the total byte count of all CODE
- Segments compiled into this Unit.
-
- ULTCon is a WORD that contains the total byte count of all Typed
- CONST and VMT DATA Segments compiled into this unit.
-
- ULPtch is a WORD that contains the total byte count of the
- Relocation Data Table for this unit.
-
- Unknx is a WORD whose usage is poorly understood. It appears
- always to be zero except when the Unit contains OBJECTs
- which employ Virtual Methods.
-
- ULVars is a WORD that contains the total byte count of all GLOBAL
- VAR DATA Segments compiled into this unit.
-
- UHash2 contains an LL (WORD) which points to a Hash Table which
- is the root of the DEBUGGER Dictionary. If Local Symbols
- were generated by the compiler (directive {$L+}) then ALL
- symbols declared in the unit can be accessed from this
- Hash Table. In the SYSTEM.TPU file, there is no such
- Dictionary and the LL stored here points to the INTERFACE
- Dictionary. This is an example of Hash Table "Folding" to
- save space which has been observed only in SYSTEM.TPU.
-
- UOvrly is a WORD whose usage is poorly understood. This word is
- usually zero unless the Unit was compiled with the Overlay
- Directive {$O+}.
-
- UVTPad begins a series of eleven (11) words that are apparently
- reserved for future use. Nothing but zeros has ever been
- seen here by this author.
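-
- The subtraction rules given in the field descriptions above can be
- collected into one small sketch. It is illustrative only: the header is
- represented as a plain dictionary keyed by the field names used in this
- document, the names TABLE_BOUNDS, table_lengths and round_up_16 are
- hypothetical, and the LL values in the example are made up.

```python
def round_up_16(value: int) -> int:
    """Round up to the next multiple of 16 (unchanged if already one)."""
    return (value + 15) & ~15

# Header fields in file order; each table ends where the next one begins.
TABLE_BOUNDS = [
    ("PROC Map", "UHPrc", "UHCsg"),
    ("CSeg Map", "UHCsg", "UHDsT"),
    ("CONST DSeg Map", "UHDsT", "UHDsV"),
    ("VAR DSeg Map", "UHDsV", "URULt"),
    ("Donor Unit List", "URULt", "USRCF"),
    ("Source File List", "USRCF", "UDBTS"),
    ("Trace Table", "UDBTS", "UndNC"),
]

def table_lengths(header: dict) -> dict:
    """Length in bytes of each table; zero means the table is empty."""
    return {name: header[end] - header[start] for name, start, end in TABLE_BOUNDS}

# Made-up LL values for illustration only.
example = {"UHPrc": 0x0200, "UHCsg": 0x0218, "UHDsT": 0x0220, "UHDsV": 0x0220,
           "URULt": 0x0228, "USRCF": 0x0230, "UDBTS": 0x0250, "UndNC": 0x0253}
lengths = table_lengths(example)

# UndNC rounded up to a multiple of 16 locates the code/data segments.
code_start = round_up_16(example["UndNC"])
```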
-
-
-
- 3.2 FILE SIZE
-
-
- An independent check on the size of the .TPU file is available using
- information contained in the Unit Header. This is also important for
- .TPL (Unit Library) organization. To compute the file size, refer to
- the four (4) words at offsets 001A, 001C, 001E and 0020 (UndNC, ULCod,
- ULTCon and ULPtch). Round each of these words up to the lowest
- multiple of 16 that is greater than or equal to its content. Then
- form the sum of the rounded words. This is the .TPU file size in bytes.
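-
- The computation can be sketched as follows. The function names and the
- sample values are hypothetical; only the rounding rule and the choice
- of the four header words come from the text above.

```python
def round_up_16(value: int) -> int:
    """Lowest multiple of 16 that is greater than or equal to value."""
    return (value + 15) // 16 * 16

def expected_file_size(und_nc: int, ul_cod: int, ul_tcon: int, ul_ptch: int) -> int:
    """Sum of the four header words, each rounded up to a multiple of 16."""
    return sum(round_up_16(word) for word in (und_nc, ul_cod, ul_tcon, ul_ptch))

# Made-up values: 595 bytes of non-code data, 100 bytes of code,
# no typed constants and 17 bytes of relocation data.
size = expected_file_size(0x0253, 100, 0, 17)   # 608 + 112 + 0 + 32
```

- Comparing this figure with the actual length of the file on disk gives
- the independent size check described above.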
-
-
-
- 4. SYMBOL DICTIONARIES
-
-
- This area contains all available documentation of declared symbols and
- procedure blocks defined within the unit. Depending on compiler
- options in effect when the unit was compiled, this section will
- contain at a minimum, the INTERFACE declarations, and at a maximum,
- ALL declarations. The information stored in the dictionary is highly
- dependent on the context of the symbol declared. We defer further
- explanation to the appropriate section which follows.
-
-
-
- 4.1 ORGANIZATION
-
-
- The dictionary is organized with a Hash Table as its root. The hash
- table is used to provide rapid access to arbitrary symbols. Since
- Turbo Pascal compiles very rapidly, I presume the hash function to be
- an effective one.
-
- The dictionary itself may be thought of as an n-way tree. Each
- subtree has its roots in a hash table. There may be a great many hash
- tables in a given unit and their number depends on unit complexity as
- well as the options chosen when the unit was compiled. Use of the
- {$L+} directive produces the densest trees. The hash tables are
- explained in detail a few sections further on.
-
- Hash tables point to Dictionary Headers. When two or more symbols
- produce the same hash function result, a collision is said to occur.
- Collisions are resolved by the time-honored method of chaining
- together the Dictionary Headers of those symbols having the same hash
- function result. Dictionary supersetting is accomplished using these
- chains.
-
-
-
- 4.2 INTERFACE DICTIONARY
-
-
- The INTERFACE dictionary contains all symbols and the necessary
- explanatory data for the INTERFACE section of a Unit. Symbols get
- added to the Unit using increasing storage addresses until the
- IMPLEMENTATION section is encountered.
-
-
-
- 4.3 DEBUG DICTIONARY
-
-
- The DEBUG dictionary (if present) is a superset of the INTERFACE
- dictionary. It is used by the Turbo Debugger to support its many
- features when tracing through a unit. If present, this dictionary is
- rooted in its own hash table. The hash table is effectively
- initialized when the IMPLEMENTATION keyword is processed by the
- compiler. This takes the form (initially) of an unmodified copy of
- the INTERFACE hash table, to which symbols are added in the usual
- fashion. Thus, the hash chains constructed or extended at this time
- lead naturally to the INTERFACE chains and this is how the superset is
- effectively implemented.
-
-
-
- 4.4 DICTIONARY ELEMENTS
-
-
- The dictionary contains four major elements. These are: hash tables,
- Dictionary Headers, Dictionary Stubs and Type Descriptors. The
- distinction between Dictionary Headers and Stubs is essentially
- arbitrary and is made in this document to assist in exposition. They
- might just as easily be regarded as a single element (such as symbol
- entry).
-
-
-
- 4.4.1 HASH TABLES
-
-
- As has been intimated, Hash Tables are the glue that binds the
- dictionary entries together and gives the dictionary its "shape".
- They effectively implement the scope rules of the language and speed
- access to essential information.
-
- Each Hash table begins with a 2-byte size descriptor. This descriptor
- contains the number of bytes in the table proper (less 2). Thus, the
- descriptor directly points to the last bucket in the hash table. For
- a hash table of 128 bytes, the size descriptor contains 126. The
- first bucket in the table immediately follows the size descriptor.
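-
- A minimal sketch of reading one hash table, assuming the unit image is
- held in a bytes object with words stored low byte first. The name
- read_hash_table is hypothetical and the sample table is made up.

```python
import struct

def read_hash_table(unit: bytes, offset: int) -> list:
    """Return the slots (LL's or zeros) of the hash table at `offset`."""
    (size_descriptor,) = struct.unpack_from("<H", unit, offset)
    # The descriptor is the table size in bytes less 2, i.e. the byte
    # offset of the last slot, so there are descriptor / 2 + 1 slots.
    slot_count = size_descriptor // 2 + 1
    return list(struct.unpack_from("<%dH" % slot_count, unit, offset + 2))

# A made-up table: descriptor 0006h followed by four slots.
table = bytes.fromhex("0600" "0000" "4000" "0000" "5200")
slots = read_hash_table(table, 0)
```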
-
-
-
- 4.4.1.1 SIZE
-
-
- So far, three different hash table sizes have been observed. The
- INTERFACE and DEBUG hash tables are usually 128 bytes (64 entries) in
- size plus 2 bytes of size description, but the SYSTEM.TPU unit is a
- special case, containing only 16 entries. Hash tables which anchor
- subtrees whose scope is relatively local usually contain four (4)
- entries (8 bytes).
-
- Graphically, a Hash Table with four slots has the following layout:
-
- +--------------------+
- | 0006h | Size Descriptor
- +====================+
- | slot 0 | an LL or zero
- +--------------------+
- | slot 1 | an LL or zero
- +--------------------+
- | slot 2 | an LL or zero
- +--------------------+
- | slot 3 | an LL or zero
- +--------------------+
-
- It should be noted that the Size Descriptor furnishes an upper bound
- for the hash function itself. Thus, it seems possible that a single
- hash function is used for all hash tables and that its result is ANDed
- with the Size Descriptor to get the final result. Because the sizes
- are chosen as they are (powers of 2) this is feasible. Note that in
- the above example, 6 = 2 * (n - 1) where n = 4 {slot count}. All of
- the hash tables observed so far have this property. What you get is a
- really efficient MOD function.
-
- Suppose that the hash of a given symbol is 13 and the proper slot must
- be located for a hash table of four entries. If we let "h" be the raw
- result of 13, then our final hash is (h SHL 1) AND ((4-1) SHL 1) or
-
- (13 SHL 1) AND 6 = 26 AND 6 = 2
-
- This result is a byte offset from the first slot, so it selects slot 1
- (and indeed 13 MOD 4 = 1).
-
- One final note on this subject. Given these properties, "Folding" of
- sparse hash tables is a rather trivial exercise so long as the new
- hash table also contains a number of slots that is a power of 2. This
- point is intriguing when one recalls that the SYSTEM.TPU hash table
- has only 16 slots rather than the usual 64.
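-
- The masking arithmetic, and the reason folding is cheap, can be
- sketched as below. The raw hash function itself is not known, so the
- sketch takes the raw value as given; the function names are
- hypothetical and the single-hash-function idea remains a conjecture.

```python
def slot_offset(raw_hash: int, size_descriptor: int) -> int:
    """Byte offset of the slot, relative to the first slot of the table."""
    return (raw_hash << 1) & size_descriptor

def fold_offset(offset_in_big_table: int, small_size_descriptor: int) -> int:
    """Offset the same symbol would get in a smaller power-of-2 table."""
    return offset_in_big_table & small_size_descriptor

# The worked example from the text: raw hash 13, four slots (descriptor 6).
example = slot_offset(13, 6)

# Folding a 64-slot table (descriptor 126) down to 16 slots (descriptor 30)
# gives the same offsets as hashing into the 16-slot table directly.
folding_agrees = all(
    fold_offset(slot_offset(h, 126), 30) == slot_offset(h, 30)
    for h in range(1024)
)
```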
-
-
-
- 4.4.1.2 SCOPE
-
-
- The INTERFACE and DEBUG dictionary hash tables are Global in Scope
- even though the symbols accessed directly via the DEBUG hash table may
- be private. On the other hand, other hash tables are purely local in
- scope. For example, the fields declared within a record are reached
- via a small local hash table, as are the parameters and local
- variables declared within procedures and functions. Even OBJECTS use
- this technique to provide access to Methods and Object Fields.
-
- Access to such local scope fields/methods requires use of qualified
- names which ensures conformity to Pascal scope rules. The method is
- truly simple and elegant.
-
-
-
- 4.4.1.3 SPECIAL CASES
-
-
- The SYSTEM.TPU Unit is a special case. Its INTERFACE and DEBUG hash
- tables have apparently been "hand-tuned" for small size. Each
- contains only sixteen (16) entries. In addition, the DEBUG hash table
- is empty since there is no local symbol generation in this unit.
- Therefore, the DEBUG hash table does not exist as a separate entity,
- its function being served by the INTERFACE hash table. The pointer to
- the DEBUG hash table (in the Unit Header) has the same value as the
- pointer to the INTERFACE hash table (SYSTEM unit ONLY).
-
-
-
- 4.4.2 DICTIONARY HEADERS
-
-
- This is the structure that anchors all information known by the
- compiler about any symbol. The format is as follows:
-
- +00: An LL which points to the next (previous) symbol in the
- same scope which had the same hash function value.
-
- +02: A character that defines the category the symbol belongs
- to and defines the format of the Dictionary Stub which
- follows the Dictionary Header.
-
- +03: A String (in the Pascal sense) of variable size that
- contains the text of the symbol (in UPPER-CASE letters
- only). The SizeOf function is not defined for these
- strings since they are truncated to match the symbol size.
- The "value" of the SizeOf function can be determined by
- adding 1 to the first byte in the string. Thus,
- Ord(Symbol[0])+1 is the expression that defines the Size
- of the symbol string. Turbo Pascal defines a symbol as a
- string of relatively arbitrary size, the most significant
- 63 characters of which will be stored in the dictionary.
- Thus, we conclude that the maximum size of such a string
- is 64 bytes.
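-
- A sketch of decoding a Dictionary Header and following a collision
- chain. It assumes a unit image held in a bytes object, words stored low
- byte first, and a chain that ends with a zero LL (an assumption made
- here by analogy with empty hash slots). All names and the sample
- entries are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

class DictHeader(NamedTuple):
    offset: int       # where this header starts in the unit
    next_ll: int      # next header on the same hash chain
    category: str     # category character, e.g. "Q" or "R"
    name: str         # symbol text (upper case)
    stub_offset: int  # first byte of the Dictionary Stub that follows

def read_dict_header(unit: bytes, offset: int) -> DictHeader:
    (next_ll,) = struct.unpack_from("<H", unit, offset)
    category = chr(unit[offset + 2])
    length = unit[offset + 3]                 # Pascal string length byte
    name = unit[offset + 4 : offset + 4 + length].decode("ascii")
    return DictHeader(offset, next_ll, category, name, offset + 4 + length)

def walk_chain(unit: bytes, first_ll: int) -> Iterator[DictHeader]:
    """Yield each header on a chain, stopping at zero or on a repeat."""
    seen = set()
    ll = first_ll
    while ll != 0 and ll not in seen:
        seen.add(ll)
        header = read_dict_header(unit, ll)
        yield header
        ll = header.next_ll

# Made-up image with two chained entries: COUNT ("R") then T ("Q").
unit = bytearray(32)
unit[4:13] = struct.pack("<H", 16) + b"R" + bytes([5]) + b"COUNT"
unit[16:21] = struct.pack("<H", 0) + b"Q" + bytes([1]) + b"T"
chain = list(walk_chain(bytes(unit), 4))
```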
-
-
-
- 4.4.3 DICTIONARY STUBS
-
-
- Dictionary Stubs immediately follow their respective headers and their
- format is determined by the category character in the Dictionary
- Header. The function of the stub is to organize the information
- appropriate to the symbol and provide a means of accessing additional
- information such as type descriptors, constant values, parameter lists
- and nested scopes. The format of each Stub is presented in the
- following sub-sections.
-
-
-
- 4.4.3.1 LABEL DECLARATIVES ("O")
-
-
- This Stub consists of a WORD whose function is (as yet) unknown.
-
-
-
- 4.4.3.2 UN-TYPED CONSTANTS ("P")
-
-
- This Stub consists of two (2) fields:
-
- +00: An LG which points to a Type Descriptor (usually in
- SYSTEM.TPU). This establishes the minimum storage
- requirement for the constant. The rules vary with the
- type, but the size of the constant data field (which
- follows) is defined using the Type Descriptor(s).
-
- +04: The value of the constant. For ordinal types, this value
- is stored as a LONGINT (size=4 bytes). For Floating-Point
- types, the size is implicit in the type itself. For
- String types, the size is determined from the length of
- the string which is stored in the initial byte of the
- constant.
-
-
-
- 4.4.3.3 NAMED TYPES ("Q")
-
-
- This Stub consists of an LG (4-bytes) that points to the Type
- Descriptor for this symbol.
-
-
-
- 4.4.3.4 VARIABLES, FIELDS, TYPED CONS ("R")
-
-
- This Stub contains information required to allocate and describe these
- types of entities. The format and content is as follows:
-
- +00: A one-byte flag that precisely identifies the class of the
- item being described. The known values and their proper
- interpretation are as follows:
-
- 0 -> Global Variables Allocated in DS;
- 1 -> Typed Constants Allocated in DS;
- 2 -> LOCAL Variables & VALUE Parameters on STACK;
- 6 -> ADDRESS Parameters allocated on STACK;
- 8 -> Fields suballocated in RECORDS and OBJECTS, plus
- METHODS declared for OBJECTS.
-
- +01: A WORD containing the allocation offset in bytes;
-
- +03: A WORD whose content depends on the one-byte flag that
- this stub begins with. The context-dependent values
- observed thus far are:
-
- If the flag is 0, 2 or 6, then this word is an LL that
- locates the containing scope or zero if none;
-
- If the flag is 8, then this word is an LL that locates the
- Dictionary Header for the next field or method defined
- within the Record or Object;
-
- If the flag is 1, then this word is an offset within the
- CONST DSeg Map that locates the text of the Typed Constant
- Data.
-
- +05: An LG that locates the proper Type Descriptor for this
- symbol.
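-
- The nine-byte layout above can be sketched in Python as shown here.
- This is only an illustration: the function and key names are
- hypothetical, the byte order is assumed to be little-endian (as on
- the 8086), and the LG is kept as four raw bytes since its internal
- layout is described elsewhere (section 2.2).
-
```python
import struct

# Class codes from the flag byte at +00, as listed above.
STORAGE_CLASS = {
    0: "global variable (DS)",
    1: "typed constant (DS)",
    2: "local variable or value parameter (stack)",
    6: "address parameter (stack)",
    8: "record/object field or method",
}

def parse_r_stub(buf, pos=0):
    """Decode the 9-byte "R" stub found at buf[pos:]."""
    # "<" = little-endian (assumed); B = byte, H = word, 4s = raw LG.
    flag, offset, context, type_lg = struct.unpack_from("<BHH4s", buf, pos)
    if flag == 1:
        role = "offset in CONST DSeg Map"
    elif flag == 8:
        role = "LL of next field or method"
    else:
        role = "LL of containing scope"
    return {
        "class": STORAGE_CLASS.get(flag, "unknown (%d)" % flag),
        "offset": offset,
        "context": context,
        "context_role": role,
        "type_lg": type_lg,
    }

# Hypothetical sample: a value parameter at offset 6, scope LL 0120h.
sample = bytes([0x02, 0x06, 0x00, 0x20, 0x01, 0x34, 0x12, 0x56, 0x00])
stub = parse_r_stub(sample)
```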
-
-
-
- 4.4.3.5 SUBPROGRAMS & METHODS ("S")
-
-
- Subprograms, especially since Object Methods are supported, have a
- rather involved stub. Its format is as follows:
-
- +00: A byte that contains bit-switches. These bit switches
- have a great deal to do with the size of this stub and
- with the proper interpretation of what follows. The
- observed values of the bit-switches are as follows:
-
- xxxxxxx1 -> Symbol declared in INTERFACE;
- xxxxxx1x -> Symbol is an INLINE Declarative;
- xxxx1x0x -> Symbol has EXTERNAL attribute;
- x001xxxx -> Symbol is an ordinary Object Method;
- x011xxxx -> Symbol is a CONSTRUCTOR Method;
- x101xxxx -> Symbol is a DESTRUCTOR Method;
-
- +01: A Word whose interpretation depends on whether we have an
- INLINE Declarative Subprogram or not. If this is an
- INLINE Declarative Subprogram, then this word contains the
- byte-count of the INLINE code text at the end of this
- stub. Otherwise, this word is the offset within the PROC
- Map that locates the object code for this Subprogram.
-
- +03: A Word that contains an LL which locates the containing
- scope in the dictionary, or zero if none.
-
- +05: A Word that contains an LL which locates the local Hash
- Table for this scope. A local hash table provides access
- to all formal parameters of the Subprogram as well as all
- Symbols whose declarations are local to the scope of this
- Subprogram.
-
- +07: A Word that is zero unless the symbol is a Virtual Method.
- In this case, then the content is the offset within the
- VMT for the owning object that defines where the FAR
- POINTER to this Virtual Method is stored.
-
- +09: A Word that is zero unless the symbol is a Method. In
- this case, then the content is an LL which locates the
- next METHOD for this Object.
-
- +0B: A complete Type-Descriptor for this Subprogram. The
- length is variable and depends upon the number of Formal
- Parameters declared in the header. A complete description
- of this subfield is found in a later section (4.4.4.3.5,
- SUBPROGRAM TYPES).
-
- +??: If this Symbol represents an INLINE Declarative
- Subprogram, then the object-code text begins here. The
- byte-count of the text occurs at offset 0001h in this
- stub.
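-
- As a rough illustration, the bit-switches and the fixed part of the
- stub (+00 through +0A) might be decoded in Python as follows. The
- names are hypothetical and little-endian byte order is assumed; only
- the bit patterns listed above are interpreted.
-
```python
import struct

METHOD_KIND = {0x10: "method", 0x30: "constructor", 0x50: "destructor"}

def decode_s_flags(flags):
    """Name the bit-switches in the byte at +00 of an "S" stub."""
    attrs = []
    if flags & 0x01:
        attrs.append("interface")
    if flags & 0x02:
        attrs.append("inline")
    if (flags & 0x0A) == 0x08:             # pattern xxxx1x0x
        attrs.append("external")
    kind = METHOD_KIND.get(flags & 0x70)   # patterns x001, x011, x101
    if kind:
        attrs.append(kind)
    return attrs

def parse_s_stub(buf, pos=0):
    """Decode the fixed part of an "S" stub (one byte plus five words)."""
    flags, code, scope_ll, hash_ll, vmt_offset, next_method_ll = \
        struct.unpack_from("<B5H", buf, pos)
    inline = bool(flags & 0x02)
    return {
        "attrs": decode_s_flags(flags),
        # +01 is a byte count for INLINE text, else a PROC Map offset.
        "inline_size": code if inline else None,
        "proc_map_offset": None if inline else code,
        "scope_ll": scope_ll,
        "hash_ll": hash_ll,
        "vmt_offset": vmt_offset,
        "next_method_ll": next_method_ll,
        "type_descriptor_pos": pos + 0x0B,
    }

# Hypothetical sample: a constructor declared in the INTERFACE.
sample = struct.pack("<B5H", 0x31, 0x0008, 0x0100, 0x0140, 0, 0x0180)
stub = parse_s_stub(sample)
```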
-
-
-
- 4.4.3.6 TURBO STD PROCEDURES ("T")
-
-
- This Stub consists of two bytes, the first of which is unique for each |
- procedure and increments by 4. I have found nothing in the SYSTEM |
- unit (which is where this entry appears) that this seems directly |
- related to. The second byte is always zero. |
-
-
-
- 4.4.3.7 TURBO STD FUNCTIONS ("U")
-
-
- This Stub consists of two bytes, the first of which is unique for each |
- function and increments by 4. I have found nothing in the SYSTEM unit |
- (which is where this entry appears) that this seems directly related |
- to. I wouldn't be surprised if this byte were an index into a TURBO |
- compiler table that points to specialized parse tables/action routines |
- for handling these functions and their non-standard parameter lists. |
-
- The second byte seems to be a flag having the values $00, $40 and $C0. |
- I strongly suspect that the flag $C0 marks exactly those functions |
- which may be evaluated at compile-time. The meaning behind the other |
- values is not known to me. |
-
-
-
- 4.4.3.8 TURBO STD "NEW" ROUTINE ("V")
-
-
- This Stub consists of a WORD whose function is (as yet) unknown. This |
- is the only Standard Turbo routine that can behave as a procedure as |
- well as a function (returning a pointer value). |
-
-
-
- 4.4.3.9 TURBO STD PORT ARRAYS ("W")
-
-
- This Stub consists of a byte whose value is 0 for byte arrays, and 1
- for word arrays.
-
-
-
- 4.4.3.10 TURBO STD EXTERNAL VARIABLES ("X")
-
-
- This Stub consists of an LG (4-bytes) that points to the Type
- Descriptor for this symbol.
-
-
-
- 4.4.3.11 UNITS ("Y")
-
-
- Unit Stubs have the following content:
-
- +00: A Word that is apparently reserved for use by the Compiler
- or Linker.
-
- +02: A Word that seems to contain some kind of "signature" used
- to detect inconsistent Unit Versions. This author
- suspects that this consists of some kind of sum-check or
- hash total but has not yet identified the algorithm which
- computes the value stored in this word.
-
- +04: A Word that contains an LL which locates the Successor
- Unit in the "Uses" list. In fact, the "Uses" lists of
- both the INTERFACE and IMPLEMENTATION sections of the Unit
- are merged by this Word into a single list. A value of
- zero is used to indicate no successor.
-
- +06: A Word that contains an LL which locates the Predecessor
- Unit in the "Uses" list. For the SYSTEM unit entry, this
- value is always zero to indicate no predecessor. For the
- Unit being compiled, this LL locates the final Unit in the
- combined "Uses" list.
-
- In effect, the two LL's at offsets +04 and +06 organize the units
- into both forward and backward linked chains. The entry for the unit
- being compiled is effectively the head of both the forward and the
- backward chains. The final unit in the merged "Uses" list is the tail
- of the forward chain, and the SYSTEM unit is the tail of the backward
- chain.
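-
- The two chains can be followed with a few lines of Python. The
- sketch below is hypothetical: it assumes little-endian words, and it
- stands in for the real dictionary with a small table that maps the
- LL of each unit entry directly to its stub bytes. The LL values and
- signatures are invented for the example.
-
```python
import struct

def parse_unit_stub(buf, pos=0):
    """Decode the four Words of a "Y" stub."""
    reserved, signature, next_ll, prev_ll = struct.unpack_from("<4H", buf, pos)
    return {"reserved": reserved, "signature": signature,
            "next_ll": next_ll, "prev_ll": prev_ll}

def walk(stubs, start_ll, link):
    """Follow one chain ("next_ll" or "prev_ll") until a zero link."""
    order, ll = [], start_ll
    while ll != 0 and ll not in order:   # stop on zero or on a loop
        order.append(ll)
        ll = stubs[ll][link]
    return order

# Invented dictionary: LL of each unit entry -> raw stub bytes.
raw = {
    0x50: struct.pack("<4H", 0, 0x1111, 0x60, 0x80),  # unit being compiled
    0x60: struct.pack("<4H", 0, 0x2222, 0x70, 0x00),  # SYSTEM
    0x70: struct.pack("<4H", 0, 0x3333, 0x80, 0x60),  # first unit in USES
    0x80: struct.pack("<4H", 0, 0x4444, 0x00, 0x70),  # last unit in USES
}
stubs = {ll: parse_unit_stub(data) for ll, data in raw.items()}
forward = walk(stubs, 0x50, "next_ll")
backward = walk(stubs, 0x50, "prev_ll")
```
-
- In this example the forward chain ends at the last unit of the merged
- "Uses" list and the backward chain ends at SYSTEM, as described above.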
-
-
-
- 4.4.4 TYPE DESCRIPTORS
-
-
- Type Descriptors store much of the semantic information that applies
- to the symbols declared in the unit. Implementation details can be
- managed using high-level abstractions and these abstractions can be
- shared.
-
-
-
- 4.4.4.1 SCOPE
-
-
- Type Descriptor sharing can occur across the boundaries which are
- implicit in unit modules. Thus, a type defined in one unit may be
- "imported" by some other module. Also, the pre-defined Pascal Types
- (plus the Turbo Pascal extensions) are defined in the SYSTEM.TPU unit
- and there needs to be a means of "importing" such Type Descriptors
- during compilation. This is precisely the objective of the LG locator
- which was described in section 2.2 (above). Type Descriptors are
- NEVER copied between units. The binding always occurs by reference at
- compile time and this helps support the technique of modifying a unit
- and compiling it to a .TPU file, then re-compiling all units/programs
- that "USE" it.
-
- Type Descriptors have many roles so their format varies. We have
- divided these structures into two parts: the PREFIX Part, which is
- always present and whose format is fairly constant, and the SUFFIX
- Part, whose content and format depend on the attributes that are part
- of the type definition.
-
-
-
- 4.4.4.2 PREFIX PART
-
-
- The Prefix Part of every Type Descriptor consists of four (4) bytes.
- The usage is consistent for all types observed by this author and the
- format is as follows:
-
- +00: A Byte that identifies the format of the Suffix part.
- This is essentially based on several high-level categories
- which the Suffix Parts support directly. The observed set
- of values is as follows:
-
- 00h -> an un-typed entity;
- 01h -> an ARRAY type;
- 02h -> a RECORD type;
- 03h -> an OBJECT type;
- 04h -> a FILE type (other than TEXT);
- 05h -> a TEXT File type;
- 06h -> a SUBPROGRAM type;
- 07h -> a SET type;
- 08h -> a POINTER type;
- 09h -> a STRING type;
- 0Ah -> an 8087 Floating-Point type;
- 0Bh -> a REAL type;
- 0Ch -> a Fixed-Point ordinal type;
- 0Dh -> a BOOLEAN type;
- 0Eh -> a CHAR type;
- 0Fh -> an Enumerated ordinal type.
-
- +01: A Byte used as a modifier. Since the above scheme is too
- general for machine-dependent details such as storage
- width and sign control, this modifier byte supplies
- additional data as required. The author has identified
- several cases in which this information is vital but has
- not spent very much time on the subject. The chief areas
- of importance seem to be in the 8087 Floating-Point types,
- and the Fixed-Point ordinal types. The semantics seem to
- be as follows:
-
- 0A 00 -> The type "SINGLE"
- 0A 02 -> The type "EXTENDED"
- 0A 04 -> The type "DOUBLE"
- 0A 06 -> The type "COMP"
-
- 0C 00 -> an un-named BYTE integer
- 0C 01 -> The type "SHORTINT"
- 0C 02 -> The type "BYTE"
- 0C 04 -> an un-named WORD integer
- 0C 05 -> The type "INTEGER"
- 0C 06 -> The type "WORD"
- 0C 0C -> an un-named double-word integer
- 0C 0D -> The type "LONGINT"
-
- One important feature of the above semantics is the fact
- that an un-typed CONST declaration refers to the above two
- bytes to determine the storage space needed in the
- dictionary for the data value of the constant. This can
- be a little involved however as the constant may contain
- its own length descriptor (as in the case of a character
- string) in which case it may be sufficient to identify
- the high-level type category without any modifier byte.
-
- +02: A Word that contains the number of bytes of storage that
- are required to contain an object/entity of this type.
- For types that represent variable-length objects/entities
- such as strings, this word may define the value returned
- by the SIZEOF function as applied to the type.
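-
- A minimal Python sketch of a Prefix Part decoder is given here. The
- tables simply restate the values observed above; the function and
- key names are hypothetical and little-endian byte order is assumed.
-
```python
import struct

CATEGORY = {
    0x00: "untyped", 0x01: "ARRAY", 0x02: "RECORD", 0x03: "OBJECT",
    0x04: "FILE", 0x05: "TEXT", 0x06: "SUBPROGRAM", 0x07: "SET",
    0x08: "POINTER", 0x09: "STRING", 0x0A: "8087 float", 0x0B: "REAL",
    0x0C: "integer", 0x0D: "BOOLEAN", 0x0E: "CHAR", 0x0F: "enumeration",
}

# (category, modifier) pairs for the named types listed above.
NAMED = {
    (0x0A, 0x00): "SINGLE", (0x0A, 0x02): "EXTENDED",
    (0x0A, 0x04): "DOUBLE", (0x0A, 0x06): "COMP",
    (0x0C, 0x01): "SHORTINT", (0x0C, 0x02): "BYTE",
    (0x0C, 0x05): "INTEGER", (0x0C, 0x06): "WORD",
    (0x0C, 0x0D): "LONGINT",
}

def parse_prefix(buf, pos=0):
    """Decode the 4-byte Prefix Part of a Type Descriptor."""
    category, modifier, size = struct.unpack_from("<BBH", buf, pos)
    return {
        "category": CATEGORY.get(category, "unknown"),
        "name": NAMED.get((category, modifier)),   # None if un-named
        "size": size,
        "suffix_pos": pos + 4,
    }

# The bytes 0C 05 02 00 describe the 2-byte type INTEGER.
prefix = parse_prefix(bytes([0x0C, 0x05, 0x02, 0x00]))
```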
-
-
-
- 4.4.4.3 SUFFIX PARTS
-
-
- Suffix Parts further refine the implementation details of the type and
- also provide subrange constraints where appropriate. In some cases
- the Suffix part is empty since all semantic data for the type is
- contained in the Prefix part.
-
-
-
- 4.4.4.3.1 UN-TYPED
-
-
- This Suffix Part is empty. Nothing is known about an un-typed entity.
-
-
-
- 4.4.4.3.2 STRUCTURED TYPES
-
-
- The structured types represent aggregates of lower-level types. We
- include ARRAY, RECORD, OBJECT, FILE, TEXT, SET, POINTER and STRING
- types in this category.
-
-
-
- 4.4.4.3.2.1 ARRAY TYPES
-
-
- The Suffix Part of the ARRAY type is so constructed as to be able to
- support recursive or nested definition of arrays. The suffix format
- is as follows:
-
- +00: An LG that locates the Type Descriptor for the "base-type"
- of the array. This is the type of the entity being
- arrayed and may itself be an array.
-
- +04: An LG that locates the Type Descriptor for the array
- bounds which is a constrained ordinal type or subrange.
-
-
-
- 4.4.4.3.2.2 RECORD TYPES
-
-
- RECORD types have nested scopes. The Suffix part provides a base
- structure by which to locate the fields local to the scope of the
- Record type itself. The format is as follows:
-
- +00: A Word containing an LL which locates the local Hash Table
- that provides access to the fields in the nested scope.
-
- +02: A Word containing an LL which locates the Dictionary
- Header of the initial field in the nested scope. This
- supports a "left-to-right" traversal of the fields in a
- record.
-
-
-
- 4.4.4.3.2.3 OBJECT TYPES
-
-
- OBJECT types also have nested scopes. The Suffix part provides a base
- structure by which to locate the fields and METHODS local to the scope
- of the OBJECT type itself. In addition, inheritance and VMT
- particulars are stored. The format is as follows:
-
- +00: A Word containing an LL which locates the local Hash Table
- that provides access to the fields and METHODS local to
- the nested scope.
-
- +02: A Word containing an LL which locates the Dictionary
- Header of the initial field or METHOD in the nested scope.
- This supports a "left-to-right" traversal of the fields
- and METHODS in an OBJECT.
-
- +04: An LG which locates the Type Descriptor of the Parent
- Object. This field is zero if there is no such Parent.
-
- +08: A Word which contains the size in bytes of the VMT for
- this Object. This field is zero if the object employs no
- Virtual Methods.
-
- +0A: A Word which contains the offset within the CONST DSeg Map
- that locates the VMT skeleton or template segment. This
- field equals FFFFh if the object employs no Virtual
- Methods.
-
- +0C: A Word which contains the offset within an Object instance
- where the NEAR POINTER to the VMT for the object is stored
- (within the DATA SEGMENT). This field equals FFFFh if the
- object employs no Virtual Methods.
-
- +0E: A Word which contains an LL which locates the Dictionary
- Header for the name of the OBJECT itself.
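-
- The sixteen bytes of this Suffix could be unpacked as in the Python
- sketch below. The names are hypothetical, little-endian byte order is
- assumed, and the Parent LG is kept as raw bytes. The FFFFh markers
- are mapped to None so that "no Virtual Methods" is easy to test.
-
```python
import struct

ABSENT = 0xFFFF   # marker used at +0A and +0C when there is no VMT

def parse_object_suffix(buf, pos=0):
    """Decode the 16-byte Suffix Part of an OBJECT Type Descriptor."""
    (hash_ll, first_ll, parent_lg, vmt_size,
     vmt_template, vmt_ptr_offset, name_ll) = struct.unpack_from(
        "<HH4sHHHH", buf, pos)
    return {
        "hash_ll": hash_ll,
        "first_member_ll": first_ll,
        "parent_lg": None if parent_lg == b"\0\0\0\0" else parent_lg,
        "vmt_size": vmt_size,
        "vmt_template": None if vmt_template == ABSENT else vmt_template,
        "vmt_ptr_offset": None if vmt_ptr_offset == ABSENT else vmt_ptr_offset,
        "name_ll": name_ll,
    }

# Hypothetical sample: a root object that has no Virtual Methods.
sample = struct.pack("<HH4sHHHH", 0x0200, 0x0220, b"\0\0\0\0",
                     0, 0xFFFF, 0xFFFF, 0x01F0)
info = parse_object_suffix(sample)
```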
-
-
-
- 4.4.4.3.2.4 FILE (NON-TEXT) TYPES
-
-
- This Suffix consists of an LG that locates the Type Descriptor of the
- base type of the file. Note that the Type Descriptor may be that of
- an un-typed entity (for un-typed files).
-
-
-
- 4.4.4.3.2.5 TEXT FILE TYPES
-
-
- This Suffix consists of an LG that locates the Type Descriptor of the
- base type of the file -- in this case SYSTEM.CHAR.
-
-
-
- 4.4.4.3.2.6 SET TYPES
-
-
- This Suffix consists of an LG that locates the base-type of the set
- itself. Turbo Pascal limits set base types to ordinal types with at
- most 256 distinct values.
-
-
-
- 4.4.4.3.2.7 POINTER TYPES
-
-
- This Suffix consists of an LG that locates the base-type of the entity
- pointed at.
-
-
-
- 4.4.4.3.2.8 STRING TYPES
-
-
- This is a special case of an ARRAY type. The format is as follows:
-
- +00: An LG to the Type Descriptor SYSTEM.CHAR which is the base
- type of all Turbo Pascal Strings.
-
- +04: An LG to the Type Descriptor for the array bounds
- constraints for the string.
-
-
-
- 4.4.4.3.3 FLOATING-POINT TYPES
-
-
- The Suffix part for all Floating-Point types is EMPTY. All data
- needed to specify these approximate number types is contained in the
- Prefix part. The Types included in this class are SINGLE, DOUBLE,
- EXTENDED, COMP and REAL.
-
-
-
- 4.4.4.3.4 ORDINAL TYPES
-
-
- The Ordinal Types consist of the various "integer" types plus the
- BOOLEAN, CHAR and Enumerated types.
-
-
-
- 4.4.4.3.4.1 "INTEGERS"
-
-
- These types include BYTE, SHORTINT, WORD, INTEGER and LONGINT. Their
- Suffix parts are identical in format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor of the largest
- upward compatible type. This is the Type Descriptor that
- is used to control the width of an un-typed constant in
- the dictionary stub. For the "integer" types, this is an
- LG to SYSTEM.LONGINT.
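-
- By way of illustration, this twelve-byte Suffix might be read in
- Python as follows. The sketch assumes that the bounds are signed
- 32-bit little-endian values (which the negative lower bound of
- SHORTINT seems to require); the names and the sample LG bytes are
- hypothetical.
-
```python
import struct

def parse_ordinal_suffix(buf, pos=0):
    """Decode lower bound, upper bound and the trailing LG."""
    low, high, base_lg = struct.unpack_from("<ii4s", buf, pos)
    return {"low": low, "high": high, "base_lg": base_lg}

def in_range(suffix, value):
    """True if value satisfies the subrange constraint."""
    return suffix["low"] <= value <= suffix["high"]

# Hypothetical sample: the bounds of SHORTINT with an invented LG.
sample = struct.pack("<ii4s", -128, 127, b"\x10\x02\x30\x00")
shortint = parse_ordinal_suffix(sample)
```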
-
-
-
- 4.4.4.3.4.2 BOOLEANS
-
-
- This type Suffix has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor SYSTEM.BOOLEAN.
- There is no "upward compatible" type.
-
-
-
- 4.4.4.3.4.3 CHARS
-
-
- This type Suffix has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor SYSTEM.CHAR. There
- is no "upward compatible" type.
-
-
-
- 4.4.4.3.4.4 ENUMERATIONS
-
-
- This type Suffix is unusual and has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Prefix of the current Type
- Descriptor. There is no upward compatible type.
-
- What follows is a full-fledged SET Type Descriptor whose base type is
- the Type Descriptor of the Enumerated Type itself. The author has not
- yet discovered the reason for this.
-
-
-
- 4.4.4.3.5 SUBPROGRAM TYPES
-
-
- The length of this Suffix is variable. The format is as follows:
-
- +00: An LG that locates the Type Descriptor of the FUNCTION
- result returned by the Subprogram. This field is zero if
- the Subprogram is a PROCEDURE.
-
- +04: A Word that contains the number of Formal Parameters in
- the Function/Procedure header. If non-zero, then this
- word is followed by the parameter list itself as a simple
- array of parameter descriptors.
-
- The format of a parameter descriptor is as follows:
-
- +00: An LG that locates the Type Descriptor of the
- corresponding parameter;
-
- +04: A Byte that identifies the parameter passing
- mechanism used for this entry as follows:
-
- 02h -> VALUE of parameter is passed on STACK,
- 06h -> ADDRESS of parameter is passed on STACK.
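-
- Since the Suffix is variable in length, a reader has to walk the
- parameter array to find where it ends. A hedged Python sketch is
- given below; it assumes little-endian byte order and five-byte
- parameter descriptors packed back to back, and all names and sample
- LG values are hypothetical.
-
```python
import struct

PASSING = {0x02: "value", 0x06: "address"}

def parse_subprogram_suffix(buf, pos=0):
    """Decode the result LG, parameter count and parameter list."""
    result_lg, count = struct.unpack_from("<4sH", buf, pos)
    pos += 6
    params = []
    for _ in range(count):
        type_lg, mode = struct.unpack_from("<4sB", buf, pos)
        params.append((type_lg, PASSING.get(mode, "unknown")))
        pos += 5
    return {
        "is_function": result_lg != b"\0\0\0\0",   # zero LG = PROCEDURE
        "result_lg": result_lg,
        "params": params,
        "end": pos,            # first byte after the Suffix
    }

# Hypothetical sample: a function with one value and one VAR parameter.
lg_a, lg_b, lg_r = b"\x10\x00\x20\x00", b"\x30\x00\x20\x00", b"\x50\x00\x20\x00"
sample = (struct.pack("<4sH", lg_r, 2)
          + struct.pack("<4sB", lg_a, 0x02)
          + struct.pack("<4sB", lg_b, 0x06))
sub = parse_subprogram_suffix(sample)
```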
-
-
-
- 5. MAPS AND LISTS
-
-
- The "MAPS and LISTS" are not part of the symbol dictionary. Rather,
- these structures provide access to the Code and Data Segments produced
- by the compiler or included via the {$L name.OBJ} directive. The
- format and purpose (as understood by this author) of each of these
- tables is explained in the following sections.
-
-
-
- 5.1 PROC MAP
-
-
- The PROC Map provides a means of associating the various Function and
- Procedure declarations with the Code Segments. There is some evidence
- that the Compiler produces CODE (and DATA) Segments for EACH of the
- Subprograms defined in the Unit as well as for the un-named Unit
- Initialization code block. There is also evidence that EXTERNAL PROCs |
- must be assembled separately in order to exploit fully the Turbo
- "Smart Linker" since Turbo Pascal places some significant restrictions
- on EXTERNAL routines in the area of Segment Names and Types.
- Specifically, only code segments named "CODE" and data segments named
- "DATA" will be used by the "Smart Linker" as sources of code and data
- for inclusion in a Turbo Pascal .EXE file.
-
- The first entry in the PROC Map is reserved for the Unit
- Initialization block. If there is no Unit Initialization block, this |
- entry will be filled with $FF. In addition, each and every PROC in |
- the Unit has an entry in this table.
-
- If an EXTERNAL routine is included, then ALL PUBLIC PROC definitions
- in that routine must be declared in the Unit Source Code with the
- EXTERNAL attribute.
-
- The size of the PROC Map Table (in Bytes) is implied in the Unit
- Header by the LL's that occur at offsets +0C and +0E.
-
- The Format of a single PROC Map Entry is as follows:
-
- +00: A Word that contains an offset within the CSeg Map. This
- is used to locate the code segment containing the PROC.
-
- +02: A Word that contains an offset within the CODE Segment
- that defines the PROC entry point relative to the load
- point of the referenced CODE Segment.
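-
- The PROC Map can be read as a plain array of four-byte entries. The
- Python sketch below is illustrative only: the start and size of the
- table are taken as given (they come from the Unit Header), the byte
- order is assumed to be little-endian, and an entry "filled with $FF"
- is taken to mean that both Words read FFFFh.
-
```python
import struct

def parse_proc_map(buf, start, size):
    """Return a list of (cseg_map_offset, entry_point) pairs."""
    entries = []
    for pos in range(start, start + size, 4):
        entries.append(struct.unpack_from("<HH", buf, pos))
    return entries

def has_init_block(entries):
    """The first entry is all $FF when there is no initialization code."""
    return entries[0] != (0xFFFF, 0xFFFF)

# Hypothetical sample: no initialization block, then two PROCs.
sample = (b"\xFF\xFF\xFF\xFF"
          + struct.pack("<HH", 0x0000, 0x0000)
          + struct.pack("<HH", 0x0008, 0x0010))
procs = parse_proc_map(sample, 0, len(sample))
```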
-
-
-
- 5.2 CSEG MAP
-
-
- The CSeg Map provides a convenient descriptor table for each CODE
- Segment present in the Unit and serves to relate these segments with
- the Segment Relocation Data and the Segment Trace Table. It seems
- reasonable to infer that the "Smart Linker" is able to include/exclude
- code/data at the SEGMENT level only.
-
- The CSeg Map is an array of fixed-length records whose format is as
- follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes).
-
- +04: A Word that contains the Length of the Relocation Data
- Table for this Code Segment (in bytes).
-
- +06: A Word that contains the offset of the Trace Table Entry
- for this Segment (if it was compiled with DEBUG Support).
- If there is no Trace Table for this segment, then this
- Word contains FFFFh.
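-
- A short Python sketch of a CSeg Map reader follows. It assumes
- little-endian Words and takes the start and size of the table as
- given; the names are hypothetical. The FFFFh marker at +06 is mapped
- to None.
-
```python
import struct

def parse_cseg_map(buf, start, size):
    """Decode the CSeg Map as a list of 8-byte records."""
    segments = []
    for pos in range(start, start + size, 8):
        reserved, length, reloc_len, trace = struct.unpack_from("<4H", buf, pos)
        segments.append({
            "map_offset": pos - start,   # value used by PROC Map entries
            "length": length,
            "reloc_bytes": reloc_len,
            "trace_offset": None if trace == 0xFFFF else trace,
        })
    return segments

# Hypothetical sample: one traced segment and one without DEBUG data.
sample = (struct.pack("<4H", 0, 0x0040, 0x0010, 0x0000)
          + struct.pack("<4H", 0, 0x0123, 0x0000, 0xFFFF))
csegs = parse_cseg_map(sample, 0, len(sample))
total_code = sum(seg["length"] for seg in csegs)
```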
-
-
-
- 5.3 TYPED CONST DSEG MAP
-
-
- The CONST DSeg Map provides a convenient descriptor table for each
- DATA Segment present in the Unit which was spawned by the presence of
- Typed Constants or VMT's in the Pascal Code. It serves to relate
- these segments with the Segment Relocation Data and with the Code
- Segments that refer to these DATA elements.
-
- The CONST DSeg Map is an array of fixed-length records whose format is
- as follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes).
-
- +04: A Word that contains the Length of the Relocation Data
- Table for this DATA Segment (in bytes).
-
- +06: A Word that contains an LL which locates the OBJECT that
- owns this VMT skeleton or zero if the segment is not a VMT
- skeleton.
-
- It is possible to determine the containing scope for a Typed Constant
- declaration but -- unless it is for a VMT -- the job is a bit tedious.
- Essentially, one has to search the Symbol Dictionary for a declaration
- whose offset points to a given entry and the complete path to that
- symbol must be recorded. Our program doesn't do this but it can be
- done if the required dictionary entries are present.
-
-
-
- 5.4 GLOBAL VAR DSEG MAP
-
-
- The VAR DSeg Map provides a convenient descriptor table for each DATA
- Segment present in the Unit.
-
- One entry exists for each CODE segment which refers to GLOBAL VAR's
- allocated in the DATA Segment. These references may be seen in the
- Relocation Data Table. Each EXTERNAL CSeg having a segment named DATA
- also spawns an entry in this table. Only the Code Segments that meet
- these criteria cause entries to be generated in the VAR Dseg Map.
-
- The VAR DSeg Map is an array of fixed-length records whose format is
- as follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes). This
- may be zero, especially if the EXTERNAL routine contains a
- DATA segment whose sole purpose is to declare one or more
- EXTRN symbols that are defined in some DATA segment
- external to the Assembly.
-
- +04: A Word apparently reserved for use by TURBO.
-
- +06: A Word apparently reserved for use by TURBO.
-
- To determine the identity of the CSeg that owns some particular entry
- in this table, examine the Relocation Data for ALL CSegs. Each CSeg
- which makes reference to a DATA segment has an entry in this table.
-
-
-
- 5.5 DONOR UNIT LIST
-
-
- This list contains an entry for each Unit (taken from the "USES" list)
- which MAY contribute either CODE or DATA to the executable file. Not
- all units do make such a contribution as some exist merely to define a
- collection of Types, etc. A Unit gets into this list if there exists
- a single Relocation Data Entry that references CODE or DATA in that
- Unit.
-
- The list is comprised of elements whose SIZE is variable and whose
- format is as follows:
-
- +00: A WORD apparently reserved for use by TURBO.
-
- +02: A variable-length String containing the unit name.
-
-
-
- 5.6 SOURCE FILE LIST
-
-
- This list contains an entry for each "source" file used to compile the
- Unit. This includes the Primary Pascal file, files containing Pascal
- code included by means of the {$I filename.xxx} compiler directive,
- and .OBJ files included by the {$L filename.OBJ} compiler directive.
-
- The order of entries in this list is critical since it maps the CODE
- segments stored in the unit. The order of the entries is as follows:
-
- 1) The Primary Pascal file;
-
- 2) All Included Pascal files;
-
- 3) All Included .OBJ files.
-
- Mapping of CSegs to files is done as follows:
-
- a) Each .OBJ file contributes a SINGLE Code Segment (if any).
- Note that this author has not observed an .OBJ module that
- contains only a DATA Segment (but that seems a distinct
- possibility).
-
- b) The Primary Pascal file (augmented by all included Pascal
- Files) contributes zero or more CODE Segments.
-
- Therefore, there are at least as many CSeg entries as .OBJ files. If
- more, then the excess entries (those at the front of the list) belong
- to the Pascal files that make up the Pascal source for the unit.
-
- The format of an entry in this list is as follows:
-
- +00: A flag byte that indicates the type of file represented;
-
- 04h -> the Primary Pascal Source File,
- 03h -> an Included Pascal Source File,
- 05h -> an .OBJ file that contains a CODE segment.
-
- +01: A Word apparently reserved for use by the Compiler/Linker.
-
- +03: A Word that is zero for .OBJ files and which contains the
- file directory time-stamp for Pascal Files.
-
- +05: A Word that is zero for .OBJ files and which contains the
- file directory date-stamp for Pascal Files.
-
- +07: A variable-sized string containing the filename and
- extension of the file used during compilation.
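-
- The following Python sketch shows how one entry might be decoded. It
- rests on several assumptions that this report does not confirm: the
- string is taken to be a Pascal string (a length byte followed by the
- characters), the stamps are taken to be in the packed format used by
- DOS directory entries, entries are taken to be packed back to back,
- and the byte order is little-endian. All names are hypothetical.
-
```python
import struct

FILE_KIND = {0x04: "primary", 0x03: "include", 0x05: "obj"}

def dos_time(word):
    """Unpack a DOS time stamp into (hour, minute, second)."""
    return (word >> 11, (word >> 5) & 0x3F, (word & 0x1F) * 2)

def dos_date(word):
    """Unpack a DOS date stamp into (year, month, day)."""
    return (1980 + (word >> 9), (word >> 5) & 0x0F, word & 0x1F)

def parse_source_entry(buf, pos=0):
    """Decode one Source File List entry starting at buf[pos]."""
    flag, reserved, time_word, date_word = struct.unpack_from("<BHHH", buf, pos)
    name_len = buf[pos + 7]                      # assumed length byte
    name = buf[pos + 8:pos + 8 + name_len].decode("ascii")
    return {
        "kind": FILE_KIND.get(flag, "unknown"),
        "time": dos_time(time_word),
        "date": dos_date(date_word),
        "name": name,
        "next": pos + 8 + name_len,              # assumed next entry
    }

# Hypothetical sample: DEMO.PAS stamped 1990-08-11 12:34:56.
sample = struct.pack("<BHHH", 0x04, 0, 25692, 5387) + bytes([8]) + b"DEMO.PAS"
entry = parse_source_entry(sample)
```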
-
-
-
- 5.7 DEBUG TRACE TABLE
-
-
- If Debug support was selected at compile time, then all Pascal code
- which supports Debugging produces an entry in this table. The table
- entries themselves are variable in size and have the following format:
-
- +00: A Word which contains an LL that locates the Dictionary
- Header of the Symbol (a PROC name) this entry represents.
-
- +02: A Word which contains the offset (within the Source File
- List) of the entry that names the file that generated the
- CSeg being traced. This allows the file included by means
- of the {$I filename} directive to be identified for DEBUG
- purposes, as well as code produced from the Primary File.
-
- +04: A Word containing the number of bytes of data that precede
- the BEGIN statement code in the segment. For Pascal PROCS
- these bytes consist of literal constants, un-typed |
- constants, and other data such as range-checking limits, |
- etc.
-
- +06: A Word containing the Line Number of the BEGIN statement
- for the PROC.
-
- +08: A Word containing the number of lines of Source Code to
- Trace in this Segment.
-
- +0A: An array of bytes whose size is at least the number of
- source code lines in the PROC. Each byte contains the
- number of bytes of object code in the corresponding source
- line. The entries appear to be SHORTINTs: if a "line"
- contains more than 127 bytes of code, a single byte of $80
- precedes the actual byte count as a sort of "escape", and |
- the next byte holds the count (up to 255) for the line. |
- This situation has not yet been fully explored. We do not |
- yet know what happens in the event a line is credited with |
- spawning more than 255 bytes of code. |
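-
- The escape convention described at +0A can be sketched in Python as
- follows. This reflects only the behaviour observed so far: a byte of
- $80 is treated as a marker and the following byte as the real count.
- The function name is hypothetical, and lines that produce more than
- 255 bytes of code are not handled because their encoding is unknown.
-
```python
def decode_line_sizes(buf, pos, line_count):
    """Return (code bytes per source line, position after the array)."""
    sizes = []
    for _ in range(line_count):
        count = buf[pos]
        pos += 1
        if count == 0x80:        # escape: real count is in the next byte
            count = buf[pos]
            pos += 1
        sizes.append(count)
    return sizes, pos

# Hypothetical sample: four lines, the second producing 200 bytes.
sample = bytes([5, 0x80, 200, 0, 12])
sizes, end = decode_line_sizes(sample, 0, 4)
```
-
- Note that the array occupies five bytes here although it describes
- only four lines, which is why its size is "at least" the line count.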
-
-
-
- 6. CODE, DATA, RELOCATION INFO
-
-
- This area begins at the start of the next free PARAGRAPH (16-byte
- boundary). This means that its offset from the beginning of the Unit,
- written in hexadecimal, ALWAYS ends in the digit zero.
-
- This area contains the CODE segments, CONST DATA segments, and the
- Relocation Data required for linking.
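-
- Rounding an offset up to the next paragraph is a one-line
- computation. The Python sketch below uses a hypothetical function
- name and treats an offset that already lies on a boundary as free.
-
```python
PARAGRAPH = 16   # bytes per 8086 paragraph

def next_paragraph(offset):
    """Round offset up to the nearest multiple of 16."""
    return (offset + PARAGRAPH - 1) & ~(PARAGRAPH - 1)

# A table ending at 1234h is followed by an area starting at 1240h.
start = next_paragraph(0x1234)
```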
-
-
-
- 6.1 OBJECT CSEGS
-
-
- Each CODE segment included in the unit appears here as specified by
- the CSeg Map Table. Depending on usage, these segments may appear in
- the executable file. There are no filler bytes between segments.
-
-
-
- 6.2 CONST DSEGS
-
-
- This section begins at the start of the first free PARAGRAPH following
- the end of the Object CSegs. This means that its offset from the
- beginning of the Unit, written in hexadecimal, ALWAYS ends in the
- digit zero.
-
- A DATA segment fragment appears here for each CSeg that declares a
- typed constant, and for each OBJECT which employs Virtual Methods.
- There are no filler bytes between segments.
-
- If local symbols were generated, there is always enough information to
- allow documenting the scope of the declaration as well as interpreting
- the data in the display since the needed type declarations would also
- be available. Our program doesn't go to this extreme however.
-
-
-
- 6.3 RELOCATION DATA TABLE
-
-
- This table begins at the start of the first free PARAGRAPH following
- the end of the CONST DSegs. This means that its offset from the
- beginning of the Unit, written in hexadecimal, ALWAYS ends in the
- digit zero. There are two sections in this table: one for code, and |
- one for data. Both sections are aligned on paragraph boundaries. |
- This may result in a "slack" entry between the code and data |
- sub-sections, but this entry is included in the byte tally for the |
- section stored in the Unit Header Table at ULPtch (offset +20). |
-
- The table begins with entries for the CSeg Map and ends with entries
- for the CONST DSeg Map. The appropriate Map entry specifies the
- number of bytes of Relocation Data for the corresponding segment.
- This number may be zero in which case there is no Relocation Data for
- the given segment. |
-
- The Table consists of an array of eight (8) byte entries whose format
- is as follows:
-
- +00: A Byte containing the offset within the Donor Unit List of
- the Unit name that this entry refers to. This can be the
- compiled Unit or some previously compiled external unit.
-
- +01: A Byte that defines the type of reference being made and
- implies the size of the pointer needed (WORD or DWORD).
- The known and/or observed values are as follows:
-
- 00h -> a WORD refers to a PROC Map.
- 10h -> a WORD refers to a PROC Map.
- 20h -> a WORD refers to a PROC Map.
- 30h -> a DWORD pointer refers to a PROC Map.
- 50h -> a WORD refers to a CSeg Map.
- 60h -> a WORD refers to an unknown Map.
- 70h -> a DWORD pointer refers to a CSeg Map.
- 90h -> a WORD refers to a VAR DSeg Map.
- A0h -> a WORD refers to a DSeg Map for SEG address. |
- D0h -> a WORD refers to a CONST DSeg Map.
-
- +02: A Word containing the offset within the Map table
- referenced according to the above code scheme.
-
- +04: A Word containing an offset within the target segment
- which will be added to the effective address. For
- example, a reference to the VAR DSeg Map will require a
- final offset to locate the item (variable) within the DATA
- SEGMENT being referenced here. This may also be needed
- for references to LITERAL DATA embedded in a CODE SEGMENT.
-
- +06: A Word containing the offset within the CODE or DATA
- segment owning this entry that contains the area to be |
- patched with the value of the final effective address. |
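-
- The entry layout above can be sketched in Python as follows. This is
- not the supplied program's code: the field names are invented for the
- illustration, the sample bytes are made up, and Intel (little-endian)
- byte order is assumed for the three Words.
-
```python
import struct
from collections import namedtuple

# Hypothetical field names for offsets +00, +01, +02, +04 and +06.
RelocEntry = namedtuple(
    "RelocEntry", "unit_ofs flags map_ofs adjust patch_ofs")
ENTRY = struct.Struct("<BBHHH")    # byte, byte, then three words


def read_entries(table):
    """Split a relocation table (bytes) into its 8-byte entries."""
    last = len(table) - ENTRY.size
    return [RelocEntry(*ENTRY.unpack_from(table, pos))
            for pos in range(0, last + 1, ENTRY.size)]


sample = bytes([0x00, 0x90, 0x08, 0x00, 0x04, 0x00, 0x2A, 0x01])
print(read_entries(sample))
```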
-
- For some truly wild guessing about the flag byte above, the following |
- pattern seems to be emerging. Look at bits 7-4 of this byte. It |
- appears that the type of Map reference may be coded into bits 7-6 and |
- that the size or type of reference may be coded into bits 5-4. Note |
- that bits 7-6 are "00" for PROC Map items, "01" for CSeg Map items, |
- "10" for Global DSeg Map items, and "11" for Const DSeg Map items. |
- Note also that all FAR (DWORD) pointer references show bits 5-4 as |
- "11", that a SEGMENT Register value appears as "10", and that WORD |
- values otherwise appear as "01" or "00". Further, no type 00h item |
- has been seen which has a non-zero effective address adjustment. |
- This all seems to suggest the following code structure: |
-
- 7654 3210 (bits 3-0 don't seem to be used) |
-
- 00-- ---- Locate item via a PROC Map, |
- 01-- ---- Locate item via a CSeg Map, |
- 10-- ---- Locate item via a Global DSeg Map, |
- 11-- ---- Locate item via a Const DSeg Map, |
- --00 ---- WORD offset has NO effective address adjustment, |
- --01 ---- WORD offset HAS an effective address adjustment, |
- --10 ---- WORD is content of a SEGMENT Register such as DS |
- or CS. |
- --11 ---- DWORD (FAR) pointer is supplied with possible |
- effective address adjustment. |
-
- The evidence in support of this conjecture is both slim and vast. It |
- all depends on how much data one looks at. I have looked at a lot of |
- data from the Borland supplied units and I haven't found anything to |
- refute the above. Accordingly, the supplied program interprets this |
- flag byte according to this scheme. |
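-
- A minimal Python sketch of the conjectured scheme follows. It only
- restates the guess made above (bits 7-6 select the Map, bits 5-4 the
- kind of reference); the function name and the wording of the labels
- are mine, not the supplied program's.
-
```python
MAPS = ("PROC Map", "CSeg Map", "Global DSeg Map", "Const DSeg Map")
KINDS = ("WORD, no adjustment", "WORD with adjustment",
         "WORD segment value", "DWORD (FAR) pointer")


def decode_flag(flag):
    """Split the flag byte per the conjecture: bits 7-6 and bits 5-4."""
    return MAPS[(flag >> 6) & 3], KINDS[(flag >> 4) & 3]


# The observed values from the table, except the unexplained 60h.
for flag in (0x00, 0x10, 0x20, 0x30, 0x50, 0x70, 0x90, 0xA0, 0xD0):
    print("%02Xh" % flag, *decode_flag(flag))
```
-
- Under this reading the unexplained value 60h would decode as a SEGMENT
- value located via a CSeg Map, which is still only a guess.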
-
-
-
- 7. SUPPLIED PROGRAM
-
-
- In order that the above information be made constructively useful, the
- author has designed a program that automates the process of discovery.
- It is not a "handsome" program and it is not a work of art. It does
- give useful results provided your PC has enough available memory.
-
- It should be obvious that the program was not designed "top-down".
- Rather, it just evolved as each new discovery was made. Later on, it
- seemed reasonable to try to document some of the relations between the
- various lists and tables and the program tries to make some of these
- relations clear, albeit with varying degrees of success.
-
-
-
- 7.1 TPUNEW |
-
-
- This is the main program. It will ask for the name of the unit to be
- documented. Reply with the unit name only. The program will append
- the ".TPU" extension and will search for the proper file.
-
- The program will then ask if Dis-Assembly is desired and will require
- a "y" or "n" answer.
-
- The current directory will be searched first, followed by all
- directories in the current PATH. The program will NOT search a ".TPL"
- (Turbo Pascal Library) file.
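-
- The search order just described can be sketched in Python as shown
- below. This is a paraphrase, not the program's own logic: find_unit
- is a made-up name, the directories are passed in explicitly, and the
- file name is forced to upper case as DOS would store it.
-
```python
import os


def find_unit(name, current_dir, path_value):
    """Return the first existing NAME.TPU, or None if there is none."""
    filename = name.upper() + ".TPU"
    dirs = [current_dir]
    dirs += [d for d in path_value.split(os.pathsep) if d]
    for directory in dirs:          # current directory first, then PATH
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_unit("crt", os.getcwd(), os.environ.get("PATH", "")))
```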
-
- If the desired unit is found, the program will write a report to the
- current directory named "unitname.lst" which contains its analysis.
- The format of the report is such that it may be copied to a printer if
- that printer supports TTY control codes with form-feeds. Be judicious
- in doing this however since there can be a lot of information. The
- Turbo SYSTEM.TPU unit file produces almost ninety (90) pages without |
- the disassembly option. When disassembly is requested for the SYSTEM |
- unit, the size of the output file exceeds 700K bytes. |
-
-
-
- 7.2 TPURPT1
-
-
- This is a Unit that contains the text-file output primitives required
- by the main program. It's not very pretty but it does work.
-
-
-
- 7.3 TPUAMS1
-
-
- This Unit contains all Type Definitions, Structures, and "Canned"
- Functions and Procedures required by the main program. All structures
- documented in this report are also documented in TPUAMS1 by means of
- the TYPE mechanism. Some of the structures are difficult if not
- impossible to handle using ISO Pascal but Turbo Pascal provides the
- means for getting the job done.
-
-
-
- 7.4 TPUUNA1
-
-
- This unit is a rudimentary disassembler. The output will not assemble
- and may look strange to a real assembler programmer since this author
- is not so-qualified. However, the basis for support of 80286, 80386
- etc. processors is present as well as coprocessor support. Of perhaps
- the greatest interest is that it does appear to decode the emulated
- coprocessor instructions that are implemented via INT 34h-3Dh.
-
- Be warned however. The output is not guaranteed since this was coded
- by myself and I am perhaps the rankest amateur that ever approached
- this quite awful assembler language. For convenience, the operand
- coding mimics TASM "Ideal" mode.
-
- As is usual with programs of this type, error-recovery is minimal and
- no context checking is performed. If the operation code is found to
- be valid, then a valid instruction is assumed -- even if invalid
- operands are present.
-
- The only positives that apply to this program are that it doesn't slow
- the cpu down (although a lot more output is produced), and it does let
- one "tune" code for compactness by letting one view the results of the
- coding directly. Also, incomplete instructions are handled as data |
- rather than overrunning into the next proc. |
-
-
-
- 7.5 MODIFICATIONS
-
-
- It was intended from the beginning that this program could be
- enhanced to permit external units to be referenced during the
- analysis of any given unit, even if they were library components. The
- author hopes that users so-inclined will find the code pliable enough
- to engineer such enhancements. No small amount of care was expended
- to make pointer references flexible enough so that more than one unit
- could be addressed at one time. However, none of the references to
- external units are resolved by the program as it now stands.
-
- This program was NOT intended as a pilot for some future product. It |
- WAS intended as a rather "ersatz" tool for myself. |
-
-
-
- 7.6 NOTES ON PROGRAM LOGIC |
-
-
- The following sections discuss a few of the methods employed by the
- supplied program.
-
-
-
- 7.6.1 FORMATTING THE DICTIONARY |
-
-
- Printing the unit dictionary area in a way that exposes its underlying |
- semantics is no small task. The unit dictionary area itself is a |
- rather amorphous-looking mass of data composed of hash tables, |
- dictionary headers and stubs, type descriptors, etc. In order to |
- present all this information in a meaningful way, we have to reveal |
- its structure and this cannot be done by means of a sequential |
- "browse" technique. Rather, we have to visit all nodes in the |
- dictionary area so that each may be formatted in a way that exposes |
- its function and meaning. This is made necessary by the fact that |
- items are added to the dictionary as encountered and no convenient |
- ordering of entry types exists. What we have here is the problem of |
- finding a minimal "cover" for the dictionary area that properly |
- exposes the content and structure of the dictionary area. |
-
- To do this, we construct (in the heap) a stack and a queue, both of |
- which are initially empty. The entries we put in the stack identify |
- the class of entry (Hash Table, Dictionary Header, Type Descriptor or |
- In-Line Code group), the location of the structure, and the location |
- of its immediate "owner" or "parent" dictionary entry (which allows |
- some limited information about scope to be printed). |
-
- To the empty stack, we add an entry for the unit name dictionary |
- entry, the INTERFACE hash table, and the DEBUG hash table. All these |
- are located via direct pointers (LL's) in the Unit Header Table. We |
- then pop one entry off the stack and begin our analysis. |
-
- a) If the entry we popped off the stack is not present in the |
- queue, we add it and call a routine that can interpret the entry |
- (aka, "cover") for a Dictionary Header, Hash Table, or Type |
- Descriptor. (This may lead to additional entries being added to |
- the stack such as nested-scope hash tables, Dictionary Headers, |
- Type Descriptors or In-Line Code group entries.) |
-
- b) While the stack is not empty, we pop another entry and repeat |
- step "a" (above) until no more entries are available. |
-
- The result is a queue containing one entry for each structure in the |
- unit dictionary area that is identifiable via traversal. (In |
- practice, the method we use is a non-recursive traversal of an n-way |
- tree. Because the work list is a stack, the visiting order is closer |
- to "depth-first" than to "breadth-first".) Each |
- entry in the queue contains the information described above and the |
- queue itself thus forms a set of descriptors that drive the process of |
- formatting the dictionary area for display. The process may be |
- likened to "painting by the numbers" or to finding a way to lay tile |
- on a flat surface using tiles of four different irregular shapes until |
- the floor is exactly covered. |
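-
- The stack-and-queue procedure can be sketched in Python as follows.
- This is a toy model, not the supplied program: the Entry fields, the
- AREA table and the expand routine are invented for the illustration,
- and an entry is treated as "present in the queue" when its location
- has been seen before.
-
```python
from collections import namedtuple

# kind: "hash", "header", "type" or "inline"; owner: parent's location
Entry = namedtuple("Entry", "kind location owner")


def cover(roots, expand):
    """Visit every structure reachable from roots exactly once."""
    stack = list(roots)      # work still to be done
    queue = []               # structures already interpreted, in order
    seen = set()             # locations present in the queue
    while stack:
        entry = stack.pop()
        if entry.location in seen:
            continue         # reached earlier via another reference
        seen.add(entry.location)
        queue.append(entry)
        stack.extend(expand(entry))   # may push nested structures
    return queue


# Toy dictionary area: location -> list of (kind, location) referred to.
AREA = {0x10: [("hash", 0x40)],
        0x40: [("header", 0x80), ("header", 0x90)],
        0x80: [("type", 0xC0)], 0x90: [("type", 0xC0)], 0xC0: []}


def expand(entry):
    return [Entry(kind, loc, entry.location)
            for kind, loc in AREA[entry.location]]


for e in cover([Entry("header", 0x10, None)], expand):
    print(e.kind, hex(e.location), e.owner)
```
-
- In the toy data two headers refer to the same type descriptor, and the
- owner that gets recorded is simply whichever header is expanded first.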
-
- There is one significant limitation that needs to be pointed out. It |
- is not always possible to determine the "parent" or "owner" of a node |
- with certainty. The following discussion illustrates the problem of |
- finding the "real" parent of a Type Descriptor. |
-
- Almost every "type" in Pascal is actually derived from the basic types |
- that are (in Turbo Pascal) defined in the SYSTEM.TPU unit -- e.g. |
- "INTEGER", "BYTE", etc. In addition, several of the Type Descriptors |
- in the SYSTEM unit are referenced by more than one Dictionary Entry. |
- Thus, we find that a "many-to-one" relationship may exist between |
- Dictionary Entries and Type Descriptors. How does one find out which |
- is the entry that actually gave rise to the Type Descriptor? |
-
- The Dictionary Area of a unit has some special properties, one of |
- which is the fact that the Dictionary Entries for named Types are |
- often located quite near their primary type descriptors. The |
- Dictionary Area seems to be treated as an upward growing heap with the |
- various structures being added by Turbo as needed. This means that |
- the Type "Q" header which gives rise to a type descriptor is quite |
- likely to occur earlier in the Dictionary Area than any other |
- header which refers to the same descriptor. We take advantage of this |
- property to allocate "ownership" but it may not be "fool-proof". Some |
- type descriptors are spawned by other type descriptors, especially for |
- structured types. We don't attempt to allocate "ownership" to these |
- "lower-level" descriptors. |
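-
- The "earliest header wins" rule of thumb can be sketched in Python as
- below. The function name and the sample offsets are hypothetical, and
- the sketch inherits the weakness just mentioned: it records a guess,
- not a proven owner.
-
```python
def assign_owners(references):
    """references: iterable of (header_offset, descriptor_offset).

    Returns {descriptor_offset: header_offset of the presumed owner},
    choosing the header that lies earliest in the dictionary area.
    """
    owners = {}
    for header, descriptor in references:
        if descriptor not in owners or header < owners[descriptor]:
            owners[descriptor] = header
    return owners


# Two headers share the descriptor at 0100h; the earlier one is chosen.
refs = [(0x0120, 0x0100), (0x00F0, 0x0100), (0x01D0, 0x01E0)]
print(assign_owners(refs))
```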
-
-
-
- 7.6.2 THE DISASSEMBLER |
-
-
- To start with, I apologize up front for mistakes which are bound to be |
- present in this routine. I am not a MASM or TASM programmer and I |
- will not pretend otherwise. This being the case, the formatting I |
- have chosen for the operands may be erroneous or misleading and might |
- (if submitted to one of the "real" assemblers) produce object code |
- quite different from what is expected. I hope not, but I have to |
- admit it's possible. |
-
- My intention in adding this unit was to make tuning of object code |
- possible. With practice and some effort, one can observe |
- the effect on the object module caused by specific Pascal coding. |
- Thus, where compactness is an issue of paramount importance, TPUUNA1 |
- can be of help. In some cases, a simple re-arrangement of the local |
- variable declarations in a procedure can have a significant effect on |
- the size of the code if it means the difference between 1 and 2-byte |
- displacements for each instruction that references a specific local |
- variable. Potential applications along these lines seem almost |
- unlimited. |
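-
- To make the 1-byte versus 2-byte point concrete, here is a Python
- sketch of the displacement rule for BP-relative operands. The two
- layouts are invented, and the idea that locals are laid out downward
- from BP in declaration order is an assumption made for the example.
-
```python
def disp_bytes(bp_offset):
    """Displacement bytes needed to address [BP+bp_offset] on an 8086."""
    # An 8-bit displacement is sign-extended, so it covers -128..+127.
    # [BP] has no zero-displacement form, so offset 0 still costs one.
    return 1 if -128 <= bp_offset <= 127 else 2


# Layout A: a 200-byte buffer declared first pushes the counter out to
# BP-202.  Layout B: the counter is declared first and stays near BP.
layout_a = {"buffer": -200, "counter": -202}
layout_b = {"counter": -2, "buffer": -202}
for name, layout in (("A", layout_a), ("B", layout_b)):
    print(name, {var: disp_bytes(ofs) for var, ofs in layout.items()})
```
-
- In layout B every instruction that touches the counter is one byte
- shorter than in layout A.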
-
- I adopted an operand format not unlike that of TASM "Ideal" mode since |
- it was more convenient to do so and looked more readable to me. I |
- relied on several reference books for guidance in decoding the entire |
- mess and I found that there were several flaws (read ERRORS) in some |
- of them which made the job that much more difficult. I then |
- compounded my problems by attempting to handle 80286 and 80386 |
- specific code even though Turbo Pascal does not generate code specific |
- to these processors. I simply felt that the effort involved in |
- writing any sort of Dis-Assembly program for Turbo Pascal units was an |
- effort best experienced not more than once. With all this self- |
- flagellation out of my system once and for all, I will try to show the |
- basic strategy of the program and to explain the limitations and some |
- of the discoveries I made. |
-
- The routine is intended to be idiotically simple - i.e., no smarter |
- than the DEBUG command in principle. The basic idea is: pass some |
- text to the routine and get back ONE line derived from some prefix of |
- that text. Repeat as necessary until all text is gone. Thus, there |
- is no attempt to check the context of the text being processed. Also, |
- some configurations of the "modR/M" byte may be invalid for selected |
- instructions. I don't try to screen these out since the intent was to |
- look at the presumably correct code produced by TURBO Pascal -- not |
- devious assembly language. Also, this program regards WAIT operations |
- as "stand-alone" -- i.e., it doesn't check to see if a coprocessor |
- operation follows for which the WAIT might be regarded as a prefix. |
-
- One area of real difficulty was figuring out the Floating-Point |
- emulations used by Turbo Pascal that are implemented by means of |
- interrupts $34 through $3D. I don't know if I got it right, but the |
- results seem reasonable and consistent. In the listing, the Interrupt |
- is produced on one line, followed by its parameters on the next line. |
- The parameter line is given the op-code "EMU_xxxx" where "xxxx" is the |
- coprocessor op-code I felt was being emulated. Interrupt $3C was a |
- real puzzler but after seeing a lot of code in context, I think that |
- the segment override is communicated to the emulator by means of the |
- first byte after the $3C. |
-
- Normally, in a non-emulator environment, all coprocessor operations |
- (ignoring any WAIT prefixes) begin with $D8-$DF. What Borland (and |
- maybe Microsoft) seem to have done here is to change the $D8-$DF so |
- that bits 7 and 6 of this byte are replaced with the one's complement |
- of the 2-bit segment register number found in various 8086 |
- instructions. This seems to be how an override for the DS register is |
- passed to the emulator. I don't KNOW this to be the correct |
- interpretation, but the code I have examined in context seems to work |
- under this scheme, so TPUUNA1 uses it to interpret the operand. |
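-
- The guess about the byte that follows INT 3Ch can be written out as a
- short Python sketch. It encodes my conjecture and nothing more. The
- function names are invented, and the one-to-one pairing of INT 34h-3Bh
- with opcodes D8h-DFh is the commonly documented convention, assumed
- here rather than established in this report.
-
```python
SEGMENTS = ("ES", "CS", "SS", "DS")   # 8086 segment register numbers 0-3


def esc_for_interrupt(int_no):
    """INT 34h..3Bh stand in for coprocessor opcodes D8h..DFh."""
    if not 0x34 <= int_no <= 0x3B:
        raise ValueError("not a plain emulator interrupt")
    return 0xD8 + (int_no - 0x34)


def decode_int3c_byte(byte):
    """Conjecture: bits 7-6 hold the complemented segment number."""
    segment = SEGMENTS[~(byte >> 6) & 3]
    opcode = 0xC0 | (byte & 0x3F)     # put back the "11" in bits 7-6
    return segment, opcode


print(hex(esc_for_interrupt(0x35)))   # coprocessor opcode for INT 35h
print(decode_int3c_byte(0x1E))        # a DS override on opcode DEh
```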
-
- For 80x86 machines, the problem was somewhat simpler. TPUUNA1 takes a |
- quick look at the first byte of the text. Almost any byte is valid as |
- the initial byte of an instruction, but some instructions require more |
- than one byte to hold the complete operation code. Thus, step 1 |
- classifies bytes in several ways that lead to efficient recognition of |
- valid operation codes. |
-
- Once the instruction has been identified in this way, it is more or |
- less easy to link to supplemental information that provides operand |
- editing guidance, etc. |
-
- The tables that embody the recognition scheme were constructed using |
- PARADOX 3.0 (another fine Borland product) and suitably coded queries |
- were used to generate the actual Turbo Pascal code for compilation. |
-
- For those that are interested, TPUUNA1 supports the address-size and |
- operand-size prefixes of the 80386 as well as 32-bit operands and |
- addresses, but remember that Turbo Pascal doesn't generate these. |
- Provision is made for a trivial change that allows segments which |
- default to 32-bit mode to be handled as well. |
-
- There is a simple mode variable that gets passed to TPUUNA1 by its |
- caller which specifies the most-capable processor whose code is to be |
- handled. Codes are provided for the 8086 (8088 is the same), 80186 |
- (same as 80286 except no protected mode instructions), 80286 (80186 |
- plus protected mode operation), and 80386. |
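-
- One way to picture the mode variable is as an ordered scale of
- processors, sketched here in Python. The names Cpu, MIN_CPU and
- accepted are hypothetical, and the table holds only three sample
- bytes, not the real recognizer tables.
-
```python
from enum import IntEnum


class Cpu(IntEnum):
    I8086 = 0      # also covers the 8088
    I80186 = 1
    I80286 = 2
    I80386 = 3


# Lowest processor on which each sample first byte is a valid opcode.
MIN_CPU = {0x90: Cpu.I8086,    # NOP
           0x60: Cpu.I80186,   # PUSHA
           0x66: Cpu.I80386}   # operand-size prefix


def accepted(first_byte, mode):
    """True if the byte may start an instruction on processor 'mode'."""
    needed = MIN_CPU.get(first_byte)
    return needed is not None and needed <= mode


print(accepted(0x60, Cpu.I8086), accepted(0x60, Cpu.I80286))
```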
-
- No such specifier is provided for coprocessor support. What is there |
- is what I think an 80387 supports. I don't think that this is really |
- a problem if you don't try to use TPUUNA1 for anything but Turbo Pascal |
- code. |
-
- Error recovery is predictably simple. The initial text byte is output |
- as the operand of a DB pseudo-op and provision is made to resume work |
- at the next byte of text. |
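-
- The "one line per call" strategy and its error recovery can be shown
- with a toy decoder in Python. This is a sketch under heavy
- simplification: the opcode table lists five sample instructions only,
- and any byte missing from it (or any instruction cut short by the end
- of the text) comes out as a DB line.
-
```python
# first byte -> (mnemonic, total instruction length); a tiny sample only
OPCODES = {0x90: ("NOP", 1), 0xC3: ("RET", 1), 0xCB: ("RETF", 1),
           0x55: ("PUSH BP", 1), 0xCD: ("INT", 2)}


def decode_one(text):
    """Return (line, bytes_consumed) for a prefix of non-empty text."""
    first = text[0]
    mnemonic, length = OPCODES.get(first, (None, 1))
    if mnemonic is None or length > len(text):
        return "DB %02Xh" % first, 1       # unknown or incomplete
    if mnemonic == "INT":
        return "INT %02Xh" % text[1], 2
    return mnemonic, length


def disassemble(text):
    """Call decode_one repeatedly until all text is gone."""
    lines = []
    while text:
        line, used = decode_one(text)
        lines.append(line)
        text = text[used:]
    return lines


print(disassemble(bytes([0x55, 0xCD, 0x21, 0xFF, 0xC3, 0xCD])))
```
-
- Here FFh merely stands in for a byte the sample table does not know,
- and the final CDh is an INT whose operand byte is missing.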
-
- I hope this program is found to be useful in spite of the errors it |
- must surely contain. I have yet to make much sense of the rules for |
- MASM or TASM operand coding and I found very little of value in many |
- of the so-called "texts" on the subject. I found myself in the |
- position of that legendary American watching a Cricket match in |
- England for the first time ("You mean it has RULES?"). |
-
-
-
- 8. UNIT LIBRARIES
-
-
- This author has examined .TPL files in passing and concludes that
- their structure is trivial in the extreme. The following notes should
- be of some help.
-
-
-
- 8.1 LIBRARY STRUCTURE
-
-
- A Turbo Pascal Library (.TPL) file appears to be a simple catenation
- of Turbo Pascal Unit (.TPU) files. Since the length of a Unit may be
- determined from the Unit Header (see section 3.2), it is simple to see
- that one may "browse" through a .TPL file looking for an external unit
- such as SYSTEM.TPU. If this seems to be too much effort, then there
- is always the TPUMOVER Utility program supplied by Borland.
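-
- Browsing a library might be sketched in Python as below. Treat it as
- an outline only: split_library is an invented name, the units are
- assumed to lie back to back with nothing between them, and the rule
- for reading a unit's length from its header (section 3.2) is left to a
- caller-supplied function rather than reproduced here.
-
```python
def split_library(data, unit_size):
    """Yield (offset, unit_bytes) for each unit in a .TPL image.

    unit_size(data, offset) must return the unit's length in bytes.
    """
    offset = 0
    while offset < len(data):
        size = unit_size(data, offset)
        if size <= 0 or offset + size > len(data):
            raise ValueError("bad unit size at offset %d" % offset)
        yield offset, data[offset:offset + size]
        offset += size


# Stand-in size rule for demonstration only: first byte = total length.
fake = bytes([3, 1, 2]) + bytes([2, 9])
for ofs, unit in split_library(fake, lambda d, o: d[o]):
    print(ofs, unit.hex())
```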
-
-
-
- 8.2 THE TPUMOVER UTILITY
-
-
- Quite simply, this Utility allows one to extract units from .TPL files
- in order to subject them to the analysis performed by TPUNEW. Read
- your Turbo Pascal User's Guide for instructions on the operation and
- use of this utility.
-
-
-
- 9. APPLICATION NOTES
-
-
- One of the more obvious applications of this information would seem to
- be in the area of a Cross-Reference Generator.
-
- There is a very fine example of such a program in the public domain
- that was written by Mr. R. N. Wisan called "PXL". This program has
- been around since the days of Turbo Pascal Version 1. The program has
- been continually enhanced by the author in the way of features and for
- support of the newer Turbo Pascal versions. It does not however solve
- the problem of telling one which unit contains the definition of a
- given symbol. In fairness to "PXL" however, this is no small problem
- since the format of .TPU files keeps changing (Turbo 5.5 Units are
- not object-code compatible with Turbo 5.0 Units, and so on...) and
- Mr. Wisan probably has more than enough other projects to keep himself
- occupied.
-
- However, for the user who is willing to work a little (maybe a lot?),
- this document would seem to provide the information needed to add such
- a function to his own pet cross-reference generator.
-
-
-
- 10. ACKNOWLEDGEMENTS
-
-
- This project would have been totally infeasible without the aid of
- some very fine tools. As it was, several hundred man hours have been
- expended on it and as you can see, there are a few unresolved issues
- that have been (graciously) left for others to address. The tools
- used by this author consisted of:
-
- 1) Turbo Pascal 5.5 Professional by Borland International
-
- 2) Microsoft WORD (version 5.0)
-
- 3) LIST (version 6.4a) by Vernon D. Buerg
-
- 4) the DEBUG utility in MS-DOS Version 3.3.
-
- 5) PARADOX 3.0 by Borland International |
-
- 6) QUATTRO PRO by Borland International |
-
- 7) TURBO ASSEMBLER 1.1 by Borland International |
-
- (PARADOX and QUATTRO PRO were used for data collection and analysis in |
- the course of coding the recognizer tables for the disassembler unit.) |
-
- The references listed were of great value in this project. [Intel85] |
- was a valuable source of information about coprocessor instructions as |
- well as offering hints about the differences between the 8086/8088 and |
- the 80286. The TASM manuals [Bor88a] and [Bor88b] offered further |
- info on the 80186. [Nelson] provided presentations of well-organized |
- data directed at the problem of disassembly, but the tables were |
- flawed by a number of errors which crept into my databases and which |
- caused much of the extra debugging effort. [Intel89] offered valuable |
- insights on the 80386 addressing schemes as well as the 32-bit data |
- extensions. Finally, [Brown] provided valuable clues on the |
- Floating-Point emulators used by Borland (and Microsoft?). As you can |
- see, the amount of hard information available to me on this project |
- was quite limited since I am unaware of any other existing body of |
- literature on this subject. |
-
- That's it folks. Does anyone wonder why it took several hundred man
- hours to get to this point? It took a lot of hard (and at times
- tedious) work coupled with a great many lucky guesses to achieve what
- you see here.
-
-
-
- 11. REFERENCES
-
-
- [Bor88a], TURBO ASSEMBLER REFERENCE GUIDE, Borland International, |
- 1988. |
-
- [Bor88b], TURBO ASSEMBLER USER'S GUIDE, Borland International, 1988. |
-
- [Bor88c], TURBO PASCAL REFERENCE GUIDE Version 5.0, Borland |
- International, 1988. |
-
- [Bor88d], TURBO PASCAL USER'S GUIDE Version 5.0, Borland |
- International, 1988. |
-
- [Bor89], TURBO PASCAL 5.5 OBJECT-ORIENTED PROGRAMMING GUIDE, Borland |
- International, 1989. |
-
- [Brown], INTER489.ARC, Ralf Brown, 1989. |
-
- [Intel85], iAPX 286 PROGRAMMER'S REFERENCE MANUAL INCLUDING THE iAPX |
- 286 NUMERIC SUPPLEMENT, Intel Corporation, 1985, (order |
- number 210498-003). |
-
- [Intel89], 386 SX MICROPROCESSOR PROGRAMMER'S REFERENCE MANUAL, Intel |
- Corporation, 1989, (order number 240331-001). |
-
- [Nelson], THE 80386 BOOK: ASSEMBLY LANGUAGE PROGRAMMER'S GUIDE FOR |
- THE 80386, Ross P. Nelson, Microsoft Press, 1988. |
-
- [Scanlon], 80286 ASSEMBLY LANGUAGE ON MS-DOS COMPUTERS, Leo J. |
- Scanlon, Brady, 1986. |