-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ──────────────────────────────
-
- INSIDE TURBO PASCAL UNIT FILES
-
- Version 6.0 for MS-DOS
- Version 1.0 for WINDOWS
-
- ──────────────────────────────
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- by
-
- William L. Peavy
-
- ────────────────
-
- June 6, 1991
-
-
-
-
-
-
- ABSTRACT
-
- If you want to know what is in a .TPU (unit) file produced
- by either Version 1.0 of Turbo Pascal for Windows or by
- Version 6.0 of Turbo Pascal from Borland International, then
- this paper is for you. It doesn't explain quite everything
- since I don't have access to secret documents or
- anything like that and since some of the data in .TPU files
- just doesn't have enough auxiliary information to make its
- role clear. However, it is possible to learn a great deal
- about how Turbo Pascal organizes the information it needs to
- refer to, and it is also possible to learn just what kind of
- code the compiler produces.
-
- This is the fourth in a series of reports on the subject of
- Turbo Pascal Units; the previous reports dealt with Turbo
- Pascal Versions 5.0 through 6.0. The evolution of these
- files in the face of changing requirements has been
- fascinating to behold and deciphering their contents has
- been challenging to say the least.
-
- The programs supplied with this report have been reorganized
- from their 6.0 style and many identifiers have been changed.
- There are also a few bug fixes and algorithm changes. Other
- changes were dictated by the changes in the utilization of
- the TPU file itself by the Windows Compiler.
-
- Since I have a "real" job which requires my full attention,
- and since it doesn't involve use of these products in any
- direct way, I am usually hard-pressed to find the personal
- time to conduct this research. Consequently, I always
- refuse to commit to follow-up or even error correction. It
- would be irresponsible of me to pretend it could be
- otherwise. Even so, this is a revised report which contains
- a few error fixes and discusses the newly enhanced program
- which incorporates these fixes and sports some enhanced
- capabilities.
-
-
-
- Contents
-
-
-
- 1. Introduction 5
- 1.1 Caveats 5
- 1.2 Evolution 6
- 1.3 Treatment 6
- 2. Gross File Structure 7
- 2.1 User Units 8
- 2.2 SYSTEM Unit 8
- 3. Locators 8
- 3.1 Local Links 9
- 3.2 Global Links 9
- 3.3 Table Offsets 9
- 3.4 Basic Relationships 10
- 4. Unit Header 13
- 4.1 Description 13
- 4.2 UNIT Size 16
- 5. Symbol Dictionaries 16
- 5.1 Organization 16
- 5.2 Interface Dictionary 17
- 5.3 Debug Dictionary 17
- 5.4 Dictionary Elements 17
- 5.4.1 Hash Tables 18
- 5.4.1.1 Size 18
- 5.4.1.2 Scope 19
- 5.4.1.3 Special Cases 19
- 5.4.2 NAME ENTRIES 20
- 5.4.3 NAME Stubs 20
- 5.4.3.1 Label Declaratives ("O") 20
- 5.4.3.2 Un-Typed Constants ("P") 21
- 5.4.3.3 Named Types ("Q") 21
- 5.4.3.4 Variables, Fields, Typed Cons ("R") 22
- 5.4.3.5 Subprograms & Methods ("S") 24
- 5.4.3.6 Turbo Std Procedures ("T") 25
- 5.4.3.7 Turbo Std Functions ("U") 25
- 5.4.3.8 Turbo Std "NEW" Routine ("V") 25
- 5.4.3.9 Turbo Std Port Arrays ("W") 26
- 5.4.3.10 Turbo Std External Variables ("X") 26
- 5.4.3.11 Units ("Y") 26
- 5.4.4 Type Descriptors 27
- 5.4.4.1 Scope 27
- 5.4.4.2 Prefix Part 28
- 5.4.4.3 Suffix Parts 29
- 5.4.4.3.1 Un-Typed 29
- 5.4.4.3.2 Structured Types 29
- 5.4.4.3.2.1 ARRAY Types 30
- 5.4.4.3.2.2 RECORD Types 30
- 5.4.4.3.2.3 OBJECT Types 31
- 5.4.4.3.2.4 FILE (non-TEXT) Types 31
- 5.4.4.3.2.5 TEXT File Types 32
- 5.4.4.3.2.6 SET Types 32
- 5.4.4.3.2.7 POINTER Types 32
- 5.4.4.3.2.8 STRING Types 32
- 5.4.4.3.3 Floating-Point Types 32
- 5.4.4.3.4 Ordinal Types 32
- 5.4.4.3.4.1 "Integers" 33
- 5.4.4.3.4.2 BOOLEANs 33
- 5.4.4.3.4.3 CHARs 33
- 5.4.4.3.4.4 ENUMERATions 34
- 5.4.4.3.5 SUBPROGRAM Types 34
- 6. Maps and Lists 35
- 6.1 PROC Map 35
- 6.2 CSeg Map 36
- 6.3 Typed CONST DSeg Map 37
- 6.4 Global VAR DSeg Map 37
- 6.5 DLL LIST 38
- 6.6 Donor Unit List 38
- 6.7 Source File List 39
- 6.8 DEBUG Trace Table 40
- 7. Code, Data, Fix-Up Info 40
- 7.1 Object CSegs 41
- 7.2 CONST DSegs 41
- 7.3 Fix-Up Data Tables 42
- 8. Supplied Program 45
- 8.1 TWU1 45
- 8.1.1 Unit TWU1EQU 46
- 8.1.2 Unit TWU1RPT 46
- 8.1.3 Unit TWU1UAM 46
- 8.1.4 Unit TWU1UNA 47
- 8.2 Notes on Program Logic 48
- 8.2.1 Formatting the Dictionary 48
- 8.2.2 The Disassembler 49
- 9. Unit Libraries 52
- 9.1 Library Structure 52
- 10. Inferences Drawn from Analyses 53
- 10.1 Linker Granularity 53
- 10.2 Floating-Point Emulation 53
- 10.2.1 Version 6.0 Compiler For MS-DOS 54
- 10.2.2 Version 1.0 Compiler For WINDOWS 54
- 11. Application Notes 55
- 12. Acknowledgements 56
- 13. References 57
- 14. INDEX 58
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
-
-
- 1. INTRODUCTION
-
-
- This document is the outcome of an inquiry conducted into the
- structure and content of Borland Turbo Pascal for Windows (Version
- 1.0) Unit files. This followed naturally from previous inquiries into
- the structure of Unit Files for versions 5.0-6.0 of Borland's Turbo
- Pascal Compilers. I was further stimulated to undertake this as a
- result of a brief conversation I had with the Principal Architect of
- Turbo Pascal, Mr. Anders Hejlsberg, in Houston at the HAL-PC meeting
- that served as the platform for the formal announcement of Turbo
- Pascal for Windows.
-
-
-
- 1.1 CAVEATS
-
-
- The material contained herein represents the findings and
- interpretations of the author. A great deal of guess-work was
- required and no assurances are given as to the accuracy of either the
- findings of fact or the inferences contained herein which are the sole
- work-product of the author. In particular, only the materials and
- information that any normal Borland customer has access to were
- available to the author. Further, no Borland source-codes were
- available as the Library Routine source is not licensed to the author.
- In short, there was nothing irregular about how these findings were
- achieved.
-
- The material contained herein is placed in the public domain free of
- copyright for use of the general public at its own risk. The author
- assumes no liability for any damages arising from the use of this
- material by others. If you make use of this information and you get
- burned, TOUGH! The author accepts no obligation to correct any such
- errors as may exist in the supplied programs or in the findings of
- fact or opinion contained herein.
-
- On the other hand, this is not a "complete" work in that a great many
- questions remain open, especially as regards fine details. The author
- is highly-qualified in neither Intel 80xxx Assembly Language nor in
- Windows 3.0 application programming and several open questions might
- best be addressed by persons competent in these areas. The author
- welcomes the input of interested readers who might be able to "flesh-
- out" some of these open questions with "hard" answers so that all
- might benefit from their expertise.
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 5
-
-
-
- 1.2 EVOLUTION
-
-
- The Unit first appeared in Turbo Pascal Version 4.0 (for MS-DOS) along
- with the ability to create ".EXE" instead of ".COM" files. This
- author began delving into these Unit files beginning with Version 5.0
- of Turbo Pascal and each new version of the MS-DOS based product has
- seen significant changes in both the form and the content of ".TPU"
- files.
-
- In contrast, careful study should make it plain that the Unit File
- produced by Turbo Pascal for Windows is remarkably similar to that
- produced by Turbo Pascal Version 6.0 (for MS-DOS).
-
- In the main, the files produced by the MS-DOS product (TP6) were rich
- with apparently useless fields within some of the data structures. In
- essence, the Windows product (TPW) has made use of these fields in a
- coherent way that makes the Version 6 units appear to be subsets of
- the Windows Units as far as format is concerned.
-
- The Windows version development must have been well-advanced when the
- DOS version (6.0) hit the streets. In fact, Mr. Anders Hejlsberg did
- confirm my speculation that the compiler "engine" used in the Windows
- Product is the same as that used in version 6 of the DOS Product.
-
-
-
- 1.3 TREATMENT
-
-
- This report covers BOTH Turbo Pascal for Windows and Turbo Pascal
- Version 6.0 (for MS-DOS). It views Unit Files for the MS-DOS version
- as sub-sets of those for the Windows version from the standpoint of
- structure. Because of this, the supplied program is able to process
- ".TPU" files from either compiler with little or no special handling.
-
- This doesn't mean that Version 6.0 Units can be combined with Windows
- Applications! When an application (program) is built by either of the
- compilers, ALL units must have been compiled by that same compiler if
- for no other reason than that the SYSTEM Unit (for one) is uniquely
- tailored to each of these environments.
-
-
-
- 2. GROSS FILE STRUCTURE
-
-
- A Turbo Pascal Unit file consists of an array of bytes whose length is
- an exact multiple of sixteen (16). "Signature" information allows the
- compiler to verify that the .TPU file was compiled with the correct
- compiler version and to verify that the file is of the correct size.
- The fine structure of the file will be addressed in later sections at
- ever increasing levels of detail.
-
- Graphically, the file may be regarded as having the following general
- layout (major sections bounded by ═):
-
-        ╔═══════════════════╗
-        ║ Unit Header       ║  Main Index to Unit File
-        ╟───────────────────╢
-        ║ Dictionaries:     ║
-        ║   a) Interface    ║
-        ║   b) Debug *      ║  For Local Symbol Access
-        ╟───────────────────╢
-        ║ PROC Map          ║
-        ╟───────────────────╢
-        ║ CSeg Map *        ║  May be Empty
-        ╟───────────────────╢
-        ║ CONST DSeg Map *  ║  May be Empty
-        ╟───────────────────╢
-        ║ VAR DSeg Map *    ║  May be Empty
-        ╟───────────────────╢
-        ║ DLL List *        ║  May be Empty
-        ╟───────────────────╢
-        ║ Donor Units *     ║  May be Empty
-        ╟───────────────────╢
-        ║ Source Files      ║
-        ╟───────────────────╢
-        ║ Trace Table *     ║  May be Empty
-        ╠═══════════════════╣
-        ║ CODE Group *      ║  May be Empty
-        ╠═══════════════════╣
-        ║ DATA Group *      ║  May be Empty
-        ╠═══════════════════╣
-        ║ Code Fix-Ups *    ║  May be Empty
-        ╠═══════════════════╣
-        ║ Data Fix-Ups *    ║  May be Empty
-        ╚═══════════════════╝
-
-
- Each of the sections outlined by double lines is capable of being up
- to 64K bytes long. The Dictionary Area begins with the Unit Header
- and continues through the Trace Table.
-
-
-
- 2.1 USER UNITS
-
-
- Units compiled by ordinary users have a very straight-forward
- appearance and content. The SYSTEM.TPU file is quite another thing
- however.
-
-
-
- 2.2 SYSTEM UNIT
-
-
- The SYSTEM.TPU file (found in TURBO.TPL and in the TPW.TPL file) is
- unique in several respects. It contains several types of entries that
- just don't seem to be achievable by ordinary users, and the
- arrangement of the entries in the dictionary is unique. Normally, the
- Name Entry for the Unit immediately follows the hash table but, in the
- "SYSTEM" unit, this is not true. Rather, the hash table is followed
- by all the descriptors for the built-in types, followed by descriptors
- for the standard procedures and functions, followed by the Name Entry
- for the Unit, followed by the conventional dictionary entries
- achievable by normal PASCAL coding such as the Typed Constants and
- Variables defined in the "SYSTEM" unit.
-
- Try to compile a Unit named "SYSTEM" and you find that the compiler
- wants a file called "SYSTEM.TPS". I suspect that "SYSTEM.TPS" is a
- file that contains a pre-initialized interface hash table plus the
- descriptors for the standard types and the descriptors for the built-
- in procedures and functions stored in the "SYSTEM" Unit (which would
- otherwise require special syntax to define).
-
- The compiler can't operate normally without a "SYSTEM" unit so this
- file probably provides a "bootstrap" mechanism for the built-in
- descriptors needed to build "SYSTEM.TPU".
-
-
-
- 3. LOCATORS
-
-
- The data in these files has need of structure and organization to
- support efficient access by the various programs such as the compiler,
- the linker and the debugger. This organization is built on a solid
- foundation of locators employed in the unit's data structures.
-
-
-
- 3.1 LOCAL LINKS
-
-
- Local Links (LL's) are items of type WORD (2 bytes) which contain an
- offset which is relative to the origin of the Dictionary Area of the
- unit. This implies that the Dictionary Area must be somewhat less
- than 64K bytes in size. If the Dictionary Area is loaded into the
- heap, then an LL can be used to locate any byte in the Dictionary
- Area. (See Below)
-
- Type LL = Word; { Local Scope Locators }
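-
- As a minimal sketch of the idea (the names DictArea, ByteAt and WordAt
- are illustrations of my own and are not taken from the supplied
- programs), resolving an LL amounts to indexing a buffer that holds the
- Dictionary Area. The value planted at offset 8 is invented, and the
- low-byte-first word order is simply the normal Intel convention.
-
```pascal
program LLDemo;
{ Sketch only: DictArea, ByteAt and WordAt are hypothetical names. }

type
  LL        = Word;                        { Local Scope Locators }
  DictArea  = array[0..65519] of Byte;     { just under 64K bytes }
  PDictArea = ^DictArea;

{ Return the byte that an LL designates in a loaded Dictionary Area. }
function ByteAt(Dict: PDictArea; Link: LL): Byte;
begin
  ByteAt := Dict^[Link];
end;

{ Return the WORD stored at an LL (low byte first, as on Intel CPUs). }
function WordAt(Dict: PDictArea; Link: LL): Word;
begin
  WordAt := Word(Dict^[Link]) or (Word(Dict^[Link + 1]) shl 8);
end;

var
  Dict: PDictArea;
begin
  New(Dict);                               { "load into the heap" }
  FillChar(Dict^, SizeOf(DictArea), 0);
  Dict^[8] := $40;                         { pretend +08 holds $0040 }
  Dict^[9] := $00;
  WriteLn('Word found at LL 8 = ', WordAt(Dict, 8));
  Dispose(Dict);
end.
```
-
- Because an LL is only an offset, the same value stays valid no matter
- where the heap manager happens to place the buffer.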
-
-
-
- 3.2 GLOBAL LINKS
-
-
- Global Links (LG's) are used to locate type descriptors and to locate
- allocation data for variables with the ABSOLUTE attribute which may
- reside in other Units (i.e., units external to the present unit).
- LG's are structured items consisting of two (2) words (see below).
-
-      LG = RECORD
-             UntLL: LL; { To item in Unit Named by LL below    }
-             UntId: LL; { Stub Type "Y" Name Entry in our Unit }
-           END;
-
- The first of these is an LL that is relative to the origin of the
- Dictionary Area of the (possibly external) unit. It locates either a
- Type Descriptor or the stub of the Name entry which establishes
- storage allocation. The second word is an LL which locates the stub
- of the Name entry in the current unit dictionary for the (possibly
- external) target unit. The Name entry for this stub identifies the
- name of the unit that contains the item the LG points to.
-
- This provides a handy mechanism for locating type descriptors and
- allocation information which may be defined in other separately
- compiled units.
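-
- The two-word layout can be illustrated with a short sketch. ReadLG,
- RawBytes and the sample byte values are hypothetical; only the field
- order (UntLL first, UntId second) comes from the record above. The
- sketch stops at decoding the two words and does not attempt to look up
- the unit name.
-
```pascal
program LGDemo;
{ Sketch only: ReadLG and the sample bytes are hypothetical. }

type
  LL = Word;
  LG = record
    UntLL: LL;     { offset inside the target unit's Dictionary Area }
    UntId: LL;     { offset of the "Y" stub in the current unit      }
  end;
  RawBytes = array[0..15] of Byte;

{ Assemble an LG from four consecutive bytes starting at Ofs. }
procedure ReadLG(var Buf: RawBytes; Ofs: Word; var Link: LG);
begin
  Link.UntLL := Word(Buf[Ofs])     or (Word(Buf[Ofs + 1]) shl 8);
  Link.UntId := Word(Buf[Ofs + 2]) or (Word(Buf[Ofs + 3]) shl 8);
end;

var
  Buf: RawBytes;
  Ref: LG;
begin
  FillChar(Buf, SizeOf(Buf), 0);
  Buf[4] := $98;  Buf[5] := $01;           { UntLL = $0198 }
  Buf[6] := $4C;  Buf[7] := $00;           { UntId = $004C }
  ReadLG(Buf, 4, Ref);
  WriteLn('item offset in target unit : ', Ref.UntLL);
  WriteLn('unit stub offset (ours)    : ', Ref.UntId);
end.
```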
-
-
-
- 3.3 TABLE OFFSETS
-
-
- Finally, various data-structures within a .TPU file are organized as
- arrays of fixed-length records or as lists of variable-length records.
- Efficient access to such records is achieved by means of offsets
- rather than subscripts (an addressing technique denied Pascal). These
- offsets are relative to the origin of the array or list being
- referenced.
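-
- To make the contrast with subscripts concrete, here is a small sketch
- that walks a table by offset. The 8-byte record length, the 32-byte
- table and its contents are all invented for illustration; real record
- lengths depend on the table in question.
-
```pascal
program OffsetDemo;
{ Sketch only: record length and table contents are invented. }

const
  RecSize  = 8;      { hypothetical fixed record length, in bytes }
  TableLen = 32;     { hypothetical table length, in bytes        }

type
  TableBytes = array[0..TableLen - 1] of Byte;

{ Convert a table offset to the equivalent zero-based subscript. }
function OffsetToRecNum(Ofs: Word): Word;
begin
  OffsetToRecNum := Ofs div RecSize;
end;

{ Convert a zero-based subscript back to a table offset. }
function RecNumToOffset(RecNum: Word): Word;
begin
  RecNumToOffset := RecNum * RecSize;
end;

var
  Table: TableBytes;
  Ofs: Word;
  I: Integer;
begin
  for I := 0 to TableLen - 1 do
    Table[I] := I;
  { Walk the table by offset; each step adds the record length. }
  Ofs := 0;
  while Ofs < TableLen do
  begin
    WriteLn('offset ', Ofs, ' is record ', OffsetToRecNum(Ofs),
            ', first byte ', Table[Ofs]);
    Inc(Ofs, RecSize);
  end;
end.
```
-
- An offset can be added directly to the origin of the table, which is
- why it suits lists of variable-length records where no subscript
- arithmetic is possible.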
-
-
-
- 3.4 BASIC RELATIONSHIPS
-
-
- ╔═══>           ┌────────────────┐        ┌──────────────────────┐
- ║     ┌────────<┤ Unit Header    │        │  Symbol Dictionary   │
- ║ D   │         └────────────────┘        │  (names, types etc)  │
- ║ I   │   LL    ┌────────────────┐  LL's  │ defined in INTERFACE │
- ║ C   ├────────>┤ INTERFACE Hash ├───────>┤                      │
- ║ T   │         └────────────────┘        └──────────┬───────────┘
- ║ I   │   LL    ┌────────────────┐  LL's  ┌──────────┴───────────┐
- ║ O   ├────────>┤ DEBUG Hash     ├───────>┤   DEBUG Dictionary   │
- ║ N   │         └────────────────┘        │ Local Symbol option  │
- ║ A   │   LL    ┌────────────────┐        │ builds this. Holds   │
- ║ R   ├────────>┤ PROC Map Table │        │ names and types etc  │
- ║ Y   │         └────────────────┘        │ from IMPLEMENTATION  │
- ║     │   LL    ┌────────────────┐        │ Linked to INTERFACE  │
- ║ A   ├────────>┤ CSeg Map Table │?       │ part by LL's.        │
- ║ R   │         └────────────────┘        │                      │
- ║ E   │   LL    ┌────────────────┐        └──────────────────────┘
- ║ A   ├────────>┤ DSeg Map CONST │?
- ║     │         └────────────────┘
- ║     │   LL    ┌────────────────┐
- ║     ├────────>┤ DSeg Map VAR's │?
- ║     │         └────────────────┘
- ║     │   LL    ┌────────────────┐
- ║     ├────────>┤ DLL List       │?
- ║     │         └────────────────┘           IMPORTANT NOTES
- ║     │   LL    ┌────────────────┐        ──────────────────────
- ║     ├────────>┤ Donor Unit List│?       Some of the structures
- ║     │         └────────────────┘        shown in this figure
- ║     │   LL    ┌──────────────────┐      are built only if they
- ║     ├────────>┤ Source File List │      are needed. These are
- ║     │         └──────────────────┘      marked by a "?" next
- ║     │   LL    ┌──────────────────┐      to the box.
- ║     ├────────>┤ Debug Step Ctls  │?
- ╚═══> │         └──────────────────┘      If the DEBUG Dictionary
-       │   **    ┌───────────────┐         is missing, its LL
-       ├────────>┤ CODE Segments │?        leads directly to the
-       │         └───────────────┘         INTERFACE Dictionary.
-       │   **    ┌─────────────────┐       ──────────────────────
-       ├────────>┤ CONST DATA Segs │?
-       │         └─────────────────┘
-       │   **    ┌────────────────┐
-       ├────────>┤ CODE Fix-Ups   │?
-       │         └────────────────┘
-       │   **    ┌────────────────┐
-       └────────>┤ CONST Fix-Ups  │?
-                 └────────────────┘
-
- This figure illustrates the role of the Unit Header in tying together
- the various data structures in the Unit. The type of link is shown
- next to a flow-line by "LL", "LG" or "**". "LL" and "LG" are explicit
- pointers, while "**" marks a locator whose value is computed using
- other data in the Unit Header; no explicit pointer exists for it.
-
-
- ┌────(from hash tables,other Name Entries)
- │
- │    ┌─────────────┬──────────────────────────────────┐
- │    │ Header Part │ Stub Part -- many formats        │
- └───>┤ - - - - - - │ - - - ┌───────────────────────── │
-      │             │ data, │ Some stubs have embedded │ Name
-      │ Name, Class │ links │ Type Descriptors         │ Entry
-      │ and link to │ (see  │     ┌─────────────────── │
-      │ prior entry │ below)│     │ INLINE Declarative │
-      │ having same │   *   │     │ code bytes for a   │
-      │ hash-if any │   │   │     │ "macro" type PROC  │
-      └─────────────┴───│──────────────────────────────┘
-             ┌──────────┘
-             │
-             │  FAR pntr  ┌────────────────────────────┐
-             ├───────────>┤ Absolute Memory Locations  │
-             │            └────────────────────────────┘
-             │            ┌─────────────────────────────┐
-             │    LG's    │ Type Descriptors and stubs  │
-             ├───────────>┤ of Dictionary Entries used  │
-             │            │ for absolute equivalences   │
-             │            └─────────────────────────────┘
-             │            ┌─────────────────────────────────┐
-             │    LL's    │ Nested Scope Hash Tables        │
-             ├───────────>┤ Parent Scope Dictionary Entries │
-             │            │ Record Fields                   │
-             │            │ Object Fields/Methods           │
-             │            └─────────────────────────────────┘
-             │            ┌──────────────────────┐
-             │  Offsets   │ CONST DSeg Map Table │
-             └───────────>┤ PROC Map Table       │
-                          │ VAR DSeg Map Table   │
-                          └──────────────────────┘
-
-
-
- This figure illustrates the many types of entities that associate with
- Name Entries and particularly with their Stub Parts. Not all of the
- links shown occur in a single Stub format, but all of the links in the
- figure can and do exist in selected cases. The purpose here is to
- show the flexibility of the system of links in associating required
- data with the Name Entry and its identifying symbol.
-
- While it may not be apparent from the figure, the dictionary structure
- as a whole may be viewed as a cyclic directed graph which is rooted in
- the DEBUG Hash Table. The recursive properties exhibited by the node
- relationships permit direct support of the scope rules of Turbo Pascal
- with simplicity and elegance. As one might expect, the representation
- of the required information lends itself to efficient use of storage
- since the representations are compact and there is very little in the
- way of redundancy. The small amount of redundancy that does exist is
- apparently aimed at speeding access to certain structures by the Turbo
- components (compiler, linker and debugger).
-
-
- ┌────(implied links, explicit LG's from other structures)
- │
- │    ┌─────────────────────────────────────────────┐
- │    │ Flags and codes, allocation widths for data │ Type
- └───>┤ and VMT's, subrange constraints, formal     │ Descriptor
-      │ parameter descriptors, implicit associated  │ Contents &
-      │ type descriptors, LL's, LG's and Offsets.   │ Linkages
-      └──────┬──────────────────────────────────────┘
-             │
-             │
-             │     LG's      ┌──────────────────┐
-             ├──────────────>┤ Type Descriptors │
-             │               └──────────────────┘
-             │
-             │               ┌───────────────────────────────┐
-             │     LL's      │ Method Name Entries           │
-             ├──────────────>┤ Nested Scope Hash Tables      │
-             │               │ Nested Scope Field Chains     │
-             │               │ Parent Scope Name Entry       │
-             │               └───────────────────────────────┘
-             │
-             │    Offsets    ┌──────────────────────────────────┐
-             └──────────────>┤ VMT pointers in Object Instances │
-                             │ CONST DSeg Map Table Entries     │
-                             └──────────────────────────────────┘
-
-
- This figure illustrates the relationships between Type Descriptors and
- other structures in the dictionary. Not all the links shown can exist
- with a single Type Descriptor since there are several variant forms of
- these descriptors (depending on base type) but in combination, these
- linkages are feasible. In addition to links, a great amount of data
- is stored which is peculiar to a given type declaration. Descriptors
- can be -- and are -- shared. Indeed, they were designed with that in
- mind. Once a NAMED type is declared, all entities that reference it
- are linked to it in some way (usually by an LG).
-
- Almost every form of type descriptor is found in the SYSTEM unit and
- this fact is used to advantage. When un-typed constants are declared,
- a built-in type descriptor is referenced (via an LG) which provides
- necessary information for maintenance of orderly dictionary structure.
- When a named-type is declared, it is almost always decomposed into an
- expression based on the built-in types of Turbo Pascal which are found
- in the SYSTEM unit with the aid of an LG.
-
- The semantics underlying the idea of the Unit mandate this very
- approach since program modules of any class which make references to
- units for definitions use the definitions as implemented by the unit
- which contains them. Re-defining the unit or any of its defined types
- leads to a natural requirement to re-compile those program modules
- which rely on the unit for definitions. The impact is fundamental
- since the storage representation of a unit-defined named type can
- change in quite radical ways.
-
-
-
- 4. UNIT HEADER
-
-
- The Unit Header comprises the first 64 bytes of the .TPU file. It
- contains LL's that effectively locate all other sections of the .TPU
- file plus statistics that enable a little cross-checking to be
- performed. Some parts of the Unit Header appear to be reserved for
- future use since no unit examined by this author has ever contained
- non-zero data in these apparently reserved fields.
-
-
-
- 4.1 DESCRIPTION
-
-
- The Unit Header provides a high-level locator table whereby each major
- structure in the unit file can be addressed. The following provides a
- Pascal-like explanation of the layout of the header followed by
- further narrative discussion of the contents of the individual fields
- in the Unit Header.
-
-      Type HdrAry = Array[0..3] of Char;
-
-           UnitHeader = Record
-
-             UHEYE : HdrAry;    { +00 : = 'TPU9'                     }
-             UHxxx : HdrAry;    { +04 : = $00000000                  }
-             UHUDH : LL;        { +08 : to Name Entry for This Unit  }
-             UHIHT : LL;        { +0A : to Hash Table (INTERFACE)    }
-             UHPMT : LL;        { +0C : to PROC Map                  }
-             UHCMT : LL;        { +0E : to CSeg Map                  }
-             UHTMT : LL;        { +10 : to DSeg Map-Typed CONST's    }
-             UHDMT : LL;        { +12 : to DSeg Map-GLOBAL Variables }
-             UHDLL : LL;        { +14 : to DLL List (Windows Only)   }
-             UHLDU : LL;        { +16 : to Donor Unit List           }
-             UHLSF : LL;        { +18 : to Source file List          }
-             UHDBT : LL;        { +1A : to Debug Trace Step Controls }
-             UHENC : LL;        { +1C : Size of Dictionary Area      }
-             UHZCS : Word;      { +1E : Size of CODE Group           }
-             UHZDT : Word;      { +20 : Size of Typed CONST Group    }
-             UHZFA : Word;      { +22 : Fix-Up Bytes (CODE Group)    }
-             UHZFT : Word;      { +24 : Fix-Up Bytes (Typed CONST's) }
-             UHZFV : Word;      { +26 : Size of GLOBAL VAR Data      }
-             UHDHT : LL;        { +28 : to Hash Table (DEBUG)        }
-             UHSOV : Word;      { +2A : Flags - Mostly Unknown       }
-             UHPad : Array[0..9]
-                       of Word; { +2C : Reserved for Future Expansion }
-
-           End; { UnitHeader }
-
- UHEYE contains the characters "TPU9" in that order. This is
- clear evidence that this unit was compiled by Turbo Pascal
- Version 6.0 or by Turbo Pascal for Windows Version 1.0.
-
- UHxxx is apparently reserved and contains binary zeros.
-
- UHUDH contains an LL (WORD) which points to the Name Entry in
- which the name of this unit is found.
-
- UHIHT contains an LL (WORD) which points to a Hash table that is
- the root of the Interface Dictionary graph.
-
- UHPMT contains an LL (WORD) which points to the PROC Map for
- this unit. The PROC Map contains an entry for each
- Procedure or Function declared in the unit (except for
- INLINE types), plus an entry for the Unit Initialization
- section. The length of the PROC Map (in bytes) is
- determined by subtracting this UHPMT from UHCMT.
-
- UHCMT contains an LL (WORD) which points to the CSeg (CODE
- Group) Map for this unit. The CSeg Map contains an entry
- for each CODE Segment produced by the compiler plus an
- entry for each of the CODE Segments included via the {$L
- filename.OBJ} compiler directive. The length of this Map
- (in bytes) is obtained by subtracting UHCMT from UHTMT.
- The result may be zero in which case the CSeg Map is
- empty.
-
- UHTMT contains an LL (WORD) which points to the DSeg (DATA
- Segment) Map that maps the initializing data for Typed
- CONST items plus templates for VMT's (Virtual Method
- Tables) and DMT's (Windows Dynamic Method Tables) that are
- associated with OBJECTS which employ Virtual Methods. The
- length of this Map (in bytes) is obtained by subtracting
- UHTMT from UHDMT. The result may be zero in which case
- this DSeg Map is empty.
-
- UHDMT contains an LL (WORD) which points to the DSeg (DATA
- Segment) Map that contains the specifications for DSeg
- storage required by VARiables whose scope is GLOBAL. The
- length of this Map (in bytes) is obtained by subtracting
- UHDMT from UHDLL. The result may be zero in which case
- this DSeg Map is empty.
-
- UHDLL contains an LL (WORD) which points to the DLL list in
- Windows. In Version 6.0, this is always zero.
-
- UHLDU contains an LL (WORD) which points to a table of units
- which contribute either CODE or DATA Segments to the .EXE
- file for a program using this Unit. This is called the
- "Donor Unit Table". The length of this table (in bytes)
- is obtained by subtracting UHLDU from the word UHLSF. The
- result may be zero in which case this table is empty.
-
- UHLSF contains an LL (WORD) which points to a list of "source"
- files. These are the files used as sources during
- compilation. Examples are the Pascal Source for the Unit
- itself, plus the .OBJ files linked via the {$L
- filename.OBJ} compiler directive. The length of this list
- (in bytes) is obtained by subtracting UHLSF from the word
- UHDBT. There should be at least one entry in this list.
-
- UHDBT contains an LL (WORD) which points to a Trace Table used
- by the DEBUGGER for "stepping" through a Function or
- Procedure contained in this Unit. The length of this
- table (in bytes) is obtained by subtracting UHDBT from the
- word UHENC. The result may be zero in which case this
- table is empty.
-
- UHZDA is the field declared as UHENC in the record above. It is a
- WORD that contains the total byte count of the Dictionary Area
- for this unit. All bytes up to and including the Trace Table
- are included in this count.
-
- UHZCS is a WORD that contains the total byte count of all CODE
- Segments compiled into this Unit.
-
- UHZDT is a WORD that contains the total byte count of all Typed
- CONST, DMT and VMT DATA Segments compiled into this unit.
-
- UHZFA is a WORD that contains the total byte count of the Fix-Up
- Data Table for this unit for CODE (CSegs).
-
- UHZFT is a WORD that contains the total byte count of the Fix-Up
- Data Table for Typed CONST's. This usually implies that a
- VMT or DMT is getting its pointers relocated.
-
- UHZFV is a WORD that contains the total byte count of all GLOBAL
- VAR DATA Segments compiled into this unit.
-
- UHDHT contains an LL (WORD) which points to a Hash Table which
- is the root of the DEBUGGER Dictionary. If Local Symbols
- were generated by the compiler (directive {$L+}) then ALL
- symbols declared in the unit can be accessed from this
- Hash Table. If Local Symbols were suppressed there is no
- such Dictionary and the LL stored here points to the
- INTERFACE Dictionary.
-
- UHSOV This word contains flags. I have only been able to expose
- a few of the values with any real confidence. Here's what
- I know so far (expressed by bit numbers 15..0):
-
- 15..13: always zero?
- 12: always zero for Version 6.0 (DOS) Compiler?
- 1=DISCARDABLE, 0=PERMANENT Windows Segment.
- 11..7: always zero?
- 6: always zero for Version 6.0 (DOS) Compiler?
- 1=PRELOAD, 0=DEMANDLOAD Windows Segment.
- 5: always zero?
- 4: always zero for Version 6.0 (DOS) Compiler?
- 1=MOVEABLE, 0=FIXED Windows Segment.
- 3: always zero?
- 2: 0=DOS Compiler, 1=WINDOWS Compiler?
- 1: 1=DOS Compiler with {$O+}, else zero?
- 0: Unclear. Seems to imply that either this unit,
- or one that it references requires emulation
- support but this is only a guess.
-
- UHPad begins a series of ten (10) words that are apparently
- reserved for future use. Nothing but zeros have ever been
- seen here by this author.
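-
- The subtractions and bit tests described in the field notes above can
- be sketched as follows. MapLen and FlagSet are names of my own, the
- header values are invented, and the flag meanings are only the
- tentative readings listed under UHSOV. Nothing here is claimed to be
- how the compiler itself does it.
-
```pascal
program HeaderMaps;
{ Sketch only: the header values below are invented for illustration. }

type
  LL = Word;

{ Length in bytes of a map that starts at ThisLL and is followed
  immediately by the structure located by NextLL. }
function MapLen(ThisLL, NextLL: LL): Word;
begin
  MapLen := NextLL - ThisLL;
end;

{ True if bit number Bit (0..15) is set in the UHSOV flag word. }
function FlagSet(Flags: Word; Bit: Byte): Boolean;
begin
  FlagSet := (Flags and (Word(1) shl Bit)) <> 0;
end;

var
  UHPMT, UHCMT, UHTMT, UHDMT: LL;
  UHSOV: Word;
begin
  UHPMT := $0200;  UHCMT := $0220;
  UHTMT := $0228;  UHDMT := $0228;       { empty CONST DSeg Map }
  UHSOV := $0054;                        { bits 6, 4 and 2 set  }

  WriteLn('PROC Map bytes       : ', MapLen(UHPMT, UHCMT));
  WriteLn('CSeg Map bytes       : ', MapLen(UHCMT, UHTMT));
  WriteLn('CONST DSeg Map bytes : ', MapLen(UHTMT, UHDMT));

  if FlagSet(UHSOV, 2) then
    WriteLn('bit 2 set    : Windows compiler?')
  else
    WriteLn('bit 2 clear  : DOS compiler?');
  if FlagSet(UHSOV, 4) then WriteLn('bit 4 set    : MOVEABLE segment');
  if FlagSet(UHSOV, 6) then WriteLn('bit 6 set    : PRELOAD segment');
  if FlagSet(UHSOV, 12) then WriteLn('bit 12 set   : DISCARDABLE segment');
end.
```
-
- The same subtraction applies to the later maps. Since UHDLL is
- reported above as always zero in Version 6.0 units, a reader of DOS
- units would have to decide which locator bounds the VAR DSeg Map;
- this sketch does not attempt that.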
-
-
-
- 4.2 UNIT SIZE
-
-
- An independent check on the size of the .TPU file is available using
- information contained in the Unit Header. This is also important for
- .TPL (Unit Library) organization. To compute the file size, refer to
- the five (5) words -- UHZDA, UHZCS, UHZDT, UHZFA, and UHZFT. Round
- the contents of each of these words to the lowest multiple of 16 that
- is greater than or equal to the content of that word. Then form the
- sum of the rounded words. This is the .TPU file size in bytes -- a
- LongInt result.
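-
- A minimal sketch of this computation follows. RoundUp16 and
- TpuFileSize are names of my own, and the five sample sizes are
- invented. The parameter called UHZDA is the field declared as UHENC
- in the header record.
-
```pascal
program UnitSize;
{ Sketch only: the sample sizes passed in below are invented. }

{ Round N up to the lowest multiple of 16 that is >= N. }
function RoundUp16(N: Word): LongInt;
begin
  RoundUp16 := ((LongInt(N) + 15) div 16) * 16;
end;

{ Sum of the five rounded section sizes taken from the Unit Header. }
function TpuFileSize(UHZDA, UHZCS, UHZDT, UHZFA, UHZFT: Word): LongInt;
begin
  TpuFileSize := RoundUp16(UHZDA) + RoundUp16(UHZCS) +
                 RoundUp16(UHZDT) + RoundUp16(UHZFA) +
                 RoundUp16(UHZFT);
end;

begin
  { 2615 -> 2624, 4660 -> 4672, 17 -> 32, 256 -> 256, 0 -> 0 }
  WriteLn('Expected file size: ',
          TpuFileSize(2615, 4660, 17, 256, 0), ' bytes');
end.
```
-
- With these sample values the sum is 7584 bytes. The result can be
- compared with the actual length of the file on disk as a consistency
- check.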
-
- A Unit MAY be larger than 64K bytes. I finally tumbled to this when I
- began to analyze the Windows Unit "WOBJECTS". I now feel that each of
- the sections referenced by the sizes above may be up to 64K bytes
- long. This implies an upper limit for unit size of around 320K bytes.
- My face is actually quite red over this. Since a Unit has always been
- capable of producing a 64K Code Segment not to mention a Data Segment
- of nearly the same size, I can't explain why the significance of these
- "size" words didn't dawn on me sooner.
-
-
-
- 5. SYMBOL DICTIONARIES
-
-
- This area contains all available documentation of declared symbols and
- procedure blocks defined within the unit. Depending on compiler
- options in effect when the unit was compiled, this section will
- contain at a minimum, the INTERFACE declarations, and at a maximum,
- ALL declarations. The information stored in the dictionary is highly
- dependent on the context of the symbol declared. We defer further
- explanation to the appropriate section which follows.
-
-
-
- 5.1 ORGANIZATION
-
-
- A dictionary is organized with a Hash Table as its root. The hash
- table is used to provide rapid access to identifiers.
-
- A dictionary may be thought of as a directed graph. Each subgraph is
- rooted in a hash table. There may be a great many hash tables in a
- given unit and their number depends on unit complexity as well as the
- options chosen when the unit was compiled. Use of the {$L+} directive
- produces the largest dictionaries. The hash tables are explained in
- detail a few sections further on.
-
- Hash tables point to Name Entries. When two or more symbols produce
- the same hash function result, a "collision" is said to occur.
- Collisions are resolved by the time-honored method of chaining
- together the Name Entries of those symbols having the same hash
- function result. Dictionary supersetting is accomplished using these
- chains.
-
-
-
- 5.2 INTERFACE DICTIONARY
-
-
- The INTERFACE dictionary contains all symbols and the necessary
- explanatory data for the INTERFACE section of a Unit. Symbols get
- added to the Unit using increasing storage addresses until the
- IMPLEMENTATION section is encountered.
-
-
-
- 5.3 DEBUG DICTIONARY
-
-
- The Debug dictionary (if present) is a superset of the INTERFACE
- dictionary. It is used by the Turbo Debugger to support its many
- features when tracing through a unit. If present, this dictionary is
- rooted in its own hash table. The hash table is effectively
- initialized when the IMPLEMENTATION keyword is processed by the
- compiler. This takes the form (initially) of an unmodified copy of
- the INTERFACE hash table, to which symbols are added in the usual
- fashion. Thus, the hash chains constructed or extended at this time
- lead naturally to the INTERFACE chains and this is how the superset is
- effectively implemented.
-
-
-
- 5.4 DICTIONARY ELEMENTS
-
-
- The dictionary contains four major elements. These are: hash tables,
- Name Entries, Name Stubs and Type Descriptors. The distinction
- between Name Entries and Name Stubs might appear to be rather
- arbitrary. They might just as easily be regarded as a single element
- (such as symbol entry). However, the case for the separate entity
- approach is strong since Stubs are DIRECTLY addressed via LG's and --
- more to the point -- ONLY by LG's. Thus, it seems reasonable that
- this is a separate and very important structure -- at least in the
- minds of the architects at Borland.
-
-
-
- 5.4.1 HASH TABLES
-
-
- As has been intimated, Hash Tables are the glue that binds the
- dictionary entries together and gives the dictionary its "shape".
- They effectively implement the scope rules of the language and speed
- access to essential information.
-
- Each Hash table begins with a 2-byte size descriptor. This descriptor
- contains the number of bytes in the table proper, less 2. Taken as a
- byte offset from the first bucket, the descriptor therefore locates
- the last bucket in the hash table. For a hash table of 128 bytes, the
- size descriptor contains 126. The first bucket in the table
- immediately follows the size descriptor.
-
-
-
- 5.4.1.1 SIZE
-
-
- So far, three different hash table sizes have been observed. The
- INTERFACE and DEBUG hash tables are usually 128 bytes (64 entries) in
- size plus 2 bytes of size description, but the SYSTEM.TPU unit is a
- special case, containing only 16 entries. Hash tables which anchor
- subgraphs whose scope is relatively local usually contain four (4)
- entries (8 bytes).
-
- Graphically, a Hash Table with four slots has the following layout:
-
- ┌────────────────────┐
- │ 0006h │ Size Descriptor
- ├════════════════════┤
- │ slot 0 │ an LL or zero
- ├────────────────────┤
- │ slot 1 │ an LL or zero
- ├────────────────────┤
- │ slot 2 │ an LL or zero
- ├────────────────────┤
- │ slot 3 │ an LL or zero
- └────────────────────┘
-
- It should be noted that the Size Descriptor furnishes an upper bound
- for the hash function itself. Thus, it seems possible that a single
- hash function is used for all hash tables and that its result is ANDed
- with the Size Descriptor to get the final result. Because the sizes
- are chosen as they are (powers of 2) this is feasible. Note that in
- the above example, 6 = 2 * (n - 1) where n = 4 {slot count}. All of
- the hash tables observed so far have this property.
-
- One final note on this subject. Given these properties, "Folding" of
- sparse hash tables is a rather trivial exercise so long as the new
- hash table also contains a number of slots that is a power of 2. This
- point is intriguing when one recalls that the System.TPU hash table
- has only 16 slots rather than the usual 64.
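The table layout and the masking idea can be sketched as follows. This is a Python illustration under stated assumptions: `image` is the unit file held as a bytes object, words are little-endian (as on any 8086), and `raw_hash` stands for the output of the compiler's hash function, which is not known to me. The ANDing step is the conjecture described above, not an established fact.

```python
import struct

def read_hash_table(image, offset):
    """Return (size_descriptor, slots) for the hash table at `offset`."""
    (size,) = struct.unpack_from("<H", image, offset)
    count = (size + 2) // 2                    # e.g. 0006h -> 4 slots
    slots = struct.unpack_from("<%dH" % count, image, offset + 2)
    return size, list(slots)

def slot_index(raw_hash, size):
    """Mask a raw hash with the size descriptor (conjectured) to pick a slot."""
    # size = 2 * (n - 1), so the AND yields an even byte offset in 0..size.
    return (raw_hash & size) // 2
```

Because the same raw hash is simply masked more narrowly for a smaller table, a 64-slot table and a 16-slot table would agree on the low-order bits of the slot index, which is what makes folding straightforward.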
-
-
-
- 5.4.1.2 SCOPE
-
-
- The INTERFACE and Debug dictionary hash tables are Global in Scope
- even though the symbols accessed directly via either hash table may be
- private. On the other hand, other hash tables are purely local in
- scope. For example, the fields declared within a record are reached
- via a small local hash table, as are the arguments and local variables
- declared within procedures and functions. Even OBJECTS use this
- technique to provide access to Methods and Object Fields.
-
- Access to such local scope fields/methods requires use of qualified
- names which ensures conformity to Pascal scope rules. The method is
- truly simple and elegant.
-
-
-
- 5.4.1.3 SPECIAL CASES
-
-
- The SYSTEM.TPU Unit is a special case. Its INTERFACE hash table has
- apparently been "hand-tuned" for small size and it contains only
- sixteen (16) entries. I have always felt that "hand-coding" must have
- been used to achieve the SYSTEM unit. The implications of the file
- "SYSTEM.TPS" required for compilation of the SYSTEM unit seem to
- support this opinion. Certainly, there are aspects of this unit that
- appear conventional, but there is much that is unique and apparently
- not the result of PASCAL coding. Library sources should help clarify
- this. (See 2.2 SYSTEM UNIT on page 8)
-
-
-
- 5.4.2 NAME ENTRIES
-
-
- This is the structure that anchors all information known by the
- compiler about any symbol. The format is as follows:
-
- DNameRec = RECORD
- HLink : LL; { Hash Chain Link; Resolves Collisions }
- DForm : Char; { Symbol Class }
- DSymb : STRING[63]; { Text of Symbol (UPPER-CASE) }
- END;
-
- HLink: An LL which points to the next (previous) symbol in the
- same unit which had the same hash function value.
-
- DForm: A character that defines the class the symbol belongs to
- and defines the format of the Name Stub which follows the
- Name Entry. If the symbol is declared in the component
- list of the "private" part of an Object declaration, then
- this character is modified by adding $80 to its ordinal
- value. Thus, an ordinary Function, Procedure or Method is
- of category "S" while a private Method is of category
- Chr(Ord('S')+$80).
-
- DSymb: A String (in the Pascal sense) of variable size that
- contains the text of the symbol (in UPPER-CASE letters
- only). The SizeOf function is not defined for these
- strings since they are truncated to match the symbol size.
- The "value" of the SizeOf function can be determined by
- adding 1 to the first byte in the string. Thus,
- Ord(Symbol[0])+1 is the expression that defines the Size
- of the symbol string. Turbo Pascal defines a symbol as a
- string of relatively arbitrary size, the most significant
- 63 characters of which will be stored in the dictionary.
- Thus, we conclude that the maximum size of such a string
- is 64 bytes.
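A Name Entry can be picked apart as shown in this Python sketch. It is an illustration, not the author's code: `image` is assumed to be the unit file as a bytes object, `offset` the position of the entry within it, and the dictionary keys are names chosen here.

```python
import struct

def read_name_entry(image, offset):
    """Decode the Name Entry at `offset`; also report where its stub begins."""
    hlink, form, length = struct.unpack_from("<HBB", image, offset)
    name = image[offset + 4 : offset + 4 + length].decode("ascii")
    return {
        "hlink": hlink,                      # LL to next entry on the hash chain
        "form": chr(form & 0x7F),            # class code with the private bit removed
        "private": bool(form & 0x80),        # $80 added for "private" components
        "name": name,                        # symbol text, upper-case
        "stub_offset": offset + 4 + length,  # the Name Stub follows immediately
    }
```

The entry occupies 2 bytes of HLink, 1 byte of DForm and Ord(Symbol[0])+1 bytes of string, so the stub starts 4 + length bytes past the entry.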
-
-
-
- 5.4.3 NAME STUBS
-
-
- Name Stubs immediately follow their respective Name Entries and their
- format is determined by the class code in the Name Entry. The
- function of the stub is to organize the information appropriate to the
- symbol and provide a means of accessing additional information such as
- type descriptors, constant values, parameter lists and nested scopes.
- The format of each Stub is presented in the following sub-sections.
-
-
-
- 5.4.3.1 LABEL DECLARATIVES ("O")
-
-
- This Stub consists of a WORD whose function is (as yet) unknown.
-
-
- 5.4.3.2 UN-TYPED CONSTANTS ("P")
-
-
- Format is as follows (CASE fragment):
-
- 'P':( { --- For Untyped Constants --- }
- sPTD : LG; { to type descriptor }
- sPV1 : LongInt; { constant value - size variable }
- );
-
- sPTD: An LG which points to a Type Descriptor (usually in
- SYSTEM.TPU). This establishes the minimum storage
- requirement for the constant. The rules vary with the
- type, but the size of the constant data field (which
- follows) is defined using the Type Descriptor(s).
-
- sPV1: The value of the constant. For ordinal types, this value
- is stored as a LONGINT (size=4 bytes). For Floating-Point
- types, the size is implicit in the type itself. For
- String types, the size is determined from the length of
- the string which is stored in the initial byte of the
- constant.
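The sizing rules for sPV1 can be written down as a small Python function. This is a sketch under assumptions: `type_code` and `type_size` are taken from the Prefix Part of the Type Descriptor that sPTD locates (see 5.4.4.2), `first_value_byte` is the first byte of the constant data, and any type class not mentioned above is rejected rather than guessed at.

```python
def untyped_const_size(type_code, type_size, first_value_byte):
    """Number of bytes occupied by the constant value in a "P" stub."""
    if type_code == 0x09:                       # STRING: length byte plus text
        return 1 + first_value_byte
    if type_code in (0x0A, 0x0B):               # 8087 float or REAL: size of the type
        return type_size
    if type_code in (0x0C, 0x0D, 0x0E, 0x0F):   # ordinals are stored as a LONGINT
        return 4
    raise ValueError("no sizing rule known for type code %02Xh" % type_code)
```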
-
-
-
- 5.4.3.3 NAMED TYPES ("Q")
-
-
- This Stub consists of an LG (4-bytes) that points to the Type
- Descriptor for this symbol.
-
-
-
- 5.4.3.4 VARIABLES, FIELDS, TYPED CONS ("R")
-
-
- This Stub contains information required to allocate and describe these
- types of entities. The format and content is as follows:
-
- 'R': ( { -- Variable, Field, Object -- }
- sRAM: Byte; { allocation method codes: }
- sRVF: CASE sRAM: Byte Of
- $02,$06,
- $22,$26: (ROfs : Word; { allocation offset (BP) }
- ROB : LL); { To Parent Scope/Zero }
- $00,$01: (TOfs : Word; { allocation offset in map}
- TOB : Word); { offset in VAR/CONST Map }
- $03: (AFar : Pointer); { FAR Pointer to Location }
- $08: (BOfs : Word; { Offset-Record Relative }
- RChn : LL); { To Next Field/Method }
- $10: (QLG : LG); { to Stub of Allocator }
- END;
- sRTD: LG); { to Type Descriptor }
-
- sRAM: A one-byte flag that precisely identifies the class of the
- item being described. The known values and their apparent
- meanings follow:
-
- $00 -> Global Variables (Allocated in DS);
- $01 -> Typed Constants (Allocated in DS);
- $02 -> Procedure LOCAL Variables on STACK;
- $03 -> Variables at Absolute Addresses;
- $06 -> ADDRESS Arguments allocated on STACK; (This is now
- used only for SELF in Method calls;)
- $08 -> Fields sub-allocated in RECORDS and OBJECTS, plus
- METHODS declared for OBJECTS.
- $10 -> Variable Equivalenced to another via the
- Absolute Clause;
- $22 -> Arguments whose VALUEs are passed on the stack;
- $26 -> Arguments whose ADDRESSes are passed on the stack.
-
- sRVF: Two words whose content varies with sRAM above. They are
- shown as case variants in the following:
-
- $02,$06,$22,$26: {arguments}
-
- sRVF.ROfs: Word -- Offset relative to either DS or BP.
- sRVF.ROB: Word -- LL to Dict Header of Parent Scope, or zero.
-
- $00,$01: {VAR's or typed CONSTs}
-
- sRVF.TOfs: Word -- Offset relative to allocation area origin;
- sRVF.TOB: Word -- Offset to entry in VAR/CONST Map for item
- allocation;
-
- $03: {Absolute Address Variable}
-
- sRVF.AFar: POINTER -- FAR Pointer to Absolute Memory Address.
-
- $08: {Record/Object Fields/Methods}
-
- sRVF.BOfs: Word -- Allocation Offset within Record/Object;
- sRVF.RChn: Word -- LL to next Field/Method.
-
- $10: {Absolute Equivalences}
-
- sRVF.QLG: LG -- LG to STUB of variable/parameter declaration
- that actually establishes the allocation;
-
- sRTD: An LG that locates the proper Type Descriptor for this
- symbol.
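The "R" stub is a fixed nine bytes: sRAM, the two words of sRVF, and the LG in sRTD. The Python sketch below decodes it by sRAM value. It is an illustration only; `image` is assumed to be the unit file as a bytes object, LGs are returned as four raw bytes without interpretation, and offsets are reported as unsigned words even where a BP-relative value may really be negative.

```python
import struct

def read_var_stub(image, offset):
    """Decode the 9-byte "R" stub at `offset`."""
    sram = image[offset]
    w0, w1 = struct.unpack_from("<HH", image, offset + 1)
    if sram in (0x02, 0x06, 0x22, 0x26):      # locals and arguments on the stack
        variant = {"ROfs": w0, "ROB": w1}
    elif sram in (0x00, 0x01):                # global VARs and typed CONSTs
        variant = {"TOfs": w0, "TOB": w1}
    elif sram == 0x03:                        # absolute address: offset word first
        variant = {"AFar": (w1, w0)}          # reported as (segment, offset)
    elif sram == 0x08:                        # record/object fields and methods
        variant = {"BOfs": w0, "RChn": w1}
    elif sram == 0x10:                        # absolute equivalence
        variant = {"QLG": bytes(image[offset + 1 : offset + 5])}
    else:                                     # value not listed above
        variant = {"raw": (w0, w1)}
    return {
        "sRAM": sram,
        "variant": variant,
        "sRTD": bytes(image[offset + 5 : offset + 9]),
        "next_offset": offset + 9,
    }
```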
-
-
-
- 5.4.3.5 SUBPROGRAMS & METHODS ("S")
-
-
- Subprograms (PROC's), especially since Object Methods are supported,
- have a rather involved stub. Its format is as follows:
-
- 'S': ( { ------ User Subprograms ----- }
- sSTp : Byte; { BIT Encoded Flags }
- sSxx : Byte; { More Attribute Flags? }
- sSPM : Word; { Code byte count if INLINE, }
- { else, offset to PROC Map }
- sSPS : LL; { to containing scope or zero }
- sSHT : LL; { to local scope hash table }
- sSVM : Word); { VMT Offset-VIRTUAL Method PTR }
-
- sSTp: A byte that contains bit-switches that seem to describe
- the Call Model and imply the size of this stub. These
- switches determine what kind of code (if any) is generated
- when the PROC is referenced. The observed values are as
- follows:
-
- xxxxx001 -> PROC uses FAR Call Model;
- xxxx0010 -> PROC uses INLINE Model (no Call);
- xxxx0100 -> PROC uses INTERRUPT Model (no Call);
- xxxx100x -> PROC has EXTERNAL attribute;
- xxx1xxxx -> PROC uses METHOD Call Model;
- x011xxxx -> PROC is a CONSTRUCTOR Method;
- x101xxxx -> PROC is a DESTRUCTOR Method;
- 1xxxxxxx -> PROC has ASSEMBLER directive.
-
- sSxx: A byte whose function is not yet fully known. In the
- Windows compiler it is copied into the PROC Map -
- presumably for use by the linker or debugger. Bit
- positions firmly established are as follows (7..0):
-
- 7-6: always zero?
- 5: ????
- 4: Dynamic Call Model using DMT
- 3-2: 11 = DLL PROC Referenced by NAME
- 01 = DLL PROC Referenced by INDEX
- 1: ????
- 0: always zero???
-
- sSPM: A Word whose interpretation depends on whether or not we
- have an INLINE Declarative Subprogram. If this is an
- INLINE Declarative Subprogram, then this word contains the
- byte-count of the INLINE code text at the end of this
- stub. Otherwise, this word is the offset within the PROC
- Map that locates the object code for this Subprogram.
-
- sSPS: A Word that contains an LL which locates the containing
- scope in the dictionary, or zero if none.
-
- sSHT: A Word that contains an LL which locates the local Hash
- Table for this scope. A local hash table provides access
- to all formal parameters of the Subprogram as well as all
- Symbols whose declarations are local to the scope of this
- Subprogram.
-
- sSVM: A Word that is zero unless the symbol is a Virtual Method.
- In that case, the content is the offset within the VMT of
- the owning object at which the FAR POINTER to this Virtual
- Method is stored.
-
- +0A: A complete Type-Descriptor for this Subprogram. The
- length is variable and depends upon the number of Formal
- Parameters declared in the header. (See 5.4.4.3.5 on page
- 34).
-
- +??: If this Symbol represents an INLINE Declarative
- Subprogram, then the object-code text begins here. The
- byte-count of the text is stored in sSPM in this stub.
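The fixed part of the "S" stub is ten bytes, which is why the Type Descriptor sits at +0A. The Python sketch below reads that fixed part and turns the sSTp bit patterns into labels. It is an illustration, not the author's code: `image` is assumed to be the unit file as a bytes object, and each mask is a direct transcription of the corresponding pattern in the table above (an "x" position is simply left out of the mask).

```python
import struct

def decode_proc_flags(sstp):
    """Translate the sSTp byte into the attribute labels listed above."""
    flags = []
    if sstp & 0x07 == 0x01: flags.append("FAR")          # xxxxx001
    if sstp & 0x0F == 0x02: flags.append("INLINE")       # xxxx0010
    if sstp & 0x0F == 0x04: flags.append("INTERRUPT")    # xxxx0100
    if sstp & 0x0E == 0x08: flags.append("EXTERNAL")     # xxxx100x
    if sstp & 0x10:         flags.append("METHOD")       # xxx1xxxx
    if sstp & 0x70 == 0x30: flags.append("CONSTRUCTOR")  # x011xxxx
    if sstp & 0x70 == 0x50: flags.append("DESTRUCTOR")   # x101xxxx
    if sstp & 0x80:         flags.append("ASSEMBLER")    # 1xxxxxxx
    return flags

def read_proc_stub(image, offset):
    """Decode the 10-byte fixed part of the "S" stub at `offset`."""
    sstp, sxx, spm, sps, sht, svm = struct.unpack_from("<BBHHHH", image, offset)
    return {
        "flags": decode_proc_flags(sstp),
        "sSxx": sxx,
        "sSPM": spm,                          # INLINE byte count or PROC Map offset
        "sSPS": sps,                          # LL to containing scope, or zero
        "sSHT": sht,                          # LL to local hash table
        "sSVM": svm,                          # VMT offset for virtual methods
        "type_descriptor_offset": offset + 10,
    }
```

For an INLINE subprogram the code text begins after the variable-length Type Descriptor, so its position cannot be computed from this fixed part alone.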
-
-
-
- 5.4.3.6 TURBO STD PROCEDURES ("T")
-
-
- This Stub consists of two bytes, the first of which is unique for each
- procedure and increments by 4. I have found nothing in the SYSTEM
- unit (which is where this entry appears) to which this byte seems
- directly related. The second byte is always zero.
-
-
-
- 5.4.3.7 TURBO STD FUNCTIONS ("U")
-
-
- This Stub consists of two bytes, the first of which is unique for each
- function and increments by 4. I have found nothing in the SYSTEM unit
- (which is where this entry appears) to which this byte seems directly
- related. I wouldn't be surprised if this byte were an index into a TURBO
- compiler table that points to specialized parse tables/action routines
- for handling these functions and their non-standard parameter lists.
-
- The second byte seems to be a flag having the values $00, $40 and $C0.
- I strongly suspect that the flag $C0 marks exactly those functions
- which may be evaluated at compile-time. The meaning behind the other
- values is not known to me.
-
-
-
- 5.4.3.8 TURBO STD "NEW" ROUTINE ("V")
-
-
- This Stub consists of a WORD whose function is (as yet) unknown. This
- is the only Standard Turbo routine that can behave as a procedure as
- well as a function (returning a pointer value).
-
-
- 5.4.3.9 TURBO STD PORT ARRAYS ("W")
-
-
- This Stub consists of a byte whose value is 0 for byte arrays, and 1
- for word arrays.
-
-
-
- 5.4.3.10 TURBO STD EXTERNAL VARIABLES ("X")
-
-
- This Stub consists of an LG (4-bytes) that points to the Type
- Descriptor for this symbol. (These are used for the arrays MEM, MEMW
- and MEML.)
-
-
-
- 5.4.3.11 UNITS ("Y")
-
-
- Unit Stubs have the following content:
-
- +00: A Word that is apparently reserved for use by the Compiler
- or Linker.
-
- +02: A Word that seems to contain some kind of "signature" used
- to detect inconsistent Unit Versions. Borland calls this
- a "unit version number, which is basically a checksum of
- the interface part." I have seen a thread in CIS which
- says that it is a CRC value. Food for thought?
-
- +04: A Word that contains an LL which locates the Successor
- Unit in the "Uses" list. In fact, the "Uses" lists of
- both the INTERFACE and IMPLEMENTATION sections of the Unit
- are merged by this Word into a single list. A value of
- zero is used to indicate no successor.
-
- +06: A Word that contains an LL which locates the Predecessor
- Unit in the "Uses" list. For the SYSTEM unit entry, this
- value is always zero to indicate no predecessor. For the
- Unit being compiled, this LL locates the final Unit in the
- combined "Uses" list.
-
- In effect, the two LL's at offsets 0004 and 0006 organize the units
- into both forward and backward linked chains. The entry for the unit
- being compiled is effectively the head of both the forward and the
- backward chains. The final unit in the merged "Uses" list is the tail
- of the forward chain, and the SYSTEM unit is the tail of the backward
- chain.
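Following the forward chain is a simple loop, sketched here in Python. Several things are assumed for the purpose of illustration: `image` is the unit file as a bytes object, an LL is taken to be a byte offset from the start of that image, and `first_entry` is the offset of the Name Entry for the unit being compiled, which the caller must supply. The `seen` set is only a guard against a damaged file that loops.

```python
import struct

def uses_chain(image, first_entry):
    """Return unit names in forward-chain order, starting at `first_entry`."""
    names = []
    seen = set()
    entry = first_entry
    while entry and entry not in seen:
        seen.add(entry)
        length = image[entry + 3]                  # length byte of DSymb
        names.append(image[entry + 4 : entry + 4 + length].decode("ascii"))
        stub = entry + 4 + length                  # "Y" stub follows the name
        (entry,) = struct.unpack_from("<H", image, stub + 4)   # successor LL
    return names
```

Reading the word at stub + 6 instead of stub + 4 would walk the backward chain in the same way.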
-
-
-
- 5.4.4 TYPE DESCRIPTORS
-
-
- Type Descriptors store much of the semantic information that applies
- to the symbols declared in the unit. Implementation details can be
- managed using high-level abstractions and these abstractions can be
- shared.
-
-
-
- 5.4.4.1 SCOPE
-
-
- Type Descriptor sharing can occur across the boundaries which are
- implicit in unit modules. Thus, a type defined in one unit may be
- "imported" by some other module. Also, the pre-defined Pascal Types
- (plus the Turbo Pascal extensions) are defined in the SYSTEM.TPU unit
- and there needs to be a means of "importing" such Type Descriptors
- during compilation. This is precisely the objective of the LG locator
- (see Section 3.2 on Page 9). Type Descriptors are NEVER copied
- between units. The binding always occurs by reference at compile time
- and this helps support the technique of modifying a unit and compiling
- it to a .TPU file, then re-compiling all units/programs that "USE" it.
-
- Type Descriptors have many roles so their format varies. We have
- divided these structures into two parts: the PREFIX Part, which is
- always present and whose format is fairly constant, and the SUFFIX
- Part, whose content and format depend on the attributes that are part
- of the type definition.
-
-
-
- 5.4.4.2 PREFIX PART
-
-
- The Prefix Part of every Type Descriptor consists of six (6) bytes.
- The usage is consistent for all types observed by this author and the
- format is as follows:
-
- +00: A Byte that identifies the format of the Suffix part.
- This is essentially based on several high-level categories
- which the Suffix Parts support directly. The observed set
- of values is as follows:
-
- 00h -> an un-typed entity;
- 01h -> an ARRAY type;
- 02h -> a RECORD type;
- 03h -> an OBJECT type;
- 04h -> a FILE type (other than TEXT);
- 05h -> a TEXT File type;
- 06h -> a SUBPROGRAM type;
- 07h -> a SET type;
- 08h -> a POINTER type;
- 09h -> a STRING type;
- 0Ah -> an 8087 Floating-Point type;
- 0Bh -> a REAL type;
- 0Ch -> a Fixed-Point ordinal type;
- 0Dh -> a BOOLEAN type;
- 0Eh -> a CHAR type;
- 0Fh -> an Enumerated ordinal type.
-
- +01: A Byte used as a modifier. Since the above scheme is too
- general for machine-dependent details such as storage
- width and sign control, this modifier byte supplies
- additional data. The author has identified several cases
- in which this information is vital but has not spent very
- much time on the subject. The chief areas of importance
- seem to be in the 8087 Floating-Point types, and the
- Fixed-Point ordinal types. The semantics seem to be as
- follows:
-
- 0A 00 -> The type "SINGLE"
- 0A 02 -> The type "EXTENDED"
- 0A 04 -> The type "DOUBLE"
- 0A 06 -> The type "COMP"
-
- 0C 00 -> an un-named BYTE integer
- 0C 01 -> The type "SHORTINT"
- 0C 02 -> The type "BYTE"
- 0C 04 -> an un-named WORD integer
- 0C 05 -> The type "INTEGER"
- 0C 06 -> The type "WORD"
- 0C 0C -> an un-named double-word integer
- 0C 0D -> The type "LONGINT"
-
- +02: A Word that contains the number of bytes of storage that
- are required to contain an object/entity of this type.
- For types that represent variable-length objects/entities
- such as strings, this word may define the value returned
- by the SIZEOF function as applied to the type.
-
- This word is probably of value during compilation of
- un-typed CONST's since the size of their Stubs depends on this
- field. For STRING types however, the length descriptor is
- part of the string itself.
-
- +04: A Word that is zero (for DOS units) unless the descriptor
- is for an Object Method. In this case, the content is an
- LL to the Name Entry of the SUCCEEDING Method for the
- Object, in order of declaration, or zero if none. Some
- Windows units (e.g., SYSTEM) have non-zero values here
- whose function is not known.
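The six-byte Prefix Part can be decoded as in the Python sketch below. It is an illustration only: `image` is assumed to be the unit file as a bytes object, the short kind labels are abbreviations invented here for the sixteen categories listed above, and only the modifier values that correspond to named types in the table are recognized.

```python
import struct

# Labels for format codes 00h..0Fh, in the order listed above.
TYPE_KINDS = ["UNTYPED", "ARRAY", "RECORD", "OBJECT", "FILE", "TEXT",
              "SUBPROGRAM", "SET", "POINTER", "STRING", "FLOAT87", "REAL",
              "INTEGER", "BOOLEAN", "CHAR", "ENUM"]

# (format code, modifier) pairs that identify a named type.
NAMED_TYPES = {
    (0x0A, 0x00): "SINGLE", (0x0A, 0x02): "EXTENDED",
    (0x0A, 0x04): "DOUBLE", (0x0A, 0x06): "COMP",
    (0x0C, 0x01): "SHORTINT", (0x0C, 0x02): "BYTE",
    (0x0C, 0x05): "INTEGER", (0x0C, 0x06): "WORD",
    (0x0C, 0x0D): "LONGINT",
}

def read_type_prefix(image, offset):
    """Decode the 6-byte Prefix Part of the Type Descriptor at `offset`."""
    kind, modifier, size, link = struct.unpack_from("<BBHH", image, offset)
    return {
        "kind": TYPE_KINDS[kind] if kind < len(TYPE_KINDS) else "?%02X" % kind,
        "modifier": modifier,
        "named": NAMED_TYPES.get((kind, modifier)),   # None if not in the table
        "size": size,                                 # storage bytes for the type
        "next_method": link,                          # LL, or zero
        "suffix_offset": offset + 6,
    }
```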
-
-
-
- 5.4.4.3 SUFFIX PARTS
-
-
- Suffix Parts further refine the implementation details of the type and
- also provide subrange constraints where appropriate. In some cases
- the Suffix part is empty since all semantic data for the type is
- contained in the Prefix part.
-
-
-
- 5.4.4.3.1 UN-TYPED
-
-
- This Suffix Part is empty. Nothing is known about an un-typed entity.
-
-
-
- 5.4.4.3.2 STRUCTURED TYPES
-
-
- The structured types represent aggregates of lower-level types. We
- include ARRAY, RECORD, OBJECT, FILE, TEXT, SET, POINTER and STRING
- types in this category.
-
-
-
- 5.4.4.3.2.1 ARRAY TYPES
-
-
- The Suffix Part of the ARRAY type is so constructed as to be able to
- support recursive or nested definition of arrays. The suffix format
- is as follows:
-
- +00: An LG that locates the Type Descriptor for the "base-type"
- of the array. This is the type of the entity being
- arrayed (which may itself be an array).
-
- +04: An LG that locates the Type Descriptor for the array
- bounds which is a constrained ordinal type or subrange.
-
-
-
- 5.4.4.3.2.2 RECORD TYPES
-
-
- RECORD types have nested scopes. The Suffix part provides a base
- structure by which to locate the fields local to the scope of the
- Record type itself. The format is as follows:
-
- +00: A Word containing an LL which locates the local Hash Table
- that provides access to the fields in the nested scope.
-
- +02: A Word containing an LL which locates the Name Entry of
- the initial field in the nested scope. This supports a
- "left-to-right" traversal of the fields in a record.
-
-
-
- 5.4.4.3.2.3 OBJECT TYPES
-
-
- OBJECT types also have nested scopes. The Suffix part provides a base
- structure by which to locate the fields and METHODS local to the scope
- of the OBJECT type itself. In addition, inheritance and VMT
- particulars are stored. The format is as follows:
-
- +00: A Word containing an LL which locates the local Hash Table
- that provides access to the fields and METHODS local to
- the nested scope.
-
- +02: A Word containing an LL which locates the Name Entry of
- the initial field or METHOD in the nested scope. This
- supports a "left-to-right" traversal of the fields and
- METHODS in an OBJECT.
-
- +04: An LG which locates the Type Descriptor of the Parent
- Object. This field is zero if there is no such Parent.
-
- +08: A Word which contains the size in bytes of the VMT for
- this Object. This field is zero if the object employs no
- Virtual Methods, Constructors or Destructors.
-
- +0A: A Word which contains the offset within the CONST DSeg Map
- that locates the VMT skeleton or template segment. This
- field equals FFFFh if the object employs no Virtual
- Methods, Constructors or Destructors.
-
- +0C: A Word which contains the offset within an Object instance
- where the NEAR POINTER to the VMT for the object is stored
- (within the DATA SEGMENT). This field equals FFFFh if the
- object employs no Virtual Methods, Constructors or
- Destructors.
-
- +0E: A Word which contains an LL which locates the Name Entry
- for the name of the OBJECT itself.
-
- +10: A Word containing $FFFF in DOS units. In WINDOWS units
- this word contains the offset within the CONST DSeg Map
- that locates the DMT skeleton or template segment. This
- field equals FFFFh if the object employs no Dynamic
- Methods.
-
- +12: Three Words (not yet understood) containing zeroes.
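The OBJECT suffix is 24 bytes long (18h), counting the three trailing words. A Python sketch of a decoder follows. It is an illustration under assumptions: `image` is the unit file as a bytes object, the key names are invented here, the parent LG is kept as four raw bytes, and the FFFFh and zero sentinels described above are mapped to None.

```python
import struct

def read_object_suffix(image, offset):
    """Decode the 24-byte OBJECT Suffix Part at `offset`."""
    (hash_table, first_member, parent, vmt_size, vmt_map_ofs, vmt_ptr_ofs,
     name_entry, dmt_map_ofs, reserved) = struct.unpack_from(
        "<HH4sHHHHH6s", image, offset)

    def unless_ffff(word):
        return None if word == 0xFFFF else word

    return {
        "hash_table": hash_table,                  # LL to local hash table
        "first_member": first_member,              # LL to first field or method
        "parent": None if parent == bytes(4) else parent,   # LG, zero if no parent
        "vmt_size": vmt_size,
        "vmt_map_ofs": unless_ffff(vmt_map_ofs),   # offset in CONST DSeg Map
        "vmt_ptr_ofs": unless_ffff(vmt_ptr_ofs),   # offset of VMT pointer in instance
        "name_entry": name_entry,                  # LL to the object's Name Entry
        "dmt_map_ofs": unless_ffff(dmt_map_ofs),   # Windows units only
        "reserved": reserved,                      # three words, zero so far
        "next_offset": offset + 24,
    }
```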
-
-
-
- 5.4.4.3.2.4 FILE (NON-TEXT) TYPES
-
-
- This Suffix consists of an LG that locates the Type Descriptor of the
- base type of the file. Note that the Type Descriptor may be that of
- an un-typed entity (for un-typed files).
-
-
-
- 5.4.4.3.2.5 TEXT FILE TYPES
-
-
- This Suffix consists of an LG that locates the Type Descriptor of the
- base type of the file -- in this case SYSTEM.CHAR.
-
-
-
- 5.4.4.3.2.6 SET TYPES
-
-
- This Suffix consists of an LG that locates the base-type of the set
- itself. Pascal limits such entities to simple ordinals whose
- cardinality is limited to 256.
-
-
-
- 5.4.4.3.2.7 POINTER TYPES
-
-
- This Suffix consists of an LG that locates the base-type of the entity
- pointed at.
-
-
-
- 5.4.4.3.2.8 STRING TYPES
-
-
- This is a special case of an ARRAY type. The format is as follows:
-
- +00: An LG to the Type Descriptor SYSTEM.CHAR which is the base
- type of all Turbo Pascal Strings.
-
- +04: An LG to the Type Descriptor for the array bounds
- constraints for the string. When the unconstrained STRING
- type is used, this points to SYSTEM.BYTE which is defined
- as a subrange 0..255.
-
-
-
- 5.4.4.3.3 FLOATING-POINT TYPES
-
-
- The Suffix part for all Floating-Point types is EMPTY. All data
- needed to specify these approximate number types is contained in the
- Prefix part. The Types included in this class are SINGLE, DOUBLE,
- EXTENDED, COMP and REAL.
-
-
-
- 5.4.4.3.4 ORDINAL TYPES
-
-
- The Ordinal Types consist of the various "integer" types plus the
- BOOLEAN, CHAR and Enumerated types.
-
-
- 5.4.4.3.4.1 "INTEGERS"
-
-
- These types include BYTE, SHORTINT, WORD, INTEGER and LONGINT. Their
- Suffix parts are identical in format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor of the largest
- upward compatible type. This is the Type Descriptor that
- is used to control the width of an un-typed constant in
- the dictionary stub. For the "integer" types, this is an
- LG to SYSTEM.LONGINT.
-
-
-
- 5.4.4.3.4.2 BOOLEANS
-
-
- This type Suffix has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor SYSTEM.BOOLEAN.
- There is no "upward compatible" type.
-
-
-
- 5.4.4.3.4.3 CHARS
-
-
- This type Suffix has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor SYSTEM.CHAR. There
- is no "upward compatible" type.
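The "integer", BOOLEAN and CHAR suffixes described so far share one 12-byte layout, so a single reader serves all three. The Python sketch below is an illustration: `image` is assumed to be the unit file as a bytes object, the bounds are assumed to be signed 32-bit values (as a LONGINT would be), and the LG is returned as four raw bytes.

```python
import struct

def read_ordinal_suffix(image, offset):
    """Decode the 12-byte ordinal Suffix Part at `offset`."""
    lower, upper, compat = struct.unpack_from("<ll4s", image, offset)
    return {
        "lower": lower,                 # lower bound of the subrange
        "upper": upper,                 # upper bound of the subrange
        "compatible_type": compat,      # LG, e.g. to SYSTEM.LONGINT for integers
        "next_offset": offset + 12,
    }
```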
-
-
-
- 5.4.4.3.4.4 ENUMERATIONS
-
-
- This type Suffix is unusual and has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Prefix of the current Type
- Descriptor. There is no upward compatible type.
-
- What follows is a full-fledged SET Type Descriptor whose base type is
- the Type Descriptor of the Enumerated Type itself. The author has not
- yet discovered the reason for this.
-
- At least one case has been observed where a set type descriptor is
- followed by a word containing zero but I know of no explanation.
- Could this be a (shudder) BUG in Turbo?
-
-
-
- 5.4.4.3.5 SUBPROGRAM TYPES
-
-
- The length of this Suffix is variable. The format is as follows:
-
- +00: An LG that locates the Type Descriptor of the FUNCTION
- result returned by the Subprogram. This field is zero if
- the Subprogram is a PROCEDURE.
-
- +04: A Word that contains the number of Formal Parameters in
- the Function/Procedure header. If non-zero, then this
- word is followed by the parameter list itself as a simple
- array of parameter descriptors.
-
- The format of a parameter descriptor is as follows:
-
- +00: An LG that locates the Type Descriptor of the
- corresponding parameter;
-
- +04: A Byte that identifies the parameter passing
- mechanism used for this entry as follows:
-
- 02h -> VALUE of parameter is passed on STACK,
- 06h -> ADDRESS of parameter is passed on STACK.
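-
- The variable-length suffix can be walked as sketched below. This is
- an illustration only: parse_subprogram_suffix is a hypothetical
- name, the parameter descriptors are assumed to be packed as 5 bytes
- each (a 4-byte LG plus the mechanism byte) with no padding, and the
- LGs are returned as raw bytes.
-
```python
import struct

# Mechanism codes taken from the table above.
PASSING = {0x02: "value", 0x06: "address"}


def parse_subprogram_suffix(buf: bytes, offset: int = 0):
    result_lg = bytes(buf[offset:offset + 4])            # +00: result type LG
    (count,) = struct.unpack_from("<H", buf, offset + 4)  # +04: parameter count
    params = []
    pos = offset + 6
    for _ in range(count):
        type_lg = bytes(buf[pos:pos + 4])   # LG of the parameter's type
        mech = buf[pos + 4]                 # passing mechanism byte
        params.append((type_lg, PASSING.get(mech, "unknown")))
        pos += 5                            # assumed packed descriptor size
    is_procedure = result_lg == b"\x00\x00\x00\x00"
    return is_procedure, result_lg, params


# Hypothetical PROCEDURE with one value parameter and one VAR parameter.
sample = (b"\x00\x00\x00\x00" + struct.pack("<H", 2)
          + b"\x10\x00\x00\x00\x02" + b"\x20\x00\x00\x00\x06")
print(parse_subprogram_suffix(sample))
```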
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 34
-
- 6. MAPS AND LISTS
-
-
- The "MAPS and LISTS" are not part of the symbol dictionary. Rather,
- these structures provide access to the Code and Data Segments produced
- by the compiler or included via the {$L name.OBJ} directive. The
- format and purpose (as understood by this author) of each of these
- tables is explained in the following sections.
-
-
-
- 6.1 PROC MAP
-
-
- The PROC Map provides a means of associating the various Function and
- Procedure declarations with Code Segments and DLL's. There is some
- evidence that the Compiler produces CODE (and DATA) Segments for EACH
- of the Subprograms defined in the Unit as well as for the un-named
- Unit Initialization code block. There is also evidence that EXTERNAL
- PROCs must be assembled separately in order to exploit fully the Turbo
- "Smart Linker" since Turbo Pascal places some significant restrictions
- on EXTERNAL routines in the area of Segment Names and Types.
- Specifically, only code segments named "CODE" and data segments named
- "DATA" or "CONST" will be used by the "Smart Linker" as sources of
- code and data for inclusion in a Turbo Pascal .EXE file. (Turbo 6.0
- relaxed Name constraints but only one code segment per .OBJ remains a
- limitation).
-
- The first entry in the PROC Map is reserved for the Unit
- Initialization block. If there is no such block, this entry will be
- marked with $FFFF. In addition, each and every PROC in the Unit has
- an entry in this table (except for INLINE procs).
-
- If an EXTERNAL routine is included, then ALL PUBLIC PROC definitions
- in that routine must be declared in the Unit Source Code with the
- EXTERNAL attribute.
-
- The size of the PROC Map Table (in Bytes) is implied in the Unit
- Header by the LL's named UHPMT and UNCMT.
-
- The Format of a single PROC Map Entry is as follows:
-
- +00: A Word presumably reserved as a work area; always zero.
-
- +02: A Word which contains Flags copied from sSxx in the Stub
- for the Subprogram. This word is always zero for the DOS
- compiler. (see 5.4.3.5, page 24)
-
- +04: A Word that contains an offset within the CSeg Map. This
- is used to locate the code segment containing the PROC.
- If the PROC is found in a DLL, then this word is an offset
- within the DLL List to the DLL name (i.e., the file with
- the .DLL extension).
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 35
-
- +06: A Word that contains an offset within the CODE Segment
- that defines the PROC entry point relative to the load
- point of the referenced CODE Segment if local to this
- unit. For DLL PROCS referenced by "INDEX" this word is
- the procedure "INDEX" number within the DLL. For DLL
- PROCS referenced by "NAME" this word is an offset to that
- name which is stored in the DLL List.
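-
- A PROC Map of this shape is an array of four little-endian Words,
- so it could be read as in the following sketch. The names are
- hypothetical, and the text does not say which Word carries the $FFFF
- marker for a missing initialization block; the sketch assumes it is
- the Word at +04.
-
```python
import struct
from typing import NamedTuple


class ProcMapEntry(NamedTuple):
    work: int         # +00: work area, always zero
    flags: int        # +02: flags copied from the stub
    cseg_or_dll: int  # +04: offset in the CSeg Map (or DLL List)
    entry: int        # +06: entry offset, DLL index, or DLL name offset


def parse_proc_map(buf: bytes):
    # iter_unpack requires len(buf) to be a multiple of the 8-byte entry.
    return [ProcMapEntry(*fields) for fields in struct.iter_unpack("<4H", buf)]


def has_unit_init(entries) -> bool:
    # Assumption: a missing initialization block shows $FFFF at +04.
    return bool(entries) and entries[0].cseg_or_dll != 0xFFFF


# Hypothetical map: no initialization block, then one PROC.
sample = struct.pack("<4H", 0, 0, 0xFFFF, 0) + struct.pack("<4H", 0, 0, 8, 0x20)
entries = parse_proc_map(sample)
print(entries, has_unit_init(entries))
```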
-
-
-
- 6.2 CSEG MAP
-
-
- The CSeg Map provides a convenient descriptor table for each CODE
- Segment present in the Unit and serves to relate these segments with
- the Segment Relocation Data and the Segment Trace Table. It seems
- reasonable to infer that the "Smart Linker" is able to include/exclude
- code/data at the SEGMENT level only.
-
- The CSeg Map is an array of fixed-length records whose format is as
- follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes).
-
- +04: A Word that contains the Length of the Fix-Up Data Table
- for this Code Segment (in bytes).
-
- +06: A Word that contains the offset of the Trace Table Entry
- for this Segment (if it was compiled with DEBUG Support).
- If there is no Trace Table for this segment, then this
- Word contains FFFFh.
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 36
-
- 6.3 TYPED CONST DSEG MAP
-
-
- The CONST DSeg Map provides a convenient descriptor table for each
- DATA Segment which was spawned by the presence of Typed Constants or
- VMT's in the Pascal Code. It serves to relate these segments with the
- Segment Fix-Up (relocation) Data and with the Code Segments that refer
- to these DATA elements. One entry is present for each CONST
- declaration part containing typed constants and for each CONST segment
- linked from an ".OBJ" file. The CONST DSeg Map is an array of fixed-
- length records whose format is as follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes).
-
- +04: A Word that contains the Length of the Fix-Up Data Table
- for this DATA Segment (in bytes).
-
- +06: A Word that contains an LL which locates the OBJECT that
- owns this VMT or DMT template or zero if the segment is
- not a VMT or DMT template.
-
- One can determine the defining block for a Typed Constant declaration
- and our program attempts to do just that. A by-product of the
- dictionary mapping algorithm allows the declaring block to be found
- and its qualified name printed. This information is also used to
- explain fix-up data as to its source. Results will be incomplete
- unless a really comprehensive dictionary is present in the unit.
-
-
-
- 6.4 GLOBAL VAR DSEG MAP
-
-
- The VAR DSeg Map provides a convenient descriptor table for each DATA
- Segment present in the Unit.
-
- One entry exists for each VAR declaration part whose scope is not
- local to a PROC and so is allocated in the DATA Segment. CODE
- Segments may have references to these in the CODE Fix-Up Data Table.
- Each EXTERNAL CSeg having a segment named DATA also spawns an entry in
- this table.
-
- The VAR DSeg Map is an array of fixed-length records whose format is
- as follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes). This
- may be zero, especially if the EXTERNAL routine contains a
- DATA segment whose sole purpose is to declare one or more
- EXTRN symbols that are defined in some DATA segment
- external to the Assembly.
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 37
-
- +04: A Word apparently reserved for use by TURBO.
-
- +06: A Word apparently reserved for use by TURBO.
-
- One can determine the defining block for a Global VARiable declaration
- and our program attempts to do just that. A by-product of the
- dictionary mapping algorithm allows the declaring block to be found
- and its qualified name printed. This information is also used to
- explain fix-up data as to its source. Results will be incomplete
- unless a really comprehensive dictionary is present in the unit. Such
- DSegs can be referenced by many CSegs and we only locate the first
- one. This is okay for Pascal code but it's ambiguous for assembler
- since the names may be PUBLIC and referenced by more than one module.
-
-
-
- 6.5 DLL LIST
-
-
- This list is present ONLY in Units compiled by the Windows Version and
- then only if the unit calls Dynamic Link Library (DLL) PROCS. The DLL
- List has the following format:
-
- +00: Four (4) bytes of binary zeroes (reserved for work?).
-
- +04: A variable-sized String that contains either the DLL MEMBER
- name or the PROC name (for DLL reference by NAME).
- The string is truncated to actual size as usual for a
- unit.
-
- Procedures or Functions which reside in DLL's have entries in the PROC
- map but NOT in the CSeg Map since the executable code is external.
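-
- Because the PROC Map refers to DLL List entries by offset, a reader
- would want the entries keyed by their starting offset. The sketch
- below assumes each String is a Pascal string (a length byte followed
- by that many characters); parse_dll_list is a hypothetical name.
-
```python
def parse_dll_list(buf: bytes) -> dict:
    entries = {}
    pos = 0
    while pos < len(buf):
        # +00..+03 are the four reserved zero bytes; skip them.
        length = buf[pos + 4]                       # assumed length byte
        name = buf[pos + 5:pos + 5 + length].decode("ascii", errors="replace")
        entries[pos] = name                         # key = offset of the entry
        pos += 5 + length                           # entries are variable-sized
    return entries


# Hypothetical list: a DLL member name followed by a PROC name.
sample = (b"\x00\x00\x00\x00" + bytes([6]) + b"KERNEL"
          + b"\x00\x00\x00\x00" + bytes([9]) + b"LOCALLOCK")
print(parse_dll_list(sample))
```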
-
-
-
- 6.6 DONOR UNIT LIST
-
-
- This list contains an entry for each Unit (taken from the "USES" list)
- which MAY contribute either CODE or DATA to the executable file. Not
- all units do make such a contribution as some exist merely to define a
- collection of Types, etc. A Unit gets into this list if there exists
- a single Fix-Up Data Entry that references CODE or DATA in that Unit.
-
- The list consists of elements whose SIZE is variable and whose
- format is as follows:
-
- +00: A WORD apparently reserved for use by TURBO.
-
- +02: A variable-length String containing the unit name.
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 38
-
- 6.7 SOURCE FILE LIST
-
-
- This list contains an entry for each "source" file used to compile the
- Unit. This includes the Primary Pascal file, files containing Pascal
- code included by means of the {$I filename.xxx} compiler directive,
- and .OBJ files included by the {$L filename.OBJ} compiler directive.
-
- The order of entries in this list is critical since it maps the CODE
- segments stored in the unit. The order of the entries is as follows:
-
- The Primary Pascal file;
-
- All Included Pascal files;
-
- All Included .OBJ files.
-
- Mapping of CSegs to files is done as follows:
-
- Each .OBJ file contributes a SINGLE Code Segment (if any). Note
- that this author has not observed an .OBJ module that
- contains only a DATA Segment (but that seems a distinct
- possibility).
-
- The Primary Pascal file (augmented by all included Pascal Files)
- contributes zero or more CODE Segments.
-
- Therefore, there are at least as many CSeg entries as .OBJ files. If
- more, then the excess entries (those at the front of the list) belong
- to the Pascal files that make up the Pascal source for the unit.
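-
- The mapping rule can be sketched as follows. This is a simplified
- illustration: assign_csegs is a hypothetical name, every .OBJ file
- (flag 05h) is assumed to contribute exactly one CSeg, and the .OBJ
- CSegs are assumed to appear in the same order as their files.
-
```python
def assign_csegs(num_csegs: int, files):
    """files is a list of (flag, name) pairs in Source File List order."""
    objs = [name for flag, name in files if flag == 0x05]
    pascal_count = num_csegs - len(objs)
    if pascal_count < 0:
        raise ValueError("fewer CSegs than .OBJ files")
    # Leading CSegs come from the Pascal source; the tail from .OBJ files.
    return ["<Pascal source>"] * pascal_count + objs


# Hypothetical unit: primary file, one include file, two .OBJ files.
files = [(0x04, "DEMO.PAS"), (0x03, "DEMO.INC"),
         (0x05, "FAST.OBJ"), (0x05, "IO.OBJ")]
print(assign_csegs(5, files))
```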
-
- The format of an entry in this list is as follows:
-
- +00: A flag byte that indicates the type of file represented;
-
- 04h -> the Primary Pascal Source File,
- 03h -> an Included Pascal Source File,
- 05h -> an .OBJ file that contains a CODE segment,
- 06h -> an .RES file from {$R xxx.RES} (Windows RESOURCE).
-
- +01: A Word apparently reserved for use by the Compiler/Linker.
-
- +03: A Word that is zero for .OBJ files and which contains the
- file directory time-stamp for Pascal Files.
-
- +05: A Word that is zero for .OBJ files and which contains the
- file directory date-stamp for Pascal Files.
-
- +07: A variable-sized string containing the filename and
- extension of the file used during compilation.
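-
- One entry of this list could be decoded as sketched below. The
- names are hypothetical, the string is assumed to be a Pascal string
- (length byte first), and the time and date Words are assumed to use
- the standard DOS directory packing (2-second resolution, years
- counted from 1980).
-
```python
import struct

FILE_KINDS = {0x03: "include", 0x04: "primary", 0x05: "obj", 0x06: "res"}


def parse_source_entry(buf: bytes, pos: int = 0):
    flag = buf[pos]                                   # +00: file type flag
    _reserved, time_w, date_w = struct.unpack_from("<3H", buf, pos + 1)
    length = buf[pos + 7]                             # +07: assumed length byte
    name = buf[pos + 8:pos + 8 + length].decode("ascii", errors="replace")
    stamp = None
    if date_w:                                        # zero for .OBJ files
        stamp = (1980 + (date_w >> 9), (date_w >> 5) & 0x0F, date_w & 0x1F,
                 time_w >> 11, (time_w >> 5) & 0x3F, (time_w & 0x1F) * 2)
    next_pos = pos + 8 + length                       # start of the next entry
    return FILE_KINDS.get(flag, "unknown"), name, stamp, next_pos


# Hypothetical entry: primary file stamped 1991-06-06 12:30:10.
date_w = (11 << 9) | (6 << 5) | 6
time_w = (12 << 11) | (30 << 5) | 5
sample = bytes([0x04]) + struct.pack("<3H", 0, time_w, date_w) + bytes([8]) + b"DEMO.PAS"
print(parse_source_entry(sample))
```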
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 39
-
- 6.8 DEBUG TRACE TABLE
-
-
- If Debug support was selected at compile time, then all Pascal code
- which supports Debugging produces an entry in this table. The table
- entries themselves are variable in size and have the following format:
-
- +00: A Word which contains an LL that locates the Directory
- Header of the Symbol (a PROC name) this entry represents.
-
- +02: A Word which contains the offset (within the Source File
- List) of the entry that names the file that generated the
- CSeg being traced. This allows the file included by means
- of the {$I filename} directive to be identified for DEBUG
- purposes, as well as code produced from the Primary File.
-
- +04: A Word containing the number of bytes of data that precede
- the BEGIN statement code in the segment. For Pascal PROCS
- these bytes consist of literal constants, un-typed
- constants, and other data such as range-checking limits,
- etc.
-
- +06: A Word containing the Line Number of the BEGIN statement
- for the PROC.
-
- +08: A Word containing the number of lines of Source Code to
- Trace in this Segment.
-
- +0A: An array of bytes whose size is at least the number of
- source code lines in the PROC. Each byte contains the
- number of bytes of object code in the corresponding source
- line. The array appears to be nominally of SHORTINT: if a
- "line" contains more than 127 bytes, a single byte of $80
- precedes the actual count as a sort of "escape" and the
- next byte records the count (up to 255 bytes) for the
- line. This situation has not yet been fully explored. We
- do not yet know what happens in the event a line is
- credited with spawning more than 255 bytes of code.
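-
- The escape scheme for the per-line byte counts, as far as it is
- understood above, could be decoded like this. The function name is
- hypothetical, and the handling of counts above 255 is unknown and
- not attempted.
-
```python
def decode_line_sizes(data: bytes, line_count: int):
    sizes = []
    pos = 0
    while len(sizes) < line_count:
        count = data[pos]
        pos += 1
        if count == 0x80:        # escape: the real count is in the next byte
            count = data[pos]
            pos += 1
        sizes.append(count)
    return sizes, pos            # pos = number of array bytes consumed


# Hypothetical array: 3 bytes, then an escaped 200 bytes, 0 bytes, 5 bytes.
print(decode_line_sizes(bytes([3, 0x80, 200, 0, 5]), 4))
```
-
- Note that the array can be longer than the line count whenever an
- escape byte is present, which matches the "at least" wording above.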
-
-
-
- 7. CODE, DATA, FIX-UP INFO
-
-
- This area begins at the start of the next free PARAGRAPH (16-byte
- boundary). This means that its offset from the beginning of the Unit,
- written in hexadecimal, ALWAYS ends in the digit zero.
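-
- Rounding an offset up to the next paragraph boundary is a one-line
- computation; align_paragraph is a hypothetical helper name.
-
```python
def align_paragraph(offset: int) -> int:
    # Round up to the next multiple of 16 (one 8086 paragraph).
    return (offset + 15) & ~15


for off in (0x0000, 0x0001, 0x0123, 0x0130):
    print(f"{off:04X} -> {align_paragraph(off):04X}")
```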
-
- This area contains the CODE segments, CONST DATA segments, and the
- Relocation (Fix-Up) Data required for linking.
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 40
-
- 7.1 OBJECT CSEGS
-
-
- Each CODE segment included in the unit appears here as specified by
- the CSeg Map Table. Depending on usage, these segments may appear in
- the executable file. There are no filler bytes between segments.
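-
- Since the segments are stored back to back, the position of each
- one follows from the lengths in the CSeg Map. The sketch below
- assumes the segments appear in CSeg Map order and that the caller
- already knows where the code area starts; cseg_spans is a
- hypothetical name.
-
```python
import struct


def cseg_spans(cseg_map: bytes, code_area_start: int):
    spans = []
    pos = code_area_start
    # Each CSeg Map record is four Words: reserved, length, fix-up
    # length, trace table offset. Only the length is needed here.
    for _reserved, seg_len, _fix_len, _trace in struct.iter_unpack("<4H", cseg_map):
        spans.append((pos, seg_len))   # (file offset, length in bytes)
        pos += seg_len                 # no filler between segments
    return spans, pos                  # pos = end of the last segment


# Hypothetical map with two segments of 0x25 and 0x10 bytes.
sample = struct.pack("<4H", 0, 0x25, 8, 0xFFFF) + struct.pack("<4H", 0, 0x10, 0, 0)
print(cseg_spans(sample, 0x400))
```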
-
-
-
- 7.2 CONST DSEGS
-
-
- This section begins at the start of the first free PARAGRAPH following
- the end of the Object CSegs. This means that its offset from the
- beginning of the Unit, written in hexadecimal, ALWAYS ends in zero.
-
- A DATA segment fragment appears here for each CSeg that declares a
- typed constant, and for each OBJECT which employs Virtual Methods,
- Constructors or Destructors. There are no filler bytes between
- segments.
-
- If local symbols were generated, there is always enough information to
- allow documenting the scope of the declaration as well as interpreting
- the data in the display since the needed type declarations would also
- be available. Our program merely identifies the defining block.
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 41
-
- 7.3 FIX-UP DATA TABLES
-
-
- There are - at most - two Fix-Up Data Tables in any given .TPU file.
- The first is for the CODE Area and the second is for the CONST DSeg
- area. Both are paragraph aligned and both have size information in
- the unit header.
-
- Turbo Pascal for DOS and Turbo Pascal for Windows apparently utilize
- differing code-generation models where floating-point is concerned.
- The nub of the difference appears to lie in emulation support. In the
- DOS product, the 8087 emulator is included in the SYSTEM unit while a
- WINDOWS DLL (WIN87EM) furnishes floating-point emulation support for
- applications. This seems to be the reason for a new fix-up format and
- for the way floating-point options are presented in TP for Windows.
-
- The Table consists of an array of eight (8) byte entries whose format
- is as follows:
-
- +00: A Byte containing the offset within the Donor Unit List of
- the Unit name that this entry refers to. This can be the
- compiled Unit or some previously compiled external unit.
-
- +01: A Byte of BIT switches that identify the type of reference
- and the size of the needed fix-up (WORD or DWORD). A lot
- of guess-work led to the following interpretation:
-
- 7654 (bits 3-0 don't seem to be used)
-
- 00-- Locate item via a PROC Map,
- 01-- Locate item via a CSeg Map,
- 10-- Locate item via a Global VAR DSeg Map,
- 11-- Locate item via a Const DSeg Map,
- --00 WORD offset has NO effective address adjustment,
- --01 WORD offset HAS an effective address adjustment,
- --10 WORD SEGMENT-Only fix-up (address of some PUBLIC
- segment),
- --11 DWORD (FAR) pointer; possible effective address
- adjustment.
-
- +02: A Word containing the offset within the Map table
- referenced according to the above code scheme.
-
- +04: A Word containing an offset within the target segment
- which will be added to the effective address. For
- example, a reference to the VAR DSeg Map will require a
- final offset to locate the item (variable) within the DATA
- SEGMENT being referenced here. This may also be needed
- for references to LITERAL DATA embedded in a CODE SEGMENT.
-
- +06: A Word containing the offset within the CODE or DATA
- segment owning this entry that contains the area to be
- patched with the value of the final effective address.
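-
- An 8-byte entry of this first format could be unpacked as in the
- following sketch, which simply restates the bit interpretation
- given above (itself the product of guess-work). The function and
- dictionary names are hypothetical.
-
```python
import struct

TARGET = {0: "PROC Map", 1: "CSeg Map",
          2: "Global VAR DSeg Map", 3: "Const DSeg Map"}
KIND = {0: "WORD offset, no adjustment", 1: "WORD offset with adjustment",
        2: "WORD segment-only", 3: "DWORD far pointer"}


def parse_fixup(entry: bytes) -> dict:
    # Byte, Byte, then three little-endian Words = 8 bytes in total.
    unit_off, flags, map_off, adjust, patch_off = struct.unpack("<BBHHH", entry)
    return {
        "donor_unit_offset": unit_off,     # +00: offset in Donor Unit List
        "target": TARGET[flags >> 6],      # bits 7-6 select the map
        "kind": KIND[(flags >> 4) & 0x03], # bits 5-4 select the fix-up size
        "map_offset": map_off,             # +02: offset within that map
        "adjustment": adjust,              # +04: offset within target segment
        "patch_offset": patch_off,         # +06: location to be patched
    }


# Hypothetical fix-up: a variable at +4 in the VAR DSeg entry at map offset 8.
sample = struct.pack("<BBHHH", 0, 0b10010000, 8, 4, 0x12)
print(parse_fixup(sample))
```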
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 42
-
- In the WINDOWS environment, an additional format is possible and it
- has the following appearance:
-
- +00: A Word containing $FFFF which appears to serve as a format
- identifier.
-
- +02: A Word containing an Emulator Fix-Up type code. After
- looking at many such entries in context with the object
- code, the following scheme seems to be operative:
-
- 2-> target floating point op has SS: override prefix;
- 3-> target floating point op has CS: override prefix;
- 4-> target floating point op has ES: override prefix;
- 5-> target floating point op has NO override prefix;
- 6-> target floating point op is "FWAIT" ($909B).
-
-
- +04: A Word that is probably always zero.
-
- +06: Offset to the floating-point operation to be emulated.
- This operation is always prefixed with a WAIT op ($9B)
- unless it is an FWAIT ($909B). If an operation is not so
- prefixed, then no fix-up record is generated for it.
-
- These latter fix-up records are (probably) incorporated into the .EXE
- file (following suitable transformations) so that the Windows Loader
- can see and process them. Presumably, they are simply ignored if a
- co-processor chip is present and working. If not, they tell the
- loader where the emulated instructions are. What the loader does with
- this information is pure guess-work but it probably works something
- like this:
-
- 1) if the Emulator Type code in the word at +02 indicates
- that a segment override prefix is present (codes 2..4),
- replace the first three bytes of the instruction with the
- following:
-
- $CD $3C "xxyyyyyy" where "yyyyyy" is the least-significant
- six bits of the "escape" byte (originally $D8..$DF) and
- "xx" is the ones-complement of the two-bit segment
- register value (00=ES, 01=CS,10=SS,11=DS).
-
- This method would result in replacement of the WAIT op
- ($9B), the segment override prefix, and the "escape" byte
- with the above string at program load time. This would
- allow an application to run regardless of the availability
- of co-processor support.
-
- 2) if the Emulator Type code in the word at +02 is 5, then
- there is no override prefix. Replace the first two bytes
- of the instruction with the following:
-
- $CD $jj (where "jj" is "escape" - $A4). $jj then falls in
- the range $34..$3B.
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 43
-
- 3) if the Emulator Type code in the word at +02 is 6, then
- the operation to emulate is FWAIT. Replace the $90 $9B
- with $CD $3D.
-
- Since $CD is the op-code for INT, if emulation were in effect we would
- produce INT $34-$3D wherever a floating-point operation was found
- that could be emulated.
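-
- The three conjectured replacements can be sketched as below. This
- only illustrates the guess-work above and is not a description of
- what the Windows loader actually does. The function name is
- hypothetical, the fix-up offset is assumed to point at the first
- byte to be replaced, and the INT op-code is written as $CD, which is
- its 8086 encoding.
-
```python
def patch_for_emulation(code: bytearray, pos: int, type_code: int) -> None:
    if type_code in (2, 3, 4):
        # WAIT ($9B), segment override prefix, escape byte ($D8..$DF).
        prefix, esc = code[pos + 1], code[pos + 2]
        seg = (prefix >> 3) & 0x03            # 00=ES, 01=CS, 10=SS, 11=DS
        third = ((~seg & 0x03) << 6) | (esc & 0x3F)
        code[pos:pos + 3] = bytes([0xCD, 0x3C, third])
    elif type_code == 5:
        # WAIT ($9B) followed directly by the escape byte.
        esc = code[pos + 1]
        code[pos:pos + 2] = bytes([0xCD, esc - 0xA4])   # INT $34..$3B
    elif type_code == 6:
        # FWAIT stored as $90 $9B.
        code[pos:pos + 2] = bytes([0xCD, 0x3D])
    else:
        raise ValueError("unknown emulator fix-up type code")


# Hypothetical code bytes: WAIT, ES: prefix ($26), escape $D9, ModRM.
code = bytearray([0x9B, 0x26, 0xD9, 0x06])
patch_for_emulation(code, 0, 4)
print(code.hex(" "))
```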
-
- This approach has the advantage that we don't have to commit to
- emulation or non-emulation at compile-time. Rather, the decision is
- made at load time and is transparent to the user. It's interesting to
- note that the DOS compiler generates such code without benefit of fix-
- ups whenever both 8087 and emulation support are elected since the
- emulator is a component of the SYSTEM unit in DOS. In WINDOWS, we
- merely include a reference to WIN87EM plus the above fix-ups.
-
- The technique relies on the fact that 8087 ops are necessarily
- prefixed by the WAIT byte (except for the "FN..." variants). This
- provides sufficient space to replace as above in-situ. This approach
- WILL NOT work if the code contains floating-point instructions without
- a WAIT prefix byte. If the object code requires an 80287 or an 80387
- (for example), then it would seem that Interrupt 07H will have to
- be serviced by WIN87EM. This is all guess-work for now. I haven't
- seen any literature documenting WIN87EM techniques.
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 44
-
- 8. SUPPLIED PROGRAM
-
-
- In order that the above information be made constructively useful, the
- author has designed a program that automates the process of discovery.
- It is not a work of art but it does give useful results provided your
- PC has enough available memory.
-
- The program source code has been re-organized many times as I simply
- haven't been able to resist tinkering with it. Minor changes in its
- output have been implemented to enhance its usefulness.
-
- It should be obvious that the program was not designed "top-down".
- Rather, it just evolved as each new discovery was made. Later on, it
- seemed reasonable to try to document some of the relations between the
- various lists and tables and the program tries to make some of these
- relations clear, albeit with varying degrees of success.
-
- It may not be obvious to all readers, but the program is actually
- fighting a losing battle in many respects. The ".TPU" file was not
- designed with the intent of enabling de-compilation, disassembly or
- de-linking. Thus, some interesting semantic information is lost
- forever since it's not needed for either compilation or debugging.
- For example, it doesn't seem to be possible to determine with
- certainty the source file for a CONST DSeg or GLOBAL VAR DSeg where
- ".OBJ" files are linked into the ".TPU" file. Of course, it MAY be
- possible in certain cases but, in general, there is simply not enough
- information available to definitely determine the source. This is due
- to the fact that one ".OBJ" file may define such a DSeg and contain a
- CSeg that refers to it but, if the DSeg is PUBLIC, it may also be
- referred to by other CSegs. Each of the CSegs that make such
- references to the DSeg view it as an EXTERNAL as far as fix-up data is
- concerned. Therefore, it's impossible to determine which of the
- referencing CSegs was drawn from the same ".OBJ" file as the DSeg.
-
-
-
- 8.1 TWU1
-
-
- This is the main program. It will ask for the name of the unit to be
- documented. Reply with the unit name only. The program will append
- the ".TPU" extension and will search for the proper file. It will
- also search the appropriate library file, if necessary.
-
- The program will then ask if the unit is a DOS or WINDOWS unit and
- will require a "w" or "d" answer. This determines which unit library
- file to search (TURBO.TPL or TPW.TPL) for the SYSTEM unit (among
- others).
-
- The program will then ask if Dis-Assembly is desired and will require
- a "y" or "n" answer. If "y", it also asks about the CPU.
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 45
-
- The current directory will be searched first, followed by all
- directories in the current PATH. If the .TPU file is not found, the
- program will search for it in the "TURBO.TPL" or in the "TPW.TPL"
- (Turbo Pascal Library) file as appropriate. Units in the "USES"
- list(s) will also be loaded to enable resolution of LG items.
-
- If the desired unit is found, the program will write a report to the
- current directory named "unitname.lst" which contains its analysis.
- The format of the report is such that it may be copied to a printer if
- that printer supports TTY control codes with form-feeds. Be judicious
- in doing this however since there can be a lot of information. Some
- of the units supplied by Borland can produce almost 2 MB of report
- output, depending on whether it's Version 6.0 for DOS or Version 1.0
- for Windows (some supplied Windows Units are BIG).
-
-
-
- 8.1.1 UNIT TWU1EQU
-
-
- This Unit contains constants, types and procedures of general utility
- that are not strictly unit or I/O related. One of the more powerful
- procedures is a general-purpose QuickSort procedure.
-
- It also contains a Heap Error Function that keeps track of the high-
- water mark of Heap Utilization of any program that uses it. This
- function gets installed automatically.
-
- This Unit makes SOME use of the INLINE assembler for speed and not out
- of sheer necessity. Some of the routines are INLINE Macros to provide
- for short expansions of otherwise overhead-ridden facilities.
-
-
-
- 8.1.2 UNIT TWU1RPT
-
-
- This is a Unit that contains the text-file output routines required by
- the main program. This relieves the main program of some of the
- tedium of handling report formatting and pagination issues.
-
-
-
- 8.1.3 UNIT TWU1UAM
-
-
- This Unit contains all Type Definitions, Structures, and primitive
- Functions and Procedures required by the program for ".TPU" file
- acquisition and analysis. All structures documented in this report
- are also documented in the interface by means of the TYPE mechanism.
- Some of the structures are difficult if not impossible to define using
- ISO Pascal but Turbo Pascal provides the means for getting the job
- done.
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 46
-
- Some algorithms have been cast with object-orientation in mind and
- have potential for re-use in other contexts. The unit computes a
- cover for the dictionary and deduces relationships between dictionary,
- code, data and the CSeg, PROC, CONST and VAR Maps discussed in
- Sections 6.1 through 6.4 on Pages 35..37. This information is
- retrieved by the main program to drive the printing process.
-
- This Unit also loads all units specified in the USES list of the prime
- unit to allow the names of externally defined types to be recovered on
- the report. Array bounds are also retrieved in this way. The code
- will search for needed units in the appropriate unit library file
- without intervention. Close attention is paid to Heap Management and
- minimal utilization of Heap storage. The dictionary areas of the Units
- located in the USES list get loaded into the Heap at no extra charge.
- Nothing but the dictionary area is of any use at this point. The name
- and fully-qualified file name of each unit successfully loaded are
- printed at the top of the listing. Unit version numbers must agree or
- the unit will not be loaded. Dictionary covers are computed for each
- loaded unit to aid in rapid LG-resolution.
-
- Lack of sufficient Heap Storage will not necessarily cause the program
- to fail. Heap Space MUST be available to load the primary unit and
- perform the necessary analyses, but the secondary or nested units are
- not essential. If they cannot be loaded, you merely lose some
- descriptive information. If Heap exhaustion occurs at a critical step
- however, the program will generate RunError 215.
-
-
-
- 8.1.4 UNIT TWU1UNA
-
-
- This unit is a rudimentary disassembler. The output will not assemble
- and may look strange to a "real" assembler programmer since I am not
- well-qualified in this area. However, the basis for support of 80286,
- 80386 etc. processors is present as well as coprocessor support. Of
- perhaps the greatest interest is that it does appear to decode the
- emulated coprocessor instructions that are implemented via INT 34-3D
- in the MS-DOS versions of Turbo Pascal.
-
- Be warned however. The output is not guaranteed since this was coded
- by myself and I am perhaps the rankest amateur that ever approached
- this quite awful assembler language. For convenience, the operand
- coding mimics TASM "Ideal" mode.
-
- As is usual with programs of this type, error-recovery is minimal and
- no context checking is performed. If the operation code is found to
- be valid, then a valid instruction is assumed -- even if invalid
- operands are present.
-
- The only positives that apply to this program are that it doesn't slow
- the cpu down (although a lot more output is produced), and it does let
- one "tune" code for compactness by letting one view the results of the
- coding directly. Also, incomplete instructions are handled as data
- rather than overrunning into the next proc.
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 47
-
- 8.2 NOTES ON PROGRAM LOGIC
-
-
- The following sections discuss a few of the methods employed by the
- supplied program. There are no cutting-edge algorithms here. Results
- counted for a lot more than technique.
-
-
-
- 8.2.1 FORMATTING THE DICTIONARY
-
-
- Printing the unit dictionary area in a way that exposes its underlying
- semantics is no small task. The unit dictionary area itself is a
- rather amorphous-looking mass of data composed of hash tables, Name
- Entries and stubs, type descriptors, etc. In order to present all
- this information in a meaningful way, we have to reveal its structure
- and this cannot be done by means of a sequential "browse" technique.
- Rather, we have to visit all nodes in the dictionary area so that each
- may be formatted in a way that exposes its function and meaning.
- This is made necessary by the fact that items are added to the
- dictionary as encountered and no convenient ordering of entry types
- exists. What we have here is the problem of finding a minimal "cover"
- for the dictionary area that properly exposes the content and
- structure of the dictionary area.
-
- To do this, we scan the dictionary recursively to determine the number
- of structures that we need to map. Then we get heap storage for the
- array of records that will hold the mapping information and repeat our
- recursive dictionary scan, this time constructing the mapping records.
-
- The recursive algorithm is "delicate" in that it is vulnerable to the
- cycles that our analysis uncovers - particularly when polymorphic
- objects are involved. Therefore, we have incorporated a simple little
- trap that tries to discover such cycles and avoid them. It is
- possible that the algorithm could fail for exceedingly complex units
- but it handles the worst cases from Borland with ease. Prior versions
- of this unit accomplished this task without recursion but required too
- many tricky pointer manipulations that were environmentally sensitive,
- so recursion was adopted. Since unit dictionaries don't tend to be
- deeply nested, we get reasonable heap utilization coupled with stable
- algorithms.
-
- The result is an array containing one entry for each structure in the
- unit dictionary area that is identifiable via traversal. Each entry
- in the array contains information about nesting level, parent scope,
- structure type and location. The array thus forms a set of
- descriptors that drive the process of formatting the dictionary area
- for display. The process may be likened to "painting by the numbers"
- or to finding a way to lay tile on a flat surface using tiles of
- differing shapes until the floor is exactly covered.
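-
- The two-pass scan can be sketched as follows. This is a minimal
- illustration in Python, not the supplied Pascal code: the dictionary
- is modeled as a hypothetical table mapping an offset to a (kind,
- child offsets) pair, and MapEntry, count_nodes and build_map are
- names invented for the example. The "seen" set plays the part of the
- cycle trap, and sorting the result by offset is an assumption about
- how the descriptors would be used to tile the area.
-
```python
from dataclasses import dataclass


@dataclass
class MapEntry:
    offset: int   # location of the structure in the dictionary area
    kind: str     # structure type, e.g. "hash", "name", "type"
    level: int    # nesting level at which it was first reached
    parent: int   # offset of the structure that led to it (-1 for the root)


def count_nodes(nodes, root):
    """Pass 1: count the structures reachable from root."""
    seen = set()              # cycle trap: never visit an offset twice

    def visit(off):
        if off in seen:
            return
        seen.add(off)
        for child in nodes[off][1]:
            visit(child)

    visit(root)
    return len(seen)


def build_map(nodes, root):
    """Pass 2: fill a preallocated array, one descriptor per structure."""
    table = [None] * count_nodes(nodes, root)
    seen = set()

    def visit(off, level, parent):
        if off in seen:
            return
        table[len(seen)] = MapEntry(off, nodes[off][0], level, parent)
        seen.add(off)
        for child in nodes[off][1]:
            visit(child, level + 1, off)

    visit(root, 0, -1)
    # Assumption: ordering by location lets the formatter walk the area
    # from start to end, one "tile" at a time.
    return sorted(table, key=lambda entry: entry.offset)
```
-
- Counting first and filling second mirrors the text: the array is
- sized exactly once before any mapping record is written.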
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 48
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- There is one significant limitation that needs to be pointed out. It
- is not always possible to determine the "parent" or "owner" of a node
- with certainty. The following discussion illustrates the problem of
- finding the "real" parent of a Type Descriptor.
-
- Almost every "type" in Turbo Pascal is actually derived from the basic
- types that are defined in the SYSTEM.TPU unit -- e.g. "INTEGER",
- "BYTE", etc. In addition, several of the Type Descriptors in the
- SYSTEM unit are referenced by more than one Name Entry. Thus, we find
- that a "many-to-one" relationship may exist between Name Entries and
- Type Descriptors. How does one find out which is the entry that
- actually gave rise to the Type Descriptor?
-
- The Dictionary Area of a unit has some special properties, one of
- which is the fact that the Name Entries for named Types are often
- located quite near their primary type descriptors. The Dictionary
- Area seems to be treated as an upward growing heap with the various
- structures being added by Turbo as encountered. This makes it likely
- that the Type "Q" header which gives rise to a type descriptor occurs
- earlier in the Dictionary Area than any other entry which refers to
- the same descriptor. We use this property to allocate "ownership",
- but it may not be "fool-proof". Some type
- descriptors are spawned by other type descriptors, especially for
- structured types. Further, structured named types are often
- accompanied by pointer types and this results in having multiple named
- types sharing the same type descriptor. We don't attempt to allocate
- "ownership" to "spawned" type descriptors but we do try to keep track
- of scope information.
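-
- The "earliest reference wins" rule can be sketched like this. It is
- a simplified Python illustration, not the supplied code; the function
- name and the (name entry offset, type descriptor offset) pairs are
- assumptions made for the example, and "spawned" descriptors are
- simply left out of the input.
-
```python
def assign_owners(references):
    """Pick an owner for each type descriptor.

    references: iterable of (name_entry_offset, type_descriptor_offset).
    The Name Entry with the lowest offset is taken as the owner, on the
    theory that the dictionary area grows upward as items are added.
    """
    owners = {}
    for name_off, desc_off in references:
        if desc_off not in owners or name_off < owners[desc_off]:
            owners[desc_off] = name_off
    return owners
```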
-
- A useful by-product of the above process is the ability to discover
- many of the associations between Global Variables, Typed CONST's,
- VMT's and the blocks in which they are declared or defined.
-
-
-
- 8.2.2 THE DISASSEMBLER
-
-
- To start with, I apologize up front for mistakes which are bound to be
- present in this routine. I am not really a MASM or TASM programmer
- and I will not pretend otherwise. This being the case, the formatting
- I have chosen for the operands may be erroneous or misleading and
- might (if submitted to one of the "real" assemblers) produce object
- code quite different from what is expected. I hope not, but I have to
- admit it's possible.
-
-
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 49
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- My intention in adding this unit was to support hand-tuning of object
- code. With practice and some effort, one can observe the effect on
- the object module caused by specific Pascal coding. Thus, where
- compactness or speed is an issue of paramount importance, disassembly
- can be of help. In some cases, a simple re-arrangement of the local
- variable declarations in a procedure can have a significant effect on
- the size of the code if it means the difference between 1 and 2-byte
- displacements for each instruction that references a specific local
- variable. Potential applications along these lines seem almost
- unlimited.
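-
- The displacement effect is easy to estimate by hand. The sketch
- below (Python, for illustration only) relies on the fact that a
- BP-relative operand needs a 1-byte displacement when the offset lies
- in -128..127 and a 2-byte displacement otherwise. It ASSUMES that
- locals are laid out downward from BP in declaration order; that
- layout and all of the names used are assumptions, so check real
- disassembly before relying on the numbers.
-
```python
def disp_bytes(offset):
    """Bytes needed to encode a [BP+offset] displacement."""
    return 1 if -128 <= offset <= 127 else 2


def frame_offsets(local_sizes):
    """BP-relative offsets, assuming locals are placed downward from BP
    in declaration order (an assumption, not taken from the compiler)."""
    offsets, top = [], 0
    for size in local_sizes:
        top -= size
        offsets.append(top)
    return offsets


def displacement_cost(local_vars, uses):
    """Total displacement bytes over all references.

    local_vars: list of (name, size in bytes) in declaration order.
    uses: dict mapping a name to the number of instructions that
    reference it.
    """
    offsets = frame_offsets([size for _, size in local_vars])
    return sum(disp_bytes(off) * uses.get(name, 0)
               for (name, _), off in zip(local_vars, offsets))
```
-
- With a 256-byte buffer referenced once and a 2-byte counter
- referenced ten times, declaring the buffer first costs 22
- displacement bytes under this model; declaring the counter first
- costs 12.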
-
- I adopted an operand format not unlike that of TASM "Ideal" mode since
- it was more convenient to do so and looked more readable to me. I
- relied on several reference books for guidance in decoding the entire
- mess and I found that there were several flaws (read ERRORS) in some
- of them which made the job that much more difficult. I then
- compounded my problems by attempting to handle 80386 specific code
- even though Turbo Pascal does not yet generate code specific to these
- processors. I simply felt that the effort involved in writing any
- sort of Dis-Assembly program for Turbo Pascal units was an effort best
- experienced not more than once. With all this self-flagellation out
- of my system once and for all, I will try to show the basic strategy
- of the program and to explain the limitations and some of the
- discoveries I made.
-
- The routine is intended to be idiotically simple - i.e., no smarter
- than the DEBUG command in principle. The basic idea is: pass some
- text to the routine and get back ONE line derived from some prefix of
- that text. Repeat as necessary until all text is gone. Thus, there
- is no attempt to check the context of the text being processed. Also,
- some configurations of the "modR/M" byte may be invalid for selected
- instructions. I don't try to screen these out since the intent was to
- look at the presumably correct code produced by TURBO Pascal -- not
- devious assembly language. Also, this program regards WAIT operations
- as "stand-alone" -- i.e., it doesn't check to see if a coprocessor
- operation follows for which the WAIT might be regarded as a prefix.
-
- One area of real difficulty was figuring out the Floating-Point
- emulations used by Turbo Pascal Version 6.0 for DOS that are
- implemented by means of interrupts $34 through $3D. I don't know if I
- got it right, but the results seem reasonable and consistent. In the
- listing, the Interrupt is produced on one line, followed by its
- parameters on the next line. The parameter line is given the op-code
- "EMU_xxxx" where "xxxx" is the coprocessor op-code I felt was being
- emulated. Interrupt $3C was a real puzzler but after seeing a lot of
- code in context, I think that the segment override is communicated to
- the emulator by means of the first byte after the $3C.
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 50
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- Normally, in a non-emulator environment, all coprocessor operations
- (ignoring any WAIT prefixes) begin with $D8-$DF. What Borland (and
- maybe Microsoft) seem to have done here is to change the $D8-$DF so
- that bits 7 and 6 of this byte are replaced with the one's complement
- of the 2-bit segment register number found in various 8086
- instructions. This seems to be how an override for the DS register is
- passed to the emulator. I don't KNOW this to be the correct
- interpretation, but the code I have examined in context seems to work
- under this scheme, so the disassembler uses it to interpret the
- operand accordingly.
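-
- The interpretation described above can be written down as a short
- sketch. It is Python for illustration, not the supplied unit, and it
- is no more certain than the text it follows. The mapping of
- interrupts $34..$3B onto $D8..$DF and the treatment of $3D as a
- stand-alone WAIT are my understanding of the usual convention and
- should be read as assumptions; decode_emulated is a made-up name.
-
```python
SEG_NAMES = ["ES", "CS", "SS", "DS"]   # 8086 segment register numbers 0..3


def decode_emulated(code):
    """Decode an emulator call at the start of code.

    Returns (escape_opcode, segment_override, bytes_consumed), or None
    if the bytes are not an emulator interrupt.
    """
    if len(code) < 2 or code[0] != 0xCD:       # CDh = INT imm8
        return None
    vector = code[1]
    if 0x34 <= vector <= 0x3B:
        # Assumed: INT 34h..3Bh stand in for escape opcodes D8h..DFh.
        return (0xD8 + (vector - 0x34), None, 2)
    if vector == 0x3C and len(code) >= 3:
        first = code[2]
        # Bits 7 and 6 hold the one's complement of the segment number.
        seg = ~(first >> 6) & 0x03
        # Put bits 7 and 6 back to recover the escape opcode.
        return (first | 0xC0, SEG_NAMES[seg], 3)
    if vector == 0x3D:
        # Assumed: INT 3Dh replaces a stand-alone WAIT (9Bh).
        return (0x9B, None, 2)
    return None
```
-
- For example, the bytes CD 3C 19 decode to escape opcode $D9 with a
- DS override, and CD 3C 99 to $D9 with a CS override.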
-
- For 80x86 machines, the problem was somewhat simpler. The
- disassembler takes a quick look at the first byte of the text. Almost
- any byte is valid as the initial byte of an instruction, but some
- instructions require more than one byte to hold the complete operation
- code. Thus, step 1 classifies bytes in several ways that lead to
- efficient recognition of valid operation codes.
-
- Once the instruction has been identified in this way, it is more or
- less easy to link to supplemental information that provides operand
- editing guidance, etc.
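-
- A first-byte classifier of the kind described might look like the
- sketch below. It is Python for illustration; the category names are
- invented and the real tables are certainly organized differently.
- The byte values themselves are the standard 8086/80386 encodings.
-
```python
PREFIXES = {
    0x26, 0x2E, 0x36, 0x3E,    # ES:, CS:, SS:, DS: segment overrides
    0xF0, 0xF2, 0xF3,          # LOCK, REPNE, REP
    0x64, 0x65, 0x66, 0x67,    # 80386: FS:, GS:, operand size, address size
}

# Opcodes whose operation is completed by the reg field of the modR/M byte.
GROUPS = {
    0x80, 0x81, 0x82, 0x83,    # arithmetic with immediate operand
    0xC0, 0xC1,                # shifts by immediate count (80186 and up)
    0xD0, 0xD1, 0xD2, 0xD3,    # shifts by 1 or by CL
    0xF6, 0xF7, 0xFE, 0xFF,    # TEST/NOT/NEG/MUL/DIV, INC/DEC/CALL/JMP/PUSH
}


def classify(first):
    """Classify the first byte of an instruction."""
    if first in PREFIXES:
        return "prefix"
    if first == 0x0F:
        return "two-byte"      # 80286 and up; on an 8086 this is POP CS
    if 0xD8 <= first <= 0xDF:
        return "coprocessor"   # escape opcodes, completed by modR/M
    if first in GROUPS:
        return "group"
    return "single"
```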
-
- The tables that embody the recognition scheme were constructed using
- PARADOX (another fine Borland product) and suitably coded queries were
- used to generate the actual Turbo Pascal code for compilation.
-
- For those that are interested, the disassembler supports the
- address-size and operand-size prefixes of the 80386 as well as 32-bit
- operands and addresses, but remember that Turbo Pascal doesn't
- generate these. A trivial change, for which provision has been made,
- allows segments that default to 32-bit mode to be handled as well.
-
- There is a simple mode variable that gets passed to the disassembler
- by its caller which specifies the most-capable processor whose code is
- to be handled. Codes are provided for the 8086 (8088 is the same),
- 80186 (same as 80286 without protected mode instructions), 80286
- (80186 plus protected mode), and 80386. The program now asks which
- one to use.
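-
- The effect of the mode variable can be pictured as a minimum
- processor level attached to each opcode. The sketch below is Python
- for illustration and judges by the first byte only, which is coarser
- than a real table (some two-byte opcodes need an 80386, for
- instance). The constant and function names are assumptions.
-
```python
CPU_8086, CPU_80186, CPU_80286, CPU_80386 = range(4)

# First bytes introduced with the 80186: PUSHA, POPA, BOUND, PUSH imm,
# IMUL imm, INS/OUTS, shifts by immediate count, ENTER, LEAVE.
ADDED_80186 = {0x60, 0x61, 0x62, 0x68, 0x69, 0x6A, 0x6B,
               0x6C, 0x6D, 0x6E, 0x6F, 0xC0, 0xC1, 0xC8, 0xC9}


def required_cpu(first):
    """Least capable processor on which this first byte is meaningful."""
    if first in (0x64, 0x65, 0x66, 0x67):   # FS:, GS:, size prefixes
        return CPU_80386
    if first in (0x0F, 0x63):               # two-byte opcodes, ARPL
        return CPU_80286
    if first in ADDED_80186:
        return CPU_80186
    return CPU_8086


def accepts(first, mode):
    """True if a disassembler running in 'mode' should decode the byte."""
    return required_cpu(first) <= mode
```
-
- A byte that fails this check would fall through to the usual error
- recovery and be listed as data.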
-
- No such specifier is provided for coprocessor support. The
- coprocessor instructions recognized are those I believe an 80387
- supports. I don't think that this is really a problem if you don't
- try to use this disassembler for anything but Turbo Pascal code.
-
- Error recovery is predictably simple. The initial text byte is output
- as the operand of a DB pseudo-op and provision is made to resume work
- at the next byte of text.
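-
- The "one line per call" interface and the DB fallback fit in a few
- lines. The sketch below is Python for illustration; the four-entry
- opcode table and the function names are invented, and the operand
- formatting is only a rough imitation of the real listing.
-
```python
# Toy recognizer table: opcode -> (line template, total length in bytes).
# Only four opcodes are listed; a real table covers the instruction set.
OPCODES = {
    0x90: ("NOP", 1),
    0xC3: ("RET", 1),
    0xCD: ("INT {}", 2),        # INT imm8
    0xB8: ("MOV AX,{}", 3),     # MOV AX,imm16
}


def disassemble_one(text):
    """Return (line, bytes_consumed) for the start of text (non-empty)."""
    entry = OPCODES.get(text[0])
    if entry is None or len(text) < entry[1]:
        # Unknown or incomplete: list the first byte as data and let the
        # caller resume at the next byte.
        return "DB 0%02Xh" % text[0], 1
    template, length = entry
    operand = text[1:length]
    value = int.from_bytes(operand, "little")
    digits = format(value, "X").zfill(2 * len(operand))
    return template.format("0" + digits + "h"), length


def disassemble(text):
    """Call disassemble_one repeatedly until all text is gone."""
    lines = []
    while text:
        line, used = disassemble_one(text)
        lines.append(line)
        text = text[used:]
    return lines
```
-
- Because an incomplete instruction at the end of the text is also
- listed as data, the loop never reads past the end of a proc.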
-
- I hope this program is found to be useful in spite of the errors it
- must surely contain. I have yet to make much sense of the rules for
- MASM or TASM operand coding and I found very little of value in many
- of the so-called "texts" on the subject. I found myself in the
- position of that legendary American in England watching a Cricket
- match for the first time ("You mean it has RULES?").
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 51
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- 9. UNIT LIBRARIES
-
-
- I have examined .TPL files and conclude that their structure is
- trivial. It's so easy to handle them that the program now routinely
- examines either the TURBO.TPL or the TPW.TPL to resolve named types.
-
-
-
- 9.1 LIBRARY STRUCTURE
-
-
- A Turbo Pascal Library (.TPL) file is a simple catenation of Turbo
- Pascal Unit (.TPU) files. Since the size of a Unit may be determined
- from the Unit Header (see Section 4.2, Page 16), it is simple to see
- that one may "browse" through a .TPL file looking for an external unit
- such as SYSTEM.TPU. The supplied program does just that in its unit
- retrieval process so the TPUMOVER utility is no longer required for
- processing of units in either the TURBO.TPL or in the TPW.TPL file.
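-
- Browsing a library then amounts to stepping from one unit to the
- next. The sketch below is Python for illustration and deliberately
- does NOT decode the Unit Header: unit_size and unit_name are
- hypothetical functions supplied by the caller, which would have to be
- written from the header layout given in Section 4.2. No padding
- between units is assumed.
-
```python
def iter_units(library, unit_size, unit_name):
    """Yield (offset, name) for each unit in a library image.

    library: bytes of the whole .TPL file.
    unit_size, unit_name: caller-supplied functions that read the size
    and the name of a unit from the bytes starting at its header.
    """
    offset = 0
    while offset < len(library):
        view = library[offset:]
        size = unit_size(view)
        if size <= 0 or offset + size > len(library):
            break          # damaged or truncated image: stop, don't loop
        yield offset, unit_name(view)
        offset += size


def find_unit(library, wanted, unit_size, unit_name):
    """Return the offset of the named unit, or None if it is absent."""
    for offset, name in iter_units(library, unit_size, unit_name):
        if name.upper() == wanted.upper():
            return offset
    return None
```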
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 52
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- 10. INFERENCES DRAWN FROM ANALYSES
-
-
- I have learned much about Turbo Pascal .EXE files from poring over the
- output of the supplied program. It is possible to learn how to build
- smaller .EXE files after contemplating the structure of Unit files.
- It is also possible to avoid certain troublesome anomalies in the code
- if one can see just what Turbo Pascal does when certain switch
- declaratives are in effect.
-
-
-
- 10.1 LINKER GRANULARITY
-
-
- The Linker appears to be able to resolve any code or data fragment
- with a resolution that matches the granularity of the various "map"
- tables in the unit file. The Code Map, the CONST DSeg Map and the
- GLOBAL VAR Map each map things that can be included in the .EXE file
- if referenced. Conversely, these things can also be excluded if not
- referenced. Turbo Pascal manuals have been just a little vague about
- how "smart" the "Smart Linker" actually is but the granularity of the
- maps implies the extent of that "smartness". Assuming the linker does
- in fact take advantage of this information and act on it, then we as
- programmers can have a bit more control over the elements included
- from Unit Files. This control can extend to GLOBAL VAR's that may be
- used in particular circumstances, or not at all in others.
-
- It seems that CONST DSeg and GLOBAL VAR Map entries are constructed
- for each TYPED CONST or VAR "Declaration Part" encountered in the
- Pascal source code. Thus, "Toolbox" type units can have their Typed
- CONST's and GLOBAL VAR's partitioned along usage lines, each partition
- dedicated to a small group of Procedures or Functions, so that a
- partition only gets included if one of its Procedures or Functions is
- referenced or if it is explicitly referenced by some external program.
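-
- The consequence of map granularity can be modeled as a reachability
- problem. The sketch below is a Python model for illustration only; it
- ASSUMES that the linker includes exactly the map entries reachable
- from what the program references, which is the inference drawn above
- and not documented behavior. All entry names are made up.
-
```python
def link(entries, roots):
    """Return the set of map entries that end up in the .EXE file.

    entries: dict mapping an entry name to the set of entries it
    references (procedures, CONST parts, VAR parts).
    roots: names referenced directly by the program.
    """
    included, pending = set(), list(roots)
    while pending:
        name = pending.pop()
        if name in included:
            continue
        included.add(name)
        pending.extend(entries.get(name, ()))
    return included


# Two VAR declaration parts, one per group of procedures.
split = {
    "DrawBox": {"ScreenVars"},
    "PrintReport": {"PrinterVars"},
    "ScreenVars": set(),
    "PrinterVars": set(),
}

# The same variables declared in a single VAR part.
merged = {
    "DrawBox": {"AllVars"},
    "PrintReport": {"AllVars"},
    "AllVars": set(),
}
```
-
- A program that calls only DrawBox pulls in just ScreenVars from the
- split unit; with the merged unit it drags in AllVars, printer
- variables included.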
-
-
-
- 10.2 FLOATING-POINT EMULATION
-
-
- Floating-Point emulation has some tricky cases -- particularly when
- the In-Line Assembler is used. As noted earlier, the implementation
- of Floating-Point Emulation is the responsibility of the SYSTEM unit
- in the MS-DOS version and of WIN87EM in the WINDOWS version. The
- state of the {$G±} directive toggle has an impact in these cases.
-
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 53
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- It would appear that 80286 code generation changes the way that
- floating-point instructions are generated since the 80287 is implied
- as the co-processor chip. In this case, the programmer has fine
- control over the timing of WAIT instructions since 80287 instructions
- don't automatically get prefixed by WAIT ops. When 8087 code is being
- generated, these WAIT instructions are produced for 8087 instructions
- since the 8087 requires it. This doesn't happen when the code is
- targeted at the 80287. So far, so good. However, EMULATION of such
- code gets trickier.
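-
- In terms of bytes, the difference between the two targets is just
- the WAIT opcode ($9B) placed in front of each coprocessor
- instruction. The sketch below is Python for illustration; emit_fp and
- its target strings are invented names, and it models native code
- only, not the emulation fix-ups.
-
```python
WAIT = 0x9B


def emit_fp(instruction, target):
    """Return the bytes produced for one coprocessor instruction.

    instruction: the escape instruction itself (first byte D8h..DFh).
    target: "8087" or "80287".
    """
    if target == "8087":
        return bytes([WAIT]) + instruction   # the 8087 needs the WAIT
    return instruction                       # 80287: no automatic WAIT
```
-
- FLD1, for example, is D9 E8 for the 80287 and 9B D9 E8 for the 8087.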
-
-
-
- 10.2.1 VERSION 6.0 COMPILER FOR MS-DOS
-
-
- It seems that the {$E±} directive doesn't work like it did in previous
- versions. All code produced in 8087 mode seems to be emulated code.
- I haven't found a way to get 8087 code generated if the compiler runs
- on a machine that doesn't have a co-processor. It may be that the
- directive works as documented if a co-processor is available on the
- machine the compiler runs on.
-
-
-
- 10.2.2 VERSION 1.0 COMPILER FOR WINDOWS
-
-
- It seems that the WIN87EM DLL in WINDOWS either needs to be able to
- service 80287 code via Hardware Interrupt 07H, or the application
- needs to be able to adapt itself to missing co-processor situations.
- This is implied by the Emulation Fix-Ups discussed earlier. These
- fix-ups are produced when 8087 code is being generated since the WAIT
- prefix on an instruction provides space for loader patching. Since
- WAIT prefixes are not automatically produced for 80287 instructions
- (except for FWAIT), some other mechanism is needed. I don't know how
- this situation is handled unless WIN87EM also services Interrupt 07H.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 54
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- 11. APPLICATION NOTES
-
-
- One of the more obvious applications of this information would seem to
- be in the area of a Cross-Reference Generator.
-
- There is a very fine example of such a program in the public domain
- that was written by Mr. R. N. Wisan called "PXL". This program has
- been around since the days of Turbo Pascal Version 1. The program has
- been continually enhanced by the author in the way of features and for
- support of the newer Turbo Pascal versions. It does not however solve
- the problem of telling one which unit contains the definition of a
- given symbol. In fairness to "PXL" however, this is no small problem
- since the format of .TPU files keeps changing (Turbo 6.0 Units are
- not object-code compatible with Turbo 5.x Units, and so on...) and
- Mr. Wisan probably has more than enough other projects to keep himself
- occupied.
-
- However, for the user who is willing to work a little (maybe a lot?),
- this document would seem to provide the information needed to add such
- a function to his own pet cross-reference generator.
-
- Further, with SIGNIFICANTLY more effort, it should be possible to do
- much of the job of de-compilation -- provided the DEBUG dictionary is
- present. At the very least, most declarations should be recoverable.
- It's another thing entirely to try to reconstruct plausible TURBO
- Pascal code from the CSegs. This would be a formidable task and lots
- of knowledge about TURBO's code generators would have to be acquired.
- At present, the only way I know to get this information is to have the
- run-time library source codes and then work-work-work at testing code
- produced by the compiler for a huge number of test case units. You
- have to want to do this really badly in order to invest the time. I
- am not that tired of living.
-
- Finally, code-tuning is not really so tedious an exercise as one might
- imagine. The disassembler makes it possible to experiment with many
- variants of specific source code at the unit level and to observe the
- effect on object code generated. With practice, there are certain
- coding practices one can avoid such as indiscriminate use of the
- "WITH" statement in Pascal (generates extra pointers and stack usage).
- A really simple way of checking a code proposal is to create a small
- test unit and fill it with sample coding. Disassembly of that unit
- will show what code is produced. This can be a rewarding exercise!
-
-
-
-
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 55
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- 12. ACKNOWLEDGEMENTS
-
-
- This project would have been totally infeasible without the aid of
- some very fine tools. As it was, several hundred man hours have been
- expended on it and as you can see, there are a few unresolved issues
- that have been (graciously) left for others to address. The tools
- used by this author consisted of:
-
- Turbo Pascal for Windows by Borland International
-
- Turbo Pascal 6.0 Professional by Borland International
-
- Microsoft WORD (version 5.5)
-
- LIST (version 7.5) by Vernon D. Buerg
-
- The DEBUG utility in MS-DOS Version 3.3
-
- PARADOX 3.5 by Borland International
-
- QUATTRO PRO Version 2.0 by Borland International
-
- TURBO ASSEMBLER 2.0 by Borland International
-
- (PARADOX and QUATTRO PRO were used for data collection and analysis in
- the course of coding the recognizer tables for the disassembler unit.)
-
- The references listed were of great value in this project. [Intel85]
- was a valuable source of information about coprocessor instructions as
- well as offering hints about the differences between the 8086/8088 and
- the 80286. The [Borland] TASM manuals offered further info on the
- 80186. [Nelson] provided presentations of well-organized data
- directed at the problem of disassembly but the tables were flawed by a
- number of errors which crept into my databases and which caused much
- of the extra debugging effort. [Intel89] offered valuable insights on
- the 80386 addressing schemes as well as the 32-bit data extensions.
- Finally, [Brown] provided valuable clues on the Floating-Point
- emulators used by Borland (and Microsoft?). As you can see, the
- amount of hard information available to me on this project was quite
- limited since I am unaware of any other existing body of literature on
- this subject.
-
- Finally, I am grateful to Mr. Anders Hejlsberg (Borland's Principal
- Architect for TURBO PASCAL) for the time he spent discussing "cabbages
- and kings" with me. TURBO PASCAL owes much of its syntactic style and
- elegance to his efforts and good judgement.
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 56
-
-
-
- Inside TURBO Pascal Unit Files
- ──────────────────────────────────────────────────────────────────────
-
- 13. REFERENCES
-
-
- [Borland], TURBO PASCAL FOR WINDOWS Programmer's Guide, Borland
- International, 1991.
-
- [Borland], TURBO ASSEMBLER REFERENCE GUIDE, Borland International,
- 1988.
-
- [Borland], TURBO ASSEMBLER USER'S GUIDE, Borland International, 1988.
-
- [Borland], TURBO PASCAL 6.0 PROGRAMMING GUIDE, Borland International,
- 1990.
-
- [Borland], TURBO PASCAL LIBRARY REFERENCE Version 6.0, Borland
- International, 1990.
-
- [Borland], TURBO PASCAL USER'S GUIDE Version 6.0, Borland
- International, 1990.
-
- [Brown], INTER191.ARC, Ralf Brown, 1991.
-
- [Intel85], iAPX 286 PROGRAMMER'S REFERENCE MANUAL INCLUDING THE iAPX
- 286 NUMERIC SUPPLEMENT, Intel Corporation, 1985, (order
- number 210498-003).
-
- [Intel89], 386 SX MICROPROCESSOR PROGRAMMER'S REFERENCE MANUAL, Intel
- Corporation, 1989, (order number 240331-001).
-
- [Nelson], THE 80386 BOOK: ASSEMBLY LANGUAGE PROGRAMMER'S GUIDE FOR
- THE 80386, Ross P. Nelson, Microsoft Press, 1988.
-
- [Scanlon], 80286 ASSEMBLY LANGUAGE ON MS-DOS COMPUTERS, Leo J.
- Scanlon, Brady, 1986.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ──────────────────────────────────────────────────────────────────────
- June 6, 1991 Page 57
-
-
-
- 14. INDEX
-
-
-
- .OBJ file........14, 35, 37, 39
- .RES file........39
- .TPL file........8, 16, 45, 46, 52
- .TPU
-   file...........7, 9, 13, 16, 27, 45, 46, 52, 55
-     size.........16
-   SYSTEM.........8, 18, 19, 21, 27, 49, 52
-
- {$E±}............54
- {$G±}............53
-
- 80286............54
- 80287............44, 54
- 80387............44
- 8087.............42, 44, 54
-
- Attribute
-   ABSOLUTE.......9
-   EXTERNAL.......24, 35
-
- Call Model
-   ASSEMBLER......24
-   Dynamic........24
-   FAR............24
-   INLINE.........24
-   INTERRUPT......24
- CONST............7, 13, 14, 15, 22, 31, 37, 40, 41, 42, 47
- Constraint.......33, 34
- CSeg.............7, 13, 14, 35, 36, 37, 39, 40, 41, 42, 47
-
- Defining block...37, 38
- Directive........14, 15, 16, 24, 35, 39, 40
- DLL..............7, 13, 38, 42
- DMT..............14, 15, 24, 31, 37
-
- Emulation........53
- Emulator.........42, 43
- External.........9, 35, 37, 42, 52
-
- FWAIT............54
-
- Granularity......53
-
- Hash.............13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 31, 48
-
- Include..........39, 40
- Interface........7, 13, 14, 15, 16, 17, 18, 19, 26
- Interrupt 07H....54
-
- Library..........45
- Locator
-   LG.............9, 12, 21, 23, 26, 27, 30, 31, 32, 33, 34
-   LL.............9, 13, 18, 26, 35
-   offset.........9, 11, 12, 22, 24, 25, 31, 35, 36, 40, 41, 42
-
- Method...........24
-   CONSTRUCTOR....24
-   DESTRUCTOR.....24
-   Self...........22
-
- Operand offset...42
-
- Parameter........20, 23, 25, 34
- PROC.............7, 13, 14, 24, 35, 36, 40, 42, 47
-
- RunError.........47
-
- SEGMENT..........42
- Signature........7, 26
- Stub.............9, 20, 23
-   sSxx...........24
- SYSTEM.TPS.......8, 19
-
- TPW..............45, 52
- TURBO............45, 52
- Type Descriptor..21, 23, 26, 27, 28, 30, 31, 32, 33, 34, 49
-
- VAR..............38, 47
- VMT..............14, 15, 25, 31, 37
-
- WIN87EM..........42, 44, 53, 54