-
-
-
-
-
- ╔══════════════════════════════════════════════════════════════╗
- ║ ║
- ║ The Logical Structure, Organization, ║
- ║ and Management of Hard Disk Drives ║
- ║ ║
- ║ by ║
- ║ Steve Gibson ║
- ║ GIBSON RESEARCH CORPORATION ║
- ║ ║
- ║ Portions of this text originally appeared in Steve's ║
- ║ InfoWorld Magazine TechTalk Column. ║
- ║ ║
- ╚══════════════════════════════════════════════════════════════╝
-
-
-
- As our operating systems and application software have continued
- to grow in size, their memory requirements have increased
- steadily. A vital memory in our system is hard disk storage.
-
- Bound within the hard disk's structure lie the answers to
- questions like: What is a low level format? What does FDISK do?
- What is a hard disk partition and why does DOS limit us to 32
- megabytes in a partition? What does it mean to have "lost
- cluster chains" or "cross-linked files?" What does it mean to
- have our disks "defragmented?" Let's explore MS-DOS and PC-DOS
- hard disk organization to answer these questions and others.
-
- The first stage in preparing any hard disk for operation is
- known as low level formatting. Low level formatting takes any
- hard disk from its virgin "fresh from the factory" state and
- prepares it for operation with a particular hard disk
- controller and computer system.
-
- Low level formatting divides each circular track into equal size
- SECTORS by placing SECTOR ID HEADERS at uniform positions around
- each track. The start of a sector ID is marked with a special
- magnetic pattern which cannot be generated by normal recorded
- data. This ADDRESS MARK allows the beginning of each sector to
- be uniquely discriminated from all recorded data.
-
- The sector ID information, which immediately follows the address
- mark, contains each sector's Cylinder, Head, and Sector number.
- Together these form an address that is unique to each sector on
- the disk. When the hard disk controller is later reading or
- writing these disk sectors, it compares the sector's pre-recorded
- cylinder number to make sure that the heads haven't "mis-stepped"
- and that they're flying over the proper cylinder. It then
- compares the head number to verify that unreliable cabling is not
- causing an improper head to be selected, and waits for the proper
- sector to start by comparing each pre-recorded sector number as
- it passes by with the sector number for which it is searching.
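-
- The three comparisons just described can be sketched in a few
- lines of Python. This is only an illustration of the logic: the
- names SectorID and check_sector_id are invented here, and a real
- controller performs these checks in hardware as the disk spins.
-
```python
from typing import NamedTuple


class SectorID(NamedTuple):
    """Hypothetical model of the ID header recorded ahead of each sector."""
    cylinder: int
    head: int
    sector: int


def check_sector_id(found: SectorID, wanted: SectorID) -> str:
    """Compare a pre-recorded sector ID with the address being sought."""
    if found.cylinder != wanted.cylinder:
        return "wrong cylinder"  # the heads have mis-stepped
    if found.head != wanted.head:
        return "wrong head"      # an improper head has been selected
    if found.sector != wanted.sector:
        return "wait"            # right track, keep waiting for the sector
    return "match"


print(check_sector_id(SectorID(10, 2, 4), SectorID(10, 2, 5)))  # wait
```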
-
- Since many hard disk surfaces are not flawless, low level
- formatting programs include a means for entering the hard disk
- drive's defect list. The defect list specifies tracks (by
- cylinder and head number) that the manufacturer's sensitive drive
- certification equipment found to stray from the norm, indicating
- some form of physical flaw that might prevent data from being
- reliably written and read. The list of such defects is typically
- printed and attached to the outside of the drive.
-
- When these tracks are entered into the low level formatter, the
- defective tracks receive a special code in their sector ID
- headers which indicates that the track has been flagged as bad
- and cannot be used for any data storage. Later, as we shall see,
- high level formatting moves this defective track information
- into the system's File Allocation Table (FAT) to prevent the
- operating system from allocating files within these defective
- regions.
-
- When the low level format has been established, we have a
- completely empty drive, devoid of stored information, which can
- accept and retrieve data with the specification of any valid
- cylinder, head, and sector number.
-
- There's an important issue about the low level formatting of a
- hard disk which is frequently overlooked, but which can be quite
- important to appreciate. Since the hard disk controller works in
- intimate concert with its hard disk drive to transfer the data
- within its numbered sectors to and from the computer's memory,
- the exact details of the address mark, sector ID header, and
- rotational sector timing can be completely arbitrary for any
- controller and drive. Since these details are initially
- established when the drive receives its low level formatting,
- they are forever hence agreed upon by both the hard disk drive
- and the controller. But more importantly, there's absolutely no
- reason to assume that the relatively arbitrary low level
- formatting specifics used by any particular hard disk controller
- would be compatible with any other model of hard disk
- controller.
-
- In practice this means that differing makes or models of hard
- disk controllers are completely unable to read, write, or
- interpret the formatted information created by any other make or
- model of controller. Consequently, whenever it is
- necessary or desirable to exchange hard disk controllers, a
- complete backup of the hard disk's data, while attached to the
- initial controller, MUST BE followed by creating a new low level
- format with the new controller on the drive before any of the
- backed-up information can be restored to the drive with the new
- controller.
-
- So we've given our drives a low level format, and we've seen that
- it is this process which first establishes "communication"
- between a hard disk and its controller by creating 512-byte
- "sectors" where none existed before. Now let's take up the next
- phase of hard disk structuring: The hard disk PARTITION.
-
- The notion of hard disk (or "fixed disk" as IBM calls them)
- partitions was created to allow a hard disk based computer
- system to contain and "boot up" several completely different
- operating systems. Partitioning divides a single physical hard
- disk into multiple LOGICAL partitions.
-
- A birthday cake is divided into multiple pieces by slicing it
- radially, whereas a hard disk is divided into concentric rings,
- each made up of a range of cylinders. For example, a drive's
- first partition might extend from cylinder zero through 299 with
- the second partition beginning on cylinder 300 and extending
- through 599. This concentric partitioning is far more efficient
- since it minimizes the disk head travel when moving within a
- single partition.
-
- The partitions on a drive, even if there's only one, are managed
- by a special sector called the PARTITION TABLE which is located
- at the very beginning of every hard disk. It defines the
- starting and ending locations for each of the disk's partitions
- and specifies which of the partitions is to gain control of the
- system during system boot up. When the hard disk drive is booted,
- a tiny program at the beginning of the partition table locates
- the partition which is flagged as being the "bootable partition"
- in the table and executes the program located in the first
- sector, the "boot sector," of that partition. This boot sector
- loads the balance of the partition's operating system, then
- transfers control to it.
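-
- Here is a rough Python sketch of how that tiny program's search
- for the bootable partition might look. The byte offsets used
- (four 16-byte entries beginning at byte 446, a 0x80 "bootable"
- flag, and a 55 AA signature in the last two bytes) follow the
- conventional PC partition sector layout. They are not spelled out
- in this text, so treat them as assumptions of the sketch, along
- with the function name find_bootable.
-
```python
import struct

ENTRY_OFFSET = 446  # assumed: partition entries follow the boot code
ENTRY_SIZE = 16     # assumed: each of the four entries is 16 bytes
BOOT_FLAG = 0x80    # assumed: marks the "bootable partition"


def find_bootable(sector: bytes):
    """Return (index, first_sector, sector_count) of the bootable entry.

    Returns None when no entry is flagged as bootable.
    """
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid partition table sector")
    for index in range(4):
        start = ENTRY_OFFSET + index * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # flag, starting C/H/S, type, ending C/H/S, first sector, count
        flag, _chs_first, kind, _chs_last, first, count = struct.unpack(
            "<B3sB3sII", entry)
        if flag == BOOT_FLAG and kind != 0:
            return index, first, count
    return None
```
-
- A real boot program would go on to read the partition's first
- sector into memory and jump to it, which is beyond what a Python
- sketch can show.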
-
- Each partition on a hard disk is blind to the existence of any
- other. By universal agreement, the operation of software inside
- a partition is completely contained within the bounds of the
- partition. Adherence to this agreement prevents multiple
- operating systems from colliding and allows strange environments
- to cohabitate on a single hard disk.
-
- The sectors within a partition are numbered sequentially
- starting at zero and extending to the end of the partition. In
- kind with DOS's original belief that 640K of RAM would be more
- than we'd EVER need, there was a time in the not-so-distant past
- when a ten megabyte hard disk was an unheard of luxury and was
- considered huge. How could any single person ever fill up 10
- megabytes? No way.
-
- Consequently DOS was designed to access sectors within its hard
- disk partition with a single sixteen-bit quantity. One "word"
- was set aside for the specification of partition sectors. As
- many of you know, a single sixteen-bit binary word can represent
- values from 0 through 65,535. So this limited a partition's
- total sector count to 65,536. Since hard disk sectors are 512
- bytes long, a partition could contain 33,554,432 bytes. When you
- remember that binary megabytes are really 1,048,576 bytes each,
- that's exactly 32 megabytes.
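-
- The arithmetic is easy to check for yourself. The variable names
- below are just labels chosen for this example:
-
```python
SECTOR_BYTES = 512           # bytes in one hard disk sector
MAX_SECTORS = 2 ** 16        # sector numbers 0 through 65,535
BINARY_MEGABYTE = 1_048_576  # 2 ** 20 bytes

partition_bytes = MAX_SECTORS * SECTOR_BYTES

print(partition_bytes)                     # 33554432
print(partition_bytes // BINARY_MEGABYTE)  # 32
```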
-
- This is the origin of DOS's infamous 32 megabyte barrier. Today
- of course we have affordable drives with capacities well
- exceeding DOS's 32 megabyte limit. The industry has invented
- three solutions to this partition size dilemma.
-
- The first solution invented to the partition size problem
- utilizes DOS's inherent extendibility with external device
- drivers. Programs such as OnTrack's DISK MANAGER, Storage
- Dimensions' SPEEDSTOR, and Golden Bow's VFEATURE DELUXE utilize a
- clever trick to circumvent the 32 megabyte DOS limit: They trick
- DOS into believing that sectors are larger than 512 bytes! By
- interposing themselves between DOS and the hard disk, these
- partitioning device drivers lead DOS to believe that individual
- sectors are much larger than they really are. Then when DOS asks
- for one "logical" 4k-byte sector they hand DOS eight 512-byte
- physical sectors. This transforms the 65,536 sector count limit
- into a single partition containing more than 268 million bytes
- (256 binary megabytes)!
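-
- A small Python sketch shows the idea behind the trick. The
- function name physical_sectors is made up for this example, and
- the commercial drivers named above surely handle many details
- that are ignored here:
-
```python
PHYSICAL_BYTES = 512   # what the drive really stores per sector
LOGICAL_BYTES = 4096   # what DOS is led to believe
RATIO = LOGICAL_BYTES // PHYSICAL_BYTES  # 8 physical per logical


def physical_sectors(logical_sector: int) -> range:
    """Return the physical sectors that back one logical sector."""
    first = logical_sector * RATIO
    return range(first, first + RATIO)


max_bytes = 65_536 * LOGICAL_BYTES

print(list(physical_sectors(3)))  # [24, 25, 26, 27, 28, 29, 30, 31]
print(max_bytes)                  # 268435456
```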
-
- The second solution was introduced by IBM's PC-DOS 3.3 operating
- system with its ability to allow DOS to have simultaneous access
- to multiple logical partitions on a single drive. With DOS 3.3,
- the standard FDISK command can establish any number of
- 32-megabyte or smaller partitions on a drive. While this doesn't
- create a single unified huge partition, it also doesn't require
- any external resident device drivers.
-
- The final solution has recently been introduced by Compaq
- Computer with their introduction of DOS 3.31. Being big enough
- to get away with sacrificing some software compatibility, Compaq
- has redefined the way DOS numbers its partition sectors thereby
- removing the limitation at its source.
-
- So now our hard disks have a low level format, with
- "addressability" to the disk's individual physical sectors
- established. We have also defined and established partitions on
- our drive, which gives DOS a sub-range of the hard disk within
- which to build its filing system. Now let's examine the
- structure of MS-/PC-DOS filing systems. The following discussion
- also applies to DOS diskettes which aren't partitioned but
- otherwise have an identical structure.
-
- Let's begin by looking at the problem that DOS's filing system
- solves: Its task is to allow us, through the vehicle of DOS
- application programs, to create named collections of bytes of
- data, called files, and to help with their management by
- providing directories of these named files.
-
- The directory entry for any DOS file contains the file's name
- and extension, the date and time when the file was last written
- and closed, an assortment of Yes/No "attributes" which indicate
- whether the file has been modified since last backup, whether it
- can be written to, whether it's even visible in the directory,
- etc. The directory entry for the file also contains the address
- of the start of the file.
-
- We already know that hard disks are divided into numbered
- sectors 512 bytes in length. Since most of the files DOS manages
- are much larger than a single sector, disk space is allocated in
- "clumps" of sectors called clusters. Various versions of DOS
- utilize clusters of 4, 8 or 16 sectors each, or 2048, 4096, or
- 8192 bytes in length.
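-
- The cluster sizes follow directly from the 512-byte sector. The
- sketch below also works out how many whole clusters a file of a
- given length would occupy, which is an extra step not discussed
- in the text. Both function names are invented for the example:
-
```python
SECTOR_BYTES = 512


def cluster_bytes(sectors_per_cluster: int) -> int:
    """Size of one cluster in bytes."""
    return sectors_per_cluster * SECTOR_BYTES


def clusters_needed(file_bytes: int, sectors_per_cluster: int) -> int:
    """Whole clusters needed to hold a file, rounding up."""
    size = cluster_bytes(sectors_per_cluster)
    return (file_bytes + size - 1) // size


for sectors in (4, 8, 16):
    print(sectors, cluster_bytes(sectors))  # 2048, 4096, then 8192 bytes

print(clusters_needed(5000, 4))  # 3
```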
-
- When a hard disk is completely empty, its clusters of sectors
- are all available for storing file data. As files are created
- and deleted on the hard disk, a bookkeeping system is needed
- which keeps track of which clusters are in use by which existing
- files, and which clusters are still available for allocation to
- new or growing files. This is the vital role played by the File
- Allocation Table. The "FAT," as it's frequently called, is the
- table DOS uses to manage the allocation of space on the hard
- disk.
-
- As we know, the hard disk is arranged as a long stream of
- sectors. After being clumped together into clusters, it can be
- viewed as a long stream of clusters. Now picture a table
- consisting of a long stream of entries, with one entry in the
- table for each cluster on the disk. The first FAT table entry
- corresponds to the first hard disk cluster, and the last FAT
- entry corresponds to the last hard disk cluster.
-
- Now imagine that DOS needs to create a new text or spreadsheet
- file for us. It must first find a free cluster on the hard disk,
- so it searches through the File Allocation Table looking for an
- empty FAT table entry, which corresponds to an empty hard disk
- cluster. When DOS finds the empty table entry it memorizes its
- number, then places a special "end of chain" marker in the FAT
- entry to show that this cluster has been allocated and is no
- longer free for use. DOS then goes out to the sectors which
- comprise this cluster and writes the file's new data there.
-
- This is all great until the file grows longer than a single
- cluster of sectors. DOS now needs to allocate a second cluster
- for this file. So it once again searches through the File
- Allocation Table for a free cluster. When found, it again places
- the special "end of chain" marker in this cluster's FAT entry and
- memorizes its number.
-
- Now things begin to get interesting... and just a little bit
- tricky. Since files might be really long, consisting of
- thousands of individually allocated clusters, there's no way for
- DOS to memorize all of the clusters used by each file. So DOS
- uses each File Allocation Table entry to store the number of the
- file's next cluster!
-
- Following along with our example, after finding and allocating
- the second cluster for the growing file, DOS goes back to the
- first cluster's FAT entry where it had placed that first "end of
- chain" marker and replaces it with the number of the file's
- second cluster. If a third cluster were then needed, its FAT
- entry would be marked "not available" by placing the special
- "end of chain" marker in it, then this third cluster number
- would be placed into the second cluster's FAT entry. Get it?
-
- This creates a "chain" of clusters with each cluster entry
- pointing to the next one, and the last one containing a special
- "end of chain" entry which signals that the end of the file's
- allocation chain has been reached.
-
- Finally, when the file is "closed," an entry is created in a DOS
- directory which names the file and contains the number of the
- file's first cluster. Then, using that first cluster's FAT
- entry, the entire allocation "chain" can be "traversed" to find
- the clusters which contain the file's data.
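-
- The whole scheme can be modeled with a plain Python list standing
- in for the FAT. This is a simplified sketch, not DOS's actual
- code: the markers FREE and END_OF_CHAIN are stand-ins chosen for
- the example, and the function names are hypothetical.
-
```python
FREE = None        # stand-in for an unused FAT entry
END_OF_CHAIN = -1  # stand-in for the special "end of chain" marker


def allocate_cluster(fat, last=None):
    """Claim the first free cluster and link it after cluster `last`."""
    for number, entry in enumerate(fat):
        if entry is FREE:
            fat[number] = END_OF_CHAIN  # no longer free for use
            if last is not None:
                fat[last] = number      # old end of chain now points here
            return number
    raise RuntimeError("disk full")


def follow_chain(fat, first):
    """Traverse a file's allocation chain starting at its first cluster."""
    clusters = []
    current = first
    while current != END_OF_CHAIN:
        if current is FREE or len(clusters) > len(fat):
            raise ValueError("damaged allocation chain")
        clusters.append(current)
        current = fat[current]
    return clusters


fat = [FREE] * 8
fat[1] = END_OF_CHAIN  # cluster 1 already belongs to some other file
first = allocate_cluster(fat)
second = allocate_cluster(fat, first)
third = allocate_cluster(fat, second)
print(follow_chain(fat, first))  # [0, 2, 3]
```
-
- Only the number held in `first` needs to be kept in the file's
- directory entry. Everything else is recovered by walking the
- table.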
-
- So now let's do a bit of review....
-
- The allocation of file space within a DOS partition is recorded
- and maintained within DOS's File Allocation Tables (FATs). The
- FATs make up a map of the utilization of space on any floppy or
- hard disk with one entry in the FAT for each allocatable cluster
- of sectors. Each entry in the FAT can indicate one of four
- possible conditions for the clusters of sectors it represents:
- It can be unused and available for allocation, unused and marked
- as bad to prevent its use, in use and pointing to the next
- cluster of the file, or in use as the last cluster of a file.
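-
- As an illustration, here is how the four conditions might be
- told apart in a 16-bit FAT. The specific codes (0000 for free,
- FFF7 for bad, FFF8 through FFFF for the last cluster) are the
- conventional 16-bit values and are not given in this text, so
- treat them as assumptions. 12-bit FATs use shorter codes. The
- function name describe_entry is invented for the example.
-
```python
def describe_entry(value: int) -> str:
    """Classify one 16-bit FAT entry (assumed conventional codes)."""
    if value == 0x0000:
        return "unused and available"
    if value == 0xFFF7:
        return "unused and marked bad"
    if 0xFFF8 <= value <= 0xFFFF:
        return "in use, last cluster of a file"
    if 0x0002 <= value <= 0xFFEF:
        return f"in use, next cluster is {value}"
    return "reserved value"


for entry in (0x0000, 0xFFF7, 0x0123, 0xFFFF):
    print(hex(entry), describe_entry(entry))
```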
-
- If each entry in the FAT points to the next, who points to the
- first entry? This is the role of the file's directory entry. It
- contains the name of the file, the file's exact length, the time
- and date of the file's last modification, file attribute flags,
- and the identity of the file's first cluster. In a sense, a file's
- directory entry forms the head of the file's allocation chain
- with each link thereafter pointing to the next link in the
- chain.
-
- This system, while quite workable and efficient, does have its
- dangers. These dangers center around the fact that the FAT
- contains the ONLY record of disk space utilization and a
- stubborn failure to correctly read a single sector of the FAT
- could render hundreds of files unrecoverable. This danger
- explains the popularity of several utility programs which create
- a back-up copy of the File Allocation Table and Root Directory
- with each system boot-up. They provide some hope of recovery
- from the cataclysmic loss of the FAT's data.
-
- The original designers of DOS were aware of the importance of
- the FAT and do provide a duplicate copy immediately following
- the first, but its physical proximity to the original renders it
- little better than none, and DOS has long been notorious for
- failing to intelligently utilize this extra copy of FAT
- information even in the event of a primary FAT failure. (DOS 3.3
- seems to be much smarter in this regard.)
-
- Important as FAT reliability is, it's not generally the prime
- source of DOS file corruption, since even with perfect data
- retrieval, it's still possible to scramble DOS's files like
- crazy. The primary causes of DOS file system troubles are user
- error, program bugs, and "glitches." The advent of TSR "rule
- breaking" resident multitasking-style software has further
- complicated the scene.
-
- When a new file is created or "opened," information about it is
- maintained inside DOS. The file's name, status, and first
- cluster are all held in internal tables. Then, as the file
- grows, free clusters are "checked out" of the File Allocation
- Table and allocated to the file's chain of clusters.
-
- Now here's the crucial fact which causes so much trouble: No
- matter how big the newly created file becomes, a directory entry
- for the file is ONLY created when the file is finally and
- properly CLOSED. Until then the file exists only as a chain of
- allocated clusters filled with the file's data. If anything
- occurs to prevent the error-free closing of this file we have a
- real problem because the file's data is occupying a chain of
- "checked out" disk clusters, but there is no anchoring directory
- entry to point to the first cluster in the chain!
-
- A chain of clusters without an anchoring directory entry is
- called a "lost chain." It exists, it contains data, but there's
- no record of the file's name, exact size, or purpose.
-
- Lost cluster chains are frequently created when programs abort
- abnormally, when TSRs crash the system suddenly, when the
- computer user forgets to write a TSR's files out to disk before
- shutting the system down, or when a task in a multi-tasking
- system is not terminated. (It's easy to forget that a file was
- left open in a suspended background task.) Additionally, any
- damage to DOS's root directory or subdirectories can "liberate"
- chains of lost clusters.
-
- DOS provides the CHKDSK (pronounced Check Disk) command to help
- its users keep an eye on just these sorts of problems. CHKDSK
- provides a comprehensive verification of DOS's filing system
- integrity and provides a means for straightening things out.
- When the CHKDSK command is given, the parentage of all cluster
- chains is checked, allocation chains are "followed" to be sure
- they don't cross over other chains (creating cross-linked
- files), and several other system integrity checks are performed.
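-
- A toy version of two of these checks can be written in Python.
- This is not how CHKDSK is actually implemented. It is a minimal
- sketch over a list-based FAT model, with FREE, END_OF_CHAIN, and
- check_fat all being names invented for the example. Bad-cluster
- marks and subdirectories are ignored.
-
```python
FREE = None        # stand-in for an unused FAT entry
END_OF_CHAIN = -1  # stand-in for the "end of chain" marker


def check_fat(fat, directory):
    """Return (first clusters of lost chains, cross-linked clusters).

    `directory` maps each file name to its first cluster number.
    """
    owner = {}            # cluster number -> file that claimed it first
    cross_linked = set()
    for name, first in directory.items():
        seen = set()
        current = first
        while current not in (END_OF_CHAIN, FREE) and current not in seen:
            seen.add(current)
            if current in owner:
                cross_linked.add(current)  # two chains meet here
            else:
                owner[current] = name
            current = fat[current]
    # Allocated clusters that no directory entry leads to are "lost".
    lost = {c for c, entry in enumerate(fat)
            if entry is not FREE and c not in owner}
    pointed_to = {fat[c] for c in lost}
    return sorted(lost - pointed_to), sorted(cross_linked)


# File A: 0 -> 1. File B: 2 -> 1 (cross-linked). Lost chain: 5 -> 6.
fat = [1, END_OF_CHAIN, 1, FREE, FREE, 6, END_OF_CHAIN, FREE]
print(check_fat(fat, {"A": 0, "B": 2}))  # ([5], [1])
```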
-
- In the case of lost chains, CHKDSK will offer to convert these
- into files by anchoring them to the root directory. Then any
- suitable text editor can be used to open these new files for the
- sake of identifying them and moving them back to where they
- belong.
-
- Unfortunately the structure of DOS filing systems lacks the
- fundamental redundancy required to provide simple and error-free
- recovery from many forms of damage. Even the tools and
- techniques available from third party suppliers can't surmount
- these problems. The best bet is to understand DOS's weak spots,
- make certain that all opened files are closed successfully,
- perform a weekly CHKDSK command to collect accumulating file
- fragment "debris" and back up your hard disks regularly.
-
- "Disk Optimizers" which promise to increase the throughput and
- performance of old and well used hard disk drives number among
- the most popular of the general use hard disk utilities.
-
- We've seen how DOS's file allocation system operates. Files are
- composed of clusters which in turn are composed of sectors. And
- while the sectors which make up a cluster are by
- definition contiguous, the cluster linking scheme which DOS
- employs allows a file's clusters to be scattered across the
- disk's surface. Since the file's directory entry specifies the
- file's first cluster, and each succeeding cluster entry in the
- file allocation table specifies the next one, the file's
- contents could be literally anywhere on the disk. The term "file
- fragmentation" refers to the condition where a file's clusters
- are not consecutively numbered. Let's first examine how a disk's
- files might become fragmented.
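-
- By that definition, counting a file's fragments is just a matter
- of counting the breaks in its list of cluster numbers. A minimal
- sketch, with count_fragments being a name chosen for the example:
-
```python
def count_fragments(chain):
    """Count runs of consecutively numbered clusters in a chain."""
    if not chain:
        return 0
    fragments = 1
    for previous, current in zip(chain, chain[1:]):
        if current != previous + 1:
            fragments += 1  # a break in the numbering starts a new run
    return fragments


print(count_fragments([5, 6, 7]))          # 1
print(count_fragments([5, 6, 20, 21, 9]))  # 3
```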
-
- When a file is deleted from a disk, its directory entry is
- flagged as unused and each cluster which the file occupied is
- flagged in the system's FAT as being free for use. If the
- surrounding clusters are still in use by other files, this
- creates a "hole" of free space in the disk.
-
- Now suppose that a new file is copied from a floppy disk onto
- the hard disk. As DOS reads the new file's data from the floppy,
- it must allocate space for this file on the hard disk. So each
- time another cluster of sectors is needed, DOS searches through
- the file allocation table to find the next available cluster. In
- our example, DOS would discover the clusters which had been
- freed by the first file we deleted and allocate them for use by
- the new file. Then, when all of the clusters in the free space
- hole had been used, DOS would be forced to continue its search
- deeper into the drive. When space was found further in, the
- file's contents would be partially stored near the beginning of
- the disk and partially nearer to the end. The file would then
- consist of at least two fragments.
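-
- The example can be played out on a small model of the FAT. The
- sketch assumes that the search for free clusters always starts
- from the beginning of the table, which keeps the code short but
- may not match every version of DOS. FREE, END_OF_CHAIN, and
- write_file are names invented for the illustration.
-
```python
FREE = None        # stand-in for an unused FAT entry
END_OF_CHAIN = -1  # stand-in for the "end of chain" marker


def write_file(fat, cluster_count):
    """Allocate clusters for a new file, lowest-numbered free ones first."""
    free = [n for n, entry in enumerate(fat) if entry is FREE]
    free = free[:cluster_count]
    if len(free) < cluster_count:
        raise RuntimeError("disk full")
    for current, following in zip(free, free[1:]):
        fat[current] = following
    if free:
        fat[free[-1]] = END_OF_CHAIN
    return free


# Clusters 0-2 and 5-8 hold existing files. Clusters 3-4 are the
# "hole" left by a deleted file. Clusters 9-11 have never been used.
fat = [1, 2, END_OF_CHAIN, FREE, FREE,
       6, 7, 8, END_OF_CHAIN, FREE, FREE, FREE]

print(write_file(fat, 4))  # [3, 4, 9, 10]
```
-
- The new four-cluster file fills the two-cluster hole and then
- jumps to cluster 9, leaving it in two fragments.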
-
- During the normal course of daily computer usage, many files are
- being constantly created, copied, extended, deleted, and
- replaced. When a word processor creates an automatic backup file,
- the original file is typically renamed to identify it as a
- backup file and a new file is created. Every new file creation
- is an opportunity for fragmentation. The files which are being
- modified most often are most subject to extensive fragmentation
- since any search by DOS for a free file cluster is almost
- guaranteed to produce a new discontinuity. With continued use,
- it's typical for much of the disk's file data to become
- haphazardly scattered across the surface of the disk drive.
-
- But since DOS's cluster allocation scheme was specifically
- designed to manage such scattering, what's the problem? Any time
- the drive's head moves, two things occur: Time is consumed, and
- the drive experiences some mechanical wear and tear. If a file's
- data is scattered across the surface of the disk, the drive's
- head is forced to move a large distance many times to read a
- single file. If the file is a database whose records are being
- accessed at random, this excessive head motion can degrade the
- overall system performance tremendously and induce many other
- wear-related disk drive problems.
-
- The extra time wasted in cluster fragment chasing is directly
- proportional to the drive's average head access time. The prior
- generation of 65 to 80 millisecond stepping motor drives loses
- far more performance to fragmentation than the latest sub-28
- millisecond drives.
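-
- As a rough illustration of that proportionality, suppose that
- reading a badly fragmented file costs 100 extra head movements.
- The figure of 100 is arbitrary and chosen only for the example,
- as is the function name:
-
```python
def fragmentation_penalty_ms(extra_seeks, average_access_ms):
    """Extra time spent chasing fragments, in milliseconds."""
    return extra_seeks * average_access_ms


print(fragmentation_penalty_ms(100, 65))  # 6500
print(fragmentation_penalty_ms(100, 28))  # 2800
```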
-
- Disk optimizers like SoftLogic Solutions' DISK OPTIMIZER,
- Norton's SPEEDDISK, Central Point's COMPRESS, and Golden Bow's
- VOPT operate by physically rearranging the allocation of files
- on the disk. They relocate file cluster fragments while
- simultaneously updating the system's File Allocation Tables to
- reflect the new cluster locations. When finished, every file on
- the disk consists of a single contiguous run of consecutively
- numbered clusters. Once the disk drive's head has been
- positioned to the beginning of the file, the entire file can be
- read or randomly accessed with an absolute minimum of head
- motion. Besides improving the system's overall performance, file
- defragmentation minimizes the mechanical wear and tear placed
- upon the drive's hardware. If some disaster should befall your
- system's Root Directory or File Allocation Table, contiguous
- files are also much easier to find and recover than files with
- severe fragmentation.
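-
- The end result of an optimizer's work can be sketched on a model
- disk. Real optimizers shuffle clusters in place on a live disk
- and must cope with bad clusters and unmovable files. This sketch
- simply builds a fresh, contiguous layout from a consistent FAT,
- and every name in it is invented for the illustration.
-
```python
FREE = None        # stand-in for an unused FAT entry
END_OF_CHAIN = -1  # stand-in for the "end of chain" marker


def defragment(fat, directory, data):
    """Return (new_fat, new_directory, new_data) with contiguous files.

    `directory` maps file names to first clusters, and `data` holds
    the contents of each cluster.
    """
    new_fat = [FREE] * len(fat)
    new_data = [None] * len(data)
    new_directory = {}
    next_free = 0
    for name, first in directory.items():
        # Walk the file's old, possibly scattered, chain.
        chain = []
        current = first
        while current != END_OF_CHAIN:
            chain.append(current)
            current = fat[current]
        # Lay the file down as one consecutive run of clusters.
        new_directory[name] = next_free
        for offset, old in enumerate(chain):
            new = next_free + offset
            new_data[new] = data[old]
            is_last = offset == len(chain) - 1
            new_fat[new] = END_OF_CHAIN if is_last else new + 1
        next_free += len(chain)
    return new_fat, new_directory, new_data


# File A is scattered as 0 -> 4 -> 3 -> 2 and file B as 5 -> 1.
fat = [4, END_OF_CHAIN, END_OF_CHAIN, 2, 3, 1]
data = ["a0", "b1", "a3", "a2", "a1", "b0"]
new_fat, new_directory, new_data = defragment(fat, {"A": 0, "B": 5}, data)
print(new_directory)  # {'A': 0, 'B': 4}
print(new_data)       # ['a0', 'a1', 'a2', 'a3', 'b0', 'b1']
```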
-
- Since file fragmentation is a continually occurring fact of
- living with DOS, periodic defragmentation, like hard disk
- backup, should become part of every serious DOS user's regimen.
-
- - The End -
-
-
- Copyright (c) 1989 by Steven M. Gibson
- Laguna Hills, CA 92653
- **ALL RIGHTS RESERVED **