Xref: sparky comp.benchmarks:1695 comp.arch.storage:775
Path: sparky!uunet!dtix!darwin.sura.net!sgiblab!sgigate!sgi!igor!jbass
From: jbass@igor.tamri.com (John Bass)
Newsgroups: comp.benchmarks,comp.arch.storage
Subject: Re: Disk performance issues, was IDE vs SCSI-2 using iozone
Message-ID: <1992Nov16.031850.9663@igor.tamri.com>
Date: 16 Nov 92 03:18:50 GMT
References: <36995@cbmvax.commodore.com> <1992Nov12.193308.20297@igor.tamri.com> <sc77t04@zuni.esd.sgi.com>
Organization: TOSHIBA America MRI, South San Francisco, CA
Lines: 135

Hi Dave ... long time no hear .... interesting that so many ex-Fortune
guys all ended up at SGI ...

>Doing it in the filesystem requires all apps that ever deal with
>the raw disk to do it also. Not a great idea. A number of apps,
>particularly database programs use the raw disk.

My point is that unmapped disks with visible bad blocks impose the
least possible CPU load, both on the slow microprocessor in the
drive and on the host filesystem/driver, when errors are handled
directly by the filesystem. Maybe bad blocks should be visible for raw
partitions as well, with databases (few in number) also understanding
not to use bad blocks. Translation overhead in the drive can represent
a significant loss in performance.

Filesystem management doesn't prevent driver mapping ... but driver
mapping does generate additional unwanted I/Os that the database
may not anticipate ... remapping a single sector in the middle of a
large request will generate 3 I/Os: the starting segment, the remap,
and the ending segment.
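
As a rough illustration, here is a minimal sketch in C (hypothetical
structures, not from any real driver) of how one forwarded sector in
the middle of a transfer becomes three physical I/Os:

    /*
     * Minimal sketch: a request for sectors [start, start+count)
     * containing one remapped sector 'bad', forwarded to 'spare',
     * becomes three physical I/Os instead of one.
     */
    struct io { unsigned long sector, count; };

    int split_around_remap(unsigned long start, unsigned long count,
                           unsigned long bad, unsigned long spare,
                           struct io out[3])
    {
        int n = 0;

        if (bad < start || bad >= start + count) {
            out[n].sector = start; out[n].count = count;
            return ++n;                     /* no remap: one I/O */
        }
        if (bad > start) {                  /* starting segment */
            out[n].sector = start; out[n].count = bad - start; n++;
        }
        out[n].sector = spare; out[n].count = 1; n++;   /* the remap */
        if (bad + 1 < start + count) {      /* ending segment */
            out[n].sector = bad + 1;
            out[n].count = (start + count) - (bad + 1); n++;
        }
        return n;
    }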

If the remapped sector sits under a frequently updated table element,
the write-backs will cause a significant and measurable performance
decrease without any indication of it to the user, database admin, or
systems staff. Like filesystems, databases often have key tables that
take 5-15% of the updates.

In addition, doing mapping in the driver requires a table lookup for
exceptions on every I/O request ... for 50-60 errors (not an unreasonable
number for a very large drive) this is often 150 usec or more of memory/CPU
time wasted per request. At 80 requests per second that is about 1.5%
of the CPU. Sure, the SGI machines have faster CPUs and memory, but this
is not a great tradeoff for 386 PCs. Some large drives have more than a
hundred errors. Sure, key vendors can order error-free drives for a
premium, but what about the rest of the market?
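
To put the per-request cost in concrete terms, here is a minimal
sketch of that exception lookup (hypothetical table layout, assumed
sorted by original sector number); even on a miss, the common case,
the search runs on every single I/O:

    /*
     * Minimal sketch of the per-request exception lookup.  A real
     * driver must also check the whole [sector, sector+count)
     * range, not just the first sector.
     */
    struct remap { unsigned long from, to; };

    unsigned long lookup(const struct remap *table, int n,
                         unsigned long sector)
    {
        int lo = 0, hi = n - 1;

        while (lo <= hi) {                  /* binary search */
            int mid = (lo + hi) / 2;
            if (table[mid].from == sector)
                return table[mid].to;       /* remapped: use spare */
            if (table[mid].from < sector)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return sector;                      /* not remapped */
    }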

Even with slip sectoring in the drive, error groups will overrun, and
the drive will do even slower table lookups and/or long seeks. Slip
sectoring on SCSI drives also makes the logical-to-physical math more
complex for every request ... on a slow micro this time adds up as well.
For an IDE drive that does geometry mapping it is also a problem.
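
Roughly, the translation a slipped-sector drive's micro has to run on
every request looks like the following (a hypothetical sketch; real
zoned layouts are messier):

    /*
     * Hypothetical sketch of slip-sector logical-to-physical
     * translation.  Each defect "slips" every later sector down by
     * one, so the physical address is the logical address plus the
     * number of defects at or before it -- another walk through a
     * defect list on every request.  defects[] holds physical
     * sector numbers, sorted ascending.
     */
    unsigned long log_to_phys(const unsigned long *defects, int ndefects,
                              unsigned long logical)
    {
        unsigned long phys = logical;
        int i;

        for (i = 0; i < ndefects; i++) {
            if (defects[i] > phys)
                break;          /* remaining defects are past us */
            phys++;             /* this defect pushes us down one */
        }
        return phys;
    }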

>
>| Nor would I like to be the customer who has his FAT or root inode over
>| a remapped sector.
>
>As I said above, I doubt this would be measurable, given any decent
>drive. Of course, there are always going to be poorly implemented
>drives, but you have to draw the line somewhere.

I'm sorry Dave .... but I have already seen this happen ... the spare in
the current cylinder was already used, and each access to the FAT did two
long seeks. Of course all that was needed was repartitioning the drive to
move the FAT .... but still, most customers don't understand the problem.

Now maybe there are better ways than remapping to the end of the drive ...
how do current drives handle more than n bad sectors in a zone? Do
they do a table lookup on every request?

>
>| > Yes, write-buffering does lose some error recovery chances, especially
>| >if there's no higher-level knowledge of possible write-buffering so
>| >filesystems can insert lock-points to retain consistency. However, it can be
>| >a vast speed improvement. It all depends on your (the user's) needs. Some
>| >can easily live with it, some can't, some need raid arrays and UPS's.
>|
>| It is only a vast speed improvement on single block filesystem designs ...
>| any design which combines requests into a single I/O will not see such
>| improvement ... log structured filesystems are a good modern example.
>| It certainly has no such effect for my current filesystem design.
>
>Unfortunately, by actual measurement, this is untrue. Even with
>write buffering turned on, errors can still be handled; it just
>requires the driver to keep the previous request around till the
>next request is done. This slows things down for sync writes to
>the point where it is actually a slight loss, but fortunately most
>writes are async (at least for unix, and presumably most modern
>OS's). Command queuing with SCSI2 has similar issues.

For SCSI this is true with tagged queuing ... for IDE and many caching
controllers, enabling write-behind leaves you entirely blind to the error,
and it can be more than one request back.
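
For the SCSI case, here is a minimal sketch of the scheme Dave
describes, with hypothetical stubbed driver hooks (not any actual
API); the key is that an error surfacing on request N really belongs
to request N-1:

    /*
     * Minimal sketch of keeping the previous request around for
     * write-buffered error recovery.  start_io(), deferred_error(),
     * and retry() are stand-in stubs for real driver hooks.  On IDE
     * and many caching controllers there is no deferred error
     * status to poll -- which is the problem.
     */
    struct wreq { unsigned long sector, count; };

    static void start_io(struct wreq *r)  { (void)r; /* queue write to drive */ }
    static int  deferred_error(void)      { return 0; /* prior write failed? */ }
    static void retry(struct wreq *r)     { (void)r; /* re-issue failed write */ }

    static struct wreq prev;      /* last buffered write, held for retry */
    static int have_prev;

    void issue_write(struct wreq *req)
    {
        start_io(req);

        /* An error surfacing now belongs to the *previous* write. */
        if (have_prev && deferred_error())
            retry(&prev);

        prev = *req;              /* hold until the next request is done */
        have_prev = 1;
    }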

>| that be at Fortune Systems ... we put it in ... only to take it out when
>| Field Service grew tired of the bad block tables overflowing and taking a
>| big loss on good drives being returned due to "excessive bad blocks" as
>| the result of normal (or abnormal) soft error rates due to other factors.
>
>Yes, I remember this. I've seen similar things happen with SCSI disks
>that implemented the dynamic mapping poorly. I've reached the confidence
>point with current drives that SGI ships (i.e., that we have done
>qualification on, including numerous discussions with drive firmware
>vendors), that I enable the dynamic mapping. I very rarely see false
>remappings. Technology has advanced some over the years.

SGI ships top-of-the-line hardware ... including power supplies ... with
cheap PCs this is seldom true, and I've seen two customer failures along
this line in the last year. Most PC power supplies are worse about meeting
12V regulation under load than the Fortune Zeniths were. While high
end technology has advanced, low end power technology hasn't, and
automatic remapping will have the customer buy several new drives
before a tech will fix the supply or configuration. For a number of
applications each downtime can cost more than the whole machine
is worth.

>
>| Write buffering requires automatic remapping ... A good filesystem design
>
>No it doesn't. All it requires is error recovery code in the host.
>Any host OS that enables writebuffering without that error recovery
>code is either asking for trouble, or poorly written. We provide the

For SCSI, true ... but for IDE and caching controllers, enabling write
buffering leaves you blind to errors without automatic remapping.

>Sorry, but I have to disagree with you here. The greatest benefit
>from write buffering is when you have disjoint blocks that can't
>be joined into a single write (SGI does write combining in the
>filesystem code, and I have many measurements that confirm this),
>as it allows the OS to overlap the setup for the next write with
>the drive doing the previous write. Even on fast systems, this
>can be significant. Sure, single block sequential writes will
>be helped as well, but as you say, this should not normally occur
>in a well implemented system.

I agree that for SCSI this represents a gain, due to the command decode
timings of typical drives ... for IDE the amount of overlap is nil, and
not worth the implied error risks.


... well, I think we have beaten this issue to death .... enough for me
unless someone brings something really interesting to the table.

have fun ...

John Bass, Sr. Engineer, DMS Design (415) 615-6706
UNIX Consultant	Development, Porting, Performance by Design