NetNews Usenet Archive 1992 #27


Internet Message Format | 1992-11-15 | 7.4 KB

Xref: sparky comp.benchmarks:1695 comp.arch.storage:775
Path: sparky!uunet!dtix!darwin.sura.net!sgiblab!sgigate!sgi!igor!jbass
From: jbass@igor.tamri.com (John Bass)
Newsgroups: comp.benchmarks,comp.arch.storage
Subject: Re: Disk performance issues, was IDE vs SCSI-2 using iozone
Message-ID: <1992Nov16.031850.9663@igor.tamri.com>
Date: 16 Nov 92 03:18:50 GMT
References: <36995@cbmvax.commodore.com> <1992Nov12.193308.20297@igor.tamri.com> <sc77t04@zuni.esd.sgi.com>
Organization: TOSHIBA America MRI, South San Francisco, CA
Lines: 135

Hi Dave ... long time no hear .... interesting that so many Ex-Fortune guys
all ended up at SGI ...

>Doing it in the filesystem requires all apps that ever deal with
>the raw disk to do it also. Not a great idea. A number of apps,
>particularly database programs use the raw disk.

My point is that unmapped disks with visible error blocks represent the
least possible cpu requirements for the slow microprocessor in the drive
and for the host filesystem/driver when errors are handled directly by
the filesystem. Maybe bad blocks should be visible for raw partitions as
well, and databases (few in number) should also understand about not
using bad blocks. Translation overheads in the drive can represent a
significant loss in performance.

Filesystem management doesn't prevent driver mapping, ... but driver
mapping does generate additional unwanted I/O's that the database may
not anticipate ... remapping a single sector in the middle of a large
request will generate 3 I/O's .... the start, remap, and ending segments.
If this falls under a frequently updated table element, the write backs
will have a significant and measurable performance decrease without any
indication of such to the user, Database Admin, or systems staff. Like
filesystems, databases often have key tables that represent 5-15% of the
updates.

In addition, doing mapping in the driver requires a table lookup for
exceptions on every I/O request ...
for 50-60 errors (not an unreasonable number for a very large drive) this
is often 150usec or more of memory/cpu time per request that is wasted.
At 80 requests per second this is about 1.5% of the cpu. Now sure, the SGI
machines have faster CPU's/memory, but this is not a great tradeoff for
386 PC's. Some large drives have more than a hundred errors. Now sure, key
vendors can order error free drives for a premium, but what about the
rest of the market?

Even with slip sectoring in the drive, error groups will overrun, and the
drive will do even slower table lookups and/or long seeks. Slip sectoring
on SCSI drives also makes the math more complex to translate logical to
physical for every request ... on a slow micro this time adds up as well.
For an IDE drive that does geometry mapping it is also a problem.

>
>| Nor would I like to be the customer who has his FAT or root inode over
>| a remapped sector.
>
>As I said above, I doubt this would be measurable, given any decent
>drive. Of course, there are always going to be poorly implemented
>drives, but you have to draw the line somewhere.

I'm sorry Dave .... but I have already seen this happen ... the spare in
the current cyl was already used, and each access to the FAT did two long
seeks. Of course all that was needed was repartitioning the drive to move
the FAT .... but still, most customers don't understand the problem.

Now maybe there are better ways than remapping to the end of the drive ...
how do current drives handle more than n bad sectors in a zone? Do they
do a table lookup on every request?

>
>| > Yes, write-buffering does lose some error recovery chances, especially
>| >if there's no higher-level knowledge of possible write-buffering so
>| >filesystems can insert lock-points to retain consistency. However, it can be
>| >a vast speed improvement. It all depends on your (the user's) needs. Some
>| >can easily live with it, some can't, some need raid arrays and UPS's.
>|
>| It is only a vast speed improvement on single block filesystem designs ...
>| any design which combines requests into a single I/O will not see such
>| improvement ... log structured filesystems are a good modern example.
>| It certainly has no such effect for my current filesystem design.
>
>Unfortunately, by actual measurement, this is untrue. Even with
>write buffering turned on, errors can still be handled; it just
>requires the driver to keep the previous request around till the
>next request is done. This slows things down for sync writes to
>the point where it is actually a slight loss, but fortunately most
>writes are async (at least for unix, and presumably most modern
>OS's). Command queuing with SCSI2 has similar issues.

For SCSI this is true with tagged queuing ... for IDE and many caching
ctlrs enabling write behind leaves you entirely blind to the error, and
it can be more than one request back.

>| that be at Fortune Systems ... we put it in ... only to take it out when Field
>| Service grew tired of the bad block tables overflowing and taking a big loss
>| on good drives being returned due to "excessive bad blocks" as the result
>| of normal (or abnormal) soft error rates due to other factors.
>
>Yes, I remember this. I've seen similar things happen with SCSI disks
>that implemented the dynamic mapping poorly. I've reached the confidence
>point with current drives that SGI ships (i.e., that we have done
>qualification on, including numerous discussions with drive firmware
>vendors), that I enable the dynamic mapping. I very rarely see false
>remappings. Technology has advanced some over the years.

SGI ships top of the line ... including power supplies ... with cheap PC's
this is seldom true and I've seen two customer failures along this line in
the last year. Most PC power supplies are worse about meeting 12v
regulation under load than the Fortune Zenith's were.
While high end technology has advanced, low end power technology hasn't,
and automatic remapping will have the customer buy several new drives
before a tech will fix the supply or configuration. For a number of
applications each down time can cost more than the whole machine is worth.

>
>| Write buffering requires automatic remapping ... A good filesystem design
>
>No it doesn't. All it requires is error recovery code in the host.
>Any host OS that enables writebuffering without that error recovery
>code is either asking for trouble, or poorly written. We provide the

For SCSI true ... for IDE & caching controllers enabling write buffering
leaves you blind to errors without automatic remapping.

>Sorry, but I have to disagree with you here. The greatest benefit
>from write buffering is when you have disjoint blocks that can't
>be joined into a single write (SGI does write combining in the
>filesystem code, and I have many measurements that confirm this),
>as it allows the OS to overlap the setup for the next write with
>the drive doing the previous write. Even on fast systems, this
>can be significant. Sure, single block sequential writes will
>be helped as well, but as you say, this should not normally occur
>in a well implemented system.

I agree that for SCSI this represents a gain due to command decode timings
of typical drives ... for IDE the amount of overlap is NIL, and not worth
the implied error risks.

... well I think we have beaten this issue to death .... enough for me
unless someone brings something really interesting to the table.

have fun ...

John Bass, Sr. Engineer, DMS Design                    (415) 615-6706
UNIX Consultant            Development, Porting, Performance by Design
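
Two of the claims above lend themselves to a quick sketch: a driver that
remaps one sector in the middle of a large request ends up issuing three
I/O's, and a per-request exception lookup costs a small but steady share
of the cpu. The Python below is only an illustration under stated
assumptions, not code from any real driver. The names split_request and
lookup_overhead are hypothetical, and the model assumes the driver keeps
its remapped sectors in a simple set and moves each one individually.

```python
def split_request(start, count, remapped):
    """Split a request for `count` sectors beginning at `start` into the
    separate I/Os needed when the sectors in `remapped` live elsewhere.

    Returns a list of (kind, first_sector, length) tuples, where kind is
    "direct" for a contiguous in-place run and "remap" for one relocated
    sector. The set-based model is an assumption made for illustration.
    """
    segments = []
    run_start = start
    end = start + count
    for sector in range(start, end):
        if sector in remapped:
            # Close the contiguous run (if any) that precedes the bad sector.
            if sector > run_start:
                segments.append(("direct", run_start, sector - run_start))
            # The relocated sector needs its own I/O to the spare location.
            segments.append(("remap", sector, 1))
            run_start = sector + 1
    # Emit whatever contiguous run remains after the last remapped sector.
    if run_start < end:
        segments.append(("direct", run_start, end - run_start))
    return segments


def lookup_overhead(usec_per_request, requests_per_sec):
    """Fraction of one cpu spent on the per-request exception lookup."""
    return usec_per_request * 1e-6 * requests_per_sec


if __name__ == "__main__":
    # One remapped sector in the middle of a 64-sector request.
    for segment in split_request(100, 64, {130}):
        print(segment)
    # 150 usec per request at 80 requests per second.
    print(f"{lookup_overhead(150, 80):.1%}")
```

With one remapped sector inside the request, split_request returns a
start run, the remap, and an ending run, i.e. the three I/O's described
above. At exactly 150 usec and 80 requests per second the lookup comes to
1.2% of the cpu; the "about 1.5%" figure corresponds to roughly 190 usec
per request, which fits the "150usec or more" wording.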