- Newsgroups: comp.unix.bsd
- Path: sparky!uunet!spool.mu.edu!umn.edu!csus.edu!netcom.com!hasty
- From: hasty@netcom.com (Amancio Hasty Jr)
- Subject: Re: S3 question - Amancio, are you there?
- Message-ID: <1992Dec27.081525.29228@netcom.com>
- Organization: Netcom Online Communications Services (408-241-9760 login: guest)
- References: <VIXIE.92Dec26034105@cognition.pa.dec.com>
- Date: Sun, 27 Dec 1992 08:15:25 GMT
- Lines: 287
-
- In article <VIXIE.92Dec26034105@cognition.pa.dec.com> vixie@pa.dec.com (Paul A Vixie) writes:
- >I could have addressed this directly to Amancio, but I am betting that a
- >lot of other folks would like to know the answer. I have been away from
- >the PC UNIX world for a while now (years, really) but I am presently
- >taking a look at different PC configurations for possible BSD/386 or 386BSD
- >use. I've already determined that Localbus ("VESA") is more cost effective
- >than EISA (in terms of useful-bit-made-faster per dollar-spent) and that
- >a VESA/ISA system is probably what I want unless the price difference of a
- >VESA/EISA system is within epsilon of what I have in my new-computer fund.
- >
- >I see that the two greatest bit-bangers of the average computer are available
- >as VESA cards: display, and disk. I'm still formulating my disk controller
- >questions and perhaps I'll ask them in a future post. Right now I'm trying
- >to solve the S3 mystery.
- >
- >At work I have a EISA/SVGA/34020 board. It is very fast when run under
- >Windows 3.1; however, Microsoft had access to the 34020 specs and I don't,
- >so I can't figure out how to port the X server to it and noone in this
- >newsgroup seems to have done that either. It's too bad -- a 34020 with
- >a minimal BITBLT interpreter downloaded into it would make for a lightening
- >fast X11 server with the 34020 as almost a co-processor. However, I'm
- >fairly sure that the 34020's days are numbered given something called "S3"
- >and the "GUI Accelerator" that seem to be taking the market by storm.
- >
- >I know that SVGA is more or less a hack on the IBM VGA spec to allow more
- >pixels; what I don't know is what an "SVGA S3" is. I have gathered from
- >context in posts on this newsgroup that it is some kind of graphics
- >accelerator chipset and that there are several different revisions of
- >it and that different board manufacturers have had different results.
- >Yet, VGA is fundamentally a frame buffer that has some hardware assist
- >for certain operations. Where does S3 fit in? Is it another IO port, or
- >just more opcodes to the existing VGA IO port? Or just a faster implementation
- >of the VGA spec?
- >
- >There are two reasons I need to know this. First, if the VGA really is "just
- >a frame buffer", then given a fast CPU and VESA it should be trivial to get
- >the MIT CFB server running and have it run near the theoretical maximum
- >(though at some potentially unneccessary cost in main CPU cycles). If on
- >the other hand VGA is like EGA in that you can only map certain parts into
- >memory at a time and it's generally cheaper to send high-level commands and
- >let the graphics hardware figure out how to achieve them, then I see a
- >problem.
- >
- >What problem? Well, DEC did this really neat thing called the "Dragon" chip
- >set back on their MicroVAX II/GPX. It was really really fast -- if you wrote
- >your application in FORTRAN on VMS. On the other hand if you ran under X11,
- >things ran doggishly slow and the visual results were often less than perfect.
- >This is because the _only_ way to talk to a Dragon is in high-level op-codes,
- >and the model X11 lived in was incompatible with the one the Dragon used --
- >so achieving one X11 operation often took several, or hundreds, of Dragon
- >operations. Since the Dragon's speed came from its economy of scale, the
- >speed was less than amazing.
- >
- >That seems to be what kills EGA (and non-SVGA VGA) performance on PC's. You
- >can either send lots of not-exactly-what-you-wanted high level operations
- >down the "wire" or you can write to memory over a very slow bus. Either way
- >things are very very slow.
- >
- >So here comes S3. Is it the salvation to all the world's woes? That depends.
- >Given VESA, one can access the VGA's "array" at memory speed (barring refresh
- >stalls -- that whole thing isn't dual-ported, is it?). Is that enough? Or,
- >if not, is it the S3 that gives one the extra performance and/or op-codes that
- >make X11 sing? And, if that last is true, why isn't an S3 on EISA or even ISA
- >"fast enough" ?
- >
- >I know that Amancio's numbers indicate that the problem _is solved_, one way
- >or another. But before I consider plunking money down to buy one of these
- >boxes, I would very much like to know _how_ it was solved. And, I would like
- >to know the answer to the perennial question: "which VESA S3 card is fastest,
- >and why?"
- >
- >Thanks in advance...
- >--
- >Paul Vixie, DEC Network Systems Lab
- >Palo Alto, California, USA "Don't be a rebel, or a conformist;
- ><vixie@pa.dec.com> decwrl!vixie they're the same thing, anyway. Find
- ><paul@vix.com> vixie!paul your own path, and stay on it." -me
-
- I chose the S3 chipset because:
-
- o it offers a relatively low cost
-
- Recently, S3 cards have been listed for less than $200.
- I expect the following price breakdown:
- 911/924 to cost around $170
- 801 (ISA) / 805 (local bus or EISA) around $200 and $250, respectively
- 928 ?? to cost more than $300 but less than $350
-
- o it is a high performance graphics engine
-
- on a 486/50 256k cache with 8MB:
- XS3 924 around 48k xstones
- XS3 801 around 83k xstones
- XS3 928 greater than 120k xstones.
-
- o there is a clear growth path in functionality as well as in
- performance (I knew from day 1, when I first got my Diamond
- Stealth, that there was going to be an S3 928. And, I no longer
- have a Diamond Stealth!)
-
- o the documentation is publicly available
-
- The basic S3 architecture consists of an 8514/a core, a vga core, and memory
- management. The 8514/a side of the S3 chipset is not fully compatible
- with the 8514/a standard. However, the 8514/a instruction opcodes used in
- Kevin Martin's X server are nearly identical to the 8514/a instructions
- provided by the S3 chipset. The 8514/a side of XS3-0.1 differs from Kevin's
- server in the way that pixels are encoded for stipple image transfers.
- The change to XS3 to incorporate the difference was minor, but it was
- difficult at the time because we did not know what the difference was.
-
- The initialization of XS3 is done just as for an svga, with minor changes
- to incorporate the added features available in the S3 chipset. Most
- svga chipsets provide their added functionality slightly differently; hence
- the special initialization code for S3.
-
- Kevin's server was chosen because of its simplicity and because it is a great
- match for the S3 chipset. Part of the confusion in the early phase of searching
- for a server for the S3 chipset was that S3 corporation does not really advertise
- the high degree of 8514/a compatibility that the S3 chipset has. In fact,
- when I first started I had no clue that S3 had such a degree of compatibility
- with the 8514/a!
-
- Much of the speed that we see today with the S3 chipset is due to the built-in
- hardware graphics functions provided by the chipset. One example is line
- drawing: the server uses a Bresenham line drawing algorithm implemented
- in the S3 chipset. Dashed lines, however, are currently implemented in
- software. Another example is rectangle fills, which are all done in hardware.
- Additionally, the S3 chipset has a command queue that is 8 commands deep in
- the 911/924 class and 16 commands deep in the 801/805.
- Fortunately, the cost of setting up the S3 graphic operations has not proven
- to be a great performance drawback. Obviously, the less we have to
- do the better off we are, but this engineering issue must be weighed
- against how much it would cost to provide a minimal graphic set-up
- operation scheme.
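
For readers unfamiliar with it, here is a minimal software sketch of the integer Bresenham line algorithm that the paragraph above says the S3 chip performs in hardware. It is not taken from XS3 or from the S3 databook; the function name `bresenham_line` and the `plot_fn` callback are hypothetical and only serve the illustration.

```c
#include <stdlib.h>

/* Hypothetical callback: invoked once for every pixel on the line. */
typedef void (*plot_fn)(int x, int y, void *ctx);

/*
 * Integer-only Bresenham line from (x0,y0) to (x1,y1), valid in all
 * octants. Calls plot() for each pixel (if plot is not NULL) and
 * returns the number of pixels visited, both endpoints included.
 */
int bresenham_line(int x0, int y0, int x1, int y1, plot_fn plot, void *ctx)
{
    int dx = abs(x1 - x0);
    int dy = -abs(y1 - y0);
    int sx = (x0 < x1) ? 1 : -1;   /* step direction in x */
    int sy = (y0 < y1) ? 1 : -1;   /* step direction in y */
    int err = dx + dy;             /* running error term */
    int count = 0;

    for (;;) {
        int e2;

        if (plot != NULL)
            plot(x0, y0, ctx);
        count++;
        if (x0 == x1 && y0 == y1)
            break;
        e2 = 2 * err;
        if (e2 >= dy) {            /* error allows a step in x */
            err += dy;
            x0 += sx;
        }
        if (e2 <= dx) {            /* error allows a step in y */
            err += dx;
            y0 += sy;
        }
    }
    return count;
}
```

The loop uses only additions and comparisons per pixel, which is the property that makes the algorithm a natural fit for a hardware drawing engine; with the chip doing this work, the host only has to supply the line's parameters.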
-
- The fast text speed is due to a font cache which stores the fonts in
- the card's memory. We blt the characters from the card's memory to
- the location where we are writing the character. This functionality
- is the same as in the 8514/a server.
-
- The performance of the image write, read and fill operations was increased by
- using vga banking. We experienced a 10x performance improvement when
- we switched to vga banking. In the 8514/a architecture, all data transfer
- between the cpu and co-processor is done via the data transfer register.
- Also, we have to transfer the images a line at a time inside a loop.
- If there is one area in which the S3 architecture suffers, this is it!
- Ideally, I would like to see the chip do dma transfers from memory
- to the card and have it calculate the offsets into its memory and,
- as the logical converse, have the chip transfer a block of its memory
- to a consecutive region in the host's memory.
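
As a rough illustration of what banked access involves, the sketch below computes which 64 KB bank a pixel falls in and the offset inside that bank. It assumes 8 bits per pixel and a 64 KB window; the names `bank_of`, `bank_addr` and `bytes_until_bank_end` are made up for this example, and the step of actually selecting the bank on the card (a register write) is deliberately left out because its details are not given in this post.

```c
#include <stdint.h>

#define BANK_SIZE 0x10000u   /* 64 KB window into video memory */

/* Position of a byte of video memory as seen through the window. */
struct bank_addr {
    uint32_t bank;    /* which 64 KB bank must be selected */
    uint32_t offset;  /* byte offset inside the window */
};

/*
 * Map pixel (x, y) to a bank and an offset, assuming one byte per
 * pixel and `pitch` bytes per scan line (e.g. 1024 at 1024x768).
 */
struct bank_addr bank_of(uint32_t x, uint32_t y, uint32_t pitch)
{
    uint32_t linear = y * pitch + x;   /* offset from start of video memory */
    struct bank_addr a;

    a.bank = linear / BANK_SIZE;
    a.offset = linear % BANK_SIZE;
    return a;
}

/* Number of bytes that can be copied before the bank must be switched. */
uint32_t bytes_until_bank_end(struct bank_addr a)
{
    return BANK_SIZE - a.offset;
}
```

With a pitch of 1024 bytes, each bank holds exactly 64 scan lines, so a line-at-a-time copy loop of the kind described above only needs to switch banks once every 64 lines.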
-
- The stipple operations were improved by using tighter logic and doing
- 16 bit transfers as opposed to 8 bit data transfers via the data
- transfer register.
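
To show why wider transfers help, here is a small sketch that packs a row of stipple bytes into 16 bit words, so that half as many writes to the data transfer register would be needed. It is only an illustration: `pack_stipple16` is a hypothetical helper, and the byte order (first byte in the low half of the word) is an assumption, not something stated in the post or checked against the S3 documentation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pack `nbytes` stipple bytes from `src` into 16 bit words in `out`.
 * The first byte of each pair goes into the low half of the word
 * (assumed byte order). An odd trailing byte is padded with zero bits.
 * `out` must have room for (nbytes + 1) / 2 words.
 * Returns the number of words written, i.e. the number of 16 bit
 * transfers that would replace `nbytes` 8 bit transfers.
 */
size_t pack_stipple16(const uint8_t *src, size_t nbytes, uint16_t *out)
{
    size_t nwords = 0;
    size_t i;

    for (i = 0; i + 1 < nbytes; i += 2)
        out[nwords++] = (uint16_t)(src[i] | ((unsigned)src[i + 1] << 8));
    if (i < nbytes)                    /* odd byte left over */
        out[nwords++] = src[i];
    return nwords;
}
```

For a 1024 pixel wide monochrome stipple row (128 bytes), this turns 128 register writes into 64.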
-
- The server also now enjoys a hardware-supported cursor. Some X applications such
- as acm (air combat simulator) run without the cursor ever flickering.
- In effect, it is rare to see the cursor flicker, period. The 8514/a architecture
- does not have hardware cursor support.
-
- The 801/805 and 928 architectures are capable of mapping their entire video
- memory to the host's address space. Currently, we only map 64k bytes at a
- time. This limitation is mostly imposed on us by the kernel!
-
-
- Further performance improvements were achieved by compiling the server
- with gcc-2.3.1. Some of the x11perf results were nearly twice as fast!
- Overall performance improvement, using xbench, proved to be around 15%.
-
- So far, we have been able to benefit from faster S3 chipset implementations
- as well as from more cpu power. For instance, on my systems:
-
- o 486/33Mhz 64k cache 8MB
- S3 911 46k xstones
- S3 801 64k xstones
-
- o 486/50Mhz 256k cache 8MB
- S3 801 83k xstones
-
- Note:
- All benchmarks were executed at 1024x768 45Mhz interlace.
- In the case of the DRAM based 801 and 805 architectures, you might
- experience performance degradation at higher clock rates. However,
- I have not been able to test this hypothesis, which was put forth on this
- newsgroup. So, if anyone out there is running XS3 with an 801 card
- and has a high resolution monitor, I would appreciate it if you ran
- xbench at 1024x768 45Mhz interlace and 1024x768 72Mhz. All I want
- with respect to this issue are the numbers; there have been enough
- postings with respect to this issue :-)
-
- Slowly, the server is evolving from its pure 8514/a architecture to the
- S3 architecture. The next major jump will be when 16 bit or 24 bit
- color gets implemented :-)
-
-
- Next, the different S3 chipsets:
-
- o 86C911 VRAM based card. This is the first model.
- o 86C924 VRAM based card. In essence, it is the same as the 911.
- o 86C801 DRAM based card. Supports up to 2MB of memory.
- Max resolution is 1280x1024 256 colors at 60Hz
- 1024x768 65k colors at 43.5Hz interlace
- 640x480 16 million colors at 60Hz
-
- o 86C928 VRAM based card. Will support up to 4MB of memory
- (I don't have the functional specs for a card,
- but I do have the databook).
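
The modes listed for the 801 are consistent with its 2MB memory limit, as a little arithmetic shows. The sketch below is only a worked calculation; `frame_bytes` is a hypothetical helper, and it assumes packed pixels of 1, 2 and 3 bytes for 256, 65k and 16 million colors.

```c
#include <stdint.h>

/* Bytes of video memory needed to hold one frame with packed pixels. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return width * height * (bits_per_pixel / 8);
}

/*
 * frame_bytes(1280, 1024,  8) = 1310720 bytes (1.25 MB)
 * frame_bytes(1024,  768, 16) = 1572864 bytes (1.5 MB)
 * frame_bytes( 640,  480, 24) =  921600 bytes (900 KB)
 * All three fit in 2 MB (2097152 bytes).
 * frame_bytes(1280, 1024, 16) = 2621440 bytes, which does not fit.
 */
```

Memory size is only one constraint; the maximum dot clock of the card also limits which modes are usable, and that is not modeled here.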
-
-
- On Local Bus S3 cards:
-
- It is not clear at this point whether XS3 will benefit from a local bus
- S3 implementation. The reason is that most of the graphic functions
- used by the server are already implemented by the chip. I do expect
- image read/write/fill operations to benefit greatly. I do have a
- Vesa Local Bus S3 805 card, but I am not done with it yet, and I am
- using it right now. I will absolutely not release the card's name
- till I am done with my work here!
-
-
- Next, how does the S3 architecture fare against other accelerated cards?
-
- The January issue of Byte magazine voted Actix's GraphicEngine32 (801)
- one of the best overall accelerated graphics cards for window applications.
- At least on Byte's tests the 801 was faster than the ATI Ultra Pro (mach 32).
- And, I really doubt that the tests were executed at low clock frequencies.
- However, the article did not state the dot clock frequency at which the tests
- were executed. The other, faster cards were based on the 34020 and cost
- more than $1400.
-
-
- Dumb vga cards:
-
- On a non-accelerated vga card, all pixels on the screen are manipulated
- by the host computer a pixel at a time. If you do any kind of computation
- in the background, the performance will suffer drastically. Chipsets such
- as the et4000 benefit tremendously when going to a local bus or EISA
- implementation. Currently, I don't have any xbench figures for the
- ET4000 on a local bus system, and it would be nice if someone posted their
- xbench results.
-
-
- On the topic of local bus IDE cards:
-
- It takes about 6 minutes and 50 seconds to recompile the kernel with gcc-1.39.
- With an ISA IDE card, it takes about 7.5 minutes :-)
-
- How much does it cost? $89.
-
-
- My current hardware configuration:
-
- Orchid SuperBoard 486/50Mhz 256k cache 8MB
-
- Vga cards which I used:
-
- Actix GraphicEngine(S3 801)
- Orchid F1280 (S3 911)
- xx brand Local Bus (S3 805)
- Diamond SpeedStar (ET4000)
-
-
- Orchid Local Bus IDE controller
-
- 14 inch svga monitor max resolution 1024x768 45Mhz interlace
-
-
- 216 MB Western Digital (All for X)
- 120 MB Western Digital
-
-
- 5 1/4 inch floppy drive
- 3 1/2 inch floppy drive
-
- Colorado Tape Backup system
-
- Hope this helps,
- Amancio Hasty
-
-
-
-
- --
- Amancio Hasty |
- Home: (415) 495-3046 | ftp-site depository of all my work:
- e-mail hasty@netcom.com | sunvis.rtpnc.epa.gov:/pub/386bsd/incoming
-