- Newsgroups: comp.unix.bsd
- Path: sparky!uunet!spool.mu.edu!umn.edu!csus.edu!netcom.com!hasty
- From: hasty@netcom.com (Amancio Hasty Jr)
- Subject: Re: S3 question - Amancio, are you there?
- Message-ID: <1992Dec28.054342.13142@netcom.com>
- Organization: Netcom Online Communications Services (408-241-9760 login: guest)
- References: <VIXIE.92Dec26034105@cognition.pa.dec.com> <1992Dec27.081525.29228@netcom.com> <Bzy9wD.9Ez@pix.com>
- Date: Mon, 28 Dec 1992 05:43:42 GMT
- Lines: 237
-
- In article <Bzy9wD.9Ez@pix.com> stripes@pix.com (Josh Osborne) writes:
- >In article <1992Dec27.081525.29228@netcom.com> hasty@netcom.com (Amancio Hasty Jr) writes:
- >>In article <VIXIE.92Dec26034105@cognition.pa.dec.com> vixie@pa.dec.com (Paul A Vixie) writes:
- >[...]
- >>>I see that the two greatest bit-bangers of the average computer are available
- >>>as VESA cards: display, and disk. I'm still formulating my disk controller
- >>>questions and perhaps I'll ask them in a future post. Right now I'm trying
- >>>to solve the S3 mystery.
- >
- >One problem with VESA LB and disk drives: (I think) VESA LB doesn't allow
- >bus-mastering cards. For SCSI (at least) this could be quite useful. Of
- >course, with current-tech disk drives you need 3 fast disks running at once
- >to use all of the ISA bus. Or you need (say, IDE) controllers with cache on them,
- >but it would be better to have an auto-sizing disk cache in main memory (like
- >SunOS, or Linux), because it would be (a) faster, (b) usable as core if
- >that's more useful than disk cache, and (c) you know whether it is flushed to disk
- >or not.
- >
- >>>At work I have an EISA/SVGA/34020 board. It is very fast when run under
- >>>Windows 3.1; however, Microsoft had access to the 34020 specs and I don't,
- >>>so I can't figure out how to port the X server to it and no one in this
- >>>newsgroup seems to have done that either. It's too bad -- a 34020 with
- >>>a minimal BITBLT interpreter downloaded into it would make for a lightning
- >>>fast X11 server with the 34020 as almost a co-processor. However, I'm
- >>>fairly sure that the 34020's days are numbered given something called "S3"
- >>>and the "GUI Accelerator" that seem to be taking the market by storm.
- >
- >The 34020 docs are available from TI; I have a set somewhere. The cross
- >compiler is quite expensive, and the old version makes poor code. Someone
- >got an old gcc to work (more or less) with it. The 34020 is fairly quick;
- >I would like to see a 34020 running X on it :-) (I know it would be faster
- >to do most of the X stuff on the [34]86 and let the TI bang bits).
- >
- >The GUI accels are doing better than the 34020 cards because they are cheap;
- >however, I think you can build a 34020 card as cheap as an S3, but nobody has.
- >
- >>>I know that SVGA is more or less a hack on the IBM VGA spec to allow more
- >>>pixels; what I don't know is what an "SVGA S3" is. I have gathered from
- >>>context in posts on this newsgroup that it is some kind of graphics
- >>>accelerator chipset and that there are several different revisions of
- >>>it and that different board manufacturers have had different results.
- >>>Yet, VGA is fundamentally a frame buffer that has some hardware assist
- >>>for certain operations. Where does S3 fit in? Is it another IO port, or
- >>>just more opcodes to the existing VGA IO port? Or just a faster implementation
- >>>of the VGA spec?
- >
- >This is answered well below, but I thought I would point out that:
- > * VGA only allows 64K of the video memory to be mapped into the PC's addr space
- > at once.
- > * Most SVGAs allow 128K at once, normally as two 64K windows.
- > * Some more useful, but more disgusting, ways of viewing video memory are also
- > available.
- > * A small number of SVGA chipsets can map all of the video memory into the
- > PC, but I don't know if the video cards can do it. The 386BSD kernel will
- > need to be whacked to make it work anyway.
- > * The S3 adds a bunch of IO addrs on top of a normal-looking SVGA chipset.
- >
- >>>There are two reasons I need to know this. First, if the VGA really is "just
- >>>a frame buffer", then given a fast CPU and VESA it should be trivial to get
- >>>the MIT CFB server running and have it run near the theoretical maximum
- >>>(though at some potentially unnecessary cost in main CPU cycles). If on
- >>>the other hand VGA is like EGA in that you can only map certain parts into
- >>>memory at a time and it's generally cheaper to send high-level commands and
- >>>let the graphics hardware figure out how to achieve them, then I see a
- >>>problem.
- >
-
- >In general you can only map part of the video memory at a time.
- >
- >>>What problem? Well, DEC did this really neat thing called the "Dragon" chip
- >>>set back on their MicroVAX II/GPX. It was really really fast -- if you wrote
- >>>your application in FORTRAN on VMS. On the other hand if you ran under X11,
- >>>things ran doggishly slow and the visual results were often less than perfect.
- >>>This is because the _only_ way to talk to a Dragon is in high-level op-codes,
- >>>and the model X11 lived in was incompatible with the one the Dragon used --
- >>>so achieving one X11 operation often took several, or hundreds, of Dragon
- >>>operations. Since the Dragon's speed came from its economy of scale, the
- >>>speed was less than amazing.
- >
- >I don't know much about the dragon (is that the hardware made out of N
- >vipers?), but the S3, Mach8, Mach32, and even the 8514/a (or whatever it is)
- >have accel for short line segments, which I think matches up quite well with
- >the MI code in DDX's use of "spans" (not 100%: short lines have limited length,
- >spans do not), so even when the exact graphics command X wants is not supported
- >by the hardware, this is (and should be) faster than just pushing bits onto
- >a dumb buffer (except for really small spans).
- >
- >[...]
- >>>So here comes S3. Is it the salvation to all the world's woes? That depends.
- >>>Given VESA, one can access the VGA's "array" at memory speed (barring refresh
- >>>stalls -- that whole thing isn't dual-ported, is it?). Is that enough? Or,
- >>>if not, is it the S3 that gives one the extra performance and/or op-codes that
- >>>make X11 sing? And, if that last is true, why isn't an S3 on EISA or even ISA
- >>>"fast enough" ?
- >
- >I *think* (someone *please* correct me if I am wrong!) most of the numbers
- >(even the 70k+ ones) were with ISA S3 cards (they may have been in an EISA
- >system, though).
-
- I have a 486/50MHz system (ISA and VESA Local Bus), and the 83k xstone
- posting is for the Actix GraphicEngine32 (S3 86C801, 1MB DRAM, ISA card).
-
- >
- >[...now the Hasty-miester speekith...]
- >>The image write, read and fill operations' performance was increased by
- >>using VGA banking. We experienced a 10x performance improvement when
- >>we switched to VGA banking. In the 8514/a architecture, all data transfer
- >>between the CPU and co-processor is done via the data transfer register.
- >>Also, we have to transfer the images a line at a time inside a loop.
- >>If there is one area in which the S3 architecture suffers, this is it!
- >>Ideally, I would like to see the chip do DMA transfers from memory
- >>to the card and have it calculate the offsets into its memory, and
- >>the logical converse: have the chip transfer a block of memory
- >>to a consecutive region in the host's memory.
- >
- >How about XCopyPlane (in XOR mode)? I don't have an S3 card (yet), but that's
- >the single most important thing for my application...
- I will let you know how fast it is :-) And I would like to know: what is your
- application?
-
- >
- >[...]
- >>The 801/805 and 928 architectures are capable of mapping their entire video
- >>memory to the host's address space. Currently, we only map 64k bytes at a
- >>time. This limitation is mostly imposed on us by the kernel!
- >
- >Can the video cards do this?
- Yes, the 801/805 are capable of mapping up to 2MB of memory.
- > I assume the problem w/ the kernel is allocating
- >physically contiguous RAM?
- 
- Yes, this is the main problem.
- > The best way to do this is to add a new flag to the
- >memory allocator. The simplest way is to have the device probe allocate the
- >VM you need during boot, when most allocations will be contiguous, confirm that
- >it _is_ contiguous, and go on...
- Thanks, we are looking into it right now...
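- 
- To make the 64k limitation concrete, here is a minimal sketch of what a
- banked write has to do: split a copy wherever it crosses a 64k boundary
- and switch banks in between. This is not the server's code. The names
- fake_vram, set_bank, window and banked_write are hypothetical, the 1MB
- size is an assumption, and the "card" is simulated with an ordinary array
- so the sketch can run anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE  0x10000UL            /* 64K window, as discussed above */
#define VIDEO_SIZE (1024UL * 1024UL)    /* assumed 1MB of video memory */

/* Stand-ins for the real hardware (all hypothetical). */
static uint8_t  fake_vram[VIDEO_SIZE];  /* simulated card memory */
static unsigned current_bank;           /* bank currently mapped */
static unsigned bank_switches;          /* how many times we switched */

/* Real code would write a bank-select register on the card here. */
static void set_bank(unsigned bank)
{
    if (bank != current_bank) {
        current_bank = bank;
        bank_switches++;
    }
}

/* Real code would return the address of the mapped 64K aperture. */
static uint8_t *window(void)
{
    return fake_vram + (size_t)current_bank * BANK_SIZE;
}

/* Copy len bytes to linear video offset dst, splitting at bank boundaries. */
static void banked_write(unsigned long dst, const uint8_t *src, size_t len)
{
    while (len > 0) {
        unsigned      bank  = (unsigned)(dst / BANK_SIZE);
        unsigned long off   = dst % BANK_SIZE;
        size_t        chunk = (size_t)(BANK_SIZE - off); /* room left in bank */

        if (chunk > len)
            chunk = len;
        set_bank(bank);
        memcpy(window() + off, src, chunk);
        dst += chunk;
        src += chunk;
        len -= chunk;
    }
}
```

- With the whole video memory mapped, the loop and the bank switches go
- away and the copy becomes a single memcpy, which is why the kernel
- limitation matters.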
-
- >
- >>Further performance improvements were achieved by compiling the server
- >>with gcc-2.3.1. Some of the x11perf results were nearly twice as fast!
- >>Overall performance improvement, using xbench, proved to be around 15%.
- >
- >Did you remember to use -m486 (to produce code that runs fast on the 486,
- >but still runs on the 386), or just have it do 386 code?
- Yes, we use the -m486 flag. In fact, this was one of the main motivations
- for compiling the server with gcc-2.3.1. I am using gcc-2.3.2 on my machine
- and will soon upgrade to gcc-2.3.3 :-)
-
- >
- >[...]
- >>Slowly, the server is evolving from its pure 8514/a architecture to the
- >>S3 architecture. The next major jump will be when 16 bit or 24 bit
- >>color gets implemented :-)
- >
- >I thought the next big jump would be when you can map in 1+M of video memory
- >and use it...
- In terms of adding functionality which is not available today, I think we
- should start working on 16/24 bit colors. At any rate, this is my choice :-)
-
- Most of the graphics operations don't address the video memory directly.
- Image write/read/fill are the only X operations that access the video memory
- directly.
-
- For instance, here is sample code that moves a character from the card's
- memory to the desired location on the display:
- WaitQueue(7);   /* wait for 7 free command queue slots */
- /* source: the glyph's cell in the card's memory, 32 cells per row */
- outpw(CUR_X, (short)(ibm8514FC_X + (((int)chars[i]) % 32) * FC_MAX_WIDTH));
- outpw(CUR_Y, (short)(ibm8514FC_Y + (((int)chars[i]) / 32) * FC_MAX_HEIGHT));
- /* destination: where the glyph lands on the display */
- outpw(DESTX_DIASTP, (short)(x + pci->metrics.leftSideBearing));
- outpw(DESTY_AXSTP, (short)(y - pci->metrics.ascent));
- /* size of the rectangle to copy, minus one in each direction */
- outpw(MAJ_AXIS_PCNT, (short)(GLYPHWIDTHPIXELS(pci) - 1));
- outpw(MULTIFUNC_CNTL, MIN_AXIS_PCNT |
-       (short)(GLYPHHEIGHTPIXELS(pci) - 1));
- outpw(CMD, CMD_BITBLT | INC_X | INC_Y | DRAW | PLANAR | WRTDATA);
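- 
- The "% 32" and "/ 32" in the code above pick a cell out of a grid that is
- 32 glyphs wide. A small sketch of just that arithmetic follows. The
- constant values and the names glyph_cache_origin and struct point are
- made up for illustration (the real FC_MAX_WIDTH, FC_MAX_HEIGHT and
- ibm8514FC_X/Y values are not given here), and it takes an unsigned char
- so that characters above 127 cannot produce a negative index.

```c
#include <assert.h>

/* Hypothetical values; the real ones live in the server source. */
#define GLYPHS_PER_ROW 32    /* from the "% 32" and "/ 32" above */
#define FC_MAX_WIDTH   16    /* assumed cell width in pixels */
#define FC_MAX_HEIGHT  32    /* assumed cell height in pixels */
#define FC_X           0     /* stand-in for ibm8514FC_X */
#define FC_Y           768   /* stand-in for ibm8514FC_Y */

struct point {
    int x;
    int y;
};

/* Top-left corner of the cell holding character ch. */
static struct point glyph_cache_origin(unsigned char ch)
{
    struct point p;

    p.x = FC_X + (ch % GLYPHS_PER_ROW) * FC_MAX_WIDTH;   /* column in grid */
    p.y = FC_Y + (ch / GLYPHS_PER_ROW) * FC_MAX_HEIGHT;  /* row in grid */
    return p;
}
```

- With these assumed values, 'A' (65) sits in column 1, row 2 of the grid.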
-
- >
-
- >[...]
- >>Next, how does the S3 architecture fare against other accelerated cards?
- >>
- >>The January issue of Byte magazine voted Actix's GraphicEngine32 (801)
- >>one of the best overall accelerated graphics cards for window applications.
- >>At least in Byte's tests, the 801 was faster than the ATI Ultra Pro (Mach32).
- >>And, I really doubt that the tests were executed at low clock frequencies.
- >>However, the article did not state the dot clock frequency which the tests
- >>were executed at. The other faster cards were based on the 34020 and cost
- >>more than $1400.
- >
- >People have had the S3 for long enough to make good use of it, the Mach32 may
- >be too new for good drivers to be available yet.
-
- I am assuming that Byte used ATI's drivers in their benchmark.
-
- > If people decide that the
- >34020 cards don't need to emulate SVGA/EGA/CGA/Herc in hardware, the price
- >should drop by more than $1000; if they insist on doing that, the price may
- >drop by about $1000. This would be the best card for X, because the 34020
- >is fully programmable and can be made more X orientated than windows orientated.
- >Also, the 340xx has super great control over the display (size/shape/res/
- >borders). The 34020 can even use the VRAM serial write regs...
- >
- >[...]
- >>On the topic of local bus IDE cards:
- >>
- >>It takes about 6 and 50 seconds to recompile the kernel with gcc-1.39.
- >>With an ISA IDE card, it takes about 7.5 minutes :-)
-
- >>
- >>How much does it cost? $89.
- >
- >What does "6 and 50 seconds" mean? Most IDE local bus cards mainly add lots
- >of cache. We can do better by adding more RAM to the main system and using
- >it wisely...
- 6 minutes and 50 seconds vs. 7.5 minutes to compile the kernel.
- Orchid claims an 8MB/s data transfer rate, and I am not going to get into
- a long philosophical argument here with respect to what is a good benchmark
- for disk controllers :-)
- The VESA Local Bus IDE controller is a non-caching controller, and please
- don't forget it costs $89!
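- 
- For scale, 7.5 minutes is 450 seconds and 6 minutes 50 seconds is 410
- seconds, so the kernel build got roughly 9% shorter. A tiny sketch of
- that arithmetic (percent_saved is just an illustrative helper, and a
- single kernel compile is of course only one data point):

```c
#include <assert.h>

/* Percentage of wall-clock time saved going from old_s to new_s seconds. */
static double percent_saved(double old_s, double new_s)
{
    return 100.0 * (old_s - new_s) / old_s;
}
```

- Here percent_saved(450.0, 410.0) is 100 * 40 / 450, or about 8.9.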
- >
- >[...]
- >--
- > stripes@pix.com "Security for Unix is like
- > Josh_Osborne@Real_World,The Multitasking for MS-DOS"
- > "The dyslexic porgramer" - Kevin Lockwood
- >We all agree on the necessity of compromise. We just can't agree on
- >when it's necessary to compromise. - Larry Wall
-
-
- --
- Amancio Hasty |
- Home: (415) 495-3046 | ftp-site depository of all my work:
- e-mail hasty@netcom.com | sunvis.rtpnc.epa.gov:/pub/386bsd/incoming
-