Received: from sloth.swcp.com (sloth.swcp.com [198.59.115.25]) by nacm.com (8.6.10/8.6.9) with ESMTP id PAA00669 for <executor@nacm.com>; Wed, 26 Apr 1995 15:55:03 -0700
Received: from iclone.UUCP (uucp@localhost) by sloth.swcp.com (8.6.9/8.6.9) with UUCP id QAA15753; Wed, 26 Apr 1995 16:58:07 -0600
Received: from gwar.ardi.com by mailhost with smtp (nextstep Smail3.1.29.0 #11) id m0s4FyN-000YbDC; Wed, 26 Apr 95 16:54 MDT
Received: by gwar.ardi.com (linux Smail3.1.28.1 #5) id m0s4FyJ-000GOeC; Wed, 26 Apr 95 16:54 MDT
Message-Id: <m0s4FyJ-000GOeC@gwar.ardi.com>
Date: Wed, 26 Apr 95 16:54 MDT
From: mat@ardi.com (Mat Hostetter)
To: jered@MIT.EDU
Cc: executor@nacm.com
Subject: Re: 486 optmization
In-Reply-To: <9504262205.AA12887@bill-the-cat.MIT.EDU>
References: <9504262205.AA12887@bill-the-cat.MIT.EDU>
Sender: owner-executor@nacm.com
Precedence: bulk

[Note: this is a fairly technical reply to Jered's question, but there
are a bunch of assembler-heads on this list, so I thought I'd post it.
Most people will just want to skip this message.]

>>>>> "jered" == jered <jered@MIT.EDU> writes:

    jered> I just saw on the linux kernel discuss meeting that 486es
    jered> and higher have a special instruction for converting
    jered> big-endian to/from little-endian.  Does anyone know if gcc
    jered> (djgpp) uses this and optimizes for it, what sort of
    jered> performance increase it might give, and if it would be
    jered> worth anyone's while to have a 486-higher executable of
    jered> Executor?

Jered is talking about the "bswap" instruction, which byte swaps a
four-byte value in a register in one cycle.  It was added when the
80486 came out, so it isn't present on 80386's.  It's non-pairable on
the Pentium.

gcc doesn't generate the "bswap" instruction, because it won't work on
an 80386.  I don't know if gcc has any way of doing anything special
for byte swaps anyway.  The -m486 flag isn't allowed to generate code
that won't run on an 80386, so gcc couldn't generate a bswap.
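
For readers who haven't met bswap, here is a minimal C sketch of the
operation it performs on a 32-bit value.  It is only an illustration:
the function name is made up, and the shifts and masks stand in for
what the hardware does in a single instruction.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value with shifts and masks.
 * This computes the same result as bswap on a 32-bit register.
 * The name swap32_portable is hypothetical, chosen for this sketch. */
uint32_t swap32_portable(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |   /* byte 0 -> byte 3 */
           ((x & 0x0000FF00u) <<  8) |   /* byte 1 -> byte 2 */
           ((x & 0x00FF0000u) >>  8) |   /* byte 2 -> byte 1 */
           ((x & 0xFF000000u) >> 24);    /* byte 3 -> byte 0 */
}
```

Swapping twice gives back the original value, which is why the same
operation converts in both directions between big-endian and
little-endian.
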
From gcc.info:

    `-m486'
    `-mno-486'
         Control whether or not code is optimized for a 486 instead of
         an 386.  Code generated for an 486 will run on a 386 and vice
         versa.

Executor's C code uses inline assembly to byte swap with three rotate
instructions, which works on both the 80386 and 80486+.

Our CPU emulator (syn68k) decides at runtime if you have an 80486 or
better and generates bswap instructions "on the fly" if you do.
Otherwise, it generates three rotate instructions.

A version of Executor that didn't work on 80386's would be a little
smaller and a little faster than the current one, but there's no
reason to think the performance difference would be huge.  We
benchmarked such a version long, long ago and found that an
80486-specific version was something like 5-10% faster.

Note that since NEXTSTEP/Intel only works on 80486's or better,
Executor/NEXTSTEP/Intel assumes the presence of an 80486 and takes
advantage of it.

The new, faster blitter I'm writing may take advantage of the bswap
instruction, if present.  Once that's done, the CPU emulator and the
graphics engine will both use bswap, so there won't be much
performance to gain by creating an 80486-specific version of Executor.

Thanks for the suggestion, though.

-Mat
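
The three-rotate swap and the runtime choice described in the message
can be sketched in portable C.  This is not Executor's or syn68k's
actual code: every name below is hypothetical, the rotates are written
as shifts rather than inline assembly, and the CPU check is reduced to
a flag the caller passes in, since the message doesn't say how the
detection is done.  syn68k emits machine instructions at runtime; this
sketch only picks between two C functions.

```c
#include <stdint.h>

/* Rotate the low 16 bits right by 8 and leave the high 16 bits alone,
 * which is the effect of a 16-bit rotate on the low half of a 32-bit
 * register. */
static uint32_t ror16_low_by_8(uint32_t x)
{
    uint32_t lo = x & 0xFFFFu;
    lo = ((lo >> 8) | (lo << 8)) & 0xFFFFu;
    return (x & 0xFFFF0000u) | lo;
}

/* Rotate all 32 bits right by 16, i.e. exchange the two halves. */
static uint32_t ror32_by_16(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/* 80386-compatible path: byte swap built from three rotates. */
uint32_t swap32_rotates(uint32_t x)
{
    x = ror16_low_by_8(x);   /* AABBCCDD -> AABBDDCC */
    x = ror32_by_16(x);      /* AABBDDCC -> DDCCAABB */
    x = ror16_low_by_8(x);   /* DDCCAABB -> DDCCBBAA */
    return x;
}

/* Stand-in for the 80486+ path, where a single bswap would be used.
 * Written with shifts and masks here so the sketch runs anywhere. */
uint32_t swap32_single(uint32_t x)
{
    return (x << 24) | ((x & 0xFF00u) << 8) |
           ((x >> 8) & 0xFF00u) | (x >> 24);
}

typedef uint32_t (*swap32_fn)(uint32_t);

/* Choose a swap routine once, at startup.  The flag is an assumption:
 * real code would have to probe the CPU to set it. */
swap32_fn choose_swap32(int have_486_or_better)
{
    return have_486_or_better ? swap32_single : swap32_rotates;
}
```

Both paths return the same value for every input, so the choice only
affects speed, never results.
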