home *** CD-ROM | disk | FTP | other *** search
- Xref: sparky comp.arch:10915 comp.lang.misc:3788
- Newsgroups: comp.arch,comp.lang.misc
- Path: sparky!uunet!haven.umd.edu!darwin.sura.net!Sirius.dfn.de!Urmel.Informatik.RWTH-Aachen.DE!martin
- From: martin@math.rwth-aachen.de ( Martin Schoenert)
- Subject: Re: how to advocate new software/hardware features (Re: Hardware Support for Numeric Algorithms)
- Message-ID: <martin.722176263@bert>
- Sender: news@Urmel.Informatik.RWTH-Aachen.DE (Newsfiles Owner)
- Nntp-Posting-Host: bert.math.rwth-aachen.de
- Organization: Rechnerbetrieb Informatik / RWTH Aachen
- References: <TMB.92Nov14145619@pollux.idiap.ch> <Bxq7y0.IJ@mentor.cc.purdue.edu> <mwm.2n6z@contessa.palo-alto.ca.us> <Bxr8vG.IpI@mentor.cc.purdue.edu> <martin.721995695@bert>
- Date: 19 Nov 92 12:31:03 GMT
- Lines: 141
-
- I have had two replies to my original post, and from them it appears that
- I didn't get my point across. So let me try again.
-
- 1) I want to write a *large integer arithmetic package in C* that lets
- me work with integers of several hundred or thousands of digits.
- For the sake of discussion let us assume that I am really only
- interested in addition and multiplication of such integers (I include
- addition only to avoid that people tell me that I should use modular
- techniques).
-
- 2) I want to write a portable package that can be used with every ANSI
- conforming compiler. That means that I cannot use extension such
- as GNU C 'asm' statements, I cannot use nonstandard integer types
- such as 'long long'.
-
- 3) I want to make the functions in this package as fast as possible,
- since I am going to use them a lot.
-
- The datastructure for such a large integer is simple enough. A large
- integer is represented by an array of *digits*. Each digit is
- represented by an unsigned integer type provided by the compiler (maybe
- 16 bit, 32 bit, or something else). Let B be the smallest positive value
- that cannot be represented as a digit (e.g. 2^16, or 2^32, or whatever).
- For example, suppose a digit is an unsigned short and this is 16 bit
- wide. Then B = 2^16, and the integer represented by the array 'unsigned
- short v[] = {58368,21515,2}' is 58368 + 21515*2^16 + 2*2^32 = 10^10.
-
- Now the algorithm used to multiply such large integers is the one we
- learned in school (with B=10). I know that there are algorithms that are
- asymptotically faster than this (e.g. Karatsuba or even FFT), but for
- integers less than a certain size (about 100 decimal digits) they are
- usually slower.
-
- This algorithm requires that we can multiply digits. Indeed in practice
- it spends most of its time multiplying digits.
-
- Now this means that the type used for digits is small enough such that
- there is an unsigned integer type that is at least twice as wide. More
- often than not this means that a digit is 'unsigned short' and the result
- of a multiplication is 'unsigned long' (and on most modern processors
- this will mean that digits are 16 bit wide and the result of a
- multiplication is 32 bit wide).
-
- If 'TypDigit' is the type of a digit and 'TypTwoDigits' is an unsigned
- integer type twice as wide, I can write
-
- TypDigit a;
- TypDigit b;
- TypTwoDigits product
- ...
- product = (TypTwoDigits)a * (TypTwoDigits)b;
-
- The casts are necessary, because otherwise an ANSI compiler where
- 'sizeof(unsigned int) = sizeof(TypDigit)' would only compute the lower
- digit of the product and set the upper digit to 0.
-
- Now I am not complaining that those casts are necessary or that a stupid
- compiler might generate a 'TypTwoDigits'-multiplication instruction. I
- understand that a compiler with a clever optimizer is allowed to generate
- a mixed size multiplication instruction.
-
- So far so good. This satisfies requirements 1) and 2). Unfortunately it
- does not satisfy requirement 3). Here is why.
-
- The largest unsigned integer type which ANSI gives me is 'unsigned long'.
- With almost all processors and C compilers I know 'unsigned long' is just
- as wide as an register (usually this means 32 bit, but not necessarily
- so). Please don't tell me that I should buy a C compiler where 'unsigned
- long' is twice as wide as an register, unless you can tell me which C
- compilers for 80386, MIPS, and Sparc systems have this feature.
-
- Now most modern processors *do* have an instruction that multiplies two
- registers and puts the result into a register pair or a set of special
- registers. Here is a short list (I also have a list with more
- information, which I can send on request).
-
- 32 bit processors that have 32x32->64 instructions:
- AMD 29000, AMD 29050, DEC Alpha 21064, HP PA-RISC 1.1,
- IBM RS6000, Transputer T9000, Intel i[34]86, Intel 960CA/CF,
- MIPS R[2346]000, Motorola 680[234]0, Motorola 88110,
- Sparc V7 (multiply step), Sparc V8
-
- 32 bit processors that have no 32x32->64 instruction:
- ARM [23], HP PA-RISC 1.0, Intel i860,
- Motorola 680[01]0, Motorola 881000
-
- 64 bit processors that have 64x64->128 instructions:
- DEC Alpha 21064, MIPS R4000
-
- In almost all cases the 32x32->64 instructions are about as fast as the
- 16x16->32 instructions (in fact in most cases the 32x32->64 instruction
- will be used for 16x16->32 multiplications as well).
-
- That means: *if I could use the 32x32->64 instruction I could make my
- multiplication run four times as fast*. This is because I could then make
- 'TypDigit' twice as wide, implying that my integers have only half as
- many digits, speeding up the multiplication by a factor of four. (For 64
- bit microprocessors the same applies, one could make the digits 64 bits
- wide and win a factor of four over a portable implementation that uses 32
- bit digits).
-
- This is *not* idle speculation. By using small assembler functions one
- can show that this factor can really be obtained.
-
- However, I cannot use the 32x32->64 multiplication, because the C
- compiler provides no unsigned integer type to hold the result. To
- convince me otherwise, do the following
-
- a) pick a processor of the above list of 32 bit processors with
- 32x32->64 instruction
-
- b) pick a compiler
-
- c) write a piece of ANSI conforming code that multiplies two 32 bit
- unsigned integers and produces a 64 bit result
-
- d) show me the assembler output that uses only one 32x32->64 mult.
- instruction
-
- If you can do this, I may think about retracting the following statement.
-
- *For any of the processors HP PA-RISC 1.1, MIPS R[234]000, Motorola
- 680[234]0, Sparc V7&V8 there is an assembler function that multiplies two
- large integers at least twice as fast as any ANSI conforming C code
- compiled with any widely available compiler.*
-
- This is actually a weak statement. The strong statement would contain a
- larger list of processors and replace the factor of two with something
- between 3 and 4.
-
- Of course if one is willing to drop requirement 3), one has a lot more
- options. One could use 'asm' code (this is what GNU GMP does with GNU
- CC), 'long long' integer types, or use a small assembler kernel (this is
- what BigNum, PARI, and my arithmetic do).
-
- Martin.
-
- -- .- .-. - .. -. .-.. --- ...- . ... .- -. -. .. -.- .-
- Martin Sch"onert, Martin.Schoenert@Math.RWTH-Aachen.DE, +49 241 804551
- Lehrstuhl D f"ur Mathematik, Templergraben 64, RWTH, D 51 Aachen, Germany
-
-