NetNews Usenet Archive 1992 #27

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #27 / NN_1992_27.iso / spool / comp / arch / 10915 < prev next >

Wrap

Internet Message Format | 1992-11-19 | 7.3 KB

Xref: sparky comp.arch:10915 comp.lang.misc:3788 Newsgroups: comp.arch,comp.lang.misc Path: sparky!uunet!haven.umd.edu!darwin.sura.net!Sirius.dfn.de!Urmel.Informatik.RWTH-Aachen.DE!martin From: martin@math.rwth-aachen.de ( Martin Schoenert) Subject: Re: how to advocate new software/hardware features (Re: Hardware Support for Numeric Algorithms) Message-ID: <martin.722176263@bert> Sender: news@Urmel.Informatik.RWTH-Aachen.DE (Newsfiles Owner) Nntp-Posting-Host: bert.math.rwth-aachen.de Organization: Rechnerbetrieb Informatik / RWTH Aachen References: <TMB.92Nov14145619@pollux.idiap.ch> <Bxq7y0.IJ@mentor.cc.purdue.edu> <mwm.2n6z@contessa.palo-alto.ca.us> <Bxr8vG.IpI@mentor.cc.purdue.edu> <martin.721995695@bert> Date: 19 Nov 92 12:31:03 GMT Lines: 141 I have had two replies to my original post, and from them it appears that I didn't get my point across. So let me try again. 1) I want to write a *large integer arithmetic package in C* that lets me work with integers of several hundred or thousands of digits. For the sake of discussion let us assume that I am really only interested in addition and multiplication of such integers (I include addition only to avoid that people tell me that I should use modular techniques). 2) I want to write a portable package that can be used with every ANSI conforming compiler. That means that I cannot use extension such as GNU C 'asm' statements, I cannot use nonstandard integer types such as 'long long'. 3) I want to make the functions in this package as fast as possible, since I am going to use them a lot. The datastructure for such a large integer is simple enough. A large integer is represented by an array of *digits*. Each digit is represented by an unsigned integer type provided by the compiler (maybe 16 bit, 32 bit, or something else). Let B be the smallest positive value that cannot be represented as a digit (e.g. 2^16, or 2^32, or whatever). For example, suppose a digit is an unsigned short and this is 16 bit wide. Then B = 2^16, and the integer represented by the array 'unsigned short v[] = {58368,21515,2}' is 58368 + 21515*2^16 + 2*2^32 = 10^10. Now the algorithm used to multiply such large integers is the one we learned in school (with B=10). I know that there are algorithms that are asymptotically faster than this (e.g. Karatsuba or even FFT), but for integers less than a certain size (about 100 decimal digits) they are usually slower. This algorithm requires that we can multiply digits. Indeed in practice it spends most of its time multiplying digits. Now this means that the type used for digits is small enough such that there is an unsigned integer type that is at least twice as wide. More often than not this means that a digit is 'unsigned short' and the result of a multiplication is 'unsigned long' (and on most modern processors this will mean that digits are 16 bit wide and the result of a multiplication is 32 bit wide). If 'TypDigit' is the type of a digit and 'TypTwoDigits' is an unsigned integer type twice as wide, I can write TypDigit a; TypDigit b; TypTwoDigits product ... product = (TypTwoDigits)a * (TypTwoDigits)b; The casts are necessary, because otherwise an ANSI compiler where 'sizeof(unsigned int) = sizeof(TypDigit)' would only compute the lower digit of the product and set the upper digit to 0. Now I am not complaining that those casts are necessary or that a stupid compiler might generate a 'TypTwoDigits'-multiplication instruction. I understand that a compiler with a clever optimizer is allowed to generate a mixed size multiplication instruction. So far so good. This satisfies requirements 1) and 2). Unfortunately it does not satisfy requirement 3). Here is why. The largest unsigned integer type which ANSI gives me is 'unsigned long'. With almost all processors and C compilers I know 'unsigned long' is just as wide as an register (usually this means 32 bit, but not necessarily so). Please don't tell me that I should buy a C compiler where 'unsigned long' is twice as wide as an register, unless you can tell me which C compilers for 80386, MIPS, and Sparc systems have this feature. Now most modern processors *do* have an instruction that multiplies two registers and puts the result into a register pair or a set of special registers. Here is a short list (I also have a list with more information, which I can send on request). 32 bit processors that have 32x32->64 instructions: AMD 29000, AMD 29050, DEC Alpha 21064, HP PA-RISC 1.1, IBM RS6000, Transputer T9000, Intel i[34]86, Intel 960CA/CF, MIPS R[2346]000, Motorola 680[234]0, Motorola 88110, Sparc V7 (multiply step), Sparc V8 32 bit processors that have no 32x32->64 instruction: ARM [23], HP PA-RISC 1.0, Intel i860, Motorola 680[01]0, Motorola 881000 64 bit processors that have 64x64->128 instructions: DEC Alpha 21064, MIPS R4000 In almost all cases the 32x32->64 instructions are about as fast as the 16x16->32 instructions (in fact in most cases the 32x32->64 instruction will be used for 16x16->32 multiplications as well). That means: *if I could use the 32x32->64 instruction I could make my multiplication run four times as fast*. This is because I could then make 'TypDigit' twice as wide, implying that my integers have only half as many digits, speeding up the multiplication by a factor of four. (For 64 bit microprocessors the same applies, one could make the digits 64 bits wide and win a factor of four over a portable implementation that uses 32 bit digits). This is *not* idle speculation. By using small assembler functions one can show that this factor can really be obtained. However, I cannot use the 32x32->64 multiplication, because the C compiler provides no unsigned integer type to hold the result. To convince me otherwise, do the following a) pick a processor of the above list of 32 bit processors with 32x32->64 instruction b) pick a compiler c) write a piece of ANSI conforming code that multiplies two 32 bit unsigned integers and produces a 64 bit result d) show me the assembler output that uses only one 32x32->64 mult. instruction If you can do this, I may think about retracting the following statement. *For any of the processors HP PA-RISC 1.1, MIPS R[234]000, Motorola 680[234]0, Sparc V7&V8 there is an assembler function that multiplies two large integers at least twice as fast as any ANSI conforming C code compiled with any widely available compiler.* This is actually a weak statement. The strong statement would contain a larger list of processors and replace the factor of two with something between 3 and 4. Of course if one is willing to drop requirement 3), one has a lot more options. One could use 'asm' code (this is what GNU GMP does with GNU CC), 'long long' integer types, or use a small assembler kernel (this is what BigNum, PARI, and my arithmetic do). Martin. -- .- .-. - .. -. .-.. --- ...- . ... .- -. -. .. -.- .- Martin Sch"onert, Martin.Schoenert@Math.RWTH-Aachen.DE, +49 241 804551 Lehrstuhl D f"ur Mathematik, Templergraben 64, RWTH, D 51 Aachen, Germany