- Newsgroups: comp.lang.fortran
- Path: sparky!uunet!cs.utexas.edu!uwm.edu!ux1.cso.uiuc.edu!news.cso.uiuc.edu!uimrl7.mrl.uiuc.edu!ercolessi
- From: ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi)
- Subject: Re: Compiler groups working on real apps?
- References: <1993Jan21.081105.4047@molene.ifremer.fr> <1993Jan22.193019.12936@news.eng.convex.com> <C19wHr.GF3@news.cso.uiuc.edu> <1993Jan23.003639.13681@craycos.com>
- Message-ID: <C1Cnq5.1zt@news.cso.uiuc.edu>
- Sender: usenet@news.cso.uiuc.edu (Net Noise owner)
- Reply-To: ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi)
- Organization: MRL - UIUC
- Date: Sun, 24 Jan 1993 08:32:29 GMT
- Lines: 154
-
- In article <1993Jan23.003639.13681@craycos.com>, jrbd@craycos.com (James Davies) writes:
- |>In article <C19wHr.GF3@news.cso.uiuc.edu> ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi) writes:
- |>
- |>>This is probably too much to ask. But I am wondering. Maybe any one of
- |>>us (users) could spend some time in putting together some "typical" code,
- |>>containing a simplified version of the loops where our programs spend
- |>>most of the time. Staying within, say, 100 lines. These "test cases" could
- |>>be posted on c.l.f. (and stored at some anon ftp site) and compiler developers
- |>>could have a look at them and perhaps improve the compilers.
- |>>
- |>>These typical loops could perhaps also be coded as self-contained
- |>>benchmarks a-la-Linpack, inducing a bit of competitiveness :-),
- |>>helping us to understand better the strengths and weaknesses of the
- |>>various architectures, but not necessarily so.
- |>>
- |>>Any comment?
- |>
- |>You've just described the Livermore Loops.
-
- After seeing this comment, I fired up xnetlib, downloaded the Livermore
- loops and looked at them. I must admit I was not familiar with them.
- While there is certainly an interesting variety of code in there (there
- are 24 different 'kernels'), at first glance I was not able to locate
- anything that I feel represents the kind of code I am dealing with.
-
- My applications involve classical molecular dynamics (i.e., interacting atoms
- treated like classical masses and moved by time integration of F=ma).
- There are typically many thousands of atoms, but the interactions are
- short-ranged, so it is not necessary to loop over all pairs to obtain
- the forces. One performs a topological analysis from time to time and,
- for each atom, singles out the indexes of the other atoms that happen
- to be hanging around it at that particular time. The time-consuming
- computations are then done using this data structure. An example of
- such computations is given at the end of this post.
-
- This naturally leads to indirect addressing [ A(LIST(J)) ], and we also
- use a fair amount of table look-up techniques, which are similar in a
- sense (in the first case the index is picked up from somewhere, in the
- second case it comes out of a computation; in both cases the index is
- known only at run time). Where are these things in the Livermore loops?
- Kernels 13 and 16, if I am not wrong, are the only ones using indirect
- addressing. Kernel 16 is awfully coded and seems to test the ability to
- deal with a cascade of arithmetic IFs; I admit I cannot make much sense
- out of it. Kernel 13 does indeed do nice tricky things with indexes,
- but it spends its time doing additions, real-to-integer conversions and
- integer shifts. There are no multiplications or divisions in kernel 13.
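- To make the table look-up pattern concrete, here it is in C, with an
- illustrative table holding f(r) = r*r so the interpolated result can be
- checked against the exact function (names and spacing are mine, not
- from any production code):

```c
#include <math.h>

#define NTAB 1000
static double tab[NTAB];
static const double rmin = 0.0;
static const double dr = 1.0 / NTAB;   /* table spacing */

/* Fill the table with the illustrative function f(r) = r*r on [0, 1). */
void fill_table(void)
{
    for (int m = 0; m < NTAB; m++) {
        double r = rmin + m * dr;
        tab[m] = r * r;
    }
}

/* Linear interpolation, the same scheme as in the kernel below: the
   index m is known only at run time.  Assumes
   rmin <= r < rmin + (NTAB-1)*dr. */
double lookup(double r)
{
    double s = (r - rmin) / dr;  /* fractional table index */
    int m = (int)s;              /* truncates, like Fortran's  m = sqd */
    double del = s - m;
    return tab[m + 1] * del + tab[m] * (1.0 - del);
}
```

- The compiler cannot know which table entries are touched, so these
- loads are as unpredictable as the A(LIST(J)) gathers.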
-
- Moreover, these loops play with arrays which are quite small.
- For example, 4096 elements in total for each array in kernel 13.
- Maybe this is not of much concern to compiler writers, since this is
- after all a hardware issue, but nowadays the cache behavior is a
- factor which often dominates benchmark results on real-world programs.
- With present CPUs and memories, array sizes are happily going into the
- millions, usually exceeding the cache size. If compilers are tuned to
- tiny benchmarks which fit into the cache, there could be surprises when
- moving to real-world applications. It is not rare to see dramatic
- changes in performance on many new RISC architectures when the problem
- size is increased or arrays are transposed.
- Could compilers at least try to alleviate these problems?
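- The effect is easy to demonstrate. In the C sketch below (C is
- row-major, the opposite convention from Fortran) the two functions do
- identical arithmetic and differ only in traversal order; once the
- matrix is larger than the cache, the unit-stride version typically wins
- by a large factor. Names and sizes are mine:

```c
#include <stddef.h>

#define M 512          /* 2 MB of doubles: bigger than typical caches */

static double a[M][M];

void fill(void)
{
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < M; j++)
            a[i][j] = 1.0;
}

/* Innermost index varies fastest: consecutive addresses, stride 1. */
double sum_unit_stride(void)
{
    double s = 0.0;
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < M; j++)
            s += a[i][j];
    return s;
}

/* Same arithmetic, but the inner loop strides by M doubles. */
double sum_large_stride(void)
{
    double s = 0.0;
    for (size_t j = 0; j < M; j++)
        for (size_t i = 0; i < M; i++)
            s += a[i][j];
    return s;
}
```

- Timing the two on a given machine shows how much of a "benchmark
- result" is really a cache result.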
-
- ****
-
- I cooked up a 'molecular dynamics kernel' that I feel is not well
- represented by any of the Livermore loops. Here it is. Any comment is
- welcome, in particular whether you feel that there *is* a Livermore
- kernel that represents what this code does fairly well, so that we
- could look at performance results for that kernel and draw conclusions
- about how a certain machine will handle molecular dynamics.
-
- At this stage the code below is meant for visual inspection only (there
- may be errors). If there is interest I can make a real benchmark out
- of it.
-
- I think there are a few people out here doing the same stuff [I see
- a few competitors on this group from time to time :-) ... to them I say
- that my production code is certainly not the junk below :-)], and
- certainly there are hundreds of CPUs in the world crunching this kind
- of code as you read this, for applications in physics, chemistry and
- materials science, so it should not be regarded as esoteric.
-
- Thank you to anyone taking the time to consider this! :-)
-
-
- implicit double precision (a-h,o-z)
- * typical production value:
- parameter (n=100000)
- common/c1/ x(n),y(n),z(n),fx(n),fy(n),fz(n)
- parameter (n50=50*n)
- common/c2/ mrkr1(n),mrkr2(n),list(n50)
- parameter (ntab=10000)
- common/c3/ tab(ntab)
-
- * rcutsq, rminsq and drsqi would be set during initialization;
- * purely illustrative values (the last maps rijsq in
- * [rminsq,rcutsq] onto table indexes 1..ntab):
- rcutsq = 0.01d0
- rminsq = 1.d-4
- drsqi = (ntab-1)/(rcutsq-rminsq)
-
- do i=1,n
- * do on jlist typically iterates 50 times
- do jlist=mrkr1(i),mrkr2(i)
- * j obtained by indirect addressing.
- * j jumps here and there like crazy typically.
- * mrkr1,mrkr2 and list have been prepared previously.
- * it is guaranteed that j is never equal to i.
- j = list(jlist)
- *
- * the x(i),y(i),z(i) are in the range (-0.5,0.5).
- * the following methods A,B,C to fold the distance
- * applying periodic boundary conditions are equivalent,
- * and one can choose the fastest one on a given
- * machine. for any of them there is a machine where
- * that way is the fastest way.
- *
- * periodic boundary conditions on x using method A
- xij = x(i) - x(j)
- nx = xij + xij
- xij = xij - nx
- * periodic boundary conditions on y using method B
- yij = y(i) - y(j)
- if (yij.gt. 0.5d0) yij = yij - 1.d0
- if (yij.lt.-0.5d0) yij = yij + 1.d0
- * periodic boundary conditions on z using method C
- zij = z(i) - z(j)
- if (abs(zij).gt.0.5d0) zij = zij - sign(1.d0,zij)
- *
- * compute distance and do further work only if
- * lower than a certain threshold:
- rijsq = xij**2 + yij**2 + zij**2
- if (rijsq.lt.rcutsq) then
- * use rijsq to look-up into a table, using
- * linear interpolation:
- sqd = (rijsq - rminsq)*drsqi + 1.d0
- m = sqd
- delsqd = sqd - m
- fff = tab(m+1)*delsqd + tab(m)*(1.d0-delsqd)
- fxij = fff*xij
- fyij = fff*yij
- fzij = fff*zij
- * accumulate the pair force on both particles,
- * applying Newton's third law.
- fx(i) = fx(i) + fxij
- fy(i) = fy(i) + fyij
- fz(i) = fz(i) + fzij
- fx(j) = fx(j) - fxij
- fy(j) = fy(j) - fyij
- fz(j) = fz(j) - fzij
- endif
- enddo
- enddo
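-
- As a footnote to the comment in the kernel, the three folding methods
- A, B and C can be transcribed into C and checked against one another
- (function names are mine; Fortran's sign(1.d0,z) corresponds to C's
- copysign(1.0,z)):

```c
#include <math.h>

/* Method A: truncating float-to-int conversion */
double fold_a(double d)
{
    int n = (int)(d + d);        /* truncates toward zero, like Fortran */
    return d - n;
}

/* Method B: two compares and conditional adds */
double fold_b(double d)
{
    if (d >  0.5) d -= 1.0;
    if (d < -0.5) d += 1.0;
    return d;
}

/* Method C: abs and transfer-of-sign */
double fold_c(double d)
{
    if (fabs(d) > 0.5) d -= copysign(1.0, d);
    return d;
}
```

- All three map a separation in (-1,1) into (-0.5,0.5); which one is
- fastest depends on how a given machine prices conversions, branches
- and sign manipulation.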
-
-
- --
- furio ercolessi <furio@uiuc.edu>* <furio@sissa.it>+
- * materials research lab, uni illinois at urbana-champaign
- + intl school for advanced studies, trieste, italy
-
- "Change nothing and continue with immaculate consistency"
- [ Brian Eno, "Oblique Strategies" ]
-