- Newsgroups: comp.lang.fortran
- Path: sparky!uunet!cs.utexas.edu!uwm.edu!ux1.cso.uiuc.edu!news.cso.uiuc.edu!uimrl7.mrl.uiuc.edu!ercolessi
- From: ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi)
- Subject: Re: Compiler groups working on real apps?
- References: <1993Jan21.081105.4047@molene.ifremer.fr> <1993Jan22.193019.12936@news.eng.convex.com> <C19wHr.GF3@news.cso.uiuc.edu> <1993Jan23.003639.13681@craycos.com>
- Message-ID: <C1Cnq5.1zt@news.cso.uiuc.edu>
- Sender: usenet@news.cso.uiuc.edu (Net Noise owner)
- Reply-To: ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi)
- Organization: MRL - UIUC
- Date: Sun, 24 Jan 1993 08:32:29 GMT
- Lines: 154
-
- In article <1993Jan23.003639.13681@craycos.com>, jrbd@craycos.com (James Davies) writes:
- |>In article <C19wHr.GF3@news.cso.uiuc.edu> ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi) writes:
- |>
- |>>This is probably too much to ask. But I am wondering. Maybe any one of
- |>>us (users) could spend some time in putting together some "typical" code,
- |>>containing a simplified version of the loops where our programs spend
- |>>most of the time. Staying within, say, 100 lines. These "test cases" could
- |>>be posted on c.l.f. (and stored at some anon ftp site) and compiler developers
- |>>could have a look at them and perhaps improve the compilers.
- |>>
- |>>These typical loops could perhaps also be coded as self-contained
- |>>benchmarks a-la-Linpack, inducing a bit of competitiveness :-),
- |>>helping us to understand better the strengths and weaknesses of the
- |>>various architectures, but not necessarily so.
- |>>
- |>>Any comment?
- |>
- |>You've just described the Livermore Loops.
-
- After seeing this comment, I fired up xnetlib, downloaded the Livermore
- loops and looked at them. I must admit I was not familiar with them.
- While there is certainly an interesting variety of code in there (there
- are 24 different 'kernels'), at first glance I was not able to locate
- anything that I feel represents the kind of code I am dealing with.
-
- My applications involve classical molecular dynamics (i.e., interacting atoms
- treated like classical masses and moved by time integration of F=ma).
- There are typically many thousands of atoms, but the interactions are
- short-ranged, so it is not necessary to loop over all pairs to obtain
- the forces. One performs a topological analysis from time to time and,
- for each atom, singles out the indexes of the other atoms that happen
- to be hanging around it at that particular time. The time-consuming
- computations are then done using this data structure. An example of
- such computations is given at the end of this post.
-
- This naturally leads to indirect addressing [ A(LIST(J)) ], and we also
- use a fair amount of table look-up techniques, which are similar in a
- sense (in the first case the index is picked up from somewhere, in the
- second case it comes out of a computation; in both cases the index is
- known only at run time). Where are these things in the Livermore loops?
- Kernels 13 and 16, if I am not wrong, are the only ones using indirect
- addressing. Kernel 16 is awfully coded and seems to test the ability to
- deal with a cascade of arithmetic IFs; I admit I cannot make much sense
- out of it. Kernel 13 does indeed do nice tricky things with indexes,
- but it spends its time doing additions, real-to-integer conversions and
- integer shifts. There are no multiplications or divisions in kernel 13.
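- To make the table look-up pattern concrete, here it is in C, with an
- illustrative table holding f(r) = r*r so the interpolated result can be
- checked against the exact function (names and spacing are mine, not
- from any production code):

```c
#include <math.h>

#define NTAB 1000
static double tab[NTAB];
static const double rmin = 0.0;
static const double dr = 1.0 / NTAB;   /* table spacing */

/* Fill the table with the illustrative function f(r) = r*r on [0, 1). */
void fill_table(void)
{
    for (int m = 0; m < NTAB; m++) {
        double r = rmin + m * dr;
        tab[m] = r * r;
    }
}

/* Linear interpolation, the same scheme as in the kernel below: the
   index m is known only at run time.  Assumes
   rmin <= r < rmin + (NTAB-1)*dr. */
double lookup(double r)
{
    double s = (r - rmin) / dr;  /* fractional table index */
    int m = (int)s;              /* truncates, like Fortran's  m = sqd */
    double del = s - m;
    return tab[m + 1] * del + tab[m] * (1.0 - del);
}
```

- The compiler cannot know which table entries are touched, so these
- loads are as unpredictable as the A(LIST(J)) gathers.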
-
- Moreover, these loops play with arrays which are quite small.
- For example, 4096 elements in total for each array in kernel 13.
- Maybe this is not of much concern to compiler writers, since this is
- after all a hardware issue, but nowadays the cache behavior is a
- factor which often dominates benchmark results on real-world programs.
- With present CPUs and memories, array sizes are happily going into the
- millions, usually exceeding the cache size. If compilers are tuned to
- tiny benchmarks which fit into the cache, there could be surprises when
- moving to real-world applications. It is not rare to see dramatic
- changes in performance on many new RISC architectures when the problem
- size is increased or arrays are transposed.
- Could compilers at least try to alleviate these problems?
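- The effect is easy to demonstrate. In the C sketch below (C is
- row-major, the opposite convention from Fortran) the two functions do
- identical arithmetic and differ only in traversal order; once the
- matrix is larger than the cache, the unit-stride version typically wins
- by a large factor. Names and sizes are mine:

```c
#include <stddef.h>

#define M 512          /* 2 MB of doubles: bigger than typical caches */

static double a[M][M];

void fill(void)
{
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < M; j++)
            a[i][j] = 1.0;
}

/* Innermost index varies fastest: consecutive addresses, stride 1. */
double sum_unit_stride(void)
{
    double s = 0.0;
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < M; j++)
            s += a[i][j];
    return s;
}

/* Same arithmetic, but the inner loop strides by M doubles. */
double sum_large_stride(void)
{
    double s = 0.0;
    for (size_t j = 0; j < M; j++)
        for (size_t i = 0; i < M; i++)
            s += a[i][j];
    return s;
}
```

- Timing the two on a given machine shows how much of a "benchmark
- result" is really a cache result.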
-
- ****
-
- I cooked up a 'molecular dynamics kernel' that I feel is not well
- represented by any of the Livermore loops. Here it is. Any comment is
- welcome, in particular whether you feel that there *is* a Livermore
- kernel that represents what this code does fairly well, so that we
- could look at performance results for that kernel and draw conclusions
- about how a certain machine will handle molecular dynamics.
-
- At this stage the code below is meant for visual inspection only (there
- may be errors). If there is interest I can make a real benchmark out
- of it.
-
- I think there are a few people out here doing the same stuff [I see
- a few competitors on this group from time to time :-) ... to them I say
- that my production code is certainly not the junk below :-)], and
- certainly there are hundreds of CPUs in the world crunching this kind
- of code as you read this, for applications in physics, chemistry and
- materials science, so it should not be regarded as esoteric.
-
- Thank you to anyone taking the time to consider this! :-)
-
-
- implicit double precision (a-h,o-z)
- * typical production value:
- parameter (n=100000)
- common/c1/ x(n),y(n),z(n),fx(n),fy(n),fz(n)
- parameter (n50=50*n)
- common/c2/ mrkr1(n),mrkr2(n),list(n50)
- parameter (ntab=10000)
- common/c3/ tab(ntab)
-
- * rcutsq, rminsq and drsqi would be set during initialization;
- * purely illustrative values (the last maps rijsq in
- * [rminsq,rcutsq] onto table indexes 1..ntab):
- rcutsq = 0.01d0
- rminsq = 1.d-4
- drsqi = (ntab-1)/(rcutsq-rminsq)
-
- do i=1,n
- * do on jlist typically iterates 50 times
- do jlist=mrkr1(i),mrkr2(i)
- * j obtained by indirect addressing.
- * j jumps here and there like crazy typically.
- * mrkr1,mrkr2 and list have been prepared previously.
- * it is guaranteed that j is never equal to i.
- j = list(jlist)
- *
- * the x(i),y(i),z(i) are in the range (-0.5,0.5).
- * the following methods A,B,C to fold the distance
- * applying periodic boundary conditions are equivalent,
- * and one can choose the fastest one on a given
- * machine. for any of them there is a machine where
- * that way is the fastest way.
- *
- * periodic boundary conditions on x using method A
- xij = x(i) - x(j)
- nx = xij + xij
- xij = xij - nx
- * periodic boundary conditions on y using method B
- yij = y(i) - y(j)
- if (yij.gt. 0.5d0) yij = yij - 1.d0
- if (yij.lt.-0.5d0) yij = yij + 1.d0
- * periodic boundary conditions on z using method C
- zij = z(i) - z(j)
- if (abs(zij).gt.0.5d0) zij = zij - sign(1.d0,zij)
- *
- * compute distance and do further work only if
- * lower than a certain threshold:
- rijsq = xij**2 + yij**2 + zij**2
- if (rijsq.lt.rcutsq) then
- * use rijsq to look-up into a table, using
- * linear interpolation:
- sqd = (rijsq - rminsq)*drsqi + 1.d0
- m = sqd
- delsqd = sqd - m
- fff = tab(m+1)*delsqd + tab(m)*(1.d0-delsqd)
- fxij = fff*xij
- fyij = fff*yij
- fzij = fff*zij
- * accumulate the pair force on both particles,
- * applying Newton's third law.
- fx(i) = fx(i) + fxij
- fy(i) = fy(i) + fyij
- fz(i) = fz(i) + fzij
- fx(j) = fx(j) - fxij
- fy(j) = fy(j) - fyij
- fz(j) = fz(j) - fzij
- endif
- enddo
- enddo
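-
- As a footnote to the comment in the kernel, the three folding methods
- A, B and C can be transcribed into C and checked against one another
- (function names are mine; Fortran's sign(1.d0,z) corresponds to C's
- copysign(1.0,z)):

```c
#include <math.h>

/* Method A: truncating float-to-int conversion */
double fold_a(double d)
{
    int n = (int)(d + d);        /* truncates toward zero, like Fortran */
    return d - n;
}

/* Method B: two compares and conditional adds */
double fold_b(double d)
{
    if (d >  0.5) d -= 1.0;
    if (d < -0.5) d += 1.0;
    return d;
}

/* Method C: abs and transfer-of-sign */
double fold_c(double d)
{
    if (fabs(d) > 0.5) d -= copysign(1.0, d);
    return d;
}
```

- All three map a separation in (-1,1) into (-0.5,0.5); which one is
- fastest depends on how a given machine prices conversions, branches
- and sign manipulation.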
-
-
- --
- furio ercolessi <furio@uiuc.edu>* <furio@sissa.it>+
- * materials research lab, uni illinois at urbana-champaign
- + intl school for advanced studies, trieste, italy
-
- "Change nothing and continue with immaculate consistency"
- [ Brian Eno, "Oblique Strategies" ]
-