NetNews Usenet Archive 1993 #3

Internet Message Format | 1993-01-25 | 2.8 KB

Path: sparky!uunet!olivea!hal.com!decwrl!deccrl!news.crl.dec.com!dbased.nuo.dec.com!quark.enet.dec.com!lionel
From: lionel@quark.enet.dec.com (Steve Lionel)
Newsgroups: comp.lang.fortran
Subject: Re: Compiler groups working on real apps?
Message-ID: <1993Jan25.202650.2851@dbased.nuo.dec.com>
Date: 25 Jan 93 20:26:50 GMT
References: <1993Jan21.081105.4047@molene.ifremer.fr> <1993Jan22.193019.12936@news.eng.convex.com> <C19wHr.GF3@news.cso.uiuc.edu> <1993Jan23.003639.13681@craycos.com> <C1Cnq5.1zt@news.cso.uiuc.edu>
Sender: news@dbased.nuo.dec.com (USENET News System)
Reply-To: lionel@quark.enet.dec.com (Steve Lionel)
Organization: Digital Equipment Corporation, Nashua NH
Lines: 44

In article <C1Cnq5.1zt@news.cso.uiuc.edu>, ercolessi@uimrl3.mrl.uiuc.edu (furio ercolessi) writes:
|>
|>Moreover, these loops play with arrays which are quite small.
|>For example, 4096 elements in total for each array in kernel 13.
|>Maybe this is not of much concern to compiler writers, since this is
|>after all a hardware issue, but nowadays the cache behavior is a
|>factor which often dominates benchmark results on real-world programs.
|>With present CPUs and memories, array sizes are happily going into the
|>millions, usually exceeding the cache sizes. If compilers are tuned to
|>tiny benchmarks which fit into the cache, there could be surprises when
|>moving to the real world applications. It is not rare to see dramatic
|>changes in performance when increasing the problem size, or transposing
|>arrays, on many new RISC architectures.
|>Could compilers at least try to alleviate these problems?
|>

Yup. DEC Fortran V6 for OpenVMS VAX has an optional level of optimization
which is intended to improve performance of applications which operate over
large arrays and which tend to induce cache thrashing. It uses dependence
analysis to see if it is safe to add an additional level of "chunking loops"
and/or do loop reordering so that memory accesses are clustered together.
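
As a rough illustration of the "chunking loops" idea, here is a minimal
sketch in C. It is not DEC Fortran's actual transformation: the tile size
TILE and both function names are assumptions made for this example, and C's
row-major layout is the mirror image of Fortran's column-major one. The
blocked version is a legal rewrite here only because every element of dst is
written exactly once and src and dst are assumed not to overlap, which is
the kind of fact dependence analysis has to establish.

```c
#include <stddef.h>

/* Hypothetical tile edge; a real compiler would derive it from the cache size. */
#define TILE 32

/* Straightforward transpose. dst is written contiguously, but src is read
   with a stride of n elements, so for large n nearly every read touches a
   different cache line (and eventually a different page). */
void transpose_naive(size_t n, const double *src, double *dst)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[i * n + j] = src[j * n + i];
}

/* Same result with two extra outer "chunking" loops. Each TILE x TILE block
   of src and dst is finished before moving on, so the working set stays
   small no matter how large n is. Assumes src and dst do not overlap. */
void transpose_blocked(size_t n, const double *src, double *dst)
{
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE) {
            /* Clamp the block edges when n is not a multiple of TILE. */
            size_t i_end = ii + TILE < n ? ii + TILE : n;
            size_t j_end = jj + TILE < n ? jj + TILE : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[i * n + j] = src[j * n + i];
        }
}
```

Both functions produce identical output for any n; only the order in which
memory is visited differs, which is where any speedup on large arrays would
come from.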
If this optimization is used, it can allow the application to maintain level
performance no matter how large the problem size, rather than having
performance decrease due to cache thrashing and excessive page faults. The
qualifier to enable this is /OPTIMIZE=LEVEL=4; look for more details on it in
the DEC Fortran Performance Guide for OpenVMS VAX Systems when it reaches
your door. If you have the KAP preprocessor, it too can do some amount of
optimization for improved memory access.

However... this optimization won't help the particular piece of code you
posted, as the order of element access is unpredictable at compile-time. It
looks like a good test of the OS/hardware treatment of memory access, less so
of compiler optimization.

--
Steve Lionel                    lionel@quark.enet.dec.com
SDT Languages Group
Digital Equipment Corporation
110 Spit Brook Road
Nashua, NH 03062