- Path: sparky!uunet!olivea!hal.com!decwrl!deccrl!news.crl.dec.com!dbased.nuo.dec.com!quark.enet.dec.com!lionel
- From: lionel@quark.enet.dec.com (Steve Lionel)
- Newsgroups: comp.lang.fortran
- Subject: Re: Compiler groups working on real apps?
- Message-ID: <1993Jan25.202650.2851@dbased.nuo.dec.com>
- Date: 25 Jan 93 20:26:50 GMT
- References: <1993Jan21.081105.4047@molene.ifremer.fr> <1993Jan22.193019.12936@news.eng.convex.com> <C19wHr.GF3@news.cso.uiuc.edu> <1993Jan23.003639.13681@craycos.com> <C1Cnq5.1zt@news.cso.uiuc.edu>
- Sender: news@dbased.nuo.dec.com (USENET News System)
- Reply-To: lionel@quark.enet.dec.com (Steve Lionel)
- Organization: Digital Equipment Corporation, Nashua NH
- Lines: 44
-
-
- In article <C1Cnq5.1zt@news.cso.uiuc.edu>, ercolessi@uimrl3.mrl.uiuc.edu
- (furio ercolessi) writes:
- |>
- |>Moreover, these loops play with arrays which are quite small.
- |>For example, 4096 elements in total for each array in kernel 13.
- |>Maybe this is not of much concern to compiler writers, since this is
- |>after all a hardware issue, but nowadays the cache behavior is a
- |>factor which often dominates benchmark results on real-world programs.
- |>With present CPUs and memories, array sizes are happily going into the
- |>millions, usually exceeding the cache sizes. If compilers are tuned to
- |>tiny benchmarks which fit into the cache, there could be surprises when
- |>moving to the real world applications. It is not rare to see dramatic
- |>changes in performance when increasing the problem size, or transposing
- |>arrays, on many new RISC architectures.
- |>Could compilers at least try to alleviate these problems?
- |>
-
- Yup. DEC Fortran V6 for OpenVMS VAX has an optional level of optimization which
- is intended to improve performance of applications which operate over large
- arrays and which tend to induce cache thrashing. It uses dependence analysis
- to see if it is safe to add a level of "chunking loops" and/or to reorder
- loops so that memory accesses are clustered together. If
- this optimization is used, it can allow the application to maintain level
- performance no matter how large the problem size, rather than having performance
- decrease due to cache thrashing and excessive page faults. The qualifier to
- enable this is /OPTIMIZE=LEVEL=4; look for more details on it in the
- DEC Fortran Performance Guide for OpenVMS VAX Systems when it reaches your
- door.
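
As a rough illustration of what adding a level of "chunking loops" means, here is a hand-written sketch in C. It is not output from DEC Fortran or KAP, and the names `N`, `BLOCK`, `transpose_naive`, and `transpose_chunked` are hypothetical, chosen only for this example. A matrix transpose is used because its iterations are independent, so dependence analysis would find the reordering safe.

```c
#include <stddef.h>

#define N 64      /* hypothetical problem size */
#define BLOCK 16  /* hypothetical chunk size; a compiler would pick this to suit the cache */

/* Straightforward transpose: each write to dst lands N elements away
 * from the previous one, so large arrays keep evicting cache lines. */
void transpose_naive(double src[N][N], double dst[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            dst[j][i] = src[i][j];
}

/* Same computation with an extra level of chunking loops (ii, jj).
 * The inner two loops touch only a BLOCK x BLOCK tile of each array,
 * so the memory accesses are clustered together. */
void transpose_chunked(double src[N][N], double dst[N][N])
{
    for (size_t ii = 0; ii < N; ii += BLOCK)
        for (size_t jj = 0; jj < N; jj += BLOCK)
            for (size_t i = ii; i < ii + BLOCK && i < N; i++)
                for (size_t j = jj; j < jj + BLOCK && j < N; j++)
                    dst[j][i] = src[i][j];
}
```

Both routines produce the same result; only the order in which elements are visited differs. Whether the chunked form is actually faster depends on the array size relative to the cache, which is why the sketch makes no timing claim.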
-
- If you have the KAP preprocessor, it too can do some amount of optimization
- for improved memory access.
-
- However... this optimization won't help the particular piece of code you
- posted, as the order of element access is unpredictable at compile-time. It
- looks like a good test of the OS/hardware treatment of memory access, less so
- of compiler optimization.
- --
-
- Steve Lionel lionel@quark.enet.dec.com
- SDT Languages Group
- Digital Equipment Corporation
- 110 Spit Brook Road
- Nashua, NH 03062
-