- Xref: sparky comp.arch:11035 comp.lang.misc:3845
- Newsgroups: comp.arch,comp.lang.misc
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!hamblin.math.byu.edu!hellgate.utah.edu!lanl!cochiti.lanl.gov!jlg
- From: jlg@cochiti.lanl.gov (J. Giles)
- Subject: Re: Hardware Support for Numeric Algorithms
- Message-ID: <1992Nov23.183204.12281@newshost.lanl.gov>
- Sender: news@newshost.lanl.gov
- Organization: Los Alamos National Laboratory
- References: <722061187@sheol.UUCP> <1efh6vINNm6c@network.ucsd.edu> <722228704@sheol.UUCP>
- Date: Mon, 23 Nov 1992 18:32:04 GMT
- Lines: 37
-
- In article <722228704@sheol.UUCP>, throopw@sheol.UUCP (Wayne Throop) writes:
- |> [...]
- |> But ultimately, when you are going after that last 1% or .1% of speed,
- |> you will simply have to spell out the istream, operation by operation.
-
- Unfortunately, the speedup from going to assembly is usually in the
- 5% to 10% area. And yes, there are people to whom such an increase
- is worth a man-year or two of effort to achieve. They'd rather spend
- less effort, if the CS community would provide better-integrated tools
- for optimizing. And there are people who cannot afford the effort
- these kinds of optimizations require, but who would also benefit if the
- tools were improved.
-
- |> [...]
- |> No amounts of hints to the optimizer will ever be enough, because
- |> ultimately the optimizer can only deal with hints it is prepared to
- |> accept, and if a programmer comes up with something unforeseen by the
- |> optimizer, there's no simple way to give a hint about it.
-
- True. However, it's possible to keep the amount of explicit
- assembly to a minimum if the proper interfacing protocols
- are present. For example, neither C nor Fortran (nor most other
- high-level languages) have a way for the user to get the leading
- zero count of the vector mask on a Cray. I have algorithms which
- need that computation and which are time-critical. The closest
- the high-level languages can come is some clumsy loop. All
- I really need are some intrinsics corresponding to the instructions
- which set the vector mask based on some array conditionals and an
- intrinsic which does a leading zero count instruction. In fact,
- if all the machine instructions were available as intrinsic function
- calls, then the compiler need only be able to do adequate register
- allocation between these intrinsics (and also into the code generated
- from the higher level) and I could optimize any part of the code
- without paying procedure-call overhead.
-
- --
- J. Giles
-