NetNews Usenet Archive 1992 #27

Internet Message Format | 1992-11-19 | 2.5 KB

Xref: sparky comp.arch:11035 comp.lang.misc:3845
Newsgroups: comp.arch,comp.lang.misc
Path: sparky!uunet!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!hamblin.math.byu.edu!hellgate.utah.edu!lanl!cochiti.lanl.gov!jlg
From: jlg@cochiti.lanl.gov (J. Giles)
Subject: Re: Hardware Support for Numeric Algorithms
Message-ID: <1992Nov23.183204.12281@newshost.lanl.gov>
Sender: news@newshost.lanl.gov
Organization: Los Alamos National Laboratory
References: <722061187@sheol.UUCP> <1efh6vINNm6c@network.ucsd.edu> <722228704@sheol.UUCP>
Date: Mon, 23 Nov 1992 18:32:04 GMT
Lines: 37

In article <722228704@sheol.UUCP>, throopw@sheol.UUCP (Wayne Throop) writes:
|> [...]
|> But ultimately, when you are going after that last 1% or .1% of speed,
|> you will simply have to spell out the istream, operation by operation.

Unfortunately, the speedup from going to assembly is usually in the
5% to 10% area. And yes, there are people to whom such an increase is
worth a man-year or two of effort to achieve. They'd rather spend less
effort if the CS community would provide better integrated tools for
optimizing, and there are people who cannot afford the effort these
kinds of optimizations require, but who would also benefit if the
tools were improved.

|> [...]
|> No amounts of hints to the optimizer will ever be enough, because
|> ultimately the optimizer can only deal with hints it is prepared to
|> accept, and if a programmer comes up with something unforeseen by the
|> optimizer, there's no simple way to give a hint about it.

True. However, it's possible to keep the amount of explicit assembly
to a minimum if the proper interfacing protocols are present. For
example, neither C nor Fortran (nor most other high-level languages)
has a way for the user to get the leading zero count of the vector
mask on a Cray. I have algorithms which need that computation and
which are time-critical. The nearest that the high-level languages
can do is some clumsy loop.
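For illustration, the kind of "clumsy loop" described above might look
like the following minimal C sketch. It assumes a 64-element vector
whose mask holds one bit per element, with element 0 in the most
significant bit; that layout and the names build_mask_gt and
leading_zeros_loop are hypothetical, chosen for this example rather
than taken from any Cray compiler.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN 64  /* assumed vector length: one mask bit per element */

/* Hypothetical helper: set bit i (counted from the most significant
   bit) when a[i] > threshold, mimicking a vector compare that writes
   its result into a mask register. */
static uint64_t build_mask_gt(const double *a, size_t n, double threshold)
{
    uint64_t mask = 0;
    if (n > VLEN)
        n = VLEN;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > threshold)
            mask |= (uint64_t)1 << (VLEN - 1 - i);
    }
    return mask;
}

/* The "clumsy loop": count zero bits from the most significant end,
   one bit per iteration. Returns 64 for an all-zero mask. */
static int leading_zeros_loop(uint64_t mask)
{
    int count = 0;
    for (uint64_t bit = (uint64_t)1 << 63;
         bit != 0 && (mask & bit) == 0;
         bit >>= 1)
        count++;
    return count;
}
```

Under the assumed layout, leading_zeros_loop(build_mask_gt(a, n, 0.0))
is the index of the first element of a that is greater than zero, or
64 if there is none.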
All I really need are some intrinsics corresponding to the
instructions which set the vector mask based on some array
conditionals, and an intrinsic which does a leading zero count
instruction. In fact, if all the machine instructions were available
as intrinsic function calls, then the compiler need only be able to do
adequate register allocation between these intrinsics (and also into
the code generated from the higher level) and I could optimize any
part of the code without paying procedure-call overhead.

--
J. Giles
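As a rough analogue of the intrinsic interface asked for above, some
compilers expose individual instructions as builtin functions. The C
sketch below assumes GCC or Clang, whose __builtin_clzll counts the
leading zeros of an unsigned long long; the wrapper name lzc64 is
hypothetical, and none of this is a Cray interface.

```c
#include <stdint.h>

/* Hypothetical wrapper: leading zero count of a 64-bit mask.
   __builtin_clzll is undefined for 0, so that case is handled first. */
static inline int lzc64(uint64_t mask)
{
    if (mask == 0)
        return 64;
#if defined(__GNUC__) || defined(__clang__)
    /* Typically expanded inline by the compiler rather than called. */
    return __builtin_clzll((unsigned long long)mask);
#else
    /* Portable fallback for compilers without the builtin. */
    int count = 0;
    while ((mask & ((uint64_t)1 << 63)) == 0) {
        mask <<= 1;
        count++;
    }
    return count;
#endif
}
```

Because the wrapper is a static inline function around a builtin, the
compiler is free to keep the mask in a register across the call site,
which is the property the argument above depends on.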