- Xref: sparky comp.benchmarks:1723 comp.arch:10862
- Path: sparky!uunet!ogicse!uwm.edu!ux1.cso.uiuc.edu!csrd.uiuc.edu!sp90.csrd.uiuc.edu!grout
- From: grout@sp90.csrd.uiuc.edu (John R. Grout)
- Newsgroups: comp.benchmarks,comp.arch
- Subject: Re: DEC ALPHA Performance Claims
- Message-ID: <1992Nov18.163410.18990@csrd.uiuc.edu>
- Date: 18 Nov 92 16:34:10 GMT
- Article-I.D.: csrd.1992Nov18.163410.18990
- References: <BxH7s7.5Cv@inews.Intel.COM> <4248@bcstec.ca.boeing.com> <1992Nov16.174912.22905@ryn.mro4.dec.com>
- Sender: news@csrd.uiuc.edu
- Reply-To: j-grout@uiuc.edu
- Organization: UIUC Center for Supercomputing Research and Development
- Lines: 34
-
- bhandarkar@wrksys.enet.dec.com (Dileep Bhandarkar) writes:
-
-
- >In article <4248@bcstec.ca.boeing.com>, silverm@bcstec.ca.boeing.com (Jeff Silverman) writes...
- >>
-
- >Optimal instruction scheduling for newer processors may indeed be different.
- >Old binaries will run correctly, but probably somewhat slower. New binaries
- >should in most cases run well on old machines, unless the scheduling rules
- >are at odds.
-
- There is more than one way for the scheduling rules to be at odds... one less
- obvious one which comes to mind is different tradeoff points between code size
- and structure and execution speed for different issue rates: for example,
- binaries intended for a six-instruction issue per cycle machine (which would
- try to create longer runs between branches by doing more speculative execution
- and duplicating more code) could be significantly larger than binaries
- intended for a two-instruction issue per cycle machine.
-
- Running either machine's binaries on the other will work (if they are sufficiently
- upward and downward compatible with regard to things like hints... which Alpha
- should be) but is _not_ likely to give good performance. For example, the
- six-issue machine would encounter shorter runs and higher interference within
- groups of six instructions it tried to issue together; the two-issue machine
- would encounter a larger program, causing more instruction cache misses and
- more page faults, and more unnecessary speculatively-executed code.
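
A back-of-the-envelope sketch of that mismatch, in Python. Everything in it is an assumption made for illustration: the rule that an issue group cannot span a branch, the run lengths, the instruction counts, and the names `cycles`, `slot_utilization`, `narrow_binary` and `wide_binary`. It ignores caches, page faults, and data dependences entirely, so it only shows the issue-slot side of the argument, not measured behavior of any real machine.

```python
from math import ceil

def cycles(run_lengths, issue_width):
    """Cycles needed to issue a list of branch-free runs.

    Assumed (hypothetical) rule: an issue group never spans a branch,
    so each run is rounded up to a whole number of issue groups.
    """
    return sum(ceil(n / issue_width) for n in run_lengths)

def slot_utilization(run_lengths, issue_width):
    """Fraction of available issue slots that hold an instruction."""
    issued = sum(run_lengths)
    return issued / (cycles(run_lengths, issue_width) * issue_width)

# Made-up binaries for the same program.
# Scheduled for a two-issue machine: 12 short runs, 48 instructions.
narrow_binary = [4] * 12
# Scheduled for a six-issue machine: 5 long runs, 60 instructions
# (the extra 12 stand in for duplicated and speculative code).
wide_binary = [12] * 5

if __name__ == "__main__":
    for name, binary in (("narrow", narrow_binary), ("wide", wide_binary)):
        for width in (2, 6):
            print(f"{name} binary, {width}-issue: "
                  f"{sum(binary)} instrs, {cycles(binary, width)} cycles, "
                  f"{slot_utilization(binary, width):.0%} slots used")
```

Under these made-up numbers, the six-issue machine fills only two thirds of its slots on the narrow binary (12 cycles against 10 for the binary built for it), while the two-issue machine spends 30 cycles on the wide binary against 24 for its own, before counting the extra cache misses the larger program would cause.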
-
- This raises the possibility of translating binaries _between_ implementation
- levels of the same architecture (especially toward higher issue rates)...
- something which I would hope DEC is thinking about.
- --
- John R. Grout j-grout@uiuc.edu
- University of Illinois, Urbana-Champaign
- Center for Supercomputing Research and Development
-