- Path: sparky!uunet!wupost!crcnis1.unl.edu!moe.ksu.ksu.edu!zaphod.mps.ohio-state.edu!cs.utexas.edu!sun-barr!news2me.EBay.Sun.COM!exodus.Eng.Sun.COM!flayout.Eng.Sun.COM!tremblay
- From: tremblay@flayout.Eng.Sun.COM (Marc Tremblay)
- Newsgroups: comp.arch
- Subject: Re: why no register + register addressing mode in R3000 (repost)
- Date: 16 Nov 1992 19:14:11 GMT
- Organization: Sun Microsystems, Mt. View, Ca.
- Lines: 77
- Message-ID: <lgfso3INN8if@exodus.Eng.Sun.COM>
- References: <lg5i5oINN1q4@exodus.Eng.Sun.COM> <endecotp.721675619@cs.man.ac.uk>
- NNTP-Posting-Host: flayout
-
- In article <endecotp.721675619@cs.man.ac.uk> endecotp@cs.man.ac.uk (PB Endecott (PhD SFurber)) writes:
- >On an architecture without this mode, assuming that each of these
- >operations would need a separate ADD to calculate the address, the number
- >of instructions needed to do the same work would increase by 6.2%. Does a
- >6.2% performance increase justify an extra register read port ?
-
- I sure would add a port for a 6.2% increase in integer performance.
- The level of integration achievable today, coupled with the presence
- of many critical paths not related to the register file, makes the
- cost of adding one read port negligible.
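The arithmetic behind the quoted figure can be sketched as a small model. It assumes, as the quoted post does, that every register + register load/store costs exactly one extra ADD and that cycles per instruction stay the same. The function name and parameter are illustrative only.

```python
def cost_without_reg_reg(f: float) -> tuple[float, float]:
    """Model the cost of dropping register + register addressing.

    f: extra instructions as a fraction of the original instruction count,
       i.e. register + register loads/stores per instruction (0.062 above).
    Assumption: each such load/store becomes ADD + load/store, and CPI is
    unchanged, so run time scales with instruction count.
    Returns (instruction-count ratio without the mode,
             fraction of run time saved by having the mode).
    """
    ratio = 1.0 + f             # n instructions become n * (1 + f)
    saved = 1.0 - 1.0 / ratio   # time with the mode relative to time without it
    return ratio, saved

ratio, saved = cost_without_reg_reg(0.062)
print(f"{ratio:.3f}x instructions without the mode; "
      f"the mode saves {saved:.1%} of run time")
```

Under these assumptions the 6.2% is an upper bound on the instruction-count penalty, and the run-time saving seen from the other side is slightly smaller (1 - 1/1.062).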
-
- The 6.2% figure given above has to be reduced somewhat to account for
- other optimizations. As stated, it assumes that a processor
- without register + register addressing requires an extra cycle
- to compute the address. That is not exactly the case, for a variety
- of reasons:
-
- - in some cases the second register can be replaced by a constant
- if proper analysis is done. Note that the numbers I originally
- gave already excluded register g0 (hardwired to 0). Also note that
- register allocation normally tries to replace register usage
- with a constant, especially for address generation. So the number
- would be reduced, but not significantly.
-
- - for superscalars the extra addition may come for free in terms
- of cycles. There are some (small) second-order effects, though, such
- as increased code size (more I$ misses). On the other hand, this
- extra addition calls for two more read ports and another write port,
- unless they are already there so that generic ALU operations can be
- carried out (as in Viking).
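A minimal sketch of how a compiler targeting a machine without register + register addressing might lower these accesses, including the two cheap cases mentioned above: g0 as the second register, and a second register whose value analysis has proven constant. The tiny instruction representation, the `lower` function and the scratch register name `%tmp` are hypothetical; the 13-bit signed immediate range is assumed to match SPARC's simm13 field.

```python
from dataclasses import dataclass
from typing import Optional

SIMM13_MIN, SIMM13_MAX = -4096, 4095  # range of a 13-bit signed immediate

@dataclass
class Mem:
    """Hypothetical load/store: op [base + index], or [base + imm] if index is None.
    The data register is omitted for brevity."""
    op: str
    base: str
    index: Optional[str] = None
    imm: int = 0

@dataclass
class Add:
    """Hypothetical ADD: dst = a + b (register operands only)."""
    dst: str
    a: str
    b: str

def lower(code, known_consts, scratch="%tmp"):
    """Rewrite register + register accesses for a machine lacking that mode.

    known_consts maps registers to values proven constant by analysis.
    """
    out = []
    for ins in code:
        if not isinstance(ins, Mem) or ins.index is None:
            out.append(ins)  # already register + immediate (or not a memory op)
        elif ins.index == "%g0":
            # %g0 always reads as zero: [base + %g0] is just [base + 0]
            out.append(Mem(ins.op, ins.base, None, 0))
        elif ins.index in known_consts and SIMM13_MIN <= known_consts[ins.index] <= SIMM13_MAX:
            # Constant index that fits the immediate field: fold it, no ADD needed
            out.append(Mem(ins.op, ins.base, None, known_consts[ins.index]))
        else:
            # General case: one extra ADD to form the address
            out.append(Add(scratch, ins.base, ins.index))
            out.append(Mem(ins.op, scratch, None, 0))
    return out

code = [Mem("ld", "%o0", "%o1"), Mem("ld", "%o0", "%g0"),
        Mem("st", "%o0", "%o2"), Mem("ld", "%sp", None, 64)]
lowered = lower(code, known_consts={"%o2": 8})
print(len(code), "->", len(lowered))  # 4 -> 5: only the first access needs an ADD
```

On a superscalar the ADD emitted in the general case may issue without costing a cycle, but it still adds to code size.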
-
- In any case, the 6.2% number is significant, and in our case it justified
- the definition of register + register addressing in the instruction set.
- Some people asked about the impact on floating-point programs; here are
- the numbers for SPECfp92:
-
- Percentage of loads/stores that use register + register addressing:
-
- Benchmark   Loads   Stores
- ---------   -----   ------
- ora         98.1%   0.6%
- spice       73%     29%
- su2cor      29.6%   6.7%
- hydro       22.5%   18.3%
- tomcatv     22.2%   0.03%
- fpppp       10.3%   0.002%
- doduc       9.2%    4.9%
- wave5       7.1%    0.9%
- mdljdp2     6.4%    0.3%
- alvinn      5.7%    0.2%
- ear         4.0%    0.01%
- mdljsp2     2.7%    0.7%
- nasa7       0.9%    0.1%
- swm256      0.3%    0.16%
-
- As you can see, mileage varies with the application. Overall, though,
- the numbers suggest heavy use of the register + register addressing mode.
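To put "mileage varies" in numbers, the short script below summarizes the table. It only restates the figures above. Note that they are percentages of loads and stores, not of all instructions, so they cannot be turned into an instruction-count penalty without each benchmark's load/store mix, which is not given here.

```python
from statistics import mean, median

# (loads %, stores %) using register + register addressing, SPECfp92, from the table above
specfp92 = {
    "ora": (98.1, 0.6), "spice": (73.0, 29.0), "su2cor": (29.6, 6.7),
    "hydro": (22.5, 18.3), "tomcatv": (22.2, 0.03), "fpppp": (10.3, 0.002),
    "doduc": (9.2, 4.9), "wave5": (7.1, 0.9), "mdljdp2": (6.4, 0.3),
    "alvinn": (5.7, 0.2), "ear": (4.0, 0.01), "mdljsp2": (2.7, 0.7),
    "nasa7": (0.9, 0.1), "swm256": (0.3, 0.16),
}

loads = [ld for ld, _ in specfp92.values()]
stores = [st for _, st in specfp92.values()]
print(f"loads:  mean {mean(loads):.1f}%, median {median(loads):.2f}%")
print(f"stores: mean {mean(stores):.1f}%, median {median(stores):.2f}%")
```

For loads, the mean is pulled well above the median by ora and spice, which is one way of seeing how much the result depends on the application.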
-
- >Question : does the Sparc do both sorts of addressing in the same number of
- >cycles, or does it use an extra cycle for reading the extra register ?
-
- Viking handles both addressing modes identically, with no extra cycle. The
- load simply uses the two read ports that are already provided to support
- an ALU operation, so the mode did not require an additional port.
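Why no extra port is needed can be seen in the SPARC load/store encoding: the second source is either rs2 or a 13-bit signed immediate, selected by the i bit, and the address is formed by the same rs1 + operand2 addition an ALU instruction uses. Below is a minimal decoding sketch, assuming the SPARC V8 format-3 field layout (op, rd, op3, rs1, i, then rs2 or simm13). The `encode` and `effective_address` helpers and the plain list used as a register file are illustrative, not taken from any real simulator, and register windows are ignored.

```python
def sign_extend13(v: int) -> int:
    """Sign-extend a 13-bit immediate (simm13)."""
    v &= 0x1FFF
    return v - 0x2000 if v & 0x1000 else v

def encode(op3: int, rd: int, rs1: int, rs2: int = 0, simm13=None) -> int:
    """Build a format-3 memory instruction word (op = 3).

    With simm13 None the i bit is 0 and rs2 is used (register + register);
    otherwise i = 1 and the low 13 bits hold the immediate.
    """
    word = (3 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
    if simm13 is None:
        return word | rs2                         # i = 0, asi field left as 0
    return word | (1 << 13) | (simm13 & 0x1FFF)   # i = 1

def effective_address(word: int, regs: list) -> int:
    """Compute rs1 + operand2, the same addition an ALU add performs."""
    rs1 = (word >> 14) & 0x1F
    i = (word >> 13) & 1
    rs2 = word & 0x1F
    a = regs[rs1] if rs1 else 0        # %g0 (r0) always reads as zero
    if i:
        b = sign_extend13(word)        # register + immediate: one register read
    else:
        b = regs[rs2] if rs2 else 0    # register + register: second read port
    return (a + b) & 0xFFFFFFFF        # 32-bit address arithmetic

regs = [0] * 32
regs[8], regs[9] = 0x1000, 0x24        # %o0 and %o1 are r8 and r9
LD = 0b000000                          # op3 for a word load
print(hex(effective_address(encode(LD, 10, 8, rs2=9), regs)))      # ld [%o0 + %o1], %o2 -> 0x1024
print(hex(effective_address(encode(LD, 10, 8, simm13=-4), regs)))  # ld [%o0 - 4], %o2 -> 0xffc
```

Both forms feed the same two adder inputs; only the source of the second operand changes, which is why a core that already reads two registers for ALU operations can support register + register loads without another port.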
-
- >The 6.2% is a maximum; clever compiler techniques should reduce this.
-
- True, see above. Also, one could compile with the register + register
- addressing feature turned on and off to measure the difference.
- That's what Hall and O'Brien did (paper in ASPLOS-IV) to demonstrate the
- usefulness of pre-increment and pre-decrement loads/stores.
- Finally, one could claim that the register + register mode gives the
- compiler more flexibility, which could turn out to help in future
- optimizations.
-
- - Marc Tremblay.
- Sun Microsystems.
-