NetNews Usenet Archive 1992 #27

Internet Message Format | 1992-11-17 | 3.8 KB

Path: sparky!uunet!wupost!crcnis1.unl.edu!moe.ksu.ksu.edu!zaphod.mps.ohio-state.edu!cs.utexas.edu!sun-barr!news2me.EBay.Sun.COM!exodus.Eng.Sun.COM!flayout.Eng.Sun.COM!tremblay
From: tremblay@flayout.Eng.Sun.COM (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: why no register + register addressing mode in R3000 (repost)
Date: 16 Nov 1992 19:14:11 GMT
Organization: Sun Microsystems, Mt. View, Ca.
Lines: 77
Message-ID: <lgfso3INN8if@exodus.Eng.Sun.COM>
References: <lg5i5oINN1q4@exodus.Eng.Sun.COM> <endecotp.721675619@cs.man.ac.uk>
NNTP-Posting-Host: flayout

In article <endecotp.721675619@cs.man.ac.uk> endecotp@cs.man.ac.uk (PB Endecott (PhD SFurber)) writes:
>On an architecture without this mode, assuming that each of these
>operations would need a separate ADD to calculate the address, the number
>of instructions needed to do the same work would increase by 6.2%. Does a
>6.2% performance increase justify an extra register read port ?

I sure would add a port for a 6.2% increase in integer performance.
The level of integration achievable today, coupled with the presence
of many critical paths not related to the register file, makes
the cost of adding one read port negligible.

The 6.2% number given above has to be reduced somewhat to account
for some other optimizations. As stated, it assumes that a processor
without register + register addressing requires an extra cycle to
compute the address. That is not exactly the case for a variety of reasons:

- in some cases the second register can be replaced by a constant
  if proper analysis is done. Notice that the numbers I originally
  gave excluded register g0 (hardwired to 0). Also notice that
  register allocation normally tries to replace register usage with
  a constant especially for address generation. So the number would
  be reduced but not significantly.

- for superscalars the extra addition may be for free in terms of cycles.
  There are some second order effects though, such as increased
  code size (bigger I$ miss), etc. (small). On the other hand this extra
  addition suggests the use of two other read ports and another
  write port unless they are already there so that generic ALU
  operation can be accomplished (aka Viking).

In any case the 6.2% number is significant and in our case has justified
the definition of register + register addressing in the instruction set.

Some people asked about the impact on floating-point programs,
here are the numbers for SPECfp92:

Percentage of loads/stores that use register + register addressing:

Benchmark    loads     stores
---------    -----     ------
ora          98.1%     0.6%
spice        73%       29%
su2cor       29.6%     6.7%
hydro        22.5%     18.3%
tomcatv      22.2%     0.03%
fpppp        10.3%     0.002%
doduc        9.2%      4.9%
wave5        7.1%      0.9%
mdljdp2      6.4%      0.3%
alvinn       5.7%      0.2%
ear          4.0%      0.01%
mdljsp2      2.7%      0.7%
nasa7        0.9%      0.1%
swm256       0.3%      0.16%

As you can see, mileage varies according to the application.
Globally though, it does suggest a high utilization of the
register + register addressing mode.

>Question : does the Sparc do both sorts of addressing in the same number of
>cycles, or does it use an extra cycle for reading the extra register ?

Viking handles both addressing modes indiscriminately.
The load merely uses the two read ports that are already provided
to support an ALU operation, so this did not add a port.

>The 6.2% is a maximum; clever compiler techniques should reduce this.

True, see above. Also, one could compile with the register + register
addressing feature turned on/off to see the difference between the two.
That's what Hall and O'Brien did (paper in ASPLOS-IV) to prove the
usefulness of pre-increment and pre-decrement load/stores.

Finally, one could claim that the register + register mode gives
the compiler more flexibility, which could turn out to help
in future optimizations.

					- Marc Tremblay.
					Sun Microsystems.
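
Editorial aside: the arithmetic behind the 6.2% figure discussed in the post can be sketched as below. This is only a back-of-the-envelope model, not anything from the original measurements. The function name, its parameters, and the 25% "avoided" share in the usage lines are illustrative assumptions; only the 6.2% value comes from the thread. The model says that if a fraction of all executed instructions are loads/stores using register + register addressing, and each one would otherwise need a separate ADD, the instruction count grows by that fraction, reduced by whatever share of the extra ADDs the compiler can avoid (for example, by folding the second register into a constant).

```python
def instruction_increase(regreg_fraction, avoided_fraction=0.0):
    """Relative growth in instruction count without reg + reg addressing.

    regreg_fraction:  share of all executed instructions that are
                      loads/stores using register + register addressing
                      (0.062 in the thread).
    avoided_fraction: HYPOTHETICAL share of those accesses for which the
                      compiler avoids the extra ADD, e.g. by replacing the
                      second register with a constant offset.
    """
    for name, value in (("regreg_fraction", regreg_fraction),
                        ("avoided_fraction", avoided_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Each remaining reg + reg access costs exactly one extra ADD.
    return regreg_fraction * (1.0 - avoided_fraction)


# Upper bound quoted in the thread: every access needs its own ADD.
worst_case = instruction_increase(0.062)

# Made-up scenario: the compiler avoids the ADD for a quarter of them.
reduced = instruction_increase(0.062, avoided_fraction=0.25)

print(f"worst case: {worst_case:.2%}, reduced: {reduced:.2%}")
```

The model counts instructions, not cycles. On a superscalar machine the extra ADD may issue for free, as the post notes, so the cycle-level cost could be lower than the instruction-count growth computed here.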