NetNews Usenet Archive 1992 #31

Internet Message Format | 1992-12-22 | 2.6 KB

Xref: sparky comp.compilers:2064 comp.arch:11863
Newsgroups: comp.compilers,comp.arch
Path: sparky!uunet!cis.ohio-state.edu!zaphod.mps.ohio-state.edu!darwin.sura.net!gatech!news.byu.edu!eff!world!iecc!compilers-sender
From: Andre.Marien@cs.kuleuven.ac.be (Andre Marien)
Subject: Sun sparc behavior
Reply-To: Andre.Marien@cs.kuleuven.ac.be (Andre Marien)
Organization: Dept. Computerwetenschappen K.U.Leuven
Date: Tue, 22 Dec 1992 11:00:50 GMT
Approved: compilers@iecc.cambridge.ma.us
Message-ID: <92-12-095@comp.compilers>
Keywords: sparc, architecture
Sender: compilers-sender@iecc.cambridge.ma.us
Lines: 41

While trying to get some ideas for optimization, we ran into some oddities that we cannot explain. We have no solid background in computer architecture, but would like some explanation for the observed behavior.

Our test frame should have all data and instructions in the cache. (We compare a loop containing NN nops with the same loop in which N of the nops are replaced by the instructions under test.)

First, any store takes 5 cycles on a sparc 1 and 7 cycles on a sparc 2 server. Is this number related to the pipeline depth? Why do stores take at least 5 cycles, even when the surrounding code is just no-ops? Our guesses:
1) the pipeline is emptied for some reason;
2) the cache can only deliver one datum at a time, so it cannot deliver both an instruction and data.
As a load (read) takes 2 cycles on all sparcs tested, it would not seem unreasonable to us for a store to take 3 cycles. But 5??

Second, two consecutive stores add 2 cycles, whereas two consecutive loads add no cycles. This seems to imply that, despite the fixed cost, the write buffer(s) are not available in time. However, three stores in a row incur these 2 extra cycles only once! On the sparc 2 server, it looks as if the distance between two stores must be odd to avoid those 2 cycles. We suspect this has to do with the pipeline organization, but have no clue.
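The test frame described above (a loop of NN nops timed against the same loop with N nops replaced by the instruction under test) can be sketched as follows. This is a minimal Python sketch, not the authors' harness: the function names, the register choices (%o0 pointing at a cached, word-aligned scratch word, %o2 holding the iteration count) and the example timings and clock rate are all assumptions. It only generates the SPARC assembly text and does the arithmetic; assembling, running and timing the loop are left to whatever tools are at hand.

```python
# Sketch of the "replace N of NN nops" timing method.
# Assumptions (not from the post): %o0 points at a cached, word-aligned
# scratch word; %o2 holds the iteration count; timings and the clock rate
# are supplied by the caller. All function names are hypothetical.

STORE = "st %g0, [%o0]"   # store zero (%g0) to the scratch word
LOAD = "ld [%o0], %o1"    # load the scratch word into %o1


def loop_body(total_slots, test_insns, positions):
    """Return total_slots instructions: nops everywhere, except that
    test_insns[i] replaces the nop at slot positions[i]."""
    if len(test_insns) != len(positions):
        raise ValueError("need exactly one position per test instruction")
    body = ["nop"] * total_slots
    for pos, insn in zip(positions, test_insns):
        body[pos] = insn
    return body


def render_loop(body):
    """Wrap a body in a counted SPARC loop; the branch delay slot holds a nop."""
    lines = ["loop:"]
    lines += ["\t" + insn for insn in body]
    lines += ["\tsubcc %o2, 1, %o2", "\tbne loop", "\tnop"]
    return "\n".join(lines)


def store_pair(total_slots, distance):
    """Two stores whose slot indices differ by `distance`, for checking
    whether the 2 extra cycles depend on the parity of that distance."""
    if not 0 < distance < total_slots:
        raise ValueError("distance must lie in 1 .. total_slots - 1")
    return loop_body(total_slots, [STORE, STORE], [0, distance])


def cycles_per_insn(t_base, t_test, iterations, clock_hz, replaced):
    """Cycles taken by each replaced instruction, assuming every displaced
    nop cost exactly one cycle and all replaced instructions cost the same."""
    extra_cycles = (t_test - t_base) * clock_hz / iterations
    return 1 + extra_cycles / replaced


if __name__ == "__main__":
    print(render_loop(store_pair(8, 3)))
    # Made-up figures: 10**6 iterations at an assumed 20 MHz clock, with one
    # store adding 0.2 s over the all-nop loop.
    print(round(cycles_per_insn(1.0, 1.2, 10**6, 20e6, 1), 2))
```

Because each replaced nop would itself have cost one cycle, an instruction's cost is one plus the extra cycles per iteration divided by the number of replaced slots; with the made-up figures above (0.2 s extra over 10**6 iterations at an assumed 20 MHz) that comes to roughly 5 cycles. Sweeping the distance argument of store_pair is one way to probe the odd-distance effect reported on the sparc 2 server.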
Third, on the sparc 2 server, we notice that a load followed by a store takes 7 cycles, not 9, regardless of whether we store to the location we loaded from. The same is true for a store followed by a load. Putting some no-ops between the load and the store (or between the store and the load) increases the cycle count by exactly the number of no-ops inserted. How can an independent load make a store go faster??

If any kind soul can give some feedback or references, we would be happy. We know we don't need the architecture definition, but rather some documentation on the implementation. Even some clues we have obviously missed would already be very nice.

Andre' Marien
bimandre@cs.kuleuven.ac.be
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.