- Xref: sparky comp.compilers:2064 comp.arch:11863
- Newsgroups: comp.compilers,comp.arch
- Path: sparky!uunet!cis.ohio-state.edu!zaphod.mps.ohio-state.edu!darwin.sura.net!gatech!news.byu.edu!eff!world!iecc!compilers-sender
- From: Andre.Marien@cs.kuleuven.ac.be (Andre Marien)
- Subject: Sun sparc behavior
- Reply-To: Andre.Marien@cs.kuleuven.ac.be (Andre Marien)
- Organization: Dept. Computerwetenschappen K.U.Leuven
- Date: Tue, 22 Dec 1992 11:00:50 GMT
- Approved: compilers@iecc.cambridge.ma.us
- Message-ID: <92-12-095@comp.compilers>
- Keywords: sparc, architecture
- Sender: compilers-sender@iecc.cambridge.ma.us
- Lines: 41
-
- While trying to get some ideas for optimization, we ran into some oddities
- that we cannot explain. We have no solid background in architecture, but
- would like to have some explanation for the observed behavior. Our
- test frame should have all data and instructions in the cache. (We compare
- a loop of NN nops with the same loop in which N of the nops are replaced
- by the instructions under test.)
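
The comparison described above can be sketched in C as follows. This is only a minimal illustration of the arithmetic, not the actual test frame: the helper names `cycles_per_iteration` and `replacement_cost` are hypothetical, the clock rate must be supplied by whoever runs the measurement, and the calculation assumes that every nop costs exactly one cycle and that everything hits the cache.

```c
#include <assert.h>

/* Hypothetical helper: average cycles per loop iteration, derived from a
   wall-clock measurement and an assumed clock rate. */
static double cycles_per_iteration(double seconds, double clock_hz,
                                   long iterations)
{
    assert(iterations > 0);
    return seconds * clock_hz / (double)iterations;
}

/* Cost in cycles of the instructions that replaced `replaced_nops` nops.
   Assumption: each nop costs exactly one cycle, so the replaced nops
   must be added back to the measured difference. */
static double replacement_cost(double base_cycles, double test_cycles,
                               int replaced_nops)
{
    return (test_cycles - base_cycles) + (double)replaced_nops;
}
```

With made-up timings of 12.5 s for the baseline loop and 14.5 s for the test loop, over 10,000,000 iterations at an assumed 20 MHz clock, the loops cost 25 and 29 cycles per iteration, so one replaced nop comes out at 29 - 25 + 1 = 5 cycles.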
-
- First, any store takes 5 cycles on a sparc 1, and 7 cycles on a sparc 2
- server. Is this number related to the pipeline depth? Why do stores
- take at least 5 cycles, even if the surrounding code is just
- no-ops?
-
- Our guesses:
- 1) The pipeline is emptied for some reason.
- 2) The cache can deliver only one datum at a time, so it cannot deliver
- both an instruction and data.
- Since a load takes 2 cycles on all sparcs tested, a store taking 3 cycles
- would not seem unreasonable to us. But 5??
-
- Second, two consecutive stores add 2 cycles. Two consecutive loads add no
- cycles. This seems to imply that, despite the fixed cost, the write
- buffer(s) is not available in time. However, three stores one after
- another incur these 2 extra cycles only once!? On the sparc 2 server, it
- looks as if the distance between two stores should be odd to avoid those 2
- cycles. We think it may have to do with the pipeline organization, but
- have no clue.
-
- Third, on the sparc 2 server, we notice that a load + store takes 7
- cycles, not 9 (2 + 7), independent of whether we store to the location of
- the load or not. The same is true for the combination store + load.
- Putting some no-ops between the load/store or store/load increases the
- number of cycles by exactly the number of no-ops added. How can an
- independent load make a store go faster??
-
- If any kind soul can give some feedback or references, we would be happy.
- We know we don't need the architecture definition, but rather some docs on
- the implementation. Even some clues we obviously missed would be very nice.
-
- Andre' Marien
- bimandre@cs.kuleuven.ac.be
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-