- Xref: sparky comp.arch:10995 comp.benchmarks:1758
- Path: sparky!uunet!think.com!ames!sun-barr!cs.utexas.edu!news
- From: wilson@cs.utexas.edu (Paul Wilson)
- Newsgroups: comp.arch,comp.benchmarks
- Subject: Caches & GC's (was Re: Lisp performance...)
- Date: 21 Nov 1992 18:13:24 -0600
- Organization: CS Dept, University of Texas at Austin
- Lines: 93
- Distribution: world
- Message-ID: <lgtk54INNqjr@boogie.cs.utexas.edu>
- References: <1e824rINNlpu@iraul1.ira.uka.de>
- NNTP-Posting-Host: boogie.cs.utexas.edu
-
- In article <1e824rINNlpu@iraul1.ira.uka.de> wolpers@i11s10.ira.uka.de (Andreas Wolpers) writes:
- >Hello everybody,
- >
- >I'm having a minor problem on which I would welcome any comments:
- >Since I'll have some money to spend real soon now, I've been pondering
- >whether we should switch from Sun to HP. The usual benchmark results
- >suggest that a change might result in faster execution of our pet
- >program, a large "theorem prover" written in Lisp.
- >
- >From the SpecInt92 results, one should expect a performance increase
- >of about a factor of 2 when switching from a SS2 to either a SS10-30
- >or HP720. Unfortunately, both machines turned out to be just
- >30% faster than a SS2 when running our system (for which ps NEVER
- >shows a resident set under 3-4 MB, not even on an 8MB machine).
- >
- >Any explanations at hand? Did we encounter a bottleneck between
- >CPU and memory, or what?
-
- Could be. If you don't have a generational GC, your locality is
- going to be the pits. If you allocate a lot of data between
- garbage collections, you'll typically incur a cache miss and
- a writeback for every block of memory you allocate. That's
- because you can't reuse memory until you know it's garbage,
- so you're always allocating something you haven't used for
- a long time, i.e., at least since the last garbage collection.
-
- What you want is a generational garbage collector and a cache
- large enough to hold the youngest generation. This lets you
- allocate less-than-a-cache-full of data between garbage
- collections, reclaim most of the space, and reuse it at
- the next gc cycle.
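A toy simulation makes the contrast between the last two paragraphs concrete. It is a sketch under stated assumptions, not a model of any real machine: a hypothetical 8 KB fully associative, write-allocate, write-back cache with LRU replacement and 32-byte blocks, and a bump allocator that wraps back to address 0 when it reaches the end of its space (standing in for a collection that frees everything; copying of survivors is ignored). The 256 KB heap plays the non-generational case; the 4 KB nursery plays a youngest generation that fits in the cache.

```python
from collections import OrderedDict


class Cache:
    """Toy fully associative, write-allocate, write-back cache with LRU replacement.

    Simplification: real caches are often direct-mapped or set-associative,
    which adds conflict misses on top of these numbers."""

    def __init__(self, size_bytes, block_bytes):
        self.block = block_bytes
        self.capacity = size_bytes // block_bytes  # number of blocks the cache holds
        self.lines = OrderedDict()                 # block number -> dirty flag, LRU first
        self.misses = 0
        self.writebacks = 0

    def write(self, addr):
        blk = addr // self.block
        if blk in self.lines:
            self.lines.move_to_end(blk)            # hit: mark most recently used
        else:
            self.misses += 1                       # write-allocate: block must be fetched
            if len(self.lines) >= self.capacity:
                _, dirty = self.lines.popitem(last=False)  # evict least recently used
                if dirty:
                    self.writebacks += 1           # dirty victim goes back to memory
        self.lines[blk] = True                     # initializing an object dirties the block


def allocate(cache, heap_bytes, total_bytes, word=8):
    """Bump-allocate total_bytes one word at a time, wrapping at heap_bytes.

    Wrapping to address 0 stands in for a GC that frees the whole space."""
    addr = 0
    for _ in range(total_bytes // word):
        cache.write(addr)
        addr = (addr + word) % heap_bytes


CACHE, BLOCK, TOTAL = 8 * 1024, 32, 1024 * 1024
BLOCKS = TOTAL // BLOCK  # number of blocks' worth of data allocated
results = {}
for label, heap in [("256 KB heap, no generational GC", 256 * 1024),
                    ("4 KB nursery that fits in cache", 4 * 1024)]:
    c = Cache(CACHE, BLOCK)
    allocate(c, heap, TOTAL)
    results[heap] = (c.misses, c.writebacks)
    print(f"{label}: {c.misses / BLOCKS:.3f} misses and "
          f"{c.writebacks / BLOCKS:.3f} write-backs per block allocated")
```

In the first case every block allocated misses and (once the cache has filled) pushes out a dirty block; in the second, only the first pass through the nursery misses and nothing is ever written back.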
-
- The youngest generation should generally be >100KB for basic
- GC efficiency reasons (space-time tradeoffs), so you can't
- really expect to stay in a first-level cache for your heap
- allocations. You could stay in a megabyte-range second
- or third level cache.
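Some rough arithmetic shows the tension. A bigger youngest generation means minor collections happen less often, so young objects get more time to become garbage before each one (the space-time tradeoff), but once it outgrows the first-level cache it can only live in a larger, slower level. The cache sizes and allocation rate below are made-up round numbers for illustration, not measurements of any machine mentioned here.

```python
# Hypothetical parameters, chosen only for illustration.
L1_BYTES = 16 * 1024             # a small first-level cache
L2_BYTES = 1024 * 1024           # a megabyte-range second-level cache
ALLOC_RATE = 10 * 1024 * 1024    # bytes allocated per second


def nursery_tradeoff(nursery_bytes):
    """Return (minor GCs per second, ms between GCs, smallest level that holds the nursery)."""
    gcs_per_sec = ALLOC_RATE / nursery_bytes
    ms_between = 1000 / gcs_per_sec
    if nursery_bytes <= L1_BYTES:
        level = "L1"
    elif nursery_bytes <= L2_BYTES:
        level = "L2"
    else:
        level = "main memory"
    return gcs_per_sec, ms_between, level


for kb in (16, 128, 512, 4096):
    gcs, ms, level = nursery_tradeoff(kb * 1024)
    print(f"{kb:5d} KB nursery: {gcs:6.1f} minor GCs/s, "
          f"{ms:6.1f} ms between them, fits in {level}")
```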
-
- > What performance should I expect from
- >a SS10-41 or SS10-52 (which have a larger cache, but still not
- >large enough to hold the Lisp's resident set. And if I'm not
- >mistaken, the large cache results in a longer time spent for
- >non-cache memory accesses (6 cycles instead of 3)). What
- >performance should I expect from a HP735?
-
- If cache misses on allocation are your problem, you're limited
- by the rate of allocation and the cache miss service time,
- plus something for write-backs. (You'll typically incur a
- write-back of a dirty block for each block of heap data you
- allocate, since the cache will be mostly full of relatively
- recently allocated, and hence already written, garbage. This can
- overload your write buffers in a hurry for some programs.)
-
- You also need to add something extra if it's a direct-mapped
- cache: gc'd systems are especially sensitive to DM cache
- conflicts. (Actually, it's kinda weird: DM works BETTER
- if the youngest generation almost fits in the cache, but
- not quite.)
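A toy direct-mapped cache illustrates the conflict sensitivity. The allocator sweeps through every cache line the nursery maps to on each cycle, so a frequently used object whose address maps onto one of those lines is evicted over and over. All sizes and addresses are hypothetical, and touching the old object after every allocation is a deliberately extreme access pattern; the sketch shows only the conflict problem and does not try to reproduce the "almost fits" effect.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each memory block can live in exactly one line."""

    def __init__(self, size_bytes, block_bytes):
        self.block = block_bytes
        self.nlines = size_bytes // block_bytes
        self.tags = [None] * self.nlines   # which block currently occupies each line
        self.misses = 0

    def access(self, addr):
        blk = addr // self.block
        line = blk % self.nlines           # the only line this block may use
        if self.tags[line] != blk:
            self.misses += 1
            self.tags[line] = blk          # evict whatever was there


def run(old_obj_addr, cache_bytes=8 * 1024, block=32,
        nursery_bytes=4 * 1024, total_bytes=1024 * 1024, word=8):
    """Bump-allocate through a nursery at address 0, touching one long-lived
    (hypothetical) old-generation object after every allocation."""
    cache = DirectMappedCache(cache_bytes, block)
    addr = 0
    for _ in range(total_bytes // word):
        cache.access(addr)             # allocation write into the nursery
        cache.access(old_obj_addr)     # use of the old object
        addr = (addr + word) % nursery_bytes
    return cache.misses


no_conflict = run(old_obj_addr=64 * 1024 + 4096)  # maps to a line the nursery never uses
conflict = run(old_obj_addr=64 * 1024 + 64)       # same line as nursery bytes 64..95
print(no_conflict, conflict)
```

The only difference between the two runs is where the old object happens to sit in memory, which is exactly what a programmer usually can't control.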
-
- So if you know the rate of allocation in your application,
- you should be able to figure a ballpark cache miss cost
- without much trouble.
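The ballpark figure amounts to: blocks allocated per second, times the cost of a miss plus a write-back per block, divided by the clock rate. Here is a minimal sketch, assuming (as above) about one miss and one dirty write-back per block allocated and no overlap between stalls and useful work. The function name and all the numbers in the example are hypothetical.

```python
def allocation_stall_fraction(alloc_bytes_per_sec, block_bytes,
                              miss_cycles, writeback_cycles, clock_hz):
    """Ballpark fraction of run time spent stalled on allocation-related misses.

    Assumes about one cache miss plus one dirty write-back per block of heap
    allocated (the no-generational-GC case) and no overlap of stalls with
    useful work; write buffers that hide write-backs would shrink this."""
    blocks_per_sec = alloc_bytes_per_sec / block_bytes
    stall_cycles_per_sec = blocks_per_sec * (miss_cycles + writeback_cycles)
    return stall_cycles_per_sec / clock_hz


# Hypothetical figures: 20 MB/s allocated, 32-byte blocks,
# 20-cycle miss, 10-cycle write-back, 50 MHz clock.
print(allocation_stall_fraction(20e6, 32, 20, 10, 50e6))   # 0.375
```

Under those made-up numbers, more than a third of the run would be spent waiting on memory just because of allocation, before counting any misses on ordinary reads of live data.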
-
- (For more on this, see Wilson, Lam, and Moher, "Caching Considerations
- for Generational Garbage Collection," ACM Lisp & Functional Programming
- '92.)
-
- >I can buy enough memory so that disk speed is no criterion.
- >Should I wait for machines with 8MB of cache? Should I shoot myself?
- >Should I give YOU the money? :-)
-
- You might want to look for a better Lisp system instead, with
- a generational GC. Higher cache-to-memory bandwidth could be a
- big win, or even just a bigger block size. (It depends on whether
- you're running up against bandwidth problems or latency problems,
- but 32- or 64-byte blocks are probably considerably better than 16-byte
- ones. Prefetching would probably work even better than large block sizes,
- up to the point you're bandwidth-limited.)
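For a sequential allocation sweep, the effect of block size and prefetching on demand misses can be counted directly. This sketch uses tagged next-block prefetching (when a block is first touched, the next one is requested), which is just one simple scheme picked for illustration; the post doesn't name a particular method. Timing is ignored, so it assumes every prefetch arrives before it is needed, i.e. that you are not bandwidth-limited.

```python
def demand_misses(total_bytes, block_bytes, prefetch=False, word=8):
    """Count demand misses for a sequential allocation sweep through fresh memory.

    With prefetch=True, a tagged next-block prefetch is modelled: the first
    touch of any block (by a miss or a prefetched hit) requests the next block.
    Prefetches are assumed to complete instantly (no bandwidth limit)."""
    present = set()   # blocks already in the cache (capacity ignored)
    touched = set()   # blocks the program has referenced at least once
    misses = 0
    for addr in range(0, total_bytes, word):
        blk = addr // block_bytes
        if blk not in present:
            misses += 1
            present.add(blk)
        if prefetch and blk not in touched:
            present.add(blk + 1)   # fetch the next block ahead of use
        touched.add(blk)
    return misses


for b in (16, 32, 64):
    print(f"{b}-byte blocks: {demand_misses(4096, b)} demand misses per 4 KB allocated")
print(f"32-byte blocks + prefetch: {demand_misses(4096, 32, prefetch=True)}")
```

Larger blocks cut the number of misses in proportion to block size, and prefetching hides almost all of them, but in every case the same number of bytes still has to cross the bus, which is why bandwidth becomes the limit.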
-
- If you DO have a generational GC, it's important to have high
- bandwidth to any caches that won't hold the youngest generation,
- and it's good to have a great big secondary cache that will.
-
- And yes, you definitely should send me the money, if there's any
- left over after buying a machine with a MB or so of cache.
-
- -- Paul
-
- --
- | Paul R. Wilson wilson@cs.utexas.edu |
- | U. of Texas Computer Sciences Dept. voice: (512) 471-9555 |
- | Taylor Hall 2.124, Austin, TX 78712-1188 fax: (512) 471-8885 |
- | "Inertia makes the world go 'round." |
-