- Path: sparky!uunet!stanford.edu!agate!darkstar.UCSC.EDU!golding
- From: lbarroso@pollux.usc.edu (Luiz Barroso)
- Newsgroups: comp.doc.techreports
- Subject: USC Computer Eng. TRs (MPs, Caches, Performance Eval.)
- Date: 13 Nov 1992 11:37:33 -0800
- Organization: University of Southern California, Los Angeles, CA
- Lines: 147
- Approved: compdoc-techreports@ftp.cse.ucsc.edu
- Message-ID: <1e60koINN9qj@darkstar.UCSC.EDU>
- NNTP-Posting-Host: oak.ucsc.edu
- Keywords: technical reports, cache coherence, multiprocessing, simulations
- Originator: golding@oak
-
-
- The following is a list of some of the most recent technical reports
- issued by Dr. Michel Dubois' research group, which can be obtained
- through anonymous FTP.
-
- FTP site: 'usc.edu'
- Location: pub/CENG
- Format: PostScript/Compressed
-
- For further information about the reports in this list, contact
- Luiz Barroso (barroso@paris.usc.edu). For information about other
- USC CENG tech. reports, please contact Mary Zittercob
- (zitterco@pollux.usc.edu).
-
- =========================================================================
-
- Title: A Snooping Cache Coherence Protocol for a Ring Connected
- Multiprocessor
- Authors: Luiz A. Barroso and Michel Dubois
- Technical Report No. CENG-91-03
-
- Abstract: The Express Ring is a new architecture under investigation at
- the University of Southern California. Its main goal is to demonstrate
- that a slotted unidirectional ring with very fast point-to-point
- interconnections can be at least ten times faster than a shared bus, using
- the same technology, and may be the topology of choice for future shared-
- memory multiprocessors. In this paper we introduce the Express Ring
- architecture and present a snooping cache coherence protocol for this
- machine. This protocol shows how consistency of shared memory accesses can
- be efficiently maintained in a ring-connected multiprocessor. We analyze
- the proposed protocol and compare it to more common alternatives for
- point-to-point connected machines, such as the SCI cache coherence
- protocol and directory-based protocols.
- =========================================================================
-
- Title: Cache Coherence on a Slotted Ring
- Authors: Luiz A. Barroso and Michel Dubois
- Updated version of CENG-91-03; appeared in ICPP'91
- =========================================================================
-
- Title: Delayed Consistency and Its Effects on the Miss Rate of Parallel
- Programs
- Authors: Michel Dubois, Jin-Chin Wang, Luiz A. Barroso, Kangwoo Lee and
- Yang-Syau Chen
- Technical Report No. CENG 92-11
-
- Abstract: In cache-based multiprocessors, a protocol must maintain
- coherence among replicated copies of shared writable data. In delayed
- consistency protocols, the effects of outgoing and incoming invalidations
- or updates are delayed. Delayed coherence can reduce processor blocking
- time as well as the effects of false sharing. In this paper, we introduce
- several implementations of delayed consistency for cache-based systems in
- the framework of a weakly-ordered consistency model. A performance
- comparison of the delayed protocols with the corresponding On-the-Fly
- (non-delayed) consistency protocol is made, through execution-driven
- simulations of four parallel algorithms. The results show that, for
- parallel programs in which false sharing is a problem, significant
- reductions in the data miss rate of parallel programs can be obtained with
- just a small increase in the cost and complexity of the cache system.
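
As a rough illustration of why delaying incoming invalidations can reduce false-sharing misses, here is a minimal Python sketch. It is not one of the protocols from the report: the single-block, two-processor cache model, the `run` function, its `delayed` flag, and the access pattern are all assumptions made for this example.

```python
def run(accesses, sync_every, delayed):
    """Count misses on one shared block in a toy two-processor system.

    accesses:   list of (cpu, kind) pairs, kind is "r" or "w". Each CPU is
                assumed to touch a different word of the SAME block, so all
                sharing here is false sharing.
    sync_every: a synchronization point occurs after this many accesses.
    delayed:    if True, incoming invalidations are buffered until the next
                synchronization point instead of being applied at once.
    """
    valid = {0: False, 1: False}     # does each cache hold the block?
    pending = {0: False, 1: False}   # buffered, not yet applied invalidation
    misses = 0
    for i, (cpu, kind) in enumerate(accesses, start=1):
        if not valid[cpu]:
            misses += 1              # fetch the block
            valid[cpu] = True
        if kind == "w":
            other = 1 - cpu
            if delayed:
                if valid[other]:
                    pending[other] = True   # remember it, apply later
            else:
                valid[other] = False        # on-the-fly invalidation
        if i % sync_every == 0:
            # At a synchronization point, buffered invalidations take effect.
            for c in (0, 1):
                if pending[c]:
                    valid[c] = False
                    pending[c] = False
    return misses


# Two CPUs alternately write different words of the same block.
pattern = [(0, "w"), (1, "w")] * 4
on_the_fly = run(pattern, sync_every=8, delayed=False)
delayed = run(pattern, sync_every=8, delayed=True)
```

In this toy pattern the on-the-fly variant misses on every access, while the delayed variant misses only on the two initial fetches. Real protocols must also handle true sharing, which this sketch ignores.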
- =========================================================================
-
- Title: Cache Inclusion and Processor Sampling in Multiprocessor Simulations
- Authors: Jacqueline Chame and Michel Dubois
- Technical Report No. CENG 92-13
-
- Abstract: The evaluation of cache-based systems demands careful
- simulations of entire benchmarks. Simulation efficiency is essential to
- realistic evaluations. For systems with large caches and a large number of
- processors, simulation is often too slow to be practical. In particular,
- the optimized design of a cache for a multiprocessor is very complex with
- current techniques.
- This paper addresses these problems. First we introduce necessary and
- sufficient conditions for cache inclusion in uniprocessors and in
- multiprocessors with and without invalidations. Second, under cache
- inclusion, we show that an accurate trace for a given processor or for a
- cluster of processors can be extracted from a multiprocessor trace. With
- this methodology, possible cache architectures for a processor or for a
- cluster of processors are evaluated independently of the rest of the
- system, resulting in a drastic reduction of the trace length and
- simulation complexity. Moreover, many important system-wide metrics can
- be estimated with good accuracy by extracting the traces of a set of
- randomly selected processors, an approach we call processor sampling. We
- demonstrate the accuracy and efficiency of these techniques by applying
- them to three 64-processor traces.
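
The trace extraction and processor sampling idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' methodology: the trace format `(cpu, kind, addr)`, the direct-mapped cache model, and the names `extract_trace`, `miss_rate`, and `sampled_miss_rate` are all hypothetical, and the cache inclusion conditions from the report are not modeled.

```python
import random


def extract_trace(trace, cpus):
    """Keep references issued by `cpus`; keep remote writes as "inv" records
    so the sampled caches still observe invalidations. Remote reads are dropped."""
    out = []
    for cpu, kind, addr in trace:
        if cpu in cpus:
            out.append((cpu, kind, addr))
        elif kind == "w":
            out.append((cpu, "inv", addr))
    return out


def miss_rate(trace, cpus, num_lines=64, line_size=16):
    """Simulate one direct-mapped cache per CPU in `cpus`; return misses/refs."""
    caches = {c: {} for c in cpus}   # cpu -> {line index: tag}
    refs = misses = 0
    for cpu, kind, addr in trace:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        if kind in ("w", "inv"):
            # A write invalidates matching copies in all other simulated caches.
            for other, cache in caches.items():
                if other != cpu and cache.get(index) == tag:
                    del cache[index]
        if kind != "inv" and cpu in caches:
            refs += 1
            if caches[cpu].get(index) != tag:
                misses += 1
                caches[cpu][index] = tag
    return misses / refs if refs else 0.0


def sampled_miss_rate(trace, num_cpus, sample_size, seed=0):
    """Estimate the system-wide miss rate from randomly selected processors."""
    rng = random.Random(seed)
    sample = set(rng.sample(range(num_cpus), sample_size))
    return miss_rate(extract_trace(trace, sample), sample)
```

In this model, the extracted trace gives the same miss rate for the sampled processors as the full trace does, because only remote reads are discarded and they do not affect the sampled caches.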
- =========================================================================
-
- Title: Improving the Performance of Data Caches in Systems with Large
- Miss Latencies
- Authors: Koray Oner and Michel Dubois
- Technical Report No. CENG 92-14
-
- Abstract: With current and projected processor technologies, memory
- accesses are quickly becoming a major bottleneck of modern computing
- systems. Even with a good cache, the miss penalty can be so high that the
- processor works at greatly reduced efficiency. Whereas stores can be
- buffered in a store buffer to hide store miss penalties, loads cannot be
- dealt with so easily because the processor needs the data returned by the
- load.
- In this paper we introduce a simple processor/cache architecture with
- non-blocking loads. We then report results of trace-driven simulations
- of several FORTRAN DO-Loops. We first show that the architecture is
- ineffective unless loads can be hoisted away from the instructions that
- need the returned value. We then apply load hoisting to the loops and
- show the possible performance improvements for systems with very
- large load miss latencies.
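
A back-of-the-envelope model shows why non-blocking loads help only when the load is hoisted away from its use. The Python sketch below is an assumption-laden illustration, not the simulator used in the report: it assumes single-cycle instructions, one load per loop body that always misses, and a hypothetical function name `loop_cycles`.

```python
def loop_cycles(iterations, body_len, hoist_distance, miss_latency):
    """Estimate cycles for a loop on a processor with non-blocking loads.

    body_len:       single-cycle instructions per iteration, including the load.
    hoist_distance: independent instructions executed between the load and
                    the first instruction that uses the loaded value.
    miss_latency:   cycles the consumer would wait if it directly followed
                    the load.
    """
    # The consumer stalls only for the part of the latency that is not
    # covered by the independent work placed between the load and its use.
    stall = max(0, miss_latency - hoist_distance)
    return iterations * (body_len + stall)


no_hoist = loop_cycles(100, body_len=10, hoist_distance=0, miss_latency=50)
hoisted = loop_cycles(100, body_len=10, hoist_distance=8, miss_latency=50)
```

With `hoist_distance=0` the processor stalls for the full latency, just as it would with a blocking load. A hoist distance larger than the loop body corresponds to hoisting across iterations.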
- =========================================================================
-
- Title: The Performance of Cache-Coherent Ring-based Multiprocessors
- Authors: Luiz Andre Barroso and Michel Dubois
- Technical Report No. CENG-92-19
-
- Abstract: Advances in circuit and integration technology are
- continuously boosting the speed of microprocessors. One of the main
- challenges presented by such developments is the effective use of
- powerful microprocessors in shared-memory multiprocessor configurations.
- We believe that the interconnection problem is not solved even for
- small-scale shared-memory multiprocessors, since shared buses are unlikely
- to keep up with the memory bandwidth requirements of new microprocessors.
- In this paper we extensively evaluate the performance of the slotted ring
- interconnection as a replacement for buses in small- to medium-scale
- shared-memory systems and for processor clusters in hierarchical
- massively parallel systems, using a hybrid methodology of analytical
- models and trace-driven simulations. Snooping and directory-based
- coherence protocols for the ring are compared in the context of
- multitasking.
- =========================================================================
-
- Title: The Verification of Cache Coherence Protocols
- Authors: Fong Pong and Michel Dubois
- Technical Report No. CENG-92-20
-
- Abstract: In this paper we introduce a verification technique for cache
- coherence protocols at the behavior level. Protocols are specified by a
- Finite State Machine (FSM) model. The global state space is the Cartesian
- product of an arbitrary number of individual cache state spaces and is
- symbolically expanded. A global FSM characterizing the protocol behavior
- is built and protocol verification becomes equivalent to finding whether
- or not the global FSM may enter erroneous states. State expansion takes
- only a few steps, in contrast to current approaches. The procedure is
- applied to the verification of five existing protocols.
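
To make the idea of searching a global FSM for erroneous states concrete, here is a minimal Python sketch. It uses plain explicit-state breadth-first enumeration for a fixed number of caches, which is a simpler technique than the symbolic expansion for an arbitrary number of caches described in the report. The toy MSI protocol, the error condition, and the names `successors`, `is_erroneous`, and `verify` are assumptions made for illustration.

```python
from collections import deque


def successors(state):
    """Yield global states reachable in one step of a toy MSI protocol.

    state: tuple holding "I", "S", or "M" for each cache.
    """
    n = len(state)
    for i in range(n):
        # Cache i reads: a Modified copy elsewhere is downgraded to Shared.
        if state[i] == "I":
            yield tuple("S" if j == i or s == "M" else s
                        for j, s in enumerate(state))
        # Cache i writes: every other copy is invalidated.
        if state[i] != "M":
            yield tuple("M" if j == i else "I" for j in range(n))
        # Cache i replaces (evicts) its copy.
        if state[i] != "I":
            yield tuple("I" if j == i else s for j, s in enumerate(state))


def is_erroneous(state):
    """A Modified copy must be the only valid copy."""
    return state.count("M") > 1 or ("M" in state and "S" in state)


def verify(n, step=successors):
    """Expand the global state space breadth-first from the all-Invalid state.

    Returns (number of reachable states, list of erroneous states found).
    """
    start = ("I",) * n
    seen, queue, bad = {start}, deque([start]), []
    while queue:
        state = queue.popleft()
        if is_erroneous(state):
            bad.append(state)
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen), bad


reachable, errors = verify(3)
```

For three caches this toy model reaches 11 global states and none is erroneous. Passing a different `step` function, for example one whose writes fail to invalidate other copies, lets the same search expose a protocol bug.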
- =========================================================================
-
- ===========================================================================
- Co-moderator: Richard Golding, Computer & Information Sciences, UC Santa Cruz
- compdoc-techreports-request@ftp.cse.ucsc.edu
-
-
-