NetNews Usenet Archive 1992 #27

Internet Message Format | 1992-11-15 | 8.1 KB

Path: sparky!uunet!stanford.edu!agate!darkstar.UCSC.EDU!golding
From: lbarroso@pollux.usc.edu (Luiz Barroso)
Newsgroups: comp.doc.techreports
Subject: USC Computer Eng. TRs (MPs, Caches, Performance Eval.)
Date: 13 Nov 1992 11:37:33 -0800
Organization: University of Southern California, Los Angeles, CA
Lines: 147
Approved: compdoc-techreports@ftp.cse.ucsc.edu
Message-ID: <1e60koINN9qj@darkstar.UCSC.EDU>
NNTP-Posting-Host: oak.ucsc.edu
Keywords: technical reports, cache coherence, multiprocessing, simulations
Originator: golding@oak

The following is a list of some of the most recent technical reports
issued by Dr. Michel Dubois' research group which can be obtained
through anonymous ftp.

    FTP site: 'usc.edu'
    Location: pub/CENG
    Format:   PostScript/Compressed

For further information about the reports in this list contact
Luiz Barroso (barroso@paris.usc.edu). For information about other
USC CENG tech. reports please contact Mary Zittercob
(zitterco@pollux.usc.edu).

=========================================================================
Title: A Snooping Cache Coherence Protocol for a Ring Connected
       Multiprocessor
Authors: Luiz A. Barroso and Michel Dubois
Technical Report No. CENG-91-03

Abstract:
The Express Ring is a new architecture under investigation at the
University of Southern California. Its main goal is to demonstrate that
a slotted unidirectional ring with very fast point-to-point
interconnections can be at least ten times faster than a shared bus,
using the same technology, and may be the topology of choice for future
shared-memory multiprocessors. In this paper we introduce the Express
Ring architecture and present a snooping cache coherence protocol for
this machine. This protocol shows how consistency of shared memory
accesses can be efficiently maintained in a ring-connected
multiprocessor. We analyze the proposed protocol and compare it to
other more usual alternatives for point-to-point connected machines,
such as the SCI cache coherence protocol and directory based protocols.

=========================================================================
Title: Cache Coherence on a Slotted Ring
Authors: Luiz A. Barroso and Michel Dubois

Updated version of CENG-91-03; appeared in ICPP'91

=========================================================================
Title: Delayed Consistency and Its Effects on the Miss Rate of Parallel
       Programs
Authors: Michel Dubois, Jin-Chin Wang, Luiz A. Barroso, Kangwoo Lee and
         Yang-Syau Chen
Technical Report No. CENG 92-11

Abstract:
In cache based multiprocessors a protocol must maintain coherence among
replicated copies of shared writable data. In delayed consistency
protocols the effects of out-going and in-coming invalidations or
updates are delayed. Delayed coherence can reduce processor blocking
time as well as the effects of false sharing. In this paper, we
introduce several implementations of delayed consistency for
cache-based systems in the framework of a weakly-ordered consistency
model. A performance comparison of the delayed protocols with the
corresponding On-the-Fly (non-delayed) consistency protocol is made,
through execution-driven simulations of four parallel algorithms. The
results show that, for parallel programs in which false sharing is a
problem, significant reductions in the data miss rate of parallel
programs can be obtained with just a small increase in the cost and
complexity of the cache system.

=========================================================================
Title: Cache Inclusion and Processor Sampling in Multiprocessor
       Simulations
Authors: Jacqueline Chame and Michel Dubois
Technical Report No. CENG 92-13

Abstract:
The evaluation of cache-based systems demands careful simulations of
entire benchmarks. Simulation efficiency is essential to realistic
evaluations. For systems with large caches and a large number of
processors, simulation is often too slow to be practical. In
particular, the optimized design of a cache for a multiprocessor is
very complex with current techniques. This paper addresses these
problems. First we introduce necessary and sufficient conditions for
cache inclusion in uniprocessors and in multiprocessors with and
without invalidations. Second, under cache inclusion, we show that an
accurate trace for a given processor or for a cluster of processors can
be extracted from a multiprocessor trace. With this methodology,
possible cache architectures for a processor or for a cluster of
processors are evaluated independently of the rest of the system,
resulting in a drastic reduction of the trace length and simulation
complexity. Moreover, many important system-wide metrics can be
estimated with good accuracy by extracting the traces of a set of
randomly selected processors, an approach we call processor sampling.
We demonstrate the accuracy and efficiency of these techniques by
applying them to three 64-processor traces.

=========================================================================
Title: Improving the Performance of Data Caches in Systems with Large
       Miss Latencies
Authors: Koray Oner and Michel Dubois
Technical Report No. CENG 92-14

Abstract:
With current and projected processor technologies, memory accesses are
quickly becoming a major bottleneck of modern computing systems. Even
with a good cache, the miss penalty can be so high that the processor
works at greatly reduced efficiency. Whereas stores can be buffered in
a store buffer to hide store miss penalties, loads cannot be dealt with
so easily because the processor needs the data returned by the load. In
this paper we introduce a simple processor/cache architecture with
non-blocking loads. We then report results of trace-driven simulations
of several FORTRAN DO-Loops. We first show that the architecture is
ineffective unless loads can be hoisted away from the instructions that
need the returned value. We then apply load hoisting to the loops and
show the possible performance improvements for systems with very large
load miss latencies.

=========================================================================
Title: The Performance of Cache-Coherent Ring-based Multiprocessors
Authors: Luiz Andre Barroso and Michel Dubois
Technical Report No. CENG-92-19

Abstract:
Advances in circuit and integration technology are continuously
boosting the speed of microprocessors. One of the main challenges
presented by such developments is the effective use of powerful
microprocessors in shared memory multiprocessor configurations. We
believe that the interconnection problem is not solved even for small
scale shared memory multiprocessors, since shared buses are unlikely to
keep up with the memory bandwidth requirements of new microprocessors.
In this paper we extensively evaluate the performance of the slotted
ring interconnection as a replacement for buses in small to medium
scale shared memory systems and for processor clusters in hierarchical
massively parallel systems, using a hybrid methodology of analytical
models and trace-driven simulations. Snooping and directory-based
coherence protocols for the ring are compared in the context of
multitasking.

=========================================================================
Title: The Verification of Cache Coherence Protocols
Authors: Fong Pong and Michel Dubois
Technical Report No. CENG-92-20

Abstract:
In this paper we introduce a verification technique for cache coherence
protocols at the behavior level. Protocols are specified by a Finite
State Machine (FSM) model. The global state space is the Cartesian
product of an arbitrary number of individual cache state spaces and is
symbolically expanded. A global FSM characterizing the protocol
behavior is built and protocol verification becomes equivalent to
finding whether or not the global FSM may enter erroneous states. State
expansion only takes a few steps, contrary to current approaches. The
verification procedure is applied to the verification of five existing
protocols.

=========================================================================

===========================================================================
Co-moderator: Richard Golding, Computer & Information Sciences, UC Santa Cruz
              compdoc-techreports-request@ftp.cse.ucsc.edu