NetNews Usenet Archive 1992 #31

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #31 / NN_1992_31.iso / spool / comp / sys / isis / 354 < prev next >

Wrap

Text File | 1993-01-03 | 11.6 KB | 225 lines

Newsgroups: comp.sys.isis Path: sparky!uunet!wupost!micro-heart-of-gold.mit.edu!uw-beaver!cornell!ken From: ken@cs.cornell.edu (Ken Birman) Subject: Re: More about Causality Message-ID: <1993Jan3.161232.14265@cs.cornell.edu> Organization: Cornell Univ. CS Dept, Ithaca NY 14853 References: <1992Dec27.124442.18404@dec8.ncku.edu.tw> Date: Sun, 3 Jan 1993 16:12:32 GMT Lines: 214 Warning: this discussion is rated PG-13... (for non-US readers, this is a reference to the US film rating system... "mature readers only") What follows is a highly technical discussion of Isis internals. Many Isis uses employ the "news" system and for them, all of this is automatic and not an issue. Even those users who do use the toolkit directly don't "need" to know about this very detailed layer of the system to make effective use of the system... In article <1992Dec27.124442.18404@dec8.ncku.edu.tw> jsw@eembox.ncku.edu.tw (Master Jyh-Shi Wu (82.9)) writes: > 1) As I know, isis v2.1 multicast ordering is arranged in the unit of UNIX > processes (i.e. heavyweight), Is lightweight mulitcast ordering (i.e. > in the unit of "tasks") implemented in v2.2.7 ? This question is one we often get from people with a SmallTalk or Lisp background, where a "task" is a very costly thing and where multitasking is used the way that UNIX uses processes. In Isis, we assume that tasks are very cheap and very common. For this reason, there is little benefit to ordering at the level of tasks and we don't plan to introduce task-level ordering at any point in the future. This can be awkward because on languages like Smalltalk and Lisp, it makes the MSG_ENQUEUE/msg_rcv mechanism much more cost-effective than the normal isis_entry scheme for receiving messages -- and, as you probably realize, msg_rcv can "lose" ordering that your application normally would benefit from having. In C++ or C or other languages where tasks are cheap, one logical activity can have many tasks associated with it, and it makes more sense to order at the process level and assume that since processes can share memory under UNIX, a process is the "level of encapsulation" at which ordering makes sense. For the future, we do plan to move to a port-oriented ordering model, though. In this, each process will own one or more ports and ordering will be done at the level of ports, not processes (actually, causal domains, with each port belonging to exactly one domain). When we do this in the "Horus" system, you could have a lightweight form of ordering within a single address space. The Horus papers are included in the Isis publications list. Email to reen@cs.cornell.edu for a copy. I'll ask her to post a recent copy to this group. Look for papers with Robbert van Renesse as a co-author: he is the main architect of Horus (along with me, Brad Glade, Mike Reiter, Robert Cooper and Aleta Ricciardi). > 2) There are still little documents in bypass modes of isis [a|g|c]bcast( > "B",...) . It's very glad to know that causal abcast is implemented > in bypass mode. But, what's the difference in the bypass mode of > abcast, cbcast and gbcast? > what's the essence of "bypass mode" (more than speedup in delivery, > what I concerned is ordering) I think you may have missed the ACM TOCS paper -- August 1991. This is the main reference and the best place to get answers to your technical questions. Briefly: there is a bypass abcast and cbcast, but gbcast is still done via "protos". The essense of the protocol is covered in the paper. The concept is discussed in the current version of the manual (V3.0.5 or 6 or 7) in a chapter discussing the overall architecture of Isis. For new readers, "bypass" communication occurs when Isis can send messages directly to their destinations, rather than indirectly through the "protos" server. This gives a 10-fold performance improvement for many applications. To get bypass communication, you need to know that the sender and destinations are related in one of a small set of ways, e.g. both member of the same group, one is member of a group and the other has used pg_client on that group, etc. Isis will use bypass communication if it can, and if not sends the message down to the protos server, via RPC, which delivers the message up to the right destinations, also via RPC. The protos server uses a decentralized protocol discussed in an ACM TOCS paper in 1987. The direct communication uses this other, more recent, set of protocols. The abcast protocols are sort of incompatible: if you use abcast in bypass mode, it is totally ordered relative to other bypass-mode abcasts, but not to other non-bypass abcasts, and vice versa. Cbcast, however, is ordered no matter how you send it, so this is only a concern for abcast. You can force abcast to go via a non-bypass route using "X" as an option in the long-form. > if the bypass mode of abcast is done by cbcasts, then, is "bypassed > abcast" can show causality when used in conjugated with cbcasts? Yes, if all the abcasts are done in bypass mode. No, if there is a mix of bypass and non-bypass abcasts. > If abcast & gbcast are causal within a group, what's the difference > in abcast and "bypass abcast" when used within a group?, and bypass > cbcast & ordinary cbcast, gbcast...? I think I covered this above -- hope so, at least. The only ordering issue is that bypass and non-bypass abcasts are unordered w.r.t. each other. All other Isis ordering properties apply for abcast, cbcast and gbcast regardless of whether things go via bypass mode or not. We would prefer not to have two flavors of abcast, and in HORUS this will go away and we will only have the bypass flavor. For now, though, we made this compromise to maximize performance. > I think, when specifying "Bypass" mode in [a|c|g]bcast(), isis would > perform bypass mode if situations allows that ( example: multicast > within a group), if don't specify bypass mode, isis would "sure to" > use the old protocol. Is there any possibility that isis "automatically" > turns to bypass mode when user don't specify that? Isis automatically turns to bypass mode if it can find any possible way to do it, but it won't automatically join you to a group or make you a client of a group. Using "B" and "X" you can be sure that you will at least get warned if a surprise occurs but the actual algorithm is based on looking at the destination of the message. > > 3) A causality case: > consider 3 proesses (p1, p2, p3), first, p1 send msg A to p2, then > send msg B to p3 after A is sent, and, after p2 receives A, p2 send > msg C to p3. All three processes start from initial state (i.e. start > from vector time [0,0,0]) > > obviously C and B would be delayed in cbcast queue of p3, but, what would > be the correct sequence if they are finally delivered? I think you are slightly confused about how the scheme works. For point-to-point messages (in your example, all of the message are point to point), we use an RPC and don't increment the counter at all. So, in your scenario, the VT remains at 0,0,0 in all processes. Secondly, in your example, message B and C are sent concurrently. So, there is no ordering constraint on cbcast and no guarantee of what order the messages will be seen in. Causal ordering matters when some form of "cycle" causes a message to get back to a destination of a prior message, or when multicasts are used -- say, if A,B and C were multicasts to the group of {p1,p2,p3} and your example looked at reception of A, B and C in p3. Then, VT's would change value ([1,0,0] for A; [2,0,0] for B; [1,1,0] for C) and hence we would know that B and C are delivered after A at process p3. We would still not be guaranteed any particular order between B and C, unless of course C had been received by p1 before it sent B, or B had been received by p2 before it sent C. Then, there would be a causal order between A,B and C and this would determine the order seen by p3. I hope this is clear... (personally, I find these examples with lots of letters and numbers confusing. A picture helps a lot). > > if, considering "causality" as the extension of FIFO, B should be > delivered before C. (just consider "p1 and p3" pair) > if, comparing the vector timestamp of B & C by cbcast protocol, B > should precede C too. No. B and C are sent by different processes "concurrenty". There is no reason that ordering should be constrained in this example. > > But, if we check the "happened before" chain, we would see A -> B > and A -> C, then, a question arises: it's hard to decide the "happ- > ened before" relation between B and C, then, how can we say B should > be delivered before C? what's the correct answer of the causal delivery > of B & C? Right, because they are concurrent! The problem is that you used time in your example "then p1 sends B, and then p2 sends C" but this use of time was not a meaningful one, since it seems to presume that the schedulers that decide when p1 will run and when p2 will run are somehow coordinated over multiple machines in the network. Say that p1 spends a few seconds thinking before it sends B. Meanwhile, C could easily be sent by p2. This illustrates how a scheduler could change the ordering very easily, just by delaying one process a little in a perfectly plausible way. Leslie Lamport wrote extensively on time and its meaning in distributed systems -- see any of my papers for references. The two papers appeared in CACM in 1978 and, I think, 1981. Lamport (and Isis) use the happens before relation to define ordering -- not any sort of intuitive notion based on realtime. As your example makes clear, realtime can be confusing and can lead to conclusions that look plausible but have subtle hidden issues. In Isis, it is best to say "X happens, then Y" if you mean that Y could have learned that X happened because X happened in the same process as Y, and happened first, or there is some chain of messages between when X occurs and when Y occurs. In your example, there is no chain of this sort that relates B and C -- unless you make them multicasts in a group, and one is sent, say C, only after the other, B, was received by process that will send C. That makes a chain from A -> B and from B -> C, and only then would some third party be sure to see A, B and then C in that order. > >Next time I would ask some technical , programming questions. If there's any >programming template of spooling (as checkpoints) please post it (that may >reduce my qty of problems in next posting) Thanks! > I think you should definitely have a look at those ACM TOCS papers, but at this level of a system like Isis, things do get pretty complex. Fortunately, not many users are aware of these issues, because the toolkit as a whole, and systems like News, hide them from the programmer. There is also a paper I wrote with Barry Gleeson and Robert Cooper, available as a technical report from Cornell, which you might find interesting (the one on "Process Group Semantics") As for spooling and checkpoints: the only documentation is the chapter in the manual. Tim Clark is our expert on the spooler (tclark@cs.cornell.edu) and as noted in a different posting, has just done some work on it to enhance performance. I am sure he would be happy to answer questions about checkpointing, but only when he returns from vacation on January 15. Meanwhile, I can answer simple questions... -- Kenneth P. Birman E-mail: ken@cs.cornell.edu 4105 Upson Hall, Dept. of Computer Science TEL: 607 255-9199 (office) Cornell University Ithaca, NY 14853 (USA) FAX: 607 255-4428