home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.sys.isis
- Path: sparky!uunet!wupost!micro-heart-of-gold.mit.edu!uw-beaver!cornell!ken
- From: ken@cs.cornell.edu (Ken Birman)
- Subject: Re: More about Causality
- Message-ID: <1993Jan3.161232.14265@cs.cornell.edu>
- Organization: Cornell Univ. CS Dept, Ithaca NY 14853
- References: <1992Dec27.124442.18404@dec8.ncku.edu.tw>
- Date: Sun, 3 Jan 1993 16:12:32 GMT
- Lines: 214
-
- Warning: this discussion is rated PG-13... (for non-US readers, this
- is a reference to the US film rating system... "mature readers only")
-
- What follows is a highly technical discussion of Isis internals.
- Many Isis uses employ the "news" system and for them, all of this is
- automatic and not an issue. Even those users who do use the toolkit
- directly don't "need" to know about this very detailed layer of the
- system to make effective use of the system...
-
- In article <1992Dec27.124442.18404@dec8.ncku.edu.tw> jsw@eembox.ncku.edu.tw (Master Jyh-Shi Wu (82.9)) writes:
- > 1) As I know, isis v2.1 multicast ordering is arranged in the unit of UNIX
- > processes (i.e. heavyweight), Is lightweight mulitcast ordering (i.e.
- > in the unit of "tasks") implemented in v2.2.7 ?
-
- This question is one we often get from people with a SmallTalk or Lisp
- background, where a "task" is a very costly thing and where multitasking
- is used the way that UNIX uses processes.
-
- In Isis, we assume that tasks are very cheap and very common. For this
- reason, there is little benefit to ordering at the level of tasks and
- we don't plan to introduce task-level ordering at any point in the future.
-
- This can be awkward because on languages like Smalltalk and Lisp, it makes
- the MSG_ENQUEUE/msg_rcv mechanism much more cost-effective than the
- normal isis_entry scheme for receiving messages -- and, as you probably
- realize, msg_rcv can "lose" ordering that your application normally
- would benefit from having.
-
- In C++ or C or other languages where tasks are cheap, one logical activity
- can have many tasks associated with it, and it makes more sense to order
- at the process level and assume that since processes can share memory
- under UNIX, a process is the "level of encapsulation" at which ordering
- makes sense.
-
- For the future, we do plan to move to a port-oriented ordering model, though.
- In this, each process will own one or more ports and ordering will be done
- at the level of ports, not processes (actually, causal domains, with each
- port belonging to exactly one domain). When we do this in the "Horus" system,
- you could have a lightweight form of ordering within a single address space.
-
- The Horus papers are included in the Isis publications list. Email to
- reen@cs.cornell.edu for a copy. I'll ask her to post a recent copy to this
- group. Look for papers with Robbert van Renesse as a co-author: he is the
- main architect of Horus (along with me, Brad Glade, Mike Reiter, Robert
- Cooper and Aleta Ricciardi).
-
- > 2) There are still little documents in bypass modes of isis [a|g|c]bcast(
- > "B",...) . It's very glad to know that causal abcast is implemented
- > in bypass mode. But, what's the difference in the bypass mode of
- > abcast, cbcast and gbcast?
- > what's the essence of "bypass mode" (more than speedup in delivery,
- > what I concerned is ordering)
-
- I think you may have missed the ACM TOCS paper -- August 1991. This is
- the main reference and the best place to get answers to your technical
- questions.
-
- Briefly: there is a bypass abcast and cbcast, but gbcast is still done
- via "protos". The essense of the protocol is covered in the paper. The
- concept is discussed in the current version of the manual (V3.0.5 or 6 or 7)
- in a chapter discussing the overall architecture of Isis.
-
- For new readers, "bypass" communication occurs when Isis can send messages
- directly to their destinations, rather than indirectly through the "protos"
- server. This gives a 10-fold performance improvement for many applications.
- To get bypass communication, you need to know that the sender and destinations
- are related in one of a small set of ways, e.g. both member of the same group,
- one is member of a group and the other has used pg_client on that group, etc.
- Isis will use bypass communication if it can, and if not sends the message
- down to the protos server, via RPC, which delivers the message up to the
- right destinations, also via RPC.
-
- The protos server uses a decentralized protocol discussed in an ACM TOCS
- paper in 1987. The direct communication uses this other, more recent, set
- of protocols.
-
- The abcast protocols are sort of incompatible: if you use abcast in bypass mode,
- it is totally ordered relative to other bypass-mode abcasts, but not to other
- non-bypass abcasts, and vice versa. Cbcast, however, is ordered no matter
- how you send it, so this is only a concern for abcast. You can force abcast
- to go via a non-bypass route using "X" as an option in the long-form.
-
- > if the bypass mode of abcast is done by cbcasts, then, is "bypassed
- > abcast" can show causality when used in conjugated with cbcasts?
-
- Yes, if all the abcasts are done in bypass mode. No, if there is a mix of
- bypass and non-bypass abcasts.
-
- > If abcast & gbcast are causal within a group, what's the difference
- > in abcast and "bypass abcast" when used within a group?, and bypass
- > cbcast & ordinary cbcast, gbcast...?
-
- I think I covered this above -- hope so, at least. The only ordering issue
- is that bypass and non-bypass abcasts are unordered w.r.t. each other.
- All other Isis ordering properties apply for abcast, cbcast and gbcast
- regardless of whether things go via bypass mode or not.
-
- We would prefer not to have two flavors of abcast, and in HORUS this will
- go away and we will only have the bypass flavor. For now, though, we
- made this compromise to maximize performance.
-
- > I think, when specifying "Bypass" mode in [a|c|g]bcast(), isis would
- > perform bypass mode if situations allows that ( example: multicast
- > within a group), if don't specify bypass mode, isis would "sure to"
- > use the old protocol. Is there any possibility that isis "automatically"
- > turns to bypass mode when user don't specify that?
-
- Isis automatically turns to bypass mode if it can find any possible way to
- do it, but it won't automatically join you to a group or make you a client
- of a group. Using "B" and "X" you can be sure that you will at least get
- warned if a surprise occurs but the actual algorithm is based on looking
- at the destination of the message.
-
- >
- > 3) A causality case:
- > consider 3 proesses (p1, p2, p3), first, p1 send msg A to p2, then
- > send msg B to p3 after A is sent, and, after p2 receives A, p2 send
- > msg C to p3. All three processes start from initial state (i.e. start
- > from vector time [0,0,0])
- >
- > obviously C and B would be delayed in cbcast queue of p3, but, what would
- > be the correct sequence if they are finally delivered?
-
- I think you are slightly confused about how the scheme works. For
- point-to-point messages (in your example, all of the message are point
- to point), we use an RPC and don't increment the counter at all. So,
- in your scenario, the VT remains at 0,0,0 in all processes. Secondly,
- in your example, message B and C are sent concurrently. So, there is
- no ordering constraint on cbcast and no guarantee of what order the
- messages will be seen in.
-
- Causal ordering matters when some form of "cycle" causes a message to
- get back to a destination of a prior message, or when multicasts are
- used -- say, if A,B and C were multicasts to the group of {p1,p2,p3} and
- your example looked at reception of A, B and C in p3. Then, VT's would
- change value ([1,0,0] for A; [2,0,0] for B; [1,1,0] for C) and hence
- we would know that B and C are delivered after A at process p3. We
- would still not be guaranteed any particular order between B and C,
- unless of course C had been received by p1 before it sent B, or B had
- been received by p2 before it sent C. Then, there would be a causal
- order between A,B and C and this would determine the order seen by p3.
-
- I hope this is clear... (personally, I find these examples with lots
- of letters and numbers confusing. A picture helps a lot).
-
- >
- > if, considering "causality" as the extension of FIFO, B should be
- > delivered before C. (just consider "p1 and p3" pair)
- > if, comparing the vector timestamp of B & C by cbcast protocol, B
- > should precede C too.
-
- No. B and C are sent by different processes "concurrenty". There is
- no reason that ordering should be constrained in this example.
-
- >
- > But, if we check the "happened before" chain, we would see A -> B
- > and A -> C, then, a question arises: it's hard to decide the "happ-
- > ened before" relation between B and C, then, how can we say B should
- > be delivered before C? what's the correct answer of the causal delivery
- > of B & C?
-
- Right, because they are concurrent! The problem is that you used time
- in your example "then p1 sends B, and then p2 sends C" but this use of
- time was not a meaningful one, since it seems to presume that the schedulers
- that decide when p1 will run and when p2 will run are somehow coordinated
- over multiple machines in the network.
-
- Say that p1 spends a few seconds thinking before it sends B. Meanwhile,
- C could easily be sent by p2. This illustrates how a scheduler could
- change the ordering very easily, just by delaying one process a little
- in a perfectly plausible way.
-
- Leslie Lamport wrote extensively on time and its meaning in distributed
- systems -- see any of my papers for references. The two papers appeared
- in CACM in 1978 and, I think, 1981. Lamport (and Isis) use the happens
- before relation to define ordering -- not any sort of intuitive notion
- based on realtime. As your example makes clear, realtime can be confusing
- and can lead to conclusions that look plausible but have subtle hidden
- issues. In Isis, it is best to say "X happens, then Y" if you mean that
- Y could have learned that X happened because X happened in the same process
- as Y, and happened first, or there is some chain of messages between
- when X occurs and when Y occurs. In your example, there is no chain of
- this sort that relates B and C -- unless you make them multicasts in
- a group, and one is sent, say C, only after the other, B, was received by
- process that will send C. That makes a chain from A -> B and from B -> C,
- and only then would some third party be sure to see A, B and then C in that
- order.
-
- >
- >Next time I would ask some technical , programming questions. If there's any
- >programming template of spooling (as checkpoints) please post it (that may
- >reduce my qty of problems in next posting) Thanks!
- >
-
- I think you should definitely have a look at those ACM TOCS papers,
- but at this level of a system like Isis, things do get pretty complex.
- Fortunately, not many users are aware of these issues, because the
- toolkit as a whole, and systems like News, hide them from the
- programmer.
-
- There is also a paper I wrote with Barry Gleeson and Robert Cooper, available
- as a technical report from Cornell, which you might find interesting (the
- one on "Process Group Semantics")
-
- As for spooling and checkpoints: the only documentation is the chapter
- in the manual. Tim Clark is our expert on the spooler (tclark@cs.cornell.edu)
- and as noted in a different posting, has just done some work on it
- to enhance performance. I am sure he would be happy to answer questions
- about checkpointing, but only when he returns from vacation on January 15.
- Meanwhile, I can answer simple questions...
- --
- Kenneth P. Birman E-mail: ken@cs.cornell.edu
- 4105 Upson Hall, Dept. of Computer Science TEL: 607 255-9199 (office)
- Cornell University Ithaca, NY 14853 (USA) FAX: 607 255-4428
-