- Newsgroups: comp.sys.isis
- Path: sparky!uunet!gatech!rpi!batcomputer!cornell!ken
- From: ken@cs.cornell.edu (Ken Birman)
- Subject: Re: Forwarded comments Re: Notes on IP Multicast option for ISIS
- Message-ID: <1992Dec24.165539.28302@cs.cornell.edu>
- Organization: Cornell Univ. CS Dept, Ithaca NY 14853
- References: <1992Dec24.162642.27415@cs.cornell.edu>
- Date: Thu, 24 Dec 1992 16:55:39 GMT
- Lines: 213
-
- In article <1992Dec24.162642.27415@cs.cornell.edu> werner@freya.inesc.pt (Werner Vogels) writes:
- >
- >HP does conform to the "standard" and DEC (ultrix) has promised to. SGI was the first to
- >have Multicast IP added to their protocol suite. In principle, all BSD 4.4 (or NET2)
- >based OSes and OS servers support the same multicast interface. Mach2.5 and the
- >mach3.0 ux server both support VMTP and, implicitly, Multicast IP.
-
- This is good news -- although I had the impression that IP multicast is not
- normally available yet. Do people need to do anything to enable these
- facilities?
-
- >You do not need the mrouted to have a full functioning mcastip on your
- >local area network. Mrouted can be used to route messages between connected
- >subnets or to build tunnels with other sites that run mcastip. I have done a
- >lot of experimenting with mrouted and consider it to be a pain to incorporate
- >it into a reliable multicast scheme with atomic membership updates. The whole
- >membership scheme at the router is not meant to support the same guarantees
- >as we would like to have. The updates to the membership propagate very slowly.
- >An example of this is the traffic from the IETF audiocast session that would
- >still arrive at INESC even minutes after I closed the session. More experience
- >with the mbone shows that you can build a wide area mcastip net
- >but cannot rely much on the membership information at the routers.
- >We plan to develop a better routing strategy for reliable multicast,
- >especially for the MAN environment (several links connected by a high speed,
- >high bandwidth backbone).
- >
- >Also if you allow interconnected subnets you should decide with which scope
- >(ttl) you want to create groups. There are a number of problems connected
- >to this .....
-
- This is very interesting. If alternatives to mrouted surface, I hope people
- will keep us informed! As it happens, Isis depends only very slightly on
- routing and certainly doesn't count on the membership information in any
- significant way (except that you obviously won't get a speedup until the
- add-member information propagates enough so that packets are forwarded, if
- necessary, to a new member!)
-
- As for the TTL, I currently use TTL_SITE, but you could reconfigure the
- code to change this. Should it be an option?
-
- >
- >|> If these are not up, IP multicast use triggers
- >|> a point-to-point mechanism slightly but not drastically slower than
- >|> the usual Isis UDP scheme.
- >
- >I do not really know if I understand this, but if "these are not up"
- >shouldn't you just use the old scheme instead of a new point-to-point scheme?
-
- Well, the new code has its own code path and packet format for the wire.
- To route through the old UDP scheme and then send packets sideways as if
- they had come in via IP multicast would impose a higher overhead, and
- would mean that you would have ack traffic for both the multicast layer and
- the UDP layer. The way I currently do it, you can be active exclusively
- in the IP mcast layer, with it doing all its own acks and so forth, and
- in a normal case that will be cheaper, I think. And, the retransmission code
- can look just like a normal multicast packet coming in off the net.
-
- The real problem is that if you run on a platform that doesn't support
- IP multicast, everything will have to be retransmitted to you since you
- will never receive "directly", and this obviously will slow things down.
- The only reason I want to support this case at all is to make the mechanism
- less intrusive in systems that happen to have an old machine lying around.
- I was worried that in large production settings, the mechanism should be
- as easy to use and support as possible, or else people won't be able to
- administer it. So, this seems like a reasonable compromise.
-
- You do get some complaints when you start the application if IP mcast
- is supposed to be used but can't be -- so the admin people would get
- feedback. But, meanwhile, things will still work...
-
- >|> - You will need to have an authorization for "ipmcast" in your isis.rc
- >|> file.
- >|>
- >|> - You will need to specify ISIS_IPMCAST as a flag to isis_remote_init()
- >|>
- >|> - The group(s) that will use IP multicast should specify PG_IPMCAST as
- >|> an option to pg_join() (clients find out automatically).
- >|>
- >|> - You will need to specify a range of port numbers dedicated to use by
- >|> Isis for IP multicast. You can decide how many and what base address
- >|> Isis should use. Isis will reserve them as soon as a program specifies
- >|> ISIS_IPMCAST. Ideally, one port per group using PG_IPMCAST, but we
- >|> can manage if the number is smaller. In the limit, you can specify
- >|> that just one port be used, but in this case processes will get and
- >|> discard some messages not intended for them.
- >
- >By port do you mean a UDP port or an mcastip address?
-
- Both. The way that Steve Deering's code works, an ipmcast address is
- associated with one or more ports, which have normal UDP port numbers.
- The port number on the incoming mcast packet is used to decide who gets
- a copy among those who added themselves to the IP multicast group.
- So, for an Isis group, you would ideally want to have a port per group
- as well as an IP multicast address per group. If you have fewer, say one
- port for everyone, there is no choice but to re-use port numbers (hence
- we use SO_REUSEADDR), and this means that processes will sometimes get
- ipmcasts sent to groups other than the one they belong to -- specifically,
- when two groups with the same port number are used on the same machine...
-
- Anyhow, this is all hidden from users. It has performance implications
- but normally they won't be major ones... The number of ports available
- is a compile time system parameter for Isis itself.
-
- >|> - Only messages sent in diffusion mode (to the full set of members and
- >|> clients) will be multicast. However, you can overlap groups and in this
- >|> way can build up other patterns pretty easily.
- >
- >Personally I think you should have a second selection mechanism so you can
- >still use mcastip (and hw multicast) in modes other than diffusion mode.
-
- I think you may not be aware of what diffusion mode means in Isis, since
- this remark confuses me. A diffusion multicast is just an option ("D")
- on a broadcast that says to send it to all members and clients of the
- destination group. Normally, a message only goes to the members.
-
- Since the members and clients will all receive the message if ipmcast
- is used, since they all add themselves to the ipmcast group, it seems
- reasonable to only use multicast when the destination is specified as
- a diffusion multicast (or when the group just doesn't have any clients).
-
- >
- >|> How does it work?
- >|> - We fragment your message into 4k chunks and stick a small (44 byte)
- >|> header on each.
- >|>
- >|> - For each group joined using PG_IPMCAST as an option, we allocate an IP
- >|> multicast address and register the members and clients.
- >
- >Limited to 64 groups on most modern controllers, and 20 groups per socket. Most
- >mcastip code makes no distinction between, for example, lack of buffer space and
- >lack of multicast addresses. You will need a management unit (btw like the
- >one we are developing) that manages your address space more cleanly. If you
- >map group-id to mcastip address you will soon exhaust the address space because
- >mcastip will not have the controller go to all-mcast mode. You will have to
- >map more groups into one address.
-
- Actually, that's what I do now: when I run out of addresses, I cycle back
- through old ones. The 20-per-socket limit will be an "implementation
- restriction". After all, this is a simple, flexible tool, and it doesn't
- have to be perfect or turn into research to be useful for commercial sites!
- Long run, I agree, we should get fancier...
-
- >We have more applications running (conferencing, etc) that use mcastip, so we
- >can never be sure that the complete address space at the controller can be
- >allocated to the group communication module.
-
- In my code, one group (/sys/ipmcast) handles the allocation
- of ipmcast addresses. But if other applications use it, I will have a problem,
- true...
- >
- >Also there are no modules that will exploit the mcast capabilities of fddi,
- >etc.
-
- One thing at a time...
-
- >
- >|> - Members will multicast a packet initially, then retransmit point to point
- >|> if some processes NACK or fail to ACK it.
- >
- >I assume one ack per 4k packet or do you ack every packet?
-
- One ack per 50ms or so, actually, and staggered to avoid collisions.
-
- >
- >|> - Once received, packets travel the same code path as other Isis packets,
- >|> so all the usual Isis ordering properties and so forth hold.
- >|>
- >|> Summary:
- >|> - isis_remote_init(..... ISIS_IPMCAST); to start it up
- >|> - pg_join(.... , PG_IPMCAST, ....); to enable for a group
- >|> - Everything else works as usual, transparently.
- >|>
- >|> From experiments (done, however, with a different version of the code) we
- >|> expect to see a benefit for groups with more than about 4 members. However,
- >|> we have yet to experiment seriously with the new code on Solaris, for
- >|> technical reasons, so I won't have solid numbers for a little longer.
- >|> My guess -- just a guess -- is that performance will be fairly flat as
- >|> a function of group size until groups+clients gets pretty large, probably
- >|> 40 or 50 or more. By then, acks coming back may begin to be a serious
- >|> load on the sender and cause performance to slowly taper off.
- >
- >Personally and also from experiments: if you only use mcastip on a local area
- >network you are better off making your own interface to the network and
- >multicast addresses; with more and more machines providing the BSD packet
- >filter you can reasonably standardize this as well. You will get better
- >performance as soon as there is more than one recipient. The nit on SunOS is
- >*bad*: do not use it if you have very performance critical applications.
-
- Perhaps I should add that as an option? Obviously, for the near term,
- the standards argument says that Isis has to work for the vendor provided
- and supported technology. It isn't my fault if the standard could be
- improved!
-
- >
- >If you use mcastip on different subnets (no tunneling) watch out for
- >inconsistent behaviour of the mrouted, which will cost you performance.
- >Have it go down and come up again within seconds: it will have to restore
- >its membership tables, which will mess up your interlink communication
- >completely, and you will declare members dead who are perfectly reachable.
- >If your router will also be used for other purposes, like in our case for
- >connection to the mbone, the state kept at the router is *very* big and
- >takes some time to recover.
- >
- >The router we are to develop bases its interrouter communication on reliable
- >group comm methods, making it easier to maintain shared state etc, and
- >restoring the state will become much more efficient.
-
- Sounds like we will want to support the INESC router when you do make it
- available! Let us know...
- --
- Kenneth P. Birman E-mail: ken@cs.cornell.edu
- 4105 Upson Hall, Dept. of Computer Science TEL: 607 255-9199 (office)
- Cornell University Ithaca, NY 14853 (USA) FAX: 607 255-4428
-