- Newsgroups: comp.sys.isis
- Path: sparky!uunet!gatech!rpi!batcomputer!cornell!ken
- From: ken@cs.cornell.edu (Ken Birman)
- Subject: Re: Forwarded comments Re: Notes on IP Multicast option for ISIS
- Message-ID: <1992Dec24.165539.28302@cs.cornell.edu>
- Organization: Cornell Univ. CS Dept, Ithaca NY 14853
- References: <1992Dec24.162642.27415@cs.cornell.edu>
- Date: Thu, 24 Dec 1992 16:55:39 GMT
- Lines: 213
-
- In article <1992Dec24.162642.27415@cs.cornell.edu> werner@freya.inesc.pt (Werner Vogels) writes:
- >
- >HP does conform to the "standard" and DEC (ultrix) has promised to. SGI was the first to
- >have Multicast IP added to their protocol suite. In principle, all BSD 4.4 (or NET2)
- >based OSes and OS servers support the same multicast interface. Mach2.5 and the
- >mach3.0 ux server both support VMTP and, implicitly, Multicast IP.
-
- This is good news -- although I had the impression that IP multicast is not
- normally available yet. Do people need to do anything to enable these
- facilities?
-
- >You do not need the mrouted to have a full functioning mcastip on your
- >local area network. Mrouted can be used to route messages between connected
- >subnets or to build tunnels with other sites that run mcastip. I have done a
- >lot of experimenting with mrouted and consider it to be a pain to incorporate
- >it into a reliable multicast scheme with atomic membership updates. The whole
- >membership scheme at the router is not meant to support the same guarantees
- >as we would like to have. The updates to the membership propagate very slowly.
- >An example of this is the traffic from the IETF audiocast session that would
- >still arrive at INESC even minutes after I closed the session. More experience
- >with the mbone shows that you can build a wide area mcastip net
- >but cannot rely much on the membership information at the routers.
- >We plan to develop a better routing strategy for reliable multicast,
- >especially for the MAN environment (several links connected by a high speed,
- >high bandwidth backbone).
- >
- >Also if you allow interconnected subnets you should decide with which scope
- >(ttl) you want to create groups. There are a number of problems connected
- >to this .....
-
- This is very interesting. If alternatives to mrouted surface, I hope people
- will keep us informed! As it happens, Isis depends only very slightly on
- routing and certainly doesn't count on the membership information in any
- significant way (except that you obviously won't get a speedup until the
- add-member information propagates enough so that packets are forwarded, if
- necessary, to a new member!)
-
- As for the TTL, I currently use TTL_SITE, but you could reconfigure the
- code to change this. Should it be an option?
-
- >
- >|> If these are not up, IP multicast use triggers
- >|> a point-to-point mechanism slightly but not drastically slower than
- >|> the usual Isis UDP scheme.
- >
- >I do not really know if I understand this, but if "these are not up"
- >shouldn't you just use the old scheme instead of a new point-to-point scheme?
-
- Well, the new code has its own code path and packet format for the wire.
- To route through the old UDP scheme and then send packets sideways as if
- they had come in via IP multicast would impose a higher overhead, and
- would mean that you would have ack traffic for both the multicast layer and
- the UDP layer. The way I currently do it, you can be active exclusively
- in the IP mcast layer, with it doing all its own acks and so forth, and
- in a normal case that will be cheaper, I think. And, the retransmission code
- can look just like a normal multicast packet coming in off the net.
-
- The real problem is that if you run on a platform that doesn't support
- IP multicast, everything will have to be retransmitted to you since you
- will never receive "directly", and this obviously will slow things down.
- The only reason I want to support this case at all is to make the mechanism
- less intrusive in systems that happen to have an old machine lying around.
- I was worried that in large production settings, the mechanism should be
- as easy to use and support as possible, or else people won't be able to
- administer it. So, this seems like a reasonable compromise.
-
- You do get some complaints when you start the application if IP mcast
- is supposed to be used but can't be -- so the admin people would get
- feedback. But, meanwhile, things will still work...
-
- >|> - You will need to have an authorization for "ipmcast" in your isis.rc
- >|> file.
- >|>
- >|> - You will need to specify ISIS_IPMCAST as a flag to isis_remote_init()
- >|>
- >|> - The group(s) that will use IP multicast should specify PG_IPMCAST as
- >|> an option to pg_join() (clients find out automatically).
- >|>
- >|> - You will need to specify a range of port numbers dedicated to use by
- >|> Isis for IP multicast. You can decide how many and what base address
- >|> Isis should use. Isis will reserve them as soon as a program specifies
- >|> ISIS_IPMCAST. Ideally, one port per group using PG_IPMCAST, but we
- >|> can manage if the number is smaller. In the limit, you can specify
- >|> that just one port be used, but in this case processes will get and
- >|> discard some messages not intended for them.
- >
- >By port do you mean a UDP port or an mcastip address?
-
- Both. The way that Steve Deering's code works, an ipmcast address is
- associated with one or more ports, which have normal UDP port numbers.
- The port number on the incoming mcast packet is used to decide who gets
- a copy among those who added themselves to the IP multicast group.
- So, for an Isis group, you would ideally want to have a port per group
- as well as an IP multicast address per group. If you have fewer, say one
- port for everyone, there is no choice but to re-use port numbers (hence
- we use SO_REUSEADDR), and this means that processes will sometimes get
- ipmcasts sent to groups other than the one they belong to -- specifically,
- when two groups with the same port number are used on the same machine...
-
- Anyhow, this is all hidden from users. It has performance implications
- but normally they won't be major ones... The number of ports available
- is a compile time system parameter for Isis itself.
-
- >|> - Only messages sent in diffusion mode (to the full set of members and
- >|> clients) will be multicast. However, you can overlap groups and in this
- >|> way can build up other patterns pretty easily.
- >
- >Personally I think you should have a second selection mechanism so you can
- >still use mcastip (and hw multicast) in modes other than diffusion mode.
-
- I think you may not be aware of what diffusion mode means in Isis, since
- this remark confuses me. A diffusion multicast is just an option ("D")
- on a broadcast that says to send it to all members and clients of the
- destination group. Normally, a message only goes to the members.
-
- Since the members and clients will all receive the message if ipmcast
- is used, since they all add themselves to the ipmcast group, it seems
- reasonable to only use multicast when the destination is specified as
- a diffusion multicast (or when the group just doesn't have any clients).
-
- >
- >|> How does it work?
- >|> - We fragment your message into 4k chunks and stick a small (44 byte)
- >|> header on each.
- >|>
- >|> - For each group joined using PG_IPMCAST as an option, we allocate an IP
- >|> multicast address and register the members and clients.
- >
- >Limited to 64 groups on most modern controllers, and 20 groups per socket. Most
- >mcastip code makes no distinction between, for example, lack of buffer space and
- >lack of multicast addresses. You will need a management unit (btw like the
- >one we are developing) that manages your address space more cleanly. If you
- >map group-id to mcastip address you will soon exhaust the address space because
- >mcastip will not have the controller go to all-mcast mode. You will have to
- >map more groups into one address.
-
- Actually, that's what I do now: when I run out of addresses, I cycle back
- through old ones. The 20-per-socket limit will be an "implementation
- restriction". After all, this is a simple, flexible tool, and it doesn't
- have to be perfect or turn into research to be useful for commercial sites!
- Long run, I agree, we should get fancier...
-
- >We have more applications running (conferencing, etc) that use mcastip, so we
- >can never be sure that the complete address space at the controller can be
- >allocated to the group communication module.
-
- In my code, one group (/sys/ipmcast) handles the allocation
- of ipmcast addresses. But if other applications use it, I will have a problem,
- true...
- >
- >Also there are no modules that will exploit the mcast capabilities of fddi,
- >etc.
-
- One thing at a time...
-
- >
- >|> - Members will multicast a packet initially, then retransmit point to point
- >|> if some processes NACK or fail to ACK it.
- >
- >I assume one ack per 4k packet or do you ack every packet?
-
- One ack per 50ms or so, actually, and staggered to avoid collisions.
-
- >
- >|> - Once received, packets travel the same code path as other Isis packets,
- >|> so all the usual Isis ordering properties and so forth hold.
- >|>
- >|> Summary:
- >|> - isis_remote_init(..... ISIS_IPMCAST); to start it up
- >|> - pg_join(.... , PG_IPMCAST, ....); to enable for a group
- >|> - Everything else works as usual, transparently.
- >|>
- >|> From experiments (done, however, with a different version of the code) we
- >|> expect to see a benefit for groups with more than about 4 members. However,
- >|> we have yet to experiment seriously with the new code on Solaris, for
- >|> technical reasons, so I won't have solid numbers for a little longer.
- >|> My guess -- just a guess -- is that performance will be fairly flat as
- >|> a function of group size until groups+clients gets pretty large, probably
- >|> 40 or 50 or more. By then, acks coming back may begin to be a serious
- >|> load on the sender and cause performance to slowly taper off.
- >
- >Personally and also from experiments: if you only use mcastip on a local area
- >network you are better off making your own interface to the network and
- >multicast addresses; with more and more machines providing the BSD packet
- >filter you can reasonably standardize this as well. You will get better
- >performance as soon as there is more than one recipient. The nit on SunOS is
- >*bad*: do not use it if you have very performance critical applications.
-
- Perhaps I should add that as an option? Obviously, for the near term,
- the standards argument says that Isis has to work for the vendor provided
- and supported technology. It isn't my fault if the standard could be
- improved!
-
- >
- >If you use mcastip on different subnets (no tunneling) watch out for
- >inconsistent behaviour of the mrouted, which will cost you performance.
- >Have it go down and come up again within seconds: it will have to restore
- >its membership tables, which will mess up your interlink communication
- >completely, and you will declare members dead who are perfectly reachable.
- >If your router will also be used for other purposes, like in our case for
- >connection to the mbone, the state kept at the router is *very* big and
- >takes some time to recover.
- >
- >The router we are to develop bases its interrouter communication on reliable
- >group comm methods, making it easier to maintain shared state etc, and
- >restoring the state will become much more efficient.
-
- Sounds like we will want to support the INESC router when you do make it
- available! Let us know...
- --
- Kenneth P. Birman E-mail: ken@cs.cornell.edu
- 4105 Upson Hall, Dept. of Computer Science TEL: 607 255-9199 (office)
- Cornell University Ithaca, NY 14853 (USA) FAX: 607 255-4428
-