- Path: sparky!uunet!olivea!spool.mu.edu!think.com!rpi!batcomputer!cornell!ken
- From: ken@cs.cornell.edu (Ken Birman)
- Newsgroups: comp.sys.isis
- Subject: Forwarded comments Re: Notes on IP Multicast option for ISIS
- Message-ID: <1992Dec24.162642.27415@cs.cornell.edu>
- Date: 24 Dec 92 16:26:42 GMT
- Organization: Cornell Univ. CS Dept, Ithaca NY 14853
- Lines: 134
-
- From: werner@freya.inesc.pt (Werner Vogels)
-
- In article <1992Dec22.212230.28572@cs.cornell.edu> you write:
-
- |> When will IP mcast be used?
- |> - I plan to require that all (or most) machines using Isis support
- |> the IP Multicast standard, which was developed by Steve Deering at
- |> Stanford and has become part of Solaris 2.1; Steve's code is also
- |> available for many other platforms, but perhaps not as a standard
- |> offering from the vendor.
-
- HP does conform to the "standard", and DEC (ultrix) has promised to. SGI was the first to
- have Multicast IP added to its protocol suite. In principle, all BSD 4.4 (or NET2)
- based OSes and OS servers support the same multicast interface. Mach2.5 and the
- mach3.0 ux server both support VMTP and, implicitly, Multicast IP.
-
- |> The code defines some setsockopt() options: SO_REUSEADDR, IP_MULTICAST_TTL,
- |> IP_ADD_MEMBERSHIP, IP_DROP_MEMBERSHIP. It also includes a multicast
- |> router, called "mrouted".
-
- You do not need mrouted to have a fully functioning mcastip on your
- local area network. Mrouted can be used to route messages between connected subnets
- or to build tunnels to other sites that run mcastip. I have done a lot of
- experimenting with mrouted and consider it a pain to incorporate into
- a reliable multicast scheme with atomic membership updates. The whole membership
- scheme at the router is not meant to support the guarantees we would like to
- have. Updates to the membership propagate very slowly. An example of this is
- the traffic from the IETF audiocast session, which would still arrive at INESC even
- minutes after I closed the session. Further experience with the mbone shows that
- you can build a wide area mcastip net but cannot rely very much on the membership
- information at the routers. We plan to develop a better routing strategy for reliable
- multicast, especially for the MAN environment (several links connected by a high-speed,
- high-bandwidth backbone).
-
- Also, if you allow interconnected subnets, you should decide with which scope (ttl)
- you want to create groups. There are a number of problems connected to this .....
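-
- A minimal sketch of the socket options named in the quoted notes, including the
- ttl scope choice. It uses Python's standard socket module rather than the Isis
- code. The group address 239.1.2.3 and port 5007 are made-up values for
- illustration, and the helper names are hypothetical. A ttl of 1 keeps datagrams
- on the local subnet; larger values let a multicast router forward them.

```python
import socket
import struct


def make_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq: the group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str, port: int) -> socket.socket:
    """Bind a UDP socket to `port` and join `group` on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # SO_REUSEADDR lets several processes on one host bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    return sock


def leave_group(sock: socket.socket, group: str) -> None:
    """Drop the membership added by open_receiver."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, make_mreq(group))


def open_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose multicast datagrams carry the given ttl (scope)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock


if __name__ == "__main__":
    # Hypothetical group and port; ttl=1 restricts the datagram to the local subnet.
    sender = open_sender(ttl=1)
    sender.sendto(b"hello group", ("239.1.2.3", 5007))
    sender.close()
```

- Note that none of this involves mrouted: on a single LAN, the join and the ttl
- setting are all a host needs.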
-
- |> If these are not up, IP multicast use triggers
- |> a point-to-point mechanism slightly but not drastically slower than
- |> the usual Isis UDP scheme.
-
- I do not really know if I understand this, but if "these are not up", shouldn't you
- just use the old scheme instead of a new point-to-point scheme?
-
- |> - You will need to have an authorization for "ipmcast" in your isis.rc
- |> file.
- |>
- |> - You will need to specify ISIS_IPMCAST as a flag to isis_remote_init()
- |>
- |> - The group(s) that will use IP multicast should specify PG_IPMCAST as
- |> an option to pg_join() (clients find out automatically).
- |>
- |> - You will need to specify a range of port numbers dedicated to use by
- |> Isis for IP multicast. You can decide how many and what base address
- |> Isis should use. Isis will reserve them as soon as a program specifies
- |> ISIS_IPMCAST. Ideally, one port per group using PG_IPMCAST, but we
- |> can manage if the number is smaller. In the limit, you can specify that
- |> just one port be used, but in this case processes will get and discard
- |> some messages not intended for them.
-
- By port, do you mean a UDP port or an mcastip IP address?
-
- |> - Only messages sent in diffusion mode (to the full set of members and
- |> clients) will be multicast. However, you can overlap groups and in this
- |> way can build up other patterns pretty easily.
-
- Personally, I think you should have a second selection mechanism so that mcastip
- (and hw multicast) can still be used in modes other than diffusion mode.
-
- |> How does it work?
- |> - We fragment your message into 4k chunks and stick a small (44 byte)
- |> header on each.
- |>
- |> - For each group joined using PG_IPMCAST as an option, we allocate an IP
- |> multicast address and register the members and clients.
-
- This is limited to 64 groups on most modern controllers, and 20 groups per socket. Most
- mcastip code makes no distinction between, for example, lack of buffer space and lack of
- multicast addresses. You will need a management unit (btw, like the one we are
- developing) that manages your address space more cleanly. If you map each group-id to
- its own mcastip address, you will soon exhaust the addresses, because mcastip will not
- have the controller go to all-mcast mode. You will have to map several groups onto one
- address.
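-
- A minimal sketch of mapping many groups onto a small pool of addresses, assuming
- a hash-based assignment. This is an illustration only, not the management unit
- mentioned above and not what Isis does. The base address 239.1.0.0, the pool size
- of 16 (chosen to stay under the 20-per-socket limit) and both function names are
- hypothetical.

```python
import ipaddress
import zlib


def group_to_address(group_id: str, base: str = "239.1.0.0", pool_size: int = 16) -> str:
    """Map a group id onto one of `pool_size` consecutive multicast addresses."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    slot = zlib.crc32(group_id.encode("utf-8")) % pool_size
    return str(ipaddress.IPv4Address(base) + slot)


def accept(packet_group_id: str, joined: set) -> bool:
    """Receiver-side filter: groups share addresses, so drop packets for other groups."""
    return packet_group_id in joined


if __name__ == "__main__":
    for gid in ("payroll", "trading", "monitor"):
        print(gid, "->", group_to_address(gid))
```

- The price of sharing is that a host receives, and must discard in software,
- packets for groups it never joined but which hash to the same address.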
-
- We have other applications running (conferencing, etc.) that use mcastip, so we can
- never be sure that the complete address space at the controller can be allocated to
- the group communication module.
-
- Also, there are no modules that exploit the mcast capabilities of FDDI, etc.
-
- |> - Members will multicast a packet initially, then retransmit point to point
- |> if some processes NACK or fail to ACK it.
-
- I assume one ack per 4k packet, or do you ack every packet?
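-
- A minimal sketch of the multicast-then-retransmit idea, assuming one ack per 4k
- fragment per receiver. That is exactly the point the question above leaves open,
- so treat it as an assumption. The 44-byte header is left out because its layout
- is not given, and the function names are hypothetical.

```python
CHUNK = 4096  # fragment size taken from the quoted notes


def fragment(message: bytes, chunk: int = CHUNK) -> list:
    """Split a message into chunk-sized fragments; the last one may be shorter."""
    pieces = [message[i:i + chunk] for i in range(0, len(message), chunk)]
    return pieces or [b""]  # an empty message still yields one (empty) fragment


def pending_retransmits(receivers, n_fragments: int, acks: set) -> dict:
    """Return {receiver: [fragment indices]} that still need a point-to-point resend.

    `acks` holds (receiver, fragment_index) pairs acknowledged so far.
    """
    pending = {}
    for receiver in receivers:
        missing = [i for i in range(n_fragments) if (receiver, i) not in acks]
        if missing:
            pending[receiver] = missing
    return pending


if __name__ == "__main__":
    frags = fragment(b"x" * 10000)
    acks = {("a", 0), ("a", 1), ("a", 2), ("b", 0)}
    # Receiver "b" acknowledged only fragment 0, so it gets 1 and 2 resent directly.
    print(pending_retransmits(["a", "b"], len(frags), acks))
```

- With per-fragment acks the sender resends only what is missing; with one ack per
- whole message it would have to resend every fragment to a slow receiver.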
-
- |> - Once received, packets travel the same code path as other Isis packets,
- |> so all the usual Isis ordering properties and so forth hold.
- |>
- |> Summary:
- |> - isis_remote_init(..... ISIS_IPMCAST); to start it up
- |> - pg_join(.... , PG_IPMCAST, ....); to enable for a group
- |> - Everything else works as usual, transparently.
- |>
- |> From experiments (done, however, with a different version of the code) we
- |> expect to see a benefit for groups with more than about 4 members. However,
- |> we have yet to experiment seriously with the new code on Solaris, for
- |> technical reasons, so I won't have solid numbers for a little longer.
- |> My guess -- just a guess -- is that performance will be fairly flat as
- |> a function of group size until groups+clients gets pretty large, probably
- |> 40 or 50 or more. By then, acks coming back may begin to be a serious
- |> load on the sender and cause performance to slowly taper off.
-
- Personally, and also from experiments: if you only use mcastip on a local area
- network, you are better off making your own interface to the network and multicast
- addresses. With more and more machines providing the BSD packet filter, you can
- reasonably standardize this as well. You will get better performance as soon as
- there is more than one recipient. The NIT on SunOS is *bad*: do not use it if you
- have very performance-critical applications.
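-
- A back-of-the-envelope sketch of why multicast pays off with more than one
- recipient, and why acks eventually dominate. It assumes one ack per receiver per
- data packet and no loss, and it ignores per-packet CPU costs, so it is a counting
- model only, not a measurement. The function name is hypothetical.

```python
def sender_packets(receivers: int, multicast: bool) -> int:
    """Packets the sender transmits plus acks it receives for one data packet."""
    sends = 1 if multicast else receivers  # one multicast send vs. one send per receiver
    acks = receivers                       # every receiver acks once in this model
    return sends + acks


if __name__ == "__main__":
    for n in (1, 4, 50):
        print(n, sender_packets(n, multicast=False), sender_packets(n, multicast=True))
```

- In this model the two schemes tie at one receiver and multicast wins from two
- onward, but the ack term still grows linearly with the group size.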
-
- If you use mcastip on different subnets (no tunneling), watch out for inconsistent
- behaviour of mrouted, which will cost you performance. If it goes down and comes up
- again within seconds, it will have to restore its membership tables, which will mess up
- your interlink communication completely: you will declare members dead who are
- perfectly reachable. If your router is also used for other purposes, like in
- our case for connection to the mbone, the state kept at the router is *very* big and
- takes some time to recover.
-
- The router we are to develop bases its inter-router communication on reliable
- group comm methods, making it easier to maintain shared state, and restoring
- the state will become much more efficient.
-
- --
- Kenneth P. Birman E-mail: ken@cs.cornell.edu
- 4105 Upson Hall, Dept. of Computer Science TEL: 607 255-9199 (office)
- Cornell University Ithaca, NY 14853 (USA) FAX: 607 255-4428
-