Source Code 1994 March


Text File | 1993-04-05 | 70.4 KB | 2,068 lines

Newsgroups: comp.sources.unix From: casey@gauss.llnl.gov (Casey Leedom) Subject: v26i108: mcast - LLNL IP multicast implementation, V1.2.3, Part04/04 Sender: unix-sources-moderator@vix.com Approved: paul@vix.com Submitted-By: casey@gauss.llnl.gov (Casey Leedom) Posting-Number: Volume 26, Issue 108 Archive-Name: mcast/part04 #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh <file", e.g.. If this archive is complete, you # will see the following message at the end: # "End of archive 4 (of 4)." # Contents: doc/notes.me mcastd/mcastd.c # Wrapped by vixie@gw.home.vix.com on Tue Apr 6 12:49:52 1993 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f 'doc/notes.me' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'doc/notes.me'\" else echo shar: Extracting \"'doc/notes.me'\" $19495 characters$ sed "s/^X//" >'doc/notes.me' <<'END_OF_FILE' X.\" Copyright (c) 1992 The Regents of the University of California. X.\" All rights reserved. X.\" X.\" Redistribution and use in source and binary forms, with or without X.\" modification, are permitted provided that the following conditions X.\" are met: X.\" 1. Redistributions of source code must retain the above copyright X.\" notice, this list of conditions and the following disclaimer. X.\" 2. Redistributions in binary form must reproduce the above copyright X.\" notice, this list of conditions and the following disclaimer in the X.\" documentation and/or other materials provided with the distribution. X.\" 3. All advertising materials mentioning features or use of this software X.\" must display the following acknowledgement: X.\" This product includes software developed by the University of X.\" California, Lawrence Livermore National Laboratory and its X.\" contributors. X.\" 4. 
Neither the name of the University nor the names of its contributors X.\" may be used to endorse or promote products derived from this software X.\" without specific prior written permission. X.\" X.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND X.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE X.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE X.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE X.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL X.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS X.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) X.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT X.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY X.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF X.\" SUCH DAMAGE. X.\" X.po 1.0in X.ll 6.5in X.nr pi 0n X.de Hd X.\" .ds Vr \\$3 X.ds Vr 1.2.3 X.ds Dt \\$4 X.. X.Hd $Header: /u0/casey/src/mcast/doc/RCS/notes.me,v 1.2 93/04/06 11:27:24 casey Exp $ X.of 'LLNL MCAST version \*(Vr''\*(Dt' X.ef '\*(Dt''LLNL MCAST version \*(Vr' X.sp 2in X.(c X.sz +2 Implementation Notes on the LLNL MCAST Distribution X.sz -2 X.)c X.\" We delay defining the header so it won't show up on the first page ... X.oh 'LLNL MCAST Implementation notes''%' X.eh '%''LLNL MCAST Implementation notes' X.sp 3v X.pp This paper contains notes regarding the LLNL MCAST multicast communication software distribution. It covers the multicast abstraction, the API, the current implementation, thoughts about those, and our experience. X.sh 1 "The Abstraction" X.pp In this section we describe the abstraction that we present to multicast communications users. X.pp We use the ubiquitous X.i "multicast group" and X.i "multicast group membership" paradigm. 
Sets of applications which wish to communicate with each other X.i subscribe to a multicast group created for that purpose and send messages amongst themselves on the multicast group. Those messages may be either X.i broadcast X(all group members receive copies of the messages) or X.i unicast X(the message will only be received by one group member). This closely models Ethernet and leads into the next section of our abstraction outline ... X.sh 2 "Multicast groups and group members" X.pp The abstraction that we present is that a multicast group forms a virtual network which consists of some subset of a physical network. This virtual network consists of those hosts which are subscribed to the multicast group. A host would typically be subscribed to a multicast group because some application running on the host has requested to be subscribed to the group. X.pp XEvery multicast group has a multicast X.i "group address" unique from all other multicast groups. Additionally, every entity participating in communication on a multicast group is a X.i member of that group and has a unique X.i "member address" within the multicast group. X.pp A multicast group is uniquely identified by the conjunction of its group address and the transport protocol being used. Thus, the creation of a new group is basically equivalent to creating a new address. At this point, only two transport protocols are envisioned: unreliable datagram and reliable datagram. The receiving hosts may provide further presentation semantics via the access they provide applications to the received data, e.g. reliable, sequenced byte stream; unreliable, sequenced message; etc. X.sh 2 "Potential usage patterns" X.pp It is expected that multicast groups will be created on demand and then thrown away just as easily. One potential use is to treat multicast groups in much the same way as UNIX shells treat pipes: i.e. 
some process creates one or more multicast groups, forks one or more children, and passes the multicast groups to the children for use in communicating among themselves. X.pp Since multicast group creations and member subscriptions and unsubscriptions may be very dynamic, it must be possible to generate new group and new member addresses very easily. In fact, it must be possible to generate these new group and member addresses in a totally distributed manner. There is no room for negotiation, voting, or any other means of coordinating group and member address assignments. More information on this topic is contained below in a section on distributed address creation. X.pp It is also expected that well known groups and members will also be needed for services like distributed naming, service location, routing, and a variety of other applications. X.pp One of the most important services will be a distributed service that will allow other advertised services to be found. Those names will represent both long term and transient objects. For instance, an application may start up, create a multicast group, register its address with a well known name, operate for a while, and then terminate. X.pp While a distributed naming service lives above the multicast transport and therefore outside the main focus of our current work, it is important enough that it should be dealt with. At a minimum, reserved addresses, etc. will have to be set aside. Unfortunately, the current implementation does not address this issue at all. The current scheme to translate names to multicast group addresses is a complete hack designed merely to get things up. One possibility that seems attractive is to do something like the IP/Ethernet ARP and subnet mask discovery protocols where one simply broadcasts requests for resolution to a hardwired multicast group which all hosts subscribed to. However, many different schemes are possible and we just haven't taken the time to work on this yet. 
The need to handle wide area networking only makes the problem more difficult. X.sh 2 "Distributed address creation" X.pp We have implemented a scheme where addresses are formed by combining the creating host's ID, the host's idea of the current time, and an intra-time-tick sequence number to fulfill these requirements. This requires a rather large address space, but it's difficult to imagine any scheme that fulfills the distributed creation requirements that can do so in a small address space. X.pp As it turns out, many distributed applications need to be able to generate unique identifiers in a distributed manner. As such, we should have just invented a more generic distributed identifier. However, since we wanted to get on with our work and since the identifiers directly corresponded to our multicast addresses, we decided not to lose time working on a more general scheme. X.sh 2 "Distributed address size and mapping onto link layer multicast addresses" X.pp Our current implementation of the distributed address creation algorithm lives in the user's application so a process identifier must be added to further distinguish distributed addresses. This yields a four word distributed address of 128 bits on our 32 bit machines: X.(b X.ta \w'struct'u +\w'unsigned longXXXX'u +\w'sequenceXXXX'u struct mc_addr { /*MACHINE DEPENDENT */ X unsigned long host; /* host ID (probably too small) */ X unsigned long process; /* process ID */ X unsigned long time; /* time (definitely too small) */ X unsigned long sequence; /* sequence number (probably too small) */ X}; X.)b One could certainly put the distributed address creation algorithm in the host's kernel which would easily save one word and bring the address size down to 96 bits. It's conceivable that with sufficient effort, another word might eventually be shaved off without impacting the distributed creation requirements bringing the total address size down to 64 bits. 
And perhaps one could even take a certain risk of address collision and squeeze addresses down to 48 bits. X.pp But even with that incredible optimism, you would still be stuck with two facts: X.ip 1. XEven 48 bits exceeds the number of address bits available for multicasting in most network link layers. X.ip 2. The number of hosts on today's networks is increasing at an ever more rapid rate. The number of bits necessary to distinguish hosts is going to get bigger, not smaller. There is even a fair chance that host addresses may be variable length in the future ... X.pp Some might still claim that small fixed size multicast group and group member addresses are possible, but we remain skeptical until proven wrong. As such, we have decided to face the problem head on and just accept that multiple unique multicast groups will be mapped onto the same network link layer multicast addresses. Thus, network link layer multicast facilities will provide a first level filtering of multicast packets after which host kernels will have to sort through those packets making it through to see which ones are really destined for applications on the host. This is an unpleasant, but seemingly inescapable situation ... at least until network link layer multicast addresses become larger. One benefit of facing the problem and accepting its consequences is that it now becomes trivially easy to support multicast transport over link layer technologies which don't provide any multicast facilities. X.pp Note that while our multicast address contains information specific to the host the address was created on, the address, once created, isn't attached to the creating host in any way. A process could freely migrate to another host and continue to use an address created on another host. X.sh 2 "Packet format" X.pp At this point it's difficult to be very specific about the format of multicast transport protocol packets. 
The current implementation depends on a server for most of the transport semantics (see implementation section below). The current packet format has no provisions for error detection, error correction, encryption, options processing, control, status, etc. X.pp However, what is in the packet header is very important: a multicast group address, a source member address, and a destination member address. One of the most important aspects of this is that there is only one multicast group address. This leads to some interesting stretching of the Berkeley socket paradigm we chose to use for an application programming interface. See the section below on the application programming interface for enlightenment on this topic ... X.pp A broadcast to a multicast group is accomplished by using a destination address of MCADDR_BROADCAST. A unicast to a specified group member is handled by filling in the desired member address in the destination address field. The source address is always filled in with the sending member's group address. X.pp This last point leads to the conclusion that an application must be a member of a multicast group before it can send messages on that group. This is an undesirable feature according to Dr. Kenneth Birman in his latest writings as of winter 1992. We didn't allow this simply because we couldn't see the need and wanted to keep things simple. If it becomes necessary to support such unsubscribed sending, perhaps it could be accommodated by filling in the source address with MCADDR_ANY ... 
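As a rough illustration of the addressing rules described so far, the following C sketch builds a four-word address from a host ID, process ID, time, and intra-tick sequence number, and then decides whether a destination address selects a given member (broadcast or exact unicast match). It is not the distribution's code: sketch_new_addr, sketch_is_for_me, addr_equal, and the all-ones value used for SKETCH_BROADCAST are hypothetical names and values chosen for this example, since the real encoding of MCADDR_BROADCAST is not given in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Mirrors the four-word address layout shown in the notes. */
typedef struct mc_addr {
    unsigned long host;      /* host ID */
    unsigned long process;   /* process ID */
    unsigned long time;      /* creation time */
    unsigned long sequence;  /* sequence number within one time tick */
} mc_addr;

/* ASSUMPTION: all-ones stands in for MCADDR_BROADCAST in this sketch. */
static const mc_addr SKETCH_BROADCAST = { ~0UL, ~0UL, ~0UL, ~0UL };

static int addr_equal(const mc_addr *a, const mc_addr *b)
{
    return a->host == b->host && a->process == b->process &&
           a->time == b->time && a->sequence == b->sequence;
}

/*
 * Create a new address without coordinating with any other host.
 * The sequence number restarts whenever the clock tick changes, so two
 * addresses made in the same tick differ in "sequence" and two made in
 * different ticks differ in "time".
 */
static mc_addr sketch_new_addr(unsigned long host_id)
{
    static unsigned long last_time;  /* tick of the previous address */
    static unsigned long sequence;   /* next sequence number in that tick */
    unsigned long now = (unsigned long)time(NULL);
    mc_addr a;

    if (now != last_time) {
        last_time = now;
        sequence = 0;
    }
    a.host = host_id;
    a.process = (unsigned long)getpid();
    a.time = now;
    a.sequence = sequence++;
    return a;
}

/* A member receives broadcasts and unicasts addressed exactly to it. */
static int sketch_is_for_me(const mc_addr *dest, const mc_addr *self)
{
    assert(dest != NULL && self != NULL);
    return addr_equal(dest, &SKETCH_BROADCAST) || addr_equal(dest, self);
}
```

A caller would create its member address once with sketch_new_addr and then apply sketch_is_for_me to the destination field of each arriving header. The host ID is passed in as a parameter here because the notes do not say how the real implementation obtains it.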
X.pp The current client/server multicast transport protocol packet format is as follows: X.(b X.ta \w'struct'u +\w'struct mc_addrXXXX'u +\w'destinationXXXX'u struct mc_header { /* MACHINE DEPENDENT */ X unsigned int version:8; /* protocol version */ X unsigned int hlength:8; /* header length */ X unsigned int :16; /* pad to 32 bits */ X unsigned int length:32; /* length of header + data */ X struct mc_addr group; /* multicast group */ X struct mc_addr source; /* source of this message */ X struct mc_addr destination; /* destination of this message */ X /* message data follows this minimum header */ X}; X.)b Note that even the version and hlength fields serve very little purpose. They are generated and checked, but are always the same values for all packets. Their main use is for detecting client/server communications which have lost framing synchronization. It sure would be nice if transport protocols like TCP offered applications the ability to insert application framing marks!!!! X.sh 1 "The Application Programming Interface" X.pp In this section we describe the application programming interface we have provided to the abstraction outlined above. X.pp We chose to use the Berkeley socket interface for two simple reasons: X.ip 1. It's relatively simple (once you throw away the documentation) and widely known. We hoped this would speed up development of the interface and help people already experienced in developing network applications with Berkeley get started with multicast programming. X.ip 2. It seemed to fit without too much bending. The biggest problem came up in the area of binding naming information to multicast sockets. X.pp A complete Berkeley socket-style interface has been provided. It uses names like X.i mc_socket , X.i mc_bind , and X.i mc_close . These emulation routines take the same arguments as the native X.i socket , X.i bind , X.i close , etc. There are a few routines special to the API. Those routines' names start with X.i mcast_ . 
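The framing check on the version and hlength fields mentioned above for the packet header might look roughly like the C sketch below. The struct layouts are copied from the notes; SKETCH_VERSION, sketch_header_init, and sketch_header_ok are hypothetical names, and the rule that hlength equals the size of the minimum header is an assumption made for this example (the notes only say the two fields are always the same for all packets).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_VERSION 1u  /* ASSUMPTION: the real version number is not given */

typedef struct mc_addr {
    unsigned long host, process, time, sequence;
} mc_addr;

/* Minimum header, laid out as in the notes (machine dependent). */
typedef struct mc_header {
    unsigned int version:8;   /* protocol version */
    unsigned int hlength:8;   /* header length */
    unsigned int :16;         /* pad to 32 bits */
    unsigned int length:32;   /* length of header + data */
    mc_addr group;            /* multicast group */
    mc_addr source;           /* source of this message */
    mc_addr destination;      /* destination of this message */
} mc_header;

/* Fill in a header for a message carrying data_len bytes of data. */
static void sketch_header_init(mc_header *h, const mc_addr *group,
                               const mc_addr *source,
                               const mc_addr *destination,
                               unsigned int data_len)
{
    assert(h != NULL);
    memset(h, 0, sizeof *h);
    h->version = SKETCH_VERSION;
    h->hlength = (unsigned int)sizeof *h;
    h->length = (unsigned int)sizeof *h + data_len;
    h->group = *group;
    h->source = *source;
    h->destination = *destination;
}

/*
 * Return 1 if the header looks plausible, 0 if the byte stream has
 * probably lost framing synchronization.
 */
static int sketch_header_ok(const mc_header *h)
{
    assert(h != NULL);
    return h->version == SKETCH_VERSION &&
           (size_t)h->hlength == sizeof(mc_header) &&
           (size_t)h->length >= sizeof(mc_header);
}
```

A receiver reading headers off a TCP stream could call sketch_header_ok before trusting the length field, and drop the connection if the check fails.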
X.sh 2 "Multicast socket naming and the Berkeley socket paradigm" X.pp In traditional protocols supported under the Berkeley socket paradigm, a communicating socket is fully named by a protocol family, a transport protocol within that family, a local address part, and a remote address part. The local and remote address parts usually consist of the conjunction of some form of host identifier and a port address. X.pp This name is built up in pieces: the protocol family and transport protocol are specified at socket creation time, the local address part is partially or fully defined with a bind call, and finally the remote address part and perhaps some remaining unspecified portion of the local address part are defined with a connect or accept call. The local and remote address parts can also be specified obscurely via sendto calls, but we'll ignore that complication for now. X.pp This model of naming works pretty well for the multicast abstraction we've defined, but breaks down in two areas: X.ip 1. In multicast communication one doesn't tend to connect to or accept connections from multicast group peer members. One tends to subscribe to a multicast group and then communicate with other subscribers of that group. In our manual pages we sometimes refer to a multicast socket as being X.i connected when it's associated with a multicast group. This is probably a dreadful misnomer that's destined only to confuse people. X.ip 2. XFor multicast sockets, the concepts of protocol family, transport protocol, local address part, and remote address part work pretty well. The protocol family is PF_MCAST, the protocol is either reliable or unreliable, the local address part is the application's group member address, and the remote address part is either MCADDR_BROADCAST or a specific group member's address. But where is there room for the multicast group address in that naming??? 
X.pp The first problem points to the need of a symmetric naming concept in the Berkeley socket paradigm. We addressed this problem by allowing a bind X(mc_bind) call to put a multicast socket into a fully communicating state. Since a bind call gets both the multicast group and the member addresses in a single call (see below), it can fill in the only remaining name component, the remote part, with a default value of MCADDR_BROADCAST. X.pp XFor the second problem, if we look at the traditional Berkeley socket naming sequence, we can see that we bind symmetric name components to the socket at socket creation time, followed by asymmetric components in later bind, connect, and accept calls. In this abstraction, it's clear that the multicast group address is a symmetric naming component, but there's nowhere to specify symmetric components except in the socket creation call and nowhere in that call to specify per protocol information. X.pp We solved this problem by specifying the multicast group address in all of the bind (mc_bind), connect (mc_connect), and sendto (mc_sendto) calls. Once the multicast group address has been specified with any of these mechanisms, it may not be changed again. Thus, the multicast group address offered to all but the first call is superfluous. X.pp A much better approach might be to allow per protocol naming information to be specified at socket creation time. We didn't do this because we didn't want to step too far outside the Berkeley socket paradigm. Also, we weren't at all certain that it might make even more sense to have a completely different scheme of providing socket naming information. Options range from specifying all naming information at socket creation time (this is similar to the X.i open call) to allowing all information to be built up incrementally (this is similar to declaring a structure and then filling it in a piece at a time). 
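The set-once treatment of the multicast group address, together with the default remote part filled in by a bind, can be sketched in C as follows. This is only a model of the naming rules described above, not the library's code: sketch_sock, sketch_set_group, sketch_bind, sketch_connect, and the all-ones SKETCH_BROADCAST value are hypothetical, and rejecting a different group with EINVAL is an assumption (the notes say only that the group may not be changed once specified).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct mc_addr {
    unsigned long host, process, time, sequence;
} mc_addr;

/* ASSUMPTION: all-ones stands in for MCADDR_BROADCAST in this sketch. */
static const mc_addr SKETCH_BROADCAST = { ~0UL, ~0UL, ~0UL, ~0UL };

/* Hypothetical per-socket naming state. */
typedef struct sketch_sock {
    int     group_set;  /* nonzero once the group has been named */
    mc_addr group;      /* symmetric name component */
    mc_addr local;      /* this member's address */
    mc_addr remote;     /* broadcast or one peer's member address */
} sketch_sock;

static int addr_equal(const mc_addr *a, const mc_addr *b)
{
    return a->host == b->host && a->process == b->process &&
           a->time == b->time && a->sequence == b->sequence;
}

/* The first call records the group; later calls must name the same one. */
static int sketch_set_group(sketch_sock *s, const mc_addr *group)
{
    assert(s != NULL && group != NULL);
    if (!s->group_set) {
        s->group = *group;
        s->group_set = 1;
        return 0;
    }
    if (addr_equal(&s->group, group))
        return 0;
    errno = EINVAL;  /* ASSUMPTION: a mismatch is reported as an error */
    return -1;
}

/* bind-like step: group plus member address, remote defaults to broadcast. */
static int sketch_bind(sketch_sock *s, const mc_addr *group,
                       const mc_addr *member)
{
    if (sketch_set_group(s, group) < 0)
        return -1;
    s->local = *member;
    s->remote = SKETCH_BROADCAST;
    return 0;
}

/* connect-like step: narrow the remote part to one peer. */
static int sketch_connect(sketch_sock *s, const mc_addr *group,
                          const mc_addr *peer)
{
    if (sketch_set_group(s, group) < 0)
        return -1;
    s->remote = *peer;
    return 0;
}
```

After sketch_bind alone the socket is already fully named, which corresponds to the point above that a bind can put a multicast socket into a communicating state.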
X.pp Also note that this wouldn't be a problem if multicast transport packets contained a source and a destination multicast group. This could probably be done and make sense logically, but the routing issues could be frightening ... X.pp XFinally, regardless of the approach used to specify a socket's name components, it became obvious to us that a nice high level open-style routine is highly desirable. For our multicast sockets, we developed mcast_sopen. This simple routine allows applications to connect up to a multicast group and start communicating with just a few lines of code. We think that equivalent inet_sopen, ns_sopen, osi_sopen calls would be equally well received ... X.sh 1 "The Implementation" X.pp The current implementation of the multicast transport uses a communication server to emulate multicast message passing semantics. Clients connect to the server, subscribe to a multicast group, and then commence sending and receiving messages on that group. A Berkeley socket-style API (Application Programming Interface) is provided to hide the details of this interaction. X.pp One of the main reasons for developing the client/server implementation of the multicast communication system was to rapidly prototype and experiment with multicast communication abstractions and application programming interfaces (APIs). We also wanted to enable the rest of our project team to use multicast communication paradigms without having to wait for a final implementation. A later, more complete implementation is destined to take the place of this initial implementation [hopefully] transparently under the defined API. END_OF_FILE if test 19495 -ne `wc -c <'doc/notes.me'`; then echo shar: \"'doc/notes.me'\" unpacked with wrong size! 
fi # end of 'doc/notes.me' fi if test -f 'mcastd/mcastd.c' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'mcastd/mcastd.c'\" else echo shar: Extracting \"'mcastd/mcastd.c'\" $48856 characters$ sed "s/^X//" >'mcastd/mcastd.c' <<'END_OF_FILE' X/* X * $Header: /u0/casey/src/mcast/mcastd/RCS/mcastd.c,v 1.7 93/03/17 09:38:39 casey Exp $ X */ X X/* X * Copyright (c) 1992 The Regents of the University of California. X * All rights reserved. X * X * Redistribution and use in source and binary forms, with or without X * modification, are permitted provided that the following conditions X * are met: X * 1. Redistributions of source code must retain the above copyright X * notice, this list of conditions and the following disclaimer. X * 2. Redistributions in binary form must reproduce the above copyright X * notice, this list of conditions and the following disclaimer in the X * documentation and/or other materials provided with the distribution. X * 3. All advertising materials mentioning features or use of this software X * must display the following acknowledgement: X * This product includes software developed by the University of X * California, Lawrence Livermore National Laboratory and its X * contributors. X * 4. Neither the name of the University nor the names of its contributors X * may be used to endorse or promote products derived from this software X * without specific prior written permission. X * X * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND X * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE X * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE X * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE X * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL X * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS X * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) X * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT X * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY X * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF X * SUCH DAMAGE. X */ X X#ifndef lint static char rcsid[] = "$Header: /u0/casey/src/mcast/mcastd/RCS/mcastd.c,v 1.7 93/03/17 09:38:39 casey Exp $"; static char copyright[] = X "Copyright (c) 1992 The Regents of the University of California.\n" X "All rights reserved.\n"; static char classification[] = X "Unclassified\n"; X#endif X X X/* X * This is a multicast communication simulation server. It simulates the X * semantics of joining multicast groups and sending and receiving messages X * to and from a set of peer multicast group members. X * X * Clients connect to the server, subscribe to a group, and then commence X * sending and receiving messages on that group. A Berkeley socket-style X * API (Application Programming Interface) is provided to hide the details X * of this interaction. See the manual pages in ../man and the library X * API routines in ../libmcast for details on that API. X * X * Clients are only allowed to subscribe to one multicast group at a time. X * Thus, a process wishing to subscribe to more than one multicast group X * must open a separate connection to the server for each group. X * X * This server operates as a member of every multicast group with member X * address MCD_MCADDR. Messages to and from the server are used for X * connection administration. This includes things like subscribing members X * to multicast groups, implementing fcntl(2) style controls, etc. 
X */ X X/* X * Special note: X * X * Migration isn't supported in this implementation since migration would X * require extra mechanism and migration isn't currently one of the goals X * of our project. Thus we haven't spent the time to do it. X * X * To do client migration under this implementation a method would have to X * be developed to ``keep a client's place'' in a group's output queue X * during the client's migration. One way to do this would be to have X * clients use some form of explicit shutdown protocol with the server. A X * shutdown message could indicate that the connection should be closed and X * whether the client is ``done,'' or intends to reconnect momentarily. X * X * In the first case, any state associated with the client could then be X * freely dropped. In the second case, a reconnection timer could be X * started. If the client managed to reconnect before the timer expired, X * well and good. If not ... X * X * But this is all pie in the sky thinking and no efforts have been spent X * in any real design work. X */ X X X/* X * In general the compilation environment for this program is assumed to be X * ANSI C-1989 and POSIX 1003.1-1990 with Berkeley network extensions. No X * effort has been spent to make this compile under any other environment. X * Maybe later ... This program is known to compile under SGI IRIX 3.3.2. X */ X X/* X * ANSI C and POSIX includes X */ X#include <errno.h> X#include <fcntl.h> X#include <setjmp.h> X#include <signal.h> X#include <stdarg.h> X#include <stddef.h> X#include <stdio.h> X#include <stdlib.h> X#include <string.h> X#include <unistd.h> X#include <sys/types.h> X X#ifdef NEED_OFFSETOF X/* X * offsetof is supposed to be defined in <stddef.h> according to the ANSI C X * X3.159-1989 specification, but Sun OS 4.1.1 fails to define them ... 
X */ X#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) X#endif X X#ifdef NEED_EXIT_CODES X/* X * EXIT_SUCCESS and EXIT_FAILURE are supposed to be defined in <stdlib.h> X * according to the ANSI C X3.159-1989 specification, but Sun OS 4.1.1 X * fails to define them ... X */ X#define EXIT_SUCCESS 0 X#define EXIT_FAILURE 1 X#endif X X#ifdef NEED_STRERROR X/* X * strerror is supposed to be defined in <string.h> and supplied in the X * standard C library according to the ANSI C X3.159-1989 specification, X * but Sun OS 4.1.1 fails to define or supply it ... Unfortunately the only X * way we can control this is with an externally supplied define. X */ extern int errno; /* system error number */ extern char *sys_errlist[]; /* system error messages */ extern int sys_nerr; /* number of entries in sys_errlist */ X char * strerror(int err) X{ X if (err < 0 || err >= sys_nerr) { X static char msg[100]; X X sprintf(msg, "system error number %d", err); X return(msg); X } X return(sys_errlist[err]); X} X#endif X X X/* X * BSD networking includes X * X * We use sockets, TCP/IP, select (and therefore fd sets), syslog, getservent, X * and htonl and friends. X */ X#include <netdb.h> X#include <syslog.h> X#include <sys/uio.h> X#include <sys/socket.h> X#include <sys/time.h> X#include <netinet/in.h> X#include <arpa/inet.h> X X/* X * Multicast transport includes X */ X#include <netmcast/mcast.h> X#include <netmcast/mcastd.h> X X X/* X * Arguments X * ========= X */ char *usage = "usage: %s [-d] [-l loglevel] [-p port]\n"; X char *myname; /* name we were invoked by */ int debug; /* debug mode -- don't disconnect from tty */ int loglevel = LOG_NOTICE; /* default syslog logging level */ unsigned short port; /* TCP port to set up mcast service on */ X X X/* X * Queue management X * ================ X * X * Used for generic queues. Queue pointers usually point at Queue structures X * within the bodies of other structures. 
Queue pointers are translated to X * pointers to the enclosing base type via the macro baseof(). There is X * usually some Queue structure designated as the head which will be X * instantiated before any other queue elements and won't be destroyed until X * all other elements are dequeued and there is no further need for the queue. X */ typedef struct _Queue { X struct _Queue *next; /* next queue element */ X struct _Queue *prev; /* previous queue element */ X} Queue; X X X/* X * Message management X * ================== X * X * Messages are strung together on a queue off of the multicast group that they X * are destined for. X * X * The field "references" actually counts the number of clients that are X * pointing at the message or some predecessor in the queue. Thus, X * "references" is really a sum of the number of clients who are looking at or X * will look at the message. X * X * The mc_header "message" field *MUST* be last. When messages are read in X * the Message storage will be realloc()'ed to accommodate the message body X * which will be stored immediately following the header. X */ typedef struct _Message { X Queue queue; /* message queue */ X int references; /* number of "references" */ X mc_header msg; /* MUST BE LAST: the message itself */ X /* message data follows header */ X} Message; X X X/* X * Multicast group management X * ========================== X * X * Multicast groups are strung together on hash chains. Currently we only X * have one hash chain, but all the mechanism is in place to do hashing if the X * server starts slowing down on multicast group subscriptions because there X * are too many multicast groups to search through. Not likely, but it X * doesn't add much complexity to fake it in case it does become necessary. X * X * Clients subscribing to a multicast group are attached to the "clients" X * queue. The clients queue is traversed whenever a new message arrives X * destined for the group. 
Any client which is in an output idle state will X * be checked to determine if it wants to see the message. If so, its write X * buffer will be pointed at the newly attached message and its output X * interrupts will be turned back on (see "pending"). If this traversal starts X * to become a problem a second client chain can be set up to separate clients X * in "output" and "idle" states, but it's very unlikely that it will become X * a problem. X * X * "nclients" counts the number of clients subscribed to the multicast group. X * It's used primarily to initialize the reference count on any incoming X * message. A secondary use is to tell when there are no more clients X * subscribed to the group, but that could just as easily be done by checking X * to see if the "clients" queue is empty. X * X * "messages" is an ordered queue of all messages sent to the multicast group X * which have a non-zero "references" count. (See description of "Message" X * above.) X */ typedef struct _Group { X Queue hash; /* group id hash chain */ X mc_addr address; /* group multicast address */ X Queue clients; /* group clients queue head */ X int nclients; /* number of clients */ X Queue messages; /* group messages queue head */ X} Group; X X#define GHASHSIZE 1 /* only one hash bucket for now */ Queue groups[GHASHSIZE]; /* group hash bucket heads */ X X#define ghash(gaddrp) &groups[0] /* mcast group address to hash chain */ X X X/* X * Client management X * ================= X * X * The clients of a multicast group are strung together via the "peers" queue X * and linked into the multicast group's "clients" field. X * X * "group" points back at the client's subscribed multicast group. Its X * primary use is to detect the heads of the "peers" and "messages" queues X * as &group->clients and &group->messages. It's also used to clean up group X * state when a client unsubscribes from the multicast group. X * X * The "fd" field contains the file descriptor the client is on. 
It's used to X * translate from a client structure pointer to its index in the "clients" X * array. X * X * The "flags" field is used to keep track of special client state. Currently X * this is only used to tell whether the client wants its own multicast X * messages looped back to it and whether the client wants to operate in a X * ``send only'' mode (though messages directed explicitly at the client will X * still be delivered -- see ClientWantsMessage). The default is not to send X * a client's messages back to it and to deliver general group messages. X * X * The fields "rbuf" and "wbuf" are used to keep track of the client's message X * I/O. X * X * "rbuf.message" is either NULL or points at a message that's in the process X * of being read from the client. A Message in the process of being read in X * will always have its "queue" linked to itself and its "references" count set X * to 1. This is not necessary, but it means that messages are always in a X * consistent state and applying FreeMessage() will be meaningful. The X * client's receive buffer is the *ONLY* reference to a message being received. X * Thus it is always safe to simply free() it. X * X * "wbuf.message" is either NULL or points at a message in the group's message X * queue. If wbuf.message is non-NULL, the client holds a reference to that X * message and all following messages. X * X * The "offset" subfield in each of "rbuf" and "wbuf" is used to keep track of X * the current I/O offset within the message.
X */ typedef struct _MIO { /* client message I/O information */ X Message *message; /* message being dealt with */ X int offset; /* I/O offset into message->msg */ X} MIO; X typedef struct _Client { X Queue peers; /* multicast group peers */ X Group *group; /* multicast group */ X mc_addr local; /* client's local multicast address */ X mc_addr remote; /* client's remote connected address */ X int fd; /* client's file descriptor */ X int flags; /* control information (see below) */ X MIO rbuf; /* input message control */ X MIO wbuf; /* output message control */ X} Client; X X/* flags */ X#define CLIENT_CONNECTED 0x0001 /* client connected to another client */ X#define CLIENT_LOOPBACK 0x0002 /* client sees its own messages */ X#define CLIENT_SHUTDOWNRECV 0x0004 /* client doesn't want to receive */ X X#define MAX_CLIENTS FD_SETSIZE /* we use select! */ Client *clients[MAX_CLIENTS]; /* multicast clients */ X X X/* X * Server State and Connection management X * ====================================== X * X * dead_client and server_reset are used with sigsetjmp/siglongjmp, so X * they must be of type sigjmp_buf rather than jmp_buf. X */ int mcastd; /* incoming connection socket */ fd_set connections; /* mcastd + all client connections */ fd_set pending; /* all clients with output pending */ int max_connection; /* maximum fd in connections */ sigjmp_buf dead_client; /* where to go if a client dies */ sigjmp_buf server_reset; /* where to go if we receive a reset */ X X#ifndef MCD_BUFSIZ X#define MCD_BUFSIZ (32*1024) /* send/receive buffer size to attempt X to set on all client connections */ X X#endif X X X/* X * Internal routines X * ================= X */ int main(int argc, char *argv[]); /* main program */ void BecomeDaemon(void); /* fork and disconnect from tty */ void SetupServerSocket(void); /* just what it says ... 
*/ void ShutdownServer(void); /* shutdown all server operations */ void Server(void); /* main server loop */ void GetConnection(void); /* accept connection on mcastd */ void CloseConnection(int fd); /* close down client connection */ void ReadFromClient(int fd); /* read data from client */ void WriteToClient(int fd); /* write pending output to client */ void ProcessClientRbuf(int fd); /* deal with a newly read message */ void HandleClientRequest(int fd, Message *mp); X /* handle client request to server */ void QueueClientMessage(int fd, Message *mp); X /* queue client message to group */ int Subscribe(int fd, const mc_addr *addr); X /* add client to multicast group */ void Unsubscribe(int fd); /* delete client from current group */ int ClientWantsMessage(const Client *cp, const Message *mp); X /* return TRUE if the message should be X sent to the client */ void FreeMessage(Message *mp); /* drop a reference to message */ void Log(int level, const char *fmt, ...); X /* internal log routine */ void SigDeadClient(int sig); /* dead client catcher ... */ void SigReset(int sig); /* reset signal catcher */ void SigShutdown(int sig); /* termination signal catcher */ int FD_FFS(const fd_set *fds); /* first fd_set bit set */ X X X/* X * Miscellaneous macros, etc. X * ========================== X */ X X/* X * Test run-time self consistency assertion. If assertion expression returns X * zero, log error and abort execution. We can't use the ANSI C assert macro X * because we won't have stderr if we fork off as a daemon. X * X * (The check for __STDC__ is obnoxious because we *assume* ANSI C X * everywhere else. It's needed -- hopefully temporarily -- because IRIX X * 3.3.2 offers an almost, but not quite, full ANSI C compiler. They X * completely fell down with respect to the default cpp and their standard X * header files aren't set up to be processed by their ANSI C acpp. X * (sigh)) ... 
X * X * later that ``night'', SGI came out with their next brain damage in IRIX X * 4.0.*: an ANSI standard CPP is now the default, but __STDC__ isn't. X * They view that __STDC__ should only be defined if one is absolutely and X * mind-bogglingly, to the letter standard with no extensions -- regardless X * of whether those extensions affect the semantics of the standard X * construct subsets or not. (sigh) Sometimes you just want to hit people X * and tell them ``No. That's not it. Try again.'' For now we use an SGI X * specific define __ANSI_CPP__ which they define when an ANSI cpp is being X * used ... X */ X#ifdef NDEBUG X#define Assert(e) ((void)0) X#else X#if defined(__STDC__) || (defined(sgi) && defined(__ANSI_CPP__)) X#define Assert(e) \ X if (!(e)) { \ X Log(LOG_EMERG, "assertion failed: file %s, line %d: %s", \ X __FILE__, __LINE__, #e); \ X (void)abort(); \ X } X#else X#define Assert(e) \ X if (!(e)) { \ X Log(LOG_EMERG, "assertion failed: file %s, line %d: %s", \ X __FILE__, __LINE__, "e"); \ X (void)abort(); \ X } X#endif X#endif X X/* X * Return a pointer to the containing base type given a pointer to one of X * the base type's fields. X */ X#define baseof(type, field, fieldp) \ X (type *)(void *)((char *)(fieldp) - offsetof(type, field)) X X/* X * The Program ... X * =============== X */ X int main(int argc, char *argv[]) X /* X * Process command line arguments, set up server socket, initialize X * various data structures, and enter main server loop. X */ X{ X int ch, i; X extern int getopt(int, char **, char *); X extern char *optarg; X extern int optind; X X /* X * Parse command line arguments. 
X */ X myname = strrchr(argv[0], '/'); X if (myname != NULL) X myname++; X else X myname = argv[0]; X while ((ch = getopt(argc, argv, "dl:p:")) != EOF) X switch ((char)ch) { X case '?': X (void)fprintf(stderr, usage, myname); X exit(EXIT_FAILURE); X /*NOTREACHED*/ X case 'd': X debug = 1; X break; X case 'l': X loglevel = atoi(optarg); X break; X case 'p': X port = atoi(optarg); X break; X } X X /* X * Set up ``catch'' for SIGHUP server reset handler. X */ X if (sigsetjmp(server_reset, 1) == 0) { X struct sigaction sig; X X sig.sa_handler = SigReset; X (void)sigemptyset(&sig.sa_mask); X sig.sa_flags = 0; X (void)sigaction(SIGHUP, &sig, NULL); X } else X Log(LOG_EMERG, "reset: restarting operations"); X X /* X * Start up server. X */ X if (!debug) X BecomeDaemon(); X SetupServerSocket(); X FD_ZERO(&connections); X FD_ZERO(&pending); X FD_SET(mcastd, &connections); X max_connection = mcastd; X for (i = 0; i < GHASHSIZE; i++) { X groups[i].next = &groups[i]; X groups[i].prev = &groups[i]; X } X for (i = 0; i < MAX_CLIENTS; i++) { X clients[i] = NULL; X } X Server(); X /*NOTREACHED*/ X} X X void BecomeDaemon(void) X /* X * Do all the standard stuff necessary to become a daemon. X * Fundamentally this consists of forking, the parent exiting, and the X * child disassociating itself from the tty and setting up for logging X * messages to syslogd. The demonic child returns to take on the X * duties of the daemon. X */ X{ X int pid; X X pid = fork(); X if (pid < 0) { X perror(myname); X exit(EXIT_FAILURE); X /*NOTREACHED*/ X } X if (pid) { X exit(EXIT_SUCCESS); X /*NOTREACHED*/ X } X (void)close(0); X (void)close(1); X (void)close(2); X (void)setsid(); X openlog(myname, LOG_PID|LOG_CONS|LOG_NDELAY, LOG_DAEMON); X (void)setlogmask(LOG_UPTO(loglevel)); X} X X void SetupServerSocket(void) X /* X * Set up multicast daemon server socket. If no port was specified on X * the command line, look for service MCD_SERVICE_NAME/"tcp" in the X * services table.
Fall back to MCD_SERVICE_PORT if service is not X * registered. X */ X{ X int s; X int on; X struct sockaddr_in server; X X /* start from a clean address so unused fields aren't garbage */ X (void)memset(&server, 0, sizeof(server)); X X /* figure out what port we're going to use */ X if (port) X server.sin_port = htons(port); X else { X char *service_name; X struct servent *sp; X extern struct servent *getservbyname(const char *, const char *); X X service_name = getenv(MCD_SERVICE_ENV); X if (service_name == NULL) X service_name = MCD_SERVICE_NAME; X sp = getservbyname(service_name, "tcp"); X if (sp != NULL) X server.sin_port = sp->s_port; X else { X server.sin_port = htons(MCD_SERVICE_PORT); X Log(LOG_WARNING, "can't find \"" MCD_SERVICE_NAME "\"" X " in services table; using %d", MCD_SERVICE_PORT); X } X } X X /* before we take off, see if there's already a server running ... */ X s = socket(PF_INET, SOCK_STREAM, 0); X if (s < 0) { X Log(LOG_ERR, "exit: %s", strerror(errno)); X exit(EXIT_FAILURE); X /*NOTREACHED*/ X } X server.sin_family = AF_INET; X /* INADDR_LOOPBACK is in host byte order; s_addr wants network order */ X server.sin_addr.s_addr = htonl(INADDR_LOOPBACK); X if (connect(s, (struct sockaddr *)&server, sizeof(server)) >= 0) { X Log(LOG_EMERG, "exit: found a server registered at %d/tcp;\n" X "\tversion of %s already running?", X ntohs(server.sin_port), myname); X (void)close(s); X exit(EXIT_FAILURE); X /*NOTREACHED*/ X } X if (errno != ECONNREFUSED) { X Log(LOG_WARNING, "got weird server exclusion result: %s", X strerror(errno)); X } X /* done with the probe socket */ X (void)close(s); X X /* we appear to be alone ... */ X on = 1; X server.sin_addr.s_addr = htonl(INADDR_ANY); X if ((mcastd = socket(PF_INET, SOCK_STREAM, 0)) < 0 X || setsockopt(mcastd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 X || bind(mcastd, (struct sockaddr *)&server, sizeof(server)) < 0) { X Log(LOG_ERR, "exit: %s", strerror(errno)); X exit(EXIT_FAILURE); X /*NOTREACHED*/ X } X (void)listen(mcastd, 5); X} X X void ShutdownServer(void) X /* X * Gracefully shut down all server operations.
X */ X{ X int fd; X X (void)close(mcastd); X FD_CLR(mcastd, &connections); X while ((fd = FD_FFS(&connections)) >= 0) { X (void)shutdown(fd, 2); X (void)close(fd); X FD_CLR(fd, &connections); X } X} X X void Server(void) X /* X * Main server loop. Loops forever accepting connection requests, X * reading data from clients and writing data to clients. X */ X{ X int fd, nfds; X fd_set readfds, writefds; X struct sigaction sig; X#ifndef _AIX X /* X * Patch around problem in IBM's AIX 3.2 where they define select X * with void pointers in <sys/select.h>. X */ X extern int select(int, fd_set *, fd_set *, fd_set *, struct timeval *); X#endif X X Log(LOG_INFO, "server started"); X sig.sa_handler = SigShutdown; X (void)sigemptyset(&sig.sa_mask); X sig.sa_flags = 0; X (void)sigaction(SIGTERM, &sig, NULL); X for (;;) { X readfds = connections; X writefds = pending; X nfds = select(max_connection+1, X &readfds, &writefds, (fd_set *)0, X (struct timeval *)0); X if (nfds < 0) { X Log(LOG_ERR, "%s", strerror(errno)); X (void)sleep(5); X continue; X } X if (nfds == 0) { X Log(LOG_WARNING, "select returned 0"); X (void)sleep(5); X continue; X } X X /* X * Handle incoming connection requests. X */ X if (FD_ISSET(mcastd, &readfds)) { X GetConnection(); X FD_CLR(mcastd, &readfds); X } X /* X * Handle all incoming data. X */ X while ((fd = FD_FFS(&readfds)) >= 0) { X ReadFromClient(fd); X FD_CLR(fd, &readfds); X if (!FD_ISSET(fd, &connections)) { X /* lost the connection during the read ... */ X FD_CLR(fd, &writefds); X } X } X /* X * Now handle clients ready to receive data. A client may have X * had its output cancelled while we were handling reads (e.g. it X * unsubscribed from its group), so recheck "pending" first. X */ X while ((fd = FD_FFS(&writefds)) >= 0) { X if (FD_ISSET(fd, &pending)) X WriteToClient(fd); X FD_CLR(fd, &writefds); X } X } X /*NOTREACHED*/ X} X X void GetConnection(void) X /* X * A client is knocking on our door. Accept the connection request and X * set up all associated client state.
X */ X{ X int fd; X struct sockaddr_in client; X int client_len, opt, optlen; X Client *cp; X X client_len = sizeof(client); X fd = accept(mcastd, &client, &client_len); X if (fd < 0) { X Log(LOG_ERR, "accept: %s", strerror(errno)); X return; X } X /* X * When logging the host that the incoming connection is coming from X * we don't want to use gethostbyaddr and gethostbyname because they X * can hang in some environments which would cause the server to hang. X * If it becomes important to put the hostname in the log an alarm X * could be set or a child could be forked. And don't forget to do X * gethostbyname(gethostbyaddr(addr)) to avoid security hack name X * spoofing! X */ X#ifdef BROKEN_INET_NTOA X Log(LOG_DEBUG, "fd %d: connection request from %s", fd, X inet_ntoa(&client.sin_addr)); X#else X Log(LOG_DEBUG, "fd %d: connection request from %s", fd, X inet_ntoa(client.sin_addr)); X#endif X if (fd >= MAX_CLIENTS) { X /* don't laugh -- this can actually happen under 4.4BSD */ X Log(LOG_ALERT, "fd %d: client fd too big -- dropping" X " connection", fd); X (void)close(fd); X return; X } X X /* X * Set up internal data structures for new client. X */ X cp = (Client *)malloc(sizeof(Client)); X if (cp == NULL) { X Log(LOG_ALERT, "fd %d: unable to allocate storage for new" X " client -- dropping connection", fd); X (void)close(fd); X return; X } X cp->peers.next = &cp->peers; X cp->peers.prev = &cp->peers; X (void)memset(&cp->local, 0, sizeof(cp->local)); X cp->group = NULL; X cp->local = MCADDR_ANY; X cp->remote = MCADDR_ANY; X cp->fd = fd; X cp->flags = 0; X cp->rbuf.message = NULL; X cp->rbuf.offset = 0; X cp->wbuf.message = NULL; X cp->wbuf.offset = 0; X clients[fd] = cp; X FD_SET(fd, &connections); X if (fd > max_connection) X max_connection = fd; X X /* X * Set the send and receive buffer sizes to MCD_BUFSIZ. 
X */ X opt = MCD_BUFSIZ; X if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &opt, sizeof(opt)) < 0) { X optlen = sizeof(opt); X (void)getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &opt, &optlen); X Log(LOG_WARNING, "fd %d: unable to set send buffer size to %d," X " using %d", fd, MCD_BUFSIZ, opt); X } X opt = MCD_BUFSIZ; X if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &opt, sizeof(opt)) < 0) { X optlen = sizeof(opt); X (void)getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &opt, &optlen); X Log(LOG_WARNING, "fd %d: unable to set receive buffer size to" X " %d, using %d", fd, MCD_BUFSIZ, opt); X } X X /* X * Use non-blocking I/O to increase throughput, lower latency and avoid X * deadlocks. This primarily affects reads and writes of very large X * messages and/or writes to a slow receiver. A write to a file X * descriptor with only a small amount of output space available could X * block waiting for sufficient output space to become available. X * Using non-blocking I/O we only write what can be accommodated and X * the remainder will have to wait for more output space to open up. X * This improves throughput performance because the multiplexing server X * isn't stuck waiting on a particular client when it could be working X * on something else. It prevents deadlocks because the multiplexing X * server won't be waiting for a write to a client to finish while that X * client may be waiting for a write to the multiplexing server to X * finish ... 
X */ X if (fcntl(fd, F_SETFL, O_NONBLOCK)) X Log(LOG_WARNING, "fd %d: unable to set non-blocking mode --" X " suboptimal performance and deadlocks possible", fd); X X#ifdef BROKEN_INET_NTOA X Log(LOG_INFO, "fd %d: connection from %s established --" X " connections[0] = %#x, max = %d", fd, inet_ntoa(&client.sin_addr), X connections.fds_bits[0], max_connection); X#else X Log(LOG_INFO, "fd %d: connection from %s established --" X " connections[0] = %#x, max = %d", fd, inet_ntoa(client.sin_addr), X connections.fds_bits[0], max_connection); X#endif X} X X void CloseConnection(int fd) X /* X * A client went away. Clean up all associated state. X */ X{ X Client *cp = clients[fd]; X X Log(LOG_INFO, "fd %d: shutting down connection", fd); X (void)shutdown(fd, 2); X (void)close(fd); X FD_CLR(fd, &connections); X FD_CLR(fd, &pending); X if (fd == max_connection) { X int i; X X for (i = max_connection-1; i >= 0; i--) X if (FD_ISSET(i, &connections)) X break; X max_connection = i; X } X X /* clean up group state associated with client */ X if (cp->group != NULL) X Unsubscribe(fd); X X clients[fd] = NULL; X free(cp); X} X X void ReadFromClient(int fd) X /* X * Read data from a client. This may involve continuing reading a X * message already partially received or starting on a new message if X * none is in progress. If we complete a message, we get to hand it X * off to ProcessClientRbuf ... X * X * Note that we never read more than one message's worth of data. This X * may be a feature or a bug. It's a feature if you don't want one X * client hogging the multicast server. It's a bug if you can't get X * the throughput you need because the multicast server is effectively X * polling at the message level ... 
X */ X{ X Client *cp = clients[fd]; X Message *mp = cp->rbuf.message; X void *buf; X int len, nread; X X /* if not currently reading a message, start on a new one */ X if (mp == NULL) { X mp = (Message *)malloc(sizeof(Message)); X if (mp == NULL) { X Log(LOG_ALERT, "fd %d: unable to allocate storage for" X " incoming message -- dropping connection!!", fd); X CloseConnection(fd); X return; X } X mp->queue.next = &mp->queue; X mp->queue.prev = &mp->queue; X mp->references = 1; X cp->rbuf.offset = 0; X cp->rbuf.message = mp; X } X X /* read chunk of message in */ X buf = (char *)&mp->msg + cp->rbuf.offset; X if (cp->rbuf.offset < MCHEADER_CORE) { X /* reading in core message header (which includes "length") */ X len = MCHEADER_CORE - cp->rbuf.offset; X } else { X /* header read in -- safe to use "length" field in header */ X len = ntohl(mp->msg.length) - cp->rbuf.offset; X } X Assert(len > 0); X nread = read(fd, buf, len); X X /* find out what we actually got */ X if (nread < 0) { X /* X * The socket is non-blocking, so "no data yet" is routine X * (especially on the speculative re-read below) and isn't X * worth logging as an error. X */ X if (errno != EAGAIN && errno != EINTR) X Log(LOG_ERR, "fd %d: read failed: %s", fd, X strerror(errno)); X return; X } X if (nread == 0) { X Log(LOG_INFO, "fd %d: read returned 0 -- peer dead", fd); X CloseConnection(fd); X return; X } X cp->rbuf.offset += nread; X Log(LOG_DEBUG, "fd %d: read %d bytes of message, %d requested", fd, X nread, len); X if (nread < len) { X /* still more to read for the message */ X return; X } X ProcessClientRbuf(fd); X /* X * If the receive buffer still has a message attached to it, that X * means that ProcessClientRbuf enlarged the message from just a X * core message header to a message header followed by message body. X * We may as well see if there's more to read ... X * X * ProcessClientRbuf may have closed the connection (and freed the X * client), so make sure the client still exists before touching it. X */ X if (clients[fd] == cp && cp->rbuf.message != NULL) X ReadFromClient(fd); X} X X void WriteToClient(int fd) X /* X * Write data to a client. This may involve continuing to write a X * message already partially transmitted or starting on a new message X * if none is in progress.
If we complete sending a message, we drop X * our reference to it and move on to the next in the queue. If there X * are no more messages to send to the client, we drop the client from X * the pending output ``list.'' X */ X{ X Client *cp = clients[fd]; X Message *mp = cp->wbuf.message; X void *buf; X int len, nwrite; X struct sigaction sig, osig; X X /* set up to catch a SIGPIPE in case peer has died */ X if (sigsetjmp(dead_client, 1) != 0) { X /* note: osig will be set by the time this is executed */ X (void) sigaction(SIGPIPE, &osig, (struct sigaction *)0); X Log(LOG_INFO, "fd %d: write got SIGPIPE -- peer died", fd); X CloseConnection(fd); X return; X } X sig.sa_handler = SigDeadClient; X (void)sigemptyset(&sig.sa_mask); X sig.sa_flags = 0; X (void)sigaction(SIGPIPE, &sig, &osig); X X /* write next chunk of message out */ X Assert(mp != NULL); X buf = (char *)&mp->msg + cp->wbuf.offset; X len = ntohl(mp->msg.length) - cp->wbuf.offset; X Assert(len > 0); X nwrite = write(fd, buf, len); X X /* reset SIGPIPE signal state and find out what the write did */ X (void)sigaction(SIGPIPE, &osig, (struct sigaction *)0); X if (nwrite < 0) { X Log(LOG_ERR, "fd %d: write failed: %s", fd, strerror(errno)); X return; X } X if (nwrite == 0) { X Log(LOG_WARNING, "fd %d: write returned 0", fd); X return; X } X if (nwrite < len) { X /* still more to write on the message */ X cp->wbuf.offset += nwrite; X Log(LOG_DEBUG, "fd %d: wrote %d bytes of message", fd, nwrite); X return; X } X X /* finished writing message -- go on to next */ X Log(LOG_INFO, "fd %d: finished writing %d byte message", fd, X ntohl(mp->msg.length)); X cp->wbuf.offset = 0; X for (;;) { X Queue *mqp = mp->queue.next; X X FreeMessage(mp); X if (mqp == &cp->group->messages) { X /* that was the last message in the queue */ X cp->wbuf.message = NULL; X FD_CLR(fd, &pending); X break; X } X mp = baseof(Message, queue, mqp); X if (ClientWantsMessage(cp, mp)) { X cp->wbuf.message = mp; X break; X } X } X} X X void 
ProcessClientRbuf(int fd) X /* X * Process a newly received client message. ProcessClientRbuf is X * called when the current expected message length has been read in. X * If that length only covers the core message header, X * ProcessClientRbuf will delve into the message header to determine X * how large the entire message is. It will then reallocate the X * message storage to cover the message and leave the message connected X * to the client's receive buffer for further input. X * X * If the message is complete, ProcessClientRbuf will disconnect the X * message from the client's receive buffer and dispatch it to either X * HandleClientRequest or QueueClientMessage depending on whether the X * message was or was not directed at the server (us). X */ X{ X Client *cp = clients[fd]; X Message *mp = cp->rbuf.message; X X /* X * If just finished reading core message header, set up for reading X * remaining header and body. X */ X if (cp->rbuf.offset == MCHEADER_CORE) { X unsigned int X ver = mp->msg.version, X hlen = mp->msg.hlength, X len = ntohl(mp->msg.length); X X if (ver != MCVERSION X || hlen < MCHEADER_MIN || hlen > MCHEADER_MAX X || len < hlen) { X Log(LOG_ERR, "fd %d: malformed message header --" X " dropping message", fd); X cp->rbuf.message = NULL; X cp->rbuf.offset = 0; X free(mp); X return; X } X mp = (Message *)realloc(mp, offsetof(Message, msg) + len); X if (mp == NULL) { X Log(LOG_ALERT, "fd %d: unable to allocate storage for" X " message body -- dropping connection!!", fd); X CloseConnection(fd); X return; X } X cp->rbuf.message = mp; X return; X } X X /* X * Message complete -- disconnect from receive buffer and process. 
X */ X cp->rbuf.offset = 0; X cp->rbuf.message = NULL; X Log(LOG_INFO, "fd %d: finished reading %d byte message", fd, X ntohl(mp->msg.length)); X X if (MCADDR_EQ(mp->msg.destination, MCD_MCADDR)) X HandleClientRequest(fd, mp); X else X QueueClientMessage(fd, mp); X} X X void HandleClientRequest(int fd, Message *mp) X /* X * Handle a request from the client to the server (us). Message is X * freed after being processed. Most of these reflect what would be the X * equivalents of ioctl's, bind's, and other local network and link X * layer operations if the multicast protocol were implemented with X * network and link layer multicast facilities instead of a server as X * it is in this implementation. X */ X{ X Client *cp = clients[fd]; X unsigned long len X = ntohl(mp->msg.length) - mp->msg.hlength; X mcd_message *sp X = (mcd_message *)(void *)((char *)&mp->msg + mp->msg.hlength); X X if (len == 0) { X Log(LOG_WARNING, "fd %d: zero length server request --" X " dropping message", fd); X free(mp); X return; X } X while (len > 0) { X if (len < sizeof(mcd_message)) { X Log(LOG_ERR, "fd %d: incomplete server request --" X " dropping message", fd); X free(mp); X return; X } X switch (ntohl(sp->request)) { X default: X Log(LOG_ERR, "fd %d: bogus request %d -- dropping" X " request", fd, ntohl(sp->request)); X break; X X case MCD_BIND: X if (MCADDR_EQ(sp->group, MCADDR_ANY) X && MCADDR_EQ(sp->local, MCADDR_ANY) X && MCADDR_EQ(sp->remote, MCADDR_ANY)) { X Log(LOG_INFO, "fd %d: unbinding addresses", fd); X if (cp->group) X Unsubscribe(fd); X cp->local = MCADDR_ANY; X cp->remote = MCADDR_ANY; X cp->flags &= ~CLIENT_CONNECTED; X break; X } X if (MCADDR_EQ(sp->group, MCADDR_ANY) X || MCADDR_EQ(sp->group, MCADDR_BROADCAST) X || MCADDR_EQ(sp->local, MCADDR_ANY) X || MCADDR_EQ(sp->local, MCADDR_BROADCAST) X || MCADDR_EQ(sp->remote, MCADDR_ANY)) { X Log(LOG_ERR, "fd %d: bad MCD_BIND request" X " -- dropping request", fd); X break; X } X Log(LOG_INFO, "fd %d: binding addresses", fd); X 
if (cp->group) X Unsubscribe(fd); X (void)Subscribe(fd, &sp->group); X cp->local = sp->local; X cp->remote = sp->remote; X if (MCADDR_EQ(cp->remote, MCADDR_BROADCAST)) X cp->flags &= ~CLIENT_CONNECTED; X else X cp->flags |= CLIENT_CONNECTED; X break; X X case MCD_LOOPBACK: X if (ntohl(sp->option) != 0) X cp->flags |= CLIENT_LOOPBACK; X else X cp->flags &= ~CLIENT_LOOPBACK; X Log(LOG_INFO, "fd %d: set client loopback %s", fd, X ntohl(sp->option) ? "on" : "off"); X break; X X case MCD_SHUTDOWNRECV: X if (ntohl(sp->option) != 0) X cp->flags |= CLIENT_SHUTDOWNRECV; X else X cp->flags &= ~CLIENT_SHUTDOWNRECV; X Log(LOG_INFO, "fd %d: set client sendonly %s", fd, X ntohl(sp->option) ? "on" : "off"); X break; X } X sp++; X len -= sizeof(mcd_message); X } X free(mp); X} X X void QueueClientMessage(int fd, Message *mp) X /* X * Queue up a message from a client to its multicast group. Under X * various circumstances, the message may simply be freed. Specific X * cases include bad message destination address and no clients X * qualified to receive the message. X */ X{ X Group *gp = clients[fd]->group; X Queue *cqp; X X if (gp == NULL) { X Log(LOG_ERR, "fd %d: unregistered client attempting to" X " transmit -- dropping message", fd); X free(mp); X return; X } X if (!MCADDR_EQ(mp->msg.group, gp->address)) { X Log(LOG_ERR, "fd %d: client sending message to wrong" X " group -- dropping message", fd); X free(mp); X return; X } X X /* start by assuming that all clients will want this message */ X mp->references = gp->nclients; X X /* add message to end of group message queue */ X mp->queue.next = &gp->messages; X mp->queue.prev = gp->messages.prev; X mp->queue.prev->next = &mp->queue; X mp->queue.next->prev = &mp->queue; X X /* restart output on any idling clients that want the message */ X for (cqp = gp->clients.next; cqp != &gp->clients; cqp = cqp->next) { X Client *pp = baseof(Client, peers, cqp); X X if (pp->wbuf.message == NULL) { X /* X * DON'T PULL THIS OPTIMIZATION OUT!
If you do, X * you'll have to change the structure of X * WriteToClient since it assumes on entry that any X * message in its output buffer should be sent to the X * client. Note that this would also lead to a X * probably undesirable scenario where the client X * sends a LOOPBACK request and gets copies of its own X * messages sent prior to the LOOPBACK request ... X * X * Also, checking for the condition here avoids waking X * up uselessly to deliver a message only to find out X * the client doesn't want it. There are several X * common use patterns of multicast groups where this X * is a win. X */ X if (ClientWantsMessage(pp, mp)) { X pp->wbuf.offset = 0; X pp->wbuf.message = mp; X FD_SET(pp->fd, &pending); X Log(LOG_DEBUG, "fd %d: restarted output on %d", X fd, pp->fd); X } else { X /* this client will never see it */ X mp->references--; X Assert(mp->references >= 0); X Log(LOG_DEBUG, "fd %d: not restarting output" X " on %d", fd, pp->fd); X } X } X } X if (mp->references == 0) { X /* no one wanted the message -- dequeue and free message */ X mp->queue.prev->next = mp->queue.next; X mp->queue.next->prev = mp->queue.prev; X free(mp); X Log(LOG_DEBUG, "fd %d: no clients wanted message -- message" X " dropped", fd); X } X} X X int Subscribe(int fd, const mc_addr *addr) X /* X * Subscribe client to multicast group with multicast address "addr." X * If client is already subscribed to another group, it will be X * unsubscribed first. X * X * If successful, 1 will be returned and the client's group state will X * be initialized to be a client of the requested group. State X * particular to the client will not be touched. Thus, the client's X * address and flags state will remain unchanged. X * X * If unsuccessful, a 0 will be returned and no state will have been X * changed. The only reason for a failing return is inability to X * allocate storage for a new group control block.
It's almost certain X * that any calling party will be forced to drop the client's X * connection because there just isn't much that can be done, but we'll X * let our caller decide what to do ... X */ X{ X Client *cp = clients[fd]; X Group *gp; X Queue *ghqp = ghash(addr); X Queue *gqp; X X /* X * Find group with address addr. X */ X for (gqp = ghqp->next; /* void */; gqp = gqp->next) { X if (gqp == ghqp) { X /* X * Reached end of hash chain, so this is the first X * client to request subscription to the group -- X * instantiate the group. X */ X gp = (Group *)malloc(sizeof(Group)); X if (gp == NULL) { X Log(LOG_ALERT, "fd %d: unable to allocate" X " storage for new group -- dropping" X " connection!", fd); X return(0); X } X Log(LOG_INFO, "fd %d: created new group", fd); X X /* X * Initialize new group state; we'll add client below. X */ X gp->address = *addr; X gp->clients.next = &gp->clients; X gp->clients.prev = &gp->clients; X gp->nclients = 0; X gp->messages.next = &gp->messages; X gp->messages.prev = &gp->messages; X X /* add new group to its hash chain */ X gp->hash.next = ghqp; X gp->hash.prev = ghqp->prev; X gp->hash.prev->next = &gp->hash; X gp->hash.next->prev = &gp->hash; X break; X } X gp = baseof(Group, hash, gqp); X if (MCADDR_EQ(gp->address, *addr)) X break; X } X X /* X * Found the group -- it's okay to drop any old subscription. X */ X if (cp->group != NULL) { X Log(LOG_NOTICE, "fd %d: subscribe called on already subscribed" X " client -- unsubscribing first ...", fd); X Unsubscribe(fd); X } X X /* X * Add client to group. X */ X cp->group = gp; X cp->peers.next = &gp->clients; X cp->peers.prev = gp->clients.prev; X cp->peers.prev->next = &cp->peers; X cp->peers.next->prev = &cp->peers; X gp->nclients++; X X /* X * Note: we could start sending the client already queued messages, X * but there's really no point.
X */ X Assert(cp->rbuf.message == NULL); X Assert(cp->wbuf.message == NULL); X X Log(LOG_INFO, "fd %d: subscribed to group", fd); X return(1); X} X X void Unsubscribe(int fd) X /* X * Unsubscribe client from multicast group. Clean up group state X * associated with client. If no more clients exist for group, destroy X * the group. On return, the client will be marked as not belonging to X * any group, its read and write buffers will have been cleared and any X * output interrupts turned off. State particular to the client will X * not be touched. Thus, the client's address and flags state will X * remain unchanged. X */ X{ X Client *cp = clients[fd]; X Group *gp = cp->group; X X if (gp == NULL) { X Log(LOG_NOTICE, "fd %d: unsubscribe called on non-subscribed" X " client -- ignoring", fd); X return; X } X X /* remove client from group client list */ X cp->peers.next->prev = cp->peers.prev; X cp->peers.prev->next = cp->peers.next; X X /* mark client as not belonging to any group */ X cp->peers.next = &cp->peers; X cp->peers.prev = &cp->peers; X cp->group = NULL; X X /* drop client's reference to all messages down the queue */ X if (cp->wbuf.message != NULL) { X Message *mp = cp->wbuf.message; X X FD_CLR(fd, &pending); X cp->wbuf.message = NULL; X cp->wbuf.offset = 0; X for (;;) { X Queue *mqp = mp->queue.next; X X FreeMessage(mp); X if (mqp == &gp->messages) X break; X mp = baseof(Message, queue, mqp); X } X } X X /* drop client's rbuf; clear it so no dangling pointer is left */ X if (cp->rbuf.message != NULL) { X free(cp->rbuf.message); X cp->rbuf.message = NULL; X cp->rbuf.offset = 0; X } X X Log(LOG_INFO, "fd %d: unsubscribed from group", fd); X X /* one fewer client in the group */ X gp->nclients--; X Assert(gp->nclients >= 0); X /* destroy group if that was the last client */ X if (gp->nclients == 0) { X Log(LOG_INFO, "fd %d: destroyed group", fd); X X /* do a little bit of internal consistency checking */ X Assert(gp->clients.next == &gp->clients); X Assert(gp->clients.prev == &gp->clients); X Assert(gp->messages.next == &gp->messages); X 
X        Assert(gp->messages.prev == &gp->messages);
X
X        /* remove the group from its hash chain */
X        gp->hash.next->prev = gp->hash.prev;
X        gp->hash.prev->next = gp->hash.next;
X        free(gp);
X    }
X}
X
X
Xint
XClientWantsMessage(const Client *cp, const Message *mp)
X    /*
X     * Return TRUE if the client wants to see the message.
X     *
X     * If the client is ``connected'' to another client and this message
X     * isn't from that other client, return FALSE.
X     *
X     * If the message is directed explicitly at this client (i.e. a
X     * ``unicast'' message) return TRUE.  Neither the CLIENT_LOOPBACK nor
X     * CLIENT_SHUTDOWNRECV flags will affect this semantic.
X     *
X     * Else, if the client has CLIENT_SHUTDOWNRECV set return FALSE.  The
X     * client is a ``write-only'' client.
X     *
X     * Else, if the message isn't directed at the multicast group, it must
X     * be a ``unicast'' message to another client, so return FALSE.
X     *
X     * Else, if the message isn't from this client, return TRUE.
X     *
X     * Else, if the client has requested loopback of multicast messages,
X     * return TRUE.  Else, return FALSE.
X     */
X{
X    if ((cp->flags & CLIENT_CONNECTED)
X        && !MCADDR_EQ(mp->msg.source, cp->remote)) {
X        /* client is connected; uninterested in dealing with others */
X        return(0);
X    }
X    if (MCADDR_EQ(mp->msg.destination, cp->local)) {
X        /* ``unicast'' to client */
X        return(1);
X    }
X    if (cp->flags & CLIENT_SHUTDOWNRECV) {
X        /* client is ``write-only'' */
X        return(0);
X    }
X    if (!MCADDR_EQ(mp->msg.destination, MCADDR_BROADCAST)) {
X        /* ``unicast'' to some other client */
X        return(0);
X    }
X    if (!MCADDR_EQ(mp->msg.source, cp->local)) {
X        /* multicast, but not originated by client */
X        return(1);
X    }
X    /* multicast originated by client */
X    return((cp->flags & CLIENT_LOOPBACK) != 0);
X}
X
X
Xvoid
XFreeMessage(Message *mp)
X    /*
X     * Drop a reference to a message in a message queue.  If that was the
X     * last reference, dequeue and free the message.
X     */
X{
X    mp->references--;
X    Assert(mp->references >= 0);
X    if (mp->references == 0) {
X        mp->queue.next->prev = mp->queue.prev;
X        mp->queue.prev->next = mp->queue.next;
X        free(mp);
X    }
X}
X
X
X/*VARARGS2*/
Xvoid
XLog(int level, const char *fmt, ...)
X    /*
X     * Log a message.  If in debug mode all messages are printed to stderr.
X     * Otherwise they're sent to syslogd ...
X     */
X{
X    va_list args;
X    extern int vsyslog(int, const char *, va_list);
X
X    va_start(args, fmt);
X    if (debug) {
X        if (level <= loglevel) {
X            (void)fprintf(stderr, "%s: ", myname);
X            (void)vfprintf(stderr, fmt, args);
X            (void)fputc('\n', stderr);
X        }
X    } else
X        (void)vsyslog(level, fmt, args);
X    va_end(args);
X}
X
X
X#ifdef _AIX
Xint
Xvsyslog(int level, const char *fmt, va_list ap)
X    /*
X     * Log a message to the syslog facility using a varargs argument list.
X     * This is a simple minded replacement for vsyslog(3) on systems which
X     * don't supply it.
X     */
X{
X    /* str must hold buf with every percent sign doubled */
X    char buf[BUFSIZ], str[2 * BUFSIZ];
X    register char *bp, *sp;
X
X    (void)vsprintf(buf, fmt, ap);
X    /*
X     * Go back and double any percent signs (%) to prevent syslog
X     * from barfing.
X     */
X    bp = buf;
X    sp = str;
X    for (;;) {
X        register char c;
X
X        *sp++ = c = *bp++;
X        if (c == '\0')
X            break;
X        if (c == '%')
X            *sp++ = '%';
X    }
X    return(syslog(level, str));
X}
X#endif /* NEED VSYSLOG(3) */
X
X
X/*ARGSUSED*/
Xvoid
XSigDeadClient(int sig)
X    /*
X     * Dead client signal handler.  Indicates that a connection died and we
X     * got a SIGPIPE while trying to write on it.  WriteToClient sets
X     * this trap and is prepared to take an error return through a long
X     * jump to dead_client.
X     */
X{
X    siglongjmp(dead_client, 1);
X    /*NOTREACHED*/
X}
X
X
X/*ARGSUSED*/
Xvoid
XSigReset(int sig)
X    /*
X     * Reset signal handler.  Indicates a reset request from someone with
X     * privileges to send us a reset signal (SIGHUP).  Shutdown and close
X     * all connections and restart operations.
X     */
X{
X    Log(LOG_EMERG, "reset: received reset signal (%d)"
X        " -- resetting immediately", sig);
X    ShutdownServer();
X    siglongjmp(server_reset, 1);
X    /*NOTREACHED*/
X}
X
X
X/*ARGSUSED*/
Xvoid
XSigShutdown(int sig)
X    /*
X     * Shutdown signal handler.  Indicates a shutdown request either from
X     * init or someone else with privileges to send us a shutdown signal
X     * (SIGTERM).  Shutdown and close all connections and exit.
X     */
X{
X    /* pass sig so the %d conversion in the format has an argument */
X    Log(LOG_EMERG, "exit: received shutdown signal (%d)"
X        " -- shutting down immediately", sig);
X    ShutdownServer();
X    exit(SIGTERM);
X    /*NOTREACHED*/
X}
X
X
Xint
XFD_FFS(const fd_set *fds)
X    /*
X     * Return the file descriptor of the first (lowest) file descriptor bit
X     * set in the file descriptor mask.  If no bits are set, return -1.
X     */
X{
X    int m, b, fd;
X    const fd_mask *p;
X
X    p = fds->fds_bits;
X    for (m = 0; m < howmany(FD_SETSIZE, NFDBITS); m++)
X        if (p[m])
X            for (b = 1, fd = 0; fd < NFDBITS; b <<= 1, fd++)
X                if (p[m] & b)
X                    return(m*NFDBITS + fd);
X    return(-1);
X}
END_OF_FILE
if test 48856 -ne `wc -c <'mcastd/mcastd.c'`; then
    echo shar: \"'mcastd/mcastd.c'\" unpacked with wrong size!
fi
# end of 'mcastd/mcastd.c'
fi
echo shar: End of archive 4 \(of 4\).
cp /dev/null ark4isdone
MISSING=""
for I in 1 2 3 4 ; do
    if test ! -f ark${I}isdone ; then
        MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 4 archives.
    rm -f ark[1-9]isdone
else
    echo You still need to unpack the following archives:
    echo " " ${MISSING}
fi
## End of shell archive.
exit 0
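
The Subscribe, Unsubscribe and FreeMessage routines in mcastd/mcastd.c all manipulate circular, doubly linked queues whose links are embedded in the owning structure, and they use baseof to get from a link back to its owner. Neither Queue nor baseof is defined in this part of the archive, so the following is only a minimal sketch of how such a queue might work. The Queue layout, the offsetof-based baseof definition, the Member type and the queue_* helper names are all assumptions made for illustration, not code from the distribution.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: a link that is embedded inside the structure it chains. */
typedef struct Queue {
    struct Queue *next;
    struct Queue *prev;
} Queue;

/*
 * Assumed definition: recover a pointer to the enclosing structure from a
 * pointer to its embedded Queue member.
 */
#define baseof(type, member, ptr) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the daemon's Client; only the link matters. */
typedef struct Member {
    int fd;
    Queue peers;
} Member;

/* An empty queue is a header that points at itself in both directions. */
void queue_init(Queue *head)
{
    head->next = head;
    head->prev = head;
}

/* Link elem just before head, i.e. at the tail of the queue. */
void queue_append(Queue *head, Queue *elem)
{
    elem->next = head;
    elem->prev = head->prev;
    elem->prev->next = elem;
    elem->next->prev = elem;
}

/* Unlink elem from its queue and leave it pointing at itself. */
void queue_remove(Queue *elem)
{
    assert(elem->next != NULL && elem->prev != NULL);
    elem->next->prev = elem->prev;
    elem->prev->next = elem->next;
    elem->next = elem;
    elem->prev = elem;
}

/* Walk the queue from the header and add up the fd of every member. */
int queue_sum_fds(Queue *head)
{
    int sum = 0;
    Queue *qp;

    for (qp = head->next; qp != head; qp = qp->next)
        sum += baseof(Member, peers, qp)->fd;
    return sum;
}
```

Because an empty queue is a header that points at itself, insertion and removal need no special cases for the first or last element, which is consistent with the way the daemon unlinks a client or a group with just two assignments.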