NetNews Usenet Archive 1992 #31



Newsgroups: comp.os.vms
Path: sparky!uunet!mcsun!news.funet.fi!aton.abo.fi!usenet
From: HEGE@FINABO.ABO.FI (Kaj Häggman DC)
Subject: Summary: RECNXINTERVAL
Message-ID: <1992Dec28.082250.3770@abo.fi>
Sender: usenet@abo.fi (Usenet NEWS)
Organization: Abo Akademi University, Finland
Date: Mon, 28 Dec 1992 08:22:50 GMT
X-News-Reader: VMS NEWS 1.24
Lines: 86

Hi!

I sent the following to the net a couple of weeks ago:

#Every now and then there are some "bursts" on our ethernet.
#The SUN workstations don't give a s*it, but the VAXstations keep shouting
#to each other quite a lot and eventually reboot.
#The value for RECNXINTERVAL is 120 in our cluster. How far up could
#I crank it, i.e. what impact could it have on the MI-cluster?
#Any other "time-out"-parameters that are worth checking out?
#As I'm not responsible for our network I won't go into that (waiting for
#the move towards routing within the net), but I'd really appreciate any
#ideas concerning the VAXes. Thanx!

and got 8 answers. 6 people had RECNXINTERVAL set to 120 like me, 1 had it
set to 180 and 1 didn't mention any value for it.

Here is a short description of the impact of raising the value of
RECNXINTERVAL:

The effect of a long RECNXINTERVAL is that if a node crashes or hangs
while it holds a lock on a critical resource (the UAF file, for example),
it will take RECNXINTERVAL seconds before the other nodes determine that
the failed node should be removed from the cluster. Any applications that
need the resource would hang in the meantime. This could result in the
entire cluster hanging for RECNXINTERVAL seconds. But that may be better
than having all your LAVC nodes crash with a CLUEXIT during a network
storm.

It seems like I've checked out most of the parameters that are worth
checking out. For those of you who'd like to experiment with cluster
parameters, also check out QDSKINTERVAL, PRCPOLINTERVAL, PASTIMOUT and
PAPOLLINTERVAL (as Ehud Gavron mentioned).

I also got many comments about our net, having it fixed first, and so on.
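
As a rough illustration of the RECNXINTERVAL trade-off described above, here is a toy model in Python. The function names and the simple threshold rule are assumptions made for illustration only; they are not VMS internals, and nothing here reads or sets a SYSGEN parameter. The interval values 20, 120 and 180 are the ones mentioned in this summary.

```python
# Toy model only: names and the threshold rule are illustrative assumptions,
# not a description of how the VMS connection manager is implemented.

def outcome(recnxinterval_s: int, outage_s: int) -> str:
    """Describe what happens when cluster traffic is lost for outage_s seconds."""
    if outage_s < recnxinterval_s:
        # The connection returns before the timer expires: nodes reconnect.
        return f"cluster stalls for up to {outage_s}s, then continues"
    # The timer expired: the unreachable node is removed from the cluster
    # (the post reports LAVC nodes crashing with CLUEXIT in this situation).
    return f"node removed from cluster after {recnxinterval_s}s"


def worst_case_stall(recnxinterval_s: int) -> int:
    """Longest time other nodes may wait on a lock held by a dead node."""
    return recnxinterval_s


if __name__ == "__main__":
    burst = 60  # hypothetical length of a network "burst", in seconds
    for interval in (20, 120, 180):
        print(interval, "|", outcome(interval, burst),
              "| worst-case stall:", worst_case_stall(interval), "s")
```

The point of the sketch is only that one number pulls in two directions: a larger value rides out longer bursts, but also lengthens the time the cluster can hang on a node that really has failed.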
It seems to be quite common that local area networks built and configured
at a time when there were no workstations around (just a few hosts and
dumb terminals) just can't put up with the added load. Then the problem
just gets worse as new hosts are added. Routing seems to be a good
solution.

Carl Karcher also told me these things about LAT/LAST:

Careful, the LAVC protocol is not routable. However, the Ciscos can be
set up to selectively bridge protocols that can't be routed (like LAVC,
LAT, MOP and LAST). We don't do that here since our network police don't
allow selective bridging.

One more thing, be sure your Ciscos have the latest firmware. We just
discovered a problem where the router passed corrupted packets as good
packets, which was affecting Novell and NFS traffic. Here's a brief
description:

[A bug in the Cisco interface firmware caused it to ignore CRC errors.
A corrupted packet received on the router's Ethernet interface would have
its CRC recomputed and would be forwarded toward its destination. So the
next node receiving the packet would not be able to detect the
corruption. Ethernet devices (with the exception of Ethernet analyzer
equipment) are supposed to drop packets containing CRC errors. Higher
layer protocols then trigger retransmission of dropped packets. Some
protocols like TCP have an end-to-end checksum so that even if a
corrupted packet manages to get through, the destination machine can
detect the corruption. The Netware protocol assumes that corrupted
packets will be dropped and that errors created by intermediate nodes are
extremely rare, so it doesn't include an end-to-end checksum.]

......

The InfoServer uses the LAST protocol, which is also not routable.
Pathworks disk and file services use LAST too (but file services can use
DECnet). If you use InfoServers for holding Bookreader documentation CDs,
DECnet can be used (to a node that has the DAD disks mounted) instead of
mounting DADn disks.
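
The CRC problem Carl describes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the router's firmware and not a faithful Ethernet encoder: the frame layout (payload plus 4 CRC bytes), the function names, and the simple 16-bit end-to-end sum standing in for a TCP-style checksum are all hypothetical. Only `zlib.crc32` is a real library call.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload (simplified frame check)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Check that the trailing 4 bytes match the CRC-32 of the payload."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs


def correct_router(frame: bytes):
    """Drop frames with a bad CRC; forward good ones with a fresh CRC."""
    if not crc_ok(frame):
        return None
    return make_frame(frame[:-4])


def buggy_router(frame: bytes) -> bytes:
    """Ignore the incoming CRC and recompute it, as in the bug described."""
    return make_frame(frame[:-4])


def end_to_end_sum(data: bytes) -> int:
    """Toy 16-bit checksum computed by the sender, checked by the receiver."""
    return sum(data) & 0xFFFF


if __name__ == "__main__":
    message = b"NFS block 42"
    sent_sum = end_to_end_sum(message)
    frame = make_frame(message)
    damaged = bytes([frame[0] ^ 0x01]) + frame[1:]  # one bit flipped in transit

    print("correct router forwards:", correct_router(damaged))
    forwarded = buggy_router(damaged)
    print("next hop sees valid CRC:", crc_ok(forwarded))
    print("end-to-end check passes:", end_to_end_sum(forwarded[:-4]) == sent_sum)
```

After the buggy hop the per-link CRC looks fine again, so only a check computed by the original sender can still reveal the damage. That is the difference between the TCP and Netware cases in the description.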
I'm not sure if I can say that "nobody uses a default value of 20 for
RECNXINTERVAL" (based on my own experiences and 7 answers...), but it
seems like many people have set it a little higher.

Many thanks to

George Burns
Ehud Gavron
Carl Karcher
Tom Miller
Malcolm Newman
Jeff Rossiter
Frank Shorter
Erik Sosman

Kaj Haggman              Internet: Hege@abo.fi       Phone: +358-21-654467
Abo Akademi University   Bitnet:   Hege@finabo       FAX:   +358-21-654497
Computing Center         PSImail:  22101410::HEGE
SF-20500 Abo, FINLAND    X.400:    s=hege o=abo prmd=inet admd=fumail c=fi