- Path: sparky!uunet!gossip.pyramid.com!decwrl!bu.edu!att!linac!uchinews!machine!sunbird!amiserv!austral!rrezaian
- From: rrezaian@austral.chi.il.us (Russell Rezaian)
- Newsgroups: chi.mail
- Subject: Re: clout closed
- Message-ID: <1993Jan21.210431.5767@austral.chi.il.us>
- Date: 21 Jan 93 21:04:31 GMT
- References: <MWS3SEHMH@linac.fnal.gov> <1993Jan17.001105.7415@austral.chi.il.us> <C10rDt.GoE@ddsw1.mcs.com>
- Distribution: chi
- Organization: D.O.C. Data Processing Systems
- Lines: 117
-
- In article <C10rDt.GoE@ddsw1.mcs.com> karl@ddsw1.mcs.com (Karl Denninger) writes:
- >In article <1993Jan17.001105.7415@austral.chi.il.us> rrezaian@austral.chi.il.us (Russell Rezaian) writes:
- >>Before that we had a remote reboot hang due to a memory error. This has
- >>happened about 3 times in the past few months. We haven't traced the
- >>cause of our little memory parity error, but I suspect it might have
- >>something to do with line glitches, cosmic rays, heat, or malignant beings
- >>from the spirit world.
- >
- >Or bad memory :-)
-
- That had occurred to me also... :-)
-
- >Seriously, do you have the RAM test enabled on the machine? The PROM
- >default is to check only the first meg (not very useful!) Then again, these
- >"ram tests" are frequently not useful anyway, but they are better than
- >nothing!
-
- Yup, we have the default RAM test set to run on powerup, and after the second
- time I went into the PROM diags and ran all of the memory tests there.
- Nothing, all of the ram tested clean.
-
- >Filled virtual space will prevent the inode cache from being properly
- >managed, and can manifest as no available inodes. Of course, you'll also be
- >getting complaints about other things (like no swap space), and processes
- >will refuse to start (since there is no backing store or RAM for them
- >available)
-
- That pretty much was what happened, and it didn't take us much time to
- trace the problem. Of course the i-node messages did rather tend to
- confuse the issue for a while, but considering they were with respect to
- the root partition it wasn't really likely that we were actually out of
- i-nodes. Dead giveaway that the problem's elsewhere.
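One way to confirm that a "no available inodes" message is a red herring is to ask the filesystem directly how many i-nodes it has left. The following is a minimal sketch, assuming a Unix-like system where Python's `os.statvfs` is available; the function names and the 99% threshold are illustrative choices, not anything used on clout.

```python
import os

def inode_report(path="/"):
    """Return (total, free, used_fraction) i-node figures for the
    filesystem that holds `path`."""
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    # Some filesystems allocate i-nodes dynamically and report 0 here,
    # in which case a usage fraction is meaningless.
    used_fraction = None if total == 0 else (total - free) / total
    return total, free, used_fraction

def really_out_of_inodes(path="/", threshold=0.99):
    """True only if the on-disk i-node table is genuinely (nearly) full.
    The threshold is an arbitrary illustrative value."""
    _total, _free, used = inode_report(path)
    return used is not None and used >= threshold

if __name__ == "__main__":
    print(inode_report("/"), really_out_of_inodes("/"))
```

If this reports plenty of free i-nodes on root while the kernel is still complaining, the shortage is in the in-memory tables rather than on disk, which points back at exhausted virtual space.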
-
- >In the extreme case, this can completely lock you out - unless you get lucky
- >and something exits just as you're trying to log on.
-
- It did lock us out; someone rebooted the machine and things were fine.
-
- >>Now, during this time clout has had problems with spool space, including
- >>two instances of the mailbombing of a local client site. NOT ONCE has
- >>this required a physical reboot of clout, and at least three times that I
- >>remember it hasn't even had any effect on the function of the clout
- >>machine.
- >
- >It should not, IF you have partitioned your disks correctly. It will,
- >however, cause mail to be dumped on the floor unceremoniously.
-
- Actually, no it won't. At least not for sites in chi.il.us. While the
- clout machine won't be able to accept mail, all of the sites in chi.il.us
- that depend on clout have at least one secondary MX site so the mail can
- go there. In fact, on those occasions when clout has been out of commission
- this is exactly what has happened. It's really neat when something works
- the way it's supposed to.
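The fallback described above can be sketched roughly as follows: a sending host sorts a domain's MX records by preference (lowest first) and hands the message to the first exchanger that answers. This is a simplified illustration, not any real mailer's code; `pick_mail_exchanger`, the `is_reachable` callback, and the secondary host name `backup.example.org` are hypothetical, and real mailers also shuffle hosts of equal preference.

```python
def pick_mail_exchanger(mx_records, is_reachable):
    """Choose a delivery host from MX records.

    mx_records   -- iterable of (preference, hostname); lower is preferred
    is_reachable -- hypothetical callback returning True if the host
                    currently accepts mail
    """
    for _preference, host in sorted(mx_records):
        if is_reachable(host):
            return host
    # Nobody answered: the sender keeps the message queued and retries.
    return None

if __name__ == "__main__":
    records = [(20, "backup.example.org"), (10, "clout.chi.il.us")]
    # Simulate the primary being unable to accept mail.
    print(pick_mail_exchanger(records, lambda h: h != "clout.chi.il.us"))
```

With the primary down, the sketch falls through to the secondary, so the mail is held there instead of being dropped.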
-
- >Anyone who runs a system with the mail spool on the root partition deserves
- >what they get. They will get it. The pat answer to this is "don't do
- >that".
-
- We don't. We're not that stupid. Anyone with a spool of any kind on the
- root partition is asking for trouble, and IMHO anyone with /tmp on the root
- partition is asking for trouble too...
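A quick way to audit this is to compare the device ID of each spool or scratch directory with that of `/`. The sketch below assumes a Unix-like system; the candidate paths (`/tmp`, `/var/spool`, `/var/mail`) and the function names are assumptions for illustration and will differ from system to system.

```python
import os

def on_root_filesystem(path):
    """True if `path` lives on the same device as the root directory."""
    return os.stat(path).st_dev == os.stat("/").st_dev

def risky_dirs(candidates=("/tmp", "/var/spool", "/var/mail")):
    """Return the candidate directories that exist and share a
    filesystem with root. The default list is only an example."""
    return [p for p in candidates
            if os.path.isdir(p) and on_root_filesystem(p)]

if __name__ == "__main__":
    for path in risky_dirs():
        print(path, "is on the root filesystem")
```

Anything this prints can fill root when it fills up, which is exactly the failure the separate partitions are meant to prevent.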
-
- >In general, there are several items which will hose you every time:
- >
- >1) Running out of virtual space (RAM + page). This will, on occasion,
- > crash systems (or it might just lock them up) It is always bad news.
-
- Bingo. This has happened a few times lately, and we are currently looking
- for more RAM. Depending on how bad things get we might try adding some
- swap space to the system, but I want to put as much real RAM in as possible
- before stealing even more of our limited disk space.
-
- >2) Running out of disk space on root (frequently prevents anyone from
- > signing on; ergo, you have a bitch of a time fixing the problem).
- > Some machines also crash under this kind of stress.
-
- We haven't had this problem, luckily. And I hope we don't. Root is a bit
- tight, but that's just my opinion; I know people who as a matter of course
- run with less space on root partitions than we now have.
-
- >3) NIS problems can cause service failures or a completely locked up
- > system (it's not really locked, but system calls which attempt to do
- > name <> uid mapping hang, so it appears locked). NIS should NEVER
- > be run unless you have TWO servers on EACH subnet. NO EXCEPTIONS.
- > Further, NIS should NEVER be run on a machine which is visible from
- > "hostile" locations -- if it is, you're asking for someone to
- > download your password file and crack the passwords.
-
- Only if your password file is available via NIS, ours isn't.
-
- > Filtering on
- > routers is only marginally useful in combatting this; even if you
- > prevent contact with the portmapper, people can still get through
- > by creative port guessing. (This is less true for "secure" NIS,
- > but even that isn't very secure) Run the Sun security (shadow
- > password and more) package instead, and get rid of NIS.
-
- Shadow password isn't too helpful if you don't have any users on a system.
- Since we don't have any users on clout there's not much point. Not to
- mention that shadow password files are horribly over-rated. Adding more
- of the Sun C2 package might not be too bad an idea though, we should look
- into it!
-
- >This is a good start. CLOUT seems to be missing a few of these points ;-)
-
- Well, we've cleaned up the big NIS problem, as a quick fix, and over the
- long term I am going to be isolating whatever it is that still wants NIS
- and pulling that. (There may come a time when we want NIS again, but not
- yet...)
-
- Other than that, the only serious problem that we have discussed that
- really applies is memory. And we're looking! Somehow it always
- comes down to this, all our problems seem to devolve to space. At least
- there's hope on the disk front.
- --
- Russell Rezaian | rrezaian@austral.chi.il.us
- rrezaian@clout.chi.il.us | rrezaian@zed.chi.il.us
-