- Newsgroups: comp.unix.wizards
- Path: sparky!uunet!spool.mu.edu!uwm.edu!rpi!batcomputer!cornell!uw-beaver!newsfeed.rice.edu!exlogcorp!mcdowell
- From: mcdowell@exlogcorp.exlog.com (Steve McDowell)
- Subject: Re: Changing the owner of a process
- Message-ID: <1992Nov21.173926.26419@exlog.com>
- Keywords: process ownership
- Sender: mcdowell@exlog.com (Steve McDowell)
- Organization: EXLOG, Inc.
- References: <1992Nov21.053022.17380@ra.msstate.edu>
- Date: Sat, 21 Nov 92 17:39:26 GMT
- Lines: 150
-
- In message <1992Nov21.053022.17380@ra.msstate.edu> fwp@CC.MsState.Edu (Frank Peters) writes:
- >
- > If you had posted "wouldn't it work if you did foo?" or "why couldn't
- > you just do foo?" or "I think it would probably be better to do foo." I
- > suspect you would have gotten a very different response. But you
- > didn't. You just told him why what he was doing was wrong. Contained
-
- You're absolutely right, of course. One of the problems with USENET is
- it allows immediate train-of-thought responses; that's exactly what
- I gave in my original post. My tone was wrong, and for that I apologize.
-
- One thing, though, that I want to get straight up front: I was not
- leveling a personal attack on Chris Torek. The tone of your article
- suggests that I was and that just isn't right. I, like most in the
- community, very much appreciate the work that Chris and people like Chris
- do. We need more like him.
-
- > the systems there...as more or less an illustration of the fact that it
- > is dangerous to assume that you can just twiddle with a kernel
- > datastructure at will. Kernel data structures are where the operating
- > system stores its view of the world. Twiddling with those structures
- > behind its back is likely to make the system psychotic if you aren't
- > careful.
-
- Of course it's dangerous to needlessly twiddle kernel data structures.
- It should also be a facet of the operating system to recover from errors
- *THAT IT CAN RECOVER FROM*. Disable interrupts for a while, recompute
- what the consistent state should be, and then keep going.
-
- >: In the real-world, in a critical application, I'd much rather have a
- >: syslog-and-reset than a panic that's going to have to be called in to a
- >: support group while the dump sits cluttering up somebody's dasd. If there's
- >: a situation that can be recovered from, then recover. I really don't see
- >: how you can argue against this.
- >
- > Hmmm...I question your reasoning. You seem to be operating under the
- > assumption that the operating system should just know that nothing
- > really bad is wrong.
-
- Question away, but I didn't say (and I didn't mean to imply) that the
- operating system shouldn't know that something is wrong. What I said is
- that the operating system, once it detects something is wrong, should
- attempt to recover before simply shutting down. Tell the world about it
- when it detects a problem, even dump enough state to attempt to trace
- back the error. But build in the functionality to recompute "consistency".
- If that fails, then panic.
-
- > In the real world such a gross inconsistency in a kernel data structure
- > is probably either (a) a sign of a MAJOR operating system bug or (b) a
- > sign of a MAJOR hardware fault or (c) a sign of a very subtle security
- > breach (remember, we are talking about changing the ownership of
- > processes). In any of these situations it is very likely that just
- > adjusting the OS' view of reality and continuing on risks problems
- > significantly worse than a bit of downtime.
-
- Or a minor hardware fault, i.e. a flakey DRAM chip that only fails when
- the back cover is off the machine. Or a flakey device driver that jumps
- out of bounds of an array only when the wall power dips below 58Hz
- on the attached peripheral. If you can recover and go on, then do.
- I'm not saying you shouldn't save enough state to track the error or
- panic() with some error conditions.
-
- > You have critical applications in an environment in which system uptime
- > is more important than system integrity or system security? You want
- > the system to continue to process your data in the face of what (to
- > the system) looks like a major system consistency failure?
-
- Depends on what you consider "major". Some things are simply not
- recoverable from, some things shouldn't be, and some things are.
-
- My original post was only addressing a situation where the kernel
- had fewer processes in a queue than it thought it had. Yes, something's
- wrong; but why panic()? That one is easily fixed by a recount and
- easily identifiable because another queue is going to have more
- processes than the kernel thought it should have. Leave an audit trail.
- If the process can't be found on anybody's queue, then panic().
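
The recount described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `struct proc`, `struct procq`, `procq_walk` and `procq_audit` are hypothetical names, not taken from any real kernel, and a real version would also have to bound the walk against a corrupted (circular) list and hold off interrupts while it runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structures for illustration; not from any real kernel. */
struct proc {
    int pid;
    struct proc *next;
};

struct procq {
    const char *name;
    struct proc *head;
    int count;              /* what the kernel believes the length is */
};

enum audit_result { AUDIT_OK, AUDIT_REPAIRED, AUDIT_PANIC };

/* Walk a queue and count the entries actually linked on it. */
int procq_walk(const struct procq *q)
{
    int n = 0;
    for (const struct proc *p = q->head; p != NULL; p = p->next)
        n++;
    return n;
}

/*
 * Recount every queue. If every process is still on some queue, fix
 * any stale per-queue counts and log the change (the audit trail).
 * If a process cannot be found on anybody's queue, report AUDIT_PANIC
 * and leave the counts untouched for the dump.
 */
enum audit_result procq_audit(struct procq *queues, size_t nqueues,
                              int expected_total)
{
    assert(queues != NULL);

    int found = 0;
    for (size_t i = 0; i < nqueues; i++)
        found += procq_walk(&queues[i]);

    if (found != expected_total)
        return AUDIT_PANIC;

    enum audit_result result = AUDIT_OK;
    for (size_t i = 0; i < nqueues; i++) {
        int actual = procq_walk(&queues[i]);
        if (actual != queues[i].count) {
            fprintf(stderr, "audit: queue %s count was %d, found %d; corrected\n",
                    queues[i].name, queues[i].count, actual);
            queues[i].count = actual;
            result = AUDIT_REPAIRED;
        }
    }
    return result;
}
```

A caller would run `procq_audit` at the point where the mismatch is detected and call panic() only when it returns `AUDIT_PANIC`.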
-
- > I can certainly imagine such critical applications but I don't think
- > they are common...and they probably require specially written fault
- > tolerant systems anyway.
-
- To some people doing a 10-hour payroll run would be a critical application.
- Not all that uncommon. To others an unattended backup would be a
- critical application, again not all that uncommon. Critical doesn't
- necessarily mean life-threatening. My stock broker's trading computers
- run a very critical application :)
-
- Some degree of fault tolerance should be built into the system. The
- system shouldn't panic because it couldn't malloc() an mbuf. It should
- try to recover. Then it should give the operator some options. Then
- it should panic().
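
As a rough sketch of that escalation order, assume a toy buffer pool in which some buffers are free and some are held by a cache that can be dropped under pressure. The names `struct bufpool`, `pool_take`, `pool_reclaim` and `pool_take_with_recovery` are made up for this example; this is not how any real mbuf allocator is written.

```c
#include <stddef.h>

/* Toy stand-in for a kernel buffer pool (hypothetical). */
struct bufpool {
    int free_bufs;      /* buffers available right now */
    int cached_bufs;    /* buffers that could be reclaimed under pressure */
};

enum alloc_status { ALLOC_OK, ALLOC_RECOVERED, ALLOC_FAILED };

/* Take one buffer; returns 1 on success, 0 if none are free. */
int pool_take(struct bufpool *p)
{
    if (p->free_bufs > 0) {
        p->free_bufs--;
        return 1;
    }
    return 0;
}

/* Move one cached buffer to the free list; returns 0 if nothing is left. */
int pool_reclaim(struct bufpool *p)
{
    if (p->cached_bufs > 0) {
        p->cached_bufs--;
        p->free_bufs++;
        return 1;
    }
    return 0;
}

/*
 * Try the allocation; on failure, reclaim and retry a bounded number
 * of times. ALLOC_FAILED means recovery did not work: that is where
 * the caller would log, offer the operator some options, and only
 * then panic.
 */
enum alloc_status pool_take_with_recovery(struct bufpool *p, int max_retries)
{
    if (pool_take(p))
        return ALLOC_OK;

    for (int i = 0; i < max_retries; i++) {
        if (!pool_reclaim(p))
            break;                  /* nothing left to free */
        if (pool_take(p))
            return ALLOC_RECOVERED;
    }
    return ALLOC_FAILED;
}
```

The point of the bounded retry is that panic() stays in the design, but only as the last step rather than the first.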
-
- I'm not proposing a specific design here, just a philosophy. There are
- very few reasons that a system should panic, and UNIX seems to know
- them all.
-
- >: Of course, for your purposes developing an operating system is a means
- >: unto itself. You don't have to worry about irrelevant things like
- >: "customers" and "applications". I wish I didn't.
- >
- > Basically irrelevant. He pointed out the problem that the suggested
- > solution would cause on one implementation. You didn't argue that
- > his solution was inappropriate for your situation. You contended
- > that it was wrong in absolute terms.
-
- You misread what I said. I said that in the lab you should be able
- to do what you want and if panics are the way to track an error down,
- then by all means do it. If bsd4.4 is never going to leave the lab
- then all this babble is completely irrelevant. If it's going to leave
- the lab, then it isn't irrelevant at all.
-
- If in the deliverable product you can catch an error and correct it
- without destroying anything, then that should be done. Or define a
- configuration option and let the operator decide whether or not he wants
- panics or attempted recovery.
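
One way such an option might look, purely as a sketch: a policy value chosen by the operator decides whether a detected fault goes straight to panic or first gets a repair attempt. `fault_policy`, `handle_fault`, `struct counter_fix` and `fix_counter` are hypothetical names invented for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Operator-selected policy (hypothetical configuration option). */
enum fault_policy { POLICY_PANIC, POLICY_RECOVER };
enum fault_action { ACTION_CONTINUE, ACTION_PANIC };

/* Example repair: reset a counter to a freshly recomputed value. */
struct counter_fix {
    int *counter;
    int actual;
};

int fix_counter(void *ctx)
{
    struct counter_fix *f = ctx;
    *f->counter = f->actual;
    return 1;                       /* nonzero means the repair succeeded */
}

/*
 * Always log the fault. Attempt the repair only when the policy allows
 * it and a repair routine exists; otherwise tell the caller to panic.
 */
enum fault_action handle_fault(enum fault_policy policy, const char *what,
                               int (*repair)(void *), void *ctx)
{
    fprintf(stderr, "fault: %s\n", what);

    if (policy == POLICY_RECOVER && repair != NULL && repair(ctx)) {
        fprintf(stderr, "fault: %s repaired, continuing\n", what);
        return ACTION_CONTINUE;
    }
    return ACTION_PANIC;
}
```

With `POLICY_PANIC` the lab behaviour is unchanged; with `POLICY_RECOVER` the same fault is logged, repaired if possible, and the system keeps going.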
-
- >: I hope that vendors do replace panics where they can. The problem becomes that
- >: commercial vendors don't always take the time to get it right. Too many vendors
- >: simply port the code they get from USL or CSRG or CMU to get compatible
- >: functionality; that makes it simple to integrate new releases and keeps the
- >: lead time to market short. That's not the research community's problem, that's
- >: a problem for everyone else.
- >
- >That's hardly Chris' fault. He designs systems for his purposes.
- >Arguing that he shouldn't design systems to meet his needs because
- >someone is going to then take that system and use it to meet other
- >needs without appropriate modifications is just a bit selfish.
-
- Again, you're reading this as a personal attack and it simply isn't. Read the
- last sentence that you quoted of me: "that's not the research community's
- problem". And it isn't. I said it isn't. It's the commercial world's fault.
- That's not a slam of Chris Torek or you or Mike Karels or anybody else. That's a
- slam of some of the commercial implementations that are out there.
-
- Chris should build systems that meet his research needs, no question. Before
- it hits the general user population, however far removed that is from Chris
- Torek's responsibility, it should have some fault tolerance built into it.
- That is my only point.
-
- And, please, I did not intend to bring personalities into this. It takes
- focus away from the issues. If I'm wrong in my ideas, then make it be because
- what I suggest isn't feasible. Don't say that I'm wrong because I do research
- for a commercial organization and not for academia. And please don't say that
- I'm wrong because I don't agree with everything that comes out of Berkeley.
-
- --
- Steve McDowell . . . . o o o o o Opinions are
- Exlog, Inc. _____ o mine, not my
- mcdowell@exlog.com _____==== ]OO|_n_n__][. employers..
- [_________]_|__|________)<
-