NetNews Usenet Archive 1992 #27

Text File | 1992-11-21 | 8.3 KB | 163 lines

Newsgroups: comp.unix.wizards
Path: sparky!uunet!spool.mu.edu!uwm.edu!rpi!batcomputer!cornell!uw-beaver!newsfeed.rice.edu!exlogcorp!mcdowell
From: mcdowell@exlogcorp.exlog.com (Steve McDowell)
Subject: Re: Changing the owner of a process
Message-ID: <1992Nov21.173926.26419@exlog.com>
Keywords: process ownership
Sender: mcdowell@exlog.com (Steve McDowell)
Organization: EXLOG, Inc.
References: <1992Nov21.053022.17380@ra.msstate.edu>
Date: Sat, 21 Nov 92 17:39:26 GMT
Lines: 150

In message <1992Nov21.053022.17380@ra.msstate.edu> fwp@CC.MsState.Edu (Frank Peters) writes:
>
> If you had posted "wouldn't it work if you did foo?" or "why couldn't
> you just do foo?" or "I think it would probably be better to do foo." I
> suspect you would have gotten a very different response.  But you
> didn't.  You just told him why what he was doing was wrong.  Contained

You're absolutely right, of course.  One of the problems with USENET is
it allows immediate train-of-thought responses; that's exactly what I
gave in my original post.  My tone was wrong, and for that I apologize.

One thing, though, that I want to get straight up front: I was not
leveling a personal attack on Chris Torek.  The tone of your article
suggests that I was and that just isn't right.  I, like most in the
community, very much appreciate the work that Chris and people like
Chris do.  We need more like him.

> the systems there...as more or less an illustration of the fact that it
> is dangerous to assume that you can just twiddle with a kernel
> datastructure at will.  Kernel data structures are where the operating
> system stores its view of the world.  Twiddling with those structures
> behind its back is likely to make the system psychotic if you aren't
> careful.

Of course it's dangerous to needlessly twiddle kernel data structures.
It should also be a facet of the operating system to recover from
errors *THAT IT CAN RECOVER FROM*.
Disable interrupts for a while and recompute what the consistencies
should be, then keep going.

>: In the real world, in a critical application, I'd much rather have a
>: syslog-and-reset than a panic that's going to have to be called in to a
>: support group while the dump sits cluttering up somebody's dasd.  If there's
>: a situation that can be recovered from, then recover.  I really don't see
>: how you can argue against this.
>
> Hmmm...I question your reasoning.  You seem to be operating under the
> assumption that the operating system should just know that nothing
> really bad is wrong.

Question away, but I didn't say (and I didn't mean to imply) that the
operating system shouldn't know that something is wrong.  What I said
is that the operating system, once it detects something is wrong,
should attempt to recover before simply shutting down.  Tell the world
about it when it detects a problem, even dump enough state to attempt
to trace back the error.  But build in the functionality to recompute
"consistency".  If that fails, then panic.

> In the real world such a gross inconsistency in a kernel data structure
> is probably either (a) a sign of a MAJOR operating system bug or (b) a
> sign of a MAJOR hardware fault or (c) a sign of a very subtle security
> breach (remember, we are talking about changing the ownership of
> processes).  In any of these situations it is very likely that just
> adjusting the OS' view of reality and continuing on risks problems
> significantly worse than a bit of downtime.

Or a minor hardware fault, e.g. a flaky DRAM chip that only fails when
the back cover is off the machine.  Or a flaky device driver that jumps
out of bounds of an array only when the wall power dips below 58Hz on
the attached peripheral.  If you can recover and go on, then do.  I'm
not saying you shouldn't save enough state to track the error or
panic() with some error conditions.

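
To make the ordering concrete (detect, tell the world, recompute, and
only panic if the recompute fails), here is a minimal user-space sketch
in C.  Everything in it is hypothetical and for illustration only: the
struct proc and struct runq layouts, the function names, and the limit
parameter are assumptions, not taken from any real kernel.  A real
kernel would hold off interrupts around the walk and log through its
own facility rather than stderr.

```c
#include <stdio.h>

/* Hypothetical structures, for illustration only. */
struct proc {
    int pid;
    struct proc *next;
};

struct runq {
    struct proc *head;
    int count;              /* cached length; may drift from the list */
};

/*
 * Walk the list and count what is really linked in.  Gives up and
 * returns -1 after more than `limit` entries, so a corrupted (cyclic)
 * list cannot hang the walk.
 */
static int runq_recount(const struct runq *q, int limit)
{
    int n = 0;
    for (const struct proc *p = q->head; p != NULL; p = p->next) {
        if (++n > limit)
            return -1;
    }
    return n;
}

/*
 * Check the cached count against the list itself.
 * Returns  0 if they agree,
 *          1 if they disagreed and the cached count was repaired,
 *         -1 if the list itself looks corrupt (caller should panic).
 */
static int runq_check(struct runq *q, int limit)
{
    int actual = runq_recount(q, limit);

    if (actual < 0)
        return -1;          /* not recoverable here */
    if (actual == q->count)
        return 0;

    /* Leave an audit trail before repairing anything. */
    fprintf(stderr, "runq: cached count %d, recounted %d; repairing\n",
            q->count, actual);
    q->count = actual;
    return 1;
}
```

A caller would treat 0 and 1 as "keep going" and reserve panic() for
the -1 case, which keeps the shutdown path for damage the recount
cannot explain.
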
> You have critical applications in an environment in which system uptime
> is more important than system integrity or system security?  You want
> the system to continue to process your data in the face of what (to
> the system) looks like a major system consistency failure?

Depends on what you consider "major".  Some things are simply not
recoverable from, some things shouldn't be, and some things are.  My
original post was only addressing a situation where the kernel had
fewer processes in a queue than it thought it had.  Yes, something's
wrong; but why panic()?  That one is easily fixed by a recount and
easily identifiable because another queue is going to have more
processes than the kernel thought it should have.  Leave an audit
trail.  If the process can't be found on anybody's queue, then panic().

> I can certainly imagine such critical applications but I don't think
> they are common...and they probably require specially written fault
> tolerant systems anyway.

To some people doing a 10-hour payroll run would be a critical
application.  Not all that uncommon.  To others an unattended backup
would be a critical application, again not all that uncommon.  Critical
doesn't necessarily mean life-threatening.  My stock broker's trading
computers run a very critical application :)

Some degree of fault tolerance should be built into the system.  The
system shouldn't panic because it couldn't malloc() an mbuf.  It should
try to recover.  Then it should give the operator some options.  Then
it should panic().  I'm not proposing a specific design here, just a
philosophy.  There are very few reasons that a system should panic, and
UNIX seems to know them all.

>: Of course, for your purposes developing an operating system is a means
>: unto itself.  You don't have to worry about irrelevant things like
>: "customers" and "applications".  I wish I didn't.
>
> Basically irrelevant.  He pointed out the problem that the suggested
> solution would cause on one implementation.
> You didn't argue that
> his solution was inappropriate for your situation.  You contended
> that it was wrong in absolute terms.

You misread what I said.  I said that in the lab you should be able to
do what you want and if panics are the way to track an error down, then
by all means do it.  If bsd4.4 is never going to leave the lab then all
this babble is completely irrelevant.  If it's going to leave the lab,
then it isn't irrelevant at all.  If in the deliverable product you can
catch an error and correct it without destroying anything, then that
should be done.  Or define a configuration option and let the operator
decide whether he wants panics or attempted recovery.

>: I hope that vendors do replace panics where they can.  The problem becomes that
>: commercial vendors don't always take the time to get it right.  Too many vendors
>: simply port the code they get from USL or CSRG or CMU to get compatible
>: functionality; that makes it simple to integrate new releases and keeps the
>: lead time to market short.  That's not the research community's problem, that's
>: a problem for everyone else.
>
>That's hardly Chris' fault.  He designs systems for his purposes.
>Arguing that he shouldn't design systems to meet his needs because
>someone is going to then take that system and use it to meet other
>needs without appropriate modifications is just a bit selfish.

Again, you're reading this as a personal attack and it simply isn't.
Read the last sentence that you quoted of me: "that's not the research
community's problem".  And it isn't.  I said it isn't.  It's the
commercial world's fault.  That's not a slam of Chris Torek or you or
Mike Karels or anybody else.  That's a slam of some of the commercial
implementations that are out there.  Chris should build systems that
meet his research needs, no question.  Before it hits the general user
population, however far removed that is from Chris Torek's
responsibility, it should have some fault tolerance built into it.
That is my only point.

And, please, I did not intend to bring personalities into this.  It
takes focus away from the issues.  If I'm wrong in my ideas, then make
it be because what I suggest isn't feasible.  Don't say that I'm wrong
because I do research for a commercial organization and not for
academia.  And please don't say that I'm wrong because I don't agree
with everything that comes out of Berkeley.

-- 
Steve McDowell           . . . . o o o o o        Opinions are
Exlog, Inc.                     _____      o      mine, not my
mcdowell@exlog.com     _____====  ]OO|_n_n__][.   employers..
                      [_________]_|__|________)<