- From nelson@sun.soe.clarkson.edu Thu Jul 26 15:25:24 1990
- Received: from omnigate.clarkson.edu by pear.ecs.clarkson.edu with SMTP
- id AA2818 ; Thu, 26 Jul 90 15:25:23 GMT
- Received: from sun.soe.clarkson.edu by omnigate.clarkson.edu id aa09445;
- 26 Jul 90 15:14 EDT
- Received: by sun.soe.clarkson.edu (4.1/SMI-4.0)
- id AA01795; Thu, 26 Jul 90 15:14:20 EDT
- Message-Id: <9007261914.AA01795@sun.soe.clarkson.edu>
- Return-Path: <@po5.andrew.cmu.edu:ddp+@andrew.cmu.edu>
- Date: Fri, 20 Jul 90 04:18:41 -0400 (EDT)
- From: Drew Daniel Perkins <ddp+@andrew.cmu.edu>
- To: nelson@sun.soe.clarkson.edu
- Subject: Re:
- In-Reply-To: <2802@pear.ecs.clarkson.edu>
- References: <2802@pear.ecs.clarkson.edu>
-
- nelson@pear.ecs.clarkson.edu writes:
- > Why?
- >
- > in al,dx ;get master mask
- > and al,not (1 shl 2) ; and clear slave cascade bit in mask
- > out dx,al ;set new master mask (enable slave int)
-
- That solves a bug that I'm truly amazed that you haven't run into
- before (I'm equally amazed that it exists). We have a few very old
- original IBM PC/ATs (the ones with the strange piggybacked 256KB
- DRAMs which together made 512KB). It seems that the BIOS on those old
- versions does NOT initialize the master 8259 for you so that the slave
- 8259 can interrupt. If you want it to (i.e. if you want to use IRQs
- 8-15), you had better make sure that the bit is cleared. Otherwise, the
- obvious outcome is that you don't get interrupts and the packet driver
- doesn't work. I think the reason that you didn't see this is that you didn't
- really have any cards that supported IRQs 8-15 (at least not WD varieties).
- In any case, these three instructions shouldn't ever cause anyone any harm.
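
As a rough illustration of what those three instructions compute, here is a minimal C sketch of the mask arithmetic only. The helper names `unmask_cascade` and `irq_needs_cascade` are made up for this example. On real hardware the value would be read from and written back to the master 8259 mask port (port 21h on a standard AT), which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* On an AT the slave 8259 is cascaded into input IR2 of the master. */
#define CASCADE_IRQ 2u

/* Same arithmetic as "and al,not (1 shl 2)": clear the cascade bit in
   the master mask. A cleared mask bit means the input is enabled.
   All other bits are left as they were. */
static uint8_t unmask_cascade(uint8_t master_imr)
{
    return (uint8_t)(master_imr & ~(1u << CASCADE_IRQ));
}

/* IRQs 8-15 arrive through the slave, so they depend on the cascade
   bit being clear in the master mask (hypothetical helper). */
static bool irq_needs_cascade(unsigned irq)
{
    assert(irq < 16);
    return irq >= 8;
}
```

With every input masked (FFh) the result is FBh, and a mask that already has bit 2 clear comes back unchanged, which is why the instructions are harmless on machines whose BIOS did the setup properly.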
-
- Drew
-
-
- From nelson@sun.soe.clarkson.edu Thu Jul 26 15:42:53 1990
- Received: from omnigate.clarkson.edu by pear.ecs.clarkson.edu with SMTP
- id AA2821 ; Thu, 26 Jul 90 15:42:52 GMT
- Received: from sun.soe.clarkson.edu by omnigate.clarkson.edu id aa09631;
- 26 Jul 90 15:31 EDT
- Received: by sun.soe.clarkson.edu (4.1/SMI-4.0)
- id AA02405; Thu, 26 Jul 90 15:31:25 EDT
- Message-Id: <9007261931.AA02405@sun.soe.clarkson.edu>
- Return-Path: <ddp+@andrew.cmu.edu>
- Date: Tue, 12 Jun 90 00:56:40 -0400 (EDT)
- From: Drew Daniel Perkins <ddp+@andrew.cmu.edu>
- To: pcip@twg.com, drivers@sun.soe.clarkson.edu
- Subject: Dell System 325 hardware bug
-
- I seem to have run into a real hardware bug in the Dell System 325
- Chips & Technologies 8259 clone interrupt controller.
-
- Summary:
-
- Sending this interrupt controller a Non Specific End of Interrupt
- (EOI) command causes it to reset all In Service Register (ISR) bits
- instead of only the most recent one with the highest priority.
-
- Long Winded Explanation:
-
- I had a serious bug with my high-performance Western Digital wd80x3
- packet driver. Transmitting and receiving on it at high rates caused
- it to go west in many different ways. After tearing my hair out for a
- while, I added logging code which logged all procedure entries and
- exits along with detailed chip status in a large ring. I finally
- discovered that the impossible was happening. During my packet copy
- routine (which can take > 1.5ms to copy a 1500 byte packet), my
- interrupt handler was being reentered and was trashing the stack.
- This "shouldn't happen" since I was not giving an EOI command to the
- interrupt controller until the very end of the interrupt handler. The
- interrupt handler did, however, reenable processor and ethernet chip
- interrupts fairly early.
-
- After tearing my hair out some more and checking my code thoroughly,
- I decided that I must be getting some other interrupt in the middle of
- my code somewhere. I added some more logging code to record interrupt
- controller status, and changed my packet copy routine to enable
- processor interrupts AFTER the copy instead of before it. Sure
- enough, at the point the bug hit, my log indicated that the timer had
- fired and a timer interrupt was now pending. Also, I had received a
- new packet, and the ethernet chip also had a new interrupt pending
- although it was still blocked because it already had an interrupt in
- service. However, immediately after reenabling processor interrupts,
- my log indicated that my interrupt handler was reentered. This
- indicated to me that the timer interrupt handler was somehow resetting
- not only its ISR bit but mine also.
-
- After disassembling the timer interrupt handler, I determined that the
- only thing it was doing was sending a Non Specific EOI to the primary
- interrupt controller (using mov al,20h; out 20h,al). To make my case
- for a hardware bug even stronger, I next coded my own timer interrupt
- handler. Just before and just after the mov/out instructions, I made
- log entries. Sure enough, while my log showed the ISR register
- reading 21h (IR5 and IR0 in service) just before the EOI was sent, it
- read 0 just after. I then changed the code to use a specific EOI
- instruction to reset the timer interrupt instead of a non specific
- EOI. The problem went away!
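
The difference between the three behaviors can be modeled on the In Service Register value alone. The following is a sketch under assumptions, not the test code described above: the function names are invented for illustration, the priority order is the default fully nested one (IR0 highest), and the "buggy" variant simply encodes the behavior observed in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Non Specific EOI on a real 8259: clear only the highest-priority
   in-service bit. With IR0 as highest priority that is the lowest
   set bit, and x & (x - 1) clears exactly that bit. */
static uint8_t eoi_nonspecific(uint8_t isr)
{
    return (uint8_t)(isr & (isr - 1u));
}

/* Behavior observed on the C&T clone: every ISR bit is cleared. */
static uint8_t eoi_nonspecific_buggy(uint8_t isr)
{
    (void)isr;
    return 0;
}

/* Specific EOI: clear exactly the named level and nothing else. */
static uint8_t eoi_specific(uint8_t isr, unsigned level)
{
    assert(level < 8);
    return (uint8_t)(isr & ~(1u << level));
}

/* Command byte for a specific EOI: 60h plus the level (the non
   specific command is the 20h used in the BIOS handler). */
static uint8_t specific_eoi_command(unsigned level)
{
    assert(level < 8);
    return (uint8_t)(0x60u | level);
}
```

Starting from the logged value 21h, `eoi_nonspecific` and `eoi_specific(isr, 0)` both leave 20h (IR5 still in service), while `eoi_nonspecific_buggy` leaves 0, matching the log entries.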
-
- Finally, I tested the code with a non specific EOI on a stock IBM
- PC/AT with a real Intel 8259. It didn't exhibit the problem.
-
- Since I can't change the real timer interrupt handler (it's in BIOS), I
- had to use a different workaround. Just before reenabling processor
- interrupts, I now disable further ethernet device interrupts by
- setting its Interrupt Mask Register (IMR) bit. At the end of the
- interrupt handler, I reset the bit. This ensures that I can't get
- further device interrupts even if the timer interrupt clears my ISR bit.
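
Why the mask bit helps can be seen in a small model of the controller's delivery rule. This is a simplified sketch, not driver code: `struct pic` and the function names are hypothetical, only one 8259 is modeled, and the device is assumed to sit on IR5 as in the log reading above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified single 8259: request, in-service and mask registers. */
struct pic { uint8_t irr, isr, imr; };

/* A request is passed to the processor only if it is pending, not
   masked, and no equal or higher priority level is in service. */
static bool would_deliver(const struct pic *p, unsigned irq)
{
    assert(irq < 8);
    uint8_t bit = (uint8_t)(1u << irq);
    uint8_t same_or_higher = (uint8_t)((bit << 1) - 1u); /* bits 0..irq */
    return (p->irr & bit) && !(p->imr & bit) && !(p->isr & same_or_higher);
}

/* Workaround, first half: set our IMR bit before reenabling
   processor interrupts. */
static void mask_irq(struct pic *p, unsigned irq)
{
    assert(irq < 8);
    p->imr |= (uint8_t)(1u << irq);
}

/* Workaround, second half: clear the IMR bit at the end of the handler. */
static void unmask_irq(struct pic *p, unsigned irq)
{
    assert(irq < 8);
    p->imr &= (uint8_t)~(1u << irq);
}

/* The timer handler's Non Specific EOI as seen on the broken clone. */
static void buggy_nonspecific_eoi(struct pic *p)
{
    p->isr = 0;
}
```

With IR5 both in service and pending again, a buggy EOI makes `would_deliver(&p, 5)` true, which is the reentry. If `mask_irq(&p, 5)` is called first, it stays false until `unmask_irq(&p, 5)` runs at the end of the handler.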
-
- Synopsis:
-
- If you write high performance drivers where:
- 1. The interrupt handler runs with other interrupts enabled at the
- processor and at the interrupt controller,
- 2. The interrupt handler reenables interrupts at the device while 1.
- is true and further device interrupts are possible before the
- interrupt handler again disables interrupts and returns,
- 3. You want the driver to work in clones with C&T chips
-
- Then you had better use a technique like the one I use to guarantee that
- you can't get reentrant interrupts.
-
- Drew
-
-
- From nelson@sun.soe.clarkson.edu Thu Jul 26 15:43:12 1990
- Received: from omnigate.clarkson.edu by pear.ecs.clarkson.edu with SMTP
- id AA2822 ; Thu, 26 Jul 90 15:43:11 GMT
- Received: from sun.soe.clarkson.edu by omnigate.clarkson.edu id aa09633;
- 26 Jul 90 15:31 EDT
- Received: by sun.soe.clarkson.edu (4.1/SMI-4.0)
- id AA02415; Thu, 26 Jul 90 15:31:47 EDT
- Message-Id: <9007261931.AA02415@sun.soe.clarkson.edu>
- Return-Path: <nelson>
- Date: Wed, 13 Jun 90 10:16:38 EDT
- To: drivers@sun.soe.clarkson.edu
- From: Drew Daniel Perkins <ddp+@andrew.cmu.edu>
- Sender: nelson@sun.soe.clarkson.edu
- In-Reply-To: <9006120558.AA14002@endor.harvard.edu>
- Subject: Re: Dell System 325 hardware bug
- Reply-To: nelson@clutx.clarkson.edu
-
- ddl@das.harvard.edu (Dan Lanciani) writes:
- > Do you have a little demo program I can try on machines?
-
- Unfortunately, no I don't. Producing the bug took a lot of effort
- including 1 machine (a router) with two interfaces and two other
- machines pinging each other through the first. To generate enough
- traffic, the first machine had a packet exploder which generated 50
- packets for every one going through it.
-
- I certainly believe that it should be possible to write a much simpler
- program to generate the bug, but I definitely don't have time to try
- it... I guess what I would do is write a program that would:
-
- 1. Initialize a "reentered" variable to zero.
- 2. Cause some device to generate an interrupt.
- 3. Have the interrupt handler check the "reentered" variable. If it
- is equal to zero, set it to one and continue to 4. Else, goto 8.
- 4. Reset the interrupt at the device.
- 5. Cause the device to generate another interrupt. 4 and 5 must be
- done in this order to get the interrupt controller's edge
- trigger latch to be set.
- 6. Reenable processor interrupts. Do NOT reenable interrupt
- controller interrupts. I.e. do NOT send an EOI.
- 7. Wait in an infinite loop. Higher priority (i.e. timer) interrupts
- should be able to occur but interrupts from this device should not.
- 8. Print "bad interrupt controller". If you got here then a timer
- interrupt fired and managed to reenable your device interrupts
- by doing a Non Specific EOI to a broken interrupt controller.
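
The logic of the steps above can be walked through against a software model of the controller. To be clear, this is only a simulation sketch and not the demo program asked for: it cannot detect a real chip. `struct pic`, `detects_bad_pic` and the choice of IR5 for the device are assumptions made for illustration, and a second acknowledge of the device interrupt stands in for finding "reentered" nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIMER_IRQ  0u
#define DEVICE_IRQ 5u   /* assumed device line */

struct pic { uint8_t irr, isr; bool clears_all_on_eoi; };

/* A device edge sets the request latch. */
static void raise_irq(struct pic *p, unsigned irq)
{
    assert(irq < 8);
    p->irr |= (uint8_t)(1u << irq);
}

/* Hand the highest-priority deliverable request to the processor.
   Returns the level, or -1 if nothing may be delivered. */
static int acknowledge(struct pic *p)
{
    for (unsigned irq = 0; irq < 8; irq++) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (p->isr & bit)
            return -1;          /* this level and all lower ones are blocked */
        if (p->irr & bit) {
            p->irr &= (uint8_t)~bit;
            p->isr |= bit;
            return (int)irq;
        }
    }
    return -1;
}

/* Non Specific EOI, correct or broken depending on the model. */
static void nonspecific_eoi(struct pic *p)
{
    if (p->clears_all_on_eoi)
        p->isr = 0;
    else
        p->isr &= (uint8_t)(p->isr - 1u);   /* lowest set bit only */
}

/* Returns true if the device handler would be reentered (step 8). */
static bool detects_bad_pic(bool clears_all_on_eoi)
{
    struct pic p = { 0, 0, clears_all_on_eoi };

    raise_irq(&p, DEVICE_IRQ);              /* step 2 */
    if (acknowledge(&p) != (int)DEVICE_IRQ) /* step 3: first entry */
        return false;
    raise_irq(&p, DEVICE_IRQ);              /* steps 4 and 5: new edge latched */
    /* step 6: processor interrupts on, no EOI sent for the device */
    raise_irq(&p, TIMER_IRQ);               /* step 7: a timer tick arrives */
    if (acknowledge(&p) == (int)TIMER_IRQ)
        nonspecific_eoi(&p);                /* what the BIOS timer handler does */
    return acknowledge(&p) == (int)DEVICE_IRQ;
}
```

In this model `detects_bad_pic(false)` reports no reentry, because the device level is still in service after the timer's EOI, while `detects_bad_pic(true)` reports the reentry described in step 8.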
-
- Drew
-
-
-
-