home *** CD-ROM | disk | FTP | other *** search
File List | 1993-12-05 | 13.5 KB | 386 lines |
- Last Change 7/17/93. Please send updates directly to Harald.
-
-
- 86BUGS.LST revision 1.0
- By Harald Feldmann (harald.feldmann@almac.co.uk), mail address:
- Hamarsoft, p.o. box 91, 6114 ZH Susteren, The Netherlands.
- (Please retain my name and address in the document)
-
- This file lists undocumented and buggy instructions of the Intel 80x86
- family of processors. Some of the information was obtained from the book
- "Programmer's technical reference, the processor and coprocessor; by
- Robert L. Hummel; Ziff davis press. ISBN 1-56276-016-5 Which is highly
- recommended. Note that Intel does not support the special features and
- may decide to drop opcode variants and instructions in future products.
-
- All mentioned trademarks and/or tradenames are owned by the respective
- owners and are acknowledged.
-
- Undocumented instructions and undocumented features of Intel and IIT
- processors:
-
- AAD: OPCODE: d5,0a OPCODE VARIANT
-
- This instruction regularly performs the following action:
- - unpacked BCD in AX example (AX = 0104h)
- - AL = AH * 10d + AL (AL = 0eh )
- - AH = 00 (AH = 00h )
-
- The normal opcode decodes as follows: d5,0a
- The instruction itself is an instruction plus operand. By
- replacing the second byte with any number in the range 00 -
- ff we can build our own instruction AAD for various number
- systems in those ranges. For example by coding d5,10 we
- achieve an instruction that performs: AL = AH * 16d + AL.
-
- Note: the variant is not supported on all 80x86-compatible
- CPUs, notably the NEC V-series, because some hard-code the
- divisor at 0Ah
-
- AAM: OPCODE: d4,0a OPCODE VARIANT
-
- This instruction regularly performs the following action:
- - binary number in AL
- - AH = AL / 10d
- - AL = AL MOD 10d
-
- Thus creating an unpacked BCD in AX.
- The normal opcode decodes as follows: d4,0a
- The instruction itself is an instruction plus operand. By
- replacing the second byte with any number in the range 00 -
- ff we can build our own instruction AAM for various number
- systems in that range. For example by coding d4,07 we
- achieve an instruction that performs: AH = AL / 07d, AL = AL
- MOD 07d
-
- The AAD and AAM opcode variants have been found in Future
- Domain SCSI controller ROMS.
-
-
- LOADALL: OPCODE: 0f,05 (i80286) & 0f,07 (i80386 & i80486)
- UNDOCUMENTED
-
- Load _ALL_ processor registers. Does exactly as the name
- suggests, separate versions for i80286 and i80386 exist. The
- i80286 LOADALL instruction reads a block of 102 bytes into
- the chip, starting at address 000800 hex. The i80286 LOADALL
- takes 195 clocks to execute.
- The sequence is as follows (Hex address, Bytes, Register):
-
- 0800: 6 N/A
- 0806: 2 MSW (Machine Status Word)
- 0808: 14 N/A
- 0816: 2 TR (Task Register)
- 0818: 2 FLAGS (Flags)
- 081a: 2 IP (Instruction Pointer)
- 081c: 2 LDT (Local Descriptor Table)
- 081e: 2 DS (Data Segment)
- 0820: 2 SS (Stack Segment)
- 0822: 2 CS (Code Segment)
- 0824: 2 ES (Extra Segment)
- 0826: 2 DI (Destination Index)
- 0828: 2 SI (Source Index)
- 082a: 2 BP (Base Pointer)
- 082c: 2 SP (Stack Pointer)
- 082e: 2 BX (BX register)
- 0830: 2 DX (DX register)
- 0832: 2 CX (CX register)
- 0834: 2 AX (AX register)
- 0836: 6 ES cache (ES descriptor _cache_)
- 083c: 6 CS cache (CS descriptor _cache_)
- 0842: 6 SS cache (SS descriptor _cache_)
- 0848: 6 DS cache (DS descriptor _cache_)
- 084e: 6 GDTR (Global Descriptor Table)
- 0854: 6 LDT cache (Local Descriptor_cache_)
- 085a: 6 IDTR (Interrupt Descriptor table)
- 0860: 6 TSS cache (Task State Segment _cache_)
-
- Descriptor cache entries are internal copies of the
- original registers (the LDT cache is normally a copy of the
- last regularly _loaded_ LDT). Note that after executing
- LOADALL, the chip will use the _cache_ registers without
- re-checking the caches against the regular registers. That
- means that cache and register do not have to be the same.
- Caches are updated when the original register is loaded
- again. Both will then contain the same value.
-
- Descriptor caches layout:
- 3 bytes 24 bit physical address of segment
- 1 byte access rights byte, mapped as access right
- byte in a regular descriptor. The present
- bit now represents a valid bit. If this bit
- is cleared (zero) the segment is invalid and
- accessing it will trigger exception 0dh. The
- DPL (Descriptor Privilege Level) fields of
- the CS and SS descriptor caches determine
- the CPL (Current Privilege Level).
- 2 bytes 16 bit segment limit.
- This layout is the same for the GDTR and IDTR registers,
- except that the access rights byte must be zero.
-
-
- i80386 LOADALL:
- The i80386 variant loads 204 (dec) bytes from the address at
- ES:EDI and resumes execution in the specified state.
- No timing information available.
-
- relative offset: Bytes: Registers:
- 0000: 4 CR0
- 0004: 4 EFLAGS
- 0008: 4 EIP
- 000c: 4 EDI
- 0010: 4 ESI
- 0014: 4 EBP
- 0018: 4 ESP
- 001c: 4 EBX
- 0020: 4 EDX
- 0024: 4 ECX
- 0028: 4 EAX
- 002c: 4 DR6
- 0030: 4 DR7
- 0034: 4 TR
- 0038: 4 LDT
- 003c: 4 GS (zero extended)
- 0040: 4 FS (zero extended)
- 0044: 4 DS (zero extended)
- 0048: 4 SS (zero extended)
- 004c: 4 CS (zero extended)
- 0050: 4 ES (zero extended)
- 0054: 12 TSS descriptor cache
- 0060: 12 IDT descriptor cache
- 006c: 12 GDT descriptor cache
- 0078: 12 LDT descriptor cache
- 0084: 12 GS descriptor cache
- 0090: 12 FS descriptor cache
- 009c: 12 DS descriptor cache
- 00a8: 12 SS descriptor cache
- 00b4: 12 CS descriptor cache
- 00c0: 12 ES descriptor cache
-
- Descriptor caches layout:
- 1 byte zero
- 1 byte access rights byte, same as i80286
- 2 bytes zero
- 4 bytes 32 bit physical base address of segment
- 4 bytes 32 bit segment limit
-
-
- UNKNOWN: OPCODE: 0f,04 UNDOCUMENTED
-
- This instruction is likely to be an alias for the LOADALL on
- the i80286. It is not documented and is even marked as
- unused in the 'Programmer's technical reference'. Still it
- executes on the i80286. >> info wanted <<
-
-
- SETALC: OPCODE: d6 UNDOCUMENTED
-
- This instruction copies the Carry Flag to the AL register.
- In case of a CY, AL becomes ffh. When the Carry Flag is
- cleared, AL becomes 00.
-
-
- Floating Point special instructions:
-
- FMUL4X4: OPCODE: db,f1 IIT ONLY
-
- This instruction is available only on the IIT (Integrated
- Information Technology Inc.) math processors.
- Takes 242 clocks.
- The instruction performs a 4x4 matrix multiply in one
- instruction using four banks of 8 floating point registers.
- The operands must be loaded to a specific bank in a specific
- order. The equation solved can be represented by:
-
- Xn = (A00 * Xo) + (A01 * Xo) + (A02 * Xo) + (A03 * Xo)
- Yn = (A10 * Yo) + (A11 * Yo) + (A12 * Yo) + (A13 * Yo)
- Zn = (A20 * Zo) + (A21 * Zo) + (A22 * Zo) + (A23 * Zo)
- Vn = (A30 * Vo) + (A31 * Vo) + (A32 * Vo) + (A33 * Vo)
-
- Where Xo stands for the original X value and Xn for the
- result. Operands must be loaded to the following registers
- in the specified banks in the specified order.
-
- Before FMUL4X4 After FMUL4X4
-
- bank bank
- Register: 0 1 2 0
-
- ST(0) Xo A33 A31 Xn
- ST(1) Yo A23 A21 Yn
- ST(2) Zo A13 A11 Zn
- ST(3) Vo A03 A01 Vn
- ST(4) A32 A30 ?
- ST(5) A22 A20 ?
- ST(6) A12 A10 ?
- ST(7) A02 A00 ?
-
-
-
- All four banks can be selected by using the bankswitching
- instructions, but only bank 0, 1 and 2 make sense since bank
- 3 is an internal scratchpad. The separate banks can contain
- 8 floating points and may be re-used with normal
- instructions. Each bank acts like an independent i80287,
- except when bankswitched inbetween, in those cases where the
- initial status is not maintained;
-
- Pseudo- multichip operation can be performed in each bank
- and even in multiple banks at the same time (although only
- one instruction will operate on one register at any given
- time), provided that the active register and top register
- are not changed after switching from bank to bank.
-
-
- EXAMPLE:
- FINIT ; reset control word
- FSBP1 ; select bank 1
- FLD DWORD PTR es:[si] ; first original
- FLD DWORD PTR es:[si+4] ; second original
- FLD DWORD PTR es:[si+8] ; third original
- FSTCW WORD PTR [bx] ; save FPU control status
- FSBP2 ; NOTE ! you will see three
- active registers in this
- bank when using a
- debugger
- FINIT ; nothing visible
- FLD DWORD PTR [si] ; new value
- FLD DWORD PTR [si+4] ; second new value
- FADD ST,ST(1) ; two values visible
- FSTP DWORD PTR [si+8] ; one value visible
- FSBP1 ; one original visible
- FLDCW WORD PTR [bx] ; restore FPU status to the
- one active in bank 1,
- causing original three
- values to be visible
- again in correct
- sequence
-
- ... simply continue with what you wanted to do with
- those numbers from es:[si], they are still there.
-
- FLD DWORD PTR [si+8] ; for instance...
-
-
- This feature of the IIT chips can be used to perform complex
- operations in registers with many components remaining the
- same for a large dataset, only saving intermediary results
- to ONE memory location, bankswitching to the next series of
- operands, loading that ONE operand and continuing the
- calculation with the next set of operands already in that
- bank. This does require another read into the new bank but
- may save time and memoryspace compared to memory based
- operands or multiple pass algorithms with multiple arrays of
- intermediary results.
-
-
-
- BANKSWITCH INSTRUCTIONS:
-
- FSBP0: OPCODE: db,e8 IIT ONLY
- Selects the original bank. (default) (6 clocks)
-
-
- FSBP1: OPCODE: db,eb IIT ONLY
-
- Selects bank 1 from FMUL4X4 instruction diagram (6 clocks)
-
-
- FSBP2: OPCODE: db,ea IIT ONLY
-
- Selects bank 2 from FMUL4X4 instruction diagram (6 clocks)
-
- FSBP3: OPCODE: db,e9 IIT ONLY UNDOCUMENTED
- Selects the scratchpad bank3 used by the FMUL4X4 internally.
- Not very useful but funny to look at... How-to: load
- any value into bank 0,1 or 2 until you have a full 8
- registers, then execute this bankswitch. Using a
- debugger like CodeView you are now able to inspect the
- bank3 registers. (most likely to take 6 clocks)
-
-
-
- TRIGONIOMETRIC FUNCTIONS:
-
- Apparently the IIT 2c87 recognises and executes some
- i80387 trigoniometric functions. UNDOCUMENTED
- FSIN (sine) and FCOS (cosine) have been tested and function
- according to the Intel 80387 specifications. FSINCOS
- (available on the Intel 80287XL, 80387 and up) does not
- work.
-
- FSIN: OPCODE: d9,fe IIT 2c87+ (also Intel 80387+) UNDOCUMENTED
- Calculates the sine of the input in radians in ST(0). After
- calculation, ST(0) contains the sine. Takes approximately
- 120 clocks.
-
- FCOS: OPCODE: d9,ff IIT 2c87+ (also Intel 80387+) UNDOCUMENTED
- Calculates the cosine of the input in radians in ST(0).
- After calculation, ST(0) contains the cosine. Takes
- approximately 120 clocks.
-
-
- ... CUT HERE FOR FIRST REVISION, next part is to be revised ...
-
-
-
- Instructions by mnemonic mnemonic:
- opcode: processor: remark & remedy:
-
- AAA i80286 & i80386 & i80486
-
- CMPS i80286
- CMPXCHG i80486
- FINIT
- FSTSW
- FSTCW
-
-
- INS i80286 &
- i80386 &
- i80486
-
- INVD i80486
-
-
-
- MOV to SS n/a early 8088 Some early 8088 would not properly
- disable interrupts after a move to
- the SS register. Workaround would
- be to explicitly clear the
- interrupts, update SS and SP and
- then re-enable the interrupts.
- Typically this would occur in a
- situation where one would relocate
- a stack in memory, more than 64Kb
- from the original one, updating
- both SS and SP like in:
- MOV SS,AX ; would disable
- interrupts
- automatically during
- this and next
- instruction.
- MOV SP,DX ; interrupts disabled
- ... ; interrupts enabled.
-
-
- multiple prefixes
- with REPx 8088 & 8086 They would not properly restart at
- the first prefix byte after an
- interrupt. when more than one
- prefix is used. e.g. LOCK REP MOVSW
- CS:[bx]. A workaround is to test
- after the instruction for CX==0,
- here: LOCK REP MOVSW CS:[BX] OR
- CX,CX JNZ here because of the CS
- override, the REP and LOCK prefixes
- would not be recognised to be part
- of the instruction and the REP MOVSW
- would be aborted. This also seems to
- be the case for a REP MOVSW CS:[BX]
- Note that this also implies that
- REPZ, REPNZ are affected in SCASW
- for instance.
-
-
-