home *** CD-ROM | disk | FTP | other *** search
Text File | 1993-03-16 | 40.5 KB | 1,073 lines | [TEXT/KAHL] |
- /*
- SetEntriesQuickly.c
-
- Load the clut as quickly as possible. SetEntriesQuickly has not been tested on
- all the devices that it's supposed to support. Run TimeVideo for a thorough
- test.
-
- WARNING: This file REQUIRES 68020 or better, and includes a #pragma option
- that forces the compiler to assume its presence. So you should call Gestalt
- to make sure you've got a 68020 before calling this subroutine. (This requirement
- could be eliminated by rewriting a few nonessential assembly language instructions
- to be compatible with the 68000, but I (dgp) am not sufficiently fluent
- in assembly to be able to do that quickly.)
-
- This has been written by many people, and the "final" result has not been tested
- on all the computers and video devices it's meant to support. The best test is
- simply to run TimeVideo, which gives it a thorough workout on all of your
- computer's video devices. Please send the report "TimeVideo results" to
- denis_pelli@isr.syr.edu, and I'll add your test results to "Video synch"
- (section E). Naturally, we'll try to fix bugs as they are discovered.
-
- Some Macintosh video drivers are poorly written; they take too long (more than a
- frame time) to load the clut. This is makes it impossible to do clut animation
- for temporal modulation etc., for which one needs to be able to reload the clut
- on each frame. At one time many of us thought that the limitation was in
- hardware, in the RAMDAC, but we were wrong. Raynald Comptois disassembled
- several video drivers and wrote his own programs to load the clut quickly, and
- his programs manage to do it within a frame. Raynald was kind enough to share
- his code with me. I passed it on to Peter Lennie and Bill Haake, who polished
- it, making it compatible with the 68040 processor, and added support for more
- cards. I polished their work, made the routines self contained, adding a
- "device" argument to allow use in Macs that have more than one video device, and
- quickly figuring out all the key parameters (mode, pixelSize, DAC size, and clut
- size). There are now two alternative front ends: SetEntriesQuickly() for new
- users, and macltset() for backward compatibility with programs that used
- Raynald's original routines. This modularity has increased runtime only
- slightly, a fraction of a millisecond.
-
- New drivers are hard to write, since they must directly address the registers of
- the video card, which are unique to each video card and undocumented. So the
- author of a driver must disassemble the original video driver and figure things
- out on his or her own. A lot of work.
-
- SetEntriesQuickly is unlike the standard video drivers in the following ways:
- 1. SetEntriesQuickly always takes less than 2 ms to load the whole clut.
- Some video drivers, e.g. for Apple's 8•24 card, often take several frames to
- finish loading the clut.
- 2A. SetEntriesQuickly does not wait for VBL, so a visible glitch may appear on
- the current frame, at least on some older video cards. (You can prevent this
- glitch by calling SetEntriesQuickly only at blanking time. Use a VBL task or
- WaitForBlanking().) If you write to the tables out of synch with the VBL you get
- glitches on most cards. Some new cards seem to have dual ported memory,
- as does the Quadra, so
- you can write at any point in the frame cycle without noticeable glitch, though
- of course you may for other reasons want to synch the update.
- 2B. A flag argument "waitForNextBlanking" is provided, but at present this
- option is only supported for the Toby video card.
- 3. SetEntriesQuickly ignores the gamma table, yielding the same result as using
- the standard video driver with an uncorrected gamma table. E.g. after calling
- GDUncorrectedGamma(device).
- 4. SetEntriesQuickly ignores the setting of the gray/color mode bit, always
- assuming color.
- 5. SetEntriesQuickly() does nothing and returns and error if the arguments
- specify loading of an out-of-range index. This is contrary to the Apple
- specification for the SetEntries Toolbox subroutine that when the clut index
- is out of range the driver implementing a
- setEntries control call should wrap around within the legal index range.
- 6. SetEntriesQuickly() does nothing and returns an error if the "count"
- specifies zero entries. This is contrary to the Apple specification that a
- setEntries control call with a count corresponding to zero entries will result
- in loading of entries specified by the "value" field of the ColorSpec.
- 7. SetEntriesQuickly does not have immediate access to the video driver's
- private tables. Therefore the first time you call SetEntriesQuickly() for a
- particular device there is an extra delay of about 1 ms while some key
- information is ferreted out. That information is cached, so subsequent calls
- for the same device will be fast, spending most of their time loading the clut.
-
- Two front ends are provided, for compatibility with two distinct traditions:
-
- OSErr SetEntriesQuickly(GDHandle device,short start,short count,ColorSpec *table);
-
- SetEntriesQuickly() uses the same calling convention as the VideoToolbox routine
- GDSetEntries() and, except for adding the GDHandle argument to specify the
- device, is also the same as Apple's SetEntries() Color Manager routine,
- documented in Inside Macintosh V-143. Apple specifies special behavior when
- count==-1, but we don't support that here and simply return with an error. I
- suggest that new users use SetEntriesQuickly. "start" is the index of the first
- clut entry to load, and should be greater than or equal to zero. "count" is the
- number of entries to load, minus 1. (Yes, "minus 1", that's Apple's convention.)
- "table" is a Colorspec array. (Each ColorSpec element is a structure consisting
- of a two-byte "value", which is not used, and a 6-byte "rgb", which, in turn is
- a structure of three 16-bit unsigned short ints: red, green, and blue. Apple's
- convention is that the MOST SIGNIFICANT BITS of the 16-bit color values are
- used. It is good practice in your programs to provide full 16-bit values, so
- that when you upgrade to fancier video cards with more-than-eight bit DACs your
- programs will benefit from the extra precision without needing any change.
- Returns zero if successful, nonzero if unsuccessful, i.e. illegal arguments.
-
- short macltset(GDHandle device,short start
- ,short* red,short* green,short* blue,short count1);
-
- macltset() uses a calling convention established by Raynald Comtois, and
- provides backward compatibility with older programs. "red", "green", and "blue"
- are arrays of 16-bit unsigned short ints, of which the LEAST SIGNIFICANT BITS
- are used. "start" is the index of the first entry to change. "count1" is the
- number of entries to change (contrary to Apple's convention).
-
- Both front ends use the same general-purpose subroutine: LoadClut(), which in
- turn calls the hardware-specific routine appropriate to the particular video
- device being used.
-
- The useMostSignificantBits bit of the "flags" argument specifies whether to use
- Apple's convention (for users of SetEntriesQuickly) or Raynald's convention
- (for users of macltset).
-
- At the moment all the supported video cards have 8-bit DACs, except the RasterOps
- ProColor 32, which has 9-bit DACs. If the useMostSignificantBits flag is true
- then you don't need to worry, as the least significant bit of the 9-bit DAC
- simply picks up the next lower bit from your numbers, giving you a tad more
- precision. However, if useMostSignificantBits flag is false then, in order to
- use the full range of the DAC you must make all your numbers twice as big, or --
- cludge time! -- set the useOnly8Bits flag, to request that your 8-bit numbers be
- multiplied by two, allowing you to use the whole range of the DAC without
- changing the rest of your program, but wasting the DAC's least significant bit
- by setting it permanently to zero.
-
- SUSPENDING INTERRUPTS. If you wish, the low-level routines will suspend
- interrupts while loading the clut. Presumably Raynald had his reasons for
- implementing this, so this "feature" is enabled when you use macltset(). Peter
- Lennie writes, "The switch to uninterruptable processor during the write is, I
- think, out of the original drivers (though I'm not absolutely sure). I imagine
- it's to avoid display glitches that would result from some higher priority
- interrupt suspending a clut rewrite somewhere in the middle." However, I
- don't see any advantage to suspending interrupts, and believe that there is a
- significant downside if you are trying to keep track of the VBL interrupts on
- several video cards, since suspending interrupts for 1 or 2 ms might be long
- enough to miss a whole VBL interval. Thus SetEntriesQuickly disables this
- "feature". However, this is not a philosophical debate. We all agree that
- interrupts should be suspended if doing otherwise would occasionally result in a
- visible glitch. Does anyone know?
-
- OSErr WaitForNextBlanking(GDHandle device);
-
- Waits for beginning of next blanking interval. Currently this supports only the
- Toby card (Apple's original video card, now obsolete).
-
- SPEED. SetEntriesQuickly() is self contained. You simply give it the GDHandle of
- your video device (as returned, e.g. by GetScreenDevice), and tell it what you
- want to do to the clut. In order to do this for you it needs to figure out a
- bunch of stuff about your video device. This research takes time; the first time
- you call it for a particular device it takes on the order of 1 ms to look up
- stuff. However, much of this stuff is saved in a cache, for each device, and can
- be retrieved quickly next time. Some users might be concerned that all this
- folderol is excessive for a fast operation that we may want to do at interrupt
- time. My answer is that the overhead is modest compared to the 1 or 2 ms
- required to actually load the clut, which, in my view, is too long a time to
- keep interrupts suspended anyway. My changes to the assembly language routines
- to deal with variable elementSpacing and using most- or least-significant bits
- have added extra operations. However, it is important to realize that the time
- penalty is insignificant because fast processors like the 68020, 68030, 68040
- are primarily limited by the time to access memory; by overlapping instruction
- execution they may lose almost no time at all doing register-to-register
- arithmetic. The new operations do not involve any new memory access.
-
- The coding of the LoadClut "driver" routines is a compromise between the needs
- of SetEntriesQuickly and macltset, which both use them. I decided not to write
- separate clut loading loops for the two cases (use most vs least significant
- bits). I believe (but have not tested) that adding a register offset instead of
- using an autoincrement instruction incurs essentially no time penalty because
- the processor automatically overlaps the execution of such instructions. So I
- think that SetEntriesQuickly is running flat out, and don't see any prospect of
- speeding it up significantly. On the other hand, I suspect that fetching the
- least significant byte by doing a byte access to an odd address (for macltset)
- does slow things down perhaps 30% (though I haven't timed it) over doing a word
- access to an even address, as Raynald had originally coded it. If that speed
- loss is unacceptable, then one could insert an if(flags&useMostSignificantBits)
- statement into the relevant subroutine and write two separate loops optimized
- for the two cases. My guess is that the current compromise will be acceptable to
- all users.
-
- NOTE: The first call of SetEntriesQuickly for each device is slow, but some
- stuff is cached so subsequent calls will be quick. The implication is that
- programs that use SetEntriesQuickly ought to call it once just for practice (to
- get the cache loading over with) before using it in a situation where speed
- matters.
-
- REQUIRES 68020 or better.
-
- IMPROVEMENTS:
- It is hoped that others will add to the functionality of this routine. Please
- share your enhancements with others by sending them to denis_pelli@isr.syr.edu
- for inclusion in the VideoToolbox.
-
- Those wishing to support new video devices should begin by buying and reading
- Apple's Designing Cards and Drivers, 3rd Ed., Addison-Wesley, and then use the
- VideoToolbox utility GetVideoDrivers to copy all your drivers into resource
- files, and use ResEdit with CODE editor to peruse them. The ResEdit CODE editor
- is a public domain file distributed by:
- Ira L. Ruben
- Apple Computer, Inc.
- 20525 Mariani Ave., MS: 37-A
- Cupertino, Ca. 95014
- Ira@Apple.Com
- ftp.apple.com:/dts/mac/tools/resedit/resedit-2-1-code-editor...
-
- It is logical that we identify the video card by the card name,
- GDCardName(driver), but in fact getting the card name is very slow (1.5 ms)
- whereas getting the driver name is fast, GDName(driver), and would be
- sufficiently unique for our purposes. (E.g. the Toby and TFB video cards have
- the same driver, and our code works for both cards.) Note that GDName() returns
- the address of a pascal string that should not be modified.
-
- KNOWN BUGS:
- Has not been tested on all the video devices that are supposed to be supported.
- Please run the demo TimeVideo, and send the results file to denis_pelli@isr.syr.edu
-
- The Quadra code requires that start==0. This could probably be figured out and
- fixed pretty easily if someone took the time to do so.
-
- These routines do not wait for the vertical blanking interval before loading the
- clut. On many video devices this results in visible hash on the screen. I (dgp)
- consider this a bug, but, for most of these devices I don't know how to wait for
- the end of frame, short of setting up an interrupt. (Just about every video card
- has a bit that one could monitor, but its address is usually undocumented.) Some
- devices may be ok, because of dual-ported RAMDAC memory, or hardware buffering
- of the clut-load request. Check this out on your device: try TimeVideo.
-
- HISTORY:
- 8/24/92 Original setcardbase and macltset provided by Raynald Comtois
- (raco@wjh12.harvard.edu) to Denis Pelli.
-
- 10/2/92 Bill Haake added code for the RasterOps ProColor 32, which has 9-bit
- DACs and 9-bit entries in the lookup table.
-
- 10/1/92 Peter Lennie added code for Quadra internal video. No provision for
- changing the start position in the table, (I couldn't find any relevant
- disassembly) so 'start' is ignored, and you should write the whole table.
-
- 9/30/92 Bill Haake & Peter Lennie modified the code for the 8x24 card
- and the 8x24GC to make it a) work properly in 32-bit mode. b) to fix a bug
- (feature?) of the original drivers that prevented the cards running on a Quadra.
- The drivers exploit 'byte-smearing' on the 68020 and 68030 (Tech Note 282). This
- means that one can move a byte to the lowest byte address of the data register
- on the card, when one actually wants to put it at address+3 (!!). The functions
- work for all the cards (except toby, which hasn't been tested) and on internal
- video in both 24 and 32 bit mode on Quadra 700/950, IIfx or ci running system
- 7.0.1.
-
- 9/28/92 Peter Lennie added the function findcard.
-
- 11/23/92 Denis Pelli (dgp) eliminated all globals because they implicitly
- assumed that there is only one video device. All routines now accept a GDHandle
- specifying which video device. Simplified the logic of GetCardBase(), minimizing
- the dependence on card type.
-
- 11/25/92 dgp When USE_MSB is true all the routines now use the most significant
- bits of the 16-bit elements of the user-supplied color tables. When it is false
- the least significant bits are used. This is mostly implemented by offseting the
- table pointers by one byte and only reading the desired byte. •Generalized
- macltset() to work with tables that have an arbitrary element spacing. This
- allows it to work with both with Raynald's convention of three arrays of shorts,
- and the Apple convention of a ColorSpec array, each element of which consists of
- red, green, blue, and value (which is not used). •Added alias "Toby frame buffer
- card" for tobycard.
-
- 11/27/92 dgp Broke out the code for each card into separate subroutines. This
- allows optimal register assignment for each routine, and makes it easier to read
- the THINK C disassembler output. The runtime overhead of loading and unloading
- the stack is negligible, and could be eliminated entirely by putting all the
- parameters in a structure and passing a pointer to it. •Added a flag,
- suspendInterrupts, to make interrupt suspension optional since it may be
- undesirable in some applications. (Blocking interrupts for 1 ms could cause you
- to miss the interrupt from a video card, especially if you are trying to keep
- track of interrupts on several video cards at once.)
-
- 11/30/92 dgp Wrote TestCluts, which reads back the clut and checks all
- values, and used it to test SetEntriesQuickly() on Quadra 950 internal video,
- Mac IIci internal video, hirescard, "Toby frame buffer card", and 8•24 card at
- all depths, for both 24- and 32-bit addressing. Toby card was tested on 68020,
- 68030, and 68040 processors. •Wrote documentation. •Replaced compile-time constants
- USE_MSB and PRO_8BITS by runtime flags passed as arguments. •Added WaitForNextBlanking()
- based on code from VideoTFB.c.
-
- 12/3/92 dgp Incorporated Peter Lennie's corrections and additions to the comments above.
-
- 12/8/92 dgp Added missing "case" to switch in WaitForNextBlanking.
-
- 12/13/92 dgp Changed erroneous "&d" to "%d" in a printf. Added some comments to
- the documentation above.
-
- 12/15/92 dgp Now get mode from device record and leave it in standard form,
- i.e. with the 0x80 bit set, and only strip off that bit when actually necessary,
- e.g. in LoadClutMacIIci. This allows comparison with Apple's predefined mode
- values, oneBitMode, etc., though that is done here at present. However, ordered
- comparisons would give more intuitive results if mode were declared as long, so
- that the sign bit would not be set.
-
- 12/30/92 dgp Make sure routines return zero when there's no error.
-
- 1/6/93 dgp Handle GDGetGamma failure gracefully, just guess at the dacSize.
-
- 2/15/93 dgp Rewrote nonworking LoadClutToby in C, and made it work. Rewrote
- nonworking LoadClutx824 in C, and made it work. Fixed sixteenBitMode in
- LoadClutQuadra. Use new SwapPriority instead of Get/SetPriority.
-
- 2/20/93 dgp Translated LoadClutGCx824 to C. (It was ignoring the start value.)
-
- 3/4/93 dgp Added macIIsi to list of supported cards, since it uses the same
- driver as the Mac IIci. Changed definitions of string types slightly to allow
- compilation of this file as a code resource. However, the assembly code
- uses more registers than are available to a code resource.
- */
- #include "VideoToolbox.h"
- #include <assert.h>
- #define USE_ONLY_8_BITS_IN_MACLTSET 0 // 1 to use RasterOps ProColor32 as an 8-bit DAC.
- #define C_CODE 1 // use C code to load clut, instead of assembly.
- // The C code is only about 10% slower than the assembly, and
- // is otherwise equivalent.
- // As of 11/30/92 this only affects the code for the hirescard.
- #pragma options(assign_registers,honor_register,redundant_loads,defer_adjust)
- #pragma options(global_optimizer,gopt_induction,gopt_loop,gopt_cse,gopt_coloring)
- #pragma options(mc68020)
-
- // These are the five user-callable routines:
- OSErr WaitForNextBlanking(GDHandle device);
- OSErr SetEntriesQuickly(GDHandle device,short start,short count,ColorSpec *table);
- short macltset(GDHandle device,short start
- ,short* red,short* green,short* blue,short count1);
- short GetCardType(GDHandle device);
- char *GetCardBase(GDHandle device);
-
- /*
- I suggest keeping the following information private to this file. In principle
- you could publish these card types and use them in your programs. However, in
- practice, I cannot see any point in doing so. If you need to identify the card
- name I suggest you simply use the string returned by GDCardName(device) in
- GDVideo.c of the VideoToolbox. (Don't forget to call DisposPtr() when you're
- through with the string.) Or use GDName(device), which returns the name of the
- card's driver, and is much quicker. If you simply want to know whether your
- video card is supported by SetEntriesQuickly.c then you can simply make sure
- that GetCardType() returns a nonzero type.
- */
- struct vtype { /* associates card name and id */
- char name[40];
- short id;
- };
- enum { /* card identifiers */
- tobycard = 1,
- hirescard,
- macIIci,
- macIIsi,
- x824card,
- x824GCcard,
- quadra700,
- quadra900,
- quadra950,
- procolor32
- };
- static struct vtype card[] = { // card name & id // Original author:
- {"Toby frame buffer card", tobycard}, // Raynald Comtois
- {"Display_Video_Apple_TFB", tobycard}, // " "
- {"Mac II High-Resolution Video Card", hirescard}, // Raynald Comtois
- {"Macintosh II Built-In Video", macIIci}, // Raynald Comtois
- {"Macintosh A Built-In Video", macIIsi}, // " "
- {"Macintosh Display Card", x824card}, // Raynald Comtois
- {"Macintosh Display Card 8•24 GC", x824GCcard},// Raynald Comtois
- {"Macintosh E Built-In Video", quadra700}, // Peter Lennie
- {"Macintosh C Built-In Video", quadra900}, // " "
- {"Macintosh G Built-In Video", quadra950}, // " "
- {"ProColor 32", procolor32} // Bill Haake
- };
- static char driverName[][40]= // Not used at present.
- {
- "\p.Display_Video_Apple_TFB" // Apple “Toby frame buffer card”
- ,"\p.Display_Video_Apple_HRVC" // Apple “Mac II High-Resolution Video Card”
- ,"\p.Display_Video_Apple_MDC" // Apple 8•24 “Macintosh Display Card”
- ,"\p.???" // Apple 8•24GC
- ,"\p.Display_Video_Apple_RBV1" // Mac IIci and IIsi built-in video
- ,"\p.Display_Video_Apple_DAFB" // Quadra 700, 900, 950 built-in video
- ,"\p.???" // Radius ProColor 32
- };
- enum { // Flags passed to LoadClut().
- suspendInterrupts=1,
- useMostSignificantBits=2,
- useOnly8Bits=4,
- waitForNextBlanking=8
- };
- enum{quadraNonzeroStart=111}; // value returned as error.
-
- short LoadClut(GDHandle device,short start,short count
- ,short* red,short* green,short* blue,long elementSpacing,short flags);
- OSErr LoadClutProColor(short start,short count,char *r,char *g,char *b
- ,long elementSpacing,short mode,short pixelSize,short ctSize
- ,char *cardBase,short flags);
- OSErr LoadClutQuadra(short start,short count,char *r,char *g,char *b
- ,long elementSpacing,short mode,short pixelSize,short ctSize
- ,char *cardBase,short flags);
- OSErr LoadClutMacIIci(short start,short count,char *r,char *g,char *b
- ,long elementSpacing,short mode,short pixelSize,short ctSize
- ,char *cardBase,short flags);
- OSErr LoadClutHiRes(short start,short count,char *r,char *g,char *b
- ,long elementSpacing,short mode,short pixelSize,short ctSize
- ,char *cardBase,short flags);
- OSErr LoadClutx824(short start,short count,char *r,char *g,char *b
- ,long elementSpacing,short mode,short pixelSize,short ctSize
- ,char *cardBase,short flags);
- OSErr LoadClutx824GC(short start,short count,char *r,char *g,char *b
- ,long elementSpacing,short mode,short pixelSize,short ctSize
- ,char *cardBase,short flags);
- OSErr LoadClutToby(short start,short count,char *r,char *g,char *b
- ,long elementSpacing,short mode,short pixelSize,short ctSize
- ,char *cardBase,short flags);
- /******************************************************************************/
- /*
- The arguments start, count, and table are the same as for the Color Manager call
- SetEntries(), documented in Inside Macintosh V-143. (Except that a count==-1 is
- considered illegal here.) Apple's ideosyncratic convention is that "count" is
- "zero-based", meaning that it is one less than the number of clut entries that
- you want to modify. "count" must be at least zero. Returns zero if successful,
- nonzero if unsuccessful, i.e. illegal arguments.
- */
- OSErr SetEntriesQuickly(GDHandle device,short start,short count,ColorSpec *table)
- {
- short flags=useMostSignificantBits;
- //flags+=suspendInterrupts; // Optional, no
- //flags+=waitForNextBlanking; // Optional, no
-
- return LoadClut(device,start,count
- ,(short *)&table[0].rgb.red,(short *)&table[0].rgb.green
- ,(short *)&table[0].rgb.blue,sizeof(table[0]),flags);
- }
- /******************************************************************************/
- short macltset(GDHandle device,register short start
- ,short* red,short* green,short* blue,short count1)
- {
- short flags=0;
- flags+=suspendInterrupts; // Optional, yes
- #if USE_ONLY_8_BITS_IN_MACLTSET
- flags+=useOnly8Bits; // Optional
- #endif
- //flags+=waitForNextBlanking; // Optional, no
-
- return LoadClut(device,start,count1-1,red,green,blue,sizeof(red[0]),flags);
- }
- /******************************************************************************/
- /*
- The first call to GetCardType for a particular device takes 1.5-3 ms, depending
- on your computer's speed, because it takes Apple's Slot Manager a long time to
- get the card name. However, GetCardType's answers are cached so subsequent calls
- for a previously queried device will be fast <100 µs.
- */
- short GetCardType(GDHandle device) // returns card type, if known, or zero if not.
- {
- register short i;
- short type;
- char *name;
- static GDHandle deviceCache[MAX_SCREENS];
- static short typeCache[MAX_SCREENS];
-
- // Do we already know the answer? Check the cache.
- for(i=0;i<MAX_SCREENS;i++)
- if(device==deviceCache[i])return typeCache[i];
- // Get card name, see if it's in our list of known cards
- name=GDCardName(device);
- type=0;
- for (i=0; i<sizeof(card)/sizeof(card[0]); i++){
- if(strcmp(name,card[i].name)==0){
- type=card[i].id;
- break;
- }
- }
- DisposePtr(name);
- // Save answer in cache.
- for(i=0;i<MAX_SCREENS;i++){
- if(deviceCache[i]==0){
- typeCache[i]=type;
- deviceCache[i]=device;
- break;
- }
- }
- return type;
- }
- /******************************************************************************/
- long internalVideoBase:0xDD8; // Undocumented System global
-
- char *GetCardBase(GDHandle device)
- {
- long cardBase,slot; /* slot must be declared long */
- short type;
-
- slot=GetDeviceSlot(device);
- if(slot==0){
- // Built-in video, not in a NuBus slot.
- // E.g.: macIIci,quadra700,quadra900,quadra950
- #if 1
- // This C is equivalent to Raynald's assembly code below.
- cardBase = *(long *)(internalVideoBase + *(long *)internalVideoBase + 56);
- #else
- asm {
- move.l 0xDD8,a0 /* get card base address */
- adda.l (a0),a0
- move.l 56(a0),a1
- move.l a1,cardBase
- }
- #endif
- }else{
- // Video card in NuBus slot
- type=GetCardType(device);
- switch(type){
- case x824GCcard:
- cardBase = slot << 28; /* a superslot */
- break;
- case tobycard:
- cardBase=0x01100000*slot | 0xF0000000;
- break;
- case hirescard:
- case procolor32: /* RasterOps Board */
- case x824card:
- default:
- cardBase = (slot<<24) | 0xF0000000;
- break;
- }
- }
- return (char *)cardBase;
- }
- /******************************************************************************/
- short LoadClut(GDHandle device,short start,short count
- ,short* red,short* green,short* blue
- ,long elementSpacing,short flags)
- {
- char *cardBase;
- short type=0; // type of card
- short ctSize; // entries, minus 1, in the lookup table
- short pixelSize; // Log2L(ctSize+1)
- short mode; // 0x80+Log2L(pixelSize)
- OSErr error;
- GammaTbl *gammaTblPtr=NULL;
- short dacSize=8; // bits (typically 8 or 9)
- static GDHandle deviceCache[MAX_SCREENS];
- static short typeCache[MAX_SCREENS],dacSizeCache[MAX_SCREENS];
- int i;
- short inCache=0;
-
- assert(sizeof(ctSize)==2);
- assert(sizeof(*cardBase)==1);
- if(device==NULL)return 1;
- // This setting up, before actually loading the clut, takes 1.7 ms
- // the first time a particular device
- // is used, but, only 200 µs for each subsequent use of that device
- // because we cache the key answers. Note that we are careful to
- // only save things that won't change: the card type and the DAC size.
- // The mode (and pixelSize and ctSize) could be changed by the user at any time
- // by calling Apple's SetDepth() or the VideoToolbox's GDSetMode().
- for(i=0;i<MAX_SCREENS;i++){ // Look in cache.
- if(device==deviceCache[i]){
- type=typeCache[i];
- dacSize=dacSizeCache[i];
- inCache=1;
- break;
- }
- }
- if(!inCache){ // Not in cache.
- type=GetCardType(device); // Takes 1.5 ms
- if(type==0)return 1;
- error=GDGetGamma(device,&gammaTblPtr); // Takes 200 µs.
- if(error){
- // printf("SetEntriesQuickly.c:LoadClut(): GDGetGamma() failed with error %d\n",error);
- dacSize=8;
- }else dacSize=gammaTblPtr->gDataWidth;
- for(i=0;i<MAX_SCREENS;i++){ // Save in cache.
- if(deviceCache[i]==NULL){
- deviceCache[i]=device;
- typeCache[i]=type;
- dacSizeCache[i]=dacSize;
- break;
- }
- }
- }
- if(type==0)return 1;
- cardBase=GetCardBase(device);
- // Get device mode from GDevice record. Fast but unreliable if user
- // calls GDSetMode().
- ctSize=(**(**(**device).gdPMap).pmTable).ctSize;
- pixelSize=(**(**device).gdPMap).pixelSize;
- mode=(**device).gdMode;
- if(0){ // Get mode by asking the device itself. Reliable but slow.
- // Conceivably one might be able to locate and directly access
- // the driver's internal table.
- error=GDGetMode(device,&mode,NULL,NULL); // takes 0.3-0.6 ms.
- pixelSize=1<<(mode&7);
- }
-
- // Check range.
- if(start>ctSize || start<0 || count+start>ctSize || count<0)return 1;
-
- if(waitForNextBlanking & flags){
- WaitForNextBlanking(device);
- }
- // After the above setting up, actually loading 256x3 clut entries takes 1-2 ms.
- red = (short *)StripAddress(red);
- green = (short *)StripAddress(green);
- blue = (short *)StripAddress(blue);
- switch (type) {
- // I packaged the code for each case into a separate subroutine
- // in order to allow the THINK C compiler to optimally optimize each
- // one independently. An important consideration is that the THINK C 5.04
- // compiler disables most optimizations for any function that includes
- // the "asm" directive anywhere within the function. Thus mixing C and assembly
- // will result in inefficient C.
- case procolor32:
- error=LoadClutProColor(start,count,(char *)red,(char *)green,(char *)blue,
- elementSpacing,mode,pixelSize,ctSize,cardBase,flags);
- break;
- case quadra700:
- case quadra900:
- case quadra950:
- error=LoadClutQuadra(start,count,(char *)red,(char *)green,(char *)blue,
- elementSpacing,mode,pixelSize,ctSize,cardBase,flags);
- break;
- case macIIci:
- case macIIsi:
- error=LoadClutMacIIci(start,count,(char *)red,(char *)green,(char *)blue,
- elementSpacing,mode,pixelSize,ctSize,cardBase,flags);
- break;
- case hirescard:
- error=LoadClutHiRes(start,count,(char *)red,(char *)green,(char *)blue,
- elementSpacing,mode,pixelSize,ctSize,cardBase,flags);
- break;
- case x824card:
- error=LoadClutx824(start,count,(char *)red,(char *)green,(char *)blue,
- elementSpacing,mode,pixelSize,ctSize,cardBase,flags);
- break;
- case x824GCcard:
- error=LoadClutx824GC(start,count,(char *)red,(char *)green,(char *)blue,
- elementSpacing,mode,pixelSize,ctSize,cardBase,flags);
- break;
- case tobycard:
- error=LoadClutToby(start,count,(char *)red,(char *)green,(char *)blue,
- elementSpacing,mode,pixelSize,ctSize,cardBase,flags);
- break;
- }
- return error;
- }
- /******************************************************************************/
- OSErr LoadClutProColor(short start,register short count
- ,register char *red,register char *green,register char *blue
- ,register long elementSpacing
- ,short mode,short pixelSize,short ctSize,char *cardBase,short flags)
- {
- char mmuMode=true32b,priority=7;
- register long bitShift;
-
- if(useMostSignificantBits & flags){
- bitShift=9;
- }else{
- if(useOnly8Bits & flags) bitShift=1;
- else bitShift=0;
- }
- if(suspendInterrupts & flags)SwapPriority(&priority);
- SwapMMUMode(&mmuMode);
- asm {
- move.l cardBase,a1 /* get card base address */
- adda.l #0xf60000,a1 /* offset to control registers */
- @9 move.w start,2(a1) /* Set the index on the card */
- move.w (red),d1
- add.l elementSpacing,red
- lsl.w bitShift,d1
- move.w d1,14(a1)
- move.w (green),d1
- add.l elementSpacing,green
- lsl.w bitShift,d1
- move.w d1,14(a1)
- move.w (blue),d1
- add.l elementSpacing,blue
- lsl.w bitShift,d1
- move.w d1,14(a1)
- addq.w #1,start /* Point to next entry in table */
- dbf count,@9
- }
- SwapMMUMode(&mmuMode);
- if(suspendInterrupts & flags)SwapPriority(&priority);
- return 0;
- }
- OSErr LoadClutQuadra(short start,register short count
- ,register char *red,register char *green,register char *blue
- ,register long elementSpacing
- ,short mode,short pixelSize,short ctSize,char *cardBase,short flags)
- {
- char mmuMode=true32b,priority=7;
-
- if(start!=0){
- //printf("LoadClutQuadra: start must be zero\n");
- return quadraNonzeroStart;
- }
- if(!(useMostSignificantBits & flags)){
- // Point to less significant byte of word.
- red++;
- green++;
- blue++;
- }
- if(suspendInterrupts & flags)SwapPriority(&priority);
- SwapMMUMode(&mmuMode);
- if(mode!=sixteenBitMode)asm{
- move.l cardBase,a1
- lea 0x210(a1), a1
- clr.l -16(a1)
- @4 move.b (red),d1
- add.l elementSpacing,red
- move.l d1,(a1)
- move.b (green),d1
- add.l elementSpacing,green
- move.l d1,(a1)
- move.b (blue),d1
- add.l elementSpacing,blue
- move.l d1,(a1)
- dbf count,@4
- }else asm{
- // In sixteenBitMode the clut addressing is weird.
- // I arrived at the following solution by trial and error.
- // It's a kludge, but is still fast enough. dgp.
- move.l cardBase,a1
- lea 0x210(a1), a1
- clr.l -16(a1)
- @44 move.b (red),d1
- move.l d1,(a1)
- move.b (green),d1
- move.l d1,(a1)
- move.b (blue),d1
- move.l d1,(a1)
-
- move.b (red),d1
- move.l d1,(a1)
- move.b (green),d1
- move.l d1,(a1)
- move.b (blue),d1
- move.l d1,(a1)
-
- move.b (red),d1
- move.l d1,(a1)
- move.b (green),d1
- move.l d1,(a1)
- move.b (blue),d1
- move.l d1,(a1)
-
- move.b (red),d1
- move.l d1,(a1)
- move.b (green),d1
- move.l d1,(a1)
- move.b (blue),d1
- move.l d1,(a1)
-
- move.b (red),d1
- move.l d1,(a1)
- move.b (green),d1
- move.l d1,(a1)
- move.b (blue),d1
- move.l d1,(a1)
-
- move.b (red),d1
- move.l d1,(a1)
- move.b (green),d1
- move.l d1,(a1)
- move.b (blue),d1
- move.l d1,(a1)
-
- move.b (red),d1
- move.l d1,(a1)
- move.b (green),d1
- move.l d1,(a1)
- move.b (blue),d1
- move.l d1,(a1)
-
- move.b (red),d1
- add.l elementSpacing,red
- move.l d1,(a1)
- move.b (green),d1
- add.l elementSpacing,green
- move.l d1,(a1)
- move.b (blue),d1
- add.l elementSpacing,blue
- move.l d1,(a1)
-
- dbf count,@44
- }
- SwapMMUMode(&mmuMode);
- if(suspendInterrupts & flags)SwapPriority(&priority);
- return 0;
- }
- OSErr LoadClutMacIIci(register short start,register short count
- ,register char *red,register char *green,register char *blue
- ,register long elementSpacing
- ,short mode,short pixelSize,short ctSize,char *cardBase,short flags)
- {
- static char realstartindex[] = {0xFE, 0xFC, 0xF0, 0x00};
- char priority=7;
-
- if(!(useMostSignificantBits & flags)){
- // Point to less significant byte of word.
- red++;
- green++;
- blue++;
- }
- if(suspendInterrupts & flags)SwapPriority(&priority);
- mode&=7;
- asm {
- move.w mode,d1
- add.b (realstartindex,a5,d1.w),start
- move.l cardBase,a0 // get card base address
- move.l a0,a1
- // move.b #255,8(a0) // not necessary
- move.b start,(a0)
- addq.l #4,a1
- @3 move.b (red),d1
- add.l elementSpacing,red
- move.b d1,(a1)
- move.b (green),d1
- add.l elementSpacing,green
- move.b d1,(a1)
- move.b (blue),d1
- add.l elementSpacing,blue
- move.b d1,(a1)
- dbf count,@3
- }
- if(suspendInterrupts & flags)SwapPriority(&priority);
- return 0;
- }
- // High resolution video card
- //#define HRVCBase 0x80000
- #define HRVCClutAddrReg 0x940E0
- #define HRVCClutWDataReg 0x940E4
- //#define HRVCClutRDataReg 0x94054
- OSErr LoadClutHiRes(short start,register short count
- ,register char *red,register char *green,register char *blue
- ,register long elementSpacing
- ,short mode,short pixelSize,short ctSize,char *cardBase,short flags)
- {
- char *bytePtr;
- char mmuMode=true32b,priority=7;
-
- if(!(useMostSignificantBits & flags)){
- // Point to less significant byte of word.
- red++;
- green++;
- blue++;
- }
- if(suspendInterrupts & flags)SwapPriority(&priority);
- SwapMMUMode(&mmuMode);
- #if C_CODE
- red+=count*elementSpacing;
- green+=count*elementSpacing;
- blue+=count*elementSpacing;
- // We'll start with clut entry start+count, and work
- // down from there to clut entry start. The clut address
- // register counts down automatically.
- *(cardBase+HRVCClutAddrReg)=~(ctSize-start-count);
- bytePtr=cardBase+HRVCClutWDataReg;
- #else
- asm {
- move.l elementSpacing,d0
- mulu count,d0
- add.l d0,red
- add.l d0,green
- add.l d0,blue
- move.w ctSize,d1
- sub.w start,d1
- sub.w count,d1
- not.b d1
- move.l cardBase,a0
- move.l a0,a1
- add.l #HRVCClutAddrReg,a0
- move.b d1,(a0)
- add.l #HRVCClutWDataReg,a1
- move.l a1,bytePtr
- };
- #endif
- #if C_CODE
- // This is the key loop. The THINK C compiler produces fast code
- // only if there is no "asm" inside the function.
- // The resulting C code is only about 10% slower than the assembly code.
- elementSpacing= -elementSpacing;
- do{
- *bytePtr=~ *red;
- red+=elementSpacing;
- *bytePtr=~ *green;
- green+=elementSpacing;
- *bytePtr=~ *blue;
- blue+=elementSpacing;
- }while(--count>=0);
- #else
- asm{
- // This loop is only 10% faster coded in assembly than in C, above.
- // I suspect the main difference is that the C compiler does a test and
- // branch instead of using the DBF instruction.
- move.l bytePtr,a1
- neg.l elementSpacing
- @0 move.b (red),d1
- add.l elementSpacing,red
- not.b d1
- move.b d1,(a1)
- move.b (green),d1
- add.l elementSpacing,green
- not.b d1
- move.b d1,(a1)
- move.b (blue),d1
- add.l elementSpacing,blue
- not.b d1
- move.b d1,(a1)
- dbf count,@0
- };
- #endif
- SwapMMUMode(&mmuMode);
- if(suspendInterrupts & flags)SwapPriority(&priority);
- return 0;
- }
- // Macintosh display card (8•24)
- //#define MDCVideoBase 0xA00
- #define MDCClutAddrReg 0x200200
- #define MDCClutDataReg 0x200204
- OSErr LoadClutx824(short start,register short count
- ,register char *red,register char *green,register char *blue
- ,register long elementSpacing
- ,short mode,short pixelSize,short ctSize,char *cardBase,short flags)
- {
- char mmuMode=true32b,priority=7;
- register char *clut;
- char *clutIndex;
-
- if(!(useMostSignificantBits & flags)){
- // Point to less significant byte of word.
- red++;
- green++;
- blue++;
- }
- if(suspendInterrupts & flags)SwapPriority(&priority);
- SwapMMUMode(&mmuMode);
- clut=cardBase+MDCClutDataReg+3;
- clutIndex=cardBase+MDCClutAddrReg;
- *clutIndex=start;
- for(;count>=0;count--){
- *clut=*red;
- red+=elementSpacing;
- *clut=*green;
- green+=elementSpacing;
- *clut=*blue;
- blue+=elementSpacing;
- }
- SwapMMUMode(&mmuMode);
- if(suspendInterrupts & flags)SwapPriority(&priority);
- return 0;
- }
- // Macintosh display card 8•24 GC
- #define MDCgcClutAddrReg 0x6C00000
- #define MDCgcClutDataReg 0x6C00004
- OSErr LoadClutx824GC(short start,register short count
- ,register char *red,register char *green,register char *blue
- ,register long elementSpacing
- ,short mode,short pixelSize,short ctSize,char *cardBase,short flags)
- {
- char mmuMode=true32b,priority=7;
- register long *clut;
- char *clutIndex;
-
- if(!(useMostSignificantBits & flags)){
- // Point to less significant byte of word.
- red++;
- green++;
- blue++;
- }
- if(suspendInterrupts & flags)SwapPriority(&priority);
- SwapMMUMode(&mmuMode);
- clutIndex=cardBase+MDCgcClutAddrReg;
- *clutIndex=start;
- #if 0
- clut=(long *)(cardBase+MDCgcClutDataReg);
- for(;count>=0;count--){
- *clut=(long)(*red)<<24;
- red+=elementSpacing;
- *clut=(long)(*green)<<24;
- green+=elementSpacing;
- *clut=(long)(*blue)<<24;
- blue+=elementSpacing;
- }
- #else
- asm {
- move.l cardBase,a1
- add.l #MDCgcClutDataReg,a1
- @8 move.b (red),d1
- add.l elementSpacing,red
- ror.l #8,d1
- move.l d1,(a1)
- move.b (green),d1
- add.l elementSpacing,green
- ror.l #8,d1
- move.l d1,(a1)
- move.b (blue),d1
- add.l elementSpacing,blue
- ror.l #8,d1
- move.l d1,(a1)
- dbf count,@8
- }
- #endif
- SwapMMUMode(&mmuMode);
- if(suspendInterrupts & flags)SwapPriority(&priority);
- return 0;
- }
- // Toby frame buffer
- //#define TFBBase 0x80000
- //#define TFBBufMid 0x80008
- //#define TFBBufLow 0x8000C
- //#define TFBIBase 0x8fffc
- #define TFBClutWDataReg 0x90018
- //#define TFBClutRDataReg 0x90028
- #define TFBClutAddrReg 0x9001C
- #define TFBReadVSync 0xD0000
- //#define TFBReadVInt 0xD0004
- //#define TFBReadIntlc 0xD0008
- //#define TFBVIntEnable 0xA0000
- //#define TFBVIntDisable 0xA0004
- OSErr LoadClutToby(short start,register short count
- ,register char *red,register char *green,register char *blue
- ,register long elementSpacing
- ,short mode,short pixelSize,short ctSize,char *cardBase,short flags)
- {
- register long index;
- char mmuMode=true32b,priority=7;
- register char *clut,*clutIndex;
- short shift;
-
- if(!(useMostSignificantBits & flags)){
- // Point to less significant byte of word.
- red++;
- green++;
- blue++;
- }
- index=(count+1)*elementSpacing;
- red+=index;
- green+=index;
- blue+=index;
- shift=8-pixelSize;
- index=start+count+1;
- clut=cardBase+TFBClutWDataReg;
- clutIndex=cardBase+TFBClutAddrReg;
- if(suspendInterrupts & flags)SwapPriority(&priority);
- for(;count>=0;count--,index--){
- *clutIndex=(index<<shift)-1;
- red-=elementSpacing;
- *clut=~*red;
- green-=elementSpacing;
- *clut=~*green;
- blue-=elementSpacing;
- *clut=~*blue;
- }
- if(suspendInterrupts & flags)SwapPriority(&priority);
- return 0;
- }
-
- OSErr WaitForNextBlanking(GDHandle device)
- // WaitForNextBlanking waits for the beginning of the next vertical blanking interval.
- // Returns 0 if successful, or 1 if device is not supported.
- {
- register long *blankingPtr;
-
- switch(GetCardType(device)){
- case tobycard:
- blankingPtr = (long *) ((char *)GetCardBase(device) + TFBReadVSync);
- while (*blankingPtr & 1L); // if we're already blanking, wait till end.
- while (!(*blankingPtr & 1L)); // wait until beginning of blanking interval.
- return 0;
- default:
- return 1;
- }
- }
-