home *** CD-ROM | disk | FTP | other *** search
Text File | 1985-09-21 | 41.6 KB | 974 lines | [TEXT/MACA] |
- ;
- ; procedure dissBits (srcB, destB: bitMap; srcR, dstR: rect); external;
- ;
- ; mike morton
- ; release: 15 september 1985
- ;
- ; this is the version 5.2. if it doesn't work right, try copyBits and see if
- ; it works. if there are bugs, they're likely due to the recent fancy
- ; optimizations, so version three (a much simpler version) should work instead.
- ;
- ; differences from version 5 to 5.2 are:
- ; bug fixed in the full-screen-width case (first bit done right)
- ; magic-constant table is compacted, stored in bytes instead of longs
- ; some small changes to save space, plus the table saves about 100 bytes
- ; tested by a program on several million cases; this helped find...
- ; yet more bugs in log2 routine fixed ("bitwidth" routine added)
- ; differences from version 4 are:
- ; nasty bug fixed in copying small rectangles
- ; new address for correspondence
- ; slightly more detailed notes on calling from C
- ; various miscellaneous comment changes
- ; differences from version 3 are:
- ; bugs in the log2 routine fixed
- ; general case sped up about five percent
- ; certain cases (e.g., the full screen) sped up over fifty percent
- ; the time to dissolve is not directly related to the size of the rect
- ; differences between version 2 and version 3 are:
- ; documentation improved and neatened
- ; log2 routine rewritten
- ;
- ; comments and suggestions are, of course, welcome.
- ;
- ; ******************************************************************************
- ; * *
- ; * copyright 1984, 1985 by michael s. morton *
- ; * please see details below on using, copying and changing this source. *
- ; * *
- ; ******************************************************************************
- ;
- ; what this routine does:
- ; ----------------------
- ;
- ; dissBits is like copyBits: it moves one rect to another, in their respective
- ; bitMaps. it doesn't implement the modes of copyBits, nor clipping to a
- ; region. what it DOES do is copy the bits in a pseudo-random order, giving
- ; the appearance of "dissolving" from one image to another. the dissolve is
- ; rapid: the entire screen will dissolve in under four seconds. (note: smaller
- ; areas may be SLOWER to dissolve -- see below.)
- ;
- ; copyBits pays attention to the current clipping. this routine doesn't.
- ;
- ; other likely differences from copyBits:
- ; o the rectangles must have the same extents (not necessarily the same
- ; lrbt). if they are not, the routine will return -- doing nothing!
- ; no stretching copy is done as copyBits would.
- ; o the cursor is hidden during the dissolve, since drawing is done without
- ; quickdraw calls. the cursor reappears when the drawing is finished.
- ; for an odd effect, change it not to hide the cursor; is this how bill
- ; atkinson thought of the spray can in MacPaint?
- ; o copyBits may be smart enough to deal with overlapping areas of memory.
- ; this routine certainly isn't.
- ; o because this routine is desperate for speed, it steals the A5 (globals)
- ; register, uses it in the central loop, then restores it before
- ; returning. if you have vertical retrace tasks which run during the
- ; dissolve, they'll wake up with the bogus A5. since the ROM pulls this
- ; trick too, i feel this isn't a bug in MY code, but it IS more likely
- ; to expose bugs in VBL tasks. to correct the problem, have your task
- ; load A5 from the low RAM location "currentA5" immediately upon starting
- ; up, then restore its caller's A5 before returning.
- ;
- ; you should know a few implementation details which may help:
- ; o copying from a dark area (lots of 1 bits) is slower than from a light
- ; area. but just barely (a few per cent).
- ; o there is no way to use this to randomly invert a rectangle. instead,
- ; copyBits it elsewhere, invert it, and dissBits it back into place.
- ; o there is also no way to slow the dissolve of a small area. to do this,
- ; copy a large area in which the only difference is the area to change.
- ; o if you fade in a solid area, you're likely to see patterns, since the
- ; random numbers are so cheesy. don't do this; fade in nifty patterns
- ; which will distract your viewers.
- ; o very small areas (less than 2 pixels in either dimension) are actually
- ; done with a call to the real copyBits routine, since the pseudo-random
- ; sequence generator falls apart under those conditions.
- ;
- ; a close relative of this routine is "dissBytes", which (as you might guess)
- ; copies a byte at a time, which is really fast (the whole screen in .1 or .2
- ; seconds). it works only for certain rectangles, though.
- ;
- ; sample calling code:
- ; -------------------
- ;
- ; this is an excerpt from how a prerelease of DarTerminal called this routine.
- ; note the clever use of "paintbehind". this took about 3 seconds to dissolve
- ; onto the screen.
- ;
- ;var rg: rgnhandle; (* window to copy into *)
- ; aport: grafptr; (* port to draw into *)
- ; bits: bitmap; (* new bitmap for that port *)
- ; r: rect; (* rectangle to draw into *)
- ; pat: pattern;
- ; text: packed array[1..37] of char;
- ; ...
- ; aport := grafptr(newptr(sizeof(grafport))); (* get a port *)
- ; openport(aport); (* make it current *)
- ;
- ; r := theport^.portbits.bounds; (* start with whole screen *)
- ; insetrect(r,100,100); (* get window-size rect *)
- ; (* note that the number of bytes per row must be even! *)
- ; bits.rowbytes := (((r.right-r.left)+15) div 16) * 2; (* find bytes/row *)
- ; bits.baseaddr := qdptr(newptr(bits.rowbytes*(r.bottom-r.top))); (* get space *)
- ; bits.bounds := r; (* boundary rect for bitmap *)
- ;
- ; setportbits(bits); (* make new bitmap current *)
- ;
- ; eraserect(r);
- ; textfont(london); textsize(18); textface([bold]);
- ; text := 'DarTerminal version -1.9 August 1984';
- ; textbox(@text,37,r,tejustcenter);
- ;
- ; dissbits(bits,screenport^.portbits,r,r); (* dissolve it in *)
- ;
- ; repeat until getnextevent(mdownmask+keydownmask,anevent); (* let 'em gawk *)
- ;
- ; rg := newrgn; (* get a region to clip with *)
- ; rectrgn(rg,r); (* as a rectangle *)
- ; paintbehind(windowpeek(frontwindow),rg); (* update this area of screen *)
- ;
- ; disposergn(rg);
- ; disposptr(ptr(bits.baseaddr));
- ; disposptr(ptr(aport));
- ;
- ;
- ; calling from languages other than pascal:
- ; ----------------------------------------
- ;
- ; this routine uses the standard Lisa Pascal calling sequence. to convert it to
- ; most C compilers, you'll probably just have to delete this instruction from
- ; near the end of the main routine:
- ; add.l #psize,SP ; unstack parameters
- ;
- ; i'd be very interested in hearing about successful uses of this routine from
- ; other languages.
- ;
- ; speed of the dissolve:
- ; ---------------------
- ;
- ; you need to pay attention to this section only if: (a) you want the dissolve
- ; to run as fast as it can OR (b) you do dissolves of various sizes and want
- ; them to take proportionate lengths of time.
- ;
- ; there are 3 levels of speed; the fastest possible one is chosen for you:
- ; (1) an ordinary dissolve will work when moving from any bitmap to any bitmap,
- ; including on the Lisa under MacWorks. this will dissolve at about 49
- ; microseconds per pixel. a rectangle one-quarter the size of the screen
- ; will dissolve in just over two seconds. the speed per pixel will vary
- ; slightly, and will be less if your rect extents are close to but less
- ; than powers of 2.
- ; (2) the dissolve will speed up if both the source and destination bitmaps have
- ; rowBytes fields which are powers of two. if you're copying to the screen
- ; on a mac, the rowBytes field already satisfies this. so, make your source
- ; bitmap the right width for a cheap speedup -- about 20% faster.
- ; (3) the fanciest level is intended for copying the whole screen. it'll paint
- ; it in about 3.4 seconds (19 microseconds per pixel). actually, painting
- ; any rectangle which is the full width of the screen will run at this
- ; speed, for what that's worth.
- ;
- ; duplication and use of this routine:
- ; -----------------------------------
- ;
- ; this is freeware. you're welcome to copy it and use it in programs. you're
- ; welcome to modify it, as long as you leave everything up until this section
- ; unchanged. i'd be very interested in seeing your changes, especially if you
- ; find ways to make the central loops faster. you're also welcome to port it
- ; to other machines/languages; i'd appreciate hearing about efforts to do this.
- ;
- ; this is "freeware"; please pay me if you use it. why?
- ;
- ; o if you have problems using it, i'll help you debug it.
- ; o i'll send you improved, debugged, faster versions.
- ; o i'll tell you about other products. this is the first thing i ever
- ; wrote for the Mac; wouldn't you like to see what else i've produced?
- ; send me some positive feedback!
- ;
- ; how much should you pay? my suggestion is:
- ; (cost of one copy of the program) * (log10 of number of copies sold)
- ; if the subroutine is an integral part of your program, double this amount.
- ; if it's a frill (e.g., you dissolve in your "About MacWhatever"), halve it.
- ;
- ; i find it hard to believe that any damages to you or anyone else could come
- ; from bugs in this routine. but, alas, whether or not you pay me, i can't be
- ; liable in any way for any problems in it.
- ;
- ; send comments, contributions, criticisms, or whatever to:
- ; mike morton
- ; INFOCOM
- ; 125 CambridgePark Dr.
- ; Cambridge, MA 02140
- ;
- ; if, for some reason, you only have a hard copy of this and would like a
- ; source on a diskette, please contact:
- ; robert hafer
- ; the boston computer society
- ; one center plaza
- ; boston, mass. 02108
- ;
-
- ;
- ; things to think about:
- ; ---------------------
- ;
- ; o clean up the register usage (as if i'll ever actually get around to this)
- ; o use a dynamically-built table to avoid the multiply instruction in the general case.
- ; this may not be a great idea since not everyone can afford that much space.
- ; o adapt the routine to do transfer modes. especially pattern modes, with no source.
- ; it'd run significantly faster doing a fill of just black or white! and patXor
- ; would be pretty cool too (see also the XOR idea below).
- ; o consider a front-end which does clipping (just like copyBits) by:
- ; - allocate yet another offscreen bitmap, whose bounds are the intersection of
- ; the destination rect with the bounding box of the destination's clipping rgn.
- ; - copy from the destination's bitmap into the temporary bitmap.
- ; - do a copyBits from the source to the temp, with the requested masking region.
- ; - dissolve from the temp to the destination, thus actually dissolving that whole
- ; rect, but changing only the clipped stuff.
- ; o a nifty way to speed up things would be to XOR the destination into the source [sic!].
- ; then the main loop doesn't have to copy a bit; it just tests if the bit is ON in the
- ; altered source bitmap and, if so, TOGGLES it in the destination. (i.e., it does a
- ; srcXor operation). to repair the source bitmap, a third XOR from the destination to
- ; the source should do it [any old-timer will recognize this triple XOR as the best
- ; way to swap two registers.] the trick is making the invisible XOR run so fast that
- ; the time to do it is less than the savings in the visible part. or perhaps the batch
- ; XORs could be done in spare-time...
- ; o implement a partial dissolve. this would let alex animate star trek-style teleporting.
- ; there are a lot of ways to do this. the easiest might be to include a counter in the
- ; loop ("a register! my kingdom for a register!"). alternately, we could assume that
- ; the list of starting and ending points could be limited, allowing a table-based system.
- ; this is something worth looking into, but could be real difficult.
- ; o look at rearranging instructions so CPU-intensive ones aren't grouped, allowing us to
- ; sneak in between video cycles more often.
- ; o add some real error-handling. how should we do this? return a status? fault? we
- ; could also return info to our caller on which of the three loops got used, so they know
- ; they're getting the speed they want.
- ; o don't use the BITWIDTH routine to test for exact powers of two in MULCHK. a number X]
- ; is an exact power of two if (x & -x) equals x.
- ;
-
- ;
- ; -- end of introduction; real stuff starts here --
- ;
-
- ;
- ; include files:
- ; tlasm/graftypes -- definitions of "bitMap" and "rect"
- ; tlasm/quickmacs -- macros for quickdraw calls (e.g., _hidecursor)
- ;
-
- .nolist
- .include tlasm/graftypes
- .include tlasm/quickmacs
- .list
-
- ;
- ; definitions of the "ours" record: this structure, of which there are two copies in
- ; our stack frame, is a sort of bitmap:
- ;
-
- oRows .equ 0 ; (word) number of last row (first is 0)
- oCols .equ oRows+2 ; (word) number of last column (first is 0)
- oLbits .equ oCols+2 ; (word) size of left margin within 1st byte
- oStride .equ oLbits+2 ; (word) stride in memory from row to row
- oBase .equ oStride+2 ; (long) base address of bitmap
-
- osize .equ oBase+4 ; size, in bytes, of "ours" record
-
- ;
- ; stack frame elements:
- ;
-
- srcOurs .equ -osize ; (osize) our view of source bits
- dstOurs .equ srcOurs-osize ; (osize) our view of target bits
-
- sflast .equ dstOurs ; relative address of last s.f. member
- sfsize .equ -sflast ; size of s.f. for LINK (must be EVEN!)
- ;
- ; parameter offsets from the stack frame pointer, A6:
- ; last parameter is above return address and old s.f.
- ;
-
- dRptr .equ 4+4 ; ^destination rectangle
- sRptr .equ dRptr+4 ; ^source rectangle
- dBptr .equ sRptr+4 ; ^destination bitMap
- sBptr .equ dBptr+4 ; ^source bitMap
-
- plast .equ sBptr+4 ; address just past last parameter
-
- psize .equ plast-dRptr ; size of parameters, in bytes
-
- ;
- ; entrance: set up a stack frame, save some registers, hide the cursor.
- ;
-
- .proc dissBits ; main entry point
-
- link A6,#-sfsize ; set up a stack frame
- movem.l D3-D7/A2-A5,-(SP) ; save registers compiler may need
- _hidecurs ; don't let the cursor show for now
-
- ;
- ; convert the source and destination bitmaps and rectangles to a format we prefer.
- ; we won't look at these parameters after this.
- ;
-
- move.l sBptr(A6),A0 ; point to source bitMap
- move.l sRptr(A6),A1 ; and source rectangle
- lea srcOurs(A6),A2 ; and our source structure
- bsr CONVERT ; convert to our format
-
- move.l dBptr(A6),A0 ; point to destination bitMap
- move.l dRptr(A6),A1 ; and rectangle
- lea dstOurs(A6),A2 ; and our structure
- bsr CONVERT ; convert to our format
-
- ;
- ; check that the rectangles match in size.
- ;
- move.w srcOurs+oRows(A6),D0 ; pick up the number of rows
- cmp.w dstOurs+oRows(A6),D0 ; same number of rows?
- bne ERROR ; nope -- bag it
-
- move.w srcOurs+oCols(A6),D0 ; check the number of columns
- cmp.w dstOurs+oCols(A6),D0 ; same number of columns, too?
- bne ERROR ; that's a bozo no-no
-
- ;
- ; figure the bit-width needed to span the columns, and the rows.
- ;
-
- move.w srcOurs+oCols(A6),D0 ; get count of columns
- ext.l D0 ; make it a longword
- bsr LOG2 ; figure bit-width
- move.w D0,D1 ; set aside that result
- beq SMALL ; too small? wimp out and do it with copyBits
-
- move.w srcOurs+oRows(A6),D0 ; get count of rows
- ext.l D0 ; make it a longword
- bsr LOG2 ; again, find the bit-width
- tst.w D0 ; is the result zero?
- beq SMALL ; if so, our algorithm will screw up
-
- ;
- ; set up various constants we'll need in the in the innermost loop
- ;
-
- move.l #1,D5 ; set up...
- lsl.l D1,D5 ; ...the bit mask which is...
- sub.l #1,D5 ; ...bit-width (cols) 1's
-
- add.w D1,D0 ; find total bit-width (rows plus columns)
- lea TABLE,A0 ; point to the table of XOR masks
- moveq #0,D3 ; clear out D3 before we fill the low byte
- move.b 0(A0,D0),D3 ; grab the correct XOR mask in D3
-
- ;
- ; the table is saved compactly, since none of the masks are wider than a byte. we have to unpack it so
- ; the high-order bit of the D0-bit-wide field is on:
- ;
-
- UNPACK add.l D3,D3 ; shift left by one
- bpl.s UNPACK ; keep moving until the top bit that's on is aligned at the top end
-
- rol.l D0,D3 ; now swing the top D0 bits around to be the bottom D0 bits, the mask
- move.l D3,D0 ; 1st sequence element is the mask itself
-
- ;
- ; do all kinds of preparation:
- ;
-
- move.l srcOurs+oBase(A6),D2 ; set up base pointer for our source bits
- lsl.l #3,D2 ; make it into a bit address
- move.l D2,A0 ; put it where the fast loop will use it
- move.w srcOurs+oLbits(A6),D2 ; now pick up source left margin
- ext.l D2 ; make it a longword
- add.l D2,A0 ; and make A0 useful for odd routine below
-
- move.l dstOurs+oBase(A6),D2 ; set up base pointer for target
- lsl.l #3,D2 ; again, bit addressing works out faster
- move.l D2,A1 ; stuff it where we want it for the loop
- move.w dstOurs+oLbits(A6),D2 ; now pick up destination left margin
- ext.l D2 ; make it a longword
- add.l D2,A1 ; and make A1 useful, too
-
- move.w srcOurs+oCols(A6),A2 ; pick up the often-used count of columns
- move.w srcOurs+oRows(A6),D2 ; and of rows
- add.w #1,D2 ; make row count one-too-high for compares
- ext.l D2 ; and make it a longword
- lsl.l D1,D2 ; slide it to line up w/rows part of D0
- move.l D2,A4 ; and save that somewhere useful
-
- move.w D1,D2 ; put log2(columns) in a safe place (sigh)
-
- ;
- ; try to reduce the amount we shift down D2. this involves:
- ; halving the strides as long as each is even, decrementing D2 as we go
- ; masking the bottom bits off D4 when we extract the row count in the loop
- ;
- ; alas, we can't always shift as little as we want. for instance, if we don't
- ; shift down far enough, the row count will be so high as to exceed a halfword,
- ; and the dread mulu instruction won't work (it eats only word operands). so,
- ; we have to have an extra check to take us out of the loop early.
- ;
-
- move.w srcOurs+oStride(A6),D4 ; pick up source stride
- move.w dstOurs+oStride(A6),D7 ; and target stride
- move.w srcOurs+oRows(A6),D1 ; pick up row count for kludgey check
-
- tst.w D2 ; how's the bitcount?
- beq.s HALFDONE ; skip out if already down to zero
-
- HALFLOOP
- btst #0,D4 ; is this stride even?
- bne.s HALFDONE ; nope -- our work here is done
- btst #0,D7 ; how about this one?
- bne.s HALFDONE ; have to have both even
-
- lsl.w #1,D1 ; can we keep max row number in a halfword?
- bcs.s HALFDONE ; nope -- D2 mustn't get any smaller!
-
- lsr.w #1,D4 ; halve each stride...
- lsr.w #1,D7 ; ...like this
- sub.w #1,D2 ; and remember not to shift down as far
- bne.s HALFLOOP ; loop unless we're down to no shift at all
-
- HALFDONE ; no tacky platitudes, please
- move.w D4,srcOurs+oStride(A6) ; put back source stride
- move.w D7,dstOurs+oStride(A6) ; and target stride
-
- ;
- ; make some stuff faster to access -- use the fact that (An) is faster to access
- ; than d(An). this means we'll misuse our frame pointer, but don't worry -- we'll
- ; restore it before we use it again.
- ;
-
- move.w srcOurs+oStride(A6),A5 ; make source stride faster to access, too
- move.l A6,-(SP) ; save framitz pointer
- move.w dstOurs+oStride(A6),A6 ; pick up destination stride
- move.l #0,D6 ; we do only AND.W x,D6 -- but ADD.L D6,x
-
- clr.w -(SP) ; reserve room for function result
- bsr MULCHK ; go see if strides are powers of two
- tst.w (SP)+ ; can we eliminate the horrible MULUs?
- bne NOMUL ; yes! hurray!
-
- ;
- ; main loop: map the sequence element into rows and columns, check if it's in bounds
- ; and skip on if it's not, flip the appropriate bit, generate the next element in the
- ; sequence, and loop if the sequence isn't done.
- ;
-
- ;
- ; check the row bounds. note that we can check the row before extracting it from
- ; D0, ignoring the bits at the bottom of D0 for the columns. to get these bits
- ; to be ignored, we had to make A4 one-too-high before shifting it up to align it.
- ;
-
- LOOP ; here for another time around
- cmp.l A4,D0 ; is row in bounds?
- bge.s NEXT ; no: clip this
-
- ;
- ; map it into the column; check bounds. note that we save this check for second;
- ; it's a little slower because of the move and mask.
- ;
- ; chuck sagely points out that when the "bhi" at the end of the loop takes, we
- ; know we can ignore the above comparison. thanks, chuck. you're a great guy.
- ;
-
- LOOPROW ; here when we know the row number is OK
- move.w D0,D6 ; copy the sequence element
- and.w D5,D6 ; find just the column number
-
- cmp.w A2,D6 ; too far to the right? (past oCols?)
- bgt.s NEXT ; yes: skip out
-
- move.l D0,D4 ; we know element will be used; copy it
- sub.w D6,D4 ; remove column's bits
- lsr.l D2,D4 ; shift down to row, NOT right-justified
-
- ;
- ; get the source byte, and bit offset. D4 has the bit offset in rows, and
- ; D6 is columns.
- ;
-
- move.w A5,D1 ; get the stride per row (in bits)
- mulu D4,D1 ; stride * row; find source row's offset in bits
- add.l D6,D1 ; add in column offset (bits)
- add.l A0,D1 ; plus base of bitmap (bits [sic])
- move.b D1,D7 ; save the bottom three bits for the BTST
- lsr.l #3,D1 ; while we shift down to a word address
- move.l D1,A3 ; and save that for the test, too
- not.b D7 ; get right bit number (compute #7-D7)
-
- ;
- ; find the destination bit address and bit offset
- ;
-
- move.w A6,D1 ; extract cunningly hidden destination stride
- mulu D1,D4 ; stride*row number = dest row's offset in bits
- add.l D6,D4 ; add in column bit offset
- add.l A1,D4 ; and base address, also in bits
- move.b D4,D6 ; set aside the bit displacement
- lsr.l #3,D4 ; make a byte displacement
- not.b D6 ; get right bit number (compute #7-D6)
-
- btst D7,(A3) ; test the D7th bit of source byte
- move.l D4,A3 ; point to target byte (don't lose CC from btst)
- bne.s SETON ; if on, go set destination on
- bclr D6,(A3) ; else clear destination bit
-
- ;
- ; find the next sequence element. see knuth, vol ii., page 29 for sketchy details.
- ;
-
- NEXT ; jump here if D0 not in bounds
- lsr.l #1,D0 ; slide one bit to the right
- bhi.s LOOPROW ; if no carry out, but not zero, loop
-
- eor.l D3,D0 ; flip magic bits appropriate to the bitwidth we want...
- cmp.l D3,D0 ; ...but has this brought us to square 1?
- bne.s LOOP ; if not, loop back; else...
-
- bra.s DONE ; ...we're finished
-
- SETON
- bset D6,(A3) ; source bit was on: set destination on
-
- ; copy of above code, stolen for inline speed -- sorry.
- lsr.l #1,D0 ; slide one bit to the right
- bhi.s LOOPROW ; if no carry out, but not zero, loop
- eor.l D3,D0 ; flip magic bits...
- cmp.l D3,D0 ; ...but has this brought us to square 1?
- bne.s LOOP ; if not, loop back; else fall through
-
-
- ;
- ; here when we're done; the (0,0) point has not been done yet. this is
- ; really the (0,left margin) point. we also jump here from another copy loop.
- ;
-
- DONE
- move.l (SP)+,A6 ; restore stack frame pointer
-
- move.w srcOurs+oLbits(A6),D0 ; pick up bit offset of left margin
- move.w dstOurs+oLbits(A6),D1 ; and ditto for target
- not.b D0 ; flip to number the bits for 68000
- not.b D1 ; ditto
-
- ; alternate, late entrance, when SCREEN routine has already set up D0 and D1 (it doesn't want the bit
- ; offset negated).
-
- DONEA ; land here with D0, D1 set
- move.l srcOurs+oBase(A6),A0 ; set up base pointer for our source bits
- move.l dstOurs+oBase(A6),A1 ; and pointer for target
-
- bset D1,(A1) ; assume source bit was on; set target
- btst D0,(A0) ; was first bit of source on?
- bne.s DONE2 ; yes: skip out
- bclr D1,(A1) ; no: oops! set it right, and fall through
-
- ;
- ; return
- ;
-
- DONE2 ; here when we're really done
- ERROR ; we return silently on errors
- _showcurs ; let's see this again
-
- movem.l (SP)+,D3-D7/A2-A5 ; restore lots of registers
- unlk A6 ; restore caller's stack frame pointer
- move.l (SP)+,A0 ; pop return address
- add.l #psize,SP ; unstack parameters
- jmp (A0) ; home to mother
-
- ;
- ; -----------------------------------------------------------------------------------
- ;
- ; sleazo code for when we're asked to dissolve very small regions. if either dimension of the rectangle
- ; is too small, we bag it and just delegate the problem to copyBits. a possible problem with this is if
- ; someone decides to substitute us for the standard copyBits routine -- this case will become recursive...
- ;
-
- SMALL ; here when it's too small to copy ourselves
- move.l sBptr(A6),-(SP) ; push args: source bitmap
- move.l dBptr(A6),-(SP) ; destination bitmap
- move.l sRptr(A6),-(SP) ; source rectangle
- move.l dRptr(A6),-(SP) ; destination rectangle
- move.w #srcCopy,-(SP) ; transfer mode -- source copy
- clr.l -(SP) ; mask region -- NIL
- _copyBits ; do the copy in quickdraw-land
- bra.s DONE2 ; head for home
-
- ;
- ; -----------------------------------------------------------------------------------
- ;
- ; code identical to the usual loop, but A5 and A6 have been changed to shift counts.
- ; other than that, it's the same. really it is! well, no, wait a minute...
- ; because we don't have to worry about the word-size mulu operands, we can collapse
- ; the shifts and countershifts further as shown below:
-
- NOMUL ; here for alternate version of loop
- tst.w D2 ; is right shift zero?
- beq.s NOMUL2 ; yes: can't do much more...
- cmp.w #0,A5 ; how about one left shift (for source stride)?
- beq.s NOMUL2 ; yes: ditto
- cmp.w #0,A6 ; and the other left shift (destination stride)?
- beq.s NOMUL2 ; yes: can't do much more...
-
- sub.w #1,D2 ; all three...
- sub.w #1,A5 ; ...are...
- sub.w #1,A6 ; ...collapsible
- bra.s NOMUL ; go see if we can go further
-
- ;
- ; see if we can do the super-special-case loop, which basically is equivalent to any rectangle
- ; where the source and destination are both exactly the width of the Mac screen.
- ;
-
- NOMUL2 ; here when D2, A5, and A6 are all collapsed
- tst.w D2 ; did this shift get down to zero?
- bne.s NLOOP ; no: skip to first kludged loop
- cmp.w #0,A5 ; is this zero?
- bne.s NLOOP ; no: again, can't make further optimization
- cmp.w #0,A6 ; how about this?
- bne.s NLOOP ; no: the best-laid plans of mice and men...
- cmp.w A2,D5 ; is there no check on the column?
- bne.s NLOOP ; not a power-of-two columns; rats!
-
- move.w A0,D6 ; grab the base address of the source
- and.b #7,D6 ; select the low three bits
- bne.s NLOOP ; doesn't sit on a byte boundary; phooey
-
- move.w A1,D6 ; now try the base of the destination
- and.b #7,D6 ; and select its bit offset
- beq.s SCREEN ; yes! do extra-special loop!
-
- ;
- ; fast, but not super-fast loop, used when both source and destination bitmaps have strides which are
- ; powers of two.
- ;
-
- NLOOP ; here for another time around
- cmp.l A4,D0 ; is row in bounds?
- bge.s NNEXT ; no: clip this
-
- NLOOPROW ; here when we know the row number is OK
- move.w D0,D6 ; copy the sequence element
- and.w D5,D6 ; find just the column number
-
- cmp.w A2,D6 ; too far to the right? (past oCols?)
- bgt.s NNEXT ; yes: skip out
-
- move.l D0,D4 ; we know element will be used; copy it
- sub.w D6,D4 ; remove column's bits
- lsr.l D2,D4 ; shift down to row, NOT right-justified
-
- move.w A5,D7 ; get log2 of stride per row (in bits)
- move.l D4,D1 ; make a working copy of the row number
- lsl.l D7,D1 ; * stride/row is source row's offset in bits
- add.l D6,D1 ; add in column offset (bits)
- add.l A0,D1 ; plus base of bitmap (bits [sic])
- move.b D1,D7 ; save the bottom three bits for the BTST
- lsr.l #3,D1 ; while we shift down to a byte address
- move.l D1,A3 ; and save that for the test, too
- not.b D7 ; get right bit number (compute #7-D7)
-
- move.w A6,D1 ; extract log2 of destination stride
- lsl.l D1,D4 ; stride*row number = dest row's offset in bits
- add.l D6,D4 ; add in column bit offset
- add.l A1,D4 ; and base address, also in bits
- move.b D4,D6 ; set aside the bit displacement
- lsr.l #3,D4 ; make a byte displacement
- not.b D6 ; get right bit number (compute #7-D6)
-
- btst D7,(A3) ; test the D7th bit of source byte
- move.l D4,A3 ; point to target byte (don't ruin CC from btst)
- bne.s NSETON ; if on, go set destination on
- bclr D6,(A3) ; else clear destination bit
-
- NNEXT ; jump here if D0 not in bounds
- lsr.l #1,D0 ; slide one bit to the right
- bhi.s NLOOPROW ; if no carry out, but not zero, loop
- eor.l D3,D0 ; flip magic bits...
- cmp.l D3,D0 ; ...but has this brought us to square 1?
- bne.s NLOOP ; if not, loop back; else...
- bra.s DONE ; ...we're finished
-
- NSETON
- bset D6,(A3) ; source bit was on: set destination on
- lsr.l #1,D0 ; slide one bit to the right
- bhi.s NLOOPROW ; if no carry out, but not zero, loop
- eor.l D3,D0 ; flip magic bits...
- cmp.l D3,D0 ; ...but has this brought us to square 1?
- bne.s NLOOP ; if not, loop back; else fall through
- bra.s DONE ; and finish
-
- ;
- ; -----------------------------------------------------------------------------------
- ;
- ; super-special case, which happens to hold for the whole mac screen -- or subsets
- ; of it which are as wide as the screen. here, we've found that the shift counts
- ; in D2, A5, and A6 can all be collapsed to zero. and D5 equals A2, so there's
- ; no need to check whether D6 is in limits -- or even take it out of D0! so, this loop
- ; is the NLOOP code without the shifts or the check on the column number. should
- ; run like a bat; have you ever seen a bat run?
- ;
- ; oh, yes, one further restriction -- the addresses in A0 and A1 must point to
- ; integral byte addresses with no bit offset. (this still holds for full-screen
- ; copies.) because both the source and destination are byte-aligned, we can skip
- ; the ritual Negation Of The Bit Offset which the 68000 usually demands.
-
- SCREEN ; here to set up to do the whole screen, or at least its width
- move.l A0,D6 ; take the base source address...
- lsr.l #3,D6 ; ... and make it a byte address
- move.l D6,A0 ; replace pointer
-
- move.l A1,D6 ; now do the same...
- lsr.l #3,D6 ; ...for...
- move.l D6,A1 ; ...the destination address
-
- bra.s N2LOOP ; jump into loop
-
- N2HEAD ; here when we shifted and a bit carried out
- eor.l D3,D0 ; flip magic bits to make the sequence work
-
- N2LOOP ; here for another time around
- cmp.l A4,D0 ; is row in bounds?
- bge.s N2NEXT ; no: clip this
-
- N2LOOPROW ; here when we know the row number is OK
- move.l D0,D1 ; copy row number, shifted up, plus column offset
- lsr.l #3,D1 ; while we shift down to a word offset
-
- btst D0,0(A0,D1) ; test bit of source byte
- bne.s N2SETON ; if on, go set destination on
- bclr D0,0(A1,D1) ; else clear destination bit
-
- N2NEXT ; jump here if D0 not in bounds
- lsr.l #1,D0 ; slide one bit to the right
- bhi.s N2LOOPROW ; if no carry out, but not zero, loop
- bne.s N2HEAD ; if carry out, but not zero, loop earlier
- bra.s N2DONE ; 0 means next sequence element would have been D3
-
- N2SETON
- bset D0,0(A1,D1) ; source bit was on: set destination on
- lsr.l #1,D0 ; slide one bit to the right
- bhi.s N2LOOPROW ; if no carry out, but not zero, loop
- bne.s N2HEAD ; if carry out, but not zero, loop earlier
- ; zero means the loop has closed on itself
-
- ;
- ; because our bit-numbering isn't like that of the other two loops, we set up D0 and D1
- ; ourselves before joining a bit late with the common code to get the last bit.
- ;
-
- N2DONE
- move.l (SP)+,A6 ; restore the stack frame pointer
-
- move.w srcOurs+oLbits(A6),D0 ; pick up bit offset of left margin
- move.w dstOurs+oLbits(A6),D1 ; and ditto for target
- bra DONEA ; go do the first bit, which the sequence doesn't cover
-
- ;
- ; -----------------------------------------------------------------------------------
- ;
-
- ; mulchk -- see if we can do without multiply instructions.
- ;
- ; calling sequence:
- ; A5 holds the source stride
- ; A6 holds the destination stride
- ; clr.w -(SP) ; reserve room for boolean function return
- ; bsr MULCHK ; go check things out
- ; tst.w (SP)+ ; test result
- ; bne.s SHIFT ; if non-zero, we can shift and not multiply
- ;
- ; (if we can shift, A5 and A6 have been turned into shift counts)
- ;
- ; registers used: none (A5, A6)
-
- MULCHK
-
- movem.l D0-D3,-(SP) ; stack caller's registers
- move.l A5,D0 ; take the source stride
- bsr BITWIDTH ; take log base 2
- move.l #1,D1 ; pick up a one...
- lsl.l D0,D1 ; ...and try to recreate the stride
- cmp.l A5,D1 ; does it come out the same?
- bne.s NOMULCHK ; nope -- bag it
- move.w D0,D3 ; save magic logarithm of source stride
-
- move.l A6,D0 ; yes -- now how about destination stride?
- bsr BITWIDTH ; convert that one, also
- move.l #1,D1 ; again, try a single bit...
- lsl.l D0,D1 ; ...and see if original # was 1 bit
- cmp.l A6,D1 ; how'd it come out?
- bne.s NOMULCHK ; doesn't match -- bag this
-
- ;
- ; we can shift instead of multiplying. change address registers & tell our caller.
- ;
- move.w D3,A5 ; set up shift for source stride
- move.w D0,A6 ; and for destination stride
- st 4+16(SP) ; tell our caller what's what
- bra.s MULRET ; and return
-
- NOMULCHK
-
- sf 4+16(SP) ; tell caller we can't optimize
- MULRET ; here to return; result set
- movem.l (SP)+,D0-D3 ; pop some registers
- rts ; all set
-
- ;
- ; -----------------------------------------------------------------------------------
- ;
- ; table of (longword) masks to XOR in strange Knuthian algorithm. the first table
- ; entry is for a bit-width of two, so the table actually starts two bytes before
- ; that. hardware jocks among you may recognize this scheme as the software analog
- ; of a "maximum-length sequence generator".
- ;
- ; to save a bit of room, masks are packed in bytes, but should be aligned as
- ; described in the code before being used.
- ;
-
- table .equ *-2 ; first element is #2
- .byte 3o ; 2
- .byte 3o ; 3
- .byte 3o ; 4
- .byte 5o ; 5
- .byte 3o ; 6
- .byte 3o ; 7
- .byte 27o ; 8
- .byte 21o ; 9
- .byte 11o ; 10
- .byte 5o ; 11
- .byte 145o ; 12
- .byte 33o ; 13
- .byte 65o ; 14
- .byte 3o ; 15
- .byte 55o ; 16
- .byte 11o ; 17
- .byte 201o ; 18
- .byte 71o ; 19
- .byte 11o ; 20
- .byte 5o ; 21
- .byte 3o ; 22
- .byte 41o ; 23
- .byte 33o ; 24
- .byte 11o ; 25
- .byte 161o ; 26
- .byte 71o ; 27
- .byte 11o ; 28
- .byte 5o ; 29
- .byte 145o ; 30
- .byte 11o ; 31
- .byte 243o ; 32
- .align 2
-
- ;
- ; -----------------------------------------------------------------------------------
- ;
- ; convert -- convert a parameter bitMap and rectangle to our internal form.
- ;
- ; calling sequence:
- ; lea bitMap,A0 ; point to the bitmap
- ; lea rect,A1 ; and the rectangle inside it
- ; lea ours,A2 ; and our data structure
- ; bsr CONVERT ; call us
- ;
- ; when done, all fields of the "ours" structure are filled in:
- ; oBase is the address of the first byte in which any bits are to be changed
- ; oLbits is the number of bits into that first byte which are ignored
- ; oStride is the stride from one row to the next, in bits
- ; oCols is the number of columns in the rectangle
- ; oRows is the number of rows
- ;
- ; registers used: D0, D1, D2
- ;
-
- CONVERT
-
- ;
- ; save the starting word and bit address of the stuff:
- ;
- move.w top(A1),D0 ; pick up top of inner rectangle
- sub.w bounds+top(A0),D0 ; figure rows to skip within bitmap
- mulu rowbytes(A0),D0 ; compute bytes to skip (relative offset)
-
- add.l baseaddr(A0),D0 ; find absolute address of first row to use
-
- move.w left(A1),D1 ; pick up left coordinate of inner rect
- sub.w bounds+left(A0),D1 ; find columns to skip
- move.w D1,D2 ; copy that
- and.w #7,D2 ; compute bits to skip in first byte
- move.w D2,oLbits(A2) ; save that in the structure
-
- lsr.w #3,D1 ; convert column count from bits to bytes
- ext.l D1 ; convert to a long value, so we can...
- add.l D1,D0 ; add to row start in bitmap to find 1st byte
- move.l D0,oBase(A2) ; save that in the structure
-
- ;
- ; save the stride of the bitmap; this is the same as for the original, but in bits.
- ;
- move.w rowbytes(A0),D0 ; pick up the stride
- lsl.w #3,D0 ; multiply by eight to get a bit stride
- move.w D0,oStride(A2) ; stick it in the target structure
-
- ;
- ; save the number of rows and columns.
- ;
- move.w bottom(A1),D0 ; get the bottom of the rectangle
- sub.w top(A1),D0 ; less the top coordinate
- sub.w #1,D0 ; get number of highest row (1st is zero)
- bmi.s CERROR ; nothing to do? (note: 0 IS ok)
- move.w D0,oRows(A2); ; save that in the structure
-
- move.w right(A1),D0 ; get the right edge of the rectangle
- sub.w left(A1),D0 ; less the left coordinate
- sub.w #1,D0 ; make it zero-based
- bmi CERROR ; nothing to do here?
- move.w D0,oCols(A2) ; save that in the structure
-
- ;
- ; all done. return.
- ;
- rts
-
- ;
- ; error found in CONVERT. pop return and jump to the error routine, such as it is.
- ;
- CERROR
- addq.l #4,SP ; pop four bytes of return address.
- bra.s ERROR ; return silently
-
- ;
- ; -----------------------------------------------------------------------------------
- ;
- ; log2 -- find the ceiling of the log, base 2, of a number.
- ; bitwidth -- find how many bits wide a number is
- ;
- ; calling sequence:
- ; move.l N,D0 ; store the number in D0
- ; bsr LOG2 ; call us
- ; move.w D0,... ; D0 contains the word result
- ;
- ; registers used: D2, (D0)
- ;
-
- BITWIDTH
- sub.l #1,D0 ; so 2**n works right (sigh)
- LOG2
- tst.l D0 ; did they pass us a zero?
- beq.s LOGDONE ; call log2(0) zero -- what the heck...
- beq.s LOGDONE ; if D0 was one, answer is zero
- move.w #32,D2 ; initialize count
- LOG2LP
- lsl.l #1,D0 ; slide bits to the left by one
- dbcs D2,LOG2LP ; decrement and loop until a bit falls off
-
- move.w D2,D0 ; else save our value where we promised it
- LOGDONE ; here with final value in D0
- rts ; and return
-
- .end ; procedure dissBits
-