home *** CD-ROM | disk | FTP | other *** search
-
- 3DGame Engine, by: Jason Freund and Gabe Dalbec.
-
-
-
-
- INTRO
-
- This paper describes the algorithm used to create the demo, the
- current problems with it, and possiblities for future expansion. It's
- intended for curious people or for people who might want to join a team
- to develop this into a PD game (see rot.invite file) or people curious
- about the technical aspects of the program.
-
-
-
- CURRENT WORK
-
- I've done most of the current work myself and with a new partner,
- Chris Hames (Degrader, DirWorks, PCTask). My roommate who did half the
- original work is in Mexico for a few months.
-
- First, the perspective problem, described below. The first thing
- you notice about the demo is that walls get stretched when they are
- close to you and at an angle.
-
- We are also currently working on objects. I am not going to use
- my custom blitter routines to blit objects on the screen -- like I
- originally planned. Instead, objects will be converted to the same
- semi-chunky format (described below) as the walls and then rendered
- into fast ram. The advantages are: 1) no need to double buffer to
- get rid of flickering caused by 2 stage drawing (one for walls, one
- for objects). This is a good speedup. 2) Faster on high end computers.
- (Which is my target machine, so I don't care) 3) Can do correct
- vertical and horizontal scaling. -- During rendering, I can pull
- columns and rows from the brush to shrink it to fit perfectly. If I'd
- used the blitter, I could only shrink vertically and I'd have to use
- 5 different sizes of each brush to get choppy scaling in the H-dir.
- Disadvantage of not using blitter: objects restricted to 16 colors
- because of nature of semi-chunky format.
-
-
-
- OVERVIEW OF THE ALGORITHM
-
- This program doesn't use any texture mapping or ray tracing.
- Everything is faked. There are many limitations on the engine which
- allow us to fake things. First, walls are all equal-length and equal
- height. They can only be spaced multiples of 1 wall length apart. You
- cannot see through walls. Walls must be perpendicular to each other.
- The whole dungeon is vertically symmetrical. These assumptions allow us
- to generate each frame in "screen" space -- by looking at the start and
- endpoints of the tops of each wall face.
-
- To reduce the number of rotations per frame, the dungeon data is in
- third normal form. That means we have a list of (X,Y,Z) points
- representing endpoints of wall faces and a list of edges that reference
- this list of points. We also create a list of midpoints from the list
- of edges. The Z values of each point (or height of the walls) are a
- +constant (ie 0.6). This way each edge represents the top line of a
- wall. All you need is this top edge to compute the "zbuffer".
-
- Each frame is defined by a 1 line "z-buffer": typedef struct {
- short height;
- short bitmap;
- short offset;
- short junk; /* to make structure 2^3 long */ } mapline; mapline
- zbuffer[SCREENWIDTH];
-
- Now, for each 1 pixel column on the screen, you have:
-
- 1) height == how far from the top of the screen to start drawing the
- column. You draw from height, down to the center of the screen. 2)
- bitmap == which source bitmap number to draw from at that column. 3)
- offset == How far in the source bitmap to go to grab the column to draw.
-
- For each frame you generate a new "zbuffer" by examining every point
- between the start and endpoint of every edge (that you can at least
- partially see) in the dungeon. First, set height=MIDDLEOFSCREEN for
- every column in the zbuffer. I use integer Breshenham's to generate
- every point between the endpoints of the line segment. Then for each
- point on the line, check to see if the Y coord is higher than the
- corresponding zbuffer[i].height. If it is, then that point represents a
- column that will be in front of everything else and you should update
- every field in zbuffer[i]. When you've done this for every edge, your
- zbuffer will contain everything you need to describe the scene.
-
- Once you generate the zbuffer, just loop through the array, yank
- _column from _bitmap and draw it from _height. Since you know how high
- the wall is, you can calculate how much to shrink or expand that source
- bitmap column and skip or redraw rows of the column right in the loop
- that copies it to the screen.
-
-
- COPY-OUT
-
- The "copy-out" routine is the biggest bottleneck in the program.
- The "copy-out" routine copies a column from the source bitmap to the
- screen, shrinking or expanding it as it copies. I will spend some time
- writing about this aspect of the program since it is the most important
- part.
- To begin with, it is the most optimized assembly we've ever written.
- It uses lookup tables for every calculation where using a lookup table
- will save a cycle. This is the routine where we made huge changes to our
- code to implement schemes that would remove 1 cycle from the innermost
- loop. We tried about a half dozen schemes before we came up with our
- final version.
-
- First, our bitmaps are preconverted to 1 big chunky plane rather
- than 4 small bitplanes; the first ulong represents the first 8 pixels.
- Consequently, when you want to display a pixel, you do shifts, ands, and
- ors a longword at a time and get the bits for all 4 bitplanes at once.
- These calculations are all done in FAST memory and rendered onto a FAST
- memory screen-sized area. Then, when you're all done rendering the
- screen, you copy it all to planar-mode CHIP memory so you can see it.
-
- It's convenient that the 3000's memory and bus are all four bytes
- wide because that gives us all the colors we could ask for: 16. But
- we're getting 32 colors for almost free. The only cost is blit-clearing
- a fifth bitplane (half the size of the screen), 20% bus contention with
- denise chip which is in competition for bus time to display the picture,
- and expanding our copy-out loop by one to write a 1 or 0 into the fifth
- plane according to which palette is being used in that column. It turns
- out that all these costs don't add up to more than about 5% slowdown.
- So we took it. The "source bitmap number" field of our mapline is also
- an index into an array that tells us which palette (0 or 1) the bitmap
- uses. Our "copy-out" routine sets the appropriate bit in the fifth
- bitplane of the destination screen based on this lookup without taking
- any time away from the longword logical operations on the source bitmap.
- We used to employ the blitter to fill the fifth bitplane in one fell
- swoop instead of filling it byte-by-byte withe the CPU. This was done
- by drawing vertical lines everywhere the scene changed palette. Then
- we'd fill the screen, telling the blitter to change fill modes every
- time it hit a line. Not only was this slower, but we got a slight
- flickering when we drew the fifth bitplane with blitlines and a blitfill
- due to the fact that we were drawing to the screen at 2 different times.
- But when we started to fill the fifth bitplane inside the copy-out loop,
- the flickering was fixed.
-
- One of the copy-out methods we experimented with was a pipelined
- engine. I am describing it because it would be useful for people who
- want to modify our engine to run on the 500 which doesn't have a fast
- CPU. You can use the CPU to generate the zbuffer while the blitter
- clears the screen. Then the CPU would perform the copy-out. Then the
- Blitter would do the Vflip while the CPU rotated the points for the next
- frame, and so on. We rejected this and other schemes because a 68030
- CPU is faster than the blitter for everything; any parallelism would not
- speed up the pipeline. In fact, after everything was optimized for the
- CPU, we didn't need the blitter at all.
-
-
-
- PROBLEMS:
-
- One problem we found is that at the corners of the walls, a few
- pixels from a wall behind would overlap the wall in front. This ocurred
- because we're not using any painter's algorithm to traverse the list of
- edges. At the corners where both walls at a corner are at the same
- height, there is no way to judge which is actually in front. Another
- problem we had is that corners didn't exactly line up because we were
- using breshenhams to calculate points along a line segment.
-
- Both of these problems were fixed by multiplying every Y value by 32
- before calculating points between the endpoints of the edge and then
- dividing by 32 after the zbuffer is generated. Be sure to check for -Y
- values before lsl/r #5's to preserve the sign of Y.
-
- Another problem that we still have is with the perspective of the
- walls. Flat walls or walls far away are fine. But walls that are close
- to you get smeared. This is because we don't calculate any real
- perspective. We just use a linear mapping based on the ratio of the
- width of the rotated wall to the width of the flat, original source
- bitmap to calculate offsets to the source bitmap. Walls that are at a
- steep angle and close to you end up drawing from a very small portion
- from the source bitmap.
-
- One solution that we tried was to generate real perspective based on
- a recursive algorithm. We already have a list of midpoints for each
- wall face that we use to detect wall collisions. If we rotate these
- midpoints, we can find the ratio between the width on the screen of the
- close half of the wall to the width of the far half of the wall. This
- midpoint will draw from the center of the source bitmap. Then we just
- recursively subdide each half of the wall, keeping track of the new
- corresponding center on the source bitmap as we go, building an array of
- source bitmap offsets. This is slow, but could be precomputed and
- simulated with a lookup table that just takes in a ratio to lookup
- offsets for the largest possible wall. Smaller walls could then use a
- linear mapping from this large wall. Unfortunately, we still haven't
- been able to get the recursive routine to terminate correctly.
-
-
-
-
- SPEEDUPS:
-
- Our first speedup was to only draw the top half of the screen and
- use the copper to reflect the top half onto the bottom half. Actually,
- this was just changed by Chris Hames to be done with the CPU now. But
- we still get a cute floor and ceiling for free out of this by changing
- the background colors at certain rows. This speedup makes the program a
- good 35% faster than drawing all the way down.
-
- Resolution switching was implemented. Press "m" to change modes.
- Default is auto-switch. When you run, your eyeballs can't see detail
- as clearly -- so we make pixels chunky. When you rotate fast or run,
- the program draws frames in 1/4 -lores. THen when you stop running,
- it returns to regular lores. This is done by creating fastram image
- 1/2 width, 1/2 height of normal -- and then blowing it up during copy
- out to chip.
-
- The list of (X,Y) coordinates representing the endpoints of wall
- faces is sorted by X coordinate. Wall collisions are handled by
- checking every point in the maze. If any point is inside the box you
- are about to move into, an X and/or Y component of your movement is set
- to 0 so that you slide along the edge of a wall. To make the check
- against every point fast for large dungeons, we sort the list of points
- by X coordinate and only check the sublist until the X coordinate is out
- of the range of your box.
-
-
-
- AREAS FOR FUTURE EXPANSION
-
- Ideally, if enough people support this, it could develop into a
- fully multitasking game that runs on all amigas, as fast as your amiga
- can make it go.
-
- I decided not to do windows or 2 story buildings like in the game
- Legends of Valor. I'd have to do a bit of backtracking to implement
- them and I probably won't have time.
-
- Aside from obvious things that this needs to develop into a game
- like sound, music, good graphics, maps, monsters, etc, etc, some
- technical advances might be:
-
- AGA:
-
- Only the 4000 has enough spare cycles to handle this. AGA's
- 256 color mode conveniently fits twice the length of my 1longword
- chunky mode. So there'd be no wasted cycles in making the game
- work with 256 colors.
-
-
- CHANGE SCREENSIZE:
-
- I'd like to be able to adjust the view window like in wolf3d.
- This way, you could get it to run fast on an 020 w/ FPU and get it
- to run on a bigger view window on the 040.
-
-
- A500 COMPATABILITY:
-
- This program was not meant for machines without a fast CPU or a math
- chip. We don't have access to a 500 and we have no particular desire to
- see it run fast on one. We'd be happy to turn our sourcecode over to
- anyone who can read C and can program real well on the 500.
-
- Some things that would have to be done are:
-
- 1) Shrink the screensize. I think one-quarter screen would be small
- enough. Our engine code is modular enough to make adjusting the screen
- size easy. 2) Rewrite several routines to use integer math. I'm sure
- eurodemo coders have a lot of experience there. Integer math may even
- help the A3000 version. 3) Vflip screen with blitter instead of CPU.
- Use parallel processing where-ever possible (see COPY-OUT, above). 4)
- The two 16 color palette trick is done with the CPU. It would have to
- be redone using vertical lines and 1 blitter fill. Actually, this is
- the way we originally did it, so the code is already there -- just
- commented out. Of course, using the blitter, as explained in COPY-OUT,
- above, will cause flickering because you are drawing once for the top
- four plains and then again later for the fifth plane. This could be
- fixed, however, with doublebuffering which you'll probably need anyway
- on the 500.
-
-
- ---------------------------------------------------------------------------
- School addresss (until 6/12/93): Permanent Address
-
- Jason Freund, Jason Freund 2033 F Street #311
- 690 Erie Dr Davis, CA 95616 Sunnyvale, CA 94087
- 916-758-0272 408-732-0421
-
-
- Gabe Dalbec Gabe Dalbec 2033 F Street #311
- 789 Butterfly Rd Davis, CA 95616 Quincy, CA 95971
- 916-758-0272 916-281-6600
-
-