HPAVC

home *** CD-ROM | disk | FTP | other *** search

/ HPAVC / HPAVC CD-ROM.iso / pc / UGPRG.ZIP / 3D / 3D.DOC next >

Wrap

Text File | 1996-07-27 | 121.4 KB | 3,053 lines

3DGPL Version 1.0 -------------------- GRAPHICS TUTORIAL ------------------------------------ FOR GAME PROGRAMMERS ---------------------- Copyright (c) 1995 Sergei Savchenko (savs@cs.mcgill.ca) ------------------------------------------------------------------------------ Original Archive is available @x2ftp.oulu.fi Please note that this document does not contain the source! ------------------------------------------------------------------------------ * * * 3-D transformations. ---------------------- "Several large, artificial constructions are approaching us," ZORAC announced after a short pause. "The designs are not familiar, but they are obviously the products of intelligence. Implications: we have been intercepted deliberately by a means unknown, for a purpose unknown, and transferred to a place unknown by a form of intelligence unknown. Apart from the unknowns, everything is obvious." -- James P. Hogan, "Giants Star" 3D transformations would assume the central position in the engine. After all, this is something which would allow us to see objects from different sides, manage the three dimensional space etc. The simplest but by no means least important transformations are translation and scaling. The rotation transformation would be little bit more complicated to understand and implement but fortunately not much. Talking of all three transformations we can either consider them from the point of an object (collection of points in 3D) being transformed, or a space itself being transformed. In some cases it is more productive to think one way in some another way. Translation and scaling. ------------------------ Translation. This transformation moves points of an object to a new specified position pic 2.1, On the other hand we may think of it as a system of references being moved in the opposite direction pic 2.2: Y ^ ^ | * A' | ^ * B' | | | ^ Y'^ |A*--------+ | y component | |A* | B*--------+ | B* | x component | | ---------+-------------------> X - - - - - - | - - - - - - -> X 0| | 0| | -----------+------------->X' | 0'| | | | | | | | pic 2.1 pic 2.2 This way if it is a collection of points we are moving, Distances between them would not change. Any move across 2D space can separate into 2 components - move along X and move along Y. It would take 3 components in 3D space - along X, Y and Z. Important place we may need translation is to , for example , move (0,0) of the screen from upper left corner of the screen to it's physical centre. So the translation function might look like: void T_translation(register int *from,register int *into,int number, int addx,int addy,int addz ) { register int i; for(i=0;i<number;i++) { (*into++)=(*from++)+addx; (*into++)=(*from++)+addy; (*into++)=(*from++)+addz; /* translation */ } } Another transformation is scaling: blowing up of distances between points. Or we can think of it as a contraction or blowing of the space itself pic2.3: A'* Y ^ \ | \ | * B'* *C' A ^ / B* *C | --------------+-----------------> X 0| | | pic 2.3 The obvious usage of this transformation: trying to fit 2000x2000 object into 200x200 window - evidently every point of this object would have to have coordinates scaled to fit in the window range. In this case scaling is just multiplication by 1/10. The object is contracted that way. On the other hand we can think that it was actually the window space expanding to enclose the object (we would have to abstract far enough from computer hardware today on the market to do that, however). In general case we can introduce separate factors for every axis, if this factor is greater then 0 and less then 1 we would contract the object if it would be greater then 1 the object would be expanded. The code is once again trivial: void T_scaling(register int *from,register int *to,int number, float sx,float sy,float sz ) { register int i; for(i=0;i<number;i++) { (*to++)=(int)(*from++)*sx; (*to++)=(int)(*from++)*sy (*to++)=(int)(*from++)*sz; } /* scaling */ } Planar case rotation. --------------------- Of those three transformations rotation is by far the most complicated one, let's first consider planar - 2-D case of rotation. there is some simple trigonometry involved, sin and cos functions. just for those who don't deal with this subject every day sin and cos for a triangle on pic 2.4 are defined as: + /| l/ | / |y /alp| +----+ x pic 2.4 x y sin(alp) = --- cos(alp) = --- l l And so if we do know sin(alp) or cos(alp) we can calculate segments x and y knowing segment l: x = l*sin(alp) y = l*cos(alp) This can be considered as seeking projections of a point for which we know the distance from the beginning of coordinates onto orthogonal axes pic 2.5: ^ Y | +----* | /| y| /l| | / | |/alp| ------+----+---->X | x | pic 2.5 Now, let's consider the situation when we rotate the system of references counterclockwise by the angle alp. What would be coordinates of point A in the turned system of references Y'X' if it's coordinates in the original system of references were (x,y)? ^ Y X' | / Y' Y/*\--*A / \ / | \| / \ / | |\ / Yy'\ | | /Yx' \ | | \| /\ Xx' -------------+---+-------------------> X /| \/ X / Xy' \ / | \ / | \ / | \ | pic 2.6 Using the sin/cos formulas from above we can find projections of x and y (that is of the projections of the point onto original axis's) onto new axis's X'and Y'. Let's sum projection of x onto X' and projection of y onto same axis's and do the same thing with projections of x and y onto Y'. Those sums would be just what we are looking for, coordinates of the point A in the new system of references X'Y' i.e. X' = Yx'+Xx' = y*sin(alp) + x*cos(alp) Y' = Yy'+Xy' = y*cos(alp) + (-x*sin(alp)) Note the sign of Xy', just from looking on the pic2.6 one can see that in this case x is being projected onto the negative side of Y'. That's why the negative sign. So those are the transformation formulas for counterclockwise rotating system of reference by an angle alp: x'= y*sin(alp) + x*cos(alp) y'= y*cos(alp) - x*sin(alp) On the other hand we can consider it as point A itself rotating around zero clockwise. 3-D rotation. ------------- This subject is complicated only because it is a bit hard to visualize what is happening. The idea on the other hand is simple: applying three 2-D rotations one after another so that each next works with the results obtained on the previous stage. This way every time we are applying new transformation we are working not with the original object but the object already transformed by all previous rotations. Each of 2D rotations this way is bound to some axis around which two other axes would rotate. There are few important considerations influencing the way we would find the formulas: 1) What kind of system of references we have. What is the positive direction of the rotations we would be applying; and 2) In what order we want to apply 2D rotations. System of references: --------------------- Well, we do have freedom of choosing directions of axes, for one thing, we are working in orthogonal system of references where each axes is orthogonal to the plane composed of remaining two. As to where the positive direction of every axis point we have freedom to decide. It is customary for example to have positive side of Y directed upside. We don't actually have to follow customs if we really don't want to, and remembering that the colourmap, we would be doing all drawings into, have Y directed down we might want to choose corresponding direction here. However having it pointing up is more natural for us humans, so we might save time on debugging a while afterwards. Let's point it up. As to the Z let's direct it away from us and X to the right. Y^ Z | / | / alp /|<---+ | |/ | -------|-+-----------> X bet | | __ V/| /| gam /----+ / | / | | pic 2.7 As to the rotations, let's call the angle to turn XY plane around Z axis as gam, ZY around X as bet and ZX around Y as gam, pic 2.7 Transformation order. --------------------- The problem with transformation order is that the point consequently turned by gam-bet-alp won't be necessarily at the same place with where it might go if turned alp-bet-gam. The reason is our original assumption: each next transformation works on already transformed point by previous 2D rotations. That is, we are rotating coordinates around moving axes. On the picture 2.7 we can see that rotation bet is performed around axis x but the next rotation alp is performed around axis z which after application of previous transformation would move. + +----+ z / /| / / alp / + V bet \ ------------+-------|---- x ---------+------------ x / | \ +----+ z | / | | V alp pic 2.8 The implication is: we have to know with respect to what each next rotation is performed, that is what is the order of applications of 2D rotations. Let's think how we are coordinating the surrounding world? First we think of the direction. Then we can tilt our head up and down, and finally from there we can shake it from left to right. Now, when we are tilting head we have already chosen the direction. If we first would tilt head then the directional rotation to be performed after would not be parallel to the ground but rather perpendicular to the imaginary line being continuation of our tilted neck. ( No responsibility hereby assumed for all direct or consequential damage or injury resulted from doing that ) So it all comes to directional rotation first - gam in our case. pitch second - bet; roll last - alp. Let's do just that using obtained formula for 2D rotation: ^Z ^Y' ^Y" | | | |<---+ |<---+ |<---+ | gam| | bet| | alp| | | | | | | -+----+--->X -+----+--->Z' -+----+--->X" | | | x'=z*sin(gam)+x*cos(gam) y'=y z'=z*cos(gam)-x*sin(gam) x"=x y"=y*cos(bet)-z*sin(bet) z"=y*sin(bet)+z*cos(bet) x"'=y*sin(alp)+x*cos(alp) y"'=y*cos(alp)-x*sin(alp) z"'=z pic 2.9 That's basically it, using those formulas we can apply 3D rotation to coordinates x,y,z and get x"',y"',z"'. We can see here 12 multiplication's, The question is can we reduce this number? Let's try getting rid of temporary variables x',y',z',x",y",z". We can do it by just substituting each occurrence of them by their real meaning: First let's try obtaining x",y" and z" expressed via x,y,z: y"=y'*cos(bet) - z'*sin(bet) y"=y*cos(bet) - (z*cos(gam)-x*sin(gam)) *sin(bet) y"=y*cos(bet) - z*cos(gam)*sin(bet) + x*sin(gam)*sin(bet) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x"=x' x"= z*sin(gam) + x*cos(gam) - - - - - - - - - - - - - - - z"=y'*sin(bet) + z'*cos(bet) z"=y*sin(bet) + (z*cos(gam)-x*sin(gam)) *cos(bet) z"=y*sin(bet) + z*cos(gam)*cos(bet) - x*sin(gam)*cos(bet) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Now using that we can express x"',y"' and z"' via x,y,z: y"'=y"*cos(alp) - x"*sin(alp) y"'=(y*cos(bet) - z*cos(gam)*sin(bet) + x*sin(gam)*sin(bet))*cos(alp) - (z*sin(gam) + x*cos(gam)) *sin(alp)) y"'=y*cos(bet)*cos(alp)-z*cos(gam)*sin(bet)*cos(alp)+x*sin(gam)*sin(bat)*cos(alp) -z*sin(gam)*sin(alp) - x*cos(gam)*sin(alp) y"'=x* [ sin(gam)*sin(bet)*cos(alp) - cos(gam)*sin(alp)] + y* [ cos(bet)*cos(alp) ] + z* [-cos(gam)*sin(bet)*cos(alp) - sin(gam)*sin(alp)] ----------------------------------------------------------- x"'= y"*sin(alp) + x"*cos(alp) x"'=(y*cos(bet) - z*cos(gam)*sin(bet) + x*sin(gam)*sin(bet))*sin(alp) + (z*sin(gam) + x*cos(gam)) *cos(alp) x"'=y*cos(bet)*sin(alp)-z*cos(gam)*sin(bet)*sin(alp)+x*sin(gam)*sin(bet)*sin(alp) z*sin(gam)*cos(alp) + x*cos(gam)*cos(alp) x"'=x* [sin(gam)*sin(bet)*sin(alp) + cos(gam)*cos(alp)] + y* [cos(bet)*sin(alp) ] + z* [sin(gam)*cos(alp) - cos(gam)*sin(bet)*sin(alp) ] ---------------------------------------------------------- z"'=z" z"'=x* [- sin(gam)*cos(bet)] + y* [sin(bet)] + z* [cos(gam)*cos(bet)] ---------------------------------------------------------------- This really appear to be more complicated that what we had before. It appear to have much more multiplications than we had. But if we look on those resulting formulas we can see that all coefficients in square brackets can be calculated just once so that transformation of a point would look like: x"'=x*mx1 + y*my1 + z*mz1 y"'=x*mx2 + y*my2 + z*mz2 z"'=x*mx3 + y*my3 + z*mz3 It can be seen that it takes just 9 multiplication. Of course calculating all the coefficients takes 16 multiplications: mx1= sin(gam)*sin(bet)*sin(alp) + cos(gam)*cos(alp) my1= cos(bet)*sin(alp) mz1= sin(gam)*cos(alp) - cos(gam)*sin(bet)*sin(alp) mx2= sin(gam)*sin(bet)*cos(alp) - cos(gam)*sin(alp) my2= cos(bet)*cos(alp) mz2= -cos(gam)*sin(bet)*cos(alp) - sin(gam)*sin(alp) mx3= -sin(gam)*cos(bet) my3= sin(bet) mz3= cos(gam)*cos(bet) But if we have 100 points to turn. Original method would take 12*100=1200 multiplications. This one would take 9*100+16=916 since coefficients are calculated just once for all points to transform. One more consideration having to do with optimization, in most cases it makes sense to measure angles as integers. We don't usually care of fractions smaller then 1 degree. So we don't really need sin and cos for all the real number range. That's why we can calculate sin/cos just once for all 360 degrees before any rotations are performed and put the obtained values into arrays so that next time we would need any of them we wouldn't have to call a functions which probably uses a power series with few multiplications to calculate each. (Processors nowadays come with floating point units which can calculate sin/cos pretty fast but unlikely faster then single array access (this might become false though). By the way we don't really need 360 degrees. This number is not very convenient. We can go with full angle divided into 256 pseudo degrees. The reason we would need just one unsigned char to store the angle and whenever we go beyond 255 we simple wrap around to zero, saving this way a conditional statement we would need otherwise in the case of 360 degrees. #define T_RADS 40.7436642 /* pseudo grads into rads */ float T_mx1,T_mx2,T_mx3,T_my1,T_my2,T_my3,T_mz1,T_mz2,T_mz3; float T_sin[256],T_cos[256]; /* precalculated */ void T_init_math(void) { int i; for(i=0;i<256;i++) { T_sin[i]=sin(i/T_RADS); T_cos[i]=cos(i/T_RADS); } } Now finally to the code to perform 3D rotation. First function is building set of rotation coefficients. Note that T_mx1, T_mx2 etc. are global variables. The other function: T_rotation uses those coefficients, so they better be ready when later is called. void T_set_rotation_gam_bet_alp(int alp,int bet,int gam) { T_cosalp=T_cos[alp]; T_sinalp=T_sin[alp]; T_cosbet=T_cos[bet]; T_sinbet=T_sin[bet]; T_cosgam=T_cos[gam]; T_singam=T_sin[gam]; /* initializing */ T_mx1=T_singam*T_sinbet*T_sinalp + T_cosgam*T_cosalp; T_my1=T_cosbet*T_sinalp; T_mz1=T_singam*T_cosalp - T_cosgam*T_sinbet*T_sinalp; T_mx2=T_singam*T_sinbet*T_cosalp - T_cosgam*T_sinalp; T_my2=T_cosbet*T_cosalp; T_mz2=-T_cosgam*T_sinbet*T_cosalp - T_singam*T_sinalp; T_mx3=-T_singam*T_cosbet; T_my3=T_sinbet; T_mz3=T_cosgam*T_cosbet; /* calculating the matrix */ } void T_rotation(int *from,register int *to,int number) { register int i; register int xt,yt,zt; for(i=0;i<number;i++) { xt=*from++; yt=*from++; zt=*from++; *to++=(int)(T_mx1*xt+T_my1*yt+T_mz1*zt); *to++=(int)(T_mx2*xt+T_my2*yt+T_mz2*zt); *to++=(int)(T_mx3*xt+T_my3*yt+T_mz3*zt); /* performing the operation */ } } Representing transformations in matrix form. -------------------------------------------- Matrix is a table of values: | a b c | A=| d e f | | g i j | Special cases are column and row vectors: | x | V=| a b c | C=| y | | z | Row can by multiplied by a column producing a value, a scalar: | x | | a b c |*| y | = a*x + b*y + c*z | z | Matrix can be multiplied by another matrix: each value of the result was obtained by multiplying respective rows and columns of the arguments (since columns and vectors are to have the same dimension in order for multiplication on them to be defined it automatically places restrictions on dimensions of matrices that can be multiplied): | x | | k | | | a b c |*| y | | a b c |*| l | | | | z | | m | | | | | a b c | | x k | | | x | | k | | | d e f |*| y l | = | | d e f |*| y | | d e f |*| l | | | g i j | | z m | | | z | | m | | | | | | x | | k | | | | g i j |*| y | | g i j |*| l | | | z | | m | We may be interested in multiplying a row vector by a 3x3 matrix: | a b c | | x y z |*| d e f | = | x*a+y*d+z*g x*b+y*e+z*i x*c+y*f+z*j | | g i j | Looks familiar? Yes, the rotation coefficients we've calculated lie nicely in the 3x3 matrix and so we can think of 3D rotation as of matrix multiplication of "rotation" matrix onto original vector producing result vector. We can also try fitting translation and scaling into a matrix, to do that for translation, however, we would have to increase dimensions of the coefficient matrices and the argument vector: Translation: | 1 0 0 0 | | x y z 1 |*| 0 1 0 0 | = | x+tx y+ty z+tz 1 | | 0 0 1 0 | | tx ty tz 1 | Scaling: | sx 0 0 0 | | x y z 1 |*| 0 sy 0 0 | = | x*sx y*sy z*sz 1 | | 0 0 sz 0 | | 0 0 0 1 | The advantage of the matrix approach is that if we have several transformations each represented in matrix form: | X'| = (| X |*| A |) *| B | Where A, and B are transformation matrixes and | X | the argument vector to be transformed we can: | X'| = | X |* ( | A |*| B | ) calculate the "concatenated" matrix | K | = | A |*| B | and each following coordinate just transform as: | X'| = | X |*| K | By the way by representing each of 2-D rotations in 3x3 matrix form we can get the formulas we have found for 3-D rotation by calculating a concatenated matrix. So, matrix approaches might be very effective in instances where there are a lot of consecutive transformation. For example if we have consecutive rotations it makes sense to calculate their concatenated matrix and use it from then on. It would save us quite a few multiplication, one verses several consecutive matrix multiplication for each point to transform. And in the run-time it is preferably to use pre-calculated formulas for concatenated matrix coefficients then trying to get it every time by multiplying respective transformational matrices. The reason 3 consecutive matrix multiplication of, for instance, 3x3 matrices take 9+9+9=27 multiplication's. But since lot of values in those matrices are 0, it is taken into account in pre-calculated coefficient formulas which allow to produce the very same results with just 16 multiplications. As to the translation transformation matrix and especially increasing the dimension from 3 to 4 it looks much less natural. 4x4 matrix means more multiplications too, of course some of the value in 4x4 would always be zero so we can optimize (cripple) the routine to original 9 multiplications and 3 additions. In one word, there's a freedom to decide what approach to take. perspective. ------------ I believe, it is not hard to visualize what perspective is, perhaps the first association lot of us would have, is this empty straight street with identical buildings on both sides disappearing in infinity. ( At least people residing in newer neighborhoods of Moscow or Toronto know what I am talking about ) Why do we need this transformation? Just to achieve landscape realism, after all this same street would look pretty weird if the road would not collapse into a dot and buildings further away from us won't look smaller. So weird it might be even hard to picture. Perspective might not be needed for CAD style applications, but even there it would, most likely, be implemented in one viewing mode or another. Before we can implement this transformation it is a good idea to understand physics of what is happening. A ray from some point in space is being caught by the viewer's eye. This ray had intersected a plane located in front of the viewer. If we can find an intersection point and plot the dot there, for the viewer it would be the same as if the ray from the plotted dot was actually coming from the initial point in space. * A | / A'| / * / | B' *---*-----------------------* B \ | *\ C'| \ | \ * C pic 2.10 Rays from different points in space , in this model, all converge in one point - viewer's eye. The math here is quite straightforward pic 2.11: screen plane->| * | /| A (X,Z) | / | |/ | ^ * | | /|X' | / | | |/ | | *---+---+- - - - -> 0 | Z | | focus distance ->|---|<- pic 2.11 Let's consider 2-D case first: the viewer's eye is located at point 0 the distance between viewer's eye and the screen would be called - "focus". The problem is to determine at what point the ray going from point A into the viewer's eye would intersect the screen. We would have to plot the dot just there. This is very simple, just yet another case of solving similar triangles. Just from the fact that the angle at 0 is the same for both triangles, and if two angles are the same their tangents would be too. We have: X' X X*focus ----- = ----- => X'= ----------- focus Z Z Same considerations apply for Y dimension, together describing 3-D case: Y'=Y*focus/Z X'=X*focus/Z It is a bit of a mystery what is happening with Z coordinate. Basically after the perspective transformation we would be performing rendering onto the screen which is 2-D surface we would need X and Y but hardly Z, so we might as well have Z'=0. However when trying to render multiface objects we would need to know the depth (Z) of all polygons so that we can decide which are visible and which are obscured. So we might have Z'=Z that is just leave old value. Then again by doing transformation on X and Y and leaving Z unchanged we actually may allow for depth relation to change. That is a polygon obscuring another one in the original space, before projection, can actually come out behind or partly behind with this kind of transformation pic 2.21. ^ Z same | * * deapth | * / * / before | A / / // and | * / /* A after | /B / B | * * | before after pic 2.12 To avoid that we can apply some non linear transformation on Z, like Z'=C1+C2/Z, where C1, C2 are some constants. In the enclosed code only X and Y are being transformed. The implementation of all those things is that easy I won't even put a code example here. However, there are few possible questions: first: what should be the value for "focus distance" ? \ / \ / \ / \ / \ / \ / ---I\--------/I--- ---I\--------/I--- \ / \ / \/ \ / \ / \/ pic 2.13 pic 2.13 shows it's physical sense, focus basically determines the view angle. when it is smaller - angle is wider, when it is bigger angle is smaller. It is helpful to think of this distance as being measured in screen pixels, this way for 320 pixel display if focus is chosen to be 180 the view angle we are getting is 90'. Perspective transformation might produce at first, weird looking results, with distortions sometimes similar to that of wide-range photographic pictures. One has to experiment a bit to find what's looking best, values giving view angle somewhere between 75-85' would probably look not that terrible, but that depends of scene and screen geometries. clipping problem... ------------------- Where transforms point with Z equal to 0? since we are dividing by Z that would be infinity, or at least something producing divide error, in this computer-down-to-earth-world. The only way to deal with this is to guarantee that there would be no coordinates with such Zs. another thing, calculations for points with negative Z would produce negative coordinates, what we would see is objects (or even parts thereof) flipped over, but then again objects with negative coordinates are behind the viewing plane and effectively behind the viewer, so this is something we can't see and better make sure this is not being rendered. The way to do it is by applying 3-D clipping. The perspective transformation can also be represented as a matrix, however, let's think of all transformations discussed up to this moment a point would have to go through in order to be rendered. First we would rotate the world with respect to viewer orientation, then before we can apply perspective transformation we would have to do clipping and at least get rid of the vertices behind the viewing plane, and only then can we apply perspective transformation. Representing clipping in matrix form is not the most pleasant thing there is. And we've already discussed that matrices would be of use when we have several consecutive transformations so that all of them would be represented by one matrix. In this case however rotation and perspective are separated by the clipping. If we are sure that none of the coordinates would ever get behind the viewing plane, we don't actually need clipping, that's the instance to represent everything in the matrix form, otherwise it might not be a good idea neither in terms of code complicity nor performance. fixed point math. ----------------- In previous chapters in several places we didn't have another choice but to use floating point multiplications. (sin and cos are after all real-valued functions) However this is still fairly expensive operation, integer multiplications and divisions would usually be faster. However since we have to represent numbers with fractions it appeared we didn't have another choice but to use floating point operations. The question is, can we represent fractions using just integers? One might think there's a contradiction in the question itself, however we would see that, as in many other cases, we can use one mechanism to represent something else. This is the case here, We may use fixed point numbers representing them as integers, and with small amendments we can use regular integer additions multiplications etc, to perform fixed point operations. First of all how do we represent integers? +------+------+------+------+------+ | 1 | 0 | 0 | 0 | 1 | +------+------+------+------+------+ 4 3 2 1 0 2 =16 2 =8 2 =4 2 =2 2 =1 pic 2.14 With any counting system, digits in multi digit numbers would have different weight, so to speak. With decimal numbers this weight would be some power of ten, that is 102 is actually: 10**2*1 + 10**1*0 + 10**0*2 == 100+0+2 ==102 Same with binary numbers, the only difference, weight would be some power of 2. this way number on pic 2.13 would be: 2**4*1 + 2**3*0 + 2**2*0 + 2**1*0 + 2**0*1 == 16+0+0+0+1 == 17 Now, it is quite natural for us to place a decimal point into a decimal number, placing a binary point into a binary number should not seem more difficult. How do we interpret the decimal point? It is just that for the numbers to the right of the decimal point we pick negative power of ten. That is: 102.15 == 10**2*1 + 10**1*0 + 10**0*2 + 10**-1*1 + 10**-2*5 == 1 5 15 == 100 + 0 + 2 + ---- + ---- == 102 + ----- 10 100 100 In this representation the point separates negative powers from zero power and unlike numbers represented in exponential form - the floating point numbers, in this case the point never changes it's position. Nothing should stop us from extending the same scheme onto binary numbers: +------+------+------+------+------+ | 1 | 0 | 0 | 1 | 1 | +------+------+------+------+------+ 2 1 0 -1 -2 2 =4 2 =2 2 =1 2 =1/2 2 =1/4 pic 2.15 Using the very same technique we can see that number represented on the pic 2.15 can be considered also as: 2**2*1 + 2**1*0 + 2**0*0 + 2**-1*1 + 2**-2*1 == 4+0+0+1/2+1/4 == 4 + 3/4 Numbers in what range can we represent? Resorting once again to analogy with decimal numbers it can be seen that two digits to the right from the decimal point can cover the range of 0.99 in 0.01 steps. Numbers smaller then minimal 0.01 precision step just can't be represented without increasing number of digits after the decimal point. Very same thing is happening with binaries in the example at pic 2.15 since we have two binary digits- minimal number we can represent is 1/4 ( 0.01 (bin) == 1/4 (dec) ) and again numbers with higher precision can be represented only by increasing number of binary digits after the binary point. Now to the representation of negative numbers, there actually several ways to do that but effectively today one has prevailed, this is so called 2's compliment form. The idea is to negate the weight of the leftmost digit pic 2.16: +------+------+------+------+------+ | 1 | 1 | 1 | 1 | 1 | +------+------+------+------+------+ 4 3 2 1 0 -2 =-16 2 =8 2 =4 2 =2 2 =1 pic 2.16 the number represented above is considered to be: -2**4*1 + 2**3*1 + 2**2*1 + 2**1*1 + 2**0*1== -16+8+4+2+1 = -1 The advantage of this approach, we don't have to change addition and subtraction algorithm working for positive integers in order to accommodate 2's compliment numbers. Why is that? just by the virtue of natural overwrap of integer numbers the way they are represented in the computers. If we add 1 to the maximum number we can represent we would obtain a carry from the most significant (leftmost) bit, which is ignored, and the value of exactly 0. -1 in signed representation is exactly the maximum possible value to represent in unsigned representation, -1+1==0 indeed. (One can check this to be true for all numbers and by more formal means, but since this is not the point I would ignore it) Again, addition and subtraction don't have to be changed to accommodate 2's compliment negative numbers (multiplication however should, that's why there are instructions for both signed and unsigned multiplications in most computers) Since the leftmost digit in 2's compliment representation carries negative weight and that's exactly the one with highest power the minimum negative number possible to represent in the example above would be 10000 (bin) = -16 all the other digits have positive weights so maximum possible positive number would be 01111 (bin) = 15 but this slight asymmetry is not a problem in the majority of cases. However, values of sin and cosin functions are between 1 and -1 so to represent them we might choose format with just one integer field: +------+------+------+------+------+ | 0 | 1 | 1 | 1 | 1 | +------+------+------+------+------+ 0 -1 -2 -3 -4 -2 =-1 2 =1/2 2 =1/4 2 =1/8 2 =1/16 pic 2.17 but, because of the asymmetry described above there would be a value for -1 (1000) but there would be no pure 1, just it's approximation (01111): (pic 2.17) 1/2+1/4+1/8+1/16, Then again, for most graphics applications when we have, say, 15 fields representing fractions this would be close enough and won't cause a lot of problems. As you can see, the very same rules work for 2's compliment whether with fixed point or without. multiplication of fixed point numbers. -------------------------------------- Let's consider what is happening when we are multiplying two decimal numbers, for example we have to multiply an integer by a number with a decimal point and we would need just an integer as a result: 1.52 * 11 ----- 1 52 + 15 2 ----- 16.72 Actual result of multiplications has as many digits after the decimal point as there were in both arguments. Since we need just an integer as a result we would discard numbers after the point effectively shift digits to the right by 2 in this case. We can do the same with fixed point binary numbers, it means that we have to readjust precision after the multiplication. Say, if we are multiplying an integer by a number having 8bit considered to be after the binary point the result would also 8bit after the point. If it is just the integer part we are interested in, we have to shift the result right 8 bit destroying all fractional information. Similar things are happening with division except that if we want to divide two integers and obtain a fixed point result the argument to be divided has to be shifted left- added fixed point fields, so that effectively we are dividing a fixed point number by an integer. How to implement all that? Well, first of all we have to decide what kind of values and what operations on them we would need. Sometimes it is nice just to have general purpose fixed point math library, on the other hand we may choose several different formats of fixed point numbers one for every special place. Another thing we have to consider is: would we be able to represent all the operations using just high level C constructs or it would be necessary to descend into assembly instructions. The reason we might consider later is: in most assemblies the result of multiplication has twice as many bits as each of arguments, and since we would occupy some bits with fractions we really need all bits we can get. Another thing, the result of multiplication would often be located in two registers, So we can organize operations the way so that instead of adjusting result by shifting we would just take it from the register carrying higher bits, -effectively doing zero-cost shifting right by the bit length of the register. On the other hand, do we really want to compromise code portability by coding in assembly? Obviously, there are valid reasons to answer that both ways. In particular case of this text and associated code we are interested in good portability so let's try to stay within straight C boundaries. Where would we be using fixed point? first of all - in 3-D transformations, to keep rotation matrix coefficients. We would need two kinds of multiplication: In calculation of the coefficients: fixed * fixed producing same precision fixed and in actual coordinate transformation fixed * int producing int. lets define fixed to be: typedef signed_32_bit fixed; /* better be 32bit machine */ #define T_P 10 /* fixed math precision */ the T_P would be the number of bits we are allocating to keep fractions. Let's make a little function to transform the floating point values in the range [-1,1] into fixed point values: fixed TI_float2fixed(float a) { fixed res=0; int i,dv,sign; if((sign=a<0.0)==1) a=-a; for(i=1,dv=2;i<=T_P;i++,dv*=2) if(a>=1.0/dv) { a-=1.0/dv; res|=(0x1<<(T_P-i)); }; if(sign==1) res=-res; return(res); } In the function above we are iteratively subtracting powers of two from the argument and using the result to set the bits in the fixed point number. We would use this function to precalculate the array of sin/cos values. When calculating the rotational coefficients we would multiply one fixed by another and obtain same precision fixed result. again, actual result would have twice as many fractional bits as each of the argument so same precision result would be (a*b)>>T_P where 'a' and 'b' fixed point values and T_P number of bits after the binary point in each. This way code would look something like: ... ... T_wx1=((singam*((sinbet*sinalp)>>T_P))>>T_P) + ((cosgam*cosalp)>>T_P); T_wy1=((cosbet*sinalp)>>T_P); T_wz1=((singam*cosalp)>>T_P) - ((cosgam*((sinbet*sinalp)>>T_P))>>T_P); ... ... In rotation function ints would have to be multiplied by a fixed coefficient, the result would have T_P fractional bits, since we are interested in just integer part same rule as above applies: ... ... *to++=((T_wx1*xt)>>T_P) + ((T_wy1*yt)>>T_P) + ((T_wz1*zt)>>T_P); *to++=((T_wx2*xt)>>T_P) + ((T_wy2*yt)>>T_P) + ((T_wz2*zt)>>T_P); *to++=((T_wx3*xt)>>T_P) + ((T_wy3*yt)>>T_P) + ((T_wz3*zt)>>T_P); ... ... That's about it. The only thing, we really have to be careful with the range of numbers the operations would work for. If we are working with 32bit numbers and would choose 16 of them for fractional part and one for sign bit then only 15bit would carry the integer part. after multiplication of two fixed numbers since result has as many fractional bits as in both arguments all 32 bits would actually carry fractions. The integer part would be gone in the dark realm of left-carry, and there's no way in standard straight C to get it out of there. (when using assembly we don't have this problem since result of multiplication has twice bits of each argument, so we really can get trough to the integer part, however, most C compilers define the result of multiplication to be of the same bit size as arguments int*int==int, some compilers try to be smarter then that but we really have to consider worst-case scenario) And finally is it really true that integer multiplication plus shifts is faster then floating multiplication? What follows are results of some very approximate tests: sparc: floating faster fixed about 1.3 times 68040: floating faster fixed about 1.5 times rs4000: floating faster fixed about 1.1 times rs6000: fixed faster floating about 1.1 times i386sx: fixed faster floating about 5.1 times floating in the test was: float a,b; a*b; fixed in the test was: int a,b; (a*b)>>15; As one can see processors with fast floating point units nowadays do floating operations faster then integer once. Then again in terms of portability integer multiplication would always be fast enough whereas floating multiplication especially on processors without FPUs (i386sx) might get really slow. The decision whether to implement fixed point, this way, might get a bit tough to make. In the provided library's source transformations are implemented both ways but with the same function prototypes, so that one can decide which implementation is more suitable to link in. * * * 2-D rasterization. -------------------- Re graphics: A picture is worth 10K words - but only those to describe the picture. Hardly any sets of 10K words can be adequately described with pictures. Rasterization, the term itself refers to the technology we are using to represent images: turning it into mosaic of small square pixel cells set to different colours. The algorithms performing just that are covered in this section. A dot. ------ Rasterizing dot is very straightforward, I am not even sure it is warranted to call it rasterization at all. All it involves is just setting some location in the screen memory to particular colour. Let's recall that the very first location of the colourmap describes top rightmost pixel, that determines the system of references which is convenient in this case with X axis directed to the right and Y axis directed downward. HW_X_SIZE (0,0)+-+-+-+- ... -+-------> X | | | | +-+-+ + | HW_Y_SIZE ... | +-+- +-+ | | |*|<------y +-+-+ +-+ | ^ | | V | Y x pic 3.1 Assuming that each pixel is represented by one byte, and that the dimensions of the output surface is HW_SCREEN_X_SIZE by HW_SCREEN_Y_SIZE it is the y*HW_SCREEN_X_SIZE+x location we have to access in order to plot a dot at (x,y). Here's some code to do just that: unsigned char *G_buffer; /* colourmap- offscreen buffer */ void G_dot(int x,int y,unsigned char colour) { G_buffer[y*HW_SCREEN_X_SIZE+x]=colour; } A line. ------- Line rasterization is a very important peace of code for the library. It would also be used for polygon rendering, and few other routines. What we have to do is set to some specified colour all dots on the line's path. We can easily calculate coordinates of all dots belonging to this line: * (xend,yend) --- / | * (x,y) dy / | / |y | (xstart,ystart) *-----+ --------- x | | | |- -dx- -| pic 3.2 Remembering a little bit of high (high?) school geometry, it can be seen that if: dx = xend-xstart dy = yend-ystart then: x dx x * dy --- = ---- and y = -------- y dy dx Since we want to set all pixels along the path we just have to take all Xs in the range of [xstart, xend] and calculate respective Y. There is a bit of a catch, we would have only as much as xend-xstart calculated coordinates. But if the Y range was actually longer the path we have just found won't be continuous pic 3.3. In this case we should have been calculating coordinates taking values from the longer range [ystart, yend] and getting respective X, pic 3.4 as: y * dx x = -------- dy ....* ....* ..... ....* ...*. ...*. ..... ...*. ..*.. yrange ..*.. yrange ..... ..*.. .*... .*... ..... .*... *.... *.... xrange xrange pic 3.3 pic 3.4 What we just did is quite logical and easily programmable algorithm, but... it is not that commonly used, the reason has to do with the way we did the calculation, it involved division, and that is by far the slowest operation any computer does. What we would do instead doesn't involve a single division. It is called a Bresenham's integer based algorithm (Bresenham, 1965) and here we find all points using few logical operations and integer addition. The idea is to find a bigger range among [xstart,xend] and [ystart,yend] go along points in it and have a variable saying when it is time to advance in the other range. Let's consider what is happening at some stage in line rasterization, let's assume X is a longer range (dx>dy) and that with the line we are rendering xstart<xend and ystart<yend: | | | H(x+1,y+1) -+---|---*- | |h/ | | | * I(x+1,real_y) | /| + - - - - + - - - - + - | / | | | / | |l | / | -*---|---*-L(x+1,y) |P(x,y) | pic 3.5 Pic 3.5 shows the situation where we just rendered a pixel P (Previous) at coordinates (x,y) and now have to make a decision where to go to next, to pixel L(x+1,y) (Lower) or H(x+1,y+1) (Higher). points P,H,L actually represent middles of respective pixels. Since actual line would pass somewhere in the middle between those two points we must plot the one whose centre is closer to the intersection point I(x+1,real_y). this can be measured by comparing segments h and l, resulted from the intersection: remembering dependency used in the previous method we can find that: dy * (x+1) real_y = ------------ dx and: dy * (x+1) h = y+1-real_y => h = y+1 - ------------ dx dy * (x+1) l = real_y-y => l = ------------ - y dx Now, we are interested in comparing l and h: dy * (x+1) l - h = 2 ------------ - 2y-1 dx So, if l-h>0 it means that l>h the intersection point L is closer to point H and the later should be rendered, otherwise L should be rendered. let's multiply both sides by dx: dx(l - h)= 2dyx + 2dy - 2dxy - dx Now, since dx assumed bigger then 0, signs of (l-h) and dx(l-h) would be the same. Let's call dx(l-h) as d and find it's sign and value at some step i and next step i+1: d = 2dyx -2dxy + 2dy - dx i i-1 i-1 d = 2dyx -2dxy + 2dy - dx i+1 i i We can also find the initial value d, in the very first point since before that x==0 and y==0: d = 2dy-dx 1 ------------- Since sign of d determines which point to move next (H or L) lets find value of d at some step i assuming we know it's value at i-1 from previous calculations: d - d = 2dyx - 2dxy - 2dyx + 2dxy i+1 i i i i-1 i-1 d - d = 2dy(x - x ) - 2dx(y - y ) i+1 i i i-1 i i-1 Depending on the decision taken in the previous point value of d in the next one would be either: when H was chosen since x - x =1 and y - y =1 i i-1 i i-1 d - d = 2dy - 2dx i+1 i d = d + 2dy - 2dx i+1 i --------------------- or: when L was chosen since x - x =1 and y - y =0 i i-1 i i-1 d -d = 2dy i+1 i d =d + 2dy i+1 i -------------- This may seem a little complicated but the implementation code certainly doesn't look this way. We just have to find the initial value of d, and then in the loop depending on it's sign add to it either 2dy-2dx, or just 2dy. since those are constants, they can be calculated before the actual loop. In the proof above we have assumed xstart<yend ystrat<yend, however, in real life we can't guarantee that. The way we can take care of it, is just changing increments to decrements when above assumptions don't hold. We also have to remember that in the loop when L point is picked it is only value belonging to the bigger range which is incremented (decremented) whereas when picking H point both X and Y have to be changed since that is when we are advancing along both axes. void G_line(int x,int y,int x2,int y2,unsigned char colour) { int dx,dy,long_d,short_d; int d,add_dh,add_dl; register int inc_xh,inc_yh,inc_xl,inc_yl; register int i; dx=x2-x; dy=y2-y; /* ranges */ if(dx<0){dx=-dx; inc_xh=-1; inc_xl=-1;} /* making sure dx and dy >0 */ else { inc_xh=1; inc_xl=1; } /* adjusting increments */ if(dy<0){dy=-dy; inc_yh=-1; inc_yl=-1;} else { inc_yh=1; inc_yl=1; } if(dx>dy){long_d=dx; short_d=dy; inc_yl=0;}/* long range,&making sure either */ else {long_d=dy; short_d=dx; inc_xl=0;}/* x or y is changed in L case */ d=2*short_d-long_d; /* initial value of d */ add_dl=2*short_d; /* d adjustment for H case */ add_dh=2*short_d-2*long_d; /* d adjustment for L case */ for(i=0;i<=long_d;i++) /* for all points in longer range */ { G_dot(x,y,colour); /* rendering */ if(d>=0){x+=inc_xh; y+=inc_yh; d+=add_dh;}/* previous point was H type */ else {x+=inc_xl; y+=inc_yl; d+=add_dl;}/* previous point was L type */ } } As you can see this algorithm doesn't involve divisions at all. It does have multiplication by two, but almost any modern compiler would optimize it to 1 bit left shift - a very cheap operation. Or if we don't have faith in the compiler we can do it ourselves. Lets optimize this code a little bit. Whenever trying to optimize something we first have to go after performance of the code in the loop since it is something being done over and over. In this source above we can see the function call to G_dot, let's remember what is inside there,.... oh, it is y*HW_SCREEN_X_SIZE+x a multiplication... an operation similar in length to division which we just spent so much effort to avoid. Lets think back to the representation of the colourmap, if we just rendered a point and now want to render another one next to it how their addresses in the colourmap would be different? pic 3.6 +-+-+- A|*->|B +|+-+- C|V| | +-+- | | pic 3.6 Well, if it is along X axis that we want to advance we just have to add 1, to the address of pixel A to get to pixel B, if we want to advance along Y we have to add to the address of A number of bytes in the colourmap separating A and C. this number is exactly HW_SCREEN_X_SIZE that is length of one horizontal line of pixels in memory. Now, using that, let's try instead of coordinates (x,y) to calculate it's address in the colourmap: void G_line(int x,int y,int x2,int y2,unsigned char colour) { int dx,dy,long_d,short_d; int d,add_dh,add_dl; int inc_xh,inc_yh,inc_xl,inc_yl; register int inc_ah,inc_al; register int i; register unsigned char *adr=G_buffer; dx=x2-x; dy=y2-y; /* ranges */ if(dx<0){dx=-dx; inc_xh=-1; inc_xl=-1;} /* making sure dx and dy >0 */ else { inc_xh=1; inc_xl=1; } /* adjusting increments */ if(dy<0){dy=-dy; inc_yh=-HW_SCREEN_X_SIZE; inc_yl=-HW_SCREEN_X_SIZE;}/* to get to the neighboring */ else { inc_yh= HW_SCREEN_X_SIZE; /* point along Y have to add */ inc_yl= HW_SCREEN_X_SIZE;}/* or subtract this */ if(dx>dy){long_d=dx; short_d=dy; inc_yl=0;}/* long range,&making sure either */ else {long_d=dy; short_d=dx; inc_xl=0;}/* x or y is changed in L case */ inc_ah=inc_xh+inc_yh; inc_al=inc_xl+inc_yl; /* increments for point adress */ adr+=y*HW_SCREEN_X_SIZE+x; /* address of first point */ d=2*short_d-long_d; /* initial value of d */ add_dl=2*short_d; /* d adjustment for H case */ add_dh=2*short_d-2*long_d; /* d adjustment for L case */ for(i=0;i<=long_d;i++) /* for all points in longer range */ { *adr=colour; /* rendering */ if(d>=0){adr+=inc_ah; d+=add_dh;} /* previous point was H type */ else {adr+=inc_al; d+=add_dl;} /* previous point was L type */ } } We can see from this code that although the complexity of the preliminary part of the algorithm went up a bit, the code inside the loop is much shorter and simpler. There is just one thing left to add in order to make it all completely usable - clipping that is making sure our drawing doesn't go outside the physical boundaries of the colourmap, but this is covered in the next chapter. Ambient polygons ---------------- What we have to do is to fill with some colour inside the polygon's edges. The easiest and perhaps fastest way to do it is by using "scan line" algorithm. The idea behind it is to convert a polygon into a collection of usually horizontal lines and then render them one at a time. The lines have to be horizontal only because the most common colourmap would have horizontal lines contingent in memory which automatically allows to render them faster then vertical lines since increment by 1 is usually faster then addition. There is one inherited limitation to the algorithm, because at each Y heights there would be just one horizontal line only concave polygons can be rendered correctly. It can be seen that non concave polygons might need two or more lines sometimes pic 3.7 (that's by the way, in case you wondered, determines the difference in definitions between concave and non-concave polygons): *\ | \ /* y|---\ /-| | \/ | *--------* pic 3.7 How can we represent the collection of horizontal lines? obviously we just need to know at which Y heights it is, X coordinate of the point where it starts and another X coordinate of the point where it ends. So lets have one "start" and one "end" location for every Y possible on the screen. Since every line is limited by two points each belonging to certain edge, we can take one edge at a time find all points belonging to it and for each of them change the "start" and "end" at particular Y heights. So that at the end we would have coordinates of all scan lines in the polygon pic 3.8: start end +-----+---+ 1 2 3 4 5 6 7 8 | 1 | 5 |- - - - - -> *------\ | 2 | 8 | \------------* | 2 | 7 |- - - - - - -> \----------/ | 3 | 7 | \------/ | 4 | 6 |- - - - - - - -> *----* +-----+---+ pic 3.8 Initially we would put the biggest possible value in all locations of "start" array and the smallest possible in the locations of "end" array. (Those are by the way are defined in <limits.h> and it is not a bad practice to use them). Each time we've calculated the point on the edge we have to go to "start" and "end" locations at Y heights and if X of the point less then what we have in "start" write it there. Similar to that if this same X is bigger then value in "end" location we have to put it there too. Because of the way the arrays were initialized first point calculated for every Y height would change both "start" and "end" location. Let's look at the code to find Xs for each edge, so called scanning function: void GI_scan(int *edges,int dimension,int skip) { signed_32_bit cur_v[C_MAX_DIMENSIONS]; /* initial values for Z+ dims */ signed_32_bit inc_v[C_MAX_DIMENSIONS]; /* increment for Z+ dimensions */ signed_32_bit tmp; int dx,dy,long_d,short_d; register int d,add_dh,add_dl; register int inc_xh,inc_yh,inc_xl,inc_yl; int x,y,i,j; int *v1,*v2; /* first and second verteces */ v1=edges; edges+=skip; v2=edges; /* length ints in each */ if(C_line_y_clipping(&v1,&v2,dimension)) /* vertical clipping */ { dx=*v2++; dy=*v2++; /* extracting 2-D coordinates */ x=*v1++; y=*v1++; /* v2/v1 point remaining dim-2 */ dimension-=2; if(y<G_miny) G_miny=y; if(y>G_maxy) G_maxy=y; if(dy<G_miny) G_miny=dy; if(dy>G_maxy) G_maxy=dy; /* updating vertical size */ dx-=x; dy-=y; /* ranges */ if(dx<0){dx=-dx; inc_xh=-1; inc_xl=-1;} /* making sure dx and dy >0 */ else { inc_xh=1; inc_xl=1; } /* adjusting increments */ if(dy<0){dy=-dy; inc_yh=-1; inc_yl=-1;} else { inc_yh=1; inc_yl=1; } if(dx>dy){long_d=dx;short_d=dy;inc_yl=0;} /* long range,&making sure either */ else {long_d=dy;short_d=dx;inc_xl=0;} /* x or y is changed in L case */ d=2*short_d-long_d; /* initial value of d */ add_dl=2*short_d; /* d adjustment for H case */ add_dh=2*short_d-2*long_d; /* d adjustment for L case */ for(i=0;i<dimension;i++) /* for all remaining dimensions */ { tmp=v1[i]; tmp<<=G_P; cur_v[i]=tmp; /* f.p. start value */ tmp=v2[i]-v1[i]; tmp<<=G_P; /* f.p. increment */ if(long_d>0) inc_v[i]=tmp/long_d; /* if needed (non 0 lines) */ } for(i=0;i<=long_d;i++) /* for all points in longer range */ { if(x<G_start[y]) /* further then rightmost */ { G_start[y]=x; /* the begining of scan line */ for(j=0;j<dimension;j++) G_rest_start[j][y]=cur_v[j]; /* all other dimensions */ } if(G_end[y]<x) /* further the leftmost */ { G_end[y]=x; /* the end of scan line */ for(j=0;j<dimension;j++) G_rest_end[j][y]=cur_v[j]; /* and for other dimension */ } if(d>=0){x+=inc_xh;y+=inc_yh;d+=add_dh;} /* previous dot was H type */ else {x+=inc_xl;y+=inc_yl;d+=add_dl;} /* previous dot was L type */ for(j=0;j<dimension;j++) cur_v[j]+=inc_v[j]; /* for all other dimensions */ } } } As you can see this edge scanning function doesn't have much differences with our most basic line rendering code. The only thing we are updating the vertical size of polygon so that after all edges would go through this function we would have the correct vertical dimensions in G_miny and G_maxy. There is another difference: scan function is designed to process, other dimensions, that is it would interpolate X and similar to that other values specified in vertices, finding them for all Ys in range. We would need that when adding interpolative shading feature. As to the ambient polygon rendering, After scanning is over we would just have to render all horizontal scan lines in the range [G_miny,G_maxy]: void G_ambient_polygon(int *edges,int length,unsigned char colour) { int new_edges[G_MAX_TUPLES]; /* verticaly clipped polygon */ int new_length; /* although no edges there yet */ register unsigned char *l_adr,*adr; register int beg,end,i; GI_boarder_array_init(); /* initializing the arrays */ new_length=C_polygon_x_clipping(edges,new_edges,2,length); for(i=0;i<new_length;i++) GI_scan(&new_edges[i*2],2,2); /* Searching polygon boarders */ if(G_miny<=G_maxy) /* nothing to do? */ { l_adr=G_buffer+G_miny*HW_SCREEN_X_SIZE; /* addr of 1st byte of 1st line */ for(; G_miny<=G_maxy; G_miny++, l_adr+=HW_SCREEN_X_SIZE) /* rendering all lines */ { adr=l_adr+(beg=G_start[G_miny]); /* addr of the current line */ end=G_end[G_miny]; /* ends here */ for(;beg<=end;beg++) *adr++=colour; /* rendering single line */ } } } As to the optimization it doesn't appear we can do a lot without leaving comfortable boundaries of C. On the other hand when there's really nothing else we can do to speed things up, when all manuals are read and all friends are out of ideas let's do it, let's go for assembly. I don't think rewriting all functions this way is appealing. Moreover only rewriting the real execution bottleneck would give considerable speed increase. That's, perhaps, one of the reason why modern compilers have such a nice feature as inline assembly, allowing to add low-level code directly into C source. Looking at the code above we can see this line rendering loop. This is an obvious candidate, since it executes fairly often and does quite a lot of iterations, especially compared with surrounding code. Different C compilers have different provision for inline assembly. GNU C compiler has quite powerful syntax allowing to specify assembly code. It is: asm("assembly_command %1,%0":"=constraint" (result) :" constraint" (argument), " constraint" (argument), ... :" clobbered_reg", "clobbered_reg" ... ); The assembly instruction is specified with aliases allowing to add names of C variables into it. it also allows to specify the CPU registers the command, we are inserting, might clobber. It is particularly important if we are expecting to compile with optimizing option or have register variables in the code. We don't want the program trying to use contents of a register we just destroyed by inline assembly instruction. If we are specifying the clobbered registers, on the other hand, compiler would make sure that no useful information is in this registers at the moment inline assembly code starts to execute. WATCOM C compiler has different provision for inline assembly: return_type c_function(parameter1,parametr2, ...); #pragma aux c_fucntion= \ "assembly_command" \ "assembly_command" \ parm [reg] [reg] \ value [reg] \ modify [reg reg]; It is implemented by providing sort of a pseudo c-function which is treated by the compiler pretty much like a macro. "parm" specifies which registers take in parameters from function's argument list, "value"- from what register, if any, the return value is taken and "modify", specifies clobbered registers. for DJGPP compiler code substituting line rendering loop: for(;beg<=end;beg++) *adr++=colour; /* rendering single line */ may look like: asm("movl %0,%%ecx" :: "g" (end):"%ecx"); asm("subl %0,%%ecx" :: "g" (beg):"%ecx"); asm("movl %0,%%edi" :: "g" (adr):"%edi"); asm("movl %0,%%eax" :: "g" (colour):"%eax"); asm("cld"); /* increment when stosb */ asm("rep"); asm("stosb %al,(%edi)"); /* rendering line here */ for WATCOM: void a_loop(unsigned char *adr,int end,int beg,int colour); #pragma aux a_loop= \ "sub ecx,esi" \ "cld" \ "rep stosb" \ parm [edi] [ecx] [esi] [al]; Not a terribly complicated code to write but something giving considerable speed increase. Then again we might have used a C library function memset for filling chunks of memory: void *memset(void *s,int c,size_t n); memset(l_adr,colour,end-beg); /* filling scan line */ This function might have had just that code, we wrote, inside of it. let's not believe assumptions neither pro nor con and go check that out. with DJGPP or, in this matter, most UNIX compilers we can just extract and disassemble memset code from libc.a (standard c library) and see for ourselves. let's go: ar -x libc.a memset.o objdump -d memset.o We would see something like: memset.o: file format coff-go32 Disassembly of section .text: 00000000 <_memset> pushl %ebp 00000001 <_memset+1> movl %esp,%ebp 00000003 <_memset+3> pushl %edi 00000004 <_memset+4> movl 0x8(%ebp),%edi 00000007 <_memset+7> movl 0xc(%ebp),%eax 0000000a <_memset+a> movl 0x10(%ebp),%ecx 0000000d <_memset+d> jcxz 00000011 <_memset+11> 0000000f <_memset+f> repz stosb %al,%es:(%edi) 00000011 <_memset+11> popl %edi 00000012 <_memset+12> movl 0x8(%ebp),%eax 00000015 <_memset+15> leave 00000016 <_memset+16> ret 00000017 <_memset+17> Address 0x18 is out of bounds. Disassembly of section .data: Very similar to what we wrote? Other compilers might have worse library code? it might be so, let's try WATCOM C, let's go: wlib clibs.lib *memset wdisasm memset.obj that's what we would see: Module: memset Group: 'DGROUP' CONST,CONST2,_DATA,_BSS Segment: '_TEXT' BYTE USE32 00000023 bytes 0000 57 memset push edi 0001 55 push ebp 0002 89 e5 mov ebp,esp 0004 8b 45 10 mov eax,+10H[ebp] 0007 8b 4d 14 mov ecx,+14H[ebp] 000a 8b 7d 0c mov edi,+0cH[ebp] 000d 06 push es 000e 1e push ds 000f 07 pop es 0010 57 push edi 0011 88 c4 mov ah,al 0013 d1 e9 shr ecx,1 0015 f2 66 ab repne stosw 0018 11 c9 adc ecx,ecx 001a f2 aa repne stosb 001c 5f pop edi 001d 07 pop es 001e 89 f8 mov eax,edi 0020 c9 leave 0021 5f pop edi 0022 c3 ret No disassembly errors ------------------------------------------------------------ Is it not very similar to what we just did using inline assembly? WATCOM C code is even trying to be smarter then us by storing as much as it can by word store instruction, since every word store in this case is equivalent to two char stores, there should be twice less iterations. Of course there would be some time spent on the function call itself but most likely performance of the inline assembly code we wrote and memset call would be very very similar. It is just yet another example of how easy it might be in some cases to achieve results by very simple means instead of spending sleepless hours in front of the glowing box (unfortunately only sleepless hours teach us how to do things by simple means but that's the topic for another story only marginally dedicated to 3-D graphics). By the way, in the provided source code there is an independent approach to fast memory moves and fills. That is functions to perform those are brought out into hardware interface as: HW_set_char ; HW_set_int; HW_copy_char; HW_copy_int. And their implementation may actually be either of the ways described above depending on the particular platform. Shaded polygons. ---------------- Shaded polygons are used to smooth appearance of faceted models, that is those where we approximate real, curved surfaces by a set of plane polygons. Interpolative or Gouraud shading (Gouraud, 1971), we would discuss, is the easiest to implement, it is also not very expensive in terms of speed. The idea is that we carry a colour intensity value in every vertex of the polygon and linearly interpolate those values to find colour for each pixel inside the polygon. Finally this is the place where implemented N-dimensional scanning procedure comes in handy. Indeed colour intensity can be pretty much handled as just an extra dimension as far as edge scanning and clipping are concerned. *I3 / \ / \ I0* \ \ *I2 \ I / IT1*---*---*IT2 \ / \ / \ / *I1 pic 3.9 We can obtain values on the left and right boarders of a polygon (IT1, IT2 on pic 3.9) by interpolating colour intensity value in every edge during edge scanning, just as "start/end' X values were found for every horizontal line. Afterwards when rendering certain horizontal scan line we can further interpolate "start" and "end" intensities for this line finding colour for pixels belonging to it (I on pic 3.9). Fixed point math is probably a very convenient way to implement this interpolation: void G_shaded_polygon(int *edges,int length) { int new_edges[G_MAX_SHADED_TUPLES]; /* verticaly clipped polygon */ int new_length; register unsigned char *l_adr,*adr; register int beg,end,i; register signed_32_bit cur_c,inc_c; /* current colour and it's inc */ GI_boarder_array_init(); /* initializing the array */ new_length=C_polygon_x_clipping(edges,new_edges,3,length); for(i=0;i<new_length;i++) GI_scan(&new_edges[i*3],3,3); /* Searching polygon boarders */ if(G_miny<=G_maxy) /* nothing to do? */ { l_adr=G_buffer+G_miny*HW_SCREEN_X_SIZE; /* addr of 1st byte of 1st line */ for(; G_miny<=G_maxy; G_miny++, l_adr+=HW_SCREEN_X_SIZE) /* rendering all lines */ { adr=l_adr+(beg=G_start[G_miny]); /* addr of the current line */ end=G_end[G_miny]; /* ends here */ cur_c=G_0_start[G_miny]; /* colour starts with this value */ inc_c=G_0_end[G_miny]-cur_c; if(end>beg) inc_c/=end-beg+1; /* f.p. increment */ for(;beg<=end;beg++) /* render this line */ { *adr++=cur_c>>G_P; /* rendering single point */ cur_c+=inc_c; /* incrementing colour */ } } } } As you see code above looks not that much different from ambient polygon rendering source. There is one small catch. I said that intensities can be handled pretty much as an extra dimension. In fact we can consider shaded polygon on the screen to take 2 space dimensions, and to have one colour dimension. But would all vertexes of this polygon belong to one plane in such 3-D space? If they would not, shading would look rather weird especially when this polygon would start to rotate. The common solution is limit polygons to just triangles. Why? well three points always belong to the same plane in 3-D space. Textured polygons. ------------------ It might seem that we can extend interpolation technique used to calculate colour intensity for shading, onto texture rendering. Just keep texture X and Y in every vertex of the polygon, interpolate those two along edges and then along horizontal lines obtaining this way texture (X,Y) for every pixel to be rendered. This however does not work... Or to be more exact it stops working when perspective projection is used, the reason is simple: perspective transformation is not linear. One may ask, why did it work for interpolative shading, after all we were using perspective transformation all along? The answer is it did not work, but visual aspect of neglecting perspective effect for shading is quite suitable, neglecting it for texture rendering however is not. I believe it has to do with different image frequencies. Just consider the following situation: a rectangular polygon is displayed on the screen so that it's one side is much closer to us then the other one, which is almost disappearing in the infinity: where do you think in the texture map lines A,B,C and D would be mapped from by linear interpolating method? how do you think the texture on the polygon would look like? +\ +--/---/+1 A|--\ A|/ / | B|----\ B|/ | C|----/ C|\ <------ Missed area D|--/ D|\ \ | +/ +--\---\+2 Polygon on the Texture Screen Map pic 3.10 Well it would look like if someone has carved triangular area marked "missed" on the picture above, and connected points 1 and 2 together, real ugly... In less dramatic cases texture rendered this way might look nicer, being perfect when effect of perspective transformation is negligible: all points are at almost the same distance from the viewer, or very far away from the viewing plane. To get always nice looking results we would have to consider what is really happening when texture is being rendered: u^ Texture Space | *T(u,v) +----> o v y ^ | z | U \ /*W(x,y,z)/ V World Space | / \ / | / \ / |/ * O +--------------> x j^ | | *S(i,j) Screen Space | +------->i pic 3.11 Consider three spaces pictured above: the texture space, world space (the one before projection) and finally contents of the screen - screen or image space. We may think of them just as original polygon/texture map in case of texture space, polygon/texture map turned somehow in case of world space and finally same projected onto the screen. If we know mapping of origin and two orthogonal vectors from texture space into the world space, we can express mapping of any point through them: x = Ox + v*Vx + u*Ux y = Oy + v*Vy + u*Uy z = Oz + v*Vz + u*Uz Where Vx...Uz are corresponding components of the respective vectors. Further point in the world space W(x,y,z) is being perspectively transformed into the screen space S(i,j): i = x / z j = y / z (the actual transformation used, would most likely also contain a multiplier - focus, but let's worry about particularities on some later stage). In order to perform mapping, on the other hand, we need u,v texture coordinates as function of screen coordinates i,j x = i * z y = i * z or: Ox + v*Vx + u*Ux = i * [ Oz + v*Vz + u*Uz ] Oy + v*Vy + u*Uy = j * [ Oz + v*Vz + u*Uz ] trying to express u,v through i,j: v*(Vx-i*Vz) + u*(Ux-i*Uz) = i*Oz - Ox v*(Vy-j*Vz) + u*(Uy-j*Uz) = j*Oz - Oy further: (i*Oz - Ox) - u*(Ux - i*Uz) v = ----------------------------- (Vx - i*Vz) (i*Oz - Ox) - v*(Vx - i*Vz) u = ----------------------------- (Ux - i*Uz) and: (i*Oz - Ox) - u*(Ux - i*Uz) ----------------------------- * (Vy - j*Vz) + u*(Uy - j*Uz) = j*Oz - Oy (Vx - i*Vz) (i*Oz - Ox) - v*(Vx - i*Vz) v*(Vy - j*Vz) + ----------------------------- * (Uy - j*Uz) = j*Oz - Oy (Ux - i*Uz) or in nicer form: i*(Vz*Oy-Vy*Oz) + j*(Vx*Oz-Vz*Ox) + (Vy*Ox-Vx*Oy) u = -------------------------------------------------- i*(Vy*Uz-Vz*Uy) + j*(Vz*Ux-Vx*Uz) + (Vx*Uy-Vy*Ux) i*(Uy*Oz-Uz*Oy) + j*(Uz*Ox-Ux*Oz) + (Ux*Oy-Uy*Ox) v = -------------------------------------------------- i*(Vy*Uz-Vz*Uy) + j*(Vz*Ux-Vx*Uz) + (Vx*Uy-Vy*Ux) At this point we have formulas of from where in the texture space the point in the screen space originates. There are several implementational problems, first few are simple, the actual perspective transformation is i=x*focus/z rather then i=x/z, plus, we are performing screen centre translation so that screen space (0,0) is in the middle of the screen and not in the upper left corner. Hence original dependency should have been: i = (x * focus) / z + HW_SCREEN_X_CENTRE j = (y * focus) / z + HW_SCREEN_Y_CENTRE And so finally, so called, reverse mapping formulas have to be amended respectively: ((i-HW_SCREEN_X_CENTRE)/focus) * (Vz* ... u = ------------------------------------------- ... ... Another, a bit difficult part has to do with texture scaling. If the size of texture vectors was picked to be 1 that would assume no scaling, that is unit size along the polygon in the world space would correspond to unit size in the texture space. What we however want to do in a lot of instances is to map smaller texture onto a bigger polygon or the other way around, bigger texture onto a smaller polygon. Let's try scaling the texture space: T *----------> v| * | S | | \ *---------------------> | | \ | v'| * | v T | *-- ---* | | \ ----- = ----- V- - - - - | | | \ | v' S | | \ | |mapping \ | v'*T | |of the \ v = ----- | |polygon \ | S | *---------------* V- - - - - - - - - - -| pic 3.12 now we can put expression for v into direct mapping equations: Vx*T Ux*T x = Ox + v'* ------ + u'* ------ S S ... Vx and Ux multiplied by T in fact are just projections of vectors enclosing the hole texture space, let's call them V'x == Vx*T, U'x == Ux*T etc. V'x U'x x = Ox + v'* ----- + u'* ----- S S ... considering this reverse mapping equations would look like: ((i-HW_SCREEN_X_CENTRE)/focus) * (V'z* ... u'= S * -------------------------------------------- ... ... This would allow us to scale texture on the polygon by changing S multiplier. Another problem has to do with point O - mapping of the origin of the texture space into the world space. The thing is, if the mapping of this point gets onto Z==0 plane mapping equations above no longer hold. Just due to the fact that perspective transformation is having singularity there. In general we deal with this problem of perspective transformation by clipping the polygon against some plane in front of Z==0. And do we really need mapping of exactly O of the texture space? Not necessarily, it may be any point in texture space assuming we change a bit direct mapping equations: x' = Ox + (v'-v'0)*V'x + (u-u0)*U'x ... where (v'0,u'0) are coordinates in the texture space of the point we have a mapping for in the world space that is (Ox,Oy,Oz). And the reverse mapping equations would be then: ((i-HW_SCREEN_X_CENTRE)/focus) * (V'z* ... u'= S * ------------------------------------------- + u'0 ... ... How can we obtain a mapping of, now, any point from the texture space into the world space? If we would associate with every polygon's vertex besides just x,y,z also u,v of where this vertex is in the texture, we can treat that as 5-D polygon and perform clipping on it, obtaining vertexes with coordinates suitable for perspective transformation, hence any of them would be fine for the mapping equations. (How to do clipping on polygons is covered in the next chapter). Last thing, in order to render regular patterns, those repeating themselves, we may want to introduce texture size parameter, and make sure that if we want to access texture outside this size, this access would wrap around to pick a colour from somewhere inside the texture. This can be easily done if texture size is chosen to be some power of 2, for in this case we can perform range checking with just logical "and" operation. However, using above equations for texture mapping appear to be quite expensive, there are few divisions involved per each rendered pixel. How to optimize that? The, perhaps, easiest way is: horizontal line subdivision, that is calculating real texture mapping every N pixels and linearly interpolating in between, this method is used in the implementation below: void G_textured_polygon(int *edges,int length,int *O,int *u,int *v, unsigned char *texture,int log_texture_size, int log_texture_scale ) { int new_edges[G_MAX_SHADED_TUPLES]; /* verticaly clipped polygon */ int new_length; signed_32_bit Vx,Vy,Vz; signed_32_bit Ux,Uy,Uz; /* extracting vectors */ signed_32_bit x0,y0,z0; signed_32_bit ui,uj,uc; signed_32_bit vi,vj,vc; signed_32_bit zi,zj,zc; /* back to texture coeficients */ signed_32_bit v0,u0; signed_32_bit xx,yy,zz,zzz; int xstart,xend; signed_32_bit txstart,tystart; signed_32_bit txend,tyend; signed_32_bit txinc,tyinc; register unsigned char *g_adr,*adr; register int i,x,y; signed_32_bit txtrmasc=(0x1<<(log_texture_size+G_P))-0x1; log_texture_scale+=G_P; GI_boarder_array_init(); /* initializing the array */ new_length=C_polygon_x_clipping(edges,new_edges,4,length); for(i=0;i<new_length;i++) GI_scan(&new_edges[i*4],2,4); /* Searching polygon boarders */ if(G_miny<=G_maxy) /* nothing to do? */ { x0=O[0]; y0=O[1]; z0=O[2]; u0=O[3]<<G_P; v0=O[4]<<G_P; /* world point <-> texture point */ Vx=v[0]; Vy=v[1]; Vz=v[2]; Ux=u[0]; Uy=u[1]; Uz=u[2]; /* extracting vectors */ ui=(Vz*y0)-(Vy*z0); uj=(Vx*z0)-(Vz*x0); uc=(Vy*x0)-(Vx*y0); vi=(Uy*z0)-(Uz*y0); vj=(Uz*x0)-(Ux*z0); vc=(Ux*y0)-(Uy*x0); zi=(Vy*Uz)-(Vz*Uy); zj=(Vz*Ux)-(Vx*Uz); zc=(Vx*Uy)-(Vy*Ux); /* back to texture */ g_adr=G_buffer+G_miny*HW_SCREEN_X_SIZE; /* addr of 1st byte of 1st line */ for(; G_miny<=G_maxy; G_miny++, g_adr+=HW_SCREEN_X_SIZE) /* rendering all lines */ { xstart=G_start[G_miny]; adr=g_adr+xstart; xstart-=HW_SCREEN_X_CENTRE; x=xstart; xend=G_end[G_miny]-HW_SCREEN_X_CENTRE; y=G_miny-HW_SCREEN_Y_CENTRE; xx=((y*uj)>>T_LOG_FOCUS)+uc; yy=((y*vj)>>T_LOG_FOCUS)+vc; zz=((y*zj)>>T_LOG_FOCUS)+zc; /* valid for the hole run */ if(( zzz=zz+((x*zi)>>T_LOG_FOCUS) )!=0) { txend=( (xx+ ((x*ui)>>T_LOG_FOCUS)) <<log_texture_scale )/zzz +u0; tyend=( (yy+ ((x*vi)>>T_LOG_FOCUS)) <<log_texture_scale )/zzz +v0; } for(;xstart<xend;) { x+=G_LINEAR; if(x>xend) x=xend; /* size of linear run */ txstart=txend; tystart=tyend; if(( zzz=zz+((x*zi)>>T_LOG_FOCUS) )!=0) { txend=( (xx+ ((x*ui)>>T_LOG_FOCUS)) <<log_texture_scale )/zzz +u0; tyend=( (yy+ ((x*vi)>>T_LOG_FOCUS)) <<log_texture_scale )/zzz +v0; } length=x-xstart; /* ends here */ if(length!=0) { txinc=(txend-txstart)/length; tyinc=(tyend-tystart)/length; } for(;xstart<x;xstart++) /* linear run */ { txstart&=txtrmasc; tystart&=txtrmasc; /* wrap around */ *adr++=*(texture+((tystart>>G_P)<<log_texture_size)+(txstart>>G_P)); txstart+=txinc; tystart+=tyinc; /* new position in the texture */ } } } } } Other possible speed-up techniques are: area subdivision involving 2-D interpolation, faking texture mapping with polynomials (later has not very much to do with the true mapping equations described here, however) and rendering texture along constant z lines (and not along horizontal line) the advantage gained by the former is that along every constant Z line perspective transformation is linear ( perspective transformation is i=x/z if z==const we can write it as i=coef*x where coef=1/z which is linear of course) The function above might be extended with interpolative shading as well. To do that special consideration has to be given to palette's layout, that is where and how to keep different intensities of the same colour. But once decided coding is quite straightforward. * * * 2-D clipping. --------------- "Don't discount flying pigs before you have good air defense." -- jvh@clinet.fi Screen boundaries clipping. --------------------------- Throughout the last chapter we've assumed that primitives we were rendering are completely within the screen boundaries. This is something we can't really guarantee. The worst thing- if we would try using those functions to render an element outside the screen, they won't much complain and would most likely try accessing memory outside the colourmap's allocated space. And I guess: segmentation fault, core dumped is hardly most graceful way to exit a program. So we don't have another choice but to guarantee rendering algorithms that the element passed is indeed inside the screen. A dot. ------ Clipping looks trivial in case of a dot, we just have to add few comparisons checking if the coordinates are in the screen range and proceed only if that is the case: void G_dot(int x,int y,unsigned char colour) { if( (x>=0)&&(x<HW_SCREEN_X_SIZE) && (y>=0)&&(y<HW_SCREEN_Y_SIZE) ) { G_buffer[y*HW_SCREEN_X_SIZE+x]=colour; } } And of course there if we are always clipping to a rectangular with a diagonal (0,0-something_x,something_y) we can do it with just 2 comparisons instead of 4, just by making sure x and y passed are considered by unsigned comparisons, (negative numbers after all would be thought of as just very big positive). A line. ------- We can extend above method for line and check every time before plotting a dot whether it is within the boundaries. Unfortunately by doing that we push up the complexity of the code within the loop. And moreover our optimized line routine work with addresses of points within the colourmap rather then with the coordinates. What we would have to construct is a function that would take in arbitrary coordinates of some line and return back the coordinates of a line's part which is within the screen boundaries. This process is called clipping. The clipping routine must basically find coordinates of an intersection point between the line and some of the screen edges. * A \\ \ \ \ \I1 \ +-o---------------- Y=0 \| \ I2o \ |\ * B | *C | X=0 pic 4.1 The problem is that we are not sure on what edge the intersection point lies, for example in the picture above line A-B intersects screen at the edge Y==0, whereas line A-C intersects it at the edge X==0. Generally we can avoid thinking where the intersection is located by consecutive clipping a line against first, say, vertical boundaries limits and then horizontal: * A | \ | \oI1 |\ ------+--o--------------- Y=0 * G | I2 \ \ | \ \ | * B *E *H C*---o--*D \ | *F | X=0 pic 4.2 In the example above after first call to clipping routine line A-B would become I1-B, after the second it would assume the final acceptable form I2-B. It can be seen also that some lines going through this clipping method won't actually be clipped at all (E-F) some would be clipped just once (C-D), some twice (A-B) and finally some would be completely clipped, that is, would be totally outside the screen area (G-H). Now, how to find an intersection point? obviously it is not a very complicated case of solving similar triangles: | (x1,y1) * | | \ | | * (xm,ym) | | \ +---|---* (x2,y2) | pic 4.3 y2-ym y2-y1 (y2-y1)*(x2-xm) ------- = ------- => ym = y2 - ----------------- x2-xm x2-x1 x2-x1 but this method involves multiplications and divisions, so beholding to our tradition let's try to avoid it. The alternative method is using binary search: A(-3,1) * | \ | (-1,3)* | o I(0,?) | \ | * B(1,5) | X=0 pic 4.4 Let's see how it works on a simple example: suppose we have to clip a line A(-3,1)-B(1,5) by an edge X=0 , we have to find Y of the intersection point. let's find midpoint of the line A-B: Ax+Bx Ay+By midx=------- midy=------- 2 2 midx=(-3+1)/2=-1 midy=(1+5)/2=3 Now, let's see where the boundary lies with respect to the midpoint? It is still to the right of it, so of two lines A-MidPoint MidPoint-B, edge intersects MidPoint-B. Let's rename MidPoint into A and start the midpoint search over again: midx=(-1+1)/2=0 midy=(3+5)/2=4 Here, the intersection came precisely onto the edge. So midy is the Y coordinate of Intersection point we were looking for. It appears that this method involve a lot of calculations, but they are all very cheap, just an addition and division by two (which is actually a 1 bit right shift). Practice shows binary search works indeed faster then calculation resulting from solving similar triangles. When dealing with divisions performed by shits however, one has to be aware of truncation problems that might arise. Since -1>>1=-1 that means that if we would try to use algorithm described above to clipp (-1,y1)-(0,y2) line by X==0 boundary chances are we would loop forever, at each iteration finding that -1>>1 is still -1. (and O(for ever) is hardly, the time complexity contributing towards high frame-rate). As in the code below this situation should be thought of. Let's create a function which would perform clipping against two vertical screen edges: that is C_X_CLIPPING_MIN and C_X_CLIPPING_MAX. int C_line_x_clipping(int **vertex1,int **vertex2,int dimension) { register int i; register int whereto; register int *l,*r,*m,*t; /* left right and midle and tmp */ static int g_store0[C_MAX_DIMENSIONS]; /* static stores for clipped vxes */ static int g_store1[C_MAX_DIMENSIONS]; static int g_store2[C_MAX_DIMENSIONS]; static int g_store3[C_MAX_DIMENSIONS]; static int g_store4[C_MAX_DIMENSIONS]; static int g_store5[C_MAX_DIMENSIONS]; int **vmn,**vmx; /* so that *vmn[0] < *vmx[0] */ int swap; /* we coordinates swaped? */ C_2D_clipping=0; /* default no clipping yet */ if((*vertex1)[0]<(*vertex2)[0]) { swap=0; vmn=vertex1; vmx=vertex2; } /* so that *vmn[0] < *vmx[0] */ else { swap=1; vmn=vertex2; vmx=vertex1; } if(((*vmn)[0]>C_X_CLIPPING_MAX)||((*vmx)[0]<C_X_CLIPPING_MIN)) return(0); else { if((*vmn)[0]<=C_X_CLIPPING_MIN) /* clipping */ { HW_copy_int(*vmn,l=g_store0,dimension); /* copying old vertixes */ HW_copy_int(*vmx,m=g_store1,dimension); r=g_store2; /* let middle be there */ whereto=0; while(m[0]!=C_X_CLIPPING_MIN) { if(whereto==1) { t=l; l=m; m=t; } else { t=r; r=m; m=t; } for(i=0;i<dimension;i++) m[i]=(l[i]+r[i])>>1; whereto=m[0]<C_X_CLIPPING_MIN; } *vmn=m; /* that is why m[] is static */ C_2D_clipping=swap^1; } if((*vmx)[0]>=C_X_CLIPPING_MAX) /* clipping */ { HW_copy_int(*vmn,l=g_store3,dimension); /* copying old vertixes */ HW_copy_int(*vmx,m=g_store4,dimension); r=g_store5; /* let middle be here */ whereto=0; while(m[0]!=C_X_CLIPPING_MAX) { if(whereto==1) { t=l; l=m; m=t; } else { t=r; r=m; m=t; } for(i=0;i<dimension;i++) m[i]=(l[i]+r[i])>>1; whereto=m[0]<C_X_CLIPPING_MAX; } *vmx=m; /* that is why m[] is static */ C_2D_clipping|=swap&1; } } return(1); /* partialy or not clipped */ } The return value of this function is zero if line is completely clipped otherwise it is 1. The purpose of global C_2D_clipping variable would become clear when we would be talking about polygon clipping. As to the Y clipping we can either modify the above code to do both functions or just duplicate most of the code changing few things, like constants names ( C_Y_CLIP... instead of C_X_CLIPP...) and indexes of clipped variable ( 1 instead of 0). As you can see this code is generic enough to allow clipping of N-dimensional lines ( so that to allow for X Y Z Colour Intensity etc. to be processed at the same time). To do it effectively static arrays are being set to keep temporary left (l) middle (m) and right (r) N-tuples. Whenever making decision which segment to take for the next iteration we can just swap pointers, so that array keeping coordinate of segment being discarded could be set to accept middle point at the next iteration. <----------- m l -----------> r | | | v v v +---------+ +---------+ +---------+ |x,y,z... | |x,y,z... | |x,y,z... | +---------+ +---------+ +---------+ pic 4.5 This is being done so that at every iteration only pointers to actual data are moved not the data itself. (The point I am trying to make: (and I am sure everybody knows that, just a bit of reinforcement) it's ok to physically copy small amounts of data, but when we have a lot of it, it is better to move pointers instead) Let's insert calls to clipping function into the G_line routine: ... ... v1=vertex1; v2=vertex2; if(C_line_x_clipping(&v1,&v2,2)) /* horizontal clipping */ { if(C_line_y_clipping(&v1,&v2,2)) /* vertical clipping */ { dx=v2[0]-v1[0]; dy=v2[1]-v1[1]; /* ranges */ ... ... So, whenever line is completely clipped we wouldn't go any further in the rasterization function. A polygon. ---------- Now, remembering that we render polygon by scanning it's edges using rasterization function pretty much like the one above, we may think that adding clipping calls into GI_scan would solve our problem with polygon clipping, unfortunately, it is only half true (literary half true). Lets consider pictures pic 4.6 and 4.7: B * B | *==== I/ |/==== -----*------ I*====== /====== /|?????? /======== / |????? /======== A* |?????? A*========== pic 4.6 pic 4.7 In the cases above edge A-B contribute to the left side of the polygon, clipping present no problem for case on pic 4.7. Clipped part I-B of the edge can be discarded since there's nothing to form outside the screen. On the other hand in the case on pic 4.6 we can't simply discard clipped part A-I since we would loose left boundary of all horizontal lines marked "???". The observation we can make is that polygon, being scanned edge at a time, may not be clipped against horizontal boundaries but must be clipped against vertical, this way the code within GI_scan would look like: ... ... v1=edges; edges+=skip; v2=edges; /* length ints in each */ if(C_line_y_clipping(&v1,&v2,dimension)) /* vertical clipping */ { dx=*v2++; dy=*v2++; /* extracting 2-D coordinates */ x=*v1++; y=*v1++; /* v2/v1 point remaining dim-2 */ ... ... Creating a vertically clipped polygon on the other hand is a bit more complicated. The problem is that the clipped polygon may have different from the original one number of edges. let's consider the situation on the pic 4.8: /* 3 | | / | |5' |/ | 5*-*-----*6 * | / | \ /|2''| 4* | *1 2 * | | \ | / \| | 3*-*-----*2 *\ | |3' |2'\* 1 pic 4.8 pic 4.9 It can be seen that original polygon has 6 edges which becomes 5 after clipping. Some other polygon pic 4.9 can have 1 more edge after clipping. It is obvious that we would have to create a temporary structure to hold the clipped polygon. Now, we know how to clip a single edge, let's try to find pattern how clipped edges compose clipped polygon. Let's be considering clipping against a single boundary, say X==0. it is obvious that if an edge is completely outside it's vertexes won't be present in the new polygon. On the other hand if an edge is within the boundary both points would be present. Any point on the intersection line would be present too. One more consideration is that each point in the polygon belong to two edges, so when the edge is not clipped we may put into the new structure just the second point assuming that the first one is already taken care of when we were processing previous edge. Let's list the patterns: (1) If edge is not clipped or the second point is clipped put into new structure just the second point; (2) When first point is changed or both are changed put both; (3) When both are outside put none. Surprisingly enough this algorithm doesn't have to be changed when we consider simultaneous clipping against two parallel boundaries: | *-----*1| |/5 \| 4'* *2' /| |\ 4* | | \ | | | \ *-*---------*---*2 3 |3' |2'' | | pic 4.10 edge 1-2 Second point clipped - pattern (1) putting in 2' edge 2-3 Both points changed - pattern (2) putting in 2'' and 3' edge 3-4 Both points outside - pattern (3) ignoring edge 4-5 First point clipped - pattern (2) putting in 4' and 5 edge 5-1 No clipping - pattern (1) putting in 1 The result is: 2'-2''-3'-4'-5-1 just looking at the picture assures us that what we got is indeed right. Now finally the purpose of C_2D_clipping variable being set in C_line_x_clipping must be clear. It would be set to 1 whenever first or both points were changed, and would be 0 if none or just second one were changed. Knowing this all let's write some code: int C_polygon_x_clipping(register int *from,register int *to, int dimension,int length ) { register int i; int *v1,*v2,new_lng=0; int *first_vrtx=to; /* begining of the source */ for(i=0;i<length;i++) /* for all edges */ { v1=from; from+=dimension; v2=from; /* taking two vertexes */ if(C_line_x_clipping(&v1,&v2,dimension)) /* clipping */ { if(C_2D_clipping) /* depends which one was clipped */ { HW_copy_int(v1,to,dimension); to+=dimension; HW_copy_int(v2,to,dimension); to+=dimension; new_lng+=2; /* first point clipped */ } else { HW_copy_int(v2,to,dimension); to+=dimension; new_lng++; /* second point clipped */ } } } HW_copy_int(first_vrtx,to,dimension); /* looping the polygon vertexes */ return(new_lng); } Again the way we represent the polygon the very first point has to be the last too since every point belong to two edges, and we can have a pointer to a new edge by just advancing it by "dimension" of a single vertex in the list of polygon vertices. View plane clipping. -------------------- As discussed before if we want to do perspective transformation we have to make sure we are actually applying it to something that we know produces a valid result, hence there should be no points with negative or zero Z coordinates. Zero would produce divide errors, Negative once would appear flipped over. There is another good reason to clip out vertices with negative Z, we just can't see things which are behind us (at least majority of us can't). The technique we would be using is just an expansion of 2-D clipping that was used to make sure 2-D primitives fit physical screen before rendering. As before, we can find the intersection point from solving similar triangles, or using binary-search techniques. The single clipping plane would be specified to be somewhere in front of Z==0. (And no, not quite as far as "focus" distance used in perspective transformation, viewing plane is not necessarily a clipping plane). The function to clip polygons would be almost identical too. (By the way, we can indeed try writing generic line and polygon clipping functions performing it for both 2D screen boundaries and view plane). Volume clipping. ---------------- First of all what view volume is? This is the set of all points in the "world" space which can appear on screen under certain projection method. For example if we are using just the parallel projection (let's forget for a second about perspective) we may see that only points from depicted below rectangular area would appear on the screen. | | | | | | * B | * C | Screen | | | boundaries | v | -----I--------*-------I----- Viewing plane. * A pic 4.11 Point A can't appear on screen as it is behind the viewing plane - it would be clipped by the algorithm described above in this chapter. point B on the other hand is in front of the viewing plane, so it would pass the first test but since it's projection onto the viewing plane won't fit the screen it would be clipped by 2-D screen boundaries clipping routines. Point C is lucky to be inside the view volume. The conclusion here is that by doing first viewing plane clipping and then screen boundaries clipping we actually performed volume clipping, that is we selected from space the set of all points that can be projected and then did actual projection (Yes, by passing to the rendering routines just X and Y of points and discarding Z we basically did parallel projection). Should this be changed somehow to accommodate perspective projection? after all by clipping against view plane we guarantee that there would be no points with Z==0. However the problem is that even having points close to the viewing plane is not very good. pic 4.12 *---> *-------------> *--------------------------> A*- - -> -----I-----I----- pic 4.12 By applying perspective transformation we increase the absolute values of coordinate components depending on inverse of it's distance to the viewer, so if Z==0 transforms into infinity numbers close to that transform into something bitsize of numbers stored in computers might not be able to handle, and no, we can't quite solve it by moving the clipping plane slightly forward, since there are still those nasty points which are slightly in front of the viewing plane but already have big absolute value of X or Y. (pic 4.12, point A). The overflow problem that may result from perspectively transforming this point is this: positive values may become negative, and if it would happen to just one point of two off-screen points representing a line we may actually see this line all across the screen instead of not seeing it at all. The conclusion when using perspective transformation, it is better to apply it to the points we know would produce a valid result. And since what we ultimately want is to project to the screen we are coming back to the view volumes since those are exactly sets of points that would be projected inside the screen. Hence we do need to consider actual viewing volume for perspective projection. What the view volume for perspective projection would look like? The way we modeled this transformation - rays of all visible points converge in viewer's eye. If we would cast back into space lines connecting the eye and the screen boundaries, we would obtain the pyramid marking points from space that would project onto screen edges. what's inside the pyramid would project inside the screen what's outside - outside the screen. adding the clipping plane somewhere in front of the viewer we are obtaining the view-volume for perspective projection, pic 4.13. \ / \ / \ / \ / \ / -----I\---/I----- \ / * pic 4.13 The only problem - we now have to clip against planes which are directed almost arbitrary in space (the exact geometry of this view volume depends on the "focus" parameter - the distance between the viewer and the viewing plane (again, not quite the same as clipping plane). Although conceptually easy to achieve clipping against arbitrary directed plane would have higher complexity. There is number of solutions one can consider: obvious one: indeed implement real volume clipping, although expansive we would be able to completely get rid of 2-D clipping routines which overall might be quite descent. Second: implement volume clipping with special kind of perspective view volume, the one having 90' angle. The reason can be seen from the pic 4.14: \ / \ / x=-z \ / x=z \ / ----I\-----/I----- \ / * pic 4.14 Planes composing this perspective view volume have pretty simple equations: x==z , y==z, x==-z and y==-z clipping against those is way easier then against arbitrary directed planes. The method however we would discuss is a bit different, that is, we would not do clipping at all, to be more exact we would only throw away polygons which are for sure outside the view volume and leave those even partially inside to be further clipped against screen boundaries after being projected. ( Clipping against viewing plane, is a sacred matter that one can hardly get rid of ) One should understand that this method works on assumption that there is no terribly big polygons around, since a one part of a really big polygon can be within the view volume whereas another can be both close to viewing plane and have really big absolute value of X or Y, just something we are trying to exclude from being perspectively projected. How can one figure out whether a point is outside the pyramid with 90' angle? From the pic 4.14 we can see that parts of the space separated by Z==X and Z==-X planes have obvious relationships among X and Z of every point, if Z<X or Z<-X point is for sure outside this area. it should be realized we can't quite extend this scheme onto polygons by saying if all points of a polygon are outside it is outside also, example is a polygon AB on the pic 4.15, it is clear that although both points composing it are outside there supposed to be some part of this polygon inside the view volume. ^ Z | \ | / \ Z>-X | Z>X / \ | / A *---------------------* B \ | / Z<-X \ / Z<X --------------*-----------------> X pic 4.15 We would make decisions, on the other hand, using notion of polygon's extends. pic 4.16 Extend is a cube completely enclosing within itself certain polygon. So coordinates of extend planes are that of maximum and minimum among the polygon vertices along all axes. xmin xmax -------------- ymin |\\ | | \ \ | | \ \ | | \ /| | \/ | |------------ ymax pic 4.16 This way by considering for example (xmin,zmax) point of the extend box we can make a decision whether polygon's extend is outside the x=z plane. ^ Z | \ | / \ | / (xmax,zmax) \ | / (xmin,zmax) ----+ \ | / +----+ | \ | / | --+ \ / +--- --------------*-----------------> X +-------+ (zmax) | | pic 4.17 Let's list all the other cases: xmin > zmax ymin > zmax -xmax > zmax -ymax > zmax On the same stage we can check if the polygon is completely behind the view plane as well. int C_volume_clipping(register int *vxes,int number) { int xmin,ymin,zmin,xmax,ymax,zmax; int i; ymin=xmin=zmin=INT_MAX; ymax=xmax=zmax=INT_MIN; /* initializing searching */ for(i=0;i<number;i++) { if(*vxes>xmax) xmax=*vxes; if(*vxes<xmin) xmin=*vxes; vxes++; if(*vxes>ymax) ymax=*vxes; if(*vxes<ymin) ymin=*vxes; vxes++; if(*vxes>zmax) zmax=*vxes; if(*vxes<zmin) zmin=*vxes; /* searching max/min */ vxes++; } if((zmax<xmin)||(zmax<ymin)||(zmax<-xmax)|| (zmax<-ymax)||(zmax<C_plane)) return(0); else if(zmin<C_plane) return(-1); /* behind clipping plane */ else return(1); } It should be realized that, extend testing is approximate, there might be polygons outside the view volume whose extends are partly inside, but since we are doing clipping to screen boundaries afterwards they would eventually be taken care of. Besides, clipping view volume is a pyramid with 90' angle, real view volume on the other hand depends on the perspective transformation formula, and would likely have angle less then 90'. And again, there should be no very big polygons, since we are doing "full element" volume clipping. * * *