Groovy Bytes: Behind the Moon

home *** CD-ROM | disk | FTP | other *** search

/ Groovy Bytes: Behind the Moon / groovybytes.iso / GROOVY / MISC / TEXTE / DEMO081.ZIP / DN_2OF2.081 < prev next >

Wrap

Text File | 1995-01-29 | 26.1 KB | 626 lines

DemoNews.081.continued.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.part.2.of.2 SECTIONS ARTICLES ---------------- ----------------------------------- Code Assembly Part 3 (It ain't no party) BSP Trees Back Issues How to Get 'em, Descriptions Closing Comments Quote for the Week, etc. .,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,., _____Assembly Part 3 by Jason Nunn \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \\\\\\\\[ "Implementation Techniques" - Assembly Part III \\\\\\\[[ By Jason Nunn \\\\\[[[[ \\[[[[[[[ ____________________________________________________________________ In this issue I will be discussing the basic nuts and bolts on how to implement an assembly program geared towards a demo. This article is intended for the C or Pascal programmer who hasn't quite got the confidence to use a full blown assembly compiler. In the first part, you may remember me telling you of a friend that has reached a turning point in his coding development, yet he still won't "take the plunge" because he doesn't have access to all the nice perks of 3GL's, like sine, cosine, and random functions and larger precision variables than the chip itself. This issue will hopefully provide that incentive. But...., before we do that, I would like to first finish off last article's talk on optimization. I forgot to include the fast string functions. I'm not going to waffle on too much about this now, as this article is dedicated to "assembly techniques". All I will do here is show my results, and give out some tips. Same things apply on this run - my machine is a 486-33 ISA, operating in P-mode, and the lower the number, the faster a given instruction is. STOSB [209729] MOV [EDI],AL [134226] INC EDI STOSD [306805] MOV [EDI],EAX [244208] INC EDI LODSB [209729] MOV AL,[ESI] [134226] INC ESI STOSD [306805] MOV EAX,[ESI] [244208] INC ESI MOVSB [228550] MOV AL,[ESI] [142124] MOV [EDI],AL INC ESI INC EDI MOVSD [384379] MOV EAX,[ESI] [324112] MOV [EDI],EAX INC ESI INC EDI Of course, REP operations are faster than their equivalent by about half. In general, it is better to use MOV's and INC's to perform one off operations. So if you're coding with these instructions, chances are that you can get a bit more speed out of your code. Ok, now on with the main talk. Generally, I won't be going into great depth as there are plenty of tutorials and manuals on the net that explain the rank basics of assembly. My role here will be to highlight and familiarize extrordinary things about coding methods of assembly. If you've never coded in assembly, then it may pay you to write your equivalent program in a 3GL first and before converting it over. I guess it depends on the person. I prefer to implement idea's in straight assembler. I'm comfortable with the language enough to mumble it in my sleep and their are no barriers or contingencies like there are in 3GL code. You can also run into serious problems when converting to your target language, but I'm sure that there are as many negative points about doing this as they are positive points. How to crunch huge numbers -------------------------- One of the first questions a new demo coder may ask is how he/she could add, multiply or subtract a number that is larger than the precision of the chip. Well, this really doesn't apply now, as the standard is 32 bits. This is ample for nearly all calculations, but for those of you that may want to perform a 64 bit ADD calculation, this is how you do it: ADD EAX,ECX ADC EDX,0 In this example, we don't have a 64 bit register, therefore we must make two data sources, whether they be registers or memory references to act as one large register. In our case, EDX and EAX act as one. EAX contains the least significant data and the EDX contains the most significant data of our 64 bit number. ECX contains the number we are adding to this 64 bit concatenated register. The basic idea behind this is that we first add ECX to EAX. If the number in the EAX register "clocks" then the CPU's carry flag will be set. The next instruction - ADC (for those of you that don't know) is a funny sort of ADD instruction that performs two add instructions. It will first add the source register to the destination register, and then add 1 to the source register if the carry is set. Hence the name "ADD ON CARRY". In our example, if the carry flag is set, we will only add in the carry flag as the source value is zero. Therefore, if the least significant component of our 64 bit variable (EAX) clocks, the it will carry over to the EDX component. Although the above example only adds a 32 number to the 64 bit number. If you wanted to add a 64 bit number to a 64 number then you would adopt the following: ADD EAX,ECX ADC EDX,0 ADD EDX,EBX Where EDX:EAX is the destination 64 register, and EBX:ECX is the source register. To add larger precision's, we simply chain!. Here we are adding a 32 bit number that resides in EAX to a 128 bit number which is stored in EDX,EBX,ECX and ESI. ADD EDX,EAX ADC EBX,0 ADC ECX,0 ADC ESI,0 To subtract, the same principle applies, accept we use SUB and SBB instructions: (a) (b) SUB EAX,ECX SUB EAX,ECX SBB EDX,0 SBB EDX SUB EDX,EBX With the 486's math coprocessor, the large multiplication and division is more viable than our old conventional way of calculating large numbers; which as you will see and very slow. Pretty soon, I will be exclusively using coprocessor calculations in my demos, as they are extremely popular now. Hence rendering the following code (for me) obsolete. However, for names sake, I'll discuss the old way of doing things... For multiplying a 64 bit variable to a 32 bit variable you can use this algorithm: MOV EAX,ESI MUL EBX PUSH EAX EDX MOV EAX,ESI MUL ECX POP ECX EBX ADD ECX,EAX As a formula, the code is equivalent to this: ECX:EBX = ECX:EBX*ESI. Note that you can chain this one also by taking the EDX value from the second MUL and multiplying it by the next significant register of the source and adding that answer into the respective register of the destination. Dividing is a little bit more complex. How complex?...this complex: PROC LONG_DIV OR EBP,EBX JZ @@jump_0599 PUSH EBP MOV EBP,ECX OR EBX,EBX PUSHF JNS @@jump_0548 NOT ECX NOT EBX ADD ECX,01 ADC EBX,00 @@jump_0548: OR EDX,EDX PUSHF JNS @@jump_0557 NOT EAX NOT EDX ADD EAX,01 ADC EDX,00 @@jump_0557: MOV ESI,ECX MOV EDI,EBX XOR ECX,ECX XOR EBX,EBX MOV EBP,0021h @@jump_0562: RCL ECX,1 RCL EBX,1 SUB ECX,ESI SBB EBX,EDI JNB @@jump_0570 ADD ECX,ESI ADC EBX,EDI @@jump_0570: CMC RCL EAX,1 RCL EDX,1 DEC EBP JNZ @@jump_0562 POPF JNS @@jump_058A NOT ECX NOT EBX ADD ECX,01 ADC EBX,00 POPF JNS @@jump_058D JMP @@jump_0597 @@jump_058A: POPF JNS @@jump_0597 @@jump_058D: NOT EAX NOT EDX ADD EAX,0001 ADC EDX,00 @@jump_0597: POP EBP @@jump_0599: RET ENDP This formula divides EDX:EAX by EBX:ECX. Just in case anybody recognizes this thing, I've reversed it from a certain popular commercial package (not giving any names) hehe :). I havn't used it since my real mode days (which, for the record is about 2 years ago when coding TC669), and it's basically optimized for that. I've made no attempt to optimize it for P-mode, as I most likely will never use it ever again. How to implement a Decimal point (or rather - a hexadecimal point :) -------------------------------------------------------------------- Now that we have discussed the ways in which we can do a whole range of calculations, your next question is how to implement floating/none discrete calculations. For that, we must take a register/memory unit and divide it into two parts. The number and a mantissa. For the sake of efficiency, you would typically contain this in a single register, namely a 32 bit register. I usually use this type of construct (represented in binary): /------------32 bits------------\ NNNNNNNNNNNNNNNN.MMMMMMMMMMMMMMMM Here you have a 16 bit actual number, with a 16 bit mantissa. As you can see the actual number is of a higher order than normal. If to wanted to extract the number from this variable, you can simply perform a SHR 16. This will arrive you at the "NNN...." component of the number. Here is an example of 1 and a half: 0000000000000001 1000000000000000b If we wanted the discrete part of the number (ie the "1" part), then just perform a SHR 16, which arrives us at: 0000000000000001. As you can see, there is no real difference between discrete and non-discrete variables. To the machine, it's all the same thing. The difference is the way you interpret the product. Calculations are still no different to normal numbers. If we wanted to add a "half" to this number then it's as simple that this: MOV EAX,00000000000000010000000000000000b ; this is a decimal "1" ADD EAX,00000000000000001000000000000000b ;this is a decimal "0.5" So, as you can see, it's not very hard. For multiplication, you're going to have to include a SHRD instruction, as the number will now be in EDX and the mantissa in EAX, hence the precision is now larger. This will return the number back to the EAX 32 bit precision that it should be. Here is an example: MUL ECX SHRD EDX,EAX,16 Here, we multiply EAX by ECX, which arrives at EDX:EAX. Then we just step down this answer to arrive at the result, witch will now be contained in EAX. With division, it's the opposite: MOV EDX,0 SHLD EDX,EAX,16 DIV ECX Here we are dividing EAX by ECX. Note the preparation just before the divide. Signed Data ----------- As of now, we have only discussed unsigned data. Generally speaking, these calculations are very simular, but there are some major differences. Contained in a given 32 register, unsigned numbers go from 0 to FFFFFFFFh, where as 32 signed data range from 80000000h which is the lowest number and 7FFFFFFFh begin the highest number. When using signed data, there are only a couple of extra things you must know. Signed data has its own multiplication and division instructions (ie IMUL and IDIV), and its own set of conditional jump instructions. JL (jump if less than) and JLE (jump if less than or equal to) are a signed equivalent to JB (jump if below) and JBE (jump if below or equal to). JG (jump if greater) and JGE (jump if greater than or equal to) are the signed equivalent to JA (jump if above) and JAE (jump if above or equal to). To change our unsigned divider from this.... MOV EDX,0 SHLD EDX,EAX,16 DIV ECX ...To a signed divider, simply substitute the MOV EDX,0 with a CDQ. The CDQ extends a signed number in EAX into EDX. Example given: CDQ SHLD EDX,EAX,16 IDIV ECX Implementing Complex mathematical relationships ----------------------------------------------- At one time or another, a coder is going to have to use some sort of complex mathematical function like triangle ratios, logarithmic factors and random numbers to implement various things. To create a function that maps a relationship in real time is basically impossible in efficiently terms. The only way you can do this is to store relationships in the form of tables. This may not be apparent to users of compilers like turbo C etc but electronic calculators, compliers, maths coprocessors, spreadsheets all use this method of mapping these relationships. it a very fast a convenient way of doing things. The first common function is the random function. A random signal can be achieved using the following algorithm. The product of this function is a random number stored in the EAX register. ;input: NIL; output: EAX proc random mov ebx,[random_seed1] lea ebx,[ebx*4] mov eax,[ebx+@@rantable] mov ebx,[random_seed2] lea ebx,[ebx*4] add eax,[ebx+@@rantable] mov [ebx+@@rantable],eax inc [byte random_seed1] and [byte random_seed1],01111b dec [byte random_seed2] and [byte random_seed2],01111b ret random_seed1 dd 2 random_seed2 dd 13 @@rantable: dd 0fd8fce7ah,02d7ad7b7h,0f48a8f3ab,04a3b8f8bh dd 0f2dec542h,0a847fab7h,0f4da81aab,04a348f86h dd 024547edah,03b535a43h,0b35a535ab,0aa333483h dd 0fd2f4e7ah,0c525a5b7h,016d3b4a4b,0643b4fd3h endp If you expand the table to 256 entries then you could eliminate two instructions, but there again, it's not worth doing. This random function will give you a very random signal :). There is only one problem with this algorithm, and that is, the randomness will always follow the same pattern. If this feature undesirable, then you may like to make an initiation module that jumbles up the seeds or the numbers a bit. An obvious way of randomly choosing a seed, would be to store a fixed reference variable in memory. For example: proc randomise mov al,[043253445h] mov [byte random_seed1],al mov al,[012345678h] mov [byte random_seed2],al ret endp Anyway, I'm going to stop here as it's getting very close the deadline time. One day, I'll learn not to leave things till last minute. In the next part, I'll be hopefully finishing up this assembly series and moving on to my talks of sound/tracker programming (the interesting stuff). I'll be soon releasing a tracker that I have written called FunkTracker. With this will be the full source code listing. My discussions will be based around my knowledge and experience when producing current and past trackers and players, and discussing implementation and hardware issues. I also plan to discuss reverse engineering using microsoft CodeView, and plan to obtain hack docs on the AWE32 card. So this will be all coming up!. until next time. See ya :Jason Nunn _____BSP Trees by Tom Verbeure Problem situation: sorting polygons is slow and can be incorrect for certain view-angles. Heavily influenced by Computer Graphics, Principles and Practice, I have written this small tutorial for BSP trees, which solves the problem for static objects and for every view angle. As I already said: Binary Space Partitioning Tree. Unlike many other abbreviations, this one really explains a lot of the algorithm: it uses a tree. It partitions space and it partitions in two parts. First: it is only usefull in static scenes: no 3D morphing or other goodies are allowed. Let's go to the 2D case, 3D is exactly the same. Take a sample scene: A\ ----- C| \ B E | ------ | | / . / V D/ The positive side of the polygons is the side with the defining character... Ignore V for now. One could sort this thing during rendering, but as there can be no correct sort criterium and sorting is slow, we don't want that. Besides, we have memory to spare :-) Now, we're going to build a tree that is totally viewpoint independent: Take polygon B as the root. Polygon B divides space in to parts: the positive and the negative side (Geee!) We have partitoned space in two. First, scrap B from the 'not-used' polygons-array and classify the remaining polygons. Group those on the + side, and those on the - side. As you can see, polygon C is both on the + and the - side. What to do? Create 2 new polygons C+ and C-, erase C. Is there another complainer ? Nope: all poly's are on either the + or the - side. Now we have this situation: B / \ A,C+,E D,C- Not really a tree yet, but we've only started... Now, do the same thing for the groups at the child nodes, without caring about those in another child-node. For the + side of B, we have polys A,C+ and E. Take A as next node polygon. Neither C+ nor E are on it's negative side (we ignore D and C-). For the other node, take D as next node polygon. Only C- remains there, and it is on the negative side. That side of the tree is finished. We have the following situation: B / \ / \ A D \ \ C+,E C- There's one child with more that one poly left. Take C+ as node polygon, E become it's child, on the positive side. We're finished. Situation: B / \ / \ A D \ \ C+ C- / E Now, what can we do with it? A lot... Suppose the viewpoint is at position V. In which order do we have to sort the polygons, when using a back to front rendering algorithm ? Answer: walk the tree, make sure all nodes (including childs) are visited. Start at the root. Is V on the positive side? Nope, well, we want the polys far away first, so walk the positive way. Are we on the positive side of A? Yep, walk the negative way. It is empty! Ah. Well, draw A first. The go the positive way. Are we positive of C+? Yep. Negative way of C+ is empty. Draw C+ poly. Go positive way of C+. E has no child, draw it. Go up until a non-empty branch is found, draw all node polygon not drawn already. We now arrive at B again. Draw it. Negative is not visited yet, walk it. We're negative of D. Positive way is empty. Draw D and go negative. C- has no child. Draw it. All nodes have been visited. The end. We have drawn the polygons in following order: A, C+, E, B, D, C- which is a correct order. The BSP tree has to be constructed only once and for all. From then on, sorting the polygons is always correct and in linear time. Standard sorting algorithms can be proved to be of n*log(n) order of time, so we have an increase in speed as well. Disadvantages: - Memory: one has to have the tree in memory. This can be substantial for lots of polygons. - Polygon splitting: one ends up with more split polygons. It is almost always unavoidable to do splitting. - Polygons are not allowed to move. A BSP tree is NOT unique: just pick another polygon as a node and one gets a different one. In this case, one can avoid splitting polygons: start with a root and build the following, correct, BSP tree: C / B / \ A D / E Tadaam! No polygon splitting! Building a tree with a few splitting as possible is an exponential of the number of polygons. As Foley and Van Dam says, just try a limited number of nodepolygons, pick the one with the least splitting and the tree will be good enough. Voila. That's it. Not too difficult I think. Notice BSP trees are also usefull to sort objects, by using planes that divide the space in such a way that Object A is on the negative and Object B is on the positive side of the plane. Very useful (only for non-intersecting objects). This text is written without the Bible (Computer Graphics, P&P) besides me, but since I read their chapter about BSP trees many times, it contains almost the same info. For polygon splitting algorithmes, there is one in Graphics Gems. I don't know which one, but buy all four books, you won't be disappointed... :-) Tom Verbeure Synergy Design .,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,., <<Back Issues>> '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' _____How to Get 'em After reading this issue of DemoNews, you may be wondering how you can get previous ones. Well fear not! There are two different ways to do so: 1: FTP to hornet.eng.ufl.edu and go to /pub/msdos/demos/news/OLD_NEWS and start downloading anything you see. 2: Now you can request back issues of DemoNews via e-mail. Start a letter to listserver@oliver.sun.ac.za (any subject line) and in the body of the letter include "get demuan-list <index>" where INDEX refers to the index number of the issue. For example: get demuan-list 43 This would retrieve DemoNews #76 (part 1 of 2). For more recent issues that are split into multiple parts, you must send an individual request for each index number. _____Descriptions Issue Index Date Size Description ----- ----- -------- ------ ---------------------------------------------- 75 41,42 12/18/94 68009 A DemoNews Reader, The Birth of Commercial Life, Editorial: Calm Before the Storm, Interview with Mello-D, US Demo Scene (Renaissance meeting), Jelly Tots and Pizza Shops, Review of Wired '94 Graphics. 76 43,44 12/25/94 92589 Interview with EMF, DemoNews Readers Write, Kimba's Life Story, X-Mas in the Demo Scene, CORE, Demo & Music Database, Interview with Purple Motion/Future Crew, Interview with Krystall/Astek, Common Sense ][ by Perisoft, Its X-Mas in Africa, Interview with Maxwood of Majic 12, Assembly Part ][, Common Sense Response by Stony. 77 45,46 01/01/95 101100 Chart History, Snowman Near-Disaster, Son of Snowman, The Party 1994, Making Waves, Using Assembly Part 2. 78 47-49 01/08/95 111185 The Party 1994: Results and Reviews, Report by Stony and Friends, What happened to PC- Demo competition. Editorial: TP94 = ASM94 part 2. Egg2: Trancescrambled Review, More on Fast Tracker 2.03. General Rambling by Denthor. 79 51 01/15/95 41832 A Day in the Life of Snowman, Ambient Sample CD 1, Where's the Sound Blaster, TP94 Graphics review. 80 55 01/22/95 27028 DemoNews/HTML, Traffic Jam, CodeThink(School); The Solo Sample CD .,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,., <<Closing Comments>> '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' The quote this week comes from "Assembly Language for the PC, Third Edition". p.174 "A program is never done...but it must be stopped somewhere." This was intended as a moral for programmers, but with a little rewording the message is applicable to many areas in life. See you in CyberSpace, -Christopher G. Mann (Snowman)- r3cgm@dax.cc.uakron.edu ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,End.of.DemoNews.081.