home *** CD-ROM | disk | FTP | other *** search
- DemoNews.081.continued.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.part.2.of.2
-
- SECTIONS ARTICLES
- ---------------- -----------------------------------
- Code Assembly Part 3 (It ain't no party)
- BSP Trees
- Back Issues How to Get 'em, Descriptions
- Closing Comments Quote for the Week, etc.
-
- .,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,
-
-
- _____Assembly Part 3 by Jason Nunn
-
- \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
- \\\\\\\\[ "Implementation Techniques" - Assembly Part III
- \\\\\\\[[ By Jason Nunn
- \\\\\[[[[
- \\[[[[[[[
- ____________________________________________________________________
-
- In this issue I will be discussing the basic nuts and bolts on how to
- implement an assembly program geared towards a demo. This article is
- intended for the C or Pascal programmer who hasn't quite got the confidence
- to use a full blown assembly compiler. In the first part, you may remember
- me telling you of a friend that has reached a turning point in his coding
- development, yet he still won't "take the plunge" because he doesn't have
- access to all the nice perks of 3GL's, like sine, cosine, and random
- functions and larger precision variables than the chip itself. This issue
- will hopefully provide that incentive.
-
- But...., before we do that, I would like to first finish off last article's
- talk on optimization. I forgot to include the fast string functions. I'm
- not going to waffle on too much about this now, as this article is
- dedicated to "assembly techniques". All I will do here is show my results,
- and give out some tips.
-
- Same things apply on this run - my machine is a 486-33 ISA, operating in
- P-mode, and the lower the number, the faster a given instruction is.
-
- STOSB [209729] MOV [EDI],AL [134226]
- INC EDI
-
- STOSD [306805] MOV [EDI],EAX [244208]
- INC EDI
-
- LODSB [209729] MOV AL,[ESI] [134226]
- INC ESI
-
- STOSD [306805] MOV EAX,[ESI] [244208]
- INC ESI
-
- MOVSB [228550] MOV AL,[ESI] [142124]
- MOV [EDI],AL
- INC ESI
- INC EDI
-
- MOVSD [384379] MOV EAX,[ESI] [324112]
- MOV [EDI],EAX
- INC ESI
- INC EDI
-
- Of course, REP operations are faster than their equivalent by about half.
- In general, it is better to use MOV's and INC's to perform one off
- operations. So if you're coding with these instructions, chances are that
- you can get a bit more speed out of your code.
-
- Ok, now on with the main talk. Generally, I won't be going into great depth
- as there are plenty of tutorials and manuals on the net that explain the
- rank basics of assembly. My role here will be to highlight and familiarize
- extrordinary things about coding methods of assembly.
-
- If you've never coded in assembly, then it may pay you to write your
- equivalent program in a 3GL first and before converting it over. I guess it
- depends on the person. I prefer to implement idea's in straight assembler.
- I'm comfortable with the language enough to mumble it in my sleep and their
- are no barriers or contingencies like there are in 3GL code. You can also
- run into serious problems when converting to your target language, but I'm
- sure that there are as many negative points about doing this as they are
- positive points.
-
- How to crunch huge numbers
- --------------------------
-
- One of the first questions a new demo coder may ask is how he/she could
- add, multiply or subtract a number that is larger than the precision of the
- chip. Well, this really doesn't apply now, as the standard is 32 bits. This
- is ample for nearly all calculations, but for those of you that may want to
- perform a 64 bit ADD calculation, this is how you do it:
-
- ADD EAX,ECX
- ADC EDX,0
-
- In this example, we don't have a 64 bit register, therefore we must make
- two data sources, whether they be registers or memory references to act as
- one large register. In our case, EDX and EAX act as one. EAX contains the
- least significant data and the EDX contains the most significant data of
- our 64 bit number. ECX contains the number we are adding to this 64 bit
- concatenated register. The basic idea behind this is that we first add ECX
- to EAX. If the number in the EAX register "clocks" then the CPU's carry
- flag will be set.
-
- The next instruction - ADC (for those of you that don't know) is a funny
- sort of ADD instruction that performs two add instructions. It will first
- add the source register to the destination register, and then add 1 to the
- source register if the carry is set. Hence the name "ADD ON CARRY". In our
- example, if the carry flag is set, we will only add in the carry flag as
- the source value is zero. Therefore, if the least significant component of
- our 64 bit variable (EAX) clocks, the it will carry over to the EDX
- component.
-
- Although the above example only adds a 32 number to the 64 bit number. If
- you wanted to add a 64 bit number to a 64 number then you would adopt the
- following:
-
- ADD EAX,ECX
- ADC EDX,0
- ADD EDX,EBX
-
- Where EDX:EAX is the destination 64 register, and EBX:ECX is the source
- register.
-
- To add larger precision's, we simply chain!. Here we are adding a 32 bit
- number that resides in EAX to a 128 bit number which is stored in
- EDX,EBX,ECX and ESI.
-
- ADD EDX,EAX
- ADC EBX,0
- ADC ECX,0
- ADC ESI,0
-
- To subtract, the same principle applies, accept we use SUB and SBB
- instructions:
-
- (a) (b)
- SUB EAX,ECX SUB EAX,ECX
- SBB EDX,0 SBB EDX
- SUB EDX,EBX
-
- With the 486's math coprocessor, the large multiplication and division is
- more viable than our old conventional way of calculating large numbers;
- which as you will see and very slow. Pretty soon, I will be exclusively
- using coprocessor calculations in my demos, as they are extremely popular
- now. Hence rendering the following code (for me) obsolete. However, for
- names sake, I'll discuss the old way of doing things...
-
- For multiplying a 64 bit variable to a 32 bit variable you can use this
- algorithm:
-
- MOV EAX,ESI
- MUL EBX
- PUSH EAX EDX
- MOV EAX,ESI
- MUL ECX
- POP ECX EBX
- ADD ECX,EAX
-
- As a formula, the code is equivalent to this: ECX:EBX = ECX:EBX*ESI.
-
- Note that you can chain this one also by taking the EDX value from the
- second MUL and multiplying it by the next significant register of the
- source and adding that answer into the respective register of the
- destination.
-
- Dividing is a little bit more complex. How complex?...this complex:
-
- PROC LONG_DIV
- OR EBP,EBX
- JZ @@jump_0599
- PUSH EBP
- MOV EBP,ECX
- OR EBX,EBX
- PUSHF
- JNS @@jump_0548
- NOT ECX
- NOT EBX
- ADD ECX,01
- ADC EBX,00
- @@jump_0548:
- OR EDX,EDX
- PUSHF
- JNS @@jump_0557
- NOT EAX
- NOT EDX
- ADD EAX,01
- ADC EDX,00
- @@jump_0557:
- MOV ESI,ECX
- MOV EDI,EBX
- XOR ECX,ECX
- XOR EBX,EBX
- MOV EBP,0021h
- @@jump_0562:
- RCL ECX,1
- RCL EBX,1
- SUB ECX,ESI
- SBB EBX,EDI
- JNB @@jump_0570
- ADD ECX,ESI
- ADC EBX,EDI
- @@jump_0570:
- CMC
- RCL EAX,1
- RCL EDX,1
- DEC EBP
- JNZ @@jump_0562
- POPF
- JNS @@jump_058A
- NOT ECX
- NOT EBX
- ADD ECX,01
- ADC EBX,00
- POPF
- JNS @@jump_058D
- JMP @@jump_0597
- @@jump_058A:
- POPF
- JNS @@jump_0597
- @@jump_058D:
- NOT EAX
- NOT EDX
- ADD EAX,0001
- ADC EDX,00
- @@jump_0597:
- POP EBP
- @@jump_0599:
- RET
- ENDP
-
- This formula divides EDX:EAX by EBX:ECX. Just in case anybody recognizes
- this thing, I've reversed it from a certain popular commercial package (not
- giving any names) hehe :). I havn't used it since my real mode days
- (which, for the record is about 2 years ago when coding TC669), and it's
- basically optimized for that. I've made no attempt to optimize it for
- P-mode, as I most likely will never use it ever again.
-
- How to implement a Decimal point (or rather - a hexadecimal point :)
- --------------------------------------------------------------------
-
- Now that we have discussed the ways in which we can do a whole range of
- calculations, your next question is how to implement floating/none discrete
- calculations. For that, we must take a register/memory unit and divide it
- into two parts. The number and a mantissa. For the sake of efficiency, you
- would typically contain this in a single register, namely a 32 bit
- register. I usually use this type of construct (represented in binary):
-
- /------------32 bits------------\
- NNNNNNNNNNNNNNNN.MMMMMMMMMMMMMMMM
-
- Here you have a 16 bit actual number, with a 16 bit mantissa. As you can
- see the actual number is of a higher order than normal. If to wanted to
- extract the number from this variable, you can simply perform a SHR 16.
- This will arrive you at the "NNN...." component of the number. Here is an
- example of 1 and a half:
-
- 0000000000000001 1000000000000000b
-
- If we wanted the discrete part of the number (ie the "1" part), then just
- perform a SHR 16, which arrives us at: 0000000000000001. As you can see,
- there is no real difference between discrete and non-discrete variables. To
- the machine, it's all the same thing. The difference is the way you
- interpret the product. Calculations are still no different to normal
- numbers. If we wanted to add a "half" to this number then it's as simple
- that this:
-
- MOV EAX,00000000000000010000000000000000b ; this is a decimal "1"
-
- ADD EAX,00000000000000001000000000000000b ;this is a decimal "0.5"
-
- So, as you can see, it's not very hard. For multiplication, you're going to
- have to include a SHRD instruction, as the number will now be in EDX and
- the mantissa in EAX, hence the precision is now larger. This will return
- the number back to the EAX 32 bit precision that it should be. Here is an
- example:
-
- MUL ECX
- SHRD EDX,EAX,16
-
- Here, we multiply EAX by ECX, which arrives at EDX:EAX. Then we just step
- down this answer to arrive at the result, witch will now be contained in
- EAX. With division, it's the opposite:
-
- MOV EDX,0
- SHLD EDX,EAX,16
- DIV ECX
-
- Here we are dividing EAX by ECX. Note the preparation just before the
- divide.
-
- Signed Data
- -----------
-
- As of now, we have only discussed unsigned data. Generally speaking, these
- calculations are very simular, but there are some major differences.
-
- Contained in a given 32 register, unsigned numbers go from 0 to FFFFFFFFh,
- where as 32 signed data range from 80000000h which is the lowest number and
- 7FFFFFFFh begin the highest number. When using signed data, there are only
- a couple of extra things you must know. Signed data has its own
- multiplication and division instructions (ie IMUL and IDIV), and its own
- set of conditional jump instructions.
-
- JL (jump if less than) and JLE (jump if less than or equal to)
- are a signed equivalent to
- JB (jump if below) and JBE (jump if below or equal to).
-
- JG (jump if greater) and JGE (jump if greater than or equal to)
- are the signed equivalent to
- JA (jump if above) and JAE (jump if above or equal to).
-
- To change our unsigned divider from this....
- MOV EDX,0
- SHLD EDX,EAX,16
- DIV ECX
-
- ...To a signed divider, simply substitute the MOV EDX,0 with a CDQ. The CDQ
- extends a signed number in EAX into EDX. Example given:
-
- CDQ
- SHLD EDX,EAX,16
- IDIV ECX
-
- Implementing Complex mathematical relationships
- -----------------------------------------------
-
- At one time or another, a coder is going to have to use some sort of
- complex mathematical function like triangle ratios, logarithmic factors and
- random numbers to implement various things. To create a function that maps
- a relationship in real time is basically impossible in efficiently terms.
- The only way you can do this is to store relationships in the form of
- tables. This may not be apparent to users of compilers like turbo C etc but
- electronic calculators, compliers, maths coprocessors, spreadsheets all use
- this method of mapping these relationships. it a very fast a convenient way
- of doing things.
-
- The first common function is the random function. A random signal can be
- achieved using the following algorithm. The product of this function is a
- random number stored in the EAX register.
-
- ;input: NIL; output: EAX
- proc random
- mov ebx,[random_seed1]
- lea ebx,[ebx*4]
- mov eax,[ebx+@@rantable]
- mov ebx,[random_seed2]
- lea ebx,[ebx*4]
- add eax,[ebx+@@rantable]
- mov [ebx+@@rantable],eax
- inc [byte random_seed1]
- and [byte random_seed1],01111b
- dec [byte random_seed2]
- and [byte random_seed2],01111b
- ret
- random_seed1
- dd 2
- random_seed2
- dd 13
- @@rantable:
- dd 0fd8fce7ah,02d7ad7b7h,0f48a8f3ab,04a3b8f8bh
- dd 0f2dec542h,0a847fab7h,0f4da81aab,04a348f86h
- dd 024547edah,03b535a43h,0b35a535ab,0aa333483h
- dd 0fd2f4e7ah,0c525a5b7h,016d3b4a4b,0643b4fd3h
- endp
-
- If you expand the table to 256 entries then you could eliminate two
- instructions, but there again, it's not worth doing. This random function
- will give you a very random signal :). There is only one problem with this
- algorithm, and that is, the randomness will always follow the same pattern.
- If this feature undesirable, then you may like to make an initiation module
- that jumbles up the seeds or the numbers a bit. An obvious way of randomly
- choosing a seed, would be to store a fixed reference variable in memory.
- For example:
-
- proc randomise
- mov al,[043253445h]
- mov [byte random_seed1],al
- mov al,[012345678h]
- mov [byte random_seed2],al
- ret
- endp
-
- Anyway, I'm going to stop here as it's getting very close the deadline
- time. One day, I'll learn not to leave things till last minute. In the next
- part, I'll be hopefully finishing up this assembly series and moving on to
- my talks of sound/tracker programming (the interesting stuff).
-
- I'll be soon releasing a tracker that I have written called FunkTracker.
- With this will be the full source code listing. My discussions will be
- based around my knowledge and experience when producing current and past
- trackers and players, and discussing implementation and hardware issues. I
- also plan to discuss reverse engineering using microsoft CodeView, and plan
- to obtain hack docs on the AWE32 card. So this will be all coming up!.
- until next time.
-
- See ya
- :Jason Nunn
-
-
- _____BSP Trees by Tom Verbeure
-
- Problem situation: sorting polygons is slow and can be incorrect for
- certain view-angles. Heavily influenced by Computer Graphics, Principles
- and Practice, I have written this small tutorial for BSP trees, which
- solves the problem for static objects and for every view angle.
-
- As I already said: Binary Space Partitioning Tree. Unlike many other
- abbreviations, this one really explains a lot of the algorithm: it uses a
- tree. It partitions space and it partitions in two parts.
-
- First: it is only usefull in static scenes: no 3D morphing or other goodies
- are allowed.
-
- Let's go to the 2D case, 3D is exactly the same.
- Take a sample scene:
-
- A\ ----- C|
- \ B E |
- ------ |
- |
- / .
- / V
- D/
-
- The positive side of the polygons is the side with the defining
- character... Ignore V for now.
-
- One could sort this thing during rendering, but as there can be no correct
- sort criterium and sorting is slow, we don't want that. Besides, we have
- memory to spare :-)
-
- Now, we're going to build a tree that is totally viewpoint independent:
-
- Take polygon B as the root. Polygon B divides space in to parts: the
- positive and the negative side (Geee!) We have partitoned space in two.
-
- First, scrap B from the 'not-used' polygons-array and classify the
- remaining polygons. Group those on the + side, and those on the - side. As
- you can see, polygon C is both on the + and the - side. What to do? Create
- 2 new polygons C+ and C-, erase C. Is there another complainer ? Nope: all
- poly's are on either the + or the - side. Now we have this situation:
-
- B
- / \
- A,C+,E D,C-
-
- Not really a tree yet, but we've only started...
-
- Now, do the same thing for the groups at the child nodes, without caring
- about those in another child-node.
-
- For the + side of B, we have polys A,C+ and E. Take A as next node polygon.
- Neither C+ nor E are on it's negative side (we ignore D and C-). For the
- other node, take D as next node polygon. Only C- remains there, and it is
- on the negative side. That side of the tree is finished. We have the
- following situation:
-
- B
- / \
- / \
- A D
- \ \
- C+,E C-
-
- There's one child with more that one poly left. Take C+ as node polygon, E
- become it's child, on the positive side. We're finished.
- Situation:
-
- B
- / \
- / \
- A D
- \ \
- C+ C-
- /
- E
-
- Now, what can we do with it? A lot... Suppose the viewpoint is at position
- V. In which order do we have to sort the polygons, when using a back to
- front rendering algorithm ? Answer: walk the tree, make sure all nodes
- (including childs) are visited.
-
- Start at the root. Is V on the positive side? Nope, well, we want the polys
- far away first, so walk the positive way. Are we on the positive side of A?
- Yep, walk the negative way. It is empty! Ah. Well, draw A first. The go the
- positive way. Are we positive of C+? Yep. Negative way of C+ is empty. Draw
- C+ poly. Go positive way of C+. E has no child, draw it. Go up until a
- non-empty branch is found, draw all node polygon not drawn already. We now
- arrive at B again. Draw it. Negative is not visited yet, walk it. We're
- negative of D. Positive way is empty. Draw D and go negative. C- has no
- child. Draw it. All nodes have been visited. The end.
-
- We have drawn the polygons in following order:
-
- A, C+, E, B, D, C- which is a correct order. The BSP tree has to be
- constructed only once and for all. From then on, sorting the polygons is
- always correct and in linear time. Standard sorting algorithms can be
- proved to be of n*log(n) order of time, so we have an increase in speed as
- well.
-
- Disadvantages:
-
- - Memory: one has to have the tree in memory. This can be substantial for
- lots of polygons.
- - Polygon splitting: one ends up with more split polygons. It is almost
- always unavoidable to do splitting.
- - Polygons are not allowed to move.
-
- A BSP tree is NOT unique: just pick another polygon as a node and one gets
- a different one. In this case, one can avoid splitting polygons: start with
- a root and build the following, correct, BSP tree:
-
- C
- /
- B
- / \
- A D
- /
- E
-
- Tadaam! No polygon splitting!
-
- Building a tree with a few splitting as possible is an exponential of the
- number of polygons. As Foley and Van Dam says, just try a limited number of
- nodepolygons, pick the one with the least splitting and the tree will be
- good enough.
-
- Voila. That's it. Not too difficult I think. Notice BSP trees are also
- usefull to sort objects, by using planes that divide the space in such a
- way that Object A is on the negative and Object B is on the positive side
- of the plane. Very useful (only for non-intersecting objects).
-
- This text is written without the Bible (Computer Graphics, P&P) besides me,
- but since I read their chapter about BSP trees many times, it contains
- almost the same info.
-
- For polygon splitting algorithmes, there is one in Graphics Gems. I don't
- know which one, but buy all four books, you won't be disappointed... :-)
-
- Tom Verbeure
- Synergy Design
-
- .,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,
-
- <<Back Issues>>
-
- ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
- _____How to Get 'em
-
- After reading this issue of DemoNews, you may be wondering how you can get
- previous ones. Well fear not! There are two different ways to do so:
-
- 1: FTP to hornet.eng.ufl.edu and go to /pub/msdos/demos/news/OLD_NEWS and
- start downloading anything you see.
-
- 2: Now you can request back issues of DemoNews via e-mail. Start a letter
- to listserver@oliver.sun.ac.za (any subject line) and in the body of the
- letter include "get demuan-list <index>" where INDEX refers to the
- index number of the issue.
-
- For example: get demuan-list 43
-
- This would retrieve DemoNews #76 (part 1 of 2).
-
- For more recent issues that are split into multiple parts, you must send
- an individual request for each index number.
-
- _____Descriptions
-
- Issue Index Date Size Description
- ----- ----- -------- ------ ----------------------------------------------
- 75 41,42 12/18/94 68009 A DemoNews Reader, The Birth of Commercial
- Life, Editorial: Calm Before the Storm,
- Interview with Mello-D, US Demo Scene
- (Renaissance meeting), Jelly Tots and Pizza
- Shops, Review of Wired '94 Graphics.
-
- 76 43,44 12/25/94 92589 Interview with EMF, DemoNews Readers Write,
- Kimba's Life Story, X-Mas in the Demo Scene,
- CORE, Demo & Music Database, Interview with
- Purple Motion/Future Crew, Interview with
- Krystall/Astek, Common Sense ][ by Perisoft,
- Its X-Mas in Africa, Interview with Maxwood
- of Majic 12, Assembly Part ][, Common Sense
- Response by Stony.
-
- 77 45,46 01/01/95 101100 Chart History, Snowman Near-Disaster, Son of
- Snowman, The Party 1994, Making Waves, Using
- Assembly Part 2.
-
- 78 47-49 01/08/95 111185 The Party 1994: Results and Reviews, Report
- by Stony and Friends, What happened to PC-
- Demo competition. Editorial: TP94 = ASM94
- part 2. Egg2: Trancescrambled Review, More
- on Fast Tracker 2.03. General Rambling by
- Denthor.
-
- 79 51 01/15/95 41832 A Day in the Life of Snowman, Ambient Sample
- CD 1, Where's the Sound Blaster, TP94
- Graphics review.
-
- 80 55 01/22/95 27028 DemoNews/HTML, Traffic Jam, CodeThink(School);
- The Solo Sample CD
-
- .,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,
-
- <<Closing Comments>>
-
- ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
- The quote this week comes from "Assembly Language for the PC, Third
- Edition". p.174
-
- "A program is never done...but it must be stopped somewhere."
-
- This was intended as a moral for programmers, but with a little rewording
- the message is applicable to many areas in life.
-
- See you in CyberSpace,
-
- -Christopher G. Mann (Snowman)-
- r3cgm@dax.cc.uakron.edu
-
- ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,End.of.DemoNews.081.
-
-