Reverse Code Engineering RCE CD +sandman 2000


Text File | 1990-08-02 | 31.4 KB | 704 lines

BASIC II - INTERFACING BASIC WITH ASSEMBLER

Have you finished reading all the chapters? If not, go back and do them, then come back to this when you are done. This chapter assumes you know about segments, subroutines, and the general information about linking subroutines to high-level languages.

In order to do this appendix I had to dust off my old QuickBASIC 3.0. If you have QuickBASIC 4.x, some things will have been updated. If you have TurboBASIC, the subroutine conventions are different. However, the structure will be the same. You will have to consult your manual for exact details.

If you are trying to do this with the interpreter that came with DOS, I have a simple comment -> Forget it! I won't go into the details, but the BASIC interpreter is so much slower (about 10 times slower) and so much more difficult to use with assembler that I won't even cover it. There is no reason not to use one of the compiled BASICs if BASIC is your language of choice. This material only covers how to deal with COMPILED BASIC.

In BASIC, all individual numeric data, strings, "static" arrays and the stack must fit into one 64k segment. The word 'segment' here has the same meaning as in assembler. Both the DS register and the SS register are set to this segment, and must stay set to this segment whenever BASIC has control of the program. "Dynamic" arrays can be located somewhere else in memory. You allocate a "static" array with a constant number as a dimension:

    DIM array1! (277), array2% (346), array3$ (500)

and you allocate a "dynamic" array by using a variable to dimension the array:

    length1% = 277
    length2% = 346
    length3% = 500
    DIM array1!(length1%), array2%(length2%), array3$(length3%)

Even though the first and second dimension statements produce the same size and type arrays, the first ones must be located inside DS and the second ones can be located outside of DS.

"Static" means that once the array is defined, its length and number of dimensions cannot be changed for the rest of the program. It will occupy a specific amount of space for the rest of the program. "Dynamic" means that you can change the length of the array whenever you want to. You do this with:

    REDIM array1! (495)

The PC Assembler Tutor - Copyright (C) 1990 Chuck Nelson

BASIC does this by deallocating space for the old array and then reallocating space for the new array. All the old information is lost. There are certain restrictions. You cannot change the number of dimensions in an array (if it starts out with 2 dimensions like DIM A!(47,63), it must always have 2 dimensions).

In order to understand BASIC's memory strategy, we need to look at strings, the reason for it all.{1} The limit for a single string is 32,767 bytes. If the total amount of data you can have in the DS segment is only 65536 bytes, how does BASIC allocate memory so you can have long strings without running out of space? It uses only as much space as it needs. Let's define 3 strings (the dots will indicate a space):

    mystring$ = "You.say.either"
    yourstring$ = "And.I.say.either"
    ourstring$ = "Let's.call.it.off"

After defining these three strings one after the other, memory will look like this:

    17150  |You.say.eitherAnd.I.say.eitherLet's.call.it.off|

(For clarity, the memory image will be between the '|'s and each row will be 50 bytes long. The next row down would start at 17200). For our example we will assume that this data starts at memory location 17150. There is no empty space.

How does BASIC know where and how long mystring$ is? It has something called a string descriptor. This is a two word (4 byte) block, also in DS, which says exactly where and how long the string is. The first word is the length and the second word is the location (offset) in DS.
From BASIC's view, we have:

                     STRING DESCRIPTOR
                     length:location
    mystring$            14:17150   ->  |You.say.either|
    yourstring$          16:17164   ->  |And.I.say.either|
    ourstring$           17:17180   ->  |Let's.call.it.off|

Now let's change one of the strings:

    yourstring$ = "But.oh!,.If.we.call.the.whole.thing.off"

We now have a problem. The current "yourstring$" is only 16 bytes long, but the new one is 39 bytes long. What does BASIC do? It (1) deallocates the space for the old "yourstring$", (2) allocates new space for the new string and (3) updates the string descriptor. Memory will now look like this:

    17150  |You.say.either                Let's.call.it.offBut|
           |.oh!,.If.we.call.the.whole.thing.off|

and the descriptors will now look like this:

                     STRING DESCRIPTOR
                     length:location
    mystring$            14:17150   ->  |You.say.either|
    yourstring$          39:17197   ->  |But.oh!,.If.we...
    ourstring$           17:17180   ->  |Let's.call.it.off|

____________________
   1. This is an outline of what BASIC does, but it will not include the parts of memory management that you will never see.

BASIC is aware that there is an empty block of space and has a strategy for dealing with empty spaces, though each BASIC has its own strategy. We don't know exactly WHEN it will take action, but we do know WHAT action it will take. At some point BASIC will decide that it has too many empty spaces in memory and will REORGANIZE the segment. This is known as GARBAGE COLLECTION. Exactly how this is done is up to the person who wrote the BASIC compiler/interpreter.

After reorganization, the addresses of ALMOST ALL strings and MANY dynamic arrays will have changed. The string locations themselves will have changed, but the string descriptors will still be in the same place in DS, and they will have been updated.
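
The reorganization described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not BASIC's actual algorithm: it assumes a simple sliding compaction that keeps strings in address order, the function name compact and the dictionary layout are invented for illustration, and the new base address 12724 is borrowed from the tutorial's own illustration rather than derived from anything.

```python
def compact(descriptors, new_base):
    """Repack live strings contiguously from new_base, in address order.

    descriptors maps a variable name to (length, location). The result
    has the same names and lengths, with every location updated.
    """
    updated = {}
    next_free = new_base
    # Visit strings from lowest to highest address so their order is kept.
    by_address = sorted(descriptors.items(), key=lambda item: item[1][1])
    for name, (length, _old_location) in by_address:
        updated[name] = (length, next_free)
        next_free += length   # no gap is left between strings
    return updated

# Descriptors after yourstring$ was reassigned, leaving a 16-byte hole.
before = {
    "mystring$": (14, 17150),
    "yourstring$": (39, 17197),
    "ourstring$": (17, 17180),
}

after = compact(before, new_base=12724)
for name, (length, location) in after.items():
    print(f"{name:<12} {length}:{location}")
```

Every location changes, but the dictionary keys (standing in for the descriptors' fixed places in DS) do not. That is why an assembler routine should read the descriptor each time it is called instead of remembering an address from an earlier call.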
Here is the new memory:

    12724                          |You.say.eitherLet's.call.i|
           |t.offBut.oh!,.If.we.call.the.whole.thing.off|

and here are the updated descriptors:

                     STRING DESCRIPTOR
                     length:location
    mystring$            14:12724   ->  |You.say.either|
    yourstring$          39:12755   ->  |But.oh!,.If.we...
    ourstring$           17:12738   ->  |Let's.call.it.off|

The strings have been moved several thousand bytes from where they were just a second ago. The information that was in the string descriptors a second ago is no longer valid. Old information about dynamic arrays is also unreliable. This means that if you have a subroutine written in assembler, you must get any address information at the time the subroutine is called. We'll come back to this later.

Let's go on to data input and output. When you first started doing BASIC, you did i/o using only:

    WRITE #1, my.data!

Perhaps you do it differently now, perhaps not. In any case, you need to know about i/o speed and how different file i/o works. Here's the simplest file output:

***********************************
DIM large.array! (10000)
FOR i% = 1 to 10000
    large.array! (i%) = 2.1
NEXT i%

OPEN "2-1.doc" for output as # 1
PRINT time$
FOR i% = 1 to 10000
    WRITE #1, large.array! (i%)
NEXT i%
PRINT time$
CLOSE #1
***********************************

Of course, to make it a challenge we are going to write an array of 10,000 numbers. How long does it take?{2} For this output it took 38 seconds. The same program, inputting the same data with:

    INPUT #1, large.array! (i%)

took 49 seconds. These are fairly large amounts of time. But wait, it gets worse. Let's change one line of the above program:

    large.array! (i%) = 2.1678319E+19

This is a different constant which is put into each element of the array. How long does output take now? 59 seconds. And input? A whopping 79 seconds!

What's going on here? When you do i/o with INPUT #, WRITE # or PRINT #, it is exactly like doing i/o to the screen.
For output, BASIC converts the binary numbers into TEXT and then writes the TEXT to the disk. When it does input, it reads the TEXT from the disk and converts the TEXT into a binary number. Here is the beginning of the output file from the first example:

    2.1
    2.1
    2.1
    2.1
    2.1

Each data item has been converted into "2.1" + CHR$(13) + CHR$(10). These last two characters are a carriage return and a line feed, which together end a line on the IBM PC. That's (5 bytes * 10000 items) plus 1 byte for the end of file marker, or 50001 bytes:

    2-1      DOC     50001   6-29-90  12:39p

____________________
   2. All times from now on are with a slower PC with a slower hard disk, but an 8087. Since these are floating point numbers, your results should be slower if you don't have an 80x87, while if you have an 80386 with an 80387 and a fast hard disk, your times will be much faster.

Here's the beginning of the output file from the second example:

    2.167832E+19
    2.167832E+19
    2.167832E+19
    2.167832E+19
    2.167832E+19

Each data item has been converted into "2.167832E+19" + CHR$(13) + CHR$(10). That's (14 bytes * 10000 items) plus 1 byte for the end of file marker, or 140001 bytes:

    2-1E19   DOC    140001   6-29-90  12:47p

These files are unnecessarily large, and i/o is slow: if you don't have an 8087 and you are doing floating-point i/o, it can be slower still. Can we do it faster? Yes. Using GET and PUT, we get a certain number of bytes from the disk, then transfer them to the array.

Some of you have never used random access i/o, so this is a brief summary. When you open a file as text (as we did in the above examples), BASIC divides the text by looking for carriage returns. When you open a file as a random access file, you are telling BASIC that you want to divide the file into distinct blocks of information. It may be text or it may be something else - BASIC doesn't care.
If you say nothing, BASIC assumes that you want the blocks to be 128 bytes long, but the length can be anything. In the example that we will do, we will use 1024 byte blocks because that is exactly 2 disk sectors long, so the disk can read information easily and efficiently. If we had a block length of 4 bytes, the disk would have to do 10000 disk writes; that would be very slow and be hard on the disk. Here's how we open the file:

    OPEN "packed.doc" for RANDOM as #1 LEN = 1024

This will be a random access file and the block length will be 1024 bytes. When you tell it to read or write, it will do it 1024 bytes at a time. That is getting faster.

Where is the block of data that it is going to write to disk? Here life starts getting complicated, so I hope you have understood everything that we have done so far. When you open a file, BASIC assigns it a BUFFER. The buffer has a fixed length (either 128 bytes or the length you have designated), and is located somewhere in the DS data segment along with the numbers and strings. Like a string, it is relocatable. We need a way to pin it down. The easy and nice way would be if it were an array and we could address it like an array:

    buffer#1 (45) = 20

We are not that lucky. The only thing you can do is overlay a template on the buffer, and work from the template. This template MUST be made up of strings. We make up the template with a FIELD statement.

    FIELD #1, 1024 AS out.string$

The FIELD statement starts out with the file # followed by a list of strings and the length of each string.

    FIELD #1, 100 AS string1$, 200 AS string2$, 300 AS string3$

The total length of the strings may be shorter than the buffer, but may not be longer than the buffer. What does the FIELD statement do? The first thing that it does is set the string descriptor for all of these strings.
Let's say that at the moment file #1 buffer is at 46217:

                     STRING DESCRIPTOR
                     length:location
    string1$            100:46217
    string2$            200:46317
    string3$            300:46517

The first string starts at the first byte of the buffer. The second string starts right where the first string ends and the third string starts right where the second string ends. This is true for any FIELD statement, no matter how many strings are defined. Because of the way BASIC does memory management, if it moves the buffer, it will also update these string descriptors to point to the same relative places in the buffer. These string descriptors are on auto pilot.

Suppose now that we have the following string:

    "Let's get physical"

and we want to write it to disk as string1$. All we need to do is:

    string1$ = "Let's get physical"

Right? No, that's very, very, very wrong. What you have just done is alter the string descriptor of string1$ to point to an entirely different place in memory. The string descriptors are now:

    string1$             18:58902
    string2$            200:46317
    string3$            300:46517

BASIC deallocated the space for string1$, reallocated it somewhere else in memory, and changed the string descriptor. Not only is string1$ in a different place in memory, but BASIC may think that part of the file #1 buffer is actually empty space, and the next time it reorganizes memory, who knows what is going to happen.

From the moment you define strings in a FIELD statement until the time you close the corresponding file, you can NEVER have them on the left side of an equal sign. Having them on the left side is sure to change the string descriptor.

How are we going to transfer data to these strings? There are three special operators in BASIC - LSET, MID$ and RSET. Their job is to put something into a string without altering the string length or location (i.e. without altering the string descriptor).
    LSET string1$ = "Let's get physical"
    MID$ (string1$,17) = "Let's get physical"
    RSET string1$ = "Let's get physical"

LSET will insert the string at the very left of string1$, RSET will insert the string at the very right of string1$, and MID$ will insert the string starting at the 17th byte of string1$. This is the strategy for all random access i/o in BASIC. We:

    1) open a file as RANDOM and declare a block size.
    2) define some "fixed length" strings inside the buffer with a FIELD statement.
    3) insert data in the strings using LSET, RSET or MID$.

This is true whether the data is strings or numbers. There's only one problem left. For LSET, RSET and MID$, the thing on the RIGHT side of the equal sign must be a string. You can't have:

    LSET string1$ = number!

It's illegal. To counter this, BASIC has some pseudo-functions. Let's take integers as an example:

    a.string$ = MKI$ (number%)
    number% = CVI (a.string$)

MKI$ doesn't actually DO anything. It just tells BASIC that it is o.k. to move two bytes from "number%" to "a.string$". The bytes are binary data and are moved unaltered. Similarly, CVI tells BASIC that it is alright to move two bytes of binary data from "a.string$" to "number%". We are tricking BASIC into moving binary data from one data type to another. This is simply data movement, and there is no data conversion. The forms are:

    NUMERIC DATA                            MOVE
                                    TO STRING   FROM STRING
    integer <-> string                MKI$         CVI
    long integer <-> string           MKL$         CVL
    single precision <-> string       MKS$         CVS
    double precision <-> string       MKD$         CVD

In contrast, STR$ converts a binary representation to a text representation, and VAL converts a text representation to a binary representation. This is the same as what happens with PRINT and INPUT. Here's a program:

**********************************************
number! = 2.1678319E+19
binary.string$ = MKS$ (number!)
text.string$ = STR$ (number!)
PRINT LEN(text.string$), LEN(binary.string$)
PRINT text.string$, binary.string$
**********************************************

and here's the output:

     13            4
     2.167832E+19 nl _

You probably won't be able to see all of that last output on your printer because it is four bytes long and the number is:

    6E6C965F hex    or    110, 108, 150, 95 decimal

The third byte (150) is outside the range of printable ASCII characters (32-126). STR$ gives us the text representation of the number, while MKS$ stuffs the binary representation of a number into a string. In the opposite direction, VAL gives us the numeric value of a text string (if it has a numeric representation), while CVS stuffs 4 binary bytes from a string into a single precision number.

    STR$    from binary value to text representation
    VAL     from text representation to binary value

Note that STR$ can convert ANY type of number to a text string and VAL can convert a text string to ANY type of number, while CVI, CVL, CVS, CVD, MKI$, MKL$, MKS$, and MKD$ can only stuff a specific type of number into a string or a string into a specific type of number.

We want our output program to stuff the binary value from a single precision number to selected bytes of a string. To stuff a floating-point number into string1$ above, all we need to do is:

    LSET string1$ = MKS$ (number!)

The following program has a single string which is the size of the entire buffer, and we are going to stuff the single precision numbers in one at a time with MID$.

************************************************************
number% = 10240
DIM large.array! (number%)
FOR i% = 1 to 10240
    large.array! (i%) = 2.1678319e+19
NEXT i%

OPEN "packed.doc" for RANDOM as #1 LEN = 1024
FIELD #1 , 1024 AS out.string$

PRINT time$
k% = 0
record.count% = 0
FOR i% = 1 to 40
    record.count% = record.count% + 1
    spot% = 1
    FOR j% = 1 to 256
        k% = k% + 1
        MID$ (out.string$,spot%,4) = MKS$ (large.array!(k%))
        spot% = spot% + 4
    NEXT j%
    PUT #1, record.count%
NEXT i%
PRINT time$
CLOSE #1
************************************************************

The array length has been increased slightly so that we have an exact number of blocks (40 blocks * 256 numbers = 10240 numbers). We use MID$ to make sure that the string descriptor for out.string$ does not get changed. Each file write will be (256 numbers * 4 bytes/number) 1024 bytes long. We start with the first record and increase the record number by 1 each time we write. Does this increase the speed any? Well, this takes 11 seconds.

    TYPE                     OUTPUT          INPUT
    num <-> text             38 - 59 sec     49 - 79 sec
    num <-> bin. string      11 sec          11 sec

I didn't show you the equivalent input routines but here are the times they took. Note that the complexity of the single precision number has no effect on the last (the binary) routine. Also, the last routine does not suffer if there is no 8087. If you are running an 80286 with a fast hard disk, this last routine should only take a second or two. Here are the file sizes:

    2-1      DOC     50001   6-29-90  12:39p
    2-1E19   DOC    140001   6-29-90  12:47p
    PACKED   DOC     40960   6-29-90   1:08p

The first two are the different sizes depending on whether the constant was 2.1 or 2.1678319E+19. The last one is for our last routine (40 records * 1024 bytes). Notice that it is more compact.

Can we do any better than 11 seconds? Yes, but we need to take over disk i/o and we need to know a few more things before we do that.


LOCATION OF DATA

BASIC is designed to pass subroutines the location of the data, not the data itself. This is called passing by reference.
Though it is possible to pass the data itself, there are certain problems with the stack if you do.{3} We will always pass the addresses.

All single numeric variables are in the DS segment. BASIC passes the offset address of these variables in DS (1 word).

All strings are in the DS segment. Their string descriptors are also in the DS segment. BASIC always passes the offset address of the STRING DESCRIPTOR. This, in fact, is what we want. We need to know both where the string is and how long it is. If we write past the end of the string we may destroy BASIC's memory management system.

Static arrays are in the DS segment but dynamic arrays can be anywhere. If we want to write a general purpose routine with arrays, we need to handle them no matter where they are.

BASIC has a special function called VARPTR that tells you where a variable is in memory. Here's a program that uses it for a couple of variables:

***********************************************************
' check out the use of varptr
n% = 5000
p% = 50
DIM b!(800),a!(900)
DIM d!(n%), c!(p%)
mystring$ = "What's up, doc?"

addressA! = varptr (n%)
addressB! = varptr (p%)
address1! = varptr (a!(0))
address2! = varptr (b!(0))
address3! = varptr (c!(0))
address4! = varptr (d!(0))
address5! = varptr (mystring$)

PRINT addressA!, addressB!
PRINT address1!, address2!, address3!, address4!, address5!
***********************************************************

It gives us the addresses of all sorts of things. a!() and b!() are static arrays, so they should be in the DS segment. c!() and d!() are dynamic arrays, so they might be anywhere. Remember, the DS segment is from offset 0 to offset 65535. Let's see where they are:

    6230          6232
    9438          6234          87616         67584         13062

____________________
   3. If you make a mistake and pass a single precision number instead of an integer, you will pass 4 bytes instead of 2. From that moment on the stack will have 2 extra bytes on it and you won't know where they came from.

The individual numbers are in DS, and the two static arrays are in DS, but c!() and d!() are outside of DS. These numbers tell us the address relative to the start of DS, but we don't know where DS is at the moment. Where exactly are these arrays? It would be tedious to pass the subroutine these numbers because they are floating-point numbers and would be very difficult to deal with.

QuickBASIC has a function called PTR86. It is in an external object file called INT86.OBJ.{4} This object file has the routines that you need if you want to do interrupts from BASIC itself. We'll come back to that. PTR86's job is to take the floating-point number which we got from VARPTR, add the starting address of the DS segment to get an absolute address in memory, and then calculate both a segment and an offset for that address in memory. The segment will always be the highest segment that contains the first byte of the variable or array and the offset will always be a number from 0 to 15.

In order to use an object file from inside of QuickBASIC you need to put it in a library file and then load the library file when starting QuickBASIC. Building the library file is quite easy. QuickBASIC comes with a program called BUILDLIB.EXE which builds the library for you. For now, you need only INT86.OBJ and PREFIX.OBJ in your library.{5} Put these two things in every library that you build from now on. PREFIX.OBJ ensures proper segment ordering in the executable file.

    >buildlib int86+prefix

This will create a library with the default name USERLIB.EXE. To load a library with this default file name, just put '/l' on the command line:

    >qb /l

If you have given the library a different name like XQRTYF.EXE, then put that name after the '/l':

    >qb /lXQRTYF.EXE

These object files will now be loaded and their subroutines will be usable from inside BASIC.
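
PTR86 is one of the routines made available this way, and the arithmetic described for it above can be sketched in Python. This is a model of the description only, not the code inside INT86.OBJ: the function name normalize is invented, and the DS value 2000 hex is an arbitrary assumption, since the real DS is only known while the program is running.

```python
def normalize(ds_segment, varptr_offset):
    """Model the PTR86 arithmetic described in the text.

    ds_segment    -- paragraph number held in DS (assumed value below)
    varptr_offset -- number returned by VARPTR, which may exceed 65535
    Returns (segment, offset) with the offset in the range 0 to 15.
    """
    # A segment value counts 16-byte paragraphs, so DS * 16 is where DS starts.
    absolute = ds_segment * 16 + varptr_offset
    # The highest segment containing the byte leaves only the low 4 bits over.
    return absolute >> 4, absolute & 0x0F

ds = 0x2000  # assumption for illustration only
for name, ptr in [("a!(0)", 9438), ("c!(0)", 87616), ("d!(0)", 67584)]:
    seg, off = normalize(ds, ptr)
    # The normalized pair must name the same absolute byte.
    assert seg * 16 + off == ds * 16 + ptr
    print(f"{name:<6} VARPTR = {ptr:<6} -> {seg:04X}:{off:04X}")
```

Because the offset that comes back is at most 15, a routine that loads this segment into ES or DS can index almost a full 64K from the start of the array without wrapping around, which is what makes the normalized form convenient for dynamic arrays outside DS.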
____________________
   4. PTR86 has been replaced by VARSEG in QuickBASIC 4.0.
   5. Both of these object files come with your QuickBASIC.