-
-
-
- mmvii
-
- BASIC II - INTERFACING BASIC WITH ASSEMBLER
-
-
- Have you finished reading all the chapters? If not, go back and
- do them, then come back to this when you are done. This chapter
- assumes you know about segments, subroutines, and the general
- information about linking subroutines to high-level languages.
-
- In order to do this appendix I had to dust off my old QuickBASIC
- 3.0. If you have QuickBASIC 4.x, some things will have been
- updated. If you have TurboBASIC, the subroutine conventions are
- different. However, the structure will be the same. You will have
- to consult your manual for exact details. If you are trying to do
- this with the interpreter that came with DOS, I have a simple
- comment -> Forget it! I won't go into the details, but the BASIC
- interpreter is so much slower (about 10 times slower) and so much
- more difficult to use with assembler that I won't even cover it.
- There is no reason not to use one of the compiled BASICs if BASIC
- is your language of choice. This material only covers how to deal
- with COMPILED BASIC.
-
-
- In BASIC, all individual numeric data, strings, "static" arrays
- and the stack must fit into one 64k segment. The word 'segment'
- here has the same meaning as in assembler. Both the DS register
- and the SS register are set to this segment, and must stay set to
- this segment whenever BASIC has control of the program. "Dynamic"
- arrays can be located somewhere else in memory.
-
- You allocate a "static" array with a constant number as a
- dimension:
-
- DIM array1! (277), array2% (346), array3$ (500)
-
- and you allocate a "dynamic" array by using a variable to
- dimension the array:
-
- length1% = 277
- length2% = 346
- length3% = 500
-
- DIM array1!(length1%), array2%(length2%), array3$(length3%)
-
- Even though the first and second dimension statements produce the
- same size and type arrays, the first ones must be located inside
- DS and the second ones can be located outside of DS.
-
- "Static" means that once the array is defined, its length and
- number of dimensions cannot be changed for the rest of the
- program. It will occupy a specific amount of space for the rest
- of the program. "Dynamic" means that you can change the length of
- the array whenever you want to. You do this with:
-
- REDIM array1! (495)
-
- ______________________
-
- The PC Assembler Tutor - Copyright (C) 1990 Chuck Nelson
-
-
-
-
- The PC Assembler Tutor mmviii
- ______________________
-
-
- BASIC does this by deallocating space for the old array and then
- reallocating space for the new array. All the old information is
- lost. There are certain restrictions. You cannot change the
- number of dimensions in an array (if it starts out with 2
- dimensions like DIM A!(47,63), it must always have 2 dimensions).
-
- In order to understand BASIC's memory strategy, we need to look
- at strings, the reason for it all.{1} The limit for a single
- string is 32,767 bytes. If the total amount of data you can have
- in the DS segment is only 65536 bytes, how does BASIC allocate
- memory so you can have long strings without running out of space?
- It uses only as much space as it needs. Let's define 3 strings
- (the dots will indicate a space):
-
- mystring$ = "You.say.either"
- yourstring$ = "And.I.say.either"
- ourstring$ = "Let's.call.it.off"
-
- After defining these three strings one after the other, memory
- will look like this:
-
- 17150
- |You.say.eitherAnd.I.say.eitherLet's.call.it.off|
-
- (For clarity, the memory image will be between the '|'s and each
- row will be 50 bytes long. The next row down would start at
- 17200). For our example we will assume that this data starts at
- memory location 17150.
-
- There is no empty space. How does BASIC know where and how long
- mystring$ is? It has something called a string descriptor. This
- is a two word (4 byte) block, also in DS, which says exactly
- where and how long the string is. The first word is the length
- and the second word is the location (offset) in DS.
-
- From BASIC's view, we have:
-
- STRING DESCRIPTOR
- length:location
-
- mystring$ 14:17150 -> |You.say.either|
- yourstring$ 16:17164 -> |And.I.say.either|
- ourstring$ 17:17180 -> |Let's.call.it.off|
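The descriptor values follow from simple arithmetic: each string starts where the previous one ends. This is a small Python model, not BASIC code, and the helper name allocate_contiguous is made up for illustration. It assumes, as the example does, that the strings are packed with no gaps starting at offset 17150.

```python
# Model of BASIC's string space: strings packed back to back in DS.
# Each descriptor is modeled as a (length, location) pair.

def allocate_contiguous(start, strings):
    """Place each string right after the previous one; return descriptors."""
    descriptors = {}
    location = start
    for name, text in strings:
        descriptors[name] = (len(text), location)
        location += len(text)
    return descriptors, location  # location is now the first free byte


strings = [
    ("mystring$", "You.say.either"),
    ("yourstring$", "And.I.say.either"),
    ("ourstring$", "Let's.call.it.off"),
]
descriptors, free = allocate_contiguous(17150, strings)
for name, (length, location) in descriptors.items():
    print(f"{name:12} {length}:{location}")
```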
-
- Now let's change one of the strings:
-
- yourstring$ = "But.oh!,.If.we.call.the.whole.thing.off"
-
- We now have a problem. The current "yourstring$" is only 16 bytes
- long, but the new one is 39 bytes long. What does BASIC do? It
- (1) deallocates the space for the old "yourstring", (2) allocates
- new space for the new string and (3) updates the string
- ____________________
-
- 1. This is an outline of what BASIC does, but it will not
- include the parts of memory management that you will never see.
-
-
-
-
- BASIC II - Interfacing BASIC With Assembler mmix
- ___________________________________________
-
- descriptor. Memory will now look like this:
-
- 17150
- |You.say.either                Let's.call.it.offBut|
- |.oh!,.If.we.call.the.whole.thing.off|
-
- and the descriptors will now look like this:
-
-
- STRING DESCRIPTOR
- length:location
-
- mystring$ 14:17150 -> |You.say.either|
- yourstring$ 39:17197 -> |But.oh!,.If.we...
- ourstring$ 17:17180 -> |Let's.call.it.off|
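The reassignment can be sketched the same way in Python. This is a hypothetical model, assuming (as in the example) that BASIC abandons the old block and places the new string at the first free byte past all existing strings; real BASICs may choose a different spot.

```python
# Sketch of "yourstring$ = ..." when the new value is longer than the old one.
descriptors = {
    "mystring$": (14, 17150),
    "yourstring$": (16, 17164),
    "ourstring$": (17, 17180),
}
heap_top = 17197   # first free byte after ourstring$
free_blocks = []   # (location, length) of abandoned space


def reassign(name, text):
    """Free the old block, allocate at heap_top, and update the descriptor."""
    global heap_top
    old_length, old_location = descriptors[name]
    free_blocks.append((old_location, old_length))
    descriptors[name] = (len(text), heap_top)
    heap_top += len(text)


reassign("yourstring$", "But.oh!,.If.we.call.the.whole.thing.off")
print(descriptors["yourstring$"])   # (39, 17197)
print(free_blocks)                  # [(17164, 16)]
```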
-
- BASIC is aware that there is an empty block of space and has a
- strategy for dealing with empty spaces, though each BASIC has its
- own strategy. We don't know exactly WHEN it will take action, but
- we do know WHAT action it will take. At some point BASIC will
- decide that it has too many empty spaces in memory and will
- REORGANIZE the segment. This is known as GARBAGE COLLECTION.
- Exactly how this is done is up to the person who wrote the BASIC
- compiler/interpreter.
-
- After reorganization, the addresses of ALMOST ALL strings and
- MANY dynamic arrays will have changed. The string locations
- themselves will have changed, but the string descriptors will
- still be in the same place in DS, and they will have been
- updated. Here is the new memory:
-
-
- 12724
- |You.say.eitherLet's.call.i|
- |t.offBut.oh!,.If.we.call.the.whole.thing.off|
-
- and here are the updated descriptors:
-
- STRING DESCRIPTOR
- length:location
-
- mystring$ 14:12724 -> |You.say.either|
- yourstring$ 39:12755 -> |But.oh!,.If.we...
- ourstring$ 17:12738 -> |Let's.call.it.off|
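One possible garbage-collection pass can be modeled in Python as sliding every live string down to a new base in its current memory order. This is only a sketch: each BASIC has its own strategy, and the new base 12724 is simply the value used in the example. The dictionary keys stay the same, just as the descriptors themselves stay in the same place in DS while their contents are updated.

```python
# Sketch of compaction: pack live strings contiguously from new_base,
# keeping them in their current address order.

def compact(descriptors, new_base):
    """Return updated (length, location) descriptors after compaction."""
    updated = {}
    location = new_base
    by_address = sorted(descriptors.items(), key=lambda item: item[1][1])
    for name, (length, _old_location) in by_address:
        updated[name] = (length, location)
        location += length
    return updated


before = {
    "mystring$": (14, 17150),
    "yourstring$": (39, 17197),
    "ourstring$": (17, 17180),
}
after = compact(before, 12724)
for name in before:
    print(f"{name:12} {after[name][0]}:{after[name][1]}")
```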
-
- The strings have been moved several thousand bytes from where
- they were just a second ago. The information that was in the
- string descriptors a second ago is no longer valid. Old
- information about dynamic arrays is also unreliable. This means
- that if you have a subroutine written in assembler, you must get
- any address information at the time the subroutine is called.
- We'll come back to this later.
-
-
- Let's go on to data input and output. When you first started
- doing BASIC, you did i/o using only:
-
- WRITE #1, my.data!
-
- Perhaps you do it differently now, perhaps not. In any case,
- you need to know about i/o speed and how different file i/o
- works. Here's the simplest file output:
-
- ***********************************
- DIM large.array! (10000)
-
- FOR i% = 1 to 10000
- large.array! (i%) = 2.1
- NEXT i%
-
- OPEN "2-1.doc" for output as # 1
- PRINT time$
- FOR i% = 1 to 10000
- WRITE #1, large.array! (i%)
- NEXT i%
- PRINT time$
- CLOSE #1
- ***********************************
-
- Of course, to make it a challenge we are going to write an array
- of 10,000 numbers. How long does it take?{2} For this output it
- took 38 seconds. The same program, inputting the same data with:
-
- INPUT #1, large.array! (i%)
-
- took 49 seconds. These are fairly large amounts of time. But
- wait, it gets worse. Let's change one line of the above program:
-
- large.array! (i%) = 2.1678319E+19
-
- This is a different constant which is put into each element of
- the array. How long does output take now? 59 seconds. And input?
- A whopping 79 seconds! What's going on here?
-
- When you do i/o with INPUT #, WRITE # or PRINT #, it is exactly
- like doing i/o to the screen. For output, BASIC converts the
- binary numbers into TEXT and then writes the TEXT to the disk.
- When it does input, it reads the TEXT from the disk and converts
- the TEXT into a binary number. Here is the beginning of the
- output file from the first example:
-
- 2.1
- 2.1
- 2.1
- 2.1
- ____________________
-
- 2. All times from now on are from a slow PC with a slow hard
- disk, but with an 8087. Since these are floating-point numbers,
- your times will be slower if you don't have an 80x87, while if
- you have an 80386 with an 80387 and a fast hard disk, your times
- will be much faster.
- 2.1
-
- Each data item has been converted into "2.1" + CHR$(13) +
- CHR$(10). These last two bytes are a carriage return and a line
- feed, which together end a line on the IBM PC. That's (5 bytes X
- 10000 items) plus 1 byte for the end of file marker, or 50001
- bytes:
-
- 2-1 DOC 50001 6-29-90 12:39p
-
- Here's the beginning of the output file from the second example:
-
- 2.167832E+19
- 2.167832E+19
- 2.167832E+19
- 2.167832E+19
- 2.167832E+19
-
- Each data item has been converted into "2.167832E+19" + CHR$(13)
- + CHR$(10). That's (14 bytes * 10000 items) plus 1 byte for the
- end of file marker, or 140001 bytes:
-
- 2-1E19 DOC 140001 6-29-90 12:47p
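Both file sizes can be reproduced with a short Python calculation. It assumes what the listings show: each number is written with at most 7 significant digits, each line ends with CR LF, and there is one extra end-of-file byte. The helper name text_file_size is hypothetical.

```python
# Size of a sequential text file holding `count` copies of `value`.

def text_file_size(value, count):
    text = format(value, ".7G")      # "2.1" or "2.167832E+19"
    line = text + "\r\n"             # CHR$(13) + CHR$(10)
    return len(line) * count + 1     # +1 for the end-of-file marker


print(text_file_size(2.1, 10000))            # 50001
print(text_file_size(2.1678319e19, 10000))   # 140001
```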
-
- These files are unnecessarily large, and i/o is slow: if you
- don't have an 8087 and you are doing floating-point i/o, it can
- be slower still.
-
- Can we do it faster? Yes. Using GET and PUT, we get a certain
- number of bytes from the disk, then transfer them to the array.
- Some of you have never used random access i/o, so this is a brief
- summary.
-
- When you open a file as text (as we did in the above examples),
- BASIC divides the text by looking for carriage returns. When you
- open a file as a random access file, you are telling BASIC that
- you want to divide the file into distinct blocks of information.
- It may be text or it may be something else - BASIC doesn't care.
- If you say nothing, BASIC assumes that you want the blocks to be
- 128 bytes long, but the length can be anything.
-
- In the example that we will do, we will use 1024 byte blocks
- because that is exactly 2 disk sectors long, so the disk can read
- information easily and efficiently. If we had a block length of 4
- bytes, the disk would have to do 10000 disk writes; that would be
- very slow and be hard on the disk. Here's how we open the file:
-
- OPEN "packed.doc" for RANDOM as #1 LEN = 1024
-
- This will be a random access file and the block length will be
- 1024 bytes. When you tell it to read or write, it will do it 1024
- bytes at a time. That is getting faster.
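The record length matters because every PUT writes one whole record. Here is a rough Python count, assuming one disk write per record (the same simplification the text makes; DOS buffering can change the real number). The name record_writes is made up for illustration.

```python
# Number of records (and so PUTs) needed for total_bytes, rounding up.

def record_writes(total_bytes, record_length):
    return (total_bytes + record_length - 1) // record_length


data_bytes = 10000 * 4                   # 10000 single-precision numbers
print(record_writes(data_bytes, 4))      # 10000 records of 4 bytes
print(record_writes(data_bytes, 1024))   # 40 records, the last one only partly used
```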
-
- Where is the block of data that it is going to write to disk?
- Here life starts getting complicated, so I hope you have
- understood everything that we have done so far. When you open a
- file, BASIC assigns it a BUFFER. The buffer has a fixed length
- (either 128 bytes or the length you have designated), and is
- located somewhere in the DS data segment along with the numbers
- and strings. Like a string, it is relocatable. We need a way to
- pin it down. The easy and nice way would be if it were an array
- and we could address it like an array:
-
- buffer#1 (45) = 20
-
- We are not that lucky. The only thing you can do is overlay a
- template on the buffer, and work from the template. This template
- MUST be made up of strings. We make up the template with a FIELD
- statement.
-
- FIELD #1, 1024 AS out.string$
-
- The FIELD statement starts out with the file # followed by a list
- of strings and the length of each string.
-
- FIELD #1, 100 AS string1$, 200 AS string2$, 300 AS string3$
-
- The total length of the strings may be shorter than the buffer,
- but may not be longer than the buffer. What does the FIELD
- statement do? The first thing that it does is set the string
- descriptor for all of these strings. Let's say that at the moment
- the file #1 buffer is at 46217:
-
- STRING DESCRIPTOR
- length:location
-
- string1$ 100:46217
- string2$ 200:46317
- string3$ 300:46517
-
- The first string starts at the first byte of the buffer. The
- second string starts right where the first string ends and the
- third string starts right where the second string ends. This is
- true for any FIELD statement, no matter how many strings are
- defined. Because of the way BASIC does memory management, if it
- moves the buffer, it will also update these string descriptors to
- point to the same relative places in the buffer. These string
- descriptors are on auto pilot.
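The layout rule for FIELD can be written as a few lines of Python. This is a model only: the buffer address 46217 is the hypothetical value used above, and the function name field is made up. If BASIC moves the buffer, calling it again with the new start gives the updated descriptors, which is the "auto pilot" behavior.

```python
# Each field starts where the previous one ends, beginning at the buffer start.

def field(buffer_start, buffer_length, lengths):
    """Return (length, location) descriptors for consecutive fields."""
    if sum(lengths) > buffer_length:
        raise ValueError("FIELD is longer than the file buffer")
    descriptors = []
    location = buffer_start
    for length in lengths:
        descriptors.append((length, location))
        location += length
    return descriptors


print(field(46217, 1024, [100, 200, 300]))
# [(100, 46217), (200, 46317), (300, 46517)]
```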
-
- Suppose now that we have the following string:
-
- "Let's get physical"
-
- and we want to write it to disk as string1$. All we need to do
- is:
-
- string1$ = "Let's get physical"
-
- Right? No, that's very, very, very wrong. What you have just done
- is alter the string descriptor of string1$ to point to an
- entirely different place in memory. The string descriptors are
- now:
-
- string1$ 18:58902
- string2$ 200:46317
- string3$ 300:46517
-
- BASIC deallocated the space for string1$, reallocated it somewhere
- else in memory, and changed the string descriptor. Not only is
- string1$ in a different place in memory, but BASIC may think that
- part of the file #1 buffer is actually empty space, and the next
- time it reorganizes memory, who knows what is going to happen.
- From the moment you define strings in a FIELD statement until the
- time you close the corresponding file, you can NEVER have them on
- the left side of an equal sign. Having them on the left side is
- sure to change the string descriptor.
-
- How are we going to transfer data to these strings? There are
- three special operators in BASIC - LSET, MID$ and RSET. Their job
- is to put something into a string without altering the string
- length or location (i.e. without altering the string descriptor).
-
- LSET string1$ = "Let's get physical"
- MID$ (string1$,17) = "Let's get physical"
- RSET string1$ = "Let's get physical"
-
- LSET will insert the string at the very left of string1, RSET
- will insert the string at the very right of string1, and MID$
- will insert the string starting at the 17th byte of string1.
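To make the "don't touch the descriptor" rule concrete, here is a Python model of LSET, RSET and the MID$ statement acting on a fixed 10-byte field in a buffer. It is a sketch, not BASIC's code. The assumed rules follow the Microsoft BASIC manuals: LSET and RSET pad with spaces and drop extra characters on the right, and MID$ never writes past the end of the field. The function names are hypothetical.

```python
# The field's start and length never change; only the bytes inside it do.

def lset(buf, start, length, text):
    data = text.encode("ascii")[:length]
    buf[start:start + length] = data.ljust(length, b" ")


def rset(buf, start, length, text):
    data = text.encode("ascii")[:length]
    buf[start:start + length] = data.rjust(length, b" ")


def mid_assign(buf, start, length, position, text, count=None):
    """MID$(field$, position[, count]) = text, with a 1-based position."""
    data = text.encode("ascii")
    if count is not None:
        data = data[:count]
    room = max(0, length - (position - 1))   # bytes left in the field
    data = data[:room]
    offset = start + position - 1
    buf[offset:offset + len(data)] = data


buf = bytearray(b"..........##")   # 10-byte field, then 2 bytes of another field
lset(buf, 0, 10, "HELLO")
after_lset = bytes(buf)            # b'HELLO     ##'
rset(buf, 0, 10, "HELLO")
after_rset = bytes(buf)            # b'     HELLO##'
mid_assign(buf, 0, 10, 8, "XYZW")
after_mid = bytes(buf)             # b'     HEXYZ##'
lset(buf, 0, 10, "Let's get physical")
after_long = bytes(buf)            # b"Let's get ##"
print(after_lset, after_rset, after_mid, after_long)
```

In every case the two "##" bytes of the neighboring field survive, which is exactly what keeping the descriptor fixed protects.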
-
- This is the strategy for all random access i/o in BASIC. We:
-
- 1) open a file as RANDOM and declare a block size.
- 2) define some "fixed length" strings inside the buffer with
- a FIELD statement.
- 3) insert data in the strings using LSET, RSET or MID$. This
- is true whether the data is strings or numbers.
-
- There's only one problem left. For LSET, RSET and MID$, the thing
- on the RIGHT side of the equal sign must be a string. You can't
- have:
-
- LSET string1$ = number!
-
- It's illegal. To counter this, BASIC has some pseudo-functions.
- Let's take integers as an example:
-
- a.string$ = MKI$ (number%)
- number% = CVI (a.string$)
-
- MKI$ doesn't actually DO anything. It just tells BASIC that it is
- o.k. to move two bytes from "number%" to "a.string$". The bytes
- are binary data and are moved unaltered. Similarly, CVI tells
- BASIC that it is alright to move two bytes of binary data from
- "a.string$" to "number%". We are tricking BASIC into moving
- binary data from one data type to another. This is simply data
- movement, and there is no data conversion. The forms are:
-
- NUMERIC DATA                  MOVE TO STRING   FROM STRING
-
- integer          <-> string   MKI$             CVI
- long integer     <-> string   MKL$             CVL
- single precision <-> string   MKS$             CVS
- double precision <-> string   MKD$             CVD
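Python's struct module does the same kind of raw byte copy, so it can model this table. The format codes assume little-endian 2- and 4-byte integers and IEEE single and double precision (the 8087 formats). Some Microsoft BASICs stored floats in Microsoft Binary Format instead, which this sketch does not model. The helpers mk and cv are hypothetical names.

```python
import struct

FORMATS = {
    "MKI$/CVI": "<h",   # integer, 2 bytes
    "MKL$/CVL": "<i",   # long integer, 4 bytes
    "MKS$/CVS": "<f",   # single precision, 4 bytes
    "MKD$/CVD": "<d",   # double precision, 8 bytes
}


def mk(fmt, value):
    """Like MKx$: return the raw bytes of value, unconverted."""
    return struct.pack(fmt, value)


def cv(fmt, raw):
    """Like CVx: reinterpret raw bytes as a number."""
    return struct.unpack(fmt, raw)[0]


for name, fmt in FORMATS.items():
    raw = mk(fmt, 1234)
    print(name, len(raw), raw.hex(), cv(fmt, raw))
```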
-
- In contrast, the functions STR$ and VAL really do convert: STR$
- turns a binary number into its text representation, and VAL turns
- a text representation back into a binary number. This is the same
- as what happens with PRINT and INPUT. Here's a program:
-
- **********************************************
- number! = 2.1678319E+19
- binary.string$ = MKS$ (number!)
- text.string$ = STR$ (number!)
- PRINT LEN(text.string$), LEN(binary.string$)
- PRINT text.string$, binary.string$
- **********************************************
-
- and here's the output:
-
- 13 4
- 2.167832E+19 nl _
-
- You probably won't be able to see all of that last output on your
- printer, because it is four bytes of raw binary data, and the
- number is:
-
- 6E6C965F hex or 110, 108, 150, 95 decimal
-
- The third byte (150) is outside 32-126, the range of printable
- ASCII characters, so it does not appear as a normal character.
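These numbers can be checked in Python, assuming IEEE single precision (the 8087 format) for MKS$ and 7 significant digits for STR$. STR$ puts a leading space in front of a positive number, which is why LEN reports 13 rather than 12.

```python
import struct

number = 2.1678319e19
binary_string = struct.pack("<f", number)   # like MKS$(number!)
text_string = " " + format(number, ".7G")   # like STR$(number!)

print(len(text_string), len(binary_string))               # 13 4
print(binary_string.hex().upper())                         # 6E6C965F
print(list(binary_string))                                 # [110, 108, 150, 95]
print([b for b in binary_string if not 32 <= b <= 126])    # [150]
```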
-
- STR$ gives us the text representation of the number, while MKS$
- stuffs the binary representation of a number into a string. In
- the opposite direction, VAL gives us the numeric value of a text
- string (if it has a numeric representation), while CVS stuffs 4
- binary bytes from a string into a single precision number.
-
- STR$ from binary value to text representation
- VAL from text representation to binary value
-
- Note that STR$ can convert ANY type of number to a text string
- and VAL can convert a text string to ANY type of number, while
- CVI, CVL, CVS, CVD, MKI$, MKL$, MKS$, and MKD$ can only stuff a
- specific type of number into a string or a string into a specific
- type of number.
-
- We want our output program to stuff the binary value from a
- single precision number to selected bytes of a string. To stuff a
- floating-point number into string1$ above, all we need to do is:
-
- LSET string1$ = MKS$ (number!)
-
- The following program has a single string which is the size of
- the entire buffer, and we are going to stuff the single precision
- numbers in one at a time with MID$.
-
- ************************************************************
- number% = 10240
- DIM large.array! (number%)
-
- FOR i% = 1 to 10240
- large.array! (i%) = 2.1678319e+19
- NEXT i%
-
- OPEN "packed.doc" for RANDOM as #1 LEN = 1024
- FIELD #1 , 1024 AS out.string$
-
- PRINT time$
- k% = 0
- record.count% = 0
- FOR i% = 1 to 40
- record.count% = record.count% + 1
- spot% = 1
- FOR j% = 1 to 256
- k% = k% + 1
- MID$ (out.string$,spot%,4) = MKS$ (large.array!(k%))
- spot% = spot% + 4
- NEXT j%
- PUT #1, record.count%
- NEXT i%
- PRINT time$
- CLOSE #1
- ***********************************************************
-
-
- The array length has been increased slightly so that we have an
- exact number of blocks. We use MID$ to make sure that the string
- descriptor for out.string$ does not get changed. Each file write
- will be (256 numbers * 4 bytes/number) 1024 bytes long. We start
- with the first record and increase the record number by 1 each
- time we write. Does this increase the speed any? Well, this takes
- 11 seconds.
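The same packing can be modeled in Python with an in-memory file standing in for PACKED.DOC. This sketch assumes IEEE single precision (4 bytes per number) and 1024-byte records, so each record holds 256 numbers; struct.pack with a count plays the role of 256 MID$ ... = MKS$(...) assignments.

```python
import io
import struct

RECORD_LENGTH = 1024
NUMBERS_PER_RECORD = RECORD_LENGTH // 4   # 256 single-precision numbers

large_array = [2.1678319e19] * 10240
out_file = io.BytesIO()
records_written = 0

for first in range(0, len(large_array), NUMBERS_PER_RECORD):
    chunk = large_array[first:first + NUMBERS_PER_RECORD]
    record = struct.pack(f"<{len(chunk)}f", *chunk)   # fill the 1024-byte buffer
    out_file.write(record)                             # like PUT #1, record.count%
    records_written += 1

print(records_written, len(out_file.getvalue()))      # 40 40960
```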
-
- TYPE                    OUTPUT        INPUT
-
- num <-> text            38 - 59 sec   49 - 79 sec
- num <-> bin. string     11 sec        11 sec
-
- I didn't show you the equivalent input routines but here are the
- times they took. Note that the complexity of the single precision
- number has no effect on the last (the binary) routine. Also, the
- last routine does not suffer if there is no 8087. If you are
- running an 80286 with a fast hard disk, this last routine should
- only take a second or two. Here are the file sizes:
-
- 2-1 DOC 50001 6-29-90 12:39p
- 2-1E19 DOC 140001 6-29-90 12:47p
- PACKED DOC 40960 6-29-90 1:08p
-
- The first two are the different sizes depending on whether the
- constant was 2.1 or 2.1678319E+19. The last one is for our last
- routine. Notice that it is smaller than either text file, even
- though it holds 240 more numbers.
-
- Can we do any better than 11 seconds? Yes, but we need to take
- over disk i/o and we need to know a few more things before we do
- that.
-
-
- LOCATION OF DATA
-
- BASIC is designed to pass subroutines the location of the data,
- not the data itself. This is called passing by reference. Though
- it is possible to pass the data itself, there are certain
- problems with the stack if you do.{3} We will always pass the
- addresses.
-
- All single numeric variables are in the DS segment. BASIC passes
- the offset address of these variables in DS (1 word).
-
- All strings are in the DS segment. Their string descriptors are
- also in the DS segment. BASIC always passes the offset address of
- the STRING DESCRIPTOR. This, in fact, is what we want. We need to
- know both where the string is and how long it is. If we write
- past the end of the string we may destroy BASIC's memory
- management system.
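Here is a Python model of what an assembler routine sees when BASIC passes a string: only the offset of its 4-byte descriptor in DS. It assumes the layout described above (length word, then location word, both little-endian as on the 8086). The DS image, the descriptor offset 6300 and the string location 17150 are hypothetical.

```python
import struct

ds = bytearray(65536)                 # stand-in for the 64K DS segment

text = b"What's up, doc?"
string_location = 17150               # hypothetical
descriptor_offset = 6300              # hypothetical; this is what BASIC passes
ds[string_location:string_location + len(text)] = text
struct.pack_into("<HH", ds, descriptor_offset, len(text), string_location)


def read_string(ds, descriptor_offset):
    """Follow a descriptor, like MOV CX,[BX] and MOV SI,[BX+2] in assembler."""
    length, location = struct.unpack_from("<HH", ds, descriptor_offset)
    return bytes(ds[location:location + length])


print(read_string(ds, descriptor_offset))   # b"What's up, doc?"
```

Reading the length from the descriptor on every call is what keeps the routine from writing past the end of the string.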
-
- Static arrays are in the DS segment but dynamic arrays can be
- anywhere. If we want to write a general purpose routine with
- arrays, we need to handle them no matter where they are.
-
- BASIC has a special function called VARPTR that tells you where a
- variable is in memory. Here's a program that uses it for a couple
- of variables:
-
- ***********************************************************
- ' check out the use of varptr
- n% = 5000
- p% = 50
- DIM b!(800),a!(900)
- DIM d!(n%), c!(p%)
-
- mystring$ = "What's up, doc?"
- addressA! = varptr (n%)
- addressB! = varptr (p%)
- address1! = varptr (a!(0))
- address2! = varptr (b!(0))
- address3! = varptr (c!(0))
- address4! = varptr (d!(0))
- address5! = varptr (mystring$)
- PRINT addressA!, addressB!
- PRINT address1!, address2!, address3!, address4!, address5!
- ***********************************************************
-
- It gives us the addresses of all sorts of things. a!() and b!()
- are static arrays, so they should be in the DS segment. c!() and
- d!() are dynamic arrays, so they might be anywhere. Remember, the
- DS segment is from offset 0 to offset 65535. Let's see where they
- ____________________
-
- 3. If you make a mistake and pass a single precision number
- instead of an integer, you will pass 4 bytes instead of 2. From
- that moment on the stack will have 2 extra bytes on it and you
- won't know where they came from.
- are:
-
- 6230 6232
- 9438 6234 87616 67584 13062
-
- The individual numbers are in DS, and the two static arrays are
- in DS, but c!() and d!() are outside of DS. These numbers tell us
- the address relative to the start of DS, but we don't know where
- DS is at the moment. Where exactly are these arrays? It would be
- tedious to pass the subroutine these numbers because they are
- floating-point numbers and would be very difficult to deal with.
-
- QuickBASIC has a function called PTR86. It is in an external
- object file called INT86.OBJ.{4} This object file has the
- routines that you need if you want to do interrupts from BASIC
- itself. We'll come back to that. PTR86's job is to take the
- floating-point number which we got from VARPTR, add the starting
- address of the DS segment to get an absolute address in memory,
- and then calculate both a segment and an offset for that address
- in memory. The segment will always be the highest segment that
- contains the first byte of the variable or array and the offset
- will always be a number from 0 to 15.
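The normalization PTR86 is described as doing is plain arithmetic: absolute address = DS * 16 + VARPTR value, then segment = absolute / 16 and offset = absolute MOD 16. This Python sketch uses a hypothetical DS of 2000 hex and the VARPTR values printed earlier; the function name ptr86 here is only a model, not the real routine.

```python
def ptr86(ds_segment, varptr_value):
    """Return (segment, offset) with offset 0-15 for DS + VARPTR."""
    absolute = ds_segment * 16 + int(varptr_value)   # 20-bit physical address
    return absolute >> 4, absolute & 0xF


for value in (9438, 87616, 13062):
    segment, offset = ptr86(0x2000, value)
    print(f"{value:6} -> {segment:04X}:{offset:X}")
```

Shifting right by 4 picks the highest segment whose start is at or below the address, which is why the offset always lands in 0-15.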
-
- In order to use an object file from inside of QuickBASIC you need
- to put it in a library file and then load the library file when
- starting QuickBASIC.
-
- Building the library file is quite easy. QuickBASIC comes with a
- program called BUILDLIB.EXE which builds the library for you. For
- now, you need only INT86.OBJ and PREFIX.OBJ in your library.{5}
- Put these two things in every library that you build from now on.
- PREFIX.OBJ ensures proper segment ordering in the executable
- file.
-
- >buildlib int86+prefix
-
- This will create a library with the default name USERLIB.EXE. To
- load a library with this default file name, just put '/l' on the
- command line:
-
- >qb /l
-
- If you have given the library a different name like XQRTYF.EXE,
- then put that name after the '/l':
-
- >qb /lXQRTYF.EXE
-
- These object files will now be loaded and their subroutines will
- be usable from inside BASIC.
-
-
-
- ____________________
-
- 4. PTR86 has been replaced by VARSEG in QuickBASIC 4.0.
-
- 5. Both of these object files come with your QuickBASIC.
-
-