This month we continue our look 'under the hood' of BASIC 2.0. As mentioned last month, variables in BASIC 2.0 are stored immediately following the BASIC program text. Function definitions are also stored there. Each type of variable must be stored slightly differently in order to distinguish, for example, between the variables A, A%, A$ and the function FN A().
Each variable or function definition uses seven bytes of memory: two bytes for the variable or function name, followed by five bytes of data.
Legal variable or function names consist of a letter (A..Z) with an optional second character, which may be a letter or a digit. Since all of these characters have codes below 128, bit seven (the sign bit) of each name byte is free to serve as a flag that distinguishes the different types of variables. The two name bytes give four possible combinations of sign bits: both clear for a floating point variable, both set for an integer, only the second set for a string, and only the first set for a function.
For example, a floating point variable named B would have its name stored as 66 and 0 (66=ASC("B")). An integer variable named B% would have its name stored as 194 and 128 (66+128=194, 0+128=128). A string variable named BC$ would be stored as 66 and 195 (67+128=195).
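The naming rules above can be summarised in a short Python sketch. It assumes, as described here, that only the first two characters of a name are stored and that the type is carried entirely by bit 7 of the two name bytes; the function case matches the $D4 00 bytes for FN t in the DEF FN dump later in this column. The helper names `encode_name` and `variable_type` are hypothetical, chosen for illustration.

```python
from typing import Tuple


def encode_name(name: str, is_function: bool = False) -> Tuple[int, int]:
    """Return the two name bytes BASIC 2.0 stores for a variable or FN name."""
    suffix = ""
    if name[-1] in "%$":
        suffix, name = name[-1], name[:-1]
    name = name.upper()
    first = ord(name[0])
    second = ord(name[1]) if len(name) > 1 else 0  # one-letter names: second byte is 0
    if suffix == "%":        # integer: sign bit set in both bytes
        first |= 0x80
        second |= 0x80
    elif suffix == "$":      # string: sign bit set in the second byte only
        second |= 0x80
    elif is_function:        # FN: sign bit set in the first byte only
        first |= 0x80
    return first, second


def variable_type(first: int, second: int) -> str:
    """Classify a stored name by the sign bits of its two bytes."""
    flags = (first >> 7, second >> 7)
    return {(0, 0): "float", (1, 1): "integer",
            (0, 1): "string", (1, 0): "function"}[flags]


print(encode_name("B"), encode_name("B%"), encode_name("BC$"))  # prints (66, 0) (194, 128) (66, 195)
```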
Although each variable is allocated five bytes for data, only floating point variables use all five. A floating point number consists of a one-byte binary exponent followed by four bytes of mantissa. The sign bit of the first mantissa byte, M1, holds the sign of the number: if it is set, the number is negative. (Because the mantissa is normalized, that bit would otherwise always be 1, so BASIC reuses it for the sign.) Floating point numbers will be discussed in more detail later.
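As a preview, here is a minimal Python sketch that decodes the five stored bytes. It assumes the usual Commodore/Microsoft conventions: an exponent byte of 0 means the value is zero, the exponent is biased by 128, and the mantissa is a binary fraction whose leading 1 bit is implied because its position is reused for the sign. `decode_float` is a hypothetical helper name.

```python
def decode_float(data: bytes) -> float:
    """Decode a 5-byte BASIC 2.0 floating point value (exponent, M1..M4)."""
    exponent, m1, m2, m3, m4 = data
    if exponent == 0:            # exponent 0 represents the value 0
        return 0.0
    negative = bool(m1 & 0x80)   # bit 7 of M1 is the sign
    # Restore the implied leading 1 bit that the sign bit displaced.
    mantissa = ((m1 | 0x80) << 24) | (m2 << 16) | (m3 << 8) | m4
    # The mantissa is a fraction in [0.5, 1), scaled by 2 ** (exponent - 128).
    value = mantissa / 2**32 * 2.0 ** (exponent - 128)
    return -value if negative else value


print(decode_float(bytes([0x81, 0x00, 0x00, 0x00, 0x00])))  # prints 1.0
print(decode_float(bytes([0x84, 0x20, 0x00, 0x00, 0x00])))  # prints 10.0
```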
Integer variables are held as two's complement numbers and use only two of the five data bytes. The high byte, which also carries the sign, comes first, followed by the low byte. This is contrary to the normal 6502 convention of storing the low byte first; it is done for convenience when converting the number from integer to floating point. Such a conversion is necessary whenever an integer variable appears in a BASIC expression (such as PRINT 5*B%), which is why integer variables are actually slower than floating point variables.
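The byte order can be illustrated with a small Python sketch. It assumes the three unused data bytes stay at 0, as they do when a variable is created, and that the value lies in BASIC's integer range of -32768 to 32767; `encode_int` and `decode_int` are hypothetical names.

```python
def encode_int(value: int) -> bytes:
    """Pack an integer variable's five data bytes: high byte first, then low byte."""
    # to_bytes raises OverflowError outside -32768..32767, BASIC's integer range.
    return value.to_bytes(2, "big", signed=True) + bytes(3)  # unused bytes left at 0


def decode_int(data: bytes) -> int:
    """Read the two's complement value back from the first two data bytes."""
    return int.from_bytes(data[:2], "big", signed=True)


print(encode_int(300).hex(" "))              # prints 01 2c 00 00 00
print((300).to_bytes(2, "little").hex(" "))  # usual 6502 order, prints 2c 01
```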
String variables use three of the five bytes: a length byte (the number of characters in the string) followed by a two-byte pointer to the string (the address at which its characters are stored), kept low byte first in the usual 6502 order.
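A string variable's data field can be sketched in Python as below. It assumes the two unused bytes are 0 and, since the length fits in one byte, that a string is at most 255 characters long; the values 1 and $0809 are those found for a$ in the example that follows. The helper names are hypothetical.

```python
from typing import Tuple


def encode_string_descriptor(length: int, address: int) -> bytes:
    """Length byte, then the string's address low byte first, then two unused bytes."""
    return bytes([length, address & 0xFF, address >> 8, 0, 0])


def decode_string_descriptor(data: bytes) -> Tuple[int, int]:
    """Return (length, address) from a string variable's five data bytes."""
    return data[0], data[1] | (data[2] << 8)


print(encode_string_descriptor(1, 0x0809).hex(" "))  # prints 01 09 08 00 00
```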
To see how strings are stored, the following program was entered and RUN, and "dddd" was typed in response to the INPUT statement.
10 a$="a": READb$: c$="c"+"c": INPUTd$
20 DATA bbb
A memory dump of the program and variable storage area was then made using a machine-language monitor. As mentioned last month, the start address of the BASIC text area is held in TXTTAB ($2B). The address one past the end of the variable storage area is held in ARYTAB ($2F). These pointers were examined to determine which area of memory to dump.
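Reading one of these two-byte pointers amounts to combining a low byte and a high byte, the same calculation as PEEK(43)+256*PEEK(44) in BASIC. The Python sketch below uses a hypothetical 64K memory image whose pointer values were set by hand to match the example program (TXTTAB = $0801, and ARYTAB = $084A, one past the last variable byte at $0849); `peek_word` is a hypothetical helper name.

```python
def peek_word(memory: bytes, address: int) -> int:
    """Read a 16-bit pointer stored low byte first, as BASIC's zero-page pointers are."""
    return memory[address] | (memory[address + 1] << 8)


memory = bytearray(0x10000)                # hypothetical 64K memory image
memory[0x2B:0x2D] = bytes([0x01, 0x08])    # TXTTAB = $0801
memory[0x2F:0x31] = bytes([0x4A, 0x08])    # ARYTAB = $084A for the example program
print(hex(peek_word(memory, 0x2B)), hex(peek_word(memory, 0x2F)))  # prints 0x801 0x84a
```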
Line 10 occupies $0801 to $0821, including the four bytes at the start of the line that hold the link and line number fields. Line 20 occupies $0822 to $082B, and the two zero bytes at $082C-$082D mark the end of the program. Variables occupy memory from $082E to $0849. Variable a$ is stored starting at address $082E. Examining a$'s data, we see a length byte of 1 at address $0830 and a pointer to address $0809 stored at address $0831. Address $0809 is where the string literal "a" occurs within the BASIC text area. Examining variable b$ at address $0835, we find a length byte of 3 and a pointer to address $0828, where the string literal "bbb" is stored following the DATA statement in line 20.
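Because every simple variable takes exactly seven bytes, the addresses quoted above follow from simple arithmetic. This Python sketch assumes the variables are appended in the order they are first assigned (a$, b$, c$, d$) starting at $082E, the start of variables (the zero-page pointer at $2D, usually called VARTAB); `entry_address` is a hypothetical helper.

```python
ENTRY_SIZE = 7  # 2 name bytes + 5 data bytes per simple variable


def entry_address(vartab: int, index: int) -> int:
    """Address of the index-th variable (0-based, in creation order)."""
    return vartab + ENTRY_SIZE * index


VARTAB = 0x082E  # where the variables start in the example above
for index, name in enumerate(["a$", "b$", "c$", "d$"]):
    entry = entry_address(VARTAB, index)
    # A string's length byte follows the two name bytes; its pointer follows that.
    print(f"{name}: entry ${entry:04X}, length at ${entry + 2:04X}, pointer at ${entry + 3:04X}")
print(f"ARYTAB = ${entry_address(VARTAB, 4):04X}")  # one past the last variable byte
```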
What happens when the string does not appear in the BASIC program text, such as when it is INPUT or is the result of a string operation like concatenation with the '+' operator? Examining variable c$ at address $083C, we find a length byte of 2 and a pointer to address $7FFE. Examining variable d$ at address $0843, we find a length of 4 and a pointer to address $7FFA. A memory dump from $7FFA to $7FFF is shown below.
%c:7FFA 44 44 44 44 43 43 ddddcc
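When a new string is created, for example by INPUT or by concatenation, BASIC stores it in a special area at the top of the memory available to BASIC. Strings are added working downward from MEMSIZ ($37), which holds the address just above the highest byte BASIC may use. FRETOP ($33) always contains the start address of the last string added to the string storage area, which is the current bottom of that area. The string area may grow downward until it meets STREND ($31), the end of the variable and array storage. By POKEing a lower value into MEMSIZ (locations 55 and 56) and then executing CLR, which resets FRETOP to match, one can reserve memory normally used by BASIC for some other purpose, such as a machine-language program.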
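The downward allocation can be modelled with a toy Python class. It assumes MEMSIZ = $8000, which is inferred from the $7FFE and $7FFA addresses in the dump above, and STREND = $084A because the example has no arrays. It leaves out garbage collection entirely. `StringArea` and `allocate` are hypothetical names.

```python
class StringArea:
    """Toy model of BASIC 2.0 string storage, growing down from MEMSIZ."""

    def __init__(self, memory: bytearray, memsiz: int, strend: int) -> None:
        self.memory = memory
        self.memsiz = memsiz   # one past the highest byte BASIC may use
        self.strend = strend   # end of variables/arrays; strings may not grow below it
        self.fretop = memsiz   # as after CLR: FRETOP = MEMSIZ, string area empty

    def allocate(self, data: bytes) -> int:
        """Copy data just below FRETOP, lower FRETOP, and return the string's address."""
        new_fretop = self.fretop - len(data)
        if new_fretop < self.strend:
            # Real BASIC would try a garbage collection first; this sketch just gives up.
            raise MemoryError("OUT OF MEMORY")
        self.memory[new_fretop:self.fretop] = data
        self.fretop = new_fretop
        return new_fretop


memory = bytearray(0x10000)
area = StringArea(memory, memsiz=0x8000, strend=0x084A)
# In lowercase mode c and d are stored as $43 and $44, the same codes as ASCII "C" and "D".
c_addr = area.allocate(b"CC")     # c$="c"+"c"
d_addr = area.allocate(b"DDDD")   # INPUT d$
print(hex(c_addr), hex(d_addr))         # prints 0x7ffe 0x7ffa
print(memory[0x7FFA:0x8000].hex(" "))   # prints 44 44 44 44 43 43
```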
A function definition uses four of its five data bytes to hold two pointers. Unlike ordinary variables, the unused fifth byte is not cleared to 0.
To better understand the structure of a function definition, the following one-line BASIC program was entered and RUN, and a memory dump was made of the start of BASIC memory.
%c10 DEFFN t(x)=y
%c:0801 10 08 0A 00 96 A5 ...@V{$e5}
%c:0807 20 54 28 58 29 B2 t(x){$f2}
%c:080D 59 00 00 00 D4 00 y@@@.@
%c:0813 0D 08 1A 08 FF 58 .....x
%c:0819 00 00 00 00 00 00 @@@@@@
BASIC program text runs from $0801 to $0810, and variable storage uses memory from $0811 to $081E. Note that both the function t and the floating point variable x were created by DEFFN; x was created because the function definition needs the address of its data field. The variable y was not created. Apart from this case, BASIC never creates a new variable unless that variable is being assigned a value. For instance, assuming variable A$ does not exist, it would be created by either of the following BASIC statements:
%cINPUT A$
%cA$="TEXT"
A$ would not be created by the statement PRINT A$, since PRINT does not assign a value.
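The create-on-assignment rule can be modelled with a toy Python class. This is a hypothetical model, not BASIC's actual lookup code: a dictionary stands in for the seven-byte entries, reading a missing variable returns the default value ("" for strings, 0 for numbers) without creating it, and the special case of DEF FN creating its dependent variable is not modelled.

```python
class VariableTable:
    """Toy model: reading a missing variable yields a default, assigning creates it."""

    def __init__(self) -> None:
        self.entries = {}  # name -> value, in creation order

    def read(self, name: str):
        # Reads such as PRINT A$ do not create the variable.
        if name in self.entries:
            return self.entries[name]
        return "" if name.endswith("$") else 0

    def assign(self, name: str, value) -> None:
        # INPUT A$ or A$="TEXT": create the entry if needed, then store the value.
        self.entries[name] = value


table = VariableTable()
print(repr(table.read("A$")), "A$" in table.entries)  # prints '' False
table.assign("A$", "TEXT")
print(table.read("A$"), "A$" in table.entries)        # prints TEXT True
```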
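The function data structure (it's not a real variable) runs from $0811 to $0817. The function name occupies the first two bytes: $D4 and 0, where $D4 is ASC("T")+128, the sign bit of the first byte marking a function. The first pointer, at $0813, points to address $080D, just after the = in the function definition in the BASIC text. This is the function expression, which is evaluated each time the function is used. The last byte, at $0817, is the unused one, here containing $FF.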
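The pointers can be pulled out of the dump with a few lines of Python. The bytes below are copied from the $0801-$081E dump above; `peek` and `peek_word` are hypothetical helpers, and the sketch assumes (as the dump shows) that both pointers are stored low byte first and that pointer 2 addresses the variable's data field, two bytes past its name.

```python
BASE = 0x0801
# Bytes $0801-$081E copied from the dump above.
DUMP = bytes.fromhex(
    "10 08 0A 00 96 A5"
    " 20 54 28 58 29 B2"
    " 59 00 00 00 D4 00"
    " 0D 08 1A 08 FF 58"
    " 00 00 00 00 00 00"
)


def peek(address: int) -> int:
    return DUMP[address - BASE]


def peek_word(address: int) -> int:
    """16-bit pointer, low byte first."""
    return peek(address) | (peek(address + 1) << 8)


FN_ENTRY = 0x0811
fn_name = chr(peek(FN_ENTRY) & 0x7F)      # strip the function flag; second name byte is 0
expression = peek_word(FN_ENTRY + 2)      # pointer 1: expression text after '='
variable_data = peek_word(FN_ENTRY + 4)   # pointer 2: dependent variable's data field
var_name = chr(peek(variable_data - 2))   # the variable's name sits just before its data
print(f"FN {fn_name}: expression at ${expression:04X} ({chr(peek(expression))}), "
      f"dependent variable {var_name} data at ${variable_data:04X}")
```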
The second pointer, at $0815, points to address $081A, which is the data field of the variable X. X is the function's 'dependent variable.' Before a function is evaluated, the argument is deposited at this address; then the function expression is evaluated. For example, suppose we have the function definition DEFFN SQ(A)=A*A. If we then execute PRINT FN SQ(5), the 5 is known as the function's argument or parameter. The value 5 is stored in variable A's data field using pointer 2 in function SQ's data structure, and then the function expression A*A is evaluated.
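The two-step evaluation can be sketched in Python as follows. This is a hypothetical model: a one-element list stands in for a variable's data field so that the function record can share it, much as pointer 2 does, and a Python lambda stands in for re-interpreting the BASIC text that pointer 1 refers to. The names `FnDef`, `def_fn` and `call_fn` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Cell = List[float]  # a one-element list stands in for a variable's data field


@dataclass
class FnDef:
    expression: Callable[[Dict[str, Cell]], float]  # stands in for pointer 1 (text after '=')
    argument_cell: Cell                            # stands in for pointer 2 (data field)


variables: Dict[str, Cell] = {}


def def_fn(var_name: str, expression: Callable[[Dict[str, Cell]], float]) -> FnDef:
    # DEF FN creates the dependent variable if needed and remembers its data field.
    cell = variables.setdefault(var_name, [0.0])
    return FnDef(expression, cell)


def call_fn(fn: FnDef, argument: float) -> float:
    fn.argument_cell[0] = argument    # deposit the argument through pointer 2
    return fn.expression(variables)   # then evaluate the expression (pointer 1)


SQ = def_fn("A", lambda v: v["A"][0] * v["A"][0])  # DEFFN SQ(A)=A*A
print(call_fn(SQ, 5))  # prints 25
```

This sketch models only the two steps described above. It leaves the argument in A afterwards and makes no attempt to reproduce any saving or restoring of A's earlier value that the real interpreter may perform.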