home *** CD-ROM | disk | FTP | other *** search
- u
- BYTES: STRING SPEED
-
- by Maurice Jones
-
- On LOADSTAR #210 I described how
- array variables and simple variables
- were stored. In the case of string
- variables only the name, length, and
- two pointers are stored. The pointers
- show where the string is stored. The
- actual strings are stored either in
- the code or at the top of BASIC RAM,
- extending downward. This latter method
- of storage leads to the topic of
- garbage collection.
-
- If the string appears in the code,
- the pointers will point into the code,
- and no new memory will be needed.
- Strings created by statements such as
- A$="This is a string" and strings read
- from data statements will be stored
- this way.
-
- Strings introduced into the
- program by GET, STR$, or by
- calculation, such as A$=C$+D$, are
- stored at the top of BASIC RAM. Each
- successive string is stored in the
- first free area containing enough room
- for the entire string. This process
- continues until the computer finds
- that the space between the last string
- and the top of arrays is too small for
- the current string.
-
- When this happens, the garbage
- collection routine attempts to
- reorganize the space above arrays.
-
- To see why it may be possible to
- reorganize the space without losing
- variables, consider a program which
- uses A$ and B$ over and over. Suppose
- that each is stored in high memory and
- each takes on these values in this
- order: A$ is"CAT", A$ is "BIRD", B$ is
- "NUMBER" and finally B$ is "LETTER."
- The top of memory will now contain:
-
- LETTERNUMBERBIRDCAT
-
- If the garbage collection routine
- is used, the computer will look at the
- pointers of all variables and select
- the one nearest the top of RAM. In
- this case, it is A$ and the pointers
- indicate the start of BIRD, the
- current value of A$. Since this is the
- highest address found, any string
- above here is no longer needed so the
- computer moves BIRD to the top and the
- memory now contains:
-
- LETTERNUMBERBIRBIRD
-
- The routine now looks through the
- entire list again, deciding which
- pointer points to next highest memory
- location. In our example there is only
- one string left, B$. The current value
- of B$ is LETTER, so this is moved to
- the next space after the new position
- of A$ and the memory looks like this:
-
- LETTERNUMLETTERBIRD
-
- The pointer which indicates the
- bottom of strings is now set to the
- space where the newly placed LETTER
- starts. The old values are not erased
- but will be overwritten by the next
- string.
-
- Simple enough, but if the number
- of variables is large, the process can
- be time consuming. Since the simple
- variables and the array variables may
- contain strings, all of these areas
- must be searched completely FOR EACH
- VARIABLE. If there are 100 variables,
- each will be looked at many times, and
- more time will be required to move
- each string.
-
- Notice that the routine works with
- pointers and length of string until
- the actual move is made. Even then,
- these numbers allow the string to be
- moved efficiently, since the start of
- the new location is easily calculated.
-
- This also means that the actual
- moves required only a fraction of the
- time that the search requires when the
- number of strings is fairly large.
-
- The length of strings has very
- little effect on the time required for
- garbage collection. The dominating
- factor is the number of strings stored
- above the end of arrays. When the
- search is made, variables stored in
- the code are passed over since they
- are never moved. Run the demo for
- evidence of these statements and read
- the text screens for more information.
-
-
- That about does it for the summary
- of the study, so now I will close with
- some comments and some suggestions.
-
- This whole thing was prompted by
- Jeff Jones's BASICS article on
- LOADSTAR #57, and while a few of his
- remarks may need to be qualified, his
- main points are well proved. A few
- jiffies here and there are of no great
- import. A few jiffies in each part of
- a 1000 step loop are very important.
- More time is gained by using short
- code and efficient algorithms than by
- attempting to change multiply to
- divide or some other such doubtful
- scheme. Worry more about the big
- picture.
-
- I now believe that garbage
- collection has been badly handled in
- the literature. The usual thing has
- been to make some pronouncement
- without any evidence or explanation.
- My opinion is this: If the number of
- strings stored at the top of memory is
- relatively small, say less than 60,
- serious problems with garbage
- collection will only arise if
- available memory is small. The
- problem, then, is to strike a balance.
-
- I suspect there are many
- techniques that need to be explored
- and discussed, but I am not the one to
- do it. I had a lot of trouble
- designing programs to illustrate the
- idea. The modified BYTES #57 demo that
- I mentioned in Part I contained very
- long (about 140 characters) strings
- and was constantly generating more.
- Yet, garbage was only collected after
- about six hands and required about
- SEVEN JIFFIES.
-
- Emerson said, "Fear always springs
- from ignorance." Could it be that he
- was talking about fear of garbage
- collection?
-
- It seems obvious that the speed
- and memory usage are connected in many
- ways and that the two must be
- considered together. I like heavily
- documented programs, and this study
- shows that speed is not seriously
- affected by REM statements, unless
- space is a problem.
-
- In order to fully understand how
- BASIC uses memory, one needs much more
- detail than was possible here. I
- strongly suggest that serious
- programmers must understand this and
- suggest the sources that I mentioned
- in Part I.
-
- Numbers and strings stored by
- arrays use less space and are more
- quickly located than those stored by
- simple variables. I believe that it is
- only a matter of habit that so many
- simple variables are found in the
- code. With a little attitude
- adjustment, we could see this change.
- I cannot see any difficulty in
- declaring ALL variables early in the
- program nor with declaring the most
- heavily used simple variables near the
- top of the list. A simple DIM
- statement does it all. There's nothing
- wrong with something like
-
- 100 DIMA$,I,J,K,B$,C$,B,C,D,X,Y,Z
-
- at the beginning of your program.
-
- My examination of code has shown
- that some people do a good job of
- keeping the number of variables small,
- whereas others seem to be afraid to
- reuse variables at all. It is not
- difficult to know when the value
- stored by a variable will not be
- needed again, so that the variable can
- be reused.
-
- Even though I spent hours waiting
- for long loops to run, hours designing
- and revising tests, hours reading and
- hours evaluating, this study has been
- gratifying. I believe that I have kept
- my opinions to a minimum and have
- proved those things that I have
- claimed.
-
- I wish that those who read this
- could learn as much as I did. If you
- want to learn something, try to figure
- out how to explain it to someone else.
-
- MJ
-
-
- [DAVE'S ADDENDUM:] Computers are
- necessarily logical. Pure reason and
- scientific method -- as displayed by
- Maurice -- are the only answers to
- wild rumors and half-truths. This goes
- for operating system issues and
- viruses! Has [anyone] spent even a few
- moments trying to scope out why
- Windows hangs?
-
-
-